Last updated
Last updated
Microsoft Excel is a spreadsheet developed by Microsoft for Windows, macOS, Android and iOS. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications. It has been a very widely applied spreadsheet for these platforms, especially since version 5 in 1993, and it has replaced Lotus 1-2-3 as the industry standard for spreadsheets. Excel forms part of the Microsoft Office suite of software.
Microsoft Excel is a powerful tool that will handle most tasks that are useful for a journalist who needs to analyze data to discover interesting patterns. These tasks include: Sorting Filtering Using math and text functions Excel will handle large amounts of data that is organized in table form, with rows and columns. The columns (which are labeled A, B, C…) list the variables (like Name, Age, Number of Crimes, etc.) Typically, the first row holds the names of the variables. The rest of the rows are for the individual records or cases being analyzed. Each cell (like A1) holds a piece of data.
Modern versions of Excel will hold as many as 1,048,576 records with as many as 16,384 variables! An Excel spreadsheet also will hold multiple tables on separate sheets, which are tabbed on the bottom of the page.
#Sorting
One of the most useful abilities of Excel is to sort the data into a more revealing order. Too often, we are given lists that are in alphabetical order, which is useful only for finding a particular record in a long list. In journalism, we usually are more interested in extremes: The most, the least, the biggest, the smallest, the best, the worst. Consider the data used in this workshop, a list of the provinces of Italy showing the number of various kinds of crimes reported during a recent year. Here is how it looks sorted in alphabetical order of province name:
Far more interesting would be to sort it in descending order of the total number of murders, with the most violent city at the top of the list:
There are two methods of sorting. The first method is quick and can be used for sorting by a single variable. Put the cursor in the column you wish to sort by (“Murders” in this case) and then click the A-Z icon:
You’ll get a window that looks like this:
Sort in whichever order you want. But beware! Put the cursor in the column, but DO NOT select the column letter (D, in this case) and then sort. Consider the example below:
You will get that warning message, but don’t choose “Continue with the current selection”. That will sort ONLY the data in that column, thereby disordering your data! The other method of sorting is for when you want to sort by more than one variable. For instance, suppose we wish to sort the crime data first by Territory in alphabetical order, but then by “Murders” in descending order within each Territory. To do that, go to the toolbar, click on the “Data” tab and then the “Sort” icon, and then choose the variables by which you wish to sort. (Click “Add level” to add new levels.) Then click “OK”.
The result will be this:
Sometimes you want to examine only particular records from a large collection of data. For that, you can use Excel’s Filter tool. On the toolbar, go to the “Data” tab, then click “Filter”. Small buttons will appear at the top of each column:
Suppose we wish to see only the records from the Territory of Lazio. Click on the button on the Territory column, uncheck the “Select All” box, and then choose Lazio from the list, like this:
This is the result:
Notice that you now are seeing only rows 36, 44, 78, 80 and 104. The rest are still there, but hidden. More complicated filters are possible. For instance, suppose you wish to see only records in which “Burglaries” is greater than or equal to 5,000 AND car thefts is less than 2,000. You start by filtering Burglaries like this:
then this…
Do the same for Car Thefts, and you get this:
Excel has many built-in functions useful for performing math calculations and working with dates and text. For instance, assume that we wish to calculate the total number of murders in all the provinces. To do this, we would go to the bottom of Column D, skip a row, and then enter this formula in Cell D106: =SUM(D2:D104). The equals sign (=) is necessary for all functions. The colon (:) means “all the numbers from Cell D2 to Cell D104”. After you hit Enter, the result is this:
(The reason for skipping a row is to separate the sum from the main table so that the table can be sorted without pulling the sum into the table during the sorting operation. This way the sum will stay at the bottom of the column. Often you will want to do a calculation on each row of your data table. For instance, you might want to calculate the auto theft rate (the number of cars stolen per 100,000 population), which would let you compare the auto theft problem in cities of different sizes. To do this, we would create a new variable called “Car Theft Rate per 100k” in Column J, the first empty column. Then, in Cell J2, we would enter this formula: =(G2/C2)*100000. This divides the stolen cars by the population, then multiplies the result by 100,000. (Notice that there are no spaces and no thousands separators used in the formula.) Here is the result:
Notice that the formula changes for each row, so that the Row 5 formula is =G5/C5*100000, and so on. That’s what makes Excel so powerful — the ability to change formulas as you copy down or across. Now, if we sort by Car Theft Rate in descending order, we see the cities with the worst auto theft problems:
and sorting in ascending order, the least crime:
Here are some other useful Excel functions that can be used in similar ways:
(You can add, subtract, multiply or divide by using the symbols + – * and /)
=AVERAGE – calculates the arithmetic mean of a column or row of numbers
=MEDIAN – finds the middle value of a column or row of numbers
=COUNT – tells you how many items there are in a column or row
=MAX – tells you the largest value in a column or row
=MIN – tells you the smallest value in a column or row
There are also a variety of text functions that can join and cut apart text strings. For instance:
If “Steve” is in Cell B2 and “Doig” is in Cell C2, then =B2&” “&C2 will produce “Steve Doig”. And =C2&”, “&B2 will produce “Doig, Steve”. Other text functions include:
=SEARCH – this will find the start of a desired string of text in a larger string.
=LEN – this will tell you how many characters are in a text string.
=LEFT – this will extract however many characters you specify starting from the left.
=RIGHT — this will extract characters starting from the right.
=MID — this will start extract where you tell it to start, and get as many characters as you specify.
You can format your numbers using various choices in this box under the “Home” tab: It would be very tedious to repeat writing that calculation in each of 103 rows of data. Happily, Excel has a way to rapidly copy a formula down a column of cells. To do that, you careful move the cursor (normally a big fat white cross) to the dot on the bottom right corner of the cell containing the formula. When it is in the right spot, the cursor will change to a small black cross. At that point, you can double-click and the formula will copy down the column until it reaches a blank cell in the column to the left. This would be the result:
You can also do date arithmetic, such as calculating the number of days or years between two dates, or hours, minutes and/or seconds between two times. For instance, to calculate on April 24, 2014, the age in years of someone whose birth date is in cell B2, you could use this formula: =(DATE(2014,4,24)-B2)/365.25. The first part of the formula calculates the number of days between the two dates, then that is divided by 365.25 (the .25 accounts for leap years) to produce the years. Another useful date function is =WEEKDAY, which will tell you on which day of the week a chosen date falls. For instance =WEEKDAY(DATE(1948,4,21)) returns a 4, which means I was born on a Wednesday.