Problems with "averages"

According to the Oxford Dictionary, an average is “a number expressing the central or typical value in a set of data…” There are several ways to calculate an average, but most people (at least non-mathematicians) use the word to mean the arithmetic mean: the sum of the values divided by the number of values. The mean marks where a fulcrum would have to sit under a number line to make it balance if we stacked blocks on it, one block for each data point.

Consider five factory workers eating lunch together at a table in the cafeteria. They all have the same job, so their paychecks are similar. Maybe one takes home $2,000, two more take home $2,100, the fourth takes home $2,200, and the fifth, who has the most seniority, takes home $2,600. The arithmetic mean would be $2,200 [(2000+2100+2100+2200+2600)/5].

If we balanced a number line on a fulcrum and placed one block on 2000, two blocks on 2100, and one block each on 2200 and 2600, the number line would balance at 2,200.
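Here is a small Python sketch of the calculation. It uses the five paychecks from the example, computes the mean as the sum divided by the count, and then checks the fulcrum idea. If the fulcrum sits at the mean, the signed distances of the blocks from it cancel out.

```python
# Take-home pay of the five workers from the example, in dollars.
salaries = [2000, 2100, 2100, 2200, 2600]

# Arithmetic mean: the sum of the values divided by the count of the values.
average = sum(salaries) / len(salaries)
print(average)  # 2200.0

# Balance check: with the fulcrum at the mean, the blocks to the left
# (negative distances) exactly offset the blocks to the right.
net_distance = sum(s - average for s in salaries)
print(net_distance)  # 0.0
```

The distances are -200, -100, -100, 0 and +400, so the single block far out at 2600 balances the three blocks just to the left of the fulcrum.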

This is a pretty good measure because all of the data is fairly close together. But what would happen if the third worker won the lottery next week and his income jumped to $11,100? The arithmetic mean would jump to $4,000 [(2000+2100+11100+2200+2600)/5], even though the other four workers still earn no more than $2,600 each. His income is so much higher than the others’ that it skews the average up. In this case the median, which is simply the middle value when the data is sorted ($2,200 here), would better represent what factory workers make.
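To make the contrast concrete, this sketch runs the paychecks from before and after the lottery win through Python's standard `statistics` module. The helper name `summarize` is made up for this illustration.

```python
from statistics import mean, median

before = [2000, 2100, 2100, 2200, 2600]
after = [2000, 2100, 11100, 2200, 2600]  # the third worker's pay jumps to $11,100


def summarize(pay):
    """Hypothetical helper: return (mean, median) for a list of paychecks."""
    return mean(pay), median(pay)


for label, pay in (("before", before), ("after", after)):
    m, med = summarize(pay)
    print(f"{label}: mean = {m:,.0f}, median = {med:,.0f}")
# before: mean = 2,200, median = 2,100
# after: mean = 4,000, median = 2,200
```

A single outlier almost doubles the mean. The median moves by only $100, and only because the lottery winner moved from the middle of the sorted list to its top end.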

Arithmetic means are sensitive to outlying data, and this sensitivity is often exploited. For instance, in a country where the gap between the average person and the rich is very large and growing, the party in power may prefer to describe family income with the arithmetic mean. The super-high incomes of the rich pull the mean up, so it can rise even when the “average” person’s income has actually dropped over the last year.
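The next sketch shows how this can happen. The incomes are made-up numbers chosen only for illustration and do not come from any real country. Four of the five earners lose income from one year to the next and the richest one gains a lot.

```python
from statistics import mean, median

# Invented illustrative incomes (not real data): four typical earners
# and one very rich earner, in two consecutive years.
last_year = [30000, 40000, 50000, 60000, 1000000]
this_year = [28000, 38000, 47000, 58000, 1500000]

for label, incomes in (("last year", last_year), ("this year", this_year)):
    print(f"{label}: mean = {mean(incomes):,.0f}, median = {median(incomes):,.0f}")
# last year: mean = 236,000, median = 50,000
# this year: mean = 334,200, median = 47,000
```

Someone quoting the mean could say that income rose by about 40%, but the typical earner's income, which the median tracks, fell.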

In my opinion, there is no such thing as a bad statistic; there are only bad contexts. A statistic is just a number, and it says what it says. But without the context, for instance when we don’t know whether outliers are skewing the number, the message can be a lie even if the math supports the number.

See you next time…