Averages help us understand the 'middle' of a set of numbers, but they have another purpose too: to estimate the value we could expect to see if we selected a single value at random. There are five commonly used 'averages':
- mode: the most commonly-occurring value
- median: the middle value, when all values are ordered
- mean: a calculated value, summing all observed values and dividing by the number of values
- weighted average: a mean where some values are given greater weight than others
- moving average: a mean where only the last n values are included in the calculation.
Modes
Modes are typically used when the data is categorical. For example, when analysing the data from a survey we might have responses to a Gender question of Male (78) and Female (64). The mode in this case would be Male.
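As a minimal sketch, the mode of categorical responses can be found by tallying each category and taking the most frequent one. The raw `responses` list here is hypothetical, rebuilt from the counts quoted above; Python's standard `collections.Counter` does the tallying.

```python
from collections import Counter

# Hypothetical raw survey responses, rebuilt from the counts in the text:
# 78 "Male" and 64 "Female".
responses = ["Male"] * 78 + ["Female"] * 64

# Counter tallies each category; most_common(1) returns the top (value, count) pair.
counts = Counter(responses)
mode_value, mode_count = counts.most_common(1)[0]

print(f"Mode: {mode_value} ({mode_count} of {len(responses)} responses)")
```

If two categories tie, `most_common` returns the one encountered first, so with tied data it may be worth checking `statistics.multimode` (Python 3.8+), which returns every most-frequent value.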
We might also have a situation where it isn't meaningful to report a figure that isn't a whole number, which could easily happen with any of the other types of average. Say, for example, our data records the number of tertiary qualifications held by UX practitioners. Our responses range from 0 upwards. Let's say we have the following table of responses.
Qualifications | Respondents |
0 | 7 |
1 | 26 |
2 | 23 |
3 | 9 |
4 | 1 |
n | 66 |
Now, we could calculate a mean, or a median for that matter, but it doesn't make sense to report that UX practitioners have, on average, 1.560606 tertiary qualifications. Either you have a qualification or you don't. In this case, the mode, 1 qualification, slightly under-represents what we would expect to hear if we asked a randomly chosen UX practitioner the question.
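A small sketch of the comparison above, assuming the frequency table is held as a plain dictionary mapping number of qualifications to number of respondents. It finds the mode as the value with the largest count and computes the mean for contrast.

```python
# Frequency table from the text: number of qualifications -> number of respondents.
freq = {0: 7, 1: 26, 2: 23, 3: 9, 4: 1}

n = sum(freq.values())                     # total respondents
mode = max(freq, key=freq.get)             # value with the highest count
mean = sum(q * count for q, count in freq.items()) / n

print(f"n = {n}, mode = {mode}, mean = {mean:.6f}")
```

The mean works out to 103/66, which is why it lands on a fractional value while the mode stays a whole number. Note that `max` with `key=freq.get` returns only the first of any tied values.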
Medians
The term median means, literally, middle. To find the median value, we rank all of the observed values in order and select the value that falls in the middle of the ordered list. In the above example, this would look something like: 000000011111111111111111111111111222222222222222222222223333333334. With 66 values there is no single middle value: the middle falls between the 33rd and 34th values, which are 1 and 2, so our median value is half-way between these, i.e. 1.5.
Now, as mentioned above, this doesn't make practical sense, but it does illustrate the concept of medians. It also illustrates the need to take care when calculating an average!
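The ranking step can be sketched in a few lines, assuming the same frequency table as above. The table is expanded back into the ordered list of individual answers, and the standard `statistics.median` function then averages the two middle values because the count is even.

```python
import statistics

freq = {0: 7, 1: 26, 2: 23, 3: 9, 4: 1}

# Expand the frequency table back into the ordered list of individual answers.
values = sorted(q for q, count in freq.items() for _ in range(count))

# With an even number of values, statistics.median averages the two middle ones.
median = statistics.median(values)

# The two middle values themselves (the 33rd and 34th in 1-based counting).
middle_pair = values[len(values) // 2 - 1], values[len(values) // 2]

print(median, middle_pair)
```

Running this prints `1.5 (1, 2)`, matching the hand calculation.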
Means
The mean is what most people think of when they hear the term 'average'. It is also called the 'arithmetic mean' and is calculated by adding up all of the observations and dividing by the number of observations.
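In symbols, for n observations x_1 to x_n the mean is the first line below. When the data arrives as a frequency table, as with the qualifications example, the same mean can be computed from each distinct value v_k and its frequency f_k (this notation is ours, introduced for illustration):

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\bar{x} &= \frac{\sum_k f_k v_k}{\sum_k f_k}
        = \frac{0\cdot 7 + 1\cdot 26 + 2\cdot 23 + 3\cdot 9 + 4\cdot 1}{66}
        = \frac{103}{66} \approx 1.5606
\end{aligned}
```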
A mean is very useful for characterizing the expected value of a collection of observations. It can accommodate the most common form of measured data: continuous data. For example, we can calculate a mean time-to-completion for a usability task, and it will provide a meaningful value whatever the result. In our previous article on time-to-completion data, we calculated a mean figure of 143.8725s.
Weighted Average
There are occasions when we need to give more weight to one set of observations than another. An example might be using a Web site's page view data to predict tomorrow's traffic. Many web sites are cyclical, with regular peaks and troughs in their visitor numbers. So in predicting tomorrow's traffic, today's numbers may be less informative than the numbers for the same day a week ago.
Let's say our traffic over the past week looked a little like this:
Day | Page Views |
Monday | 12,358 |
Tuesday | 14,122 |
Wednesday | 14,823 |
Thursday | 13,905 |
Friday | 13,733 |
Saturday | 11,064 |
Sunday | 8,899 |
A straight mean calculation places as much importance on last Monday's traffic as it does on yesterday's. However, from a forecasting perspective, last Monday is likely to be a better indicator for this Monday, so we can give it more weight, like so:
Day | Page Views | Weighting | Weighted page views |
Monday | 12358 | 4 | 49432 |
Tuesday | 14122 | 1 | 14122 |
Wednesday | 14823 | 1 | 14823 |
Thursday | 13905 | 1 | 13905 |
Friday | 13733 | 1 | 13733 |
Saturday | 11064 | 1 | 11064 |
Sunday | 8899 | 1 | 8899 |
Total | 88904 | 10 | 125978 |
Weighted average (125978 ÷ 10) | | | 12597.8 |
We've increased the influence that last Monday's observation will have on the predicted value by giving it a weighting factor of 4. In doing so we increase the overall number of 'observations' from 7 to 10. Our weighted average is 12,597.8, as a forecast for this Monday's traffic. This compares with a straight mean of 12,700.57, so our weighting gives a lower forecast, pulled towards last Monday's figure of 12,358.
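The weighted average is the sum of each value multiplied by its weight, divided by the sum of the weights. A minimal sketch, using last week's figures and the weights from the table above; the `weighted_mean` helper is a hypothetical name written for illustration.

```python
# Last week's page views (from the table above) with the weights used in the text:
# last Monday gets a weight of 4, every other day a weight of 1.
page_views = {
    "Monday": 12358, "Tuesday": 14122, "Wednesday": 14823, "Thursday": 13905,
    "Friday": 13733, "Saturday": 11064, "Sunday": 8899,
}
weights = {day: 1 for day in page_views}
weights["Monday"] = 4

def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    total_weight = sum(weights[k] for k in values)
    return sum(values[k] * weights[k] for k in values) / total_weight

forecast = weighted_mean(page_views, weights)
straight_mean = sum(page_views.values()) / len(page_views)

print(f"weighted: {forecast:.1f}, straight: {straight_mean:.2f}")
```

This prints `weighted: 12597.8, straight: 12700.57`. If NumPy is available, `numpy.average(values, weights=...)` computes the same weighted mean from two parallel sequences.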
Moving Average
The last type of average helps us to deal with time series data - observations made over a period on a regular basis. It recognises that when calculating an expected value, the most recent observations are likely to be better predictors than data going back to the earliest observations made. This is particularly true of something like Web site traffic data, where the overall size of the pool of potential visitors is increasing, so we would expect the overall traffic to be increasing also.
Moving averages are used frequently in economics and finance, particularly with share prices, where high volatility can make older historical data a poor guide to current values.
A moving average is usually calculated on an ongoing basis using the last n observations. Examples might be a 5-day or a 20-day moving average as part of our analysis. So let's say we track our page view data (from above) for three weeks. The first 5-day average can be calculated once we have five days of data, and it is then recalculated each day:
Day | Page Views | 5-day average |
Monday | 12,358 | |
Tuesday | 14,122 | |
Wednesday | 14,823 | |
Thursday | 13,905 | |
Friday | 13,733 | 13,788.2 |
Saturday | 11,064 | 13,529.4 |
Sunday | 8,899 | 12,484.8 |
Monday | 12,589 | 12,038.0 |
Tuesday | 14,222 | 12,101.4 |
Wednesday | 14,813 | 12,317.4 |
Thursday | 14,099 | 12,924.4 |
Friday | 14,011 | 13,946.8 |
Saturday | 10,781 | 13,585.2 |
Sunday | 9,203 | 12,581.4 |
Monday | 12,993 | 12,217.4 |
Tuesday | 14,330 | 12,263.6 |
Wednesday | 15,198 | 12,501.0 |
Thursday | 14,078 | 13,160.4 |
Friday | 14,215 | 14,162.8 |
Saturday | 11,144 | 13,793.0 |
Sunday | 9,126 | 12,752.2 |
You can see that the average changes each day: the oldest observation drops out of the five-day window and the newest one comes in. That's the point.
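A minimal sketch of a simple moving average over the three weeks of page views in the table. It keeps the most recent `window` values in a fixed-length `collections.deque`, so the oldest value drops out automatically as each new day arrives; the `moving_average` function name is our own, chosen for illustration.

```python
from collections import deque

# Three weeks of daily page views, Monday to Sunday, copied from the table above.
page_views = [
    12358, 14122, 14823, 13905, 13733, 11064, 8899,
    12589, 14222, 14813, 14099, 14011, 10781, 9203,
    12993, 14330, 15198, 14078, 14215, 11144, 9126,
]

def moving_average(values, window):
    """Simple moving average: yields None until `window` values have been seen,
    then the mean of the most recent `window` values."""
    recent = deque(maxlen=window)  # oldest value drops out automatically
    for v in values:
        recent.append(v)
        yield sum(recent) / window if len(recent) == window else None

averages = list(moving_average(page_views, 5))
for views, avg in zip(page_views, averages):
    print(views, "" if avg is None else f"{avg:.1f}")
```

The printed averages reproduce the 5-day column above, starting at 13788.2 on the first Friday. For long series or wide windows, a running total (add the new value, subtract the one leaving the window) avoids re-summing the whole window each day.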
Different types of 'average' are useful in different circumstances: something that we'll touch on in future articles in the series.