Saturday, April 5, 2008

Cause & correlation

There is a common misconception in quantitative analysis, and in the general media, that because two things are strongly correlated (either positively or negatively) there must be a causal relationship between them. That is, that changes in one thing are causing changes in the other.

A causal relationship is something like: I stub my toe, and my toe hurts. Or: a lightning strike starts a bush-fire. Correlation on its own is a weaker thing. For example, it has been shown that the incidence of crime is higher during hot weather, but it would not be true to say that hot weather causes crime.

We can measure the extent to which two variables are correlated using Pearson's correlation co-efficient. It takes paired measurements of two variables and measures how closely they move together along a straight line. The result ranges from -1 (a perfect negative relationship) through 0 (no linear relationship) to +1 (a perfect positive relationship). For example, we might record the daily temperature and the incidence of crime (ignoring the possibility of reporting errors that might arise from the different weather conditions).

Plotting these on a graph might illustrate the presence of a relationship between these two variables. The correlation co-efficient quantifies the strength of this relationship.
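As a minimal sketch of what the co-efficient actually computes, here is Pearson's r written out in plain Python. The function name `pearson_r` and the temperature and crime figures are hypothetical and made up purely for illustration; they are not real measurements.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation co-efficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how far each pair sits from its means, multiplied together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical, made-up numbers for illustration only:
temps = [18, 22, 25, 30, 34]   # daily maximum temperature (deg C)
crimes = [40, 44, 47, 55, 60]  # incidents reported on the same day
print(round(pearson_r(temps, crimes), 3))
```

Note that nothing in the formula refers to which variable comes "first": r is symmetric in its two inputs, so it cannot tell you the direction of any influence, let alone whether there is one. On Python 3.10 and later, `statistics.correlation` computes the same value.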

However, causality comes in varying degrees, and none of them can be established by the correlation co-efficient alone.

Firstly, a variable is a necessary condition if it must be present for the other condition (the outcome) to occur. An outcome may have several necessary conditions; in the absence of any one of them, the outcome does not occur.

Alternatively, a variable is a sufficient condition if it is enough on its own to trigger the outcome. For example, a lightning strike is sufficient to start a bush-fire, although it is hardly necessary: bush-fires can be started in any number of ways.

The strongest causal relationship exists when a condition is both necessary and sufficient to produce an outcome. To cite a trivial example, being shot with a gun is both necessary and sufficient for a gunshot wound.
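The two definitions can be stated as checks over a set of observations. The sketch below assumes each observation is a hypothetical `(condition_present, outcome_occurred)` pair, and the bush-fire records are made up for illustration. Keep in mind that these checks only describe the records you happen to have: one counter-example is enough to rule out necessity or sufficiency, but passing the check does not prove either.

```python
def is_necessary(observations):
    """True if the outcome never occurred without the condition.

    observations: a list of (condition_present, outcome_occurred) booleans.
    """
    return all(cond for cond, outcome in observations if outcome)

def is_sufficient(observations):
    """True if the outcome occurred every time the condition was present.

    Vacuously True if the condition never appears in the data.
    """
    return all(outcome for cond, outcome in observations if cond)

# Hypothetical records: (lightning strike?, bush-fire started?)
fires = [
    (True, True),    # lightning struck and a fire started
    (False, True),   # no lightning, but a fire started anyway
    (False, False),  # no lightning, no fire
]
print(is_necessary(fires))   # False: the second record is a counter-example
print(is_sufficient(fires))  # True, but only for these three records
```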

In our user experience work, we often come across circumstances where we record an outcome and look for conditions to help explain it. For example, moving an ad banner might coincide with an increase in the click-through rate. As humans, we naturally attribute the increase to the new position. But on its own, it is only a coincidence. If it happens consistently, we might suspect that something deeper is going on.

However, it is important to understand that no quantitative test can prove a causal relationship between an event and an outcome. You can establish that a causal relationship is highly likely, and you can search for a counter-example that rules one out, but you cannot prove that one exists.
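One common way a strong correlation arises without any causal link between two variables is a hidden common cause. The simulation below is a sketch of that situation, not a model of any real data: a hypothetical hidden variable `z` drives both `x` and `y`, and neither `x` nor `y` influences the other. The sample size, noise level and seed are arbitrary assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's correlation co-efficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulate_common_cause(n=1000, seed=1):
    """Generate x and y that both depend on a hidden z, but not on each other."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.uniform(0, 10)          # hidden common cause
        xs.append(z + rng.gauss(0, 1))  # x = z + its own independent noise
        ys.append(z + rng.gauss(0, 1))  # y = z + its own independent noise
    return xs, ys

xs, ys = simulate_common_cause()
print(f"r(x, y) = {pearson_r(xs, ys):.2f}")
```

The co-efficient comes out high (roughly 0.9 with these settings), yet changing `x` would do nothing to `y`. Looking only at `x` and `y`, the number is indistinguishable from one produced by a genuine cause.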

So, be skeptical when you hear people quote a correlation co-efficient and then start to talk about cause.

[For a philosophical discussion of causality, Wikipedia offers a good starting point.]