Thursday, January 17, 2008

Calculating correlation co-efficients

The correlation co-efficient for a set of data pairs measures the strength and direction (positive or negative) of the linear relationship between two variables. The most commonly used is the Pearson correlation co-efficient, which is closely tied to the least squares method: it reflects how tightly the data pairs cluster around a theoretical straight-line (linear) relationship.

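The correlation co-efficient, r, for a set of n (x,y) data pairs is calculated as follows:

$$ r = \frac{\sum (x - x^{*})(y - y^{*})}{n \, s_x \, s_y} $$

where x* and y* are the mean values of x and y, and sx and sy are their standard deviations.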


The calculation can be broken down into the following steps:
  1. Calculate the average values of both x (x*) and y (y*).
  2. For each row, calculate (x - x*) and (y - y*).
  3. For each row, calculate (x - x*)², (y - y*)² and (x - x*)(y - y*).
  4. Add up the values in each of these three columns, and store the totals.
  5. For both x and y, calculate the standard deviation, sx and sy: divide the totals for (x - x*)² and (y - y*)² by the number of rows, and take the square root of each result.
  6. Calculate r by dividing the total of the (x - x*)(y - y*) column by (n × sx × sy), where n is the number of rows in the table (i.e. the number of x,y pairs).
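The steps above can be sketched in Python roughly as follows. This is only an illustration: the function name `pearson_r` and its variable names are made up for this example, and the standard deviations divide by n (the number of rows), as described in step 5.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation co-efficient, following the six steps in order."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)

    # Step 1: average values x* and y*
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Step 2: per-row differences from the averages
    dx = [x - x_mean for x in xs]
    dy = [y - y_mean for y in ys]

    # Steps 3 and 4: per-row squares and products, then the column totals
    total_dx2 = sum(d * d for d in dx)
    total_dy2 = sum(d * d for d in dy)
    total_dxdy = sum(a * b for a, b in zip(dx, dy))

    # Step 5: standard deviations (divide by n, then take the square root)
    sx = sqrt(total_dx2 / n)
    sy = sqrt(total_dy2 / n)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when x or y never varies")

    # Step 6: total of the products divided by (n * sx * sy)
    return total_dxdy / (n * sx * sy)


# Example call with a few made-up pairs
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
```

The check for a zero standard deviation is an addition of this sketch: if either variable never changes, step 6 would divide by zero.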

The following table should help to illustrate the calculation:


| x | y | (x-x*) | (x-x*)² | (y-y*) | (y-y*)² | (x-x*)(y-y*) |
|---|---|---|---|---|---|---|
| 1 | 8.56 | -4 | 16 | 1.865556 | 3.480297531 | -7.46222222 |
| 2 | 8.23 | -3 | 9 | 1.535556 | 2.357930864 | -4.60666667 |
| 3 | 7.62 | -2 | 4 | 0.925556 | 0.856653086 | -1.85111111 |
| 4 | 7.12 | -1 | 1 | 0.425556 | 0.181097531 | -0.42555556 |
| 5 | 6.99 | 0 | 0 | 0.295556 | 0.087353086 | 0 |
| 6 | 7.05 | 1 | 1 | 0.355556 | 0.126419753 | 0.355555556 |
| 7 | 4.98 | 2 | 4 | -1.71444 | 2.939319753 | -3.42888889 |
| 8 | 5.37 | 3 | 9 | -1.32444 | 1.754153086 | -3.97333333 |
| 9 | 4.33 | 4 | 16 | -2.36444 | 5.590597531 | -9.45777778 |
| Total | | | 60 | | 17.37382222 | -30.85 |

mean x* = 5

mean y* = 6.694444

std dev: sx = 2.581989, sy = 1.38939724

r = -0.9555



In the above table, the x values represent the number of guests, and the y values represent the conversion rate as a percentage. The columns headed (x-x*)² and (y-y*)² are used to calculate the standard deviations of x and y, sx and sy. Once the last column, (x-x*)(y-y*), has been calculated, its values are totaled to give the numerator (the upper value of the fraction) in the equation for r.

For the above example, r is calculated as:

$$ r = \frac{-30.85}{9 \times 2.581989 \times 1.38939724} \approx -0.9555 $$

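As a cross-check, the arithmetic for the example can be reproduced with a short Python script. The x and y values are copied from the table; the variable names are chosen only for this sketch.

```python
from math import sqrt

# Data pairs from the table: number of guests (x), conversion rate in % (y)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [8.56, 8.23, 7.62, 7.12, 6.99, 7.05, 4.98, 5.37, 4.33]
n = len(x)

# Averages x* and y*
x_mean = sum(x) / n
y_mean = sum(y) / n

# Column totals for (x-x*)^2, (y-y*)^2 and (x-x*)(y-y*)
total_dx2 = sum((xi - x_mean) ** 2 for xi in x)
total_dy2 = sum((yi - y_mean) ** 2 for yi in y)
total_dxdy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

# Standard deviations: divide each total by n, then take the square root
sx = sqrt(total_dx2 / n)
sy = sqrt(total_dy2 / n)

# Correlation co-efficient
r = total_dxdy / (n * sx * sy)

print("totals:", round(total_dx2, 4), round(total_dy2, 4), round(total_dxdy, 4))
print("sx, sy:", round(sx, 6), round(sy, 6))
print("r:", round(r, 4))
```

The printed totals, standard deviations and r should agree with the table to within rounding.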

The correlation co-efficient helps to determine whether or not a linear relationship exists between the variables. A strong correlation does not by itself indicate a causal relationship in the data, although a causal relationship that is roughly linear will generally show a strong correlation.

Note: the value for the correlation co-efficient r can range from -1 to 1. A value towards either end of the range indicates a strong correlation between the variables; values close to 0 indicate very little or no correlation.
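The range of r can be illustrated with three small made-up data sets: one rising in a perfect straight line, one falling in a perfect straight line, and one following a symmetric curve. The helper `pearson_r` below is a compact, hypothetical restatement of the calculation described in this post, and the data is invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Compact version of the calculation: sum of products / (n * sx * sy)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    total_dxdy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - x_mean) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - y_mean) ** 2 for y in ys) / n)
    return total_dxdy / (n * sx * sy)


xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # perfect straight line, upward slope
falling = [-3 * x + 4 for x in xs]  # perfect straight line, downward slope
curved = [x * x for x in xs]        # symmetric curve, no linear trend

r_rising = pearson_r(xs, rising)    # at the top of the range (about +1)
r_falling = pearson_r(xs, falling)  # at the bottom of the range (about -1)
r_curved = pearson_r(xs, curved)    # in the middle of the range (0)

print(r_rising, r_falling, r_curved)
```

The curved data set is worth noting: y depends entirely on x, yet r comes out as 0, because r only measures the linear part of a relationship.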

1 comment:

BasiaBernstein said...

Correlation is computed into what is known as the correlation coefficient, which ranges between -1 and +1. Perfect positive correlation (a correlation co-efficient of +1) implies that as one security moves, either up or down, the other security will move in lockstep, in the same direction.