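The correlation coefficient, r, for a set of (x, y) data pairs is calculated as follows:

$$ r = \frac{\sum (x - x^*)(y - y^*)}{n \, s_x \, s_y} $$

where x* and y* are the average values of x and y, sx and sy are their standard deviations, and n is the number of (x, y) pairs. The calculation can be broken down into the following steps: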
- Calculate the average values of x (x*) and y (y*);
- For each row, calculate (x - x*) and (y - y*);
- For each row, calculate (x - x*)², (y - y*)², and (x - x*)(y - y*);
- Add up the values in each column, and store the totals;
- For both the x and y values, calculate the standard deviation, sx and sy, by dividing the totals for (x - x*)² and (y - y*)² by the number of rows and taking the square root of each result;
- Calculate r by dividing the total for the (x - x*)(y - y*) column by (n × sx × sy), where n is the number of rows in the table (i.e. the number of (x, y) pairs).
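The steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than a definitive implementation: the function name `pearson_r` and the sample data are hypothetical, and the standard deviations are the population form (dividing by the number of rows), as the steps describe.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r, computed step by step (hypothetical helper)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")

    # Step 1: average values x* and y*
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Step 2: per-row deviations (x - x*) and (y - y*)
    dx = [x - x_mean for x in xs]
    dy = [y - y_mean for y in ys]

    # Steps 3 and 4: squares and products per row, then the column totals
    total_dx2 = sum(d * d for d in dx)
    total_dy2 = sum(d * d for d in dy)
    total_dxdy = sum(a * b for a, b in zip(dx, dy))

    # Step 5: standard deviations (total divided by n, then square root)
    sx = math.sqrt(total_dx2 / n)
    sy = math.sqrt(total_dy2 / n)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")

    # Step 6: r = total of (x - x*)(y - y*) divided by (n * sx * sy)
    return total_dxdy / (n * sx * sy)

# Made-up sample data, for illustration only
print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
```

The guard against a zero standard deviation is an addition not mentioned in the steps: if every x (or every y) is identical, the division in the last step has no defined result.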
The following table should help to illustrate the calculation:
| | x | y | (x-x*) | (x-x*)² | (y-y*) | (y-y*)² | (x-x*)(y-y*) |
|---|---|---|---|---|---|---|---|
| | 1 | 8.56 | -4 | 16 | 1.865556 | 3.480297531 | -7.46222222 |
| | 2 | 8.23 | -3 | 9 | 1.535556 | 2.357930864 | -4.60666667 |
| | 3 | 7.62 | -2 | 4 | 0.925556 | 0.856653086 | -1.85111111 |
| | 4 | 7.12 | -1 | 1 | 0.425556 | 0.181097531 | -0.42555556 |
| | 5 | 6.99 | 0 | 0 | 0.295556 | 0.087353086 | 0 |
| | 6 | 7.05 | 1 | 1 | 0.355556 | 0.126419753 | 0.355555556 |
| | 7 | 4.98 | 2 | 4 | -1.71444 | 2.939319753 | -3.42888889 |
| | 8 | 5.37 | 3 | 9 | -1.32444 | 1.754153086 | -3.97333333 |
| | 9 | 4.33 | 4 | 16 | -2.36444 | 5.590597531 | -9.45777778 |
| mean x* | 5 | | Total | 60 | | 17.37382222 | -30.85 |
| mean y* | | 6.694444 | std dev | 2.581989 | | 1.38939724 | |
| | | | | | | r = | -0.9555 |
In the above table, the x values represent the number of guests, and the y values represent the conversion rate as a percentage. The columns headed (x-x*)² and (y-y*)² are used to calculate the standard deviations of x and y, sx and sy. The values in the last column are then totaled, giving the numerator (the upper value of the fraction) in the equation for r.
For the above example, r is calculated as:

$$ r = \frac{-30.85}{9 \times 2.581989 \times 1.38939724} \approx -0.9555 $$
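As a rough check of the figures in the table, the same arithmetic can be reproduced in a few lines of Python. The x and y values are copied from the table; the variable names are illustrative only, and the standard deviations divide by n, as in the table.

```python
import math

# x: number of guests, y: conversion rate (%), copied from the table
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [8.56, 8.23, 7.62, 7.12, 6.99, 7.05, 4.98, 5.37, 4.33]
n = len(xs)

# Averages x* and y*
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Column totals
total_dx2 = sum((x - x_mean) ** 2 for x in xs)
total_dy2 = sum((y - y_mean) ** 2 for y in ys)
total_dxdy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

# Standard deviations (dividing by n) and the correlation coefficient
sx = math.sqrt(total_dx2 / n)
sy = math.sqrt(total_dy2 / n)
r = total_dxdy / (n * sx * sy)

print(f"totals: {total_dx2:.2f}, {total_dy2:.4f}, {total_dxdy:.2f}")
# totals: 60.00, 17.3738, -30.85
print(f"sx = {sx:.4f}, sy = {sy:.4f}, r = {r:.4f}")
# sx = 2.5820, sy = 1.3894, r = -0.9555
```

The result agrees with the table to four decimal places: a strong negative correlation between the number of guests and the conversion rate in this data set.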
The correlation coefficient helps determine whether a relationship exists between the variables. A strong correlation does not by itself indicate a causal relationship in the data, although causally related variables often show strong correlation.
Note: the value of the correlation coefficient r can range from -1 to 1. A value towards either end of the range indicates a strong correlation between the variables (negative towards -1, positive towards 1); values close to 0 indicate very little or no correlation.
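To illustrate the range, here is a small sketch using three made-up data sets: one that rises in a straight line, one that falls in a straight line, and one that follows a symmetric curve. The helper `pearson_r` is a hypothetical name, and it uses an algebraically equivalent form of the formula in which n × sx × sy is written as the square root of the product of the two squared-deviation totals.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r (hypothetical helper, illustration only)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    total_dxdy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    total_dx2 = sum((x - x_mean) ** 2 for x in xs)
    total_dy2 = sum((y - y_mean) ** 2 for y in ys)
    # n * sx * sy equals sqrt(total_dx2 * total_dy2) when sx and sy divide by n
    return total_dxdy / math.sqrt(total_dx2 * total_dy2)

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # straight line sloping upwards
falling = [-3 * x for x in xs]     # straight line sloping downwards
curved = [x * x for x in xs]       # symmetric curve, no overall linear trend

print(pearson_r(xs, rising))   # at the top of the range (1)
print(pearson_r(xs, falling))  # at the bottom of the range (-1)
print(pearson_r(xs, curved))   # in the middle of the range (0)
```

The third case is worth noting: y is completely determined by x, yet r comes out at 0, because r only measures how closely the points follow a straight line.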