How to Calculate the Correlation Coefficient in ExcelLast reviewed: November 4, 1994Article ID: Q66403 |
SUMMARYThe formula for the correlation coefficient is as follows: {=SQRT(SUM((TREND(Y's,X's)-AVERAGE(Y's))^2/SUM((Y's-AVERAGE(Y's))^2))} This MUST be entered as an array formula. The example assumes the following:
Known-Ys Known-Xs -------- -------- A1: 3100 B1: 3000 A2: 4500 B2: 4000 A3: 4400 B3: 5000 A4: 5400 B4: 6000 A5: 7500 B5: 7000 A6: 8100 B6: 8000{=SQRT(SUM((TREND(A1:A6,B1-B6)-AVERAGE(A1:A6)) ^2/SUM((A1:A6-AVERAGE(A1:A6))^2)))} In this case, the known "Y" values are close to the estimated "Y" values. Applying the above formula for the correlation coefficient returns a value of .9335. This value is quite close to +1, indicating that the TREND function returns Y values that are good predictions.
MORE INFORMATIONYou may want to know how closely the TREND function predicts new Ys according to your actual data. The correlation coefficient is a measure used primarily in psychological and educational research and helps to predict the "error" of the New Ys. The correlation coefficient is plus or minus the square root of the coefficient of determination. The correlation coefficient gives an approximation of the error of the Known-Ys in relation to the New-Ys (which are returned by the TREND function). The correlation coefficient is a number between -1 and +1. A value close to 0 means that the residuals (distance between the Known-Ys and New-Ys) are quite large, and that the plotted points are relatively far from the regression line. In other words, the New-Ys do not predict the Known-Ys very well, and the model (best-fit line) is a failure. Conversely, if the coefficient is close to +1, the residuals are relatively small and the plotted points are close to the regression line. In such a situation, the model is a relatively good predictor of the dependent variable (Known-Ys).
|
Additional reference words: 2.00 2.01 2.10 2.10c 2.10d 3.00 2.20 2.21
© 1998 Microsoft Corporation. All rights reserved. Terms of Use. |