Excel: Calculating Linear RegressionLast reviewed: November 2, 1994Article ID: Q51026 |
SUMMARYMicrosoft Excel can be very useful when you need to make predictions of future data based on historical data. One of the most common techniques used for making such predictions is that of linear regression.
MORE INFORMATIONFor example, assume you have accumulated eight years' worth of sales figures for a company. These figures are as follows:
| A | B ------------------------ 1 | Year | Quantity Sold 2 | 1 | 1250 3 | 2 | 1540 4 | 3 | 1798 5 | 4 | 3300 6 | 5 | 4000 7 | 6 | 3700 8 | 7 | 3100 9 | 8 | 3484From these numbers, you want to predict the sales for the next years (years 9, 10, and 11). To accomplish this, use the Excel TREND() function. The TREND() function accepts arrays of known y's and known x's, and returns a linear trend of values along the best fit line for these x's and y's. (Note: The cells in column C are formatted as "0.00".)
| A | B | C ------------------------------------- 1 | Year | Quantity Sold | Predicted 2 | 1 | 1250 | 1528.00 3 | 2 | 1540 | 1883.29 4 | 3 | 1798 | 2238.57 5 | 4 | 3300 | 2593.86 6 | 5 | 4000 | 2949.14 7 | 6 | 3700 | 3304.43 8 | 7 | 3100 | 3659.71 9 | 8 | 3484 | 4015.00 10 | 9 | | 4370.29 11 | 10 | | 4725.57 12 | 11 | | 5080.86In cells C2:C12, enter the formula =TREND(B2:B9,A2:A9,A2:A12) as an array (by selecting C2:C12, typing the formula, and pressing COMMAND+ENTER). In this formula, cells B2:B9 are the known y's, A2:A9 are the known x's, and A2:A12 are the new x's (the points on the line for which we want to have y values returned). You can now chart the data to view your results. If you're using Excel 2.20, do the following:
6000.00| | | 5000.00| Predicted ==> ! | ___ / | / /\ % <== Actual 4000.00| / / \/ | |/ | /| 3000.00| / / | / | | / / 2000.00| / _/ |/ _/ ! / 1000.00% | | 0.00| ---------------------- 1 2 3 4 5 6 7 8 9 10 11After predicting the values, you need to determine how accurately the predicted line reflects the actual data. To do this, calculate the coefficient of determination, or the "R squared" value, as follows:
r2=sum(Yest-Yavg)2/sum(Y-Yavg)2To begin with, calculate the average for the y's (Quantity Sold). In cell B15, enter the formula =AVERAGE(B2:B9) and receive the result of 2771.50. The worksheet now looks like the following:
| A | B | C -------------------------------------- 1 | Year | Quantity Sold | Predicted 2 | 1 | 1250 | 1528.00 3 | 2 | 1540 | 1883.29 4 | 3 | 1798 | 2238.57 5 | 4 | 3300 | 2593.86 6 | 5 | 4000 | 2949.14 7 | 6 | 3700 | 3304.43 8 | 7 | 3100 | 3659.71 9 | 8 | 3484 | 4015.00 10 | 9 | | 4370.29 11 | 10 | | 4725.57 12 | 11 | | 5080.86 13 | | | 14 | | | 15 | Yavg | | 2771.5Next, establish two columns to represent the two parts of your equation. In column D, enter the formula for (Yest-Yavg)2, and in column E, enter the formula for (Y-Yavg)2. In cell D2, enter the formula =(C2-$B$15)^2 and then fill the formula through cell D9. In cell E2, enter the formula =(B2-$B$15)^2 and then fill this formula down through cell E9. Now, calculate the coefficient of determination in cell E15 by entering the formula =SUM(D2:D9)/SUM(E2:E9). (Note: Columns C, D, and E are formatted as "0.00", and cell E15 is formatted as "General".)
| | A | B | C | D | E | | 1 | Year | Quantity Sold | Predicted | (Yest-Yavg)^2 | (Y-Yavg)^2 | | 2 | 1 | 1250 | 1528.00 | 1546292.25 | 2314962.25 | | 3 | 2 | 1540 | 1883.29 | 788924.62 | 1516592.25 | | 4 | 3 | 1798 | 2238.57 | 284012.86 | 947702.25 | | 5 | 4 | 3300 | 2593.86 | 31556.98 | 279312.25 | | 6 | 5 | 4000 | 2949.14 | 31556.98 | 1509212.25 | | 7 | 6 | 3700 | 3304.43 | 284012.86 | 862112.25 | | 8 | 7 | 3100 | 3659.71 | 788924.62 | 107912.25 | | 9 | 8 | 3484 | 4015.00 | 1546292.25 | 507656.25 | |10 | 9 | | 4370.29 | | | |11 | 10 | | 4725.57 | | | |12 | 11 | | 5080.86 | | | |13 | | | | | | |14 | | | | | | |15 | Yavg | 2771.5 | | | 0.658952019| The closer the R squared value comes to 1, the better the fit of data points to the predicted values. In this case, the R squared value returned is .65895, which represents a fairly good fit, but also indicates that our predicted values are not very reliable.
|
KBCategory: kbother
© 1998 Microsoft Corporation. All rights reserved. Terms of Use. |