XL: Regression Analysis and Best Fit Lines (XE0124)Last reviewed: February 2, 1998Article ID: Q103839 |
The information in this article applies to:
SUMMARYThe Application Note "Regression Analysis and Best Fit Lines" (XE0124), discusses how to use Microsoft Excel functions to perform simple, multiple, and polynomial regression analysis. It contains examples of how to use LINEST(), LOGEST(), TREND(), and GROWTH() to describe a best fit line or curve and to make predictions about your data. It also outlines some of the new statistical functions and tools available with Microsoft Excel versions 4.0 and later. This Application Note is available for download from the Microsoft Software Library:
~ XE0124.exe (size: 61612 bytes)You can also obtain this Application Note from the following sources:
ARTICLE-ID: Q119591 TITLE : How to Obtain Microsoft Support Files from Online ServicesIf you are unable to access the source(s) listed above, you can have this Application Note mailed or faxed to you by calling Microsoft Product Support Services Monday through Friday, 6:00 A.M. to 6:00 P.M. Pacific time at (425) 635-7070. If you are outside the United States, contact the Microsoft subsidiary for your area. To locate your subsidiary, see the Microsoft World Wide Offices Web site at:
http://www.microsoft.com/worldwide/default.htm TEXT OF XE0124The following is the full text of the Application Note. (Note that some graphics and formatting may be missing from this version of the document. For best results, download the Microsoft Word version of this document.)
OVERVIEW ========This Application Note discusses how to use Microsoft Excel functions to perform simple, multiple, and polynomial regression analysis. It contains examples of how to use LINEST, LOGEST, TREND, and GROWTH to describe a best-fit line or curve and to make predictions about your data. It also outlines some of the built-in statistical functions and tools available with Microsoft Excel.
General Information To Use the Analysis ToolPak Choosing the Best Function When Your Data Is Linear When Your Data Is Exponential When Your Data Is Curvilinear Simple Regression Analysis Describing a Best-fit Line Finding the Slope and the Y Intercept Calculating a Best-fit Line Using TREND Using LINEST Plotting the Best-fit Line Plotting the Trendline Automatically Predicting Future Values Using FORECAST Using TREND Using LINEST Multiple Regression Analysis Predicting Y-Values Using TREND Using LINEST Polynomial Regression Analysis Calculating a Polynomial Curve Charting a Polynomial Curve Using Regression Statistics Using LINEST/LOGEST for Regression Statistics Using R2 to Test Regression Model Accuracy GENERAL INFORMATIONRegression is a statistical method used to predict values based on relationships in existing data. By analyzing how a single dependent variable (y) is affected by the values of one or more independent variables (x), you can predict what y will be given x. You can use this information to fit a line or a curve to your existing data and to forecast future values. The LINEST, TREND, LOGEST, and GROWTH functions are the primary functions you will use to perform regression analysis in Microsoft Excel. While this Application Note focuses primarily on the functions that can be used in Microsoft Excel versions 3.0 and later, Microsoft Excel versions 4.0 and later offer several new functions and tools that you can use to perform regression analysis and to create best-fit lines. When one of these new functions can be used to perform a task described in this Application Note, the function will be noted in the appropriate section. The following table lists some of these new functions.
Use this To do this function ---------------------------------------------- Return the correlation coefficient CORREL for two arrays of cells Return a single predicted y-value FORECAST based on a linear regression of known x and y ranges Return the y intercept of the linear INTERCEPT regression line Calculate R2, the coefficient of RSQ Determination Return the slope of the linear SLOPE regression line Return the standard error of the STEYX Regression Table 1--Regression Analysis Functions in Microsoft Excel Versions 4.0 and later.In addition, the Analysis ToolPak add-in provides a special set of analysis tools, including tools to accomplish the following tasks.
Use this To do this analysis tool ------------------------------------------------------- Predict a value based on the forecast Exponential for the prior period, adjusted for the Smoothing error in that prior forecast Project values in the forecast period Moving Average based on the average value of the variable over a specific number of preceding periods Perform linear regression analysis and Regression return statistics and plots as specified Table 2--Analysis ToolPak Add-in Features To Use the Analysis ToolPakIn Microsoft Excel 5.0 and later:
In Microsoft Excel 4.0:
CHOOSING THE BEST FUNCTIONWhether you are performing simple regression (one x variable), multiple regression (two or more x variables), or polynomial regression (one x variable raised to different powers), you will get the most accurate results if the function that you choose to regress your data is based on the patterns in your existing data.
When Your Data Is LinearYour data is linear if the rate of change in your data is even to such an extent that when you plot it in a chart, the pattern in your data points resembles a line. If your data is linear, use the linear regression functions, LINEST and TREND. Both functions use the "least squares" method to calculate a straight line that best fits your data. LINEST returns information about the line, such as its slope and y intercept, and TREND returns predicted values along the line. In Microsoft Excel versions 4.0 and later, the Regression tool (in the Analysis ToolPak add-in) performs linear regression, returns regression statistics, calculates best-fit lines, and creates best-fit line charts.
When Your Data Is ExponentialYour data is exponential if the rate of change in your data, when plotted on a chart, resembles a curve that rises or falls at an increasingly higher rate. If your data is exponential, use the logarithmic regression functions, LOGEST and GROWTH. LOGEST calculates an exponential curve that best-fits your data and, like LINEST, returns information about the curve. Like TREND, GROWTH returns predicted values along the curve.
When Your Data Is CurvilinearTo most accurately predict values when the pattern in your data is neither linear nor exponential, use polynomial regression in conjunction with the TREND function to calculate a best-fit curve. For example, use this method if, when you plot your data in a chart, it resembles a curve for which the rate of change is not dramatic or if your data fluctuates in such a way that no linear or curved pattern can be identified.
SIMPLE REGRESSION ANALYSISYour regression analysis is "simple" if you have only one independent x variable for each dependent y variable. For example, assume you are analyzing the sales figures for the first six months of operation for Wingtip Toys, a company that specializes in the design and manufacture of toys.
NOTE: The following examples primarily use the LINEST and TREND functions. Wherever these two functions are discussed, LOGEST and GROWTH can be substituted if your data is exponentially curved and if a curve fit would be more accurate than a straight line.In the following sample data, the values in the Month column are the independent x variables and the values in the Sales column are the dependent y variables. Based on this data, you can describe, calcu late, and plot a best-fit line, and you can then predict future sales figures. Because the data is linear, you will use the LINEST and TREND functions to perform the regression analysis. The Regression tool in Microsoft Excel version 4.0 and later performs each of these tasks automatically. For additional information on calculating regression, see the following references.
Version of Microsoft Excel Reference ------------------------------------------------------------ 97, 98 In Help, search for "Regression, Analysis" 7.0 In Help, search for "Regression" 5.0 In Help, search for "Regression " 4.0 User's Guide 2, pages 41–45Because this tool performs linear regression, if your data resembles an exponential curve, use LOGEST and GROWTH. Following are the sales figures for Wingtip Toys and the corresponding months in both table and chart form.
A B 1 Month Sales 2 1 $4,200 3 2 $6,100 4 3 $7,300 5 4 $7,300 6 5 $8,700 7 6 $10,500 Table 3--Sample Data (Sales Figures for Wingtip Toys) DESCRIBING A BEST-FIT LINEThe equation of a straight line is y=mx+b, where m is the slope and b is the y intercept. LINEST returns the slope (m) and y intercept (b) values that describe the line derived from your existing data. Microsoft Excel versions 4.0 and later provide specific SLOPE and INTERCEPT functions for calculating the slope and the y intercept when your data is linear. For additional information, see the following references.
Version of Microsoft Excel Reference ------------------------------------------------------------ 97, 98 In Help, search for "slope" or "intercept" 7.0 In Help, search for "slope" or "intercept" 5.0 In Help, search for "slope" or "intercept" 4.0 Function Reference Guide, pages 405–406 NOTE: If your data is exponentially curved, use LOGEST to return the slope (m) and y intercept (b) values that describe the curve. The equation used by LOGEST is y=b*m^x. Finding the Slope and the Y InterceptTo calculate the values of the slope (m) and y intercept (b), use the procedure appropriate for your version of Microsoft Excel. Microsoft Excel 4.0 and later:
Microsoft Excel 3.0:
NOTE: LINEST also takes other optional arguments that are not necessary for this example.The result 1122.857, in E2, is the slope, and the result 3420, in cell F2, is where the line crosses the y-axis (y intercept).
E F 1 Slope Y intercept 2 1122.857 3420 Table 4--Example of Slope Intercept Values CALCULATING A BEST-FIT LINEIf your data is linear, use TREND or LINEST to calculate your best-fit line. In Microsoft Excel versions 4.0 and later, you can also use the FORECAST function (forecast is mainly useful for finding a data point based on existing data, but can it also be used as a substitute for the TREND function). If your data fits an exponential curve, use LOGEST or GROWTH.
Using TRENDThe TREND function is the easiest and most efficient function for calculating the points along a best-fit line. To simultaneously calculate all the values on the best-fit line, do the following:
NOTE: TREND also takes other optional arguments that aren't necessary for this example. Using LINESTYou can also use the slope (m) and the y intercept (b) values returned by LINEST to find data points on the best-fit line by substituting the y- values or the x-values into the equation for a line (y=mx+b). By plugging each month number into this formula, you can calculate all the data points for your best-fit line. With the slope(m) value (1122.857) in cell E2 and the intercept (b) value (3420) in cell F2, do the following to generate the points on your best-fit line:
The following table shows the results of the values returned when you use TREND and LINEST.
A B C D E F ----------------------------------------------------------------------- 1 Month Sales Predicted Y Predicted Y Slope Y Intercept TREND LINEST ----------------------------------------------------------------------- 2 1 $4,200 $4,543 $4,543 1122.857 3420 3 2 $6,100 $5,666 $5,666 4 3 $7,300 $6,789 $6,789 5 4 $7,300 $7,911 $7,911 6 5 $8,700 $9,031 $9,031 7 6 $10,500 $10,157 $10,157 Table 5--Results of the Values Returned When You Use TREND and LINEST. NOTE: The returned values for TREND and LINEST are identical. PLOTTING THE BEST-FIT LINEOnce you have calculated the values on your best-fit line, you can add that line to your existing chart by copying cells C1:C7 and pasting them into your existing chart. The resulting chart will have a straight line (best-fit line) running through your original data.
NOTE: When you display the points of a best-fit line against the original data in a chart, in most cases you will get the best results by using an xy (scatter) chart. If you use a line chart, the x-values will be treated as labels rather than as values, and curved lines may result. Plotting the Trendline AutomaticallyIn Microsoft Excel versions 5.0 and later, you can insert a trendline directly into your chart without having to first calculate the points or copy and past those points into the chart. Do the following to automatically insert a trendline in a chart:
PREDICTING FUTURE VALUESIn addition to returning values along the line fitted to your existing data, you can use TREND and LINEST to predict future values. In Microsoft Excel 4.0 and later, you can also use the FORECAST function to predict future values. Using the Wingtip Toys example, suppose you want to calculate sales figures for months 7, 8, and 9. The following examples show how to accomplish this using the FORECAST, TREND, and LINEST functions, respectively. To predict values for months 7, 8, and 9, first enter the month numbers for which you want predicted sales figures and then use the FORECAST function to calculate the values.
Using FORECAST
The resulting values in cells B8:B10 are the predicted sales for the next three months.
A B 1 Month Sales 2 1 $4,200 3 2 $6,100 4 3 $7,300 5 4 $7,300 6 5 $8,700 7 6 $10,500 8 7 $11,280 9 8 $12,403 10 9 $13,526 Table 6--Sample Data Using TREND to Predict Future Values Using TREND
=TREND(B2:B7,A2:A7,A8:A10) NOTE: Because the function returns data to more than one cell, you must enter the formula as an array by pressing CTRL+SHIFT+ENTER in Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel for the Macintosh. The third argument in the TREND function is the array containing the new x-values for which you want to derive predicted y-values. The resulting values in cells B8:B10 are the predicted sales for the next three months.
A B 1 Month Sales 2 1 $4,200 3 2 $6,100 4 3 $7,300 5 4 $7,300 6 5 $8,700 7 6 $10,500 8 7 $11,280 9 8 $12,403 10 9 $13,526 Table 7--Sample Data Using TREND to Predict Future Values NOTE: In Microsoft Excel versions 4.0 and later, you can use the AutoFill feature to predict future values. Using the data in the previous table, if you wanted to predict sales for months 7, 8, and 9, you would select cells B2:B7, select the AutoFill handle in the lower- right corner of the selected area, and drag down three additional cells. (The AutoFill method is by far the easiest method to use for predicting values; however, if you use the formulas, it is easier to tell which values are derived and which values are static). The functions give you more power and flexibility than the AutoFill feature does. CAUTION: In addition to returning predicted values for months 7, 8, and 9, the data in cells B2:B7 will be overwritten with the values that represent the best-fit line. If you do not want your original data to be overwritten, copy it to a separate area on your worksheet and then use AutoFill. Using LINESTTo obtain the new y-values, you can also substitute the slope and y intercept values that you derived with the LINEST function (these results are on page 5) and the new x-values (7, 8, and 9) into the formula, y=mx+b. See Using LINEST in the "Calculating a Best-fit Line" section for step-by- step instructions on how to do this.
MULTIPLE REGRESSION ANALYSIS ============================When you have two or more independent x variables for each y variable, the regression analysis is considered multiple. For example, you could predict a child's weight given his or her age and height. Assume you've collected the following data
A B C 1 Age Height Weight 2 3 32 35 3 5 40 40 4 6 39 43 5 10 50 70 Table 8--Sample Data: Age, Height, and Weight of Childwhere the values under Weight (C2:C5) represent the dependent y variables and the values under Age and Height (A2:B5) represent the independent x variables.
PREDICTING Y-VALUESYou can use either the TREND or the LINEST function to analyze the relationship of the age and height to weight, and you can make predictions based on the results of this analysis. In Microsoft Excel 4.0 and later, the Regression tool can also be used to predict y-values in a multiple regression model.
NOTE: Do not use the FORECAST function because it only works for simple regression.For additional information on predicting values, see the following references.
Version Of Microsoft Excel Reference --------------------------------------------------------------- 97, 98 In Help, search for "Multiple, Regression" 7.0 In Help, search for "Multiple Regression" 5.0 In Help, search for Multiple Regression" 4.0 User's Guide 2, pages 41-45 Using TRENDUsing TREND to Predict a Child's Weight: This example uses the data in Table 8--Sample Data: Age, Height, and Weight of Child. To use TREND to predict the weight of a 9-year-old, 45-inch child, do the following:
Using LINESTTo predict a y-value with LINEST, you must first calculate the slopes for each x variable and find the y intercept. Because a slope is returned for each x variable, when you use the LINEST function, you must first select a range of cells that consist of a single row and a single column plus an additional column for each x variable in your data table. In this example, because you have two x variables, you will need to select a range of three cells, three columns wide by one row tall. Using LINEST to Calculate a Child's Weight: This example uses the data in Table 8--Sample Data: Age, Height, and Weight of Child. To calculate the slopes and the y intercepts, select cells A7:C7 and type the following formula:
=LINEST(C2:C5,A2:B5)NOTE: Because the function returns data to more than one cell, you must enter the formula as an array by pressing CTRL+SHIFT+ENTER in Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel for the Macintosh. The following values will be returned
A B C 7 -0.32 5.98 24where -0.32 is the slope for the second x variable (height), 5.98 is the slope for the first x variable (age), and 24 is the y intercept.
NOTE: The slopes are in reverse order: the first slope value corresponds to the second x variable and the second slope value corresponds to the first x variable. The LINEST function always returns the slopes in reverse order when more than one x variable is involved.You can use the slope values and the y intercept value to make predictions based on your data. Using the formula, y=(m1*x1)+(m2*x2)+(mn*xn)+b, you can predict the weight of a 9-year old, 45-inch child:
=(9*5.98)+(45*-0.32)+24The result of the formula, 63.42, is the predicted weight. Similar to LINEST, the Regression tool in Microsoft Excel versions 4.0 and later returns the slope values and the y intercept value. As described previously, you can plug these values into the formula, y=mx+b, to predict y.
NOTE: Because the x variables are independent, there may not be a good graphical representation for a multiple regression model. Each x-value can be plotted with its corresponding y-value, but the individual lines may be completely unrelated, and, therefore, may be meaningless. POLYNOMIAL REGRESSION ANALYSIS ==============================When your data is neither exponentially curved nor consistently linear, use the polynomial method of regression. When you plot a best- fit curve calculated with polynomial regression, the curve will rise and fall with the data.
CALCULATING A POLYNOMIAL CURVETo calculate a polynomial curve, the dependent y variable is regressed against the independent x variable raised to different powers. To illustrate this process, take the following example. Fitting a straight line to the following data would not accurately predict the sales for any given month.
A B 1 Month Sales 2 1 $4,200 3 2 $1,600 4 3 $5,120 5 4 $4,500 6 5 $5,400 7 6 $1,460 Table 9--Six-Month Sales Figures Wingtip ToysIn this case, you will get the best results by setting up the following polynomial regression model.
A B C D E F 10 X X^2 X^3 X^4 Sales Trend 11 1 1 1 1 $4,200 $4,089 12 2 4 8 16 $1,600 $2,154 13 3 9 27 81 $5,120 $4,011 14 4 16 64 256 $4,500 $5,609 15 5 25 125 625 $5,400 $4,846 16 6 36 216 1296 $1,460 $1,571 Table 10--Polynomial Regression Model for Wingtip Sales FiguresThe values in cells A11:A16 are the month numbers copied from cells A2:A7 of Table 9--Six-Month Sales Figures Wingtip Toys. The values in cells B11:D16 are the original x variables raised to the second, third, and fourth powers, respectively. To obtain these values, do the following:
=TREND(E11:E16,A11:D16) NOTE: Because the function returns data to more than one cell, you must enter the formula as an array by pressing CTRL+SHIFT+ENTER in Microsoft Excel for Windows or COMMAND+RETURN in Microsoft Excel for the Macintosh. CHARTING A POLYNOMIAL CURVETo add the TREND results to this chart, select cells F10:F16, and then copy and paste them into your existing chart.
USING REGRESSION STATISTICS ===========================The LINEST and LOGEST functions can return additional regression statistics that can be helpful in using and evaluating your regression model. If you are using Microsoft Excel 4.0 or later and have linear data, you can use the Regression Tool from the Analysis ToolPak add- in. This tool will automatically return all the regression statistics. If your data resembles an exponential curve, use LOGEST to return accurate regression statistics.
USING LINEST/LOGEST FOR REGRESSION STATISTICSTo return the additional statistics using LINEST or LOGEST, you must select a range that includes five rows and a single column plus an additional column for each x variable in your data. In addition, the stats argument, which is the fourth argument in both of these functions, must be set to TRUE. The following table lists the ages, weights, and heights of a number of children.
A B C 1 Age Height Weight 2 3 32 35 3 5 40 40 4 6 39 43 5 10 50 70 Table 11--Sample Data: Age, Height, and Weight of ChildTo return the additional regression statistics using the data from "Table 11--Sample Data: Age, Height, and Weight of Child" use the following steps:
The resulting data should resemble the data in the following table.
D E F 1 -0.32 5.98 24 2 2.243569 5.647619 57.68449 3 .950813 6.024948 #N/A 4 9.665289 1 #N/A 5 701.7 36.3 #N/A Table 12--Regression StatisticsThe first row of the statistics contains the slope for the height, the slope for the age, and the y intercept. The second row contains the standard error of the slopes and of the y intercept. The third row contains R2 and the standard error for the y estimate. The fourth row contains the F statistic and degrees of freedom. And, the fifth row contains the regression sum of squares and the residual sum of squares.
USING R2 TO TEST REGRESSION MODEL ACCURACYA particularly useful statistic returned is the coefficient of determination called R2. In Microsoft Excel versions 4.0 and later, you can also use the RSQ function to find R2. This R2 indicator ranges in value from 0 to 1 and reveals how closely the estimated y-values correlate to your actual y-values. The closer R2 is to 1, the more perfect the correlation—this correlation indicates that the regression equation is very useful in accurately predicting a y-value. On the other hand, the closer R2 is to 0, the less helpful it will be in predicting a y-value. In the previous example, the value for R2 returned by LINEST is .95, an excellent correlation. This indicates that, based on the collected data, the LINEST model can be used to make extremely accurate predictions of a child's weight given a specific age and height. The disk and software contained on it, including any accompanying documentation (the "Software"), are provided to you at no additional charge. Microsoft Corporation owns all rights, title, and interest in and to the Software. The user assumes the entire risk as to the accuracy and the use of the Software.
###COPYRIGHT NOTICE. Copyright © 1992-1997 Microsoft Corporation. Microsoft and/or its suppliers, One Microsoft Way, Redmond, Washington 98052-6399 U.S.A. All rights reserved. TRADEMARKS. Microsoft, Windows, Windows NT, MSN, The Microsoft Network and/or other Microsoft products referenced herein are either trademarks or registered trademarks of Microsoft. Other product and company names mentioned herein may be the trademarks of their respective owners. The names of companies, products, people, characters and/or data mentioned herein are fictitious and are in no way intended to represent any real individual, company, product or event, unless otherwise noted. NO WARRANTY. THE SOFTWARE IS PROVIDED "AS-IS," WITHOUT WARRANTY OF ANY KIND, AND ANY USE OF THIS SOFTWARE PRODUCT IS AT YOUR OWN RISK. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, MICROSOFT AND ITS SUPPLIERS DISCLAIM ALL WARRANTIES AND CONDITIONS, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NON- INFRINGEMENT, WITH REGARD TO THE SOFTWARE. LIMITATION OF LIABILITY. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL MICROSOFT OR ITS SUPPLIERS BE LIABLE FOR ANY SPECIAL, INCIDENTAL, INDIRECT, OR CONSEQUENTIAL DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR ANY OTHER PECUNIARY LOSS) ARISING OUT OF THE USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF MICROSOFT HAS BEEN ADVISED OF THE POSSIBLITY OF SUCH DAMAGES. BECAUSE SOME STATES AND JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATION MAY NOT APPLY. MICROSOFT’S ENTIRE LIABILITY AND YOUR EXCLUSIVE REMEDY UNDER THIS EULA SHALL NOT EXCEED FIVE DOLLARS (US$5.00). The following conditions also apply to your use of the Software: The Software may be copied and distributed internally only, subject to the following conditions: All text must be copied without modification and all pages must be included; If software is included, all files on the disk(s) |must be copied without modification [the MS-DOS(R) utility diskcopy is appropriate for this purpose]; All components of this Software must be distributed together; and This Software may not be distributed to any third party. If you are not a Microsoft Premier customer, Microsoft shall not provide technical support for this Software. The Software is provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is subject to restrictions set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and (2) of the Commercial Computer Software—Restricted Rights at 48 CFR 52.227- 19, as applicable. Manufacturer is Microsoft Corporation, One Microsoft Way, Redmond, WA 98052-6399. Any transfer of the Software must be accompanied by this statement and may only be transferred if first approved by Microsoft. You agree that you will not export or re-export the Software to any country, person, entity or end user subject to U.S.A. export restrictions, and you are responsible for complying with all applicable U.S. and local export laws in connection with the use of this Software. You warrant and represent that neither the U.S.A. Bureau of Export Administration nor any other federal agency has suspended, revoked or denied you export privileges. This EULA is governed by the laws of the State of Washington, U.S.A.
|
Additional query words: appnote regress stats tool pack toolpack graph
© 1998 Microsoft Corporation. All rights reserved. Terms of Use. |