This Document Serves As a Reference Guide (Or Tool) for Using the Author's Regression Analysis


1. Introduction

This document serves as a reference guide (or tool) for using the author’s “Regression Analysis —Instructional Resource.” As explained in the companion journal article, the resource is divided into two primary parts:

 Part One deals with the use of regression analysis to estimate simple linear cost functions, the use of Excel for estimating these functions, interpretation of regression-related output associated with cost estimation, and alternatives for estimating costs based on a regression model fit to a set of data. This portion of the resource consists of the following four files:

(1) a set of PowerPoint slides (“Estimating Linear Cost Functions”) that provides an overview of simple (one-variable) cost functions and OLS regression analysis;

(2) an Excel file (“Estimating Linear Cost Functions Using Excel”) that discusses five Excel-based methods that can be used to estimate a simple linear cost function;

(3) a Word file (“Cost Estimation and Statistical Issues—Regression Analysis”) that addresses three separate cost-estimation and statistical issues (five options in Excel for generating cost estimates after a regression analysis has been performed; an analysis of changes in the standard error of the regression, SE, as sample size, n, changes; and, constructing confidence intervals around point estimates); and,

(4) an Excel file (“Change in SE as n increases”) that can be used in conjunction with item (3) above.

 Part Two deals with estimating one form of non-linear cost function: the incremental unit-time learning-curve model. This portion of the instructional resource consists of the following three files:

(1) a PowerPoint file (“Estimating Learning-Curve Cost Functions”), which provides a review of logarithms and a discussion of common forms of learning-curve models;

(2) a Word file (“Example—Estimating a Learning-Curve Function”), which provides a discussion of two procedures that can be used within Excel to estimate a learning-curve model; and,

(3) an Excel file (“Learning-Curve Analysis [Incremental Unit-Time Model]”), which provides a worked example of using Excel to fit a learning-curve model to a set of data and a basis for discussing the interpretation and use of the estimated coefficients in this model.

A more detailed explanation of the above seven files is provided below.

2. Part One: Estimating Simple Linear (One-Variable) Cost Functions

As indicated above, this portion of the resource package consists of four files: one set of PowerPoint slides, two Excel files, and one Word document.

2.1. PowerPoint slides

The PowerPoint deck (“Estimating Linear Cost Functions”) consists of 14 numbered slides. Slides 2 through 4 provide a broad overview of regression analysis, as applied to the task of estimating simple (i.e., one-variable) linear cost functions. Slides 5 through 10 provide an overview of calculating and interpreting both the standard error of the regression (SE) and the coefficient of determination, R2 (see footnotes 1 and 2). In discussing these slides with students, I relate the discussion to two statistical measures they should have learned in their statistics course: the mean (as a measure of central tendency) and the standard deviation (as a measure of variability). Specifically, I relate the determination of R2 and SE to these two measures. This is an initial attempt to relate (or anchor) the topic at hand to something students should have already studied. In my experience, the context of cost estimation (a business-related topic) helps to “demystify” the discussion. In terms of the mean, the point can be made to students that one option is to use the mean value of the data set as an estimate of the dependent variable (cost), regardless of the value assumed by the cost driver (X). Nothing prohibits the cost analyst from doing this! The R2 value can then be interpreted as the average (percentage) increase in accuracy when using the regression equation to estimate values of Y (within the set of data) rather than the mean value, Ȳ. It is then possible to indicate that, as learned from statistics, the standard deviation (or variance) for the data set at hand is a measure of the variability (or dispersion) of the actual Y values around the mean value, Ȳ. In similar fashion, the standard error of the regression (SE) represents the dispersion of the actual Y values, not around the mean value of Y but around the OLS regression line.

1 Slides 9 and 10 refer, respectively, to the SE and R2 associated with a regression analysis of the data set contained in the Excel file “Estimating Linear Cost Functions.” As such, these two slides are linked to the related Excel file.

2 Students can access either of the following sources for an explanation of the coefficient of determination: http://www.ehow.com/how_8241563_calculate-squared-regression.html or http://www.ehow.com/how_5148712_calculate-r.html. Use of the RSQ built-in function in Excel to estimate the coefficient of determination is available at http://www.ehow.com/how_8498030_calculate-r2-excel.html.

With this as background regarding the notion of variance (dispersion), the discussion can then turn to the formula for calculating R2 (see footnote 3). At this stage, it is useful to make the point that OLS regression receives its name by virtue of how it determines the “line of best fit” through any set of data. That is, OLS produces the function that minimizes the sum of squared errors and, for a given sample, therefore minimizes SE. Because I have previously shown students the relationship between SE and R2, the point can also be made here that OLS regression produces the cost function that maximizes the calculated R2 value. Finally, the instructor can make the point that SE is of principal interest because the information contained therein can be used to construct confidence intervals around point estimates generated by the user’s regression function.
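For readers who want to check the link between the mean, the standard deviation, SE, and R2 outside Excel, the following is a minimal Python sketch. The data (`machine_hours`, `total_cost`) and the function names are hypothetical and are not taken from the resource's Excel file; the formulas are the standard textbook ones for a one-variable OLS fit.

```python
import math

def fit_ols(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def fit_summary(x, y):
    """Compare dispersion around the mean with dispersion around the line."""
    n = len(x)
    a, b = fit_ols(x, y)
    y_bar = sum(y) / n
    # Total variability of Y around its mean (the "use the mean" benchmark).
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # Variability of Y around the fitted regression line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "a": a,
        "b": b,
        "sst": sst,
        "sse": sse,
        "r2": 1 - sse / sst,              # coefficient of determination
        "se": math.sqrt(sse / (n - 2)),   # standard error of the regression
        "sd": math.sqrt(sst / (n - 1)),   # sample standard deviation of Y
    }

# Hypothetical observations: cost driver (X) and total cost (Y).
machine_hours = [10, 20, 30, 40, 50, 60]
total_cost = [150, 195, 260, 290, 355, 390]

summary = fit_summary(machine_hours, total_cost)
print(f"a = {summary['a']:.2f}, b = {summary['b']:.4f}")
print(f"R2 = {summary['r2']:.4f}, SE = {summary['se']:.2f}, SD of Y = {summary['sd']:.2f}")
```

In this sketch SE is smaller than the standard deviation of Y whenever the line explains a meaningful share of the variability, which is the comparison the slides draw between dispersion around the mean and dispersion around the regression line.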

Slide 11 deals with the importance of graphing the data (i.e., preparing a scatter graph or scatter diagram) as a preliminary step to the application of regression analysis, while slide 12 presents a listing of four Excel-based methods that can be used to fit a linear regression model to a set of data. Slides 13 and 14 complete the deck by referencing the three other files associated with this component of the learning resource: an Excel file titled “Estimating Linear Cost Functions Using Excel,” a Word file titled “Cost Estimation and Statistical Issues—Regression Analysis,” and an Excel file4 titled “Change in SE as n increases.” Each of these three files is explained more fully below.5

2.2. Excel file: Estimating linear cost functions using Excel

As indicated by the title, this file provides a comprehensive tutorial on the use of Excel for estimating simple (i.e., one-variable) linear cost functions. A data set consisting of 14 observations (cost and associated cost-driver level) is used to illustrate the use of each of the following five methods of fitting a linear regression model to the set of supplied data:

 CHART Option
 Built-in Regression routine
 The LINEST built-in function
 The SOLVER routine
 SLOPE, INTERCEPT, and RSQ built-in functions

3 It is also appropriate here to anticipate the discussion of the output of Excel’s REGRESSION routine in the form of an ANOVA table. That is to say, I first attempt to ensure that students understand the notion of decomposing the total variability of Y into explained and unexplained portions and that while I present to them the formulas for calculating both the SE and R2, the calculations are done effortlessly using Excel. Put another way, I try to ensure a conceptual understanding of the mechanics of OLS regression before transitioning to Excel and a reinforcement of these concepts through an actual regression analysis that students can perform.

4 This Excel file is embedded, as an object, in the aforementioned Word file titled “Cost Estimation and Statistical Issues—Regression Analysis.” This embedded file can be accessed by double-clicking on the icon.

5 If the instructor chooses to use less than the full complement of files for this component of the resource, these last two PowerPoint slides would have to be adjusted accordingly.

Throughout the referenced Excel file, citations to online video clips and other supplementary documents regarding regression and regression-related topics are provided for the benefit of students. This ensures that students have a rich source of information at their disposal as they go through the cost-estimation process.

2.2.1. Using the CHART Option in Excel

The CHART option6 requires students to prepare and properly label (using whatever enhancements they deem appropriate) a scatterplot of the data. The instructor at this point can query students as to whether, based on the scatter plot of the data, it makes sense to fit a linear equation to the data set. Following this, students use the “Add Trendline” option to fit a linear equation to the dataset.7 Finally, students are instructed to use the option to “display the estimated cost function” and its associated coefficient of determination (R2) on the chart itself.8 During class, the instructor can (after clicking anywhere within the constructed chart) go through selected options under “Chart Tools” (under any of three categories: “Design,” “Layout,” and “Format”).

2.2.2. Using the REGRESSION Routine in Excel

Students then proceed to use the REGRESSION routine in Excel, which is accessed by going to “Data” and then “Data Analysis.”9 Summary output consists of the coefficient estimates (for the variable cost rate, b, and the fixed-cost component, a), R2 for the estimated cost function, and a complete ANOVA (Analysis of Variance) table.10 The Excel file includes a rich set of supplementary notes, designed principally to “demystify” the discussion for students. That is, I show in the Excel file precisely how the REGRESSION routine generated estimates of R2 and SE.11 As such, the discussion here can be linked back to the initial set of PowerPoint slides (discussed above). Students generally find this discussion illuminating. At the instructor’s discretion, the discussion can then turn to what I describe to students as “sampling-related” issues: testing for the statistical significance of R2 (which, in a simple regression, is equivalent to a test on the slope coefficient, that is, the variable cost rate, b),12 the reliability of the estimated slope coefficient (b), and the construction of confidence intervals around the coefficient estimates. Obviously, the instructor is free to cover as much or as little of this material as desired.

6 Tutorials for using the CHART option in Excel are available at http://office.microsoft.com/en-us/excel-help/how-to-create-a-basic-chart-in-excel-2010-RZ102559017.aspx?CTT=1&client=1, www.ehow.com/video_12309398_insert-chart-excel-2010.html, and at http://www.dummies.com/how-to/content/the-essentials-of-working-with-excel-2010-charts.html.

7 Tutorials for adding a trend line to an Excel chart are available at http://www.youtube.com/watch?v=ExfknNCvBYg, http://www.ehow.com/how_7224125_graph-trend-analysis-microsoft-excel.html, and at http://www.ehow.com/how_11403400_draw-trendline-excel.html.

8 See http://www.ehow.com/how_12115127_calculate-r-squared-measurements-excel.html. As noted above in footnote #2, the coefficient of determination (R2) for a linear regression model can also be generated through application of the RSQ function built into Excel.

9 Clips presenting an introduction to ordinary least squares (OLS) regression analysis are available to students at: http://www.youtube.com/watch?v=ZkjP5RJLQF4, http://www.youtube.com/watch?v=Qa2APhWjQPc and http://www.youtube.com/watch?v=kHZBy1uVNnM. Tutorials for using the REGRESSION routine in Excel are available at: http://www.youtube.com/watch?v=ExfknNCvBYg (this clip shows both the use of the CHART function and the REGRESSION function in Excel), http://blog.yojimbocorp.com/2012/05/03/linear-regression-with-excel-2010/, http://www.wikihow.com/Run-Regression-Analysis-in-Microsoft-Excel, and http://www.excel-easy.com/examples/regression.html.
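The decomposition behind the ANOVA table, and the equivalence between the F-test on R2 and the t-test on the slope in a simple regression, can be illustrated with a short Python sketch. The data and names below are hypothetical and do not reproduce the resource's Excel file. The p-value itself is not computed here because the Python standard library has no F distribution; in Excel that step is `F.DIST.RT`, and SciPy's `scipy.stats.f.sf` is one comparable alternative.

```python
import math

def anova_simple_regression(x, y):
    """ANOVA quantities for a one-variable OLS regression (k = 1)."""
    n, k = len(x), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # variable cost rate (slope)
    a = y_bar - b * x_bar              # fixed-cost component (intercept)
    fitted = [a + b * xi for xi in x]

    ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total

    msr = ssr / k                      # mean square regression
    mse = sse / (n - k - 1)            # mean square error
    se_b = math.sqrt(mse / sxx)        # standard error of the slope
    return {
        "a": a, "b": b,
        "ssr": ssr, "sse": sse, "sst": sst,
        "df_num": k, "df_den": n - k - 1,
        "f_stat": msr / mse,
        "t_slope": b / se_b,
        "r2": ssr / sst,
        "se": math.sqrt(mse),
    }

# Hypothetical observations: cost driver (X) and total cost (Y).
units = [10, 20, 30, 40, 50, 60]
cost = [150, 195, 260, 290, 355, 390]

table = anova_simple_regression(units, cost)
print(f"SSR + SSE = {table['ssr'] + table['sse']:.2f}, SST = {table['sst']:.2f}")
print(f"F = {table['f_stat']:.2f}, t(slope)^2 = {table['t_slope'] ** 2:.2f}")
```

The two printed pairs agree: the explained and unexplained sums of squares add up to the total, and the F-statistic equals the square of the slope's t-statistic, which is why the two tests give the same answer when there is a single cost driver.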

2.2.3. Using the LINEST function in Excel

A third option for fitting a linear model to the data set consists of using the LINEST function in Excel.13 I sometimes cover this option in class principally because, to use this function, students must enter a command in the form of an array. The advantage of exposing students to the LINEST function in Excel is three-fold: one, they are exposed to the process of entering a formula as an array, something students could leverage and apply in other contexts within Excel;14 two, use of the LINEST function in conjunction with either (or both) of the methods discussed above makes the point to students that in Excel there are different approaches or options for accomplishing designated tasks, a lesson that may be useful later in their studies and/or in professional practice; and three, it allows students to see that the regression results (estimated coefficients, standard errors of the coefficients, SE, R2, etc.) are consistent across the methods used.15 Arrows inserted into the file allow students to readily “see” this consistency and therefore, it is hoped, demystify the process for them.

10 Clips providing a discussion of regression-related output are available at http://www.youtube.com/watch?v=8R6UcK91Cec, http://www.youtube.com/watch?v=c5blVUkkjTM, and http://www.youtube.com/watch?v=aq8VU5KLmkY.

11 Clips providing a discussion of SE and R2, respectively, are available at http://www.youtube.com/watch?v=dJR1WqeBgCg and http://www.youtube.com/watch?v=aq8VU5KLmkY.

12 As shown in the Excel file, students can use the F.DIST.RT function in Excel to test for the statistical significance of the calculated R-squared value. The null hypothesis is that the population R-squared value is zero. The question we address is: given the sample, what is the probability that such a value would occur if the population value for R-squared is zero? Students see from the Regression output that this test is an F-test. To conduct this test, we need three pieces of information: the F-statistic (from the Regression routine) and its associated degrees of freedom (both numerator and denominator). The F-statistic is defined as the ratio of the MSR (mean square regression) to the MSE (mean square error); the numerator degrees of freedom = k; the denominator degrees of freedom = n – k – 1 (where k = the number of independent variables in the regression model) (Anderson et al., 1987, pp. 479-480). These three pieces of information are then entered into the following formula (which is pasted into an open cell): =F.DIST.RT(F,dfn,dfd), where F = the calculated F-statistic from the Regression output, dfn = numerator degrees of freedom, and dfd = denominator degrees of freedom. The resulting value represents the probability of observing an F-statistic equal to F (or higher) if the population value for R-squared is zero. Basically, large values of the F-statistic cast doubt on the null hypothesis that the population R-squared value is zero (or, equivalently in the case of a simple regression model, that the population value of the slope coefficient in the cost equation is zero). An explanation of the F.DIST.RT built-in function in Excel is available at: http://www.excelfunctions.net/Excel-F-Dist-Rt-Function.html and http://msdn.microsoft.com/en-us/library/office/ff196140.aspx.

13 Useful supplementary sources for the LINEST function in Excel include the following: http://www.ehow.com/how_8454834_use-linest-excel.html; http://www.youtube.com/watch?v=K1uelkQ6D-o; http://www.youtube.com/watch?v=6wbcPbYbq6M; and, http://www.techonthenet.com/excel/formulas/linest.php.

2.2.4. Using the SOLVER routine in Excel

Elsewhere, in both the undergraduate cost accounting course and in the MBA managerial accounting course I teach, I expose students to the use of the SOLVER routine in Excel.16 Previously, students learn from the regression module that OLS “works” by choosing the cost coefficients in a linear equation (i.e., a and b) such that the resulting equation minimizes the SE. This point is driven home to students by having them structure and solve a “constrained optimization” problem. Specifically, the SOLVER routine can be used to choose the two cost coefficients such that the resulting “sum of squared error terms” (and therefore the SE) is minimized. Students “prove” this by using the SOLVER routine to generate the coefficient estimates. They readily see that the coefficient estimates they generate using the SOLVER routine are precisely the same as those obtained using the alternative methods discussed above. As well, students “see” that the resulting error sum of squares (ESS) produced by the SOLVER routine is exactly the same as the ESS produced by the other methods. In this sense, then, students are better able to understand the “background mechanics” of the OLS method. I generally conclude this part of the in-class lecture by pointing out to students that the SOLVER routine will be used later in the course (determining optimal short-term product/service mix, as noted above) and in the operations management course they are taking (or will eventually take).

14 The following clip provides background information on the use of array formulas and functions in Excel: http://www.youtube.com/watch?v=F2iS8fiqLao. Additional tutorials on the use of array formulas are available at http://office.microsoft.com/en-us/starter-help/create-or-delete-a-formula-HP010342373.aspx?CTT=1&client=1#_Toc251333379 and at http://www.dummies.com/how-to/content/how-to-build-an-array-formula-in-excel-2010.html.

15 Note that while the LINEST function in Excel generates all of this output, the individual components of the output are not labeled by Excel. For this reason, I have inserted into the Excel file “Estimating linear cost functions using Excel” boxes containing pertinent labels.

16 SOLVER is used in conjunction with the topic of choosing an optimum short-term product (or service) mix. Specifically, SOLVER is used to structure and solve a “constrained optimization” problem. Background information regarding the SOLVER routine in Excel is available at http://blogs.office.com/b/microsoft-excel/archive/2009/09/21/new-and-improved-solver.aspx and http://www.excel-easy.com/data-analysis/solver.html; the following video clips also provide useful information: http://www.youtube.com/watch?v=eQoPjlnuZ6o, http://www.youtube.com/watch?v=K4QkLA3sT1o, and http://www.youtube.com/watch?v=9G3MjOunLqQ.
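The idea that a general-purpose optimizer arrives at the same coefficients as OLS can be sketched in Python. This is not Excel's SOLVER algorithm: plain gradient descent is swapped in as a simple stand-in, the data are hypothetical, and the step size and iteration count are hand-picked assumptions that suit this particular data set only.

```python
import math

def sse(a, b, x, y):
    """Error sum of squares for the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def closed_form_ols(x, y):
    """Textbook least-squares coefficients, used as the benchmark."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def minimize_sse(x, y, learning_rate, steps):
    """Search for (a, b) by repeatedly stepping downhill on the SSE surface."""
    n = len(x)
    x_bar = sum(x) / n
    xc = [xi - x_bar for xi in x]      # centering X makes the search better behaved
    c, b = 0.0, 0.0                    # c is the intercept of the centered line
    for _ in range(steps):
        resid = [yi - (c + b * xi) for xi, yi in zip(xc, y)]
        grad_c = -2 * sum(resid)
        grad_b = -2 * sum(r * xi for r, xi in zip(resid, xc))
        c -= learning_rate * grad_c
        b -= learning_rate * grad_b
    return c - b * x_bar, b            # convert back to the uncentered intercept

# Hypothetical observations: cost driver (X) and total cost (Y).
hours = [10, 20, 30, 40, 50, 60]
cost = [150, 195, 260, 290, 355, 390]

a_ols, b_ols = closed_form_ols(hours, cost)
a_gd, b_gd = minimize_sse(hours, cost, learning_rate=2.5e-4, steps=20_000)

print(f"closed form : a = {a_ols:.4f}, b = {b_ols:.4f}, SSE = {sse(a_ols, b_ols, hours, cost):.4f}")
print(f"numerical   : a = {a_gd:.4f}, b = {b_gd:.4f}, SSE = {sse(a_gd, b_gd, hours, cost):.4f}")
```

Both rows report the same coefficients and the same error sum of squares (to the printed precision), and nudging either coefficient away from the solution increases the SSE, which mirrors what students observe when comparing SOLVER output with the other methods.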

2.2.5. Using the SLOPE, INTERCEPT, and RSQ functions in Excel

Finally, I demonstrate to students that the built-in functions SLOPE, INTERCEPT, and RSQ in Excel can be used to produce regression results equivalent to those generated by application of the preceding methods.

2.3. Word file: Cost estimation and statistical issues—regression analysis

This file covers three topics: alternative cost-estimating procedures in Excel (i.e., alternative methods of estimating total cost, Y, given a value of the cost driver, X, after estimating a linear cost function); an analysis of changes in the standard error of the regression (SE) as the sample size, n, changes; and, building confidence intervals around point estimates for Y given a value of X. Each of these topics can be covered independently of the others, or skipped entirely if there are time constraints. As is the case with the aforementioned Excel file, the Word document contains numerous references to online resources (video clips, documents, etc.) that provide a rich source of information to students as they expand their knowledge of the above set of three topics.

2.3.1. Generating estimated cost data

As indicated in the Word file, there are five cost-estimation options in Excel:

(1) Trend Line Approach: after preparing a CHART from the user’s data set, the “Add Trend Line” option can be used to forecast backwards or forwards. This approach simply extends the Trend Line that is constructed using the CHART Option in Excel.

(2) Equation Approach: here there are two options. Students can enter into a cell an equation that includes the regression coefficients generated using any of the four methods discussed earlier; alternatively, students can use the INDEX(LINEST…) function in Excel to generate and place in a designated cell the slope coefficient (variable cost rate, b) and a separate INDEX(LINEST…) function to generate and place in a designated cell the fixed-cost component of the cost function, a.17

(3) Trend Function: the values of the cost driver, X, for which an estimated cost, Y, is to be generated on the basis of a linear regression analysis are included as one of the arguments in the TREND function, the formula for which is pasted into an open cell.18

(4) Formula Approach: the general formula, entered as an array, is =SUM({b,a}*{x,1}), where b = the slope coefficient [variable cost rate], a = the fixed-cost component in the total cost function, and x = the cost driver value for which an estimate of total cost, Y, is needed. To calculate the two coefficients, the aforementioned LINEST function can be used, in which case the above formula is rewritten as =SUM({LINEST(B12:B19,A12:A19)}*{11,1}), where the cells B12:B19 refer to the values of the dependent variable, Y, and A12:A19 contain the related cost-driver amounts, X. The formula returns the estimated total cost, Y, for X = 11, based on a linear equation fit to the data set found in A12:B19.

(5) Forecast Function:19 the following formula can be used to forecast a value for Y (total cost) for a value of x (cost driver), after implicitly fitting a linear function to a data set: =FORECAST(x, known_y’s, known_x’s). Note that the FORECAST function is similar to TREND, except that it is used to generate a single estimate, rather than an array of estimates (which could also be a single point).

17 Useful tutorials regarding the use of the INDEX function in Excel include: http://www.techonthenet.com/excel/formulas/index_function.php, http://office.microsoft.com/en-us/excel-help/video-index-function-VA102581137.aspx?CTT=1&client=1, or http://spreadsheets.about.com/od/lookupfunction1/ss/2011-03-02-excel-2010-index-function.htm. A useful explanatory source for using the LINEST function within INDEX is: http://www.mrexcel.com/forum/excel-questions/619168-having-trouble-understanding-linest-within-index.html.

2.3.2. Analysis of changes in SE as n changes

Students may ask: “how can we, as cost analysts, decrease the standard error of the regression, SE?” In general, this question seems to emanate from a desire to generate a more accurate cost function. Students might pose the question as: “If the line of best fit is determined by minimizing the SE, what strategies are available for decreasing SE?” To address this issue, I created (and embedded in the Word document as an object, as noted above) an Excel file titled “Change in SE as n increases.”

The discussion here begins with a presentation of the formula for calculating SE (i.e., the square root of the “mean squared error” (MSE), where MSE = the error sum of squares ÷ degrees of freedom, n – k – 1, where k = number of independent variables and n = sample size). From this, students can see that SE can be decreased either by decreasing the numerator in the MSE calculation OR by increasing the denominator (or by doing both). To illustrate these two points, I begin by calculating in the Excel file the SE associated with the following three observations (X,Y): (50, $250), (100, $310), and (150, $325). The generated regression equation based on this set of three data points is: Y = $220 + $0.75X.
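The base-case calculation can be checked with a few lines of Python. The three observations are the ones given above; duplicating them is used here as one simple way to enlarge the denominator (n – k – 1) while leaving the fitted line unchanged. The function name `fit_and_se` is illustrative only.

```python
import math

def fit_and_se(points, k=1):
    """Return (a, b, SE) for a one-variable OLS fit to (X, Y) pairs."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b = sxy / sxx
    a = y_bar - b * x_bar
    ess = sum((y - (a + b * x)) ** 2 for x, y in points)   # error sum of squares
    mse = ess / (n - k - 1)                                # mean squared error
    return a, b, math.sqrt(mse)

base = [(50, 250), (100, 310), (150, 325)]

a, b, se_base = fit_and_se(base)
_, _, se_doubled = fit_and_se(base * 2)   # same three points, each listed twice

print(f"Y = {a:.2f} + {b:.2f}X")                 # Y = 220.00 + 0.75X
print(f"SE with n = 3: {se_base:.2f}")           # 18.37
print(f"SE with n = 6: {se_doubled:.2f}")        # 12.99
```

With three points the error sum of squares is 337.5 and there is one degree of freedom; with the duplicated set the error sum of squares doubles to 675 but the degrees of freedom rise to four, so SE falls.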

The base-case situation in the Excel file “Change in SE as n increases” indicates that for the preceding cost function, SE = 18.37 (i.e., $\sqrt{337.5/1}$, the square root of the error sum of squares divided by n – k – 1 = 1). To show students how SE decreases as the sample size, n, increases (ceteris paribus), I then replicate the data set in the Excel file so there are six (rather than three) data points, which produces an SE of 12.99 (i.e., $\sqrt{675/4}$). I tell students this is a “denominator effect.” Finally, to demonstrate the “numerator effect,” I hold constant the fact that there are six (rather than the initial three) observations. However, this time instead of replicating the initial data set, I generate in the Excel file three “abnormal” observations. For this assumed data set, the SE increases to 24.94, in spite of the fact that the sample size doubled. Thus, students should come to a realization that increasing the sample size does not necessarily decrease the SE! Finally, data are provided in the Excel file to show that whether and to what extent SE changes as n (the sample size) changes is a function of the rate of change in the numerator of the SE calculation relative to the rate of change in the denominator of the calculation. That is, the change is jointly a function of a “numerator effect” and a “denominator effect.”

18 Background material regarding the use of the TREND function in Excel is available at: http://support.microsoft.com/kb/828801, http://www.excelfunctions.net/Excel-Trend-Function.html, http://www.ehow.com/how_2105842_use-excels-trend-function.html, http://www.ehow.com/how_5844333_use-trend-excel.html.

19 A discussion of the use of the FORECAST function in Excel can be found at http://www.techonthenet.com/excel/formulas/forecast.php and at http://support.microsoft.com/kb/828236.

2.3.3. Constructing confidence intervals around point estimates of the dependent variable

The third and final topic addressed in the supplementary Word document deals with the construction of confidence intervals around point estimates generated by the cost function that students develop. The discussion is subdivided into a technical analysis and a useful approximation (i.e., more practical) approach. In my opinion, the discussion of developing confidence intervals around point estimates is particularly relevant for MBA students.

I generally begin the discussion by reminding students that the regression model, as good as it might be, still produces estimates of costs. Such “future values” are subject to uncertainty. One way to capture this uncertainty is through the use of confidence intervals. In fact, I generally make the point to my graduate students that supplying point estimates alone is unwise, in large part because those estimates fail to capture the full information set from the data points used to estimate the regression model.

The technical analysis of confidence-interval construction begins with a discussion of variance; this time, however, I tell students that we need to estimate the variance associated with the regression-predicted value of Y. In addition, in building our confidence interval we need to specify a confidence level (e.g., 90% or 95%), which in turn is reflected in a t value (with n – 2 degrees of freedom). The Word document I created includes a table of t values for different confidence levels and degrees of freedom (df).20 Discussion then turns to the approximation approach, which uses SE as a substitute for the more precise and conceptually correct, but difficult to estimate, variance of the estimate of an individual value of Y based on a given value of X.
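The difference between the two approaches can be sketched in Python. The data are hypothetical (ten observations, so that n – 2 = 8 degrees of freedom), the critical t value of 2.306 is supplied by hand as it would be read from a t table, and the "technical" interval uses the standard textbook formula for predicting an individual value of Y, which is an assumption about what the Word document presents.

```python
import math

def intervals_for_new_y(x, y, x0, t_crit):
    """Point estimate at x0 plus approximate and technical interval half-widths."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ess = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ess / (n - 2))          # standard error of the regression

    y_hat = a + b * x0
    # Approximation approach: use SE directly as the measure of dispersion.
    approx_half = t_crit * se
    # Technical approach: standard error of an individual predicted Y at x0.
    exact_half = t_crit * se * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, approx_half, exact_half

# Hypothetical observations: cost driver (X) and total cost (Y).
driver = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
cost = [150, 195, 260, 290, 355, 390, 455, 480, 560, 590]

T_95_DF8 = 2.306   # two-tailed critical t for 95% confidence, df = 8

y_hat, approx_half, exact_half = intervals_for_new_y(driver, cost, 75, T_95_DF8)
print(f"point estimate        : {y_hat:.2f}")
print(f"approximate interval  : {y_hat - approx_half:.2f} to {y_hat + approx_half:.2f}")
print(f"technical interval    : {y_hat - exact_half:.2f} to {y_hat + exact_half:.2f}")
```

The technical interval is always somewhat wider than the approximation, and the gap grows as the chosen X moves away from the mean of the observed cost-driver values or as the sample gets smaller.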

2.4. Excel file: Change in SE as n increases

As noted above, this file provides a simple example of how SE changes in response to changes in the sample size of the data set, n, and in response to changes in the underlying linear “fit” of the model. As also noted above, this file is embedded as an object in the Word document titled “Cost Estimation and Statistical Issues—Regression Analysis.”

20 Alternatively, the T.INV.2T built-in function in Excel can be used to generate the critical t value for constructing a confidence interval. For example, the critical t value for a two-tailed t-distribution, for α = 0.05 (i.e., 95% confidence interval) and df = 8 is 2.306004, the same value found in the table referenced above. See: http://www.excelfunctions.net/Excel-T-Inv-2t-Function.html or http://msdn.microsoft.com/en-us/library/office/ff821541.aspx.

3. Part Two: Estimating Learning-Curve Functions (Incremental Unit-Time Model)

The second major component of the learning resource deals with the estimation of a particular type of non-linear function: a learning curve. I focus specifically on fitting what is known as the “incremental unit-time model” to a set of observations. As noted in the Introduction, there are three related files: a set of PowerPoint slides (“Estimating Learning-Curve Cost Functions”), a Word document (“Example—Estimating a Learning-Curve Function”), and an Excel file (“Learning-Curve Analysis [Incremental Unit-Time Model]”).

3.1. PowerPoint slides

The PowerPoint deck consists of six numbered slides. Slide 1, as a prelude to presenting the learning-curve model, is used as the basis for presenting to students a refresher on logarithms. I find, from experience, that many (if not most) students have little-to-no recollection of this topic. I tell students at this point that background information regarding logarithms is fundamental to our ability to generate more sophisticated cost-prediction models, represented in the present context as a learning-curve function. In slide 2 I introduce students to the following learning-curve model: $Y = aX^{b}$, where b = the learning-curve index (i.e., Log(LCR)/Log(2), where LCR is the learning-curve rate). I indicate to students that there are two forms of this general model: the cumulative-average time model and the incremental unit-time model. Slides 3 and 4 are designed to have students think about the range of possible values for the exponent, b, in each of these two forms of the model; in other words, I try to get them to think about what is necessary for the preceding equation to capture learning effects (efficiency gains associated with experience in a process). The goal is to have students gain a conceptual understanding about the nature of a particular type of non-linear cost function, as well as a “feel” for each of two forms of the learning-curve model observed most often in business. The deck of slides ends with a call to explore (via the prepared Word document and the associated Excel file) the estimation and use of the “Incremental Unit-Time Model.”
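A small Python sketch can make the role of the exponent concrete. The first-unit time of 100 hours and the 80% learning-curve rate are hypothetical values chosen for illustration, and the function names are not from the resource. Under the incremental unit-time reading used here, Y is the time required for the Xth unit.

```python
import math

def learning_index(lcr):
    """Learning-curve index b = Log(LCR) / Log(2)."""
    return math.log(lcr) / math.log(2)

def unit_time(a, lcr, x):
    """Time for unit number x under Y = a * X**b (incremental unit-time model)."""
    return a * x ** learning_index(lcr)

FIRST_UNIT_HOURS = 100.0   # hypothetical value of a
LCR = 0.80                 # hypothetical 80% learning-curve rate

b = learning_index(LCR)
print(f"b = {b:.4f}")      # negative whenever LCR is below 100%
for unit in (1, 2, 4, 8):
    print(f"unit {unit}: {unit_time(FIRST_UNIT_HOURS, LCR, unit):.2f} hours")
```

Each doubling of output multiplies the unit time by the learning-curve rate (100, 80, 64, and 51.2 hours here), which is why b must be negative for the equation to capture learning effects and why b = 0 corresponds to no learning at all.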

3.2. Word document: “Example—Estimating a learning-curve function”

This document begins with a set of 14 observations (values of X and Y) and a plot of this set of data (created as an Excel CHART).21 This is followed by a two-page expanded discussion of issues raised on the PowerPoint slides (viz., learning-curve theory, general functional form of the learning-curve model used in business, and explanation of the “incremental unit-time learning curve model”). Finally, two approaches to fitting a learning-curve model using Excel are offered to students: a six-step process whereby an OLS regression is fit to log-transformed data, and using the built-in Power function in Excel, which is accessed by first graphing the data set, adding a Trendline, then choosing “Power” under “Trendline Options.” The former approach is detailed in the companion Excel file titled “Learning Curve Analysis (Incremental Unit-Time Model)” while the latter approach is in the Word document itself. Students will see that these two approaches are equivalent (i.e., they result in the same estimated learning-curve model for the given data set). Both approaches are, in fact, rather straightforward extensions to the linear modeling exercise completed as part one of the instructional resource.

21 Note that the data set used in part one of the tutorial (simple linear regression) differs from the data set used in part two (learning-curve analysis). The data set used in part two is, in fact, from the case by Stout and Juras (2009), which provides a much more detailed and comprehensive analysis than the introductory analysis contained in the present instructional resource.

3.3. Excel file: “Learning Curve Analysis (Incremental Unit-Time Model)”

As noted above, this file shows how to use the REGRESSION function in Excel to fit a learning-curve model to log-transformed data (i.e., both X and Y values are log-transformed). The only “trick” is converting the resulting regression-estimated coefficients back to “regular” numbers. Under the assumption that log10 was used to transform the data before running the REGRESSION function, the fixed cost component (a) is found by inserting the following formula into an open cell: =POWER(10,Intercept) (see footnote 22), where “Intercept” = the estimated value of Log(Y) when X = 1 (i.e., when Log(X) = 0), as produced by the REGRESSION function, while the learning-curve index (b) is given as the estimated coefficient associated with the log-transformed independent variable, Log(X). Finally, the learning-curve rate (LCR) can be estimated by using the following function: =POWER(10,(LOG(X)*LOG(2))), where LOG(X) is the regression-generated coefficient associated with the log-transformed independent variable, Log(X) (see footnote 23). That is, LCR = 10 to the POWER of “LOG(X)*LOG(2).” For example, if b = −0.430446, then LCR = $10^{-0.430446 \times \log_{10}(2)} = 10^{-0.430446 \times 0.30103} = 10^{-0.12958} = 0.7420$, or 74.20%.
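The log-transform, fit, and back-transform sequence can be sketched in Python. Because the 14 observations from the Excel file are not reproduced in this guide, the sketch generates synthetic data exactly from the coefficient values quoted in this guide (a = 25.028, b = −0.430446); that is an assumption made purely so the recovery step can be checked, and the function name is illustrative.

```python
import math

def fit_learning_curve(units, times):
    """Fit Y = a * X**b by OLS on log10-transformed data; return (a, b, LCR)."""
    lx = [math.log10(u) for u in units]
    ly = [math.log10(t) for t in times]
    n = len(lx)
    lx_bar, ly_bar = sum(lx) / n, sum(ly) / n
    sxx = sum((v - lx_bar) ** 2 for v in lx)
    sxy = sum((v - lx_bar) * (w - ly_bar) for v, w in zip(lx, ly))
    b = sxy / sxx                          # coefficient on Log(X): the index b
    intercept = ly_bar - b * lx_bar        # estimated Log(Y) when X = 1
    a = 10 ** intercept                    # counterpart of =POWER(10,Intercept)
    lcr = 10 ** (b * math.log10(2))        # counterpart of =POWER(10,(b*LOG(2)))
    return a, b, lcr

# Synthetic data: 14 units generated from the quoted coefficients (an assumption).
TRUE_A, TRUE_B = 25.028, -0.430446
units = list(range(1, 15))
times = [TRUE_A * u ** TRUE_B for u in units]

a, b, lcr = fit_learning_curve(units, times)
print(f"a = {a:.3f}, b = {b:.6f}, LCR = {lcr:.2%}")   # LCR prints as 74.20%
```

Because 10 raised to b × log10(2) is the same number as 2 raised to b, the learning-curve rate does not depend on which logarithm base was used for the transformation, provided the same base is used to convert back.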

22 The POWER function in Excel is discussed in http://www.itechtalk.com/thread10266.html and in http://www.techonthenet.com/excel/formulas/power.php.

23 If the original data set were converted to natural logs (Ln), then the “a” term in the learning-curve model would be estimated by raising e (the base of the natural logarithm) to the power of the estimated intercept term. In Excel, this is accomplished by inserting the following formula into an open cell: =EXP(Intercept), where Intercept is the coefficient estimated by the regression of Ln(Y) on Ln(X). (The EXP function in Excel is discussed in http://office.microsoft.com/en-us/excel-help/exp-function-HP010342500.aspx.) To estimate the learning-curve rate (LCR) when natural logs (LN) were used to transform the original data set, the following formula should be placed into an open cell: =EXP(b*LN(2)), where b = the coefficient estimate for LN(X). Results would be equivalent to those obtained when the original data set was transformed using log base 10: a = 25.028, b = −0.43, and LCR = 74.20%.
