Nonparametric Regression Techniques in Economics Author(S): Adonis Yatchew Source: Journal of Economic Literature, Vol

American Economic Association Nonparametric Regression Techniques in Economics Author(s): Adonis Yatchew Source: Journal of Economic Literature, Vol. 36, No. 2 (Jun., 1998), pp. 669-721 Published by: American Economic Association Stable URL: http://www.jstor.org/stable/2565120 . Accessed: 19/08/2011 15:48 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. American Economic Association is collaborating with JSTOR to digitize, preserve and extend access to Journal of Economic Literature. http://www.jstor.org Journal of Economic Literature Vol. XXXVI (June 1998) pp. 669-721 NonparametricRegression Techniques in Economics ADONIS YATCHEW' 1. Introduction on an infinite number of unknown pa- rameters. (Nonparametric regression 1.1 Setting the Stage typically assumes little else about the shape of the regression function beyond 1.1.1 Benefits of Nonparametric Esti- some degree of smoothness.) Yet, we mation. If economics is the dismal sci- will argue that it is because such tools ence, then econometrics is its ill-fated lead to more durable inferences that offspring. The limited number of strong they will become an enduring-even in- implications derived from economic dispensable-element of every econo- theory, the all but complete inability to mist's tool kit, in much the same way perform controlled experiments, the that linear and nonlinear regression are paucity of durable empirical results, the today. After all, reliable empirical re- errant predictions, the bottomless res- sults are essential to the formulation of ervoir of difficult to measure but poten- sound policy. tially relevant variables-these do not From the point of view of a pure data set a pleasant working environment for analyst, the added value of nonparamet- the empirical analyst in economics. ric techniques consists in their ability to When one contrasts the overall qual- deliver estimators and inference proce- ity of economic data and empirical in- dures that are less dependent on func- ferences with the plethora of sophisti- tional form assumptions. They are also cated econometric theory already useful for exploratory data analysis and available, it would appear difficult to as a supplement to parametric proce- justify learning (or teaching) techniques dures. If one has reservations about a where the regression function depends particular parametric form, specifica- 1 Department of Economics, University of tion tests against nonparametric alter- Toronto. The author is grateful to Andrey Feuer- natives can provide reassurance. verg;er, Zvi Griliches, James MacKinnon, Angelo From the point of view of the econo- Me ino, John Pencavel and particularly to Frank Wolak for their patient reading and insightful mist/econometrician, such techniques comments. The statistics and econometrics litera- possess additional appeal in that most ture on nonparametric regression is massive and implications of economic theory are inevitably many interesting and important results have been given little or no exposure in this intro- nonparametric. Typically, theoretical duction to the subiect. The author regrets this arguments exclude or include variables, state of affairs and hopes that his lacunae will be they imply monotonicity, concavity, or corrected in complementary and more artful re- views by other writers. The author may be con- homogeneity of various sorts, or they tacted at [email protected]. embody more comDlex structure such as 669 670 Journal of Economic Literature, Vol. XXXVI (June 1998) the implications of the maximization hy- because the bound on the first derivative pothesis. They almost never imply a implies that If(xt)-f(xti1) I < L Ixt - xt- I .) specific functional form (the pure quan- As long as z is not perfectly correlated tity theory of mioney equation being one with x, the ordinary least squares estima- exception). This paper will therefore fo- tor of f3using the differenced data, that is cus some considerable attention on con- strained nonparametric regression esti- X(yt - yt-i) (Zt - Zt-i) = Xzzt)2 (1.2) and Pdiff mation testing. y(Zt - Zt_1)2 In some cases, the researcher may feel comfortable with a particular para- has the approximate sampling distribu- metric form for one portion of the re- tion gression function, but less confident about the shape of another portion. fdiff NNf(Vj PI 1 ) 1.3 Such varying prior beliefs call for com- bining parametric and nonparametric where y2 is the conditional variance of z techniques to yield semiparametric re- given x.2 gression models (these have been stud- We have applied this estimator to ied extensively). Inclusion of the non- data on the costs of distributing elec- parametric component may avoid tricity (Figure 1). Factors which can inconsistent estimation which could re- influence distribution costs include cus- sult from incorrect parameterization. tomer density (greater distance be- 1.1.2 An Appetizer for the Reluctant tween customers increases costs), the Palate. For those who have never con- remaining life of physical plant (newer sidered using a nonparametric regres- assets require less maintenance), and sion technique, we suggest the follow- wage rates. These are among the "z" ing elementary procedure, much of variables which appear parametrically which can be implemented in any stan- in the model. However, the nature of dard econometric package. Suppose you scale economies is unclear-the effect are given data (yi,zi,x1). .. (yT,zT,xT) on the on costs could be constant, declining model y = zf3 +f(x) + e where for simplic- to an asymptote or even U-shaped. ity all variables are assumed to be sca- Indeed, important regulatory decisions lars. The Ct are i.i.d. with mean 0 and (such as merger policy) may involve variance (y2 given (z,x). The x's are determining the minimum efficient drawn from a distribution with support, scale. say, the unit interval. Most important, In the left panel, we estimate the cost the data are rearranged so that function parametrically, incorporating a XI < ... < XT. All that is known about f is quadratic term for the scale variable "x" that its first derivative is bounded by a which we measure using the number of constant, say L. Suppose that we first customers of each utility. In the right difference to obtain panel, we report the coefficient esti- mates of the z variables after applying yt - yt-1 = (Zt - Zt-1)f the differencing estimator (1.2), suit- + (f(Xt) -f(xt-I)) + ?t-t - (1.1) ably generalized to allow for vector z. There is moderate change in coefficients. As sample size increases-packing the unit interval with x's-the typical difference 2 Throughout the paper, the symbol - will de- note that a random variable has the indicated ap- xt - xt-1 shrinks at a rate of about IIT so proximatedistribution and the symbol_ will indi- that f(xti1) tends to cancel f(xt). (This is cate approximateequality. Yatchew: Nonparametric Regression Techniques in Economics 671 Figure 1. Returns to Scale in Electricity Distribution Model: Restructuring,vertical unbundling, and deregulation of electricity industries have led to reexaminationof scale economies in electricity distribution.The efficiency of small (usually municipal) distributorsthat exist in the U.S., Norway, New Zealand, Germany and Canadais at issue. The objective is to assess scale economies. The model is given by y = z i + f(x) + ? where y is variable costs of distributionper customer (COST PER CUST). The vector z includes: customer density (ruraldistribution is more expensive) as measured by average distance between customers (DIST); load per customer (LOAD); local generation per customer (GEN); remaining life of assets (LIFE) - older plants require more maintenance; assets per customer (ASSETS);and a proxy for labour wage rates (WAGE). The scale of operation, x, is measured by the number of customers served (CUST). The scale effect may be nonlinear, (e.g., decreasing to an asymptote or U-shaped), hence the inclusion of a nonparametriccomponentf (x). Data: 265 municipal distributorsin Ontario, Canadawhich vary in size from tiny ones serving 200 customers to the largest serving 218,000 (a 'customer' can be a household, or a commercial, industrialor governmental entity). ESTIMATEDMODELS PARAMETRIC:Quadratic scale effect SEMIPARAMETRIC:Smooth scale effect y= a + Z3 +y1x +y2x2 + ? Y = z 3 +f(x) + ? OLS Differencing Estimator1 Coefficient StandardError Coefficient StandardError oc 15.987 37.002 DIST 3.719 1.248 ADIST 2.568 1.560 LOAD 1.920 .661 ALOAD .437 .912 GEN -.051 .023 AGEN .0005 .032 LIFE -5.663 .798 ALIFE -4.470 1.070 ASSETS .037 .005 AASSETS .030 .0072 WAGE .003 .0007 AWAGE .003 .00086 71 -.00154 .00024 72 .1x10-7 .14x10-8 R2 .45 ESTIMATEDIMPACT OF SCALEON DOLLARCOSTS PER CUSTOMERPER YEAR , 200 Kernel w34 - - - - - - Quadratic 0 E- 150- Q ~~~~~~~~~~0 = 100 o 0 C0 05 DCo ? E C/) 0%00Qx 800 0 80 0 00 0~~~~~~~ 500 1000 5000 0 1000500 100 1 aarodrds htx < .t. XT,.0 othe al-0 vaibe difeecedAw= w0 - , foloe by0 orinr lesqurs Standarderor ~ mutple by XTS as per eqato (13) DeenenA- vaial is COS PE CUST. 500 1000 5000 10000 50000 100000 CUSTOMERS ? ? 1Data reordered so that x, ... XT, then all variables differenced A wt = Wt- Wtl1, followed by ordinaryleast squares. Standarderrors multiplied by Vfi75, as per equation (1.3). Dependent variable is A COST PER CUST. 672 Tournal of Economic Literature. Vol. XXXVI (Tune 1998) Though the "differencing estimator" vations. For the approach described of f3 does not require estimation of f, here, the implicit estimate of f(xt) is one can visually assess whether there is f(xt-i).

Load more