On the Identification of Production Functions: How Heterogeneous Is
Total Page:16
File Type:pdf, Size:1020Kb
On the Identification of Production Functions: How Heterogeneous is Productivity? Amit Gandhi, Salvador Navarro, David Rivers∗ May 30, 2016 Abstract We show that the “proxy variable” approaches for estimating production functions generi- cally suffer from a fundamental identification problem in the presence of a flexible input that satisfies the “proxy variable” assumptions for invertibility, such as intermediate inputs. We provide a formal proof of nonparametric non-identification, and illustrate that the source of under-identification is the flexible input elasticity. Using a transformation of the firm’s first order condition, we develop a new nonparametric identification strategy that addresses this problem, as well as a simple corresponding estimator for the production function. We show that the alternative of approximating the effects of intermediate inputs using a value-added production function cannot generally be used to identify features of interest from the gross output production function. Applying our approach to plant-level data from Colombia and Chile, we find that a gross output production function implies fundamentally different patterns of productivity heterogeneity than a value-added specification. ∗We would like to thank Dan Ackerberg, Richard Blundell, Juan Esteban Carranza, Allan Collard-Wexler, Ulrich Doraszelski, Steven Durlauf, Jeremy Fox, Silvia Goncalves, Phil Haile, Joel Horowitz, Jean-Francois Houde, Au- reo de Paula, Amil Petrin, Mark Roberts, Nicolas Roys, Chad Syverson, Chris Taber, Quang Vuong, and especially Tim Conley for helpful discussions. This paper has also benefited from detailed comments by the editor and three anonymous referees. We would also like to thank Amil Petrin and David Greenstreet for helping us to obtain the Colombian and Chilean data respectively. Navarro and Rivers acknowledge support from the Social Sciences and Humanities Research Council of Canada. This paper previously circulated under the name “Identification of Produc- tion Functions using Restrictions from Economic Theory.” First draft: May 2006. Amit Gandhi is at the University of Wisconsin-Madison, E-mail: [email protected]. Salvador Navarro is at the University of Western Ontario, E-mail: [email protected]. David Rivers is at the University of Western Ontario, E-mail: [email protected]. 1 1 Introduction The identification and estimation of production functions using data on firm inputs and output is among the oldest empirical problems in economics. A key challenge for identification arises because firms optimally choose their inputs as a function of their productivity, but productivity is unobserved by the econometrician in the data. This gives rise to a classic simultaneity problem that was first articulated by Marschak and Andrews (1944) and has come to be known in the production function literature as “transmission bias”. In their influential review of the state of the literature nearly 50 years later, Griliches and Mairesse (1998) (henceforth GM) concluded that the “search for identification” for the production function remained a fundamentally open problem.1 Resolving this identification problem is critical to measuring productivity with plant level production data, which has become increasingly available for many countries, and which motivates a variety of industry equilibrium models based on patterns of productivity heterogeneity found in this data.2 In this paper we examine the identification foundations of the “proxy variable” approach for estimating production functions pioneered by Olley and Pakes (1996) and further developed by Levinsohn and Petrin (2003)/Ackerberg, Caves, and Frazer (2015)/Wooldridge (2009) (henceforth OP, LP, ACF and Wooldridge, respectively). Despite the immense popularity of the method in the applied productivity literature, a fundamental question has remained unexplored: does the proxy variable technique solve the identification problem of transmission bias? That is, is the production function in fact identified from panel data on inputs and output under the structural assumptions that the proxy variable technique places on the data? To date there has been no formal demonstration of identification in this literature. On the other hand, there have been various red flags concerning the identification foundations for this class of estimators (see Bond and Söderbom, 2005 and ACF). However, no clear conclusion regarding non-identification of the model as a whole 1In particular, the standard econometric solutions to correct the transmission bias, i.e., using firm fixed effects or instrumental variables, have proven to be both theoretically problematic and unsatisfactory in practice (see e.g., GM and Ackerberg et al., 2007 for a review and Section 7 of this paper for a discussion). 2Among these patterns are the general understanding that even narrowly defined industries exhibit “massive” unex- plained productivity dispersion (Dhrymes, 1991; Bartelsman and Doms, 2000; Syverson, 2004; Collard-Wexler, 2010; Fox and Smeets, 2011), and that productivity is closely related to other dimensions of firm-level heterogeneity, such as importing (Kasahara and Rodrigue, 2008), exporting (Bernard and Jensen, 1995, Bernard and Jensen, 1999, Bernard et al., 2003), wages (Baily, Hulten, and Campbell, 1992), etc. See Syverson (2011) for a review of this literature. 2 has been reached either. Without an answer to this basic question, it is unclear how to interpret the vast body of empirical results regarding patterns of productivity which rely on the proxy approach for the measurement of productivity. Our first main result is a negative one. We show that the model structure underlying the proxy variable technique does not identify the production function (and hence productivity) in the pres- ence of a flexible input that satisfies the conditions for invertibility invoked in this literature. We establish this result under the LP/Wooldridge approach that uses intermediate inputs as a flexi- ble input that satisfies the proxy variable assumption of being monotone (and hence invertible) in productivity. Under this structure we show how to construct a continuum of observationally equiv- alent production functions that cannot be distinguished from the true production function in the data. We also show that similar non-identification problems arise under the original OP strategy of using investment as the proxy variable.3 Our second contribution is to show that the exact source of this non-identification is the flexible input elasticity for inputs satisfying the proxy variable assumption. The flexible input elasticity defines a partial differential equation on the production function. As we show, if this elasticity were known, then it could be integrated up to nonparametrically identify the part of the production function that depends on the flexible input. We then show that the proxy variable structure is sufficient to nonparametrically identify the remainder of the production function. These results taken together formalize the empirical content of the proxy variable structure that is widely used in applied work - the model is identified up to the flexible input elasticity, but fails to identify the flexible input elasticity itself. Our third contribution is that we present a new empirical strategy that nonparametrically iden- tifies the flexible input elasticity, and hence solves for the missing source of identification for the production function within the proxy variable structure. The key to our approach is that we build on a key economic assumption of the proxy variable structure - that the firm optimally chooses intermediate inputs in response to its realized productivity. Whereas the proxy variable literature 3We do not emphasize OP as the leading case because the applied literature has largely adopted the use of interme- diate inputs as the proxy variable due to the prevalence of zeroes in investment, which was the original motivation for LP. Nevertheless we show that resorting to OP does not solve the non-identification problem we pose for LP. 3 has used this assumption to “invert” and replace for productivity using intermediate inputs, we go further and exploit the nonparametric structural link between the production function and the firm’s first order condition for intermediate inputs. This link allows for a nonparametric regression of the intermediate input’s revenue share on all inputs (labor, capital, and intermediate inputs) to identify the flexible input elasticity. This is a nonparametric analogue of the familiar parametric insight that revenue shares directly identify the intermediate input coefficient in a Cobb-Douglas setting (e.g., Klein, 1953 and Solow, 1957). Our key innovation is that we show that the information in the first order condition can be used in a completely nonparametric way, i.e., without making functional form assumptions on the production function. Our results thus show how this “share regression” can be combined with the remaining content of the proxy variable structure to nonparametrically identify the production function as a whole. This identification strategy - regressing revenue shares on inputs to identify the flexible input elasticity, solving the partial differential equation, and integrating this into the proxy variable struc- ture to identify the remainder of the production function - gives rise to a natural two-step estimator in which a flexible parametric approximation to different components of the production function is estimated in each stage. We present a computationally straightforward implementation of this