On the Identification of Production Functions: How Heterogeneous is Productivity?
Amit Gandhi, Salvador Navarro, David Rivers∗
May 30, 2016
Abstract
We show that the “proxy variable” approaches for estimating production functions generi- cally suffer from a fundamental identification problem in the presence of a flexible input that satisfies the “proxy variable” assumptions for invertibility, such as intermediate inputs. We provide a formal proof of nonparametric non-identification, and illustrate that the source of under-identification is the flexible input elasticity. Using a transformation of the firm’s first order condition, we develop a new nonparametric identification strategy that addresses this problem, as well as a simple corresponding estimator for the production function. We show that the alternative of approximating the effects of intermediate inputs using a value-added production function cannot generally be used to identify features of interest from the gross output production function. Applying our approach to plant-level data from Colombia and Chile, we find that a gross output production function implies fundamentally different patterns of productivity heterogeneity than a value-added specification.
∗We would like to thank Dan Ackerberg, Richard Blundell, Juan Esteban Carranza, Allan Collard-Wexler, Ulrich Doraszelski, Steven Durlauf, Jeremy Fox, Silvia Goncalves, Phil Haile, Joel Horowitz, Jean-Francois Houde, Au- reo de Paula, Amil Petrin, Mark Roberts, Nicolas Roys, Chad Syverson, Chris Taber, Quang Vuong, and especially Tim Conley for helpful discussions. This paper has also benefited from detailed comments by the editor and three anonymous referees. We would also like to thank Amil Petrin and David Greenstreet for helping us to obtain the Colombian and Chilean data respectively. Navarro and Rivers acknowledge support from the Social Sciences and Humanities Research Council of Canada. This paper previously circulated under the name “Identification of Produc- tion Functions using Restrictions from Economic Theory.” First draft: May 2006. Amit Gandhi is at the University of Wisconsin-Madison, E-mail: [email protected]. Salvador Navarro is at the University of Western Ontario, E-mail: [email protected]. David Rivers is at the University of Western Ontario, E-mail: [email protected].
1 1 Introduction
The identification and estimation of production functions using data on firm inputs and output is among the oldest empirical problems in economics. A key challenge for identification arises because firms optimally choose their inputs as a function of their productivity, but productivity is unobserved by the econometrician in the data. This gives rise to a classic simultaneity problem that was first articulated by Marschak and Andrews (1944) and has come to be known in the production function literature as “transmission bias”. In their influential review of the state of the literature nearly 50 years later, Griliches and Mairesse (1998) (henceforth GM) concluded that the “search for identification” for the production function remained a fundamentally open problem.1 Resolving this identification problem is critical to measuring productivity with plant level production data, which has become increasingly available for many countries, and which motivates a variety of industry equilibrium models based on patterns of productivity heterogeneity found in this data.2 In this paper we examine the identification foundations of the “proxy variable” approach for estimating production functions pioneered by Olley and Pakes (1996) and further developed by Levinsohn and Petrin (2003)/Ackerberg, Caves, and Frazer (2015)/Wooldridge (2009) (henceforth OP, LP, ACF and Wooldridge, respectively). Despite the immense popularity of the method in the applied productivity literature, a fundamental question has remained unexplored: does the proxy variable technique solve the identification problem of transmission bias? That is, is the production function in fact identified from panel data on inputs and output under the structural assumptions that the proxy variable technique places on the data? To date there has been no formal demonstration of identification in this literature. On the other hand, there have been various red flags concerning the identification foundations for this class of estimators (see Bond and Söderbom, 2005 and ACF). However, no clear conclusion regarding non-identification of the model as a whole
1In particular, the standard econometric solutions to correct the transmission bias, i.e., using firm fixed effects or instrumental variables, have proven to be both theoretically problematic and unsatisfactory in practice (see e.g., GM and Ackerberg et al., 2007 for a review and Section 7 of this paper for a discussion). 2Among these patterns are the general understanding that even narrowly defined industries exhibit “massive” unex- plained productivity dispersion (Dhrymes, 1991; Bartelsman and Doms, 2000; Syverson, 2004; Collard-Wexler, 2010; Fox and Smeets, 2011), and that productivity is closely related to other dimensions of firm-level heterogeneity, such as importing (Kasahara and Rodrigue, 2008), exporting (Bernard and Jensen, 1995, Bernard and Jensen, 1999, Bernard et al., 2003), wages (Baily, Hulten, and Campbell, 1992), etc. See Syverson (2011) for a review of this literature.
2 has been reached either. Without an answer to this basic question, it is unclear how to interpret the vast body of empirical results regarding patterns of productivity which rely on the proxy approach for the measurement of productivity. Our first main result is a negative one. We show that the model structure underlying the proxy variable technique does not identify the production function (and hence productivity) in the pres- ence of a flexible input that satisfies the conditions for invertibility invoked in this literature. We establish this result under the LP/Wooldridge approach that uses intermediate inputs as a flexi- ble input that satisfies the proxy variable assumption of being monotone (and hence invertible) in productivity. Under this structure we show how to construct a continuum of observationally equiv- alent production functions that cannot be distinguished from the true production function in the data. We also show that similar non-identification problems arise under the original OP strategy of using investment as the proxy variable.3 Our second contribution is to show that the exact source of this non-identification is the flexible input elasticity for inputs satisfying the proxy variable assumption. The flexible input elasticity defines a partial differential equation on the production function. As we show, if this elasticity were known, then it could be integrated up to nonparametrically identify the part of the production function that depends on the flexible input. We then show that the proxy variable structure is sufficient to nonparametrically identify the remainder of the production function. These results taken together formalize the empirical content of the proxy variable structure that is widely used in applied work - the model is identified up to the flexible input elasticity, but fails to identify the flexible input elasticity itself. Our third contribution is that we present a new empirical strategy that nonparametrically iden- tifies the flexible input elasticity, and hence solves for the missing source of identification for the production function within the proxy variable structure. The key to our approach is that we build on a key economic assumption of the proxy variable structure - that the firm optimally chooses intermediate inputs in response to its realized productivity. Whereas the proxy variable literature
3We do not emphasize OP as the leading case because the applied literature has largely adopted the use of interme- diate inputs as the proxy variable due to the prevalence of zeroes in investment, which was the original motivation for LP. Nevertheless we show that resorting to OP does not solve the non-identification problem we pose for LP.
3 has used this assumption to “invert” and replace for productivity using intermediate inputs, we go further and exploit the nonparametric structural link between the production function and the firm’s first order condition for intermediate inputs. This link allows for a nonparametric regression of the intermediate input’s revenue share on all inputs (labor, capital, and intermediate inputs) to identify the flexible input elasticity. This is a nonparametric analogue of the familiar parametric insight that revenue shares directly identify the intermediate input coefficient in a Cobb-Douglas setting (e.g., Klein, 1953 and Solow, 1957). Our key innovation is that we show that the information in the first order condition can be used in a completely nonparametric way, i.e., without making functional form assumptions on the production function. Our results thus show how this “share regression” can be combined with the remaining content of the proxy variable structure to nonparametrically identify the production function as a whole. This identification strategy - regressing revenue shares on inputs to identify the flexible input elasticity, solving the partial differential equation, and integrating this into the proxy variable struc- ture to identify the remainder of the production function - gives rise to a natural two-step estimator in which a flexible parametric approximation to different components of the production function is estimated in each stage. We present a computationally straightforward implementation of this es- timator and show that the properties of the estimator are equivalent to a standard GMM estimator, which gives us a straightforward approach to inference with the computed parameter estimates. We further relate our solution to the common empirical practice in the literature of estimating value-added production functions that subtract out a flexible input from the model (typically inter- mediate inputs). While value-added specifications may be of direct interest themselves, a common justification for value-added is that they derive directly from an underlying gross output technol- ogy. To the extent that one is interested in features of the gross output production function, value added production functions may appear immune from the non-identification problem we raise, as they explicitly exclude intermediate inputs. We show that, unless the production function is a very specific version of Leontief in value added and intermediate inputs, then value added cannot be used to identify features of interest (including productivity) from the underlying gross output production.
4 Finally, we apply our identification strategy to plant-level data from Colombia and Chile to study the underlying patterns of productivity under gross output compared to value-added specifi- cations. We find that productivity differences become orders of magnitude smaller and sometimes even change sign when we analyze the data via gross output rather than value added. For example, the standard 90/10 productivity ratio taken among all manufacturing firms in Chile is roughly 9 under value added (meaning that the 90th percentile firm is 9 times more productive than the 10th percentile firm), whereas under our gross output estimates this ratio falls to 2. Moreover, these dis- persion ratios exhibit a remarkable degree of stability across industries and across the two countries when measured via gross output, but exhibit much larger cross-industry and cross-country variance when measured via value added. We further show that, as compared to gross output, value added estimates generate economically significant differences in the productivity premium of firms that export, firms that import, firms that advertise, and higher wage firms. In contrast to the view expressed in Syverson (2011), that empirical findings related to produc- tivity are quite robust to measurement choices, our findings illustrate the empirical importance of the distinction between gross output and value-added estimates of productivity. Our results high- light the empirical relevance of our identification strategy for gross output production functions. The results suggest that the distinction between gross output and value added is at least as im- portant, if not more so, than the transmission bias that has been the main focus of the production function estimation literature to date. The rest of the paper is organized as follows. In Section 2 we describe the model and provide a nonparametric characterization of transmission bias. In Section 3 we formally prove that the production function is nonparametrically non-identified under the proxy variable approach. Sec- tion 4 shows that the source of the non-identification is the flexible input elasticity. In Section 5 we present our nonparametric identification strategy. Section 6 describes our estimation strategy. Section 7 compares our approach to the related literature. Section 8 discusses the use of value added. In Section 9 we describe the Colombian and Chilean data and show the results compar- ing gross output to value added for productivity measurement. In particular, we show evidence of large differences in unobserved productivity heterogeneity suggested by value added relative to
5 gross output. Section 10 concludes with an example of the policy relevance of our results.
2 The Model and Identification Problem
We first describe the economic model of production underlying the “proxy variable” approach to estimating production functions (OP/LP/ACF/Wooldridge), which has become a widely-used approach to estimating production functions and productivity in applied work. We then define the identification problem associated with the model and data. Our first main result in this pa- per demonstrates that, despite the ubiquity of these approaches in empirical work, the production function and productivity are not identified by the restrictions imposed by these methods.4,5
2.1 Data and Definitions
We observe a panel consisting of firms j = 1,...,J over periods t = 1,...,T .6 A generic firm’s output, labor, capital, and intermediate inputs will be denoted by (Yt,Kt,Lt,Mt) respectively, and their log values will be denoted in lowercase by (yt, kt, lt, mt). Firms are sampled from an underlying population and the asymptotic dimension of the data is to let the number of firms J → ∞ for a fixed T , i.e., the data takes a short panel form. The data directly identifies the joint distribution of the history of inputs and output of a firm, i.e., the data identifies the joint the
T distribution of the collection of random variables: {(yt, kt, lt, mt)}t=1 .
We let It denote the information set of the firm in period t. The information set It consists of all information the firm can use to solve its period t decision problem, which potentially includes its input choices. By definition (and irrespective of the economics details underlying input decisions) we have that the capital, labor, and intermediate input choices in each period can be expressed as
4As we discuss in Sections 3 and 8, Ackerberg, Caves, and Frazer (2015) avoid the issues we discuss below by carefully considering data-generating-processes under which their procedure can be employed for restricted profit / Leontief specifications of the production function. 5The identification problem we isolate also applies to the dynamic panel approach to production function estimation following Arellano and Bond (1991); Blundell and Bond (1998, 2000). We draw a comparison to the dynamic panel literature in Section 7.3. 6Throughout this section we assume a balanced panel for notational simplicity. We also omit the firm subscript j, except when the context requires it.
6 functions of the information set It, i.e.,
kt = K (It); lt = L (It); mt = M (It) (1)
Let xt ∈ {kt, lt, mt} denote a generic input. If an input xt is such that xt ∈ It, i.e., the period t amount of the input employed in the period is in the firm’s information set for that period, then we say the input is predetermined in period t. Thus a predetermined input is a function of the information set of a prior period,
xt = X (It−1) ∈ It
If an input is not predetermined, and thus xt ∈/ It, then we say the input is variable in period t. If an input is variable and ∂ X (It) 6= 0 ∂xt−τ for τ > 0, i.e., lagged values of the input affect the optimal period t choice of the input, then we say the input is dynamic. Finally, if a variable input is not dynamic, then we say it is flexible.
2.2 The Production Function and Productivity
We assume that the relationship between output and inputs is determined by an underlying produc- tion function F and a Hicks neutral productivity shock νt.
Assumption 1. The relationship between output and the inputs takes the form
νt Yt = F (kt, lt, mt) e ⇐⇒
yt = f (kt, lt, mt) + νt (2)
The Hick’s neutral productivity shock νt is decomposed as νt = ωt+εt. The distinction between
ωt and εt is that ωt is known to the firm before making its period t decisions, whereas εt is an ex- post productivity shock realized only after the period decisions are made. The stochastic behavior of both of these components is explained next.
7 Assumption 2. ωt ∈ It is known to the firm at the time of making its period t decisions, whereas
εt ∈/ It is not. Furthermore ωt is Markovian so that its distribution can be written as Pω (ωt | It−1) =
Pω (ωt | ωt−1). The function h (ωt−1) = E [ωt | ωt−1] is continuous. The shock εt on the other hand is independent of the within period variation in information sets, Pε (εt | It) = Pε (εt).
Given that ωt ∈ It, but εt is completely unanticipated on the basis of It, we will refer to ωt as persistent productivity, εt as ex-post productivity, and νt = ωt + εt as total productivity. Observe that we can express ωt = h(ωt−1) + ηt, where ηt satisfies E [ηt | It−1] = 0. ηt can be interpreted as
7 the, unanticipated at period t − 1, “innovation” to the firm’s persistent productivity ωt in period t.
Without loss of generality, we can normalize E [εt | It] = E [εt] = 0, which is in units of log output. However, the expectation of the ex-post shock, in units of the level of output, becomes a
εt εt 8 free parameter which we denote as E ≡ E [e | It] = E [e ]. Note that the general form of input demand (1) implies E [εt | It, kt, lt, mt] = E [εt | It] = 0 and hence E [εt | kt, lt, mt] = 0 by the law of iterated expectations. Finally, we have the scalar invertibility assumption that allows an input to be used to proxy for productivity. Levinsohn and Petrin (2003) proposed using the structure of a flexible input demand as the basis for such a proxy variable. For simplicity, we focus on the case of a single flexible input in the model, namely intermediate inputs mt, and treat capital kt and labor lt as predetermined in the model. The non-identification problem we demonstrate can be easily adapted to the case where
9 lt is also flexible.
Assumption 3. The scalar unobservability assumption of LP/Wooldridge places the following as-
7 It is straightforward to allow the distribution of Pω (ωt | It−1) to depend upon other elements of It−1, such as firm export or import status, R&D, etc. In these cases ωt becomes a controlled Markov process from the firm’s point of view. See Kasahara and Rodrigue (2008) and Doraszelski and Jaumandreu (2013) for examples. 8See Goldberger (1968) for an early discussion of the implicit reinterpretation of results that arises from ignoring εt E (i.e., setting E≡ E [e ] = 1 while simultaneously setting E [εt] = 0) in the context of Cobb-Douglas production functions. 9 We focus on the case of a single flexible input because allowing lt to be flexible, in addition to intermediate inputs mt, is associated with a set of distinct problems related to the identification of the labor elasticity that were raised by ACF. The key to avoiding the ACF critique is letting labor have sources of variation beyond ωt and yet preserve the scalar unobservability restriction on the intermediate input demand. By treating lt as predetermined we avoid the ACF critique and can focus attention on the flexible input elasticity problem (which is the focus of our paper).
8 sumption on the flexible input demand
mt = M (It) = M (kt, lt, ωt) . (3)
10 The intermediate input demand M is assumed strictly monotone in ωt.
Observe that a key implication of Assumption 3 is that M can be inverted in ωt, allowing productivity to be expressed as a deterministic function of the inputs
−1 ωt = M (kt, lt, mt) .
We restrict our attention to the use of intermediate inputs as a proxy versus the original proxy variable strategy of OP that uses investment. As LP argued, the fact that investment is often zero in plant level data leads to practical challenges in using the OP approach, and as a result using intermediate inputs as a proxy has become the preferred strategy in applied work. Investment as a proxy raises similar identification challenges, which we discuss in Appendix A.
We could have generalized the model to allow the primitives to vary with time t, i.e., ft, Pt,ω, and Pt,ε to all vary by time t. We do not use this more general form of the model in the analysis to follow because the added notational burden distracts from the main ideas of the paper. However, it is straightforward to generalize the analysis that follows to the time-varying case by simply repeating the steps of our analysis separately for each time period t ∈ {2,...,T }.
2.3 Transmission Bias
Given the structure of the production function we can formally state the problem of transmission bias in the nonparametric setting. Transmission bias classically refers to the bias of the OLS re- gression of output on inputs as estimates of a Cobb-Douglas production parameter. In the nonpara- metric setting we can see transmission bias more generally as the empirical problem of regressing
10This approach can be generalized to allow the input demand M to vary by time period t. All of our results can be readily extended to this more general case at the cost of introducing additional notational complexity, which we avoid here in order not to distract from the core conceptual issues we raise.
9 output yt on inputs (kt, lt, mt) which yields
E [yt | kt, lt, mt] = f (kt, lt, mt) + E [ωt | kt, lt, mt]
and hence the elasticity of the regression in the data with respect to an input xt ∈ {kt, lt, mt}
∂ ∂ ∂ E [yt | kt, lt, mt] = f (kt, lt, mt) + E [ωt | kt, lt, mt] ∂xt ∂xt ∂xt is a biased estimate of the true production elasticity ∂ f (k , l , m ). ∂xt t t t Under the proxy variable structure, transmission bias takes a very specific form. This can be seen as follows:
−1 E [yt | kt, lt, mt] = f (kt, lt, mt) + M (kt, lt, mt) ≡ φ (kt, lt, mt) . (4)
Clearly no structural elasticities can be identified from this regression (the “first stage”), in partic- ular the flexible input elasticity, ∂ f (k , l , m ). Instead, all the information from the first stage is ∂mt t t t summarized by the identification of the random variable φt ≡ φ (kt, lt, mt), and as a consequence the ex-post productivity shock εt = yt − E [yt | kt, lt, mt].
The question then becomes whether the part of φt that is due to f (kt, lt, mt) versus the part due to ωt can be separately identified using the second stage restrictions of the model. This second stage is formed by recognizing that
yt = f (kt, lt, mt) + ωt + εt
= f (kt, lt, mt) + h (φt−1 − f (kt−1, lt−1, mt−1)) + ηt + εt. (5)
The challenge in using this equation for identification is the presence of an endogenous variable mt in the model that is correlated with ηt. LP/Wooldridge propose to use instrumental variables for this endogeneity problem by exploit- ing orthogonality restrictions implicit in the model with respect to ηt +εt. In particular Assumption
10 2 implies that for any transformation Γt = Γ (It−1) of the lagged period information set It−1 we have the orthogonality E [ηt + εt | Γt] = 0. We focus on transformations that are observable by the econometrician, in which case Γt will serve as the instrumental variables for the problem. The full vector of potential instrumental variables given the data (as described in section 2.1) consists of all lagged output/input values which, by construction, are transformations of It−1, as well as the current values of the predetermined inputs kt and lt (which by assumption are a transformation of
It−1), i.e., Γt = (kt, lt, yt−1, kt−1, lt−1, mt−1, . . . , y0, k0, l0, m0).
3 Non-Identification
In this section we show that the proxy variable structure of Assumptions 1 - 3 does not suffice to identify the production function. In Theorem 1, we first show that the application of instrumental variables (via the orthogonality restriction E [ηt + εt | Γt] = 0) to the structural equation (5) is insufficient to identify the production function f (and the Markovian process h). However, the orthogonality restriction underlying the instrumental variables approach does not summarize the full structure of the model. In Theorem 2, we show that, even if we treat the model as a simultane- ous system of equations for the determination of the the endogenous variables (yt, mt) in (5), the production function f cannot be identified. Identification of the production function f by instrumental variables is based on projecting output yt onto the exogenous variables Γt (see e.g., Newey and Powell, 2003). This generates a restriction between (f, h) and the distribution of the data Gyt,mt|Γt that takes the form
E [yt | Γt] = E [f (kt, lt, mt) | Γt] + E [ωt | Γt]
= E [f (kt, lt, mt) | Γt] + h (φt−1 − f (kt−1, lt−1, mt−1)) , (6)
where recall that φt−1 ≡ φ (kt−1, lt−1, mt−1) is known from the first stage equation (4). The structural primitives underlying equation (6) are given by (f, h). The true (f 0, h0) are identified if no other f,˜ h˜ among all possible alternatives also satisfy the functional restriction (6) given the
11 11 distribution of the observables Gyt,mt|Γt . We first establish the following useful Lemma. For notational simplicity, we define the random variable ft ≡ f (kt, lt, mt).
Lemma 1. If (f, h) solve the functional restriction (6), then it must be the case that
E [φt − ft | Γt] = h (φt−1 − ft−1)
Proof. Observe that
E [yt | Γt] = E [E [yt | kt, lt, mt] | Γt]
= E [φt | Γt]
by construction of φt. From the definition of yt it follows that
E [φt | Γt] = E [ft | Γt] + h (φt−1 − ft−1) .
Re-arranging terms gives us the Lemma.
Theorem 1. Under the model defined by Assumptions 1 - 3, and given φt ≡ φ (kt, lt, mt) identified from the first stage equation (4), there exists a continuum of alternative f,˜ h˜ defined by
˜ 0 f ≡ (1 − a) f + aφt 1 h˜ (x) ≡ (1 − a) h0 x (1 − a)
for any a ∈ (0, 1), that satisfy the same functional restriction (6) as the true (f 0, h0).
Proof. The proof of the Theorem follows almost immediately from Lemma 1. Given the definition
11 0 0 Less formally, the intuitive idea is that f , h are the unique primitives that explain the reduced form E [yt | Γt] given the model.
12 of f,˜ h˜ we have
˜ ˜ ˜ ft + h φt−1 − ft−1 =
0 0 ˜ 0 ft + a φt − ft + h (1 − a) φt−1 − ft−1 =
0 0 0 0 ft + a φt − ft + (1 − a) h φt−1 − ft−1 .
Now, take the conditional expectation of the above (with respect to Γt) and apply the Lemma
h ˜ i ˜ ˜ E ft | Γt + h φt−1 − ft−1 =
0 0 0 0 0 E ft | Γt + ah φt−1 − ft−1 + (1 − a) h φt−1 − ft−1 =
0 0 0 E ft | Γt + h φt−1 − ft−1 .
Thus (f 0, h0) and f,˜ h˜ satisfy the functional restriction and cannot be distinguished via instru- mental variables.
The intuition for the identification failure established in Theorem 1 can be seen by looking at
equation (5) above. Notice that, by replacing for ωt in the intermediate input demand equation (3) we get