<<

On the Identification of Functions: How Heterogeneous is ?

Amit Gandhi, Salvador Navarro, David Rivers∗

May 30, 2016

Abstract

We show that the “proxy variable” approaches for estimating production functions generi- cally suffer from a fundamental identification problem in the presence of a flexible input that satisfies the “proxy variable” assumptions for invertibility, such as intermediate inputs. We provide a formal proof of nonparametric non-identification, and illustrate that the source of under-identification is the flexible input . Using a transformation of the firm’s first order condition, we develop a new nonparametric identification strategy that addresses this problem, as well as a simple corresponding estimator for the production function. We show that the alternative of approximating the effects of intermediate inputs using a value-added production function cannot generally be used to identify features of from the gross output production function. Applying our approach to plant-level data from Colombia and Chile, we find that a gross output production function implies fundamentally different patterns of productivity heterogeneity than a value-added specification.

∗We would like to thank Dan Ackerberg, Richard Blundell, Juan Esteban Carranza, Allan Collard-Wexler, Ulrich Doraszelski, Steven Durlauf, Jeremy Fox, Silvia Goncalves, Phil Haile, Joel Horowitz, Jean-Francois Houde, Au- reo de Paula, Amil Petrin, Mark Roberts, Nicolas Roys, Chad Syverson, Chris Taber, Quang Vuong, and especially Tim Conley for helpful discussions. This paper has also benefited from detailed comments by the editor and three anonymous referees. We would also like to thank Amil Petrin and David Greenstreet for helping us to obtain the Colombian and Chilean data respectively. Navarro and Rivers acknowledge support from the Social Sciences and Humanities Research Council of Canada. This paper previously circulated under the name “Identification of Produc- tion Functions using Restrictions from Economic Theory.” First draft: May 2006. Amit Gandhi is at the University of Wisconsin-Madison, E-mail: [email protected]. Salvador Navarro is at the University of Western Ontario, E-mail: [email protected]. David Rivers is at the University of Western Ontario, E-mail: [email protected].

1 1 Introduction

The identification and estimation of production functions using data on firm inputs and output is among the oldest empirical problems in . A key challenge for identification arises because firms optimally choose their inputs as a function of their productivity, but productivity is unobserved by the econometrician in the data. This gives rise to a classic simultaneity problem that was first articulated by Marschak and Andrews (1944) and has come to be known in the production function literature as “transmission bias”. In their influential review of the state of the literature nearly 50 years later, Griliches and Mairesse (1998) (henceforth GM) concluded that the “search for identification” for the production function remained a fundamentally open problem.1 Resolving this identification problem is critical to measuring productivity with plant level production data, which has become increasingly available for many countries, and which motivates a variety of industry equilibrium models based on patterns of productivity heterogeneity found in this data.2 In this paper we examine the identification foundations of the “proxy variable” approach for estimating production functions pioneered by Olley and Pakes (1996) and further developed by Levinsohn and Petrin (2003)/Ackerberg, Caves, and Frazer (2015)/Wooldridge (2009) (henceforth OP, LP, ACF and Wooldridge, respectively). Despite the immense popularity of the method in the applied productivity literature, a fundamental question has remained unexplored: does the proxy variable technique solve the identification problem of transmission bias? That is, is the production function in fact identified from panel data on inputs and output under the structural assumptions that the proxy variable technique places on the data? To date there has been no formal demonstration of identification in this literature. On the other hand, there have been various red flags concerning the identification foundations for this class of estimators (see Bond and Söderbom, 2005 and ACF). However, no clear conclusion regarding non-identification of the model as a whole

1In particular, the standard econometric solutions to correct the transmission bias, i.e., using firm fixed effects or instrumental variables, have proven to be both theoretically problematic and unsatisfactory in practice (see e.g., GM and Ackerberg et al., 2007 for a review and Section 7 of this paper for a discussion). 2Among these patterns are the general understanding that even narrowly defined industries exhibit “massive” unex- plained productivity dispersion (Dhrymes, 1991; Bartelsman and Doms, 2000; Syverson, 2004; Collard-Wexler, 2010; Fox and Smeets, 2011), and that productivity is closely related to other dimensions of firm-level heterogeneity, such as importing (Kasahara and Rodrigue, 2008), exporting (Bernard and Jensen, 1995, Bernard and Jensen, 1999, Bernard et al., 2003), (Baily, Hulten, and Campbell, 1992), etc. See Syverson (2011) for a review of this literature.

2 has been reached either. Without an answer to this basic question, it is unclear how to interpret the vast body of empirical results regarding patterns of productivity which rely on the proxy approach for the measurement of productivity. Our first main result is a negative one. We show that the model structure underlying the proxy variable technique does not identify the production function (and hence productivity) in the pres- ence of a flexible input that satisfies the conditions for invertibility invoked in this literature. We establish this result under the LP/Wooldridge approach that uses intermediate inputs as a flexi- ble input that satisfies the proxy variable assumption of being monotone (and hence invertible) in productivity. Under this structure we show how to construct a continuum of observationally equiv- alent production functions that cannot be distinguished from the true production function in the data. We also show that similar non-identification problems arise under the original OP strategy of using investment as the proxy variable.3 Our second contribution is to show that the exact source of this non-identification is the flexible input elasticity for inputs satisfying the proxy variable assumption. The flexible input elasticity defines a partial differential equation on the production function. As we show, if this elasticity were known, then it could be integrated up to nonparametrically identify the part of the production function that depends on the flexible input. We then show that the proxy variable structure is sufficient to nonparametrically identify the remainder of the production function. These results taken together formalize the empirical content of the proxy variable structure that is widely used in applied work - the model is identified up to the flexible input elasticity, but fails to identify the flexible input elasticity itself. Our third contribution is that we present a new empirical strategy that nonparametrically iden- tifies the flexible input elasticity, and hence solves for the missing source of identification for the production function within the proxy variable structure. The key to our approach is that we build on a key economic assumption of the proxy variable structure - that the firm optimally chooses intermediate inputs in response to its realized productivity. Whereas the proxy variable literature

3We do not emphasize OP as the leading case because the applied literature has largely adopted the use of interme- diate inputs as the proxy variable due to the prevalence of zeroes in investment, which was the original motivation for LP. Nevertheless we show that resorting to OP does not solve the non-identification problem we pose for LP.

3 has used this assumption to “invert” and replace for productivity using intermediate inputs, we go further and exploit the nonparametric structural link between the production function and the firm’s first order condition for intermediate inputs. This link allows for a nonparametric regression of the intermediate input’s revenue share on all inputs (labor, , and intermediate inputs) to identify the flexible input elasticity. This is a nonparametric analogue of the familiar parametric insight that revenue shares directly identify the intermediate input coefficient in a Cobb-Douglas setting (e.g., Klein, 1953 and Solow, 1957). Our key innovation is that we show that the information in the first order condition can be used in a completely nonparametric way, i.e., without making functional form assumptions on the production function. Our results thus show how this “share regression” can be combined with the remaining content of the proxy variable structure to nonparametrically identify the production function as a whole. This identification strategy - regressing revenue shares on inputs to identify the flexible input elasticity, solving the partial differential equation, and integrating this into the proxy variable struc- ture to identify the remainder of the production function - gives rise to a natural two-step estimator in which a flexible parametric approximation to different components of the production function is estimated in each stage. We present a computationally straightforward implementation of this es- timator and show that the properties of the estimator are equivalent to a standard GMM estimator, which gives us a straightforward approach to inference with the computed parameter estimates. We further relate our solution to the common empirical practice in the literature of estimating value-added production functions that subtract out a flexible input from the model (typically inter- mediate inputs). While value-added specifications may be of direct interest themselves, a common justification for value-added is that they derive directly from an underlying gross output technol- ogy. To the extent that one is interested in features of the gross output production function, value added production functions may appear immune from the non-identification problem we raise, as they explicitly exclude intermediate inputs. We show that, unless the production function is a very specific version of Leontief in value added and intermediate inputs, then value added cannot be used to identify features of interest (including productivity) from the underlying gross output production.

4 Finally, we apply our identification strategy to plant-level data from Colombia and Chile to study the underlying patterns of productivity under gross output compared to value-added specifi- cations. We find that productivity differences become orders of magnitude smaller and sometimes even change sign when we analyze the data via gross output rather than value added. For example, the standard 90/10 productivity ratio taken among all manufacturing firms in Chile is roughly 9 under value added (meaning that the 90th percentile firm is 9 times more productive than the 10th percentile firm), whereas under our gross output estimates this ratio falls to 2. Moreover, these dis- persion ratios exhibit a remarkable degree of stability across industries and across the two countries when measured via gross output, but exhibit much larger cross-industry and cross-country variance when measured via value added. We further show that, as compared to gross output, value added estimates generate economically significant differences in the productivity premium of firms that export, firms that import, firms that advertise, and higher firms. In contrast to the view expressed in Syverson (2011), that empirical findings related to produc- tivity are quite robust to measurement choices, our findings illustrate the empirical importance of the distinction between gross output and value-added estimates of productivity. Our results high- light the empirical relevance of our identification strategy for gross output production functions. The results suggest that the distinction between gross output and value added is at least as im- portant, if not more so, than the transmission bias that has been the main focus of the production function estimation literature to date. The rest of the paper is organized as follows. In Section 2 we describe the model and provide a nonparametric characterization of transmission bias. In Section 3 we formally prove that the production function is nonparametrically non-identified under the proxy variable approach. Sec- tion 4 shows that the source of the non-identification is the flexible input elasticity. In Section 5 we present our nonparametric identification strategy. Section 6 describes our estimation strategy. Section 7 compares our approach to the related literature. Section 8 discusses the use of value added. In Section 9 we describe the Colombian and Chilean data and show the results compar- ing gross output to value added for productivity measurement. In particular, we show evidence of large differences in unobserved productivity heterogeneity suggested by value added relative to

5 gross output. Section 10 concludes with an example of the policy relevance of our results.

2 The Model and Identification Problem

We first describe the economic model of production underlying the “proxy variable” approach to estimating production functions (OP/LP/ACF/Wooldridge), which has become a widely-used approach to estimating production functions and productivity in applied work. We then define the identification problem associated with the model and data. Our first main result in this pa- per demonstrates that, despite the ubiquity of these approaches in empirical work, the production function and productivity are not identified by the restrictions imposed by these methods.4,5

2.1 Data and Definitions

We observe a panel consisting of firms j = 1,...,J over periods t = 1,...,T .6 A generic firm’s output, labor, capital, and intermediate inputs will be denoted by (Yt,Kt,Lt,Mt) respectively, and their log values will be denoted in lowercase by (yt, kt, lt, mt). Firms are sampled from an underlying population and the asymptotic dimension of the data is to let the number of firms J → ∞ for a fixed T , i.e., the data takes a short panel form. The data directly identifies the joint of the history of inputs and output of a firm, i.e., the data identifies the joint the

T distribution of the collection of random variables: {(yt, kt, lt, mt)}t=1 .

We let It denote the information set of the firm in period t. The information set It consists of all information the firm can use to solve its period t decision problem, which potentially includes its input choices. By definition (and irrespective of the economics details underlying input decisions) we have that the capital, labor, and intermediate input choices in each period can be expressed as

4As we discuss in Sections 3 and 8, Ackerberg, Caves, and Frazer (2015) avoid the issues we discuss below by carefully considering data-generating-processes under which their procedure can be employed for restricted profit / Leontief specifications of the production function. 5The identification problem we isolate also applies to the dynamic panel approach to production function estimation following Arellano and Bond (1991); Blundell and Bond (1998, 2000). We draw a comparison to the dynamic panel literature in Section 7.3. 6Throughout this section we assume a balanced panel for notational simplicity. We also omit the firm subscript j, except when the context requires it.

6 functions of the information set It, i.e.,

kt = K (It); lt = L (It); mt = M (It) (1)

Let xt ∈ {kt, lt, mt} denote a generic input. If an input xt is such that xt ∈ It, i.e., the period t amount of the input employed in the period is in the firm’s information set for that period, then we say the input is predetermined in period t. Thus a predetermined input is a function of the information set of a prior period,

xt = X (It−1) ∈ It

If an input is not predetermined, and thus xt ∈/ It, then we say the input is variable in period t. If an input is variable and ∂ X (It) 6= 0 ∂xt−τ for τ > 0, i.e., lagged values of the input affect the optimal period t choice of the input, then we say the input is dynamic. Finally, if a variable input is not dynamic, then we say it is flexible.

2.2 The Production Function and Productivity

We assume that the relationship between output and inputs is determined by an underlying produc- tion function F and a Hicks neutral productivity shock νt.

Assumption 1. The relationship between output and the inputs takes the form

νt Yt = F (kt, lt, mt) e ⇐⇒

yt = f (kt, lt, mt) + νt (2)

The Hick’s neutral productivity shock νt is decomposed as νt = ωt+εt. The distinction between

ωt and εt is that ωt is known to the firm before making its period t decisions, whereas εt is an ex- post productivity shock realized only after the period decisions are made. The stochastic behavior of both of these components is explained next.

7 Assumption 2. ωt ∈ It is known to the firm at the time of making its period t decisions, whereas

εt ∈/ It is not. Furthermore ωt is Markovian so that its distribution can be written as Pω (ωt | It−1) =

Pω (ωt | ωt−1). The function h (ωt−1) = E [ωt | ωt−1] is continuous. The shock εt on the other hand is independent of the within period variation in information sets, Pε (εt | It) = Pε (εt).

Given that ωt ∈ It, but εt is completely unanticipated on the basis of It, we will refer to ωt as persistent productivity, εt as ex-post productivity, and νt = ωt + εt as total productivity. Observe that we can express ωt = h(ωt−1) + ηt, where ηt satisfies E [ηt | It−1] = 0. ηt can be interpreted as

7 the, unanticipated at period t − 1, “innovation” to the firm’s persistent productivity ωt in period t.

Without loss of generality, we can normalize E [εt | It] = E [εt] = 0, which is in units of log output. However, the expectation of the ex-post shock, in units of the level of output, becomes a

εt εt 8 free parameter which we denote as E ≡ E [e | It] = E [e ]. Note that the general form of input demand (1) implies E [εt | It, kt, lt, mt] = E [εt | It] = 0 and hence E [εt | kt, lt, mt] = 0 by the law of iterated expectations. Finally, we have the scalar invertibility assumption that allows an input to be used to proxy for productivity. Levinsohn and Petrin (2003) proposed using the structure of a flexible input demand as the basis for such a proxy variable. For simplicity, we focus on the case of a single flexible input in the model, namely intermediate inputs mt, and treat capital kt and labor lt as predetermined in the model. The non-identification problem we demonstrate can be easily adapted to the case where

9 lt is also flexible.

Assumption 3. The scalar unobservability assumption of LP/Wooldridge places the following as-

7 It is straightforward to allow the distribution of Pω (ωt | It−1) to depend upon other elements of It−1, such as firm export or import status, R&D, etc. In these cases ωt becomes a controlled Markov process from the firm’s point of view. See Kasahara and Rodrigue (2008) and Doraszelski and Jaumandreu (2013) for examples. 8See Goldberger (1968) for an early discussion of the implicit reinterpretation of results that arises from ignoring εt E (i.e., setting E≡ E [e ] = 1 while simultaneously setting E [εt] = 0) in the context of Cobb-Douglas production functions. 9 We focus on the case of a single flexible input because allowing lt to be flexible, in addition to intermediate inputs mt, is associated with a set of distinct problems related to the identification of the labor elasticity that were raised by ACF. The key to avoiding the ACF critique is letting labor have sources of variation beyond ωt and yet preserve the scalar unobservability restriction on the intermediate input demand. By treating lt as predetermined we avoid the ACF critique and can focus attention on the flexible input elasticity problem (which is the focus of our paper).

8 sumption on the flexible input demand

mt = M (It) = M (kt, lt, ωt) . (3)

10 The intermediate input demand M is assumed strictly monotone in ωt.

Observe that a key implication of Assumption 3 is that M can be inverted in ωt, allowing productivity to be expressed as a deterministic function of the inputs

−1 ωt = M (kt, lt, mt) .

We restrict our attention to the use of intermediate inputs as a proxy versus the original proxy variable strategy of OP that uses investment. As LP argued, the fact that investment is often zero in plant level data leads to practical challenges in using the OP approach, and as a result using intermediate inputs as a proxy has become the preferred strategy in applied work. Investment as a proxy raises similar identification challenges, which we discuss in Appendix A.

We could have generalized the model to allow the primitives to vary with time t, i.e., ft, Pt,ω, and Pt,ε to all vary by time t. We do not use this more general form of the model in the analysis to follow because the added notational burden distracts from the main ideas of the paper. However, it is straightforward to generalize the analysis that follows to the time-varying case by simply repeating the steps of our analysis separately for each time period t ∈ {2,...,T }.

2.3 Transmission Bias

Given the structure of the production function we can formally state the problem of transmission bias in the nonparametric setting. Transmission bias classically refers to the bias of the OLS re- gression of output on inputs as estimates of a Cobb-Douglas production parameter. In the nonpara- metric setting we can see transmission bias more generally as the empirical problem of regressing

10This approach can be generalized to allow the input demand M to vary by time period t. All of our results can be readily extended to this more general case at the cost of introducing additional notational complexity, which we avoid here in order not to distract from the core conceptual issues we raise.

9 output yt on inputs (kt, lt, mt) which yields

E [yt | kt, lt, mt] = f (kt, lt, mt) + E [ωt | kt, lt, mt]

and hence the elasticity of the regression in the data with respect to an input xt ∈ {kt, lt, mt}

∂ ∂ ∂ E [yt | kt, lt, mt] = f (kt, lt, mt) + E [ωt | kt, lt, mt] ∂xt ∂xt ∂xt is a biased estimate of the true production elasticity ∂ f (k , l , m ). ∂xt t t t Under the proxy variable structure, transmission bias takes a very specific form. This can be seen as follows:

−1 E [yt | kt, lt, mt] = f (kt, lt, mt) + M (kt, lt, mt) ≡ φ (kt, lt, mt) . (4)

Clearly no structural elasticities can be identified from this regression (the “first stage”), in partic- ular the flexible input elasticity, ∂ f (k , l , m ). Instead, all the information from the first stage is ∂mt t t t summarized by the identification of the random variable φt ≡ φ (kt, lt, mt), and as a consequence the ex-post productivity shock εt = yt − E [yt | kt, lt, mt].

The question then becomes whether the part of φt that is due to f (kt, lt, mt) versus the part due to ωt can be separately identified using the second stage restrictions of the model. This second stage is formed by recognizing that

yt = f (kt, lt, mt) + ωt + εt

= f (kt, lt, mt) + h (φt−1 − f (kt−1, lt−1, mt−1)) + ηt + εt. (5)

The challenge in using this equation for identification is the presence of an endogenous variable mt in the model that is correlated with ηt. LP/Wooldridge propose to use instrumental variables for this endogeneity problem by exploit- ing orthogonality restrictions implicit in the model with respect to ηt +εt. In particular Assumption

10 2 implies that for any transformation Γt = Γ (It−1) of the lagged period information set It−1 we have the orthogonality E [ηt + εt | Γt] = 0. We focus on transformations that are observable by the econometrician, in which case Γt will serve as the instrumental variables for the problem. The full vector of potential instrumental variables given the data (as described in section 2.1) consists of all lagged output/input values which, by construction, are transformations of It−1, as well as the current values of the predetermined inputs kt and lt (which by assumption are a transformation of

It−1), i.e., Γt = (kt, lt, yt−1, kt−1, lt−1, mt−1, . . . , y0, k0, l0, m0).

3 Non-Identification

In this section we show that the proxy variable structure of Assumptions 1 - 3 does not suffice to identify the production function. In Theorem 1, we first show that the application of instrumental variables (via the orthogonality restriction E [ηt + εt | Γt] = 0) to the structural equation (5) is insufficient to identify the production function f (and the Markovian process h). However, the orthogonality restriction underlying the instrumental variables approach does not summarize the full structure of the model. In Theorem 2, we show that, even if we treat the model as a simultane- ous system of equations for the determination of the the endogenous variables (yt, mt) in (5), the production function f cannot be identified. Identification of the production function f by instrumental variables is based on projecting output yt onto the exogenous variables Γt (see e.g., Newey and Powell, 2003). This generates a restriction between (f, h) and the distribution of the data Gyt,mt|Γt that takes the form

E [yt | Γt] = E [f (kt, lt, mt) | Γt] + E [ωt | Γt]

= E [f (kt, lt, mt) | Γt] + h (φt−1 − f (kt−1, lt−1, mt−1)) , (6)

where recall that φt−1 ≡ φ (kt−1, lt−1, mt−1) is known from the first stage equation (4). The structural primitives underlying equation (6) are given by (f, h). The true (f 0, h0) are identified if   no other f,˜ h˜ among all possible alternatives also satisfy the functional restriction (6) given the

11 11 distribution of the observables Gyt,mt|Γt . We first establish the following useful Lemma. For notational simplicity, we define the random variable ft ≡ f (kt, lt, mt).

Lemma 1. If (f, h) solve the functional restriction (6), then it must be the case that

E [φt − ft | Γt] = h (φt−1 − ft−1)

Proof. Observe that

E [yt | Γt] = E [E [yt | kt, lt, mt] | Γt]

= E [φt | Γt]

by construction of φt. From the definition of yt it follows that

E [φt | Γt] = E [ft | Γt] + h (φt−1 − ft−1) .

Re-arranging terms gives us the Lemma.

Theorem 1. Under the model defined by Assumptions 1 - 3, and given φt ≡ φ (kt, lt, mt) identified   from the first stage equation (4), there exists a continuum of alternative f,˜ h˜ defined by

˜ 0 f ≡ (1 − a) f + aφt  1  h˜ (x) ≡ (1 − a) h0 x (1 − a)

for any a ∈ (0, 1), that satisfy the same functional restriction (6) as the true (f 0, h0).

Proof. The proof of the Theorem follows almost immediately from Lemma 1. Given the definition

11 0 0 Less formally, the intuitive idea is that f , h are the unique primitives that explain the reduced form E [yt | Γt] given the model.

12   of f,˜ h˜ we have

˜ ˜  ˜  ft + h φt−1 − ft−1 =

0 0 ˜ 0  ft + a φt − ft + h (1 − a) φt−1 − ft−1 =

0 0 0 0  ft + a φt − ft + (1 − a) h φt−1 − ft−1 .

Now, take the conditional expectation of the above (with respect to Γt) and apply the Lemma

h ˜ i ˜  ˜  E ft | Γt + h φt−1 − ft−1 =

 0  0 0  0 0  E ft | Γt + ah φt−1 − ft−1 + (1 − a) h φt−1 − ft−1 =

 0  0 0  E ft | Γt + h φt−1 − ft−1 .

  Thus (f 0, h0) and f,˜ h˜ satisfy the functional restriction and cannot be distinguished via instru- mental variables.

The intuition for the identification failure established in Theorem 1 can be seen by looking at

equation (5) above. Notice that, by replacing for ωt in the intermediate input demand equation (3) we get

−1   mt = M kt, lt, h M (kt−1, lt−1, mt−1) + ηt .

This implies that the only source of variation left in mt after conditioning on

(kt, lt, kt−1, lt−1, mt−1) ∈ Γt (which are used as instruments for themselves) is the unobservable ηt.

Therefore, despite the apparent abundance of instruments in Γt, all of the remaining elements in Γt

are orthogonal to this remaining source of variation, ηt, and hence have no power as instruments. Theorem 1 calls into question existing applied work on productivity that employs the proxy variable technique to recover productivity. A standard application in the literature will employ a

“flexible parametric approximation” fβ parametrized by a finite dimensional parameter β to the production function f. The standard estimator applies the restrictions of the first stage and second

stage in a GMM formulation where the moments are defined by E [εtxt] = 0 for xt ∈ {kt, lt, mt}

13 and E [(ηt + εt) xt] for xt ∈ Γt. So long as the number of moment restrictions (determined by the number of elements in Γt the estimator exploits) exceeds the dimensionality of β, then such an approach would appear identified. However, our Theorem 1 shows that this simple process of “counting moment equations” and unknown parameters is deceiving. In so far as such estimators are consistent for β, it is only because of the parametric structure being employed. Theorem 1 establishes that the production function is nonparametrically non- identified by the moment restrictions underlying these techniques. However, the researcher will typically have little basis for imposing parametric restrictions, and, if the parametric restrictions are not correct, this can generate misleading inferences about the production function and productivity (see Manski, 2003; Roehrig, 1988; and Matzkin, 2007 for more detail). Furthermore, as we show in Appendix B, for the case of the commonly-employed Cobb-Douglas parametric form, even imposing structural parametric assumptions is not necessarily sufficient to solve the identification problem. The result in Theorem 1 is a useful benchmark, as it relates directly to the approach used in the proxy variable literature. However, this instrumental variables approach does not necessarily ex- haust the sources of identification inherent in the proxy variable structure. First, since instrumental variables is based only on conditional expectations, it does not employ the entire distribution of the

data (yt, mt, Γt). Second, it does not directly account for the fact that Assumption 3 also imposes restrictions (scalar unobservability and monotonicity) on the determination of the endogenous vari-

able mt via M (·). Therefore, the proxy variable structure imposes restrictions on a simultaneous

system of equations because, in addition to the model for output, yt, via the production function,

there is a model for the proxy variable, in this case intermediate inputs, mt.

We now extend our non-identification result to show that the complete structure Θ = (f, h, M)

cannot be identified from the full joint distribution Gyt,mt|Γt of the data. For a structure Θ, let

Θ −1 εt = yt − f (kt, lt, mt) − M (kt, lt, mt) ,

14 and

Θ −1 −1  ηt = M (kt, lt, mt) − h M (kt−1, lt−1, mt−1) .

In order to relate the structure Θ to the joint distribution of the data Gyt,mt|Γt through the model, a

joint distribution of the G Θ Θ needs to be specified. Let E (·) denote the expectation operator ηt ,εt |Γt G taken with respect to distribution G. We say that a structure Θ rationalizes the data if: (i) there

exists a joint distribution G Θ Θ = G Θ × G Θ that generates the joint distribution G ; ηt ,εt |Γt ηt |Γt εt yt,mt|Γt  Θ  (ii) satisfies the first stage moment restriction EG Θ εt | kt, lt, mt = 0; (iii) satisfies the IV or- εt  Θ Θ  thogonality restriction EG Θ Θ ηt + εt | Γt = 0; and (iv) satisfies Assumption 3 (i.e., scalar ηt ,εt |Γt unobservability and monotonicity of M). Following Matzkin (2007), we say that, if there exists an alternative structure Θ˜ 6= Θ0 that rationalizes the data, then the structure Θ0 is not identified from

the joint distribution Gyt,mt|Γt of the data.

Theorem 2. Given the true structure Θ0 = (f 0, h0, M0) and Assumptions 1 - 3, there always exists a continuum of alternative structures Θ˜ 6= Θ0, defined by

0 0−1 f˜ ≡ f + a M  1  h˜ (x) ≡ (1 − a) h0 x (1 − a) −1 0−1 M˜ ≡ (1 − a) M

for any a ∈ (0, 1), that exactly rationalize the data Gyt,mt|Γt .

Proof. Let ˚x denote a particular value of the random variable x in its support. We first observe that,

for any hypothetical structure Θ = (f, h, ), there always exists a distribution G Θ Θ defined M ηt ,εt |Γt by

G Θ Θ (˚η ,˚ε | Γ ) = ηt ,εt |Γt t t t     −1 ˚εt + f (kt, lt, M (kt, lt, h (M (kt−1, lt−1, mt−1)) + ˚ηt))    ,  G  −1 −1 Γ  , yt,mt|Γt  +M (kt, lt, M (kt, lt, h (M (kt−1, lt−1, mt−1)) + ˚ηt)) t   −1 M (kt, lt, h (M (kt−1, lt−1, mt−1)) + ˚ηt)

15 that generates the conditional distribution of the data Gyt,mt|Γt through the model, hence (i) is satisfied. h i Second, since the true model rationalizes the data, it follows that E εΘ0 | k , l , m = 0. G Θ0 t t t t εt Θ˜ The εt implied by our alternative structure is given by

Θ˜ ˜ ˜ −1 εt = yt − f (kt, lt, mt) − M (kt, lt, mt)

0 0−1 0−1 = yt − f (kt, lt, mt) − a M (kt, lt, mt) − (1 − a) M (kt, lt, mt)

0 0−1 = yt − f (kt, lt, mt) − M (kt, lt, mt)

Θ0 = εt ,

so it trivially satisfies the moment restriction in (ii). Third, it follows that

  Θ˜ Θ˜ ˜ ˜ ˜ −1 ηt + εt = yt − f (kt, lt, mt) − h M (kt−1, lt−1, mt−1)

0 0−1 0  0−1  = yt − f (kt, lt, mt) − a M (kt, lt, mt) − (1 − a) h M (kt−1, lt−1, mt−1)

0 0  0−1  = yt − f (kt, lt, mt) − h M (kt−1, lt−1, mt−1) | {z } Θ0 Θ0 ηt +εt       0−1 0 0−1  −a  M (kt, lt, mt) − h M (kt−1, lt−1, mt−1)  | {z } Θ0 ηt Θ0 Θ0 = (1 − a) ηt + εt .

  Since εΘ˜ = εΘ0 , it immediately follows that E εΘ˜ | Γ = 0. It also follows that ηΘ˜ = t t G Θ˜ Θ˜ t t t ηt ,εt |Γt

16 Θ0 (1 − a) ηt . By a simple change of variables we have that

    E ηΘ˜ | Γ = E ηΘ˜ | Γ G Θ˜ Θ˜ t t G Θ˜ t t ηt ,εt |Γt ηt |Γt ! ηΘ˜ = E t | Γ G Θ0 t ηt |Γt (1 − a)

 0  = E ηΘ | Γ G Θ0 t t ηt |Γt = 0.

Hence, our alternative structure satisfies the moment restriction in (iii). −1 −1   −1 Finally we notice that, since (M0) is invertible given Assumption 3, M˜ ≡ (1 − a)(M0) is therefore also invertible and hence satisfies Assumption 3 (i.e., (iv)) as well. Since both Θ˜ and Θ0 satisfy requirements (i)-(iv), i.e., both rationalize the data, we conclude that Θ0 is not identi- fied.

While we have focused our discussion on gross output production functions, a similar non- identification result generally arises in a value-added specification in which either capital or labor is flexible and satisfies Assumption 3. Notice that it is not necessary that this flexible input be used as the proxy, just that it satisfies scalar unobservability and monotonicity. Hence, in the value-added case with intermediate inputs used as the proxy, if labor is also flexible and satisfies Assumption 3, it will be subject to the same identification problems described in this section. A notable exception is Ackerberg, Caves, and Frazer (2015), which carefully lays out a description of the data-generating-processes under which their procedure can be employed for value-added specifications.12 However, based in part on our non-identification results above, ACF suggest not applying their procedure to gross output production functions that are not Leontief in intermediate inputs.

12 In particular, in one of the cases they present, they show that when labor is a flexible input that does not satisfy Assumption 3 due to persistent and unobserved wage shocks, its elasticity is identified.

17 4 The Empirical Content of the Model

Our theorems in the previous section show that the LP/Wooldridge approach is nonparametrically under-identified. Our next result makes precise the exact source of under-identification. In par- ticular, if the flexible input elasticity ∂ f (k , l , m ) were nonparametrically known, then the ∂mt t t t structure of the second stage of the model is informative enough to identify the remainder of the production function nonparametrically. We present this result for a single flexible input mt, and show how it extends to multiple flexible input elasticities in Appendix C. The idea is that the flexible input elasticity defines a partial differential equation that can be integrated up to identify the part of the production function f related to the intermediate input

13 mt. By the fundamental theorem of calculus we have

Z ∂ f (kt, lt, mt) dmt = f (kt, lt, mt) + C (kt, lt) . (7) ∂mt

Subtracting equation (7) from the production function, and re-arranging terms we have

Z ∂ Yt ≡ yt − εt − f (kt, lt, mt) dmt = −C (kt, lt) + ωt. (8) ∂mt

Notice that Yt is an “observable” random variable as it is a function of data and the flexible input elasticity which is assumed to be known. It also depends on the ex-post shock εt, which can be recovered, for example, from the first stages of the OP/LP/ACF procedures. Applying the Markov structure on productivity that is the basis for the “second stage” moments of the OP/LP/ACF procedure gives

Yt = −C (kt, lt) + h (Yt−1 + C (kt−1, lt−1)) + ηt. (9)

Since (kt, lt, kt−1, lt−1, mt−1, yt−1 − εt−1) is a transformation of the information set It−1, and

Yt−1 is a function of these variables, we have the orthogonality E [ηt | kt, lt, Yt−1, kt−1, lt−1] = 0

13See Houthakker (1950) for a similar solution to the related problem of how to recover the utility function from the demand functions.

18 which implies

E [Yt | kt, lt, Yt−1, kt−1, lt−1] = −C (kt, lt) + h (Yt−1 + C (kt−1, lt−1)) . (10)

This regression identified in the data will allow us to identify the last component of the production function C up to an additive constant.14,15 We now establish this result formally based on the observations of the above discussion. We will use the following regularity condition on the support of the regressors

(kt, lt, Yt−1, kt−1, lt−1) (adapted from Newey, Powell, and Vella, 1999). ¯ ¯ ¯  Assumption 4. For each point Yt, kt−1, lt−1 in the support of (Yt−1, kt−1, lt−1), the boundary of ¯ ¯ ¯  the support of (kt, lt) conditional on Yt, kt−1, lt−1 has a probability measure zero.

Assumption 4 is a condition that states that we can independently vary the predetermined inputs

(kt, lt) conditional on (Yt−1, kt−1, lt−1) within the support. This implicitly assumes the existence of enough variation in the input demand functions for the predetermined inputs to induce open

set variation in them conditional on the lagged output and input values (Yt−1, kt−1, lt−1). This condition makes explicit the variation that allows for nonparametric identification of the remainder of the production function under the second stage moments above. A version of this assumption is thus implicit in the LP/ACF/Wooldridge procedures.

Theorem 3. Under Assumptions 1 - 4, if ∂ f (k , l , m ) is nonparametrically known, then the ∂mt t t t production function f is nonparametrically identified up to an additive constant.

Proof. The theorem assumes that ∂ f (k , l , m ) is known almost everywhere in (k , l , m ). As- ∂mt t t t t t t

sumptions 2, 3, and 4 ensure that with probability 1 for any (kt, lt, mt) in the support of the data

14 Notice that in the OP/LP/ACF procedures, Yt can only be formed for observations in which the proxy vari- able is strictly positive. Observations that violate the strict monotonicity of the proxy equation need to be dropped from the first stage, which implies that εt cannot be recovered. This introduces a selection bias since E [ηt | kt, lt, Yt−1, kt−1, lt−1, ιt > 0] 6= E [ηt | kt, lt, Yt−1, kt−1, lt−1], where ιt is the proxy variable, and equation (10) does not hold. The reason is that firms that receive lower draws of ηt are more likely to choose non-positive values of the proxy, and this probability is a function of the other state variables of the firm. An alternative is to form ηt +εt = R ∂ yt − f (kt, lt, mt) dmt + (kt, lt) + h (Yt−1 + (kt−1, lt−1)) , which does not require one to recover εt, and ∂mt C C hence can be formed for all observations. One can then use moments of the form E [ηt + εt | kt, lt, Yt−1, kt−1, lt−1] = 0 to recover the rest of the production function: −C (kt, lt) + h (Yt−1 + C (kt−1, lt−1)). 15As it is well known, it is not possible to separately identify a constant in the production function from mean productivity, E [ωt].

19 there is a set

{(k, l, m) | k = kt, l = lt, m ∈ [m (kt, lt) , mt]}

also contained in the support for some m (kt, lt). Hence with probability 1 the integral

Z mt ∂ f (kt, lt, m) dm = f (kt, lt, mt) + C (kt, lt) m(kt,lt) ∂mt is identified, where the equality follows from the fundamental theorem of calculus. Therefore, if two production functions f and f˜ give rise to the same input elasticity ∂ f (k , l , m ) over the ∂mt t t t support of the data, then they can only differ by an additive function C (kt, lt) . To identify this additive function, observe that we can identify the joint distribution of (Yt, kt, lt, Yt−1, kt−1, lt−1) for Yt defined by (8). Thus the regression function

E [Yt | kt, lt, Yt−1, kt−1, lt−1] = r (kt, lt, Yt−1, kt−1, lt−1) (11)

can be identified for almost all xt = (kt, lt, Yt−1, kt−1, lt−1), where recall that r = −C (kt, lt) +  ˜ ˜ h (Yt−1 + C (kt−1, lt−1)). Let C , h be a candidate alternative structure. The two structures   (C , h) and C˜, h˜ are observationally equivalent if and only if

˜ ˜  ˜  −C (kt, lt) + h (Yt−1 + C (kt−1, lt−1)) = −Ct (kt, lt) + h Yt−1 + C (kt−1, lt−1) , (12)

for almost all points in the support of xt. Our support assumption (4) on (kt, lt) allows us to take partial derivatives of both sides of (12) with respect to kt and lt

∂ ∂ ˜ C (kt, lt) = C (kt, lt) ∂z ∂z

˜ for z ∈ {kt, lt} and for all xt in its support, which implies C (kt, lt) − C (kt, lt) = c for a constant c for almost all xt. Thus we have shown the production function is identified up to an additive constant.

The key insight this result provides is that it makes precise the true empirical content of the

20 proxy variable structure places on the economic environment surrounding the production func- tion. In particular, it shows the empirical content of the structure of the model is such that it can nonparametrically identify the predetermined input elasticities given the flexible input elasticities. However, from our results in Section 3, there is not enough variation in the model to identify the flexible input elasticities themselves. Hence, our Theorems 1 - 3 establish that the precise source of under-identification of the proxy variable model is the elasticity ∂ f (k , l , m ) of the flexible ∂mt t t t input mt that is used as the proxy variable. The conclusion is that the proxy variable structure in LP/Wooldridge is too weak to identify this flexible input elasticity.

The simple intuition for this fact can be seen by juxtaposing the production function yt =

f (kt, lt, mt) + ωt + εt against the structure on intermediate input demand mt = M (kt, lt, ωt).

Observe first that mt is an endogenous variable in the model as it is also a function of the same

productivity shock ωt that determines output yt. However this endogenous variable does not admit any source of variation from outside the production function - the only input demand shifter aside

from the other inputs in the production function is productivity ωt. In particular, the elasticity is

identified with how output varies with mt holding fixed (kt, lt), but the only source of variation in

mt (namely ωt) also simultaneously shifts output yt. Thus it would seem impossible to identify an

elasticity of the production function with respect to intermediate inputs mt. One way out of this problem is to allow for observed shifters that enter the flexible input

demand M, but are excluded from the production function.16 Flexible input and output prices that vary by firm are one natural source of variation and it has been considered recently by Doraszelski and Jaumandreu (2013, 2015). In Section 7 we provide a more detailed discussion of the use of prices as exclusions restrictions, and the circumstances under which they can be used. In the next section, however, we present an alternative approach to identify the flexible input elasticity which is widely applicable and does not rely on exogenously varying observable price differences among firms within an industry.

16It may be possible to achieve identification in the absence of exclusion restrictions by imposing additional re- strictions. One example is using heteroskedasticity restrictions (see e.g., Rigobon, 2003; Klein and Vella, 2010; and Lewbel, 2012), although these approaches require explicit restrictions on the form of the error structure. We thank an anonymous referee for pointing this out. We are not aware of any applications of these ideas in the production function setting.

21 5 Nonparametric Identification of the Flexible Input Elasticity

The source of the identification problem underlying our Theorems 1 and 2 is that the data cannot

disentangle f from M−1 in either the first or second stages of the proxy variable technique. Our solution is to use the restrictions implied by optimal firm behavior which underlie the input demand function M. The key idea is to recognize that f and M are not independent functions for an optimizing firm, but instead the input demand M is implicitly defined by f through the firm’s first order condition. We show that this functional relationship can be exploited in a fully nonparametric fashion (i.e., without imposing parametric structure on f) to solve the non-identification problem.17 The key reason why we are able to use the first order condition with such generality is that the key assumption of the model - Assumption 3 - already presumes intermediate inputs are a flexible input, thus making the economics of this input choice especially tractable. We focus attention in the main body on the classic case of perfect competition in the interme- diate input and output markets. The perfect competition case makes our proposed solution to the identification problem caused by intermediate inputs particularly transparent. In Appendix C4, we show that our approach can be extended to the case of monopolistic competition with unobserved output prices.

Assumption 5. Firms are price takers in the output and intermediate input market, with ρt denot-

ing the common intermediate input price and Pt denoting the common output price facing all firms

3 in period t. The production function f is differentiable at all (k, l, m) ∈ R++. Firms maximize expected discounted profits.

We can view Assumption 5 as a natural strengthening of Assumption 3. Indeed the monotonic-

ity of M in Assumption 3 is typically justified following from the market structure, Assumption 5, under suitable shape restrictions on the production function f (see Appendix A in Levinsohn and Petrin, 2003). However, an important distinction is that our approach can be generalized to allow for additional unobservables in the firm’s problem that scalar unobservability, Assumption 3, cannot accommodate (see Appendix C).

17Please see Appendix C3 for the extension to the case of multiple flexible inputs.

22 Assumption 5 allows us to establish the link between M and the production function f. The firm’s profit maximization problem with respect to intermediate inputs is

 ωt+εt  M (kt, lt, ωt) = arg max PtE F (kt, lt, mt) e | It − ρtMt, (13) Mt

which follows because Mt does not have any dynamic implications and thus only affects current period profits. The first order condition of the problem (13) is

∂ ωt Pt F (kt, lt, mt) e E = ρt, (14) ∂Mt where recall E = E [eεt ]. Taking logs of (14) and differencing with the production function (2) gives

 ∂  st = ln E + ln f (kt, lt, mt) − εt (15) ∂mt E ≡ ln D (kt, lt, mt) − εt

  where s ≡ ln ρtMt is the (log) intermediate input share of output. t PtYt

Theorem 4. Under Assumptions 1, 2, 4, 5, and that ρt,Pt (or price-deflators) are observed, the share regression in equation (15) nonparametrically identifies the flexible input elasticity ∂ f (k , l , m ) ∂mt t t t

of the production function almost everywhere in (kt, lt, mt).

Proof. Because E [εt | kt, lt, mt] = 0 we have that the conditional expectation

E E [st | kt, lt, mt] = ln D (kt, lt, mt) (16)

identifies the function DE . We refer to this regression in the data as the share regression. As we now show, the share regression is the key to close the non-identification gap left by our Theorems

E 1 and 2. Observe that εt = ln D (kt, lt, mt) − st and thus the constant

 E  E = E exp ln D (kt, lt, mt) − st (17)

23 can be identified.18 This allows us to identify the flexible input elasticity as

E ∂ D (kt, lt, mt) D (kt, lt, mt) ≡ f (kt, lt, mt) = . (18) ∂mt E

Theorem 4 shows that, by taking full advantage of the economic content of the model, we can identify the flexible input elasticity. Combined with Theorem 3 this allows us to identify the whole production function f nonparametrically.

6 A Computationally Simple Estimator

In this section we show how to obtain a simple nonparametric estimator of the production function using standard sieve series estimators as analyzed by Chen (2007). Our estimation procedure consists of two steps. We first show how to estimate the share regression, and then proceed to

estimation of the constant of integration C and the Markov process h. We propose a finite-dimensional truncated linear series given by a complete polynomial of degree r for the share regression. In what follows, we add back in the firm subscripts j for clarity.

T Given the observations {(yjt, kjt, ljt, mjt)}t=1 for the firms j = 1,...,J sampled in the data, we propose to use a complete polynomial of degree r in kjt,ljt, mjt and to use the sum of squared

P 2 residuals, jt εjt, as our objective function. For example, for a complete polynomial of degree two, our estimator would solve:

  2 0 0 0 0 0 2 0 2 X  γ0 + γkkjt + γlljt + γmmjt + γkkkjt + γllljt  min sjt − ln   . γ0 0 2 0 0 0 j,t  +γmmmjt + γklkjtljt + γkmkjtmjt + γlmljtmjt 

The solution to this problem is an estimator

X DE (k , l , m ) = γ0 krk lrl mrm , r , r , r ≥ 0, r jt jt jt rk,rl,rm jt jt jt with k l m (19) rk+rl+rm≤r

18Doraszelski and Jaumandreu (2013) use a similar idea to recover this constant.

24 of the elasticity up to the constant E, as well as the residual εjt corresponding to the ex-post shocks

0 to production.19 Since we can estimate Eˆ = 1 P eεˆjt , we can recover γˆ ≡ γˆ , and thus estimate JT j,t Eˆ ∂ f (kjt, ljt, mjt) from equation (19), free of the constant. ∂mjt Given our estimator for the intermediate input elasticity, we can calculate the integral in (7). One advantage of the polynomial sieve estimator we have selected is that this integral will have a closed-form solution:

Z X γrk,rl,rm rk rl rm+1 Dr (kjt, ljt, mjt) ≡ Dr (kjt, ljt, mjt) dmjt = kjt ljtmjt . rm + 1 rk+rl+rm≤r

For a degree two estimator (r = 2) we would have

  γm 2 2 γ0 + γkkjt + γlljt + 2 mjt + γkkkjt + γllljt D2 (kjt, ljt, mjt) ≡   mjt. γmm 2 γkm γlm + 3 mjt + γklkjtljt + 2 kjtmjt + 2 ljtmjt

With an estimate of εjt and of Dr (kjt, ljt, mjt) in hand, we can form a sample analogue of Yjt in   ˆ Yjt equation (8): Yjt ≡ ln . Dˆ k ,l ,m eεˆjt e r( jt jt jt) In the second step, in order to recover the constant of integration C in (9) and the Markovian process h, we use similar complete polynomial series estimators. Since a constant in the production

function cannot be separately identified from mean productivity, E [ωjt], we normalize C (kjt, ljt) to contain no constant. That is, we use

X τk τl Cτ (kjt, ljt) = ατk,τl kjt ljt, with τk, τl ≥ 0, (20) 0<τk+τl≤τ and

X a hA (ωjt−1) = δaωjt−1, with a ≤ A (21) 0≤a≤A

19As with all nonparametric sieve estimators, the number of terms in the series increases with the number of ob- servations. Under mild regularity conditions these estimators will be consistent and asymptotically normal for sieve M-estimators like the one we propose. See Chen (2007).

25 for some degrees τ and A (that increase with the sample size). Combining these gives us

ˆ X τk τl X a Yjt = − ατk,τl kjt ljt + δaωjt−1 + ηjt. (22) 0<τk+τl≤τ 0≤a≤A

Replacing for ωjt−1 we have the estimating equation:

!a ˆ X τk τl X ˆ X τk τl Yjt = − ατk,τl kjt ljt + δa Yjt−1 + ατk,τl kjt−1ljt−1 + ηjt (23) 0<τk+τl≤τ 0≤a≤A 0<τk+τl≤τ

h i  τk τl  ˆa We can then use moments of the form E ηjtkjt ljt = 0 and E ηjtYjt−1 = 0 to form a standard sieve GMM criterion function to estimate (α, δ).20 Notice that the estimator described above is just-identified. One could also use higher-order moments, as well as lags of inputs, to estimate an over-identified version of the model. Even though the estimator we introduce is a straightforward application of sieves, to our knowl- edge, there is no asymptotic distributional theory for multi-step nonparametric sieve estimators. Hence, for the purpose of inference, we interpret our estimator as a flexible parametric approxima- tion to the production function. Our estimator then becomes a standard GMM problem that uses the following moments

 ∂ ln D (k , l , m ) E ε r jt jt jt = 0, jt ∂γ

 τk τl  E ηjtkjt ljt = 0,

 a  E ηjtYjt−1 = 0,

where the first set of moments are the NLLS moments corresponding to the share equation. We can then apply standard GMM theory to do inference.21 Since our main empirical focus is not on

20 ˆ ˆ Alternatively, for a guess of α, one can form ωjt−1 (α) = Yjt−1 + C (kjt−1, ljt−1) = Yjt−1 + P α kτk lτl , and based on equation (22) use moments of the form E [η ω (α)] = 0 to esti- 0<τk+τl≤τ τk,τl jt−1 jt−1 jt jt−1 mate δ. Notice that since ω (α) = Yˆ + P α kτk lτl , this is equivalent to regressing ω on a sieve in jt jt 0<τk+τl≤τ τk,τl jt jt jt  τk τl  ωjt−1. Then the moments E ηjtkjt ljt = 0 can be used to estimate α. 21See e.g., Hansen (1982) for the one-step version and Newey and McFadden (1994) for the two-step version.

26 particular parameters themselves, but on functions of the parameters (e.g., elasticities, productiv- ity), we employ a nonparametric block bootstrap to compute standard errors (see e.g., Horowitz, 2001).22

7 Relationship to Literature

7.1 Price Variation as an Instrument

Recall that the identification problem with respect to intermediate inputs stems from insufficient variation in mjt to identify their influence in the production function independently of the other in- puts. However, by looking at the intermediate input demand equation, mjt = Me (kjt, ljt, ωjt, ρt,Pt), it can be seen that, if prices (Pt, ρt) were firm specific, they could potentially serve as a source of variation to address the identification problem.23 In fact, given enough sources of price variation, prices could potentially be used directly as instruments to estimate the entire production function while controlling for the endogeneity of input decisions. There are several challenges to using prices as instruments (see GM and Ackerberg et al., 2007). First, in many firm-level production datasets, firm-specific prices are not observed. Second, even if price variation is observed, in order to be useful as an instrument, the variation employed must not be correlated with the innovation to productivity, ηjt (as is likely to be the case if firms choose prices optimally due to market power); and it cannot purely reflect differences in the quality of either inputs or output. To the extent that input and output prices capture quality differences,

22Since we estimate a just-identified model, the multi-step and single-step estimators are equivalent. In principle, one could employ the asymptotic results in Chen and Pouzo (2015) and Tao (2015) to do inference for the nonpara- metric estimator in this case. In practice, researchers may want to use an over-identified version of our estimator, in which case these results no longer apply. Hence we focus on the flexible parametric interpretation for the purposes of inference. That being said, in Online Appendix O1, we present Monte Carlo simulations which show that our bootstrap procedure has the correct coverage for the nonparametric estimates. 23The intermediate input demand equation also suggests that time-varying prices, even if they were not firm-specific, could serve as potential instruments. While industry-time specific price indices (deflators) are commonly available in most datasets, this approach is problematic for a few reasons. First, these instruments are likely to have little identifying variation given that they do not vary across firm. Second, given that most firm-level datasets are short panels, with asymptotics taken in the number of firms as opposed to the number of periods, time-series variation alone will not lead to consistent estimates. In addition, when the production function is allowed to vary over time, time-series variation in prices is no longer sufficient for identification, as variation within each period is necessary.

27 prices should be included in the measure of the inputs used in production.24 This is not to say that if one can isolate exogenous price variation, it cannot be used to aid in identification. The point is that just observing price variation is not enough. The case must be made that the price variation that is used is indeed exogenous. For example, if prices are observed and serially correlated, one way to deal with the endogeneity concern, as suggested by Doraszelski and Jaumandreu (2013), is to use lagged prices as instruments. This diminishes the endogeneity concerns, since lagged prices only need to be uncorrelated with the innovation to productivity, ηjt. In an effort to show that wage variation does not reflect differences in worker quality, Doraszelski and Jaumandreu (2015) demonstrate empirically that the majority of wage variation in the Spanish manufacturing dataset they use is not due to variation in the skill mix of workers, and therefore likely due to geographic and temporal differences in labor markets. This work demonstrates that prices (specifically lagged prices), when carefully employed, can be a useful source of variation for identification of the production function. However, as also noted in Doraszelski and Jaumandreu (2013), this information is not available in most datasets. Our approach offers an alternative identification strategy that can be employed even when external instruments are not available.

7.2 Exploiting First-Order Conditions

The use of first-order conditions for the estimation of production functions dates back to at least the work by Klein (1953) and Solow (1957),25 who recognized that, for a Cobb-Douglas production function, there is an explicit relationship between the parameters representing input elasticities and input cost or revenue shares. This observation forms the basis for index number methods (see e.g., Caves, Christensen, and Diewert, 1982) that are used to nonparametrically recover input elasticities and productivity.26

24Recent work has suggested that quality differences may be a key driver of price differences (see GM and Fox and Smeets, 2011). 25Other examples of using first-order conditions to obtain identification include Stone (1954) on consumer demand, Heckman (1974) on labor supply, Hansen and Singleton (1982) on Euler equations and consumption, Paarsch (1992) and Laffont and Vuong (1996) on auctions, and Heckman, Matzkin, and Nesheim (2010) on hedonics. 26Index number methods are grounded in three important economic assumptions. First, all inputs are flexible and competitively chosen. Second, the production exhibits constant , which while not strictly

28 More recently, Doraszelski and Jaumandreu (2013, 2015) and Grieco, Li, and Zhang (2014) exploit the first-order conditions for labor and intermediate inputs under the assumption that they are flexibly chosen. Instead of using shares to recover input elasticities, these papers recognize that given a particular parametric form of the production function, the first-order condition for a flexible input (the proxy equation in LP/ACF) implies cross-equation parameter restrictions that can be used to aid in identification. Using a Cobb-Douglas production function, Doraszelski and Jaumandreu (2013) show that the first-order condition for a flexible input can be re-written to replace for productivity in the production function. Combined with observed variation in the prices of labor and intermediate inputs, they are able to estimate the parameters of the production function and productivity. Doraszelski and Jaumandreu (2015) extend the methodology developed in Doraszelski and Jaumandreu (2013) to estimate productivity when it is non-Hicks neutral, for a CES production function. By exploiting the first order conditions for both labor and intermediate inputs they are able to estimate a standard Hicks-neutral and a labor-augmenting component to productivity. Grieco, Li, and Zhang (2014) also use first order conditions for both labor and intermediate inputs to recover multiple unobservables. In the presence of unobserved heterogeneity in interme- diate input prices, they show that the parametric cross-equation restrictions between the production function and the two first-order conditions, combined with observed wages, can be exploited to es- timate the production function and recover the unobserved intermediate input prices. They also show that their approach can be extended to account for the composition of intermediate inputs and the associated (unobserved) component prices. The paper most closely related to ours is Griliches and Ringstad (1971), which exploits the relationship between the first order condition for a flexible input and the production function in a Cobb-Douglas parametric setting. They use the average revenue share of the flexible input to measure the output elasticity of flexible inputs. This combined with the log-linear form of the Cobb-Douglas production function, allows them to then subtract out the term involving flexible in- necessary is typically assumed in order to avoid imputing a rental price of capital. Third, and most importantly for our comparison, there are no ex-post shocks to output. Allowing for ex-post shocks in the index number framework can only be relaxed by assuming that elasticities are constant across firms, i.e., by imposing the parametric structure of Cobb-Douglas.

29 puts. Finally, under the assumption that the non-flexible inputs are predetermined and uncorrelated with productivity (not just the innovation), they estimate the coefficients for the predetermined inputs. Our identification solution can be seen as a nonparametric generalization of the Griliches and Ringstad (1971) empirical strategy. Instead of using the Cobb-Douglas restriction, our share equa- tion (15) uses revenue shares to recover input elasticities in a fully nonparametric setting. In addition, rather than subtract out the effect of intermediate inputs from the production function, we instead integrate up the intermediate input elasticity and take advantage of the nonparametric cross-equation restrictions between the share equation and the production function. Furthermore, we allow for predetermined inputs to be correlated with productivity, but uncorrelated with just the innovation to productivity.

7.3 Dynamic Panel

An alternative approach employed in the empirical literature is to use the dynamic panel estima- tors of Arellano and Bond (1991) and Blundell and Bond (1998, 2000). Under a linear parametric restriction on the evolution of ωt, these methods take advantage of the conditional moment re- strictions implied by Assumption (2), which allows for the use of appropriately lagged inputs as instruments. If one is willing to step outside the model described in Section 2 and assume that all inputs are not flexible (i.e., rule out the existence of flexible inputs) or that no flexible input satisfies the proxy variable assumption,27 and in addition assume a version of Assumption 4 that includes all inputs, then it may be possible to show that these dynamic panel methods can identify the production function and productivity. However, the bulk of empirical work based on production function estimation has focused on environments in which some inputs are non-flexible and some inputs are flexible. It is this setting that motivates our problem and distinguishes our approach from the

27Note that not satisfying the proxy variable assumption does not guarantee identification in the presence of a flexible input. For example, unobserved serially correlated intermediate input price shocks violate the proxy vari- able assumption. However, this variation generates a measurement problem, since intermediate inputs are typically measured in expenditures.

30 dynamic panel literature.

8 Value Added

A common empirical approach is to employ a value-added production function by relating a mea- sure of the output of a firm to a function of capital and labor only. Typically output is measured empirically as the “value added” by the firm (i.e., the value of gross output minus expenditures on intermediate inputs).28 One potential advantage of this approach is that, by excluding inter- mediate inputs from the production function, it avoids the identification problem associated with intermediate inputs that we describe in Section 3. The use of value added is typically motivated in one of two ways. First, a researcher may feel that a value-added function is a better model of the production process. For example, suppose there is a lot of heterogeneity in the degree of vertical integration within an industry, with firms outsourcing varying degrees of the production process, as a result of the production function being heterogeneous in how intermediate inputs enter. In this case, a researcher may feel that focusing on just the contributions of capital and labor (to the value added by the firm) is preferred to a gross output specification including intermediate inputs. Under this setup, if either capital or labor are flexibly chosen, then our non-identification arguments in Section 3 still apply. Similarly, our identification arguments also apply, and one can use our proposed identification/estimation strategy for this model as well. The second motivation is based on the idea that a value-added function can be constructed from an underlying gross output production function. This value-added function can then be used to recover objects of interest from the underlying gross output production function, such as firm-level productivity eωt+εt and certain features of the production technology (e.g., output elasticities of inputs) with respect to the “primary inputs”, capital and labor. This approach is typically justified either via the restricted profit function or by using structural production functions. As we discuss in more detail below, under the model described in Section 2, in general, neither justification allows

28One exception is Ackerberg, Caves, and Frazer (2015) which uses gross output as the output measure.

31 for a value added production function to be isolated from the gross output production function. Regardless of the motivation for value added, the objects from a value-added specification, particularly productivity, will be fundamentally different than those from gross output. Under the first motivation, this is because productivity from a primitively specified value-added setup mea- sures differences in value added holding capital and labor fixed, as opposed to differences in gross output holding all inputs fixed. The results in this section show that under the second motivation, the value-added objects cannot generally be mapped into their gross output counterparts if all one has are the value-added objects. A key exception is the linear in intermediate inputs Leontief spec- ification that we discuss below, a version of which is employed by Ackerberg, Caves, and Frazer (2015).

8.1 Restricted Profit Value-Added

The first approach to relating gross output to value added is based on the duality results in Bruno (1978) and Diewert (1978). We first briefly discuss their original results, which were derived under the assumption that intermediate inputs are flexibly chosen, but excluding the ex-post shocks. In this case, they show that by replacing intermediate inputs with their optimized value in the profit

E function, the empirical measure of value added, VAt ≡ Yt − Mt can be expressed as:

E ωt VAt = F (kt, lt, M (kt, lt, ωt)) e − M (kt, lt, ωt) ≡ V (kt, lt, ωt) , (24) where we use V (·) to denote value-added in this setup.29 This formulation is sometimes referred to as the restricted profit function (see Lau, 1976; Bruno, 1978; McFadden, 1978). In an index number framework, Bruno (1978) shows that elasticities of gross output with re- spect to capital, labor, and productivity can be locally approximated by multiplying estimates of the

E value-added counterparts by the firm-level ratio of value added to gross output, VAt = (1 − S ), GOt t

29 E PtYt ρtMt Technically, VAt ≡ − , where P t and ρt are the price deflators for output and intermediate inputs, P t ρt Pt respectively. The ratio is equal to the output price in the base year, PBASE, and similarly for the price of interme- P t diate inputs. Since PBASE and ρBASE are constants, they get subsumed in the constants in the F and M functions. For ease of notation, we normalize these constants to 1.

32 where GO stands for gross output.30 For productivity, the result is as follows:

E  E  VA   E  GOt  VAt t VAt elaseωt = elaseωt = elaseωt (1 − St) (25) GOt

See Online Appendix O2 for the details of this derivation. Analogous results hold for the elasticities

ωt with respect to capital and labor by replacing e with Kt or Lt. While this derivation suggests that estimates from the restricted-profit value-added function can be simply multiplied by (1 − St) to recover estimates from the underlying gross output production function, there are several important problems with the relationship in equation (25). First, this approach is based on a local approximation. While this may work well for small changes in productivity, for example looking at productivity growth rates (the original context under which these results were derived), it may not work well for large differences in productivity, such as analyzing cross-sectional productivity differences. Second, this approximation does not account for ex-post shocks to output. As we show in Online Appendix O2, when ex-post shocks are accounted for, the relationship in equation (25) becomes:

 ωt   E ωt   ωt  εt  ∂GOt e ∂V At e ∂Mt e e = (1 − St) + − 1 (26) ωt ωt E ωt ∂e GOt ∂e VAt ∂e GOt E | {z } | {z } GOt E elas VAt eωt elas eωt

The term in brackets is the bias introduced due to the ex-post shock. Ex-post shocks drive a wedge between the local equivalence of value added and gross output objects. Analogous results for the output elasticities of capital and labor can be similarly derived. As a result of the points discussed above, estimates from the restricted profit value-added func- tion cannot simply be “transformed” by re-scaling with the firm-specific share of intermediate inputs to obtain estimates of the underlying production function and productivity. How much of a difference this makes is ultimately an empirical question, which we address in Section 9. Preview-

30These results were originally derived under a general form of technical change. We have augmented the results here to correspond to the standard setup with Hicks-neutral technical change as discussed in Section 2.1.

33 ing our results, we find that re-scaling using the shares, as suggested by equation (25), performs poorly.

8.2 “Structural” Value-Added

The second approach to connecting gross output to value added is based on specific parametric as- sumptions on the production function, such that a value-added production function of only capital, labor, and productivity can be both isolated and measured (see Sims, 1969 and Arrow, 1972). We refer to this version of value added as the “structural value-added production function”. The empirical literature on value-added production functions often appeals to the extreme case of perfect complements (i.e., Leontief). A standard representation is:

ωt+εt Yt = min [H (Kt,Lt) , C (Mt)] e , (27) where C (·) is a monotone increasing and concave function. The main idea underlying the Leontief justification is that, under the assumption that

H (Kt,Lt) = C (Mt) , (28)

ωt+εt the right hand side of equation (27) can be written as H (Kt,Lt) e , a function that does not depend on intermediate inputs M. The key problem with this approach is that, given the assump- tions of the model, the relation in equation (28) will not generally hold. Unless capital or labor is assumed to be flexible, firms either cannot adjust them in period t or can only do so with some positive adjustment cost. The key consequence is that firms may optimally choose to not equate

H(Kt,Lt) and C (Mt), i.e., it may be optimal for the firm to hold onto a larger stock of K and L than can be combined with M, if K and L are both costly (or impossible) to downwardly adjust.31

31 0.5 For example, suppose C (Mt) = Mt . For simplicity, also suppose that capital and labor are fixed one period 0.5 ahead, and therefore cannot be adjusted in the short-run. When Mt ≤ H (Kt,Lt), marginal revenue with respect ∂C(Mt) 0.5 to intermediate inputs equals aPt. When M > H (Kt,Lt), increasing Mt does not increase output due to ∂Mt t the Leontief structure, so marginal revenue is zero. Marginal cost in both cases equals the price of intermediate inputs  2   Pt Pt ρt. The firm’s optimal choice of M is therefore given by Mt = 0.5a , if 0.5a < H (Kt,Lt). But when ρt ρt

34 An exception to this, as discussed in Ackerberg, Caves, and Frazer (2015), is when C (·) is linear (i.e., C (Mt) = aMt). In this case the relation in equation (28) will hold, the right hand side of equation (27) can be written as a function of only capital, labor, and productivity, and we have that

ωt+εt Yt = H (Kt,Lt) e . (29)

This does not imply though that VAE can be used to measure the structural value-added produc-

E tion function, as VAt ≡ Yt − Mt will not be proportional to the value-added production function

ωt+εt 32 H (Kt,Lt) e . Equation (29), however, suggests that gross output could be used on the left hand side to measure the structural value-added production function. This is in fact a version of what Ackerberg, Caves, and Frazer (2015) suggest in estimating a structural value-added produc- tion function based on an underlying Leontief gross output production function.

9 Data and Application

In the previous section we showed that value-added production functions capture fundamentally different objects compared to gross output. A natural question that arises is whether these differ- ences are relevant empirically. A recent survey paper by Syverson (2011) states that many results in the productivity literature are quite robust to alternative measurement approaches. He attributes this to the idea that the underlying variation at the firm level is so large that it dominates any dif- ferences due to measurement. This suggests that whether a researcher uses a value-added or gross output specification should not change any substantive conclusions related to productivity. In this section we show that, not only do the two approaches of gross output and value added produce fun- damentally different patterns of productivity empirically, in many cases the differences are quite large and lead to very different conclusions regarding the relationship between productivity and other dimensions of firm heterogeneity.   Pt 0.5a > H (Kt,Lt), the firm no longer finds it optimal to set H (Kt,Lt) = C (Mt), and prefers to hold onto ρt excess capital and labor. 32 In Online Appendix O2 we also show that moving ωt inside of the min function and/or εt inside of the min function presents a similar set of issues.

35 We quantify the effect of using a value-added rather than gross output specification using two commonly employed plant-level manufacturing datasets. The first dataset comes from the Colom- bian manufacturing census covering all manufacturing plants with more than 10 employees from 1981-1991. This dataset has been used in several studies, including Roberts and Tybout (1997), Clerides, Lach, and Tybout (1998), and Das, Roberts, and Tybout (2007). The second dataset comes from the census of Chilean manufacturing plants conducted by Chile’s Instituto Nacional de Estadística (INE). It covers all firms from 1979-1996 with more than 10 employees. This dataset has also been used extensively in previous studies, both in the production function estimation liter- ature (LP) and in the international trade literature (Pavcnik, 2002 and Alvarez and López, 2005).33 We estimate separate production functions for the five largest 3-digit manufacturing industries in both Colombia and Chile, which are Food Products (311), Textiles (321), Apparel (322), Wood Products (331), and Fabricated Metal Products (381). We also estimate an aggregate specification grouping all manufacturing together. We estimate the production function in two ways.34 First, us- ing our approach from Section 6, we estimate a gross output production function using a complete polynomial series of degree 2 for both the elasticity and the integration constant in the production function.35 That is, adding back in the firm subscripts j for clarity, we use

E 0 0 0 0 0 2 0 2 D2 (kjt, ljt, mjt) = γ0 + γkkjt + γlljt + γmmjt + γkkkjt + γllljt

0 2 0 0 0 +γmmmjt + γklkjtljt + γkmkjtmjt + γlmljtmjt

33We construct the variables adopting the convention used by Greenstreet (2007) with the Chilean dataset, and employ the same approach with the Colombian dataset. In particular, real gross output is measured as deflated rev- enues. Intermediate inputs are formed as the sum of expenditures on raw materials, energy (fuels plus electricity), and services. Real value added is the difference between real gross output and real intermediate inputs, i.e., double deflated value added. Labor input is measured as a weighted sum of blue collar and white collar workers, where blue collar workers are weighted by the ratio of the average blue collar wage to the average white collar wage. Capital is constructed using the perpetual inventory method where investment in new capital is combined with deflated capital from period t − 1 to form capital in period t. Deflators for Colombia are obtained from Pombo (1999) and deflators for Chile are obtained from Bergoeing, Hernando, and Repetto (2003). 34For all of the estimates we present, we obtain standard errors by using the nonparametric block bootstrap with 200 replications. 35We also experimented with higher-order polynomials, and the results were very similar. In a few industries (specifically those with the smallest number of observations) the results are slightly more heterogeneous, as expected.

36 to estimate the intermediate input elasticity and

2 2 C2 (kjt, ljt) = αkkjt + αlljt + αkkkjt + αllljt + αklkjtljt

for the constant of integration. Putting all the elements together, the gross output production func- tion we estimate is given by:

  γm 2 2 γ0 + γkkjt + γlljt + 2 mjt + γkkkjt + γllljt yjt =   mjt (30) γmm 2 γkm γlm + 3 mjt + γklkjtljt + 2 kjtmjt + 2 ljtmjt 2 2 −αkkjt − αlljt − αkkkjt − αllljt − αklkjtljt + ωjt + εjt,

E R D (ljt,kjt,mjt) since yjt = E dmjt − C (kjt, ljt) + ωjt + εjt. Second, we estimate a value-added specification using the commonly-applied method devel- oped by ACF, also using a complete polynomial series of degree 2:

2 2 vajt = βkkjt + βlljt + βkkkjt + βllljt + βklkjtljt + υjt + jt, (31)

where υjt + jt represents productivity in the value-added model. In Table 1 we report estimates of the average output elasticities for each input, as well as the sum, for both the value-added and gross output models.36 In every case but one, the value- added model generates a sum of elasticities that is larger relative to gross output, with an average difference of 2% in Colombia and 6% in Chile. We also report the ratio of the mean capital and labor elasticities, which measures the capital in- tensity (relative to labor) of the production technology in each industry. In general, the value-added estimates of the capital intensity of the technology are larger relative to gross output, although the differences are small. According to both measures, the Food Products (311) and Textiles (321) industries are the most capital intensive in Colombia, and in Chile the most capital intensive are

36The distributions of the estimated firm-specific output elasticities are quite reasonable. For each industry, less than 2% are outside of the range (0,1) for labor and intermediate inputs. For capital, the elasticities are closer to zero, but even in the worst case, less than 9.4% have values below zero. Not surprisingly, these percentages are highest among the the industries with the smallest number of observations.

37 Food Products (311), Textiles (321), and Fabricated Metals (381). In both countries, Apparel (322) and Wood Products (331) are the least capital intensive industries, even compared to the aggregate specification denoted “All” in the tables. Value added also recovers dramatically different patterns of productivity as compared to gross output. Following OP, we define productivity (in levels) as the sum of the persistent and unantici- pated components: eω+ε.37 In Table 2 we report estimates of several frequently analyzed statistics of the resulting productivity distributions. In the first three rows of each panel we report ratios of percentiles of the productivity distribution, a commonly used measure of productivity disper- sion. There are two important implications of these results. First, value added suggests a much larger amount of heterogeneity in productivity across plants within an industry, as the various per- centile ratios are much smaller under gross output. For Colombia, the average 75/25, 90/10, and 95/5 ratios are 1.88, 3.69, and 6.41 under value added, and 1.33, 1.78, and 2.23 under gross out- put. For Chile, the average 75/25, 90/10, and 95/5 ratios are 2.76, 8.02, and 17.93 under value added, and 1.48, 2.20, and 2.95 under gross output. The value-added estimates imply that, with the same amount of inputs, the 95th percentile plant would produce more than 6 times more output in Colombia, and almost 18 times more output in Chile, than the 5th percentile plant. In stark con- trast, we find that under gross output, the 95th percentile plant would produce only 2 times more output in Colombia, and 3 times more output in Chile, than the 5th percentile plant with the same inputs. In addition, the ranking of industries according to the degree of productivity dispersion is not preserved moving from the value added to gross output estimates. For example, in Chile, the Fabricated Metals industry (381) is found to have the smallest amount of productivity dispersion under value added, but the largest amount of dispersion under gross output, for all three dispersion measures. The second important result is that value added also implies much more heterogeneity across

37Since our interest is in analyzing productivity heterogeneity we conduct our analysis using productivity in levels. An alternative would be to measure productivity in logs. However, the log transformation is only a good approximation for measuring percentage differences in productivity across groups when these differences are small, which they are not in our data. We have also computed results based on log productivity. As expected, the magnitude of our results changes, however, our qualitative results comparing gross output and value added still hold. We have also computed results using just the persistent component of productivity, eω. The results are qualitatively similar.

38 industries, which is captured by the finding that the range of the percentile ratios across industries is much tighter using the gross output measure of productivity. For example, for the 95/5 ratio, the value-added estimates indicate a range from 4.36 to 11.01 in Colombia and from 12.52 to 25.08 in Chile, whereas the gross output estimates indicate a range from 2.02 to 2.38 and from 2.48 to 3.31. The surprising aspect of these results is that the dispersion in productivity appears far more stable both across industries and across countries when measured via gross output as opposed to value added. In the conclusion we sketch some important policy implications of this finding for empirical work on the misallocation of resources. In addition to showing much larger overall productivity dispersion, results based on value added also suggest a substantially different relationship between productivity and other dimen- sions of plant-level heterogeneity. We examine several commonly-studied relationships between productivity and other plant characteristics. In the last four rows of each panel in Table 2 we report percentage differences in productivity based on whether plants export some of their output, import intermediate inputs, have positive advertising expenditures, and pay above the median (industry) level of wages. Using the value-added estimates, for most industries exporters are found to be more productive than non-exporters, with exporters appearing to be 83% more productive in Colombia and 14% more productive in Chile across all industries. Using the gross output specification, these estimates of productivity differences fall to 9% in Colombia and 3% in Chile, and actually turn negative (although not statistically different from zero) in some cases. A similar pattern exists when looking at importers of intermediate inputs. The average pro- ductivity difference is 14% in Colombia and 41% in Chile using value added. However, under gross output, these numbers fall to 8% and 13% respectively. The same story holds for differences in productivity based on advertising expenditures. Moving from value added to gross output, the estimated difference in productivity drops for most industries in Colombia, and for all industries in Chile. In several cases it becomes statistically indistinguishable from zero. Another striking contrast arises when we compare productivity between plants that pay wages above versus below the industry median. Using the productivity estimates from a value-added

39 specification, firms that pay wages above the median industry wage are found to be substantially more productive, with the estimated differences ranging from 34%-63% in Colombia and from 47%-123% in Chile. In every case the estimates are statistically significant. Using the gross output specification, these estimates fall to 9%-22% in Colombia and 19%-30% in Chile, representing a fall by a factor of 3, on average, in both countries. Since intermediate input usage is likely to be positively correlated with productivity, we would expect that including (excluding) intermediate inputs in the production function will lead to smaller (larger) differences in productivity heterogeneity. Therefore, we would expect to see the largest discrepancies between the value-added and gross output productivity heterogeneity estimates in industries which are intensive in intermediate input usage. By looking at Tables 1 and 2 we can confirm that, for the most part, this is the case. When comparing the value added and gross output productivity estimates, the largest differences tend to occur in the most intermediate input inten- sive industries, which are Food Products (311) in Colombia and Food Products (311) and Wood Products (331) in Chile. However, this is not always the case. For example, in Chile, the differ- ence between the gross output and value added estimates of the average productivity comparing advertisers and non-advertisers is actually the smallest in the Wood Products (331) industry. In order to isolate the importance of the value-added/gross output distinction separately from the effect of transmission bias, in Table 3 we repeat the above analysis without correcting for the endogeneity of inputs. We examine the raw effects in the data by estimating productivity using simple linear regression (OLS) to estimate both gross output and value-added specifications, using a complete polynomial of degree 2. As can be seen from Table 3, the general pattern of results, that value added leads to larger productivity differences across many dimensions, is similar to our previous results both qualitatively and quantitatively. While the results in Table 3 may suggest that transmission bias is not empirically important, in Table 4 we show evidence to the contrary. In particular, we report the average input elasticities based on estimates for the gross output model using OLS and using our method to correct for transmission bias. A well-known result is that failing to control for transmission bias leads to overestimates of the coefficients on more flexible inputs. The intuition is that the more flexible

40 the input is, the more it responds to productivity shocks and the higher the degree of correlation between that input and unobserved productivity. The estimates in Table 4 show that the OLS results substantially overestimate the output elasticity of intermediate inputs in every case. The average difference is 34%, which illustrates the importance of controlling for the endogeneity generated by the correlation between input decisions and productivity. An important implication of our results is that, while controlling for transmission bias certainly has an effect, the use of value added versus gross output has a much larger effect on the productivity estimates than transmission bias. This suggests that the use of gross output versus value added may be more important from a policy perspective than controlling for the transmission bias that has been the primary focus in the production function literature. Our approach enables the estimation of gross output production functions by solving the identification problem associated with flexible inputs while simultaneously correcting for transmission bias.

9.1 Robustness Checks

Adjusting the Value Added Estimates As discussed in Section 8.1, in the absence of ex-post shocks, the derivation provided in equation (25) suggests that the differences between gross output and value added can be eliminated by re-scaling the value-added estimates by a factor equal to the plant-level ratio of value added to gross output, i.e., one minus the share of intermediate inputs in total output. While this idea has been known in the literature for a while, this re-scaling is very rarely applied in practice.38 As shown in Section 8.1, there are several reasons why this re- scaling may not work. In order to investigate how well the re-scaling of value added estimates performs, we apply the transformation implied by equation (25) using the firm-specific ratio of  E  value added to gross output VAt , a quantity readily available in the data. We find that this re- GOt scaling performs quite poorly in recovering the underlying gross output estimates of the production function and productivity, leading to estimates that are in some cases even further from the gross output estimates than the value-added estimates themselves. In Tables 5 and 6 we report the re-scaled estimates as well as the value-added estimates using

38See Petrin and Sivadasan (2013) for an example in which a version of this is implemented.

41 ACF and the gross output estimates using our method for comparison. At first glance, the re- scaling appears to be working as many of the re-scaled value-added estimates move towards the gross output estimates. However, in some cases, the estimates of dispersion and the relationship between productivity and other dimensions of firm heterogeneity move only slightly towards the gross output estimates, and remain very close to the original value-added estimates. Moreover, in many cases the estimates overshoot the gross output estimates. Even worse, in some cases the re-scaling moves them in the opposite direction and leads to estimates that are even further from the gross output estimates than the original value-added estimates. Finally, in several cases, the re-scaled estimates actually lead to a sign-reversal compared to both the value-added and gross output estimates. Overall, while in some cases the re-scaling applied to the value-added estimates moves them closer to the gross output estimates, it does a poor job of replicating the gross output estimates, and in many cases moves them even further away.

Alternative Assumptions on Flexible Inputs Our new identification and estimation strategy takes advantage of the first-order condition with respect to a flexible input. We have used interme- diate inputs (the sum of raw materials, energy, and services) as the flexible input, as intermediate inputs have been commonly assumed to be flexible in the literature. We believe that this is a rea- sonable assumption because a) the model period is typically a year and b) what is required is that they can be adjusted flexibly at the margin. To the extent that spot markets for commodities exist, including energy and certain raw materials, this enables firms to make such adjustments. However, it may be the case that in some applications, researchers do not want to assume that all intermediate inputs are flexible, or they may want to test the sensitivity of their estimates to this assumption. First, in order to investigate the robustness of our estimator, in Online Appendix O1 we present results from a Monte Carlo experiment in which we introduce adjustment frictions for the flexible input. We then estimate the production function using our estimator, assuming the first order condition holds. We design the simulation so that dynamic panel methods should work well in the presence of adjustment costs, and compare the performance of our method to a version of dynamic panel.

42 We find that our method performs remarkably well overall, even for large values of adjustment costs. As expected, when there are no adjustment costs, our method recovers the true elasticities of the production function, and dynamic panel breaks down. As we increase adjustment costs, our method continues to outperform dynamic panel for small values, performs similarly for intermedi- ate values, and does only marginally worse for large values. Second, as a robustness check on our results, we estimate two different specifications of our model in which we allow some of the components of intermediate inputs to be non-flexible. In

ωt+εt particular, the production function we estimate is of the form F (kt, lt, rmt, nst) e , where rm denotes raw materials and ns denotes energy plus services. In one specification we assume rm to be non-flexible and ns to be flexible, and in the other specification we assume the opposite. See Appendix D for these results. Overall the results are sensible and qualitatively similar to our main results. In addition, the relationship between productivity estimates based on gross output and value added is very similar to the one in the main set of results.

Fixed Effects As we detail in Appendix C, our identification and estimation strategy can be easily extended to incorporate fixed effects in the production function. The production function

a+ωt+εt 39 allowing for fixed effects, a, can be written as Yt = F (kt, lt, mt) e . A common drawback of models with fixed effects is that the differencing of the data needed to subtract out the fixed effects can remove a large portion of the identifying information in the data. In the context of production functions, this often leads to estimates of the capital coefficient and returns to scale that are unrealistically low, as well as large standard errors (see GM). In Appendix D, we report estimates corresponding to those in Tables 1 and 2, using our method to estimate the gross output production function allowing for fixed effects. The elasticity estimates for intermediate inputs are exactly the same as in the specification without fixed effects, as the first stage of our approach does not depend on the presence of fixed effects. We do find some evidence in Colombia of the problems mentioned above as the sample sizes are smaller than those for Chile. Despite this, the estimates are very similar to those from the main specification for both countries,

39See Kasahara, Schrimpf, and Suzuki (2015) for an important extension of our approach to the general case of firm-specific production functions.

43 and the larger differences are associated with larger standard errors.

Extra Unobservables As we show in Appendix C, our approach can also be extended to incor- porate additional unobservables driving the intermediate input demand. Specifically, we allow for an additional unobservable in the share equation for the flexible input (e.g., optimization error). This introduces some small changes to the identification and estimation procedure, but the core ideas are unchanged. In Appendix D we report estimates from this alternative specification. Our results are remarkably robust. The standard errors increase slightly, which is not surprising given that we have introduced an additional unobservable into the model. The point estimates, however, are very similar.

10 Conclusion

In this paper we show that the nonparametric identification of production functions in the presence of both flexible and non-flexible inputs has remained an unresolved issue. We offer a new iden- tification strategy that closes this loop. The key to our approach is exploiting the nonparametric cross-equation restrictions between first order condition for the flexible inputs and the production function. Our empirical analysis demonstrates that value added can generate substantially different pat- terns of productivity heterogeneity as compared to gross output, which suggests that empirical studies of productivity based on value added may lead to fundamentally different policy impli- cations compared to those based on gross output. To illustrate this possibility, consider the recent literature that uses productivity dispersion to explain cross-country differences in output per worker through resource misallocation. As an example, the recent influential paper by Hsieh and Klenow (2009) finds substantial heterogeneity in productivity dispersion (defined as the variance of log productivity) across countries as measured using value added. In particular, when they compare the United States with China and India, the variance of log productivity ranges from 0.40-0.55 for China and 0.45-0.48 for India, but only from 0.17-0.24 for the United States. They then use

44 this estimated dispersion to measure the degree of misallocation of resources in the respective economies. In their main counterfactual they find that, by reducing the degree of misallocation in China and India to that of the United States, aggregate TFP would increase by 30%-50% in China and 40%-60% in India. In our datasets for Colombia and Chile the corresponding estimates of the variance in log productivity using a value-added specification are 0.43 and 0.94, respectively. Thus their analysis applied to our data would suggest that there is similar room for improvement in aggregate TFP in Colombia, and much more in Chile. However, when productivity is measured using our gross output framework, our empirical findings suggest a much different result. The variance of log productivity using gross output is 0.08 in Colombia and 0.15 in Chile. These significantly smaller dispersion measures could imply that there is much less room for improvement in aggregate productivity for Colombia and Chile. Since the 90/10 ratios we obtain for Colombia and Chile using gross output are quantitatively very similar to the estimates obtained by Syverson (2004) for the United States (who also employed gross output but in an index number framework), this also suggests that the degree of differences in misallocation of resources between developed and developing countries may not be as large as the analysis of Hsieh and Klenow (2009) implies.40 Exploring the role of gross output production functions for policy problems such as the one above could be a fruitful direction for future research. A key message of this paper is that insights derived under value added, compared to gross output, could lead to significantly different policy conclusions. Our identification strategy provides researchers with a stronger foundation for using gross output production functions in practice.

40Hsieh and Klenow note that their estimate of log productivity dispersion for the United States is larger than previous estimates by Foster, Haltiwanger, and Syverson (2008) by a factor of almost 4. They attribute this to the fact that Foster, Haltiwanger, and Syverson use a selected set of homogeneous industries. However, another important difference is that Foster, Haltiwanger, and Syverson use gross output measures of productivity rather than value-added measures. Given our results in Section 9, it is likely that a large part of this difference is due to Hsieh and Klenow’s use of value added, rather than their selection of industries.

45 References Ackerberg, Daniel, C. Lanier Benkard, Steven Berry, and Ariel Pakes. 2007. “Econometric Tools For Analyzing Market Outcomes.” In Handbook of Econometrics, vol. 6, edited by James J. Heckman and Edward E. Leamer. Amsterdam: Elsevier, 4171–4276.

Ackerberg, Daniel A, Kevin Caves, and Garth Frazer. 2015. “Identification Properties of Recent Production Function Estimators.” Econometrica 83 (6):2411–2451.

Alvarez, Roberto and Ricardo A. López. 2005. “Exporting and Performance: Evidence from Chilean Plants.” Canadian Journal of Economics 38 (4):1384–1400.

Arellano, Manuel and Stephen Bond. 1991. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” Review of Economic Studies 58 (2):277–297.

Arrow, Kenneth J. 1972. The Measurement of Real Value Added. Stanford: Institute for Mathe- matical Studies in the Social Sciences.

Baily, Martin N., Charles Hulten, and David Campbell. 1992. “Productivity Dynamics in Manu- facturing Plants.” Brookings Papers on Economic Activity. Microeconomics :187–267.

Bartelsman, Eric J. and Mark Doms. 2000. “Understanding Productivity: Lessons from Longitu- dinal Microdata.” Finance and Economics Discussion Series 2000-2019, Board of Governors of the Federal Reserve System (U.S.).

Bergoeing, Raphael, Andrés Hernando, and Andrea Repetto. 2003. “Idiosyncratic Productivity Shocks and Plant-Level Heterogeneity.” Documentos de Trabajo 173, Centro de Economía Aplicada, Universidad de Chile.

Bernard, Andrew B., Jonathan Eaton, J. Bradford Jensen, and Samuel Kortum. 2003. “Plants and Productivity in International Trade.” American Economic Review 93 (4):1268–1290.

Bernard, Andrew B. and J. Bradford Jensen. 1995. “Exporters, Jobs, and Wages in U.S. Manufac- turing: 1976-1987.” Brookings Papers on Economic Activity. Microeconomics :67–119.

———. 1999. “Exceptional Exporter Performance: Cause, Effect or Both?” Journal of Interna- tional Economics 47 (1):1–25.

Blundell, Richard and Stephen Bond. 1998. “Initial Conditions and Moment Restrictions in Dy- namic Panel Data Models.” Journal of Econometrics 87 (1):115–143.

———. 2000. “GMM Estimation with Persistent Panel Data: An Application to Production Func- tions.” Econometric Reviews 19 (3):321–340.

Bond, Stephen and Måns Söderbom. 2005. “Adjustment Costs and the Identification of Cobb Dou- glas Production Functions.” Unpublished Manuscript, The Institute for Fiscal Studies, Working Paper Series No. 05/4.

46 Bruno, Michael. 1978. “Duality, Intermediate Inputs, and Value-Added.” In Production Eco- nomics: A Dual Approach to Theory and Applications, vol. 2, edited by M. Fuss and McFadden D., chap. 1. Amsterdam: North-Holland.

Caves, Douglas W., Laurits R. Christensen, and W. Erwin Diewert. 1982. “The Economic Theory of Index Numbers and the Measurement of Input, Output, and Productivity.” Econometrica 50 (6):1393–1414.

Chen, Xiaohong. 2007. “Large Sample Sieve Estimation of Semi-Nonparametric Models.” In Handbook of Econometrics, vol. 6, edited by James J. Heckman and Edward E. Leamer. Ams- terdam: Elsevier, 5549–5632.

Chen, Xiaohong and Demian Pouzo. 2015. “Sieve Quasi Likelihood Ratio Inference on Semi/Nonparametric Conditional Moment Models.” Econometrica 83 (3):1013–1079.

Clerides, Sofronis K., Saul Lach, and James R. Tybout. 1998. “Is Learning by Exporting Impor- tant? Micro-Dynamic Evidence from Colombia, Mexico, and Morocco.” The Quarterly Journal of Economics 113 (3):903–947.

Collard-Wexler, Allan. 2010. “Productivity Dispersion and Plant Selection in the Ready-Mix Con- crete Industry.” NYU Stern working paper.

Cunha, Flavio, James J. Heckman, and Susanne M. Schennach. 2010. “Estimating the Technology of Cognitive and Noncognitive Skill Formation.” Econometrica 78 (3):883–931.

Das, Sanghamitra, Mark J. Roberts, and James R. Tybout. 2007. “Market Entry Costs, Producer Heterogeneity, and Export Dynamics.” Econometrica 75 (3):837–873.

De Loecker, Jan. 2011. “Product Differentiation, Multiproduct Firms, and Estimating the Impact of Trade Liberalization on Productivity.” Econometrica 79 (5):1407–1451.

Dhrymes, Phoebus J. 1991. The Structure of Production Technology Productivity and Aggregation Effects. US Department of Commerce, Bureau of the Census.

Diewert, W. Erwin. 1978. “Hick’s Aggregation Theorem and the Existence of a Real Value Added Function.” In Production Economics: A Dual Approach to Theory and Practice, vol. 2, edited by Melvyn Fuss and Daniel McFadden, chap. 2. Amsterdam: North-Holland.

Doraszelski, Ulrich and Jordi Jaumandreu. 2013. “R&D and Productivity: Estimating Endogenous Productivity.” Review of Economic Studies 80 (4):1338–1383.

———. 2015. “Measuring the Bias of Technological Change.” Working Paper.

Foster, Lucia, John Haltiwanger, and Chad Syverson. 2008. “Reallocation, Firm Turnover, and Efficiency: Selection on Productivity or Profitability?” American Economic Review 98 (1):394– 425.

Fox, Jeremy and Valérie Smeets. 2011. “Does Input Quality Drive Measured Differences in Firm Productivity?” International Economic Review 52 (4):961–989.

47 Goldberger, Arthur S. 1968. “The Interpretation and Estimation of Cobb-Douglas Functions.” Econometrica 35 (3-4):464–472.

Greenstreet, David. 2007. “Exploiting Sequential Learning to Estimate Establishment-Level Pro- ductivity Dynamics and Decision Rules.” Economics Series Working Papers 345, University of Oxford, Department of Economics.

Grieco, Paul, Shengyu Li, and Hongsong Zhang. 2014. “Production Function Estimation with Unobserved Input Price Dispersion.” Forthcoming, International Economic Review.

Griliches, Zvi and Jacques Mairesse. 1998. “Production Functions: The Search for Identification.” In Econometrics and Economic Theory in the Twentieth Century: The Ragnar Frisch Centennial Symposium. New York: Cambridge University Press, 169–203.

Griliches, Zvi and Vidar Ringstad. 1971. and the Form of the Production Function: An Econometric Study of Norwegian Manufacturing Establishment Data. North- Holland Pub. Co. (Amsterdam).

Hansen, Lars Peter. 1982. “Large Sample Properties of Generalized Method of Moments Estima- tors.” Econometrica 50 (4):1029–1054.

Hansen, Lars Peter and Kenneth J. Singleton. 1982. “Generalized Instrumental Variables Estima- tion of Nonlinear Rational Expectations Models.” Econometrica 50 (5):1269–1286.

Heckman, James J. 1974. “Shadow Prices, Market Wages, and Labor Supply.” Econometrica 42 (4):679–694.

Heckman, James J., Rosa L. Matzkin, and Lars Nesheim. 2010. “Nonparametric Identification and Estimation of Nonadditive Hedonic Models.” Econometrica 78 (5):1569–1591.

Horowitz, Joel L. 2001. “The Bootstrap.” In Handbook of Econometrics, vol. 5, edited by James J. Heckman and Edward E. Leamer. Amsterdam: Elsevier, 3159–3228.

Houthakker, Hendrik S. 1950. “Revealed Preference and the Utility Function.” Economica 17 (66):159–174.

Hsieh, Chang-Tai and Peter J. Klenow. 2009. “Misallocation and Manufacturing TFP in China and India.” Quarterly Journal of Economics 124 (4):1403–1448.

Hu, Yingyao and Susanne M Schennach. 2008. “Instrumental variable treatment of nonclassical measurement error models.” Econometrica 76 (1):195–216.

Kasahara, Hiroyuki and Joel Rodrigue. 2008. “Does the Use of Imported Intermediates Increase Productivity? Plant-level Evidence.” The Journal of Development Economics 87 (1):106–118.

Kasahara, Hiroyuki, Paul Schrimpf, and Michio Suzuki. 2015. “Identification and Estimation of Production Function with Unobserved Heterogeneity.” Working Paper.

Klein, Lawrence R. 1953. A Textbook of Econometrics. Evanston: Row, Peterson and Co.

48 Klein, Roger and Francis Vella. 2010. “Estimating a Class of Triangular Simultaneous Equations Models Without Exclusion Restrictions.” Journal of Econometrics 154 (2):154–164.

Klette, Tor Jacob and Zvi Griliches. 1996. “The Inconsistency of Common Scale Estimators When Output Prices are Unobserved and Endogenous.” Journal of Applied Econometrics 11 (4):343– 361.

Laffont, Jean-Jacques and Quang Vuong. 1996. “Structural Analysis of Auction Data.” American Economic Review, P&P 86 (2):414–420.

Lau, Lawrence J. 1976. “A Characterization of the Normalized Restricted Profit Function.” Journal of Economic Theory 12 (1):131–163.

Levinsohn, James and Amil Petrin. 2003. “Estimating Production Functions Using Inputs to Con- trol for Unobservables.” Review of Economic Studies 70 (2):317–342.

Lewbel, Arthur. 2012. “Using Heteroscedasticity to Identify and Estimate Mismeasured and En- dogenous Regressor Models.” Journal of Business & Economic Statistics 30 (1):67–80.

Manski, Charles F. 2003. Partial Identification of Probability Distributions. New York: Springer- Verlag.

Marschak, Jacob and William H. Andrews. 1944. “Random Simultaneous Equations and the The- ory of Production.” Econometrica 12 (3-4):143–205.

Matzkin, Rosa L. 2007. “Nonparametric Identification.” In Handbook of Econometrics, vol. 6, edited by James J. Heckman and Edward E. Leamer. Amsterdam: Elsevier, 5307–5368.

McFadden, Daniel. 1978. “Cost, Revenue, and Profit Functions.” In Production Economics: A Dual Approach to Theory and Applications, vol. 1, edited by M. Fuss and McFadden D., chap. 1. Amsterdam: North-Holland.

Newey, Whitney K. and Daniel McFadden. 1994. “Large sample estimation and hypothesis test- ing.” In Handbook of Econometrics, vol. 4, edited by Robert F. Engle and Daniel L. McFadden. Amsterdam: North-Holland, 2111–2245.

Newey, Whitney K and James L Powell. 2003. “Instrumental Variable Estimation of Nonparamet- ric Models.” Econometrica 71 (5):1565–1578.

Newey, Whitney K., James L. Powell, and F Vella. 1999. “Nonparametric Estimation of Triangular Simultaneous Equations Models.” Econometrica 67 (3):565–603.

Olley, G. Steven and Ariel Pakes. 1996. “The Dynamics of Productivity in the Telecommunications Equipment Industry.” Econometrica 64 (6):1263–1297.

Paarsch, Harry J. 1992. “Deciding Between the Common and Private Value Paradigms in Empirical Models of Auctions.” Journal of Econometrics 51 (1-2):191–215.

Pavcnik, Nina. 2002. “Trade Liberalization Exit and Productivity Improvements: Evidence from Chilean Plants.” Review of Economic Studies 69 (1):245–276.

49 Petrin, Amil and Jagadeesh Sivadasan. 2013. “Estimating Lost Output from Allocative Ineffi- ciency, with an Application to Chile and Firing Costs.” The Review of Economics and Statistics 95 (1):286–301.

Pombo, Carlos. 1999. “Productividad Industrial en Colombia: Una Aplicacion de Numeros In- dicey.” Revista de Economia del Rosario 2 (3):107–139.

Rigobon, Roberto. 2003. “Identification Through Heteroskedasticity.” Review of Economics and Statistics 85 (4):777–792.

Roberts, Mark J. and James R. Tybout. 1997. “The Decision to Export in Colombia: An Empirical Model of Entry with Sunk Costs.” American Economic Review 87 (4):545–564.

Robinson, Peter M. 1988. “Root-N-Consistent Semiparametric Regression.” Econometrica 56 (4):931–954.

Roehrig, Charles S. 1988. “Conditions for Identification in Nonparametric and Parametric Mod- els.” Econometrica 56 (2):433–447.

Sims, Christopher A. 1969. “Theoretical Basis for a Double Deflated Index of Real Value Added.” The Review of Economics and Statistics 51 (4):470–471.

Solow, Robert M. 1957. “Technical Change and the Aggregate Production Function.” Review of Economics and Statistics 39 (3):312–320.

Stone, Richard. 1954. “Linear Expenditure Systems and Demand Analysis: An Application to the Pattern of British Demand.” The Economic Journal 64 (255):511–527.

Syverson, Chad. 2004. “Product Substitutability and Productivity Dispersion.” The Review of Economics and Statistics 86 (2):534–550.

———. 2011. “What Determines Productivity?” Journal of Economic Literature 49 (2):326–365.

Tao, Jing. 2015. “Inference for Point and Partially Identified Semi-Nonparametric Conditional Moment Models.” Working Paper.

Varian, Hal R. 1992. Microeconomic Analysis. New York: WW Norton.

Wooldridge, Jeffrey M. 2009. “On Estimating Firm-Level Production Functions Using Proxy Variables to Control for Unobservables.” Economics Letters 104 (3):112–114.

50 Table 1: Average Input Elasticities of Output (Structural Estimates: Value Added vs. Gross Ouput)

Colombia Industry (ISIC Code) Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Value Gross Value Gross Value Gross Value Gross Value Gross Value Gross Added Output Added Output Added Output Added Output Added Output Added Output (ACF) (GNR) (ACF) (GNR) (ACF) (GNR) (ACF) (GNR) (ACF) (GNR) (ACF) (GNR)

Labor 0.70 0.22 0.65 0.32 0.83 0.42 0.86 0.44 0.89 0.43 0.78 0.35 (0.04) (0.02) (0.06) (0.03) (0.03) (0.02) (0.06) (0.05) (0.04) (0.02) (0.01) (0.01)

Capital 0.33 0.12 0.36 0.16 0.16 0.05 0.12 0.04 0.25 0.10 0.31 0.14 (0.02) (0.01) (0.04) (0.02) (0.02) (0.01) (0.04) (0.02) (0.03) (0.01) (0.01) (0.01)

Intermediates -- 0.67 -- 0.54 -- 0.52 -- 0.51 -- 0.53 -- 0.54

(0.01) (0.01) (0.01) (0.01) (0.01) (0.00)

Sum 1.03 1.01 1.01 1.01 0.99 0.99 0.98 0.99 1.14 1.06 1.09 1.04 (0.03) (0.01) (0.04) (0.02) (0.02) (0.01) (0.07) (0.04) (0.02) (0.01) (0.01) (0.00) Mean(Capital) / Mean(Labor) 0.47 0.55 0.55 0.49 0.19 0.12 0.14 0.08 0.28 0.23 0.39 0.40 (0.06) (0.08) (0.10) (0.09) (0.03) (0.04) (0.05) (0.05) (0.04) (0.04) (0.02) (0.03)

Chile

Labor 0.77 0.28 0.93 0.45 0.95 0.45 0.92 0.40 0.96 0.52 0.77 0.38 (0.02) (0.01) (0.04) (0.03) (0.04) (0.02) (0.04) (0.02) (0.04) (0.03) (0.01) (0.01)

Capital 0.33 0.11 0.24 0.11 0.20 0.06 0.19 0.07 0.25 0.13 0.37 0.16 (0.01) (0.01) (0.02) (0.01) (0.03) (0.01) (0.02) (0.01) (0.02) (0.01) (0.01) (0.00)

Intermediates -- 0.67 -- 0.54 -- 0.56 -- 0.59 -- 0.50 -- 0.55 (0.00) (0.01) (0.01) (0.01) (0.01) (0.00)

Sum 1.10 1.05 1.17 1.10 1.14 1.08 1.11 1.06 1.22 1.15 1.13 1.09 (0.02) (0.01) (0.03) (0.02) (0.03) (0.02) (0.03) (0.01) (0.03) (0.02) (0.01) (0.01) Mean(Capital) / Mean(Labor) 0.43 0.39 0.26 0.24 0.21 0.14 0.21 0.18 0.26 0.25 0.48 0.43 (0.03) (0.03) (0.03) (0.04) (0.04) (0.03) (0.03) (0.03) (0.03) (0.03) (0.02) (0.02)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers in the first column are based on a value-added specification and are estimated using a complete polynomial series of degree 2 with the method from Ackerberg, Caves, and Frazer (2006). The numbers in the second column are based on a gross output specification and are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. Since the input elasticities are heterogeneous across firms, we report the average input elasticities within each given industry. d. The row titled "Sum" reports the sum of the average labor, capital, and intermediate input elasticities, and the row titled "Mean(Capital)/Mean(Labor)" reports the ratio of the average capital elasticity to the average labor elasticity.

51 Table 2: Heterogeneity in Productivity (Structural Estimates)

Colombia Industry (ISIC Code) Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Value Gross Value Gross Value Gross Value Gross Value Gross Value Gross Added Output Added Output Added Output Added Output Added Output Added Output (ACF) (GNR) (ACF) (GNR) (ACF) (GNR) (ACF) (GNR) (ACF) (GNR) (ACF) (GNR)

75/25 ratio 2.20 1.33 1.97 1.35 1.66 1.29 1.73 1.30 1.78 1.31 1.95 1.37 (0.07) (0.02) (0.09) (0.03) (0.03) (0.01) (0.08) (0.04) (0.04) (0.02) (0.17) (0.01)

90/10 ratio 5.17 1.77 3.71 1.83 2.87 1.66 3.08 1.80 3.33 1.74 4.01 1.86 (0.27) (0.05) (0.30) (0.07) (0.09) (0.03) (0.38) (0.12) (0.13) (0.03) (0.07) (0.02)

95/5 ratio 11.01 2.24 6.36 2.38 4.36 2.02 4.58 2.24 5.31 2.16 6.86 2.36 (1.11) (0.08) (0.76) (0.14) (0.22) (0.05) (1.01) (0.22) (0.34) (0.06) (0.02) (0.03)

Exporter 3.62 0.14 0.20 0.02 0.16 0.05 0.26 0.15 0.20 0.08 0.51 0.06 (0.99) (0.05) (0.10) (0.03) (0.07) (0.03) (0.63) (0.14) (0.05) (0.03) (0.12) (0.01)

Importer -0.25 0.04 0.27 0.05 0.29 0.12 0.06 0.05 0.26 0.10 0.20 0.11 (0.08) (0.02) (0.10) (0.04) (0.08) (0.03) (0.53) (0.08) (0.06) (0.02) (0.05) (0.01)

Advertiser -0.46 -0.03 0.20 0.08 0.13 0.05 0.02 0.04 0.15 0.05 -0.13 0.03 (0.10) (0.02) (0.07) (0.03) (0.04) (0.02) (0.09) (0.04) (0.04) (0.02) (0.06) (0.01)

Wages > Median 0.59 0.09 0.60 0.18 0.41 0.18 0.34 0.15 0.55 0.22 0.63 0.20 (0.19) (0.02) (0.09) (0.03) (0.03) (0.02) (0.17) (0.04) (0.06) (0.02) (0.05) (0.01)

Chile

75/25 ratio 2.92 1.37 2.56 1.48 2.58 1.43 3.06 1.50 2.45 1.53 3.00 1.55 (0.05) (0.01) (0.07) (0.02) (0.07) (0.02) (0.08) (0.02) (0.06) (0.02) (0.03) (0.01)

90/10 ratio 9.02 1.90 6.77 2.16 6.76 2.11 10.12 2.32 6.27 2.33 9.19 2.39 (0.30) (0.02) (0.30) (0.05) (0.33) (0.05) (0.60) (0.05) (0.27) (0.05) (0.15) (0.02)

95/5 ratio 21.29 2.48 13.56 2.91 14.21 2.77 25.08 3.11 12.52 3.13 20.90 3.31 (0.99) (0.05) (0.84) (0.09) (0.77) (0.09) (2.05) (0.11) (0.78) (0.10) (0.47) (0.04)

Exporter 0.27 0.02 0.07 0.02 0.18 0.09 0.12 0.00 0.03 -0.01 0.20 0.03 (0.10) (0.02) (0.07) (0.03) (0.08) (0.03) (0.12) (0.03) (0.06) (0.03) (0.04) (0.01)

Importer 0.71 0.14 0.22 0.10 0.31 0.14 0.44 0.15 0.30 0.11 0.46 0.15 (0.11) (0.02) (0.05) (0.02) (0.05) (0.02) (0.10) (0.03) (0.05) (0.02) (0.03) (0.01)

Advertiser 0.18 0.04 0.09 0.04 0.15 0.06 0.04 0.03 0.07 0.01 0.14 0.06 (0.05) (0.01) (0.04) (0.02) (0.04) (0.02) (0.04) (0.01) (0.04) (0.02) (0.02) (0.01)

Wages > Median 1.23 0.21 0.47 0.19 0.62 0.22 0.68 0.21 0.56 0.22 0.99 0.30 (0.09) (0.01) (0.06) (0.02) (0.06) (0.02) (0.08) (0.02) (0.06) (0.02) (0.04) (0.01)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers in the first column are based on a value-added specification and are estimated using a complete polynomial series of degree 2 with the method from Ackerberg, Caves, and Frazer (2006). The numbers in the second column are based on a gross output specification and are estimated using a complete polynomial series of degree 2 for each of the nonparametric functions (G and C ) of our approach. c. In the first three rows we report ratios of productivity for plants at various percentiles of the productivity distribution. In the remaining four rows we report estimates of the productivity differences between plants (as a fraction) based on whether they have exported some of their output, imported intermediate inputs, spent money on advertising, and paid wages above the industry median. For example, in industry 311 for Chile value added implies that a firm that advertises is, on average, 18% more productive than a firm that does not advertise.

52 Table 3: Heterogeneity in Productivity (Uncorrected OLS Estimates)

Colombia Industry (ISIC Code) Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Value Gross Value Gross Value Gross Value Gross Value Gross Value Gross Added Output Added Output Added Output Added Output Added Output Added Output (OLS) (OLS) (OLS) (OLS) (OLS) (OLS) (OLS) (OLS) (OLS) (OLS) (OLS) (OLS)

75/25 ratio 2.17 1.16 1.86 1.21 1.65 1.17 1.72 1.23 1.78 1.23 1.93 1.24 (0.06) (0.01) (0.06) (0.01) (0.03) (0.01) (0.06) (0.02) (0.04) (0.01) (0.02) (0.00)

90/10 ratio 5.15 1.42 3.50 1.51 2.81 1.44 3.05 1.57 3.30 1.53 3.96 1.58 (0.27) (0.02) (0.18) (0.04) (0.08) (0.02) (0.22) (0.06) (0.12) (0.02) (0.06) (0.01)

95/5 ratio 10.86 1.74 5.77 1.82 4.23 1.74 4.67 2.01 5.22 1.82 6.81 1.94 (0.94) (0.05) (0.55) (0.08) (0.20) (0.04) (0.72) (0.15) (0.31) (0.04) (0.15) (0.02)

Exporter 3.42 0.09 -0.03 -0.01 0.10 0.00 0.21 0.10 0.12 0.03 0.45 0.01 (0.99) (0.04) (0.04) (0.01) (0.05) (0.01) (0.19) (0.09) (0.04) (0.02) (0.12) (0.01)

Importer -0.23 -0.02 0.09 0.00 0.21 0.02 0.02 -0.03 0.20 0.05 0.14 0.04 (0.07) (0.01) (0.06) (0.01) (0.06) (0.01) (0.06) (0.02) (0.05) (0.01) (0.04) (0.01)

Advertiser -0.46 -0.07 0.11 -0.04 0.10 -0.03 0.01 -0.02 0.08 0.00 -0.16 -0.02 (0.11) (0.02) (0.05) (0.02) (0.03) (0.01) (0.07) (0.03) (0.04) (0.01) (0.06) (0.01)

Wages > Median 0.51 0.06 0.49 0.10 0.39 0.13 0.33 0.11 0.50 0.13 0.56 0.13 (0.15) (0.02) (0.07) (0.02) (0.03) (0.01) (0.08) (0.03) (0.04) (0.01) (0.05) (0.01)

Chile

75/25 ratio 2.91 1.30 2.57 1.40 2.56 1.36 3.07 1.39 2.47 1.46 3.01 1.45 (0.05) (0.00) (0.07) (0.01) (0.07) (0.01) (0.08) (0.01) (0.06) (0.01) (0.03) (0.00)

90/10 ratio 9.00 1.72 6.63 1.97 6.64 1.91 10.21 2.03 6.27 2.14 9.13 2.14 (0.29) (0.01) (0.31) (0.04) (0.29) (0.03) (0.57) (0.04) (0.26) (0.04) (0.15) (0.01)

95/5 ratio 20.93 2.15 13.49 2.57 14.20 2.45 25.26 2.77 12.18 2.80 20.64 2.86 (0.96) (0.02) (0.83) (0.07) (0.80) (0.05) (2.05) (0.07) (0.77) (0.06) (0.47) (0.03)

Exporter 0.17 -0.01 0.04 -0.02 0.12 0.01 0.12 -0.02 0.00 0.00 0.15 -0.01 (0.09) (0.02) (0.06) (0.02) (0.08) (0.02) (0.09) (0.02) (0.06) (0.02) (0.04) (0.01)

Importer 0.57 0.03 0.20 0.04 0.26 0.06 0.41 0.07 0.27 0.06 0.41 0.09 (0.09) (0.01) (0.04) (0.02) (0.05) (0.01) (0.09) (0.03) (0.05) (0.02) (0.03) (0.01)

Advertiser 0.12 0.00 0.07 0.01 0.11 0.02 0.02 0.01 0.05 0.01 0.10 0.04 (0.04) (0.01) (0.04) (0.01) (0.04) (0.01) (0.04) (0.01) (0.04) (0.02) (0.02) (0.01)

Wages > Median 1.11 0.12 0.45 0.15 0.58 0.16 0.66 0.13 0.53 0.16 0.94 0.24 (0.07) (0.01) (0.05) (0.02) (0.06) (0.02) (0.07) (0.02) (0.06) (0.02) (0.03) (0.01)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers in the first column are based on a value-added specification and are estimated using a complete polynomial series of degree 2 with OLS. The numbers in the second column are based on a gross output specification estimated using a complete polynomial series of degree 2 with OLS. c. In the first three rows we report ratios of productivity for plants at various percentiles of the productivity distribution. In the remaining four rows we report estimates of the productivity differences between plants (as a fraction) based on whether they have exported some of their output, imported intermediate inputs, spent money on advertising, and paid wages above the industry median. For example, in industry 311 for Chile value added implies that a firm that advertises is, on average, 12% more productive than a firm that does not advertise.

53 Table 4: Average Input Elasticities of Output (Gross Output: Structural vs. Uncorrected OLS Estimates)

Colombia Industry (ISIC Code) Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Gross Gross Gross Gross Gross Gross Gross Gross Gross Gross Gross Output Output Output Output Output Output Output Output Output Output Output Output (OLS) (GNR) (OLS) (GNR) (OLS) (GNR) (OLS) (GNR) (OLS) (GNR) (OLS) (GNR)

Labor 0.15 0.22 0.21 0.32 0.32 0.42 0.32 0.44 0.29 0.43 0.26 0.35 (0.01) (0.02) (0.02) (0.03) (0.01) (0.02) (0.03) (0.05) (0.02) (0.02) (0.01) (0.01)

Capital 0.04 0.12 0.06 0.16 0.01 0.05 0.03 0.04 0.03 0.10 0.06 0.14 (0.01) (0.01) (0.01) (0.02) (0.01) (0.01) (0.01) (0.02) (0.01) (0.01) (0.00) (0.01)

Intermediates 0.82 0.67 0.76 0.54 0.68 0.52 0.65 0.51 0.73 0.53 0.72 0.54

(0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.02) (0.01) (0.01) (0.01) (0.00) (0.00)

Sum 1.01 1.01 1.03 1.01 1.01 0.99 1.00 0.99 1.05 1.06 1.04 1.04 (0.01) (0.01) (0.01) (0.02) (0.01) (0.01) (0.02) (0.04) (0.01) (0.01) (0.00) (0.00) Mean(Capital) / Mean(Labor) 0.27 0.55 0.27 0.49 0.04 0.12 0.08 0.08 0.11 0.23 0.23 0.40

(0.07) (0.08) (0.06) (0.09) (0.02) (0.04) (0.05) (0.05) (0.04) (0.04) (0.01) (0.03)

Chile

Labor 0.17 0.28 0.26 0.45 0.29 0.45 0.20 0.40 0.32 0.52 0.20 0.38 (0.01) (0.01) (0.02) (0.03) (0.02) (0.02) (0.01) (0.02) (0.02) (0.03) (0.01) (0.01)

Capital 0.05 0.11 0.06 0.11 0.03 0.06 0.02 0.07 0.07 0.13 0.09 0.16 (0.00) (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.00) (0.00)

Intermediates 0.83 0.67 0.75 0.54 0.74 0.56 0.81 0.59 0.71 0.50 0.77 0.55 (0.01) (0.00) (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.00) (0.00)

Sum 1.05 1.05 1.06 1.10 1.06 1.08 1.04 1.06 1.10 1.15 1.06 1.09 (0.00) (0.01) (0.01) (0.02) (0.01) (0.02) (0.01) (0.01) (0.01) (0.02) (0.00) (0.01) Mean(Capital) / Mean(Labor) 0.28 0.39 0.22 0.24 0.12 0.14 0.12 0.18 0.21 0.25 0.42 0.43

(0.03) (0.03) (0.04) (0.04) (0.03) (0.03) (0.05) (0.03) (0.04) (0.03) (0.02) (0.02)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers in the first column are based on a gross output specification and are estimated using a complete polynomial series of degree 2 with OLS. The numbers in the second column are also based on a gross output specification using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. Since the input elasticities are heterogeneous across firms, we report the average input elasticities within each given industry. d. The row titled "Sum" reports the sum of the average labor, capital, and intermediate input elasticities, and the row titled "Mean(Capital)/Mean(Labor)" reports the ratio of the average capital elasticity to the average labor elasticity.

54 Table 5: Average Input Elasticities of Output--Rescaled Value Added (Structural Estimates: Rescaled Value Added vs. Gross Ouput)

Colombia Industry (ISIC Code) Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Value Value Value Value Value Value Value Added Gross Value Added Gross Value Added Gross Value Added Gross Value Added Gross Value Added Gross Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR)

Labor 0.70 0.20 0.22 0.65 0.28 0.32 0.83 0.38 0.42 0.86 0.40 0.44 0.89 0.40 0.43 0.78 0.33 0.35 (0.04) (0.01) (0.02) (0.06) (0.03) (0.03) (0.03) (0.02) (0.02) (0.06) (0.03) (0.05) (0.04) (0.02) (0.02) (0.01) (0.01) (0.01)

Capital 0.33 0.08 0.12 0.36 0.15 0.16 0.16 0.07 0.05 0.12 0.05 0.04 0.25 0.11 0.10 0.31 0.13 0.14 (0.02) (0.01) (0.01) (0.04) (0.02) (0.02) (0.02) (0.01) (0.01) (0.04) (0.02) (0.02) (0.03) (0.01) (0.01) (0.01) (0.00) (0.01)

Intermediates -- -- 0.67 -- -- 0.54 -- -- 0.52 -- -- 0.51 -- -- 0.53 -- -- 0.54

(0.01) (0.01) (0.01) (0.01) (0.01) (0.00)

Sum 1.03 1.01 1.01 1.01 1.00 1.01 0.99 0.99 0.99 0.98 0.99 0.99 1.14 1.06 1.06 1.09 1.04 1.04 (0.03) (0.01) (0.01) (0.04) (0.02) (0.02) (0.02) (0.01) (0.01) (0.07) (0.03) (0.04) (0.02) (0.01) (0.01) (0.01) (0.00) (0.00) Mean(Capital) / Mean(Labor) 0.47 0.42 0.55 0.55 0.55 0.49 0.19 0.19 0.12 0.14 0.14 0.08 0.28 0.27 0.23 0.39 0.38 0.40 (0.06) (0.05) (0.08) (0.10) (0.10) (0.09) (0.03) (0.03) (0.04) (0.05) (0.05) (0.05) (0.04) (0.04) (0.04) (0.02) (0.02) (0.03) 55

Chile

Labor 0.77 0.21 0.28 0.93 0.37 0.45 0.95 0.37 0.45 0.92 0.32 0.40 0.96 0.43 0.52 0.77 0.29 0.38 (0.02) (0.01) (0.01) (0.04) (0.02) (0.03) (0.04) (0.02) (0.02) (0.04) (0.01) (0.02) (0.04) (0.02) (0.03) (0.01) (0.01) (0.01)

Capital 0.33 0.10 0.11 0.24 0.10 0.11 0.20 0.08 0.06 0.19 0.07 0.07 0.25 0.11 0.13 0.37 0.14 0.16 (0.01) (0.00) (0.01) (0.02) (0.01) (0.01) (0.03) (0.01) (0.01) (0.02) (0.01) (0.01) (0.02) (0.01) (0.01) (0.01) (0.00) (0.00)

Intermediates -- -- 0.67 -- -- 0.54 -- -- 0.56 -- -- 0.59 -- -- 0.50 -- -- 0.55 (0.00) (0.01) (0.01) (0.01) (0.01) (0.00)

Sum 1.10 1.03 1.05 1.17 1.07 1.10 1.14 1.06 1.08 1.11 1.04 1.06 1.22 1.09 1.15 1.13 1.05 1.09 (0.02) (0.00) (0.01) (0.03) (0.01) (0.02) (0.03) (0.01) (0.02) (0.03) (0.01) (0.01) (0.03) (0.01) (0.02) (0.01) (0.00) (0.01) Mean(Capital) / Mean(Labor) 0.43 0.46 0.39 0.26 0.26 0.24 0.21 0.21 0.14 0.21 0.22 0.18 0.26 0.27 0.25 0.48 0.49 0.43 (0.03) (0.03) (0.03) (0.03) (0.03) (0.04) (0.04) (0.04) (0.03) (0.03) (0.03) (0.03) (0.03) (0.03) (0.03) (0.02) (0.02) (0.02)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers in the first column are based on a value-added specification and are estimated using a complete polynomial series of degree 2 with the method from Ackerberg, Caves, and Frazer (2006). The numbers in the second column are obtained by raising the value-added estimates to the power of one minus the firm's share of intermediate inputs in total output. The numbers in the third column are based on a gross output specification and are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. Since the input elasticities are heterogeneous across firms, we report the average input elasticities within each given industry. d. The row titled "Sum" reports the sum of the average labor, capital, and intermediate input elasticities, and the row titled "Mean(Capital)/Mean(Labor)" reports the ratio of the average capital elasticity to the average labor elasticity. Table 6: Heterogeneity in Productivity--Rescaled Value Added (Structural Estimates: Rescaled Value Added vs. Gross Ouput)

Colombia Industry (ISIC Code) Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Value Value Value Value Value Value Value Added Gross Value Added Gross Value Added Gross Value Added Gross Value Added Gross Value Added Gross Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output Added (ACF-- Output (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR) (ACF) Rescaled) (GNR)

75/25 ratio 2.20 2.29 1.33 1.97 1.85 1.35 1.66 1.88 1.29 1.73 2.74 1.30 1.78 2.00 1.31 1.95 2.15 1.37 (0.07) (0.12) (0.02) (0.09) (0.12) (0.03) (0.03) (0.09) (0.01) (0.08) (0.44) (0.04) (0.04) (0.10) (0.02) (0.17) (0.06) (0.01)

90/10 ratio 5.17 4.99 1.77 3.71 3.28 1.83 2.87 3.60 1.66 3.08 6.27 1.80 3.33 3.81 1.74 4.01 4.68 1.86 (0.27) (0.51) (0.05) (0.30) (0.39) (0.07) (0.09) (0.34) (0.03) (0.38) (1.51) (0.12) (0.13) (0.35) (0.03) (0.07) (0.24) (0.02)

95/5 ratio 11.01 8.79 2.24 6.36 5.00 2.38 4.36 5.41 2.02 4.58 10.53 2.24 5.31 5.85 2.16 6.86 7.91 2.36 (1.11) (1.20) (0.08) (0.76) (0.87) (0.14) (0.22) (0.67) (0.05) (1.01) (3.19) (0.22) (0.34) (0.69) (0.06) (0.02) (0.52) (0.03)

Exporter 3.62 0.84 0.14 0.20 0.07 0.02 0.16 -0.04 0.05 0.26 0.55 0.15 0.20 0.06 0.08 0.51 0.09 0.06 (0.99) (0.30) (0.05) (0.10) (0.08) (0.03) (0.07) (0.07) (0.03) (0.63) (0.60) (0.14) (0.05) (0.07) (0.03) (0.12) (0.06) (0.01)

Importer -0.25 -0.40 0.04 0.27 0.08 0.05 0.29 -0.09 0.12 0.06 -0.20 0.05 0.26 0.15 0.10 0.20 0.05 0.11 (0.08) (0.08) (0.02) (0.10) (0.08) (0.04) (0.08) (0.05) (0.03) (0.53) (0.20) (0.08) (0.06) (0.04) (0.02) (0.05) (0.04) (0.01)

Advertiser -0.46 -0.42 -0.03 0.20 -0.16 0.08 0.13 -0.28 0.05 0.02 -0.24 0.04 0.15 -0.05 0.05 -0.13 -0.18 0.03 (0.10) (0.10) (0.02) (0.07) (0.07) (0.03) (0.04) (0.06) (0.02) (0.09) (0.13) (0.04) (0.04) (0.04) (0.02) (0.06) (0.03) (0.01)

Wages > Median 0.59 0.15 0.09 0.60 0.35 0.18 0.41 0.31 0.18 0.34 0.19 0.15 0.55 0.27 0.22 0.63 0.29 0.20 (0.19) (0.18) (0.02) (0.09) (0.08) (0.03) (0.03) (0.05) (0.02) (0.17) (0.18) (0.04) (0.06) (0.04) (0.02) (0.05) (0.04) (0.01) 56

Chile

75/25 ratio 2.92 2.36 1.37 2.56 2.06 1.48 2.58 2.94 1.43 3.06 3.31 1.50 2.45 2.47 1.53 3.00 2.92 1.55 (0.05) (0.11) (0.01) (0.07) (0.17) (0.02) (0.07) (0.27) (0.02) (0.08) (0.36) (0.02) (0.06) (0.19) (0.02) (0.03) (0.10) (0.01)

90/10 ratio 9.02 5.24 1.90 6.77 3.80 2.16 6.76 8.08 2.11 10.12 10.47 2.32 6.27 5.72 2.33 9.19 7.24 2.39 (0.30) (0.44) (0.02) (0.30) (0.57) (0.05) (0.33) (1.44) (0.05) (0.60) (2.23) (0.05) (0.27) (0.85) (0.05) (0.15) (0.44) (0.02)

95/5 ratio 21.29 8.91 2.48 13.56 5.54 2.91 14.21 13.89 2.77 25.08 19.83 3.11 12.52 9.40 3.13 20.90 12.23 3.31 (0.99) (0.96) (0.05) (0.84) (1.05) (0.09) (0.77) (2.94) (0.09) (2.05) (5.41) (0.11) (0.78) (1.84) (0.10) (0.47) (0.91) (0.04)

Exporter 0.27 0.34 0.02 0.07 -0.03 0.02 0.18 0.14 0.09 0.12 0.09 0.00 0.03 0.01 -0.01 0.20 0.12 0.03 (0.10) (0.12) (0.02) (0.07) (0.06) (0.03) (0.08) (0.11) (0.03) (0.12) (0.11) (0.03) (0.06) (0.07) (0.03) (0.04) (0.05) (0.01)

Importer 0.71 0.44 0.14 0.22 0.10 0.10 0.31 0.18 0.14 0.44 0.21 0.15 0.30 0.19 0.11 0.46 0.37 0.15 (0.11) (0.15) (0.02) (0.05) (0.05) (0.02) (0.05) (0.07) (0.02) (0.10) (0.12) (0.03) (0.05) (0.07) (0.02) (0.03) (0.04) (0.01)

Advertiser 0.18 0.05 0.04 0.09 0.00 0.04 0.15 0.06 0.06 0.04 -0.04 0.03 0.07 0.03 0.01 0.14 0.14 0.06 (0.05) (0.05) (0.01) (0.04) (0.04) (0.02) (0.04) (0.06) (0.02) (0.04) (0.06) (0.01) (0.04) (0.06) (0.02) (0.02) (0.02) (0.01)

Wages > Median 1.23 0.66 0.21 0.47 0.34 0.19 0.62 0.57 0.22 0.68 0.53 0.21 0.56 0.47 0.22 0.99 0.89 0.30 (0.09) (0.07) (0.01) (0.06) (0.06) (0.02) (0.06) (0.09) (0.02) (0.08) (0.11) (0.02) (0.06) (0.09) (0.02) (0.04) (0.05) (0.01)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers in the first column are based on a value-added specification and are estimated using a complete polynomial series of degree 2 with the method from Ackerberg, Caves, and Frazer (2006). The numbers in the second column are obtained by raising the value-added estimates to the power of one minus the firm's share of intermediate inputs in total output. The numbers in the third column are based on a gross output specification and are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. In the first three rows we report ratios of productivity for plants at various percentiles of the productivity distribution. In the remaining four rows we report estimates of the productivity differences between plants (as a fraction) based on whether they have exported some of their output, imported intermediate inputs, spent money on advertising, and paid wages above the industry median. For example, in industry 311 for Chile value added implies that a firm that advertises is, on average, 18% more productive than a firm that does not advertise. Appendix A: The Olley-Pakes Estimator

The original Olley and Pakes (1996) approach differs from LP in that it aims to exploit the structure of a predetermined input demand as an alternative to the flexible intermediate input as the proxy

variable. In particular they treat kt as the predetermined input in the model whereas the remaining

inputs (lt, mt in this case) are treated as flexible. By using demand for future capital (investment) as a proxy,38 rather than intermediate inputs, it may appear that the identification problems we raise above are less severe for the OP empirical strategy, as kt+1 is not a direct input into the production function at t. However, we now show that capital as a proxy faces the same fundamental identification problems we raise above.39 The key idea behind the OP approach is to assume that the structure of the predetermined input demand satisfies the following scalar unobservability assumption

Assumption 6. The predetermined input demand is given by

kt = K (It−1) = K (kt−1, ωt−1)

where K is strictly monotone increasing in the second argument.

This assumption allows them to exploit kt+1 as a proxy variable for productivity ωt. This can

be seen by examining the “first stage” of their procedure in which they regress output yt on the

inputs (kt, lt, mt) as well as the proxy variable kt+1, which yields

E [yt | kt, lt, mt, kt+1] = f (kt,, lt, mt) + E [ωt | kt,, lt, mt, kt+1]

−1 = f (kt,, lt, mt) + K (kt+1, kt) , (32)

where the second equality follows from Assumption 6.

As Ackerberg, Caves, and Frazer (2015) have shown, since mt = M (kt, ωt) = M (kt, kt+1),

38 Technically OP used investment it as the proxy, but given the process they assume for capital: Kt+1 = (1 − depreciation) Kt + Ii, and conditional on kt, using kt+1 is equivalent to using it. 39This leaves aside altogether the original LP motivation that investment is often zero in the data and hence those observations cannot be used for the estimation.

A-1 and similarly for labor, in general the OP procedure faces the same identification problems regard- ing the flexible input elasticities. However, in contrast to LP, ACF show that the economic structure on the input decisions in OP admits some model-consistent sources of variation that would permit the elasticity ∂ E [y | k , l , m , k ] to be identified in the data for x ∈ {l , m }. If such sources ∂xt t t, t t t+1 t t t

of variation exist, then for xt ∈ {lt, mt} the elasticity

∂ ∂ E [yt | kt,, lt, mt, kt+1] = f (kt,, lt, mt) (33) ∂xt ∂xt

−1 is identified in the data. Notice also that since K (kt+1, kt) = ωt in equation (32), it follows that the ex-post productivity shock, εt = yt − E [yt | kt,, lt, mt, kt+1], is also identified. The capital elasticity clearly cannot be identified from the first stage (33). The empirical strat- egy proposed by OP for the capital elasticity makes use of parametric structure of the production function, which they assumed took a Cobb-Douglas form (C-D for short):

f (kt,, lt, mt) = αkkt + αllt + αmmt.

The first stage of the model in this case is

−1 E [yt | kt,, lt, mt, kt+1] = αkkt + αllt + αmmt + K (kt+1, kt)

= αllt + αmmt + φ (kt+1, kt) (34)

which takes the form of a partially linear model (Robinson, 1988). The random variable φt ≡

φ (kt+1, kt) can be recovered in the data from the partially linear regression in (34), as can its derivatives. The “second stage” of the OP empirical strategy to identify the capital elasticity is based on regressing the new “dependent variable”

y˜t ≡ yt − αllt − αmmt

A-2 on (kt, φt−1, kt−1). This second stage thus yields

E [˜y | kt, kt−1, φt−1] = αkkt + E [ωt | kt, kt−1, φt−1]

= αkkt + h (φt−1 − αkkt−1) . (35)

As we now show, however, under the model’s restrictions, the joint variation in (kt, kt−1, φt−1) required to identify the capital elasticity αk is unlikely to exist in the data. The reason is based on the following observation. Because ωt−1 = φt−1 − αkkt−1, which implies

˜ kt = K (kt−1, ωt−1) = K (kt−1, φt−1) , (36)

40 there is no variation in kt conditional on (kt−1, φt−1). Therefore, the partial derivative

∂ E [˜y | kt, kt−1, φt−1] = αk (37) ∂kt cannot be recovered directly in the data. This observation gives rise to two fundamental problems for the OP solution to transmission bias. The first problem is that any real data application is unlikely to actually exhibit the knife-edge variation (36) implied by the model, i.e., real data will typically exhibit variation in kt conditional on (kt−1, φt−1), in which case one could recover the LHS of (37). Applying their estimator to such data would estimate αk using variation that the model predicts should not exist. The estimates of

“αk” in this case will have no structural meaning (i.e., will not correspond to the capital elasticity) as the data rejects the model. This identification problem regarding the OP approach can be seemingly alleviated if we try to impose a version of Assumption 4 to generate an additional source of variation in kt conditional on

(kt−1, φt−1). If the model allowed for another source of variation χt that enters the predetermined

40This is in essence what Assumption 4 rules out in the nonparametric case.

A-3 input demand function, we would have

kt = K (It−1) = K (kt−1, ωt−1, χt−1) .

The second problem is that such variation χt would represent additional dimensions of firm hetero- geneity in addition to productivity ωt, and would thus violate the scalar unobservability assumption that drives the OP approach.

If χt were observed and augmented to the data, then using it to resolve the identification prob- lem would require that χt accounts for all the variation in kt conditional on kt−1 and φt−1. Any unexplained variation would again violate the scalar unobservability assumption on the prede- termined input. Even if this requirement is met, χt will likely be a higher dimensional random variable that must now also be included in the nonparametric estimation of φ (kt, kt+1, χt). This places a major burden on the data to estimate a high-dimensional nonparametric function. This is especially problematic in applications because, as pointed out by LP, real data sets often contain a large percentage of observations with zero investment which must be excluded from the empirical estimation of φt, thus creating a severe small sample problem that would undermine this strategy.

Appendix B: A Parametric Example

In order to illustrate our non-identification result, we consider a parametric example. Suppose

αk αl αm that the true production function is Cobb-Douglas F (kt, lt, mt) = Kt Lt Mt , and productivity

41 follows an AR(1) process ωt = δ0 + δωt−1 + ηt. The parametric version of the instrumental variables restriction in equation (6) is the following:

E [yt | Γt] = constant+αkkt +αllt +αmE [mt | Γt]+δ0 +δ (φt−1 − αkkt−1 − αllt−1 − αmmt−1) ,

41For simplicity we assume that the prices of output and intermediate inputs are non-time-varying. While time- series variation in relative prices would provide a source of identifying variation here, as we discuss in footnote 21, relying on only time-series variation in prices is problematic for several reasons.

A-4 −1 where φt−1 = f (kt−1, lt−1, mt−1) + M (kt−1, lt−1, mt−1). If we plug in for mt using the first- order condition and combine constants we have

  αkkt + αllt + δ (φt−1 − f (kt−1, lt−1, mt−1)) E [yt | Γt] = constant˜ + αkkt + αllt + αm 1 − αm

+δ (φt−1 − αkkt−1 − αllt−1 − αmmt−1)     αk αl = constant˜ + kt + lt 1 − αm 1 − αm δ + (φt−1 − αkkt−1 − αllt−1 − αmmt−1) . 1 − αm

−1 Plugging in for the Cobb-Douglas parametric form of M , it can be shown that φt−1 = mt−1, which implies

        αk αl αk αl E [yt | Γt] = constant˜ + kt+ lt−δ kt−1−δ lt−1−δmt−1. 1 − αm 1 − αm 1 − αm 1 − αm

Notice that, although there are five sources of variation (kt, lt, kt−1, lt−1, mt−1), the model is not identified. Variation in mt−1 identifies δ, but the coefficient on kt is equal to the coefficient on kt−1 multiplied by −δ, and the same is true for l. In other words, variation in kt−1 and lt−1 do not provide any additional information about the parameters of the production function. As a result, all     we can identify is α 1 and α 1 . To put it another way, the rank condition necessary k 1−αm l 1−αm for identification of this model is not satisfied. In terms of our proposed alternative structure in Theorem 1, we would have

˜ α˜k = (1 − a) αk ;α ˜l = (1 − a) αl ;α ˜m = (1 − a) αm + a ; δ = δ .

        It immediately follows that α˜k = αk and α˜l = αl , and thus our continuum 1−α˜m 1−αm 1−α˜m 1−αm of alternative structures indexed by a ∈ (0, 1) satisfy the instrumental variables restriction. Doraszelski and Jaumandreu (2013) avoid this problem by exploiting both parametric restric- tions and observed price variation as an instrument for identification. This illustrates an important difference with our approach. In addition to not requiring parametric restrictions (or price varia-

A-5 tion), we are not using the first-order condition to find a replacement function for ω in the produc- tion function. Instead, we use it to form the share regression equation, which gives us a second structural equation that we use in identification and estimation. In terms of our Cobb-Douglas example, the second equation would be given by the following share equation st = αm − εt. Since this equation identifies αm (given that E [εt] = 0), this is enough to allow for identification of the whole production function and productivity.

Appendix C: Extensions

In this section we discuss four extensions to our baseline model: allowing for fixed effects, in- corporating additional unobservables in the flexible input demand, allowing for multiple flexible inputs, and revenue production functions.

C1. Fixed Effects

One benefit of our identification strategy is that it can easily incorporate fixed effects in the pro- duction function. With fixed effects, the production function can be written as

yt = f (kt, lt, mt) + a + ωt + εt, (38) where a is a firm-level fixed effect.42 From the firm’s perspective, the optimal decision problem for intermediate inputs is the same as before, as is the derivation of the nonparametric share regression

(equation 15), with ωet ≡ a + ωt replacing ωt. The other half of our approach can be easily augmented to allow for the fixed effects. We follow the dynamic panel data literature and impose that persistent productivity ω follows a first-

43 order linear Markov process to difference out the fixed effects: ωt = δωt−1 + ηt. The equivalent

42Kasahara, Schrimpf, and Suzuki (2015) generalize our approach to allow for the entire production function to be firm-specific. 43For simplicity we use an AR(1) here, but higher order auto-regressive models can be incorporated as well. We omit the constant from the Markov process since it is not separately identified from the mean of the fixed effects.

A-6 of equation (9) is given by:

Yt = a − C (kt, lt) + δ (Yt−1 + C (kt−1, lt−1)) + ηt.

Subtracting the counterpart for period t − 1 eliminates the fixed effect. Re-arranging terms leads to:

Yt − Yt−1 = − (C (kt, lt) − C (kt−1, lt−1)) + δ (Yt−1 − Yt−2)

+δ (C (kt−1, lt−1) − C (kt−2, lt−2)) + (ηt − ηt−1) .

Recall that E [ηt | Γt] = 0. Since Γt−1 ⊂ Γt, this implies that E [ηt − ηt−1 | Γt−1] = 0, where Γt−1 includes (kt−1, lt−1, Yt−2, kt−2, lt−2, Yt−3, ...). Let

r (kt, lt, kt−1, lt−1, (Yt−1 − Yt−2) , kt−2, lt−2) = − (C (kt, lt) − C (kt−1, lt−1)) (39)

+δ (Yt−1 − Yt−2)

+δ (C (kt−1, lt−1) − C (kt−2, lt−2)) .

From this we have the following nonparametric IV equation

E [Yt − Yt−1 | kt−1, lt−1, Yt−2, kt−2, lt−2, kt−3, lt−3]

= E [r (kt, lt, kt−1, lt−1, (Yt−1 − Yt−2) , kt−2, lt−2) | kt−1, lt−1, Yt−2, kt−2, lt−2, kt−3, lt−3] ,

which is an analogue to equation (11), in the case without fixed effects.

Theorem 5. Under Assumptions 2 - 5, plus the additional assumption that the distribution of the endogenous variables conditional on the exogenous variables (i.e., instruments),

G (lt, kt, lt−1, kt−1, (Yt−1 − Yt−2) , lt−2, kt−2 | lt−3, kt−3, lt−1, kt−1, Yt−2, lt−2, kt−2), is complete (as defined in Newey and Powell, 2003), the production function f is nonparametrically identified up to an additive constant if ∂ f (l , k , m ) is nonparametrically known. ∂mt t t t

A-7 Following the first part of the proof of Theorem 3, we know that the production function is

identified up to an additive function C (kt, lt). Following directly from Newey and Powell (2003), we know that, if the distribution G is complete, then the function r defined in equation (39) is identified.     Let C˜, δ˜ be a candidate alternative structure. The two structures (C , δ) and C˜, δ˜ are observationally equivalent if and only if

− (C (kt, lt) − C (kt−1, lt−1)) + δ (Yt−1 − Yt−2) + δ (C (kt−1, lt−1) − C (kt−2, lt−2))  ˜ ˜  ˜ ˜ ˜ ˜  = − C (kt, lt) − C (kt−1, lt−1) + δ (Yt−1 − Yt−2) + δ C (kt−1, lt−1) − C (kt−2, lt−2) . (40)

By taking partial derivatives of both sides of (40) with respect to kt and lt we obtain

∂ ∂ ˜ C (kt, lt) = C (kt, lt) ∂z ∂z

˜ for z ∈ {kt, lt}, which implies C (kt, lt) − C (kt, lt) = c for a constant c. Thus we have shown the production function is identified up to an additive constant. The estimation strategy for the model with fixed effects is almost exactly the same as without ˆ fixed effects. The first stage, estimating Dr (kt, lt, mt), is the same. We then form Yt in the same way. We also use the same series estimator for C (kt, lt). This generates analogues to equations (22) and (23):

X τk τl X τk τl Yt−Yt−1 = − ατk,τl kt lt + ατk,τl kt−1lt−1+δ (ωt−1 − ωt−2)+(ηt − ηt−1) (41)

0<τk+τl≤τ 0<τk+τl≤τ and Y − Y = − P α kτk lτl + δ (Y − Y ) t t−1 0<τk+τl≤τ τk,τl t t t−1 t−2 + (δ + 1) P α kτk lτl  (42) 0<τk+τl≤τ τk,τl t−1 t−1 −δ P α kτk lτl  + (η − η ) . 0<τk+τl≤τ τk,τl t−2 t−2 t t−1 We can use similar moments as for the model without fixed effects, except that now we need to lag the instruments one period given the differencing involved. Therefore the following moments can

A-8 τk τl be used to form a standard sieve GMM criterion function to estimate (α, δ): E [(ηt − ηt−1) kt−ιlt−ι], for ι ≥ 1.

C2. Extra Unobservables

Our identification and estimation approach can also be extended to incorporate additional unob- servables driving the intermediate input demand. In our baseline model, our system of equations consists of the share equation and the production function given by

st = ln D (kt, lt, mt) + ln E − εt

yt = f (kt, lt, mt) + ωt + εt.

We now show that our model can be extended to include an additional structural unobservable to the share equation for intermediate inputs, which we denote by ψt:

st = ln D (kt, lt, mt) + ln Ee − εt − ψt (43)

yt = f (kt, lt, mt) + ωt + εt,

 ψ +ε  where Ee ≡ E e t t .

Assumption 7. ψt ∈ It is known to the firm at the time of making its period t decisions and is not persistent: Pψ (ψt | It−1) = Pψ (ψt).

C2.1. Interpretations for the extra unobservable

We now discuss some possible interpretations for the extra unobservable ψ.

Shocks to prices of output and/or intermediate inputs Suppose that the prices of output and intermediate inputs, Pt and ρt, are not fully known when firm j decides its level of intermediate

A-9 ∗ ∗ inputs, but that the firm has private signals about the prices, denoted Pjt and ρjt, where

∗ ln Pjt = ln Pt − νjt, ∗ M ln ρjt = ln ρt − νjt ,

where we add the firm subscripts j for clarity. Firms maximize expected profits conditional on their signals:

 ωjt+εjt ∗ ∗  M (kjt, ljt, ωt) = max Eε,ν,νM PtF (kjt, ljt, mjt) e − ρtMjt | Pjt, ρjt Mjt h  M  i ∗ νjt  ωjt+εjt ∗ νjt ∗ ∗ = max Eε,ν,νM Pjte F (kjt, ljt, mjt) e − ρjte Mjt | Pjt, ρjt Mjt  M  νjt εjt ∗ ωjt νjt ∗ = max E (e ) E (e ) PjtF (kjt, ljt, mjt) e − E e ρjtMjt. Mjt

This implies that the firm’s first order condition for intermediate inputs is given by

∂  M  νjt εjt ∗ ωjt νjt ∗ E (e ) E (e ) Pjt F (kjt, ljt, mjt) e − E e ρjt = 0, ∂Mjt

which can be rewritten as

M   νjt εjt ν ρtMjt E (e ) E (e ) ∂ e jt = f (kjt, ljt, mjt) .  νM  vjt εjt PtYjt E e jt ∂mjt e e

M Let ψjt ≡ νjt − νjt , and we have

ρtMjt ln = sjt = ln D (kjt, ljt, mjt) + ln Ee − εjt − ψjt. PtYjt

Optimization error Suppose that firms do not exactly know their productivity, ωt, when they

∗ make their intermediate input decision. Instead, they observe a signal about productivity ωt =

ωt − ψt, where ψt denotes the noise in the signal. The firm’s profit maximization problem with

A-10 respect to intermediate inputs is

∗  ω +ψt+εt  M (kt, lt, ωt) = arg max PtEε,ψ F (kt, lt, mt) e t − ρtMt. Mt

This implies the following first order condition

∂ ∗ ω  ψt+εt  Pt F (kt, lt, mt) e t Eε,ψ e = ρt ∂Mt

Re-arranging to solve for the share of intermediate inputs gives us the share equation

ρtMt ln = st = ln D (kt, lt, mt) + ln Ee − εt − ψt. PtYt

Notice that for both interpretations of ψ, the firm will take into account the value of E eεt+ψt  when deciding on the level of intermediate inputs, which means we want to correct the share estimates by this term. As in the baseline model, we can recover this term by estimating the share

εt+ψt equation, forming the residuals, εt + ψt, and computing the expectation of e .

C2.2. Identification

The identification of the share equation is similar to our main specification, but with two differ- ences. The first is that, since ψt drives intermediate input decisions and is in the residual of the modified share equation (43), intermediate inputs are now endogenous in the share equation. As a result, we need to instrument for mt in the share regression. We can use mt−1 as an instrument for mt, since it is correlated with mt and independent of the error (εt + ψt). Since in the share re- gression we condition only on kt and lt (and no lags), mt−1 generates variation in mt (conditional on kt and lt), due to Assumptions 3 and 4. Identification follows from standard nonparametric IV arguments as in Newey and Powell (2003).

The second difference is that the error in the share equation is εt + ψt instead of εt. We can

A-11 form an alternative version of Yt, which we denote Yet:

Z Yet ≡ yt − D (kt, lt, mt) dmt − (εt + ψt) = Yt − ψt. (44)

This generates an analogous equation to equation (8) in the paper:

Yet = −C (kt, lt) + ωt − ψt ⇒ ωt = Yet + C (kt, lt) + ψt.

Re-arranging terms and plugging in the Markovian structure of ω gives us:

 

Yet = −C (kt, lt) + h Ygt−1 + C (kt−1, lt−1) + ψt−1 + ηt − ψt, (45) | {z } ωt−1 which is an analogue of equation (9).

The challenge is that we cannot form ωt−1, the argument of h in equation (45), because ψt−1 is not observed. We can, however, construct two noisy measures of ωt−1: (ωt−1 + εt−1) and

(ωt−1 − ψt−1) where

ωt−1 + εt−1 = yt−1 − f (kt−1, lt−1, mt−1) Z = yt−1 − D (kt−1, lt−1, mt−1) dmt−1 + C (kt−1, lt−1)

ωt−1 − ψt−1 = (ωt−1 + εt−1) − (εt−1 + ψt−1)  Z  = yt−1 − D (kt−1, lt−1, mt−1) dmt−1 + C (kt−1, lt−1)

− (st−1 − ln D (kt−1, lt−1, mt−1)) .

We could proceed to identify h and C from equation (45) by adopting methods from the mea- surement error literature (Hu and Schennach (2008) and Cunha, Heckman, and Schennach (2010)) using one of the noisy measures as our measure of ωt−1 and using the other as an instrument. However, such an exercise is beyond the scope of the current paper.

A-12 Instead, we illustrate our approach using an AR(1) process for the evolution of ω: h (ωt−1) =

δ0 + δωt−1 + ηt. We can then re-write equation (45) as

  Yet = −C (kt, lt) + δ0 + δ Ygt−1 + C (kt−1, lt−1) + ηt − ψt + δψt−1, (46)

where now the residual is given by ηt − ψt + δψt−1. Given Assumptions 2 and 7, we have that

E [ηt − ψt + δψt−1 | Γt−1] = 0, where recall that Γt−1 = Γ (It−2), i.e., a transformation of the period t − 2 information set. If we let

    r kt, lt, Ygt−1, kt−1, lt−1 = −C (kt, lt) + δ0 + δ Ygt−1 + C (kt−1, lt−1) , then identification of equation (46) follows from a parallel argument to that in Theorem 5 (i.e., in- cluding the completeness assumption and following the nonparametric IV identification arguments in Newey and Powell (2003)). Therefore we can identify the entire production function, up to an additive constant. We can also identify δ0 and δ, as well as productivity: ω + ε.

C3. Multiple Flexible Inputs

Suppose that, in addition to intermediate inputs being flexible, the researcher believes that one or more additional inputs are also flexible.44 Our approach can also be extended to handle this case. In what follows we assume that labor is the additional flexible input, but the approach can be extended to allow for more than two flexible inputs. When labor and intermediate inputs are both assumed to be flexible, we have two share equa- tions. We use superscripts M and L to distinguish them. Given the extra equation, we allow for additional structural errors in the model, ψ, as described in the preceding sub-section. Our system

44See, for example, Doraszelski and Jaumandreu (2013).

A-13 of equations is thus given by:

M M M M st = ln D (kt, lt, mt) + ln Ee − εt − ψt

L L L L st = ln D (kt, lt, mt) + ln Ee − εt − ψt

yt = f (kt, lt, mt) + ωt + εt.

Nonparametric identification of the flexible input elasticities of L and M proceeds as above in Appendix C2. These two input elasticities define a system of partial differential equations of the production function. By the fundamental theorem of calculus we have

Z mt ∂ M f (kt, lt, mt) dmt = f (kt, lt, mt) + C (kt, lt) m0 ∂mt

and

Z lt ∂ L f (kt, lt, mt) dlt = f (kt, lt, mt) + C (kt, mt) l0 ∂lt

M where now we have two constants of integration, one for each integrated share equation, C (kt, lt)

L and C (kt, mt). Following directly from Varian (1992), these partial differential equations can be combined to construct the production function as follows:

mt lt Z ∂ Z ∂ f (kt, lt, mt) = f (kt, l0, s) ds + f (kt, τ, mt) dτ − C (kt) . (47) ∂mt ∂lt m0 l0 That is, by integrating the (log) elasticities of intermediate inputs and labor, we can construct the production function up to a constant that is a function of capital only.45

45In order to see why this is the case, evaluate the integrals on the RHS of equation (47), we have the following

M  M  f (kt, lt, mt) = f (kt, l0, mt) − C (kt, l0) − f (kt, l0, m0) − C (kt, l0) L  L  + f (kt, lt, mt) − C (kt, mt) − f (kt, l0, mt) − C (kt, mt)

+f (kt, l0, m0)

= f (kt, lt, mt) ,

A-14 We can now construct an analogue to equation (44) above using the residual from either share equation. Using the intermediate input share equation, we have

m l Z t Z t ∂ ∂ M  Yet ≡ yt − f (kt, l0, s) ds − f (kt, τ, mt) dτ − εt + ψt (48) ∂mt ∂lt m0 l0

By subtracting equation (48) from the production function and re-arranging terms we have

M Yet = −C (kt) + ωt − ψt .

Plugging in the Markovian structure of ω gives us

  Ye = − (k ) + h Ye + (k ) + ψM  + η − ψM , et C t  et−1 C t−1 t−1 t t (49) | {z } ωt−1 an analogue to equation (45) above. Identification of C and h can be achieved in the same way as described in C2 for equation (45), with the difference that in this case C only depends on capital.

C4. Revenue Production Functions

We now show that our empirical strategy can be extended to the setting with imperfect competition and revenue production functions such that 1) we solve the identification problem with flexible inputs and 2) we can recover time-varying industry markups.46 We specify a generalized version of the demand system in Klette and Griliches (1996) and De Loecker (2011),

1   σ Pjt Yjt t = eχjt , (50) Πt Yt where f (kt, l0, m0) ≡ C (kt) is a constant of integration that is a function of capital kt. 46This stands in contrast to the Klette and Griliches (1996) approach that can only allow for a markup that is time- invariant.

A-15 where Pjt is the output price of firm j, Πt is the industry price index, Yt is a quantity index that

47 plays the role of an aggregate demand shifter, χjt is an observable (to the firm) demand shock,

and σt is the elasticity of demand that is allowed to vary over time.

Substituting for price using equation (50), the firm’s first order condition with respect to Mjt in the (expected) profit maximization problem is

1 σt     1  1 Yjt 1 ∂ εjt +1 χjt σt + 1 Πt 1 1 F (kjt, ljt, mjt) e E e = ρt. εjt σt σt e σt ∂Mjt Yt

Following the same strategy as before, we can rewrite this expression in terms of the observed log revenue share, which becomes

 1     1   1  εjt σ +1 sjt = ln + 1 + ln D (kjt, ljt, mjt) E e t − + 1 εjt, (51) σt σt

  ρtMt 1 where sjt ≡ ln ,   is the expected markup, D (·) is the output elasticity of interme- PjtYt 1 +1 σt diate inputs, and εjt is the ex-post shock. Equation (51) nests the one obtained for the perfectly competitive case in (15), the only difference being the addition of the expected markup, which is equal to 1 under perfect competition. We now show how to use the share regression (51) to identify production functions among

 1  imperfectly competitive firms. Letting εjt = + 1 εjt, equation (51) becomes e σt

sjt = Υt + ln D (kjt, ljt, mjt) + ln Ee − εejt, (52)

 ε   1  where E = E eejt and Υ = ln + 1 . The intermediate input elasticity can be rewritten so e t σt that we can break it into two parts: a component that varies with inputs and a constant µ, i.e.,

47 As noted by Klette and Griliches (1996) and De Loecker (2011), Yt can be calculated using a market-share weighted average of deflated revenues.

A-16 µ ln D (kjt, ljt, mjt) = ln D (kjt, ljt, mjt) + µ. Then, equation (52) becomes

µ sjt = (Υt + µ) + ln Ee + ln D (kjt, ljt, mjt) − εjt e (53) µ = ϕt + ln Ee + ln D (kjt, ljt, mjt) − εejt.

As equation (53) makes clear, without observing prices, we can nonparametrically recover the scaled ex-post shock εejt (and hence Ee), the output elasticity of intermediate inputs up to a constant µ ln D (kjt, ljt, mjt) = ln D(kjt, ljt, mjt)−µ, and the time-varying markups up to the same constant,

ϕt = Υt + µ, using time dummies for ϕt. Recovering the growth pattern of markups over time is useful as an independent result as it can, for example, be used to check whether market power has increased over time, or to analyze the behavior of market power with respect to the business cycle.

As before, we can correct our estimates for Ee and solve the differential equation that arises from equation (53). However, because we can still only identify the elasticity up to the constant µ,

R µ we have to be careful about keeping track of it as we can only calculate D (kjt, ljt, mjt) dmjt =

−µ R e D (kjt, ljt, mjt) dmjt. It follows that

Z −µ −µ µ f (kjt, ljt, mjt) e + C (kjt, ljt) e = D (kjt, ljt, mjt) dmjt.

From this equation it is immediately apparent that, without further information, we will not be able to separate the integration constant C (kjt, ljt) from the unknown constant µ. To see how both the constant µ and the constant of integration can be recovered, notice that what we observe in the data is the firm’s real revenue, which in logs is given by rjt = (pjt − πt) + yjt. Recalling equation (2), and replacing for pjt − πt using (50), the observed log-revenue produc- tion function is

 1  1  1  rjt = + 1 f(kjt, ljt, mjt) − yt + χjt + + 1 ωjt + εejt. (54) σt σt σt

  However, we can write 1 + 1 = eϕt e−µ. We know ϕ from our analysis above, so only µ is σt t

A-17 unknown. Replacing back into (54) we get

ϕt −µ ϕt −µ  rjt = e e f (kjt, ljt, mjt) − e e − 1 yt (55)

 ϕt −µ  + e e ωjt + χjt + εejt.

We then follow a similar strategy as before. As in equation (8), we first form an observable variable PjtYjt ! Πt Rjt ≡ ln ϕ R µ , eεejt ee t D (kjt,ljt,mjt)dmjt

where we now use revenues (the measure of output we observe), include eϕt , as well as using Dµ instead of the, for now, unobservable D. Replacing into (55) we obtain

ϕt−µ ϕt −µ   ϕt −µ  Rjt = −e C (kjt, ljt) − e e − 1 yt + e e ωjt + χjt .

From this equation it is clear that the constant µ will be identified from variation in the observed demand shifter yt. Without having recovered ϕt from the share regression first, it would not be possible to identify time-varying markups. Note that in equation (54) both σt and yt change with

time, and hence yt cannot be used to identify σt unless we restrict σt = σ as in Klette and Griliches (1996) and De Loecker (2011). Finally, we can only recover a linear combination of productivity and the demand shock,   1 + 1 ω + χ . The reason is clear: since we do not observe prices, we have no way of σt jt jt disentangling whether, after controlling for inputs, a firm has higher revenues because it is more   productive (ω ) or because it can sell at a higher price (χ ). We can write ωµ = 1 + 1 ω +χ jt jt jt σt jt jt as a function of the parts that remain to be recovered

µ ϕt−µ ϕt −µ  ωjt = Rjt + e C (kjt, ljt) + e e − 1 yt,

A-18 48 µ µ  µ and impose the Markovian assumption on this combination: ωjt = h ωjt−1 + ηjt. We can use µ  similar moment restrictions as before, E ηjt|kjt, ljt = 0, to identify the constant of integration

C (kjt, ljt) as well as µ (and hence the level of the markups).

48In this case, one would need to replace Assumption 2 with the assumption that the weighted sum of productivity ωjt and the demand shock, χjt is Markovian. Note that this assumption does not necessarily imply that the two components will be Markovian individually. See De Loecker (2011) for an example that imposes this assumption.

A-19 Appendix D: Additional Results

Table D1: Average Input Elasticities of Output--Energy+Services Flexible (Structural Estimates: Gross Ouput)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

Labor 0.15 0.21 0.37 0.28 0.29 0.22 (0.02) (0.03) (0.03) (0.06) (0.03) (0.01)

Capital 0.06 0.05 0.05 0.01 0.04 0.08 (0.01) (0.02) (0.01) (0.03) (0.02) (0.01)

Raw Materials 0.71 0.69 0.49 0.63 0.58 0.63 (0.02) (0.03) (0.04) (0.07) (0.03) (0.01)

Energy+Services 0.08 0.11 0.09 0.10 0.11 0.11

(0.00) (0.00) (0.00) (0.00) (0.00) (0.00)

Sum 1.01 1.05 1.00 1.02 1.02 1.04 (0.01) (0.02) (0.01) (0.03) (0.01) (0.01) Mean(Capital) / Mean(Labor) 0.43 0.23 0.12 0.04 0.14 0.37 (0.08) (0.14) (0.04) (0.08) (0.06) (0.04)

Chile

Labor 0.18 0.28 0.31 0.29 0.31 0.22 (0.02) (0.03) (0.03) (0.04) (0.02) (0.01)

Capital 0.06 0.08 0.04 0.06 0.08 0.11 (0.01) (0.01) (0.01) (0.02) (0.01) (0.01)

Raw Materials 0.77 0.65 0.65 0.59 0.63 0.67 (0.02) (0.02) (0.02) (0.05) (0.02) (0.01)

Energy+Services 0.07 0.07 0.06 0.11 0.07 0.07 (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)

Sum 1.08 1.08 1.06 1.05 1.10 1.08 (0.01) (0.01) (0.01) (0.02) (0.01) (0.00) Mean(Capital) / Mean(Labor) 0.36 0.28 0.13 0.20 0.26 0.49

(0.04) (0.05) (0.04) (0.06) (0.04) (0.03)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification in which energy+services is flexible and raw materials is quasi-fixed. The results are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. Since the input elasticities are heterogeneous across firms, we report the average input elasticities within each given industry. d. The row titled "Sum" reports the sum of the average labor, capital, raw materials, and energy+services elasticities, and the row titled "Mean(Capital)/Mean(Labor)" reports the ratio of the average capital elasticity to the average labor elasticity.

A-20 Table D2: Heterogeneity in Productivity--Energy+Services Flexible (Structural Estimates: Gross Output)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

75/25 ratio 1.20 1.25 1.24 1.30 1.28 1.33 (0.02) (0.03) (0.03) (0.06) (0.02) (0.01)

90/10 ratio 1.50 1.62 1.60 1.75 1.68 1.83 (0.05) (0.10) (0.07) (0.16) (0.06) (0.03)

95/5 ratio 1.87 2.09 2.00 2.26 2.04 2.43 (0.09) (0.22) (0.11) (0.24) (0.11) (0.08)

Exporter 0.14 -0.04 0.02 0.14 0.08 0.01 (0.04) (0.06) (0.03) (0.12) (0.02) (0.03)

Importer 0.00 -0.03 -0.03 -0.04 0.10 -0.02 (0.02) (0.06) (0.03) (0.05) (0.02) (0.05)

Advertiser -0.12 -0.13 -0.10 -0.07 0.05 -0.16 (0.03) (0.11) (0.04) (0.06) (0.02) (0.05)

Wages > Median 0.06 0.13 0.14 0.09 0.19 0.10 (0.02) (0.05) (0.02) (0.05) (0.02) (0.04)

Chile

75/25 ratio 1.31 1.42 1.41 1.44 1.48 1.49 (0.01) (0.02) (0.02) (0.04) (0.02) (0.01)

90/10 ratio 1.76 2.04 2.01 2.13 2.23 2.26 (0.03) (0.05) (0.04) (0.12) (0.05) (0.02)

95/5 ratio 2.22 2.69 2.65 2.94 2.94 3.08 (0.05) (0.12) (0.07) (0.21) (0.08) (0.04)

Exporter -0.01 -0.06 0.01 0.01 -0.05 -0.03 (0.03) (0.03) (0.03) (0.05) (0.03) (0.01)

Importer 0.02 0.04 0.07 0.09 0.05 0.07 (0.02) (0.02) (0.02) (0.04) (0.02) (0.01)

Advertiser -0.01 -0.01 0.01 0.01 0.00 0.02 (0.01) (0.02) (0.01) (0.01) (0.02) (0.01)

Wages > Median 0.12 0.15 0.19 0.17 0.16 0.25 (0.01) (0.02) (0.02) (0.03) (0.03) (0.01)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification in which energy+services is flexible and raw materials is quasi-fixed. The results are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. In the first three rows we report ratios of productivity for plants at various percentiles of the productivity distribution. In the remaining four rows we report estimates of the productivity differences between plants (as a fraction) based on whether they have exported some of their output, imported intermediate inputs, spent money on advertising, and paid wages above the industry median. For example, in industry 311 for Chile a firm that advertises is, on average, 1% less productive than a firm that does not advertise.

A-21 Table D3: Average Input Elasticities of Output--Raw Materials Flexible (Structural Estimates: Gross Ouput)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

Labor 0.13 0.21 0.33 0.28 0.30 0.24 (0.02) (0.04) (0.03) (0.06) (0.02) (0.01)

Capital 0.05 0.07 0.02 0.01 0.05 0.06 (0.01) (0.03) (0.02) (0.04) (0.02) (0.02)

Raw Materials 0.60 0.44 0.41 0.42 0.42 0.43 (0.01) (0.01) (0.01) (0.02) (0.01) (0.01)

Energy+Services 0.22 0.30 0.23 0.26 0.28 0.29

(0.02) (0.05) (0.03) (0.07) (0.04) (0.02)

Sum 1.00 1.01 1.00 0.97 1.05 1.02 (0.01) (0.04) (0.03) (0.05) (0.04) (0.01) Mean(Capital) / Mean(Labor) 0.39 0.33 0.07 0.04 0.16 0.26

(0.12) (0.26) (0.06) (0.15) (0.07) (0.09)

Chile

Labor 0.19 0.34 0.35 0.38 0.42 0.26 (0.02) (0.03) (0.04) (0.04) (0.05) (0.02)

Capital 0.06 0.07 0.06 0.06 0.12 0.11 (0.01) (0.02) (0.02) (0.02) (0.03) (0.01)

Raw Materials 0.59 0.47 0.49 0.46 0.43 0.47 (0.00) (0.01) (0.01) (0.01) (0.01) (0.00)

Energy+Services 0.21 0.18 0.15 0.16 0.18 0.21 (0.02) (0.03) (0.04) (0.03) (0.06) (0.02)

Sum 1.05 1.06 1.05 1.07 1.14 1.05 (0.02) (0.02) (0.02) (0.02) (0.03) (0.01) Mean(Capital) / Mean(Labor) 0.32 0.20 0.16 0.16 0.28 0.41 (0.04) (0.06) (0.04) (0.06) (0.07) (0.04)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification in which raw materials is flexible and energy+services is quasi-fixed. The results are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. Since the input elasticities are heterogeneous across firms, we report the average input elasticities within each given industry. d. The row titled "Sum" reports the sum of the average labor, capital, raw materials, and energy+services elasticities, and the row titled "Mean(Capital)/Mean(Labor)" reports the ratio of the average capital elasticity to the average labor elasticity.

A-22 Table D4: Heterogeneity in Productivity--Raw Materials Flexible (Structural Estimates: Gross Output)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

75/25 ratio 1.24 1.27 1.25 1.24 1.24 1.31 (0.02) (0.07) (0.03) (0.07) (0.07) (0.02)

90/10 ratio 1.58 1.64 1.59 1.62 1.57 1.72 (0.03) (0.16) (0.08) (0.19) (0.19) (0.05)

95/5 ratio 1.92 2.10 1.90 2.01 1.89 2.13 (0.06) (0.24) (0.13) (0.41) (0.30) (0.07)

Exporter 0.09 -0.01 0.04 0.05 0.01 0.04 (0.04) (0.09) (0.07) (0.18) (0.11) (0.01)

Importer 0.02 0.03 0.08 -0.03 0.05 0.07 (0.02) (0.09) (0.10) (0.15) (0.08) (0.01)

Advertiser -0.06 0.01 -0.01 -0.02 -0.02 -0.03 (0.02) (0.05) (0.04) (0.05) (0.05) (0.01)

Wages > Median 0.04 0.11 0.14 0.09 0.11 0.13 (0.02) (0.07) (0.03) (0.06) (0.07) (0.01)

Chile

75/25 ratio 1.34 1.44 1.40 1.50 1.52 1.51 (0.02) (0.02) (0.02) (0.02) (0.04) (0.01)

90/10 ratio 1.82 2.13 2.05 2.31 2.26 2.32 (0.07) (0.06) (0.06) (0.06) (0.13) (0.02)

95/5 ratio 2.32 2.80 2.70 3.09 2.98 3.19 (0.12) (0.10) (0.10) (0.11) (0.25) (0.04)

Exporter -0.02 0.01 0.05 -0.01 -0.03 0.01 (0.06) (0.03) (0.03) (0.03) (0.03) (0.01)

Importer 0.06 0.06 0.09 0.12 0.06 0.13 (0.06) (0.03) (0.03) (0.04) (0.03) (0.01)

Advertiser 0.00 0.02 0.02 0.03 -0.02 0.04 (0.02) (0.02) (0.03) (0.01) (0.02) (0.01)

Wages > Median 0.13 0.15 0.18 0.19 0.17 0.25 (0.04) (0.03) (0.03) (0.02) (0.04) (0.01)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification in which raw materials is flexible and energy+services is quasi-fixed. The results are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. In the first three rows we report ratios of productivity for plants at various percentiles of the productivity distribution. In the remaining four rows we report estimates of the productivity differences between plants (as a fraction) based on whether they have exported some of their output, imported intermediate inputs, spent money on advertising, and paid wages above the industry median. For example, in industry 311 for Chile a firm that advertises is, on average, 0% less productive than a firm that does not advertise.

A-23 Table D5: Average Input Elasticities of Output--Fixed Effects (Structural Estimates: Gross Ouput)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

Labor 0.18 0.26 0.39 0.46 0.29 0.33 (0.05) (0.07) (0.04) (0.12) (0.09) (0.02)

Capital 0.09 0.04 0.06 0.25 0.09 0.07 (0.07) (0.06) (0.04) (0.16) (0.11) (0.02)

Intermediates 0.67 0.54 0.52 0.51 0.53 0.54

(0.01) (0.01) (0.01) (0.02) (0.01) (0.00)

Sum 0.95 0.84 0.96 1.22 0.90 0.95 (0.12) (0.11) (0.07) (0.26) (0.18) (0.04) Mean(Capital) / Mean(Labor) 0.52 0.16 0.15 0.55 0.30 0.21 (1.42) (0.66) (0.09) (0.34) (0.35) (0.07)

Chile

Labor 0.20 0.33 0.50 0.37 0.60 0.30 (0.03) (0.07) (0.05) (0.03) (0.15) (0.02)

Capital 0.02 0.08 0.17 0.10 0.32 0.15 (0.06) (0.09) (0.07) (0.06) (0.15) (0.05)

Intermediates 0.67 0.54 0.56 0.59 0.50 0.55 (0.00) (0.01) (0.01) (0.01) (0.01) (0.00)

Sum 0.89 0.95 1.24 1.05 1.42 1.01 (0.08) (0.13) (0.11) (0.07) (0.29) (0.07) Mean(Capital) / Mean(Labor) 0.09 0.23 0.35 0.27 0.53 0.50 (0.25) (0.23) (0.11) (0.14) (0.35) (0.15)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification with fixed effects and are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. Since the input elasticities are heterogeneous across firms, we report the average input elasticities within each given industry. d. The row titled "Sum" reports the sum of the average labor, capital, and intermediate input elasticities, and the row titled "Mean(Capital)/Mean(Labor)" reports the ratio of the average capital elasticity to the average labor elasticity.

A-24 Table D6: Heterogeneity in Productivity--Fixed Effects (Structural Estimates: Gross Output)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

75/25 ratio 1.36 1.67 1.30 1.58 1.52 1.52 (0.32) (0.40) (0.06) (0.46) (0.32) (0.08)

90/10 ratio 1.82 2.82 1.70 2.71 2.20 2.21 (1.25) (1.71) (0.18) (2.82) (1.04) (0.25)

95/5 ratio 2.30 4.14 2.09 4.04 2.78 2.84 (2.66) (4.01) (0.35) (13.90) (1.75) (0.41)

Exporter 0.26 0.25 0.10 -0.04 0.41 0.22 (0.35) (0.95) (0.19) (2.64) (0.50) (0.11)

Importer 0.13 0.35 0.19 -0.08 0.32 0.27 (0.29) (0.76) (0.25) (2.25) (0.37) (0.10)

Advertiser 0.01 0.32 0.07 -0.17 0.19 0.14 (0.09) (0.30) (0.08) (0.41) (0.24) (0.04)

Wages > Median 0.17 0.45 0.20 0.02 0.40 0.37 (0.26) (0.53) (0.06) (0.51) (0.33) (0.09)

Chile

75/25 ratio 1.57 1.60 1.52 1.52 2.06 1.57 (0.15) (0.17) (0.12) (0.13) (0.36) (0.15)

90/10 ratio 2.41 2.55 2.40 2.34 4.48 2.45 (0.40) (0.59) (0.45) (0.48) (1.27) (0.45)

95/5 ratio 3.14 3.38 3.38 3.20 7.30 3.41 (0.61) (1.20) (0.98) (0.99) (2.55) (0.77)

Exporter 0.34 0.07 -0.07 0.07 -0.42 0.14 (0.23) (0.21) (0.12) (0.42) (0.38) (0.24)

Importer 0.51 0.17 -0.02 0.22 -0.25 0.25 (0.26) (0.18) (0.11) (0.31) (0.39) (0.21)

Advertiser 0.22 0.13 -0.06 0.04 -0.20 0.12 (0.11) (0.14) (0.09) (0.07) (0.23) (0.10)

Wages > Median 0.50 0.28 0.10 0.23 -0.12 0.39 (0.20) (0.17) (0.08) (0.15) (0.44) (0.22)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification with fixed effects and are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. In the first three rows we report ratios of productivity for plants at various percentiles of the productivity distribution. In the remaining four rows we report estimates of the productivity differences between plants (as a fraction) based on whether they have exported some of their output, imported intermediate inputs, spent money on advertising, and paid wages above the industry median. For example, in industry 311 for Chile a firm that advertises is, on average, 5% more productive than a firm that does not advertise.

A-25 Table D7: Average Input Elasticities of Output--Extra Unobservable (Structural Estimates: Gross Ouput)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

Labor 0.18 0.32 0.39 0.45 0.40 0.36 (0.04) (0.04) (0.03) (0.07) (0.03) (0.01)

Capital 0.13 0.18 0.08 -0.01 0.11 0.15 (0.03) (0.02) (0.02) (0.04) (0.02) (0.01)

Intermediates 0.67 0.54 0.52 0.51 0.53 0.54

(0.01) (0.01) (0.01) (0.01) (0.01) (0.00)

Sum 0.98 1.03 0.99 0.95 1.03 1.05 (0.02) (0.03) (0.02) (0.08) (0.02) (0.01) Mean(Capital) / Mean(Labor) 0.72 0.55 0.21 -0.02 0.27 0.41

Chile

Labor 0.24 0.44 0.45 0.37 0.52 0.36 (0.01) (0.03) (0.02) (0.03) (0.03) (0.01)

Capital 0.12 0.11 0.07 0.09 0.14 0.17 (0.01) (0.02) (0.01) (0.02) (0.01) (0.01)

Intermediates 0.66 0.54 0.55 0.59 0.50 0.55 (0.00) (0.01) (0.01) (0.01) (0.01) (0.00)

Sum 1.02 1.09 1.08 1.04 1.16 1.08 (0.01) (0.02) (0.02) (0.02) (0.02) (0.01) Mean(Capital) / Mean(Labor) 0.50 0.26 0.15 0.25 0.28 0.48 (0.05) (0.05) (0.04) (0.05) (0.04) (0.02)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification with fixed effects and are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. Since the input elasticities are heterogeneous across firms, we report the average input elasticities within each given industry. d. The row titled "Sum" reports the sum of the average labor, capital, and intermediate input elasticities, and the row titled "Mean(Capital)/Mean(Labor)" reports the ratio of the average capital elasticity to the average labor elasticity.

A-26 Table D8: Heterogeneity in Productivity--Extra Unobservable (Structural Estimates: Gross Output)

Colombia

Food Products Textiles Apparel Wood Products Fabricated Metals (311) (321) (322) (331) (381) All Gross Output Gross Output Gross Output Gross Output Gross Output Gross Output (GNR) (GNR) (GNR) (GNR) (GNR) (GNR)

75/25 ratio 1.35 1.35 1.29 1.35 1.32 1.37 (0.04) (0.03) (0.02) (0.08) (0.03) (0.01)

90/10 ratio 1.82 1.83 1.68 1.94 1.76 1.88 (0.13) (0.08) (0.06) (0.30) (0.05) (0.02)

95/5 ratio 2.36 2.34 2.03 2.57 2.18 2.37 (0.26) (0.17) (0.11) (0.70) (0.09) (0.03)

Exporter 0.15 0.03 0.04 0.32 0.11 0.06 (0.07) (0.04) (0.04) (0.25) (0.04) (0.01)

Importer 0.03 0.05 0.11 0.14 0.12 0.11 (0.04) (0.04) (0.04) (0.15) (0.03) (0.01)

Advertiser -0.03 0.05 0.03 0.07 0.07 0.02 (0.03) (0.04) (0.03) (0.11) (0.03) (0.01)

Wages > Median 0.09 0.18 0.18 0.21 0.23 0.19 (0.04) (0.04) (0.02) (0.10) (0.03) (0.01)

Chile

75/25 ratio 1.37 1.49 1.43 1.51 1.54 1.55 (0.01) (0.03) (0.02) (0.02) (0.02) (0.01)

90/10 ratio 1.92 2.18 2.12 2.35 2.35 2.40 (0.03) (0.09) (0.04) (0.05) (0.06) (0.02)

95/5 ratio 2.51 2.93 2.77 3.15 3.12 3.33 (0.06) (0.18) (0.08) (0.11) (0.12) (0.04)

Exporter 0.00 0.03 0.09 0.00 -0.02 0.03 (0.04) (0.05) (0.03) (0.04) (0.03) (0.01)

Importer 0.12 0.10 0.13 0.15 0.09 0.15 (0.04) (0.04) (0.02) (0.04) (0.03) (0.01)

Advertiser 0.04 0.04 0.06 0.04 0.01 0.06 (0.02) (0.03) (0.02) (0.02) (0.02) (0.01)

Wages > Median 0.20 0.19 0.22 0.22 0.20 0.30 (0.03) (0.04) (0.02) (0.03) (0.03) (0.01)

Notes: a. Standard errors are estimated using the bootstrap with 200 replications and are reported in parentheses below the point estimates. b. For each industry, the numbers are based on a gross output specification with fixed effects and are estimated using a complete polynomial series of degree 2 for each of the two nonparametric functions (G and C ) of our approach. c. In the first three rows we report ratios of productivity for plants at various percentiles of the productivity distribution. In the remaining four rows we report estimates of the productivity differences between plants (as a fraction) based on whether they have exported some of their output, imported intermediate inputs, spent money on advertising, and paid wages above the industry median. For example, in industry 311 for Chile a firm that advertises is, on average, 5% more productive than a firm that does not advertise.

A-27 Online Appendix O1: Monte Carlo Simulations

We consider a panel of 500 firms over 30 periods. To simplify the problem we abstract away from labor and consider the following Cobb-Douglas production function

αk αm ωjt+εjt Yjt = Kjt Mjt e ,

where αk = 0.25, αm = 0.65, and εjt is measurement error that is distributed N (0, 0.07). ωjt follows an AR(1) process

ωjt = δ0 + δωjt−1 + ηjt,

where δ0 = 0.2, δ = 0.8, and ηjt ∼ N (0, 0.04). We select the variances of the errors and the AR(1) parameters to roughly correspond to the estimates from our Chilean and Colombian datasets. The environment facing the firms is the following. At the beginning of each period, firms choose investment Ijt and intermediate inputs Mjt. Investment determines the next period’s capital stock via the law of motion for capital

Kjt+1 = (1 − dj) Kjt + Ijt,

where dj ∈ {0.05, 0.075, 0.10, 0.125, 0.15} is the depreciation rate which is distributed uniformly across firms. Intermediate inputs are subject to quadratic adjustment costs of the form

2 M (Mjt − Mjt−1) Cjt = 0.5b , Mjt where b is a parameter that indexes the level of adjustment costs, which we vary in our simulations. Firms choose investment and intermediate inputs to maximize expected discounted profits. The

O-1 problem of the firm, written in recursive form, is thus given by

αk αm ωjt I V (Kjt,Mjt−1, ωjt) = max PtKjt Mjt e − Pt Ijt − ρtMjt Ijt,Mjt 2 (Mjt − Mjt−1) −0.5b + βEtV (Kjt+1,Mjt, ωjt+1) Mjt s.t.

Kjt+1 = (1 − dj) Kjt + Ijt

Ijt ≥ 0,Mjt ≥ 0

ωjt+1 = δ0 + δωjt + ηjt+1.

The price of output Pt and the price of intermediate inputs ρt are set to 1. The price of investment

I Pt is set to 8, and there are no other costs to investment. The discount factor is set to 0.985. In order for our Monte Carlo simulations not to depend on the initial distributions of (k, m, ω), we simulate each firm for a total of 200 periods, saving only the last 30 periods. The initial conditions, k1, m0, and ω1 are drawn from the following distributions: U (11, 400) ,U (11, 400) , and U (1, 3). Since the firm’s problem does not have an analytical solution, we solve the problem numerically by value function iteration with an intermediate modified policy iteration with 100 steps, using a multi-linear interpolant for both the value and policy functions.49

O1.1. Inference

In the first set of Monte Carlo simulations, we provide evidence that our bootstrap procedure has the correct coverage for our estimator. For this set of simulations, we set the adjustment cost pa- rameter for intermediate inputs, b, to zero to correspond with our DGP. We begin by simulating 500 samples, each consisting of 500 firms over 30 periods. For each sample we nonparametrically bootstrap the data 199 times.50 For each bootstrap replication we estimate the output elasticities of capital and intermediate inputs using our procedure as described in Section 6. We then compute the 95% bootstrap confidence interval using the 199 bootstrap replications. This generates 500

49See Judd (1998) for details. 50See Davidson and MacKinnon (2004).

O-2 bootstrap confidence intervals, one for each sample. We then count how many times (out of 500) the true values of the output elasticities (i.e., 0.25 and 0.65) lie within the bootstrap confidence interval. The results are presented graphically in Figures O1.1A and O1.1B. The true value of the elasticity is contained inside the 95% confidence interval 95.4% (capital) and 94.2% (interme- diate inputs) of the time. Hence, for both the capital and intermediate elasticities we obtain the correct coverage, suggesting that we can use our bootstrap procedure to do inference even in the nonparametric case.

O1.2. Estimator Performance

For our second set of Monte Carlo simulations, we evaluate how well our estimator performs when the first-order condition for intermediate inputs does not hold exactly. We first generate 100 Monte Carlo samples for each of 9 values of the adjustment cost parameter b, ranging from zero adjustment costs to very large adjustment costs. For the largest value, b = 1, this would imply that firms in our Chilean and Colombian datasets, on average, pay substantial adjustment costs for intermediate inputs of almost 10% of the value of total gross output. For each sample we estimate the average capital and intermediate input elasticities in two ways. As a benchmark, we first obtain estimates using a simple version of dynamic panel with no fixed effects. The reason we use dynamic panel is that, in light of our non-identification arguments in Section 3, this procedure provides consistent estimates under the presence of adjustment costs. We compare these estimates to ones obtained via our nonparametric procedure, which assumes adjustment costs of zero. We impose the (true) Cobb-Douglas parametric form in the estimation of the dynamic panel (but not in our nonparametric procedure) to give dynamic panel the best possible chance of recov- ering the true parameters and to minimize the associated standard errors. Given the Cobb-Douglas structure and the AR(1) process for productivity, we have

yjt − αkkjt − αmmjt − δ0 − δ (yjt−1 − αkkjt−1 − αmmjt−1) = ηjt − δεjt−1 + εjt.

The dynamic panel procedure estimates the parameter vector (αk, αm, δ0, δ) by forming moments

O-3 in the RHS of the equation above. Specifically we use a constant and kjt, kjt−1, mjt−1 as the instruments. Since the novel part of our procedure relates to the intermediate input elasticity via the first stage, we focus on the intermediate input elasticity estimates. The comparison for the capital elasticities is very similar. The results are presented graphically in Figures O1.2A and O1.2B. Not surprisingly, the dynamic panel data method breaks down and becomes very unstable for small values of adjustment costs, as these costs are insufficient to provide identifying variation via the lags. This is reflected both in the large percentile ranges and in the fact that the average estimates bounce around the truth. Our method on the other hand performs very well, as expected. This is the case even though for dynamic panel we impose and exploit the constraint that the true technology is Cobb-Douglas, whereas for our procedure we do not. As we increase the level of adjustment costs, our nonparametric method experiences a small upward bias relative to the truth and relative to dynamic panel, although in some cases our estimates are quite close to those of dynamic panel. The percentile range for dynamic panel is much larger, however. So while on average dynamic panel performs slightly better for large values of adjustment costs, the uncertainty in the estimates is larger. Overall our procedure performs remarkably well, even for very large values of adjustment costs. At the largest value, our average estimated elasticity of 0.689 is less than 3 percentage points larger than the truth.

O-4 Figure O1.1A: Monte Carlo: Inference--Capital Elasticity Distribution of 95% Bootstrap Confidence Intervals O-5

Notes: This figure presents the results from applying our estimator to Monte Carlo data generated as described in Online Appendix O1.1 in the absenece of adjustment costs. For each simulation we nonparametrically bootstrap the data 199 times. For each bootstrap replication we estimate the output elasticity of capital using our procedure as described in Section 6. We then compute the 95% bootstrap confidence intervals using these replications. This generates a confidence interval for each of the 500 Monte Carlo samples. In the figure we plot the lower and upper boundaries of the confidence intervals for each Monte Carlo sample. The simulations are sorted by the mid-point of these intervals. The true value of the elasticity is 0.25. 95.4% of the constructed confidence intervals cover the true value. Figure O1.1B: Monte Carlo: Inference--Intermediate Elasticity Distribution of 95% Bootstrap Confidence Intervals O-6

Notes: This figure presents the results from applying our estimator to Monte Carlo data generated as described in Online Appendix O1.1 in the absenece of adjustment costs. For each simulation we nonparametrically bootstrap the data 199 times. For each bootstrap replication we estimate the output elasticity of intermediate inputs using our procedure as described in Section 6. We then compute the 95% bootstrap confidence intervals using these replications. This generates a confidence interval for each of the 500 Monte Carlo samples. In the figure we plot the lower and upper boundaries of the confidence intervals for each Monte Carlo sample. The simulations are sorted by the mid-point of these intervals. The true value of the elasticity is 0.65. 94.2% of the constructed confidence intervals cover the true value. Figure O1.2A: Monte Carlo: Estimator Performance--Intermediate Input Elasticity GNR and Dynamic Panel: Averages

0.85

0.8

0.75

0.7 O-7

0.65 Average Intermediate Elasticity Input Intermediate Average

Dynamic Panel GNR

0.6 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 Value of Adjustment Cost Parameter (b)

Notes: This figure presents the results from applying both our estimator and a dynamic panel data estimator to Monte Carlo data generated as described in Online Appendix O1.2. The data are generated such that the first order condition no longer holds because of quadratic adjustment costs. The parameter b indexes the degree of adjustment costs in intermediate inputs facing the firm, with higher values representing larger adjustment costs. The y-axis measures the average estimated intermediate input elasticity for both estimators across 100 Monte Carlo simulations. The true value of the average elasticity is 0.65. Figure O1.2B: Monte Carlo: Estimator Performance--Intermediate Input Elasticity GNR and Dynamic Panel: Percentile Ranges

1.2

1

0.8

0.6

0.4 O-8

0.2

0 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00

Average Intermediate Input Elasticity Input Intermediate Average -0.2

Dynamic Panel (97.5 percentile) GNR (97.5 percentile) -0.4 Dynamic Panel (2.5 percentile) GNR (2.5 percentile)

-0.6 Value of Adjustment Cost Parameter (b)

Notes: This figure presents the results from applying both our estimator and a dynamic panel data estimator to Monte Carlo data generated as described in Online Appendix O1.2. The data are generated such that the first order condition no longer holds because of quadratic adjustment costs. The parameter b indexes the degree of adjustment costs in intermediate inputs facing the firm, with higher values representing larger adjustment costs. The y-axis measures the 2.5 and 97.5 percentiles of the estimated intermediate input elasticity for both estimators across 100 Monte Carlo simulations. The true value of the average elasticity is 0.65. Online Appendix O2: Value Added

In this appendix we provide additional details regarding value added.

Restricted Profit Functions

Recall equation (24) in the main body:

E ωt ωt VAt = F (kt, lt, mt) e − Mt≡ V (kt, lt, e ) .

It can be shown that the total derivative of value added with respect to one of its inputs is equal to the partial derivative of gross output with respect to that input. For example, the total derivative of value added with respect to productivity is given by:

dV AE dV (k , l , eωt ) t = t t deωt deωt  ωt  ωt   ∂F (kt, lt, mt) e ∂F (kt, lt, mt) e ∂Mt = ω − − 1 ω ∂e t ∂Mt ∂e t ∂GO = t . ∂eωt

Due to the first order condition in equation (14) in the main text, the term inside the parentheses on the second line is equal to zero, where the relative price of output to intermediate inputs has been normalized to one via deflation. This implies that:

 ωt   E ωt  E  E ωt  ∂GOt e dV At e VAt dV At e = = (1 − St) . ωt ωt E E ωt E ∂e GOt de VAt GOt de VAt | {z } | {z } | {z } elasGOt elasVAt elasVAt eωt eωt eωt

However, once we add back in the ex-post shocks we have the following:

dV AE dV (k , l , eωt , eεt ) t = t t deωt deωt  ωt+εt  ωt+εt   ∂F (kt, lt, mt) e ∂F (kt, lt, mt) e ∂Mt = ω − − 1 ω . ∂e t ∂Mt ∂e t

O-9 Notice now that the term inside the parentheses is no longer equal to zero, due to the presence of

the ex-post shock, εt. The reason is that the first order condition, which previously made that term equal to zero, is an ex-ante object, whereas what is inside the parentheses is ex-post. Therefore, we cannot simply transform the value added elasticities into their gross output counterparts by re-scaling via the ratio of value added to gross output.

ω +ε The first order condition implies that ∂F (kt,lt,mt)e t t = eεt . In turn, this implies that ∂Mt E

dV AE ∂F (k , l , m ) eωt+εt eεt  ∂M  t = t t t − − 1 t deωt ∂eωt E ∂eωt

E ωt  εt  VAt GOt GOt ∂Mt e e ⇒ elas ωt = elas ωt − − 1 . e e E ωt+εt E VAt ∂e VAt E

The equation above can then be rearranged to form relationship between the elasticities as:

 ωt   E ωt  ωt  εt  ∂GOt et ∂V At e ∂Mt e e = (1 − St) + − 1 . ωt ωt E ωt ∂e GOt ∂e VAt ∂e GOt E | {z } | {z } GOt E elas VAt eωt elas eωt

A similar result holds when we analyze the elasticities with respect to the entire productivity shock,

eωt+εt , instead of just the persistent component, eωt . In this case we have the following relationship:

 ωt+εt   E ωt+εt  ωt+εt  εt  ∂GOt e ∂V At e ∂Mt e e = (1 − St) + − 1 . ωt+εt ωt+εt E ωt+εt ∂e GOt ∂e VAt ∂e GOt E | {z } | {z } GOt E elas VAt eωt+εt elas eωt+εt

“Structural” Value Added

As discussed in Section 8.2, for the Leontief case we have

ωt+εt Yt = min [H(kt, lt), C (mt)] e . (56)

The standard Leontief condition, H(kt, lt) = C (mt), will not generally hold unless C (mt) = aMt.

ωt+εt Even in this linear case, the value added production function, H(kt, lt)e , does not relate

E E ωt+εt 1  cleanly to the empirical measure of value added VAt ≡ Yt−Mt, since VAt = H(kt, lt) e − a .

O-10 However, it does correspond directly to gross output since

ωt+εt Yt = H(kt, lt)e .

Neither of these issues are resolved by moving ωt inside the min function in equation (56). Sup-

ωt pose that instead of equation (56), one wrote the production function as: Yt = min[H (kt, lt) e ,

εt ωt C (mt)]e . For similar reasons, the condition, H(kt, lt)e = C (mt) , only holds when C (mt) =

aMt. Even when this is the case, the value added production function will again not correspond

E ωt εt 1  to the empirical measure of value added as VAt = H(kt, lt)e e − a . As was the case above,

ωt+εt however, it directly corresponds to gross output: Yt = H(kt, lt)e .

It is also the case that moving εt inside the min function does not help. The problem is that the

εt key condition, H(kt, lt)e = aMt, will not hold when ωt is outside the min because of the presence of the ex-post shock εt. Since εt is realized after input decisions are made, the key condition will

E generally not hold. Thus neither VAt , nor gross output Yt correspond to the structural value-

ωt+εt added production function H (kt, lt) e . An analogous argument holds when ωt is inside the min function.

References Online Appendix Davidson, Russell and James G MacKinnon. 2004. Econometric Theory and Methods, vol. 5. New York: Oxford University Press.

Judd, Kenneth L. 1998. Numerical Methods in Economics. Cambridge: MIT Press.

O-11