NBER WORKING PAPER SERIES

UNOBSERVABLE SELECTION AND COEFFICIENT STABILITY: THEORY AND VALIDATION

Emily Oster

Working Paper 19054 http://www.nber.org/papers/w19054

NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 May 2013

Ling Zhong, Unika Shrestha, Damian Kozbur, Guillaume Pouliot, David Birke and Angela Li provided excellent research assistance. I thank David Cesarini, Raj Chetty, Todd Elder, Amy Finkelstein, Larry Katz, Matt Gentzkow, Matt Notowidigdo, Chad Syverson, Manisha Shah, Azeem Shaikh, Jesse Shapiro, Bryce Steinberg, Matt Taddy, Heidi Williams and participants in seminar at Brown, University of Chicago Booth School of Business, Wharton and Yale for helpful comments. I am grateful to a number of authors for providing R-squared values from their work and to Amir Sufi for re-running analyses on request. I gratefully acknowledge financial support from the Neubauer Family. The views expressed herein are those of the author and do not necessarily reflect the views of the National Bureau of Economic Research.

NBER working papers are circulated for discussion and comment purposes. They have not been peer- reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.

© 2013 by Emily Oster. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Unobservable Selection and Coefficient Stability: Theory and Validation Emily Oster NBER Working Paper No. 19054 May 2013, Revised January 2014 JEL No. C01,I1,I12

ABSTRACT

A common heuristic for evaluating robustness of results to omitted variable bias is to look at coefficient movements after inclusion of controls. This heuristic is informative only if selection on observables is proportional to selection on unobservables. I formalize this link, drawing on theory in Altonji, Elder and Taber (2005) and show how, with this assumption, coefficient movements, along with movements in R-squared values, can be used to calculate omitted variable bias. I discuss empirical implementation and describe a formal bounding argument to replace the coefficient movement heuristic. I show two validation exercises suggesting that this bounding argument would perform well empirically. I discuss application of this procedure to a large set of publications in economics, and use evidence from randomized studies to draw guidelines as to appropriate bounding values.

Emily Oster University of Chicago Booth School of Business 5807 South Woodlawn Ave Chicago, IL 60637 and NBER [email protected] 1 Introduction

Concerns about omitted variable bias are common to most or all non-experimental empirical work in economics, other social sciences and the natural sciences. And although randomized experiments are common in natural sciences and are becoming increasingly common within economics, the majority of empirical work in both settings is still not randomized.1 Within economics, a common heuristic for evaluating the robustness of a result to omitted variable bias concerns is to look at the sensitivity of the treatment effect to added controls. In a review of non-structural, non-experimental empirical work published in three general interest economics journals2 in 2012, 75% of papers explored the sensitivity of the results to varying control sets, and a number of these papers were quite explicit about the relationship between coefficient stability and omitted variable bias.3 The baseline assumptions underlying the linear model do not imply that coefficient movements are informative about bias. The heuristic is informative only with the added assumption that the selection on observables variables is informative about the selection on unobservables (Altonji, Elder and Taber (2005); Murphy and Topel (1990)). This connection is rarely made explicit in empirical work and the underlying assumption is generally untested. The goal in this paper is to develop a formal statement of robustness related to the coefficient stability heuristic. I formalize the link between coefficient stability and omitted variable bias through the proportional selection on observed and unobserved variables. I develop implementation guidelines and suggest that robustness can be described by an identified set of the treatment effect. I perform two validation examples - one based on constructed data and one linking possibly biased observational relationships to external causal estimates - which suggest this procedure performs well. Finally, I apply this adjustment to a set of papers in economics and use insights from randomized data to suggest bounds for constructing the identified set. A key result is that coefficient movements alone are not sufficient to learn about bias. It is also necessary to take into account movements in the R-squared values when controls are added. Small coefficient movements with small R-squared movements can imply more bias than large movements in both. In the analysis of the economics literature, I show significant overlap in the distribution of coefficient movements between result which do and do not survive the full robustness adjustment. I begin in Section 2 with the formal theory, which adopts the setup and assumption of Altonji, Elder and Taber (2005) (hence, AET). I consider the following model: Y = βX + W1 + W2, where W1 is observed and W2 is unobserved and the coefficient of interest is β. Note that β cannot be recovered from regression 1For example: in 2012 JAMA published 133 major research papers, only 53 of which were randomized. The American Journal of Public Health published 128, only 14 of which were randomized. The combination of the American Economic Review, the Quarterly Journal of Economics and the Journal of Political Economy published 69 empirical, non-structural papers, only 11 of which were randomized. 2American Economic Review, Journal of Political Economy and Quarterly Journal of Economics. 3For example, Chiappori et al (2012) state: “It is reassuring that the estimates are very similar in the standard and the augmented specifications, indicating that our results are unlikely to be driven by omitted variables bias.” Similarly, Lacetera et al (2012) state: “These controls do not change the coefficient estimates meaningfully, and the stability of the estimates from columns 4 through 7 suggests that controlling for the model and age of the car accounts for most of the relevant selection.”

2 because of the unobserved elements in the model. I describe the proportional selection assumption, which formally links the relationship between X and the observed variables to the relationship between X and the unobserved variables. This link invokes a degree of proportionality, denoted δ. Under this assumption I show that β can be recovered from: (1) the coefficients on X with and without controls for observed variables; (2) the R-squared values from controlled and uncontrolled regressions; and (3) a value for δ. The result shows that coefficient movements do relate to omitted variable bias, but they must be scaled by movements in the R-squared values. Section 3 turns to implementation. Performing the adjustment described requires the coefficient and R-squared values from regressions, which are recoverable, but also an assumption about δ. A key practical issue is that, because the model specified has no error term, δ above captures both the relative importance of the unobserved variables related to X and the idiosyncratic variation in Y . This value has no clear analog in the data.

In the first part of Section 3, I redefine the model as Y = βX + W1 + W˜2 + , where  is orthogonal to ˜ X,W1 and W˜2. This unpacks δ into two components: (1) δ, the proportional selection between W1 and the unobservables related to X (W˜2) and (2) the R-squared of the full regression with controls for X,W1 and W˜2, which I denote Rmax. The advantage to this unpacking is that it allows us to use knowledge about the measurement error or idiosyncratic variation in Y to constrain the problem. Consider estimating the wage returns to education in a setting where omitted family background is a concern, and imagine two possible wage measures: this week’s wages, or the average weekly wage over the last year. As an overall measure of earning power, wages measured based on a single week have more measurement error. Without separating δ into components we know only that it should be lower for the single-week wages. By separating out the components,we can use knowledge of ˜ the relative degree of measurement error to define the two Rmax values, and then note that δ should actually be the same in the two analyses. Section 3 provides a formal way to describe robustness of results under this model. I show, using logic ˜ akin to the partial identification literature, that by assuming bounding values for Rmax and δ one can generate an identified set for the treatment effect. This set can then be subject to robustness questions: Does it contain zero? Does it contain values far from what we would conclude based on the controlled effect? A key ˜ ˜ consideration is bounding values for δ and Rmax. I argue that δ ∈ [0, 1] is a plausible bound on the degree of selection, and note that Rmax is necessarily bounded between the controlled R-squared and 1, although knowledge of the problem may suggest a bounding value less than 1. Section 3 provides some simulation evidence on the performance of this adjustment, primarily illustrating the importance of the R-squared movements. What this simulation does not answer is the question of how this adjustment performs in non-simulated data or whether the identified set logic would improve

3 inference. I evaluate the performance of this adjustment in two ways in Section 4. First, I use NLSY data to construct a dataset relating education and wages; the data is constructed such that we know the “true” treatment effect based on a full set of family background controls. I then evaluate the performance of this adjustment by excluding combinations of controls from the “observed” set. I estimate the value of δ˜ which would be produced by each excluded set and calculated the bias-adjusted treatment effect. I show that 83% of the δ˜ values are in the range from 0 to 1, the bounds suggested in Section 3, meaning that in 83% of cases the identified set would include the true effect. This may actually undervalue this performance as the control set is selected here at random rather than based on using the most important controls first, as would be common in practice. The data is constructed with a realistic amount of idiosyncratic variation in the outcome, and I demonstrate that ignoring this variation (i.e. assuming the R-squared in the full model is 1) would lead to a flawed adjustment. In a second test I use NLSY data to estimate the relationship between maternal behavior and child outcomes. A major concern in this setting is the the confound by socioeconomic status. I match the possibly biased observational estimates with external evidence of the causal effect from randomized data or comprehensive meta-analyses (this is close in spirit to Lalonde (1986)).4 I then ask whether the proportional selection adjustment would separate true from false associations within these possibly biased observational correlations. I find that the adjustment performs well: The identified set under this adjustment excludes zero in the two cases where external evidence confirms a link, and overlaps with zero in the three cases where the external evidence rejects the link. The evidence in this section suggests this adjustment would improve inference. Although these settings by no means capture all of the areas in which coefficient stability heuristics are used, they are helpful largely because the basic empirical issue – pro-social behavior linked with socioeconomic status – is shared by many settings within economics. In the final section of the paper I turn to the application of this procedure to the economics literature. I focus on two questions. First: How do stability statements in published papers in economics hold up to a

version of this adjustment? Second: Is it possible to generate tighter bounds on Rmax? This latter issue is

crucial since the evidence in Section 4 suggests that the bound of Rmax = 1 will lead us to conclude too few results are robust. I begin with a sample of papers in the American Economic Review, Journal of Political Economy, Quarterly Journal of Economics and Econometrica, published between 2008 and 2013 and satisfying citation cutoffs (>20 for 2008-2010 and >10 for 2011-2013). I extract all relationships for which a coefficient stability heuristic is reported (58 papers; 134 results). I consider the sensitivity of these results to the proportional ˜ selection adjustment, with δ ∈ [0, 1] and varying bounds on Rmax. My primary definition of robustness is 4I also look at sibling fixed effects estimates within this dataset as another way to fully control for family background, with identical results.

4 whether the identified set includes zero; this is estimated on the set of relationships for which inclusion of controls moves the effect toward zero. I also consider an auxiliary definition based on whether the identified set falls fully within some standard error bounds of the controlled effect.

Under either notion of robustness, about 40% of results are robust to a value of Rmax = 1. I show other bounds on Rmax which are a function of the fully controlled R-squared. These capture the idea that there is variation in how predictable outcomes are, and this variation can be roughly inferred from how much is predicted by the observables. Denoting the fully controlled R-squared as R˜, I explore robustness to

Rmax = ΠR˜, with varying values of Π. About 50% of results are robust to a value of Π = 3, and 65% to a value of Π = 2. This shows considerable variation in the robustness of these stability claims, but does not suggest a bounding value. For that, I turn to randomized results. The claim that the coefficient is unchanged by inclusion of controls implicitly suggests that the treatment is assigned as if randomly. If that is the case, then the coefficient movement should be within the bounds we would expect to occur if treatment were randomized. Fortunately for this exercise, it is common in randomized papers to show coefficients with and without controls, either as a balancing test or to increase precision. I draw a sample of all randomized papers from the American Economic Review, Journal of Political Economy, Quarterly Journal of Economics, Econometrica and American Economic Journal: Applied Economics between 2008 and 2013 which report coefficients with and without controls (33 papers; 76 results).

Randomized data is more robust to this adjustment than non-randomized, but I show that assuming Rmax = 1 would still lead to rejection of 30% of randomized results. I derive cutoffs based on values of Π which would allow 95% of randomized results to survive: for the primary robustness definition, this value is Π = 2.2. This provides a full robustness reporting standard. Showing that the identified set with bounds ˜ δ ∈ [0, 1] and Rmax ∈ [R,˜ min{2.2R,˜ 1}] excludes zero would suggest robustness in the range of what would be seen if the treatment were randomized. For results where inclusion of controls moves the coefficient away from zero, this identified set may be evaluated based on whether it leads to similar magnitude conclusions. In the full sample of non-randomized results considered, about 65% would survive this bounding robustness argument. I show considerable overlap in coefficient movements across groups which do and do not survive this robustness standard, indicating that additional information is provided by this calculation over an approach that explicitly calculates coefficient movements. I conclude this section by discussing three example papers and showing how the conclusions would be altered (or not) by this adjustment.

5 2 Theory

Consider the regression model

Y = βX + W1 + W2 (1)

X represents the treatment and the coefficient of interest is β. W1 and W2 represent confounders. Specifically,

o W1 is a vector which is a linear combination of observed control variables wj multiplied by their true

PJo o o coefficients: W1 = j=1 wj γj .W2 is a vector which is a linear combination of unobserved control variables

u PJu u u wj , again multiplied by their true coefficients: W2 = j=1 wj γj . Note that W2 may contain some components which are orthogonal to X, including any measurement error in Y .

I assume that Cov(W1,W2) = 0 and, without loss of generality, that V ar(X) = 1. The assumption of

orthogonality between W1 and W2 is discussed in more detail below. The covariance matrix associated with

0 the vector [X,W1,W2] is positive definite. Note that without further assumptions on the relationship between

X,W1 and W2 there is no information provided about the bias associated with W2 by seeing the bias from W1.

σ1X σ2X Define the proportional selection relationship as δ = , where σiX = Cov(Wi,X), σii = V ar(Wi) σ11 σ22 and δ is the coefficient of proportionality. I assume that δ > 0 and refer to this as the proportional selection assumption. This implies that the relationship between X and the vector containing the observables is informative about the relationship between X and the vector containing the unobservables. Define the coefficient resulting from the short regression of Y on X as β˚ and the R-squared from that ˜ regression as R˚. Define the coefficient from the intermediate regression of Y on X and W1 as β and the R-squared as R.˜ Note these are in-sample values. ˚ ˜ The omitted variable bias on β and β is controlled by the auxiliary regressions of (1) W1 on X; (2) W2 on X; and (3) W2 on X and W1. Denote the in-sample coefficient on X from regressions of W1 and W2 on X ˆ ˆ ˆ as λW1|X and λW2|X , respectively and the coefficient on X from a regression of W2 on X and W1 as λW2|X,W1 .

Denote the population analogs of these values λW1|X , λW2|X and λW2|X,W1. All estimates are implicitly indexed by n. Probability limits are taken as n approaches infinity. All observations are independent and identically distributed according to model (1). By standard omitted variable bias formulas, I can express the probability limits of the short and intermediate regression coefficients in terms of these values:

˚ p β → β + λW1|X + λW2|X ˜ p β → β + λW2|X,W1

Lemma 1 defines the probability limit of the coefficient difference.

p 2 2 ˚ ˜ σ11−σ1X (δσ22+σ11) Lemma 1. (β − β) → σ1X 2 σ11(σ11−σ1X )

6 Proof. This follows directly from the probability limits of the auxiliary regression coefficients under the proportional selection assumption. Proof details are in Appendix A.

p Denote the sample variance of Y asσ ˆyy and note thatσ ˆyy → σyy. Lemma 2 defines probability limits for functions of the R-squared values.

2 σ2 −σ2 (σ +δσ ) σ σ2 −σ2 (σ +δ2σ ) ˜ ˚ p [ 11 1X 11 22 ] ˜ p 22[ 11 1X 11 22 ] Lemma 2. (R − R)ˆσyy → 2 2 and (1 − R)ˆσyy → 2 . σ11(σ11−σ1X ) σ11(σ11−σ1X )

Proof. This follows directly from the auxiliary regression coefficients and Lemma 1. Proof details are in Appendix A.

Define the following:

 h i β˜ − β˚− β˜ 1−R˜ if δ=1  R˜−R˚  " q #  ˚ ˜ 2 2 ˚ ˜ 2 ˜ ˚ ˜  ˜ [β−β] [Θ +Θ(4δ(1−δ)[β−β] [1−R])]−Θ[β−β] ∗ β − 2 if δ 6= 1, σ1X ≥ 0 β = 2(1−δ)[β˚−β˜] [R˜−R˚]  " q #  ˚ ˜ 2 2 ˚ ˜ 2 ˜ ˚ ˜  ˜ − [β−β] [Θ +Θ(4δ(1−δ)[β−β] [1−R])]−Θ[β−β] β − 2 if δ 6= 1, σ1X < 0  2(1−δ)[β˚−β˜] [R˜−R˚] h i h i h i 2 ˚ ˜ 2 where Θ = R˜ − R˚ σˆyy + β − β R˜ − R˚ .

p Proposition 1. β∗ → β.

Proof. I outline the proof here, with details in Appendix A. Recall that the bias of interest to calculate is ˆ λW2|X,W1 which, under the proportional selection assumption and by Lemma 1, converges in probability to

δσ22σ1X 2 . δ is assumed to be known so the unknown variables are σ11, σ22 and σ1X . σ11−σ1X By Lemmas 1 and 2 we have:

2 2 ˚ ˜ p σ11 − σ1X (δσ22 + σ11) β − β → σ1X 2 σ11(σ11 − σ1X ) 2 σ2 − σ2 (σ + δσ ) h ˜ ˚i p 11 1X 11 22 R − R σˆyy → 2 2 σ11(σ11 − σ1X ) σ σ2 − σ2 (σ + δ2σ ) h ˜i p 22 11 1X 11 22 1 − R σˆyy → 2 σ11(σ11 − σ1X )

This defines a system of three equations in the three unknowns of interest. Solving this system completes the proof.

h i For values of δ close to 1, the simple expression β˜ − δ β˚− β˜ 1−R˜ will be an approximation for β. The R˜−R˚ exact value diverges from this as δ gets significantly larger than 1. A standard error for this estimator can be calculated with a bootstrap.

7 It is worth briefly discussing a central assumption in deriving these results – namely, that W1 and W2 are orthogonal. This is conceptually somewhat at odds with the idea that these are “related” observables and unobservables. In practice, this assumption can be generated mechanically, but it does influence how we should think about the proportionality condition.

Consider the example where W1 and W2 together capture a full picture of socioeconomic status, and W1 contains standard demographic controls – education categories, income categories, race. The assumption is, then, that W2 captures everything else that we do not see – details about education and income, IQ and achievement measures, etc. If we simply observed these variables they would be by definition correlated with the components of W1. What W2 actually captures is the variables after they are residualized with respect to the elements in W1. This does not pose an issue for the implementation procedure I discuss below – in running the intermediate regression and controlling for the components of W1 the relevant omitted information is by definition orthogonal. However, when we consider the relative importance of the observed and unobserved variables in explaining variation in X, it is important to keep in mind that the unobservable set is residualized. The setup and assumptions in this section are drawn directly from AET. The observation that the bias

σ22σ1,X is proportional to 2 is echoed in their work. The discussion here differs in two ways. First, I show σ11−σ1,X explicitly the relationship between bias and coefficient movements. Second, the estimator they derive is consistent only if β = 0 or δ = 1. The more complicated formulas above arise from the bias on W1 in the intermediate regression. Further discussion is in Appendix A.2.

3 Implementation

This section develops an empirically tractable robustness concept based on the theory in Section 2.

3.1 Components of δ

In Section 2, W2 in the full model is defined to capture all of the residual variation in Y which is not explained

by X and W1. This includes some components which are related to X and W1 through proportional selection.

W2 may also include components which are orthogonal to X or are related to X in a way about which W1 is not informative. Focus on the simplest case where there is some idiosyncratic variation (say, measurement error) in Y.

The proportionality value, δ, defined as δ σ1X = σ2X is sensitive to the degree of this measurement error5 and σ11 σ22

the relationship between W1,W2 and X. It is possible to unpack δ into these two elements.

5 This is because σ22 captures measurement error in Y.

8 Define W2 = W˜2 + , where Cov(X, ) = 0, Cov(W1, ) = 0 and Cov(W˜2, ) = 0. Define ˜ ˜ σ2˜X = Cov(W2,X), σ2˜2˜ = V ar(W2). The full model can be rewritten as:

Y = βX + W1 + W˜2 +  (2)

With this assumption, we can separate δ into two components.

˜ ˜σ1X σ2˜X ˜ First, define δ as the proportionality value relating W1 and W˜2 : δ = . This δ captures how much σ11 σ2˜2˜ of X is explained by the observables versus the unobservables, but in this case the unobservables contain only

the omitted variables which are related to X and are proxied by W1.

Second, define the overall R-squared of the model above - with controls for X,W1 and W˜2 – as Rmax.

Note that Rmax < 1 as long as  6= 0. The difference between Rmax and 1 captures the degree of idiosyncratic variation in Y. We can now define β∗0, a slight modification of β∗ :

 h i ˜ β˜ − β˚− β˜ Rmax−R if δ˜=1  R˜−R˚  " q #  ˚ ˜ 2 2 ˜ ˜ ˚ ˜ 2 ˜ ˚ ˜  ˜ [β−β] [Θ +Θ(4δ(1−δ)[β−β] [Rmax−R])]−Θ[β−β] ˜ ∗0 β − 2 if δ 6= 1, σ1X ≥ 0 β = 2(1−δ˜)[β˚−β˜] [R˜−R˚]  " q #  ˚ ˜ 2 2 ˜ ˜ ˚ ˜ 2 ˜ ˚ ˜  ˜ − [β−β] [Θ +Θ(4δ(1−δ)[β−β] [Rmax−R])]−Θ[β−β] ˜ β − 2 if δ 6= 1, σ1X < 0  2(1−δ˜)[β˚−β˜] [R˜−R˚] h i h i h i 2 ˚ ˜ 2 with Θ = R˜ − R˚ σˆyy + β − β R˜ − R˚ as in Section 2.

p Corollary 1. β∗0 → β.

Proof. See Appendix A.

This discussion focuses on the case where W2 contains only a portion related to W1 and an idiosyncratic component. This discussion can be extended, however, to a case where there is a second (possibly observed) factor which is orthogonal to W1 but correlated with X and Y. If we consider the education and wage example, this could capture something like sex: we would not think of sex as an important socioeconomic confound, but failing to control for it would bias X. In particular, consider the case where the full model is

Y = βX + W1 + W˜2 + m +  (3)

where m is orthogonal to W1, W˜2 and  and the assumptions about orthogonality with  are as above.

p Corollary 2. If m is included in the short and intermediate regression, then β∗0 → β. If m is not included in

∗0 p either regression then β → β where β is defined by Y = βX + Ψ1W1 + Ψ2W˜2 + .

9 Proof. See Appendix A.

This corollary simply notes that if there is a second confounding factor which is orthogonal to W1 and

W˜2, the effect β can be recovered in a similar way, as long as m can be observed and included in both

regressions. In this case Rmax is the R-squared we would observe if X,W1, W˜2 and m were all included in the regression. ˜ ∗ ∗0 These corollaries are a simple extension of Proposition 1. If Rmax = 1 then δ = δ and β = β . ˜ Together, δ and Rmax define a single value of δ (note that the converse is not true). In this sense, there is no additional information provided by unpacking δ into these components. Practically, however, this separation is useful. Using this procedure to estimate β, or to make an argument about robustness, requires assuming a value for δ. This value, which captures both idiosyncratic variation in the outcome and the ˜ selection relationship, does not have a natural interpretation in the data. In contrast, both δ and Rmax do. This is not to say that these values are recoverable from the data directly - they are not - but that because they have a natural interpretation it may be easier to describe and evaluate the assumptions made about them.

3.2 Implementation in Regression

Consider a researcher interested in estimating a treatment effect β of X on Y in a linear model. The researcher is concerned about confounding variable, and observes a vector of controls W1. There are a set of unobserved variables, which relate to Y , X and an index based on W1 as defined in the proportional selection assumption. Practically, the researcher is concerned that they observe only a subset of the possible omitted variables. There are two regressions that the researcher can observe, shown in equations 4 and 5 below. The first controls only for X, the variable of interest. The second adds controls for the observed confounders. Each of these produces a coefficient on X.

Y =α ˆ + βX˚ + ˆ (4) ˜ Y =α ˜ + βX + ΨW1 + ˜ (5)

Referring back to the expressions for β∗ and β∗0 above, equation (4) here recovers β˚ and R˚. Similarly, equation (5) recovers β˜ and R.˜ The result in Proposition (1) shows how β can be recovered using these regression values and an assumption about δ. Corollary (1) shows how β can be recovered using these regression values and ˜ 6 assumptions about Rmax and δ. Stata code accompanying this paper performs this calculation.

6The command is psacalc and it is available through ssc.

10 Point Estimation

˜ Given values for Rmax and δ, the object β is point identified. As I noted above, both of these objects have natural interpretations in the data and in some cases it may be reasonable to make assumption about their value. β will also be point identified with an assumption about δ.

Bounding and Robustness Calculations

˜ In many cases the exact values of δ and Rmax will not be clear. In such cases, it may be more feasible to make robustness statements based on bounding values for these objects. In practice, discussion of coefficient movements are typically done as part of a robustness argument; the discussion here will suggest a more formal way to make similar contributions. I conceptualize this in a partial identification framework (Tamer, 2010; Manski, 2003). Consider the ∗0 ˜ estimator β (Rmax, δ) which is defined above and is an asymptotically consistent estimator of β under known ˜ values of Rmax and δ. Without any additional assumptions, I note that Rmax is bounded between R˜ (the controlled regression coefficient) and 1. Under the proportional selection assumption, δ˜ is bounded below at 0 and some arbitrary upper bound δ. The estimator below delivers the identified set for β.

∗0 ˜ ˜ ∆S = {β ∈ R : β = β (Rmax, δ), for some Rmax ∈ [R, 1] and δ ∈ [0, δ]}

˜ This set is bounded on one side by β, which is the value of β delivered when Rmax = R˜ or δ = 0 (or both). Without more assumptions, the other bound is either positive or negative infinity, since δ is unbounded. The insight of partial identification is that it may be possible to use additional intuition from the problem to ˜ further bound both Rmax and δ values. Consider first the issue of bounding δ.˜ I argue that for many problems, δ˜ = 1 may be a reasonable upper bound. Recall that δ˜ captures the relative importance of the index of observed and unobserved variables in explaining X. The bound of δ˜ = 1 suggests the observables are at least as important as the unobservables. One reason to favor this is that researchers typically focus their data collection efforts (or their choice of regression controls) on the controls they believe ex ante are the most important. A second is that W˜2 is residualized with

respect to W1 so, conceptually, we want to think of the omitted variables having been stripped of the portion related to the included ones. Ultimately, this is an empirical issue, and I will discuss evidence for this bound in Section 4.

In the case of Rmax it may be possible to generate a bound smaller than 1 by, for example, considering measurement error in Y or evaluating variation in Y which cannot be related to X because it results from choices made after X is determined. Define an assumed upper bound on Rmax as Rmax, with Rmax ≤ 1.

11 ˜ ∗0 With these two bounding assumptions the identified set is: ∆s = [β, β (Rmax, 1)].

Empirically, the question of interest in considering ∆s is whether the conclusions based on the full set are similar to what we would draw based on observing the controlled coefficient β.˜ If inclusion of controls moves the coefficient toward zero, one natural question is whether the set includes zero. Regardless of the direction of movement one could ask whether the bounds of the set are outside the confidence interval on β˜ – this effectively asks whether the magnitude conclusions based on the controlled coefficient are robust.

This suggested robustness leaves open the question of what is a reasonable Rmax to assume in describing the identified set. I discuss this in two specific empirical contexts in Section 4 and in more detail in the context of the economics literature in Section 5.

3.3 Numerical Example

Before moving to explore the performance of this adjustment in data, it is useful to illustrate the mechanisms here with numerical simulation. I will focus on illustrating the importance of the R-squared movements in addition to coefficient changes. ˜ To begin, assume a setup in which it is known that δ = 1 and Rmax = 1. Assume that a completely uncontrolled regression of Y on X yields a coefficient of 0.5 and an R-squared of 0.1. Consider three scenarios for the controlled coefficient: (1) β˜ = .45, (2) β˜ = .3 and (3) β˜ = .15. Relying on only the coefficient movements, the last result looks the least robust. That is, observation that the coefficient moves from 0.5 to 0.15 would likely lead to much more concern about the remaining unobservables than the move from 0.5 to 0.45. However, this information is incomplete without knowing R.˜ Figure 1 graphs, for the three coefficient movement scenarios, the distribution of bias-adjusted β values. I have truncated the distribution in all cases at -2.5, but used text to describe the coefficients under one extreme case where R˜ = .11. This figure illustrates large overlap in the distributions. Clearly values of β > 0.15 will be delivered only by the smaller coefficient movements (and values β > 0.3 only by the smallest one) but below β = 0.15 there is full overlap. At the value of R˜ = .11, the larger coefficient movement implies a more negative effect, but as the R-squared movement gets arbitrarily small, all three cases could have infinitely negative adjusted β values. A perhaps simpler way to say this is that all three coefficient movements are fully consistent with true effect of less than zero. Such an effect would be delivered by R˜ < .19 with the small move, R˜ < .46 with the medium move, and R˜ < .73 with the big move. Clearly, the last is the most sensitive, but without knowing the movement in R-squared, we cannot draw conclusions. ˜ A second example is useful to illustrate in simulated data how Rmax and δ may separately vary and what this will imply about the relationship between initial coefficient movements and the remaining bias. I ˜ simulate data on {X,W1, W2} based on assumed values of variances, covariances and means. I then construct

12 the outcome Y as a function of these variables:

Y = 0.3X + W1 + W˜2 + 

where  ∼ N(0, 1) and Cov(X, ) = Cov(W1, ) = Cov(W˜2, ) = 0. I estimate the movements in coefficients and

R-squared values when W1 is included in the regression of Y on X. ˜ I construct several simulations which vary values of δ and the variance of W˜2. Throughout the example

I assume that V ar(W1) = V ar(X) = 1, Cov(X,W1) = 0.2 and all variables (X,W1 and W˜2) have a mean of 1. ˜ Table 1 shows the simulated results. The first three rows use a value of δ = 1 and vary V ar(W˜2); the bottom ˜ ˜ three use a value of δ = .5 and vary V ar(W˜2). The variation in δ does not alter the intuition here, so I will focus on the case where δ˜ = 1.

This table demonstrates that the variance of W˜2 plays a huge role in how coefficient movements should be interpreted, precisely because this variation drives the difference between Rmax and R,˜ as well as how that

difference compares to the difference between R˜ and R.˚ When V ar(W˜2) = 1 - the same variance as W1 - the ˜ R-squared moves about halfway toward Rmax with inclusion of controls. In this case, if we assume δ = 1 we could come close to the true β by simply adjusting the coefficients again by the amount they move with

7 controls. When V ar(W˜2) is much smaller than 1 – we assume 0.1 here – the R-squared moves nearly all the way to Rmax and the remaining bias is very small. Conversely, when V ar(W˜2) is large relative to V ar(W1) the

R-squared moves only a tiny fraction of the way to Rmax and the remaining bias is very large relative to the coefficient movements. This last example illustrates the key example of where either ignoring R-squared movements or not

assuming a correct value for Rmax will cause serious errors. In this case the coefficient movement with inclusion of controls is very small, and one might be tempted to conclude the result was very robust. But because there is so much explanatory power in the unobservables, the remaining error is very large. ∗0 ˜ The final column of the table shows the calculated β values, using the δ and correct Rmax and demonstrates that the mechanics of the adjustment are correct: all values are very tightly estimated around 0.3, the true coefficient used to construct the simulation.

4 Empirical Validation

The results above provide a way to recover an estimate of causal treatment effects under the assumption that selection on observables and unobservables is proportional. The discussion makes clear that the bias does

7This could actually be done without seeing the R-squared at all. This is the procedure that is suggested in Bellows and Miguel (2009) and followed in Nunn and Wantchekon (2011). It relies heavily, however, on this assumption that the variance of the observed and unobserved components are equal (this assumption is not stated in those papers, which do not discuss R-squared movements). The Nunn and Wantchekon (2011) case will be discussed in more detail later.

13 relate to coefficient movements in these cases, which is encouraging for common robustness calculations. The simulated data shows that this works mechanically, and illustrates the importance of these R-squared movements. However, this theoretical discussion does not provide any insight as to whether the proportional selection assumption is valid in empirical contexts and, by extension, whether the robustness calculation suggested would improve inference. In this section I describe two exercises which explore how this adjustment performs in the data. I use two example settings which are familiar to economists. In the first example, I generate a dataset in which I know the true effect by construction and explore how this adjustment would perform when various confounds are “unobserved”. In the second example, I compute possibly biased estimates, perform the adjustment, and compare the resulting conclusions to external evidence on causal impacts. I ask whether the adjusted coefficients generate more accurate conclusions than the simple controlled estimates.

4.1 Constructed Data: Returns to Education

I begin with a canonical example in economics, the relationship between wages and years of education. One issue with estimates of standard Mincer regressions is the confound with family background: people whose mothers have more education, for example, are more likely to be educated but also have higher wages for other reasons.8 Using data from the NLSY I construct a dataset in which I define the “true” return to education as the impact of education controlling for a full set of family background characteristics. I then consider the bias - both in simple controlled regressions and after this adjustment is performed - in hypothetical cases in which I do not observe the full set of controls. This exercise will allow me to see how the adjustment performs and to estimate values of δ˜ and ask how they compare to the bounds suggested in Section 3.

4.1.1 Data and Empirical Strategy

I use data from the NLSY-79 cohort. I am concerned with the impact of years of education on log wages, and I begin by considering the standard Mincer regression of log wages on educational attainment. I use the higher of the two educational levels recorded in 1981 and 1986 and the higher of the two wage values recorded in 1996 and 1998. Experience and experience-squared are calculated in the typical way (experience = age - education years - 6). I also control for individual sex. My concern is with confounding by demographics and family background. I capture this with eight variables: region of residence, race, marital status, mother’s education, father’s education, mother’s occupation, father’s occupation and number of siblings. All variables are controlled for fully flexibly, with dummies. Summary statistics for these data appear in Appendix B.

8A second obvious issue is the confound with ability. It would be possible to do an exercise similar to this one with that confound. Since the exercise here is not about finding the causal effect of education on wages, but is simply about exploring this adjustment, there is no loss to ignoring the issue of ability.

14 I construct a dataset by regressing log wages on education, experience, sex and the full set of family background data. I generate fitted values, and then take these as the “true” effects in the model - that is, the effect on education we see in this regression is the unbiased treatment effect in the constructed data. The regression of this fitted value on the full set of controls has an R-squared of 1 by construction. In practice, however, wages are not fully predicted by family background or individual characteristics. I therefore add an orthogonal error term to this fitted value. To generate a magnitude for this term I regress the log wage measure used here on log wages in 1992 or 1994 (again, I take the higher of the 2). This regression has an R-squared of about 0.45. I argue that family background, education, etc, should not explain more of the outcome than the previous year’s wages, since these variables all contribute to that wage. I therefore add an orthogonal error term to the fitted value such that the ultimate regression R-squared is about 0.45. It is important to note that the addition of this error term is done largely for realism; it will be

instructive to explore errors that may be introduced by incorrectly assuming that Rmax = 1. However, the calculations of δ˜ is not sensitive to this addition. Given this constructed dataset, the empirical exercise is straightforward. I iterate through excluding all sets of controls (up to 6 of the 8). In each case I: (1) calculate the δ˜ implied by the included and excluded ∗0 ˜ control set; (2) calculate β with this δ and the true Rmax; (3) calculate whether the identified set bounded ˜ ∗0 ∗0 ˜ by β and β (Rmax, 1) contains the true effect. ; and (4) calculate β with this δ and the assuming Rmax = 1.

4.1.2 Results

Figure 2a shows the distributions of the true β and the estimated β˜ and the values of β∗0. The true effect in the constructed data is 0.090, with a standard error of 0.003. The β∗0 values cluster at the true effect value. ˜ This is a simple numerical check of the procedure in realistic data: if we know the true Rmax and the true δ the adjustment works as it should. Not surprisingly, the estimated β˜ distribution is shifted substantially to the right from the true β. Controlled estimates are systematically biased to estimate excess returns to education. Figure 2b shows the values of δ˜ calculated in this exercise. This value is not mechanical: nothing in the setup constrains any particular value of δ.˜ In the figure, I show the full distribution of δ˜ and the [0,1] bounds that I suggest would be appropriate in many settings. The average δ˜ is 0.528 and 83% of values fall within the [0,1] range. Only 4 (of 211) values are negative. The cases with values of δ˜ > 1 are instructive. These are combinations of controls where the index of the omitted variables are more important in explaining education than the included ones. Of the 32 cases with δ˜ > 1, 90% of them excluded either maternal or paternal education. This makes clear that these variables are among the most important confounds; this should not be surprising and, indeed, it seems likely that researchers would think to include these first, before considering data on parental occupation or number of siblings. Put differently, if we consider control set selection not at random as I do here but with the idea that the most

15 important controls are selected first, it is likely that the [0, 1] bound would be even more accurate. The fact that the average δ˜ is less than 1 supports the idea of 1 as a bound on δ˜, rather than as an average value. I can comment on the identified set logic described in Section 3. Given the δ˜ values, it is straightforward ˜ ∗0 to observe that if we calculate the set [β,β (Rmax, 1)], in 83% of cases the set will include the true value. In an additional 7% of cases the set will overlap with the confidence interval of the true β, suggesting that the error in the set is small. In this setting, it seems that this adjustment works well. The confidence interval of controlled estimates captures the true value of β only 62% of the time, whereas 82% of the time the adjusted identified set using ˜ the true Rmax and a δ ∈ [0, 1] bound contains the true value. As a final point, Figure 2c shows a graph similar to Figure 2a, but with values of β∗0 which would be ˜ calculated with Rmax = 1 and the true δ. These values are systematically biased downward (effectively, the adjustment is too large because we are mistakenly assuming the R-squared would go to 1 with the full control set) and the error can be extremely large. This suggests significant thought is necessary in considering Rmax, an issue I take up in more detail in Section 5.

4.2 Observational Data: Maternal Behavior and Child Outcomes

A second approach to validation is to take a setting in which we have some possibly biased observational relationships and we think we have a sense of the true effect from external sources. Given this, the question is whether this approach can separate “true” from “false” associations.9 In this section I undertake this type of validation exercise in the context of the link between maternal behaviors, infant birth weight and child IQ. These relationships are of some interest in economics, and of wider interest in public health and public policy circles. A literature in economics demonstrates that health shocks while children are in the womb can influence early outcomes and later cognitive skills (e.g. Almond and Currie, 2011; Almond and Mazumder, 2011). A second literature, largely in epidemiology and public health, suggests that even much smaller variations in behavior – occasional drinking during pregnancy, not breastfeeding – could impact child IQ and birth weight. These latter studies, in particular, are subject to significant omitted variable concerns, largely associated with omitted socioeconomic status. I consider five relationships in all: the relationship between child IQ and breastfeeding, drinking during pregnancy, low birth weight/prematurity and the relationship between birth weight (as the outcome) and maternal drinking and smoking in pregnancy.

4.2.1 Data

I use NLSY data, this time from the Children and Young Adult sample, which has information on the children of NLSY participants. I measure IQ with PIAT test scores for children 4 to 8 and birth weight with birth

9Altonji, Elder and Taber (2008) do a version of this in a single context.

16 weight in grams as reported by the mother. In the latter analysis I include all children. In all cases I control for child sex and, with IQ, for their age. These are not considered as part of the confounding set. The IQ treatments are: months of breastfeeding, any drinking of alcohol in pregnancy and an indicator for being low birth weight and premature (<2500 grams and <37 weeks of gestation). The birth weight treatments are maternal smoking and drinking intensity during pregnancy. I measure socioeconomic status, the confounding category, with child race, maternal age, maternal education, maternal income and maternal marital status. Summary statistics for these data appear in Appendix B.

4.2.2 Empirical Strategy

The empirical strategy here is straightforward. I run regressions with and without the socioeconomic controls ˚ ˜ to extract β, R˚, β and R.˜ I adopt a bounding value for Rmax drawn from within sibling correlations

(Mazumder, 2011). In theory, Rmax should reflect how much of the variation in child IQ and birth weight could be explained if we had full controls for family background; I argue this is the thought experiment approximated by the sibling fixed effect R-squared. The figures are 0.61 for IQ and 0.53 for birth weight. ˜ ∗0 Given this Rmax bound, I first calculate the identified set [β,β (Rmax, 1)]. For completeness, I do two ˜ other calculations. First, I calculate the value of Rmax which would produce β = 0 assuming that δ = 1 and ˜ compare this value to the Rmax bound. Second, I find the value of δ which would produce β = 0 under the ˜ assumed Rmax and compare this to δ = 1. It is important to note that these latter two analyses contain the same information as the identified set. The conclusions from these robustness calculations are compared to the conclusions we expect to get if we were able to estimate the true model. To ask whether the adjusted coefficient gets it right, we need to know what the correct answer is. I use two types of evidence. First, I consider external evidence from randomized trials (where available) and meta-analyses. Randomized evidence suggests that breastfeeding is not linked with full-scale IQ (Kramer et al, 2008) and most evidence does not suggest an impact of occasional maternal drinking on child IQ (see, for example: Falgreen-Eriksen et al, 2012; O’Callaghan et al, 2007).10 In contrast, low birth weight and prematurity do seem to be consistently linked to low IQ (Salt and Redshaw, 2006), a link which also has a biological underpinning (de Kieviet et al, 2012). Occasional maternal drinking is typically not thought to impact birth weight (Henderson, Gray and Brocklehurst, 2007), but there is better evidence that smoking does (e.g. from trials of smoking cessation programs as in Lumley et al, 2009). Second, I consider the conclusions one would draw from sibling fixed effects regressions in the NLSY data described above, which provides a more “within sample” test of fully controlling for family background.

10Although the question of whether occasional maternal drinking lowers IQ is an issue with some controversy, as I show below the observational data here actually estimates positive impacts of maternal drinking on IQ, and the fact that those effects are not causal is not a subject of much debate.

17 Of course, sibling fixed effects estimates may be subject to their own concerns about causality, so it is perhaps comforting that the conclusions are the same from either source.

4.2.3 Results

Table 2 reports the results: Panel A shows results on IQ, Panel B on birth weight. The first column shows treatment effects, standard errors and R-squared values with only sex (or age and sex in the case of IQ) as controls. Column 2 shows similar values with the full control set. More breastfeeding is associated with higher IQ in these regressions, and low birth weight is associated with lower child IQ. More maternal drinking appears in these data to be associated with higher child IQ later, an finding which has no biological support and is extremely likely to be due to selection. Both samples show smoking and drinking are associated with lower birth weight. All analyses reported here show significant effects with the full set of controls. Interpreting these results in a naive way, one would conclude that each has a significant link with child outcomes. Column 3 reports whether external evidence, summarized above, suggests a causal impact. As noted, low birth weight does seem to be linked to IQ and smoking is linked to low birth weight, but the other relationships do not have broad support. Column 4 shows sibling fixed effects regressions, which show similar conclusions. The only difference is in the impact of low birth weight on child IQ, where the NLSY regression coefficient is significant only at the 11% level.

Column 5 shows the identified set, bounded by the Rmax estimates in the top row of each panel and δ˜ = 1. This procedure performs well. The two cases in which the identified set does not include zero are those where the external evidence suggest significant results. Put differently, if one were to use the rule of accepting the effect as causal only if the identified set excluded zero, this would lead to the same conclusions as the external evidence. In all cases the identified set includes the sibling fixed effect estimates.

In Column 6 I show the value of Rmax which would produce β = 0; comparing these to the Rmax values in the top of each panel yields the same conclusions as in Column 5. Finally, Column 7 shows that the effects confirmed in external data are those which have values of δ˜ > 1 required to produce β = 0. Again, the conclusion are the same. There are two final points to make about this analysis. First, similar to the wage analysis above, the value of δ˜ which matches the adjusted effects to the sibling fixed effect values is less than 1 - it is 0.42 - pointing to the value of 1 as a bound. Second, doing these calculations with a value of Rmax = 1 as the bound would lead us to reject all the associations - including the two which are confirmed in outside data. The results in this section suggest the robustness framework suggested performs well. It also makes clear the importance of taking into account the movements in R-squared in addition to movements in coefficients. In this latter example, if we based our analysis only on the size (say, in percent terms) of the coefficient

18 movements we would conclude the link between drinking and low birth weight is much more robust than the link between low birth weight and IQ since the former moves only 10% and the latter 30%. In fact, the low birth weight and IQ link has more external support. This is confirmed by the identified set conclusions, and mechanically it is reflective of the much larger change in R-squared in the low birth weight - IQ relationship.

5 Application to Economics Literature

The evidence in the previous section suggests that a version of this approach would improve inference. Both exercises suggest that a bounding value of δ˜ ∈ [0, 1] performs well. Both cases also illustrate the importance of the choice of Rmax, in particular noting that assuming Rmax = 1 when the truth is below this will lead to over-adjustment. I turn now to the application of this approach within the economics literature. I consider two questions. First: How do stability statements in published papers in economics hold up to a version of this adjustment? Second: Is it possible to say anything more about realistic bounds on Rmax? I begin by analyzing stability comments made in non-randomized data within economics, and show the degree of robustness of the results to varying Rmax cutoff. Second, I use evidence from randomized data within economics to develop a bound for Rmax. The third subsection describes three examples.

5.1 Stability of Coefficients in Non-Randomized Data

The data for this section comes from the published literature in economics. I extract all papers in the American Economic Review, Quarterly Journal of Economics, The Journal of Political Economy and Econometrica from 2008-2010 with at least 20 citations in the ISI Web of Science, and those from 2011-2013 in the same journals with at least 10 citations. From these papers I extract all results where the researcher explores the sensitivity of the result to a control set. In cases where the R-squared is not reported (29%), I use replication files or ask the researcher. Two results were excluded due to inability to obtain R-squared values. The final sample includes 58 papers, with 134 total results. The full set of citations used appears in Appendix D. The empirical exercise here is straightforward. I extract coefficient and R-squared values from the paper. Note that in cases where controls are included sequentially, I compare the fewest-controls to the ˜ most-controls set. For each result, I calculate the identified set with δ = 1 and varying values of Rmax.

I consider Rmax = 1 as one bound. I also consider a parametrization of Rmax as a function of

R˜ :Rmax = min{ΠR,˜ 1} with varying values of Π. This function allows for that some outcomes have more measurement error or noise than others, and suggests that the degree of variation accounted for by the observables (including the treatment) may be informative as to the degree accounted for by the unobservables.

An alternative would be to use Rmax = R˜ + Π(R˜ − R˚), which captures a similar assumption. I work through

19 this version in Appendix C, and show the conclusions are extremely similar.

Having calculated the identified set using these Rmax values, I consider two standards for robustness. My primary analysis focuses on the subset of results for which the inclusion of controls moves the coefficient towards zero, and simply asks whether the set includes zero. I also consider whether the bounds of the set fall within +/- 2.8 standard errors of the controlled estimate, an analysis which can be done including results where controls move the coefficient away from zero. This second standard captures a test of whether the magnitude conclusions from the controlled estimate are shared by the adjusted estimate. The results appear in Figures 3a and 3b. Figure 3a shows the primary robustness with rejection of zero; Figure 3b uses all results and shows the magnitude test. These graphs show the share of relationships which would survive varying values of Π, with Rmax = min{ΠR,˜ 1}. In either case, we find about 40% of results would survive Rmax = 1. Within the others, there is a wide distribution of robustness; some share of results would not survive even quite small differences between R˜ and Rmax.

To quantify this, Panel A of Table 3 shows the share of these results which would survive Rmax = 1 and three values of Π. Twenty-five to thirty percent of studies would not survive Π = 1.5. Considering the rejection-of-zero robustness, within this set that is not robust to Π = 1.5, the average study fails at a value of Π = 1.2 or, in point estimate terms, a predicted increase in R-squared of 0.09 with inclusion of unobserved controls. One issue in interpreting these results is that the author of these papers may not be intending these results as a test of omitted variable bias. To address this, I limit to the large subset of papers in which the authors either explicitly comment on the coefficient stability (since remaining omitted variable bias is the only reason that would matter) or explicitly comment on omitted variable bias. Within this subsample, consider the analog of Columns (1) and (3) of Table 3: 35% of these papers would survive Rmax = 1 and 63% would survive Rmax = 2R˜. This is very similar to, if anything lower than, the overall sample, suggesting it is not the case that the papers which fail by this criteria do so because this is not the intended test.

5.2 Evidence on Stability Cutoffs from Randomized Data

The evidence above suggests that even within a sample of papers which argue for coefficient stability there is a lot of variation in the degree of stability. A natural following question is whether we can suggest any guidance about where one might draw the line - specifically, is there some value of Π above which we should consider a result robust? I argue that one place to look for such guidance is in reports from randomized data. Randomized experiments are becoming increasingly common within economics and papers reporting results of these experiments often include regressions with and without controls. Sometimes these are explicitly used to test balance in the experiment, although it is also commonly done to increase precision. Assuming that the data is

20 correctly randomized, if the sample size were infinite, the effects would not be expected to move at all. In practice, with finite data, coefficients can move a bit simply due to very small differences across groups. When non-randomized papers invoke a coefficient stability heuristic to argue the results they observe are causal, they are (perhaps implicitly) suggesting that the treatment is as good as random. Including controls doesn’t change the coefficient because there is no confounding; this is exactly the argument we know holds in randomized cases. Given this, I argue we can use the stability of randomized data as a guide to how much stability we would expect in non-randomized data if the treatment were assigned exogenously: is the coefficient stability within the range the researcher would expect with a randomly assigned treatment? The approach in this section is to assume effects estimated in randomized data are causal and to therefore assume that they should survive this adjustment procedure.11 I then ask what value of Π in the

Rmax parametrization would make this true. The baseline set of papers for this analysis is all randomized papers (lab or field) published in the American Economic Review, Quarterly Journal of Economics, Journal of Political Economy, Econometrica and the American Economic Journal - Applied Economics in the period 2008 through 2013.12 I extract from this all papers which report sensitivity of a treatment effect to controls. In cases where there are multiple effects reported (i.e. multiple outcomes), I include all effects. In cases where R-squared values were not provided in the paper (22%), I used replication files if available and otherwise asked the authors of the papers. The one result with no R-squared values available was not included. The final sample includes 33 papers with 76 results. The full set of references is in Appendix D. I undertake the same analysis as in the non-randomized data: calculate the identified set under δ˜ = 1 and varying Rmax and compare the results to the two standards for robustness. Figures 4a and 4b show the distributions of sensitivity for the randomized data. A first thing to note is that these results are more robust than the non-randomized results. I have graphed them on the same scale for comparability. Sixty to seventy percent of randomized results would survive a cutoff of Rmax = 1. Nearly all would survive a cutoff of Rmax = 2R˜ , much greater than for the non-randomized results. Panel B of Table 3 shows the survival shares for this dataset explicitly under the varying Rmax cutoffs. It is not surprising that the randomized results are more robust. The fact that they do not all survive

Rmax = 1 is due to the fact that even small changes in coefficient can be blown up with this assumption. To consider an example, Jensen (2012) estimates the impact of a job recruitment program on (among other outcomes) child BMI (Table 4 and Appendix B). Adding controls changes the coefficient on treatment from

11An obvious concern is that, perhaps, these papers are not correctly randomized. This would lead me to a standard which was too lax. I address this in two ways. First, I have focused on papers published in highly ranked journals, increasing the chance that the randomization was of high quality. Second, I will draw guidelines which fit 95% of papers, thus accepting that a small share of randomized papers may suffer from true lack of balance and should not be used to guide this approach. 12I include AEJ-Applied because it has published a large number of experimental papers. This journal begins in 2009.

21 0.24 to 0.20, a seemingly quite small difference. The R-squared moves from 0.007 to 0.017. Assuming that

Rmax = 1 would produce a coefficient of -3.73. Given that this program was randomized and the randomization passes standard balancing tests, it seems implausible that this is the actual effect. Much more plausible is the possibility that even with all correlated unobservables included it would not be possible to explain close to all of the variation in child BMI. I use these data to develop robustness cutoff values. I base these on the value of Π which would allow 95% of results to survive. In testing rejection of zero, this value is 2.2. This value suggests that a bound where the unobservables explain as much as the observables (where the latter includes the treatment). This is somewhat parallel to the bounding on δ˜. To argue for a level of stability which would be expected from a randomized treatment, non-randomized effects should show that the set [β,β˜ ∗0(min{2.2R,˜ 1}, 1)] excludes zero. Applying this to the non-randomized data above, I find that 64% of results would survive this standard. This set would be valuable to report even in cases where the controls cause the coefficient to move away from zero; in that case the question would be whether considering the full set would lead to very different conclusions than the controlled estimate. A similar analysis applies to the second robustness standard, were I find the cutoff generated by the randomized data is Π = 1.5. This suggests an alternative way to test for robustness by showing that β∗0(min{1.5R,˜ 1}, 1) is withing 2.8 standard errors of the controlled effect. This allows survival of 69% of non-randomized results. A final question to ask is how the conclusions here differ from those we would draw if we relied only on coefficient movements. Figure 5 graphs the percent reduction (in absolute value) in coefficient for results which do and do not survive the β∗0(min{2.2R,˜ 1}, 1) cutoff (I consider only results where the coefficient moves towards zero). The smaller the coefficient movement the more robust the result (this almost has to be true given that coefficient movements are an input to the result) but the distributions are almost fully overlapping. Only at the very tails would coefficient movements alone differentiate these groups, suggesting significant additional information is provided by doing the full adjustment.

5.3 Examples

Before concluding, it is useful to consider the application of this cutoff value in several examples. I choose three papers from the literature which are very explicit about coefficient stability or (in one case) do calculations in the spirit of this adjustment and for which inclusion of controls reduces the coefficient. In each ˜ ∗0 ˜ case I show the identified set [β,β (min{2.2R,˜ 1}, 1)] as well as the Rmax and δ values which produce β = 0.

22 5.3.1 Mian and Sufi (2009)

Mian and Sufi (2009) deal with the impact of sub-prime borrowers on the mortgage crisis. In their Table V, they estimate the impact of fraction sub-prime borrowers by zip code on mortgage origination. They worry about other characteristics of the zip code, and in their most fully controlled specification control for fixed effect for three-digit zip code prefix. They comment explicitly, in the text, on the stability of the coefficients along with the increase in R-squared (this is among very few papers to comment on the latter). Panel A of Table 4 shows the robustness calculations for this paper. To echo the data above, I consider what happens when moving from the least controlled to most controlled regression in Table (4): Column (1) to Column (4). The results in this paper is robust to this adjustment. The full identified set is quite small and does not come close to including zero. The magnitude conclusions in the paper would be quite similar if we drew any figure from the identified set.

5.3.2 Nunn and Wantchekon (2011)

Nunn and Wantchekon (2011) analyze the impact of slave trades on mistrust in Africa. They worry about unobserved differences across areas, and present a number of arguments to support the interpretation of their results as causal. In Section IV.B. they undertake direct calculations based on the theory in Altonji, Elder and Taber (2005). They use coefficient movements in their regressions to calculate the value of δ˜ which would be required to produce β = 0. They argue the results are robust because all the calculated values of δ˜ are greater than 1, a cutoff which they also favor. However, although they do not make this explicit, the calculations they undertake - which rely only on

coefficient movements - are correct only if Rmax = R˜ + (R˜ − R˚). In words, this assumes the unobservables explain as much of the outcome as the observables (ignoring that the treatment is part of the observables). The R-squared in their regressions do not move much; as an example, in the first row of Table 4, considering the “Trust Relatives” measure, adding controls increases R-squared from 0.115 to 0.133. Their adjustment assumes that the fully controlled R-squared would be 0.151. As I note in the discussion in Section 3.3, if the unobservable index has a large variance, this assumption of equal increase in R-squared will not hold.13

I consider how these results might change if we use the Rmax = min{2.2R,˜ 1} value. Their paper has five outcomes and they show the results in several ways. I consider rows 1 and 2 of their Table 4 and

recompute their analysis with the new Rmax values. The results are shown in Table 4. Some, but not all, of the results in their paper survive this version of the adjustment. Three of the ten δ˜ values are above 1, meaning that three of the ten identified sets continue to reject zero under this adjustment. The other seven admit zero. This suggests that the conclusion from this table, at least, would have been

13 As noted above, Appendix C shows the analysis in the above sections using the formulation : Rmax = R˜ + Π(R˜ − R˚). The best fit value of Π there is 4.95, and the conclusions - in general and about this particular paper - are unchanged.

23 altered by taking the full adjustment - with the randomized-based cutoff for Rmax– into account.

5.3.3 Spolaore and Wacziarg, 2009

Spolaore and Wacziarg (2009) estimate the impact of genetic distance on economic distance between countries. They worry about unobserved differences across country groups - geographic distance, shared continent, etc. Their Table 4, and the surrounding discussion, includes various control sets and looks at coefficient stability. Panel C of Table 4 shows the robustness calculation for this paper. As above, I consider the comparison of the baseline results to the most-controlled column (in their case, this is with continent-pair fixed effects). The identified set here does include zero. It is worth noting that this paper also shows three intermediate regressions with other control sets (Columns (2)-(4) of Table 4) where the coefficient is more stable (although the R-squared moves less). The adjusted sets based on these columns would exclude zero. In a case like this it may be useful to delve further into the appropriate Rmax (perhaps using more information drawn from the setting) or think more about whether the unobservables are closer to the controls in Column (5) or to those in Columns (2) - (4). This latter point is akin to the discussion of this issue in Murphy and Topel (1990).

6 Conclusion

This paper develops a formal language for discussing robustness of treatment effects, related to the popular heuristic of exploring coefficient sensitivity to controls. I connect this heuristic to the assumption of proportional selection on observed and unobserved variables (Altonji, Elder and Taber (2005)). I describe an implementation strategy for generating bounds on treatment effects and show validation in two empirical contexts. Applying this to the economics literature, and drawing guidelines for expected coefficient sensitivity from randomized results, I develop a full bounding argument. I suggest researchers report an identified set for the treatment effect [β,β˜ ∗0(min{2.2R,˜ 1}, 1)] where ˚ ˜ ˜ β∗0 = β˜ − δ˜(β−β)(Rmax−R) . Showing that such a set excludes zero would indicate a level of robustness in the (R˜−R˚) range of what would be seen if the treatment were randomized. For results where inclusion of controls moves the coefficient away from zero, a similar bound could be compared to the magnitude conclusions from the controlled coefficient.

The particular Rmax = min{2.2R,˜ 1} bound is a useful summary, but it is important to note that there many be many cases where one is able to be more precise about Rmax using more knowledge of the problem. The examples in Section 4 illustrate two ways one might develop this intuition. The core insight here is that

the movement in R-squared values relative to this Rmax must be taken into account when developing any coefficient stability argument.

24 References

Almond, Douglas and Bhashkar Mazumder, “Health Capital and the Prenatal Environment: The Effect of Ramadan Observance during Pregnancy,” American Economic Journal: Applied Economics, October 2011, 3 (4), 56–85. and Janet Currie, “Killing Me Softly: The Fetal Origins Hypothesis,” Journal of Economic Perspectives, Summer 2011, 25 (3), 153–72. Altonji, Joseph G., Todd E. Elder, and Christopher R. Taber, “Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools,” Journal of Political Economy, 2005, 113 (1), 151–184. , Todd Elder, and Christopher R. Taber, “Using Selection on Observed Variables to Assess Bias from Unobservables When Evaluating Swan-Ganz Catheterization,” American Economic Review, 2008, 98 (2), 345–50. Chiappori, Pierre-Andrei, Sonia Oreffice, and Climent Quintana-Domeque, “Fatter Attraction: Anthropometric and Socioeconomic Matching on the Marriage Market,” Journal of Political Economy, 2012, 120 (4), 659 – 695. de Kieviet, J. F., L. Zoetebier, R. M. van Elburg, R. J. Vermeulen, and J. Oosterlaan, “Brain development of very preterm and very low-birthweight children in childhood and adolescence: a meta-analysis,” Dev Med Child Neurol, Apr 2012, 54 (4), 313–323. Falgreen-Eriksen, H. L., E. L. Mortensen, T. Kilburn, M. Underbjerg, J. Bertrand, H. Stavring, T. Wimberley, J. Grove, and U. S. Kesmodel, “The effects of low to moderate prenatal alcohol exposure in early pregnancy on IQ in 5-year-old children,” BJOG, Sep 2012, 119 (10), 1191–1200. Henderson, J., R. Gray, and P. Brocklehurst, “Systematic review of effects of low-moderate prenatal alcohol exposure on pregnancy outcome,” BJOG, Mar 2007, 114 (3), 243–252. Jensen, Robert, “Do Labor Market Opportunities Affect Young Women’s Work and Family Decisions? Experimental Evidence from India,” The Quarterly Journal of Economics, 2012, 127 (2), 753–792. Kramer, M. S., F. Aboud, E. Mironova et al., “Breastfeeding and child cognitive development: new evidence from a large randomized trial,” Arch. Gen. Psychiatry, May 2008, 65 (5), 578–584. Lacetera, Nicola, Devin G. Pope, and Justin R. Sydnor, “Heuristic Thinking and Limited Attention in the Car Market,” American Economic Review, August 2012, 102 (5), 2206–36. LaLonde, Robert J, “Evaluating the Econometric Evaluations of Training Programs with Experimental Data,” American Economic Review, September 1986, 76 (4), 604–20. Lumley, J., C. Chamberlain, T. Dowswell, S. Oliver, L. Oakley, and L. Watson, “Interventions for promoting smoking cessation during pregnancy,” Cochrane Database Syst Rev, 2009, (3), CD001055. Manski, C.F., Partial Identification of Probability Distributions Springer Series in Statistics, Springer, 2003. Mazumder, Bhashkar, “Family and Community Influences on Health and Socioeconomic Status: Sibling Correlations Over the Life Course,” The B.E. Journal of Economic Analysis & Policy, 2011, 11 (3), 1. Mian, Atif and Amir Sufi, “The consequences of mortgage credit expansion: Evidence from the US mortgage default crisis,” The Quarterly Journal of Economics, 2009, 124 (4), 1449–1496. Murphy, Kevin and Robert Topel, “Efficiency Wages Reconsidered: Theory and Evidence,” in “Advances in the Theory and Measurement of Unemployment” 1990, pp. 204–240.

25 Nunn, Nathan and Leonard Wantchekon, “The Slave Trade and the Origins of Mistrust in Africa,” American Economic Review, 2011, 101 (7), 3221–3252. O’Callaghan, F. V., M. O’Callaghan, J. M. Najman, G. M. Williams, and W. Bor, “Prenatal alcohol exposure and attention, learning and intellectual ability at 14 years: a prospective longitudinal study,” Early Hum. Dev., Feb 2007, 83 (2), 115–123. Salt, A. and M. Redshaw, “Neurodevelopmental follow-up after preterm birth: follow up after two years,” Early Hum. Dev., Mar 2006, 82 (3), 185–197. Spolaore, Enrico and Romain Wacziarg, “The Diffusion of Development,” Quarterly Journal of Economics, MAY 2009, 124 (2), 469–529. Tamer, Elie, “Partial Identification in Econometrics,” Annual Review of Economics, 09 2010, 2 (1), 167–195.

26 Figure 1: Simulated Effects by Controlled R-Squared 8

Small Move Adj. Beta, Rtilde=.11: −4.000

6 Medium Move Adj. Beta, Rtilde=.11: −17.500

Big Move Adj. Beta, Rtilde=.11: −31.000 4 Density 2 0 −2.5 −1.5 −.5 0 .1 .3 .45 Adjusted Coefficient Estimate

Small Coeff. Move Medium Coeff. Move Large Coeff. Move

Notes: This figure shows calculated values of the bias-adjusted treatment effect in a constructed example with varying coefficient movements.

In all cases I assume the coefficient with no controls is equal to 0.5, the uncontrolled R-squared is 0.1, δ˜ = 1 and Rmax = 1. The solid line shows the distribution of adjusted effects assuming the coefficient with controls is 0.45 and with varying controlled R-squared. The dashed line shows a similar distribution assuming the controlled coefficient is 0.30; the dotted line shows this with a controlled coefficient of 0.15. Discussion is in Section 3.3

27 Figure 2: NLSY Wage Data Simulation

(a) “True”, Controlled and Adjusted Beta 150 4000 3000 100 2000 50 Adjusted Density 1000 True, Controlled Effect Density 0 0 .08 .09 .1 .11 .12 Estimate

True Treatment Effect Controlled Beta Adjusted Beta

(b) Distribution of Estimated Delta 1.5 1 .5 0 −1 0 1 2 3 Calculated Delta

(c) “True”, Controlled and Rmax = 1 Adjusted Beta 15 150 10 100 5 50 Adjusted (Rmax=1) Density True, Controlled Effect Density 0 0 −.2 −.1 0 .1 .2 Estimate

True Treatment Effect Controlled Beta Adjusted Beta, Rmax=1

Notes: These figures show results from the validation using the constructed NLSY wage dataset. The analysis is described in Section 4.

28 Figure 3: Robustness of Stability Results in Economics Literature

(a) Rejection of Zero, Rmax = ΠR˜. 1 ) tilde .9

.8 Survive Rmax=1: 0.403 .7 .6 .5 Share of Results With sign( β )=sign(

1 2 3 4 5 6 7 8 9 10 Multiplier Π : Rmax = Π *Rtilde

(b) Results within +/- 2.8 SE, Rmax = ΠR˜. 1 .9 tilde

.8 Survive Rmax=1: 0.389 .7 .6 .5 Share of Results with β +/− 2.8 SE from .4

1 2 3 4 5 6 7 8 9 10 Multiplier Π: Rmax = Π *Rtilde

Notes: These graphs show the performance of non-randomized results under the proportional selection adjustment. Each

figure graphs the share of results which would survive varying parametrizations of Rmax, in all cases assuming δ˜ = 1. Sub-

Figure a indicates the share of results which would survive Rmax = ΠR˜ for varying values of Π, with the survival in this case meaning the identified set does not include zero. This figure contains only relationships where the effect is significant with controls and adding the controls moves the coefficient toward zero. Sub-Figure b indicates the share of results for which the full identified set would be within 2.8 standard errors of the controlled coefficient. This sub-figure includes all relationships.

29 Figure 4: Results from Randomized Data

(a) Rejection of Zero, Rmax = ΠR˜. 1 ) tilde .9

.8 Survive Rmax=1: 0.694 .7 .6 .5 Share of Results With sign( β )=sign(

1 2 3 4 5 6 7 8 9 10 Multiplier: Rmax = Π*Rtilde

(b) Results within +/- 2.8 SE, Rmax = ΠR˜. 1 .9 tilde

.8 Survive Rmax=1: 0.658 .7 .6 .5 Share of Results with β +/− 2.8 SE from .4

1 2 3 4 5 6 7 8 9 10 Multiplier Π : Rmax = Π *Rtilde

Notes: These graphs show the performance of randomized results under the proportional selection adjustment. Each figure graphs the share of results which would survive varying parametrizations of Rmax, in all cases assuming δ˜ = 1. Sub-Figure a indicates the share of results which would survive Rmax = ΠR˜ for varying values of Π, with the survival in this case meaning the identified set does not include zero. This figure contains only relationships where the effect is significant with controls and adding the controls moves the coefficient toward zero. Sub-Figure b indicates the share of results for which the full identified set would be within 2.8 standard errors of the controlled coefficient. This sub-figure includes all relationships.

30 Figure 5: Relationship between Full Robustness and Coefficient Movement 4 3 2 1 0 −.8 −.6 −.4 −.2 0 % Reduction in Coefficient Size

Robust to Adjustment Not Robust to Adjustment

Notes: This graph shows the range of coefficient movements in non-randomized studies divided into those which are robust to the propor- tional selection adjustment with Rmax = 2.2R˜ (solid line) and those which are not (dotted line). This includes only relationships in which the inclusion of controls moves the coefficient toward zero.

31 Table 1: Example from Simulated Data

Simulation Inputs Simulation Outputs

δ˜ V ar(W˜2) β˚[R˚] β˜[R˜] Rmax Change in Coeff. Remaining Bias Adjusted β 1 0.1 0.52 [.12] 0.32 [.53] 0.57 0.198 0.017 0.308 1 1 0.69 [.14] 0.50 [.41] 0.69 0.189 0.198 0.299 1 10 2.50 [.46] 2.38 [.49] 0.93 0.116 2.08 0.309 0.5 0.1 0.51 [.11] 0.31 [.52] 0.56 0.199 0.009 0.307 0.5 1 0.60 [.11] 0.40 [.39] 0.70 0.196 0.102 0.295 0.5 10 1.49 [.17] 1.33 [.22] 0.92 0.159 1.03 0.291 Notes: This table shows calculations based on simulated data described in Section 3.3. The simulation constructs the outcome as follows:

Y = .3X + W1 + W˜2 +  where  ∼ N(0, 1), Cov(X, ) = Cov(W1, ) = Cov(W˜2, ) = 0, V ar(W1) = V ar(X) = 1 and the means of X,W1 and W˜2 are all equal to 1. The values β˚ and R˚ come from regressions of Y on X with no other controls; β˜ and R˜ come from regression which control for W1.

32 Table 2: Maternal Behavior, Child IQ and Birth Weight

(1) (2) (3) (4) (5) (6) (7)

Panel A: Child IQ, Standardized (NLSY) (Rmax = .61)

Treatment Variable Baseline Effect Controlled Effect Null Reject? Sibling FE Identified Rmaxfor β = 0 δ˜ for β = 0 2 2 (Std. Error), [R ] (Std. Error), [R ] (extrnl. evid.) Estimate Set (δ˜ = 1) Given Rmax Breastfeed (Months) 0.045∗∗∗ (.003) [.045] 0.017∗∗∗(.002) [.256] No -0.007 (.005) [-0.028,0.017] 0.391 0.382 Drink in Preg. (Any) 0.176∗∗∗(.026) [.008] 0.050∗∗(.023) [.249] No 0.026 (.036) [-0.138,0.050] 0.345 0.265

33 LBW + Preterm -0.188∗∗∗(.057) [.004] -0.125∗∗∗(.050) [.251] Yes -0.111 (.070) [-0.124,-0.033]† 0.743 1.37

Panel B: Birth Weight in Grams (NLSY) (Rmax = .53)

Treatment Variable Baseline Effect Effect with Null Reject? Sibling FE Identified Rmaxfor β = 0 δ˜ for β = 0

Full Controls (extrnl. evid.) Estimate Set (δ˜ = 1) Given Rmax Smoking in Preg -206.7∗∗∗ (16.0) [.030] -196.5∗∗∗(16.5) [.068] Yes -90.1∗∗∗(31.8) [-196.5,-73.6]† 0.806 1.59 Drink in Preg. (Amt) -25.1∗∗∗(6.06) [.011] -21.8∗∗∗ (5.99) [.056] No -6.31 (8.64) [-21.8,12.9] 0.353 0.63 Notes: This table shows the validation results for the analysis of the impact of maternal behavior on child birth weight and IQ. Baseline effects include only controls for child sex and age dummies in the case of IQ. Full controls: race, age, education, income, marital status. Sibling fixed effects estimates come from NLSY in all panels. The identified set in Column (5) is ∗ † bounded below by β˜ and above by β calculated based on Rmax given in the top row of each panel and δ˜ = 1. identified set excludes zero. The Rmax calculation in Column (6) is done under the assumption that δ˜ = 1 ∗ significant at 10% level, ∗∗ significant at 5% level, ∗∗∗ significant at 1% level. Table 3: Robustness of Stability Results

Panel A: Non-Randomized Data, Share of Results which Survive δ˜ = 1, varying Rmax

Rmax = 1 Rmax = min(3R˜; 1) Rmax = min(2R˜; 1) Rmax = min(1.5R˜; 1) Share With Adjusted β Same Sign as β˜ 41% 54% 64% 70% Sample: Add Controls, Moves toward Zero Share with Adjusted β +/- 2.8 SE of β˜ 38% 52% 63% 69% Sample: All

Panel B: Randomized Data, Share of Results which Survive δ˜ = 1, varying Rmax

Rmax = 1 Rmax = min(3R˜; 1) Rmax = min(2R˜; 1) Rmax = min(1.5R˜; 1) Share With Adjusted β Same Sign as β˜ 69% 91% 97% 100% Sample: Add Controls, Moves toward Zero Share with Adjusted β +/- 2.8 SE of β˜ 63% 87% 91% 95% Sample: All Notes: This table describes the survival of non-randomized (Panel A) and randomized (Panel B) results under the proportional selection adjustment. Both panels show the share of results which would survive δ˜ with varying Rmax values. I consider two definitions of survival: (1) the identified set does not include zero and (2) the outer bound of the set is within 2.8 standard errors of β.˜ The first of these is considered only for results which move toward zero when controls are added.

34 Table 4: Examples from Economics Literature

Panel A: Mian and Sufi (2009)

Result Description Baseline Effect Controlled Effect Rmaxfor β = 0 δ˜ for β = 0 Identified Set 2 2 ∗ (Std.Error)[R ] (Std.Error)[R ](δ˜ = 1) (Rmax = 2.2R˜)[β,˜ β (2.2R,˜ 1)] Table V, Column (1) to (4) 0.49 (.033) [.44] 0.431 (.079) [.94] > 1 54.19 [0.431,0.423] Panel B: Nunn and Wantchekon (2011)

Result Description Baseline Effect Controlled Effect Rmaxfor β = 0 δ˜ for β = 0 Identified Set 2 2 ∗ (Std.Error)[R ] (Std.Error)[R ](δ˜ = 1) (Rmax = 2.2R˜)[β,˜ β (2.2R,˜ 1)] Row 1, Trust Relatives -0.163 (.043) [.115] -0.133 (.036) [.133] 0.21 0.48 [-0.133, 0.141] Row 1, Trust Neighbors -0.196 (.045) [.117] -0.159 (.034) [.156] 0.32 0.89 [-0.159, 0.020] Row 1, Trust Local Council -0.147 (.031) [.163] -0.111 (.022) [.196] 0.30 0.43 [-0.111, 0.149] Row 1, Intragroup Trust -0.179 (.041) [.115] -0.144 (.031) [.144] 0.26 0.69 [-0.144, 0.064] Row 1, Intergroup Trust -0.126 (.032) [.091] -0.097 (.027) [.112] 0.18 0.52 [-0.097, 0.088] Row 2, Trust Relatives -0.193 (.043) [.106] -0.178 (.031) [.130] 0.41 1.81 [-0.178, -0.080] Row 2, Trust Neighbors -0.238 (.044) [.115] -0.202 (.029) [.159] 0.47 1.63 [-0.202, -0.078] Row 2, Trust Local Council -0.177 (.027) [.175] -0.128 (.021) [.205] 0.28 0.32 [-0.128, 0.274] Row 2, Intragroup Trust -0.208 (.041) [.121] -0.187 (.032) [.155] 0.47 1.70 [-0.187, -0.078] Row 2, Intergroup Trust -0.145 (.031) [.093] -0.115 (.030) [.119] 0.21 0.68 [-0.115, 0.054] Panel C: Spolaore and Wacziarg (2009)

Result Description Baseline Effect Controlled Effect Rmaxfor β = 0 δ˜ for β = 0 Identified Set 2 2 ∗ (Std.Error)[R ] (Std.Error)[R ](δ˜ = 1) (Rmax = 2.2R˜)[β,˜ β (2.2R,˜ 1)] Table 4, Column (1) to (5) 6.357 (.996) [.11] 4.134 (1.046) [.22] 0.42 0.75 [-1.37, 4.13] Notes: This table summarizes the sensitivity of results in three papers to the proportional selection adjustment selected. Mian and Sufi (2009) estimate the relationship between sub-prime borrowers and mortgage origination by zip code. The uncontrolled results for Mian and Sufi (2009) are slightly different from the paper because these limit to the same sample for controlled and uncontrolled results. Nunn and Wantchekon (2011) estimate the relationship between African slave trades and the extent of mistrust in modern African countries. Spolaore and Wacziarg (2009) estimate the impact of genetic distance on income distance across countries.

35 Appendix A: Theoretical Results A.1. Details of Proofs ˜ I show the proof in this section for arbitrary Rmax and δ, the formulation described in Section 3. Note, ˜ however, that if Rmax = 1 then δ = δ, and the proof is identical. Therefore I show the proof for Corollary 1 below, noting that Proposition 1 is a special case.

p 2 2 ˚ ˜ σ11−σ1X (δσ22+σ11) Proof of Lemma 1: Claim: (β − β) → σ1X 2 σ11(σ11−σ1,X ) ˆ Cov(W1,X) ˆ Proof: Observe that λw1|X converges in probability to V (X) = σ1X . By a similar logic, λW2|X converges ˜ δσ1X σ22 ˆ Cov(W2,X) ˜ to σ2X and, under proportional selection, to . λW |X,W converges in probability to where X σ11 2 1 V ar(X˜ ) 2 σ1X is the residual from a regression of X on W1. Note that V ar(X˜) converges in probability to 1 − . σ11 ˜ ˆ δσ22σ1X Therefore, again invoking proportional selection, λW2|X,W1 converges in probability to 2 . Subtracting σ11−σ1X and simplifying yields the result.

2 σ2 −σ2 (σ +δσ˜ ) ˜ ˚ p [ 11 1X 11 22 ] Proof of Lemma 2: Claim: (R − R)ˆσyy → 2 2 and σ11(σ11−σ1X ) σ σ2 −σ2 (σ +δ˜2σ ) ˜ p 22[ 11 1X 11 22 ] (Rmax − R)ˆσyy → 2 . σ11(σ11−σ1X ) (β+λˆ +λˆ )2 Proof: Observe the following definitions. From the short regression coefficient, R˚ = w1|X W2|X . By σˆyy σ (δσ˜ +σ ) (β+ 1,X 22 11 )2 Lemma 1, this converges in probability to σ11 . In the intermediate regression the calculation σyy ˆ relies on the coefficient on X (β + λW2|X,W1 ) and the coefficient on W1, which is also biased by the exclusion of σ1X ˆ W2 through the joint correlation with X and is equal to 1 − λ . Thus, σ11 W2|X,W1 ˆ 2 σ1X ˆ 2 ˆ σ1X ˆ (β+λW |X,W ) +σ11(1− λW |X,W ) +2σ1X (β+λW |X,W )(1− λW |X,W ) R˜ = 2 1 σ11 2 1 2 1 σ11 2 1 . By Lemma 1, σˆyy ˜ ˜ ˜ ˜ δσ22σ1X 2 σ1X δσ22σ1X 2 δσ22σ1X σ1X δσ22σ1X (β+ ) +(1− ) σ11+2(β+ )(1− )σ1X p σ −σ2 σ σ −σ2 σ −σ2 σ σ −σ2 R˜ → 11 1X 11 11 1X 11 1X 11 11 1X . Finally, observe that σyy ˜ 2 δσ1X σ22 β +σ11+σ22+2βσ1,X +2β σ11 Rmax = . Differencing these expressions appropriately yields the result. σyy

Proof of Corollary 1. Claim : Define:

 h i ˜ β˜ − β˚− β˜ Rmax−R if δ˜=1  R˜−R˚  " q #  β˚−β˜ 2 Θ2+Θ 4δ(1−δ) β˚−β˜ 2 R −R˜ −Θ β˚−β˜  ˜ [ ] [ ( [ ] [ max ])] [ ] ˜ ∗0 β − 2 if δ 6= 1, σ1X ≥ 0 β = 2(1−δ)[β˚−β˜] [R˜−R˚]  " q #  − β˚−β˜ 2 Θ2+Θ 4δ˜(1−δ) β˚−β˜ 2 R −R˜ −Θ β˚−β˜  ˜ [ ] [ ( [ ] [ max ])] [ ] ˜ β − 2 if δ 6= 1, σ1X < 0  2(1−δ˜)[β˚−β˜] [R˜−R˚]

h i h i h i 2 ˚ ˜ 2 ∗ p where Θ = R˜ − R˚ σˆyy + β − β R˜ − R˚ . β → β.

δσ˜ 22σ1X Proof: Recall that the object of interest – the bias – is 2 . There are three unknowns here: σ11, σ22 and σ11−σ1X σ1X . Note that none of these can be calculated directly from the data. Lemmas 1 and 2 provide a system of three equations in these variables. Lemmas are stated in probability limits; for the proof I will write these as equalities to simplify notation, and return to the probability limit notation at the end. In addition, again to

36 simplify notation in the algebra, I will adopt single letter notation for each of the differences.

2 2 ˜ ˚ ˜ σ11 − σ1X (δσ22 + σ11) A = β − β = σ1X 2 σ11(σ11 − σ1X ) h i2 2 2 ˜ σ − σ (σ11 + δσ22) h ˜ ˚i 11 1X B = R − R σyy = 2 2 σ11(σ11 − σ1X ) h i 2 2 ˜2 σ22 σ − σ (σ11 + δ σ22) h ˜i 11 1X C = Rmax − R σyy = 2 σ11(σ11 − σ1X )

The algebra differs slightly for the case of δ˜ = 1 and the case of δ˜ 6= 1 but only in a later step of the proof. I will note when the cases diverge below. The method of proof is simply to solve the system of simultaneous equations. Some algebraic steps are suppressed. Solve Equation (1) for σ22:

2 2 ˜ σ11 − σ1X (δσ22 + σ11) A = σ1X 2 σ11(σ11 − σ1X ) " 2 2 3 # 1 σ σ1X − Aσ11(σ11 − σ ) − σ11σ σ = 11 1,X 1X 22 ˜ 3 δ σ1X

Solve Equation (2) for σ11 and σ22 in terms of σ1X :

2 h i2 h σ2 σ −Aσ (σ −σ2 )−σ σ3 i 2 2 ˜ 2 2 11 1X 11 11 1,X 11 1X σ11 − σ1X (σ11 + δσ22) σ11 − σ1X (σ11 + 3 ) σ1X B = 2 2 = 2 2 σ11(σ11 − σ1X )σy σ11(σ11 − σ1X ) σ2 (A2 + B) σ = 1X 11 A2  2 2  1 σ1X (B + A B)[σ1X − A] σ22 = δ˜ A4

2 Note that for the bias calculation we do not require σ11 alone but only σ11 − σ1X which, given values 2 2 σ1X B δσ˜ 22A above, equals 2 and allows us to collapse the bias calculation to . A σ1X B

˜ Case 1: δ = 1. Solve Equation (3) for σ1X :

h σ (B2+A2B)[σ −A] i h σ2 (A2+B)  σ2 (A2+B) h σ (B2+A2B)[σ −A] ii  2 2  1X 1X 1X 2 1X 1X 1X σ σ − σ (σ + σ ) A4 A2 − σ1X A2 + A4 C = 22 11 1X 11 22 = 2 σ2 (A2+B) h σ2 (A2+B) i σ11(σ11 − σ1X ) 1X 1X 2 A2 A2 − σ1X

CA3 + A(B2 + A2B) σ = 1X (B2 + A2B)  C  σ = CA3 + A(B2 + A2B) 22 A(B2 + A2B)

Applying these values to the formula above, we have: ˜ ˜ δσ22σ1X AC ˜h ˆ ˜i Rmax − R 2 = = δ β − β σ11 − σ1X B R˜ − Rˆ

37 which leads us to the δ˜ = 1 result.

˜ Case 2: δ 6= 1. Solve Equation (3) for σ1X :

h i 2 2 ˜2 σ22 σ11 − σ1X (σ11 + δ σ22) C = 2 σ11(σ11 − σ1X )  2  2 2 h σ2 (A2+B) i σ2 (A2+B) 2 2 σ1X (B +A B)[σ1X −A] 1X − σ2 ( 1X + δ˜σ1X (B +A B)[σ1X −A] ) 1 A4 A2 1X A2 A4 C = σ2 (A2+B) σ2 (A2+B) δ˜ 1X 1X 2 A2 ( A2 − σ1X )

This does not simplify to the extent that the δ = 1 case does, and solving for σ1X requires the quadratic formula. Applying this, we find: q (A(B2 + A2B)(1 − 2δ˜)) ± (A2(B2 + A2B)2 + 4(B2 + A2B)(1 − δ˜)δCA˜ 4) σ1X = 2(B2 + A2B)(1 − δ˜)

Note this has two roots. The positive root corresponds to the case where σ1X ≥ 0; the negative root to the case where σ1X < 0. Given this and the resulting formula for σ22 we can complete the solution. If σ1X ≥ 0, we have:

 r h i 2 2 2 2 2 2 2 ˜ ˜ 2 ˜ 2 (−A(B + A B) + (A (B + A B) B + A B + 4δ(1 − δ)CA δσ22A   =   σ1X B  2(1 − δ˜)BA2 

If σ1X < 0 we have:

 r h i 2 2 2 2 2 2 2 ˜ ˜ 2 ˜ 2 (−A(B + A B) − (A (B + A B) B + A B + 4δ(1 − δ)CA δσ22A   =   σ1X B  2(1 − δ˜)BA2 

Substituting in the difference values for A, B and C yields the result.

˜ Note that calculating the values of Rmax or δ which produce β = 0 simply involves rearranging these equations.

A.2 Further Theoretical Results This appendix discusses two additional issues related to the theory. Subsection A.2.1 below briefly contrasts the calculation of bias based on the coefficients to the calculation directly from the data suggested by Altonji, Elder and Taber (2005). Subsection A.2.2 discusses details of the case with m.

A.2.1 Altonji, Elder and Taber (2005) Calculation Recall the model: Y = α + βX + W1 + W2 Lemma 1 in the text demonstrates that, under the proportional selection relationship the bias on the ˜ δσ22σ1X intermediate regression coefficient β is 2 . σ11−σ1X Altonji, Elder and Taber (2005) suggest that this bias might be calculated directly from the data. In particular, they propose: ˜ 1. Run the intermediate regression, which we will denote Y = βX + ΨW1 +. ˜

38 2. Calculate ΨW1 and denote the variance of the residual V˜. ˆ ˆ 3. Regress X on ΨW1. Denote the coefficient on ΨW1 as Γ, and the variance of the residual VX˜ . 4. Calculate the bias as δΓV˜ VX˜ Recall that V ar(X) = 1. Consider each of the elements of this in turn:

1. VX˜ 2 p σ1X VX˜ → 1 − σ11

ˆ p σ1X δσ22σ1X 2. Γ. Note first that Ψ → 1 − 2 . σ11 σ11(1−σ1X )

Cov(Ψˆ W ,X) Cov(W ,X) Γ = 1 = 1 V ar(Ψˆ W1) Ψˆ V ar(W1) 2 p σ1X σ1X (σ11 − σ1X ) Γ → h i = 2 2 σ1X δσ22σ1X σ11(σ11 − σ ) − δσ22σ 1 − 2 σ11 1X 1X σ11 σ11−σ1X )

3. V˜. 2 p (δσ22σ1X ) V˜ → σ22 − 2 σ11(σ11 − σ1X ) Combining these, we find:

 2 2 2  δΓV˜ p δσ22σ1X σ11(σ11 − σ1X ) − δ σ22σ1X → 2 2 2 VX˜ σ11 − σ1X σ11(σ11 − σ1X ) − δσ22σ1X If δ = 1 the second term cancels, but in cases where δ 6= 1 it does not and this calculation is a close approximation to the bias.

A.2.2 Additional Category Controls Section 3 discusses extending the model to a case where there is an additional, orthogonal, category of controls, so the true model is Y = α + βX + W1 + W˜2 + m +  I suggest in Section 3 that the appropriate procedure for recovering β if m is observed is to include m in both the short and intermediate regressions and preform the same procedure. It is trivial to see why this works. I have assumed that m is orthogonal to W1 and W˜ 2. The only correlations are between m and X and Y. Consider regressing Y on m and taking residuals and doing the same for X. We can then run our original procedure on the residuals of X and Y to recover β. Including the m control in both regressions is equivalent to this exercise. In the case where m is not observed, I suggest that it is still possible to use this procedure to recover β from this regression: Y = α + βX + ΨW1 + ΨW˜2 +  Although this will not be the causal effect, since m is omitted, it will be closer to the causal effect since it 14 adjusts for the influence of W˜ 2.The procedure for recovering β differs from the main text only in that Ψ 6= 1. To prove this, we therefore work through a modified version of the proof in Section 2. Short and intermediate regression coefficients are given below. ˚ ˆ ˆ ˆ ˆ β = β + Ψλw1|X + ΨλW2|X ˜ ˆ ˆ β = β + ΨλW2|X,W1

14As mentioned in the text this is because they are biased by the exclusion of m through the joint correlation with X.

39 By the same logic as Lemma 1 in the text, and the observation that Ψˆ →p Ψ we observe that

2 2 ˜ ˚ ˜ p σ11 − σ1X (δσ22 + σ11) β − β → Ψσ1X 2 σ11(σ11 − σ1X )

Ψσ (δσ˜ +σ ) p (β+ 1X 22 11 )2 Turning to the r-squared values, we observe R˚ → σ11 , σyy ˜ δσ˜ σ δσ˜ σ ˜ δσ22σ1X 2 σ1X 22 1,X 2 22 1,X σ1X δσ22σ1X (β+ ) +(Ψ− ) σ11+2(β+ )(Ψ− )σ1X p σ −σ2 σ σ −σ2 σ −σ2 σ σ −σ2 R˜ → 11 1X 11 11 1X 11 1X 11 11 1X and σyy ˜ 2 2 2 δσ1X σ22 β +Ψ σ11+Ψ σ22+2βΨσ1X +2βΨ σ11 Rmax = . Algebraic simplification then yields: σyy h i2 2 2 ˜ σ11 − σ1X (σ11 + δσ22) h ˜ ˚i p 2 R − R σyy → Ψ 2 2 σ11(σ11 − σ1X ) h i 2 2 ˜2 σ22 σ − σ (σ11 + δ σ22) h ˜i p 11 1X Rmax − R σyy → Ψ 2 σ11(σ11 − σ1X ) Combining, we replicate the results from Section 2.

40 Appendix B: Appendix Tables

Table 1: Summary Statistics: NLSY Wage Data

Mean Standard Deviation Range Sample Size Log Wages (1996-1998) 2.67 0.63 0-6.21 7496 Years of educ. 12.5 2.24 0-20 7496 Years of exper. 16.3 3.02 8-31 7496 Female 0.49 0.50 0-1 7496 Region of Residence N/A N/A 1-4 7496 White 0.64 0.47 0-1 7496 Married Codes N/A N/A 0-6 7496 Mother Educ (yrs) 11.0 3.00 1-20 7496 Father Educ (yrs) 11.2 5.27 1-20 7496 Mother Occup (codes) N/A N/A 0-984 7496 Father Occup (codes) N/A N/A 0-984 7496 Siblings (#) 3.8 2.6 0-22 7496 Notes: This table shows summary statistics for the data used in the NSLY wage analysis in Section 4. Data comes from the NLSY-79 cohort. Means are not reported for region, marital codes or occupation because they are not meaningful. All variables are controlled in the regressions as dummies. Wages are the max of 1996 and 1998 wages.

41 Table 2: Summary Statistics: Early Life and Child IQ

Panel A: IQ Analysis Mean Standard Deviation Sample Size IQ (PIAT Score, Standardized) 0.025 0.991 6962 Breastfeeding Months 2.40 4.63 6514 LBW + Preterm 0.049 0.217 6174 Mom Drink at all in Pregnancy 0.322 0.467 6537 Age 5.57 1.37 6962 Child Female 0.495 0.500 6962 Mother Black 0.282 0.450 6962 Mother Age 25.3 5.61 6962 Mother Education (years) 12.2 2.7 6962 Mother Income $41,294 $80,735 6962 Mother Married 0.654 0.476 6962

Panel B: Birth Weight Analysis Birth Weight (grams) 3290.4 647.69 7686 Mom Smoke in Pregnancy 0.291 0.454 7686 Drinking Intensity (0-7) 0.638 1.16 7442 Child Female 0.486 0.499 7686 Mother Black 0.273 0.445 7686 Mother Age 24.4 5.49 7686 Mother Education (years) 12.0 2.7 7686 Mother Income $30,813 $65,374 7686 Mother Married 0.667 0.471 7686 Notes: This table shows summary statistics for the data used in the analysis in Section 4. Drinking intensity is coded from 0 (never) to 7 (every day). Natality detail files are from 2001 and 2002. Data is from the NLSY Children and Young Adults panel.

42 Appendix C: Alternative Parametrization of Rmax This appendix considers how the results in Section 5 would change if I used an alternative parametrization of Rmax. The primary analysis in the paper uses Rmax = ΠR˜ with varying Π. Here, I use Rmax = R˜ + Π(R˜ − R˚). I consider the same questions: the level of robustness of non-randomized results, the Π cutoff implied by the randomized data and what share of non-randomized results would survive the cutoff. For simplicity, I consider here only the primary robustness criteria of whether the identified set excludes zero, and therefore limit to results where inclusion of controls moves the effect size toward zero. Figure 1a shows the robustness of non-randomized data under this parametrization, and Figure 1b shows the randomized robustness. The multiplier values in both cases are larger here, reflecting the fact that the increase in R-squared from R˚ to R˜ is smaller in value than the level, R˜. The observation that the randomized data is more robust than the non-randomized is even more true. Table 1 replicates the form of Table 3 in the paper. I consider larger values of Π as cutoffs but, again, the conclusion of varying stability and higher stability of randomized results. The robustness cutoff value implied by the randomized data is 4.95. With this value 67% of non-randomized results would survive, almost exactly the share in the main analysis. Perhaps more important, in 83% of cases the robustness conclusions overlap, suggesting that similar information is being provide by both parametrizations.

Table 1: Robustness of Stability Results, Alternative Rmax

Share of Results which Survive δ˜ = 1, varying Rmax: Robustness is Identified Set Excludes Zero

Rmax = 1 Rmax = min{R˜ + 5(R˜ − R˚), 1} Rmax = min{R˜ + 3(R˜ − R˚), 1} Rmax = min{R˜ + 2(R˜ − R˚), 1} Non-Randomized 41% 67% 75% 81% Randomized 69% 94% 97% 100% Notes: This table describes the survival of non-randomized and randomized results under the proportional selection adjustment with varying

Rmax using the alternative Rmax parametrization. Both rows show the share of results which would survive δ˜ with varying Rmax values with survival defined as the identified set does not include zero. The analysis includes only results which move toward zero when controls are added.

43 Figure 1: Stability Results Using Additive Rmax Parametrization

(a) Rejection of Zero, Non-Randomized, Rmax = R˜ + Π(R˜ − R˚). 1 ) tilde

.8 Survive Rmax=1: 0.413 .6 Share of Results With sign( β )=sign( .4 0 2 4 6 8 10 12 14 Multiplier: Rmax = Rtilde+Π*(Rtilde−Rdot)

(b) Rejection of Zero, Randomized, Rmax = R˜ + Π(R˜ − R˚). 1 ) tilde .95

.9 Survive Rmax=1: 0.694 .85 .8 Share of Results With sign( β )=sign( .75 0 2 4 6 8 10 12 14 Multiplier: Rmax = Rtilde+Π*(Rtilde−Rdot)

Notes: These graphs show the performance of non-randomized results (Sub-Figure a) and randomized results (Sub-Figure b) with under the proportional selection adjustment. Each figure graphs the share of results which would survive varying

parametrizations of Rmax, in all cases assuming δ˜ = 1. Each Sub-Figure indicates the share of results which would survive

Rmax = R˜ + Π(R˜ − R˚) for varying values of Π, with the survival in this case meaning the identified set does not include zero. This figure contains only relationships where the effect is significant with controls and adding the controls moves the coefficient toward zero.

44 Appendix D: References for Section 5

Abeler, Johannes, , Lorenz Goette, and David Huffman, “Reference Points and Effort Provision,” AMERICAN ECONOMIC REVIEW, APR 2011, 101 (2), 470–492. Acemoglu, Daron, Simon Johnson, James A. Robinson, and Pierre Yared, “Income and democracy,” AMERICAN ECONOMIC REVIEW, JUN 2008, 98 (3), 808–842. Aghion, Philippe, Robin Burgess, Stephen J. Redding, and Fabrizio Zilibotti, “The Unequal Effects of Liberalization: Evidence from Dismantling the License Raj in India,” AMERICAN ECONOMIC REVIEW, SEP 2008, 98 (4), 1397–1412. Aker, Jenny C., Christopher Ksoll, and Travis J. Lybbert, “Can Mobile Phones Improve Learning? Evidence from a Field Experiment in Niger,” AMERICAN ECONOMIC JOURNAL-APPLIED ECONOMICS, OCT 2012, 4 (4), 94–120. Algan, Yann and Pierre Cahuc, “Inherited Trust and Growth,” AMERICAN ECONOMIC REVIEW, DEC 2010, 100 (5), 2060–2092. Angrist, Joshua and Victor Lavy, “The Effects of High Stakes High School Achievement Awards: Evidence from a Randomized Trial,” AMERICAN ECONOMIC REVIEW, SEP 2009, 99 (4), 1384–1414. Ashraf, Nava, “Spousal Control and Intra-Household Decision Making: An Experimental Study in the Philippines,” AMERICAN ECONOMIC REVIEW, SEP 2009, 99 (4), 1245–1277. , James Berry, and Jesse M. Shapiro, “Can Higher Prices Stimulate Product Use? Evidence from a Field Experiment in Zambia,” AMERICAN ECONOMIC REVIEW, DEC 2010, 100 (5), 2383–2413. Ashraf, Quamrul and Oded Galor, “Dynamics and Stagnation in the Malthusian Epoch,” AMERICAN ECONOMIC REVIEW, AUG 2011, 101 (5), 2003–2041. Bandiera, Oriana, Andrea Prat, and Tommaso Valletti, “Active and Passive Waste in Government Spending: Evidence from a Policy Experiment,” AMERICAN ECONOMIC REVIEW, SEP 2009, 99 (4), 1278–1308. Beaman, Lori, Raghabendra Chattopadhyay, Esther Duflo, Rohini Pande, and Petia Topalova, “POWERFUL WOMEN: DOES EXPOSURE REDUCE BIAS?,” QUARTERLY JOURNAL OF ECONOMICS, NOV 2009, 124 (4), 1497–1540. Beaudry, Paul, David A. Green, and Benjamin Sand, “Does Industrial Composition Matter for Wages? A Test of Search and Bargaining Theory,” ECONOMETRICA, MAY 2012, 80 (3), 1063–1104. Becker, Sascha O. and Ludger Woessmann, “WAS WEBER WRONG? A HUMAN CAPITAL THEORY OF PROTESTANT ECONOMIC HISTORY,” QUARTERLY JOURNAL OF ECONOMICS, MAY 2009, 124 (2), 531–596. Bernard, Andrew B., Stephen J. Redding, and Peter K. Schott, “Multiple-Product Firms and Product Switching,” AMERICAN ECONOMIC REVIEW, MAR 2010, 100 (1), 70–97.

1 Bloom, Nicholas, Raffaella Sadun, and John Van Reenen, “Americans Do IT Better: US Multinationals and the Productivity Miracle,” AMERICAN ECONOMIC REVIEW, FEB 2012, 102 (1), 167–201. Bohnet, Iris, Fiona Greig, Benedikt Herrmann, and Richard Zeckhauser, “Betrayal aversion: Evidence from Brazil, China, Oman, Switzerland, Turkey, and the United States,” AMERICAN ECONOMIC REVIEW, MAR 2008, 98 (1), 294–310. Boivin, Jean, Marc P. Giannoni, and Ilian Mihov, “Sticky Prices and Monetary Policy: Evidence from Disaggregated US Data,” AMERICAN ECONOMIC REVIEW, MAR 2009, 99 (1), 350–384. Broda, Christian, Nuno Limao, and David E. Weinstein, “Optimal Tariffs and : The Evidence,” AMERICAN ECONOMIC REVIEW, DEC 2008, 98 (5), 2032–2065. Brunnermeier, Markus K. and Stefan Nagel, “Do wealth fluctuations generate time-varying risk aversion? Micro-evidence on individuals’ asset allocation,” AMERICAN ECONOMIC REVIEW, JUN 2008, 98 (3), 713–736. Buera, Francisco J., Alexander Monge-Naranjo, and Giorgio E. Primiceri, “Learning the Wealth of Nations,” ECONOMETRICA, JAN 2011, 79 (1), 1–45. Burde, Dana and Leigh L. Linden, “Bringing Education to Afghan Girls: A Randomized Controlled Trial of Village-Based Schools,” AMERICAN ECONOMIC JOURNAL-APPLIED ECONOMICS, JUL 2013, 5 (3), 27–40. Bursztyn, Leonardo and Lucas C. Coffman, “The Schooling Decision: Family Preferences, Intergenerational Conflict, and Moral Hazard in the Brazilian Favelas,” JOURNAL OF POLITICAL ECONOMY, JUN 2012, 120 (3), 359–397. Bustos, Paula, “Trade Liberalization, Exports, and Technology Upgrading: Evidence on the Impact of MERCOSUR on Argentinian Firms,” AMERICAN ECONOMIC REVIEW, FEB 2011, 101 (1), 304–340. Caballero, Ricardo J., Takeo Hoshi, and Anil K. Kashyap, “Zombie Lending and Depressed Restructuring in Japan,” AMERICAN ECONOMIC REVIEW, DEC 2008, 98 (5), 1943–1977. Cai, Hongbin, Yuyu Chen, and Hanming Fang, “Observational Learning: Evidence from a Randomized Natural Field Experiment,” AMERICAN ECONOMIC REVIEW, JUN 2009, 99 (3), 864–882. Card, David, Alexandre Mas, Enrico Moretti, and Emmanuel Saez, “Inequality at Work: The Effect of Peer Salaries on Job Satisfaction,” AMERICAN ECONOMIC REVIEW, OCT 2012, 102 (6), 2981–3003. Carpenter, Jeffrey, Peter Hans Matthews, and John Schirm, “Tournaments and Office Politics: Evidence from a Real Effort Experiment,” AMERICAN ECONOMIC REVIEW, MAR 2010, 100 (1), 504–517. Chaney, Eric, “REVOLT ON THE NILE: ECONOMIC SHOCKS, RELIGION, AND POLITICAL POWER,” ECONOMETRICA, SEP 2013, 81 (5), 2033–2053. Charles, Kerwin Kofi and Jonathan Guryan, “Prejudice and Wages: An Empirical Assessment of Becker’s The Economics of Discrimination,” JOURNAL OF POLITICAL ECONOMY, OCT 2008, 116 (5), 773–809. , Erik Hurst, and Nikolai Roussanov, “ AND RACE,” QUARTERLY JOURNAL OF ECONOMICS, MAY 2009, 124 (2), 425–467. Chen, Roy and Yan Chen, “The Potential of Social Identity for Equilibrium Selection,” AMERICAN ECONOMIC REVIEW, OCT 2011, 101 (6), 2562–2589.

2 Chetty, Raj, Adam Looney, and Kory Kroft, “Salience and Taxation: Theory and Evidence,” AMERICAN ECONOMIC REVIEW, SEP 2009, 99 (4), 1145–1177. , John N. Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzenbach, and Danny Yagan, “How Does Your Kindergarten Classroom Affect Your Earnings? Evidence from Project Star,” QUARTERLY JOURNAL OF ECONOMICS, NOV 2011, 126 (4), 1593–1660. Cohen, Jessica and Pascaline Dupas, “FREE DISTRIBUTION OR COST-SHARING? EVIDENCE FROM A RANDOMIZED MALARIA PREVENTION EXPERIMENT,” QUARTERLY JOURNAL OF ECONOMICS, FEB 2010, 125 (1), 1–45. Cohen, Lauren, Andrea Frazzini, and Christopher Malloy, “The Small World of Investing: Board Connections and Mutual Fund Returns,” JOURNAL OF POLITICAL ECONOMY, OCT 2008, 116 (5), 951–979. Cole, Shawn, Xavier Gine, Jeremy Tobacman, Petia Topalova, Robert Townsend, and James Vickery, “Barriers to Household Risk Management: Evidence from India,” AMERICAN ECONOMIC JOURNAL-APPLIED ECONOMICS, JAN 2013, 5 (1), 104–135. Cortes, Patricia, “The effect of low-skilled immigration on U. S. prices: Evidence from CPI data,” JOURNAL OF POLITICAL ECONOMY, JUN 2008, 116 (3), 381–422. Dafny, Leemore, Mark Duggan, and Subramaniam Ramanarayanan, “Paying a Premium on Your Premium? Consolidation in the US Health Insurance Industry,” AMERICAN ECONOMIC REVIEW, APR 2012, 102 (2), 1161–1185. Das, Jishnu, Stefan Dercon, James Habyarimana, Pramila Krishnan, Karthik Muralidharan, and Venkatesh Sundararaman, “School Inputs, Household Substitution, and Test Scores,” AMERICAN ECONOMIC JOURNAL-APPLIED ECONOMICS, APR 2013, 5 (2), 29–57. Dell, Melissa, “The Persistent Effects of Peru’s Mining Mita,” ECONOMETRICA, NOV 2010, 78 (6), 1863–1903. Djankov, Simeon, Oliver Hart, Caralee McLiesh, and Andrei Shleifer, “Debt Enforcement around the World,” JOURNAL OF POLITICAL ECONOMY, DEC 2008, 116 (6), 1105–1150. Duflo, Esther, Michael Kremer, and Jonathan Robinson, “Nudging Farmers to Use Fertilizer: Theory and Experimental Evidence from Kenya,” AMERICAN ECONOMIC REVIEW, OCT 2011, 101 (6), 2350–2390. , Pascaline Dupas, and Michael Kremer, “Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya,” AMERICAN ECONOMIC REVIEW, AUG 2011, 101 (5), 1739–1774. , Rema Hanna, and Stephen P. Ryan, “Incentives Work: Getting Teachers to Come to School,” AMERICAN ECONOMIC REVIEW, JUN 2012, 102 (4), 1241–1278. Dupas, Pascaline and Jonathan Robinson, “Why Don’t the Poor Save More? Evidence from Health Savings Experiments,” AMERICAN ECONOMIC REVIEW, JUN 2013, 103 (4), 1138–1171. Duranton, Gilles and Matthew A. Turner, “The Fundamental Law of Road Congestion: Evidence from US Cities,” AMERICAN ECONOMIC REVIEW, OCT 2011, 101 (6), 2616–2652. Estevadeordal, Antoni, Caroline Freund, and Emanuel Ornelas, “DOES REGIONALISM AFFECT TRADE LIBERALIZATION TOWARD NONMEMBERS?,” QUARTERLY JOURNAL OF ECONOMICS, NOV 2008, 123 (4), 1531–1575. Fairlie, Robert W. and Jonathan Robinson, “Experimental Evidence on the Effects of Home Computers on Academic Achievement among Schoolchildren,” AMERICAN ECONOMIC JOURNAL-APPLIED ECONOMICS, JUL 2013, 5 (3), 211–240.

3 Ferraz, Claudio and Frederico Finan, “Electoral Accountability and Corruption: Evidence from the Audits of Local Governments,” AMERICAN ECONOMIC REVIEW, JUN 2011, 101 (4), 1274–1311. Finan, Frederico and Laura Schechter, “Vote-Buying and Reciprocity,” ECONOMETRICA, MAR 2012, 80 (2), 863–881. Finkelstein, Amy, “E-ZTAX: TAX SALIENCE AND TAX RATES,” QUARTERLY JOURNAL OF ECONOMICS, AUG 2009, 124 (3), 969–1010. Gabaix, Xavier and Augustin Landier, “Why has CEO pay increased so much?,” QUARTERLY JOURNAL OF ECONOMICS, FEB 2008, 123 (1), 49–100. Gentzkow, Matthew and Jesse M. Shapiro, “Preschool television viewing and adolescent test scores: Historical evidence from the Coleman study,” QUARTERLY JOURNAL OF ECONOMICS, FEB 2008, 123 (1), 279–323. and , “What Drives Media Slant? Evidence From US Daily Newspapers,” ECONOMETRICA, JAN 2010, 78 (1), 35–71. Gruber, Jonathan and Daniel M. Hungerman, “The church versus the mall: What happens when religion faces increased secular competition?,” QUARTERLY JOURNAL OF ECONOMICS, MAY 2008, 123 (2), 831–862. Guiso, Luigi, Paola Sapienza, and Luigi Zingales, “CULTURAL BIASES IN ECONOMIC EXCHANGE?,” QUARTERLY JOURNAL OF ECONOMICS, AUG 2009, 124 (3), 1095–1131. Guryan, Jonathan, Kory Kroft, and Matthew J. Notowidigdo, “Peer Effects in the Workplace: Evidence from Random Groupings in Professional Golf Tournaments,” American Economic Journal: Applied Economics, October 2009, 1 (4), 34–68. Heffetz, Ori and Moses Shayo, “How Large Are Non-Budget-Constraint Effects of Prices on Demand?,” AMERICAN ECONOMIC JOURNAL-APPLIED ECONOMICS, OCT 2009, 1 (4), 170–199. Ifcher, John and Homa Zarghamee, “Happiness and Time Preference: The Effect of Positive Affect in a Random-Assignment Experiment,” AMERICAN ECONOMIC REVIEW, DEC 2011, 101 (7), 3109–3129. Jayachandran, Seema and Adriana Lleras-Muney, “LIFE EXPECTANCY AND HUMAN CAPITAL INVESTMENTS: EVIDENCE FROM MATERNAL MORTALITY DECLINES,” QUARTERLY JOURNAL OF ECONOMICS, FEB 2009, 124 (1), 349–397. Jensen, Robert, “Do Labor Market Opportunities Affect Young Women’s Work and Family Decisions? Experimental Evidence from India,” QUARTERLY JOURNAL OF ECONOMICS, MAY 2012, 127 (2), 753–792. Jensen, Robert T. and Nolan H. Miller, “Giffen Behavior and Subsistence Consumption,” AMERICAN ECONOMIC REVIEW, SEP 2008, 98 (4), 1553–1577. Karlan, Dean S. and Jonathan Zinman, “Credit elasticities in less-developed economies: Implications for microfinance,” AMERICAN ECONOMIC REVIEW, JUN 2008, 98 (3), 1040–1068. Khwaja, Asim Ijaz and Atif Mian, “Tracing the Impact of Bank Liquidity Shocks: Evidence from an Emerging Market,” AMERICAN ECONOMIC REVIEW, SEP 2008, 98 (4), 1413–1442. Kirwan, Barrett E., “The Incidence of US Agricultural Subsidies on Farmland Rental Rates,” JOURNAL OF POLITICAL ECONOMY, FEB 2009, 117 (1), 138–164. Kling, Jeffrey R., , Eldar Shafir, Lee C. Vermeulen, and Marian V. Wrobel, “Comparison Friction: Experimental Evidence from Medicare Drug Plans,” QUARTERLY JOURNAL OF ECONOMICS, FEB 2012, 127 (1), 199–235.

4 Kremer, Michael, Jessica Leino, Edward Miguel, and Alix Peterson Zwane, “SPRING CLEANING: RURAL WATER IMPACTS, VALUATION, AND PROPERTY RIGHTS INSTITUTIONS,” QUARTERLY JOURNAL OF ECONOMICS, FEB 2011, 126 (1), 145–205. Kroft, Kory, Fabian Lange, and Matthew J. Notowidigdo, “Duration Dependence and Labor Market Conditions: Evidence from a Field Experiment,” QUARTERLY JOURNAL OF ECONOMICS, AUG 2013, 128 (3), 1123–1167. Lalive, Rafael and Josef Zweimueller, “HOW DOES PARENTAL LEAVE AFFECT FERTILITY AND RETURN TO WORK? EVIDENCE FROM TWO NATURAL EXPERIMENTS,” QUARTERLY JOURNAL OF ECONOMICS, AUG 2009, 124 (3), 1363–1402. Lavy, Victor, “Performance Pay and Teachers’ Effort, Productivity, and Grading Ethics,” AMERICAN ECONOMIC REVIEW, DEC 2009, 99 (5), 1979–2011. Libecap, Gary D. and Dean Lueck, “The Demarcation of Land and the Role of Coordinating Property Institutions,” JOURNAL OF POLITICAL ECONOMY, JUN 2011, 119 (3), 426–467. Mas, Alexandre and Enrico Moretti, “Peers at Work,” AMERICAN ECONOMIC REVIEW, MAR 2009, 99 (1), 112–145. Mian, Atif and Amir Sufi, “THE CONSEQUENCES OF MORTGAGE CREDIT EXPANSION: EVIDENCE FROM THE US MORTGAGE DEFAULT CRISIS,” QUARTERLY JOURNAL OF ECONOMICS, NOV 2009, 124 (4), 1449–1496. Michalopoulos, Stelios and Elias Papaioannou, “PRE-COLONIAL ETHNIC INSTITUTIONS AND CONTEMPORARY AFRICAN DEVELOPMENT,” ECONOMETRICA, JAN 2013, 81 (1), 113–152. Muralidharan, Karthik and Venkatesh Sundararaman, “Teacher Performance Pay: Experimental Evidence from India,” JOURNAL OF POLITICAL ECONOMY, FEB 2011, 119 (1), 39–77. Nunn, Nathan, “The long-term effects of Africa’s slave trades,” QUARTERLY JOURNAL OF ECONOMICS, FEB 2008, 123 (1), 139–176. and Leonard Wantchekon, “The Slave Trade and the Origins of Mistrust in Africa,” AMERICAN ECONOMIC REVIEW, DEC 2011, 101 (7), 3221–3252. Olken, Benjamin A. and Patrick Barron, “The Simple Economics of Extortion: Evidence from Trucking in Aceh,” JOURNAL OF POLITICAL ECONOMY, JUN 2009, 117 (3), 417–452. Pope, Devin G. and Maurice E. Schweitzer, “Is Tiger Woods Loss Averse? Persistent Bias in the Face of Experience, Competition, and High Stakes,” AMERICAN ECONOMIC REVIEW, FEB 2011, 101 (1), 129–157. Price, Joseph and Justin Wolfers, “RACIAL DISCRIMINATION AMONG NBA REFEREES,” QUARTERLY JOURNAL OF ECONOMICS, NOV 2010, 125 (4), 1859–1887. Rockoff, Jonah E., Douglas O. Staiger, Thomas J. Kane, and Eric S. Taylor, “Information and Employee Evaluation: Evidence from a Randomized Intervention in Public Schools,” AMERICAN ECONOMIC REVIEW, DEC 2012, 102 (7), 3184–3213. Schularick, Moritz and Alan M. Taylor, “Credit Booms Gone Bust: Monetary Policy, Leverage Cycles, and Financial Crises, 1870-2008,” AMERICAN ECONOMIC REVIEW, APR 2012, 102 (2), 1029–1061. Shayo, Moses and Asaf Zussman, “Judicial Ingroup Bias in the Shadow of Terrorism,” QUARTERLY JOURNAL OF ECONOMICS, AUG 2011, 126 (3), 1447–1484. Snyder, Jr. James M. and David Stromberg, “Press Coverage and Political Accountability,” JOURNAL OF POLITICAL ECONOMY, APR 2010, 118 (2), 355–408.

5 Song, Zheng, Kjetil Storesletten, and Fabrizio Zilibotti, “Growing Like China,” AMERICAN ECONOMIC REVIEW, FEB 2011, 101 (1), 196–233. Spolaore, Enrico and Romain Wacziarg, “THE DIFFUSION OF DEVELOPMENT,” QUARTERLY JOURNAL OF ECONOMICS, MAY 2009, 124 (2), 469–529. Sullivan, Daniel and Till von Wachter, “JOB DISPLACEMENT AND MORTALITY: AN ANALYSIS USING ADMINISTRATIVE DATA,” QUARTERLY JOURNAL OF ECONOMICS, AUG 2009, 124 (3), 1265–1306. Voors, Maarten J., Eleonora E. M. Nillesen, Philip Verwimp, Erwin H. Bulte, Robert Lensink, and Daan P. Van Soest, “Violent Conflict and Behavior: A Field Experiment in Burundi,” AMERICAN ECONOMIC REVIEW, APR 2012, 102 (2), 941–964. Wei, Shang-Jin and Xiaobo Zhang, “The Competitive Saving Motive: Evidence from Rising Sex Ratios and Savings Rates in China,” JOURNAL OF POLITICAL ECONOMY, JUN 2011, 119 (3), 511–564.

6