Mistaking the Forest for the Trees: The Mistreatment of Group-Level Treatments in the Study of American Politics

Kelly T. Rader

Submitted in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2012

© 2012

Kelly T. Rader

All Rights Reserved

ABSTRACT

Mistaking the Forest for the Trees: The Mistreatment of Group-Level Treatments in the Study of American Politics

Kelly T. Rader

Over the past few decades, the field of political science has become increasingly sophisticated in its use of empirical tests for theoretical claims. One particularly productive strain of this development has been the identification of the limitations of and challenges in using observational data.

Making causal inferences with observational data is difficult for numerous reasons. One reason is that one can never be sure that the estimate of interest is un-confounded by omitted variable bias (or, in causal terms, that a given treatment is ignorable or conditionally random). However, when the ideal hypothetical experiment is impractical, illegal, or impossible, researchers can often use quasi-experimental approaches to identify causal effects more plausibly than with simple regression techniques. Another reason is that, even if all of the confounding factors are observed and properly controlled for in the model specification, one can never be sure that the unobserved (or error) component of the data generating process conforms to the assumptions one must make to use the model. If it does not, then this manifests itself in terms of bias in standard errors and incorrect inference on the statistical significance of quantities of interest. In this case, one can either turn to standard error "fixes" that are robust to generic forms of deviance from standard assumptions or to non-parametric solutions that do not require such assumptions but may be less powerful than their parametric counterparts.

In the following essays, I develop the use of some of these techniques for inference with observational data and explore their limitations. Collectively, these essays challenge the conventional application of quasi-experimental techniques and standard error fixes. They also contribute to important substantive debates over legislative organization by producing more cleanly identified effects of the power of Congressional representatives as individuals and as members of parties to bargain over distributive goods.

Table of Contents

I Dissertation Chapters 1

1 Overview 2

1.1 Introduction...... 3

1.2 Chapter Summaries...... 4

1.2.1 Randomization Tests and Inference with Grouped Data...... 4

1.2.2 Party Effects on the Distribution of Federal Outlays...... 7

1.2.3 Malapportionment in the U.S. House of Representatives...... 11

2 Randomization Tests for Grouped Data 14

2.1 Introduction...... 15

2.2 Inference with Clustered Data...... 16

2.3 Cluster-Robust Standard Errors...... 19

2.4 Randomization Tests...... 21

2.5 Evaluation Criteria...... 26

2.6 Monte Carlo Experiment...... 29

2.7 Results and Discussion...... 30

2.8 Applications...... 32

2.8.1 State Postregistration Laws and Voting...... 33

2.8.2 Precedent and Voting on the Supreme Court...... 34

2.8.3 Democratic Trade...... 36

2.9 Conclusion...... 37

2.10 Appendix...... 39

3 Party Effects on Outlays 49

3.1 Introduction...... 50

3.2 The Theory of Party Effects...... 52

3.2.1 Universalistic Theories...... 52

3.2.2 Party-Centered Theories...... 53

3.2.3 Democrats versus Republicans...... 54

3.2.4 Empirical Tests of Party Effects on Spending...... 55

3.3 Regression Discontinuity Design...... 56

3.3.1 Implementation...... 57

3.3.2 Assumptions...... 58

3.4 Data and Results...... 59

3.4.1 Majority Party Power...... 61

3.4.2 Democrats vs. Republicans...... 63

3.5 Conclusion...... 65

3.6 Appendix...... 77

4 Malapportionment in the House 82

4.1 Introduction...... 83

4.2 Apportioning the House...... 84

4.3 How Malapportioned is the House?...... 85

4.4 The Apportionment-Funding Connection...... 87

4.5 Relative Federal Spending in the States...... 89

4.6 Method and Results...... 90

4.7 Discussion and Conclusion...... 93

4.8 Appendix...... 103

II Bibliography 106

Bibliography 107

List of Figures

2.1 Size results from Monte Carlo experiments. Results shown for nominal α levels .01, .05, and .1. Shaded gray area represents 95% binomial confidence intervals around the nominal α level, denoted with dotted line. Confidence intervals calculated using the Wilson method recommended by Agresti and Coull (1998)...... 43

2.2 Power results from Monte Carlo experiments. Results shown for nominal α levels .01, .05, and .1, and number of groups equal to 10, 20, and 50. Dotted lines at nominal α level and one (maximum power)...... 44

2.3 Randomization Test Results. Mailed polling place information and its interaction with individual education level on an individual's propensity to vote...... 45

2.4 Randomization Test Results. Regime change in freedom of expression cases before and after Grayned vs. Rockford ...... 46

2.5 Randomization Test Results. Minimum dyadic democracy score and bilateral trade .... 47

2.6 Size results from Monte Carlo experiments. Results shown for data in which Z and X are uncorrelated, correlated at 0.3, and correlated at 0.8. Shaded gray area represents 95% binomial confidence intervals around the nominal α level, denoted with dotted line. Confidence intervals calculated using the Wilson method recommended by Agresti and Coull (1998)...... 48

3.1 Estimates correspond to table 3.1, column 1. 95% confidence intervals shown..... 68

3.2 Estimates correspond to table 3.1, column 2. 95% confidence intervals shown..... 68

3.3 Estimates correspond to table 3.1, column 3. 95% confidence intervals shown..... 69

3.4 Estimates correspond to table 3.1, column 5. 95% confidence intervals shown..... 69

3.5 Estimates correspond to table 3.1, column 6. 95% confidence intervals shown..... 70

3.6 Estimates correspond to table 3.1, column 7. 95% confidence intervals shown..... 70

3.7 Effect of Majority Party Win on Spending. Locally weighted regressions model ln(spending) as a function of majority party victory margin in the district up to zero and above zero. Sample includes all districts matched to either 14 or 20 fiscal years...... 71

3.8 Vertical line indicates the IK optimal bandwidth. Dots are point estimates, and bands are 95% confidence intervals. Sample includes all districts matched to either 14 or 20 fiscal years...... 71

3.9 Estimates correspond to table 3.2, column 1. 95% confidence intervals shown..... 73

3.10 Estimates correspond to table 3.2, column 2. 95% confidence intervals shown..... 73

3.11 Estimates correspond to table 3.2, column 3. 95% confidence intervals shown..... 74

3.12 Estimates correspond to table 3.2, column 5. 95% confidence intervals shown..... 74

3.13 Estimates correspond to table 3.2, column 6. 95% confidence intervals shown..... 75

3.14 Estimates correspond to table 3.2, column 7. 95% confidence intervals shown..... 75

3.15 Effect of Democratic Party Win on Spending. Locally weighted regressions model ln(spending) as a function of Democratic victory margin in the district up to zero and above zero. Sample includes all districts matched to either 14 or 20 fiscal years. 76

3.16 Vertical line indicates the IK optimal bandwidth. Dots are point estimates, and bands are 95% confidence intervals. Sample includes all districts matched to either 14 or 20 fiscal years...... 76

4.1 Relative Representation in the U.S. House by State, 1972-1982...... 96

4.2 Relative Representation in the U.S. House by State, 1973-1992...... 96

4.3 Relative Representation in the U.S. House by State, 1983-2002...... 97

4.4 Relative Representation in the U.S. House by State, 1993-2004...... 97

4.5 Relative Federal Spending in States, 1972-1982...... 98

4.6 Relative Federal Spending in States, 1973-1992...... 98

4.7 Relative Federal Spending in States, 1983-2002...... 99

4.8 Relative Federal Spending in States, 1993-2004...... 99

4.9 Relative Federal Spending in States, 1983-2002 (FAADS Data)...... 104

List of Tables

2.1 Hypothesis Testing...... 27

3.1 Regressions of ln(expenditures) on majority party vote share: columns 1, 2, and 3 use IK optimal bandwidth OB (columns 2 and 3 bootstrap SE, where 2 recalculates OB in each resample and 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 uses a bandwidth of 3, and column 7 uses a bandwidth of 12. Sample includes all districts matched to either 14 or 20 fiscal years.... 67

3.2 Regressions of ln(expenditures) on Democratic vote share: columns 1, 2, and 3 use IK optimal bandwidth OB (columns 2 and 3 bootstrap SE, where 2 recalculates OB in each resample and 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 uses a bandwidth of 3, and column 7 uses a bandwidth of 12. Sample includes all districts matched to either 14 or 20 fiscal years...... 72

3.3 Regressions of ln(expenditures) on majority party vote share: columns 1, 2, and 3 use IK optimal bandwidth OB (columns 2 and 3 bootstrap SE, where 2 recalculates OB in each resample and 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 uses a bandwidth of 3, and column 7 uses a bandwidth of 12. Sample includes all districts matched to 20 fiscal years...... 78

3.4 Regressions of ln(expenditures) on majority party vote share: columns 1, 2, and 3 use IK optimal bandwidth OB (columns 2 and 3 bootstrap SE, where 2 recalculates OB in each resample and 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 uses a bandwidth of 3, and column 7 uses a bandwidth of 12. Sample includes all districts matched to 14 fiscal years...... 79

3.5 Regressions of ln(expenditures) on Democratic vote share: columns 1, 2, and 3 use IK optimal bandwidth OB (columns 2 and 3 bootstrap SE, where 2 recalculates OB in each resample and 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 uses a bandwidth of 3, and column 7 uses a bandwidth of 12. Sample includes all districts matched to 20 fiscal years...... 80

3.6 Regressions of ln(expenditures) on Democratic vote share: columns 1, 2, and 3 use IK optimal bandwidth OB (columns 2 and 3 bootstrap SE, where 2 recalculates OB in each resample and 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 uses a bandwidth of 3, and column 7 uses a bandwidth of 12. Sample includes all districts matched to 14 fiscal years...... 81

4.1 Change in Representatives in the U.S. House Following the 1970, 1980, 1990, and 2000 Censuses...... 100

4.2 Fiscal Impact of Reapportionment Following the 1970 Census: Difference-in-Differences Estimates...... 101

4.3 Fiscal Impact of Reapportionment Following the 1980 Census: Difference-in-Differences Estimates...... 101

4.4 Fiscal Impact of Reapportionment Following the 1990 Census: Difference-in-Differences Estimates...... 102

4.5 Fiscal Impact of Reapportionment Following the 2000 Census: Difference-in-Differences Estimates...... 102

4.6 Fiscal Impact of Reapportionment Following the 2000 Census: Difference-in-Differences Estimates (FAADS Data)...... 105

Acknowledgments

Over my years at Columbia, I have benefitted from the support of many people who made my path through graduate school more bearable and even sometimes really enjoyable.

First, I would like to thank my three advisors, Jeff Lax, Bob Erikson, and Greg Wawro, each of whom played key roles in getting me to the end of this process, even when (especially when) I thought it was impossible. From my first year in the program, Jeff treated me like a colleague, even before I knew what that meant. I have learned more about "doing" political science from writing and debating with Jeff than I did in any class, and I will always be grateful for his unwavering belief in me. Bob taught me to trust my intuition about ideas and data, which I can do only because he did so much to shape that intuition by sharing his own with me. Greg taught me how to do Monte Carlo analysis and how to think about Congress, much of which is reflected in these three papers, but most importantly, he helped me get down from Mount Everest as a storm was rolling in (so to speak).

I would also like to thank Chuck Cameron and Pablo Pinto for serving on my defense committee. They both gave me invaluable feedback on this work during my defense and also on previous versions of these papers.

I have also benefitted from the insight and friendship of many faculty and colleagues in the Department of Political Science: Jess Blau, David Epstein, Andy Gelman, Guy Grossman, Shigeo Hirano, Jenn Hudson, John Kastellec, Kate Krimmel, Narayani Lasala, Jeffrey Lenowitz, Sharyn O'Halloran, Virginia Oliveros, Laura Paler, Dianne Pfundstein, Justin Phillips, Maria Paula Saffon, Cyrus Samii, Thania Sanchez, Robert Shapiro, Neelanjan Sircar, Piero Stanig, Kevin Toner, and Milan Vaishnav. I extend a special thank you to Chris Weiss, my first advisor in Columbia's QMSS program, and to Austin Nichols, my former coworker and current coauthor, for encouraging me to pursue a Ph.D.

I would like to thank Columbia University's Graduate School of Arts and Sciences and the Department of Political Science for financial support throughout my time at Columbia.

None of this would have been possible without the love and encouragement of my parents, Harold Rader and Nancy Rader, who have always trusted my judgement and supported me in whatever goals I set out for myself.

Last, I am deeply grateful for my soon-to-be husband, Telis Demos. From the little things (his actual belief that teaching assistants are cool) to the big things (his tireless support through all six years of this journey), his presence in my life has meant more than I can say.

I dedicate this dissertation to my grandparents, Lewis F. Gordon Jr. and Patricia Gordon.


Part I

Dissertation Chapters

Chapter 1

Overview

1.1 Introduction

Over the past few decades, the field of political science has become increasingly sophisticated in its use of empirical tests for theoretical claims. One particularly productive strain of this development has been the identification of the limitations of and challenges in using observational data. Making causal inferences with observational data is difficult for numerous reasons, two of which I address herein.

The first difficulty arises because, in order to run a typical parametric model on observational data, a researcher is required to make the untestable and crucial assumption that the estimate of interest is un-confounded by omitted variable bias (or, in causal terms, that a given treatment is ignorable or conditionally random). One way to avoid making this assumption is to conduct a randomized controlled experiment, in which the random sampling of observations from a known population and the random assignment of observations into treatment and control groups ensures that any post-treatment difference in outcomes between the treatment and control groups is attributable to the effect of the treatment alone and that this effect, should it exist, can be generalized to the population as a whole. However, the ideal experiment often remains in the realm of the hypothetical because it is impractical, illegal, or impossible.

When such an experiment cannot be conducted, researchers can often use quasi-experimental approaches to identify causal effects more plausibly than with simple regression techniques, such as regression discontinuity design, which I employ in Chapter 3, or difference-in-differences estimation, which I employ in Chapter 4.

The second difficulty is that, even if all of the confounding factors are observed and properly controlled for in the model specification, one can never be sure that the unobserved (or error) component of the data generating process conforms to the assumptions one must make in order to use the model.

If it does not, then this manifests itself in terms of bias in standard errors and incorrect inference on the statistical significance of quantities of interest.1 In this case, one can either turn to standard error "fixes" that are robust to generic forms of deviance from standard assumptions, such as Huber-White standard errors or cluster-robust standard errors, or to non-parametric solutions, like randomization tests, the subject of Chapter 2, that do not require such assumptions but may be less powerful than their parametric counterparts.

1The worst case scenario occurs in non-linear models, in which improperly modeled errors lead to bias in the estimates of quantities of interest.

In the following three essays, I develop the use of some of these techniques for inference with observational data and explore their limitations. Specifically, I focus on observational data that have a grouped structure, in which the variable of interest is measured at the group level and operates on an outcome measured at the individual level. Substantively, these essays also contribute to important and long-standing debates in political science over legislative organization by producing more cleanly identified effects of the power of Congressional representatives as individuals and as members of parties to bargain over distributive goods. Methodologically, these essays contribute to the literature on empirical political methodology by challenging the conventional application of quasi-experimental techniques and standard error fixes on grouped observational data.

1.2 Chapter Summaries

1.2.1 Randomization Tests and Inference with Grouped Data

Data in which observations are nested in groups are common in political science, and political scientists often want to know how a variable measured at the group level affects an outcome measured at the individual level, or to test for so-called "contextual effects." A well-known complication that arises with grouped datasets is that the amount of independent information in the data is smaller than one would expect given the sample size, due to some unobserved, group-level shock shared in common among members of the same group. Ignoring this "clustering" can lead to downward bias in standard errors and thus to overconfident conclusions about the statistical significance of group-level effects.2 For the classic statement of this problem, see Moulton (1986). For a recent overview, see Angrist and Pischke (2009).

A common way that applied researchers deal with this issue is to calculate cluster-robust standard errors (CRSEs) and to conduct hypothesis tests on group-level variables incorporating these errors.3 CRSEs are a generalization of a type of standard errors that are robust to heteroskedasticity of unknown form, commonly called sandwich estimators.4 The cluster-robust approach further generalizes the variance-covariance matrix to allow for correlation across observations in the same group (Liang and Zeger, 1986, Rogers, 1993). In other words, CRSEs are robust to heteroskedasticity of unknown form and to clustering among observations in the same group, and so they would seem to be the appropriate choice for inference with grouped observational data.5

However, a potentially serious limitation of CRSEs that is almost always ignored in the applied literature is that their asymptotic properties hold only as the number of groups goes to infinity, not the number of observations (Baum, Nichols and Schaffer, 2010). While there is no agreement on how many groups is sufficient for valid inference, recent Monte Carlo evidence suggests that one must have at least 50 groups for CRSEs to be the correct size (e.g., Bertrand, Duflo and Mullainathan, 2004). Otherwise, CRSEs, while still a great improvement over standard error estimates that ignore clustering, will still be too small, and inferences will still be overconfident. This limitation is particularly meaningful for applied political science research, since there are many situations in which one would want to analyze grouped data with fewer than 50 groups and in which gathering more groups is not an option.6

2The degree of overconfidence that standard errors exhibit in the face of clustered data depends on several factors. As Moulton (1986) shows, the degree of downward bias increases, all else equal, as the average number of observations in groups increases, the correlation of the intra-group disturbances increases, or the correlation of intra-group regressors increases.

3Since the year 2000, approximately 120 articles published in the American Political Science Review, the American Journal of Political Science, and the Journal of Politics made use of CRSEs.

4They are also known as Huber-White standard errors (after Huber (1967) and White (1980), who each derived them), or White standard errors.

5Monte Carlo experiments in Rogers (1993) show that, for data with a large number of groups, t-tests using CRSEs falsely reject the null hypothesis (commit Type I errors) an appropriate number of times for a given confidence level, whereas t-tests using Huber-White robust standard errors are overconfident (commit too many Type I errors). Thus, for data in which errors are clustered within groups, uncorrelated across groups, and have a large number of groups, CRSEs are an improvement over typical OLS standard errors and over robust standard errors.

6For example, if one is interested in estimating the effect of some cross-country variation on an individual outcome, then one only has 27 European Union countries, 30 OECD countries, or 18 Latin American countries. If one is interested in exploiting sub-national variation, then, out of the 24 federal countries, only 2 (the U.S. and Russia) have at least 50 sub-national units. If one is interested in the decisions of key actors, then one may observe many vetoes, but only 44 U.S. presidents; or, one may observe many votes, but only 9 Supreme Court justices.

In Chapter 2, "Randomization Tests and Inference with Grouped Data," I propose an alternative approach to inference with grouped data—the randomization test—and explore its advantages and limitations for political science applications. Randomization tests are a non-parametric way to conduct hypothesis tests, and thus their validity does not depend on asymptotic properties or on distributional assumptions about the disturbances in a model. Instead of using textbook test distributions, with their attendant assumptions, to judge the rarity of an observed test statistic under the null hypothesis, the randomization test creates a custom reference distribution for the data at hand.

A basic randomization test proceeds as follows. 1) Set up the model and calculate a test statistic on the variable of interest, say Z, in the usual way, e.g., run OLS and find the t-test statistic. 2) Shuffle (randomize) the Z variable in order to break the systematic relationship between Z and the dependent variable, thus creating a dataset in which the null hypothesis is true by construction. 3) Calculate the test statistic in the usual way using this shuffled dataset. 4) Repeat steps 2) and 3) many times to create a distribution of test statistics observed when the null hypothesis of no systematic effect is true.7 5) Compare the test statistic from the unshuffled data to this distribution. If it is among the top α percent most extreme values, reject the null hypothesis at the 1 − α confidence level.

A randomization test on grouped data should proceed in much the same way, except that the Z vector should be block shuffled. That is, the group-level variable is randomized across groups but not within groups. Randomizing across but not within groups preserves the correlation among observations in the same group that arises because of the clustered nature of the data. Randomizing both within and across groups would be analogous to the mistake of ignoring the possibility of clustering. As Moore et al. (2003) put it, one should sample from the set of possible permutations that are consistent with the study design.
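Block shuffling can be sketched with a hypothetical helper that permutes which group receives which value of Z while leaving each group internally homogeneous:

```python
import numpy as np

def block_shuffle(z, groups, rng):
    """Permute a group-level variable across groups, never within them."""
    labels = np.unique(groups)
    # z is constant within a group, so take one value per group...
    group_vals = np.array([z[groups == g][0] for g in labels])
    # ...permute those group-level values...
    shuffled = dict(zip(labels, rng.permutation(group_vals)))
    # ...and reassign them, keeping each group internally constant.
    return np.array([shuffled[g] for g in groups])

rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 3)        # three groups of three observations
z = np.repeat([10.0, 20.0, 30.0], 3)    # group-level variable
z_star = block_shuffle(z, groups, rng)
# z_star is still constant within each group; its group values are a
# permutation of 10, 20, 30.
```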

This essay represents the first Monte Carlo work testing the effectiveness of randomization tests on grouped data and comparing such tests to other approaches commonly used in political science and elsewhere. I present the results of Monte Carlo experiments comparing t-tests with CRSEs and randomization tests in terms of size (the rate at which the tests falsely reject the null hypothesis) and power (the rate at which the tests correctly reject the null hypothesis) for datasets with varying numbers of groups. I find that randomization tests are accurately sized for all of the parameter combinations I test, whereas t-tests with CRSEs are almost always over-confident, even with 50 groups. I further find that randomization tests are only slightly less powerful than t-tests with CRSEs for a given dataset. In other words, the gain from using randomization tests instead of t-tests with CRSEs is great in terms of Type I error and the loss is slight in terms of Type II error.

7Manly (1997) suggests at least 1,000 permutations for a test at the 95% confidence level. In this way, a randomization test is a sample version of Fisher's exact test (Fisher, 1935).

Thus, especially for situations in which the number of groups is small, randomization tests are more appropriate for group-level inference than the common parametric approach.
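The size criterion used in these comparisons can be illustrated with a toy Monte Carlo: simulate data in which the null hypothesis is true, run the test many times, and check that the rejection rate matches the nominal α. The sketch below uses a one-sample t-test and hypothetical simulation settings, not the chapter's actual design:

```python
import numpy as np
from scipy import stats

def empirical_size(n_sims=2000, n=50, alpha=0.05, seed=0):
    """Rejection rate of a t-test when the null hypothesis is true.

    A correctly sized test rejects about alpha of the time."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)                       # true mean is zero
        rejections += stats.ttest_1samp(x, 0.0).pvalue < alpha
    return rejections / n_sims

size = empirical_size()   # should be close to the nominal 0.05
```

An overconfident test, such as a t-test with CRSEs on data with few groups, would return an empirical size noticeably above the nominal α in this kind of exercise.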

Randomization tests are not without their limitations, however. When correlation between a group-level variable of interest and an individual-level covariate is very high (greater than about .7), a block-shuffled randomization test is underconfident, or makes too few Type I errors (while t-tests with CRSEs remain overconfident). I present and discuss these results in an appendix to Chapter 2.

1.2.2 Party Effects on the Distribution of Federal Outlays

Can members of the majority party direct a disproportionate share of federal outlays to their districts in the U.S. House of Representatives? Party-cartel theories of legislative organization posit that majority party status accords party members influence over legislative outcomes that they would not otherwise wield. The implication for distributive spending is that a majority party member elected from a given district could extract more distributive benefits for the district than could a minority party member, even if elected by the same district for the same Congress. By contrast, universalistic accounts of distributive spending in Congress posit that so-called "pork" projects are doled out equally across districts, such that majority party status should not accord a member any particular advantage in bringing home the bacon. Findings from the current empirical literature are mixed in their support for any majority party effect on spending. In Chapter 3, "Party Effects on the Distribution of Federal Outlays: A Regression Discontinuity Approach," I employ a quasi-experimental approach, specifically a regression discontinuity design, to estimate the causal effect of majority party status on the pattern of federal outlays to marginal congressional districts.

The first theoretical accounts of Congressional bargaining over distributive goods predicted that those goods would be allocated to districts in a universal manner. Mayhew (1974) notably argued that it "makes no sense" for the majority party to deprive the minority party of their share of the goods because the costs of including them are much lower than the costs of hard-ball partisan politics. Later work formalized this notion: because of uncertainty over which members will be in a minimum-winning coalition in the future (Shepsle and Weingast, 1981) or because of coalitional instability (Collie, 1988), legislators should want to form universalistic coalitions. Through logrolling, legislators can capture gains from trade when their constituents demand different types of goods (Weingast and Marshall, 1988). The locus of power in these accounts is the committee. While Collie (1988) and Weingast and Marshall (1988) speculate that strong parties may substitute for strong committees, this does not change their predictions that distribution should be universal.

On the other hand, party-centered theories of legislative organization predict that majority party members can direct more distributive benefits to their districts because of their procedural powers. Cox and McCubbins (1993) argue that majority party members have an incentive to use these powers to benefit their fellow party members because their electoral success is somewhat tied to the party's reputation as a whole. One way in which party leaders exercise their power is to organize committees to ensure that majority members will benefit from distributive gains reaped in committees. This implies that any logrolling that occurs within or across committees would disproportionately benefit majority party members. Thus, strong committees and strong parties are not substitutes; rather, the majority party makes use of the inherent strength of committees to pursue its own ends. Similarly, Balla et al. (2002) argue that the majority party gives the minority party some pork, but a smaller share than it gets for its own members. The majority party acts in this way to prevent the minority party from attacking its members for being profligate spenders while still being able to secure a comparative electoral advantage for its members.

The empirical literature on majority party effects on spending is quite mixed. Findings range from no effect (Stein and Bickers, 1994) to conditional effects (earmarks only (Balla et al., 2002, Lee, 2003); certain types of programs (Carsey and Rundquist, 1999, Levitt and Snyder, 1995)) to clear effects (Alvarez and Saving, 1997b). While these studies are not directly comparable because they cover different time periods and subsets of federal spending in districts, taken as a whole, they offer scant support for the strongest predictions of party-centered theories.

One feature that each of the empirical studies cited above shares is that they all employ simple regressions on observational data. Thus, not only is their validity threatened by omitted variable bias, they also rely on accurately controlling for district-level demand for federal outlays, district-level need for federal outlays, and district-level opinion on distributive spending. By contrast, my approach allows me to estimate an unbiased treatment effect of majority party status on the distribution of federal funds to districts without having to operationalize and control for possibly confounding factors.8

Regression discontinuity design (RDD) methods are used to estimate how a treatment affects outcomes in the absence of a randomized controlled experiment (Thistlethwaite and Campbell, 1960). Many experiments in the social sciences remain purely hypothetical because they are not legally, politically, or economically feasible. In this case, I clearly cannot estimate the effect of a representative's majority party status on federal outlays in the district by randomly assigning representatives to districts. However, due to a particular feature of the U.S.'s majoritarian voting system, I can estimate the effect of party on the distribution of federal funds by using available data and a quasi-experimental method.

To use RDD, it must be the case that observations are assigned to treatment groups based on an observed variable, the assignment variable. Observations whose assignment variable values are above a certain threshold are assigned to the treatment group; those whose assignment variable values are below the threshold are assigned to the control group. Thus, unlike in a true experiment, treatment and control groups are not assigned randomly, but the measure underlying their assignment can be observed.

8Many of the extant studies also do not take into account the grouped nature of the data they use, in which the key attribute is measured at the representative level and the outcome at the district level. When a district has the same representative over multiple fiscal years (as is always the case) or over multiple Congresses (as is often the case), then the data are subject to clustering. Because I find no effect, there is no reason to worry about overconfident standard errors. However, I report bootstrapped standard errors for completeness.

In the case of party and federal funding to districts, the assignment variable is the majority party vote share in the House election.9 Districts in which more than fifty percent of the voters chose the majority party candidate are in the treatment group. Districts in which fewer than fifty percent of the voters chose the majority party candidate (that is, where more than fifty percent chose the minority party candidate) are in the control group.

The null hypothesis of an RDD is that the outcome variable is a smooth and continuous function of the assignment variable, including near the threshold. If the universalistic prediction is correct, federal funding should be a smooth and continuous function of the majority party vote share, even near the fifty percent threshold. In other words, if a member's majority status truly plays no role in the amount of federal funds she can secure for her district, then the spending behavior of a majority party member elected with slightly more than fifty percent of the vote should be virtually indistinguishable from the spending behavior of a minority party member elected with slightly more than fifty percent of the vote. In this way, the assignment of majority party status in very close elections approximates random assignment.

The key insight of RDD is that if there is a discontinuity in the outcome variable near the threshold, where observations are essentially the same in every way, then there is a treatment effect. The presence of a statistically significant treatment effect favoring the majority party would be evidence in favor of party-centered theories of legislative organization.
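The estimation logic behind this design can be sketched with simulated data. The sketch below is purely illustrative (hypothetical variable names and numbers, not the dissertation's FAADS data or code): it fits a local linear regression within a bandwidth of the cutoff, with separate slopes on each side, and reads off the jump at the threshold.

```python
import numpy as np

# Illustrative sharp-RDD sketch: 'vote' is the majority-party vote share,
# 'outlays' the district outcome, with a known jump of 2.0 at the 50% cutoff.
rng = np.random.default_rng(0)
n = 5000
vote = rng.uniform(0.2, 0.8, n)            # assignment variable
treated = (vote > 0.5).astype(float)       # majority-party districts
outlays = 10 + 3 * vote + 2.0 * treated + rng.normal(0, 1, n)

# Local linear regression within a bandwidth around the threshold: regress
# the outcome on a treatment dummy and the re-centered running variable,
# allowing the slope to differ on each side of the cutoff.
h = 0.1
win = np.abs(vote - 0.5) < h
x = vote[win] - 0.5
d = treated[win]
X = np.column_stack([np.ones(win.sum()), d, x, d * x])
beta, *_ = np.linalg.lstsq(X, outlays[win], rcond=None)
print(beta[1])  # estimated jump at the threshold, near the true 2.0
```

Under the null of no party effect, this estimated jump should be statistically indistinguishable from zero, which is the pattern the chapter reports.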

Using data on federal outlays from the Federal Assistance Awards Data System (FAADS)10 from 1983 to 2002, I find no statistically significant party effect on the amount of federal outlays to a district in the pooled sample. In year-by-year estimates, I find no more significant effects than one would expect from random chance, and those few years that do exhibit significant effects are not consistently signed.11

9This will be the two-party vote for the Republican or the Democrat, depending on the election year.

10FAADS is gathered by the Census Bureau and includes spending from all federal domestic assistance programs. For example, these programs include social security, agricultural subsidies paid to producers, research grants, and community development grants. FAADS does not include data on spending from federal procurements, government employee wages, and certain loan programs. I subset the data to "new actions" only. "New action" does not necessarily mean new program. As noted in Stein and Bickers (1994), members rarely have to initiate wholly new programs to provide benefits to their districts. Instead, they initiate new projects within existing programs. In the FAADS data, spending marked "new action" includes the initial payment of a new program or a new project within an extant program.

1.2.3 Malapportionment in the U.S. House of Representatives

It is well established that malapportionment in the U.S. Senate affects the distribution of federal outlays to states. Do sharp changes in representation in the House of Representatives induced by reapportionment also affect the distribution of federal funds? Every ten years, following the decennial census, the House is reapportioned among states based on updated state population counts. The reapportionment process is meant to ensure that the House remains the chamber of equal representation of citizens that the Framers intended it to be. However, the fact that a sizable proportion of states gains or loses representatives after every census is evidence that the

House becomes increasingly malapportioned over the course of a given decade. In Chapter 4, "Malapportionment in the U.S. House of Representatives: The Effect of Census Reapportionment on the Distribution of Federal Funds to States," I use reapportionment as a natural experiment around which to test for an effect of malapportionment in the House on the distribution of funds to states. I estimate this effect using a difference-in-differences (DID) approach.

The empirical literature on the impact of malapportionment on spending patterns is not large, but it is conclusive. The distribution of federal funds to U.S. states bears the imprint of malapportionment in the Senate (Atlas et al., 1995; Lee, 1998, 2000; Lee and Oppenheimer, 1999). Prior to the Supreme Court's 1960s apportionment decisions, malapportionment in state legislatures affected the distribution of state funds to localities (Ansolabehere, Gerber and Snyder, 2002). In countries with bicameral legislatures, the distribution of national monies to sub-national units is influenced by malapportionment in the upper chamber (e.g., Horiuchi and Saito, 2003). In short, more representation means more money.

11This design also allows me to test the somewhat a-theoretical but still provocative claims often made by politicians that one party is more profligate with pork barrel spending than the other. I can do this by using percent voting for the Democrat as the assignment variable instead of percent voting for the majority party member. A positive jump at 50% would indicate that Democrats spend more, while a negative jump would mean that Republicans spend more. Using this test, I again find no statistically significant effects.

Lee (2000) argues that malapportionment in the Senate affects the distribution of federal money because, to a coalition builder, each Senator's vote counts the same but a small-state Senator's vote is cheaper. Given that relative representation is both more variable and more entrenched in the Senate than in the House, one may not expect that Lee's mechanism would apply to the House of Representatives. However, even if the logic of bargaining in the House is dissimilar to the logic of bargaining in the Senate, malapportionment in the House may still affect the distribution of federal spending because it induces sudden and sharp changes in representation every ten years due to reapportionment. In other words, if the mere number of House representatives a state has affects the flow of federal outlays to that state, then malapportionment in the House will affect outlays through Census reapportionment, if not through a Senate-style bargaining game. Should we expect that the number of representatives a state has in the House is causally related to federal outlays to states?

There are certainly many legislative bargaining theories that predict such a connection. Ansolabehere, Snyder and Ting (2003) show that, when the House proposes a bill and the Senate considers it under a closed rule, then the payoff for all House members is equal. Similarly, Lee and Oppenheimer (1999) argue that House members should want to divide funds approximately equally across districts. This description of bargaining in the House is consistent with universalistic, committee-centered theories of legislative organization, discussed in Section 1.2.2, which predict that benefits should be distributed to all members' districts in a "universal" manner. Thus there is reason to hypothesize that the number of representatives a state has is positively related to the amount of federal money distributed to that state. When the number of representatives changes from one year to the next due to reapportionment, then the amount of federal money should respond in the direction of the change.

Census reapportionment is a natural experiment around which to test the treatment effect of a change in representation. Because only some states gain or lose representatives due to reapportionment while other states retain the same number of representatives, the states that experience no change can act as a control group against which to compare the effect of a change in representation. A situation in which there exists a pre-treatment measurement and a post-treatment measurement of an outcome in a treated group and in a control group is the ideal situation in which to apply a difference-in-differences (DID) modeling approach.
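The DID logic can be reduced to a few lines of arithmetic. The numbers below are hypothetical, purely to show the comparison of changes rather than levels; they are not the CFFR data.

```python
# Minimal difference-in-differences sketch (hypothetical means, not CFFR data).
# Treated states changed their number of representatives at reapportionment;
# control states did not. "Pre" is the year ending in 2, "post" the year
# ending in 4.
pre_treated, post_treated = 100.0, 112.0
pre_control, post_control = 100.0, 104.0

# DID = (change in treated group) - (change in control group). The control
# group's change nets out shocks common to all states, under the
# parallel-trends ("parallelism") assumption.
did = (post_treated - pre_treated) - (post_control - pre_control)
print(did)  # 8.0
```

In the chapter's regression implementation, the same contrast is recovered from a model with state fixed effects, period effects, and a treated-by-post interaction term.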

My particular model employs state fixed effects and controls for other possibly confounding factors (e.g., state unemployment, state income) to relax the "parallelism" assumption that, in the absence of the treatment (reapportionment), there should be no difference in the change in the outcome between treated and control groups. I use spending data from the Consolidated Federal Funds Report (CFFR) from the last pre-treatment years (years ending in 2) and the first treated years (years ending in 4) for reapportionments that occurred after the 1970, 1980, 1990, and 2000 Censuses.

Using a year-by-year measure of state-level representation in the U.S. House of Representatives, I show that the House becomes increasingly malapportioned in every year following a given Census reapportionment until the next reapportionment corrects, as much as possible, any disparity in representation. However, I find no evidence that malapportionment in the House affects federal outlays to states.

Chapter 2

Randomization Tests and Inference with Grouped Data

2.1 Introduction

Data in which observations are nested in groups are common in political science, and political scientists often want to know how a variable measured at the group level affects an outcome measured at the individual level, or to test for so-called "contextual effects." A well-known complication that arises with grouped datasets is that the amount of independent information in the data is smaller than one would expect given the sample size, due to some unobserved, group-level shock shared in common among members of the same group. Ignoring this "clustering" can lead to overconfident conclusions about the statistical significance of group-level effects.

A common way that applied researchers deal with this issue is to calculate cluster-robust standard errors (CRSEs) and to conduct hypothesis tests on group-level variables incorporating these errors. CRSEs are robust to both heteroskedasticity across observations and clustering within groups. They are also easy to implement in common statistical packages, like Stata, R, and SAS.

Thus it is no wonder that, since the year 2000, approximately 120 articles published in the American Political Science Review, the American Journal of Political Science, and the Journal of Politics have made use of CRSEs.

A potentially serious limitation of CRSEs that is almost always ignored in the applied literature is that their asymptotic properties hold only as the number of groups goes to infinity, not as the number of observations does. While there is no agreement on how many groups is sufficient for valid inference, recent Monte Carlo evidence suggests that one must have at least 50 groups for CRSEs to be correctly sized. Otherwise CRSEs, while still a great improvement over standard error estimates that ignore clustering, will be too small, and inferences will be overconfident.

Unfortunately, there are many situations in which an applied researcher would want to analyze grouped data with fewer than 50 groups. For example, if one is interested in estimating the effect of some cross-country variation on an individual outcome, then one only has 27 European Union countries, 30 OECD countries, or 18 Latin American countries. If one is interested in exploiting sub-national variation, then, out of the 24 federal countries, only 2 (the U.S. and Russia) have at least 50 sub-national units. If one is interested in the decisions of key actors, then one may observe many vetoes, but only 44 U.S. presidents; or, one may observe many votes, but only 36 Supreme Court justices in the post-WWII era.

In the following, I propose an alternative approach to inference with grouped data—the randomization test. Randomization tests are a non-parametric way to conduct hypothesis tests, and thus their validity does not depend on asymptotic properties or on distributional assumptions about the disturbances in a model. Randomization tests are used widely in other fields and are considered by some to be the "gold standard" against which all parametric tests should be judged (Edgington and Onghena, 2007, 9). However, to date, there has been no comparative Monte Carlo work testing the effectiveness of randomization tests on grouped data compared to other approaches commonly used in political science.

I use Monte Carlo experiments to compare t-tests with CRSEs and randomization tests in terms of size and power for datasets with varying numbers of groups. I find that randomization tests are accurately sized for all of the parameter combinations I test, whereas t-tests with CRSEs are almost always over-confident, even with 50 groups. I further find that randomization tests are only slightly less powerful than t-tests with CRSEs for a given dataset. In other words, the gain from using randomization tests instead of t-tests with CRSEs is great in terms of Type I error and the loss is slight in terms of Type II error. Thus, especially for situations in which the number of groups is small, randomization tests are more appropriate for group-level inference than the common parametric approach.

2.2 Inference with Clustered Data

Political scientists are often interested in estimating the effect of a group-level variable using data that are measured at the individual level. For example, one might want to know the "contextual effect" of some state law (like a voting registration deadline) on some individual behavior (like the decision to vote or not). Estimating the size of a group-level effect is straightforward with a well-specified model. However, estimating the precision of that effect and testing its significance requires some extra consideration. This is because, while one may have many thousands of individual-level observations, the parameter of interest only varies at the group level. Thus, as a general rule, the "effective sample size" is not the number of individual observations, but is instead closer to the number of groups (Angrist and Pischke, 2009, 308).

To illustrate, say that you have a dataset with N observations, one covariate x measured at the individual level, and one covariate z measured at the group level; and your primary interest lies in whether or not the group-level variable has a statistically significant effect on a continuous dependent variable y. You might suspect that the data is generated by a simple stochastic linear process like the following:

$$y_{i,g} = \gamma_0 + \gamma_1 x_{i,g} + \gamma_2 z_g + v_{i,g} \qquad (2.1)$$

where the error term $v_{i,g} \sim \text{i.i.d. } N(0, \sigma^2)$, $i = 1, \ldots, N$ indexes individuals, and $g = 1, \ldots, G$ indexes groups. In this simple case, one could estimate the model using ordinary least squares (OLS), take the ratio of $\hat{\gamma}_2$ and its standard error to obtain a t-statistic (or a standardized effect), and compare that t-statistic to the theoretical t-distribution for sample size $N - 3$. If it is at least in the top 5% of the distribution, then you can say that the group-level effect is different from zero at the 95% confidence level.
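This naive t-test can be sketched with simulated data drawn from a process like Equation (2.1), with i.i.d. errors. The names and numbers below are illustrative, and the 1.96 cutoff approximates the t critical value for a large sample.

```python
import numpy as np

# Naive OLS t-test on a group-level coefficient, with i.i.d. errors as in
# Equation (2.1) and a true group-level effect of zero (gamma_2 = 0).
rng = np.random.default_rng(1)
G, m = 20, 50                      # 20 groups of 50 individuals
g = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m)         # individual-level covariate
z = rng.normal(size=G)[g]          # group-level covariate, constant within group
y = 1.0 + 0.5 * x + 0.0 * z + rng.normal(size=G * m)

X = np.column_stack([np.ones(G * m), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (G * m - 3)
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
t_stat = beta[2] / se
print(abs(t_stat) > 1.96)          # reject at the 95% level?
```

With genuinely i.i.d. errors this test has the right size; the trouble described next arises when a group-level error component is added.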

However, one might imagine that, due to the grouped nature of the data, there may be some unobserved correlation among individuals in the same group. For example, people who live in the same state are exposed to the same state culture as each other, which is different from that experienced by people living in other states. In general, it is reasonable to expect that if observations in groups have the same observed group-level characteristics, then they may also have some of the same unobserved group-level characteristics (Moulton, 1990). If this is the case, then the stochastic component $v_{i,g}$ in Equation (2.1) may have an individual-level and a group-level component, making the data-generating process like the following:

$$y_{i,g} = \gamma_0 + \gamma_1 x_{i,g} + \gamma_2 z_g + e_{i,g} + u_g \qquad (2.2)$$

Even when $u_g$ is well-behaved ($\sim \text{i.i.d. } N(0, \sigma_u^2)$, implying $\text{corr}(u_g, e_{i,g}) = \text{corr}(u_g, x_{i,g}) = \text{corr}(u_g, z_g) = 0$), ignoring it can affect inference by artificially inflating the amount of information in the data.

This problem is, of course, well-known, as discussed in Angrist and Pischke (2009), Arceneaux and Nickerson (2009), Bertrand, Duflo and Mullainathan (2004), Kloek (1981), Moulton (1986, 1990), Wooldridge (2003), and the references therein.

The degree of overconfidence that OLS standard errors exhibit in the face of clustered data depends on several factors. As Moulton (1986) shows, the degree of downward bias increases, all else equal, as the average number of observations per group increases, as the correlation of the intra-group disturbances increases, or as the correlation of intra-group regressors increases. For a data-generating process like that in Equation (2.2), the group-level regressor $z_g$ is fixed within the group, as is the case in most political science applications that investigate "contextual effects." Thus, the intra-group regressor correlation is equal to 1, its maximum.

One can summarize these relationships and approximate the amount of downward bias in OLS standard errors with a formula derived in Kloek (1981), Moulton (1990), and Angrist and Pischke (2009). Assume, for simplicity, that the error structure is like that in Equation (2.2) (with one individual-level error term $e_{i,g} \sim \text{i.i.d. } N(0, \sigma_e^2)$ and one group-level error term $u_g \sim \text{i.i.d. } N(0, \sigma_u^2)$), that all regressors are fixed within groups (no $x_{i,g}$, only $z_g$ and its coefficient $\gamma_2$), and that each group has the same number of members $m$. Then the intra-group (or intra-class) correlation coefficient is

$$\rho_e = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \qquad (2.3)$$

and the approximate amount of downward bias is given by the square root of

$$\frac{V(\hat{\gamma}_2)}{V_c(\hat{\gamma}_2)} = 1 + (m - 1)\rho_e \qquad (2.4)$$

where $V_c(\hat{\gamma}_2)$ is the usual OLS variance estimate for the coefficient (a diagonal element of $\hat{\sigma}^2 (Z'Z)^{-1}$) and $V(\hat{\gamma}_2)$ is the correct variance estimate given this type of clustering. At the extreme, when $\rho_e$ is equal to one, $\sigma_e^2$ must equal zero, and so no new information is added by additional individual-level observations. In this case, OLS standard errors would be too small by a factor of about $\sqrt{m}$.
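As a quick numeric illustration of Equation (2.4), with hypothetical values for the group size and the intra-class correlation:

```python
import numpy as np

# The Moulton approximation in Equation (2.4), for an illustrative design
# with groups of m = 50 and intra-class correlation rho_e = 0.1.
def moulton_factor(m, rho):
    """Ratio of correct to naive OLS variance for a group-constant regressor."""
    return 1.0 + (m - 1) * rho

m, rho = 50, 0.1
inflation = np.sqrt(moulton_factor(m, rho))
print(round(inflation, 2))  # 2.43: naive standard errors are ~2.4x too small
```

Even a modest intra-class correlation therefore produces severe overconfidence when groups are large; at the extreme $\rho_e = 1$, the factor reduces to $\sqrt{m}$ as stated above.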

The above formula is a good approximation of the amount of downward bias in OLS standard errors when clustering is ignored, even when the data-generating process includes other regressors that are not fixed at the group level and when the groups are not of equal size (Moulton, 1990).

However, this bias approximation does assume that the heteroskedasticity in the data is completely due to clustering, specifically clustering of the form in Equation (2.2), where each group error term has the same $\sigma_u^2$. And so, one cannot simply fix the bias in OLS standard errors by scaling them by the square root of Equation (2.4) unless one is willing to make very strict assumptions. Otherwise, a variance-covariance estimator that is robust to clustering of unknown form is needed. Such a corrective is discussed in the next section.

2.3 Cluster-Robust Standard Errors

One common parametric fix for data that are clustered in groups is to estimate the model parameters in the usual way (using OLS, for example) and then adjust the standard errors on the parameters to take into account the group-level error term. If one wishes to remain agnostic about the form of clustering in the data, then a common way to accomplish this standard error fix is to use cluster-robust standard errors (CRSEs), also known simply as clustered standard errors. A t-test can then be performed using a test statistic calculated by dividing the parameter estimate by the CRSE instead of by the typical standard error.

CRSEs are a generalization of a type of standard errors that are robust to heteroskedasticity of unknown form, commonly called sandwich estimators, Huber-White standard errors (after Huber (1967) and White (1980), who each derived them), or White standard errors. The OLS estimate of the variance-covariance matrix, $E[(Z'Z)^{-1} Z'\hat{e}\hat{e}'Z (Z'Z)^{-1}]$, where $\hat{e}$ is an $N \times 1$ vector of residuals, reduces to the familiar $\hat{\sigma}^2 (Z'Z)^{-1}$ because of the classical assumption that $E[\hat{e}\hat{e}'] = \sigma^2 I$, i.e., the homoskedasticity assumption that error variance is constant across all observations. This means that the $N \times N$ matrix $E[\hat{e}\hat{e}']$ contains the same $\sigma^2$ in each diagonal element and zero in each off-diagonal element. The sandwich estimator still restricts the off-diagonal elements of the variance-covariance matrix to be zero (disallowing correlation across observations) but allows the diagonal terms to vary and be estimated by the squares of the residuals (Primo, Jacobsmeier and Milyo, 2007; Rogers, 1993).

The cluster-robust approach further generalizes the variance-covariance matrix to allow for correlation across observations in the same group (Liang and Zeger, 1986; Rogers, 1993). In other words, the off-diagonal covariance elements are allowed to be non-zero within groups, and are estimated by the products of the within-group residuals. Thus, CRSEs are robust to heteroskedasticity of unknown form and to clustering among observations in the same group. The cluster-robust variance-covariance matrix is given by

$$(Z'Z)^{-1} \left\{ \sum_{g=1}^{G} \left( \sum_{i=1}^{m_g} \hat{e}_i z_i \right) \left( \sum_{i=1}^{m_g} \hat{e}_i z_i \right)' \right\} (Z'Z)^{-1} \qquad (2.5)$$

Like any finite-sample estimate, the cluster-robust variance-covariance matrix estimate needs to be multiplied by a finite-sample adjustment, which, in this case, is $(G/(G-1)) \cdot ((N-1)/(N-k))$, where $k$ is the number of regressors. CRSEs are calculated by taking the square roots of the appropriate diagonal elements of the matrix.
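Equation (2.5) and its finite-sample adjustment can be computed directly. The sketch below uses simulated data and is illustrative only, not a substitute for the vetted implementations in standard statistical packages.

```python
import numpy as np

# Cluster-robust variance estimate per Equation (2.5), with the
# finite-sample adjustment (G/(G-1)) * ((N-1)/(N-k)). Simulated data with a
# strong group-level error component, so CRSEs should exceed naive OLS SEs.
rng = np.random.default_rng(2)
G, m = 30, 40
g = np.repeat(np.arange(G), m)
z = rng.normal(size=G)[g]                       # group-level regressor
y = 1.0 + 0.5 * z + rng.normal(size=G)[g] + rng.normal(size=G * m)

Z = np.column_stack([np.ones(G * m), z])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
e = y - Z @ beta

bread = np.linalg.inv(Z.T @ Z)
meat = np.zeros((2, 2))
for grp in range(G):
    s = Z[g == grp].T @ e[g == grp]             # within-group score sum
    meat += np.outer(s, s)
N, k = G * m, 2
adj = (G / (G - 1)) * ((N - 1) / (N - k))       # finite-sample adjustment
V = adj * bread @ meat @ bread
crse = np.sqrt(V[1, 1])

naive = np.sqrt((e @ e / (N - k)) * bread[1, 1])
print(crse > naive)  # clustering inflates the correct standard error
```

The loop makes the "sum of within-group outer products" structure of the meat matrix explicit; packaged implementations vectorize the same computation.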

Monte Carlo experiments in Rogers (1993) show that, for data with a large number of clusters, t-tests using CRSEs falsely reject the null hypothesis (commit Type I errors) an appropriate number of times for a given confidence level, whereas t-tests using Huber-White robust standard errors are overconfident (commit too many Type I errors). Thus, for data in which errors are clustered within groups and uncorrelated across groups, and which have a large number of groups, CRSEs are an improvement over typical OLS standard errors and over robust standard errors.

However, the asymptotic properties of CRSEs kick in only as the number of groups G approaches infinity (Nichols and Schaffer, 2007). Thus, when the number of groups is small, the properties of CRSEs are less well understood. Rogers (1993) argues that, as long as the largest cluster composes no more than 5% of the sample, CRSEs should be approximately the correct size. For groups of equal size, this implies that 20 groups should be enough. However, Monte Carlo evidence in Bertrand, Duflo and Mullainathan (2004) and Kezdi (2004) suggests that the number of clusters should be closer to 50 for valid inference. Indeed, Leoni (2009) finds that CRSEs are overconfident for as many as 30 groups even when there is no group-level residual variation (no clustering).

A further limitation of the cluster-robust approach is that the rank of the variance-covariance matrix is no greater than the number of groups G. Thus, one cannot include more than G regressors in any model in which CRSEs are needed. Preliminary Monte Carlo analysis in Nichols and Schaffer (2007) suggests that, when the number of regressors is close to G, the small-sample properties of CRSEs are even worse.

Political scientists are often interested in analyzing clustered data in which the number of groups is fewer than 50, and increasing the number of groups is often impossible since, for example, one cannot create new countries or sub-national units. (And, ironically, gathering more individual-level data without increasing the number of groups exacerbates the challenges of using grouped data.) Thus, t-tests with CRSEs may prove infeasible for inference in many political science applications because of their questionable properties when the number of groups is small.

In the following section, I discuss an alternative test that is non-parametric and so could prove to be a good alternative to CRSEs.

2.4 Randomization Tests

Randomization tests are a non-parametric way to conduct hypothesis tests of the effect of a variable on some outcome.1 They are non-parametric in that their validity does not depend on any assumptions about the shape of the errors in the data-generating process (no normality assumption), on any assumptions about constant error variance (no homoskedasticity assumption), nor on any asymptotic properties (no large sample needed).

1Randomization tests can be inverted to obtain standard errors as well. See, e.g., Rosenbaum (2002) and Ho and Imai (2006). In work in progress, I am using Monte Carlo simulations to assess the coverage of standard errors in randomization tests using grouped data.

Randomization tests are free from these assumptions because they do not use theoretically derived reference distributions to judge the rarity of the test statistic. Instead, they construct the reference distribution from the data themselves. Thus, unlike the cluster-robust approach, which adjusts the standard error of the group-level parameter (the factor by which the parameter is standardized to create a test statistic) so that the theoretical distribution is appropriate for evaluating its significance, the randomization approach creates a custom reference distribution for the data at hand.

The basic procedure for performing a randomization test is analogous to conducting any hypothesis test. First, set up the model and estimate its parameters in the usual way. Then, calculate a test statistic on the parameter of interest. Next, compare that test statistic to a reference distribution that shows how the test statistic is distributed when the null hypothesis is true. As usual, if the test statistic falls within the most extreme α share of the distribution, then one can reject the null hypothesis at the ((1 − α) ∗ 100)% confidence level.

Where randomization tests depart from parametric hypothesis tests is in the construction of the reference distribution. Instead of comparing the test statistic to its theoretical distribution under the null hypothesis, randomization tests use the data to create a reference distribution. This is accomplished by shuffling the variable of interest in a manner consistent with the null hypothesis.

In other words, simply randomly reorder the variable of interest, which ensures that there is no systematic relationship between it and the dependent variable. Using this shuffled variable in place of the observed variable in a regression will, on average, produce a test statistic that is equal to zero. Performing this shuffling and re-estimating procedure many times yields a distribution of test statistics, centered at zero, that would arise were the null hypothesis true. Comparing the test statistic calculated with the observed, un-shuffled data to this empirically-generated reference distribution gives a randomization test p-value.
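The shuffle-and-re-estimate procedure just described can be sketched in a few lines. The sketch below is a univariate illustration with simulated data and 1000 shuffles; variable names are hypothetical.

```python
import numpy as np

# Randomization test by shuffling the variable of interest: re-estimate the
# test statistic under many random reorderings of z to build the reference
# distribution that would arise were the null hypothesis true.
rng = np.random.default_rng(3)

def t_stat(z, y):
    Z = np.column_stack([np.ones(len(z)), z])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ beta
    V = (e @ e / (len(z) - 2)) * np.linalg.inv(Z.T @ Z)
    return beta[1] / np.sqrt(V[1, 1])

n = 200
z = rng.normal(size=n)
y = 1.0 + 0.8 * z + rng.normal(size=n)   # a true effect is present

observed = t_stat(z, y)
ref = np.array([t_stat(rng.permutation(z), y) for _ in range(1000)])
p_value = np.mean(np.abs(ref) >= abs(observed))
print(p_value)  # near zero: the observed statistic is extreme under the null
```

Note that the reference distribution is built from test statistics, not raw coefficients, which matters for the multivariate case discussed below.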

While randomization tests are a non-parametric approach to hypothesis testing, they do make one important assumption about the error terms—that they are exchangeable. Exchangeability means that if the null hypothesis is true (if the variable of interest indeed has no effect), then observed outcomes across individuals would be similar (conditional on confounding covariates) no matter what the level of the variable of interest. In other words, if exchangeability holds, then under the null hypothesis, the variable of interest is merely a label that can be applied to any observation without changing the expected outcome. This justifies the shuffling procedure. Exchangeability is a weaker condition than the standard i.i.d. assumption or, in the case of clustering, the assumption that observations are independent across clusters, since i.i.d. implies exchangeability but not vice versa.

Implementation. Randomization tests were originally proposed by Fisher (1935) to test the effect of a treatment in a randomized experiment.2 In order to apply them to large-N observational data that are grouped, there are three factors to consider: how many shuffles should be performed, how to appropriately shuffle multivariate data, and how to appropriately shuffle clustered data.

How many shuffles? Unlike many laboratory experiments, political science data typically contain many hundreds or thousands of observations. Thus, shuffling the data to represent all possible permutations of the variable of interest quickly becomes infeasible, and an "exact" randomization test is impossible. Sampling many times without replacement from the set of possible permutations gives an "approximate" randomization test.3 Monte Carlo evidence suggests that, for most applications, randomization tests using 1000 draws should be powerful enough to detect an effect at the 95% confidence level (Manly, 1997; Moore et al., 2003). Modern computing power makes drawing this number of shuffles, or even many more, easily feasible.

2One may worry that randomization tests are only suitable for inference using data with randomized treatments. However, the one assumption required to make the randomization test valid is mathematically weaker than the standard parametric assumption. Thus, if one is willing to run a regression, one must be willing to conduct a randomization test.

3These tests are also sometimes referred to as permutation tests rather than randomization tests.

How should we shuffle multivariate data? Most political science data is observational.4 Although there is widespread agreement about how to conduct a randomization test using data that were generated by a randomized experiment (e.g., Edgington and Onghena (2007); Good (1994)) and about how to test the significance of a coefficient in a univariate model (e.g., Manly (1997)), many methods have been proposed for testing partial regression coefficients in multivariate observational data. The concern with this type of data is that, unlike experimental or univariate data, the variable of interest Z may be systematically correlated with the other regressor(s) X. If the shuffling method breaks the correlation between Z and X, then the results of the randomization test may be affected by the level of correlation in the original data.

Kennedy (1995) reviews the most common methods in the literature. These include simply shuffling the variable of interest Z or shuffling the dependent variable Y. More complicated methods include "residualizing Y"—residualizing Y with respect to the other covariates X, shuffling the residualized Y, and regressing it on Z—and "shuffling residuals"—regressing Y on X, shuffling the residuals from this regression, adding them to the predicted Y from this regression, and regressing the new Y vector on Z and X. As Kennedy (1995) notes, the shuffling residuals method is mathematically equivalent to residualizing both Y and X.

The results from Monte Carlo analyses (using ungrouped data) in Kennedy and Cade (1996) suggest that the simple method of shuffling Z is sufficient in the multivariate context so long as inferences are based on a reference distribution of test statistics and not on a reference distribution of coefficients. The logic behind this recommendation is as follows. Shuffling Z does not hold constant the collinearity between Z and the other covariates X. Say, for example, that Z and X are highly collinear. Then, we would expect the standard errors on the coefficients of these variables to be large. Because shuffling Z destroys the collinearity between Z and X, the coefficients obtained from the randomization method may not vary as much as they would in actual repeated sampling. Thus, the distribution of randomized coefficients would be too tight, and inferences made by comparing the actual coefficient to this distribution would be too confident. Because test statistics are adjusted for variance magnitude, they are unaffected by changing the collinearity in the data. In other words, in a shuffle-Z test, test statistics (and, because they are monotonically related, p-values) are pivotal statistics—their distribution does not depend on nuisance parameters like the correlation between Z and X.

4For a review of non-parametric techniques, including randomization tests, for experimental political science data, see Keele, McConnaughy and White (2008).

Further Monte Carlo analyses on ungrouped data in O’Gorman(2005) confirm that the simple shuffle Z method performs well in terms of size (falsely rejecting the null hypothesis at an acceptable rate) and power (correctly rejecting the null hypothesis at an acceptable rate), even in the presence of non-normal error structures and high collinearity between Z and the other covariates X. Therefore, if one is interested in testing the significance of the partial coefficient of Z on Y, simply shuffling Z seems to be recommended. I test the fitness of this method on grouped data with the Monte Carlo simulations herein.

How should we shuffle grouped data? The final issue to consider in applying a randomization test to multivariate observational data with clustered errors is how to shuffle data in which observations are grouped. As Moore et al.(2003) put it, one should sample from the set of possible permutations that are consistent with the study design. In other words, the group-level variable of interest should be randomized across groups but not within them. For example, one could imagine giving one state’s laws to another state to see how the change affects individual outcomes, or even giving one legislator’s experience to another to see how it affects her behavior. However, applying the characteristics of country A to one citizen of country C and the characteristics of country B to another citizen of country C makes little sense. Applying the experience of legislator A to one vote of legislator C and the experience of legislator B to another vote of legislator C makes even less sense. By definition, group-level characteristics are non-excludable within group.

Randomizing across but not within groups preserves the correlation among observations in the same group that arises because of the clustered nature of the data. Randomizing both within and across groups would be analogous to the mistake of ignoring the possibility of clustering.5

5 This group-level randomization strategy has been used in several recent empirical applications in the social sciences, including Helland and Tabarrok(2004) to test the effect of “shall issue” gun laws on crime; Donohue and Wolfers(2006) to test the deterrent effect of capital punishment; Erikson, Pinto and Rader(2010) to test the effect of postregistration laws on voting; Lax and Rader(2010) to test the effect of precedent on Supreme Court voting behavior; and Erikson, Pinto and Rader(N.d.) to test whether democracies trade more with other democracies.

Thus, to perform a randomization test on data generated by a process like that in Equation (2.2), one would first estimate the model coefficients using OLS and calculate the t-test statistic associated with γˆ2 in the usual way. Then, one would “block” randomize the vector Z in such a way that all members of the same group mg receive the same value of Z, and re-estimate the model and the t-statistic on γˆ2 with the shuffled Z. Repeating the shuffling and re-estimating steps at least 1000 times gives a reference distribution of test statistics that takes into account the clustered nature of the data. Finally, one would compare the t-test statistic from the unshuffled data to the empirically generated distribution to test the significance of γˆ2 at some confidence level.
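The block-randomization procedure just described might be sketched as follows. This is my own NumPy illustration with hypothetical names, using the conventional t statistic as the test statistic; the chapter's own computations used R.

```python
import numpy as np

rng = np.random.default_rng(1)

def t_on_z(y, x, z):
    """Conventional OLS t statistic on z's coefficient in y ~ 1 + x + z."""
    A = np.column_stack([np.ones(len(y)), x, z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return beta[-1] / np.sqrt(cov[-1, -1])

def block_randomization_p(y, x, z, groups, n_shuffles=1000):
    """Block-randomization p-value for the group-level regressor z:
    shuffle z across groups, holding it constant within each group."""
    t_obs = t_on_z(y, x, z)
    labels = np.unique(groups)
    # z is constant within group, so one value per group suffices
    z_groups = np.array([z[groups == lab][0] for lab in labels])
    ref = np.empty(n_shuffles)
    for s in range(n_shuffles):
        perm = rng.permutation(z_groups)
        z_shuf = np.empty_like(z)
        for lab, val in zip(labels, perm):
            z_shuf[groups == lab] = val
        ref[s] = t_on_z(y, x, z_shuf)
    # two-sided comparison of the observed t to the reference distribution
    return float(np.mean(np.abs(ref) >= abs(t_obs)))
```

Randomizing only the group-level vector, rather than all observations, is what preserves the within-group correlation structure under the null.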

Randomization tests, now more feasible due to increased computing power, are used widely in experimental settings (Edgington and Onghena, 2007, Good, 1994, Manly, 1997) and increasingly in social science experiments (Ernst, 2004, Keele, McConnaughy and White, 2008) and on observational data (Kennedy and Cade, 1996). Monte Carlo experiments in Manly(1997) show that, under a variety of conditions, randomization tests on multivariate data perform well in terms of size and power compared to t- and f-tests—they falsely reject the null hypothesis less often and correctly reject the null hypothesis almost as often (and sometimes more often). However, there have been no Monte Carlo evaluations of the performance of randomization tests on grouped data compared with common parametric approaches. In the following sections, I will compare randomization tests to t-tests with CRSEs, investigating their size and power.

2.5 Evaluation Criteria

In order to compare the performance of t-tests using cluster-robust standard errors and randomization tests, the following analysis uses Monte Carlo experiments to measure the size and power of the two tests.

For any hypothesis test, one of two exhaustive hypotheses is actually true—either the null hypothesis H0 or the alternative hypothesis HA. And, one of two conclusions will be drawn—the researcher will either reject the null hypothesis or not. Thus, there are four possibilities, summarized in Table 2.1, two in which the researcher makes the correct decision and two in which she makes an error. Rejecting the null hypothesis when it is in fact true, or detecting an effect when there is none, is a Type I error. The probability of committing a Type I error is α, also known as the size or significance level of the test. The quantity 1 − α is the test’s confidence level and is used to construct a confidence interval around the parameter of interest. Failing to reject the null hypothesis when it is in fact false, or failing to detect an effect when one does exist, is a Type II error. The probability of committing a Type II error is β. The quantity 1 − β is the power of the test. Ideally, a hypothesis test will have the lowest Type I (α) and Type II (β) error rates possible.

              Bad Decision                          Good Decision

H0 is true    Reject H0 — Type I error              Fail to reject H0
              p = α                                 p = 1 − α
              (nominal size or significance level)  (nominal confidence level)

HA is true    Fail to reject H0 — Type II error     Reject H0
              p = β                                 p = 1 − β (power of the test)

Table 2.1: Hypothesis Testing

Size. The researcher chooses α, the nominal size of the test, or the proportion of false positives that are acceptable. For example, by using a test with an α level of .05, the researcher expects to falsely reject the null hypothesis 1 in 20 times. Type I errors may be eliminated altogether by choosing an α that approaches zero. However, this would come at an inferential cost. For a given test and dataset, the smaller the test size, the larger the Type II error rate. In other words, the less likely a test is to find an effect when one does not exist, the less likely it also is to find an effect when one does exist (Greene, 2000).

While the nominal size of a test is under the researcher’s control, the actual size can only be determined empirically using Monte Carlo simulations. When the actual size equals the nominal size, then the test is “exact.” When the actual size is smaller than the nominal size, then the test is too conservative and could be improved upon by increasing the rejection region. When the actual size is greater than the nominal size then, unbeknownst to the researcher, the test commits Type I errors more often than desired (Good, 1994).

For a parametric test, like a t-test with CRSEs, the nominal size equals the actual size only when the data meet the assumptions required to use the theoretical reference distribution (Ernst, 2004).

Thus, parametric tests are only exact under specific distributional and sample size conditions.

For a non-parametric test, like a randomization test, the size should always be correct since the data themselves generate the reference distribution (Moir, 1998). When a randomization test is conducted using only a sample of all possible permutations, its actual size is exact on average but may vary around the nominal size due to sampling variability. If the number of shuffles were increased, the variability of the actual size of the randomization test would, of course, decrease.

Thus, in Monte Carlo simulations, t-tests with CRSEs should commit more Type I errors than indicated by the nominal size but should decrease in size (approach nominal size) as the number of groups increases. The size of the randomization test, on the other hand, should not be related to the number of groups. Randomization tests should be more appropriately sized than t-tests with CRSEs, especially when the number of groups is fewer than 50, because they do not rely on asymptotics in G (or any parameter).

Power. The power of the test cannot be directly chosen by the researcher, but it can, in some cases, be manipulated because some of the factors that affect the power are under the researcher’s control. The power of the test increases with the effect size and the sample size N (or for grouped data, the number of groups G), and decreases with the amount of noise in the data (Park, 2008).

Also, as explained above, the power increases for larger α levels but so does the chance of Type I error.

Ideally, among tests of a given size, one should choose the test that is the most powerful.

However, it may be the case that no one test is uniformly the most powerful (Good, 1994). For non-grouped data, Manly(1997) found that randomization tests were often less powerful than t-tests and f-tests but not uniformly so. In general, non-parametric tests are less powerful than parametric tests of the same size; however, randomization tests are more powerful than rank-based non-parametric tests because they do not throw away information about magnitude (Moir, 1998).

Thus, in Monte Carlo simulations, t-tests with CRSEs should be at least as powerful as randomization tests for a given nominal α level. All else equal, as the number of groups increases, the effect size γ2 increases, or the nominal α level increases, the power of both tests should increase.

2.6 Monte Carlo Experiment

In order to compare the size and power of t-tests with CRSEs and randomization tests, I generated clustered data using the following process:

yi,g = γ0 + γ1xi,g + γ2zg + ei,g + ug (2.6)

where i = 1, ..., N indexes observations and g = 1, ..., G indexes groups. Each group has the same number of members, mg = 500. The variables xi,g and zg and the error terms ei,g and ug are each ∼ i.i.d. N(0, 1).6 The intercept γ0 and the individual-level coefficient γ1 are both equal to 1. All possible correlations among xi,g, zg, ei,g, and ug are zero.7

I allow two parameters to vary across the Monte Carlo experiments. The coefficient γ2 on the variable of interest zg can be 0, 0.25, 0.5, 1, or 2 in order to assess the power of the tests for different effect sizes. The number of groups G can be 10, 20, or 50 in order to evaluate the performance of the tests for various numbers of groups.8

6 According to Equation (2.4), ordinary standard errors on data like this would be too small by a factor of √(1 + (500 − 1) ∗ (1/2)), or about 16.
7 See the Appendix for results with xi,g and zg correlated.
8 Recall that, for CRSEs, Rogers(1993) recommends at least 20 groups when the number of observations is equal across groups, while Bertrand, Duflo and Mullainathan(2004) and Kezdi(2004) recommend at least 50 groups.
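The “about 16” factor in footnote 6 can be checked directly. It is the Moulton design effect √(1 + (m − 1)ρ), where here m = 500 members per group and the intraclass correlation is ρ = Var(u)/(Var(u) + Var(e)) = 1/2, since both error components have unit variance:

```python
import math

m = 500        # members per group
rho = 1 / 2    # intraclass correlation: Var(u) / (Var(u) + Var(e))
moulton = math.sqrt(1 + (m - 1) * rho)
print(round(moulton, 1))  # 15.8, i.e. "about 16"
```

So ordinary standard errors on this DGP understate the true sampling variability of group-level coefficients by roughly a factor of 16.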

For each of the 15 combinations of the varying parameters, I conducted 1000 Monte Carlo experiments. In each run, the following five steps occur:

1. A dataset is generated according to the process described above.

2. Y is regressed on X and Z.

3. An ordinary t-test is performed on the coefficient γ2 on Z, and its p-value is stored.

4. A t-test with CRSEs is performed on γ2, and its p-value is stored.9

5. A randomization test with 1000 shuffles of Z is performed on γ2, and its p-value is stored.
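Steps 1–4 above might look like the sketch below. This is a hypothetical NumPy reconstruction under my own names (the chapter’s computations used an R script in Arai(2009)), and the cluster-robust sandwich here omits the finite-sample corrections some implementations apply:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(G, m=500, gamma2=0.0):
    """Step 1: clustered data following Equation (2.6)."""
    g = np.repeat(np.arange(G), m)
    x = rng.normal(size=G * m)
    z = rng.normal(size=G)[g]   # group-level regressor, constant within group
    u = rng.normal(size=G)[g]   # group-level error
    e = rng.normal(size=G * m)
    y = 1.0 + 1.0 * x + gamma2 * z + e + u
    return y, x, z, g

def t_stats(y, x, z, g):
    """Steps 2-4: OLS of y on (1, x, z); ordinary and cluster-robust t on z."""
    A = np.column_stack([np.ones(len(y)), x, z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    bread = np.linalg.inv(A.T @ A)
    # ordinary standard error on the z coefficient
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se_ols = np.sqrt(sigma2 * bread[2, 2])
    # cluster-robust sandwich: sum outer products of per-group score vectors
    meat = np.zeros((3, 3))
    for grp in np.unique(g):
        s = A[g == grp].T @ resid[g == grp]
        meat += np.outer(s, s)
    se_cr = np.sqrt((bread @ meat @ bread)[2, 2])
    return beta[2] / se_ols, beta[2] / se_cr
```

Because the ordinary standard error ignores the shared group error ug, the ordinary t statistic on z is far larger in magnitude than its cluster-robust counterpart on data like this.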

After each of the three Monte Carlo simulations in which the group-level effect γ2 was equal to zero, I determined the actual size α of the ordinary t-test (as a baseline), the t-test with CRSEs, and the randomization test. I did so by calculating what proportion of the p-values associated with each test were smaller than three common nominal α levels: 0.01, 0.05, and 0.1.10 This measures how often each test commits a Type I error by falsely rejecting the null hypothesis. If the tests are correctly sized, then their Type I error rates should be equal to their nominal sizes.

After each of the 12 Monte Carlo simulations in which the group-level effect γ2 was greater than zero, I determined the power of the t-test with CRSEs and the randomization test. As with size, I did so by calculating what proportion of the p-values associated with each of the tests were smaller than the nominal α levels 0.01, 0.05, and 0.1. This measures how often each test correctly rejects the null hypothesis that γ2 is zero.
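The rejection-rate bookkeeping in the two paragraphs above amounts to a one-liner; `rejection_rate` below is my own name for it, not a function from the chapter:

```python
import numpy as np

def rejection_rate(p_values, alpha):
    """Proportion of Monte Carlo runs with p < alpha: the estimated size
    when the null is true (gamma2 = 0), the estimated power when it is false."""
    return float(np.mean(np.asarray(p_values) < alpha))

# Uniform p-values, as a correctly sized test produces under the null:
rng = np.random.default_rng(3)
p_null = rng.uniform(size=1000)
print(rejection_rate(p_null, 0.05))  # close to 0.05
```

The same function therefore serves for both Figure 2.1 (size, under γ2 = 0) and Figure 2.2 (power, under γ2 > 0).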

2.7 Results and Discussion

Figure 2.1 shows the size results from the Monte Carlo experiments for nominal α levels 0.01, 0.05, and 0.1 and number of groups G equal to 10, 20, and 50. The “C” points plot the actual size of the t-tests with CRSEs along the x-axes, and the “R” points plot the randomization tests.11 Because I conducted only 1000 Monte Carlo trials, the actual size of the tests will be measured with some error. To account for this, I calculated 95% binomial confidence intervals around the nominal α levels for sample size 1000 using the Wilson method recommended in Agresti and Coull(1998). The shaded gray areas show these confidence intervals, and the dotted line marks the nominal size of the test. If the estimated actual size of a test is outside of these bounds, then it is statistically different from its nominal size at the 95% confidence level.

9 Clustered standard errors and corresponding t-tests were calculated using an R script in Arai(2009).
10 Using the distribution of p-values to assess the rejection rate is equivalent to using the distribution of test statistics.
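The Wilson score interval used for these gray bands is a standard formula; the sketch below uses my own function name (Agresti and Coull(1998) recommend it over the usual Wald interval for proportions like these):

```python
import math

def wilson_interval(p, n, z=1.96):
    """95% Wilson score interval for a binomial proportion p out of n trials."""
    center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Band around a nominal size of 0.05 with 1000 Monte Carlo trials:
lo, hi = wilson_interval(0.05, 1000)
```

This yields roughly (0.038, 0.065), so an estimated actual size outside that band differs significantly from the nominal 0.05.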

As expected, the actual size of the t-tests with CRSEs decreases as the number of groups increases. Because CRSEs rely on large G in order for their asymptotic properties to hold, they are too small when G is small. Thus, t-tests that incorporate CRSEs are overconfident, making Type I errors at rates that are significantly higher than their nominal α levels. With the exception of α=.01 and G=50, the actual size of the t-test with CRSEs falls outside of the 95% confidence interval for all nominal sizes and group numbers tested here.

In contrast, the size of the randomization test is unrelated to the number of groups. Most importantly, the actual size of the randomization test is not significantly different from its nominal size in any of the combinations of α levels and group numbers tested here. Even for G=10, a number of groups that is smaller than what is ever recommended for use with CRSEs, randomization tests are accurately sized. That is, their Type I error rates are as advertised.

Figure 2.2 shows the power results from the Monte Carlo experiments for nominal α levels 0.01, 0.05, and 0.1; number of groups G equal to 10, 20, and 50; and effect size γ2 equal to 0, 0.25, 0.5, 1, and 2. The “C” points plot the power (or rejection rate) of the t-tests with CRSEs against the various γ2 sizes, and the “R” points plot the randomization tests. When γ2 equals zero, the points represent the sizes of the tests, the same rejection rates shown in Figure 2.1.

Under all conditions, both tests approach or achieve maximal power equal to 1. As expected, the power of both tests increases as the number of groups increases, the effect size γ2 increases, and the nominal α level increases. Most importantly, the power of the t-test with CRSEs is greater than that of the randomization test, at least for G=10 and 20, and reaches maximal power at a smaller effect size γ2. When G=50, the power of the two tests is approximately the same.

11 The actual sizes of the ordinary t-tests are not plotted. But, as expected, they are grossly over-sized, with Type I error rates at 86% or higher.

Thus, when G is small, the randomization tests are clearly more appropriately-sized than the t-tests with CRSEs, but they are also less powerful.

So, which test is better: the t-test with CRSEs or the randomization test? Typically, one should choose the most powerful test for a given α level (Greene, 2000). However, in this case, the t-test with CRSEs, while usually more powerful than the randomization test for a given nominal α level, does not have the desired actual α level. Thus, one should not necessarily prefer the t-test with CRSEs based on its higher power.

Another way to consider the comparison is that, for a given dataset, there is a tradeoff between size, which favors the randomization test, and power, which favors the t-test with CRSEs. If one is willing to make Type I errors at a higher rate than the nominal α level, then one may prefer the more powerful t-test with CRSEs.

On the other hand, if one chooses to use the t-test with CRSEs, especially for G fewer than 50, and ends up rejecting the null hypothesis that γ2 is zero, then the p-value associated with that test, or the confidence with which one makes the rejection, will be over-stated. This is, in a sense, false advertising. Thus, there is a strong argument to be made that the incorrect actual size of t-tests with CRSEs for datasets with fewer than 50 groups makes them inappropriate for inference on group-level effects. As Good(1994, 17) puts it, “The importance of an exact test cannot be overestimated, particularly a test that is exact regardless of the underlying distribution.” This certainly applies to the randomization test.

2.8 Applications

The Monte Carlo analysis presented here shows that randomization tests are more appropriate than t-tests with CRSEs for testing the effects of group-level variables on individual-level outcomes in that, unlike t-tests with CRSEs, they are correctly sized with as few as 10 groups and are only slightly less powerful. However, as with any Monte Carlo analysis, these results can tell us only so much about real world data since the results generalize only to the extent that the DGP I created captures the salient features of real grouped data. In other words, it is worth investigating whether actual grouped datasets in political science exhibit enough clustering to interfere with group-level inference using t-tests or t-tests with CRSEs. Most importantly, can t-tests with CRSEs and randomization tests lead one to different substantive conclusions when testing the implications of contextual theories? Here, I briefly describe three applications in which the choice of test does make a difference for substantive conclusions.

2.8.1 State Postregistration Laws and Voting

In Erikson, Pinto and Rader(2010), we use randomization tests to assess the impact of laws designed to increase voting rates among already-registered voters (postregistration laws). These laws include keeping the polls open early or late in the day, mailing sample ballots and polling place information, and time off work on election day for public or private employees. Following Wolfinger, Highton and Mullin(2005), we estimate a model that predicts an individual’s propensity to vote based on individual-level demographic characteristics, state-level characteristics, and state postregistration laws, in which the effects of postregistration laws are allowed to vary by the type of individual. For example, mailed polling place information is interacted with individual education level in order to test the hypothesis that this information is more useful for individuals with lower education levels.

We show that typical hypothesis tests vastly overstate the significance of both additive and interaction effects of postregistration laws and that tests with CRSEs, the approach recommended in Primo, Jacobsmeier and Milyo(2007), overstate the significance of interaction effects. We perform a block randomization test, randomly shuffling the menu of laws in one state to another state.

Using this procedure, state-level laws generally fail to be significant both as additive effects and as interactions with individual characteristics.

An example of the results from this analysis appears in Figure 2.3. The first column displays the distribution of z values obtained from each shuffle of the randomization test of the significance of mailed polling place information and its interaction with individual education level. The shaded gray areas mark the 5% and 10% most extreme values in these distributions. The dotted lines represent the z value estimates from the model using the actual observed data. Compared to the randomization test reference distributions, these z values are not rare relative to what we observe from random shuffles of the data. In other words, we cannot reject the null hypothesis that mailed polling place information does not increase the likelihood that a registered individual will turn out to vote, regardless of the individual’s education level.

The second column in Figure 2.3 shows the distribution of p-values calculated with the conventional test in each random shuffle of the data. If the conventional test is working appropriately, we should see it reject the null hypothesis on randomly shuffled (i.e., systematically meaningless) data 5% of the time at the 95% confidence level. Instead, the conventional test rejects the null between 30% and 45% of the time for this particular variable, showing that ignoring the problem of clustering in these data risks falsely attributing success to postregistration laws designed to increase turnout.

2.8.2 Precedent and Voting on the Supreme Court

In Lax and Rader(2010), we use randomization tests to assess the impact of landmark decisions on Supreme Court justices’ voting in a judicial regimes framework. As explained in Richards and Kritzer(2002) and elsewhere, a judicial regime establishes which case factors in a given area of the law should affect the outcome of a case and how much each case factor should be weighed in the decision. For example, in freedom of expression cases, these case factors could include whether or not a particular restrictive law was content neutral or content based, that is, whether it restricted speech generally or only speech about a specific topic. A regime change occurs when the Court announces a major landmark decision, such as Grayned vs. Rockford, that alters the weights accorded to key case factors. Richards and Kritzer(2002) and others test this theory by looking for a change in justices’ voting behavior before and after a precedent.

Specifically, they estimate a model that predicts justices’ votes from relevant case factors and a regime indicator variable, and then they estimate a model that also includes interaction terms between the case factors and the regime indicator, allowing the case factor weights to vary across the regimes. Finally, they use a Chow test to compare the log likelihoods of the two models to test the significance of changing case factor weights. With this approach, they find that the weights do in fact change after the landmark decision occurs.

However, this approach ignores the grouped structure of the data and so does not account for possible clustering across justices, cases, or years. Because the key regime variable is measured at a higher level than the outcome variable, the justice vote, this clustering should cause the Chow test to be overconfident. In Lax and Rader(2010), we indeed find this to be the case. We reanalyze three applications of judicial regime change using a randomization test that randomly shuffles the regime indicator variable. We find that not one of the significant changes in weights for jurisprudentially relevant case factors from the original analyses holds up with the randomization test.12

We also find that the typical Chow test commits Type I errors up to 99% of the time.

An example of the results from this analysis appears in Figure 2.4. In this example, we used a dataset of voting on freedom of expression cases before and after Grayned vs. Rockford, as in Richards and Kritzer(2002). The first column shows the distribution of chi-squared values obtained from each shuffle of the regime indicator variable for the tests of change in case weights for all variables, for only jurisprudentially relevant case factors, and for content neutral alone.13 The gray area covers the 5% most extreme values, and the dashed lines represent the chi-squared values from the actual observed data. Using our randomization test, we cannot reject the null hypothesis of no change in case factor weights before and after Grayned.

The second column in Figure 2.4 shows the distribution of p-values calculated with the conventional Chow test in each random shuffle of the regime variable. Again, if the test is working appropriately, we should see it reject the null hypothesis on randomly shuffled data 5% of the time at the 95% confidence level. Instead, the conventional test rejects the null between 24% and 99% of the time. Thus, for at least some of these tests, ignoring clustering can lead a researcher to find evidence of regime change no matter which years are labeled before or after the key precedent.

12 This applies to the tests in which votes are restricted to justices who were on the court both before and after the precedent.
13 Note that these distributions look like chi-squared distributions, as one would expect from creating a reference distribution using a chi-squared test statistic.

2.8.3 Democratic Trade

In the previous two examples, randomization tests led to different substantive conclusions about the effectiveness of state postregistration laws and the apparent constraint of precedent on Supreme Court voting. In Erikson, Pinto and Rader(N.d.), we use randomization tests to assess the claim that democracies trade more with other democracies. Although our tests support the democratic trade hypothesis, as much of the empirical literature does, we nonetheless find that the level of clustering in the typical democratic trade dataset is quite high and that ignoring it produces highly over-confident results.

Following the common approach, we use a dyad-year level dataset and estimate a model relating minimum democracy score within the dyad to bilateral trade within the dyad, controlling for country and dyad-level characteristics, and including in some specifications dyad and/or year fixed effects. Departing from the common approach, we use a randomization test, instead of a t-test, to test the significance of minimum democracy score on trade in the dyad. We randomly shuffle the time series of country-level democracy scores and, after each shuffle, recalculate the minimum score within each dyad. We find that minimum democracy score is positively associated with bilateral trade across each specification we use. However, we also find that the typical t-test produces p-values that are over 46 trillion times too small. This of course makes the typical t-test highly overconfident.
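The shuffle-and-recompute step can be sketched as follows. For simplicity, this hypothetical snippet treats each country's democracy score as a single number rather than a full time series, and all names are my own:

```python
import numpy as np

rng = np.random.default_rng(4)

def shuffle_dyad_minimums(democracy, dyads):
    """Permute country-level democracy scores across countries, then
    recompute the minimum score within each dyad.
    democracy: dict country -> score; dyads: iterable of country pairs."""
    countries = list(democracy)
    scores = rng.permutation([democracy[c] for c in countries])
    shuffled = dict(zip(countries, scores))
    return [min(shuffled[a], shuffled[b]) for a, b in dyads]

# Hypothetical example with four countries and three dyads:
dem = {"A": 9, "B": 2, "C": 7, "D": 4}
mins = shuffle_dyad_minimums(dem, [("A", "B"), ("C", "D"), ("A", "D")])
```

Recomputing the dyadic minimum after each country-level shuffle, rather than shuffling the dyad-level variable directly, respects the fact that the same country's score enters many dyads at once.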

An example of the results from this analysis appears in Figure 2.5. The first column shows the distribution of t-statistics obtained from each shuffle of the time series of country democracy scores. The gray areas shade the 5% and 10% most extreme values, and the dotted line marks the t-value from the coefficient on minimum dyadic democracy score from the observed data. For the two specifications shown, a model with dyad and year fixed effects and a model with year fixed effects and a lagged dependent variable, the democracy score is statistically significant at conventional levels. The second column shows the distribution of p-values from the typical t-test on each shuffle of randomized democracy scores. Once again, the typical test rejects the null hypothesis on meaningless, scrambled data at much too high a rate, between 66% and 86% of the time. Thus, even though our substantive conclusions about democracy and trade hold up, the t-test is not the appropriate way in which to test the democratic trade hypothesis using dyadic data.

2.9 Conclusion

Political scientists often ask questions that require making inferences about the effects of variables measured at the group level on outcomes measured at the individual level. Inference with grouped data presents special challenges because the amount of independent information in the data is often more related to the number of groups than to the number of individual observations.

A common parametric solution to this problem is to calculate cluster-robust standard errors and to use those errors in hypothesis tests. However, the Monte Carlo evidence in this article adds to the growing literature showing that the cluster-robust approach is overconfident for datasets with fewer than 50 groups. Unfortunately, when the number of groups available is fewer than 50, the applied researcher is rarely in a position to do anything about it.

The Monte Carlo experiments here reveal that randomization tests are superior to t-tests with CRSEs, especially when the number of groups is small. Because randomization tests do not rely on asymptotic properties, their size is exact regardless of the number of groups. Also, because randomization tests do not rely on distributional assumptions about errors, they should maintain their size properties under a wide variety of conditions. While randomization tests, like many nonparametric tests, are less powerful (here, only slightly less powerful) for a given dataset than are t-tests with CRSEs, they allow inferences on group-level effects to be conducted with the correct level of confidence. In short, randomization tests are exact, whereas t-tests with CRSEs are overconfident and yield only minimal power gains. Because ever-increasing computing power has made the implementation of randomization tests more feasible and because real-world political science datasets have levels of clustering that cannot be ignored, randomization tests should be used more widely by political scientists.

2.10 Appendix

In the Monte Carlo experiments here, corr(x_{i,g}, z_g) = 0, which means that the observed individual-level covariates are uncorrelated with the observed group-level covariates. When this is not the case, when the variable of interest is very highly correlated with some other regressor, there is some concern in the literature on randomization tests and observational data that the simple shuffle Z method I explore in the main text may result in larger-than-expected Type I errors. As discussed in Section 2.4, Kennedy (1995) warns that because shuffling Z destroys the collinearity between Z and X, the coefficients obtained from the randomization method may not vary as much as they would in actual repeated sampling. Thus, the distribution of randomized coefficients would be too tight, and inferences made by comparing the actual coefficient to this distribution would be too confident. However, so long as inferences are based on a pivotal statistic (one whose distribution does not depend on an unmeasured nuisance parameter), like a test statistic, then shuffling Z should work well, even when the correlation between Z and X is as high as .9 (Kennedy and Cade, 1996; O'Gorman, 2005). I have confirmed the results in these citations with my own Monte Carlo analyses on correlated, ungrouped data. Randomization tests using the shuffle Z method maintain correct size regardless of the level of correlation in the data, for N as small as 10.
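The block shuffle-Z procedure evaluated in these experiments can be sketched as follows. This is a minimal illustration, not the code used for the Monte Carlo analyses; the function names, the use of a classical OLS t-statistic as the pivotal quantity, and the permutation count are all illustrative choices.

```python
import numpy as np

def ols_tstat(y, X, col):
    """Classical OLS t-statistic for the coefficient in column `col`."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    cov = (resid @ resid / (n - k)) * np.linalg.inv(X.T @ X)
    return beta[col] / np.sqrt(cov[col, col])

def shuffle_z_test(y, x, z, group, n_perm=1000, seed=0):
    """Block shuffle-Z randomization test for a group-level regressor.

    z must be constant within groups; the unique group-level values are
    permuted and re-expanded so the grouped structure is respected.
    Returns a two-sided permutation p-value for the coefficient on z.
    """
    rng = np.random.default_rng(seed)
    groups = np.unique(group)
    z_g = np.array([z[group == g][0] for g in groups])   # one value per group
    X = np.column_stack([np.ones_like(y), x, z])
    t_obs = ols_tstat(y, X, col=2)
    idx = np.searchsorted(groups, group)                 # map rows to groups
    t_perm = np.empty(n_perm)
    for b in range(n_perm):
        z_star = rng.permutation(z_g)[idx]               # block-shuffled z
        Xb = np.column_stack([np.ones_like(y), x, z_star])
        t_perm[b] = ols_tstat(y, Xb, col=2)
    return (np.sum(np.abs(t_perm) >= np.abs(t_obs)) + 1) / (n_perm + 1)
```

Comparing observed and shuffled t-statistics, rather than raw coefficients, is what makes the test pivotal in the sense discussed above.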

However, to date, there have been no Monte Carlo evaluations of randomization tests on data that are both grouped and correlated. Figure 2.6 shows the results from such an analysis of size, in which Z and X are uncorrelated, correlated at 0.3, and correlated at 0.8, for 10, 20, and 50 groups. The "C" points plot the actual size of the t-tests with CRSEs along the x-axes, and the "R" points plot the randomization tests' sizes. The shaded gray areas show 95% binomial confidence intervals, computed using the Wilson method recommended in Agresti and Coull (1998), around the nominal α level of 0.05 to account for the fact that the actual level is measured with error.
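The Wilson interval used for these bands has a simple closed form. As a quick illustration (a sketch, not replication code for the figures; the replicate count of 1,000 is an assumed value):

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

# Band around a nominal size of 0.05 with 1,000 Monte Carlo replicates:
lo, hi = wilson_interval(0.05, 1000)   # roughly (0.038, 0.065)
```

A rejection rate falling outside this band is unlikely to be Monte Carlo noise around the nominal level.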

Just as in the main results, the actual size of the t-tests with CRSEs decreases as the number of groups increases but generally falls outside of the 95% confidence bounds of the nominal size.

However, unlike in the main results, the randomization test size is too small when the correlation between Z and X is 0.8.14 This is surprising for two reasons. First, correlation between Z and X does not act as a nuisance parameter when the data are not grouped. Second, the discrepancy between nominal and actual size is in the opposite direction of what Kennedy (1995) hypothesized.

Instead of being overconfident in the face of correlation in the data, the block randomization test is underconfident. This means that the reference distribution created by block randomizing the Z vector is too fat in the tails, the proximate cause of which is that the t-values calculated using the shuffled datasets are too large and the standard errors too small.15 Thus, while false positives are not an issue, the test may be underpowered when the group-level variable Z is in fact systematically correlated with the outcome variable Y.

Is this result cause for concern? Perhaps not. First, in the applications I discuss here, the correlation between any group-level and any individual-level variable does not reach a high enough level for underconfidence in the test to be a problem.16 Second, if the test rejects the null, then one need not worry about underconfidence. Third, one may find underconfidence to be a desirable property in a test. Describing this possibility, Rosenbaum (2009) writes that a properly implemented randomization test guarantees that the null will not be incorrectly rejected more often than the nominal size but that it may perform "better than the promised performance," that is, it may make Type I errors less often than expected (p. 365). On the other hand, lower Type I error rates imply higher Type II error rates. And it is somewhat of a mystery why randomization tests maintain their size with high correlation in individual-level data and with grouped data with no correlation, but not with both.

Shuffle Residuals Method. Despite finding that the shuffle Z method works well to create reference distributions of test statistics, Kennedy and Cade (1996) nonetheless prefer the shuffle residuals method (described in Section 2.4), since it explicitly rids Z and Y of the influence of X. The method proceeds as follows: first, regress Y on X; then, randomly shuffle the residuals from this regression; next, add the shuffled residuals to the predicted Y from this regression; and finally, regress the new Y vector on Z and X. The test statistic on Z from the second regression forms the basis for the shuffled-residuals randomization test.

14 My exploratory analysis suggests that randomization tests become significantly undersized around a correlation of 0.7.

15 The t-values could also be too large because the coefficients on the shuffled Z's are too large, but I have already eliminated this possibility.

16 The highest positive correlation is 0.23 and the largest negative correlation is -0.35, both in the judicial regimes application.

In the grouped-data context, this method needs to be modified. The residuals vary at the individual level, but they must be block randomized in order to respect the grouped structure of the data. Thus, this method is limited to data in which each group has the same number of members.17 The "E" points in Figure 2.6 plot the actual size of such a shuffled-residuals method for the parameters described above. When the correlation between Z and X is 0.8, this method is correctly sized except when the number of groups is 10. Thus, except for the limitations this method has for data with varying group sizes, it seems to be a promising approach for datasets in which the correlation between a group-level and an individual-level regressor is high. I plan to do further analysis to determine if the performance of the method is indeed dependent on the number of groups or if this is just a fluke. (When corr(X,Z) = 0.3, the size does not appear to be a function of G.)
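The modified shuffle-residuals procedure for grouped data with equal group sizes might look like the following. This is an illustrative sketch under the assumption that rows are sorted by group; the function names and the classical t-statistic are mine, not the dissertation's actual code.

```python
import numpy as np

def tstat(y, X, col):
    """Classical OLS t-statistic for the coefficient in column `col`."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    cov = (resid @ resid / (n - k)) * np.linalg.inv(X.T @ X)
    return beta[col] / np.sqrt(cov[col, col])

def shuffle_residuals_test(y, x, z, n_groups, n_perm=1000, seed=0):
    """Block shuffle-residuals randomization test (equal group sizes).

    Step 1: regress Y on X; step 2: block-shuffle the residuals across
    groups; step 3: add them back to the fitted values; step 4: regress
    the rebuilt Y on Z and X and collect the test statistic on Z.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    m = n // n_groups                                # members per group
    X0 = np.column_stack([np.ones(n), x])            # restricted model
    b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    fitted = X0 @ b0
    resid = (y - fitted).reshape(n_groups, m)        # one row per group
    Xfull = np.column_stack([np.ones(n), x, z])
    t_obs = tstat(y, Xfull, col=2)
    t_perm = np.empty(n_perm)
    for b in range(n_perm):
        e_star = resid[rng.permutation(n_groups)].ravel()  # permute blocks
        t_perm[b] = tstat(fitted + e_star, Xfull, col=2)
    return (np.sum(np.abs(t_perm) >= np.abs(t_obs)) + 1) / (n_perm + 1)
```

The reshape-and-permute step is what fails when group sizes differ, which is the limitation noted above.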

Inference with Fit Statistics. As an alternative to randomization inference with distributions of test statistics, Walker (2010) suggests using a distribution of fit statistics with the shuffle Z method. That is, block shuffle Z, but instead of gathering test statistics on the Z variables in each shuffle, gather the fit statistics instead, and compare the fit statistic from the regression on the observed, unshuffled data to this distribution. The "F" points in Figure 2.6 plot the actual size of such a test for the parameters described above. When the correlation between Z and X is 0.8, this method is overconfident regardless of the number of groups. While it performs about as well as or better than t-tests with CRSEs, it is nonetheless more overconfident than the shuffle residuals method.

17 This is analogous to the issue in Erikson, Pinto and Rader (2010), in which what is shuffled is the time series of country democracy scores, requiring that each country be represented in the dataset for the same set of years.

Other Approaches. I have preliminary results from two additional alternative methods. The first is a block shuffle Z method that uses a t-test with CRSEs as the basis for inference. The second is a method that aggregates the individual-level regressor X and the outcome variable Y to the group level and performs a shuffle Z test on the aggregated data. Preliminary results (for groups = 10 and corr(X,Z) = 0.8) suggest that the shuffle-residuals method outperforms these approaches as well.

[Figure 2.1 here: three panels (nominal α = .01, .05, .10) plotting actual α for 10, 20, and 50 groups; "R" marks randomization tests and "C" marks t-tests with CRSEs.]

Figure 2.1: Size results from Monte Carlo experiments. Results shown for nominal α levels .01, .05, and .1. Shaded gray area represents 95% binomial confidence intervals around the nominal α level, denoted with a dotted line. Confidence intervals calculated using the Wilson method recommended by Agresti and Coull (1998).

[Figure 2.2 here: nine panels (nominal α = .01, .05, .10 by groups = 10, 20, 50) plotting rejection rates (1 − β) against effect size (γ2); "R" marks randomization tests and "C" marks t-tests with CRSEs.]

Figure 2.2: Power results from Monte Carlo experiments. Results shown for nominal α levels .01, .05, and .1, and number of groups equal to 10, 20, and 50. Dotted lines at nominal α level and at one (maximum power).

[Figure 2.3 here: densities of coefficients, z values, and p values across shuffles for mailed polling place information (chance of Type I error = 0.45) and for its interaction with education (chance of Type I error = 0.31).]

Figure 2.3: Randomization Test Results. Mailed polling place information and its interaction with individual education level on an individual's propensity to vote.

Figure 2.4: Randomization Test Results. Regime change in freedom of expression cases before and after Grayned vs. Rockford.

Figure 2.5: Randomization Test Results. Minimum dyadic democracy score and bilateral trade.

[Figure 2.6 here: three panels (corr(X,Z) = 0, .3, .8) plotting actual α for 10, 20, and 50 groups; "R" marks randomization tests, "E" the shuffle-residuals method, "F" the fit-statistic method, and "C" t-tests with CRSEs.]

Figure 2.6: Size results from Monte Carlo experiments. Results shown for data in which Z and X are uncorrelated, correlated at 0.3, and correlated at 0.8. Shaded gray area represents 95% binomial confidence intervals around the nominal α level, denoted with a dotted line. Confidence intervals calculated using the Wilson method recommended by Agresti and Coull (1998).

Chapter 3

Party Effects on the Distribution of Federal Outlays: A Regression Discontinuity Approach

3.1 Introduction

Can members of the majority party in the U.S. House of Representatives direct a disproportionate share of federal outlays to their districts? Party-cartel theories of legislative organization posit that majority party status accords party members influence over legislative outcomes that they would not otherwise wield. The implication for distributive spending is that a majority party member elected from a given district could extract more distributive benefits for the district than could a minority party member, even if elected by the same district for the same Congress. By contrast, universalistic accounts of distributive spending in Congress posit that so-called "pork" projects are doled out equally across districts, such that majority party status should not accord a member any particular advantage in bringing home the bacon. Findings from the current empirical literature are mixed in their support for any majority party effect on spending. In the following analysis, we address this question using an empirical approach that is novel to the distributive spending literature. Specifically, we employ a quasi-experimental regression discontinuity (RD) design to estimate the causal effect of majority party status on the pattern of federal outlays to marginal congressional districts.

We also use an RD design to test the somewhat atheoretical but still provocative claim, often made by pundits and by politicians themselves, that members of one party in particular, either Democrats or Republicans depending on who is speaking, are the bigger spenders.

One would expect that if universalistic accounts of bargaining in Congress are correct, then a representative's own party identification, whether as a majority party member or as an affiliate of a particular party, would have no correlation with her spending patterns, all else equal. For example, a majority party member or a Democrat elected from a district that desired (or required) a large amount of federal funds to spend on education would bring home approximately the same amount of money for education as a minority party member or a Republican would if elected by the same district for the same Congress. However, if party-cartel theories are correct, then a representative's majority party membership should have a positive effect on the federal outlays in her district, all else equal. A minority party member elected from a district that desired a large amount of federal funds to spend on education would not be able to secure as much money for education as a majority party member could if elected by the same district for the same Congress.

As a thought experiment, one could imagine randomly assigning members' party identifications to Congressional districts across the country and across time. Then one could test for a majority party effect on spending behavior by simply looking at the difference in mean spending across those districts that ended up with a majority member and those represented by a minority member. One could also test for a party affiliation effect by looking at differences in means across Democratic and Republican districts. In this ideal experiment, the randomization of party identification ensures that assignment to one party or the other is ignorable. That is, bias due to selection or confoundedness is not an issue. As with many ideal experiments in political science, such an experiment is clearly constitutionally and practically infeasible. As an alternative, one could, as many studies have, test for majority party or party affiliation effects with a basic regression design.

However, this relies on correctly specifying, measuring, and controlling for member and district characteristics that affect both the party identification of a district's representative and spending in the district. These would include difficult-to-measure factors like district-level demand for federal outlays, district-level need for federal outlays, and district-level opinion on distributive spending.

However, a particular feature of the U.S. majoritarian voting scheme, that the party of the winner is assigned discontinuously at fifty percent (plus one vote) of the vote share, allows us to employ a much cleaner quasi-experimental design, with internal validity that does not rely on operationalizing and controlling for possibly confounding factors.

To test the effect of majority party status and party affiliation on spending behavior, we use a regression discontinuity design to analyze data on U.S. House of Representatives elections and federal outlays in districts during fiscal years 1983 to 2002. Using RD, we find no evidence of a causal effect of either majority party status or party affiliation on outlays to the district.

3.2 The Theory of Party Effects

Two major classes of theories of legislative organization make different predictions about the existence of party effects on spending in districts. Universalistic theories predict that electorally minded members will distribute benefits among themselves equally. Party-centered theories predict that members of the majority party will be able to secure a disproportionate amount of distributive spending for their districts.

3.2.1 Universalistic Theories

The first theoretical accounts of Congressional bargaining over distributive goods predicted that those goods would be allocated to districts in a universal manner. Mayhew (1974) notably argued that it "makes no sense" for the majority party to deprive the minority party of its share of the goods because the costs of including the minority are much lower than the costs of hard-ball partisan politics. Instead, electorally minded members of Congress self-select onto committees for which they are high demanders and use their committee memberships to extract distributive gains for their districts above and beyond the level preferred by the House member. A consequence of this arrangement is that policy will tend to be particularistic in the benefits conferred but will confer those benefits to all members' districts in a "universal" manner. Later work formalized this argument. Shepsle and Weingast (1981) show that rational legislators may prefer universalism because they are uncertain over which members will be part of a minimum winning coalition on any given issue. Universalism acts as insurance against this uncertainty, such that the political benefits of universal policies, while perhaps economically inefficient, outweigh the political costs.

Likewise, Collie (1988) posits that coalitional instability creates an incentive for legislators to form universalistic coalitions.

These accounts rely on the assumption that members have a strong electoral incentive to seek benefits for their constituents and that constituency preferences vary geographically, such that each member desires to satisfy a unique constituency. Through cooperative logrolling, legislators can capture gains from trade when their constituents demand different types of goods (Weingast and Marshall, 1988). Thus, a member trades influence over policies her constituents care less about in order to gain influence over issues that are most important to her constituents. The locus of power in these accounts is the committees, with their jurisdiction-specific, self-selected high demanders, who facilitate and enforce the logrolls among members. While Collie (1988) and Weingast and Marshall (1988) speculate that strong parties may substitute for strong committees, this does not change their predictions that distribution should be universal.

In general, committee-centered "universalistic" theories of congressional organization imply that congressional districts represented by members of the minority party are just as likely as districts represented by members of the majority party to benefit from federal funds allocated in distributive policies, all else equal. So, too, are Democrats no more likely to direct more federal assistance to their districts than are Republicans. Thus, based on these theories, we would expect to find no causal effect of a member's majority party status or party affiliation on federal outlays to her district.

3.2.2 Party-Centered Theories

Parties play no analytical role in shaping legislative outcomes in the accounts discussed above, and in some models, their power to affect individual members’ behavior is explicitly assumed to be nil (e.g., Weingast and Marshall, 1988). On the other hand, party-centered theories of legislative organization predict that majority party members can direct more distributive benefits to their districts because of their procedural powers.

Contemporary scholarship on parties began with Cooper and Brady (1981) and Rohde (1991), who theorize that party strength is endogenous to partisanship in the electorate. Further, Rohde (1991) argues that, when parties are strong, they have power over member preferences, the nature of items on the legislative agenda, and actual legislative outcomes. Cox and McCubbins (1993) formalized the mechanisms by which party members may wield their power. They argue that electorally minded majority party members have an incentive to benefit not only their own constituents, but also those of their fellow party members, because their electoral success is somewhat tied to the party's reputation as a whole. One way in which majority party leaders accomplish this goal is to organize committees to ensure that majority members will benefit from distributive gains reaped in committees. This implies that any logrolling that occurs within or across committees would disproportionately benefit majority party members. Thus, strong committees and strong parties are not substitutes; rather, the majority party makes use of the inherent strength of committees to pursue its own ends. Similarly, Balla et al. (2002) argue that the majority party gives the minority party some pork but a smaller share than it gets for its own members.

The majority party acts in this way to prevent the minority party from attacking its members for being profligate spenders while still being able to secure a comparative electoral advantage.

If the party-centered theorists are correct about the power of the majority party in the House, then a district represented by a majority party member would benefit more from federal funds allocated in distributive policies than a district represented by a minority party member, all else equal. Thus, based on these theories, we would expect to find a positive and significant causal effect of a member's majority party status on federal outlays to her district.

3.2.3 Democrats versus Republicans

While there is no real theoretical reason to hypothesize that either districts represented by Democrats or districts represented by Republicans would receive more federal outlays, the accusation that one party or the other spends too much is certainly common among political pundits and politicians themselves. If this is indeed the case, it could be because one party is somehow better at directing pork to its constituents than the other. Or, it could be that one party has a stronger taste for pork than the other.1

1 Or, one party represents constituents that need or want more pork. If this is the case, our estimate will not capture it. We are estimating a pure party effect, independent of public need or opinion.

3.2.4 Empirical Tests of Party Effects on Spending

The empirical literature on majority party effects on spending is quite mixed. Some studies find that a representative's party has no effect on how much money a district receives from the federal government. Stein and Bickers (1994) find that funds from pork barrel programs in the 100th Congress were distributed evenly across Democratic and Republican districts and that the few programs that advantaged one party over the other were split evenly between the parties. Bickers and Stein (2000) note that in the Republican-controlled 103rd and 104th Congresses, the content of federal outlays shifted away from entitlement programs to contingent liability programs but that the overall level of outlays to districts changed little.

Other studies find that a representative's majority party status has a clear and positive effect on the amount of federal funds a district receives. Carsey and Rundquist (1999) find that, from 1963 to 1989, districts represented by Democrats, the majority party, were more likely to get military procurement contracts, especially if the representative was on a defense committee. Alvarez and Saving (1997b) find that, during the 1980s, Democrats were successful at directing a disproportionate level of overall federal outlays to their districts. Balla et al. (2002) find that members of the majority party, be they Democrats or Republicans, secured more higher education earmarks for their districts than did the minority party from 1995 to 2000.

Still other studies find a limited or mixed role for parties in influencing the distribution of federal funds to districts. Alvarez and Saving (1997a) find that, in 1989 and 1990, membership in the Democratic party was positively associated with federal outlays in districts but only under certain definitions of outlays. They also find that the effect of committee membership on the level of outlays was much stronger than the effect of party. Lee (2003) argues that because Congressional districts are electoral units and not administrative units, members should only be able to target earmarked funds to their districts, since earmarked funds are the most geographically specific outlays. Accordingly, she finds that the majority party benefits disproportionately from earmarks but not from federal funds generally. In the most long-term look at the effect of party on the distribution of federal funds, Levitt and Snyder (1995) find that programs with high variation in spending levels and programs in which formulas determine spending levels are skewed toward districts with higher proportions of Democratic voters but that this effect is absent during divided government. They also find that the party of a given district's representative has no systematic effect on the allocation of funds to the district. Levitt and Snyder conclude that a long period of Democratic control enabled Democrats to shift funds to their districts generally, but that Democrats were unable to target funds to specific districts. Finally, Berry, Burden and Howell (2008) find that districts represented by majority party members received slightly more federal funding from low-variation entitlement programs but were not advantaged in new program spending.

While these studies are not directly comparable because they cover different time periods and subsets of federal spending in districts, taken as a whole, they offer scant support for the strongest predictions of party-centered theories.

3.3 Regression Discontinuity Design

One feature all of the empirical studies cited above share is that they employ simple regressions on observational data. Thus, not only is their validity threatened by omitted variable bias, they also rely on accurately controlling for district-level demand for federal outlays, district-level need for federal outlays, and district-level opinion on distributive spending. By contrast, our approach allows us to estimate an unbiased treatment effect of majority party status and of party affiliation on the distribution of federal funds to districts without having to operationalize and control for possibly confounding factors.

Regression discontinuity (RD) methods are used to estimate how a treatment or policy change affects individual outcomes in the absence of a randomized controlled trial (Hahn, Todd and Van der Klaauw, 2001; Thistlethwaite and Campbell, 1960). RD designs are part of a class of methods, known as quasi-experimental, that identify treatment effects with observational data, including, among others, instrumental variables (Nichols, 2007).

Estimates of the causal effect of a policy change on individual outcomes typically founder on the infeasibility of using an experimental design. In an experiment, participation in a treatment group is randomly assigned so that outcomes may be compared across treated and nontreated groups and cannot be attributed to selection bias (that is, to characteristics treated participants do not share with nontreated participants). Therefore, the average difference in outcomes is an unbiased estimate of an average treatment effect.

However, many political science experiments must remain hypothetical because there is no legally, politically, or economically feasible way to conduct them. In this case, the ideal experiment for testing the effects of majority party status and party affiliation on federal outlays to the district would involve randomly assigning representatives to the majority or minority party or to the Democrats or Republicans. This is obviously not possible. Thus, we are compelled to estimate treatment effects using available observational data.

3.3.1 Implementation

To use an RD design, we must have a situation where participants are assigned to treatment groups based solely on an observed variable, called the assignment variable (or running variable, or forcing variable). In the simplest case, participants with values of the assignment variable above a known threshold are assigned to a treatment group, and those with values of the assignment variable below the threshold are assigned to a control group, deterministically.

This is the case in our applications. Because representatives are elected by majority vote in the district, the RD design is a particularly appropriate method for estimating the effect of party on the distribution of federal funds. With vote share for the majority party candidate (or for the Democrat) as the assignment variable, districts are assigned to the treatment group (being represented by a majority party member or by a Democrat) at 50% plus one vote of the vote share.2 Districts falling just below the 50% threshold are assigned to the control group (being represented by a minority party member or by a Republican). Thus, unlike in a true experiment, treatment and control groups are not assigned randomly, but the measure underlying their assignment can be observed. In this way, the assignment of majority party status and of party affiliation in very close elections approximates random assignment. The key insight of the RD design is that if there is a discontinuity in the outcome variable near the threshold, where observations are essentially the same in every way, then there is a treatment effect.

2 For the test of the effect of party affiliation on spending, we arbitrarily choose Democrat to be the treatment.

The RD estimate of the treatment effect (the gap at the discontinuity) is the coefficient on an indicator variable for treated observations (those with values of the assignment variable above the threshold) in a regression equation that also includes the assignment variable and its interaction with the treatment indicator. Alternative specifications could use higher-order polynomial terms, different distributional assumptions (leading to different regression techniques), or a variety of kernel regression techniques in place of polynomials. In fact, a local linear regression, where only neighboring observations are included at each point, is standard, and the bandwidth for such a local regression is often chosen to minimize expected mean squared error (MSE), following, e.g., Imbens and Kalyanaraman (2009) (IK), balancing bias and variability in estimates. A smaller bandwidth uses observations closer to the threshold and therefore tends to have lower bias and higher variability; a larger bandwidth includes observations farther from the threshold and therefore tends to have greater bias and lower variability. The IK "optimal" bandwidth aims to balance these sources of error on average, using information about slopes and standard deviations in the vicinity of the threshold and regularization terms to lessen the influence of atypical draws from the data generating process, and performs well in simulations.

Herein, we use a local linear regression within the IK optimal bandwidth and also within bandwidths of sizes 3, 6, and 12 percentage points (where the maximum possible bandwidth size is 50 percentage points) to test the sensitivity of our estimates to this choice.
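With the bandwidth in hand, the estimating step reduces to a short routine. The sketch below is illustrative only (hypothetical variable names, a uniform kernel, and no IK bandwidth machinery); it fits the local linear specification within a fixed window around the cutoff:

```python
import numpy as np

def rd_estimate(y, vote_share, bandwidth, cutoff=0.5):
    """Local linear RD estimate of the gap at the discontinuity.

    Within the bandwidth, fits y = a + tau*D + b1*(v - c) + b2*D*(v - c),
    where D indicates vote shares above the cutoff (treated districts);
    tau is the estimated treatment effect at the threshold.
    """
    v = vote_share - cutoff
    keep = np.abs(v) <= bandwidth
    v, y = v[keep], y[keep]
    d = (v > 0).astype(float)   # treated: winner from the majority party
    X = np.column_stack([np.ones_like(v), d, v, d * v])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]              # tau
```

A kernel-weighted version would replace the hard window with smoothly decaying weights, but the logic of reading the treatment effect off the jump at the cutoff is the same.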

3.3.2 Assumptions

The internal validity of an RD design depends on two crucial assumptions. First, we assume that observations at the threshold but in different treatment regimes are exchangeable; that is, within a narrow range around the threshold, assignment to treatment is effectively random. Note that the 59 CHAPTER 3. PARTY EFFECTS ON OUTLAYS goal is to establish exchangeability between the two sets of districts very close to the 50 percent threshold, not between all districts above and below the threshold. It is acceptable (and likely) for there to be significant differences in the means of observable characteristics in all districts represented by a majority member versus those represented by a minority member or between those represented by a Democrat versus a Republican. For example, one would expect that the percent of the population that is black in Democratic districts is higher than that in Republican districts. However, exchangeability holds as long as confounding variables (those that affect both the election outcome and the level of spending) are continuous through the threshold (Lee, 2008).

In other words, as long as the percent of the population that is black, etc., does not “jump” at 50 percent voting for the majority party candidate or at 50 percent voting for the Democrat, then any jump in spending in the district can plausibly be attributed to the treatment (majority member or Democrat in office).3 Thus the RD design, uniquely among quasi-experimental methods and standard regression techniques, obviates the need to correctly specify and control for potentially confounding covariates.
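The continuity condition can be probed directly: apply the same local linear jump estimator to a predetermined covariate and check that the estimated jump at the threshold is indistinguishable from zero. A minimal sketch, assuming numpy; the function name and defaults are ours, not the chapter's.

```python
import numpy as np

def covariate_jump(x, z, cutoff=0.5, bandwidth=0.05):
    """Estimate the discontinuity in a predetermined covariate z (e.g.
    percent of the population that is black) at the cutoff, using the
    same local linear specification used for the outcome. A jump
    indistinguishable from zero is consistent with the continuity
    assumption. Illustrative sketch only."""
    keep = np.abs(x - cutoff) <= bandwidth
    xc = x[keep] - cutoff
    t = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), t, xc, t * xc])
    beta, *_ = np.linalg.lstsq(X, z[keep], rcond=None)
    return beta[1]  # estimated jump in the covariate at the cutoff
```

When the covariate is a smooth function of the assignment variable, the estimated jump is approximately zero, mirroring the checks in Lee, Moretti and Butler (2004) and Nichols and Rader (2007).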

Similarly, the second assumption is that the outcome itself is a continuous function of the assignment variable, conditional on treatment status, especially near the threshold. Because exchangeability holds only near the threshold, the RD estimate can be considered a local average treatment effect (LATE).

3.4 Data and Results

We use data on federal outlays from the Federal Assistance Awards Data System (FAADS), which is gathered by the Census Bureau and includes spending from all federal domestic assistance programs. For example, these programs include social security, agricultural subsidies paid to producers, research grants, and community development grants. FAADS does not include data on spending from federal procurements, government employee wages, and certain loan programs.

3 In their examination of roll call voting behavior in the House, Lee, Moretti and Butler (2004) established that key district characteristics were continuous functions of the Democratic two-party vote share in pooled data from 1946 to 1995. Nichols and Rader (2007) test the continuity condition using data on district characteristics from the 1990 Census and the election returns from 1990 and find no statistically significant jumps at the threshold in the tested characteristics. In a recent paper, Caughey and Sekhon (2011) question the exchangeability of bare winners and losers in House elections. This is not a concern here because we are comparing winners to winners.

The raw FAADS data includes a congressional district indicator. However, for programs that provide transfer payments to individuals, only county-level aggregates are reported. In cases where counties include more than one congressional district, Bickers and Stein (1992) apportioned the payments to the congressional districts based on district population, so that a greater proportion of the payments could be aggregated to the district level.4 We use their version of FAADS from fiscal years 1983 to 2002.5

We subset the data to “new actions” only. “New action” does not necessarily mean new program. As noted in Stein and Bickers (1994), members rarely have to initiate wholly new programs to provide benefits to their districts. Instead, they initiate new projects within existing programs. In the FAADS data, spending marked “new action” includes the initial payment of a new program or a new project within an extant program. Unlike many of the empirical papers discussed above, we do not, for now, disaggregate the FAADS data further by type of program or level of manipulability. Thus, the party effects we estimate will be an average over all programs that have varying manipulability. See the conclusion for our plans for future work.

Data on House elections from 1980 to 2000 were provided courtesy of Robert Erikson. In any election year t, the winning candidates serve in the years t+1 and t+2. In any Congressional year, the House and Senate appropriations committees typically report bills that authorize discretionary spending for the upcoming fiscal year by July, and Congress passes the bills by October, the start of the new fiscal year (Schick, 2000). New nondiscretionary formula programs also usually begin in the year following their passage, at the earliest. Any change in the allocation of federal funds to districts by members elected in year t would not take effect until fiscal years t + 2 and t + 3. Thus, we match every election year t to spending data from fiscal years t + 2 and t + 3.
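The timing rule above amounts to a simple mapping from an election year to the fiscal years whose outlays a newly elected member could plausibly affect. A one-line sketch; the function name is ours, not the chapter's.

```python
# The matching rule described above: members elected in year t can first
# affect federal outlays in fiscal years t+2 and t+3, because bills
# reported in year t+1 fund the fiscal year beginning that October.
def fiscal_years_affected(election_year: int) -> tuple[int, int]:
    return (election_year + 2, election_year + 3)

# e.g. the 1980 election maps to fiscal years 1982 and 1983,
# and the 2000 election maps to fiscal years 2002 and 2003
```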

4 Our results are the same regardless of whether we include the county-level aggregates. 5 See the Appendix for results on different subsets of years.

3.4.1 Majority Party Power

Using the percent of the two-party vote for the majority party candidate in the district as the assignment variable, we can use an RD design to estimate the treatment effect of having a majority party representative on the amount of federal outlays to a congressional district.6 The presence of a statistically significant treatment effect favoring the majority party would be evidence in favor of the predictions we derived from the party-centered theories of Congressional organization. That is, if districts that barely elected a member of the majority party get more federal assistance than districts that barely elected a member of the minority party, then we can attribute this difference in spending to the member’s majority status. On the other hand, if federal outlays are a smooth and continuous function of the majority party vote share, even through the fifty percent threshold, then this would be consistent with (though not confirmative of) the universalistic predictions about the distribution of funds to districts.

Table 3.1 shows the results of local linear regressions of log federal outlays on majority party vote share in the district. Again, the treatment effect estimate displayed is the coefficient on a treatment indicator variable, and the regression includes that variable, the assignment variable, and their interaction. Columns 1-3 show the estimate of the treatment effect of majority party status on outlays using the IK optimal bandwidth. In column 1, standard errors are calculated in the usual way. Columns 2 and 3 use bootstrapped standard errors, and in column 2, the optimal bandwidth is recalculated in each resample. Column 4 displays the estimate of the optimal bandwidth itself.7 Columns 5-7 show the majority party effect estimate using bandwidth sizes 6, 3, and 12, respectively. T-statistics are displayed in parentheses, and asterisks indicate statistical significance at at least the 95% confidence level. The first line of table 3.1 shows the estimate for the pooled sample, 1983-2002, and the subsequent lines show year-by-year estimates.
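The bootstrap procedure behind columns 2 and 3 can be sketched as a pairs bootstrap in which the bandwidth selection step is either repeated inside each resample (column 2) or held fixed (column 3). The estimator and bandwidth chooser below are hypothetical stand-ins for the local linear fit and the IK machinery, not the code used for the tables.

```python
import numpy as np

def jump(x, y, bandwidth=0.1, cutoff=0.5):
    # Toy jump estimator: difference in mean outcomes just above vs.
    # just below the cutoff (a stand-in for the local linear RD fit).
    above = (x >= cutoff) & (x < cutoff + bandwidth)
    below = (x < cutoff) & (x > cutoff - bandwidth)
    return y[above].mean() - y[below].mean()

def bootstrap_se(x, y, estimator, choose_bw, n_boot=200, seed=0):
    # Pairs bootstrap of the RD estimate. Passing a data-dependent
    # `choose_bw` mirrors column 2 (bandwidth recalculated in each
    # resample); a function returning a constant mirrors column 3.
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample (x, y) pairs with replacement
        xb, yb = x[idx], y[idx]
        draws.append(estimator(xb, yb, bandwidth=choose_bw(xb, yb)))
    return float(np.std(draws, ddof=1))
```

Recomputing the bandwidth inside each resample propagates the uncertainty of the bandwidth choice itself into the standard error, which is why columns 2 and 3 can differ.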

For ease of interpretation, figures 3.1 to 3.6 display the same estimates in graphical form, with point estimates indicated by dots and 95% confidence intervals indicated by horizontal lines.

6 The assignment variable is the two-party vote for the Democrat if the Democrats went on to win a majority in the House in that election (election years 1980-1992) and is for the Republican if the Republicans won the majority (election years 1994-2000). 7 The significance test in this column is whether the optimal bandwidth is different from zero. Of course, it always is, provided that there is enough data near the threshold for the estimate to converge.

Using the IK optimal bandwidth, we find, contra our theoretical predictions, that marginal districts that were represented by a minority party member actually received 36.9% more federal outlays from 1983 to 2002 than did districts represented by a majority party member. This effect is statistically significant at the 95% confidence level when bootstrapped standard errors are used (columns 2 and 3). The graph in figure 3.7 shows federal funding to each district plotted against the percent of voters in the district who voted for the majority party candidate. Fitted curves that model federal funding as a function of majority party victory margin are estimated up to the threshold and above the threshold. The treatment effect of majority party status on federal funding can be observed as a visible jump in funding at the zero point (where fifty percent of voters voted for the majority party candidate). A negative but small jump exists at the threshold, again indicating that marginal minority party-represented districts were advantaged in distributional spending over majority party districts.

However, this unexpected result does not hold up to closer inspection. When we disaggregate the estimate by year, we find that the negative result is driven solely by large negative coefficients in the 1986 and 1987 spending data, as seen in table 3.1. Otherwise, we find what appears to be random variation around zero. This is easily seen in the graphical representation in figures 3.1 to 3.6.

In this particular application, the size of the majority party effect in the pooled sample appears to be largest when estimated within the IK optimal bandwidth. The estimates in columns 5-7, each of which uses a larger bandwidth than the optimal estimate of 2.63, are still negative but smaller as the bandwidth gets larger. Figure 3.8 shows the sensitivity of the pooled estimate, bracketed by its 95% confidence interval, to the size of the bandwidth for bandwidths 1 to 20, with the IK optimal bandwidth indicated by a vertical line. While the pooled estimate remains negative over most of the range, it shrinks as the bandwidth gets larger. As expected, the effect is estimated with more precision as the bandwidth gets larger, though it is never bounded away from zero. As discussed above, this precision comes at the expense of possible bias in the estimate, since we expect that exchangeability holds only near the threshold.
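A bandwidth sensitivity check like the one in figure 3.8 simply re-estimates the jump over a grid of bandwidths. A toy sketch, with a local difference-in-means estimator standing in for the local linear fit; all names are ours.

```python
import numpy as np

def local_jump(x, y, bandwidth, cutoff=0.5):
    # Toy jump estimator: difference in mean outcomes within `bandwidth`
    # on each side of the cutoff (stand-in for the local linear fit).
    above = (x >= cutoff) & (x < cutoff + bandwidth)
    below = (x < cutoff) & (x > cutoff - bandwidth)
    return y[above].mean() - y[below].mean()

def bandwidth_sweep(x, y, bandwidths, cutoff=0.5):
    # Re-estimate the jump at each candidate bandwidth; plotting these
    # against `bandwidths` reproduces the logic of the sensitivity figure.
    return [(bw, local_jump(x, y, bw, cutoff)) for bw in bandwidths]
```

If the conclusion (here, a null) survives across the grid, it is not an artifact of any single bandwidth choice.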

In conclusion, we find no support for the strongest predictions of majority party power models—that majority party members can use their procedural power to direct disproportionate amounts of distributive benefits to their districts. In nearly every fiscal year from 1983 to 2002, there is no statistically significant difference between federal outlays in districts that barely elected a majority party member versus districts that barely elected a minority party member. The null finding persists regardless of the choice of bandwidth for the RD design.

3.4.2 Democrats vs. Republicans

In addition to testing the competing hypotheses of universalistic and party power models of bargaining over distributive goods in Congress, we can use our design to test the claims often made by political pundits and politicians themselves that one party or the other is the bigger spender.

Using the percent of the two-party vote for the Democratic candidate in the district as the assignment variable, we can use an RD design to estimate the treatment effect of a representative’s party affiliation on the amount of federal outlays to her congressional district.8 The presence of a statistically significant and positive treatment effect would be evidence that the Democrats are bigger spenders. The presence of a statistically significant and negative treatment effect would be evidence that the Republicans are bigger spenders. That is, if districts that barely elected a Democrat get more or less federal assistance than districts that barely elected a Republican, then we can attribute this difference in spending to the member’s party affiliation.

Table 3.2 shows the results of local linear regressions of log federal outlays on Democratic vote share in the district. As explained above, the treatment effect estimate is the coefficient on a treatment indicator variable from a regression that includes the indicator variable, the assignment variable, and their interaction. Columns 1-3 show the estimate of the treatment effect of party affiliation on outlays using the IK optimal bandwidth. In column 1, standard errors are calculated in the usual way. Columns 2 and 3 use bootstrapped standard errors, and in column 2, the optimal bandwidth is recalculated in each resample. Column 4 displays the estimate of the optimal bandwidth itself. Columns 5-7 show the party affiliation effect estimate using bandwidth sizes 6, 3, and 12, respectively. T-statistics are displayed in parentheses, and asterisks indicate statistical significance at at least the 95% confidence level. The first line of table 3.2 shows the estimate for the pooled sample, 1983-2002, and the subsequent lines show year-by-year estimates.

8 We arbitrarily chose the assignment variable to be the two-party vote for the Democrat. Of course, we could use the two-party vote for the Republican instead. This would only reverse the sign of the RD estimate.

For ease of interpretation, figures 3.9 to 3.14 display the same estimates in graphical form, with point estimates indicated by dots and 95% confidence intervals indicated by horizontal lines.

Using the IK optimal bandwidth, we find that marginal districts that were represented by a Democrat received 11.4% more federal outlays from 1983 to 2002 than did districts represented by a Republican, but this difference is not statistically significant. Thus, contrary to the claims often leveled by politicians against each other, we cannot say with confidence that one party secures more federal assistance for its districts than the other. This insignificance persists across typical (column 1) and bootstrapped standard errors (columns 2 and 3). The graph in figure 3.15 shows federal funding to each district plotted against the percent of voters in the district who voted for the Democratic candidate. Fitted curves that model federal funding as a function of Democratic victory margin are estimated up to the threshold and above the threshold. The treatment effect of party affiliation on federal funding is hardly visible at the zero point (where fifty percent of voters voted for the Democrat). Again, this indicates that no one party consistently spent more money over the 20 years covered.

When we disaggregate the estimate by year, we find, as we did for majority party status, that fiscal years 1986 and 1987 are outliers. In both of these years, we find a large negative effect, indicating that Republicans directed significantly more federal outlays to their districts in these years. In fact, because the Republicans were the minority party in the House in 1986 and 1987, these estimates are exactly the same as those in table 3.1. Thus, were we to attribute any substantive meaning to these findings, we could not be sure whether the mechanism was driven by the fact that these districts were represented by a minority party member or by a Republican. However, since we find what appears to be random variation around zero in other years, as shown in the graphical representations in figures 3.9 to 3.14, it is likely that these years are simply outliers.

The size of the party affiliation effect varies depending on the size of the bandwidth but never attains statistical significance. Figure 3.16 shows the sensitivity of the pooled estimate, bracketed by its 95% confidence interval, to the size of the bandwidth for bandwidths 1 to 20, with the IK optimal bandwidth indicated by a vertical line. The pooled estimate is mostly positive for small bandwidths and turns negative around bandwidth size 7, far above the optimal bandwidth of 3.6. However, the estimate is never bounded away from zero. As expected, the effect is estimated with more precision as the bandwidth gets larger, though this increased precision trades off with possible bias, since exchangeability across Democratic and Republican districts holds only near the threshold.

Thus, we find no evidence that members of one major party were able to direct more federal outlays to their districts than members of the other party in fiscal years 1983 to 2002. In nearly every year, there is no statistically significant difference between federal outlays in districts that barely elected a Democrat versus districts that barely elected a Republican. As was the case in the majority party status test, the null finding persists regardless of the choice of bandwidth for the RD design.

3.5 Conclusion

In the preceding analysis, we employed a novel quasi-experimental method, regression discontinuity design, to estimate the causal effect of being represented by a member of the majority party on the amount of federal outlays to a congressional district. Across different estimation techniques and standard error calculations, we found no consistent effect of members’ majority party status on spending. Likewise, we could not say with any certainty that either Democrats or Republicans are able to direct more federal outlays to their districts. These findings are consistent with universalistic predictions of distributive bargaining.

These null findings are also consistent with those in the empirical distributional literature that tend to find party effects only in manipulable subsets of federal assistance, only in formula programs, or only for earmarks, but not in new federal outlays overall. In future work, we plan to use the RD design with data on federal outlays that are disaggregated by type and by level of manipulability. We also plan to employ this design to test the effect of incumbency on federal outlays.

Table 3.1: Regressions of ln(expenditures) on majority party vote share. Columns 1-3 use the IK optimal bandwidth OB (columns 2 and 3 bootstrap the SE, where column 2 recalculates OB in each resample and column 3 does not); column 4 is OB itself; columns 5-7 use bandwidths of 6, 3, and 12 percentage points, respectively. Sample includes all districts matched to either 14 or 20 fiscal years.

        (1) IK            (2) IKbs          (3) IKb2          (4) OB            (5) RD6           (6) RD3           (7) RD12
Pooled  -0.369 (-1.69)    -0.369∗ (-2.18)   -0.369∗ (-2.10)   2.630∗∗∗ (6.30)   -0.168 (-1.16)    -0.384 (-1.87)    -0.132 (-1.29)
1983    -0.236 (-0.41)    -0.236 (-0.37)    -0.236 (-0.44)    4.392∗∗∗ (6.21)   -0.142 (-0.30)    -0.573 (-0.74)    0.197 (0.58)
1984    0.199 (0.32)      0.199 (0.25)      0.199 (0.33)      5.455∗∗∗ (8.79)   0.170 (0.28)      -0.191 (-0.20)    -0.0549 (-0.13)
1985    0.490 (0.81)      0.490 (0.60)      0.490 (0.71)      5.068∗∗∗ (8.47)   0.395 (0.67)      0.150 (0.17)      0.0179 (0.04)
1986    -1.268∗∗ (-2.95)  -1.268 (-1.92)    -1.268∗ (-2.55)   5.088∗∗∗ (6.30)   -1.142∗ (-2.56)   -1.751∗∗ (-2.80)  -0.566 (-1.47)
1987    -1.417∗∗∗ (-3.31) -1.417∗∗ (-2.61)  -1.417∗∗ (-3.06)  4.885∗∗∗ (7.02)   -1.286∗∗ (-2.89)  -1.979∗∗ (-3.27)  -0.663 (-1.66)
1988    -0.350 (-0.94)    -0.350 (-0.67)    -0.350 (-0.66)    3.959∗∗∗ (6.75)   -0.576 (-1.36)    -0.373 (-0.87)    -0.839∗ (-2.41)
1989    -0.167 (-0.58)    -0.167 (-0.34)    -0.167 (-0.44)    4.078∗∗∗ (9.17)   -0.489 (-1.37)    -0.189 (-0.54)    -0.497 (-1.69)
1990    0.650 (1.02)      0.650 (0.85)      0.650 (0.94)      5.437∗∗∗ (8.10)   0.522 (0.80)      1.127 (1.32)      -0.250 (-0.51)
1991    0.0564 (0.10)     0.0564 (0.09)     0.0564 (0.10)     5.359∗∗∗ (8.08)   -0.0246 (-0.04)   0.347 (0.44)      -0.581 (-1.39)
1992    -0.214 (-0.53)    -0.214 (-0.38)    -0.214 (-0.48)    4.973∗∗∗ (6.58)   -0.0308 (-0.08)   -0.548 (-1.35)    -0.317 (-1.00)
1993    -0.361 (-1.02)    -0.361 (-0.61)    -0.361 (-0.82)    5.405∗∗∗ (8.24)   -0.331 (-0.95)    -0.396 (-1.22)    -0.260 (-0.85)
1994    -0.147 (-0.37)    -0.147 (-0.31)    -0.147 (-0.35)    5.009∗∗∗ (8.98)   -0.0567 (-0.15)   -0.666 (-1.29)    -0.0848 (-0.34)
1995    0.00493 (0.01)    0.00493 (0.01)    0.00493 (0.01)    4.519∗∗∗ (8.34)   0.0673 (0.21)     -0.352 (-0.80)    0.0356 (0.17)
1996    0.149 (0.64)      0.149 (0.61)      0.149 (0.61)      4.347∗∗∗ (7.90)   0.284 (1.23)      0.214 (0.78)      0.409 (1.95)
1997    -0.850 (-0.97)    -0.850 (-0.83)    -0.850 (-0.91)    4.379∗∗∗ (6.30)   -0.529 (-0.67)    -1.232 (-1.07)    0.237 (0.45)
1998    -0.265 (-0.49)    -0.265 (-0.40)    -0.265 (-0.42)    5.935∗∗∗ (9.90)   -0.244 (-0.44)    -0.212 (-0.39)    0.0413 (0.09)
1999    -0.837 (-1.65)    -0.837 (-1.11)    -0.837 (-1.46)    5.417∗∗∗ (6.02)   -0.757 (-1.28)    -0.301 (-0.57)    0.0181 (0.04)
2000    -0.431 (-0.69)    -0.431 (-0.40)    -0.431 (-0.61)    5.777∗∗∗ (6.06)   -0.461 (-0.69)    0.116 (0.08)      -0.754 (-0.94)
2001    -0.679 (-1.01)    -0.679 (-0.00)    -0.679 (-0.89)    5.797∗∗∗ (8.57)   -0.680 (-0.96)    -0.0167 (-0.01)   -0.602 (-0.88)
2002    2.079 (1.58)      2.079 (1.12)      2.079 (1.34)      5.505∗∗∗ (6.47)   1.929 (1.41)      3.675∗∗ (2.79)    0.833 (0.85)

t statistics in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Figure 3.1: Effect of majority party win on spending using OB. Estimates correspond to table 3.1, column 1; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.2: Effect of majority party win on spending using OB, bootstrap SE recalculating OB in each resample. Estimates correspond to table 3.1, column 2; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.3: Effect of majority party win on spending using OB, bootstrap SE not recalculating OB. Estimates correspond to table 3.1, column 3; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.4: Effect of majority party win on spending using a bandwidth of 6 percentage points. Estimates correspond to table 3.1, column 5; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.5: Effect of majority party win on spending using a bandwidth of 3 percentage points. Estimates correspond to table 3.1, column 6; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.6: Effect of majority party win on spending using a bandwidth of 12 percentage points. Estimates correspond to table 3.1, column 7; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.7: Effect of Majority Party Win on Spending. Locally weighted regressions model ln(spending) as a function of majority party victory margin in the district up to zero and above zero. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.8: Dependence of estimate on bandwidth (majority party vote share). Vertical line indicates the IK optimal bandwidth. Dots are point estimates, and bands are 95% confidence intervals. Sample includes all districts matched to either 14 or 20 fiscal years.

Table 3.2: Regressions of ln(expenditures) on Democratic vote share. Columns 1-3 use the IK optimal bandwidth OB (columns 2 and 3 bootstrap the SE, where column 2 recalculates OB in each resample and column 3 does not); column 4 is OB itself; columns 5-7 use bandwidths of 6, 3, and 12 percentage points, respectively. Sample includes all districts matched to either 14 or 20 fiscal years.

        (1) IK            (2) IKbs          (3) IKb2          (4) OB            (5) RD6           (6) RD3           (7) RD12
Pooled  0.114 (0.61)      0.114 (0.60)      0.114 (0.66)      3.571∗∗∗ (6.98)   0.113 (0.77)      0.0928 (0.44)     -0.0681 (-0.66)
1983    -0.236 (-0.41)    -0.236 (-0.41)    -0.236 (-0.44)    4.392∗∗∗ (6.07)   -0.142 (-0.30)    -0.573 (-0.74)    0.197 (0.58)
1984    0.199 (0.32)      0.199 (0.31)      0.199 (0.34)      5.455∗∗∗ (9.40)   0.170 (0.28)      -0.191 (-0.20)    -0.0549 (-0.13)
1985    0.490 (0.81)      0.490 (0.72)      0.490 (0.86)      5.068∗∗∗ (8.92)   0.395 (0.67)      0.150 (0.17)      0.0179 (0.04)
1986    -1.268∗∗ (-2.95)  -1.268∗ (-2.10)   -1.268∗∗ (-2.58)  5.088∗∗∗ (6.48)   -1.142∗ (-2.56)   -1.751∗∗ (-2.80)  -0.566 (-1.47)
1987    -1.417∗∗∗ (-3.31) -1.417∗ (-2.06)   -1.417∗ (-2.49)   4.885∗∗∗ (6.97)   -1.286∗∗ (-2.89)  -1.979∗∗ (-3.27)  -0.663 (-1.66)
1988    -0.350 (-0.94)    -0.350 (-0.65)    -0.350 (-0.77)    3.959∗∗∗ (10.57)  -0.576 (-1.36)    -0.373 (-0.87)    -0.839∗ (-2.41)
1989    -0.167 (-0.58)    -0.167 (-0.37)    -0.167 (-0.34)    4.078∗∗∗ (8.79)   -0.489 (-1.37)    -0.189 (-0.54)    -0.497 (-1.69)
1990    0.650 (1.02)      0.650 (0.75)      0.650 (0.80)      5.437∗∗∗ (6.72)   0.522 (0.80)      1.127 (1.32)      -0.250 (-0.51)
1991    0.0564 (0.10)     0.0564 (0.08)     0.0564 (0.08)     5.359∗∗∗ (6.94)   -0.0246 (-0.04)   0.347 (0.44)      -0.581 (-1.39)
1992    -0.214 (-0.53)    -0.214 (-0.29)    -0.214 (-0.37)    4.973∗∗∗ (7.96)   -0.0308 (-0.08)   -0.548 (-1.35)    -0.317 (-1.00)
1993    -0.361 (-1.02)    -0.361 (-0.80)    -0.361 (-1.07)    5.405∗∗∗ (9.88)   -0.331 (-0.95)    -0.396 (-1.22)    -0.260 (-0.85)
1994    -0.147 (-0.37)    -0.147 (-0.30)    -0.147 (-0.39)    5.009∗∗∗ (7.96)   -0.0567 (-0.15)   -0.666 (-1.29)    -0.0848 (-0.34)
1995    0.00493 (0.01)    0.00493 (0.01)    0.00493 (0.01)    4.519∗∗∗ (8.51)   0.0673 (0.21)     -0.352 (-0.80)    0.0356 (0.17)
1996    -0.149 (-0.64)    -0.149 (-0.57)    -0.149 (-0.61)    4.347∗∗∗ (7.78)   -0.284 (-1.23)    -0.214 (-0.78)    -0.409 (-1.95)
1997    0.850 (0.97)      0.850 (0.92)      0.850 (1.01)      4.379∗∗∗ (6.70)   0.529 (0.67)      1.232 (1.07)      -0.237 (-0.45)
1998    0.265 (0.49)      0.265 (0.49)      0.265 (0.47)      5.935∗∗∗ (8.45)   0.244 (0.44)      0.212 (0.39)      -0.0413 (-0.09)
1999    0.837 (1.65)      0.837 (1.08)      0.837 (1.32)      5.417∗∗∗ (6.40)   0.757 (1.28)      0.301 (0.57)      -0.0181 (-0.04)
2000    0.431 (0.69)      0.431 (0.33)      0.431 (0.43)      5.777∗∗∗ (7.52)   0.461 (0.69)      -0.116 (-0.08)    0.754 (0.94)
2001    0.679 (1.01)      0.679 (0.15)      0.679 (0.55)      5.797∗∗∗ (7.76)   0.680 (0.96)      0.0167 (0.01)     0.602 (0.88)
2002    -2.079 (-1.58)    -2.079 (-1.04)    -2.079 (-1.33)    5.505∗∗∗ (6.98)   -1.929 (-1.41)    -3.675∗∗ (-2.79)  -0.833 (-0.85)

t statistics in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Figure 3.9: Effect of Democratic win on spending using OB. Estimates correspond to table 3.2, column 1; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.10: Effect of Democratic win on spending using OB, bootstrap SE recalculating OB in each resample. Estimates correspond to table 3.2, column 2; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.11: Effect of Democratic win on spending using OB, bootstrap SE not recalculating OB. Estimates correspond to table 3.2, column 3; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.12: Effect of Democratic win on spending using a bandwidth of 6 percentage points. Estimates correspond to table 3.2, column 5; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.13: Effect of Democratic win on spending using a bandwidth of 3 percentage points. Estimates correspond to table 3.2, column 6; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.14: Effect of Democratic win on spending using a bandwidth of 12 percentage points. Estimates correspond to table 3.2, column 7; 95% confidence intervals shown. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.15: Effect of Democratic Party Win on Spending. Locally weighted regressions model ln(spending) as a function of Democratic victory margin in the district up to zero and above zero. Sample includes all districts matched to either 14 or 20 fiscal years.

Figure 3.16: Dependence of estimate on bandwidth (Democratic vote share). Vertical line indicates the IK optimal bandwidth. Dots are point estimates, and bands are 95% confidence intervals. Sample includes all districts matched to either 14 or 20 fiscal years.

3.6 Appendix

The number of districts represented in the version of the FAADS data from Bickers and Stein varies from year to year. Some of this variation is expected, since a few districts may have no new federal outlays in a given fiscal year or may have only received assistance that cannot be matched to one district. However, the number of districts in the data changes notably after 1997. Before 1997, approximately 420 different districts appear in the data. In 1997 and later, this number drops to approximately 270.

The main results we present in the paper include all districts matched to either all 20 years of spending data or to the 14 years before 1997. As long as this missingness is quasi-random and does not break around the 50% threshold in the vote share, it should not affect our estimate. To be sure, we checked the robustness of our results to different subsets of the data. Tables 3.3 and 3.4 display the results for majority party for only the districts matched to 20 years and only 14 years, respectively. Tables 3.5 and 3.6 display the results for party affiliation for districts matched to only 20 and only 14 years. Our null findings still remain in these subsets.

Table 3.3: Regressions of ln(expenditures) on majority party vote share. Columns 1-3 use the IK optimal bandwidth OB (columns 2 and 3 bootstrap the SE, where column 2 recalculates OB in each resample and column 3 does not); column 4 is OB itself; columns 5-7 use bandwidths of 6, 3, and 12 percentage points, respectively. Sample includes all districts matched to 20 fiscal years.

        (1) IK            (2) IKbs          (3) IKb2          (4) OB            (5) RD6           (6) RD3           (7) RD12
Pooled  -0.309 (-1.27)    -0.309 (-1.34)    -0.309 (-1.18)    3.006∗∗∗ (6.14)   -0.215 (-1.27)    -0.309 (-1.26)    -0.126 (-1.06)
1983    -0.669 (-1.26)    -0.669 (-0.90)    -0.669 (-1.19)    5.373∗∗∗ (6.08)   -0.544 (-1.07)    -1.046 (-1.11)    0.128 (0.31)
1984    0.0178 (0.02)     0.0178 (0.02)     0.0178 (0.02)     5.591∗∗∗ (9.21)   0.0218 (0.03)     -0.168 (-0.15)    0.145 (0.29)
1985    0.0377 (0.06)     0.0377 (0.04)     0.0377 (0.06)     5.734∗∗∗ (7.84)   0.0322 (0.05)     -0.113 (-0.11)    0.0556 (0.12)
1986    -1.199∗ (-1.97)   -1.199 (-0.74)    -1.199 (-1.69)    5.386∗∗∗ (4.00)   -1.094 (-1.75)    -1.978 (-1.82)    -0.396 (-0.85)
1987    -1.552∗ (-2.50)   -1.552 (-1.10)    -1.552 (-1.93)    4.539∗∗∗ (3.93)   -1.268∗ (-2.02)   -2.383∗ (-2.32)   -0.368 (-0.77)
1988    -0.597 (-1.39)    -0.597 (-0.20)    -0.597 (-0.21)    3.816∗∗∗ (5.16)   -0.682 (-1.27)    -0.800 (-1.92)    -1.041∗ (-2.39)
1989    -0.421 (-1.70)    -0.421 (-0.31)    -0.421 (-0.35)    3.872∗∗∗ (5.68)   -0.607 (-1.49)    -0.477 (-1.22)    -0.710∗ (-2.11)
1990    0.795 (1.29)      0.795 (0.39)      0.795 (1.04)      5.213∗∗∗ (6.43)   0.580 (0.90)      1.385 (1.67)      -0.201 (-0.39)
1991    0.248 (0.45)      0.248 (0.34)      0.248 (0.40)      5.285∗∗∗ (6.35)   0.0970 (0.17)     0.689 (0.88)      -0.634 (-1.36)
1992    0.459 (0.71)      0.459 (0.03)      0.459 (0.40)      5.463∗∗∗ (6.14)   0.526 (0.78)      -0.225 (-0.40)    -0.378 (-0.80)
1993    -0.0657 (-0.12)   -0.0657 (-0.01)   -0.0657 (-0.06)   5.644∗∗∗ (7.28)   -0.0237 (-0.04)   -0.798 (-1.94)    -0.111 (-0.25)
1994    -0.197 (-0.41)    -0.197 (-0.35)    -0.197 (-0.38)    5.310∗∗∗ (12.32)  -0.187 (-0.41)    -0.670 (-1.01)    -0.156 (-0.51)
1995    -0.0228 (-0.05)   -0.0228 (-0.03)   -0.0228 (-0.05)   5.059∗∗∗ (7.33)   -0.00377 (-0.01)  -0.534 (-0.98)    -0.0128 (-0.05)
1996    0.202 (1.03)      0.202 (0.81)      0.202 (0.80)      4.859∗∗∗ (9.95)   0.275 (1.36)      0.242 (1.06)      0.438∗ (2.01)
1997    -0.850 (-0.97)    -0.850 (-0.87)    -0.850 (-0.96)    4.379∗∗∗ (7.39)   -0.529 (-0.67)    -1.232 (-1.07)    0.237 (0.45)
1998    -0.265 (-0.49)    -0.265 (-0.44)    -0.265 (-0.46)    5.935∗∗∗ (7.89)   -0.244 (-0.44)    -0.212 (-0.39)    0.0413 (0.09)
1999    -0.837 (-1.65)    -0.837 (-0.94)    -0.837 (-0.87)    5.417∗∗∗ (6.11)   -0.757 (-1.28)    -0.301 (-0.57)    0.0181 (0.04)
2000    -0.431 (-0.69)    -0.431 (-0.49)    -0.431 (-0.63)    5.777∗∗∗ (7.49)   -0.461 (-0.69)    0.116 (0.08)      -0.754 (-0.94)
2001    -0.679 (-1.01)    -0.679 (-0.44)    -0.679 (-0.52)    5.797∗∗∗ (7.80)   -0.680 (-0.96)    -0.0167 (-0.01)   -0.602 (-0.88)
2002    2.079 (1.58)      2.079 (1.23)      2.079 (1.46)      5.505∗∗∗ (8.28)   1.929 (1.41)      3.675∗∗ (2.79)    0.833 (0.85)

t statistics in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table 3.4: Regressions of ln(expenditures) on majority party vote share: columns 1, 2, and 3 use the IK optimal bandwidth OB (columns 2 and 3 bootstrap the standard errors, where column 2 recalculates OB in each resample and column 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 a bandwidth of 3, and column 7 a bandwidth of 12. Sample includes all districts matched to 14 fiscal years. Each cell reports the coefficient with its t statistic in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001.

        (1) IK           (2) IKbs        (3) IKb2        (4) OB           (5) RD6          (6) RD3           (7) RD12
Pooled  -0.539 (-1.61)   -0.539 (-1.14)  -0.539 (-1.52)  3.121∗∗∗ (6.12)  0.0354 (0.13)    -0.601 (-1.73)    -0.162 (-0.88)
1983    1.219 (1.91)     1.219 (1.01)    1.219 (1.32)    6.352∗∗∗ (6.31)  1.219 (1.71)     0.719 (1.60)      0.608 (1.13)
1984    2.916∗∗∗ (5.30)  2.916 (0.80)    2.916 (0.80)    4.101∗∗∗ (6.17)  1.195 (1.00)     0 (.)             -1.437 (-1.02)
1985    4.827∗∗∗ (7.54)  4.827 (0.99)    4.827 (1.10)    3.849∗∗∗ (5.24)  2.525 (1.83)     0 (.)             -0.613 (-0.40)
1986    -1.380∗ (-2.13)  -1.380 (-0.61)  -1.380 (-1.20)  5.972∗∗∗ (5.65)  -1.378 (-1.82)   0.501∗∗ (2.68)    -1.041 (-1.61)
1987    -1.313∗ (-1.98)  -1.313 (-0.77)  -1.313 (-0.93)  6.323∗∗∗ (6.76)  -1.377 (-1.78)   0.332 (1.38)      -1.483∗ (-2.16)
1988    -0.524 (-0.70)   -0.524 (-0.29)  -0.524 (-0.19)  6.290∗∗∗ (6.11)  -0.504 (-0.54)   -1.390 (-0.81)    -0.460 (-0.78)
1989    -0.148 (-0.21)   -0.148 (-0.07)  -0.148 (-0.17)  7.294∗∗∗ (7.65)  -0.240 (-0.25)   -1.226 (-0.73)    -0.00812 (-0.01)
1990    -0.210 (-0.14)   -0.210 (-0.04)  -0.210 (.)      3.110∗ (2.13)    0.324 (0.14)     0 (.)             -0.536 (-0.41)
1991    0 (.)            0 (.)           0 (.)           0 (.)            0.210 (0.11)     0 (.)             -0.120 (-0.10)
1992    0.828 (0.75)     0.828 (0.17)    0.828 (0.49)    4.465∗∗∗ (4.13)  0.356 (0.39)     0.416 (0.24)      0.0421 (0.11)
1993    0.877 (1.14)     0.877 (0.16)    0.877 (0.48)    4.309∗∗∗ (3.87)  0.414 (0.62)     1.148 (0.88)      -0.145 (-0.36)
1994    -0.0865 (-0.11)  -0.0865 (-0.05) -0.0865 (-0.05) 4.954∗∗∗ (5.87)  0.164 (0.19)     -1.313∗∗∗ (-4.56) 0.145 (0.28)
1995    0.00403 (0.01)   0.00403 (0.00)  0.00403 (0.00)  5.693∗∗∗ (6.98)  0.0157 (0.02)    -0.611 (-0.83)    0.103 (0.24)
1996    0.253 (0.64)     0.253 (0.34)    0.253 (0.53)    5.168∗∗∗ (5.77)  0.291 (0.69)     0.0533 (0.11)     0.117 (0.33)

Table 3.5: Regressions of ln(expenditures) on Democratic vote share: columns 1, 2, and 3 use the IK optimal bandwidth OB (columns 2 and 3 bootstrap the standard errors, where column 2 recalculates OB in each resample and column 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 a bandwidth of 3, and column 7 a bandwidth of 12. Sample includes all districts matched to 20 fiscal years. Each cell reports the coefficient with its t statistic in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001.

        (1) IK           (2) IKbs        (3) IKb2         (4) OB            (5) RD6          (6) RD3          (7) RD12
Pooled  0.244 (1.13)     0.244 (0.94)    0.244 (1.06)     3.698∗∗∗ (8.78)   0.152 (0.90)     0.292 (1.20)     -0.0385 (-0.32)
1983    -0.669 (-1.26)   -0.669 (-0.86)  -0.669 (-1.11)   5.373∗∗∗ (6.76)   -0.544 (-1.07)   -1.046 (-1.11)   0.128 (0.31)
1984    0.0178 (0.02)    0.0178 (0.02)   0.0178 (0.02)    5.591∗∗∗ (8.73)   0.0218 (0.03)    -0.168 (-0.15)   0.145 (0.29)
1985    0.0377 (0.06)    0.0377 (0.05)   0.0377 (0.06)    5.734∗∗∗ (7.91)   0.0322 (0.05)    -0.113 (-0.11)   0.0556 (0.12)
1986    -1.199∗ (-1.97)  -1.199 (-0.76)  -1.199∗ (-2.34)  5.386∗∗∗ (4.73)   -1.094 (-1.75)   -1.978 (-1.82)   -0.396 (-0.85)
1987    -1.552∗ (-2.50)  -1.552 (-0.74)  -1.552∗ (-1.97)  4.539∗∗∗ (3.90)   -1.268∗ (-2.02)  -2.383∗ (-2.32)  -0.368 (-0.77)
1988    -0.597 (-1.39)   -0.597 (-0.63)  -0.597 (-0.22)   3.816∗∗∗ (6.20)   -0.682 (-1.27)   -0.800 (-1.92)   -1.041∗ (-2.39)
1989    -0.421 (-1.70)   -0.421 (-0.16)  -0.421 (-0.15)   3.872∗∗∗ (5.45)   -0.607 (-1.49)   -0.477 (-1.22)   -0.710∗ (-2.11)
1990    0.795 (1.29)     0.795 (0.88)    0.795 (1.01)     5.213∗∗∗ (6.20)   0.580 (0.90)     1.385 (1.67)     -0.201 (-0.39)
1991    0.248 (0.45)     0.248 (0.37)    0.248 (0.42)     5.285∗∗∗ (6.59)   0.0970 (0.17)    0.689 (0.88)     -0.634 (-1.36)
1992    0.459 (0.71)     0.459 (0.15)    0.459 (0.44)     5.463∗∗∗ (5.73)   0.526 (0.78)     -0.225 (-0.40)   -0.378 (-0.80)
1993    -0.0657 (-0.12)  -0.0657 (-0.08) -0.0657 (-0.08)  5.644∗∗∗ (8.26)   -0.0237 (-0.04)  -0.798 (-1.94)   -0.111 (-0.25)
1994    -0.197 (-0.41)   -0.197 (-0.38)  -0.197 (-0.41)   5.310∗∗∗ (11.21)  -0.187 (-0.41)   -0.670 (-1.01)   -0.156 (-0.51)
1995    -0.0228 (-0.05)  -0.0228 (-0.05) -0.0228 (-0.06)  5.059∗∗∗ (9.04)   -0.00377 (-0.01) -0.534 (-0.98)   -0.0128 (-0.05)
1996    -0.202 (-1.03)   -0.202 (-1.00)  -0.202 (-0.94)   4.859∗∗∗ (6.78)   -0.275 (-1.36)   -0.242 (-1.06)   -0.438∗ (-2.01)
1997    0.850 (0.97)     0.850 (0.91)    0.850 (0.97)     4.379∗∗∗ (7.89)   0.529 (0.67)     1.232 (1.07)     -0.237 (-0.45)
1998    0.265 (0.49)     0.265 (0.37)    0.265 (0.38)     5.935∗∗∗ (8.12)   0.244 (0.44)     0.212 (0.39)     -0.0413 (-0.09)
1999    0.837 (1.65)     0.837 (1.24)    0.837 (1.62)     5.417∗∗∗ (7.01)   0.757 (1.28)     0.301 (0.57)     -0.0181 (-0.04)
2000    0.431 (0.69)     0.431 (0.28)    0.431 (0.35)     5.777∗∗∗ (7.26)   0.461 (0.69)     -0.116 (-0.08)   0.754 (0.94)
2001    0.679 (1.01)     0.679 (0.57)    0.679 (0.86)     5.797∗∗∗ (7.91)   0.680 (0.96)     0.0167 (0.01)    0.602 (0.88)
2002    -2.079 (-1.58)   -2.079 (-1.12)  -2.079 (-1.44)   5.505∗∗∗ (6.33)   -1.929 (-1.41)   -3.675∗∗ (-2.79) -0.833 (-0.85)

Table 3.6: Regressions of ln(expenditures) on Democratic vote share: columns 1, 2, and 3 use the IK optimal bandwidth OB (columns 2 and 3 bootstrap the standard errors, where column 2 recalculates OB in each resample and column 3 does not); column 4 is OB; column 5 uses a bandwidth of 6 percentage points, column 6 a bandwidth of 3, and column 7 a bandwidth of 12. Sample includes all districts matched to 14 fiscal years. Each cell reports the coefficient with its t statistic in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001.

        (1) IK           (2) IKbs        (3) IKb2        (4) OB           (5) RD6          (6) RD3           (7) RD12
Pooled  -0.364 (-1.07)   -0.364 (-0.97)  -0.364 (-1.25)  3.423∗∗∗ (7.06)  -0.0337 (-0.12)  -0.593 (-1.60)    -0.190 (-1.03)
1983    1.219 (1.91)     1.219 (1.38)    1.219 (1.51)    6.352∗∗∗ (8.37)  1.219 (1.71)     0.719 (1.60)      0.608 (1.13)
1984    2.916∗∗∗ (5.30)  2.916 (0.14)    2.916 (0.84)    4.101∗∗∗ (4.73)  1.195 (1.00)     0 (.)             -1.437 (-1.02)
1985    4.827∗∗∗ (7.54)  4.827 (0.19)    4.827 (1.24)    3.849∗∗∗ (4.00)  2.525 (1.83)     0 (.)             -0.613 (-0.40)
1986    -1.380∗ (-2.13)  -1.380 (-0.97)  -1.380 (-0.92)  5.972∗∗∗ (6.49)  -1.378 (-1.82)   0.501∗∗ (2.68)    -1.041 (-1.61)
1987    -1.313∗ (-1.98)  -1.313 (-0.91)  -1.313 (-1.04)  6.323∗∗∗ (6.35)  -1.377 (-1.78)   0.332 (1.38)      -1.483∗ (-2.16)
1988    -0.524 (-0.70)   -0.524 (-0.24)  -0.524 (-0.24)  6.290∗∗∗ (7.10)  -0.504 (-0.54)   -1.390 (-0.81)    -0.460 (-0.78)
1989    -0.148 (-0.21)   -0.148 (-0.06)  -0.148 (-0.11)  7.294∗∗∗ (6.61)  -0.240 (-0.25)   -1.226 (-0.73)    -0.00812 (-0.01)
1990    -0.210 (-0.14)   -0.210 (-0.10)  -0.210 (.)      3.110∗∗ (2.62)   0.324 (0.14)     0 (.)             -0.536 (-0.41)
1991    0 (.)            0 (.)           0 (.)           0 (.)            0.210 (0.11)     0 (.)             -0.120 (-0.10)
1992    0.828 (0.75)     0.828 (0.10)    0.828 (0.62)    4.465∗∗∗ (3.84)  0.356 (0.39)     0.416 (0.24)      0.0421 (0.11)
1993    0.877 (1.14)     0.877 (0.56)    0.877 (0.54)    4.309∗∗∗ (4.29)  0.414 (0.62)     1.148 (0.88)      -0.145 (-0.36)
1994    -0.0865 (-0.11)  -0.0865 (-0.04) -0.0865 (-0.05) 4.954∗∗∗ (6.46)  0.164 (0.19)     -1.313∗∗∗ (-4.56) 0.145 (0.28)
1995    0.00403 (0.01)   0.00403 (0.00)  0.00403 (0.00)  5.693∗∗∗ (6.21)  0.0157 (0.02)    -0.611 (-0.83)    0.103 (0.24)
1996    -0.253 (-0.64)   -0.253 (-0.37)  -0.253 (-0.38)  5.168∗∗∗ (7.45)  -0.291 (-0.69)   -0.0533 (-0.11)   -0.117 (-0.33)

Chapter 4

Malapportionment in the U.S. House of Representatives: The Effect of Census Reapportionment on the Distribution of Federal Funds to States

4.1 Introduction

Every ten years, following the decennial census, the number of seats in the U.S. House of Representatives is reapportioned among states based on updated state population counts. The reapportionment process is meant to ensure that the House remains the chamber of equal representation of citizens that the Framers intended it to be. However, the fact that a sizable proportion of states gains or loses representatives after every census is evidence that the House becomes increasingly malapportioned over the course of a given decade. It is well established that malapportionment in the Senate affects the distribution of federal outlays to states. Do sharp changes in representation in the House induced by reapportionment also affect the distribution of federal funds? In the following analysis, I examine the patterns of relative state representation in the House and relative federal spending in states from 1972 to 2004 and then use a difference-in-differences approach to identify the effect of a change in representation following the 1970, 1980, 1990, and 2000 Censuses on the distribution of federal outlays in the early years of each decade.

I conclude that the House does indeed exhibit a cyclical pattern of malapportionment in the years between the censuses, but I cannot conclude with statistical certainty that states that gained or lost representatives due to reapportionment received substantially more or less federal funds than they would have had reapportionment not occurred. This null finding is in contrast to any theory that connects representation in Congress to bargaining over distributive goods (e.g., Ansolabehere, Snyder and Ting, 2003) and is in contrast to empirical evidence on the positive effect of representation on funding from the Senate (e.g., Lee, 2000), state legislatures (Ansolabehere, Gerber and Snyder, 2002), and other countries (e.g., Horiuchi and Saito, 2003). Rather than indicating a lack of support for the representation-funding connection, perhaps the null finding herein instead illustrates the limitations of the difference-in-differences strategy for this particular data, since the effect is identified in each decade by at most 11 states that experience a positive change in representation and at most 13 states that experience a negative change.

4.2 Apportioning the House

Members of the U.S. House of Representatives are supposed to represent equal numbers of constituents, as is evident from the debates at the Constitutional Convention, the text and amendments of the Constitution, and, more recently, Supreme Court decisions on congressional apportionment.

James Madison, Alexander Hamilton, Benjamin Franklin, and James Wilson were among the delegates who argued that members of the House should be apportioned to states based on state population and that districts within states should have equal populations. As James Wilson put it, representatives "of different districts ought clearly to hold the same proportion to each other, as their respective constituents hold to each other" (cited in McKay (1965, p. 93)). Article 1, Section 2 of the Constitution establishes the House of Representatives and provides that "Representatives...shall be apportioned among the several States which may be included within this Union, according to their respective Numbers." The 14th Amendment, which amended Article 1 to stipulate that all people count as whole persons, reiterates the equal apportionment requirement. Indeed, the original purpose of the Census was to ensure that congressional representatives were equally apportioned to states based on updated counts of states' populations. In addition to establishing the House of Representatives, Article 1, Section 2 also calls for a decennial census, an "Enumeration...made within every subsequent term of ten years." The first Census, conducted in 1790, increased the total number of representatives from the original 65 to 106, based on the Article 1 provision that "The Number of Representatives shall not exceed one for every thirty Thousand."

The number of representatives was linked to the U.S. population until 1911, when Congress fixed House membership at 435 (The Constitution, the Congress, and the Census: Representation and Reapportionment, 1999).

While the House no longer increases in size, representatives are still shifted from one state to another following the Census. For example, after the 2000 Census, eight states gained representatives and ten states lost representatives. Table 4.1 shows the number of representatives gained and lost by states affected by reapportionment in the 1970s, 1980s, 1990s, and 2000s. Some states, like Alabama, were affected only once over the past 40 years. Other states, like Arizona, gained representatives following each of the four Censuses, or, like Pennsylvania, lost representatives each time.

State legislatures are responsible for re-drawing congressional districts using the new Census information. Because redistricting is an inherently political process, it is typically fraught with conflict. While much recent disaccord has involved the shape of congressional districts, historically, conflict involved the size of districts, especially in states where rural areas were over-represented and urban areas were under-represented (Cortner, 1970). In the 1960s, the Supreme Court intervened, holding state legislative districts to a "one-man-one-vote" standard and applying this standard to congressional redistricting in two decisions from its wave of reapportionment rulings. The Court ruled in Wesberry v. Sanders (376 U.S. 1 (1964)) that Article 1, Section 2 required state legislatures to draw congressional districts such that "as nearly as practicable one man's vote in a congressional election is to be worth as much as another's." In Kirkpatrick v. Preisler (394 U.S. 526 (1969)), the Court further articulated that states must "make a good faith effort to achieve precise mathematical equality" in the redistricting process.

Since the Supreme Court clarified the meaning of Article 1, Section 2, state legislatures have generally been compliant in redrawing equally-sized congressional districts every ten years using updated Census data on local area populations. However, due to the infrequent occurrence of the Census, it is possible that districts and states become increasingly malapportioned over the course of the decade because of geographically differential demographic trends, like changing migration and fertility. Thus, even though state legislatures draw districts with "mathematical equality," there is no reason to expect that districts stay that way.

4.3 How Malapportioned is the House?

To understand the pattern of malapportionment in the House over time, I constructed a relative representation index (RRI), following Ansolabehere, Gerber and Snyder (2002). The RRI is the ratio of state representatives to population divided by the ratio of total representatives to total U.S. population, as shown below, where i indexes states and t indexes years.

\[ \mathrm{RRI}_{it} = \frac{\mathrm{reps}_{it}/\mathrm{pop}_{it}}{435/\mathrm{uspop}_t} \tag{4.1} \]

When a state’s RRI equals one, this is equivalent to apportionment under a “one person, one vote” rule. When a state’s RRI is less than one, it is underrepresented, and when a state’s RRI is greater than one, it is overrepresented. For example, a state with an RRI of 1.2 has 20 percent more representation in the House than would be expected based on a perfectly equal apportionment scheme. Over the time period 1972 to 2004, the average state RRI is 1.009, quite close to the equal representation ideal, with a of 0.120.

However, the variation in RRI increases in every year following a given reapportionment until the next decade's reapportionment occurs. Figures 4.1-4.4 show RRIs for each state in the years leading up to and immediately following the 1970, 1980, 1990, and 2000 Censuses. The long dashed line in each figure follows the average RRI of the states that gained representatives after the relevant Census ("gain states"), and the short dashed line represents the average RRI of the states that lost representatives ("lose states"). The solid line is the average for all other states.1

Though the Census is taken in every year ending in zero, state legislatures redraw districts in years ending in the digit 1, in time for congressional elections in years ending in 2. Thus, the first congressional sessions affected by reapportionment are the ones that begin in years ending in 3. As expected, then, the three RRI lines converge in 1973, 1983, 1993, and 2003, during the first sessions of the 93rd, 98th, 103rd, and 108th Congresses. In these years, the mean gain state RRI was .99, and the mean lose state RRI was .98. (The mean RRI in states with no change was 1.02.)

With the exception of the reapportionment-induced convergence in RRI between the years ending in 2 and those ending in 3, the general pattern depicted in Figures 4.1-4.4 is one of increasing malapportionment. Over the course of the decade preceding reapportionment, gain states became more and more underrepresented in the House, while lose states became more and more overrepresented. In the most malapportioned years, those ending in 2, the mean gain state RRI was .83, and the mean lose state RRI was 1.14. The most underrepresented state in the years right before reapportionment was Nevada in 2002, with an RRI of .61, or 39 percent less representation than what would be expected under a "one-person-one-vote" rule. The most overrepresented state was South Dakota in 1982, with an RRI of 1.54, or 54 percent more representation than what would be expected under a "one-person-one-vote" rule.2

1 To be clear, the "gain states" and "lose states" are not the same set of states in Figures 4.1-4.4. They are the states that experienced a change in representation in that particular Census, as listed in Table 4.1. The time series in Figures 4.1 and 4.4 are truncated because my data only go back to 1972 and up to 2004.

As the figures discussed above show, at least in recent decades, the House of Representatives does become increasingly malapportioned over time, until representatives are reallocated based on new Census population counts. Does this cyclical malapportionment substantively affect the pattern of distribution of federal outlays to states? Before discussing the data, I review the literature on representation in legislatures and funding to sub-national units.

4.4 The Apportionment-Funding Connection

The empirical literature on the impact of malapportionment on spending patterns is not large, but it is conclusive. The distribution of federal funds to U.S. states bears the imprint of malapportionment in the Senate (e.g., Atlas et al., 1995; Lee, 1998, 2000; Lee and Oppenheimer, 1999). Prior to the Supreme Court's 1960s apportionment decisions, malapportionment in state legislatures affected the distribution of state funds to localities (Ansolabehere, Gerber and Snyder, 2002). In countries with bicameral legislatures, the distribution of national monies to sub-national units is influenced by malapportionment in the upper chamber (e.g., Horiuchi and Saito, 2003). In short, more representation means more money. Should we expect the same in the U.S. House?

2 Interestingly, Montana, which lost one representative following the 1990 reapportionment, went from being the most overrepresented state in 1992, with an RRI of 1.42, to being the most underrepresented state in 1993, with an RRI of .71. While this is an extreme case, it illustrates the fact that, even right after reapportionment, any given state might deviate from an RRI of 1 because representatives are not fully divisible goods.

Lee (2000) argues that malapportionment in the Senate affects the distribution of federal money because, to a coalition builder, each Senator's vote counts the same but a small-state Senator's vote is cheaper. Given that relative representation is both more variable and more entrenched in the Senate than in the House, one may not expect that this mechanism would apply to the House of Representatives. However, even if the logic of bargaining in the House is dissimilar to the logic of bargaining in the Senate, malapportionment in the House may still affect the distribution of federal spending because it induces sudden and sharp changes in representation every ten years due to reapportionment. In other words, if the mere number of House representatives a state has affects the flow of federal outlays to that state, then malapportionment in the House will affect outlays through Census reapportionment, if not through a Senate-style bargaining game. Should we expect that the number of representatives a state has in the House is causally related to federal outlays to states?

There are certainly many legislative bargaining theories that predict such a connection. Ansolabehere, Snyder and Ting (2003) show that, when the House proposes a bill and the Senate considers it under a closed rule, the payoff for all House members is equal. Similarly, Lee and Oppenheimer (1999) argue that House members should want to divide funds approximately equally across districts. This description of bargaining in the House is consistent with universalistic, committee-centered theories of legislative organization (e.g., Collie, 1988; Mayhew, 1974; Shepsle and Weingast, 1981; Weingast and Marshall, 1988), which predict that policy will tend to be particularistic in the benefits conferred but will confer those benefits to all members' districts in a "universal" manner.

If any of these theories is correct, then the total number of representatives a state has should be centrally important to the amount of federal money distributed to that state. When the number of representatives changes from one year to the next, the amount of federal money should respond in the direction of the change.

4.5 Relative Federal Spending in the States

To take an initial look at the pattern of federal outlays to states around the 1970, 1980, 1990, and 2000 Censuses, I constructed a relative spending index (RSI), analogous to the RRI, again following Ansolabehere, Gerber and Snyder (2002). The RSI for a given state is the ratio of federal money distributed to the state to the state's population, divided by the ratio of the total amount of federal money distributed to all states to the U.S. population, as shown below, where i indexes states and t indexes years.3

\[ \mathrm{RSI}_{it} = \frac{\$_{it}/\mathrm{pop}_{it}}{\left(\sum_i \$_{it}\right)/\mathrm{uspop}_t} \tag{4.2} \]
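Analogously to the RRI, equation 4.2 divides a state's per-capita share of federal dollars by the national per-capita figure. A minimal sketch, with invented spending and population figures:

```python
def rsi(state_dollars: float, state_pop: float,
        all_dollars, us_pop: float) -> float:
    """Relative spending index (equation 4.2):
    (state $ / state pop) / (total $ across states / U.S. pop)."""
    return (state_dollars / state_pop) / (sum(all_dollars) / us_pop)

# Three hypothetical states; the first holds a quarter of the population
# but receives half the money, so its RSI is 2.
outlays = [200.0, 100.0, 100.0]          # federal dollars by state
rsi_a = rsi(outlays[0], 1_000_000, outlays, 4_000_000)
print(round(rsi_a, 6))  # 2.0: double the national per-capita rate
```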

Figures 4.5-4.8 show RSIs for each state in the years leading up to and immediately following the 1970, 1980, 1990, and 2000 Censuses. The long dashed line in each figure follows the average RSI of the gain states, and the short dashed line represents the average RSI of the lose states. The black line is the average for all other states.

In a given session of Congress, the House and Senate appropriations committees typically report bills that authorize discretionary spending for the upcoming fiscal year by July, and Congress passes the bills by October, the start of the new fiscal year (Schick, 2000). New nondiscretionary formula programs also usually begin in the year following their passage, at the earliest. Thus, any change in the allocation of federal funds to states caused by reapportionment beginning in years ending in 3 would not take effect until fiscal years ending in 4. Federal monies spent in fiscal years ending in 3 were authorized during Congressional sessions in years ending in 2, before reapportionment took effect. Thus, reapportionment-induced changes in representation between years 2 and 3 should cause changes in federal outlays to states between years 2 and 4. Federal outlays to the underrepresented gain states should increase after reapportionment relative to states with no change in representation. Outlays to the overrepresented lose states should decrease after reapportionment relative to states with no change in representation.

3 The data on federal outlays are gathered by the Census Bureau and published in the Consolidated Federal Funds Report (CFFR) and the Statistical Abstract of the United States. Total federal outlays include federal grants-in-aid to states, government employee wages, transfer payments from means-tested and entitlement programs, and support for various federal programs.

While the average gain state and lose state RSI does vary over time, such a pattern around the years ending in 2 and 4 is not apparent in Figures 4.5-4.8. In each figure, the change in RSI for gain and lose states is parallel to that of other states or appears to be small.

To verify that no significant relative changes in federal outlays occurred for states whose representation changed, I measure the changes with a difference-in-differences model in the following section.

4.6 Method and Results

Census reapportionment is a natural experiment around which to test the treatment effect of a change in representation. Because only some states gain or lose representatives due to reapportionment while other states retain the same number of representatives, the states that experience no change can act as a control group against which to compare the effect of a change in representation. A situation in which there exists a pre-treatment measurement and a post-treatment measurement of an outcome in a treated group and in a control group is the ideal situation in which to apply a difference-in-differences (DID) modeling approach.

In the simplest case of DID, the treatment effect (δ) is the difference between the average change in the outcome (y) in the treated group and the average change in the outcome in the control group.

\[ \delta = (\bar{y}_{\mathrm{treated,after}} - \bar{y}_{\mathrm{treated,before}}) - (\bar{y}_{\mathrm{control,after}} - \bar{y}_{\mathrm{control,before}}) \tag{4.3} \]

This is equivalent to estimating the following with ordinary least squares, where T_after is an indicator variable for post-treatment observations and Treated_i is an indicator variable for treatment group observations.

\[ y_{it} = \beta_0 + \beta_1 T_{\mathrm{after}} + \beta_2\,\mathrm{Treated}_i + \delta\,(T_{\mathrm{after}} \times \mathrm{Treated}_i) + \epsilon_{it} \tag{4.4} \]

In the above specification, β0 is a baseline average outcome, β1 is the average change in the outcome in the control group, β2 is the average pre-treatment difference in the outcome between the treated and control groups, and δ is the difference-in-differences, the change in the outcome attributable to the treatment.
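Equations 4.3 and 4.4 yield the same δ: because the regression is saturated in group-by-period cells, its fitted values are exactly the four cell means. A toy sketch with made-up outcomes (not the chapter's data):

```python
from statistics import mean

# Made-up outcomes: the treated group rises by 3 on average between the
# periods, the control group by 1, so the difference-in-differences is 2.
treated_before = [10.0, 11.0, 12.0]
treated_after  = [13.0, 14.0, 15.0]
control_before = [20.0, 21.0, 22.0]
control_after  = [21.0, 22.0, 23.0]

delta = (mean(treated_after) - mean(treated_before)) \
      - (mean(control_after) - mean(control_before))
print(delta)  # 2.0
```

Running the saturated OLS regression of equation 4.4 on the same eight observations would return this value as the coefficient on the interaction term.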

To test the treatment effect of a change in representation due to Census reapportionment on the amount of federal funding distributed to states, I compare the pattern of spending in the latest pre-treatment years (1972, 1982, 1992, and 2002) to that in the first treated years (1974, 1984, 1994, and 2004). I begin by estimating the following equation using ordinary least squares for each set of years, where $_it is the amount of federal money distributed to a state in a given year, T_y4 is an indicator variable for observations on states in years ending in 4, rcpos_i is an indicator variable for states that gained representatives following reapportionment, and rcneg_i is an indicator variable for states that lost representatives following reapportionment.4

\[ \log(\$_{it}) = \beta_0 + \beta_1 T_{y4} + \beta_2\,\mathrm{rcpos}_i + \beta_3\,\mathrm{rcneg}_i + \delta_1\,(T_{y4} \times \mathrm{rcpos}_i) + \delta_2\,(T_{y4} \times \mathrm{rcneg}_i) + \epsilon_{it} \tag{4.5} \]

Because there are two treatment groups, states that gained representatives and states that lost representatives, there are two difference-in-differences estimates, δ1 and δ2, which respectively measure the effect of a gain in representation and a loss in representation on the amount of federal money distributed to a state. Since the outcome variable is logged, the coefficients represent approximate percentage effects. The first columns in Tables 4.2 to 4.5 show the results of this estimation for each of the four Censuses.
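With two treatment groups, the saturated regression in equation 4.5 delivers δ1 and δ2 as two simple contrasts of cell means: each group's before-after change in mean log spending minus the no-change states' change. A sketch with invented cell data:

```python
from statistics import mean

# Invented log($) observations by (group, year-ending) cell.
cells = {
    ("gain", "y2"): [4.0, 4.2], ("gain", "y4"): [4.5, 4.7],
    ("lose", "y2"): [5.0, 5.2], ("lose", "y4"): [5.1, 5.3],
    ("none", "y2"): [4.4, 4.6], ("none", "y4"): [4.6, 4.8],
}
m = {cell: mean(vals) for cell, vals in cells.items()}

none_trend = m[("none", "y4")] - m[("none", "y2")]
delta1 = (m[("gain", "y4")] - m[("gain", "y2")]) - none_trend  # gain-state DID
delta2 = (m[("lose", "y4")] - m[("lose", "y2")]) - none_trend  # lose-state DID
print(round(delta1, 6), round(delta2, 6))  # 0.3 -0.1
```

In these made-up numbers, gain states' mean log spending rises 0.5 against a 0.2 control trend (δ1 = 0.3), while lose states rise only 0.1 (δ2 = -0.1).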

With the exception of lose states in 1970, there is a statistically significant difference in log federal expenditures between gain states and no change states and between lose states and no change states in the years just before reapportionment (indicated by the coefficients on "gain state" and "lose state").5

4 ε_it is a random error component. I assume ε_it ∼ N(0, σ²).

However, the changes in these gaps before and after reapportionment, the difference-in-differences estimates, are not statistically significant (indicated by the coefficients on the interaction terms). States that gained representatives received no more federal money, and states that lost representatives received no less federal money, than they would have had reapportionment not occurred.

The standard difference-in-differences model like the one discussed above assumes that, in the absence of the treatment, there would be no difference in the change in the outcome between the treated and control groups (δ = 0) (Wooldridge, 2005). This "parallelism" assumption can be relaxed by estimating other possibly confounding factors to ensure balance on these covariates between the treatment and control groups. Estimating other parameters also has the added advantage of making the estimates of the difference-in-differences coefficients more precise. The second columns in Tables 4.2 to 4.5 show the result of a difference-in-differences estimation controlling for log state population, log personal income, percent unemployed, percent black, percent aged 5 to 17, percent over 65, percent with a high school education, and percent with a bachelor's degree or higher.6

Not surprisingly, once state demographic characteristics are controlled for, the pre-treatment differences in federal funding between states that gained or lost representatives and states that experienced no change lose significance, with the exception of gain states in 1990. The time trend in funding in the control group is significant: states that experienced no change in representation received between approximately 8.3 percent (in 2004) and 16.4 percent (in 1984) more federal money on average than they did in the fiscal year just before reapportionment. However, while the confidence intervals around the difference-in-differences coefficients are several times smaller than in the models without covariates, the estimated treatment effects of a change in representation still fail to achieve statistical significance.

5 Note, the direction of these gaps is always positive in the regressions, but sometimes the average treated state's funding falls below the control states' funding in Figures 4.5 to 4.8. This is because the dependent variable in the regressions is log federal expenditures, while the Figures show RSI, which essentially controls for differences in population across the groups. Using RSI as the dependent variable in the basic DID regressions produces gaps that are of the same sign as those that appear in the Figures. The DID with covariates models are also more comparable to the Figures since they control for population, among other things.

6 State population is a Census estimate for July 1 of the relevant year. All other data are from the Census Bureau, except number unemployed, which is from the Bureau of Labor Statistics, and personal income, which is from the Bureau of Economic Analysis. Because I include demographic measures for reasons of balance and not substantive interest, I omit them from the table. Full regression results are available upon request.

Neither of the models discussed above exploits the panel nature of the data. Each assumes that the observations are random draws from treated and control groups before and after the treatment. However, since federal funding is measured in the same states before and after reapportionment, I can control for any unobserved time-invariant state characteristics by estimating a fixed effects least squares dummy variable model. Columns 3 and 4 in Tables 4.2 to 4.5 show the results of these models for each reapportionment, without and with the time-varying covariates from model 2.7

In both panel models in each decade, adding fixed effects does increase the precision with which the treatment effects of a positive and a negative change in representation due to reapportionment are measured, compared to the non-panel models. However, as in the other models, neither is statistically significant.8,9
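A minimal sketch of the least squares dummy variable (LSDV) version, again with synthetic data and illustrative names: the `C(state)` dummies absorb every time-invariant state characteristic, which is why the stand-alone gain and lose indicators drop out (footnote 7) and only the interactions remain.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical two-period state panel (illustrative, not the actual data).
rng = np.random.default_rng(1)
n = 50
df = pd.DataFrame({
    "state": np.repeat(np.arange(n), 2),
    "post": np.tile([0, 1], n),
})
gain = rng.random(n) < 0.2
lose = ~gain & (rng.random(n) < 0.25)
df["gain"] = gain[df["state"]].astype(int)
df["lose"] = lose[df["state"]].astype(int)
state_effect = rng.normal(0, 1, n)  # unobserved, time-invariant heterogeneity
df["log_fed_exp"] = state_effect[df["state"]] + 0.1 * df["post"] + rng.normal(0, 0.05, len(df))

# LSDV fixed-effects DID: C(state) absorbs the time-invariant gain/lose
# dummies, so only post and the post:gain / post:lose interactions enter.
fe = smf.ols("log_fed_exp ~ post + post:gain + post:lose + C(state)", data=df).fit()
print(fe.params[["post", "post:gain", "post:lose"]])
```

Sweeping out the state effects is what buys the extra precision reported in columns 3 and 4: any persistent cross-state level differences no longer inflate the residual variance.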

4.7 Discussion and Conclusion

Using a year-by-year measure of state-level representation in the U.S. House of Representatives, I showed that the House becomes increasingly malapportioned in every year following a given Census reapportionment until the next reapportionment corrects, as much as possible, any disparity in representation. This pattern is to be expected since the Census occurs only every ten years, while demographic patterns like migration and fertility are constantly changing across states. Though the House was intended to be the chamber of equal representation, this condition is not (and is not constitutionally required to be) monitored between every Congressional session.

7 Because rcpos_i (gain state status) and rcneg_i (lose state status) are time-invariant, they cannot be estimated separately in a fixed effects model.

8 In all panel models, an F-test rejects the null hypothesis that the state effects all equal zero, with p = 0.00. A Hausman specification test indicates that the effects are more appropriately modeled as fixed rather than random, with p at most .05.

9 I tried other ways of measuring spending. For example, I used Federal Aid to States (published in the Statistical Abstract of the United States) as the dependent variable, and RSI as the dependent variable, calculated using both Federal Expenditures and Federal Aid. I also tried pooling the data from each decade together. None of these strategies produced significant results. In the Appendix, I include graphs and regression results using spending data from the Federal Assistance Awards Data System (FAADS). Because data only exist for the years 1983 to 2002, I could only test for a treatment effect around the 1990 Census. This is the only data in which I found a positive treatment effect using the estimation strategies described herein.

Unlike malapportionment in the Senate, malapportionment in the House affects different sets of states in each decade. While some states lost or gained representatives in each reapportionment I examined here, most were affected only a few times, if ever at all. Thus, not only is malapportionment in the House periodically corrected, it is also less persistent for any given state over time than malapportionment in the Senate, because it is induced by relative population change and not population itself. In the House, states may become under (over) represented if the rate of increase in their populations is large (small) relative to that of other states, but not simply because they are large (small) states.

However, malapportionment in the House may still be normatively concerning if it indeed affects the amount of federal outlays states receive. If it does so, then the fastest growing states would be comparatively under-funded while the slowest growing or shrinking states would be comparatively over-funded.

Using a difference-in-differences model, I found no evidence that malapportionment in the House affects federal outlays to states. This is contrary to the predictions of universalistic bargaining theories and to evidence from the U.S. Senate, U.S. state legislatures, and other countries' legislatures. Thus, the null finding could simply be a function of the particular data and estimation strategy I chose.

The DID models I estimated are strict tests of the treatment effect of representation on spending because I look for an effect only in the first year after reapportionment. Thus any change in spending would have to occur right away. If, for example, new legislators do not have the same bargaining power as older legislators (contrary to the assumptions of universalistic bargaining models), either because they have undesirable committee seats, have no potential logrolls to offer, or simply do not yet know the ropes, then it may take longer than one Congressional session for the effect of increased representation to take hold. On the flip side, it may also take longer than one session for the effect of a decrease in representation to take hold. If most spending measured in the CFFR data is not actually pork barrel or even discretionary, but instead tied to formulas that require amending laws to change, then again, the decreased size of a state's delegation may not immediately matter. Similarly, if the effect exists in the first year but is small, then the test I conducted may not be statistically powerful enough to detect it.

However, from a modeling standpoint, identifying a difference-in-differences effect using only the time periods immediately preceding and following treatment is the safest choice when the data are likely to exhibit serial correlation. Bertrand, Duflo and Mullainathan (2004) show that DID estimates in serially correlated data can have standard errors that are inconsistent, specifically too small, and thus can lead to false positive conclusions about the effect of a treatment. Year-to-year federal outlays to states surely exhibit such correlation.
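The mechanics of this problem can be illustrated with a small simulation, under assumptions of my own (an AR(1) error at the state level; not the analysis in this chapter). With many serially correlated years per state, the default iid standard error on a DID term understates the true sampling variability; clustering by state, one standard correction discussed in this literature, typically produces a much larger standard error.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative simulation: a 50-state, 10-year panel with AR(1)
# state-level errors and a placebo treatment (true DID effect = 0).
rng = np.random.default_rng(2)
n_states, n_years = 50, 10
rows = []
for s in range(n_states):
    treated = int(s < 10)
    e = 0.0
    for t in range(n_years):
        e = 0.8 * e + rng.normal(0, 0.1)  # serially correlated error
        rows.append({"state": s, "post": int(t >= 5), "treated": treated, "y": e})
df = pd.DataFrame(rows)

ols = smf.ols("y ~ post * treated", data=df).fit()
clustered = ols.get_robustcov_results(cov_type="cluster", groups=df["state"].to_numpy())

# Compare default and cluster-robust standard errors on the DID term;
# under this much serial correlation the default is far too small.
i = list(ols.params.index).index("post:treated")
print(ols.bse["post:treated"], clustered.bse[i])
```

Restricting attention to the two periods straddling treatment, as in the models above, sidesteps this issue entirely because there is no within-state time series left to be serially correlated.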

Indeed, using the same federal funds data as in this analysis but a different difference-in-differences estimation strategy, Elis, Malhotra and Meredith (2009) find a positive effect on funding due to increased representation in the House in pooled data from the 1970s, 1990s, and 2000s that includes all pre- and post-treatment years instead of just those immediately preceding and following reapportionment.10 Unfortunately, it is not possible to know whether the difference in our findings is due to the fact that the treatment of a change in representation takes time to affect outlays, to a false positive result due to serial correlation, or to some combination of the two.

In conclusion, even though Census reapportionment presents a natural experiment around which to test whether a change in representation in the House causes a corresponding change in federal outlays to states, I find, in this paper, no evidence for such an effect using a conservative difference-in-differences estimation.

10 They omit the 1980s because, in 1983, the Census switched from reporting total federal outlays to states in the Statistical Abstract of the United States to reporting them as part of the Consolidated Federal Funds Report, and so they could not confirm the comparability of the data for the 1980 reapportionment. When I reanalyzed their results, I found that including the 1980 reapportionment data produces small, insignificant, and inconsistently signed effects of increased representation in each of their specifications, similar to what I find here.

Figure 4.1: Relative Representation in the U.S. House by State, 1972-1982

Figure 4.2: Relative Representation in the U.S. House by State, 1973-1992

Figure 4.3: Relative Representation in the U.S. House by State, 1983-2002

Figure 4.4: Relative Representation in the U.S. House by State, 1993-2004

Figure 4.5: Relative Federal Spending in States, 1972-1982

Figure 4.6: Relative Federal Spending in States, 1973-1992

Figure 4.7: Relative Federal Spending in States, 1983-2002

Figure 4.8: Relative Federal Spending in States, 1993-2004

Table 4.1: Change in Representatives in the U.S. House Following the 1970, 1980, 1990, and 2000 Censuses

1970                                    1990
Arizona        1    Alabama       -1    Arizona         1    Illinois      -2
California     5    Iowa          -1    California      7    Iowa          -1
Colorado       1    New York      -2    Florida         4    Kansas        -1
Florida        3    North Dakota  -1    Georgia         1    Kentucky      -1
Texas          1    Ohio          -1    North Carolina  1    Louisiana     -1
                    Pennsylvania  -2    Texas           3    Massachusetts -1
                    Tennessee     -1    Virginia        1    Michigan      -2
                    West Virginia -1    Washington      1    Montana       -1
                    Wisconsin     -1                         New Jersey    -1
                                                             New York      -3
                                                             Ohio          -2
                                                             Pennsylvania  -2
                                                             West Virginia -1

1980                                    2000
Arizona        1    Illinois      -2    Arizona         2    Connecticut   -1
California     2    Indiana       -1    California      1    Illinois      -1
Colorado       1    Massachusetts -1    Colorado        1    Indiana       -1
Florida        4    Michigan      -1    Florida         2    Michigan      -1
Nevada         1    Missouri      -1    Georgia         2    Mississippi   -1
New Mexico     1    New Jersey    -1    Nevada          1    New York      -2
Oregon         1    New York      -5    North Carolina  1    Ohio          -1
Tennessee      1    Ohio          -2    Texas           2    Oklahoma      -1
Texas          3    Pennsylvania  -2                         Pennsylvania  -2
Utah           1    South Dakota  -1                         Wisconsin     -1
Washington     1

Source: Census Bureau

Table 4.2: Fiscal Impact of Reapportionment Following the 1970 Census: Difference-in-Differences Estimates

                     DID               DID with          Panel DID         Panel DID with
                                       Covariates                          Covariates
1974                 0.22              0.14              0.21              0.59
                     (-0.23 - 0.66)    (0.03 - 0.25)*    (0.18 - 0.25)**   (0.28 - 0.90)**
gain state           1.18              0.06
                     (0.30 - 2.06)**   (-0.13 - 0.26)
lose state           0.66              0.08
                     (-0.03 - 1.35)    (-0.06 - 0.23)
1974 * gain state    0.00              -0.06             0.00              -0.03
                     (-1.25 - 1.25)    (-0.31 - 0.19)    (-0.10 - 0.10)    (-0.15 - 0.09)
1974 * lose state    -0.03             -0.01             -0.03             -0.03
                     (-1.01 - 0.95)    (-0.12 - 0.18)    (-0.11 - 0.05)    (-0.11 - 0.06)
R2                   0.18              0.97              0.81              0.86

95% confidence intervals in parentheses; * significant at 5%; ** significant at 1%

Table 4.3: Fiscal Impact of Reapportionment Following the 1980 Census: Difference-in-Differences Estimates

                     DID               DID with          Panel DID         Panel DID with
                                       Covariates                          Covariates
1984                 0.20              0.16              0.20              0.32
                     (-0.27 - 0.66)    (0.06 - 0.27)**   (0.17 - 0.22)**   (0.21 - 0.43)**
gain state           0.75              0.12
                     (0.13 - 1.38)*    (-0.01 - 0.25)
lose state           1.17              0.04
                     (0.53 - 1.82)**   (-0.09 - 0.17)
1984 * gain state    0.00              -0.03             0.00              -0.03
                     (-0.89 - 0.88)    (-0.18 - 0.13)    (-0.05 - 0.04)    (-0.10 - 0.03)
1984 * lose state    -0.01             -0.01             -0.01             0.01
                     (-0.92 - 0.90)    (-0.18 - 0.15)    (-0.06 - 0.04)    (-0.05 - 0.06)
R2                   0.25              0.98              0.90              0.93

95% confidence intervals in parentheses; * significant at 5%; ** significant at 1%

Table 4.4: Fiscal Impact of Reapportionment Following the 1990 Census: Difference-in-Differences Estimates

                     DID               DID with          Panel DID         Panel DID with
                                       Covariates                          Covariates
1994                 0.10              0.09              0.10              0.10
                     (-0.32 - 0.52)    (0.01 - 0.16)*    (0.09 - 0.11)**   (0.04 - 0.16)**
gain state           1.52              0.01
                     (0.88 - 2.15)**   (-0.12 - 0.15)
lose state           0.94              -0.03
                     (0.41 - 1.47)**   (-0.13 - 0.07)
1994 * gain state    0.02              0.00              0.02              0.02
                     (-0.87 - 0.92)    (-0.16 - 0.16)    (-0.003 - 0.05)   (-0.01 - 0.05)
1994 * lose state    0.01              0.03              0.01              0.02
                     (-0.74 - 0.76)    (-0.10 - 0.16)    (-0.01 - 0.03)    (-0.01 - 0.05)
R2                   0.39              0.98              0.92              0.93

95% confidence intervals in parentheses; * significant at 5%; ** significant at 1%

Table 4.5: Fiscal Impact of Reapportionment Following the 2000 Census: Difference-in-Differences Estimates

                     DID               DID with          Panel DID         Panel DID with
                                       Covariates                          Covariates
2004                 0.10              0.08              0.10              0.13
                     (-0.31 - 0.52)    (-0.00 - 0.17)    (0.09 - 0.12)**   (0.04 - 0.23)**
gain state           1.18              -0.15
                     (0.53 - 1.83)**   (-0.27 - -0.02)*
lose state           1.02              -0.05
                     (0.43 - 1.61)**   (-0.17 - 0.07)
2004 * gain state    0.04              -0.03             0.04              -0.01
                     (-0.88 - 0.95)    (-0.19 - 0.13)    (-0.01 - 0.08)    (-0.06 - 0.05)
2004 * lose state    -0.01             0.03              -0.01             0.00
                     (-0.84 - 0.83)    (-0.11 - 0.18)    (-0.04 - 0.03)    (-0.04 - 0.04)
R2                   0.31              0.98              0.82              0.88

95% confidence intervals in parentheses; * significant at 5%; ** significant at 1%

4.8 Appendix

When I began this project, the first data I used were from the Federal Assistance Awards Data System (FAADS), which is gathered by the Census Bureau and described in Bickers and Stein (1992). It includes spending from all federal domestic assistance programs (for example, social security, agricultural subsidies paid to producers, research grants, and community development grants) and covers the years 1983 to 2002. Unlike the CFFR data presented in the main body of the paper, FAADS does not include data on spending from federal procurements, government employee wages, and certain loan programs. FAADS data is closer to what political scientists consider to be pork, or discretionary, particularistic federal spending in states.

Using the same estimation strategy as I describe in the main paper, but with the FAADS data as the dependent variable, I estimated a positive treatment effect from a gain in representation following reapportionment on the amount of federal money a state receives. The effect was statistically significant only in the two panel models but of approximately the same magnitude across all four specifications. The effect can also be clearly seen in the graph of RSI over time: between the years 1992 and 1994, the average RSI of gain states noticeably increases relative to states with no change.

Because I could only get FAADS data from 1983 to 2002, I could only test the 1990 reapportionment cycle. I turned to other data sources in an effort to replicate the 1990 finding in case there was something historically singular about that reapportionment. There are plenty of reasons to suspect so. Between 1992 and 1994, Democrats lost control of the House for the first time in decades. Also, the 1990 Census population counts were widely acknowledged to be less accurate than usual. And indeed, I was not able to replicate the effect I found in the 1990s FAADS data, even using 1990 data from other sources. The RSI graph and regression results using the FAADS data are below.

Figure 4.9: Relative Federal Spending in States, 1983-2002 (FAADS Data)

Table 4.6: Fiscal Impact of Reapportionment Following the 1990 Census: Difference-in-Differences Estimates (FAADS Data)

                     DID               DID with          Panel DID         Panel DID with
                                       Covariates                          Covariates
1994                 0.24              0.21              0.24              0.20
                     (-0.17 - 0.66)    (0.10 - 0.31)**   (0.19 - 0.29)**   (-0.00 - 0.40)
gain state           1.51              0.07
                     (0.89 - 2.12)**   (-0.10 - 0.25)
lose state           1.11              0.13
                     (0.58 - 1.63)**   (-0.01 - 0.27)
1994 * gain state    0.15              0.11              0.15              0.14
                     (-0.73 - 1.04)    (-0.10 - 0.32)    (0.05 - 0.26)**   (0.03 - 0.26)*
1994 * lose state    -0.03             -0.05             -0.03             0.00
                     (-0.77 - 0.71)    (-0.22 - 0.13)    (-0.12 - 0.06)    (-0.10 - 0.10)
R2                   0.43              0.97              0.81              0.85

95% confidence intervals in parentheses; * significant at 5%; ** significant at 1%

Part II

Bibliography

Agresti, Alan and Brent A. Coull. 1998. "Approximate Is Better than 'Exact' for Interval Estimation of Binomial Proportions." The American Statistician 52(2):119–126. URL: http://www.jstor.org/stable/2685469

Alvarez, R. Michael and Jason L. Saving. 1997a. "Congressional Committees and the Political Economy of Federal Outlays." Public Choice 92:55–73.

Alvarez, R. Michael and Jason L. Saving. 1997b. "Deficits, Democrats, and Distributive Benefits: Congressional Elections and the Pork Barrel in the 1980s." Political Research Quarterly 50:809–831.

Angrist, Joshua D. and Jorn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

Ansolabehere, Stephen, Alan Gerber and James Snyder. 2002. "Equal Votes, Equal Money: Court-Ordered Redistricting and Public Expenditures in the American States." The American Political Science Review 96:767–777.

Ansolabehere, Stephen, James M. Snyder and Michael M. Ting. 2003. "Bargaining in Bicameral Legislatures: When and Why Does Malapportionment Matter?" American Political Science Review 97:471–481.

Arai, Mahmood. 2009. "Cluster-Robust Standard Errors using R." URL: http://people.su.se/ma/clustering.pdf

Arceneaux, Kevin and David W. Nickerson. 2009. "Modeling Certainty with Clustered Data: A Comparison of Methods." Political Analysis 17:177–190.

Atlas, Cary M., Thomas W. Gilligan, Robert J. Hendershott and Mark A. Zupan. 1995. "Slicing the Federal Government Net Spending Pie: Who Wins, Who Loses, and Why." The American Economic Review 85:624–629.

Balla, Steven J., Eric D. Lawrence, Forrest Maltzman and Lee Sigelman. 2002. "Partisanship, Blame Avoidance, and the Distribution of Legislative Pork." American Journal of Political Science 46:515–525.

Baum, Christopher F., Austin Nichols and Mark E. Schaffer. 2010. "Evaluating one-way and two-way cluster-robust covariance matrix estimates." United Kingdom Stata Users' Group Meetings 2010.

Berry, Christopher R., Barry C. Burden and William G. Howell. 2008. "Congress, in Theory: Subjecting Canonical Models of Distributive Politics to Basic (but Long Overdue) Empirical Tests." Presented at the annual meeting of the Midwest Political Science Association.

Bertrand, Marianne, Esther Duflo and Sendhil Mullainathan. 2004. "How Much Should We Trust Difference-in-Differences Estimates?" Quarterly Journal of Economics 119(1):249–275.

Bickers, Kenneth N. and Robert M. Stein. 1992. Federal Domestic Outlays 1983-1990: A Databook. M.E. Sharpe, Inc.

Bickers, Kenneth N. and Robert M. Stein. 2000. "The Congressional Pork Barrel in a Republican Era." Journal of Politics 62:1070–1086.

Carsey, Thomas M. and Barry Rundquist. 1999. "Party and Committee in Distributive Politics: Evidence from Defense Spending." The Journal of Politics 61:1156–1169.

Caughey, Devin and Jasjeet S. Sekhon. 2011. "Elections and the Regression Discontinuity Design: Lessons from Close U.S. House Races, 1942-2008." Political Analysis 19:385–408.

Collie, Melissa P. 1988. "Universalism and the Parties in the U.S. House of Representatives, 1921-1980." American Journal of Political Science 32:865–883.

Cooper, Joseph and David Brady. 1981. "Institutional Context and Leadership Style: The House from Cannon to Rayburn." American Political Science Review 75:411–425.

Cortner, Richard C. 1970. The Apportionment Cases. University of Tennessee Press.

Cox, Gary W. and Mathew D. McCubbins. 1993. Legislative Leviathan: Party Government in the House. University of California Press.

Donohue, John J. and Justin Wolfers. 2006. "Uses and Abuses of Empirical Evidence in the Death Penalty Debate." Stanford Law Review 58:791–835.

Edgington, Eugene S. and Patrick Onghena. 2007. Randomization Tests. 4th ed. Chapman & Hall/CRC.

Elis, Roy, Neil Malhotra and Marc Meredith. 2009. "Apportionment Cycles as Natural Experiments." Political Analysis 17:358–376.

Erikson, Robert S., Pablo M. Pinto and Kelly T. Rader. 2010. "Randomization Tests and Multi-Level Data in U.S. State Politics." State Politics and Policy Quarterly 10:180–198.

Erikson, Robert S., Pablo M. Pinto and Kelly T. Rader. N.d. "Dyadic Research in International Relations: A Cautionary Tale." Working paper.

Ernst, Michael D. 2004. "Permutation Methods: A Basis for Exact Inference." Statistical Science 19(4):676–685.

Fisher, R. A. 1935. The Design of Experiments. Edinburgh: Oliver and Boyd.

Good, Phillip. 1994. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer-Verlag.

Greene, William H. 2000. Econometric Analysis. 4th ed. Prentice Hall.

Hahn, Jinyong, Petra Todd and Wilbert Van der Klaauw. 2001. "Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design." Econometrica 69:201–209.

Helland, Eric and Alexander Tabarrok. 2004. "Using Placebo Laws to Test More Guns, Less Crime." Advances in Economic Analysis and Policy 4:1–7.

Ho, Daniel E. and Kosuke Imai. 2006. "Randomization Inference with Natural Experiments: An Analysis of Ballot Effects in the 2003 California Recall Election." Journal of the American Statistical Association 101:888–900.

Horiuchi, Yusaku and Jun Saito. 2003. "Reapportionment and Redistribution: Consequences of Electoral Reform in Japan." American Journal of Political Science 47(4):669–682. URL: http://dx.doi.org/10.1111/1540-5907.00047

Huber, Peter J. 1967. "The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions." In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press.

Imbens, Guido and Karthik Kalyanaraman. 2009. "Optimal Bandwidth Choice for the Regression Discontinuity Estimator." URL: http://www.nber.org/papers/w14726

Keele, Luke, Corrine McConnaughy and Ismail White. 2008. "Randomization Tests With Experimental Data." Presented at the annual meeting of the Midwest Political Science Association.

Kennedy, Peter E. 1995. "Randomization Tests in Econometrics." Journal of Business and Economic Statistics 13:85–94.

Kennedy, Peter E. and Brian S. Cade. 1996. "Randomization Tests for Multiple Regression." Communications in Statistics: Simulation and Computation 34:923–36.

Kezdi, Gabor. 2004. "Robust Standard Error Estimation in Fixed-Effects Panel Models." Hungarian Statistical Review 9:95–116.

Kloek, T. 1981. "OLS Estimation in a Model Where a Microvariable is Explained by Aggregates and Contemporaneous Disturbances are Equicorrelated." Econometrica 49(1):205–207.

Lax, Jeffrey R. and Kelly T. Rader. 2010. "Legal Constraints on Supreme Court Decision Making: Do Jurisprudential Regimes Exist?" Journal of Politics 72(2):273–284.

Lee, David S. 2008. "Randomized Experiments from Non-Random Selection in U.S. House Elections." Journal of Econometrics 142:675–697.

Lee, David S., Enrico Moretti and Matthew J. Butler. 2004. "Do Voters Affect or Elect Policies? Evidence from the U.S. House." The Quarterly Journal of Economics 119(3):807–859.

Lee, Frances E. 1998. "Representation and Public Policy: The Consequences of Senate Apportionment for the Geographic Distribution of Federal Funds." The Journal of Politics 60:34–62.

Lee, Frances E. 2000. "Senate Representation and Coalition Building in Distributive Politics." American Political Science Review 94:59–72.

Lee, Frances E. 2003. "Geographic Politics in the U.S. House of Representatives: Coalition Building and Distribution of Benefits." American Journal of Political Science 47:714–728.

Lee, Frances E. and Bruce I. Oppenheimer. 1999. Sizing up the Senate: The Unequal Consequences of Equal Representation. University of Chicago Press.

Leoni, Eduardo L. 2009. "Analyzing Multiple Surveys: Results from Monte Carlo Experiments." Political Analysis, forthcoming.

Levitt, Steven D. and James M. Snyder. 1995. "Political Parties and the Distribution of Federal Outlays." American Journal of Political Science 39:958–980.

Liang, Kung-Yee and Scott L. Zeger. 1986. "Longitudinal data analysis using generalized linear models." Biometrika 73(1):13–22.

Manly, Brian F. J. 1997. Randomization, Bootstrap and Monte Carlo Methods in Biology. 2nd ed. London: Chapman & Hall.

Mayhew, David R. 1974. Congress: The Electoral Connection. Yale University Press.

McKay, Robert B. 1965. Reapportionment: The Law and Politics of Equal Representation. Twentieth Century Fund.

Moir, Robert. 1998. "A Monte Carlo Analysis of the Fisher Randomization Technique: Reviving Randomization for Experimental Economists." Experimental Economics 1:87–100.

Moore, David S., George P. McCabe, William M. Duckworth and Stanley L. Sclove. 2003. The Practice of Business Statistics, Companion Chapter 18: Bootstrap Methods and Permutation Tests. W. H. Freeman.

Moulton, Brent R. 1986. "Random group effects and the precision of regression estimates." Journal of Econometrics 32(3):385–397.

Moulton, Brent R. 1990. "An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units." The Review of Economics and Statistics 72(2):334–338.

Nichols, Austin. 2007. "Causal Inference with Observational Data." The Stata Journal 7:507–541.

Nichols, Austin and Kelly T. Rader. 2007. "Spending in the Districts of Marginal Incumbent Victors in the House of Representatives." Unpublished working paper.

Nichols, Austin and Mark Schaffer. 2007. "Clustered Standard Errors in Stata." URL: http://repec.org/usug2007/crse.pdf

O'Gorman, Thomas W. 2005. "The Performance of Randomization Tests that Use Permutations of Independent Variables." Communications in Statistics: Simulation and Computation 34:895–908.

Park, Hun Myoung. 2008. Hypothesis Testing and the Statistical Power of a Test. Technical working paper. University Information Technology Services (UITS) Center for Statistical and Mathematical Computing, Indiana University.

Primo, David M., Matthew L. Jacobsmeier and Jeffrey Milyo. 2007. "Estimating the Impact of State Policies and Institutions with Mixed-Level Data." State Politics and Policy Quarterly 7:446–459.

Richards, Mark J. and Herbert M. Kritzer. 2002. "Jurisprudential Regimes in Supreme Court Decision Making." American Political Science Review 96(2):305–20.

Rogers, William. 1993. "Regression Standard Errors in Clustered Samples." Stata Technical Bulletin 13:19–23.

Rohde, David. 1991. Parties and Leaders in the Postreform House. University of Chicago Press.

Rosenbaum, Paul R. 2002. "Covariance Adjustment in Randomized Experiments and Observational Studies." Statistical Science 17:286–327.

Rosenbaum, Paul R. 2009. Design of Observational Studies. Springer Series in Statistics.

Schick, Allen. 2000. The Federal Budget: Politics, Policy, Process. Revised ed. Brookings Institution Press.

Shepsle, Kenneth A. and Barry R. Weingast. 1981. "Political Preferences for the Pork Barrel: A Generalization." American Journal of Political Science 25:96–111.

Stein, Robert M. and Kenneth N. Bickers. 1994. "Congressional Elections and the Pork Barrel." The Journal of Politics 56:377–399.

The Constitution, the Congress, and the Census: Representation and Reapportionment. 1999. Census Bureau. URL: http://www.census.gov/dmd/www/dropin7.htm

Thistlethwaite, Donald L. and Donald T. Campbell. 1960. "Regression-Discontinuity Analysis: An Alternative to the Ex-Post Facto Experiment." Journal of Educational Psychology 51:309–317.

Walker, Robert. 2010. "Randomization Inference and Hidden Instruments: Voting Hours, Sufficient Statistics, and Heterogeneous Treatment Effects in a Likelihoodist Framework." Presentation at the Midwest Political Science Conference.

Weingast, Barry R. and William J. Marshall. 1988. "The Industrial Organization of Congress; or, Why Legislatures, Like Firms, Are Not Organized as Markets." Journal of Political Economy 96:132–163.

White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica 48:817–838.

Wolfinger, Raymond E., Benjamin Highton and Megan Mullin. 2005. "How Postregistration Laws Affect the Turnout of Citizens Registered to Vote." State Politics and Policy Quarterly 5:1–23.

Wooldridge, Jeffrey M. 2005. Introductory Econometrics: A Modern Approach. 3rd ed. Thomson/South-Western.

Wooldridge, Jeffrey M. 2003. "Cluster-Sample Methods in Applied Econometrics." American Economic Review 93:113–138.