A Comment on the King Ecological Inference Solution
Total Page:16
File Type:pdf, Size:1020Kb
I the Assumption Fits ::: : A Comment on the King Ecological Inference Solution Wendy K. Tam Cho Abstract I examine a recently prop osed solution to the ecological inference prob- lem King 1997. It is asserted that the prop osed mo del is able to recon- struct individual-level b ehavior from aggregate data. I discuss in detail b oth the b ene ts and limitations of this mo del. The assumptions of the basic mo del are often inappropriate for instances of aggregate data. The extended version of the mo del is able to correct for some of these limitations. However, it is dicult in most cases to apply the extended mo del prop erly. Intro duction For a wide variety of questions, esp ecially those which involve histori- cal researchor volatile issues such as race, aggregate data supply one of the only sources of reliable data. However, making inferences ab out micro-level units when the only available data are aggregated ab ove the micro-level unit in question is extremely dicult. Consider the example of aggregate data analysis in investigating the gender gap by estimat- ing presidential voting b ehavior among women. For each precinct, the Thanks to Bruce Cain, Douglas Rivers, Gary King, Walter Mebane, Henry Brady, Chris Achen, David Freedman, Brian Gaines, Jasjeet Sekhon, Simon Jack- man, Michael Herron, Rui de Figueiredo, JakeBowers, Cara Wong, Mike Alvarez, and Jonathan Nagler for helpful comments. I am also grateful to participants at a CIC Video Metho ds Seminar and at the University of California, Berkeley, for lively and helpful discussions, and to the National Science Foundation Grant No. SBR-9806448 for research supp ort. 144 Political Analysis number of votes received by each presidential candidate is known. In addition, the numb er of registered female voters and the numb er of reg- istered male voters is known. However, b ecause of the secret ballot, the manner in which the votes were cast is unknown. In the absence of re- liable survey data, one can determine the number of men and women who voted for each candidate only by mo deling the situation. There are avariety of assumptions and many statistical metho ds up on which such a mo del can b e formed. Whenever a statistical mo del is employed, one should consider the implications of the mo del's assumptions. All aggregate data mo dels in- corp orate assumptions whichmust b e taken under careful consideration. The merits of applying a statistical mo del to a problem necessarily de- p end on the consistency of the assumptions with the problem at hand. The King 1997 mo del hereafter referred to as \EI" for reconstructing individual b ehavior from aggregate data is no exception. In this pap er, I examine the b ene ts and limitations of applying EI to instances of aggregate data. The Basic EI Mo del There are basically two \ avors" of EI. For ease, one will be called \basic EI" and the other will b e referred to as \extended EI." Basic EI merits its own discussion b ecause it is claimed to b e adequate in many situations King 1997, 284. For this reason, I discuss basic EI rst, and then examine the extended mo del, with particular attention to its ability to comp ensate for the shortcomings of the basic mo del. The Basic Mo del. EI is a notable advancement to ecological in- ference in that it incorp orates two ideas which are ideally suited for ag- gregate data, but whichhave never previously b een utilized in tandem. Together, these elements bring a new degree of eciency to aggregate data analysis. The rst of these two ideas is the deterministic metho d of b ounds, rst intro duced by Duncan and Davis 1953. The metho d of b ounds narrows the range of p ossible parameter estimates. Since we are normally interested in estimating probabilities or prop ortions, the range of p ossible parameter values is immediately restricted to the closed in- terval [0; 1]. Through the metho d of b ounds, we can usually restrict this range even further. However, while the metho d of b ounds can, in theory, provide extremely narrow b ounds, it rarely do es so in practice. The EI b ounds are an obvious extension of the Duncan and Davis b ounds and have b een known and used previously Shively 1974; Ansolab ehere and Rivers 1997. The additional information incorp orted in these b ounds is still not generally sucient to makeinteresting substantive claims. Since the metho d of b ounds is generally not sucient in and of I the Assumption Fits ::: 145 itself, EI incorp orates a second probabilistic comp onent, the random co ecient mo del, largely p opularized bySwamy 1971. In particular, EI assumes that the parameters are not constant and that the parameter variation can b e describ ed by a truncated bivariate normal distribution. In other words, the assumption is that the parameters \have something in common|that they vary but are at least partly dep endentuponone another" King 1997, 93. The suggestion that a random co ecient mo del may b e b etter suited than OLS for aggregate data was originally made by Go o dman 1959. King's contribution, then, is just the choice of distribution. The basic EI mo del incorp orates three assumptions King 1997, 158. First, the parameters are assumed to be distributed according to a truncated bivariate normal distribution. Second, the parameters are assumed to be uncorrelated with the regressors. In other words, \aggregation bias" is not present King 1997, 55. Lastly, it is assumed that the data do not exhibit any spatial auto correlation. Since the ba- sic mo del is not appropriate for every instance of aggregate data King 1997, 24, it is useful to determine how robust the basic mo del is to deviations from its assumptions and thus to determine when the basic mo del is appropriate. Monte Carlo Simulations. The assumptions of the mo del can b e examined through Monte Carlo simulations. King 1997 p erformed some of these tests for basic EI. In particular, one Monte Carlo exp eri- ment with data inconsistent with the spatial auto correlation assumption but consistent with the distributional assumption and the assumption of uncorrelated parameters and regressors was p erformed King 1997, 168, Table 9.1. Another Monte Carlo simulation included data whichwere inconsistent with the distributional assumption but consistent with the spatial auto correlation assumption and the assumption of uncorrelated parameters and regressors King 1997, 189, Table 9.2. A nal Monte Carlo simulation generated data inconsistent with the no aggregation bias assumption but consistent with b oth the distributional and spatial auto correlation assumptions King 1997, 179{182, = 0 case. Follow- ing are three Monte Carlo simulations which exactly replicate King's setup. In addition, to gain a b enchmark for comparison, the results are compared to those obtained using OLS as the aggregate data mo del. Consider rst the consequences of spatial auto correlation in aggre- gate data. The results of a Monte Carlo simulation are rep orted in Table 1. Each rowof the table summarizes 250 simulations drawn from the mo del with the degree of spatial auto correlation and number of ob- 1 servations p. The data were generated in exactly the same manner as 1 Due to computational problems resulting from a lack of RAM, the simulations 146 Political Analysis TABLE 1. Consequences of Spatial Auto correlation OLS Basic EI p Error S.D. p Error S.D. 0 100 .0224 .0167 100 .001 .020 0 750 .0085 .0065 1,000 .000 .007 .3 100 .0227 .0165 100 .001 .020 .3 750 .0082 .0063 1,000 .000 .006 .7 100 .0221 .0161 100 .001 .022 .7 750 .0079 .0060 1,000 .001 .006 describ ed in King 1997, 166. The rep orted results for basic EI are taken directly from Table 9.1 in King 1997, 168. While one might exp ect spatial auto correlation to b e problematic in aggregate analysis, this is clearly not the case if the data are consistent with the other two assumptions. The Monte Carlo evidence implies that spatial auto correlation, on its own, do es not induce bias into either the OLS or EI mo del. While the error for EI is smaller and approaches zero faster, the OLS results are similarly favorable. Certainly, one would be thrilled with an aggregate data mo del that p erforms as well as the OLS mo del do es on these data. Indeed, b oth of these mo dels are robust against deviations from the spatial auto correlation assumption when it is the only inconsistent assumption. A second Monte Carlo exp eriment examines the consequences of data that are inconsistent with the distributional assumption but exhibit neither aggregation bias nor spatial auto correlation. These data were generated according to the exact data generating pro cess describ ed in King 1997, 188 for the truncated normal distribution. In order to maintain consistency with the other two assumptions, the parameters were chosen so that truncation is symmetric. Each distribution has two mo des which di er but have the same variance/covariance structure. The results are displayed in Table 2. The numb ers rep orted for basic EI are taken directly from King, Table 9.2, p. 189. The p oint estimates from the basic EI mo del seem to be b etter than the p oint estimates from the OLS mo del. However, once we take the standard deviation into account, the estimates are indistinguishable.