Determinants of Precinct-Level Voting in the 2008–2016 American Presidential Elections∗ Ryne Rohla‡ September 20, 2018
Abstract This paper uses the first national, multi-year, geocoded precinct-level dataset to measure changes in turnout and partisan support by race and education level in three recent presidential elections. After precinct geographies are dasymetrically matched to demographic data, ecological inference techniques demonstrate widening racial and education-based polarization. Race estimates vary with the assumed level of spatial heterogeneity, but may suggest less initial racial sorting than commonly believed, especially for black voters. Counterfactuals reveal that changing subgroup partisanship drove the 2016 outcome more than differential turnout did. Regression analyses decompose changes in turnout and partisan support between cycles to suggest possible motivations, finding a declining importance of economic characteristics in favor of identity-related measures. Last, an instrumental variables analysis explores causal effects of the fracking boom on local voting, finding support for retrospective voting. Groups benefiting from expanded resource extraction increased turnout and Republican support, while opposing groups (Native Americans and graduate degree holders) may have become more Democratic when exposed to local fracking activity.
Keywords: Elections, precinct data, ecological inference, retrospective voting
‡Washington State University, School of Economic Sciences, 101 Hulbert Hall, Pullman, WA 99164. Email: [email protected]. ∗Acknowledgments: I would like to thank Gregmar Galinato, Benjamin Cowan, M. Keith Chen, Matthew Birch, Anthony Delmond, Eric Dunaway, Christopher Clarke, and Casey Bolt for their comments and suggestions with regard to the writing of this paper. I would also like to thank John Miles Coleman, Brandon Finnigan and Decision Desk HQ, Derek Norris and OpenElections, David Bradlee, Michael McDonald, Aaron Bycoffe and FiveThirtyEight, Patrick Ruffini, Phillip Bump, Nate Cohn, Tom Giratikanon, Benjamin Anderstone, Kevin Rancik, the Harvard Election Data Archive, the University of California Berkeley’s Statewide Database, Dave Leip’s U.S. Election Atlas, and countless election and GIS officials for their assistance in collecting the election data anchoring this paper. An additional thank you to Mark Nibbelink and DrillingInfo for access to their proprietary well data.
1 Introduction
The 2016 presidential election outcome came as a shock to most political scientists due to its deviation from pre-election state polls and projections. Subsequent analyses have relied upon aggregated result data or post-election surveys, each with methodological shortcomings. A comprehensive analysis based on higher-quality data and set within the context of prior elections may provide a clearer picture of an event many fail to fully understand.
American election research often focuses on county-level outcomes due to data availability. County-level data aggregates away variation in dense urban environments while over-representing sparse rural regions, leading to aggregation bias in estimates of individual beliefs and behavior. Sub-county “precinct” data opens more precise research opportunities by using the most exact, encompassing, and representative data possible. This paper uses the first ever nationwide multi-election precinct dataset to answer how turnout and partisan support changed by race and education between the 2008, 2012, and 2016 presidential elections and to probe possible reasons for these changes. This novel dataset permits stronger statistical techniques than other studies and helps pinpoint changes with previously impossible spatial granularity and statistical power. In doing so, this paper contributes to a wide spectrum of literature on American politics.
The paper’s analysis proceeds in three ways. First, a battery of ecological inference techniques is employed to isolate effects of differential turnout and partisan support patterns. Second, plausible motivations for these changes are inferred through regression analyses using economic and social covariates. Finally, causal effects of geography-specific exogenous shocks are explored through a case study of the “fracking” boom.
These analyses were chosen to exploit the unique structure and richness of precinct data and to illustrate how further research could benefit from this dataset. The preponderance of voting-determinant literature relies on either county-level data or post-election individual-level surveys. Numerous county-level studies document correlations with demographics and socioeconomic factors in nationwide voting behavior
(Mas and Moretti, 2009; Hawley, 2011; Warf, 2011; Bor, 2017; Scala and Johnson, 2017). More specialized county-level analyses investigate impacts on voting from right-to-work laws (Feigenbaum, Hertel-Fernandez, and Williamson, 2018), newspaper entry and exit (Gentzkow, Shapiro, and Sinkinson, 2011), Chinese trade exposure (Autor et al., 2016), historical lynchings (Williams, 2018), and election day weather (Gomez, Hanford, and Krause, 2007). These studies could be made more powerful by using precinct-level data, given the increasing availability of large, geocoded datasets. Studies that do use precinct-level data cover limited geographies such as sets of municipalities or counties (Ferreira and Gyourko, 2009; Rutchick, 2010; Augenblick and Nicholson, 2016), a single state or select groups of states (DellaVigna and Kaplan, 2007; Brunner, Ross, and Washington, 2011; Gerber, Kessler, and Meredith, 2011), or a single time period (Hersh and Nall, 2016; Martin and Yurukoglu, 2017). This limitation arises from substantial barriers to precinct data access.
Given that voting is habit-forming in both turnout (Gerber et al., 2003; Fujiwara et al., 2016) and partisanship (Shachar, 2003), changes in voting behavior between elections best identify electoral responses to external stimuli. Analyzing levels rather than changes cannot discern relationships idiosyncratic to any given election and may distort estimates of individual preferences for candidate attributes. Voting exhibits two core responses to external shocks: changes in turnout and a “swing” toward one party or another, both of which may affect the level of two-party polarization. Changes in polarization level are especially important for individual social and economic behavior.
Political polarization affects where individuals work and shop more strongly than race or religion does (McConnell et al., 2018), how long individuals spend with relatives at family gatherings (Chen and Rohla, 2018), whom individuals date and marry (Alford, Hatemi, and Hibbing, 2011), and how willing individuals are to express their true preferences (Perez-Truglia and Cruces, 2017), including by donating to campaigns (Perez-Truglia, 2018). Shocks in highly polarized environments may radically change self-image, in-group and out-group dynamics, and how individuals treat others on a daily basis (Oc, Moore, and Bashshur, 2018).
Election results aggregate individual preferences and gauge reactions to the events and policies which precede them, but imperfectly communicate crucial individual- and subgroup-level information. Beginning with Goodman (1953), ecological inference techniques have attempted to recover this lost information, such as the overall partisan preferences of each racial group. A flurry of statistical techniques was developed from the late 1980s through the early 2000s, but progress stalled partly due to a lack of high-quality data to which the techniques could be applied. While each technique relies on varying but strong assumptions (Gelman et al., 2001) and no universally agreed-upon method of model validation exists, the identification of subgroup voting behavior remains vital to legal issues.1 This paper applies ecological inference to the two key demographic cleavages most responsible for recent electoral changes: race and education level.
Changes in ecological estimates of subgroup voting patterns can be decomposed by regression analysis to explore potential reasons why groups may have changed voting behavior between cycles. This is frequently done in a descriptive and non-causal way due to extensive endogeneity concerns. Causal estimation of changes in voting behavior tends to be possible only for plausibly exogenous shocks or discrete policy changes. The fracking boom is one such shock.
Evaluating localized, geography-specific shocks to voting behavior has applications in several fields. Urban economics has the “homevoter hypothesis,” wherein proposed local projects affecting property values directly impact voting behavior.2 An analogous literature spanning both political science and economics studies “retrospective voting” based on micro-level factors.
While general economic conditions have long been known to correlate with voting behavior (Fiorina, 1978), retrospective voting literature has recently delved into impacts from localized policy changes such as wind turbine installation (Stokes, 2016), interstate highway expansion (Nall, 2013), and local infrastructure degradation (Burnett and Kogan, 2015), along with effects from purely exogenous shocks such as property damage from extreme weather events (Healy and Malhotra, 2010; Cole, Healy, and Werker, 2012; Chen, 2013) and spatially clustered lottery winnings (Bagues and Esteve-Volart, 2016).
1Legal issues include legislative district pre-clearance under the Voting Rights Act of 1965 and whether redistricting schemes constitute undue bias against minority groups. This paper’s precinct-level dataset has wide applicability to a range of redistricting-related topics and projects. 2Papers utilizing local election data have analyzed sports venues (Coates and Humphreys, 2006; Dehring, Depken, and Ward, 2008), airports (Ahlfeldt and Maennig, 2015), new housing development (Kahn, 2011), and school voucher programs (Brunner, Sonstelie, and Thayer, 2001), generally finding support for the hypothesis.
This paper’s analysis of precinct results first reveals widening variance with each election and increasing gaps between median and mean partisan outcomes. Ecological inference techniques show wide variation in estimates of racial voting depending on the level of spatial heterogeneity assumed, while education-based estimates show less variance. The benchmark spatial King-Rosen model with a geographically-weighted regression covariate suggests that turnout rates for minorities, with the exception of Hispanics, peaked in 2008 before declining in 2012 and 2016, while turnout rates for white voters increased. This model also implies that racial voting patterns were more polarized in 2016, but that voters, in particular black voters, may not be as racially polarized as commonly believed and as found in exit polls. Education-level results show increasing polarization of the electorate between those with and without a college degree.
Next, the paper uses regression analysis to decompose the support structures of candidates across time and isolate possible explanations for why turnout and partisan support patterns have changed. First, results suggest a shift in explanatory power away from economic variables such as income, unemployment, and income inequality toward identity-related variables such as race, country of birth, and education. Second, analysis of turnout changes between 2012 and 2016 suggests that minorities and college graduates turned out more heavily in well-off precincts with large foreign-born populations, while high school graduate turnout surged in working-class precincts with economies based on manufacturing, construction, and natural resources.
Black turnout slumped in cities while Hispanic turnout surged in urban areas. Third, white voters became more Republican in 2016 in less educated and economically tepid precincts, while diversity and wealth promoted more white support for Clinton. Hispanics and Asians, however, became more Republican in the presence of high rates of foreign-born individuals. Last, the paper uses an instrumental variable approach to estimate causal effects of
the fracking boom on changes in turnout and partisan support between successive elections. Results of this method show strong retrospective voting tendencies, with voters most likely to benefit from the fracking boom increasing turnout and Republican support. Groups likely opposed to or harmed by increased fracking production, such as Native Americans and graduate degree holders, show some evidence of increased Democratic support. An aggregate analysis of the data implies pro-Republican effects most notable in Pennsylvania, where the cumulative effect of eight years of fracking additions may have been sufficient to account for Trump’s entire vote margin.
2 Data
2.1 Precinct Data Background and Collection
Every county in the United States is subdivided into one or more geographical units for the purposes of election administration, voter registration, and the redistricting of higher-level legislative districts;3 these units are commonly referred to as “precincts.”4 Election precincts do not generally conform to other geographies such as census blocks, zip codes, and municipalities.5,6
State-level election officials may aggregate precinct-level election results and geographies, but often do not. On a national level, the only election with a publicly available set of practicably complete results and geographies is the 2008 presidential election. This set is maintained by the Harvard Election Data Archive and was assembled with the assistance of the Census Bureau, which collected and released Voting Tabulation District (VTD)
3The exception is Alaska, where precincts are nested within State House Districts and may cross borough boundaries. Delaware and Hawaii also nest precincts within State House Districts, but these districts tend to also nest within counties or be sub-precincted by the intersection of these districts and counties. A further exception is Kalawao County, Hawaii, whose entire election process is administered by neighboring Maui County; Kalawao nevertheless typically maintains a separate precinct. 4They have alternate official names in some states, such as “election districts” in New Jersey and New York and “wards” in Wisconsin. 5Except that they are nested within townships in New England and parts of the Northeast and Midwest. 6Most precincts, unlike counties, maintain approximate population equality within a state, meaning the density of precincts follows population density as of the most recent redrawing of precinct lines. Precincts vary in size from a few city blocks in Manhattan to hundreds of square miles in the rural Mountain West. Precincts may have one unique polling place assigned, may share one with neighboring precincts, or may not relate to polling places at all in vote-by-mail states.
shapefiles, which approximate precincts, in conjunction with the 2010 Decennial Census.7 Data for 2008 for this paper were drawn from this source and from Dave Bradlee’s Redistricting App, which supplements Census geographies with census block group estimates for the 2008 election in Montana, Oregon, and Rhode Island.
Precinct-level results for the 2012 and 2016 presidential elections were hand-collected for this paper through an extensive process of web-scraping and contacting state Secretaries of State, Boards of Election, and other statewide and county-wide electoral authorities. For states which do not compile precinct-level election result data, individual county clerks were contacted by email, phone, fax, or in person.
These precinct-level votes were mapped to precinct polygonal shapefiles using Geographic Information Systems (GIS) software.8 This process was iteratively updated to be as complete as practicable given the author’s time, manpower, and financial limitations. The resultant dataset covers over 99% of votes in more than 173,000 precincts in 99.9% of counties in each of these two elections.9
The results of these three elections were merged with each other and with demographic data from the 2005-2009, 2008-2012, and 2012-2016 American Community Surveys10 and the 2010 Decennial Census through dasymetric re-aggregation. Dasymetric mapping substantially reduces re-aggregation error compared to inclusion-exclusion techniques such as centroid containment or naive re-aggregation algorithms such as areal interpolation (Zandbergen and Ignizio, 2010).11 This methodology closely approximates characteristics of the current precincts across time and, as most of the re-aggregated characteristics
7Even the Census’s collection is incomplete, as Census files do not contain precinct geographies for all or large portions of Montana, Oregon, and Rhode Island. 8Shapefiles for twenty-four states were obtained in whole from state authorities or academic institutions. The remaining states required substantial modification of out-of-date shapefiles or wholesale creation by hand. This process often started with state-level 2010 VTD shapefiles, which were then extensively updated with county-level files from local authorities or through hand modification and digitization of paper or electronic static map files. Many precinct boundary lines varied between 2012 and 2016 and had to be modified separately. Some minor township-level aggregation was required in the Northeast and Midwest. 9Further information on the gaps in coverage can be found in Section 2.2. 10These were the closest datasets available to the three elections at the time of writing. 11Using the 2016 precinct boundaries as a base, these datasets were first mapped in their native geographies, then projected onto census block centroids. These census block projections were then weighted according to their population as of the 2010 Decennial Census and summed up to the 2016 precinct polygons.
are rates or proportions, minimally distorts re-aggregated values. With the re-aggregated 2008 and 2012 results, we can define “swing”,12 a measure of the change in voting outcomes from the election at time t − 1 to the election at time t in 2016 precinct i, as
$$S_{i,t} = \frac{1}{2}\left( \frac{D_{i,t} - R_{i,t}}{D_{i,t} + R_{i,t} + O_{i,t}} - \frac{D_{i,t-1} - R_{i,t-1}}{D_{i,t-1} + R_{i,t-1} + O_{i,t-1}} \right) \tag{1}$$
where Dt, Rt, and Ot represent the total votes for the Democratic candidate, the Republican candidate, and all other candidates, respectively. The sign convention is arbitrary: an increase in the Democratic candidate’s percentage margin of victory increases the swing, and vice versa.
Figure 1a maps the swing in precinct-level voting between the 2012 and 2016 presidential elections, while Figure 1b does the same for the 2008 and 2012 elections, with all data re-aggregated to 2016 boundaries. The 2012-16 swing map exhibits a strong Republican swing in much of the rural Midwest and Northeast, with dark blue Democratic swings in the Mormon Corridor and in the suburbs of large cities. The 2008-12 swing map shows fewer strong swings, with the largest pro-Republican swings occurring in coal mining regions in Appalachia and in southern Illinois and Indiana, in Utah, and in upper regions of the Mountain West. The most visible Democratic swings in the latter map occur on western Indian reservations and in the southern Black Belt, the Rio Grande Valley, lower Ohio, and upper New York.
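Equation (1) translates directly into code. The following is a minimal Python sketch; the function name `swing` and its argument order are illustrative choices, not the author’s code. Note that the margin in each election is taken over all votes cast (third parties included in the denominator), and that the 1/2 factor means a precinct moving from a 5-point Democratic margin to a 5-point Republican margin registers a swing of −0.05 rather than −0.10.

```python
def swing(d_t: float, r_t: float, o_t: float,
          d_prev: float, r_prev: float, o_prev: float) -> float:
    """Half the change in the Democratic margin between two elections, per Eq. (1).

    Margins are shares of all votes cast (third parties in the denominator).
    Positive values indicate a swing toward the Democratic candidate.
    """
    total_t = d_t + r_t + o_t
    total_prev = d_prev + r_prev + o_prev
    if total_t <= 0 or total_prev <= 0:
        raise ValueError("both elections need at least one vote in the precinct")
    margin_t = (d_t - r_t) / total_t
    margin_prev = (d_prev - r_prev) / total_prev
    return 0.5 * (margin_t - margin_prev)


# Hypothetical precinct: D+5 in the earlier election, R+5 in the later one.
print(swing(450, 500, 50, 500, 450, 50))  # prints -0.05
```

Halving the margin change is a design choice that makes the swing read as the share of voters who would have to switch sides to produce the observed change.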
2.2 Precinct Data Limitations
There are three limiting factors in the dataset for the 2012 and 2016 presidential elections. First, there exist several classes of votes which may not be assigned to sub-county geographies by local election officials. Ballots cast absentee, provisionally, prior to election day, by overseas citizens, by members of the armed forces, or by individuals who changed residences close to an election may not be assigned to a precinct but simply reported at the county level. This non-assignment tends to be uniform within a state
12This definition of swing is commonly used in international elections, but less often in American elections.
but not between states. These types of ballots may comprise a large share of the vote in certain circumstances, and these votes often differ in composition and candidate share from other votes within a county. As shown in Table 1, these ballot types comprised approximately 3% of all votes cast in both 2012 and 2016. The unallocable vote types tended to be 3-6 points more Democratic-leaning than allocable votes.
Second, precinct geographies were not obtainable for two rural counties: Lake County, Oregon and Walworth County, South Dakota. An additional seven rural counties in Arkansas, Alabama, and Kentucky did not readily release precinct-level votes in 2012, but did in 2016. These counties were aggregated, and precinct-level swings were assigned on the basis of county swings.
Last, a small portion of votes could not be allocated to geographies because the author was unable to obtain precinct shapefiles reflecting the latest divisions of previous precincts.13 In most cases, when precincts are added, these new precincts exist as subsets of previous precincts; when the exact nature of this division could not be discerned, the new precinct may have been omitted. This error accounts for 0.3% and 0.7% of all votes in the 2016 and 2012 elections, respectively. These votes tended to lie in Republican-leaning rural areas which swung toward Trump relative to Romney.
Figure A1a maps the total unallocable absentee and related vote types by county, averaged between 2012 and 2016. Idaho, Louisiana, Maryland, New Jersey, South Carolina, and Virginia were the only states where more than 10% of total votes were of this type. Figure A1b maps the average omitted vote rate by county for the two elections. The largest errors lie in select counties in Alabama and West Virginia. In Alabama, 4-5% of all votes were omitted, while 4% of West Virginia votes were omitted. Approximately 80% of counties in both years were completely error-free, and 93% of counties were missing no more than 1% of votes.
13Control over drawing precinct boundaries falls to county or other local election officials under general guidelines put forth by state governments; boundaries are renamed, redrawn, and renumbered both following decennial censuses and in the intercensal period. This redrawing process is often publicized only locally, and up-to-date maps and election result data may be very difficult to obtain for many counties. Difficulties may include opaque and varying responsibilities for election management, frequent turnover in local election officials, frequently obsolete contact information, fees for data, lengthy Freedom of Information Act requirements, maps and GIS data housed in offices separate from election officials, a lack of digitized data, and a lack of communication and transmission capabilities.
Despite these limitations, both the 2012 and 2016 datasets include 96.7% of all votes cast, and the summed totals closely align with actual national results. This fact, combined with the limited scale of the gaps, assuages most concerns about data quality. It may still be possible for these limitations to bias estimates, but their systematic nature may be accounted for through empirical techniques in later sections.
2.3 Fracking Background and Data
Hydraulic fracturing, commonly known as “fracking”, is a technique using non-vertical or “directional” drilling and subterranean pressurized fluid injection to extract otherwise inaccessible hydrocarbon deposits such as shale and tight oil plays. The technology and economic conditions necessary for fracking’s profitability arose rapidly during the mid-2000s. As shown in Figure 2, the number of directional wells in the United States increased by 32% from 1996 to 2000 and by 39% from 2000 to 2004, before jumping by 67% between 2004 and 2008, followed by 65% between 2008 and 2012 and 41% between 2012 and 2016. The overall number of such wells climbed from 47,127 in 2000 to 270,511 in 2017.
The geographic extent of growth in fracking-related wells is directly limited by the exogenous location of shale plays that benefit from this extraction technique. The most notable region affected by the fracking boom is the Bakken Formation in western North Dakota and eastern Montana, which received a large surge of in-migration during this period, although other regions were also affected. Feyrer, Mansur, and Sacerdote (2017) document large and persistent wage and job gains in affected regions, leading to substantial wage-driven migration (Wilson, 2016) and an increase in birth rates but not marriage rates (Kearney and Wilson, 2018).
Montana and North Dakota made large swings to the right in the 2012 and 2016 presidential elections, which might be expected given an influx of workers in the natural resource extraction industry, an industry traditionally aligned with the Republican Party. However, these swings were not constrained to areas within the Bakken Formation at the county level, and environmental issues and Native American land rights issues have arisen in
reaction to the fracking boom, implying potential polarizing effects. Income effects and increased population may also be forces which drive voters leftward. An a priori prediction, therefore, might be an increase in polarization in fracking boom areas. While Fedaseyeu, Gilje, and Strahan (2015) use county-level data in only seven states to find that shale booms increase support for conservative candidates, otherwise conservative counties on the Great Plains and in the Mountain West often have dense pockets of Democratic-leaning Native American or environmentalist voters whose responses may be obscured by more highly aggregated data. Their study also does not account for voting behavior in the highly unusual 2016 presidential election.
Proprietary data on hydrocarbon well locations, characteristics, and output were provided by the company DrillingInfo through its academic outreach initiative. The data provide the latitude and longitude, orientation, activity status, quarterly output, and other characteristics of each well, but do not state explicitly whether a well uses hydraulic fracturing. Following Wilson (2016) and Kearney and Wilson (2018), fracking wells are identified as the intersection of directional wells with those situated over a shale or tight oil play. These regions were identified using GIS shapefiles provided by the United States Energy Information Administration (EIA).
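The identification rule (directional wells located over a shale or tight oil play) amounts to a point-in-polygon filter. The Python sketch below is a hypothetical, planar illustration: the `Well` record, its `directional` flag, and the polygon format are assumptions rather than the DrillingInfo or EIA schemas, and an actual workflow would more likely rely on GIS software with projected coordinates and a spatial index instead of this even-odd ray-casting loop.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar


@dataclass
class Well:
    well_id: str
    lon: float
    lat: float
    directional: bool  # hypothetical flag derived from the well's orientation field


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Even-odd ray-casting test: count edge crossings of a ray cast to the right."""
    x, y = pt
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fracking_wells(wells: List[Well], plays: List[List[Point]]) -> List[Well]:
    """Directional wells located inside at least one shale or tight oil play."""
    return [
        w for w in wells
        if w.directional and any(point_in_polygon((w.lon, w.lat), p) for p in plays)
    ]


# Hypothetical square play and three hypothetical wells.
play = [(-104.0, 47.0), (-102.0, 47.0), (-102.0, 49.0), (-104.0, 49.0)]
wells = [
    Well("A", -103.0, 48.0, True),   # directional, inside the play
    Well("B", -103.0, 48.0, False),  # vertical, inside the play
    Well("C", -100.0, 48.0, True),   # directional, outside the play
]
print([w.well_id for w in fracking_wells(wells, [play])])  # ['A']
```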
2.4 Summary Statistics
2.4.1 Precinct Data
Of the 173,355 precinct geographies collected for the 2016 election, 168,825 saw nonzero votes cast. Table 2 displays summary statistics for these vote-casting precincts. The mean precinct cast 782.6 votes in 2016, 375.4 for Clinton and 361.1 for Trump, and gave Trump a 1.9% larger margin than Romney; the mean precinct voted about five points to the left of the median precinct, implying a heavier tail of strongly Clinton precincts than of strongly Trump precincts. The median precinct cast more votes in 2008 than in either 2012 or 2016, bottoming out in 2012. This roughly corresponds to overall turnout
patterns between the three elections.14
The gap between the median and mean Democratic margins widened from 2008 to 2012 and again from 2012 to 2016, from 4.26% to 5.11% to 5.30%. This widening is consistent with an increasing reliance of the Democratic Party on highly unanimous areas at the expense of more balanced precincts. This observation corroborates a geographic partisan sorting narrative wherein Democratic voters inefficiently pack themselves into dense, urban precincts at the expense of influence in swing districts. This process is sometimes referred to as “unintentional gerrymandering” (Chen and Rodden, 2013).
The increasing spatial polarization of the electorate can also be observed in the increasing standard deviation of each party’s share. The standard deviation of the Democratic share steadily increased from 20.77% to 22.49% to 24.04% across the three cycles. Figure 3 displays histograms comparing the distribution of Democratic shares in 2012 and 2016; the 2016 distribution is noticeably flatter, implying higher polarization.15
Appendices A.1 and A.2 contrast observed demographic and political covariate means in the precinct-level dataset with comparable county and survey data. Precinct data are more representative than either, and substantially more so than county data. These sections also discuss additional methodological advantages of precinct data over these alternative data types.
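The two polarization diagnostics used above, the mean-minus-median gap and the standard deviation across precincts, can be computed as in the minimal Python sketch below. It treats each precinct equally, as the “mean precinct” language suggests; the function name and the margins are hypothetical, chosen only to show how one lopsided Democratic precinct pulls the mean above the median.

```python
import statistics
from typing import Dict, Sequence


def spatial_polarization(dem_margins: Sequence[float]) -> Dict[str, float]:
    """Unweighted precinct summary: mean minus median, and standard deviation.

    A positive mean-minus-median gap means a tail of lopsided Democratic
    precincts pulls the mean above the typical (median) precinct.
    """
    mean = statistics.fmean(dem_margins)
    median = statistics.median(dem_margins)
    return {
        "mean_minus_median": mean - median,
        "sd": statistics.stdev(dem_margins),
    }


# Hypothetical margins: four modestly Republican precincts, one packed Democratic one.
margins = [-0.20, -0.10, -0.05, 0.00, 0.80]
print(spatial_polarization(margins))
```

Here the mean is 0.09 and the median is −0.05, so the gap is about 0.14: most precincts lean Republican even though the average precinct leans Democratic, which is the packing pattern the text describes.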
2.4.2 Fracking Data
Between the 2008 and 2012 presidential elections, 182,226 total wells were drilled across the United States, of which 55,347 (30.4%) were identified by the fracking well criteria. These fracking wells generated a mean 31,111.7 barrels of oil equivalent (BOEs) per year compared to 15,547.1 BOEs per year overall. Between the 2012 and 2016 presidential elections, an additional 134,585 wells were added; 56,343 or 41.9% were fracking wells. The yearly output of these wells was higher at a mean 58,095.7 BOEs per year due to
14The mean and standard deviation for turnout numbers are skewed and artificially inflated due to heavier county-level aggregation in 2012. 15The only exception is in the far right tail of the Democratic share, where Obama surpassed Clinton considerably. This can be explained both by Clinton’s relative under-performance in highly black precincts and by stronger third-party performances in 2016. See Section A.3 for more details.
diminishing returns over time. Figure 4 maps these newly added fracking wells and their associated shale plays.
As shown in Table 3, 16% of precincts intersected at least partially with a shale or tight oil play, but only 1.4% of precincts saw a fracking well drilled between the 2008 and 2012 elections, and 1.0% of precincts saw the same between the 2012 and 2016 elections. The average precinct saw 0.32 fracking wells added in each period, with a mean 410 BOEs per year added between 2008 and 2012 and 657 BOEs per year added between 2012 and 2016.
Precincts with fracking wells added were substantially more Republican than both the nation and shale plays in general, topping 60% for the Republican candidate in each cycle. These precincts swung further to the right by 4-6% between the three races, a larger swing than in shale plays generally. Precincts which added a fracking well between 2008 and 2012 swung 4.2% toward Romney, while those which did not add a fracking well during this period but did during 2012 to 2016 swung 4.0% toward Romney, implying that a substantial portion of this swing may be unrelated to the fracking wells themselves or may be due to spatial spillover. Precincts which added a fracking well in the first period but not in the second swung 5.2% toward Trump, while those which added in both swung 5.6% toward Trump. Precincts which only added a fracking well in the second period swung 7.4% toward Trump.
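The comparisons above group precincts by when (if ever) a fracking well was added. A minimal Python sketch of that cross-tabulation follows; the tuple layout and function name are hypothetical, the values are made up rather than taken from the paper’s data, and swings follow Eq. (1), so negative values indicate movement toward the Republican candidate.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

Key = Tuple[bool, bool]  # (well added 2008-12, well added 2012-16)


def mean_swing_by_timing(rows: Iterable[Tuple[bool, bool, float]]) -> Dict[Key, float]:
    """Unweighted average swing for each pattern of fracking-well additions."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for added_first, added_second, swing_value in rows:
        groups[(added_first, added_second)].append(swing_value)
    return {key: fmean(values) for key, values in groups.items()}


# Made-up precincts: (added in 2008-12, added in 2012-16, 2012-16 swing).
rows = [
    (True, False, -0.04),
    (True, False, -0.06),
    (False, True, -0.08),
    (True, True, -0.06),
    (False, False, -0.02),
]
for key, value in sorted(mean_swing_by_timing(rows).items()):
    print(key, round(value, 3))
```

Comparing the (False, True) group with the (True, False) group is what lets the text separate swings that precede well additions from those that follow them.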
3 Empirical Methods
3.1 Ecological Inference
No single, widely agreed-upon ecological inference procedure exists. Goodman (1953) first operationalized ecological regression (“ER”), which assumes homogeneous turnout and subgroup partisan support patterns. These assumptions typically do not hold in reality. ER estimates are also unbounded, permitting impossible values. ER results can be tamed by incorporating spatial fixed effects to isolate intra-regional variation, but the unbounded property remains even after aggressive spatial controls. Thomsen (1987) used a logistic regression approach to correct the boundedness issue and allowed for partial
heterogeneity by averaging across geographic regions within which homogeneity is assumed. This procedure was updated by Park (2008) with more flexible non-linear substitution patterns (“Thomsen-Park” or “TP”). Freedman (1991) introduced the neighborhood model (“NM”), a simple and bounded method which assumes complete heterogeneity at the precinct level, such that all subgroup patterns are driven solely by geography.16
King (1997), later generalized by King et al. (1999) and Rosen et al. (2001) (“King-Rosen” or “KR”), developed a two-stage “method of bounds” estimator which allows precinct-level heterogeneity, estimates turnout by subgroup and precinct, and is bounded. KR relies on an assumed distributional form, truncated normal in King (1997) or multinomial-Dirichlet in Rosen et al. (2001), while producing “untamed” estimates which may not represent a large improvement over other methods (Freedman et al., 1998; Tam Cho, 1998) and which cannot be used as a dependent variable due to consistency issues without an adequate covariate describing the aggregation structure (Herron and Shotts, 2003). KR also produces biased estimates in the presence of “extreme spatial heterogeneity,” which spatial weighting may be able to solve (Anselin and Tam Cho, 2002). Calvo and Escolar (2003) suggest running the KR procedure with the precinct-level coefficient estimate from a local geographically-weighted regression (“GWR”) as a covariate, producing a spatial-KR model (“SKR”) which generates consistent estimates.17
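The contrast between ER’s unboundedness and bounded approaches can be seen in the simplest two-group turnout case. The Python sketch below uses made-up data and hypothetical function names; it implements only plain Goodman ER and the deterministic per-precinct bounds that a method-of-bounds estimator starts from, not the KR, TP, or SKR estimators themselves. Group b’s share is x and turnout is T, so T = b_b·x + b_w·(1 − x) in each precinct.

```python
from typing import List, Tuple


def goodman_er(x: List[float], t: List[float]) -> Tuple[float, float]:
    """Goodman's ecological regression for two groups.

    Fits T_i = b_w + (b_b - b_w) * x_i by ordinary least squares, where x_i is
    group b's population share and T_i the precinct turnout rate. Returns the
    rates (b_b, b_w), assumed constant across precincts; nothing keeps them in [0, 1].
    """
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    slope = sxt / sxx
    b_w = mt - slope * mx  # intercept: predicted rate where x = 0
    b_b = b_w + slope      # predicted rate where x = 1
    return b_b, b_w


def bounds_b(x_i: float, t_i: float) -> Tuple[float, float]:
    """Deterministic bounds on group b's turnout rate in one precinct.

    Follows from T = b_b * x + b_w * (1 - x) with 0 <= b_w <= 1.
    """
    if x_i <= 0:
        raise ValueError("group b must be present in the precinct")
    lower = max(0.0, (t_i - (1.0 - x_i)) / x_i)
    upper = min(1.0, t_i / x_i)
    return lower, upper


x = [0.1, 0.5, 0.9]     # hypothetical group-b shares
t = [0.30, 0.60, 0.95]  # hypothetical turnout rates
b_b, b_w = goodman_er(x, t)
print(f"ER: b_b={b_b:.3f}, b_w={b_w:.3f}")  # b_b exceeds 1: an impossible rate
for xi, ti in zip(x, t):
    lo, hi = bounds_b(xi, ti)
    print(f"x={xi:.1f}: b_b in [{lo:.3f}, {hi:.3f}]")
```

The bounds are always feasible but can be uninformative (the first precinct yields [0, 1]); precincts where group b dominates pin the rate down tightly, which is the information a method-of-bounds estimator exploits.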
16 Heterogeneous voting preferences within racial and ethnic groups are well established but poorly measured. For example, voters of Cuban and Vietnamese descent tend to be far more Republican than other Hispanic or Asian subgroups. The extent to which this heterogeneity varies by geography is poorly understood.
17 This process presages later models equating ecological inference problems with instrumental variable methods for estimating causal effects (Spenkuch, 2018).

The SKR procedure begins with the accounting identity that the turnout rate $T_i$ in precinct $i$ is the sum of subgroup turnout rates $\theta_{i,j}$ weighted by the group population shares $d_{i,j}$. For $J$ groups indexed by $j$, the turnout rate in precinct $i$ is

$$T_i = \sum_{j=1}^{J} \theta_{i,j} d_{i,j}, \qquad \text{where } \sum_{j=1}^{J} d_{i,j} = 1. \tag{2}$$

Because the $\theta_{i,j}$ values are the only unknowns, the dimensionality can be reduced by one by solving for $\theta_{i,k}$ as a function of $\theta_{i,-k}$, that is, $\theta_{i,k} = \left(T_i - \sum_{j \neq k} \theta_{i,j} d_{i,j}\right) / d_{i,k}$ for $d_{i,k} > 0$. With perfect information and homogeneous subgroup preferences across precincts, each precinct's $(J-1)$-dimensional hyperplane should intersect the others at a unique point. With heterogeneous preferences, no unique intersection exists, but intersections will be densest near the “true” $\theta_{i,j}$ values. To estimate these values, SKR uses a three-stage hierarchical Bayesian procedure:
first, assume $T_i$ follows a multinomial distribution and define the contribution of precinct $i$'s results to the likelihood function as the product of all $\theta_{i,j}$. Next, define $\phi_{i,j,p}$ as the share of subgroup $j$ in precinct $i$ that votes for candidate $p$, and assume $\phi_{i,j,p}$ is Dirichlet-distributed as a function of the covariate $Z_i$. The probability density function of $\phi_{i,j,p}$ is

$$f(\phi) = \frac{\Gamma\left(\omega_j \sum_p \exp(\delta_{j,p} + \zeta_{j,p} Z_i)\right)}{\prod_p \Gamma\left(\omega_j \exp(\delta_{j,p} + \zeta_{j,p} Z_i)\right)} \prod_p \phi_{i,j,p}^{\,\omega_j \exp(\delta_{j,p} + \zeta_{j,p} Z_i) - 1} \tag{3}$$
where $\Gamma(\cdot)$ is the gamma function. Both $\delta$ and $\zeta$ are given uniform priors, while $\omega$ is given an exponential prior. By Bayes's theorem, the posterior distribution is
$$f(\theta, \phi, Z) \propto L(\theta_j \mid \phi_{j,p})\, f(\phi_{j,p} \mid \delta_{j,p}, \zeta_{j,p}, \omega_j)\, p(\delta_{j,p}, \zeta_{j,p}, \omega_j) \tag{4}$$
This posterior is simulated by Markov chain Monte Carlo, using a Gibbs sampler with Metropolis steps.
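The Metropolis-within-Gibbs idea can be sketched in Python for a single block: updating the Dirichlet scale $\omega_j$ for one subgroup while $\delta_{j,p}$ and $\zeta_{j,p}$ are held at their current values. Because $\omega_j$ does not enter the likelihood $L(\cdot)$, its full conditional is the exponential prior times the Dirichlet terms of equation (3). This is a minimal sketch under stated assumptions: the share vectors, covariate values, step size, and unit prior rate are hypothetical, the helper names are illustrative, and the full SKR sampler (which also updates $\delta$, $\zeta$, and the precinct-level shares) is not reproduced.

```python
import math
import random

def dirichlet_logpdf(phi, alpha):
    """Log Dirichlet density of share vector phi with concentrations alpha."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, phi)))

def concentrations(omega, delta, zeta, z):
    """alpha_p = omega * exp(delta_p + zeta_p * z), one entry per outcome p."""
    return [omega * math.exp(d + s * z) for d, s in zip(delta, zeta)]

def log_target_omega(omega, phis, zs, delta, zeta, rate=1.0):
    """Log full conditional of omega: exponential prior (hypothetical rate)
    times the Dirichlet density of each precinct's share vector."""
    if omega <= 0.0:
        return -math.inf  # outside the prior's support
    logp = -rate * omega  # exponential log-prior, up to a constant
    for phi, z in zip(phis, zs):
        logp += dirichlet_logpdf(phi, concentrations(omega, delta, zeta, z))
    return logp

def metropolis_step(x, log_target, step, rng):
    """One random-walk Metropolis update of a scalar parameter."""
    proposal = x + rng.gauss(0.0, step)
    log_ratio = log_target(proposal) - log_target(x)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return x

# Hypothetical shares of one subgroup voting D, R, or abstaining in three precincts
phis = [[0.5, 0.2, 0.3], [0.4, 0.3, 0.3], [0.6, 0.1, 0.3]]
zs = [0.1, -0.2, 0.3]                              # hypothetical covariate values Z_i
delta, zeta = [0.0, 0.0, 0.0], [0.5, -0.5, 0.0]    # held fixed within this Gibbs block

rng = random.Random(0)
omega, draws = 1.0, []
for _ in range(2000):
    omega = metropolis_step(
        omega, lambda w: log_target_omega(w, phis, zs, delta, zeta), 0.5, rng)
    draws.append(omega)
```

Proposals with $\omega \le 0$ receive a log-target of $-\infty$ and are always rejected, so the chain stays on the exponential prior's support. A full sweep would cycle this kind of update through every parameter block.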
The covariate $Z_i$ in the SKR model is generated by first running ER, then estimating a geographically weighted regression of the ER residual on ER-predicted turnout. The GWR estimator, which mirrors a generalized least squares estimator, is
$$Z_i = \left(\hat{T}_i^{\prime} w_i \hat{T}_i\right)^{-1} \hat{T}_i^{\prime} w_i \hat{\varepsilon}_i^{ER} \tag{5}$$
where the spatial weighting matrix $w_i$ assigns a positive weight to precincts that fall within a bandwidth distance of precinct $i$ and zero weight otherwise. The optimal kernel size depends on the degree of spatial non-stationarity and is frequently chosen by minimizing the Akaike information criterion. The entire procedure can be run in one step or in two steps, first predicting turnout and then predicting partisan vote shares using the predicted electorate composition. KR and SKR uniquely produce heterogeneous precinct-level estimates of turnout and vote shares for each subgroup. Predicted electorate compositions can be used in the other procedures, increasing estimate accuracy compared to assuming the electorate mirrors the voting-age population (VAP). The SKR model thus has the most comprehensive set of a priori desirable characteristics for an ecological inference estimator and will be the primary basis of this section's analysis, although its estimates will be compared to those of other procedures.
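A minimal Python sketch of the covariate construction in equation (5) follows, assuming two groups, a first-stage ER without an intercept, and a box kernel that gives weight one to every precinct within the bandwidth. With a single regressor, the matrix expression reduces to a ratio of weighted sums. The coordinates, shares, turnout rates, bandwidths, and helper names (`er_stage`, `gwr_covariate`) are hypothetical, and no AIC-based bandwidth search is included.

```python
import numpy as np

def er_stage(shares, turnout):
    """First stage (hypothetical helper): ER fitted turnout and residuals, no intercept."""
    theta, *_ = np.linalg.lstsq(shares, turnout, rcond=None)
    fitted = shares @ theta
    return fitted, turnout - fitted

def gwr_covariate(coords, fitted, resid, bandwidth):
    """Local slope Z_i = (T' W_i T)^{-1} T' W_i e for each precinct i.

    Uses a box kernel: weight 1 for precincts within `bandwidth`
    of precinct i (including i itself), 0 otherwise.
    """
    z = np.empty(len(fitted))
    for i in range(len(fitted)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = (dist <= bandwidth).astype(float)
        z[i] = np.sum(w * fitted * resid) / np.sum(w * fitted ** 2)
    return z

# Hypothetical precincts laid out on a line
coords = np.array([[float(x), 0.0] for x in range(6)])
d_a = np.array([0.2, 0.8, 0.4, 0.6, 0.1, 0.9])
shares = np.column_stack([d_a, 1.0 - d_a])
turnout = np.array([0.55, 0.45, 0.50, 0.50, 0.60, 0.40])

fitted, resid = er_stage(shares, turnout)
z_local = gwr_covariate(coords, fitted, resid, bandwidth=1.0)
z_global = gwr_covariate(coords, fitted, resid, bandwidth=100.0)
```

With a bandwidth covering every precinct, $Z_i$ collapses to zero, because least-squares residuals are orthogonal to fitted values; only a smaller bandwidth lets $Z_i$ capture local departures from the global ER fit. In practice the bandwidth would be tuned, and a smooth kernel could replace the box kernel.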
3.2 Determinants of Subgroup Voting
While ecological inference can estimate how groups changed behavior between election cycles, it says less about why these changes occurred. Regression analysis can identify correlates which may have motivated these outcomes. Although causal effects cannot be claimed in this section, the results presented here point toward plausible rationales for behavior or suggest that some previously assumed relationships may not exist.

Specifying a regression design requires addressing several issues. First, the data limitations in Section 2.2 must be accounted for. Unallocable absentee votes are primarily a state-level problem, though sometimes a county-level one. Because these votes tend to lean disproportionately Democratic, their omission may bias the remaining results rightward. County-level fixed effects serve to correct for this type of bias, and should also absorb much of the error introduced by the author's own limitations. A more conservative approach would additionally weight observations by the likelihood that these errors affect their accuracy. The following error weight $EW$, based on the percentage of missing votes $MV$ for each precinct $i$ in county $c$ in the 2012 and 2016 elections, is used: