Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress Viet-An Nguyen Jordan Boyd-Graber Philip Resnik Kristina Miler Computer Science Computer Science Linguistics & UMIACS Government and Politics University of Maryland University of Colorado University of Maryland University of Maryland College Park, MD Boulder, CO College Park, MD College Park, MD [email protected] Jordan.Boyd.Graber [email protected] [email protected] @colorado.edu
Abstract In this paper, we develop this idea further by introducing HIPTM, the Hierarchical Ideal Point We introduce the Hierarchical Ideal Point Topic Model, to estimate multi-dimensional ideal Topic Model, which provides a rich picture points for legislators in the U.S. Congress. HIPTM of policy issues, framing, and voting behav- differs from previous models in three ways. First, ior using a joint model of votes, bill text, HIPTM uses not only votes and associated bill text, and the language that legislators use when but also the language of the legislators themselves; debating bills. We use this model to look at this allows predictions of ideal points from politi- the relationship between Tea Party Repub- cians’ writing alone. Second, HIPTM improves licans and “establishment” Republicans in the interpretability of ideal-point dimensions by the U.S. House of Representatives during incorporating data from the Congressional Bills the 112th Congress. Project (Adler and Wilkerson, 2015), in which bills 1 Capturing Political Polarization are labeled with major topics from the Policy Agen- das Project Topic Codebook.1 And third, HIPTM Ideal-point models are one of the most widely discovers a hierarchy of topics, allowing us to ana- used tools in contemporary political science re- lyze both agenda issues and issue-specific frames search (Poole and Rosenthal, 2007). These models that legislators use on the congressional floor, fol- estimate political preferences for legislators, known lowing Nguyen et al. (2013) in modeling framing as their ideal points, from binary data such as leg- as second-level agenda setting (McCombs, 2005). islative votes. Popular formulations analyze legis- Using this new model, we focus on Republican lators’ votes and place them on a one-dimensional legislators during the 112th U.S. Congress, from scale, most often interpreted as an ideological spec- January 2011 until January 2013. This is a par- trum from liberal to conservative. ticularly interesting session of Congress for politi- Moving beyond a single dimension is attractive, cal scientists, because of the rise of the Tea Party, however, since people may lean differently based a decentralized political movement with populist, on policy issues; for example, the conservative libertarian, and conservative elements. Although movement in the U.S. includes fiscal conservatives united with “establishment” Republicans against who are relatively liberal on social issues, and vice Democrats in the 2010 midterm elections, lead- versa. In multi-dimensional ideal point models, ing to massive Democratic defeats, the Tea Party therefore, the ideal point of each legislator is no was—and still is—wrestling with establishment longer characterized by a single number, but by a Republicans for control of the Republican party. multi-dimensional vector. With that move comes a The Tea Party is a new and complex phe- new challenge, though: the additional dimensions nomenon for political scientists; as Carmines and are often difficult to interpret. To mitigate this D’Amico (2015) observe: “Conventional views of problem, recent research has introduced methods ideology as a single-dimensional, left-right spec- that estimate multi-dimensional ideal points using trum experience great difficulty in understanding both voting data and the texts of the bills being or explaining the Tea Party.” Our model identifies voted on, e.g., using topic models and associating legislators who have low (or high) levels of “Tea each dimension of the ideal point space with a topic. Partiness” but are (or are not) members of the Tea The words most strongly associated with the topic Party Caucus, and providing insights into the na- can sometimes provide a readable description of its corresponding dimension. 1http://www.policyagendas.org/
1438 Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1438–1448, Beijing, China, July 26-31, 2015. c 2015 Association for Computational Linguistics ture of polarization within the Republican party. ated text (Gerrish and Blei, 2012; Gu et al., 2014; HIPTM also makes it possible to investigate a num- Lauderdale and Clark, 2014; Sim et al., 2015). ber of questions of interest to political scientists. For example, are there Republicans who identify 3 Hierarchical Ideal Point Topic Model themselves as members of the Tea Party, but whose Bringing topic models (Blei and Lafferty, 2009) votes and language betray a lack of enthusiasm for into ideal-point modeling provides an interpretable, Tea Party issues? How well can we predict from text-based foundation for political scientists to un- someone’s language alone whether they are likely derstand why the models make the predictions they to associate themselves with the Tea Party? Our do. However, both the topic—what is discussed— computational modeling approach to “Tea Parti- and the framing—how it is discussed—also reveal ness”, distinct from self-declared Tea Party Caucus political preferences. We therefore introduce frame- membership, may have particular value in under- specific ideal points, using a hierarchy of topics to standing Republican party politics going forward model issues and their issue-specific frames. Al- because, despite the continued influence of the Tea though the definition of “frame” is itself a moving Party, the official Tea Party Caucus in the House target in political science (Entman, 1993), we adopt of Representatives is largely inactive and its future the theoretically motivated but pragmatic approach uncertain (Fuller, 2015). of Nguyen et al. (2013): just as agenda-issues map naturally to topics in probabilistic topic models 2 Polarization across Dimensions (e.g., Grimmer (2010)), the frames as second-level Ideal point models describe probabilistic rela- agenda-setting (McCombs, 2005) map to second- tionships between observed responses (votes) on level topics in a hierarchical topic model. a set of items (bills) by a set of responders (legis- Our model’s inputs are votes v , each the { a,b} lators) who are characterized by continuous latent response of legislator a [1,A] to bill b [1,B]. ∈ ∈ traits (Fox, 2010). A popular formulation posits an Two types of text supplement the votes: floor ideal point u for each lawmaker a, a polarity x , speeches (documents) w from legislator a , and a b { d} d and popularity yb for each bill b, all being values the text wb0 of bill b. While congressional debates in ( , + ) (Martin and Quinn, 2002; Bafumi are typically about one piece of legislation, we −∞ ∞ et al., 2005; Gerrish and Blei, 2011). Lawmaker a make no assumptions about the mapping between votes “Yes” on bill b with probability wd and wb0 . In principle this allows wd to be any text by legislator ad (e.g., not just floor speeches p(va,b = Yes ua, xb, yb) = Φ(uaxb + yb) (1) | about this bill, but blogs, social media, press re- where Φ(α) = exp(α)/(1 + exp(α)) is the logis- leases) and—unlike Gerrish and Blei (2011)—this 2 tic (or inverse-logit) function. Intuitively, most permits us to make predictions about individuals lawmakers vote “Yes” on bills with high popularity even without vote data for them. Figure 1 shows yb and “No” on bills with low yb. When a bill’s the plate notation diagram of HIPTM, which has the popularity is lower, the outcome of the vote va,b following generative process: depends more on the interaction between the law- 1. For each issue k [1,K] maker’s ideal point ua and the bill’s polarity xb. ∈ ? (a) Draw k’s associated topic φk Dir(β, φk) Multi-dimensional ideal point models replace (b) Draw issue-specific distribution∼ over frames scalars ua and xb with K-dimensional vectors ua ψk GEM(λ0) (c) For∼ each frame j [1, ) (specific to issue k) and xb (Heckman and Jr., 1997; Jackman, 2001; ∈ ∞ i. Draw j’s associated topic φk,j Dir(β, φk) ∼ Clinton et al., 2004). Unfortunately, as Lauderdale ii. Draw j’s regression weight ηk,j (0, γ) ∼ N and Clark (2014) observe, the binary data used for 2. For each document d [1,D] by legislator ad ∈ these models are “insufficiently informative to sup- (a) Draw topic (i.e., issue) distribution θd Dir(α) (b) For each issue k [1,K], draw frame∼ distribu- port analyses beyond one or two dimensions”, and ∈ tion ψd,k DP(λ, ψk) ∼ the additional dimensions are difficult to interpret. (c) For each token n [1,Nd] ∈ i. Draw an issue zd,n Mult(θd) To address this lack of interpretability, recent work ∼ ii. Draw a frame td,n Mult(ψd,z ) has proposed multi-dimensional ideal point models ∼ d,n iii. Draw word wd,n Mult(φz ,t ) to jointly capture both binary votes and the associ- ∼ d,n d,n 3. For each legislator a [1,A] on each issue k [1,K] ∈ ∈ 2A probit function is also often used where Φ(α) is instead (a) Draw issue-specific ideal point ua,k PJk ˆ ∼ the cumulative distribution function of a Gaussian distribu- ( ψa,k,j ηk,j , ρ) weighting ηk,j by how N j=1 tion (Martin and Quinn, 2002). much the legislator talks about that frame
1439 at each frame node is a Dirichlet draw centered at the corresponding (parent) issue node. While the