Mapping Political Communities: A Statistical Analysis of Lobbying Networks in Legislative Politics∗

In Song Kim† Dmitriy Kunisky‡

January 3, 2018

Abstract A vast literature demonstrates the significance for policymaking of lobbying by special interest groups. Yet, empirical studies of political representation have been limited by the difficulty of observing a direct connection between politicians and interest groups. We bridge the two with an original dataset of two distinct observable political behaviors: (1) sponsorship of congressional bills, and (2) lobbying on congressional bills. We develop a latent space network model to locate politicians and interest groups in a common “marketplace”, where proximity implies a closer political connection or alignment of interests. In contrast to repeated findings of ideological latent dimensions in previous literature on such models, we find distinct issue-specific political communities of interest groups and politicians. To validate the existence and interpretation of the community structure, we apply stochastic block models and a bipartite link community model to explicitly model political actors’ community memberships. We consistently find that the latent preference structure of politicians and interest groups is non-ideological and primarily corresponds to industry interests and topical congressional committee memberships. Our findings therefore provide evidence for the existence of powerful political networks in U.S. legislative politics that do not align with the ideological polarization observed in electoral politics.

Keywords: Network analysis, lobbying, ideal point estimation, scaling, stochastic block model, link community model, community detection.

∗We thank J. Lawrence Broz, Devin Caughey, Nolan McCarty, Cristopher Moore, Michael Peress, Yunkyu Sohn, and Hye Young You for helpful comments. Kim acknowledges financial support from the National Science Foundation (SES-1264090 and SES-1725235).

†Assistant Professor, Department of Political Science, Massachusetts Institute of Technology, Cambridge, MA, 02139. Email: [email protected], URL: http://web.mit.edu/insong/www/

‡Ph.D. Student, Department of Mathematics, Courant Institute of Mathematical Sciences, New York University, New York, NY, 10012. Email: [email protected].

1 Introduction

Special interest groups engage in lobbying to promote their political objectives (e.g., Wright, 1990; Grossman and Helpman, 2001).1 A dominant view among political scientists holds that interest groups take part in such costly political activity in order to retain access to policymakers, and in return policymakers gain an effective means of informing their legislative decisions (Bauer, Dexter, and Poll, 1972; Potters and Van Winden, 1992; Austen-Smith and Wright, 1992; Austen-Smith, 1995; Wright, 1996; Ansolabehere, Snyder, and Tripathi, 2002). Others argue that lobbying serves instead as a “legislative subsidy” through which lawmakers work with their “allies” to achieve a common objective (Hall and Deardorff, 2006). In either case, despite the significance of political connections between interest groups and politicians (Khwaja and Mian, 2005; Faccio, Masulis, and McConnell, 2006; Faccio, 2006; Kang and You, 2017), empirical studies of legislative politics have been limited by the difficulty of directly observing these political ties, let alone the actual policy context in which they are formed. In this article, we identify a type of political connection that can be observed tractably between every Member of Congress and every actively lobbying special interest group. We construct a large network dataset of lobbying activity on 108,086 congressional bills introduced between the 106th and the 114th Congress. Our data is unique in joining two distinct types of political behavior around each bill: (1) sponsorship2 by a politician, and (2) lobbying by interest groups. Based on how often politicians and interest groups performing these two political activities coincide on individual bills, we infer the structure of political networks underlying the legislative process of the U.S. Congress. 
Although lobbying on a single bill does not necessarily imply political ties to its sponsor, recurring instances of lobbying that involve the same interest group / sponsor pair across various bills, which our network data also captures, do reliably encode close political relationships.3 Figure 1 shows that numerous bills are introduced in each Congress, the majority of which are lobbied by at least one interest group. Furthermore, as the right panel shows, the distribution of the number of unique interest groups lobbying on a given bill is highly skewed to the right, implying that lobbied bills tend to reflect narrow interests and legislative expertise. A typical example: Mitch McConnell (R-KY) sponsored “A bill to exempt the aging process of distilled spirits from the production period for purposes of capitalization of interest costs” (113th S. 1457), and the

1We use the term special interest groups to refer to any political actors who have particular policy objectives, including firms, trade associations, labor unions, business associations, and professional associations.

2For evidence that sponsorship is likely to be a more reliable position-taking signal than the much less costly political action of cosponsorship, see e.g. Rocca and Gordon (2010).

3There is ample empirical evidence that lobbyists help to draft or even write bills on behalf of legislators with whom they have political connections (Nourse and Schacter, 2002). For an example, see http://westernpriorities.org/2016/05/09/how-much-did-rep-scott-tipton-copy-from-his-biggest-donor-this-much/.

[Figure 1. Left panel: Number of Bills (Bills Introduced, Bills Lobbied, Bills Voted) by Congress, 106th to 114th. Right panel: Number of Bills in 113th Congress by Number of Unique Interest Groups Lobbying per Bill.]

Figure 1: Descriptive Statistics of Lobbying on Congressional Bills. The left panel compares the numbers of bills introduced, lobbied, and voted on between the 106th and the 114th Congress. On average, about 12,000 bills are introduced in each Congress; only a very small subset of bills are eventually voted on the floor, while we identify the majority of bills as lobbied by at least one special interest group. Note that fewer bills are identified as lobbied prior to the 110th Congress, because lobbying report filing was not digitized until the Honest Leadership and Open Government Act of 2007 was passed. The right panel shows the distribution of the number of unique interest groups lobbying on each bill in the 113th Congress. The distribution is highly skewed: the median is three, the maximum is 978, and 25% of the lobbied bills are lobbied by only one interest group.

Distilled Spirits Council of the U.S. was the only interest group to lobby on the bill.4

The main contribution of this paper is to develop statistical network models to determine whether certain interest groups frequently lobby the bills sponsored by particular legislators (or vice versa). First, we introduce a Bayesian latent space model to estimate a latent “ideal point” (or preferred policy) for each political actor. Unlike existing studies that rely on roll calls over just the small subset of bills that are voted on the floor (Poole and Rosenthal, 2011; Clinton, Jackman, and Rivers, 2004), we analyze all Senate and House bills5 and their sponsorship to infer the underlying policy preferences of legislators and interest groups. Our model bridges legislators and interest groups by locating them in a common “marketplace”, in which proximity implies a closer alignment of interests. This approach is similar in methodology to recent studies that uncover ideal points of other political actors through various observable connections to legislators, such as campaign contributions and social media following (Bonica, 2013; Barberá, 2014; Bond and Messing, 2015). We find that the estimated preferred policy locations do not align with existing measures of ideology such as DW-NOMINATE scores (Poole and Rosenthal, 2011). Instead, we observe

4See Kim (2017) for similar patterns in trade bills. 5Many of the bills thought of as “dying” before being voted on are actually amended and merged with other bills that will be voted on the floor eventually, but our dataset allows us to observe political connections before this process takes place, and therefore at a finer granularity.

clustering of interest groups and legislators according to their industry affiliations and memberships in committees with jurisdiction over those industries, respectively. This is in contrast to repeated findings in the literature of ideologically-driven preferences for political actors based on other types of political behaviors and connections, such as roll call votes, campaign contributions, and cosponsorship (e.g., Clinton, Jackman, and Rivers, 2004; Fowler, 2006; Shor and McCarty, 2011; Bonica, 2013; Imai, Lo, and Olmsted, 2016). Yet, this result is consistent with Ansolabehere, Snyder, and Tripathi (2002), who find that the political actors who act mainly through lobbying are likely to be more bipartisan and less ideological than those who act mainly through campaign contributions. It also attests to the influence committees have over policy outcomes from the earliest stages of the legislative process (Shepsle and Weingast, 1987). Furthermore, we find that party control over Congress is an important determinant of the “popularity” of individual legislators in our model, the preferred targets of lobbying skewing towards majority parties in each chamber of Congress. Our findings therefore suggest that legislative sponsorship and lobbying activities are driven not by ideological factors, but by agenda-setting powers (Peress, 2013) and more generally by influential positions of legislators and their political contacts within specific policy domains (Weingast and Moran, 1983).

Next, we explicitly model community memberships of interest groups and politicians in order to validate the existence and interpretation of the community structure observed in the Bayesian latent space model, whose outputs suffer the drawback of requiring arbitrary decisions to be made in demarcating communities.
To address this issue, we use extensions of the stochastic block model, a tool developed to study social networks (Snijders and Nowicki, 1997) and elaborated in greater generality in the machine learning literature, to directly capture the underlying group memberships that drive political actors’ interactions. Consistent with the findings from the Bayesian latent space model, we find several distinct political communities that correspond to committee assignments of politicians and industry categorizations of interest groups. We also verify that the community memberships identified by the proposed stochastic block model largely agree with the clusters of firms and politicians found in the Bayesian latent space model, confirming that the lobbying network exhibits significant non-ideological clustering.

Finally, we allow for the possibility of political actors belonging to multiple political communities, proposing a link community model that identifies interest groups and politicians whose interactions are characterized by “mixed” community membership. We find that politicians who are members of non-industry-specific “procedural” committees such as the House Committee on Ways and Means and the Senate Committee on Appropriations, as well as aggregate interest groups such as the Chamber of Commerce that tend to represent members’ heterogeneous

political interests, are likely to participate simultaneously in multiple political communities. Mixed membership modeling techniques give a precise and principled representation of this essential feature of the lobbying network. Our findings suggest that lobbyists do not necessarily target elected officials who are ideologically sympathetic “allies” interested in advancing specific legislation, a thesis advanced by Hall and Deardorff (2006). Instead, special interest groups and politicians who share domain-specific, rather than ideological, political interests frequently interact, even when those actors have a mixture of domain-specific interests.

Taken together, our findings call for “interests” and “committee power” to be brought back into the study of legislative politics (Shepsle and Weingast, 1987). We show that the majority of legislative activity operates differently from public position-taking driven by electoral motivations, as observed through roll call votes (Mayhew, 1974). We also develop network analysis tools for further research in this direction. To the best of our knowledge, ours is the first statistical study of lobbying networks in legislative politics in their entirety, including both politicians and interest groups. Furthermore, the proposed methodology provides a new empirical framework for applied researchers to systematically identify political communities and characterize their memberships while leveraging the rich structure of network data that have become increasingly available in various subfields of political science (Keck and Sikkink, 1998; Hoff, Raftery, and Handcock, 2002; Maoz et al., 2006; Hafner-Burton, Kahler, and Montgomery, 2009; Clark and Lauderdale, 2010; Ward, Stovel, and Sacks, 2011).

The rest of the paper is organized as follows. In Section 2, we introduce a database of lobbying information and a network dataset obtained from it.
In Section 3, we introduce our Bayesian Latent Space Network Model (LSNM) and apply it to lobbying data from the 111th and 113th Congresses. We then apply the Bipartite Stochastic Block Model (biSBM) and the Bipartite Link Community Model (biLCM) to validate our findings in Section 4 and Section 5, respectively. Lastly, Section 6 gives some concluding remarks. Open-source software implementing the proposed methods will be made available as a package for the R and Python languages. The network data, all estimated pairwise measurements of political connections between politicians and interest groups, the estimated spatial preferred policy locations and their posterior distributions, and the visualization tools used in preparing this paper will also be made publicly available.

2 The Lobbying Network Database

The Lobbying Disclosure Act of 1995 (amended by the Honest Leadership and Open Government Act of 2007) requires mandatory quarterly electronic filing for any organization (“client”) that “actively participates” in lobbying. Filings must disclose the general lobbying issue area, the congressional and federal agencies contacted, and the specific issues on which lobbyists (“registrants”)

have engaged in political activities.6 Based on the rich information available from this disclosure of political activities, researchers have studied various aspects of the politics of lobbying, including the sources of and returns on decisions to lobby (Richter, Samphantharak, and Timmons, 2009; De Figueiredo and Richter, 2014; Kang, 2015; You, 2017), the similarities and differences between campaign contribution and lobbying (Ansolabehere, Snyder, and Tripathi, 2002), characteristics of lobbyists (Baumgartner et al., 2009; Vidal, Draca, and Fons-Rosen, 2012; Bertrand, Bombardini, and Trebbi, 2014), and the implications of lobbying for trade politics (Bombardini and Trebbi, 2012; Kim, 2017). However, the available lobbying data has the important limitation that interest groups are not required to report the identities of their political contacts. This is unfortunate, given that almost 90% of lobbying reports indicate that at least one Member of Congress or a member of their staff was indeed contacted.

We construct an original database describing the lobbying network, which consists of all of the links described above among lobbyists, interest groups, bills, and politicians. The database is generated from lobbying reports filed between 1999 and 2017, which are available from the Senate Office of Public Records (SOPR). A key component of the database is a suite of automated systems to (1) detect congressional bills reported to have been lobbied, (2) identify the session of Congress in which those bills were introduced, and (3) relate each bill to its sponsor. We next give a brief outline of the process used to construct the lobbying database.

First, we identify all lobbying activities related to legislative bills, which is possible because 2 U.S.C. § 1604(b)(2)(A) legally requires interest groups to disclose the “list of bill numbers” lobbied.7 For example, a report filed by Intel Corporation (the only firm to lobby on this bill) reads “H.R. 289, National STEM Education Tax Incentive or Teachers Act of 2011—Science and math education legislation”. In practice, bills are often referenced within the text of a lobbying report, and sometimes only by alternative names or subtitles, so the entire report text must be processed and all bill references algorithmically identified. Second, we identify the session of Congress that each bill belongs to, which is often not mentioned explicitly, as in the above example. We use text data mining techniques to determine whether information such as the bill title or phrases like “Act of [YEAR]” occur in the report text. Based on our algorithm, described in greater detail in Appendix A.1, we correctly determine that, for instance, the above bill H.R. 289 is from the 112th Congress. If the year that the lobbying report was filed had been used to determine the Congress instead, then this bill would have been identified as H.R. 289 “Value Our Time Elections Act”

6See https://lobbyingdisclosure.house.gov/amended_lda_guide.html for the definition of lobbying.

7Compliance with disclosure requirements is closely monitored and enforced. It is annually audited by the Government Accountability Office (GAO). According to the 2014 audit report by GAO, 90% of organizations filed reports as required, and 93% could provide documentation related to expenses. Any interest groups that fail to abide by the law are subject to a $200,000 fine or up to 5 years of imprisonment, or both. The 2014 GAO report on lobbyists’ compliance with disclosure requirements is available from http://www.gao.gov/products/GAO-15-310.

from the 113th Congress (since the report was filed in 2013).8 Finally, we identify the sponsor of each bill, completing the connection between lobbying clients and politicians. We repeat this process over each of 1,111,859 lobbying reports, which in total link 20,092 special interest groups with 1,164 (current and former) Members of Congress. Figure A.3 in the Appendix shows that the political activities of lobbying and sponsorship tend to be heavily skewed in their frequency. In particular, most pairs of politicians and interest groups have very few interactions, while there are a very small number of actors who tend to be highly active, the top few of whom we enumerate in the tables in Figure A.3. For example, the interest group / politician pair with the greatest number of bills lobbied and sponsored in common during the 113th Congress is the Iraq and Afghanistan Veterans of America and Senator Bernard Sanders (D-VT), Senator Sanders sponsoring 25 bills that were all lobbied on by the organization; we will later see that a group of veterans’ associations in fact forms one of the more significant clusters that our models identify. More details on our datasets are given in Appendix A.2.
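The two text-processing steps outlined above (finding bill references and inferring the Congress from phrases like “Act of [YEAR]”) can be illustrated with a minimal sketch. This is not the algorithm of Appendix A.1: the regular expressions, the function names `congress_for_year` and `find_bills`, and the rule of using the first year found in the text are all assumptions made for illustration, and the year-to-Congress mapping ignores the early-January handover between Congresses.

```python
import re

# Hypothetical pattern for bill references such as "H.R. 289" or "S. 1457".
BILL_RE = re.compile(r"\b(H\.\s?R\.|S\.)\s?(\d+)\b")
# Phrases like "Act of 2011" hint at the year the bill was introduced.
ACT_YEAR_RE = re.compile(r"\bAct of (\d{4})\b")


def congress_for_year(year):
    """Map a calendar year to a Congress number (the 1st Congress began in 1789)."""
    return (year - 1789) // 2 + 1


def find_bills(report_text, filing_year):
    """Return (chamber, number, congress) for each bill reference in the text."""
    years = [int(y) for y in ACT_YEAR_RE.findall(report_text)]
    # Prefer a year mentioned in the text; fall back to the filing year.
    year = years[0] if years else filing_year
    bills = []
    for match in BILL_RE.finditer(report_text):
        chamber = "HR" if match.group(1).startswith("H") else "S"
        bills.append((chamber, int(match.group(2)), congress_for_year(year)))
    return bills


report = ("H.R. 289, National STEM Education Tax Incentive or Teachers "
          "Act of 2011 - Science and math education legislation")
print(find_bills(report, filing_year=2013))  # [('HR', 289, 112)]
```

With the title phrase removed, the same call falls back to the 2013 filing year and assigns the 113th Congress, which is the misattribution described above.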

Notations Before describing our models, we introduce some mathematical notation and shorthand terminology that will be useful in the sequel. We index a set of interest groups (we will sometimes use the term “lobbying clients” or simply “clients”, this being the terminology of the Lobbying Disclosure Act) by $[m] = \{1, \dots, m\}$ and a set of legislators by $[n]$. When a bill is reported as lobbied on by client $i$ and sponsored by politician $j$, we say that the client and politician coincide on that bill, and each lobbying report filed by client $i$ that mentions the bill is a political incidence. We denote the number of incidences (over some period of time, usually a single session of Congress) between the pair $i, j$ by $A_{i,j}$, and organize these numbers as the entries of the incidence matrix $A \in \mathbb{R}^{m \times n}$. This matrix may be viewed as the adjacency matrix of a bipartite graph $G$ with weighted edges, where the politicians and interest groups lie on opposite sides of the partition, and the weight of an edge connecting a given client and politician is the number of incidences between them. The degree (a term from network theory) of a client or politician is the sum of the incidences between them and every actor of the other type. Therefore, the degree of a politician is the total number of times bills they have sponsored have been lobbied on (this can be thought of as the total number of bills sponsored, weighted by how much the bill has been lobbied), and the degree of an interest group is simply the total number of times they have lobbied on individual bills. High-degree nodes are important structural features of a network, and we will see the highest-degree agents playing important roles in our models repeatedly.
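To make the notation concrete, the following is a small illustrative sketch of how an incidence matrix and the degrees could be computed from a list of incidences. The records and the names `client_a`, `pol_x`, and so on are invented for illustration and do not come from the dataset.

```python
from collections import Counter

# Hypothetical incidence records: one (client, politician) pair per lobbying
# report that mentions a bill sponsored by that politician.
incidences = [
    ("client_a", "pol_x"),
    ("client_a", "pol_x"),
    ("client_a", "pol_y"),
    ("client_b", "pol_y"),
]

clients = sorted({c for c, _ in incidences})
politicians = sorted({p for _, p in incidences})
counts = Counter(incidences)

# A[i][j] is the number of incidences between client i and politician j.
A = [[counts[(c, p)] for p in politicians] for c in clients]

# Degrees are row sums (clients) and column sums (politicians).
client_degree = {c: sum(row) for c, row in zip(clients, A)}
politician_degree = {p: sum(row[j] for row in A)
                     for j, p in enumerate(politicians)}

print(A)                  # [[2, 1], [0, 1]]
print(client_degree)      # {'client_a': 3, 'client_b': 1}
print(politician_degree)  # {'pol_x': 2, 'pol_y': 2}
```

Both sets of degrees sum to the total number of incidences, as expected for a weighted bipartite graph.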

8In fact, the OpenSecrets.org database from the Center for Responsive Politics, which is often used in academic research, erroneously finds that Intel lobbied on H.R. 289 (113th). See http://www.opensecrets.org/lobby/billsum.php?id=hr289-113.

3 A Bayesian Latent Space Model of the Lobbying Network

In this section, we develop a Bayesian Latent Space Network Model (LSNM). Following the introduction of this style of model by Hoff, Raftery, and Handcock (2002), there has been a great proliferation of variants, of which we mention just a few that are especially useful for understanding our methods. Our model is formally similar to the “Wordfish” model of Slapin and Proksch (2008) for text analysis, but while we estimate the underlying preference structure of two distinct groups of interacting political actors, Slapin and Proksch (2008) estimate ideal points of a single group of political agents based on text data. Our model is also closely related to that of Barberá (2014), which relates ideal points of politicians and social media users, but instead of modeling whether a binary political interaction exists (e.g., Twitter follows), we consider how frequently political contacts occur, our interaction measurements therefore having an associated magnitude.

3.1 The Model

We model the connection between client $i$ and legislator $j$ as a function of the two actors’ latent preferred policy positions in a $d$-dimensional Euclidean space, $\theta_i \in \mathbb{R}^d$ and $\psi_j \in \mathbb{R}^d$ respectively, and assume that the frequency of their interactions $A_{i,j}$ has a Poisson distribution, with its mean parametrized by the Euclidean distance $\|\theta_i - \psi_j\|$.

To account for the differences in agents’ baseline propensities to sponsor or lobby (see Figure A.3), we also include client- and legislator-specific terms, $\alpha_i$ and $\beta_j$ respectively, in the mean of the modeled $A_{i,j}$.9 We then take the probabilistic model

$$A_{i,j} \sim \mathrm{Poisson}(\mu_{i,j}) \tag{1}$$

$$\mu_{i,j} = \exp\left(\alpha_i + \beta_j - \|\theta_i - \psi_j\|\right) \tag{2}$$

for the entries, which we model as independent conditional on the parameters, to obtain the following joint distribution:

$$P(A \mid \alpha, \beta, \theta, \psi) = \prod_{i=1}^{m} \prod_{j=1}^{n} \mathrm{Poisson}\left(A_{i,j} \mid \exp\left(\alpha_i + \beta_j - \|\theta_i - \psi_j\|\right)\right). \tag{3}$$

We take a further hierarchical extension of this model where the latent space positions of clients and politicians are distributed with multivariate normal population distributions, $\mathcal{N}(0, \mathrm{diag}(\tau))$ and $\mathcal{N}(\mu, \Sigma)$ respectively, where the distribution of client positions is centered and assumed to have a diagonal covariance matrix to account for translational and rotational invariances of the latent space positions. Likewise, popularity factors are assumed to be drawn from normal population distributions, $\mathcal{N}(\nu, \sigma^2_{(L)})$ and $\mathcal{N}(0, \sigma^2_{(P)})$, where the latter is centered to account for the translational invariance of the popularity factors. The posterior distribution under this model is then:

$$P(\alpha, \beta, \theta, \psi \mid A) \propto \prod_{i=1}^{m} \prod_{j=1}^{n} \mathrm{Poisson}\left(A_{i,j} \mid \exp\left(\alpha_i + \beta_j - \|\theta_i - \psi_j\|\right)\right) \times \prod_{i=1}^{m} \mathcal{N}(\theta_i \mid 0, \mathrm{diag}(\tau)) \times \prod_{j=1}^{n} \mathcal{N}(\psi_j \mid \mu, \Sigma) \times \prod_{i=1}^{m} \mathcal{N}(\alpha_i \mid \nu, \sigma^2_{(L)}) \times \prod_{j=1}^{n} \mathcal{N}(\beta_j \mid 0, \sigma^2_{(P)}). \tag{4}$$

9These variables in the literature are variously called “popularity”, “gregariousness”, or “idiosyncratic” factors (we adopt the first term). In other contexts, such as stochastic block models, these variables are called “degree correction” factors, since they adjust the model to account for broad degree distributions. A similar construction to ours was tested thoroughly on synthetic datasets by Krivitsky et al. (2009).
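As a rough illustration of the likelihood in equation (3), the sketch below evaluates the Poisson log-likelihood for given parameter values in plain Python. It is not the Stan code of Appendix A.3; the function name `log_likelihood` and the toy inputs are assumptions, and the hierarchical priors of equation (4) are omitted.

```python
import math


def log_likelihood(A, alpha, beta, theta, psi):
    """Poisson log-likelihood of the incidence counts A under equations (1)-(3)."""
    total = 0.0
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            dist = math.dist(theta[i], psi[j])   # Euclidean distance
            log_mu = alpha[i] + beta[j] - dist   # log of the Poisson mean
            # log Poisson(a | mu) = a * log(mu) - mu - log(a!)
            total += a_ij * log_mu - math.exp(log_mu) - math.lgamma(a_ij + 1)
    return total


# Toy example (invented values): one client, two politicians, d = 2.
A = [[2, 0]]
alpha, beta = [0.0], [0.0, 0.0]
theta = [(0.0, 0.0)]
psi = [(0.0, 0.0), (3.0, 4.0)]
print(log_likelihood(A, alpha, beta, theta, psi))
```

Adding the log-densities of the normal population distributions to this quantity would give the unnormalized log-posterior of equation (4).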

Identifiability As alluded to above, the priors we use resolve identifiability problems due to the invariance of $\theta_i$ and $\psi_j$ under rotations and translations, and the invariance of $\alpha$ and $\beta$ under translations. The remaining identifiability issue is the invariance of $\theta_i$ and $\psi_j$ under reflections, which makes the posterior distribution multimodal. Note that this becomes more and more of a problem as the latent space dimension increases: in dimension $d$, we would expect the posterior to have $2^d$ modes (one for each choice of which axes to reflect across). We will only consider $d \in \{1, 2\}$, and with $d = 2$ we find that this issue can still be resolved by the same method as is usual with $d = 1$ in ideal point estimation, namely by fixing the position of one agent whose position we believe to be far from the origin of the latent space (near the origin, a point is close to all of its reflections). In practice, even fixing the starting position of one important agent (in the sense of being connected to many others and therefore affecting the latent space geometry to a significant extent) appears usually to be enough to ensure that MCMC sampling will only explore one mode of the posterior. As we will see, the agents on the periphery of the latent space tend to belong to particular industries. Thus, we choose one significant client from such an industry (we have found large energy holdings companies, which together form the “core” of the very large energy industry cluster, to work well for this purpose, but in principle any important member of the industries we will highlight would suffice) and fix its starting position in a particular orthant of $\mathbb{R}^d$, which resolves the reflection identifiability issue adequately for our purposes. For a thorough discussion of this problem when $d = 1$, see Bafumi et al. (2005). For some technical discussion of an unusual identifiability issue we observed when replacing $\|\theta_i - \psi_j\|$ in our model with $\|\theta_i - \psi_j\|^2$, see Appendix A.3.3.
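The invariances discussed above can be checked numerically. The sketch below, which uses invented parameter values and a hypothetical helper `poisson_means`, confirms that the Poisson means of equation (2) are unchanged when all positions are reflected across an axis, and when the popularity factors $\alpha$ and $\beta$ are shifted by opposite constants.

```python
import math


def poisson_means(alpha, beta, theta, psi):
    """Matrix of Poisson means exp(alpha_i + beta_j - ||theta_i - psi_j||)."""
    return [[math.exp(a + b - math.dist(t, p)) for b, p in zip(beta, psi)]
            for a, t in zip(alpha, theta)]


def reflect(points):
    """Reflect 2-D points across the first axis (flip the second coordinate)."""
    return [(x, -y) for x, y in points]


def close(M, N):
    """Entrywise comparison of two matrices up to floating-point error."""
    return all(math.isclose(x, y, rel_tol=1e-12)
               for r, s in zip(M, N) for x, y in zip(r, s))


# Invented parameter values: two clients, three politicians, d = 2.
alpha, beta = [0.5, -0.2], [0.1, 0.3, -0.4]
theta = [(1.0, 2.0), (-0.5, 0.7)]
psi = [(0.0, 0.0), (2.0, -1.0), (-1.5, 1.0)]
base = poisson_means(alpha, beta, theta, psi)

# Reflection of every position leaves all pairwise distances unchanged.
reflected = poisson_means(alpha, beta, reflect(theta), reflect(psi))

# Shifting alpha up and beta down by the same constant cancels in the mean.
c = 0.7
shifted = poisson_means([a + c for a in alpha], [b - c for b in beta], theta, psi)

print(close(base, reflected), close(base, shifted))  # True True
```

Because the likelihood cannot distinguish these configurations, the choice among them has to come from the priors or from constraints such as the fixed starting position described above.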

Computation We fit our model with the Hamiltonian Monte Carlo No-U-Turn Sampler implementation in the Stan software package (Carpenter et al., 2017). In the results presented here, we will work mainly with estimated posterior means of our parameters taken as point estimates, and perform some further analysis of these mean latent space positions and popularity factors so as to focus on illustrating politically meaningful findings. We give further details, a more technical evaluation of our implementation, and the Stan code describing our model in Appendix A.3.

[Figure 2. Scatter plot of DW-NOMINATE Dimension 1 (vertical axis) against Lobbying Network 1D Latent Space Position (horizontal axis); ρ = 0.13.]

Figure 2: 1-Dimensional LSNM vs. DW-NOMINATE Ideology. We illustrate the lack of correlation between politicians’ latent space positions inferred from a one-dimensional LSNM and the DW-NOMINATE ideology dimension. This is the first suggestion from our modeling that the ideological structures common in electoral politics are not prevalent in the lobbying network. Points are colored on a linear scale from blue to red according to the DW-NOMINATE dimension as well, to introduce a convention we will use throughout the paper.

3.2 Empirical Applications to Recent Congresses

We apply the proposed model to the filtered lobbying network dataset for the 113th Congress, whose precise construction is given in Appendix A.2. We first consider a one-dimensional LSNM to check whether the estimated latent preferred policy positions of politicians ($\psi_j \in \mathbb{R}$) correlate with the DW-NOMINATE ideology measures. As Figure 2 shows, we do not find evidence that ideological differences drive the interaction between interest groups’ lobbying and politicians’ legislative activities (Pearson’s $\rho = 0.13$ between our latent dimension and the DW-NOMINATE ideology dimension). This lack of correlation also holds when the sample is restricted to only Democratic or only Republican Members of Congress. This is in stark contrast to other studies that find a strong ideological clustering based on network data such as campaign contributions (Bonica, 2013), Twitter follows (Barberá, 2014), and Facebook likes/endorsements (Bond and Messing, 2015).10

10We note that, consistent with Fowler (2006), we find a strong ideological dimension when we include cosponsorships in formulating the incidence matrix rather than only sponsorships. Only in the formulation with sponsorship do we observe meaningful structure for both interest groups and politicians, so we consider this a more meaningful dataset for analyzing the lobbying network.
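For readers who wish to reproduce this kind of check on their own estimates, Pearson's correlation between one-dimensional latent positions and an ideology score can be computed as in the sketch below. The numbers are invented placeholders, not values from the paper's data, and `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five politicians (not taken from the dataset).
latent_positions = [-2.1, -0.4, 0.3, 1.2, 2.5]
dw_nominate = [0.45, -0.38, 0.52, -0.41, 0.10]
parties = ["R", "D", "R", "D", "R"]

rho_all = pearson(latent_positions, dw_nominate)

# Within-party check: restrict both sequences to Republican members only.
rep = [k for k, party in enumerate(parties) if party == "R"]
rho_rep = pearson([latent_positions[k] for k in rep],
                  [dw_nominate[k] for k in rep])
print(rho_all, rho_rep)
```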

[Figure 3. Scatter plot of Latent Space Dimension 2 (vertical axis) against Latent Space Dimension 1 (horizontal axis). Legend: Politicians (Democrat), Politicians (Republican), Lobbying Clients. Cluster labels recoverable from the annotations include Gun Rights, Health, Universities, Finance, Insurance, Travel, Military & Veterans’ Affairs, Energy, Conservation & Land Use, Telecom, and Technology, each listed with representative politicians and clients.]

Figure 3: Estimated LSNM Positions: Full 113th Congress. This figure presents the two-dimensional latent space positions and popularity factors inferred from the 113th Congress dataset. We indicate several significant clusters corresponding to specific industries and issue areas. The color dimension scales from red to blue with the DW-NOMINATE ideology dimension, and the point size is proportional to the exponential of an agent’s popularity factor ($\exp(\alpha_i)$ or $\exp(\beta_j)$). We annotate the clusters with some representative members; for complete membership lists for each of the highlighted clusters, consult Appendix A.6. The region indicated with a dashed circle is the “center cluster” where we observe a strong concentration of politicians belonging to procedural committees, which we will analyze extensively with a link community model in Section 5. Two agents with outlying latent space positions (below the region shown), Rob Woodall (R-GA) and Welch Allyn, Inc., are omitted for the sake of visual clarity.

Next, we move to a two-dimensional model to facilitate the interpretation of our estimates.

Figure 3 presents the posterior means of the estimated latent spatial positions, $\theta_i$ and $\psi_j$, for all clients $i$ and politicians $j$ in the two-dimensional LSNM. Each client and each politician is represented by a circle; the clients’ circles are colored black, and the politicians’ circles are colored according to their DW-NOMINATE ideology measures, which in practice yields a graded version of the Democrat/Republican party split. The size of each circle is proportional to the exponential of the agent’s popularity factor, so that the mean of the Poisson interaction between two agents at a fixed distance is proportional to the product of their circles’ sizes, an intuitive scaling for visual interpretation. We analyze the posterior distributions further in Appendix A.3, with particular attention to the uncertainties of the point estimates given by the posterior means, to illustrate the stability of these results.

Several findings emerge from this analysis. First, we find that significant clustering appears in our representation of the data. As shown by the example members of each group indicated in Figure 3, the clustering among lobbying clients is strongly industry-based. These clusters emerge even though the clients involved do not lobby on the same bills (recall that most bills are lobbied on by only a few clients). In other words, related clients are “pulled together” through their indirect connections to the same politicians, who sponsor separate bills on which the clients individually lobby. Second, as in the one-dimensional model, there is no clear partisan divide either across or within industry clusters. Rather, we find in each large cluster of clients a concentration of politicians from relevant committees involved with the policy issues that directly affect those clients. For instance, Figure 4 shows that the legislators who belong to the “Finance” and “Insurance” clusters are likely to serve on the House Financial Services Committee.
We provide the full lists of cluster memberships in Appendix A.6 for reference. Finally, the relative geometry of the clusters is also meaningful: we observe adjacencies between the clusters corresponding to “Finance” and “Insurance”, “Technology” and “Telecommunications”, and “Energy” and “Military & Veterans’ Affairs”. This corresponds with common intuitions about relationships among industries, confirming that our spatial model captures meaningful structure in the lobbying network.

Not all regions of our spatial model, however, are characterized by domain-specific policy-related connections. We find that a region in the center of the latent space (the dashed circle region in Figure 3) is populated by politicians with committee memberships that are not directly related to the industries of the relevant clients (which vary widely) but rather concern federal spending and taxation, most prominently the House Committee on Ways and Means and the House Committee on Appropriations, as shown in Figure 4. (Indeed, trade and taxation tend to be affected by complex interdependencies across several industries.) There are also several high-degree

[Figure 4 plot. Three histogram panels: Finance & Insurance Clusters, Energy Cluster, Center Cluster. Vertical axis: Number of Politicians; horizontal axis: congressional committees.]

Figure 4: Committees Membership of Politicians Per Cluster. Histograms of the top 10 committee memberships for politicians in the larger clusters highlighted in Figure 3. Senate committees and House committees are prefixed with (S) and (H), respectively.

lobbying clients in this region: for example, the Chamber of Commerce (the single highest-degree lobbying client) and the Nuclear Energy Institute are located on the ideologically conservative side of this area, while the American Civil Liberties Union, Specialty Equipment Market Association, Mental Health America, and Human Rights Campaign are located on the ideologically liberal side. The same holds for politicians: the cluster includes Thomas Harkin (D-IA), Patty Murray (D-WA), and Carl Levin (D-MI) on the ideologically liberal side, and Harold Rogers (R-KY) and Frank Lucas (R-OK) on the ideologically conservative side.

Next, we infer the same model parameters from two datasets, each restricted to the bills introduced in one chamber of Congress.11 We consistently find the same cluster structure in these models as in that of the full Congress, suggesting that lobbying clients do not typically have a preferred chamber of Congress. However, one significant additional finding from this exercise is that politicians of the party holding a majority in a given chamber typically have a higher propensity for their bills to be lobbied on. Figure 5 displays the histograms of politician popularity factors $\beta_j$ obtained from these datasets.12 We observe an alignment between lobbying popularity at the party level and party majority for both the 113th Congress, in which Congress was split with one chamber controlled by each party, and the 111th Congress, in which both chambers were controlled by the same party. This suggests that power positions of legislators, in addition to their political expertise and

11For the sake of consistency, we use the filtering procedure based on lobbying on all bills introduced in the 113th Congress to determine the lobbying clients to be included in our dataset, and form two datasets by counting only bills in either chamber, then applying a final filtering pass.

12Note that the popularity factor does not necessarily directly correspond to a greater propensity to lobby or sponsor in our model; in particular, an agent is occasionally embedded far from all others but has a large popularity factor, which corresponds to a modest but uniform propensity to interact with many other agents. Nonetheless, since the agent positions in our latent space exhibit a “crowded center” and accordingly these outlying cases are quite rare, the popularity factor is a reasonable proxy for the “inherent propensity” of a politician to have their bills lobbied on.

[Figure 5 plot. Four histogram panels: 111th Senate (Democrat Majority), 113th Senate (Democrat Majority), 111th House (Democrat Majority), 113th House (Republican Majority). Horizontal axis: Popularity Factor $\beta_j$; vertical axis: Relative Frequency. Legend: Democrats, Republicans, with the Democrat and Republican means (µ) marked.]

Figure 5: Lobbying Skews Towards Majority Parties. Histograms of politician popularity factors per party in each chamber, reflecting in each case the greater popularity of the party with a majority in that chamber. The histograms are normalized to account for the imbalance in numbers of members of each party in each chamber (i.e., the fact that there are chamber majority parties in the first place).

connections within specific policy domains (Shepsle, 1978), are involved in determining the salient patterns of lobbying.

Finally, we examine the heterogeneity of lobbying structures within industries. This is an important exercise because the grouping presented in Figure 3 can be seen as arbitrary without systematically examining the structure of possible subclusters. Before dealing with this problem more formally with a stochastic block model in Section 4, we proceed by limiting our analysis to a set of bills that concern a particular policy domain. We choose to study the energy sector, both because the industry has recently undergone important changes, such as the shale-gas revolution and growing environmental issues generating major political conflicts (Rabe and Borick, 2013; Rabe, 2014), and because it emerges as a very prominent cluster in our analysis. In fact, “Energy” is also one of the most lobbied policy issues overall. We analyze all bills that the Congressional Research Service has identified as “Energy” bills to check whether the proposed spatial model identifies any distinct political communities within the industry. Figure 6 visualizes the results of this model. We observe a strong separation among firms related to electricity production and domestic energy needs (utilities, hydropower, natural gas, and coal) on the one hand, and firms related to fuel

[Figure 6 plot. Horizontal axis: Latent Space Dimension 1; vertical axis: Latent Space Dimension 2. Legend: Politicians (Democrat), Politicians (Republican), Lobbying Clients. Clusters are annotated with representative politicians and lobbying clients.]

Figure 6: Estimated LSNM Positions: Energy Bills. We present the same visualization as in Figure 3 but for the LSNM inferred from a dataset built with only energy-related bills. The same clustering into industry-based groups arises: in addition to a number of meaningful sub-industries (Coal, Hydropower, Oil, Renewables, Utilities), we see clusters corresponding to other industries having an interest in energy legislation (Automotive, Food, Construction). We also observe a large-scale division between clusters pertaining to fuel and transportation energy on the one hand, and clusters pertaining to electricity generation and distribution on the other (on the top and bottom of our plot, respectively).

production (oil, biofuels) and vehicular energy needs on the other. Moreover, we find that within each group, firms also separate by product and home industry (for firms not directly part of the energy industry). The findings from this section provide strong evidence for the existence of political communities in legislative politics that do not align with the known ideological divisions in the U.S. Congress, and for the necessity of explicitly modeling community memberships in order to better understand the lobbying network.

4 A Community Model of the Lobbying Network

In this section, we validate the existence and interpretation of the underlying political community structure that we identified in Section 3. We use variants of the stochastic block model (SBM) to explicitly model the interaction between a politician and a client as a function of their community memberships in a probabilistic framework, so that politicians and interest groups with shared group memberships interact more frequently. The findings from this intuitive model confirm that political communities do exist, and that political ideology plays a minimal role in driving agents’ interaction and determining lobbying activities in the U.S. Congress.

4.1 Bipartite Stochastic Block Model

We consider a generative model of political networks to validate the community structures we have found, using variants of the stochastic block model (SBM) to infer the community structure in our data. Unlike the LSNM, SBMs explicitly assume a data generating process in which the existence of communities and their memberships determine the probability of political interaction. Furthermore, being probabilistic models, they give a principled way to perform statistical inference and estimation of the community parameters.

We base our analysis on the Bipartite Stochastic Block Model (biSBM) developed by Larremore, Clauset, and Jacobs (2014). Following the formulation of the LSNM in Section 3, we model $A_{i,j}$ as having a Poisson distribution, whose mean now depends exclusively on the cluster memberships of client $i$ and politician $j$. To formalize this for our case, we introduce a matrix parameter $B \in \mathbb{R}^{k \times \ell}$ with entries $B_{r,s}$, where $k$ is the number of client clusters and $\ell$ the number of politician clusters, viewed as hyperparameters.13 To fix these, we draw upon our exploratory analysis of the lobbying network with the LSNM, which suggested roughly how many distinct clusters we might expect to exist. Using the analysis of Figure 3, we choose to use eight politician clusters and eight lobbying client clusters throughout our analysis. The membership parameters are denoted by $x_i \in [k]$ and $y_j \in [\ell]$ for each $i \in [m]$ and $j \in [n]$.

13The original work of Larremore, Clauset, and Jacobs (2014) treats as multiple edges what we treat as non-negative integer edge weights, but formally the models are identical.

Then, the entries $A_{i,j}$ are modeled as

$$A_{i,j} \sim \mathrm{Poisson}(B_{x_i, y_j}) \tag{5}$$

independently, so that the joint probability is:

$$P(A \mid B, x, y) = \prod_{i=1}^{m} \prod_{j=1}^{n} \mathrm{Poisson}(A_{i,j} \mid B_{x_i, y_j}). \tag{6}$$
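As a rough illustration of equation (6), the following Python sketch evaluates the biSBM log-likelihood for a given incidence matrix, rate matrix, and pair of membership vectors. The function name `bisbm_log_likelihood` and the toy data are hypothetical and are not taken from the code of Larremore, Clauset, and Jacobs (2014); the sketch also assumes that every entry of $B$ is strictly positive.

```python
import math
import numpy as np

def bisbm_log_likelihood(A, B, x, y):
    """Log of equation (6), assuming strictly positive rates B (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # rates[i, j] = B[x[i], y[j]], the Poisson mean for client i and politician j.
    rates = B[np.ix_(x, y)]
    # log(A[i, j]!) computed through the log-gamma function.
    log_factorials = np.vectorize(math.lgamma)(A + 1.0)
    return float(np.sum(A * np.log(rates) - rates - log_factorials))

# Toy incidence matrix (illustrative only): 4 clients by 4 politicians, two blocks.
A = np.array([[3, 2, 0, 0],
              [4, 3, 0, 0],
              [0, 0, 2, 3],
              [0, 1, 3, 2]])
B = np.array([[3.0, 0.1],
              [0.1, 3.0]])

aligned = bisbm_log_likelihood(A, B, [0, 0, 1, 1], [0, 0, 1, 1])
shuffled = bisbm_log_likelihood(A, B, [0, 1, 0, 1], [0, 0, 1, 1])
print(aligned, shuffled)
```

Memberships that follow the block structure of the toy matrix receive a higher log-likelihood than memberships that cut across it, which is the quantity a fitting procedure would seek to maximize over $B$, $x$, and $y$.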

Degree Correction. The biSBM usually exhibits the undesirable property of grouping nodes by their degree, especially when the degree distribution of the network is not tightly concentrated, a feature that we observe in our lobbying network data. Specifically, the model is likely to cluster together politicians who sponsor a larger number of lobbied bills, and clients who lobby on many bills, regardless of their shared community membership. To adjust for this, we follow Larremore, Clauset, and Jacobs (2014) and introduce additional parameters $\alpha_i$ and $\beta_j$ per client and politician respectively, adjusting the model given by equation (5) to

$$A_{i,j} \sim \mathrm{Poisson}(\alpha_i \beta_j B_{x_i, y_j}), \tag{7}$$

for $\alpha_i, \beta_j > 0$. We must now identify these new parameters, since among the sets of parameters $\alpha$, $\beta$, and $B$, any one can be multiplied by a positive constant and any other divided by the same constant without changing the model. This is easily resolved by constraining $\sum_{i : x_i = a} \alpha_i = \sum_{j : y_j = b} \beta_j = 1$ for all $a \in [k]$ and $b \in [\ell]$. Letting $\deg(\cdot)$ be the degree of an agent, the maximum likelihood estimates are

$$\hat{\alpha}_i = \frac{\deg(i)}{\sum_{i' : x_{i'} = x_i} \deg(i')}, \qquad \hat{\beta}_j = \frac{\deg(j)}{\sum_{j' : y_{j'} = y_j} \deg(j')}, \tag{8}$$

the fractions of the edges leaving a node’s cluster that are leaving the node itself. Due to that interpretation of the new parameters, the resulting model is named the Degree-Corrected Bipartite Stochastic Block Model (dc-biSBM). As has been shown in Karrer and Newman (2011) and Larremore, Clauset, and Jacobs (2014), degree correction is an important modeling mechanism in networks with outlying high-degree nodes, and in this setting the dc-biSBM typically obtains more meaningful clusterings than the biSBM. We estimate the maximum likelihood values of $B$, $x$, $y$ under the biSBM and the dc-biSBM using the code provided by Larremore, Clauset, and Jacobs (2014), taking the best parameter set of 50 independently initialized runs.
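The closed-form estimates in equation (8) can be sketched in a few lines of Python for fixed memberships $x$ and $y$. The function name and the toy data below are hypothetical. The sketch assumes that every cluster contains at least one edge, and it additionally returns the block totals of $A$, which under the normalization constraint above are the maximizing values of $B$ (an observation that follows from the Poisson likelihood but is not stated explicitly in the text).

```python
import numpy as np

def dc_bisbm_estimates(A, x, y, n_client_clusters, n_politician_clusters):
    """Equation (8) for fixed memberships x, y (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x)
    y = np.asarray(y)
    client_degree = A.sum(axis=1)      # deg(i): total incidences of client i
    politician_degree = A.sum(axis=0)  # deg(j): total incidences of politician j
    # Total degree of each cluster (assumed nonzero for every cluster).
    client_cluster_degree = np.bincount(
        x, weights=client_degree, minlength=n_client_clusters)
    politician_cluster_degree = np.bincount(
        y, weights=politician_degree, minlength=n_politician_clusters)
    alpha = client_degree / client_cluster_degree[x]
    beta = politician_degree / politician_cluster_degree[y]
    # One-hot membership matrices; B_hat[r, s] is the total edge weight
    # between client cluster r and politician cluster s.
    X = np.eye(n_client_clusters)[x]
    Y = np.eye(n_politician_clusters)[y]
    B_hat = X.T @ A @ Y
    return alpha, beta, B_hat

# Toy example (illustrative only): 3 clients, 2 politicians.
A = np.array([[2, 0],
              [1, 1],
              [0, 3]])
alpha, beta, B_hat = dc_bisbm_estimates(A, [0, 0, 1], [0, 1], 2, 2)
print(alpha, beta, B_hat)
```

Each $\hat{\alpha}_i$ is the share of its cluster's total degree that belongs to client $i$, so the estimates sum to one within every cluster, as the identification constraint requires.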

4.2 Empirical Findings

Figure 7 shows the incidence matrix $A$, with its rows and columns ordered according to the groupings given by the biSBM (top panel) and the dc-biSBM (bottom panel). Both models show that there exist clear “checkerboard” community patterns in the lobbying network data. Based on this result, we may with greater confidence interpret as genuine political communities the groupings of politicians and lobbying clients marked by hand in Figure 3 based on the LSNM results.

[Figure 7 plot. Two panels: Bipartite Stochastic Block Model (top) and Degree-Corrected Bipartite Stochastic Block Model (bottom). Horizontal axis: Lobbying Clients; vertical axis: Politicians.]

Figure 7: Community Model Comparison. Results of the biSBM and the dc-biSBM on the main dataset, shown by permuting the incidence matrix to group together the clusters that each algorithm finds. As expected, the two algorithms to different extents favor grouping high-degree nodes together in small clusters, the dc-biSBM producing the most balanced cluster sizes.

As mentioned before, without degree correction, the biSBM suffers from the flaw of grouping high-degree nodes together. Indeed, enumerating the eight client (politician) clusters from left to right (top to bottom), Client Group 5 in the biSBM consists of exactly the five highest-degree lobbying clients. The dc-biSBM, in contrast, gives much more balanced cluster sizes and less concentration of the highest-degree agents. Degree correction also yields cluster memberships which, especially on the lobbying client side, much more accurately reflect conventional industry boundaries. For instance, Client Group 8 in the dc-biSBM contains a range of energy-related firms, their focuses including coal and mining (National Mining Association, Peabody Energy), natural gas (American Natural Gas Alliance), utilities (National Grid USA, Consolidated Edison), and renewables (National Corn Growers Association, Clean Energy Fuels Corporation, Renewable Biofuels). The situation is similar for other industries. The dc-biSBM’s Client Group 1 captures a range of health-related firms, including health insurance (Anthem, Blue Cross Blue Shield and two state subsidiaries), pharmaceuticals (GlaxoSmithKline), education (Association of American Medical Colleges, American College of Physicians), and hospitals (Greater New York Hospital Association). Client Group 6 contains finance companies, including banks (Capital One, Bank of America), insurance companies (Aflac, State Farm), and other industry associations (International Swaps and Derivatives Association, Financial Services Roundtable). Client Group 4 contains a broad range of technology companies, including internet companies (Google, Yahoo!, eBay), telecommunications (Verizon, AT&T), and hardware manufacturers (Intel, Semiconductor Industry Association). Overall, we find that degree correction is an essential ingredient in enabling the biSBM to capture meaningful clustering structure in the lobbying network.

Previously, we interpreted the proximity of the latent space locations obtained by the LSNM as a measure that reflected community membership. To validate this interpretation, we now compare the results from the LSNM in Section 3 with the maximum-likelihood community memberships found by the dc-biSBM. Figure 8 overlays the clusters identified by the dc-biSBM on the geometry obtained from the latent space model. To facilitate a direct comparison, we show the latent positions of each individual political actor as well as the subjective grey boundaries that we used to group clients and politicians.
We then form eight paired political communities, each consisting of one lobbying client community and one politician community. For example, we group all clients in Client Group 1 and all legislators in Politician Group 4 and call them members of “Community 1.” Finally, we assign different colors and markers to visually distinguish the eight political communities. The figure shows that the community structures align remarkably well with each other: the boundaries that we drew indeed correspond to distinct political communities or to mixtures of two or three political communities (this is especially prevalent near the center of the latent space, which we observed earlier did not exhibit strong industry-level clustering). Moreover, we confirm that the relative geometry of the LSNM clusters was indeed meaningful. For example, we see that the “Finance” and “Insurance” clusters that we identified as being close to each other are in fact grouped as a single political community by the dc-biSBM (“Community 3” with green squares in the center-right part of the figure). Similarly, the “Technology” and “Telecom” clusters are mostly merged, with a small group of mainly politicians (purple triangles) separated. More interestingly, the “Veterans’ Affairs” cluster is extended (brown plus signs) to

[Figure 8 plot. Horizontal axis: Latent Space Dimension 1; vertical axis: Latent Space Dimension 2. Legend: Community 1 through Community 8.]

Figure 8: Latent Space Position vs. Community Membership. We plot paired communities of clients and politicians discovered by the dc-biSBM against the latent positions from the LSNM. These communities align well with the industry-level clusters we manually found, as shown previously in Figure 3. The community pairs, written as pairs of a Lobbying Client Community and Politician Community referred to by index, are as follows: (1, 4), (2, 8), (3, 7), (4, 1), (5, 5), (6, 3), (7, 6), and (8, 2).

include a number of civil rights groups such as the National Association for the Advancement of Colored People and the Leadership Conference on Civil and Human Rights, forming a larger cluster that we might call “Humanitarian Causes”, which is only weakly visible from direct examination of the latent space. Similarly, a small cluster we identified as “Gun Rights” in the latent space model is extended (gray circles) to include much of the “Conservation and Land Use” cluster, which is plausible as hunting rights are an issue of common interest for many groups in these clusters.

Thus, we confirm with an explicit analysis that communities do in fact exist in the lobbying network underlying U.S. legislative politics, and that these communities are aligned not with the one-dimensional ideological polarization observed in electoral politics, but rather with a rich collection of industry- and committee-level distinctions that are differently elucidated by different statistical models. Moreover, we find that in some cases, the non-geometric approach of stochastic block modeling can uncover interesting community structure that is not clear from the ideal point models previously used in much of the relevant literature.

5 A Link Community Model of the Lobbying Network

In this section, we propose a link community model that captures actors’ simultaneous membership in several political communities. In such a model, it is network links (edges) rather than nodes that belong to political communities, so that the interactions between politicians and interest groups around various issues are grouped (as it will turn out) by issue area, and actors’ simultaneous memberships in these communities may be calculated as the relative frequency with which they participate in each community. This methodology allows us to capture the important possibilities of politicians having multidimensional preferences involving several issues (Lauderdale and Clark, 2014), as well as of aggregate interest groups such as industry organizations accommodating the heterogeneous interests of their members (e.g., individual firms).

5.1 Mixed Membership in Multiple Political Communities

A salient feature of legislative lobbying is that political interactions between legislators and interest groups are not limited to a single issue type. That is, even a single politician and lobbying client pair frequently coincide on bills that do not intuitively belong in a single community together. This is especially apparent between agents with broad ranges of interests, like large lobbying clients and senior politicians. For example, the Chamber of Commerce and Senator Barbara Boxer (D-CA) coincide on both the “Water Resources Development Act of 2013” (113th S. 601) and the “United States-Israel Strategic Partnership Act of 2013” (113th S. 462). Likewise, the Specialty Equipment Market Association (SEMA) and Senator John Cornyn (R-TX) coincide on both the “Keep the IRS Off Your Health Care Act of 2013” (113th S. 983) and the “21st Century Endangered Species Transparency Act” (113th S. 2635).

Therefore, a natural modeling assumption is that political actors have simultaneous membership in several legislative communities, rather than just one. This is particularly apparent among politicians, who almost always sit on several committees having distinct legislative jurisdictions. The same applies, if less transparently, to lobbying clients: many firms, especially larger holding companies, lobby on behalf of their diverse subsidiaries in multiple arenas. For instance, we have seen in our previous models two prominent legislative-industrial communities, one corresponding to “healthcare”, and another to “retail and shipping.” The lobbying client Philips Holding USA, Inc., which specializes in medical equipment, home appliances, and lighting solutions, unsurprisingly participates in both of these communities. The dc-biSBM from Section 4, however, is inadequate to capture this “mixed membership” of lobbying activities. Panel (a) of Figure 9 illustrates the assumption made in the dc-biSBM that only a single community membership is allowed for each node.
In fact, we also had difficulty in identifying the community membership of Philips

[Figure 9 schematic. Three panels: (a) dc-biSBM, (b) mmSBM, (c) biLCM.]

Figure 9: Schematic Comparison of Community Models. We illustrate the basic features of the network models provided by the single-membership dc-biSBM model, and the mixed-membership mmSBM and biLCM models, by drawing the data associated with each node and edge in a single draw of each generative model. (a) In the single-membership model, we only observe node community memberships and counts of edges between nodes. (b) In the mmSBM (adjusted to admit Poisson-weighted edges), we observe node membership distributions and edges between pairs of nodes of a single color (belonging to a single link community). (c) In the biLCM, we observe node membership distributions and edges of different colors between pairs of nodes (belonging potentially to any number of distinct link communities).

Holding USA, Inc. near the center of the latent space in Figure 3 inferred by the LSNM14, the region whose proper interpretation was not clear in our previous analysis.

To motivate our modeling technique for resolving this issue, it is instructive to first consider another popular model for overlapping communities in networks. Airoldi et al. (2008) introduced the mixed-membership stochastic block model (mmSBM), whereby each node has an associated probability distribution over communities and the interaction between a pair of nodes depends on a pair of community assignments that the nodes choose for that specific interaction from their respective distributions. That is, each political actor can have multiple community memberships, while their interaction with a specific partner is determined by their joint memberships in a common community. For example, the authors apply the mmSBM to the Sampson monastery dataset (Sampson, 1969), a popular example in the social networks literature which summarizes survey data on social relationships among 18 novice monks joining a monastery, and identify the group of “waverer” monks, who do not commit to one of the primary social factions in the monastery, but rather maintain friendships with members of several factions.

Although this model has some desirable features that allow multiple memberships for political actors, the mmSBM always retains the property that the mixed membership it models consists of an agent varying their community membership across their interactions with other agents. But the mixed membership we are interested in modeling is different: we are concerned with political

14The approximate mean coordinates of this lobbying client’s latent position are (−0.25, 0.51).

actors playing many roles at once, even in their interactions with a single partner over the course of a session of Congress as the motivating examples above demonstrate. We illustrate this important conceptual difference graphically with Panel (b) of Figure 9 that considers an extended version of the original mmSBM to non-binary weighted relationships that are closer to the data we observe in the lobbying network.15 It shows that the mmSBM family of models would bizarrely insist that a single common community membership must account for each pair of incidences involving a given pair (denoted by the same edge color), while in reality, as illustrated by the examples in the previous section, the bills that link a given politician and interest group often have different underlying motivations and subject matter, and thus should be modeled as such.

5.2 The Bipartite Link Community Model

We propose a link community model for our dataset which we call the Bipartite Link Community Model (biLCM). Our model combines three features which, to the best of our knowledge, have not been previously combined in the link community or overlapping community modeling literature. First, it explicitly models link communities; second, it does so in the bipartite setting; and finally, it uses a statistically principled maximum likelihood approach. Our model structure is most similar to that of Ball, Karrer, and Newman (2011) (which, as noted in that work, is in turn formally quite similar to topic models used for text modeling), but borrows the ideas for specialization to bipartite graphs from Larremore, Clauset, and Jacobs (2014). The most similar work we are aware of is the link community detection task in bipartite graphs treated as a pure optimization problem on a graph and solved by an ad hoc genetic algorithm by Li, Zhang, and Zhang (2015); unlike that work, our model provides an underlying probabilistic model and therefore a natural statistical interpretation.

We suppose that there are $k$ link communities, which we will always take as indexed by a variable $z \in [k]$. Each client and politician will have a vector of parameters $\alpha_{i,z}$ and $\beta_{j,z}$, respectively, which represent their involvement in legislation belonging to community $z$. The number of bills lobbied on by client $i$, sponsored by politician $j$, and belonging to legislation community $z$ is modeled as Poisson with a mean proportional to $\alpha_{i,z}\beta_{j,z}$. To resolve the identification issue, we assume that for each fixed $z$, $\sum_{i=1}^{m} \alpha_{i,z} = \sum_{j=1}^{n} \beta_{j,z} = 1$, and introduce another parameter $\kappa_z$ to capture the overall weight of group $z$, so that the number of bills between client $i$ and politician $j$ in legislation community $z$ has mean $\kappa_z \alpha_{i,z} \beta_{j,z}$. We assume that these Poisson variables are independent, so

15Mathematically, an mmSBM adjusted to have Poisson-distributed weights on interactions would model each edge weight as a mixture of Poisson distributions, while the link community model we will introduce will model each edge weight as a sum of independent Poisson distributions, one for each link community in which the agents interact.

22 that the model for the incidence matrix entries is   X Ai,j Poisson  κzαi,zβj,z , (9) ∼ z∈[k] and the joint distribution is

m n k ! Y Y X P(A α, β, κ) = Poisson κ α β . (10) | z i,z j,z i=1 j=1 z=1
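The generative model in equations (9) and (10) can be illustrated with a short simulation. The following is a minimal sketch in Python with NumPy, not the code used in the paper; the function name `sample_bilcm`, the toy dimensions, and the parameter values are all assumptions chosen for illustration. It draws one independent Poisson count per client, politician, and community, and sums over communities to obtain the observed incidence matrix.

```python
import numpy as np

def sample_bilcm(alpha, beta, kappa, rng):
    """Draw an incidence matrix from the biLCM (illustrative sketch).

    alpha: (m, k) client parameters, each column sums to 1
    beta:  (n, k) politician parameters, each column sums to 1
    kappa: (k,)   overall weight of each link community
    """
    # Mean of the count for (client i, politician j, community z).
    means = np.einsum("z,iz,jz->ijz", kappa, alpha, beta)
    # One independent Poisson draw per (i, j, z) triple.
    per_community = rng.poisson(means)
    # The observed entry A[i, j] is the sum over communities, which is
    # Poisson with mean sum_z kappa_z * alpha_iz * beta_jz, as in (9).
    return per_community.sum(axis=2), per_community

# Toy parameters (hypothetical values, not estimates from the paper).
rng = np.random.default_rng(0)
m, n, k = 6, 4, 2
alpha = rng.dirichlet(np.ones(m), size=k).T  # shape (m, k), columns sum to 1
beta = rng.dirichlet(np.ones(n), size=k).T   # shape (n, k), columns sum to 1
kappa = np.array([50.0, 20.0])
A, per_community = sample_bilcm(alpha, beta, kappa, rng)
```

Keeping the per-community counts alongside their sum makes the contrast with a mixture model concrete: every entry of the incidence matrix may receive contributions from several communities at once.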

We derive an Expectation-Maximization (EM) algorithm for this model in Appendix A.5, which involves alternating expectation and maximization update steps until the log-likelihood of the model converges. As for the other stochastic block models, we take the best convergence point of 50 independently initialized runs. We have found this algorithm very efficient and straightforward to implement, not requiring any serious performance optimization at the scale of our datasets. The update equations produced by our derivation are given below (the first equation is the expectation step for ancillary optimization parameters, and the last three equations are maximization steps for the model parameters).

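$$q_{i,j}(z) = \frac{\kappa_z \alpha_{i,z} \beta_{j,z}}{\sum_{z'=1}^{k} \kappa_{z'} \alpha_{i,z'} \beta_{j,z'}}, \tag{11}$$

$$\kappa_z = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j} \, q_{i,j}(z)}{\sum_{i=1}^{m} \sum_{j=1}^{n} \alpha_{i,z} \beta_{j,z}}, \tag{12}$$

$$\alpha_{i,z} = \frac{\sum_{j=1}^{n} A_{i,j} \, q_{i,j}(z)}{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j} \, q_{i,j}(z)}, \tag{13}$$

$$\beta_{j,z} = \frac{\sum_{i=1}^{m} A_{i,j} \, q_{i,j}(z)}{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j} \, q_{i,j}(z)}. \tag{14}$$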
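The updates in equations (11) to (14) translate fairly directly into array operations. The sketch below is one possible NumPy rendering under stated assumptions, not the authors' implementation: the names `fit_bilcm_em` and `fit_best_of`, the fixed iteration count, the small constant `eps`, and the random toy matrix are all hypothetical. It uses the fact that the denominator of equation (12) equals 1 under the normalization of $\alpha$ and $\beta$, and it mimics the restart strategy with a handful of seeds rather than 50.

```python
import numpy as np

def fit_bilcm_em(A, k, n_iter=200, seed=0, eps=1e-12):
    """Fit the biLCM by EM; returns (alpha, beta, kappa, log_likelihoods)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    alpha = rng.random((m, k))
    alpha /= alpha.sum(axis=0)
    beta = rng.random((n, k))
    beta /= beta.sum(axis=0)
    kappa = np.full(k, A.sum() / k)
    history = []
    for _ in range(n_iter):
        # E-step, equation (11): responsibilities q[i, j, z].
        rate = np.einsum("z,iz,jz->ijz", kappa, alpha, beta)
        total = rate.sum(axis=2)
        q = rate / (total[:, :, None] + eps)
        # Poisson log-likelihood up to the constant -sum(log A_ij!).
        history.append(float((A * np.log(total + eps) - total).sum()))
        # M-step, equations (12)-(14).
        w = A[:, :, None] * q           # expected counts per community
        mass = w.sum(axis=(0, 1))       # shape (k,)
        kappa = mass                    # denominator of (12) is 1 here
        alpha = w.sum(axis=1) / (mass + eps)
        beta = w.sum(axis=0) / (mass + eps)
    return alpha, beta, kappa, history

def fit_best_of(A, k, n_runs=5):
    """Keep the run with the highest final log-likelihood."""
    runs = [fit_bilcm_em(A, k, seed=s) for s in range(n_runs)]
    return max(runs, key=lambda run: run[3][-1])

# Toy incidence matrix (random counts, for illustration only).
A = np.random.default_rng(1).poisson(2.0, size=(8, 5))
alpha, beta, kappa, history = fit_best_of(A, k=3)
```

In this sketch the recorded log-likelihood should be non-decreasing across iterations, which gives a simple sanity check on any implementation of the updates.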

5.3 Empirical Findings

We begin our analysis by examining how “spread out” agents’ distributions over link communities are, or how “mixed” the legislative activity of an agent is between different incidence communities. It is simple to check from the model definition that the total number of incidences of community $z$ that touch client $i$ is, on average, $\kappa_z \alpha_{i,z}$, and likewise $\kappa_z \beta_{j,z}$ for politician $j$. A natural choice of measurement is then to normalize these quantities to form probability distributions

$$p_{i,z} = \frac{\kappa_z \alpha_{i,z}}{\sum_{z'=1}^{k} \kappa_{z'} \alpha_{i,z'}} \quad \text{and} \quad q_{j,z} = \frac{\kappa_z \beta_{j,z}}{\sum_{z'=1}^{k} \kappa_{z'} \beta_{j,z'}},$$

and then consider the entropies $H_i = H(p_{i,1}, \dots, p_{i,k})$ and $H_j = H(q_{j,1}, \dots, q_{j,k})$.[16] In computing entropies, we take logarithms base 2 for the sake of interpretability: an entropy of $h$ may then be interpreted as, roughly speaking, an agent typically participating in $2^h$ link communities. Simply put, the higher the estimated entropy value, the more likely that a given political actor participates in multiple communities.

[16] We reuse the letter $H$ for these quantities since it will always be clear from context whether we are discussing the entropy of specifically clients or specifically politicians. The astute reader may also object at this point that in fact a more principled quantity to consider would be the mean of the entropy of empirical link distributions drawn from our model with a given set of parameters. We concur, but find in practice that this mean agrees almost perfectly (Pearson’s $\rho = 0.99$) with the quantity we compute.

We find that the biLCM model can shed light on the properties of the region near the center of the latent space, where we did not find strong community structure. The left panel of Figure 10 repeats the latent space plot for the full 113th Congress dataset, but now dividing the latent space displayed in Figure 3 into small hexagonal parcels and coloring these by the mean link community membership entropy of agents lying within them. We clearly observe high-entropy agents (darker hexagons) clustered near the center of the latent space. This implies that we can interpret the center region as capturing lobbying behaviors that are not specific to certain industries or intuitive groupings of special interests, but rather related to political actors with multiple community memberships.

[Figure 10: two panels, “Link Community Distribution Zonal Entropies” (left) and “Link Community Distribution Examples” (right). Horizontal axes: Latent Space Dimension 1; vertical axis: Latent Space Dimension 2.]

Figure 10: Latent Space Position vs. Link Community Distribution. The left panel divides the latent space inferred for the full 113th Congress dataset into hexagonal regions (keeping only those containing some clients or politicians), and colors each region by the average entropy of link community memberships (from the parameters inferred for the biLCM model) of the agents in that region. The right panel plots example link community distributions at their corresponding latent positions, emphasizing that the agents with the most “spread out” link community distribution tend to lie in the center of the latent space. We observe that the distributions become more skewed towards single communities as we move to the edges of the latent space, where we see similar link community clusterings to those found in the LSNM results.

As a more intuitive visualization, we provide a schematic visual representation of the same phenomenon in the right panel, where we choose just a few example lobbying clients and represent each with a pie chart showing their estimated link community membership distribution. We consistently find many of the same industry-specific communities in the boundary (marked by nodes

[Distribution column: a histogram of each Senator's link community distribution, not reproduced here.]

Senator                    Committee(s)
Dianne Feinstein (D-CA)    Appr., Intelligence, Judiciary, R&A
Patty Murray (D-WA)        Appr., Budget, HELP, Veterans’ Affairs
Kay Hagan (D-NC)           Armed Services, BHUA, HELP, SBE
Harry Reid (D-NV)          Intelligence
Charles Schumer (D-NY)     BHUA, Finance, Judiciary, Printing, Library, R&A
Robert Casey (D-PA)        ANF, Economic, Finance, HELP
S. Whitehouse (D-RI)       Aging, Budget, EPW, HELP, Judiciary
Mark Warner (D-VA)         BHUA, Budget, Finance, Intelligence, Printing, R&A
Mike Lee (R-UT)            Armed Services, Economic, Energy, Judiciary
Mark Pryor (D-AR)          Appr., CST, Economic, Ethics, HSGA, R&A, SBE
Claire McCaskill (D-MO)    Aging, Armed Services, CST, HSGA
Marco Rubio (R-FL)         CST, Foreign Relations, Intelligence, SBE
Mark Kirk (R-IL)           Aging, Appr., BHUA, HELP
Ted Cruz (R-TX)            Aging, Armed Services, CST, Judiciary, R&A
...
Tim Scott (R-SC)           Aging, CST, Energy, HELP, SBE
Richard Burr (R-NC)        Finance, HELP, Intelligence, Veterans’ Affairs
Bob Corker (R-TN)          Aging, BHUA, Foreign Relations
James Risch (R-ID)         Energy, Foreign Relations, SBE, Ethics, Intelligence
Richard Shelby (R-AL)      Appr., BHUA, R&A
Martin Heinrich (D-NM)     Economic, Energy, Intelligence
John Boozman (R-AR)        ANF, Appr., EPW, Veterans’ Affairs

Table 1: Link Community Distributions and Committee Memberships. We enumerate the highest- and lowest-entropy Senators in the 113th Congress with a histogram of their link community distributions and a list of their Senate committee memberships. Procedural or broad economic committees are highlighted in black, and we observe a concentration of membership in such committees among the highest-entropy Senators. We use the following abbreviations for some of the committee names: ANF: Agriculture, Nutrition, and Forestry; BHUA: Banking, Housing, and Urban Affairs; CST: Commerce, Science, and Transportation; EPW: Environment and Public Works; HELP: Health, Education, Labor, and Pensions; SBE: Small Business and Entrepreneurship; R&A: Rules and Administration; Appr.: Appropriations. Economic, Library, and Printing refer to the respective joint committees.

with one or two colors) as we have seen before, such as the energy, technology, health, and finance industries. In addition, we find that political actors in the center region tend to belong to many distinct political communities. This is indicated by the distribution of multiple colors that correspond to the distribution over community membership.

Finally, in Table 1 we enumerate the highest-entropy politicians in order to clarify why this situation might emerge. We find that, as we observed cumulatively with those politicians clustered near the center of the latent space, these politicians tend to belong to committees involving general categories of government oversight or budgetary allocation. This is shown by the distribution over 8 political communities in the second column, where politicians with high entropy values tend to have memberships in multiple communities whereas the politicians at the bottom have a concentrated community membership. We believe that these findings indicate another mode of lobbying, essentially distinct from the strongly community-based mode observed in the previous models, wherein politicians with control over these broad swathes of policy are lobbied by wide ranges of firms, independently of industry or special interest affiliation.

Another related phenomenon is also captured by the high-entropy link community distribution agents of this model on the side of special interest groups: of the high-entropy interest groups, many are large “associations” collectively representing many small businesses, employee groups, or broad causes, such as the Chamber of Commerce, the Specialty Equipment Market Association, Heritage Action for America, the National Treasury Employees Union, and numerous others.
In light of these observations, we suggest that the “high-entropy center” from the LSNM represents different forms of aggregate action in the lobbying marketplace, either in the form of direct lobbying of politicians with broad responsibilities (as indicated by their membership in procedural committees), or in the form of lobbying through large representative associations by many individuals or smaller firms.
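The entropy measure used throughout this section can be computed in a few lines. The following is a minimal sketch, assuming fitted parameters are available as NumPy arrays; the function name `membership_entropy` and the toy parameter values are hypothetical. It normalizes $\kappa_z \alpha_{i,z}$ (or $\kappa_z \beta_{j,z}$) over communities and takes the base-2 entropy of the result.

```python
import numpy as np

def membership_entropy(weights, kappa):
    """Base-2 entropy of each agent's link community distribution.

    weights: (num_agents, k) array of alpha (clients) or beta (politicians)
    kappa:   (k,) community weights
    """
    mass = weights * kappa                      # kappa_z * alpha_{i,z}
    p = mass / mass.sum(axis=1, keepdims=True)  # normalize over communities
    # Treat 0 * log2(0) as 0 for communities an agent never takes part in.
    logp = np.log2(p, where=p > 0, out=np.zeros_like(p))
    return -(p * logp).sum(axis=1)

# Toy parameters for three agents and four communities (columns sum to 1).
toy_alpha = np.array([
    [0.5, 0.5, 0.5, 0.5],  # active equally in all four communities
    [0.5, 0.0, 0.0, 0.0],  # active in a single community
    [0.0, 0.5, 0.5, 0.5],  # active equally in three communities
])
toy_kappa = np.ones(4)
H = membership_entropy(toy_alpha, toy_kappa)
effective_communities = 2.0 ** H
```

For these toy values the effective number of communities, $2^h$, is 4, 1, and 3 respectively, matching the interpretation of entropy given in the text.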

6 Concluding Remarks

Lobbying is known to be an important channel through which special interest groups affect the U.S. legislative process. Even so, explicit observable connections between interest groups and the politicians who represent those interests have proved elusive, since the individual politicians contacted in the course of lobbying need not be disclosed. We assemble a new lobbying network dataset by measuring legislative interactions between interest groups and politicians across all bills introduced since the 106th Congress, after carefully identifying the lobbying instances associated with each bill. We show that this network can be usefully modeled by both latent space models and community models. Unlike previous applications of latent space models in political science under the rubric

of ideal point estimation, our models suggest that lobbying interactions mostly depend not on an underlying ideological spectrum, but rather on an underlying grouping into domain-specific communities. Stochastic block models confirm and clarify this finding by explicitly modeling the community structure we empirically observe. Furthermore, our bipartite link community model captures a richer network structure in which some politicians have a tight connection with a single industry specifically related to their committee membership, while others who serve on “procedural” committees with broader subject matter are linked to multiple political communities and aggregate interest groups that represent highly heterogeneous political interests. The consistency between latent space models and non-geometric community models in describing these patterns provides strong evidence for political divisions that are not aligned with ideological polarization as the dominant structural feature of the lobbying network in U.S. legislative politics.

Our findings open up new possibilities for studying the mechanics of lobbying, showing that it is helpful to view lobbying activity as occurring in a network of political actors, and therefore suggesting that other network analysis techniques may reveal further meaningful patterns underlying lobbying. Of course, the lobbying network also contains much more information than we have analyzed here, including bills and their contents, lobbyists and their lobbying firm affiliations, disclosures of money spent on lobbying, and the timing of lobbying and legislative events. Extending the network models in this work to describe new types of agents and interactions should further enhance the picture of lobbying and its political effects that we have obtained here.

References

Airoldi, Edoardo M., David M. Blei, Stephen E. Fienberg, and Eric P. Xing. 2008. “Mixed Membership Stochastic Blockmodels.” Journal of Machine Learning Research 9.

Ansolabehere, Stephen, James M Snyder, and Micky Tripathi. 2002. “Are PAC Contributions and Lobbying Linked? New Evidence from the 1995 Lobby Disclosure Act.” Business and Politics 4 (2): 131–155.

Austen-Smith, David. 1995. “Campaign Contributions and Access.” American Political Science Review 89 (3): 566–581.

Austen-Smith, David, and John R Wright. 1992. “Competitive Lobbying for a Legislator’s Vote.” Social Choice and Welfare 9 (3): 229–257.

Bafumi, Joseph, Andrew Gelman, David K Park, and Noah Kaplan. 2005. “Practical Issues in Implementing and Understanding Bayesian Ideal Point Estimation.” Political Analysis 13: 171–187.

Ball, Brian, Brian Karrer, and M.E.J. Newman. 2011. “An efficient and principled method for detecting communities in networks.” Physical Review E 84.

Barberá, Pablo. 2014. “Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data.” Political Analysis 23 (1).

Bauer, Raymond, Lewis Anthony Dexter, and Ithiel de Sola Pool. 1972. American Business and Public Policy: The Politics of Foreign Trade. New York: Aldine.

Baumgartner, Frank R, Jeffrey M Berry, Marie Hojnacki, Beth L Leech, and David C Kimball. 2009. Lobbying and Policy Change: Who Wins, Who Loses, and Why. University of Chicago Press.

Bertrand, Marianne, Matilde Bombardini, and Francesco Trebbi. 2014. “Is It Whom You Know or What You Know? An Empirical assessment of the Lobbying Process.” The American Economic Review 104 (12): 3885–3920.

Bombardini, Matilde, and Francesco Trebbi. 2012. “Competition and Political Organization: Together or Alone in Lobbying for Trade Policy?” Journal of International Economics 87 (1): 18–26.

Bond, Robert, and Solomon Messing. 2015. “Quantifying Social Media’s Political Space: Estimating Ideology from Publicly Revealed Preferences on Facebook.” American Political Science Review 109 (1): 62–78.

Bonica, Adam. 2013. “Mapping the Ideological Marketplace.” American Journal of Political Science 58 (2): 367–386.

Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).

Clark, Tom S, and Benjamin Lauderdale. 2010. “Locating Supreme Court Opinions in Doctrine Space.” American Journal of Political Science 54 (4): 871–890.

Clinton, Joshua, Simon Jackman, and Douglas Rivers. 2004. “The Statistical Analysis of Roll Call Data.” American Political Science Review 98 (2): 355–370.

De Figueiredo, John M, and Brian Kelleher Richter. 2014. “Advancing the Empirical Research on Lobbying.” Annual Review of Political Science 17: 163–185.

Faccio, Mara. 2006. “Politically Connected Firms.” The American economic review 96 (1): 369–386.

Faccio, Mara, Ronald W Masulis, and John McConnell. 2006. “Political Connections and Corporate Bailouts.” The Journal of Finance 61 (6): 2597–2635.

Fowler, James H. 2006. “Connecting the Congress: A study of Cosponsorship Networks.” Political Analysis 14 (4): 456–487.

Grossman, Gene M, and Elhanan Helpman. 2001. Special Interest Politics. Cambridge, MA: MIT press.

Hafner-Burton, Emilie M, Miles Kahler, and Alexander H Montgomery. 2009. “Network Analysis for International Relations.” International Organization 63 (3): 559–592.

Hall, Richard L, and Alan V Deardorff. 2006. “Lobbying as legislative subsidy.” American Political Science Review 100 (1): 69–84.

Hoff, Peter D, Adrian E Raftery, and Mark S Handcock. 2002. “Latent Space Approaches to Social Network Analysis.” Journal of the american Statistical association 97 (460): 1090–1098.

Imai, Kosuke, James Lo, and Jonathan Olmsted. 2016. “Fast Estimation of Ideal Points with Massive Data.” American Political Science Review 110 (4): 631–656.

Kang, Karam. 2015. “Policy Influence and Private Returns from Lobbying in the Energy Sector.” The Review of Economic Studies 83 (1): 269–305.

Kang, Karam, and Hye Young You. 2017. “The Value of Connections in Lobbying.” Working paper available at https://hyeyoungyou.files.wordpress.com/2015/08/value_of_connections1.pdf.

Karrer, Brian, and M.E.J. Newman. 2011. “Stochastic Blockmodels and Community Structure in Networks.” Physical Review E 83.

Keck, Margaret E, and Kathryn Sikkink. 1998. Activists Beyond Borders: Advocacy Networks in International Politics. Press.

Khwaja, Asim Ijaz, and Atif Mian. 2005. “Do Lenders Favor Politically Connected Firms? Rent Provision in an Emerging Financial Market.” The Quarterly Journal of Economics 120 (4): 1371–1411.

Kim, In Song. 2017. “Political Cleavages within Industry: Firm-level Lobbying for Trade Liberalization.” American Political Science Review 111 (1): 1–20.

Krivitsky, Pavel N, Mark S Handcock, Adrian E Raftery, and Peter D Hoff. 2009. “Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models.” Social Networks 31: 204–213.

Larremore, Daniel B., Aaron Clauset, and Abigail Z. Jacobs. 2014. “Efficiently Inferring Community Structure in Bipartite Networks.” Physical Review E 90.

Lauderdale, Benjamin E, and Tom S Clark. 2014. “Scaling Politically Meaningful Dimensions Using Texts and Votes.” American Journal of Political Science 58 (3): 754–771.

Li, Zhenping, Shihua Zhang, and Xiangsun Zhang. 2015. “Mathematical Model and Algorithm for Link Community Detection in Bipartite Networks.” 05 (01): 421–434.

Maoz, Zeev, Ranan D Kuperman, Lesley Terris, and Ilan Talmud. 2006. “Structural Equivalence and International Conflict: A Social Networks Analysis.” Journal of Conflict Resolution 50 (5): 664–689.

Mayhew, David R. 1974. Congress: The Electoral Connection. Yale University Press.

Nourse, Victoria, and Jane S Schacter. 2002. “The Politics of Legislative Drafting: A Congressional Case Study.”

Peress, Michael. 2013. “Estimating Proposal and Status quo Locations using Voting and Cosponsorship Data.” The Journal of Politics 75 (3): 613–631.

Poole, Keith T, and Howard L Rosenthal. 2011. Ideology and Congress. Vol. 1. Transaction Publishers.

Potters, Jan, and Frans Van Winden. 1992. “Lobbying and Asymmetric Information.” Public choice 74 (3): 269–292.

Rabe, Barry G. 2014. “Shale Play Politics: The Intergovernmental Odyssey of American Shale Governance.” Environmental Science and Technology 48 (15): 8369–8375.

Rabe, Barry G., and Christopher Borick. 2013. “Conventional Politics for Unconventional Drilling? Lessons from Pennsylvania’s Early Move into Fracking Policy Development.” Review of Policy Research 30 (3): 321–340.

Richter, Brian Kelleher, Krislert Samphantharak, and Jeffrey F Timmons. 2009. “Lobbying and Taxes.” American Journal of Political Science 53 (4): 893–909.

Rocca, Michael S., and Stacy Gordon. 2010. “The Position-Taking Value of Bill Sponsorship in Congress.” Political Research Quarterly 63 (1): 387–397.

Sampson, Samuel F. 1969. “A novitiate in a period of change: An experimental and case study of social relationships.”

Shepsle, Kenneth A. 1978. The Giant Jigsaw Puzzle: Democratic Committee Assignments in the Modern House. University of Chicago Press.

Shepsle, Kenneth A., and Barry R. Weingast. 1987. “The Institutional Foundations of Committee Power.” American Political Science Review 81 (1): 85–104.

Shor, Boris, and Nolan McCarty. 2011. “The Ideological Mapping of American Legislatures.” American Political Science Review 105 (3): 530–551.

Slapin, Jonathan B, and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time-series Party Positions from Texts.” American Journal of Political Science 52 (3): 705–722.

Snijders, Tom AB, and Krzysztof Nowicki. 1997. “Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure.” Journal of classification 14 (1): 75–100.

Vidal, Jordi Blanes I, Mirko Draca, and Christian Fons-Rosen. 2012. “Revolving Door Lobbyists.” The American Economic Review 102 (7): 3731–3748.

Ward, Michael D, Katherine Stovel, and Audrey Sacks. 2011. “Network Analysis and Political Science.” Annual Review of Political Science 14: 245–264.

Weingast, Barry R., and Mark J. Moran. 1983. “Bureaucratic Discretion or Congressional Control? Regulatory Policymaking by the Federal Trade Commission.” Journal of Political Economy 91 (5): 765–800.

Wright, John R. 1990. “Contributions, Lobbying, and Committee Voting in the US House of Representatives.” American Political Science Review 84 (02): 417–438.

Wright, John R. 1996. Interest Groups and Congress. Boston: Allyn and Bacon.

You, Hye Young. 2017. “Ex Post Lobbying.” The Journal of Politics 79 (4): 1162–1176.

A Supplementary Appendix

A.1 Identifying Bills and Missing Congress Numbers

Identifying congressional bills in lobbying reports is difficult because bill numbers are repeated across Congresses, and often do not appear directly annotated with Congress numbers in lobbying reports. Using the report filing year to guess the Congress often leads to erroneous matches, because reports filed at the beginning of a new Congress tend to include disclosures of lobbying activities from the previous year (and therefore, if a new Congress has begun recently, from the previous Congress as well). For example, consider the following lobbying report filed by Google, Inc. in 2013. It reads:

Monitor legislation regarding online privacy including Safe Data Act (H.R. 2577, S. 1207) and Do not track proposals (H.R. 654). Monitor any Congressional or Administration efforts to impose privacy laws on search engines. Monitor Spectrum acts (S. 911, H.R. 2482).

Figure A.1: First Quarter Report by Google, Inc. in 2013

A naive guess would be that the House bill H.R.2577 refers to a bill from the 113th Congress, because the report was filed in 2013. However, it is clear from the report that this is a 112th Congress bill, the “SAFE Data Act”. We use the following strategies to mitigate this problem and correctly identify Congress session numbers under various circumstances.

1. Bill Number Search: We first identify bill numbers (e.g., H.R.2577 above) using regular expression search in the report text. In the above example in Figure A.1, our algorithm would identify bill numbers H.R. 2577, S.1207, H.R.654, S.911, and H.R.2482. Note that all of these bills are from the 112th Congress rather than the 113th.

2. Congress Identification: Given a bill number found in a specific issue text (a section of the lobbying report), we attempt to identify the most likely Congress to which that bill would belong using other text around the bill number. Starting from the Congress containing the year of the lobbying report, we consider a range of candidate Congresses extending backwards from that of the lobbying report (by default, we consider three Congresses back; therefore in the above example, we would consider the 113th, 112th, and 111th Congresses). We then obtain the bills of the same number from each of these Congresses (omitting the Congresses that do not have a bill of the given number), and compute a bag-of-words representation (after a tokenization and stopword filtering pipeline) of each of those bills, giving some vectors $v_1, \dots, v_n$, and the same representation of the text around the mention of the bill number, $w$. The Congress that we choose then corresponds to the maximizer of the cosine similarity, its index given by

$$i^* = \operatorname*{argmax}_{1 \le i \le n} \frac{v_i^\top w}{\|v_i\| \, \|w\|}. \tag{15}$$

If no bill having the same number exists in the entire range of Congresses we consider, we simply guess that the bill comes from the Congress of the year the lobbying report was filed.

3. Congress Propagation: If we successfully find a match for a Congress, it may be propagated to the other bills mentioned in the lobbying report, since, being scheduled on a quarterly basis, a lobbying report will effectively always pertain to legislation in a single Congress. If different bills in a lobbying report disagree on the best-match Congress, a majority vote may be taken, but this rarely occurs in practice.

4. Bill Title Search: Bills are sometimes only referred to by titles or alternate names. To account for this, we clean and tokenize the specific issue sections of the lobbying report, and perform a text matching operation against a table of bill titles. For instance, this operation would identify “Safe Data Act” in our previous example, even if the bill number H.R.2577 were not mentioned.

5. Bill Range Expansion: It is also common for bills with nearby numbers to be related, and for lobbying reports to refer to ranges of bills when they are all being lobbied at once.

“H.R.3009, Trade Act of 2002. Certain miscellaneous tariff bills to suspend the rates of duty on certain toy-related articles (H.R.4182-4186; S.2099-2103). WTO market access negotiations for non-agricultural products Port and border security measures”

Figure A.2: Midyear Report by Mattel, Inc. in 2002

Therefore, if we find two bill numbers that are close (by default if they share the same prefix and their numbers differ by at most 10), then we consider all other bills with numbers in between as also being lobbied. For instance, the pattern H.R.4182-4186 in the excerpt shown in Figure A.2 would be expanded into bills H.R.4182, H.R.4183, H.R.4184, H.R.4185, and H.R.4186 all being considered lobbied.
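Several of the steps above (bill number search, range expansion, and Congress identification by cosine similarity) can be sketched with the Python standard library. This is a simplified illustration under stated assumptions and not the pipeline used to build the dataset: the regular expression, the function names, and the two candidate bill texts are hypothetical, the tokenizer omits stopword filtering, and range expansion handles only explicit patterns such as H.R.4182-4186.

```python
import math
import re
from collections import Counter

# Hypothetical pattern: chamber prefix, a number, and an optional "-number" range.
BILL_RE = re.compile(r"\b(H\.R\.|S\.)\s*(\d+)(?:\s*-\s*(\d+))?")

def find_bills(text, max_gap=10):
    """Return identifiers such as 'H.R.2577', expanding short explicit ranges."""
    bills = []
    for prefix, start, end in BILL_RE.findall(text):
        lo = int(start)
        hi = int(end) if end else lo
        if not (lo <= hi <= lo + max_gap):
            hi = lo  # ignore implausible ranges
        bills.extend(f"{prefix}{num}" for num in range(lo, hi + 1))
    return bills

def bag_of_words(text):
    """Lowercased token counts; a stand-in for the full tokenization pipeline."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two token-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()))
    norm *= math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def pick_congress(context, candidates):
    """candidates maps a Congress number to the text of its bill with that number."""
    w = bag_of_words(context)
    return max(candidates, key=lambda c: cosine(bag_of_words(candidates[c]), w))

google_report = (
    "Monitor legislation regarding online privacy including Safe Data Act "
    "(H.R. 2577, S. 1207) and Do not track proposals (H.R. 654). "
    "Monitor Spectrum acts (S. 911, H.R. 2482)."
)
mattel_excerpt = "tariff bills (H.R.4182-4186; S.2099-2103)"

# Invented candidate texts for illustration; real bill texts would be used in practice.
candidates = {
    113: "appropriations for transportation housing and urban development",
    112: "an act to protect consumers by securing data and privacy online",
}
chosen = pick_congress("online privacy including Safe Data Act", candidates)
```

With these invented candidate texts, the context shares tokens only with the 112th Congress candidate, so that Congress is selected.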

A.2 Assembling Datasets

A.2.1 Filtering

Besides reducing the size of our data, the key task in filtering is to increase the minimum degree of the agents we include. This is especially important for latent space models, which will be the class of models we consider in the greatest detail; it is intuitive that there is no principled way to position an actor in the lobbying marketplace if the actor does not lobby or sponsor very much, or at all. Thus, while it may appear appealing at first to filter the dataset independently on the client and politician side by separate summary statistics like total numbers of bills sponsored or lobbying reports submitted during a Congress, such filtering is not necessarily aligned with the goal of finding a large submatrix of $A$ (or induced subgraph of $G$) with only agents of sufficiently high degree. In particular, we want to avoid sponsor filtering causing some clients who survived client filtering to have their degree reduced again, or vice-versa. We find that the simplest way to avoid such issues is to build the full matrix $A$ without filtering first, and then filter based only on incidences in a way that explicitly ensures that only high-degree agents remain. To that end, we use a filtering procedure defined by two thresholds, denoted $T_L$ and $T_P$ throughout, which alternates removing all clients with degree lower than $T_L$ and removing all politicians with degree lower than $T_P$, until no more clients or politicians are removed in a full iteration. This is a simple greedy algorithm for finding an induced subgraph of $G$ with only high-degree nodes, which we find to suffice for our purposes; the algorithm almost always terminates after just a few iterations, though rarely after only one.
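The alternating filtering procedure can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the function name `filter_by_degree` and the toy matrix are hypothetical, and "degree" is assumed here to mean the weighted degree (row or column sum of the incidence matrix), which the text does not specify.

```python
import numpy as np

def filter_by_degree(A, t_clients, t_politicians):
    """Alternately drop low-degree clients (rows) and politicians (columns).

    Returns the indices of the surviving rows and columns of A.
    Assumption: degree is the weighted degree, i.e. the row or column sum.
    """
    rows = np.arange(A.shape[0])
    cols = np.arange(A.shape[1])
    while True:
        # Remove clients whose degree within the current submatrix is too low.
        row_degree = A[np.ix_(rows, cols)].sum(axis=1)
        new_rows = rows[row_degree >= t_clients]
        # Then remove politicians, using the already reduced set of clients.
        col_degree = A[np.ix_(new_rows, cols)].sum(axis=0)
        new_cols = cols[col_degree >= t_politicians]
        if len(new_rows) == len(rows) and len(new_cols) == len(cols):
            return rows, cols  # a full iteration removed nothing
        rows, cols = new_rows, new_cols

# Toy matrix in which removals cascade over more than one iteration.
A = np.array([
    [3, 0, 0],
    [1, 2, 0],
    [0, 1, 1],
])
rows, cols = filter_by_degree(A, t_clients=3, t_politicians=3)
A_filtered = A[np.ix_(rows, cols)]
```

In the toy example the second client survives the first pass but is removed in the second, once the politician providing most of its degree has been dropped, which is the cascading effect the alternating procedure is meant to handle.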

A.2.2 Specifications

Each of our datasets pertains to a single Congress, and will be identified by the number of the Congress, along with one of the following specifications.

• Unthresholded: All politicians in both chambers of Congress any of whose sponsored bills have been lobbied on are included, and all clients who have lobbied on any bills are included.

• Full: The unthresholded dataset, filtered with the algorithm from XXX with thresholds $T_L = 100$ and $T_P = 10$. (The name reflects that this dataset contains both chambers of Congress unlike the per-chamber datasets below; the unthresholded dataset will only be considered briefly at the beginning of our analysis and should not be confused with this dataset, which will play a more important role.)

• Senate or House: The full dataset, restricted to include only politicians in the Senate or House of Representatives, respectively, and then filtered again with $T_L = 20$ and $T_P = 10$ (this last filtering only has a very small effect on the dataset size, and serves only to exclude a few very low-weight rows and columns from the restricted incidence matrix).

• Top Term: The same construction as the full dataset, but including only those bills whose top term is the specified term (thereby restricting bills to a certain issue area). We tune the thresholds used for these datasets to give dataset sizes roughly similar to those of the other datasets. For the “Energy” top term, we take $T_L = 5$, $T_P = 5$.

Congress  Specification        Clients  Politicians  Sparsity
113       Unthresholded        6,747    542          98.28%
113       Full                 707      525          91.99%
113       Senate               701      111          95.15%
113       House                707      414          93.79%
113       Top Term “Energy”    518      101          95.03%
111       Senate               945      118          86.27%
111       House                949      410          94.26%

Table A.1: Datasets. The specification, shape, and sparsity (fraction of zero entries) of all of the datasets used in our analysis.

[Figure A.3: two histograms. Left panel: “Bills Lobbied per Client” (horizontal axis: Bills Lobbied, 0 to 200+; vertical axis: Proportion of Lobbying Clients). Right panel: “Bills Sponsored per Politician” (horizontal axis: Bills Sponsored, 0 to 100; vertical axis: Proportion of Politicians).]

Lobbying Client                         Bills Lobbied    Politician         Bills Sponsored
Iraq/Afghanistan Veterans of America    725              Robert Menendez    107
Chamber of Commerce                     684              Alan Grayson       96
National Cable and Telecom Assn.        312              Mark Begich        92
Berkshire Hathaway Energy               307              David Vitter       85
Xcel Energy                             292              Harry Reid         80
SEMA                                    290              Amy Klobuchar      78
American Civil Liberties Union          286              Sherrod Brown      69
Duke Energy                             283              Bernard Sanders    69
Integrys Energy Group                   273              Dianne Feinstein   69
Edison Electric Institute               253              Charles Schumer    67

Figure A.3: Distribution of Political Actions: We present the skewed distributions of politician sponsorship and lobbying counts for the 113th Congress dataset. The activity counts and names of the most active politicians and clients are presented in the tables.

Lastly, we refer in shorthand to the full 113th Congress dataset as the “main dataset”, since for models other than latent space models we will present findings only for this dataset, our interest in those models being primarily methodological comparison. More generally, most of our analyses will focus on the 113th Congress, with the 111th Congress referenced for comparison. Table A.1 shows basic statistics for all datasets we will use.

A.3 Latent Space Model Details

A.3.1 Computation

The Stan code in Listing A.1 was used to fit our latent space models, run through the PyStan interface. In practice, by far the most important optimization for this model is the vectorization of the declaration of the Poisson distribution for edge weights; factorizing the latent space position prior covariance matrix into Cholesky factors and using them to transform standard normal variables is a common optimization for multivariate normal priors, but our covariance matrix is so small that we find this not to affect (or even to degrade, in some cases) sampling performance. For each latent space model we run four MCMC chains, each drawing 10,000 samples, the first 4,000 of which are discarded as warmup (“burn-in”) samples, leaving us with 6,000 usable samples per chain, for a total of 24,000 samples, which are used to compute estimates of posterior means of all of our models’ parameters as used in all visualizations in the main text.

A.3.2 MCMC Diagnostics

Position Variances We employ a useful visualization that captures the variance structure of all of our latent space position estimates at once. For any agent, if we take $N$ samples of its $d$-dimensional position, we may view these samples as the columns of a matrix $X \in \mathbb{R}^{d \times N}$. We let $\mu \in \mathbb{R}^d$ be the vector of means of the rows of $X$, and let $X_0$ be the centered matrix obtained by subtracting $\mu_i$ from each entry of the $i$th row of $X$. Then, $W = \frac{1}{N-1} X_0 X_0^\top \in \mathbb{R}^{d \times d}$ is the empirical covariance matrix of the position samples, which we diagonalize to obtain orthonormal eigenvectors $v_1, \ldots, v_d$ and associated non-negative eigenvalues $\lambda_1, \ldots, \lambda_d$. This is essentially a principal component analysis (PCA), except unlike the usual high-dimensional setting for PCA, we actually have many points in a low-dimensional space, $N \gg d$. Nonetheless, the computed quantities have the same interpretations: $\lambda_i$ is the amount of variance of the dataset of sampled positions in the direction of principal component $v_i$, and the shape of the set of sampled positions is approximated by the ellipsoid with axes in the directions of the $v_i$ and having lengths $\sqrt{\lambda_i}$ (one “directional standard deviation” in each principal direction). For small $d$, in particular for $d = 2$ as we use for most of our models, these ellipses can be drawn for all embedded points at once, and their scale is easy to understand visually relative to the scale of the entire latent space. Small ellipses correspond to concentrated posterior distributions for position parameters, and concentrated posteriors justify the use of the mean position point in our other visualizations (also, though we did not use such algorithms, concentrated posteriors would justify safe use of inferential approximations such as EM algorithms and variational methods). This visualization is given for the model inferred from the main dataset in Figure A.4.
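The ellipse computation described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the code used for Figure A.4: the function name `position_ellipse` and the synthetic samples are hypothetical, and the input is assumed to be a $d \times N$ array whose columns are posterior draws of one agent's position.

```python
import numpy as np

def position_ellipse(samples):
    """Mean, principal directions, and semi-axis lengths of one agent's position draws.

    samples: array of shape (d, N); each column is one sampled position.
    """
    X = np.asarray(samples, dtype=float)
    d, N = X.shape
    mu = X.mean(axis=1)                      # vector of row means
    X0 = X - mu[:, None]                     # centered matrix
    W = X0 @ X0.T / (N - 1)                  # empirical covariance, shape (d, d)
    eigvals, eigvecs = np.linalg.eigh(W)     # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative round-off
    return mu, eigvecs, np.sqrt(eigvals)     # sqrt(lambda_i) = semi-axis lengths

# Synthetic draws from a known Gaussian (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
true_cov = np.array([[4.0, 0.0], [0.0, 0.25]])
samples = rng.multivariate_normal(mean=[1.0, -2.0], cov=true_cov, size=20000).T
mu, axes, half_lengths = position_ellipse(samples)
```

The columns of `axes` give the ellipse directions $v_i$ and `half_lengths` gives the corresponding $\sqrt{\lambda_i}$, so one ellipse per agent can be drawn from these three quantities.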

data {
  int N_row;                 // number of rows
  int N_col;                 // number of columns
  int D;                     // dimensionality of latent space
  int edges[N_row, N_col];   // connection strength data
}
transformed data {
  int flat_ix;
  int flat_edges[N_row * N_col];

  flat_ix = 1;
  for (i in 1:N_row) {
    for (j in 1:N_col) {
      flat_edges[flat_ix] = edges[i][j];
      flat_ix = flat_ix + 1;
    }
  }
}
parameters {
  vector[D] mu_col_embedding;
  vector[D] cov_row_embedding_diag;
  cov_matrix[D] cov_col_embedding;
  row_vector[D] row_embedding[N_row];
  row_vector[D] col_embedding[N_col];

  real mu_row_factor;
  real sigma_row_factor;
  real sigma_col_factor;
  vector[N_row] row_factor;
  vector[N_col] col_factor;
}
model {
  int flat_jx;
  vector[N_row * N_col] flat_log_means;

  row_factor ~ normal(mu_row_factor, sigma_row_factor);
  col_factor ~ normal(0.0, sigma_col_factor);

  row_embedding ~ multi_normal(
    rep_vector(0.0, D), diag_matrix(cov_row_embedding_diag));
  col_embedding ~ multi_normal(
    mu_col_embedding, cov_col_embedding);

  flat_jx = 1;
  for (i in 1:N_row) {
    for (j in 1:N_col) {
      flat_log_means[flat_jx] =
        row_factor[i] + col_factor[j] - distance(
          row_embedding[i], col_embedding[j]);
      flat_jx = flat_jx + 1;
    }
  }
  flat_edges ~ poisson_log(flat_log_means);
}

Listing A.1: Stan code defining a sampler for the posterior distribution of the LSNM. Note that for d = 1 the model is greatly simplified, and it is much more efficient to remove the extra dimension from the types related to the latent space position distribution.
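The flattened vector of log-means that Listing A.1 assembles in its double loop can also be reproduced outside Stan, which is convenient for checking fitted values. The following NumPy snippet is a sketch under stated assumptions, not part of the original code: the function name `flat_log_means` and the toy inputs are hypothetical, and the arrays are assumed to hold one row per agent.

```python
import numpy as np

def flat_log_means(row_factor, col_factor, row_embedding, col_embedding):
    """Row-major vector of Poisson log-means, mirroring the loop in Listing A.1."""
    # Pairwise Euclidean distances between row and column positions: (N_row, N_col).
    diff = row_embedding[:, None, :] - col_embedding[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Row factor plus column factor minus distance, for every (i, j) pair.
    log_means = row_factor[:, None] + col_factor[None, :] - dist
    # Flatten with j varying fastest, matching the order of flat_edges.
    return log_means.reshape(-1)

# Toy inputs (hypothetical sizes: 3 rows, 4 columns, D = 2).
rng = np.random.default_rng(0)
row_factor = rng.normal(size=3)
col_factor = rng.normal(size=4)
row_embedding = rng.normal(size=(3, 2))
col_embedding = rng.normal(size=(4, 2))
flat = flat_log_means(row_factor, col_factor, row_embedding, col_embedding)
```

Building the whole vector at once, rather than one Poisson statement per edge, is the same idea as the single vectorized `poisson_log` statement at the end of the listing.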

[Figure A.4 panels: Typical Latent Space Positions: Politicians; Typical Latent Space Positions: Lobbying Clients. Axes: Latent Space Dimension 1 (horizontal), Latent Space Dimension 2 (vertical).]

Figure A.4: Latent Space Position Uncertainties. One of the cardinal advantages of the Bayesian formulation of our latent space model is that it allows us to investigate the posterior distribution beyond a point estimate of the parameters. We illustrate here an estimate of the “typical set” of each agent’s latent space position, an ellipse containing most of the samples drawn of their position by our MCMC algorithm. The computations giving the parameters of these ellipses are detailed in Section A.3.2. The uncertainties are visibly small enough that we may surmise the clusters drawn in Figure 3 to be “stable”, in the sense that similar clusters would appear in a typical draw from the posterior distribution.

Popularity Factor Variances For popularity factors αi and βj, it is more convenient to consider the variance of the estimates of the exponentials exp(αi) and exp(βj). First, these are non-negative, so their statistics are easier to interpret; second, they are the quantities we visualize and are ultimately more interested in, since the interaction mean scales linearly in each. Thus, we compute the standard error of each of these quantities, by which we mean the standard deviation of its samples divided by their mean (a relative measure, also known as the coefficient of variation). As we did for the position variances, since there are many popularity factors and we analyze them in this work only at a coarse-grained level, we prefer to visualize all of the standard errors at once, giving in Figure A.5 their histograms for the main dataset. We observe that the distributions of the standard errors are strongly concentrated on the low end. Manual inspection reveals that the agents with the highest standard errors are those with the lowest degrees, and so are less significant to our substantive analysis anyway (these agents also tend to have smaller popularity factors, which can further inflate the standard error). In both the latent positions and popularity factors, we see that the politician parameters have more uncertainty than the lobbying client parameters, which is consistent with our model finding that the latent space structure is imposed primarily by lobbying client industry groupings.
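As a minimal sketch of this computation, assuming the sampled popularity factors are stored as an array with one row per MCMC draw and one column per agent (the function name `relative_standard_error` and the toy values are hypothetical):

```python
import numpy as np

def relative_standard_error(log_factor_samples):
    """Standard deviation divided by mean of exp(factor), per agent.

    log_factor_samples: array of shape (num_samples, num_agents) holding
    sampled values of the popularity factors (alpha_i or beta_j).
    """
    expd = np.exp(np.asarray(log_factor_samples, dtype=float))
    return expd.std(axis=0, ddof=1) / expd.mean(axis=0)

# Toy example: agent 0 has draws exp(alpha) in {1, 3}; agent 1 is constant.
toy = np.log(np.array([[1.0, 2.0],
                       [3.0, 2.0]]))
rse = relative_standard_error(toy)
```

A histogram of the resulting vector over all agents of one type gives a plot of the kind summarized in Figure A.5.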

[Figure A.5 panels: histogram of the standard error of exp(αi) (vertical axis: Number of Lobbying Clients); histogram of the standard error of exp(βj) (vertical axis: Number of Politicians).]

Figure A.5: Latent Space Model Popularity Factor Uncertainties. Distributions of the standard errors (standard deviation as a fraction of the mean) of the sampled popularity factors for all lobbying clients and all politicians.

Trace Plots In Figure A.6, we show two representative sets of trace plots for latent space position and popularity factor parameters for one lobbying client and one politician. Generally, and as is visible in these particular examples, the marginal distributions for lobbying client parameters mix more rapidly than those of politician parameters, and the marginals for popularity factors mix more rapidly than those of latent positions.

A.3.3 Identifiability in Multidimensional Log-Quadratic Latent Space Models

Latent space models that measure distance between latent positions with the Euclidean distance $\|\theta_i - \psi_j\|$ typically have means depending (after a suitable logarithmic or sigmoidal transformation for Poisson or Bernoulli distributions for the incidence matrix entries) linearly on either $\|\theta_i - \psi_j\|$ or $\|\theta_i - \psi_j\|^2$ (we call these log-linear and log-quadratic latent space models, respectively). We chose the former dependence for our model after finding that it gave a latent space that was better-behaved numerically and more interpretable. In this section, we give a brief heuristic mathematical argument and some numerical evidence on our datasets for why this might be the case. We suggest that the issue may arise from a new invariance in the log-quadratic LSNM, which only appears in models with ambient dimension $d > 1$ (explaining why, as far as we know, it has not been observed previously).

The problem of identifiability (or “aliasing”) as it arises in our model is that the log-mean of the Poisson distributions we assign to $A_{i,j}$, given by

$$\log \mu_{i,j} = \alpha_i + \beta_j + \|\theta_i - \psi_j\|, \tag{16}$$

admits certain transformations of $(\alpha, \beta, \theta, \psi)$ that do not change the mean, and therefore do not change the modeled distribution. This can create multimodal or ridged posterior distributions, making sampling more difficult, so we would like to add additional assumptions (in Bayesian data analysis this is usually done by adding “more informative” priors) to remove these symmetries.

[Figure A.6 panels: trace plots for Politician and Client, each showing Dimension 1, Dimension 2, and Popularity Factor. Horizontal axis: Iteration (Every 10).]

Figure A.6: Latent Space Model Trace Plots. We show trace plots for the model parameters (two latent space dimensions and one popularity factor) drawn from two (out of four) MCMC chains, for one client, Intel Corporation, and one politician, Rand Paul (R-KY).

Two classes of symmetries are easy to see: any constant can be added to all αi and subtracted from all βj, and any translation, rotation, or reflection can be applied to both θi and ψj. These identifiability issues are well-understood in the literature, and are addressed specifically for our model in the main text.

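But when we instead consider means $\nu_{i,j}$ satisfying

$$\log \nu_{i,j} = \alpha_i + \beta_j + \|\theta_i - \psi_j\|^2, \tag{17}$$

we believe that a more subtle identifiability problem arises, which depends on all four of the parameters together. We expand the distance term, obtaining

$$\log \nu_{i,j} = \alpha_i + \beta_j + \sum_{\ell=1}^{d} (\theta_{i\ell} - \psi_{j\ell})^2 = \left(\alpha_i + \sum_{\ell=1}^{d} \theta_{i\ell}^2\right) + \left(\beta_j + \sum_{\ell=1}^{d} \psi_{j\ell}^2\right) - 2 \sum_{\ell=1}^{d} \theta_{i\ell} \psi_{j\ell}, \tag{18}$$

and consider a family of transformations parametrized by $d$ non-zero constants $K_1, \ldots, K_d$, and taking the form

$$\theta_{i\ell} \mapsto K_\ell \theta_{i\ell} \quad \text{for } \ell \in [d], \tag{19}$$

$$\psi_{j\ell} \mapsto \frac{1}{K_\ell} \psi_{j\ell} \quad \text{for } \ell \in [d], \tag{20}$$

$$\alpha_i \mapsto \alpha_i + \sum_{\ell=1}^{d} (1 - K_\ell^2)\, \theta_{i\ell}^2, \tag{21}$$

$$\beta_j \mapsto \beta_j + \sum_{\ell=1}^{d} \left(1 - \frac{1}{K_\ell^2}\right) \psi_{j\ell}^2. \tag{22}$$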

The idea is that the first two mappings ensure that the last term of (18) is unchanged, while the last two adjust α and β so that the contributions due to the first two mappings are cancelled.

When $d = 1$, since there is just a single parameter $K_1$, the $\alpha_i$ either all increase or all decrease, and the $\beta_j$ do the opposite. So if, as mentioned above, we have already constrained these parameters to not admit translations, then this transformation will only be admissible if $K_1^2 = 1$, which gives either the identity transformation or a reflection of the kind already discussed above, so there is no new identifiability problem. However, when $d > 1$, we can have a non-trivial transformation with the $\alpha_i$ and $\beta_j$ remaining centered (which we think of as $\sum_{i=1}^{m} \alpha_i$ and $\sum_{j=1}^{n} \beta_j$ remaining constant) so long as

$$0 = \sum_{i=1}^{m} \sum_{\ell=1}^{d} (1 - K_\ell^2)\, \theta_{i\ell}^2 = \sum_{\ell=1}^{d} (1 - K_\ell^2) \left(\sum_{i=1}^{m} \theta_{i\ell}^2\right), \tag{23}$$

$$0 = \sum_{j=1}^{n} \sum_{\ell=1}^{d} \left(1 - \frac{1}{K_\ell^2}\right) \psi_{j\ell}^2 = \sum_{\ell=1}^{d} \left(1 - \frac{1}{K_\ell^2}\right) \left(\sum_{j=1}^{n} \psi_{j\ell}^2\right), \tag{24}$$

which generically admits solutions with $K_\ell$ not all identically equal to 1 when $d > 1$. We see from our definitions (19) and (20) that under this family of transformations, we would expect the latent positions to move roughly along hyperbolae. In Figure A.7, we verify this prediction with a numerical experiment on our main dataset.
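The invariance argued above is easy to check numerically. The following NumPy sketch uses random toy parameters (not our fitted values) and hypothetical function names `log_nu` and `rescale`; it applies the transformations (19)–(22) for an arbitrary choice of the constants and confirms that the log-quadratic log-mean of (17) is unchanged even though the positions move.

```python
import numpy as np

def log_nu(alpha, beta, theta, psi):
    """Log-quadratic log-mean of (17): alpha_i + beta_j + ||theta_i - psi_j||^2."""
    sq_dist = ((theta[:, None, :] - psi[None, :, :]) ** 2).sum(axis=2)
    return alpha[:, None] + beta[None, :] + sq_dist

def rescale(alpha, beta, theta, psi, K):
    """Apply the transformations (19)-(22) with per-dimension constants K."""
    K = np.asarray(K, dtype=float)
    theta_new = theta * K                                       # (19)
    psi_new = psi / K                                           # (20)
    alpha_new = alpha + ((1 - K ** 2) * theta ** 2).sum(axis=1)     # (21)
    beta_new = beta + ((1 - 1 / K ** 2) * psi ** 2).sum(axis=1)     # (22)
    return alpha_new, beta_new, theta_new, psi_new

# Toy parameters (hypothetical sizes: m = 5, n = 7, d = 2).
rng = np.random.default_rng(0)
alpha, beta = rng.normal(size=5), rng.normal(size=7)
theta, psi = rng.normal(size=(5, 2)), rng.normal(size=(7, 2))

before = log_nu(alpha, beta, theta, psi)
after = log_nu(*rescale(alpha, beta, theta, psi, K=[2.0, 0.5]))
max_change = np.abs(after - before).max()
```

Here `max_change` is zero up to floating-point round-off. This particular choice of constants does not enforce the centering conditions (23) and (24); it only illustrates that the likelihood itself cannot distinguish the original and rescaled positions.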

A.4 Model Comparison

We give a few points of comparison between our latent space and community models, taking the point of view not of attempting to choose a single “best fit” model, but rather of confirming that the models all consistently reflect the same underlying properties of the lobbying network. There are two components of the models to compare. First, both the dc-biSBM and the LSNM include latent variables playing the role of “popularity factors”, which we named in both models αi and βj for lobbying clients and politicians respectively (though in the LSNM their role differed by an exponential transform). In Figure A.8, we compare these values and confirm that they are positively correlated for both politicians and lobbying clients. Second, in the LSNM we assign latent positions to each agent, while in the community models we divide up the agents into discrete clusters. This prompts the question of whether the latent positions of the LSNM “respect” the clusterings suggested by the community models, i.e., of whether the communities identified by the block models are localized in the latent space. We address this question in Figures A.9 and A.10 for the biSBM and dc-biSBM models, respectively, observing that the latter both gives less weight to high-degree nodes and is more aligned with the LSNM geometric representation.

[Figure A.7 panels: Typical Positions: Politicians (Linear); Typical Positions: Lobbying Clients (Linear); Typical Positions: Politicians (Quadratic); Typical Positions: Lobbying Clients (Quadratic). Axes: Latent Space Dimension 1 (horizontal), Latent Space Dimension 2 (vertical).]

Figure A.7: Identifiability Problems in Log-Quadratic Latent Space Model. We see that typical latent space positions “stretch” along hyperbolae in latent space models with a Poisson mean depending log-quadratically on the distance between positions, which is much mitigated in the same model depending log-linearly on the distance.

[Figure A.8 panels: exp(αi) value in LSNM against αi value in dc-biSBM (ρ = 0.73); exp(βj) value in LSNM against βj value in dc-biSBM (ρ = 0.63).]

Figure A.8: Popularity Factor Comparison. We plot the popularity factors obtained by the LSNM and dc-biSBM, confirming that there is a fairly strong positive correlation between their values. It is most natural to take the exponential of the factors in the LSNM in this comparison, as can be seen from the formal expressions for the models’ distributions in the main text.
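A comparison of this kind can be sketched as follows. The snippet assumes that ρ denotes the Pearson correlation coefficient, which the caption does not state explicitly, and the function name `popularity_correlation` and the toy values are hypothetical.

```python
import numpy as np

def popularity_correlation(lsnm_log_factors, dcbisbm_factors):
    """Pearson correlation between exp(LSNM factor) and the dc-biSBM factor."""
    x = np.exp(np.asarray(lsnm_log_factors, dtype=float))  # exponentiate LSNM factors
    y = np.asarray(dcbisbm_factors, dtype=float)
    return np.corrcoef(x, y)[0, 1]

# Toy values (not our estimates): the dc-biSBM factor is exactly proportional
# to exp of the LSNM factor, so the correlation is 1 up to round-off.
toy_lsnm = np.array([-1.0, 0.0, 0.5, 2.0])
toy_dcbisbm = 0.01 * np.exp(toy_lsnm)
rho = popularity_correlation(toy_lsnm, toy_dcbisbm)
```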

A.5 Bipartite Link Community Model EM Algorithm

In this appendix, we present a more detailed derivation of the EM update equations for the link community model. After discarding constant terms, the log-likelihood of this model is given by

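$$\log P(A \mid \alpha, \beta, \kappa) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j} \log\left(\sum_{z=1}^{k} \kappa_z \alpha_{i,z} \beta_{j,z}\right) - \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{z=1}^{k} \kappa_z \alpha_{i,z} \beta_{j,z}. \tag{25}$$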

We then apply the standard technique of introducing parameters qi,j(z) that form a probability distribution over z for fixed i and j and applying Jensen’s inequality to obtain the objective function of our optimization task, a lower bound on the log-likelihood:

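$$\mathcal{L}(A, \alpha, \beta, \kappa, q) \stackrel{\text{df}}{=} \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{z=1}^{k} \left( A_{i,j}\, q_{i,j}(z) \log\left(\frac{\kappa_z \alpha_{i,z} \beta_{j,z}}{q_{i,j}(z)}\right) - \kappa_z \alpha_{i,z} \beta_{j,z} \right) \le \log P(A \mid \alpha, \beta, \kappa). \tag{26}$$

We then seek to maximize $\mathcal{L}$ via coordinate ascent. Maximizing with respect to the $q_{i,j}(z)$ with all other parameters fixed simply sets these parameters to the values that make Jensen’s inequality sharp, which are

$$q_{i,j}(z) = \frac{\kappa_z \alpha_{i,z} \beta_{j,z}}{\sum_{z'=1}^{k} \kappa_{z'} \alpha_{i,z'} \beta_{j,z'}}. \tag{27}$$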

It is also straightforward to differentiate with respect to κz, obtaining the update

$$\kappa_z = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j}\, q_{i,j}(z)}{\sum_{i=1}^{m} \sum_{j=1}^{n} \alpha_{i,z} \beta_{j,z}}. \tag{28}$$

For the α and β parameters, we must constrain our optimization to respect the normalization that for all $z$, $\sum_{i=1}^{m} \alpha_{i,z} = \sum_{j=1}^{n} \beta_{j,z} = 1$. If we add to $\mathcal{L}$ the Lagrange multiplier terms $\lambda_z \left(1 - \sum_{i=1}^{m} \alpha_{i,z}\right) + \mu_z \left(1 - \sum_{j=1}^{n} \beta_{j,z}\right)$, then we obtain the updates

$$\alpha_{i,z} = \frac{\sum_{j=1}^{n} A_{i,j}\, q_{i,j}(z)}{\lambda_z + \sum_{j=1}^{n} \kappa_z \beta_{j,z}}, \tag{29}$$

$$\beta_{j,z} = \frac{\sum_{i=1}^{m} A_{i,j}\, q_{i,j}(z)}{\mu_z + \sum_{i=1}^{m} \kappa_z \alpha_{i,z}}, \tag{30}$$

but now we see that to obtain the desired normalization we must in fact take $\lambda_z$ and $\mu_z$ such that they cancel out these denominators and leave us with the simpler

$$\alpha_{i,z} = \frac{\sum_{j=1}^{n} A_{i,j}\, q_{i,j}(z)}{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j}\, q_{i,j}(z)}, \tag{31}$$

$$\beta_{j,z} = \frac{\sum_{i=1}^{m} A_{i,j}\, q_{i,j}(z)}{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{i,j}\, q_{i,j}(z)}, \tag{32}$$

which in particular do not involve any coupling between $\alpha_{i,z}$ and $\beta_{j,z}$, meaning that a single iteration of updates suffices for the M step of our EM algorithm, simplifying the calculation substantially compared to the nested coordinate ascents that are usually involved in mixed-membership stochastic block models.
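The updates (27), (28), (31), and (32) translate directly into a short dense implementation. The following NumPy code is a sketch under stated assumptions, not the implementation used for our results: the function names, the random initialization, the fixed iteration count, and the toy block-structured matrix are all hypothetical, and the dense $(m, n, k)$ array is only practical for small inputs.

```python
import numpy as np

def log_likelihood(A, alpha, beta, kappa, eps=1e-12):
    """Log-likelihood (25), up to constant terms."""
    mean = np.einsum("z,iz,jz->ij", kappa, alpha, beta)
    return float((A * np.log(mean + eps) - mean).sum())

def em_link_community(A, k, n_iter=200, seed=0, eps=1e-12):
    """EM for the link community model using updates (27), (28), (31), (32)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    alpha = rng.random((m, k))
    alpha /= alpha.sum(axis=0)            # columns of alpha sum to 1
    beta = rng.random((n, k))
    beta /= beta.sum(axis=0)              # columns of beta sum to 1
    kappa = np.full(k, A.sum() / k)
    for _ in range(n_iter):
        # E step, (27): responsibilities q_ij(z), shape (m, n, k).
        rates = kappa[None, None, :] * alpha[:, None, :] * beta[None, :, :]
        q = rates / np.maximum(rates.sum(axis=2, keepdims=True), eps)
        # M step: weighted counts A_ij q_ij(z) and their totals per community.
        weighted = A[:, :, None] * q
        totals = np.maximum(weighted.sum(axis=(0, 1)), eps)
        alpha = weighted.sum(axis=1) / totals                   # (31)
        beta = weighted.sum(axis=0) / totals                    # (32)
        kappa = weighted.sum(axis=(0, 1)) / (alpha.sum(axis=0) * beta.sum(axis=0))  # (28)
    return alpha, beta, kappa

# Toy bipartite count matrix with two planted communities (hypothetical data).
rng = np.random.default_rng(1)
block_means = np.kron(np.array([[5.0, 0.2], [0.2, 5.0]]), np.ones((3, 4)))
A = rng.poisson(block_means).astype(float)
alpha, beta, kappa = em_link_community(A, k=2)
```

Because the M step has this closed form, each iteration costs one pass over the $(i, j, z)$ triples; a sparse implementation would restrict the sums to pairs with $A_{i,j} > 0$.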

[Figure A.9 panels: Client Clusters 1–8 and Politician Clusters 1–8, each plotted in the two-dimensional LSNM latent space.]

Figure A.9: LSNM vs. biSBM. We plot the latent positions of politicians and lobbying clients from the LSNM, split by their memberships in clusters from the biSBM, with both run on the main dataset. For the sake of visual clarity, we no longer vary the sizes of the plotted latent space positions based on popularity factors. Localization of groups in the latent space suggests that the latent space model is somewhat consistent with the biSBM, but much of the fine-grained lobbying client clustering we observe in Figure 3 is not reflected in the biSBM results. (The largest clusters biSBM finds are those with low lobbying and sponsorship activity, and correspond to the most spread out clusters in these plots.)

[Figure A.10 panels: Client Clusters 1–8 and Politician Clusters 1–8, each plotted in the two-dimensional LSNM latent space.]

Figure A.10: LSNM vs. dc-biSBM. We plot the latent positions of politicians and lobbying clients from the LSNM, split by their memberships in clusters from the dc-biSBM, with both run on the main dataset. Almost all clusters found by dc-biSBM are closely localized in the latent space, showing that the dc-biSBM captures much of the same information as the latent space model.
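One simple way to put a number on the localization visible in these plots is to compare the spread of each cluster with the spread of all positions. The measure below is a hypothetical illustration and not a statistic reported in this paper: for each cluster it returns the root-mean-square distance to the cluster centroid divided by the root-mean-square distance of all points to the global centroid, so values well below 1 indicate a localized cluster.

```python
import numpy as np

def cluster_spread_ratio(positions, labels):
    """Per-cluster RMS spread relative to the RMS spread of all positions."""
    positions = np.asarray(positions, dtype=float)
    labels = np.asarray(labels)
    centered = positions - positions.mean(axis=0)
    global_rms = np.sqrt((centered ** 2).sum(axis=1).mean())
    ratios = {}
    for c in np.unique(labels):
        pts = positions[labels == c]
        rms = np.sqrt(((pts - pts.mean(axis=0)) ** 2).sum(axis=1).mean())
        ratios[c.item()] = rms / global_rms
    return ratios

# Toy latent positions: two tight, well-separated clusters (hypothetical data).
toy_positions = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
toy_labels = np.array([0, 0, 1, 1])
ratios = cluster_spread_ratio(toy_positions, toy_labels)
```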

A.6 113th Congress Latent Space Model Clusters

The tables below list all client and politician members of each of the clusters shown in Figure 3.

A.6.1 Cluster: Finance

Lobbying Client Name
American Bankers Assn. Securities Assn.
American Land Title Assn.
American Council of Life Insurers
Bank of America Corp.
Capital One Financial Corp.
Center for Responsible Lending
Citigroup Management Corp.
Clearing House Payments Co. LLC
Credit Union Ntl. Assn.
Community Associations Institute
Community Bankers Assn. of Illinois
Compass Bank
Deutsche Bank Securities, Inc.
Fifth Third Bancorp
Financial Services Roundtable
Genworth Financial, Inc.
Independent Community Bankers of America
Investment Company Institute
Intl. Swaps and Derivatives Assn.
Massachusetts Mutual Life Insurance Co.
Mortgage Bankers Assn.
MetLife Group, Inc.
Morgan Stanley
Ntl. Assn. of Federal Credit Unions
New York Bankers Assn.
Petroleum Marketers Assn. of America
Principal Financial Group
Radian Group, Inc.
Securities Industry and Financial Markets Assn.
State Farm Insurance Companies
Northwestern Mutual Life Insurance Co.
TIAA-CREF
Transamerica Companies
Unum Group
Wells Fargo & Co.

Politician Name            State  Party
Barr, Garland              KY     Republican
Campbell, John             CA     Republican
Corker, Bob                TN     Republican
Duffy, Sean                WI     Republican
Fincher, Stephen           TN     Republican
Garrett, Scott             NJ     Republican
Hahn, Janice               CA     Democrat
Hultgren, Randy            IL     Republican
Isakson, John              GA     Republican
Johnson, Tim               SD     Democrat
King, Peter                NY     Republican
Luetkemeyer, Blaine        MO     Republican
McHenry, Patrick           NC     Republican
Miller, Gary               CA     Republican
Neal, Richard              MA     Democrat
Perlmutter, Ed             CO     Democrat
Pittenger, Robert          NC     Republican
Shelby, Richard            AL     Republican
Sherman, Brad              CA     Democrat
Wagner, Ann                MO     Republican
Waters, Maxine             CA     Democrat

A.6.2 Cluster: Travel

Lobbying Client Name
Global Business Travel Assn.
Marriott International, Inc.
Morphotrust USA
Sabre Global, Inc.
U.S. Travel Assn.
United States Olympic Committee

Politician Name            State  Party
Coats, Daniel              IN     Republican
Emerson, Jo Ann            MO     Republican
Gowdy, Trey                SC     Republican
Heck, Joseph               NV     Republican
Miller, Candice            MI     Republican
Quigley, Mike              IL     Democrat

A.6.3 Cluster: Insurance & Real Estate

Lobbying Client Name
ACE INA Holdings
Aflac, Inc.
Allstate Insurance Company
American Family Mutual Insurance Company
American Insurance Assn.
Assn. for Advanced Life Underwriting
American Council of Life Insurers
Building Owners and Managers Assn. Intl.
Chubb Corporation
Cincinnati Financial Corporation
Farmers Group, Inc.
Hartford Financial Services Group
Independent Insurance Agents & Brokers of America
J.P. Morgan Chase & Company
KPMG LLP
Liberty Mutual Group
National Assn. of Insurance and Financial Advisors
National Assn. of Mutual Insurance Companies
National Assn. of Professional Insurance Agents
National Assn. of Real Estate Investment Trusts
Nationwide Mutual Insurance Company
Property Casualty Insurers Assn. of America
Prudential Financial, Inc.
Real Estate Roundtable
State Farm Mutual Automobile Insurance Company
The Travelers Companies, Inc. and Subsidiaries
The Village of Bald Head Island
United Services Automobile Assn.
Zurich

Politician Name            State  Party
Brown, Sherrod             OH     Democrat
Capuano, Michael           MA     Democrat
Chambliss, Saxby           GA     Republican
Collins, Susan             ME     Republican
Crawford, Eric             AR     Republican
Cummings, Elijah           MD     Democrat
Diaz-Balart, Mario         FL     Republican
Farenthold, Blake          TX     Republican
Grimm, Michael             NY     Republican
Huizenga, Bill             MI     Republican
Hurt, Robert               VA     Republican
Menéndez, Robert           NJ     Democrat
Neugebauer, Randy          TX     Republican
Palazzo, Steven            MS     Republican
Richmond, Cedric           LA     Democrat
Ross, Dennis               FL     Republican
Sires, Albio               NJ     Democrat
Stivers, Steve             OH     Republican
Thompson, Bennie           MS     Democrat
Wilson, Frederica          FL     Democrat

A.6.4 Cluster: Universities

Lobbying Client Name
Aircraft Owners & Pilots Assn.
Assn. of Science-Technology Centers, Inc.
Boston University
California Institute of Technology
Massachusetts Institute of Technology
McGraw Hill Education
Ntl. Business Aviation Assn.
Northeastern University
Ohio State University
University of Cincinnati
University of Illinois
University of Rochester
University of Southern California
University of Virginia
Vanderbilt University

Politician Name            State  Party
Bachus, Spencer            AL     Republican
Collins, Chris             NY     Republican
Doyle, Michael             PA     Democrat
Foxx, Virginia             NC     Republican
Johnson, Henry             GA     Democrat
Manchin, Joe               WV     Democrat
Messer, Luke               IN     Republican
Reed, Tom                  NY     Republican
Reichert, David            WA     Republican
Rokita, Todd               IN     Republican

A.6.5 Cluster: Technology

Lobbying Client Name
Actavis
Amazon.com
Amazon Corporate LLC
American Assn. of Law Libraries
American Express Company
American Intellectual Property Law Assn.
American Mushroom Institute
Assn. of National Advertisers, Inc.
Business Roundtable
Business Software Alliance
CompTIA Member Services, LLC
Consumer Electronics Assn.
Dell, Inc.
Digital 4th
Direct Marketing Association, Inc.
Eastman Kodak Company
Ebay, Inc.
EMC Corporation
Experian, Inc.
Facebook, Inc.
Google, Inc.
Hewlett-Packard Company
Intel Corporation
Interactive Advertising Bureau
Intl. Council of Shopping Centers
Intuit, Inc.
LinkedIn Corporation
Magazine Publishers of America
Mastercard
McAfee, Inc.
Microsoft Corporation
National Venture Capital Assn.
Oracle Corporation
Pfizer, Inc.
Qualcomm, Inc.
Semiconductor Industry Assn.
Software & Information Industry Assn.
Texas Instruments, Inc.
The Internet Association
Visa, Inc.
Yahoo!, Inc.

Politician Name            State  Party
Barton, Joe                TX     Republican
Brooks, Susan              IN     Republican
Carper, Thomas             DE     Democrat
Carter, John               TX     Republican
Castor, Kathy              FL     Democrat
Chaffetz, Jason            UT     Republican
Coble, Howard              NC     Republican
Cornyn, John               TX     Republican
DeFazio, Peter             OR     Democrat
Deutch, Theodore           FL     Democrat
Enzi, Michael              WY     Republican
Hatch, Orrin               UT     Republican
Heck, Denny                WA     Democrat
Holding, George            NC     Republican
Issa, Darrell              CA     Republican
Jeffries, Hakeem           NY     Democrat
Kaptur, Marcy              OH     Democrat
Leahy, Patrick             VT     Democrat
Lofgren, Zoe               CA     Democrat
McCaskill, Claire          MO     Democrat
Salmon, Matt               AZ     Republican
Schumer, Charles           NY     Democrat
Shea-Porter, Carol         NH     Democrat
Toomey, Patrick            PA     Republican
Womack, Steve              AR     Republican
Yoder, Kevin               KS     Republican

A.6.6 Cluster: Gun Rights

Lobbying Client Name
Boyd Gaming Corporation
Bridgepoint Education
Everytown for Gun Safety Action Fund
Gun Owners of America, Inc.
Ntl. Assn. for Gun Rights
Ntl. Rifle Assn. of America

Politician Name            State  Party
Davis, Danny               IL     Democrat
Jordan, Jim                OH     Republican
Kelly, Robin               IL     Democrat
Lowenthal, Alan            CA     Democrat
McCarthy, Carolyn          NY     Democrat
Stockman, Steve            TX     Republican

A.6.7 Cluster: Telecom

Lobbying Client Name
American Cable Assn., Inc.
Americans for Tax Reform
AT&T Corporation
AT&T Services, Inc. and its Affiliates
CBS Corporation
Cincinnati Bell
Comcast Corporation
Competitive Carriers Assn.
Cox Enterprises, Inc.
CTIA: The Wireless Assn.
Endo Pharmaceuticals, Inc.
IHeartMedia, Inc.
Loews Corporation
Multistate Tax Commission
National Assn. of Broadcasters
National Cable & Telecommunications Assn.
National Telecommunications Cooperative Assn.
Time Warner Cable, Inc.
Twenty-First Century Fox, Inc.
TwinLogic Strategies
United States Telecom Assn.
Verizon

Politician Name            State  Party
Amash, Justin              MI     Republican
Ayotte, Kelly              NH     Republican
Chabot, Steve              OH     Republican
Collins, Doug              GA     Republican
Conaway, K.                TX     Republican
Eshoo, Anna                CA     Democrat
Goodlatte, Bob             VA     Republican
Heller, Dean               NV     Republican
Keating, William           MA     Democrat
Latta, Robert              OH     Republican
Matsui, Doris              CA     Democrat
McCain, John               AZ     Republican
McCaul, Michael            TX     Republican
Meng, Grace                NY     Democrat
Nunes, Devin               CA     Republican
Rockefeller, John          WV     Democrat
Rogers, Mike               MI     Republican
Rush, Bobby                IL     Democrat
Walden, Greg               OR     Republican
Wasserman Schultz, Debbie  FL     Democrat
Watt, Melvin               NC     Democrat
Wyden, Ron                 OR     Democrat

A.6.8 Cluster: Military, Veterans’ Affairs

Lobbying Client Name
American Legion
Disabled American Veterans
Fleet Reserve Assn.
Iraq and Afghanistan Veterans of America, Inc.
Military Officers’ Assn. of America
Paralyzed Veterans of America
Retired Enlisted Association
United Spinal Assn.

Politician Name            State  Party
Bishop, Sanford            GA     Democrat
Boozman, John              AR     Republican
Brownley, Julia            CA     Democrat
Burr, Richard              NC     Republican
Butterfield, George        NC     Democrat
Coffman, Mike              CO     Republican
Gallego, Pete              TX     Democrat
Gutiérrez, Luis            IL     Democrat
Kirkpatrick, Ann           AZ     Democrat
Langevin, James            RI     Democrat
Larsen, Rick               WA     Democrat
McCarthy, Kevin            CA     Republican
Miller, Jeff               FL     Republican
Negrete McLeod, Gloria     CA     Democrat
O’Rourke, Beto             TX     Democrat
Perry, Scott               PA     Republican
Pingree, Chellie           ME     Democrat
Ruiz, Raul                 CA     Democrat
Runyan, Jon                NJ     Republican
Sinema, Kyrsten            AZ     Democrat
Takano, Mark               CA     Democrat
Titus, Dina                NV     Democrat
Tsongas, Niki              MA     Democrat
Walorski, Jackie           IN     Republican
Wenstrup, Brad             OH     Republican

A.6.9 Cluster: Conservation & Land Use

Lobbying Client Name
American Beverage Assn.
American Iron and Steel Institute
American Motorcyclist Assn.
American Rivers
Arkema, Inc.
Assn. of California Water Agencies
Blue Green Alliance
Bunge North America, Inc.
Council on Federal Procurement of Architectural Engineering Service
Earthjustice Legal Defense Fund
Environmental Working Group
Grocery Manufacturers Assn.
International Code Council
Jackson County Mississippi
JDRF Intl.
Las Vegas Valley Water District
League of Conservation Voters
Management Assn. for Private Photogrammetric Surveyors
Metropolitan Water District of Southern California
Ntl. Corn Growers Assn.
Ntl. Right to Work Committee
Ntl. Society of Professional Surveyors
Ntl. Wildlife Federation
Conservancy
Northern California Power Agency
Ocean Conservancy
Open Space Institute
Public Power Council
Safari Club Intl.
Sierra Club
Society of the Plastics Industry, Inc.
Solano County
Southern Environmental Law Center
Southern Nevada Water Authority
The Wilderness Society
Transportation Communications Intl. Union
Trust for Public Lands
United Phosphorus, Inc.
Waggoner Engineering, Inc.
Waterways Council, Inc.

Politician Name            State  Party
Bishop, Rob                UT     Republican
Bishop, Timothy            NY     Democrat
Booker, Cory               NJ     Democrat
Bordallo, Madeleine        GU     Democrat
Boxer, Barbara             CA     Democrat
Calvert, Ken               CA     Republican
Casey, Robert              PA     Democrat
Costa, Jim                 CA     Democrat
Crapo, Michael             ID     Republican
Daines, Steve              MT     Republican
Davis, Rodney              IL     Republican
DeGette, Diana             CO     Democrat
Farr, Sam                  CA     Democrat
Fleming, John              LA     Republican
Garamendi, John            CA     Democrat
Gibbs, Bob                 OH     Republican
Gosar, Paul                AZ     Republican
Hagan, Kay                 NC     Democrat
Hastings, Alcee            FL     Democrat
Heinrich, Martin           NM     Democrat
Horsford, Steven           NV     Democrat
Huffman, Jared             CA     Democrat
Jackson Lee, Sheila        TX     Democrat
Kind, Ron                  WI     Democrat
Kinzinger, Adam            IL     Republican
Lautenberg, Frank          NJ     Democrat
Luján, Ben                 NM     Democrat
Maloney, Sean              NY     Democrat
McClintock, Tom            CA     Republican
Murkowski, Lisa            AK     Republican
Paul, Rand                 KY     Republican
Ruppersberger, C.          MD     Democrat
Schweikert, David          AZ     Republican
Scott, Austin              GA     Republican
Thompson, Mike             CA     Democrat
Tipton, Scott              CO     Republican
Udall, Tom                 NM     Democrat
Valadao, David             CA     Republican
Vela, Filemon              TX     Democrat
Vitter, David              LA     Republican
Williams, Roger            TX     Republican
Wittman, Robert            VA     Republican
Young, Don                 AK     Republican

A.6.10 Cluster: Energy

Lobbying Client Name
American Gas Assn.
American Municipal Power, Inc.
American Natural Gas Alliance, Inc.
American Petroleum Institute
American Public Power Assn.
Arch Coal
Berkshire Hathaway Energy
Biotechnology Industry Organization (BIO)
Black Hills Corp.
BP America, Inc.
Centerpoint Energy, Inc.
Chesapeake Energy Corp.
Chevron USA, Inc.
Clean Energy Fuels Corp.
CMS Energy Corp.
Consolidated Edison Co. of New York
Covanta Energy Corp.
Dominion
DTE Energy
Duke Energy Corp.
Edison Electric Institute
Energy Future Holdings
Entergy Services, Inc.
Exelon Business Services LLC
Exxon Mobil Corp.
FirstEnergy Corp.
Ford Motor Company
Great Plains Energy
Growth Energy, Inc.
Hannon Armstrong
HollyFrontier Corporation
Hyundai Motor Company
Independent Petroleum Assn. of America
Integrys Energy Group, Inc.
Koch Companies Public Sector, LLC
Marathon Oil Corporation
Marathon Petroleum Corporation
Missouri River Energy Services
Ntl. Corn Growers Assn.
Ntl. Fisheries Institute
Ntl. Grid USA
Ntl. Ground Water Assn.
Ntl. Mining Assn.
Ntl. Propane Gas Assn.
Nextera Energy, Inc.
Nisource, Inc.
NV Energy
Ohio Municipal Electric Assn.
Olin Corporation
Pacific Gas and Electric Co.
Peabody Energy
Peabody Investments Corp.
Pepco Holdings, Inc.
Pinnacle West Capital Corp.
Poet, LLC
PPL Corporation
Public Service Enterprise Group (PSEG)
Puget Sound Energy
Renewable Fuels Assn.
Salt River Project
Sempra Energy
Sheet Metal & Air Conditioning Contractors Ntl. Assn.
Shell Oil Company
Solar Energy Industries Assn.
Southern Company
The Fertilizer Institute
UIL Holdings Corporation
Vectren Corporation
Xcel Energy, Inc.

Politician Name            State  Party
Alexander, Rodney          LA     Republican
Barrasso, John             WY     Republican
Bennet, Michael            CO     Democrat
Bridenstine, Jim           OK     Republican
Capito, Shelley            WV     Republican
Clarke, Yvette             NY     Democrat
Clay, Wm.                  MO     Democrat
Coons, Chris               DE     Democrat
Cramer, Kevin              ND     Republican
Duncan, Jeff               SC     Republican
Engel, Eliot               NY     Democrat
Enyart, William            IL     Democrat
Flake, Jeff                AZ     Republican
Flores, Bill               TX     Republican
Franks, Trent              AZ     Republican
Gardner, Cory              CO     Republican
Gohmert, Louie             TX     Republican
Green, Gene                TX     Democrat
Harper, Gregg              MS     Republican
Heitkamp, Heidi            ND     Democrat
Hoeven, John               ND     Republican
Inhofe, James              OK     Republican
Johanns, Mike              NE     Republican
LaMalfa, Doug              CA     Republican
Lamborn, Doug              CO     Republican
Lankford, James            OK     Republican
Larson, John               CT     Democrat
Loebsack, David            IA     Democrat
Marchant, Kenny            TX     Republican
Markey, Edward             MA     Democrat
Matheson, Jim              UT     Democrat
McAllister, Vance          LA     Republican
McConnell, Mitch           KY     Republican
McKinley, David            WV     Republican
Meehan, Patrick            PA     Republican
Mica, John                 FL     Republican
Mullin, Markwayne          OK     Republican
Olson, Pete                TX     Republican
Payne, Donald              NJ     Democrat
Peters, Gary               MI     Democrat
Peters, Scott              CA     Democrat
Pompeo, Mike               KS     Republican
Rahall, Nick               WV     Democrat
Rothfus, Keith             PA     Republican
Scalise, Steve             LA     Republican
Sensenbrenner, F.          WI     Republican
Serrano, José              NY     Democrat
Shaheen, Jeanne            NH     Democrat
Shimkus, John              IL     Republican
Southerland, Steve         FL     Republican
Stewart, Chris             UT     Republican
Thornberry, Mac            TX     Republican
Turner, Michael            OH     Republican
Udall, Mark                CO     Democrat
Waxman, Henry              CA     Democrat
Welch, Peter               VT     Democrat
Whitfield, Ed              KY     Republican
Wicker, Roger              MS     Republican
Yarmuth, John              KY     Democrat

A.6.11 Cluster: Health

Lobbying Client Name                                   Politician Name     State  Party
AARP                                                   Bera, Ami           CA     Democrat
Alzheimer’s Assn.                                      Black, Diane        TN     Republican
American Academy of Dermatology Assn.                  Blunt, Roy          MO     Republican
American Academy of Ophthalmology                      Braley, Bruce       IA     Democrat
American Academy of Otolaryngology                     Bucshon, Larry      IN     Republican
American Assn. for Justice                             Burgess, Michael    TX     Republican
American Assn. of Neurological Surgeons                Clark, Katherine    MA     Democrat
American Assn. of Nurse Anesthetists                   Coburn, Thomas      OK     Republican
American Bar Assn.                                     Cotton, Tom         AR     Republican
American Chiropractic Assn.                            Courtney, Joe       CT     Democrat
American College of Emergency Physicians               Crowley, Joseph     NY     Democrat
American College of Physicians                         DeSantis, Ron       FL     Republican
American College of Rheumatology                       Dent, Charles       PA     Republican
American College of Surgeons                           Edwards, Donna      MD     Democrat
American College of Surgeons Professional              Gerlach, Jim        PA     Republican
  Assn.                                                Gingrey, Phil       GA     Republican
American Fed. of State, County, and                    Granger, Kay        TX     Republican
  Municipal Employees                                  Graves, Sam         MO     Republican
American Health Care Assn.                             Harris, Andy        MD     Republican
American Hospital Assn.                                Hirono, Mazie       HI     Democrat
American Medical Assn.                                 Jenkins, Lynn       KS     Republican
American Optometric Assn.                              Johnson, Sam        TX     Republican
American Physical Therapy Assn.                        Kildee, Daniel      MI     Democrat
American Society of Anesthesiologists                  Kingston, Jack      GA     Republican
American Society of Interventional Pain                Pascrell, Bill      NJ     Democrat
  Physicians                                           Pierluisi, Pedro    PR     Democrat
America’s Essential Hospitals                          Pitts, Joseph       PA     Republican
Assn. of American Medical Colleges                     Price, Tom          GA     Republican
Children’s Healthcare of Atlanta                       Renacci, James      OH     Republican
City of Baltimore                                      Roskam, Peter       IL     Republican
Coalition of Boston Teaching Hospitals                 Schock, Aaron       IL     Republican
Congress of Neurological Surgeons                      Schwartz, Allyson   PA     Democrat
Cornell University                                     Sessions, Pete      TX     Republican
Cubist Pharmaceuticals                                 Smith, Adrian       NE     Republican
Federation of American Hospitals                       Tierney, John       MA     Democrat
Greater New York Hospital Assn.                        Walberg, Tim        MI     Republican
Healthcare Leadership Council
Hospital & Healthsystem Assn. of Pennsylvania
Illinois Hospital Assn.
Indiana University Health, Inc.
International Brotherhood of Teamsters
Iowa Hospital Assn.
Jewish Federation of Metropolitan Chicago
Massachusetts Medical Society
Michigan State University
Ntl. Active and Retired Federal Employees Assn.
Ntl. Air Traffic Controllers Assn.
National Assn. of Psychiatric Health Systems
National Assn. of Pediatric Nurse Practitioners
National Fed. of Federal Employees
National Treasury Employees Union
Nebraska Hospital Assn.
Network
North Shore - Long Island Jewish Health System
NumbersUSA Action
Ochsner Clinic Foundation
Partners Healthcare, Inc.
Professional Aviation Safety Specialists
Rutgers: The State University of New Jersey
Susan B. Anthony List
Tenet Healthcare Corporation
The Ickes & Enright Group, Inc.
The Trustees of Purdue University
Tufts University
Tulane University
United Parcel Service
University of California
University of Northern Iowa
University of Wisconsin-Madison
World Wildlife Fund, Inc.
Yum! Brands, Inc.
