<<

Supervised Topic Models

Advanced Machine Learning for NLP Jordan Boyd-Graber HIERARCHIES

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 1 of 29 | | Motivation: Representing Elected Officials with Ideal Points

An essential tool in political science: distinguish trends and characterize subgroups

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 2 of 29 | | Evaluation: Tea Party in the House

The Tea Party American political movement for freedom, small government, lower tax • Disrupting Republican Party and recent elections • Organizations: • Institutional: ◦ Other: , , Freedom Works “Conventional◦ views of ideology as a single–dimensional, leftÂU-right˝ • spectrum experience great difficulty in understanding or explaining the Tea Party.”

Goal Explain Tea Partiers in terms of issues and votes • Identify Tea Partiers from their rhetoric •

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 3 of 29 | | But all politicians—by • definition—talk

Not everyone has a voting record

Ideal points estimated based on • voting record Not all candidates have a voting • record Governors ◦ Entertainers ◦ CEOs ◦

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 4 of 29 | | Not everyone has a voting record

Ideal points estimated based on • voting record Not all candidates have a voting • record Governors ◦ Entertainers ◦ CEOs ◦ But all politicians—by • definition—talk

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 4 of 29 | | This work: congressional floor speeches

Let’s use whatever data we have

A single model that uses: Bill text • Votes • Commentary • to map political actors to the same continuous space.

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 5 of 29 | | Let’s use whatever data we have

A single model that uses: Bill text • Votes • Commentary • to map political actors to the same continuous space. This work: congressional floor speeches

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 5 of 29 | | Ideal Point Review

Outline

1 Ideal Point Review

2 Hierarchical Ideal Point Topic Model

3 Predicting Membership

4 How They Vote

5 How They Talk

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 6 of 29 | | Ideal Point Review

One-dimensional Ideal Point using Votes

YEA NAY NAY YEA

NAY YEA YEA

YEA YEA NAY

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review

One-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

YEA NAY NAY YEA

NAY YEA YEA p(va,b = Yea) = (uaxb + yb)

YEA YEA NAY exp(↵) (↵)= exp(↵)+1

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review

One-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

One-dimensional ideal point of legislator a

YEA NAY NAY YEA

NAY YEA YEA p(va,b = Yea) = (uaxb + yb)

YEA YEA NAY

Polarity of bill b Popularity of bill b

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review

One-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

One-dimensional ideal point of legislator a

YEA NAY NAY YEA

NAY YEA YEA p(va,b = Yea) = (uaxb + yb)

YEA YEA NAY

Polarity of bill b Popularity of bill b

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review

One-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

One-dimensional ideal point of legislator a

YEA NAY NAY YEA

NAY YEA YEA p(va,b = Yea) = (uaxb + yb)

YEA YEA NAY

Polarity of bill b Popularity of bill b

LIBERAL CONSERVATIVE

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

YEA NAY NAY YEA K

NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

Multi-dimensional ideal point of legislator a

YEA NAY NAY YEA K

NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X

K dimensions

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

Multi-dimensional ideal point of legislator a

YEA NAY NAY YEA K

NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X

K dimensions

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes

Legislator a votes 'Yea' on bill b with probability

Multi-dimensional ideal point of legislator a

YEA NAY NAY YEA K

NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X

K dimensions ? Dimensions are difficult ? to interpret ?

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes & Text

Bill Text

YEA NAY NAY YEA

NAY YEA YEA

YEA YEA NAY

? Dimensions are difficult ? to interpret ?

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes & Text

Legislator a votes 'Yea' on bill b with probability

YEA NAY NAY YEA K

NAY YEA YEA p(va,b = Yea) = xb ua,k#b,k + yb k=1 ! YEA YEA NAY X

? Dimensions are difficult ? to interpret ?

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes & Text

Legislator a votes 'Yea' on bill b with probability

Multi-dimensional ideal point of legislator a

YEA NAY NAY YEA K

NAY YEA YEA p(va,b = Yea) = xb ua,k#b,k + yb k=1 ! YEA YEA NAY X

Topic proportion of bill b estimated from its text ? Dimensions are difficult ? to interpret ?

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Ideal Point Review

Multi-dimensional Ideal Point using Votes & Text

Legislator a votes 'Yea' on bill b with probability

Multi-dimensional ideal point of legislator a

YEA NAY NAY YEA K

NAY YEA YEA p(va,b = Yea) = xb ua,k#b,k + yb k=1 ! YEA YEA NAY X

Topic proportion of bill b estimated from its text

obamacare, patient, doctor, insurance, affordable_care, hospital

balanc_budget, debt_ceil, cap, cut_spend, rais_tax debt_limit, nation_debt,

employ, hire, job_creator, union, nlrb, boe, labor, business_owner

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Hierarchical Ideal Point Topic Model

Outline

1 Ideal Point Review

2 Hierarchical Ideal Point Topic Model

3 Predicting Membership

4 How They Vote

5 How They Talk

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 10 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Intuition

What are your thoughts on the issue of immigration?

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 11 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Overview

Using both votes and text to learn Two-level topic hierarchy: • Ideal points in multiple interpretable dimensions •

Health

obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board

Frame H1 Frame H2 Frame H3

afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Overview

Using both votes and text to learn Two-level topic hierarchy: • First-level nodes map to agenda issues ◦ Second-level nodes map to issue-specific frames ◦ Ideal points in multiple interpretable dimensions •

Health

obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board First-level Frame H1 Frame H2 Frame H3 nodes afford_care, obamacar, replac, patient, doctor, Topic exchang, mandat, insur, physician, hospit, ~ patient_protect, health_insur, medicaid, board, Hierarchy human_servic, coverag, georgia, Agenda issues public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment Second-level flexibl, teach_health, preexist_condit nodes ~ Issue-specific frames

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Overview

Using both votes and text to learn Two-level topic hierarchy: Use existing labeled data to learn priors for • interpretable issues Ideal points in multiple interpretable dimensions •

Use prior to learn Health obamacar, patient, doctor, physician, afford_care, interpretable hospit, insur, replac, mandat, exchang, health_insur, issue topics coverag, medicaid, patient_protect, board

Frame H1 Frame H2 Frame H3

afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Overview

Using both votes and text to learn Two-level topic hierarchy: Ideal points for frames for predictions using • text only Ideal points in multiple interpretable dimensions •

Health

obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board

Frame H1 Frame H2 Frame H3

afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit Learn ideal point for each frame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Overview

Using both votes and text to learn Two-level topic hierarchy: • Ideal points in multiple interpretable dimensions •

Health

obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board

Frame H1 Frame H2 Frame H3

afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit

Multi-dimensional Ideal Points

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Overview

Hierarchical Ideal Point Topic Model: Inputs

A collection of votes va,b • { } A collection of D speeches w d , each of which is given by legislator • ad { } A collection of bill text B w 0b • { }

Bill text

YEA NAY NAY YEA

NAY YEA YEA Speeches Votes YEA YEA NAY

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model

Modeling bill text

Each bill text b is a mixture over K issues ϑb • Each bill token generated from topic at first-level issue node •

Health Macroeconomics

obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper

Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Generative Process

Each speech d also has a distribution θd over K issues • Each issue k , each speech d has distribution over frames ψd ,k • Each speech token from topic at second-level frame node • Health Macroeconomics

obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper

Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics

obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper

Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics

obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper

Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics

obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper

Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics

obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper

Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model

Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics

obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper

Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model

Nested Chinese Restaurant Process

Start at a CRP, choose a table • That table has not just a dish (distribution over words) but also business • card That card tells you which restaurant to go to next • You do this L times •

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 14 of 29 | | Hierarchical Ideal Point Topic Model

Topic Hierarchies

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 15 of 29 | | Hierarchical Ideal Point Topic Model

Generative Process

φ1,1

Start at the root node

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model

Generative Process

φ1,1

φ2,1 φ2,2 φ2,3

Need to choose which table to sit at

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model

Generative Process

φ1,1

φ2,1 φ2,2 φ2,3 φ2*

This is a CRP! (Can create new table too.)

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model

Generative Process

φ1,1

φ2,1 φ2,2 φ2,3 φ2*

φ7,1 φ7,2 φ7,3

Repeat

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model

Generative Process

φ1,1

φ2,1 L φ2,2 φ2,3 φ2*

φ7,1 φ7,2 φ7,3

Your path then becomes the set of topics you use for this document

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model

Generative Process

φ1,1

φ2,1 L φ2,2 φ2,3 φ2*

φ7,1 φ7,2 φ7,3

Warning: Probably don’t want to only use one path per document (but useful explanation)

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Data 240 Republican Representatives in the 112th U.S. House • 60 are members of the Tea Party Caucus (self-identified) • 60 key votes selected by Freedom Works (2011-2012) • Speeches, bill text and voting records from the

Hierarchical Ideal Point Topic Model

Evaluation: Tea Party in the House

The Tea Party American political movement for freedom, small government, lower tax • Disrupting Republican Party and recent elections •

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 17 of 29 | | Hierarchical Ideal Point Topic Model

Evaluation: Tea Party in the House

The Tea Party American political movement for freedom, small government, lower tax • Disrupting Republican Party and recent elections • Data 240 Republican Representatives in the 112th U.S. House • 60 are members of the Tea Party Caucus (self-identified) • 60 key votes selected by Freedom Works (2011-2012) • Speeches, bill text and voting records from the Library of Congress •

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 17 of 29 | | Predicting Membership

Outline

1 Ideal Point Review

2 Hierarchical Ideal Point Topic Model

3 Predicting Membership

4 How They Vote

5 How They Talk

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 18 of 29 | | Features Text-based features: normalized term frequency (TF) and TF-IDF • Vote: binary features • HIPTM: features extracted from our model including • K -dim ideal point ua,k estimated from both votes and text ◦ T ˆ K -dim ideal point estimated from text only ηk ψa,k ◦ PK B probabilities estimating a ’s votes Φ(xb k 1 ϑb,k ua,k + yb ) ◦ =

Predicting Membership

Tea Party Caucus Membership Prediction

Experiment setup Task: Binary classification of whether a legislator is a member of the Tea Party • Caucus Evaluation metric: AUC-ROC • Classifier: SVMl i g ht • Five-fold stratified cross-validation •

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 19 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction

Experiment setup Task: Binary classification of whether a legislator is a member of the Tea Party • Caucus Evaluation metric: AUC-ROC • Classifier: SVMl i g ht • Five-fold stratified cross-validation • Features Text-based features: normalized term frequency (TF) and TF-IDF • Vote: binary features • HIPTM: features extracted from our model including • K -dim ideal point ua,k estimated from both votes and text ◦ T ˆ K -dim ideal point estimated from text only ηk ψa,k ◦ PK B probabilities estimating a ’s votes Φ(xb k 1 ϑb,k ua,k + yb ) ◦ =

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 19 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Votes & Text

AUCROC

0.75

0.70

0.65

0.60

TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Votes & Text

AUCROC

0.75

0.70

0.65

0.60

TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All

Text-based Features

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Votes & Text

AUCROC

0.75

0.70

0.65

0.60

TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All

Text-based Vote Features Features

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Votes & Text

AUCROC

0.75

0.70

0.65

0.60

TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All

Text-based Vote Our Features Features Features

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Votes & Text

AUCROC

0.75

0.70

0.65

0.60

TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All

Text-based Vote Our Combining Vote Features Features Features with TF/TF-IDF

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Votes & Text

AUCROC

0.75

0.70

0.65

0.60

TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All

Text-based Vote Our Combining Vote Combining Vote Features Features Features with TF/TF-IDF with Our Features

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Text Only

AUCROC

0.65

0.63

0.61

TF TF-IDF HIPTM

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 21 of 29 | | Predicting Membership

Tea Party Caucus Membership Prediction: Text Only

AUCROC

0.65

0.63

0.61

TF TF-IDF HIPTM

Training: text only Training: text and votes Test: text only Test: text only

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 21 of 29 | | How They Vote

Outline

1 Ideal Point Review

2 Hierarchical Ideal Point Topic Model

3 Predicting Membership

4 How They Vote

5 How They Talk

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 22 of 29 | | How They Vote

One-dimensional Ideal Points

Judy Biggert

Ander Crenshaw

Jim Gerlach Justin Amash

Michael Grimm

Christopher H. Smith Paul C. Broun

Harold Rogers Tom Graves

Peter T. King Jim Jordan

Patrick Meehan Raúl Labrador

Rodney M. Alexander Joe Walsh

Bob Dold Scott Garrett

Steve Stivers Doug Lamborn

Jon Runyan Tom McClintock

Ileana RosLehtinen

Dave G. Reichert Trey Gowdy

Mario DiazBalart Trent Franks

-1.6 -1.5 -1.4 -1.3 -1.2 1.4 1.6 1.8 Ideal Point Ideal Point Tea Party Caucus a Member a Nonmember Tea Party Caucus a Member a Nonmember

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote

One-dimensional Ideal Points

Judy Biggert Jeff Duncan

Ander Crenshaw Jeff Flake

Jim Gerlach Justin Amash

Michael Grimm Mick Mulvaney

Christopher H. Smith Paul C. Broun

Harold Rogers Tom Graves

Peter T. King Jim Jordan

Patrick Meehan Raúl Labrador

Rodney M. Alexander Joe Walsh

Bob Dold Scott Garrett

Steve Stivers Doug Lamborn

Jon Runyan Tom McClintock

Ileana RosLehtinen David Schweikert

Dave G. Reichert Trey Gowdy

Mario DiazBalart Trent Franks

-1.6 -1.5 -1.4 -1.3 -1.2 1.4 1.6 1.8 Ideal Point Ideal Point Tea Party Caucus a Member a Nonmember Tea Party Caucus a Member a Nonmember

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote

One-dimensional Ideal Points

Judy Biggert

Ander Crenshaw

Jim Gerlach Alexander and Crenshaw’s votes • Michael Grimm only agree with Freedom Works 48%

Christopher H. Smith and 50% respectively Harold Rogers Both voted for raising the debt ceiling Peter T. King • and are listed as “traitor” Patrick Meehan

Rodney M. Alexander

Bob Dold

Steve Stivers

Jon Runyan

Ileana RosLehtinen

Dave G. Reichert

Mario DiazBalart

-1.6 -1.5 -1.4 -1.3 -1.2 Ideal Point Tea Party Caucus a Member a Nonmember

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote

One-dimensional Ideal Points

Jeff Duncan

Jeff Flake

Flake and Amash didn’t self-identify Justin Amash • as members of the Tea Party Mick Mulvaney

Caucus but have been endorsed by Paul C. Broun

other Tea Party organizations Tom Graves

Jim Jordan

Raúl Labrador

“Some 46 House members and six senators Joe Walsh

had been [Tea Party] . . . In addition, there Scott Garrett

were about 18 other House members like Doug Lamborn Trey Gowdy, , and Justin Tom McClintock Amash, and several senators, including Jeff David Schweikert Flake and Pat Toomey, who owed their election to support from the Tea Party and its Trey Gowdy Washington allies.” Trent Franks 1.4 1.6 1.8 Ideal Point Tea Party Caucus a Member a Nonmember

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote

Multi-dimensional Ideal Points

Government.Operations l

Macroeconomics l l

Transportation l l

Labor..Employment..and.Immigration l

Foreign.Trade l

Banking..Finance..and.Domestic.Commerce l ll l l l l

6 3 0 3 Ideal Points Tea Party Caucus Member Nonmember

Freedom Works’ key votes on most highly polarized dimensions are about government spending

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 24 of 29 | | How They Talk

Outline

1 Ideal Point Review

2 Hierarchical Ideal Point Topic Model

3 Predicting Membership

4 How They Vote

5 How They Talk

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 25 of 29 | | How They Talk

Framing Healthcare

Health obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board

Frame H1 Frame H2 Frame H2 afford_care, exchang, patient, doctor, physician, hospit, obamacar, replac, mandat, insur, patient_protect, human_servic, medicaid, board, georgia, health_insur, coverag, social_secur, public_health, slush_fund, ppaca, save_medicar, nurs, tennesse, premium, repeal_obamacar, entitl, mandatori, mandatori_spend, page, bureaucrat, advisori_board, govern_takeov, purchas, governor, hospit, health_center, medicin, independ_payment unconstitut, preexist_condit, employ flexibl, teach_health, unlimit

-.13 .04 .56

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 26 of 29 | | How They Talk

Framing Macroeconomics

Macroeconomics balanc_budget, borrow, debt_ceil, cap, cut_spend, nation_debt, grandchildren, rais_tax, entitl

Frame M1 Frame M2 Frame M2 white_hous, shut, balanc_budget, debt_ceil, borrow, nation_debt, continu_resolut, mess, cap, cut_spend, debt_limit, rais_tax, entitl, prosper, hous_republican, novemb, spend_cut, fiscal_hous, chart, grandchildren, govern_shutdown grandchildren, guarante spend_monei, size, gdp

-.57 -.24 .56

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 27 of 29 | | How They Talk

Polarization

Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance; Not Issues, Civil Liberties Transportation

Macroeconomics; Health; Public Lands and Government Water Management Issue Frames Issue Distribution of Distribution

Polarized Operations

YES NO

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk

Polarization

Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance Not Issues, Civil Liberties Transportation

Macroeconomics; Health; Public Lands and Government Water Management Issue Frames Issue Distribution of Distribution

Polarized Operations

YES NO

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk

Polarization

Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance Not Issues, Civil Liberties Transportation

Macroeconomics; Health; Public Lands Government and Water Management Issue Frames Issue Distribution of Distribution

Polarized Operations

YES NO

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk

Polarization

Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance Not Issues, Civil Liberties Transportation

Macroeconomics; Health; Public Lands and Government Water Management Issue Frames Issue Distribution of Distribution

Polarized Operations

YES NO

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk

Hierarchies are Cool

For a sweep of single parameter, BNP not that useful • Complex structures are more realistic applications • Combining with supervised objective • Unsolved problem: good prediction with interpretable structure •

Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 29 of 29 | |