Supervised Topic Models
Advanced Machine Learning for NLP Jordan Boyd-Graber HIERARCHIES
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 1 of 29 | | Motivation: Representing Elected Officials with Ideal Points
An essential tool in political science: distinguish trends and characterize subgroups
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 2 of 29 | | Evaluation: Tea Party in the House
The Tea Party American political movement for freedom, small government, lower tax • Disrupting Republican Party and recent elections • Organizations: • Institutional: Tea Party Caucus ◦ Other: Tea Party Express, Tea Party Patriots, Freedom Works “Conventional◦ views of ideology as a single–dimensional, leftÂU-right˝ • spectrum experience great difficulty in understanding or explaining the Tea Party.”
Goal Explain Tea Partiers in terms of issues and votes • Identify Tea Partiers from their rhetoric •
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 3 of 29 | | But all politicians—by • definition—talk
Not everyone has a voting record
Ideal points estimated based on • voting record Not all candidates have a voting • record Governors ◦ Entertainers ◦ CEOs ◦
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 4 of 29 | | Not everyone has a voting record
Ideal points estimated based on • voting record Not all candidates have a voting • record Governors ◦ Entertainers ◦ CEOs ◦ But all politicians—by • definition—talk
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 4 of 29 | | This work: congressional floor speeches
Let’s use whatever data we have
A single model that uses: Bill text • Votes • Commentary • to map political actors to the same continuous space.
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 5 of 29 | | Let’s use whatever data we have
A single model that uses: Bill text • Votes • Commentary • to map political actors to the same continuous space. This work: congressional floor speeches
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 5 of 29 | | Ideal Point Review
Outline
1 Ideal Point Review
2 Hierarchical Ideal Point Topic Model
3 Predicting Membership
4 How They Vote
5 How They Talk
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 6 of 29 | | Ideal Point Review
One-dimensional Ideal Point using Votes
YEA NAY NAY YEA
NAY YEA YEA
YEA YEA NAY
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review
One-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
YEA NAY NAY YEA
NAY YEA YEA p(va,b = Yea) = (uaxb + yb)
YEA YEA NAY exp(↵) (↵)= exp(↵)+1
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review
One-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
One-dimensional ideal point of legislator a
YEA NAY NAY YEA
NAY YEA YEA p(va,b = Yea) = (uaxb + yb)
YEA YEA NAY
Polarity of bill b Popularity of bill b
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review
One-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
One-dimensional ideal point of legislator a
YEA NAY NAY YEA
NAY YEA YEA p(va,b = Yea) = (uaxb + yb)
YEA YEA NAY
Polarity of bill b Popularity of bill b
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review
One-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
One-dimensional ideal point of legislator a
YEA NAY NAY YEA
NAY YEA YEA p(va,b = Yea) = (uaxb + yb)
YEA YEA NAY
Polarity of bill b Popularity of bill b
LIBERAL CONSERVATIVE
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 7 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
YEA NAY NAY YEA K
NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
Multi-dimensional ideal point of legislator a
YEA NAY NAY YEA K
NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X
K dimensions
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
Multi-dimensional ideal point of legislator a
YEA NAY NAY YEA K
NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X
K dimensions
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes
Legislator a votes 'Yea' on bill b with probability
Multi-dimensional ideal point of legislator a
YEA NAY NAY YEA K
NAY YEA YEA p(va,b = Yea) = ua,kxb,k + yb k=1 ! YEA YEA NAY X
K dimensions ? Dimensions are difficult ? to interpret ?
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 8 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes & Text
Bill Text
YEA NAY NAY YEA
NAY YEA YEA
YEA YEA NAY
? Dimensions are difficult ? to interpret ?
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes & Text
Legislator a votes 'Yea' on bill b with probability
YEA NAY NAY YEA K
NAY YEA YEA p(va,b = Yea) = xb ua,k#b,k + yb k=1 ! YEA YEA NAY X
? Dimensions are difficult ? to interpret ?
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes & Text
Legislator a votes 'Yea' on bill b with probability
Multi-dimensional ideal point of legislator a
YEA NAY NAY YEA K
NAY YEA YEA p(va,b = Yea) = xb ua,k#b,k + yb k=1 ! YEA YEA NAY X
Topic proportion of bill b estimated from its text ? Dimensions are difficult ? to interpret ?
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Ideal Point Review
Multi-dimensional Ideal Point using Votes & Text
Legislator a votes 'Yea' on bill b with probability
Multi-dimensional ideal point of legislator a
YEA NAY NAY YEA K
NAY YEA YEA p(va,b = Yea) = xb ua,k#b,k + yb k=1 ! YEA YEA NAY X
Topic proportion of bill b estimated from its text
obamacare, patient, doctor, insurance, affordable_care, hospital
balanc_budget, debt_ceil, cap, cut_spend, rais_tax debt_limit, nation_debt,
employ, hire, job_creator, union, nlrb, boe, labor, business_owner
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 9 of 29 | | Hierarchical Ideal Point Topic Model
Outline
1 Ideal Point Review
2 Hierarchical Ideal Point Topic Model
3 Predicting Membership
4 How They Vote
5 How They Talk
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 10 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Intuition
What are your thoughts on the issue of immigration?
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 11 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Overview
Using both votes and text to learn Two-level topic hierarchy: • Ideal points in multiple interpretable dimensions •
Health
obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board
Frame H1 Frame H2 Frame H3
afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Overview
Using both votes and text to learn Two-level topic hierarchy: • First-level nodes map to agenda issues ◦ Second-level nodes map to issue-specific frames ◦ Ideal points in multiple interpretable dimensions •
Health
obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board First-level Frame H1 Frame H2 Frame H3 nodes afford_care, obamacar, replac, patient, doctor, Topic exchang, mandat, insur, physician, hospit, ~ patient_protect, health_insur, medicaid, board, Hierarchy human_servic, coverag, georgia, Agenda issues public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment Second-level flexibl, teach_health, preexist_condit nodes ~ Issue-specific frames
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Overview
Using both votes and text to learn Two-level topic hierarchy: Use existing labeled data to learn priors for • interpretable issues Ideal points in multiple interpretable dimensions •
Use prior to learn Health obamacar, patient, doctor, physician, afford_care, interpretable hospit, insur, replac, mandat, exchang, health_insur, issue topics coverag, medicaid, patient_protect, board
Frame H1 Frame H2 Frame H3
afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Overview
Using both votes and text to learn Two-level topic hierarchy: Ideal points for frames for predictions using • text only Ideal points in multiple interpretable dimensions •
Health
obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board
Frame H1 Frame H2 Frame H3
afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit Learn ideal point for each frame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Overview
Using both votes and text to learn Two-level topic hierarchy: • Ideal points in multiple interpretable dimensions •
Health
obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board
Frame H1 Frame H2 Frame H3
afford_care, obamacar, replac, patient, doctor, exchang, mandat, insur, physician, hospit, patient_protect, health_insur, medicaid, board, human_servic, coverag, georgia, public_health, social_secur, save_medicar, slush_fund, ppaca, premium, nurs, tennesse, mandatori, repeal_obamacar, page, bureaucrat, mandatori_spend, entitl, advisori_board, governor, hospit, govern_takeov, medicin, health_center, purchas, unconstitut, independ_payment flexibl, teach_health, preexist_condit
Multi-dimensional Ideal Points
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Overview
Hierarchical Ideal Point Topic Model: Inputs
A collection of votes va,b • { } A collection of D speeches w d , each of which is given by legislator • ad { } A collection of bill text B w 0b • { }
Bill text
YEA NAY NAY YEA
NAY YEA YEA Speeches Votes YEA YEA NAY
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 12 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model
Modeling bill text
Each bill text b is a mixture over K issues ϑb • Each bill token generated from topic at first-level issue node •
Health Macroeconomics
obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper
Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Generative Process
Each speech d also has a distribution θd over K issues • Each issue k , each speech d has distribution over frames ψd ,k • Each speech token from topic at second-level frame node • Health Macroeconomics
obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper
Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics
obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper
Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics
obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper
Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics
obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper
Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics
obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper
Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model
Hierarchical Ideal Point Topic Model: Modeling votes Legislator a votes ‘Yea’ on bill b with probability K • p v Yea Φ x P ϑ u y ( a,b = ) = ( b k=1 b,k a,k + b ) PJk Ideal point ua,k ( j 1 ηk,j ψa,k,j ,ρ) • = ∼Health N Macroeconomics
obamacar, patient, doctor, physician, afford_care, balanc_budget, borrow, debt_ceil, cap, cut_spend, hospit, insur, replac, mandat, exchang, health_insur, nation_debt, grandchildren, rais_tax, entitl, coverag, medicaid, patient_protect, board white_hous, debt_limit, prosper
Frame H1 Frame H2 Frame H3 Frame M1 Frame M2 Frame M3 white_hous, shut, afford_care, obamacar, replac, balanc_budget, patient, doctor, continu_resolut, borrow, exchang, mandat, insur, debt_ceil, cap, physician, hospit, mess, nation_debt, patient_protect, health_insur, cut_spend, medicaid, board, hous_republican, rais_tax, entitl, human_servic, coverag, debt_limit, georgia, novemb, prosper, chart, public_health, social_secur, spend_cut, save_medicar, govern_shutdown, grandchildren, slush_fund, ppaca, premium, fiscal_hous, nurs, tennesse, senat_reid, spend_monei, mandatori, repeal_obamacar, grandchildren, page, bureaucrat, harri_reid, vision, size, gdp, mandatori_spend, entitl, guarante, default, advisori_board, shutdown, liber, tax_increas, cent, governor, hospit, govern_takeov, august, obama, medicin, arriv, govern_spend, health_center, purchas, unconstitut, deficit_spend, rein, independ_payment republican_parti, social_secur flexibl, teach_health, preexist_condit feder_budget blame
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 13 of 29 | | Hierarchical Ideal Point Topic Model
Nested Chinese Restaurant Process
Start at a CRP, choose a table • That table has not just a dish (distribution over words) but also business • card That card tells you which restaurant to go to next • You do this L times •
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 14 of 29 | | Hierarchical Ideal Point Topic Model
Topic Hierarchies
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 15 of 29 | | Hierarchical Ideal Point Topic Model
Generative Process
φ1,1
Start at the root node
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model
Generative Process
φ1,1
φ2,1 φ2,2 φ2,3
Need to choose which table to sit at
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model
Generative Process
φ1,1
φ2,1 φ2,2 φ2,3 φ2*
This is a CRP! (Can create new table too.)
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model
Generative Process
φ1,1
φ2,1 φ2,2 φ2,3 φ2*
φ7,1 φ7,2 φ7,3
Repeat
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model
Generative Process
φ1,1
φ2,1 L φ2,2 φ2,3 φ2*
φ7,1 φ7,2 φ7,3
Your path then becomes the set of topics you use for this document
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Hierarchical Ideal Point Topic Model
Generative Process
φ1,1
φ2,1 L φ2,2 φ2,3 φ2*
φ7,1 φ7,2 φ7,3
Warning: Probably don’t want to only use one path per document (but useful explanation)
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 16 of 29 | | Data 240 Republican Representatives in the 112th U.S. House • 60 are members of the Tea Party Caucus (self-identified) • 60 key votes selected by Freedom Works (2011-2012) • Speeches, bill text and voting records from the Library of Congress •
Hierarchical Ideal Point Topic Model
Evaluation: Tea Party in the House
The Tea Party American political movement for freedom, small government, lower tax • Disrupting Republican Party and recent elections •
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 17 of 29 | | Hierarchical Ideal Point Topic Model
Evaluation: Tea Party in the House
The Tea Party American political movement for freedom, small government, lower tax • Disrupting Republican Party and recent elections • Data 240 Republican Representatives in the 112th U.S. House • 60 are members of the Tea Party Caucus (self-identified) • 60 key votes selected by Freedom Works (2011-2012) • Speeches, bill text and voting records from the Library of Congress •
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 17 of 29 | | Predicting Membership
Outline
1 Ideal Point Review
2 Hierarchical Ideal Point Topic Model
3 Predicting Membership
4 How They Vote
5 How They Talk
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 18 of 29 | | Features Text-based features: normalized term frequency (TF) and TF-IDF • Vote: binary features • HIPTM: features extracted from our model including • K -dim ideal point ua,k estimated from both votes and text ◦ T ˆ K -dim ideal point estimated from text only ηk ψa,k ◦ PK B probabilities estimating a ’s votes Φ(xb k 1 ϑb,k ua,k + yb ) ◦ =
Predicting Membership
Tea Party Caucus Membership Prediction
Experiment setup Task: Binary classification of whether a legislator is a member of the Tea Party • Caucus Evaluation metric: AUC-ROC • Classifier: SVMl i g ht • Five-fold stratified cross-validation •
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 19 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction
Experiment setup Task: Binary classification of whether a legislator is a member of the Tea Party • Caucus Evaluation metric: AUC-ROC • Classifier: SVMl i g ht • Five-fold stratified cross-validation • Features Text-based features: normalized term frequency (TF) and TF-IDF • Vote: binary features • HIPTM: features extracted from our model including • K -dim ideal point ua,k estimated from both votes and text ◦ T ˆ K -dim ideal point estimated from text only ηk ψa,k ◦ PK B probabilities estimating a ’s votes Φ(xb k 1 ϑb,k ua,k + yb ) ◦ =
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 19 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Votes & Text
AUCROC
0.75
0.70
0.65
0.60
TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Votes & Text
AUCROC
0.75
0.70
0.65
0.60
TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All
Text-based Features
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Votes & Text
AUCROC
0.75
0.70
0.65
0.60
TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All
Text-based Vote Features Features
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Votes & Text
AUCROC
0.75
0.70
0.65
0.60
TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All
Text-based Vote Our Features Features Features
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Votes & Text
AUCROC
0.75
0.70
0.65
0.60
TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All
Text-based Vote Our Combining Vote Features Features Features with TF/TF-IDF
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Votes & Text
AUCROC
0.75
0.70
0.65
0.60
TF TFIDF Vote HIPTM Vote-TF Vote-TF-IDF Vote-HIPTM All
Text-based Vote Our Combining Vote Combining Vote Features Features Features with TF/TF-IDF with Our Features
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 20 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Text Only
AUCROC
0.65
0.63
0.61
TF TF-IDF HIPTM
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 21 of 29 | | Predicting Membership
Tea Party Caucus Membership Prediction: Text Only
AUCROC
0.65
0.63
0.61
TF TF-IDF HIPTM
Training: text only Training: text and votes Test: text only Test: text only
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 21 of 29 | | How They Vote
Outline
1 Ideal Point Review
2 Hierarchical Ideal Point Topic Model
3 Predicting Membership
4 How They Vote
5 How They Talk
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 22 of 29 | | How They Vote
One-dimensional Ideal Points
Judy Biggert Jeff Duncan
Ander Crenshaw Jeff Flake
Jim Gerlach Justin Amash
Michael Grimm Mick Mulvaney
Christopher H. Smith Paul C. Broun
Harold Rogers Tom Graves
Peter T. King Jim Jordan
Patrick Meehan Raúl Labrador
Rodney M. Alexander Joe Walsh
Bob Dold Scott Garrett
Steve Stivers Doug Lamborn
Jon Runyan Tom McClintock
Ileana RosLehtinen David Schweikert
Dave G. Reichert Trey Gowdy
Mario DiazBalart Trent Franks
-1.6 -1.5 -1.4 -1.3 -1.2 1.4 1.6 1.8 Ideal Point Ideal Point Tea Party Caucus a Member a Nonmember Tea Party Caucus a Member a Nonmember
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote
One-dimensional Ideal Points
Judy Biggert Jeff Duncan
Ander Crenshaw Jeff Flake
Jim Gerlach Justin Amash
Michael Grimm Mick Mulvaney
Christopher H. Smith Paul C. Broun
Harold Rogers Tom Graves
Peter T. King Jim Jordan
Patrick Meehan Raúl Labrador
Rodney M. Alexander Joe Walsh
Bob Dold Scott Garrett
Steve Stivers Doug Lamborn
Jon Runyan Tom McClintock
Ileana RosLehtinen David Schweikert
Dave G. Reichert Trey Gowdy
Mario DiazBalart Trent Franks
-1.6 -1.5 -1.4 -1.3 -1.2 1.4 1.6 1.8 Ideal Point Ideal Point Tea Party Caucus a Member a Nonmember Tea Party Caucus a Member a Nonmember
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote
One-dimensional Ideal Points
Judy Biggert
Ander Crenshaw
Jim Gerlach Alexander and Crenshaw’s votes • Michael Grimm only agree with Freedom Works 48%
Christopher H. Smith and 50% respectively Harold Rogers Both voted for raising the debt ceiling Peter T. King • and are listed as “traitor” Patrick Meehan
Rodney M. Alexander
Bob Dold
Steve Stivers
Jon Runyan
Ileana RosLehtinen
Dave G. Reichert
Mario DiazBalart
-1.6 -1.5 -1.4 -1.3 -1.2 Ideal Point Tea Party Caucus a Member a Nonmember
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote
One-dimensional Ideal Points
Jeff Duncan
Jeff Flake
Flake and Amash didn’t self-identify Justin Amash • as members of the Tea Party Mick Mulvaney
Caucus but have been endorsed by Paul C. Broun
other Tea Party organizations Tom Graves
Jim Jordan
Raúl Labrador
“Some 46 House members and six senators Joe Walsh
had been [Tea Party] . . . In addition, there Scott Garrett
were about 18 other House members like Doug Lamborn Trey Gowdy, Mark Meadows, and Justin Tom McClintock Amash, and several senators, including Jeff David Schweikert Flake and Pat Toomey, who owed their election to support from the Tea Party and its Trey Gowdy Washington allies.” Trent Franks 1.4 1.6 1.8 Ideal Point Tea Party Caucus a Member a Nonmember
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 23 of 29 | | How They Vote
Multi-dimensional Ideal Points
Government.Operations l
Macroeconomics l l
Transportation l l
Labor..Employment..and.Immigration l
Foreign.Trade l
Banking..Finance..and.Domestic.Commerce l ll l l l l
6 3 0 3 Ideal Points Tea Party Caucus Member Nonmember
Freedom Works’ key votes on most highly polarized dimensions are about government spending
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 24 of 29 | | How They Talk
Outline
1 Ideal Point Review
2 Hierarchical Ideal Point Topic Model
3 Predicting Membership
4 How They Vote
5 How They Talk
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 25 of 29 | | How They Talk
Framing Healthcare
Health obamacar, patient, doctor, physician, afford_care, hospit, insur, replac, mandat, exchang, health_insur, coverag, medicaid, patient_protect, board
Frame H1 Frame H2 Frame H2 afford_care, exchang, patient, doctor, physician, hospit, obamacar, replac, mandat, insur, patient_protect, human_servic, medicaid, board, georgia, health_insur, coverag, social_secur, public_health, slush_fund, ppaca, save_medicar, nurs, tennesse, premium, repeal_obamacar, entitl, mandatori, mandatori_spend, page, bureaucrat, advisori_board, govern_takeov, purchas, governor, hospit, health_center, medicin, independ_payment unconstitut, preexist_condit, employ flexibl, teach_health, unlimit
-.13 .04 .56
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 26 of 29 | | How They Talk
Framing Macroeconomics
Macroeconomics balanc_budget, borrow, debt_ceil, cap, cut_spend, nation_debt, grandchildren, rais_tax, entitl
Frame M1 Frame M2 Frame M2 white_hous, shut, balanc_budget, debt_ceil, borrow, nation_debt, continu_resolut, mess, cap, cut_spend, debt_limit, rais_tax, entitl, prosper, hous_republican, novemb, spend_cut, fiscal_hous, chart, grandchildren, govern_shutdown grandchildren, guarante spend_monei, size, gdp
-.57 -.24 .56
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 27 of 29 | | How They Talk
Polarization
Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance; Not Issues, Civil Liberties Transportation
Macroeconomics; Health; Public Lands and Government Water Management Issue Frames Issue Distribution of Distribution
Polarized Operations
YES NO
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk
Polarization
Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance Not Issues, Civil Liberties Transportation
Macroeconomics; Health; Public Lands and Government Water Management Issue Frames Issue Distribution of Distribution
Polarized Operations
YES NO
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk
Polarization
Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance Not Issues, Civil Liberties Transportation
Macroeconomics; Health; Public Lands Government and Water Management Issue Frames Issue Distribution of Distribution
Polarized Operations
YES NO
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk
Polarization
Ideal Point Not Polarized Distributions Civil Rights, Minority Banking and Finance Not Issues, Civil Liberties Transportation
Macroeconomics; Health; Public Lands and Government Water Management Issue Frames Issue Distribution of Distribution
Polarized Operations
YES NO
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 28 of 29 | | How They Talk
Hierarchies are Cool
For a sweep of single parameter, BNP not that useful • Complex structures are more realistic applications • Combining with supervised objective • Unsolved problem: good prediction with interpretable structure •
Advanced Machine Learning for NLP Boyd-Graber Supervised Topic Models 29 of 29 | |