Learning to Extract International Relations from News Text

Presenter: Brendan O'Connor, Carnegie Mellon University. Joint work with Brandon Stewart (political scientist at Harvard) and Noah Smith (CMU). Forthcoming at ACL 2013. See more: http://brenocon.com/irevents

Event data in international relations
What are the causes of war and peace? Do democracies engage in fewer wars? Why do some crises spiral into conflict, while others are resolved peacefully? Can we forecast future conflicts? To help answer these questions, political scientists use event data: historical datasets of friendly and hostile interactions between countries, as reported in news articles. How can we extract this structured information from millions of news articles?

(Figure: visualization of GDELT data for the Syria conflict; GDELT extracts events from news using a knowledge engineering approach. http://gdelt.utdallas.edu)

Previous work: knowledge engineering
Besides manual coding (which is too labor-intensive at scale), previous work in political science uses a knowledge engineering approach: a manually defined ontology of event types, plus 15,000 textual patterns to identify events. This took decades of knowledge engineering to construct, is very difficult to maintain, and must be completely rebuilt for new domains (e.g., domestic politics, commercial news, literature). We seek to automate some of this process: from the textual data, is it possible to automatically learn the semantic event types and extract meaningful real-world political dynamics?
Our approach: learning both event types and political dynamics

Data: 6.5 million news articles, 1987-2008. Example sentence: "Pakistan promptly accused India" [AP, 1/1/2000]. We preprocess with syntactic parsing and named entity identification.

Qualitative evaluation: a case study of Israeli-Palestinian relations. These are inferences from our model: the left panels show event-class probability time series; the right panels list verbs for each event class.

"Diplomacy" verbs: arrive in, visit, meet with, travel to, leave, hold with, meet, meet in, fly to, be in, arrive for, talk with, say in, arrive with, head to, hold in, due in, leave for, make to, arrive to, praise. Another discovered cluster: meet with, sign with, praise, say with, arrive in, host, tell, welcome, join, thank.
(Figure: Israeli-Palestinian Diplomacy, 1994-2007. Annotated events: A: Israel-Jordan Peace Treaty; B: Hebron Protocol; C: U.S. Calls for West Bank Withdrawal; D: Deadlines for Wye River Peace Accord; E: Negotiations in Mecca; F: Annapolis Conference.)

"Verbal conflict" verbs: accuse, criticize, reject, tell, blame, blame on, say, break with, sever with, warn, call, attack, rule with, charge, hand to, ask, detain, come from, say←ccomp, suspect, release, order, slam, accuse government←poss, accuse agency←poss, identify.
(Figure: Police Actions and Crime Response, 1994-2007. Annotated events: A: Series of Suicide Attacks in Jerusalem; B: Island of Peace Massacre; C: Arrests over Protests; D: Tensions over Treatment of Pal. Prisoners; E: Passover Massacre; F: 400-Person Prisoner Swap; G: Gaza Street Bus Bombing; H: Stage Club Bombing; I: House to House Sweep for 7 Militant Leaders; J: Major Prisoner Release.)
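The per-dyad time series behind these plots come from grouping extracted tuples by (source, receiver, week). A minimal sketch of that bucketing, with an illustrative epoch date and field layout (both assumptions, not the paper's exact setup):

```python
from collections import Counter, defaultdict
from datetime import date

def week_index(d, epoch=date(1987, 1, 1)):
    # timestep t = number of whole weeks since the (illustrative) epoch
    return (d - epoch).days // 7

def dyad_series(tuples):
    """Group (source, receiver, date, predicate_path) tuples into
    per-(dyad, week) counts of predicate paths."""
    series = defaultdict(Counter)
    for src, rec, d, path in tuples:
        series[(src, rec, week_index(d))][path] += 1
    return series

# the AP example sentence as one extracted tuple
tuples = [("PAK", "IND", date(2000, 1, 1), "accuse(subj=Src,dobj=Rec)")]
series = dyad_series(tuples)
```

Each (source, receiver, week) cell then feeds the model as a bag of predicate paths for that dyad-timestep.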
Textual event tuples
Each extracted tuple consists of a Source entity (subject), a Receiver entity (object), a Timestep, and a Predicate path (verb). For the example sentence above: Source (subject): PAK; Receiver (object): IND; Timestep (week): 268; Predicate path (verb): accuse(subj=Src, dobj=Rec).

From these tuples we learn a Bayesian latent variable model which simultaneously discovers:
- Event types (φ): an event type is a (soft) cluster of verbs.
- Dyadic relations (θ): every pair of countries has a time series of event-type probabilities.
The clusters shown on this poster are examples discovered by our model.

"Material conflict" verbs: kill in, have troops in, die in, be in, wound in, have soldier in, hold in, attack in, remain in, detain in, have in, capture in, stay in, troops←pobj in, have troops←partmod station in, station in, injure in, invade, shoot in, kill, fire at, enter, attack, raid, strike, move, pound, bomb, impose, seal, capture, seize, arrest, ease, close, deport, put, release.
(Figure: Israeli Use of Force Tradeoff, 1994-2007. Annotated events: Oslo II Signed; Second Intifada Begins.)

TABARI lexicon matching: two additional notes.
(1) A number of patterns in the TABARI lexicon had multiple conflicting codes; see verbdict/conflicting codes.txt.
(2) As described in the paper, the dependency paths are traversed from source to receiver, creating the corresponding word sequence. Prepositions are un-collapsed and put into the sequence. There is special handling of xcomp's, which sometimes represent an infinitival 'to' and sometimes do not; we generate two versions, with and without 'to', and if either one matches a TABARI pattern, that counts as a match.
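The traversal-and-matching step for xcomp's can be sketched as follows (a minimal sketch; the edge representation and pattern set here are hypothetical, and the real matcher is verbdict/match.py):

```python
def path_to_words(edges):
    """Walk the dependency path from source to receiver, emitting the
    word sequence; un-collapsed prepositions stay in the sequence.
    Each edge is (word, dependency_relation)."""
    words = []
    for word, relation in edges:
        if relation == "xcomp":
            words.append("<XCOMP>")  # may or may not stand for 'to'
        words.append(word)
    return words

def match_tabari(words, patterns):
    """Expand every <XCOMP> marker two ways, with and without an
    infinitival 'to'; the tuple matches if either variant is a pattern."""
    variants = [[]]
    for w in words:
        if w == "<XCOMP>":
            variants = [v + ["to"] for v in variants] + variants
        else:
            variants = [v + [w] for v in variants]
    return any(" ".join(v) in patterns for v in variants)

# hypothetical pattern set; the real lexicon has ~15,000 patterns
patterns = {"promise to help"}
edges = [("promise", "root"), ("help", "xcomp")]
print(match_tabari(path_to_words(edges), patterns))  # True
```

Because both readings are generated, a lexicon entry written either as "promise to help" or as "promise help" would match this path.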
The implementation is in verbdict/match.py.

Model
The key assumption is dyadic and temporal coherence: a pair of countries tends to have similar event types during one time period. This causes an event type's verb clusters to reflect real-world co-occurrences, which are often semantically meaningful. Mathematically, this is encoded as a logistic normal admixture model (i.e., a type of "topic model"). Indices: s = source entity, r = receiver entity, t = timestep, k = frame, i = event tuple; each tuple i has a frame indicator z_i and a predicate path w_i.

Context model (smoothed frames):
  τ² ~ InvGamma
  σ_k² ~ InvGamma
  α_k ~ Normal (diffuse)
  β_{k,s,r,1} ~ N(0, 100)
  β_{k,s,r,t} ~ N(β_{k,s,r,t−1}, τ²)   for t > 1
  η_{k,s,r,t} ~ N(α_k + β_{k,s,r,t}, σ_k²)
  θ_{s,r,t} = Softmax(η_{s,r,t,*})

Language model:
  b ~ ImproperUniform
  φ_k ~ Dir(b/V)
  z_i ~ θ_{s,r,t}
  w_i ~ φ_{z_i}

Training is with blocked Gibbs sampling (a Markov chain Monte Carlo algorithm).

Quantitative evaluations
Does the automatic ontology match one designed by experts? We compare the learned verb clusters to the manually defined clusters in previous work (TABARI), measuring cluster impurity (lower is better) as the number of frames K varies over 2, 3, 4, 5, 10, 20, 50, 100, for the Null, Vanilla, Smoothed, and Context models.

Does the model predict conflict? We use the model's inferred political dynamics to predict whether a conflict is happening between two countries, as defined by the Militarized Interstate Dispute dataset.

Conclusions
Our novel method simultaneously (1) extracts a database of political events, (2) infers latent sociopolitical context, and (3) organizes insightful summaries of large and high-dimensional textual data. Next steps include semi-supervised methods to exploit previously built knowledge bases (which will greatly help political science researchers), the incorporation of temporal and location information into textual analysis, and the discovery of new actors and their properties.
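As a toy illustration of the logistic-normal machinery, the generative steps for one dyad can be simulated in a few lines (made-up K = 3 parameters; pure Python, not the paper's implementation):

```python
import math
import random

def softmax(eta):
    # theta_{s,r,t} = Softmax(eta_{s,r,t,*})
    m = max(eta)
    exps = [math.exp(x - m) for x in eta]
    total = sum(exps)
    return [e / total for e in exps]

def step(alpha, beta_prev, tau2, sigma2, rng):
    """One timestep for a single dyad: a random-walk step on beta,
    a noisy eta around alpha + beta, then softmax to theta."""
    beta = [b + rng.gauss(0.0, math.sqrt(tau2)) for b in beta_prev]
    eta = [a + b + rng.gauss(0.0, math.sqrt(s2))
           for a, b, s2 in zip(alpha, beta, sigma2)]
    return beta, softmax(eta)

rng = random.Random(0)
alpha = [0.0, 1.0, -1.0]      # toy corpus-wide frame weights (K = 3)
beta = [0.0, 0.0, 0.0]        # dyad-specific deviations at t = 1
tau2, sigma2 = 0.1, [0.5, 0.5, 0.5]
for t in range(3):
    beta, theta = step(alpha, beta, tau2, sigma2, rng)
# theta is this dyad's distribution over the K frames at the last step
```

The random walk on β is what gives frames temporal smoothness: a dyad's frame probabilities drift rather than jump between adjacent weeks.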
Inference
The blocked Gibbs sampler proceeds on the following groups of variables; the conditionals implicitly also condition on w, s, r, t.
• Context (politics) submodel:
  – [α | η, β, σ²]: exact
  – [β | η, α, σ²]: exact, via the FFBS (forward filtering, backward sampling) algorithm
• Context/language bridge:
  – [η | β, α, z]: Laplace-approximation Metropolis-within-Gibbs step
• Language submodel:
  – [z | η]: exact, collapsing out φ

(Figure: conflict-prediction AUC, higher is better, on the Militarized Interstate Dispute dataset, for the Log. Reg, Vanilla, and Smoothed models as the number of frames K varies from 2 to 100.)
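The [z | η] update with φ collapsed out can be sketched as a standard collapsed Gibbs step (a hypothetical minimal version; `nkw` and `nk` are per-frame word and total counts, and b/V is the symmetric Dirichlet prior from the language model):

```python
import random

def resample_z(i, z, w, theta, nkw, nk, b, V, rng):
    """Collapsed Gibbs update for one tuple's frame indicator:
    p(z_i = k | rest) is proportional to
    theta_k * (n_{k,w_i} + b/V) / (n_k + b)."""
    k_old, word = z[i], w[i]
    nkw[k_old][word] -= 1            # remove tuple i from the counts
    nk[k_old] -= 1
    probs = [theta[k] * (nkw[k].get(word, 0) + b / V) / (nk[k] + b)
             for k in range(len(theta))]
    u = rng.random() * sum(probs)
    acc = 0.0
    for k, p in enumerate(probs):
        acc += p
        if u <= acc:
            break
    z[i] = k                         # add tuple i back under the new frame
    nkw[k][word] = nkw[k].get(word, 0) + 1
    nk[k] += 1

# toy state: 3 tuples, K = 2 frames, vocabulary size V = 10
rng = random.Random(1)
z = [0, 0, 1]
w = ["kill", "meet", "kill"]
nkw = [{"kill": 1, "meet": 1}, {"kill": 1}]
nk = [2, 1]
resample_z(0, z, w, [0.5, 0.5], nkw, nk, 0.5, 10, rng)
```

Because φ is integrated out analytically, the counts themselves carry the language model, and only z, η, β, and α need to be sampled.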