Learning to Extract International Relations from News Text
Brendan O'Connor, Machine Learning Department, Carnegie Mellon University
Presentation: NSF SOCS Doctoral Symposium, June 27, 2013
Forthcoming at ACL 2013. Joint work with Brandon Stewart (political science, Harvard) and Noah Smith (CMU).
Paper and other information at: http://brenocon.com/irevents/

Computational Social Science
• Computation as a tool for social science applications
• Example: the 1890 Census tabulator, which solved the 1880s data deluge

Automated Text Analysis
• Textual media: news, books, articles, internet, messages...
• Automated content analysis: tools for discovery and measurement of concepts, attitudes, and events
• Natural language processing, information retrieval, data mining, and machine learning as quantitative social science methodology

International Relations Event Data
• Event data extracted from news text
• See http://gdelt.utdallas.edu

Previous work: knowledge engineering approach
• Open-source TABARI software and ontology/patterns: ~15,000 verb patterns, ~200 event classes (Schrodt 1994-2012; the ontology goes back to the 1960s)
• Example event types: 03 EXPRESS INTENT TO COOPERATE; 07 PROVIDE AID; 15 EXHIBIT MILITARY POSTURE; 191 Impose blockade, restrict movement
• Verb patterns per event type, e.g. for 191: not_ allow to_ enter; barred travel; block traffic from; block road
• Events are extracted from news text by matching these patterns
• Issues: 1. hard to maintain and adapt to new domains; 2. precision is low (Boschee et al. 2013)

Our approach
• Joint learning for a high-level summary of event timelines
• 1. Automatically learn the verb ontology
• 2. Extract events / political dynamics
• Social context drives unsupervised learning about language

Newswire entity/predicate data
• 6.5 million news articles, 1987-2008
• Focus on events between two actors: (SourceEntity, ReceiverEntity, Time, w_predpath)
• "Pakistan promptly accused India" [1/1/2000] => (PAK, IND, 268, SRC -nsubj> accuse <dobj- REC)
• Named entities: identified with a dictionary of country names
• Predicate paths: the verb dominates the Source in subject position; the Receiver most commonly appears in direct-object or prepositional-object constructions (some others too)
• (A toy extraction sketch follows below.)
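As a rough illustration of the tuple-extraction step, here is a minimal, self-contained Python sketch. It is not the paper's pipeline (which parses with Stanford CoreNLP and uses a full country-name dictionary plus path filters); the hard-coded parse, the two-entry dictionary, and the `extract_event_tuples` helper are hypothetical stand-ins.

```python
# Hypothetical sketch of dyadic event-tuple extraction. The real pipeline
# parses with Stanford CoreNLP; here the dependency parse is hard-coded so
# the sketch is self-contained.
# Token = (index, word, lemma, head index, dependency label).

COUNTRY_CODES = {"Pakistan": "PAK", "India": "IND"}  # toy entity dictionary

# Dependency parse of "Pakistan promptly accused India" (root head = -1).
tokens = [
    (0, "Pakistan", "pakistan", 2, "nsubj"),
    (1, "promptly", "promptly", 2, "advmod"),
    (2, "accused",  "accuse",  -1, "root"),
    (3, "India",    "india",    2, "dobj"),
]

def extract_event_tuples(tokens, timestep):
    """Yield (source, receiver, timestep, predpath) for verbs whose
    subject and direct object both match the entity dictionary."""
    for idx, word, lemma, head, dep in tokens:
        if dep != "root":          # toy assumption: the main verb is the root
            continue
        args = {d: w for _, w, _, h, d in tokens if h == idx}
        src = COUNTRY_CODES.get(args.get("nsubj", ""))
        rec = COUNTRY_CODES.get(args.get("dobj", ""))
        if src and rec:
            yield (src, rec, timestep, f"SRC -nsubj> {lemma} <dobj- REC")

print(list(extract_event_tuples(tokens, 268)))
# [('PAK', 'IND', 268, 'SRC -nsubj> accuse <dobj- REC')]
```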
Newswire entity/predicate data
• Very rare to see parsers in text-as-data studies: parsers are slow, hard to use, and make errors
• Entities appear as noun phrases, events as verbs and their arguments; raw co-occurrence has low precision
• Preprocess with Stanford CoreNLP for part-of-speech tags and syntactic dependencies
• Filters for topics, factivity, verb-y paths, and parse quality
• Makes unsupervised learning easier: verb-argument information decides which words represent the event
[Figure: example constituency parse of "The cat in the hat saw him with a telescope"]

Vanilla Model
• Independent contexts: frame learning from verb co-occurrence within contexts
• [Figure 1: directed probabilistic diagram of the model for one (s,r,t) dyad-time context, for the smoothed model. s = Source entity, r = Receiver entity, t = timestep; the frame prior $\eta_{k,s,r,t}$ encodes this context's overall prevalence of each frame (event class); each event tuple $i$ gets a frame $z$ and a predicate path $w$ drawn from $\phi_z$.]
• Context model $P(\text{EventType} \mid \text{Context})$: generates a frame prior $\theta_{s,r,t}$ for every context $(s,r,t)$. The simplest is the vanilla (V) context model:
  - For each frame $k$, draw global parameters from diffuse priors: prevalence $\alpha_k$ and variability $\sigma_k^2$.
  - For each $(s,r,t)$, draw $\eta_{k,s,r,t} \sim N(\alpha_k, \sigma_k^2)$ for each frame $k$, then apply a softmax transform:
    $$\theta_{k,s,r,t} = \frac{\exp \eta_{k,s,r,t}}{\sum_{k'=1}^{K} \exp \eta_{k',s,r,t}}$$
• Thus the vector $\eta_{\cdot,s,r,t}$ encodes the relative log-odds of the different frames for events appearing in the context $(s,r,t)$. This simple logistic normal prior is, in topic-model terms, analogous to the asymmetric Dirichlet prior version of LDA in Wallach et al. (2009), since $\alpha_k$ can learn that some frames tend to be more likely than others; the variance parameter $\sigma_k^2$ controls admixture sparsity, analogous to a Dirichlet concentration parameter.
• Language model $P(\text{Text} \mid \text{EventType})$:
  - Draw a lexical sparsity parameter $b$ from a diffuse prior (see §4).
  - For each frame $k$, draw a multinomial distribution over dependency paths, $\phi_k \sim \mathrm{Dir}(b)$.
  - For each $(s,r,t)$ and every event tuple $i$ in that context, sample its frame $z^{(i)} \sim \mathrm{Mult}(\theta_{s,r,t})$, then its predicate realization $w^{(i)}_{predpath} \sim \mathrm{Mult}(\phi_{z^{(i)}})$.
• The language model is thus very similar to a topic model's generation of token topics and wordtypes.
• Training: blocked Gibbs sampling (Markov chain Monte Carlo); a forward-simulation sketch of the generative story follows below.
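To make the generative story concrete, here is a minimal NumPy sketch that samples one context slice under the vanilla model. The sizes K and V, the hyperparameter values, and the `sample_context` helper are assumptions for illustration; in the paper the hyperparameters come from diffuse priors and the model is fit by blocked Gibbs sampling rather than forward simulation.

```python
# Minimal forward simulation of the vanilla (V) generative story,
# with assumed toy sizes and hyperparameter values.
import numpy as np

rng = np.random.default_rng(0)
K, V = 4, 50                      # frames, dependency-path vocabulary size
alpha = rng.normal(0.0, 1.0, K)   # per-frame prevalence (really a diffuse prior)
sigma = np.ones(K)                # per-frame variability: admixture sparsity
b = 0.1                           # lexical sparsity for Dir(b)
phi = rng.dirichlet(np.full(V, b), size=K)   # phi_k: paths | frame k

def sample_context(n_events):
    """Sample one (s, r, t) slice: frame prior theta, then event tuples."""
    eta = rng.normal(alpha, sigma)            # eta_k ~ N(alpha_k, sigma_k^2)
    theta = np.exp(eta) / np.exp(eta).sum()   # softmax -> frame prior
    z = rng.choice(K, size=n_events, p=theta) # frame of each event tuple
    w = np.array([rng.choice(V, p=phi[k]) for k in z])  # predicate paths
    return theta, z, w

theta, z, w = sample_context(n_events=10)
```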
Smoothed Model
• A linear dynamical system (random walk) replaces the vanilla model's independent contexts
• Motivation (Smoothing Frames Across Time, §3.1): the vanilla model can induce frames through dependency-path co-occurrences when multiple events occur in a given context, but many dyad-time slices are very sparse. Of the 739 weeks in the dataset, the most prevalent dyad (ISR-PSE) has a nonzero event count in 525 of them, and only 104 directed dyads have more than 25 nonzero weeks.
• One solution is to increase the bucket size (e.g., to months); however, previous work in political science has demonstrated that answering questions of interest about reciprocity dynamics requires recovering events at weekly or even daily granularity (Shellman, 2004), and in any case wide buckets help only so much for dyads with fewer events or less media attention.
• We therefore propose the smoothed frames (SF) model, in which the frame distribution for a given dyad comes from a latent parameter $\beta_{\cdot,s,r,t}$ that varies smoothly over time. For each $(s,r)$, draw the first timestep's values as $\beta_{k,s,r,1} \sim N(0, 100)$, and for each context $(s,r,t>1)$:
  $$\beta_{k,s,r,t} \sim N(\beta_{k,s,r,t-1}, \tau^2), \qquad \eta_{k,s,r,t} \sim N(\alpha_k + \beta_{k,s,r,t}, \sigma_k^2)$$
• Other parameters ($\alpha_k$, $\sigma_k^2$) are the same as in the vanilla model. The model assumes a random-walk process on $\beta$, a variable which exists even for contexts that contain no events, so inferences about $\eta$ are smoothed according to event data at nearby timesteps.
• This is an instance of a linear Gaussian state-space model (also known as a linear dynamical system or dynamic linear model), a convenient formulation because it has well-known exact inference algorithms. This parameterization of $\eta$ is related to one of the topic models proposed in Blei and Lafferty (2006), though at a different structural level and with a different inference technique (§4). (A simulation sketch of this prior follows below.)
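A short NumPy sketch of the smoothed-frames prior for a single dyad, simulating the random walk forward; the value of τ and the other settings are assumptions, analogous to the vanilla sketch above. Real inference runs the other direction, smoothing β from observed events with exact linear-Gaussian (Kalman-style) algorithms.

```python
# Forward simulation of the smoothed-frames (SF) prior for one (s, r) dyad,
# under assumed hyperparameter values.
import numpy as np

rng = np.random.default_rng(1)
K, T = 4, 739                     # frames, weekly timesteps
alpha = rng.normal(0.0, 1.0, K)   # per-frame prevalence
sigma, tau = 1.0, 0.25            # emission and random-walk scales (assumed)

beta = np.empty((T, K))
beta[0] = rng.normal(0.0, 10.0, K)          # beta_1 ~ N(0, 100): sd = 10
for t in range(1, T):
    beta[t] = rng.normal(beta[t - 1], tau)  # random walk on beta

eta = rng.normal(alpha + beta, sigma)       # eta_t ~ N(alpha_k + beta_t, sigma^2)
theta = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)  # weekly frame priors
```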
