Semantic Role Labeling

Introduction: Applications of Semantic Role Labeling
Question answering systems: Who did what to whom, where?

The police officer detained the suspect at the scene of the crime

ARG0/Agent: the police officer; V: detained; ARG2/Theme: the suspect; AM-LOC/Location: at the scene of the crime

Can we figure out that these have the same meaning?

XYZ corporation bought the stock.
They sold the stock to XYZ corporation.
The stock was bought by XYZ corporation.
The purchase of the stock by XYZ corporation...
The stock purchase by XYZ corporation...

A Shallow Semantic Representation: Semantic Roles

Predicates (bought, sold, purchase) represent an event; semantic roles express the abstract role that the arguments of a predicate can take in the event.

More specific ←→ More general
buyer → agent → proto-agent

Semantic Role Labeling

Semantic Roles

Figure 22.1: Some commonly used thematic roles with their definitions.
AGENT: The volitional causer of an event
EXPERIENCER: The experiencer of an event
FORCE: The non-volitional causer of the event
THEME: The participant most directly affected by an event
RESULT: The end product of an event
CONTENT: The proposition or content of a propositional event
INSTRUMENT: An instrument used in an event
BENEFICIARY: The beneficiary of an event
SOURCE: The origin of the object of a transfer event
GOAL: The destination of an object of a transfer event

Getting to semantic roles
(22.1) Sasha broke the window.
(22.2) Pat opened the door.
A neo-Davidsonian event representation of these two sentences would be:

∃e,x,y Breaking(e) ∧ Breaker(e, Sasha) ∧ BrokenThing(e, y) ∧ Window(y)    (Sasha broke the window)
∃e,x,y Opening(e) ∧ Opener(e, Pat) ∧ OpenedThing(e, y) ∧ Door(y)    (Pat opened the door)
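To make the representation concrete, here is a minimal Python sketch (not from the chapter) that stores each neo-Davidsonian analysis as a set of predications over an event variable and answers a simple "who did it" query; the predicate and role names follow the formulas above, and the data layout is an illustrative assumption.

```python
# A neo-Davidsonian analysis as a set of predications over an event variable e.
# Unary predications are (Predicate, e); binary ones are (Role, e, filler).
breaking = {("Breaking", "e1"), ("Breaker", "e1", "Sasha"),
            ("BrokenThing", "e1", "y1"), ("Window", "y1")}
opening = {("Opening", "e2"), ("Opener", "e2", "Pat"),
           ("OpenedThing", "e2", "y2"), ("Door", "y2")}

def fillers(analysis, role):
    """Return the fillers of a deep role such as 'Breaker' or 'OpenedThing'."""
    return [pred[2] for pred in analysis if len(pred) == 3 and pred[0] == role]

print(fillers(breaking, "Breaker"))       # ['Sasha']
print(fillers(opening, "OpenedThing"))    # ['y2']  (the entity that is a Door)
```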

In this representation, the roles of the subjects of the verbs break and open are deep roles, Breaker and Opener respectively. These deep roles are specific to each event; Breaking events have Breakers, Opening events have Openers, and so on. If we are going to be able to answer questions, perform inferences, or do any further kinds of natural language understanding of these events, we'll need to know a little more about the semantics of these arguments. Breakers and Openers have something in common. They are both volitional actors, often animate, and they have direct causal responsibility for their events.

Thematic roles are a way to capture this semantic commonality between Breakers and Openers. We say that the subjects of both these verbs are agents. Thus, AGENT is the thematic role that represents an abstract idea such as volitional causation. Similarly, the direct objects of both these verbs, the BrokenThing and OpenedThing, are both prototypically inanimate objects that are affected in some way by the action. The semantic role for these participants is theme.

Thematic roles are one of the oldest linguistic models, proposed first by the Indian grammarian Panini sometime between the 7th and 4th centuries BCE. Their modern formulation is due to Fillmore (1968) and Gruber (1965). Although there is no universally agreed-upon set of roles, Figs. 22.1 and 22.2 list some thematic roles that have been used in various computational papers, together with rough definitions and examples. Most thematic role sets have about a dozen roles, but we'll see sets with smaller numbers of roles with even more abstract meanings, and sets with very large numbers of roles that are specific to situations. We'll use the general term semantic roles for all sets of roles, whether small or large.

• Breaker and Opener have something in common!
  • Volitional actors
  • Often animate
  • Direct causal responsibility for their events
• Thematic roles are a way to capture this semantic commonality between Breakers and Openers.
• They are both AGENTS.
• The BrokenThing and OpenedThing are THEMES:
  • prototypically inanimate objects affected in some way by the action

Thematic roles

• One of the oldest linguistic models • Indian grammarian Panini between the 7th and 4th centuries BCE • Modern formulation from Fillmore (1966,1968), Gruber (1965) • Fillmore influenced by Lucien Tesnière’s (1959) Éléments de Syntaxe Structurale, the book that introduced dependency grammar • Fillmore first referred to roles as actants (Fillmore, 1966) but switched to the term case

Thematic roles

A typical set:
AGENT: The volitional causer of an event. Example: The waiter spilled the soup.
EXPERIENCER: The experiencer of an event. Example: John has a headache.
FORCE: The non-volitional causer of the event. Example: The wind blows debris from the mall into our yards.
THEME: The participant most directly affected by an event. Example: Only after Benjamin Franklin broke the ice...
RESULT: The end product of an event. Example: The city built a regulation-size baseball diamond...
CONTENT: The proposition or content of a propositional event. Example: Mona asked "You met Mary Ann at a supermarket?"
INSTRUMENT: An instrument used in an event. Example: He poached catfish, stunning them with a shocking device...
BENEFICIARY: The beneficiary of an event. Example: Whenever Ann Callahan makes hotel reservations for her boss...
SOURCE: The origin of the object of a transfer event. Example: I flew in from Boston.
GOAL: The destination of an object of a transfer event. Example: I drove to Portland.
(Figures 22.1 and 22.2: some commonly used thematic roles with their definitions and prototypical examples.)

22.2 Diathesis Alternations

The main reason computational systems use semantic roles is to act as a shallow meaning representation that can let us make simple inferences that aren't possible from the pure surface string of words, or even from the parse tree. To extend the earlier examples, if a document says that Company A acquired Company B, we'd like to know that this answers the query Was Company B acquired? despite the fact that the two sentences have very different surface syntax. Similarly, this shallow semantics might act as a useful intermediate language in machine translation.

Semantic roles thus help generalize over different surface realizations of predicate arguments. For example, while the AGENT is often realized as the subject of the sentence, in other cases the THEME can be the subject. Consider these possible realizations of the thematic arguments of the verb break:

(22.3) [AGENT John] broke [THEME the window].
(22.4) [AGENT John] broke [THEME the window] [INSTRUMENT with a rock].
(22.5) [INSTRUMENT The rock] broke [THEME the window].
(22.6) [THEME The window] broke.
(22.7) [THEME The window] was broken [AGENT by John].

These examples suggest that break has (at least) the possible arguments AGENT, THEME, and INSTRUMENT. The set of thematic role arguments taken by a verb is often called the thematic grid, θ-grid, or case frame. We can see that there are (among others) the following possibilities for the realization of these arguments of break:

AGENT/Subject, THEME/Object
AGENT/Subject, THEME/Object, INSTRUMENT/PP-with
INSTRUMENT/Subject, THEME/Object
THEME/Subject

It turns out that many verbs allow their thematic roles to be realized in various syntactic positions. For example, verbs like give can realize the THEME and GOAL arguments in two different ways:

(22.8) a. [AGENT Doris] gave [THEME the book] [GOAL to Cary].

       b. [AGENT Doris] gave [GOAL Cary] [THEME the book].

Summary: break allows AGENT, INSTRUMENT, or THEME as subject; give realizes its THEME and GOAL in either order.

These multiple argument structure realizations (the fact that break can take AGENT, INSTRUMENT, or THEME as subject, and give can realize its THEME and GOAL in either order) are called verb alternations or diathesis alternations. The alternation we showed above for give, the dative alternation, seems to occur with particular semantic classes of verbs, including "verbs of future having" (advance, allocate, offer, owe), "send verbs" (forward, hand, mail), "verbs of throwing" (kick, pass, throw), and so on. Levin (1993) lists for 3100 English verbs the semantic classes to which they belong (47 high-level "Levin classes", divided into 193 more specific classes) and the various alternations in which they participate. These lists of verb classes have been incorporated into the online resource VerbNet (Kipper et al., 2000), which links each verb to both WordNet and FrameNet entries.
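As a rough illustration of how a thematic grid and its alternations might be stored, here is a small Python sketch; the verb entries and pattern names are illustrative stand-ins, not taken from VerbNet.

```python
# Thematic grids: the roles a verb can take, plus the syntactic realization
# patterns (diathesis alternations) observed for those roles.
THEMATIC_GRIDS = {
    "break": {
        "roles": {"AGENT", "THEME", "INSTRUMENT"},
        "realizations": [
            {"subject": "AGENT", "object": "THEME"},
            {"subject": "AGENT", "object": "THEME", "pp_with": "INSTRUMENT"},
            {"subject": "INSTRUMENT", "object": "THEME"},
            {"subject": "THEME"},                        # "The window broke."
        ],
    },
    "give": {
        "roles": {"AGENT", "THEME", "GOAL"},
        "realizations": [                                # the dative alternation
            {"subject": "AGENT", "object": "THEME", "pp_to": "GOAL"},
            {"subject": "AGENT", "object1": "GOAL", "object2": "THEME"},
        ],
    },
}

def licensed(verb, realization):
    """Is this mapping from grammatical functions to roles licensed for the verb?"""
    return realization in THEMATIC_GRIDS[verb]["realizations"]

print(licensed("break", {"subject": "THEME"}))   # True  ("The window broke.")
print(licensed("give", {"subject": "THEME"}))    # False
```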

22.3 Semantic Roles: Problems with Thematic Roles

Representing meaning at the thematic role level seems like it should be useful in dealing with complications like diathesis alternations. Yet it has proved quite difficult to come up with a standard set of roles, and equally difficult to produce a formal definition of roles like AGENT, THEME, or INSTRUMENT.

For example, researchers attempting to define role sets often find they need to fragment a role like AGENT or THEME into many specific roles. Levin and Rappaport Hovav (2005) summarize a number of such cases, such as the fact that there seem to be at least two kinds of INSTRUMENTS, intermediary instruments that can appear as subjects and enabling instruments that cannot:

(22.9) a. The cook opened the jar with the new gadget.
       b. The new gadget opened the jar.
(22.10) a. Shelly ate the sliced banana with a fork.
        b. *The fork ate the sliced banana.

In addition to the fragmentation problem, there are cases in which we'd like to reason about and generalize across semantic roles, but the finite discrete lists of roles don't let us do this. Finally, it has proved difficult to formally define the thematic roles. Consider the AGENT role; most cases of AGENTS are animate, volitional, sentient, and causal, but any individual noun phrase might not exhibit all of these properties.

These problems have led to alternative semantic role models that use either many fewer or many more roles. The first of these options is to define generalized semantic roles that abstract over the specific thematic roles. For example, PROTO-AGENT and PROTO-PATIENT are generalized roles that express roughly agent-like and roughly patient-like meanings. These roles are defined, not by necessary and sufficient conditions, but rather by a set of heuristic features that accompany more agent-like or more patient-like meanings. Thus, the more an argument displays agent-like properties (being volitionally involved in the event, causing an event or a change of state in another participant, being sentient or intentionally involved, moving) the greater the likelihood that the argument can be labeled a PROTO-AGENT.

Problems with Thematic Roles
• Hard to create a standard set of roles or to formally define them
• Roles often need to be fragmented to be defined. Levin and Rappaport Hovav (2005): two kinds of INSTRUMENTS
  • intermediary instruments that can appear as subjects: The cook opened the jar with the new gadget. / The new gadget opened the jar.
  • enabling instruments that cannot: Shelly ate the sliced banana with a fork. / *The fork ate the sliced banana.

Alternatives to thematic roles

1. Fewer roles: generalized semantic roles, defined as prototypes (Dowty 1991): PROTO-AGENT, PROTO-PATIENT → PropBank
2. More roles: define roles specific to a group of predicates → FrameNet

Semantic Role Labeling

The Proposition Bank (PropBank)

• Palmer, Martha, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1):71–106

PropBank Roles: following Dowty (1991)

Proto-Agent
• Volitional involvement in the event or state
• Sentience (and/or perception)
• Causes an event or change of state in another participant
• Movement (relative to the position of another participant)

Proto-Patient
• Undergoes change of state
• Causally affected by another participant
• Stationary relative to movement of another participant
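A hedged sketch of the Dowty-style idea just listed: count how many proto-agent versus proto-patient properties an argument displays and use the counts to decide which proto-role it is more like. The property names mirror the bullets above; the scoring rule itself is an illustrative simplification, not PropBank's actual annotation procedure.

```python
PROTO_AGENT_PROPS = {"volitional", "sentient", "causes_change", "moves"}
PROTO_PATIENT_PROPS = {"changes_state", "causally_affected", "stationary"}

def proto_role(properties):
    """Label an argument PROTO-AGENT or PROTO-PATIENT by which property set it
    overlaps more (ties go to PROTO-PATIENT here, an arbitrary choice)."""
    agent_score = len(properties & PROTO_AGENT_PROPS)
    patient_score = len(properties & PROTO_PATIENT_PROPS)
    return "PROTO-AGENT" if agent_score > patient_score else "PROTO-PATIENT"

# "Sasha broke the window."
print(proto_role({"volitional", "sentient", "causes_change"}))   # PROTO-AGENT
print(proto_role({"changes_state", "causally_affected"}))        # PROTO-PATIENT
```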

PropBank Roles
• Following Dowty (1991)
• Role definitions determined verb by verb, with respect to the other roles
• Semantic roles in PropBank are thus verb-sense specific
• Each verb sense has numbered arguments: Arg0, Arg1, Arg2, ...
  Arg0: PROTO-AGENT
  Arg1: PROTO-PATIENT
  Arg2: usually benefactive, instrument, attribute, or end state
  Arg3: usually start point, benefactive, instrument, or attribute
  Arg4: the end point
(Arg2-Arg5 are not really that consistent, which causes a problem for labeling.)

The more patient-like the properties (undergoing change of state, causally affected by another participant, stationary relative to other participants, etc.), the greater the likelihood that the argument can be labeled a PROTO-PATIENT.

The second direction is instead to define semantic roles that are specific to a particular verb or a particular group of semantically related verbs or nouns. In the next two sections we describe two commonly used lexical resources that make use of these alternative versions of semantic roles. PropBank uses both proto-roles and verb-specific semantic roles. FrameNet uses semantic roles that are specific to a general semantic idea called a frame.

22.4 The Proposition Bank

The Proposition Bank, generally referred to as PropBank, is a resource of sentences annotated with semantic roles. The English PropBank labels all the sentences in the Penn TreeBank; the Chinese PropBank labels sentences in the Penn Chinese TreeBank. Because of the difficulty of defining a universal set of thematic roles, the semantic roles in PropBank are defined with respect to an individual verb sense. Each sense of each verb thus has a specific set of roles, which are given only numbers rather than names: Arg0, Arg1, Arg2, and so on. In general, Arg0 represents the PROTO-AGENT, and Arg1, the PROTO-PATIENT. The semantics of the other roles are less consistent, often being defined specifically for each verb. Nonetheless there are some generalizations; the Arg2 is often the benefactive, instrument, attribute, or end state, the Arg3 the start point, benefactive, instrument, or attribute, and the Arg4 the end point.

PropBank Frame Files

Here are some slightly simplified PropBank entries for one sense each of the verbs agree and fall. Such PropBank entries are called frame files; note that the definitions in the frame file for each role ("Other entity agreeing", "Extent, amount fallen") are informal glosses intended to be read by humans, rather than being formal definitions.

(22.11) agree.01
  Arg0: Agreer
  Arg1: Proposition
  Arg2: Other entity agreeing
  Ex1: [Arg0 The group] agreed [Arg1 it wouldn't make an offer].
  Ex2: [ArgM-TMP Usually] [Arg0 John] agrees [Arg2 with Mary] [Arg1 on everything].

(22.12) fall.01
  Arg1: Logical subject, patient, thing falling
  Arg2: Extent, amount fallen
  Arg3: start point
  Arg4: end point, end state of arg1
  Ex1: [Arg1 Sales] fell [Arg4 to $25 million] [Arg3 from $27 million].
  Ex2: [Arg1 The average junk bond] fell [Arg2 by 4.2%].

Note that there is no Arg0 role for fall, because the normal subject of fall is a PROTO-PATIENT.

Advantage of a PropBank Labeling

The PropBank semantic roles can be useful in recovering shallow semantic information about verbal arguments. Consider the verb increase:

(22.13) increase.01 "go up incrementally"
  Arg0: causer of increase
  Arg1: thing increasing
  Arg2: amount increased by, EXT, or MNR
  Arg3: start point
  Arg4: end point

A PropBank semantic role labeling would allow us to infer the commonality in the event structures of the following three examples, that is, that in each case Big Fruit Co. is the AGENT and the price of bananas is the THEME, despite the differing surface forms.

(22.14) [Arg0 Big Fruit Co.] increased [Arg1 the price of bananas].
(22.15) [Arg1 The price of bananas] was increased again [Arg0 by Big Fruit Co.]
(22.16) [Arg1 The price of bananas] increased [Arg2 5%].

Modifiers or adjuncts of the predicate: ArgM

PropBank also has a number of non-numbered arguments called ArgMs (ArgM-TMP, ArgM-LOC, etc.) which represent modification or adjunct meanings. These are relatively stable across predicates, so they aren't listed with each frame file. Data labeled with these modifiers can be helpful in training systems to detect temporal, location, or directional modification across predicates. Some of the ArgMs include:

  TMP      when?                   yesterday evening, now
  LOC      where?                  at the museum, in San Francisco
  DIR      where to/from?          down, to Bangkok
  MNR      how?                    clearly, with much enthusiasm
  PRP/CAU  why?                    because ..., in response to the ruling
  REC                              themselves, each other
  ADV      miscellaneous
  PRD      secondary predication   ...ate the meat raw

While PropBank focuses on verbs, a related project, NomBank (Meyers et al., 2004), adds annotations to noun predicates. For example the noun agreement in Apple's agreement with IBM would be labeled with Apple as the Arg0 and IBM as the Arg2. This allows semantic role labelers to assign labels to arguments of both verbal and nominal predicates.
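To show how verb-specific rolesets support the normalization illustrated in (22.14)-(22.16), here is a minimal Python sketch. The role glosses are copied from the increase.01 frame file above; the bracketed analyses are hand-coded for illustration rather than produced by a parser or labeler.

```python
# A tiny stand-in for PropBank frame files: roleset id -> role glosses.
FRAME_FILES = {
    "increase.01": {"Arg0": "causer of increase", "Arg1": "thing increasing",
                    "Arg2": "amount increased by", "Arg3": "start point",
                    "Arg4": "end point"},
}

# Hand-labeled analyses of (22.14)-(22.16): different surface forms, same roles.
sentences = [
    ("increase.01", {"Arg0": "Big Fruit Co.", "Arg1": "the price of bananas"}),
    ("increase.01", {"Arg1": "The price of bananas", "Arg0": "by Big Fruit Co."}),
    ("increase.01", {"Arg1": "The price of bananas", "Arg2": "5%"}),
]

for roleset, args in sentences:
    # Every analysis shares Arg1 (the thing increasing), no matter whether it
    # surfaces as subject or object in the sentence.
    arg1_gloss = FRAME_FILES[roleset]["Arg1"]
    print(roleset, "->", arg1_gloss, ":", args.get("Arg1"))
```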






PropBank: A TreeBanked Sentence (Martha Palmer 2013)

Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))

The same sentence, PropBanked (Martha Palmer 2013)

expect: Arg0 = Analysts; Arg1 = a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company
give:   Arg0 = *T*-1 (the trace of that, i.e., the GM-Jaguar pact); Arg2 = the U.S. car maker; Arg1 = an eventual 30% stake in the British company

These annotations correspond to the predicate-argument structures:
expect(Analysts, GM-Jaguar pact)
give(GM-Jaguar pact, US car maker, 30% stake)
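A small sketch of how the predicate-argument tuples above could be read off PropBank-style annotations. The annotations are written out by hand here (with the trace *T*-1 already resolved to its antecedent) rather than recovered from the Treebank parse, so this only illustrates the final bookkeeping step.

```python
# PropBank-style annotations: predicate lemma plus a role -> span mapping.
annotations = [
    {"predicate": "expect",
     "args": {"Arg0": "Analysts", "Arg1": "a GM-Jaguar pact"}},
    {"predicate": "give",   # *T*-1 resolved to "a GM-Jaguar pact"
     "args": {"Arg0": "a GM-Jaguar pact", "Arg2": "the U.S. car maker",
              "Arg1": "an eventual 30% stake in the British company"}},
]

def as_tuple(ann):
    """Render an annotation as predicate(Arg0, Arg1, Arg2, ...) in Arg order."""
    ordered = [ann["args"][k] for k in sorted(ann["args"])]
    return f"{ann['predicate']}({', '.join(ordered)})"

for ann in annotations:
    print(as_tuple(ann))
# expect(Analysts, a GM-Jaguar pact)
# give(a GM-Jaguar pact, an eventual 30% stake in the British company, the U.S. car maker)
```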

Annotated PropBank Data
Verb Frames Coverage by Language: current count of senses (lexical units), 2013

• Penn English TreeBank, OntoNotes 5.0: total ~2 million words
• Penn Chinese TreeBank
• Hindi/Urdu PropBank
• Arabic PropBank
• Only 111 English adjectives

Language   Final Count   Estimated Coverage in Running Text
English    10,615*       99%
Chinese    24,642        98%
Arabic     7,015         99%

From Martha Palmer 2013 Tutorial

English Noun and LVC annotation (plus nouns and light verbs)

• Example noun: decision
  Roleset: Arg0: decider, Arg1: decision ...
  "... [your]ARG0 [decision]REL [to say look I don't want to go through this anymore]ARG1"

• Example within a light verb construction (LVC): make a decision
  "... [the President]ARG0 [made]REL-LVB the [fundamentally correct]ARGM-ADJ [decision]REL [to get on offense]ARG1"

(Slide from Palmer 2013)

Semantic Role Labeling

FrameNet



22.5 FrameNet

Capturing descriptions of the same event by different nouns/verbs

While making inferences about the semantic commonalities across different sentences with increase is useful, it would be even more useful if we could make such inferences in many more situations, across different verbs, and also between verbs and nouns. For example, we'd like to extract the similarity among these three sentences:

(22.17) [Arg1 The price of bananas] increased [Arg2 5%].
(22.18) [Arg1 The price of bananas] rose [Arg2 5%].
(22.19) There has been a [Arg2 5%] rise [Arg1 in the price of bananas].

Note that the second example uses the different verb rise, and the third example uses the noun rather than the verb rise. We'd like a system to recognize that the price of bananas is what went up, and that 5% is the amount it went up, no matter whether the 5% appears as the object of the verb increased or as a nominal modifier of the noun rise.

FrameNet

• Baker et al. 1998, Fillmore et al. 2003, Fillmore and Baker 2009, Ruppenhofer et al. 2006
• Roles in PropBank are specific to a verb
• Roles in FrameNet are specific to a frame: a background knowledge structure that defines a set of frame-specific semantic roles, called frame elements
• A frame includes a set of predicates that use these roles
• Each word evokes a frame and profiles some aspect of the frame


The FrameNet project is another semantic-role-labeling project that attempts to address just these kinds of problems (Baker et al. 1998, Fillmore et al. 2003, Fillmore and Baker 2009, Ruppenhofer et al. 2006). Whereas roles in the PropBank project are specific to an individual verb, roles in the FrameNet project are specific to a frame.

What is a frame? Consider the following set of words:

reservation, flight, travel, buy, price, cost, fare, rates, meal, plane

There are many individual lexical relations of hyponymy, synonymy, and so on between many of the words in this list. The resulting set of relations does not, however, add up to a complete account of how these words are related. They are clearly all defined with respect to a coherent chunk of common-sense background information concerning air travel. We call the holistic background knowledge that unites these words a frame (Fillmore, 1985). The idea that groups of words are defined with respect to some background information is widespread in artificial intelligence and cognitive science, where besides frame we see related works like a model (Johnson-Laird, 1983), or even script (Schank and Abelson, 1977).

A frame in FrameNet is a background knowledge structure that defines a set of frame-specific semantic roles, called frame elements, and includes a set of predicates that use these roles. Each word evokes a frame and profiles some aspect of the frame and its elements. The FrameNet dataset includes a set of frames and frame elements, the lexical units associated with each frame, and a set of labeled example sentences.

The "Change position on a scale" Frame

This frame consists of words that indicate the change of an ITEM's position on a scale (the ATTRIBUTE) from a starting point (INITIAL VALUE) to an end point (FINAL VALUE).

Some of the semantic roles (frame elements) in the frame are defined as in Fig. 22.3. Note that these are separated into core roles, which are frame specific, and non-core roles, which are more like the ArgM arguments in PropBank, expressing more general properties of time, location, and so on. Here are some example sentences:

(22.20) [ITEM Oil] rose [ATTRIBUTE in price] [DIFFERENCE by 2%].
(22.21) [ITEM It] has increased [FINAL STATE to having them 1 day a month].
(22.22) [ITEM Microsoft shares] fell [FINAL VALUE to 7 5/8].

(22.23) [ITEM Colon cancer incidence] fell [DIFFERENCE by 50%] [GROUP among men].

(22.24) a steady increase [INITIAL VALUE from 9.5] [FINAL VALUE to 14.3] [ITEM in dividends]

(22.25) a [DIFFERENCE 5%] [ITEM dividend] increase...

Note from these example sentences that the frame includes target words like rise, fall, and increase. In fact, the complete frame consists of the following words:

The "Change position on a scale" Frame

Core Roles
ATTRIBUTE: The ATTRIBUTE is a scalar property that the ITEM possesses.
DIFFERENCE: The distance by which an ITEM changes its position on the scale.
FINAL STATE: A description that presents the ITEM's state after the change in the ATTRIBUTE's value as an independent predication.
FINAL VALUE: The position on the scale where the ITEM ends up.
INITIAL STATE: A description that presents the ITEM's state before the change in the ATTRIBUTE's value as an independent predication.
INITIAL VALUE: The initial position on the scale from which the ITEM moves away.
ITEM: The entity that has a position on the scale.
VALUE RANGE: A portion of the scale, typically identified by its end points, along which the values of the ATTRIBUTE fluctuate.

Some Non-Core Roles
DURATION: The length of time over which the change takes place.
SPEED: The rate of change of the VALUE.
GROUP: The GROUP in which an ITEM changes the value of an ATTRIBUTE in a specified way.

Figure 22.3: The frame elements in the change position on a scale frame from the FrameNet Labelers Guide (Ruppenhofer et al., 2006).

VERBS: advance, climb, decline, decrease, diminish, dip, double, drop, dwindle, edge, explode, fall, fluctuate, gain, grow, increase, jump, move, mushroom, plummet, reach, rise, rocket, shift, skyrocket, slide, soar, swell, swing, triple, tumble
NOUNS: decline, decrease, escalation, explosion, fall, fluctuation, gain, growth, hike, increase, rise, shift, tumble
ADVERBS: increasingly

FrameNet also codes relationships between frames, allowing frames to inherit from each other, or representing relations between frames like causation (and generalizations among frame elements in different frames can be represented by inheritance as well). Thus, there is a Cause change of position on a scale frame that is linked to the Change of position on a scale frame by the cause relation, but that adds an AGENT role and is used for causative examples such as the following:

(22.26) [AGENT They] raised [ITEM the price of their soda] [DIFFERENCE by 2%].

Together, these two frames would allow an understanding system to extract the common event semantics of all the verbal and nominal causative and non-causative usages. FrameNets have also been developed for many other languages including Spanish, German, Japanese, Portuguese, Italian, and Chinese.
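A minimal sketch of how this pair of frames and the causative relation between them might be represented. The frame-element and lexical-unit lists are abbreviated from the figures above, and the class layout is an illustrative assumption, not FrameNet's actual file format.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Frame:
    name: str
    core_elements: Set[str]
    noncore_elements: Set[str] = field(default_factory=set)
    lexical_units: Set[str] = field(default_factory=set)
    causative_of: Optional[str] = None   # one of FrameNet's frame-to-frame relations

change_pos = Frame(
    name="Change_position_on_a_scale",
    core_elements={"ITEM", "ATTRIBUTE", "DIFFERENCE", "INITIAL_VALUE", "FINAL_VALUE"},
    noncore_elements={"DURATION", "SPEED", "GROUP"},
    lexical_units={"rise.v", "fall.v", "increase.v", "increase.n", "climb.v"},
)

cause_change_pos = Frame(
    name="Cause_change_of_position_on_a_scale",
    core_elements=change_pos.core_elements | {"AGENT"},   # adds an Agent role
    lexical_units={"raise.v", "lower.v", "cut.v", "increase.v"},
    causative_of="Change_position_on_a_scale",
)

# (22.26) "They raised the price of their soda by 2%." uses the causative frame:
annotation = {"frame": cause_change_pos.name, "AGENT": "They",
              "ITEM": "the price of their soda", "DIFFERENCE": "by 2%"}
print(annotation)
```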

22.6 Semantic Role Labeling

Semantic role labeling (sometimes shortened as SRL) is the task of automatically finding the semantic roles of each argument of each predicate in a sentence. Current approaches to semantic role labeling are based on supervised machine learning, often using the FrameNet and PropBank resources to specify what counts as a predicate, define the set of roles used in the task, and provide training and test sets.
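Since the chapter describes SRL as supervised classification, here is a hedged sketch of the classic setup in the spirit of Gildea and Jurafsky (2002): for each predicate and each candidate constituent, extract simple features (phrase type, position relative to the predicate, voice, etc.) and let a classifier assign a role or NONE. The constituents and the rule-based "classifier" below are toy stand-ins for a parser and a trained model.

```python
# Toy constituents for "The police officer detained the suspect at the scene",
# each with the kind of features classic SRL systems extract per constituent.
constituents = [
    {"span": "The police officer", "phrase_type": "NP", "position": "before", "voice": "active"},
    {"span": "the suspect",        "phrase_type": "NP", "position": "after",  "voice": "active"},
    {"span": "at the scene",       "phrase_type": "PP", "position": "after",  "voice": "active"},
]

def toy_classifier(feats):
    """Stand-in for a trained classifier: a few hand-written rules that mimic
    what a model would learn from PropBank-annotated data."""
    if feats["phrase_type"] == "NP" and feats["position"] == "before" and feats["voice"] == "active":
        return "Arg0"
    if feats["phrase_type"] == "NP":
        return "Arg1"
    if feats["phrase_type"] == "PP":
        return "ArgM-LOC"
    return "NONE"

for c in constituents:
    print(c["span"], "->", toy_classifier(c))
# The police officer -> Arg0
# the suspect -> Arg1
# at the scene -> ArgM-LOC
```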






Relation between frames

FrameNet frame-to-frame relations include:
• Inherits from / Is Inherited by
• Perspective on / Is Perspectivized in
• Uses / Is Used by
• Subframe of / Has Subframe(s)
• Precedes / Is Preceded by
• Is Inchoative of
• Is Causative of


Example: the "Cause change position on a scale" frame
• Is Causative of: Change_position_on_a_scale
• Adds an Agent role

• Lexical units: add.v, crank.v, curtail.v, cut.n, cut.v, decrease.v, development.n, diminish.v, double.v, drop.v, enhance.v, growth.n, increase.v, knock down.v, lower.v, move.v, promote.v, push.n, push.v, raise.v, reduce.v, reduction.n, slash.v, step up.v, swell.v

Relations between frames

[Figure: the frames EVENT (event.n, happen.v, occur.v, take place.v, ...), TRANSITIVE_ACTION, OBJECTIVE_INFLUENCE (affect.v, effect.n, impact.n, impact.v, ...), CAUSE_TO_MAKE_NOISE (blare.v, honk.v, play.v, ring.v, toot.v, ...), and MAKE_NOISE (cough.v, gobble.v, hiss.v, ring.v, yodel.v, ...), with core roles such as Agent, Cause, Patient, Sound_maker, Sound_source, and Noisy_event and non-core roles such as Place and Time, connected by Inheritance, Causative_of, and Excludes relations.]

Figure 2 (from Das et al. 2010): Partial illustration of frames, roles, and LUs related to the CAUSE_TO_MAKE_NOISE frame, from the FrameNet lexicon. "Core" roles are filled ovals; non-core roles (such as Place and Time) are unfilled ovals. No particular significance is ascribed to the ordering of a frame's roles in its lexicon entry (the selection and ordering of roles above is for illustrative convenience). CAUSE_TO_MAKE_NOISE defines a total of 14 roles, many of them not shown here.
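A small sketch of how frame-to-frame relations like those in the figure could be stored and queried. The relation triples are approximated from the figure (the exact edge set in the FrameNet lexicon may differ), and the traversal code is an illustrative assumption.

```python
# Frame-to-frame relation triples: (source frame, relation, target frame).
RELATIONS = [
    ("TRANSITIVE_ACTION", "inherits_from", "EVENT"),
    ("CAUSE_TO_MAKE_NOISE", "inherits_from", "TRANSITIVE_ACTION"),
    ("CAUSE_TO_MAKE_NOISE", "causative_of", "MAKE_NOISE"),
]

def ancestors(frame, relations, rel="inherits_from"):
    """Follow a relation transitively, e.g. all frames a frame inherits from."""
    out, frontier = set(), [frame]
    while frontier:
        current = frontier.pop()
        for src, r, tgt in relations:
            if src == current and r == rel and tgt not in out:
                out.add(tgt)
                frontier.append(tgt)
    return out

print(ancestors("CAUSE_TO_MAKE_NOISE", RELATIONS))
# {'TRANSITIVE_ACTION', 'EVENT'}  (set order may vary)
```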

Each frame definition also includes a set of frame elements, or roles, corresponding to different aspects of the concept represented by the frame, such as participants, props, and attributes. We use the term argument to refer to a sequence of word tokens annotated as filling a frame role. Fig. 1 shows an example sentence from the training data with annotated targets, LUs, frames, and role-argument pairs. The FrameNet lexicon also provides information about relations between frames and between roles (e.g., INHERITANCE). Fig. 2 shows a subset of the relations between three frames and their roles.

Accompanying most frame definitions in the FrameNet lexicon is a set of lexicographic exemplar sentences (primarily from the British National Corpus) annotated for that frame. Typically chosen to illustrate variation in argument realization patterns for the frame in question, these sentences only contain annotations for a single frame. We found that using exemplar sentences directly to train our models hurt performance as evaluated on SemEval'07 data, even though the number of exemplar sentences is an order of magnitude larger than the number of sentences in our training set (§2.2). This is presumably because the exemplars are neither representative as a sample nor similar to the test data. Instead, we make use of these exemplars in features (§4.2).

2.2 Data

Our training, development, and test sets consist of documents annotated with frame-semantic structures for the SemEval'07 task, which we refer to collectively as the SemEval'07 data.3 For the most part, the frames and roles used in annotating these documents were defined in the FrameNet lexicon, but there are some exceptions for which the annotators defined supplementary frames and roles; these are included in the ...

3. The full-text annotations and other resources for the 2007 task are available at http://framenet.icsi.berkeley.edu/semeval/FSSE.html.



FrameNet (Fillmore, Johnson, and Petruck 2003) is a linguistic resource storing considerable information about lexical and predicate-argument semantics in English. Grounded in the theory of frame semantics (Fillmore 1982), it suggests, but does not formally define, a semantic representation that blends representations familiar from word-sense disambiguation (Ide and Véronis 1998) and semantic role labeling (SRL; Gildea and Jurafsky 2002). Given the limited size of available resources, accurately producing richly structured frame-semantic structures with high coverage will require data-driven techniques beyond simple supervised classification, such as latent variable modeling, semi-supervised learning, and joint inference.

In this article, we present a computational and statistical model for frame-semantic parsing, the problem of extracting from text semantic predicate-argument structures such as those shown in Figure 1. We aim to predict a frame-semantic representation with two statistical models rather than a collection of local classifiers, unlike earlier approaches (Baker, Ellsworth, and Erk 2007). We use a probabilistic framework that cleanly integrates the FrameNet lexicon and limited available training data. The probabilistic framework we adopt is highly amenable to future extension through new features, more relaxed independence assumptions, and additional semi-supervised models.

Carefully constructed lexical resources and annotated data sets from FrameNet, detailed in Section 3, form the basis of the frame structure prediction task. We decompose this task into three subproblems: target identification (Section 4), in which frame-evoking predicates are marked in the sentence; frame identification (Section 5), in which the evoked frame is selected for each predicate; and argument identification (Section 6), in which arguments to each frame are identified and labeled with a role from that frame. Experiments demonstrating favorable performance to the previous state of the art on SemEval 2007 and FrameNet data sets are described in each section. Some novel aspects of our approach include a latent-variable model (Section 5.2) and a semi-supervised extension of the predicate lexicon (Section 5.5) to facilitate disambiguation of words not in the FrameNet lexicon, and a unified model for finding and labeling arguments.

Schematic of Frame Semantics

Figure 1 (from Das et al. 2014): An example sentence from the annotations released as part of FrameNet 1.5, with three targets marked in bold. Note that this annotation is partial because not all potential targets have been annotated with predicate-argument structures. Each target has its evoked semantic frame marked above it, enclosed in a distinct shape or border style. For each frame, its semantic roles are shown enclosed within the same shape or border style, and the spans fulfilling the roles are connected to the latter using dotted lines. For example, manner evokes the CONDUCT frame, and has the AGENT and MANNER roles fulfilled by Austria and most un-Viennese, respectively.
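The three subproblems described above (target identification, frame identification, argument identification) can be pictured as a simple pipeline. This is a schematic Python sketch with toy dictionary lookups standing in for the statistical models in a system like SEMAFOR; the lexicon entries and tokens are illustrative placeholders.

```python
# Toy lexicon: lexical unit -> candidate frames (real FrameNet entries are richer).
LU_TO_FRAMES = {"manner.n": ["CONDUCT"], "rise.v": ["Change_position_on_a_scale"]}
FRAME_ROLES = {"CONDUCT": ["Agent", "Manner"],
               "Change_position_on_a_scale": ["Item", "Attribute", "Difference"]}

def identify_targets(tokens):
    """Stage 1: mark tokens that evoke a frame (here: anything in the lexicon)."""
    return [t for t in tokens if t in LU_TO_FRAMES]

def identify_frame(target):
    """Stage 2: disambiguate the evoked frame (here: just take the first candidate)."""
    return LU_TO_FRAMES[target][0]

def identify_arguments(frame, tokens):
    """Stage 3: find and label spans for each role (left unfilled in this sketch)."""
    return {role: None for role in FRAME_ROLES[frame]}

# Tokens are assumed to be already lemmatized and POS-tagged for simplicity.
tokens = ["Oil", "rise.v", "in", "price", "by", "2%"]
for target in identify_targets(tokens):
    frame = identify_frame(target)
    print(target, "->", frame, identify_arguments(frame, tokens))
```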

FrameNet Complexity

But there still are n't enough ringers to ring more than six of the eight bells.

Frames and lexical units (LUs) evoked in this sentence:

Frame                  LU           Roles annotated
NOISE_MAKERS           bell.n       N_m (Noise_maker)
CAUSE_TO_MAKE_NOISE    ring.v       Agent, Sound_maker
SUFFICIENCY            enough.a     Item, Enabled_situation
EXISTENCE              there be.v   Entity

Figure 1 (from Das et al. 2010): A sentence from PropBank and the SemEval'07 training data, and a partial depiction of gold FrameNet annotations. Each frame is a row below the sentence (ordered for readability). Thick lines indicate targets that evoke frames; thin solid/dotted lines with labels indicate arguments. "N_m" under bells is short for the Noise_maker role of the NOISE_MAKERS frame; it is a denoted frame element because it is also the target. The last row indicates that there…are is a discontinuous target. In PropBank, the verb ring is the only annotated predicate for this sentence, and it is not related to other predicates with similar meanings.

FrameNet (Fillmore et al., 2003) is a rich linguistic resource containing considerable information about lexical and predicate-argument semantics in English. Grounded in the theory of frame semantics (Fillmore, 1982), it suggests—but does not formally define—a semantic representation that blends word-sense disambiguation and semantic role labeling. In this report, we present a computational and statistical model for frame-semantic parsing, the problem of extracting from text semantic predicate-argument structures such as those shown in Fig. 1. We aim to predict a frame-semantic representation as a structure, not as a pipeline of classifiers. We use a probabilistic framework that cleanly integrates the FrameNet lexicon and (currently very limited) available training data. Although our models often involve strong independence assumptions, the probabilistic framework we adopt is highly amenable to future extension through new features, relaxed independence assumptions, and semi-supervised learning. Some novel aspects of our current approach include a latent-variable model that permits disambiguation of words not in the FrameNet lexicon, a unified model for finding and labeling arguments, and a precision-boosting constraint that forbids arguments of the same predicate to overlap. Our parser, named SEMAFOR,[1] achieves the best published results to date on the SemEval'07 FrameNet task (Baker et al., 2007).

2 Resources and Task

We consider frame-semantic parsing resources.

2.1 FrameNet Lexicon

The FrameNet lexicon is a taxonomy of manually identified general-purpose frames for English.[2] Listed in the lexicon with each frame are several lemmas (with part of speech) that can denote the frame or some aspect of it—these are called lexical units (LUs). In a sentence, word or phrase tokens that evoke a frame are known as targets. The set of LUs listed for a frame in FrameNet may not be exhaustive; we may see a target in new…

1. Semantic Analyzer of Frame Representations.
2. Like the SemEval'07 participants, we used FrameNet v. 1.3 (http://framenet.icsi.berkeley.edu).
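To make the lexicon structure concrete, here is a minimal Python sketch of how an LU-to-frame lookup might be held in memory for the identification steps discussed later. The handful of frames and lexical units are taken from the bell-ringing example above and are only a tiny illustrative slice, not the real FrameNet inventory or any library's API.

# A tiny illustrative slice of a FrameNet-style lexicon:
# each lexical unit (lemma.POS) maps to the frames it can evoke,
# and each frame lists its frame elements (roles).
LEXICAL_UNITS = {
    "bell.n": {"NOISE_MAKERS"},
    "ring.v": {"CAUSE_TO_MAKE_NOISE"},
    "enough.a": {"SUFFICIENCY"},
    "there be.v": {"EXISTENCE"},
}
FRAME_ELEMENTS = {
    "NOISE_MAKERS": {"Noise_maker"},
    "CAUSE_TO_MAKE_NOISE": {"Agent", "Sound_maker"},
    "SUFFICIENCY": {"Item", "Enabled_situation"},
    "EXISTENCE": {"Entity"},
}

def candidate_frames(lemma, pos):
    """Frames an LU can evoke; the empty set for unlisted LUs is exactly the
    case the latent-variable model discussed later is designed to handle."""
    return LEXICAL_UNITS.get(lemma + "." + pos, set())

print(candidate_frames("ring", "v"))   # {'CAUSE_TO_MAKE_NOISE'}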

FrameNet and PropBank representations

Figure 2 (from Das et al. 2014): (a) A phrase-structure tree taken from the Penn Treebank and annotated with PropBank predicate-argument structures. The verbs created and pushed serve as predicates in this sentence. Dotted arrows connect each predicate to its semantic arguments (bracketed phrases). (b) A partial depiction of frame-semantic structures for the same sentence. The words in bold are targets, which instantiate a (lemmatized and part-of-speech-tagged) lexical unit and evoke a semantic frame. Every frame annotation is shown enclosed in a distinct shape or border style, and its argument labels are shown together on the same vertical tier below the sentence. See text for explanation of abbreviations.

PropBank annotations are made on phrase-structure syntax trees from the Wall Street Journal section of the Penn Treebank (Marcus, Marcinkiewicz, and Santorini 1993), annotated with predicate-argument structures for verbs. In Figure 2(a), the syntax tree for the sentence is marked with various semantic roles. The two main verbs in the sentence, created and pushed, are the predicates. For the former, the constituent more than 1.2 million jobs serves as the semantic role ARG1 and the constituent In that time serves as the role ARGM-TMP. Similarly, for the latter verb, roles ARG1, ARG2, ARGM-DIR, and ARGM-TMP are shown in the figure. PropBank defines core roles ARG0 through ARG5, which receive different interpretations for different predicates. Additional modifier roles ARGM-* include ARGM-TMP (temporal) and ARGM-DIR (directional), as shown in Figure 2(a). The PropBank representation therefore has a small number of roles, and the training data set comprises some 40,000 sentences, making the semantic role labeling task an attractive one from the perspective of machine learning. There are many instances of influential work on semantic role labeling using PropBank conventions. Pradhan et al. (2004) present a system that uses support vector machines (SVMs) to identify the arguments in a syntax tree that can serve as semantic roles, followed by classification of the identified arguments to role names via a collection of binary SVMs. Punyakanok et al. (2004) describe a semantic role labeler that uses integer linear programming for inference and uses several global constraints to find the best…

Semantic Role Labeling

Semantic Role Labeling Algorithm

Semantic role labeling (SRL):
• The task of finding the semantic roles of each argument of each predicate in a sentence.
• FrameNet versus PropBank:

[You] can't [blame] [the program] [for being unable to identify it]   (22.27)
 COGNIZER     TARGET  EVALUEE       REASON

[The San Francisco Examiner] issued [a special edition] [yesterday]   (22.28)
 ARG0                        TARGET  ARG1               ARGM-TMP

Recall that the difference between these two models of semantic roles is that FrameNet (22.27) employs many frame-specific frame elements as roles, while PropBank (22.28) uses a smaller number of numbered argument labels that can be interpreted as verb-specific labels, along with the more general ARGM labels.

A simplified semantic role labeling algorithm is sketched in Fig. 22.4. While there are a large number of algorithms, many of them use some version of the steps in this algorithm. Most algorithms, beginning with the very earliest semantic role analyzers (Simmons, 1973), begin by parsing, using broad-coverage parsers to assign a parse to the input string. Figure 22.5 shows a parse of (22.28) above. The parse is then traversed to find all words that are predicates. For each of these predicates, the algorithm examines each node in the parse tree and decides the semantic role (if any) it plays for this predicate. This is generally done by supervised classification. Given a labeled training set such as PropBank or FrameNet, a feature vector is extracted for each node, using feature templates described in the next subsection. A 1-of-N classifier is then trained to predict a semantic role for each constituent given these features, where N is the number of potential semantic roles plus an extra NONE role for non-role constituents. Most standard classification algorithms have been used (logistic regression, SVM, etc.). Finally, for each test sentence to be labeled, the classifier is run on each relevant constituent. We give more details of the algorithm after we discuss features.

function SEMANTICROLELABEL(words) returns labeled tree
  parse ← PARSE(words)
  for each predicate in parse do
    for each node in parse do
      featurevector ← EXTRACTFEATURES(node, predicate, parse)
      CLASSIFYNODE(node, featurevector, parse)

Figure 22.4: A generic semantic-role-labeling algorithm. CLASSIFYNODE is a 1-of-N classifier that assigns a semantic role (or NONE for non-role constituents), trained on labeled data such as FrameNet or PropBank.
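As a concrete rendering of Fig. 22.4, the following minimal Python sketch mirrors the same loop. The parser, feature extractor, and trained classifier are passed in as placeholders, and the .predicates()/.nodes() methods are assumptions about the parse object, not any particular library's API.

def semantic_role_label(words, parser, extract_features, classify_node):
    """Generic SRL loop mirroring Fig. 22.4: parse, then for every predicate
    classify every constituent into a role or NONE."""
    parse = parser(words)                    # assumed broad-coverage constituency parser
    labels = {}
    for predicate in parse.predicates():     # assumed: method returning predicate nodes
        for node in parse.nodes():           # every constituent is a candidate argument
            fv = extract_features(node, predicate, parse)
            role = classify_node(fv)         # 1-of-N classifier, returns a role or "NONE"
            if role != "NONE":
                labels[(predicate, node)] = role
    return labels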

Features for Semantic Role Labeling

A wide variety of features can be used for semantic role labeling. Most systems use some generalization of the core set of features introduced by Gildea and Jurafsky (2000). A typical set of basic features are based on the following feature templates (demonstrated on the NP-SBJ constituent The San Francisco Examiner in Fig. 22.5):
• The governing predicate, in this case the verb issued. The predicate is a crucial feature since labels are defined only with respect to a particular predicate.
• The phrase type of the constituent, in this case NP (or NP-SBJ). Some semantic roles tend to appear as NPs, others as S or PP, and so on.

History

• Semantic roles as an intermediate semantics, used early in:
  • machine translation (Wilks, 1973)
  • question answering (Hendrix et al., 1973)
  • spoken-language understanding (Nash-Webber, 1975)
  • dialogue systems (Bobrow et al., 1977)
• Early SRL systems (Simmons 1973, Marcus 1980):
  • parser followed by hand-written rules for each verb
  • dictionaries with verb-specific case frames (Levin 1977)

Why Semantic Role Labeling

• A useful shallow semantic representation
• Improves NLP tasks like:
  • question answering (Shen and Lapata 2007, Surdeanu et al. 2011)
  • machine translation (Liu and Gildea 2010, Lo et al. 2013)


How do we decide what is a predicate?

• If we’re just doing PropBank verbs • Choose all verbs • Possibly removing light verbs (from a list) • If we’re doing FrameNet (verbs, nouns, adjectives) • Choose every word that was labeled as a target in training data


Figure 22.5: Parse tree for the PropBank sentence "The San Francisco Examiner issued a special edition around noon yesterday", showing the PropBank argument labels (NP-SBJ = ARG0, VBD issued = TARGET, NP = ARG1, PP-TMP = ARGM-TMP). The dotted line shows the path feature NP↑S↓VP↓VBD for ARG0, the NP-SBJ constituent The San Francisco Examiner.

• The headword of the constituent, Examiner. The headword of a constituent can be computed with standard head rules, such as those given in Chapter 11. Certain headwords (e.g., pronouns) place strong constraints on the possible semantic roles they are likely to fill.
• The headword part of speech of the constituent, NNP.
• The path in the parse tree from the constituent to the predicate. This path is marked by the dotted line in Fig. 22.5. Following Gildea and Jurafsky (2000), we can use a simple linear representation of the path, NP↑S↓VP↓VBD. ↑ and ↓ represent upward and downward movement in the tree, respectively. The path is very useful as a compact representation of many kinds of grammatical function relationships between the constituent and the predicate.
• The voice of the clause in which the constituent appears, in this case active (as contrasted with passive). Passive sentences tend to have strongly different linkings of semantic roles to surface form than do active ones.
• The binary linear position of the constituent with respect to the predicate, either before or after.
• The subcategorization of the predicate, the set of expected arguments that appear in the verb phrase. We can extract this information by using the phrase-structure rule that expands the immediate parent of the predicate; VP → VBD NP PP for the predicate in Fig. 22.5.
• The named entity type of the constituent.
• The first words and the last word of the constituent.

The following feature vector thus represents the first NP in our example (recall that most observations will have the value NONE rather than, for example, ARG0, since most constituents in the parse tree will not bear a semantic role):

ARG0: [issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, VP → VBD NP PP, ORG, The, Examiner]

Other features are often used in addition, such as sets of n-grams inside the constituent, or more complex versions of the path features (the upward or downward halves, or whether particular nodes occur in the path). It's also possible to use dependency parses instead of constituency parses as the basis of features, for example using dependency parse paths instead of constituency paths.
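The feature templates above can be read as a single function from a (constituent, predicate, parse) triple to a feature dictionary. The sketch below assumes helper functions such as head_word, tree_path, clause_voice, and subcat_rule exist elsewhere, along with .label, .start/.end, and .tokens attributes on nodes; it simply mirrors the order of the example vector.

def extract_features(node, predicate, parse, ner=None):
    """Assemble the Gildea-and-Jurafsky-style feature templates for one constituent.
    head_word, tree_path, clause_voice and subcat_rule are assumed helpers, not library calls."""
    head = head_word(node)                       # headword via standard head rules
    return {
        "predicate": predicate.lemma,            # governing predicate, e.g. "issued"
        "phrase_type": node.label,               # e.g. NP or NP-SBJ
        "headword": head.form,                   # e.g. "Examiner"
        "headword_pos": head.pos,                # e.g. NNP
        "path": tree_path(node, predicate),      # e.g. "NP↑S↓VP↓VBD"
        "voice": clause_voice(predicate),        # "active" or "passive"
        "position": "before" if node.end <= predicate.start else "after",
        "subcat": subcat_rule(predicate),        # e.g. "VP -> VBD NP PP"
        "ne_type": ner(node) if ner else None,   # e.g. "ORG"
        "first_word": node.tokens[0],
        "last_word": node.tokens[-1],
    }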

Features

For the NP-SBJ constituent The San Francisco Examiner in Fig. 22.5:
• Headword of constituent: Examiner
• Headword POS: NNP
• Named entity type of constituent: ORGANIZATION
• Voice of the clause: active
• First and last words of constituent: The, Examiner
• Subcategorization of predicate: VP → VBD NP PP
• Linear position, clause relative to predicate: before


Path Features

• Path in the parse tree from the constituent to the predicate
• For The San Francisco Examiner and the predicate issued in Fig. 22.5, the path is NP↑S↓VP↓VBD

Figure 3.4 (from Palmer, Gildea, and Xue 2010): Treebank annotation of equi constructions, for the sentence "Housing lobbies persuaded Congress to raise the ceiling to $124,875". An empty category is indicated by *, and co-indexing by superscript 1.

Frequent path features

The most common values of the path feature, along with interpretations, are shown in Table 3.1 (from Palmer, Gildea, and Xue 2010).

Table 3.1: Most frequent values of the path feature in the training data.
Frequency   Path                  Description
14.2%       VB↑VP↓PP              PP argument/adjunct
11.8%       VB↑VP↑S↓NP            subject
10.1%       VB↑VP↓NP              object
7.9%        VB↑VP↑VP↑S↓NP         subject (embedded VP)
4.1%        VB↑VP↓ADVP            adverbial adjunct
3.0%        NN↑NP↑NP↓PP           prepositional complement of noun
1.7%        VB↑VP↓PRT             adverbial particle
1.6%        VB↑VP↑VP↑VP↑S↓NP      subject (embedded VP)
14.2%       (no matching parse constituent)
31.4%       Other

For the purposes of choosing a frame element label for a constituent, the path feature is similar to the governing category feature defined above. Because the path captures more information, it may be more susceptible to parser errors and data sparseness. As an indication of this, the path feature…
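Computing the path feature itself is a short tree walk: climb from the constituent to the lowest common ancestor, then descend to the predicate. The sketch below assumes each node object exposes .parent and .label attributes. Note that Fig. 22.5 writes the path starting from the constituent (NP↑S↓VP↓VBD), while Table 3.1 writes paths starting from the predicate; the direction is simply a convention to fix once, and the sketch follows Fig. 22.5.

def tree_path(constituent, predicate):
    """Linear path feature: up-arrows from the constituent to the lowest common
    ancestor, then down-arrows to the predicate."""
    ancestors = []
    node = constituent
    while node is not None:                 # all ancestors of the constituent (inclusive)
        ancestors.append(node)
        node = node.parent
    down = []
    node = predicate
    while node not in ancestors:            # climb from the predicate to the common ancestor
        down.append(node)
        node = node.parent
    lca = node
    up = []
    node = constituent
    while node is not lca:                  # record the upward half of the path
        up.append(node.label)
        node = node.parent
    path = "↑".join(up + [lca.label])
    if down:
        path += "↓" + "↓".join(n.label for n in reversed(down))
    return path                             # e.g. "NP↑S↓VP↓VBD" for the example in Fig. 22.5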

Final feature vector

• For "The San Francisco Examiner":
  ARG0: [issued, NP, Examiner, NNP, NP↑S↓VP↓VBD, active, before, VP → VBD NP PP, ORG, The, Examiner]
• Other features could be used as well:
  • sets of n-grams inside the constituent
  • other path features: the upward or downward halves, or whether particular nodes occur in the path

3-step version of SRL algorithm

1. Pruning: use simple heuristics to prune unlikely constituents.
2. Identification: a binary classification of each node as an argument to be labeled or NONE.
3. Classification: a 1-of-N classification of all the constituents that were labeled as arguments by the previous stage.
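A sketch of that three-stage decomposition for one predicate, assuming a pruning function and two separately trained classifiers (a binary identifier and a multi-class labeler) are supplied; the .predict interface is a placeholder, not a specific toolkit's API.

def label_arguments(parse, predicate, prune, extract_features, identifier, labeler):
    """Three-stage SRL for one predicate: prune, identify, classify."""
    candidates = prune(parse, predicate)                # stage 1: heuristic pruning
    arguments = [n for n in candidates                  # stage 2: binary argument-vs-NONE filter
                 if identifier.predict(extract_features(n, predicate, parse)) == "ARG"]
    return {n: labeler.predict(extract_features(n, predicate, parse))   # stage 3: 1-of-N role
            for n in arguments}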

Why add Pruning and Identification steps?

• The algorithm is looking at one predicate at a time
• Very few of the nodes in the tree could possibly be arguments of that one predicate
• Imbalance between:
  • positive samples (constituents that are arguments of the predicate)
  • negative samples (constituents that are not arguments of the predicate)
• Imbalanced data can be hard for many classifiers
• So we prune the very unlikely constituents first, and then use a classifier to get rid of the rest.

From Palmer, Gildea, and Xue (2010): …In addition, since it is not uncommon for a constituent to be assigned multiple semantic roles by different predicates (generally a predicate can only assign one semantic role to a constituent), the semantic role labeling system can only look at one predicate at a time, trying to find all the arguments for this particular predicate in the tree. The tree will be traversed as many times as there are predicates in the tree. This means there is an even higher proportion of constituents in the parse tree that are not arguments for the predicate the semantic role labeling system is currently looking at any given point. There is thus a serious imbalance between positive samples (constituents that are arguments to a particular predicate) and negative samples (constituents that are not arguments to this particular predicate). Machine learning algorithms generally do not handle extremely unbalanced data very well. For these reasons, many systems divide the semantic role labeling task into two steps, identification, in which a binary decision is made as to whether a constituent carries a semantic role for a given predicate, and classification, in which the specific semantic role is chosen. Separate machine learning classifiers are trained for these two tasks, often with many of the same features (Gildea and Jurafsky, 2002; Pradhan et al., 2005). Another approach is to use a set of heuristics to prune out the majority of the negative samples, as a predicate's roles are generally found in a limited number of syntactic relations to the predicate itself. Some semantic labeling systems use a combination of both approaches: heuristics are first applied to prune out the constituents that are obviously not an argument for a certain predicate, and then a binary classifier is trained to further separate the positive samples from the negative samples. The goal of this filtering process is just to decide whether a constituent is an argument or not. Then a multi-class classifier is trained to decide the specific semantic role for this argument. In the filtering stage, it is generally a good idea to be conservative and err on the side of keeping too many constituents rather than being too aggressive and filtering out true arguments. This can be achieved by lowering the threshold for positive samples, or conversely, raising the threshold for negative samples.

Pruning heuristics (Xue and Palmer 2004)
• Add sisters of the predicate, then aunts, then great-aunts, etc.

• But ignoring anything in a coordination structure (a code sketch follows the example tree below)

(20) [Parse tree from Palmer, Gildea, and Xue (2010) illustrating coordination: "Strikes and mismanagement were cited" and "Premier Ryzhkov warned of tough measures", joined as S → S CC S.]
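A sketch of the pruning heuristic as described on this slide: starting at the predicate, collect its sisters, then its aunts, and so on up the tree, skipping any level that is a coordination structure. The .parent, .children, and .label attributes are assumptions about the parse-tree objects, and real implementations add further refinements.

def prune_candidates(predicate):
    """Xue-and-Palmer-style pruning: collect the predicate's sisters, then aunts,
    then great-aunts, ..., skipping any level that is a coordination structure."""
    candidates = []
    node = predicate
    while node.parent is not None:
        siblings = [s for s in node.parent.children if s is not node]
        # a CC sister signals coordination at this level; add nothing here
        if not any(s.label == "CC" for s in siblings):
            candidates.extend(siblings)
        node = node.parent
    return candidates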

A common final stage: joint inference

• The algorithm so far classifies everything locally: each decision about a constituent is made independently of all others
• But this can't be right: there are lots of global or joint interactions between arguments:
  • Constituents in FrameNet and PropBank must be non-overlapping, but a local system may incorrectly label two overlapping constituents as arguments
  • PropBank does not allow multiple identical arguments: labeling one constituent ARG0 should thus increase the probability of another being ARG1

How to do joint inference

• Reranking:
  • The first-stage SRL system produces multiple possible labels for each constituent
  • The second-stage classifier chooses the best global label assignment for all constituents
  • Often a classifier that takes all the inputs along with other features (e.g., sequences of labels); a greedy sketch of constraint enforcement follows the FrameNet note below

More complications: FrameNet

• We need an extra step to find the frame:

function SEMANTICROLELABEL(words) returns labeled tree
  parse ← PARSE(words)
  for each predicate in parse do
    predicatevector ← EXTRACTFRAMEFEATURES(predicate, parse)
    frame ← CLASSIFYFRAME(predicate, predicatevector)
    for each node in parse do
      featurevector ← EXTRACTFEATURES(node, predicate, parse)
      CLASSIFYNODE(node, featurevector, parse, frame)

Figure 22.4, extended with a frame-identification step: CLASSIFYNODE is a 1-of-N classifier that assigns a semantic role (or NONE for non-role constituents), trained on labeled data such as FrameNet or PropBank.
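As a minimal illustration of the global constraints above (non-overlapping spans, no repeated core argument), here is a greedy decoder over the local classifier's scored candidates. Real systems use reranking or integer linear programming (Punyakanok et al. 2004); this greedy version is only a sketch, and the .start/.end span attributes on nodes are assumptions.

CORE_ROLES = {"ARG0", "ARG1", "ARG2", "ARG3", "ARG4", "ARG5"}

def greedy_joint_decode(scored):
    """scored: (node, role, score) triples from the local classifier, role != NONE.
    Greedily keep the highest-scoring arguments while enforcing two global
    constraints: spans may not overlap, and no core role is assigned twice."""
    chosen, used_roles, used_spans = [], set(), []
    for node, role, score in sorted(scored, key=lambda t: -t[2]):
        span = (node.start, node.end)
        overlaps = any(span[0] < e and s < span[1] for s, e in used_spans)
        duplicate = role in CORE_ROLES and role in used_roles
        if not overlaps and not duplicate:
            chosen.append((node, role))
            used_spans.append(span)
            used_roles.add(role)
    return chosen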

Features for Frame Identification (Das et al. 2014)

Table 4: Features used for frame identification (Equation (2)). All also incorporate f, the frame being scored. ℓ = ⟨w_ℓ, π_ℓ⟩ consists of the words and POS tags[20] of a target seen in an exemplar or training sentence as evoking f. The features with starred bullets were also used by Johansson and Nugues (2007).

∗ the POS of the parent of the head word of t_i [21]
∗ the set of syntactic dependencies of the head word of t_i
• if the head word of t_i is a verb, then the set of dependency labels of its children
• the dependency label on the edge connecting the head of t_i and its parent
• the sequence of words in the prototype, w_ℓ
• the lemmatized sequence of words in the prototype
• the lemmatized sequence of words in the prototype and their part-of-speech tags π_ℓ
• WordNet relation[22] ρ holds between ℓ and t_i
• WordNet relation[22] ρ holds between ℓ and t_i, and the prototype is ℓ
• WordNet relation[22] ρ holds between ℓ and t_i, the POS tag sequence of ℓ is π_ℓ, and the POS tag sequence of t_i is π_{t_i}

…exemplar sentences. Note that this model makes an independence assumption: each frame is predicted independently of all others in the document. In this way the model is similar to J&N'07. However, ours is a single conditional model that shares features and weights across all targets, frames, and prototypes, whereas the approach of J&N'07 consists of many separately trained models. Moreover, our model is unique in that it uses a latent variable to smooth over frames for unknown or ambiguous LUs. Frame identification features depend on the preprocessed sentence x, the prototype ℓ and its WordNet lexical-semantic relationship with the target t_i, and of course the frame f. Our model uses binary features, which are detailed in Table 4.

5.3 Parameter Estimation

Given a training data set (either the SemEval 2007 data set or the FrameNet 1.5 full-text annotations), which is of the form $\langle\langle x^{(j)}, t^{(j)}, f^{(j)}, \mathcal{A}^{(j)}\rangle\rangle_{j=1}^{N}$, we discriminatively train the frame identification model by maximizing the training data log-likelihood:[23]

$$\max_{\theta}\;\sum_{j=1}^{N}\sum_{i=1}^{m_j}\log \sum_{\ell \in L_{f_i^{(j)}}} p_{\theta}\!\left(f_i^{(j)},\,\ell \,\middle|\, t_i^{(j)},\, x^{(j)}\right) \qquad (3)$$

In Equation (3), m_j denotes the number of frames in a sentence indexed by j. Note that the training problem is non-convex because of the summed-out prototype latent variable.
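For a single target, the quantity inside Equation (3) can be written as a log-sum-exp over prototypes. The sketch below assumes a log-linear model, i.e. p_θ(f, ℓ | t, x) ∝ exp(score(f, ℓ, t, x)); prototypes_of and score are placeholders for the model's components, not the authors' actual code.

import math

def log_prob_frame(frame, target, sentence, all_frames, prototypes_of, score):
    """log p_theta(frame | target, sentence) with the prototype ell summed out:
    logsumexp of scores over L_frame minus the log partition over all frames."""
    def logsumexp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))
    numer = logsumexp([score(frame, ell, target, sentence)
                       for ell in prototypes_of(frame)])
    denom = logsumexp([score(f, ell, target, sentence)
                       for f in all_frames for ell in prototypes_of(f)])
    return numer - denom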

20. POS tags are found automatically during preprocessing.
21. If the target is not a subtree in the parse, we consider the words that have parents outside the span, and apply three heuristic rules to select the head: (1) choose the first word if it is a verb; (2) choose the last word if the first word is an adjective; (3) if the target contains the word of, and the first word is a noun, we choose it. If none of these hold, choose the last word with an external parent to be the head.
22. These are: IDENTICAL-WORD, SYNONYM, ANTONYM (including extended and indirect antonyms), HYPERNYM, HYPONYM, DERIVED FORM, MORPHOLOGICAL VARIANT (e.g., plural form), VERB GROUP, ENTAILMENT, ENTAILED-BY, SEE-ALSO, CAUSAL RELATION, and NO RELATION.
23. We found no benefit on either development data set from using an L2 regularizer (zero-mean Gaussian prior).

From Palmer, Gildea, and Xue (2010), Section 4.3 (language-(in)dependent semantic role labeling): …constituents in that chain are assigned the same semantic role. The other scenario is when there is a discontinuous argument where multiple constituents jointly play a role with respect to a predicate. A constituent in a parse tree receives multiple semantic roles when there is argument sharing, where this constituent plays a role for multiple predicates. This can happen in a coordination structure when multiple predicates are conjoined and share a subject. This can also happen in subject control or object control structures when two verbs share a subject or an object.

Not just English

(22) 警方[Arg0] 正在[ArgM-TMP] 详细[ArgM-MNR] 调查[Rel] 事故 原因[Arg1]
     ("The police are now thoroughly investigating the cause of the accident.")

4.3.2 SEMANTIC ROLE LABELING FOR VERBS

Commonalities: Like English semantic role labeling, Chinese semantic role labeling can be formulated as a classification task with three distinct stages: pruning, argument identification, and argument classification. The pruning algorithm described in Chapter 3 turns out to be straightforward to implement for Chinese data, and it involves minor changes in the phrase labels. For example, IP in the Chinese Treebank corresponds roughly to S in the Penn Treebank, and CP corresponds roughly to SBAR. Example 23 illustrates how the pruning algorithm works for Chinese. Assuming the predicate of interest is 调查 ("investigate"), the algorithm first adds the NP (事故 "accident" 原因 "cause") to the list of candidates. Then it moves up a level and adds the two ADVPs (正在 "now" and 详细 "thoroughly") to the list of candidates. At the next level, the two VPs form a coordination structure and thus no candidate is added. Finally, at the next level, the NP (警方 "police") is added to the list of candidates. Obviously, the pruning algorithm works better when the parse trees that are the input to the semantic role labeling system are correct. In a realistic scenario, the parse trees are generated by a syntactic parser and are not expected to be perfect. However, experimental results…

Not just verbs: NomBank (Meyers et al. 2004)

Figure from Jiang and Ng (2006), Figure 1: A sample sentence and its parse tree labeled in the style of NomBank. [The tree is for "Ben Bernanke was nominated as Greenspan 's replacement": Ben Bernanke is ARG0 of the nominal predicate replacement, Greenspan 's is ARG1, and the verb nominated carries the Support label.]

From Jiang and Ng (2006):

2 Overview of NomBank
The NomBank (Meyers et al., 2004c; Meyers et al., 2004b) annotation project originated from the NOMLEX (Macleod et al., 1997; Macleod et al., 1998) nominalization lexicon developed under the New York University Proteus Project. NOMLEX lists 1,000 nominalizations and the correspondences between their arguments and the arguments of their verb counterparts. NomBank frames combine various lexical resources (Meyers et al., 2004a), including an extended NOMLEX and PropBank frames, and form the basis for annotating the argument structures of common nouns. Similar to PropBank, NomBank annotation is made on the Penn TreeBank II (PTB II) corpus. For each common noun in PTB II that takes arguments, its core arguments are labeled with ARG0, ARG1, etc., and modifying arguments are labeled with ARGM-LOC to denote location, ARGM-MNR to denote manner, etc. Annotations are made on PTB II parse tree nodes, and argument boundaries align with the span of parse tree nodes. A sample sentence and its parse tree labeled in the style of NomBank is shown in Figure 1. For the nominal predicate "replacement", "Ben Bernanke" is labeled as ARG0 and "Greenspan 's" is labeled as ARG1. There is also the special label "Support" on "nominated" which introduces "Ben Bernanke" as an argument of "replacement". The support construct will be explained in detail in Section 4.2.3. We are not aware of any NomBank-based automatic SRL systems. The work in (Pradhan et al., 2004) experimented with an automatic SRL system developed using a relatively small set of manually selected nominalizations from FrameNet and Penn Chinese TreeBank. The SRL accuracy of their system is not directly comparable to ours.

3 Model training and testing
We treat the NomBank-based SRL task as a classification problem and divide it into two phases: argument identification and argument classification. During the argument identification phase, each parse tree node is marked as either argument or non-argument. Each node marked as argument is then labeled with a specific class during the argument classification phase. The identification model is a binary classifier, while the classification model is a multi-class classifier. Opennlp maxent,[1] an implementation of Maximum Entropy (ME) modeling, is used as the classification tool. Since its introduction to the Natural Language Processing (NLP) community (Berger et al., 1996), ME-based classifiers have been shown to be effective in various NLP tasks. ME modeling is based on the insight that the best model is consistent with the set of constraints imposed and otherwise as uniform as possible. ME models the probability of label l given input x as in Equation 1. f_i(l, x) is a feature function that maps label l and input x to either 0 or 1, while the summation is over all n feature functions, with λ_i as the weight parameter for each feature function f_i(l, x). Z_x is a normalization factor. In the identification model, label l corresponds to either "argument" or "non-argument", and in the classification model, label l corresponds to one of the specific NomBank argument classes. The classification output is the label l with the highest conditional probability p(l | x).

$$p(l \mid x) = \frac{\exp\left(\sum_{i=1}^{n} \lambda_i f_i(l, x)\right)}{Z_x} \qquad (1)$$

To train the ME-based identification model, training data is gathered by treating each parse tree node that is an argument as a positive example and the rest as negative examples. Classification training data is generated from argument nodes only. During testing, the algorithm of enforcing non-overlapping arguments by (Toutanova et al., 2005) is used. The algorithm maximizes the log-probability of the entire NomBank labeled parse…

1. http://maxent.sourceforge.net/
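A small sketch of the maximum-entropy probability in Equation (1), with binary feature functions represented as the set of feature names that fire and their weights kept in a dictionary; this mirrors the formula itself rather than the OpenNLP maxent implementation the paper uses.

import math

def me_probability(label, x, labels, fired_features, weights):
    """p(l | x) = exp(sum_i lambda_i f_i(l, x)) / Z_x with binary feature functions;
    fired_features(l, x) returns the names of the features that equal 1."""
    def unnormalized(l):
        return math.exp(sum(weights.get(name, 0.0) for name in fired_features(l, x)))
    z_x = sum(unnormalized(l) for l in labels)          # normalization factor Z_x
    return unnormalized(label) / z_x

def classify(x, labels, fired_features, weights):
    """Return the label with the highest conditional probability p(l | x)."""
    return max(labels, key=lambda l: me_probability(l, x, labels, fired_features, weights))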

Additional Issues for nouns

• Features:
  • Nominalization lexicon (employment → employ)
  • Morphological stem (Healthcare, Medicare → care)
• Different positions:
  • Most arguments of nominal predicates occur inside the NP
  • Others are introduced by support verbs
  • Especially light verbs: "X made an argument", "Y took a nap"
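A toy sketch of the first two feature ideas on this slide, a nominalization-lexicon lookup plus a crude suffix-stripping stem; the three-entry lexicon is purely illustrative (a real system would draw on NOMLEX / NomBank frames).

# Illustrative nominalization lexicon; a real system would load NOMLEX / NomBank frames.
NOMINALIZATIONS = {"employment": "employ", "replacement": "replace", "argument": "argue"}

def nominal_predicate_features(noun):
    """Two of the feature ideas above: the verb the noun nominalizes (lexicon lookup)
    and a crude morphological stem (strip a few derivational suffixes)."""
    stem = noun
    for suffix in ("ment", "ation", "tion", "ance", "al"):
        if noun.endswith(suffix) and len(noun) > len(suffix) + 2:
            stem = noun[: -len(suffix)]
            break
    return {
        "verb_counterpart": NOMINALIZATIONS.get(noun),   # e.g. employment -> employ
        "morph_stem": stem,                              # e.g. employment -> employ
    }

print(nominal_predicate_features("employment"))
# {'verb_counterpart': 'employ', 'morph_stem': 'employ'}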

Semantic Role Labeling

Conclusion

Semantic Role Labeling
• A level of shallow semantics for representing events and their participants
• Intermediate between parses and full semantics
• Two common architectures, for various languages:
  • FrameNet: frame-specific roles
  • PropBank: proto-roles
• Current systems extract roles by:
  • parsing the sentence
  • finding predicates in the sentence
  • for each one, classifying each parse tree constituent