VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling

Andrea Di Fabio♦♥, Simone Conia♦, Roberto Navigli♦
♦Department of Computer Science
♥Department of Literature and Modern Cultures
Sapienza University of Rome, Italy
{difabio,conia,navigli}@di.uniroma1.it

Abstract

We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org.

1 Introduction

During the last two decades, we have witnessed increased attention to Natural Language Understanding, a core goal of Natural Language Processing (NLP). Several challenges, however, are yet to be overcome when it comes to performing sentence-level semantic tasks (Navigli, 2018). In order to understand the meaning of sentences, the semantics of verbs plays a crucial role, since verbs define the argument structure roughly in terms of "who" did "what" to "whom", with the arguments being the constituents that bear a semantic relation (called semantic role) with the verb. In the following example, "Joe" and "lunch" are arguments of "eat", whose argument structure identifies them, respectively, as the Agent and the Patient of the scenario evoked by the verb:

[Joe]Agent is [eating]Verb his [lunch]Patient

The automatic identification and labeling of argument structures is a task pioneered by Gildea and Jurafsky (2002) called Semantic Role Labeling (SRL). SRL has become very popular thanks to its integration into other related NLP tasks such as machine translation (Liu and Gildea, 2010), visual semantic role labeling (Silberer and Pinkal, 2018) and spoken language understanding (Bastianelli et al., 2013).

In order to be performed, SRL requires the following core elements: 1) a verb inventory, and 2) a semantic role inventory. However, the current verb inventories used for this task, such as PropBank (Palmer et al., 2005) and FrameNet (Baker et al., 1998), are language-specific and lack high-quality interoperability with existing knowledge bases. Furthermore, such resources provide low to medium coverage of the verbal lexicon (cf. Table 1), with PropBank showing the best figures, but still lower than other lexical inventories like WordNet (Fellbaum et al., 1998). Finally, the informativeness of the semantic roles defined in the various resources ranges from underspecified, as in PropBank's roles, to overspecified, as in FrameNet's frame elements. This poses multiple issues in terms of interpretability or cross-domain applicability.

To overcome the above limitations, in this paper we present VerbAtlas, a manually-crafted inventory of verbs and argument structures which provides several contributions: 1) full coverage of the English verbal lexicon, 2) prototypical argument structures for each cluster of synsets that define a semantically-coherent frame, 3) cross-domain explicit semantic roles, 4) the specification of refined semantic information and selectional preferences for the argument structure of frames, 5) linkage to WordNet and, as a result, to BabelNet (Navigli and Ponzetto, 2010) and Open Multilingual Wordnet (Bond and Foster, 2013), which in turn enable scalability across languages. Furthermore, to

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 627–637, Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics

           Cluster types      #       Argument roles    #       Meaning units   #
FrameNet   Frames             1,224   Frame elements    10,542  Lexical units   5,200
VerbNet    Levin's classes    329     Thematic roles    39      Senses          6,791
PropBank   Verbs              5,649   Proto-roles       6       Framesets       10,687
WordNet    –                  –       –                 –       Synsets         13,767
VerbAtlas  Frames             466     Semantic roles    25      Synsets         13,767

Table 1: Quantitative analysis of popular verbal resources. Since each of these resources is based on different linguistic assumptions, in the top row we use general labels to encompass resource-specific terms that can be viewed as homologous: cluster types, argument roles and meaning units. WordNet does not provide argument structures except for sentence frames which, however, are syntactic and do not specify any roleset.

make VerbAtlas suitable for NLP tasks that rely on PropBank, we also provide a mapping to its framesets. Finally, we prove through an SRL experiment that VerbAtlas is robust and enables state-of-the-art performances on the CoNLL-2009 dataset.

2 Related work

The most popular English verbal resources are FrameNet (Baker et al., 1998), PropBank (Palmer et al., 2005), and VerbNet (Kipper-Schuler, 2005). Each resource is based on a different linguistic theory, which leads to different information being provided for each verb (cf. Table 1). FrameNet, in particular, was the first resource to be used for SRL (Gildea and Jurafsky, 2002): it is based on frame semantics, theorized by Fillmore (1976), which assumes different roles, i.e., frame elements, for different frames[1]. This led to a proliferation of thousands of roles for only 5,200 verbs. Such domain specificity makes it difficult to scale to open-text SRL (Hartmann et al., 2017).

PropBank challenges the issue of FrameNet's roles with a repository of only 6 different core roles plus 19 modifiers for 10,687 framesets[2]. This resource is the most widely adopted for SRL, as also attested by the popularity of datasets such as CoNLL-2005 (Carreras and Màrquez, 2005) and CoNLL-2009 (Hajič et al., 2009). PropBank's methodology was also used for other languages, such as Arabic (Palmer et al., 2008), Chinese (Xue and Palmer, 2003), Spanish and Catalan (Taulé et al., 2008), Hindi-Urdu (Bhatt et al., 2009), Brazilian Portuguese (Duran and Aluísio, 2011), Finnish (Haverinen et al., 2015), Turkish (Şahin and Adalı, 2018), and Basque (Aldezabal et al., 2010), among others. Its application goes well beyond the annotation of corpora: in fact, it was also adopted for the Abstract Meaning Representation (Banarescu et al., 2013), a semantic language that aims at abstracting away from cross-lingual syntactic idiosyncrasies, and NomBank (Meyers et al., 2004), a resource which provides argument structures for nouns. However, PropBank's major drawback is that its roles do not explicitly mark the type of semantic relation with the verb; instead, they just enumerate the arguments (i.e., Arg0, Arg1, etc.). Due to this, role labels do not preserve the same type of semantic relation across verbs, e.g., the first arguments of "eat" and "feel" are both labeled with Arg0 even if they express different relations (Agent and Experiencer, respectively).

VerbNet addresses this limit by providing explicit, human-readable roles such as Agent, Patient, Experiencer, etc. Yet, VerbNet suffers from low coverage, in that it includes only 6,791 verbs, which makes it a suboptimal resource for wide-coverage SRL. Another drawback of VerbNet is its organization into Levin's classes (Levin, 1993), namely, 329 groups of verbs sharing the same syntactic behavior, independently of their meaning. As a consequence, its classes cannot be used straightforwardly in a semantic task.

On top of their individual limitations, the above resources also have some common drawbacks. One of these is language specificity, which implies that a considerable amount of work will be needed for the creation of a corresponding resource for each new language of interest. Another common problem is the lack of coverage, as shown in Table 1. The highest-coverage inventory is PropBank, with its 10,687 framesets and 5,649 verbs. However, PropBank's coverage is still limited when compared to computational lexicons like WordNet, which contains 13,767 verbal concepts and 11,529 distinct verbs.

[1] "By the term 'frame' I have in mind any system of concepts related in such a way that to understand any one of them you have to understand the whole structure in which it fits." (Fillmore, 1982).
[2] "A frameset corresponds to a coarse-grained sense of the verb which has a specific set of semantic arguments." (Babko-Malaya, 2005).

Figure 1: Structure of the EAT frame. The colored roles on the left (a) constitute the Predicate Argument Structure of the frame, with their respective selectional preferences shown within round brackets. Box (b) shows a sample of the frame's synsets (a synonym per synset is shown). In the example sentences (c), we show the head words colored according to their semantic roles. While in the first sentence the synset {devour} does not project the Location, the synset {wash down} projects an implicit Instrument with its synset-level selectional preference (liquid).
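The role-labeled examples in Figure 1(c) can be pictured as a simple data structure pairing a predicate and its frame with role-annotated arguments. This is an illustrative sketch only; the class and field names below are not part of any VerbAtlas API:

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    text: str
    role: str  # a VerbAtlas-style semantic role, e.g. "Agent"

@dataclass
class Annotation:
    predicate: str
    frame: str  # the VerbAtlas frame evoked by the predicate
    arguments: list = field(default_factory=list)

# "Joe is eating his lunch" from the Introduction, annotated EAT-style.
sentence = Annotation(
    predicate="eating",
    frame="EAT",
    arguments=[Argument("Joe", "Agent"), Argument("lunch", "Patient")],
)

# Collect the fillers by role label.
roles = {a.role: a.text for a in sentence.arguments}
print(roles)
```

Because the role labels are explicit and cross-frame, the same two keys (Agent, Patient) would appear for predicates of entirely different frames, such as HIT or CONQUER.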

To get the best of the three worlds, there have been attempts to map the aforementioned resources to each other. To our knowledge the most popular endeavors are SemLink (Palmer, 2009; Bonial et al., 2013) and Predicate Matrix (De Lacalle et al., 2014), the latter being an extension of the former via automatic methods. However, while the main drawback of SemLink is coverage, the Predicate Matrix suffers from quality issues.

Both the foregoing limitations are addressed in VerbAtlas, the manually-curated resource that we present in this paper. With VerbAtlas we provide the community with a resource that improves the main features of the existing inventories of verbs, while also adding new semantic information. Compared to FrameNet, VerbAtlas has fewer frames and roles while at the same time presenting full coverage in terms of concepts (cf. Table 1). VerbNet's roles, instead, provided inspiration for our role repository, but we reduced the number of roles from 39 to 25 in order to alleviate data sparsity issues. PropBank was mapped to VerbAtlas synsets and roles in order to enable SRL systems to exploit its additional semantic information and improve verb coverage. Moreover, in contrast to the other resources, the use of WordNet synsets makes VerbAtlas able to scale multilingually through resources such as BabelNet.

In Section 3 we introduce VerbAtlas and its features; in Section 4 we explain how this resource was built and organized. Finally, Section 5 validates VerbAtlas experimentally in the SRL task.

3 VerbAtlas

We now introduce VerbAtlas, a new verbal semantic resource structured into frames which group semantically-coherent synsets from WordNet (v3.0). A VerbAtlas frame is a cluster of verb meanings expressing similar semantics, which expands upon the frame notion of FrameNet. Each frame is provided with an argument structure that generalizes over all the synsets in the frame, plus preferential selections for each semantic role. Furthermore, synsets are enriched with novel semantic information. In Figure 1 we show the structure of the EAT frame in VerbAtlas, which we use as a running example to illustrate the new features in our work and compare them with current verbal resources.

3.1 Frame organization

We define a frame in VerbAtlas as a cluster of WordNet synsets that, with different shades of meaning, express a certain scenario. For instance, the EAT frame, an excerpt of which is shown in Figure 1(b), comprises all the synsets specializing the general scenario of "eating", including synsets such as {eat}, {devour, guttle, raven, pig}, etc.

This organization of frames is intended to overcome the limitations affecting VerbNet and FrameNet. In fact, while the former organizes verbs by syntactic rather than semantic behavior, the latter is affected by the sparsity of 5,200 verbal senses (i.e., lexical units) distributed across frames. Instead, PropBank's framesets correspond to a coarse-grained sense of a verb, but each frameset expresses its own separate argument structure. Differently from the other resources, VerbAtlas frames are organized into 466 wide-coverage and semantically-coherent clusters (cf. Table 1) which provide cross-frame argument structures.
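Seen as data, a frame is just a cluster of synsets, and the clustering induces a total mapping from synsets to frames (the one-synset-one-frame strategy of Section 4.2). A toy sketch with made-up synset identifiers, not actual VerbAtlas data:

```python
# Toy illustration of the frame-as-synset-cluster organization.
# Frame contents and synset identifiers are illustrative only.
frames = {
    "EAT": {"eat.v.01", "devour.v.01", "wash_down.v.01"},
    "KILL": {"kill.v.01", "slaughter.v.01"},
}

# Invert the clustering: each synset must belong to exactly one frame,
# so the inversion is a plain dictionary with no collisions.
synset_to_frame = {}
for frame, synsets in frames.items():
    for s in synsets:
        assert s not in synset_to_frame, "one-synset-one-frame violated"
        synset_to_frame[s] = frame

print(synset_to_frame["devour.v.01"])
```

The collision check is what makes the one-synset-one-frame constraint explicit: any synset appearing in two clusters would be flagged immediately.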

3.2 Semantic roles

A limit in PropBank is its inventory of so-called proto-roles (Arg0, Arg1, etc., with the first two roughly corresponding, respectively, to a proto-agent and a proto-patient (Dowty, 1991)), which does not provide human-readable labels and is predicate-specific (i.e., the same label does not necessarily depict the same semantic relation across predicates). While PropBank roles do not provide clear explicit semantics, FrameNet roles are explicit but frame-specific (e.g., Ingestor and Ingestibles for the "Ingestion" frame, or Impactor and Impactee for the "Impact" frame). This produces a fine-grained role inventory that makes it difficult for SRL systems to generalize across frames.

In VerbAtlas we follow VerbNet and take a middle-ground approach: our inventory of 25 semantic roles is inspired by VerbNet, whose 39 labels (like Agent, Patient, Time, etc.) are explicit, cross-frame and domain-general. The rationale is that these features enable neural networks employed in the SRL task to generalize across frames in a consistent way, as we show in Section 5.2. For example, we can tag the arguments of different VerbAtlas frames, such as EAT, HIT and CONQUER, with just two roles, namely: Agent and Patient.

3.3 Prototypical Argument Structure

Each VerbAtlas frame expresses a Prototypical Argument Structure (PAS) that generalizes over all the synsets in a particular frame. The PAS specifies a roleset that defines the frame's overall meaning (e.g., in Figure 1(a) Agent, Patient and Location). In Figure 1(c) we show two sentences annotated with argument roles.

Since we do not distinguish between core roles and adjuncts, we decided to also include in the various PAS roles which might be projected only optionally by argument structures but are nonetheless present in the scenario evoked by the frame. For instance, the inclusion of the Location role in the PAS of the EAT example ensures a robust descriptiveness of the PAS across synsets in the same frame.

3.4 Selectional preferences

To narrow down the number of candidates for a particular argument slot and provide further semantic structure, each semantic role of the PAS has been manually labeled with selectional preferences from a set of 116 macro-concepts. Our selectional preferences are defined by WordNet synsets whose hyponyms are expected to be likely candidates for the corresponding argument slot, a strategy similar to that of Agirre and Martinez (2002) which, instead, was algorithm-based.

Consider again the EAT frame and its PAS: Agent, Patient, Location (Figure 1(a)). In this frame, most of the example sentences from the WordNet synsets express a Patient like "cake", "meat" or "banana"; thus, since their common hypernym is "food", we provided the PAS with the information that the Patient prototypically expects hyponyms of the {food, solid food} synset. The Location role is labeled with its homonymous {location} synset given its generality.

3.5 Synset-level semantic information

To enrich the semantic representation of synsets, VerbAtlas provides semantic and pragmatic (English-specific) information regarding implicit (1,028 labels), shadow and default arguments (2,979 labels), inspired by Pustejovsky (1995) and inferred from synsets' glosses and examples. To our knowledge, the following information is new in a large-scale verbal resource such as ours:

• implicit arguments, i.e., arguments implicit in the argument structure of the verb but not always syntactically expressed. Consider the synset {overleap, vault} in our JUMP frame: since its gloss is "Jump across or leap over (an obstacle)", we know that the Patient of this verb can be a hyponym of {obstacle}, therefore implying a selectional preference on the role with the {obstacle} synset.

• shadow and default arguments: the former is incorporated in the meaning of the verb but not syntactically expressed. An example from the EAT frame is {eat in, dine in} ("Eat at home"). This synset is tagged with the shadow argument Location = {home}, since the latter is not expressed syntactically. On the other hand, default arguments are logically implied but not syntactically expressed. These are also tagged as shadow arguments. For instance, the synset {deliver} (as in "Our local supermarket delivers") has the label Patient = {grocery} to provide the commonsense information that what a supermarket usually delivers is groceries.
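Both the frame-level preferences of Section 3.4 and the synset-level labels above reduce to the same operation: checking whether a preference synset occurs among a candidate filler's hypernyms. A minimal sketch with a toy taxonomy standing in for WordNet (the synset names are illustrative):

```python
# Toy hypernym taxonomy standing in for WordNet; each synset points
# to its (single, for simplicity) direct hypernym.
hypernym = {
    "cake.n.03": "baked_goods.n.01",
    "baked_goods.n.01": "food.n.02",
    "banana.n.02": "edible_fruit.n.01",
    "edible_fruit.n.01": "food.n.02",
    "food.n.02": "solid.n.01",
    "rock.n.01": "material.n.01",
}

def satisfies_preference(synset: str, preference: str) -> bool:
    """Walk the hypernym chain upward looking for the preference synset."""
    while synset in hypernym:
        synset = hypernym[synset]
        if synset == preference:
            return True
    return False

# Is "cake" a prototypical Patient of EAT (preference: {food, solid food})?
print(satisfies_preference("cake.n.03", "food.n.02"))
print(satisfies_preference("rock.n.01", "food.n.02"))
```

Note that in VerbAtlas these are preferences, not restrictions: a candidate failing the check (for example a metaphorical Patient) is merely dispreferred, not excluded, as Section 4.5 discusses.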

We are aware of the problems raised about Pustejovsky's framework of analysis (Fodor and Lepore, 1998), but we believe that this new information, if properly exploited, would be fruitful for better meaning representations.

EAT     CLASSIFY    SWITCH
KILL    CLEAN       MOVE-SOMETHING
DRINK   AMELIORATE  JUDGE
POUR    TRANSPORT   PERFORM
OBTAIN  BEHAVE      STOP
GIVE    HIT         COMBINE

Table 2: Frames with the highest number of synsets.

3.6 Linkage to PropBank and multilingual resources

To allow a straightforward applicability to, and evaluation of, VerbAtlas on semantic tasks based on PropBank, such as SRL, each frameset and roleset in the CoNLL-2009 dataset was mapped to the corresponding VerbAtlas frame and roleset. This mapping was done starting from the Predicate Matrix (De Lacalle et al., 2014) and then manually corrected and augmented. Finally, we aligned PropBank's roles with the corresponding roles in the VerbAtlas PAS (e.g., Arg0 and Arg1 of PropBank's eat.01 frameset were mapped, respectively, to the VerbAtlas PAS roles Agent and Patient of the frame EAT).

Moreover, thanks to the use of WordNet and its semantic nature, VerbAtlas can easily scale to arbitrary languages. This can be achieved by leveraging BabelNet (Navigli and Ponzetto, 2010), a lexical-semantic resource that provides multilingual synsets in 284 different languages linked to WordNet itself. As a result, VerbAtlas can be used in virtually any language, in contrast to PropBank and other resources which are inherently language-specific and require considerable human intervention for each new language.

Finally, the above implies that if there is a verbal repository for one of the languages linked to the aforementioned resources, its argument structures can be seamlessly aligned with the PAS of VerbAtlas, as well as with VerbAtlas frames.

4 Methodology

4.1 Bottom-up approach

The manual construction of VerbAtlas was performed in a bottom-up fashion. Rather than forcing synsets into a predetermined set of frames, we started the clustering process from the full inventory of 13,767 WordNet synsets, which represent the building blocks of VerbAtlas frames. This allowed us to induce the semantics of 466 frames (with an average of 29.5 synsets per frame) and make VerbAtlas consistent both in terms of lexical semantics and the frames' argument structures.

4.2 Creation of frames

VerbAtlas frames were induced via semantic similarity between synsets. During multiple iterations over the verb inventory, if two or more synsets were perceived as similar, namely, if they shared features like the purpose of the action and the participants in the action (Hill et al., 2015) (e.g., "dine" and "lunch"), they were clustered together to form a new semantically-coherent frame. A one-synset-one-frame strategy was used to avoid any future mapping problem with other resources.

The process is based on synset-by-synset human inspection. Two synsets are clustered together in the same frame if they express semantically-similar scenarios. For example, the verbs "kill" and "slaughter" share similar participants in the action (an Agent who kills/slaughters; a Patient who is killed/slaughtered) and purpose (the Agent makes the Patient die), so they are clustered into the KILL frame.

At the end of the first iteration, we checked the resulting frames and named them according to the common action implied by the synsets contained therein. For example, the EAT frame (Figure 1(b)) is composed of synsets that depict different kinds of action implying eating, like {devour, . . . , pig} and {gorge, . . . , glut}. In Table 2 we report the frames with the highest number of verb synsets.

To validate the resulting frames we adopted a strategy similar to Hovy et al. (2006): we provided 3 linguists not involved in the clustering with a random sample of 1000 frame-synset pairs and asked them whether the action expressed by the synset was implied by that expressed by the frame. We iterated over the inventory various times and moved the synsets from one frame to another until the Cohen's Kappa coefficient (Eugenio and Glass, 2004) of their (yes or no) agreement was κ ≥ 0.80. Once this value was attained, we finalized the overall clustering and the resulting frames.

4.3 Establishing explicit semantic roles

Inspired by existing research on semantic roles (Bonial et al., 2011; Allen and Teng, 2018),

VerbAtlas implements VerbNet's human-readable role labels across its frames and merges together some VerbNet roles which can be seen as complementary. For example, Initial_Location and Initial_State are subsumed by Source. The hunch is that we can consider them as playing the same role but in different scenarios: the former is the Source for verbs of change of location, while the latter is the Source for verbs of change of state. Furthermore, in VerbAtlas we did not want to use poorly instantiated roles (e.g., Affector), therefore 14 of the roles (Table 4) were subsumed by coarser and more common roles (e.g., Agent in place of Affector, see Table 3).

Agent        Material
Attribute    Patient
Beneficiary  Product
Cause        Purpose
Co-Agent     Recipient
Co-Patient   Result
Co-Theme     Source
Destination  Stimulus
Experiencer  Theme
Extent       Time
Goal         Topic
Instrument   Value
Location

Table 3: List of VerbNet roles which make up the VerbAtlas role inventory.

Affector          Manner
Axis              Path
Context           Pivot
Duration          Precondition
Final_Time        Predicate
Initial_Location  Reflexive
Initial_State     Trajectory

Table 4: List of VerbNet roles unused in VerbAtlas.

4.4 Rationale of a Prototypical Argument Structure

We defined as "prototypical" an argument structure capable of being applied to all the synsets in a frame. To achieve this, each PAS was defined at the end of the first iteration over the inventory of synsets and constantly adjusted until the last iteration. The initial PAS was inspired by the argument structure of the common verbal concept implied by the synsets constituting the frame: for instance, in the case of BETRAY, Agent and Patient. The PAS was later expanded due to synsets inside the frame that projected additional arguments (e.g., the Goal role of {defect, desert}, as in the sentence "The reporter defected to another network").

4.5 Rationale of selectional preferences

Each semantic role of a PAS was provided with selectional preferences from a set of 116 synsets. The set was created by generalizing across the arguments in the example sentences of each synset in a given frame. The result for each PAS argument was one or more hypernyms in the WordNet taxonomy which were common across the frame synsets.

In contrast to VerbNet selectional restrictions, the selectional preferences in VerbAtlas do not restrict the occurrence of words with a particular feature (e.g., solid, liquid, etc.); rather, they suggest the most prototypical hypernym(s) for a given argument. We opted for preferences instead of restrictions so as not to exclude the potential metaphorical use of a verb.

5 Experiments

In the previous sections we discussed the qualitative and structural advantages of VerbAtlas compared to existing resources. Because, in principle, a new linguistic resource might seem to provide an arbitrary contribution, we also provide here an experimental validation of its usefulness in an SRL task. First, we present our experimental setup (Section 5.1) and then discuss our results (Section 5.2).

5.1 Experimental setup

We argue that the quality and wealth of information in VerbAtlas can effectively improve the performance of an existing, high-performance SRL system like Cai et al. (2018).

Datasets: We performed our experiments on the English part of the CoNLL-2009 in-domain and out-of-domain datasets (Hajič et al., 2009), designed for the dependency-based SRL task and whose annotations concern predicate argument structures from PropBank and nominal argument structures from NomBank (Meyers et al., 2004). The dataset consists of 39,279 sentences and 958,167 tokens (18.7% of which are argument bearers).

Through this experiment we aim to show that the additional information provided by VerbAtlas frames and semantic roles improves the performance of a neural network on the in-domain test set. We also aim to demonstrate a better ability to generalize on the out-of-domain dataset.
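The role-merging step of Section 4.3 amounts to a mapping from VerbNet's 39 labels onto the 25 retained roles: subsumed roles (Table 4) are rewritten to their coarser counterparts (Table 3). A sketch covering only the pairs explicitly named in the text; the full mapping is not reproduced here:

```python
# Fine-grained VerbNet roles subsumed by coarser VerbAtlas roles.
# Only the subsumptions mentioned in Section 4.3 are listed.
SUBSUMED = {
    "Initial_Location": "Source",  # Source for verbs of change of location
    "Initial_State": "Source",     # Source for verbs of change of state
    "Affector": "Agent",           # poorly instantiated role, coarsened
}

def to_verbatlas_role(verbnet_role: str) -> str:
    # Roles not explicitly subsumed are kept as-is.
    return SUBSUMED.get(verbnet_role, verbnet_role)

for role in ("Initial_State", "Affector", "Patient"):
    print(role, "->", to_verbatlas_role(role))
```

A plain dictionary suffices because the subsumption is many-to-one and total: every VerbNet label either has a coarser replacement or survives unchanged.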

Figure 2: Overview of the model architecture. From bottom to top, a sequence of word embeddings is fed to a densely-connected BiLSTM encoder where the output of each encoding layer is concatenated with its input. The predicate hidden representation from the second BiLSTM layer is used for VerbAtlas frame disambiguation (right), whereas the predicate hidden representation from the fourth BiLSTM layer is used for PropBank predicate disambiguation (left). The topmost output of the BiLSTM encoder is used together with the output of the PropBank predicate disambiguation layer to obtain a PropBank-style SRL output, and together with the output of the VerbAtlas frame disambiguation layer to obtain a VerbAtlas-style SRL output.

Baseline model: Our baseline model in Figure 2 is built on top of the syntax-agnostic model proposed by Cai et al. (2018) in that it is mainly composed of a word representation layer, a sequence encoder and a biaffine attentional role scorer. A key difference is that our model features a multi-output layer that returns PropBank (PB) and NomBank (NB) labels (i.e., framesets and their roles) and, if the predicate is a verb, also VerbAtlas labels (i.e., frames and their roles). With this design choice, the output of our model can be directly compared to the output of previous SRL models. At the same time, our model achieves a deeper and more general understanding of the relations between a PB or NB predicate and its arguments by learning from the corresponding VerbAtlas frame and its semantically-coherent roles.

Formally, our model is built on top of the following components:

• A word representation layer that, given a sentence s = ⟨w_1, w_2, ..., w_n⟩, builds a sequence of word representations x = ⟨x_1, x_2, ..., x_n⟩ where x_i is an embedding representing w_i. Each x_i is the result of the concatenation of a pre-trained word embedding e^pt and the following trainable vectors: a word embedding e^w, a lemma embedding e^l, a POS embedding e^pos, and a predicate lemma embedding e^pred (active only if w_i is a predicate). Formally: x_i = e^pt ⊕ e^w ⊕ e^l ⊕ e^pos ⊕ e^pred. We use GloVe embeddings (Pennington et al., 2014) as our underlying pre-trained word embeddings.

• A densely-connected BiLSTM encoder that, given a sequence x of word representations, returns a sequence of encodings y = ⟨y_i = BiLSTM(x_i; x) : ∀i ∈ {1, ..., n}⟩, where y_i is a dynamic representation of x_i with respect to the context defined by the whole sequence x. In a densely-connected BiLSTM encoder, the output of each layer is concatenated with the input of the same layer to mitigate the vanishing gradient problem. If h^k_i is the encoding of the k-th layer for x_i, then h^{k+1}_i = h^k_i ⊕ LSTM^f(h^k_i; h^k_{1:i-1}) ⊕ LSTM^b(h^k_i; h^k_{i+1:n}), and y_i = h^m_i, where m = 6 is the final BiLSTM layer, while LSTM^f and LSTM^b are the forward and backward LSTM transformations.

• A frame disambiguation layer that, given the BiLSTM encoding y^i_pred of a predicate w_pred at the i-th encoder layer (with i = 2), disambiguates w_pred with a VerbAtlas frame f, returning a trainable frame embedding e^f:

  h^i_pred = ReLU(W′ · y^i_pred + b′)
  f = argmax(W^f · h^i_pred + b^f)

• A predicate disambiguation layer that, given the BiLSTM encoding y^j_pred of a predicate w_pred at the j-th encoder layer (with j = 4) and the frame embedding e^f, disambiguates w_pred with a PB or NB frameset p, returning a trainable predicate embedding e^p:

  h^j_pred = ReLU(W″ · (y^j_pred ⊕ e^f) + b″)
  p = argmax(W^p · h^j_pred + b^p)

• A biaffine attentional PB and NB role scorer that, given a BiLSTM encoding y_i for a word w_i and a predicate embedding e^p, returns a vector s^p_i of PB/NB role scores for w_i with respect to p, in a similar fashion to Cai et al. (2018). Formally:

  s^p_i = y_i^⊤ · W^role_PB · e^p + U^role_PB · (y_i ⊕ e^p) + b^role_PB

• A biaffine attentional VerbAtlas role scorer that, given a BiLSTM encoding y_i for a word w_i and a frame embedding e^f, returns a vector s^f_i of VerbAtlas role scores for w_i with respect to f. Formally:

  s^f_i = y_i^⊤ · W^role_VA · e^f + U^role_VA · (y_i ⊕ e^f) + b^role_VA

We found it beneficial to interleave the frame disambiguation layer and the predicate disambiguation layer within the BiLSTM encoder layers, enhancing the input of the upper encoder layers with the corresponding frame and predicate embeddings. Finally, our model loss is defined as:

  l_total = l_PropBank/NomBank roles + l_VerbAtlas roles + l_predicate disambiguation + l_frame disambiguation

Since the choice of hyperparameters for an LSTM-based model can significantly affect results (Reimers and Gurevych, 2017), we trained our baseline using the same values as in Cai et al. (2018) unless otherwise stated.

5.2 Results

In-domain SRL: Table 5 reports the results of our syntax-agnostic baseline model on the CoNLL-2009 in-domain test set. Not only does our model outperform the syntax-agnostic model of Cai et al. (2018) by a significant margin (according to a χ² test, p < 0.05, on precision and recall) with a 0.4% F1 improvement, but it also slightly outperforms the syntax-aware model of Li et al. (2018) with a 0.2% F1 improvement. We recall that the main difference between our baseline model and that of Cai et al. (2018) lies in the additional VerbAtlas frame disambiguation layer and role scorer. This demonstrates the boost provided by VerbAtlas in PropBank/NomBank-based SRL.

Syntax-aware system             P     R     F1
Roth and Lapata (2016)          88.1  85.3  86.7
Marcheggiani and Titov (2017)   89.1  86.8  88.0
He et al. (2018)                89.7  89.3  89.5
Li et al. (2018)                90.3  89.3  89.8

Syntax-agnostic system          P     R     F1
Marcheggiani et al. (2017)      88.7  86.8  87.7
He et al. (2018)                89.5  87.9  88.7
Cai et al. (2018)               89.9  89.2  89.6
This work                       90.5  89.5  90.0

Table 5: Results on the English in-domain test.

In-domain predicate disambiguation: Remarkably, as shown in Table 6, our model outperforms the previously reported best scores in the predicate disambiguation subtask, scoring 96.0% in precision for the in-domain test set and reporting a 1.0% improvement over Cai et al. (2018) and a 0.5% improvement over He et al. (2018).

Syntax-agnostic system    P
He et al. (2018)          95.5
Cai et al. (2018)         95.0
This work                 96.0

Table 6: Predicate disambiguation results on the English in-domain test. In CoNLL-2009 predicates are already identified, hence we only report predicate disambiguation precision.

Out-of-domain SRL: The results of our baseline model on the out-of-domain SRL CoNLL-2009 dataset are shown in Table 7. Our system improves the results reported by Cai et al. (2018) by 0.7% F1, proving that using VerbAtlas improves the model's ability to generalize across domains.

Syntax-agnostic system        P     R     F1
Marcheggiani et al. (2017)    79.4  76.2  77.7
He et al. (2018)              81.7  76.1  78.8
Cai et al. (2018)             79.8  78.3  79.0
This work                     81.1  78.4  79.7

Table 7: Results on the English out-of-domain test.
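For concreteness, the biaffine scoring function shared by the two role scorers, s = y^⊤ · W_role · e + U_role · (y ⊕ e) + b_role, can be sketched in pure Python with tiny fixed weights. The dimensions and weight values below are illustrative only, not the trained model's:

```python
# Pure-Python sketch of a biaffine attentional scorer:
#   s = y^T · W[r] · e  +  U[r] · (y ⊕ e)  +  b[r]   for each role r,
# where y is a word encoding and e a predicate (or frame) embedding.

def matvec(M, v):
    # Plain matrix-vector product over nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def biaffine_scores(y, e, W, U, b):
    # Bilinear term: one W[r] matrix per role, scored as y^T W[r] e.
    bilinear = [sum(yi * wi for yi, wi in zip(y, matvec(Wr, e))) for Wr in W]
    # Linear term over the concatenation y ⊕ e, plus a per-role bias.
    linear = matvec(U, y + e)
    return [bl + ln + bi for bl, ln, bi in zip(bilinear, linear, b)]

y = [1.0, 0.0]          # toy word encoding (dim 2)
e = [0.5, -0.5]         # toy frame embedding (dim 2)
W = [[[1, 0], [0, 1]],  # role 0
     [[0, 1], [1, 0]]]  # role 1
U = [[1, 1, 1, 1],      # role 0, over y ⊕ e (dim 4)
     [0, 0, 0, 0]]      # role 1
b = [0.0, 0.1]

scores = biaffine_scores(y, e, W, U, b)
print(scores)
```

In the model, W, U and b are learned, y comes from the BiLSTM encoder, and the argmax over the score vector selects the predicted role for the word.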

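The significance claims in this section rest on a χ2 test over the systems' decision counts. A self-contained sketch of such a 2x2 test follows; the counts below are hypothetical, not the paper's data, and the critical value is the standard one for p = 0.05 with one degree of freedom:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 contingency table
    [[a, b], [c, d]], e.g. correct/incorrect decisions of two systems."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts (NOT the paper's data): (correct, incorrect)
# role decisions of two systems on the same test set.
with_va = (9000, 1000)     # e.g. a model with VerbAtlas
without_va = (8900, 1100)  # e.g. a model without VerbAtlas

chi2 = chi_square_2x2(*with_va, *without_va)
CRITICAL_P05_DF1 = 3.841  # chi-square critical value at p = 0.05, 1 d.o.f.
print(f"chi2 = {chi2:.2f}, significant: {chi2 > CRITICAL_P05_DF1}")
```

If the statistic exceeds the critical value, the difference between the two systems is significant at p < 0.05, which is the criterion used throughout this section.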
F1 evaluated on       verbs + nouns   only verbs
This work                  90.0          91.1
without VerbAtlas          89.5          90.5
without PB nor NB            –           90.9

Table 8: The model shows a significant F1 decrease when it is trained without exploiting VerbAtlas frames and roles (second row), falling in line with Cai et al. (2018) (89.5% vs. 89.6%, respectively). On the contrary, removing the PropBank role scorer leads to a negligible performance drop (last row).

5.3 Analysis

Our main experiment proved that a semantics-focused resource for verbal predicates such as VerbAtlas can be successfully employed for CoNLL-like SRL datasets that include a mix of nominal and verbal predicates. But what is the contribution of VerbAtlas (i.e., of its semantically-clustered frames and semantically-coherent roles) to the overall performance? How does a VerbAtlas-only model fare when evaluated solely on verbal predicates against a PropBank-only model? To answer these questions, we evaluated the model in two further settings.

A first study aimed at quantifying the boost in performance the model gets from the use of VerbAtlas. Removing the VerbAtlas frame disambiguation layer and role scorer from our model significantly decreases (according to a χ2 test, p < 0.05, on precision and recall) the overall performance in F1 score by 0.5%, as reported in Table 8 (left column), with results that are comparable to those of Cai et al. (2018) (89.5% vs. 89.6% F1, Tables 8 and 5, respectively). This proves that the model with VerbAtlas achieves a better understanding of semantic roles, thanks to using the VerbAtlas frame disambiguation layer and role scorer.

A second study aimed at comparing PropBank and VerbAtlas on their common ground, i.e., their coverage and organization of verb meanings into framesets and frames, respectively. Table 8 (right column) shows that removing the PropBank biaffine role scorer leads to a negligible performance drop (0.2% F1) in argument labeling of verb predicates, using only the semantic roles of VerbAtlas mapped at the output layer to PropBank roles (see Section 3.6); we note that the higher number of roles in VerbAtlas makes the argument labeling problem potentially harder. In contrast, when VerbAtlas is removed, the drop in performance is noteworthy, with a 0.6% decrease in F1 on SRL of verbal predicates.

6 Conclusions and Future Work

In this paper we presented VerbAtlas, a new large-scale verbal semantic resource which provides generalizing argument structures with cross-frame semantic roles. The resource is available at http://verbatlas.org.

In contrast to other verb repositories, VerbAtlas offers full coverage of English verbs and addresses the issues of current predicate resources, while at the same time providing linkage to WordNet and PropBank. This makes the resource fully compatible with previous datasets and scalable to arbitrary languages thanks to BabelNet.

While the frame creation process resulted in strong agreement between annotators, we further validated the quality of VerbAtlas experimentally by showing that the integration of its frame information together with its explicit semantic roles enables a neural architecture to improve its performance on the Semantic Role Labeling task. This improvement translates across domains, demonstrating the robustness and variety of the knowledge provided in our resource.

As future work, we plan to take full advantage of the novel semantic features available in VerbAtlas, such as wide-coverage selectional preferences and synset-level information, by exploiting them in multilingual SRL and Word Sense Disambiguation tasks. Our plans include integrating the selectional preferences from SyntagNet (Maru et al., 2019), a new, large-scale resource of lexical-semantic combinations. We also plan to extend our methodology to nouns and adjectives, in a similar fashion to O'Gorman et al. (2018), and connect the resulting frames to those in VerbAtlas.

Acknowledgments

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 and the ELEXIS project No. 731015 under the European Union's Horizon 2020 research and innovation programme.

This work was supported in part by the MIUR under grant "Dipartimenti di eccellenza 2018-2022" of the Department of Computer Science of Sapienza University.

References

Eneko Agirre and David Martinez. 2002. Integrating selectional preferences in WordNet. In Proceedings of the First International WordNet Conference, Mysore, India, pages 1–9.

Izaskun Aldezabal, María Jesús Aranzabe, Arantza Díaz de Ilarraza Sánchez, and Ainara Estarrona. 2010. Building the Basque PropBank. In Proceedings of LREC.

James Allen and Choh Man Teng. 2018. Putting semantics into semantic roles. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 235–244.

Olga Babko-Malaya. 2005. PropBank annotation guidelines. URL: http://verbs.colorado.edu.

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics, pages 86–90. Association for Computational Linguistics.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186.

Emanuele Bastianelli, Giuseppe Castellucci, Danilo Croce, and Roberto Basili. 2013. Textual inference and meaning representation in human robot interaction. In Proceedings of the Joint Symposium on Semantic Processing: Textual Inference and Structures in Corpora, pages 65–69.

Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Sharma, and Fei Xia. 2009. A multi-representational and multi-layered treebank for Hindi-Urdu. In Proceedings of the Third Linguistic Annotation Workshop (LAW III), pages 186–189.

Francis Bond and Ryan Foster. 2013. Linking and extending an open multilingual Wordnet. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1352–1362.

Claire Bonial, William Corvey, Martha Palmer, Volha V. Petukhova, and Harry Bunt. 2011. A hierarchical unification of LIRICS and VerbNet semantic roles. In 2011 IEEE Fifth International Conference on Semantic Computing, pages 483–489. IEEE.

Claire Bonial, Kevin Stowe, and Martha Palmer. 2013. Renewing and revising SemLink. In Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and Linking Lexicons, Terminologies and Other Language Data, pages 9–17.

Jiaxun Cai, Shexia He, Zuchao Li, and Hai Zhao. 2018. A full end-to-end semantic role labeler, syntactic-agnostic over syntactic-aware? In Proceedings of the 27th International Conference on Computational Linguistics, pages 2753–2765.

Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning, CONLL '05, pages 152–164, Stroudsburg, PA, USA. Association for Computational Linguistics.

Maddalen Lopez De Lacalle, Egoitz Laparra, and German Rigau. 2014. Predicate Matrix: extending SemLink through WordNet mappings. In Proceedings of LREC, pages 903–909.

David Dowty. 1991. Thematic proto-roles and argument selection. Language, 67(3):547–619.

Magali Sanches Duran and Sandra Maria Aluísio. 2011. PropBank-Br: a Brazilian Portuguese corpus annotated with semantic role labels. In Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology.

Barbara Di Eugenio and Michael Glass. 2004. The kappa statistic: A second look. Computational Linguistics, 30(1):95–101.

Christiane Fellbaum et al. 1998. WordNet: An Electronic Database. MIT Press, Cambridge, MA.

Charles J. Fillmore. 1976. Frame semantics and the nature of language. Annals of the New York Academy of Sciences, 280(1):20–32.

Charles J. Fillmore. 1982. Frame semantics. Linguistics in the Morning Calm, pages 111–137.

Jerry A. Fodor and Ernie Lepore. 1998. The emptiness of the lexicon: reflections on James Pustejovsky's The Generative Lexicon. Linguistic Inquiry, 29(2):269–288.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, et al. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–18. Association for Computational Linguistics.

Silvana Hartmann, Ilia Kuznetsov, Teresa Martin, and Iryna Gurevych. 2017. Out-of-domain FrameNet semantic role labeling. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 471–482.

Katri Haverinen, Jenna Kanerva, Samuel Kohonen, Anna Missilä, Stina Ojala, Timo Viljanen, Veronika Laippala, and Filip Ginter. 2015. The Finnish proposition bank. Language Resources and Evaluation, 49(4):907–926.

Shexia He, Zuchao Li, Hai Zhao, and Hongxiao Bai. 2018. Syntax for semantic role labeling, to be, or not to be. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2061–2071.

Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41(4):665–695.

Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. OntoNotes: the 90% solution. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 57–60. Association for Computational Linguistics.

Karin Kipper-Schuler. 2005. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. University of Pennsylvania.

Beth Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press.

Zuchao Li, Shexia He, Jiaxun Cai, Zhuosheng Zhang, Hai Zhao, Gongshen Liu, Linlin Li, and Luo Si. 2018. A unified syntax-aware framework for semantic role labeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2401–2411.

Ding Liu and Daniel Gildea. 2010. Semantic role features for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 716–724.

Diego Marcheggiani, Anton Frolov, and Ivan Titov. 2017. A simple and accurate syntax-agnostic neural model for dependency-based semantic role labeling. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 411–420.

Marco Maru, Federico Scozzafava, Federico Martelli, and Roberto Navigli. 2019. SyntagNet: Challenging supervised Word Sense Disambiguation with lexical-semantic combinations. In Proceedings of EMNLP-IJCNLP 2019.

Adam Meyers, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, Brian Young, and Ralph Grishman. 2004. The NomBank project: An interim report. In Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004.

Roberto Navigli. 2018. Natural Language Understanding: Instructions for (present and future) use. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI 2018), pages 5697–5702, Stockholm, Sweden.

Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225. Association for Computational Linguistics.

Tim O'Gorman, Sameer Pradhan, Martha Palmer, Julia Bonn, Kathryn Conger, and James Gung. 2018. The new PropBank: Aligning PropBank with AMR through POS unification. In Proceedings of LREC.

Martha Palmer. 2009. SemLink: Linking PropBank, VerbNet and FrameNet. In Proceedings of the Generative Lexicon Conference, pages 9–15, Pisa, Italy. GenLex-09.

Martha Palmer, Olga Babko-Malaya, Ann Bies, Mona T. Diab, Mohamed Maamouri, Aous Mansouri, and Wajdi Zaghouani. 2008. A pilot Arabic PropBank. In Proceedings of LREC.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, pages 1532–1543. ACL.

James Pustejovsky. 1995. The Generative Lexicon. MIT Press, Cambridge, MA.

Nils Reimers and Iryna Gurevych. 2017. Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9–11, 2017, pages 338–348.

Gözde Gül Şahin and Eşref Adalı. 2018. Annotation of semantic roles for the Turkish proposition bank. Language Resources and Evaluation, 52(3):673–706.

Carina Silberer and Manfred Pinkal. 2018. Grounding semantic roles in images. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2616–2626.

Mariona Taulé, Maria Antònia Martí, and Marta Recasens. 2008. AnCora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of LREC.

Nianwen Xue and Martha Palmer. 2003. Annotating the propositions in the Penn Chinese Treebank. In Proceedings of the Second Workshop on Chinese Language Processing, SIGHAN 2003, Sapporo, Japan, July 11–12, 2003.
