SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA

Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport, Omri Abend
School of Computer Science and Engineering, Hebrew University of Jerusalem
{danielh,zohara,borgr,eliors,arir,oabend}@cs.huji.ac.il

Abstract

We present the SemEval 2019 shared task on Universal Conceptual Cognitive Annotation (UCCA) parsing in English, German and French, and discuss the participating systems and results. UCCA is a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. The shared task has yielded improvements over the state-of-the-art baseline in all languages and settings. Full results can be found on the task's website: https://competitions.codalab.org/competitions/19160.

[Figure 1: An example UCCA graph, for the sentence "After graduation, John moved to Paris". Diagram omitted.]

1 Overview

Semantic representation is receiving growing attention in NLP in the past few years, and many proposals for semantic schemes have recently been put forth. Examples include Abstract Meaning Representation (AMR; Banarescu et al., 2013), Broad-coverage Semantic Dependencies (SDP; Oepen et al., 2016), Universal Decompositional Semantics (UDS; White et al., 2016), Parallel Meaning Bank (Abzianidze et al., 2017), and Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013). These advances in semantic representation, along with corresponding advances in semantic parsing, can potentially benefit essentially all text understanding tasks, and have already demonstrated applicability to a variety of tasks, including summarization (Liu et al., 2015; Dohare and Karnick, 2017), paraphrase detection (Issa et al., 2018), and semantic evaluation (using UCCA; see below). In this shared task, we focus on UCCA parsing in multiple languages.

One of our goals is to benefit semantic parsing in languages with fewer annotated resources by making use of data from more resource-rich languages. We refer to this approach as cross-lingual parsing, while other works (Zhang et al., 2017, 2018) define cross-lingual parsing as the task of parsing text in one language to a meaning representation in another language.

In addition to its potential applicative value, work on semantic parsing poses interesting algorithmic and modeling challenges, which are often different from those tackled in syntactic parsing, including reentrancy (e.g., for sharing arguments across predicates), and the modeling of the interface with lexical semantics.

UCCA is a cross-linguistically applicable semantic representation scheme, building on the established Basic Linguistic Theory typological framework (Dixon, 2010b,a, 2012). It has demonstrated applicability to multiple languages, including English, French and German, and pilot annotation projects were conducted on a few more languages. UCCA structures have been shown to be well-preserved in translation (Sulem et al., 2015), and to support rapid annotation by non-experts, assisted by an accessible annotation interface (Abend et al., 2017).[1] UCCA has already shown applicative value for text simplification (Sulem et al., 2018b), as well as for defining semantic evaluation measures for text-to-text generation tasks, including machine translation (Birch et al., 2016), text simplification (Sulem et al., 2018a) and grammatical error correction (Choshen and Abend, 2018).

[1] https://github.com/omriabnd/UCCA-App

The shared task defines a number of tracks, based on the different corpora and the availability of external resources (see Section 5). It received submissions from eight research groups around the world. In all settings, at least one of the submitted systems improved over the state-of-the-art TUPA parser (Hershcovich et al., 2017, 2018), used as a baseline.

Table 1: The complete set of categories in UCCA's foundational layer.

Scene Elements
  P  Process         The main relation of a Scene that evolves in time (usually an action or movement).
  S  State           The main relation of a Scene that does not evolve in time.
  A  Participant     A Scene participant (including locations, abstract entities and Scenes serving as arguments).
  D  Adverbial       A secondary relation in a Scene.

Elements of Non-Scene Units
  C  Center          Necessary for the conceptualization of the parent unit.
  E  Elaborator      A non-Scene relation applying to a single Center.
  N  Connector       A non-Scene relation applying to two or more Centers, highlighting a common feature.
  R  Relator         All other types of non-Scene relations: (1) Rs that relate a C to some super-ordinate relation, and (2) Rs that relate two Cs pertaining to different aspects of the parent unit.

Inter-Scene Relations
  H  Parallel Scene  A Scene linked to other Scenes by regular linkage (e.g., temporal, logical, purposive).
  L  Linker          A relation between two or more Hs (e.g., "when", "if", "in order to").
  G  Ground          A relation between the speech event and the uttered Scene (e.g., "surprisingly").

Other
  F  Function        Does not introduce a relation or participant. Required by some structural pattern.

2 Task Definition

UCCA represents the semantics of linguistic utterances as directed acyclic graphs (DAGs), where terminal (childless) nodes correspond to the text tokens, and non-terminal nodes to semantic units that participate in some super-ordinate relation. Edges are labeled, indicating the role of a child in the relation the parent represents. Nodes and edges belong to one of several layers, each corresponding to a "module" of semantic distinctions.

UCCA's foundational layer covers the predicate-argument structure evoked by predicates of all grammatical categories (verbal, nominal, adjectival and others), the inter-relations between them, and other major linguistic phenomena such as semantic heads and multi-word expressions. It is the only layer for which annotated corpora exist at the moment, and is thus the target of this shared task. The layer's basic notion is the Scene, describing a state, action, movement or some other relation that evolves in time. Each Scene contains one main relation (marked as either a Process or a State), as well as one or more Participants. For example, the sentence "After graduation, John moved to Paris" (Figure 1) contains two Scenes, whose main relations are "graduation" and "moved". "John" is a Participant in both Scenes, while "Paris" is a Participant only in the latter. Further categories account for inter-Scene relations and the internal structure of complex arguments and relations (e.g., coordination and multi-word expressions). Table 1 provides a concise description of the categories used by the UCCA foundational layer.

UCCA distinguishes primary edges, corresponding to explicit relations, from remote edges (which appear dashed in Figure 1) that allow a unit to participate in several super-ordinate relations. Primary edges form a tree in each layer, whereas remote edges enable reentrancy, forming a DAG.

UCCA graphs may contain implicit units with no correspondent in the text. Figure 2 shows the annotation for the sentence "A similar technique is almost impossible to apply to other crops, such as cotton, soybeans and rice."[2] It includes a single Scene, whose main relation is "apply", a secondary relation "almost impossible", as well as two complex arguments: "a similar technique" and the coordinated argument "such as cotton, soybeans, and rice." In addition, the Scene includes an implicit argument, which represents the agent of the "apply" relation.

[2] The same example was used by Oepen et al. (2015) to compare different semantic dependency schemes.
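To make these structures concrete, the following minimal Python sketch (ours, for illustration only; the class names and node identifiers are invented and this is not the official UCCA codebase) encodes the Figure 1 graph, with the remote Participant edge marked explicitly. "to Paris" is simplified here to a single terminal-level Participant; in the full graph it is a unit with an internal Relator and Center.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        nid: str
        text: Optional[str] = None  # set only for terminals (tokens)

    @dataclass
    class Edge:
        parent: str
        child: str
        label: str            # UCCA category: P, S, A, D, C, E, N, R, H, L, G, F
        remote: bool = False  # remote edges turn the primary tree into a DAG

    # Nodes: one root, two Scene units, and the terminals.
    nodes = {nid: Node(nid, text) for nid, text in [
        ("root", None), ("scene1", None), ("scene2", None),
        ("t1", "After"), ("t2", "graduation"), ("t3", "John"),
        ("t4", "moved"), ("t5", "Paris"),
    ]}

    edges = [
        Edge("root", "t1", "L"),                 # "After" links the two Scenes
        Edge("root", "scene1", "H"),             # Parallel Scene: "graduation"
        Edge("root", "scene2", "H"),             # Parallel Scene: "John moved to Paris"
        Edge("scene1", "t2", "P"),               # main relation of the first Scene
        Edge("scene1", "t3", "A", remote=True),  # remote Participant (dashed in Figure 1)
        Edge("scene2", "t3", "A"),
        Edge("scene2", "t4", "P"),
        Edge("scene2", "t5", "A"),               # simplified: "to" (R) + "Paris" (C) in the full graph
    ]

    # Primary edges alone form a tree; adding the remote edge yields a DAG.
    primary = [e for e in edges if not e.remote]
    assert len(primary) == len(nodes) - 1  # tree property: n-1 primary edges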

[Figure 2: UCCA example with an implicit unit, for the sentence "A similar technique is almost impossible to apply to other crops, such as cotton, soybeans and rice." Diagram omitted.]

While parsing technology is well-established for syntactic parsing, UCCA has several formal properties that distinguish it from syntactic representations, most notably UCCA's tendency to abstract away from syntactic detail that does not affect argument structure. For instance, consider the following examples, where the concept of a Scene has a different rationale from the syntactic concept of a clause. First, non-verbal predicates in UCCA are represented like verbal ones, such as when they appear in copula clauses or noun phrases. Indeed, in Figure 1, "graduation" and "moved" are considered separate Scenes, despite appearing in the same clause. Second, in the same example, "John" is marked as a (remote) Participant in the graduation Scene, despite not being explicitly mentioned. Third, consider the possessive construction in "John's trip home". While in UCCA "trip" evokes a Scene in which "John" is a Participant, a syntactic scheme would analyze this phrase similarly to "John's shoes".

The differences in the challenges posed by syntactic parsing and UCCA parsing, and more generally by semantic parsing, motivate the development of targeted parsing technology to tackle them.

3 Data & Resources

All UCCA corpora are freely available.[3] For English, we use v1.2.3 of the Wikipedia UCCA corpus (Wiki) and v1.2.2 of the UCCA Twenty Thousand Leagues Under the Sea English-French parallel corpus (20K), which includes UCCA manual annotation for the first five chapters in French and English. For German, we use v1.0.1 of the UCCA German Twenty Thousand Leagues Under the Sea corpus, which includes the entire book. For consistent annotation, we replace any Time and Quantifier labels with Adverbial and Elaborator in these data sets. The resulting training, development[4] and test[5] sets are publicly available, and the splits are given in Table 2. Statistics on various structural properties are given in Table 3.

[3] https://github.com/UniversalConceptualCognitiveAnnotation
[4] http://bit.ly/semeval2019task1traindev
[5] http://bit.ly/semeval2019task1test

The corpora were manually annotated according to v1.2 of the UCCA guidelines,[6] and reviewed by a second annotator. All data was passed through automatic validation and normalization scripts.[7] The goal of validation is to rule out cases that are inconsistent with the UCCA annotation guidelines. For example, a Scene, defined by the presence of a Process or a State, should include at least one Participant (a minimal sketch of this rule follows Table 2 below).

[6] http://bit.ly/semeval2019task1guidelines
[7] https://github.com/huji-nlp/ucca/tree/master/scripts

Due to the small amount of annotated data available for French, we only provided a minimal training set of 15 sentences, in addition to the development and test sets. Systems for French were expected to pursue semi-supervised approaches, such as cross-lingual learning or structure projection, leveraging the parallel nature of the corpus, or to rely on datasets for related formalisms, such as Universal Dependencies (Nivre et al., 2016). The full unannotated 20K Leagues corpus in English and French was released as well, in order to facilitate cross-lingual approaches.

Datasets were released in an XML format, including tokenized text automatically preprocessed using spaCy (see Section 5), and gold-standard UCCA annotation for the train and development sets.[8] To facilitate the use of existing NLP tools, we also released the data in SDP, AMR, CoNLL-U and plain text formats.

[8] https://github.com/UniversalConceptualCognitiveAnnotation/docs/blob/master/FORMAT.md

Table 2: Data splits of the corpora used for the shared task.

                train/trial        dev               test              total
corpus          sent.   tokens     sent.   tokens    sent.   tokens    passages  sent.   tokens
English-Wiki    4,113   124,935    514     17,784    515     15,854    367       5,142   158,573
English-20K     0       0          0       0         492     12,574    154       492     12,574
French-20K      15      618        238     6,374     239     5,962     154       492     12,954
German-20K      5,211   119,872    651     12,334    652     12,325    367       6,514   144,531
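As an illustration of the kind of check the validation scripts perform, the following sketch (ours, not the actual validation code linked in footnote 7) states the rule above over simple (parent, child, label) triples:

    # Illustrative sketch of one validation rule from this section: a Scene,
    # defined by the presence of a Process (P) or State (S) child, should
    # include at least one Participant (A). Not the project's actual code.
    from collections import defaultdict

    def scenes_missing_participants(edges):
        """edges: iterable of (parent, child, label) triples."""
        labels_under = defaultdict(set)
        for parent, _child, label in edges:
            labels_under[parent].add(label)
        return [unit for unit, labels in labels_under.items()
                if labels & {"P", "S"} and "A" not in labels]

    # A Scene with a main relation but no Participant fails the check:
    print(scenes_missing_participants([("s1", "t2", "P")]))  # -> ['s1']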

Table 3: Statistics of the corpora used for the shared task.

                   En-Wiki   En-20K   Fr-20K   De-20K
# passages         367       154      154      367
# sentences        5,141     492      492      6,514
# tokens           158,739   12,638   13,021   144,529
# non-terminals    62,002    4,699    5,110    51,934
% discontinuous    1.71      3.19     4.64     8.87
% reentrant        1.84      0.89     0.65     0.31
# edges            208,937   16,803   17,520   187,533
% primary          97.40     96.79    97.02    97.32
% remote           2.60      3.21     2.98     2.68
by category
% Participant      17.17     18.1     17.08    19.86
% Center           18.74     16.31    18.03    14.32
% Adverbial        3.65      5.25     4.18     5.67
% Elaborator       18.98     18.06    18.65    14.88
% Function         3.38      3.58     2.58     2.98
% Ground           0.03      0.56     0.37     0.57
% Parallel Scene   6.02      6.3      6.15     7.54
% Linker           2.19      2.66     2.57     2.49
% Connector        1.26      0.93     0.84     0.65
% Process          7.1       7.51     6.91     7.03
% Relator          8.58      8.09     9.6      7.54
% State            1.62      2.1      1.88     3.34
% Punctuation      11.28     10.55    11.16    13.15

4 TUPA: The Baseline Parser

We use the TUPA parser, the only parser for UCCA at the time the task was announced, as a baseline (Hershcovich et al., 2017, 2018). TUPA is a transition-based DAG parser based on a BiLSTM classifier.[9] TUPA has itself been found superior to a number of conversion-based parsers, which parse UCCA by applying existing parsers for other formalisms together with a two-way conversion protocol between the formalisms. It can thus be regarded as a strong baseline for system submissions to the shared task.

[9] https://github.com/huji-nlp/tupa

5 Evaluation

Tracks. Participants in the task were evaluated in four settings:

1. English in-domain setting, using the Wiki corpus.
2. English out-of-domain setting, using the Wiki corpus as training and development data, and 20K Leagues as test data.
3. German in-domain setting, using the 20K Leagues corpus.
4. French setting with no training data, using the 20K Leagues corpus as development and test data.

In order to allow both a fair comparison between systems and the use of hitherto untried resources, we held both an open and a closed track for submissions in the English and German settings. Closed track submissions were only allowed to use the gold-standard UCCA annotation distributed for the task in the target language, and were limited in their use of additional resources. Concretely, the only additional data they were allowed to use is that used by TUPA, which consists of automatic annotations provided by spaCy:[10] POS tags, syntactic dependency relations, and named entity types and spans. In addition, the closed track only allowed the use of word embeddings provided by fastText (Bojanowski et al., 2017)[11] for all languages.

[10] http://spacy.io
[11] http://fasttext.cc

Systems in the open track, on the other hand, were allowed to use any additional resource, such as UCCA annotation in other languages, dictionaries or datasets for other tasks, provided that they did not use any additional gold-standard annotation over the same texts used in the UCCA corpora.[12] In both tracks, we required that submitted systems not be trained on the development data. We only held an open track for French, due to the paucity of training data. The four settings and two tracks result in a total of 7 competitions.

[12] We are not aware of any such annotation, but include this restriction for completeness.

Scoring. The following scores an output graph G1 = (V1, E1) against a gold one, G2 = (V2, E2), over the same sequence of terminals (tokens) W. For a node v in V1 or V2, define yield(v) ⊆ W as its set of terminal descendants. A pair of edges (v1, u1) ∈ E1 and (v2, u2) ∈ E2 with labels (categories) ℓ1, ℓ2 is matching if yield(u1) = yield(u2) and ℓ1 = ℓ2. Labeled precision and recall are defined by dividing the number of matching edges in G1 and G2 by |E1| and |E2|, respectively. F1 is their harmonic mean:

    F1 = 2 · Precision · Recall / (Precision + Recall)

Unlabeled precision, recall and F1 are computed in the same way, but without requiring that ℓ1 = ℓ2 for the edges to match. We evaluate these measures for primary and remote edges separately. For a more fine-grained evaluation, we additionally report precision, recall and F1 on edges of each category.[13]

[13] The official evaluation script, providing both coarse-grained and fine-grained scores, can be found at https://github.com/huji-nlp/ucca/blob/master/scripts/evaluate_standard.py.
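The official script is linked in footnote 13; the following sketch is our own compact restatement of the labeled measure just defined, representing each edge by the yield of its child and its category:

    # Illustrative reimplementation of the score defined above (not the
    # official evaluation script). Each edge (v, u) with label l is given
    # as (frozenset(yield(u)), l), so two edges match iff their child
    # yields and categories are equal.
    from collections import Counter

    def labeled_f1(output_edges, gold_edges):
        """Each argument: list of (yield, label) pairs, where yield is a
        frozenset of terminal positions. Counter intersection counts
        matching edges, handling duplicate yields correctly."""
        out, gold = Counter(output_edges), Counter(gold_edges)
        matching = sum((out & gold).values())
        p = matching / len(output_edges) if output_edges else 0.0
        r = matching / len(gold_edges) if gold_edges else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    def unlabeled_f1(output_edges, gold_edges):
        # Unlabeled variant: drop the category before comparing.
        strip = lambda edges: [(y, None) for y, _ in edges]
        return labeled_f1(strip(output_edges), strip(gold_edges))

    gold = [(frozenset({0, 1}), "A"), (frozenset({2}), "P")]
    out  = [(frozenset({0, 1}), "E"), (frozenset({2}), "P")]
    print(labeled_f1(out, gold), unlabeled_f1(out, gold))  # 0.5 1.0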
The 6 GCN-Sem 65.7 66.4 0 80.9 81.8 0 English-Wiki (open) Process/State distinction proved challenging, and 1 HLT@SUDA 80.5 81.0 58.8 89.7 90.3 60.7 2 CUNY-PekingU 80.0 80.2 66.6 89.4 89.9 67.4 most main relations were identified as the more 3 baseline 73.5 73.9 53.5 85.1 85.7 54.3 3 TüPa 73.5 74.1 42.5 85.3 86.2 43.1 common Process category. The winning system 4 XLangMo 73.1 73.5 53.2 85.1 85.7 53.5 5 DANGNT@UIT. 70.3 71.1 0 81.7 82.6 0 in most tracks (HLT@SUDA) performed better on VNU-HCM almost all categories. Its largest advantage was English-20K (closed) 1 HLT@SUDA 72.7 73.6 31.2 85.2 86.4 32.1 on Parallel Scenes and Linkers, showing was es- 2 baseline 67.2 68.2 23.7 82.2 83.5 24.3 3 CUNY-PekingU 66.9 67.9 27.9 82.3 83.6 29.0 pecially successful at identifying Scene bound- 4 GCN-Sem 62.6 63.7 0 80.0 81.4 0 English-20K (open) aries relative to the other systems, which requires 1 HLT@SUDA 76.7 77.7 39.2 88.0 89.2 41.4 2 CUNY-PekingU 73.9 74.6 45.7 86.4 87.4 48.1 a good understanding of syntax. 3 TüPa 70.9 71.9 29.6 84.4 85.7 30.7 4 XLangMo 69.5 70.4 36.6 83.5 84.6 38.5 5 baseline 68.4 69.4 25.9 82.5 83.9 26.2 German-20K (closed) 8 Discussion 1 HLT@SUDA 83.2 83.8 59.2 92.0 92.6 60.9 2 CUNY-PekingU 79.7 80.2 59.3 90.2 90.9 59.9 3 baseline 73.1 73.6 47.8 85.9 86.7 48.2 The HLT@SUDA system participated in all the 4 GCN-Sem 71.0 72.0 0 85.1 86.2 0 German-20K (open) tracks, obtaining the first place in the six En- 1 HLT@SUDA 84.9 85.4 64.1 92.8 93.4 64.7 glish and German tracks and the second place in 2 CUNY-PekingU 84.1 84.5 66.0 92.3 93.0 66.6 3 baseline 79.1 79.6 59.9 90.3 91.0 60.5 the French open track. The system is based on 4 TüPa 78.1 78.8 40.8 89.4 90.3 41.2 5 XLangMo 78.0 78.4 61.1 89.4 90.1 61.4 the conversion of UCCA graphs into constituency French-20K (open) 1 CUNY-PekingU 79.6 80.0 64.5 89.1 89.6 71.1 trees, marking remote and discontinuous edges for 2 HLT@SUDA 75.2 76.0 43.3 86.0 87.0 45.1 3 XLangMo 65.6 66.6 13.3 81.5 82.8 14.1 recovery. The classification-based recovery of the 4 MaskParse@Deskiñ 65.4 66.6 24.3 80.9 82.5 25.8 5 baseline 48.7 49.6 2.4 74.0 75.3 3.2 remote edges is performed simultaneously with 6 TüPa 45.6 46.4 0 73.4 74.6 0 the constituency parsing in a multi-task learning Table 4: Official F1-scores for each system in each framework. This work, which further connects be- track. Prim.: primary edges, Rem.: remote edges. tween semantic and syntactic parsing, proposes a recovery mechanism that can be applied to other grammatical formalisms, enabling the conversion Other resources included additional unlabeled data of a given formalism to another one for parsing. (TüPa and CUNY-PekingU), a list of multi-word The idea of this system is inspired by the pseudo expressions (MaskParse@Deskiñ), and the Stan- non-projective dependency parsing approach pro- ford parser in the case of [email protected] posed by Nivre and Nilsson(2005). HCM. Only CUNY-PKU used the 20K unlabeled The MaskParse@Deskiñ system only partici- parallel data in English and French. pated to the French open track, focusing on cross- A common trend for many of the systems was lingual parsing. The system uses a semantic tag- the use of cross-lingual projection or transfer ger, implemented with a bidirectional GRU and a (MaskParse@Deskiñ, HLT@SUDA, TüPa, GCN- masking mechanism to recursively extract the in- Sem, CUNY-PKU and XLangMo). This was nec- ner semantic structures in the graph. Multilingual essary for French, and was found helpful for Ger- word embeddings are also used. 
Using the En- man as well (CUNY-PKU). glish and German training data as well as the small French trial data for training, the parser ranked 7 Results fourth in the French open track with a labeled F1 65.4% Table4 shows the labeled and unlabeled F1 for score of , suggesting that this new model primary and remote edges, for each system in each could be useful for low-resource languages. track. Overall F1 (All) is the F1 calculated over The Tüpa system takes a transition-based ap- both primary and remote edges. Full results are proach, building on the TUPA transition system available online.15 and oracle, but modifies its feature representa- Figure3 shows the fine-grained evaluation by tions. Specifically, instead of representing the parser configuration using LSTMs over the par- 15http://bit.ly/semeval2019task1results tially parsed graph, stack and buffer, they use feed- HLT@SUDA baseline Davis CUNY-PekingU [email protected] GCN-Sem 93 92 91 91 91 90 89 87 86 84 83 83 83 82 81 81 81 80 80 78 78 78 77 77 77 77 77 76 76 76 75 74 74 74 74 74 73 73 72 71 71 70 69 69 66 66 64 63 63 63 62 62 62 62 62 60 58 53 50 45 31 30 30 29 29 29 29 23 0 0 0 0
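HLT@SUDA's actual conversion is specified in Jiang et al. (2019); purely to illustrate the general idea referenced above of lowering a DAG to a tree by marking remote edges for later recovery, here is a sketch (all names and the particular encoding are ours):

    # Illustrative only: one way to lower a UCCA DAG to a tree by replacing
    # each remote edge with a placeholder leaf marked for classifier-based
    # recovery (not HLT@SUDA's actual conversion).
    def dag_to_tree(edges):
        """edges: list of (parent, child, label, is_remote) tuples.
        Returns tree edges where remote children become marked leaves."""
        tree = []
        for parent, child, label, is_remote in edges:
            if is_remote:
                # Keep only a marked placeholder; a recovery model later
                # predicts which primary node the placeholder points to.
                tree.append((parent, child + "*", label + "-remote"))
            else:
                tree.append((parent, child, label))
        return tree

    edges = [("s1", "graduation", "P", False), ("s1", "john", "A", True),
             ("s2", "john", "A", False), ("s2", "moved", "P", False)]
    print(dag_to_tree(edges))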

Adverbial Center Connector Elaborator Function Ground Linker ParallelScene Participant Process Relator State

(a) English Wiki (closed)

HLT@SUDA CUNY-PekingU baselines TüPa XLangMo [email protected] 94 93 93 92 91 91 90 90 90 86 85 85 84 84 84 83 83 82 82 82 82 81 81 81 81 81 80 79 78 78 77 77 77 76 76 76 74 74 74 73 73 73 73 73 72 72 67 66 66 66 65 64 64 64 64 63 63 62 61 53 34 33 31 30 29 26 0 0 0 0 0 0

Adverbial Center Connector Elaborator Function Ground Linker ParallelScene Participant Process Relator State

(b) English Wiki (open)

HLT@SUDA baseline CUNY-PekingU GCN-Sem 86 84 84 84 83 80 78 78 77 76 76 75 74 74 73 72 72 71 71 70 70 69 66 65 65 64 64 62 61 56 56 56 54 53 53 52 51 51 48 25 20 18 15 7 4 0 0 0

Adverbial Center Connector Elaborator Function Ground Linker ParallelScene Participant Process Relator State

(c) English 20K (closed)

HLT@SUDA CUNY-PekingU TüPa XLangMo baseline 87 87 86 85 84 84 84 82 81 81 81 80 80 79 78 77 77 76 76 76 76 74 74 74 72 72 72 71 71 71 70 70 70 69 68 67 67 65 60 60 59 58 58 57 54 54 54 54 53 53 29 24 23 22 19 16 6 4 4 0

Adverbial Center Connector Elaborator Function Ground Linker ParallelScene Participant Process Relator State

(d) English 20K (open)

HLT@SUDA CUNY-PekingU baseline GCN-Sem 92 92 90 90 88 87 87 87 85 85 85 84 83 83 82 82 81 81 81 80 80 79 79 78 78 78 77 77 76 76 75 74 73 72 72 71 71 70 69 69 64 56 51 47 42 38 33 0

Adverbial Center Connector Elaborator Function Ground Linker ParallelScene Participant Process Relator State

(e) German 20K (closed)

HLT@SUDA CUNY-PekingU baseline TüPa XLangMo 92 92 91 91 91 90 90 89 89 88 88 87 87 87 87 86 85 85 85 84 84 84 83 82 82 82 82 82 80 80 80 80 80 79 78 78 78 78 77 76 75 75 74 73 73 73 71 71 71 70 69 68 67 67 62 59 54 46 43 27

Adverbial Center Connector Elaborator Function Ground Linker ParallelScene Participant Process Relator State

(f) German 20K (open)

CUNY-PekingU HLT@SUDA XLangMo MaskParse@Deskiñ baseline TüPa 93 88 87 86 85 85 84 83 83 83 80 79 77 76 76 75 72 72 71 71 71 70 69 68 68 68 65 64 64 63 63 63 61 60 59 59 59 57 53 53 53 53 53 50 50 48 46 46 38 36 35 34 34 32 32 25 25 24 22 21 18 17 17 16 13 13 11 6 5 5 0 0

Adverbial Center Connector Elaborator Function Ground Linker ParallelScene Participant Process Relator State

(g) French 20K (open)

The TüPa system takes a transition-based approach, building on the TUPA transition system and oracle, but modifies its feature representations. Specifically, instead of representing the parser configuration using LSTMs over the partially parsed graph, stack and buffer, they use feedforward networks with ELMo contextualized embeddings. The stack and buffer are represented by their top three items. For the partially parsed graph, they extract the rightmost and leftmost parents and children of the respective items, and represent them by the ELMo embedding of their form, the embedding of their dependency heads (for terminals; for non-terminals this is replaced with a learned embedding), and the embeddings of all terminal children. Results are generally on par with the TUPA baseline, and surpass it in the out-of-domain English setting. This suggests that the TUPA architecture may be simplified without compromising performance.

The UC Davis system participated only in the English closed track, where it achieved the second-highest score, on par with TUPA. The proposed parser has an encoder-decoder architecture, where the encoder is a simple BiLSTM encoder for each span of words. The decoder iteratively and greedily traverses the sentence, and attempts to predict span boundaries. The basic algorithm yields an unlabeled contiguous phrase-based tree, but additional modules predict the labels of the spans, discontiguous units (by joining together spans from the contiguous tree under a new node), and remote edges. The work is inspired by Kitaev and Klein (2018), who used similar methods for constituency parsing.

The GCN-Sem system uses a BiLSTM encoder, and predicts bi-lexical semantic dependencies (in the SDP format) using word, token and syntactic dependency parses. The latter are incorporated into the network with a graph convolutional network (GCN). The team participated in the English and German closed tracks, and was not among the highest-ranking teams. However, scores on the UCCA test sets converted to the bi-lexical CoNLL-U format were rather high, implying that the lossy conversion accounts for much of the difference.

The CUNY-PekingU system was based on an ensemble that included variations of the TUPA parser, namely the MLP and BiLSTM models (Hershcovich et al., 2017) and the BiLSTM model with an additional MLP. The system also proposes a way to aggregate the ensemble, using CKY parsing and accounting for remote edges and discontinuous spans. The team participated in all tracks, using additional information in the open tracks, notably synthetic data based on automatically translating annotated texts. Their system ranks first in the French open track.

The [email protected] system participated only in the English Wiki open and closed tracks. The system is based on graph transformations from dependency trees into UCCA, using heuristics to create non-terminal nodes and map the dependency relations to UCCA categories. The manual rules were developed based on the training and development data. As the system converts trees to trees and does not add reentrancies, it does not produce remote edges. While the results are not among the highest-ranking in the task, the primary labeled F1 score of 71.1% in the English open track shows that a rule-based system on top of a leading dependency parser (the Stanford parser) can obtain reasonable results for this task.

9 Conclusion

The task has yielded substantial improvements to UCCA parsing in all settings. Given that the best reported results were achieved with different parsing and learning approaches than the baseline model TUPA (which had been the only available parser for UCCA), the task opens a variety of paths for future improvement. Cross-lingual transfer, which capitalizes on UCCA's tendency to be preserved in translation, was employed by a number of systems and has proven remarkably effective. Indeed, the high scores obtained for French parsing in a low-resource setting suggest that high-quality UCCA parsing can be straightforwardly extended to additional languages, with only a minimal amount of manual labor.

Moreover, given the conceptual similarity between the different semantic representations (Abend and Rappoport, 2017), it is likely that the parsers developed for the shared task will directly contribute to the development of other semantic parsing technology. Such a contribution is facilitated by the conversion scripts available between UCCA and other formats.

Acknowledgments

We are deeply grateful to Dotan Dvir and the UCCA annotation team for their diligent work on the corpora used in this shared task.

This work was supported by the Israel Science Foundation (grant No. 929/17), and by the HUJI Cyber Security Research Center in conjunction with the Israel National Cyber Bureau in the Prime Minister's Office.

References

Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proc. of ACL, pages 228-238.

Omri Abend and Ari Rappoport. 2017. The state of the art in semantic representation. In Proc. of ACL, pages 77-89.

Omri Abend, Shai Yerushalmi, and Ari Rappoport. 2017. UCCAApp: Web-application for syntactic and semantic phrase-based annotation. In Proc. of ACL System Demonstrations, pages 109-114.

Lasha Abzianidze, Johannes Bjerva, Kilian Evang, Hessel Haagsma, Rik van Noord, Pierre Ludmann, Duc-Duy Nguyen, and Johan Bos. 2017. The Parallel Meaning Bank: Towards a multilingual corpus of translations annotated with compositional meaning representations. CoRR, abs/1702.03964.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for sembanking. In Proc. of the Linguistic Annotation Workshop.

Alexandra Birch, Omri Abend, Ondřej Bojar, and Barry Haddow. 2016. HUME: Human UCCA-based evaluation of machine translation. In Proc. of EMNLP, pages 1264-1274.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. TACL, 5:135-146.

Leshem Choshen and Omri Abend. 2018. Reference-less measure of faithfulness for grammatical error correction. In Proc. of NAACL (Short papers), pages 124-129.

Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2017. Word translation without parallel data. arXiv preprint arXiv:1710.04087.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL (to appear).

Robert M. W. Dixon. 2010a. Basic Linguistic Theory: Grammatical Topics, volume 2. Oxford University Press.

Robert M. W. Dixon. 2010b. Basic Linguistic Theory: Methodology, volume 1. Oxford University Press.

Robert M. W. Dixon. 2012. Basic Linguistic Theory: Further Grammatical Topics, volume 3. Oxford University Press.

Shibhansh Dohare and Harish Karnick. 2017. Text summarization using abstract meaning representation. CoRR, abs/1706.01678.

Daniel Hershcovich, Omri Abend, and Ari Rappoport. 2017. A transition-based directed acyclic graph parser for UCCA. In Proc. of ACL, pages 1127-1138.

Daniel Hershcovich, Omri Abend, and Ari Rappoport. 2018. Multitask parsing across semantic representations. In Proc. of ACL, pages 373-385.

Fuad Issa, Marco Damonte, Shay B. Cohen, Xiaohui Yan, and Yi Chang. 2018. Abstract meaning representation for paraphrase detection. In Proc. of NAACL, pages 442-452.

Wei Jiang, Yu Zhang, Zhenghua Li, and Min Zhang. 2019. HLT@SUDA at SemEval-2019 Task 1: UCCA graph parsing as constituent tree parsing. In Proc. of SemEval-2019.

Nikita Kitaev and Dan Klein. 2018. Constituency parsing with a self-attentive encoder. In Proc. of ACL, pages 2676-2686.

Fei Liu, Jeffrey Flanigan, Sam Thomson, Norman Sadeh, and Noah A. Smith. 2015. Toward abstractive summarization using semantic representations. In Proc. of NAACL-HLT, pages 1077-1086.

Weimin Lyu, Sheng Huang, Abdul Rafae Khan, Shengqiang Zhang, Weiwei Sun, and Jia Xu. 2019. CUNY-PKU parser at SemEval-2019 Task 1: Cross-lingual semantic parsing with UCCA. In Proc. of SemEval-2019.

Gabriel Marzinotto, Johannes Heinecke, and Géraldine Damnati. 2019. MaskParse@Deskiñ at SemEval-2019 Task 1: Cross-lingual UCCA semantic parsing with recursive masked sequence tagging. In Proc. of SemEval-2019.

Dang Tuan Nguyen and Trung Tran. 2019. [email protected] at SemEval-2019 Task 1: Graph transformation system from Stanford basic dependencies to Universal Conceptual Cognitive Annotation (UCCA). In Proc. of SemEval-2019.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proc. of LREC.

Joakim Nivre and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proc. of ACL, pages 99-106.

Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, Angelina Ivanova, and Zdeňka Urešová. 2016. Towards comparability of linguistic graph banks for semantic parsing. In Proc. of LREC.

Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, and Zdeňka Urešová. 2015. SemEval 2015 Task 18: Broad-coverage semantic dependency parsing. In Proc. of SemEval, pages 915-926.

Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proc. of NAACL-HLT, pages 2227-2237.

Tobias Pütz and Kevin Glocker. 2019. TüPa at SemEval-2019 Task 1: (Almost) feature-free semantic parsing. In Proc. of SemEval-2019.

Mitchell Stern, Jacob Andreas, and Dan Klein. 2017. A minimal span-based neural constituency parser. In Proc. of ACL, pages 818-827.

Elior Sulem, Omri Abend, and Ari Rappoport. 2015. Conceptual annotations preserve structure across translations: A French-English case study. In Proc. of S2MT, pages 11-22.

Elior Sulem, Omri Abend, and Ari Rappoport. 2018a. Semantic structural evaluation for text simplification. In Proc. of NAACL, pages 685-696.

Elior Sulem, Omri Abend, and Ari Rappoport. 2018b. Simple and effective text simplification using semantic and neural methods. In Proc. of ACL, pages 162-173.

Shiva Taslimipoor, Omid Rohanian, and Sara Može. 2019. GCN-Sem at SemEval-2019 Task 1: Semantic parsing using graph convolutional and recurrent neural networks. In Proc. of SemEval-2019.

Aaron Steven White, Drew Reisinger, Keisuke Sakaguchi, Tim Vieira, Sheng Zhang, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2016. Universal decompositional semantics on Universal Dependencies. In Proc. of EMNLP, pages 1713-1723.

Dian Yu and Kenji Sagae. 2019. UC Davis at SemEval-2019 Task 1: DAG semantic parsing with attention-based decoder. In Proc. of SemEval-2019.

Sheng Zhang, Kevin Duh, and Benjamin Van Durme. 2017. Selective decoding for cross-lingual open information extraction. In Proc. of IJCNLP, pages 832-842.

Sheng Zhang, Xutai Ma, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2018. Cross-lingual decompositional semantic parsing. In Proc. of EMNLP, pages 1664-1675.