Cross-Lingual UCCA Semantic Parsing Using Recursive Masked Sequence Tagging Gabriel Marzinotto, Johannes Heinecke, Geraldine Damnati

MaskParse@Deskin at SemEval-2019 Task 1: Cross-lingual UCCA Semantic Parsing using Recursive Masked Sequence Tagging Gabriel Marzinotto, Johannes Heinecke, Geraldine Damnati To cite this version: Gabriel Marzinotto, Johannes Heinecke, Geraldine Damnati. MaskParse@Deskin at SemEval-2019 Task 1: Cross-lingual UCCA Semantic Parsing using Recursive Masked Sequence Tagging. Proceed- ings of the Thirteenth International Workshop on Semantic Evaluation, Jun 2019, Minneapolis, United States. hal-02298429 HAL Id: hal-02298429 https://hal.archives-ouvertes.fr/hal-02298429 Submitted on 4 Oct 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. MaskParse@Deskin at SemEval-2019 Task 1: Cross-lingual UCCA Semantic Parsing using Recursive Masked Sequence Tagging Gabriel Marzinotto1;2 Johannes Heinecke1 Geraldine´ Damnati1 (1) Orange Labs / Lannion France (2) Aix Marseille Univ, CNRS, LIS / Marseille France fgabriel.marzinotto,johannes.heinecke,[email protected] Abstract glish, French and German (with pilot annotation This paper describes our recursive system for projects on Czech, Russian and Hebrew). De- SemEval-2019 Task 1: Cross-lingual Seman- spite the newness of UCCA, it has proven useful tic Parsing with UCCA. Each recursive step for defining semantic evaluation measures in text- consists of two parts. We first perform seman- to-text generation and machine translation (Birch tic parsing using a sequence tagger to estimate et al., 2016). UCCA represents the semantics of the probabilities of the UCCA categories in the a sentence using directed acyclic graphs (DAGs), sentence. Then, we apply a decoding policy where terminal nodes correspond to text tokens, which interprets these probabilities and builds and non-terminal nodes to higher level semantic the graph nodes. Parsing is done recursively, we perform a first inference on the sentence to units. Edges are labelled, indicating the role of extract the main scenes and links and then we a child in the relation to its parent. UCCA pars- recursively apply our model on the sentence ing is a recent task and since UCCA has several using a masking feature that reflects the deci- unique properties, adapting syntactic parsers or sions made in previous steps. Process contin- parsers from other semantic representations is not ues until the terminal nodes are reached. We straight-forward. Current state of the art parser choose a standard neural tagger and we fo- TUPA (Hershcovich et al., 2017) uses a transition cused on our recursive parsing strategy and on the cross lingual transfer problem to develop based parsing to build UCCA representations. a robust model for the French language, using Building over previous work on FrameNet Se- only few training samples. mantic Parsing (Marzinotto et al., 2018a,b) we chose to perform UCCA parsing using sequence 1 Introduction tagging methods along with a graph decoding pol- Semantic representation is an essential part of icy. To do this we propose a recursive strategy in NLP. For this reason, several semantic represen- which we perform a first inference on the sentence tation paradigms have been proposed. Among to extract the main scenes and links and then we them we find PropBank (Palmer et al., 2005) and recursively apply our model on the sentence with FrameNet Semantics (Baker et al., 1998), Ab- a masking mechanism at the input in order to feed stract Meaning Representation (AMR) (Banarescu information about the previous parsing decisions. et al., 2013), Universal Decompositional Seman- tics (White et al., 2016) and Universal Conceptual 2 Model Cognitive Annotation (UCCA) (Abend and Rap- poport, 2013). These constantly improving rep- Our system consists of a sequence tagger that is resentations, along with the advances in semantic first applied on the sentence to extract the main parsing, have proven to be beneficial in many NLU scenes and links and then it is recursively applied tasks such as Question Answering (Shen and La- on the extracted element to build the semantic pata, 2007), text summarization (Genest and La- graph. At each step of the recursion we use a palme, 2011), dialog systems (Tur et al., 2005), in- masking mechanism to feed information about the formation extraction (Bastianelli et al., 2013) and previous stages into the model. In order to convert machine translation (Liu and Gildea, 2010). the sequence labels into nodes of the UCCA graph UCCA is a cross-lingual semantic representa- we also apply a decoding policy at each stage. tion scheme, has demonstrated applicability in En- Our tagger is implemented using deep bi- directional GRU (biGRU). This simple architec- To train such a model, we build a new training ture is frequently used in semantic parsers across corpus in which the sentences are repeated several different representation paradigms. Besides its times. More precisely, a sentence appears N times flexibility, it is a powerful model, with close to (N being the number of non terminal nodes in the state of the art performance on both PropBank (He UCCA graph) each one a with different mask. et al., 2017) and FrameNet semantic parsing (Yang 2.2 Multi-Task UCCA Objective and Mitchell, 2017; Marzinotto et al., 2018b). More precisely, the model consists of a 4 layer Along with the UCCA-XML graph representa- bi-directional Gated Recurrent Unit (GRU) with tions, a simplified tree representation in CoNLL highway connections (Srivastava et al., 2015). Our format was also provided. Our model combines model uses has a rich set of features including syn- both representations using a multitask objective tactic, morphological, lexical and surface features, with two tasks. TASK1 consists in, for a given which have shown to be useful in language ab- node and its corresponding mask, predicting the stracted representations. The list is given below: children and their arc labels. TASK1 encodes the children spans using a BIO scheme. The • Word embeddings of 300 dimensions 1. TASK2 consists in predicting the CoNLL sim- • Syntactic dependencies of each token2. plified UCCA structure of the sentence. More • Part-of-speech and morphological features precisely, TASK2 is a sequence tagger that pre- such as gender, number, voice and degree2. dicts the UCCA-CoNLL function of each token. TASK2 • Capitalization and word length encoding. is not used for inference purposes. It is only a support that help the model to extract rele- • Prefixes and Suffixes of 2 and 3 characters. vant features, allowing it to model the whole sen- • A language indicator feature. tence even when parsing small pre-terminal nodes. • Boolean indicator of idioms and multi word expression. Detailed in section 3.2. 2.3 Label Encoding • Masking mechanism, which indicates, for a We have previously stated that TASK1 uses BIO given node in the graph, the tokens within the encoded labels to model the structure of the chil- span as well as the arc label between the node dren of each node in the semantic graph. In some and its parent. See details in section 2.1. rare cases, the BIO encoding scheme is not suf- ficient to model the interaction between parallel Except for words where we use pre-trained em- scenes. For example, when we have two paral- beddings, we use randomly initialized embedding lel scenes and one of them appears as a clause layers for categorical features. inside the other. In such cases, BIO encoding does not allow to determine whether the last part 2.1 Masking Mechanism of the sentence belongs to the first scene or to We introduce an original masking mechanism in the clause. Despite this issue, prior experiments order to feed information about the previous pars- testing more complete label encoding schemes ing stages into the model. During parsing, we (BIEO, BIEOW) showed that BIO outperforms the first do an initial inference step to extract the main other schemes on the validation sets. scenes and links. Then, for each resulting node, we build a new input which is essentially the same, 2.4 Graph Decoding but with a categorical sequence masking feature. During the decoding phase, we convert the BIO la- For the input tokens in the node span, this feature bels into graph nodes. To do so, we add a few con- is equal to the label of the arc between the node straints to ensure the outputs are feasible UCCA and its parent. Outside of the node span, this mask graphs that respect the sentence’s structure: is equal to O. A diagram of this masking process • We merge parallel scenes (H) that do not have is shown in figure1. The process continues and either a verb or an action noun to the nearest the model recursively extracts the inner semantic previous scene having one. structures (the node’s children) in the graph, until • Within each parallel scene, we force the ex- the terminal nodes are reached. istence of one and only one State (S) or 1Obtained from https://github.com/facebookresearch/MUSE Process (P) by selecting the token with the 2 Using Universal Dependencies categories. highest probability of State or Process. B:H I:H I:H I:H B:L B:H I:H Step 1 Bi-GRU Word The mice ate cheese and fell asleep Mask INIT INIT INIT INIT INIT INIT INIT B:A I:A B:P B:A O O O Step 2.A Bi-GRU The mice ate cheese and fell asleep H H H H O O O REM-A REM-A O O O B:D B:P Step 2.B Bi-GRU The mice ate cheese and fell asleep O O O O O H H B:E B:C O O O O O Step 3 Bi-GRU The mice ate cheese and fell asleep 1 Interne Orange A A O O O O O Figure 1: Masking mechanism through recursive calls.

Load more