GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks

Shiva Taslimipoor, Omid Rohanian, Sara Može
Research Group in Computational Linguistics, University of Wolverhampton, UK
{shiva.taslimi, omid.rohanian, s.moze}@wlv.ac.uk

Abstract

This paper describes the system submitted to the SemEval 2019 shared task 1, 'Cross-lingual Semantic Parsing with UCCA'. We rely on the semantic dependency parse trees provided in the shared task, which are converted from the original UCCA files, and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of Graph Convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CONLLU format of the input data and is best suited for semantic dependency parsing.

1 Introduction

Universal Conceptual Cognitive Annotation (UCCA) (Abend and Rappoport, 2013) is a semantically motivated approach to grammatical representation inspired by typological theories of grammar (Dixon, 2012) and the Cognitive Linguistics literature (Croft and Cruse, 2004). In parsing, bi-lexical dependencies that are based on binary head-argument relations between lexical units are commonly employed in the representation of syntax (Nivre et al., 2007; Chen and Manning, 2014) and semantics (Hajič et al., 2012; Oepen et al., 2014; Dozat and Manning, 2018).

UCCA differs significantly from traditional dependency approaches in that it attempts to abstract away traditional syntactic structures and relations in favour of employing purely semantic distinctions to analyse sentence structure. The shared task, 'Cross-lingual Semantic Parsing with UCCA' (Hershcovich et al., 2019), consists in parsing English, German, and French datasets using the UCCA semantic tagset. In order to enable multi-task learning, the UCCA-annotated data is automatically converted to other parsing formats, e.g. Abstract Meaning Representation (AMR) and Semantic Dependency Parsing (SDP), inter alia (Hershcovich et al., 2018).

Although the schemes are formally different, they have shared semantic content. In order to perform our experiments, we target the converted CONLLU format, which corresponds to traditional bi-lexical dependencies, and rely on the conversion methodology provided in the shared task (Hershcovich et al., 2019) to attain UCCA graphs.

UCCA graphs contain both explicit and implicit units.[1] However, in bi-lexical dependencies, nodes are text tokens and semantic relations are direct bi-lexical relations between the tokens. The conversion between the two formats results in a partial loss of information. Nonetheless, we believe that it is worth trying to model the task using one of the available formats (i.e. semantic dependency parsing), which is very popular among NLP researchers.

Typically, transition-based methods are used in syntactic (Chen and Manning, 2014) and semantic (Hershcovich et al., 2017) dependency parsing. By contrast, our proposed system shares several similarities with sequence-to-sequence neural architectures, as it does not specifically deal with parsing transitions. Our model uses word, POS and syntactic dependency tree representations as input and directly produces an edge-labeled graph representation for each sentence (i.e. edges and their labels as two separate outputs). This multi-label neural architecture, which consists of a BiLSTM and a Graph Convolutional Network (GCN), is described in Section 3.

[1] Explicit units (terminal nodes) correspond to tokens in the text, but implicit (semantic) units have no corresponding component in the text.

2 Related Work

A recent trend in parsing research is sequence-to-sequence learning (Vinyals et al., 2015b; Kitaev and Klein, 2018), which is inspired by Neural Machine Translation. These methods ignore explicit structural information in favour of relying on long-term memory, attention mechanisms (content-based or position-based) (Kitaev and Klein, 2018), or pointer networks (Vinyals et al., 2015a). By doing so, high-order features are implicitly captured, which results in competitive parsing performance (Jia and Liang, 2016).

Sequence-to-sequence learning has been particularly effective in semantic role labeling (SRL) (Zhou and Xu, 2015). By augmenting these models with syntactic information, researchers have been able to develop state-of-the-art systems for SRL (Marcheggiani and Titov, 2017; Strubell et al., 2018).

As information derived from dependency parse trees can significantly contribute towards understanding the semantics of a sentence, a Graph Convolutional Network (GCN) (Kipf and Welling, 2017) is used to help our system perform semantic parsing while attending to structural syntactic information. The architecture is similar to the GCN component employed in Rohanian et al. (2019) for detecting gappy multiword expressions.

3 Methodology

For this task, we employ a neural architecture utilising structural features to predict semantic parsing tags for each sentence. The system maps a sentence from the source language to a probability distribution over the tags for all the words in the sentence. Our architecture consists of a GCN layer (Kipf and Welling, 2017), a bidirectional LSTM, and a final dense layer on top.

The inputs to our system are sequences of words, alongside their corresponding POS and named-entity tags.[2] Word tokens are represented by contextualised ELMo embeddings (Peters et al., 2018), and POS and named-entity tags are one-hot encoded. We also use sentence-level syntactic dependency parse information as input to the system. In the GCN layer, the convolution filters operate based on the structure of the dependency tree (rather than the sequential order of words).

[2] spaCy (Honnibal and Johnson, 2015) is used to generate POS, named-entity and syntactic dependency tags.
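Although the paper does not include its preprocessing code, the sketch below illustrates how the three adjacency masks consumed by the GCN layer could be built from a spaCy dependency parse (the three relation types are defined in the Graph Convolution paragraph below). The function name and the choice of spaCy model are our own assumptions.

import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; the paper only states that spaCy is used

def dependency_adjacencies(sentence):
    """Build the three (n x n) masks used by the GCN layer:
    head-to-dependent, dependent-to-head, and self-loops."""
    doc = nlp(sentence)
    n = len(doc)
    head_to_dep = np.zeros((n, n))
    dep_to_head = np.zeros((n, n))
    self_loops = np.eye(n)
    for token in doc:
        if token.head.i != token.i:  # spaCy marks the root as its own head
            head_to_dep[token.head.i, token.i] = 1.0
            dep_to_head[token.i, token.head.i] = 1.0
    return head_to_dep, dep_to_head, self_loops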
Graph Convolution. Convolutional Neural Networks (CNNs), as originally conceived, are sequential in nature, acting as detectors of N-grams (Kim, 2014), and are often used as feature-generating front-ends in deep neural networks. Graph Convolutional Networks (GCNs) have been introduced as a way to integrate rich structural relations, such as syntactic graphs, into the convolution process.

In the context of a syntax tree, a GCN can be understood as a non-linear activation function f and a filter W with a bias term b:

    c = f( Σ_{i ∈ r(v)} W x_i + b )    (1)

where r(v) denotes all the words in relation with a given word v in a sentence, and c represents the output of the convolution. Using adjacency matrices, we define graph relations as mask filters for the inputs (Kipf and Welling, 2017; Schlichtkrull et al., 2017).

In the present task, information from each graph corresponds to a sentence-level dependency parse tree. Given the filter W_s and bias b_s, we can therefore define the sentence-level GCN as follows:

    C = f( W_s X^T A + b_s )    (2)

where X_{n×v}, A_{n×n}, and C_{o×n} are tensor representations of the words, the adjacency matrix, and the convolution output, respectively.[3]

[3] o: output dimension; v: word vector dimension; n: sentence length.

In Kipf and Welling (2017), a separate adjacency matrix is constructed for each relation; by contrast, to avoid over-parametrising the model, ours is limited to the following three types of relations: 1) the head to the dependents, 2) the dependents to the head, and 3) each word to itself (self-loops), similar to Marcheggiani and Titov (2017). The final output is the maximum of the weights from the three individual adjacency matrices.

The model architecture is depicted in Figure 1.

[Figure 1: A GCN-based recurrent architecture. Word Representation → GCN (Root to Dependency) / GCN (Dependency to Root) / GCN (Root) → Max → BiLSTM → FFN.]
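As a concrete illustration of Eq. (2) combined with the element-wise maximum over the three relation types, consider the following numpy sketch. The paper does not specify the activation function f, so ReLU is assumed here; variable names mirror the notation above.

import numpy as np

def sentence_gcn(X, adjacencies, W_s, b_s):
    """Sentence-level GCN of Eq. (2).
    X: (n, v) word vectors; adjacencies: three (n, n) masks
    (head-to-dependent, dependent-to-head, self-loops);
    W_s: (o, v) filter; b_s: (o, 1) bias. Returns C: (o, n)."""
    outputs = []
    for A in adjacencies:
        # Eq. (2): C = f(W_s X^T A + b_s), with f = ReLU (assumed).
        C = np.maximum(0.0, W_s @ X.T @ A + b_s)
        outputs.append(C)
    # Final output: element-wise maximum over the three relation types.
    return np.maximum.reduce(outputs)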

4 Experiments

Our system participated in the closed track for English and German and the open track for French. We exclusively used the data provided in the shared task. The system is trained on the training data only, and the parameters are optimised using the development set. The results are reported on blind-test data in both in-domain and out-of-domain settings. We focus on predicting the primary edges of UCCA semantic relations and their labels.

4.1 Data

The datasets of the shared task are devised for four settings: 1) English in-domain, using the Wiki corpus; 2) English out-of-domain, using the Wiki corpus as training and development data, and 20K Leagues as test data; 3) German in-domain, using the 20K Leagues corpus; 4) a French setting with no training data (except trial data), using the 20K Leagues corpus as development and test data.

Whilst the annotated files used by the shared task organisers are in the XML format, several other formats are also available. We decided to use CONLLU, as it is more interpretable. However, according to the shared task description,[4] the conversion between XML and CONLLU, which is a necessary step before evaluation, is lossy. Hershcovich et al. (2017) used the same procedure of applying dependency parsing methods to CONLLU files and converting the predictions back to UCCA.

[4] https://competitions.codalab.org/competitions/19160#fn1
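For illustration, the converted data represents each sentence as bi-lexical dependencies in CONLLU-style columns, with a head index and a UCCA category for every token. The fragment below is a simplified, hypothetical encoding of the sentence 'John kicked the ball'; the actual column layout and label inventory in the shared task files may differ.

# ID   FORM     HEAD   UCCA-CATEGORY
1      John     2      A      # participant
2      kicked   0      root   # the scene's process heads the sentence
3      the      4      F      # function word
4      ball     2      A      # participant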

4.2 Settings

We trained ELMo on each of the shared task datasets using the system implemented by Che et al. (2018). The embedding dimension is set to 1024. The number of nodes is 256 for the GCN and 300 for the BiLSTM, and we applied a dropout of 0.5 after each layer. We used the Adam optimiser for compiling the model.

We tested our model in four different settings, as explained in Section 4.1. The parameters are optimised on the English Wiki development data (batch size = 16 and number of epochs = 100) and used for all four settings. As no training data was available for French, the system trained on English Wiki was used to parse the French sentences of 20K Leagues. For this reason, the French model is evaluated within the open track.
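The phrase 'compiling the model' suggests a Keras-style implementation, so the sketch below expresses the architecture and hyperparameters above in Keras. Only the GCN/BiLSTM sizes, dropout, ELMo dimension, and optimiser come from the paper; the maximum sentence length, feature dimension, label count, head-prediction parameterisation, and loss are our own assumptions.

import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN = 50      # assumed maximum sentence length
ELMO_DIM = 1024   # ELMo embedding dimension (Section 4.2)
FEAT_DIM = 64     # assumed size of one-hot POS + NER features
N_LABELS = 15     # assumed number of UCCA edge categories

tokens = layers.Input(shape=(MAX_LEN, ELMO_DIM), name="elmo")
feats = layers.Input(shape=(MAX_LEN, FEAT_DIM), name="pos_ner")
adj = layers.Input(shape=(3, MAX_LEN, MAX_LEN), name="adjacency")  # 3 relation types

x = layers.Concatenate()([tokens, feats])

def gcn_branch(r):
    # One graph convolution per relation type: A_r (X W_r + b_r),
    # with a separate filter W_r instantiated for each branch.
    a_r = layers.Lambda(lambda t: t[:, r])(adj)   # (batch, n, n)
    h = layers.Dense(256, use_bias=True)(x)       # X W_r + b_r
    return layers.Lambda(lambda ts: tf.matmul(ts[0], ts[1]))([a_r, h])

g = layers.Maximum()([gcn_branch(r) for r in range(3)])  # element-wise max
g = layers.Activation("relu")(g)
g = layers.Dropout(0.5)(g)

h = layers.Bidirectional(layers.LSTM(300, return_sequences=True))(g)
h = layers.Dropout(0.5)(h)

# Two outputs per token: its head position (edge) and its UCCA label.
# Treating head prediction as a softmax over positions is our assumption.
edges = layers.Dense(MAX_LEN + 1, activation="softmax", name="edges")(h)
labels = layers.Dense(N_LABELS, activation="softmax", name="labels")(h)

model = Model(inputs=[tokens, feats, adj], outputs=[edges, labels])
model.compile(optimizer="adam", loss="categorical_crossentropy")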

4.3 Official Evaluation

Our model predicts two outputs for each dataset: primary edges and their labels (UCCA semantic categories).[5]

Table 1 shows the performance (in terms of precision, recall, and F1-score) for predicting primary edges in both labeled (i.e. with semantic tags) and unlabeled (i.e. ignoring semantic tags) settings. Table 2 shows F1-scores for each semantic category separately. Although the overall performance of the system, as shown in the official evaluation in Table 1, is not particularly impressive, there are a few results worth reporting. These are listed in Table 2.

Our system is ranked second in predicting four relations, i.e. L (Linker), N (Connector), R (Relator), and G (Ground), in all settings. A plausible explanation would be that these relations are somewhat less affected by the loss of information incurred as a result of the conversions between formats.

[5] For more details about UCCA semantic categories and the way they are used for the shared task, see https://competitions.codalab.org/competitions/19160#learn_the_details-overview. Our system does not predict remote edges defined in UCCA.
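For clarity, the labeled and unlabeled scores in Table 1 follow the usual edge-overlap definitions of precision, recall, and F1. The sketch below illustrates the computation; it is our own helper, not the shared task's official evaluation script.

def edge_prf(gold, predicted, labeled=True):
    """gold, predicted: sets of (head, dependent, label) triples."""
    if not labeled:
        # Unlabeled setting: ignore the semantic tag and compare
        # (head, dependent) pairs only.
        gold = {(h, d) for h, d, _ in gold}
        predicted = {(h, d) for h, d, _ in predicted}
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1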

                             ------------ labeled ------------    ----------- unlabeled -----------
dataset             track    Avg. F1   P       R       F1         Avg. F1   P       R       F1
UCCA English-Wiki   closed   0.657     0.673   0.655   0.664      0.809     0.829   0.807   0.818
UCCA English-20K    closed   0.626     0.632   0.642   0.637      0.800     0.808   0.821   0.814
UCCA German-20K     closed   0.710     0.720   0.720   0.720      0.851     0.863   0.862   0.862
UCCA French-20K*    open     0.438     0.443   0.447   0.445      0.690     0.698   0.705   0.702

Table 1: Official results of the shared task evaluation for predicting primary edges and their labels. (* The results for French are for post-evaluation.)

dataset        (D)    (C)    (N)    (E)    (F)    (G)    (L)    (H)    (A)    (P)    (U)    (R)    (S)    (Terminal)
English-Wiki   0.700  0.708  0.866  0.738  0.801  0.286  0.836  0.289  0.582  0.451  0.948  0.914  0.000  0.997
English-20K    0.521  0.733  0.776  0.743  0.647  0.040  0.719  0.248  0.538  0.527  0.978  0.844  0.000  0.997
German-20K     0.691  0.813  0.796  0.820  0.845  0.778  0.834  0.375  0.697  0.561  0.997  0.916  0.000  0.998
French-20K*    0.223  0.569  0.579  0.551  0.378  0.000  0.536  0.118  0.314  0.358  0.987  0.711  0.000  0.993

Table 2: Official results of the shared task evaluation for predicting different semantic category labels (F1-scores). (* The results for French are for post-evaluation.)

5 Discussion

Our neural model is applied to UCCA corpora, which are converted to bi-lexical semantic dependency graphs and represented in the CONLLU format. The conversion from UCCA annotations to CONLLU tags appears to have a distinctly negative impact on the system's overall performance. As reported in the shared task description, converting the English Wiki corpus to the CONLLU format and back to the standard format results in an F1-score of only 89.7 for primary labeled edges. This means that our system cannot go beyond this upper limit.

Since our system is trained on CONLLU files and the evaluation involves converting the CONLLU format back to the standard UCCA format, the reported results for our system can be misleading. In order to further investigate this issue, we performed an evaluation using the English Wiki development data, comparing the predicted labels with the gold standard for the development set in the CONLLU format. The average F1-score for labelled edges was 0.71, compared to the 0.685 score our system achieved on the development set using the official evaluation script.

This clearly demonstrates that our system fares significantly better if it receives its input in the form of bi-lexical dependency graphs. Therefore, the system is best suited for semantic dependency parsing, although we believe that promising results could also be achieved in UCCA annotation if the conversion between the CONLLU and UCCA formats were improved to map and preserve information more accurately.

6 Conclusion and Future Work

In this paper, we described the system we submitted to SemEval-2019 Task 1, 'Cross-lingual Semantic Parsing with UCCA', which performs semantic parsing using Graph Convolutional and Recurrent Neural Networks. The model performs semantic parsing using information derived from syntactic dependencies between the words in each sentence. We developed the model using a combination of GCN and BiLSTM components. Due to the penalisation resulting from the use of lossy CONLLU files, we argue that the results cannot be directly compared with those of the other task participants.[6]

In the future, we would like to build on the work presented in this paper by applying the architecture to the standard UCCA dataset, or possibly by training the system to perform bi-lexical semantic dependency annotation.

[6] The code is available at https://github.com/shivaat/GCN-Sem.

References

Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proc. of ACL, pages 228–238.

Wanxiang Che, Yijia Liu, Yuxuan Wang, Bo Zheng, and Ting Liu. 2018. Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 55–64, Brussels, Belgium. Association for Computational Linguistics.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750. Association for Computational Linguistics.

William Croft and D. A. Cruse. 2004. Cognitive Linguistics. Cambridge University Press.

Robert M. W. Dixon. 2012. Basic Linguistic Theory. Oxford University Press.

Timothy Dozat and Christopher D. Manning. 2018. Simpler but more accurate semantic dependency parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 484–490. Association for Computational Linguistics.

Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pages 3153–3160. ELRA, European Language Resources Association.

Daniel Hershcovich, Omri Abend, and Ari Rappoport. 2017. A transition-based directed acyclic graph parser for UCCA. In Proc. of ACL, pages 1127–1138.

Daniel Hershcovich, Omri Abend, and Ari Rappoport. 2018. Multitask parsing across semantic representations. In Proc. of ACL, pages 373–385.

Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport, and Omri Abend. 2019. SemEval-2019 Task 1: Cross-lingual semantic parsing with UCCA.

Matthew Honnibal and Mark Johnson. 2015. An improved non-monotonic transition system for dependency parsing. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1373–1378, Lisbon, Portugal. Association for Computational Linguistics.

Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12–22. Association for Computational Linguistics.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751.

Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).

Nikita Kitaev and Dan Klein. 2018. Constituency parsing with a self-attentive encoder. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia. Association for Computational Linguistics.

Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1506–1515. Association for Computational Linguistics.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryiğit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(02).

Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Dan Flickinger, Jan Hajič, Angelina Ivanova, and Yi Zhang. 2014. SemEval 2014 Task 8: Broad-coverage semantic dependency parsing. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 63–72.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of NAACL.

Omid Rohanian, Shiva Taslimipoor, Samaneh Kouchaki, Le An Ha, and Ruslan Mitkov. 2019. Bridging the gap: Attending to discontinuity in identification of multiword expressions.

Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2017. Modeling relational data with graph convolutional networks. arXiv preprint arXiv:1703.06103.

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-informed self-attention for semantic role labeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5027–5038. Association for Computational Linguistics.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015a. Pointer networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2692–2700. Curran Associates, Inc.

Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015b. Grammar as a foreign language. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 2773–2781. Curran Associates, Inc.

Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1127–1137. Association for Computational Linguistics.
