Multi-Lingual Dependency Parsing: Word Representation and Joint Training for Syntactic Analysis
Mathieu Dehouck. Multi-Lingual Dependency Parsing: Word Representation and Joint Training for Syntactic Analysis. Computer Science [cs]. Université de Lille, 2019. English. tel-02197615

HAL Id: tel-02197615
https://tel.archives-ouvertes.fr/tel-02197615
Submitted on 30 Jul 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Centre de Recherche en Informatique, Signal et Automatique de Lille
École Doctorale Sciences Pour L'Ingénieur

Doctoral thesis, specialty Computer Science and Applications, prepared in the Magnet team of the CRIStAL laboratory and the Inria Lille - Nord Europe research center, funded by the Université de Lille.

Mathieu Dehouck

Multi-Lingual Dependency Parsing: Word Representation and Joint Training for Syntactic Analysis
(French title: Parsing en Dépendances Multilingue : Représentation de Mots et Apprentissage Joint pour l'Analyse Syntaxique)

Supervised by Dr. Marc TOMMASI and co-advised by Dr. Pascal DENIS. Defended publicly in Villeneuve d'Ascq on 20 May 2019 before a jury composed of:

- Sandra KÜBLER (Indiana University Bloomington), Reviewer
- Alexis NASR (Université d'Aix-Marseille), Reviewer
- Hélène TOUZET (CNRS), President of the jury
- Philippe BLACHE (CNRS), Examiner
- Carlos GÓMEZ RODRÍGUEZ (Universidade da Coruña), Examiner
- Pascal DENIS (Inria), Co-advisor
- Marc TOMMASI (Université de Lille), Supervisor

Contents

1 Introduction
  1.1 Outline
2 Preliminaries
  2.1 Snippet of Graph Theory
  2.2 Dependency Parsing
    2.2.1 Dependency Parsing as an NLP Task
    2.2.2 Graph-based Parsing
    2.2.3 Transition-based Parsing
    2.2.4 Other Approaches
    2.2.5 Evaluation
  2.3 Machine Learning and Structured Prediction
    2.3.1 Structured Prediction
    2.3.2 Learning Scoring Functions
    2.3.3 Large Margin Classifiers
    2.3.4 Online Learning
    2.3.5 Neural Parsers
  2.4 Conclusion
3 Representing Word Information
  3.1 Lemmas, Parts-of-speech and Morphological Attributes
    3.1.1 Parts-of-speech
    3.1.2 Morphological Features
  3.2 Learning Word Representation
    3.2.1 The Different Views of a Word
    3.2.2 Types of Word Representations
    3.2.3 Beyond One-Hot Encoding
    3.2.4 Distributional Hypothesis
    3.2.5 Distributional Semantics
    3.2.6 Continuous Representations
    3.2.7 Discrete Representations
    3.2.8 Engineering, Learning and Selection
  3.3 Conclusion
4 Related Works on Multi-Lingual Dependency Parsing
  4.1 Multi-Lingual Dependency Parsing
  4.2 Universal Dependencies
  4.3 Related Work
    4.3.1 Delexicalised Parsers
    4.3.2 Annotation Projection
    4.3.3 Cross-Lingual Representations
    4.3.4 Direct Transfer and Surface Form Rewriting
    4.3.5 Multi-Lingual Dependency Parsing
5 Delexicalised Word Representation
  5.1 Related Work
  5.2 Delexicalised Words
  5.3 Representation Learning
    5.3.1 Delexicalised Contexts
    5.3.2 Structured Contexts
    5.3.3 Structured Delexicalised Contexts
    5.3.4 Dimension Reduction
  5.4 Dependency Parsing with Delexicalised Words
  5.5 Experiments
    5.5.1 Settings
    5.5.2 Results
  5.6 Conclusion
6 Phylogenetic Learning of Multi-Lingual Parsers
  6.1 Related Work
    6.1.1 Multi-Task Learning
    6.1.2 Multi-Lingual Dependency Parsing
  6.2 Phylogenetic Learning of Multi-Lingual Parsers
    6.2.1 Model Phylogeny
    6.2.2 Phylogenetic Datasets
    6.2.3 Model Training
    6.2.4 Sentence Sampling
    6.2.5 Zero-Shot Dependency Parsing
  6.3 Tree Perceptron
    6.3.1 Averaging Policy
  6.4 Neural Model
  6.5 Experiments
    6.5.1 Setting
    6.5.2 Results with Training Data
    6.5.3 Zero-Shot Dependency Parsing
  6.6 Conclusion
7 Measuring the Role of Morphology
  7.1 Morphological Richness
  7.2 Measuring Morphological Richness
    7.2.1 Related Work on Measures of Morphological Richness
    7.2.2 Form per Lemma Ratio
  7.3 Morphological Richness in Dependency Parsing
  7.4 Measuring Morphology Syntactical Information
  7.5 Annotation Scheme Design
    7.5.1 Part-of-speech Tags
    7.5.2 Word Tokenisation
    7.5.3 Dependency Scheme
  7.6 Experiments
    7.6.1 Parsing Model
    7.6.2 Word Representation
    7.6.3 Experimental Setting
    7.6.4 Results
    7.6.5 Assessing the Impact of the Annotation Scheme
  7.7 Conclusion
8 Conclusion
  8.1 Contribution
  8.2 Future Works
9 Appendix
  9.1 Model Propagation for Dependency Parsing
    9.1.1 Experiments
    9.1.2 Results
  9.2 Measuring the Role of Morphology

List of Figures

2.1 A simple graph.
2.2 A simple directed graph.
2.3 A weighted graph.
2.4 A simple tree.
2.5 Example of unlabeled dependency tree.
2.6 Example of labeled dependency tree.
2.7 Example of non-projective dependency tree.
2.8 Example of projective dependency tree.
2.9 Illustration of Eisner algorithm tree merging.
2.10 Example of constituency tree.
2.11 Partial dependency tree showing structure constraints.
3.1 Analysis of words into morphemes.
3.2 Sentence with a missing word.
3.3 Zipfian distribution on English words.
3.4 Example of hierarchical clustering.
5.1 Example of structured contexts.
6.1 Phylogenetic tree of Slavic languages.
6.2 Character model for word embedding architecture.
6.3 Neural network architecture for edge scoring. The contextualised representations of the governor (eat) and the dependent (Cats) are concatenated and passed through a rectified linear layer and a final plain linear layer to get a vector of label scores.
6.4 Phylogenetic tree for all the languages of UD 2.2.
6.5 Phylogenetic tree for all the Indo-European languages of UD 2.2.
7.1 Parsing result differences with respect to morphological complexity.
9.1 Propagation weights for model propagation.

List of Tables

2.1 Edge vector feature templates.
3.1 French conjugation paradigm of chanter.
3.2 Example of one-hot representation.
3.3 Example of co-occurrence matrix.
3.4 Example of vector binarisation.
4.1 Sentence in CONLL-U format.
4.2 List of UD universal parts-of-speech.
4.3 List of UD universal dependency relations.
4.4 List of most common UD morphological attributes.
4.5 List of UD treebanks