HYBRID LONG-DISTANCE FUNCTIONAL DEPENDENCY PARSING

THESIS presented to the Faculty of Arts of the University of Zurich for the degree of Doctor of Philosophy by Gerold Schneider from Nidau, Switzerland

Accepted in the winter semester 2006/2007 on the recommendation of Professor Dr. Michael Hess and Dr. Paola Merlo

Zurich, July 2008


© 2008 Gerold Schneider

[email protected]

[email protected]

Abstract

This thesis proposes a robust, hybrid, deep-syntactic dependency-based parsing architecture and presents its implementation and evaluation. The architecture and the implementation are carefully designed to keep search-spaces small without compromising much on the linguistic performance or adequacy. The resulting parser is deep-syntactic like a formal grammar-based parser but at the same time mostly context-free and fast enough for large-scale application to unrestricted texts. It combines a number of successful current approaches into a hybrid, comparatively simple, modular and open model. This thesis reports three results:

We suggest, implement, and evaluate a parsing architecture that is fast, robust and efficient enough to allow users to do broad-coverage parsing of unrestricted texts from varied domains.

We present a probability model and a combination between a rule-based competence grammar and a statistical lexicalized performance disambiguation model.

We show that inherently complex linguistic problems can be broken down and approximated sufficiently well by less complex methods. In particular (1) on the level of long-distance dependencies, the majority of them can be approximated by using a labelled DG, context-free finite-state based patterns, and post-processing, (2) on the level of long-distance dependencies, a slightly extended DG allows us to use mildly context-sensitive operations known from Tree-Adjoining Grammar (TAG), (3) on the base phrase level, parsing can successfully be approximated by the more shallow approaches of chunking and tagging. We conclude that labelled DG is sufficiently expressive for linguistically adequate parsing.

We argue that our parser covers the middle ground between statistical parsing and formal grammar-based parsing. The parser has competitive performance and has been applied widely.

Contents

1 Introduction
  1.1 Ambiguity
  1.2 Approaches to Parsing
    1.2.1 Full or Shallow Parsing
    1.2.2 Formal Grammar Based Parsing Approaches
    1.2.3 Probabilistic Parsers
    1.2.4 Mixed Models
  1.3 Task-Specific Division of Labour
  1.4 Relations and Differences to Current Approaches
  1.5 Overview of this Thesis

2 Overview of the Parser and the Architecture
  2.1 The Modular Architecture
  2.2 Integrating Linguistic Knowledge into the System
    2.2.1 The Boundedness of Long-Distance Dependencies
    2.2.2 Constraints on Grammar Rules
    2.2.3 Mapping Grammatical Relations for the Probability Estimation

3 Dependency Grammar
  3.1 Conceptions of DG
    3.1.1 Extended Valency Grammar
    3.1.2 Government Grammar
    3.1.3 Terminal Node Context-Free Grammar
    3.1.4 A Version of X-bar Theory
  3.2 Characteristics of functional DG
    3.2.1 Definition of Head
    3.2.2 Projectivity
    3.2.3 Functionalism
  3.3 The Relationship of DG to HPSG and LFG
    3.3.1 HPSG
    3.3.2 LFG
  3.4 Conclusions

4 State of the Art
  4.1 Introduction
  4.2 PP attachment disambiguation
    4.2.1 The question
    4.2.2 Hindle and Rooth
    4.2.3 Collins and Brooks
  4.3 Treebank-Based Statistical Parsers
    4.3.1 Collins 1996
    4.3.2 Model 1, 1997
    4.3.3 Model 2
    4.3.4 Model 3
    4.3.5 Recovering empty nodes and functional tags with Treebank-Based Statistical Parsers
  4.4 Dependency-oriented Statistical Parsers
    4.4.1 Link Grammar
    4.4.2 Eisner
    4.4.3 McDonald et al.
    4.4.4 Nivre et al.
    4.4.5 Yamada and Matsumoto
  4.5 Data-Oriented Parsing (DOP)
  4.6 Statistical Formal Grammar Parsers
    4.6.1 Lexical-Functional Grammar (LFG)
    4.6.2 Head-Driven Phrase Structure Grammar (HPSG)
    4.6.3 Combinatory Categorial Grammar (CCG)
    4.6.4 Tree-Adjoining Grammar (TAG)
  4.7 Shallow Parsing, Finite-state Cascading
    4.7.1 Tag-Based Chunking and Partial Parsing Grammars
    4.7.2 Grefenstette, Brants
  4.8 Memory-Based Grammatical Relation Finding
    4.8.1 Daelemans et al. 1999
    4.8.2 Buchholz 2002
  4.9 Conclusions

5 Grammar Engineering
  5.1 Introduction
    5.1.1 The Penn Treebank
    5.1.2 The Difficulty of Writing Grammars
    5.1.3 Benefits of a Hand-Written Grammar
  5.2 Subcategorisation and Lexicalisation
  5.3 The Rules in Detail
    5.3.1 Major Types of Grammar Rules
    5.3.2 Minor Types
    5.3.3 Unconventional Types
  5.4 Conclusion

6 Extended Locality: Treatment of Long-Distance Dependencies in Functional Dependency Grammar
  6.1 Introduction
  6.2 The Boundedness of Long-Distance Dependencies
    6.2.1 Local Relations
    6.2.2 Nonlocal Relations
  6.3 A Quantitative Analysis of Types of Empty Nodes
    6.3.1 Overview
    6.3.2 NP Traces
    6.3.3 NP PRO
    6.3.4 WH Traces
  6.4 A Qualitative Analysis of Types of Empty Nodes
    6.4.1 Tree-Adjoining Grammar
    6.4.2 TAG Adjoining and mild context-sensitivity
    6.4.3 The Nature of Elementary and Auxiliary Trees
    6.4.4 Sketching TAG Adjoining in DG
    6.4.5 An Implementation at the Functional Level
    6.4.6 Ambiguous WH-attachment
    6.4.7 Subjacency and Barriers
    6.4.8 Minimality and Relativized Minimality
  6.5 TAG Adjoining and LFG
    6.5.1 Functional Uncertainty
    6.5.2 A chunks and F-structure version of LFG
  6.6 Conclusions

7 Evaluation and Discussion
  7.1 Introduction
    7.1.1 Traditional Syntactic Evaluation: Labelled Bracketing
    7.1.2 Dependency-Based Evaluation: Lin 1995
    7.1.3 Desiderata
    7.1.4 An Annotation Scheme for Evaluation: Carroll et al. 1999, 2003
  7.2 GREVAL: A standard 500 sentence test corpus
    7.2.1 Bidirectional Mapping of Pro3Gres to GREVAL
    7.2.2 Unidirectional Mapping of Pro3Gres to GREVAL
    7.2.3 Long-Distance Dependencies
    7.2.4 Comparison to Lin’s MINIPAR
    7.2.5 Comparison to Carroll and Briscoe’s RASP
    7.2.6 Comparison to Buchholz, Charniak, and Collins, according to Preiss
    7.2.7 Comparison to Collins’s Model 1
  7.3 Tentative Comparison to Nivre’s MaltParser
    7.3.1 Comparison Across Different Corpora
    7.3.2 Comparing with the same Corpus: Participation in CoNLL-XI Shared Task
  7.4 Evaluation on Biomedical Term-Annotated Corpora
    7.4.1 Evaluation on 100 Random Sentences from the GENIA Corpus
    7.4.2 Evaluation with the Stanford Dependency Scheme on 900 BioInfer sentences
    7.4.3 Task-Oriented, Practical Evaluation of Pro3Gres relation extraction
  7.5 Exploring Precision and Recall Trade-Offs
    7.5.1 High Recall Parsing
    7.5.2 High Precision Parsing
  7.6 Baseline, Distance Measure, Lexicalisation
    7.6.1 The Baseline System
    7.6.2 The Distance Measure
    7.6.3 Lexicalisation
  7.7 Disambiguation from the Parsing Context
    7.7.1 Parsing Speed, Pruning, and Local Maxima
    7.7.2 Local Readings Constrain Each Other
  7.8 Linguistic constraints
  7.9 Conclusions

8 Conclusions
  8.1 The Cornerstones of Pro3Gres
  8.2 A Hybrid Architecture
  8.3 Applications

A tgrep Queries for Grammatical Relations

B Gradience and Mapping: A Small Selection of Problematic Cases

List of Figures

2.1 Pro3Gres flowchart
2.2 Parser output of the sample sentence What could rescue the bill would be some quick progress on a bill amending the National Defense Education Act of 1958 in the classical arrow notation
2.3 Parser output of the sample sentence What could rescue the bill would be some quick progress on a bill amending the National Defense Education Act of 1958 in stemma notation
2.4 Complete CYK chart for an example sentence with 6 terminals
2.5 Parser output

3.1 Dependency tree example
3.2 Constituency tree example
3.3 Pro3Gres analysis of the sentence: Mutations affecting the 5’ guanine residues of the kappa B site were unable to compete for these NF-kappa B-related proteins.
3.4 Pro3Gres analysis of the sentence: The transcription of the human immunodeficiency virus type 1 (HIV-1) is under the control of cellular proteins that bind to the viral long terminal repeat (LTR).
3.5 The partial CFG tree corresponding to the local DG passive subject relation.

6.1 Prototypical subject configuration
6.2 Explicit Penn-II subject relation
6.3 Extraction pattern for active subject-verb relations
6.4 Extraction pattern for passive subjects
6.5 First experiment pattern for passive subjects: Coreference is not enforced
6.6 Second experiment pattern for passive subjects: Forcing no trace in the object position
6.7 Extraction pattern for passive dub verb subjects
6.8 Extraction pattern for subject control (left) and object control (right)
6.9 The extraction pattern for the modpart relation. The NP is explicitly a non-subject NP
6.10 Actual parser output for example sentence William got a present to which Peter believes I will contribute
6.11 Actual parser output for example sentence William got a present Peter believes I will contribute to
6.12 Actual parser output for example sentence William got a present I believe which suits Peter
6.13 Actual parser output for example sentence William got a present I believe that Peter likes
6.14 An example of the Substitution operation. The rewritten node is boxed.
6.15 An example of the Adjoining operation. The foot node is boxed.
6.16 An example of the Adjoining operation. The foot node is boxed.
6.17 Adjoining for WH-questions. The deep recursion of the auxiliary trees introduces mild context-sensitivity. The foot node is boxed.
6.18 An unlabelled DG representation and its X-bar equivalents
6.19 An unlabelled DG representation and its TAG equivalents. The foot node is boxed.
6.20 Actual parser output for example sentence What do you think that Peter believes I will contribute to
6.21 Actual parser output for example sentence What do you eat that Peter believes I will contribute to
6.22 Actual parser output for example sentence Why do you think that Peter believes
6.23 Actual parser output for example sentence Why do you think Peter believes

7.1 The hierarchy of grammatical relations suggested in GREVAL
7.2 Bidirectional Mapping the Pro3Gres output to the GREVAL format. GREVAL relations bear a c-subscript
7.3 Mapping the Pro3Gres output to the GREVAL format for precision, and the reverse mapping for recall
7.4 Mapping for the preterminal clausalc relation
7.5 Semantic and syntactic dependency annotation of Stock Exchange
7.6 A sample sentence illustrating the complexity of noun-modifying PPs, with its top-ranked grammatical relation annotation
7.7 Graph of percentage results of recall among first N-ranked analyses on the GREVAL corpus
7.8 Graph of percentage results of recall among first N-ranked analyses on GENIA
7.9 Graph of Precision of Experiment 4 on GREVAL
7.10 Graph of Recall of Experiment 4 on GREVAL
7.11 Graph of Precision of Experiment 4 on GENIA
7.12 Graph of Recall of Experiment 4 on GENIA
7.13 Graph of Precision of Experiment 5 on GREVAL
7.14 Graph of Recall of Experiment 5 on GREVAL
7.15 Graph of Precision of Experiment 5 on GENIA
7.16 Graph of Recall of Experiment 5 on GENIA
7.17 Graph of Precision of Experiment 6 on GREVAL
7.18 Comparison between Baseline, Distance Measure Only, Full Model without Distance Measure, and Full Model
7.19 F-Score Comparison between Baseline, Distance Measure Only, Full Model without Distance Measure, and Full Model
7.20 Comparison of Levin or Wordnet verb classes for backing off

B.1 Aberrant but intentional analysis

List of Tables

3.1 Zwicky’s definition of heads
3.2 Hudson’s definition of heads

4.1 Alpino 2001 in comparison to Pro3Gres

5.1 The Penn Treebank Tagset
5.2 Penn Treebank Functional Labels
5.3 The major Pro3Gres dependency types

6.1 Important Pro3Gres Dependency types
6.2 The distribution of the 10 most frequent types of empty nodes and their antecedents in the Penn Treebank (adapted from Johnson 2002). Row numbers in parentheses indicate cases that are inherently local in our functional DG
6.3 Coverage of the patterns for the most frequent NP traces [row 1]
6.4 Coverage of the patterns for the most frequent NP PRO [row 2]

7.1 Examples of grammatical relations in the GREVAL scheme

7.2 Currently best results on evaluating Pro3Gres on GREVAL test corpus on subject, object and PP-attachment relations

7.3 Error Classification of Subject Precision errors of all GREVAL corpus sentences

7.4 Evaluating Pro3Gres on the GREVAL corpus on PP-attachment relations


7.5 Detailed Analysis of the PP-attachment errors in the first 100 GREVAL sentences

7.6 Manual categorisation of the first 10 iobjc recall errors on GREVAL
7.7 Evaluation of Long-Distance Dependencies
7.8 The distribution of the 10 most frequent types of empty nodes and their antecedents in the Penn Treebank (adapted from Johnson 2002). Row numbers in parentheses indicate cases that are inherently local in our functional DG
7.9 Comparison of Pro3Gres to Lin’s MINIPAR
7.10 RASP evaluation results compared to Pro3Gres
7.11 Percentage and absolute values for chunk-internal relations
7.12 Percentage and absolute values for the PP-attachment argument/adjunct distinction
7.13 Preiss’s precision (P) and recall (R) evaluation results of Buchholz, Charniak, and Collins, compared to Pro3Gres
7.14 Comparison of parsing performance between Collins Model 1 and Pro3Gres
7.15 Nivre (2006) Dependency Types
7.16 Nivre (2006) Evaluation Results
7.17 Prec.&recall of DEPREL
7.18 Prec.&recall of DEPREL+ATTACHMENT
7.19 Evaluation comparing LTChunk chunking (“dirty”) and near-perfect multi-word knowledge (“clean”) on GENIA corpus
7.20 Some example sentences of the task-oriented evaluation on biomedical texts
7.21 Distribution of GENIA parsing errors in the application-oriented evaluation
7.22 Percentage results of Experiment 1: keeping only sentences with identical tags from two taggers, on the GREVAL corpus on subject, object and PP-attachment relations
7.23 Percentage results of Experiment 2: keeping only agreeing relations arising from parsing with two taggers, on the GREVAL corpus on subject, object and PP-attachment relations

7.24 Percentage results of Experiment 3: discarding the most ambiguous relation in each sentence, for subject, object, PP-attachment and subordinate sentence relations
7.25 PP-attachment backoff level legend
7.26 Evaluation of the currently best parser output on the GREVAL corpus on subject, object and PP-attachment relations, discarding backoff levels 5 and 6 for non-by-PP-attachment
7.27 Percentage results of Experiments 3, 4 and 5 combined at threshold 0.4 and distances 1 to 5
7.28 Backoff decision points for the Fully Lexicalized, Backed-Off System on the GREVAL corpus
7.29 Beam size effects on parsing speed and performance
7.30 Disambiguation from the Context. Maximal span size effects on parsing speed and performance
7.31 Currently best system on the GREVAL corpus with the standard grammar and the grammar without restrictions compared

Acknowledgements

I would like to thank my supervisors, Prof. Michael Hess and Dr. Paola Merlo, for the continued support they have given me during the writing of this thesis. I am grateful to the Swiss National Science Foundation grant 21-59416.99 to Paola Merlo at the University of Geneva, for partly supporting me during this research. I am also grateful to the European Network of Excellence project REWERSE (reference number 506779) grant to Dr. Norbert E. Fuchs at the University of Zurich, and to the Department of English of the University of Zurich, for partly supporting me during the finishing phase of the thesis.

I would also like to thank the following people for inspiring linguistic discussions, comments on earlier versions of the thesis, for their co-operation in research projects, and for their wonderful support (the list is in alphabetical order): Samuel Bayer, Matthias Buch-Kromann, John Carroll, Ralph Debusmann, James Dowdall, Norbert Fuchs, Josef van Genabith, James Henderson, Timo Järvinen, Eric Joanis, Frank Keller, Kaarel Kaljurand, Sandra Kübler, Hans Martin Lehmann, Diego Mollá Aliod, Joakim Nivre, Sampo Pyysalo, Fabio Rinaldi, Patricia Ronan, Rolf Schwitter, and Martin Volk.

Chapter 1

Introduction

This thesis proposes a robust, deep-syntactic dependency grammar parsing architecture and presents its implementation and evaluation. The architecture and the implementation are carefully designed to keep search-spaces small without compromising much on the linguistic performance or adequacy. The resulting parser is deep-syntactic like a formal grammar parser but fast enough for large-scale application to unrestricted texts. It combines a number of successful current approaches into a comparatively simple, extendable and adaptable model. It is hybrid since we use a hand-written part-of-speech grammar combined with lexicalised probabilistic disambiguation.

Parsing has always been a fundamental task in Natural Language Processing. Most natural language applications, such as information extraction, machine translation, question answering, or speech recognition, can almost certainly profit from high-accuracy syntactic parsing. The aim of parsing is to map a sentence input onto a syntactic and possibly shallow semantic analysis, which usually consists of a hierarchy of substrings, a syntax tree. The syntactic analysis also expresses the grammatical functions (GF, or GR for grammatical roles) of the substrings, either implicitly based on their position in so-called configurational approaches, or explicitly by labeling the functions in so-called functional approaches. The approach presented here follows the latter paradigm, which is often used in a class of grammars called Dependency Grammars (DG). There is a growing interest in dependency-based representations for the purpose of many Natural Language Processing tasks, such as text mining or information extraction (de Marneffe, MacCartney, and Manning, 2006; Rinaldi et al., 2007), semantic space construction (Henderson et al., 2002; Weeds et al., 2005; Padó and Lapata, 2007), question answering (Aliod et al., 2000), etc. One of the advantages of dependency-based

syntactic representations is that they can be mapped easily into a semantic representation, and they allow easy identification of the arguments of complex relations.

We propose to combine a rule-based approach with a statistical approach. We rely on a hand-written and editable syntactic competence grammar over part-of-speech tags, and use statistical lexical information to estimate the probabilities of the application of these rules for performance-based disambiguation. Linguists are often very efficient at writing grammar rules, particularly when using a framework that is close to traditional school grammar assumptions, such as DG (Tesnière, 1959). But the scope of application and the amount of ambiguity a rule creates is easily beyond imagination and better handled by a statistical system. The statistical model that we suggest is not probabilistic in the sense that it captures the probability of generating a sentence (Collins, 1999), but models the decision process of parsing. The probabilities of possible decisions at an ambiguous point in the derivation are assumed to add up to 1. Its probability estimation is thus a discriminative model (Charniak, 1996; Johnson, 2001); a small illustrative sketch follows the list of results below.

Although we use a context-free CYK parsing algorithm (Younger, 1967), we are able to treat the majority of English long-distance dependencies, by (1) using and modelling dedicated patterns across several levels of constituency subtrees, partly leading to dedicated but fully local dependency syntactic relations, by (2) lexicalized post-processing rules, and by (3) modelling mildly context-sensitive phenomena in DG. We also show that some non-local dependencies are simply artifacts of the grammatical representation. This thesis reports three results:

We suggest and implement a parsing architecture that is fast, robust and accurate enough to allow users to do broad-coverage parsing of unrestricted texts from unrestricted domains. We have parsed the 100 million word British National Corpus and similarly large amounts of medical scientific literature.

We present a probability model and a combination between a rule-based competence grammar and a statistical lexicalized performance disambiguation model.

We show that inherently complex problems can be approximated and broken down sufficiently well by less complex methods. In particular (1) on the level of long-distance dependencies, the majority of them can be approximated by using a labeled DG, context-free finite-state based patterns, and post-processing, (2) on the level of long-distance dependencies, a slightly

extended DG allows us to use mildly context-sensitive operations known from Tree-Adjoining Grammar, (3) on the base phrase level, parsing can successfully be approximated by the more shallow approaches of chunking and tagging. We conclude that a representationally minimalist theory such as labeled DG is sufficiently expressive for linguistically adequate parsing.
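To make the decision-based probability model above concrete, here is a minimal sketch in Python with hypothetical counts and names (it is not the Pro3Gres implementation): the competing decisions at one ambiguous point are estimated from observed counts and, by construction, sum to 1.

    from collections import Counter

    # Hypothetical counts of attachment decisions observed in training
    # data: after "eat ... steak", a PP headed by "with" was attached
    # either to the verb or to the noun.
    decision_counts = Counter({
        ("eat", "steak", "with", "verb-attach"): 12,
        ("eat", "steak", "with", "noun-attach"): 3,
    })

    def decision_prob(verb, noun, prep, decision):
        # MLE of P(decision | context); the competing decisions sum to 1.
        context_total = sum(
            count for (v, n, p, _), count in decision_counts.items()
            if (v, n, p) == (verb, noun, prep)
        )
        return decision_counts[(verb, noun, prep, decision)] / context_total

    print(decision_prob("eat", "steak", "with", "verb-attach"))  # 0.8
    print(decision_prob("eat", "steak", "with", "noun-attach"))  # 0.2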

One of the main challenges of parsing natural language is ambiguity, as we illustrate in section 1.1. In section 1.2 we give a brief introduction to the major approaches to parsing natural language texts. Since full parsing is very resource-intense, we argue in section 1.3 that it is beneficial to employ different techniques for the various tasks required to attain full parses. In section 1.4 we give a high-level comparison between our parser and a number of related parsers. To conclude the introduction chapter, we give an overview of the contents of each chapter of the thesis in section 1.5.

1.1 Ambiguity

Parsing of natural language, unlike the parsing of formal language, is difficult due to the inherent ambiguity of natural language at all levels. Ambiguity is a primary motivation for using statistical methods. We now give some examples of syntactic ambiguity for illustration. Except for the first example, they are all examples of structural ambiguity. All of the ambiguity types discussed here occur frequently in real-world texts.

Part-of-speech (POS) ambiguity run is a verb in I like to run but a noun in He liked the run. Even for the seemingly clear-cut and very frequent POS classes verb and noun, there are areas of gradience. The two semantically nearly identical sentences I like him running and I like his running suggest a verb reading for the former and a noun reading for the latter case. With the female pronoun her a distinction can no longer be made.

PP-attachment I saw the man with glasses has at least 2 possible analyses: one in which I use the glasses as an instrument for seeing, and one where the man happens to carry glasses.

Coordination The director ran and left the factory can mean that the director first ran the factory, but has left his position in the meantime, or that he physically accelerated before he stepped outside the factory.

Complex NP or 2 NPs She sold BP shares can either mean that she sold unspec- ified shares to BP or that she sold shares of BP to somebody unspecified.

Object or adjunct I hate Mondays; but by Tuesday afternoon I usually stop hating has a hidden reading of which most readers only become aware in the second part of the sentence.

Adjunct or sentential complement The news reported last Friday was bad can either mean that last Friday was bad or that the reporting of the news happened last Friday. Readers can be unsure where the unrealized, or zero, relative marker should occur. Lexical statistics used in lexicalized statistical models such as the one introduced in this thesis are a natural way to express readers’ expectations.

Superordinate Clause or Zero-Relative Something the reporter said is unconvincing is ambiguous if punctuation is absent. Either the reporter has discovered an unconvincing item or listeners find that one of the reporter’s utterances fails to convince.

Functional role Stolen Painting Found by Tree. Only a semantic role analysis expresses the ambiguity between a geographical adjunct or a passive agent adjunct.

The vast majority of ambiguities that arise during the parsing process remain local, because the parsing context does not allow them to lead to a full parse, but they still tremendously slow down the parsing process. Sentences with dozens of global readings and thousands of local partial readings are rather the rule than the exception. Several mutually compatible analyses lead to an exponential increase in the number of readings. Church and Patil (1982) note on PP attachment that a sequence with n PPs has C_{n+1} analyses, where C_{n+1} is the (n+1)th Catalan number. The Catalan number C_n is defined as

    C_n = binomial(2n, n) / (n + 1)

Other types of ambiguity show a similar exponential behaviour. Average sentence length in the Wall Street Journal (WSJ) is around 23 words, with 7% of sentences being above 40 words. Large-coverage grammars have very many rules, so that the combination of a large grammar, long sentences and exponential ambiguity factors leads to enormous search spaces, which make parsing a very complex and error-prone challenge.
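To see how quickly this ambiguity grows, the following snippet (illustrative only) evaluates the Catalan numbers with the formula just given:

    from math import comb

    def catalan(n):
        # C_n = binomial(2n, n) / (n + 1)
        return comb(2 * n, n) // (n + 1)

    # A sequence with n PPs has C_{n+1} analyses (Church and Patil, 1982):
    for n in range(1, 9):
        print(n, catalan(n + 1))
    # prints: 1 2, 2 5, 3 14, 4 42, 5 132, 6 429, 7 1430, 8 4862

Eight stacked PPs thus already license nearly five thousand attachment analyses, which illustrates why pruning and statistical disambiguation are indispensable.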

While ambiguity often leads to an enormous number of potential readings for a sentence, the types of ambiguity are restricted and relatively few. We will discuss in section 1.4 that we model fewer than 10 types of ambiguity.

1.2 Approaches to Parsing

1.2.1 Full or Shallow Parsing

Many natural language applications have in fact dismissed full parsing as too complex and too error-prone and have suggested shallow parsing as a viable alternative. Shallow parsing is a popular alternative to full parsing: it is fast, because it is based on finite-state techniques, and robust, because only partial unconnected strings instead of full sentences are analysed. Recursion is absent or strictly limited. Typically, the text to be analyzed is run through a sequence of finite-state machines, which build up a partial structure in a bottom-up fashion (Appelt et al., 1995; Abney, 1996). Each finite-state machine corresponds to a level of syntactic processing: tagging for POS disambiguation, chunking for base-phrase recognition (in some systems followed by a PP-chunker for PPs), and a verbal (and sometimes nominal) attacher for the phrase level. Super-phrasal attachment is rarely done, long-distance dependencies are usually neglected, ambiguous PPs remain unattached, and non-canonical word order also leads to unattached constituents, or to wrong analyses. Because of the sequential processing, each finite-state transducer taking as input the output of the previous transducer, these shallow parsers are often called cascaded finite-state transducers.

While this approach is highly promising and reliable for the low-level cascades – tagging and base phrase chunking – the performance for high-level chunking drops off. Nerbonne et al. (2001) confirm that a variety of chunking methods applied to parsing tasks do not reach the levels of probabilistic parsers like Collins (1999) or Charniak (2000).

Briscoe and Carroll (2002) summarise the current view on shallow parsing in the computational linguistics community. They state that shallow parsing output is neither as complete nor as accurate as that of state-of-the-art statistical parsers, and that it is unlikely that shallow parsers will achieve the same level. A major problem for the development of accurate shallow parsers is that heuristics like longest match interact in complex ways with the large number of manually coded rules required in a wide-coverage system. This makes effective development of further rules increasingly difficult, and it requires increasingly painstaking manual specification of the contexts of legitimate application for each rule. A second problem is the pipeline approach, which requires that the output from each phase of processing is deterministic; thus many decisions need to be taken too early in the processing chain, favouring local maxima. A third problem is that many such systems achieve much of their domain independence by basing rules as much as possible on part-of-speech (PoS) tags, rather than specific lexical items, in order to limit the number of rules required.

In the IE community, the opinion that there is not yet a viable alternative to shallow parsing is still predominant. Shatkay and Feldman (2003) summarise the current view on shallow parsing as follows:

Efficient and accurate parsing of unrestricted text is not within the reach of current techniques. Standard algorithms are too expensive to use on very large corpora, and are not robust enough. A practical alternative is shallow parsing. ... Shallow parsing has the benefit of both speed and robustness of processing, which comes at the cost of compromising the depth and the fine-granularity of the analysis

(Shatkay and Feldman, 2003)

They dismiss full parsing as too slow and too error-prone. Since the publication of the approaches they quote, large-scale parsing has made tremendous progress, both in terms of speed and accuracy. We thus argue that their verdict merits re-assessment.

Full and fast parsing of unrestricted texts is within the reach of current techniques, as described in this thesis. On the one hand, full parsing is already widely applied in Computational Linguistics with a surface-syntax approach, namely probabilistic context-free parsing (Collins, 1999; Charniak, 2000; Henderson, 2003), and on the other hand, deep-syntactic approaches are currently becoming more robust, scalable and faster, so that the first systems allowing full, deep-syntactic parsing of unrestricted amounts of text from unrestricted domains are becoming available (Riezler et al., 2002; Clark and Curran, 2004; Miyao, Ninomiya, and Tsujii, 2005).

We describe a system in this thesis that uses finite-state techniques for the low-level tasks of lemmatising, part-of-speech tagging and base-phrase chunking, but uses full parsing for finding the syntactic relations between base phrases and clauses, thus profiting from the speed, robustness and small search-spaces of finite-state techniques, and employing resource-intense parsing approaches only where they are beneficial.

1.2.2 Formal Grammar Based Parsing Approaches

A variety of parsers that are based on a formal linguistic theory have existed for a number of years. To name only a few: the Alvey tools (Briscoe et al., 1987) have been developed for Generalized Phrase Structure Grammar (GPSG). Lingo (Copestake and Flickinger, 2000) and Babel (Müller, 1996) have been developed for Head-Driven Phrase Structure Grammar (HPSG). FIPS (Wehrli, 1997) and PAPPI (Fong, 1991) parse using Government & Binding (GB) grammars. MINIPAR (Lin, 1998) and FDG (Tapanainen and Järvinen, 1997) use Dependency Grammar (DG). Parsers for Combinatory Categorial Grammar (CCG) include e.g. Steedman (2000). XTAG (XTAG Research Group, 2001) has been developed for Tree-Adjoining Grammar (TAG). The Xerox Grammar Writer’s Workbench (Kaplan and Maxwell, 1996) has been developed for Lexical-Functional Grammar (LFG).

Formal grammars have been developed to formalise all phenomena that natural language exhibits. In their original form, formal grammar based parsers rely on competence data only. They are also referred to as grammar-based parsers. Parsers that are based on them are linguistically satisfactory, but their accuracy varies, and processing times are often considered too long for large-scale application. Also, developing and maintaining hand-written grammars and disambiguation scoring systems in complex formalisms can be labour-intensive. As an alternative, probabilistic parsers have been developed.

At the same time, some deep linguistic grammars have also achieved the coverage and robustness needed to parse large corpora, for two reasons. First, linguistic performance is increasingly included into these parsers, leading to the mixed approaches which we discuss below. A second important reason for the increasing speed and robustness of deep-linguistic grammars are recent developments in the context-free and mildly context-sensitive parsing of long-distance dependencies (LDD), which will be discussed in chapter 6.

1.2.3 Probabilistic Parsers

Broad-coverage syntactic parsers that learn from syntactically annotated corpora have become available. Classical probabilistic parsers (Eisner, 1996; Collins, 1999; Charniak, 2000; Henderson, 2003) use context-free history-based grammar representations (Black et al., 1993).

Implementations of probabilistic parsers can be very efficient. The CYK parsing algorithm has parsing complexity O(n³) (see e.g. (Eisner, 2000; Nivre, 2003)), which means that large-scale parsing of real-world texts becomes possible. Lexical information, but also entire grammars, can be learnt from annotated data.

Probabilistic parsers analyse sentences by means of imitation, following annotated linguistic performance data. Grammars are either learnt automatically or exist implicitly in the form of statistical data. These parsers are also referred to as data-driven parsers. They generally have good performance in terms of speed, accuracy, and robustness. Until recently, they were linguistically not convincing, since they produced pure CFG trees as output, i.e. trees that do not include the deep-syntactic information known from formal grammars, i.e. neither annotation for grammatical function nor the empty nodes and long-distance annotation which are actually provided in Treebanks such as the Penn Treebank (Marcus, Santorini, and Marcinkiewicz, 1993b; Bies et al., 1995) on which they are usually trained. Collins (1999) uses head-driven and dependency-based models, which are discussed in detail and compared to our approach in chapter 4. Recently, both the discovery of deep-syntactic information (Johnson, 2002; Dienes and Dubey, 2003; Jijkoun and de Rijke, 2004; Campbell, 2004; Musillo and Merlo, 2005) and probabilistic parsing that includes deep-syntactic information (Gabbard, Kulick, and Marcus, 2006) have made great progress.

The problem of grammar size is not necessarily solved in probabilistic systems, but can be aggravated by a naive probabilistic parser implementation, in which e.g. all CFG rules permitted in the Penn Treebank are extracted. From his 300,000-word training part of the Treebank, Charniak (1996) obtains more than 10,000 CFG rules, of which only about 3,000 occur more than once. It is therefore necessary to either discard infrequent rules, do manual editing, use a different rule format such as individual dependencies (Collins, 1996), compress the representation (Henderson, 2003), or use a hand-written grammar as we and other mixed models do.
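The rule-sparsity point can be illustrated with a small sketch (made-up miniature trees, not Charniak's actual extraction procedure): CFG rules are read off treebank trees, counted, and infrequent ones can then be discarded.

    from collections import Counter

    # Miniature phrase-structure trees as (label, children) tuples.
    trees = [
        ("S", [("NP", ["the", "dog"]), ("VP", [("V", ["barks"])])]),
        ("S", [("NP", ["a", "cat"]), ("VP", [("V", ["sleeps"])])]),
    ]

    def extract_rules(node, rules):
        # Count each internal CFG rule LHS -> labels of the children.
        label, children = node
        if any(isinstance(c, tuple) for c in children):
            rhs = tuple(c[0] for c in children if isinstance(c, tuple))
            rules[(label, rhs)] += 1
        for c in children:
            if isinstance(c, tuple):
                extract_rules(c, rules)

    rules = Counter()
    for t in trees:
        extract_rules(t, rules)

    # Keep only rules observed more than once, one remedy for sparsity:
    frequent = {rule: n for rule, n in rules.items() if n > 1}
    print(frequent)  # {('S', ('NP', 'VP')): 2, ('VP', ('V',)): 2}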

1.2.4 Mixed Models

Recently, parsers that mix grammar-based and probabilistic data-driven approaches have been particularly successful. They include the first LFG system that has managed to parse the entire Treebank (Riezler et al., 2002), the first HPSG parser to attain similar robustness (Miyao, Ninomiya, and Tsujii, 2005), and recent parsers in the CCG framework (Curran and Clark, 2004; Clark and Curran, 2004). Kaplan et al. (2004a) compare speed and accuracy of a successful probabilistic context-free parser (Collins, 1999) to a robust LFG system based on (Riezler et al., 2002). They show that their mixed model clearly outperforms Collins (1999). The system we present in this thesis is also a mixed model, relying on a hand-written competence grammar on the one hand, while using probabilistic performance data obtained from the Penn Treebank for disambiguation and pruning on the other.

1.3 Task-Specific Division of Labour

Not all NLP tasks are equally hard or resource-intense, and there is no single approach that is suitable for all levels of processing. Parsing proper is very resource-intense; using parsing algorithms for all levels of analysis can be seen as overkill if simpler algorithms deliver the same level of performance. Morphological analysis (morphology can be parsed with CFG rewrite rules), part-of-speech annotation (disambiguation can be done according to the highest-ranked parse) and base phrase detection can be efficiently handled by linear, finite-state algorithms (Abney, 1991). While finite-state algorithms are less resource-intense, and while they typically do not reach the accuracy of parsing approaches at the complex phrase and clause level (Briscoe and Carroll, 2002), the debate whether finite-state algorithms or parsing delivers better accuracy is still open. We summarise the results of Prins (2005) below. Dienes and Dubey (2003) have shown that a task that seems inherently hard, finding the landing sites for long-distance dependencies, can be handled quite successfully by a finite-state tagging approach.

Kaplan et al. (2004b) discuss the integration of finite-state technology into a formal grammar based parser using Lexical-Functional Grammar (LFG). The use of finite-state morphology greatly decreases the lexicon development task. The integration of finite-state morphology and part-of-speech tagging is described as an essential step for the development of a truly broad-coverage grammar, such as Riezler et al. (2002).

Approaches integrating part-of-speech taggers as filters for the parser typically show a considerable increase in parsing speed and robustness, and a slight increase in accuracy. For example, Prins (2005, 72-74) reports detailed results on applying such a filtering system to parsing. He shows that tagging preprocessing systems are up to 10 times faster, and that accuracy increases slightly if reasonable filtering parameters are used. Also, very long sentences typically cause parsers to run out of memory if no filter is used. Approaches integrating base phrase chunking, such as Prins (2005), typically report the same level of accuracy at moderately reduced parsing time.

We employ a hand-written grammar. The fact that the grammar writer is freed from the task of writing base NP rules provided an additional practical motivation for integrating chunking into our parser.

1.4 Relations and Differences to Current Approaches

A number of different approaches have been used in parsing. We give a high-level overview of approaches that are related to ours. Collins (1999) uses head-driven and dependency-based models in his work, which is discussed in detail and compared to our approach in chapter 4.

We have mentioned that some deep linguistic grammar based parsers have also achieved the coverage and robustness needed to parse large corpora, by means of adding performance data. Riezler et al. (2002) show how a hand-crafted LFG grammar scales to the Penn Treebank with Maximum Entropy probability models. Hockenmaier and Steedman (2002) acquire a wide-coverage CCG grammar from the Penn Treebank automatically, Burke et al. (2004) an LFG grammar. Sarkar and Joshi (2003) apply TAG to statistical parsing. These approaches are discussed in chapter 4. An important reason for the increasing speed and robustness of deep-linguistic grammars are recent developments in the context-free and mildly context-sensitive parsing of long-distance dependencies (LDD), which will be discussed in chapter 6.

The system we propose and implement in this thesis, Pro3Gres, is a hybrid system on many levels, which combines successful parsing and finite-state approaches. (The abbreviation Pro3Gres signifies PRObabilistic, PROlog-implemented, Parser-based RObust Grammatical Relations Extraction System.)

It occupies a middle ground between using a formal grammar and a probabilistic parser. Recently, progress has been made at closing the gap between deep-linguistic formal grammar-based parsing and probabilistic parsing. We have introduced these approaches as mixed models above. An important example is Kaplan et al. (2004a). Our work is in the same spirit. Like Kaplan et al. (2004a), our system explores the middle ground between systems like Collins (1999) and Riezler et al. (2002). Pro3Gres largely uses simple context-free parsing, but it delivers deep-syntactic analyses comprising most long-distance dependencies. It integrates efficient finite-state techniques like taggers and chunkers. Pro3Gres can be used as an alternative to formal grammars that integrate finite-state LDD approximations, discussed in chapter 4 (Riezler et al., 2002; Burke et al., 2004; Hockenmaier and Steedman, 2002; Miyao, Ninomiya, and Tsujii, 2003).

It can be seen as a statistical extension of Tapanainen and Järvinen (1997), a popular dependency parser which relies on constraint grammar (Karlsson et al., 1995), heuristics and a manually written grammar, but which does not have a statistical disambiguation component. Mollá and Hutchinson (2003) have conducted an evaluation of Tapanainen and Järvinen (1997) on a 500 sentence test corpus (Carroll, Minnen, and Briscoe, 1999). Comparing the results of their evaluation to our results in chapter 7 indicates that we perform better.

It can be seen as an extension of deterministic dependency parsing (Nivre, 2003; Nivre, 2004; Nivre, 2006b) to non-deterministic but still low-complexity parsing by using a beam search instead of an oracle based on a short look-ahead.

It is an extension from PP-attachment as in Collins and Brooks (1995) to all dependency relations, including the majority of long-distance dependencies (Schneider, 2003), as we discuss in chapter 4. It also extends Collins (1999) from ad-hoc dependencies to functional dependencies, including the majority of long-distance dependencies.

It can be seen as a version of the chunking & dependency model proposed by Abney (1995) that includes the treatment of long-distance dependencies. Pro3Gres integrates shallow parsing methods in order to reduce the parsing cost (Schneider, 2004).

We also present Pro3Gres as a statistical, broad-scale extension of Frank (2003)'s LFG chunk and F-structure model. We show that fully-fledged C-structures can be obviated in a functional dependency and chunks model that employs mild context-sensitivity (Schneider, 2005).

Our pattern-based finite-state recognition of long-distance dependencies can be seen as an extension to the patterns in Johnson (2002) and a parsing application of Jijkoun and de Rijke (2004).

Instead of simple and error-prone mapping schemes as in Collins (1999) or Nivre (2006b) to map the Treebank to dependency representations, we use a functionally oriented, relatively involved mapping scheme, which is explained in chapters 3 and 6 and which is detailed in the appendix. We believe that the merit of attaining a linguistically highly motivated dependency representation outweighs the disadvantage of using a non-trivial, relatively complex mapping.

Unlike many other probabilistic parsers, Pro3Gres models a closed and clearly defined set of ambiguities. They are the following ambiguities.

1. PP-attachment: PPs can be attached to a verb or to a noun. In I ate the steak with fries the PP attaches to the noun; in I ate the steak with a fork the PP attaches to the verb.

2. Words tagged _IN as preposition or complementizer: The Penn Treebank tagset uses the tag _IN for both prepositions and complementizers. In I work for I like it the word for is a complementizer; in I work for the money the word for is a preposition.

3. Subject or reduced relative clause: A noun preceding a verb can be this verb's subject or it may be modified by a reduced relative clause headed by this verb. In The newspaper published the article the noun newspaper is the subject of the verb published; in The article published last month the noun article is modified by a reduced relative clause headed by the verb published. The reduced relative clause reading is only possible for verb participles. Since the performance of taggers in distinguishing simple past and participle forms is relatively poor, simple past forms also need to be treated as ambiguous.

4. Object or adjunct noun: A noun following a verb may be an object or an adjunct. In Jane ate the excellent fish the NP headed by fish is an object; in Jane ate only last Friday the NP headed by Friday is an adjunct.

5. Participles as verbs or as adjectives: Participles may serve as adjectives or as full verbs. In Computer aided design the participle aided is an adjective; in the headline Computer aided to design spacecrafts it is a full verb.

6. Subordinate clause attachment: Subordinate clauses attach to a verb or a noun. In John wanted his girlfriend to leave the subordinate clause headed by leave attaches to the matrix verb wanted; in John wanted a reason to leave the subordinate clause headed by leave attaches to the matrix object reason.

7. Matrix verb dependent or subordinate clause subject: Nouns between the matrix verb and the subordinate verb may depend on either of them if a complementizer is absent. In The news reported the earthquake was bad the noun earthquake depends on the subordinate verb; in The news reported yesterday was bad the noun yesterday depends on the matrix verb. This example hinges on a subject vs. reduced relative clause ambiguity. Independent examples are possible, but less obvious, as they are not acceptable in all English dialects: In John realized the apple was bad the noun apple depends on the subordinate verb; in John ate the apple was bad the noun apple depends on the matrix verb and is modified by a zero-relative clause.

8. Object or subject: A closed class of verbs, typically introducing direct speech, may use subject-verb inversion if the matrix clause occurs in sentence-final position. In “Yes”, said the student the subject and verb are inverted, and student is then the subject; in The student said something there is no inversion, and something is the object.

In addition to these 8 types of ambiguities, some ambiguities are addressed by post-processing, for example the distinction between a PP acting as a complement or as an adjunct. Long-distance dependency ambiguities are treated partly in post-processing and partly during parsing, as chapter 6 explains.

1.5 Overview of this Thesis

The further layout of this thesis is as follows.

Overview of the Parser and the Architecture Chapter 2 gives an overview of the parser by discussing an example sentence. Each module is introduced and briefly discussed. An introduction to the probabilistic model of the parser is given. Detailed discussions are postponed to the later chapters.

Dependency Grammar Chapter 3 introduces and defines the grammar formalism that we use: Dependency Grammar. We discuss the common core of all Dependency Grammar approaches, and the characteristics of our version of Dependency Grammar. Our version of Dependency Grammar is labelled with grammatical roles, partly underspecifies word order, is mildly context-sensitive, mostly allows only content words to be heads, and integrates tagging and chunking. This chapter also paves the way for our treatment of long-distance dependencies in chapter 6.

Related Approaches In chapter 4, a summary of related approaches is given, and they are compared to our approach. We review research on PP-attachment as an especially ambiguous relation; then we discuss probabilistic approaches (Collins, 1999; Charniak, 2000). They are relatively fast and robust, but quite shallow, since they typically do not express long-distance dependencies. Some formal grammar based parsers have now become robust enough for wide-coverage parsing. We discuss that the complexity class occupied by Tree-Adjoining Grammar (TAG) is a good candidate for expressing the amount of context-sensitivity found in natural language. A non-parsing approach aiming at the expression of grammatical roles is also discussed.

Grammar Engineering In chapter 5, we discuss our hand-written grammar in detail. Advantages and disadvantages of hand-written grammars are discussed. One advantage is that sentence types that are rare in the training domain but important for the application domain, such as questions, which are rare in the Penn Treebank but important for question answering (QA), can be manually tuned or extended. A hand-written grammar is also perspicuous and maximally flexible. Since dependency rules are binary and the Penn Treebank tagset is quite limited, the expense needed for writing a large-scale grammar is manageable. Grammars are re-usable since they are not very domain-dependent. The hand-written grammar explicitly does not model rare and marked phenomena, and places strong, linguistically well-founded constraints, which leads to a considerable search space reduction. As rare phenomena have very low probabilities, precision and recall values are hardly affected.

Extended Locality: Modelling Long-Distance Dependencies in a Context-Free Fashion Context-free grammars are appealing as they allow a parser to use very fast parsing algorithms, for example the O(n³) CYK algorithm. But context-free grammars cannot express non-local information (long-distance dependencies, LDDs), at least not in the traditional way involving empty nodes and structure-sharing, co-indexation or movements. We will explore in chapter 6 how the majority of LDDs can be expressed in a context-free way.

In particular, we will show that the vast majority of non-local dependencies, except for few WH-traces, can be treated as local dependencies by (1) using and modelling dedicated patterns across several levels of constituency subtrees, partly leading to dedicated but fully local dependency syntactic relations, by (2) lexicalized post-processing rules, and by (3) modelling mildly context-sensitive phenomena in DG. We also show that (4) some non-local dependencies are simply artefacts of the grammatical representation.

We will first explore how unbounded LDDs really are, then we will give a quantitative analysis of our findings for the ten most frequent types of empty nodes in the Penn Treebank (section 6.2), which cover more than 60,000 of the approximately 64,000 empty nodes of sections 2-21 of the Penn Treebank. Then we will describe the importance of the particular grammar formalism we are using, Functional DG, for achieving the goal of expressing most LDDs locally. We will also explain how we deal with the LDDs that we found to be really unbounded, namely indexed gerunds and WH-non-subject questions. The former remain underspecified, the latter are treated either with a simple pre-parsing approach, or with a slightly extended DG that allows us to express mildly context-sensitive constructions known from Tree-Adjoining Grammar (TAG) in DG. The relation to TAG (Frank, 2002; Frank, 2004) and LFG (Bresnan, 2001) is discussed.

Evaluation In chapter 7, an extensive evaluation is given, from an unlexicalized baseline to the lexicalized versions of the parser. The evaluation is based on GREVAL (Carroll, Minnen, and Briscoe, 1999) and on a selection of the GENIA corpus. We also evaluate relations involving long-distance dependencies, as far as this is possible. It is shown that the parser’s performance is comparable to a selection of statistical parsers and to a classical robust full parser. The roles of pruning, backing-off, distances, lexicalisation and linguistic constraints are investigated.

Chapter 2

Overview of the Parser and the Architecture

In this chapter we first give an overview of the parser with an example. Each module is introduced and briefly discussed. More detailed discussions are postponed to the appropriate chapters. In the second part we give three examples of how we integrate linguistic knowledge into our system.

2.1 The Modular Architecture

Pro3Gres is a modular system. Each problem is broken down into sub-problems, and a simplified solution is used. As much of the processing and disambiguation as reasonably possible is done before and after the costly parsing stage. The parsing itself is kept as resource-efficient as possible by reasonably reducing search spaces and restricting mostly to CFG parsing. Fig. 2.1, a flowchart, gives an overview of the parsing architecture. We will now illustrate the information flow by sketching the on-line processing of the following example sentence and, where applicable, the off-line processing of Treebank training sentences.

(1) What could rescue the bill would be some quick progress on a bill amending the National Defense Education Act of 1958

Its top-ranked parser output can be seen in figures 2.2 and 2.3. Figure 2.2 uses the classical arrow notation, in which labelled dependencies are drawn as arrows from the governor to the dependent.


[Figure 2.1: Pro3Gres flowchart]

Figure 2.3 uses the classical stemma notation, in which governors appear higher than dependents and the dependency labels appear directly below the head. The stemma notation was introduced by Tesnière (1959) and illustrates the similarity of dependency to constituency, which we will discuss in chapter 3. Both representations are completely equivalent; only a few parameters in the tree display routine differ, and the graphical interface offers both as display options.

Tagging The raw text is part-of-speech tagged using a state-of-the-art tagger. We have used LTPos (Mikheev, 1997) and, alternatively, Ratnaparkhi's Maximum Entropy tagger (Ratnaparkhi, 1996).

Almost all current taggers reach accuracies of 95-97% for tagging the Penn Treebank with the Penn Treebank tagset. Only an evaluation on unknown words is reported in Mikheev (1997), but LTPos is a popular tagger because it supports XML and integrates a chunker, which made it particularly easy to integrate into our processing pipeline. Ratnaparkhi (1996) reports an accuracy of 96.6%. The tagged text of our example sentence is shown in (2).

(2) What_WP could_MD rescue_VB the_DT bill_NN would_MD be_VB some_DT quick_JJ progress_NN on_IN a_DT bill_NN amending_VBG the_DT National_NNP Defense_NNP Education_NNP Act_NNP of_IN 1958_CD

[Figure 2.2: Parser output of the sample sentence What could rescue the bill would be some quick progress on a bill amending the National Defense Education Act of 1958 in the classical arrow notation]

[Figure 2.3: Parser output of the sample sentence What could rescue the bill would be some quick progress on a bill amending the National Defense Education Act of 1958 in stemma notation]

Chunking The tagged text is chunked for verb groups, indicated by parentheses, and base NPs, indicated by square brackets, with LTChunk (Mikheev, 1997). Because LTChunk was not robust enough for chunking the whole British National Corpus, we have used Carafe, a conditional random field chunker, for that task.1 The resulting text is shown in (3).

1 Carafe is a programming project available at http://sourceforge.net/projects/carafe/

(3) What_WP (could_MD rescue_VB) [the_DT bill_NN] (would_MD be_VB) [some_DT quick_JJ progress_NN] on_IN [a_DT bill_NN] amending_VBG [the_DT National_NNP Defense_NNP Education_NNP Act_NNP] of_IN 1958_CD

Head Extraction The linguistic heads are extracted using our own implementation of Magerman rules (Magerman, 1995). The original chunk and the tag are associated with the head. The linguistic head is the semantic and syntactic core of the chunk. It has frequently been suggested (see e.g. Abney (1995); Collins (1996)) that it is sufficient to parse between the heads of chunks. The sentence reduced to (head, tag) pairs is shown in (4).

(4) What_WP rescue_VB bill_NN be_VB progress_NN on_IN bill_NN amending_VBG Act_NNP of_IN 1958_CD
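To make this step concrete, the following is a minimal Python sketch of head extraction, assuming chunks are given as lists of (word, tag) pairs. The head-rule table and the function extract_head are simplified illustrative stand-ins, not the actual Magerman (1995) rule set or the Pro3Gres implementation.

# Simplified head rules: search direction and tag priorities per chunk type.
HEAD_RULES = {
    "NP": ("right-to-left", ["NN", "NNS", "NNP", "NNPS", "CD"]),
    "VG": ("right-to-left", ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]),
}

def extract_head(chunk_label, tokens):
    """Return the (word, tag) pair that heads the chunk."""
    direction, priorities = HEAD_RULES[chunk_label]
    ordered = tokens[::-1] if direction == "right-to-left" else tokens
    for wanted in priorities:
        for word, tag in ordered:
            if tag == wanted:
                return (word, tag)
    return ordered[0]  # fall back to the outermost token

# [the_DT National_NNP Defense_NNP Education_NNP Act_NNP] -> ('Act', 'NNP')
print(extract_head("NP", [("the", "DT"), ("National", "NNP"),
                          ("Defense", "NNP"), ("Education", "NNP"),
                          ("Act", "NNP")]))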

Lemmatizing The extracted heads are lemmatised. We use Morpha (Minnen, Carroll, and Pearce, 2000).

Functional Dependency Grammar The hand-written grammar is based on part-of-speech tags and on a few closed-class words. For example, the leftmost parsing step, combining what and rescue with a relative modification dependency, is licensed by a rule involving closed-class words, because it is restricted to only a subset of all WH-pronouns. First, only a subset of WH-pronouns can generally serve as relative pronouns; second, the relative dependency here is special, since what is both the modified noun and the relative pronoun, short for that which. This rule is restricted to the WH-pronoun what only. The rule involved in the second parsing step from the left, which leads to the analysis in fig. 2.2 and licenses the object dependency, is very general and applies to any verb tag followed by any noun tag.

Parsing A CYK parser (Younger, 1967) is used. CYK is a generalised, all-paths version of chart-based shift-reduce parsing for CNF grammars. CYK is a bottom-up algorithm: the structure is built up in a breadth-first fashion from the lexical items, level by level. The algorithm in pseudo-code is as follows (N is the number of chunks in the sentence):

for j = 2 to N                          # length of span
    for i = 1 to N − j + 1              # beginning of span
        for k = i + 1 to i + j − 1      # separator position
            if Z → X Y and X in chart[i to k−1] and Y in chart[k to i+j−1]
               and Z not in chart[i to i+j−1]
            then insert Z into chart[i to i+j−1]
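The following Python sketch implements the same bottom-up build-up, assuming a toy binary grammar over chunk-head categories. The rule inventory is invented for illustration and is not the Pro3Gres grammar.

from collections import defaultdict

def parse_cyk(cats, rules):
    """cats: chunk-head categories; rules maps a category pair (X, Y) to the
    set of categories Z licensed by a production Z -> X Y."""
    n = len(cats)
    chart = defaultdict(set)              # (i, j) -> categories spanning i..j
    for i, cat in enumerate(cats):
        chart[(i, i)].add(cat)
    for length in range(2, n + 1):        # span length (j in the pseudo-code)
        for i in range(n - length + 1):   # beginning of span
            j = i + length - 1
            for k in range(i + 1, j + 1): # separator: first position of Y
                for x in chart[(i, k - 1)]:
                    for y in chart[(k, j)]:
                        for z in rules.get((x, y), ()):
                            chart[(i, j)].add(z)
    return chart

# Toy grammar over the first three heads of (4): What rescue bill.
rules = {("WP", "VB"): {"WP"}, ("VB", "NN"): {"VB"}}
chart = parse_cyk(["WP", "VB", "NN"], rules)
print(chart[(0, 2)])                      # {'WP'}: What(rescue(bill))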

At the first level, where the span length j is 2, all combinations between adjacent words are built, as far as they are licensed by the grammar. The combination of A and B, the span from A to B, can either have A as head (which can be noted as A(B)) or B as head (which can be noted as B(A)). The following graph illustrates the build-up at level 2 for the first 6 words in the sentence.

[Level-2 chart: the cells A−B, B−C, C−D, D−E and E−F above the terminals A B C D E F]

In our example, A − B is What − rescue, which can be licensed by a subject relation, in which B is the head, B(A), or by a modification by relative clause relation, in which A is the head, A(B). B − C is rescue − bill, which can be licensed by an object or an adjunct or a subject relation, in all cases rescue being the head. The object relation rule succeeds and creates a chart entry. Since the inverted subject relation rule is restricted to a closed class of verbs, it fails. The adjunct relation rule is restricted to a closed class of dependents, namely temporal expressions, and thus fails. C − D is bill − be, which licenses a subject relation. This relation will not lead to a globally connected parse due to the parsing context, but it is possible locally; the rule succeeds and a chart entry is created. Analogously, structures of span length 2 are constructed for the rest of the sentence. At the next level, where j is 3, structures with span length 3 are constructed. A − C

[Triangular CYK chart, levels j = 2 to j = 6, over the terminals A B C D E F]

Figure 2.4: Complete CYK chart for an example sentence with 6 terminals

is What(rescue(bill)) with a modification by relative clause relation, or rescue(what)(bill) = rescue(what, bill) with a subject relation. Analogously, the whole structure is built up, in exactly N levels, where N is the number of words in the longest parse span. All the theoretically possible chart entries for the cell A − C are listed in the following illustration.

A + (B(C)) = A(B(C)) or B(A, C)
A + (C(B)) = A(C(B)) or C(A, B)
C + (A(B)) = C(A(B)) or A(B, C)
C + (B(A)) = C(B(A)) or B(A, C)

[These combinations fill the cell A−C at level j = 3 of the chart shown above]

The algorithm continues analogously until one or several spans covering the entire sentence are found. In our example, that is at j = 6, because this example has 6 terminals. Figure 2.4 shows the complete chart for this example. In order to alleviate the overhead of keeping loop variables or executing loops that do not find results, a data-driven, fully declarative version of the CYK algorithm has been implemented. For every chart entry X (whose start i and end k are provided by the chart entry X), if there is a chart entry Y (starting at k + 1, and whose end j is provided by the chart entry Y), and if there is a grammar rule Z → X, Y, then a new chart entry Z (starting at i and ending at j) is inserted into the chart. Its pseudo-code is as follows. We use lambda notation in order to indicate unbound variables.

1. Add all terminals (heads of chunks) to the chart
2. Loop:
   foreach chart entry X λi.λk.[i to k]
       foreach chart entry Y λj.[k+1 to j]    # adjacent
           if ¬ tried(X, Y)
               foreach Z → X, Y
                   assert Z[i to j] to chart
                   assert tried(X, Y)
3. If any rule was successful, prune and then Loop again, else terminate.

Relation Extraction Each chart entry is weighted by a lexicalized probability. The frequency counts for the lexicalized statistics are obtained from the Gold Standard, which is the Penn Treebank (Marcus, Santorini, and Marcinkiewicz, 1993b; Bies et al., 1995), at an off-line stage, before the parsing starts. Structural patterns expressing grammatical relations are applied to the Gold Standard. For local relations the patterns are relatively simple; for long-distance dependencies they can be quite complex (see chapter 6), spanning a large number of tree generations. Since the lexical heads need to be extracted, all patterns comprise more than one generation. Let us consider a simple example of a subject relation. The first sentence in the Penn Treebank is annotated as follows.

(5)

( (S (NP-SBJ (NNP Mr.) (NNP Vinken) ) (VP (VBZ is) (NP-PRD (NP (NN chairman) ) (PP (IN of) (NP (NP (NNP Elsevier) (NNP N.V.) ) (, ,) (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) ))))) (. .) ))

A subject relation holds between an NP with the functional label SBJ and its VP sister. In order to extract the lexical head, the extraction pattern needs to comprise the possible subtrees of the NP and the VP. They can both be arbitrarily nested; in our simple example that is not the case, as both NP and VP immediately dominate their heads. The subject NP immediately dominates two nouns, however. The structural pattern selects the rightmost noun as the head. It thus reports a count for the (noun, verb) pair (Vinken_NNP, is_VBZ). The counts are lemmatized after the extraction, and back-off counts with semantic classes, tags, and only a subset of the heads specified are also calculated.

Lexicalized Statistical Data from the Treebank During the build-up of the parse, three sources contribute to the disambiguation. First, the parsing context only allows a small subset of local structures to be combined into a full parse. Second, the analyses are ranked by the product of the probabilities of the parsing decisions. Third, analyses whose products of probabilities fall below a certain threshold are abandoned. A Maximum Likelihood Estimation (MLE) probability model is used for the second and third source. We use the hash symbol (#) to symbolise frequency. The general Pro3Gres MLE estimation is as follows. We estimate the probability of dependency relation R at distance (in chunks) dist, given the lexical head a of the governor and the lexical head b of the dependent. By application of the chain rule we get:

P(R, dist | a, b) = P(R | a, b) \cdot P(dist | R, a, b) \approx \frac{\#(R, a, b)}{\#(a, b)} \cdot \frac{\#(R, dist, a, b)}{\#(R, a, b)}    (2.1)

We then make the assumption that the distance depends only on the relation type, but not on the lexical items. We have observed that some relations, for example the subordinating clause relation sentobj or the PP-attachment relations modpp and pobj, can span many chunks, while in the object relation obj the object noun is almost always immediately adjacent to its governing verb chunk. We have also observed that there is only little variation based on lexical differences, so that including them would considerably increase the sparseness of the data at probably very little benefit.

P(R, dist | a, b) \approx P(R | a, b) \cdot P(dist | R) \approx \frac{\#(R, a, b)}{\#(a, b)} \cdot \frac{\#(R, dist)}{\#R}    (2.2)
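A back-of-the-envelope Python sketch of how (2.2) can be computed from count tables, assuming the counts have already been extracted from the Treebank; the variable and function names are illustrative, not the actual Pro3Gres data structures.

from collections import Counter

count_Rab = Counter()    # #(R, a, b): relation with both lexical heads
count_ab = Counter()     # #(a, b): co-occurrence of the two heads
count_Rdist = Counter()  # #(R, dist): relation at a given chunk distance
count_R = Counter()      # #R: all occurrences of the relation

def p_relation(rel, dist, gov, dep):
    """P(R, dist | a, b), estimated as in (2.2)."""
    if count_ab[(gov, dep)] == 0 or count_R[rel] == 0:
        return 0.0
    return (count_Rab[(rel, gov, dep)] / count_ab[(gov, dep)]
            * count_Rdist[(rel, dist)] / count_R[rel])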

Some relations use variations of this general estimation rule. Let us look at the ending of our example sentence, ... a bill amending the National Defence Education Act of 1958. The final PP, which is introduced by of, can syntactically be attached in 5 positions: to be, progress, bill, amend, or act. For PP-attachment, a variation on the above MLE estimation, following Collins and Brooks (1995), is used. It includes the PP-internal noun (we will refer to it as the description noun). a is the head (a verb or a noun), b the preposition and c the description noun; R is labelled modpp for noun-attachment and pobj for verb-attachment.

P(R, dist | a, b, c) \approx P(R | a, b, c) \cdot P(dist | R) \approx \frac{\#(R, a, b, c)}{\#(a, b, c)} \cdot \frac{\#(R, dist)}{\#R}    (2.3)

The co-occurrence count in the denominator expresses the sum of attachment cases and attachment candidates (i.e. syntactically licensed, potential attachments). Because potential attachment interacts with the rest of the parsing process in complex ways, it needs to be approximated. Following Collins and Brooks (1995), we model PP-attachment as 2-way ambiguous between noun- and verb-attachment.2 As regards our above example, where attachment is 5-way ambiguous, this means that the 5 attachment possibilities are never all compared directly. The denominator generally expresses the sum of competing relations. The MLE estimations for pobj and modpp are thus the distance factor times all attachment cases divided by all attachment cases plus all cases where the attachment went to any competing relation.

P(pobj, dist | verb, prep, desc.noun) \approx \frac{\#(pobj, verb, prep, desc.noun)}{\#(verb, prep, desc.noun)} \cdot \frac{\#(pobj, dist)}{\#pobj}

P(modpp, dist | noun, prep, desc.noun) \approx \frac{\#(modpp, noun, prep, desc.noun)}{\#(noun, prep, desc.noun)} \cdot \frac{\#(modpp, dist)}{\#modpp}

2 There are other approximations: e.g. Collins (1996) uses the entire sentence as a window, and Merlo, Crocker, and Berthouzoz (1997) model 3-way ambiguous situations.

Since we model PP-attachment as ambiguous between verbal and nominal attachment (as discussed in section 1.4), we make the approximation that the competing relations are only pobj and modpp, hence that the denominator is the sum of all nominal plus all verbal attachments given the lexical items. This approximation is appropriate, because PPs can usually only attach to nouns (modpp), to verbs (pobj), or to predicative adjectives (which is subsumed under the pobj relation). If we based our counts on all 2-way ambiguous verb-noun-PP sequences (Collins and Brooks, 1995), we would get the following probabilities.

P(pobj, dist | verb, noun, prep, desc.noun) \approx \frac{\#(pobj, verb, noun, prep, desc.noun)}{\#(pobj, verb, noun, prep, desc.noun) + \#(modpp, verb, noun, prep, desc.noun)} \cdot \frac{\#(pobj, dist)}{\#pobj}

P(modpp, dist | verb, noun, prep, desc.noun) \approx \frac{\#(modpp, verb, noun, prep, desc.noun)}{\#(modpp, verb, noun, prep, desc.noun) + \#(pobj, verb, noun, prep, desc.noun)} \cdot \frac{\#(modpp, dist)}{\#modpp}

We need a slightly different model for three reasons. First, this model only counts occurrences where a PP appears in an ambiguous position in the text. We would like to profit from the many cases in which informative PP-attachments in the Gold Standard do not appear in ambiguous positions, for example a sentence-initial noun attaching an immediately following PP, or a verb attaching the immediately following PP. The latter case includes intransitive verbs, which are only in an ambiguous position if they are followed by an adjunct noun. Second, the parser needs attachment probabilities for 2-way ambiguous, unambiguous, and multi-way ambiguous attachments alike. Third, when an attachment probability needs to be assigned during parsing, our analysis of the sentence is very incomplete, and we do not know which other attachment possibilities there will be. Let us look at the parsing situation where act and of 1958 are about to be combined, when the probability of the noun attachment with act as governor and of 1958 as dependent needs to be assigned. At this stage, there is no certain way of knowing that act will in fact be in competition with amend, bill and progress over attaching of 1958, nor that it will not be in competition with what and the first occurrence of the word bill in the sentence. An object dependency from rescue to the first occurrence of bill, rendering it inaccessible as a potential governor for of 1958, will exist in the chart at this stage, but we cannot know if it will feature in the highest-ranked analysis. These intricate interdependencies arising from the parsing context are very hard to estimate. We have therefore decided to model PP-attachment as generally ambiguous between noun attachment and verb attachment (the latter including adjective attachment), that is, to use the putative parsing context of Collins and Brooks (1995) as an approximation, in which every verb is in competition with one noun, and every noun is in competition with one verb. The actual competitors at parse time are never compared directly, but only indirectly, via the putative parsing context. Generally, an MLE probability is the result of the positive counts divided by the candidate counts. For our PP-attachment model, positive counts are cases that do attach in the Gold Standard, and candidate counts are cases that do attach in the Gold Standard plus cases that could attach but do not, according to the putative parsing context. For verb attachment, then, candidate cases are all cases where attachment as pobj occurs, plus all cases where in the ambiguous context of a verb-noun-PP sequence the PP attaches to the noun (the label will be modpp).

P(pobj, dist | verb, prep, desc.noun) \approx \frac{\#(pobj, verb, prep, desc.noun)}{\#(pobj, verb, prep, desc.noun) + \#(modpp, verb, P(noun), prep, desc.noun)} \cdot \frac{\#(pobj, dist)}{\#pobj}    (2.4)

P(modpp, dist | noun, prep, desc.noun) \approx \frac{\#(modpp, noun, prep, desc.noun)}{\#(modpp, noun, prep, desc.noun) + \#(pobj, P(verb), noun, prep, desc.noun)} \cdot \frac{\#(modpp, dist)}{\#modpp}    (2.5)

The counts are backed off across several levels, following Merlo and Esteve Ferrer (2006), by including semantic classes, the WordNet lexicographer file ID for nouns and the Levin top class for verbs. The probabilities used for the attachment of the PP of 1958 in the example sentence are as follows.

Probability of attachment of the PP of 1958 to the noun bill

P(modpp, dist | noun, prep, desc.noun) \approx \frac{\#(modpp, bill, of, Class18) = 8}{\#(modpp, bill, of, Class18) + \#(pobj, P(verb), bill, of, Class18) = 8} \cdot \frac{\#(modpp, dist) = 677}{\#modpp = 53591} = 0.0126

Probability of attachment of the PP of 1958 to the verb amend

P(pobj, dist | verb, prep, desc.noun) \approx \frac{\#(pobj, P(verb), of, P(noun)) = 420}{\#(pobj, P(verb), of, P(noun)) + \#(modpp, verb, P(noun), of, desc.noun) = 5586} \cdot \frac{\#(pobj, dist) = 25650}{\#pobj = 46124} = 0.0418

Probability of attachment of the PP of 1958 to the noun act

P(modpp, dist | noun, prep, desc.noun) \approx \frac{\#(modpp, act, of, Class18) = 14}{\#(modpp, act, of, Class18) + \#(pobj, P(verb), act, of, Class18) = 14} \cdot \frac{\#(modpp, dist) = 11918}{\#modpp = 53591} = 0.2224
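As a quick sanity check, the three probabilities can be recomputed in Python directly from the quoted counts:

p_bill = (8 / 8) * (677 / 53591)           # modpp: bill  <- of 1958
p_amend = (420 / 5586) * (25650 / 46124)   # pobj: amend <- of 1958 (backed off)
p_act = (14 / 14) * (11918 / 53591)        # modpp: act   <- of 1958
print(round(p_bill, 4), round(p_amend, 4), round(p_act, 4))
# -> 0.0126 0.0418 0.2224: act wins the attachment of "of 1958"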

As we have mentioned, there is no direct competition between different possible noun-PP attachments in this model. For example, bill and act are not in direct competition in this model; their probabilities differ due to the distance factor and their relative probabilities when in competition with verb attachment. We have now explained the probability for a single parsing decision, which equals a single attachment. The probability of a tree for a given sentence is the product of all the single decisions that build up the tree. As can be seen in the triangular chart that is built up by the CYK algorithm (see figure 2.4), the number of chart entries m is related to the number of terminals n as follows.

m = \frac{(n - 1)^2 + (n - 1)}{2}    (2.6)

The probability of a tree T given a sentence S with n terminals is therefore:

P(T | S) = \prod_{i=1}^{m} P(R_i, dist_i | a_i, b_i)    (2.7)
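Both (2.6) and (2.7) are easy to check mechanically; the snippet below computes m for the 6-terminal example of figure 2.4 and forms a tree probability as a product of invented per-decision probabilities:

import math

def chart_entries(n):
    """m = ((n - 1)^2 + (n - 1)) / 2 chart entries for n terminals, as in (2.6)."""
    return ((n - 1) ** 2 + (n - 1)) // 2

print(chart_entries(6))                # -> 15, the cells of the chart in fig. 2.4

decision_probs = [0.4, 0.9, 0.2, 0.7]  # invented P(R_i, dist_i | a_i, b_i) values
print(math.prod(decision_probs))       # P(T | S), the product in (2.7)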

We estimate the probability of a tree given a sentence. We thus use a discriminative, not a generative model (Johnson, 2001). MLE is not the statistical method leading to the best possible results. However, Bikel (2004, 109 ff.) points out that using advanced statistical methods often does not improve performance considerably. He reports only a 1% increase in parsing performance on a Collins (1999) parser when replacing the MLE model by a Maximum Entropy model. We have therefore decided to invest in a sophisticated back-off model rather than in alternative statistical methods.

Pruner In complex real-world sentences, constructing all possible chart entries can become very time-consuming. It has been shown (see e.g. Brants and Crocker (2000)) that discarding locally very improbable partial analyses hardly affects a parser's performance, because the chance that locally very improbable analyses become parts of the most probable analysis later is very small. Pruning happens during parsing, as indicated by the double arrow in figure 2.1. Pruning is a standard procedure, used in all beam parsers (Ratnaparkhi, 1997; Henderson, 2003). We use the following three pruning methods: hard local cut, fixed beam pruning, and large chart panic mode. The hard local cut method rules out local attachments that are very improbable. As a default, all attachments that have a probability below 1% are immediately cut. The fixed beam pruning method restricts the number of chart entries for each span. As a default value for robust, large-scale parsing we use 5. The large chart panic mode is used to cope with very complex real-world sentences. It only has an effect in a small minority of real-world sentences. When the total number of chart entries exceeds a certain threshold (we use 1000 as a default), the value used for the hard local cut is increased according to a heuristic function which takes the span length of the chart entries and the total number of chart entries into account. The large chart panic mode entails that in very complex sentences some permissible spans are never found, but it allows the parser to deliver a set of long partial analyses for every sentence. The longest sentence we have encountered in the British National Corpus consists of over 200 chunks.
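The following sketch combines the three pruning methods, assuming chart entries carry a probability. The thresholds mirror the defaults quoted above, but the panic-mode factor is a guessed placeholder: the actual heuristic, which also takes span lengths into account, is not spelled out here.

def prune(entries, total_chart_size, hard_cut=0.01, beam=5, panic_at=1000):
    """entries: the chart entries of one span, each a dict with a 'prob' key."""
    if total_chart_size > panic_at:
        # large chart panic mode: raise the local cut (placeholder heuristic)
        hard_cut *= total_chart_size / panic_at
    survivors = [e for e in entries if e["prob"] >= hard_cut]  # hard local cut
    survivors.sort(key=lambda e: e["prob"], reverse=True)
    return survivors[:beam]                                    # fixed beam of 5

entries = [{"prob": 0.30}, {"prob": 0.005}, {"prob": 0.20}]
print(prune(entries, total_chart_size=400))  # the 0.005 entry is cut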

Parse Collection In case no complete span for an input sentence can be found, the parser needs to resort to collecting partial parses. Starting from the most probable longest span, recursively the most probable longest span to its left and right is collected.

Figure 2.5: Parser output

There are two reasons why no complete span can be found in some sentences. Either the sentence contains a (rare, highly marked or potentially unacceptable) construction that is not covered by our grammar, or the pruner prevents a possible complete structure from being created.
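A minimal sketch of this collection strategy, assuming the chart is available as a list of (start, end, probability) spans; the function name and representation are illustrative inventions.

def collect(spans, lo, hi):
    """Greedily cover lo..hi with non-overlapping spans: longest first,
    ties broken by probability, then recurse into the gaps."""
    candidates = [s for s in spans if lo <= s[0] and s[1] <= hi]
    if not candidates:
        return []
    best = max(candidates, key=lambda s: (s[1] - s[0], s[2]))
    return (collect(spans, lo, best[0] - 1) + [best]
            + collect(spans, best[1] + 1, hi))

spans = [(0, 3, 0.02), (4, 6, 0.05), (0, 1, 0.4)]
print(collect(spans, 0, 6))  # [(0, 3, 0.02), (4, 6, 0.05)]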

Post-Processing After parsing, a post-processing module converts the dependency tree into a graph structure which contains additional, deep-syntactic relations, so-called long-distance dependencies. An example of a sentence involving a long-distance dependency is shown in figure 3.3, a so-called control structure, where mutations becomes the subject of the subordinate verb compete due to the semantics of the control adjective unable. We give a short overview of long-distance dependencies in the following, and discuss them in detail in chapter 6.

2.2 Integrating Linguistic Knowledge into the System

2.2.1 The Boundedness of Long-Distance Dependencies

Long-distance dependencies are traditionally grouped into two classes (see e.g. Pollard and Sag (1994)). In the first class, there is an overt constituent in a non-argument position that represents the absent constituent (the trace) in the argument position. In this class we find topicalisations, WH-questions, WH-relative clauses and pseudo-cleft constructions. In the second class there is a constituent in an argument position that also represents an argument of a different predicate. In this class we find control and raising, relative clause, passive, and it-cleft constructions. We show in chapter 6 that all cases of the first class except for complex WH-movements can be treated locally in Dependency Grammar. For the second class, context-free parsing is sufficient, because the coreference of the argument position can be resolved at the post-processing stage by means of a statistical method. Let us look at an example of a control structure.

(6) John wanted to leave.

Every subordinate clause which is subjectless and infinitival, such as to leave here, triggers the post-processing module to take a decision, based on the lexical probability of the matrix verb (want), whether the matrix verb introduces subject control, object control, or neither. In this case, the outcome of the decision constructs a coreference to the subject of the matrix verb, indicating that on the deep-syntactic level John is also the subject of leave. Complex WH-movements need a different treatment; a simple mildly context-sensitive approach is discussed and implemented in chapter 6.
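A sketch of this decision in Python, assuming a lexical table mapping matrix verbs to their control behaviour; the table entries shown are invented examples, not the probabilities actually learned from the Treebank.

CONTROL = {"want": "subject", "persuade": "object"}  # illustrative entries

def resolve_control(matrix_verb, matrix_subj, matrix_obj, sub_verb):
    """Decide the controller of a subjectless infinitival clause."""
    kind = CONTROL.get(matrix_verb)
    if kind == "subject":
        return ("subj", sub_verb, matrix_subj)  # deep-syntactic subject
    if kind == "object":
        return ("subj", sub_verb, matrix_obj)
    return None                                 # neither: add no coreference

print(resolve_control("want", "John", None, "leave"))  # ('subj', 'leave', 'John')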

2.2.2 Constraints on Grammar Rules

Linguistic knowledge allows us to place strong restrictions on the co-occurrence of different relation types. Verbs that have attached adjuncts cannot attach complements, since this would violate X-bar constraints. Verbs that have no object cannot attach secondary objects. The application of dependency rules can often be lexically restricted: for example, only temporal expressions occur as NP adjuncts. We have noticed during the development of the grammar that these restrictions play a crucial role for the improvement of the parser's performance. We describe the grammar in detail in chapter 5 and assess the impact of these constraints on speed and performance in section 7.8.
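As an illustration, such restrictions can be thought of as a licensing predicate consulted before a rule is applied; the relation names follow the examples above, and the function is an invented sketch, not the actual grammar formalism of chapter 5.

def licensed(relation, head_entry):
    """head_entry['relations']: relations already attached to this head."""
    attached = head_entry["relations"]
    if relation == "obj" and "adjunct" in attached:
        return False  # complements must attach before adjuncts (X-bar)
    if relation == "obj2" and "obj" not in attached:
        return False  # no secondary object without a primary object
    return True

print(licensed("obj2", {"relations": {"subj", "obj"}}))  # True
print(licensed("obj", {"relations": {"adjunct"}}))       # False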

2.2.3 Mapping Grammatical Relations for the Probability Estimation

We integrate a wide selection of knowledge sources into our system, and a variety of techniques. Integrating grammatical knowledge does not only have an impact on the grammar rules. We now give an example of how grammatical knowledge can be used for the probability estimation.

We have discussed the MLE estimation in general in section 2.1, and the case of PP-attachment relations. Some relations have models that slightly differ from the general estimation. For example, the noun-participle relation integrates linguistic knowledge about similar grammatical relations. The noun-participle relation is also known as the reduced relative clause. Examples are the report written or the soldiers returned home were greeted. Reduced relative clauses are frequent enough to warrant a probabilistic treatment, but considerably sparser than verb–subject or verb–object relations. They are in direct competition with the subject-verb relation (as discussed in section 1.4), because both are licensed by an NP followed by a VP. We have a subject-verb relation in the report announced the deal and a noun-participle relation in the report announced yesterday. The majority of modification-by-participle relations, if the participle is a past participle, functionally correspond to passive constructions (the report written ≈ the report which has been written). In order to reduce data sparseness, we have added the verb–passive-subject counts (psubj) to the noun–participle counts. Some past participles also express adjunct readings (the week ended Friday); therefore the converse, i.e. adding noun–participle counts to verb–passive-subject counts, cannot be recommended. The probability estimation for the modpart relation across all backoff levels is as follows (we map the noun a to its WordNet class å and the verb b to its Levin class b̊).

P(modpart, dist | a, b) =    (2.8)

\begin{cases}
\frac{\#(modpart, right, a, b) + \#(psubj, left, a, b)}{\#(modpart, right, a, b) + \#(psubj, left, a, b) + \#(asubj, left, a, b)} & \text{if } > 0, \text{ else} \\
\frac{\#(modpart, right, \mathring{a}, \mathring{b}) + \#(psubj, left, \mathring{a}, \mathring{b})}{\#(modpart, right, \mathring{a}, \mathring{b}) + \#(psubj, left, \mathring{a}, \mathring{b}) + \#(asubj, left, \mathring{a}, \mathring{b})} & \text{if } > 0, \text{ else} \\
\frac{\#(modpart, right, b) + \#(psubj, left, b)}{\#(modpart, right, b) + \#(psubj, left, b) + \#(asubj, left, b)} & \text{if } > 0, \text{ else} \\
\frac{\#(modpart, right, a) + \#(psubj, left, a)}{\#(modpart, right, a) + \#(psubj, left, a) + \#(asubj, left, a)}
\end{cases}
\cdot \frac{\#(modpart, dist)}{\#modpart}

As the last backoff, a low non-zero probability is assigned for most relations. Again, based on grammatical knowledge, there are exceptions to this: the verb–adjunct relation can only occur with a closed class of nouns, mostly with adverbial expressions of time. In order to reduce parsing complexity, only the backoff levels that include the adjunct noun or its class are used. The backoff hierarchy is also changed: before using a semantic class on the dependent and keeping the lexical head of the governor, the adjunct relation keeps the head of the dependent and uses the semantic class of the verb. While predicates place selectional restrictions on their arguments, adjuncts are themselves a restricted class, a class that can, however, be selected by almost any predicate (Merlo and Esteve Ferrer, 2006).

Chapter 3

Dependency Grammar

In this chapter we give a detailed introduction to Dependency Grammar (DG). We need to discuss DG in detail for a number of reasons. First, it motivates our mapping of the constituency representation of the Penn Treebank to a dependency representation. Second, we need to clarify what we mean by functional in functional dependency grammar. Third, the discussion of headedness and projectivity is an important preparatory step for our treatment of long-distance dependencies, which hinges on the fact that functional DG naturally extends locality to the clause level. We discuss long-distance dependencies in chapter 6. Fourth, we elucidate the close relationship of DG to Head-Driven Phrase Structure Grammar (HPSG) and to Lexical-Functional Grammar (LFG). There exist a number of different versions of DG. In section 3.1 we present four important conceptions of the common core of all DG versions. In section 3.2 we then discuss some characteristics of the version of DG that we use, which is best described by the term functional DG.

3.1 Conceptions of DG

In the following we will present four conceptions of what DG is. First, we give a historical, informal definition of the intuition of DG as an extended valency grammar. Second, we discuss the government relation as the fundamental DG primitive. Third, we give a more formal definition and discuss in which ways DG has been presented as equivalent to constituency, and in which ways it is not. Finally, we discuss how DG can be mapped to X-bar under certain conditions.


3.1.1 Extended Valency Grammar

Intuitively, DG is a valency grammar. The term valency or valence is borrowed from the definition of valency in chemistry. Different verbs take a specific number and type of complement. A monovalent verb like sleep typically takes a subject (she sleeps) but cannot take an object (*she sleeps it). A bivalent verb like eat typically takes a subject and an object (she eats it), although the object may also be absent. A trivalent verb like give typically takes a subject, a direct object and an indirect object (she gives him flowers or she gives flowers to him). The type of complement, typically nouns, prepositional phrases or subordinate clauses, is often restricted. While for example epistemic verbs may take subordinate clauses (he thinks that she sleeps), the majority of verbs cannot (*he eats that she sleeps). Also the type of argument can be restricted. For example, the type of complementizer introducing a subordinate clause depends on the matrix verb.

(7) I think that you will come.

(8) *I wonder that you will come.

(9) *I think whether you will come.

(10) I wonder whether you will come.

Valency is closely related to transitivity, intransitive verbs being monovalent, transitive verbs being bivalent, and ditransitive verbs being trivalent. Transitivity typically does not refer to restrictions on the type of arguments and does not include the subject valency. Valency is not restricted to verbs only. Relational nouns and some predicative adjectives also open valencies. Predicative adjectives have in common with verbs that different subclasses of them take different complements, usually certain PPs and complementizers. For example, afraid requires a PP introduced by of, while ready needs a PP introduced by for:

(11) I am afraid of action.

(12) *I am afraid for action.

(13) *I am ready of action.

(14) I am ready for action.

Nouns that are derived from verbs or adjectives partly keep the valencies. Valencies for PPs are usually kept; the subject or object valency often corresponds to a PP introduced by of.

(15) The offer of a present to the guest

(16) ?The house of a present to the guest

Valencies have been extended from verbs to many other word classes, and from syntax into semantics (Helbig, 1992, 108). Valency theory was also influenced by Fillmore's (1968) Case Grammar and by collocation analysis. Especially from the viewpoint of the German DG tradition (see e.g. Tapanainen and Järvinen (1997, 3)), DG is based on valency. Weber (1997, 34) stresses the influence of the seminal valency work by Tesnière (1959) on an entire generation of German lexicographers. The word opening a valency is defined as the governor or equivalently head in DG; the word filling the valency is called the dependent. Unlike constituency grammar, DG leaves the distinction between mother node, governor and head underspecified. The traditional stemma notation places the governor above the dependent and draws a line between them. In example 17 we see a stemma DG notation (a.), a redundant stemma DG notation (b.) in which the head as a daughter of the governor is explicitly added, and a constituency representation (c.), in which the governor is a phrasal category. We will discuss the relation between DG and constituency in more detail in section 3.1.3.

(17) eats pizza

a.  eats        b.    eats         c.     VP
     |               /    \              /    \
   pizza          eats   pizza        eats    NP
                                               |
                                             pizza

In DG, valency was also extended from arguments to adjuncts and even to function words, in order to be able to build up complete dependency structures. Some dependencies are not valencies or grammatical functions in a strict sense. For example, the dependence of a subordinate verb on the complementizer that introduces the subordinate sentence is difficult to explain on valency grounds. But where possible, Tesnière's conception of the nucleus was kept, to alleviate the need to create an unlimited number of dependencies for which valency cannot account. A nucleus is a content word plus its attributed function words.1 Only nuclei have dependency relations among each other. For a verb, typical function words are auxiliaries. For a noun, typical function words are determiners. Content words typically form open classes, function words closed classes. Valency focuses on events between content words, usually underspecifying definiteness, polarity, modality, aspect and tense. This valency background of DG is a major reason why the definition of heads in DG is often diametrically opposed to the definition of heads in GB (where function word categories often dominate content words) or in Montague grammar (where determiners dominate nouns). We will discuss head definitions in section 3.2.1. There is a substantial overlap between the practical notion of verb and noun chunks and the theoretical concept of the nucleus, but there are also differences. For example, a predicative adjective can be seen to form a nucleus with a copula. For practical parsers, they can be considered similar enough, so that chunking plus parsing between chunk heads, as suggested by Abney (1995), follows the philosophy of valency grammar and gives it a thorough theoretical linguistic motivation.

3.1.2 Government Grammar

DG leaves the distinction between governor and head underspecified. DG is a grammar in which the government relation is a fundamental primitive. Government, a relatively complex constituent relation in GB, which is needed for example to assign Case, is available in dependency as a primitive. Covington (1992, 4) concludes that since only lexical items can govern in DG, dependency and government coincide. Covington (1992, 4) equates government to immediate dependency, the most local form of dependency. The restriction to immediate dependency is a DG version of the Minimality condition (Chomsky, 1986; Rizzi, 1995). As a first approximation, lexical government can be defined as c-command in GB. c-command is defined as follows. α c-commands β iff:

1. the first branching node dominating α also dominates β

2. α does not dominate β.

This definition is not sufficient to assign case to subjects. Subject case is assumed to be assigned by I. I does not c-command the subject NP due to the intervening I′ node.

1 Nuclei are also known as bunsetsu, from the Japanese grammar tradition.

         IP
        /   \
    Spec     I′
     |      /  \
    NP_i   I_i  VP

In order to allow I to govern the subject NP, the notion of m-command was introduced and the definition of government was adapted. Typical definitions of government also constrain the governor to be a terminal category and require that Minimality be respected. α m-commands β iff:

1. the first maximal projection dominating α also dominates β

2. α does not dominate β and β does not dominate α.

α lexically governs β iff:

1. α is an X⁰ category

2. α m-commands β

3. Minimality is respected.

As DG leaves the distinction between words and maximal projections (phrases) underspecified, it can be assumed that the version of government expressed by DG dependency corresponds to m-command. Since DG only knows lexical categories (words), the restriction to X⁰ is redundant in DG. We will return to Minimality in chapter 6.

3.1.3 Terminal Node Context-Free Grammar

We will now compare DG and constituency in detail. We give formal definitions of a version of context-free DG and of context-free constituency. They are proven to be equivalent, but they miss out on important characteristics of DG. We have seen in section 3.1.1 that DG leaves the distinction between mother node and lexical head underspecified. DG is a context-free constituent grammar (CFG) that only knows terminal nodes. We repeat the example: in 18 we see a stemma DG notation (a.), a redundant stemma DG notation (b.) in which the head as a daughter of the governor is explicitly added,2 and a constituency representation (c.), in which the governor is a phrasal category.

(18) eats pizza

a.  eats        b.    eats         c.     VP
     |               /    \              /    \
   pizza          eats   pizza        eats    NP
                                               |
                                             pizza

This is a classical conception of DG. It has been used by Gaifman (1965) for his proof that DG is a proper subset of CFG. It turned out that this proof is an oversimplification. It has come under criticism for several reasons. First, Abney (1994) shows that the essential DG property of headedness has been suppressed by Gaifman (1965). Second, Baumgärtner (1970) was the first to discuss that the definition used by Gaifman (1965) fails to reflect an important property of Tesnière's conception of DG: DG expresses immediate dominance3 but may leave linear precedence4 underspecified. Third, Baumgärtner (1970), Tapanainen and Järvinen (1997) and Nivre (2006a) discuss that Gaifman (1965) assumes a projective version of DG, while at least Tesnière's original DG conception is non-projective5 (see section 3.1.3). Fourth, DG often uses dependency labels, typically containing grammatical relation information (such as subject or object, see section 3.1.1). Covington (1994) discusses how they can be used to map DG to and from X-bar. We illustrate these points in more detail in the following.

Headedness

We follow Abney (1994) for his definition of Gaifman's DG and CFG. A DG is a quadruple G = (Σ, P, S, L), in which Σ is a set of word categories, P is a set of productions, S ⊂ Σ is the set of start symbols, and L is the lexicon of words. Productions P licence trees. Abney (1994) defines their form as X(α; β), where X is a

2 This explicitly expresses the essential DG characteristic that one daughter and the head are identical. In (a.) this is not expressed, as it is redundant information.
3 Dominance is defined as appearing higher in a tree representation. Immediate dominance is defined as a direct mother-daughter relation in a tree representation.
4 Precedence is an order relation. Precedence of A over B is defined as A appearing before B. Linear precedence is defined as the order relation in the unannotated, hence non-hierarchical, linear text.
5 Projectivity of a tree is defined as follows: if for every node in the tree, for every n, the rightmost daughter of its n-th daughter appears before the leftmost daughter of its (n+1)-th daughter, then the tree is projective.

         Y
       / | \
      X  |  Z
     /|  |  |
    W |  |  |
    | |  |  |
    w x  y  z

Figure 3.1: Dependency tree example

category, α and β are sequences of categories: α is the sequence of dependents to the left of X, in the listed order, and β is the sequence of dependents to the right of X, in the listed order. A DG lexicon L relates words to categories. In a word-only DG, words and categories are equated, but in most practical DGs, categories are pre-terminals such as part-of-speech tags. For example, the following DG licences the dependency tree in figure 3.1.

Categories Σ:  W, X, Y, Z
Productions P: X(W; ), Y(X; Z)
Lexicon L:     w ∈ W, x ∈ X, y ∈ Y, z ∈ Z
Start node:    S = Y

A CFG is a quintuple G = (V, Σ, P, S, L), in which V is a set of nonterminal categories, and, like in DG, Σ is a set of word categories, P is a set of productions, S is the set of start symbols, and L is the lexicon of words. Unlike in DG, S ∈ V, and productions P are of the form X → α, where X ∈ V and α ∈ (V ∪ Σ ∪ L). Productions P licence trees. The production X → Y1, ... Yn licences a node iff the node has category X and its children have categories Y1, ... Yn, in the given order. For example, the following CFG licences the phrase structure tree in figure 3.2.

Categories Σ and V: W, X, *X, Y, *Y, Z
Productions P:      X → W *X, Y → X *Y Z
Lexicon L:          w ∈ W, x ∈ *X, y ∈ *Y, z ∈ Z
Start node:         S = Y
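This toy DG can be written down directly, for example in Python; the encoding of productions X(α; β) as (head, left dependents, right dependents) triples is purely illustrative.

productions = {("X", ("W",), ()),      # X(W; )
               ("Y", ("X",), ("Z",))}  # Y(X; Z)
lexicon = {"w": "W", "x": "X", "y": "Y", "z": "Z"}
start = {"Y"}

def licences(head, left_deps, right_deps):
    """Is a node with these dependent sequences licensed by P?"""
    return (head, tuple(left_deps), tuple(right_deps)) in productions

print(licences("Y", ["X"], ["Z"]))  # True: the root of the tree in figure 3.1
print(licences("X", ["W"], []))     # True: its left dependent
print(licences("Z", [], ["W"]))     # False: not licensed by P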

          Y
        / | \
       X  *Y  Z
      / \  |   \
     W  *X |    |
     |   | |    |
     w   x y    z

Figure 3.2: Constituency tree example

Except for the extra non-terminal nodes introduced in CFG, the DG and CFG representations seem identical. Tree 3.2 can be induced from tree 3.1; ∗X and ∗Y are non-terminal categories representing the heads of X and Y. While this seems like a small change, it contains a crucial difference: the newly introduced categories ∗X and ∗Y are only related to X and Y because the production rules of this example happen to enforce it; this is no inherent characteristic of CFG (Abney, 1994, 3). The essential characteristic of DG that one daughter is the head is not a part of the CFG formalism. Gaifman (1965)'s proof thus loses much of its interest, as it is based on wrong assumptions, as Abney (1994, 4) concludes. Abney (1994) thus suggests comparing DG to Headed Context-Free Grammar (HCFG), a CFG in which a unique child of each node is distinguished as the head. Formally, productions are defined as pairs (r, i), in which r is a CFG production, and i is the index of the head, with 1 ≤ i ≤ |r|. A HCFG can be related to a unique DG, which Abney (1994) calls a projection dependency grammar. A DG can be related to several HCFGs, however. In order to understand this point, it is important to clarify the distinction between productions P and DG rules. Productions are not DG rules, but they licence DG trees that arise from the application of DG rules. DG rules are binary by definition: they relate a governor to a dependent. In the example tree 3.1, the application of two DG rules, both with Y as governor, with X as dependent to the left in one DG rule and with Z as dependent to the right in the second, leads to the one production Y(X; Z). The order of application of these two rules is left underspecified. If we define a projection category as a sequence of projections in a headed tree, we can define dependencies between projections. For example, in the headed tree of example 19, the projection category (B, D, S) has one left dependent, (A), and one right dependent, (C).

(19)

        S
      /   \
     A     D
     |    / \
     x   B   C
         |   |
         y   z

One can map a headed tree to a unique dependency tree. Each projection p in the headed tree maps to a node π in the projection dependency tree. The category of π is the projection category of p. Projections q1, ..., qn that are dependents of p in the headed tree map to the dependents of π in the projection tree. The projection dependency tree of (19) is (20):

(20)

     (B, D, S)
      /  |  \
     A   |   C
     |   |   |
     x   y   z

The projection dependency tree of a headed tree is unique, but several headed trees may have the same projection dependency tree. The projection dependency tree (20) corresponds to the two headed trees in (21). Projection-dependency trees thus abstract away from the order in which dependents are combined with their governor (Abney, 1994, 6).

(21)

a.      S          b.      S
      /   \              /   \
     A     D            D     C
     |    / \          / \    |
     x   B   C        A   B   z
         |   |        |   |
         y   z        x   y

The fact that DG underspecifies the attachment order has also led to the observation that DG is an X-bar Grammar without intermediate nodes (Covington, 1992; Miller, 2000), and the suggestion that the labels in labelled DGs may be used to recover the X-bar intermediate nodes (Covington, 1994). We discuss this in section 3.1.4.

Linearity

Another reason why Gaifman (1965)'s proof has been criticised is that it does not respect the property of Tesnière's original DG that dependencies express immediate dominance (ID) but not linear precedence (LP). There is a family of formal grammars distinguishing between ID and LP, so-called ID/LP grammars (Shieber, 1983). CFG incorporates dominance and precedence into a single rule, while ID/LP grammars maintain separate rule sets. The ID/LP approach is used in Head-Driven Phrase Structure Grammar (HPSG) and Lexical-Functional Grammar (LFG). In some versions of DG, linear precedence is left underspecified in the representation. This entails that these versions are unable to reconstruct the linear order from the representation. They are thus unsuitable for language generation purposes. Baumgärtner (1970) was the first to note that Gaifman (1965)'s proof suppresses the ID/LP distinction. Gaifman (1965) uses the following general form for his proof: w are dependents to the left, in the order from m to 1, x is the head, and y are dependents to the right, in the order from 1 to n.

w(m)...w(1) x y(1)...y(n) (3.1)

Baumgärtner (1970, 57) explains that this formula allows easy comparison between constituency and dependency, because it can be transformed into the following rewrite rule (where [#] is at the position of the governing symbol):

x → w(m)...w(1) [#] y(1)...y(n) (3.2)

Such a DG is equivalent to CFG (except for headedness, see the paragraph on headedness above) but suppresses the underspecification of linearity, an important characteristic of many DG versions. Baumgärtner (1970, 60-61) points out that the linear ordering is not expressed in DG, but that this proof forces the linear ordering onto DG, thus making incorrect assumptions. At the same time he admits that linearity may play a secondary role. An advantage of a grammar that focuses on ID is that, for free word-order phenomena, an inversion of the "canonical" dependency direction under well-defined conditions can be allowed in the grammar, without the need to resort to long-distance dependencies. Such phenomena are the English subject-verb inversion in non-support questions (22), sentence-final verbs of utterance (23), and clause order inversions with sentence-final verbs of utterance (24).

(22) Is Peter tall?

(23) Peter is tall, says he.

(24) Peter is tall, he says.

A disadvantage of such a grammar is that parsing complexity increases. Barton (1985) proves that ID/LP parsing is actually NP-complete. In practical terms, the fact that linearity may play a secondary role, so that the direction of dependencies can be constrained wherever possible, alleviates this theoretical problem. Barton (1985, 77) elaborates on the practical implications of the increase in parsing complexity for ID/LP grammars. First, he observes that parsing time only explodes in practice if the amount of ambiguity is high, i.e. if linearity remains largely unconstrained. Second, he observes that although parsing time may increase exponentially, expanding an ID/LP grammar into all possible CFG rewrite rules, a so-called object grammar, leads to much worse time behaviour, since the object grammar very easily explodes.

[T]he use of Earley's algorithm on the expanded object grammar constitutes a parsing method for the fixed-grammar ID/LP parsing problem that is indeed no worse than cubic in sentence length. However, the most important aspect of this possibility is that it is devoid of practical significance. The object grammar could contain trillions of rules in practical cases (Shieber, 1983, 4). (Barton, 1985, 80)

In practice, linearity can indeed be constrained in the majority of cases. In the above examples, in (22) only questions allow an inversion of the verb and the subject; in example (23) only the closed class of epistemic verbs allows an inversion of the verb and the subject; the subordinate clause relation sentobj of example (24) is indeed a dependency relation that can always go in either direction, but most relations are considerably or fully constrained in English.

Projectivity

Tesnière's original DG is not context-free: it does not rule out crossing dependencies. Levy (2005) proves that non-projectivity and crossing constituency are identical. The few DG implementations that aim to follow Tesnière are thus often context-sensitive, notably Tapanainen and Järvinen (1997) and Nivre (2006a). Surprisingly, there are few context-sensitive DG parsers. Nivre (2006a, 74) states that although extensions to context-free grammar have been studied quite extensively, few corresponding dependency-based systems exist. Let us look at an example sentence containing non-projective dependencies. Figure 3.3 shows the Pro3Gres analysis of the sentence Mutations affecting the 5' guanine residues of the kappa B site were unable to compete for these NF-kappa B-related proteins. Only the heads of domain terms are represented in the tree. mutations is the surface-syntactic, projective subject of the main verb unable, but on a deep-syntactic level, mutations is also the subject of compete. This dependency is non-projective, since it crosses other dependencies. Non-projective dependencies appear in a number of phenomena; their frequency may depend on how deep-linguistic the analysis is. Figure 3.4 shows an example of relative pronoun resolution and of the optionally available feature of expanding appositions by means of non-projective dependencies. Neuhaus and Bröker (1997) have proven that unconstrained context-sensitivity leads to an NP-complete recognition problem for dependency parsing. Context-sensitivity thus needs to be restricted to a minimum. We will suggest in chapter 6 that most English long-distance dependencies can be expressed in context-free fashion, and discuss that for the few remaining long-distance dependencies, mild context-sensitivity is sufficient. Some characteristics of DG entail that it needs fewer long-distance dependencies than classical CFGs. We have seen in the linearity paragraph that free word order can be modelled in an ID grammar without the need for long-distance dependencies. We have seen in the headedness paragraph that DG underspecifies the order of attachment and projection levels. This entails that different verbal projections such as VP and S are not distinguished, and no distinction between internal and external argument is maintained. Accordingly, fronted positions are available locally to the verb in DG, as illustrated in example (25). Generally, constituents that appear semantically closer need not appear linearly closer in the text. The dislocated verbal particle up in example (26) can be attached locally to the verb, even though in a typical parsing process it will be attached later than the object her.

(25) In God we trust

Figure 3.3: Pro3Gres analysis of the sentence: Mutations affecting the 5' guanine residues of the kappa B site were unable to compete for these NF-kappa B-related proteins.

Figure 3.4: Pro3Gres analysis of the sentence: The transcription of the human immunodeficiency virus type 1 (HIV-1) is under the control of cellular proteins that bind to the viral long terminal repeat (LTR).

              S
           /     \
    NP-SBJ-X      VP
        |        /   \
      noun      V     NP
             passive   |
              verb  -NONE-
                       |
                      *-X

Figure 3.5: The partial CFG tree corresponding to the local DG passive subject relation.

(26) He called her up

Dependency Labels

Many DG versions use relation labels, which typically express grammatical relations. Such labels are unknown in CFG. Labels can be used to map non-local partial trees onto a local labelled DG relation. For example, partial trees from the Penn Treebank expressing the passive subject relation are of the form shown in figure 3.5 (where X is the co-indexation). The same information can be expressed by using a dedicated DG label, a label that expresses the passive subject only. In section 3.1.4 we will see that relation labels can also be used to map DG representations to X-bar representations.

3.1.4 A Version of X-bar Theory

We have seen above in subsection 3.1.3, in the paragraph on headedness, that DG can be compared to a headed CFG (HCFG). A subset of HCFG which has become widely used is the linguistic theory of X-bar (Chomsky, 1970; Jackendoff, 1977). It was recognised early that natural language does not need the full set of rules that can be expressed by CFG rewrite rules. Rewrite rules in which lexical categories project to phrases of the same category (the category of the phrase is endocentric) are sufficient. It was also observed that these categories share the same build-up, and that the rewrite rules for these phrases can be unified into a universal scheme in which the category type X can be instantiated by any lexical category. This universal set of patterns is called the X-bar scheme. Its intermediate level of projection, X-bar, is recursive, attaching an adjunct at each recursion. A pattern with no recursion of X-bar and a pattern with one recursion of X-bar can look as follows.

(27)

       X″                        X″
      /  \                      /   \
  Spec    X′                Spec     X′
         /  \                       /  \
     Head    Argument             X′    Adjunct
                                 /  \
                             Head    Argument

The recursion at the projection level of X′ ensures that an arbitrary number of adjuncts can be attached, while the number of arguments is fixed, determined by the subcategorisation of the head. Stowell (1981) summarises the characteristics of X-bar as five conditions.

1. Every phrase is endocentric.

2. Specifiers appear at the X″ level; subcategorised complements appear within X′.

3. The head always appears adjacent to one boundary of X′.

4. The head is one bar level lower than the immediately dominating phrasal node.

5. Only maximal projections may appear as non-head terms within a phrase.

While a HCFG can be mapped to a unique DG representation, a DG representation may correspond to several headed trees, due to the DG characteristic that the attachment order is underspecified. The projection dependency tree of (20) can be mapped to both headed trees in (21), repeated here as (28) and (29).

(28)

     (B, D, S)
      /  |  \
     A   |   C
     |   |   |
     x   y   z

(29)

a.      S          b.      S
      /   \              /   \
     A     D            D     C
     |    / \          / \    |
     x   B   C        A   B   z
         |   |        |   |
         y   z        x   y

The projection category (B, D, S) corresponds to the projections of a lexical category X in X-bar theory. D corresponds to the intermediate category X′. When mapping from the projection dependency tree (28) to a headed tree (29), S always dominates all categories and B always dominates the head only, but it is unclear whether the intermediate category D dominates B and C, as in tree 29a., or A and B, as in tree 29b. We can thus conclude that while X-bar expresses the distinction between the different levels of projection, the bar levels, DG is not able to do so. Covington (1992, 2) defines DG in terms of X-bar with only one non-terminal bar level. The distinction between terminals and projections is possible because terminals have no dependents. But a distinction between maximal projections (X″) and intermediate projections (X′) is not possible. X-bar theory uses three types of dependencies: the specifier, the non-head dependent of X″; the adjunct, a non-head dependent of X′ with X′ as sister; and the argument, a non-head dependent of X′ with X⁰ as sister. If one uses a labelled DG that knows these three types or can map to them unambiguously, then DG and X-bar are equivalent (Covington, 1994).

[I]nstead of being considered equivalent to flat X-bar trees, dependency structures can be mapped onto X-bar trees that introduce stacking in a principled way. Here is a sketch of such a reinterpretation, consistent with current X-bar theory. Given a head (X) and its dependents, attach the dependents to the head by forming stacked X nodes as follows:

1. Attach subcategorized complements first, all under the same X node. If there are none, create the X node anyway.
2. Then attach modifiers, one at a time, by working outward from the one nearest to the head noun, and adding a stacked X node for each.

3. Finally, create an X node at the top of the stack, and attach the specifier (determiner), if any.

(Covington, 1994, 7)
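Covington's procedure is easy to operationalise. The following Python sketch builds the stacked tree as nested lists, assuming the dependents have already been classified as complements, modifiers (ordered nearest-first) and an optional specifier; the representation is an invented illustration.

def to_xbar(head, complements, modifiers, specifier=None):
    """Return a nested-list X-bar tree for one head and its dependents."""
    node = ["X'", head] + complements    # step 1: complements under one X' node
    for mod in modifiers:                # step 2: one stacked X' per modifier,
        node = ["X'", node, mod]         #         working outward from the head
    return ["X''", specifier, node] if specifier else ["X''", node]  # step 3

# the big dog on the mat: specifier "the", modifiers ordered nearest-first
print(to_xbar("dog", [], ["big", "on the mat"], specifier="the"))
# -> X'' over the specifier and the stack X'(X'(X'(dog), big), on the mat)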

We can conclude that a labelled DG using labels that can be mapped to specifier, argument, and adjunct functions is a notational variant of X-bar.

3.2 Characteristics of functional DG

The informal definitions in section 3.1 have allowed us to introduce some characteristics of DG. We would now like to introduce the characteristics of our version of DG, which is best represented by the term functional DG, in which functional is closely related to the sense of LFG f-structure (Bresnan, 1982; Kaplan and Maxwell, 1996; Bresnan, 2001). We will discuss the close relation between LFG and our DG in section 6.5. An LFG f-structure conception presumes the use of a labelled DG, in which the labels express grammatical relations. The distinction between configurational and functional structure is not always straightforward, however. Let us consider three examples. First, LFG uses the relation COMP for subordinated clauses, and XCOMP for subordinated clauses in which an argument is shared by means of syntactic control. One may argue that the clausal or phrasal status of an argument is coincidental; we therefore use the label sentential object (sentobj) for both COMP and XCOMP. Second, LFG has a clause-level relation TOPIC, which houses fronted constituents that are consequently shared to their deep-syntactic subject, object, or adjunct functions. One may argue that TOPIC is a configurational concept. We attach fronted constituents directly to their deep-syntactic functions. Third, some relations that do not have a clearly functional character need to be admitted to the label set for practical parsing purposes, both in LFG and in DG. In our robust parser, they include compl for attaching complementizers to a verb, prep for connecting the preposition and the noun in a PP, and nchunk to correct underchunking (the rules are presented in detail in chapter 5). We discuss our notion of functionalism in section 3.2.3.

3.2.1 Definition of Head

We have seen in section 3.1.3 that headedness is an essential property of DG. As in other theories where headedness is essential (e.g. X-bar, LFG, HPSG), a detailed linguistic discussion of what a head is has to arise. If two items A and B are combined, the mother node has to be either A or B. Interestingly, X-bar (as part of GB) and HPSG have often taken opposite views, partly based on the valency considerations discussed in section 3.1.1. In practical terms, the differences between the definitions of what a head is are as important as the small formal grammar differences discussed in 3.1. Engel (1994, 28), although firmly rooted in the German valency tradition, takes the extreme view that ultimately all head definitions are arbitrary. Other linguists have disagreed about the definition of heads. The importance of the issue is, for instance, mirrored by the fact that a third of Jung (1995), a general book on DG, is devoted to the question of head definition.

There are different uses and definitions of the notion of head in different types of grammars. They may be semantic, as in Categorial Grammar, where the functor is the head and the argument is the dependent. In X-bar theory, on the other hand, heads are defined by syntactic means. On a morphological basis, one may also come to different conclusions about what the head is: in a simple subject+verb construction, for instance, the subject may be seen as selecting person and number of the verb. Zwicky (1985, 4-14) suggests the following arguments for determining the head of a combination of two constituents:

a. The semantic argument: in a combination X+Y, crudely put, X is the "semantic head" if X+Y describes a kind of the thing described by X.

b. The subcategorisand: the constituent which is subcategorized with respect to its sisters. For example, in a PP, the preposition is subcategorised by the verb, unlike its sister NP. The preposition is therefore a head candidate.

c. The morphosyntactic locus: the (actual) inflectional locus within a construction is a candidate for the head of the construct. For example, in a VP, the auxiliary verb carries the inflection and is a head candidate.

d. The governor: the constituent which determines the morphosyntactic form of some sister. For example, in a VP, the verb is a head candidate because it determines the case of an object.

e. The determinant of concord: the constituent with which some other constituent must agree. For example, in a sentence, the subject N is a head candidate because the main verb must agree with it.

f. The distributional equivalent: the constituent that belongs to a category with roughly the same distribution as the construct as a whole. For example, in an NP, adjective + noun and noun only are both NPs. This results in the noun being a head candidate.

g. The obligatory constituent: the constituent which has to be present if the mother is to be categorized as it is. For example, in an ADJP, an adjective is obligatory and therefore a head candidate, while adverbs are not.

h. The rulers in dependency grammar: the ruler is the word on which other words depend. Since the head discussion here should support a head definition for dependency, using this argument would lead to circularity. We will ignore it.

Zwicky discusses these arguments by use of the following English constructions: DET + N, as in those penguins; V + NP, as in control those penguins; AUX + VP, as in must control those penguins; P + NP, as in toward those penguins; NP + VP, as in we control those penguins; COMP + S, as in that we control those penguins. Zwicky's findings are summarized in table 3.1; undecidable cases are marked with a dot. Hudson (1987) re-analysed Zwicky (1985), using the same arguments except for (e.) the determinant of concord, and came up with the completely different results presented in table 3.2.

                                 V+NP  P+NP  NP+VP  DET+N  AUX+VP  COMP+S
    (a) Semantic argument         NP    NP    NP     N      VP      S
    (b) Subcategorisand           V     .     .      DET    .       COMP
    (c) Morphosyntactic locus     V     P     VP     N      AUX     S
    (d) Governor                  V     P     VP     .      AUX     .
    (e) Determinant of concord    NP    .     NP     N      .       .
    (f) Distributional equivalent V     .     .      N      VP      S
    (g) Obligatory constituent    V     P     VP     N      VP      S

Table 3.1: Zwicky's definition of heads

                                 V+NP  P+NP  NP+VP  DET+N  AUX+VP  COMP+S
    (a) Semantic argument         V     P     VP     DET    AUX     COMP
    (b) Subcategorisand           V     P     .      DET    .       COMP
    (c) Morphosyntactic locus     V     P     VP     DET    AUX     COMP
    (d) Governor                  V     P     VP     .      AUX     .
    (f) Distributional equivalent V     .     .      DET    AUX     COMP
    (g) Obligatory constituent    V     P     VP     DET    AUX     COMP

Table 3.2: Hudson's definition of heads

Hudson (1987)'s head definition has influenced and foreshadowed the headedness definition in GB, including the DP hypothesis, which views NPs as a complement of DPs (see e.g. Cook and Newson (1996)) and which facilitates the semantic analysis of quantifier scopes.

Functional Heads or Markers

We would now like to show that alternative head definitions are also possible, based on lexical rather than functional heads. It can be observed that functional words in X-bar syntax are usually required to project to a maximal projection. The stacking of functional projections is a noticeable feature of GB analyses. Determiners were an exception to this in the pre-DP-hypothesis view; now they are likewise assigned a maximal projection. While functional projections are largely uncontested, HPSG in particular raised the question of whether it would not be possible to use a representation in which content words are heads.

The four most contested constructions in tables 3.1 and 3.2 involve a functional category (P+NP, DET+N, AUX+VP and COMP+S). Before discussing each construction in detail, we would like to point out that different levels of analysis have different heads.

Research on the definition of heads reveals that the morphological, surface-syntactic, deep-syntactic and semantic levels of analysis often come to different conclusions. On the morphological level, Mel'čuk (1988, 109) discusses how morphological dependencies themselves pose opposing head definitions: in Russian dve volny (= two waves), the numeral dve is morphologically dependent on the noun volny according to gender (feminine), but volny is dependent on dve according to number (sg.) and case (genitive). On the surface-syntactic level, GB, TAG, versions of LFG, and other formal syntactic theories including the dependency-based Word Grammar (Hudson, 1984; Hudson, 1990) use functional projections. On the semantic level, there is no doubt that quantifiers dominate NPs in a first-order predicate logic representation and that modal verbs dominate VPs in a modal logic representation. But for the deep-syntactic level, HPSG proposes that functional words are non-heads and fall outside the dependency types available in X-bar (specifier, argument, adjunct), as we discuss in the following paragraph. The purpose of the deep-syntactic level is to offer direct access to the grammatical relations (also called functions) between content words, largely normalised for alternations, quantification and surface phenomena. From a practical viewpoint, a representation that assumes content words to be heads facilitates the implementation of lexicalisation, since it gives local access to content-word heads. For tasks such as information retrieval and text mining such representations are particularly useful, as they represent the semantic structure more directly (see e.g. de Marneffe, MacCartney, and Manning (2006); Rinaldi et al. (2007)).

The majority of formal grammars distinguish between several levels of analysis by using a multi-stratal approach. GB maps between surface syntax and deep syntax by means of transformations; LFG relies on functional annotations to map from the surface-oriented c-structure to the deep-syntactic functional f-structure; HPSG uses an advanced system of co-indexation, often leaving the level unspecified and thus profiting from constraint-based devices such as the commutative unification operation. In contrast, many versions of DG are monostratal. For a multistratal version of DG, see Debusmann et al. (2004). Being monostratal does not entail failing to recognise that there are several levels of analysis, but it contains the (usually implicit) claim that the unexpressed levels can be left underspecified without affecting linguistic expressiveness or parsing performance. Chapter 6 of this thesis is largely devoted to showing that a functional level of representation can deliver accurate deep-syntactic descriptions even if we underspecify the surface-syntactic level.

Constructions with Functional Heads

We now discuss the four contested constructions involving a functional head (P+NP, DET+N, AUX+VP, COMP+S) in more detail.

Prepositions Prepositions can have semantic content, directly expressing local, temporal, or benefactive relations. But in the following constructions, the preposition has no semantic character. It is grammatical in nature and often language-specific, and it is often associated with a case marker.

(30) Peter depends on Susan.

(31) Peter is afraid of spiders.

In a functionally oriented representation, alternations such as the dative shift should receive identical or closely related analyses.

(32) Mary gives the book to John.

(33) Mary gives John the book.

This is only possible if NPs and PPs are treated on a par. Dependency theory, partly under the influence of languages like Finnish, French, German, or English, where case assignment and prepositions largely overlap, has always been sceptical of the status of prepositions. Tarvainen (1981, 10) states that prepositions have no syntactic valency; at best they have a grammatical valency that corresponds to case endings.

LFG assigns head character to both the noun and the preposition, thus leaving the head definition underspecified. In practice, we follow the same approach: although the noun appears as the head of the PP in the dependency representation, PP-attachment is lexicalized both on the preposition and on the noun (if available). Since prepositions are less sparse and more discriminant, the preposition plays the larger role in the attachment task. LFG has also suggested distinguishing between semantic and non-semantic prepositions, where semantic prepositions are governors and non-semantic prepositions are dependents of their noun. A practical problem of such a forced distinction is its high gradience, which leads to low inter-annotator agreement.

Complementizers In GB, LFG and TAG, complementizers are functional heads with a sentential complement. One practical problem with such an analysis is that empty heads need to be assumed in the case of zero-complementizers, and when parsing it is difficult to decide where to insert empty categories.

(34) Sandy thinks (that) Kim is foolish.

Pollard and Sag (1994) use examples like the following to argue that complementizers are non-heads:

(35) I demand that he leave/*leaves immediately.

If demand determines the form of the verb in the complement clause, then this suggests that the verb is the head of the complement; otherwise we need a non-local dependency or a distinction between different types of complement clauses. Complementizers are a prime example for the introduction of markers in HPSG. Pollard and Sag (1994, 44-45) define markers as functional or grammatical words whose semantic content is logical.

We have decided to use markers, as a class of dependents with dedicated relation labels, in our DG. The reasons for this decision are practical rather than theoretical in nature.

Determiners In GB, TAG and versions of LFG, determiners are functional heads with a noun complement. Analogously to complementizers, it is difficult to decide where to insert empty categories when parsing.

(36) Sandy saw (the) pictures of Kim.

In our DG approach, we only parse between heads of chunks – the distinction between a DP and an NP analysis remains underspecified, and the DP hypothesis is not used. Although the noun chunk typically includes the determiner and is thus equivalent to a DP, head extraction always refers to the head noun.

Auxiliary Verbs Chunkers delivering verb groups underspecify the distinction between auxiliary and main verb. HPSG does not treat auxiliaries as markers, but uses so-called argument composition, which uses co-indexation to the effect that the auxiliaries and the main verb share all complements. For practical reasons of consistency, we also treat auxiliaries as dependents of the main verb. This leads to representations in which all types of dependency (specifiers, arguments, adjuncts), plus all markers, are available locally to the main verb. In other words, locality naturally extends to the clause level. We discuss in chapter 6 that this leads to a DG with the same extended locality as TAG.

3.2.2 Projectivity

Levy (2005) proves that non-projectivity and crossing constituency are equivalent; we will use the term non-projectivity. The question of whether non-projectivity is needed for parsing has been debated for a long time. Approaches assuming the equivalence of CFG and DG are projective: the early robust DG parsers of Eisner (1996) and Collins (1996), for example, are fully projective. Some of the first researchers to point out that Tesnière's DG conception is non-projective were the creators of the first broad-coverage DG parser that is explicitly non-projective (Tapanainen and Järvinen, 1997). Tapanainen and Järvinen (1997, 3) note that for a long time linguists took Gaifman (1965)'s proof for granted.

Truly non-projective DG parsers are still very rare, due to the large processing overhead: fully non-projective DG parsing is theoretically NP-complete, and in practice, too, the overhead is very substantial (Nivre, 2006a). Accordingly, research, both in DG and in other robust parsing approaches, focuses instead on the questions of how to maximally constrain non-projectivity, how much non-projectivity is needed, and up to which point post-processing and CFG and finite-state approximations can replace non-projectivity. Nivre (2006a) sums this up concisely:

Most of this work has so far focused either on post-processing to recover non-local dependencies from context-free parse trees (Johnson, 2002; Jijkoun and De Rijke, 2004; Levy and Manning, 2004; Campbell, 2004), or on incorporating nonlocal dependency information in nonterminal categories in constituency representations (Dienes and Dubey, 2003; Hockenmaier, 2003; Cahill et al., 2004) or in the categories used to label arcs in dependency representations (Nivre and Nilsson, 2005). By contrast, there is very little work on parsing methods that allow discontinuous constructions to be represented directly in the syntactic structure, whether by discontinuous constituent structures or by non-projective dependency structures. Notable exceptions are Plaehn (2000), where discontinuous phrase structure grammar parsing is explored, and McDonald et al. (2005b), where nonprojective dependency structures are derived using spanning tree algorithms from graph theory. (Nivre, 2006a, 73)

As there is very little research on non-projective extensions in DG frameworks (Nivre, 2006a, 74), it is highly advisable to adopt research from other grammar formalisms: in chapter 6 we therefore investigate in detail how Tree-Adjoining Grammar (TAG) and Lexical-Functional Grammar (LFG) deal with non-projective phenomena in natural language. We also show how the choice of grammar representation largely influences the amount of non-projectivity needed. Research showing the surprising extent to which the grammar representation influences the amount of non-projectivity is only just starting. While it was classically assumed, for example, that German is a language exhibiting a particularly high amount of non-projectivity, Kübler (2006) shows that this assumption largely rests on the representation of the grammar: trees in NEGRA format (Brants et al., 1997) show a high amount of non-projectivity, while trees in the TüBa format (Telljohann, Hinrichs, and Kübler, 2004) show very little, although they do not contain less information. We show in chapter 6 that our DG representation keeps non-projectivity to a minimum: only nested WH-questions are truly non-projective.
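Projectivity itself is easy to operationalise: a dependency tree is projective iff no two arcs cross when all arcs are drawn above the sentence. The following minimal check is our own illustration (the 1-indexed token positions and the head-array input format are assumptions, not a format used by our parser):

    # Hypothetical sketch: detect non-projectivity as crossing arcs.
    # heads[i-1] is the position of the head of token i; 0 is the root.

    def is_projective(heads):
        """Return True iff the dependency tree contains no crossing arcs."""
        arcs = [(min(d, h), max(d, h)) for d, h in enumerate(heads, start=1)]
        for (l1, r1) in arcs:
            for (l2, r2) in arcs:
                # two arcs cross iff exactly one endpoint of one arc lies
                # strictly inside the span of the other
                if l1 < l2 < r1 < r2:
                    return False
        return True

    print(is_projective([2, 0, 2]))     # True: a simple projective tree
    print(is_projective([3, 4, 0, 3]))  # False: arcs (1,3) and (2,4) cross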

3.2.3 Functionalism

Tapanainen and Järvinen (1997) call their DG approach Functional Dependency Grammar. Functionalism is an important notion for a deep-syntactic DG, for several reasons, as we outline in the following. The term functionalism has a number of meanings; each of the following points represents a possible meaning of functional.

Grammatical Function

Our labelled DG is perhaps most obviously functional in the sense that labels express grammatical functions such as subject or object as far as possible. This is also a main reading of functional in Lexical-Functional Grammar. In order to obtain full parses, some configurational, surface-syntactic labels are also needed, as mentioned in 3.1.1. The sets of functional and configurational labels are strictly disjoint.

Predicate-Argument Relations

Due to its valency tradition, DG representations are easily mappable to predicate-argument relations. DG representations are functions in the mathematical sense: a grammatical function or relation from a head to a dependent6, and a grammatical function or relation from a dependent to a head7. This aspect of functionalism is largely owed to the choice of lexical heads and chunks as an approximation to nuclei or bunsetsus (see 3.1.3).

Abstracting away from surface configurations

The use of an ID/LP grammar allows us to abstract away from form, i.e. surface word order, to function. The use of our headedness definition allows us to map alternations and verb configurations. Importantly, the fact that non-projectivity can be reduced to a minimum allows us to use a monostratal approach that leaves other representation levels underspecified.

A Functional Conception of Word Classes

Tesnière’s DG uses a functional conception of word classes. We could only partly model this functional aspect. Subordinate clauses are seen as clause-level objects,

hence their label sentobj. Gerunds are ambivalent between verbal and nominal status. In the sentence

(37) John likes eating bananas.

eat takes an object like a verb, but functions as a nominal object to like. We allow gerunds to attach as objects. Tesnière refers to change of word class as translation and uses it extensively. In our DG, we use it more restrictively, e.g. in order to allow adjectives (e.g. the poor) or numbers to function as nominal objects in the absence of a noun, thus obviating the need for empty categories. Our choice of the noun as the head of a PP also follows this philosophy: the functions of PPs and cases in alternations (Levin, 1993), such as the dative shift or the alternation mapping the Saxon genitive to an of-PP, are representationally more similar.

There are a number of closed-class words which (before becoming grammaticalised) can function as words of different word classes. Examples that are modelled in our grammar comprise the gerunds including and using, which can function as prepositions.

6 In most cases they are a function. If a long-distance chain has more than one element, they are a relation. In the case of adjuncts, there may equally be several dependents on the same head with the same function (adjunct); they are then a relation.
7 In most cases they are a function. There is a post-processing module that expands conjunctions and appositions, which can lead to a dependent having several heads; they are then a relation.

3.3 The Relationship of DG to HPSG and LFG

The formal grammar theories Head-Driven Phrase Structure Grammar (HPSG) and Lexical-Functional Grammar (LFG) are closely related to DG. HPSG (Pollard and Sag, 1994) is largely based on DG, and the f-structure layer of LFG (Bresnan, 2001) expresses labelled dependency information.

3.3.1 HPSG

HPSG shares crucial characteristics with DG. The DG endocentricity constraint is a fundamental HPSG grammar principle, and HPSG rule schemata are largely based on X-bar theory, which is equivalent to DG. A closer investigation reveals additional similarities, such as HPSG's valency-based lexicalist character, the use of graph theory by means of structure sharing, and the monostratal orientation. HPSG insists on using typed feature structures. While versions of DG using such structures are conceivable, DG makes no such restrictions on implementational issues. There is one major linguistic difference, however: HPSG typically aims to integrate semantics to a higher degree than DG.

Any well-formed constituent in HPSG needs to conform to the three main components of HPSG: grammar principles, grammar rules, and lexical entries.

Grammar Principles The major HPSG grammar principles are the head feature principle and the subcategorisation principle. The head feature principle, a grammar universal, is formulated as follows.

    [DTRS headed-structure]  ⇒  [ SYNSEM | LOC | CAT | HEAD                    [1]
                                  DTRS | HEAD-DTR | SYNSEM | LOC | CAT | HEAD  [1] ]

In words, each feature structure containing the feature DAUGHTERS (DTRS) with a value of type headed-structure must abide by the head feature principle, which expresses that, by means of structure sharing, the HEAD feature of the mother node and the HEAD feature of the head daughter are identical – just as the DG endocentricity constraint enforces.

The subcategorisation or valency principle makes sure that valencies are saturated. The SUBCAT feature contains a list of all required, but still missing, valencies of a constituent; a feature structure whose SUBCAT list is empty has all valencies filled. The subject, as well as all complements, is part of the SUBCAT list. The elements are ordered according to the obliqueness hierarchy: the least oblique element, i.e. the subject, comes first. The subcategorisation principle ensures that the SUBCAT list is emptied, i.e. that its requirements are met.

    [DTRS headed-structure]  ⇒  [ SYNSEM | LOC | CAT | SUBCAT                    [2]
                                  DTRS | HEAD-DTR | SYNSEM | LOC | CAT | SUBCAT  append([1],[2])
                                  DTRS | COMP-DTRS                               [1] ]

In words, the subcategorisation requirements of a phrase are identical to those of the head daughter, minus those satisfied by the complements (COMP-DTRS) attached at the same level. The subcategorisation principle is also a grammar universal. In later versions (Pollard and Sag, 1994, chapter 9), specifiers get their own subcat list, but the general principle remains unaltered. For the discussion of grammar rules, we use the version where specifiers have their own SUBCAT list – the specifier subcat list is called SPR, and the complement subcat list is called COMPS – but any other method of identifying specifier dependencies would have the same effect.

In DG, the endocentricity constraint has the side effect that the filling of valencies is not automatically enforced. On the one hand, this has the robustness advantage that unfilled valencies never cause a parse to fail. On the other hand, a mechanism preferring saturated valencies is needed. The preference for longer partial structures (if possible, structures spanning the whole input string) ensures just that: any element of a sentence will get attached somewhere whenever possible, and as a consequence valencies will be filled unless there is no way according to the grammar to fill them. Internally we also use probability scores (Probability * 2) instead of probabilities, which has the effect that an element with a likely attachment is more likely than the element without the attachment. We have not investigated up to which point this may render our preference for longer partial structures redundant, or whether it may lead to better results by producing better partial-structure breaks when no span covering the entire input string is found.
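To make the interaction of the two principles concrete, here is a minimal sketch – our own illustration with drastically simplified, untyped feature structures; real HPSG signs are typed feature structures combined by unification:

    # Hypothetical sketch of the head feature and subcategorisation
    # principles.  A sign is a plain dict; sharing the HEAD value between
    # mother and head daughter stands in for structure sharing.

    def combine(head_dtr, comp_dtrs):
        """Build a mother sign from a head daughter and complement daughters."""
        subcat = list(head_dtr["subcat"])   # obliqueness order, subject first
        for comp in comp_dtrs:
            # each complement saturates the most oblique open valency; the
            # least oblique element (the subject) is left for a later schema
            if not subcat or subcat[-1] != comp["cat"]:
                raise ValueError("complement does not match SUBCAT")
            subcat.pop()
        return {
            "head": head_dtr["head"],       # head feature principle
            "subcat": subcat,               # subcategorisation principle
        }

    gives = {"head": "verb", "subcat": ["NP", "NP", "NP"]}  # subj, iobj, dobj
    vp = combine(gives, [{"cat": "NP"}, {"cat": "NP"}])     # gives them money
    s = combine(vp, [{"cat": "NP"}])                        # subject attaches
    print(s)   # {'head': 'verb', 'subcat': []} – all valencies saturated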

Grammar Rules In X-bar theory, PSG grammar rules are only rule schemata, constrained by bar levels and endocentricity. The same applies to HPSG grammar rules, where most rule schemata are strictly derived from X-bar theory. The HPSG rule schemata comprise the head-specifier rule, the head-complement rule, the head-adjunct rule, markers, and argument composition.

head-specifier rule X-bar structures at the X′ level are distinguished by the fact that all their valencies except for the SPR valency are satisfied. The head-specifier rule is formulated as follows, where the subcat lists are in angle brackets, e.g. <> is an empty subcat list and [] is a list containing one element.

    [SPR <>]  →  [SPR <[]>, COMPS <>]   []
                        HEAD            SPR

In words, a sign with the feature specification SPR<>, i.e. with fulfilled SPR subcat requirements, can consist of a head daughter with the feature specifications SPR<[]> and COMPS<>, and a specifier daughter. In cooperation with the head feature principle and the subcategorisation principle, this rule schema makes sure that head daughter and mother are identical, and that the required SPR valency is identical to the sister of the head daughter.

head-complement rule X-bar structures at the X⁰ level are distinguished by the fact that none of their valencies are satisfied. The head-complement rule is formulated as follows, where L is a list.

    [COMPS <>]  →  [COMPS L]   []*
                     HEAD      COMPS

In words, a sign with the feature specification COMPS<>, i.e. with fulfilled COMPS subcat requirements, can consist of a head daughter with any list L in the COMPS feature, and an arbitrary number of complement daughters. In cooperation with the head feature principle and the subcategorisation principle, this rule schema makes sure that head daughter and mother are identical, and that each of the required COMPS valencies is identical to a sister of the head daughter.

head-adjunct rule The head-adjunct rule is formulated as follows.

    [COMPS L]  →  [1][COMPS L]   [MOD [1]]
                       HEAD       ADJUNCT

We have discussed in section 3.1.4 that, as long as the relation label set can be mapped to specifiers, arguments and adjuncts, DG is identical to X-bar theory and hence to these rule schemata. There are two additional rule schemata in HPSG: markers and argument composition.

markers Markers are usually treated as dependents in DG. Again, if specific relation labels can be mapped to markers, DG is equivalent to HPSG.

argument composition Argument composition extends locality. In practice, secondary labels extending trees to graphs can be used to share dependents. We use secondary labels to treat long-distance dependencies. We discuss locality, extended locality and long-distance dependencies in detail in chapter 6.

3.3.2 LFG

We discuss the relation of LFG to DG in detail in (Schneider, 2005); a summary of our findings is given in section 6.5.2.

3.4 Conclusions

We have given an introduction to DG. We have presented four conceptions of DG. We have discussed in which way the version of labelled DG that we use can be said to be functional. We have discussed that our functional conception of DG leads to representations in which all types of dependency (specifiers, arguments, adjuncts), plus all markers, are available locally to the main verb. Locality naturally extends to the clause level. This is an important preparatory step for our discussion of long-distance dependencies in chapter 6, where we show that our conception of functional DG leads to a DG with the same extended locality as TAG.

Chapter 4

State of the Art

4.1 Introduction

This chapter reviews a number of current successful approaches in dependency parsing and suggests combining some of their advantages, as has been done in Pro3Gres, a low-complexity but deep-syntactic parser expressing grammatical roles. We will mainly consider current parsing approaches, which traditionally fall into two groups: formal grammar parsers and statistical parsers. Recent progress in combining them will be reviewed. A non-parsing approach aiming at the expression of grammatical roles will also be discussed.

Recently, some deep linguistic grammars have achieved the coverage and robustness needed to parse large corpora, as will be discussed in section 4.6. Riezler et al. (2002) show how a hand-crafted LFG grammar scales to the Penn Treebank with Maximum Entropy probability models. Hockenmaier and Steedman (2002) acquire a wide-coverage CCG grammar from the Penn Treebank automatically, Burke et al. (2004) an LFG grammar. We suggest combining a tag-sequence based, hand-crafted functional dependency grammar (Hajič, 1998; Tapanainen and Järvinen, 1997) with Maximum Likelihood Estimation (MLE) lexicalized probabilities extracted from the Treebank. Our approach is similar to Collins and Brooks (1995), but covers a large subset of dependency relations, including the majority of long-distance dependencies, instead of PP-attachment only.

Let us first consider a number of dependency-based approaches that focus on just one dependency relation: the attachment of PPs. Then we will extend the discussion to approaches where full parsing is done by similar means. The PP-attachment relation is especially interesting because it is highly ambiguous (Church and Patil, 1982),

and because it has been shown to profit considerably from lexicalized approaches (Collins and Brooks, 1995).

4.2 PP attachment disambiguation

We will consider two classical approaches that have been very influential: the unsupervised approach of Hindle and Rooth (1993) and the supervised approach of Collins and Brooks (1995).

4.2.1 The question

For both of the sentences John eats steaks with a knife and John eats steaks with fries, two analyses are syntactically permissible. On semantic grounds, a human reader has no difficulty disambiguating between attaching the PP to the verb or to the noun, but the amount of world knowledge required to do so is considerable.

(verb attachment)

    [S [NP John] [VP [VB eats] [NP steaks] [PP [IN with] [NP a knife]]]]

(noun attachment)

    [S [NP John] [VP [VB eats] [NP [NP steaks] [PP [IN with] [NP fries]]]]]

Instead of the cumbersome attempt to model world knowledge, statistical approaches disambiguate by imitating empirical evidence from correctly parsed training material. The disambiguation is not always as clear-cut as in the above example, and inter-annotator agreement for PP-attachment is thus relatively low. Both Hindle and Rooth (1993) and Collins and Brooks (1995) considered exclusively binary ambiguous cases, where a verb is followed by an NP and a PP, and the PP-attachment is therefore ambiguous between verbal and nominal attachment. Hindle and Rooth (1993) report 85-88% human performance on their experiments, in which both the human and the machine were only given the head words (the verb and the noun) of the possible attachment sites and the preposition of the PP. Ratnaparkhi, Reynar, and Roukos (1994) report 88.2% average human performance on experiments in which both the human and the machine were given the head words (the verb and the noun) of the possible attachment sites, the preposition of the PP, and the head of the noun inside the PP (which is sometimes called the description noun). More context leads to better disambiguation; the verb or noun, the preposition and the description noun alone may not be enough context. Ratnaparkhi, Reynar, and Roukos (1994) report 93.2% average human performance when the whole sentence is given.

There is a semantic reason why human performance and inter-annotator agreement are quite low, which is discussed in Hindle and Rooth (1993): there are semantically undecidable cases, in which the semantic difference between verb and noun attachment is very small.

Undecidable cases Undecidable cases can be divided into the following three groups.

Idioms: the nouns contained in many idioms cannot be conceived of as constituting a possible attachment site:

(38) I visited the zoo from time to time.

Locative ambiguity: in a locative PP, the place of the action and the real-world object are often identical – the PP-attachment ambiguity is semantically void.

(39) He searched all the bars in Paris.

Benefactive ambiguity: in a benefactive PP, the action and the resulting real-world object are often intended for the same person – the attachment ambiguity is again semantically void.

(40) Jane wrote the report for the supervisor.

4.2.2 Hindle and Rooth

(Hindle and Rooth, 1993) is an unsupervised approach to PP-attachment disambiguation. They extracted 200,000 (V, N1, PP) triples from a tagged and partially parsed newswire corpus. First, the unambiguous cases are collected into what is called the sure-attachment base:

1. No PP: the PP is assigned a NULL value

2. Sure verb attach 1: the PP is attached to the verb if the NP head is a pronoun.

(41) Peter gave it to the waitress.

In (41), the PP certainly attaches to the verb: in English, only the pronoun one can be modified by a PP.

3. Sure verb attach 2: the PP is attached to the verb if the verb is in the passive.

(42) It was given yesterday to the waitress.

In the rare cases where nouns intervene between a passive verb and a PP, they are usually temporal adjuncts that cannot be modified by a PP.

4. Sure noun attach: if no verb that could serve as attachment site is present, the PP attaches to the noun. This is particularly the case in sentence-initial NPs followed by a PP.

Based on these unambiguous cases, an information-theoretic lexical preference score related to maximum likelihood estimation (MLE) is calculated. The MLE for seeing a preposition given a verb is

    P(prep|v) ≈ #(v ∧ prep) / #v    (4.1)

This MLE is weighted by the general preference of the preposition for verb (or noun) attachment, in order to counteract sparse data:

    P(prep|∑v) ≈ #(∑v ∧ prep) / #∑v    (4.2)

The weighted probability estimation for seeing a preposition given a verb is then 67 4.2. PP attachment disambiguation

    P(prep|v) ≈ (#(v ∧ prep) + P(prep|∑v)) / (#v + 1)
              = (#(v ∧ prep) + #(∑v ∧ prep)/#∑v) / (#v + 1)    (4.3)

The formula used is expressed as a logarithm, indicating attachment preference for the verb in the numerator and for the noun in the denominator. NULL (no PP after the noun) is considered to be an (indirect) verb-attachment preference indicator in an ambiguous situation.

    LA(v, n, prep) = log₂ ( [(#(v∧prep) + #(∑v∧prep)/#∑v) / (#v+1)] · [(#(n∧NULL) + #(∑n∧NULL)/#∑n) / (#n+1)]
                            / [(#(n∧prep) + #(∑n∧prep)/#∑n) / (#n+1)] )    (4.4)

Iteration: for all ambiguous cases, the lexical association score (LA) is calculated. If it lies above a certain threshold, the attachment is considered reliable and is asserted to the sure-attachment base. The LA scores are then recalculated and the next iteration starts. The iteration loop exits once no new reliable attachments can be made. For the remaining cases, the default of noun attachment is used.
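The loop is compact enough to sketch directly. The following illustration is our own: the count-table names, the threshold value and the bookkeeping are assumptions, and the smoothing follows equations 4.1-4.4 only approximately.

    # Hypothetical sketch of the Hindle & Rooth lexical association
    # score and its iterative self-training loop.
    import math
    from collections import Counter

    v_prep, v_tot = Counter(), Counter()   # #(v ∧ prep), #v  (sure base)
    n_prep, n_tot = Counter(), Counter()   # #(n ∧ prep) incl. NULL, #n

    def smoothed(pairs, totals, word, prep):
        """Eq. 4.3: MLE weighted by the preposition's overall preference."""
        all_pair = sum(c for (w, p), c in pairs.items() if p == prep)
        all_tot = sum(totals.values())
        prior = all_pair / all_tot if all_tot else 0.0
        return (pairs[(word, prep)] + prior) / (totals[word] + 1)

    def la(v, n, prep):
        """Eq. 4.4: log-ratio of verb- vs. noun-attachment preference."""
        num = smoothed(v_prep, v_tot, v, prep) * smoothed(n_prep, n_tot, n, "NULL")
        den = smoothed(n_prep, n_tot, n, prep)
        return math.log2(num / den) if num > 0 and den > 0 else 0.0

    def iterate(ambiguous, threshold=2.0):
        """Assert confident attachments to the sure base until convergence."""
        moved = True
        while moved:
            moved = False
            for v, n, prep in list(ambiguous):
                score = la(v, n, prep)
                if abs(score) >= threshold:
                    (v_prep if score > 0 else n_prep)[(v if score > 0 else n, prep)] += 1
                    v_tot[v] += 1; n_tot[n] += 1   # simplified bookkeeping
                    ambiguous.remove((v, n, prep))
                    moved = True
        # remaining cases default to noun attachment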

4.2.3 Collins and Brooks

(Collins and Brooks, 1995) is a supervised approach to PP-attachment disambiguation. They also include the noun inside the PP (the so-called description noun) in the probability model, whenever non-zero counts exist. A supervised approach, using a parsed corpus, has the advantage that noise is virtually absent and that it has full recall, hence a probabilistic model is possible. The potentially ambiguous (V, N1, PP) triples are extracted from the Penn Treebank. The PP is represented as the preposition and the PP-internal noun (N2). The MLE estimation for verb attachment for the resulting (V, N1, P, N2) quadruple is as follows.

    P(verb-attach|V,N1,P,N2) ≈ #(verb-attach, V,N1,P,N2) / #(V,N1,P,N2)    (4.5)

The disadvantage of using a parsed corpus is that the sparse data problem becomes even more serious, and the inclusion of the PP-internal noun further aggravates sparseness. Collins and Brooks (1995) propose a back-off method as an alternative to smoothing. We discuss backing off and the variations that we use in detail in subsection 4.3.1 and in chapter 7.
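The back-off itself can be sketched in a few lines (our own reconstruction; the table names are assumptions): if the full quadruple is unseen, the counts of the three triples containing the preposition are pooled, then the pairs containing the preposition, then the preposition alone, with noun attachment as the final default.

    # Hypothetical sketch of the Collins & Brooks back-off estimation.
    from collections import Counter

    va = Counter()    # verb-attachment counts per tuple
    tot = Counter()   # total counts per tuple

    def p_verb_attach(v, n1, p, n2):
        """Back off from the quadruple to triples, pairs, the preposition."""
        levels = [
            [(v, n1, p, n2)],                        # full quadruple
            [(v, p, n2), (v, n1, p), (n1, p, n2)],   # triples containing p
            [(v, p), (n1, p), (p, n2)],              # pairs containing p
            [(p,)],                                  # the preposition alone
        ]
        for tuples in levels:
            denom = sum(tot[t] for t in tuples)
            if denom > 0:
                return sum(va[t] for t in tuples) / denom
        return 0.0   # unseen preposition: default to noun attachment

    # decision rule: attach to the verb iff p_verb_attach(...) >= 0.5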

4.3 Treebank-Based Statistical Parsers

Now we extend the discussion to all relations needed for full parsing. The Penn Treebank, which was used by Collins and Brooks (1995), contains full syntactic annotation, which allows parsers to disambiguate all relations. A variety of Treebank-based statistical parsers exist, for example Collins (1999), Charniak (2000), and Henderson (2003). We will limit our discussion to one of them, the Collins parser.

4.3.1 Collins 1996

Michael Collins’ PhD (Collins, 1999) is seen as a milestone in the history of lex- icalized parsing approaches. Three probabilistic parsing models are introduced. Each of these three models is described here, since many of the elements of the parser presented in this thesis can be seen as versions and extensions of these mod- els, especially of Model 1. Model 1 is a purely dependency-based model (Collins, 1996; Collins, 1997; Collins, 1999; Collins, 2003). Collins’s parsing models can be seen as an extension of his PP-attachment work, described in the previous section. Since lexicalization greatly improves PP- attachment results, it is natural to use similar approaches for all syntactic rela- tions. Instead of taking local decisions, the probabilities of the local ambiguities are multiplied to calculate a global probability. Klein and Manning (2003) and Bikel (2004) have shown that also very weakly lexicalised approaches can perform almost as well as strongly lexicalised approaches. Their results do not, however, in any way discredit the use of a strongly lexicalised approach, such as ours, as one possible, successful and linguistically intuitive approach among others – a point that also Klein and Manning (2003) explicitly make.

Mapping Treebank trees to Dependencies As the Treebank is in constituency format, its structures need to be mapped to dependencies for a dependency-based model. The main components of the model are as follows.

1. Only the heads of base NPs (base NP=unnested NPs) are used (Abney, 1991; 69 4.3. Treebank-Based Statistical Parsers

Abney, 1995). For This man generally eats fresh bananas with a fork, the reduced tree becomes:

    [S [NP man] [VP [VB eats] [NP bananas] [PP [IN with] [NP fork]]]]

2. For each CFG rewrite rule, the head daughter is established. For the tree above, the rewrite rules and their head daughters (indicated in boldface in the original; here VP, VB and IN) are:

    S → NP VP
    VP → VB NP PP
    PP → IN NP

    [S [NP man] [VP* [VB* eats] [NP bananas] [PP [IN* with] [NP fork]]]]    (head daughters marked with *)

3. The dependencies are derived directly from the rewrite rules. Binary rules lead to one dependency, ternary rules to two, etc. The dependency label t is a combination of the syntactic categories of the mother node and the daughter nodes involved:

    t = ⟨Dependent, MotherNode, Head⟩ if the head is to the right, or
    t = ⟨Head, MotherNode, Dependent⟩ if the head is to the left.

For instance:

    man —⟨NP, S, VP⟩→ eats        arrow_from(loc_man) = (loc_eats, ⟨NP, S, VP⟩)
    bananas —⟨VB, VP, NP⟩→ eats
    with —⟨VB, VP, PP⟩→ eats
    fork —⟨IN, PP, NP⟩→ with

The advantage Breaking up non-binary CFG rules into individual dependencies means that the data becomes considerably less sparse, and that more valuable lexical information than in a PCFG can be accessed. In early probabilistic parser implementations, all CFG rules permitted in the Penn Treebank were extracted and learnt. From his 300,000-word training part of the Treebank, Charniak (1996) thus obtains more than 10,000 CFG rules, of which only about 3,000 occur more than once, leading to very serious complexity and sparseness problems. A useful method to alleviate this problem is to break down the often very flat Treebank CFG rules. Collins (1996) therefore suggested using the individual dependencies between a head (LHS) and its daughters (the RHS elements), assuming independence among the individual RHS elements. If, for example, the first three of the following rewrite rules are seen for the verb give during training, but the fourth one is needed at parse time, a PCFG assigns a zero probability, or at best a smoothed, but likely too low, non-zero probability. In Collins' approach, by contrast, the data learnt from the third rule can be directly used.

(43) VP → V NP ("gives the money")
     VP → V NP NP ("gives them all his money")
     VP → V NP PP ("gives his money to the poor")
     VP → V PP ("gives to the poor")

To give another example, flat NP rewrite rules like the one in (44) are commonplace in the Penn Treebank.

(44) NP → DT $ CD NN ("the $ 200 hat")

They lead to a big sparse data problem. In this example, the situation is aggravated by the Penn Treebank design preference for flat rules.
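The head-finding and dependency extraction just described can be sketched compactly. The following is our own toy illustration: the head table stands in for a full head-percolation scheme, and the direction-dependent ordering of the label triple is omitted.

    # Hypothetical sketch: derive Collins-style dependencies from a
    # constituency tree.  A tree is (label, [children]) or a word string.
    HEAD_TABLE = {"S": "VP", "VP": "VB", "PP": "IN", "NP": "NN"}  # toy rules

    def label(t):
        return t[0] if isinstance(t, tuple) else t

    def head_word(t):
        """Recursively follow head daughters down to a word."""
        if not isinstance(t, tuple):
            return t
        cat, children = t
        head = next((c for c in children if label(c) == HEAD_TABLE.get(cat)),
                    children[0])
        return head_word(head)

    def dependencies(t, deps=None):
        """One dependency (dep, head, label triple) per non-head daughter."""
        if deps is None:
            deps = []
        if isinstance(t, tuple):
            cat, children = t
            head = next((c for c in children if label(c) == HEAD_TABLE.get(cat)),
                        children[0])
            for c in children:
                if c is not head:
                    deps.append((head_word(c), head_word(head),
                                 (label(c), cat, label(head))))
                dependencies(c, deps)
        return deps

    tree = ("S", [("NP", ["man"]),
                  ("VP", [("VB", ["eats"]), ("NP", ["bananas"]),
                          ("PP", [("IN", ["with"]), ("NP", ["fork"])])])])
    print(dependencies(tree))
    # [('man', 'eats', ('NP', 'S', 'VP')), ('bananas', 'eats', ...), ...]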

The Statistical Model Collins uses two models: the base NP model B, which calculates chunking probabilities, and the dependency model D, which calculates parsing probabilities. The probability of a tree is the product of the two.

    P(Tree|Sent) = P(B|Sent) · P(D|Sent)    (4.6)

The dependency model probability is

    P(D|Sent, B) = ∏_{j=1..m} P(arrow_from(j)|Sent, B)    (4.7)

Only the dependency model is discussed in detail here, since the base NP model uses standard tagging and chunking techniques. The MLE probabilities for a relation of type t are calculated from the training corpus as follows:

    P(t|⟨depword, deptag⟩ ∧ ⟨headword, headtag⟩) ≈
        #(t ∧ ⟨depword, deptag⟩ ∧ ⟨headword, headtag⟩) / #(⟨depword, deptag⟩ ∧ ⟨headword, headtag⟩)    (4.8)

At parse time, the expected probability for a current word w_j to have a dependency of type R_j to some head h_j, i.e. arrow_from(w_j) = (h_j, R_j), is the MLE probability P(R_j|⟨w_j, wtag_j⟩ ∧ ⟨h_j, htag_j⟩). The best dependency-model parse maximises the product over all the dependencies thus possible in the current sentence.

    argmax_D P(D|Sent) = ∏_{j=1..m} P(R_j|⟨w_j, wtag_j⟩ ∧ ⟨h_j, htag_j⟩)    (4.9)

Since the denominator is constant – the words are given and the tags are provided by the base NP model – the denominator can be neglected when maximising.
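Put together, the dependency model reduces to counting and multiplying. A minimal sketch (our own illustration; the count tables are assumptions, and dependencies are (dependent, head, type) triples of word/tag pairs):

    # Hypothetical sketch of the Collins (1996) dependency model score.
    import math
    from collections import Counter

    rel_counts = Counter()    # #(t, <depword,deptag>, <headword,headtag>)
    pair_counts = Counter()   # #(<depword,deptag>, <headword,headtag>)

    def p_rel(t, dep, head):
        """Eq. 4.8: MLE probability of relation type t for a word pair."""
        if pair_counts[(dep, head)] == 0:
            return 0.0   # in practice: back off to tags, cf. section 4.3.1
        return rel_counts[(t, dep, head)] / pair_counts[(dep, head)]

    def parse_log_prob(dependencies):
        """Eq. 4.9: a parse scores the product of its dependency
        probabilities (summed as logs for numerical stability)."""
        return sum(math.log(p_rel(t, dep, head) or 1e-12)
                   for (dep, head, t) in dependencies)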

Extensions to the Core Model The model as presented has a few shortcomings, some of which are immediately addressed and corrected in (Collins, 1996), others in Collins (1997), Collins (1999) and Collins (2003). We list and discuss them below.

• The only dependency boundary is the sentence. In the model presented so far, longer and shorter distance dependencies have equal weight; any relation can span the whole sentence. Collins introduces distance measure heuristics based on the following four features: direction, punctuation, intervening verbs, and adjacency. At first sight, these heuristics may seem non-linguistic, but conditioning relation probabilities on distance considerably improves performance (see chapter 7). In fact, such heuristics are well founded. They are reasonable approximations of structure: increasing distance and intervening punctuation and verbs increase the chance that a clause boundary occurs between the two words. At the same time, they approximate psycholinguistic recency and mental-load effects, which partly depend on complexity that cannot be expressed in terms of structure. We use a slightly different approach for each of these features. Direction is constrained by the manually written grammar (see chapter 5): the majority of English dependencies go in one direction only, or are strongly constrained. For punctuation, we use a parsing approach that regards commas as high-level boundaries (see chapter 5). Instead of intervening verbs or adjacency, we measure real distance in chunks; 74.2% of all WSJ dependencies between chunks are adjacent, but these dependencies are also relatively easy to retrieve.

• Sparse data problems

During parsing, very often no ⟨w_j, wtag_j⟩ ∧ ⟨h_j, htag_j⟩ pairs exist. Collins thus backs off to tags only, according to the following back-off hierarchy (where > is the precedence operator):

    #(⟨w_j, wtag_j⟩ ∧ ⟨h_j, htag_j⟩)
    > #(⟨w_j, wtag_j⟩ ∧ ⟨htag_j⟩) + #(⟨wtag_j⟩ ∧ ⟨h_j, htag_j⟩)
    > #(⟨wtag_j⟩ ∧ ⟨htag_j⟩)

A hierarchy that steps directly from words to syntactic tags, without an intervening back-off to semantic classes, seems very coarse: the bulk of decisions is taken with only very partial lexicalisation. We discuss this topic in chapter 7.

• Independence assumptions: no probability relations across several dependencies. Some syntactic relations span several subtrees. For example, in PP-attachment the quadruples ⟨verb, prep, description noun⟩ and ⟨noun, prep, description noun⟩, which are used for resolving PP-attachment by Collins and Brooks (1995), span two subtrees. In this sense, Collins (1996), and equally Collins (1997; 1999; 2003), are a step backward, for which a correction is suggested in chapter 6. We will see that the majority of long-distance dependencies, i.e. dependencies that span several subtrees, can be expressed as a single dependency with a dedicated label.

Relation of Pro3Gres to Collins Model 1 Both (Collins, 1996) and Pro3Gres are mainly dependency-based statistical parsers parsing over heads of chunks. It can therefore be expected that (Collins, 1996) was a starting point for Pro3Gres. The (Collins, 1996) MLE and the main Pro3Gres MLE can be juxtaposed as follows.

(Collins, 1996) MLE estimation:

    P(R|⟨a, atag⟩, ⟨b, btag⟩, dist) ≈ #(R, ⟨a, atag⟩, ⟨b, btag⟩, dist) / #(⟨a, atag⟩, ⟨b, btag⟩, dist)    (4.10)

Main Pro3Gres MLE estimation:

    P(R, dist|a, b) ≈ P(R|a, b) · P(dist|R) ≈ ( #(R, a, b) / ∑_{i=1..Rels} #(R_i, a, b) ) · ( #(R, dist) / #R )    (4.11)

The following design differences can be observed:

• Pro3Gres does not use part-of-speech tag information. The first reason for this is that a licensing, hand-written grammar over Penn tags is employed, which has the advantage that the grammar size can be kept small; the grammar will be discussed in detail in chapter 5. The second reason is that Pro3Gres backs off to semantic WordNet classes for nouns and to Levin (or WordNet) classes for verbs instead of to tags, which has the advantage that the back-off is more fine-grained.

• Pro3Gres uses distances measured in chunks instead of a vector of features. While the type of relation R is lexicalized, i.e. conditioned on the lexical items, the distance is assumed to depend only on R. This is based on the observation that some relations typically have very short distances (e.g. verb-object), while others can be quite long (e.g. verb-PP attachment). This observation greatly reduces the sparse data problem. Chung and Rim (2003) have made similar observations for Korean.

• The co-occurrence count in the MLE denominator in equation 4.11 is not the sentence context as in 4.10, but the sum over competing relations, as discussed in section 1.4 (see the sketch after this list). For example, the object and the adjunct relation are in competition, as they are both licensed by a verb chunk followed by a noun chunk. Pro3Gres models the probability of the decision to attach a given noun with an object relation or an adjunct relation in this example.

• Relations (R) have a Functional Dependency Grammar definition. Let us reconsider the reduced tree representation for the sentence This man eats fresh bananas with a fork, which leads to the following dependency relations in (Collins, 1996) versus Pro3Gres; in the latter, non-local lexical information is also considered as far as possible.

    (Collins, 1996):  man —⟨NP, S, VP⟩→ eat    banana —⟨VB, VP, NP⟩→ eat    with —⟨VB, VP, PP⟩→ eat    fork —⟨IN, PP, NP⟩→ with

    Pro3Gres:         man —subject→ eat        banana —object→ eat          with —verb-PP→ eat         fork —noun-prep→ with

While Collins' labels are ad-hoc heuristics, we use a principled and linguistically highly motivated conversion. For the given example the difference may seem cosmetic, but we will see in chapter 6 that it is crucial: it is one of the prerequisites for allowing us to treat the majority of long-distance dependencies locally.
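The Pro3Gres estimation of equation 4.11, with its competing-relations denominator, can be sketched as follows (a minimal illustration under our own assumptions about the count tables; the real system additionally backs off to semantic classes):

    # Hypothetical sketch of the main Pro3Gres MLE (eq. 4.11).
    from collections import Counter

    rel_counts = Counter()    # #(R, a, b) for head lemma a, dependent lemma b
    dist_counts = Counter()   # #(R, dist), distance measured in chunks
    rel_totals = Counter()    # #R

    COMPETING = {"object": ["object", "adjunct"],   # licensed by same context
                 "adjunct": ["object", "adjunct"]}

    def p_rel_dist(R, dist, a, b):
        """P(R,dist|a,b) ~ P(R|a,b) * P(dist|R); the denominator is the
        sum over relations competing for the same attachment decision."""
        competitors = sum(rel_counts[(Ri, a, b)]
                          for Ri in COMPETING.get(R, [R]))
        if competitors == 0 or rel_totals[R] == 0:
            return 0.0   # the real system backs off to semantic classes here
        return (rel_counts[(R, a, b)] / competitors) \
             * (dist_counts[(R, dist)] / rel_totals[R])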

Both Collins' and our approach are transformations on the training trees. Johnson (1998) shows that even simple syntactic transformations for an unlexicalized PCFG model can have a significant impact on parsing performance: on ambiguous PP-attachment, a model that additionally includes parent node categories improves precision by 7% from 73 to 80%, and recall by 9% from 70 to 79%. Johnson (1998) also tested a "Chomsky adjunction" binarized model, which attaches several PPs in multiple steps and is thus the model closest to our and (Collins, 1996)'s approach. Its performance is equivalent to the PCFG baseline. This should not be taken as an argument against our approach, however, for the following reason: the assumption of independence between several PPs is too strong. For deciding whether a PP attaches to a head, it is important to know whether another attaching PP intervenes. We address this shortcoming by two means. First, the grammar restricts attachment to heads that have attached other material. Second, we use a distance measure: intervening chunks between a head and a PP are mostly NPs and other PPs.

Bikel (2004) voices a frequently heard criticism of tree transformations:

While head-lexicalizations and other tree-transformations allow the construction of parsing models with more data-sensitivity and richer representations, crafting rules for these transformations has been largely an art, with heuristics handed down from researcher to researcher. What's more, on top of the large undertaking of designing and implementing a statistical parsing model, the use of heuristics has required a further effort, forcing the researcher to bring both linguistic intuition and, more often, engineering savvy to bear whenever moving to a new treebank.

(Bikel, 2004, p. 91)

Although our tree transformations are specific to the treebank, we believe that its format is standardised enough to be followed by future corpora. We believe that a combination of statistical and rule-based approaches – for example, a rule-based competence grammar and finite-state deep-linguistic tree transformations combined with a statistical disambiguation and pruning component – is an approach worth exploring. We also believe that the linguistic insights gained are fruitful for the whole science of linguistics. We are precisely interested in descriptions of the language that are meaningful to, and interpretable and editable by, linguists. Finding interpretable generalisations has always been a major goal of science.

We also believe that the merit of attaining a linguistically highly motivated dependency representation outweighs the disadvantage of using a non-trivial, relatively complex and not fully complete mapping, which is detailed in the appendix.

4.3.2 Model 1, 1997

Collins states that Model 1 of (Collins, 1997) "is essentially a generative version of the model described in (Collins, 1996)". (Collins, 1997) is generative, i.e. the top-down derivation probability is modelled. As in a PCFG, the probabilities of the Treebank-inherent CFG rules are calculated. In order to address sparse data issues, Collins distinguishes between the left context l and the right context r. We use Collins' original notation.

The (Collins, 1997) CFG rule generation model for the rewrite rule P → L_m ... L_1 H R_1 ... R_n is:

    P(RHS|LHS) = P_h(H|P, t(P), l(P))
                 · ∏_{i=0..m} P_l(L_i, t(L_i), l(L_i) | P, H, t(H), l(H), d(i))
                 · ∏_{i=0..n} P_r(R_i, t(R_i), l(R_i) | P, H, t(H), l(H), d(i))

where

    H     RHS head category                     LHS       left-hand side of the rule
    P     LHS mother category                   RHS       right-hand side of the rule
    L     constituent category (to the left)    P_h       P of the head
    R     constituent category (to the right)   P_l:1..m  P of constituents left of the head
    t(H)  tag of the head word of H             P_r:1..n  P of constituents right of the head
    l(H)  head word of H (l = lexical info)     d         distance measure
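The decomposition is straightforward to operationalise. A toy sketch (our own illustration; the conditional probability tables are assumed to be given as functions):

    # Hypothetical sketch of the Collins (1997) Model 1 rule probability.
    import math

    def rule_log_prob(P, H, left, right, head_tag, head_word,
                      p_head, p_left, p_right):
        """Score P -> Lm..L1 H R1..Rn: head choice times the left and
        right contexts, each conditioned on mother, head and distance."""
        logp = math.log(p_head(H, P, head_tag, head_word))
        for i, L in enumerate(left):    # Collins generates up to a STOP symbol
            logp += math.log(p_left(L, P, H, head_tag, head_word, i))
        for i, R in enumerate(right):
            logp += math.log(p_right(R, P, H, head_tag, head_word, i))
        return logp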

4.3.3 Model 2

Model 2 (Collins, 1997; Collins, 1999) extends the parser to include a complement/adjunct distinction for NPs and subordinated clauses, and it includes a subcategorisation frame model. Complements in the Treebank are identified on configurational grounds and based on functional tags. All non-terminals that can be clearly identified as complements receive a -C suffix. For every rewrite rule, the correct subcategorisation frame needs to be selected and then processed correctly. The word probability is conditioned on the subcategorisation frame: subcategorised words are more likely to appear if they are subcategorised. Once a subcategorised word has been found, the subcategorisation frame is shortened by one element.

Let us look at an example. The S rewrite rule in Last week IBM has bought Lotus is S(bought) → NP(week) NP-C(IBM) VP(bought). For the subcategorisation-dependent generation of dependencies in Model 2, first the probabilities of the possible subcat frames to the right (P_rsubcat) and to the left (P_lsubcat) of the head are calculated, conditioned on the LHS mother category P, the RHS head category H and the lexical head h. The selected subcat frame is then added as a condition to the left context l and the right context r, respectively.

    P_head(VP|S, bought) · P_lsubcat({NP-C}|S, VP, bought) · P_rsubcat({}|S, VP, bought)
    · P_l(NP-C(IBM)|S, VP, bought, {NP-C})    (4.12)

Once a subcategorised constituent has been found, it is removed from the subcategorisation frame; after NP-C(IBM) has been generated, the remaining frame is empty, so week is generated with an empty subcategorisation frame.

    P_l(NP(week)|S, VP, bought, {})    (4.13)

This ensures that non-subcategorised constituents cannot be attached as complements, which is one of the two major functions of a subcategorisation frame. The other major function is to ensure that, if possible, all subcategorised constituents are found. To this end, the probability that a rewrite rule stops expanding is calculated. Importantly, the probability of stopping is low for a rewrite rule with a non-empty subcat frame, and high for a rewrite rule with an empty subcat frame.

    P_l(STOP|S, VP, bought, {}) · P_r(STOP|S, VP, bought, {})    (4.14)

The entire probability of the phrase S(bought) → NP(week), NP-C(IBM), VP(bought) is therefore

    P_head(VP|S, bought) · P_lsubcat({NP-C}|S, VP, bought) · P_rsubcat({}|S, VP, bought)
    · P_l(NP-C(IBM)|S, VP, bought, {NP-C}) · P_l(NP(week)|S, VP, bought, {})
    · P_l(STOP|S, VP, bought, {}) · P_r(STOP|S, VP, bought, {})

The subcategorisation frame model of Collins Model 2 is an approximation to subcategorisation in a formal grammar. Different types of complements are not distinguished. We have decided to use a complement/adjunct distinction for NPs, and also to distinguish between different types of subcategorisation. Each type of subcategorised complement can occur at most once per verb. Since each dependent is attached separately, subcategorisation frame selection and the removal of found constituents coincide. We place strong restrictions on the co-occurrence of subcategorisation types, as we explain in chapter 5. All the examples given in support of the subcategorisation frame model in Collins (1997) are dealt with by the hand-written grammar.
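The bookkeeping behind equations 4.12-4.14 can be sketched as follows (our own illustration; the probability tables are assumed to be given as functions):

    # Hypothetical sketch of Model 2's subcat-frame bookkeeping: generate
    # dependents, removing satisfied valencies, then score the STOP decision.
    import math

    def score_side(dependents, frame, p_dep, p_stop):
        """Score one side of a rule: each dependent is conditioned on the
        remaining subcat frame; satisfied valencies are removed."""
        remaining = list(frame)                 # e.g. ["NP-C"]
        logp = 0.0
        for label, word in dependents:          # e.g. ("NP-C", "IBM")
            logp += math.log(p_dep(label, word, tuple(remaining)))
            if label in remaining:
                remaining.remove(label)         # cf. eq. 4.13
        logp += math.log(p_stop(tuple(remaining)))   # STOP likely iff empty
        return logp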

4.3.4 Model 3

Model 3 (Collins, 1997; Collins, 1999) extends the parser by adding a WH-movement model.

A factor for the probability of gap and filler creation and transmission is added. First, the probability of a rule's gap requirement is calculated, conditioned on the LHS mother category P, the RHS head category H and the lexical head h. A rule's gap requirement is either head (which means no gap), left, or right, meaning that a gap is required in the left or the right context, respectively. If so, the gap requirement is added to the corresponding subcat frame. Second, if a gap requirement is in the subcat frame, the probability of a (gap) TRACE constituent being generated is higher. Third, if a gap requirement is in the subcat frame, the probability of a (filler) extra constituent is also higher. We show in chapter 6 how we treat long-distance dependencies.

4.3.5 Recovering empty nodes and functional tags with Treebank-Based Statistical Parsers

It is generally recognised that classical probabilistic parsers (Collins, 1999; Charniak, 2000; Henderson, 2003) are linguistically not convincing: they produce pure constituency output that includes neither the grammatical function annotation nor the empty nodes annotation provided in the Penn Treebank. Many approaches thus aim to recover empty nodes or functional tags from their output in a post-processing step. Approaches to reconstructing functional tags include Blaheta (2004) and Musillo and Merlo (2005). Our approach only aims to recover a subset of the information corresponding to functional tags; we will thus not discuss these approaches in detail. The functional tags that we aim to recover are SBJ, the subject dependency; LGS, the logical subject in passive sentences; CLR, PPs that are arguments; and TMP, NPs that act as adjuncts.

Post-processing attempts to recover empty nodes include simple approaches like Johnson (2002) and linguistically sophisticated approaches like Campbell (2004). Some approaches are based on machine learning: Dienes and Dubey (2003) use a tagging approach, Jijkoun and de Rijke (2004) use memory-based learning, and Levy and Manning (2004) use log-linear classifiers. One of the best-performing approaches, Campbell (2004), is a rule-based approach using entirely hand-crafted rules. Campbell (2004) raises the question whether statistical approaches are warranted for the treatment of long-distance dependencies at all: empty categories follow from clear formal linguistic principles, so there should be principle-based ways to recover them. Empty categories do not exist prior to the annotation; they are consciously inserted by the annotators following guidelines about linguistic configurations, and the majority of empty nodes occur in configurationally clearly defined places. We describe our own largely rule-based approach, which exploits linguistic knowledge to discover the majority of long-distance dependencies, in chapter 6. Recently, there have also been first approaches that integrate both empty nodes and functional labels into the parsing stage of broad-coverage probabilistic parsing, for example Gabbard, Kulick, and Marcus (2006).

4.4 Dependency-oriented Statistical Parsers

There is a large number of dependency parsers, some of them with a probabilistic component. For space reasons we cannot discuss all of them, but we present an important subset. Statistical DG parsing has recently attracted a lot of interest, as is witnessed by the recent CoNLL shared tasks on multilingual dependency parsing. The first took place at CoNLL-X (Buchholz and Marsi, 2006), the second at CoNLL-XI (Nivre et al., 2007). About half of the entrants to the dependency-parsing shared task of CoNLL-X used Nivre's MaltParser approach (Nivre, 2006b), which we also summarise below. Pro3Gres participated in CoNLL-XI, with average results, which are discussed in section 7.3.2.

4.4.1 Link Grammar

Link Grammar (Sleator and Temperley, 1991) is an early broad-coverage parser using a dependency-related formalism. Its parsing algorithm is context-free, which means it can only report projective structures. Nevertheless, most non-projective structures of English can be treated: a large set of labels is used in order to deliver projective analyses for structures that are inherently non-projective. In order to obtain the corresponding non-projective analyses, a considerable amount of post-processing is needed (Schneider, 1998). In practical terms, Link Grammar proves that completely projective parsing of English is possible without sacrificing long-distance dependencies.

4.4.2 Eisner

Eisner (1996) describes three models for probabilistic dependency parsing. The dependency structures are required to be projective; since no mapping of LDDs to dedicated dependency types is used, the system is unable to express empty nodes and LDDs. The system uses only unlabelled dependencies. These two factors mean that Eisner does not use the potential of DG to express grammatical relations more directly, but just uses a version of DG equivalent to constituency. An unlabelled DG is a CFG in Chomsky Normal Form in which the mother node and its head dependent are equivalent.
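This equivalence can be made concrete with a small, hedged Python sketch (hypothetical representation: a 1-based head array, 0 marking the root): every arc of an unlabelled projective dependency tree becomes a binary rule whose mother category is identified with the head, yielding a CFG in Chomsky Normal Form once unary lexical rules X[w] -> w are added.

    def dg_to_cnf(words, heads):
        """Convert an unlabelled projective dependency tree into binary
        CFG rules; heads[i-1] is the head index of word i (0 = root)."""
        rules = set()
        for i, h in enumerate(heads, start=1):
            if h == 0:
                continue                      # the root word has no incoming arc
            dep, head = f"X[{words[i-1]}]", f"X[{words[h-1]}]"
            if i < h:
                rules.add((head, dep, head))  # dependent precedes its head
            else:
                rules.add((head, head, dep))  # dependent follows its head
        return rules

    # dg_to_cnf(["the", "dog", "barks"], [2, 3, 0]) yields the rules
    # X[dog] -> X[the] X[dog] and X[barks] -> X[dog] X[barks].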

4.4.3 McDonald et al.

McDonald et al. (2005) also use unlabelled dependencies. Their approach is one of the few genuinely non-projective approaches: they detect discontinuous structures directly during parsing, using spanning-tree algorithms from graph theory. Most other dependency parsers, like other Treebank-based approaches, are projective, some using graph rewriting in a post-processing stage. They also show that their parser increases efficiency on languages with non-projective dependencies.

4.4.4 Nivre et al.

Nivre (2003) also used unlabelled dependencies. Nivre and Scholz (2004) report labels, but their approach is still projective. They use scores rather than probabilities and parse deterministically using an LR(n) parser. Pro3Gres differs here, as it uses a search beam on CYK instead of a sophisticated look-ahead, thus covering the middle ground between all-path parsing and deterministic parsing. Memory-based learning (Daelemans, 1999) is used for the parsing and look-ahead function in Nivre and Scholz (2004). The look-ahead is three tokens, only taking the part-of-speech tags into consideration. Nivre and Scholz (2004) report results that are slightly below Collins (1999) and Charniak (2000).

Nivre and Nilsson (2005) use graph-rewriting procedures to transform projective trees into non-projective graphs. The graph transformations are learnt from the training data in the following way: the training data for the parser is projectivized by applying and remembering lifting operations (Kahane, Nasr, and Rambow, 1998). A lifting operation is a simple graph transformation, defined as follows: if a governor a has a dependent b, and b has a dependent c, then lifting c makes it an immediate dependent of a. When the parser is trained on the transformed data, it is also given information about the lifts. After parsing, a post-processor applies the learnt lift operations inversely. Inverse lift operations correspond to our post-processing treatment of control structures. Nivre and Nilsson (2005) report that 15% of all sentences in the Czech Prague Dependency Bank (Hajič, 1998) and 25% of the sentences in the Danish Dependency Bank (Kromann, 2003) need at least one lifting operation. Less than one percent of the sentences in the Prague Dependency Bank, and less than three percent of the sentences in the Danish Dependency Bank, need more than three lifting operations. The memory-based learner reports almost correct reconstruction (99% f-score) on perfect parses; on Czech projective parser output, it improves the labelled attachment score from 72% to 72.8%. Such a graph-rewriting task is very similar to the combined task of recovering empty nodes and antecedents in the Penn Treebank. At least for English, it could also be addressed with a rule-based approach (Campbell, 2004), or with our approach in chapter 6.
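A single lift can be sketched in a few lines of Python; the representation of the tree as a head function is an illustrative assumption.

    def lift(heads, c):
        """One lifting operation: if governor a has dependent b, and b has
        dependent c, lifting c makes c an immediate dependent of a.
        heads maps each node to its current governor."""
        b = heads[c]            # the current governor of c
        a = heads[b]            # the governor of b
        lifted = dict(heads)    # leave the input tree unchanged
        lifted[c] = a           # re-attach c one level up
        return lifted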

Nivre (2006a) uses a completely different parsing algorithm, in which every word is allowed to attach to any other word. In pseudo code, the algorithm is as follows:

    foreach sentence w1, ..., wn {
        for i = 1 up to n {
            for j = i − 1 down to 1 {
                LINK(i, j)
    }}}

The LINK operation builds an arc with i as head and j as dependent, or j as head and i as dependent, or no arc at all, depending on the grammar rules. Such an algorithm allows the construction of arbitrarily non-projective graphs. If typical dependency constraints such as acyclicity, single-headedness and projectivity are placed as weak constraints that can be violated under rare conditions, then we have a usable non-projective algorithm. The rest of Nivre (2006a) presents the ensuing parser and shows that the degree of violation needed on the projectivity constraint is small, corresponding to the lifting operations presented above.

Kuhlmann and Nivre (2006) have further investigated the context-sensitivity in the Danish Dependency Bank and the Czech Prague Dependency Bank. They describe that context-sensitivity constraints of the class to which the grammar formalism of Tree-Adjoining Grammar (TAG) belongs have almost complete coverage, 99.89%, on these two Treebanks, and that the remaining uncovered data is partly due to the properties of the annotation scheme. They conclude that TAG mild context-sensitivity is a very attractive extension of projectivity. We discuss in chapter 6 that for English, TAG mild context-sensitivity is needed to analyse complex WH-questions. We have followed Kuhlmann and Nivre (2006)'s suggestion and implemented a simple dependency version of TAG-based mild context-sensitivity, which we present in chapter 6.

Nivre's work is summarised in Nivre (2006b). Throughout the several versions of his parser, MaltParser, deterministic parsing with an oracle and a limited look-ahead of typically 3 words is used. It is well known that such a short look-ahead is not sufficient for many cases, and that local maxima may differ from global maxima over much longer sequences.

The Efficiency criterion is achieved by the deterministic and greedy (non-ambiguity-packing) nature of the shift-reduce parser. One could argue that even at a word level, deterministic, non-packing parsing is not always possible ... . (Samuelsson, 2007)

Our approach, using a full-path parser combined with a beam search, is an important difference. Whether these different architectures have an impact in terms of linguistic performance has not been evaluated, however. Arguably, Nivre's mapping from the Penn Treebank to a dependency representation is, like Collins' (Collins, 1999), relatively surface-oriented and has not been accepted by all scholars (see, e.g., Samuelsson (2007)). We discuss some of the inconsistencies and errors that Nivre (2006b)'s mapping introduces in the evaluation in subsection 7.3. More research on dependency schemes such as Carroll, Minnen, and Briscoe (2003) and de Marneffe, MacCartney, and Manning (2006) is needed, as one of the conclusions of Nivre et al. (2007) also expresses:

Increasing our knowledge of the multi-causal relationship between language structure, annotation scheme, and parsing and learning methods probably remains the most important direction for future research in this area. (Nivre et al., 2007, 929)

Our own representation is based on the functional considerations in Lexical-Functional Grammar (LFG); it is close to Carroll, Minnen, and Briscoe (2003), and uses some of the extensions advocated in de Marneffe, MacCartney, and Manning (2006), for example appos. Haverinen et al. (2008) have mapped the output of Pro3Gres to the Stanford scheme and shown that Pro3Gres achieves state-of-the-art performance. The Stanford scheme (de Marneffe, MacCartney, and Manning, 2006) is a recent extension of Carroll, Minnen, and Briscoe (2003) and is a widely used dependency representation. Our representation originally left chunk-internal relations underspecified, which also allowed us to side-step some areas of gradience and to focus on predicate-argument relations, which has been the original spirit of dependency grammar (see chapter 3). Versions of the parser that report all chunk-internal relations are now available. We will see in chapter 6 that using a linguistically highly motivated, functionally oriented approach is crucial.

4.4.5 Yamada and Matsumoto

Yamada and Matsumoto (2003) is very similar to Nivre and Scholz (2004). Important differences are that they use a different parsing algorithm (a shift-reduce version) and Support Vector Machines, and that they report slightly better performance.

At first sight, one would expect previous DG work to be a major source of influence on a DG parser such as Pro3Gres. But it turns out that, first, the recognition that Functional DG is so expressive because of its functional labels (Covington, 1994; Schneider, 2005) has not yet had a big impact. Second, there is only relatively little work on long-distance dependencies in DG, as Nivre (2006a, 73-74) points out. Third, our application of mild context-sensitivity is more closely related to TAG and LFG approaches than to DG work; in fact, we offer a DG-based simplification of the TAG and LFG approaches to mild context-sensitivity. Fourth, DG research is also turning to the class of extensions of context-sensitivity that TAG offers. Kuhlmann and Nivre (2006) have investigated naturally occurring context-sensitivity in two corpora in dependency format (the Danish Dependency Bank and the Prague Dependency Bank) and conclude that TAG mild context-sensitivity is a very attractive extension of projectivity.

4.5 Data-Oriented Parsing (DOP)

Data-Oriented Parsing (Bod, 1992; Bod, Scha, and Sima'an, 2003) is a model in which the basic units to which probabilities are assigned are subtrees. The probability of a subtree is not the product of all its local subtrees (trees spanning two generations), but recursively the sum of the probabilities of all the subtrees – local or bigger than local – it can be composed of. Bod (2001) shows that subtrees as deep as 14 levels still contribute to parsing performance. These results show that lexical and structural dependencies span a large number of levels.

Pro3Gres takes the assumption that this property is strongly present in the following two configurations: first, deeply nested phrases of the same type, where the lexical head appears several levels below its maximal projection; second, long-distance dependencies, where lexical information needs to be shared between the gap and the filler. Collapsing LDDs into local dependency relations, as Pro3Gres does, is related to a version of DOP in which the depths of the subtrees expressing certain types of long-distance dependencies are known.
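In the standard DOP1 instantiation, which we take as representative here, these definitions read as follows: the probability of a parse tree t sums over all derivations d that compose it, and each fragment probability is a relative frequency over fragments with the same root label:

    P(t) = \sum_{d \,\mathrm{derives}\, t} \; \prod_{s \in d} P(s),
    \qquad
    P(s) = \frac{\mathrm{count}(s)}{\sum_{s':\, \mathrm{root}(s') = \mathrm{root}(s)} \mathrm{count}(s')}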

4.6 Statistical Formal Grammar Parsers

We have mentioned that state-of-the-art probabilistic parsers (Collins, 1999; Charniak, 2000; Henderson, 2003) are linguistically not convincing.

Before the advent of statistical methods, regular and context-free grammars were considered too inexpressive for serious consideration, and even now the reliance on stochastic versions of the less-expressive grammars is often seen as an expedient necessitated by the lack of an adequate stochastic version of attribute-value grammars. (Abney, 1997)

It is recognised, for example in Kaplan et al. (2004b), that formal grammar parsers, without using the insights into statistics gained from probabilistic parsing, cannot achieve the robustness, coverage, accuracy and speed required for large-scale application. We discuss some of the successful approaches to integrating statistical disambiguation into formal grammar based parsers. The discussion is ordered by linguistic theory.

4.6.1 Lexical-Functional Grammar (LFG)

Lexical-Functional Grammar (LFG) can be seen as the outcome of the debate about the prevalence of constituency or dependency (see e.g. Schneider (1998)). LFG grammars use constituency rules that are annotated for building up functional structures (Bresnan, 1982). The functional structures of LFG are closely related to our dependency output, as we will argue in section 6.5.

Riezler et al., Maximum Entropy Modelling

Riezler et al. (2002) was the first approach scaling up an LFG parser to the entire Penn Treebank. They used Maximum Entropy modelling, which we briefly discuss in the following. Maximum Entropy modelling is by no means restricted to LFG; it has also been used in HPSG syntax, and for many other natural language processing tasks. We discuss Maximum Entropy here because of the historical importance of Riezler's approach.

Log-linear and Maximum Entropy models A central theory on which statistical Formal Grammars are based is log-linear and Maximum Entropy modelling.

Log-linear models are models in which the log probability is a linear combination of feature values (plus a constant). Maximum Entropy models and Random Fields, but also PCFG, are examples of log-linear models.

PCFGs are a particular, simple version of log-linear models in which the sequence of features is the sequence of production rules traversed in a derivation. It is simple to obtain a probability model: the probabilities of all productions with the same LHS sum to 1. This useful property is lost in a straightforward extension of PCFGs to a structure-sharing, re-entrant Formal Grammar, as the re-entrancies introduce additional probability mass (Abney, 1997).

Stochastic Attribute-Value Grammar (SAVG), introduced in Abney (1997), is the general name for what can be seen as PCFG for attribute-value grammars. An attribute-value grammar rule is a rewrite rule equipped with features that constrain each other, and that can express co-references. Rules are assigned weights in a fashion similar to PCFG or lexicalized PCFG. Because the co-references (also called re-entrancies) introduce interdependencies by adding to the probability mass in more than one place, SAVG loses the PCFG property of being probabilistic. PATR and PATR-II are attribute-value grammars. HPSG and GPSG are often implemented as attribute-value grammars, although then only an important subset of HPSG can be implemented.

Abney (1997) originally suggests basing SAVG on Markov Random Fields, but no tractable exact parsing algorithm is known: because of the dependencies among substructures, dynamic programming is impossible. This means that no efficiently implementable random-field approach exists yet. Johnson et al. (1999) suggest using Maximum Entropy models for SAVGs. Maximum Entropy modelling allows one to stochastically weigh discrete features to maximise the probability of a model, typically of an initially non-probabilistic model. One does not need to know whether the features are independent, and one can maximise the probability of a model that has many features by maximising each feature individually. Weight estimation is easy because the entropy function is convex: it has one maximum, no point other than the maximum is flat, and all paths towards the maximum go upwards.

Entropy In Shannon's information theory, information I is a measure of surprise at an individual outcome given previous experience of outcomes. The less likely an outcome, the more informative it is. Information is measured in bits, hence the logarithm with base 2.

I = \log_2 \frac{1}{p} \qquad (4.15)

Entropy H can be described as the average expected surprise, the average in- formation of an outcome. It is simply the information weighted by its probability, as indicated in 4.16

H = p \cdot \log_2 \frac{1}{p} \qquad (4.16)

This function has two important properties. First, it is convex. Second, the entropy of a function is maximised at the point where the parameter probabilities fit the data best. This means that the probability of a complex function, for instance the product of weighted features, can be maximised by maximising the entropy.
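As a worked example, summing the per-outcome terms of (4.16) over all outcomes: for a fair coin with two outcomes of probability p = 0.5 each,

    H = 0.5 \cdot \log_2 \frac{1}{0.5} + 0.5 \cdot \log_2 \frac{1}{0.5} = 0.5 + 0.5 = 1 \text{ bit},

the maximum for two outcomes; any biased coin yields lower entropy.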

Scored Features The score of an outcome a, for example a syntactic analysis given a sentence, is the product of features f weighted by α. Of all possible binary features from the feature space 1..n, a feature fi is either true (1) or false (0) for an outcome.

\mathrm{Score}(a) \;=\; \prod_{i=1}^{n} \alpha_i^{f_i} \;=\; \prod_{i=1}^{n} \begin{cases} \alpha_i & \text{if } f_i = 1 \\ 1 & \text{if } f_i = 0 \end{cases} \qquad (4.17)

In order to get the probability distribution, the score has to be re-normalized by a factor Z. For maximising or comparing probabilities, Z does not matter and is often not actually calculated.

p(a) = \frac{1}{Z} \prod_{i=1}^{n} \alpha_i^{f_i} \quad\text{where}\quad Z = \sum_{\Omega} \prod_{i=1}^{n} \alpha_i^{f_i} \qquad (4.18)
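A minimal Python sketch of (4.17) and (4.18), under the simplifying assumption that an outcome is represented by the set of indices of its fired features (all names are illustrative):

    def score(alpha, fired):
        """Score(a): product of the weights of all features that fire;
        features with f_i = 0 contribute a factor of 1 (eq. 4.17)."""
        s = 1.0
        for i in fired:
            s *= alpha[i]
        return s

    def prob(alpha, outcome, outcome_space):
        """p(a) = Score(a) / Z, with Z summing scores over the outcome
        space Omega (eq. 4.18); Z can be skipped for maximisation."""
        Z = sum(score(alpha, o) for o in outcome_space)
        return score(alpha, outcome) / Z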

Partial Information Maximum Entropy modelling is used where only partial information is available. The features for which training data is available get their probabilities from simple counts over the training corpus, from which the weights follow directly. The automatic weighting of the features for which no information is available makes sure that they get highest-entropy weights. The automatic weighting uses methods like iterative scaling: if entropy increases for a small weight change, then the procedure is moving towards the maximum.

Riezler's approach Riezler et al. (2002) is probably the first large-scale application of Maximum Entropy models combined with a formal grammar. An LFG grammar is scaled up to parse the Penn Treebank. Sections 02-21 are used for training. Many of the features used in LFG are not expressed in the Treebank; therefore a Maximum Entropy model was used. Around 1000 features expressing information about c-structure, f-structure and lexicalization are employed. 74.7% of the sentences in section 23 receive a full parse; in 23.5%, partial results are collected. The evaluation is done on 700 randomly selected sentences from Penn Treebank section 23, and on 500 random sentences annotated with dependency information from the Susanne corpus (Carroll, Minnen, and Briscoe, 1999).

Cahill, van Genabith, Burke

The Penn Treebank is currently the largest available syntactically annotated corpus for training formal grammar parsers. In order to be able to use it, one must address the fact that it leaves some features underspecified although they are used in formal grammars. One possible approach is to use Maximum Entropy, as Riezler et al. (2002) have done for LFG. Another approach used in LFG is to devise an approximative mapping from the Penn Treebank to LFG rules. While (P)CFG rewrite rules are expressed directly, such a mapping is needed in order to approximate the functional annotations that form a vital part of each LFG rule. For this purpose, Cahill et al. (2002) and Cahill et al. (2004) have developed a mapping method: the CFG rules extracted from the Penn Treebank are automatically annotated with functional annotations. Two parsing architectures are compared: a pipeline architecture, in which the automatic f-structure annotation is done on the actual output of a PCFG-trained parser, and an integrated architecture, in which the automatic f-structure annotation is done on the gold standard and an LFG parser is then used. It is shown that the integrated architecture outperforms the pipeline architecture. In relation to Pro3Gres, the pipeline architecture corresponds to an approach in which Johnson (2002) patterns are used on the output of a classical probabilistic parser (Collins, 1999; Charniak, 2000), while the integrated architecture corresponds to our approach of using extended Johnson (2002) patterns on the gold standard and parsing with Pro3Gres.

Burke et al. (2004) extend Cahill et al. (2002) by extracting paths between co-indexed constituents, thus also taking non-local information into consideration. For each of the modelled LDD types t (they mainly involve topicalization: TOPIC, TOPIC-REL, FOCUS), the probability of a path p is estimated as

P(p \mid t) = \frac{\#(t,p)}{\sum_{i=1}^{n} \#(t,p_i)} \qquad (4.19)
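Equation (4.19) is a plain relative-frequency estimate; a minimal sketch (the counter keyed by (type, path) pairs and the path strings are hypothetical):

    from collections import Counter

    def path_prob(counts, t, p):
        """P(p|t) = #(t,p) / sum_i #(t,p_i)."""
        total = sum(n for (t_i, _), n in counts.items() if t_i == t)
        return counts[(t, p)] / total if total else 0.0

    # counts = Counter({('TOPIC', 'S>NP'): 3, ('TOPIC', 'S>VP>NP'): 1})
    # path_prob(counts, 'TOPIC', 'S>NP') == 0.75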

4.6.2 Head-Driven Phrase Structure Grammar (HPSG)

Head-Driven Phrase Structure Grammar (HPSG) is close to dependency in spirit (Pollard and Sag, 1994). It also aims at a functional, even at a semantic orientation, and it is certainly deep-linguistic. But due to its reliance on very complex attribute-value structures, on unification, and on pervasive use of long-distance dependencies, HPSG can suffer from enormous search spaces. Although HPSG systems are now achieving relative robustness (Miyao, Ninomiya, and Tsujii, 2003; Miyao, Ninomiya, and Tsujii, 2005), their inherent complexity is much higher than what is needed to express natural language. Sarkar, Xia, and Joshi (2000) state that the theoretical bound of worst-case time complexity for HPSG parsing is exponential.

Alpino

Alpino (Bouma, van Noord, and Malouf, 2001) is a wide-coverage HPSG grammar for Dutch, trained on a Dutch corpus. It is hand-written, but uses maximum-entropy dependency probabilities (and some heuristics) for the disambiguation task, which are shown to drastically increase the accuracy of the system. The hand-written heuristics, each compared to Pro3Gres, are illustrated in table 4.1. Their statistical model is based on dependency relations. The probability of a parse y given a sentence x, where R is the relation, is modelled as:

P(y \mid x) = \frac{1}{Z(x)} \prod_{\mathit{dependent} \in y} P(R, \mathit{dependent} \mid \mathit{head}) \qquad (4.20)

Z(x) is a normalisation factor; since the probabilities are used for comparing and maximising, and since x is fixed, Z is ignored. As in Collins (1996), dependent and head are tuples. Unlike in Collins (1996) or Pro3Gres, not only the relation R but also the dependent is generated. For parameter estimation, the usual backoff (Collins, 1999) is used, as well as unsupervised learning: the results from parsing a large corpus.
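A sketch of (4.20) in Python, ignoring Z(x) as described; the probability table and its smoothing floor are illustrative assumptions rather than Alpino's implementation:

    def parse_score(dep_probs, parse):
        """Unnormalised P(y|x): the product over all dependents in the
        parse of P(R, dependent | head)."""
        score = 1.0
        for head, rel, dependent in parse:
            score *= dep_probs.get((rel, dependent, head), 1e-9)  # hypothetical floor
        return score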

Alpino: Complementation is preferred over modification.
Pro3Gres: Left underspecified.

Alpino: Subject topicalization is preferred over object topicalization.
Pro3Gres: Object topicalization is severely restricted, currently to questions only; it is more marked in English than in Dutch.

Alpino: Long-distance dependencies are dispreferred.
Pro3Gres: Relations expressing LDDs get appropriately low probabilities from the training corpus.

Alpino: Certain rules are dispreferred.
Pro3Gres: Similarly, some of the unmodelled rules are given low pseudo-probabilities.

Alpino: Certain lexical entries are dispreferred.
Pro3Gres: Relies on tagger output.

Alpino: Certain guesses for unknown words are preferred over others.
Pro3Gres: Relies on tagger output.

Table 4.1: Alpino 2001 in comparison to Pro3Gres

Miyao and Tsujii et al.

Miyao, Ninomiya, and Tsujii (2003) present a probabilistically consistent model for predicate-argument structures, i.e. deep-linguistic structures as they are output by formal grammars like HPSG, LFG or Functional DG. It is discussed that the probability of a complete sentence can be calculated in a tractable parse forest, while a naive tree-enumeration computation is intractable. It is then shown that predicate-argument structures can be represented as parse forests. The parsing experiments conducted on Penn Treebank section 23 reveal that the employed HPSG parser is only relatively robust, as it fails to output structures for almost 20% of the sentences, and sentences longer than 40 words had to be cut. Miyao, Ninomiya, and Tsujii (2005) improve robustness to above 95% of Penn Treebank section 23, but again with limitations: first, sentences longer than 40 words still needed to be excluded; second, the predicate-argument information – the HPSG SEM feature – had to be neglected: "predicate-argument structures (SEM features) cause exponential explosion in the search space. The SEM feature was thus ignored in the parsing experiments." Using HPSG without its SEM feature leaves many of the aspects for which HPSG has been designed unexpressed.

Miyao and Tsujii (2004) use an HPSG system for the task of PropBank predicate-argument discovery, including predicate-argument relations that involve long-distance dependencies. Their completely rule-based approach outperforms most machine-learning approaches on the task. This confirms the intuition voiced by Campbell (2004) that linguistic knowledge is a good alternative to advanced machine-learning approaches for the task of detecting and resolving long-distance dependencies.

4.6.3 Combinatory Categorial Grammar (CCG)

Combinatory Categorial Grammar (CCG) has been successfully applied to probabilistic large-scale parsing in Hockenmaier (2003). Performance is very competitive, and with its combination of a relatively simple grammar formalism with a proper treatment of long-distance dependencies, it is a close relative of our approach. Differences are, however, that CCG is constituency-oriented and that search spaces are enormous; as a practical solution, super-tagging is used. Like TAG, CCG is a mildly context-sensitive grammar. We discuss TAG and its relation to our approach in chapter 6.

Constituency orientation of CCG and search-spaces CCG and categorial grammars (CG) generally are, like DG, valency-oriented lexicalized grammars. The category assigned to a word encodes its valency requirements. For example, the transitive verb love has the category ((S\NP)/NP), which means that if its category requirements for an NP to the right (/NP, typically the object in DG) and then an NP to the left (\NP, typically the subject in DG) are satisfied, then we have an S node. But while DG is typically functionally oriented, taking nouns, gerunds and subordinated sentences alike as potential objects, categorial grammars are constituency-oriented. While (at least our) DG does not express valency requirements but licenses attachments, which means that unsatisfied valencies never pose problems, CGs even need to express adjunction as an artificial valency (e.g. ((S\NP)/NP)\((S\NP)/NP) is an adverb of a transitive verb). Also, there is no unique class for all verbs (intransitive, transitive, (di)transitive with PP, (di)transitive with subordinated S, etc.); all of them are treated separately. In practice, this means that the number of possible categories per word is large: on Treebank section 00, each word token has on average 22 categories. Inflated word-category ambiguity in turn inflates parsing search spaces, to several orders of magnitude more than in other formal grammars.
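As a standard categorial-grammar illustration of how such a category combines, using forward (>) and backward (<) application:

    \mathit{love} \vdash (S\backslash NP)/NP \quad \mathit{Mary} \vdash NP
    \;\Rightarrow_{>}\; \mathit{love\ Mary} \vdash S\backslash NP

    \mathit{they} \vdash NP \quad \mathit{love\ Mary} \vdash S\backslash NP
    \;\Rightarrow_{<}\; \mathit{they\ love\ Mary} \vdash S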

Super-Tagging As a practical solution, a super-tagging approach has been implemented (Curran and Clark, 2004). Super-tagging has been described as "almost parsing" (Bangalore and Joshi, 1999) because a partial parse tree is assigned to each word. Originally applied to TAG, super-tagging is highly suitable for CG, where each word category expresses its valency requirements in the form of sister and parent node requirements, very similar to TAG. Curran and Clark (2004) have shown that aggressive super-tagging strategies drastically reduce search spaces and turn CCG systems into probably the fastest current parsers (Clark and Curran, 2004). The super-tagger initially assigns a small number of CCG categories to each word, and the parser only requests more categories from the super-tagger if it cannot provide an analysis. Super-tagging is related to the carefully designed non-local restrictions that we place in our hand-written grammar: super-tags express partial trees and therefore restrict, or even predetermine, the parsing process to a large degree. We will show in an evaluation in section 7.8 that constraints have an even bigger impact on parsing time than pruning.

4.6.4 Tree-Adjoining Grammar (TAG)

The grammar formalism of Tree-Adjoining Grammar (TAG) and its relation to Functional Dependency Grammar are discussed in detail in chapter 6. TAG belongs to the class of mildly context-sensitive grammars. This class is described as being sufficiently expressive for all linguistic phenomena in natural language (Frank, 2002; Frank, 2004). As context-sensitivity is restricted, worst-case parsing complexity for TAG is O(n^7) or O(n^8), depending on the implementation (Eisner, 2000). We discuss in chapter 6 that Functional Dependency Grammar inherently expresses TAG's extended domain of locality (Carroll et al., 1999; Frank, 2002; Sarkar and Joshi, 2003). Only a minimal extension to CYK-based CFG parsing, which has complexity O(n^3), is needed for parsing a Functional Dependency Grammar that is akin to TAG: adjoining to the main verb.

4.7 Shallow Parsing, Finite-state Cascading

Shallow parsing is a popular, robust and very fast alternative to full parsing, whether full parsing is done by a probabilistic approach, which is typically fast but fails to express long-distance dependencies, or by a formal grammar, which typically expresses long-distance dependencies but is often slower and less robust. Typically, the text to be analysed is run through a sequence of finite-state machines, which build up a partial structure in a bottom-up fashion. Each finite-state machine corresponds to a level of syntactic processing: tagging for POS disambiguation, chunking for base-phrase recognition (in some systems followed by a PP-chunker for PPs), and a verbal (and sometimes nominal) attacher for the phrase level. Super-phrasal attaching is rarely done, and long-distance dependencies are usually neglected. Because of the sequential processing, each finite-state transducer taking as input the output of the previous transducer, they are often called cascaded finite-state transducers.

While this approach is highly promising and reliable for the low-level cascades, tagging and chunking, the performance for high-level chunking, at the phrase level, drops off. Briscoe and Carroll (2002) take a critical viewpoint. They state that shallow parsing output is neither as complete nor as accurate as that of state-of-the-art statistical parsers, and that it is unlikely that shallow parsers will achieve the same level. A major problem for the development of accurate shallow parsers is that heuristics like longest match interact in complex ways with the large number of manually coded rules required in a wide-coverage system. This makes effective development of additional rules increasingly difficult; such systems are thus difficult to scale up. A second problem is the pipeline approach, which requires that the output from each phase of processing is deterministic; many decisions thus need to be taken too early in the processing chain, favouring local maxima. A third problem is that many such systems achieve much of their domain independence by basing rules as much as possible on part-of-speech (PoS) tags, rather than on specific lexical items, in order to limit the number of rules required. They can therefore not profit from the increased performance that lexicalisation offers (Collins, 1999).

4.7.1 Tag-Based Chunking and Partial Parsing Grammars

Abney (1991) and Abney (1996) describe a well-known tag-based chunking system. While the application of a finite-state chunker to the entire parsing process is controversial, Abney (1991) points out that such chunkers are very fit tools for low-level tasks.

First, one of the most difficult problems for context-free parsing techniques is attachment ambiguities. But within chunks, (syntactic) attachment ambiguities do not arise, and simple context-free parsing techniques are very effective. By having separate chunker and attacher, we can limit the use of expensive techniques for dealing with attachment ambiguities to the parts of the grammar where they are really necessary – i.e., in the attacher.

Another motivation is modularity. Since the chunker is insensitive to

the state of the attacher, we can develop and debug it separately from the attacher. The chunker also simplifies the task the attacher faces: many lexical ambiguities can be resolved within chunks, relieving the attacher of that task, and there is less clutter to deal with at the level of words.

A related motivation is that the chunker-attacher division keeps attachment ambiguities from being multiplied with chunk ambiguities. The chunker evaluates chunks as well as it can on its own, instead of taking decisions relative to one or another branch of the attacher's non-deterministic computation. (Abney, 1991, 17)

Abney (1991) is a full parser, unlexicalized, disambiguating by means of subcategorization frames and built-in preferences for argument attachment, verb attachment and low attachment. Abney (1991)'s scientific merit is now mainly seen in its introduction of the notion of chunk and of the division of labour between the chunker and the attacher – a division that is still valid today and used by many parsing systems, e.g. Collins (1996), Daelemans, Buchholz, and Veenstra (1999), and also by Pro3Gres.

Abney (1995) suggests a model that mixes chunking and dependency grammar. Based on the observation that finite-state chunking approaches are unsuitable for analysing ambiguous structures such as PP-attachment, it is suggested to step back from a full finite-state parsing approach and to use chunking only for unambiguous sub-structures. In addition to the standard structural definition of chunks as unnested NPs (so-called base-NPs) and verb groups, Abney gives chunks a functional, pragmatic definition: "We can define chunks as the parse tree fragments that are left intact after we have unattached problematic elements" (Abney, 1995). As to the definition of "problematic" cases, Abney argues that they are post-head sisters, i.e. arguments, modifiers and conjuncts that follow their head, plus pre-head elements of the S constituent only.1 Due to these findings, the structural definition and Abney's pragmatic definition largely coincide.2

1The constraint that only the S constituent can contain pre-head ambiguities may be true for English, but closely related languages, e.g. German, also allow noun or even adjective pre-head ambiguities, for example der auf der Bank beim Brunnen/beim Zeitungslesen sitzende Mann, where the PP is ambiguous between attaching post-head to the noun Bank or pre-head to the adjective sitzend. This entails that chunking approaches, and constituency approaches in general, are considerably more problematic for languages like German.
2As a notable exception we can mention NPs like some of the people, which is assumed to be one chunk with people as head, while syntactically we have two base-NPs, the superordinate one with some as head. This is one of the major sources of the Pro3Gres parsing errors that appear under the grammar assumption label in section 7.2.2.

After thus "unattaching" all ambiguous cases, Abney goes on to suggest re-attaching them by a thematic, valency-based, grammar-role centred approach.

... essential information is clearly lost by "unattaching" chunks. Fortunately, we can re-introduce the deleted information, without losing the phrase boundaries we require to account for processing facts, by including the severed attachments as a relation distinct from immediate constituency. Since post-head sisters are canonically licensed by θ-role assignment, it is natural to reintroduce the severed attachments as relations between post-head sisters and their governors, rather than their immediate dominators. Such a move would lead us to what is essentially a mixed immediate-constituency/dependency structure, in which dependency relations contribute to semantic interpretation and syntactic constraints involving binding and movement ... (Abney, 1995, 6)

This is the syntactic model adopted for Pro3Gres.

4.7.2 Grefenstette, Brants

There are a number of other systems based on cascaded finite-state transducers, for example Grefenstette (1996). After standard tagging, a chunker is applied that differs from traditional approaches by including verb arguments and modifiers in the verb group, and noun arguments, modifiers and prepositions in the noun group. A second transducer extracts the heads of the chunks, distinguishing between PPs and NPs for the noun chunks, and between active, passive and copular verbs for the verb chunks. Brants (1999) is a successful cascaded finite-state transducer system that has been tested on English and German. It is claimed to be psycholinguistically adequate.

4.8 Memory-Based Grammatical Relation Finding

Daelemans, Buchholz, and Veenstra (1999) and Buchholz (2002) are memory-based cascading approaches extending the finite-state automaton idea. While Daelemans, Buchholz, and Veenstra (1999) deals with subjects and objects only,

Buchholz (2002) extends to all verbal arguments. Memory-based Learning (MBL) is a similarity-based supervised learning approach in which a memory-based learning algorithm builds a classifier by storing a large set of examples. From each training example, a vector of manually defined features, including the target feature, is extracted. When presented with a new feature vector at application time, the classifier assigns a class to it, based on the most similar feature vectors in memory. The similarity measure to use can be automatically or manually adapted to the classification task.

A large number of distance metrics have been suggested in the literature. On numerical values, the dot product, which is equivalent to the cosine measure of two vectors since normalised vectors are used, is the most obvious metric. On symbolic features, the overlap-based edit distance is the simplest. The MBL software (Daelemans, Buchholz, and Veenstra, 1999) also uses many advanced algorithms, based on information gain, χ-square goodness-of-fit tests, and Kullback-Leibler divergence, in order to weigh features according to their importance, and makes use of decision trees to abstract generalisations and reduce the data load.
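The core of the memory-based method can be sketched in a few lines; this minimal version uses a plain symbolic overlap metric and 1-nearest-neighbour classification, without the feature weighting and decision-tree optimisations mentioned above (all names are illustrative):

    def overlap_distance(u, v):
        """Number of feature positions on which two vectors disagree."""
        return sum(a != b for a, b in zip(u, v))

    def classify(memory, x):
        """Assign x the class of the most similar stored example (1-NN);
        memory is a list of (feature_vector, label) pairs."""
        _, label = min(memory, key=lambda ex: overlap_distance(ex[0], x))
        return label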

4.8.1 Daelemans et al. 1999

After standard tagging and chunking, a feature vector is extracted from each training instance. The following features are used: the distance from the verb to the argument head (measured in chunks); the number of base VPs between the verb and the argument head (maximally 1 allowed); the number of commas between the verb and the argument head; the verb; the part-of-speech tag; the first left argument-head context word; the first left argument-head context tag; the second left argument-head context word; the second left argument-head context tag; the first right argument-head context word; and the first right argument-head context tag.

As a realistic baseline for the combined subject and object assignment f-measure, 66% is given: "[u]sing the simple heuristic of classifying each (pro)noun directly in front of resp. after the verb as S[ubject] resp. O[bject] yields a ... baseline of 66%" (Daelemans, Buchholz, and Veenstra, 1999). The best MBL algorithm used for this task improves just above 10% over this baseline, to 76.2%. While the improvement over the baseline is impressive, the baseline seems surprisingly low. Our own baseline experiments for unlexicalized parsing with Pro3Gres (see chapter 7), admittedly done on a different corpus (Carroll, Minnen, and Briscoe, 1999), show a baseline of 85% combined precision and 75% combined recall. Shallow parsing approaches probably miss something very important: the parsing context.

Let us consider the simple example sentence the report issued by the commission in 1986 has shown that ...: while the parsing context disambiguates the local ambiguity of report as a subject (unless no parse spanning the very short distance from report to shown can be found), in the case of shallow parsing the disambiguation has to rely on lexical and tagging context information, which is much less reliable than the parsing context. Although parsing is highly ambiguous, only a small minority of locally possible parses manage to combine into globally possible parses – a piece of information that shallow parsing misses. We will give an indication of the extent to which this can affect performance in section 7.7.2.

4.8.2 Buchholz 2002

Buchholz' work starts from Daelemans, Buchholz, and Veenstra (1999) and carefully extends the system, first by optimising the MBL algorithm parameters, then by testing and adding new features to the vector, and finally by including long-distance dependency features. We will not discuss the MBL parameters.

New features that are tested are, first, a bigger observation window, and, second, the Penn Treebank functional labels. It is shown experimentally that the argument-head-centred observation window performs better if it is larger to the left than to the right, which is to be expected from Abney's observation that all non-S ambiguities are post-head sisters. It is found that the observation window to the right need not even include any lexical information, whereas the best-performing observation window to the left includes chunk type and lexical information for the two previous chunks.

In order to approximate the missing parsing context – the derivation history in history-based approaches – Buchholz (2002) introduces features for elements before the verb (front material), for elements between the verb and the argument/adjunct (intervening material) – the chunks that can or cannot be attached to the verb, which leads to the parsing-based disambiguation – and for the part of the sentence following the argument/adjunct (back material) – a sort of parser look-ahead. Buchholz (2002) finds that the intervening-material feature does indeed make a significant difference, correcting an important shortcoming of Daelemans, Buchholz, and Veenstra (1999)'s approach. Unlike most statistical parsers, Buchholz (2002) next extends her features to deal with long-distance dependencies. Structure-shared arguments are given a complex dependency label composed of the two labels of the gap and the filler position. For example, a passive subject is assigned a label. After these linguistic extensions, Buchholz (2002) is a competitive system, according to Preiss (2003) possibly better than state-of-the-art statistical parsers.

4.9 Conclusions

We have summarised the state of the art and shown in which ways Pro3Gres is related to current approaches. We suggest a parsing architecture that combines the advantages of formal grammars and of probabilistic context-free parsers.

Kaplan et al. (2004a) compare the speed and accuracy of a successful probabilistic context-free parser (Collins, 1999) to a robust LFG system based on Riezler et al. (2002). They show that the gap between probabilistic context-free parsing and deep-linguistic full LFG parsing can be closed. On a random test set of 560 sentences from the Penn Treebank (4/5 of the PARC700 corpus3), their full LFG grammar gives an overall improvement in f-score of 5% over Collins (1999) at a parsing-time cost factor of 5. They also show that a limited LFG grammar (the so-called core system) still achieves a considerably higher f-score at a parsing-time cost factor of only 1.5: about 200 seconds for Collins (1999) and about 300 seconds for the LFG core system. A conclusion that can be drawn from their results is that research in simplifying, restricting and limiting formal grammar expressiveness is bridging the gap between probabilistic parsing and formal grammar-based parsing, between shallow parsing and full parsing.

The resulting system that we have implemented is hybrid at many levels. It carefully combines successful elements from a variety of approaches, while avoiding elements that are either too unreliable or too complex. The philosophy is to stay as shallow as possible while obtaining deep-linguistic analyses and using a functionally oriented, linguistically highly motivated dependency representation. We aim to obtain complete analyses for the majority of real-world sentences, and meaningful partial analyses for the remaining cases.

Our approach reduces the vast majority of long-distance dependencies to a more shallow, less complex task by expressing them in a context-free way, thus on the one hand offering a parsing complexity as low as for a context-free probabilistic parser, and on the other hand delivering a deep-syntactic analysis as with a formal grammar. We have followed Kuhlmann and Nivre (2006)'s finding that the mild context-sensitivity which TAG expresses is a suitable extension of projectivity. We thus implement a simple dependency version of TAG-based mild context-sensitivity, which we present in chapter 6.

3 www2.parc.com/istl/groups/nltt/fsbank/

Chapter 5

Grammar Engineering

Pro3Gres is a Formal Grammar parser in many senses: it follows a formalized and established grammar theory, it treats the phenomena for which formal grammars were invented, for example long-distance dependencies, and it analyses the entities defined in Formal Grammars such as LFG: deep-syntactic functions. At the same time, it is a robust parser, and it integrates lexicalized statistics obtained from the Penn Treebank.

One of the aspects that Pro3Gres shares with many systems based on formal grammars is its use of a hand-written grammar, which we explore in this chapter. We describe the design principles and the hand-written grammar in detail. We explain why we have decided to use a hand-written grammar and which governing principles we have followed during development. The individual rule types are then presented and discussed in detail.

5.1 Introduction

Before the success of probabilistic parsers such as Collins (1999) and Charniak (2000), the use of hand-written grammars was commonplace; in formal grammar based systems it still is. Grammar writing and grammar engineering proved to be a feasible, but very labour-intensive and complex task.

Grammar writing is much more difficult than rule writing. The intricate interrelations of the individual rules of a grammar make grammar writing a complex and error-prone process, much like computer programming.


(Friedman, 1989, 254)

The majority of classical probabilistic approaches learn the grammar from the corpus, obviating the need for a cumbersome hand-written grammar, but the amount of manual work for annotating a large corpus is considerable. Recently, hand-written formal grammars have been combined with statistical data (Riezler et al., 2002), or formal grammars are learnt from syntactically annotated corpora (Hockenmaier and Steedman, 2002; Burke et al., 2004) (see chapter 4). Miyao, Ninomiya, and Tsujii (2005) develop an interesting semi-automatic grammar acquisition algorithm. They state the traditional wisdom on hand-written grammars as follows: "Although a few studies could apply a hand-crafted grammar to a real-world corpus (Riezler et al., 2002), these required considerable effort that lasted over a decade." (Miyao, Ninomiya, and Tsujii, 2005, 684). We follow the hand-written option and combine a hand-written rule-based grammar with lexicalized statistical data obtained from the Penn Treebank.

We have found that the amount of work needed to write a broad-coverage grammar manually is manageable, as we describe in subsection 5.1.2. The grammar, once written, can be ported without or with only small changes across most domains. What changes between domains is terminology and lexicalization probabilities. We discuss in chapter 7 that the former has a large, but the latter only a small impact. Before describing the grammar, we need to introduce the tagset on which it is based, the Penn Treebank tagset. We do so in subsection 5.1.1.

Writing a grammar manually can have benefits. For example, sentence types that are underrepresented in the Penn Treebank training corpus, notably questions, are difficult to learn. Pro3Gres has been employed for question parsing at a TREC conference (Burger and Bayer, 2005).

5.1.1 The Penn Treebank

The Penn Treebank is a large collection of syntactically annotated sentences (Marcus, Santorini, and Marcinkiewicz, 1993b; Bies et al., 1995). It is annotated with morphosyntactic part-of-speech information and with syntactic constituency information. The tagset for the part-of-speech annotation is very small: it only uses 36 tags. They are summarised in table 5.1. Notable idiosyncrasies of this tagset are that the word to is never disambiguated (it can e.g. be a preposition or an infinitive marker), and that no distinction is made between complementizer and preposition. The latter requires disambiguation for parsing.

Count  Tag    Legend
1.     CC     Coordinating conjunction
2.     CD     Cardinal number (N.B. ordinal numbers are adjectives)
3.     DT     Determiner
4.     EX     Existential there
5.     FW     Foreign word
6.     IN     Preposition or subordinating conjunction
7.     JJ     Adjective
8.     JJR    Adjective, comparative
9.     JJS    Adjective, superlative
10.    LS     List item marker
11.    MD     Modal verb
12.    NN     Noun, singular or mass
13.    NNS    Noun, plural
14.    NNP    Proper noun, singular
15.    NNPS   Proper noun, plural
16.    PDT    Predeterminer (such a good time, both the girls)
17.    POS    Possessive ending (John 's idea)
18.    PRP    Personal pronoun
19.    PRP$   Possessive pronoun
20.    RB     Adverb
21.    RBR    Adverb, comparative
22.    RBS    Adverb, superlative
23.    RP     Verbal particle (give up)
24.    SYM    Symbol
25.    TO     to
26.    UH     Interjection
27.    VB     Verb, base form
28.    VBD    Verb, past tense
29.    VBG    Verb, gerund or present participle
30.    VBN    Verb, past participle
31.    VBP    Verb, non-3rd person singular present
32.    VBZ    Verb, 3rd person singular present
33.    WDT    Wh-determiner (which, relative pronouns)
34.    WP     Wh-pronoun (what, who, whom)
35.    WP$    Possessive wh-pronoun (whose)
36.    WRB    Wh-adverb (how, where, why)

Table 5.1: The Penn Treebank Tagset

Label  Legend                       Example or Explanation

Grammatical Functions
-CLF   true cleft                   [S-CLF it was Casey who ...]
-NOM   non-NP in NP function        heard of [S-NOM asbestos being dangerous]
-ADV   clausal and NP adverbial     reaches 10,000 barrels [NP-ADV a day]
-LGS   logical subject in passive   done by [NP-LGS the president]
-PRD   non-VP predicate             is [NP-PRD a producer]
-SBJ   surface subject              [NP-SBJ Peter] walks.
-TPC   topicalised constituent      [S-TPC-1 I agree], he said [SBAR [S-1]]
-CLR   closely related              "open class of other cases"

Semantic Roles
-VOC   vocative                     Close the door, [NP-VOC John]!
-DIR   direction & trajectory       attention [PP-DIR to the problem]
-LOC   location                     declines [PP-LOC in interest rates]
-MNR   manner                       happy [PP-MNR like a kid]
-PRP   purpose and reason           [PP-PRP (in order) to ...]
-TMP   temporal phrase              shares rose [NP-TMP yesterday]

Table 5.2: Penn Treebank Functional Labels

The syntactic annotation is as theory-independent as possible. This has led to the following design decisions. No X-bar theory is followed; no intermediate categories are annotated. The functional GB categories CP, IP and DP are not used. Auxiliary–main verb relations are expressed by VP reduplication. The top node of a sentence is S or a derivative of S (such as SQ for yes/no questions, or SBARQ for wh-questions), not a verbal projection as in HPSG, LFG, or DG. Functional roles known from LFG and functional DG are only partially annotated: subjects are annotated, objects are not, and some PPs are functionally or semantically annotated. Functional and semantic tags are summarised in table 5.2. Structurally, all PPs modifying verbs are attached under the VP, thus appearing as arguments, while all PPs modifying nouns are Chomsky-adjoined. The functional annotations also do not always deliver argument or adjunct status in a consistent way: for example, the functional label LOC is used both for location arguments (e.g. sit on a bench) and for adjuncts (e.g. a rise in interest rates).

The Penn Treebank makes careful use of empty categories and of co-indexation expressing long-distance dependencies. We discuss empty categories and co-indexation in detail in chapter 6.

5.1.2 The Difficulty of Writing Grammars

While automatically or semi-automatically acquiring grammars is a very promising approach, we maintain that the expense of writing a large-scale dependency grammar over Penn tags is sufficiently small and that such grammars are quite domain-independent. The broad-coverage English Functional Dependency grammar described in this chapter was developed in about a person-month. Despite the apparent simplicity of Dependency Grammar, we show in chapter 6 that Functional Dependency Grammar is a formal deep-linguistic grammar that does not make linguistic compromises.

The Penn tagset consists of 36 tags (plus some punctuation tags). Considering that dependency rules are always binary, that there are about 20 rule types, and that there are 2 possible dependency directions, the upper bound – both on rule writing and on parsing search-space complexity – is about 50,000 rules (the arithmetic is spelled out below). In a realistic scenario, assuming on average one direction and relation type per possible tag combination (36^2), we get about 1000 rules. We only need grammar rules containing lexical information for closed classes. For example, a small class of gerunds, such as including and excluding, can serve as prepositions, and a closed list of temporal expressions serving as adjuncts is used. For all open word classes, the Penn tags provide enough generalisation to deliver syntactically correct analyses in the vast majority of cases. The task of the lexicalized disambiguation is to select the semantically most convincing among all syntactically possible analyses.

The current Pro3Gres grammar has about 1200 rules and is 140 KBytes in size. The number of rules may seem high because tag combinatorics lead to many almost identical rules: a subject relation is, for example, possible between the 6 verb tags and the 4 noun tags. The grammar has been written from scratch; writing and debugging it took about a person-month. Although it is probably not entirely error-free, error reports have become increasingly rare.

The supplied Pro3Gres grammar does not aim at covering all phenomena of the English language. The decision which phenomena to exclude depends on "armchair linguistics" intuition, followed by many test cycles to supplement the necessary empirical evidence. This decision also depends on the amount of ambiguity and errors that rare rules introduce. The perspicuous rules of a hand-written dependency grammar build up the possible syntactic structures, which are ranked and pruned by calculating lexical attachment probabilities for the majority of the dependency relations used in the grammar (see chapter 2). The grammar rules contain the dependent's and the head's tag, the direction of the dependency, lexical information for closed-class words, and context restrictions. The rules are described in detail in section 5.3.
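The rule-count arithmetic above, spelled out:

    \underbrace{36 \times 36}_{\text{tag pairs}} \times \underbrace{20}_{\text{rule types}} \times \underbrace{2}_{\text{directions}} = 51{,}840 \approx 50{,}000 \quad\text{(upper bound)}

    36 \times 36 = 1{,}296 \approx 1{,}000 \quad\text{(one relation and direction per tag pair)}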

The task of grammar writing is further simplified because we follow the dependency-based broad architecture suggested by Abney (1995), which uses chunking for the base-phrase level, thus making it unnecessary to write base-NP grammars. This approach naturally integrates chunking and dependency parsing and has proven to be practical, fast and robust (Collins, 1996; Basili and Zanzotto, 2002). Tagging and chunking are robust, finite-state approaches; parsing only occurs between heads of chunks.

The most time-consuming aspect of manual grammar writing is the practice of incremental engineering across a large number of test cycles. Incremental engineering is a classical, well-established principle of bottom-up development. We first consulted Quirk et al. (1985), a comprehensive and very carefully written grammar of English. A small set of simple rules is tested first; the missing rules are then added in a large number of development cycles. There is constant quality feedback from the development corpus at each development cycle. Rules that are found to create more errors than correct analyses are refined or eventually discarded. We have trained over Penn Treebank sections 2-22 and used section 0 as the development corpus. Evaluations (see chapter 7) have been made on John Carroll's 500-sentence test corpus, and on 100 random sentences from the GENIA corpus.

5.1.3 Benefits of a Hand-Written Grammar

Writing a grammar manually is more time-consuming than applying a machine learning algorithm to learn the rules from a corpus. Still, writing a grammar manually has a number of potential benefits.

Parsing Questions Sentence types that are underrepresented in a training corpus are difficult to learn. In the Penn Treebank, this is the case for questions, which are crucial for Question Answering applications. Hermjakob (2001) shows that question parsing improves considerably if a grammar is enriched with additional question parsing knowledge. In chapter 6 we show that Pro3Gres is highly capable of parsing both simple and complex questions. Pro3Gres has been employed for question parsing at a TREC conference (Burger and Bayer, 2005).

Correcting Tagging Errors In a hand-written grammar, some typical parsing errors can be corrected by the grammar engineer, or rules can explicitly ignore particularly error-prone distinctions. Examples of rules that can correct tagging errors without introducing many new errors are allowing VBD to act as a participle, or the possible translation of VBG to an adjective. Most taggers perform poorly on the distinction between verb past tense VBD and participle VBN, but the distinction can usually be made in the parsing process. Therefore, the grammar leaves this tag distinction underspecified for a number of constructions, for example the modpart relation. A second example of ignoring error-prone distinctions is the distinction between prepositions and verbal particles, which is known to be particularly unreliable – and for which human inter-annotator agreement is quite low. The grammar has therefore been designed to make no distinction between verbal particles and prepositions.

The Power of Linguistic Constraints Linguistic knowledge allows us to place strong non-local restrictions on the co-occurrence of different relation types. Verbs that have attached adjuncts cannot attach complements, since this would violate X-bar constraints. Verbs that have no object cannot attach secondary objects. The application of dependency rules can often be lexically restricted: for example, only temporal expressions occur as NP adjuncts. We have noticed during development that these restrictions play a crucial role in improving the parser's performance. We will assess the large impact of these constraints on speed and performance in section 7.8. History-based parsers that learn a sufficiently large number of subtree generations, such as in data-oriented parsing (Bod, Scha, and Sima'an, 2003), inherently learn these linguistic constraints, while we express them concisely in the grammar rules. The fact that the majority of these non-local constraints become local in DG makes it easy to express them in a DG grammar, and reduces search spaces.

Almost Complete Grammar A practical system, such as the one devised in this work, can choose to rule out linguistic constructions that are possible but very marked and rare, and that often introduce more errors than they improve coverage. For example, while it is generally possible for nouns to be modified by more than one PP, only nouns seen in the Treebank with several PPs are allowed to have several PPs in the best-performing grammars used in Pro3Gres. Or, while it is generally possible for a subject to occur to the immediate right of a verb (said she), this is only allowed for verbs seen with a subject to the right in the training corpus, typically verbs of utterance, and only in a comma-delimited or sentence-final context.

Grammar Teaching and Learning A hand-written grammar offers the user the possibility to inspect, edit and experiment with the grammar. Although this possibility does not give any scientific advantage, it is useful for learning and teaching purposes. On a personal level, manually writing and testing a grammar has considerably improved our knowledge of English grammar.

5.2 Subcategorisation and Lexicalisation

Collins (1999, 7) identifies subcategorization and lexical restrictions as the major problems for hand-written grammars. We model lexical restrictions by means of lexicalized statistical data. As for subcategorization, which is a robustness problem for hand-written grammars if subcategorized elements are absent, Collins (1999) uses a subcategorisation probability model in his Model 2, which we introduced in chapter 4. We have decided to use a complement/adjunct distinction for NPs, and also to distinguish between different types of subcategorisation. We distinguish them by using a dedicated dependency type for each subcategorisation type (subject, object, secondary object, subordinate clause), and by grammar rules which only allow one dependency per subcategorised dependency type. The attachment probabilities are probabilities for attaching a dependent as a specific subcategorisation type: subcategorisation frame selection and the removal of found constituents coincide. The only way in which a verb that has its valencies filled can continue attaching is to attach adjuncts, which becomes increasingly unlikely due to the distance measure. The task of the lexicalized disambiguation is to select the semantically most convincing among all syntactically possible analyses. During the development of the grammar, we have observed that the amount of ambiguity a rule creates, its scope of application and thus the number of semantically absurd readings that are possible for a sentence, is beyond imagination. The fact that humans can reliably make precision judgements (deciding whether an analysis or an utterance is correct, or finding one example) but are unreliable at recall judgements (finding all analyses for a sentence or all utterances of a certain type) has been a major motivation for corpus linguistics, where rational decisions are supported by vast amounts of empirical data. Everybody who has developed and tested grammars knows how unreliable human capabilities are when it comes to predicting syntactically well-formed but semantically nonsensical readings of everyday sentences. The “intricate interrelations” that Friedman (1989) identifies have turned into every grammar engineer's nightmare. Here, a statistical disambiguator which ranks all syntactically possible readings is vital. Since Charniak (1996) and Collins (1999), it has been commonplace that a very powerful method to disambiguate between all syntactically well-formed analyses of a given sentence is to respect lexical preferences learnt from annotated corpora. Klein and Manning (2003) partly revise this commonplace by showing that a rich set of unlexicalized, i.e. structural, features is sufficiently complementary to lexicalisation that the performance of a parser using them can almost equal the best lexicalized systems. As a first step, all tags are subdivided by adding the parent category to each tag occurrence. This takes a tag's context into consideration and was the step that increased performance most. As a second step, closed class words are subdivided into linguistically meaningful subclasses, for example expressing the distinction between the tag IN as complementizer or preposition. As a third step, selected functional annotation from the Penn Treebank, for example -TMP, is preserved. As a fourth step, head annotation is added to the constituents. In distinction to lexicalisation, where the head word is added, the head tag, or aspects of it, is added.
Aspects that Klein and Manning (2003) noted as crucial are the distinction between finite and infinite verbs, and the distinction between possessive and other NPs. Applying all of these steps leads to a model whose performance almost equals that of the best lexicalized systems. Klein and Manning (2003) stress that it is not their goal to argue against lexicalisation, but to show that carefully used structural features can lead to a very high unlexicalized baseline. We have chosen to largely opt for lexicalisation.
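A minimal sketch of the first of these tag-splitting steps, parent annotation, is given below. The tuple-based tree encoding and the function name are ours, purely for illustration:

```python
# Minimal sketch of parent annotation, the first tag-splitting step above.
# Trees are (label, children...) tuples; leaves are plain word strings.
def annotate_parent(tree, parent="ROOT"):
    if isinstance(tree, str):          # a leaf (word): leave unchanged
        return tree
    label, *children = tree
    split_label = f"{label}^{parent}"  # e.g. NP becomes NP^S under an S node
    return (split_label,
            *[annotate_parent(child, parent=label) for child in children])

tree = ("S", ("NP", "he"), ("VP", ("VBZ", "sleeps")))
print(annotate_parent(tree))
# ('S^ROOT', ('NP^S', 'he'), ('VP^S', ('VBZ^VP', 'sleeps')))
```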

5.3 The Rules in Detail

We now describe the rules in detail. Each rule has four arguments and a restriction part. The arguments are the tag of the head, the tag of the dependent, the tag of the projection, and the direction. The tag of the projection is usually identical to the head tag, according to the endocentricity principle which most head-driven formalisms, for example HPSG, X-bar theory or Dependency Grammar, use. There are a few exceptions: for example, a preposition and a noun project into a prepositional phrase, so the projection tag is PP. The direction can be left underspecified, as in an ID/LP grammar. The restriction part allows one to place arbitrary restrictions on the application of a rule. Some rules are lexically restricted. For example, there is a postposition rule forming a PP from a preposition and a preceding noun (i.e. the direction of the dependency is to the right), but this rule is restricted to a closed class of words like ago. Other rules express non-local restrictions on the co-occurrence of different dependency types. For example, a verb that has already attached an adjunct cannot attach a complement, since this would violate X-bar constraints.
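The anatomy of such a rule can be pictured as follows. This is an illustrative re-encoding in Python, not the actual Pro3Gres rule syntax; all field names are our own:

```python
# Illustrative re-encoding of the rule format described above: head tag,
# dependent tag, projection tag, direction, plus a restriction predicate.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DependencyRule:
    relation: str                     # e.g. "subj", "obj", "prep"
    head_tag: str                     # tag of the head, e.g. "IN"
    dep_tag: str                      # tag of the dependent, e.g. "NN"
    projection_tag: str               # usually equals head_tag (endocentricity)
    direction: Optional[str]          # "left", "right", or None (underspecified)
    restriction: Callable[..., bool]  # arbitrary restriction on rule application

# The postposition rule mentioned above: a PP built from a noun and a
# following preposition, lexically restricted to a closed class like "ago".
postposition = DependencyRule(
    relation="prep",
    head_tag="IN",
    dep_tag="NN",
    projection_tag="PP",              # an exception to endocentricity
    direction="right",
    restriction=lambda head_word: head_word in {"ago", "later", "before"},
)
```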

We distinguish between the following three classes of dependency types:

1. Major types: these are the classical dependency types like subject and object. For each of them, statistical data has been extracted from the Penn Treebank, and an appropriate version of our probability model is used. This class consists of the following dependency types: subj, obj, obj2, adj, sentobj, pobj, modpp, modpart, prep, compl and adjtrans. Examples of these types are found in table 5.3.

2. Minor types: minor types are not probabilistic. They have auxiliary functions, such as building up noun chunks that the chunker has missed. Not all of them can be semantically interpreted. The adverb relation, for example, attaches adverbs to any noun or verb without aiming at a meaningful interpretation. Some relations are in this class because they give rise to very little ambiguity and thus do not warrant a probabilistic treatment: the modrel relation that attaches relative pronouns and the predadj relation that attaches predicative adjectives.

3. Unconventional types: there are a number of types that are each very different from any other type. Conjunctions, a classical DG problem, are split into two binary rules and leave ambiguities underspecified. WH-rules receive a simple treatment as real long-distance dependencies (see chapter 6). Commas have an important function in structuring the sentence and are treated in a special way, as will be discussed in subsection 5.3.3.

5.3.1 Major Types of Grammar Rules

All major types are probabilistic. They comprise the relations given in table 5.3.

Subject

There are 145 subject rules; the high number is due to tag combinations. Any verb tag, corresponding to the regular expression VB[ZPGDN]?, can combine with any noun tag, corresponding to the regular expression NN[P]?[S]? or EX or IN or WP or DT or CD or PRP or RB or VBG. Each rule exists in a version with and without a comma between the verb and the subject. The subject relation is the only relation to have two probability models: one for active verbs and one for passive verbs.
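The tag combinatorics behind these counts can be reproduced mechanically. The sketch below is our own illustration, not parser code; it expands the two regular expressions quoted above over the Penn tagset:

```python
import re

# Expand the tag regular expressions quoted above over the Penn tagset
# to see how many verb-noun tag pairs the subject relation covers.
penn_tags = """CC CD DT EX FW IN JJ JJR JJS LS MD NN NNS NNP NNPS PDT POS
PRP PRP$ RB RBR RBS RP SYM TO UH VB VBD VBG VBN VBP VBZ WDT WP WP$ WRB""".split()

verb_re = re.compile(r"VB[ZPGDN]?$")
noun_re = re.compile(r"(NN[P]?[S]?|EX|IN|WP|DT|CD|PRP|RB|VBG)$")

verb_tags = [t for t in penn_tags if verb_re.match(t)]
noun_tags = [t for t in penn_tags if noun_re.match(t)]
print(len(verb_tags), len(noun_tags))       # 6 verb tags, 12 subject candidates
print(2 * len(verb_tags) * len(noun_tags))  # 144: with the comma variants,
                                            # close to the 145 rules cited
```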

RELATION               LABEL     EXAMPLE
verb–subject           subj      he sleeps
verb–direct object     obj       sees it
verb–second object     obj2      gave (her) kisses
verb–adjunct           adj       ate yesterday
verb–subord. clause    sentobj   saw (they) came
verb–pred. adjective   predadj   is ready
verb–prep. phrase      pobj      slept in bed
noun–prep. phrase      modpp     draft of paper
noun–participle        modpart   report written
noun–preposition       prep      to the house
verb–complementizer    compl     to eat apples
noun–gerund-adjective  adjtrans  developing countries

Table 5.3: The major Pro3Gres dependency types

General restrictions:

• Maximally one subj: the subject candidate is only allowed to attach if the verb does not already have a subject.
• The verb chunk does not consist of to followed by an infinitive: infinitival verbs are assumed not to take subjects. This entails a GPSG-style treatment of control, where the matrix verb takes both a nominal and a clausal object.

Special restrictions:

• Plural nouns: the verb chunk does not contain VBZ. This is a simple method to ensure agreement.
• If the verb tag is VBG: verb chunk length > 1. Present participles (gerunds) are not allowed to take a subject.
• If the noun tag is VBG: noun chunk length = 1, only with a copula verb. Gerunds acting as nouns (e.g. Swimming_VBG is fun) are only allowed to attach to copular verbs (*Swimming_VBG likes fun). The length restriction on gerund chunks is redundant for the current parser.
• Relation to the right: only verbs seen in the training corpus (in assertive sentences) are allowed to have verb-subject inversion (e.g. says she). The restriction to seen material is an intermediate step between a rule- and a probability-based model. It restricts the application of rare rules to attested cases. This measure can reduce search spaces considerably. The chunk distance is restricted to one, which means that no intervening elements are permitted (*says quickly she).
• If the noun tag is EX: no noun conjunction is allowed. Expletives are not allowed to participate in conjunctions (*there and it seems to be a problem).
• Noun tag RB for pronouns: the pronouns this, that, it are tagged RB. Only this closed lexical class is permitted to be subject.

Object

There are 78 object rules; the high number is due to tag combinations. Any verb tag, corresponding to the regular expression VB[ZPGDN]?, can combine with any noun tag, including lonely determiners, numbers, WH-words, pronouns, gerunds and symbols, corresponding to the regular expression NN[P]?[S]? or IN or WP or DT or CD or PRP or VBG or SYM or RB. Verb-object dependencies are not allowed to overstep commas.

General restrictions:

• Maximally one obj: the second, ditransitive object is labeled obj2. This ensures proper subcategorization.
• The verb has no adjunct to the right yet (all objects are closer to the verb than adjuncts to the right). According to X-bar theory, case and θ-role assignment, complements need to be closer than adjuncts. In the CYK derivation history, this means that a verb can only attach complements before it attaches adjuncts. There can be rare and marked violations of X-bar theory in real-world language (?they will ask today John), which we explicitly intend not to cover in our parsing approach.
• The verb has no sentobj (all objects are closer to the verb than subordinate clauses): this assumption is based on the closeness of the object to the verb, and on the observation that considerably longer constituents are usually placed further away than short constituents.
• The verb has no pobj (almost all objects are closer to the verb than verb-PPs). This assumption, based on the same observations, is occasionally flouted (?they prescribed to John this dangerous stuff that had relieved generations of previous death candidates from unbearable pain). While missing some readings, the restriction disambiguates, for instance, she donated to the poor every Sunday to an adjunct reading for every Sunday.
• The verb has no predadj (predicative adjective): verbs cannot have a predicative adjective and an object simultaneously.

Special restriction: noun tag RB for pronouns: The pronouns this, that, it are tagged RB. Only this closed lexical class is permitted to be object.

Assumptions: In order to facilitate parsing, two assumptions are made:

• The predicate of a copular verb is considered to be obj (also in the lexicalized probability model). This does not mean that the complement of a copular verb (e.g. president in Mary became president) is really an object, but it allows a uniform syntactic treatment of verbs. Since copular verbs are a closed and unambiguous class, a simple mapping to e.g. a complement label would always be possible.
• In a ditransitive verb, obj is the indirect object (also in the lexicalized probability model).
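The non-local co-occurrence restrictions above are easy to check on a head's derivation history. The following sketch is our own illustration; the field names are invented, not Pro3Gres internals:

```python
# Sketch of the co-occurrence restrictions on object attachment listed
# above, checked against the dependents a verb has already attached.
def may_attach_obj(verb):
    """True if attaching a (first) object violates none of the
    co-occurrence constraints described in the text."""
    return ("obj" not in verb["deps"]            # maximally one obj
            and "adj" not in verb["right_deps"]  # complements before adjuncts
            and "sentobj" not in verb["deps"]    # objects closer than clauses
            and "pobj" not in verb["deps"]       # objects closer than verb-PPs
            and "predadj" not in verb["deps"])   # no predicative adjective

verb = {"deps": {"subj"}, "right_deps": set()}
print(may_attach_obj(verb))    # True: nothing blocks the attachment
verb["right_deps"].add("adj")  # the verb has already attached an adjunct
print(may_attach_obj(verb))    # False: would violate the X-bar constraint
```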

Object2

There are 24 second object rules. Any verb tag, corresponding to the regular expression VB[ZPGDN]?, can combine with any noun tag, corresponding to the regular expression NN[P]?[S]?. Other obj2 tags are rare but possible; the grammar is thus potentially still incomplete. The tags CD and WP seem possible, DT and PRP seem unlikely, as the following examples illustrate.

(45) She gave him 23_CD.

(46) ? She gave him what_WP ?

(47) ?? She gave him this_DT.

(48) * She gave him it_PRP.

The indirect objects in these examples tend to be expressed rather by means of a to-PP.

General restrictions:

• Maximally one obj2. The second, ditransitive object is labeled obj2. This ensures proper subcategorization.
• The verb already has an object.
• Only verbs seen in the training corpus or from a licensing list: this restricts obj2 to verbs attested as ditransitive.
• No temporal expression noun: temporal expressions are excluded. This can lead to a few errors but reduces the search space.
• The verb has no adjunct to the right yet (all objects are closer to the verb than adjuncts to the right): according to X-bar theory, case and θ-role assignment, complements need to be closer than adjuncts.
• The verb has no sentobj (all objects are closer to the verb than subordinate clauses), similar to the object relation.
• The noun has no modpart: this heuristic constraint is prone to introduce errors, for instance in He gave her the roses bought in the shop, which is rare, but not impossible. It was introduced to correct frequent misanalyses of sentences such as He reported her the roses sold in the shop. In other words, a structural constraint that prefers a superordinate zero-complementizer to a subordinate zero-relative has been implemented. While structural soft constraints are not in the spirit of a statistical system, the very successful constraint grammar (CG) approaches rely on elaborate versions of such constraints. Hybrid combinations of such different approaches are a promising field of research.

Adjunct

There are 50 adjunct rules. Any verb tag, corresponding to the regular expression VB[ZPGDN]?, can combine with any noun tag, corresponding to the regular expression NN[P]?[S]?. Adjunct relations are allowed to overstep commas.

General restrictions:

• Only nouns seen in the training corpus or from a list of temporal expressions. Temporal expressions are a closed class. This step greatly reduces the search space.

• Noun has no modpart relation.

• Verb has no sentobj: this heuristic constraint forces a low attachment in (stocks were said [to rise] Friday).

Special restriction: the direction of the adjunct relation is possible to the left only if the verb has a subject (Friday they said ...).

Sentobj

There are 130 sentential object rules. Any verb or noun tag, corresponding to the regular expression VB[ZPGDN]? or NN[P]?[S]?, can combine with any verb tag, corresponding to the regular expression VB[ZPGDN]?. sentobj relations are allowed to overstep commas.

General restrictions:

• Maximally one sentobj: this ensures proper subcategorization.

• The subordinate clause is generally required to have a subject. This con- straint is however subject to the following special restrictions.

Special restrictions:

• Infinitival subordinate clauses with to are allowed not to have a subject: these are typically control structures, which get their subject at the post-parsing predicate-argument stage.

• Comma-involving sentobj requires the subordinate verb to have a complementizer or the superordinate verb to be a verb of utterance: this helps us to restrict ambiguities arising from zero-complementizers. Zero-complementizers are rare if there is a comma between the matrix and the subordinate clause, except when the matrix verb is a verb of utterance.

(49) She said, the winners have arrived.

(50) ?She believed, the winners have arrived.

• Nouns can have subordinate clauses, but only subjectless ones, and only if the noun was seen in the training corpus. These are mostly relational nouns, e.g. tendency to go. The fact that they need to be attested greatly reduces the search space and eliminates many incorrect analyses.

Pobj

There are 30 verb-PP attachment rules. Any verb or adjective tag, corresponding to the regular expression JJ or VB[ZPGDN]?, can combine with any prepositional phrase, corresponding to the regular expression PP. pobj is allowed to overstep commas. Attachment to the left is allowed if a PP is fronted to the beginning of a sentence.

General restrictions:

• No verb-PP attachment if there is a predicative adjective (*she grew [tired] from too much walking, she grew [tired from too much walking])

• Unlimited number of PPs possible: no distinction between PP-arguments and adjuncts is made; an unlimited number of PP-attachments is thus allowed.

Special restrictions:

• according is disallowed as a verb, because it is analysed as a preposition.

• Adjective-PP attachment is restricted to small distances only.

• The verb be is only allowed to attach PPs if it has no object: this heuristic was introduced to correct a large number of incorrect analyses.

Modpp

There are 10 noun-PP attachment rules. Any noun tag, corresponding to the regular expression NN[P]?[S]?, can combine with any prepositional phrase, corresponding to the regular expression PP. modpp is allowed to overstep commas.

General restrictions:

• The noun has no modrel: this restriction does not allow a full analysis of she has a problem [which annoys her] with computers, but it eliminates attachment ambiguities in she has a problem [which annoys people with computers].
• Only relational nouns are allowed to have more than one PP (relational noun recognition is approximated by means of WordNet). This heuristic is a rather crude approximation. It leads to errors with nouns that have several adjunct PPs. But we have seen empirically on the development corpus that the error rate increases considerably if we eliminate this restriction.

Modpart

There are 30 modification-by-participle rules. Any noun tag, corresponding to the regular expression NN[P]?[S]?, can combine with participles, corresponding to the regular expression VB[NDG]. modpart is allowed to overstep commas. VBD is included because VBD/VBN confusion is a very frequent tagging error; in order to exclude at least the perfect tense, the VBD verb is required to be the first element in the verb chunk (unless an adverb precedes).

General restrictions:

• Maximally one modpart per noun.
• The verb has no subject.
• The noun has no apposition (appos): a heuristic that rules out possible but rare readings (the report, 60 pages, issued yesterday, shows that ...) but considerably increases performance on the development corpus.
• The verb has no object, unless it is an appoint-class verb.
• The noun is not a temporal expression.
• The distance is very short.

Special restrictions:

• If the tag is VBD, then the verb chunk is checked in order to exclude verbs that are not participles.

Prep

There are 72 preposition rules. Any noun tag, including lonely determiners, adjectives, numbers, symbols, or PPs, corresponding to the regular expression NN[P]?[S]? or PP or DT or CD or SYM or JJ[RS]?, can combine with preposition tags and verbal particle tags, corresponding to the regular expression IN or TO or RP. PPs can attach prepositions to treat multi-word prepositions, for example in from under the bed. Some rules need access to lexical items. For example,

• including_VBG is analyzed as a preposition contrary to its verb tag.

• The adverbs ago, later, before are considered to be real postpositions.

• that_IN and because_IN, if allowed as prepositions, lead to many incorrect analyses.

General restrictions:

• The distance needs to be very short. Adjacency is thus enforced.

• that_IN is not a preposition.

• because_IN is not a preposition.

Special restrictions:

• The direction is allowed to the right only for the English postpositions ago, later, before.

• Including_VBG, involving_VBG can be prepositions.

• Because of and according to are multi-word prepositions. The attachment of because to an of-PP is thus allowed, although because is otherwise not a preposition. The present participle according is allowed to attach a to-PP.

Compl

There are 24 complementizer rules. Any verb tag, corresponding to the regular expression VB[ZPGDN]?, can combine with tags and closed class words expressing complementizers, corresponding to the regular expression IN or WRB or whether_CC or but_CC. An example for WRB is the following.

(51) While many of the risks were anticipated when_WRB Minneapolis-based Cray Research first announced the spinoff ...

General restriction:

• The subordinated clause verb needs to have a canonical subject.

Adjtrans

There are 12 adjective translation rules. Any noun tag, corresponding to the regular expression NN[P]?[S]?, can combine with participle and gerund tags, corresponding to the regular expression VB[DNG], as shown in the following example.

(52) Longer maturities are thought to indicate declining_VBG interest rates

Gerunds that can act as verbs or adjectives are a major source of ambiguity. The chunker does not include them in the noun chunk, which means that the parser has to decide. The gerund cannot be an adjective if the noun chunk contains a determiner. The local ambiguity created by gerunds often survives up to the sentence level: He likes developing countries.

Restrictions:

• Short distance. Adjacency is enforced.

• The noun chunk does not contain a determiner.

• No verb of utterance is allowed: in order to avoid conflicts with verb-subject inversion, verbs licensing this inversion cannot act as adjectives.

(53) Share prices will fall, stock brokers kept quoting.

(54) ? Share prices will fall, kept quoting stock brokers.

5.3.2 Minor Types

None of the minor types has a probability model.

Nountrans

“Lonely” adjectives, i.e. adjectives outside noun chunks (for example the same, the poor), can have noun function. There are 3 noun translation rules.

Restriction: Only an adjacent determiner can trigger the translation.

Adv

There are 51 adverb rules. Any verb or noun tag, corresponding to the regular expression VB[ZPGDN]? or NN[P]?[S]?, can combine with any adverb tag, corresponding to the regular expression RB[RS]?. adv is allowed to overstep commas. Proper adverb attachment is often not ensured. There is no probability model; the main purpose of the adverb relation is to attach adverbs somewhere, so that an adverb does not fragment the parse into two parts. This is an area where Pro3Gres can be improved in the future.

Restrictions:

• Short distance

• The preposition about can be analysed as an adverb (about 200 people came).

Modrel

There are 65 modification-by-relative-clause rules. Any noun tag, including WH-words, corresponding to the regular expression NN[P]?[S]? or WP, can combine with any verb tag, corresponding to the regular expression VB[ZPGDN]?. modrel is allowed to overstep commas. No distinction between restrictive and non-restrictive relative clauses is made, although that would be easy to integrate.

General Restrictions:

• The relative clause has a subject

• This subject is a pronoun: the relative pronoun or a personal pronoun

• The relativized noun does not have a modpart

Special restriction: if the subject of the relative clause is a personal pronoun, the relative pronoun (zero-relative object) may be absent. The restriction of zero-relatives to the pronoun case is a pragmatic simplification.

Adjective

This auxiliary relation covers a number of noun-adjective modifications, including an adjective equivalent of modpart. There are 2 rules. Examples for the adjective relation are the following.

(55) ... for the purpose of keeping [the_DT prices_NNS] reasonable_JJ

(56) We have demonstrated that triggering delivers [signals_NN] capable_JJ of activating the NF-AT transcription factor

Predadj

This rule is used to attach predicative adjectives to the verb. There are 18 rules. Any verb tag, corresponding to the regular expression VB[ZPGDN]?, can combine with any adjective tag, corresponding to the regular expression JJ[RS]?.

Restrictions:

• Short distance
• The verb has no object, except if it is a verb of the elect class (consider them incompetent). This is a consequence of our treatment of copular and elect verb complements as objects.

Comp

comp is used to build up comparison constructions involving an adverb RBR or an adjective JJR in the comparative.

Gen, Pos

gen and pos are used to build up the Saxon genitive. The gen relation attaches the ’s_POS or ’_APOSTR marker to the noun, which can then be attached to its head noun. The head noun cannot be a proper noun.

5.3.3 Unconventional Types

This class contains a number of relation types which are each quite different from all other relations.

Conj

Conjunctions have always been a problem for DG. At least three elements, two conjoined elements and the conjunction, need to be combined into a structure. This is difficult in a grammar that knows binary rules only. Also, on a semantic level, it cannot be said that either of the conjoined elements governs the other. In order to treat conjoined and non-conjoined phrases on a par, it is also not desirable to have the conjunction as the governor. As a pragmatic solution, we use binarized conjunction rules. In a first step, a word of a given word class can govern a preceding conjunction. In a second step, that word (it is checked that it governs a conjunction) can be governed by a preceding word of the same part-of-speech tag. As a consequence, the following simplification is made for lexicalization: only the lexical item of the first conjoined element is respected.
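The two binarized steps can be sketched as follows; the dictionary-based token encoding is our own illustration, not the parser's data structure:

```python
# Sketch of the two binarized conjunction steps described above.
def attach_conj(word, conjunction):
    """Step 1: a word governs a preceding conjunction."""
    word["deps"].append(("conj_marker", conjunction))

def coordinate(first, second):
    """Step 2: a word that governs a conjunction is governed by a
    preceding word with the same part-of-speech tag."""
    has_conj = any(rel == "conj_marker" for rel, _ in second["deps"])
    if has_conj and first["tag"] == second["tag"]:
        # For lexicalization, only the first conjunct's lexical item counts.
        first["deps"].append(("conj", second))
        return True
    return False

apples = {"word": "apples", "tag": "NNS", "deps": []}
conj = {"word": "and", "tag": "CC", "deps": []}
pears = {"word": "pears", "tag": "NNS", "deps": []}
attach_conj(pears, conj)          # "and" is governed by "pears"
print(coordinate(apples, pears))  # True: "apples" now governs "pears"
```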

Enumerations: A comma is allowed to be a conjunction if the enumeration is terminated by a conjunction. Enumerations are a case where commas are easily disambiguated.

Apposition and other relations stepping across commas

Commas are generally ambiguous. Three cases are distinguished: enumerations (see above), appositions and boundaries. Appositions are (often) identifiable as starting and ending with a comma; the head of the apposition is a noun or an adjective, and appositions modify nouns. Boundary commas are commas which carry structuring information. They do not alter relation types but mark a separation at a high level in the syntax tree. In a first parsing step, only the apposition and conjunction relations are allowed to span across commas. When parsing finishes, i.e. all possible reductions have been made, all other syntactic relations that can span across commas (e.g. subject, PP-attachment, but not object) are allowed to do so, and parsing continues, finding new reductions that overstep commas. This procedure implements the intuition that commas are a strong boundary.
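Schematically, this two-pass procedure looks as follows; the function names and the reducer hook are invented for illustration:

```python
# Schematic two-pass treatment of commas described above. The actual
# reducer is abstracted away as a callback that parses until no further
# reduction is possible, given the relations licensed to cross commas.
CROSS_COMMA_PASS1 = {"appos", "conj"}
CROSS_COMMA_PASS2 = CROSS_COMMA_PASS1 | {"subj", "pobj", "modpp", "sentobj"}
# "obj" appears in neither set: verb-object dependencies never cross commas.

def two_pass_parse(chunks, parse_until_no_reduction):
    partial = parse_until_no_reduction(chunks, licensed=CROSS_COMMA_PASS1)
    return parse_until_no_reduction(partial, licensed=CROSS_COMMA_PASS2)
```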

Nchunk

nchunk corrects some chunking insufficiencies involving adjtrans, currencies, percent signs, numbers, etc. For example, in the corresponding_VBG solution the chunker does not deliver a chunk. The adjtrans relation attaches the present participle to the noun, then an nchunk relation can attach the determiner. There are 51 rules of this type.

Aux and wh-preparsing

The aux relation is needed to attach auxiliary verbs in questions. Since DG does not distinguish internal and external arguments, the auxiliary verb can attach locally to the matrix verb. WH-relations are the only real long-distance dependencies. Chapter 6 discusses their treatment in detail.

5.4 Conclusion

We hope to have shown that the expense needed for writing a large-scale dependency grammar over Penn tags is sufficiently small. We have motivated our use of a hand-written grammar and discussed the bottom-up design of our grammar involving many incremental development cycles. Then we have discussed the grammar rules in detail, and we have explained the linguistic constraints that we use.

Chapter 6

Extended Locality: Treatment of Long-Distance Dependencies in Functional Dependency Grammar

6.1 Introduction

We have discussed in chapters 1 and 4 that there are broad-coverage probabilistic parsers with good performance (Collins, 1999; Charniak, 2000; Henderson, 2003), but they typically have context-free grammars, which means that they produce pure constituency data as output that includes neither the grammatical function annotation nor the empty node annotation provided in Treebanks such as the Penn Treebank (Marcus, Santorini, and Marcinkiewicz, 1993b; Bies et al., 1995). Very recently, there have been first approaches reporting empty nodes and functional labels in broad-coverage probabilistic parsing (Gabbard, Kulick, and Marcus, 2006). We have also seen that context-sensitive DG parsing is now being used (Nivre, 2006a). Context-free grammars are appealing as they allow a parser to use fast parsing algorithms. From a DG perspective, projectivity is a very powerful constraint which reduces the search space tremendously at a relatively small loss in performance. The loss in performance is due to the fact that context-free grammars cannot express non-local information, so-called long-distance dependencies. We will explore in this chapter how we extract long-distance dependencies from the Penn Treebank, how the majority of long-distance dependencies can be expressed in a context-free way, and how we treat the few remaining long-distance dependencies.


A simple way to find long-distance dependencies and empty nodes is to try to reconstruct them from the output of a context-free syntactic parser. Johnson (2002) presents a pattern-matching algorithm for post-processing the output of a parser that can process the Penn Treebank (Charniak, 2000). The algorithm adds empty nodes to its parse trees. Encouraging results are reported for gold standard parses, but performance drops considerably when using trees produced by the parser. Johnson (2002) notes that if the parser makes a single error anywhere in the tree fragment for which the pattern has been conceived, the pattern fails to match. Errors are frequent, especially as the statistical parsing models are more local than the span that long-distance dependencies have. Johnson (2002) concludes that performance may be improved by integrating parsing, empty node recovery and antecedent finding in a single system. Pro3Gres offers a response to this suggestion by combining a statistical approach with a rule-based approach in Dependency Grammar (DG). Instead of the pipeline system of Johnson (2002), we offer an integrated treatment which uses extended locality and an approach based on the treatment of mild context-sensitivity in Tree-Adjoining Grammar. We extend locality by (1) using and modelling dedicated patterns across several levels of constituency subtrees, partly leading to dedicated but fully local dependency syntactic relations, and by (2) using non-local but bounded syntactic constraints, combining lexicalized statistics and syntactic knowledge. We model mildly context-sensitive phenomena in DG, and our DG approach profits from the fact that some non-local dependencies are artefacts of the grammatical representation. For selected phenomena where post-processing approaches promise good results, we keep a post-processing approach, like Johnson (2002). These are notably control and raising, where the appearance of a control verb in the matrix clause and a subjectless infinitive with to in the subordinate clause trigger a sharing of constituents on the deep-syntactic level. After Johnson (2002), there have been a number of approaches trying to recover empty nodes from the output of broad-coverage probabilistic parsers (Collins, 1999; Charniak, 2000; Henderson, 2003). Some approaches are based on machine learning: for example, Dienes and Dubey (2003) use a tagging approach, Jijkoun and de Rijke (2004) use memory-based learning, and Levy and Manning (2004) use loglinear classifiers. One of the best-performing approaches, Campbell (2004), is a rule-based approach using entirely hand-crafted rules. Campbell (2004) points out that empty categories follow from clear formal linguistic principles; accordingly, there should be principle-based ways to recover them. Empty categories do not exist prior to the annotation; they are consciously inserted by the annotator following guidelines about linguistic configurations. The majority of empty nodes occur in configurationally clearly defined places. We first show how we extract grammatical relations from the Penn Treebank. For grammatical relations involving long-distance dependencies, their boundedness is discussed, illustrated by a quantitative analysis.
We show that in English most long-distance dependencies are bounded, and that the few unbounded long-distance dependencies, for example WH-questions, are cyclic. The former class can be treated with extended locality and post-processing; the latter class needs mild context-sensitivity. We then present our treatment of recursive long-distance dependencies. Mildly context-sensitive constructions known from Tree-Adjoining Grammar (TAG) can be naturally implemented in a DG framework. We discuss examples and suggest that our approach has fundamental implications for Lexical-Functional Grammar (LFG).

6.2 The Boundedness of Long-Distance Dependencies

In order to be able to train a DG on constituency data such as the Treebank, a conversion is necessary. Such conversions are described in Covington (1994) or Basili, Pazienza, and Zanzotto (1998). A conversion is either full, if the constituency trees are entirely translated into DG structures, or selective, if a set of dependency relations is converted, without enforcing that connected DG structures are possible. For the purpose of extracting relations, irrespective of whether they are local relations or non-local subtree relations involving long-distance dependencies, a selective conversion is sufficient, so we will restrict ourselves to introducing a selective conversion. We have used tgrep for the conversion task, a popular query language for syntactically annotated corpora1. The discussion will stay as general as possible, but the detailed extraction patterns we use are listed in the appendix. This will allow the interested reader to individually test the conversion patterns in detail.

1 tgrep is shipped as part of the Penn Treebank from LDC. The online man page is http://www.ldc.upenn.edu/ldc/online/treebank/man/cat1/tgrep.1 . For tgrep2, see http://www.cs.cmu.edu/˜dr/Tgrep2/

Relation               Label     Example
verb–subject           subj      he sleeps
verb–first object      obj       sees it
verb–second object     obj2      gave (her) kisses
verb–adjunct           adj       ate yesterday
verb–subord. clause    sentobj   saw (they) came
verb–prep. phrase      pobj      slept in bed
noun–prep. phrase      modpp     draft of paper
noun–participle        modpart   report written
verb–complementizer    compl     to eat apples
noun–preposition       prep      to the house

Table 6.1: Important Pro3Gres Dependency types

[Figure 6.1: Prototypical subject configuration – an S node dominating an NP (containing a noun) and a VP (containing a verb)]

6.2.1 Local Relations

Before turning to long-distance dependencies, we illustrate the extraction of local dependencies from the Penn Treebank. We will discuss two examples, subjects and objects. Table 6.1 gives an overview of important dependencies extracted from the Treebank and subsequently used by our parser.

Subjects

Local relations hold between a mother node and immediate daughter nodes. A prototypical active subject relation holds between an NP and a VP that are dominated by an S node, as shown in figure 6.1. In the Penn Treebank II (Marcus, Santorini, and Marcinkiewicz, 1993b) some syntactic relations are made explicit. Subjects are given an explicit SBJ functional label. Dependency relations are always between lexical items, so that the lexical heads need to be extracted from the Treebank.

[Figure 6.2: Explicit Penn-II subject relation – an S node dominating an NP-SBJ (containing a noun) and a VP (containing a verb)]

[Figure 6.3: Extraction pattern for active subject-verb relations – an S node dominating NP-SBJ@ (descending to the noun) and VP@ (descending to the active verb)]

We thus use patterns that descend into the subject NP, following the possibly nested NPs to the head noun, and into the VP, following the possibly nested VPs down to the head verb. We symbolise the possible nestedness by using an ’@’ symbol in fig. 6.3. Thus, as in lexicalized PCFGs, the lexical information of local relations becomes non-local. A base NP head is assumed to be a noun (NN*), a pronoun (PRP), a WH-element (WDT, WP), a number (CD), or an existential (EX). At the terminal level, the rightmost element falling into any of these categories is assumed to be the base NP head. A VP head is assumed to be the verb (VB*) at the lowest level. This restriction is necessary in order to exclude auxiliaries. The lowest-level verb is a verb that has no VP sister. The pattern thus conceived can be expanded to a set of tgrep queries or pattern instances. A comprehensive list of the patterns can be found in the appendix in fig. A.
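The head-finding conventions just stated can be written down directly. The sketch below is our own rendering over (label, children...) tuples with (tag, word) preterminals; it is not the tgrep formulation used in the appendix:

```python
import re

# Sketch of the head-finding conventions stated above.
NP_HEAD = re.compile(r"(NN.*|PRP|WDT|WP|CD|EX)$")

def np_head(np):
    """Rightmost terminal whose tag qualifies as a base NP head."""
    head = None
    for child in np[1:]:
        if isinstance(child, tuple) and len(child) == 2 and isinstance(child[1], str):
            tag, word = child
            if NP_HEAD.match(tag):
                head = word                 # keep the rightmost match
        elif isinstance(child, tuple) and child[0].startswith("NP"):
            head = np_head(child) or head   # follow nested NPs
    return head

def vp_head(vp):
    """The verb at the lowest VP level, i.e. a verb with no VP sister."""
    nested = [c for c in vp[1:] if isinstance(c, tuple) and c[0].startswith("VP")]
    if nested:
        return vp_head(nested[0])           # descend past auxiliaries
    for child in vp[1:]:
        if isinstance(child, tuple) and child[0].startswith("VB"):
            return child[1]
    return None

vp = ("VP", ("VBZ", "was"), ("VP", ("VBN", "seen")))
np = ("NP", ("DT", "the"), ("NP", ("NN", "vice"), ("NN", "president")))
print(vp_head(vp), np_head(np))   # seen president
```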

Objects

The extraction of other local relations, e.g. objects, is analogous. While it is common for all verbs to have a subject in English, whether a verb has objects or other types of complements largely depends on the verb's subcategorisation. Quirk et al. (1985) describe the following three types of nominal verb complementation:

First, copular complementation: complements of copular verbs, such as be, remain, stay, appear, become. Second, monotransitive complementation: verbs taking one object. Third, ditransitive complementation: verbs taking two objects. The object closer to the verb (in English) is the indirect object; the object further away from the verb can be referred to as the secondary object. The nominal complement of a copular verb is called the subject complement. Subject complements are expressed in the Penn Treebank by means of the functional label PRD.

(57) Mr. Vinken [VP is [NP-PRD chairman of Elsevier, the Dutch publishing group]].

Monotransitive verbs and copular verbs are in disjoint distribution; monotransitive and copular subcategorisation are mutually exclusive. For convenience, we can therefore use the dependency label obj for subject complements; the verb semantics are unambiguous. The comprehensive list of tgrep extraction pattern instances for objects, including subject complements, can be found in the appendix in fig. A. As for ditransitive complementation, the tgrep extraction instances match verbs with two objects. The second object is extracted and obtains the relation label obj2, which stands for secondary object. The comprehensive list is shown in the appendix in fig. A.

6.2.2 Nonlocal Relations

We now discuss nonlocal relations, also called long-distance dependencies. Before doing a quantitative analysis of nonlocal relations, we consider two common examples of nonlocal relations: the relation between subjects and passive verbs, and control constructions.

Passive Verb Subject

A Treebank example of a passive sentence is the following.

(58) [NP-SBJ-2 Preliminary findings] [VP were [VP reported [NP NONE *-2] more than a year ago]].

In passive sentences, a movement involving an empty constituent is assumed. An extraction pattern for the above example is in figure 6.4, where again VP@ is an arbitrarily nested VP, and NP-SBJ-X@ the arbitrarily nested surface subject; X represents the numerical counter of the co-indexed, moved element.

[Figure 6.4: Extraction pattern for passive subjects – NP-SBJ-X@ and VP@, where the passive verb is followed by an NP dominating a trace -NONE- *-X]

[Figure 6.5: First experiment pattern for passive subjects – as figure 6.4, but with trace *-Y: coreference is not enforced]

[Figure 6.6: Second experiment pattern for passive subjects – as figure 6.4, but forcing no trace (NOT -NONE-) in the object position]

The passive pattern expanded into comprehensive tgrep queries can again be found in the appendix in figure A. As an initial assumption, movements can be of arbitrary length. It has been widely investigated that the majority of movements are subject to local restrictions (Ross, 1967; Chomsky, 1986). Passive movement is very restricted; otherwise, the rigid pattern 6.4, in which the movement is of fixed length, would miss subject-passive-verb relations. We now show that we do not miss any case in the Penn Treebank and confirm that passive movement is of fixed length. Since it is not possible to express long-distance dependencies directly in tgrep, our arguments are indirect. The following questions are relevant.

How many filler and gap indices coincide in our fixed pattern? Pattern 6.5 differs from 6.4 in that it does not enforce the identity of the filler and gap index. In our tests, listed in the appendix, no case in which patterns 6.4 and 6.5 return different results was found.

How many fillers do not find their gap in the passive verb's sister? In a second experiment, we test in how many cases a filler occurs without a gap in the position expected by our fixed pattern, i.e. the passive verb object position. In order to exclude some mismatches when using the negation, we restrict ourselves to cases where the gap immediately follows the verb sister. Our experiment, detailed in the appendix, shows that pattern 6.4 occurs about 30 times more often than pattern 6.6. None of the tested cases of pattern 6.6 involves a passive verb form.

These tests, though small and potentially incomplete, indicate that passive movement is locally fixed. It can therefore be replaced by a single, local, but uniquely labelled dependency, for example psubj, which always allows a re-conversion into a constituency-based format including the correct LDD indices. Since the verb form already allows a clear identification of passive structures, we have decided to use the same relation label as for the active subject, subj, but to use separate probability estimations for the active and the passive case.

There is a class of verbs, following Levin (1993) often called dub verbs, which take a noun phrase as object complement. Examples of dub verbs are name, appoint, consider, as in the queen appointed William Cecil her personal secretary. As dub verbs are in disjoint distribution with ditransitive verbs, we can give the object complement the secondary object label obj2. Dub verbs occur very frequently in the passive voice. The (slightly different) extraction pattern for them, involving a small clause, is shown in figure 6.7.

[Figure 6.7: Extraction pattern for passive dub verb subjects – as figure 6.4, but with the passive verb followed by a small clause S whose NP-SBJ dominates the trace -NONE- *-X]

[Figure 6.8: Extraction patterns for subject control (left) and object control (right) – the control verb's subject (or object) NP is co-indexed with the empty NP-SBJ (-NONE- *-X) of the subordinate clause]

Control

Control is another textbook example of a long-distance dependency. The extraction pattern for control is in fig. 6.8. Control and raising coreferences can be reconstructed successfully from the context of the matrix and the subordinate clause. If the matrix clause contains a control verb, a raising verb, or a control adjective, and if the subordinate clause contains a verb in the infinitive with to and a corresponding unfilled argument position, then coreference is assumed. This is the assumption expressed by our fixed extraction pattern.
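This coreference assumption translates into a simple post-parsing rule. The following sketch is our schematic rendering, with invented field names and toy control-verb lists:

```python
# Schematic rendering of the post-parsing control rule described above:
# if the matrix verb is a control verb and the subordinate clause is a
# to-infinitive without a subject, the missing argument is shared.
SUBJECT_CONTROL = {"promise", "try", "want"}   # toy lists, for illustration
OBJECT_CONTROL = {"persuade", "ask", "urge"}

def resolve_control(matrix_verb, sub_clause):
    """Fill in the shared subject of a subjectless to-infinitive clause."""
    if sub_clause["infinitive_with_to"] and sub_clause["subj"] is None:
        lemma = matrix_verb["lemma"]
        if lemma in OBJECT_CONTROL and matrix_verb.get("obj"):
            sub_clause["subj"] = matrix_verb["obj"]    # object control
        elif lemma in SUBJECT_CONTROL and matrix_verb.get("subj"):
            sub_clause["subj"] = matrix_verb["subj"]   # subject control

matrix = {"lemma": "persuade", "subj": "she", "obj": "him"}
clause = {"infinitive_with_to": True, "subj": None}
resolve_control(matrix, clause)
print(clause["subj"])   # him
```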

How many filler and gap indices coincide in our fixed pattern? Filler and gap indices coincide in our fixed pattern to a high degree. Comparing the counts with enforced and relaxed filler identity, analogous to the experiment on passives, we get 98% identity for subject control. Of the 320 cases tested, 7 had no identity of gap and filler. An example of these 7 sentences, where the subject-control pattern goes astray, is:

(NP-SBJ-1 (DT Some) (NNP Golenbock) (NNS lawyers))
(VP (MD wo) (RB n’t)
    (VP (VB be)
        (VP (VBN invited)
            (NP-2 (-NONE- *-1))
            (S (NP-SBJ (-NONE- *-2))
               (VP (TO to)
                   (VP (VB join)
                       (NP (NNP Whitman) (CC &) (NNP Ransom)))))
            (, ,)
            (PP (VBG according)
                (PP (TO to)
                    (NP (NP (NNS partners))
                        (PP-LOC (IN at)
                                (NP (DT both) (NNS firms)))))))))

Complex interaction between different types of long-distance dependencies, in this case between passive and control, means that the extraction pattern can make errors. While this affects the recall of the pattern on the Penn Treebank for obtaining lexical statistics, reconstruction of the control relation during parsing remains unaffected. We obtain above 99% identity for object control. Of the 264 cases tested in the appendix, one failed to have identity.

How many fillers do not find their gap in the subordinate subjectless clause? In this experiment, we tested in how many cases a filler occurs without a gap in the position expected by our fixed pattern, i.e. the subordinate clause subject position. Our experiment, detailed in the appendix, shows that the control pattern occurs 37 times more often than an (otherwise identical) pattern explicitly requiring a no-trace subordinated subject. The 8 tested subject-control cases where no gap occurs in the subordinate clause subject position include a conjunction that triggers a mismatch, an annotation error, a complex interaction between passive and control, and one case where the movement is longer than what the pattern matches, but not a typical case of control:

(NP-SBJ-1 (PRP we))
(VP (MD should)
    (VP (VB be)
        (VP (VBG helping)
            (S (NP-SBJ (NNP U.S.) (NNS companies))
               (VP (VB improve)
                   (NP (VBG existing) (NNS products))
                   (PP (RB rather) (IN than)
                       (S-NOM (NP-SBJ (-NONE- *-1))
                              (ADVP-TMP (RB always))
                              (VP (VBG developing)
                                  (NP (JJ new) (NNS ones))))))))))

For object control, the control pattern is only 22 times more frequent than an (otherwise identical) pattern explicitly requiring a no-trace subordinated subject. But the 15 tested object-control cases where no gap occurs in the subordinate clause subject position include 13 cases where the pattern erroneously matches a temporal expression in the object position. As a tentative conclusion, we can say that control is almost as clearly fixed as passive subject movement. Although there are a few exceptions, and fixed patterns fail when several movements interact, we only lose very few cases when extracting relations with a fixed pattern. While the extraction of long-distance dependencies may fail when several movements interact, the impact on the lexical statistics used for parsing, and thus on parsing success, is very small, and the post-processing step after parsing delivers the correct result unless there are intervening parsing errors. The trace of both a passive and a control relation is expressed by an NP* constituent in the Penn Treebank. Campbell (2004) uses a single, purely configurational rule to recover NP* from Treebank trees where they have been removed. Campbell (2004) reports similar rules for other long-distance relations. We have written structural patterns corresponding to such rules. Each long-distance dependency type corresponds to a pattern or a set of patterns. Grammatical role labels, empty node labels and tree configurations spanning several local subtrees are used as integral parts of long-distance dependency patterns. This leads to much flatter trees, as is typical for DG, which has the advantages that (1) it helps to alleviate sparse data by mapping nested structures that express the same dependency relation2, (2) fewer decisions are needed at parse-time, which reduces complexity and the risk of errors (Johnson, 2002), and (3) the costly overhead for dealing with unbounded dependencies can be partly avoided. It is ensured that the lexical information that matters is available in one central place, allowing the parser to take one well-informed decision. While these dependency relations covering empty nodes and several levels of subtrees express some non-local dependencies and reduce parse tree depth and complexity, the questions about their quantitative and qualitative coverage must be answered. We discuss quantitative coverage in section 6.3, and qualitative coverage in section 6.4.

6.3 A Quantitative Analysis of Types of Empty Nodes

We have seen that passive and control movements are of bounded length; now we would like to investigate all major types of movement.

6.3.1 Overview

The ten most frequent types of empty nodes cover the vast majority, more than 60,000, of the approximately 64,000 empty nodes of sections 2-21 of the Penn Treebank. Table 6.2, reproduced from Johnson (2002) (row numbers and counts from the whole Treebank added), gives an overview. Empty units, empty complementizers and empty relative pronouns (rows 4, 5, 9, 10) pose no problem for our functional DG as they are optional, non-head material (see section 3.1.3). For example, a complementizer is an optional dependent (or in HPSG a marker) of the subordinated verb (see chapter 5). Fronted constituents (row 6) are mostly PPs or clausal complements of verbs of utterance. Only verbs of utterance allow subject-verb inversion in affirmative clauses (row 8). The linguistic grammar provides rules with appropriate restrictions for all of these. In an ID/LP framework, none of them involve non-local dependencies or empty nodes. Moved constituents (row 6) and empty clauses (row 8) have rules in our functional DG that allow an inversion of the dependency direction, loosening linear precedence constraints, under well-defined conditions.

2 Data sparseness also depends on the probability model. As an alternative to collapsing the structure in the annotation as we do, it could also be collapsed in the statistical model.

     Antecedent  POS    Label  Count   Description/Example
1    NP          NP     *      22,734  NP trace: Sam was seen *
2                NP     *      12,172  NP PRO: * to sleep is nice
3    WHNP        NP     *T*    10,659  WH trace: the woman who you saw *T*
(4)                     *U*    9,202   Empty units: $ 25 *U*
(5)                     0      7,057   Empty complementizers: Sam said 0 Sasha snores
(6)  S           S      *T*    5,035   Moved constituents: Sam had to go, Sasha said *T*
7    WHADVP      ADVP   *T*    3,181   WH-trace: Sam explained how to leave *T*
(8)              SBAR          2,513   Empty clauses: Sam had to go, said Sasha (SBAR)
(9)  WHNP               0      2,139   Empty relative pronouns: the woman 0 we saw
(10) WHADVP             0      726     Empty relative pronouns: the reason 0 to leave

Table 6.2: The distribution of the 10 most frequent types of empty nodes and their antecedents in the Penn Treebank (adapted from Johnson 2002). Row numbers in parentheses indicate cases that are inherently local in our functional DG

Type                              Count   Prob-modeled  Treatment
passive subject                   6,803   YES           local relation
indexed gerund                    4,430   NO            Tesnière translation
subject control, raise, semi-aux  6,122   YES           post-parsing processing
object control                    333     YES           post-parsing processing
others / not covered              5,046
TOTAL                             22,734

Table 6.3: Coverage of the patterns for the most frequent NP traces [row 1]

Type                  Count   Prob-modeled  Treatment
modpart               5,656   YES           local relation
non-indexed gerund    3,095   NO            Tesnière translation
adverbial of verb     1,598   NO            no loss of information
adverbial of noun     268     NO            no loss of information
others / not covered  1,555
TOTAL                 12,172

Table 6.4: Coverage of the patterns for the most frequent NP PRO [row 2]

6.3.2 NP Traces

A closer look at NP traces (row 1 of table 6.2) reveals that the majority of them are recognised by the grammar. Except for the indexed gerunds, they participate in the probability model. In control, raising and semi-auxiliary constructions, the non-surface semantic arguments, i.e. the subject-verb relation in the subordinate clause, are created based on lexical probabilities at the post-parsing stage, where minimal predicate-argument structures are output. Unlike in control, raising and semi-auxiliary constructions, the antecedent of an indexed gerund cannot be established easily. The fact that almost half of the gerunds are not indexed in the Treebank indicates that information about the unexpressed participant is semantic rather than syntactic in nature, much as in anaphora resolution. The parser does not try to decide whether the target gerund is indexed or not, nor does it try to find the identity of the lacking participant in the latter case. This is an important reason why recall values for the subject and object relations are lower than the precision values, and constitutes one of the cases where real information is lost by our conversion into a local DG representation.

6.3.3 NP PRO

As for the 12,172 NP PRO (row 2 of table 6.2) in the Treebank, 5,656 are recognised by the modpart pattern (which covers reduced relative clauses), which means that they are treated as a local relation covered in the probability model. The extraction pattern for the modpart relation is presented in fig. 6.9. The dedicated modpart relation typically expresses object function for past participles (the report issued) and subject function for present participles (a sum totalling). The full tgrep extraction pattern instances for the modpart relation are listed in the appendix in section A.

[Figure 6.9: The extraction pattern for the modpart relation: an NP, headed by a noun, followed by a VP headed by a participle. The NP is explicitly a non-subject NP.]
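A chunk-level approximation of this configuration can be sketched as follows. This is our Python illustration, not the thesis's actual tgrep patterns (those are in appendix A); the chunk encoding is an assumption.

# Sketch of the modpart configuration over a chunked sentence: a noun
# chunk directly followed by a verb chunk headed by a participle.
# Checking that the NP is a non-subject NP is left to the grammar.

def modpart_candidates(chunks):
    """chunks: list of (head_word, chunk_tag, head_pos) triples."""
    found = []
    for (noun, ntag, _), (verb, vtag, vpos) in zip(chunks, chunks[1:]):
        if ntag == "NP" and vtag == "VP" and vpos in ("VBN", "VBG"):
            found.append(("modpart", noun, verb))
    return found

# "the report issued yesterday":
# modpart_candidates([("report", "NP", "NN"), ("issued", "VP", "VBN")])
# -> [("modpart", "report", "issued")]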

A further 3,095 NP PRO are recognised as non-indexed gerunds. Infinitives and gerunds may act as subjects; such cases are covered by Tesnière (1959)'s translations (see section 3.2.3), although these rules do not participate in the probability model. For example, a translation allows a verb to act as a noun upwards in the tree. In to read books is great, read takes an object dependent books like a regular verb, but acts as a noun dependent for the copula be. Many of the structures that are not covered by the extraction patterns and the probability model are still parsed correctly; for example, adverbial clauses are treated as unspecified subordinate clauses. Non-indexed adverbial phrases of the verb account for 1,598 NP PRO, non-indexed adverbial phrases of the noun for 268. As the NP is non-indexed, the identity of the missing argument in the adverbial is unknown anyway, so no semantic information is lost if the empty node remains underspecified.

6.3.4 WH Traces

Only 113 of the 10,659 WHNP antecedents in the Penn Treebank (row 3 of table 6.2) are actually question pronouns. The vast majority, over 9,000, are relative pronouns. We first discuss our treatment of relative pronouns, and then we discuss how question pronouns require a different treatment.

Relative Pronouns

In subject relatives, the relative pronoun is analysed as a subject. The post-processing module of the parser resolves the relative pronoun. In the following figures, post-processing links are drawn as thick lines, with their labels in bold font.

(59) boys who saw girls

[Dependency diagram for (59): arcs modrel, subj and obj over "boys who saw girls"; r:subj is the post-processed link]

In object relatives and prepositional object relatives, an inversion of the direction of the relation they have to the verb is allowed if the relative pronoun precedes the subject.

(60) boys who girls saw

[Dependency diagram for (60): arcs modrel, obj and subj over "boys who girls saw"; r:obj is the post-processed link]

(61) boys to whom girls gave presents

[Dependency diagram for (61): arcs modrel, pobj, prep, subj and obj over "boys to whom girls gave presents"; r:pobj is the post-processed link]

This method requires non-standard assumptions for stranded prepositions. The dependency is localized to the relativized matrix noun, which corresponds to our general treatment of prepositions as a marker-type dependent of the noun. A post-processing step, similar to the one used for control structures, delivers the dependency between the verb and the preposition. In practice, stranded prepositions are thus treated as if they were a long-distance dependency of the second of the two classes introduced in section 6.4.

(62) boys who girls gave presents to

[Dependency diagram for (62): arcs modrel, pobj, subj and obj over "boys who girls gave presents to"; strandprep and r:pobj are post-processed links]

Relative clauses with stranded prepositions appear frequently as zero-relative clauses (row 9 of table 6.2).

(63) boys 0 girls gave presents to

[Dependency diagram for (63): arcs modrel, subj and obj over "boys 0 girls gave presents to"; strandprep and r:pobj are post-processed links]

Relative clauses, like WH-questions, can be embedded, as illustrated in sentences 64 and 65.

(64) William got a present1 [ to which Peter believes [ I will contribute _1]]

(65) William got a present1 [ Peter believes [ I will contribute to _1]]

The output of Pro3Gres for these sentences is given in figures 6.10 and 6.11. The recursive clauses of an embedded relative clause are traditionally assumed to be introduced by complementizers rather than relative pronouns. This assumption is supported by the semantic fact that, unlike in control structures, the relativized constituent does not act as an implicit argument of the verbs of the intervening clauses. A recursive version of the post-processing step for the resolution of relative clause anaphora has been implemented. Because the relativized constituent does not act as an implicit argument of the intervening verbs, this version of the post-processing step traverses embedded relative clauses without leaving coreferences in the intervening clauses. This is different from the post-processing of raising and control, where a coreference is introduced in every subordinate clause, and where an explicit recursive call is thus not necessary (each of the newly introduced coreferences is again subject to post-processing). As is expected from a robust parser, sentences exhibiting gradience and low acceptability are also handled.

(66) ? William got a present1 [I believe [ which1 suits Peter]]

(67) ? William got a present1 [I believe [ that Peter likes _1]]

Figures 6.12 and 6.13 show the actual parser output for sentences 66 and 67. The analysis of sentence 67 is correct, but, as is to be expected when processing input of low acceptability, the analysis of sentence 66 is debatable.

[Figure 6.10: Actual parser output for example sentence 64]

[Figure 6.11: Actual parser output for example sentence 65]

[Figure 6.12: Actual parser output for example sentence 66]

[Figure 6.13: Actual parser output for example sentence 67]
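The recursive resolution step can be sketched as follows. This is our Python illustration, not the Prolog module itself; relation names such as modrel and sentobj follow the output format shown in the figures, while the graph encoding is an assumption.

# Sketch of the recursive relative clause post-processor: starting from
# the modrel link between a noun and the verb of its relative clause,
# descend through sentobj (subclause) links without leaving coreferences
# in the intervening clauses, and resolve the pronoun at the lowest verb.

def resolve_relative(edges, noun, verb, role):
    """edges: (label, head, dep) triples; noun: the relativised noun;
    verb: the verb first attached via modrel; role: e.g. 'r:obj'."""
    for label, head, dep in edges:
        if label == "sentobj" and head == verb:
            # embedded clause: recurse, adding no link at this level
            return resolve_relative(edges, noun, dep, role)
    # lowest clause reached: the noun fills a role of this verb
    return (role, verb, noun)

# "a present to which Peter believes I will contribute":
# resolve_relative([("modrel", "present", "believe"),
#                   ("sentobj", "believe", "contribute")],
#                  "present", "believe", "r:pobj")
# -> ("r:pobj", "contribute", "present")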

Embedded WH-questions

A second nonlocal dependency that can stretch across embedded clauses is the dependency from a verb in a question to its WH pronoun. This type of dependency is fundamentally different in nature from all dependencies we have discussed so far. While so far all dependents appeared in a canonical "surface" syntactic position, including the relativized matrix noun in a relative construction, the WH-question pronoun does not.

(68) William got a present1 [ Peter believes [ I will contribute to _1]]

(69) What1 do you think [ Peter believes [ I will contribute to _1]] ?

In example 68, present is a canonical argument of get, but in example 69, what is not an argument of think. What refers to a present, but the answer to the question What do you think? can hardly be present, since this would lead to a selectional restriction violation. The object of what is an epistemological fact, in example 69 the entire subordinated clause that depends on think. The fact that WH-pronouns in embedded clauses are not arguments of the matrix verb entails that, while context-free parsing suffices to analyse the canonical position of the matrix noun in a relative construction, and the successive post-processing does not need to revise context-free parsing decisions, such a pipeline approach has to fail on WH-questions. Unlike all other non-local relations we have discussed so far, they warrant a treatment as truly unbounded dependencies. We discuss our treatment in the following section: a Tree-Adjoining Grammar (TAG) approach leading to mild context-sensitivity is investigated and a DG version of it is implemented.

6.4 A Qualitative Analysis of Types of Empty Nodes

Long-distance dependencies are traditionally grouped into two classes. In the first class, there is an overt constituent in a nonargument position that can be thought of as strongly associated with (or filling) the gap or trace. As the type of movement associated with this class is to a non-argument position, it is called A-bar movement (Chomsky, 1981). In this class we find topicalisations, WH-questions, WH-relative clauses, it-clefts and pseudo-cleft constructions. Pollard and Sag (1994, p. 157) give the following examples:

(70) Kim1, Sandy loves _1. (topicalization)

(71) I wonder [who1 Sandy loves _1]. (wh-question)

(72) This is the politician [who1 Sandy loves _1]. (wh-relative clause)

(73) It's Kim [who1 Sandy loves _1]. (it-cleft)

(74) [What1 Kim loves _1] is Sandy. (pseudocleft)

An additional important construction of this class is the use of support verbs in questions:

(75) Does1 Sandy _1 love Kim? (fronted auxiliary verb)

In the second class, there is no overt filler in a nonargument position; instead, there is a constituent in an argument position that is interpreted as coreferential with the trace. As the type of movement associated with this class is to an argument position, it is called A-movement (Chomsky, 1981). In this class we find control and raising (e.g. purpose infinitives or tough movement), relative clause and it-cleft constructions. Pollard and Sag (1994, p. 157) give the following examples:

(76) I bought it1 for Sandy [ to eat _1]. (purpose infinitive)

(77) Sandy1 is hard [ to love _1]. (tough 'movement')

(78) This is the politician [ Sandy loves _1]. (relative clause)

(79) It's Kim1 [ Sandy loves _1]. (it-cleft)

In the first class (examples 70 to 75), the coreference is local inside a single clause. If we can define an extended notion of locality in which locality means local inside the clause, this class may have only context-free complexity despite its long-distance character. In DG, locality naturally extends to the clause level. Let us illustrate this point with two examples. First, in a grammar representation where both inner and outer arguments depend on the verb, topicalized constituents and support verbs – in fact all clause-internal constituents – are available locally to the main verb. Example 80 receives a context-free analysis.

(80) Who did you see ?

[Dependency diagram for (80): SENT arc from ROOT to see, with obj(see, Who), aux(see, did) and subj(see, you)]

Second, in a grammar representation where immediate dominance and linear precedence are distinct (often called ID/LP grammars), structures involving non-canonical word order do not require us to resort to non-locality. The subject dependency is allowed to go to the right in selected contexts, for example in questions involving verb-subject inversion, as in example 81.

(81) Are you sure ?

[Dependency diagram for (81): SENT arc from ROOT to Are, with a rightward subj(Are, you) and predadj(Are, sure)]
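A minimal sketch of such a direction-sensitive licensing rule follows. This is our Python illustration; the actual grammar consists of Prolog rules, and the rule table and context names here are assumptions.

# Sketch of ID/LP-style licensing: immediate dominance (which relation a
# head may take) is stated separately from linear precedence (on which
# side of the head the dependent may occur). The condition for inverted
# subjects is simplified to an "inverted" clause context.

RULES = {
    # relation: (head_tag, dep_tag, allowed directions)
    "subj": ("V", "N", {"left", "right_if_inverted"}),
    "obj":  ("V", "N", {"right", "left_if_whfronted"}),
}

def licensed(relation, head_tag, dep_tag, direction, context):
    h, d, directions = RULES[relation]
    if (head_tag, dep_tag) != (h, d):
        return False
    # a direction may be allowed unconditionally or only in some context
    return direction in directions or (direction + "_if_" + context) in directions

# licensed("subj", "V", "N", "right", "inverted") -> True, as in "Are you sure ?"
# licensed("subj", "V", "N", "right", "declarative") -> False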

While extending locality to the clause level allows us to treat simple non-subject WH-questions, embedded WH-questions are not covered. Embedded WH-questions are the only class 1 construction in which the dependency is not local to the clause. In the second class, the coreference is not local to one clause, but the overt constituent in the matrix clause appears in an argument position. The coreference is between two subjacent clauses in the above examples, but it can also stretch across several subclauses.

(82) Sandy1 is believed [ _1 to be hard [ to love _1]].

If the coreferenced constituent can be said to appear covertly in each of the intervening clauses, then the movement is cyclic: it can be modelled step by step from clause to clause, from argument position to argument position. In our quantitative analysis of the Penn Treebank in section 6.3, we have seen five types of English long-distance dependencies that are not local to their clause: (1) indexed gerunds (see 6.3.2), (2) control structures including raising (see 6.3.2), (3) relative clauses (see 6.3.4), (4) stranded prepositions (see 6.3.4), and (5) WH-question dependencies (see 6.3.4). If all these long-distance dependencies were cyclic, then context-free parsing would be sufficient, because the coreference of the argument position can be resolved at the post-processing stage. Indexed gerunds are a special case: theoretically, they can be treated with a pronoun resolution approach. We believe that they are semantic rather than syntactic in nature. We leave them underspecified and will not discuss them further. Control structures are cyclic: the movement proceeds from clause to clause across argument positions. Relative clauses are not cyclic, because the relativized noun does not necessarily appear in argument positions in intervening clauses. Since the overt appearance of the relativized noun is in an argument position, however, context-free parsing followed by a recursive post-processing module which does not leave traces in the intervening clauses is sufficient. We have presented our recursive treatment of relative clauses and of stranded prepositions in section 6.3.4. We have discussed that WH-question dependencies are the only long-distance dependency that belongs to the first class and that is not local to one clause. Long-distance dependencies of the first class are characterised by the fact that the overt appearance of the moved constituent is in a non-argument position. While extending locality to the clause allows us to use a context-free approach by underspecifying the distinction between the overt, non-argument position appearance and the covert argument appearance if both are in the same clause, this approach has to fail for embedded WH-questions, because the two positions are not in the same clause. A different, context-sensitive approach is thus needed. Either we selectively and carefully extend locality further than to the clause, or we resort to the classical context-sensitive treatment of long-distance dependencies as filler-gap constructions for this case, attaching the overt non-argument dependent with a purely syntactic, non-functional label which is accessible from all subordinate clauses. The first option would mean that WH-question pronouns can directly access a large context, up to the entire sentence. The second option could follow classical approaches to treating long-distance dependencies as gap-filler constructions. From a GPSG perspective, a gap feature is shared across all intervening clauses. From a GB perspective, the WH-constituent moves to the CP-specifier position, which typically serves as an 'escape hatch' through which the WH-constituent can cyclically move up, clause by clause, until it reaches its overt non-argument CP-specifier position. Option 2 would require syntactically empty nodes and non-functional, purely syntactic dependency relations for attaching the overt non-argument WH-constituent.
Both are undesirable in a deep-syntactic, functional representation, which should maximally abstract away from surface configurations and directly express grammatical roles, as functional DG aims to deliver. Since we use a monostratal grammar theory, there is no classical movement operation available. Instead we need to allow some form of context-sensitivity, allowing dependencies to cross (which is also referred to as non-projectivity). Complete non-projectivity leads to NP-complete parsing complexity (Neuhaus and Bröker, 1997). Non-projectivity therefore needs to be severely constrained. Nivre (2006a) has shown that restricting non-projectivity leads to shorter parsing times also for a practical parser.

In his parser, the number of edges constructed with unrestricted non-projectivity grows approximately quadratically with the number of words per sentence; when allowing at most two non-projective dependencies per sentence, the growth is almost linear. Tree-Adjoining Grammar (TAG) follows the first option. TAG extends locality beyond the clause under clearly defined conditions, it does not use movement between clauses, and it maximally restricts non-projectivity while staying expressive enough to treat natural language phenomena, particularly embedded WH-questions. This approach is mildly context-sensitive (Joshi, 1985). It has been argued that mild context-sensitivity is expressive enough for natural language processing (Frank, 2002). Kuhlmann and Nivre (2006) have investigated naturally occurring context-sensitivity in two corpora in dependency format (the Danish Dependency Treebank and the Prague Dependency Treebank). They confirm that context-sensitivity constraints of the class that TAG belongs to have almost complete coverage, 99.89%, on these two treebanks, and that the remaining uncovered data is partly due to properties of the annotation scheme. They conclude that TAG mild context-sensitivity is a very attractive extension of projectivity.

6.4.1 Tree-Adjoining Grammar

The TAG formalism (Joshi, 1985; Joshi and Kroch, 1985) has developed a mathematically restrictive formulation of phrase structure grammar. In contrast to the string-rewriting systems of the Chomsky hierarchy, TAG is a system of tree-rewriting. Structural representations are built up from pieces of phrase structure, so-called elementary trees, which are taken as atomic. These trees can be combined by using one of two operations: Substitution and Adjoining. Substitution involves the rewriting of a non-terminal node at the frontier of one elementary tree as another elementary tree, with the requirement that the rewritten node must have the same label as the root of the elementary tree that rewrites it. Substitution can be understood as a traditional rewriting operation. Substitution accomplishes effects similar to those of the Merge operation (Chomsky, 1995): it inserts XPs into the argument positions of syntactic predicates. Crucially, it is a context-free operation: context-free elementary trees combined by substitution only yield context-free structures. An example of Substitution is given in fig. 6.14. Elementary trees are context-free by definition: "Every syntactic dependency is expressed locally within a single elementary tree" (Frank, 2002, p. 22). The Adjoining operation rewrites a non-terminal node anywhere within an elementary tree as another elementary tree. Unlike Substitution, which rewrites or expands trees only along the frontier, Adjoining uses a special class of recursive trees, so-called auxiliary trees.

[Figure 6.14: An example of the Substitution operation: the DP "the student" is substituted into the IP "has finished". The rewritten node is boxed.]

[Figure 6.15: An example of the Adjoining operation: the AP "flimsy" is adjoined into the DP "the evidence". The foot node is boxed.]

The root of an auxiliary tree is labelled identically to some node along its frontier, the foot node. Given an auxiliary tree A with foot node X, Adjoining rewrites a node N labelled X in an elementary tree T as A, and attaches the node that was under N in T at the foot node of the auxiliary tree. Adjoining thus works by rewriting some node of an elementary tree as a recursive piece of structure (the auxiliary tree). An example is seen in figure 6.15. Trees that have undergone Adjoining can be subject to subsequent Adjoining operations, as in 6.16.
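The two operations can be sketched over a toy tree data structure as follows. This is our Python illustration of the definitions above, not an implementation from the thesis; the labels and tree encoding are assumptions.

# Minimal sketch of TAG Substitution and Adjoining on simple labelled trees.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []
    def is_leaf(self):
        return not self.children

def substitute(tree, elementary):
    """Rewrite a frontier node of `tree` bearing the label of the
    elementary tree's root as that elementary tree (context-free)."""
    for i, child in enumerate(tree.children):
        if child.is_leaf() and child.label == elementary.label:
            tree.children[i] = elementary
            return True
        if substitute(child, elementary):
            return True
    return False

def find_foot(aux):
    """Return the foot node: the frontier node carrying the root label."""
    target = aux.label
    def search(node):
        for child in node.children:
            if child.is_leaf() and child.label == target:
                return child
            result = search(child)
            if result:
                return result
        return None
    return search(aux)

def adjoin(tree, auxiliary):
    """Rewrite an inner node labelled like the auxiliary root as the
    auxiliary tree, re-attaching the old material below the foot node."""
    for i, child in enumerate(tree.children):
        if child.label == auxiliary.label and not child.is_leaf():
            find_foot(auxiliary).children = child.children
            tree.children[i] = auxiliary
            return True
        if adjoin(child, auxiliary):
            return True
    return False

# Adjoining the auxiliary tree NP(AP(flimsy), NP) into
# DP(D(the), NP(N(evidence))) yields
# DP(D(the), NP(AP(flimsy), NP(N(evidence)))), as in figure 6.15.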

6.4.2 TAG Adjoining and mild context-sensitivity

We will now describe how TAG Adjoining can be a context-sensitive operation. The Adjoining operation can be used for Chomsky adjunction. In this case, the root node immediately dominates the foot node, as in 6.15 and 6.16.

[Figure 6.16: A further example of the Adjoining operation: the PP "for the Iraq war" is adjoined into the DP "the flimsy evidence". The foot node is boxed.]

[Figure 6.17: Adjoining for WH-questions ("What did you hear that Mary saw?"). The deep recursion of the auxiliary trees introduces mild context-sensitivity. The foot node is boxed.]

But TAG also allows the use of auxiliary trees in which the root node does not immediately dominate the foot node, where a number of nodes intervene between the root node and the foot node; in other words, the recursion stretches across several nodes. In this case, auxiliary trees that contain terminal nodes can be inserted into elementary trees and thus stretch out local dependencies. Question 83, analysed in fig. 6.17, provides an example.

(83) What did you hear that Mary saw ?

TAG treats this sentence as follows. First, the dependency between the WH-element and its base position is established locally, within a single elementary tree, according to TAG principles. The effect of dislocating the WH-element into a higher clause is accomplished by means of Adjoining in fig. 6.17. Further embeddings can be derived analogously by further Adjoining operations. Such stretching by Adjoining with recursive auxiliary trees is the one and only way in which context-sensitive constructions can be generated in TAG. This fact is known as the nonlocal dependency corollary: "Nonlocal dependencies always reduce to local ones once recursive structure is factored out." (Frank, 2002, p. 27). Research in TAG argues that the severely restricted type of context-sensitivity generated by Adjoining, so-called mild context-sensitivity, accurately characterises the non-locality present in natural language (Frank, 2002).

6.4.3 The Nature of Elementary and Auxiliary Trees

While the basic operations over elementary and auxiliary trees have now been outlined, nothing has been said about the nature of these trees. We will follow Frank (2004) and "assume that elementary trees are built around a single lexical element, that is, a semantically contentful word like a noun, verb or adjective" (Frank, 2004, p. 11). This means that elementary trees are similar to DG nuclei or chunks (if we allowed attributive adjectives to be part of elementary trees). Elementary trees are assumed to provide argument slots and are closely related to predicate-argument structure:

A great deal of work in syntactic theory has assigned a privileged status to the syntactic analogue of predicate argument structure. Such a domain, which we call a thematic domain, consists of a single lexical predicate along with the structural context in which it takes its arguments. This notion takes a variety of forms and names, but the same idea seems to underlie kernel sentences in Harris (1957) and Chomsky (1955, 1957), cyclic domains in Chomsky (1965), strata in Relational Grammar (Perlmutter, 1983), F-structure nuclei in LFG (Bresnan, 1982) and governing categories in Government-Binding Theory (Chomsky, 1981). (Frank, 2002, p. 38)

DG parses directly for a predicate-argument structure, and DG structures have been described as the F-structure part of LFG (Bröker, Hahn, and Schacht, 1994). DG and TAG thus take a very similar stance on the inherent aims and structures of syntactic theory. Following work by Grimshaw (1991), elementary trees are assumed to include extended projections (Frank, 2002, p. 43). Auxiliary trees are defined as elementary trees that show the recursive characteristics described above. TAG uses transformations to generate elementary trees. Grimshaw (1991) and Frank (2002) discuss that in head movement the base position and the ultimate landing site lie within a single extended projection. This entails that head movement generally is not unbounded, but local to a clause. We have shown in subsection 6.2 how finite-state patterns can be used to cover most head movements in English. This approach obviates the need for using costly transformations for creating elementary trees: if every elementary tree can be mapped to a dedicated dependency label, then the two are equivalent. Therefore, many dependencies (for example head movement) that stretch across more than one mother-daughter node relation and are thus non-local for PSG remain local in TAG, as they only involve a single elementary tree. The extended projections of a TAG elementary tree (Grimshaw, 1991) are also called the extended domain of locality (Carroll et al., 1999).

6.4.4 Sketching TAG Adjoining in DG

In the following, we illustrate how TAG Adjoining can be implemented in DG. We do not provide a formal proof of our method, nor do we claim that our method and TAG Adjoining are always equivalent in arbitrarily complex cases. We evaluate the usefulness and coverage of this approach experimentally in chapter 7 below. DG shares important characteristics of the extended domain of locality with TAG: all arguments and verbal modifiers of a clause are available locally to the verb. We will first show that the foot node corresponds to a DG verbal projection. Then we discuss that the foot node is the pre-maximal projection in which a WH-element is not attached. Finally, we show how the Adjoining operation can be implemented in DG.

[Figure 6.18: An unlabelled DG representation of "the man eats apples with a fork" and its X-bar equivalents]

We have discussed in chapter 3 that in LFG F-structure, HPSG and our Functional DG, where functional projections are dependents of the content-word head (HPSG calls this type of dependent markers), the elementary tree of a content word and its maximal projection coincide. All bar-levels are isomorphic to the head word W in DG (Schneider, 1998; Miller, 2000)3. The algorithm to convert between DG and X-bar described in Covington (1994) is illustrated again in figure 6.18, where X0 equates to a word without dependents, X' to a word with a subclass of its dependents attached, and X'' to a word with all its dependents attached. Let us annotate a TAG elementary tree with the same algorithm. The resulting structure is given in fig. 6.19. Since functional words are attached as markers, all DG equivalents of functional projections (the combination of a content word and a function word) are governed by the content word. As figure 6.19 illustrates, all mother nodes of the main verb are governed by the main verb within the same clause – they are all verbal projections.

3The important difference between different bar-levels is that they have attached a smaller or larger number of dependents. Different projections of a content word can be seen as different stages of derivation, with more or fewer dependents already attached in the parsing process. A possible conversion from DG to X-bar, for example, distinguishes between a projection or derivation state of V with all dependents except the subject attached (V', internal arguments), and a projection or derivation state of V with all dependents attached (V'', including the external argument).
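The conversion reading described in the footnote can be sketched as follows. This is our Python illustration; the bracketing format is an assumption.

# Sketch of the Covington-style DG-to-X-bar reading: the bare head is X0,
# the head with its internal arguments attached is X', and the head with
# the external argument attached as well is X''.

def to_xbar(head, internal, external):
    """head: a word; internal/external: lists of already-built subtrees."""
    x0 = ("X0", head)
    x1 = ("X'", [x0] + internal)       # internal arguments attached
    x2 = ("X''", external + [x1])      # all dependents attached
    return x2

# "the man eats apples":
# to_xbar("eats", internal=[("N''", "apples")], external=[("N''", "the man")])
# -> ("X''", [("N''", "the man"), ("X'", [("X0", "eats"), ("N''", "apples")])])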

[Figure 6.19: An unlabelled DG representation of "What did you hear that Mary saw" and its TAG equivalents. The foot node is boxed.]

The only possible foot node in DG is therefore a projection of the content word. Adjoining inserts a recursive structure at some projection of the content word, which is called the foot node. Adjoining inserts new governors into an existing structure and thus breaks up context-freeness. In a nutshell, the DG difference between Substitution and Adjoining is: Substitution inserts dependents, Adjoining inserts governors. In DG, Adjoining inserts an auxiliary tree into a partial projection of a content word. From a process-oriented perspective, partial projections correspond to a stage of derivation where only a subset of all dependents has been attached to a governor. Adjoining to maximal projections (in which all dependents are attached) is pointless, because then Adjoining A to B is equivalent to Substituting B to A. The auxiliary tree is inserted at a derivation stage in which not all dependents have been attached. In particular, a partial projection in which the CP-specifier (the position that traditionally serves as an 'escape hatch' for constituents to move up) is not yet attached corresponds to C', the foot node. While in the example of figure 6.18 derivation order coincides with the internal/external argument ordering, that is not the case when WH-question arguments are attached. Consider what would happen if a standard CYK algorithm is employed for the sentence in 84.

(84) Who did you see ?

The subject (external argument) is attached before the object (internal argument, moved to CP-Spec), as can also be seen in fig. 6.19. At the stage where all dependents except for the object are attached, Adjoining can occur.

[Dependency diagram for (84): SENT arc from ROOT to see, with Obj(see, Who), aux(see, did) and Subj(see, you)]

The Adjoining operation can be described in Functional DG as the following example shows.

(85) Who did you say that Mary saw ?

[Dependency diagrams for (85): the auxiliary tree "did you sayX" (with aux and Subj arcs) adjoins into the elementary tree "ROOT WhoA that Mary sawB" (with SENT, Obj, compl/rel and Subj arcs), yielding "ROOT WhoA did you sayX that Mary sawB" with the new relation sentobj=R from sayX to sawB, and the Obj arc from sawB to WhoA preserved]

Given a local dependency (of a type falling inside a TAG elementary tree, hence non-clausal) from a main verb B to a dependent A in the elementary tree, if there is a maximal projection equivalent to a TAG auxiliary tree X, and if the grammar licenses a dependency both from X to A and also from X to B such that

1. the relation type R from X to B is across elementary trees, hence expressing a subclause relation,

2. the governor of B is also licensed to be the governor of X, with the same relation type,

then the auxiliary tree X can adjoin to the elementary tree formed by A and B. Adjoining inserts X between A and B, thus stretching the dependency from B to

A, by (1) constructing a relation R from X to B, and by (2) making the governor of B become the governor of X. As a result, the Adjoining operation attaches A to the subordinate verb B instead of to the local verb X. It can also be seen as an operation that delegates a WH-pronoun to a lower clausal level, similar to Nivre and Nilsson (2005). Unlike in TAG, the equivalents of elementary trees are also constructed without transformations in DG. The verb has local access to the fronted object in the elementary tree, i.e. in a non-embedded WH-question, like in LFG F-structure, where all arguments appear flat under the verb predicate.
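The DG Adjoining operation just defined can be sketched over the triple encoding used earlier. Again, this is our Python illustration of steps (1) and (2), not the Prolog implementation.

# Sketch of DG Adjoining: insert the auxiliary-tree verb X between the
# governor of B and B itself, (1) constructing the relation R from X to B
# and (2) making the former governor of B govern X instead.

def dg_adjoin(edges, x_verb, b_verb, relation_r="sentobj"):
    new_edges = []
    for label, head, dep in edges:
        if dep == b_verb:
            new_edges.append((label, head, x_verb))  # (2) old governor now governs X
        else:
            new_edges.append((label, head, dep))
    new_edges.append((relation_r, x_verb, b_verb))   # (1) R from X to B
    return new_edges

# "Who did you say that Mary saw ?": from the elementary tree
# [("SENT", "ROOT", "saw"), ("obj", "saw", "who")], adjoining "say" gives
# [("SENT", "ROOT", "say"), ("obj", "saw", "who"), ("sentobj", "say", "saw")]
# -- the object dependency from saw to who is stretched, not revised.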

6.4.5 An Implementation at the Functional Level

An important theoretical consideration should be made first. In GB theory, the internal argument explicitly moves to the non-argument CP-Spec position, and is thus attached farther (procedurally speaking, later). In our illustration above, we exploit the coincidental fact that the WH-object in its non-canonical position is attached farther (procedurally later) due to its position. This may appear as a confusion of design and coincidence, and of grammar formalism and parsing algorithm. The DG grammar formalism per se leaves the procedural order of attachment and formal grammar bar-level ordering completely underspecified (see chapter 3). We may attach WH-pronouns at any time during the parsing process, as long as we recognise them, and as long as the WH-pronoun and its governor are accessible. After a subordinate clause has been attached to the matrix clause, the subordinate clause is no longer accessible. There is one obvious moment when all phrase-level constituents are accessible: before parsing starts, all chunks form a flat sequence of minimal constituents. At this moment our implementation temporarily suspends the adjacency constraint (which normally ensures that only adjacent phrases can be combined), allowing sentence-initial WH-pronouns to attach to any verb4. In non-subject WH-questions, the WH-pronoun appears at the front of the sentence rather than in its usual post-verbal position. The first implemented approach is based on pre-parsing: in WH-question sentences, before the main parsing is started, the WH-pronoun pre-parses as subject, object, adjunct, or PP-attachment with each verb, and as complement with each stranded preposition (which will modify a verb or a noun).

4This pre-parsing approximation to Adjoining cannot always guarantee that only mildly context-sensitive structures are generated. Also, island constraints are not checked. The latter restriction is defended in section 6.4.8.
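The pre-parsing step described above can be sketched as follows. This is our Python illustration; the chunk encoding and tag inventory are simplified assumptions.

# Sketch of pre-parsing for WH-questions: before the main (CYK) parse, the
# sentence-initial WH-pronoun is tentatively combined with every verb
# chunk under each possible relation; the adjacency constraint is
# suspended for these pairings.

WH_RELATIONS = ["subj", "obj", "adj", "pobj"]

def preparse_wh(chunks):
    """chunks: list of (form, tag) pairs for the chunked sentence.
    Returns candidate WH attachments to be scored by the main parser."""
    if not chunks or chunks[0][1] not in ("WP", "WRB"):
        return []
    wh = chunks[0][0]
    candidates = []
    for form, tag in chunks[1:]:
        if tag.startswith("V"):            # every verb chunk is a candidate
            candidates += [(rel, form, wh) for rel in WH_RELATIONS]
        if tag == "IN":                    # stranded preposition
            candidates.append(("pobj", form, wh))
    return candidates

# preparse_wh([("What", "WP"), ("do", "VBP"), ("you", "PRP"), ("think", "VB")])
# yields subj/obj/adj/pobj candidates pairing "What" with "do" and "think".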

6.4.6 Ambiguous WH-attachment

The attachment of the WH-pronoun is sometimes ambiguous, since it can be unclear from which clause the WH-pronoun originates. Compare the following examples.

(86) What1 do you think (that) Peter believes I contribute to t_1 ?

(87) ? What1 do you eat (that) Peter believes I contribute to t_1 ? ? What1 do you eat t_1 (that) Peter believes I contribute to ?

Example 86 has a matrix clause verb, think, that prefers a sentential complement over an NP object complement. Example 87, in contrast, has a matrix clause verb, eat, that prefers an NP object complement over a sentential complement, and it seems infelicitous. The empirical question [1] arises whether sentences like 87 are really much less frequent than sentences like 86, which would support our assumption that 87 is infelicitous. If the answer is positive, then we can restrict movement to a closed class of verbs that license movement. The matrix verb seems to play an important role for the decision whether the WH-pronoun attaches locally to the matrix clause or originates from a subordinate clause, but there may be other factors. Consider the following sentences:

(88) What1 did you say that Mary eats t_1 ?

(89) Why1 did you say t_1 that Mary eats ?

Example 89 has a WH-pronoun that typically refers to an adjunct. Ross (1967) already observed that extraction from inside adjuncts is impossible and formulated the adjunct island constraint.

(90) *Who1 did John fire Bill [after he met t_1] ?

Huang (1982) describes that extraction of adjuncts out of a WH-island is usually impossible.

(91) *How2 do you wonder [which problem1 John could solve t_1] t_2

But extraction of adjuncts is in general thought to be possible, although adjuncts are less easily extractable than objects (Ouhalla, 1999, 269). One may be tempted to think that WH-pronouns referring to adjuncts usually do not move, since adjuncts can attach to every verb; no subcategorisation errors or selectional restriction violations would occur that trigger human parsers to expect the WH-pronoun to originate from a subclause. The empirical question [2] arises whether in examples like 91 there are really much fewer cases in which the adjunct originates from a subordinate clause. If the answer is positive, then we can exclude adjunct WH-pronouns why, where, when from licensing movement in a robust parser. There is ample theoretical linguistics literature based on introspection to answer both questions [1] and [2]. A simple answer to [1] is that think does not have the right subcategorisation frame. A possible answer to [2] is that extraction of adjuncts is in general thought to be possible, although adjuncts are less easily extractable than objects (Ouhalla, 1999, 269). Since introspection can on the one hand lead to the invention of complex grammatical examples that people never produce, while on the other hand important examples may be forgotten, the use of a corpus linguistics approach in complementation to a theoretical linguistics approach is recommendable, supplementing psycholinguistic research with empirical data. The scientific methodology of descriptive linguistics based on corpora is a key component of an empirical-science approach to linguistics.

A corpus-based survey of ambiguous WH-pronoun attachment

We answer the two questions raised with a corpus-based approach and compare our results to rationalist approaches. Since the Penn Treebank does not contain many questions, we needed to use a bigger corpus, the British National Corpus, henceforth BNC (Aston and Burnard, 1998). We have searched the BNC using BNCWeb CQP (Evert and Hoffmann, 2006) with two word-tag pair queries. The two queries are 6 resp. 7 words long. They are identical except for the presence versus absence of a complementizer. Word 1 is the WH-pronoun. Word 2 is the auxiliary verb (although auxiliary status is not enforced). Word 3 is the pronoun you. Word 4 is the main matrix verb. Word 5 is a complementizer in the first query. The last two words are the subject and the main verb of the subordinate clause. The subordinate clause subject was restricted to be a pronoun in order to exclude some mismatches in which the that was a determiner. The first query delivers 28 matches. The second query delivers 862 matches. This reveals that the version with a non-zero complementizer is marginal. We have manually checked the 28 matches of the first query. 8 were accidental hits, mostly involving a causative verb at word 2. The remaining 20 matches are too few to answer questions [1] and [2] reliably. 10 matches have the argument WH-pronoun what, 8 have the adjunct WH-pronoun why, 2 have the adjunct WH-pronoun when. Of the 10 argument WH-pronouns what, 6 attach in the matrix clause (5 of which involve relativisation, e.g. So what have you got that I 've not seen ? (CE0 573)), 4 originate from subordinate clauses. There is a strong tendency to allow movement for argument WH-pronouns. Of the 10 adjunct WH-pronouns why, when, 9 attach in the matrix clause; only 1 probably originates from a subordinate clause.

(92) Why do you think that he said said to court he was gon na go for? (JNT 61)

Still, there is a strong tendency to block movement for adjunct WH-pronouns. These findings seem to answer question [2] positively, but the data is sparse. As for question [1], in all cases where the main matrix verb is think, say, or suggest, the WH-pronoun originates from a subordinate clause, and otherwise not. This seems to answer question [1] positively, but the data is sparse. We have manually checked a 50-item random sample of the 862 matches of the second query. There are 5 accidental hits, mostly involving a causative verb at word 2. 24 matches have the argument WH-pronoun what, 5 matches have the argument WH-pronoun who, 8 have the adjunct WH-pronoun when, 5 have the adjunct WH-pronoun why, 2 have the adjunct WH-pronoun when. Addressing question [2], we were surprised to see that almost all matches, irrespective of whether they have an argument or adjunct WH-pronoun, originate from a subordinate clause. There is only one match where attachment in the matrix clause is semantically possible.

(93) "Well," I said, "I thought you were in the Mafia, but it’s not much of a Mafia name." "Why do you think I was in the Mafia?" "Because you’re a landlord, I suppose," I said, ...(FRH 2018-2020)

The evidence suggests that question [2] needs to be answered negatively when a zero-complementizer is used. Zero-complementizers are a stronger indicator of movement than the question whether a WH-pronoun refers to an argument or adjunct. There is a significant semantic difference between a complementizer and a zero-complementizer. As far as the extraction of subjects is concerned, it is well known in theoretical linguistics that subjects, unlike objects, cannot be extracted if the subordinate clause has an overt complementizer, the so-called that-trace effect.

(94) Who1 did you think t_1 would fix the car ?

(95) *Who1 did you think that t_1 would fix the car ?

(96) What1 did you think John would fix _1 ?

(97) What1 did you think that John would fix t_1 ?

This subject-object asymmetry led to a reformulation of the Empty Category Principle (ECP) in GB. The ECP states that non-pronominal empty categories need to be governed. It is based on the observation that objects of verbs are governed by a lexical category, while subjects of finite clauses are governed by the non-lexical category I. Subsequently, lexical government is assumed to be a stronger form of government, so-called proper government5. The ECP is then reformulated to state that non-pronominal empty categories need to be properly governed. But extraction of adjuncts is thought to be insensitive to the presence or absence of a complementizer. It is admitted that adjuncts are generally less easily extractable than objects (Huang, 1982), but it is generally assumed that adjunct-extraction differs from subject-extraction, with only the latter being sensitive to the presence or absence of the complementizer (Rizzi, 1995, 46), (Ouhalla, 1999, 269). Our small corpus investigation reveals that although adjuncts may theoretically pattern like objects, in practice they usually pattern like subjects: empirically, most WH-adjuncts with a zero-complementizer subclause have moved, and most WH-adjuncts with a non-zero complementizer subclause have not moved. In practice, non-zero complementizers typically block WH-movement. This object-adjunct asymmetry cannot be explained in terms of proper government.

(98) What1 do you think (that) Peter believes I contribute to t_1 ?

(99) ? What1 do you eat (that) Peter believes I contribute to t_1 ?
     ? What1 do you eat t_1 (that) Peter believes I contribute to ?

5Proper government also includes antecedent-government.

Before we describe the implementation, let us summarise our findings. We have seen that embedded WH-questions with a non-zero complementizer are very rare. This has no direct impact on our implementation. We have seen that with regard to question [1] our assumption holds: a very small group of matrix verbs occurs, dominated by think. We conclude that restricting movement to a closed class of matrix verbs, epistemic verbs, is a good approximation to real-world occurrences. We call these verbs licensing verbs. With regard to question [2], we have seen that in non-zero complementizer cases our assumption also holds: adjunct WH-pronouns seem to block movement. We have then discovered that zero-complementizers show a different behaviour: almost all adjunct WH-pronouns have moved, i.e. originate from a subordinate clause. Now we describe our implementation. We have taken the answer to question [1] into account by restricting our algorithm to allowing movements to stretch only across clauses whose main verb has more clausal complements than NP object complements in the Penn Treebank. Figures 6.20 and 6.21 show the actual parser output for sentences 98 and 99. As a consequence of the answer to question [2], we have extended our algorithm to block movement of an adjunct WH-pronoun across a complementizer.

(100) Why1 do you think t_1 that Peter believes ?

(101) Why1 do you think Peter believes t_1 ?

Figures 6.22 and 6.23 show the actual parser output for sentences 100 and 101. We have said that our first implemented approach was based on pre-parsing: in WH-question sentences, before the main parsing is started, the WH-pronoun pre-parses as subject, object, adjunct, or PP-attachment with each verb, and as complement with each stranded preposition (which will modify a verb or a noun). Such an approach has the disadvantage that some necessary knowledge about the sentence structure is not available before the parsing starts. For example, non-licensing verbs may occur linearly but not structurally between a licensing verb and the attaching verb: in What do you think the car which stands in my garage may cost? the non-licensing verb stand would block the attachment of what to cost. We have therefore made a second implementation, an online version of the above pre-processor, in which a module similar to the recursive relative clause post-processor is used. At the moment where the WH-element and the matrix verb are reduced, it checks whether the matrix verb has (possibly recursive) sentobj relations whose head(s) fulfil the verb-chain requirement, i.e. whose verbs are all licensing.

[Figure 6.20: Actual parser output for example sentence 98]

[Figure 6.21: Actual parser output for example sentence 99]

[Figure 6.22: Actual parser output for example sentence 100]

[Figure 6.23: Actual parser output for example sentence 101]
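The verb-chain requirement can be sketched as follows. This is our Python illustration; the licensing verb set is abridged from the BNC counts above, and the graph encoding is an assumption.

# Sketch of the licensing check used when a WH-element is attached across
# clauses: movement may only stretch across verbs that license it
# (epistemic verbs such as think, say, believe), and adjunct WH-pronouns
# are blocked by an overt complementizer.

LICENSING = {"think", "say", "believe", "mean", "know", "suggest", "suppose"}

def wh_may_attach(edges, matrix_verb, target_verb, wh_is_adjunct, has_compl):
    """Follow sentobj links from matrix_verb down to target_verb and check
    that every verb on the chain (except the target) is licensing."""
    if wh_is_adjunct and has_compl:
        return False          # adjunct WH blocked by an overt complementizer
    verb = matrix_verb
    while verb != target_verb:
        if verb not in LICENSING:
            return False      # a non-licensing verb interrupts the chain
        subclauses = [d for (lbl, h, d) in edges if lbl == "sentobj" and h == verb]
        if not subclauses:
            return False      # no path down to the target verb
        verb = subclauses[0]  # assume a single clausal complement here
    return True

# "What do you think Peter believes I contribute to ?":
# wh_may_attach([("sentobj", "think", "believe"),
#                ("sentobj", "believe", "contribute")],
#               "think", "contribute",
#               wh_is_adjunct=False, has_compl=False)  # -> True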

6.4.7 Subjacency and Barriers

A potential shortcoming of our approach is that barriers (Chomsky, 1986) are not checked. For example, the Subjacency condition restricts how far one movement step across a clause level can go (Chomsky, 1981).

(102) Which book1 did Ruth report [ _1 that Judith damaged _1].

The Subjacency condition says that the locality of WH-movement is restricted by the number of certain types of nodes, so-called bounding nodes, that can be crossed. Traditionally, NP and IP are assumed to be bounding nodes6. Subjacency requires at most one bounding node to intervene in each stepwise movement. This licenses 103 but not 104, in which two bounding nodes (an IP and an NP) intervene.

(103) Which book1 did Ruth report [ _1 that [IP Judith damaged _1]].

(104) *Which book1 did Ruth believe [NP the report] [ _1 that [IP Judith damaged _1]].

A robust system does not aim at deciding whether given input is acceptable or not, but analyses naturally occurring text, which may also contain partly, though not completely, unacceptable utterances. The purpose of Pro3Gres is the analysis of naturally occurring language; it is neither a language generator nor a grammar checker. Analysis of naturally occurring language requires disambiguation if several analyses are possible, as we have seen in the case of the attachment of the WH-pronoun. We have found that the matrix verb and the presence or absence of a complementizer are major factors for the resolution of this ambiguity. We have implemented our findings. Checking for acceptability constraints such as barriers is largely an irrelevant task for a robust parser7. It is relevant, however, if such constraints disambiguate, as we have seen in 6.4.6. In WH-questions across several clauses, the WH-pronoun can attach to any potential verb governor. GB theory has developed the concept of Minimality, which is part of Barriers theory, to license some structures and rule out others, thereby disambiguating in situations where several governors are possible.

6Some cross-linguistic variation is assumed. Cinque (1990) argues that in Italian CP is a bounding node.

7If desired, a post-processing module that traverses the syntactic tree, accepting or rejecting it if barrier violations are encountered, can be employed.

6.4.8 Minimality and Relativized Minimality

The concept of Minimality has been introduced by Chomsky (1986) and revised by Rizzi (1995). The Minimality condition is a locality constraint which specifies that a potential governor is ruled out from governing inside the domain of another governor. Informally speaking, in a configuration [XP ... X [YP ... Y ... ZP] ... ], X is ruled out from governing ZP, because Y already governs ZP. It is a condition on chains. We have discussed in section 3.1.2 that GB government and DG dependency are largely equivalent. Disambiguation between different possible governors thus seems a typical attachment and hence parsing problem. GB distinguishes between two kinds of government: head-government and antecedent-government. Head-government is similar to DG dependency, but what about antecedent-government? Antecedent-government licenses or rules out coreferences. From a DG perspective, antecedent-government is not a relation between a head and a dependent, but a relation between peers, between an overt and a covert dependent. Head-government and antecedent-government together form a triangular relation expressing long-distance dependencies by connecting a governor to its surface dependent via its deep dependent. We have seen that the large majority of long-distance dependencies can be underspecified, and that the relation from the governor to its surface dependent can be expressed directly if it is clause-internal, or by post-processing if it crosses clause boundaries. The one exception is WH-dependencies across clause boundaries, because the WH-element cannot attach in a functionally correct way in the matrix clause. Every such configuration is syntactically ambiguous, because the WH-pronoun could also attach to the matrix clause. In fact, if attachment to the matrix clause is plausible, human parsers will attach it there, choosing the most local plausible attachment. This is the central insight of Minimality and Relativized Minimality. The difference between Minimality and Relativized Minimality is that Relativized Minimality distinguishes between the three types of movement: head movement, A movement and A-bar movement. Only intervening movements of the same type block antecedent-government.

(105) How1 do you expect [ to solve the problem t_1? ]

(106) *How1 do you expect [ which problem2[ to solve t_2 t_1 ?]]

In 106, the intervening NP specifier blocks how from antecedent-governing its trace, because a more local antecedent governor, which problem, is available. This means that only a reading in which how is attached to the matrix verb expect can be licensed.

(107) *How_1 do you expect [ which problem_2 [ to solve t_2 t_1 ? ]]

(108) How_1 do you expect [ (that) [ Peter solves the problem t_1 ? ]]

The A movement of the subject from V-Specifier to I-Specifier does not affect antecedent-government of the trace. Minimality in its original formulation would not have licensed 108.

Let us now turn to the situation in our parser. We have seen that A movement can be treated with post-processing, because the surface position attaches to the local clause. Head movement is clause-internal (see e.g. Frank (2002)). Disambiguation for these cases, if several readings are possible, is thus a context-free parsing problem, treated by the lexicalized disambiguation model. The only case that we explicitly disambiguate is the case where several A-bar movements are involved. In example 107, the intervening A-bar element blocks the pre-parsing step; an alternative reading in which how attaches to the matrix verb expect is attempted, which does not lead to a full analysis. In 108, pre-parsing successfully attaches the WH-pronoun to the subordinate verb if no complementizer is present, and to the matrix verb if a complementizer is used.

6.5 TAG Adjoining and LFG

Tesnière (1959)’s original Dependency Grammar (DG) concept aims at being a proto-semantic, monostratal, language-independent theory rather than merely a syntactic theory^8. In LFG terms, he always challenged the need for c-structure. His strategy is to parse surface text (ordre linéaire) directly to f-structure (ordre structural), in which word order plays no primary role, but may of course help to disambiguate in a secondary role, for example by preferring projectivity. Bröker, Hahn, and Schacht (1994) refer to DG as an LFG that only knows f-structure.

6.5.1 Functional Uncertainty

Functional uncertainty allows long-distance dependencies to extend across an unlimited, recursive path in LFG f-structures. Subordinate clauses appear as a COMP or XCOMP (the latter for control) dependent in f-structure; accordingly their recursion (expressed by the Kleene star) is COMP* or XCOMP*.

According to Tree-Adjoining Grammar (TAG), the only context-sensitive operation that is needed to express natural language is Adjoining, from which LFG functional uncertainty has been shown to follow as a corollary (Joshi and Vijay-Shanker, 1989). Functional uncertainty, which is expressed on the level of f-structure, would then be the only extension needed to an otherwise context-free processing of natural language.

^8 An extended version of the argument made in this subchapter can be found in Schneider (2005).

6.5.2 A chunks-and-f-structure version of LFG

We therefore suggest that, since f-structures can then be derived context-freely, full-fledged c-structures are not strictly needed in LFG, and that chunks and dependencies may be sufficient for a formal grammar theory. The chunks & dependencies model has been suggested by Abney (1995). Frank (2003) presents an (albeit non-probabilistic) chunks & dependencies model for LFG. Chunks can be freely combined, subject to adjacency and projectivity constraints, which leads to a context-free parsing algorithm. Except for the added book-keeping functional annotations, her parsing algorithm is akin to CYK, which we use.

A major motivation for c-structure has been its context-freeness. We have shown that the majority of long-distance dependencies can be expressed in a context-free way by extending locality to the clause level, and can thus be expressed by a local dependency. Functional theories (LFG and DG alike), if they use mild context-sensitivity in the form of Adjoining, functional uncertainty, or another form of recursion over functional or clausal structures, can then obviate all other forms of long-distance dependencies. We can then parse for f-structure in a context-free way. Combining (1) Frank (2004)'s insights on restricting grammatical complexity in TAG, (2) LFG's invention of functional uncertainty on f-structures, and (3) Joshi and Vijay-Shanker (1989)'s suggestion that structures modelled by LFG functional uncertainty and TAG mild context-sensitivity are equivalent, c-structures can be obviated for syntactic analysis.

6.6 Conclusions

We have discussed how we extract local and non-local lexical information from the Penn Treebank. We have shown that the vast majority of long-distance dependencies can be modelled locally in a functional representation (such as Functional DG) by extending locality to the clause level, to which DG locality naturally extends. Of the remaining, trans-clausal long-distance dependencies, most can be treated recursively (A-movement, such as control-structure dependencies): they can safely be treated by post-processing, and parsing can then stay context-free. Furthermore, WH-question pronouns (the only A-bar movement across clause boundaries) can be treated by extending context-free parsing with the mild context-sensitivity known from TAG. Mild context-sensitivity is a form of recursion over syntactic structures in TAG or, equivalently, recursion over f-structure in LFG.

Following these considerations, the LFG suggestion by Frank (2003), and our broad-coverage evidence (Schneider, Dowdall, and Rinaldi, 2004; Rinaldi et al., 2004a; Rinaldi et al., 2004; Weeds et al., 2005), we suggest that c-structures or other configurational "surface" representations can be obviated for the robust broad-coverage syntactic analysis of natural language. Doing so reduces grammar complexity (Frank, 2002; Frank, 2004), reduces parsing complexity to mostly context-free parsing and finite-state based chunking (Schneider, 2003; Schneider, 2004), and bridges the gap between language engineering and Formal Grammar (Kaplan et al., 2004a). We conclude that chunks and dependencies (Abney, 1995; Frank, 2003) are sufficient for robust broad-coverage parsing of natural language.

We have sketched a version of TAG Adjoining in DG and discussed our implementation. We do not provide a formal proof of equivalence between TAG Adjoining and our pre-parsing approach, and we cannot be certain that we treat all English long-distance dependencies, but we have shown by quantitative and qualitative evidence that our local approach successfully treats the vast majority of English long-distance dependencies, on a token-based as well as on a type-based count. We believe that a formal proof, or difficult claims about total coverage of long-distance dependencies, are neither necessary nor beneficial for a robust system optimising on precision and recall. We have extensively evaluated our approach, including long-distance dependencies, and present detailed results in chapter 7.

Traditional wisdom has it that a grammar formalism is either deep-syntactic or context-free. We have shown that functional DG is mostly context-free, but at the same time deep-syntactic, i.e. it expresses all major long-distance dependencies. Only non-argument position (A-bar) relations across clause boundaries, i.e. complex WH-movements, need context-sensitivity. Mild context-sensitivity, which is well studied, is sufficient (Frank, 2004; Kuhlmann and Nivre, 2006); it extends locality beyond the clause under clearly defined conditions. We have discussed our implementation of mild context-sensitivity and presented how we disambiguate attachment ambiguities of the WH-pronoun, based on a corpus study.

Chapter 7

Evaluation and Discussion

7.1 Introduction

In this chapter we evaluate the performance of the Pro3Gres parser. First we discuss different evaluation metrics and present the evaluation method that we use. In section 7.2 we evaluate on GREVAL, a standard 500 sentence test corpus (Carroll, Minnen, and Briscoe, 2003), and compare to the related works of Carroll and Briscoe (2002), Lin (1998) and Collins (1999). In section 7.3 we tentatively compare to Nivre's MaltParse. In section 7.4 we evaluate on a random subset from the GENIA corpus of biomedical texts (Kim et al., 2003a). In section 7.5 we present and evaluate methods exploring the trade-off between precision and recall. In section 7.6 we compare against the baseline and explore the contribution of the statistical models for lexical disambiguation and distance.

7.1.1 Traditional Syntactic Evaluation: Labelled Bracketing

Evaluations compare automatically annotated data, the so-called candidates, to manually, carefully annotated data, the so-called gold standard. The gold standard is assumed to be error-free. A candidate is assumed to be correct if its annotation coincides with the one in the gold standard. Errors are partitioned into recall errors and precision errors. Recall measures how much of the data in the gold standard is recovered in the candidate, and precision measures how much noise is in the candidate data returned by the automatic procedure. For a more detailed introduction to the topic of evaluation in CL and syntactic parsers, readers are referred to Jurafsky and Martin (2000, p. 464). A widely-used evaluation method is PARSEVAL (Black et al., 1991), which

measures the accuracy of syntax trees delivered by a constituency-based parser as follows:

labelled recall = # of correct constituents in candidate / # of constituents in gold standard

labelled precision = # of correct constituents in candidate / # of all constituents in candidate

cross-brackets = # of brackets crossing between candidate and gold standard

While the PARSEVAL evaluation measure has been used widely, it cannot be directly applied to a dependency-based parser. Also, PARSEVAL has come under serious criticism, raising doubts about how reliable its results are, as we discuss in the following.
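Concretely, the three PARSEVAL measures reduce to set operations over labelled spans. The following is a minimal sketch, assuming constituents are represented as (label, start, end) tuples; the function name and data representation are our own illustration, not part of any published PARSEVAL implementation.

def parseval(candidate, gold):
    """Return labelled precision, labelled recall, and cross-brackets."""
    cand, ref = set(candidate), set(gold)
    correct = cand & ref                     # same label and same span
    precision = len(correct) / len(cand)
    recall = len(correct) / len(ref)

    def crosses(a, b):
        # two brackets cross if their spans overlap without nesting
        (_, s1, e1), (_, s2, e2) = a, b
        return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1

    cross_brackets = sum(1 for c in cand if any(crosses(c, g) for g in ref))
    return precision, recall, cross_brackets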

7.1.2 Dependency-Based Evaluation: Lin 1995

Lin (1995) suggests evaluating on the intuitive level of dependencies rather than on the constituency level. He points out that PARSEVAL suffers from the following serious problems.

PARSEVAL may count a single error multiple times

Given the following gold standard annotation

(109) [I [saw [[a man][with [[a dog] and [a cat]]]][in [the park]]]]

and the two following incorrect parser analyses

(110) [I [saw [[a man][with [[a dog] and [[a cat][in [the park]]]]]]]]

(111) [I [saw [a man] with [a dog] and [a cat][in [the park]]]]

PARSEVAL yields a counter-intuitive assessment of the quality of the analyses. In (110) we have only one error, and it involves a difficult construction: a PP-attachment error. The PP in the park has been attached to cat instead of to saw. According to PARSEVAL, this constitutes 3 crossing brackets:

1. [a dog and a cat] vs. [a cat in the park]

2. [with a dog and a cat] vs. [a dog and a cat in the park]

3. [a man with a dog and a cat] vs. [with a dog and a cat in the park]

Analysis (110) thus scores a recall of 6/10 and a precision of 7/11. (111), on the other hand, is a very shallow, insufficient analysis. As it has no crossing brackets, however, it scores a higher recall of 7/10 and a higher precision of 7/7. Lin (1995) mentions the following additional PARSEVAL problems.

Constituency-based approaches often have low agreement between parsing schemes for some constructions

This criticism has partly been answered since Lin (1995) by removing certain bracketing information from consideration in PARSEVAL, for example negation, auxiliaries, punctuation, or traces.

No distinction between different processing levels

For cascading systems, different levels, which are often marked by differing degrees of difficulty, cannot be distinguished. For Pro3Gres, for example, chunking vs. parsing performance could not be distinguished. Also, the fact that base-NP internal syntax is left underspecified means that standard PARSEVAL cannot be applied.

7.1.3 Desiderata

A maximally flexible and informative evaluation should respect the following criteria.

Selective evaluation to support error analysis: Results should be classifiable by different syntactic phenomena, for example to identify more and less difficult phenomena, to identify which parser is particularly suitable for which construction, or to distinguish between semantically more and less serious errors. Grammatical relations provide such a division into meaningful classes naturally.

Ability to ignore inconsequential differences: It should be possible to ignore differences that are inconsequential, either for a certain task or because of semantic closeness. A task-specific example is the biomedical application of Pro3Gres, which focuses on the discovery of potential relations between proteins, where truth-value and modality modifications (typically expressed by adverbs and sentence-subordination) are of secondary importance. An example of semantic closeness is the distinction between full clauses and small clauses: it can be argued that a parsing scheme that does not distinguish between the two should not be punished.

[Figure 7.1: The hierarchy of grammatical relations suggested in GREVAL. dependent branches into mod (ncmod, xmod, cmod), arg_mod (passive agent), and arg; arg into subj (ncsubj, xsubj, csubj) and comp; comp into obj (dobj: first, obj2: second, iobj: prepositional) and clausal (xcomp: control, ccomp: no control).]

Facilitate error diagnostics: it would be helpful if an evaluation specified what actually goes wrong and helped the developer pinpoint the error source. In the above example (110), it is helpful to know that a PP-attachment error occurred. An analysis that uses perspicuous and intuitive grammar concepts and annotation schemes further supports developers or users in this task.

Following Lin's criticisms and suggestions as well as the above desiderata, an evaluation based on grammatical relations, instead of mapping to constituency and evaluating with PARSEVAL, seemed appropriate for evaluating the performance of Pro3Gres. For dependency parsers, it has become customary to use grammatical relations for evaluation, for instance in Buchholz and Marsi (2006).

7.1.4 An Annotation Scheme for Evaluation: Carroll et al. 1999, 2003

Carroll, Minnen, and Briscoe (1999) and Carroll, Minnen, and Briscoe (2003) present a dependency-based evaluation scheme, GREVAL (GR stands for grammatical relations), and offer a dependency-based manual annotation of 500 random sentences from the Susanne corpus to the linguistic community. Their test corpus and a version of their evaluation scheme have been used in the following. They suggest a hierarchy of relations that on the one hand allows us to do the selective evaluations demanded by Lin (1995), while on the other hand total figures can also be given. Since relations inside base NPs remain underspecified in Pro3Gres at parse-time, a mapping effort and post-processing were needed to deliver such total figures. In section 7.2 we conduct extensive selective evaluations.

ncmod(_, flag, red). % a red flag ncmod(on, flag, roof). % flag on the roof xmod(without, eat, ask). % he ate the cake without asking cmod(because, eat, be). % he ate the cake because he was hungry arg_mod(by, kill, ’Brutus’). % killed by Brutus ncsubj(she, eat, _). % she was eating xsubj(win, require, _). % to win the America’s Cup requires heaps of cash csubj(leave, mean, _). % that Nellie left meant she was angry dobj(read, book,_). % read books dobj(mail, ’Mary’, iobj). % mail Mary the contract (3rd arg is initial_gr) iobj(in, arrive, ’Spain’). % arrive in Spain obj2(give,present,_). % give Mary a present xcomp(to, intend, leave). % Paul intends to leave xcomp(_, be, easy). % Swimming is easy xcomp(in, be, ’Paris’). % Mary is in Paris ccomp(that, say, leave). % I said that he left

Table 7.1: Examples of grammatical relations in the GREVAL scheme

The hierarchy of relations (Carroll, Minnen, and Briscoe, 2003, 303) is illustrated in figure 7.1 (the subj_or_dobj relation is left out, as it is extremely rare in English). As a rule of thumb, relations are distinguished along three oppositions: adjunct/modification (mod) versus argument (complement, arg); clausal (c) versus non-clausal (e.g. nominal, nc); and control (x) versus no control, for example He_1 wants [t_1 to leave] (control) vs. He says [that she left] (no control). Examples are illustrated in table 7.1.

7.2 GREVAL: A standard 500 sentence test corpus

The 500 sentence evaluation corpus GREVAL has been introduced above. Its format is similar to the Pro3Gres parser output, but not identical. Therefore, a mapping function is necessary. Crouch et al. (2002) warn that mapping is a true challenge, and that, due to mapping, results can only be indicative. A small selection of examples that lead to spurious errors is discussed in appendix B.

7.2.1 Bidirectional Mapping of Pro3Gres to GREVAL

As a starting point, one could assume a naive direct mapping, where the c-subscript (short for Carroll et al.) indicates GREVAL relations: subj corresponds to ncsubjc; obj corresponds to dobjc; pobj corresponds to iobjc; modpp corresponds to ncmodc; etc. Such a mapping only partly works, because it misses a number of differences between the Pro3Gres output and the GREVAL format. For example: Pro3Gres makes no adjunct/complement distinction for PPs; Pro3Gres' use of Tesnière translations complicates the picture; and often different grammatical assumptions are made, e.g. GREVAL does not consider relative pronoun antecedents to be subjects or objects. The mapping thus becomes more involved, as will be explained in the following. While the mapping that we introduce addresses many of the differences, a considerable number of differences remain unmapped and can lead to spurious errors, as is discussed in the appendix and in the detailed analysis of errors. A graphical summary of the mapping is presented in figure 7.2.
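The naive direct mapping can be pictured as a simple lookup table. The following sketch is purely illustrative; the dictionary and function names are ours and not part of the Pro3Gres code.

# Naive direct Pro3Gres-to-GREVAL label mapping, as described above.
NAIVE_MAP = {
    "subj":  "ncsubj",   # subject    -> non-clausal subject
    "obj":   "dobj",     # object     -> direct object
    "pobj":  "iobj",     # verbal PP  -> prepositional object
    "modpp": "ncmod",    # nominal PP -> non-clausal modification
}

def map_relation(pro3gres_label):
    """Return the GREVAL counterpart, or None where the naive mapping fails."""
    return NAIVE_MAP.get(pro3gres_label)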

Subjects

The subjects of reduced relative clauses, which are expressed in the Pro3Gres modpart relation, are subjects in GREVAL (ncsubjc). Each subj and each modpart relation is thus mapped to an ncsubjc relation.

The antecedent of a relative pronoun is not assumed to be a subject in GREVAL. A long-distance subj relation expressing a relative pronoun antecedent corresponds to a cmodc relation with a relative pronoun in the gold standard. cmodc is also used for other types of clausal modification, but checking for a relative pronoun in the correct argument position ensures that only cmodc relations expressing the resolution of relative pronouns are mapped.

Object

No distinction between the primary object obj and the secondary object obj2 is made at this stage. The mapping is straightforward, as figure 7.2 shows.

Subject:        subj OR modpart ↔ ncsubjc OR cmodc (with relative pronoun)
                Legend: ncsubjc = non-clausal subject; cmodc = clausal modification, used e.g. for relative clauses

Object:         obj OR obj2 ↔ dobjc OR obj2c
                Legend: dobjc = first object; obj2c = second object

PP-attachment:  modpp OR pobj ↔ ncmodc (with prep) OR iobjc (with prep) OR arg_modc OR xcompc (with prep)
                Legend: ncmodc = non-clausal modification; iobjc = prepositional object; arg_modc = passive agent; xcompc for PP-attachment to copular verbs

Clausal:        sentobj OR modrel OR modpart OR obj (with copular verb) OR pobj (with copular verb) OR predadj ↔ xcompc OR xmodc OR ccompc OR cmodc
                Legend: xcompc = clausal complement, control; xmodc = clausal modifier, control; ccompc = clausal complement, overt subordinate subject; cmodc = clausal modifier, overt subordinate subject

Figure 7.2: Bidirectional mapping of the Pro3Gres output to the GREVAL format. GREVAL relations bear a c-subscript

PP-attachment

The PP-attachment relations, modpp for nominal attachment and pobj for verbal attachment, typically correspond to the following GREVAL relations: ncmodc, non-clausal modification, which expresses adjunct dependencies; iobjc, indirect object, which expresses prepositional arguments; and arg_modc, which expresses the by agent of a passive verb. Such a mapping is not satisfactory, however. The PP-attachment mapping would miss cases in which the PP-internal noun is a verb that has undergone Tesnière translation to a noun, for example in the dangers of swimming_VBG. Therefore, a PP-attachment relation that has a clausal modification counterpart xmodc should also be counted as correct. xmodc typically expresses clausal modification. It is important to restrict the mapping to a PP-attachment relation to cases where xmodc contains a preposition, in order to get a bidirectional mapping.

Clausal Relations

The mapping needed for clausal relations is quite complex, as illustrated in figure 7.2. Clausal relations in Pro3Gres are sentobj for subordinated sentences, which typically corresponds to xcompc or ccompc, and modrel for relative clauses. But the reduced relative clause modpart relation also expresses a clausal dependency. Both relative clauses and reduced relative clauses are typically expressed by cmodc in GREVAL. Complements of copular verbs, which we express by an obj relation if the complement is nominal, predadj if the complement is an adjective, or pobj if the complement is a PP, are also clausal relations. Since we do not distinguish between clausal adjuncts and arguments, sentobj corresponds to either xcompc or xmodc in a control situation, and to either ccompc or cmodc if the subordinate clause has an overt subject.

Results

The currently best results using this mapping are given in table 7.2. We have used Ratnaparkhi's Maximum Entropy tagger for this result. The mapping that we have used is not perfect. For example, the assumption is made that each modpart relation corresponds to an ncsubjc relation. This is not true if the modpart relation has a present participle governor, for example in a case involving/VBG seven persons, where it should correspond to an object. If we exclude these modpart relations from the subject evaluation, precision increases to 92.7%.

            Subject            Object            PP-attachment      Clausal
Precision   92.3% (865/937)    85.3% (353/414)   76.9% (702/913)    74.3% (451/607)
Recall      78.0% (865/1095)   82.5% (353/428)   68.6% (702/1023)   61.7% (451/731)

Table 7.2: Currently best results on evaluating Pro3Gres on GREVAL test corpus on subject, object and PP-attachment relations

Since the GREVAL annotation does not contain part-of-speech tag information, it is not possible to exclude these modpart relations from the calculation of recall; otherwise the mapping is not bidirectional. We thus had to accept the spurious errors that the bidirectional mapping introduces. Results are therefore a lower bound. We get slightly above 90% subject precision; object precision is at about 85%. Subject and object recall are about 80%. PP-attachment precision is about 75%, recall about 70%. For clausal relations, we get almost 75% precision, but only about 60% recall.

Detailed Analysis of Subject Precision Errors

In order to get an impression of the error characteristics, the subject precision errors from the entire GREVAL corpus have likewise been classified, in table 7.3. Chunking and tagging errors are the primary source of error for subject precision. Parsing errors, which include attachment errors, are less frequent.

7.2.2 Unidirectional Mapping of Pro3Gres to GREVAL

The mapping which we just presented has a few shortcomings. First, the PP-attachment evaluation is too coarse, as it does not allow us to fully examine verbal and nominal attachment selectively. Secondly, it only evaluates some relations. In this subsection we explore possible answers to the first shortcoming; in the following subsections we explore possible answers to the second shortcoming.

Error class                          Count
Chunking or Tagging Error            21
Parsing Error                        15
Control                              9
Grammar Mistake or incompl. Parse    8
Grammar Assumption                   8
Spurious Error                       7
Rel. Pronoun Resolution              4

Table 7.3: Classification of the Subject Precision errors of all GREVAL corpus sentences

The mapping we have discussed in figure 7.2 lumps all PP-attachments together, which is unsatisfactory as it does not allow us to distinguish between the performance on verbal and nominal PP-attachment. When observing the data, a separation into verbal and nominal attachment seems straightforward. Precision values are obtained by simply splitting the evaluation into a modpp part for nominal attachment and a separate pobj part for verbal attachment.

The calculation of recall values, where a mapping from the GREVAL relations to our scheme is needed, turns out to be more problematic. Since the GREVAL annotation does not provide part-of-speech tags, there is no certain way to know if the governor in a PP-attachment relation is a verb or a noun, i.e. whether the attachment is verbal or nominal. Nevertheless, strong tendencies exist: arg_modc is always verbal attachment, ncmodc frequently nominal, and iobjc mostly verbal. This allows us to get indicative recall values.

Strictly speaking, such a mapping is not mathematically well-defined, because unidirectional mappings that do not completely overlap in both directions are used. The mapping still preserves the characteristics of recall, expressing how many of the expected relations are returned by the parser. The performance values it allows us to deliver are more fine-grained and informative. We report them in table 7.4, with a word of warning. We will also use the unidirectional mapping for PP-attachment in subchapters 7.5 to 7.8, where we do not compare to other parsers, but to other versions of Pro3Gres, such as a baseline system. A graphical summary of the mapping is presented in figure 7.3.

Nominal PP-attachment

Nominal PP-attachment recall calculates how many of the nominal (non-argument) relations expected in the gold standard (expressed by the ncmodc relation, if it has a preposition) are reported by Pro3Gres. Since the GREVAL format does not contain part-of-speech tags, there is no certain way of knowing whether the ncmodc head is a noun or a verb. Typically, the head of an ncmodc relation is a noun; the relation thus corresponds to a modpp relation in Pro3Gres. But for verbal adjuncts, an ncmodc relation corresponds to a pobj Pro3Gres relation (i.e. a verbal adjunct PP-attachment relation). This means that verbal adjunct PP-attachments are counted as nominal PP-attachment recall.

Verbal PP-attachment

iobjc expresses the PP-attachment of an argument; it is typically verbal attachment. arg_modc expresses the attachment of a by agent to a passive verb; it is always verbal. Nominal argument PPs, for example in ... Mr. Buckley holds the key to the Democratic organization's acceptance ..., are also expressed by an iobjc relation, which mostly expresses the attachment of verbal arguments. Again, since the GREVAL format does not contain part-of-speech tags, there is no certain way of knowing whether the iobjc head is a noun or a verb.

noun-PP:  Precision: modpp → ncmodc (with prep) OR xmodc (with prep) OR iobjc (with prep)
          Indicative Recall (mostly nominal): modpp OR pobj ← ncmodc (with prep)
          Legend: ncmodc = non-clausal modification; xmodc = clausal modification for verb-to-noun translations; iobjc = prepositional object

verb-PP:  Precision: pobj → iobjc (with prep) OR arg_modc OR ncmodc (with prep OR (prt & dobj)) OR xcompc (with prep) OR xmodc (with prep)
          Indicative Recall (mostly verbal): pobj OR modpp ← iobjc (with prep) OR arg_modc
          Legend: iobjc = prepositional object; arg_modc = passive agent; xcompc for PP-attachment to copular verbs

Figure 7.3: Mapping the Pro3Gres output to the GREVAL format for precision, and the reverse mapping for recall

                    Nominal PP-attachment   Verbal PP-attachment
Precision           78.7% (354/450)         74.9% (347/463)
Indicative Recall   66.0% (529/801)         86.9% (173/199)

Table 7.4: Evaluating Pro3Gres on the GREVAL corpus on PP-attachment relations

These iobjc relations, like all other iobjc relations, appear as verbal PP-attachment, although they are nominal.

As for verbal PP-attachment recall, the PP-adjunct cases are treated by ncmodc recall and thus appear as nominal PP-attachment recall. The main verb-PP argument attachment relation is iobjc; the verb-passive agent attachment relation is arg_modc. Recall of iobjc and arg_modc is either reported separately or micro-averaged and then labelled as verbal PP-attachment recall, but we should bear in mind that it only covers arguments, while verbal PP-adjuncts form part of ncmodc recall, and that it also includes nominal PP-attachments in which the PP is a noun argument.

Since Pro3Gres leaves the distinction between preposition and verbal particle underspecified, a verbal particle plus a corresponding dobjc are also counted as correct. As for nominal attachment, a verbal PP-attachment relation that has an xmodc counterpart in the gold standard is a correct case involving a PP-internal noun that has undergone translation from gerund to noun. PP-attachment to copular verbs is expressed by xcompc in GREVAL.

Results

Results using this unidirectional mapping are given in table 7.4. Verbal PP-attachment recall appears considerably higher than nominal PP-attachment recall. This is partly due to the following two reasons. First, as the mapping describes, verb-PP recall largely describes the attachment of arguments, while noun-PP recall largely describes the attachment of adjuncts. For arguments, lexicalisation is considerably more beneficial than for adjuncts. Second, in our putative context (see chapter 2) a noun and a verb are in competition. If several nouns compete, then their attachment probabilities are only compared indirectly, via the likelihood of competing verbal attachment.

Although better results have been reported for PP-attachment disambiguation in isolation, the same results cannot be expected in the context of real parsing. On the one hand, some PP-attachments are unambiguous, which should lead to better results. On the other hand, some PP-attachments are multiply ambiguous (they have more than two possible attachment sites) or occur fronted in a sentence-initial position, or a participant in the PP-relation is mistagged or mischunked. Mistagging and mischunking are involved in about 20-25% of the PP-attachment precision errors.

Detailed Analysis of PP-Attachment Errors

Detailed error analyses have been conducted by manually comparing the errors of Pro3Gres and tracking down error sources. The errors of the first 100 GREVAL sentences have been manually compared and broken down into error classes in table 7.5. The analysis reveals that in addition to the expected attachment errors (51%), mistagging and mischunking are important sources of error (24%), but also that differing grammar assumptions are a problem for our hand-written grammar approach (12%), introducing spurious errors. Cases in which the grammar is incomplete, a wrong rule leads to the most likely parse, or the parse does not span the whole sentence are responsible for 13% of the errors.

PP Complements and PP Adjuncts

It is especially rewarding to compare the difference in recall between PP complements (which have the iobjc label) and PP adjuncts (which have the ncmodc label). Due to the strong lexical preference, performance on complements is considerably higher (87% (173/199)) than on adjuncts (66% (529/801)), where lexical preferences are often absent: an adjunct can attach anywhere. The fact that adjuncts can attach anywhere has been used to detect adjuncts by means of an entropy measure: if the entropy over possible lexical heads is very high, this is a strong indicator of adjuncthood (Merlo and Esteve Ferrer, 2006). The fact that recall on complements is high entails that bilexical parsers, while not very successful on adjuncts because they are based on an incorrect assumption, are useful tools for lexicology and the discovery of argument structure.

A manual categorisation of the 21 iobjc recall errors on GREVAL has been conducted. The first 10 errors are given for illustration in table 7.6. Only 3 of the 21 errors are attachment errors, which is considerably lower than for PP-attachment generally, where about half of the errors are attachment errors (see table 7.5). Only one of these 3 errors, the error in sentence 229, is not corrected in the second highest ranked reading. This means that complement recall can be increased to a very high level using high-recall approaches as described in section 7.5.1. It also means that for IR tasks, where argument relations are typically more important than adjunct relations, the parser may be more useful than it seems prima facie.

[Table 7.5: Detailed analysis of the PP-attachment errors in the first 100 GREVAL sentences, broken down by relation (noun-PP and verb-PP precision and recall) into the error classes Attachment Error, Chunking or Tagging Error, Head Extraction or compl/prep Error, Grammar Mistake or incompl. Parse, and Grammar Assumption.]

[Table 7.6: Manual categorisation of the first 10 of the 21 iobjc recall errors on GREVAL, listing gold standard relation, parser output, error category (Attachment, Chunking, Tagging, Morphology, NP Constr., Grammar, Conjunction not resolved, spurious error), and whether a different 2nd-ranked reading or tagger/chunker corrects the error.]

7.2.3 Long-Distance Dependencies

The reported results generally include both local and long-distance dependencies. About 10% of the subjects in the GREVAL corpus are non-local, i.e. they involve a long-distance dependency. The best subject precision data in table 7.2 yields 93.9% precision for local subjects and 87.7% for non-local dependency subjects.

The annotation in the GREVAL corpus does not always indicate if a relation is local or not. We identify a subject relation as non-local if Pro3Gres reports it as a control subject, an anaphor of a relative clause, or a WH-word. This allows us to report precision values. But since the GREVAL corpus does, for example, not indicate control relations as such, we cannot report reliable recall values for all relations, since it is not always possible to attain a bidirectional mapping. For the example of subject control recall, one can approximate the denominator by constraining the subordinate clause relation xcompc to sentences where the subordinate clause is introduced by the infinitive marker to, and where there are two ncsubjc relations with an identical subject, one containing the matrix verb, the other containing the subordinate verb (a sketch of this constraint is given below). This constraint delivers 48 cases, out of which the parser returns 33. At first sight, this corresponds to a recall of 69%, but the fact that we get a lower numerator than for precision indicates that the constraint applied to the gold standard does not deliver all cases; our calculation of the recall is thus incomplete.

Table 7.7 shows a long-distance dependency evaluation, as far as the GREVAL annotations permit. Since some of the counts are low, absolute numbers are included. Generally, the performance on long-distance dependencies is only slightly below the performance on local relations, which indicates that our approach to treating long-distance dependencies is successful. We discuss this approach in detail in chapter 6.

While the general parsing results can be compared to other approaches (see the following subsections), it is difficult to compare our results on long-distance dependencies, for two reasons. First, most of the published results are on parts of the Penn Treebank. Second, due to the GREVAL annotation we can only give selective figures, and not always both precision and recall. Only some authors give selective evaluations, for example Dienes and Dubey (2003). We juxtapose our results to their antecedent recovery based on the output of a lexicalised parser (Collins, 1999). We would like to stress that this can at best lead to an indicative comparison.

[Table 7.7: Evaluation of Long-Distance Dependencies: counts correct, precision, and the corresponding Penn Treebank recall (by empty-node row in table 7.8) for the relations WH-Subject, WH-Object, Anaphora of rel. clause subject, Passive subject, Subject-control, Object-control, modpart, and Topicalized verb-attached PPs.]
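The denominator constraint for subject-control recall sketched above can be made concrete as follows; the relation encoding used here is our own assumption for illustration, not the GREVAL file format.

def count_subject_control(gold_relations):
    # Count gold xcomp relations introduced by "to" whose matrix and
    # subordinate verbs share an identical ncsubj subject.
    subjects = {}                              # verb -> set of its subjects
    for r in gold_relations:
        if r["rel"] == "ncsubj":
            subjects.setdefault(r["verb"], set()).add(r["subj"])
    count = 0
    for r in gold_relations:
        if r["rel"] == "xcomp" and r.get("intro") == "to":
            if subjects.get(r["head"], set()) & subjects.get(r["dep"], set()):
                count += 1
    return count

# "He wants to leave": ncsubj(he, wants), ncsubj(he, leave),
# xcomp(to, wants, leave) counts as one subject-control case.
gold = [
    {"rel": "ncsubj", "subj": "he", "verb": "wants"},
    {"rel": "ncsubj", "subj": "he", "verb": "leave"},
    {"rel": "xcomp", "intro": "to", "head": "wants", "dep": "leave"},
]
assert count_subject_control(gold) == 1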

     Antecedent   POS    Label   Count    Description/Example
1    NP           NP     *       22,734   NP trace: Sam was seen *
2                 NP     *       12,172   NP PRO: * to sleep is nice
3    WHNP         NP     *T*     10,659   WH trace: the woman who you saw *T*
(4)                      *U*      9,202   Empty units: $ 25 *U*
(5)                      0        7,057   Empty complementizers: Sam said 0 Sasha snores
(6)  S            S      *T*      5,035   Moved constituents: Sam had to go, Sasha said *T*
7    WHADVP       ADVP   *T*      3,181   WH-trace: Sam explained how to leave *T*
(8)               SBAR            2,513   Empty clauses: Sam had to go, said Sasha (SBAR)
(9)  WHNP                0        2,139   Empty relative pronouns: the woman 0 we saw
(10) WHADVP              0          726   Empty relative pronouns: the reason 0 to leave

Table 7.8: The distribution of the 10 most frequent types of empty nodes and their antecedents in the Penn Treebank (adapted from Johnson 2002). Row numbers in parentheses indicate cases that are inherently local in our functional DG

An overview of the most frequent Penn Treebank empty node and trace types (chapter 6, table 6.2) is repeated as table 7.8 for reference. For NP-NP traces (row 1 in table 7.8) Dienes and Dubey (2003) report 74% precision and 67% recall. Passive subject and control contribute the majority of NP-NP traces. We report 88% precision and 76% recall for constructing the dependency (which requires both recognising the trace and finding the antecedent). These values are encouraging, but we omit the difficult task of indexed gerund recovery, and we use the incomplete subject control recall calculation discussed above; as a consequence, recall can be expected to be considerably lower on the identical task. For WHNP-NP traces (row 3 in table 7.8) Dienes and Dubey (2003) report 91% precision and 75% recall. We have evaluated WH-subjects and WH-objects, which completely or almost completely cover WHNP-NP traces. We report 87% precision and 75% recall. While these numbers cannot be directly compared, we are confident that they show that the performance of our approach is good.

Pro3Gres                                  Subject   Object   PP-attachment
Precision                                 92%       85%      77%
Recall                                    78%       82%      69%

Lin 1998 (on the whole Susanne corpus)    Subject   Object   PP-attachment
Precision                                 89%       88%      78%
Recall                                    78%       72%      72%

Table 7.9: Comparison of Pro3Gres to Lin’s MINIPAR


7.2.4 Comparison to Lin’s MINIPAR

The results of the parsing evaluation that we have made can be compared to Lin (1998), although Lin gives results for the whole Susanne corpus. The comparison is shown in table 7.9. When using Pro3Gres, results for subject and object are slightly better, and results for PP-attachment slightly worse.

7.2.5 Comparison to Carroll and Briscoe’s RASP

Carroll and Briscoe have evaluated their own parser, RASP, with their own GREVAL evaluation scheme in Carroll and Briscoe (2002) and Carroll, Minnen, and Briscoe (2003) (see figure 7.1). Carroll, Minnen, and Briscoe (2003) report the numbers shown in table 7.10. The plus symbols indicate the level in the hierarchy. The relations in bold are the relations that we have evaluated: the terminals in the GREVAL hierarchy, except the relations that we do not express (xmodc, xsubjc, and csubjc), and except for xcomp and ccomp, where we have evaluated on the pre-terminal clausalc relation, since we do not make a distinction between clausal relations with or without an explicit subject. In order to compare to this evaluation, we partly used the mapping presented in 7.2.1, and we partly needed additional pre- and post-processing, as we explain in the following. For subj, the mapping discussed in subsection 7.2.1 (see figure 7.2) is used. Mapping for obj is almost one to one, the one difference being that we had to assign copular verb complements to xcompc (see chapter 5).

Clausal:  sentobj OR obj (with copular verb) OR pobj (with copular verb) OR predadj ↔ xcompc OR ccompc
          Legend: xcompc = clausal complement, control; xmodc = clausal modifier, control; ccompc = clausal complement, overt subordinate subject

Figure 7.4: Mapping for the preterminal clausalc relation

Relation                       RASP P%   RASP R%   Pro3Gres P% (Counts)   Pro3Gres R% (Counts)
dependent incl. detmod & aux   –         –         84.5 (4473/5292)       71.9 (4473/6221)
dependent                      75        75        81.1 (3113/3840)       66.0 (3113/4716)
+mod                           74        70        84.3 (1486/1763)       57.2 (1486/2599)
++ncmod                        78        73        84.4 (1434/1699)       60.0 (1434/2391)
++xmod                         70        52        –                      –
++cmod                         67        48        81.3 (52/64)           25.0 (52/208)
+arg_mod                       84        41        79.1 (19/24)           47.5 (19/40)
+arg                           77        84        78.3 (1608/2053)       77.4 (1608/2077)
++subj                         84        88        88.9 (865/973)         79.0 (865/1095)
+++ncsubj                      85        88        88.9 (865/973)         79.0 (865/1095)
+++xsubj                       100       40        –                      –
+++csubj                       14        100       –                      –
++comp                         70        79        68.8 (743/1080)        75.7 (743/982)
+++obj                         68        79        75.8 (454/599)         78.0 (454/582)
++++dobj                       86        84        84.0 (340/405)         83.1 (340/409)
++++obj2                       39        84        90.0 (9/10)            56.3 (9/16)
++++iobj                       42        65        57.1 (105/184)         66.9 (105/157)
+++clausal                     73        78        60.1 (289/481)         72.3 (289/400)
detmod                         –         –         93.5 (1018/1089)       90.6 (1018/1124)
aux                            –         –         94.2 (342/363)         89.8 (342/381)

Table 7.10: RASP evaluation results compared to Pro3Gres

Mapping for obj2 is one to one. The arg_modc relation can be approximated by filtering all passive verbs that are introduced by the preposition by. Such a simple approach overgenerates, including local and temporal by-phrases, which leads to the expected low precision, while recall outperforms RASP.

The mapping that we use for the pre-terminal clausalc relation is given in figure 7.4. For cmodc we use a very coarse approximation. A large subset of cmodc are relative clauses, which can be clearly identified. Since relative clauses are only a subset, we get very poor recall. The many cmodc that are expressed by a sentobj relation cannot be easily recognised. They affect the precision of the clausalc relation, to which the majority of sentobj relations correspond. The relations that we do not express have low frequency (5 xsubjc, 5 csubjc, 128 xmodc). For the top node in the GREVAL hierarchy, dependent, we give two numbers. The first one does not include detmodc and auxc, for which Carroll, Minnen, and Briscoe (2003) do not report values. The second number includes detmodc and auxc. There are two prominent reasons why some relations are difficult to map: first, we do not express relations that occur inside chunks; second, we do not make a distinction between PP arguments and adjuncts. We describe our pre- and post-processing implementation to approximate these relations in the following.

Relations inside Chunks

The following GREVAL relations often occur inside chunks:

1. detmodc (modification by determiner) typically occurs inside a noun chunk

2. ncmodc (non-clausal modification): many types of non-clausal modification, specifically adjectives and non-head nouns, occur inside noun chunks.

3. auxc (auxiliary) typically occurs inside verb chunks

We have implemented a pre-processing module in Pro3Gres that recovers relations inside chunks. This module is called between tagging and parsing. The module is rule-based, but has a statistical component. It is based on the following assumptions (a code sketch follows the list).

1. Generally, all non-head words in a chunk modify the head. There are four exceptions to this rule:

• In verb chunks, every verb modifies the succeeding verb (example: would have been going).

Relation   P%     R%     Counts Correct
detmod     93.5   90.6   1018
ncmod      83.1   67.9   1080
aux        94.2   89.8   342

Table 7.11: Percentage and absolute values for chunk-internal relations

• If a noun chunk contains an adjective and more than one noun, the adjective modifies the succeeding non-head noun if it is seen more often before the non-head noun lemma than before the head noun lemma in the British National Corpus (example: conventional forces strengthening).

• If a noun chunk contains more than two nouns, the frontmost noun modifies the succeeding non-head noun if it is seen more often before the non-head noun lemma than before the head noun lemma in the British National Corpus (example: Eisenhower administration effort).

• If a noun chunk contains more than two proper names, every proper name modifies the succeeding proper name (example: Fulton County Grand Jury).

2. Generally, the modification type deterministically follows from the part-of-speech tag of the head and the modifier. There is one exception:

• Words that can be determiners or adjectives, for example several, many, few, one, are assigned a detmod relation to the head noun if they occur as the frontmost word in the chunk, and an ncmod relation otherwise.
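The following condensed sketch shows how rules of this kind can be implemented. The chunk representation (a list of (word, tag) pairs with a chunk-final head) and the BNC bigram count interface are assumptions of this illustration, not the actual Pro3Gres module.

DET_OR_ADJ = {"several", "many", "few", "one"}

def chunk_relations(chunk, bnc_count):
    """Return (modifier, governor, label) triples for one chunk."""
    head = chunk[-1][0]
    relations = []
    for i, (word, tag) in enumerate(chunk[:-1]):
        governor = head                    # default: non-heads modify the head
        if tag.startswith("VB"):
            governor = chunk[i + 1][0]     # verb chunks: the succeeding verb
        elif tag == "NNP":
            governor = chunk[i + 1][0]     # proper-name chains
        elif tag in ("JJ", "NN", "NNS") and i + 1 < len(chunk) - 1:
            succ = chunk[i + 1][0]         # attach to the succeeding non-head
            if bnc_count(word, succ) > bnc_count(word, head):
                governor = succ            # noun if the BNC sees it more often
        if tag == "DT" or (word in DET_OR_ADJ and i == 0):
            label = "detmod"               # determiner-like frontmost words
        else:
            label = "ncmod"                # otherwise follows the tag pair
        relations.append((word, governor, label))
    return relations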

Using the Ratnaparkhi tagger, we get the performance shown in table 7.11. ncmod performs relatively poorly. There are several important sources of error. First, the statistical component often makes errors due to sparse data; for example, in automobile title law, the counts are too low. Second, adverbs are included in the verb chunk, although adverbs in copular verb chunks typically modify the verb complement: for example, generally in [was generally] favourable is prevented from modifying the predicative adjective favourable. Third, very long base NPs contain very many ncmod relations, but are also at the highest risk of being chunked incorrectly. The ncmod relation is thus much more affected by chunking errors than the detmod relation.

PP arguments and adjuncts distinction

Merlo (2003) and Merlo and Esteve Ferrer (2006) have shown that there is a small correlation between attachment and argumenthood.

Relation   P%     R%     Counts Correct
iobj       57.1   66.9   105
ncmod      78.7   44.2   354

Table 7.12: Percentage and absolute values for the PP-attachment argument/adjunct distinction

In a 4-way distinction, performance is up to 3 percent higher than if the attachment ambiguity is resolved first and the argument/adjunct ambiguity afterwards. We miss out on this generalisation by using a post-processing approach that classifies attached PPs as arguments (iobjc) or as adjuncts (ncmodc). The task is problematic because inter-annotator agreement is low, and because the Penn Treebank does not provide reliable training data. We use the conservative approach of counting verbal PP-attachments with the functional label CLR as arguments, and all others as adjuncts. We get the results in table 7.12. ncmod recall is most seriously affected by the bidirectional mapping, which in this case marks all verbal adjunct PP-attachments as incorrect.

We can conclude that we perform slightly better than RASP on chunk-external relations, except clausal relations, and slightly worse on chunk-internal relations. We can also conclude that we have higher precision than RASP, but lower recall. The clausal relation is seriously affected by the very coarse post-processing that we used. Recall especially is affected by the fact that Pro3Gres does not express some of the distinctions made, so that very coarse approximations have to be used. On a more subjective level, we would like to conclude that the fact that our performance at the underspecified subordinate clause level in table 7.2 is much higher than our and Carroll's performance for highly specified relations indicates that current technology is not yet mature enough to express this fine-grained level of specificity, and that our robust, underspecifying approach may be appropriate for many NLP tasks.

7.2.6 Comparison to Buchholz, Charniak, and Collins, according to Preiss

Preiss (2003) is a comprehensive evaluation of statistical parsers according to the GREVAL evaluation scheme. The tested parsers are Collins (1999), Charniak (2000), Buchholz (2002), and Briscoe and Carroll (1993). The latter is a version of RASP (Carroll, Minnen, and Briscoe, 2003). In her evaluation of RASP, Preiss (2003) partly reports lower values than Carroll, Minnen, and Briscoe (2003).

Relation   Buchholz P/R   Charniak P/R   Collins 1 P/R   Collins 2 P/R   Pro3Gres P/R
detmod     92.4 / 90.9    90.2 / 87.5    92.1 / 89.1     92.1 / 88.8     93.2 / 90.3
ncmod      66.7 / 51.6    79.8 / 46.3    81.5 / 47.4     81.1 / 47.3     84.4 / 60.0
aux        93.7 / 89.8    89.8 / 83.4    87.0 / 86.1     89.9 / 86.6     94.2 / 89.8
arg_mod    75.7 / 68.3    78.1 / 60.1    82.9 / 70.7     82.9 / 70.7     79.1 / 47.5
ncsubj     85.8 / 72.9    81.8 / 70.1    79.2 / 66.0     81.3 / 69.5     88.9 / 79.0
dobj       88.4 / 76.5    84.4 / 75.5    86.2 / 74.6     84.9 / 75.3     84.0 / 83.1
obj2       46.1 / 31.6    61.5 / 42.1    81.8 / 47.4     61.5 / 42.1     90.0 / 56.3
iobj       57.8 / 51.9    27.5 / 67.1    27.1 / 69.6     27.0 / 70.3     57.1 / 66.9

Table 7.13: Preiss’s precision (P) and recall (R) evaluation results of Buchholz, Charniak, and Collins, compared to Pro3Gres

This raises doubts about Preiss's mapping function. Unfortunately, not much is said about the mapping function, except that her evaluation software produces some different results, such as not expanding conjunctions. Further clarifications would be needed. The comparison of her results to Pro3Gres is shown in table 7.13. Although these results are very encouraging, with Pro3Gres performing best on all relations except arg_mod, it is not clear how reliable they are, due to the shortcomings of Preiss's presentation.

7.2.7 Comparison to Collins’s Model 1

As a consequence of our doubts about Preiss (2003), we have performed our own evaluation of Collins's Model 1 (Collins, 1999) on the GREVAL 500 sentence corpus and compared the results to Pro3Gres. We have used the Ratnaparkhi tagger (Ratnaparkhi, 1996) for preprocessing, with the same parameters as for Pro3Gres. Parsing with Collins's Model 1 is an order of magnitude slower than with Pro3Gres.

We have applied the Treebank patterns described in chapter 6 to the output of Collins's Model 1 parser. Some of the patterns needed to be adapted, so that they do not depend on functional labels or long-distance dependencies, which are not expressed in Collins's parser output. An example of such a relation is the subj relation, where this shortcoming entails that, for example, control relations are necessarily lost. In other relations, the original extraction patterns could be kept, for example in the PP-attachment relations. In this case, long-distance relations such as fronted PPs are equally lost for our Pro3Gres training and in the Collins parser output. But since fronted PPs are annotated in the GREVAL corpus and often parsed correctly by Pro3Gres, a decrease in recall can also be expected, especially for verb-PP attachment, because fronted PPs normally modify the verb. An additional modification was necessitated by the fact that Collins attaches punctuation to arbitrary constituents.

Evaluation results are given in table 7.14. We use the mapping scheme presented in subsection 7.2.1 and illustrated in figure 7.2. While the precision values are similar, recall values are considerably lower for the Collins parser. This is partly due to the following two reasons. First, Collins does not express long-distance dependencies: out of the 892 subjects reported by Pro3Gres on the GREVAL corpus, 58 are control subjects and 44 are resolved relative pronouns. More than 10% of all subject relations are thus long-distance dependencies and necessarily left unexpressed in Collins. This explains over two thirds of the recall loss for the subject relation in comparison to Pro3Gres. Still, Pro3Gres outperforms Collins Model 1 also on the local subjects. Other relations are also affected by the lack of long-distance dependencies, though to a lesser degree. For example, 34 of the reported 494 verb-PP attachment relations are fronted PPs. Second, there are parsing errors across the subtrees: whenever a subtree involved in a dependency relation contains an error, the risk that the extraction patterns cannot find the dependency is extremely high, as Johnson (2002) stresses. Relations across more subtrees are more seriously affected, which explains why the difference is bigger in the PP-attachment relations than in the subject or object relation.

We would like to point out the potential problem that there is no way to guarantee that the patterns we have devised have full recall. Although we have checked the output carefully, some correct parses may escape unmapped. We can be confident that this effect is far smaller than the reported recall performance difference between Collins Model 1 and Pro3Gres.

The conversion from trees to syntactic dependencies is complex and delivers worse results when based on Collins Model 1 than when using Pro3Gres directly. Functional dependencies are closer to predicate-argument structures and shallow semantic structures and thus a more suitable level for applications that need such structures, such as text mining, answer extraction, information retrieval, and knowledge management generally. Pro3Gres directly delivers functional dependencies, in an order of magnitude less time than Collins.

Relation        Pro3Gres P%   R%      Counts Correct   Collins 1 P%   R%      Counts Correct
Subject         92.3%         87.0%   865              90.7%          60.4%   661
Object          85.3%         82.5%   353              89.7%          73.4%   314
PP-Attachment   76.9%         68.6%   702              77.8%          60.0%   614

Table 7.14: Comparison of parsing performance between Collins Model 1 and Pro3Gres

Dependency Type   Description
AMOD              Modifier of adjective or adverb (phrase adverbial)
DEP               Other dependent (default label)
NMOD              Modifier of noun (including complement)
OBJ               Object
P                 Punctuation
PMOD              Modifier of preposition (including complement)
PRD               Predicative complement
ROOT              Dependent of special root node
SBAR              Head verb of subordinate clause dependent on complementizer
SBJ               Subject
VC                Verb chain
VMOD              Modifier of verb

Table 7.15: Nivre (2006) Dependency Types

7.3 Tentative Comparison to Nivre’s MaltParse

The set of dependencies used only contains 12 dependency types, so that it appears slightly less fine-grained than ours, and considerably less fine-grained than GREVAL (Carroll, Minnen, and Briscoe, 2003).

7.3.1 Comparison Across Different Corpora

Nivre (2006b) gives a detailed evaluation of his parser, including selective evaluations on each dependency type. A comparison can only be indicative, because he uses a different evaluation corpus, Penn Treebank section 23. The set of dependency types, according to Nivre (2006b), is shown in table 7.15. The dependency types are derived from the Penn Treebank as follows. Collins (1999)'s complex labels are used as input and converted into a single label r, where M denotes the mother constituent, H its head, and D the dependent. In order of descending priority, the rules are as follows (a schematic implementation is sketched after the list).

1. if D is a punctuation category, r = P.

2. if D contains the function tag SBJ, r = SBJ.

3. if D contains the function tag PRD, r = PRD.

4. if M = VP, H is a part-of-speech tag and D = NP (without any function tag), r = OBJ.

5. if M = VP, H is a part-of-speech tag and D = VP, r = VC.

6. if M = SBAR and D = S, r = SBAR.

7. if M = VP, S, SQ, SINV or SBAR, r = VMOD.

8. if M = NP, NAC, NX or WHNP, r = NMOD.

9. if M = ADJP, ADVP, QP, WHADJP or WHADVP, r = AMOD.

10. if M = PP or WHPP, r = PMOD.

11. Otherwise, r = DEP.
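The rule cascade above is simple enough to state as a short program. The following is a minimal sketch in Python, assuming the mother category M, a flag indicating whether the head H is a part-of-speech tag, the dependent category D and its function tags have already been extracted from the treebank; the function name and the punctuation set are illustrative, not part of Nivre's implementation.

PUNCTUATION = {",", ".", ":", "``", "''", "-LRB-", "-RRB-"}

def dependency_label(m, head_is_pos_tag, d, d_function_tags):
    # Priority rules of Nivre (2006b) for deriving a single label r
    # from the Penn Treebank categories (m = mother, d = dependent).
    if d in PUNCTUATION:                                   # rule 1
        return "P"
    if "SBJ" in d_function_tags:                           # rule 2
        return "SBJ"
    if "PRD" in d_function_tags:                           # rule 3
        return "PRD"
    if m == "VP" and head_is_pos_tag:
        if d == "NP" and not d_function_tags:              # rule 4
            return "OBJ"
        if d == "VP":                                      # rule 5
            return "VC"
    if m == "SBAR" and d == "S":                           # rule 6
        return "SBAR"
    if m in {"VP", "S", "SQ", "SINV", "SBAR"}:             # rule 7
        return "VMOD"
    if m in {"NP", "NAC", "NX", "WHNP"}:                   # rule 8
        return "NMOD"
    if m in {"ADJP", "ADVP", "QP", "WHADJP", "WHADVP"}:    # rule 9
        return "AMOD"
    if m in {"PP", "WHPP"}:                                # rule 10
        return "PMOD"
    return "DEP"                                           # rule 11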

This mapping is arguably simplistic (see, e.g., Samuelsson (2007)) and less elaborate than ours, which is listed in the appendix. More research addressing mapping questions is needed. We give a few examples of the linguistic errors and inconsistencies that such a mapping introduces. This list is not meant to be complete or representative, but reflects a few cases that are obvious when browsing the gold standard corpus provided in the CoNLL-XI shared task (Nivre et al., 2007).

• Adjuncts as Objects: Nouns with adjunct function are very often erroneously attached as objects. In But while the New York Stock Exchange did n't fall apart Friday as the Dow Jones Industrial Average plunged 190.58 points the adjuncts Friday and points are both labelled OBJ. Adjuncts are not consistently mislabelled, however. In Shares of UAL, the parent of United Airlines, were extremely active all day Friday, the adjunct Friday is correctly labelled as VMOD (verbal modification).

• Indirect Objects: The label IOBJ exhibits the same error. In Big Board Chairman John J. Phelan said yesterday the circuit breaker " worked well mechanically. ... yesterday is labelled as IOBJ. The majority of nouns labelled IOBJ are direct objects; only a few are really indirect objects. In ... he has had trouble finding stocks he likes. the object trouble is labelled IOBJ. This error is potentially triggered by the error that finding is attached to the verb had instead of the expected noun trouble. For the IOBJ label, the mapping generally introduces more errors than correct results.

Dependency Type | Occurrence | Precision | Recall
AMOD            | 2072       | 80.7%     | 76.7%
DEP             | 259        | 56.5%     | 30.1%
NMOD            | 21002      | 91.1%     | 90.8%
OBJ             | 1960       | 86.5%     | 76.7%
PMOD            | 5593       | 87.7%     | 89.5%
PRD             | 832        | 75.9%     | 71.8%
ROOT            | 2401       | 78.8%     | 86.4%
SBAR            | 1195       | 87.1%     | 85.1%
SBJ             | 4108       | 90.6%     | 88.1%
VC              | 1771       | 93.4%     | 96.6%
VMOD            | 8175       | 76.5%     | 77.1%
Total           | 49368      | 86.3%     | 86.3%

Table 7.16: Nivre (2006) Evaluation Results

• Parentheticals: The status of parentheticals is unclear.

• Internal structure of noun chunk

• Status of infinitive clauses

• Attachment of punctuation: The attachment of punctuation is partly arbitrary.

• Verb-chain attachment

Mapping is an error-prone task in principle and usually not possible in a loss-free fashion. Our answer to the problem of mapping the Penn Treebank to dependency is to use an involved, linguistically motivated mapping that has very high precision but slightly incomplete recall, and that does not map all structural configurations. This leads to a mapping that delivers reliable relations but not fully connected trees.

The reported labelled attachment results are shown in table 7.16. If we assume that subject and object dependency types are comparable, we get similar results, although Nivre (2006b) is better at subject recall, and we are better at object recall. Nivre's NMOD relation comprises our noun-PP attachment modpp, and additionally our chunk-internal detmod and ncmod relations. At first sight, with a precision of 88.2% on this set of relations, we seem to perform worse. There is an important difference in the annotation of chunk-internal modifiers, however.

[Figure 7.5 here: a) the annotation of the New York Stock Exchange with one detmod and three ncmod relations; b) the cascading annotation with ncmod relations only.]

Figure 7.5: Semantic and syntactic dependency annotation of New York Stock Exchange

While GREVAL (Carroll, Minnen, and Briscoe, 2003) uses a semantic annotation in which New York Stock Exchange is annotated as in figure 7.5 a) – as we have discussed in subsection 7.2.5 – Nivre derives and uses the cascading annotation shown in figure 7.5 b), which always leaves the semantic dependencies underspecified. Nivre's VMOD relation corresponds to our pobj and clausal relations. We achieve similar precision (74.6%), while recall is difficult to establish due to the GREVAL annotation scheme, although on PP attachment and clausal relations we generally get a slightly lower recall. Nivre's PMOD relation corresponds to our prep relation, where we achieve similar precision (89.6%) and slightly lower recall (82.3%). Nivre's VC relation corresponds to our aux relation, where we achieve similar precision (94.2%) and slightly lower recall (89.8%). We can conclude that, as a tendency, we seem to get similar precision at slightly lower recall. A major reason for lower recall is that we do not always get a full parse.

7.3.2 Comparing with the Same Corpus: Participation in the CoNLL-XI Shared Task

An opportunity to compare Pro3Gres to MaltParse and many other dependency parsers was provided by the recent CoNLL 2007 shared task on dependency parser domain adaptation (Nivre et al., 2007), in which we participated (Schneider et al., 2007). We have achieved average results in the CoNLL domain adaptation track open submission (Marcus, Santorini, and Marcinkiewicz, 1993a; Johansson and Nugues, 2007; Kulick et al., 2004; MacWhinney, 2000; Brown, 1973). Two domains were used for adaptation: chemical texts (Kulick et al., 2004) and child language (MacWhinney, 2000; Brown, 1973). The performance of the parser is seriously affected by mapping problems to the particular dependency representation used in the shared task, which is identical or very close to the one used in Nivre (2006b).

Domain Adaptation Based on our experience with parsing texts from the biomedical domain, we have used the following two adaptations to the domain of chemistry. Hindle and Rooth (1993) exploit the fact that in sentence-initial NP PP sequences the PP unambiguously attaches to the noun. We have observed that in sentence-initial NP PP PP sequences, the second PP also frequently attaches to the noun, the noun itself often being a relational noun. We have thus used such sequences to learn relational nouns from the unlabelled domain texts, as sketched below. Multi-word terms, adjective-preposition constructions and similar domain-specific expressions have strong collocational force. We have thus used the collocation extraction tool XTRACT (Smadja, 2003) to discover collocations from large domain corpora. Since the tagging quality of the chemistry test set is high, the impact of multi-word term recognition was lower than in the biomedical domain when using a standard tagger, as we have shown in Rinaldi et al. (2007).

For the CHILDES domain, we have not used any adaptation. The hand-written grammar fares quite well on most types of questions, which are very frequent in this domain. In the spirit of the shared task, we have not attempted to correct tagging errors, which were frequent in the CHILDES domain. We have restricted the use of external resources to the hand-written, domain-independent grammar, and to WordNet. Due to serious problems in mapping our LFG f-structure based dependencies to the CoNLL representation, much less time than expected was available for the domain adaptation.
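As an illustration of the first adaptation, the following minimal sketch shows how relational noun candidates might be collected from unlabelled, chunked domain text; the input representation and the frequency cut-off are assumptions of the sketch, not the actual Pro3Gres interface.

from collections import Counter

def learn_relational_nouns(chunked_sentences, min_count=3):
    # A sentence is a list of (head_lemma, chunk_tag) pairs, one per chunk.
    # Sentence-initial NP PP PP sequences are taken as unambiguous evidence
    # that the head noun of the NP licenses several PP modifiers.
    counts = Counter()
    for sentence in chunked_sentences:
        chunk_tags = [tag for _, tag in sentence[:3]]
        if chunk_tags == ["NP", "PP", "PP"]:
            counts[sentence[0][0]] += 1
    return {noun for noun, freq in counts.items() if freq >= min_count}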

deprel  | gold | correct | system | recall (%) | prec. (%)
ADV     | 366  | 212     | 302    | 57.92      | 70.20
AMOD    | 87   | 8       | 87     | 9.20       | 9.20
CC      | 11   | 0       | 0      | 0.00       | NaN
COORD   | 402  | 233     | 342    | 57.96      | 68.13
DEP     | 9    | 0       | 0      | 0.00       | NaN
EXP     | 2    | 0       | 0      | 0.00       | NaN
GAP     | 14   | 0       | 0      | 0.00       | NaN
IOBJ    | 3    | 0       | 0      | 0.00       | NaN
LGS     | 37   | 0       | 0      | 0.00       | NaN
NMOD    | 1813 | 1576    | 1763   | 86.93      | 89.39
OBJ     | 185  | 146     | 208    | 78.92      | 70.19
P       | 587  | 524     | 525    | 89.27      | 99.81
PMOD    | 681  | 533     | 648    | 78.27      | 82.25
PRN     | 34   | 13      | 68     | 38.24      | 19.12
ROOT    | 195  | 138     | 190    | 70.77      | 72.63
SBJ     | 279  | 217     | 296    | 77.78      | 73.31
VC      | 129  | 116     | 136    | 89.92      | 85.29
VMOD    | 167  | 116     | 149    | 69.46      | 77.85
unknown | 0    | 0       | 287    | NaN        | 0.00

Table 7.17: Precision and recall of DEPREL

Our results: We have reported the following results: labeled attachment score: 3151 / 5001 * 100 = 63.01; unlabeled attachment score: 3327 / 5001 * 100 = 66.53; label accuracy score: 3832 / 5001 * 100 = 76.62. These results are about 10% below what we typically obtain when using our own dependency representation or GREVAL (Carroll, Minnen, and Briscoe, 2003), a deep-syntactic annotation scheme that is close to ours. Our mapping was quite poor, especially when conjunctions are involved. Punctuation is also attached poorly. 5.7% of all dependencies remained unmapped (unknown in the tables). We give an overview of the relation-dependent results in tables 7.17 and 7.18. Relations between heads of chunks, which are central for the predicate-argument structures that Pro3Gres aims to recover, such as SBJ, NMOD and ROOT, perform better than those for which Pro3Gres was not originally designed, particularly ADV, AMOD, PRN and P. Performance on COORD was particularly disappointing.

We have obtained results slightly above average on the CHILDES domain, although we did not adapt the parser to this domain in any way (unlabeled attachment score: 3013 / 4999 * 100 = 60.27%). The hand-written grammar, which includes rules for most types of questions, fares relatively well on this domain since questions are rare in the Penn Treebank (see Hermjakob (2001)).

We have learnt from our participation that mapping to different representations is an often underestimated task (see e.g. Crouch et al. (2002)) and that a discussion of different representations, possibly leading to a standardisation, is vital. We believe that mapping problems between different representations would be smaller

deprel  | gold | correct | system | recall (%) | prec. (%)
ADV     | 366  | 161     | 302    | 43.99      | 53.31
AMOD    | 87   | 5       | 87     | 5.75       | 5.75
CC      | 11   | 0       | 0      | 0.00       | NaN
COORD   | 402  | 170     | 342    | 42.29      | 49.71
DEP     | 9    | 0       | 0      | 0.00       | NaN
EXP     | 2    | 0       | 0      | 0.00       | NaN
GAP     | 14   | 0       | 0      | 0.00       | NaN
IOBJ    | 3    | 0       | 0      | 0.00       | NaN
LGS     | 37   | 0       | 0      | 0.00       | NaN
NMOD    | 1813 | 1392    | 1763   | 76.78      | 78.96
OBJ     | 185  | 140     | 208    | 75.68      | 67.31
P       | 587  | 221     | 525    | 37.65      | 42.10
PMOD    | 681  | 521     | 648    | 76.51      | 80.40
PRN     | 34   | 12      | 68     | 35.29      | 17.65
ROOT    | 195  | 138     | 190    | 70.77      | 72.63
SBJ     | 279  | 190     | 296    | 68.10      | 64.19
VC      | 129  | 116     | 136    | 89.92      | 85.29
VMOD    | 167  | 85      | 149    | 50.90      | 57.05
unknown | 0    | 0       | 287    | NaN        | 0.00

Table 7.18: Precision and recall of DEPREL+ATTACHMENT

if one used a dependency representation that maximally abstracts away from form to function, such as Carroll, Minnen, and Briscoe (2003). Our performance on the CHILDES task, where we did not adapt the parser at all, indicates that hand-written, carefully engineered competence grammars may be relatively domain-independent, while performance disambiguation is more domain-dependent.
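For reference, the attachment scores used in the shared task can be computed as follows; this is a minimal sketch assuming that gold and system analyses are given as one (head index, label) pair per token.

def attachment_scores(gold, system):
    # gold and system: lists of (head_index, label) pairs, one per token.
    n = len(gold)
    labeled = sum(g == s for g, s in zip(gold, system))
    unlabeled = sum(g[0] == s[0] for g, s in zip(gold, system))
    labels = sum(g[1] == s[1] for g, s in zip(gold, system))
    # e.g. 3151 correct out of 5001 tokens yields 3151 / 5001 * 100 = 63.01
    return (100.0 * labeled / n,    # labeled attachment score
            100.0 * unlabeled / n,  # unlabeled attachment score
            100.0 * labels / n)     # label accuracy score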

7.4 Evaluation on Biomedical Term-Annotated Corpora

In order to test Pro3Gres in one of its application areas, we have evaluated it on texts from the biomedical domain, with three questions in mind. First, biomedical texts are an important field of application for Pro3Gres (see e.g. Rinaldi et al. (2006) or Weeds et al. (2005)), so that its usefulness depends on its performance there. Second, we wanted to test how the Pro3Gres parser performs on domains markedly different from the training corpus. Third, we wanted to test whether terminology is the key to a successful parsing system, and to assess the impact of tagging and chunking errors.

7.4.1 Evaluation on 100 Random Sentences from the GENIA Corpus

The parser has been applied to the GENIA corpus (Kim et al., 2003b), 2000 MEDLINE abstracts of more than 400,000 words describing the results of biomedical research. GENIA is annotated for multi-word terms. This includes information on where a multi-word term starts and where it ends. Multi-word term boundaries can be understood as base NP boundaries, thus delivering near-perfect chunking for domain terms.

GENIA presents very technical and complex language: average sentence length is 26.5 words, as opposed to 21.2 for the Penn Treebank. The most striking characteristic of this domain is the frequency of multi-word terms, which are known to cause serious problems for NLP systems (Sag et al., 2002; Dowdall et al., 2003). The token to chunk ratio (number of tokens divided by the number of chunks: NPs = 2.3, VPs = 1.3) is unusually high. The complexity of PPs is also a striking characteristic of this domain. Figure 7.6 provides an example of a typical domain sentence, showing the tagged parser input and a predicate-argument version of the parser output for the top-ranked reading.

The GENIA corpus does not include any syntactic annotation (making standard evaluation more difficult), but approximately 100,000 multi-word terms are annotated and assigned a semantic type from the GENIA ontology. We wanted to determine how parsing performance interacts with multi-word term recognition, as well as the applicability of and possible improvements to the probabilistic model on this domain. 100 random sentences from the GENIA corpus have been manually annotated and compared to the parser output (Rinaldi et al., 2004b). The results are given in table 7.19.¹

When the terminology information contained in the GENIA corpus is used ("clean"), parsing results are comparable to, or even better than, those on general text. This slightly better performance on GENIA is partly due to the fact that we have annotated our test corpus with the Pro3Gres scheme (there are no mapping errors), and partly due to the fact that the near-perfect tagging and multi-word term information is better than automatic chunker output. But it also indicates that the grammar and even the lexicalization is not very domain-specific. Without knowledge of terminology ("dirty"), parser performance drops considerably due to mistagging and mischunking of unknown medical domain words.

¹This is a small set. Average sentence length is 17.9 chunks, compared to 17.0 in the whole of GENIA, so we can assume that it is fairly representative.

Interaction_NN of_IN nuclear_JJ extracts_NNS from_IN various_JJ cell_NN lines_NNS and_CC tissue_NN with_IN the_DT MNP_NN site_NN leads_VBZ to_TO the_DT formation_NN of_IN fast-migrating_JJ protein-DNA_JJ complexes_NNS with_IN similar_JJ but_CC distinct_JJ electrophoretic_JJ mobilities_NNS

prep('extract#3', 'of#2', _, '(<-)').
prep('line#5', 'from#4', _, '(<-)').
conj('tissue#7', 'and#6', _, '(<-)').
conj('line#5', 'tissue#7', 'and#6', '(->)').
prep('site#9', 'with#8', _, '(<-)').
modpp('line#5', 'site#9', 'with#8', '(->)').
modpp('extract#3', 'line#5', 'from#4', '(->)').
modpp('interaction#1', 'extract#3', 'of#2', '(->)').
subj('lead#10', 'interaction#1', _, '(<-)').
prep('formation#12', 'to#11', _, '(<-)').
prep('complex#14', 'of#13', _, '(<-)').
modpp('formation#12', 'complex#14', 'of#13', '(->)').
pobj('lead#10', 'formation#12', 'to#11', '(->)').
prep('mobility#16', 'with#15', _, '(<-)').
pobj('lead#10', 'mobility#16', 'with#15', '(->)').

Figure 7.6: A sample sentence illustrating the complexity of noun-modifying PPs, with its top-ranked grammatical relation annotation 207 7.4. Evaluation on Biomedical Term-Annotated Corpora

                 | GENIA “dirty”   | GENIA “clean”
Relation         | P%   | R%       | P%   | R%
Subject          | 83   | 74       | 90   | 86
Object           | 70   | 77       | 94   | 95
noun-PP          | 68   | 64       | 83   | 82
verb-PP          | 67   | 68       | 82   | 84
subord. clause   | 63   | 60       | 71   | 75

Table 7.19: Evaluation comparing LTChunk chunking (“dirty”) and near-perfect multi-word knowledge (“clean”) on GENIA corpus

This indicates that base phrase parsing is more domain-specific than parsing between these base phrases. It entails that taggers and chunkers or base-phrase rules for parsers need to be adapted more to a given domain than the rules for parsing between base phrases, and that parsing between base phrases, as Pro3Gres does, is quite domain-independent and robust.

We have added one domain-specific extension for parsing texts from the biomedical domain. Pro3Gres failed to attach more than one PP to relational nouns from the domain, such as phosphorylation, triggering or down-regulation. Based on the fact that sentence-initial sequences are typically unambiguous (Hindle and Rooth, 1993), we have added an unsupervised module that learns which nouns typically allow modification by several PPs. This extension explains the high noun-PP attachment recall.

7.4.2 Evaluation with the Stanford Dependency Scheme on 900 BioIn- fer sentences

Haverinen et al. (2008) have mapped the output of Pro3Gres to the Stanford dependency scheme and evaluated the parser's performance. The Stanford scheme (de Marneffe, MacCartney, and Manning, 2006) is a recent extension of Carroll, Minnen, and Briscoe (2003) and is a widely used dependency representation. Haverinen et al. (2008) have evaluated the output of Pro3Gres after the automatic conversion to the Stanford scheme. They report a total F-score of 74.3% on the 900 sentence BioInfer corpus (Pyysalo et al., 2007) and conclude that Pro3Gres has state-of-the-art performance.

7.4.3 Task-Oriented, Practical Evaluation of Pro3Gres Relation Extraction

The interest of an application of Pro3Gres to the biomedical domain lies in the discovery of domain-specific relations, such as "Protein activates Gene". Most of the NLP techniques applied to the domain of molecular biology focus on the discovery of entities, such as genes and proteins (see for instance Ananiadou and Tsujii (2003)). However, there are also interesting applications aiming at detecting syntactic and semantic relations among those entities. Examples of systems aiming at detecting relations are the following. Craven and Kumlien (1999) identify possible drug-interaction relations between proteins and chemicals using a "bag of words" approach applied at the sentence level. Ono et al. (2001) report on extraction of protein-protein interactions based on a combination of syntactic patterns. Friedman et al. (2001) describe a system (GENIES) which extracts and structures information about cellular pathways from the biological literature. Pustejovsky et al. (2002) process titles and abstracts of MEDLINE articles focusing on relation identification (in particular the inhibit relation). Gaizauskas et al. (2003) use a template-based Information Extraction approach, focusing on the roles of specific amino-acid residues in protein molecules.

In order to discover domain-specific relations, we believe that an accurate detection of predicate-argument relations is essential. We have asked domain experts to evaluate the quality of the extracted relations, so far focusing on triples of the form (predicate - subject - object).² The analysis of the whole GENIA corpus resulted in 10072 such triples (records). For the evaluation of biological relevance we selected only the records containing the following predicates: activate, bind and block. This resulted in 487 records.

The extraction algorithm aims at maximally expanding the arguments of the predicate, following all their dependencies. Each argument is then assigned a type (a concept of the GENIA Ontology), based on its head. The type assignment depends on the manual annotation performed by the GENIA annotators, so we have taken it as reliable and have not evaluated it further. We then removed all records where a type had not been assigned to either subject or object: this left 169 fully qualified records.³ This remaining set was inspected by a domain expert.

²This evaluation has been performed in collaboration with Biovista (http://www.biovista.com/).
³This step is meant to remove records where one of the arguments cannot be clearly assigned a type. This is generally caused by pronouns, which explains why in the error evaluation (see table 7.21) the number of pronouns appears so low.

A first evaluation was based on assigning a simple key code to each record: 'P' for positive (biologically relevant and correct, 53 cases), 'Y' for acceptable (biologically relevant but not completely correct, 102 cases) and 'N' (not biologically relevant or seriously wrong, 14 cases). This result was considered encouraging, as it showed 91.7% of relevant records.

We then asked the expert to evaluate in more detail. In this second evaluation the expert had to evaluate each argument separately and mark it according to the following codes:

• [Y] the argument is correct and informative

• [N] the argument is completely wrong

• [Pr] the argument is correct, but it is a pronoun, and it would need to be resolved to be significant (e.g. "This protein")

• [A+] the argument is "too large" (which implies that a prepositional phrase has been erroneously attached to it)

• [A-] the argument is "too small" (which implies that an attachment has been omitted)

In table 7.20 we show as an example the evaluation of the following sentences:

178. Interleukin-2 ( IL-2 ) rapidly activated Stat5 in fresh PBL, and Stat3 and Stat5 in preactivated PBL.

807. Thus, we demonstrated that IL-5 activated the Jak 2 -STAT 1 signaling pathway in eosinophils.

5212. Spi-B binds DNA sequences containing a core 5-GGAA-3 and activates transcription through this motif.

16919. The higher affinity sites bind CVZ with 20- to 50-fold greater affinity, consistent with CVZ's enhanced biological effects.

No    | relation | subject                   | subject type | eval | object                                                      | object type              | eval
178   | activate | Interleukin-2 (IL-2)      | G#amino_acid | Y    | Stat5 in fresh PBL, and Stat3 and Stat5 in preactivated PBL | G#amino_acid             | A+
807   | activate | IL-5                      | G#amino_acid | Y    | the Jak 2 -STAT 1 signaling pathway                         | G#other_name             | Y
5212  | bind     | Spi-B                     | G#amino_acid | Y    | DNA sequences                                               | G#nucleic_acid           | A-
16919 | bind     | The higher affinity sites | G#other_name | Pr   | CVZ with 20-                                                | G#other_organic_compound | N

Table 7.20: Some example sentences of the task-oriented evaluation on biomedical texts

The values of this evaluation are shown in table 7.21. They indicate that the performance of Pro3Gres is sufficient for the application to this task. The biggest source of error is over-expansion of the object; in addition, there was a small but significant problem in the detection of the subject, which we have corrected in the meantime. Overexpansion (A+) is only a minor problem for an expert searching for interactions between entities, since the reported subject or object does contain the entity (followed by additional, irrelevant information, typically PPs). Underexpansion (A-) means that an expert needs to scan the document a few words further (usually to the right) to find all the relevant information.

          | Y   | N  | Pr | A+ | A-
Subject   | 146 | 11 | 4  | 6  | 2
Object    | 99  | 1  | 4  | 59 | 6

Table 7.21: Distribution of GENIA parsing errors in the application-oriented evaluation

Despite parsing errors the results can be considered satisfactory, as they show 86.4% and 58.6% correct results in the detection of subjects and objects, respectively. If all overexpanded and underexpanded cases are considered as positive (excluding only the 'N' cases), these results increase to 93.5% and 99.4%, respectively.

7.5 Exploring Precision and Recall Trade-Offs

Although the performance of Pro3Gres is competitive, its output is far from perfect. This section addresses the questions of how much manual interaction is necessary to obtain perfect analyses, and of how much precision can be optimised at the expense of recall, or vice versa. A scenario where recall can be maximised at the cost of precision is building up a corpus, where it is much less labour-intensive for an annotator to choose the correct analysis from a relatively short, ordered list than to annotate from scratch. For an application building up a knowledge base, we rather want to be almost certain that a given relation is correct. Natural language redundancy often compensates for recall errors, but precision errors leading to the assertion of wrong facts in a knowledge base are a more serious problem. This constitutes a scenario where precision can be maximised at the cost of recall.

The percentages reported for the reference model in sections 7.5 to 7.8 differ slightly from those in section 7.2. The reasons are that we have used a slightly older version of Pro3Gres, that we have used the unidirectional mapping described in subsection 7.2.2 for PP-attachment, that we have not included modpart in the evaluation of subjects, and that we have excluded obj2 from the evaluation of objects. Since the evaluations shown in sections 7.5 to 7.8 do not compare to other parsers, but each experiment compares to a version of the parser that is identical except for the parameters discussed in the experiment, these small differences are not relevant for the discussion.

[Figure 7.7: Graph of percentage results of recall among first N-ranked analyses (1, 2, 4, 8 and 16) on the GREVAL corpus, for the Subject, Object, noun-PP and verb-PP relations. With a single analysis, recall ranges from 64.9% to 86.4% depending on the relation; with maximally 16 analyses, it ranges from 75.4% to 91.4%.]

[Figure 7.8: Graph of percentage results of recall among first N-ranked analyses on GENIA:

                   | Subject | Object | noun-PP | verb-PP
1 analysis         | 91.1    | 86.6   | 83.3    | 81.6
max. 2 analyses    | 91.1    | 87.7   | 85.4    | 83.3
max. 4 analyses    | 91.8    | 91.1   | 90.3    | 86.2
max. 8 analyses    | 93.7    | 91.3   | 91.1    | 86.2
max. 16 analyses   | 94.2    | 91.8   | 91.1    | 86.2 ]

7.5.1 High Recall Parsing

An annotation task is greatly facilitated if the annotator, instead of being asked to annotate every sentence manually, can choose from a (relatively short) ranked list of analyses. Brants and Plaehn (2000) have shown that parser-assisted annotation (in their case an interactive scenario with a shallow parser (Brants, 1999)) greatly increases annotation speed. For classical corpus linguistics tasks, too, optimising recall is desirable: while it is important to retrieve most instances of the investigated phenomenon, a modest amount of manual filtering of results is seen as acceptable and is often unavoidable.

Figures 7.7 and 7.8 show the increase in recall in relation to the length of the list of analyses. Lists longer than 16 readings of a sentence (which convey 4 2-way ambiguous relations) were assumed to be prohibitively long for manual scanning. The subj, obj and the two PP-relations together average above 90% recall in GENIA, which means that less than one in ten of these relations would need to be added manually by an annotator.
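The recall figures in figures 7.7 and 7.8 can be thought of as follows: a gold relation counts as found if it occurs in any of the top N analyses, mirroring an annotator who picks the correct analysis from the ranked list. A minimal sketch, assuming relations are represented as (label, head, dependent) triples:

def recall_at_n(gold_relations, ranked_analyses, n):
    # gold_relations: set of (label, head, dependent) triples;
    # ranked_analyses: list of such sets, best-ranked analysis first.
    found = set()
    for analysis in ranked_analyses[:n]:
        found |= analysis
    return len(gold_relations & found) / len(gold_relations)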

7.5.2 High Precision Parsing

In order to keep the necessity for intervention by a human annotator during corpus annotation to a minimum, it is desirable to recognise a maximum number of unproblematic relations. In an annotation scenario one can report the highest ranked parse and point out to the human annotator the few difficult and highly ambiguous relations in a given analysis. Parsing methods that optimise precision while reducing recall up to an acceptable point are required. A related study on this subject is Carroll and Briscoe (2002). High-precision parsing is also important for building up knowledge databases automatically, where recall deficiencies are often compensated by natural language redundancy, but asserting wrong knowledge arising from low precision poses a serious problem. The following experiments have been conducted to improve precision.

Experiment 1: Tagger Agreement Different taggers often make different mistakes. We have used two alternative taggers for preprocessing, LTPos (Mikheev, 1997) and Ratnaparkhi's Maximum Entropy tagger (Ratnaparkhi, 1996). In a simple experiment, only sentences for which both taggers deliver identical tags are used. Precision increases, but the large cost in terms of decreased recall is unacceptable, as shown in table 7.22.

Experiment 1   | Subject | Object | noun-PP | verb-PP
Precision      | 92.2    | 95.4   | 85.6    | 71.6
Recall         | 31.5    | 30.7   | 23.2    | 27.8

Table 7.22: Percentage results of Experiment 1: keeping only sentences with identical tags from two taggers, on the GREVAL corpus, for the subject, object and PP-attachment relations

Experiment 2   | Subject | Object | noun-PP | verb-PP
Precision      | 94.1    | 93.0   | 73.3    | 75.4
Recall         | 76.4    | 78.8   | 60.5    | 80.3

Table 7.23: Percentage results of Experiment 2: keeping only agreeing relations arising from parsing with two taggers, on the GREVAL corpus, for the subject, object and PP-attachment relations

Experiment 2: Grammatical Relations Agreement when using Different Taggers In order to minimise the loss in recall of the previous experiment, the output of each tagger is used as input to the LTChunk chunker and the Pro3Gres parser. Only grammatical relations that differ due to the tagging differences are discarded. The increase in precision is similar to experiment 1 (noun-PP attachment is slightly worse), while the decrease in recall is much more moderate, as table 7.23 shows. A schematic implementation is sketched below.
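Schematically, experiment 2 keeps the intersection of the relation sets produced by the two pipelines; the tagger and parser interfaces below are assumptions of the sketch, not the actual Pro3Gres code.

def agreed_relations(sentence, taggers, chunk_and_parse):
    # taggers: list of tagging functions; chunk_and_parse: function from
    # tagged text to a set of (label, head, dependent) triples.
    analyses = [chunk_and_parse(tag(sentence)) for tag in taggers]
    # Relations that differ due to tagging differences are discarded;
    # only relations on which all pipelines agree are reported.
    return set.intersection(*analyses)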

Experiment 3: Parsing Alternatives Agreement In this experiment, the intersection of the relations in the two top-ranked analyses is kept. This amounts to discarding only the most ambiguous relation of any given sentence. The decrease in recall (table 7.24) is higher than in experiment 2. Mainly the PP-attachment relations profit; they are often the most ambiguous relations, and are more affected by attachment ambiguities than other relations.

Experiment 4: Trust Short Distances Relations spanning short distances are intuitively thought to be easier for the parser to find. Experiment 4 discards all relations that are longer than a certain threshold. Length is measured in chunks. The experiment has been conducted at several distances for the GREVAL corpus (figures 7.9 and 7.10) and for the 100 manually annotated GENIA sentences (figures 7.11 and 7.12).

The results reveal interesting differences between different relation types. For subj, longer distances are almost as reliable. obj relations are almost exclusively very short.

[Figure 7.9: Graph of Precision of Experiment 4 on GREVAL:

               | Subject | Object | noun-PP | verb-PP
Distance 1-2   | 94.3    | 90.5   | 85.7    | 76.0
Distance 1-3   | 92.7    | 90.3   | 77.5    | 74.0
Distance 1-4   | 92.2    | 90.0   | 75.2    | 73.5
Distance 1-5   | 92.3    | 89.8   | 74.2    | 73.3 ]

[Figure 7.10: Graph of Recall of Experiment 4 on GREVAL:

               | Subject | Object | noun-PP | verb-PP
Distance 1-2   | 83.9    | 70.5   | 69.7    | 52.3
Distance 1-3   | 84.1    | 78.3   | 75.5    | 59.2
Distance 1-4   | 84.4    | 81.3   | 76.8    | 61.7
Distance 1-5   | 84.4    | 82.3   | 78.6    | 62.5 ]

[Figure 7.11: Graph of Precision of Experiment 4 on GENIA:

               | Subject | Object | noun-PP | verb-PP
Distance 1-2   | 95.5    | 92.9   | 92.3    | 88.1
Distance 1-3   | 92.9    | 89.5   | 87.6    | 87.2
Distance 1-4   | 92.9    | 90.2   | 87.5    | 86.8
Distance 1-5   | 92.9, 90.9 and 85.6 (only three of the four values are recoverable) ]

[Figure 7.12: Graph of Recall of Experiment 4 on GENIA:

               | Subject | Object | noun-PP | verb-PP
Distance 1-2   | 91.1    | 79.0   | 64.7    | 57.1
Distance 1-3   | 91.1    | 79.0   | 74.1    | 64.8
Distance 1-4   | 91.1    | 79.5   | 77.7    | 69.4
Distance 1-5   | 91.1    | 80.0   | 79.1    | 74.5 ]

Experiment 3        | Subject | Object | noun-PP | verb-PP | subord. clause
GREVAL  Precision   | 92.6    | 90.1   | 76.6    | 76.7    | 68.2
        Recall      | 76.8    | 63.6   | 53.7    | 67.2    | n/a
GENIA   Precision   | 91.1    | 93.4   | 87.0    | 84.2    | 65.2
        Recall      | 78.1    | 65.8   | 68.0    | 70.5    | 60.4

Table 7.24: Percentage results of Experiment 3: discarding the most ambiguous relation in each sentence, for the subject, object, PP-attachment and subordinate sentence relations

Subordinate clause relations are difficult and mostly very long, about 20% spanning at least 5 chunks. For envisaged applications, e.g. protein interaction relations, sentence subordination is less important. PP-attachment relations depend very strongly on distance. This is largely due to the fact that many PP-attachments across longer distances⁴ are in competition with intervening other PPs and thus exponentially lower the baseline⁵.

When comparing the two evaluation corpora and genres, a major difference is constituted by PP-attachments. The complexity of medical language partly stems from very complex nouns with embedded PPs (see e.g. figure 7.6). The noun-PP-attachment per sentence ratio is 2.1 in our GENIA 100 test corpus and 1.6 in GREVAL. The fact that the performance on GENIA is better than on GREVAL can largely be explained by our remarks in section 7.5.1.
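Footnote 5 is easy to verify numerically; the following sketch computes Catalan numbers, showing how fast the number of PP-attachment analyses grows with the number of PPs.

from math import comb

def catalan(n):
    # C_n = (1 / (n + 1)) * binomial(2n, n); a sequence of n PPs has
    # C_{n+1} attachment analyses (Church and Patil, 1982).
    return comb(2 * n, n) // (n + 1)

print([catalan(n + 1) for n in range(1, 6)])  # [2, 5, 14, 42, 132]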

Experiment 5: Cut Low Probability Parsing Decisions In a first attempt, experiments with an increased probability cut-off at parse time were conducted. However, they had the effect of greatly increasing the amount of partial parses, thus returning many local analyses that the syntactic parsing context would have disambiguated. Precision remained comparable, while recall dropped. In a second approach, the parsing algorithm remains unchanged, but only relations whose probability is above a certain threshold are reported; a sketch of this filter follows below. Pro3Gres probabilities express decision probabilities at each given ambiguous point. Experiment 5 has been conducted on the highly ambiguous PP-attachment relations. The results are shown in figures 7.13 and 7.14 for the GREVAL corpus, and in figures 7.15 and 7.16 for the GENIA corpus.
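The second variant of experiment 5 amounts to a simple post-filter on the parser output; the relation representation here is an assumption of the sketch.

def filter_by_threshold(relations, threshold):
    # relations: iterable of (label, head, dependent, probability) tuples,
    # where probability is the parser's decision probability at the
    # ambiguous attachment point. Low-confidence relations are not reported.
    return [r for r in relations if r[3] >= threshold]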

⁴Observe that "longer distances" does not entail a long-distance dependency traditionally expressed by coindexing or movement, although a considerable portion of the "longer distances" here are long-distance dependencies, for example fronted PPs attaching to the verb.

⁵Church and Patil (1982) describe for PP attachment that a sequence with n PPs has \(C_{n+1}\) analyses, where \(C_{n+1}\) is the \((n+1)\)-th Catalan number. The Catalan number \(C_n\) is defined as \(C_n = \frac{1}{n+1}\binom{2n}{n}\).

[Figure 7.13: Graph of Precision of Experiment 5 on GREVAL:

Threshold     | 0.0   | 0.1   | 0.2   | 0.3   | 0.4   | 0.5   | 0.6   | 0.7   | 0.8   | 0.9   | 0.95  | 0.99
nounpp_prec   | 75.29 | 75.23 | 75.52 | 75.70 | 76.00 | 77.54 | 78.80 | 79.95 | 80.49 | 84.19 | 85.22 | 85.23
verbpp_prec   | 72.77 | 72.93 | 72.81 | 72.85 | 73.15 | 73.38 | 73.99 | 74.75 | 74.07 | 76.76 | 77.27 | 82.69 ]


[Figure 7.14: Graph of Recall of Experiment 5 on GREVAL, for the ncmod, iobj and argmod relations at thresholds 0.0 to 0.99. Recall declines gradually (from roughly 65% to 86% at threshold 0.0, depending on the relation) and then drops sharply at high thresholds, reaching 0% to 22% at threshold 0.99.]

[Figure 7.15: Graph of Precision of Experiment 5 on GENIA:

Threshold   | 0.3  | 0.4  | 0.5  | 0.6  | 0.7  | 0.8  | 0.9
noun-PP     | 84.5 | 85.3 | 86.2 | 87.4 | 87.6 | 88.1 | 88.2
verb-PP     | 81.6 | 81.3 | 79.8 | 82.3 | 81.5 | 80.3 | 79.7 ]

[Figure 7.16: Graph of Recall of Experiment 5 on GENIA: recall for the noun-PP and verb-PP attachment relations drops from 79% to 83% at threshold 0.3 down to 44% to 65% at threshold 0.9.]

Backoff Level | Participants
0             | full: head & prep & description noun
1             | head & prep & description noun class
2             | head & prep
3 / 3.1       | head class & prep & description noun
4 / 4.1       | head class & prep & description noun class
5             | prep & description noun
6             | prep
7             | NONE

Table 7.25: PP-attachment backoff level legend

MaxEnt-tagged Parsing                    | Subject | Object | noun-PP | verb-PP
Keeping all                  Precision   | 92.6    | 89.6   | 76.6    | 74.4
                             Recall      | 81.1    | 83.9   | 67.4    | 89.2
Discarding levels 5 and 6
for non-by-PP-attachment     Precision   | 92.6    | 89.6   | 78.4    | 74.8
                             Recall      | 81.1    | 83.9   | 65.3    | 85.6

Table 7.26: Evaluation of the currently best parser output on the GREVAL corpus on subject, object and PP-attachment relations, discarding backoff levels 5 and 6 for non-by-PP-attachment

Below threshold values of about 0.5 there is a reasonable trade-off of gained precision for lost recall. With higher thresholds, precision stagnates while recall drops off. While there is a clear correlation between noun attachment probability and correctness, verb attachment exhibits the expected correlation less clearly.

Experiment 6: Informedness and Backoff Levels Better informed decisions, i.e. decisions that can be made earlier in the back-off process, are expected to be better. Experiment 6 reveals that this is indeed the case; the difference in precision is considerably larger than in any of the other experiments. This is a strong indication that lexicalization is pivotal for disambiguation. Experiment 6 has only been done on GREVAL, and only using the Ratnaparkhi tagger.

After a first inspection of the results it was noticed that verb-PP attachment with the preposition by still performs very well at late backoff stages. This is due to the fact that active sentence subject and passive sentence by-agent counts are

[Figure 7.17: Graph of Precision of Experiment 6 on GREVAL: PP-attachment precision values by backoff level; numbers of [Noun, Verb] occurrences returned by the parser in brackets:

                       | nounpp  | verbpp
Level 0 [24,26]        | 100.00  | 84.62
Level 2 [254,279]      | 83.07   | 73.84
Level 3 & 3.1 [21,20]  | 76.19   | 85.00
Level 4 & 4.1 [99,74]  | 64.65   | 62.16
Levels 5 & 6 [26,18]   | 30.77   | 55.56 ]

Experiments      | GREVAL                                | GENIA
3,4,5 combined   | Subject | Object | noun-PP | verb-PP  | Subject | Object | noun-PP | verb-PP
Precision        | 92.6    | 90.1   | 78.9    | 80.5     | 92.4    | 93.5   | 87.9    | 88.1
Recall           | 75.0    | 63.4   | 51.2    | 67.2     | 67.3    | 67.0   | 66.7    | 65.5

Table 7.27: Percentage results of Experiments 3, 4 and 5 combined, at threshold 0.4 and distances 1 to 5

not mapped to each other, so that, because of the general rarity of passives, counts are very sparse. But the attachment is not very ambiguous – a passive verb is never followed by an object – so performance is quite good. We have therefore re-run the experiment excluding the preposition by. The result on GREVAL is shown in figure 7.17. Counts reported by the parser are given in brackets. A description of the PP backoff levels is in table 7.25. At PP-attachment backoff levels 5 and 6, where the governor candidate's head is dropped, counts are low and performance drops off completely. It may therefore be beneficial to generally discard them. Applying a filter that discards all level 5 and level 6 PP-attachment relations except for by-PPs to the currently best model (shown in table 7.2, repeated here) results in the high-precision performance figures in table 7.26.

Combinations Most of the above high-precision experiments can be combined in various ways. For example, the combination of experiments 3, 4 and 5 is reported in table 7.27, with threshold 0.4 and distances 1 to 5. This sample combination on the GENIA annotation task allows us to reach about 9 out of 10 precision at 2 out of 3 recall for all reported relations.

7.6 Baseline, Distance Measure, Lexicalisation

In order to assess the difficulty of the task for the statistical disambiguation model, we have built a baseline system. We then show how much the two factors in the model, the distance measure and the lexicalisation, contribute to improving the performance of the parser. Finally, we discuss questions related to lexicalisation.

7.6.1 The Baseline System

In the baseline system, dependencies can span unlimited distances, subject to the constraint that dependencies are not allowed to cross each other, and lexical information is not taken into account.

[Figure 7.18: Comparison between Baseline, Distance Measure Only, Full Model without Distance Measure, and Full Model (Effects of the Statistical Model: From Baseline to Full Model):

                          | Subj P | Subj R | Obj P | Obj R | Noun-PP P | Noun-PP R (ncmod) | Verb-PP P | Verb-PP R (iobj) | Verb-PP R (argmod)
Baseline                  | 90.2   | 79.6   | 88.5  | 70.8  | 68.8      | 39.4              | 40.9      | 55.4             | 70.7
Distance Measure Only     | 91.4   | 80.1   | 90.1  | 82.9  | 65.4      | 63.4              | 56.9      | 82.1             | 73.2
Full Model w/o Distance   | 91.1   | 80.6   | 90.1  | 83.6  | 73.0      | 67.0              | 62.8      | 80.9             | 65.9
Full Model                | 91.5   | 80.6   | 90.3  | 83.6  | 75.0      | 73.7              | 66.3      | 86.0             | 82.9 ]

The most left-hand quarter of figure 7.18 shows the performance of the baseline, which is, as expected, rather poor. The values for verb-PP-attachment are especially low because ambiguous attachments get equal weight, which means that whether a verb- or noun-PP-attachment is ranked as the first parse depends on chance factors: the CYK algorithm finds the closer noun attachment first, but the module that finds the best path through a sentence is stack-driven, starting with the last found analyses and preferring them if all analyses have equal weight, which leads to a strong preference for verb-PP-attachment.

It would theoretically be possible to extend the grammar to include distance and lexicalisation information. But wide-coverage lexical information, and especially distance information, is very cumbersome to include in a grammar. The use of empirical measures is desirable.

7.6.2 The Distance Measure

The distance between a head and a dependent is a limiting factor for the probability of a dependency between them. Not all relations have the same typical distances, however. While objects most frequently immediately follow the verb, a PP attached to the verb may easily follow only in the second or third position, after the object and other PPs. A simple relation-specific MLE estimation is thus employed to prefer typical distances. Distance is measured in chunks. Formula 7.1 shows our MLE calculation (see chapter 2): the probability of a certain Distance across which to span, given the relation R, corresponds to the corpus count of the instances of relation R that span this distance, divided by the count of all instances of relation R.

\[ P(\mathit{Distance} \mid R) = \frac{\#(R \wedge \mathit{Distance})}{\#R} \qquad (7.1) \]
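Formula 7.1 can be estimated directly from counted training material; a minimal sketch, assuming the training data is available as (relation, distance) pairs:

from collections import Counter

def estimate_distance_model(training_pairs):
    # training_pairs: list of (relation_label, distance_in_chunks) pairs.
    pair_counts = Counter(training_pairs)
    relation_counts = Counter(rel for rel, _ in training_pairs)
    # P(Distance | R) = #(R with this distance) / #R, as in formula 7.1.
    return {(rel, dist): count / relation_counts[rel]
            for (rel, dist), count in pair_counts.items()}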

The distance measure leads to a large increase in performance, as shown in the second leftmost quarter of figure 7.18, which confirms our intuitions. To a lesser degree, the algorithm's preference for verb-PP attachment is still apparent. We give separate values for the two verb-PP relations. Verb-PP arguments profit very much from distance information, while the by/agent of passive phrases, expressed by the arg_modc relation, already shows quite good performance in the baseline model, and hardly improves. The full model with distance only is the full model without lexicalisation. In the following we investigate the lexicalisation model, which is the full model without distance measure.

[Figure 7.19: F-Score Comparison between Baseline, Distance Measure Only, Full Model without Distance Measure, and Full Model (Effects of the Statistical Model: F-Scores from Baseline to Full Model), for the Subject, Object, Noun-PP and Verb-PP relations; verb-PP recall is on iobj only.]

7.6.3 Lexicalisation

The contribution of lexicalisation can be illustrated by comparing the distance measure system to the full system, and to the full system without the distance measure (the purely lexicalised system). All these systems are compared in figure 7.18. Used in combination, a significant increase can be observed in the full system. This confirms our intuition that the types of information provided by distance and by lexicalisation are not redundant. The subject relation already performs well without the statistical model, the object relation profits slightly, and the PP-attachment relations improve very much.

The results are presented in a different arrangement in figure 7.19. The effects of the interdependence between precision and recall have been eliminated by showing f-scores only. The groups are arranged by model, so that the increase and the changes in the increase from model to model become apparent. One can see that the increase in performance from lexicalisation is bigger than the one from distance.

The arg_modc recall has not been included, since it is least affected by distance and lexicalisation. This change has the effect that verb-PP recall is now dominated by arguments, and noun-PP recall is dominated by adjuncts. Since noun-PP attachments are far more often adjuncts than verb-PP attachments are, noun-PP precision is also quite strongly dominated by adjuncts. Arguments show strong selectional restrictions, while adjuncts occur in a wide distribution over different governors. As a consequence, one would expect lexicalisation to work considerably better on PP-arguments than on PP-adjuncts. On the one hand, this is confirmed by the bigger increase that verb-PP attachment shows over the baseline, and the generally better performance on verb-PP attachment. On the other hand, the difference between the distance model and the lexicalised model is bigger in noun-PP attachment. This, however, does not indicate that lexicalisation works particularly well for noun-PP attachment, for two reasons. First, noun-PP performance is still below verb-PP performance in the lexicalisation model; second, distance is simply a relatively bad model for noun-PP. Figures 7.9 and 7.11 show that short distances perform almost as poorly for noun-PP as long distances, in sharp contrast to verb-PP, where short distances perform very well. A noun usually does not attach across a verb, but a verb (or a different noun) very often attaches across a noun; a short distance is therefore not a good indication for attachment.

Lexicalisation and Backing Off

The probability model is backed off across more levels than in Collins and Brooks (1995), following Merlo and Esteve Ferrer (2006). Before discarding lexical participants, semantic classes are also used in all the modelled relations: for verbs the Levin classes (Levin, 1993) and/or the top Wordnet class of the most frequent sense, for nouns the top Wordnet class (Fellbaum, 1998), also of the most frequent sense. A sketch of such a backoff chain follows below.
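The following sketch illustrates a backed-off PP-attachment estimate following the levels of table 7.25; the counts table, the class lookup and the function signature are illustrative assumptions, not the actual Pro3Gres data structures.

def pp_attachment_probability(counts, head, prep, desc_noun, cls):
    # counts: maps (level, participants...) keys to (attach, total) pairs;
    # cls: maps a word to its semantic class (top Wordnet class for nouns,
    # Levin or Wordnet class for verbs). Levels follow table 7.25.
    levels = [
        (0, (head, prep, desc_noun)),            # full lexicalisation
        (1, (head, prep, cls[desc_noun])),       # description noun class
        (2, (head, prep)),                       # head & prep
        (3, (cls[head], prep, desc_noun)),       # head class
        (4, (cls[head], prep, cls[desc_noun])),  # both classes
        (5, (prep, desc_noun)),                  # head dropped
        (6, (prep,)),                            # preposition only
    ]
    for level, key in levels:
        attach, total = counts.get((level,) + key, (0, 0))
        if total > 0:                # back off until counts are available
            return attach / total, level
    return 0.0, 7                    # level 7: NONE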

Where Decisions are Taken The backoff decision points for the GREVAL corpus are shown in table 7.28. It reveals that full count decisions, especially in the case of PP-attachment, are relatively rare. This is due to sparse data, most pronouncedly in noun-PP-attachment, where a Zipfian head noun as well as a Zipfian description noun (the noun inside the PP) is involved. The values at levels 6 and 7 of verb-PP-attachment are very high because there are prepositions that are very rarely verb-attaching (e.g. of) and because the Penn Treebank preposition tag (IN) is also used for complementizers. In these cases, very low MLE probabilities occur at the preposition-backoff level 6 or 7, in which case a low non-zero probability is assigned. If aggressive pruning is used, these values at levels 6 and 7 are much lower.

How Important is Lexicalization? We have seen in experiment 6 of section 7.5.2 (figure 7.17) that full lexicalization seems to be pivotal: decisions taken at early stages in the back-off chain are far better. Results by Klein and Manning (2003) and by Gildea (2001) suggest the opposite: partial lexicalisation is as good as full lexicalisation, suggesting that the earliest levels in the back-off chain are redundant. Gildea (2001) compares a bilexical parser (conditioned on the head word and the dependent word) that follows Collins Model 1 (Collins, 1996) to an equivalent model that is only monolexicalized (conditioned only on the dependent word). He reports that performance is only half a percent higher for the bilexicalized parser. Tested on a domain that differs from the training corpus (Gildea (2001) trains on the Treebank and tests on Brown), performance is largely equivalent. Also in experiments that we conducted with Pro3Gres, full lexicalisation hardly increases performance. More research is needed to answer this question.

Relation  | Degree of Lexicalisation            | Counts
subj      | 0 full                              | 377
          | 1 verbclass & nounclass             | 530
          | 2 verb                              | 384
          | 3 noun                              | 41
          | 4 NONE                              | 15
obj       | 0 full                              | 437
          | 1 verb & nounclass                  | 939
          | 2 verbclass & noun                  | 32
          | 3 verbclass & nounclass             | 145
          | 4 verb                              | 92
          | 5 noun                              | 40
          | 6 NONE                              | 12
pobj      | 0 full                              | 124
          | 1 verb & prep & nounclass           | 2624
          | 2 verb & prep                       | 2631
          | 3 verbclass & prep & noun           | 337
          | 4 verbclass & prep & nounclass      | 5004
          | 5 prep & noun                       | 995
          | 6 prep                              | 4762
          | 7 NONE                              | 4747
modpp     | 0 full                              | 30
          | 1 noun & prep & descnounclass       | 197
          | 2 nounclass & prep & descnoun       | 100
          | 3 noun & prep                       | 208
          | 4 nounclass & prep                  | 696
          | 5 prep & descnoun                   | 73
          | 6 prep                              | 227
          | 7 NONE                              | 281
modpart   | 0 full                              | 0
          | 1 nounclass & verbclass             | 144
          | 2 verb                              | 45
          | 3 noun                              | 7
          | 4 NONE                              | 11

Table 7.28: Backoff decision points for the fully lexicalized, backed-off system on the GREVAL corpus

[Figure 7.20: Comparison of Levin or Wordnet verb classes for backing off: nounpp precision, verbpp precision, ncmod recall and iobj recall for each of the two classifications, at thresholds 0.1 to 0.9 (values between 55% and 90%).]

Different Semantic Classifications for Backing Off

Originally, Pro3Gres uses Wordnet for nouns and Levin classes for verbs. We have decided to use Levin classes because they are situated at the interface level between syntax and semantics. As an alternative, Wordnet can also be used for verbs. We have re-run experiment 5 from section 7.5.2, this time comparing the use of Wordnet or Levin classes at the corresponding backoff level. The results on PP-attachment are shown in figure 7.20. As expected, mainly the verb-PP attachment relation's performance changes (there is also an indirect influence on noun-PP attachment, with which verb-PP attachment is typically in competition). Both verb-PP precision and iobj recall (which measures verb argument recall) are higher when using Wordnet. Only the rare argmod relation, which has not been included in the graph, has lower recall: e.g. 33 out of 41 at threshold 0.1 with Levin, compared to 31 out of 41 with Wordnet. Due to the low count this result is not reliable. We can see that Wordnet verb classes perform better for PP-attachment than Levin classes. Since Levin classes have considerably lower coverage than Wordnet classes, no theoretical conclusions should be drawn from this practical result.

7.7 Disambiguation from the Parsing Context

We have illustrated the extent to which the distance model and the lexicalisation model disambiguate sentences during parsing, and we have evaluated their performance. There is an additional factor that disambiguates parses in a parsing approach: parsing itself. In this section we investigate the impact that parsing has on the performance of discovering syntactic relations.

7.7.1 Parsing Speed, Pruning, and Local Maxima

Parsing speed is an important factor for the practical applicability of a parser, and hence an aspect of its evaluation. Parsing speed largely depends on the pruning parameters used. In complex real-world sentences, constructing all possible chart entries can become very time-consuming. It has been shown (see e.g. Brants and Crocker (2000)) that discarding locally very improbable partial analyses hardly affects a parser's performance, because the chance that locally very improbable analyses become part of the most probable analysis later is very small. We have conducted similar experiments. In the experiment that we report here, we have limited the number of alternatives per span, using a fixed beam search, as sketched below.
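A minimal sketch of the pruning step, assuming chart entries are stored per span as (probability, analysis) pairs; the data structure is illustrative, not the actual Pro3Gres chart.

def prune_spans(chart, beam_size):
    # chart: maps a span (i, j) to a list of (probability, analysis)
    # alternatives. Only the beam_size most probable alternatives per
    # span are kept, as in the experiment reported in table 7.29.
    return {span: sorted(alts, key=lambda pa: pa[0], reverse=True)[:beam_size]
            for span, alts in chart.items()}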

# Span Alternatives        | 1     | 2     | 3     | 4     | 5     | 10
Execution time (secs.)     | 53    | 71    | 82    | 92    | 105   | 149
Subject Precision          | 92.33 | 92.77 | 93.18 | 93.18 | 93.18 | 93.18
Subject Recall             | 81.27 | 81.79 | 81.79 | 81.79 | 81.79 | 81.79
Object Precision           | 85.49 | 85.44 | 85.71 | 85.71 | 85.71 | 85.71
Object Recall              | 84.65 | 85.42 | 85.67 | 85.67 | 85.67 | 85.67
Noun-PP Precision          | 78.78 | 78.47 | 78.22 | 78.44 | 78.44 | 78.49
Noun-PP Recall             | 65.29 | 65.66 | 65.79 | 65.91 | 65.91 | 66.04
Verb-PP Precision          | 74.89 | 74.30 | 74.56 | 74.56 | 74.56 | 74.73
Verb-PP Recall             | 87.37 | 87.37 | 87.37 | 87.37 | 87.37 | 87.37
Subord. Clause Precision   | 69.15 | 72.18 | 73.11 | 73.11 | 73.11 | 73.11
Subord. Clause Recall      | 66.86 | 66.86 | 66.86 | 66.86 | 66.86 | 66.86

Table 7.29: Beam size effects on parsing speed and performance

Even if only one alternative is used, which means that local maxima are deterministically assumed to be global maxima, performance does not suffer very much, while parsing speed increases considerably. We have parsed the 500 sentence GREVAL corpus; execution times are for the whole corpus, in seconds. The results are summarised in table 7.29. At a maximum of about 5 alternatives, the improvement flattens out completely.

The subordinate clause relation (sentobj) precision improves by 4 percent from a beam of size 1 to size 10. There are relatively many situations affecting this relation, especially in relation with zero relative clauses (modpart), a configuration that can sometimes give rise to a garden path reading. A simple example is the following sentence (number 303 in the GREVAL corpus).

(112) You want a job guaranteed when you return, I continued my attack.

Locally, the probability that job is the subject of guarantee is higher than the correct reading, in which job is relativised by the participle guaranteed. When using a beam of size 1, the correct reading is pruned, and a global analysis is constructed. This global analysis expresses the meaning of You want that a job guaranteed something when you return .... When using a bigger beam, the correct, locally less likely modpart reading is kept and later becomes part of the globally most probable reading. The parsing context corrects the local parsing error. The subject and object relations also improve by up to a percent between beams of size 1 and 10; example (112) also illustrates an improvement for the subject relation, where job becomes subject of guarantee. Table 7.29 also illustrates that PP-attachment is largely unaffected by the parsing context: we mainly get small, random fluctuations for the PP-attachment relations. This indicates that disambiguation algorithms like Collins and Brooks (1995) or Merlo and Esteve Ferrer (2006) can also profitably be applied to the output of a parser or chunker in a post-processing phase.

The parsing times reported in table 7.29 can be further reduced by setting other pruning parameters, which have been left unused for this experiment in order to exclude interactions. The different parameters have been introduced in chapter 2. Parsing speed for a sentence is a fraction of a second on average. The entire British National Corpus (BNC) parses in just over 24 hours on the latest fast multi-core Apple Macintosh server. The fact that local maxima do not always lead to global maxima is one of the reasons why Briscoe and Carroll (2002) are sceptical whether shallow parsing can reach the levels of accuracy of full parsing.

7.7.2 Local Readings Constrain Each Other

We have just illustrated in subsection 7.7.1 that the parsing context corrects situations in which a locally most likely reading is not globally most likely. There is a second method by which parsing approaches disambiguate, again by refuting locally probable readings. Many locally possible readings are ruled out because they cannot be accommodated with the surrounding readings into a possible global reading. Let us consider the following example sentence.

(113) Experts fear the virus will spread.

The locally possible object relation with fear as governor and virus as dependent is ruled out because the rest of the sentence cannot be attached, while the reading in which virus is the subject of a subordinate clause allows the parser to construct a span covering the entire sentence. It is difficult to assess this effect of the parsing context exactly, but some indications can be given. If we restrict the number of CYK parsing levels (see chapter 2), only structures up to a certain maximal length can be constructed, and only sentence fragments are returned, thus enforcing a restricted parsing context. The locally most likely readings for the partial structures of this maximal length are collected by the parse collector and returned. We have again parsed the 500-sentence GREVAL corpus. Execution times are for the whole corpus, in seconds. The results of this experiment are shown in table 7.30. Parsing with fewer than 3 CYK levels is not very interesting, since no PP-attachment has yet been made.

# CYK levels                 3      5      7      10     15
Execution time (secs.)       21     41     54     70     74
Subject Precision            88.24  91.39  92.75  92.96  93.91
Subject Recall               71.65  80.75  81.79  81.58  81.79
Object Precision             89.09  85.31  85.44  85.71  85.71
Object Recall                76.47  83.12  85.42  85.67  85.67
Noun-PP Precision            72.39  74.92  78.29  78.82  78.44
Noun-PP Recall               30.21  52.93  62.04  65.66  65.91
Verb-PP Precision            46.63  56.53  68.27  73.77  74.56
Verb-PP Recall               62.12  69.69  80.80  86.86  87.37
Subord. Clause Precision     71.21  67.14  70.67  72.50  73.11
Subord. Clause Recall        35.54  61.44  66.26  66.26  66.86

Table 7.30: Disambiguation from the context: maximal span size effects on parsing speed and performance

After 15 levels, the improvement has flattened out completely. Recall is low when the CYK levels are restricted, for the obvious reason that structures which are longer than the maximum length cannot be found. But precision is also considerably lower (except for the object relation), although we have seen in section 7.5.2 that relations across short distances are generally more reliable. The increase in precision as more levels are added gives us an indication of the degree to which parsing improves the discovery of syntactic relations6. The increase in precision from 3 CYK levels to 15 levels varies: 2 % for the subordinate clause relation, 5 % for the subject relation (see example 113), 6 % for noun-PP attachment, and almost 30 % for the verb-PP attachment relation. We can see that the local readings considerably constrain each other. This is a significant advantage of parsing approaches. The effect on performance is bigger than the correction of local maxima that we have discussed in subsection 7.7.1.
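The relation between the number of CYK levels and the maximal span size can be illustrated with a small sketch. The function below is a schematic illustration under the assumption that level l builds exactly the spans of length l; it is not the actual parser code.

# Schematic sketch: if the parser is stopped after max_levels CYK levels,
# only spans up to that length are built, so longer-range relations
# cannot be found. Level l is assumed to build the spans of length l.

def cyk_spans(n_chunks, max_levels):
    """Enumerate the (start, end) spans filled in a bottom-up CYK pass
    that is cut off after max_levels levels."""
    spans = []
    for length in range(2, min(max_levels, n_chunks) + 1):
        for start in range(n_chunks - length + 1):
            spans.append((start, start + length))
    return spans

# with 10 chunks, 3 levels build only short spans; 15 levels build all
print(len(cyk_spans(10, 3)), len(cyk_spans(10, 15)))    # 17 45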

6Non-parsing approaches that use a sophisticated context model, for example Buchholz (2002), model these parsing context restrictions up to a point.

with standard grammar
             Subject  Object  noun-PP  verb-PP  subord. clause
Precision    93.2     86.1    78.4     74.7     73.1
Recall       81.6     85.2    65.9     87.4     66.7

grammar without constraints
             Subject  Object  noun-PP  verb-PP  subord. clause
Precision    72.3     58.8    67.4     68.5     47.8
Recall       60.1     76.7    58.1     84.3     57.8

Table 7.31: The currently best system on the GREVAL corpus, with the standard grammar and with the grammar without linguistic constraints

7.8 Linguistic constraints

Linguistic knowledge has allowed us to place strong restrictions on the co-occurrence of different relation types. Verbs that have attached adjuncts cannot attach complements, since this would violate X-bar constraints. Verbs that have no object cannot attach a secondary object. The application of dependency rules can often be lexically restricted: for example, only temporal expressions occur as NP adjuncts. We have formulated constraints based on a long tradition of grammar writing. We have explained these constraints in detail in chapter 5. Dependency rules specify a head tag, a dependent tag, sometimes also a head word and/or a dependent word, and the said linguistic constraints. We have created a version of the grammar in which all rules – except for conjunction, comma, and nchunk rules, which overgenerated massively – have no linguistic constraints. We have again parsed the 500-sentence GREVAL corpus. Parsing speed was two orders of magnitude slower due to the search space explosion. This means that the linguistic constraints have an even bigger impact than pruning.

The performance results are shown in table 7.31, but they should be interpreted with caution. One important reason why the performance is so low is that, due to the inflated search spaces when restrictions are absent, many sentences created so many chart entries that they ran into aggressive pruning mode (normally only a small fraction of sentences do). We had to use default pruning settings in this experiment, otherwise the parser exited with a memory overflow. Aggressive pruning leads to a strong fragmentation into partial parses, as in the experiment in subsection 7.7.2 above. A second important reason is that Pro3Gres was not written to be used without linguistic constraints. Minor relations, for example a subject relation to the right or strandprep, which are normally constrained to very specific contexts, overgenerate and lead to absurd results. Still, we can conclude that linguistic constraints make an important contribution to the parser's speed and robustness, and that, at least on practical grounds, the parser owes much of its performance to linguistic constraints.
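The kind of co-occurrence constraint described in this section can be sketched as a simple check that is consulted before a rule is applied. The relation labels and the function below are illustrative stand-ins, not the actual Pro3Gres grammar notation.

# Illustrative co-occurrence check, consulted before a dependency rule
# is applied: a verb that has attached an adjunct may not attach a
# complement, and a secondary object requires a first object.

COMPLEMENTS = {"subj", "obj", "obj2", "sentobj"}

def may_attach(head_relations, new_relation):
    """Check X-bar-style co-occurrence constraints on rule application."""
    has_adjunct = any(r not in COMPLEMENTS for r in head_relations)
    if new_relation in COMPLEMENTS and has_adjunct:
        return False                       # complements attach below adjuncts
    if new_relation == "obj2" and "obj" not in head_relations:
        return False                       # no secondary object without object
    return True

assert may_attach(["obj"], "obj2")         # give [someone] [something]
assert not may_attach(["adjunct"], "obj")  # adjunct already attached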

7.9 Conclusions

We have evaluated Pro3Gres and shown that its performance is very good. We have shown a detailed comparison of Pro3Gres to RASP (Carroll, Minnen, and Briscoe, 2003). We have shown that we outperform Collins Model 1 (Collins, 1999). We have reported a tentative comparison to Nivre (2006b). We have reported the results of a task-based practical application evaluation of using Pro3Gres on texts from the biomedical domain. We have shown that if high-quality recognition of multi-word domain terms is provided, the parser performs as well on the biomedical domain as on general text.

We have shown how the parser can be used for applications that need to optimise on precision or on recall. We have compared our full model to a baseline and shown how the two main factors in our probability estimation, distance and lexicalisation, contribute to the performance. We have raised a crucial question that is still open: how important is lexicalisation? Concerning the practical applicability of Pro3Gres, we have illustrated the speed of the parser, and how pruning affects parsing speed and performance. We have given quantitative indications of how much the use of a parsing approach improves the discovery of syntactic relations, and we have shown the tremendous impact that linguistic constraints have.

Chapter 8

Conclusions

We have presented a parsing architecture that is fast, robust and efficient enough to allow users to do broad-coverage parsing of unrestricted texts from different domains. We have discussed its implementation, Pro3Gres, and shown that its performance is very good.

8.1 The Cornerstones of Pro3Gres

We have presented five elements that combine to make Pro3Gres a fast, robust, and high-performance parser. Each of the five elements, which are summarised in the following, is essential to achieve this goal.

Hand-written Grammar with Linguistic Constraints We have discussed the hand-written grammar in detail in chapter 5. We have shown that the effort needed to write a broad-coverage grammar for English is manageable. A hand-written grammar allows one to model structures that are rare in the training corpus, for example questions. A hand-written grammar also allows one to place powerful constraints on the application of rules, for example by restricting the co-occurrence of dependency types, thereby implementing linguistic principles like subcategorisation, X-bar theory, lexical semantics and other non-local restrictions. We have shown in the evaluation in chapter 7 that the grammar covers the majority of the structures found in English. We have also shown (in section 7.8) that the powerful linguistic constraints boost both performance and parsing speed. Parsing speed increases by two orders of magnitude thanks to the linguistic constraints. The linguistic constraints are probably the element that increases parsing speed most. We have discussed in section 4.6.3 that such linguistic constraints are related to supertagging.

Lexicalisation We have presented the statistical performance model as an extension of PP-attachment research (Collins and Brooks, 1995) to all syntactic relations in chapters 2 and 4. We have compared our model to Collins (1999) in chapter 4. Pro3Gres is different from Collins (1999) because it exploits the possibilities that Dependency Grammar offers (see chapter 3), for example using dependency labels to process subcategorisation or representing entire subtrees corresponding to long-distance dependencies in a single dependency. We have presented a sophisticated back-off model which is partly based on Merlo and Esteve Ferrer (2006), including semantic classes in the back-off. We have shown in the evaluation in section 7.6.3 that lexicalisation leads to a tremendous increase in performance over the non-lexicalised base model. Of the five elements, lexicalisation is the one that increases performance most.
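A minimal sketch of such a backed-off estimation, in the spirit of Collins and Brooks (1995), is given below. The RELATIONS set, the counts table and the semclass() mapping are assumed for the example and do not reflect the actual Pro3Gres data structures.

# Backed-off maximum-likelihood estimation of P(relation | head, dependent):
# fall back to less specific conditioning contexts when the fully
# lexicalised tuple is unseen.

RELATIONS = ("subj", "obj", "pobj")

def p_relation(rel, head, dep, counts, semclass):
    backoff_contexts = [
        (head, dep),                      # fully lexicalised
        (semclass(head), semclass(dep)),  # semantic classes
        (head, None),                     # head only
        (None, None),                     # unlexicalised prior
    ]
    for ctx in backoff_contexts:
        total = sum(counts.get((r, ctx), 0) for r in RELATIONS)
        if total > 0:
            return counts.get((rel, ctx), 0) / total
    return 0.0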

Distance Measure We use a distance measure that differs from that of Collins (1999) (see chapter 4). Instead of a vector of features, we measure the real distance in chunks. We have shown in section 7.6.2 how the distance measure improves on the baseline model. We have shown that the lexicalisation model and the distance model are not redundant.
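A small sketch of the idea: distance is counted in intervening chunks rather than in words, so that the same attachment configuration receives the same distance value regardless of chunk-internal material. The chunk segmentation shown is invented for the example.

# Distance counted in chunks rather than words: chunk-internal material
# does not increase the distance between two candidate attachment sites.

def chunk_distance(head_chunk_index, dep_chunk_index):
    """Distance in chunks between governor and dependent."""
    return abs(head_chunk_index - dep_chunk_index)

# chunked sentence: [NP The old man] [PP in the corner] [VP slowly ate]
# the subject relation man <- ate spans a distance of 2 chunks,
# although four words intervene
print(chunk_distance(0, 2))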

Tagging, Lemmatising and Chunking Preprocessing We have shown that the parser Pro3Gres uses a task-specific division of labour, delegating tasks that can be solved with sufficient accuracy to methods that use fewer resources than parsing: finite-state based lemmatising, part-of-speech tagging, and base-phrase chunking. Kaplan et al. (2004b) describe the integration of finite-state morphology and part-of-speech tagging as an essential step for the development of a truly broad-coverage grammar and robust parser such as that of Riezler et al. (2002). Prins (2005) shows that systems with tagging preprocessing are up to an order of magnitude faster, and that the accuracy increases slightly if reasonable filtering parameters are used. Abney (1996) suggests a parsing architecture for DG that only parses between the heads of chunks. We have followed this suggestion. DG is especially suitable for such an architecture because of endocentricity, and because it partly eliminates the need for dependencies that have no valency interpretation. Prins (2005) shows that chunking preprocessing in parsing generally leads to a moderate increase in speed.1

1There is no doubt that speed increases with chunking preprocessing, but accuracy potentially decreases slightly, see e.g. Haverinen et al. (2008).
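The following sketch illustrates Abney-style parsing between chunk heads: after chunking, only one head per base phrase enters the parser, which shrinks the number of units to combine. The chunk format and the simplistic rightmost-word head rule are assumptions made for this example.

# Abney-style parsing between chunk heads: only one head per base phrase
# enters the parser.

chunks = [("NP", ["The", "old", "man"]),
          ("VP", ["has", "been", "eating"]),
          ("NP", ["a", "sandwich"])]

def head_of(chunk):
    label, words = chunk
    return words[-1]          # naive head rule: rightmost word of the chunk

heads = [head_of(c) for c in chunks]
print(heads)                  # ['man', 'eating', 'sandwich']
# the parser combines 3 chunk heads instead of 8 individual words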

Treatment of Long-Distance Dependencies We have shown that inherently complex linguistic problems can be broken down and approximated sufficiently well by less complex methods. This applies in particular to long-distance dependencies. We have shown in chapter 6 that the majority of them can be approximated by using a labelled DG that extends locality to the clause level, by context-free finite-state based patterns, and by post-processing (sketched at the end of this subsection). The few remaining long-distance dependencies, complex WH-questions, only need mild context-sensitivity. We have shown that a slightly extended DG allows us to use mildly context-sensitive operations known from Tree-Adjoining Grammar (TAG).

Besides these five elements, Pro3Gres relies on a number of standard techniques, such as pruning and full parsing. While pruning does not improve the accuracy of the parser, it increases parsing speed by one or several orders of magnitude (Brants and Crocker, 2000). We have presented the pruner and its parameters in chapter 2. We have shown in an evaluation in section 7.7.1 that the accuracy of the parser is hardly affected until extremely aggressive pruning parameters are set. We have parsed the entire 100 million word British National Corpus (BNC) with the default pruning settings that we have used for the standard evaluations reported in this thesis. Pro3Gres parses the BNC in just over 24 hours on the latest fast multi-core Apple Macintosh server. We have shown that the parsing context improves the performance if a full parsing approach is used. First, structures mutually constrain each other. Second, local maxima do not always lead to global maxima. We give indications of the impact of these effects in evaluations in sections 7.7.2 and 7.7.1.
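The post-processing treatment of long-distance dependencies mentioned above can be sketched as a graph-rewriting step over surface dependencies. The triple format and the modpart rewriting rule shown here are illustrative; chapter 6 describes the actual patterns.

# Post-processing sketch: a surface modpart dependency (participle
# modifying a noun, as in "a job guaranteed ...") licenses a
# deep-syntactic object relation from the participle to the noun.
# Triples are (relation, head, dependent).

def add_deep_relations(dependencies):
    """Derive deep-syntactic relations from surface dependencies."""
    deep = list(dependencies)
    for relation, head, dependent in dependencies:
        if relation == "modpart":
            # the modified noun is understood as the participle's object
            deep.append(("obj", dependent, head))
    return deep

surface = [("obj", "want", "job"), ("modpart", "job", "guaranteed")]
print(add_deep_relations(surface))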

8.2 A Hybrid Architecture

We have discussed that Pro3Gres is a hybrid approach at several levels. First, it is hybrid because it uses a hand-written linguistic competence grammar (see chapter 5) combined with a statistical performance disambiguation model (see chapter 2). Unlike formal grammars to which post-hoc statistical disambiguators are added in a later stage, Pro3Gres has been designed to be hybrid, carefully distinguishing between tasks that can best be solved by finite-state methods, rule-based methods and statistical methods. While grammar writing is easy for a linguist, the scope of application and the amount of ambiguity a rule creates are considerable and best handled by a statistical system.

Second, it is hybrid because it uses a task-based division of labour: a finite-state based approach for lemmatising, tagging and the base phrase level, context-free parsing for the surface syntactic level, and graph rewriting and mild context-sensitivity for the deep syntactic processing. The design philosophy for Pro3Gres has been to stay as shallow as possible while still obtaining reliable results for each task necessary to transform raw text into a functionally annotated structure. Long-distance dependencies are expressed in a context-free fashion as far as possible (see chapter 6). The full expressiveness of context-sensitivity is never called for. In the very few sentences where context-sensitivity is needed for English, in complex WH-questions, we use a simple mildly context-sensitive extension based on Tree-Adjoining Grammar (see chapter 6). Recent research in DG suggests that mild context-sensitivity as expressed in TAG is precisely the extension to context-freeness that is needed to cover naturally occurring text (Kuhlmann and Nivre, 2006).

Third, it is hybrid because it is a DG approach trained on a constituency treebank. On the one hand, the use of our own functional concept of DG has allowed us to treat long-distance dependencies in a simple and elegant way, as discussed in chapter 6. It has also allowed us to parse directly for a simple LFG f-structure without needing c-structure as an intermediate step. On the other hand, our approach exhibits a number of mapping challenges (see chapters 6 and 7), which have especially made the evaluation difficult. Since we do not have a gold standard expressed in our formalism, our current evaluation of long-distance dependencies is not complete.

Fourth, it is hybrid between a statistical parser and a formal grammar-based parser. Kaplan et al. (2004a) compare the speed and accuracy of a successful probabilistic context-free parser (Collins, 1999) to a robust LFG system based on Riezler et al. (2002). They show that the gap between probabilistic context-free parsing and deep-linguistic full LFG parsing can be closed.

A conclusion that can be drawn from previous and our own research is that simplifying, restricting and limiting formal grammar expressiveness bridges the gap between probabilistic parsing and formal grammar-based parsing, and between shallow parsing and full parsing. We argue that our parser covers the middle ground between statistical parsing and formal grammar-based parsing. The parser has competitive performance (see chapter 7) and has been applied widely. We have also presented a practical, user-based evaluation on an application domain.

A further conclusion that can be drawn from our research on parsing texts from the biomedical domain is that base phrase parsing is more domain-specific than parsing between these base phrases. This entails that taggers and chunkers or base-phrase rules for parsers need to be adapted more to a given domain than the rules for parsing between base phrases, and that parsing between base phrases by Pro3Gres is quite domain-independent, robust and efficient enough for large-scale application.

8.3 Applications

Pro3Gres has been employed in a number of applications. To conclude, we list some of the applications and relevant publications.

• Information Retrieval: Burger and Bayer (2005).

• Question Answering: Rinaldi et al. (2004a).

• Parsing and Text Mining in Biomedicine: Rinaldi et al. (2004), Rinaldi et al. (2006), Rinaldi et al. (2007), Haverinen et al. (2008), Rinaldi et al. (2008).

• Terminology and Ontology Detection: Weeds et al. (2005).

• Corpus Linguistics: We have given a presentation and led the workshop at the International Computer Archive of Modern and Medieval English Conference (ICAME) 2008. Our proceedings article has been accepted for publication.

• Psycholinguistics: Schneider et al. (2005).

References

Abney, Steven. 1991. Parsing by chunks. In Principle-Based Parsing. Kluwer Academic Publishers, Dordrecht.

Abney, Steven. 1994. Dependency grammars and context-free grammars. Unpublished manuscript, Linguistics Department, University of Tübingen.

Abney, Steven. 1995. Chunks and dependencies: Bringing processing evidence to bear on syntax. In Jennifer Cole, Georgia Green, and Jerry Morgan, editors, Computational Linguistics and the Foundations of Linguistic Theory, pages 145–164. CSLI.

Abney, Steven. 1996. Partial parsing via finite-state cascades. In John Carroll, editor, Proc. of the Workshop on Robust Parsing at the 8th Summer School on Logic, Language and Information, number 435 in CSRP, pages 8–15. University of Sussex, Brighton.

Abney, Steven. 1997. Stochastic attribute-value grammars. Computational Linguistics, 23(4):597–618.

Aliod, Diego Mollà, Gerold Schneider, Rolf Schwitter, and Michael Hess. 2000. Answer Extraction Using a Dependency Grammar in ExtrAns. Traitement Automatique des Langues (T.A.L.), Special Issue on Dependency Grammar.

Ananiadou, Sophia and Jun’ichi Tsujii, editors. 2003. Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine.

Appelt, D., J. Hobbs, J. Bear, D. Israel, M. Kameyama, A. Kehler, D. Martin, K. Myers, and M. Tyson. 1995. The SRI FASTUS system, MUC-6 test results and analysis. In Proceedings of the 6th Message Understanding Conference, pages 237–248. Morgan Kaufmann, San Mateo, CA.

Aston, Guy and Lou Burnard. 1998. The BNC Handbook. Exploring the British National Corpus with SARA. Edinburgh University Press, Edinburgh.

Bangalore, Srinivas and Aravind K. Joshi. 1999. Supertagging: An approach to almost parsing. Computational Linguistics, 25(2):237–265.

Barton, G. Edward. 1985. The computational difficulty of ID/LP parsing. In Meeting of the Association for Computational Linguistics, pages 76–81.

Basili, Roberto, Maria Teresa Pazienza, and Fabio Massimo Zanzotto. 1998. Evaluating a robust parser for Italian language. In Proceedings of the Evaluation of Parsing Systems Workshop, held jointly with the 1st LREC, Granada, Spain.

Basili, Roberto and Fabio Massimo Zanzotto. 2002. Parsing engineering and empirical robustness. Journal of Natural Language Engineering, 8(2–3).

Baumgärtner, Klaus. 1970. Konstituenz und Dependenz. Zur Integration der beiden grammatischen Prinzipien. In H. Steger, editor, Vorschläge für eine strukturale Grammatik des Deutschen. Wissenschaftliche Buchgesellschaft, Darmstadt, pages 52–77.

Bies, Ann, Mark Ferguson, Karen Katz, and Robert MacIntyre. 1995. Bracketing guidelines for Treebank II style Penn Treebank project. Technical report, University of Pennsylvania.

Bikel, Daniel. 2004. On the Parameter Space of Lexicalized Statistical Parsing Models. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania.

Black, E., S. Abney, D. Flickinger, C. Gdaniec, R. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Liberman, M. Marcus, S. Roukos, B. Santorini, and T. Strzalkowski. 1991. A procedure for quantitatively comparing the syntactic coverage of English grammars. In Proceedings of the DARPA Speech and Natural Language Workshop, Pacific Grove, CA.

Black, Ezra, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, and Salim Roukos. 1993. Towards history-based grammars: Using richer models for probabilistic parsing. In Proceedings of ACL’93, pages 31–37, Columbus, OH.

Blaheta, Don. 2004. Function Tagging. Ph.D. thesis, Department of Computer Science, Brown University.

Bod, Rens. 1992. A computational model of language performance: Data oriented parsing. In Proceedings COLING’92, Nantes, France.

Bod, Rens. 2001. What is the minimal set of fragments that achieves maximal parse accuracy? In Proceedings ACL-2001, Toulouse, France.

Bod, Rens, Remko Scha, and Khalil Sima'an, editors. 2003. Data-Oriented Parsing. Center for the Study of Language and Information, Studies in Computational Linguistics (CSLI-SCL). Chicago University Press.

Bouma, Gosse, Gertjan van Noord, and Robert Malouf. 2001. Alpino: Wide-coverage computational analysis of Dutch. In Computational Linguistics in The Netherlands 2000.

Brants, Thorsten. 1999. Cascaded Markov models. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL'99), pages 118–125, Bergen, Norway. University of Bergen.

Brants, Thorsten and Matthew Crocker. 2000. Probabilistic parsing and psychological plausibility. In Proceedings of the 18th International Conference on Computational Linguistics COLING-2000, Saarbrücken/Luxembourg/Nancy.

Brants, Thorsten, Roland Hendriks, Sabine Kramp, Brigitte Krenn, Cordula Preis, Wojciech Skut, and Hans Uszkoreit. 1997. Das NEGRA-Annotationsschema. Technical Report NEGRA Project Report, Computational Linguistics Department, Saarland University, Saarbrücken, Germany.

Brants, Thorsten and Oliver Plaehn. 2000. Interactive corpus annotation. In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000), Athens, Greece.

Bresnan, Joan, editor. 1982. The Mental Representation of Grammatical Relations. The MIT Press, Cambridge, Massachusetts.

Bresnan, Joan. 2001. Lexical-Functional Syntax. Blackwell, Oxford.

Briscoe, Ted and John Carroll. 1993. Generalised probabilistic LR parsing of natural language (corpora) with unification-based grammars. Computational Linguistics, 19(1):25–60.

Briscoe, Ted and John Carroll. 2002. Robust accurate statistical annotation of general text. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pages 1499–1504, Las Palmas, Gran Canaria.

Briscoe, Ted, Claire Grover, Bran Boguraev, and John Carroll. 1987. A formalism and environment for the development of a large grammar of English. In Proceedings of the 10th International Joint Conference on Artificial Intelligence, Milan, Italy.

Bröker, Norbert, Udo Hahn, and Susanne Schacht. 1994. Concurrent lexicalized dependency parsing: The ParseTalk model. In Coling 94, pages 379–385.

Brown, R. 1973. A First Language: The Early Stages. Harvard University Press.

Buchholz, Sabine. 2002. Memory-Based Grammatical Relation Finding. Ph.D. thesis, University of Tilburg, Tilburg, Netherlands.

Buchholz, Sabine and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), pages 149–164, New York City, June. Association for Computational Linguistics.

Burger, John D. and Sam Bayer. 2005. MITRE's Qanda at TREC-14. In E. M. Voorhees and Lori P. Buckland, editors, The Fourteenth Text REtrieval Conference (TREC 2005) Notebook.

Burke, M., A. Cahill, R. O'Donovan, J. van Genabith, and A. Way. 2004. Treebank-based acquisition of wide-coverage, probabilistic LFG resources: Project overview, results and evaluation. In The First International Joint Conference on Natural Language Processing (IJCNLP-04), Workshop "Beyond shallow analyses – Formalisms and statistical modeling for deep analyses", Sanya City, Hainan Island, China.

Cahill, Aoife, Michael Burke, Ruth O'Donovan, Josef van Genabith, and Andy Way. 2004. Long-distance dependency resolution in automatically acquired wide-coverage PCFG-based LFG approximations. In Proceedings of ACL-2004, Barcelona, Spain.

Cahill, Aoife, Mairead McCarthy, Josef van Genabith, and Andy Way. 2002. Parsing with PCFGs and automatic F-structure annotation. In Proceedings of the 7th International Lexical-Functional Grammar Conference, LFG'02, Athens, Greece.

Campbell, Richard. 2004. Using linguistic principles to cover empty categories. In Proceedings of ACL-2004, pages 646–653, Barcelona, Spain.

Carroll, John and Ted Briscoe. 2002. High precision extraction of grammatical relations. In Proceedings of the 19th International Conference on Computational Linguistics (COLING), Taipei, Taiwan.

Carroll, John, Guido Minnen, and Edward Briscoe. 2003. Parser evaluation: using a grammatical relation annotation scheme. In Anne Abeillé, editor, Treebanks: Building and Using Parsed Corpora. Kluwer, Dordrecht, pages 299–316.

Carroll, John, Guido Minnen, and Ted Briscoe. 1999. Corpus annotation for parser evaluation. In Proceedings of the EACL-99 Post-Conference Workshop on Linguistically Interpreted Corpora, Bergen, Norway.

Carroll, John, Nicolas Nicolov, Olga Shaumyan, Martine Smets, and David Weir. 1999. Parsing with an extended domain of locality. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL).

Charniak, Eugene. 1996. Tree-bank grammars. Technical Report CS-96-02, Department of Computer Science, Brown University.

Charniak, Eugene. 2000. A maximum-entropy-inspired parser. In Proceedings of the North American Chapter of the ACL, pages 132–139.

Chomsky, Noam. 1955. The logical structure of linguistic theory. Distributed by Indiana University Linguistics Club.

Chomsky, Noam. 1957. Syntactic Structures. Mouton, The Hague.

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.

Chomsky, Noam. 1970. Remarks on nominalization. In R. Jacobs and P. Rosen- baum, editors, Readings in Transformational Grammar. Ginn and Co., Boston, pages 184–221.

Chomsky, Noam. 1981. Lectures on Government and Binding. Foris Publications, Dordrecht.

Chomsky, Noam. 1986. Barriers. The MIT Press, Cambridge, MA.

Chomsky, Noam. 1995. The Minimalist Program. The MIT Press, Cambridge, Massachusetts.

Chung, Hoojung and Hae-Chang Rim. 2003. A new probabilistic dependency parsing model for head-final, free word order languages. IEICE Transactions on Information & Systems, E86-D(11):2490–2493.

Church, K. and R. Patil. 1982. Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics, 8(3-4):139–149.

Cinque, Guglielmo. 1990. Types of Ā-dependencies. MIT Press, Cambridge, MA.

Clark, Stephen and James R. Curran. 2004. Parsing the WSJ using CCG and log-linear models. In Proceedings of COLING 04, pages 103–110, Geneva, Switzerland.

Collins, Michael. 1996. A new statistical parser based on bigram lexical dependencies. In Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, pages 184–191, Philadelphia.

Collins, Michael. 1997. Three generative, lexicalised models for statistical parsing. In Proc. of the 35th Annual Meeting of the ACL, pages 16–23, Madrid, Spain.

Collins, Michael. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA.

Collins, Michael. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29:589–637.

Collins, Michael and James Brooks. 1995. Prepositional attachment through a backed-off model. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, MA.

Cook, Vivian and Mark Newson. 1996. Chomsky's Universal Grammar, 2nd. Ed. Blackwell, Oxford.

Copestake, Ann and Dan Flickinger. 2000. An open-source grammar development environment and broad-coverage English grammar using HPSG. In Proceedings of the Second Conference on Language Resources and Evaluation (LREC-2000), Athens, Greece.

Covington, Michael A. 1992. GB theory as Dependency Grammar. Technical Report AI1992-03, University of Georgia, Athens, Georgia.

Covington, Michael A. 1994. An empirically motivated reinterpretation of Dependency Grammar. Technical Report AI1994-01, University of Georgia, Athens, Georgia.

Craven, M. and J. Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB-99).

Crouch, Richard, Ronald M. Kaplan, Tracy H. King, and Stefan Riezler. 2002. A comparison of evaluation metrics for broad-coverage stochastic parsers. In Beyond PARSEVAL Workshop at the 3rd Int. Conference on Language Resources and Evaluation (LREC'02), Las Palmas.

Curran, James R. and Stephen Clark. 2004. The importance of supertagging for wide-coverage CCG parsing. In Proceedings of COLING 04, pages 282–288, Geneva, Switzerland.

Daelemans, Walter. 1999. Memory-based language processing. Introduction to the special issue. Journal of Experimental and Theoretical Artificial Intelligence, 11:287–292.

Daelemans, Walter, Sabine Buchholz, and Jorn Veenstra. 1999. Memory-based shallow parsing. In Proceedings of the Third Computational Natural Language Learning Workshop (CoNLL), Bergen, Norway.

de Marneffe, Marie-Catherine, Bill MacCartney, and Christopher D. Manning. 2006. Generating typed dependency parses from phrase structure parses. In 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy.

Debusmann, Ralph, Denys Duchier, Alexander Koller, Marco Kuhlmann, Gert Smolka, and Stefan Thater. 2004. A relational syntax-semantics interface based on Dependency Grammar. In Proceedings of ACL 04, Geneva, Switzerland.

Dienes, Peter and Amit Dubey. 2003. Antecedent recovery: Experiments with a trace tagger. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP), Sapporo, Japan.

Dowdall, James, Fabio Rinaldi, Fidelia Ibekwe-Sanjuan, and Eric SanJuan. 2003. Complex structuring of term variants for Question Answering. In Proceedings of the ACL Workshop on MultiWord Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, July.

Eisner, Jason. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pages 340–345, Copenhagen, August.

Eisner, Jason. 2000. Bilexical grammars and their cubic-time parsing algorithms. In Harry Bunt and Anton Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies. Kluwer.

Engel, Ulrich. 1994. Syntax der deutschen Gegenwartssprache. 3. Auflage. Schmidt, Berlin.

Evert, Stefan and Sebastian Hoffmann. 2006. BNCweb (CQP-edition). In Proceedings of the 27th Conference of the International Computer Archive of Modern and Medieval English (ICAME).

Fellbaum, Christiane, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.

Fillmore, Charles J. 1968. The case for case. In Emmon Bach and Robert Harms, editors, Universals in Linguistic Theory. Holt, Rinehart and Winston, New York, pages 1–88.

Fong, Sandiway. 1991. Computational Properties of Principle-Based Grammar Theories. Ph.D. thesis, MIT Artificial Intelligence Lab, Cambridge, MA.

Frank, Anette. 2003. Projecting F-structures from chunks. In Miriam Butt and Tracy Holloway King, editors, Proceedings of the LFG03 Conference, Albany, NY. CSLI.

Frank, Robert. 2002. Phrase Structure Composition and Syntactic Dependencies. MIT Press, Cambridge, MA.

Frank, Robert. 2004. Restricting grammatical complexity. Cognitive Science, 28(5).

Friedman, C., P. Kra, H. Yu, M. Krauthammer, and A. Rzhetsky. 2001. GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics, 17(1):S74–S82.

Friedman, J. B. 1989. Computational testing of linguistic models in syntax and se- mantics. In I. S. Batori, W. Lenders, and W. Putschke, editors, Computational Linguistics - Computerlinguistik. de Gruyter, Berlin, pages 252–259.

Gabbard, Ryan, Seth Kulick, and Mitchell Marcus. 2006. Fully parsing the Penn Treebank. In Proceedings of the Human Language Technology Conference of the NAACL, pages 184–191, New York City, USA, June. Association for Computational Linguistics.

Gaifman, Haim. 1965. Dependency systems and phrase-structure systems. Information and Control, 8:304–337.

Gaizauskas, R., G. Demetriou, P. J. Artymiuk, and P. Willett. 2003. Protein Structures and Information Extraction from Biological Texts: The PASTA System. Bioinformatics, 19:135–143.

Gildea, Daniel. 2001. Corpus variation and parser performance. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 167–202, Pittsburgh, PA.

Grefenstette, Gregory. 1996. Light parsing as finite-state-filtering. In Wolfgang Wahlster, editor, Workshop on Extended Finite-state models of language, ECAI 96, Budapest, Hungary. John Wiley and Sons. References 254

Grimshaw, Jane. 1991. Extended projection. Manuscript.

XTAG Research Group. 2001. A lexicalized tree adjoining grammar for English. Technical Report IRCS-01-03, IRCS, University of Pennsylvania.

Hajič, Jan. 1998. Building a syntactically annotated corpus: The Prague Dependency Treebank. In Eva Hajičová, editor, Issues of Valency and Meaning. Studies in Honor of Jarmila Panevová. Karolinum, Charles University Press, Prague, pages 106–132.

Harris, Zellig. 1957. Co-occurrence and transformation in linguistic structure. Language, 33:283–340.

Haverinen, Katri, Filip Ginter, Sampo Pyysalo, and Tapio Salakoski. 2008. Accurate conversion of dependency parses: targeting the Stanford scheme. In Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku, Finland.

Helbig, Gerhard. 1992. Probleme der Valenz- und Kasustheorie. Konzepte der Sprach- und Literaturwissenschaft. Niemeyer, Tübingen.

Henderson, James. 2003. Inducing history representations for broad coverage statistical parsing. In Proceedings of HLT-NAACL 2003, Edmonton, Canada.

Henderson, James, Paola Merlo, Ivan Petroff, and Gerold Schneider. 2002. Using syntactic analysis to increase efficiency in visualising text collections. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan.

Hermjakob, Ulf. 2001. Parsing and question classification for question answering. In Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering, Toulouse, France.

Hindle, Donald and Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics, 19:103–120.

Hockenmaier, Julia. 2003. Data and Models for Statistical Parsing with Combinatory Categorial Grammar. Ph.D. thesis, University of Edinburgh, Edinburgh.

Hockenmaier, Julia and Mark Steedman. 2002. Generative models for statistical parsing with combinatory categorial grammar. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia.

Huang, C.T. 1982. Logical Relations in Chinese and the Theory of Grammar. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.

Hudson, Richard. 1984. Word Grammar. Basil Blackwell, Oxford.

Hudson, Richard. 1987. Zwicky on heads. Journal of Linguistics, 23:109 – 132.

Hudson, Richard. 1990. English Word Grammar. Basil Blackwell, Oxford.

Jackendoff, Ray. 1977. X’ Syntax: A Study of Phrase Structure. The MIT Press, Cambridge, Massachusetts.

Jijkoun, Valentin and Maarten de Rijke. 2004. Enriching the output of a parser using memory-based learning. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), pages 311–318, Barcelona, Spain.

Johansson, R. and P. Nugues. 2007. Extended constituent-to-dependency conversion for English. In Proc. of the 16th Nordic Conference on Computational Linguistics (NODALIDA).

Johnson, Mark. 1998. PCFG models of linguistic tree representations. Computational Linguistics, 24:613–632.

Johnson, Mark. 2001. Joint and conditional estimation of tagging and parsing models. In Proceedings of the 39th Meeting of the ACL, pages 314–321, Toulouse, France.

Johnson, Mark. 2002. A simple pattern-matching algorithm for recovering empty nodes and their antecedents. In Proceedings of the 40th Meeting of the ACL, University of Pennsylvania, Philadelphia.

Johnson, Mark, Stuart Geman, Stephen Canon, Zhiyi Chi, and Stefan Riezler. 1999. Estimators for stochastic unification-based grammars. In Proceedings of the 37th Annual Meeting of the ACL.

Joshi, Aravind. 1985. How much context-sensitivity is required to provide reasonable syntactic descriptions: Tree Adjoining Grammars. In David Dowty, Lauri Karttunen, and Arnold Zwicky, editors, Natural Language Parsing: Psychological, computational, and theoretical perspectives. CUP, Cambridge, pages 206–250.

Joshi, Aravind and Anthony Kroch. 1985. The linguistic relevance of Tree Adjoining Grammar. Technical Report MS-CS-85-16, Department of Computer and Information Sciences, University of Pennsylvania.

Joshi, Aravind K. and K. Vijay-Shanker. 1989. Treatment of long-distance dependencies in LFG and TAG: Functional uncertainty in LFG is a corollary in TAG. In Proceedings of ACL '89.

Jung, Wha-Young. 1995. Syntaktische Relationen im Rahmen der Dependenzgrammatik. Buske, Hamburg.

Jurafsky, Daniel and James H. Martin. 2000. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, Englewood Cliffs, New Jersey.

Kahane, Sylvain, Alexis Nasr, and Owen Rambow. 1998. Pseudo-projectivity: A polynomially parsable non-projective dependency grammar. In Proceedings of COLING-ACL, volume 1, pages 646–652, Montreal.

Kaplan, Ron, Stefan Riezler, Tracy H. King, John T. Maxwell III, Alex Vasserman, and Richard Crouch. 2004a. Speed and accuracy in shallow and deep stochastic parsing. In Proceedings of HLT/NAACL 2004, Boston, MA.

Kaplan, Ronald M., John T. Maxwell III, Tracy Holloway King, and Richard S. Crouch. 2004b. Integrating finite-state technology with deep LFG grammars. In ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP (ComShaDeP 2004), Nancy, France.

Kaplan, Ronald M. and John T. Maxwell. 1996. LFG grammar writer's workbench. Technical report, Xerox PARC.

Karlsson, Fred, Atro Voutilainen, Juha Heikkilä, and Arto Anttila, editors. 1995. Constraint Grammar: a language-independent system for parsing unrestricted text. Mouton de Gruyter, Berlin and New York.

Kim, J.D., T. Ohta, Y. Tateisi, and J. Tsujii. 2003a. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics, 19(1):180–182.

Kim, J.D., T. Ohta, Y. Tateisi, and J. Tsujii. 2003b. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics, 19(1):i180–i182.

Klein, Dan and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 423–430, Sapporo, Japan.

Kromann, Matthias Trautner. 2003. The Danish Dependency Treebank and the DTAG treebank tool. In Proceedings of Treebanks and Linguistic Theories (TLT) 2003, pages 217–220, Växjö, Sweden. 257 References

Kübler, Sandra. 2006. How do treebank annotation schemes influence parsing results? Or how not to compare apples and oranges. In N. Nicolov, K. Boncheva, G. Angelova, and R. Mitkov, editors, Recent Advances in Natural Language Processing IV: Selected Papers from RANLP 2005, Amsterdam. John Benjamins.

Kuhlmann, Marco and Joakim Nivre. 2006. Mildly non-projective dependency structures. In Proceedings of the COLING/ACL 2006 Conference Poster Sessions, pages 507–514, Sydney, Australia, July. Association for Computational Linguistics.

Kulick, S., A. Bies, M. Liberman, M. Mandel, R. McDonald, M. Palmer, A. Schein, and L. Ungar. 2004. Integrated annotation for biomedical information extraction. In Proc. of the Human Language Technology Conference and the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL).

Levin, Beth C. 1993. English Verb Classes and Alternations: a Preliminary Investigation. University of Chicago Press, Chicago, IL.

Levy, Roger. 2005. Probabilistic models of word order and syntactic discontinuity. Ph.D. thesis, Stanford University, Department of Linguistics.

Levy, Roger and Christopher Manning. 2004. Deep dependencies from context-free statistical parsers: Correcting the surface dependency approximation. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04), Main Volume, pages 327–334, Barcelona, Spain.

Lin, Dekang. 1995. A dependency-based method for evaluating broad-coverage parsers. In Proceedings of IJCAI-95, Montreal.

Lin, Dekang. 1998. Dependency-based evaluation of MINIPAR. In Workshop on the Evaluation of Parsing Systems, Granada, Spain.

MacWhinney, B. 2000. The CHILDES Project: Tools for Analyzing Talk. Lawrence Erlbaum.

Magerman, David. 1995. Statistical decision-tree models for parsing. In Proceedings of the 33rd Meeting of the Association for Computational Linguistics, Boston, MA.

Marcus, M., B. Santorini, and M. Marcinkiewicz. 1993a. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.

Marcus, Mitch, Beatrice Santorini, and M.A. Marcinkiewicz. 1993b. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19:313–330.

McDonald, Ryan, Fernando Pereira, Kiril Ribarov, and Jan Hajič. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT-EMNLP.

Mel'čuk, Igor. 1988. Dependency Syntax: theory and practice. State University of New York Press, New York.

Merlo, Paola. 2003. Generalised PP-attachment disambiguation using corpus-based linguistic diagnostics. In Proceedings of EACL, pages 251–258, Budapest, Hungary.

Merlo, Paola, Matthew Crocker, and Cathy Berthouzoz. 1997. Attaching multiple prepositional phrases: Generalized backed-off estimation. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, Providence, RI.

Merlo, Paola and Eva Esteve Ferrer. 2006. The notion of argument in PP attachment. Computational Linguistics, 32(2):341–378.

Mikheev, Andrei. 1997. Automatic rule induction for unknown word guessing. Computational Linguistics, 23(3):405–423.

Miller, Philip H. 2000. Strong Generative Capacity: the Semantics of Linguistic Formalism. CSLI, Number 46 in Lecture Notes, Stanford, CA.

Minnen, Guido, John Carroll, and Darren Pearce. 2000. Applied morphological generation. In Proceedings of the 1st International Natural Language Generation Conference (INLG), Mitzpe Ramon, Israel.

Miyao, Yusuke, Takashi Ninomiya, and Jun'ichi Tsujii. 2003. Probabilistic modeling of argument structures including non-local dependencies. In the Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP) 2003, pages 285–291.

Miyao, Yusuke, Takashi Ninomiya, and Jun’ichi Tsujii. 2005. Corpus-oriented grammar development for acquiring a Head-driven Phrase Structure Grammar from the Penn Treebank. In Keh-Yih Su, Jun’ichi Tsujii, Jong-Hyeok Lee, and Oi Yee Kwong, editors, Natural Language Processing - IJCNLP 2004, pages 684–693. Springer. 259 References

Miyao, Yusuke and Jun'ichi Tsujii. 2004. Deep linguistic analysis for the accurate identification of predicate-argument relations. In Proceedings of Coling 2004, pages 1392–1398, Geneva, Switzerland. COLING.

Mollá, Diego and Ben Hutchinson. 2003. Intrinsic versus extrinsic evaluations of parsing systems. In Proceedings of the EACL03 Workshop on Evaluation Initiatives in Natural Language Processing, pages 43–50, Budapest.

Müller, Stefan. 1996. The Babel-System–an HPSG Prolog implementation. In Proceedings of the Fourth International Conference on the Practical Application of Prolog, pages 263–277, London.

Musillo, Gabriele and Paola Merlo. 2005. Lexical and structural biases for function parsing. In Proceedings of IWPT, pages 83–93, Vancouver, Canada.

Nerbonne, John, Anja Belz, Nicola Cancedda, Alexander Clark, Hervé Déjean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard, and Erik Tjong Kim Sang. 2001. Learning computational grammars. In Proceedings of the ACL Workshop CoNLL-2001, pages 97–104, Toulouse, France. Association for Computational Linguistics.

Neuhaus, Peter and Norbert Bröker. 1997. The complexity of recognition of linguistically adequate dependency grammars. In Proceedings of the 35th ACL and 8th EACL, pages 337–343, Madrid, Spain.

Nivre, Joakim. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 03), Nancy.

Nivre, Joakim. 2004. Inductive dependency parsing. In Proceedings of Promote IT, Karlstad University.

Nivre, Joakim. 2006a. Constraints on non-projective dependency parsing. In Proceedings of the European Chapter of the Association of Computational Linguistics (EACL) 2006, pages 73–80, Trento, Italy. Association for Computational Linguistics.

Nivre, Joakim. 2006b. Inductive Dependency Parsing. Text, Speech and Language Technology 34. Springer, Dordrecht, The Netherlands.

Nivre, Joakim, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932.

Nivre, Joakim and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 99–106, Ann Arbor, Michigan, June. Association for Computational Linguistics.

Nivre, Joakim and Mario Scholz. 2004. Deterministic dependency parsing of English text. In Proceedings of Coling 2004, pages 64–70, Geneva, Switzerland. COLING.

Ono, T., H. Hishigaki, A. Tanigami, and T. Takagi. 2001. Automated extraction of information on protein-protein interactions from the biological literature. Bioinformatics, 17(2):155–161.

Ouhalla, Jamal. 1999. Transformational Grammar: from Principles and Parameters to Minimalism. Arnold, New York.

Padó, Sebastian and Mirella Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199.

Perlmutter, David. 1983. Studies in Relational Grammar I. Chicago University Press, Chicago.

Pollard, Carl and Ivan Sag. 1994. Head-Driven Phrase Structure Grammar. Chicago University Press, Chicago, Illinois.

Preiss, Judita. 2003. Using grammatical relations to compare parsers. In Proc. of EACL 03, Budapest, Hungary.

Prins, Robbert. 2005. Finite-State Pre-Processing for Natural Language Analysis. Ph.D. thesis, Behavioral and Cognitive Neurosciences (BCN) research school, University of Groningen.

Pustejovsky, J., J. Castaño, J. Zhang, B. Cochran, and M. Kotecki. 2002. Robust relational parsing over biomedical literature: Extracting inhibit relations. In Pacific Symposium on Biocomputing.

Pyysalo, S., F. Ginter, J. Heimonen, J. Björne, J. Boberg, J. Järvinen, and T. Salakoski. 2007. BioInfer: A corpus for information extraction in the biomedical domain. BMC Bioinformatics, 8(50).

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A Grammar of Contemporary English. Longman, London.

Ratnaparkhi, Adwait. 1996. A Maximum Entropy Part-Of-Speech tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, University of Pennsylvania.

Ratnaparkhi, Adwait. 1997. A linear observed time statistical parser based on maximum entropy models. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing (EMNLP), Brown University, Providence, Rhode Island.

Ratnaparkhi, Adwait, Jeff Reynar, and Salim Roukos. 1994. A maximum entropy model for prepositional phrase attachment.

Riezler, Stefan, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell, and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), Philadelphia, PA.

Rinaldi, Fabio, James Dowdall, Gerold Schneider, and Andreas Persidis. 2004a. Answering Questions in the genomics domain. In ACL-2004 workshop on Question Answering in restricted domains, Barcelona, Spain.

Rinaldi, Fabio, James Dowdall, Gerold Schneider, and Andreas Persidis. 2004b. Answering Questions in the Genomics Domain. In ACL 2004 Workshop on Question Answering in restricted domains, Barcelona, Spain, 21–26 July.

Rinaldi, Fabio, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, and Therese Vachon. 2008. OntoGene in BioCreative II. Genome Biology, 9:S13.

Rinaldi, Fabio, Gerold Schneider, James Dowdall, Christos Andronis, Andreas Persidis, and Ourania Konstanti. 2004. Mining relations in the GENIA corpus. In Proceedings of the Second Workshop on Data Mining and Text Mining for Bioinformatics, Pisa, Italy.

Rinaldi, Fabio, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Christos Andronis, Ourania Konstanti, and Andreas Persidis. 2007. Mining of functional relations between genes and proteins over biomedical scientific literature using a deep-linguistic approach. Journal of Artificial Intelligence in Medicine, 39:127–136.

Rinaldi, Fabio, Gerold Schneider, Kaarel Kaljurand, Michael Hess, and Martin Romacker. 2006. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics, 7(Suppl 3):S3.

Rizzi, Luigi. 1995. Relativized Minimality. The MIT Press, Cambridge, Massachusetts.

Ross, John R. 1967. Constraints on Variables in Syntax. Ph.D. thesis, MIT, Cambridge, Mass.

Sag, Ivan A., Timothy Baldwin, Francis Bond, Ann Copestake, and Dan Flickinger. 2002. Multiword Expressions: a Pain in the Neck for NLP. In Proceedings of the Third International Conference, CICLing 2002, pages 1–15, Mexico City, February.

Samuelsson, Christer. 2007. Book review: “Inductive Dependency Parsing” by Joakim Nivre. Computational Linguistics, 33(2):267–269.

Sarkar, Anoop and Aravind Joshi. 2003. Tree-Adjoining Grammars and its application to statistical parsing. In Rens Bod, Remko Scha, and Khalil Sima'an, editors, Data-oriented parsing. CSLI Publications, Stanford.

Sarkar, Anoop, Fei Xia, and Aravind Joshi. 2000. Some experiments on indicators of parsing complexity for lexicalized grammars. In Proceedings of COLING ’00.

Schneider, Gerold. 1998. A linguistic comparison of constituency, dependency and Link Grammar. Master’s thesis, Department of Informatics, University of Zurich.

Schneider, Gerold. 2003. Extracting and using trace-free Functional Dependencies from the Penn Treebank to reduce parsing complexity. In Proceedings of Treebanks and Linguistic Theories (TLT) 2003, Växjö, Sweden.

Schneider, Gerold. 2004. Combining shallow and deep processing for a robust, fast, deep-linguistic dependency parser. In ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP (ComShaDeP 2004), Nancy, France.

Schneider, Gerold. 2005. A broad-coverage, representationally minimal LFG parser: chunks and F-structures are sufficient. In Miriam Butt and Tracy Holloway King, editors, The 10th International LFG Conference (LFG 2005), Bergen, Norway. CSLI.

Schneider, Gerold, James Dowdall, and Fabio Rinaldi. 2004. A robust and deep-linguistic theory applied to large-scale parsing. In Coling 2004 Workshop on Robust Methods in the Analysis of Natural Language Data (ROMAND 2004), Geneva, Switzerland.

Schneider, Gerold, Kaarel Kaljurand, Fabio Rinaldi, and Tobias Kuhn. 2007. Pro3Gres parser in the CoNLL domain adaptation shared task. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 1161–1165, Prague.

Schneider, Gerold, Fabio Rinaldi, Kaarel Kaljurand, and Michael Hess. 2005. Closing the gap: Cognitively adequate, fast broad-coverage grammatical role parsing. In ICEIS Workshop on Natural Language Understanding and Cognitive Science (NLUCS 2005), Miami, FL, May 2005.

Shatkay, H. and R. Feldman. 2003. Mining the biomedical literature in the genomic era: An overview. Journal of Computational Biology, 10(3):821–855.

Shieber, Stuart. 1983. Direct parsing of ID/LP grammars. Linguistics and Philosophy, 7(2).

Sleator, Daniel and Davy Temperley. 1991. Parsing English with a link grammar. Technical Report CMU-CS-91-196, Carnegie Mellon University.

Smadja, Frank. 2003. Retrieving collocations from text: Xtract. Computational Linguistics, 19(1), Special Issue on Using Large Corpora:143–177.

Steedman, Mark. 2000. Surface Structure and Interpretation. MIT Press.

Stowell, Tim. 1981. Origins of Phrase Structure. Ph.D. thesis, MIT.

Tapanainen, Pasi and Timo Järvinen. 1997. A non-projective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64–71. Association for Computational Linguistics.

Tarvainen, Kalevi. 1981. Einführung in die Dependenzgrammatik. Reihe Germanistische Linguistik 35. Niemeyer, Tübingen.

Telljohann, Heike, Erhard W. Hinrichs, and Sandra Kübler. 2004. The TüBa-D/Z treebank: Annotating German with a context-free backbone. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal.

Tesnière, Lucien. 1959. Eléments de Syntaxe Structurale. Librairie Klincksieck, Paris.

Weber, Heinz J. 1997. Dependenzgrammatik. Ein interaktives Arbeitsbuch. 2. Auflage. Gunter Narr Verlag, Tübingen.

Weeds, Julie, James Dowdall, Gerold Schneider, Bill Keller, and David Weir. 2005. Using distributional similarity to organise BioMedical terminology. Terminology.

Wehrli, Eric. 1997. L’analyse syntaxique des langues naturelles. Masson, Paris.

Yamada, Hiroyasu and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of IWPT, pages 195–206.

Younger, D. H. 1967. Recognition and parsing of context-free languages in time n³. Information and Control, 10:189–208.

Zwicky, Arnold. 1985. Heads. Journal of Linguistics, 21:1–30.

Appendix A

tgrep Queries for Grammatical Relations

tgrep and tgrep2 are popular query languages for structural searches on annotated Treebank style corpora. In the following, the comprehensive lists of queries for each grammatical relation and sample queries for locality tests are presented to the interested reader.

Active Subject The arbitrary level of nestedness has been spelled out explicitly, which accounts for the large number of similar queries. Only pattern instances for which non-zero counts are found in the Penn Treebank are listed. A < B in tgrep stands for A dominates B, or equivalently B depends on A. A <- B means that A dominates B, and that B is the rightmost dependent. At the terminal NP level, in NP <- N, N is usually the head of the NP. The $.. operator means succeeding sister. The restriction that the VP head is a verb with no VP sister is expressed by the negation operator !. Due to our treatment of conjunctions, only the first noun of a noun conjunction is extracted. Therefore, only the first (<1) NP line of a nested NP is descended into.

/NP-SBJ$/ <- `(/NN/|PRP|WDT|WP|CD|EX) $.. (VP < (`/VB/ \!$ VP) )
/NP-SBJ$/ <- `(/NN/|PRP|WDT|WP|CD|EX) $.. (VP < (VP < (`/VB/ \!$ VP)) )
/NP-SBJ$/ <- `(/NN/|PRP|WDT|WP|CD|EX) $.. (VP < (VP < (VP < (`/VB/ \!$ VP))) )
/NP-SBJ$/ <- `(/NN/|PRP|WDT|WP|CD|EX) $.. (VP < (VP < (VP < (VP < (`/VB/ \!$ VP)))) )
/NP-SBJ$/ <- `(/NN/|PRP|WDT|WP|CD|EX) $.. (VP < (VP < (VP < (VP < (VP < (`/VB/ \!$ VP))))) )
/NP-SBJ$/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD|EX)) $.. (VP < (`/VB/ \!$ VP) )
/NP-SBJ$/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD|EX)) $.. (VP < (VP< (`/VB/ \!$ VP)) )
/NP-SBJ$/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD|EX)) $.. (VP < (VP< (VP < (`/VB/ \!$ VP))) )
/NP-SBJ$/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD|EX)) $.. (VP < (VP< (VP < (VP < (`/VB/ \!$ VP)))) )
/NP-SBJ$/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD|EX)) $.. (VP < (VP< (VP < (VP < (VP < (`/VB/ \!$ VP))))) )
/NP-SBJ$/ <1 (NP<(NP <- `(/NN/|PRP|WDT|WP|CD|EX))) $.. (VP < (`/VB/ \!$ VP) )
/NP-SBJ$/ <1 (NP<(NP <- `(/NN/|PRP|WDT|WP|CD|EX))) $.. (VP < (VP< (`/VB/ \!$ VP)) )
/NP-SBJ$/ <1 (NP<(NP <- `(/NN/|PRP|WDT|WP|CD|EX))) $.. (VP < (VP< (VP < (`/VB/ \!$ VP))) )


/NP-SBJ$/ <1 (NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD|EX)))) $.. (VP < (`/VB/ \!$ VP) )
/NP-SBJ$/ <1 (NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD|EX)))) $.. (VP < (VP< (`/VB/ \!$ VP)) )
/NP-SBJ$/ <1 (NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD|EX)))) $.. (VP < (VP< (VP < (`/VB/ \!$ VP))) )
/NP-SBJ$/ <1 (NP<(NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD|EX))))) $.. (VP < (`/VB/ \!$ VP) )

tgrep extraction pattern instances for active subject-verb relations

Object

VP << `/VB/ < (/NP$/|NP-PRD <- `(/NN/|PRP|WDT|WP|CD))
VP << `/VB/ < (/NP$/|NP-PRD <1 (NP <- `(/NN/|PRP|WDT|WP|CD)))
VP << `/VB/ < (/NP$/|NP-PRD <1 (NP<1(NP <- `(/NN/|PRP|WDT|WP|CD))))
VP << `/VB/ < (/NP$/|NP-PRD <1 (NP<(NP<1(NP <- `(/NN/|PRP|WDT|WP|CD)))))
VP << `/VB/ < (/NP$/|NP-PRD <1 (NP<(NP<(NP<1(NP <- `(/NN/|PRP|WDT|WP|CD))))))
VP << `/VB/ < (/NP$/|NP-PRD <1 (NP<(NP<(NP<(NP<1(NP <- `(/NN/|PRP|WDT|WP|CD)))))))

tgrep extraction pattern instances for object-verb relations

Object2  The negated CC sister condition makes sure that the two nouns appearing at the same level are not part of a conjunction.

VP << `/VB/ < (/NP$/ <- (/NN/|PRP|WDT|WP|CD) \!$ /CC/ $.. (/NP$/ <- `(/NN/|PRP|WDT|WP|CD)))
VP << `/VB/ < (/NP$/ <- (/NN/|PRP|WDT|WP|CD) \!$ /CC/ $.. (/NP$/ < (NP <- `(/NN/|PRP|WDT|WP|CD))))
VP << `/VB/ < (/NP$/ <- (/NN/|PRP|WDT|WP|CD) \!$ /CC/ $.. (/NP$/ < (NP<(NP <- `(/NN/|PRP|WDT|WP|CD)))))
VP << `/VB/ < (/NP$/ <- (/NN/|PRP|WDT|WP|CD) \!$ /CC/ $.. (/NP$/ < (NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD))))))
VP << `/VB/ < (/NP$/ << (NP <- (/NN/|PRP|WDT|WP|CD)) \!$ /CC/ $.. (/NP$/ << (NP <- `(/NN/|PRP|WDT|WP|CD))))
VP << `/VB/ < (/NP$/ << (NP <- (/NN/|PRP|WDT|WP|CD)) \!$ /CC/ $.. (/NP$/ <- `(/NN/|PRP|WDT|WP|CD)))

tgrep extraction pattern instances for object2-verb relations

Passive Subject

/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ) )
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) )) )
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (VP < (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ))) )
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (VP < (VP < (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) )))) )
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (VP < (VP < (VP < (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ))))) )
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ) )
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) )) )
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ))) )
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (VP < (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) )))) )
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (VP < (VP < (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ))))) )
/NP-SBJ-/ <1 (NP<(NP <- `(/NN/|PRP|WDT|WP|CD))) $.. \\
    (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ) )
/NP-SBJ-/ <1 (NP<(NP <- `(/NN/|PRP|WDT|WP|CD))) $.. \\
    (VP < (VP< (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) )) )
/NP-SBJ-/ <1 (NP<(NP <- `(/NN/|PRP|WDT|WP|CD))) $.. \\
    (VP < (VP< (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ))) )
/NP-SBJ-/ <1 (NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD)))) $.. \\
    (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ) )
/NP-SBJ-/ <1 (NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD)))) $.. \\
    (VP < (VP< (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) )) )
/NP-SBJ-/ <1 (NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD)))) $.. \\
    (VP < (VP< (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ))) )
/NP-SBJ-/ <1 (NP<(NP<(NP<(NP <- `(/NN/|PRP|WDT|WP|CD))))) $.. \\
    (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ) )

tgrep extraction pattern instances for subject-passive-verb relations

Control Verbs  If we expand this pattern into tgrep queries, we get the set shown below. The last line contains the only object-control query that is needed, as no instance of a more deeply nested object-control object has been found.

/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP) $.. (VP < (`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/))) )
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP) $.. (VP < (VP <(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/)))) )
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP) $.. (VP < (VP <(VP <(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/))))) )
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP) $.. (VP < (VP <(VP <(VP<(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/)))))) )
/NP-SBJ-/<1(NP <- `(/NN/|PRP|WDT|WP)) $.. (VP < (`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/))) )
/NP-SBJ-/<1(NP <- `(/NN/|PRP|WDT|WP)) $.. (VP < (VP <(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/)))) )
/NP-SBJ-/<1(NP <- `(/NN/|PRP|WDT|WP)) $.. (VP < (VP <(VP <(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/))))) )
/NP-SBJ-/<1(NP <- `(/NN/|PRP|WDT|WP)) $.. (VP < (VP <(VP <(VP<(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/)))))) )
/NP-SBJ-/<1(NP<(NP <- `(/NN/|PRP|WDT|WP))) $.. (VP < (`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/))) )
/NP-SBJ-/<1(NP<(NP <- `(/NN/|PRP|WDT|WP))) $.. (VP < (VP <(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/)))) )
/NP-SBJ-/<1(NP<(NP <- `(/NN/|PRP|WDT|WP))) $.. (VP < (VP <(VP <(`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/))))) )
/NP-SBJ-/<1(NP<(NP<(NP <- `(/NN/|PRP|WDT|WP)))) $.. (VP < (`/VB/ \!$ VP $.. (S < (/NP-SBJ/ < /NONE/))) )

VP << `/VB/ < (/NP-/|/NP-PRD-/ <- (`/NN/|PRP|WDT|WP|CD)) < (S < (/NP-SBJ/ < (/-NONE-/ < /*-/)))

tgrep extraction pattern instances for the control relation

Reduced Relative Clause: the modpart relation

NP <- `(/NN/|PRP|WDT|WP|CD) $.. (VP < (`/VB/ \!$ VP) )
NP <- `(/NN/|PRP|WDT|WP|CD) $.. (VP < (VP < (`/VB/ \!$ VP)) )
NP <- `(/NN/|PRP|WDT|WP|CD) $.. (VP < (VP <(VP< (`/VB/ \!$ VP))) )
NP < (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. (VP < (`/VB/ \!$ VP) )
NP < (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. (VP < (VP< (`/VB/ \!$ VP)) )
NP < (NP<(NP <- `(/NN/|PRP|WDT|WP|CD))) $.. (VP < (`/VB/ \!$ VP) )

NP << `(/NN/|PRP|WDT|WP|CD) $.. (RRC << (VP <`/VB/))

Locality Test Sample Queries

Passive subject: How many filler and gap indices coincide in our fixed pattern? We arbitrarily pick an exemplary pattern instance of medium complexity (line 6) for this and the following passive subject test.

/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. (VP < (`/VBN/ \!$ VP $.. (/NP$/ < (/NONE/)) ) )

The pattern instances in this appendix differ from pattern 6.4 in that they fail to enforce the identity of the filler index and the gap index. This gives rise to a first test.

tgrep does not allow the user to do any book-keeping of LDD indices; therefore the most frequent values are tried manually, first without enforcing filler identity, then with. The number returned is the number of matches plus 1 (tgrep adds an empty line).

tgrep -as " " -n '/NP-SBJ-1$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-/ ) ) ) ) )' | wc -l
675
tgrep -as " " -n '/NP-SBJ-1$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-1$/ ) ) ) ) )' | wc -l
675
tgrep -as " " -n '/NP-SBJ-2$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-/ ) ) ) ) )' | wc -l
73
tgrep -as " " -n '/NP-SBJ-2$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-2$/ ) ) ) ) )' | wc -l
73
tgrep -as " " -n '/NP-SBJ-3$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-/ ) ) ) ) )' | wc -l
33
tgrep -as " " -n '/NP-SBJ-3$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-3$/ ) ) ) ) )' | wc -l
33
tgrep -as " " -n '/NP-SBJ-4$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-/ ) ) ) ) )' | wc -l
9
tgrep -as " " -n '/NP-SBJ-4$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-4$/ ) ) ) ) )' | wc -l
9

This log shows total identity between the LDD indices.
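Since tgrep cannot bind an index variable, the manual procedure above can also be scripted. The following shell loop is a minimal sketch, assuming the same tgrep installation and indexed corpus as in the logs above; it instantiates the identity-enforcing variant of the pattern from line 6 for the indices 1 to 4:

for i in 1 2 3 4; do
  echo "index $i:"
  # substitute the index into both the filler (NP-SBJ-i) and the gap (/-i$/) position;
  # as above, tgrep appends an empty line, so each count is the number of matches plus 1
  tgrep -as " " -n '/NP-SBJ-'$i'$/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. (VP < (VP< (/VBN/ \!$ VP $.. ( /NP$/ < ( /NONE/ < /-'$i'$/ ) ) ) ) )' | wc -l
done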

Passive subject: How many fillers do not find their gap in the passive verb sister? In this experiment, we test in how many cases a filler occurs without a gap in the position expected by our fixed pattern, i.e. the passive verb object position. In order to exclude some mismatches when using the negation, we restrict ourselves to cases where the gap immediately follows the verb sister.

tgrep -as " " -n '/NP-SBJ-/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $. ( /NP$/ < /NONE/))))' | wc -l
844
tgrep -as " " -n '/NP-SBJ-/ <1 ( NP <- (/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (/VBN/ \!$ VP $. ( /NP$/ \!< /NONE/))))' | wc -l
31

Of the 30 cases where the NP immediately following the verb sister is not a gap, none involves a verb in the passive voice.

A remark on dub verbs  There is a class of verbs, following Levin (1993) often called dub verbs, that take a noun phrase as object complement. Examples of dub verbs are name, appoint, consider, as in the queen appointed William Cecil her personal secretary. As dub verbs are in complementary distribution with ditransitive verbs, we can give the object complement the secondary object label obj2. Dub verbs appear very frequently in the passive voice. The (slightly different) extraction patterns for them, involving a small clause, are as follows:

/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (`/VBN/ \!$ VP) < (S< (NP-SBJ < /-NONE-/) ))
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (VP < (`/VBN/ \!$ VP) < (S< (NP-SBJ < /-NONE-/) )))
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (VP < (VP < (`/VBN/ \!$ VP) < (S< (NP-SBJ < /-NONE-/) ))))
/NP-SBJ-/ <- `(/NN/|PRP|WDT|WP|CD) $.. \\
    (VP < (VP < (VP < (VP < (`/VBN/ \!$ VP) < (S< (NP-SBJ < /-NONE-/) )))))
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (`/VBN/ \!$ VP) < (S< (NP-SBJ < /-NONE-/) ))
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (`/VBN/ \!$ VP) < (S< (NP-SBJ < /-NONE-/) )))
/NP-SBJ-/ <1 (NP <- `(/NN/|PRP|WDT|WP|CD)) $.. \\
    (VP < (VP< (VP<(`/VBN/ \!$ VP) < (S< (NP-SBJ < /-NONE-/) ))))

tgrep extraction pattern instances for subject-passive-verb relations with 'dub' verbs

Control: How many filler and gap indices coincide in our fixed pattern? We take line 3 for our subject control tests, and the single object control pattern for the object control tests. Again, the most frequent values are tried manually, first without enforcing filler identity, then with. The number returned is the number of matches plus 1 (tgrep adds an empty line). For subject control, we get:

tgrep -as " " -n '/NP-SBJ-1/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-/) ))))) )' | wc -l
215
tgrep -as " " -n '/NP-SBJ-1/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-1/) ))))) )' | wc -l
210
tgrep -as " " -n '/NP-SBJ-2/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-/) ))))) )' | wc -l
74
tgrep -as " " -n '/NP-SBJ-2/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-2/) ))))) )' | wc -l
73
tgrep -as " " -n '/NP-SBJ-3/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-/) ))))) )' | wc -l
25
tgrep -as " " -n '/NP-SBJ-3/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-3/) ))))) )' | wc -l
24
tgrep -as " " -n '/NP-SBJ-4/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-/) ))))) )' | wc -l
6
tgrep -as " " -n '/NP-SBJ-4/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $.. (S < (/NP-SBJ/ < (/NONE/ < /-4/) ))))) )' | wc -l
6

For object control, we get:

tgrep -as " " -n 'VP << /VB/ < (/NP-1/|/NP-PRD-1/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-/)))' | wc -l
149
tgrep -as " " -n 'VP << /VB/ < (/NP-1/|/NP-PRD-1/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-1/)))' | wc -l
149
tgrep -as " " -n 'VP << /VB/ < (/NP-2/|/NP-PRD-2/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-/)))' | wc -l
74
tgrep -as " " -n 'VP << /VB/ < (/NP-2/|/NP-PRD-2/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-2/)))' | wc -l
73
tgrep -as " " -n 'VP << /VB/ < (/NP-3/|/NP-PRD-3/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-/)))' | wc -l
32
tgrep -as " " -n 'VP << /VB/ < (/NP-3/|/NP-PRD-3/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-3/)))' | wc -l
32
tgrep -as " " -n 'VP << /VB/ < (/NP-4/|/NP-PRD-4/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-/)))' | wc -l
9
tgrep -as " " -n 'VP << /VB/ < (/NP-4/|/NP-PRD-4/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-4/)))' | wc -l
9

This log shows an almost perfect identity for object control, and a very high identity for subject control. An example of the few sentences where the subject-control pattern goes astray is:

(NP-SBJ-1 (DT Some) (NNP Golenbock) (NNS lawyers))
(VP (MD wo) (RB n't)
    (VP (VB be)
        (VP (VBN invited)
            (NP-2 (-NONE- *-1))
            (S (NP-SBJ (-NONE- *-2))
               (VP (TO to)
                   (VP (VB join)
                       (NP (NNP Whitman) (CC &) (NNP Ransom)))))
            (, ,)
            (PP (VBG according)
                (PP (TO to)
                    (NP (NP (NNS partners))
                        (PP-LOC (IN at)
                                (NP (DT both) (NNS firms)))))))))

Complex interaction between different types of LDDs, in this case between passive and control, means that the pattern can make errors.

Control: How many fillers do not find their gap in the subordinate subjectless clause? In this experiment, we test in how many cases a filler occurs without a gap in the position expected by our fixed pattern, i.e. the subordinate clause subject position. For subject control, we get:

tgrep -as " " -n '/NP-SBJ-/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $. (S < (/NP-SBJ/ < (/NONE/ < /-/) ))))) )' | wc -l
302
tgrep -as " " -n '/NP-SBJ-/ <- (/NN/|PRP|WDT|WP) $.. \\
    (VP < (VP <(VP <(/VB/ \!$ VP $. (S < (/NP-SBJ/ \!< (/NONE/ < /-/) ))))) )' | wc -l
9

For object control, we get:

tgrep -as " " -n 'VP << /VB/ < (/NP-/|/NP-PRD-/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ < (/-NONE-/ < /*-/)))' | wc -l
334
tgrep -as " " -n 'VP << /VB/ < (/NP-/|/NP-PRD-/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ \!< (/-NONE-/ < /*-/)))' | wc -l
16
tgrep -as " " -n 'VP << /VB/ < (/NP-TMP/ <- (/NN/|PRP|WDT|WP|CD)) < \\
    (S < (/NP-SBJ/ \!< (/-NONE-/ < /*-/)))' | wc -l
14

The 8 subject-control cases where no gap is in the subordinate clause subject position include a conjunction that triggers a mismatch, an annotation error, a complex interaction between passive and control, and one case where the movement is longer than what the pattern matches, but which is not a typical case of control:

(NP-SBJ-1 (PRP we))
(VP (MD should)
    (VP (VB be)
        (VP (VBG helping)
            (S (NP-SBJ (NNP U.S.) (NNS companies))
               (VP (VB improve)
                   (NP (VBG existing) (NNS products))
                   (PP (RB rather) (IN than)
                       (S-NOM (NP-SBJ (-NONE- *-1))
                              (ADVP-TMP (RB always))
                              (VP (VBG developing)
                                  (NP (JJ new) (NNS ones))))))))))

The 15 object-control cases where no gap is in the subordinate clause subject position include 13 cases where the pattern mistakenly matches a temporal expression (TMP) in the object position, as made explicit in the last query.

Appendix B

Gradience and Mapping: A Small Selection of Problematic Cases

Inter-annotator agreement on the Carroll test corpus is “around 95 %” (Carroll, Minnen, and Briscoe, 1999). Crouch et al. (2002) warn that mapping between differing annotation schemes is a true challenge, and that, due to the mapping, results can only be indicative. Here we briefly discuss some of the cases where different annotators may come to different conclusions, and where both the gold standard and the Pro3Gres output seem reasonable and correct but are in disagreement. The cases are all from the Carroll test corpus (Carroll, Minnen, and Briscoe, 1999; Carroll, Minnen, and Briscoe, 2003). Our selection of cases can be assumed to be neither complete nor fully representative.

• What could rescue the bill would be some quick progress on a bill ...
In the gold standard, What could rescue the bill is analysed as a clausal subject:
csubj( 'be' , 'rescue' , _ , 92 ).
Pro3Gres gives the analysis in figure B.1, which was intended by the grammar, in which What could rescue the bill is analysed as a nominal subject modified by a relative clause, analogous to That which could rescue the bill.

[Figure B.1: Aberrant but intentional analysis]

• ... the measure would provide means of enforcing the law ...
ncsubj( 'enforce' , 'measure' , _ , 21 ).
The gold standard assumes a subject control relation between measure and enforce. It is not clear to us whether a control relation across an of-genitive, and to inside the object instead of to inside a subordinate clause, is syntactically possible.

• ... there is nothing left of the conservative party ...
ncsubj( 'nothing' , 'leave' , 'obj' , 71 ).   % gold standard: there-movement
modpart(nothing, leave, _, '→', 71).          % parser analysis: reduced relative
The gold standard assumes a there-movement, while the intentional parser analysis reports a reduced relative clause.

• ... prove [one of the difficult problems] ...
dobj( 'prove' , 'one' , _ , 48 ).    % gold standard: syntactic analysis
obj(prove, problem, _, '→', 48).     % parser analysis: based on chunker output
The gold standard assumes one as the head of the object, which is in turn modified by an of-PP. The LTPos chunker analyses the entire NP one of the difficult problems as a base NP. This is syntactically questionable but semantically convincing, and the fact that sequences like one of, some of, many of, all of etc. are almost unambiguous cases of nominal PP-attachment can be exploited. But in evaluating Pro3Gres it leads to 3 errors:

1. an object precision error: obj(prove, problem, _, '→', 48).
2. an object recall error: dobj( 'prove' , 'one' , _ , 48 ).
3. a nominal PP-attachment recall error: ncmod('of', 'one', 'problem', _, 48 ).

• ... (the government) made blunders in Cuba
ncmod( 'in' , 'blunder' , 'cuba' , 51 ).   % gold standard: attachment to the noun
pobj(make, cuba, in, '→', 51).             % parser analysis: attachment to the verb
PP-attachment is a classical source of low inter-annotator agreement. Since both the action make and the effect blunders are in the same location, the PP-attachment disambiguation is semantically vacuous in this case, and hence inherently ambiguous. See also our discussion of PP-attachment in chapter 4.

Curriculum Vitae

Gerold Schneider, 2008

OFFICE 1: University of Zurich, English Department, Plattenstrasse 47, CH-8032 Zürich, [email protected]
OFFICE 2: University of Zurich, Institute of Computational Linguistics, Binzmühlestrasse 14, CH-8050 Zürich, [email protected]
PRIVATE: Rychenbergstr. 381, CH-8404 Winterthur, [email protected]
WWW: http://www.ifi.uzh.ch/CL/gschneid/

University Education

• Lizentiat (equivalent to M.A.), University of Zurich, 1991 – 1999
  English Literature and Linguistics (major subject), Computational Linguistics (1st subsidiary), Comparative Linguistics (2nd subsidiary)

• Doctoral Student in Computational Linguistics, University of Zurich, 2000 – 2006
  Conferral (insigni cum laude), December 2006; Publication 2008

Scientific Work

• University assistant, 1995 – 1998
• Part-time project staff member (research assistant) at Zurich University in the Swiss NSF project for the Question Answering (QA) system ExtrAns, 1999 – 2000
  http://www.ifi.uzh.ch/CL/ExtrAns/
• Symex AG, Zurich. IT Consultant, Technical Writer, Translator, 1999 – 2000
• Part-time project staff member (research assistant) at Geneva University in a Swiss NSF project on document classification, 2000 – 2004
• Part-time project staff member (research assistant) at Zurich University in the European Semantic Web project REWERSE, 2004 – 2005
  http://www.ifi.uzh.ch/attempto/home/index.html
• Computer scientist (Wissenschaftlicher Informatiker) at the Department of English, University of Zurich, since 2004

• Doctoral Thesis: Hybrid Long-Distance Functional Dependency Parsing, 2000 – 2006
  http://www.ifi.uzh.ch/CL/gschneid/parser/
  Conferral (insigni cum laude, the 2nd highest distinction), December 2006; Publication 2008
• Part-time research staff member (Oberassistent) at the Institute of Computational Linguistics, University of Zurich, 2007
• Part-time project staff member (senior researcher) at Zurich University in the Swiss NSF project OntoGene, 2008 – 2009
  http://www.ifi.uzh.ch/OntoGene/

Pedagogical Work

• School teacher for English and Computing Science, 1994 – 1996
• University lecturer, since 1999
  · Lectures and seminars taught in Zurich, since 1999
    – Formal Grammars and Syntax, 1999 – 2002
    – Lexicology and Morphology, 2000 – 2002, 2005
    – Corpus Linguistics and Computational Linguistics, 2003
    – Dependency Grammar Parsing, 2003 – 2006
    – Corpus Linguistics (with Hans Martin Lehmann), 2006
    – Parsing Technology, 2006/2007

    – Syntactic Theories (with Hans Martin Lehmann), 2006/2007
    – Lexicogrammar (with Hans Martin Lehmann), 2007
  · Courses taught in Geneva, 2000 – 2004
    – Prolog Programming (with Paola Merlo), 2000 – 2002
    – Perl Programming and Empirical Methods in CL (with Eric Joanis), 2003 – 2004
  · Distance learning courses taught at Fernfachhochschule Schweiz (http://www.ffhs.ch):
    – Non-Imperative Programming (Nicht-imperative Programmierung), 2004 – 2006
    – Theory of Computation (Theoretische Informatik), since 2007
    – Functional and Logical Programming, since 2007
    – Semantic Web, since 2008

Skills

• Teaching
  · Extensive university and school level teaching experience, in German, English and French
  · Practical experience with distance education, both at Fernfachhochschule Schweiz and at the University of Zurich
• Languages
  · German (mother tongue)
  · English (excellent): Proficiency Grade A, main University subject
  · French (advanced): Maturité Type D, assistant and teacher in Geneva
  · Italian (fluent): Maturità Tipo D
  · Danish (fluent, University diploma)
  · Swedish (basic)
  · Spanish (very basic)
• Programming Languages: Prolog, Perl, Mathematica. Basic knowledge of Java, C/C++, Lisp, Haskell, and Pascal
• System administration
  · User and system administrator experience on UNIX, Linux, Mac OS X and Windows 2000/XP. Web page design.

Peer-Reviewed Publications

Hans-Martin Lehmann and Gerold Schneider, 2008. “Parser-Based Analysis of Syntax-Lexis Interaction”. Proceedings of the International Computer Archive of Modern and Medieval English Conference (ICAME 2008), Ascona, Switzerland. Accepted for publication.

Gerold Schneider, 2008. Hybrid Long-Distance Functional Dependency Parsing. Doctoral Thesis. Institute of Computational Linguistics, University of Zurich.

Thomas Kappeler, Simon Clematide, Kaarel Kaljurand, Gerold Schneider, Fabio Rinaldi, 2008. “Towards Automatic Detection of Experimental Methods from Biomedical Literature”. Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008), Turku, Finland.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, 2008. “Tools for detection of Protein Interactions in Biomedical Literature”. Proceedings of the Genome to Systems Conference, Manchester, UK.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, 2008. “Dependency-Based Relation Mining for Biomedical Literature”. Proceedings of LREC 2008, Marrakech, Morocco.

Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon, 2008. “OntoGene in BioCreative II”. Genome Biology, S13, 9.

Andreas H. Jucker, Gerold Schneider, Irma Taavitsainen, Barb Breustedt, 2007. “Fishing for compliments: Precision and recall in corpus-linguistic compliment research”. Speech Act History of English (Pragmatics & Beyond New Series). John Benjamins, Amsterdam/Philadelphia.

Gerold Schneider, Kaarel Kaljurand, Fabio Rinaldi, Tobias Kuhn, 2007. “Pro3Gres Parser in the CoNLL Domain Adaptation Shared Task”. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, Prague, June 2007.

Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Michael Hess, Jean-Marc von Allmen, Martin Romacker, Therese Vachon, 2007. “OntoGene in BioCreative II”. In Proceedings of the Second BioCreAtIvE Challenge Workshop (Critical Assessment of Information Extraction in Molecular Biology), Madrid, April 2007.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Christos Andronis, Ourania Konstanti, and Andreas Persidis, 2007. “Mining of Functional Relations between Genes and Proteins over Biomedical Scientific Literature using a Deep-Linguistic Approach”. Journal of Artificial Intelligence in Medicine, 39:127–136.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, 2006. “Tools for Text Mining over Biomedical Literature”. In Proceedings of ECAI 2006, 825–826.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, and Martin Romacker, 2006. “An Environment for Relation Mining over Richly Annotated Corpora: the case of GENIA”. BMC Bioinformatics, 7(Suppl. 3):S3.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, and Martin Romacker, 2006. “An Environment for Relation Mining over Richly Annotated Corpora: the Case of GENIA”. In Proceedings of SMBM06: 2nd International Symposium on Semantic Mining in Biomedicine, Jena, Germany.

Norbert E. Fuchs, Kaarel Kaljurand, Gerold Schneider, 2006. “Attempto Controlled English Meets the Challenges of Knowledge Representation, Reasoning, Interoperability and User Interfaces”. In Proceedings of the 19th International Florida Artificial Intelligence Research Society Conference (FLAIRS 2006), Melbourne Beach, FL, May 11–13.

Gerold Schneider, 2005. “A broad-coverage, representationally minimal LFG Parser: chunks and F-structures are sufficient”. In Miriam Butt and Tracy Holloway King, eds.: Proceedings of the 10th International LFG Conference (LFG 2005), Bergen, Norway. CSLI.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess, Christos Andronis, Andreas Persidis, Ourania Konstanti, 2005. “Relation Mining over a Corpus of Scientific Literature”. In Proceedings of AIME 2005, July 2005, Aberdeen, Scotland. Springer Verlag, LNAI 3581, pp. 550–559.

Norbert E. Fuchs, Stefan Höfler, Kaarel Kaljurand, Fabio Rinaldi, Gerold Schneider, 2005. “Attempto Controlled English: A Knowledge Representation Language Readable by Humans and Machines”. In Reasoning Web, First International Summer School 2005. Springer Verlag, LNCS 3564.

Gerold Schneider, Fabio Rinaldi, Kaarel Kaljurand, Michael Hess, 2005. “Closing the Gap: Cognitively Adequate, Fast Broad-Coverage Grammatical Role Parsing”. In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science (NLUCS 2005), Miami, FL.

Fabio Rinaldi, Elia Yuste, Gerold Schneider, Michael Hess, 2005. “Exploiting Technical Terminology for Knowledge Management”. In Ontology Learning and Population, Paul Buitelaar, Philipp Cimiano, Bernardo Magnini (editors). IOS Press.

Julie Weeds, James Dowdall, Gerold Schneider, Bill Keller and David Weir, 2005. “Using Distributional Similarity to Organise BioMedical Terminology”. In Terminology, 11(1).

Gerold Schneider, Fabio Rinaldi, Kaarel Kaljurand and Michael Hess, 2004. “Steps towards a GENIA Dependency Treebank”. In Proceedings of Treebanks and Linguistic Theories (TLT 2004), December 2004, Tübingen, Germany.

Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, James Dowdall, Christos Andronis, Andreas Persidis, Ourania Konstanti, 2004. “Mining Relations in the GENIA corpus”. In Proceedings of the Second Workshop on Data Mining and Text Mining for Bioinformatics, Pisa, Italy.

Gerold Schneider, Fabio Rinaldi, and James Dowdall, 2004. “A Fast, Deep-Linguistic Statistical Dependency Parser”. In Coling 2004 Workshop on Recent Advances in Dependency Grammar.

Gerold Schneider, James Dowdall and Fabio Rinaldi, 2004. “A Robust and Deep-Linguistic Theory Applied to Large-Scale Parsing”. In Coling 2004 Workshop on Robust Methods in the Analysis of Natural Language Data (ROMAND 2004).

Gerold Schneider, 2004. “Combining Shallow and Deep Processing for a Robust, Fast, Deep-Linguistic Dependency Parser”. In ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP (ComShaDe 2004).

Fabio Rinaldi, James Dowdall, Gerold Schneider, Andreas Persidis, 2004. “Answering Questions in the Genomics Domain”. In ACL 2004 Workshop on Question Answering in Restricted Domains, Barcelona.

James Dowdall, Fabio Rinaldi, Andreas Persidis, Kaarel Kaljurand, Gerold Schneider, Michael Hess, 2004. “Terminology expansion and relation identification between genes and pathways”. Workshop on Terminology, Ontology and Knowledge Representation, Université Jean Moulin (Lyon 3), Lyon.

Gerold Schneider, 2003c. “Learning to Disambiguate Syntactic Relations”. Linguistik Online 17 (5/03), Learning and Teaching in Computational Linguistics.

James Henderson, Paola Merlo, Gerold Schneider and Eric Wehrli, 2003. “Using Natural Language Processing in Visualizing Text Collections”. Linguistik Online 17 (5/03), Learning and Teaching in Computational Linguistics.

Gerold Schneider, 2003b. “Extracting and Using Trace-Free Functional Dependencies from the Penn Treebank to Reduce Parsing Complexity”. In Proceedings of Treebanks and Linguistic Theories (TLT 2003), Växjö, Sweden.

Gerold Schneider, 2003a. “A low-complexity, broad-coverage probabilistic Dependency Parser for English”. In Proceedings of the NAACL/HLT 2003 Student Session, Edmonton, Canada.

James Henderson, Paola Merlo, Ivan Petroff and Gerold Schneider, 2002b. “Using Syntactic Analysis to Increase Efficiency in Visualizing Text Collections”. In Proceedings of COLING 2002, Taipei, Taiwan.

James Henderson, Paola Merlo, Ivan Petroff and Gerold Schneider, 2002a. “Using NLP to Efficiently Visualize Text Collections with SOMs”. In Proceedings of the 3rd International Workshop on Natural Language and Information Systems (NLIS 2002), Aix-en-Provence, France.

Fabio Rinaldi, Michael Hess, Diego Mollà, Rolf Schwitter, James Dowdall, Gerold Schneider and Rachel Fournier, 2002. “Answer Extraction in Technical Domains”. In Proceedings of CICLing-2002, Mexico City. Lecture Notes in Computer Science (LNCS), Springer-Verlag.

Diego Mollà, Gerold Schneider, Rolf Schwitter and Michael Hess, 2000. “Answer Extraction Using a Dependency Grammar in ExtrAns”. In Traitement Automatique des Langues (T.A.L.), Special Issue on Dependency Grammar.

Gerold Schneider, Diego Mollà and Michael Hess, 1999. “Inkrementelle minimale logische Formen für die Antwortextraktion” (Incremental minimal logical forms for answer extraction). In Proceedings of the 34th Linguistic Colloquium, University of Mainz, FASK.

Gerold Schneider and Martin Volk, 1998. “Adding Manual Constraints and Lexical Look-Up to a Brill-Tagger for German”. In Proceedings of the ESSLLI 1998 Workshop: Recent Advances in Corpus Annotation, Saarbrücken.

Martin Volk and Gerold Schneider, 1998. “Comparing a Statistical and a Rule-Based Tagger for German”. In Proceedings of Konvens 1998, Bonn.