Formal Methods for Testing Grammars Inari Listenmaa
Total Page:16
File Type:pdf, Size:1020Kb
Thesis for the Degree of Doctor of Philosophy Formal Methods for Testing Grammars Inari Listenmaa Department of Computer Science and Engineering Chalmers University of Technology and University of Gothenburg Gothenburg, Sweden 2019 Formal Methods for Testing Grammars Inari Listenmaa © Inari Listenmaa, 2019. ISBN 978-91-7833-322-6 Technical Report 168D Research groups: Functional Programming and Language Technology Department of Computer Science and Engineering Chalmers University of Technology and University of Gothenburg SE-412 96 Gothenburg Sweden Telephone +46 (0)31-772 1000 Typeset in Palatino and Monaco by the author using X TE EX Printed at Chalmers reproservice Gothenburg, Sweden 2019 Abstract Grammar engineering has a lot in common with software engineering. Analogous to a pro- gram specification, we use descriptive grammar books; in place of unit tests, we have gold standard corpora and test cases for manual inspection. And just like any software, our gram- mars still contain bugs: grammatical sentences that are rejected, ungrammatical sentences that are parsed, or grammatical sentences that get the wrong parse. This thesis presents two contributions to the analysis and quality control of computa- tional grammars of natural languages. Firstly, we present a method for finding contradictory grammar rules in Constraint Grammar, a robust and low-level formalism for part-of-speech tagging and shallow parsing. Secondly, we generate minimal and representative test suites of example sentences that cover all grammatical constructions in Grammatical Framework, a multilingual grammar formalism based on deep structural analysis. Keywords: Grammatical Framework, Constraint Grammar, Satisfiability, Test Case Generation, Grammar Analysis Acknowledgements I am grateful to my supervisor Koen Claessen for his support and encouragement. I cherish our hands-on collaboration, which often involved sitting in the same room writing code to- gether. You have been there for all aspects of a PhD, both inspiration and perspiration! My co-supervisor Aarne Ranta is an equally important figure in my PhD journey, and a major reason I decided to pursue a PhD in the first place. The reader of this thesis may thank Aarne for making me rewrite certain sections over and over again, until they started making sense! Furthermore, I want to thank my opponent Fred Karlsson, and Torbjörn Lager, Gordon Pace and Laurette Pretorius for agreeing to be in my grading committee, as well as Graham Kemp for being my examiner. Looking back, I have an enormous gratitude the whole GF community. It all started back in 2010 when I was a master’s student in Helsinki and joined a research project led by Lauri Carlson. Lauri Alanko was most helpful office mate when I was learning GF, Kaarel Kaljurand was my first co-author and the Estonian resource grammar we wrote was my first large-scale GF project. As important as it was to learn from others during my first steps, becoming a teacher myself has been even more enlightening. I am happy to have met and guided newer GF enthusiasts, especially Bruno Cuconato and Kristian Kankainen—I may have learned more from your questions than you from my answers! During my PhD studies, I’ve had the chance to collaborate with several people outside Gothenburg and my research group. I want to thank Eckhard Bick, Tino Didriksen and Fran- cis Tyers for introducing me to CG and keeping up with my questions and weird ideas. I am grateful to Jose Mari Arriola and Maxux Aranzabe for hosting my research visit in the Basque country and giving me a chance to work with the coolest language I’ve seen so far—eskerrik asko! In addition to my roles as a student, teacher and colleague, I’ve enjoyed more unstruc- tured exploration among peers, and pursuing interesting side tracks. Wen Kokke deserves special thanks for making such an unlikely collaboration happen; as well as for reading a number of nonsensical test sentences in Dutch, and for everything else! On a more day-to-day basis, I want to thank all my colleagues at the department. My office mates Herb and Prasanth have shared with me joys and frustrations, and helped me to decipher supervisor comments. In addition to my supervisors and office mates, I thank the rest of the language technology group: Grégoire, John, Krasimir, Peter, Ramona and Thomas. Outside my office and research group, I want to thank Anders, Dan, Daniel, Elena, Irene, Gabriel, Guilhem, Simon H., Simon R. and Víctor for all the fun things during the 5 years: in- teresting lunch discussions, fermentation parties, hairdyeing parties, climbing, board games, forest excursions, playing music together, sharing a houseboat—just to name a few things. (Also it’s handy to have a stack of your old theses in my office to use as an inspiration for writing acknowledgements!) Sometimes it’s also fun to meet people outside research! I want to thank the wonderful people in Kulturkrock and Chalmers sångkör for making me feel at home in Sweden, not just in the small bubble of my research group. Ett extra tack till Anka och Jonatan för att ni rättat min svenska! All the factors I’ve mentioned previously have been important in finishing my PhD. List- ing all the factors that enabled me to start a PhD would take more space than the actual mono- graph, so let me be brief and thank my parents, firstly, for making me exist, and secondly, for raising me to believe in myself, be willing to take challenges, and never stop learning. This work has been carried out within the REMU project — Reliable Multilingual Digital Communication: Methods and Applications. The project was funded by the Swedish Re- search Council (Vetenskapsrådet) under grant number 2012-5746. Contents 1 Introduction to this thesis ................................ 1 1.1 Symbolic evaluation of a reductionistic formalism . 2 1.2 Test case generation for a generative formalism . 4 1.3 Structure of this thesis . 5 1.4 Contributions of the author . 5 2 Background ........................................ 7 2.1 Constraint Grammar . 7 2.2 Grammatical Framework . 12 2.3 Software testing and verification . 18 2.4 Boolean satisfiability (SAT) . 21 2.5 Summary . 23 3 CG as a SAT-problem ................................... 25 3.1 Related work . 26 3.2 CG as a SAT-problem . 26 3.3 SAT-encoding . 34 3.4 Experiments . 41 3.5 Summary . 47 4 Analysing Constraint Grammar ............................. 49 4.1 Related work . 50 4.2 Analysing CGs . 52 4.3 Evaluation . 62 4.4 Experiments on a Basque grammar . 69 4.5 Generative Constraint Grammar . 74 4.6 Conclusions and future work . 81 5 Test Case Generation for Grammatical Framework .................. 85 5.1 Related work . 86 5.2 Grammar . 87 5.3 Using the tool . 90 5.4 Generating the test suite . 91 5.5 Evaluation . 101 5.6 Case study: Fixing the Dutch grammar . 107 5.7 Conclusion and future work . 113 6 Conclusions ........................................ 115 6.1 Summary of the thesis . 115 6.2 Insights and future directions . 116 Bibliography .......................................... 121 Appendix: User manual for gftest ............................. 133 Chapter 1 Introduction to this thesis There are many approaches to natural language processing (NLP). Broadly divided, we can contrast data-driven and rule-based approaches, which in turn contain more subdivisions. Data-driven NLP is based on learning from examples, rather than explicit rules. Consider how would a system learn that the English string cat corresponds to the German string Katze: we feed it millions of parallel sentences, and it learns that whenever cat appears in one, Katze appears in another with a high probability. This approach is appropriate, when the target language is well-resourced (and grammatically simple), the domain is unlimited and correct- ness is not crucial. For example, a monolingual English speaker who wants to get the gist of any non-English web page will be happy with a fast, wide-coverage translation system, even if it makes some mistakes. Rule-based NLP has a different set of use cases. Writing grammar rules takes more hu- man effort than training a machine learning model, but in many cases, it is the more feasible approach: • Quality over coverage. Think of producers instead of consumers of information: the producer needs high guarantees of correctness, but often there is only a limited domain to cover. • Less need for language resources. Most of the 6000+ languages of the world do not have the abundance of data needed for machine learning, making rule-based approach the only feasible one. • Grammatically complex languages benefit proportionately more from a grammar- based approach. Think of a grammar as a compression method: a compact set of rules 1 Chapter 1. Introduction to this thesis generates infinitely many sentences. Grammars can also be used in conjunction with machine learning, e.g. creating new training data to prevent data sparsity. • Grammars are explainable, and hence, testable. If there is a bug in a particular sen- tence, we can find the reason for it and fix it. In contrast, machine learning models are much more opaque, and the user can just tweak some parameters or add some data, without guarantees how it affects the model. Testing grammars has one big difference from testing most software: natural language has no formal specification, so ultimately we must involve a human oracle. However, we can automate many useful subtasks: detect ambiguous constructions and contradictory grammar rules, as well as generate minimal and representative set of examples that cover all the construc- tions. Think of the whole grammar as a haystack, and we suspect there are a few needles—we cannot promise automatic needle-removal, but instead we help the human oracle to narrow down the search. Our work is by no means the first approach to grammar testing: for instance, Butt et al.