Generalised LR

Giorgios Robert Economopoulos
Department of Computer Science
Royal Holloway, University of London

August, 2006

Abstract

This thesis concerns the parsing of context-free grammars. A parser is a tool, defined for a specific grammar, that constructs a syntactic representation of an input string and determines if the string is grammatically correct or not. An algorithm that is capable of parsing any context-free grammar is called a generalised (context-free) parser. This thesis is devoted to the theoretical analysis of generalised parsing algorithms. We describe, analyse and compare several algorithms that are based on Knuth’s LR parser. This work underpins the design and implementation of the Parser Animation Tool (PAT). We use PAT to evaluate the asymptotic complexity of generalised parsing algorithms and to develop the Binary Right Nulled Generalised LR algorithm – a new cubic worst case parser. We also compare the Right Nulled Generalised LR, Reduction Incorporated Generalised LR, Farshi, Tomita and Earley algorithms using the statistical data collected by PAT. Our study indicates that the overheads associated with some of the parsing algorithms may have significant consequences on their behaviour.

Acknowledgements

My sincere gratitude goes to Elizabeth Scott and Adrian Johnstone. Their support and guidance throughout the course of my PhD has been invaluable. I would also like to thank my examiners, Peter Rounce and Nigel Horspool, for their constructive comments and suggestions that really improved the thesis. Many friends and colleagues have helped me over the last four years, but special thanks goes to Martin Green, Rob Delicata, Jurgen Vinju, Tijs van der Storm, Magiel Bruntink, Hartwig Bosse and Urtzi Ayesta. Of course none of this would have been possible without the continued support of my parents, Maggie and Costas, and my brother, Chris.

Finally, I would like to thank Valerie for being there for me when I needed her most.

Contents

1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Outline of thesis

I Background material

2 Recognition and parsing
  2.1 Languages and grammars
  2.2 Context-free grammars
    2.2.1 Generating sentences
    2.2.2 Backus Naur form
    2.2.3 Recognition and parsing
    2.2.4 Parse trees
  2.3 Parsing context-free grammars
    2.3.1 Top-down parsing
    2.3.2 Bottom-up parsing
    2.3.3 Parsing with searching
    2.3.4 Ambiguous grammars
  2.4 Parsing deterministic context-free grammars
    2.4.1 Constructing a handle finding automaton
    2.4.2 Parsing with a DFA
    2.4.3 LR parsing
    2.4.4 Parsing with lookahead
  2.5 Summary

3 The development of generalised parsing
  3.1 Overview
  3.2 Unger’s method
  3.3 The CYK algorithm
  3.4 Summary

4 Generalised LR parsing
  4.1 Using the LR algorithm to parse all context-free grammars
  4.2 Tomita’s Generalised LR parsing algorithm
    4.2.1 Graph structured stacks
    4.2.2 Example – constructing a GSS
    4.2.3 Tomita’s Algorithm 1
    4.2.4 Parsing with ε-rules
    4.2.5 Tomita’s Algorithm 2
    4.2.6 The non-termination of Algorithm 2
  4.3 Farshi’s extension of Algorithm 1
    4.3.1 Example – a hidden-left recursive grammar
    4.3.2 Example – a hidden-right recursive grammar
  4.4 Constructing derivations
    4.4.1 Shared packed parse forests
    4.4.2 Tomita’s Algorithm 4
    4.4.3 Rekers’ SPPF construction
  4.5 Summary

II Improving Generalised LR parsers

5 Right Nulled Generalised LR parsing
  5.1 Tomita’s Algorithm 1e
    5.1.1 Example – a hidden-left recursive grammar
    5.1.2 Paths in the GSS
  5.2 Right Nulled parse tables
  5.3 Reducing non-determinism
  5.4 The RNGLR recognition algorithm
  5.5 The RNGLR parsing algorithm
  5.6 Resolvability
  5.7 Summary

6 Binary Right Nulled Generalised LR parsing
  6.1 The worst case complexity of GLR recognisers
  6.2 Achieving cubic time complexity by factoring the grammar
  6.3 Achieving cubic time complexity by modifying the parse table
  6.4 The BRNGLR recognition algorithm
  6.5 Performing ‘on-the-fly’ reduction path factorisation
  6.6 GLR parsing in at most cubic time
  6.7 Packing of bookkeeping SPPF nodes
  6.8 Summary

7 Reduction Incorporated Generalised LR parsing
  7.1 The role of the stack in bottom-up parsers
  7.2 Constructing the Intermediate Reduction Incorporated Automata
  7.3 Reducing non-determinism in the IRIA
  7.4 Regular recognition
    7.4.1 Recursion call automata
    7.4.2 Parse table representation of RCA
  7.5 Generalised regular recognition
    7.5.1 Example – ensuring all pop actions are done
  7.6 Reducing the non-determinism in the RCA
  7.7 Reducing the number of processes in each U_i
  7.8 Generalised regular parsing
    7.8.1 Constructing derivation trees
    7.8.2 Constructing an SPPF
  7.9 Summary

8 Other approaches to generalised parsing
  8.1 Mark-Jan Nederhof and Janos J. Sarbo
  8.2 James R. Kipps
  8.3 Jay Earley
  8.4 Bernard Lang
  8.5 Klaas Sikkel

III Implementing Generalised LR parsing

9 Some generalised parser generators
  9.1 ANTLR
  9.2 PRECCx
  9.3 JavaCC
  9.4 BtYacc
  9.5 Bison
  9.6 Elkhound: a GLR parser generator
  9.7 Asf+Sdf Meta-Environment

10 Experimental investigation
  10.1 Overview of chapter
  10.2 The Grammar Tool Box and Parser Animation Tool
  10.3 The grammars
  10.4 The input strings
  10.5 Parse table sizes

  10.6 The performance of the RNGLR algorithm compared to the Tomita and Farshi algorithms
    10.6.1 A non-cubic example
    10.6.2 Using different LR tables
    10.6.3 The performance of Farshi’s algorithms
  10.7 The performance of the BRNGLR algorithm
    10.7.1 GSS construction
    10.7.2 SPPF construction
  10.8 The performance of the Earley algorithm
  10.9 The performance of the RIGLR algorithm
    10.9.1 The size of RIA and RCA
    10.9.2 Asymptotic behaviour

IV

11 Concluding remarks
  11.1 Conclusions
  11.2 Future work

Bibliography

A Parser Animation Tool user manual
  A.1 Acquisition and deployment of PAT and its associated tools
    A.1.1 Java Runtime Environment (JRE)
    A.1.2 Downloading PAT
    A.1.3 The Grammar Tool Box
    A.1.4 aiSee
    A.1.5 Input library
  A.2 User interface
    A.2.1 Specifying the grammar, parse table and string to parse
    A.2.2 Selecting a parsing algorithm
    A.2.3 Output options
  A.3 Animating parses
    A.3.1 GDL output of GSS and SPPF
    A.3.2 Histograms
    A.3.3 Performance statistics
    A.3.4 Comparing multiple algorithms in parallel

List of Figures

2.1 A parse tree of Grammar 2.1 for the string I hit the man with the bat.
2.2 Two different parse trees for input a + a ∗ a of Grammar 2.6.
2.3 The NFA of Grammar 2.7.
2.4 The DFA of Grammar 2.7.
2.5 The DFA for Grammar 2.8.
2.6 The LR parse stacks for the parse of abc.
2.7 The SLR(1) DFA for Grammar 2.8.
2.8 The SLR(1) DFA for Grammar 2.9.

3.1 The development of general context-free parsing.
3.2 Example construction of CYK matrix.

4.1 The DFA for Grammar 4.1.
4.2 The LR(1) DFA for Grammar 4.2.
4.3 The LR(1) DFA for Grammar 4.3.
4.4 The two parse trees of the string a + a ∗ a for Grammar 4.4.
4.5 The SPPF for a + a ∗ a of Grammar 4.4.
4.6 Subtree sharing.
4.7 Use of packing nodes to combine different derivations of the same substring.
4.8 Some parse trees for Grammar 4.5 and input string a.
4.9 The SPPF for Grammar 4.5 and input string a.
4.10 A GSS with incorrect symbol node sharing.

5.1 The GSS for Grammar 4.3 and input ba during the reductions in U_0.
5.2 The final GSS for Grammar 4.3 and input ba.
5.3 The LR(1) DFA for Grammar 5.1.
5.4 The LR(1) DFA for Grammar 5.2.
5.5 The ε-SPPF for Grammar 5.3.
5.6 The LR(1) DFA for Grammar 5.3.
5.7 The final GSS and SPPF for the parse of ab in Grammar 5.3.
5.8 The LR(1) DFA for Grammar 5.4.

6.1 The LR(1) DFA for Grammar 6.1.
6.2 The final GSS for Grammar 6.1 and input bbbbb.
6.3 The LR(1) DFA for Grammar 6.3.
6.4 The LR(1) DFA for Grammar 6.4.
6.5 The GSS’s constructed using parse table 6.2 (left) and parse table 6.4 (right) for the input abcd.
6.6 The LR(1) DFA for Grammar 6.5.
6.7 The partial GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after all the terminals have been shifted.
6.8 The partial GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the two standard reductions are done.
6.9 The partial GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the first bookkeeping node is created.
6.10 The SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the packing nodes are created for the bookkeeping node w_6.
6.11 The GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the last bookkeeping node is created.
6.12 Standard and Binary SPPF’s for Grammar 6.3 and input abcd.
6.13 The LR(1) DFA for Grammar 6.6.
6.14 The final Binary GSS and SPPF for Grammar 6.6 and input abc.
6.15 An incorrect Binary SPPF for Grammar 6.6 and input abc caused by packing too many additional nodes.

7.1 The LR(0) DFA’s for the left and right recursive grammars given.
7.2 The NFA with R-edges for Grammar 7.1.
7.3 The NFA with R-edges for Grammar 7.2.
7.4 The LR(0) DFA with R-edges for Grammar 7.3.
7.5 The IRIA of Grammar 2.7.
7.6 The IRIA of Grammar 7.4.
7.7 The RIA of Grammar 7.4.
7.8 The IRIA(Γ_S) and IRIA(Γ_A) of Grammar 7.6.
7.9 The RIA(Γ_S) and RIA(Γ_A) of Grammar 7.6.
7.10 The RCA of Grammar 7.6.
7.11 The RCA of Grammar 7.4.
7.12 The IRIA(Γ_S) of Grammar 7.8.
7.13 The RIA(Γ_S) of Grammar 7.8.
7.14 The RCA of Grammar 7.8.
7.15 The IRIA(Γ_S) of Grammar 7.10.
7.16 The RIA(Γ_S) of Grammar 7.10.
7.17 The RCA(Γ_S) of Grammar 7.10.
7.18 The RCA of Grammar 7.6 with lookahead.
7.19 The reduced RCA of Grammar 7.6 with lookahead.

8.1 The LR(0) DFA for Grammar 8.1.
8.2 The LR(0) DFA for Grammar 8.2.
8.3 The ε-LR(0) DFA for Grammar 8.2.

10.1 Comparison between edge visits done by Tomita-style GLR algorithms for strings in Grammar 6.1.
10.2 Histogram of the number of reductions of length i performed by Farshi’s optimised algorithm on the ANSI-C string of length 4,291.
10.3 Histogram of the number of reductions of length i performed by Farshi’s optimised algorithm on the Pascal string of length 4,425.
10.4 Histogram of edge visits for Farshi’s naïve algorithm on Grammar 6.1 and a string b^20.
10.5 Histogram of edge visits for Farshi’s optimised algorithm on Grammar 6.1 and a string b^20.
10.6 Edge visits done by the RNGLR and BRNGLR algorithms for strings b^d in Grammar 6.1.
10.7 Edge visits done by the RNGLR and BRNGLR algorithms for strings b^d in Grammar 6.1.
10.8 Comparison between the sizes of the SPPF’s constructed by the RNGLR and BRNGLR algorithms for strings in Grammar 6.1.
10.9 Comparison between edge visits done by Tomita-style GLR algorithms for strings in Grammar 6.1.
10.10 Comparison between edge visits done by Tomita-style GLR algorithms for strings in Grammar 6.1.
10.11 Edge visits performed by the RNGLR, BRNGLR and RIGLR algorithms for strings in Grammar 6.1.

A.1 PAT’s input tab.
A.2 The BRNGLR algorithm tab.
A.3 PAT’s output tab.
A.4 Animation toolbar.
A.5 PAT’s animation window.

A.6 The SPPF of the ANSI-C Boolean equation minimiser parsed in Chapter 10.
A.7 Histogram showing the number of times reductions of a certain length are performed during the parse of a Pascal string.
A.8 PAT example.

List of Tables

2.1 The Chomsky hierarchy.
2.2 The parse table for Grammar 2.7.
2.3 The parse table with conflicts for Grammar 2.8.
2.4 The SLR(1) parse table for Grammar 2.8.
2.5 The SLR(1) parse table for Grammar 2.9.

4.1 The parse table for Grammar 4.1.

5.1 The LR(1) parse table for Grammar 5.1.
5.2 The RN parse table for Grammar 5.1.
5.3 The RN parse table for Grammar 5.2.
5.4 The LR(1) parse table for Grammar 5.3.
5.5 The LR(1) parse table for Grammar 5.4.
5.6 The RN parse table for Grammar 5.4.
5.7 The resolved RN parse table for Grammar 5.4.
5.8 Trace of LR(1) parse of input a for Grammar 5.4 using an LR(1) and a resolved RN parse table.

6.1 The RN parse table for Grammar 6.1.
6.2 The RN parse table for Grammar 6.3.
6.3 The RN parse table for Grammar 6.4.
6.4 The BRN parse table for Grammar 6.3.
6.5 The RN parse table for Grammar 6.5.
6.6 The RN parse table for Grammar 6.6.

7.1 Trace of LR(0) parse of input aaaa for the left and right recursive grammars defined above.
7.2 The parse table of the RCA in Figure 7.10.

10.1 ANSI-C parse table sizes.
10.2 ISO-Pascal parse table sizes.
10.3 IBM VS-Cobol parse table sizes.

10.4 Comparison between Tomita-style generated GSS sizes for an ANSI-C program.
10.5 Comparison between Tomita-style generated GSS sizes for Pascal programs.
10.6 Comparison between Tomita-style generated GSS sizes for Cobol programs.
10.7 Edge visits performed parsing strings in Grammar 6.1.
10.8 Comparison between Tomita-style generated GSS sizes for strings in Grammar 6.1.
10.9 Edge visits performed parsing an ANSI-C Quine McCluskey Boolean minimiser.
10.10 Edge visits performed parsing three ISO-7185 Pascal programs.
10.11 Edge visits performed parsing two Cobol programs.
10.12 GSS sizes for strings of the form b^d in Grammar 10.1.
10.13 GSS sizes for strings of the form b^d in Grammar 10.2.
10.14 Edge visits performed parsing strings of the form b^d in Grammar 10.1.
10.15 Edge visits performed parsing strings of the form b^d in Grammar 10.2.
10.16 Edge visits performed parsing strings in Grammar 6.1.
10.17 Comparison between Tomita-style generated GSS sizes for strings in Grammar 6.1.
10.18 Edge visits performed parsing an ANSI-C Quine McCluskey Boolean minimiser.
10.19 Edge visits performed parsing three ISO-7185 Pascal programs.
10.20 Comparison between RNGLR and BRNGLR GSS sizes for Pascal programs.
10.21 Edge visits performed parsing two Cobol programs.
10.22 Comparison between RNGLR and BRNGLR GSS sizes for Cobol programs.
10.23 Comparison between RNGLR and BRNGLR generated SPPF’s for strings in Grammar 6.1.
10.24 SPPF statistics for an ANSI-C Quine McCluskey Boolean minimiser.
10.25 SPPF statistics for three ISO-7185 Pascal programs.
10.26 SPPF statistics for two Cobol programs.
10.27 Earley results for strings in Grammar 6.1.
10.28 Earley results for an ANSI-C program.
10.29 Earley results for Pascal programs.
10.30 Earley results for Cobol programs.
10.31 The size of some RIA’s.
10.32 RCA sizes.
10.33 Edge visits performed parsing strings in Grammar 6.1.

10.34 Size of RNGLR and BRNGLR GSS and RIGLR RCA for strings in Grammar 6.1.
10.35 Edge visits performed parsing an ANSI-C program.
10.36 Size of RNGLR and BRNGLR GSS and RIGLR RCA for ANSI-C programs.
10.37 Edge visits performed parsing Pascal programs.
10.38 Size of RNGLR and BRNGLR GSS and RIGLR RCA for Pascal programs.
10.39 Edge visits performed parsing Cobol programs.
10.40 Size of RNGLR and BRNGLR GSS and RIGLR RCA for Cobol programs.

Chapter 1

Introduction

1.1 Motivation

Most programming languages are implemented using almost deterministic context-free grammars and an LALR(1) parser generator such as Yacc or GNU Bison. This approach to language development has been accepted for a long time and consequently many people believe that research into parsing is ‘done’. This view is somewhat surprising given that there are still many important questions that remain unanswered. For example, the following two problems set by Knuth in his seminal paper [Knu65] are still unsolved:

Are there general parsing methods for which a linear parsing time can be guaranteed for all grammars?

Are there particular grammars for which no conceivable parsing method will be able to find one parse of each string in the language with running time at worst linearly proportional to the length of the string?

Clearly a lot more research needs to be carried out before we can honestly claim that parsing is ‘done’. This thesis makes a significant contribution to generalised parsing theory, specifically focusing on Tomita’s GLR technique and its extensions.

A generalised parser is a parser that is capable of parsing strings for any context-free grammar. Although the first generalised parsing algorithms were published as far back as the 1960s, they have not generally been used in practice due to their relatively high runtime costs. The relentless improvement of computing power and storage, however, has seen interest in generalised parsing techniques resurface.

One of the main reasons behind the increased usage of generalised parsing is the popularity of languages like C++ that are difficult to parse using the standard deterministic techniques. Another is the increasing importance of software re-engineering tools that perform transformations of legacy software. Often this type of software is written in a programming language whose grammar is ambiguous.

Despite this increase in use, generalised parsers are still relatively inefficient; the most efficient generalised parsing algorithm to date displays O(n^2.376) complexity. Although it is believed that linear time generalised parsers are unlikely to exist, work should continue on improving the efficiency of generalised parsers. Unfortunately, the existing literature lacks a comprehensive, comparative analysis of generalised parsing techniques. This hinders the understanding of existing techniques and hampers the development of new approaches.

1.2 Contributions

The main contributions of this thesis are the description of the new Binary Right Nulled GLR (BRNGLR) parsing algorithm, the comparative analysis of existing GLR techniques and the development of the Parser Animation Tool (PAT). PAT is a Java application that graphically displays and animates the operation of GLR parsing algorithms and collects statistical data that abstracts their performance. I implemented six different generalised parsing algorithms and several variants of each approach. The implementations closely follow the algorithms’ theoretical description in the thesis. I used PAT to collect statistical data for each of the implemented algorithms and compiled a comparative analysis between the different approaches. This analysis indicates that the performance of parsing techniques adopted by several existing tools can be significantly improved by incorporating simple modifications described in this thesis.

The BRNGLR algorithm is a new parser that displays cubic worst case complexity. Unlike other approaches that achieve cubic complexity, the BRNGLR algorithm does not require its grammars to be modified and it constructs a representation of all derivations (during a parse) in at most cubic space. Although the initial idea of the algorithm is due to Scott, I was heavily involved throughout the development of the algorithm. In particular, I developed the prototype of BRNGLR in PAT which was used to analyse the algorithm’s behaviour and test several optimisations.

In summary, the main contributions of this thesis are as follows: the presentation of the BRNGLR algorithm; the comprehensive analysis and description of existing GLR algorithms; new results which show that the techniques adopted by several existing tools can be significantly improved; and the Parser Animation Tool which provides a way of repeating and understanding the experiments and algorithms presented in this thesis.

Several of the results developed in this thesis have been subsequently published. The relevant papers are listed in the bibliography [JSE04a, JSE04b, JSE04c, JSE].

1.3 Outline of thesis

The thesis is split into four parts. Part I is made up of three chapters and provides the reader with the theory that is required in the rest of the thesis. Chapter 2 focuses on theory related to the specification and parsing of computer languages. Chapter 3 paints a picture of the major developments in generalised parsing and discusses relations between the various techniques. Chapter 4 introduces Tomita’s GLR parsing algorithm and Farshi’s later correction. A detailed discussion is given of both approaches highlighting some of their drawbacks.

Part II introduces three recent algorithms that use a variety of techniques to improve on the efficiency of the traditional GLR parsing algorithm. Chapter 5 describes the RNGLR algorithm that corrects Tomita’s original algorithm by redefining the reduce action in the parse table. In addition to recasting the algorithm in terms of Rekers’ SPPF representation, some modifications are proposed which improve its efficiency. Chapter 6 introduces the BRNGLR algorithm; a cubic-time parser based on the RNGLR algorithm that does not require any modification to be made to the grammar. Chapter 7 presents the RIGLR algorithm that attempts to improve the efficiency of a GLR algorithm by reducing the amount of stack activity. Chapter 8 discusses other work that has contributed to the development of generalised parsing, and relates it to the techniques described in this thesis.

Part III contains two chapters. Chapter 9 discusses some of the major generalised parser generators and applications of GLR parsing techniques. As part of the work for this thesis, the algorithms discussed have been implemented in the Parser Animation Tool (PAT). This tool has been used to investigate the practical and theoretical strengths and weaknesses of the algorithms. In parallel, the GTB tool has also been developed. Together, PAT and GTB have been used to run the algorithms on grammars for Pascal, C, Cobol and various smaller test grammars. The results are presented and discussed in Chapter 10.

Part IV contains one chapter that concludes the thesis and maps out possible future work.

Part I

Background material

Chapter 2

Recognition and parsing

In order to execute programs, computers need to be able to analyse and translate programming languages. In general, programming languages are simpler than human languages, but many of the principles involved in studying them are similar. The compilation process as a whole has been meticulously studied for a long time. However, writing a correct and efficient parser by hand is still a difficult task. Although tools exist that can automatically generate parsers from a grammar specification, limitations with the techniques mean that they often fail.

The early high level programming languages, like Fortran, were developed with expressibility rather than parsing efficiency in mind. Some language constructs turned out to be difficult to parse efficiently, but without a proper understanding of the reasons behind this inefficiency, a solution was difficult to come by. The subsequent work on formal language theory by Chomsky provided a solid theoretical base for language development.

The field of research that investigates languages, grammars and parsing is called formal language theory. This chapter provides a brief overview of language classification as defined by Chomsky [Cho56] and introduces the formal definitions and basic concepts that underpin programming language parsing. We discuss general top-down and bottom-up parsing and the standard deterministic LR parsing technique.

2.1 Languages and grammars

In formal language theory a language is considered to be a set of sentences. A sentence consists of a sequence of words that are concatenations of a number of symbols from a finite alphabet. A finite language can be defined by providing an enumeration of the sentences it contains. However, many languages are infinite and attempting to enumerate their contents would be a fruitless exercise. A more useful approach is to provide a set of rules that describe the structure, or syntax, of the language. These rules, collectively known as a grammar, can then be used to generate or recognise

syntactically correct sentences. The work pioneered by Chomsky in the late 1950s formalised the notion of a grammar as a generative device of a language¹. He developed four types of grammars and classified them by the structures they are able to define. This famous categorisation, known as the Chomsky hierarchy, is shown in Table 2.1. Each class of language can be specified by a particular type of grammar and recognised by a particular type of formal machine. The automata in the right hand column of the table are the machines that accept the strings of the associated language.

Grammars                     Languages                Automata
Type-0 (Unrestricted)        Recursively enumerable   Turing machine
Type-1 (Context-sensitive)   Context-sensitive        Linear bounded non-deterministic Turing machine
Type-2 (Context-free)        Context-free             Non-deterministic push down automaton
Type-3 (Regular)             Regular                  Finite automaton

Table 2.1: The Chomsky hierarchy

The grammars towards the bottom of the hierarchy define the languages with the simplest structure. The rules of a regular grammar are severely restricted and as a result, are only capable of defining simple languages. For example, they cannot generate languages in which parentheses must occur in matched pairs. Despite the limited nature of the regular languages, the corresponding finite automata play an important role in the recognition of the more powerful context-free languages. The languages defined by context-free grammars are less confined by the restric- tions imposed upon their grammar rules. In addition to expressing all the regular languages, they can also define recursively nested structures, common in many pro- gramming languages. As a result, the context-free grammars are often used to define the structure of modern programming languages and their automata form the basis of many classes of corresponding recognition and parsing tools. Certain structural dependencies cannot be expressed by a context-free grammar alone. The context-sensitive grammars define the languages whose structure depends upon their surrounding context. Although attempts have been made to use the context-sensitive grammars to define natural languages, see for example [Woo70], the resulting grammars tend to be very difficult to understand and use. In a similar way, although the recursively enumerable languages, generated by the type-0 grammars, are the most powerful in the hierarchy, their complicated structure means that they are usually only of theoretical interest.

¹Although unknown to the Western world until recently, Pāṇini, an Indian scholar, produced a grammar for Sanskrit between 350BC and 250BC [Sik97].

2.2 Context-free grammars

At the core of a context-free grammar is a finite set of rules. Each rule defines a set of sequences of symbols that it can generate. There are two types of symbols in a context-free grammar; the terminals, which are the symbols, letters or words, of the language and the non-terminals which can be thought of as variables – they constitute the left hand sides of the rules. An example of a context-free grammar for a subset of the English language is given in Grammar 2.1. The non-terminals are shown as the upper case words and the terminals as the lower case words.

Sentence → NounPhrase VerbPhrase
NounPhrase → NounPhrase PrepositionalPhrase
NounPhrase → Determiner Noun
NounPhrase → Noun
VerbPhrase → Verb NounPhrase
PrepositionalPhrase → Preposition NounPhrase     (2.1)
Noun → I
Noun → man
Noun → bat
Verb → hit
Preposition → with
Determiner → the

2.2.1 Generating sentences

We generate strings from other strings using the grammar rules by taking a string and replacing a non-terminal with the right hand side of one of its rules. So for example, from the string

NounPhrase VerbPhrase

we can generate

NounPhrase Verb NounPhrase

by replacing the non-terminal VerbPhrase. We often write this as

NounPhrase VerbPhrase ⇒ NounPhrase Verb NounPhrase

and call it a derivation step. We are interested in the set L(A) of all strings, that contain only terminals, that can be generated from a non-terminal A. This is called the language of A. One of the non-terminals will be designated as the start symbol and the language generated by this non-terminal is called the language of the grammar. For example, the sentence

I hit the man with the bat

can be generated from the non-terminal Sentence in Grammar 2.1 in the following way:

Sentence ⇒ NounPhrase VerbPhrase
         ⇒ Noun VerbPhrase
         ⇒ I VerbPhrase
         ⇒ I Verb NounPhrase
         ⇒ I hit NounPhrase
         ⇒ I hit NounPhrase PrepositionalPhrase
         ⇒ I hit Determiner Noun PrepositionalPhrase
         ⇒ I hit the Noun PrepositionalPhrase
         ⇒ I hit the man PrepositionalPhrase
         ⇒ I hit the man Preposition NounPhrase
         ⇒ I hit the man with NounPhrase
         ⇒ I hit the man with Determiner Noun
         ⇒ I hit the man with the Noun
         ⇒ I hit the man with the bat

A sequence of derivation steps is called a derivation. The basic idea is to use a grammar to define our chosen programming language. Then, given some source code, we need to determine if it is a valid program, i.e. a sentence in the language of the grammar. This can be achieved by finding a derivation of the program. The construction of derivations is referred to as parsing, or syntax analysis.

2.2.2 Backus Naur form

At about the same time as Chomsky was investigating the use of formal grammars to capture the key properties of human languages, similar ideas were being used to describe the syntax of the Algol programming language [BWvW+60]. Backus Naur Form (also known as Backus Normal Form or BNF) is the notation developed to describe Algol and it was later shown that languages defined by BNF are equivalent to context-free grammars [GR62]. Because of its more convenient notation, BNF has continued to be used to define programming languages. In BNF the non-terminals that declare a rule are separated from the body of the rule by ::= instead of the → symbol. In addition to this, a rule can define a choice of sequences with the use of the | symbol. So for example, Grammar 2.2 is the BNF representation of Grammar 2.1.

Sentence ::= NounPhrase VerbPhrase
NounPhrase ::= NounPhrase PrepositionalPhrase | Determiner Noun | Noun
VerbPhrase ::= Verb NounPhrase     (2.2)
PrepositionalPhrase ::= Preposition NounPhrase
Noun ::= I | man | bat
Verb ::= hit
Preposition ::= with
Determiner ::= the

We shall use the BNF notation to define all of the context-free grammars in the remainder of this thesis. In addition to this, we shall maintain the following convention when discussing context-free grammars: lower case characters near the front of the alphabet, digits and symbols like +, represent terminals; upper case characters near the front of the alphabet represent non-terminals; lower case characters near the end of the alphabet represent strings of terminals; upper case characters near the end of the alphabet represent strings of either terminals or non-terminals; lower case Greek characters represent strings of terminals and/or non-terminals.

We define a context-free grammar formally as a 4-tuple ⟨N, T, S, P⟩, where N and T are disjoint finite sets of grammar symbols, called non-terminals and terminals respectively; S ∈ N is the special start symbol; and P is the finite set of production rules of the form A ::= β, where A ∈ N and β is a string of symbols from (N ∪ T)*. The production rule S ::= α that defines the start symbol is called the grammar’s start rule. We may augment a grammar with a new non-terminal, S′, that does not appear on the right hand side of any other production rule. The empty string is represented by the ε symbol. For example, consider Grammar 2.3, which defines the language {ac, abc}.

S′ ::= S
S ::= aBc     (2.3)
B ::= b | ε

The replacement of a single non-terminal in a sequence of terminals and non-terminals is called a derivation step and is represented by the ⇒ symbol. The application of a number of derivation steps is called a derivation. We use the ⇒* symbol to represent a derivation consisting of zero or more steps and the ⇒+ symbol for derivations of one or more steps. Any string α such that S ⇒* α is called a sentential form and a sentential form that contains only terminals is called a sentence. A non-terminal that derives the empty string is called nullable. If A ::= ε then αAβ ⇒ αβ.

We call rules of the form A ::= αβ, where β ⇒* ε, right nullable rules. A grammar that contains a non-terminal A such that A ⇒+ αAβ, where α, β ≠ ε, is said to contain self-embedding. Since there is often a choice of non-terminals to replace in each derivation step, one of two approaches is usually taken; either the leftmost or the rightmost non-terminal is always replaced. The former approach achieves a leftmost derivation whereas the latter produces a rightmost derivation. The derivation in Section 2.2 is a leftmost derivation.
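To make these definitions concrete, the following sketch (in Python, with illustrative names; it is not part of the thesis’s tooling) encodes Grammar 2.3 as a 4-tuple and computes its nullable non-terminals by the usual fixed-point iteration, with ε represented as the empty tuple:

N = {"S'", 'S', 'B'}
T = {'a', 'b', 'c'}
START = "S'"
P = [("S'", ('S',)),
     ('S', ('a', 'B', 'c')),
     ('B', ('b',)),
     ('B', ())]   # B ::= ε, encoded as the empty tuple

def nullable_nonterminals():
    # Fixed-point iteration: A is nullable if some rule A ::= β has every
    # symbol of β already known to be nullable (ε-rules seed the set).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in P:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

print(nullable_nonterminals())   # {'B'}

Running the sketch reports that B is the only nullable non-terminal of Grammar 2.3, as expected from the rule B ::= b | ε.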

2.2.3 Recognition and parsing

An important aspect of the study of context-free grammars is the recognition of the sentences of their languages. It turns out that given any context-free grammar there is a Push Down Automaton (PDA) that accepts precisely the language of the grammar. Tools that take a string and determine whether or not it is a sentence are called recognisers. In addition to implementing a recogniser, many applications that use context-free grammars also want to know the syntactic structure of any strings that are recognised. Since the rules of a grammar reflect the syntactic structure of a language, we can build up a syntactic representation of a recognised string by recording the rules used in a derivation. Tools that construct some form of syntactic representation of a string are called parsers.

2.2.4 Parse trees

A useful way of representing the structure of a derivation is with a parse tree. A parse tree is a rooted tree with terminal symbols of the sentence appearing as leaf nodes and non-terminals as the interior nodes. If an interior node is labelled with the non-terminal A then its children are labelled A_1, ..., A_j, where the rule A ::= A_1...A_j has been used at the corresponding point in the derivation. The root of a parse tree is labelled by the start symbol of the associated grammar and its yield is defined to be the sequence of terminals that label its leaves. Figure 2.1 is a parse tree representing the derivation of the sentence I hit the man with the bat for Grammar 2.1.

2.3 Parsing context-free grammars

The aim of a parser is to determine if it is possible to derive a sentence from a context-free grammar whilst also building a representation of the grammatical structure of the sentence. There are two common approaches to building a parse tree, top-down and bottom-up; this section presents both.

Sentence
├── NounPhrase
│   └── Noun
│       └── I
└── VerbPhrase
    ├── Verb
    │   └── hit
    └── NounPhrase
        ├── NounPhrase
        │   ├── Determiner
        │   │   └── the
        │   └── Noun
        │       └── man
        └── PrepositionalPhrase
            ├── Preposition
            │   └── with
            └── NounPhrase
                ├── Determiner
                │   └── the
                └── Noun
                    └── bat

Figure 2.1: A parse tree of Grammar 2.1 for the string I hit the man with the bat.

2.3.1 Top-down parsing

A top-down parser attempts to derive a sentence by performing a sequence of derivation steps from the start rule of the grammar. It gets its name from the order in which the nodes in the parse tree are constructed during the parsing process; each node is created before its children. Top-down parsers are usually implemented to produce leftmost derivations and are sometimes called predictive parsers because of the way they ‘predict’ the rules to use in a derivation.

To parse a string we begin with the start symbol and predict the right hand side of the start rule. If the first symbol on the right hand side of the predicted rule is a terminal that matches the symbol on the input, then we read the next input symbol and move on to the next symbol of the rule. If it is a non-terminal then we predict the right hand side of the rule it defines. For example, consider the following top-down parse of the string bdac for Grammar 2.4.

S ::= Ac
A ::= BDa | DBa     (2.4)
B ::= b
D ::= d

The start symbol of the grammar is defined to be S, so we begin by creating the root of the parse tree and labeling it S.

S

The right hand side of the start rule is made up of two symbols so the new sentential form generated is Ac. We create two new nodes, labelled A and c respectively, and add them as children of the root node.

S
├── A
└── c

At this point we are looking at the first symbol of the sentential form Ac. Since A is a non-terminal, we need to predict the right hand side of the rule it defines. However, the production rule for A has two alternates, BDa and DBa, so we are required to pick one. If the first alternate is selected, then the parse tree shown below is constructed.

S
├── A
│   ├── B
│   ├── D
│   └── a
└── c

The new sentential form is BDac, so we continue by predicting the non-terminal B and create the following parse tree.

S
├── A
│   ├── B
│   │   └── b
│   ├── D
│   └── a
└── c

We then match the symbol b in the sentential form bDac with the next input symbol and continue to predict the non-terminal D. This results in the parse tree below being constructed.

S
├── A
│   ├── B
│   │   └── b
│   ├── D
│   │   └── d
│   └── a
└── c

The only symbols in the sentential form that have not yet been parsed are the terminals dac. A successful parse is completed once all of these symbols are matched to the remaining input string.

Perhaps the most popular implementation of the top-down approach to parsing is the recursive descent technique. Recursive descent parsers define a parse function for

26 each production rule of the grammar – when a non-terminal is to be matched, the parse function for that non-terminal is called. As a result recursive descent parsers are relatively straightforward to implement and their structure closely reflects the structure of the grammar. This approach to parsing can run into problems when there is a choice of alternates to be used in a derivation. For example, in the above parse if we had picked the rule DBa instead of BDa then the parse would have failed. We could backtrack to the point that a choice was made and choose a different alternative, but this approach can result in exponential costs and in certain cases may not even terminate. A particular problem is caused by grammars with left recursive rules.
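To make this concrete, here is a minimal recursive descent recogniser for Grammar 2.4, written in Python as an illustration (the class and function names are ours, not taken from any tool described in this thesis); it tries the first alternate of A and backtracks to the second if that fails:

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match(self, terminal):
        # Read the next input symbol if it is the expected terminal.
        if self.pos < len(self.text) and self.text[self.pos] == terminal:
            self.pos += 1
            return True
        return False

    def parse_S(self):          # S ::= Ac
        return self.parse_A() and self.match('c')

    def parse_A(self):          # A ::= BDa | DBa
        saved = self.pos        # remember where this alternate started
        if self.parse_B() and self.parse_D() and self.match('a'):
            return True
        self.pos = saved        # backtrack and try the second alternate
        return self.parse_D() and self.parse_B() and self.match('a')

    def parse_B(self):          # B ::= b
        return self.match('b')

    def parse_D(self):          # D ::= d
        return self.match('d')

def recognise(text):
    p = Parser(text)
    return p.parse_S() and p.pos == len(text)

print(recognise('bdac'))   # True
print(recognise('dbac'))   # True, via the second alternate of A
print(recognise('bdc'))    # False

The position save and restore in parse_A is precisely the backtracking discussed above; with a left recursive rule such as A ::= Ab, parse_A would call itself for ever without consuming any input, which is the problem considered next.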

Recursive grammars

A grammar has left (or right) recursion if it contains a non-terminal A and a derivation A ⇒+ αAβ where α ⇒* ε (or β ⇒* ε). If α ≠ ε (or β ≠ ε) then the recursion is referred to as hidden.

Unfortunately, the standard recursive descent parsers, and most parsers that produce leftmost derivations, cannot easily parse left recursive grammars. In the case of a left recursive rule like A ::= Ab, the parse function for A will repeatedly call itself without matching any input. Although left recursion can be mechanically removed from a grammar [AU73], the removal process alters the structure of the grammar and hence the structure of the parse tree.

2.3.2 Bottom-up parsing

A bottom-up parser attempts to build up a derivation in reverse, effectively deriving the start symbol from the string that is parsed. Its name refers to the order in which the nodes of the parse tree are constructed; the leaf nodes at the bottom are created first, followed by the interior nodes and then finally the root of the tree.

It is natural for the implementation of a bottom-up parser to produce a rightmost derivation. A string is parsed by shifting (reading) a number of its symbols until a string of symbols is found that matches the right hand side of a rule. The portion of the sentential form that matches the right hand side of a production rule is called a handle. Once the handle is found, the substring in the sentential form is reduced (replaced) by the non-terminal on the left hand side of the rule. For each of the terminal symbols shifted, a leaf node in the parse tree is constructed. When a reduction is performed, a new intermediate node is created and the nodes labelled by the symbols in the handle are added to it as children. A string is successfully parsed when a node labelled by the grammar’s start symbol is created in the parse tree and

the sentential form only contains the grammar’s start symbol. For example, consider Grammar 2.5 and the parse tree constructed for a bottom-up parse of the string abcd.

S ::= Ad     (2.5)
A ::= Abc | a

We begin by shifting the first symbol from the input string and create the node labelled a in the parse tree.

a

Since there is a rule A ::= a in the grammar, we can use it to reduce the sentential form a to A. We create the new intermediate node in the parse tree labelled A and make it the parent of the existing node labelled a.

A
└── a

We continue by shifting the next two input symbols to produce the sentential form Abc. At this point we can reduce the whole of the sentential form to A using the rule A ::= Abc. We create the new node labelled A as a parent of the nodes labelled A, b and c in the parse tree.

A
├── A
│   └── a
├── b
└── c

Once the final input symbol is shifted, we can reduce the substring Ad in the sentential form to the start symbol S and create the root node of the parse tree to complete the parse.

S
├── A
│   ├── A
│   │   └── a
│   ├── b
│   └── c
└── d

Bottom-up parsers that are implemented to perform these shift and reduce actions are often referred to as shift-reduce parsers.
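The shift-reduce loop just described can be sketched in a few lines of Python (illustrative names; not the thesis’s own implementation). It greedily reduces whenever the top of the stack matches a right hand side, a strategy that happens to be safe for Grammar 2.5 but not for grammars in general:

RULES = [('S', ['A', 'd']),       # rule: S ::= Ad
         ('A', ['A', 'b', 'c']),  # rule: A ::= Abc
         ('A', ['a'])]            # rule: A ::= a

def recognise(text):
    stack = []
    pos = 0
    while True:
        # Try to reduce a handle at the top of the stack.
        for lhs, rhs in RULES:
            if len(stack) >= len(rhs) and stack[len(stack) - len(rhs):] == rhs:
                del stack[len(stack) - len(rhs):]   # pop the handle
                stack.append(lhs)                   # push its left hand side
                break
        else:
            if pos < len(text):                     # no handle: shift a symbol
                stack.append(text[pos])
                pos += 1
            else:                                   # no handle, no input left
                return stack == ['S']

print(recognise('abcd'))    # True
print(recognise('ad'))      # True
print(recognise('abcbcd'))  # True
print(recognise('abd'))     # False

In general the choice between shifting and reducing, and between competing handles, cannot be made this naively; that choice is exactly where the searching discussed below, and later the LR technique, come in.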

To assist our later discussions of the bottom-up parsing technique we shall define the notion of a viable prefix and a viable string. A viable prefix is a substring of a sentential form that does not continue past the end of the handle. A viable string is a viable prefix that includes a sentential form’s handle. We define a handle more formally as a substring γ that is the right hand side of a grammar rule and that appears in a sentential form β that can be replaced to create another sentential form as part of a derivation.

Although standard bottom-up parsers are often difficult to implement by hand, the development of parser generator tools, like Yacc, has resulted in the bottom-up approach to parsing becoming very popular. The class of grammars parsable using a standard deterministic bottom-up technique is larger than the class that can be parsed with a standard deterministic recursive descent parser.

2.3.3 Parsing with searching

Recall that during both our top-down and bottom-up example parses, we came across a sentential form which had a choice of production rules to predict or reduce by. Fortunately in both cases, we picked a rule that succeeded in producing a derivation. Clearly, parsing may not always be this straightforward. To deal with such problems a parser can implement one of two obvious approaches. It can either pick the first alternate encountered, as we did in the examples, and record the sentential form at which the choice was made, or parse all possibilities at the same time. Parsers that adopt the latter approach are the focus of this thesis.

We finish this section with a brief discussion of ambiguous grammars. The rest of this chapter is devoted to the standard deterministic LR parsing technique which forms the basis of the general, GLR, parser.

2.3.4 Ambiguous grammars

For some grammars it turns out that certain strings have more than one parse tree. These grammars are called ambiguous. For example, consider Grammar 2.6, which defines the syntax of simple arithmetic expressions.

S′ ::= E     (2.6)
E ::= E + E | E ∗ E | a

There are two parse trees that can be constructed for the string a + a ∗ a which are shown in Figure 2.2.

[Figure: the two parse trees, shown here with brackets indicating the nesting: E(E(E(a) + E(a)) ∗ E(a)) and E(E(a) + E(E(a) ∗ E(a))).]

Figure 2.2: Two different parse trees for input a + a ∗ a of Grammar 2.6.

Clearly the searching involved in parsing ambiguous grammars can be expensive. Since context-free grammars are used to define the syntax of programming languages, their parsers play an important role in the compilation process of a program and thus need to be as efficient as possible. As a result several techniques have been developed that are capable of parsing certain unambiguous grammars efficiently. The next section focuses on a class of grammars for which an efficient bottom-up parsing technique can be mechanically produced.

2.4 Parsing deterministic context-free grammars

A computer can be thought of as a finite state system and theoreticians often describe what a computer is capable of doing using a Turing machine as a model. Turing machines are the most powerful type of automaton, but the construction and implementation of a particular Turing machine is often infeasible. Other types of automata exist that are useful models for many software and hardware problems. It turns out that the deterministic pushdown automata (DPDA) are capable of modeling a large and useful subset of the context-free grammars. Although these parsers do not work for all context-free grammars, the syntax of most programming languages can be defined by them.

In his seminal paper [Knu65], Knuth presented the LR parsing algorithm as a technique which could parse the class of deterministic context-free grammars in linear time. This section presents that algorithm and shows how the required automata are constructed.

2.4.1 Constructing a handle finding automaton

One of the fundamental problems of any shift-reduce parser is the location of the handle in a sentential form. The approach taken in the previous section compared each sentential form with all the production rules until a handle was found. Clearly this approach is far from ideal. It turns out that it is possible to construct a Non-deterministic Finite Automaton (NFA) from a context-free grammar to recognise exactly the handles of all possible

sentential forms. A finite automaton is a machine that uses a transition function to move through a set of states given a string of symbols. One state is defined to be the automaton’s start state and at least one state is defined as an accept state. We now give a brief description of an NFA for a grammar. Full details can be found in texts such as [ASU86]. Our description here is based on that given in [GJ90].

Finite automata are often represented as transition diagrams, where the states are depicted as nodes and the transition function is defined by the labelled edges between the states. We can construct a finite automaton that accepts the handles of a context-free grammar by labelling the edges with grammar symbols and using the states to represent the substring of a production rule that has been recognised in a derivation. Each of the states is labelled by an item – a production rule of the form (A ::= α · β) where the substring to the left of the · symbol is a viable prefix of a handle. The accept states of the automaton are labelled by an item of the form (A ::= αβ·) which indicates that the handle αβ has been located.

We build the NFA of a context-free grammar by first constructing separate NFA’s for each of the grammar’s production rules. Given a rule A ::= αβ we create the start state of the automaton labelled by the item (A ::= ·αβ). For each state that is labelled by an item of the form (A ::= α · xβ), we create a new state labelled (A ::= αx · β) and add an edge, labelled x, between the two states. These separate automata recognise all right hand sides of a grammar’s production rules. Although this includes the handles of any sentential form, they also recognise substrings that are not handles. We can ensure that only handles are recognised by only considering the right hand sides of rules that can be derived from the start rule of the grammar. To achieve this, the automata are joined by ε-transitions from the states that are labelled with items of the form (A ::= α · Bβ) to the start state of the automaton for B’s production rule. The start state of this new combined automaton is the state labelled by the item (S′ ::= ·S). For example, the NFA for Grammar 2.7 is shown in Figure 2.3.

S′ ::= S
S ::= aAc | bAdd     (2.7)
A ::= b

[Figure: NFA transition diagram, one state per item. ε-edges run from (S′ ::= ·S) to (S ::= ·aAc) and (S ::= ·bAdd), and from (S ::= a·Ac) and (S ::= b·Add) to (A ::= ·b); the remaining edges are labelled by the symbol to the right of the dot, e.g. (S ::= ·aAc) —a→ (S ::= a·Ac); the accept states are (S′ ::= S·), (S ::= aAc·), (S ::= bAdd·) and (A ::= b·).]

Figure 2.3: The NFA of Grammar 2.7.

To recognise a handle of a sentential form we traverse a path from the start state of the NFA labelled by the symbols in the sentential form. If we end up in an accept state, then we have found the left-most handle of the sentential form. The traversal is complicated by the fact that the automaton is non-deterministic. For now we take the straightforward approach and follow all traversals in a breadth-first manner.

For example, consider the sentential form abc and the NFA in Figure 2.3. We begin by traversing the two ε-edges from the start state. The states we reach are labelled (S ::= ·aAc) and (S ::= ·bAdd). We read the first symbol in the sentential form and traverse the edge to state (S ::= a · Ac). Because we cannot traverse the edge labelled b from state (S ::= ·bAdd) we abandon that traversal. From the current state there is one edge labelled A and another labelled ε. Since the next symbol is not A we can only traverse the ε-edge that leads to the state labelled (A ::= ·b). From there we read the b and traverse the edge to state (A ::= b·). Since this state is an accept state, we have successfully found the handle A in the sentential form abc.

Although this approach to finding handles can be used by bottom-up parsers, the non-deterministic nature of the NFA makes the traversal inefficient. However, it turns out that it is possible to convert any NFA to a Deterministic Finite Automaton (DFA) with the use of the subset construction algorithm [AU73]. The subset construction algorithm performs the ε-closure from a node, v, to find the set of nodes, W, that can be reached along a path of ε-edges in the NFA. A new node, y, is constructed in the DFA that is labelled by the items of the nodes in W. Then for the set of nodes, γ, that can be reached by an edge labelled x from a node in W, we create a new node z in the DFA. We label z with the items of the nodes in

γ and the items of the nodes found by performing the ε-closure on each node in γ. We begin the subset construction from the start state of the NFA and continue until no new DFA nodes can be created. The node created by the ε-closure on the start state of the NFA becomes the start state of the DFA. The accept states of the DFA are labelled by items of the form A ::= β·. Performing the subset construction on the NFA in Figure 2.3, we construct the DFA in Figure 2.4.

[Figure: DFA transition diagram with states 0–9. State 0 = {S′ ::= ·S, S ::= ·aAc, S ::= ·bAdd}; transitions: 0 —S→ 1 (S′ ::= S·), 0 —a→ 2, 0 —b→ 3, 2 —b→ 5 (A ::= b·), 2 —A→ 4, 3 —b→ 5, 3 —A→ 6, 4 —c→ 7 (S ::= aAc·), 6 —d→ 8, 8 —d→ 9 (S ::= bAdd·).]

Figure 2.4: The DFA of Grammar 2.7.
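The subset construction specialised to these items is short enough to sketch directly; the following Python listing (illustrative names, and state numbers that depend on exploration order, so they may not match Figure 2.4 exactly) builds the DFA for Grammar 2.7:

GRAMMAR = {"S'": [('S',)],
           'S': [('a', 'A', 'c'), ('b', 'A', 'd', 'd')],
           'A': [('b',)]}

def closure(items):
    # ε-closure: whenever the dot sits before a non-terminal B, add the
    # items (B ::= ·γ) for every alternate γ of B.
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for alt in GRAMMAR[rhs[dot]]:
                    if (rhs[dot], alt, 0) not in items:
                        items.add((rhs[dot], alt, 0))
                        changed = True
    return frozenset(items)

def goto(items, x):
    # Advance the dot over x in every item that has x to its right.
    return closure({(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x})

def build_dfa():
    start = closure({("S'", ('S',), 0)})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop()
        for x in {r[d] for _, r, d in state if d < len(r)}:
            target = goto(state, x)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), x)] = states.index(target)
    return states, edges

states, edges = build_dfa()
print(len(states))   # 10 states, as in Figure 2.4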

A DFA is an NFA which has at most one transition from each state for each symbol and no transitions labelled ε. It is customary to label each of the states of a DFA with a distinct state number which can then be used to uniquely identify the states. We shall always number the start state of a DFA 0. The state labelled by the item S′ ::= S· is the final accepting state of the DFA and is drawn with a double circle.

Parse tables

It is often convenient to represent a DFA as a table where the rows are labelled by the DFA’s state numbers and the columns by the symbols used to label the transitions between states. In addition to the terminal and non-terminal symbols that label the transitions of the DFA, the LR parse tables also have a column for the special end-of-string symbol $. So as to avoid including the rules used by a reduction in the parse table, we enumerate all of the alternates in the grammar with distinct integers. For example, we label the alternates in Grammar 2.7 in the following way.

0. S′ ::= S
1. S ::= aAc
2. S ::= bAdd
3. A ::= b

The parse table for Grammar 2.7 is constructed from its associated DFA in the following way. For each of the transitions labelled by a terminal, x, from a state v to a state w, a shift action, sw, is added to row v, column x. If x is a non-terminal then a goto action, gw, is added to entry (v, x) instead. For each state v that contains an item of the form (A ::= β·), where N is the item’s associated rule number, rN is added to all entries of row v whose columns are labelled by terminals and $. The accept action acc is added to the row labelled by the accept state of the DFA and column $. The parse table for Grammar 2.7 is shown in Table 2.2.

     a    b    c    d    $     A    S
0    s2   s3                        g1
1                        acc
2         s5                   g4
3         s5                   g6
4              s7
5    r3   r3   r3   r3   r3
6                   s8
7    r1   r1   r1   r1   r1
8                   s9
9    r2   r2   r2   r2   r2

Table 2.2: The parse table for Grammar 2.7.
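Continuing the DFA sketch given after Figure 2.4, the table-filling rules above translate directly into code; the listing below (again with illustrative names) derives the parse table from the states and edges computed there, collecting actions in sets so that an entry with more than one action shows up as a conflict:

GRAMMAR = {"S'": [('S',)],
           'S': [('a', 'A', 'c'), ('b', 'A', 'd', 'd')],
           'A': [('b',)]}    # as in the previous sketch
RULE_NUMBERS = {("S'", ('S',)): 0, ('S', ('a', 'A', 'c')): 1,
                ('S', ('b', 'A', 'd', 'd')): 2, ('A', ('b',)): 3}
TERMINALS = ['a', 'b', 'c', 'd', '$']

def build_table(states, edges):
    table = {v: {} for v in range(len(states))}
    for (v, x), w in edges.items():
        kind = 'g' if x in GRAMMAR else 's'          # goto on non-terminals
        table[v].setdefault(x, set()).add(kind + str(w))
    for v, state in enumerate(states):
        for lhs, rhs, dot in state:
            if dot == len(rhs):                      # a completed item (A ::= β·)
                if lhs == "S'":
                    table[v].setdefault('$', set()).add('acc')
                else:
                    r = 'r%d' % RULE_NUMBERS[(lhs, rhs)]
                    for t in TERMINALS:              # reduce in every terminal column
                        table[v].setdefault(t, set()).add(r)
    return table

# table = build_table(states, edges), with states and edges taken from
# build_dfa() in the previous sketch.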

Certain grammars generate parse tables with more than one action in a single entry. Such entries are called conflicts. There are two types of conflict that can occur: the first, referred to as a shift/reduce conflict, contains one shift and one or more reduce actions in an entry; the second, called a reduce/reduce conflict, has more than one reduce action in a single entry. For example, consider Grammar 2.8 and the DFA shown in Figure 2.5. The associated parse table in Table 2.3 has a reduce/reduce conflict in state 5.

0. S′ ::= S
1. S ::= aBc
2. S ::= aDd     (2.8)
3. B ::= b
4. D ::= b

[Figure: DFA transition diagram. State 0 = {S′ ::= ·S, S ::= ·aBc, S ::= ·aDd}; transitions: 0 —S→ 1 (S′ ::= S·), 0 —a→ 2, 2 —b→ 5 = {B ::= b·, D ::= b·}, 2 —B→ 3, 2 —D→ 4, 3 —c→ 6 (S ::= aBc·), 4 —d→ 7 (S ::= aDd·).]

Figure 2.5: The DFA for Grammar 2.8.

     a      b      c      d      $       B    D    S
0    s2                                            g1
1                                acc
2           s5                           g3   g4
3                  s6
4                         s7
5           r3/r4  r3/r4  r3/r4  r3/r4
6    r1     r1     r1     r1     r1
7    r2     r2     r2     r2     r2

Table 2.3: The parse table with conflicts for Grammar 2.8.

2.4.2 Parsing with a DFA

We have seen that a DFA can be constructed from an NFA that accepts precisely all handles of a sentential form. It is straightforward to perform a deterministic traversal of this DFA to find a handle of a sentential form. In fact, we extend this traversal to determine if a string is in the language of the associated grammar. We begin by reading the input string one symbol at a time and performing a traversal through the DFA from its start state. If an accept state is reached for the input consumed, then the leftmost handle of the current sentential form has been located. At this point the parser replaces the string of symbols in the sentential

form that match the right hand side of the handle’s production rule, with the non-terminal on the left hand side of the production rule. Parsing then resumes with the new sentential form from the start state of the DFA. If an accept state is reached and the start symbol is the only symbol in the sentential form, then the original string is accepted by the parser.

The approach of repeatedly feeding the input string into the DFA is clearly inefficient. The initial portion of the input may be read several times before its handle is found. Consequently a stack is used to record the states reached during a traversal of a given string. When an accept state is reached and a handle γ has been located, the top |γ| states are popped off the stack and parsing resumes from the new state at the top of the stack. This prevents the initial portion of the input being repeatedly read. Next we discuss the LR parsing algorithm that uses a stack to perform a traversal of a DFA.

2.4.3 LR parsing

Knuth’s LR parser parses all LR grammars in at most linear time. An LR grammar is defined to be a context-free grammar for which a parse table without any conflicts can be constructed. Strictly speaking there are different forms of LR DFA: LR(0), SLR(1), LALR(1) and LR(1). For the moment we shall not specify which form we are using; the following discussion applies to all of them. An LR parser works by reading the input string one symbol at a time until it has located the leftmost handle γ of the input. At this point it reduces the handle by popping |γ| states off the stack and then pushing the goto state onto the stack. A parse is successful if all the input is consumed and the state on the top of the stack is the DFA’s accept state. The formal specification of the algorithm is as follows.

LR algorithm

input data: start state SS, accept state SA, LR table T, input string a1 . . . an

push $ and then SS on to the stack
set i = 0
loop
    let l be the state on the top of the stack
    if sk ∈ T(l, ai+1) then
        push ai+1 and then k on to the stack, and set i = i + 1
    else if rk ∈ T(l, ai+1), where rule number k is A ::= β, then
        pop 2 × |β| symbols off the stack
        let t be the state on the top of the stack
        let u be the goto state in T(t, A)
        push A and then u on to the stack
    else if acc ∈ T(l, ai+1) then return success
    else return error
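
The following runnable sketch (our own reformulation, not the thesis’s specification) implements this driver in Python. It assumes a conflict-free table stored as a dict from (state, symbol) to a single action string, and a rules map from rule numbers to (left hand side, length of right hand side); it is checked against the parse table of Grammar 2.7 (Table 2.2).

def lr_parse(table, rules, start_state, tokens):
    tokens = tokens + ['$']
    stack = ['$', start_state]          # symbols and states interleaved
    i = 0
    while True:
        state, a = stack[-1], tokens[i]
        action = table.get((state, a))
        if action is None:
            return False                # empty entry: error
        if action == 'acc':
            return True
        if action.startswith('s'):      # shift: push symbol then state
            stack += [a, int(action[1:])]
            i += 1
        else:                           # reduce by rule k: pop 2*|beta|
            lhs, rhs_len = rules[int(action[1:])]
            del stack[len(stack) - 2 * rhs_len:]
            goto = table[(stack[-1], lhs)]
            stack += [lhs, int(goto[1:])]

# The parse table of Grammar 2.7 (Table 2.2) and the parse of abc:
table = {(0, 'a'): 's2', (0, 'b'): 's3', (0, 'S'): 'g1', (1, '$'): 'acc',
         (2, 'b'): 's5', (2, 'A'): 'g4', (3, 'b'): 's5', (3, 'A'): 'g6',
         (4, 'c'): 's7', (6, 'd'): 's8', (8, 'd'): 's9'}
for s in ('a', 'b', 'c', 'd', '$'):
    table[(5, s)], table[(7, s)], table[(9, s)] = 'r3', 'r1', 'r2'
rules = {1: ('S', 3), 2: ('S', 4), 3: ('A', 1)}
assert lr_parse(table, rules, 0, list('abc'))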

To demonstrate the operation of the LR parsing algorithm we trace the stack activity during the parse of the string abc with Grammar 2.7 and the parse table shown in Table 2.2. Figure 2.6 shows the contents of the stack after every action is performed in the parse.

(stack contents, bottom to top, after each action)

$ 0
$ 0 a 2
$ 0 a 2 b 5
$ 0 a 2 A 4
$ 0 a 2 A 4 c 7
$ 0 S 1

Figure 2.6: The LR parse stacks for the parse of abc.

Although it is not strictly necessary to push the recognised symbols onto the stack, we do so here for clarity. However, it is then necessary to pop twice as many elements as there are symbols in the rule’s right hand side when a reduction is performed. In order to know when all the input has been consumed, the special end-of-string symbol, $, is added to the end of the input string.

We begin the parse by pushing $ and the start state of the DFA onto the stack. Since no reduction is possible from state 0, we read the first input symbol, a, and perform the shift to state 2 by pushing the a and then 2 onto the stack. From state 2 we read the next symbol, b, and perform the shift to state 5. State 5 contains a reduction by rule 3, A ::= b, so we pop the top two symbols off the stack to reveal state 2. The parse table contains the goto g4 in entry (2, A), so we push A and then 4 onto the stack. We continue by pushing the next input symbol, c, and state 7 onto the stack for the action s7 in state 4. The reduce by rule 1, S ::= aAc, causes the top 6 elements to be popped and S and the goto state 1 to be pushed onto the stack. Since the next input symbol is $ and the parse table contains the acc action in entry (1, $), the string abc is accepted by the parser.

2.4.4 Parsing with lookahead

The DFA we have described up to now is called the LR(0) DFA. The LR parsing algorithm requires the parse table to be conflict free, but it is very easy to write a grammar which is not accepted by an LR(0) parser. The problem arises when there is a conflict (more than one action) in a parse table entry. For example, consider Grammar 2.8 and the associated parse table shown in Table 2.3. It turns out that for some grammars these types of conflicts can be resolved with the addition of lookahead symbols to items in the DFA. For the above example, the reduce/reduce conflict can be resolved with the use of just one symbol of lookahead to guide the parse. In the remainder of this section we discuss ways to utilise a single symbol of lookahead. For a non-terminal A we define a lookahead set to be any set of terminals which immediately follow an instance of A in the grammar. A reduction (A ::= α·) is only applicable if the next input symbol, the lookahead, appears in the given lookahead set of A. In order to define useful lookahead sets we require the following definitions.

firstT(x) = {t ∈ T | for some string of symbols β, x ⇒∗ tβ}

first(ε) = {ε}

first(xγ) = firstT(x) ∪ first(γ), if x ⇒∗ ε
first(xγ) = firstT(x), otherwise

For a non-terminal A we define

follow(A) = {t | t ∈ T and S ⇒∗ βAtα}

If there is a derivation of the form S ⇒∗ βA then $ is also added to follow(A). In particular, $ ∈ follow(S) [SJ04]. We now consider two types of LR DFA, SLR(1) and LR(1), that differ only in the lookahead sets that are calculated.
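
These definitions translate directly into a fixed-point computation. The following sketch is our own illustration, not taken from the thesis; it assumes the grammar is given as a dict from each non-terminal to a list of alternates (tuples of symbols) and uses the marker 'eps' for ε.

def first_of(seq, first, terminals):
    # first(x1 ... xm) for a string of symbols; 'eps' marks that the
    # whole string can derive the empty string
    out = set()
    for sym in seq:
        syms = {sym} if sym in terminals else first[sym]
        out |= syms - {'eps'}
        if 'eps' not in syms:
            return out
    out.add('eps')
    return out

def first_sets(grammar, terminals):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first, terminals)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

def follow_sets(grammar, terminals, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add('$')                 # $ is in follow(S)
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym in terminals:
                        continue
                    tail = first_of(alt[i + 1:], first, terminals)
                    new = tail - {'eps'}
                    if 'eps' in tail:      # sym can end the alternate
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

For the SLR(1) construction described next, a reduction rN for a rule A ::= α is then placed in exactly the columns x ∈ follow(A) of the appropriate row.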

SLR(1)

We call a lookahead set of a non-terminal A global when it contains all terminals that immediately follow some instance of A. This is precisely the largest possible lookahead set for A and is exactly the set follow(A). The SLR(1) DFA construction uses these global lookahead sets. A reduction rN, for a rule A ::= α numbered N, is in row h and column x of the SLR(1) parse table if and only if (A ::= α·) is in state h and x ∈ follow(A). Consider Grammar 2.8. The LR(0) parse table shown in Table 2.3 contains a reduce/reduce conflict in state 5. If instead we construct the SLR(1) DFA, the corresponding parse table has no conflicts, as can be seen in Figure 2.7 and Table 2.4.

[Figure: the SLR(1) DFA has the same states and transitions as the DFA of Figure 2.5, but each item carries a lookahead set; in particular state 5 contains the items B ::= b·, {c} and D ::= b·, {d}, so the two reductions no longer compete for the same column.]

Figure 2.7: The SLR(1) DFA for Grammar 2.8.

        a     b     c     d     $     B     D     S
  0     s2                                        g1
  1                             acc
  2           s5                      g3    g4
  3                 s6
  4                       s7
  5                 r3    r4
  6                             r1
  7                             r2

Table 2.4: The SLR(1) parse table for Grammar 2.8.

However, it is easy to come up with a grammar that does not have a conflict-free SLR(1) parse table. For example, consider Grammar 2.9. The associated SLR(1) DFA gives rise to a reduce/reduce conflict in state 7.

0. S0 ::= S
1. S  ::= aBc
2. S  ::= aDd                                    (2.9)
3. S  ::= Bd
4. B  ::= b
5. D  ::= b

[Figure: the SLR(1) DFA has states 0–10. From state 0 {S0 ::= ·S; S ::= ·aBc, $; S ::= ·aDd, $; S ::= ·Bd, $; B ::= ·b}, b is shifted to state 4 {B ::= b·, {c, d}} and B leads to state 3 {S ::= B·d, $}, which shifts d to state 8 {S ::= Bd·, $}. From state 2 (reached on a, containing S ::= a·Bc, $; S ::= a·Dd, $; B ::= ·b; D ::= ·b), b is shifted to state 7, which contains both B ::= b·, {c, d} and D ::= b·, {d} and hence a reduce/reduce conflict on lookahead d. States 5, 6, 9 and 10 complete the aBc and aDd alternates, and S leads from state 0 to the accept state 1 {S0 ::= S·, $}.]

Figure 2.8: The SLR(1) DFA for Grammar 2.9.

        a     b     c     d     $     B     D     S
  0     s2    s4                      g3          g1
  1                             acc
  2           s7                      g5    g6
  3                       s8
  4                 r4    r4
  5                 s9
  6                       s10
  7                 r4    r4/r5
  8                             r3
  9                             r1
 10                             r2

Table 2.5: The SLR(1) parse table for Grammar 2.9.

To address this we consider the LR(1) DFAs.

LR(1)

Instead of calculating the global lookahead set, we can further restrict the applicable reductions by calculating a more precise lookahead set. We call a lookahead set local when it contains only the terminals that can immediately follow a particular instance of a non-terminal. In the LR(1) DFA construction local lookahead sets are used in the following way. For a state containing an item of the form (B ::= α · Aβ, γ), the subsequent reduction for a rule defined by A carries the local lookahead set for this instance of A, calculated as first(βγ).

For example, the LR(1) DFA of Grammar 2.9 eliminates the reduce/reduce conflict in state 7 of Figure 2.8 and Table 2.5 by restricting the lookahead set of the reduction (B ::= b·) from {c, d} to {c}. This is because the item originates from (S ::= a · Bc, $) in state 2; the local lookahead set for this instance of B in S ::= aBc is {c}. (See [ASU86] for full details of the LR(1) DFA construction.)

The set of LR(1) grammars strictly includes the SLR(1) grammars. However, in many cases this extra power comes at the price of a potentially large increase in the number of states in the DFA. We discuss the size of the different parse tables in Chapter 10.

The LR(1) DFA construction was defined in Knuth’s seminal paper on LR parsing [Knu65], but at the time the LR(1) parse tables were too large to be practical. DeRemer later developed the SLR(1) and LALR(1) DFAs in order to allow more sharing of states, thereby reducing the size of the DFA [DeR71].

LALR(1)

LALR(1) DFAs are constructed by merging states in the LR(1) DFA whose items differ only by their lookahead sets. The LALR(1) DFA for a grammar Γ contains the same number of states as the SLR(1) DFA for Γ, but no more (and possibly fewer) conflicts.

Using k symbols of lookahead

Although it is possible to increase the amount of lookahead that a parser uses, it is not clear that the extra work involved is justified. The automata increase in size very quickly, and for every LR(k) parser it is easy to write a grammar which requires k + 1 symbols of lookahead.

2.5 Summary

This chapter has presented the theory and techniques that are needed to build an LR parser. LR parsing is the most powerful of the deterministic techniques discussed but, despite LR grammars being used to define the syntax of many modern programming languages, it is easy to write a grammar which is not LR(1). The next chapter paints a picture of the landscape of some of the important generalised parsing techniques (that can be applied to all context-free grammars) developed over the last 40 years.

Chapter 3

The development of generalised parsing

The field of parsing has been the focus of research for over 40 years, but we still have not found the holy grail of the parsing world – a linear time general parsing algorithm. We do not even know if it is possible to parse all context-free languages in linear time. This chapter is a tour of the major developments of generalised parsing and a discussion of the links between the different algorithms.

3.1 Overview

There are many different parsing algorithms described in the literature. By tracing their development, valuable insights can be gained. Techniques that may at first appear to be different can often be related. For example, the CYK algorithm [CS70, You67, KT69] is the unification of several independently developed algorithms. Furthermore, the CYK algorithm has been shown [GH76] to be equivalent to the algorithm developed by Earley [Ear68]. A similar opinion has been voiced by Dick Grune [GJ90]:

When we consult the extensive literature on parsing techniques, we seem to find dozens of them, yet there are only two techniques to do parsing; all the rest is technical detail and embellishment.

Many algorithms have been developed in response to limitations of, or to incorporate optimisations of, existing techniques. For example, GLR algorithms, initially described by Tomita [Tom86], are extensions of the standard LR algorithm. As we shall discuss in later chapters, Nozohoor-Farshi [NF91] removes the limitations of Tomita’s algorithm, whilst Aycock and Horspool [AH99] incorporate Tomita’s efficient data-structure to optimise a separate algorithm.

Recently, approaches that were previously deemed too inefficient are becoming practical solely due to the rapid increase of computing power. Tomita’s algorithm was developed in the context of natural language parsing where input strings are typically short. At that time, the use of his algorithm for parsing programming languages was considered infeasible. However, as we shall see in Chapter 9, several commonly used tools implement variants of Tomita’s algorithm.

This chapter presents an overview of the relationships between different generalised parsing techniques and provides two of the most straightforward algorithms – Unger’s algorithm and the CYK algorithm.

The diagram in Figure 3.1 lists several algorithms by the names of their developers, grouping similar approaches together in boxes. Solid arrows between algorithms indicate an extension or improvement made, while dotted arrows lead to the implementation of a specific algorithm.

The box on the top left of Figure 3.1 shows two early general context-free parsing algorithms. The technique described by Irons [Iro61] has been credited as being the “first fully described parser” [GJ90]. It is a full backtracking recursive-descent, left-corner parser that displays exponential worst case time complexity.

The second approach, attributed to Unger [Ung68], is a straightforward general parsing algorithm that has been “anonymously used” [GJ90] by many other algorithms. Although other algorithms, like CYK, are more efficient and have had more attention, Unger’s algorithm has been modified by Sheil to achieve the same performance [She76]. It has the advantage of being extremely easy to understand and as such we give an overview of the technique in this chapter.

The existing (general) parsing algorithms were too inefficient to be used as programming language parsers. As a consequence of this inefficiency, two approaches were developed that considered only the deterministic subclass of the context-free grammars.

The LL grammars describe a useful class of deterministic context-free languages, including the syntax of many programming languages. Lewis and Stearns [LS68] are considered to be major contributors to the development of the top-down LL parsing technique. Although LL parsers are efficient (linear time) they do not accept left recursive grammars. It is often useful to define programming language grammars using left recursion and, whilst standard removal algorithms exist [AU73], the structure, and potentially the semantics, of the grammar are altered by the transformation. However, the technique is still popular because it is fairly straightforward to generate LL parsers by hand.

At about the same time, Knuth developed his LR parsing algorithm [Knu65], a bottom-up approach (described in Chapter 2) which achieves linear time parsing of a context-free subclass larger than the class of LL grammars. In particular, LR parsers can cope with deterministic left recursive grammars.

[Figure 3.1 is a diagram. Its boxes group: non-deterministic parsing – Irons (1961) and Unger (1968), with BtYACC; LL – Lewis & Stearns (1965), with ANTLR, JAVACC and PRECCX; LR – Knuth (1965) and DeRemer (1969); chart parsing – Cocke (1967), Younger (1967), Kasami (1965) and Valiant (1975), with Earley (1968) and ACCENT; a general framework – Lang (1974); GLR – Tomita (1985), Farshi (1991), Rekers (1992), Nederhof & Sarbo (1996), Visser (1997), Scott & Johnstone (2002) and Scott, Johnstone & Economopoulos (2004), with Bison 1.875, Elkhound and Asf+Sdf; and Generalised Regular Parsing – Aycock & Horspool (1999) and Scott & Johnstone (2003), with GTB & PAT.]

Figure 3.1: The development of general context-free parsing. Algorithms are grouped in boxes by contributing authors. Solid arrows indicate an extension or improvement made, while dotted arrows lead to the implementation of a specific algorithm.

However, the required automata were too large to be of practical use. Although smaller automata were developed [DeR69, DeR71], generating them by hand was a difficult and laborious task. The development of parser generators, in particular Yacc [Joh79], made LR parsing a popular approach for programming language parsers.

Whilst the programming language community focused on improving the efficiency of parsers for a useful (restricted) class of context-free grammars, the natural language processing community required more efficient parsers for all context-free grammars. The CYK algorithm was independently developed by Cocke [CS70], Younger [You67] and Kasami [Kas65] in the 1960s. The algorithm parses all context-free grammars in worst-case cubic time, but relies on grammars being in two-form (at most two symbols on the right hand side of each rule). We discuss the CYK algorithm in more detail in Section 3.3 of this chapter. Another technique developed later by Earley [Ear68] performs better in some cases but has a similar worst case complexity to CYK. Earley’s approach is a directional, bottom-up technique. In this respect it appears to be very different to CYK, but it has been shown [GHR80] that the two algorithms are in fact closely related. We discuss Earley’s algorithm in more detail in Chapter 8.

Many programming language developers were content with efficiently parsing a subset of the context-free grammars and new programming languages were designed with grammars that were ‘easy’ to parse. It was not until the 1980s that an interest in generalised parsing resurfaced in the form of Tomita’s GLR parser [Tom86]. Tomita’s algorithm extends the standard LR parsing technique to parse a larger class of grammars. Although his algorithm fails to terminate for certain grammars, it parses all LR grammars in linear time. Corrected versions of Tomita’s algorithm have been given by Farshi [NF91], Scott & Johnstone [SJ00] and Nederhof & Sarbo [NS96]. We discuss these approaches in Chapters 4, 5 and 8 respectively.

There have been many attempts to speed up the performance of GLR parsers. A novel technique presented by Aycock and Horspool improves the efficiency through the reduction of stack activity [AH99]. Unfortunately their technique fails to work for certain grammars. However, an extension of their approach has been given by Scott and Johnstone [SJ03b] which successfully parses all context-free grammars. We discuss these algorithms in Chapter 7.

In the remainder of this chapter we shall outline two important developments in the history of parsing. They have no immediate impact on the GLR-style algorithms that are the main topic of this thesis and we shall not need them later. However, we include them for completeness.

3.2 Unger’s method

One of the earliest and most straightforward general parsing algorithms is the technique developed by Unger [Ung68]. It has the advantage of being extremely easy to understand and as such we present an example parse in detail.

Example – recognition using Unger’s method

Unger’s method works by trying to partition the input string so that it can be derived from the start rule. For example consider Grammar 3.1.

S0 ::= S
S  ::= ASB | BSA | c                             (3.1)
A  ::= a
B  ::= b

We start off by partitioning the input string for the right hand side of the start symbol.

    S
    a b c a b

The non-terminal S can be replaced by one of 3 alternates. We start off with the first alternate and partition the input as follows.

    S → ASB:
      A     S     B
      a     b     cab
      a     bc    ab
      a     bca   b
      ab    c     ab
      ab    ca    b
      abc   a     b

This method of partitioning can easily get out of hand if the right hand sides of rules and the input string are large. Unger provided some optimisations that limit the number of partitions that need to be kept. Any partitions that do not match the terminals in the input string can be removed. This leaves us with only one partition that can possibly lead to a derivation of the input string. For example, since the non-terminals A and B each derive only a single terminal, we focus on the non-terminal S, which has three alternates that can potentially be used to derive bca.

    S → ASB:
      A     S     B
      a     bca   b

Trying the first alternate we see that it cannot be used because the non-terminals A and B do not derive the terminal they have been partitioned with.

    S → ASB:
      A     S     B
      b     c     a

Trying the next alternate, we have a success as all the non-terminals are able to derive the terminals in one step.

    S → BSA:
      B     S     A
      b     c     a

This leads us to the following (unique) derivation of the input string:

S ⇒ ASB ⇒ aSB ⇒ aBSAB ⇒ abSAB ⇒ abcAB ⇒ abcaB ⇒ abcab

A naïve implementation of Unger’s algorithm has exponential time complexity, which limits its use to trivial examples. The addition of a well-formed substring table dramatically improves the efficiency: the complexity becomes O(n^(k+1)), where n is the length of the input string and k is the maximum length of a rule’s right hand side (see [GJ90] for more details).
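
The following sketch (our own formulation, suitable for ε-free, non-cyclic grammars such as Grammar 3.1) implements this partitioning directly; the memoisation cache plays the role of the well-formed substring table mentioned above.

# A memoised sketch of Unger's method under the stated assumptions.
from functools import lru_cache

def unger_recognise(grammar, terminals, start, s):
    @lru_cache(maxsize=None)
    def derives(syms, i, j):
        # can the symbol string syms derive the input slice s[i:j]?
        if not syms:
            return i == j
        head, rest = syms[0], syms[1:]
        if head in terminals:            # a terminal must match directly
            return i < j and s[i] == head and derives(rest, i + 1, j)
        # try every partition point k between head and the remaining symbols
        return any(any(derives(alt, i, k) for alt in grammar[head])
                   and derives(rest, k, j)
                   for k in range(i, j + 1))
    return derives((start,), 0, len(s))

# Grammar 3.1 and the example string abcab:
g = {'S': [('A', 'S', 'B'), ('B', 'S', 'A'), ('c',)],
     'A': [('a',)], 'B': [('b',)]}
assert unger_recognise(g, {'a', 'b', 'c'}, 'S', 'abcab')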

3.3 The CYK algorithm

The CYK algorithm is a general recognition algorithm with O(n³) worst case time complexity for all context-free grammars in Chomsky Normal Form (CNF). A context-free grammar is said to be in CNF if every rule is of the form A ::= BC, or A ::= a, or S ::= ε, where A, B and C are non-terminals and S is the grammar’s start symbol. The CYK algorithm uses an (n + 1) × (n + 1) triangular matrix, where n is the length of the input string, to determine whether a string is in the language. Each cell contains a set of non-terminals that are used in a derivation. In [GHR80] the matrix is constructed towards the top right diagonal instead of the top left, as is done by Cocke, Younger and Kasami. Both algorithms are equivalent, but Graham’s description is easier to compare against other chart parsers, like Earley’s algorithm. The formal specification of the recognition matrix construction algorithm, taken from [GHR80], is as follows.

CYK recognition matrix construction algorithm

for i := 0 to n − 1 do
    ti,i+1 := {A | A → ai+1 ∈ P}
for d := 2 to n do
    for i := 0 to n − d do
        j := d + i
        ti,j := {A | there exists k, i + 1 ≤ k ≤ j − 1, such that
                     A → BC ∈ P for some B ∈ ti,k and C ∈ tk,j}
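
A runnable sketch of this construction (our own; the CNF grammar is assumed to be given as a list of unit rules (A, a) and a list of binary rules (A, B, C)) is as follows, checked against Grammar 3.2 and the string abcab used in the example below:

# CYK recognition-matrix construction under the assumed grammar encoding.
def cyk_recognise(unit, binary, start, s):
    n = len(s)
    t = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n):                        # the superdiagonal t[i][i+1]
        t[i][i + 1] = {A for (A, a) in unit if a == s[i]}
    for d in range(2, n + 1):                 # each subsequent diagonal
        for i in range(n - d + 1):
            j = d + i
            t[i][j] = {A for (A, B, C) in binary
                       for k in range(i + 1, j)
                       if B in t[i][k] and C in t[k][j]}
    return start in t[0][n]                   # check the top right cell

unit = [('S', 'c'), ('A', 'a'), ('B', 'b')]
binary = [('S', 'S1', 'B'), ('S', 'S2', 'A'), ('S1', 'A', 'S'), ('S2', 'B', 'S')]
assert cyk_recognise(unit, binary, 'S', 'abcab')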

Example – recognition using the CYK algorithm

To demonstrate the recognition of a string using the CYK algorithm we trace the construction process of the recognition matrix for the string abcab in Grammar 3.1. Since the CYK algorithm requires grammars to be in CNF we use the CNF conversion algorithm presented in [AU73] to transform Grammar 3.1 to Grammar 3.2.

S0 ::= S
S  ::= S1B | S2A | c
S1 ::= AS                                        (3.2)
S2 ::= BS
A  ::= a
B  ::= b

We begin the parse of the string abcab by filling the superdiagonal stripe [GH76] of the matrix from the top left to the bottom right of the matrix with non-terminals that directly derive the consecutive symbols of the input string.

        0     1     2     3     4     5
    0         A
    1               B
    2                     S
    3                           A
    4                                 B
    5

The construction is continued by filling the entries of each subsequent diagonal in turn. The entry at (i, j) is filled with the set of non-terminals A for which there is a rule A ::= BC with B in entry (i, k) and C in entry (k, j), for some k with i + 1 ≤ k ≤ j − 1. The completion of the CYK recognition matrix for our example parse is shown in Figure 3.2.

The diagonals fill as follows: for d = 2 the only non-empty new entry is t1,3 = {S2}; for d = 3, t1,4 = {S}; for d = 4, t0,4 = {S1}; and for d = 5, t0,5 = {S}. The completed matrix is:

        0     1     2     3     4     5
    0         A     ∅     ∅     S1    S
    1               B     S2    S     ∅
    2                     S     ∅     ∅
    3                           A     ∅
    4                                 B
    5

Figure 3.2: Trace of the construction of the CYK recognition matrix for the parse of the string abcab in Grammar 3.2.

Once the construction of the recognition matrix is complete, we check the entry in the top right cell. If it contains the non-terminal on the right hand side of the start rule then the string is accepted. In our example parse we can see that the string abcab is in the language of Grammar 3.2.

3.4 Summary

In this chapter we have discussed some of the major developments of generalised parsing techniques. We also briefly discussed Unger’s approach and the CYK recognition algorithm. In the next chapter we discuss, in detail, Tomita’s GLR parsing algorithm and the extensions due to Farshi and Rekers.

Chapter 4

Generalised LR parsing

As we have already discussed in earlier chapters, context-free grammars were developed by Noam Chomsky in the 1950s in an attempt to capture the key properties of human languages. Computer scientists took advantage of their declarative structure and used them to define the syntax of programming languages. Many parsing algorithms capable of parsing all context-free grammars were developed, but their poor worst case performance often made them impractical. As a result more efficient parsing algorithms which worked on a subclass of the context-free grammars were developed.

The efficiency of the contemporary generalised parsing techniques (CYK, Earley, etc.) was disappointing when compared to Knuth’s deterministic LR parsing algorithm. The LR algorithm’s popularity had soared, but the class of grammars it accepted was too restrictive for the practical application that interested Tomita – natural language processing. In 1985, he developed the GLR parsing algorithm by extending the LR algorithm to work on non-LR grammars.

This chapter presents Tomita’s GLR recognition and parsing algorithms [Tom86, Tom91]. Although these algorithms work for a larger class of grammars than the LR parsers, they cannot correctly parse all context-free grammars. We discuss the modification due to Farshi [NF91], which extends Tomita’s recognition algorithm to work for all context-free grammars, and then discuss Rekers’ [Rek92] parser extension.

4.1 Using the LR algorithm to parse all context-free grammars

In Chapter 2 we described the standard LR parsing algorithm. LR parsers are extremely efficient, but are restricted by the class of grammars they accept. Although they work for a useful subset of the context-free grammars, they cannot cope with non-determinism.

A naïve approach to dealing with non-determinism in an LR parser is to duplicate a stack when a conflict in the parse table is encountered. An approach presented in [Tom84] uses a stack list to represent the different stacks of a non-deterministic parse. Each stack in the stack list is controlled by a separate LR parsing process that works in the same way as the standard LR algorithm. However, each process essentially works in parallel by synchronising on the shift actions.

Each stack in the stack list is represented by a graph, where the nodes are labelled with a state number and the edges are labelled with the symbol parsed. The rightmost nodes are the tops of each stack. For example, consider the parse of the string abd with Grammar 4.1 and the DFA in Figure 4.1. The associated LR(1) parse table, with conflicts, is shown in Table 4.1.

0. S0 ::= S
1. S  ::= abC
2. S  ::= aBC                                    (4.1)
3. B  ::= b
4. C  ::= d

[Figure: the LR(1) DFA has states 0–7. State 0 {S0 ::= ·S, $; S ::= ·abC, $; S ::= ·aBC, $} moves on a to state 2 {S ::= a·bC, $; S ::= a·BC, $; B ::= ·b, d} and on S to the accept state 1 {S0 ::= S·, $}. From state 2: b leads to state 3 {S ::= ab·C, $; B ::= b·, d; C ::= ·d, $} and B to state 4 {S ::= aB·C, $; C ::= ·d, $}. State 3 moves on C to state 5 {S ::= abC·, $} and on d to state 6 {C ::= d·, $}; state 4 moves on d to state 6 and on C to state 7 {S ::= aBC·, $}.]

Figure 4.1: The DFA for Grammar 4.1.

        a     b     d      $     B     C     S
  0     s2                                   g1
  1                        acc
  2           s3                 g4
  3                 s6/r3              g5
  4                 s6                 g7
  5                        r1
  6                        r4
  7                        r2

Table 4.1: The parse table for Grammar 4.1.

We begin the parse by creating the start node, v0, of the stack list, labelled by the start state of the DFA. We read the first input symbol, a, and then perform the shift from state 0 to state 2. We create a new node, v1 in the stack list, labelled 2, and add an edge, labelled a back to v0.

v1(2) -a-> v0(0)

(Here v(k) denotes a node v labelled k, and -x-> an edge labelled x pointing from the top of the stack towards the bottom.)

We then proceed to read the next input symbol, b, and create the node, v2, labelled 3, with an edge back to v1.

v2(3) -b-> v1(2) -a-> v0(0)

At this point we are in state 3 of the DFA which contains a shift/reduce conflict. Since we do not know which action to perform, we duplicate the stack and perform both actions.

v2(3) -b-> v1(2) -a-> v0(0)
v5(3) -b-> v4(2) -a-> v3(0)

We synchronise the stack list on shift actions, so we perform the reduce B ::= b· on the bottom stack first. This involves popping the node v5 off the stack and adding a new node v6, labelled 4, with an edge labelled B back to v4.

v2(3) -b-> v1(2) -a-> v0(0)
v6(4) -B-> v4(2) -a-> v3(0)

We then read the next input symbol, d, and perform the synchronised shift action, s6, for both stacks. We create the new nodes v7 and v8, with edges labelled d, back to v2 and v6 respectively.

v7(6) -d-> v2(3) -b-> v1(2) -a-> v0(0)
v8(6) -d-> v6(4) -B-> v4(2) -a-> v3(0)

Since there is a reduce on rule C ::= d from both states at the top of the stack list we can perform both to get the new stack list shown below.

v9(5)  -C-> v2(3) -b-> v1(2) -a-> v0(0)
v10(7) -C-> v6(4) -B-> v4(2) -a-> v3(0)

From state 5 there is a reduce on S ::= abC, so we pop the top 3 nodes off the first stack and create the new node, v11, labelled 1, with an edge labelled S from v11 to v0.

v11(1) -S-> v0(0)
v10(7) -C-> v6(4) -B-> v4(2) -a-> v3(0)

From state 7 there is a reduction on rule S ::= aBC, so we pop the top 3 nodes off the second stack as well and create the new node, v12, as shown below.

v11(1) -S-> v0(0)
v12(1) -S-> v3(0)

Since all the input has been consumed and v11 and v12, which are labelled by the accept state of the DFA, are at the top of the stack list, the parse has succeeded.

Using a stack list to follow all parses of an ambiguous sentence can be very inefficient. If a state can be reached in several different ways, because of non-determinism, then there will be several duplicate stacks. Unfortunately, this can cause the number of stacks created to grow exponentially. In addition to this, because the stacks do not share any information with each other, if the tops of several stacks contain the same state then they continue to parse the remaining input in the same way until the duplicate states are popped off each stack.

Such redundant actions can be prevented with the use of a slightly modified structure called a Tree Structured Stack (TSS) [Tom84]. When the tops of two or more stacks contain the same state, a single state is shared between each stack. This prevents duplicate parses of the same input being done. For example, consider the parse of the string abd shown above. After duplicating the stack and performing the first reduction, the next action is a shift on d. Since both stacks reach state 6 on the shift we can merge the nodes v7 and v8 to combine the stack list into a TSS.

[TSS: a single top node labelled 6 with two edges labelled d, one to the stack 3 -b-> 2 -a-> 0 and one to the stack 4 -B-> 2 -a-> 0]

Although this approach significantly improves the efficiency of the algorithm, it is still far from ideal – the number of stacks created can still grow exponentially. In both approaches discussed above, when a conflict in the parse table is encountered the entire stack is duplicated. However, it is often the case that separate stacks have the same nodes towards the bottom of the stack (the leaves of the TSS). Instead of duplicating a stack when a non-deterministic point in the parse is reached, the space required can be reduced by only splitting the necessary part of the stack. The resulting structure is called a Graph Structured Stack (GSS) [Tom86]. The GSS constructed for the parse of the string abd is shown below.

[GSS: 6 -d-> 3 -b-> 2 -a-> 0; 6 -d-> 4 -B-> 2; 5 -C-> 3; 7 -C-> 4; 1 -S-> 0]

The next section presents Tomita’s Generalised LR (GLR) algorithm that extends the standard LR parser by constructing a GSS during a parse.

4.2 Tomita’s Generalised LR parsing algorithm

In [Tom86], Tomita presents five separate GLR algorithms which we shall refer to as Algorithms 0–4. The first four algorithms are recognisers and the fifth algorithm is the extension of Algorithm 3 to a parser. Algorithm 0 is defined to work for LR(1) grammars and is used to introduce the construction of the GSS. Algorithm 1 extends Algorithm 0 to work for non-LR(1) grammars without ε-rules. Algorithm 2 introduces a complex approach to deal with ε-rules and forms the basis of Algorithm 3 – the full version of the recognition algorithm. Algorithm 4 is the parser version of the GLR algorithm which constructs a shared packed parse forest representation of parse trees. This section introduces Tomita’s recognition Algorithms 1 and 2. We begin by defining the central data structure that underpins all of Tomita’s algorithms – the GSS. We demonstrate the construction of the GSS for both algorithms and highlight some of the problems associated with each of the approaches. Algorithms 3 and 4 are discussed in Section 4.4.

4.2.1 Graph structured stacks

A Graph Structured Stack (GSS) is the central data structure that underpins all of Tomita’s GLR algorithms. An instance of a GSS is related to a specific grammar Γ and input string a1 . . . an. It is defined as a directed graph, containing two types of node: state nodes, labelled by the state numbers of the DFA for Γ, and symbol nodes, labelled by Γ’s grammar symbols. The state nodes are grouped together into n + 1 disjoint sets called levels. The GSS is constructed one level at a time. First all possible reductions are performed for the state nodes in the current level (the frontier), and then the next level is created as a result of applying shift actions. The first level is initialised with a state node labelled by the DFA’s start state.

We represent a GSS graphically by drawing the state nodes as circular nodes and the symbol nodes as square nodes. To separate each of the levels we draw the state nodes of a given level in a single column labelled Ui, where 0 ≤ i ≤ n. The GSS is drawn from left to right, with the rightmost nodes representing the tops of each of the stacks. (In the compact notation used below, the square symbol nodes are shown as edge labels.)
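
The following minimal sketch (an assumed representation of our own, not taken from [Tom86]) shows one way the two node types and the reduction-path traversal might be realised:

class SymbolNode:
    def __init__(self, symbol, successor):
        self.symbol = symbol          # grammar symbol labelling the node
        self.successor = successor    # the state node nearer the stack bottom

class StateNode:
    def __init__(self, state, level):
        self.state = state            # DFA state number labelling the node
        self.level = level            # index of the level U_i it belongs to
        self.successors = []          # symbol nodes on paths towards v0

    def add_edge(self, symbol, target):
        self.successors.append(SymbolNode(symbol, target))

    def reduction_targets(self, rhs_length):
        # state nodes at the end of every path of 2 * rhs_length edges;
        # for an epsilon-rule (rhs_length 0) this is the node itself
        nodes = [self]
        for _ in range(rhs_length):
            nodes = [sym.successor for v in nodes for sym in v.successors]
        return nodes

A reduction by a rule with j symbols on the right hand side then calls reduction_targets(j), mirroring the traversal of paths of length 2j described in the examples that follow.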

4.2.2 Example – constructing a GSS

We shall describe how a GSS is constructed during the parse of the string abd with Grammar 4.1, whose DFA is shown in Figure 4.1, and the associated parse table in Table 4.1. We shall refer to the parse table as T.

The GSS is initialised with the state node, v0, in level U0. For each new state node constructed in the GSS we check to see what actions are applicable. We begin the parse by finding the shift action s2 in T (0, a). This triggers the creation of the next level, U1 for which we create the new node, v1, labelled 2. We create a new symbol node, labelled a, and make it the successor of v1 and the predecessor of v0.

U0: v0(0);  U1: v1(2).  Edges: v1 -a-> v0

We process v1 and find the shift action s3 in T (2, b). We construct the new node, v2 labelled 3 and add it to level U2. We read the b from the input string, construct the new symbol node labelled b and make it the successor of v2 and the predecessor of v1.

v2(3) -b-> v1(2) -a-> v0(0)   (levels U0, U1, U2)

Next we process v2. In T(3, d) there is a shift/reduce conflict, s6/r3. Since the construction of the GSS is synchronised on the shift actions, we queue the shift and continue by performing the reduction by rule 3, B ::= b. Unlike the standard LR parser, which removes states from the parse stack when a reduction is performed, a GLR algorithm does not remove nodes from the GSS¹. Instead of popping nodes off the stack, we perform a traversal of all reduction paths of length 2, from node v2, to find the target nodes of the reduction. In this case there is only one path, which leads to node v1. We find the goto action, g4 ∈ T(2, B), and create the new state node, v3, labelled 4, in the current level U2. We then create the symbol node labelled B and make it the successor of v3 and the predecessor of v1.

B and make it the successor of v3 and the predecessor of v1.

v2(3) -b-> v1(2) -a-> v0(0);  v3(4) -B-> v1   (v3 in U2)

The only action associated with the newly created node, v3, is the shift s6. Since we have processed all nodes in U2 and performed all possible reductions the current level is complete. So next we construct level U3 by performing the shift actions from v2 and v3. We create the new node, v4, labelled 6, in U3 and two new symbol nodes labelled d. We then add two paths of length 2 from v4, one going to v2 and the other going to v3.

v4(6) -d-> v2(3) -b-> v1(2) -a-> v0(0);  v4 -d-> v3(4);  v3 -B-> v1

At this point we have consumed all the input symbols and are left with the lookahead symbol $. We process v4 and find the reduce action r4 ∈ T (6, $). Since the right hand side of rule 4 contains one symbol we trace back paths of length 2 from v4. In this case two possible paths exist; one reaching v2 and the other v3. We create two new nodes, v5 and v6, labelled with the goto states found in T (3,C) and T (4,C) respectively. We create two new symbol nodes, both labelled C and use them to create a path between v5 and v2 and v6 and v3.

¹Although it is possible to perform garbage collection to remove ‘dead’ nodes (nodes that can no longer be reached on a path through the GSS from the current level) we do not address the issue here. For a technique that implements such an approach see [MN04].

v4(6) -d-> v2(3) -b-> v1(2) -a-> v0(0);  v4 -d-> v3(4) -B-> v1;  v5(5) -C-> v2;  v6(7) -C-> v3

Processing nodes v5 and v6 we see that the reductions r1 and r2, respectively, are applicable. The right hand sides of rules 1 and 2 each consist of 3 symbols, so we need to find the nodes at the end of the paths of length 6. Both reduction paths reach v0 and, since the associated goto action is the same for both reductions, we create one new node, v7, labelled 1, in the current level. Only one symbol node labelled S is required, with a path from v7 to v0.

v4(6) -d-> v2(3) -b-> v1(2) -a-> v0(0);  v4 -d-> v3(4) -B-> v1;  v5(5) -C-> v2;  v6(7) -C-> v3;  v7(1) -S-> v0

Processing node v7 we find the accept action acc ∈ T(1, $). Since we have consumed all of the input and processed all state nodes in the current level, the input string abd is accepted. Note that it is a property of the GSS that all symbol nodes pointed to by a given state node are labelled by the same grammar symbol.

4.2.3 Tomita’s Algorithm 1

Tomita’s Algorithm 1 basically works by constructing a GSS in the way we have described in the previous section. However, as is shown in the previous example, there may be more than one node in the frontier of the GSS waiting to be processed. Tomita uses a special bookkeeping set, A, to keep track of these newly created state nodes. A parse is performed by iterating over this set and finding all applicable actions for a given node. However, as it is possible for a node to have multiple applicable actions (as a result of a conflict in the parse table), two additional bookkeeping sets, Q and R, are used to store any pending shifts and reductions. The set Q stores elements of the form (v, k), where v is a node whose state has a transition, labelled by the next input symbol, to a state k in the DFA. Before describing the elements stored in the set R we discuss how reductions are performed in the GSS.

Recall that the standard LR parser performs a reduction for a rule A ::= X1 . . . Xj by popping j symbols off the top of the stack. In comparison, a reduction in a GLR parser is associated with a node in the frontier and requires all paths of length 2j to be traced back from the given node. In the worst case this search may require O(n^j) time. The efficiency of Tomita’s algorithms stems from the fact that the same part of the input is not parsed more than once in the same way. In other words, each reduction path is only traversed once for each reduction. However, it is possible for a new edge to be added to an existing node in the frontier that has already performed its associated reductions. This new edge introduces a new reduction path that needs to be traversed for the parse to be correct. So as not to traverse the same path more than once, Tomita stores in the set R elements of the form (w, t), where t is the rule number of the reduction and w is the first edge of the path down which reduction t is to be applied. Below is the formal description of Tomita’s Algorithm 1, taken from [Tom86].

Algorithm 1

PARSE(G, a1 . . . an)
  • Γ ⇐ ∅
  • an+1 ⇐ ‘$’
  • r ⇐ FALSE
  • create in Γ a vertex v0 labelled s0
  • U0 ⇐ {v0}
  • for i ⇐ 0 to n do PARSEWORD(i)
  • return r

PARSEWORD(i)
  • A ⇐ Ui
  • R, Q ⇐ ∅
  • repeat
      ◦ if A ≠ ∅ then do ACTOR
      ◦ elseif R ≠ ∅ then do REDUCER
    until R = ∅ ∧ A = ∅
  • do SHIFTER

ACTOR
  • remove one element v from A
  • for all α ∈ ACTION(STATE(v), ai+1) do
      ◦ if α = ‘accept’ then r ⇐ TRUE
      ◦ if α = ‘shift s’ then add ⟨v, s⟩ to Q
      ◦ if α = ‘reduce p’ then
          • for all x such that x ∈ SUCCESSORS(v), add ⟨v, x, p⟩ to R

REDUCER
  • remove one element ⟨v, x, p⟩ from R
  • N ⇐ LEFT(p)
  • for all w such that there exists a path of length 2|p| − 1 from x to w do
      ◦ s ⇐ GOTO(STATE(w), N)
      ◦ if there exists u such that u ∈ Ui ∧ STATE(u) = s then
          • if there already exists a path of length 2 from u to w then
              ◦ do nothing
          • else
              ◦ create in Γ a vertex z labelled N
              ◦ create two edges in Γ from u to z and from z to w
              ◦ if u ∉ A then
                  • for all q such that ‘reduce q’ ∈ ACTION(STATE(u), ai+1) do
                      add ⟨u, z, q⟩ to R
      ◦ else /* if there doesn’t exist u such that u ∈ Ui ∧ STATE(u) = s */
          • create in Γ two vertices u and z labelled s and N, respectively
          • create two edges in Γ from u to z and from z to w
          • add u to both A and Ui

SHIFTER
  • for all s such that ∃v(⟨v, s⟩ ∈ Q) do
      ◦ create in Γ a vertex w labelled s
      ◦ add w to Ui+1
      ◦ for all v such that ⟨v, s⟩ ∈ Q do
          • create x labelled ai+1 in Γ
          • create edges in Γ from w to x and from x to v
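
To make the bookkeeping concrete, the following compact runnable sketch (our own reformulation for ε-free grammars, not Tomita’s literal pseudocode) represents state nodes as (level, state) pairs, folds symbol nodes into labelled edges, and assumes table entries are sets of action strings such as {'s6', 'r3'} with goto a dict from (state, non-terminal) to a state:

def glr_recognise(table, goto, rules, start_state, tokens):
    tokens = tokens + ['$']
    edges = {(0, start_state): set()}       # node -> set of (symbol, target)
    frontier = {(0, start_state)}
    for i, a in enumerate(tokens):
        A, R, Q = list(frontier), [], []    # the three bookkeeping sets
        while A or R:
            if A:                           # Actor
                v = A.pop()
                for act in table.get((v[1], a), ()):
                    if act == 'acc':
                        return True
                    if act[0] == 's':
                        Q.append((v, int(act[1:])))
                    else:                   # one pending reduction per edge
                        R += [(e, int(act[1:])) for e in edges[v]]
            else:                           # Reducer
                (sym, w), p = R.pop()
                lhs, rhs_len = rules[p]
                targets = {w}               # walk the remaining rhs_len-1 edges
                for _ in range(rhs_len - 1):
                    targets = {t for u in targets for (_, t) in edges[u]}
                for w2 in targets:
                    u = (i, goto[(w2[1], lhs)])
                    if u in frontier:
                        if (lhs, w2) not in edges[u]:
                            edges[u].add((lhs, w2))
                            if u not in A:  # redo reductions down the new edge
                                R += [((lhs, w2), int(act[1:]))
                                      for act in table.get((u[1], a), ())
                                      if act[0] == 'r']
                    else:
                        edges[u] = {(lhs, w2)}
                        frontier.add(u)
                        A.append(u)
        frontier = set()                    # Shifter: build level i+1
        for v, s in Q:
            u = (i + 1, s)
            edges.setdefault(u, set()).add((a, v))
            frontier.add(u)
    return False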

4.2.4 Parsing with ε-rules

Tomita’s Algorithm 1 is defined to work on ε-free grammars and hence does not contain the machinery to deal with ε-rules. However, by modifying the way reductions are performed, the algorithm can parse grammars containing ε-rules. Recall that when a new node, v, is created whose state contains an applicable reduction for a rule of the form A ::= α, the algorithm finds all nodes at the end of the paths of length 2 × |α| − 1 from the successors of v. Since ε-rules have length zero, an ε-reduction does not require a path to be traversed. Although it is trivial to extend Algorithm 1 to deal with ε-rules, the straightforward approach fails to parse certain grammars correctly. For example, consider Grammar 4.2 and the string aab. The associated LR(1) DFA is shown in Figure 4.2.

S0 ::= S
S  ::= aSB | b                                   (4.2)
B  ::= ε

[Figure: the LR(1) DFA has states 0–5. State 0 {S0 ::= ·S, $; S ::= ·aSB, $; S ::= ·b, $} moves on a to state 3 {S ::= a·SB, $; S ::= ·aSB, $; S ::= ·b, $}, which loops to itself on a. States 0 and 3 both move on b to state 2 {S ::= b·, $}. State 3 moves on S to state 4 {S ::= aS·B, $; B ::= ·, $}, which moves on B to state 5 {S ::= aSB·, $}. State 0 moves on S to the accept state 1 {S0 ::= S·, $}.]

Figure 4.2: The LR(1) DFA for Grammar 4.2.

We begin the parse by creating v0 and adding it to U0 and the set A. When v0 is processed in the Actor we only find a shift to state 3 on the first input symbol a. We create the new state node v1, labelled 3, and a new symbol node labelled a, which we make a successor of v1 and a predecessor of v0.

v1(3) -a-> v0(0)

We continue in this way shifting the next two input symbols and constructing the GSS shown below.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0)

Processing v3, we find the reduction on rule 2, S ::= b, which is applicable from state 2. From the only successor of v3, the symbol node labelled b, we traverse a path of length one to v2. We then create the new state node v4, labelled 4, and a new path of length two between v4 and v2 via a new symbol node labelled S.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2

From state 4 there is only one applicable reduction, on rule 3, B ::= ε. Since the right hand side of the rule is ε, we do not traverse a reduction path. This results in the creation of v5 with a path to v4 via the new symbol node labelled B.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2;  v5(5) -B-> v4

Processing v5, we find a reduction on rule 1, S ::= aSB. We trace back a path of length 5 to v1 from the successor of v5. Since the node v4, which is labelled by the goto state of the reduction, already exists in the frontier of the GSS a new path of length two is added from v4 to v1 via a new symbol node labelled S.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2;  v4 -S-> v1;  v5(5) -B-> v4

At this point there are no nodes waiting to be processed and no actions are queued in either of the bookkeeping sets Q or R. Although all the input has been consumed, the parse terminates in failure since there is no state node in the frontier of the GSS that is labelled by the accept state of the DFA. This is clearly incorrect behaviour since the string aab is in the language of Grammar 4.2.

It turns out that the problem is caused by the new reduction paths created by nullable reductions. Recall that when a new edge is added from an existing node v in the GSS, Algorithm 1 re-performs any reductions that have already been applied from v. However, in the above example the new path of length two from v4 to v1 did not create a new reduction path from v4, but it did from v5. We discuss this problem in more detail in Chapter 5.

Tomita deals with this problem in Algorithm 2 by introducing sub-levels to the GSS. When an ε-reduction is performed a new sub-frontier is created and the node labelled by the goto state of the reduction is created in this new sub-frontier. Before presenting the formal specification of Algorithm 2, we demonstrate its operation using the above example once again.

4.2.5 Tomita’s Algorithm 2

Tomita’s Algorithm 2 creates sub-frontiers Ui,j in Ui when ε-reductions are applied.

To parse the string aab we begin by creating v0, labelled with the start state of the DFA, in level U0,0. The only applicable action from state 0 of the DFA is a shift to state 3 on the first input symbol, a. We read the a from the input string, create the symbol node labelled a and the state node v1, labelled 3. We make the new symbol node a successor of v1 and a predecessor of v0. The GSS constructed up to this point is shown below.

U0,0: v0(0);  U1,0: v1(3).  Edges: v1 -a-> v0

We continue in this way, reading the next two input symbols and constructing the GSS shown below.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0)   (levels U0,0 to U3,0)

At this point state 2 contains a reduction on rule S ::= b. We perform the reduction by traversing a path of length two from v3 to v2. Since there is no node labelled by the goto state of the reduction we create v4, the symbol node labelled S and the new path from v4 to v2.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2

The only applicable action in state 4 is the nullable reduction on rule B ::= ε. Since it is nullable reductions that cause new edges to be added to existing nodes, a new sub-frontier U3,1 is created. We create the symbol node labelled B and the state node, v5, in U3,1. We make the new symbol node a successor of v5 and a predecessor of v4.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2;  v5(5) -B-> v4   (v5 in U3,1)

We then continue as normal processing any applicable actions from state 5. We trace back a path of length six for the reduction on rule S ::= aSB. Because the reduction is performed from a state node in the sub-frontier U3,1 we also create the state node v6, labelled by the goto state of the reduction in U3,1.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2;  v5(5) -B-> v4;  v6(4) -S-> v1   (v6 in U3,1)

From state 4 we can perform another nullable reduction on rule B ::= ε. As a result we create a new sub-frontier, U3,2, and add v7, the state node labelled by the goto state of the reduction, to it.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2;  v5(5) -B-> v4;  v6(4) -S-> v1;  v7(5) -B-> v6   (v7 in U3,2)

Performing the reduction from v7 on rule S ::= aSB, we trace back a path to v0. We create v8, labelled 1, and the symbol node labelled S. We make the symbol node a successor of v8 and a predecessor of v0. Since v8 is labelled by the accept state of the DFA and we have consumed all the input string, the parse is successful. The final GSS constructed is shown below.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2;  v5(5) -B-> v4;  v6(4) -S-> v1;  v7(5) -B-> v6;  v8(1) -S-> v0

Algorithm 2

PARSE(G, a1 . . . an)
  • Γ ⇐ ∅
  • an+1 ⇐ ‘$’
  • r ⇐ FALSE
  • create in Γ a vertex v0 labelled s0
  • U0,0 ⇐ {v0}
  • for i ⇐ 0 to n do PARSEWORD(i)
  • return r

PARSEWORD(i)
  • j ⇐ 0
  • A ⇐ Ui,0
  • R, Q, Re ⇐ ∅
  • repeat
      ◦ if A ≠ ∅ then do ACTOR
      ◦ elseif R ≠ ∅ then do REDUCER
      ◦ elseif Re ≠ ∅ then do E-REDUCER
    until A = ∅ ∧ R = ∅ ∧ Re = ∅
  • do SHIFTER

ACTOR
  • remove one element v from A
  • for all α ∈ ACTION(STATE(v), ai+1) do
      ◦ if α = ‘accept’ then r ⇐ TRUE
      ◦ if α = ‘shift s’ then add ⟨v, s⟩ to Q
      ◦ if α = ‘reduce p’ and p is not an ε-production then
          • for all x such that x ∈ SUCCESSORS(v), add ⟨v, x, p⟩ to R
      ◦ if α = ‘reduce p’ and p is an ε-production then add ⟨v, p⟩ to Re

REDUCER
  • remove one element ⟨v, x, p⟩ from R
  • N ⇐ LEFT(p)
  • for all w such that there exists a path of length 2|p| − 1 from x to w do
      ◦ s ⇐ GOTO(STATE(w), N)
      ◦ if there exists u such that u ∈ Ui,j ∧ STATE(u) = s then
          • if there already exists a path of length 2 from u to w then
              ◦ do nothing
          • else
              ◦ create in Γ a vertex z labelled N
              ◦ create two edges in Γ from u to z and from z to w
              ◦ if u ∉ A then
                  • for all q such that ‘reduce q’ ∈ ACTION(STATE(u), ai+1) do
                      add ⟨u, z, q⟩ to R
      ◦ else /* if there doesn’t exist u such that u ∈ Ui,j ∧ STATE(u) = s */
          • create in Γ two vertices u and z labelled s and N, respectively
          • create two edges in Γ from u to z and from z to w
          • add u to both A and Ui,j

E-REDUCER
  • Ui,j+1 ⇐ ∅
  • for all ⟨v, p⟩ ∈ Re do
      ◦ N ⇐ LEFT(p)
      ◦ s ⇐ GOTO(STATE(v), N)
      ◦ if there exists w such that w ∈ Ui,j+1 ∧ STATE(w) = s then
          • create in Γ a vertex x labelled N
          • create edges in Γ from w to x and from x to v
      ◦ else
          • create in Γ vertices w and x labelled s and N
          • create edges in Γ from w to x and from x to v
          • add w to Ui,j+1
  • Re ⇐ ∅
  • A ⇐ Ui,j+1
  • j ⇐ j + 1

SHIFTER
  • Ui+1,0 ⇐ ∅
  • for all s such that ∃v(⟨v, s⟩ ∈ Q) do
      ◦ create in Γ a vertex w labelled s
      ◦ add w to Ui+1,0
      ◦ for all v such that ⟨v, s⟩ ∈ Q do
          • create x labelled ai+1 in Γ
          • create edges in Γ from w to x and from x to v

4.2.6 The non-termination of Algorithm 2

Although Algorithm 2 is defined to work on grammars containing ε-rules, it can fail to terminate when parsing strings with hidden-left recursive grammars. For example, consider the parse of the string ba with Grammar 4.3 and the LR(1) DFA in Figure 4.3.

S0 ::= S
S  ::= BSa | b                                   (4.3)
B  ::= ε

[Figure: the LR(1) DFA has states 0–9. State 0 {S0 ::= ·S, $; S ::= ·BSa, $; S ::= ·b, $; B ::= ·, b} moves on b to state 2 {S ::= b·, $}, on S to the accept state 1 {S0 ::= S·, $}, and on B to state 3 {S ::= B·Sa, $; S ::= ·BSa, a; S ::= ·b, a; B ::= ·, b}. State 3 moves on S to state 4 {S ::= BS·a, $}, which moves on a to state 7 {S ::= BSa·, $}; on b to state 6 {S ::= b·, a}; and on B to state 5 {S ::= B·Sa, a; S ::= ·BSa, a; S ::= ·b, a; B ::= ·, b}. State 5 loops to itself on B, moves on b to state 6, and moves on S to state 8 {S ::= BS·a, a}, which moves on a to state 9 {S ::= BSa·, a}.]

Figure 4.3: The LR(1) DFA for Grammar 4.3.

We begin the parse, as usual, by creating v0 in level U0,0. Since state 0 contains a shift/reduce conflict, we queue the shift and perform the reduction on rule B ::= ε. Since the reduction is nullable we construct a new sub-frontier U0,1 with the new state node, v1, labelled 3, and a path of length two from v1 to v0, via the new intermediate symbol node labelled B.

U0,0: v0(0);  U0,1: v1(3).  Edges: v1 -B-> v0

State 3 contains another shift/reduce conflict, so we queue the shift and perform the reduction B ::= ε once again. Since this reduction is also nullable we create U0,2, v2 and a path of length two from v2 to v1 via the intermediate symbol node B.

v2(5) -B-> v1(3) -B-> v0(0)   (v2 in U0,2)

Processing v2, we find the same shift/reduce conflict in state 5: a shift on b and the nullable reduction on rule B ::= ε. Performing this reduction results in yet another new sub-frontier and another state node labelled 5. This process continues indefinitely, preventing the algorithm from terminating.

v4(5) -B-> v2(5) -B-> v1(3) -B-> v0(0)   (v4 in U0,3)

In the next section we discuss a modification of Tomita’s algorithm that enables the parser to work for all context-free grammars.

4.3 Farshi’s extension of Algorithm 1

The non-termination of Tomita’s Algorithm 2 was first reported by Nozohoor-Farshi. In [NF91], Farshi attributes the problem of Algorithm 2 to the false assumption that only a finite number of nullable reductions can be applied between the shift of two consecutive input symbols. Instead of creating sub-frontiers for nullable reductions, Farshi introduces extra searching when a new edge is added from an existing node in the frontier of the GSS. In this section we describe the operation of Farshi’s correction and highlight the extra costs involved. The formal specification of the algorithm is taken from [NF91]. The layout and notation of the algorithm are similar to Tomita’s, apart from the Reducer function, which is renamed to Completer.

Farshi’s recogniser

Variables
    Γ:  the parse graph
    Ui: the set of state vertices created just before shifting the input word ai+1
    s0: the initial state of the parser
    A:  the set of active nodes on which the parser will act
    Q:  the set of shift operations to be carried out
    R:  the set of reductions to be performed

function PARSE(G, a1, ..., an)
    Γ := ∅
    an+1 := $
    r := FALSE
    create a vertex v0 labelled s0 in Γ
    U0 := {v0}
    for i := 0 to n do PARSEWORD(i)
    return r

function PARSEWORD(i)
    A := Ui
    R := ∅
    Q := ∅
    repeat
        if A ≠ ∅ then ACTOR
        else if R ≠ ∅ then COMPLETER
    until R = ∅ and A = ∅
    SHIFTER

function ACTOR
    remove an element v from A
    for all α ∈ ACTION(STATE(v), ai+1) do
        if α = ‘accept’ then r := TRUE
        if α = ‘shift s’ then add ⟨v, s⟩ to Q
        if α = ‘reduce p’ then
            for all vertices w such that there exists a directed walk of
                    length 2|RHS(p)| from v to w do
                /* for ε-rules this is a trivial walk, i.e. w = v */
                add ⟨w, p⟩ to R

function COMPLETER
    remove an element ⟨w, p⟩ from R
    N := LHS(p)
    s := GOTO(STATE(w), N)
    if there exists u ∈ Ui such that STATE(u) = s then
        if there does not exist a path of length 2 from u to w then
            create a vertex z labelled N in Γ
            create two arcs in Γ from u to z and from z to w
            for all v ∈ (Ui − A) do
                /* in the case of non-ε-grammars this loop executes for v = u only */
                for all q such that ‘reduce q’ ∈ ACTION(STATE(v), ai+1) do
                    for all vertices t such that there exists a directed walk of
                            length 2|RHS(q)| from v to t that goes through vertex z do
                        add ⟨t, q⟩ to R
    else /* i.e., when there does not exist u ∈ Ui such that STATE(u) = s */
        create in Γ two vertices u and z labelled s and N respectively
        create two arcs in Γ from u to z and from z to w
        add u to both A and Ui

function SHIFTER
    Ui+1 := ∅
    repeat
        remove an element ⟨v, s⟩ from Q
        create a vertex x labelled ai+1 in Γ
        create an arc from x to v
        if there exists a vertex u ∈ Ui+1 such that STATE(u) = s then
            create an arc from u to x
        else
            create a vertex u labelled s and an arc from u to x in Γ
            add u to Ui+1
    until Q = ∅

4.3.1 Example – a hidden-left recursive grammar

Consider the parse of the string ba (with Grammar 4.3) that caused Tomita’s Algorithm 2 to fail to terminate in the previous section. We demonstrate how Farshi’s algorithm deals with hidden-left recursive grammars by tracing the construction of the GSS.

We create v0, labelled by the start state of the DFA, in level U0. Since there is a shift/reduce conflict in state 0, we queue the shift and perform the reduction on rule B ::= ε. The reduction is nullable so we create the new state node, v1, labelled 3, in the current level, and a path of length two from v1 to v0 via the new intermediate symbol node B.

U0: v0(0), v1(3).  Edges: v1 -B-> v0

There is another shift/reduce conflict in state 3, so we queue the shift once again and perform the reduction on rule B ::= ε. As a result we create v2 in the current level and another symbol node labelled B. We make this new symbol node a successor of v2 and a predecessor of v1.

U0: v0(0), v1(3), v2(5).  Edges: v1 -B-> v0;  v2 -B-> v1

There is the same shift/reduce conflict in state 5, so we perform the nullable reduction and queue the shift action again. However, because of the loop in state 5 of the DFA, caused by the hidden-left recursive rule S ::= BSa, we create a cyclic path of length two from v2 to itself via a new symbol node labelled B.

U0: v0(0), v1(3), v2(5).  Edges: v1 -B-> v0;  v2 -B-> v1;  v2 -B-> v2 (a cycle)

There are no further reductions that can be performed from any of the state nodes in level U0, so we proceed to carry out the queued shift actions. This results in two new state nodes, v3 and v4, being created in level U1, with paths of length two, via three separate symbol nodes labelled b, back to the appropriate nodes in level U0.

U1: v3(2) -b-> v0(0);  v4(6) -b-> v1(3);  v4 -b-> v2(5)   (the U0 edges are as before)

From this point on Farshi’s algorithm behaves in the same way as Tomita’s Algorithm 2. Once all actions have been performed the GSS shown below is constructed.

U0: v0(0), v1(3), v2(5) with v1 -B-> v0, v2 -B-> v1, v2 -B-> v2;  U1: v3(2) -b-> v0, v4(6) -b-> v1, v4 -b-> v2, v5(4) -S-> v1, v6(8) -S-> v2;  U2: v7(7) -a-> v5, v8(9) -a-> v6, v9(1) -S-> v0

Since v9 is labelled by the accept state of the DFA and we have consumed all the input string, we have successfully recognised ba as being in the language of Grammar 4.3.

4.3.2 Example – a hidden-right recursive grammar

It is claimed that Farshi’s algorithm “...works exactly like the original one [Tomita’s] in case of grammars that have no ε-productions... [and] has no extra costs beyond that of the original algorithm.” [NF91, p.74]. Although it is possible to implement the algorithm so as not to incur the extra searching costs for ε-free grammars, the algorithmic description presented in [NF91] does not include any functionality to achieve it. The only indication that this is possible comes from the comment in the Completer function.

The extra searching costs introduced by Farshi’s modification can be high if done naïvely. The algorithm, as described in [NF91], searches all nodes in the current frontier that are not waiting to be processed, and finds any reduction paths that pass through the new node created as part of the original reduction. This extra searching, introduced to correct Tomita’s error for grammars containing right nullable rules, effectively defeats the core of Tomita’s idea – reduced graph searching. It can be thought of as a brute force way of fixing the problem. To highlight the extra searching introduced by Farshi’s algorithm we demonstrate its operation for the parse of the string aab with Grammar 4.2 from Section 4.2.4.

We begin by creating the start node, v0, in U0. We add v0 to the set A and process it in the Actor. As there is only a shift to state 3 from state 0 on an a, ⟨v0, 3⟩ is added to Q. We then create the state node v1 in U1, the symbol node labelled a, and a path of length two from v1 to v0 via the symbol node. We parse the next two input symbols in this way, creating the two new state nodes v2 and v3.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0)

When we process v3 in the Actor we find the reduction on rule 2, (S ::= b·), in state 2. We trace back a path of length two from v3 to v2 and add ⟨v2, 2⟩ to R. We process the contents of R in the Completer, which results in the new node, v4, labelled 4, being created in U3, with a path of length two to v2 through a new symbol node labelled S.

v3(2) -b-> v2(3) -a-> v1(3) -a-> v0(0);  v4(4) -S-> v2

Processing v4 in the Actor we find an applicable reduction on rule 3, (B ::= ·). Since the reduction is nullable, we do not traverse any edges and add ⟨v4, 3⟩ to the set R. When we process ⟨v4, 3⟩ in the Completer we create the new node, v5, labelled 5, and add it to the set A and U3. We create the symbol node labelled B and a path of length two from v5 to v4 via the new symbol node.

[GSS diagram: as above, plus v5 (state 5) in U3 with a path to v4 via a symbol node labelled B.]

When we process v5 we find the reduction on rule 1, (S ::= aSB·), in state 5. We trace back a path of length six to v1 and add (v1, 1) to R. When we process (v1, 1) in the Completer, we find that there is already a node labelled 4 in the frontier of the GSS. As there is not a path of length two to v1 from v4, we create one via a new symbol node labelled S. However, because we have added a new edge to an existing node of the GSS we need to ensure that no new reduction paths have been introduced from the other nodes in the frontier. This is achieved by effectively traversing all reduction paths from the existing nodes in the frontier and re-performing any reductions that go through the new edge. It is not necessary to re-trace reduction paths from nodes that are still waiting to be processed, as they will be carried out later. However, since there are no nodes in the set A at this point of our example, the set difference U3 − A results in all nodes in U3 being considered. We trace the paths:

• v3, v2 for reduction (S ::= b·);

• v5, v4, v2, v1 for reduction (S ::= aSB·);

• v5, v4, v1, v0 for reduction (S ::= aSB·).

Only the last traversal goes through the new edge between v4 and v1, so we add (v0, 1) to R for the reduction on rule 1.

[GSS diagram: as above, with the new symbol node labelled S on the path from v4 to v1.]

When we perform the reduction in the Completer we create the new node, v6, labelled 1, and the symbol node labelled S with a path from v6 to v0. Since v6 is labelled by the accept state of the DFA and we have consumed all the input string, the parse terminates in success. The final GSS constructed is shown below.

[GSS diagram: the final GSS, including v6 (state 1) with a path to v0 via a symbol node labelled S.]

In practice this searching can trigger considerable extra work; see Chapter 10, where we discuss some experiments. It turns out that it is trivial to improve the efficiency of the algorithm by limiting the searching to paths within the current level. This is possible because the new path created by the reduction has to leave the current frontier: as soon as a path being searched moves to a previous level without passing through the new edge, that particular search can be terminated.

We have implemented both the naïve and optimised versions of Farshi's algorithm and compare them in Chapter 10. In the next section we discuss the construction of derivations by GLR parsers, specifically focusing on Tomita's approach and Rekers' modifications.

4.4 Constructing derivations

The GLR algorithm finds all possible parses of an input string. This is because Tomita was interested in natural language processing, which often presents "temporarily or absolutely ambiguous input sentences" and hence requires an approach that can deal with them. Tomita recognised the problem of ambiguous parsing leading to exponential time requirements. The number of parses of a sentence with an ambiguous grammar may grow exponentially with the size of the sentence [CP82], so even an efficient parsing algorithm would require exponential time just to print the exponential number of possible parse trees. The key is to use an efficient representation of the parse trees. Tomita achieved this through subtree sharing and local ambiguity packing of the parse forest.

In this section we define the structure used by Tomita to represent all derivations of an input string and then discuss his parsing algorithm and Rekers' subsequent extension.
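To make the growth concrete, the following small Python sketch (our illustration, not part of the original development) counts the parse trees of a1 + a2 + . . . + an under the ambiguous rule E ::= E + E | a: the count is the Catalan number Cn−1, which grows exponentially with n.

    # Number of binary bracketings of n operands, i.e. parse trees of
    # a1 + a2 + ... + an for the ambiguous rule E ::= E + E | a.
    def catalan(n: int) -> int:
        c = 1
        for k in range(n):                # C(k+1) = C(k) * 2(2k+1) / (k+2)
            c = c * 2 * (2 * k + 1) // (k + 2)
        return c

    for n in (5, 10, 20):
        print(n, catalan(n - 1))          # 5 -> 14, 10 -> 4862, 20 -> 1767263190

Even a parser that took constant time per tree could not afford to output them all; only a shared representation keeps the cost manageable.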

4.4.1 Shared packed parse forests

Local ambiguity occurs when there is a reduce/reduce conflict in the parse table. This makes it possible to reduce the same substring in more than one way, which manifests itself in the parse tree as two or more nodes with the same label whose leaf nodes represent the same part of the input. A lot of local ambiguity can cause an exponential number of parse trees to be created. However, this can be controlled by combining all parse trees into one structure, called a Shared Packed Parse Forest (SPPF), which takes advantage of the sharing and packing of certain nodes: the parent nodes are merged into a new node and a packing node is made the parent of each of the subtrees. For example, consider the ambiguous Grammar 4.4 (previously encountered on page 29), which defines the syntax of simple arithmetic expressions.

S′ ::= E        (4.4)
E ::= E + E | E ∗ E | a

We can parse the string a + a ∗ a in two different ways, representing the left and right associativity of the ∗ symbol. The two parse trees are shown in Figure 4.4.


Figure 4.4: The two parse trees of the string a + a ∗ a for Grammar 4.4.

The amount of space required to represent both trees can be significantly reduced by using the SPPF shown in Figure 4.5.


Figure 4.5: The SPPF for a + a ∗ a with Grammar 4.4.

If two trees have the same subtree for a substring aj . . . ai then that subtree can be shared, as is shown in Figure 4.6.


Figure 4.6: Subtree sharing.

If two trees have a node labelled by the same non-terminal that derives the same substring, aj . . . ai, in two different ways then that node can be packed and the two subtrees added as alternates under the newly packed node. For example, see Figure 4.7 below.


Figure 4.7: Use of packing nodes to combine different derivations of the same sub- string.
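The two mechanisms can be summarised in a short sketch. The Python below is our own hypothetical illustration (none of the names come from the algorithms in this thesis): a node is identified by its label and the substring it derives, sharing is a lookup on that identity, and packing appends an alternative child sequence to an existing node.

    # Sketch of subtree sharing and local ambiguity packing in an SPPF.
    class SPPFNode:
        def __init__(self, symbol, start, end):
            self.symbol, self.start, self.end = symbol, start, end
            self.packed = []                 # alternative child sequences

        def add_alternative(self, children):
            if children not in self.packed:  # packing: one node, many derivations
                self.packed.append(children)

    class SPPF:
        def __init__(self):
            self.nodes = {}                  # (symbol, start, end) -> node

        def node(self, symbol, start, end, children=()):
            key = (symbol, start, end)       # sharing: same label, same substring
            n = self.nodes.get(key)
            if n is None:
                n = self.nodes[key] = SPPFNode(symbol, start, end)
            if children:
                n.add_alternative(tuple(children))
            return n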

Although the SPPF provides an efficient representation of multiple parse trees it is not the most compact representation possible. It has been shown that maximal sharing can be achieved by "sharing the corresponding prefixes of right hand sides" [BL89]. This approach involves sharing all nodes that are labelled by the same symbol and derive terminals that are lexicographically the same. However, with this representation the yield of the derivation tree may not be the input string. This technique has been successfully adopted by the SGLR parser in the Asf+Sdf Meta-Environment [vdBvDH+01]. A description of the ATerm library, the data structure that efficiently implements the maximally shared SPPF, is given in [vdBdJKO00].

Cyclic grammars

A cyclic grammar contains a nonterminal that can derive itself, A ⇒∗ A. Cyclic grammars can have an infinite number of derivations for certain strings in their language. For example, the parse of the string a with Grammar 4.5 results in the construction of an infinite number of parse trees of the form shown in Figure 4.8.

S′ ::= S        (4.5)
S ::= SS | a | ε


Figure 4.8: Some parse trees for Grammar 4.5 and input string a.

Farshi introduced cycles into the GSS so that his algorithm could be used to parse strings with cyclic grammars. We introduce cycles into the SPPF so that they can be used to represent an infinite number of parse trees. The SPPF representing the parse trees of the above parse is shown in Figure 4.9.


Figure 4.9: The SPPF for Grammar 4.5 and input string a.

4.4.2 Tomita’s Algorithm 4

Tomita's Algorithm 3 is a recogniser, based on Algorithm 2, that incorporates sharing of symbol nodes into the GSS. Tomita extends Algorithm 3 to a parser, Algorithm 4, by introducing the necessary SPPF construction. Recall that Algorithm 2 can fail to terminate on hidden-left recursive grammars; this behaviour is inherited by Algorithm 4. We discuss the operation of Algorithm 4 to illustrate Tomita's approach to SPPF generation.

The GSSs constructed by Tomita's algorithms contain exactly one symbol node between every connected pair of state nodes. Although the symbol nodes do not perform any specific function in the recognition algorithms, they play an important role in Algorithm 4 – they correspond directly to the nodes in the SPPF.

The amount of space required to represent all derivations of an ambiguous sentence is controlled by the sharing and packing of the nodes in the SPPF. Since the GSS symbol nodes have a one-to-one correspondence with the nodes in the SPPF, some of the symbol nodes in the GSS need to be shared. We demonstrate the construction of an SPPF for the string abd whose Grammar 4.1 and associated LR(1) DFA are shown on page 52.

We begin the parse by creating the node v0, labelled by the start state of the DFA. The only applicable action from state 0 is a shift to state 2 for the first symbol on the input string, a. We perform the shift by creating the new state node, v1, and the new SPPF node, w0, labelled a. We use w0 as the symbol node on the path from v1 to v0. To make the example easier to read we draw the SPPF separately from the GSS and only label the GSS symbol nodes with the node of the SPPF.

[GSS and SPPF diagram: v0 (state 0) and v1 (state 2) linked by symbol node w0; SPPF node w0 labelled a.]

We continue the parse by processing v1 and performing the associated shift to state 3. We create the new state node, v2, labelled 3, and the new SPPF node, w1, labelled b, which we use as the symbol node between v1 and v2. From state 3 there is a shift and a reduce action applicable. We queue the shift to state 6 and perform the reduction on rule 3, B ::= b. We begin by tracing back a path of length one from the symbol node w1 to v1. We then create a new state node, v3, labelled 4, and the new SPPF node, w2, labelled B. We make w1 the child of w2 and use w2 as the symbol node on the path between v3 and v1. We process v3 and queue the applicable shift action to state 6, which completes the construction of U2.

[GSS and SPPF diagram: the GSS up to U2 with symbol nodes w0, w1 and w2; SPPF nodes w0 (a), w1 (b) and w2 (B), with w1 the child of w2.]

When we perform the two queued shift actions to state 6, we create a new state node, v4, and a new SPPF node w3, labelled d. We use w3 as the symbol node on the path between both v4 and v2, and v4 and v3.

[GSS and SPPF diagram: as above, plus v4 (state 6) in U3 with paths to v2 and v3 through symbol nodes labelled w3; SPPF node w3 (d).]

From state 6 a reduction on rule 4, C ::= d, is applicable. We trace back the two separate paths of length one from the symbol nodes labelled w3, which lead to v2 and v3 respectively. For the reduction path that leads to v2 we create the new state node v5, labelled 5, and the new SPPF node, w5, labelled C. We make the symbol node, w3, the child of w5 and use w5 as the symbol node between v5 and v2. For the second reduction path we create the state node v6 and another SPPF node w6, also labelled C with w3 as its child.

[GSS and SPPF diagram: as above, plus v5 (state 5) and v6 (state 7) with symbol nodes w5 and w6; SPPF nodes w5 (C) and w6 (C), each with child w3.]

Processing v5 we find that the reduction on rule 1, S ::= abC, is applicable. We trace back a path of length five from w5 to v0 and collect the SPPF nodes w1 and w0 encountered on the traversal. We create the new state node v7, labelled 1, and the new SPPF node, w7, labelled S. We make w0, w1 and w5 the children of w7 and use w7 as the symbol node on the path between v7 and v0.

[GSS and SPPF diagram: as above, plus v7 (state 1) with symbol node w7; SPPF node w7 (S) with children w0, w1 and w5.]

Then we process v6 and find the reduction on rule 2, S ::= aBC, that needs to be performed. We trace back a path of length five from w6 to v0 and collect the SPPF nodes w2 and w0. Because there is already the state node v7, labelled 1, with an edge to v0, we have encountered an ambiguity in the parse. Since the SPPF node w7 that is used as the symbol node on the path between v7 and v0 is labelled S and derives the same portion of input as this reduction, we can use packing nodes below w7 to represent both derivations.

[SPPF diagram: w7 (S) with two packing nodes, one above w0, w1, w5 and one above w0, w2, w6.]

We have parsed all the symbols of the input string and, since v7 is labelled by the accept state of the DFA, the parse terminates in success. Although the parse was successful, the final SPPF is not as compact as possible; there are still several nodes that can be shared. Since the SPPF is encoded into the symbol nodes of the GSS, sharing can only be incorporated into the SPPF if the symbol nodes are shared in the GSS. Recall that two nodes in the SPPF can be shared if they are labelled by the same symbol and have the same subtree below them.

If a symbol node v has a parent node in level Ui and a child node in a level Uj then it derives aj+1 . . . ai of the input string a1 . . . an [SJ00]. Although such SPPF nodes can always be shared, it is not always possible to share the corresponding symbol node in the GSS. For example, if the two symbol nodes labelled A in the GSS below are shared, then two spurious reduction paths will be introduced.


Figure 4.10: A GSS with incorrect symbol node sharing.

Although the increased sharing of symbol nodes reduces the size of the GSS and SPPF, it comes at a cost. Before creating a new symbol node with a parent node v, all symbol nodes directly linked to v must be searched to see if one reaches the same level as the one we want to create. To reduce the amount of searching performed in this way, Tomita only shares symbol nodes that are created during the same reduction; specifically, only symbol nodes that share the same reduction path up to the penultimate node are merged.

We shall now describe Rekers' parser version of Farshi's algorithm, which generates more node sharing in the SPPF. It is this approach that we use in the RNGLR version of Tomita's algorithm described in Chapter 5.

4.4.3 Rekers’ SPPF construction

It is often assumed that one of Rekers' contributions was a correction of Tomita's Algorithm 2. However, upon closer inspection it is clear that it is Farshi's and not Tomita's algorithm that forms the basis of Rekers' parser. Rekers' true contribution is the extension of Farshi's algorithm to a parser and the improved sharing of nodes in the SPPF.

Farshi only provides his algorithm as a recogniser, but he claims it is possible to extend it to a parser "...in a way similar to that of [Tomita]..." [NF91]. However, it is not straightforward to incorporate all the sharing as defined by Algorithm 3. For example, Farshi's algorithm finds the target of a reduction path as soon as a reduction is encountered, which prevents the sharing of the non-terminal symbol nodes.

In order to achieve better sharing of nodes in the SPPF Rekers does not use symbol nodes in the GSS. Instead of having a one-to-one correspondence between the GSS and SPPF, Rekers labels each of the edges in the GSS with a pointer to the associated SPPF node. This enables more nodes in the SPPF to be shared without spurious reduction paths being introduced as a result. To achieve this sharing, it is necessary to remember the SPPF nodes that are constructed at the current step of the algorithm.

Recall that nodes in the SPPF can only be shared if they are labelled by the same symbol and derive (or cover) the same portion of the input string. A naïve approach to ensuring that a given node derives the same portion of the input would involve the traversal of its entire subtree. Rekers presents a more efficient approach that requires the SPPF nodes to be labelled by the start and end position of the string covered. This reduces the cost of the check to a comparison of the start and end positions.

Rekers' SPPF representation contains three types of node: symbol nodes, term nodes and rule nodes. The term nodes are labelled by terminal symbols and form the leaves of the SPPF. The symbol nodes are labelled by non-terminals and have edges to rule nodes. The rule nodes are labelled by grammar rules and are equivalent to Tomita's packing nodes. A major difference to Tomita's representation is that every symbol node has at least one rule node as a child, even if the symbol node only has one set of children. As a result the SPPF produced by Rekers for a non-ambiguous parse is larger than that produced by Tomita.
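The effect of the cover labels can be seen in a short sketch of the sharing test (our Python rendering under assumed names; Rekers' own formulation of Get-Symbolnode is given below). Because every node carries its ⟨start, end⟩ cover, deciding whether two nodes derive the same substring is a comparison of two pairs rather than a subtree traversal.

    # Sketch of Rekers-style symbol node sharing using covers.
    from dataclasses import dataclass, field

    @dataclass
    class SymbolNode:
        symbol: str
        cover: tuple                         # (start, end) of derived substring
        possibilities: list = field(default_factory=list)  # packed rule nodes

    def get_symbol_node(symbolnodes, symbol, cover, rulenode):
        for n in symbolnodes:                # nodes built at the current step
            if n.symbol == symbol and n.cover == cover:
                if rulenode not in n.possibilities:
                    n.possibilities.append(rulenode)   # pack the new derivation
                return n
        n = SymbolNode(symbol, cover, [rulenode])
        symbolnodes.append(n)
        return n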

Rekers’ parser with improved sharing in the SPPF

The formal description of Rekers' parsing algorithm is taken from [Rek92]. Although the core of Rekers' algorithm is similar to Farshi's, it uses slightly different notation. The sets for-actor and for-shifter are used instead of A and Q respectively, and the set R is not used explicitly. Also, the functions Get-Rulenode, Cover, Add-Rulenode and Get-Symbolnode are added to achieve the increased sharing in the SPPF. Note that trees that derive ε do not cover any part of the input string; the Cover function handles this by returning the special value empty for such trees and otherwise taking the start and end positions from the first and last children with non-empty covers.

function Parse(Grammar, a1, ..., an)
    an+1 := $
    global accepting-parser := ∅
    create a stack node p with state START-STATE(Grammar)
    global active-parsers := {p}
    for i := 1 to n + 1 do
        global current-token := ai
        global position := i
        Parseword
    if accepting-parser ≠ ∅ then
        return the tree node of the only link of accepting-parser
    else return ∅

function Parseword
    global for-actor := active-parsers
    global for-shifter := ∅
    global rulenodes := ∅
    global symbolnodes := ∅
    while for-actor ≠ ∅ do
        remove a parser p from for-actor
        Actor(p)
    Shifter

function Actor(p)
    for all action ∈ ACTION(state(p), current-token) do
        if action = (shift state′) then
            add ⟨p, state′⟩ to for-shifter
        else if action = (reduce A ::= α) then
            Do-Reductions(p, A ::= α)
        else if action = accept then
            accepting-parser := p

function Do-Reductions(p, A ::= α)
    for all p′ for which a path of length(α) from p to p′ exists do
        kids := the tree nodes of the links which form the path from p to p′
        Reducer(p′, Goto(state(p′), A), A ::= α, kids)

function Reducer(p−, state, A ::= α, kids)
    rulenode := Get-Rulenode(A ::= α, kids)
    if ∃p ∈ active-parsers with state(p) = state then
        if there already exists a direct link link from p to p− then
            Add-Rulenode(tree node(link), rulenode)
        else
            n := Get-Symbolnode(A, rulenode)
            add a link link from p to p− with tree node n
            for all p′ ∈ (active-parsers − for-actor) do
                for all (reduce rule) ∈ ACTION(state(p′), current-token) do
                    Do-Limited-Reductions(p′, rule, link)
    else
        create a stack node p with state state
        n := Get-Symbolnode(A, rulenode)
        add a link from p to p− with tree node n
        add p to active-parsers
        add p to for-actor

function Do-Limited-Reductions(p, A ::= α, link)
    for all p′ for which a path of length(α) from p to p′ through link exists do
        kids := the tree nodes of the links which form the path from p to p′
        Reducer(p′, Goto(state(p′), A), A ::= α, kids)

function Shifter
    active-parsers := ∅
    create a term node n with token current-token and cover ⟨position, position⟩
    for all ⟨p−, state′⟩ ∈ for-shifter do
        if ∃p ∈ active-parsers with state(p) = state′ then
            add a link from p to p− with tree node n
        else
            create a stack node p with state state′
            add a link from p to p− with tree node n
            add p to active-parsers

function Get-Rulenode(r, kids)
    if ∃n ∈ rulenodes with rule(n) = r and elements(n) = kids then
        return n
    else
        create a rule node n with rule r, elements kids and cover Cover(kids)
        add n to rulenodes
        return n

function Cover(kids)
    if kids = ∅ or ∀ kid ∈ kids: Cover(kid) = empty then
        return empty
    else
        begin := the start position of the first kid with a non-empty cover
        end := the end position of the last kid with a non-empty cover
        return ⟨begin, end⟩

function Add-Rulenode(symbolnode, rulenode)
    if rulenode ∉ the possibilities of symbolnode then
        add rulenode to the possibilities of symbolnode

function Get-Symbolnode(s, rulenode)
    if ∃n ∈ symbolnodes with symbol(n) = s and Cover(n) = Cover(rulenode) then
        Add-Rulenode(n, rulenode)
        return n
    else
        create a symbol node n with symbol s, possibilities {rulenode} and cover Cover(rulenode)
        add n to symbolnodes
        return n

We demonstrate the operation of Rekers’ algorithm on the parse of the string abd whose Grammar 4.1 and associated LR(1) DFA are shown on page 52.

To begin the parse we create the new state node v0, labelled by the start state of the DFA, and add it to active-parsers. We set the current-token to a and the position to 1. Then we execute the Parseword function, which leads us to copy v0 into the for-actor and process it in the Actor function. Since there is a shift to state 2 from state 0 in the DFA we add ⟨v0, 2⟩ to for-shifter. We then perform the shift action in the Shifter, which results in a new SPPF node w0, labelled a ⟨1, 1⟩, being created and used to label the edge between the new node v1 and v0.

[GSS and SPPF diagram: v0 (state 0) and v1 (state 2) linked by an edge labelled w0; SPPF node w0, a ⟨1, 1⟩.]

We continue by setting the current-token to b and the position to 2 and then follow the same process as before to shift the input symbol b.

[GSS and SPPF diagram: as above, plus v2 (state 3) with an edge labelled w1; SPPF node w1, b ⟨2, 2⟩.]

When we come to process v2 in the Actor we find a shift/reduce conflict associated with state 3 on the current lookahead symbol d. We add ⟨v2, 6⟩ to for-shifter, but before applying the shift, we perform the reduction on rule B ::= b by executing the Do-Reductions function.

We trace a path of length one from v2 to v1, collecting the SPPF node that labels the edge traversed, and then execute the Reducer function. In the Reducer we use the Get-Rulenode function to find any existing rule nodes in the SPPF that derive the same portion of the input string. Since there is none, the new rule node labelled B ::= b ⟨2, 2⟩ is created. We continue by creating the new GSS node v3 labelled 4. Before we create a new symbol node in the SPPF we execute the Get-Symbolnode function to ensure that a node that can be shared does not already exist. No node is found so we create the new node w2 and use it to label the edge between v3 and v1.

To ensure that the new node v3 is processed we add it to active-parsers and for-actor.

[GSS and SPPF diagram: as above, plus v3 (state 4) with an edge to v1 labelled w2; SPPF node w2, B ⟨2, 2⟩, above the rule node B ::= b ⟨2, 2⟩.]

Processing v3 in the Actor function we find another shift action to state 6. We add ⟨v3, 6⟩ to for-shifter and then proceed to perform the two queued shifts in the Shifter function. This results in the creation of the new GSS node v4, labelled 6, with two edges back to v2 and v3 labelled with a pointer to the new SPPF node w3.

[GSS and SPPF diagram: as above, plus v4 (state 6) with edges to v2 and v3 labelled w3; SPPF node w3, d ⟨3, 3⟩.]

Processing v4 in the Actor function we find a reduction on rule C ::= d. There are two different paths of length one that can be traced back from v4 in the Do-Reductions function. For each path we collect the SPPF node that labels the edge traversed and execute the Reducer function. In the first execution of the Reducer we create a new rule node labelled C ::= d ⟨3, 3⟩ using the Get-Rulenode function and then create the new GSS node, v5, labelled 5. Since there does not already exist an SPPF symbol node labelled C ⟨3, 3⟩ we create w4 and use a pointer to it to label the edge between v5 and v2. We also add v5 to the active-parsers and for-actor sets.

In the second execution of the Reducer we find the rule node labelled C ::= d ⟨3, 3⟩ that has the same set of children as the reduction we are performing. We create the new GSS node v6, labelled 7, and pass the existing rule node into the Get-Symbolnode function. However, since there is already the symbol node w4 that is labelled by the same non-terminal and derives the same portion of the input, we do not create a new node. We use a pointer to w4 to label the new edge between v6 and v3.

[GSS and SPPF diagram: as above, plus v5 (state 5) and v6 (state 7), both with edges labelled by the shared node w4, C ⟨3, 3⟩, above the rule node C ::= d ⟨3, 3⟩.]

At this point the for-actor contains the two GSS nodes v5 and v6. Processing the first of the nodes we find the reduction on rule S ::= abC. We trace back a path of length three from v5 and collect the SPPF nodes w4, w1, w0 that label the edges traversed. We then create a new rule node labelled S ::= abC ⟨1, 3⟩ and make it the parent of the nodes previously collected. We create the new node v7, labelled 1, and the new symbol node w5, labelled S ⟨1, 3⟩, as a parent of the new rule node created. We use a pointer to w5 to label the edge between v7 and v0 and add v7 to the active-parsers and for-actor sets.

[GSS and SPPF diagram: as above, plus v7 (state 1) with an edge to v0 labelled w5; SPPF node w5, S ⟨1, 3⟩, above the rule node S ::= abC ⟨1, 3⟩.]

When we process v6 we find another reduction on rule S ::= aBC. We trace back a path of length three to v0 and collect the SPPF nodes w0, w2, w4 that label the edges traversed. We create the new rule node labelled S ::= aBC ⟨1, 3⟩ and make it the parent of the previously collected SPPF nodes. However, we do not create a new state node in the GSS since v7 is labelled by the goto state of the current reduction.

The existence of an edge from v7 to v0 indicates an ambiguity in the parse. As a result we add the rule node created for this reduction as a child of w5.

[SPPF diagram: the final SPPF rooted at w5, S ⟨1, 3⟩, with the rule nodes S ::= aBC ⟨1, 3⟩ and S ::= abC ⟨1, 3⟩ as its possibilities.]

The only action associated with the state that labels v7 is the accept action. Since all the input has been consumed, the parse terminates in success and returns w5 as the root of the SPPF.

It is clear that Rekers' SPPF construction produces more compact SPPFs than Tomita's Algorithm 4. This is because Rekers' algorithm is able to share more symbol nodes in the GSS than Tomita's algorithm. However, to achieve this extra sharing more searching is required, which causes a significant overhead in the parsing process.

4.5 Summary

In this chapter we have introduced Tomita's GLR parsing technique. The recognition Algorithms 1 and 2 were discussed in detail. We demonstrated the construction of the GSS using Algorithm 1, and illustrated how a straightforward extension to deal with grammars containing ε-rules fails to parse certain grammars correctly. The operation of Algorithm 2 was then examined and its non-termination for hidden-left recursive grammars was demonstrated. The extension of Algorithm 1 due to Farshi was discussed in detail and the extra costs involved were highlighted.

Finally, Tomita's Algorithm 4, which constructs an SPPF representation of multiple derivation trees, was presented, and Rekers' extension of Farshi's algorithm was introduced.

Chapter 10 presents the experimental results of a naïve and an optimised implementation of Farshi's recogniser for grammars which trigger worst case behaviour and for several programming language grammars and strings.

The next chapter presents the RNGLR algorithm that correctly parses all context-free grammars using a modified LR parse table. The RNGLR parser incorporates some of Rekers' sharing into the SPPF, but reduces the amount of searching required.

Part II

Improving Generalised LR parsers

Chapter 5

Right Nulled Generalised LR parsing

Tomita's Algorithm 1 can fail to terminate when parsing strings in the language of grammars with hidden-right recursion and Algorithm 2 can fail on grammars with hidden-left recursion. In Chapter 4 we considered Farshi's extension to Tomita's Algorithm 1 that increases the searching required during the construction of the GSS, but which is able to parse all context-free grammars. This chapter presents an algorithm capable of parsing all context-free grammars without the increase in searching costs.

It is clear that Tomita was aware of the problems associated with Algorithm 1 as he restricted its use to ε-free grammars. It is not unusual to exclude ε-rules from parsing algorithms as they often cause difficulties; for instance the CYK algorithm requires grammars to be in Chomsky Normal Form. One of the reasons for using ε in grammars is that many languages can be defined naturally using ε-rules, while the ε-free alternatives are often less compact and not as intuitive. Well known algorithms exist [AU73] that can remove ε-rules from grammars, but parsers that depend on this technique build parse trees related to the modified grammar. Another approach [NS96] that performs automatic removal of ε-rules during a parse is discussed in Chapter 8.

This chapter presents a modification to the LR DFAs that allows Tomita's Algorithm 1 to parse all context-free grammars, including those with ε-rules. We discuss Algorithm 1e on ε-rules in detail and examine some grammars that cause the parser to fail. A modification to the parse table is then given that causes the algorithm to work correctly. The RNGLR algorithm, which parses strings using the modified parse table more efficiently than Algorithm 1e, is then presented, first as a recogniser and then as a parser. Finally, we discuss ways to reduce the extra non-determinism introduced by the modification made to the parse table.

5.1 Tomita's Algorithm 1e

Algorithm 1 provides a clear exposition of Tomita's ideas. Unfortunately it cannot be used to parse natural language grammars as they often contain ε-rules. Tomita's Algorithm 2 includes a complicated procedure for dealing with ε-rules which fails to terminate on grammars containing hidden-left recursion. The algorithm below, which we have called Algorithm 1e, is a straightforward extension of Algorithm 1 to include ε-rules; it works correctly for hidden-left recursive grammars, but may fail on grammars with hidden-right recursion. We begin this section by describing the modifications made to Algorithm 1 to allow ε-rules to be handled.

It is straightforward to modify Tomita's Algorithm 1 so that it can handle grammars containing ε-rules. The two main changes that need to be made are in the Actor and Reducer functions. When an ε reduce action rX from a state v is found by the Actor, (v, X) is added to R instead of (u, X), where the u are the successors of v. Since ε reductions do not require any states to be popped off the stack, no reduction paths are traced in the Reducer.

It has already been shown, in Chapter 4, that it is possible to add a new edge to an existing state in the current level of the GSS. In this case the reductions from the existing state need to be applied down the newly created reduction path. However, not all reductions need to be re-done. Since ε reductions do not pop any states off the stack, no new reduction path is added by the addition of the new edge. Because we store all pending reductions in the set R, and all ε reductions are added to it when a new state is created, we can safely assume that if the ε reductions have not already been done, they eventually will be. Therefore no ε reductions are added to R when a new edge is added to an existing state. This modification was added to Tomita's Algorithm 3, and since we have extended Algorithm 1 to allow ε grammars to be parsed we have included it in Algorithm 1e as well.

Tomita's original algorithms build GSSs with symbol nodes between state nodes. Algorithm 1e does not include the symbol nodes, but in our diagrams we shall label the edges to make the GSSs easier to visualise. As we shall see in Section 5.5 this approach also allows more compact SPPFs to be generated.

To allow a closer comparison to the algorithms we introduce later we have changed the layout of Algorithm 1e and the format of the actions in the parse table: instead of using sk and gk to represent the shift and goto actions, we use pk for both; instead of using rj to represent a reduction on rule j, we use r(X, m) where X is the left hand non-terminal of rule j and m is the length of j's right hand side.

Algorithm 1e

input data: start state SS, accept state SA, parse table T, input string a1...an

function Parser
    create a node v0 labelled SS
    U0 = {v0}, A = ∅, R = ∅, Q = ∅, U1 = ∅, ..., Un = ∅
    for i = 0 to n do
        A = Ui
        while A ≠ ∅ or R ≠ ∅ do
            if A ≠ ∅ then Actor(i) else Reducer(i)
        Shifter(i)
    if SA ∈ Un then set result = success else set result = failure
    return result

function Actor(i)
    remove v from A
    let h be the label of v
    if pk ∈ T(h, ai+1) then add (v, k) to Q
    for each r(A, k) ∈ T(h, ai+1) do
        if k = 0 then add (v, A, k) to R
        else for each successor node u of v do add (u, A, k) to R

function Reducer(i)
    remove (v, X, m) from R
    if m = 0 then
        let k be the label of v and let pl ∈ T(k, X)
        if there is no node w ∈ Ui labelled l then create one and add it to Ui and A
        if there is not a path of length 1 from w to v then
            create an edge from w to v
            if w ∉ A then
                for all r(B, t) ∈ T(l, ai+1) with t ≠ 0 do add (v, B, t) to R
    else
        for each node u that can be reached from v along a path of length (m − 1) do
            let k be the label of u and let pl ∈ T(k, X)
            if there is no node w ∈ Ui labelled l then create one and add it to Ui and A
            if there is not a path of length 1 from w to u then
                create an edge from w to u labelled X
                if w ∉ A then
                    for all r(B, t) ∈ T(l, ai+1) with t ≠ 0 do add (u, B, t) to R

function Shifter(i)
    while Q ≠ ∅ do
        remove (v, k) from Q
        if there is no node w ∈ Ui+1 labelled k then create one
        if there is not an edge labelled ai+1 from w to v then create one

5.1.1 Example – a hidden-left recursive grammar

We revisit Example 4.2.6, which causes Algorithm 2 to fail to terminate, and show the operation of Algorithm 1e. We repeat Grammar 4.3 and use the LR(1) parse table below.

S′ ::= S
S ::= BSa | b
B ::= ε

        a        b            $        B    S
    0            p2/r(B,0)             p3   p1
    1                         acc
    2                         r(S,1)
    3            p6/r(B,0)             p5   p4
    4   p7
    5            p6/r(B,0)             p5   p8
    6   r(S,1)
    7                         r(S,3)
    8   p9
    9   r(S,3)

We shall use Algorithm 1e to construct the GSS for the string ba. The start state v0 labelled 0 is created in U0, the reduction (v0, 3) is added to the set R and (v0, 2) to the set Q. Then (v0, 3) is removed from R, which causes state v1 labelled 3 and an edge (v1, v0) labelled B to be created and (v1, 3) to be added to R and (v1, 6) to Q. When (v1, 3) is removed, state v2 labelled 5 and an edge (v2, v1) labelled B are created and (v2, 3) is added to R and (v2, 6) to Q. Removing (v2, 3) causes a cyclic edge, also labelled B, to be created on state 5. The rightmost GSS in Figure 5.1, whose nodes have children within the same level, demonstrates the effect of ε-rules in the grammar. Notice the creation of the cycle on node v2 that causes the algorithm to terminate in the case of hidden-left recursion.


Figure 5.1: The GSS for Grammar 4.3 and input ba during the reductions in U0.

At this point no other elements are in the sets A or R so we enter the Shifter which results in the following GSS being constructed.

[GSS diagram: v3 (state 2) and v4 (state 6) in U1, with edges labelled b back to v0, v1 and v2.]

When the Shifter has completed, the set A contains the elements {v3, v4}. We remove v3 and do nothing as there are no actions in T (2, a). When v4 is removed and processed, (v1, 2) and (v2, 2) are added to R because of the reduction in T (6, a).

When (v1, 2) is processed by the Reducer, state v5 labelled 4 and an edge (v5, v1) labelled S are created in U1, and (v5, 7) is added to Q. Then (v2, 2) is removed from R, which results in a new state v6 labelled 8 being created, with an edge (v6, v2) labelled S. The element (v6, 9) is also added to the set Q.

[GSS diagrams: the GSS after adding v5 (state 4) with an S-labelled edge to v1; then after also adding v6 (state 8) with an S-labelled edge to v2.]

94 As there are no other elements in A or R the Shifter is entered and the new states v7 and v8 labelled 7 and 9 are created in U2 with edges (v7, v5) and (v8, v6) respectively.

[GSS diagram: as above, plus v7 (state 7) and v8 (state 9) in U2 with edges labelled a to v5 and v6 respectively.]

As the set A is now {v7, v8} the Actor removes and processes state v7, which causes it to add (v5, 1) to R. State v8 is then removed, but it has no effect as there are no actions in T(9, $). When (v5, 1) is processed by the Reducer a path of length 2 is traced back in the GSS to state v0. The goto state in T(0, S) is 1, so the state v9 labelled 1 with an edge (v9, v0) is created in U2 and also added to the set A. The Actor then removes v9 and looks in T(1, $) to find the accept state, which indicates that the parse succeeded and that the string ba is in the language of Grammar 4.3. Figure 5.2 shows the final GSS constructed by Algorithm 1e.


Figure 5.2: The final GSS for Grammar 4.3 and input ba.

5.1.2 Paths in the GSS

Algorithm 1 ensures that a reduction is only applied down a given reduction path once. This is achieved by queueing the first edge of the path and the associated rule of the reduction in the set R when a new edge is created in the GSS. If a new edge is added to an existing node, any applicable reductions are applied down this new reduction path. For this approach to be successful it is essential that a new edge cannot be added to the middle of an existing reduction path. This is guaranteed by Algorithm 1 since all new edges created have their source node in the frontier and their target in a previous level of the GSS. However, the modification of Algorithm 1e to deal with ε-rules allows edges to be created whose source and target nodes are in the same level. As a result, certain grammars can cause a new edge to be added to the middle of an existing reduction path. We illustrate this with the following example.

Example – an incorrect parse

Consider Grammar 5.1 and the associated LR(1) DFA in Figure 5.3 (previously encountered on page 61).

S′ ::= S        (5.1)
S ::= aSB | b
B ::= ε


Figure 5.3: The LR(1) DFA for Grammar 5.1.

        a    b    $        B    S
    0   p3   p2                 p1
    1             acc
    2             r(S,1)
    3   p3   p2                 p4
    4             r(B,0)   p5
    5             r(S,3)

Table 5.1: The LR(1) parse table for Grammar 5.1.

We shall use Algorithm 1e to construct the GSS for the string aab. We begin by creating the start state, v0, in U0 and look up the actions in T(0, a). As there is only a shift to state 3, (v0, 3) is added to Q and ultimately results in state v1, and the edge (v1, v0), being created in U1. The Actor then processes v1 and adds (v1, 3) to Q. This results in a new state labelled 3 being created in U2 and the edge (v2, v1) being added to the GSS. After processing v2 the state v3 labelled 2 and the edge (v3, v2) are created.

[GSS diagram: the chain v0 (state 0), v1 (state 3), v2 (state 3), v3 (state 2) with edges labelled a, a, b.]

When the Actor processes v3 it finds a reduce action in T(2, $) which it adds to the set R as (v2, 2). The Reducer then removes (v2, 2) and creates the state v4 labelled 4 and the edge (v4, v2). Processing v4 results in (v4, 3) being added to R, which leads to the new state v5 labelled 5 and the edge (v5, v4) labelled B being added to U3.

[GSS diagrams: the chain above with v4 (state 4) linked to v2 by an edge labelled S; then also with v5 (state 5) linked to v4 by an edge labelled B.]

When the new state v5 is processed a reduce action is found in T(5, $), which is added to R as (v4, 1). This reduction is traced back to v1, whose goto state is 4. As a state labelled 4 already exists in U3, a new edge (v4, v1) labelled S is added to the GSS. Because a new edge has been added to an existing node, the reduction from v4 is re-done, but as there is already a state labelled 5 and an edge (v5, v4) in the current level, nothing is done.

At this point all the sets are empty so U3 is searched for the accept state, which is labelled 1. As no state labelled 1 exists in U3, Algorithm 1e returns failure as the result of the parse even though aab is in the language of Grammar 4.2.

[GSS diagram: the final GSS, with the additional edge from v4 to v1 labelled S; no node labelled 1 exists in U3.]

It turns out that some grammars with right nullable rules (rules of the form A ::= αβ where β ⇒* ε) can be successfully parsed by Algorithm 1e when the ordering of reductions is carefully chosen. However, grammars such as 4.2 that contain hidden-right recursion will always fail to parse some sentences in their language [SJ00]. The next section presents a modification that can be made to the parse table, which will enable Algorithm 1e to correctly parse all context-free grammars.

5.2 Right Nulled parse tables

In order to allow Algorithm 1e to correctly parse all context-free grammars, a slightly modified parse table is built from the standard DFA in the following way. In addition to the standard reductions, we add reductions on right nullable rules, that is, rules of the form A ::= αβ where β ⇒* ε.

If the DFA has a transition from state h to state k on the symbol a then pk is added to T(h, a) instead of sk or gk, according as a is a terminal or non-terminal. If state h includes an item of the form (A ::= x1 . . . xm · B1 . . . Bt, a), where A ≠ S′ and either t = 0 or Bj ⇒* ε for all 1 ≤ j ≤ t, then r(A, m) is added to T(h, a). If state h contains an item (S′ ::= S·, $) then acc is added to T(h, $), and if S′ ⇒* ε then acc is also added to T(0, $). We call this type of parse table a Right Nulled (RN) parse table.

        a    b    $               B    S
    0   p3   p2                        p1
    1             acc
    2             r(S,1)
    3   p3   p2                        p4
    4             r(B,0)/r(S,2)   p5
    5             r(S,3)

Table 5.2: The RN parse table for Grammar 5.1.
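The construction can be phrased directly over the DFA's item sets. The sketch below is our own Python encoding (the item representation is an assumption, not the thesis' notation): an item contributes a reduction exactly when everything after its dot is nullable.

    # Sketch: RN reduce actions for one DFA state. An item is encoded as
    # (lhs, rhs, dot, lookahead); r(A, m) is added when A != S' and every
    # symbol after the dot is nullable (including the case of none at all).
    def rn_reductions(items, nullable):
        actions = set()
        for lhs, rhs, dot, a in items:
            if lhs != "S'" and all(x in nullable for x in rhs[dot:]):
                actions.add((a, ("r", lhs, dot)))   # m = dot symbols consumed
        return actions

    # State 4 of the DFA in Figure 5.3 holds (S ::= aS.B, $) and (B ::= ., $):
    state4 = [("S", ("a", "S", "B"), 2, "$"), ("B", (), 0, "$")]
    print(rn_reductions(state4, nullable={"B"}))
    # both r(S, 2) and r(B, 0) appear under $ - the extra conflict in Table 5.2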

Our approach takes advantage of the fact that no input is consumed when an ε is parsed. By performing the part of a right nullable reduction that does derive a portion of the input string, we ensure that no new edges can be added to the middle of an existing reduction path. Clearly this prevents a reduction path being missed by a state in the sequence of nullable reductions. One might think that, because nullable reductions do not consume any input, it is safe not to apply them at all. Unfortunately this can cause problems, which are discussed in detail in Section 5.3.

We note here that other parsing optimisations have been proposed, such as [AH02], that require ε reductions to be taken account of at parser generation time: if an item A ::= α · Bβ ∈ Ui and B ⇒* ε then the item A ::= αB · β is added to Ui. This is not the same as the right nullable reductions in RN parse tables.

Although the RN parse table enables Algorithm 1e to work correctly for all context-free grammars (see [SJ00] for a proof of correctness), it contains more conflicts than its LR(1) counterpart. The next section presents a method to remove many of the conflicts introduced by the addition of right nullable reductions.

5.3 Reducing non-determinism

An undesirable side effect caused by the inclusion of the right nulled reductions in the RN parse table is a possible increase in non-determinism. As more reductions are added, it is possible that more reduce/reduce conflicts will occur (see Chapter 10). Since our technique for correctly parsing hidden-right recursive grammars involves performing nullable reductions at the earliest point possible, one may hypothesise that the short-circuited ε reductions can be removed from the table to reduce the non-determinism. Unfortunately this is not always possible, because ε reductions may also contribute to other useful derivations that do consume some input symbols. For example, consider Grammar 5.2, the corresponding DFA in Figure 5.4 and the RN parse table 5.3.

S′ ::= S        (5.2)
S ::= Ad
A ::= aAB | aABd | b
B ::= ε


Figure 5.4: The LR(1) DFA for Grammar 5.2.

        a    b    d                $       A    B    S
    0   p3   p4                            p2        p1
    1                              acc
    2             p5
    3   p3   p4                            p6
    4             r(A,1)
    5                              r(S,2)
    6             r(B,0)/r(A,2)                 p7
    7             p8/r(A,3)
    8             r(A,4)

Table 5.3: The RN parse table for Grammar 5.2.

The LR(1) parse table is already non-deterministic, but the number of conflicts is increased when the RN reduction r(A, 2) is added to state 6. This will result in an increase in the amount of graph searching performed for certain parses, such as for the string aabd. Although the extra searching can be avoided by removing the ε reduction r(B, 0) from state 6, there are some strings that Algorithm 1e would then incorrectly fail to parse.

For example, consider the parse of the string aabdd using parse table 5.3 and Algorithm 1e. Once we have parsed the first three input symbols and performed the reduction r(A, 1) from state v3 we have the following GSS.

[GSS diagram: the chain v0 (state 0), v1 (state 3), v2 (state 3), v3 (state 4) with edges labelled a, a, b, and v4 (state 6) linked to v2 by an edge labelled A.]

Performing the RN reduction r(A, 2) from state v4 causes the new edge between v4 and v1 to be created. This new edge introduces a new reduction path from v4 to v0, so we create v5 and add an edge between it and v0.

[GSS diagram: as above, with the new edge from v4 to v1 labelled A, and v5 (state 2) linked to v0 by an edge labelled A.]

From state 2 there is a shift to state 5, so we create the new state node v7 and the edge between v7 and v5.

[GSS diagram: as above, plus v7 (state 5) linked to v5 by an edge labelled d.]

By not performing the ε reduction from state 6 we avoid some redundant graph searching, but for this parse we also incorrectly reject a string in the language of Grammar 5.2.

Although it is not generally possible to remove ε reductions from the parse table, it is possible to identify points in the GSS construction when their application is redundant. The next section presents a general parsing algorithm, based on Algorithm 1e, that incorporates this, allowing right nullable grammars to be parsed more efficiently. In Section 5.6 we discuss an approach that eliminates redundant reductions without compromising the correctness of the underlying parser for RN parse tables of LR grammars.

5.4 The RNGLR recognition algorithm

This section presents the Right Nulled GLR (RNGLR) recogniser, which correctly parses all context-free grammars with the use of an RN parse table [SJ00]. A description of the algorithm is given and a discussion highlighting the differences between RNGLR and Algorithm 1e is undertaken.

RNGLR recogniser

input data: start state SS, accept state SA, RN table T, input string a1...an

function Parser
    set result = failure
    if n = 0 then
        if acc ∈ T(SS, $) then set result = success
    else
        create a node v0 labelled SS
        set U0 = {v0}, R = ∅, Q = ∅, an+1 = $, U1 = ∅, ..., Un = ∅
        if pk ∈ T(SS, a1) then add (v0, k) to Q
        for all r(X, 0) ∈ T(SS, a1) do add (v0, X, 0) to R
        for i = 0 to n do
            if Ui ≠ ∅ then
                while R ≠ ∅ do Reducer(i)
                Shifter(i)
        if SA ∈ Un then set result = success
    return result

function Reducer(i)
    remove (v, X, m) from R
    find the set χ of nodes which can be reached from v along a path of length (m − 1), or of length 0 if m = 0
    for each node u ∈ χ do
        let k be the label of u and let pl ∈ T(k, X)
        if there is a node w ∈ Ui with label l then
            if there is not an edge from w to u then
                create an edge from w to u
                if m ≠ 0 then
                    for all r(B, t) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t) to R
        else
            create a node w ∈ Ui labelled l and an edge from w to u
            if ph ∈ T(l, ai+1) then add (w, h) to Q
            for all r(B, 0) ∈ T(l, ai+1) do add (w, B, 0) to R
            if m ≠ 0 then
                for all r(B, t) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t) to R

function Shifter(i)
    if i ≠ n then
        set Q′ = ∅
        while Q ≠ ∅ do
            remove an element (v, k) from Q
            if there is a node w ∈ Ui+1 with label k then
                create an edge from w to v
                for all r(B, t) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t) to R
            else
                create a node w ∈ Ui+1 labelled k and an edge from w to v
                if ph ∈ T(k, ai+2) then add (w, h) to Q′
                for all r(B, 0) ∈ T(k, ai+2) do add (w, B, 0) to R
                for all r(B, t) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t) to R
        copy Q′ into Q

The RNGLR algorithm looks very similar to Algorithm 1e but includes some subtle changes that make a big difference to the efficiency of the algorithm. One of the major differences in appearance between the two algorithms is the lack of the Actor function in RNGLR. When a new node is created in the GSS by Algorithm 1e, the Actor is used to perform the parse table lookup to retrieve the actions associated with the state that labels the new node. If a reduction of length > 0 is found then the Actor finds the new node's successors and adds the triple ⟨v, x, p⟩ to the set R. The reason for using the successors of a new node is to ensure that when a reduction is done, it is performed down the same path at most once. However, the successor nodes are already known when a new node is created by the Shifter or Reducer; by waiting to process a new node in the Actor, this information is lost and the algorithm needs to perform an unnecessary search for every node created. In comparison, the RNGLR algorithm adds reductions to the set R when a new edge is created between two nodes. This results in the RNGLR algorithm performing one edge traversal less than Algorithm 1e for every reduction added to R when a new node is created.

The RNGLR algorithm is optimised to exploit the features of an RN parse table. It takes advantage of the RN reductions to parse grammars with right nullable rules more efficiently than Algorithm 1e. Although one might expect the algorithm to perform more reductions, and hence more edge visits, as a result of the increased number of conflicts in the RN table, it turns out that fewer edge visits are performed because, as we now discuss, reductions are not performed in certain cases.

If β ⇒* ε, the RN parse table causes Algorithm 1e to do a reduction for a rule A ::= α · β after it shifts the final symbol of α. So |β| fewer edge visits are done for the reduction than for the rule A ::= αβ·. However, the RN table also includes |β| extra reductions, one for each nullable non-terminal in β, which Algorithm 1e also performs, more than eliminating the initial saving that was made. To prevent this from happening the RNGLR algorithm only adds new reductions to R if the length of the reduction that results in the new reduction path being created is greater than 0. The reasoning behind this is that if a new edge is created as part of a reduction of length 0, all reductions of length greater than 0 will have already been done by a previous RN reduction. For proofs of the correctness of this approach see [SJ00].

For example, consider the RNGLR parse of the string aab, with Grammar 4.2 and the RN parse table shown in Table 5.2. We begin by creating the state node v0, labelled by the start state 0, in U0. Since the only action in T(0, a) is a shift to state 3, we add (v0, 3) to Q and proceed to execute the Shifter. We create the new node v1, labelled 3, in U1 and look up its associated actions in T(3, a). There is only a shift action to state 3, so (v1, 3) is first added to Q′ and then to Q.

[GSS diagram: v0 (state 0) and v1 (state 3) linked by an edge labelled a.]

We continue the parse by shifting the next two input symbols and creating the nodes v2 and v3 in U2 and U3 respectively. Upon the creation of v3 in the Shifter we encounter a reduction r(S, 1) in T(2, $) and add (v2, S, 1) to R.

[GSS diagram: the chain v0 (state 0), v1 (state 3), v2 (state 3), v3 (state 2) with edges labelled a, a, b.]

Processing (v2, S, 1) in the Reducer we find the shift to state 4 in T(3, $). Since there is no node labelled 4 in U3 we create v4 and a new edge from v4 to v2. There is a reduce/reduce conflict in T(4, $) so we add (v4, B, 0) and (v2, S, 2) to the set R.

[GSS diagram: as above, plus v4 (state 4) with an edge to v2 labelled S.]

When (v4, B, 0) is processed by the Reducer the new node v5, labelled 5, and the edge from v5 to v4 are created. Although there is a reduction in T(5, $) it is not added to R. This is a feature of the RNGLR algorithm that prevents redundant reductions from being performed: since the right nullable reduction r(S, 2) was added to T(4, $), the reduction that would normally be performed after all nullable non-terminals have been reduced was performed early.

We continue the parse by processing (v2, S, 2), tracing back a path of length one from v2 to v1. Since there is already a node labelled 4 in U3, the new edge between v4 and v1 is created. The new edge creates a new reduction path down which the reduction r(S, 2) in state 4 can be applied, so (v1, S, 2) is added to R.

[GSS diagrams: as above, plus v5 (state 5) with an edge to v4 labelled B; then with the additional edge from v4 to v1 labelled S.]

Processing (v1, S, 2) we trace back a path of length one from v1 to v0 which results in the new state v6 labelled 1 and the edge (v6, v0) being added to the GSS. All of the sets are now empty and since U3 contains a state labelled by the DFA’s accept state, the parse succeeds.

[GSS diagram: the final GSS, including v6 (state 1) with an edge to v0 labelled S.]

104 5.5 The RNGLR parsing algorithm

The parser version of the RNGLR algorithm is a straightforward extension of the recogniser described in the previous section. Chapter 4 presented both Tomita's and Rekers' approaches to the construction of the SPPF. This section provides a brief overview of the main points of that chapter and a discussion of the approach taken by the RNGLR parser.

There are three conflicting goals associated with the construction of an SPPF: building a compact structure; minimising the amount of time taken to build the structure; and ensuring that for ambiguous sentences one derivation tree can be efficiently extracted. Tomita focused on the efficiency of the construction, only implementing a minimal amount of sharing and packing, thereby increasing the space required for the SPPF. Rekers extended Tomita's approach by increasing the amount of SPPF node sharing at the cost of introducing more searching to the algorithm. The RNGLR parser implements Rekers' approach in conjunction with several techniques designed to reduce the amount of searching required by the algorithm.

Rekers splits the SPPF nodes into two categories: nodes for the non-terminals, called rule nodes, and nodes for the terminals, called symbol nodes. In order to achieve the sharing in the SPPF the rule and symbol nodes created for the current level are stored in two distinct sets. Before either type of node is created, the required set is searched for an existing node that covers the same part of the input. To do this without having to inspect all the subtrees of a node, the start and end positions of the input that the particular node derives are also stored with the node. A leaf node created for the input symbol at position i is labelled (a, i, i). A rule node is labelled in a similar way, but takes the first value from its leftmost child and the second value from its rightmost child. So for the input a1 . . . ad, a rule node labelled (X, j, i) means that X ⇒* aj . . . ai. If X ⇒* ε then i, j = 0.

The RNGLR parser implements Rekers' approach for the construction of the SPPF, as opposed to Tomita's, because of the more compact SPPF built. Although both the RNGLR and Rekers' algorithms build the same SPPF they do so in slightly different ways. Instead of storing the SPPF nodes for the terminals and non-terminals separately, the RNGLR algorithm uses one set called N which is reset after each iteration in Parser. In addition, because we always know the current level being constructed, only the start position of the string derived is included in an SPPF node. So instead of labelling an SPPF node with a triple (X, j, i), it is labelled with the pair (X, j) and stored in N.

The RNGLR algorithm only works when used in conjunction with an RN parse table. When a grammar containing right nullable rules is parsed, the right nullable reductions will be done without tracing back over the edges containing the SPPF nodes for the nullable non-terminals. As a result it is necessary to build the SPPF trees for the nullable non-terminals, and for the rightmost nullable strings of non-terminals of a right nullable rule, before the parse is begun. These nullable trees are called ε-SPPF trees and, since they are constant for a given grammar, they can be constructed when the parser is built and included in the RN parse table. Instead of storing reductions as the tuple r(X, m), the RN parse table stores the triple r(X, m, f), where X is a non-terminal, m is the length of the reduction and f is an index into a function I that returns the root of the associated ε-SPPF tree. If no ε-SPPF tree is associated with such a reduction then f = 0.

If ε-SPPF trees are created for all the nullable reductions, the final SPPF will not be as compact as it could be. So the ε-SPPFs are only constructed for nullable non-terminals and for nullable strings γ such that |γ| > 1 and there is a grammar rule of the form A ::= αγ, where α ≠ ε. Nullable strings like γ are called the required nullable parts. For rules of the form A ::= γ, where γ ⇒* ε, the ε-SPPF for A is used instead.

In order to create the index to the ε-SPPF trees it is necessary to go through the grammar and, starting at one, index the required nullable parts and the non-terminals that derive ε. Before constructing the ε-SPPF trees, create the node u0 labelled ε. Then create the ε-SPPF trees with root nodes uI(ω), labelled ω, for the nullable non-terminals or required nullable parts ω. In the RN parse table, for a reduction (A ::= α · γ, a) write r(A, m, f), where |α| = m and f = I(γ) if m ≠ 0, and f = I(A) if m = 0.

The elements added to the sets Q and Q′ are the same as those used by the RNGLR recogniser, but the elements added to R contain more information. When a new edge is added between two nodes v and w, any applicable reductions from v are added to the set R. For a reduction of the form r(X, m, f), where m > 0, we add (w, X, m, f, z) to the set R. The first three elements, w, X, m, are the same as those used in the recogniser (state node, non-terminal, length of reduction). If the reduction is an RN-reduction of the form (X ::= α · β) then f is the index into the function I which stores the root node of the ε-SPPF for β, and z is the SPPF node that labels the edge between v and w. If the length of the reduction is zero (m = 0) then f is the index into I for the root of the ε-SPPF of X, and z is ε, since the edge between v and w is not traversed.

For example, consider Grammar 5.3. The nullable non-terminals are B and C, and the required nullable parts are BBC and BC. We define I(B) = 1, I(C) = 2, I(BC) = 3 and I(BBC) = 4. The associated ε-SPPF is shown in Figure 5.5.

S′ ::= S
S ::= aBBC          (5.3)
B ::= b | ε
C ::= ε

Figure 5.5: The ε-SPPF for Grammar 5.3 (node u0 labelled ε; roots u1, u2, u3 and u4 labelled B, C, BC and BBC respectively).
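The preprocessing just described is straightforward to mechanise. The following Python sketch computes the nullable non-terminals by fixed-point iteration and then numbers them and the required nullable parts. The grammar encoding and the traversal order (chosen here so that the indices match the example above) are our own assumptions, not PAT's.

    # Grammar 5.3, with alternates as symbol lists; [] is an epsilon alternate.
    grammar = {
        "S'": [["S"]],
        "S":  [["a", "B", "B", "C"]],
        "B":  [["b"], []],
        "C":  [[]],
    }

    def nullable_nonterminals(g):
        nullable, changed = set(), True
        while changed:                  # iterate to a fixed point
            changed = False
            for lhs, alts in g.items():
                if lhs not in nullable and any(
                        all(s in nullable for s in alt) for alt in alts):
                    nullable.add(lhs)
                    changed = True
        return nullable

    nullable = nullable_nonterminals(grammar)   # {'B', 'C'}

    I, next_index = {}, 1
    for nt in sorted(nullable):                 # I(B) = 1, I(C) = 2
        I[nt] = next_index
        next_index += 1
    for lhs, alts in grammar.items():
        for alt in alts:
            # required nullable parts: a nullable suffix gamma with
            # |gamma| > 1 preceded by a non-empty prefix alpha
            for cut in range(len(alt) - 2, 0, -1):
                gamma = tuple(alt[cut:])
                if all(s in nullable for s in gamma) and gamma not in I:
                    I[gamma] = next_index       # I(BC) = 3, I(BBC) = 4
                    next_index += 1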

Before presenting the formal specification of the RNGLR parser we demonstrate how the GSS and SPPF are constructed for the parse of the string ab in Grammar 5.3. The associated LR(1) DFA and RN parse table are shown in Figure 5.6 and Table 5.4.

Figure 5.6: The LR(1) DFA for Grammar 5.3.

      a     b               $                      B     C     S
 0    p2                                                       p1
 1                          acc
 2          p4/r(B,0,1)     r(B,0,1)/r(S,1,4)      p3
 3          p6              r(B,0,1)/r(S,2,3)      p5
 4          r(B,1,0)        r(B,1,0)
 5                          r(C,0,2)/r(S,3,2)            p7
 6                          r(B,1,0)
 7                          r(S,4,0)

Table 5.4: The RN parse table for Grammar 5.3.

We create v0, labelled by the start state of the DFA, and add it to U0. Since the only applicable action in T(0, a) is a shift to state 2, we add (v0, 2) to Q. The Shifter removes (v0, 2) from Q and creates a new SPPF node, w1, labelled (a, 1). Then, since no node labelled 2 exists in the next level, v1 is created and added to U1, with an edge back to v0 labelled by a and w1. There is a shift/reduce conflict in T(2, b): a shift to state 4 and a reduction r(B, 0, 1). We add (v1, 4) to Q and (v1, B, 0, 1, ε) to R.

This completes the construction of the first level and the initialisation of U1. The GSS and SPPF constructed up to this point are shown below.

[GSS: v0 (state 0) and v1 (state 2) joined by an edge labelled (a, w1); SPPF: the single node w1 = (a, 1).]

Next we process the reduction (v1, B, 0, 1, ε) in the Reducer. It is a nullable reduction so we do not trace back a path in the GSS, but we do create the new node, v2, labelled 3, with an edge back to v1. We label the edge by the non-terminal of the reduction, B, and the root of the ε-SPPF tree for B. Since there is a shift action in T(3, b) we add (v2, 6) to Q.

[GSS: v2 (state 3) added with an edge to v1 labelled (B, u1); the SPPF now also contains the ε-SPPF nodes u0 and u1.]

Processing the two queued shift actions in Q results in the construction of the SPPF node w2 and the two new GSS nodes v3 and v4. There is a reduction r(B, 1, 0) from both nodes, so we add (v1,B, 1, 0, w2) and (v2,B, 1, 0, w2) to R.

[GSS: v3 (state 4) and v4 (state 6) added in the next level, with edges to v1 and v2 labelled by the new SPPF node w2 = (b, 2).]

Processing the first of these reductions, we create the new SPPF node, w3, labelled (B, 1), add it to the set N and then use it to label the edge between the new GSS node, v5, and v1. We make the SPPF node, w2, that labelled the edge between v3 and v1, the child of w3. Since there is a reduce/reduce conflict, r(B, 0, 1) and r(S, 2, 3), in T(3, $) we add (v5, B, 0, 1, ε) and (v1, S, 2, 3, w3) to R.

When we process the queued reduction (v2, B, 1, 0, w2) we create the new GSS node v6, labelled 5, and an edge to v2. However, because an SPPF node labelled (B, 1), w3, has already been created in the current step of the algorithm (and stored in the set N), we re-use it to label the edge between v6 and v2.

[GSS: v5 (state 3) with an edge to v1 labelled (B, w3), and v6 (state 5) with an edge to v2 labelled (B, w3); SPPF: w3 = (B, 1) with child w2.]

At this point R = {(v5, B, 0, 1, ε), (v1, S, 2, 3, w3), (v6, C, 0, 2, ε), (v2, S, 3, 2, w3)}. We remove and process the first of these reductions, which results in the new edge, labelled (B, u1), being added between v6 and v5. Although the new edge has introduced a new reduction path from v6, we do not add anything to the set R because the reduction performed was of length zero.

Next we process (v1, S, 2, 3, w3). We trace back a path of length one from v1 to v0, collecting the SPPF node, w1, that labels the edge traversed. We create the new GSS node, v7, labelled 1, and an edge between v7 and v0. We create the new SPPF node, w4, labelled (S, 0), with edges pointing to the nodes w1, w3 and u3, and use it to label the edge between v7 and v0 in the GSS. We also add w4 to the set N.

[GSS: v7 (state 1) with an edge to v0 labelled (S, w4); SPPF: w4 = (S, 0) with children w1, w3 and u3.]

Processing the reduction encoded by (v6, C, 0, 2, ε) we create the new GSS node v8, labelled 7, and an edge from v8 to v6. Since the reduction is nullable, we label the new edge by the ε-SPPF node u2 and do not add any other reductions to R.

We then process the final reduction in R, (v2, S, 3, 2, w3). We trace back a path of length two from v2 to v0 and collect the SPPF nodes u1 and w1 that label the traversed edges. We search the set N for a node labelled (S, 0) and find w4. Since it does not have a sequence of children [w1, u1, w3, u2] we create two new packing nodes below w4, and add the existing children of w4 to one and the new sequence to the other. At this point all the input string has been parsed and no other actions remain to be processed. Since the accept state of the DFA labels v7, the parse is successful and the root of the SPPF is the node that labels the edge between v7 and v0, w4. The final GSS and SPPF constructed during the parse are shown in Figure 5.7.

Figure 5.7: The final GSS and SPPF for the parse of ab in Grammar 5.3.

RNGLR parser

input data: start state SS, accept state SA, RN table T, input string a1...an, the root nodes uI(ω) of the ε-SPPF's

function Parser
    set result = failure
    if n = 0 then
        if acc ∈ T(SS, $) then
            set sppfRoot = uI(S)
            set result = success
    else
        create a node v0 labelled SS
        set U0 = {v0}, R = ∅, Q = ∅, an+1 = $, U1 = ∅, ..., Un = ∅
        if pk ∈ T(SS, a1) then add (v0, k) to Q
        for all r(X, 0, f) ∈ T(SS, a1) do add (v0, X, 0, f, ε) to R
        for i = 0 to n while Ui ≠ ∅ do
            set N = ∅
            while R ≠ ∅ do Reducer(i)
            Shifter(i)
        if SA ∈ Un then
            set sppfRoot to the node that labels the edge (SA, v0)
            remove the SPPF nodes that are not reachable from sppfRoot
            set result = success
    return result

function Reducer(i)
    remove (v, X, m, f, y) from R
    find the set χ of paths of length (m − 1) (or length 0 if m = 0) from v
    if m ≠ 0 then let wm = y
    for each path ∈ χ do
        let wm−1, ..., w1 be the edge labels and u be the final node on the path
        let k be the label of u and let pl ∈ T(k, X)
        if m = 0 then let z = uf
        else
            suppose u ∈ Uc
            if there is no node z ∈ N labelled (X, c) then
                create an SPPF node z labelled (X, c)
                add z to N
        if there is a node w ∈ Ui with label l then
            if there is not an edge from w to u then
                create an edge from w to u labelled z
                if m ≠ 0 then
                    for all r(B, t, f′) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t, f′, z) to R
        else
            create a node w ∈ Ui labelled l and an edge from w to u labelled z
            if ph ∈ T(l, ai+1) then add (w, h) to Q
            for all r(B, 0, f′) ∈ T(l, ai+1) do add (w, B, 0, f′, ε) to R
            if m ≠ 0 then
                for all r(B, t, f′) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t, f′, z) to R
        if m ≠ 0 then AddChildren(z, w1, ..., wm, f)

function Shifter(i)
    if i ≠ n then
        set Q′ = ∅
        create an SPPF node z labelled (ai+1, i)
        while Q ≠ ∅ do
            remove (v, k) from Q
            if there is a node w ∈ Ui+1 with label k then
                create an edge from w to v labelled z
                for all r(B, t, f) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t, f, z) to R
            else
                create a node w ∈ Ui+1 labelled k and an edge from w to v labelled z
                if ph ∈ T(k, ai+2) then add (w, h) to Q′
                for all r(B, 0, f) ∈ T(k, ai+2) do add (w, B, 0, f, ε) to R
                for all r(B, t, f) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t, f, z) to R
        copy Q′ into Q

function AddChildren(y, w1, ..., wm, f)
    if f = 0 then let Λ = (w1, ..., wm)
    else let Λ = (w1, ..., wm, uf)
    if y has no children then
        for each node ϑ ∈ Λ do add an edge from y to ϑ
    else if y does not have a sequence of children labelled Λ then
        if y does not have a child which is a packing node then
            create a new packing node z and an edge from y to z
            add edges from z to all other children of y
            remove all edges from y apart from the one to z
        create a new packing node t and an edge from y to t
        for each node ϑ ∈ Λ do add an edge from t to ϑ
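The packing logic of AddChildren is subtle enough to merit a concrete rendering. The following is a hedged Python sketch of that step alone, with a deliberately minimal node representation of our own devising (an SPPF node here is simply anything with a children list):

    class Node:
        def __init__(self):
            self.children = []          # SPPF nodes, or packing nodes once packed

    class PackingNode:
        def __init__(self, children):
            self.children = list(children)

    def families(y):
        # the sequences of children that y currently represents
        if any(isinstance(c, PackingNode) for c in y.children):
            return [p.children for p in y.children]
        return [y.children] if y.children else []

    def add_children(y, lam):
        # lam is the sequence Lambda = (w1, ..., wm) or (w1, ..., wm, uf)
        lam = list(lam)
        if not y.children:              # first derivation: attach directly
            y.children = lam
        elif lam not in families(y):    # a genuinely new derivation: pack
            if not any(isinstance(c, PackingNode) for c in y.children):
                # demote the existing children under their own packing node
                y.children = [PackingNode(y.children)]
            y.children.append(PackingNode(lam))

A node therefore only pays the cost of packing when a second distinct family of children arrives, mirroring the pseudocode above.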

5.6 Resolvability

In Section 5.3 we discussed the effect of the extra non-determinism created as a result of the nullable reductions added to the RN parse table. In this section we present an approach, first described in [JS02], for removing redundant nullable reductions from the RN parse table of LR grammars with right nullable rules. In certain cases this resolved parse table can be used by the standard LR parsing algorithm to parse strings with less stack activity. It has been shown in [SJ03a] that for an LR grammar it is possible to remove all reduce/reduce conflicts from an RN parse table so that the standard LR parsing algorithm can be used to parse sentences with less stack activity. In order to remove reductions from a state k without breaking the parser, it is necessary for k and a lookahead a to conform to the following two properties.

1. For each a ∈ T ∪ {$} there is at most one item (X ::= τ · σ, a) ∈ k, such that τ ≠ ε and σ ⇒* ε.

2. If (X ::= τ · σ, a) ∈ k, where σ ⇒* ε, and (W ::= α · β, g) ∈ k, where a ∈ first(β), then τ = ε and any derivation β ⇒* au includes a step Xau ⇒* σau.

Such states are called a-resolvable. The two properties above work on the principle that a state k with an item of the form (X ::= τ · σ, a), where τ ≠ ε and σ ⇒* ε, can have all but one of its reductions, for the lookahead a, removed as long as property 2 is not broken. This is because we can define the order in which rules are added to a DFA state and hence guarantee that the reduction for the rule X ::= τσ must take place before any other action can happen. The reduction that is not removed from an a-resolvable state k is known as the base reduction of k for lookahead a. To formally define a base reduction it is necessary to first define a function to calculate the order in which items are added to a DFA state.

Definition 5.1 Let k be a DFA state and let (X ::= τ · σ, a) be an item in k. If τ ≠ ε then we define levelk(X ::= τ · σ, a) = 0. We also define level0(S′ ::= ·S, $) = 0. For X ≠ S′ and τ = ε we let

    r = min{levelk(Y ::= γ · Xδ, b) | (Y ::= γ · Xδ, b) ∈ k, a ∈ first(δb)}

and then define levelk(X ::= ·σ, a) = (r + 1). [SJ03a]

Definition 5.2 Let Γ be any context-free grammar and let k be an a-resolvable state in the DFA for Γ. An item (X ::= τ · σ, a) ∈ k is a base reduction on a in k if, for all other items (Y ::= γ · δ, a) ∈ k such that δ ⇒* ε, levelk(X ::= τ · σ, a) ≤ levelk(Y ::= γ · δ, a). [SJ03a]

Table 5.8 shows how the resolved RN parse table of the LR(1) Grammar 5.4 can be used by the standard LR(1) parsing algorithm to parse a sentence using less stack activity than when the LR(1) table is used.

S′ ::= S
S ::= aBCD
B ::= ε             (5.4)
C ::= ε
D ::= ε

Figure 5.8: The LR(1) DFA for Grammar 5.4.

      a     $         B     C     D     S
 0    p2                                p1
 1          acc
 2          r(B,0)    p3
 3          r(C,0)          p4
 4          r(D,0)                p5
 5          r(S,4)

Table 5.5: The LR(1) parse table for Grammar 5.4.

      a     $                B     C     D     S
 0    p2                                       p1
 1          acc
 2          r(B,0)/r(S,1)    p3
 3          r(C,0)/r(S,2)          p4
 4          r(D,0)/r(S,3)                p5
 5          r(S,4)

Table 5.6: The RN parse table for Grammar 5.4.

      a     $         B     C     D     S
 0    p2                                p1
 1          acc
 2          r(S,1)    p3
 3          r(S,2)          p4
 4          r(S,3)                p5
 5          r(S,4)

Table 5.7: The resolved RN parse table for Grammar 5.4.

LR(1)                                 Resolved RN
Stack          Input   Action         Stack     Input   Action
$0             a       s2             $0        a       p2
$0a2           $       r2             $0a2      $       r(S,1)
$0a2B3         $       r3             $0S1      $       acc
$0a2B3C4       $       r4
$0a2B3C4D5     $       r1
$0S1           $       acc

Table 5.8: Trace of LR(1) parse of input a for Grammar 5.4 using an LR(1) and a resolved RN parse table.

5.7 Summary

This chapter has presented Algorithm 1e – a straightforward extension of Tomita's Algorithm 1 that can deal with hidden-left recursion, but which fails to parse grammars containing hidden-right recursion. A modification to the standard LR(1) parse table was introduced which causes Algorithm 1e to correctly parse all context-free grammars. Tables containing this modification are called RN tables. The RNGLR recognition algorithm that parses all context-free grammars with the use of an RN parse table was described and its operation demonstrated with various examples. The RNGLR parser that constructs an SPPF in the style of Rekers, but which employs less searching, was also presented.

Chapter 10 presents the experimental results that abstract the performance of Algorithm 1e, Algorithm 1e mod, and the RNGLR algorithm for grammars which trigger worst case behaviour and several programming language grammars and strings. The next chapter presents a cubic worst case general parsing algorithm called BRNGLR.

Chapter 6

Binary Right Nulled Generalised LR parsing

Despite the fact that general context-free parsing is a mature field in Computer Science, its worst case complexity is still unknown. The algorithm with the best asymptotic time complexity to date is presented in [Val75] by Valiant. His approach uses Boolean matrix multiplication (BMM) to construct a recognition matrix that displays the complexity of the associated BMM algorithm. Since publication of Valiant's paper, more efficient BMM algorithms have been developed. The algorithm with currently the lowest asymptotic complexity displays O(n^2.376) for n × n matrices [CW87, CW90].

Although the approach taken by Valiant is unlikely to be used in practice, because of the high constant overheads, it was an important step toward understanding more about the complexity of general context-free parsers. A related result has been presented by Lee, which maps a context-free parser with O(n^(3−ε)) complexity to an algorithm to multiply two n × n Boolean matrices in O(n^(3−ε/3)) time. A side effect of this work has led to the hypothesis that ‘practical parsers running in significantly lower than cubic time are unlikely to exist’ [Lee02]. Although this analysis does suggest that linear time parsers are unlikely to exist, it does not preclude quadratic algorithms from being developed.

Two other general parsing algorithms that have been used in practice are the CYK and Earley algorithms. Both display cubic worst case complexity, although the CYK algorithm requires grammars to be transformed to CNF before parsing. Unfortunately this complexity is still too high for certain applications; most programming languages have largely deterministic grammars which can be parsed by linear parsers. The GLR algorithm, first developed by Tomita, provides a general parsing algorithm that takes advantage of the efficiency of the deterministic LR parsers. Unfortunately these algorithms display unbounded polynomial time and space complexity [Joh79].

This chapter presents an algorithm which is based on the RNGLR algorithm described in Chapter 5, but which has a worst case complexity of O(n^3) without requiring any transformations to be done to the grammar.

6.1 The worst case complexity of GLR recognisers

In Chapter 2 we examined the operation of a standard bottom-up shift reduce parser. Recall that such a parser works by using a stack to collect symbols of a sentential form and then, when it recognises a handle, of length m say, it pops the m symbols off the stack and pushes on the left hand non-terminal of the corresponding rule. An important feature of such shift reduce parsers is that it is not necessary to examine the symbols on the stack before they are popped. The associated automaton guarantees that when a reduction for a rule A ::= β is performed, the top m symbols on the stack are equal to β. Implementations of such parsers exploit this property and perform reductions in unit time by simply decrementing the stack pointer. In comparison, general parsing algorithms that extend such parsers cannot perform reductions as efficiently.

It is well known that the worst case time complexity of Tomita's GLR parser is O(n^(M+1)), where n is the length of the input string and M is the length of the longest grammar rule [Joh79]. Although it is not entirely surprising, it is somewhat disappointing that the recogniser displays the same complexity, especially since both the Earley and CYK recognisers are cubic in the worst case. In this section we shall explain why the RNGLR recogniser is worse than cubic.

Roughly, when a GLR parser reaches a non-deterministic point in the automaton, the stack is split and both parses are followed. If the separate stacks then have a common stack top once again, the stacks are merged. This merging of stacks bounds the size of the GSS to at most (n + 1) × H nodes, where n is the length of the input string and H is the number of states in the automaton. However, because the different stacks are now combined into one structure, a reduce action can be applied down several paths from one node. Unlike the standard shift reduce parser we are not able to simply decrement a stack pointer; we need to perform a search. It is possible for a state in level i to have edges going back to states in every other level j, where 0 ≤ j ≤ i. Since a node in the GSS can have H × (i + 1) edges, i^m paths may need to be explored for any reduction of length m + 1 that is performed. This results in such algorithms displaying O(n^(m+1)) time complexity. Although we will not prove this here, we shall describe a grammar and an example string that illustrate the properties which trigger quartic behaviour in the RNGLR algorithm. Clearly this is sufficient to show that the RNGLR algorithm is not cubic.
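Loosely, the bound can be read off from this description. The following is a back-of-the-envelope count, under the simplifying assumption that the number of reductions performed at each level is bounded by a constant c depending only on the grammar and the table (the careful analysis is in [SJE03]): a reduction of length m + 1 applied at level i explores O(i^m) paths, so the total work over the n + 1 levels is at most

    Σ_{i=0}^{n} c·i^m  ≤  c(n + 1)n^m  =  O(n^(m+1)).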

Example – Recognition in O(n^4) time

Consider Grammar 6.1, its LR(1) DFA in Figure 6.1 and the associated RN parse table T in Table 6.1.

S′ ::= S            (6.1)
S ::= SSS | SS | b

Figure 6.1: The LR(1) DFA for Grammar 6.1.

      b                    $                  S
 0    p2                                      p1
 1    p2                   acc                p3
 2    r(S,1)               r(S,1)
 3    p2/r(S,2)            r(S,2)             p4
 4    p2/r(S,3)/r(S,2)     r(S,3)/r(S,2)      p4

Table 6.1: The RN parse table for Grammar 6.1.

We can illustrate the properties that cause the RNGLR algorithm to have at least O(n^4) time complexity by parsing the string bbbbb. We begin by creating the node v0 labelled with the start state of the DFA. The shift action p2 is the only element in T(0, b), so we perform the shift and create the new node v1 labelled 2. As there is the reduce action r(S, 1) in T(2, b), (v0, S, 1) is added to the set R. When (v0, S, 1) is removed from R, the node v2 labelled 1 is created.

[GSS after the first level: v0 (state 0), v1 (state 2) and v2 (state 1).]

Processing v2 we find that the shift p2 is the only action in T(1, b), so we create the node v3 labelled 2 in the next level and add (v2, S, 1) to R, since r(S, 1) is in T(2, b). By performing the reduce in R we create node v4 labelled 3 and the edge from v4 to v2. Examining the entry in T(3, b) we find that there is a shift/reduce conflict with actions p2/r(S, 2). We add the actions to the sets Q and R respectively. Performing the reduction in R we create v5 with an edge to v0 and then add (v5, 2) to Q.

[GSS after the second level: v3 (state 2), v4 (state 3) and v5 (state 1) have been added.]

Once the shifts have been removed and processed from the set Q, the new node v6 labelled 2 has been created in level 3, and (v4, S, 1) and (v5, S, 1) are added to R because of the reduce action in T (2, b). When both the reductions are done the nodes v7 and v8, labelled 4 and 3 respectively, are created along with their associated edges from v7 to v4 and v8 to v5. At this point R = {(v4, S, 3), (v4, S, 2), (v5, S, 2)} and Q = {(v7, 2), (v8, 2)}. Removing (v4, S, 3) from R, the Reducer traces back a path to v0 and creates the new node v9 labelled 1, with an edge between v9 and v0.

As T (1, b) contains the shift p2, (v9, 2) is also added to the set Q. When (v4, S, 2) is processed the Reducer traces back a path to v2 and creates a new edge between v8 and v2. As a result of the new edge, (v2, S, 2) is added to R.

When (v5, S, 2) and then (v2, S, 2) are processed, the Reducer traces back a path to v0 once again, but since the node labelled 1 already exists in the current level and there is an edge to v0, nothing is done.

[GSS after the third level: v6 (state 2), v7 (state 4), v8 (state 3) and v9 (state 1) have been added.]

After processing the shift actions queued in Q the node labelled 2 is created in level 4, with edges going back to the nodes labelled 4, 3 and 1 in the previous level.

[GSS after the fourth level, showing the node labelled 2 with edges back to the nodes labelled 4, 3 and 1 in the previous level.]

Notice that the node labelled 4 in the final level of the GSS in Figure 6.2 has edges that go back to every node labelled 4 in the previous levels. This pattern is repeated when recognising any strings of the form b^n. At each level i ≥ 3 in the GSS there is a node labelled 4 which has edges back to the nodes labelled 3 in each of the levels below i in the GSS. It is this property that is used in [SJE03] to prove that the RNGLR algorithm takes at least O(n^4) time when used to parse sentences b^n using the RN parse table for Grammar 6.1. Since Tomita's Algorithm 1e builds the same GSS as the RNGLR algorithm, they share the property that triggers this worst case behaviour.

Figure 6.2: The final GSS for Grammar 6.1 and input bbbbb.

In fact the RNGLR algorithm is at most O(n^(M+1)), where M is the length of the longest rule in the underlying grammar, for any RN parse table when M ≥ 3. Furthermore, grammars in the form of Grammar 6.2 trigger O(n^M) behaviour in such GLR parsers.

S′ ::= S            (6.2)
S ::= S^(M−1) | b

6.2 Achieving cubic time complexity by factoring the grammar

The previous section discussed the properties of grammars that cause GLR recognisers to display polynomial behaviour. Since it is the searching that is done to find the target nodes of a reduction that determines the complexity of such algorithms, an obvious approach to reduce the complexity is to first reduce the searching. The length of the searches done for a reduction is directly related to the length of the grammar's rules. Clearly, by restricting the length of the grammar's rules, an improvement should be possible.

There are several existing algorithms that can transform any context-free grammar into another grammar whose rules have a maximum length of two [HU79]. One of the best known techniques, used by other parsing algorithms such as CYK, is to transform the grammar into Chomsky Normal Form (CNF). Although the algorithm that transforms a grammar to CNF produces a grammar with the desired property, it has two major drawbacks: the resulting parses are done with respect to the CNF grammar, and the process to recover the derivations with respect to the original grammar can be expensive; and there is a linear increase in the size of the grammar [HMU01].

Of course, it is not necessary to have CNF to have O(n^3) complexity. All that is required is that the grammar rules all have length at most two. We can achieve this with a grammar which is close to the original by simply factoring the rules. In this way the generated derivations can be closely related to the original grammar. Grammar 6.4 is the result of factoring Grammar 6.3 so that no right hand side has a length greater than two. Using factored grammars the RNGLR algorithm can parse in at most cubic time.

S′ ::= S
S ::= abcB | abcD   (6.3)
B ::= d
D ::= d

Figure 6.3: The LR(1) DFA for Grammar 6.3.

      a     b     c     d     $                 B     D     S
 0    p2                                                    p1
 1                            acc
 2          p3
 3                p4
 4                      p5                      p6    p7
 5                            r(B,1)/r(D,1)
 6                            r(S,4)
 7                            r(S,4)

Table 6.2: The RN parse table for Grammar 6.3.

S′ ::= S
S ::= aX
X ::= bY            (6.4)
Y ::= cB | cD
B ::= d
D ::= d

Figure 6.4: The LR(1) DFA for Grammar 6.4.

      a     b     c     d     $                 B     D     X     Y     S
 0    p2                                                                p1
 1                            acc
 2          p4                                              p3
 3                            r(S,2)
 4                p6                                              p5
 5                            r(X,2)
 6                      p7                      p8    p9
 7                            r(B,1)/r(D,1)
 8                            r(Y,2)
 9                            r(Y,2)

Table 6.3: The RN parse table for Grammar 6.4.

We have to be careful when introducing extra non-terminals. If we only use one non-terminal for the two alternates of S in the following grammar the strings aba and cbc are incorrectly introduced to the language.

S′ ::= S                    S′ ::= S
S ::= abc | cba      ⇒      S ::= aX | cX
                            X ::= bc | ba

Thus new non-terminals need to be introduced for each reduction of each rule. Unfortunately this approach can also lead to a substantial increase in the size of the parse table. For example the SLR(1) parse table for our IBM-VS Cobol grammar is 2.8 × 10^6 cells, compared with 6.2 × 10^6 for the binarised grammar. The increase is even more dramatic for some LR(1) tables. Our ANSI-C grammar has 2.9 × 10^5 cells, compared to the 8.0 × 10^5 cells of the factorised grammar. For a more detailed discussion of parse table sizes see Chapter 10.
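A hedged sketch of this per-rule factoring is given below. It is our own helper, with a hypothetical naming scheme for the fresh non-terminals, and it is the safe, unshared variant, so it produces more rules than the hand-factored Grammar 6.4:

    def factor(grammar):
        """Split every alternate longer than two, using non-terminals that
        are private to the alternate so no strings are added to the language
        (compare the aba/cbc example above)."""
        out, count = {}, 0
        for lhs, alts in grammar.items():
            out.setdefault(lhs, [])
            for alt in alts:
                head, rest = lhs, list(alt)
                while len(rest) > 2:                # peel one symbol at a time
                    count += 1
                    fresh = f"{lhs}_{count}"        # hypothetical naming scheme
                    out.setdefault(head, []).append([rest[0], fresh])
                    head, rest = fresh, rest[1:]
                out.setdefault(head, []).append(rest)
        return out

    # factor({"S": [["a","b","c","B"], ["a","b","c","D"]]}) yields
    #   S   ::= a S_1 | a S_3
    #   S_1 ::= b S_2        S_2 ::= c B
    #   S_3 ::= b S_4        S_4 ::= c D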

6.3 Achieving cubic time complexity by modifying the parse table

In the previous section we showed how to use the RNGLR algorithm to achieve cubic worst case complexity by factoring the grammar before parsing. Unfortunately, this technique can dramatically increase the size of the parse table. The objective of factoring the grammar was to restrict the length of the reductions performed during parsing. In this section we present a different approach which achieves the same complexity, but does not increase the size of the parse table to the same degree.

Instead of factoring the grammar it is possible to restrict the length of reductions performed by directly modifying the parse table. This involves the creation of N_A additional states for each non-terminal A, where (N_A + 2) is the length of the longest alternate of A. So if N_A ≥ 1 then the additional states A_1, ..., A_{N_A} are created. In addition to this, a new type of reduction action is added to the parse table so that only reductions with a maximum length of two are performed. The new reductions are of the form r(Aj, 2), where Aj is an additional state and 2 is the length of the reduction. When such an action is performed, two symbols are popped off the stack and Aj is pushed onto the stack. For example consider Grammar 6.3 and the associated BRN parse table 6.4. By using the parse table shown in Table 6.4 to parse the string abcd, the GSS in Figure 6.5 is constructed.

      a     b     c     d     $                 B     D     S
 0    p2                                                    p1
 1                            acc
 2          p3
 3                p4
 4                      p5                      p6    p7
 5                            r(B,1)/r(D,1)
 6                            r(8,2)
 7                            r(8,2)
 8                            r(9,2)
 9                            r(S,2)

Table 6.4: The BRN parse table for Grammar 6.3.

Although the GSS created with the use of the BRN parse table is larger than the one created with the RN table, the increase in size is only a constant factor. Both GSS's have O(n^2) edges, but the GSS generated by the BRN table can be constructed in at most cubic time. In [SJE03] a proof is given that shows the BRN parse table accepts exactly the same set of strings as the RN parse table. Since there is another proof in [SJ00] that shows the RN parse table to accept precisely the same strings as an LR(1) parser for the same grammar, we can rely on the parse table being correct.

Figure 6.5: The GSS's constructed using parse table 6.2 (left) and parse table 6.4 (right) for the input abcd.

6.4 The BRNGLR recognition algorithm

This section presents the formal definition of the BRNGLR recognition algorithm that uses BRN parse tables to parse sentences in at most cubic time.

BRNGLR recogniser

input data: start state SS, accept state SA, BRN table T, input string a1...an

function Parser
    set result = failure
    if n = 0 then
        if acc ∈ T(SS, $) then set result = success
    else
        create a node v0 labelled SS
        set U0 = {v0}, R = ∅, Q = ∅, an+1 = $, U1 = ∅, ..., Un = ∅
        if pk ∈ T(SS, a1) then add (v0, k) to Q
        for all r(X, 0) ∈ T(SS, a1) do add (v0, X, 0) to R
        for i = 0 to n while Ui ≠ ∅ do
            while R ≠ ∅ do Reducer(i)
            Shifter(i)
        if SA ∈ Un then set result = success
    return result

function Reducer(i)
    remove (v, X, m) from R
    if m = 2 then let χ be the set of children of v
    else let χ = {v}
    for each node u ∈ χ do
        if X is a non-terminal then let k be the label of u and let pl ∈ T(k, X)
        else let l = X
        if there is a node w ∈ Ui with label l then
            if there is not an edge from w to u then
                create an edge from w to u
                if m ≠ 0 then
                    for all r(B, t) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t) to R
        else
            create a node w ∈ Ui labelled l and an edge from w to u
            if ph ∈ T(l, ai+1) then add (w, h) to Q
            for all r(B, 0) ∈ T(l, ai+1) do add (w, B, 0) to R
            if m ≠ 0 then
                for all r(B, t) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t) to R

function Shifter(i)
    if i ≠ n then
        set Q′ = ∅
        while Q ≠ ∅ do
            remove an element (v, k) from Q
            if there is a node w ∈ Ui+1 with label k then
                create an edge from w to v
                for all r(B, t) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t) to R
            else
                create a node w ∈ Ui+1 labelled k and an edge from w to v
                if ph ∈ T(k, ai+2) then add (w, h) to Q′
                for all r(B, 0) ∈ T(k, ai+2) do add (w, B, 0) to R
                for all r(B, t) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t) to R
        copy Q′ into Q

Although the above algorithm succeeds in parsing all context-free grammars in at most cubic time, it is disappointing that the parse table increases in size (even though the increase is only by a constant factor). It turns out that, because of the regular way the additional reductions are done, it is possible to achieve the same complexity without modifying the parse table. The following section describes such an algorithm.

6.5 Performing ‘on-the-fly’ reduction path factorisation

The algorithm defined in the previous section is an extension of the RNGLR algorithm which uses BRN parse tables to achieve cubic worst case time complexity. A further extension of the previous algorithm performs this binary translation ‘on-the-fly’. This algorithm works on an original RN parse table, but includes the extra machinery necessary to only carry out searches with a maximum length of two. When an action r(X, m) is encountered in the parse table, the algorithm stores the pending reduction in the set R in the form (v, X, m), where v is the target of the edge down which the reduction is to be applied and m is the length of the reduction. When m > 2 a new edge is added from a special bookkeeping node, labelled Xm, in the current level to every child node u of v, and the element (u, X, m − 1) is added to R. This technique ensures that reductions of length greater than 2 are done in m − 1 steps of length two. The bookkeeping nodes prevent the repeated traversal of a path in the same way that the extra states added to the modified RN parse table do.

The on-the-fly algorithm is the preferred implementation of the BRNGLR algorithm as it does not increase the size of the RN parse table to achieve cubic worst case time complexity.
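Before the formal specification, the following Python fragment sketches just the splitting step (the data structures and names are ours, not the algorithm's): a reduction of length m > 2 is not traced in one long search; instead it is re-queued in steps of two via a bookkeeping node.

    class GSSNode:
        def __init__(self, label):
            self.label, self.children = label, []

    def queue_reduction(R, level, v, X, m):
        """level maps labels to the GSS nodes of the current level Ui."""
        if m <= 2:
            R.append((v, X, m))             # handled as an ordinary reduction
            return
        key = (X, m)                        # stands for the bookkeeping label Xm
        w = level.get(key)
        if w is None:
            w = level[key] = GSSNode(key)
        for u in v.children:                # only the immediate children of v
            if u not in w.children:
                w.children.append(u)
                R.append((u, X, m - 1))     # the remaining length m-1 reduction

Because the bookkeeping node is shared within a level, a second reduction of the same length over the same prefix finds the edges already present and queues nothing, which is exactly the saving the worked example below relies on.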

BRNGLR ‘on-the-fly’ recogniser

input data: start state SS, accept state SA, RN table T, input string a1...an

function Parser
    set result = failure
    if n = 0 then
        if acc ∈ T(SS, $) then set result = success
    else
        create a node v0 labelled SS
        set U0 = {v0}, R = ∅, Q = ∅, an+1 = $, U1 = ∅, ..., Un = ∅
        if pk ∈ T(SS, a1) then add (v0, k) to Q
        for all r(X, 0) ∈ T(SS, a1) do add (v0, X, 0) to R
        for i = 0 to n while Ui ≠ ∅ do
            while R ≠ ∅ do Reducer(i)
            Shifter(i)
        if SA ∈ Un then set result = success
    return result

function Reducer(i)
    remove (v, X, m) from R
    if m ≥ 2 then let χ be the set of children of v
    else let χ = {v}
    if m ≤ 2 then
        for each node u ∈ χ do
            let k be the label of u and let pl ∈ T(k, X)
            if there is a node w ∈ Ui with label l then
                if there is not an edge from w to u then
                    create an edge from w to u
                    if m ≠ 0 then
                        for all r(B, t) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t) to R
            else
                create a node w ∈ Ui labelled l and an edge from w to u
                if ph ∈ T(l, ai+1) then add (w, h) to Q
                for all r(B, 0) ∈ T(l, ai+1) do add (w, B, 0) to R
                if m ≠ 0 then
                    for all r(B, t) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t) to R
    else
        if there is not a node w ∈ Ui with label Xm then create one
        for each node u ∈ χ do
            if there is not an edge from w to u then
                create an edge from w to u
                add (u, X, m − 1) to R

function Shifter(i)
    if i ≠ n then
        set Q′ = ∅
        while Q ≠ ∅ do
            remove an element (v, k) from Q
            if there is a node w ∈ Ui+1 with label k then
                create an edge from w to v
                for all r(B, t) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t) to R
            else
                create a node w ∈ Ui+1 labelled k and an edge from w to v
                if ph ∈ T(k, ai+2) then add (w, h) to Q′
                for all r(B, 0) ∈ T(k, ai+2) do add (w, B, 0) to R
                for all r(B, t) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t) to R
        copy Q′ into Q

For the formal proofs on the correctness and the complexity of the BRNGLR algorithm see [SJE03].

Example – ‘on-the-fly’ recognition in O(n^3) time

To demonstrate the operation of the ‘on-the-fly’ BRNGLR recognition algorithm we shall trace the construction of the GSS for Grammar 6.5 and the input abcd.

S′ ::= S
S ::= abcd | abcD   (6.5)
D ::= d

Figure 6.6: The LR(1) DFA for Grammar 6.5.

      a     b     c     d     $                 D     S
 0    p2                                              p1
 1                            acc
 2          p3
 3                p4
 4                      p5                      p6
 5                            r(S,4)/r(D,1)
 6                            r(S,4)

Table 6.5: The RN parse table for Grammar 6.5.

As usual we begin by first creating the node v0 labelled with the start state of the DFA in Figure 6.6 and then proceed to add the actions found in T(0, a) to the appropriate sets. In this case, only the shift action p2 is added to the set Q, which results in the node v1 labelled 2 and the edge from v1 to v0 being created. Continuing in this way, the next three states created for the remainder of the input string result in the partial GSS shown below.

[GSS after shifting abcd: v0 (0), v1 (2), v2 (3), v3 (4) and v4 (5), linked by edges labelled a, b, c and d.]

At this point we have R = {(v3, S, 4), (v3, D, 1)} and Q = {}. When we process (v3, S, 4) we find v2, the only child of v3, and add it to the set χ. Since there is no bookkeeping node labelled S4 in the current level of the GSS we create one, v5, add an edge between v5 and v2 and then add the additional reduction (v2, S, 3) to R.

[GSS with the bookkeeping node v5 (S4) added.]

We then process (v3,D, 1). Since the reduction length is less than 2, we perform a normal reduction which results in the creation of a new node, v6, labelled 6, with an edge back to v3. There is a reduction r(S, 4) ∈ T (6, $) so we add (v3, S, 4) into R.

[GSS with v6 (state 6) added, with an edge to v3 labelled D.]

Processing (v2, S, 3), we find v1, the only child of v2, and add it to χ. Since there is no node labelled S3 in the current level, we create v7 and add an edge between v7 and the only node in χ, v1. The final additional reduction, (v1, S, 2), is added to R.

[GSS with the bookkeeping node v7 (S3) added, with an edge to v1.]

When we process the reduction for (v3, S, 4) we check to see if a bookkeeping node labelled S4 already exists in the current level. It does. Because we have already performed a reduction of length 4 for a rule defined by the non-terminal S, whose path included v3, we do not need to continue with the current reduction. Clearly this reduces the number of edge visits that we perform during the parse.

We continue by performing the reduction for (v1, S, 2). We find v0, the only child of v1 and create the new node v8 labelled 1, with an edge labelled S to v0. Since we have consumed all the input and v8 is labelled by the accept state of the DFA the input abcd is successfully accepted. The final GSS is shown below.

[The final GSS: v8 (state 1) has been added with an edge to v0 labelled S.]

6.6 GLR parsing in at most cubic time

We extend the BRNGLR recognition algorithm to a parser by constructing an SPPF in a similar way to the approach we take for the RNGLR algorithm (see Chapter 5). Recall that nodes can be packed if their yields correspond to the same portion of the input string. In order to ensure that a correct SPPF is constructed care needs to be taken when dealing with the bookkeeping SPPF nodes. (Note that the bookkeeping SPPF nodes are not labelled.) We give the algorithm and then illustrate the basic issues using Grammar 6.3. We then discuss the subtleties of the use of bookkeeping nodes in Section 6.7.

BRNGLR parser

input data: start state SS, accept state SA, RN table T, input string a1...an, ε-SPPF's for each nullable non-terminal and required nullable part ω, and the root nodes uI(ω) of these ε-SPPF's

function Parser
    set result = failure
    if n = 0 then
        if acc ∈ T(SS, $) then
            set result = success
            output the SPPF whose root is uI(S)
    else
        create a node v0 labelled SS
        set U0 = {v0}, R = ∅, Q = ∅, an+1 = $, U1 = ∅, ..., Un = ∅
        if pk ∈ T(SS, a1) then add (v0, k) to Q
        for all r(X, 0, f) ∈ T(SS, a1) do add (v0, X, 0, f, ε) to R
        for i = 0 to n while Ui ≠ ∅ do
            set N = ∅
            while R ≠ ∅ do Reducer(i)
            Shifter(i)
        if SA ∈ Un then
            set result = success
            let root be the SPPF node that labels the edge (SA, v0) in the GSS
            remove nodes in the SPPF not reachable from root
            output the SPPF from root
    return result

function Reducer(i)
    remove (v, X, m, g, y) from R
    if m ≥ 2 then let χ be the set of pairs (u, x) where u is a child of v and the edge (v, u) is labelled x
    else let χ = {(v, ε)}
    if m ≤ 2 then
        for each pair (u, x) ∈ χ do
            let k be the label of u and let pl ∈ T(k, X)
            if m = 0 then let z = ug
            else
                suppose that u ∈ Uj
                if there is no node z ∈ N labelled (X, j) then
                    create an SPPF node z labelled (X, j)
                    add z to N
            if there is a node w ∈ Ui with label l then
                /* if the edge exists it will be labelled z */
                if there is not an edge from w to u then
                    create an edge from w to u labelled z
                    if m ≠ 0 then
                        for all r(B, t, f) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t, f, z) to R
            else
                create a GSS node w ∈ Ui labelled l and an edge from w to u labelled z
                if ph ∈ T(l, ai+1) then add (w, h) to Q
                for all r(B, 0, f) ∈ T(l, ai+1) do add (w, B, 0, f, ε) to R
                if m ≠ 0 then
                    for all r(B, t, f) ∈ T(l, ai+1) where t ≠ 0 do add (u, B, t, f, z) to R
            if m = 1 then AddChildren(z, (y), g)
            if m = 2 then AddChildren(z, (x, y), g)
    else
        if there is not a node w ∈ Ui with label Xm then create one
        for each pair (u, x) ∈ χ do
            if there is not an edge from w to u then
                create an SPPF intermediate node z
                create an edge from w to u labelled z
                add (u, X, m − 1, 0, z) to R
            let z be the label of the edge from w to u
            AddChildren(z, (x, y), g)

function Shifter(i)
    if i ≠ n then
        set Q′ = ∅
        create a new SPPF node z labelled (ai+1, i)
        while Q ≠ ∅ do
            remove an element (v, k) from Q
            if there is a node w ∈ Ui+1 with label k then
                create an edge from w to v labelled z
                for all r(B, t, f) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t, f, z) to R
            else
                create a node w ∈ Ui+1 labelled k and an edge from w to v labelled z
                if ph ∈ T(k, ai+2) then add (w, h) to Q′
                for all r(B, 0, f) ∈ T(k, ai+2) do add (w, B, 0, f, ε) to R
                for all r(B, t, f) ∈ T(k, ai+2) where t ≠ 0 do add (v, B, t, f, z) to R
        copy Q′ into Q

function AddChildren(z, ∆, f)
    if f ≠ 0 then let Υ = (∆, uf)
    else let Υ = ∆
    if z does not already have a sequence of children labelled Υ then
        if z has no children then
            add edges from z to each node in Υ
        else
            if z does not have a child which is a packing node then
                create a new packing node p and a tree edge from z to p
                add tree edges from p to all the other children of z
                remove all tree edges from z apart from the one to p
            create a new packing node t and a new tree edge from z to t
            create new edges from t to each node in Υ

Example – parsing in O(n^3) time

To illustrate the operation of the algorithm we shall trace the construction of the GSS and SPPF for the string abcd in the language of Grammar 6.3 shown on page 121.

We begin by constructing the GSS node v0 labelled by the start state of the DFA. Since p2 ∈ T(0, a) we create the node v1 labelled 2, the SPPF node w0 labelled (a, 0) and the edge from v1 to v0 in the GSS, which is labelled by w0. We proceed to shift the next three input symbols in the same way, which results in the GSS and SPPF shown in Figure 6.7 being constructed.

Figure 6.7: The partial GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after all the terminals have been shifted.

At this point R = {(v3, B, 1, 0, w3), (v3, D, 1, 0, w3)} and Q = {}. Processing (v3, B, 1, 0, w3) results in the creation of node v5 labelled 6, the SPPF node w4 and the edge between v5 and v3 labelled by w4. The action in T(6, $) is the reduction r(S, 4), so (v3, S, 4, 0, w4) is added to R. We then remove (v3, D, 1, 0, w3) from R and process the reduction, which results in the GSS and SPPF shown in Figure 6.8 being constructed and (v3, S, 4, 0, w5) added to R.

Figure 6.8: The partial GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the two standard reductions are done.

Since the next element, (v3, S, 4, 0, w4), that we process from R has a reduction length greater than two, we collect the children and edges of the reduction's source node, v3, in the set χ = {(v2, w2)}. We then create a new bookkeeping node v8 labelled S4 and the SPPF node w6 which labels the new edge from v8 to v2. In order to ensure that the correct number of reduction steps are done we also add the binary reduction (v2, S, 3, 0, w6) to R. The only operation that remains to be done is to add the SPPF nodes collected in χ and the single node y, which is passed into the Reducer, as children of the new bookkeeping SPPF node w6. We do this by calling the AddChildren function with the parameters (w6, (w2, w4), 0). Since w6 does not have any existing children, edges to w2 and w4 are created. The GSS and SPPF constructed up to this point are shown in Figure 6.9.

Figure 6.9: The partial GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the first bookkeeping node is created.

When we remove and process the reduction (v3, S, 4, 0, w5) from R we proceed in the same way as before by collecting the children and edges of the reduction’s source node in the set χ = {(v2, w2)}. However, since there already exists a node in the current level that is labelled S4 we do not create a new one and since there is already an edge from v8 to v2 no edge is created either. Instead we make sure that the existing SPPF node w6 has the correct children by calling AddChildren with the parameters (w6, (w5, w2), 0).

Since w6 has a sequence of children that is not labelled (w5, w2) we create two new packing nodes as children of w6 and add the two sequences of SPPF nodes to them as children. This results in the SPPF shown in Figure 6.10 being constructed.

Figure 6.10: The SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the packing nodes are created for the bookkeeping node w6.

At this point R = {(v2, S, 3, 0, w6)} and Q = {}. Processing (v2, S, 3, 0, w6) we add {(v1, w1)} to χ and, since m is greater than 2, we proceed to create a new node v9 labelled S3, a new SPPF node w7 and an edge from v9 to v1 which is labelled by w7. We then add (v1, S, 2, 0, w7), for the final reduction step of the binary sequence of reductions, to R. Finally, we call AddChildren(w7, (w6, w1), 0), which adds w6 and w1 as children of w7 and results in the SPPF shown in Figure 6.11 being constructed.

Figure 6.11: The GSS and SPPF constructed for Grammar 6.3 and input abcd by the BRNGLR ‘on-the-fly’ algorithm after the last bookkeeping node is created.

When we process (v1, S, 2, 0, w7) we trace back to node v0 and add (v0, w0) to χ. Since m = 2 we recognise that this reduction will be completed in this step, so we find the goto action p1 in T(0, S). We then create the new node v10 labelled 1 and the SPPF node w8 labelled (S, 0), which we use to label the edge between nodes v10 and v0. The subsequent call to AddChildren with the parameters (w8, (w0, w7), 0) results in the final SPPF shown in Figure 6.12 being constructed.

Figure 6.12: Standard and Binary SPPF's for Grammar 6.3 and input abcd.

6.7 Packing of bookkeeping SPPF nodes

The previous section demonstrated how the ‘on-the-fly’ BRNGLR parsing algorithm builds an SPPF during parsing. At first it may appear that in general we could pack some of the bookkeeping nodes in the SPPF more effectively. In fact we cannot do this because incorrect derivations may be introduced to the SPPF. This section highlights the subtlety of packing bookkeeping nodes by tracing the construction of a GSS and SPPF for a grammar and string that will introduce incorrect derivations if the packing is done naïvely.

Example – how to pack bookkeeping SPPF nodes

It is only possible to pack two bookkeeping SPPF nodes when they arise from different reductions whose target is the same state node. Consider Grammar 6.6, the LR(1) DFA in Figure 6.13 and RN parse table in Table 6.6. We can illustrate the consequences of incorrectly packing the bookkeeping SPPF nodes when parsing the string abc.

S′ ::= S
S ::= abB | abD | AbB
A ::= a             (6.6)
B ::= c
D ::= c

Figure 6.13: The LR(1) DFA for Grammar 6.6.

       a     b              c      $                     A     B     D     S
 0     p2                                                p3                p1
 1                                 acc
 2           p4/r(A,1,0)
 3           p5
 4                          p8                                 p6    p7
 5                          p10                                p9
 6                                 r(S,3,0)
 7                                 r(S,3,0)
 8                                 r(B,1,0)/r(D,1,0)
 9                                 r(S,3,0)
10                                 r(B,1,0)

Table 6.6: The RN parse table for Grammar 6.6.

We begin in the usual way, creating the node v0 labelled with the start state of the DFA and adding the actions from the associated parse table entry to the sets R and Q. Since p2 is the only action in T(0, a) we shift the first input symbol and create a new node v1 labelled 2, an SPPF node w0 labelled (a, 0) and an edge from v1 to v0 labelled by w0. There is a shift/reduce conflict in the parse table at T(2, b), so (v1, 4) and (v0, A, 1, 0, w0) are added to the sets Q and R respectively. Processing (v0, A, 1, 0, w0) results in the node v2 labelled 3, the SPPF node w1 labelled (A, 0) and the edge from v2 to v0 labelled by w1 being created. Since the shift action p5 is in T(3, b), (v2, 5) is added to the set Q. We then call the AddChildren function with the parameters (w1, (w0), 0), which results in the SPPF node w0 being added to w1 as a child.

At this point, no other reductions are queued in the set R so the shifts are processed from the set Q by the Shifter. This results in the new nodes v3 and v4 being created, with edges to v1 and v2 labelled by the newly created SPPF node w2. Both nodes v3 and v4 only have shift actions in their respective parse table entries so the new nodes v5 and v6 are created along with the associated SPPF node w3.

[GSS and SPPF after all three input symbols have been shifted: GSS nodes v3 (4), v4 (5), v5 (8) and v6 (10); SPPF nodes w0 = (a, 0), w1 = (A, 0), w2 = (b, 1) and w3 = (c, 2).]

At this point R = {(v3, B, 1, 0, w3), (v3, D, 1, 0, w3), (v4, B, 1, 0, w3)} and Q = {}. Processing the first element in R results in the new node v7 labelled 6, the SPPF node w4 labelled (B, 2) and the edge from v7 to v3 being created. Since the length of the reduction is one, we find the new reduction r(S, 3, 0) in T(6, $) and add (v3, S, 3, 0, w4) to R before proceeding to call AddChildren(w4, (w3), 0), which makes w3 a child of w4 in the SPPF.

When we remove (v3,D, 1, 0, w3) from R and continue to process the reduction in the Reducer, we create the new node v8 labelled 7, the SPPF node w5 labelled (D, 2) and the edge between v8 and v3 labelled by w5. The new reduction (v3, S, 3, 0, w5) is added to R and then AddChildren(w5, (w3), 0) is called which results in w3 being made a child of w5.

Then we remove (v4, B, 1, 0, w3) from R, which results in the node v9 labelled 9 being created in the GSS. Since an edge does not already exist from v9 to v4, one is created. However, because we have already created an SPPF node labelled (B, 2) while constructing the current level, which we find in the set N, we reuse it to label the new edge. In addition to this, because the parse table contains the reduction r(S, 3, 0) in T(9, $), we add (v4, S, 3, 0, w4) to R.

[GSS with v7 (6), v8 (7) and v9 (9) added; the SPPF nodes w4 = (B, 2) and w5 = (D, 2) label the new edges.]

We then continue processing the remaining reductions (v3, S, 3, 0, w4), (v3, S, 3, 0, w5) and (v4, S, 3, 0, w4) from the set R. For (v3, S, 3, 0, w4) the Reducer determines that the length of the reduction is greater than two and proceeds to perform a binary reduction. The child node and edge (v1, w2) from node v3 are found and added to the set χ. A new bookkeeping node v10 labelled S3 is then created and an edge is added between v10 and v1. The final binary reduction of the sequence, (v1, S, 2, 0, w6), is also added to R before the SPPF is updated by AddChildren(w6, (w2, w4), 0).

v8 w5 w w4 w w 7 1 5 6 A,0 B,2 D,2 v w1 10 S3 v 2 v v a,0 b,1 c,2 w 4 w 6 3 2 5 3 10 w0 w2 w3

v9 w4 9

When (v3, S, 3, 0, w5) is processed, another binary reduction is performed by the Reducer. Once again the child node and edge (v1, w2) from node v3 are found and added to the set χ. However, since there is already a node in the current level that is labelled S3, we do not create a new one, but reuse the existing one. In addition to this, because there is already an edge between v10 and v1, we also reuse the edge and SPPF node. When we finally call AddChildren(w6, (w2, w5), 0), we find that two new packing nodes need to be created as children of w6. So we add the original sequence of children to one packing node and the new sequence that is passed into AddChildren to the other.

[The SPPF node w6 now has two packing nodes as children, one for the sequence (w2, w4) and one for (w2, w5).]

When we process (v4, S, 3, 0, w4), another binary reduction is performed. We begin by setting χ = {(v2, w2)}. Since we already have a node labelled S3 in the current level of the GSS, we can use it again. However, because there is no edge that goes between v10 and v2, we create a new one and label it with a new SPPF node w7. Before calling the AddChildren function we add the remaining binary reduction (v2, S, 2, 0, w7) to R.

[GSS with the new edge from v10 to v2 labelled by the SPPF node w7.]

Processing the final two reductions, (v1, S, 2, 0, w6) and (v2, S, 2, 0, w7), held in R results in the node v11 being created in the GSS that is labelled with the accepting state of the DFA. Since v11 is in the final level of the GSS, the string abc is accepted by the parser. The final GSS and SPPF constructed by the BRNGLR parsing algorithm are shown in Figure 6.14.

Figure 6.14: The final Binary GSS and SPPF for Grammar 6.6 and input abc.

At first, it may appear that it is possible to pack the two SPPF nodes w6 and w7 to produce the more compact SPPF shown in Figure 6.15. Unfortunately if this is done the following incorrect ‘derivation’ will be included.

S′ ⇒ S ⇒ AbD ⇒ abD ⇒ abc

Figure 6.15: An incorrect Binary SPPF for Grammar 6.6 and input abc caused by packing too many additional nodes.

6.8 Summary

In this chapter we have presented an algorithm capable of parsing all context-free grammars in at most O(n^3) time and space. Two versions of the recognition algorithm were presented. The first worked on modified RN parse tables which included new states and reduction actions to ensure that reductions of at most length two were done. The second algorithm used RN parse tables and split reductions with lengths m > 2 into m − 1 reductions of length 2 ‘on-the-fly’. This did not require any modifications to be done to the parse table or the grammar, and hence did not increase the size of the parser.

The ‘on-the-fly’ recognition algorithm was then extended to a parser which is able to construct an SPPF representation of all possible derivations for a given input string in at most cubic time and space. Proofs of the correctness of the algorithms and their complexity analysis can be found in [SJE03]. Chapter 10 presents the experimental results for grammars which trigger worst case performance and several programming language grammars and strings. The results corroborate the complexity analysis and provide an encouraging comparison between BRNGLR and other general parsing algorithms.

The next chapter presents another general parsing algorithm that achieves cubic worst case time complexity by minimising the amount of stack activity that is done during parsing.

Chapter 7

Reduction Incorporated Generalised LR parsing

The most efficient general parsing algorithm to date achieves O(n^2.376) worst case time complexity. Unfortunately, as already discussed in the previous chapter, the high constants of proportionality associated with this technique make it impractical for all but the longest strings.

This thesis focuses on the general parsing algorithms based on Tomita's GLR parsing technique. The initial goal of GLR parsing was to provide an efficient algorithm for "practical natural language grammars" [Tom86] by exploiting the efficiency of Knuth's deterministic LR parser. The BRNGLR algorithm, presented in Chapter 6, is a GLR parsing algorithm that achieves O(n^3) worst case complexity. Although the worst case is not always reached, the run time of the algorithm is far from ideal, especially when compared to the deterministic techniques.

Since the run time of a parser is highly visible to a user, there have been several attempts at speeding up the run time of LR parsers [BP98, HW90, Pen86, Pfa90]. Most of these have focused on achieving speed ups by implementing the handle finding automaton in low-level code. A different approach to improving efficiency is presented in [AH99, AHJM01], the basic ethos of which is to reduce the reliance on the stack. It is clear that the run time performance of shift-reduce parsers is dominated by the maintenance of the parse stack. The recognition of regular languages, defined by regular expressions, is much more efficient than the parsing of context-free languages, defined by context-free grammars, because only the current state needs to be stored. Informally, Aycock and Horspool's idea uses FA based recognition techniques for the regular parts of a grammar and only uses a stack for the parts of the grammar which are not regular. (We shall define this formally in detail below.)

Unfortunately, as the authors point out, the algorithm presented in [AH99] fails to terminate on grammars that contain hidden-left recursion. This chapter presents the Reduction Incorporated Generalised LR (RIGLR) algorithm that is based on the same approach taken by Aycock and Horspool, but which can be used to parse all context-free grammars correctly. As part of the work in this thesis, the RIGLR algorithm was implemented for comparison with the RNGLR and BRNGLR algorithms. The theoretical description of the RIGLR algorithm given here is taken primarily from [SJ03b].

We begin by discussing the role of the stack in a shift-reduce parser and then show how to split a grammar, Γ, into several new grammars for some of the non-terminals that define a regular part of Γ. We then describe the construction of the Intermediate Reduction Incorporated Automata (IRIA) that accept the strings in the language of these regular parts, and then use the subset construction algorithm to construct the more deterministic Reduction Incorporated Automata (RIA). Combining the separate RIA to produce the Recursion Call Automaton (RCA), we can recognise the strings in the language of Γ with less stack activity than the GLR parsing techniques. We then introduce the RIGLR algorithm that uses a similar structure to Tomita's GSS to parse all context-free grammars. The chapter concludes with a discussion on the construction of derivation trees for this algorithm.

7.1 The role of the stack in bottom-up parsers

The DFAs associated with the deterministic bottom-up shift-reduce parsers act as handle recognisers for parsing algorithms. A parse is carried out by reading the input string one symbol at a time and performing a traversal through the DFA from its start state. If an accept state is reached from the input consumed, then the leftmost handle of the input has been located. At this point the parser replaces the string of symbols in the input that match the right hand side of the handle's production rule with the non-terminal on the left hand side of the production rule. Parsing then resumes with the modified input string from the start state of the DFA. If an accept state is reached and the start symbol is the only symbol in the input string, then the original string is accepted by the parser.

The approach of repeatedly feeding the input string into the DFA is clearly inefficient: the initial portion of the input may be read several times before its handle is found. Consequently a stack is used, onto which symbols are pushed when they are read and popped off when a handle is found. This prevents the initial portion of the input being repeatedly read.

Many parser generators employ an extended BNF notation (EBNF), which includes regular expression operators, to encourage structures to be defined in a way that can be efficiently implemented. Although the stack improves a parser's run time, there are overheads associated with its use, so it is not uncommon for language developers to try to optimise its use; left recursion is preferred to right recursion in bottom-up parsers as the latter causes a deep stack to be created whereas the former

yields a shallow stack. For example, consider the two grammars defined below. Both accept all strings of as, but the one on the left uses left recursion whereas the one on the right uses right recursion.

    S' ::= S            S' ::= S
    S  ::= Sa | ε       S  ::= aS | ε


Figure 7.1: The LR(0) DFAs for the left and right recursive grammars given above.

A trace of the stacks used during a parse of the string aaaa, for both grammars defined above, is shown in Table 7.1. The parse of the right recursive grammar shows that the parser needs to remember the entire left context so that it can perform the correct number of reductions. This results in a deeper stack being created.

    Left recursion                      Right recursion
    Stack          Input  Action       Stack               Input  Action
    $0             a      r2           $0                  a      s2
    $0S1           a      s2           $0a2                a      s2
    $0S1a2         a      r1           $0a2a2              a      s2
    $0S1           a      s2           $0a2a2a2            a      s2
    $0S1a2         a      r1           $0a2a2a2a2          $      r2
    $0S1           a      s2           $0a2a2a2a2S3        $      r1
    $0S1a2         a      r1           $0a2a2a2S3          $      r1
    $0S1           a      s2           $0a2a2S3            $      r1
    $0S1a2         $      r1           $0a2S3              $      r1
    $0S1           $      acc          $0S1                $      acc

Table 7.1: Trace of LR(0) parse of input aaaa for the left and right recursive grammars defined above.
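The difference in stack behaviour can be reproduced with a small simulation. The following sketch drives hypothetical encodings of the two LR(0) automata of Figure 7.1 (the table encodings and names below are our own, not the thesis's) and reports the deepest stack each parser builds.

    # Hypothetical encodings of the two LR(0) automata of Figure 7.1.
    # An action is ('s', state), ('r', lhs, rhs_length) or ('acc',);
    # 'goto' maps (state, non-terminal) to a state.
    LEFT = {
        'action': {(0, 'a'): ('r', 'S', 0), (0, '$'): ('r', 'S', 0),
                   (1, 'a'): ('s', 2),      (1, '$'): ('acc',),
                   (2, 'a'): ('r', 'S', 2), (2, '$'): ('r', 'S', 2)},
        'goto':   {(0, 'S'): 1},
    }
    RIGHT = {
        'action': {(0, 'a'): ('s', 2),      (0, '$'): ('r', 'S', 0),
                   (2, 'a'): ('s', 2),      (2, '$'): ('r', 'S', 0),
                   (3, 'a'): ('r', 'S', 2), (3, '$'): ('r', 'S', 2),
                   (1, '$'): ('acc',)},
        'goto':   {(0, 'S'): 1, (2, 'S'): 3},
    }

    def max_stack(table, s):
        stack, i, deepest = [0], 0, 1
        w = s + '$'
        while True:
            act = table['action'][stack[-1], w[i]]
            if act[0] == 'acc':
                return deepest
            if act[0] == 's':                 # shift: push the new state
                stack.append(act[1]); i += 1
            else:                             # reduce: pop |rhs| states, then goto
                _, lhs, n = act
                if n:
                    del stack[-n:]
                stack.append(table['goto'][stack[-1], lhs])
            deepest = max(deepest, len(stack))

    print(max_stack(LEFT, 'aaaa'), max_stack(RIGHT, 'aaaa'))   # prints 3 6

The left recursive parser never stacks more than three entries, while the right recursive parser's stack grows with the length of the input, matching the trace in Table 7.1.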

Since a grammar's DFA is deterministic, one might expect that a path can simply be re-traced once a handle is found, without the use of a stack. If this were the case then

we could add an ε-edge from each state with a reduction (X ::= α·) to the state at the end of the reduction path, the target of the edge labelled X. For example, consider Grammar 7.1 and the NFA shown in Figure 7.2. We have added reduction transitions, labelled Ri where i is the rule number of the production used in the reduction, to the DFA states that contain items of the form (X ::= α·).

    0. S' ::= S
    1. S  ::= abA        (7.1)
    2. A  ::= c


Figure 7.2: The NFA with R-edges for Grammar 7.1.
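Because every reduction in Figure 7.2 has a unique target, this automaton really can be traversed without a stack. The sketch below, with state numbers assumed from the figure, follows terminal edges and then exhausts R-edges after each shift.

    # A sketch of stackless traversal of the R-edge NFA for Grammar 7.1
    # (state numbers assumed from Figure 7.2).
    EDGES  = {(0, 'a'): 2, (2, 'b'): 3, (3, 'c'): 4}
    R      = {4: 5, 5: 1}           # the R2 and R1 edges
    ACCEPT = 1

    def accepts(s):
        state = 0
        for x in s:
            if (state, x) not in EDGES:
                return False
            state = EDGES[state, x]
            while state in R:       # perform reductions by following R-edges
                state = R[state]
        return state == ACCEPT

    print(accepts('abc'))           # True, and no stack was needed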

Recall from the Chomsky hierarchy in Chapter 2 that an FA can be used to recognise languages defined by regular grammars. Although Grammar 7.1 is context-free, it defines a regular language. It is therefore not a big surprise that it can be parsed without the use of a stack. Unfortunately, the problem of deciding whether the language of a context-free grammar is regular is undecidable [AU73] and, in any case, certain grammars that satisfy this property are difficult to parse without a stack. For example, consider Grammar 7.2, which defines the regular language {bad, dac}, and the NFA shown in Figure 7.3.

    0. S' ::= S
    1. S  ::= bAd
    2. S  ::= dAc        (7.2)
    3. A  ::= a

Figure 7.3: The NFA with R-edges for Grammar 7.2.

Using the NFA above it is possible to reach state 6 after consuming ba or da. The state that we should move to after performing the reduction in state 6 depends upon the path that we have taken to get there. After shifting ba we should move to state 4, but after shifting da we should move to state 5. However, because there are two R-edges leaving state 6 we do not know which one to follow. Using the NFA in Figure 7.3 it is therefore possible to accept the strings bac and dad, which are not in the language of Grammar 7.2. This is caused by certain multiple instances of non-terminals occurring on the right hand side of production rules. In the example above, it is the non-terminal A that causes the problem. In a standard LR parser, the stack ensures that the correct path is re-traced in such instances, preventing incorrect strings from being accepted.

Although one may expect that it is possible to recognise some grammars with multiple instances of non-terminals, since they can be regular and by definition accepted by an FA alone, there are some grammars that contain inherently context-free structures. For example, self embedded recursion is a context-free structure that cannot be recognised by an FA alone. Consider Grammar 7.3 and the NFA constructed with the R-edges shown in Figure 7.4.

    0. S' ::= S
    1. S  ::= bSd        (7.3)
    2. S  ::= a

Figure 7.4: The NFA with R-edges for Grammar 7.3.

Grammar 7.3 defines the language which contains strings of the form b^k a d^k for any k ≥ 0. In other words it accepts strings with an equal, or balanced, number of bs and ds. Although the NFA in Figure 7.4 correctly recognises these strings, it also accepts strings of the form b^i a d^j where i ≥ 0 and j ≥ 0, which are not in the language defined by the grammar. The problem with using an NFA without a stack to recognise a context-free language is caused by the NFA not being able to 'remember' what it has already seen. A stack can be used to ensure that once k bs have been shifted, the parser will reduce exactly k times.
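The point can be made concrete with a small sketch: the stackless NFA of Figure 7.4 effectively recognises the regular superset b*ad*, while a single counter, standing in for the job the stack does, restores b^k a d^k. The function names below are ours.

    # What the stackless NFA of Figure 7.4 effectively recognises, versus a
    # counter-checked recogniser for b^k a d^k.
    import re

    def fa_accepts(s):                  # the regular superset b*ad*
        return re.fullmatch(r'b*ad*', s) is not None

    def with_counter(s):                # remember how many bs were shifted
        k, i = 0, 0
        while i < len(s) and s[i] == 'b':
            k += 1; i += 1
        if i >= len(s) or s[i] != 'a':
            return False
        return s[i+1:] == 'd' * k       # require exactly k ds

    print(fa_accepts('bbad'), with_counter('bbad'))   # True False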

7.2 Constructing the Intermediate Reduction Incorporated Automata

We have seen in the previous section that a stack is an important part of any shift-reduce parser. A stack guarantees that:

• when there are multiple instances of non-terminals on the right hand side of the production rules, the parser will move to the correct state after performing a reduction;

• when an instance of self embedded recursion, A ⇒+ αAβ, is encountered during a parse, the number of matches to α equals the number of matches to β.

Although a stack is necessary to correctly recognise the portions of a derivation that rely on the self embedded non-terminals, we can deal with the multiple instances of non-terminals by 'multiplying out' some of the states in the FA. Recall, from Chapter 2, that we construct the LR(0) NFA of a grammar by first creating the individual automata for each of the production rules and then joining them together with ε-edges when a state contains an item with a dot before a non-terminal. For each occurrence of a non-terminal that is encountered in an item, we create a new NFA

for that non-terminal's production rule. We now extend this approach by adding extra NFA states for multiple instances of non-terminals. For example, consider Grammar 2.7, on page 31, and the NFA that has been 'multiplied out' in Figure 7.5.

Figure 7.5: The IRIA of Grammar 2.7.

However, if we multiplied out all instances of non-terminals in this way, any recursive rules would result in an infinite number of states being created. For this reason, a recursive instance of a non-terminal B, in a state that contains an item of the form (A ::= α · Bβ), has an ε-edge back to the most recent state, on a path from the start state to the current state, that contains an item of the form (B ::= ·γ). For example, consider Grammar 7.4 and the IRIA shown in Figure 7.6.

    0. S' ::= S
    1. S  ::= aS
    2. S  ::= bA        (7.4)
    3. A  ::= Ac
    4. A  ::= ε

Figure 7.6: The IRIA of Grammar 7.4.

Since the rule S ::= aS is right recursive, the state containing the item (S ::= a·S) has ε-edges going back to the states labelled by the items (S ::= ·aS) and (S ::= ·bA). We call the edges which are not created as a result of recursion primary edges. Recall from Chapter 2 that if a grammar contains a non-terminal A such that A ⇒+ αAβ, where α, β ≠ ε, then the grammar contains self-embedding. A formal definition of the IRIA construction algorithm, taken from [SJ], is given below. It has been proven in [SJ] that for a grammar, Γ, that does not have any self embedded recursion, this algorithm will construct an IRIA that accepts precisely the sentential forms of Γ.

IRIA construction algorithm

Given an augmented grammar Γ (without self embedded recursion) we construct an FA, IRIA(Γ), as follows:

Step 1: Create a node labelled S' ::= ·S.

Step 2: While there are nodes in the FA which are not marked as dealt with, carry out the following:

1. Pick a node K labelled (X ::= µ · γ) which is not marked as dealt with.

2. If γ ≠ ε then let γ = xγ' where x ∈ N ∪ T, create a new node, M, labelled X ::= µx · γ', and add an arrow labelled x from K to M. This arrow is defined to be a primary edge.

3. If x = Y , where Y is a non-terminal, for each rule Y ::= δ:

(a) if there is a node L, labelled Y ::= ·δ, and a path θ from L to K which consists of only primary edges and primary ε-edges (θ may be empty),

add an arrow labelled ε from K to L. (This new edge is not a primary ε-edge.)

(b) if (a) does not hold, create a new node with label Y ::= ·δ and add an arrow labelled ε from K to this new node. This is defined to be a primary ε-edge.

4. Mark K as dealt with.

Step 3: Remove all the 'dealt with' marks from all nodes.

Step 4: While there are nodes labelled Y ::= γ· that are not dealt with: pick a node K labelled X ::= x1 · · · xn· which is not marked as dealt with. Let X ::= x1 · · · xn be rule i. If X ≠ S' then find each node L labelled Z ::= δ · Xρ such that there is a path labelled (ε, x1, · · · , xn) from L to K, then add an arrow labelled Ri from K to the child of L labelled Z ::= δX · ρ. Mark K as dealt with. The new edge is called a reduction edge, and if the first (ε-labelled) edge of the corresponding path is a primary edge then this new edge is defined to be a primary reduction-edge.

Step 5: Mark the node labelled S' ::= ·S as the start node and mark the node labelled S' ::= S· as the accepting node.

7.3 Reducing non-determinism in the IRIA

It is possible to use an IRIA to guide a parser through a derivation for a given string, but since the automaton is non-deterministic, the parser will encounter a choice of actions in certain states. This section presents the Reduction Incorporated Automaton (RIA), a more deterministic automaton than the IRIA, which is constructed from the IRIA with the use of the subset construction algorithm.

There are four types of edge in an IRIA: those labelled by terminal symbols, those labelled by non-terminal symbols, the ε-edges and the R-edges. Once the R-edges have been created in the IRIA, all the edges labelled with a non-terminal can be removed because they will not be traversed during a parse. The R-edges are used to locate the target state of a reduction. Since no terminal symbols are consumed during their traversal, it is tempting to treat them as ε-edges during the subset construction. However, since many applications are required to produce a derivation of the input after a parse is complete, we cannot simply combine the states that can be reached by R-edges that are labelled with different reductions. Instead we treat them in the same way that we treat the edges labelled by the terminal symbols.

An RIA is constructed from an IRIA by removing the edges labelled by the non-terminal symbols and then performing the subset construction, treating the R-edges

as non-empty edges. For example, the RIA in Figure 7.7 was constructed in this way from the IRIA in Figure 7.6.


Figure 7.7: The RIA of Grammar 7.4.
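The construction just described is, in essence, the classical subset construction with a modified alphabet. The sketch below, using an assumed edge representation of our own, closes only over the ε-edges and treats terminal labels and reduction labels Ri alike as consuming symbols.

    # A sketch of the RIA construction step: subset construction over an IRIA
    # whose non-terminal edges have already been deleted; labels are terminals
    # or reduction names such as 'R1', and only epsilon-edges are closed over.
    from collections import defaultdict

    def eps_closure(states, eps):
        # eps: state -> iterable of states reachable by one epsilon-edge
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in eps.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return frozenset(seen)

    def subset_construct(start, eps, edges):
        # edges: (state, label) -> set of targets; a label is a terminal or 'Ri'
        start_set = eps_closure({start}, eps)
        table, work = {}, [start_set]
        while work:
            S = work.pop()
            if S in table:
                continue
            moves = defaultdict(set)
            for (p, x), targets in edges.items():
                if p in S:
                    moves[x] |= targets
            table[S] = {x: eps_closure(ts, eps) for x, ts in moves.items()}
            work.extend(table[S].values())
        return start_set, table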

7.4 Regular recognition

This section describes how to construct the PDA of a context-free grammar Γ that recognises strings in the language of Γ with a reduced amount of stack activity compared to GLR recognisers. The approach we take is an extension of the method described by Aycock and Horspool in [AH99]. The description of the algorithm is taken from [SJ03b, JS03, SJ].

7.4.1 Recursion call automata

Recall that a grammar has self embedded recursion if it contains a non-terminal A such that A ⇒+ αAβ, where α, β ≠ ε. The structures expressed by self embedded recursion are inherently context-free and hence require a stack to be used to ensure that only valid strings are recognised.

We have already established that the amount of stack activity used during recognition can be reduced if we only use it to recognise these context-free structures. By locating the places where self embedded recursion occurs, we can build an automaton that only uses a stack at these places. This automaton is called the Recursion Call Automaton (RCA).

To construct an RCA, we first need to break any self embedded recursion in the grammar. We can achieve this by effectively 'terminalising' the non-terminals that appear in the self embedded production rules. We replace a non-terminal A in a production rule of the form X ::= αAβ, that has a derivation A ⇒+ αAβ, with the special terminal symbol A⊥, in a way that breaks the derivation. We call a grammar Γ that has had all instances of self embedded recursion removed in this way a derived grammar of Γ and denote it by ΓS. In order to ensure that only the correct derivations of a string are produced (see Section 7.8), we require that any hidden-left recursion is also removed from a

grammar before the RCA is constructed. We call the grammar that does not contain any self embedded recursion or hidden-left recursion the derived parser grammar of Γ. To build the RCA for a derived parser grammar of a context-free grammar Γ, we build a separate RIA for each of the non-terminals defined in Γ and then link them together. For each of the non-terminals A (except S' and S) we create a new rule

SA ::= A in Γ and consider the grammar ΓA, which has the same rules as Γ but with the new start rule SA ::= A. We then construct the IRIA and RIA for each

ΓA. Once all of the separate, disjoint, RIA have been created, we link them together by removing each edge, from a state h to a state k, that is labelled A⊥ and adding a new edge labelled p(k) from h to the start state of RIA(ΓA). In addition to this, all the accept states of the RIA are labelled with a pop. The start state and accepting states of the RCA are the same as the start and accept states of RIA(ΓS). For example, consider Grammar 7.5.

    0. S' ::= S
    1. S  ::= cA        (7.5)
    2. A  ::= bAd
    3. A  ::= a

Since the non-terminal A is self embedded, our first step is to terminalise it to produce Grammar 7.6.

    0. S' ::= S
    1. S  ::= cA        (7.6)
    2. A  ::= bA⊥d
    3. A  ::= a


Figure 7.8: The IRIA(ΓS) and IRIA(ΓA) of Grammar 7.6.


Figure 7.9: The RIA(ΓS) and RIA(ΓA) of Grammar 7.6.


Figure 7.10: The RCA of Grammar 7.6.

It has been proven in [SJ] that a string u is in the language of a context-free grammar Γ only if the RCA of Γ accepts u.
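A minimal sketch of the linking step described above, under an assumed edge-list representation of our own, might look as follows: each edge labelled by a terminalised symbol is redirected, as a push edge, to the start state of the corresponding RIA, and the RIA accept states become pops.

    # A sketch of linking the separate RIA into an RCA; edges are triples
    # (source state, label, target state) and the dictionary arguments map a
    # non-terminal A to the start/accept state of RIA(Gamma_A).
    def link_rias(edges, ria_start, ria_accept):
        rca_edges = []
        for h, label, k in edges:
            if label in ria_start:                  # an A-bot labelled edge
                rca_edges.append((h, ('p', k), ria_start[label]))
            else:                                   # terminal or R-edge: keep
                rca_edges.append((h, label, k))
        pops = set(ria_accept.values())             # accept states become pops
        return rca_edges, pops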

7.4.2 Parse table representation of RCA

It is often convenient to represent an RCA(Γ) as a parse table, T (Γ), where the rows of the table are labelled by the states of the automaton and the columns by the terminal symbols of Γ and the $ symbol. The parse table entries contain sets of actions corresponding to the actions associated with the states and edges of the RCA. For all edges from state h to state k, if the edge is labelled by:

• a terminal x, then sk is in T (h, x);

• Ri, then R(i, k) is in all the columns of row h in T ;

• p(l), then p(l, k) is in all the columns of row h in T ;

(In this version no lookahead is being employed.) In addition to the actions above, if a state h in the RCA is labelled pop, then every column of row h in T also contains a pop action. So, for example, the parse table of the RCA in Figure 7.10 is shown in Table 7.2.

    State  a        b        c        d        $        A        S
    0                        s1
    1      s3       s2
    2      p(4,8)   p(4,8)   p(4,8)   p(4,8)   p(4,8)   p(4,8)   p(4,8)
    3      R(3,6)   R(3,6)   R(3,6)   R(3,6)   R(3,6)   R(3,6)   R(3,6)
    4                                 s5
    5      R(2,6)   R(2,6)   R(2,6)   R(2,6)   R(2,6)   R(2,6)   R(2,6)
    6      R(1,7)   R(1,7)   R(1,7)   R(1,7)   R(1,7)   R(1,7)   R(1,7)
    7                                          acc
    8      s10      s9
    9      p(11,8)  p(11,8)  p(11,8)  p(11,8)  p(11,8)  p(11,8)  p(11,8)
    10     R(3,13)  R(3,13)  R(3,13)  R(3,13)  R(3,13)  R(3,13)  R(3,13)
    11                                s12
    12     R(2,13)  R(2,13)  R(2,13)  R(2,13)  R(2,13)  R(2,13)  R(2,13)
    13     pop      pop      pop      pop      pop      pop      pop

Table 7.2: The parse table of the RCA in Figure 7.10.
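When at most one action applies, as happens throughout Table 7.2, the table can be driven with nothing more than an explicit stack of return states in place of the RCG machinery introduced below. The sketch uses an assumed encoding of the table; this shortcut is sufficient only because this particular RCA is deterministic.

    # A sketch of a deterministic run over Table 7.2 (assumed encoding).
    SHIFT  = {(0, 'c'): 1, (1, 'a'): 3, (1, 'b'): 2, (4, 'd'): 5,
              (8, 'a'): 10, (8, 'b'): 9, (11, 'd'): 12}
    REDUCE = {3: 6, 5: 6, 6: 7, 10: 13, 12: 13}   # state -> k of its R(j, k)
    PUSH   = {2: (4, 8), 9: (11, 8)}              # state -> (l, k) of p(l, k)
    POP, ACCEPT = {13}, {7}

    def run(s):
        state, stack, i = 0, [], 0
        w = s + '$'
        while True:
            if state in ACCEPT and w[i] == '$':
                return True
            if (state, w[i]) in SHIFT:
                state = SHIFT[state, w[i]]; i += 1
            elif state in REDUCE:
                state = REDUCE[state]
            elif state in PUSH:                   # remember the return state
                l, k = PUSH[state]; stack.append(l); state = k
            elif state in POP and stack:          # goto the remembered state
                state = stack.pop()
            else:
                return False

    print(run('cbad'), run('cbbadd'), run('cbd'))   # True True False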

7.5 Generalised regular recognition

This section introduces the RIGLR recognition algorithm, which finds a traversal of an RCA for a string a1 . . . an if one exists. We begin by providing an informal description of the algorithm and then discuss some specific example grammars that need to be handled with care to ensure that a correct parse is achieved. There is a formal definition of the algorithm at the end of the section.

If there is a traversal through an automaton for a given string, then that string is in the language defined by the automaton. A straightforward way of determining whether a string is in the language defined by an automaton is to traverse the automaton until all the input has been consumed and an accept state is reached. We can take this approach to traverse an RCA, but since it can be non-deterministic, there may be more than one traversal through the automaton that leads to an accept state for a given string. We are interested in finding all such paths, so we employ a breadth-first search approach to follow all possible traversals when a choice arises.

A straightforward approach to traversing such a non-deterministic automaton is to maintain a set of states that can be reached by traversing the edges that do not consume any input symbols; in this case, the edges labelled by R. We achieve this by maintaining a set Ui, where 0 ≤ i ≤ n, during the parse of a string a1 . . . an.

We begin by constructing the set U0 that contains the start state of the RCA and then add, in a similar way to the standard subset construction algorithm [ASU86],

all the states that can be reached by traversing the edges from the start state that do not consume any input symbols. When no more states can be added to the set U0 its construction is complete. We then proceed to create the set Ui+1 from Ui by adding the states that can be reached by traversing an edge labelled with the current input symbol, ai+1, from each state in Ui. An input string is accepted if, after consuming all input symbols, the set Un contains the RCA's accept state.

This approach only works if the RCA's underlying language does not contain any nested structures. If it does, then the RCA will contain push transitions labelled p(X), where X is a state number. When such an edge is traversed it is necessary to remember X, since a state containing a pop action will eventually be reached that requires the parser to goto X. It is tempting to store the return state in the set U, along with the action's source state, but the possibility of nested pushes and multiple paths caused by the non-determinism would make this approach inefficient. We take the approach described by Aycock and Horspool in [AH99] and use a graph structure, similar to Tomita's GSS (see Chapter 4), to record the return states of the push actions. We call this graph structure the Recursive Call Graph (RCG). When we encounter a push action during the traversal of the RCA, with a return state l and a target state k, we create a node q labelled l in the RCG and add the pair (k, q) to the current set Ui.

Consider Grammar 7.5 and the RCA constructed in Section 7.4. We parse the string cbad by first creating the base node, q0, of the RCG, labelled -1 (which is not the state number of any RCA state), and then adding the element (0, q0) to the set U0. Since the only edge that leaves state 0 is labelled by the terminal c, which matches the first symbol of the input string, we move to state 1 and construct the new set

U1 = {(1, q0)}.

RCG: q0 [-1]
U1 = {(1, q0)}

There are two edges from state 1, but since they are both labelled by terminal symbols we do not add anything to U1 in this step. The next input symbol is b, so we traverse the edge to state 2 and create U2 = {(2, q0)}. Since there is an edge labelled p(4) from state 2, we create the new RCG node, q1, labelled 4, and then add (8, q1) to U2.

RCG: q1 [4] → q0 [-1]
U2 = {(2, q0), (8, q1)}

The only edges leaving state 8 are labelled by terminal symbols, so we shift on the next input symbol, which takes us from state 8 to state 10. We traverse the R-edges so that U3 = {(10, q1), (13, q1)}. Since state 13 contains a pop action, we pop 4, which is the label of node q1 in the RCG, and add the element (4, q0) to U3.

At this point we have U3 = {(10, q1), (13, q1), (4, q0)} and, since there are no more edges that can be traversed without consuming any input symbols, the third step of the algorithm is complete. We then read the final symbol from the input string and traverse the edge labelled d to state 5, creating U4 = {(5, q0)}. We traverse the R-edges and construct U4 = {(5, q0), (6, q0), (7, q0)}. Since the next input symbol is the end-of-string symbol, $, and U4 contains the element (7, q0), which has the accept state of the RCA and the base node of the stack, the input cbad is accepted.

Before we give the formal definition of the algorithm, we will discuss and show the construction of the RCG for three example grammars that can cause problems if they are not handled with care.

Example – right and hidden-left recursion

Right and hidden-left recursive grammars cause loops of reductions to be created in the RCA. The RIGLR algorithm works by first doing all the reductions that are possible from a state in the RCA before doing any of the other actions. When there are loops of reductions in the RCA, care needs to be taken to ensure that the traversal algorithm terminates. To ensure this we only add a pair (k, q) to the set Ui once in each step of the algorithm. For example, consider the right recursive Grammar 7.4 on page 149, and the RCA shown in Figure 7.11 that has been constructed from the RIA in Figure 7.7 on page 152.


Figure 7.11: The RCA of Grammar 7.4.

To parse the string ab we begin by constructing the base node of the RCG, labelled -1, and add (0, q0) to the set U0. Since there are no push or pop actions in the RCA, only the base node of the RCG will be used during the parse; as a result we shall not show the RCG during this example.

The first input symbol is a, so we traverse the edge from state 0 to state 1 and add (1, q0) to U1. Since the only transitions from state 1 are labelled by terminal symbols, we read the final input symbol, b, and traverse the edge to state 2. At this point U2 = {(2, q0)}. We continue by traversing the edge labelled R4 to state 3 and from there the R2 edge to the accepting state 5, adding (3, q0) and (5, q0) to U2. Although we can accept the input at this point, there is an edge labelled R1 from the accepting state which still needs to be traversed. However, since the edge loops back to the same state without consuming any input symbols, we run the risk of repeatedly adding (5, q0) to U2. For this reason, we do not remove elements from Ui once their state has been processed, and we ensure that no element is added to Ui more than once.
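This termination guard can be stated as a two-line helper: a process enters Ui (and the worklist) at most once per step, so a loop of R-edges cannot make the traversal grow without bound. A sketch, with our own names:

    # The duplicate guard: a process is scheduled at most once per step.
    def add_process(process, Ui, worklist):
        if process not in Ui:
            Ui.add(process)
            worklist.append(process)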

Example – further issues surrounding hidden-left recursion

When a push action is encountered, the traversal algorithm adds a new node to the RCG. If a grammar has a self embedded, hidden-left recursive non-terminal, the traversal algorithm will fail to terminate if a new node is added for every push action. This is because the RCA contains a loop that performs a push without consuming any input symbols. To prevent the algorithm from failing to terminate, we take a similar approach to Farshi's modification of Tomita's algorithm: we introduce loops in the RCG.

To achieve this we maintain a list, Pi, of the RCG nodes constructed in each step of the algorithm. If a node with the same label has already been constructed we re-use it. Pi is initialised with the base node and is cleared after an input symbol is read. (To help to see what is going on, when we draw an RCG we put nodes constructed at the same step of the algorithm vertically above each other.) So, for example, consider the parse of the string abc in the language of Grammar 7.7 [SJ03b]. Notice that the non-terminal S is both self embedded and hidden-left recursive.

    0. S' ::= S
    1. S  ::= ASc
    2. S  ::= b        (7.7)
    3. A  ::= a
    4. A  ::= ε

Since the grammar contains self embedded recursion, we terminalise the grammar to produce Grammar 7.8. The IRIA, RIA and RCA constructed for Grammar 7.8 are shown in Figures 7.12, 7.13 and 7.14 respectively.

    0. S' ::= S
    1. S  ::= AS⊥c
    2. S  ::= b        (7.8)
    3. A  ::= a
    4. A  ::= ε


Figure 7.12: The IRIA(ΓS) of Grammar 7.8.


Figure 7.13: The RIA(ΓS) of Grammar 7.8.


Figure 7.14: The RCA of Grammar 7.8.

We begin the parse by creating the base node of the RCG and adding the element (0, q0) to the set U0 and to P0. Before consuming any of the input string it is necessary to traverse any edges labelled by push or R actions. From state 0 we traverse the edge labelled R4 to state 3, add (3, q0) to U0 and then traverse the edge labelled by p(4). The push action results in a new node, q1, labelled 4, being created in the RCG with an edge back to node q0. The state of the RCG and the contents of the sets U0 and P0 at this point of the parse are shown below.

RCG: q1 [4] → q0 [-1]
U0 = {(0, q0), (3, q0), (0, q1)}
P0 = {(0, q0), (0, q1)}

It is necessary to traverse the reduction transition R4 from state 0 once again, for the process (0, q1). Performing the reduction we add (3, q1) to U0 and then proceed to traverse the push transition p(4). Since there is already a node, q1, labelled 4 in P0, we re-use it and simply add a loop to it, as shown below.

RCG: q1 [4] → q0 [-1], with a loop on q1
U0 = {(0, q0), (3, q0), (0, q1), (3, q1)}
P0 = {(0, q0), (0, q1)}

There are no more edges that can be traversed from the states in U0 without consuming any input symbols, so the first step of the algorithm is complete. We read the next input symbol, a, and construct U1 = {(2, q0), (2, q1)} and set P1 = ∅.

We then traverse the edge labelled R3 to state 3 and add (3, q0) and (3, q1) to U1.

From state 3 there is the edge labelled p(4), so we create a new RCG node, q2, labelled 4, with edges back to q0 and q1, and add (0, q2) to U1 and P1.

RCG: q1 [4] → q0 (loop on q1); q2 [4] → q0, q1
U1 = {(2, q0), (2, q1), (3, q0), (3, q1), (0, q2)}
P1 = {(0, q2)}

From state 0 there is a reduction edge labelled R4 that goes back to state 3. We traverse the edge, add (3, q2) to U1 and then traverse the push edge once again. Since node q2, which is labelled 4, has already been created during this step of the algorithm, we add a loop to it and do not create any new nodes.

RCG: as above, with a loop added on q2
U1 = {(2, q0), (2, q1), (3, q0), (3, q1), (0, q2), (3, q2)}
P1 = {(0, q2)}

We proceed by reading the next input symbol, b, performing the traversal from state 0 to state 1, and then construct U2 = {(1, q2)} and set P2 = ∅. We then traverse the reduction edge labelled R2 from state 1 to state 5 and add (5, q2) to U2. Since state 5 contains a pop action, we find the children, q0, q1, q2, of the RCG node q2 and add the new elements (4, q0), (4, q1) and (4, q2) to U2. That completes step 2 of the algorithm, so we read the final input symbol, c, traverse the edge labelled c from state 4 to state 6 and construct U3 = {(6, q0), (6, q1), (6, q2)} and set P3 = ∅. Traversing the reduction, R1, from state 6 to state 5, we add

(5, q0), (5, q1) and (5, q2) to U3. We then perform the pop actions associated with state 5 of the RCA for (5, q1) and (5, q2), which result in (4, q0) and (4, q1) respectively being added to U3. No pop is done for (5, q0) since q0 is the base node of the RCG. At this point U3 = {(6, q0), (6, q1), (6, q2), (5, q0), (5, q1), (5, q2), (4, q0), (4, q1)}.

Since all the input has been consumed and the process (5, q0) is in U3, where state 5 is the accepting state of the RCA and q0 is the RCG's base node, the input string abc is accepted.

7.5.1 Example – ensuring all pop actions are done

When a pop action is performed by the algorithm on a node q, with label h, that has an edge to another node p in the RCG, the element (h, p) is added to the set Ui. If a new edge is added from node q to another node w, in the same step of the algorithm, then we need to perform the pop action down this new edge. To ensure that this is done, when we add a new edge between q and w, we check to see if Ui contains a process which results in a pop action being performed from q. If such a process exists then we make sure that Ui contains the process (h, w). For example, consider Grammar 7.9, taken from [SJ03b], and the terminalised version, Grammar 7.10. The associated IRIA, RIA and RCA are shown in Figures 7.15, 7.16 and 7.17 respectively.

    0. S' ::= S
    1. S  ::= SSSb        (7.9)
    2. S  ::= ε

    0. S' ::= S
    1. S  ::= SS⊥S⊥b        (7.10)
    2. S  ::= ε


Figure 7.15: The IRIA(ΓS) of Grammar 7.10.


Figure 7.16: The RIA(ΓS) of Grammar 7.10.


Figure 7.17: The RCA(ΓS) of Grammar 7.10.

To parse the string b we begin by creating the base node of the RCG, q0, and add the process (0, q0) to U0 and P0. From state 0 in the RCA we move to state 1 on a transition labelled R2 and add the process (1, q0) to U0. Although state 1 contains a pop action, nothing is done since q0 does not have any children. We then traverse the edge labelled p(2) back to state 0, create the new RCG node, q1, with an edge to q0, and add (0, q1) to U0 and P0.

RCG: q1 [2] → q0 [-1]
U0 = {(0, q0), (1, q0), (0, q1)}
P0 = {(0, q0), (0, q1)}

From state 0 we traverse the R2 edge to state 1 and add the process (1, q1) to U0. At this point we can perform the pop action associated with state 1. We find the only child of q1, q0, and add (2, q0) to U0. Traversing the edge labelled p(2) from state 1 to state 0, we first search P0 to see if a node labelled 2 has been created during this step of the algorithm. Since q1 is in P0 we re-use it and add a new edge from q1 to itself. However, the new edge on q1 has created a new path down which the previous pop action could be performed. It is therefore necessary to add (2, q1) to U0. The current state of the RCG and the contents of the sets U0 and P0 are shown below.

RCG: q1 [2] → q0, with a loop on q1
U0 = {(0, q0), (1, q0), (0, q1), (1, q1), (2, q0), (2, q1)}
P0 = {(0, q0), (0, q1)}

We continue the parse by traversing the edge labelled with the push action, p(3), from state 2 to state 0, creating the new RCG node, q2, labelled 3, with two edges: one from q2 to q0 and the other from q2 to q1. We also add (0, q2) to U0 and P0.

RCG: q1 [2] → q0, q1; q2 [3] → q0, q1
U0 = {(0, q0), (1, q0), (0, q1), (1, q1), (2, q0), (2, q1), (0, q2)}
P0 = {(0, q0), (0, q1), (0, q2)}

We then traverse the R2 edge to state 1, add (1, q2) to U0 and proceed to perform the pop on q2. We find the children of q2, q0 and q1, and add (3, q0) and (3, q1) to U0. Traversing the edge labelled p(2) from state 1 to state 0, we re-use q1 and add a new edge from q1 to q2. This new edge has created a new path down which the previous pop action could be performed, so we add (2, q2) to U0. When we perform the push transition, p(3), from state 2 of the RCA (as a result of (2, q2) being added to U0) we create a new looping edge for q2. Since a pop we have already performed can also be performed down this new edge, we also add (3, q2) to U0. That completes the first step of the algorithm. The current state of the RCG and the contents of the sets U0 and P0 are shown below.

RCG: q1 [2] → q0, q1, q2; q2 [3] → q0, q1, q2
U0 = {(0, q0), (1, q0), (0, q1), (1, q1), (2, q0), (2, q1), (0, q2), (1, q2), (3, q0), (3, q1), (2, q2), (3, q2)}
P0 = {(0, q0), (0, q1), (0, q2)}

We then read the next input symbol, b, construct the new set U1 = {(4, q0), (4, q1), (4, q2)} and set P1 = ∅. We traverse the R1 edge to state 1, add (1, q0), (1, q1), (1, q2) to U1 and then perform the associated pop action for q1 and q2. We add (2, q0), (2, q1), (2, q2) and (3, q0), (3, q1), (3, q2) to U1. Traversing the push transition, p(2), from state 1, we create a new RCG node, q3, labelled 2, and add edges from it to the nodes q0, q1 and q2.

RCG: as above, plus q3 [2] → q0, q1, q2
U1 = {(4, q0), (4, q1), (4, q2), (1, q0), (1, q1), (1, q2), (2, q0), (2, q1), (2, q2), (3, q0), (3, q1), (3, q2)}
P1 = {(0, q3)}

Continuing the parse in this way results in the creation of the final RCG shown below.

RCG: nodes q0 [-1], q1 [2], q2 [3], q3 [2], q4 [3]
U1 = {(4, q0), (4, q1), (4, q2), (1, q0), (1, q1), (1, q2), (2, q0), (2, q1), (2, q2), (3, q0), (3, q1), (3, q2), (0, q3), (0, q4), (1, q3), (1, q4)}
P1 = {(0, q3), (0, q4)}

Since all the input has been consumed and the process (1, q0) is in U1, where state 1 is the accepting state of the RCA and q0 is the RCG's base node, the input string b is accepted.

RIGLR recogniser

input data: an RCA written as a table T, input string a1 . . . an

function Parser
    an+1 = $, U0 = P0 = ∅, . . . , Un = Pn = ∅
    create a base node, q0, in the call graph
    create a process node, u0, in U0 labelled (0, q0) and add (0, q0) to P0
    for i = 0 to n do
        add all the elements of Ui to A
        while A ≠ ∅ do
            remove u = (h, q) from A
            if sk ∈ T(h, ai+1) then
                if there is no node labelled (k, q) ∈ Ui+1 then
                    create a process node v labelled (k, q) and add v to Ui+1
            for each R(j, k) ∈ T(h, ai+1) do
                if there is no node labelled (k, q) ∈ Ui then
                    create a process node v labelled (k, q) and add v to A and to Ui
            if pop ∈ T(h, ai+1) then
                let k be the label of q and let Z be the set of successors of q
                for each p ∈ Z do
                    if there is no node labelled (k, p) ∈ Ui then
                        create a process node v labelled (k, p) and add v to A and to Ui
            for each p(l, k) ∈ T(h, ai+1) do
                if there is (k, t) ∈ Pi such that t has label l then
                    add an edge from t to q
                    if there is a node labelled (f, t) in Ui \ A with pop ∈ T(f, ai+1),
                    and there is no node labelled (l, q) ∈ Ui, then
                        create a process node v labelled (l, q) and add v to A and to Ui
                else
                    create a node t with label l in the call graph
                    make q a successor of t
                    create a process node v labelled (k, t), add v to A and to Ui,
                    and add (k, t) to Pi
    if Un contains a node whose label is (h∞, q0), where h∞ is an accept state of the RCA
    and q0 is the base node of the call graph, then set result = success
    else set result = failure
    return result
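A compact Python rendering of this recogniser is sketched below; it is not the thesis's implementation, but it follows the pseudocode's structure, including the re-application of pops when a push adds an edge to an RCG node created earlier in the same step. The table encoding (assumed, for the RCA of Figure 7.10) is the same one used in the sketch of Section 7.4.2.

    class Node:                       # an RCG node: a label and its children
        def __init__(self, label):
            self.label = label
            self.children = set()

    SHIFT  = {(0, 'c'): 1, (1, 'a'): 3, (1, 'b'): 2, (4, 'd'): 5,
              (8, 'a'): 10, (8, 'b'): 9, (11, 'd'): 12}
    REDUCE = {3: 6, 5: 6, 6: 7, 10: 13, 12: 13}   # state -> k of its R(j, k)
    PUSH   = {2: (4, 8), 9: (11, 8)}              # state -> (l, k) of p(l, k)
    POP, ACCEPT = {13}, {7}

    def recognise(s):
        q0 = Node(-1)                             # base node of the RCG
        U = {(0, q0)}
        for a in s + '$':                         # a plays the role of a_{i+1}
            P = {}                                # nodes made this step, by label
            work, done, Unext = list(U), set(), set()
            while work:
                h, q = work.pop()
                done.add((h, q))
                if (h, a) in SHIFT:
                    Unext.add((SHIFT[h, a], q))
                if h in REDUCE and (REDUCE[h], q) not in U:
                    U.add((REDUCE[h], q)); work.append((REDUCE[h], q))
                if h in POP:                      # return to each child of q
                    for p in q.children:
                        if (q.label, p) not in U:
                            U.add((q.label, p)); work.append((q.label, p))
                if h in PUSH:
                    l, k = PUSH[h]
                    if l in P:                    # reuse this step's node
                        t = P[l]
                        t.children.add(q)
                        # re-apply a pop already done from t, down the new edge
                        if any(f in POP for f, t2 in done if t2 is t) \
                                and (l, q) not in U:
                            U.add((l, q)); work.append((l, q))
                    else:
                        t = P[l] = Node(l)
                        t.children.add(q)
                        U.add((k, t)); work.append((k, t))
            if a != '$':
                U = Unext
        return any(h in ACCEPT and q is q0 for h, q in U)

    print(recognise('cbad'), recognise('cbbadd'), recognise('cbd'))  # True True False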

7.6 Reducing the non-determinism in the RCA

We can reduce the amount of non-determinism in the RCA by adding lookahead sets to the reduce, push and pop actions. The lookahead sets are calculated for the reduce and push actions by traversing the R-edges and push edges until a state is reached that has edges leaving it that are labelled by terminal symbols. These terminals are added to the lookahead set of all the edges on the path to the state. If the target state is an accept state then the lookahead set also includes the $ symbol. Since the pop action is part of a state, we label the state with a lookahead set which is calculated by finding the lookahead sets of the states that can be reached when the pop action is performed. These states are the targets of the edges labelled by the terminalised symbols in the RIA.
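The traversal just described might be sketched as follows; term_out, silent_out and accepts are assumed encodings of, respectively, the terminal-labelled out-edges of each state, the R- and push edges, and the accept states.

    # A sketch of the lookahead computation: from the target of an R- or push
    # edge, follow R/push edges until states with terminal-labelled out-edges
    # are reached, collecting those terminals ($ for accept states).
    def lookahead(target, term_out, silent_out, accepts):
        seen, work, las = set(), [target], set()
        while work:
            s = work.pop()
            if s in seen:
                continue
            seen.add(s)
            if s in accepts:
                las.add('$')
            if term_out.get(s):            # a terminal-bearing state ends
                las |= term_out[s]         # the traversal on this path
            else:
                work.extend(silent_out.get(s, ()))
        return las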


Figure 7.18: The RCA of Grammar 7.6 with lookahead.

The implementation used in PAT does not incorporate lookahead because the RCA are produced by GTB and GTB does not, at present, construct RCA with lookahead.

7.7 Reducing the number of processes in each Ui

The number of processes added to Ui at each step of the RIGLR algorithm can be very large. An approach to reduce both the size of the RCA and the number of processes added at each step of the algorithm is presented in [AHJM01]. It involves 'pre-compiling' and then combining sequences of actions in the RCA. The basic principle is to combine a preceding terminal that can be shifted with all R-edges and/or a following push edge. For example, consider the pre-compiled RCA shown in Figure 7.19 that has been constructed from the RCA in Figure 7.18.


Figure 7.19: The reduced RCA of Grammar 7.6 with lookahead.

Although this approach can reduce the size of the RCA, if we want to incorporate lookahead into the RCA we have to use two symbols of lookahead. This will increase the size of the parse table and hence also the size of the parser. More seriously, if all possible sequences of reductions are composed then the number of new edges in the RCA can be increased from O(k) to O(2^(k+1)). An example of this is given in [SJ03b]. However, a limited form of composition that is guaranteed not to increase the size of the RCA has been proposed in [SJ]. The implementation used in PAT does not incorporate this technique to reduce the size of the sets Ui.

7.8 Generalised regular parsing

This section introduces the RIGLR parsing algorithm, which attempts to find a traversal of an RCA for a string a1 . . . an and constructs a syntactic representation of the string. We begin by providing an informal description of how to build a derivation tree, using an example for which the RCA is deterministic. Then we present an efficient way of representing multiple derivations. There is a formal definition of the algorithm at the end of the section.

7.8.1 Constructing derivation trees

We can build a derivation tree for an input string a1 . . . an during a parse by maintaining a sequence of tree nodes, u1, . . . , up, constructed at each step of the algorithm. When an edge labelled with a terminal symbol, a, is traversed, we create a parse tree node and append it to the sequence of nodes created thus far. When an Ri edge is traversed, where rule i is A ::= x1 . . . xk, we remove the nodes up−k+1, . . . , up from the sequence, create a new node labelled A, with children up−k+1, . . . , up, and append the new node to the end of the sequence. No tree nodes need to be created for the push and pop transitions of the RCA.
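This bookkeeping amounts to a sequence of tree roots manipulated at each shift and reduction. A minimal sketch, with our own names, is given below; the usage lines reproduce the cbad example that follows.

    # A sketch of the root-sequence bookkeeping: shifts append a leaf, an Ri
    # edge for a rule A ::= x1...xk replaces the last k roots with one node.
    class TreeNode:
        def __init__(self, symbol, children=()):
            self.symbol, self.children = symbol, list(children)

    def shift(W, a):                       # terminal edge: append a leaf
        W.append(TreeNode(a))

    def reduce(W, lhs, k):                 # Ri edge for a rule of length k
        children = W[len(W) - k:] if k > 0 else []
        if k > 0:
            del W[len(W) - k:]
        W.append(TreeNode(lhs, children))

    W = []
    shift(W, 'c'); shift(W, 'b'); shift(W, 'a')
    reduce(W, 'A', 1)                      # R3: A ::= a
    shift(W, 'd')
    reduce(W, 'A', 3)                      # R2: A ::= bAd
    reduce(W, 'S', 2)                      # R1: S ::= cA
    print(len(W), W[0].symbol)             # prints: 1 S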

For example, consider the parse of the string cbad for Grammar 7.6 and its associated RCA shown in Figure 7.10 (see page 154). We shall maintain the parse tree root nodes in the sequence W. We begin the parse, as usual, by creating the base node of the RCG and adding the element (0, q0) to the set U0 and P0. We read the first input symbol c, traverse the edge from state 0 to state 1, create the first parse tree node, w0, labelled c, and construct the set U1 = {(1, q0)}.

w0 = c
U1 = {(1, q0)}
W = [w0]

We continue the parse by reading the next input symbol, b, traverse the edge labelled b from state 1 to state 2, create the new parse tree node, w1, labelled b, and construct the set U2 = {(2, q0)}.

w0 = c, w1 = b
U2 = {(2, q0)}
W = [w0, w1]

The next transition from state 2 is labelled by the push action p(4), so we create the new RCG node, q1, labelled 4, and add (8, q1) to U2. (Recall that no nodes are created in the parse tree as a result of the traversal of a push edge in the RCA.) We then read the third input symbol, a, traverse the edge to state 10, create the parse tree node w2, labelled a, and construct the set U3 = {(10, q1)}. From state 10 there is a reduction transition labelled R3 for rule A ::= a. We traverse the edge to state 13 and then create the new parse tree node, w3, labelled A. Since rule 3 has a right hand side of length 1, we remove the last element from the sequence W and make it the child of w3. We then add the new root node w3 to the sequence W, and add (13, q1) to U3.

w3 = A with child w2 = a;    RCG: q1 [4] → q0 [-1]
U3 = {(10, q1), (13, q1)}
W = [w0, w1, w3]

Since state 13 contains a pop action, we find the child, q0, of the RCG node q1, and add (4, q0) to U3. We read the final input symbol, d, create the new parse tree node, w4, labelled d, and construct U4 as shown below.

w4 = d
U4 = {(5, q0)}
W = [w0, w1, w3, w4]

We then traverse the reduction transition R2 from state 5. Rule 2 is A ::= bAd, so we create the new node, w5, labelled A, remove the last three parse tree nodes from the sequence W and add them to w5 as its children. We also add w5 to W and (6, q0) to U4.

w5 = A with children w1 = b, w3 = A, w4 = d
U4 = {(5, q0), (6, q0)}
W = [w0, w5]

We then traverse the reduction transition R1 from state 6 to state 7. Rule 1 is S ::= cA, so we create the new node w6, labelled S, remove the final two parse tree nodes from the sequence W, and add them to w6 as its children. Since we have consumed all the input and the process (7, q0) is in U4, where 7 is the RCA's accepting state and q0 is the base node of the RCG, the input string cbad is accepted by the parser. The final parse tree is shown below.

    w6 = S
      w0 = c
      w5 = A
        w1 = b
        w3 = A
          w2 = a
        w4 = d

In the above example, the RCA is deterministic and hence there is only one derivation tree for the input string. Recall that a grammar can assign an exponential, or even infinite, number of derivations to a given string. We use Tomita's SPPF representation of multiple parse trees to reduce the amount of space necessary to represent all parse trees. The next section discusses three different approaches that can be used to construct an SPPF of a parse using an RCA.

7.8.2 Constructing an SPPF

There are several different approaches that can be taken to construct an SPPF during a parse of the RIGLR algorithm. A straightforward technique is to store a sequence of SPPF nodes that correspond to the roots of the sub-trees that have been constructed so far with each process (k, q). However, since it is possible to reach a state in the RCA by traversing more than one path, this approach can significantly increase the number of processes constructed at each step of the algorithm [SJ03b]. An alternative approach is to merge the processes that share the same call graph nodes, so as to limit the number of processes that need to be created. Instead of storing the sequence of SPPF nodes directly in a process, we can represent the

sequences of SPPF nodes in an SPPF node graph, a type of graph structured stack, and replace the sequence of nodes in a process by a single SPPF node graph node. Unfortunately, as is shown in [SJ03b], this approach leads to spurious derivations being created in the SPPF for certain parses. Another disadvantage of using the SPPF node graph is that the structure needs to be traversed in order to construct the final SPPF, which is likely to affect the order of the parsing algorithm. A solution proposed in [SJ03b] is to use special SPPF pointer nodes that point to sequences of SPPF nodes within the SPPF. To prevent the spurious derivations from being created, the edges in the call graph are labelled with these special SPPF pointer nodes. This provides a way of associating only the correct derivation sequences with their processes.

Another problem with the construction of the SPPF is caused if an existing node in the RCG, that has already had a pop applied, has a new edge added to it as a result of a push action. In Example 7.5.1 we showed how the recognition algorithm deals with such cases: when a new edge is added to an existing node, all processes in Ui are checked to see if a pop was applied. Unfortunately, this approach does not simply carry over to the parser, because we need to ensure that the correct tree is constructed for any pops that are re-applied. Thus we use the set Pi to store the SPPF nodes that are associated with a pop's reduction path. When a pop is performed, we add the sequence of SPPF nodes to a set F and store the pair (q, F) in Pi. When a push action for a process (h, q, w) is performed, we check to see if Pi contains an element of the form (q, F). If it does, then a pop has already been performed from q.

We use the sequences of SPPF nodes in F to add the required new processes to Ui that will cause the pops to be re-applied for the correct reduction paths. Before presenting the formal description of the RIGLR parser we shall work through an example of the construction of an SPPF during a parse, using the final approach discussed above. Consider the parse of the string abcc, with Grammar 7.8 and the RCA shown in Figure 7.14 on page 159.

In addition to the sets Ui and Pi we maintain two additional sets during parsing. The sets N and W are used to store the SPPF nodes and special pointer nodes, respectively, that are created at the current step of the algorithm. Recall from Chapter 4 that nodes in an SPPF that are labelled by the same non-terminal and derive the same portion of the input string can be merged. To achieve this efficiently, we label each SPPF node with a pair (x, j), where j is an integer representing the start position, within the input string, of the substring that the yield of the sub-graph rooted at (x, j) derives. (The set N is used to reduce the number of SPPF nodes that need to be searched to find the ones that can be packed. It also allows SPPF nodes to be labelled by the pair (x, j), since all nodes in the set will have been constructed at the current step of the algorithm.) Furthermore, the SPPF of a specific ε-rule can be shared throughout a parse

whenever a reduction is done for that rule. To avoid creating redundant instances of an ε-SPPF we shall create all ε-SPPFs at the start of the algorithm and use them whenever necessary.
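The per-step sharing of nodes can be sketched as follows: N caches the nodes created at the current step, keyed by the label (x, j), and a node becomes packed as soon as a second sequence of children (a second family) is added to it. The class and helper names below are ours.

    # A sketch of (x, j) labelling, per-step merging and packing.
    class SPPFNode:
        def __init__(self, symbol, start):
            self.label = (symbol, start)
            self.families = []             # each entry is one children sequence

    def get_node(N, symbol, start):        # N: labels created this step
        node = N.get((symbol, start))
        if node is None:
            node = N[(symbol, start)] = SPPFNode(symbol, start)
        return node

    def add_family(node, children):
        if children not in node.families:  # a second family introduces packing
            node.families.append(children)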

Since Grammar 7.8 contains the ε-rule A ::= ε, we build the tree uA, with pointer wA. We begin the parse by creating the base node of the RCG and adding the process (0, q0, ε) to U0.

SPPF: uA = (A, ∞) with child ε; pointer wA → uA.    RCG: q0 [-1]
U0 = {(0, q0, ε)}
P0 = {(q0, ∅)}
N = {}    W = {}

We then traverse the edge labelled R4 from state 0 to state 3, add the process (3, q0, wA) to U0 and then traverse the push edge labelled p(4) back to state 0. The push action results in the new RCG node, q1, with an edge labelled wA to q0, being created, (0, q1, ε) being added to U0 and (q1, ∅) to P0. The SPPF is not modified. We traverse the R4 edge again for process (0, q1, ε), add (3, q1, wA) to U0 and then traverse the push transition p(4). This results in a new edge, labelled wA, being added to the RCG as a loop on q1.

RCG: q1 [4] with a wA-labelled edge to q0 and a wA-labelled loop
U0 = {(0, q0, ε), (3, q0, wA), (0, q1, ε), (3, q1, wA)}
P0 = {(q0, ∅), (q1, ∅)}
N = {}    W = {}

We continue the parse by reading the next input symbol, a, and then traverse the edge from state 0 to state 2. We create the new SPPF node, u1, labelled (a, 0), add a pointer node w1 to u1 and construct U1 = {(2, q0, w1), (2, q1, w1)}. For each of these processes we traverse the edge labelled R3 to state 3, create the SPPF node, u2, labelled (A, 0) with pointer node w2, and add the processes (3, q0, w2) and (3, q1, w2) to U1. We add u1 as the child of u2 since we created the new SPPF node from the processes (2, q0, w1) and (2, q1, w1). From state 3 we traverse the edge p(4) to state 0, create the new RCG node, q2, and two edges labelled w2 from q2 to q0 and q1. The process (0, q2, ε) is also added to U1 and (q2, ∅) to P1.

SPPF: u1 = (a, 0); u2 = (A, 0) with child u1; pointers w1 → u1, w2 → u2.
RCG: q2 [4] with w2-labelled edges to q0 and q1
U1 = {(2, q0, w1), (2, q1, w1), (3, q0, w2), (3, q1, w2), (0, q2, ε)}
P1 = {(q2, ∅)}
N = {u1, u2}
W = {w1, w2}

We then traverse the edge R4 from state 0 to state 3 for process (0, q2, ε), followed by the push transition back to state 0. This results in the process (3, q2, wA) being added to U1, and an edge labelled wA being added to the RCG from q2 to itself.

U1 = {(2, q0, w1), (2, q1, w1), (3, q0, w2), (3, q1, w2), (0, q2, ε), (3, q2, wA)}
P1 = {(q2, ∅)}
N = {u1, u2}
W = {w1, w2}

Traversing the edge labelled b from state 0, we create the new SPPF node, u3, labelled (b, 1), with pointer node w3, and construct U2 = {(1, q2, w3)}. We then traverse the R2 edge to state 5, create the SPPF node u4, labelled (S, 1), with pointer node w4 and update the relevant sets as shown below.

SPPF: u3 = (b, 1); u4 = (S, 1) with child u3; pointers w3 → u3, w4 → u4.
U2 = {(1, q2, w3), (5, q2, w4)}
P2 = {}
N = {u3, u4}
W = {w3, w4}

Performing the pop action associated with state 5, for process (5, q2, w4), we first find the children of q2: q0, q1 and q2. Then we create two new pointer nodes, w5 and w6, with edges to the children of the pointers of the popped edges in the RCG and to w4. For each of the new pointers we also add the processes (4, q0, w5), (4, q1, w5) and (4, q2, w6) to U2.

U2 = {(1, q2, w3), (5, q2, w4), (4, q0, w5), (4, q1, w5), (4, q2, w6)}
P2 = {}
N = {u3, u4}
W = {w3, w4, w5, w6}

We continue the parse by reading the next input symbol, c, and then create the new SPPF node, u5, labelled (c, 2), with two pointer nodes w7 and w8. We construct U3 = {(6, q0, w7), (6, q1, w7), (6, q2, w8)}, N = {u5} and W = {w7, w8}.

SPPF: u5 = (c, 2); pointers w7 and w8.
U3 = {(6, q0, w7), (6, q1, w7), (6, q2, w8)}
P3 = {}
N = {u5}
W = {w7, w8}

From state 6 we then traverse the R1 edge to state 5 and create two new SPPF nodes, u6 and u7, labelled (S, 0) and (S, 1) respectively. The state of the parser is shown below.

SPPF: u6 = (S, 0) with pointer w9; u7 = (S, 1) with pointer w10.
U3 = {(6, q0, w7), (6, q1, w7), (6, q2, w8), (5, q0, w9), (5, q1, w9), (5, q2, w10)}
P3 = {}
N = {u5, u6, u7}
W = {w7, w8, w9, w10}

Performing the pop action associated with state 5, for processes (5, q1, w9) and (5, q2, w10), we first find the children of q1 (q0 and q1) and of q2 (q0, q1 and q2). Then we create three new pointer nodes, w11, w12 and w13, with edges to the children of the pointers of the popped edges in the RCG and to w9 and w10 respectively. The state of the parser after the pop is complete is shown below.

U3 = {(6, q0, w7), (6, q1, w7), (6, q2, w8), (5, q0, w9), (5, q1, w9), (5, q2, w10), (4, q0, {w11, w12}), (4, q1, {w11, w12}), (4, q2, w13)}
P3 = {}
N = {u5, u6, u7}
W = {w7, w8, w9, w10, w11, w12, w13}

Next we read the final input symbol c, create u8 labelled (c, 3) and three new pointer nodes w14, w15 and w16 in the SPPF. We traverse the edge from state 4 to state 6 and construct U4.

SPPF: u8 = (c, 3); pointers w14, w15 and w16.
U4 = {(6, q0, {w14, w15}), (6, q1, {w14, w15}), (6, q2, w16)}
P4 = {}
N = {u8}
W = {w14, w15, w16}

For each of the processes in U4 we traverse the R1 edge from state 6 in the RCA.

For processes (6, q0, {w14, w15}) and (6, q1, {w14, w15}) we create a new SPPF node, u9, labelled (S, 0), with a pointer node w17. Since there are two pointer nodes w14 and w15 associated with both processes, we create two packing nodes below u9, one with edges to the children of w14 and the other with edges to the children of w15.

For (6, q2, w16) we create the new SPPF node, u10, labelled (S, 1) with pointer node w18 and edges to the children of w16.

SPPF: u9 = (S, 0) with pointer w17 and two packed families; u10 = (S, 1) with pointer w18.
U4 = {(6, q0, {w14, w15}), (6, q1, {w14, w15}), (6, q2, w16), (5, q0, w17), (5, q1, w17), (5, q2, w18)}
P4 = {}
N = {u9, u10}
W = {w17, w18}

Since the process (5, q0, w17) is in U4, and state 5 is the accept state of the RCA, and q0 is the base node of the RCG, the string abcc is accepted. We make the node pointed to by w17 the root node of the SPPF. The final SPPF, without any pointer nodes, is shown below.

Final SPPF (pointer nodes omitted): root u9 = (S, 0); interior nodes u6 = (S, 0), u7 = (S, 1), u4 = (S, 1), u2 = (A, 0) and uA = (A, ∞); leaves u1 = (a, 0), u3 = (b, 1), u5 = (c, 2), u8 = (c, 3) and ε.

RIGLR parser

input data: an RCA written as a table T, input string a1 . . . an

function Parser
    an+1 = $, U0 = P0 = ∅, . . . , Un = Pn = ∅, W = W′′ = ∅
    create a base node, q0, in the RCG
    create a process (0, q0, ε), add it to U0 and add (q0, ∅) to P0
    for each rule A ::= ε do
        create an SPPF node vA labelled (A, ∞), with a child node labelled ε,
        and a pointer wA to vA
    for i = 0 to n do
        add all the elements of Ui to A
        set W to be W′′, and set N = ∅ and W′′ = ∅
            /* W′′ holds the pointer nodes needed for the next step */
        while A ≠ ∅ do
            remove u = (h, q, w) from A
            let vd, . . . , v1 be the sequence of nodes pointed to by w in left to right order
            if sk ∈ T(h, ai+1) then
                if there is no SPPF node u labelled (ai+1, i) in N then create one
                if there is no pointer node w′ in W′′ with children vd, . . . , v1, u then create one
                if there is no process (k, q, w′) ∈ Ui+1 then add (k, q, w′) to Ui+1
            for each R(j, k) ∈ T(h, ai+1) such that (length of rule j) ≤ d do
                let rule j be A ::= xl · · · x1
                if l = 0 then let u = vA
                else
                    let (xi, fi) be the label of vi and let f = min{fi | 1 ≤ i ≤ l}
                    if there is no SPPF node u in N labelled (A, f) then create one
                    if u does not have vl, . . . , v1 as a sequence of children then
                        add vl, . . . , v1 as a sequence of children of u
                if there is no node w′ in W whose children are vd, . . . , vl+1, u then create one
                if there is no process (k, q, w′) ∈ Ui then add (k, q, w′) to A and to Ui
            if pop ∈ T(h, ai+1) then
                let k be the label of q
                add w to Fq, where (q, Fq) ∈ Pi
                let Z contain the pairs (p, w′′) such that (q, p) is an edge with label w′′
                for each (p, w′′) ∈ Z do
                    let vb, vb−1, . . . , vd+1 be the children of w′′ in left to right order
                    if there is no node w′ in W with children vb, . . . , vd+1, vd, . . . , v1 then create one
                    if there is no process (k, p, w′) ∈ Ui then add (k, p, w′) to A and to Ui
            for each p(l, k) ∈ T(h, ai+1) do
                add w to W′′
                if there is (t, Ft) ∈ Pi such that t has label l then
                    if there is no edge from t to q with label w then
                        add an edge from t to q labelled w
                        for each y ∈ Ft do
                            let v′b, . . . , v′1 be the children of y in left to right order
                            if there is no node w′ in W with children vd, . . . , v1, v′b, . . . , v′1 then create one
                            if there is no process (l, q, w′) ∈ Ui then add (l, q, w′) to A and to Ui
                else
                    create a node t with label l in the RCG
                    create an edge (q, t) labelled w
                    create a process v labelled (k, t, ε) and add v to A and to Ui
                    add (t, ∅) to Pi
    if Un contains a process (h∞, q0, w∞), where h∞ is an accept state of the RCA,
    q0 is the base node of the RCG and w∞ points to the SPPF node labelled (S, 0),
    then set result = success
    else set result = failure
    return result

7.9 Summary

In this chapter we examined the role of the stack in bottom-up parsers and described how to construct the automata that can be used by the RIGLR recognition and parsing algorithms to parse with less stack activity than other GLR parsing techniques. Chapter 10 contains the results of several experiments that highlight the performance of the RIGLR recognition algorithm compared to the other GLR algorithms.

Chapter 8

Other approaches to generalised parsing

This chapter looks at other approaches to generalised parsing and compares them to the techniques studied in detail in this thesis.

8.1 Mark-Jan Nederhof and Janos J. Sarbo

Tomita’s Algorithm 2 can fail to terminate when parsing strings in hidden-left recursive grammars (see Chapter 4). Farshi extends Tomita’s Algorithm 1 to parse all grammars by introducing cycles into the GSS and performing extra searching during parsing. In [NS96], Nederhof and Sarbo present a new type of automaton, the ε-LR DFA, that allows Tomita’s Algorithm 2 to correctly parse strings in all context-free grammars. This section provides an overview of their approach.

It is claimed in [NS96] that the cycles in the GSS, introduced by Farshi, complicate garbage collection and prevent the use of the memo-functions presented by Leermakers et al., [Lee92a]. Since it is hidden-left recursive rules that cause Tomita’s Algorithm 2 to fail to terminate, a straightforward way to deal with such grammars is to remove any non-terminals that derive ε by transforming the grammar with the standard ε-removal algorithm [AU73]. Obviously this also removes any hidden-left recursion. Of course, if all the ε-rules are removed, the grammar will not contain any right nullable rules either, so Tomita’s Algorithm 1 can also be used correctly.

The drawback of the standard ε-removal process is that it can significantly increase the number of grammar rules and hence the size of the LR DFA. In principle, since there are more DFA states the GSS may be bigger, so the run time could be affected as well. As far as we know there have been no studies on the comparative GSS size for grammars before and after ε-removal. In an attempt to reduce the size of the automaton, certain states are merged in a similar way to Pager’s [Pag70] parse table compression technique.
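To make the transformation concrete, the following sketch (an illustration added here, not an implementation from [NS96] or [AU73]) computes the nullable non-terminals of a grammar and generates the rule variants produced by the standard ε-removal algorithm. The Python encoding of grammars as dictionaries is an assumption made purely for the example.

    from itertools import product

    # A grammar maps a non-terminal to a list of alternates; each alternate
    # is a tuple of symbols and () denotes an epsilon-rule.
    def nullable_nonterminals(grammar):
        nullable, changed = set(), True
        while changed:  # iterate to a fixed point
            changed = False
            for lhs, alts in grammar.items():
                if lhs not in nullable and \
                   any(all(x in nullable for x in alt) for alt in alts):
                    nullable.add(lhs)
                    changed = True
        return nullable

    def remove_epsilon(grammar):
        nullable = nullable_nonterminals(grammar)
        new = {}
        for lhs, alts in grammar.items():
            variants = set()
            for alt in alts:
                # each nullable symbol may be kept or deleted, so a rule
                # with k nullable symbols yields up to 2^k variants
                choices = [((x,), ()) if x in nullable else ((x,),) for x in alt]
                for picks in product(*choices):
                    variant = tuple(s for pick in picks for s in pick)
                    if variant:  # the epsilon-rules themselves are dropped
                        variants.add(variant)
            new[lhs] = sorted(variants)
        return new

    # Grammar 8.1: S' ::= S, S ::= aAd, A ::= Bc, B ::= b | epsilon
    g = {"S'": [("S",)], "S": [("a", "A", "d")], "A": [("B", "c")], "B": [("b",), ()]}
    print(remove_epsilon(g))  # A gains the alternate ('c',), i.e. A ::= [B]c

The blow-up in rule count described above is visible here: a single rule containing k nullable symbols can give rise to 2^k rules.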

For example, consider Grammar 8.1 and its associated DFA shown in Figure 8.1.

S′ ::= S
S  ::= aAd
A  ::= Bc        (8.1)
B  ::= b
B  ::= ε


Figure 8.1: The LR(0) DFA for Grammar 8.1.

By transforming Grammar 8.1 with the ε-removal algorithm, Grammar 8.2 is constructed. For the purpose of clarity, we have used non-terminals of the form [A] to indicate that the non-terminal A has been removed from the grammar by the ε-removal transformation.

S′ ::= S
S  ::= aAd
A  ::= Bc        (8.2)
A  ::= [B]c
B  ::= b


Figure 8.2: The LR(0) DFA for Grammar 8.2.

To reduce the number of states in the DFA of a grammar that has had the ε-removal transformation applied, we can incorporate the removal of ε-rules into the closure function used during the construction of the DFA. We consider rules that are derived from the same basic item to be equivalent. For example, in Figure 8.2 the items (A ::= Bc·) and (A ::= [B]c·) are considered to be the same, and hence states 5 and 8 can be merged and a new edge, labelled c, added from state 5.


Figure 8.3: The ε-LR(0) DFA for Grammar 8.2.
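The merging criterion can be illustrated with a small sketch (ours, not the formalisation of [NS96]): two items are treated as equivalent when, after the bracketed symbols recording ε-removal are mapped back to the original non-terminals, they denote the same basic item. The item encoding below is an assumption made for the example.

    # An item is (lhs, rhs, dot); a symbol written '[X]' records a nullable
    # symbol deleted by the epsilon-removal transformation.
    def basic_item(item):
        lhs, rhs, dot = item
        # a bracketed symbol [X] stands for the original symbol X matched to epsilon
        return (lhs, tuple(s.strip('[]') for s in rhs), dot)

    i1 = ('A', ('B', 'c'), 2)    # the item A ::= Bc.
    i2 = ('A', ('[B]', 'c'), 2)  # the item A ::= [B]c.
    print(basic_item(i1) == basic_item(i2))  # True: their states can be merged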

Although the merging of DFA states in this way can reduce the number of states, it breaks one of the fundamental properties of the DFAs used by parsers – if a state that contains a reduction action is reached, then it is guaranteed that the reduction is applicable. As a result, when a reduction is performed by a parser that uses the ε-LR DFA, it is necessary to check each of the symbols that are popped off the stack, to ensure that the reduction is applicable at that point of the parse.

For example, consider the parse of the string abcd using the ε-LR(0) DFA in Figure 8.3. We begin by creating the node, v0, labelled by the start state of the DFA, and then perform two consecutive shifts for the first two input symbols ab.

[GSS: v0:0 -a- v1:2 -b- v2:5]

From state 5 we can perform a reduce for rule B ::= b, so we trace back a path of length 1 from v2, checking that the symbol on the edge matches the rule, to node v1, and create the new node, v3, labelled 4.

[GSS: v0:0 -a- v1:2 -b- v2:5, with v3:4 -B- v1:2]

At this point we can perform the shift on c from state 4 in the DFA to state 7. However, because of the previous reduction the shift is now also applicable from v2. We create the new node v4 and add edges to both v3 and v2.

[GSS: v0:0 -a- v1:2 -b- v2:5; v3:4 -B- v1:2; v4:7 with c-labelled edges to v3 and v2]

From state 7 there is the reduction A ::= Bc·. However, recall that state 7 was created by merging states 5 and 8 of the DFA in Figure 8.2, so there are two reductions that are valid at this point: one where B ⇒ ε and the other where B ⇒ b. Therefore we trace back two types of path; one of length 1, where we pop c off the stack, and another of length 2, where we pop Bc. Consequently we construct the GSS shown below.

[GSS: as above, plus v5:3 with A-labelled edges added for the two reductions]

Next we can read the final input symbol d and perform the shift from state 3 to state 6. We create the new node v6 and create the edge from v6 to v5.

[GSS: as above, plus v6:6 -d- v5:3]

From state 6 we perform the reduction S ::= aAd by tracing back paths of length 3 from v6. However, although three different paths can be traversed ([v6, v5, v2, v1], [v6, v5, v3, v1] and [v6, v5, v1, v0]), only one is applicable because of the symbols labelling the traversed edges. The final GSS is shown below.

[GSS: the final GSS, with v7:1 and an S-labelled edge to v0]

Another solution, which only causes a quadratic increase in the number of rules, is to use the hidden-left recursion removal algorithm. However, a grammar that has been transformed in this way can only be parsed by Tomita’s Algorithm 2, since Algorithm 1 has a problem with right nullable rules. As we shall see in Chapter 10, Algorithm 2 is less efficient than Algorithm 1 (or indeed RNGLR).

8.2 James R. Kipps

Tomita’s Algorithm 2 can fail to terminate when parsing strings in hidden-left recursive grammars (see Chapter 4). In [Kip91], Kipps shows that the worst case asymptotic complexity of the algorithm, for grammars without hidden-left recursion, is O(n^{k+1}). However, his proof does not take this non-termination into consideration and hence must be incorrect. None of the grammars used in the experimental section of [Kip91] contain hidden-left recursion and hence all the parses successfully terminate.

Kipps makes a modification to Tomita’s algorithm which, he claims, makes it achieve O(n^3) worst case time complexity for all context-free grammars. Kipps changes the formal definition used by Tomita in [Tom85]. The Reducer and ε-Reducer are combined into one function and the new Ancestors function is used to abstract the search for target nodes of a reduction. Also, instead of defining the algorithm to create sub-frontiers for nullable reductions, the concept of a clone vertex is introduced. Although the notation and layout of the algorithm presented in [Kip91] differ somewhat from Tomita’s Algorithm 2, we believe them to be equivalent. Kipps’ algorithm also fails to terminate on grammars with hidden-left recursion and thus cannot be O(n^3). The proof that his algorithm is cubic is flawed in the same way as his proof that Tomita’s algorithm is O(n^{k+1}).

Although Kipps’ proof is flawed, his observations on the root of the algorithm’s complexity are valid – it is the Ancestors function, which traces back reduction paths during a reduction, that contributes most to the complexity of the algorithm. Only the Ancestors function uses i^k steps. However, Ancestors can only ever return at most i nodes and there are at most i nodes between a node in Ui and its ancestors. So, for i^k steps to be performed in Ancestors, some paths must be traversed more than once. Kipps improves the performance of Tomita’s algorithm by constructing an ancestor table that allows the fast look-up of nodes at a distance of k.

In Kipps’ algorithm a state node v is represented as a triple ⟨i, s, l⟩ where i is the level the state is in, s is the state number labelling the node and l is the ancestor field that stores the portion of the ancestor table for v. The ancestor field of a node consists of a set of tuples ⟨k, lk⟩ where lk is the set of ancestor (or target) nodes at a distance of k from v. In our examples we draw each GSS node with a two dimensional ancestor table, representing the portion of the ancestor table for the node, on the left and the state number on the right. We do not label the level in the node since it is clear from the position of the node. To highlight how the algorithm operates we use dotted arrows for the edges added by the Ancestors function and solid arrows for the final edge of a reduction. The algorithm generates the ancestor table dynamically as reductions are performed.

Before we present the formal specification of Kipps’ algorithm we demonstrate its operation using an example. Recall Grammar 6.5 and its associated DFA in Figure 6.6, shown on page 129. To parse the string abcd we begin by creating the new node v0 in U0. Since there is a shift to state 2 from state 0 on the next input symbol a, we create the new node v1, labelled 2, in U1. We add an entry in the ancestor field of v1 to show that v0 is an ancestor of v1 at a distance of one. We represent this in the diagram by adding an edge from the ancestor table of v1 in position 1 to v0.

[Kipps GSS: v0 (state 0) and v1 (state 2); position 1 of v1's ancestor table points to v0.]

We continue in this way, shifting the next three input symbols and creating the nodes v2, v3 and v4, labelled 3, 4 and 5 respectively.

[Kipps GSS: the chain v0:0, v1:2, v2:3, v3:4, v4:5; each node's ancestor table has a position-1 entry for its predecessor.]

Processing v4 we find a reduce/reduce conflict in state 5 for the current lookahead symbol $. First we perform the reduction on rule S ::= abcd by calling the Reducer function. We find the target node of the reduction by calling the Ancestors function. Since no node at a distance of 4 from v4 has yet been recorded, we find node v3 at a distance of 1 and call the Ancestors function recursively from v3 with a length of three. We repeat the recursion until the length reaches 0, at which point we return the node reached. As the recursion returns to each node on the path between v4 and v0, we update the node's ancestor table by adding an edge to the target node of the reduction that is passed back through the recursion.

[Kipps GSS: the same chain, with the ancestor tables along the path from v4 to v0 extended with entries for the returned target nodes.]

The Ancestors function returns v0 as the node at a distance of 4 from v4. Since no node labelled by the goto state of the reduction exists in the frontier of the GSS, we create the new node v5 with an edge from position one in its ancestor table to v0.

[Kipps GSS: as above, plus v5:1 whose ancestor table has a position-1 entry for v0 (edge labelled S).]

We continue by processing the second reduction, D ::= d, of the reduce/reduce conflict encountered in v4. The reduction is of length one and, since there is already an edge in the ancestor table of v4, the Ancestors function returns v3 without performing any recursion. We create the new node v6, labelled 6, in U4 and add an entry in position one of its ancestor table to v3.

[Kipps GSS: as above, plus v6:6 whose ancestor table has a position-1 entry for v3 (edge labelled D).]

When we process v5 we find the accept action in its associated parse table entry. However, since v6 has not yet been processed, the termination of the parse is delayed.

Processing v6 we find the reduction on rule S ::= abcD. We use the Ancestors function to find the target node of the reduction on a path of length four from v6. Since we have already performed a reduction from v4 that shares some of the path used by this reduction, the ancestor table in v3 contains an entry in position 3 to v0. Instead of continuing the recursion, the Ancestors function returns v0 as the node at the end of the path of length four from v6 without tracing the entire reduction path. Since v5 is labelled by the goto state of the reduction and has an edge to the target node v0, the GSS remains unchanged and the parse terminates in success.

[Kipps GSS: the final GSS, unchanged from the previous step.]

Kipps’ O(n^3) recognition algorithm

P is the set that maintains the nodes that have already been processed.

function Rec(x1 ··· xn)
    let xn+1 := $
    let Ui := [ ] (0 ≤ i ≤ n)
    let U0 := [⟨0, s0, ∅⟩]
    for i from 1 to n + 1 do
        let P := [ ]
        for ∀v = ⟨i − 1, s, l⟩ s.t. v ∈ Ui−1 do
            let P := P ◦ [v]
            if ∃ ‘sh s′’ ∈ Actions(s, xi) then Shift(v, s′)
            for ∀ ‘re p’ ∈ Actions(s, xi) do Reduce(v, p)
            if ‘acc’ ∈ Actions(s, xi) then accept
        if Ui is empty then reject

function Shift(v, s)
    if ∃v′ = ⟨i, s, a⟩ s.t. v′ ∈ Ui ∧ ⟨1, l⟩ ∈ a then
        let l := l ∪ {v}
    else
        let Ui := Ui ◦ [⟨i, s, [⟨1, {v}⟩]⟩]

function Reduce(v, p)
    for ∀v1′ = ⟨j′, s′, a1′⟩ s.t. v1′ ∈ Ancestors(v, p̄) do
        let ‘go s″’ := Goto(s′, Dp)
        if ∃v″ = ⟨i − 1, s″, a″⟩ s.t. v″ ∈ Ui−1 ∧ ⟨1, l″⟩ ∈ a″ then
            if v1′ ∈ l″ then do nothing (ambiguous)
            else
                if ∃v2′ = ⟨j′, s′, a2′⟩ s.t. v2′ ∈ l″ then
                    let vc″ := ⟨i − 1, s″, ac″⟩ s.t. ac″ = [⟨1, {v1′}⟩]
                    for ∀ ‘re p’ ∈ Actions(s″, xi) do Reduce(vc″, p)
                    let lk1 := lk1 ∪ lk2 s.t. ⟨k, lk1⟩ ∈ a″ ∧ ⟨k, lk2⟩ ∈ ac″ (k ≥ 2)
                else
                    let l″ := l″ ∪ {v1′}
                    if v″ ∈ P then
                        let vc″ := ⟨i − 1, s″, ac″⟩ s.t. ac″ = [⟨1, {v1′}⟩]
                        for ∀ ‘re p’ ∈ Actions(s″, xi) do Reduce(vc″, p)
                        let lk1 := lk1 ∪ lk2 s.t. ⟨k, lk1⟩ ∈ a″ ∧ ⟨k, lk2⟩ ∈ ac″ (k ≥ 2)
        else
            let Ui−1 := Ui−1 ◦ [⟨i − 1, s″, [⟨1, {v1′}⟩]⟩]

function Ancestors(v = ⟨j, s, a⟩, k)
    if k = 0 then return ({v})
    else if ∃⟨k, lk⟩ ∈ a then return (lk)
    else
        let lk := ⋃ {Ancestors(v′, k − 1) | v′ ∈ l1 where ⟨1, l1⟩ ∈ a}
        let a := a ∪ {⟨k, lk⟩}
        return (lk)
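The role of the ancestor table can be clarified with a small Python sketch (our own illustration, with an assumed node representation, rather than Kipps’ code): each node caches the set of nodes at each distance k, so a path suffix shared by several reductions is traced at most once.

    class Node:
        def __init__(self, state, parents=()):
            self.state = state
            # ancestor table: distance k -> set of nodes at that distance;
            # distance 1 holds the node's direct parents (the GSS edges)
            self.table = {1: set(parents)}

    def ancestors(v, k):
        if k == 0:
            return {v}
        if k not in v.table:
            result = set()
            for parent in v.table[1]:
                result |= ancestors(parent, k - 1)
            v.table[k] = result  # memoise for later reductions
        return v.table[k]

    # the chain built while parsing abcd with Grammar 6.5
    v0 = Node(0); v1 = Node(2, [v0]); v2 = Node(3, [v1])
    v3 = Node(4, [v2]); v4 = Node(5, [v3])
    print(ancestors(v4, 4) == {v0})  # True: the target of S ::= abcd
    print(3 in v3.table)             # True: the cached entry that lets the later
                                     # reduction S ::= abcD stop without
                                     # retracing the whole path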

In a similar way to the BRNGLR algorithm, Kipps’ approach trades space for time, but the two algorithms use different techniques to achieve this. In the conclusion of his paper Kipps admits that although his approach produces an asymptotically more efficient parser, the overheads associated with his method do not justify the improvements. He argues that the ambiguity of grammars in real applications is restricted by the fact that humans must be able to understand them.

8.3 Jay Earley

One of the most popular generalised parsing algorithms is Earley’s algorithm [Ear67, Ear68]. First described in 1967, this approach has received a lot of attention over the last 40 years. In this section we briefly discuss the operation of the algorithm and highlight some of its associated problems.

Earley’s algorithm uses a grammar G to parse an input string X1 ... Xn by dynamically constructing sets of items similar to those used by the LR parsing algorithm. The only difference between Earley’s items and the standard LR items used in the DFA construction is the addition of a pointer back to the set containing the base item of the rule. For example, consider the parse of the string aab with Grammar 6.1, shown on page 118. We begin by initialising the set S0 with the item (S′ ::= ·S 0). We predict the rule for S and add the items (S ::= ·aSB 0) and (S ::= ·b 0) to S0.

S0:  (S′ ::= ·S 0), (S ::= ·aSB 0), (S ::= ·b 0)

Since no more rules can be predicted, we continue by scanning the next input symbol, a, and construct the set S1 with the item (S ::= a · SB 0). We continue by predicting two more items, (S ::= ·aSB 1) and (S ::= ·b 1), which completes the construction of S1.

S0:  (S′ ::= ·S 0), (S ::= ·aSB 0), (S ::= ·b 0)
S1:  (S ::= a · SB 0), (S ::= ·aSB 1), (S ::= ·b 1)

We construct S2 by scanning the next input symbol, a, adding the item (S ::= a · SB 1) and then predicting two more items, (S ::= ·aSB 2) and (S ::= ·b 2).

S0:  (S′ ::= ·S 0), (S ::= ·aSB 0), (S ::= ·b 0)
S1:  (S ::= a · SB 0), (S ::= ·aSB 1), (S ::= ·b 1)
S2:  (S ::= a · SB 1), (S ::= ·aSB 2), (S ::= ·b 2)

We begin the construction of S3 by scanning the symbol b and adding the item (S ::= b· 2). Since the item has the dot at the end of a rule, we go back to the set indicated by the item’s pointer, in this case S2, collect all items that have a dot before S, move the dot past the symbol and add them to S3. In this case only one item is added to S3, (S ::= aS · B 1). We continue by predicting the rule B ::= ε and add the item (B ::= · 3).

Because the new item is an ε-rule it can be completed immediately. This involves searching the current set for any items that have the dot before the non-terminal B. In this case we find the item (S ::= aS · B 1), which results in the new item (S ::= aSB· 1) being added to the current set. We then complete this item by going back to S1 and finding the item (S ::= a · SB 0), which we add to S3 as (S ::= aS · B 0).

Normally at this point we would predict the item (B ::= · 3). However, since the item already exists in the current set, we do not add it again, to prevent the algorithm from failing to terminate on left recursive grammars. Instead we perform any completions that have not already been performed in the current set for the non-terminal B. This results in the addition of the item (S ::= aSB· 0), which in turn causes the item (S′ ::= S· 0) to be added to S3.

S0:  (S′ ::= ·S 0), (S ::= ·aSB 0), (S ::= ·b 0)
S1:  (S ::= a · SB 0), (S ::= ·aSB 1), (S ::= ·b 1)
S2:  (S ::= a · SB 1), (S ::= ·aSB 2), (S ::= ·b 2)
S3:  (S ::= b· 2), (S ::= aS · B 1), (B ::= · 3), (S ::= aSB· 1),
     (S ::= aS · B 0), (S ::= aSB· 0), (S′ ::= S· 0)

At this point no more items can be created, and since we have consumed all the input and the item (S′ ::= S· 0) is in the final set, the parse terminates in success.

In the general algorithm each item consists of a grammar rule with a dot in its right hand side, a pointer back to the set where the search for a derivation using this rule began, and a lookahead string. The dot indicates the part of the rule that has so far been used to recognise a portion of the input string.

Instead of pre-constructing a DFA for G, the algorithm constructs the state sets on the fly. This avoids the need for a stack during a parse. As each input symbol is read, a new set of items is constructed which represents the rules of the grammar that can be reached after reading the input consumed so far.

Earley’s algorithm

Earley’s algorithm accepts as input a grammar G and a string X1 ... Xn. The grammar productions need to be numbered 1, ..., d − 1, and the grammar should be augmented, with the augmented rule numbered 0. The ⊣ symbol is used as the end of string terminator instead of $. A state is defined as a four-tuple ⟨p, j, f, α⟩, where p is a production number, j is a position in the rule, f is the number of the state set Sf that the item has been constructed from and α is the lookahead string used. Each state set acts as a queue where every element is added to the end of the set unless it is already a member. In addition to the data structures needed to achieve the required time and space bounds, extra searching needs to be done in the case of grammars containing ε-rules.

When an ε reduction, A ::= ·, is encountered within a state set Si it is necessary to go through Si and move the dot past A in any items with a dot before the non-terminal A. This must be done cautiously, as some of the items that require the move may not have been created yet. It is therefore necessary to check whether such a reduction is possible whenever a new item is added to the set. The formal specification of Earley’s algorithm shown below is taken from [Ear68]. It is this algorithm that has been implemented and used to compare performance with the RNGLR and BRNGLR algorithms in Chapter 10.

function Recogniser(G, X1 ··· Xn, k)
    Let Xn+i = ⊣ (1 ≤ i ≤ k + 1)
    Let Si be empty (0 ≤ i ≤ n + 1)
    Add ⟨0, 0, 0, ⊣^k⟩ to S0
    for i ← 0 step 1 until n do
        Process the states of Si in order, performing one of the following three
        operations on each state s = ⟨p, j, f, α⟩:

        (1) Predictor: If s is nonfinal and Cp(j+1) is a non-terminal, then for each
            q such that Cp(j+1) = Dq, and for each β ∈ Hk(Cp(j+2) ··· Cpp̄ α), add
            ⟨q, 0, i, β⟩ to Si.

        (2) Completer: If s is final and α = Xi+1 ··· Xi+k, then for each ⟨q, l, g, β⟩ ∈
            Sf (after all states have been added to Sf) such that Cq(l+1) = Dp, add
            ⟨q, l + 1, g, β⟩ to Si.

        (3) Scanner: If s is nonfinal and Cp(j+1) is terminal, then if Cp(j+1) = Xi+1,
            add ⟨p, j + 1, f, α⟩ to Si+1.

        If Si+1 is empty, return rejection.
    If i = n and Si+1 = {⟨0, 2, 0, ⊣⟩}, return acceptance.
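For comparison with the specification above, the following Python sketch (our own simplification, not Earley’s formulation) implements the lookahead-free (k = 0) recogniser, including a cautious treatment of ε-rules: when the predictor reaches a non-terminal that has already been completed within the current set, the dot is moved over it immediately. The grammar encoding is an assumption made for the example.

    # production 0 is the augmented rule; an item is
    # (production number, dot position, parent set index)
    def recognise(grammar, tokens):
        nonterms = {lhs for lhs, _ in grammar}
        n = len(tokens)
        sets = [[] for _ in range(n + 1)]
        sets[0].append((0, 0, 0))

        def add(i, item):
            if item not in sets[i]:
                sets[i].append(item)

        for i in range(n + 1):
            j = 0
            while j < len(sets[i]):             # the set grows while processed
                p, dot, parent = sets[i][j]
                lhs, rhs = grammar[p]
                if dot < len(rhs) and rhs[dot] in nonterms:    # Predictor
                    for q, (l, _) in enumerate(grammar):
                        if l == rhs[dot]:
                            add(i, (q, 0, i))
                    # cautious epsilon handling: rhs[dot] may already have
                    # been completed within this set
                    if any(grammar[q][0] == rhs[dot] and d == len(grammar[q][1])
                           and g == i for q, d, g in sets[i]):
                        add(i, (p, dot + 1, parent))
                elif dot < len(rhs):                           # Scanner
                    if i < n and rhs[dot] == tokens[i]:
                        add(i + 1, (p, dot + 1, parent))
                else:                                          # Completer
                    for q, d, g in list(sets[parent]):
                        if d < len(grammar[q][1]) and grammar[q][1][d] == lhs:
                            add(i, (q, d + 1, g))
                j += 1
        return (0, 1, 0) in sets[n]

    # Grammar 6.1: S' ::= S, S ::= aSB | b, B ::= epsilon
    g = [("S'", ("S",)), ("S", ("a", "S", "B")), ("S", ("b",)), ("B", ())]
    print(recognise(g, list("aab")))  # True, matching the worked example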

Earley’s algorithm is described as a depth-first top-down parser with bottom-up recognition [GJ90]. In Chapter 2 we discussed how a non-deterministic parse can be performed by using a depth-first search to find a successful derivation, but decided that such an approach is infeasible for practical parsers because of the possible exponential time complexity required. Earley’s algorithm restricts the amount of searching that is required by incorporating bottom-up style reductions.

The description given by Earley in his thesis constructs lists of items; however, Graham [GH76] describes Earley’s algorithm using a recognition matrix, where the item (A ::= α·β, i) is held in the (i, j)th entry. This is done to allow a comparison between the CYK and Earley algorithms.

Earley’s algorithm is often preferred to the CYK algorithm for two main reasons: the grammar does not need to be in any special form, and the worst case time and space bounds are not always realised. In fact the algorithm is quadratic on all unambiguous grammars and linear on a large class of grammars which includes the bounded state grammars and most LR(k) grammars (excluding some right recursive grammars). However, those LR(k) grammars that are not bounded state can be parsed in linear time if extra lookahead is used. (Note that the lookahead is not needed for the n^3 and n^2 bounds to be realised.)

Earley claims that the recogniser can easily be extended to a parser without affecting the time bounds, but increasing the space bound to n^3 because of the need to store the parse trees. However, the extension presented in [Ear68] has been shown to create spurious derivations for some parses [Tom85]. Earley also says that his algorithm has a large constant coefficient and does not compare well with the linear techniques. This is because the linear parsers usually compile a parser for a given grammar and use it to parse the input without needing to refer to the grammar. A pre-compiled version of the algorithm is presented in [Ear68], but it does not work for all grammars. In fact it is shown that determining whether a given grammar is compilable is undecidable, and that the compilable grammars cannot even be enumerated. Others [AH01, Lee92a, Lee92b, Lee93] have attempted to improve the performance of Earley’s algorithm with the use of a pre-compiled table. We only know of one publication [AH01] that reports an implementation and experimental results of this approach.

It is uncertain how the use of lookahead affects the algorithm’s efficiency. Earley states that the n^3 and n^2 time bounds can be achieved without lookahead, but k symbols of lookahead are required for the algorithm to be linear on LR(k) grammars. It is shown that the number of items in a set can grow indefinitely if lookahead is not used to restrict which items are added by the Completer step. It was later shown in [BPS75] that a better approach is to use the lookahead in the Predictor step instead. Earley’s algorithm has been implemented essentially as written above in PAT, and the results of the comparison with the BRNGLR algorithm are given in Chapter 10.

8.4 Bernard Lang

Lang developed a general formalism for resolving non-deterministic conflicts in a bottom-up automaton by performing a breadth first search. Tomita’s GLR parsing algorithm can be seen as a realisation of Lang’s ideas. Unfortunately, due to the complex description of his algorithm, Lang’s approach is often overlooked. In this section we discuss the main properties of Lang’s algorithm.

Lang’s work is an efficient breadth first search approach to dealing with non-determinism in a bottom-up push-down transducer (PDT). Earley’s algorithm is a general top-down parsing algorithm, and Lang develops a bottom-up equivalent. The algorithm also outputs a context-free grammar whose language is the set of derivations of a sentence. Lang uses a PDT and an algorithm G to calculate all the possible derivations for a sentence d of length n. G successively builds n + 1 item sets while parsing the input. Each item set Si contains the items that are reached after parsing the first i symbols of d. Each item takes the form ((p, A, i), (q, B, j)) where p and q are state numbers, A and B are non-terminals, and i and j are indexes into the input. The algorithm G uses old items and the PDT transitions to build the new items.

Lang’s algorithm is cubic because the PDT only allows one symbol to be popped off the stack in a single action. This means that his algorithm does not apply to the natural LR automaton unless either the algorithm or the automaton is modified, or the grammar’s rules have a maximum length of two [Sch91]. Modifying the automaton significantly increases its size (the number of actions needed). If Lang’s algorithm is modified to allow more than one symbol to be popped off the stack in one action, the complexity changes to O(n^k).

8.5 Klaas Sikkel

Klaas Sikkel’s book on parsing schemata [Sik97] presents a framework capable of comparing different parsing techniques by abstracting away “algorithmic properties such as data structures or control mechanisms”. This framework allows the comparison of different algorithms in a way that facilitates the cross fertilisation of ideas across the techniques. This is clearly an important approach, although somewhat more general than the specific goal of this thesis. In particular, parsing schemata are used to examine the similarity of Tomita’s and Earley’s algorithms.

As part of the analysis undertaken in [Sik97], there is a detailed discussion of Tomita’s approach and a description of a parallel bottom-up Tomita parser. The algorithm described is Algorithm 2, and included is a discussion of Kipps’ proof that this algorithm is worst case O(n^{k+1}). This proof, like Kipps’ original, is based on Tomita’s Algorithm 1 and ignores the sub-frontiers introduced in Algorithm 2; in effect it is a proof that Algorithm 1 has worst case time complexity O(n^{k+1}).

Part III

Implementing Generalised LR parsing

Chapter 9

Some generalised parser generators

Non-deterministic parsing algorithms were used by some of the first compilers [AU73], but the cost of these techniques was high. As a result efficient deterministic parsing algorithms, such as Knuth’s LR algorithm, were developed. Although these techniques only parsed a subset of the unambiguous context-free grammars and were relatively difficult to implement, they marked a major shift in the way programming languages were developed; language developers began sacrificing expressive power for parsing efficiency.

Unfortunately languages with structures for which it is difficult to generate deterministic parsers will always exist. One such example is C++, which is mostly deterministic, but includes some problematic structures inherited from C. Stroustrup was convinced to use the LALR(1) parser generator, Yacc, to build a parser for C++, but this proved to be a “bad mistake”, forcing him to do a lot of “lexical trickery” to overcome the shortcomings of the deterministic parsing technique [Str94]. Others opted to write C++ parsers by hand instead, but this often resulted in large programs that were difficult to understand. For example, the front end of Edison Design Group’s C++ compiler has 404,000 lines of code [MN04].

The result of such problems has been the development of tools like ANTLR, PRECCx, JavaCC and BtYacc that extend the deterministic algorithms with the use of extra lookahead and backtracking to parse a larger class of grammars. Although it is well known that these approaches can lead to exponential asymptotic time complexities, it is argued that this worst case behaviour is rarely triggered in practice. Unfortunately certain applications like software renovation and reverse engineering often require even more powerful parsing techniques.

Programming languages like Cobol and Fortran were designed before the focus of language designers had shifted to the deterministic parsing techniques. Maintaining this ‘legacy’ code has become extremely difficult and considerable effort is spent on transforming it into a more manageable format. The enormous amount of code that needs to be modified and the repeatedly changing specifications prohibit a manual implementation of such tools. As a result more and more tools, like Bison and the Asf+Sdf Meta Environment, are implementing fully generalised parsing algorithms.

Another example of the application of generalised parsing techniques can be seen in the development of new programming languages. Microsoft Research utilised a GLR parser during the development of C# to carry out a variety of experiments with a “wide range of language-level features that require new syntax and semantics” [HP03].

In this chapter we discuss several tools that extend the recursive descent or standard LR parsing techniques. As the topic of this thesis is GLR parsing we are primarily interested in tools that are extensions of the LR technique. However, we also consider some of the most popular tools that use backtracking to extend recursive descent to non-LL(1) grammars.

9.1 ANTLR

A straightforward approach to parsing non-LL(1) grammars is to try every alternate until one succeeds. As long as the grammar does not contain left recursion, this approach will find a derivation if one exists. However, such a naïve approach can result in exponential searching costs, so the focus of tools that adopt it is to limit the searching performed. ANTLR (ANother Tool for Language Recognition, formerly PCCTS) is a popular parser generator that extends the standard LL technique to parse non-LL(1) grammars with the use of limited backtracking and lookahead [PQ95]. It uses semantic and syntactic predicates that, in particular, allow the user to define which of several successful substring matches should be chosen to continue the parse.

Semantic predicates are declarations that must be met for parsing to continue; there are two kinds: validating predicates, which throw an exception if their conditions fail, and disambiguating predicates, which try to resolve ambiguities. Syntactic predicates, on the other hand, are used to resolve local ambiguity through extended lookahead; they are essentially a form of backtracking used when non-determinism is encountered during a parse. Semantic actions are not performed until the syntactic predicate has been evaluated. If no syntactic predicates are defined then the parser reverts to using a first-match backtracking technique.

ANTLR generates an LL parser and as a result it cannot handle left recursive grammars. Although left recursion removal is possible [ASU86], the transformation changes the structure, and hence potentially the associated semantics, of the grammar. Another top-down parser generator that utilises potentially unlimited lookahead and backtracking is PRECCx (PREttier Compiler-Compiler).

9.2 PRECCx

PRECCx (PREttier Compiler-Compiler (eXtended)) is an LL parser generator that uses a longest match strategy to resolve non-determinism; ideally the rule that matches the longest substring is chosen to continue the parse. In practice, to ensure longest match selection PRECCx relies on the grammar’s rules being ordered so that the longest match is encountered first. However, it is not always possible to order the rules in this way: there exist grammars that require different orderings to achieve the longest match on different input strings. This is an illustration of the difficulty in reasoning about generalised versus standard parsing techniques.

9.3 JavaCC

Another tool that extends the recursive descent technique is JavaCC. The Java Compiler Compiler uses an LL(1) algorithm for LL(1) grammars and warns users when a parse table contains conflicts. It can cope with non-deterministic or ambiguous grammars either by setting the global lookahead to a value greater than 1, or by using a lookahead construct to provide a local hint. As is the case with ANTLR and PRECCx, left recursion needs to be removed from left recursive grammars. The remainder of the tools we consider are based on the standard deterministic LR parsing technique.

9.4 BtYacc

Backtracking Yacc is a modified version of the LALR(1) parser generator Berkeley Yacc that supports automatic backtracking and semantic disambiguation to parse non-LALR(1) grammars. It was developed by Chris Dodd and Vadim Maslov of Siber Systems. The source code is in the public domain and available at http://www.siber.com/btyacc/.

When a BtYacc generated parser has to choose between non-deterministic actions, it remembers the current parse point and goes into trial mode. This involves parsing the input without executing semantic actions. (There are some special actions that can be used to help disambiguation and hence are executed when in trial mode, but these are declared differently.) If the current parse fails, it backtracks to the most recent conflict point and parses another alternate. A trial parse succeeds when all the input has been consumed, or when an action calls the YYVALID construct. The parser then backtracks to the start of the trial parse and follows the newly discovered path executing all semantic actions. This, it is claimed, removes the need for the lexer feedback hack to find typedef names when parsing C.

Although BtYacc does not require the special predicates used by tools such as ANTLR, its approach can lead to the incorrect rejection of valid strings if not used carefully. Furthermore, simple hidden-left recursive grammars can cause BtYacc to fail to terminate on invalid strings. For example, the generated parser for Grammar 9.1 fails to terminate when parsing the invalid string aab.

S ::= BSa | a        (9.1)
B ::= ε

BtYacc extends LR parsing using backtracking. The remaining three tools we discuss are based on Tomita’s GLR approach, which was designed to be more efficient than simple backtracking.

9.5 Bison

An indication that GLR parsing is becoming practically acceptable has come with the inclusion of a generalised parsing algorithm in the widely used parser generator Bison [Egg03]. Although the parsing algorithm implemented is described as GLR, it does not contain any of the sophistication of Tomita’s algorithm. In particular, it does not utilise the efficient GSS data structure. Instead it constructs what Tomita describes as a tree structured stack (see Chapter 4). As a consequence of this implementation, Bison cannot be used to parse all context-free grammars. In fact, Bison fails to parse grammars containing hidden-left recursion. For example, the inefficiency of Bison’s GLR mode prevents it from parsing strings of the form b^d, where d ≥ 12, in Grammar 6.1.

9.6 Elkhound: a GLR parser generator

Elkhound is a GLR parser generator developed at Berkeley. It is directly based on Tomita’s approach. Elkhound focuses on the inefficiency of GLR parsers when compared to deterministic techniques such as LALR(1), and uses a hybrid parsing algorithm that chooses between GLR and LR on each token processed. It is claimed that this technique produces parsers that are as fast as conventional LALR(1) parsers on deterministic input [MN04].

The underlying parsing algorithm described in [McP02] is the same as the algorithm described by Rekers. The authors attribute the slower execution time of GLR on deterministic grammars to the extra work that needs to be done during reductions. Elkhound improves the performance of its generated GLR parsers by maintaining the deterministic depth of each node in the GSS, defined to be the number of edges that can be traversed before reaching a stack node that has an out-degree greater than one.
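As an informal illustration of this bookkeeping (a sketch we have added; Elkhound’s actual C++ implementation differs), the deterministic depth of a GSS node can be maintained incrementally as edges are created:

    class GSSNode:
        def __init__(self, state):
            self.state = state
            self.edges = []        # edges towards the bottom of the stack
            self.det_depth = 0     # the base node has no edges, so depth 0

    def add_edge(child, parent):
        child.edges.append(parent)
        if len(child.edges) == 1:
            # unique parent: one edge deeper than the parent's deterministic prefix
            child.det_depth = parent.det_depth + 1
        else:
            # the stack forks here, so no deterministic traversal can start
            # (a full implementation must also refresh the depths of nodes
            # above a fork; that update is omitted from this sketch)
            child.det_depth = 0

    # a reduction of length k from a node can use the fast LR-style pop
    # whenever k <= det_depth
    base = GSSNode(0)
    n1 = GSSNode(2); add_edge(n1, base)
    n2 = GSSNode(5); add_edge(n2, n1)
    print(n2.det_depth)  # 2: reductions of length <= 2 from n2 are deterministic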

For ambiguous grammars Elkhound provides the user with an interface that can be used to control the evaluation of semantic values when an ambiguity is encountered. Deterministic parsers usually evaluate the semantics associated with a grammar during a parse. A non-deterministic parser requires more sophistication to achieve this, since a parse may be ambiguous. Elkhound requires the definition of a special merge function in the grammar that defines the action to take when an ambiguity is encountered. If the ambiguity is to be retained, it is possible to yield the semantic value associated with a reduction only to later discover that the yielded value was ambiguous and hence should have been merged. An approach is presented in [MN04] that avoids this ‘merge and yield’ problem by ordering the application of reductions.

9.7 Asf+Sdf Meta-Environment

The Asf+Sdf Meta Environment [vdBvDH+01] is a tool, developed at CWI, for automatically generating an Integrated Development Environment (IDE) for domain specific languages from high level specifications. The underlying parsing algorithm is Farshi’s algorithm. Asf+Sdf’s goal is to provide an environment that can minimise the cost of building such tools, encouraging a high level declarative specification as opposed to a low level operational specification. It has been used successfully to perform large scale transformations on legacy code [Vee03]. This section presents a brief overview of the tool, focusing on the parsing algorithm used.

Software renovation involves the transformation or restructuring of existing code into a new specification. Some common uses include “simple global changes in calling conventions, migrations to new language dialects, goto elimination, and control flow restructuring.” [DKV99]. One of the problems of performing transformations of legacy code is that there is often no standard grammar for a specific language. In the case of Cobol there is a variety of dialects and many extensions such as embedded CICS, SQL and DB2. Since the deterministic parsing techniques are not compositional, these extensions cannot be easily incorporated into the grammar. Another problem is that the existing grammars of legacy languages are often ambiguous. Since ambiguities can cause multiple parse trees to be generated, the parser is required to select the correct tree.

The front ends of compilers and other transformational tools are traditionally made up of a separate lexer and parser. Certain languages, like PL/1, do not reserve keywords, and as a result the lexical analysis, which is traditionally done by a lexer using regular expressions, is not powerful enough to determine the keywords. As a result the Asf+Sdf Meta Environment uses a Scannerless GLR (SGLR) [Vis97] parsing algorithm to exploit the power of context-free grammars for the lexical syntax as well. The SGLR algorithm is essentially the Rekers/Farshi algorithm where the tokens are individual characters in the input string and a number of disambiguation filters are implemented.

Asf+Sdf provides four disambiguation filters that can be used to prune trees from the generated SPPF. The primary use of the disambiguation filters in SGLR is to resolve the ambiguities arising from the integration of the lexical and context-free syntax [vdBSVV02]. The implementation of these filters is defined declaratively on a post-parse traversal of the SPPF. However, the parser’s performance can be improved if they are incorporated into the parse.

The disambiguation filters used by SGLR are practically useful, but some ambiguities cannot be removed by them. Research is continuing on new techniques for formulating disambiguation filters alongside the parsing algorithm. Another application that uses the same SGLR parsing algorithm for program transformations is XT [Vis04].

Chapter 10

Experimental investigation

This chapter presents experimental results for eight of the generalised parsing algorithms described in this thesis: RNGLR, BRNGLR, RIGLR, Tomita 1e, Tomita 1e modified, Farshi naïve, Farshi optimised and Earley. Several pathological grammars are used to trigger the algorithms’ worst case behaviour, and three programming language grammars are used to gauge their performance in practice. All the algorithms’ results cover the recognition time and space required, while the RNGLR and BRNGLR results also include statistics on the SPPF construction.

10.1 Overview of chapter

In Chapter 4 we discussed Tomita’s GLR parsing algorithms, specifically focusing on his Algorithm 1, which only works for ε-free grammars, and Farshi’s extension, which works for all grammars. We have implemented Farshi’s algorithm exactly as it is described in [NF91], and we have also implemented a more efficient version which truncates the searches in the GSS, as described in Chapter 4. We refer to the former as Farshi naïve and the latter as Farshi opt. We present the results for both algorithms in Section 10.6.

In Chapter 5 we discussed a minor modification, Algorithm 1e, of Tomita’s Algorithm 1, which admits grammars with ε-rules. On right nullable grammars, Algorithm 1e performs less efficiently with RN parse tables than with LR(1) parse tables (although of course it sometimes performs incorrectly on LR(1) tables). We have a modified algorithm (Algorithm 1e mod) which is actually more efficient on RN tables than Algorithm 1e is on the LR(1) table.

We now present experimental data comparing the performance of the different algorithms. RNGLR is compared to Farshi, as these are the two fully general GLR algorithms. However, even with the modifications mentioned above, Farshi’s algorithm is not as efficient as Tomita’s algorithm, thus the RNGLR algorithm is also compared to Tomita’s original approach using Algorithm 1e.

The BRNGLR algorithm is asymptotically faster than the RNGLR algorithm in the worst case, and data showing this is given. Experiments are also carried out on average cases using Pascal, Cobol and C grammars, to investigate typical as well as worst case performance. The BRNGLR algorithm is also compared to Earley’s classic cubic general parsing algorithm and to the cubic RIGLR algorithm, as the latter was designed to have reduced constants of proportionality by limiting the underlying stack activity. We also present data comparing the relative efficiency of the GLR algorithms when different types of parse table (LR(0), SLR(1), LALR(1) and LR(1)) are used.

10.2 The Grammar Tool Box and Parser Animation Tool

The Grammar Tool Box (GTB) [JSE04b] is a system for studying language specification and translation. GTB includes a set of built-in functions for creating, modifying and displaying structures associated with parsing. The Parser Animation Tool (PAT) is an accompanying visualisation tool that has its own implementations of the parsing algorithms discussed in this thesis, which run from tables generated by GTB. PAT is written in Java and GTB in C++, so they necessarily require separate implementations, but this has the added benefit of two authors constructing independent programs from the theoretical treatment.

PAT dynamically displays the construction of parse time structures and can also be used in batch mode to generate statistics on parser behaviour. The results presented in this chapter have been generated in this way. A guide to PAT and its use is given in Appendix A. The complete tool chain used for this thesis comprises ebnf2bnf, a tool for converting extended BNF grammars to BNF; GTB, which is used to analyse grammars and construct parse tables; and PAT, which is used to animate algorithms and generate statistics on parse-time structure size.

10.3 The grammars

Pascal and C typify the top-down and bottom-up approaches to language design. In folklore at least, Pascal is thought of as being designed for LL(1) parsing and C for LALR(1) parsing. In practice, Pascal is reasonably close to LL(1): our Pascal grammar has only one SLR(1) conflict, arising from the if-then-else ambiguity. C is essentially parsable with an LALR(1) parser, but was not initially designed that way; the LALR(1) ANSI-C grammar was only written by Tom Penello in about 1983. Bjarne Stroustrup described at some length the difficulties involved in attempting to build parsers for early versions of C++ using a hybrid of Yacc and a lexer containing much lexical trickery relying on recursive descent techniques [Str94, p.68]. C++’s non-deterministic syntax has clearly stimulated the development of tools such as ANTLR [PQ95] and the GLR mode of Bison [DS04]. Cobol’s development was contemporary with that of Algol-60 and thus pre-dates the development of deterministic parsing techniques. The language has a large vocabulary which will challenge any table based parsing method.

For these experiments we have used the grammar for ISO-7185 Pascal extracted from the standard, the grammar for ANSI-C extracted from [KR88] and a grammar for IBM VS-Cobol developed in Amsterdam. This grammar is described in [KL03]. A version of the grammar is available as a hyperlinked browsable HTML file from [Cob]; we used a version prepared for Asf+Sdf from which we extracted the context-free rules. The original grammars are written in EBNF. The ebnf2bnf tool was used to generate corresponding BNF grammars; see [JSE04b] for more details of this tool. To demonstrate the non-cubic behaviour of the RNGLR, Farshi and Tomita 1e algorithms we have used Grammar 6.1 (see page 118), which is discussed in detail in Chapter 6.

10.4 The input strings

For all the input strings used in the experiments we have suppressed lexical level productions and used separate tokenisers to convert source programs into strings of terminals from the main grammar. For instance the Pascal fragment

    program tree_file(input, output) ;
    const MAX_INTSETS = 200; MAX_TREES = 200 ;
    function intset_create(var s : intset) : intset_sig ;

is tokenised to

    program ID ( ID , ID ) ;
    const ID = INTEGER ; ID = INTEGER ;
    function ID ( var ID : type_ID ) : type_ID ;

For Pascal, our source was two versions of a tree viewer program with 4,425 and 4,480 tokens respectively, and a quadratic root calculator with 429 tokens; for ANSI-C a Boolean equation minimiser of 4,291 tokens; and for Cobol, two strings extracted from the test set supplied with the grammar consisting of 56 and 2,196 tokens respectively. The tokeniser’s inability to distinguish between identifiers and type names in C has meant that our C grammar does not distinguish them either. This is the reason for the high number of LR(1) conflicts in the grammars. The parses for Grammar 6.1 are performed on strings of b’s of varying lengths.

10.5 Parse table sizes

This section compares the sizes of the LR(0), SLR(1), LALR(1) and LR(1) parse tables with the corresponding RN tables for our ANSI-C, Pascal and Cobol grammars. Recall from Chapter 5 that the RNGLR algorithm corrects the problem with Tomita’s Algorithm 1e by performing nullable reductions early. This is achieved through the modification of the LR parse table. For a given combination of grammar and LR(0), SLR(1), LALR(1) or LR(1) parse table, the GSS constructed during a parse will be the same size for both RN- and conventional Knuth-style reductions. In general the RN tables will contain far more conflicts, which might be expected to generate more searching during the GSS construction. In Section 10.6 we shall show that, contrary to intuition, the RNGLR algorithm performs better than either Tomita’s or Farshi’s algorithm. In fact it reduces the stack activity compared to the standard LR parsing algorithm for grammars that contain right nullable rules. The LR(0), SLR(1), LALR(1), LR(1) and RN parse tables for each of the programming language grammars are shown in Tables 10.1, 10.2 and 10.3.

ANSI-C        States   Symbols   Conflicts
LR(0)            383       158         391
LR(0) RN         383       158         391
SLR(1)           383       158          88
SLR(1) RN        383       158          88
LALR(1)          383       158          75
LALR(1) RN       383       158          75
LR(1)          1,797       158         421
LR(1) RN       1,797       158         421

Table 10.1: ANSI-C parse table sizes

Pascal        States   Symbols   Conflicts
LR(0)            434       286         768
LR(0) RN         434       286       4,097
SLR(1)           434       286           1
SLR(1) RN        434       286         242
LALR(1)          434       286           1
LALR(1) RN       434       286         233
LR(1)          2,608       286           2
LR(1) RN       2,608       286       1,104

Table 10.2: ISO-Pascal parse table sizes

Cobol         States   Symbols   Conflicts
LR(0)          2,692     1,028     131,506
LR(0) RN       2,692     1,028     167,973
SLR(1)         2,692     1,028      65,913
SLR(1) RN      2,692     1,028      73,003

Table 10.3: IBM VS-Cobol parse table sizes

Cobol requires more than seven times as many states as ANSI-C for LR(0) and SLR(1) tables. In fact GTB’s LR(1) table generator ran out of memory when processing Cobol, so we leave those entries empty. GTB builds LALR(1) parse tables by merging LR(1) tables, so those entries for Cobol are also blank. We see that this Cobol grammar is highly non-deterministic, reflecting the construction process described in [LV01]. We now consider how the use of different parse tables affects the size of the GSS constructed by our GLR algorithms. Tables 10.4, 10.5 and 10.6 present the number of GSS nodes and edges created during the parse of the C, Pascal and Cobol input strings respectively.

                   Tomita/Farshi        RNGLR              BRNGLR
d      Table      Nodes    Edges     Nodes    Edges     Nodes    Edges
4,291  LR(0)     39,202   39,389    39,202   39,389    41,528   41,748
       SLR(1)    28,479   28,604    28,479   28,604    30,524   30,670
       LALR(1)   28,352   28,476    28,352   28,476    30,397   30,542
       LR(1)     28,323   28,477    28,323   28,477    30,332   30,512

Table 10.4: Comparison between Tomita-style generated GSS sizes for an ANSI-C program

                   Tomita/Farshi        RNGLR              BRNGLR
d      Table      Nodes    Edges     Nodes    Edges     Nodes    Edges
279    LR(0)      1,736    1,750     1,736    1,750     1,943    1,957
       SLR(1)     1,272    1,278     1,272    1,278     1,438    1,444
       LR(1)      1,263    1,266     1,263    1,266     1,425    1,428
4,425  LR(0)     30,607   31,015    30,607   31,015    34,000   34,441
       SLR(1)    21,043   21,258    21,043   21,258    23,592   23,826
       LALR(1)   21,043   21,258    21,043   21,258    23,592   23,826
       LR(1)     21,039   21,135    21,039   21,135    23,540   23,655
4,480  LR(0)     31,336   31,833    31,336   31,833    34,910   35,464
       SLR(1)    21,670   21,974    21,670   21,974    24,360   24,707
       LALR(1)   21,670   21,974    21,670   21,974    24,360   24,707
       LR(1)     21,424   21,569    21,424   21,569    24,000   24,150

Table 10.5: Comparison between Tomita-style generated GSS sizes for Pascal programs

                   Tomita/Farshi        RNGLR              BRNGLR
d      Table      Nodes    Edges     Nodes    Edges     Nodes    Edges
56     LR(0)        513      671       513      671       636      794
       SLR(1)       319      439       319      439       357      110
2,196  LR(0)     19,756   23,002    19,756   23,002    22,897   26,461
       SLR(1)    12,057   13,512    12,057   13,512    13,017   14,517

Table 10.6: Comparison between Tomita-style generated GSS sizes for Cobol programs

The Tomita, Farshi and RNGLR algorithms generate the same structures. The BRNGLR algorithm achieves cubic run times, but at the cost of a worst-case constant factor increase in the size of the structures: for any grammar with productions more than two symbols long, the BRNGLR algorithm introduces additional nodes into the GSS. The results show increases of around 10% in the size of the structures.

There is a potential trade-off between the amount of non-determinism in the table and the size of the GSS. Some early reports [Lan91, BL89] suggest that LR(1) based GSSs would be much larger than SLR(1) ones because the number of states is so much larger and, in the limit, the number of nodes in a GSS is bounded by the product of the number of table states and the length of the string. However, this is not necessarily the case. The above tables show LR(1) GSSs that are a little smaller than the corresponding SLR(1) ones. Of course, the LR(1) tables themselves are usually much bigger than SLR(1) tables, so the rather small reduction in GSS size might only be justified for very long strings. There are also pathological cases where the size of an LR(1) GSS is much greater than the corresponding SLR(1) GSS; see [JSE04b]. The Asf+Sdf Meta Environment uses SLR(1) tables.

In the next section we turn our attention to the performance of the GLR algorithms and show how the use of different parse tables affects the performance of the Tomita, Farshi and RNGLR algorithms.

10.6 The performance of the RNGLR algorithm compared to the Tomita and Farshi algorithms

Tomita’s Algorithm 1 was only designed to work on ε-free grammars. We have presented a straightforward extension of his algorithm to allow it to handle grammars containing ε-rules. Unfortunately this extended algorithm, which we have called Algorithm 1e, may fail to correctly parse grammars with hidden-right recursion. As mentioned in Section 10.1, we have discussed a slightly more efficient version, Algorithm 1e mod, of Algorithm 1e. In Chapter 5 we discussed the RNGLR algorithm and showed how it extends Tomita’s Algorithm 1 to parse all context-free grammars. Our technique is not the first attempt to fully generalise Tomita’s algorithm (see Chapter 4). Farshi presented a ‘brute-force’ algorithm which handles all context-free grammars. However, his approach introduces a lot of extra searching. Here we compare the efficiency of Farshi’s algorithm with the RNGLR algorithm. Because Farshi’s algorithm is not efficient, we also compare RNGLR with Algorithm 1e.

To compare the efficiency of the GSS construction between the different algorithms, we count the number of edge visits performed during the application of reductions. An edge visit is only counted when tracing back the possible paths during a reduction; the creation of an edge is not counted as an edge visit. Recall that in the RNGLR and Algorithm 1e mod algorithms the first edge in a reduction path is not visited, because of the way pending reductions are stored in the set R.
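The counting can be made concrete with a short sketch (ours, not PAT’s actual Java instrumentation; the GSS representation is assumed): the path-tracing step of the reducer increments a counter for every edge traversed, while edge creation is not counted.

    # gss[node] lists the parent nodes of that node
    def trace_paths(gss, start, length, counter):
        # return the nodes reachable by walking `length` edges back from
        # start, counting every traversal as one edge visit
        frontier = [start]
        for _ in range(length):
            next_frontier = []
            for node in frontier:
                for parent in gss.get(node, []):
                    counter['edge_visits'] += 1
                    next_frontier.append(parent)
            frontier = next_frontier
        return frontier

    counter = {'edge_visits': 0}
    gss = {'v4': ['v3'], 'v3': ['v2'], 'v2': ['v1'], 'v1': ['v0']}
    print(trace_paths(gss, 'v4', 4, counter), counter)
    # ['v0'] {'edge_visits': 4}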

10.6.1 A non-cubic example

We begin by discussing the results for parses of strings bd in Grammar 6.1. This grammar has been shown to trigger worst case behaviour for the GLR-style algorithms (see Chapter 6).

207 d Algorithm 1e Algorithm 1e mod Farshi na¨ıve Farshi opt RNGLR 10 1,368 1,091 16,069 3,094 1,091 20 21,828 18,961 513,510 38,254 18,961 30 108,863 98,106 3,845,851 166,989 98,106 40 339,973 313,026 16,046,592 480,799 313,026 50 822,658 768,221 48,629,233 1,101,184 768,221 60 1,694,418 1,598,191 120,387,274 2,179,644 1,598,191 70 3,122,753 2,967,436 259,194,215 3,897,679 2,967,436 80 5,305,163 5,070,456 503,803,556 6,466,789 5,070,456 90 8,469,148 8,131,751 905,648,797 10,128,474 8,131,751 100 12,872,208 12,405,821 1,530,643,438 15,154,234 12,405,821

Table 10.7: Edge visits performed parsing strings in Grammar 6.1.

1.5e+07

RNGLR Tomita 1e Tomita 1e mod 1.2e+07 Farshi naive Farshi

9e+06

Edge visits 6e+06

3e+06

0 0 20 40 60 80 100 Input string length

Figure 10.1: Comparison between edge visits done by Tomita-style GLR algorithms for strings in Grammar 6.1.

208 d Algorithm 1e Algorithm 1e mod Farshi na¨ıve Farshi opt RNGLR Nodes Edges Nodes Edges Nodes Edges Nodes Edges Nodes Edges 10 38 144 38 144 38 144 38 144 38 144 20 78 589 78 589 78 589 78 589 78 589 30 118 1,334 118 1,334 118 1,334 118 1,334 118 1,334 40 158 2,379 158 2,379 158 2,379 158 2,379 158 2,379 50 198 3,724 198 3,724 198 3,724 198 3,724 198 3,724 60 238 5,369 238 5,369 238 5,254 238 5,369 238 5,369 70 278 7,314 278 7,314 278 7,314 278 7,314 278 7,314 80 318 9,559 318 9,559 318 9,559 318 9,559 318 9,559 90 358 12,104 358 12,104 358 12,104 358 12,104 358 12,104 100 398 14,949 398 14,949 398 14,949 398 14,949 398 14,949

Table 10.8: Comparison between Tomita-style generated GSS sizes for strings in Grammar 6.1.

As we can see, all of the algorithms perform badly as expected, but Farshi’s na¨ıve version is much worse than the others. However, the GSS produced by each of the algorithms should be the same and the figures in Table 10.8 support this expectation.

10.6.2 Using different LR tables

In this section we compare the GSS construction costs when different types of LR tables are used.

d       Table     Algorithm 1e   Algorithm 1e mod   Farshi naïve   Farshi opt   RNGLR
4,291   LR(0)     40,100         5,184              42,707         42,251       5,184
        SLR(1)    28,694         4,502              30,235         29,940       4,052
        LALR(1)   28,567         4,502              30,096         29,801       4,502
        LR(1)     28,461         4,450              30,754         30,484       4,450

Table 10.9: Edge visits performed parsing an ANSI-C Quine-McCluskey Boolean minimiser

d       Table     Algorithm 1e   Algorithm 1e mod   Farshi naïve   Farshi opt   RNGLR
279     LR(0)     2,262          857                2,214          2,127        539
        SLR(1)    1,426          519                1,418          1,361        372
        LR(1)     1,401          503                1,350          1,313        364
4,425   LR(0)     39,555         14,620             41,460         38,964       8,556
        SLR(1)    24,008         8,523              25,753         24,100       5,665
        LALR(1)   24,008         8,523              25,753         24,100       5,665
        LR(1)     23,842         8,365              23,305         22,459       5,572
4,480   LR(0)     40,826         15,210             44,472         41,309       8,993
        SLR(1)    25,040         8,978              28,484         26,196       5,976
        LALR(1)   25,040         8,978              28,484         26,196       5,976
        LR(1)     24,201         8,403              24,219         23,289       5,654

Table 10.10: Edge visits performed parsing three ISO-7185 Pascal programs

d       Table    Algorithm 1e   Algorithm 1e mod   Farshi naïve   Farshi opt   RNGLR
56      LR(0)    1,026          579                4,365          3,725        269
        SLR(1)   405            151                2,665          2,316        109
2,196   LR(0)    30,792         15,295             139,187        103,120      10,056
        SLR(1)   13,872         4,478              47,464         38,984       3,581

Table 10.11: Edge visits performed parsing two Cobol programs

GTB and PAT have allowed us to make direct comparisons between the Tomita, Farshi and RNGLR algorithms run with the different types of LR table. In all cases the LR(1) table resulted in smaller and faster run-time parsers, but the improvement over SLR(1) is not very big, while the increase in the size of the table is significant. Of course, there are pathological cases where the size of an LR(1) GSS is much greater than the corresponding SLR(1) GSS. For example, consider Grammar 10.1 and Grammar 10.2. For Grammar 10.1 the LR(1) table results in a much smaller GSS being constructed than the SLR(1) and LR(0) tables, whereas for Grammar 10.2 the LR(1) GSS is the largest of the three.

S′ ::= S
S  ::= T | b T a                         (10.1)
T  ::= a T B B | a
B  ::= b | ε

S′ ::= S
S  ::= T a                               (10.2)
T  ::= a T B B | a
B  ::= b | ε

                 Farshi naïve        Farshi opt          RNGLR
d      Table     Nodes    Edges      Nodes    Edges      Nodes    Edges
20     LR(0)     118      288        118      288        118      288
       SLR(1)    99       269        99       269        99       269
       LR(1)     45       21         29       45         29       45
1000   LR(0)     5,998    504,498    5,998    504,498    5,998    504,498
       SLR(1)    4,999    503,499    4,999    503,499    4,999    503,499
       LR(1)     2,005    1,001      1,009    2,005      1,009    2,005

Table 10.12: GSS sizes for strings of the form bᵈ in Grammar 10.1.

                 Farshi naïve        Farshi opt          RNGLR
d      Table     Nodes    Edges      Nodes    Edges      Nodes    Edges
20     LR(0)     136      306        136      306        136      306
       SLR(1)    114      266        114      266        114      266
       LR(1)     165      300        165      300        165      300
1000   LR(0)     6,996    505,496    6,996    505,496    6,996    505,496
       SLR(1)    5,994    503,496    5,994    503,496    5,994    503,496
       LR(1)     8,985    505,490    8,985    505,490    8,985    505,490

Table 10.13: GSS sizes for strings of the form bᵈ in Grammar 10.2.

Table 10.14 shows that there is a significant difference in the run-time cost of a parse of Grammar 10.1 using the LR(1) parse table compared to the SLR(1) and LR(0) parse tables.

d      Table     Farshi naïve   Farshi opt    RNGLR
20     LR(0)     3,252          2,112         190
       SLR(1)    3,233          2,093         190
       LR(1)     402            249           19
1000   LR(0)     334,832,502    168,665,502   499,500
       SLR(1)    334,831,503    168,664,503   499,500
       LR(1)     1,000,002      502,499       999

Table 10.14: Edge visits performed parsing strings of the form bᵈ in Grammar 10.1.

d      Table     Farshi naïve   Farshi opt    RNGLR
20     LR(0)     3,630          2,319         209
       SLR(1)    2,797          1,828         172
       LR(1)     2,474          1,658         172
1000   LR(0)     335,831,500    169,165,999   500,499
       SLR(1)    333,829,507    168,161,008   498,502
       LR(1)     332,833,504    167,662,508   498,502

Table 10.15: Edge visits performed parsing strings of the form bᵈ in Grammar 10.2.

10.6.3 The performance of Farshi’s algorithms

Although we expected the RNGLR algorithm to carry out significantly fewer edge visits than either of the Farshi algorithms, it was surprising that the number of edge visits for the naïve version of Farshi was so similar to that of the optimised version for ANSI-C and Pascal. To get a better understanding of the results, PAT was used to output diagnostics of the algorithms’ performance.

The optimised version of Farshi only triggers a saving when a new edge is added to an existing node in the current level of the GSS and the length of the reduction that is reapplied is greater than one. Therefore, if the majority of reductions performed have a length less than two, it is likely that the saving will not be as good as expected. To help reason about this behaviour PAT was used to create histograms of the number of times a reduction of length i was added to the set R for the ANSI-C and Pascal experiments. The two histograms are shown in Figures 10.2 and 10.3. Their x-axis represents the reduction length and the y-axis the number of times a reduction was added to the set R.

Figure 10.2: Histogram of the number of reductions of length i performed by Farshi’s optimised algorithm on the ANSI-C string of length 4,291

Figure 10.3: Histogram of the number of reductions of length i performed by Farshi’s optimised algorithm on the Pascal string of length 4,425

Both histograms show that the majority of reductions performed are of length one, which explains why the results are so similar.

The number of edge visits performed parsing strings in Grammar 6.1 differs greatly between the two Farshi algorithms. By counting the number of edge visits contributed by reductions of a given length, we can see where the optimised version of Farshi makes its saving. We used PAT to generate the histograms in Figures 10.4 and 10.5, which show the number of edge visits performed for reductions of length i in Grammar 6.1. The x-axis represents the different reduction lengths and the y-axis the number of edge visits performed.

Figure 10.4: Histogram of edge visits for Farshi’s naïve algorithm on Grammar 6.1 and the string b²⁰

Figure 10.5: Histogram of edge visits for Farshi’s optimised algorithm on Grammar 6.1 and the string b²⁰

For Grammar 6.1 there is a difference of 653,620 edge visits between the two versions: there are few reductions of length one and none of length zero, so the remaining, longer reductions trigger a significant saving.

In this section we have shown that the RNGLR algorithm is the most efficient of the algorithms compared. However, its worst case complexity is still O(nᵏ⁺¹), whereas other algorithms that effectively modify the grammar into 2-form claim cubic worst case complexity. The BRNGLR algorithm presented in Chapter 6 is one such technique; it bounds the length of the reduction paths that must be traced, as illustrated below. Next we compare the efficiency of the RNGLR and BRNGLR algorithms.
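The effect of 2-form can be seen on a single rule. For illustration only (this grammar is ours, not one used in the experiments), a rule with a right hand side of length four, such as

A ::= X1 X2 X3 X4

can be replaced by rules whose right hand sides have length at most two, for example

A  ::= A1 X4
A1 ::= A2 X3
A2 ::= X1 X2

BRNGLR achieves a similar effect via its modified parse table or on-the-fly (Chapter 6) rather than by rewriting the grammar, which is why the BRNGLR GSS’s and SPPF’s in the tables that follow contain additional nodes.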

10.7 The performance of the BRNGLR algorithm

This section compares the performance of the RNGLR and BRNGLR parsing algorithms discussed in Chapters 5 and 6 respectively. We present the results of parses for our three programming languages and Grammar 6.1. We begin by discussing the results of the latter experiment, which generates worst case behaviour for the BRNGLR algorithm and supra-cubic behaviour for the RNGLR algorithm.

10.7.1 GSS construction

The number of edge visits performed by each algorithm is shown in Table 10.16. Recall that, for both algorithms, the creation of an edge is not counted as an edge visit and the first edge in a reduction path is not visited because of the way pending reductions are stored in the set R. As it is sometimes difficult to distinguish between data that has been generated by a cubic, quartic or other polynomial function, the edge visit ratio is also presented; if the RNGLR counts grow quartically and the BRNGLR counts cubically, their ratio should grow linearly in d.

d     RNGLR         BRNGLR       Ratio
10    1,091         776          1.41
20    18,961        8,676        2.19
30    98,106        32,676       3.00
40    313,026       81,776       3.83
50    768,221       164,976      4.66
60    1,493,876     291,276      5.13
70    2,800,936     469,676      5.96
80    5,070,456     709,176      7.15
90    8,131,751     1,018,776    7.98
100   12,405,821    1,407,476    8.81
200   199,289,146   11,624,976   17.14

Table 10.16: Edge visits performed parsing strings in Grammar 6.1.

      RNGLR            BRNGLR
d     Nodes   Edges    Nodes   Edges
10    38      144      11      229
20    78      589      96      1,049
30    118     1,334    146     2,469
40    158     2,379    196     4,489
50    198     3,724    246     7,109
60    238     5,369    296     10,329
70    278     7,314    346     14,149
80    318     9,559    396     18,569
90    358     12,104   446     23,589
100   398     14,949   496     29,209
200   798     59,899   996     118,409

Table 10.17: Comparison between RNGLR and BRNGLR generated GSS sizes for strings in Grammar 6.1.

Because of the regularity of Grammar 6.1 we can compute by hand the number of edge visits made by the BRNGLR algorithm on bᵈ. For d ≥ 3, the number is

3d³/2 − 19d²/2 + 25d − 24.

The experimental results do indeed fit this formula. Table 10.16 only shows a sample of the data gathered; in fact the algorithm was run for all strings bᵈ of lengths from 1 to 200. This data was exported to Microsoft Excel and used to generate two polynomial trend-lines, one for each algorithm. Using the RNGLR trend-line we have generated the following formula for the number of RNGLR edge visits, which matches the 201 data points exactly. As expected it is a quartic polynomial:

d⁴/8 − d³/12 − 9d²/8 + 49d/12 − 4.

It is worth noting here that the BRNGLR algorithm implementation parsed b¹⁰⁰⁰ in less than 20 minutes, while the ‘GLR’ version of Bison could not parse b²⁰.
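As a quick check of the BRNGLR formula, at d = 10 it gives 3·1,000/2 − 19·100/2 + 25·10 − 24 = 1,500 − 950 + 250 − 24 = 776, which agrees with the BRNGLR entry for d = 10 in Table 10.16.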

[Plot: number of edge visits against input string length for RNGLR and BRNGLR.]

Figure 10.6: Edge visits done by the RNGLR and BRNGLR algorithms for strings bᵈ in Grammar 6.1

[Plot: edge visit ratio against input string length.]

Figure 10.7: Ratio of the edge visits done by the RNGLR and BRNGLR algorithms for strings bᵈ in Grammar 6.1

Whilst the BRNGLR algorithm improves the worst case performance, it is of course important that this improvement is not at the expense of average case behaviour. To demonstrate that the BRNGLR algorithm is not less practical than the RNGLR algorithm, we have run both algorithms with all three of our programming language grammars. We expect a similar number of edge visits to be performed by both algorithms.

d       Table     RNGLR    BRNGLR
4,291   LR(0)     5,184    5,180
        SLR(1)    4,052    4,498
        LALR(1)   4,502    4,498
        LR(1)     4,450    4,446

Table 10.18: Edge visits performed parsing an ANSI-C Quine-McCluskey Boolean minimiser

d       Table     RNGLR    BRNGLR
279     LR(0)     539      539
        SLR(1)    379      379
        LR(1)     364      364
4,425   LR(0)     8,556    8,550
        SLR(1)    5,665    5,663
        LALR(1)   5,665    5,663
        LR(1)     5,572    5,570
4,480   LR(0)     8,993    8,970
        SLR(1)    5,976    5,957
        LALR(1)   5,976    5,957
        LR(1)     5,654    5,654

Table 10.19: Edge visits performed parsing three ISO-7185 Pascal programs

                  RNGLR              BRNGLR
d       Table     Nodes    Edges     Nodes    Edges
279     LR(0)     1,736    1,750     1,943    1,957
        SLR(1)    1,272    1,278     1,438    1,444
        LR(1)     1,263    1,266     1,425    1,428
4,425   LR(0)     30,607   31,015    34,000   34,441
        SLR(1)    21,043   21,258    23,592   23,826
        LALR(1)   21,670   21,974    24,360   24,707
        LR(1)     21,039   21,135    23,540   23,655
4,480   LR(0)     31,336   31,833    34,910   35,464
        SLR(1)    21,670   21,974    24,360   24,707
        LR(1)     21,424   21,569    24,000   24,150

Table 10.20: Comparison between RNGLR and BRNGLR GSS sizes for Pascal programs

d       Table     RNGLR    BRNGLR
56      LR(0)     269      269
        SLR(1)    109      109
2,196   LR(0)     10,056   9,554
        SLR(1)    3,581    3,487

Table 10.21: Edge visits performed parsing two Cobol programs

                  RNGLR              BRNGLR
d       Table     Nodes    Edges     Nodes    Edges
56      LR(0)     513      671       636      794
        SLR(1)    319      439       357      110
2,196   LR(0)     19,756   23,002    22,897   26,461
        SLR(1)    12,057   13,512    13,017   14,517

Table 10.22: Comparison between RNGLR and BRNGLR GSS sizes for Cobol programs

10.7.2 SPPF construction

As we discussed in Section 6.6, the parser version of the BRNGLR algorithm must retain the cubic order of the underlying recogniser. To illustrate this, data was also collected on the space required for the SPPF’s of Grammar 6.1. The number of symbol nodes, additional nodes, packing nodes and edges in the SPPF were recorded. We expect the size of the SPPF generated for Grammar 6.1 by the RNGLR algorithm to be quartic, and that generated by the BRNGLR algorithm to be cubic. To highlight the difference between the two algorithms, the total node ratio, which we expect to be linear, has also been calculated.

      RNGLR                                BRNGLR                                          Total node
d     Symbol    Packing     Edges          Additional   Symbol    Packing    Edges         ratio
      nodes     nodes                      nodes        nodes     nodes
10    65        486         1,816          85           65        515        1,615         0.83
20    230       7,296       27,931         460          230       5,325      16,135        1.25
30    495       35,931      139,346        1,135        495       19,435     58,555        1.73
40    860       111,891     437,061        2,110        860       47,845     143,875       2.22
50    1,325     270,676     1,062,076      3,385        1,325     95,555     287,095       2.71
60    1,890     557,786     2,195,391      4,960        1,890     167,565    503,215       3.21
70    2,555     1,028,721   4,058,006      6,835        2,555     268,875    807,235       3.71
80    3,320     1,748,981   6,910,921      9,010        3,320     404,485    1,214,155     4.20
90    4,185     2,794,066   11,055,136     11,485       4,185     579,395    1,738,975     4.70
100   5,150     4,249,476   16,831,655     14,260       5,150     798,605    2,396,695     5.20

Table 10.23: Comparison between RNGLR and BRNGLR generated SPPF’s for strings in Grammar 6.1.

The BRNGLR algorithm generates SPPF’s an order of magnitude smaller than the RNGLR algorithm for strings in the language of Grammar 6.1. Microsoft Excel was used to generate the following charts of the data.

[Plot: total number of SPPF nodes against input string length for RNGLR and BRNGLR.]

Figure 10.8: Comparison between the sizes of the SPPF’s constructed by the RNGLR and BRNGLR algorithms for strings in Grammar 6.1

[Plot: total node ratio against input string length.]

Figure 10.9: Total node ratio between the SPPF’s constructed by the RNGLR and BRNGLR algorithms for strings in Grammar 6.1

The Excel generated equation for the number of packing nodes created by the RNGLR algorithm is

y = 0.0417d⁴ + 0.0833d³ − 0.0417d² − 1.0833d − 1.

The Microsoft Excel generated equation for the number of packing nodes created by the BRNGLR algorithm is

y = 0.8333d³ − 3.5019d² + 2.7657d + 3.6201.

Because of the irregular way packing nodes are created for d < 4, approximation errors occur in the Excel generated charts. By replacing the decimal coefficients with more precise fractions, the correct equation for the number of packing nodes in the RNGLR SPPF can be determined. For d > 3 the equation is

y = d⁴/24 + d³/12 − d²/24 − 13d/12 + 1.

Similarly the correct equation for the number of packing nodes in the SPPF generated by the BRNGLR algorithm, for d > 3, is

y = 5d³/6 − 7d²/2 + 8d/3 + 5.
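As a check of both fractional forms, at d = 10 the RNGLR equation gives 10,000/24 + 1,000/12 − 100/24 − 130/12 + 1 = 486 and the BRNGLR equation gives 5,000/6 − 700/2 + 80/3 + 5 = 515, matching the packing node counts for d = 10 in Table 10.23.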

To show that the average case performance of SPPF generation is not compromised, SPPF’s were also generated for the ANSI-C, Pascal and Cobol programs previously described.

                  RNGLR                          BRNGLR
d       Table     Symbol   Packing   Edges      Additional   Symbol   Packing   Edges
                  nodes    nodes                nodes        nodes    nodes
4,291   LR(0)     39,039   364       40,454     2,359        39,039   364       42,800
        SLR(1)    28,301   362       29,033     2,066        28,301   362       31,092
        LALR(1)   28,174   362       28,906     2,066        28,174   362       30,965
        LR(1)     28,093   362       28,761     2,035        28,093   362       30,797

Table 10.24: SPPF statistics for an ANSI-C Quine-McCluskey Boolean minimiser

                  RNGLR                          BRNGLR
d       Table     Symbol   Packing   Edges      Additional   Symbol   Packing   Edges
                  nodes    nodes                nodes        nodes    nodes
279     LR(0)     1,372    0         2,100      207          1,372    0         2,307
        SLR(1)    1,090    0         1,445      166          1,090    0         1,611
        LR(1)     1,080    0         1,423      162          1,080    0         1,585
4,425   LR(0)     22,592   6         35,350     3,426        22,592   6         38,770
        SLR(1)    17,190   2         23,325     2,568        17,190   2         25,891
        LALR(1)   17,193   2         23,325     2,568        17,193   2         25,891
        LR(1)     17,065   2         23,050     2,520        17,065   2         25,572
4,480   LR(0)     23,121   21        36,422     3,631        23,121   22        40,031
        SLR(1)    17,624   17        24,158     2,733        17,624   18        26,873
        LALR(1)   17,627   17        24,158     2,733        17,627   18        26,873
        LR(1)     17,264   0         23,304     2,581        17,264   0         25,885

Table 10.25: SPPF statistics for three ISO-7185 Pascal programs

                  RNGLR                          BRNGLR
d       Table     Symbol   Packing   Edges      Additional   Symbol   Packing   Edges
                  nodes    nodes                nodes        nodes    nodes
56      LR(0)     556      24        932        123          556      24        1,055
        SLR(1)    422      9         576        38           422      9         614
2,196   LR(0)     16,631   766       27,386     3,459        16,631   818       30,596
        SLR(1)    10,817   306       13,940     1,005        10,817   316       14,898

Table 10.26: SPPF statistics for two Cobol programs

The results presented in this section complement the theoretical analysis of the BRNGLR algorithm. In the next section we compare BRNGLR with Earley’s recognition algorithm, which is also known to have cubic worst case complexity.

[Plot: edge visits against input string length for RNGLR, BRNGLR, Tomita 1e, Tomita 1e mod, Farshi naïve and Farshi.]

Figure 10.10: Comparison between edge visits done by Tomita-style GLR algorithms for strings in Grammar 6.1

10.8 The performance of Earley’s algorithm

We discussed Earley’s algorithm in Chapter 8. We have implemented his recognition algorithm within PAT and generated statistical data on its performance. We compare the size of the Earley sets with the size of the GSS constructed by the RNGLR and BRNGLR algorithms using an RN-LR(1) parse table (except for the Cobol parse, which uses an RN-SLR(1) table), and the number of symbol comparisons required to construct the Earley sets with the number of edge visits performed by the RNGLR and BRNGLR algorithms.

Earley’s algorithm is known to be cubic in the worst case, quadratic for unambiguous grammars and linear on the class of bounded state grammars [Ear68]. We begin by comparing the performance of the three algorithms during the parse of strings bᵈ for Grammar 6.1. Recall that this grammar triggers supra-cubic behaviour for the RNGLR algorithm and cubic behaviour for the BRNGLR algorithm. Since it is ambiguous we expect Earley’s algorithm to display at least quadratic behaviour.

      Earley                   RNGLR                          BRNGLR
d     Set size   Symbol        Nodes   Edges    Edge          Nodes   Edges     Edge
                 comparisons                    visits                          visits
10    290        1,446         38      144      1,091         11      229       776
20    1,075      12,106        78      589      18,961        96      1,049     8,676
30    2,360      41,966        118     1,334    98,106        146     2,469     32,676
40    4,145      101,026       158     2,379    313,026       196     4,489     81,776
50    6,430      199,286       198     3,724    768,221       246     7,109     164,976
60    9,215      346,746       238     5,369    1,493,876     296     10,329    291,276
70    12,500     553,406       278     7,314    2,800,936     346     14,149    469,676
80    16,285     829,266       318     9,559    5,070,456     396     18,569    709,176
90    20,570     1,184,326     358     12,104   8,131,751     446     23,589    1,018,776
100   25,355     1,628,586     398     14,949   12,405,821    496     29,209    1,407,476
200   100,705    13,177,186    798     59,899   199,289,146   996     118,409   11,624,976

Table 10.27: Earley results for strings in Grammar 6.1

As expected, Earley’s algorithm performs an order of magnitude fewer symbol comparisons than the RNGLR algorithm performs edge visits, and it compares well to the BRNGLR algorithm. Next we present the results of the parses for our three programming languages.

       Earley                   RNGLR                       BRNGLR
d      Set size   Symbol        Nodes    Edges    Edge      Nodes    Edges    Edge
                  comparisons                     visits                      visits
4,291  283,710    2,994,766     39,202   39,389   4,450     41,528   41,748   4,446

Table 10.28: Earley results for an ANSI-C program

       Earley                   RNGLR                       BRNGLR
d      Set size   Symbol        Nodes    Edges    Edge      Nodes    Edges    Edge
                  comparisons                     visits                      visits
279    7,648      53,068        1,263    1,266    364       1,425    1,428    364
4,425  133,078    1,012,128     21,039   21,135   5,570     23,540   23,655   5,572
4,480  136,736    1,048,312     21,424   21,569   5,654     24,000   24,150   5,654

Table 10.29: Earley results for Pascal programs

       Earley                   RNGLR                       BRNGLR
d      Set size   Symbol        Nodes    Edges    Edge      Nodes    Edges    Edge
                  comparisons                     visits                      visits
56     13,275     180,568       319      439      109       357      110      109
2,197  493,795    6,135,997     12,057   13,512   3,581     13,017   14,517   3,487

Table 10.30: Earley results for Cobol programs

Both the RNGLR and the BRNGLR algorithms compare very well to Earley’s algorithm in all of the above experiments.

All of the algorithms that we have compared so far have focused on efficiently performing depth-first searches for non-deterministic sentences. In Chapter 7 we presented the RIGLR algorithm, which attempts to improve a parser’s efficiency by restricting the amount of stack activity. The next section presents the results collected for parses by the RIGLR algorithm on our four grammars. We compare the performance of the RIGLR algorithm to that of the RNGLR and BRNGLR algorithms.

10.9 The performance of the RIGLR algorithm

The asymptotic and typical actual performance of the RIGLR algorithm is demonstrated by comparing it to the RNGLR and BRNGLR algorithms. We use the grammars for ANSI-C, Pascal and Cobol as well as Grammar 6.1. We begin this section by discussing the size of the RIA’s and RCA’s constructed by GTB and used by PAT for the experiments.

10.9.1 The size of RIA and RCA

Part of the speed up of the RIGLR algorithm over the RNGLR algorithm is obtained by effectively unrolling the grammar. In essence this is a back substitution of alternates for instances of non-terminals on the right hand sides of the grammar rules (see the illustration after this paragraph). Of course, in the case of recursive non-terminals the back substitution process does not terminate, which is why such instances are terminalised before the process begins. We terminalised and then built the RIA’s and RCA’s for ANSI-C, Pascal, Cobol and Grammar 6.1 described above. In each case the grammars were terminalised so that all recursion except non-hidden left recursion was removed before constructing the RIA’s. Table 10.31 shows the number of terminalised non-terminals, the number of instances of these terminalised non-terminals in the grammar, the number of symbol labelled transitions in RCA(Γ) and the number of reduction labelled transitions in RCA(Γ). (It is a property of RCA(Γ) that the number of states is equal to the number of symbol labelled transitions, plus the number of terminalised non-terminal instances, plus one.)
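To illustrate back substitution on a small, purely invented example: given the rules

S ::= a A b
A ::= c | d

substituting the alternates of A into the rule for S gives

S ::= a c b | a d b

and the non-terminal A is no longer needed. If A were recursive this substitution would never terminate, hence the terminalisation step described above.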

Grammar   Terminalised      Instances of      Symbol        Reduction
          non-terminals     terminalised      transitions   transitions
                            non-terminals
ANSI-C    12                42                6,907,535     5,230,022
Pascal    8                 11                13,522        11,403
Cobol     19                28                4,666,176     5,174,657
6.1       1                 3                 5             4

Table 10.31: The size of some RIA’s

          States      Symbols
ANSI-C    1,517,419   98
Pascal    288,138     86
Cobol     5,252,943   373

Table 10.32: RCA sizes

The sizes of the RCA’s for ANSI-C and Cobol are impractically large. The potential explosion in automaton size is not of itself surprising: we know it is possible to generate an LR(0) automaton which is exponential in the size of the grammar. The point is that the examples here are not artificial. Some properties of grammars that cause this kind of explosion are discussed in detail in [JSE04a].

The construction of these automata is made difficult by their size. In particular, the approach of constructing the RCA by first constructing the IRIA’s and RIA, as presented in Chapter 7, can be particularly expensive. A more efficient construction approach, which performs some aspects of the subset construction ‘on-the-fly’, is presented in [JSE04a].

Ultimately the way to control the explosion in the size of the RCA is to introduce more non-terminal terminalisations. In order for the RIGLR algorithm to be correct it is necessary to terminalise the grammar so that no self embedding exists, but the process remains correct if further terminalisations are applied. Although this will reduce the size of the parser for some grammars, it comes at the cost of introducing more recursive calls when the parser is run, and hence the (practical but not asymptotic) run time performance and space requirements of the parser will be increased. Thus there is an engineering tradeoff to be made between the size of the parser and its performance.

10.9.2 Asymptotic behaviour

To measure the run-time performance of the RNGLR and BRNGLR algorithms we count the number of edge visits performed during the application of reductions, in the way described in Section 10.6. For the space required we count the number of state nodes and edges in the GSS as well as the maximum size of the set R. For the RIGLR algorithm we measure the run-time performance by counting the number of RCA edge visits performed during the execution of pop actions, and the space by the total number of elements added to the sets Uᵢ.

Figure 10.11 and Table 10.33 present the number of edge visits performed by the RNGLR, BRNGLR and RIGLR algorithms for parses of strings bᵈ in Grammar 6.1. The RIGLR and BRNGLR algorithms display cubic time complexity whereas the RNGLR algorithm displays quartic time complexity.

[Plot: edge visits against input string length for RNGLR, BRNGLR and RIGLR.]

Figure 10.11: Edge visits performed by the RNGLR, BRNGLR and RIGLR algorithms for strings in Grammar 6.1

d     RNGLR         BRNGLR       RIGLR
10    1,091         776          425
20    18,961        8,676        4,255
30    98,106        32,676       15,485
40    313,026       81,776       38,115
50    768,221       164,976      76,145
60    1,598,191     291,276      133,575
70    2,967,436     469,676      214,405
80    5,070,456     709,176      322,635
90    8,131,751     1,018,776    462,265
100   12,405,821    1,407,476    637,295
200   199,289,146   11,624,976   5,214,595

Table 10.33: Edge visits performed parsing strings in Grammar 6.1

      RNGLR                              BRNGLR                               RIGLR
d     Nodes   Edges    R        Max |R|  Nodes   Edges     R        Max |R|   Nodes   Edges     U         Max |U|
10    38      144      198      36       11      229       283      56        20      165       259       52
20    78      589      893      86       96      1,049     1,353    136       40      725       1,109     112
30    118     1,334    2,088    136      146     2,469     3,223    216       60      1,685     2,559     172
40    158     2,379    3,783    186      196     4,489     5,893    296       80      3,045     4,609     232
50    198     3,724    5,978    236      246     7,109     9,363    376       100     4,805     7,259     292
60    238     5,369    8,673    286      296     10,329    13,633   456       120     6,965     10,509    352
70    278     7,314    11,868   336      346     14,149    18,703   536       140     9,525     14,359    412
80    318     9,559    15,563   386      396     18,569    24,573   616       160     12,485    18,809    472
90    358     12,104   19,758   436      446     23,589    31,243   696       180     15,845    23,859    532
100   398     14,949   24,453   486      496     29,209    38,713   776       200     19,605    29,509    592
200   798     59,899   98,903   986      996     118,409   157,413  1,576     400     79,205    119,009   1,192

Table 10.34: Size of RNGLR and BRNGLR GSS and RIGLR RCA for strings in Grammar 6.1

We now consider the parses performed on the programming language examples.

d       RNGLR   BRNGLR   RIGLR
4,291   5,184   5,180    4,888

Table 10.35: Edge visits performed parsing an ANSI-C program

        RNGLR                               BRNGLR                               RIGLR
d       Nodes    Edges    R        Max |R|  Nodes    Edges    R        Max |R|   Nodes   Edges   U        Max |U|
4,291   28,323   28,477   35,033   3        30,332   30,512   37,392   4         4,009   4,579   59,399   84

Table 10.36: Size of RNGLR and BRNGLR GSS and RIGLR RCA for ANSI-C programs

d       RNGLR   BRNGLR   RIGLR
279     539     539      81
4,425   8,556   8,550    1,523
4,480   8,993   8,970    1,587

Table 10.37: Edge visits performed parsing Pascal programs

        RNGLR                               BRNGLR                               RIGLR
d       Nodes    Edges    R        Max |R|  Nodes    Edges    R        Max |R|   Nodes   Edges   U        Max |U|
279     1,736    1,750    1,471    5        1,943    1,957    1,678    5         121     121     1,952    25
4,425   30,607   31,015   26,590   5        34,000   34,441   34,441   5         1,830   1,834   34,416   48
4,480   31,336   31,833   27,353   5        34,910   35,464   30,984   5         1,939   1,951   35,463   51

Table 10.38: Size of RNGLR and BRNGLR GSS and RIGLR RCA for Pascal programs

d       RNGLR    BRNGLR   RIGLR
2,197   10,056   9,554    5,759

Table 10.39: Edge visits performed parsing Cobol programs

        RNGLR                               BRNGLR                               RIGLR
d       Nodes    Edges    R        Max |R|  Nodes    Edges    R        Max |R|   Nodes   Edges   U        Max |U|
2,196   19,756   23,002   19,777   10       22,897   26,461   23,236   10        2,778   5,926   28,802   445

Table 10.40: Size of RNGLR and BRNGLR GSS and RIGLR RCA for Cobol programs

The sizes of the structures constructed during the parses by the separate algorithms indicate that the RIGLR algorithm achieves an order of magnitude reduction in space compared to the RNGLR and BRNGLR algorithms. The examples also show that the number of edge visits performed by the RIGLR algorithm compares well to the results of the RNGLR and BRNGLR algorithms.

Part IV

Chapter 11

Concluding remarks

11.1 Conclusions

Generalised parsing is an important area of research. The incorporation of a GLR mode in GNU Bison, and the emergence of tools such as the Asf+Sdf Meta-Environment and Stratego/XT, highlight the increasing practical importance of generalised parsing techniques. Unfortunately the generalised parsing algorithms used by such tools are still relatively inefficient. Although straightforward optimisations can significantly improve their performance, the literature lacks a comprehensive analysis of the different techniques, which hinders the understanding and improvement of existing approaches and hampers the development of new ideas.

This thesis advances the study of generalised parsing algorithms by providing the following contributions: a comprehensive, comparative analysis of generalised parsing techniques; a new tool, the Parser Animation Tool (PAT), which aids the understanding of the different techniques and can be used to compare their performance; and the BRNGLR algorithm, a new algorithm that displays cubic complexity in the worst case.

The theoretical treatment of the different generalised parsing techniques presented in this thesis is supported by PAT. The implementation of the algorithms in PAT closely follows their theoretical description. In addition to graphically animating the algorithms’ operation, PAT collects statistical data which has been used to analyse their performance in Chapter 10. The main results of this work are as follows:

• Both variants of Farshi’s algorithm perform poorly when compared to the other techniques.

• The performance of Farshi’s algorithm is significantly improved by implementing the straightforward optimisation highlighted in Chapter 4.

• The RNGLR algorithm performs very well in comparison to all the other algorithms in all experiments.

• The BRNGLR algorithm performs an order of magnitude fewer edge visits than the RNGLR algorithm during the parse of a grammar which triggers worst case behaviour for both algorithms.

• The performance of the BRNGLR algorithm also compares very well to the RNGLR algorithm for the programming language parses.

• The performance of the RIGLR algorithm compares well to the BRNGLR and RNGLR algorithms. Additionally, there is an order of magnitude reduction in the size of the structures constructed during a parse by the RIGLR algorithm when compared to the RNGLR and BRNGLR algorithms. Unfortunately the sizes of the RCA’s for the programming language grammars are impractically large.

All of the GLR-based tools inspected as part of this thesis implement the naïve version of Farshi’s algorithm. As the results in this thesis show, a significant improvement in performance can be achieved by making a relatively simple modification to this algorithm.

11.2 Future work

The contributions of this thesis advance the study of generalised parsing, but there are still several areas that can be significantly improved by further research. In the remainder of this chapter we briefly discuss some possibilities for future research in the field.

Resolution of ambiguities

Theoretically speaking, it is desirable for a generalised parser to produce all possible derivations of a parse, and the SPPF representation provides a relatively efficient structure with which to do this. The problem, however, is that in practice most applications only want their parsers to output one derivation, and extracting the desired parse tree from a forest is not usually straightforward. Ambiguities are often difficult to understand, and modifying the grammar so as to remove them can result in more ambiguities being introduced and a less intuitive grammar. New techniques need to be developed and incorporated into generalised parsers that simplify the selection of a single parse tree from a forest. The work done on disambiguation filtering [vdBSVV02] in scannerless GLR parsing is an interesting approach. The incorporation of such techniques in the existing GLR algorithms should not be difficult.

Investigating the effect of the additional GSS nodes created by the BRNGLR algorithm

The theoretical analysis of the BRNGLR algorithm has shown that, in the worst case, it is asymptotically better than the existing GLR algorithms. Furthermore, the results in Chapter 10 indicate that it also performs well in practice. However, it would be interesting to investigate the actual runtime costs that are contributed by the additional nodes created in the GSS. Could the performance of the algorithm be improved in some cases by compromising its worst case complexity and only selectively creating additional nodes?

Using the RIGLR algorithm to improve the performance of scannerless parsers

Scannerless (S)GLR parsers do not use a separate scanner to divide the input string into lexical tokens. Instead they incorporate the lexical analysis phase into the parser. Although this approach has its advantages, one of its main drawbacks is that it can be less efficient; “scanning with a finite automaton has a lower complexity than parsing with a stack” [Vis97, p.36]. Perhaps the efficiency of SGLR parsers can be improved by partially incorporating the techniques developed for the RIGLR algorithm.

Error reporting in GLR parsers

A common complaint against bottom-up parsing techniques in general is their complicated error reporting. Deterministic bottom-up parsers such as Yacc often refer to parse table actions when a parse error is encountered. Although a considerable amount of research has been done on improving error reporting for LR parsers, relatively little work has been done for GLR parsers.

Bibliography

[AH99] John Aycock and R. Nigel Horspool. Faster generalised LR parsing. In Proceedings of the 8th International Compiler Conference, volume 1575 of Lecture Notes in Computer Science, pages 32–46, Amsterdam, March 1999. Springer-Verlag.

[AH01] John Aycock and R. Nigel Horspool. Directly-executable Earley pars- ing. Lecture Notes in Computer Science, 2027:229–243, 2001.

[AH02] John Aycock and R. Nigel Horspool. Practical Earley parsing. The Computer Journal, 45(6):620–630, 2002.

[AHJM01] John Aycock, R. Nigel Horspool, Jan Janousek, and Borivoj Melichar. Even faster generalised LR parsing. Acta Informatica, 37(9):633–651, 2001.

[aiS] aisee. http://www.aisee.com.

[ASU86] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.

[AU73] Alfred V. Aho and Jeffrey D. Ullman. The theory of parsing, translation and compiling, volume I and II. Prentice-Hall, Englewood Cliffs, New Jersey, 1972–1973.

[BL89] Sylvie Billot and Bernard Lang. The structure of shared forests in ambiguous parsing. In 27th Annual Meeting of the Association for Computational Linguistics, pages 143–151, Vancouver, British Columbia, Canada, June 1989.

[BP98] Achyutram Bhamidipaty and Todd A. Proebsting. Very fast Yacc-compatible parsers (for very little effort). Software - Practice and Experience, 28(2):181–190, February 1998.

[BPS75] M. Bouckaert, A. Pirotte, and M. Snelling. Efficient parsing algorithms for general context-free parsers. Information sciences, 8(1):1–26, January 1975.

[BWvW+60] J. W. Backus, J. H. Wegstein, A. van Wijngaarden, M. Woodger, F. L. Bauer, J. Green, C. Katz, J. McCarthy, A. J. Perlis, H. Rutishauser, K. Samelson, and B. Vauquois. Report on the algorithmic language ALGOL 60. Communications of the ACM, 3(5):299–314, May 1960.

[Cho56] Noam Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2(3):113–124, 1956.

[Cob] http://www.cs.vu.nl/grammarware/browsable/vs-cobol-ii/.

[CP82] Kenneth Church and Ramesh Patil. Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics, 8(3–4):139–149, July–December 1982.

[CS70] John Cocke and Jacob T. Schwartz. Programming languages and their compilers. Technical report, Courant Institute of Mathematical Sciences, New York University, 1970.

[CW87] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. In STOC ’87: Proceedings of the nineteenth annual ACM conference on Theory of computing, pages 1–6, New York, United States, January 1987. ACM Press.

[CW90] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251–280, March 1990.

[DeR69] Franklin L. DeRemer. Practical translators for LR(k) languages. PhD thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1969.

[DeR71] Franklin L. DeRemer. Simple LR(k) grammars. Communications ACM, 14(7):435–460, July 1971.

[DKV99] Arie van Deursen, Paul Klint, and Chris Verhoef. Research issues in the renovation of legacy systems. In J. P. Finance, editor, Fundamental Approaches to Software Engineering, LNCS, volume 1577, pages 1–21. Springer-Verlag, 1999.

[DS04] Charles Donnelly and Richard M. Stallman. Bison: The Yacc-Compatible Parser Generator. Free Software Foundation, 59 Temple Place, Suite 330 Boston, MA 02111-1307 USA, December 2004.

[Ear67] Jay Earley. An n² recognizer for context-free grammars. Technical report, Carnegie-Mellon University, Pittsburgh, Pennsylvania, September 1967.

[Ear68] Jay Earley. An efficient context-free parsing algorithm. PhD thesis, Computer science department, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 1968.

[Egg03] Paul Eggert. Bison version 1.875 available. http://compilers.iecc. com/comparch/article/03-01-005, January 2003.

[GH76] Susan L. Graham and Michael A. Harrison. Parsing of general context-free languages. Advances in Computers, 14:77–185, 1976.

[GHR80] Susan L. Graham, Michael A. Harrison, and Walter L. Ruzzo. An improved context-free recogniser. ACM transactions on programming languages and systems, 2(3):415–462, July 1980.

[GJ90] Dick Grune and Ceriel Jacobs. Parsing techniques: a practical guide. Ellis Horwood Limited, 1990.

[GR62] S. Ginsburg and H. G. Rice. Two families of languages related to ALGOL. Journal of the ACM, 9(3):350–371, 1962.

[HMU01] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley, second edition, 2001.

[HP03] David R. Hanson and Todd A. Proebsting. A research C# compiler. Technical Report MSR-TR-2003-32, Microsoft Research, 2003.

[HU79] John E. Hopcroft and Jeffrey D. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley, first edition, 1979.

[HW90] R. Nigel Horspool and Michael Whitney. Even faster LR parsing. Software - practice and experience, 20(6):515–535, June 1990.

[Iro61] Edgar T. Irons. A syntax directed compiler for Algol 60. Communications of the ACM, 4(1):51–55, January 1961.

[Joh79] Steven C. Johnson. Yacc: Yet Another Compiler Compiler. In UNIX Programmer’s Manual, volume 2, pages 353–387. Holt, Rinehart, and Winston, New York, NY, USA, 1979.

[JS02] Adrian Johnstone and Elizabeth Scott. Generalised reduction modified LR parsing for domain specific language prototyping. In Proc. 35th Annual Hawaii International Conference On System Sciences (HICSS02), New Jersey, January 2002.

[JS03] Adrian Johnstone and Elizabeth Scott. Generalised regular parsers. In Gorel Hedin, editor, Compiler Construction, Proc. 12th Intnl. Conf., CC2003, volume 2622 of Lecture Notes in Computer Science, pages 232–246, Berlin, 2003. Springer-Verlag.

[JSE] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. Evaluating GLR parsing algorithms. To appear in Science of Computer Programming.

[JSE04a] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. Generalised parsing: some costs. In Evelyn Duesterwald, editor, Compiler Construction, Proc. 13th Intnl. Conf., CC2004, volume 2985 of Lecture Notes in Computer Science, pages 89–103, Berlin, 2004. Springer-Verlag.

[JSE04b] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. The Grammar Tool Box: a case study comparing GLR parsing algorithms. In Gorel Hedin and Eric Van Wyk, editors, 4th Workshop on Language Descriptions, Tools and Applications LDTA2004, also in Electronic Notes in Theoretical Computer Science (ENTCS). Elsevier Science, 2004.

[JSE04c] Adrian Johnstone, Elizabeth Scott, and Giorgios Economopoulos. The Grammar Tool Box GTB and its companion Java-based animator tool PAT. In Electronic Notes in Theoretical Computer Science (ENTCS), Barcelona, Spain, April 2004. Elsevier Science.

[Kas65] T. Kasami. An efficient recognition and syntax analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford, Massachusetts, 1965.

[Kip91] James R. Kipps. GLR parsing in time O(n³). In Masaru Tomita, editor, Generalized LR parsing, chapter 4, pages 42–59. Kluwer Academic Publishers, 1991.

[KL03] Steven Klusener and Ralf Lämmel. Deriving tolerant grammars from a base-line grammar. In 19th International Conference on Software Maintenance (ICSM 2003), pages 179–. IEEE Computer Society, September 2003.

[Knu65] Donald E. Knuth. On the translation of languages from left to right. Information and control, 8(6):607–639, 1965.

[KR88] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice Hall, second edition, 1988.

[KT69] T. Kasami and K. Torii. A syntax analysis procedure for unambiguous context-free grammars. Journal of the ACM, 16(3):423–431, 1969.

[Lan91] M.M. Lankhorst. An empirical comparison of generalized LR tables. In Workshop on Tomita’s algorithm - extensions and applications, pages 92–98, University of Twente, Computer Science Department, P.O. Box 217, Enschede, The Netherlands, 1991.

[Lee92a] René Leermakers. A recursive ascent Earley parser. Information Processing Letters, 41(2):87–91, 1992.

[Lee92b] René Leermakers. Recursive ascent parsing: from Earley to Marcus. Theoretical Computer Science, 104(2):299–312, 1992.

[Lee93] René Leermakers. The functional treatment of parsing. Kluwer Academic, 1993.

[Lee02] Lillian Lee. Fast context-free grammar parsing requires fast boolean matrix multiplication. Journal of the ACM, 49(1):1–15, 2002.

[LS68] P. M. Lewis, II and R. E. Stearns. Syntax-directed transduction. Journal of the ACM, 15(3):465–488, 1968.

[LV01] Ralf Lämmel and C. Verhoef. Semi-automatic grammar recovery. Software Practice and Experience, 31(15):1395–1448, 2001.

[McP02] Scott McPeak. Elkhound: A fast, practical GLR parser generator. Technical Report UCB CSD-2-1214, University of California, Berkeley, Computer science division (EECS), University of California, Berkeley, California 94720, December 2002.

[MN04] Scott McPeak and George C. Necula. Elkhound: A fast, practical GLR parser generator. In Conference on Compiler Construction (CC04), Lecture notes in computer science, Barcelona, Spain, April 2004. Springer-Verlag.

[NF91] Rahman Nozohoor-Farshi. GLR parsing for ε-grammars. In Masaru Tomita, editor, Generalized LR Parsing, chapter 5, pages 61–75. Kluwer Academic Publishers, Netherlands, 1991.

[NS96] Mark-Jan Nederhof and Janos J. Sarbo. Increasing the applicability of LR parsing. In Harry Bunt and Masaru Tomita, editors, Recent advances in parsing technology, pages 37–57. Kluwer Academic Publishers, Netherlands, 1996.

[Pag70] David Pager. A solution to an open problem by Knuth. Information and Control, 17(5):462–473, December 1970.

[PAT05] Parser Animation Tool (PAT). http://www.cs.rhul.ac.uk/home/ giorgios/PAT/, 2005.

[Pen86] Thomas J. Pennello. Very fast LR parsing. In SIGPLAN’86: Proceedings of the 1986 SIGPLAN symposium on Compiler Construction, pages 145–151. ACM Press, 1986.

[Pfa90] Peter Pfahler. Optimizing directly executable LR parsers. In CC’90: Proceedings of the third international workshop on Compiler Construction, pages 179–192. Springer-Verlag New York, Inc., 1990.

[PQ95] Terence J. Parr and Russell W. Quong. ANTLR: A predicated-LL(k) parser generator. Software - Practice and Experience, 25(7):789–810, July 1995.

[Rek92] Jan Rekers. Parser generation for interactive environments. PhD thesis, University of Amsterdam, 1992.

[Sch91] Yves Schabes. Polynomial time and space shift-reduce parsing of arbitrary context-free grammars. In Proceedings of the 29th annual meeting on Association for Computational Linguistics, pages 106–113, Morristown, NJ, USA, 1991. Association for Computational Linguistics.

[She76] B. Sheil. Observations on context-free parsing. Statistical methods in linguistics, pages 71–109, 1976.

[Sik97] Klaas Sikkel. Parsing schemata: a framework for specification and analysis of parsing algorithms. Texts in theoretical computer science, an EATCS series. Springer-Verlag Berlin Heidelberg New York, 1997.

[SJ] Elizabeth Scott and Adrian Johnstone. Generalised bottom-up parsers with reduced stack activity. To appear in The Computer Journal.

[SJ00] Elizabeth Scott and Adrian Johnstone. Tomita style generalised LR parsers. Technical report, Royal Holloway, University of London, Department of Computer Science, Egham, Surrey TW20 0EX, England, December 2000.

[SJ03a] Elizabeth Scott and Adrian Johnstone. Reducing non-determinism in reduction modified LR(1) parsers. Technical Report CSD-TR-02-04, Royal Holloway, University of London, Department of Computer Science, Egham, Surrey TW20 0EX, England, January 2003.

[SJ03b] Elizabeth Scott and Adrian Johnstone. Table based parsers with reduced stack activity. Technical Report CSD-TR-02-08, Royal Holloway, University of London, Department of Computer Science, Egham, Surrey TW20 0EX, England, May 2003.

[SJ04] Elizabeth Scott and Adrian Johnstone. Compilers and code generation. Department of Computer Science, Royal Holloway College, University of London, Egham Hill, Surrey, TW20 0EX, 2004.

[SJE03] Elizabeth Scott, Adrian Johnstone, and Giorgios Economopoulos. BRN-table based GLR parsers. Technical Report CSD-TR-03-06, Royal Holloway, University of London, Department of Computer Science, Egham, Surrey TW20 0EX, England, July 2003. Journal version submitted, under revision.

[Str94] Bjarne Stroustrup. The design and evolution of C++. Addison-Wesley, 1994.

[Tom84] Masaru Tomita. LR parsers for natural languages. In 22nd conference on Association for Computational Linguistics, pages 354–357, Stanford, California, 1984. Association for Computational Linguistics.

[Tom85] Masaru Tomita. An efficient context-free parsing algorithm for natural languages and its applications. PhD thesis, Carnegie Mellon University, December 1985.

[Tom86] Masaru Tomita. Efficient parsing for natural language: a fast algorithm for practical systems. Kluwer Academic Publishers, Boston, 1986.

[Tom91] Masaru Tomita, editor. Generalized LR parsing. Kluwer academic publishers, 1991.

[Ung68] Stephen H. Unger. A global parser for context-free phrase structure grammars. Communications of the ACM, 11(4):240–247, April 1968.

[Val75] L. G. Valiant. General context-free recognition in less than cubic time. Journal of Computer and System Sciences, 10:308–315, 1975.

[vdBdJKO00] M.G.J. van den Brand, H.A. de Jong, P. Klint, and P.A. Olivier. Efficient annotated terms. Software, Practice and Experience, 30(3):259–291, 2000.

[vdBSVV02] M.G.J. van den Brand, J. Scheerder, J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In N. Horspool, editor, Compiler Construction (CC), volume 2304 of Lecture Notes in Computer Science, pages 143–158, Grenoble, France, April 2002. Springer-Verlag.

[vdBvDH+01] M.G.J. van den Brand, A. van Deursen, J. Heering, H.A. de Jong, M. de Jonge, T. Kuipers, P. Klint, L. Moonen, P. Olivier, J. Scheerder, J. Vinju, E. Visser, and J. Visser. The Asf+Sdf Meta-Environment: a component-based language development environment. In R. Wilhelm, editor, Compiler Construction (CC), LNCS, pages 365–370. Springer, 2001.

[Vee03] Niels Veerman. Revitalizing modifiability of legacy assets. In CSMR ’03: Proceedings of the Seventh European Conference on Software Maintenance and Reengineering, page 19, Washington, DC, USA, 2003. IEEE Computer Society.

[Vis97] Eelco Visser. Syntax definition for language prototyping. PhD thesis, University of Amsterdam, 1997.

[Vis04] Eelco Visser. Program transformation with Stratego/XT: Rules, strategies, tools, and systems in StrategoXT-0.9. In Lengauer et al., editors, Domain-specific program generation, volume 3016 of Lecture notes in computer science (LNCS), pages 216–238. Springer-Verlag, June 2004.

[Woo70] William A. Woods. Context-sensitive parsing. Communications of the ACM, 13(7):437–445, 1970.

[You67] D. H. Younger. Recognition and parsing of context-free languages in time n³. Information and control, 10(2):189–208, 1967.

Appendix A

Parser Animation Tool user manual

The Parser Animation Tool (PAT) is a Java application that graphically displays and animates the operation of the seven main algorithms discussed in this thesis. This chapter is a manual for the use and installation of PAT. An executable jar file and the entire source of PAT are available from [PAT05].

A.1 Acquisition and deployment of PAT and its associated tools

This section describes how to download and execute PAT and the associated tools.

A.1.1 Java Runtime Environment (JRE)

As PAT is written in Java, an implementation of the Java Virtual Machine (JVM) is required to run PAT. PAT has been developed using Sun Microsystems’ Standard Development Kit 1.4.2, but it should be possible to run it on any JVM.

A.1.2 Downloading PAT

There are several ways to download PAT depending on the set-up of the target machine. If a compatible version of the JRE with Java Web Start is installed, it should be possible to download the application automatically by clicking on the Web Start link. PAT can then be executed from within Web Start or deployed on the local machine. Alternatively there is an option to download a single jar file, or a compressed archive of the source files which can then be manually compiled. The jar file can be executed either by double clicking on its icon or by using the following command: java -jar PAT.jar.

To compile and run the source files contained in the compressed archive, extract the files onto the local machine and use the command java PAT.Main from the top directory of PAT.

By default, Sun’s JVM reserves 60Mb of memory for use by an application it is running. Some parses can create extremely large internal structures, which may cause the JVM to throw an out-of-memory exception. To increase the maximum available memory include the -Xmx flag when launching PAT. For example, use the following command to increase the maximum memory to 200Mb: java -jar -Xmx200m PAT.jar.

A.1.3 The Grammar Tool Box

The Grammar Tool Box (GTB) is an interpreter for a procedural programming language with facilities for direct manipulation of translator related data structures. A set of built-in functions is used to generate the parse tables used by PAT. Two versions of GTB are used: GTB1.4B generates the LR(0), SLR(1), LR(1) and RN parse tables, and GTB2.3.9 is used to build the parse table representation of the RCA used by the RIGLR algorithm. Both versions can be downloaded from [PAT05].

A.1.4 aiSee

PAT can be used to output a human readable Graph Description Language (GDL) specification of the constructed GSS and SPPF. aiSee is a powerful graph visualisation tool that reads GDL specifications and automatically calculates a customisable graph layout. This is useful when analysing parses that produce very large structures. A free non-commercial version of aiSee is available on most platforms from [aiS].

A.1.5 Input library

The input library available from [PAT05] contains most of the example grammars, parse tables and input strings used in this thesis. Each grammar appears in a separate directory which in turn contains two sub-directories for the different parse tables:

• NoNullableAccept – contains sub-directories for the LR(0), SLR(1) and LR(1) parse tables. Each directory includes the GTB script used to generate the table and all other output of GTB;

• NullableAccept – contains sub-directories for the RN-LR(0), RN-SLR(1) and RN-LR(1) parse tables. Each directory includes the GTB script used to generate the table and all other output of GTB.

Below is a full description of the contents of each grammar’s directory, with a reference to the part of the thesis in which the example has been used.

242 ANSI-C examples

The ANSIC directory contains a grammar for ANSI-C, the GTB generated LR(0), SLR(1), LR(1) parse tables, along with their respective GTB scripts, and a tokenised input string of a Boolean equation minimiser (4,291 tokens). These files were used to generate the results presented in Chapter 10.

ISO-7185 Pascal examples

The ISOPascal directory contains a grammar for ISO-7185 Pascal, the GTB generated LR(0), SLR(1), LR(1) parse tables, along with their respective GTB scripts, and three tokenised input strings; a quadratic root calculator (429 tokens) and two versions of a tree construction and visualisation program (4,425 and 4,480 tokens).

IBM VS-Cobol examples

The COBOL directory contains a grammar for IBM VS-Cobol, the GTB generated LR(0) and SLR(1) parse tables, along with their respective GTB scripts, and two input strings consisting of 56 tokens and 2,196 tokens respectively.

An example that triggers worst case behaviour in all GLR algorithms

The Gamma1 directory contains Grammar 6.1, the GTB generated LR(0), SLR(1), LR(1) parse tables, along with their respective GTB scripts, and two tokenised input strings containing one and ten tokens respectively. These files were used to demonstrate the non-cubic behaviour of the GLR algorithms in Chapter 10.

RIGLR examples

The RIGLR directory contains the files required by the RIGLR algorithm to parse the languages described above. There are four sub-directories: ANSIC, ISOPascal, COBOL and Gamma1. The grammar and string files are all the same as previously described, but the parse tables are generated by GTB2.3.9.

A.2 User interface

In this section we describe the user interface of PAT and show how to execute an example parse. Figure A.1 shows the first window displayed when PAT is executed (we refer to it as the main window). The tabs along the top of the window are used to select the algorithm to parse with and to set the output options. We begin by describing the format of the different input files.

243 A.2.1 Specifying the grammar, parse table and string to parse

To perform a parse PAT requires three files as input – a grammar, its associated parse table and a string. Figure A.1 shows the main window of PAT which is used to specify the location of the input files.

Figure A.1: PAT’s input tab

Grammar format

The format of the grammar input to PAT is standard BNF. Non-terminals are represented by strings of alphanumeric characters, while terminals are represented by single quote delimited strings. All the standard BNF operators are used. For a more detailed discussion of BNF see Chapter 2. PAT does not, as yet, implement automatic augmentation of a grammar. For this reason it is necessary to make the first grammar rule in the file the augmented start rule; an illustration is given below. It is essential that the input grammar matches the grammar used by GTB to generate the parse tables; if not, an input error will occur. Grammar files should use the file extension .bnf.
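For instance, a small grammar file in this format might contain the following (an invented example, not one of the library grammars; the exact concrete syntax accepted by PAT should be checked against the grammars in the input library):

S0 ::= S
S ::= 'a' S 'b' | 'c'

where the first rule, S0 ::= S, plays the role of the augmented start rule that PAT requires.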

Parse table format

The parse tables used by PAT are generated by GTB. The parse table format has been designed to be easy to parse automatically. The file is split into eight sections: a symbol table; the slots of the NFA; the states of the NFA; the states of the DFA; the reduction table; the start state of the DFA; the accept state of the DFA; and the parse table.

The symbol table is used to allocate unique identifiers to the terminals and non-terminals of the grammar. These identifiers are used throughout the file. The slots and states of the NFA are used to generate the DFA and are mainly used for debugging purposes. The parse table is represented in a similar way to the parse tables used in this thesis: the rows represent the DFA states and the columns the grammar symbols plus the special end-of-string symbol $. Each row appears on a new line and each column is separated by a tab character. The shift actions are represented by negative integers and the reductions by positive integers which index into the reduction table. If there is a shift action, it always appears first. It is possible to modify a parse table by hand, but if the language is changed then the appropriate modifications need to be made to the grammar file as well. The parse table files are given the .tbl extension.

The parse tables generated by GTB2.3.9 for use with the RIGLR algorithm need to be modified before they can be used by PAT: the # symbols in the items of the reduction table need to be removed and a new line needs to be added to the end of the file.

Input string format

All of PAT’s input strings must be pre-tokenised, with each token separated by whitespace. The file extension used is .str.
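For example, an invented .str file for the string b⁵ of Grammar 6.1 would contain the five whitespace-separated tokens:

b b b b b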

Batch file format

The batch file is used to reduce the number of files input to PAT. Instead of specifying the grammar, parse table and input string files in separate fields, the batch file can be used to specify the locations of all three. Each path must appear on a separate line and can either be relative to the directory PAT is executed from or a complete path from the root directory. The file extension used is .sh.
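A batch file therefore contains three lines, for example (these paths are invented for illustration and must be adjusted to point at a real grammar, parse table and input string):

Gamma1/gamma1.bnf
Gamma1/NoNullableAccept/LR1/gamma1.tbl
Gamma1/b20.str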

A.2.2 Selecting a parsing algorithm

PAT implements seven of the main algorithms discussed in this thesis. Each of the algorithms can be selected from an individual tab in the main window. Figure A.2 shows the BRNGLR algorithm’s tab.

Figure A.2: The BRNGLR algorithm tab.

The algorithms that can be selected are described below:

RNGLR – the second tab from the left selects the implementations of the RNGLR recogniser and parser. The implementations of both algorithms closely follow the descriptions given on pages 101 and 110 in Chapter 5.

BRNGLR – the third tab selects the implementation of the BRNGLR recognition and parsing algorithms. Either the version that modifies the parse table, described on page 125, or the on-the-fly version, described on page 127, can be selected. The parser also contains an option, used during the development of the BRNGLR algorithm, to restrict the construction of SPPF nodes.

Tomita – the fourth tab contains two versions of Tomita’s Algorithm 1e recogniser. Tomita 1e is the implementation of the algorithm presented on page 92 and Tomita 1e mod reduces the number of edge visits performed by queueing reductions in the set R as soon as a new edge is created. Both algorithms are used in the performance comparison presented in Chapter 10.

Farshi – the fifth tab contains the implementation of two versions of Farshi’s recognition algorithm. The naïve recogniser implements the algorithm described in Chapter 4: when a new edge is added to an existing node of the GSS, all reduction paths from the current level are re-traversed in case a new reduction path has been created. It is a direct implementation of the description given by Farshi in [NF91]. The other version, simply labelled ‘Recogniser’, is the optimised algorithm described in Section 4.3.2. It stops the search for new reduction paths if a previous level is reached without traversing the newly created edge.

Earley – the sixth tab contains the implementation of Earley’s recognition algorithm described on page 190 of Chapter 8.

RIGLR – the seventh tab contains the implementation of the RIGLR recognition algorithm presented on page 165 of Chapter 7.

LR – the eighth tab contains an implementation of the standard LR parsing algorithm that uses a GSS instead of a stack during a parse.

The RNGLR, BRNGLR, Tomita and Farshi tabs also contain options to control the output of the GDL specification of the GSS and SPPF constructed. The generated files are written to the location that PAT is executed from.

A.2.3 Output options

PAT can be used to parse strings using one of the generalised algorithms described above. Apart from reporting the success or failure of a parse and generating the GDL of a GSS or SPPF, PAT can also be used to output various traces and to animate the construction of a GSS. The final tab of the main window contains PAT’s output options (see Figure A.3).

Figure A.3: PAT’s output tab.

By selecting the Display GSS option, PAT builds a graphical representation of the GSS constructed during a parse and displays it in the Animation window (see Section A.3). However, the extra graphical structures increase the amount of memory consumed by PAT, so this option should only be used when a visualisation of the GSS is needed. A trace of the parse table actions performed during a parse can be output by selecting one or more of the Actions output options. The generated file is written to the directory PAT was executed from.

Since Earley’s algorithm does not generate a GSS, there is the option of writing the sets of items constructed during a parse to a file. This file is also written to the directory PAT was executed from. If the GSS construction option is selected, a file will be created containing a trace of the construction process of the GSS. This includes the creation order of the states and edges in the GSS.

A.3 Animating parses

The animation window is displayed after the run button is pressed. The progress bar shown on the right of the status bar (at the bottom of the animation window) shows the percentage of levels constructed. Once the parse is complete the execution time is displayed on the left of the status bar and a box appears on the right. A green tick in the box indicates a successful parse and a red cross indicates failure. If the GSS is to be displayed, the toolbar at the top of the screen becomes activated once the parse is completed. Figure A.4 shows the toolbar from the animation window. Each of the buttons has been labelled with a number to aid the explanation.

Figure A.4: Animation toolbar.

Buttons 1 & 8 – display and clear the entire GSS from the window
Buttons 2 & 7 – display and remove the next level from the window
Buttons 3 & 6 – display and remove the next node or edge created
Buttons 4 & 5 – start and stop the animation of the construction of the GSS
Button 9 – change the speed of the animation
Buttons 10 & 11 – zoom in and zoom out

Once the parse is complete, pressing button 1 will display the entire GSS as shown in Figure A.5. The GSS is effectively drawn in a grid; the state nodes appear in columns, representing the level they are in, and their vertical position is dependent on their label. The symbol nodes are displayed to make the GSS easier to read. They appear directly to the left of their state node parent. If the GSS is large it may not be possible to see the labels of the nodes. Clicking once on a state or symbol node displays its information: its label; the number of times it was visited during the construction of the GSS; and its node type (state or symbol). An edge can be highlighted by clicking on it once. This makes the source and target nodes easier to locate.

The construction of the GSS can be animated with the use of buttons 2 – 7. A node or edge is constructed when it is displayed and the traversal of a reduction path is shown by the path being highlighted.

Figure A.5: PAT’s animation window.

It is possible to open more than one animation window by running one parse after another. This can be useful when analysing and comparing the performance of different algorithms.

A.3.1 GDL output of GSS and SPPF

PAT does not display a graphical representation of an SPPF. However, it is possible to output the specification of an SPPF in the Graph Description Language (GDL) used by graph visualisation tools such as aiSee or VCG. This GDL output can be selected from the individual algorithm tabs. In addition to the SPPF’s GDL output it is also possible to generate the GDL specification of a GSS. Since these structures can easily become very large, it is useful to use specialised graph visualisation tools to display them. For example, the SPPF generated by the RNGLR algorithm for the ANSI-C string used in Chapter 10 is shown in Figure A.6.
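As an indication of what such output looks like, a two-node GSS fragment in GDL might resemble the following sketch. The node and edge attributes follow the general aiSee/VCG conventions; the exact attributes and titles emitted by PAT may differ.

    graph: {
      node: { title: "v0" label: "0" }
      node: { title: "v1" label: "2" }
      edge: { sourcename: "v1" targetname: "v0" }
    }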

Figure A.6: The SPPF of the ANSI-C Boolean equation minimiser parsed in Chapter 10.

The files are output to the directory that PAT is executed from and use the extension .gdl.

A.3.2 Histograms

PAT can be used to generate five different histograms from the statistics collected during a parse. The histograms described below can be displayed from the View menu of the animation window.

Reduction length histogram – number of times each reduction of length i is performed
Edge visits by reduction length histogram – number of edge visits contributed by reductions of length i
Alternate reduction histogram – number of reductions performed for each alternate in the grammar
Re-applied reductions histogram – number of reductions of length i that are re-applied
Edge visits by re-applied reductions histogram – number of edge visits performed by re-applied reductions

These histograms have been used to analyse the performance of the different algorithms. They proved to be especially useful during the comparison of the two versions of Farshi’s algorithm (see Chapter 10). Figure A.7 shows a histogram generated during the parse of a Pascal string.

Figure A.7: Histogram showing the number of times reductions of a certain length are performed during the parse of a Pascal string.

A.3.3 Performance statistics

Statistics on the performance of each of the algorithms are automatically collected during a parse. The results can be viewed once a parse is complete by selecting the Recogniser or Parser properties menu item in the View menu. The data collected during the construction of a GSS for the RNGLR, Tomita and Farshi algorithms are as follows.

Edge visits – number of edge visits performed during the application of a reduction
State nodes – number of state nodes created in the GSS
Symbol nodes – number of symbol nodes created in the GSS
Edges – number of edges created in the GSS
Levels – number of levels created in the GSS
Reductions done – number of reductions performed
Max size of R – the maximum size of the set R (see pages 59 and 103 for an explanation of R)
Symbol node out edges – number of edges leaving symbol nodes
Shift nodes – number of state nodes created as a result of shift actions
Reduce nodes – number of state nodes created as a result of reduce actions
Symbol nodes shared – number of times a symbol node is shared

The BRNGLR algorithm collects the same data, but the state nodes include the number of additional nodes created in the GSS. A separate count of the Additional nodes created is also made. The construction of the SPPF often results in nodes being created that are not included on a path from the root node of the final SPPF. Since the number of nodes and edges created and the number of nodes and edges finally used are both important, both sets of data are collected. The data collected during the construction of the SPPF for the RNGLR parsing algorithm are:

Symbol nodes – number of nodes labelled by grammar symbols in the SPPF (used/created)
Packing nodes – number of packing nodes in the SPPF (used/created)
Edges – number of edges in the SPPF (used/created)
Max size of N – maximum size of the set N (see page 105 for an explanation of N)
Nodes found in N – number of nodes found in the set N

The BRNGLR algorithm collects the same data, but includes an extra field for the number of Additional nodes created and used. The number of Symbol nodes does not include the additional nodes. The RIGLR algorithm collects different data from the algorithms described so far. The data collected during a parse are:

Edge visits – number of edge visits performed for pop actions in the RCG
States – number of nodes in the RCG
Edges – number of edges in the RCG
Max size of U – maximum size of the set U
States added to U – total number of nodes added to the set U
States not added to U – number of times an existing node is added to U

The data collected during a parse with Earley’s algorithm are shown below:

Total number of items – number of items created in all sets
Total number of searches performed during reduction predictions – total number of searches done during a reduction for items in a previous state with the dot before the non-terminal reduced on

The results presented in Chapter 10 were collected in this way.

A.3.4 Comparing multiple algorithms in parallel

The original goal of PAT was to provide an environment to demonstrate the internal workings of GLR-style algorithms. However, over time, as more functionality was added, its focus became the generation of experimental data used to compare the performance of generalised parsing algorithms. Despite this shift in focus, PAT can still be used to demonstrate the operation of generalised parsers. Figure A.8 shows the animation of the RNGLR and BRNGLR algorithms in parallel.

Figure A.8: PAT example
