Parsing Expression Grammars: A Recognition-Based Syntactic Foundation Bryan Ford Massachusetts Institute of Technology Cambridge, MA
[email protected] Abstract 1 Introduction For decades we have been using Chomsky’s generative system of Most language syntax theory and practice is based on generative grammars, particularly context-free grammars (CFGs) and regu- systems, such as regular expressions and context-free grammars, in lar expressions (REs), to express the syntax of programming lan- which a language is defined formally by a set of rules applied re- guages and protocols. The power of generative grammars to ex- cursively to generate strings of the language. A recognition-based press ambiguity is crucial to their original purpose of modelling system, in contrast, defines a language in terms of rules or predi- natural languages, but this very power makes it unnecessarily diffi- cates that decide whether or not a given string is in the language. cult both to express and to parse machine-oriented languages using Simple languages can be expressed easily in either paradigm. For CFGs. Parsing Expression Grammars (PEGs) provide an alterna- example, fs 2 a∗ j s = (aa)ng is a generative definition of a trivial tive, recognition-based formal foundation for describing machine- language over a unary character set, whose strings are “constructed” oriented syntax, which solves the ambiguity problem by not intro- by concatenating pairs of a’s. In contrast, fs 2 a∗ j (jsj mod 2 = 0)g ducing ambiguity in the first place. Where CFGs express nondeter- is a recognition-based definition of the same language, in which a ministic choice between alternatives, PEGs instead use prioritized string of a’s is “accepted” if its length is even.