Eindhoven University of Technology MASTER Theory And

Eindhoven University of Technology

MASTER

Theory and implementation of a family of parser components

Keim, Erik

Award date: 2007

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain

TECHNISCHE UNIVERSITEIT EINDHOVEN
Department of Mathematics and Computer Science

MASTER'S THESIS

Theory and implementation of a family of parser components

by Erik Keim

Supervisor: dr ir C. Hemerik (TU/e)

Eindhoven, August 2007

Abstract

Parsing is a much-studied subject and many methods exist for parsing a sentence. In this thesis a subset of the parsing methods is described. The goal of this thesis is to gain more insight into the different parsing methods of the subset. The focus of this thesis is on recursive descent, top-down parsing methods for Extended Context Free Grammars. Different parser aspects such as parser generation and interpretation, backtracking, memoization, tree construction and transduction are described in this thesis. The goal of this study is to arrive at a classification of the different parsing methods.
Several of the algorithms in the classification are described in more detail. This thesis also serves as a catalogue of the described parsing algorithms. An algorithm can be chosen based on its properties.

The second part of the thesis consists of the design of a parser framework. The parsing algorithms in the classification are implemented in this parser framework. The parser framework allows the generation and use of several parser components. The parser components can be combined with grammar, scanner and tree builder components for constructing different language processing applications. Deterministic, limited backtrack and full backtrack interpreting parsers and parser generators are implemented with the options to do tree building and transduction. All components are integrated into the Borland Delphi IDE. The integration with Delphi allows easy construction of language processing applications.

Acknowledgement

This thesis is the result of the final Master's project of my study of computer science at the Technische Universiteit Eindhoven. The project was done at the department of Software Engineering and Technology under supervision of Kees Hemerik.

The first time I approached Kees Hemerik for a Master's project, he proposed the idea of constructing a parser framework consisting of several parser components. Kees had some ideas about the framework, but no concrete plan existed yet. The research on parsing algorithms as well as the design of a software toolkit attracted my attention. The subject of parsing was not new to me, as I had taken a few courses on the subject. It seemed that what I had learned in those courses was just the tip of the iceberg. The vast amount of information available on parsing can be overwhelming and without the guidance of my supervisor I would certainly have been lost.

Throughout the project, the guidance of my supervisor was invaluable.
I want to thank Kees Hemerik for pointing me in the right direction and for giving me very useful feedback about the thesis, the implementation of the parser framework, and the progress of the project. I also want to thank Gerard Zwaan for advising me when Kees Hemerik was not around and for reviewing the thesis.

There are some other people I want to thank. First of all I want to thank Henk for correcting my paper. I want to thank my parents for their support. I wish to thank my friends Bas, Rudy and Eric for the welcome distraction and for discussing my project. Last but not least I want to thank Yara for her support, encouragement and patience.

Contents

Abstract
Acknowledgement
Contents
List of Figures
List of Tables
List of Algorithms
1 Introduction
    1.1 Problem statement
    1.2 Outline of the paper
2 Preliminaries
3 Parser Classification
    3.1 Problems
    3.2 Grammar
    3.3 Direction
    3.4 Non-Determinism
    3.5 Implementation Style
    3.6 Classification
4 Parsing Algorithms
    4.1 Table of Properties
    4.2 Scanning
    4.3 Parsing Problem
    4.4 Top-Down Parsing Process
        4.4.1 Non-Termination
        4.4.2 Full Backtrack Strategy
        4.4.3 Deterministic Strategy
        4.4.4 Limited Backtrack Strategy
    4.5 Error-handling
    4.6 Parser Implementation
        4.6.1 Interpreting Parsers
        4.6.2 Parser Generators
5 Membership
    5.1 Deterministic Interpreting Parser
    5.2 Deterministic Parser Generator
    5.3 Limited Backtrack Interpreting Parser
    5.4 Limited Backtrack Parser Generator
    5.5 Full Backtrack Interpreting Parser

Derivatives of Parsing Expression Grammars

A Typed, Algebraic Approach to Parsing

A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages

Adaptive LL(*) Parsing: the Power of Dynamic Analysis

Efficient Recursive Parsing

Conflict Resolution in a Recursive Descent Compiler Generator

A Parsing Machine for PEGs

Advanced Parsing Techniques

Verifiable Parse Table Composition for Deterministic Parsing

Syntactic Analysis, Or Parsing, Is the Second Phase of Compilation: the Token File Is Converted to an Abstract Syntax Tree

Lecture 3: Recursive Descent Limitations, Precedence Climbing

Basics of Compiler Design