The Impact of Terminal Prefixing on LL(1) Parsing

J. Appl. Environ. Biol. Sci. , 7(5)64-76, 2017 ISSN: 2090-4274 Journal of Applied Environmental © 2017, TextRoad Publication and Biological Sciences www.textroad.com The Impact of Terminal Prefixing on LL(1) Parsing Muhammad Shumail Naveed Department of Computer Science, University of Balochistan, Pakistan Received: December 10, 2016 Accepted: February 22, 2017 ABSTRACT In this paper a terminal prefixing is presented as a normal form of context-free grammar and applied on a LL(1) grammar. Its impact on the size of derivations, magnitude of parse trees and particularly on the efficiency of LL(1) parsing is analyzed. Results show that terminal prefixing has a significant impact on size of parse trees and similarly on the efficiency of LL(1) parser. The discussion included in this paper is helpful in the advancement of other parsers and also for the anatomization of topical normal forms. KEYWORDS : parsing, context-free grammar, LL(1) parsing, Greibach normal form, ambiguity 1. INTRODUCTION Parsing is the process of arranging a linear representation according to a particular grammar [1], and parser is a program that accomplished this process. Parsing has a great importance and extensive history of academic study yet it is not a solved problem [2], and therefore researchers have devoted decades analyzing to develop efficient parsing methods. Parsing has diverse applications especially in the area of compiler construction [3], natural language processing [4,5] and automatic speech recognition [6]. In classical compilers the parsing is a second phase and deals with the verification of syntax and generation of parse tree which defines the structure of a program. In contemporary compilers the top-down and bottom-up are the common parsing techniques. The top-down parsing starts the parsing from the root of the parse tree and determine from the input token stream how to develop the parse tree. Top-down parser perform this duty by using the information of the lookahead tokens and of the productions in the grammar. Furthermore, it scan the incoming tokens from left to right. Top-down parsing can be performed with a set of recursive functions or through table-driven approach which is easy to construct and modify [7]. Top-down parser can only parse a restricted class of a context-free grammar. Fundamentally, the context-free grammar (CFG) is one of a powerful concept of computational linguistics, programming languages and computer science. In computer science the context-free grammar is generally used to describe the syntax of programming languages [8]. It is also very useful in the description of restricted natural language based programming languages [9, 10]. Formally the context-free grammar is a four tuple (N,T,P,S), where N is a set of nonterminals. T is a set of terminals. Nonterminals and terminals are always disjoint. P is set of productions or rules, in which each rule takes the following form [11]: B → α, where B ∈ N and α ∈ (N U T) *. S is the start symbol and a subset of N. The context-free grammar is usually normalized before using in any real solution and parsing is no exception. The normalization usually restrict the structure of a grammar either by restricting the size of production or the arrangement of symbols in the productions. A restricted class of context-free grammar is used in top-down parsing. The top-down parsing has two main types: recursive-descent parsing and LL(1) parsing. The recursive-descent parsing analyze the input by using the recursive procedures where each such procedure represents one of the production of its context-free grammar. On the other hand, the LL(1) parsing generally does not require backtracking during building a parse tree and commonly used in compilers. LL(1) parsing is one of a simple and viable choice for syntax analysis which is a second phase of a compiler. Syntax analyzer is one of a time consuming phase of a compiler and consequently the development of efficient parser remains an active area of research in computer science and computational linguistics. This paper introduced a terminal prefixing which is a more general form of Greibach normal form and apply this terminal prefixing to simplify the LL(1) grammar and determine its impact on derivation, parse tree and LL(1) parsing. The paper is structured as follows. In section 2 the LL(1) parsing is described and the major work conducted in the improvement of LL parsing is included in section 3. In section 4 the terminal prefixing is introduced. The initial evaluation and results are included in section 5. Finally, section 6 describes the conclusion. *Corresponding Author : Dr. Muhammad Shumail Naveed, Department of Computer Science, University of Balochistan, Quetta, Pakistan. Email: [email protected] 64 Naveed, 2017 2. LL(1) Parsing The LL(1) parsing is a top-down, non-recursive predictive parsing technique which requires no backtracking and constructed over the class of grammars called LL(1) grammars. The fundamental concepts of LL(1) grammars were formally presented by Lewis and Stearns [12], and the attributes of LL(1) grammar were refined by Rosenkrantz and Stearnsc [13]. LL(1) parsing used an explicit stack to parse the sentence. It is also used in the definition of other parsing techniques especially in a bottom-up parsing algorithms [8]. The first “L” in LL(1) indicates that input is processed from left to right. The second “L” indicates that it performs the left most derivation for the input, and “1” indicates that in each step only one symbol of input is used as a lookahead to predict the parsing action. In LL parsing the lookahead symbols may be the multiple symbols. LL parsing is the foundation of a contemporary parser generator ANother Tool for Language Recognition (ANTLR) [14]. LL(1) parsing is usually implemented through a table called the parsing table. The construction of a LL(1) parsing table involves the computing of two functions associated with the LL(1) grammar. These two functions, FIRST and FOLLLOW facilitate to fill up the entries of LL(1) parsing table. LL(1) grammars are the restricted class of context-free grammars and therefore LL(1) parsing is inherently a restricted class of parsing technique. There are several limitations of LL(1) parsing. First, it cannot handle the ambiguous grammar. Similarly it cannot handle the left recursive grammar and it also can’t handle a grammar containing the conflict. If G is a context-free grammar and L( G) is a language defined over G, then grammar G is said to be ambiguous if there is at least one string in L(G) for which G generates multiple parse trees [15]. Ambiguity is a common characteristic of natural language; however in programming language it is necessary to remove the ambiguity if possible. Concretely there is no well defined algorithm for the identification and removal of ambiguity, nevertheless in most of the situations it is possible to identify and remove the ambiguity. Principally any language L can be defined by different grammars. If every grammar that defines L is ambiguous grammar, then the said language is known as inherently ambiguous. A grammar G is said to be left recursive, if it contains at least one left recursive production. A left recursive is one that takes the following form: X → X α X → β Where X is a nonterminal and α and β are the strings of nonterminals and terminals and β does not begin with X. The production having left recursion can be handled by rewriting that production into two rules: one that defines β first and another one that defines the repetition of α: X → β X’ X’ → α X’ | ε Conflict is an undesirable condition for LL(1) parser, that occur when multiple grammar productions choices have a common prefix: X → β α X → β γ This problem can be solved by left factoring. Through left factoring the β is factored on the left and rule is rewritten as: X → β X’ X’ → α X’ → γ 3. Previous Work For decades, experts and scientists have worked towards expanding the recognition strength and the performance of LL parsing. Parr et al. [16] introduced the ALL(*) parsing which integrates the plainness, effectiveness and predictability of traditional LL(k) parsers with the ability of a GLR-type method to take parsing decisions. The significant novelty is to move analysis to parse-time, which allows ALL(*) deal context-free grammar without any left recursion. Theoretically ALL(*) is O(n 4) but executes linearly on grammar used in practice. Shilling [17] introduced an efficient incremental parsing algorithm. It is devised for the language-based editors and implemented in Fred language-based editor. It can parse input at intervals of extremely tiny granularity and bound the amount of incremental parsing required when alterations are perform internal to the editing buffer. In [18], the notion of LLLR is introduced. An LLLR parser generates the left parse of the input in a linear time and can be built for any LL(k) grammar. It works as an LL(k) and used as a foundation for a syntax-directed translation. LL(k) is the backbone of the LLLR(k) parser and generate semantic action by means of top-down approach just similar to the canonical LR(k) parser. LLLR parsing is similar to LL(*) parsing but does not perform backtracking and utilize LR(k) parser to manage the LL(k) conflicts [19]. The LL(k) parser of an LL(k) grammar is fundamentally a single state deterministic push-down automaton that performs left parse of the input string. The LL( k) parsing is evolved as duple of LR ( k) parsing.

The Impact of Terminal Prefixing on LL(1) Parsing

LLLR Parsing: a Combination of LL and LR Parsing

Compiler Design

The Use of Predicates in LL(K) and LR(K) Parser Generators

Part 1: Introduction

Compiler Construction

ANTLR Reference PDF Manual

Town-Down LL Parsing

Types of Parse Trees

Generalized LR Parsing in Haskell

CSE P 501 – Compilers

LL and LR Parsing with Abstract Machines, Or Syntax for Semanticists Midlands Graduate School 2016

The Modelcc Model-Based Parser Generator