Solving Difficult LR Parsing Conflicts by Postponing Them

DOI: 10.2298/CSIS101116008R

C. Rodriguez-Leon and L. Garcia-Forte
Departamento de EIO y Computación, Universidad de La Laguna
[email protected], [email protected], http://nereida.deioc.ull.es

Abstract. Though yacc-like LR parser generators provide ways to solve shift-reduce conflicts using token precedences, no mechanisms are provided for the resolution of difficult shift-reduce or reduce-reduce conflicts. To solve these kinds of conflicts the language designer has to modify the grammar. All the solutions for dealing with these difficult conflicts branch at each alternative, leading to the exploration of the whole search tree. These strategies differ in the way the tree is explored: GLR, Backtracking LR, Backtracking LR with priorities, etc. This paper explores an entirely different path: to extend the yacc conflict resolution sublanguage with new constructs that let the programmer state explicitly how the conflict must be solved. These extensions supply ways to resolve any kind of conflict, including those that cannot be solved using static precedences. The method also makes feasible the parsing of grammars whose ambiguity must be solved in terms of the semantic context. In addition, it brings to LR parsing a common LL parsing feature: full control over the specific trees the user wants to build.

Keywords: parsing, lexical analysis, syntactic analysis.

1. Introduction

Yacc-like LR parser generators [3] provide ways to solve shift-reduce conflicts based on token precedence. No mechanisms are provided for the resolution of difficult reduce-reduce or shift-reduce conflicts. To solve such conflicts the language designer has to modify the grammar. Quoting Merrill [5]:

    Yacc lacks support for resolving ambiguities in the language for which it is attempting to generate a parser. It does a simple-minded approach to resolving shift/reduce and reduce/reduce conflicts, but this is not of sufficient power to solve the really thorny problems encountered in a genuinely ambiguous language.

Some context-dependent ambiguities can be solved through the use of lexical tie-ins: a flag, set by the semantic actions, whose purpose is to alter the way tokens are parsed [1, p. 106]. But it is not always possible or easy to resort to this kind of trick to fix a context-dependent ambiguity.

A more general solution is to extend LR parsers with the capacity to branch at any multivalued entry of the LR action table. For example, Bison [1], via the %glr-parser directive, and Elkhound [4] provide implementations of the Generalized LR (GLR) algorithm [11]. In the GLR algorithm, when a conflicting transition is encountered, the parsing stack is forked into as many parallel parsing stacks as there are conflicting actions. The next input token is read and used to determine the next transitions for each of the top states. If some top state has no transition for the input token, that path is invalid and the branch can be discarded. Though GLR has been successfully applied to the parsing of ambiguous languages, the handling of languages that are both context-dependent and ambiguous is more difficult [10, p. 3]. The Bison manual [1] points out the following caveats when using GLR:

    ... there are at least two potential problems to beware. First, always analyze the conflicts reported by Bison to make sure that GLR splitting is only done where it is intended. A GLR parser splitting inadvertently may cause problems less obvious than an LALR parser statically choosing the wrong alternative in a conflict. Second, consider interactions with the lexer with great care. Since a split parser consumes tokens without performing any actions during the split, the lexer cannot obtain information via parser actions.
    Some cases of lexer interactions can be eliminated by using GLR to shift the complications from the lexer to the parser. You must check the remaining cases for correctness.

The strategy presented here extends the yacc conflict resolution mechanisms with new ones, supplying ways to resolve conflicts that cannot be solved using static precedences. The algorithm for the generation of the LR tables remains unchanged, but the programmer can modify the parsing tables at run time.

The technique involves labelling the points in conflict in the grammar specification and providing additional code to resolve the conflict when it arises. Crucially, this does not require rewriting or transforming the grammar, trying to resolve the conflict in advance, backtracking, or branching into concurrent speculative parsers. Instead, the resolution is postponed until the conflict actually arises during parsing, whereupon user code inspects the state of the underlying parse engine to decide the appropriate solution. There are two main benefits. First, since the full power of the general-purpose host language is at the programmer's disposal, any grammar ambiguity can be tackled. Second, since the conflict handler is written by the programmer, we can expect a more efficient solution that reduces the amount of backtracking or branching required.

This technique can be combined with both GLR and backtracking LR algorithms [10] to give the programmer finer control of the branching process. It puts the user, as in top-down parsing, in control of the parsing strategy when the grammar is ambiguous, making it easier to deal with efficiency and context-dependency issues. One disadvantage is that it requires some knowledge of LR parsing. It is conceived to be used when none of the available techniques (static precedences, grammar modification, backtracking LR or Generalized LR) produces satisfactory solutions. We have implemented these techniques in eyapp [7], a yacc-like LALR parser generator for Perl [13, 6].

ComSIS Vol. 8, No. 2, Special Issue, May 2011

This paper is divided into six sections. The next section introduces the Postponed Conflict Resolution (PPCR) strategy. The following three sections illustrate the way the technique is used. The first presents an ambiguous grammar where the disambiguating rule is expressed in terms of the previous context. The next applies the technique to a difficult grammar that has previously been used in the literature [1] to illustrate the advantages of the GLR engine: the declaration of enumerated and subrange types in Pascal [12]. The last example deals with a grammar that cannot be parsed by any LL(k) or LR(k) algorithm, whatever the value of k, nor by packrat parsing algorithms [2]. The last section summarizes the advantages and disadvantages of our proposal.

2. The Postponed Conflict Resolution Strategy

Postponed Conflict Resolution (PPCR) is a strategy to apply whenever there is a shift-reduce or reduce-reduce conflict that is unsolvable using static precedences. It delays until parse time the decision of whether to shift or reduce, and by which production to reduce. Let us assume the eyapp compiler announces the presence of a reduce-reduce conflict. The steps followed to solve a reduce-reduce conflict using the PPCR strategy can be divided into two activities: conflict identification and mapping (steps 1a to 1d) and writing the solver (step 2a).

1. Conflict Identification and Mapping

   (a) Identify the conflict: what LR(0)-items/productions and tokens are involved? Tools must support this stage, for example via the .output file generated by eyapp. Suppose we identify that the participants are the two LR(0)-items A → α• and B → β• when the lookahead token is @.
   (b) Give a name to the productions: the software must allow the use of symbolic labels to refer by name to the productions involved in the conflict. Let us assume that production A → α has label :rA and production B → β has label :rB. Unlike yacc, eyapp allows productions to have names and labels, which can be given explicitly with the %name directive, using the following syntax:

           %name :rA A → α
           %name :rB B → β

   (c) Give a symbolic name to the conflict. In this case we choose IsAorB as the name of the conflict.

   (d) Inside the body section of the grammar, mark the points of conflict using the new reserved word %PREC followed by the conflict name:

           %name :rA A → α %PREC IsAorB
           %name :rB B → β %PREC IsAorB

2. Writing the Conflict Handler

   (a) Introduce a %conflict directive inside the head section of the translation scheme to specify the way the conflict will be solved. The directive is followed by some code, known as the conflict handler, whose mission is to modify the parsing tables. This code will be executed each time the associated conflict state is reached. This is the usual layout of the conflict handler:

           %conflict IsAorB {
               if (is_A) { $self->YYSetReduce('@', ':rA' ); }
               else { $self->YYSetReduce('@', ':rB' ); }
           }

       The call to is_A represents the context-dependent dynamic knowledge that allows us to take the right decision. It is usually a call to a nested parser for A, but it can also be any other contextual information that determines which production is the right one. Inside a conflict handler the Perl default variable $_ refers to the full input text and $self refers to the parser object. Variables in Perl, like $self, have prefixes such as $ (scalars), @ (arrays), % (hashes or dictionaries), & (subroutines), etc., specifying the type of the variable. These prefixes are called sigils. The sigil $ indicates a scalar variable, i.e. a variable that stores a single value: a number, a string or a reference.
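The mechanics of the handler layout can be mimicked outside eyapp. The following Python sketch is an illustration only and not eyapp's implementation: the ToyParser class, its hand-built tables, the set_reduce method (a stand-in for YYSetReduce) and the KNOWN_A set (playing the role of is_A) are hypothetical names introduced here. It assumes a toy grammar S → A | B, A → id, B → id, whose state reached after shifting id has a reduce-reduce conflict on the end-of-input token, written $ here.

```python
# Toy sketch of postponed conflict resolution (hypothetical, not eyapp code).
# Assumed grammar:  S -> A (:sA) | B (:sB) ;  A -> id (:rA) ;  B -> id (:rB)

# label -> (left-hand side, length of right-hand side)
PRODUCTIONS = {":rA": ("A", 1), ":rB": ("B", 1), ":sA": ("S", 1), ":sB": ("S", 1)}


class ToyParser:
    def __init__(self, handlers):
        self.handlers = handlers  # conflict name -> handler function
        # Hand-built LR tables. The entry for (1, "$") is deliberately
        # missing: it is the conflicting entry, filled in at parse time.
        self.action = {
            (0, "id"): ("shift", 1),
            (2, "$"): ("accept",),
            (3, "$"): ("reduce", ":sA"),
            (4, "$"): ("reduce", ":sB"),
        }
        self.goto = {(0, "S"): 2, (0, "A"): 3, (0, "B"): 4}
        self.conflicts = {1: "IsAorB"}  # conflict state -> conflict name
        self.states = [0]

    def set_reduce(self, token, label):
        """Stand-in for YYSetReduce: patch the table for the current state."""
        self.action[(self.states[-1], token)] = ("reduce", label)

    def parse(self, tokens):
        tokens = list(tokens) + [("$", "")]
        self.states, values, pos = [0], [], 0
        while True:
            state = self.states[-1]
            kind, text = tokens[pos]
            # Run the handler every time a conflict state is reached.
            if state in self.conflicts:
                self.handlers[self.conflicts[state]](self, values)
            act = self.action.get((state, kind))
            if act is None:
                raise SyntaxError(f"unexpected {kind!r} in state {state}")
            if act[0] == "shift":
                self.states.append(act[1])
                values.append(text)
                pos += 1
            elif act[0] == "reduce":
                lhs, n = PRODUCTIONS[act[1]]
                children = values[-n:]
                del values[-n:]
                del self.states[-n:]
                values.append((act[1], children))
                self.states.append(self.goto[(self.states[-1], lhs)])
            else:  # accept
                return values[-1]


# Hypothetical semantic context standing in for is_A.
KNOWN_A = {"int", "float"}


def is_a_or_b(parser, values):
    # values[-1] is the text of the identifier that was just shifted.
    label = ":rA" if values[-1] in KNOWN_A else ":rB"
    parser.set_reduce("$", label)


parser = ToyParser({"IsAorB": is_a_or_b})
tree_a = parser.parse([("id", "int")])
tree_b = parser.parse([("id", "foo")])
```

In this sketch the same parser object yields a tree rooted in :sA for the input int and one rooted in :sB for the input foo, because the handler rewrites the conflicting table entry each time the conflict state is on top of the stack, without forking or backtracking.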
