MODULE 18 – LALR

After understanding the most powerful CALR parser, in this module we will learn to construct the LALR parser. The CALR parser has a large set of items; the LALR parser is designed to have fewer items while still avoiding many of the conflicts that are a problem in the SLR parser. This module discusses the construction of the LR(1) items necessary for LALR parsing and the LALR parsing table, followed by parsing a string using the LALR parser.

18.1 Need for LALR parser

Though the CALR parser is powerful enough to avoid the conflicts of the SLR parser, it suffers from a large set of LR(1) items. This increases the number of entries in the CALR parsing table, and thus the space needed for the table and the time needed to compute it. The LALR parsing table limits this growth by combining the sets of items that have the same core but different look-aheads. The resulting parser is less powerful than the CALR parser. Combining the sets cannot introduce a new shift/reduce conflict, because shift actions depend only on the core and do not use the look-ahead. However, as we are combining items with different look-aheads into one, the LALR parser may introduce reduce/reduce conflicts. This is rarely a problem for the grammars of programming languages.

18.2 LR(1) items

The LR(1) items for the LALR parser are computed by first constructing the sets of LR(1) items as in the case of the CALR parser and then combining the sets that have the same core (the same items when the look-aheads are ignored) but differing look-aheads into one set. The algorithm for constructing the CALR's LR(1) items is discussed in module 17. Only the combining step is discussed here, by means of an example.

Example 18.1

Let us construct the LR(1) items for the grammar given below to construct the LALR parsing table.

S → CC
C → cC
C → d

The augmented grammar is given below, and the CALR's LR(1) items are repeated in Table 18.1 for quick reference.

• S’ → S
• S → CC
• C → cC
• C → d

Table 18.1 LR(1) items of CALR parsing.

| Item | Set of Items | Goto(I, X) | Comments |
|---|---|---|---|
| I0 | S’ → .S, $; S → .CC, $; C → .cC, c/d; C → .d, c/d | | This is the initial item. We have a non-terminal S after the dot, so we add the productions of S with look-ahead FIRST($), since β is ε. Now again we have a non-terminal C after the dot; here β is ‘C’ and ‘a’ is $, so we add the productions of C with look-ahead FIRST(C$). FIRST(C) = {c, d} from the two productions of C. Thus we add two items for each production of C, one with ‘c’ and the other with ‘d’ as look-ahead. However, we can represent them in a combined fashion, as given in this item set. |
| I1 | S’ → S., $ | (I0, S) | Shifting the dot results in a kernel item; the look-ahead remains the same. |
| I2 | S → C.C, $; C → .cC, $; C → .d, $ | (I0, C) | The dot is shifted by one position to the right. Now we have C after the dot. β is ε and we add the items of C with FIRST($) as look-ahead. |
| I3 | C → c.C, c/d; C → .cC, c/d; C → .d, c/d | (I0, c), (I3, c) | Shifting the dot by one position and keeping the initial look-ahead as it is results in the first item. Now we have a C after the dot. β is ε and we add the items of C with FIRST(c/d) as look-ahead. |
| I4 | C → d., c/d | (I0, d), (I3, d) | Kernel item with the look-ahead being the same. |
| I5 | S → CC., $ | (I2, C) | Kernel item. |
| I6 | C → c.C, $; C → .cC, $; C → .d, $ | (I2, c), (I6, c) | The dot is shifted by one position to the right. Now we have C after the dot. β is ε and we add the items of C with FIRST($) as look-ahead. |
| I7 | C → d., $ | (I2, d), (I6, d) | Kernel item. |
| I8 | C → cC., c/d | (I3, C) | Kernel item. |
| I9 | C → cC., $ | (I6, C) | Kernel item; no more new items need to be added. |

From Table 18.1, consider the item sets I3 and I6. Both sets have the same core but differ in their look-aheads; hence we combine them and call the result I36, as given below.

I36 : goto(I0, c), goto(I2, c), goto(I36, c)

C → c.C, c/d/$
C → .cC, c/d/$
C → .d, c/d/$

Similarly, I4 and I7 are combined as I47, and I8 and I9 as I89. The combined sets are given below:

• I47 : goto(I0, d), goto(I2, d), goto(I36, d)

C → d., c/d/$

• I89 : goto(I36, C)

C → cC., c/d/$

Thus we have reduced the number of item sets by 3, from the 10 sets of the CALR parser to the 7 sets I0, I1, I2, I36, I47, I5 and I89.
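The combining step can be sketched in a few lines of Python. The item sets below are transcribed from Table 18.1; the helper names (`items`, `core`, `merge_by_core`) and the string encoding of items are assumptions made for this illustration, not part of the module.

```python
from collections import defaultdict

def items(*pairs):
    """Expand ("C->.d", "c/d") into one (item, look-ahead) pair per look-ahead."""
    return {(item, la) for item, las in pairs for la in las.split("/")}

# Canonical LR(1) item sets copied from Table 18.1.
CANONICAL = {
    "I0": items(("S'->.S", "$"), ("S->.CC", "$"), ("C->.cC", "c/d"), ("C->.d", "c/d")),
    "I1": items(("S'->S.", "$")),
    "I2": items(("S->C.C", "$"), ("C->.cC", "$"), ("C->.d", "$")),
    "I3": items(("C->c.C", "c/d"), ("C->.cC", "c/d"), ("C->.d", "c/d")),
    "I4": items(("C->d.", "c/d")),
    "I5": items(("S->CC.", "$")),
    "I6": items(("C->c.C", "$"), ("C->.cC", "$"), ("C->.d", "$")),
    "I7": items(("C->d.", "$")),
    "I8": items(("C->cC.", "c/d")),
    "I9": items(("C->cC.", "$")),
}

def core(item_set):
    """The core of an item set: its items with the look-aheads dropped."""
    return frozenset(item for item, _ in item_set)

def merge_by_core(states):
    """Union all item sets that share a core; I3 and I6 become I36, and so on."""
    groups = defaultdict(list)
    for name, item_set in states.items():
        groups[core(item_set)].append(name)
    merged = {}
    for names in groups.values():
        new_name = "I" + "".join(name[1:] for name in names)
        merged[new_name] = set().union(*(states[name] for name in names))
    return merged

LALR = merge_by_core(CANONICAL)
print(list(LALR))  # ['I0', 'I1', 'I2', 'I36', 'I47', 'I5', 'I89']
```

Grouping by core with a dictionary keeps the merge to a single pass over the item sets; the look-aheads of the merged set are simply the union of the look-aheads of its members.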

18.2 LALR parsing table

After constructing the LR(1) items and combining the sets that share a core, we use this reduced collection to construct the LALR parsing table. The table construction is the same as that discussed for the CALR parser in the previous module, but we work with the LALR's LR(1) items. The LALR parsing table for the grammar of Example 18.1 is given in Table 18.2. The reduce entries r1, r2 and r3 refer to the productions S → CC, C → cC and C → d respectively.

Table 18.2 LALR parsing table

| State | Action: c | Action: d | Action: $ | Goto: S | Goto: C | Comments |
|---|---|---|---|---|---|---|
| 0 | s36 | s47 | | 1 | 2 | Goto(I0, c) = I36 => [0, c] = s36; Goto(I0, d) = I47 => [0, d] = s47; Goto(I0, S) = I1 => [0, S] = 1; Goto(I0, C) = I2 => [0, C] = 2 |
| 1 | | | accept | | | I1 has [S’ → S., $], so at [1, $] we have the accept action |
| 2 | s36 | s47 | | | 5 | Goto(I2, c) = I36 => [2, c] = s36; Goto(I2, d) = I47 => [2, d] = s47; Goto(I2, C) = I5 => [2, C] = 5 |
| 36 | s36 | s47 | | | 89 | Goto(I36, c) = I36 => [36, c] = s36; Goto(I36, d) = I47 => [36, d] = s47; Goto(I36, C) = I89 => [36, C] = 89 |
| 47 | r3 | r3 | r3 | | | C → d., c/d/$, so at [47, c], [47, d] and [47, $] we set reduce by C → d |
| 5 | | | r1 | | | S → CC., $, so at [5, $] we set reduce by S → CC |
| 89 | r2 | r2 | r2 | | | C → cC., c/d/$, so at [89, c], [89, d] and [89, $] we set reduce by C → cC |
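The rules used in the comments column of Table 18.2 can be sketched as a small table-filling routine. This is an illustration under stated assumptions: the transitions and completed items are transcribed from the LALR item sets of Example 18.1, production 0 stands for the augmented production S’ → S, and the names `TRANSITIONS`, `COMPLETED` and `build_table` are hypothetical.

```python
# Transitions between the LALR item sets: (state, symbol) -> state.
TRANSITIONS = {
    (0, "c"): 36, (0, "d"): 47, (0, "S"): 1, (0, "C"): 2,
    (2, "c"): 36, (2, "d"): 47, (2, "C"): 5,
    (36, "c"): 36, (36, "d"): 47, (36, "C"): 89,
}
# Completed items (dot at the right end): state -> [(production, look-aheads)].
COMPLETED = {
    1: [(0, "$")],        # S' -> S.
    47: [(3, "c/d/$")],   # C -> d.
    5: [(1, "$")],        # S -> CC.
    89: [(2, "c/d/$")],   # C -> cC.
}
TERMINALS = {"c", "d", "$"}

def build_table(transitions, completed):
    """Fill the ACTION and GOTO parts; raise ValueError on a conflicting cell."""
    action, goto = {}, {}

    def set_action(state, symbol, entry):
        # A cell that receives two different entries is a conflict.
        if action.setdefault((state, symbol), entry) != entry:
            raise ValueError(f"conflict at [{state}, {symbol}]")

    for (state, symbol), target in transitions.items():
        if symbol in TERMINALS:
            set_action(state, symbol, f"s{target}")   # shift on a terminal
        else:
            goto[(state, symbol)] = target            # goto on a non-terminal
    for state, entries in completed.items():
        for production, lookaheads in entries:
            for la in lookaheads.split("/"):
                entry = "accept" if production == 0 else f"r{production}"
                set_action(state, la, entry)
    return action, goto

ACTION, GOTO = build_table(TRANSITIONS, COMPLETED)
print(ACTION[(0, "c")], ACTION[(47, "$")], ACTION[(1, "$")], GOTO[(36, "C")])
```

Raising an error when a cell is filled twice mirrors the definition of a conflict: a grammar is LALR(1) only if this routine completes without hitting such a cell.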

18.3 LALR Parsing

The LALR parsing algorithm is the same as the CALR parsing algorithm, except that it refers to the LALR parsing table while working with the stack and the input. Combining item sets cannot introduce a shift/reduce conflict that was not already present in the CALR table, but for some grammars it introduces a reduce/reduce conflict. In that case the parser resolves the conflict in favor of reducing with the first production.

Example 18.2

Consider the grammar of Example 18.1 and trace the parsing actions of the LALR parser for the input “ccdd”.

As with the other parsers, $ is appended to the input string. The parsing actions are shown in Table 18.3.

Table 18.3 Parsing action of the LALR parser

| Stack | Input | Action |
|---|---|---|
| 0 | ccdd$ | [0, c] – shift 36 |
| 0 c 36 | cdd$ | [36, c] – shift 36 |
| 0 c 36 c 36 | dd$ | [36, d] – shift 47 |
| 0 c 36 c 36 d 47 | d$ | [47, d] – reduce 3, pop 2 symbols from the stack, push C, goto(36, C) = 89 |
| 0 c 36 c 36 C 89 | d$ | [89, d] – reduce 2, pop 4 symbols from the stack, push C, goto(36, C) = 89 |
| 0 c 36 C 89 | d$ | [89, d] – reduce 2, pop 4 symbols from the stack, push C, goto(0, C) = 2 |
| 0 C 2 | d$ | [2, d] – shift 47 |
| 0 C 2 d 47 | $ | [47, $] – reduce 3, pop 2 symbols from the stack, push C, goto(2, C) = 5 |
| 0 C 2 C 5 | $ | [5, $] – reduce 1, pop 4 symbols off the stack, push S, goto(0, S) = 1 |
| 0 S 1 | $ | [1, $] – accept – successful parsing |

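As can be seen from Table 18.3, the parser accepts the input in ten steps. On a correct input the LALR parser makes the same sequence of shift and reduce moves as the CALR parser; the saving is in the number of states (7 instead of 10), and hence in the size of the parsing table, not in the number of parsing steps.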
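As a rough sketch, the moves of Table 18.3 can be reproduced with a small table-driven driver. The dictionaries transcribe Table 18.2; the names `ACTION`, `GOTO`, `PRODUCTIONS` and `parse` are illustrative choices. The stack here holds states only, so a reduction pops one state per body symbol, rather than the two stack entries per symbol counted in the table.

```python
# Table 18.2 as dictionaries: ("s", n) = shift n, ("r", n) = reduce by production n.
ACTION = {
    (0, "c"): ("s", 36), (0, "d"): ("s", 47),
    (1, "$"): ("acc", None),
    (2, "c"): ("s", 36), (2, "d"): ("s", 47),
    (36, "c"): ("s", 36), (36, "d"): ("s", 47),
    (47, "c"): ("r", 3), (47, "d"): ("r", 3), (47, "$"): ("r", 3),
    (5, "$"): ("r", 1),
    (89, "c"): ("r", 2), (89, "d"): ("r", 2), (89, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (36, "C"): 89}
# Production number -> (head, length of body).
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

def parse(text):
    """Return the list of moves made; raise SyntaxError on an empty table entry."""
    stack = [0]                      # states only; the grammar symbols are implied
    tokens = list(text) + ["$"]      # every character is one terminal
    pos, moves = 0, []
    while True:
        state, look = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, look), ("err", None))
        if kind == "s":
            stack.append(arg)
            pos += 1
            moves.append(f"shift {arg}")
        elif kind == "r":
            head, length = PRODUCTIONS[arg]
            del stack[-length:]                      # pop the body
            stack.append(GOTO[(stack[-1], head)])    # push goto(exposed state, head)
            moves.append(f"reduce {arg}")
        elif kind == "acc":
            moves.append("accept")
            return moves
        else:
            raise SyntaxError(f"no entry for [{state}, {look}]")

for move in parse("ccdd"):
    print(move)
```

Running the driver on “ccdd” lists ten moves, one per row of Table 18.3.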

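Example 18.4

For the pointer variable declaration grammar, the LR(1) items and the LALR parsing table are given in Tables 18.4 and 18.5 respectively. As can be read off the items and the reduce entries, the productions are numbered (1) S → L=R, (2) S → R, (3) L → *R, (4) L → id and (5) R → L.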

Table 18.4 LR(1) items for the pointer variable declaration grammar

| Item | Set of Items | Goto(I, X) | Comments |
|---|---|---|---|
| I0 | S’ → •S, $; S → •L=R, $; S → •R, $; L → •*R, =/$; L → •id, =/$; R → •L, $ | | Initial item. All the items for S and R are added with ‘$’ as look-ahead. For L we have two look-aheads, ‘=’ and ‘$’: one from S → •L=R and the other from R → •L. |
| I1 | S’ → S•, $ | (I0, S) | Kernel item that results in the accept action. |
| I2 | S → L•=R, $; R → L•, $ | (I0, L) | In the first item a terminal follows the dot, so no additional items need to be added. The second item is a kernel item. |
| I3 | S → R•, $ | (I0, R) | Kernel item. |
| I4 | L → *•R, =/$; R → •L, =/$; L → •*R, =/$; L → •id, =/$ | (I0, *), (I4, *) | Items of R are added with the same look-ahead, which in turn results in the addition of the items of L. |
| I5 | L → id•, =/$ | (I0, id), (I4, id) | Kernel item. |
| I6 | S → L=•R, $; R → •L, $; L → •*R, $; L → •id, $ | (I2, =) | Items of R are added with the same look-ahead, and in turn the items of L are added. |
| I7 | L → *R•, =/$ | (I4, R) | Kernel item. |
| I8 | R → L•, =/$ | (I4, L) | Kernel item. |
| I9 | S → L=R•, $ | (I6, R) | Kernel item. |
| I10 | R → L•, $ | (I6, L), (I11, L) | Kernel item. |
| I11 | L → *•R, $; R → •L, $; L → •*R, $; L → •id, $ | (I6, *), (I11, *) | This is a new item set and is different from I4 because the look-aheads differ. |
| I12 | L → id•, $ | (I6, id), (I11, id) | Kernel item. |
| I13 | L → *R•, $ | (I11, R) | Kernel item. |

I4 and I11 could be combined together and called I411

I5 and I12 could be combined together and called I512

I7 and I13 could be combined together and called I713

I8 and I10 could be combined together and called I810

Thus we reduce the number of item sets from 14 to 10 for the LALR parser. The resulting LALR parsing table is given in Table 18.5.

Table 18.5 LALR parsing table

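| State | Action: id | Action: * | Action: = | Action: $ | Goto: S | Goto: L | Goto: R |
|---|---|---|---|---|---|---|---|
| 0 | s512 | s411 | | | 1 | 2 | 3 |
| 1 | | | | acc | | | |
| 2 | | | s6 | r5 | | | |
| 3 | | | | r2 | | | |
| 411 | s512 | s411 | | | | 810 | 713 |
| 512 | | | r4 | r4 | | | |
| 6 | s512 | s411 | | | | 810 | 9 |
| 713 | | | r3 | r3 | | | |
| 810 | | | r5 | r5 | | | |
| 9 | | | | r1 | | | |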
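The state counts quoted in the two examples (10 reduced to 7, and 14 reduced to 10) can be checked with a compact construction of the canonical LR(1) collection followed by a count of distinct cores. This is a minimal sketch, assuming a grammar without ε-productions (true of both examples); the function names and the encoding of an item as (production index, dot position, look-ahead) are choices made for this illustration.

```python
def lr1_states(grammar):
    """Canonical collection of LR(1) item sets.

    grammar is a list of (head, body) pairs; production 0 is the augmented one.
    Assumes there are no epsilon-productions.
    """
    nonterminals = {head for head, _ in grammar}

    # FIRST sets by fixed-point iteration over the first body symbol.
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            new = first[body[0]] if body[0] in nonterminals else {body[0]}
            if not new <= first[head]:
                first[head] |= new
                changed = True

    def closure(kernel):
        result, work = set(kernel), list(kernel)
        while work:
            p, dot, la = work.pop()
            body = grammar[p][1]
            if dot == len(body) or body[dot] not in nonterminals:
                continue
            beta = body[dot + 1:]
            if not beta:                      # beta is empty: FIRST(la)
                lookaheads = {la}
            elif beta[0] in nonterminals:     # FIRST(beta la) = FIRST(beta[0])
                lookaheads = first[beta[0]]
            else:
                lookaheads = {beta[0]}
            for q, (head, _) in enumerate(grammar):
                if head == body[dot]:
                    for b in lookaheads:
                        if (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
        return frozenset(result)

    def goto(state, symbol):
        moved = {(p, dot + 1, la) for p, dot, la in state
                 if dot < len(grammar[p][1]) and grammar[p][1][dot] == symbol}
        return closure(moved) if moved else None

    symbols = {s for _, body in grammar for s in body}
    states = [closure({(0, 0, "$")})]
    seen = {states[0]}
    for state in states:                      # the list grows while we iterate
        for symbol in symbols:
            target = goto(state, symbol)
            if target is not None and target not in seen:
                seen.add(target)
                states.append(target)
    return states

def lalr_state_count(states):
    """Number of states left after merging item sets with a common core."""
    return len({frozenset((p, dot) for p, dot, _ in state) for state in states})

G1 = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
G2 = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
      ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]

for g in (G1, G2):
    states = lr1_states(g)
    print(len(states), "->", lalr_state_count(states))
```

The order in which states are discovered depends on set iteration order, so the sketch reports only the counts and does not try to reproduce the numbering I0, I1, ... used in the tables.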

18.4 Conflicts in LL and LR Parsers

LL parsing tables are computed using FIRST and FOLLOW; the rows correspond to the non-terminals and the columns to the terminals. Before the parsing table is constructed, the grammar needs to be pre-processed: left recursion is removed and the grammar is left factored, giving a modified grammar. This modified grammar is used to compute FIRST and FOLLOW, which are then used to construct the parsing table.

LR parsing tables are computed using Closure and Goto, and their action entries are shift, reduce, accept and error. The three types of LR parsers are SLR, CALR and LALR. All of them construct a parsing table in which the rows correspond to the states, which are obtained from the sets of LR(0) or LR(1) items, and the columns correspond to the terminals and non-terminals.

The parsing table is fundamental to the parsing action. A table in which some entry is multiply defined leaves the parsing action ambiguous. A grammar is said to be LL(1) if its LL(1) parse table has no conflicts, SLR if its SLR parse table has no conflicts, LALR(1) if its LALR(1) parse table has no conflicts and CALR(1) if its CALR(1) parse table has no conflicts. In an LR table a conflict is either a shift/reduce conflict or a reduce/reduce conflict.


A shift/reduce conflict between operators is resolved using the associativity and the precedence of the operators involved. The parsers resolve such conflicts in the following manner.

• For left-associative operators of equal precedence, the conflict is resolved in favor of the reduce action.

• For right-associative operators of equal precedence, the conflict is resolved in favor of the shift action.

• If the operator on the stack has higher precedence than the incoming operator, the conflict is resolved in favor of the reduce action.

• If the operator on the stack has lower precedence than the incoming operator, the conflict is resolved in favor of the shift action.
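The four rules above can be written as a single decision function. The operator table below is a hypothetical example (the module does not fix any particular operators); only the decision logic follows the rules in the list.

```python
# Hypothetical operator table: operator -> (precedence level, associativity).
# A larger number means higher precedence.
PRECEDENCE = {
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}

def resolve(stack_op, input_op):
    """Resolve a shift/reduce conflict between the operator already on the
    stack and the operator that is the current input symbol."""
    stack_level, assoc = PRECEDENCE[stack_op]
    input_level, _ = PRECEDENCE[input_op]
    if stack_level > input_level:
        return "reduce"      # higher precedence on the stack
    if stack_level < input_level:
        return "shift"       # lower precedence on the stack
    # Equal precedence: associativity decides.
    return "reduce" if assoc == "left" else "shift"

print(resolve("*", "+"))  # reduce
print(resolve("+", "*"))  # shift
print(resolve("+", "+"))  # reduce
print(resolve("^", "^"))  # shift
```

For example, with `a + b * c` the parser sees `+` on the stack and `*` in the input, so it shifts and `b * c` is grouped first; with `a + b + c` it reduces, which groups the expression to the left.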

18.5 Error Detection and Recovery

The canonical LR parser uses the full LR(1) parse table and never makes a single reduction before recognizing the error when a syntax error occurs in the input. SLR and LALR parsers may still make some reductions when a syntax error occurs in the input, but they never shift the erroneous input symbol. An error is detected when the parsing table has no action entry for the state on top of the stack and the current input symbol. The parsers recover from errors so that the compilation can be carried forward; the recovery does not make a permanent change to the program. The errors are recovered in one of the following ways:

• Panic mode: In this mode of error recovery, states are popped from the stack until a state s with a goto on some non-terminal A of the grammar is found. Input symbols are then discarded until a symbol that belongs to the FOLLOW set of A is found. The parser then pushes goto(s, A) and resumes normal parsing.

• Phrase-level recovery: Individual error routines are implemented for the error entries of the table. The appropriate routine pops the stack, discards input, or both, logs this information in an error log, and recovers from the error so that parsing can continue.

• Error productions: New error productions are added to the grammar. When an error entry of the table is reached, symbols are popped from the stack until a state that has an error production is found, and the error symbol is pushed onto the stack. After that, input symbols are discarded until the parsing action can continue.
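The difference in error detection between the CALR and LALR parsers, described at the start of this section, can be observed on the grammar of Example 18.1 with the erroneous input “cd”. The sketch below is an illustration under stated assumptions: the CALR table is read off the item sets I0 to I9 of Table 18.1, the LALR table is Table 18.2, and the names `run`, `CALR_ACTION` and so on are hypothetical.

```python
# Production number -> (head, length of body).
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

# CALR table derived from the item sets I0..I9 of Table 18.1.
CALR_ACTION = {
    (0, "c"): "s3", (0, "d"): "s4", (1, "$"): "acc",
    (2, "c"): "s6", (2, "d"): "s7",
    (3, "c"): "s3", (3, "d"): "s4",
    (4, "c"): "r3", (4, "d"): "r3",
    (5, "$"): "r1",
    (6, "c"): "s6", (6, "d"): "s7",
    (7, "$"): "r3",
    (8, "c"): "r2", (8, "d"): "r2",
    (9, "$"): "r2",
}
CALR_GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

# LALR table of Table 18.2.
LALR_ACTION = {
    (0, "c"): "s36", (0, "d"): "s47", (1, "$"): "acc",
    (2, "c"): "s36", (2, "d"): "s47",
    (36, "c"): "s36", (36, "d"): "s47",
    (47, "c"): "r3", (47, "d"): "r3", (47, "$"): "r3",
    (5, "$"): "r1",
    (89, "c"): "r2", (89, "d"): "r2", (89, "$"): "r2",
}
LALR_GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (36, "C"): 89}

def run(action, goto, text):
    """Return (accepted, moves); moves lists every shift and reduce made
    before the input was accepted or an empty table entry was reached."""
    stack, tokens, pos, moves = [0], list(text) + ["$"], 0, []
    while True:
        entry = action.get((stack[-1], tokens[pos]))
        if entry is None:
            return False, moves          # error entry: stop here
        if entry == "acc":
            return True, moves
        moves.append(entry)
        if entry[0] == "s":
            stack.append(int(entry[1:]))
            pos += 1
        else:
            head, length = PRODUCTIONS[int(entry[1:])]
            del stack[-length:]
            stack.append(goto[(stack[-1], head)])

print(run(CALR_ACTION, CALR_GOTO, "cd"))   # (False, ['s3', 's4'])
print(run(LALR_ACTION, LALR_GOTO, "cd"))   # (False, ['s36', 's47', 'r3', 'r2'])
```

Both parsers reject the input without shifting past the point of error, but the LALR parser makes two reductions first, because the merged states 47 and 89 also carry the look-ahead $ that belonged to I7 and I9.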

Summary:

In this module we discussed the construction of the LR(1) items for the LALR parser, which are obtained by modifying the LR(1) items constructed for the CALR parser. Using the modified LR(1) items, the LALR parsing table is constructed and used to parse a given string. We also discussed the LALR parsing action along with error recovery in LR parsers.