COMPILER DESIGN

For COMPUTER SCIENCE

SYLLABUS

Lexical analysis, parsing, syntax-directed translation. Runtime environments. Intermediate code generation.

ANALYSIS OF GATE PAPERS

Exam Year     1 Mark Ques.   2 Mark Ques.   Total
2003               3              5           13
2004               3              1            5
2005               1              4            9
2006               1              5           11
2007               1              5           11
2008               2              1            4
2009               1              -            1
2010               2              1            4
2011               1              1            3
2012               -              2            4
2013               1              3            7
2014 Set-1         1              1            3
2014 Set-2         2              2            6
2014 Set-3         2              1            4
2015 Set-1         1              2            5
2015 Set-2         2              1            4
2015 Set-3         1              1            3
2016 Set-1         1              2            5
2016 Set-2         1              1            3
2017 Set-1         0              1            2
2017 Set-2         2              2            6

CONTENTS

1. INTRODUCTION TO COMPILERS

1.1 Language Processing System
1.2 Phases of Compiler
1.3 Analysis of the Program
1.4 Lexical Analysis
1.5 Input Buffering
Gate Questions

2. PARSING

2.1 Syntax Analysis
2.2 The Role of the Parser
2.3 Context Free Grammars
2.4 Derivations
2.5 Top Down Parsing
2.6 LL(1) Grammar
2.7 LR Parser
2.8 SLR Parsing
2.9 LALR Parsing
2.10 Error Handling
Gate Questions

3. SYNTAX DIRECTED TRANSLATION

3.1 Introduction
3.2 Synthesized Attributes
3.3 Syntax Trees
3.4 Translation Schemes
3.5 Top-Down Translation
3.6 Run Time Environment
Gate Questions

4. INTERMEDIATE CODE GENERATION

4.1 Introduction
4.2 Three Address Code
Gate Questions

5. ASSIGNMENT QUESTIONS

1 INTRODUCTION TO COMPILERS

1.1 Language Processing System

Compilers:
• A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
• Compilers are sometimes classified as single-pass, multi-pass, load-and-go, debugging or optimizing, depending on how they have been constructed or on what function they are supposed to perform.

Fig. 1.1 Language Processing System

Preprocessor: A preprocessor produces input to compilers. It may perform the following functions.
1. Macro processing: A preprocessor may allow a user to define macros, which are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text (a short C illustration follows at the end of this subsection).
3. Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.

Fig. 1.2 A compiler

Assembler:
• As it is difficult to write machine language instructions, programmers began to use mnemonics (symbols) for each machine instruction, which they would subsequently translate into machine language. Such a mnemonic machine language is now called an assembly language. Assemblers were written to automate the translation of assembly language into machine language.

Interpreter:
• An interpreter is a program that appears to execute a source program as if it were machine language.
• Languages such as BASIC and LISP can be translated using interpreters. Java also uses an interpreter. The process of interpretation can be carried out in the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct execution

Advantages:
• Modification of the user program can easily be made and implemented as execution proceeds.
• Debugging a program and finding errors is a simplified task for a program used for interpretation.
• The interpreter for the language makes it machine independent.

Disadvantages:
• The execution of the program is slower.
• Memory consumption is more.
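As promised above, here is a brief C illustration of the first two preprocessor functions (a minimal sketch; SQUARE is an arbitrary example macro):

    #include <stdio.h>           /* file inclusion: the contents of stdio.h
                                    replace this line before compilation    */
    #define SQUARE(x) ((x)*(x))  /* macro: a shorthand for a longer construct */

    int main(void) {
        /* after macro processing, the next line becomes:
           printf("%d\n", ((5)*(5)));                       */
        printf("%d\n", SQUARE(5));
        return 0;
    }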

1.2 Phases of Compiler

• A compiler operates in phases. A phase is a logically interrelated operation that takes the source program in one representation and produces output in another representation. There are two phases of compilation:
• Analysis (machine independent / language dependent)
• Synthesis (machine dependent / language independent)
• The compilation process is partitioned into a number of sub-processes called 'phases'.
• In practice some of the phases are grouped together, and the intermediate representation between the grouped phases need not be explicitly constructed. A common division into phases is described below.
• The front end includes all of the analysis phases and the intermediate code generator, with a certain amount of code optimization as well.
• The back end includes the code optimization phase and the final code generation phase.
The front end consumes the source program and produces intermediate code, from which the back end synthesizes the target program.

Fig 1.3 Phases of a compiler

1.2.1 Lexical analysis:
• This is the initial part of reading and analyzing the program text.
• The text is read and divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or number.

1.2.2 Syntax analysis:
The second stage of translation is called syntax analysis or parsing. In this phase expressions, statements, declarations etc. are identified by using the results of lexical analysis. Syntax analysis is aided by using techniques based on the formal grammar of the programming language.

1.2.3 Semantic analysis:
The syntax analyzer will just create the parse tree. The semantic analyzer checks the actual meaning of the statements parsed into the parse tree. Semantic analysis can compare information in one part of a parse tree to that in another part (e.g., check that a reference to a variable agrees with its declaration, or that the parameters to a function call match the function definition).

1.2.4 Intermediate code generation:
The output of the syntax analyzer is some representation of a parse tree. The intermediate code generation phase transforms this parse tree into an intermediate language representation of the source program. For example, a syntax tree for an assignment statement is shown in Fig. 1.4.

Fig. 1.4 Syntax tree for position := initial + rate * 60

1.2.6 Code optimization:
This is an optional phase intended to improve the intermediate code so that the output runs faster and takes less space.

1.2.7 Code generation:
The last phase of translation is code generation. A number of optimizations to reduce the length of the machine language program are carried out during this phase. The output of the code generator is the machine language program for the specified computer.

1.3 Analysis of the Source Program

• Linear or lexical analysis, in which the stream of characters making up the source program is read from left to right and grouped into tokens, that is, sequences of characters having a collective meaning.
• Hierarchical or syntax analysis, in which characters or tokens are grouped hierarchically into nested collections with collective meanings.
• Semantic analysis, in which certain checks are performed to ensure that the components of a program fit together meaningfully.

1.3.1 The Analysis-Synthesis Model of Compilation

There are two parts of compilation:
1) Analysis: It breaks up the source program into constituent pieces and creates an intermediate representation of the source program.
2) Synthesis: It constructs the desired target program from the intermediate representation.
• Of the two parts, synthesis requires the most specialized techniques.
• During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.

1.3.2 The Context of a Compiler

• In addition to a compiler, several other programs may be required to create an executable target program.
• A source program may be divided into modules stored in separate files.
• The task of collecting the source program is sometimes entrusted to a distinct program, called a preprocessor. The preprocessor may also expand shorthands, called macros, into source language statements.

1.3.3 Symbol-Table Management

• A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.
• The symbol table is a data structure in a compiler used for managing information about variables and their attributes.

1.3.4 Error Detection and Reporting

Each phase can encounter errors. However, after detecting an error, a phase must somehow deal with that error so that compilation can proceed, allowing further errors in the source program to be detected. A compiler that stops when it finds the first error is not as helpful as it could be.
• The lexical phase can detect errors where the characters remaining in the input do not form any token of the language.
• Errors where the token stream violates the structure rules (syntax) of the language are determined by the syntax analysis phase.

1.3.5 Retargeting

It is the process of creating more and more compilers for the same source language, but for different machines.

1.3.6 Assemblers

Some compilers produce assembly code that is passed to an assembler for further processing. Other compilers perform the job of the assembler, producing relocatable machine code that can be passed directly to the loader/link-editor.
Assembly code is a mnemonic version of machine code, in which names are used instead of binary codes for operations, and names are also given to memory addresses. A typical sequence of assembly instructions might be

MOV a, R1
ADD #2, R1
MOV R1, b

This code moves the contents of the address a into register 1, then adds the constant 2 to it, treating the contents of register 1 as a fixed-point number, and finally stores the result in the location named by b. Thus it computes b = a + 2.

1.3.7 Two-Pass Assembly

The simplest form of assembler makes two passes over the input, where a pass consists of reading an input file once. In the first pass, all the identifiers that denote storage locations are found and stored in a symbol table.
In the second pass, the assembler scans the input again. This time, it translates each operation code into the sequence of bits representing that operation in machine language, and it translates each identifier representing a location into the address given for that identifier in the symbol table.
The output of the second pass is usually relocatable machine code, meaning that it can be loaded starting at any location L in memory; i.e., if L is added to all addresses in the code, then all references will be correct. Thus, the output of the assembler must distinguish those portions of instructions that refer to addresses that can be relocated.

1.3.8 Preprocessor

A preprocessor produces input to the compiler. It may perform the following functions.
• Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.
• File inclusion: A preprocessor may include header files into the program text. For example, the C preprocessor causes the contents of the named file to replace the #include statement when it processes a file containing that statement.
• Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities. For example, such a preprocessor might provide the user with built-in macros for constructs like while statements or if statements, where none exist in the programming language itself.
• Language extensions: These processors attempt to add capabilities to the language by what amounts to built-in macros.

1.3.9 Phases vs. Passes

A scan of the complete text from the left-hand side to the right-hand side is called a "pass". Translation is implemented in one or more passes, each of which must
- Scan
- Translate
Multi-pass: requires less space, but is slower.
Single-pass: is faster, but requires more space.

1.3.10 The Analysis Phases

As translation progresses, the compiler's internal representation of the source program changes. We illustrate these representations by considering the translation of the statement

position := initial + rate * 60    ------ (1.1)

1.3.11 Lexical Analyzer

The lexical analysis phase reads the characters in the source program and groups them into a stream of tokens, in which each token represents a logically cohesive sequence of characters, such as an identifier, a keyword (if, while, etc.), a punctuation character, or a multi-character operator like :=. The character sequence forming a token is called the lexeme for the token.
Certain tokens will be augmented by a "lexical value". For example, when an identifier like rate is found, the lexical analyzer not only generates a token, say id, but also enters the lexeme rate into the symbol table, if it is not already there. The lexical value associated with this occurrence of id points to the symbol-table entry for rate.
In this section, we shall use id1, id2, and id3 for position, initial, and rate, respectively, to emphasize that the internal representation of an identifier is different from the character sequence forming the identifier. The representation of assignment statement (1.1) after lexical analysis is therefore suggested by:

id1 := id2 + id3 * 60    ------ (1.2)

1.3.12 The Syntax and Semantic Analysis Phases

i) The Syntax Analysis Phase: The syntax analysis phase imposes a hierarchical structure on the token stream, shown as a syntax tree in Fig. 1.5.

Fig. 1.5 Syntax tree for id1 := id2 + id3 * 60

ii) The Semantic Analysis Phase: During semantic analysis, it is noted that in our example all identifiers have been declared to be real, while 60 by itself is assumed to be an integer. Type checking of the syntax tree reveals that * is applied to a real, rate, and an integer, 60. The general approach is to convert the integer into a real; this is achieved by inserting an inttoreal conversion node into the tree, as in Fig. 1.6.

Fig. 1.6 Syntax tree after semantic analysis, with inttoreal applied to 60

1.3.13 The Synthesis Phases

i) Intermediate Code Generation (ICG):
• After syntax and semantic analysis, some compilers generate an explicit intermediate representation of the source program.
• We can think of this intermediate representation as a program for an abstract machine.
• This intermediate representation should have two important properties: it should be easy to produce, and easy to translate into the target program.

Fig. 1.7 summarizes the whole translation of statement (1.1) through the phases:

position := initial + rate * 60
        ↓ Lexical analyzer
id1 := id2 + id3 * 60
        ↓ Syntax analyzer
(syntax tree of Fig. 1.5)
        ↓ Semantic analyzer
(tree of Fig. 1.6)
        ↓ Intermediate code generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
        ↓ Code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
        ↓ Code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Fig. 1.7 Translation of position := initial + rate * 60 through the phases

Fig. 1.8 shows (a) the syntax tree and (b) the corresponding data structure, in which each identifier node points to its symbol-table entry:

Symbol Table
1  position  .......
2  initial   .......
3  rate      .......
4

Fig. 1.8 The data structure in (b) is for the tree in (a)

ii) Code Optimization:
• This phase attempts to improve the intermediate code, so that faster-running code will result.
• Some optimizations are trivial. For example, a natural algorithm generates the intermediate code (1.3), using an instruction for each operator in the tree representation after semantic analysis, even though there is a better way to perform the same calculation, using only the two instructions (1.4):

temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3          ------ (1.3)

temp1 := id3 * 60.0
id1 := id2 + temp1    ------ (1.4)

1.3.14 Advantages of Code Optimization

- Improves efficiency
- Occupies less memory
- Executes faster

Code optimization applied on three-address code is called machine-independent optimization.

iii) Code Generation:
• The final phase of the compiler is the generation of target code, consisting normally of relocatable machine code or assembly code.
• Memory locations are selected for each of the variables used by the program. Then, intermediate instructions are each translated into a sequence of machine instructions that perform the same task.
• A crucial aspect is the assignment of variables to registers.

1.3.15 Loaders and Link-Editors

• A program called a loader performs the two functions of loading and link-editing.
• The process of loading consists of taking relocatable machine code, altering the relocatable addresses, and placing the altered instructions and data in memory at the proper locations.
• The link-editor allows us to make a single program from several files of relocatable machine code. These files may have been the result of several different compilations, and one or more may be library files of routines provided by the system and available to any program that needs them.
• Interpreter: Unlike a compiler, an interpreter takes a single instruction at a time and executes it.

Example: Suppose a program has 5000 statements, of which only 50 statements need to be visited for a test run. A compiler must translate all 5000 statements before the 50 are executed, whereas an interpreter processes only the 50 statements that are actually visited; for such a short test run the interpreter is better.

1.3.16 Advantages of an Interpreter

- Certain language features are supported by an interpreter rather than a compiler.
- "Portability"

1.3.17 Important Observations

To find the number of tokens, we divide the entire statement into tokens, as in Fig. 1.9.
Note: The information within double quotes (" ") is treated as a single token.

Fig. 1.9 Dividing a statement into tokens
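For instance, here is a worked count (assuming C-style tokenization; this is the same statement that appears in the GATE questions at the end of the chapter):

    printf("i = %d, &i = %x", i, &i);
    /* tokens: printf  (  "i = %d, &i = %x"  ,  i  ,  &  i  )  ;
       The quoted string is a single token, so there are 10 tokens. */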

1.4 Lexical Analysis

The main purpose of lexical analysis is to make life easier for the subsequent syntax analysis phase. In theory, the work that is done during lexical analysis can be made an integral part of syntax analysis. However, there are reasons for keeping the phases separate.
• A token is a string of characters, categorized according to the rules as a symbol (e.g. IDENTIFIER, NUMBER, COMMA, etc.). The process of forming tokens from an input stream of characters is called tokenization, and the lexer categorizes them according to symbol type. A token can look like anything that is useful for processing an input text stream or text file.
• The lexical analyzer (LA) reads the source program one character at a time, carving the source program into a sequence of atomic units called 'tokens'. A token has two parts:
1: Type of the token.
2: Value of the token.
Type: variable, operator, keyword, constant.
Value: name of the variable, current variable, or a pointer to the symbol-table entry.
• In a compiler, linear analysis is called lexical analysis or scanning. For example, in lexical analysis the characters in the assignment statement
position := initial + rate * 60
would be grouped into the following tokens:
1. The identifiers position, initial, rate.
2. The symbols :=, +, *.
3. The number 60.
The blanks separating the characters of these tokens would normally be eliminated during lexical analysis.

1.4.1 Token, Pattern, Lexeme

Token: A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are 1) identifiers, 2) keywords, 3) operators, 4) special symbols, 5) constants.
Pattern: A set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. Below is an example.

Token     Lexeme                Pattern
const     const                 const
if        if                    if
relop     <, <=, =, <>, >, >=   < or <= or = or <> or > or >=
id        pi                    letter followed by letters and digits
num       3.14                  any numeric constant
literal   "core"                any characters between " and " except "

1.4.2 Lexical Analyzer

• The lexical analyzer is the first phase of a compiler.
• Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
• This interaction, summarized schematically in Fig. 1.10, is commonly implemented by making the lexical analyzer a subroutine or a co-routine of the parser. Upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.

Fig. 1.10 Interaction of lexical analyzer with parser

In some compilers, the lexical analyzer is in charge of making a copy of the source program with the error messages marked in it. If the source language supports some macro preprocessor functions, then these preprocessor functions may also be implemented as lexical analysis takes place.
Its secondary tasks are:
• One task is stripping out from the source program comments and white space in the form of blank, tab, and newline characters.
• Another task is correlating error messages from the compiler with the source program.
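A minimal C sketch of a token carrying its type and value, as described above (the enum and field names are illustrative assumptions):

    enum token_type { TOK_ID, TOK_NUM, TOK_KEYWORD, TOK_RELOP, TOK_LITERAL };

    struct token {
        enum token_type type;     /* the token class, e.g. TOK_ID            */
        union {
            int    symtab_index;  /* for identifiers: symbol-table entry     */
            double num_value;     /* for numeric constants, e.g. 3.14        */
        } value;                  /* the token's attribute ("lexical value") */
    };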

Sometimes the lexical analyzer is divided into a cascade of two phases:
1) Scanning
2) Lexical analysis
The scanner is responsible for doing simple tasks, while the lexical analyzer proper does the more complex operations.

1.4.3 Typical Tasks of the Scanner (Lexical Analyzer)

- Scan input
- Find integer and floating point constants
- Remove comments
- Remove white spaces
- Treat white spaces appropriately in the form of blank, tab and newline characters
- Count the number of lines
- Find string and character constants
- Pass tokens to the parser

Example: The C statement
if (++x == 5) foo(3);
is tokenized as
IF ( ++ ID == 5 ) ID ( INT )

1.4.4 Issues in Lexical Analysis

There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing.
1. Simpler design is perhaps the most important consideration.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to construct a specialized and potentially more efficient processor for the task.
3. Compiler portability is enhanced. Input alphabet peculiarities and other device-specific anomalies can be restricted to the lexical analyzer.

1.4.5 Attributes for Tokens

• The lexical analyzer collects information about tokens into their associated attributes.
• The tokens influence parsing decisions; the attributes influence the translation of tokens.
• As a practical matter, a token usually has only a single attribute: a pointer to the symbol-table entry in which the information about the token is kept. The pointer becomes the attribute for the token.
• For diagnostic purposes, we may be interested in both the lexeme for an identifier and the line number on which it was first seen. Both these items of information can be stored in the symbol-table entry for the identifier.

1.4.6 Lexical Errors

• Error handling is very localized with respect to the input source.
• Example: while (x := 0) do generates no lexical errors in PASCAL.
• In what situations do errors occur? When a prefix of the remaining input doesn't match any defined token.
Possible error-recovery actions are:
1. deleting an extraneous character
2. inserting a missing character
3. replacing an incorrect character by a correct character
4. transposing two adjacent characters

• If no prefix of the input string forms a valid token, a lexical error has occurred. When this happens, the lexer will usually report an error.
• At this point, it may stop reading the input, or it may attempt continued lexical analysis by skipping characters until a valid prefix is found.
• The purpose of the latter approach is to try to find further lexical errors in the same input, so several of these can be corrected by the user before re-running the lexer.
• Some of these subsequent errors may, however, not be real errors but may be caused by the lexer not skipping enough characters (or skipping too many) after the first error is found.

If the string fi is encountered in a C program for the first time in the context
fi ( a == f(x) ) ......
a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid identifier, the lexical analyzer must return the token for an identifier and let some other phase of the compiler handle the error.
But suppose a situation does arise in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of the remaining input. Perhaps the simplest recovery strategy is "panic mode" recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token. This recovery technique may occasionally confuse the parser, but in an interactive computing environment it may be quite adequate.

1.4.7 Specification of Tokens

Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.

1.4.8 Regular Expressions

A regular expression is a notation to specify a regular set. In Pascal, an identifier is a letter followed by zero or more letters or digits. With regular expression notation, we define Pascal identifiers as

letter (letter | digit)*

1.4.9 Rules Defining Regular Expressions

The following rules define the regular expressions over an alphabet ∑:
1. ε is a regular expression denoting {ε}.
2. If x is a symbol in ∑, then x is a regular expression denoting {x}.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
(i) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
(ii) (r)(s) is a regular expression denoting L(r)L(s)
(iii) (r)* is a regular expression denoting (L(r))*
(iv) (r) is a regular expression denoting L(r)

1.4.10 Regular Definitions

If ∑ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form

d1 → r1
d2 → r2
......
dn → rn

where each di is a distinct name, and each ri is a regular expression over the symbols in ∑ ∪ {d1, d2, ..., di−1}, i.e. the basic symbols and the previously defined names. By restricting each ri to symbols of ∑ and the previously defined names, we can construct a regular expression over ∑ for any ri by repeatedly replacing regular-expression names by the expressions they denote. If ri used dj for some j ≥ i, then ri might be recursively defined, and this substitution process would not terminate.

Example: A regular definition for Pascal identifiers:
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter (letter | digit)*

1.4.11 Recognition of Tokens

We have learnt how to express patterns using regular expressions. Now, we must study how to take the patterns for all the needed tokens and build a piece of code that examines the input string and finds a prefix that is a lexeme matching one of the patterns. Consider the grammar

stmt → if expr then stmt | if expr then stmt else stmt | ε
expr → term relop term | term
term → id | number

The terminals of the grammar, which are if, then, else, relop, id and number, are the names of tokens as far as the lexical analyzer is concerned. The patterns for the tokens are described using regular definitions:

digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id → letter (letter | digit)*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>

1.5 Input Buffering

This section covers some efficiency issues concerned with the buffering of input. We first mention a two-buffer input scheme that is useful when look-ahead on the input is necessary to identify tokens. Then we introduce some useful techniques for speeding up the lexical analyzer, such as the use of "sentinels" to mark the buffer end. There are three general approaches to the implementation of a lexical analyzer:

1. Use a lexical-analyzer generator 2. Write the lexical analyzer in a conventional systems programming language, using the I/O facilities of that language to read the input. 3. Write the lexical analyzer in assembly language and explicitly manage the reading of input.

The three choices are listed in order of increasing difficulty for the implementer. Unfortunately, the harder-to-implement approaches often yield faster lexical analyzers. Since the lexical analyzer is the only phase of the compiler that reads the source program character-by-character, it is possible to spend a considerable amount of time in the lexical analysis phase, even though the later phases are conceptually more complex. Thus, the speed of lexical analysis is a concern in compiler design. While the bulk of the chapter is devoted to the first approach, the design and use of an automatic generator, we also consider techniques that are helpful in manual design.
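A minimal C sketch of the two-buffer scheme with sentinels (buffer size and names are illustrative assumptions; the actual file reads are elided):

    #define N 4096                  /* size of each buffer half        */
    static char buf[2 * N + 2];     /* two halves, one sentinel each   */
    static char *forward = buf;     /* scanning pointer                */

    /* Place sentinels at the end of each half, so the inner scanning
       loop needs only ONE test (for the sentinel) instead of two
       (end-of-half and end-of-file) on every character read.         */
    void init_sentinels(void) {
        buf[N] = '\0';              /* sentinel ending the first half  */
        buf[2 * N + 1] = '\0';      /* sentinel ending the second half */
    }

    char next_char(void) {
        char c = *forward++;
        if (c == '\0') {
            if (forward == buf + N + 1) {
                /* hit first sentinel: reload second half, continue   */
                /* ... read from the source file into buf + N + 1 ... */
                c = *forward++;
            } else if (forward == buf + 2 * N + 2) {
                /* hit second sentinel: reload and rescan first half  */
                /* ... read from the source file into buf ...         */
                forward = buf;
                c = *forward++;
            }
            /* otherwise '\0' occurred inside the input itself        */
        }
        return c;
    }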

GATE QUESTIONS

Q.1 The number of tokens in the following C statement
printf("i = %d, &i = %x", i, &i);
is
a) 3    b) 26    c) 10    d) 21
[GATE-2000]

Q.2 Which of the following is not an advantage of using shared, dynamically linked libraries as opposed to using statically linked libraries?
a) Smaller sizes of executable files
b) Lesser overall page fault rate in the system
c) Faster program startup
d) Existing programs need not be relinked to take advantage of newer versions
[GATE-2003]

Q.3 Consider a program P that consists of two source modules M1 and M2 contained in two different files. If M1 contains a reference to a function defined in M2, the reference will be resolved at
a) edit time    b) compile time
c) link time    d) load time
[GATE-2004]

Q.4 Which of the following statements are true?
1. There exist parsing algorithms for some programming languages whose complexities are less than θ(n³).
2. A programming language which allows recursion can be implemented with static storage allocation.
3. No L-attributed definition can be evaluated in the framework of bottom-up parsing.
4. Code improving transformations can be performed at both source language and intermediate code level.
a) 1 and 2    b) 1 and 4
c) 3 and 4    d) 1, 3 and 4
[GATE-2009]

Q.5 Which data structure in a compiler is used for managing information about variables and their attributes?
a) Abstract syntax tree
b) Symbol table
c) Semantic stack
d) Parser table
[GATE-2010]

Q.6 In a compiler, keywords of a language are recognized during
a) parsing of the program
b) the code generation
c) the lexical analysis of the program
d) data flow analysis
[GATE-2011]

Q.7 Match the following:
List-I
(P) Lexical analysis
(Q) Top down parsing
(R) Semantic analysis
(S) Runtime environments
List-II
(i) Leftmost derivation
(ii) Type checking
(iii) Regular expressions
(iv) Activation records
a) P ↔ i, Q ↔ ii, R ↔ iv, S ↔ iii
b) P ↔ iii, Q ↔ i, R ↔ ii, S ↔ iv
c) P ↔ ii, Q ↔ iii, R ↔ i, S ↔ iv
d) P ↔ iv, Q ↔ i, R ↔ ii, S ↔ iii
[GATE-2016]

Q.8 Match the following according to the input (in the left column) and the compiler phase (in the right column) that processes it:
List-I
P) Syntax tree
Q) Character stream
R) Intermediate representation
S) Token stream
List-II
i) Code generator
ii) Syntax analyzer
iii) Semantic analyzer
iv) Lexical analyzer
a) P → (ii), Q → (iii), R → (iv), S → (i)
b) P → (ii), Q → (i), R → (iii), S → (iv)
c) P → (iii), Q → (iv), R → (i), S → (ii)
d) P → (i), Q → (iv), R → (ii), S → (iii)
[GATE-2017]

ANSWER KEY:

Q.No   1   2   3   4   5   6   7   8
Ans.   c   d   c   b   b   c   b   c

EXPLANATIONS

Q.1 (c)
The statement is divided into the tokens printf, (, "i = %d, &i = %x", ,, i, ,, &, i, ) and ;. The string within double quotes is treated as a single token, giving 10 tokens in all.

Q.2 (d)
Let's take a look at the advantages of dynamic linking.
Advantages of Dynamic Linking:
1. A stub is a piece of code which is used to stand in for the functionality of some other program. This stub is included in the image for each binary routine reference in dynamic linking.
2. Without dynamic linking, when a library is replaced with a new version, all the programs that have a reference to the library must be relinked to gain access. Dynamic linking avoids all such conditions.

Q.3 (c)
To link the modules, firstly the modules need to be individually compiled. Once the modules are compiled, linking is done. The linking is done at link time.

Q.4 (b)
Statement 1 True: Parsing algorithms exist for some programming languages whose complexities are even less than θ(n³).
Statement 2 False: A programming language allowing recursion cannot be implemented with static storage allocation.
Statement 3 False: L-attributed definitions can be evaluated in the framework of bottom-up parsing.
Statement 4 True: Code-improving transformations can be done at both the source language and intermediate code level.

Q.5 (b)
Symbol table. A symbol table is a data structure in a compiler devised for the purpose of managing information about variables and their attributes.

Q.6 (c)
Keywords are recognized during the lexical analysis of the program.

Q.8 (c)
The character stream is processed by the lexical analyzer, the token stream by the syntax analyzer, the syntax tree by the semantic analyzer, and the intermediate representation by the code generator.

2 PARSING

2.1 Syntax Analysis

• Syntax analysis is the second phase of the compiler, after the lexical analyzer.
• It is also called hierarchical analysis or parsing.
• It groups tokens of the source program into grammatical productions.
• In short, syntax analysis generates a parse tree.
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a programming language.
• From certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed.
• The parser obtains a string of tokens from the lexical analyzer and verifies that it can be generated by the grammar for the source language. The parser should report any syntax errors in an intelligible fashion.
• Languages evolve over a period of time, acquiring new constructs and performing additional tasks. These new constructs can be added to a language more easily when there is an existing implementation based on a grammatical description of the language.

2.2 The Role of the Parser

• Construct a parse tree
• Report and recover from errors
• Collect information into symbol tables

Fig. 2.1 Position of parser in compiler model

The parser obtains a string of tokens from the lexical analyzer, as shown in Fig. 2.1, and verifies that the string can be generated by the grammar for the source language.
A parser for a grammar of a programming language
• verifies that the string of tokens for a program in that language can indeed be generated from that grammar
• reports any syntax errors in the program
• constructs a parse tree representation of the program (not necessarily explicit)
• usually calls the lexical analyzer to supply a token to it when necessary
• could be hand-written or automatically generated
• is based on context-free grammars

2.2.1 Syntax Error Handling

The error handler in a parser has simple-to-state goals:
• It should report the presence of errors clearly and accurately.
• It should recover from each error quickly enough to be able to detect subsequent errors.
• It should not significantly slow down the processing of correct programs.

2.2.2 Error-Recovery Strategies

There are many different general strategies that a parser can employ to recover from a syntactic error. Although no strategy has proven itself to be universally acceptable, a few methods have broad applicability. Here we introduce the following strategies.
• Panic Mode Recovery: On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
• Phrase-Level Recovery: On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by a string that allows the parser to continue.
• Error Productions: If we have a good idea of the common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the erroneous constructs. We then use the grammar augmented by these error productions to construct a parser.
• Global Correction: Given an incorrect string x and grammar G, a global correction algorithm will find a parse tree for a related string y, such that the number of insertions, deletions and changes of tokens required to transform x into y is as small as possible.

2.3 Context-Free Grammars

Like regular expressions, context-free grammars describe sets of strings, i.e., languages. Additionally, a context-free grammar also defines structure on the strings in the language it defines.
A language is defined over some alphabet, for example the set of tokens produced by a lexer or the set of alphanumeric characters. The symbols in the alphabet are called terminals.
A context-free grammar recursively defines several sets of strings. Each set is denoted by a name, which is called a non-terminal. The set of non-terminals is disjoint from the set of terminals. One of the non-terminals is chosen to denote the language described by the grammar; this is called the start symbol of the grammar. The sets are described by a number of productions. Each production describes some of the possible strings that are contained in the set denoted by a non-terminal. A production has the form

N → X1 ... Xn

where N is a non-terminal and X1 ... Xn are zero or more symbols, each of which is either a terminal or a non-terminal. The intended meaning of this notation is that the set denoted by N contains strings that are obtained by concatenating strings from the sets denoted by X1 ... Xn. In this setting, a terminal denotes a singleton set, just like alphabet characters in regular expressions. We will, when no confusion is likely, equate a non-terminal with the set of strings it denotes.
Some examples:

A → a

says that the set denoted by the non-terminal A contains the one-character string a.

A → aA

says that the set denoted by A contains all strings formed by putting an a in front of a string taken from the set denoted by A. Together, these two productions indicate that A contains all non-empty sequences of a's and is hence (in the absence of other productions) equivalent to the regular expression a+.
We can define a grammar equivalent to the regular expression a* by the two productions

B → ε
B → aB

A context-free grammar is a four-tuple denoted as

G = (V, T, P, S)

where
1. V is a finite set of symbols called non-terminals or variables,
2. T is a set of symbols called terminals,
3. P is a set of productions, and
4. S is a member of V, called the start symbol.

Example:
Terminals: id, +, -, *, /, (, )
Non-terminals: expr, op
Productions:
expr → expr op expr
expr → (expr)
expr → - expr
expr → id
op → + | - | * | /
The start symbol: expr

© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission 2.3.1 Notational Connections 2.4 Derivations 1. These symbols are the terminals.  The basic idea of derivation is to i. Lower –case letters early in the consider productions as rewrite rules. alphabet e.g., a, b, c etc.  Whenever we have a non terminal, we ii. Operator symbols, e.g., +, - etc. can replace this by the right-hand side iii. Punctuation symbols, e.g., of any production in which the non parentheses, comma, etc. terminal appears on the left-hand side. iv. The digits 0, 1, 2,..., 9,  We can do this anywhere in a sequence v. Bold face strings such as id or if of symbols (terminals and non terminals) 2. The symbols are the non- terminals and repeat doing so until we have only i. Upper case letters early in the terminals left. alphabet, e.g., A, B, C Etc.  The resulting sequence of terminals is a ii. The letter S, which, when it appears, string in the language defined by the is usually the start symbol. grammar. iii. Lower case italic names such as expr Formally, we define the derivation Or Stmt relation )by the three rules 3. Upper case letters in the alphabet, e.g., 1: N   x,y,z, represent grammar symbols, i.e., either non-terminals or terminals if there is a production N  4. Lower case letters late in alphabet, e.g., 2:  u,v,...,z represent string of terminals 3:  5. Lower case greek letters,  ,, , if there is a b such that  and represent strings of grammar symbols. Thus a generic production could be where , and are (possibly empty) written as A  sequences of grammar symbols 6. If A , A 2...... A K are all (terminals and non terminals). productions with A on the left, we may  The first rule states that using a write A 1| 2| ...... k the production as a rewrite rule (anywhere alternatives for A. in a sequence of grammar symbols) is a 7. Unless otherwise stated, the left side of derivation step. the first production is the start symbol.  The second states that the derivation Expr Expr Expr / Expr*Expr / id relation is reflexive, i.e., that a sequence LMD: RMD: derives itself. ExprExpr Expr Expr Expr*Expr  The third rule describes transitivity, i.e., id Expr*Expr Expr Expr*id that a sequence of derivations is in itself a derivation. id id*Expr. Expr id*id. We can use derivation to formally define id id*id the language that a context-free grammar generates: TR T aTc R  R RbR  a T c   aa T cc   aa R cc 

© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission  aaRb R cc A leftmost derivation always chooses the  leftmost non-terminal to rewrite.  aa R bcc  Example: E E  E  id  E  id  id  aaRb R bcc A rightmost derivation always chooses the  rightmost non-terminal to rewrite.  aaRb R bRbcc  Example:  aa R bbRbcc  2.4.2 Ambiguity  aabb R bcc   A grammar that produces more than  aabbbcc one parse tree for some sentences is Derivation of the string aabbbcc using said to be ambiguous. grammar  Put another way, an ambiguous  a T c grammar is one that produces more  than one leftmost or more than one  aa T cc rightmost derivation for the same  sentence.  aa R cc   For certain types of parsers, it is  aa R bRcc describe that the grammar be made  unambiguous, for if it is not, we cannot  aa R bRbRcc  uniquely determine which parse tree to  aabR bRcc select for a sentence.   For some applications we shall also  aabR bRbRcc  consider methods whereby we can use  aabb R bRcc certain ambiguous grammars, together  with disambiguating rules that “throw  aabbb R cc  away” undesirable parse trees, leaving  aabbbcc us with only one tree for each sentence. Grammar Leftmost derivation of the string Ambiguous Unambiguous aabbbcc using grammar 1. There exists more 1.Unique “LMD/RMD” In this derivation, we have applied then one “LMD/RMD” for for a given string derivation steps sometimes to the leftmost a given string nonterminal, sometimes to the rightmost 2. More than one parse 2.LMD and RMD trees for a given String represent same parse and sometimes to a non-terminal that was tree neither. However, since derivation steps 3. LMD and RMD 3.Unique parse tree are local, the order does not matter. So, we represent different Parse for a given string might as well decide to always rewrite the tree leftmost non-terminal, as shown in a derivation that always rewrites the leftmost non-terminal is called a leftmost 2.4.3 Writing a Grammar derivation. similarly, a derivation that always rewrites the rightmost nonterminal  Grammars are capable of describing Is called a rightmost derivation. most, but not all, of the syntax of programming languages. 2.4.1 Left and Right most derivations  A limited amount of syntax analysis is Each derivation step needs to choose a done by a lexical analyzer as it produces non-terminal to rewrite and a production the sequence of tokens from the input to apply. characters.

2.4.2 Ambiguity

• A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
• Put another way, an ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.
• For certain types of parsers, it is desirable that the grammar be made unambiguous, for if it is not, we cannot uniquely determine which parse tree to select for a sentence.
• For some applications we shall also consider methods whereby we can use certain ambiguous grammars, together with disambiguating rules that "throw away" undesirable parse trees, leaving us with only one tree for each sentence.

Ambiguous grammar                        Unambiguous grammar
1. There exists more than one            1. Unique LMD/RMD for a
   LMD/RMD for a given string               given string
2. More than one parse tree              2. LMD and RMD represent the
   for a given string                       same parse tree
3. LMD and RMD represent                 3. Unique parse tree for a
   different parse trees                    given string

2.4.3 Writing a Grammar

• Grammars are capable of describing most, but not all, of the syntax of programming languages.
• A limited amount of syntax analysis is done by a lexical analyzer as it produces the sequence of tokens from the input characters.
• Certain constraints on the input, such as the requirement that identifiers be declared before they are used, cannot be described by a context-free grammar.
• The sequences of tokens accepted by a parser form a superset of a programming language; subsequent phases must analyze the output of the parser to ensure compliance with rules that are not checked by the parser.

Design:
1. The rules for tokens are described by regular expressions.
2. Convert the regular expressions (R.E.) to finite automata.

2.4.4 Regular Expressions vs. Context-Free Grammars

Every construct that can be described by a regular expression can also be described by a context-free grammar. For example, the regular expression (a|b)*abb and the grammar

A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε

describe the same language: the set of strings of a's and b's ending in abb. We can mechanically convert a nondeterministic finite automaton (NFA) into a grammar that generates the same language as recognized by the NFA.

Example 2.6: Consider the grammar

S → (S)S | ε

It may not be initially apparent, but this simple grammar generates all strings of balanced parentheses, and only such strings. To see this, we shall show first that every sentence derivable from S is balanced, and then that every balanced string is derivable from S.
To show that every sentence derivable from S is balanced, we use an inductive proof on the number of steps in a derivation. For the basis step, we note that the only string of terminals derivable from S in one step is the empty string, which surely is balanced. Now assume that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of exactly n steps. Such a derivation must be of the form

S ⇒ (S)S ⇒* (x)S ⇒* (x)y

The derivations of x and y from S take fewer than n steps, so by the inductive hypothesis x and y are balanced. Therefore, the string (x)y must be balanced. We have thus shown that any string derivable from S is balanced.
We must next show that every balanced string is derivable from S. To do this, we use induction on the length of a string. For the basis step, the empty string is derivable from S. Now assume that every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1. Surely w begins with a left parenthesis. Then w can be written as (x)y where both x and y are balanced. Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis. Thus, we can find a derivation of the form

S ⇒ (S)S ⇒* (x)S ⇒* (x)y

proving that w = (x)y is also derivable from S.
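As a quick cross-check of Example 2.6, here is a recursive C routine that mirrors the grammar S → (S)S | ε directly (a sketch; it accepts exactly the balanced strings):

    #include <stdio.h>

    static const char *p;      /* current position in the input */

    /* S -> ( S ) S | epsilon */
    static int S(void) {
        if (*p == '(') {       /* try the production S -> (S)S  */
            p++;
            if (!S()) return 0;
            if (*p != ')') return 0;
            p++;
            return S();
        }
        return 1;              /* S -> epsilon                  */
    }

    int balanced(const char *s) {
        p = s;
        return S() && *p == '\0';  /* whole input must be consumed */
    }

    int main(void) {
        printf("%d\n", balanced("(()())"));  /* prints 1 */
        printf("%d\n", balanced("(()"));     /* prints 0 */
        return 0;
    }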

2.4.5 Rewriting Ambiguous Expression Grammars

If we have an ambiguous grammar

E → E ⊕ E
E → num

we can rewrite it to an unambiguous grammar that generates the correct structure. As this depends on the associativity of ⊕, we use different rewrite rules for different associativities.
If ⊕ is left-associative, we make the grammar left-recursive by having a recursive reference to the left only of the operator symbol:

E → E ⊕ E'
E → E'
E' → num

Now the expression 2 ⊕ 3 ⊕ 4 can only be parsed as (2 ⊕ 3) ⊕ 4. We get a slightly more complex syntax tree, but not enormously so.
We handle right-associativity in a similar fashion: we make the offending production right-recursive:

E → E' ⊕ E
E → E'
E' → num

Non-associative operators are handled by non-recursive productions:

E → E' ⊕ E'
E → E'
E' → num

Note that the latter transformation actually changes the language that the grammar generates, as it makes expressions of the form num ⊕ num ⊕ num illegal.
So far, we have handled only cases where an operator interacts with itself. This is easily extended to the case where several operators with the same precedence and associativity interact with each other, as for example + and -:

E → E + E'
E → E - E'
E → E'
E' → num

Operators with the same precedence must have the same associativity for this to work, as mixing left-recursive and right-recursive productions for the same non-terminal makes the grammar ambiguous. As an example, the grammar

E → E + E'
E → E' - E
E → E'
E' → num

seems like an obvious generalization of the principles used above, giving + and - the same precedence and different associativity. But not only is the grammar ambiguous, it does not even accept the intended language. For example, the string num + num - num is not derivable by this grammar.
In general, there is no obvious way to resolve ambiguity in an expression like 1+2-3, where + is left-associative and - is right-associative (or vice-versa). Hence, most programming languages (and most parser generators) require operators at the same precedence level to have identical associativity.
We also need to handle operators with different precedence. This is done by using a non-terminal for each precedence level. The idea is that if an expression uses an operator of a certain precedence level, then its sub-expressions cannot use operators of lower precedence (unless these are inside parentheses). Hence, the productions for a non-terminal corresponding to a particular precedence level refer only to non-terminals that correspond to the same or higher precedence levels, unless parentheses or similar bracketing constructs disambiguate their use. The grammar below shows how these rules are used to make an unambiguous expression grammar in which * and / bind more tightly than + and -:

Exp → Exp + Exp2
Exp → Exp - Exp2
Exp → Exp2
Exp2 → Exp2 * Exp3
Exp2 → Exp2 / Exp3
Exp2 → Exp3
Exp3 → num
Exp3 → (Exp)

Grammar: Unambiguous expression grammar
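As a quick check that this grammar builds the intended structure, here is the unique leftmost derivation of num + num * num; the * is introduced below the +, i.e. it binds more tightly:

Exp ⇒ Exp + Exp2
    ⇒ Exp2 + Exp2
    ⇒ Exp3 + Exp2
    ⇒ num + Exp2
    ⇒ num + Exp2 * Exp3
    ⇒ num + Exp3 * Exp3
    ⇒ num + num * Exp3
    ⇒ num + num * num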

2.4.6 Elimination of Left Recursion

A grammar is left recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α. Top-down parsing methods cannot handle left-recursive grammars, so a transformation that eliminates left recursion is needed.
Consider the following grammar for arithmetic expressions:

E → E + T | T
T → T * F | F
F → (E) | id

Eliminating the immediate left recursion (productions of the form A → Aα) from the productions for E and then for T, we obtain

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id

No matter how many A-productions there are, we can eliminate immediate left recursion from them by the following technique. First, we group the A-productions as

A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn

where no βi begins with an A. Then we replace the A-productions by

A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε

The non-terminal A generates the same strings as before but is no longer left recursive. This procedure eliminates all immediate left recursion from the A and A' productions (provided no αi is ε), but it does not eliminate left recursion involving derivations of two or more steps.

Example 2.7

S → Aa | b
A → Ac | Sd | ε

The non-terminal S is left recursive because S ⇒ Aa ⇒ Sda, but it is not immediately left recursive. The following algorithm systematically eliminates left recursion from any grammar. It is guaranteed to work if the grammar has no cycles (derivations of the form A ⇒+ A) or ε-productions.

Algorithm 1: Eliminating left recursion.
Input: Grammar G with no cycles or ε-productions.
Output: An equivalent grammar with no left recursion.
Method: Apply the following steps to G. Note that the resulting non-left-recursive grammar may have ε-productions.
1. Arrange the non-terminals in some order A1, A2, ..., An.
2. for i := 1 to n do begin
3.   for j := 1 to i - 1 do begin
4.     replace each production of the form Ai → Aj γ by the productions Ai → δ1 γ | δ2 γ | ... | δk γ, where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions;
5.   end
6.   eliminate the immediate left recursion among the Ai-productions
7. end

The reason the procedure works is that after the i-th iteration of the outer for loop in step (2), any production of the form Ak → Al γ, where k < i, must have l > k. As a result, on the next iteration, the inner loop (on j) progressively raises the lower limit on l in any production Ai → Al γ, until we must have l ≥ i. Then, eliminating immediate left recursion for the Ai-productions forces l to be greater than i.

Example 2.8

Let us apply this procedure to the grammar of Example 2.7. Technically, Algorithm 1 is not guaranteed to work, because of the ε-production, but in this case the production A → ε turns out to be harmless.
We order the non-terminals S, A. There is no immediate left recursion among the S-productions, so nothing happens during step 2 for the case i = 1. For i = 2, we substitute the S-productions in A → Sd to obtain the following A-productions:

A → Ac | Aad | bd | ε

Eliminating the immediate left recursion among the A-productions yields the following grammar:

S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε

2.4.7 Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. The basic idea is that when it is not clear which of two alternative productions to use to expand a non-terminal A, we may be able to rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice. For example, if we have the two productions

stmt → if expr then stmt else stmt
     | if expr then stmt

then, on seeing the input token if, we cannot immediately tell which production to choose to expand stmt. In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ1 or to αβ2. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored, the original productions become

A → αA'
A' → β1 | β2

Algorithm 2: Left factoring a grammar.
Input: Grammar G.
Output: An equivalent left-factored grammar.
Method: For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a non-trivial common prefix, replace all the A-productions

A → αβ1 | αβ2 | ... | αβn | γ

where γ represents all alternatives that do not begin with α, by

A → αA' | γ
A' → β1 | β2 | ... | βn

Here A' is a new non-terminal. Repeatedly apply this transformation until no two alternatives for a non-terminal have a common prefix.

Example 2.9: The following grammar abstracts the dangling-else problem:

S → iEtS | iEtSeS | a    ------ (1)
E → b

Here i, t, and e stand for if, then and else, and E and S stand for "expression" and "statement". Left-factored, this grammar becomes:

S → iEtSS' | a
S' → eS | ε    ------ (2)

Thus, we may expand S to iEtSS' on input i, and wait until iEtS has been seen to decide whether to expand S' to eS or to ε. Of course, grammars (1) and (2) are both ambiguous, and on input e, it will not be clear which alternative for S' should be chosen.

© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission whether to expand S’ to eS or to. Of (i.e., repeated scanning of the input ) ; course, grammars 1 and 2 both are because in the attempt to obtain the ambiguous, and on input e, it will not be leftmost derivation of the input string w, a clear which alternatives for S’ should be parse may encounter a situation in which a chosen. nonterminal A is required to be derived next, and there are multiple A- productions 2.5 Top – Down Parsing , such as A 1 |  2 ...|  n . In such a Parsing Methods situation deciding which A-productions to 1. Universal: There exist algorithms that derive A, and if this derivation finally leads can parse any context free grammar. to the derivation of w, then the parser These algorithms are too inefficient to announces the successful completion of be used anywhere. parsing. Otherwise, the parser resets the 2. Top down parsing: Build the parse input pointer to where it was when the tree from root to leaves ( using leftmost non-terminal A was derived, and it tries derivation). Examples are recursive another A-production. The parser will descent, and LL parser. continue this until it either announces the 3. Bottom up parsing: build the parse successful completion of the parsing or tree form leaves to root. reports failure after trying all or the Examples are operator precedence alternatives. parsing , LR(SLR, canonical LR and LALR ) 2.5.2 Recursive – Descent Parsing 2.5.1 Top – down parsing  It is a general form of top down parsing, that may involve , that is  Start from the start symbol and build making repeated scans of the input. the parse tree top-down.  The parser consists of a set of (possible  Apply a production to a non-terminal. recursive) procedures. The right hand side of the production  Each procedure is associated with a non will be the children of the non-terminal. terminal of the grammar that is  Match terminal symbols with the input responsible to derive based on the  May require backtracking current token.  Some grammars are backtracking free  For each non terminal in the production, (predictive) the procedure associated with the non Top down parsing attempts to find left terminal is called. most derivations for an input string w.  The sequence of matching and procedure Which is equivalent to constructing a parse calls in processing the input implicitly tree for the input string w that starts from defines a parse tree for the input. the root and creates the nodes of the parse tree in a predefined order. The reason that Example top down parsing seeks the leftmost Input card derivations for an input string w and not the right most derivations is that the input string w is scanned by the parse from left to right, one symbol/token at a time, and the leftmost derivations generate the leaves of the parse tree in left – to – right order, which matches the input scan order. S cAd Since top down parsing attempts to find the A ab leftmost derivations for an input string w. A Aa top-down parse may require backtracking

To parse cad, the parser expands S → cAd, matches c, and first tries the alternative A → ab. The b fails to match the input, so the parser backtracks to A and succeeds with the second alternative A → a; it then matches d. The transition diagram for the above grammar is as shown in the figure.

(Figure: transition diagrams for S and A.)

Recursive-descent parsing may thus involve backtracking, that is, making repeated scans of the input. However, backtracking parsers are not seen frequently; one reason is that backtracking is rarely needed to parse programming-language constructs. Note also that a left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop.

2.5.3 Predictive Parsing
 Predictive parsing is the special case of recursive-descent parsing where no backtracking is required.
 The key problem during predictive parsing is determining the production to be applied for a non-terminal.
 In many cases, by carefully writing a grammar — eliminating left recursion from it and left factoring the resulting grammar — we can obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser.

A (recursive) procedure is associated with each non-terminal of the grammar. A production is chosen by looking at the lookahead symbol (the currently scanned input token). A non-terminal in the right side of the production results in a call to the corresponding procedure for that non-terminal. We can model predictive parsing using transition diagrams.

2.5.4 Non-Recursive Predictive Parsing

The non-recursive parser in Fig. 2.10 looks up the production to be applied in a parsing table. In what follows, we shall see how the table can be constructed directly from certain grammars.

Example E E T | T T T*F| F Fig.2.10 Model of a non recursive F (E) | id predictive parser Eliminating the immediate left recursion to A table-driven predictive parser has an input buffer, a stack, a parsing table, and an the productions for E and then for T, we output stream. obtain ETE' The input buffer contains the string to be parsed, followed by $ a symbol used as a E  TE ' | right end marker to indicate the end of the T FT ' input string. T ' *FT ' | The stack contains a sequence of grammar symbols with $ on the bottom, indicating F (E) | id the bottom of the stack.

Initially, the stack contains the start symbol of the grammar on top of $. The parsing table is a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $. The parser is controlled by a program that behaves as follows. The program considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the action of the parser. There are three possibilities:
a. If X = a = $, the parser halts and announces successful completion of parsing.
b. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
c. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). As output, we shall assume that the parser just prints the production used; any other code could be executed here. If M[X, a] = error, the parser calls an error recovery routine.
The behavior of the parser can be described in terms of its configurations, which give the stack contents and the remaining input.

2.5.5 FIRST sets

The FIRST set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
i) If X is a terminal, then FIRST(X) is {X}.
ii) If X → ε is a production, then add ε to FIRST(X).
iii) If X is a nonterminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1...Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.

Example:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(E) = {(, id}
FIRST(E') = {+, ε}
FIRST(T) = {(, id}
FIRST(T') = {*, ε}
FIRST(F) = {(, id}

2.5.6 FOLLOW sets

The FOLLOW set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form; that is, if S ⇒* αAaβ, then a is in FOLLOW(A).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set:
i) Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
ii) If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
iii) If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
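These fixed-point rules translate almost directly into code. Below is a minimal Python sketch of the FIRST and FOLLOW computation (the representation — productions as (head, body) pairs with an empty tuple standing for an ε-body — and the function names are our own assumptions).

def compute_first_follow(productions, start):
    # productions: list of (head, body) pairs; body is a tuple of symbols,
    # the empty tuple standing for ε. Terminals never appear as a head.
    heads = {h for h, _ in productions}
    first = {h: set() for h in heads}
    follow = {h: set() for h in heads}

    def first_of_seq(seq):
        # FIRST of a string of symbols; 'ε' marks that the whole string
        # can derive the empty string.
        out = set()
        for sym in seq:
            if sym not in heads:               # rule (i): a terminal
                return out | {sym}
            out |= first[sym] - {'ε'}
            if 'ε' not in first[sym]:
                return out
        return out | {'ε'}                     # every symbol can vanish

    changed = True
    while changed:                             # FIRST: rules (ii) and (iii)
        changed = False
        for head, body in productions:
            add = first_of_seq(body)
            if not add <= first[head]:
                first[head] |= add
                changed = True

    follow[start].add('$')                     # FOLLOW rule (i)
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            for k, sym in enumerate(body):
                if sym not in heads:
                    continue
                rest = first_of_seq(body[k + 1:])
                add = rest - {'ε'}             # FOLLOW rule (ii)
                if 'ε' in rest:                # FOLLOW rule (iii)
                    add |= follow[head]
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True
    return first, follow

prods = [('E', ('T', "E'")), ("E'", ('+', 'T', "E'")), ("E'", ()),
         ('T', ('F', "T'")), ("T'", ('*', 'F', "T'")), ("T'", ()),
         ('F', ('(', 'E', ')')), ('F', ('id',))]
first, follow = compute_first_follow(prods, 'E')
print(first["E'"], follow['T'])    # {'+', 'ε'} and {'+', ')', '$'}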

Example: For the grammar above:
FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

Construction of the Predictive Parsing Table
Input: Grammar G.
Output: Parsing table M.
Method:
i) For each production A → α of the grammar, do steps (ii) and (iii).
ii) For each terminal a in FIRST(α), add A → α to M[A, a].
iii) If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
iv) Make each undefined entry of M be error.

A predictive parsing table for this grammar is shown in Fig. 2.11. Blanks are error entries; non-blanks indicate a production with which to expand the top non-terminal on the stack.

Non-terminal | id     | +        | *        | (      | )     | $
E            | E→TE'  |          |          | E→TE'  |       |
E'           |        | E'→+TE'  |          |        | E'→ε  | E'→ε
T            | T→FT'  |          |          | T→FT'  |       |
T'           |        | T'→ε     | T'→*FT'  |        | T'→ε  | T'→ε
F            | F→id   |          |          | F→(E)  |       |

Fig. 2.11 Parsing table M for the grammar for arithmetic expressions

With input id + id * id the predictive parser makes the sequence of moves in Fig. 2.12. The input pointer points to the leftmost unexpended symbol of the string in the Input column. If we observe the actions of this parser carefully, we see that it is tracing out a leftmost derivation for the input; that is, the productions output are those of a leftmost derivation. The input symbols already matched, followed by the grammar symbols on the stack (top to bottom), make up the left-sentential forms of that derivation.

Stack    | Input          | Move
$E       | id + id * id$  | expand using E → TE'
$E'T     | id + id * id$  | expand using T → FT'
$E'T'F   | id + id * id$  | expand using F → id
$E'T'id  | id + id * id$  | pop id, advance the input pointer
$E'T'    | + id * id$     | expand using T' → ε
$E'      | + id * id$     | expand using E' → +TE'
$E'T+    | + id * id$     | pop +, advance the input pointer
$E'T     | id * id$       | expand using T → FT'
$E'T'F   | id * id$       | expand using F → id
$E'T'id  | id * id$       | pop id, advance the input pointer
$E'T'    | * id$          | expand using T' → *FT'
$E'T'F*  | * id$          | pop *, advance the input pointer
$E'T'F   | id$            | expand using F → id
$E'T'id  | id$            | pop id, advance the input pointer
$E'T'    | $              | expand using T' → ε
$E'      | $              | expand using E' → ε
$        | $              | announce successful completion of parsing

Fig. 2.12 Moves made by the predictive parser on input id + id * id
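The moves of Fig. 2.12 come from a loop of only a few lines. Here is a minimal Python sketch of the table-driven driver (the dict-based table layout is our own assumption); it implements exactly the three possibilities (a)-(c) described above.

table = {
    ('E', 'id'): ('T', "E'"), ('E', '('): ('T', "E'"),
    ("E'", '+'): ('+', 'T', "E'"), ("E'", ')'): (), ("E'", '$'): (),
    ('T', 'id'): ('F', "T'"), ('T', '('): ('F', "T'"),
    ("T'", '+'): (), ("T'", '*'): ('*', 'F', "T'"),
    ("T'", ')'): (), ("T'", '$'): (),
    ('F', 'id'): ('id',), ('F', '('): ('(', 'E', ')'),
}
nonterminals = {'E', "E'", 'T', "T'", 'F'}

def predictive_parse(tokens, start='E'):
    tokens = tokens + ['$']
    stack = ['$', start]                      # start symbol on top of $
    i = 0
    while True:
        x, a = stack[-1], tokens[i]
        if x == a == '$':
            return True                       # case (a): accept
        if x == a:
            stack.pop(); i += 1               # case (b): match a terminal
        elif x in nonterminals and (x, a) in table:
            body = table[(x, a)]              # case (c): expand X by M[X, a]
            stack.pop()
            stack.extend(reversed(body))
            print(x, '→', ' '.join(body) or 'ε')
        else:
            return False                      # error entry

print(predictive_parse(['id', '+', 'id', '*', 'id']))   # True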

2.6 LL(1) Grammars

A grammar is an LL(1) grammar if its LL(1) parsing table has no multiply-defined entries. No ambiguous and no left-recursive grammar can be LL(1).
In the acronym LL(1), the first L stands for the left-to-right scan of the input, the second L stands for the leftmost derivation, and the (1) indicates that the next input symbol is used to decide the next parsing action (i.e., the length of the lookahead is 1). In the LL(1) parsing system, parsing is done by scanning the input from left to right, and an attempt is made to derive the input string in a leftmost order; the next input symbol decides what is to be done next in the parsing process.
For a grammar to be LL(1), the following condition must be satisfied for every pair of productions A → α | β:
FIRST(α) ∩ FIRST(β) = ∅, and
if FIRST(β) contains ε (and FIRST(α) does not), then FIRST(α) ∩ FOLLOW(A) = ∅.

Example: Find FIRST and FOLLOW for the given grammar.
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Then:
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

For example, id and the left parenthesis are added to FIRST(F) by rule (3) in the definition of FIRST, with i = 1 in each case, since FIRST(id) = {id} and FIRST( ( ) = {(} by rule (1). Then by rule (3) with i = 1, the production T → FT' implies that id and the left parenthesis are in FIRST(T) as well. As another example, ε is in FIRST(E') by rule (2).
To compute FOLLOW sets, we put $ in FOLLOW(E) by rule (1) for FOLLOW. By rule (2) applied to production F → (E), the right parenthesis is also in FOLLOW(E). By rule (3) applied to production E → TE', $ and the right parenthesis are in FOLLOW(E'). Since E' ⇒* ε, they are also in FOLLOW(T). For a last example of how the FOLLOW rules are applied, the production E → TE' implies, by rule (2), that everything other than ε in FIRST(E') must be placed in FOLLOW(T). We have already seen that $ is in FOLLOW(T).

2.6.1 Some examples of finding FIRST

Example:
S → AB
A → a | b
B → d | c
Solution:
FIRST(S) = FIRST(A) = {a, b}
FIRST(A) = {a, b}
FIRST(B) = {d, c}

Example:
S → ABc
A → a | b | ε
B → d | e | ε
Solution:
FIRST(S) = {a, b, d, e, c}
FIRST(A) = {a, b, ε}
FIRST(B) = {d, e, ε}

Example:
S → ABCD
A → d | ε
B → e | l | ε
C → f | g
D → m | ε
Solution:
FIRST(S) = FIRST(A) ∪ FIRST(B) ∪ FIRST(C) = {d, e, l, f, g}
Here m is not included, because C derives only terminals (never ε), so D is never reached.

Some examples of finding FOLLOW

1. S → ABCDE
A → a | ε
B → b | ε
C → c | ε
D → d | ε
E → e | ε
Solution:
FOLLOW(S) = {$}
FOLLOW(A) = {b, c, d, e, $}
FOLLOW(B) = {c, d, e, $}
FOLLOW(C) = {d, e, $}
FOLLOW(D) = {e, $}
FOLLOW(E) = {$}
Here FOLLOW(A) = FIRST(BCDE) together with $: since B, C, D and E can all derive ε, we keep moving right past them and finally add everything in FOLLOW(S).

2. S → ABCDE
A → a | ε
B → b | ε
C → c | ε
D → d | ε
E → e
Solution: Same as the grammar above, except that E → e has no ε-alternative. Since E can no longer vanish, the computation stops at e; for example, FOLLOW(D) = {e} rather than {e, $}.

3. S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
Solution:
FOLLOW(S) = {$}
FOLLOW(B) = FIRST(Dh) = FIRST(E) ∪ FIRST(F) ∪ {h} = {g, f, h}
(h is the terminal that comes after D, and it is reachable because both E and F can derive ε.)
FOLLOW(C): B ends with C, so FOLLOW(C) = FOLLOW(B) = {g, f, h}
FOLLOW(D) = {h}
FOLLOW(E) = FIRST(F) ∪ FOLLOW(D) = {f, h}
Important observation: h appears in FOLLOW(E) because F can derive ε, so we go to the FOLLOW of the enclosing D.
FOLLOW(F) = FOLLOW(D) = {h}

2.6.2 Construction of Predictive Parsing Tables

 The following algorithm can be used to construct a predictive parsing table for a grammar G.
 The idea behind the algorithm is the following. Suppose A → α is a production with a in FIRST(α). Then the parser will expand A by α when the current input symbol is a.
 The only complication occurs when α = ε or α ⇒* ε. In this case, we should again expand A by α if the current input symbol is in FOLLOW(A), or if the $ on the input has been reached and $ is in FOLLOW(A).

Algorithm for construction of a predictive parsing table.
Input: Grammar G.
Output: Parsing table M.
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.
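Steps 2 and 3 above are a direct loop over the productions. The sketch below (our own naming; it assumes the compute_first_follow sketch shown earlier is in scope) keeps a list per cell, so a multiply-defined entry — the symptom of a non-LL(1) grammar — is immediately visible.

def build_table(productions, first, follow):
    heads = {h for h, _ in productions}
    def first_of_seq(seq):
        out = set()
        for sym in seq:
            if sym not in heads:
                return out | {sym}
            out |= first[sym] - {'ε'}
            if 'ε' not in first[sym]:
                return out
        return out | {'ε'}
    table = {}
    for head, body in productions:
        f = first_of_seq(body)
        # step 2: every terminal in FIRST(α); step 3: FOLLOW(A) if ε ∈ FIRST(α)
        for a in (f - {'ε'}) | (follow[head] if 'ε' in f else set()):
            table.setdefault((head, a), []).append(body)
    return table

M = build_table(prods, first, follow)
print(M[('F', 'id')])                               # [('id',)], as in Fig. 2.11
print(len([c for c in M.values() if len(c) > 1]))   # 0: no multiply-defined cells

On the dangling-else grammar of Example 2.13 below, the same function puts two productions into the cell (S', e), reproducing the multiply-defined entry discussed there.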

Example: Let us apply the algorithm above to the grammar for arithmetic expressions. Since FIRST(TE') = FIRST(T) = {(, id}, production E → TE' causes M[E, (] and M[E, id] to acquire the entry E → TE'. Production E' → +TE' causes M[E', +] to acquire E' → +TE'. Production E' → ε causes M[E', )] and M[E', $] to acquire E' → ε, since FOLLOW(E') = {), $}. This yields exactly the table of Fig. 2.11.

2.6.3 LL(1) Grammars and the Parsing Table

The algorithm for constructing a predictive parsing table can be applied to any grammar G to produce a parsing table M. For some grammars, however, M may have some entries that are multiply defined. For example, if G is left-recursive or ambiguous, then M will have at least one multiply-defined entry. A grammar whose parsing table has no multiply-defined entries is said to be LL(1): the first "L" stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions.
LL(1) grammars have several distinctive properties. No ambiguous or left-recursive grammar can be LL(1). It can also be shown that a grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).

Example 2.13: Consider again the dangling-else grammar
S → iEtSS' | a
S' → eS | ε
E → b
The parsing table for this grammar is shown in Fig. 2.13.

Non-terminal | a    | b    | e            | i         | t | $
S            | S→a  |      |              | S→iEtSS'  |   |
S'           |      |      | S'→eS, S'→ε  |           |   | S'→ε
E            |      | E→b  |              |           |   |

Fig. 2.13 Parsing table M for the dangling-else grammar

The entry for M[S', e] contains both S' → eS and S' → ε, since FOLLOW(S') = {e, $}. The grammar is ambiguous, and the ambiguity is manifested by a choice in what production to use when an e (else) is seen. We can resolve the ambiguity if we choose S' → eS. This choice corresponds to associating an else with the closest previous then. Note that the choice S' → ε would prevent e from ever being put on the stack or removed from the input, and is therefore surely wrong.

The main difficulty in using predictive parsing is in writing a grammar for the source language such that a predictive parser can be constructed from the grammar. Although left-recursion elimination and left factoring are easy to do, they make the resulting grammar hard to read and difficult to use for translation purposes. To alleviate some of this difficulty, a common organization for a parser in a compiler is to use a predictive parser for control constructs and to use operator precedence for expressions. However, if an LR parser generator is available, one can get all the benefits of predictive parsing and operator precedence automatically.

Panic-mode error recovery adds, to the blank entries of the parsing table, synchronizing ("synch") entries obtained from FOLLOW sets: for each nonterminal A, the terminals in FOLLOW(A) are marked synch in A's row, so that on an error the parser can pop A and resume. Fig. 2.14 shows the table of Fig. 2.11 with these synchronizing tokens added.

Non-terminal | id     | +        | *        | (      | )      | $
E            | E→TE'  |          |          | E→TE'  | synch  | synch
E'           |        | E'→+TE'  |          |        | E'→ε   | E'→ε
T            | T→FT'  | synch    |          | T→FT'  | synch  | synch
T'           |        | T'→ε     | T'→*FT'  |        | T'→ε   | T'→ε
F            | F→id   | synch    | synch    | F→(E)  | synch  | synch

Fig. 2.14 Synchronizing tokens added to the parsing table

On the erroneous input ) id * + id, the parser and error-recovery mechanism of Fig. 2.14 behave as in Fig. 2.15.

STACK    | INPUT         | REMARK
$E       | ) id * + id$  | error, skip )
$E       | id * + id$    | id is in FIRST(E)
$E'T     | id * + id$    |
$E'T'F   | id * + id$    |
$E'T'id  | id * + id$    |
$E'T'    | * + id$       |
$E'T'F*  | * + id$       |
$E'T'F   | + id$         | error, M[F, +] = synch
$E'T'    | + id$         | F has been popped
$E'      | + id$         |
$E'T+    | + id$         |
$E'T     | id$           |
$E'T'F   | id$           |
$E'T'id  | id$           |
$E'T'    | $             |
$E'      | $             |
$        | $             |

Fig. 2.15 Parsing and error recovery moves made by the predictive parser

The above discussion of panic-mode recovery does not address the important issue of error messages. In general, informative error messages have to be supplied by the compiler designer.

2.6.4 Phrase-level recovery
 Phrase-level recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines.
 These routines may change, insert, or delete symbols on the input and issue appropriate error messages.
 They may also pop from the stack. It is questionable whether we should permit alteration of stack symbols or the pushing of new symbols onto the stack, since then the steps carried out by the parser might not correspond to the derivation of any word in the language at all.
 In any event, we must be sure that there is no possibility of an infinite loop.
 Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.
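The synch entries of Fig. 2.14 can be generated mechanically from the FOLLOW sets, as the following sketch shows (our own naming, reusing table, follow and nonterminals from the earlier sketches; cells left blank remain "skip the input token" errors).

def add_synch_entries(table, follow, nonterminals):
    # For each nonterminal A, every terminal in FOLLOW(A) whose table cell
    # is blank becomes a synch entry, exactly as in Fig. 2.14.
    synch = {}
    for A in nonterminals:
        for a in follow[A] | {'$'}:
            if (A, a) not in table:
                synch[(A, a)] = 'synch'
    return synch

print(sorted(add_synch_entries(table, follow, nonterminals)))
# [('E', '$'), ('E', ')'), ('F', '$'), ('F', ')'), ('F', '*'), ('F', '+'),
#  ('T', '$'), ('T', ')'), ('T', '+')] — the synch cells of Fig. 2.14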

2.6.5 Bottom-Up Parsing
 Start from the input sequence of tokens.
 Apply a production to the sentential form and rewrite it to a new one.
 Keep track of the current sentential form.
Considering the input string from left to right, a search is done for a (leftmost) substring of x that equals the right side of some production. Such a substring is called a handle.
Conversely, the string abbcde can be derived as follows:
S ⇒ aAcBe   since S → aAcBe
  ⇒ aAcde   since B → d
  ⇒ aAbcde  since A → Ab
  ⇒ abbcde  since A → b

Example: Find the parse (reduction sequence) for the string abbcde, considering the productions
S → aAcBe
A → Ab
A → b
B → d
Solution: Consider leftmost substrings of the given string x = abbcde, and find a production whose right side is a (leftmost) substring of x. Clearly A → b qualifies first:
abbcde  reduces to  aAbcde  by A → b
aAbcde  reduces to  aAcde   by A → Ab
aAcde   reduces to  aAcBe   by B → d
aAcBe   reduces to  S       by S → aAcBe

2.6.6 Shift-Reduce Parsing
 It is a general style of bottom-up syntax analysis.
 An easy-to-implement form of shift-reduce parsing is called operator-precedence parsing; a more general method of shift-reduce parsing is called LR parsing.
 Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
 At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production, and if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Consider the grammar
E → E + E | E * E | id
and the rightmost derivation
E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id
The handles of the sentential forms occurring in this derivation are shown in the following table.

Sentential Form | Handle
id + id * id    | E → id, at the position preceding +
E + id * id     | E → id, at the position following +
E + E * id      | E → id, at the position following *
E + E * E       | E → E * E, at the position following +
E + E           | E → E + E, at the position preceding the endmarker

Fig. 2.16 Reductions made by a shift-reduce parser

Therefore, bottom-up parsing is an attempt to detect the handle of a right-sentential form; whenever a handle is detected, the reduction is performed. This is equivalent to performing a rightmost derivation in reverse, and is called "handle pruning".

2.6.7 Stack Implementation of Shift-Reduce Parsing

A parser goes on shifting input symbols onto the stack until a handle appears on top of the stack; when a handle appears on the top of the stack, it performs a reduction. This implementation makes use of a stack to hold grammar symbols and an input buffer to hold the string w to be parsed, terminated by the right endmarker $, the same symbol used to mark the bottom of the stack. The stack is initially empty, and the buffer contains the entire string w. The parser shifts zero or more symbols from the input onto the stack until a handle β appears on the top of the stack. The parser then reduces β to the left side of the appropriate production. This cycle is repeated until the input is empty, giving the configuration ($S, $). If the parser enters ($S, $), it announces successful completion of parsing. Thus, the primary operations of the parser are shift and reduce.
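The reduction sequence of the example above can be reproduced by a naive "handle pruning" loop over the whole sentential form. The sketch below is purely illustrative (a real parser locates handles with a stack, as described in 2.6.7 and traced in Fig. 2.17 below); ties are broken toward the leftmost and then the longest match.

def reduce_to_start(w, productions, start='S'):
    form = w
    while form != start:
        best = None                     # (position, -length, head, body)
        for head, body in productions:
            pos = form.find(body)
            if pos != -1 and (best is None or (pos, -len(body)) < best[:2]):
                best = (pos, -len(body), head, body)
        if best is None:
            return None                 # no right side occurs: error
        pos, _, head, body = best
        new_form = form[:pos] + head + form[pos + len(body):]
        print(form, 'reduces to', new_form, 'by', head, '→', body)
        form = new_form
    return form

prods_bu = [('S', 'aAcBe'), ('A', 'Ab'), ('A', 'b'), ('B', 'd')]
print(reduce_to_start('abbcde', prods_bu))   # S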

2.6.8 Stack Operations
 Shift: shift the next input symbol onto the top of the stack.
 Reduce: replace the handle at the top of the stack with the corresponding nonterminal.
 Accept: announce successful completion of parsing.
 Error: call an error recovery routine.

Example: Consider the grammar
E → E + T | T
T → T * F | F
F → (E) | id
and take the input string id + id * id. The various steps in parsing this string are shown in the following table, in terms of the contents of the stack and the unspent input.

Stack        | Input          | Move
$            | id + id * id$  | shift id
$id          | + id * id$     | reduce by F → id
$F           | + id * id$     | reduce by T → F
$T           | + id * id$     | reduce by E → T
$E           | + id * id$     | shift +
$E +         | id * id$       | shift id
$E + id      | * id$          | reduce by F → id
$E + F       | * id$          | reduce by T → F
$E + T       | * id$          | shift *
$E + T *     | id$            | shift id
$E + T * id  | $              | reduce by F → id
$E + T * F   | $              | reduce by T → T * F
$E + T       | $              | reduce by E → E + T
$E           | $              | accept

Fig. 2.17 Configurations of the shift-reduce parser on input id + id * id

2.6.9 Viable Prefixes
 The prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
 An equivalent definition of a viable prefix is that it is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form.

2.6.10 Conflicts During Shift-Reduce Parsing
 The stack would otherwise have to be scanned to see whether a handle appears on it; that is why we use a state to uniquely identify a part of a handle (a viable prefix), so that stack scanning becomes unnecessary.
 For some grammars, every shift-reduce parser can reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict). Such grammars are not in the LR(k) class of grammars; we refer to them as non-LR grammars.

2.6.11 Shift/Reduce Conflict

An ambiguous grammar can never be LR. Consider the following grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
If we have a shift-reduce parser in configuration

STACK                       INPUT
...... if expr then stmt    else ...... $

we cannot tell whether if expr then stmt is the handle, no matter what appears below it on the stack. Here is a shift/reduce conflict. Depending on what follows the else on the input, it might be correct to reduce if expr then stmt to stmt, or it might be correct to shift else and then to look for another stmt to complete the alternative if expr then stmt else stmt. Thus we cannot tell whether to shift or reduce in this case, so the grammar is not LR(1).

2.6.12 Reduce/Reduce Conflict

Consider the grammar
1. stmt → id (parameter_list)
2. stmt → expr := expr
3. parameter_list → parameter_list , parameter
4. parameter_list → parameter
5. parameter → id
6. expr → id (expr_list)
7. expr → id
8. expr_list → expr_list , expr
9. expr_list → expr
A statement beginning with A(I,J) would appear as the token stream id(id, id) to the parser. After shifting the first three tokens onto the stack, a shift-reduce parser would be in configuration

STACK            INPUT
...... id ( id   , id ) ......

It is evident that the id on top of the stack must be reduced, but by which production? The correct choice is production (5) if A is a procedure and production (7) if A is an array. The stack does not tell; the information in the symbol table obtained from the declaration of A has to be used. Hence there is a reduce/reduce conflict here.

2.6.13 Operator-Precedence Parsing

It is a form of shift-reduce parsing which is easy to implement. The grammars it handles have the property (among other essential requirements) that no production right side is ε or has two adjacent non-terminals. A grammar with these properties is called an operator grammar.
In operator-precedence parsing, we define three disjoint precedence relations, <·, ≐ and ·>, between certain pairs of terminals. These precedence relations guide the selection of handles and have the following meanings:

Relation | Meaning
a <· b   | a "yields precedence to" b
a ≐ b    | a "has the same precedence as" b
a ·> b   | a "takes precedence over" b

Fig. 2.18

The common way of determining what precedence relations should hold between a pair of terminals is from the notions of associativity and precedence of operators. For example, if * is to have higher precedence than +, we make + <· * and * ·> +. Consider the following operator grammar:
E → E + E | E * E | (E) | id
with the assumptions:
1. * is of highest precedence and left-associative,
2. + is of lowest precedence and left-associative.
The operator-precedence relations for this grammar are shown in the following table.

   | +  | *  | (  | )  | id | $
+  | ·> | <· | <· | ·> | <· | ·>
*  | ·> | ·> | <· | ·> | <· | ·>
(  | <· | <· | <· | ≐  | <· |
)  | ·> | ·> |    | ·> |    | ·>
id | ·> | ·> |    | ·> |    | ·>
$  | <· | <· | <· |    | <· |

2.6.14 Using Operator Precedence Relations
 The intention of the precedence relations is to delimit the handle of a right-sentential form, with <· marking the left end, ≐ appearing in the interior of the handle, and ·> marking the right end.
 To be more precise, suppose we have a right-sentential form of an operator grammar. The fact that no adjacent nonterminals appear on the right sides of productions implies that no right-sentential form will have two adjacent nonterminals either.
 Thus, we may write the right-sentential form as β0 a1 β1 a2 β2 ... an βn, where each βi is either ε (the empty string) or a single nonterminal, and each ai is a terminal.
 Further, let us use $ to mark each end of the string, and define $ <· b and b ·> $ for all terminals b.
 Now suppose we remove the nonterminals from the string and place the correct relation <·, ≐ or ·> between each pair of terminals and between the endmost terminals and the $'s marking the ends of the string.

For example, suppose we initially have the right-sentential form id + id * id and the precedence relations are those given in Fig. 2.19. These relations are some of those that we would choose to parse according to the grammar above.

   | id | +  | *  | $
id |    | ·> | ·> | ·>
+  | <· | ·> | <· | ·>
*  | <· | ·> | ·> | ·>
$  | <· | <· | <· |

Fig. 2.19 Operator-precedence relations

Then the string with the precedence relations inserted is
$ <· id ·> + <· id ·> * <· id ·> $
For example, <· is inserted between the leftmost $ and id since <· is the entry in row $ and column id. The handle can be found by the following process:
1. Scan the string from the left end until the first ·> is encountered. In the string above, this occurs between the first id and +.
2. Then scan backwards (to the left) over any ≐'s until a <· is encountered. Here, we scan backwards to $.
3. The handle contains everything to the left of the first ·> and to the right of the <· encountered in step (2), including any intervening or surrounding nonterminals. (The inclusion of surrounding nonterminals is necessary so that two adjacent nonterminals do not appear in a right-sentential form.) Here, the handle is the first id.

If we are dealing with the grammar for arithmetic expressions, we then reduce id to E. At this point we have the right-sentential form E+id*id. After reducing the two remaining id's to E by the same steps, we obtain the right-sentential form E+E*E.
Consider now the string $+*$ obtained by deleting the nonterminals. Inserting the precedence relations, we get
$ <· + <· * ·> $
indicating that the left end of the handle lies between + and * and the right end between * and $. These precedence relations indicate that, in the right-sentential form E+E*E, the handle is E*E. Note how the E's surrounding the * become part of the handle.
Since the nonterminals do not influence the parse, we need not worry about distinguishing among them. A single marker "nonterminal" can be kept on the stack of a shift-reduce parser to indicate placeholders for attribute values.
It may appear from the discussion above that the entire right-sentential form must be scanned at each step to find the handle. Such is not the case if we use a stack to store the input symbols already seen and use the precedence relations to guide the actions of a shift-reduce parser. If the precedence relation <· or ≐ holds between the topmost terminal symbol on the stack and the next input symbol, the parser shifts; it has not yet found the right end of the handle. If the relation ·> holds, a reduction is called for; at this point the parser has found the right end of the handle, and the precedence relations can be used to find the left end of the handle in the stack.
If no precedence relation holds between a pair of terminals (indicated by a blank entry in Fig. 2.19), then a syntactic error has been detected and an error recovery routine must be invoked, as discussed later in this section. The above ideas can be formalized by the following algorithm.

2.6.15 Operator-precedence parsing algorithm

Input: An input string w and a table of precedence relations.
Output: If w is well formed, a skeletal parse tree, with a placeholder nonterminal E labeling all interior nodes; otherwise, an error indication.
Method: Initially, the stack contains $ and the input buffer contains the string w$. To parse, we execute the program in Fig. 2.20.

1) set ip to point to the first symbol of w$;
2) repeat forever
3)   if $ is on top of the stack and ip points to $ then
4)     return
     else begin
5)     let a be the topmost terminal symbol on the stack, and let b be the symbol pointed to by ip;
6)     if a <· b or a ≐ b then begin    /* shift */
7)       push b onto the stack;
8)       advance ip to the next input symbol;
       end
9)     else if a ·> b then    /* reduce */
10)      repeat
11)        pop the stack
12)      until the top stack terminal is related by <· to the terminal most recently popped
13)    else error()
     end

Fig. 2.20 Operator-precedence parsing algorithm

2.6.16 LL(1) Grammar (Validation)

1. For A → α1 | α2 | ... | αn (where the αi are the alternatives), the sets FIRST(α1), FIRST(α2), ..., FIRST(αn) must be pairwise disjoint.
2. For A → α | β where β ⇒* ε, FIRST(α) ∩ FOLLOW(A) = ∅ must hold.
3. The given grammar should not be left recursive.
If the above three conditions are satisfied, then the given grammar is LL(1).

Important observation: If the given grammar is ambiguous, it is not LR(0), SLR(1), LALR(1) or CLR(1) — and not LL(1) either.

Observation: For a given grammar, if the LL(1) parsing table can be constructed without any multiple entries, then the grammar is LL(1).

Example: Check whether the given grammar is LL(1).
M → Mzc | h
Solution: The given grammar is not LL(1), because it is a left-recursive grammar.

Example: Check whether the given grammar is LL(1).
S → aAa | ε
A → abS | c
Solution:
First method of checking: since S → aAa | ε has an ε-alternative, the grammar can be LL(1) only if FIRST(aAa) ∩ FOLLOW(S) = ∅. Here
FIRST(aAa) = {a} and FOLLOW(S) = {$, a}
($ because S is the start symbol, and a because A → abS puts FOLLOW(A) = {a} into FOLLOW(S)), so
FIRST(aAa) ∩ FOLLOW(S) = {a} ≠ ∅
It is not LL(1).
Second method of checking: check for multiple entries in the parsing table. The cell M[S, a] receives both S → aAa (since a ∈ FIRST(aAa)) and S → ε (since a ∈ FOLLOW(S)). Because of the multiple entries it is not LL(1).
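The three validation conditions can be checked mechanically once FIRST and FOLLOW are known. The sketch below is our own illustration and reuses the compute_first_follow sketch from the predictive-parsing discussion; it returns False exactly when condition 1 or 2 fails (for typical grammars, left recursion shows up as a violation of condition 1).

def is_ll1(productions, start):
    first, follow = compute_first_follow(productions, start)
    heads = {h for h, _ in productions}
    def first_of(seq):
        out = set()
        for s in seq:
            if s not in heads:
                return out | {s}
            out |= first[s] - {'ε'}
            if 'ε' not in first[s]:
                return out
        return out | {'ε'}
    by_head = {}
    for h, b in productions:
        by_head.setdefault(h, []).append(b)
    for A, alts in by_head.items():
        firsts = [first_of(b) for b in alts]
        if sum('ε' in f for f in firsts) > 1:
            return False                      # two alternatives derive ε
        for i in range(len(alts)):
            for j in range(len(alts)):
                if i == j:
                    continue
                if i < j and (firsts[i] - {'ε'}) & (firsts[j] - {'ε'}):
                    return False              # condition 1 fails
                if 'ε' in firsts[j] and (firsts[i] - {'ε'}) & follow[A]:
                    return False              # condition 2 fails
    return True

# S → aAa | ε,  A → abS | c: FIRST(aAa) ∩ FOLLOW(S) = {a}
prods_s = [('S', ('a', 'A', 'a')), ('S', ()),
           ('A', ('a', 'b', 'S')), ('A', ('c',))]
print(is_ll1(prods_s, 'S'))     # False, as found by both methods above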

2.6.17 Operator-Precedence Relations from Associativity and Precedence
 Operator-precedence parsers usually do not store the precedence table as explicit relationships; rather, the table is implemented in a special way.
 Operator-precedence parsers can use precedence functions that map terminal symbols to integers, so that the precedence relations between symbols are implemented by numerical comparison (Section 2.6.18).
 Not every table of precedence relations has precedence functions, but in practice, for most grammars, such functions can be designed.

The relations themselves are chosen as follows:
1. If operator θ1 has higher precedence than operator θ2, make θ1 ·> θ2 and θ2 <· θ1. For example, if * has higher precedence than +, make * ·> + and + <· *. These relations ensure that, in an expression of the form E+E*E+E, the central E*E is the handle that will be reduced first.
2. If θ1 and θ2 are operators of equal precedence (they may in fact be the same operator), then make θ1 ·> θ2 and θ2 ·> θ1 if the operators are left-associative, or make θ1 <· θ2 and θ2 <· θ1 if they are right-associative. For example, if + and - are left-associative, then make + ·> +, + ·> -, - ·> - and - ·> +. If ↑ is right-associative, then make ↑ <· ↑. These relations ensure that E-E+E will have the handle E-E selected and that E↑E↑E will have the last E↑E selected.
3. Make θ <· id, id ·> θ, θ <· (, ( <· θ, ) ·> θ, θ ·> $, and $ <· θ for all operators θ. Also let
( ≐ )     $ <· (     $ <· id
( <· (    id ·> $    ) ·> $
( <· id   id ·> )    ) ·> )
These rules ensure that both id and (E) will be reduced to E. Also, $ serves as both the left and right endmarker, causing handles to be found between $'s wherever possible.

Example 2.17: Figure 2.21 contains the operator-precedence relations for the grammar of arithmetic expressions, assuming
1. ↑ is of highest precedence and right-associative,
2. * and / are of next highest precedence and left-associative, and
3. + and - are of lowest precedence and left-associative.
(Blanks denote error entries.) The reader should try out the table to see that it works correctly, ignoring problems with unary minus for the moment. Try the table on the input id * (id + id) - id / id, for example.

   | +  | -  | *  | /  | ↑  | id | (  | )  | $
+  | ·> | ·> | <· | <· | <· | <· | <· | ·> | ·>
-  | ·> | ·> | <· | <· | <· | <· | <· | ·> | ·>
*  | ·> | ·> | ·> | ·> | <· | <· | <· | ·> | ·>
/  | ·> | ·> | ·> | ·> | <· | <· | <· | ·> | ·>
↑  | ·> | ·> | ·> | ·> | <· | <· | <· | ·> | ·>
id | ·> | ·> | ·> | ·> | ·> |    |    | ·> | ·>
(  | <· | <· | <· | <· | <· | <· | <· | ≐  |
)  | ·> | ·> | ·> | ·> | ·> |    |    | ·> | ·>
$  | <· | <· | <· | <· | <· | <· | <· |    |

Fig. 2.21 Operator-precedence relations

2.6.18 Precedence Functions
 Compilers using operator-precedence parsers need not store the table of precedence relations.
 In most cases, the table can be encoded by two precedence functions f and g that map terminal symbols to integers. We attempt to select f and g so that, for symbols a and b,
1. f(a) < g(b) whenever a <· b,
2. f(a) = g(b) whenever a ≐ b, and
3. f(a) > g(b) whenever a ·> b.
Thus the precedence relation between a and b can be determined by a numerical comparison between f(a) and g(b). Note, however, that the error entries in the precedence matrix are obscured, since one of (1), (2) or (3) holds no matter what f(a) and g(b) are. The loss of error-detection capability is generally not considered serious enough to prevent the usage of precedence functions; possible errors can still be caught when a reduction is called for and no handle can be found.
Not every table of precedence relations has precedence functions to encode it, but in practical cases the functions usually exist.

Example 2.18: The precedence table of Fig. 2.21 has the following pair of precedence functions:

   | +  | -  | *  | /  | ↑  | (  | )  | id | $
f  | 2  | 2  | 4  | 4  | 4  | 0  | 6  | 6  | 0
g  | 1  | 1  | 3  | 3  | 5  | 5  | 0  | 5  | 0

Fig. 2.22

For example, * <· id, and f(*) < g(id). Note that f(id) > g(id) suggests that id ·> id; but, in fact, no precedence relation holds between id and id. Other error entries in Fig. 2.21 are similarly replaced by one or another precedence relation.
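With the functions of Fig. 2.22, the driver of Fig. 2.20 needs nothing but integer comparisons. The sketch below is our own illustration; 'E' is the single placeholder nonterminal left on the stack by every reduction, as discussed in 2.6.14, and the sketch assumes a well-formed expression.

f = {'+': 2, '-': 2, '*': 4, '/': 4, '↑': 4, '(': 0, ')': 6, 'id': 6, '$': 0}
g = {'+': 1, '-': 1, '*': 3, '/': 3, '↑': 5, '(': 5, ')': 0, 'id': 5, '$': 0}

def op_precedence_parse(tokens):
    stack = ['$']                 # terminals plus 'E' placeholder markers
    tokens = tokens + ['$']
    i = 0
    while True:
        a = next(s for s in reversed(stack) if s != 'E')   # topmost terminal
        b = tokens[i]
        if a == '$' and b == '$':
            return True                                    # lines (3)-(4)
        if f[a] <= g[b]:                                   # a <· b or a ≐ b
            stack.append(b); i += 1                        # shift, lines (6)-(8)
        else:                                              # a ·> b: reduce
            while True:                                    # lines (10)-(12)
                top = stack.pop()
                if top == 'E':
                    continue
                below = next(s for s in reversed(stack) if s != 'E')
                if f[below] < g[top]:
                    break
            if stack[-1] == 'E':      # a nonterminal belongs to the handle
                stack.pop()
            stack.append('E')

print(op_precedence_parse(['id', '*', '(', 'id', '+', 'id', ')']))   # True

Because f and g obscure the blank (error) entries of Fig. 2.21, this version can detect errors only when a reduction finds no handle, as noted above.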

A simple method for finding precedence functions for a table, if such functions exist, is the following.

Algorithm 4. Constructing precedence functions.

Input: An operator-precedence matrix.
Output: Precedence functions representing the input matrix, or an indication that none exist.
Method:
1. Create symbols fa and ga for each a that is a terminal or $.
2. Partition the created symbols into as many groups as possible, in such a way that if a ≐ b, then fa and gb are in the same group. Note that we may have to put symbols in the same group even if they are not related by ≐. For example, if a ≐ b and c ≐ b, then fa and fc must be in the same group, since they are both in the same group as gb. If, in addition, c ≐ d, then fa and gd are in the same group even though a ≐ d may not hold.
3. Create a directed graph whose nodes are the groups found in (2). For any a and b, if a <· b, place an edge from the group of gb to the group of fa. If a ·> b, place an edge from the group of fa to that of gb. Note that an edge or path from fa to gb means that f(a) must exceed g(b); a path from gb to fa means that g(b) must exceed f(a).
4. If the graph constructed in (3) has a cycle, then no precedence functions exist. If there are no cycles, let f(a) be the length of the longest path beginning at the group of fa, and let g(a) be the length of the longest path beginning at the group of ga.

For the precedence matrix of Fig. 2.19, this construction gives the graph of Fig. 2.23.

Fig. 2.23 Graph representing precedence functions

There are no cycles, so precedence functions exist. As f$ and g$ have no out-edges, f($) = g($) = 0. The longest path from g+ has length 1, so g(+) = 1. There is a path from gid to f* to g* to f+ to g+ to f$, so g(id) = 5. The resulting precedence functions are:

   | +  | *  | id | $
f  | 2  | 4  | 4  | 0
g  | 1  | 3  | 5  | 0

Fig. 2.24
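Algorithm 4 is short enough to sketch directly. The code below (our own naming) groups the fa/gb symbols joined by ≐, builds the graph of step (3), and takes longest path lengths, reproducing Fig. 2.24 for the matrix of Fig. 2.19.

def precedence_functions(rel, terminals):
    # rel maps (a, b) to '<', '=' or '>'. Returns (f, g), or None on a cycle.
    parent = {}
    def find(x):
        while parent.setdefault(x, x) != x:
            x = parent[x]
        return x
    def union(x, y):
        parent[find(x)] = find(y)
    for (a, b), r in rel.items():            # steps (1)-(2): group on ≐
        if r == '=':
            union(('f', a), ('g', b))
    edges = {}
    for (a, b), r in rel.items():            # step (3): edges between groups
        if r == '<':
            edges.setdefault(find(('g', b)), set()).add(find(('f', a)))
        elif r == '>':
            edges.setdefault(find(('f', a)), set()).add(find(('g', b)))
    memo, on_path = {}, set()
    def longest(u):                          # step (4): longest path length
        if u in on_path:
            raise ValueError('cycle')        # no precedence functions exist
        if u not in memo:
            on_path.add(u)
            memo[u] = 1 + max((longest(v) for v in edges.get(u, ())),
                              default=-1)
            on_path.discard(u)
        return memo[u]
    try:
        return ({a: longest(find(('f', a))) for a in terminals},
                {a: longest(find(('g', a))) for a in terminals})
    except ValueError:
        return None

rel = {('id', '+'): '>', ('id', '*'): '>', ('id', '$'): '>',
       ('+', 'id'): '<', ('+', '+'): '>', ('+', '*'): '<', ('+', '$'): '>',
       ('*', 'id'): '<', ('*', '+'): '>', ('*', '*'): '>', ('*', '$'): '>',
       ('$', 'id'): '<', ('$', '+'): '<', ('$', '*'): '<'}
print(precedence_functions(rel, ['+', '*', 'id', '$']))
# ({'+': 2, '*': 4, 'id': 4, '$': 0}, {'+': 1, '*': 3, 'id': 5, '$': 0})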

2.6.19 Error Recovery in Operator-Precedence Parsing

There are two points in the parsing process at which an operator-precedence parser can discover syntactic errors:
1. If no precedence relation holds between the terminal on top of the stack and the current input.
2. If a handle has been found, but there is no production with this handle as a right side.

2.6.20 Handling Errors during Reductions
 In general, the difficulty of determining appropriate diagnostics when no legal right side is found depends upon whether there are a finite or infinite number of possible strings that could be popped in lines (10)-(12) of Fig. 2.20.
 Any such string b1 b2 ... bk must have ≐ relations holding between adjacent symbols, so a <· b1 ≐ b2 ≐ ... ≐ bk for some symbol a.
 If the operator-precedence table tells us that there are only a finite number of sequences of terminals related by ≐, then we can handle these strings on a case-by-case basis.
 For each such string x we can determine in advance a minimum-distance legal right side y and issue a diagnostic implying that x was found when y was intended.
 It is easy to determine all strings that could be popped from the stack in steps (10)-(12) of Fig. 2.20. These are evident in the directed graph whose nodes represent the terminals, with an edge from a to b if and only if a ≐ b. The possible strings are the labels of the nodes along paths in this graph; paths consisting of a single node are possible. However, in order for a path b1 b2 ... bk to be "poppable" on some input, there must be a symbol a (possibly $) such that a <· b1; call such a b1 initial. Also, there must be a symbol c (possibly $) such that bk ·> c; call such a bk final. Only then could a reduction be called for, with b1 b2 ... bk the sequence of symbols popped. If the graph has a path from an initial to a final node containing a cycle, then there are an infinity of strings that might be popped; otherwise, there are only a finite number.

Example 2.19: Let us reconsider the grammar for arithmetic expressions:
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
The precedence matrix for this grammar was shown in Fig. 2.21, and its graph is given in Fig. 2.25. The graph has only one edge, because the only pair related by ≐ is the left and right parenthesis. All but the right parenthesis are initial, and all but the left parenthesis are final. Thus the paths from an initial to a final node are the paths +, -, *, /, ↑ and id of length one, and the path from ( to ) of length two. There are but a finite number, and each corresponds to the terminals of some production's right side in the grammar. Thus the error checker for reductions needs to check that the proper set of nonterminal markers appears among the terminal strings being reduced. Specifically, the checker does the following:
1. If +, -, *, / or ↑ is reduced, it checks that nonterminals appear on both sides. If not, it issues the diagnostic: missing operand.
2. If id is reduced, it checks that there is no nonterminal to the right or left. If there is, it can warn: missing operator.
3. If ( ) is reduced, it checks that there is a nonterminal between the parentheses. If not, it can say: no expression between parentheses. Also it must check that no nonterminal appears on either side of the parentheses. If one does, it issues the same diagnostic as in (2).

Fig. 2.25 Graph for the precedence matrix of Fig. 2.21

If there are an infinity of strings that may be popped, error messages cannot be tabulated on a case-by-case basis. We might use a general routine to determine whether some production right side is close to the popped string (say distance 1 or 2, where distance is measured in terms of tokens, rather than characters, inserted, deleted or changed), and if so, issue a specific diagnostic on the assumption that that production was intended.

If no production is close to the popped string, we can issue a general diagnostic to the effect that "something is wrong in the current line".

2.6.21 Handling Shift/Reduce Errors

We must now discuss the other way in which an operator-precedence parser detects errors. When consulting the precedence matrix to decide whether to shift or reduce (lines 8 and 9 of Fig. 2.20), we may find that no relation holds between the top stack symbol and the first input symbol. For example, suppose a and b are the two top stack symbols (b at the top), c and d are the next two input symbols, and there is no precedence relation between b and c. To recover, we must modify the stack, the input, or both: we may change symbols, insert symbols onto the input or stack, or delete symbols from the input or stack. If we insert or change, we must be careful that we do not get into an infinite loop where, for example, we perpetually insert symbols at the beginning of the input without being able to reduce or to shift any of the inserted symbols.
One approach that assures us no infinite loop is to guarantee that after recovery the current input symbol can be shifted (or, if the current input is $, to guarantee that no symbol is placed on the input and that the stack is eventually shortened). For example, given ab on the stack and cd on the input, if a ≤ c we might pop b from the stack. Another choice is to delete c from the input if b ≤ d. A third choice is to find a symbol e such that b ≤ e ≤ c and insert e in front of c on the input. More generally, we might insert a string of symbols such that b ≤ e1 ≤ e2 ≤ ... ≤ en ≤ c if a single symbol for insertion cannot be found. (Here a ≤ b stands for a <· b or a ≐ b.) The exact action chosen should reflect the compiler designer's intuition regarding what error is likely in each case.
For each blank entry in the precedence matrix we must specify an error-recovery routine; the same routine could be used in several places. Then, when the parser consults the entry for a and b in step (6) of Fig. 2.20 and no precedence relation holds between a and b, it finds a pointer to the error-recovery routine for this error.

Example 2.20: Consider the precedence matrix of Fig. 2.21 again. In Fig. 2.26, we show the rows and columns of this matrix that have one or more blank entries, and we have filled in these blanks with the names of error-handling routines.

   | id | (  | )  | $
id | e3 | e3 | ·> | ·>
(  | <· | <· | ≐  | e4
)  | e3 | e3 | ·> | ·>
$  | <· | <· | e2 | e1

Fig. 2.26 Operator-precedence matrix with error entries

The substance of these error-handling routines is as follows:
e1: /* called when an entire expression is missing */
    insert id onto the input;
    issue diagnostic: "missing operand"
e2: /* called when an expression begins with a right parenthesis */
    delete ) from the input;
    issue diagnostic: "unbalanced right parenthesis"
e3: /* called when id or ) is followed by id or ( */
    insert + onto the input;
    issue diagnostic: "missing operator"
e4: /* called when an expression ends with a left parenthesis */
    pop ( from the stack;
    issue diagnostic: "missing right parenthesis"

Let us consider how this error-handling mechanism would treat the erroneous input id + ). The initial actions taken by the parser are to shift id, reduce it to E (we again use E for an anonymous nonterminal on the stack), and then to shift the +. We now have the configuration

STACK    INPUT
$E +     ) $

Since + ·> ), a reduction is called for, and the handle is +. The error checker for reductions is required to inspect for E's to the left and right. Finding one missing, it issues the diagnostic missing operand and does the reduction anyway. Our configuration is now

STACK    INPUT
$E       ) $

There is no precedence relation between $ and ), and the entry in Fig. 2.26 for this pair of symbols is e2. Routine e2 causes the diagnostic unbalanced right parenthesis to be printed and removes the right parenthesis from the input. We are left with the final configuration for the parser:

STACK    INPUT
$E       $

2.7 LR Parsing

A general method of shift-reduce parsing is LR parsing. The L stands for scanning the input from left to right, and the R stands for constructing a rightmost derivation in reverse; a k, as in LR(k), stands for the number of lookahead input symbols used to make parsing decisions.
The LR parsing method is the most general non-backtracking shift-reduce parsing method. There are three techniques for constructing an LR parsing table for a grammar:
The first method, called simple LR (SLR), is the easiest to implement, but the least powerful of the three.
The second method, called canonical LR, is the most powerful and the most expensive.
The third method, called lookahead LR (LALR), is intermediate in power and cost between the other two.

2.7.1 LR Parsing Algorithm

The schematic form of an LR parser is shown below. It consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto). The driver program is the same for all LR parsers; only the parsing table changes from one parser to another.

Fig. 2.27 Model of an LR parser

The parsing program uses a stack to store a string of the form s0 X1 s1 X2 s2 ... Xm sm, where sm is on top; each Xi is a grammar symbol and each si is a state. The parsing table consists of two parts, a parsing action function and a goto function. The program driving the LR parser behaves as follows: it determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai], the parsing action table entry for state sm and input ai, which can have one of four values:
i. shift s, where s is a state,
ii. reduce by a grammar production A → β,
iii. accept,
iv. error.

Algorithm: LR parsing algorithm.
Input: An input string w and an LR parsing table with functions action and goto for a grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the program given below until an accept or error action is encountered.

set ip to point to the first symbol of w$;
repeat forever begin
  let s be the state on top of the stack and a the symbol pointed to by ip;
  if action[s, a] = shift s' then begin
    push a, then s', on top of the stack;
    advance ip to the next input symbol

  end
  else if action[s, a] = reduce A → β then begin
    pop 2 * |β| symbols off the stack;
    let s' be the state now on top of the stack;
    push A, then goto[s', A], on top of the stack;
    output the production A → β
  end
  else if action[s, a] = accept then
    return
  else error()
end

2.7.2 The codes for the actions are:
i. si means shift and stack state i,
ii. rj means reduce by the production numbered j,
iii. acc means accept,
iv. blank means error.

Example: Consider the grammar
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
The parsing table for this grammar is shown in Fig. 2.28.

State | ACTION                          | GOTO
      | id  | +   | *   | (   | )   | $   | E | T | F
0     | s5  |     |     | s4  |     |     | 1 | 2 | 3
1     |     | s6  |     |     |     | acc |   |   |
2     |     | r2  | s7  |     | r2  | r2  |   |   |
3     |     | r4  | r4  |     | r4  | r4  |   |   |
4     | s5  |     |     | s4  |     |     | 8 | 2 | 3
5     |     | r6  | r6  |     | r6  | r6  |   |   |
6     | s5  |     |     | s4  |     |     |   | 9 | 3
7     | s5  |     |     | s4  |     |     |   |   | 10
8     |     | s6  |     |     | s11 |     |   |   |
9     |     | r1  | s7  |     | r1  | r1  |   |   |
10    |     | r3  | r3  |     | r3  | r3  |   |   |
11    |     | r5  | r5  |     | r5  | r5  |   |   |

Fig. 2.28 Parsing table (action and goto) for the expression grammar
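The driver above is only a few lines once the table is in hand. Below is a minimal Python sketch of it (our own encoding: ('s', j) for shift j, ('r', (A, k)) for reduce by a production A → β with |β| = k, and 'acc' for accept). Unlike the algorithm's stack of alternating symbols and states, the sketch keeps states only, since the grammar symbols are implicit in them.

ACTION = {
    (0, 'id'): ('s', 5), (0, '('): ('s', 4),
    (1, '+'): ('s', 6), (1, '$'): 'acc',
    (2, '+'): ('r', ('E', 1)), (2, '*'): ('s', 7),
    (2, ')'): ('r', ('E', 1)), (2, '$'): ('r', ('E', 1)),
    (3, '+'): ('r', ('T', 1)), (3, '*'): ('r', ('T', 1)),
    (3, ')'): ('r', ('T', 1)), (3, '$'): ('r', ('T', 1)),
    (4, 'id'): ('s', 5), (4, '('): ('s', 4),
    (5, '+'): ('r', ('F', 1)), (5, '*'): ('r', ('F', 1)),
    (5, ')'): ('r', ('F', 1)), (5, '$'): ('r', ('F', 1)),
    (6, 'id'): ('s', 5), (6, '('): ('s', 4),
    (7, 'id'): ('s', 5), (7, '('): ('s', 4),
    (8, '+'): ('s', 6), (8, ')'): ('s', 11),
    (9, '+'): ('r', ('E', 3)), (9, '*'): ('s', 7),
    (9, ')'): ('r', ('E', 3)), (9, '$'): ('r', ('E', 3)),
    (10, '+'): ('r', ('T', 3)), (10, '*'): ('r', ('T', 3)),
    (10, ')'): ('r', ('T', 3)), (10, '$'): ('r', ('T', 3)),
    (11, '+'): ('r', ('F', 3)), (11, '*'): ('r', ('F', 3)),
    (11, ')'): ('r', ('F', 3)), (11, '$'): ('r', ('F', 3)),
}
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3, (4, 'E'): 8, (4, 'T'): 2,
        (4, 'F'): 3, (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}

def lr_parse(tokens):
    stack = [0]                              # states only
    tokens = tokens + ['$']
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act == 'acc':
            return True
        if act is None:
            return False                     # error entry
        if act[0] == 's':
            stack.append(act[1]); i += 1     # shift
        else:                                # reduce A → β with |β| = k
            head, k = act[1]
            del stack[len(stack) - k:]
            stack.append(GOTO[(stack[-1], head)])
            print('reduce by', head)

print(lr_parse(['id', '*', 'id', '+', 'id']))   # True, via the moves of Fig. 2.29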

On input id * id + id, the sequence of stack and input contents is shown in Fig. 2.29. For example, at line (1) the parser is in state 0 with id as the first input symbol. The action in row 0 and column id of the action field of the parsing table is s5, meaning shift and cover the stack with state 5. That is what has happened at line (2): the first token id and the state symbol 5 have both been pushed onto the stack, and id has been removed from the input. Then * becomes the current input symbol, and the action of state 5 on input * is to reduce by F → id. Two symbols are popped off the stack (one state symbol and one grammar symbol). State 0 is then exposed. Since the goto of state 0 on F is 3, F and 3 are pushed onto the stack. We now have the configuration in line (3).

Line | Stack           | Input          | Move
1    | 0               | id * id + id$  | shift
2    | 0 id 5          | * id + id$     | reduce by F → id
3    | 0 F 3           | * id + id$     | reduce by T → F
4    | 0 T 2           | * id + id$     | shift
5    | 0 T 2 * 7       | id + id$       | shift
6    | 0 T 2 * 7 id 5  | + id$          | reduce by F → id
7    | 0 T 2 * 7 F 10  | + id$          | reduce by T → T * F
8    | 0 T 2           | + id$          | reduce by E → T
9    | 0 E 1           | + id$          | shift
10   | 0 E 1 + 6       | id$            | shift
11   | 0 E 1 + 6 id 5  | $              | reduce by F → id
12   | 0 E 1 + 6 F 3   | $              | reduce by T → F
13   | 0 E 1 + 6 T 9   | $              | reduce by E → E + T
14   | 0 E 1           | $              | accept

Fig. 2.29 Moves of the LR parser on input id * id + id

2.7.3 LR Grammars

How do we construct an LR parsing table for a given grammar? A grammar for which we can construct a parsing table is said to be an LR grammar. There are context-free grammars that are not LR, but these can generally be avoided for typical programming-language constructs. Intuitively, in order for a grammar to be LR, it is sufficient that a left-to-right shift-reduce parser be able to recognize handles when they appear on top of the stack.

An LR parser does not have to scan the entire stack to know when the handle appears on top. Rather, the state symbol on top of the stack contains all the information it needs. It is a remarkable fact that if it is possible to recognize a handle knowing only the grammar symbols on the stack, then there is a finite automaton that can, by reading the grammar symbols on the stack from top to bottom, determine what handle, if any, is on top of the stack. The goto function of an LR parsing table is essentially such a finite automaton. The automaton need not, however, read the stack on every move. The state symbol stored on top of the stack is the state the handle-recognizing finite automaton would be in if it had read the grammar symbols of the stack from bottom to top. Thus, the LR parser can determine from the state on top of the stack everything that it needs to know about what is in the stack.
Another source of information that an LR parser can use to help make its shift-reduce decisions is the next k input symbols. The cases k = 0 or k = 1 are of practical interest, and we shall only consider LR parsers with k ≤ 1 here. For example, the action table in Fig. 2.28 uses one symbol of lookahead. A grammar that can be parsed by an LR parser examining up to k input symbols on each move is called an LR(k) grammar.
There is a significant difference between LL and LR grammars. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production having seen all of what is derived from that right side, with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, LR grammars can describe more languages than LL grammars.

2.8 SLR Parsing

"Simple LR", or SLR, is a method to construct an LR parsing table from a grammar. In the SLR method, one symbol of lookahead information is used, but we usually omit the "(1)" after SLR.

2.8.1 LR(0) Items

An LR(0) item (item, for short) of a grammar G is a production of G with a dot at some position of the right side. Thus production A → XYZ yields the four items
A → ·XYZ
A → X·YZ
A → XY·Z
A → XYZ·
The production A → ε generates only one item, A → ·.
The first item above indicates that we hope to see a string derivable from XYZ next on

© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission the input. The second item indicates that end we have just seen on the input a string derivable from X and that we hope next to Example see a string derivable from YZ. The central Consider the augmented expression idea in the SLR method is to first construct grammar from the grammar a deterministic finite Ep →E automation to recognize viable prefixes. E →E + T | T One collection of sets of LR(0) items, which T →T * F | F we call canonical LR(0) collections, F →(E) | id provides the basic for constructing SLR If I is the set of one item {[E’ →E]}, then parsers. To construct the canonical LR(0) closure (I) contains the items collection for a grammar, we define an E’ →E augmented grammar and two functions, E →E + T closure and goto. E → T T →T * F 2.8.2 Augmented Grammar T →F F → (E) If G is a grammar with start symbol S, then F →id ‘G’, the augmented grammar for G, is ‘G’ Here, E’ → E is put in closure (I) by rule (1). with a new start symbol S’ and production Since there is an E immediately to the right S’ →S. The purpose of this new starting of dot, by rule (2) we add E productions production is to indicate the parser when it with dots at the left end, that is, E → E + T should stop parsing and announce and E → T. Now there is T immediately to acceptance of the input. That is, acceptance the right of a dot, so we add T →T * F. Next, occurs when and only when the parser is the F to the right of a dot forces F →id to be about to reduce S’→S. added. No other item is put into closure (I) by rule (2). 2.8.3 The Closure Operation 2.8.4 The Go to Operation If I is set of items for grammar G, then closure (I) is the set of items constructed The second useful function is goto (I, X) from I by the two rules: where I is a set of items and X is a grammar 1. Initially, even item in I is added to symbol. Goto (I, X) is defined to be the closure(I) closure of the set of all items [A →αXβ] such 2. If A →α Bβ is in closure(I) and B →  is that [A →α.Xβ] in I. a production, then add the item B → Intuitively, if I is the set of items that are valid for some viable prefix , then goto (I, to I, if it is not already there. We apply this rule until no more new items can be X) is the set of items that are valid for the added to closure(I) viable prefix  X. Functional closure (I); Example begin If I is the set of two items {[E’→E], [E →E + J : = I; T]}, then goto(I, +) consists of repeat E →E +T for each item A →α. Bβ in J and each T → T * F production B → for G T →F such that B → is not in J do F → (E) add B → to J F →id until no more items can be added to J. We computed goto (I, +) by examining I for return J items with + immediately to the right of the

E' → E. is not such an item, but E → E. + T is. We moved the dot over the + to get {E → E + .T} and then took the closure of this set.

2.8.5 The Sets-of-LR(0)-Items Construction

The algorithm to construct C, the canonical collection of sets of LR(0) items for an augmented grammar G', is shown below.
procedure items(G');
begin
  C := {closure({[S' → .S]})};
  repeat
    for each set of items I in C and each grammar symbol X
        such that goto(I, X) is not empty and not in C do
      add goto(I, X) to C
  until no more sets of items can be added to C
end

Example
Consider the augmented expression grammar
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
The canonical collection of sets of LR(0) items for this grammar is shown below.
I0:  E' → .E, E → .E + T, E → .T, T → .T * F, T → .F, F → .(E), F → .id
I1:  E' → E., E → E. + T
I2:  E → T., T → T. * F
I3:  T → F.
I4:  F → (.E), E → .E + T, E → .T, T → .T * F, T → .F, F → .(E), F → .id
I5:  F → id.
I6:  E → E + .T, T → .T * F, T → .F, F → .(E), F → .id
I7:  T → T * .F, F → .(E), F → .id
I8:  F → (E.), E → E. + T
I9:  E → E + T., T → T. * F
I10: T → T * F.
I11: F → (E).
The goto function for this set of items is shown as the transition diagram of a deterministic finite automaton in the following figure.

Fig. 2.30 Transition diagram of DFA D for viable prefixes

2.8.6 Construction of the SLR Parsing Table

Now we shall study how to construct the SLR parsing action and goto functions from the deterministic finite automaton that recognizes viable prefixes. Given a grammar G, we augment G to produce G', and from G' we construct C, the canonical collection of sets of items for G'. We construct action, the parsing action function, and goto, the goto function, from C using the following algorithm.

Algorithm: constructing an SLR parsing table
Input: An augmented grammar G'.
Output: The SLR parsing table functions action and goto for G'.
Method:
i. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G'.
ii. State i is constructed from Ii. The parsing actions for state i are determined as follows:
   a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
   b) If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A); here A may not be S'.
   c) If [S' → S.] is in Ii, then set action[i, $] to "accept".
   If any conflicting actions are generated by the above rules, we say that the grammar is not SLR(1). The algorithm fails to produce a parser in this case.
iii. The goto transitions for state i are constructed for all nonterminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
iv. All entries not defined by rules (ii) and (iii) are made "error".
v. The initial state of the parser is the one constructed from the set of items containing [S' → .S].

Explanation for point (c) above: in the example grammar, state I1 = {E' → E., E → E. + T} contains both a reduce item and a shift item. This is not an SR conflict, because E' → E is the dummy augmenting production; reducing by it simply means "accept". So I1 is not an inadequate state. This is exactly what the augmenting production contributes to LR(0).

A quick summary of how grammars are classified from their tables:
- LR(0): no SR or RR conflicts in the LR(0) table means the grammar is LR(0); if an SR or RR conflict is present, it is not LR(0) (or the given grammar is ambiguous).
- SLR(1): no SR or RR conflicts and no multiply-defined entries in the parsing table means the grammar is SLR(1); otherwise it is not.
- CLR(1) (or LR(1)): LR(0) items extended with a one-symbol lookahead.
- LALR(1): merge LR(1) states whose items differ only in lookaheads; if merging creates conflicts, the grammar is not LALR(1).
- LL(1): the grammar must be unambiguous and free of left recursion and left factoring. (The C language is often cited as needing LL(2).)

Example
Let us construct the SLR table for the following grammar
E' → E
E → E + T | T
T → T * F | F
F → (E) | id
The canonical collection of sets of LR(0) items for this grammar was already given above. First consider the set of items I0:
E' → .E
E → .E + T
E → .T
T → .T * F

T → .F
F → .(E)
F → .id
The item F → .(E) gives rise to the entry action[0, (] = shift 4, and the item F → .id to the entry action[0, id] = shift 5. Other items in I0 yield no actions. Now consider I1:
E' → E.
E → E. + T
The first item yields action[1, $] = accept, the second yields action[1, +] = shift 6. Next consider I2:
E → T.
T → T. * F
Since FOLLOW(E) = {$, +, )}, the first item makes action[2, $] = action[2, +] = action[2, )] = reduce E → T. The second item makes action[2, *] = shift 7. Continuing in this fashion we obtain the parsing action and goto tables that were shown earlier in figure 2.28.

2.8.7 Canonical LR Parsing

- SLR = "Simple LR": the weakest, but easiest to implement, LR method.
- An SLR grammar is one for which an SLR parser can be constructed.
- An LR(0) item is a grammar production with a dot somewhere in the RHS. E.g., for A → XYZ we have
  A → .XYZ
  A → X.YZ
  A → XY.Z
  A → XYZ.
- A production A → ε yields only the item A → . .
- The dot tells us how much of the RHS we have already processed and what we expect to see next.
- The idea is to build a DFA to recognize viable prefixes.
- The states of the DFA are sets of LR(0) items, built by subset construction.
- These sets are called the canonical LR(0) collection.

2.8.8 Building the Canonical LR(0) Collection

- Augment the grammar with a new production S' → S, where S is the start symbol (it serves a role similar to $).
- The initial item is S' → .S.
- Compute the closure and goto functions.
- Closure: given a set of items I,
  1. All items in I are initially in closure(I).
  2. If A → α.Bβ is in closure(I) and B → γ is a production, add B → .γ to closure(I), if not already there.
  3. Repeat until no new items are generated.
Example: E → E + T | T
         T → T * F | F
         F → (E) | id
Augmented: E' → E
           E → E + T | T
           T → T * F | F
           F → (E) | id

2.8.9 Building the Canonical LR(0) Collection: Closure

Initial closure(I): {[E' → .E]}
E' → .E matches A → α.Bβ (with α = β = ε), so add E → .E + T and E → .T.
New closure(I): {[E' → .E], [E → .E + T], [E → .T]}
Final closure(I): {[E' → .E], [E → .E + T], [E → .T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}

2.8.10 Building the Canonical LR(0) Collection: GOTO

goto(I, X) takes a set of items I and a symbol X and returns the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I (like ε-closure(move(a)) in the subset construction). Intuitively: "If you have item [A → α.Xβ] and you see the symbol X, move the dot to the right of the X."
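Putting the pieces together, the SLR table construction of Section 2.8.6 can be sketched in Python as follows. This is a minimal illustration building on the earlier closure/goto/states sketches; the FOLLOW sets here are the standard ones for the expression grammar and are assumed precomputed:

TERMINALS = {"+", "*", "(", ")", "id"}
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def slr_actions(states, grammar):
    action = {}
    def enter(key, entry):                  # detect conflicting actions
        if action.get(key, entry) != entry:
            raise ValueError("not SLR(1): conflict at %s" % (key,))
        action[key] = entry
    for i, I in enumerate(states):
        for (head, body, dot) in I:
            if dot < len(body) and body[dot] in TERMINALS:   # rule (a): shift
                j = states.index(goto(I, body[dot], grammar))
                enter((i, body[dot]), ("shift", j))
            elif dot == len(body) and head == "E'":          # rule (c): accept
                enter((i, "$"), ("accept",))
            elif dot == len(body):                           # rule (b): reduce
                for a in FOLLOW[head]:
                    enter((i, a), ("reduce", head, body))
    return action

table = slr_actions(states, GRAMMAR)   # e.g. a shift entry for (0, "id")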

Example
If I = {[E' → E.], [E → E. + T]}, then [E → E. + T] matches A → α.Xβ with X = +, so add [E → E + .T] to goto(I, +). Taking the closure of [E → E + .T] gives
goto(I, +) = {[E → E + .T], [T → .T*F], [T → .F], [F → .(E)], [F → .id]}

2.8.11 Construction of the Sets of LR(1) Items

Algorithm: construction of the sets of LR(1) items.
Input: An augmented grammar G'.
Output: The sets of LR(1) items that are valid for one or more viable prefixes of G'.
Method: The procedures closure and goto and the main routine items for constructing the sets of items are used.
procedure items(G');
begin
  C := {closure({[S' → .S, $]})};
  repeat
    for each set of items I in C and each grammar symbol X
        such that goto(I, X) is not empty and not in C do
      add goto(I, X) to C
  until no more sets of items can be added to C
end

Example: Consider the following augmented grammar.
S' → S
S → CC
C → cC | d
The canonical collection of sets of LR(1) items for this grammar is shown below:
I0: S' → .S, $
    S → .CC, $
    C → .cC, c | d
    C → .d, c | d
I1: S' → S., $
I2: S → C.C, $
    C → .cC, $
    C → .d, $
I3: C → c.C, c | d
    C → .cC, c | d
    C → .d, c | d
I4: C → d., c | d
I5: S → CC., $
I6: C → c.C, $
    C → .cC, $
    C → .d, $
I7: C → d., $
I8: C → cC., c | d
I9: C → cC., $
The goto function for this set of items is shown as the transition diagram of a deterministic finite automaton in the figure below.

Fig 2.31 The goto graph for grammar (2.3)
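The LR(1) closure differs from the LR(0) one only in that each item carries a lookahead, computed as FIRST(βa). Below is a minimal Python sketch for the grammar above; the FIRST table, the NULLABLE set and the item representation (head, body, dot, lookahead) are assumptions made for illustration:

FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}}
NULLABLE = set()

def first_of(symbols, lookahead):
    """FIRST(beta a) for the tail beta of an item with lookahead a."""
    result = set()
    for s in symbols:
        result |= FIRST[s]
        if s not in NULLABLE:
            return result
    return result | {lookahead}

def closure_lr1(items, grammar):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for (head, body, dot, la) in list(result):
            if dot < len(body):
                B, beta = body[dot], body[dot + 1:]
                for (h, b) in grammar:
                    if h == B:
                        for a in first_of(beta, la):
                            if (h, b, 0, a) not in result:
                                result.add((h, b, 0, a))
                                changed = True
    return frozenset(result)

G2 = [("S'", ("S",)), ("S", ("C", "C")),
      ("C", ("c", "C")), ("C", ("d",))]
I0 = closure_lr1({("S'", ("S",), 0, "$")}, G2)   # reproduces I0 above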

2.9 LALR Parsing

Most LR-parser generators use an extended version of the SLR construction called LALR(1). The "LA" in the abbreviation is short for "lookahead" and the (1) indicates that the lookahead is one symbol, i.e., the next input symbol.
If we compare the SLR(1) parser with the CLR(1) parser, we find that the CLR(1) parser is more powerful. But CLR(1) has a greater number of states than the SLR(1) parser, and hence its storage requirement is also greater. We can devise a parser that is intermediate between the two: its power lies between that of SLR(1) and CLR(1), while its storage requirement is the same as SLR(1)'s. Such a parser, LALR(1), is more useful: since each of its states corresponds to a set of LR(1) items, the information about the lookahead is available in the state itself, making it more powerful than the SLR parser.

2.9.1 Construction of the LALR Parsing Table

The steps for constructing an LALR parsing table are given below.
1. Obtain the canonical collection of sets of LR(1) items.
2. If more than one set of LR(1) items in the canonical collection has an identical core (set of LR(0) items) but different lookaheads, combine these sets to obtain a reduced collection C' of sets of LR(1) items.
3. Construct the parsing table using this reduced collection.

There are two ways to construct LALR(1) parsing tables. The first (and certainly the more obvious) way is to construct the LR(1) table and merge the sets manually. This is sometimes referred to as the "brute force" way. If you don't mind first finding all the multitude of states required by the canonical parser, compressing the LR table into the LALR version is straightforward.
1. Construct all canonical LR(1) states.
2. Merge those states that are identical if the lookaheads are ignored, i.e., two states being merged must have the same number of items and the items must have the same core (the same productions, differing only in lookahead). The lookahead on a merged item is the union of the lookaheads from the states being merged.
3. The successor function for the new LALR(1) state is the union of the successors of the merged states. If two configurations have the same core, then the original successors must have the same core as well, and thus the new state has the same successors.
4. The action and goto entries are constructed from the LALR(1) states as for the canonical LR(1) parser.

2.9.2 LALR Table Construction

S' → S
S → XX
X → aX
X → b
Looking at the configuration sets, we saw that states 3 and 6 can be merged, as can 4 and 7, and 8 and 9. We build the LALR(1) table with the remaining states:

State on        Action              Goto
top of stack    a     b     $       S    X
0               s36   s47           1    2
1                           acc
2               s36   s47                5
36              s36   s47                89
47              r3    r3    r3
5                           r1
89              r2    r2    r2
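The merging step can be sketched directly: group LR(1) states by their LR(0) core and union the lookaheads per core item. A minimal Python illustration, assuming lr1_states is a list of frozensets of (head, body, dot, lookahead) items as in the earlier sketch:

from collections import defaultdict

def merge_to_lalr(lr1_states):
    by_core = defaultdict(list)
    for I in lr1_states:
        core = frozenset((h, b, d) for (h, b, d, _) in I)
        by_core[core].append(I)
    merged = []
    for core, group in by_core.items():
        lookaheads = defaultdict(set)       # union lookaheads per core item
        for I in group:
            for (h, b, d, la) in I:
                lookaheads[(h, b, d)].add(la)
        merged.append(frozenset((h, b, d, la)
                                for (h, b, d), las in lookaheads.items()
                                for la in las))
    return merged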

effort in building an LALR parser. However, the work wasn't all for naught, because when the parser is executing, it can work with the compressed table, thereby saving memory. The difference can be an order of magnitude in the number of states.
However, there is a more efficient strategy for building the LALR(1) states, called step-by-step merging. The idea is that you merge the configuration sets as you go, rather than waiting until the end to find the identical ones. Sets of states are constructed as in the LR(1) method, but at each point where a new set is spawned, you first check to see whether it may be merged with an existing set. This means examining the other states to see if one with the same core already exists. If so, you merge the new set with the existing one; otherwise you add it normally.
Here is an example of this method in action:
S' → S
S → V = E
E → F | E + F
F → V | int | (E)
V → id
Start building the LR(1) collection of configuration sets as you would normally:
I0: S' → •S, $
    S → •V = E, $
    V → •id, =
I1: S' → S•, $
I2: S → V• = E, $
I3: V → id•, =
I4: S → V =•E, $
    E → •F, $/+
    E → •E + F, $/+
    F → •V, $/+
    F → •int, $/+
    F → •(E), $/+
    V → •id, $/+
I5: S → V = E•, $
    E → E• + F, $/+
I6: E → F•, $/+
I7: F → V•, $/+
I8: F → int•, $/+
I9: F → (•E), $/+
    E → •F, )/+
    E → •E + F, )/+
    F → •V, )/+
    F → •int, )/+
    F → •(E), )/+
    V → •id, )/+
I10: F → (E•), $/+
     E → E• + F, )/+
When we construct state I11, we get something we've seen before:
I11: E → F•, )/+
It has the same core as I6, so rather than add a new state, we go ahead and merge with that one to get:
I611: E → F•, $/+/)
We have a similar situation on state I12, which can be merged with state I7. The algorithm continues like this, merging into existing states where possible and only adding new states when necessary. When we finish creating the sets, we construct the table just as in LR(1).

For comparison, the canonical LR(1) parsing table for the grammar of Section 2.9.2, before merging, is:

State on        Action              Goto
top of stack    a     b     $       S    X
0               s3    s4            1    2
1                           acc
2               s6    s7                 5
3               s3    s4                 8
4               r3    r3
5                           r1
6               s6    s7                 9
7                           r3
8               r2    r2
9                           r2

2.9.4 LALR(1) Grammars

A formal definition of what makes a grammar LALR(1) cannot be easily encapsulated in a set of rules, because it needs to look beyond the particulars of a production in isolation to consider the other situations where the production appears on the top of the stack and what happens when we merge those situations. Instead we state that what makes a grammar LALR(1) is the absence of conflicts in its parser. If you build the

parser and it is conflict-free, it implies the grammar is LALR(1), and vice versa.
LALR(1) is a subset of LR(1) and a superset of SLR(1). A grammar that is not LR(1) is definitely not LALR(1), since whatever conflict occurred in the original LR(1) parser will still be present in the LALR(1) parser. A grammar that is LR(1) may or may not be LALR(1), depending on whether merging introduces conflicts. A grammar that is SLR(1) is definitely LALR(1). A grammar that is not SLR(1) may or may not be LALR(1), depending on whether the more precise lookaheads resolve the SLR(1) conflicts. LALR(1) has proven to be the most used variant of the LR family. The weakness of the SLR(1) and LR(0) parsers means they are only capable of handling a small set of grammars. The expansive memory needs of LR(1) caused it to languish for several years as a theoretically interesting but intractable approach. It was the advent of LALR(1) that offered a good balance between the power of the specific lookaheads and table size. The popular tools yacc and bison generate LALR(1) parsers, and most programming language constructs can be described with an LALR(1) grammar (perhaps with a little grammar massaging or parser trickery to skirt some isolated issues).

2.10 Error Handling

As in LL(1) parsing tables, we can implement error processing for any of the variations of LR parsing by placing appropriate actions in the parse table. Here is a parse table for a simple arithmetic expression grammar with error actions inserted into what would have been the blank entries in the table.
E → E + E | E * E | (E) | id

State on       id    +     *     (     )     $     E
top of stack
0              s3    e1    e1    s2    e2    e1    1
1              e3    s4    s5    e3    e2    acc
2              s3    e1    e1    s2    e2    e1    6
3              e3    r4    r4    e3    r4    r4
4              s3    e1    e1    s2    e2    e1    7
5              s3    e1    e1    s2    e2    e1    8
6              e3    s4    s5    e3    s9    e4
7              e3    r1    s5    e3    r1    r1
8              e3    r2    r2    e3    r2    r2
9              e3    r3    r3    e3    r3    r3

Error e1 is called from states 0, 2, 4, 5 when we encounter an operator. All of these states expect to see the beginning of an expression, i.e., an id or a left parenthesis. One way to fix this is for the parser to act as though id was seen in the input and shift state 3 onto the stack (the successor for id in these states), effectively faking that the necessary token was found. The error message printed might be something like "missing operand".
Error e2 is called from states 0, 1, 2, 4, 5 on finding a right parenthesis where we were expecting either the beginning of a new expression (or potentially the end of input, for state 1). A possible fix: remove the right parenthesis from the input and discard it. The message printed could be "unbalanced right parenthesis".
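The driver loop that consults such a table can be sketched as follows. This is a minimal illustration, not the book's algorithm verbatim: the action/goto tables are assumed to map (state, symbol) to tuples like ("shift", j), ("reduce", head, body), ("accept",) or ("error", "e1"), and the recovery shown (discarding the offending token) is a deliberately crude stand-in for the fixes described above:

def parse(tokens, action, goto_table):
    stack, i = [0], 0
    while True:
        entry = action[(stack[-1], tokens[i])]
        if entry[0] == "shift":
            stack.append(entry[1]); i += 1
        elif entry[0] == "reduce":
            head, body = entry[1], entry[2]
            del stack[len(stack) - len(body):]    # pop one state per RHS symbol
            stack.append(goto_table[(stack[-1], head)])
        elif entry[0] == "accept":
            return True
        else:                                     # error entry such as e1/e2
            print("syntax error:", entry[1])
            i += 1                                # crude recovery: discard token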

2.10.1 The Parser Generator

First, a file, say translate.y, containing a Yacc specification of the translator is prepared. The UNIX system command
yacc translate.y
transforms the file translate.y into a C program called y.tab.c using the LALR method outlined in the algorithm above. The program y.tab.c is a representation of an LALR parser written in C, along with other C routines that the user may have prepared; it contains the LR parsing program. By compiling y.tab.c along with the ly library, using the command
cc y.tab.c -ly
we obtain the desired object program a.out that performs the translation specified by the original Yacc program. If other procedures are needed, they can be compiled or loaded with y.tab.c, just as with any C program.

FIG 2.39 Creating an input/output translator with Yacc

A Yacc source program has three parts:
declarations
%%
translation rules
%%
supporting C-routines

2. Examples on constructing the LL(1) parsing table

1) 1: S → AcB   Predict(1) = {a, c}
   2: A → aA    Predict(2) = {a}
   3: A → ε     Predict(3) = {c}
   4: B → bBS   Predict(4) = {b}
   5: B → ε     Predict(5) = {$, a, c}
Empty slots in the table indicate error conditions.

2) 1: E → TQ    Predict(1) = {(, id}
   2: Q → +TQ   Predict(2) = {+}
   3: Q → -TQ   Predict(3) = {-}
   4: Q → ε     Predict(4) = {$, )}
   5: T → FR    Predict(5) = {(, id}
   6: R → *FR   Predict(6) = {*}
   7: R → /FR   Predict(7) = {/}
   8: R → ε     Predict(8) = {+, -, $, )}
   9: F → (E)   Predict(9) = {(}
   10: F → id   Predict(10) = {id}

The LL(1) parsing table for this grammar:

      +    -    *    /    (    )    id   $
E                        1         1
Q     2    3                  4         4
T                        5         5
R     8    8    6    7        8         8
F                        9         10

→ Because the above grammar is LL(1), a unique production number is stored in each table entry.
→ Blank entries correspond to error conditions. In practice, special error numbers are used to indicate the error situations that arise.

3) S → aSbS | bSaS | ε
Solution: The grammar is not LL(1): since S → ε, LL(1) requires first(aSbS) ∩ follow(S) = ∅ and first(bSaS) ∩ follow(S) = ∅, but follow(S) contains both a and b, so both intersections are non-empty. (The grammar is in fact ambiguous.)

4) R → R + R | RR | R* | (R) | a | b
Solution: The grammar is ambiguous. An equivalent unambiguous grammar is
E → E + T | T
T → TF | F
F → K*
K → (K) | a | b
(Note that this grammar is still left recursive, so it is not LL(1) as written.)

5) S → A | a
   A → a
Solution: The given grammar is not LL(1), because first(A) ∩ first(a) = {a} ∩ {a} = {a} ≠ ∅.

6) E → aA | (E)
   A → +E | *E | ε
Solution: first(aA) ∩ first((E)) = ∅, first(+E) ∩ first(*E) = ∅, and neither first(+E) nor first(*E) intersects follow(A). It is LL(1).

7) S → aBS | BC
   B → bC | bB | ε
   C → Bx | Cy | z
Solution: The grammar is not LL(1): B → bC and B → bB both begin with b, so the table entry M[B, b] holds multiple productions (and C → Cy is left recursive).

8) A → BCD
   B → W | BX
   C → YCZ | mB
   D → DB | a
Solution: It is not LL(1); B → BX (and D → DB) are left recursive.
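Once the Predict sets are known, filling the LL(1) table is mechanical: each production number goes into M[head, a] for every a in its Predict set, and a collision means the grammar is not LL(1). A minimal Python sketch for grammar (2) above (the data layout is an assumption for illustration):

PREDICT = {
    1:  ("E", ("T", "Q"),      {"(", "id"}),
    2:  ("Q", ("+", "T", "Q"), {"+"}),
    3:  ("Q", ("-", "T", "Q"), {"-"}),
    4:  ("Q", (),              {"$", ")"}),
    5:  ("T", ("F", "R"),      {"(", "id"}),
    6:  ("R", ("*", "F", "R"), {"*"}),
    7:  ("R", ("/", "F", "R"), {"/"}),
    8:  ("R", (),              {"+", "-", "$", ")"}),
    9:  ("F", ("(", "E", ")"), {"("}),
    10: ("F", ("id",),         {"id"}),
}

def ll1_table(predict):
    table = {}
    for number, (head, _body, lookaheads) in predict.items():
        for a in lookaheads:
            if (head, a) in table:                 # multiple entries in one cell
                raise ValueError("not LL(1): conflict in M[%s, %s]" % (head, a))
            table[(head, a)] = number
    return table

M = ll1_table(PREDICT)   # M[("R", "*")] == 6; missing cells are errors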

Examples on Bottom-Up Parsing

1) S → aABe
   A → Abc | b
   B → d
Input: abbcde. The bottom-up parser reduces the input to the start symbol as follows:
abbcde → aAbcde → aAde → aABe → S
In the above grammar there are no SR or RR conflicts, so it is LR(0).

2) S → SAS | a
   A → bSb | b
Solution: Substituting for A, the operator grammar is
S → SbSbS | SbS | a

3) Prepare the precedence table among id, *, b and $, given the input id * id b id * id.
Solution: Each id is reduced first (id yields .> against every following terminal); * takes precedence over b. The precedence table is

      id    *    b    $
id          .>   .>   .>
*     <.    <.   .>   .>
b     <.    <.   <.   .>
$     <.    <.   <.

4) Check for LR(0):
S' → S#
S → dA | aB
A → bA | C
Solution: (The item-set construction appears as a figure in the original.) The item S' → S.# in I1 comes from the augmenting production and does not create an SR conflict, so I1 is not an inadequate state; the grammar is LR(0).

5) S → (L) | x
   L → S
   L → L, S
Solution: In the above grammar there are no SR or RR conflicts, so it is LR(0).

6) E' → E#
   E → E + T | T
   T → x
Check for LR(0).
Solution: (The item-set construction appears as a figure in the original.) Since the construction produces no SR or RR conflicts, the grammar is LR(0).

2.10.2 Construction of the LR Parsing Table

Algorithm: construction of the canonical LR parsing table.
Input: An augmented grammar G'.
Output: The canonical LR parsing table functions action and goto for G'.
Method:
i) Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'.
ii) State i of the parser is constructed from Ii. The parsing actions for state i are determined as follows:
   a) If [A → α.aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here, a is required to be a terminal.
   b) If [A → α., a] is in Ii, A ≠ S', then set action[i, a] to "reduce A → α".
   c) If [S' → S., $] is in Ii, then set action[i, $] to "accept".
   If a conflict results from the above rules, the grammar is said not to be LR(1), and the algorithm is said to fail.
iii) The goto transitions for state i are determined as follows: if goto(Ii, A) = Ij, then goto[i, A] = j.
iv) All entries not defined by rules (ii) and (iii) are made "error".
v) The initial state of the parser is the one constructed from the set containing the item [S' → .S, $].

Example: Consider the following augmented grammar
S' → S
S → CC
C → cC | d
The canonical LR parsing table for this grammar is:

State     Action              Goto
          c     d     $       S    C
0         s3    s4            1    2
1                     acc
2         s6    s7                 5
3         s3    s4                 8
4         r3    r3
5                     r1
6         s6    s7                 9
7                     r3
8         r2    r2
9                     r2

GATE QUESTIONS

Q.1 Which of the following statements is false?
a) An unambiguous grammar has the same leftmost & rightmost derivation
b) An LL(1) parser is a top-down parser
c) LALR is more powerful than SLR
d) An ambiguous grammar can never be LR(k) for any k
[GATE-2001]

Q.2 Which of the following suffices to convert an arbitrary CFG to an LL(1) grammar?
a) Removing left recursion alone
b) Factoring the grammar alone
c) Removing left recursion and factoring the grammar
d) None of the above
[GATE-2003]

Q.3 Assume that the SLR parser for a grammar G has n1 states and the LALR parser for G has n2 states. The relationship between n1 and n2 is
a) n1 is necessarily less than n2
b) n1 is necessarily equal to n2
c) n1 is necessarily greater than n2
d) None of these
[GATE-2003]

Q.4 Consider the grammar shown below
S → i E t S S' | a
S' → e S | ε
E → b
In the predictive parser table, M, of this grammar, the entries M[S', e] and M[S', $] respectively are
a) {S' → eS} and {S' → ε}
b) {S' → eS} and { }
c) {S' → ε} and {S' → ε}
d) {S' → eS, S' → ε} and {S' → ε}
[GATE-2003]

Q.5 Consider the grammar shown below:
S → CC
C → cC | d
This grammar is
a) LL(1)
b) SLR(1) but not LL(1)
c) LALR(1) but not SLR(1)
d) LR(1) but not LALR(1)
[GATE-2003]

Q.6 Which of the following grammar rules violate the requirements of an operator grammar? P, Q, R are non-terminals, and r, s, t are terminals.
1. P → QR      2. P → QsR
3. P → ε       4. P → QtRr
a) 1 only      b) 1 and 3
c) 2 and 3     d) 3 and 4
[GATE-2004]

Q.7 The grammar A → AA | (A) | ε is not suitable for predictive parsing because the grammar is
a) ambiguous
b) left-recursive
c) right-recursive
d) an operator grammar
[GATE-2005]

Q.8 Consider the following grammar.
S → S * E
S → E
E → F + E
E → F
F → id
Consider the following LR(0) items corresponding to the grammar above.
1. S → S * .E
2. E → F. + E
3. E → F + .E
Given the items above, which two of them will appear in the same set in the canonical sets-of-items for the grammar?
a) 1 and 2     b) 2 and 3
c) 1 and 3     d) None of these
[GATE-2006]

Q.9 Consider the following grammar
S → FR
R → *S | ε
F → id
In the predictive parser table M of the grammar, the entries M[S, id] and M[R, $] respectively are
a) {S → FR} and {R → ε}
b) {S → FR} and { }
c) {S → FR} and {R → *S}
d) {F → id} and {R → ε}
[GATE-2006]

Statements for Linked Answer Questions 10 and 11

Q.10 Which one of the following grammars generates the language L = {a^i b^j | i ≠ j}?
a) S → AC | CB         b) S → aS | Sb | a | b
   C → aCb | a | b
   A → aA | ε
   B → Bb | ε
c) S → AC | CB         d) S → AC | CB
   C → aCb | ε            C → aCb | ε
   A → aA | ε             A → aA | a
   B → Bb | ε             B → bB | b
[GATE-2006]

Q.11 In the correct grammar above, what is the length of the derivation (number of steps starting from S) to generate the string a^l b^m with l ≠ m?
a) max(l, m) + 2       b) l + m + 2
c) l + m + 3           d) max(l, m) + 3
[GATE-2006]

Q.12 Which one of the following is a top-down parser?
a) Recursive descent parser
b) Operator precedence parser
c) An LR(k) parser
d) An LALR(k) parser
[GATE-2007]

Q.13 Consider the grammar with non-terminals N = {S, C, S1}, terminals T = {a, b, i, t, e}, with S as the start symbol and the following set of rules:
S → iCtSS1 | a
S1 → eS | ε
C → b
The grammar is not LL(1) because
a) it is left recursive
b) it is right recursive
c) it is ambiguous
d) it is not context-free
[GATE-2007]

Q.14 Consider the following two statements:
P: Every regular grammar is LL(1).
Q: Every regular set has a LR(1) grammar.
Which of the following is true?
a) Both P and Q are true
b) P is true and Q is false
c) P is false and Q is true
d) Both P and Q are false
[GATE-2007]

Statements for Linked Answer Questions 15 and 16
Consider the CFG with {S, A, B} as the non-terminal alphabet, {a, b} as the terminal alphabet, S as the start symbol and the following set of production rules:
S → aB      S → bA
B → b       A → a
B → bS      A → aS
B → aBB     A → bAA

Q.15 Which of the following strings is generated by the grammar?
a) aaaabb      b) aabbbb
c) aabbab      d) abbbba
[GATE-2007]

Q.16 For the correct answer string to Q.15, how many derivation trees are there?
a) 1     b) 2
c) 3     d) 4
[GATE-2007]

Q.17 Which of the following describes a handle (as applicable to LR-parsing) appropriately?
a) It is the position in a sentential form where the next shift or reduce operation will occur
b) It is a non-terminal whose production will be used for reduction in the next step
c) It is a production that may be used for reduction in a future step, along with a position in the sentential form where the next shift or reduce operation will occur
d) It is the production that will be used for reduction in the next step, along with a position in the sentential form where the right hand side of the production may be found
[GATE-2008]

Q.18 An LALR(1) parser for a grammar G can have shift-reduce (S-R) conflicts if and only if
a) the SLR(1) parser for G has S-R conflicts
b) the LR(1) parser for G has S-R conflicts
c) the LR(0) parser for G has S-R conflicts
d) the LALR(1) parser for G has reduce-reduce conflicts
[GATE-2008]

Q.19 The grammar S → aSa | bS | c is
a) LL(1) but not LR(1)
b) LR(1) but not LL(1)
c) Both LL(1) and LR(1)
d) Neither LL(1) nor LR(1)
[GATE-2010]

Statements for Linked Answer Questions 20 and 21
For the grammar below, a partial LL(1) parsing table is also presented along with the grammar. Entries that need to be filled are indicated as E1, E2 and E3. ε is the empty string, $ indicates end of input, and | separates alternate right hand sides of productions.
S → aAbB | bAaB | ε
A → S
B → S

      a         b         $
S     E1        E2        S → ε
A     A → S     A → S     error
B     B → S     B → S     E3

Q.20 The FIRST and FOLLOW sets for the non-terminals A and B are
a) FIRST(A) = {a, b, ε} = FIRST(B)
   FOLLOW(A) = {a, b}
   FOLLOW(B) = {a, b, $}
b) FIRST(A) = {a, b, $}
   FIRST(B) = {a, b, ε}
   FOLLOW(A) = {a, b}
   FOLLOW(B) = {$}
c) FIRST(A) = {a, b, ε} = FIRST(B)
   FOLLOW(A) = {a, b}
   FOLLOW(B) = ∅
d) FIRST(A) = {a, b} = FIRST(B)
   FOLLOW(A) = {a, b}
   FOLLOW(B) = {a, b}
[GATE-2012]

Q.21 The appropriate entries for E1, E2 and E3 are
a) E1: S → aAbB, A → S
   E2: S → bAaB, B → S
   E3: B → S
b) E1: S → aAbB, S → ε
   E2: S → bAaB, S → ε
   E3: S → ε
c) E1: S → aAbB, S → ε
   E2: S → bAaB, S → ε
   E3: B → S
d) E1: A → S, S → ε
   E2: B → S, S → ε
   E3: B → S
[GATE-2012]

Q.22 What is the maximum number of reduce moves that can be taken by a bottom-up parser for a grammar with no epsilon and unit productions (i.e., of type A → ε and A → a) to parse a string with n tokens?
a) n/2      b) n-1
c) 2n-1     d) 2^n
[GATE-2013]

Q.23 Consider the following two sets of LR(1) items of an LR(1) grammar.
X → c.X, c/d      X → c.X, $
X → .cX, c/d      X → .cX, $
X → .d, c/d       X → .d, $
Which of the following statements related to merging of the two sets in the corresponding LALR parser is/are false?
1. Cannot be merged since lookaheads are different.
2. Can be merged but will result in S-R conflict.
3. Can be merged but will result in R-R conflict.
4. Cannot be merged since goto on c will lead to two different sets.
a) Only 1          b) Only 2
c) 1 and 4         d) 1, 2, 3 and 4
[GATE-2013]

Q.24 A canonical set of items is given below
S → L. > R
Q → R.
On input symbol < the set has
a) a shift-reduce conflict and a reduce-reduce conflict.
b) a shift-reduce conflict but not a reduce-reduce conflict.
c) a reduce-reduce conflict but not a shift-reduce conflict.
d) neither a shift-reduce nor a reduce-reduce conflict.
[GATE-2014]

Q.25 Consider the grammar defined by the following production rules, with two operators * and +
S → T * P
T → U | T * U
P → Q + P | Q
Q → id
U → id
Which one of the following is TRUE?
a) + is left associative, while * is right associative
b) + is right associative, while * is left associative
c) Both + and * are right associative
d) Both + and * are left associative
[GATE-2014]

Q.26 Which one of the following is True at any valid state in shift-reduce parsing?
a) Viable prefixes appear only at the bottom of the stack and not inside
b) Viable prefixes appear only at the top of the stack and not inside
c) The stack contains only a set of viable prefixes
d) The stack never contains viable prefixes
[GATE-2015]

Q.27 Match the following:
(P) Lexical analysis        (1) Graph coloring
(Q) Parsing                 (2) DFA minimization
(R) Register allocation     (3) Post-order traversal
(S) Expression evaluation   (4) Production tree
Codes:
      P    Q    R    S
a)    2    3    1    4
b)    2    1    4    3
c)    2    4    1    3
d)    2    3    4    1
[GATE-2015]

Q.28 Among simple LR (SLR), canonical LR, and look-ahead LR (LALR), which of the following pairs identify the method that is very easy to implement and the method that is the most powerful, in that order?
a) SLR, LALR
b) Canonical LR, LALR
c) SLR, canonical LR
d) LALR, canonical LR
[GATE-2015]

Q.29 Which of the following statements about parsers is/are CORRECT?
I. Canonical LR is more powerful than SLR.

II. SLR is more powerful than LALR.
III. SLR is more powerful than canonical LR.
a) I only       b) II only
c) III only     d) II and III only
[GATE-2017]

Q.30 Consider the following expression grammar G:
E → E - T | T
T → T + F | F
F → (E) | id
Which of the following grammars is not left recursive, but is equivalent to G?
a) E → E - T | T
   T → T + F | F
   F → (E) | id
b) E → TE'
   E' → -TE' | ε
   T → T + F | F
   F → (E) | id
c) E → TX
   X → -TX | ε
   T → FY
   Y → +FY | ε
   F → (E) | id
d) E → TX | (TX)
   X → -TX | +TX | ε
   T → id
[GATE-2017]

Q.31 Consider the following grammar:
P → xQRS
Q → yz | z
R → w | ε
S → y
What is FOLLOW(Q)?
a) {R}        b) {w}
c) {w, y}     d) {w, $}
[GATE-2017]

ANSWER KEY:
1    2    3    4    5    6    7    8    9    10   11   12   13   14
(a)  (d)  (b)  (d)  (a)  (b)  (b)  (c)  (a)  (d)  (a)  (a)  (c)  (c)
15   16   17   18   19   20   21   22   23   24   25   26   27   28
(c)  (b)  (d)  (b)  (c)  (a)  (c)  (c)  (d)  (d)  (b)  (b)  (c)  (c)
29   30   31
(a)  (c)  (c)

EXPLANATIONS

Q.1 (a)
Statement 1 is false: an unambiguous grammar does not, in general, have the same leftmost and rightmost derivation.
Statement 2 is true: LL(1) is a top-down parsing method, hence an LL(1) parser is a top-down parser.
Statement 3 is true: the sequence with regard to power is LR > LALR > SLR.
Statement 4 is true: an ambiguous grammar can never be LR(k) for any k, since the conflicts never disappear.

Q.2 (d)
Removing left recursion and left factoring are necessary steps toward LL(1), but they do not suffice for an arbitrary CFG: an ambiguous grammar, for example, remains non-LL(1) after both transformations. Hence none of the listed options suffices.

Q.3 (b)
The LALR states are obtained by merging LR(1) states with identical cores, which yields exactly the LR(0) item sets on which the SLR parser is built. Hence the number of states in the SLR parser equals the number of states in the LALR parser.

Q.4 (d)
Construct the predictive parser table M:

Non-terminal   a       b       e                  i             t    $
S              S → a                              S → iEtSS'
S'                             S' → ε, S' → eS                       S' → ε
E                      E → b

So M[S', e] = {S' → eS, S' → ε} and M[S', $] = {S' → ε}.

Q.5 (a)
The grammar
S → CC
C → cC | d
has no left recursion, needs no left factoring, and FIRST(cC) ∩ FIRST(d) = {c} ∩ {d} = ∅, so the predictive parsing table has no multiply-defined entries: the grammar is LL(1) (and consequently also SLR(1), LALR(1) and LR(1)).

Q.6 (b)
In P → QR there is no terminal between the two non-terminals, which violates the operator-grammar requirement. Also, no production of an operator grammar may have ε on the right side, so P → ε violates the requirements as well.

Q.7 (b)
The grammar is left recursive (A → AA), and predictive parsers cannot handle left-recursive grammars.

Q.8 (c)
The grammar is
S → S * E
S → E
E → F + E
E → F
F → id
The LR(0) items in question are
(i)   S → S * .E
(ii)  E → F. + E
(iii) E → F + .E
From items (i) and (iii),

we see that in both items the dot is immediately to the left of E; both therefore force the closure over the E-productions and appear in the same set in the canonical collection of items for the grammar.

Q.9 (a)
Construct the predictive parser table:

Non-terminal   id        *         $
S              S → FR
R                        R → *S    R → ε
F              F → id

M[S, id] = {S → FR} and M[R, $] = {R → ε}.

Q.10 (d)
In the language L = {a^i b^j | i ≠ j}, the condition i ≠ j means the number of a's differs from the number of b's. Checking the options:
Option (a): S → AC | CB; C → aCb | a | b; A → aA | ε; B → Bb | ε. Using A ⇒ a and C ⇒ b we get S ⇒ AC ⇒ aC ⇒ ab, where the number of a's equals the number of b's; not the required grammar.
Option (b): S → aS | Sb | a | b can derive ab, with equal numbers of a's and b's; not the required grammar.
Option (c): S → AC | CB; C → aCb | ε; A → aA | ε; B → Bb | ε. With A ⇒ ε and B ⇒ ε, S generates a^n b^n; not the required grammar.
Option (d): S → AC | CB; C → aCb | ε; A → aA | a; B → bB | b. Here C generates equal numbers of a's and b's, while A contributes at least one extra a (and B at least one extra b), so the counts can never be equal. Hence (d) generates L.

Q.11 (a)
The length of the derivation of a^l b^m with l ≠ m is max(l, m) + 2. For example, in the grammar of option (d) above, the string aab is derived in max(2, 1) + 2 = 4 steps:
1. S ⇒ AC
2.   ⇒ aC
3.   ⇒ aaCb
4.   ⇒ aab

Q.12 (a)
Top-down parsing hypothesizes a general parse-tree structure and then checks whether the known fundamental structures are compatible with the hypothesis. A recursive descent parser is a top-down parser; operator precedence, LR(k) and LALR(k) parsers are all bottom-up.

Q.13 (c)
Consider the set of rules
S → iCtSS1 | a
S1 → eS | ε
C → b
Substituting C → b, the grammar generates if-then / if-then-else style statements (i = if, t = then, e = else, b = condition). This is the classic dangling-else construct: for a string such as ibtibtaea,

the e (else) can be attached to either i (if), giving two distinct parse trees. Hence the grammar is ambiguous, and no ambiguous grammar is LL(1). (It is neither left nor right recursive in a way that matters here, and it is clearly context-free.)

Q.14 (c)
Statement P is false: not every regular grammar is LL(1). For example, S → aS | a is a regular grammar, but FIRST(aS) ∩ FIRST(a) = {a} ≠ ∅, so it is not LL(1).
Statement Q is true: every regular set is accepted by a DFA, from which an unambiguous regular grammar can be constructed, and such a grammar is LR(1).

Q.15 (c)
Derive a string in the grammar:
S ⇒ aB ⇒ aaBB ⇒ aabB ⇒ aabbS ⇒ aabbaB ⇒ aabbab
We get the string aabbab.

Q.16 (b)
The string obtained is aabbab. It has in total 2 derivation trees (shown as figures in the original).

Q.17 (d)
As per the definition of a handle, it is the production that will be used for reduction in the next step, along with a position in the sentential form where the right-hand side of the production may be found.

Q.18 (b)
An LALR(1) parser can have a shift-reduce conflict if and only if the LR(1) parser for G has a shift-reduce conflict.
Let us examine three questions about LALR grammars:
1. whether LALR is equivalent to SLR;
2. shift-reduce conflicts in LR(1) versus LALR;
3. reduce-reduce conflicts in LR(1) versus LALR.
Take the example grammar
S' → S
S → Bbb | aab | bBa
B → a
First we try the SLR parser, generating the LR(0) items:
I0: S' → .S, S → .Bbb, S → .aab, S → .bBa, B → .a
I1: S' → S.
I2: S → B.bb
I3: S → a.ab, B → a.
I4: S → b.Ba, B → .a
I5: S → Bb.b
I6: S → aa.b
I7: S → bB.a
I8: B → a.
I9: S → Bbb.
I10: S → aab.
I11: S → bBa.
In state I3 there is a shift-reduce conflict: shift on a (for S → a.ab), and reduce B → a on a ∈ FOLLOW(B) = {a, b}. To avoid this conflict, we use an LR(1) parser instead.

Generating the LR(1) items:
I0: S' → .S, $
    S → .Bbb, $
    S → .aab, $
    S → .bBa, $
    B → .a, b
I1: S' → S., $
I2: S → B.bb, $
I3: S → a.ab, $
    B → a., b
I4: S → b.Ba, $
    B → .a, a
I5: S → Bb.b, $
I6: S → aa.b, $
I7: S → bB.a, $
I8: B → a., a
I9: S → Bbb., $
I10: S → aab., $
I11: S → bBa., $
In state I3 there is now no shift-reduce conflict: we shift on a and reduce B → a only on lookahead b. Here LALR = LR(1), since no merging was required; the SLR and LR(1) parsers also happen to have the same number of states, which need not be true in other cases.
Now let us take another example, showing that a reduce-reduce conflict may arise in the LALR parser even when it is not present in the LR(1) parser. We first parse the following grammar with LR(1) and then feed it to the LALR construction.
S' → S
S → aAd
S → bBd
S → aBe
S → bAe
A → c
B → c
Generating the LR(1) items of the above grammar:
I0: S' → .S, $
    S → .aAd, $
    S → .bBd, $
    S → .aBe, $
    S → .bAe, $
I1: S' → S., $
I2: S → a.Ad, $
    S → a.Be, $
    A → .c, d
    B → .c, e
I3: S → b.Bd, $
    S → b.Ae, $
    A → .c, e
    B → .c, d
I4: S → aA.d, $
I5: S → aB.e, $
I6: A → c., d
    B → c., e
I7: S → bB.d, $
I8: S → bA.e, $
I9: B → c., d
    A → c., e
I10: S → aAd., $
I11: S → aBe., $
I12: S → bBd., $
I13: S → bAe., $
The items of interest are I6 and I9. In the LR(1) parsing table (partly filled) there is no conflict: action[6, d] = reduce A → c and action[6, e] = reduce B → c, while action[9, d] = reduce B → c and action[9, e] = reduce A → c. This table, on reduction to the LALR parsing table, comes out as follows.

In the LALR parsing table (partly filled), states I6 and I9 have the same core {A → c., B → c.} and are merged into one state I69, whose entries become action[69, d] = reduce A → c / reduce B → c and action[69, e] = reduce B → c / reduce A → c: a reduce-reduce conflict.
So the LALR parser gains a reduce-reduce conflict where the corresponding LR(1) counterpart had none, proof enough that LALR is less powerful than LR(1). But since LALR is free of shift-reduce conflicts whenever the corresponding LR(1) parser is (whereas SLR, with its LR(0)-based lookaheads, is not), the LALR method is more powerful than the SLR method. In short: of the shift-reduce conflicts present in SLR, some are resolved in LR(1), and all those resolutions are preserved in LALR. So we have answered all the queries on LALR that we raised.

Q.19 (c)
S → aSa | bS | c
FIRST(aSa) ∩ FIRST(bS) = {a} ∩ {b} = ∅
FIRST(bS) ∩ FIRST(c) = {b} ∩ {c} = ∅
FIRST(c) ∩ FIRST(aSa) = {c} ∩ {a} = ∅
There are no ε-productions, so the grammar is LL(1); and since every LL(1) grammar is LR(1), it is also LR(1).

Q.20 (a)
FIRST(S) = FIRST(aAbB) ∪ FIRST(bAaB) ∪ {ε} = {a, b, ε}
FIRST(A) = FIRST(S) = {a, b, ε}
FIRST(B) = FIRST(S) = {a, b, ε}
FOLLOW(S) = {$} ∪ FOLLOW(A) ∪ FOLLOW(B)
FOLLOW(A) = {a, b}
FOLLOW(B) = FOLLOW(S) = {a, b, $}

Q.21 (c)
Entry E1 for [S, a] = {S → aAbB, S → ε}: S → aAbB because FIRST(aAbB) = {a}, and S → ε because a ∈ FOLLOW(S) = {a, b, $}. (S → ε therefore appears in [S, a], [S, b] and [S, $].)
Entry E2 for [S, b] = {S → bAaB, S → ε}, since FIRST(bAaB) = {b}.
Entry E3 for [B, $] = {B → S}: the production B → S goes into [B, a] and [B, b] as per FIRST(S) = {a, b, ε}, and since FIRST(S) contains ε, also into [B, $].

Q.22 (c)
For a grammar with no epsilon and no unit productions, the maximum number of reduce moves taken by a bottom-up parser to parse a string with n tokens is 2n - 1.

Q.24 (d)
The set {S → L.>R, Q → R.} contains one item calling for a shift (on >) and only a single reduce item. On input symbol < no shift is possible, and with just one reduce item there can be no reduce-reduce clash; hence there is neither a shift-reduce nor a reduce-reduce conflict.

Q.25 (b)

S → T * P
T → U | T * U
P → Q + P | Q
Q → id
U → id
As the production rule T → T * U is left recursive, * is a left-associative operator. As the production rule P → Q + P is right recursive, + is a right-associative operator.

Q.29 (a)
Canonical LR is the most powerful of the LR(k) family of parsers, and SLR is the weakest. Hence statement I is correct and statements II and III are false.

Q.30 (c)
E → E - T | T
T → T + F | F
F → (E) | id
There are two left recursions and both have to be removed. The standard transformation gives
E → TE'      E' → -TE' | ε
T → FT'      T' → +FT' | ε
F → (E) | id
Now, writing X for E' and Y for T', we get option (c):
E → TX       X → -TX | ε
T → FY       Y → +FY | ε
F → (E) | id

Q.31 (c)
FOLLOW(Q) = FIRST(RS). FIRST(R) = {w, ε}; since R can derive ε, FIRST(RS) also includes FIRST(S) = {y}. Hence FOLLOW(Q) = {w, y}.
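The FIRST computation behind this answer can be sketched in a few lines of Python. This is a minimal illustration for the grammar of Q.31 only; the (head, body) representation and the use of "" for ε are assumptions carried over from the earlier sketches, and FOLLOW(Q) is then (FIRST(R) - {ε}) ∪ FIRST(S) as computed above:

G31 = [("P", ("x", "Q", "R", "S")),
       ("Q", ("y", "z")), ("Q", ("z",)),
       ("R", ("w",)), ("R", ()),          # R -> epsilon as an empty body
       ("S", ("y",))]
NONTERMS = {"P", "Q", "R", "S"}

def first_sets(grammar):
    first = {X: set() for X in NONTERMS}
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            before = len(first[head])
            nullable = True
            for s in body:
                first[head] |= first[s] - {""} if s in NONTERMS else {s}
                if not (s in NONTERMS and "" in first[s]):
                    nullable = False
                    break
            if nullable:
                first[head].add("")        # "" stands for epsilon
            if len(first[head]) != before:
                changed = True
    return first

FIRST = first_sets(G31)   # FIRST["R"] == {"w", ""}, FIRST["S"] == {"y"}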

3 SYNTAX DIRECTED TRANSLATION

3.1 Introduction

Semantic analysis serves to:
- Check semantics
- Report errors
- Disambiguate overloaded operators
- Perform type coercion
- Perform static checking:
  - Type checking
  - Control flow checking
  - Uniqueness checking
  - Name checks

There are two notations for associating semantic rules with productions:
1) Syntax-directed definitions
2) Translation schemes
- Syntax-directed definitions are high-level specifications for translations. They hide many implementation details and free the user from having to specify explicitly the order in which translation takes place. Translation schemes indicate the order in which semantic rules are to be evaluated, so they allow some implementation details to be shown.
- We parse the input token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic rules at the parse-tree nodes.
- Evaluation of the semantic rules may generate code, save information in a symbol table, issue error messages, or perform other activities. The translation of the token stream is the result obtained by evaluating the semantic rules.
- Special cases of syntax-directed definitions can be implemented in a single pass by evaluating semantic rules during parsing, without explicitly constructing a parse tree or a graph showing dependencies between attributes.
- Since single-pass implementation is important for compile-time efficiency, much of this chapter is devoted to studying such special cases. One important subclass, called the "L-attributed" definitions, encompasses virtually all translations that can be performed without explicit construction of a parse tree.

3.1.1 Attribute Grammar Framework

- A generalization of CFG where each grammar symbol has an associated set of attributes.
- Values of attributes are computed by semantic rules.
- Of the two notations for associating semantic rules with productions, a syntax-directed definition is a high-level specification that hides implementation details; an explicit order of evaluation is not specified.
A syntax-directed definition is a generalization of a context-free grammar in which each grammar symbol has an associated set of attributes, partitioned into two subsets:
1) Synthesized attributes
2) Inherited attributes
- An attribute can represent anything we choose: a string, a number, a type, a memory location, or whatever.
- The value of an attribute at a parse-tree node is defined by a semantic rule associated with the production used at that node.

Fig. 3.1 Conceptual view of syntax-directed translation

- An implementation does not have to follow the outline in Fig. 3.1 literally.

- The value of a synthesized attribute at a node is computed from the values of attributes at the children of that node in the parse tree.
- The value of an inherited attribute is computed from the values of attributes at the siblings and parent of that node.
- Semantic rules set up dependencies between attributes that will be represented by a graph.
- From the dependency graph, we derive an evaluation order for the semantic rules.
- Evaluation of the semantic rules defines the values of the attributes at the nodes in the parse tree for the input string.
- A semantic rule may also have side effects, e.g., printing a value or updating a global variable. Of course, an implementation need not explicitly construct a parse tree or a dependency graph; it just has to produce the same output for each input string.
- The process of computing the attribute values at the nodes is called annotating or decorating the parse tree.

3.1.2 Form of a Syntax-Directed Definition

In a syntax-directed definition, each grammar production A → α has associated with it a set of semantic rules of the form b := f(c1, c2, ..., ck), where f is a function, and either
1. b is a synthesized attribute of A and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production, or
2. b is an inherited attribute of one of the grammar symbols on the right side of the production, and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production.
In either case, we say that attribute b depends on attributes c1, c2, ..., ck. An attribute grammar is a syntax-directed definition in which the functions in semantic rules cannot have side effects.
- Functions in semantic rules will often be written as expressions.
- Occasionally, the only purpose of a semantic rule in a syntax-directed definition is to create a side effect. Such semantic rules are written as procedure calls or program fragments. They can be thought of as rules defining the values of dummy synthesized attributes of the non-terminal on the left side of the associated production; the dummy attribute and the := sign in the semantic rule are not shown.

Example 3.1:
The syntax-directed definition in Fig. 3.2 is for a desk-calculator program. This definition associates an integer-valued synthesized attribute called val with each of the non-terminals E, T and F. For each E-, T- and F-production, the semantic rule computes the value of attribute val for the non-terminal on the left side from the values of val for the non-terminals on the right side.

PRODUCTION      SEMANTIC RULES
L → En          print(E.val)
E → E1 + T      E.val := E1.val + T.val
E → T           E.val := T.val
T → T1 * F      T.val := T1.val × F.val
T → F           T.val := F.val
F → (E)         F.val := E.val
F → digit       F.val := digit.lexval

Fig 3.2 Syntax-directed definition of a simple desk calculator

The token digit has a synthesized attribute lexval whose value is assumed to be supplied by the lexical analyzer. The rule associated with the production L → En for the starting non-terminal L is just a procedure that prints as output the value of the arithmetic expression generated by E; we can think of this rule as defining a dummy attribute for the non-terminal L. A Yacc specification for this desk calculator was presented earlier to illustrate translation during LR parsing.
In a syntax-directed definition, terminals are assumed to have synthesized attributes only, as the definition does not provide any semantic rules for terminals. Values for attributes of terminals are usually supplied by the lexical analyzer, as discussed.

Furthermore, the start symbol is assumed not to have any inherited attributes, unless otherwise stated.

3.1.3 Dependency Graph

- If an attribute b depends on an attribute c, then the semantic rule for b at a node must be evaluated after the semantic rule that defines c.
- The dependencies among the nodes can be depicted by a directed graph called a dependency graph: a directed graph indicating the interdependencies among the synthesized and inherited attributes of various nodes in a parse tree.
Before constructing a dependency graph for a parse tree, we put each semantic rule into the form b := f(c1, c2, ..., ck) by introducing a dummy synthesized attribute b for each semantic rule that consists of a procedure call. The graph has a node for each attribute and an edge to the node for b from the node for c if attribute b depends on attribute c. In more detail, the algorithm to construct the dependency graph is:
for each node n in the parse tree do
  for each attribute a of the grammar symbol at n do
    construct a node in the dependency graph for a;
for each node n in the parse tree do
  for each semantic rule b := f(c1, c2, ..., ck) associated with the production used at n do
    for i := 1 to k do
      construct an edge from the node for ci to the node for b;
For example, suppose A.a := f(X.x, Y.y) is a semantic rule for the production A → XY. This rule defines a synthesized attribute A.a that depends on the attributes X.x and Y.y. If this production is used in the parse tree, then there will be three nodes A.a, X.x and Y.y in the dependency graph, with an edge to A.a from X.x since A.a depends on X.x, and an edge to A.a from Y.y since A.a also depends on Y.y.
If the production A → XY has the semantic rule X.i := g(A.a, Y.y) associated with it, then there will be an edge to X.i from A.a and also an edge to X.i from Y.y, since X.i depends on both A.a and Y.y.

Example 3.2:
Whenever the following production is used in a parse tree, we add the edges shown in Fig. 3.3 to the dependency graph.
E → E1 + E2    E.val := E1.val + E2.val
The three nodes of the dependency graph marked by • represent the synthesized attributes E.val, E1.val and E2.val at the corresponding nodes in the parse tree. The edge to E.val from E1.val shows that E.val depends on E1.val, and the edge to E.val from E2.val shows that E.val also depends on E2.val. The dotted lines represent the parse tree and are not part of the dependency graph.

Fig. 3.3 E.val is synthesized from E1.val and E2.val

The synthesized attribute E.val depends on E1.val and E2.val; hence the two edges, one each from E1.val and E2.val.

3.1.4 Evaluation Order

Any topological sort of the dependency graph gives a valid order in which the semantic rules must be evaluated; for the declaration example below:
a4 := real
a5 := a4
addtype(id3.entry, a5)
a7 := a5
addtype(id2.entry, a7)
a9 := a7
addtype(id1.entry, a9)
A topological sort of a directed acyclic graph is any ordering m1, m2, ..., mk of the nodes of the graph such that edges go from nodes earlier in the ordering to later nodes; that is, if mi → mj is an edge, then mi appears before mj in the ordering.
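A topological order can be computed with Kahn's algorithm. Here is a minimal Python sketch (not from the original text); the adjacency representation and the node names from the declaration example are illustrative:

from collections import deque

def topological_order(nodes, edges):
    """edges: list of (c, b) pairs meaning attribute b depends on c."""
    indegree = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for c, b in edges:
        succ[c].append(b)
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("circular syntax-directed definition")
    return order

# Dependency chain of the declaration example: a4 -> a5 -> a7 -> a9
print(topological_order(["a4", "a5", "a7", "a9"],
                        [("a4", "a5"), ("a5", "a7"), ("a7", "a9")]))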

- Inherited attributes, which must be passed down from parent nodes to children nodes during semantic analysis, pose a problem for bottom-up parsing.
- Synthesized attributes are used extensively in practice. A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition.
The translation specified by a syntax-directed definition can be made precise as follows. The underlying grammar is used to construct a parse tree for the input. The dependency graph is constructed as discussed above. From a topological sort of the dependency graph, we obtain an evaluation order for the semantic rules. Evaluation of the semantic rules in this order yields the translation of the input string. A syntax-directed definition is said to be circular if the dependency graph for some parse tree generated by its grammar has a cycle.

Fig. 3.4 Dependency graph for a parse tree

The figure shows the dependency graph for the statement real id1, id2, id3 along with the parse tree. Procedure calls can be thought of as rules defining values of dummy synthesized attributes of the nonterminal on the left side of the associated production. Each of the semantic rules addtype(id.entry, L.in) associated with the L-productions leads to the creation of a dummy attribute.

3.2 Synthesized Attributes

- A syntax-directed definition that uses only synthesized attributes is said to be an S-attributed definition.
- A parse tree for an S-attributed definition can be annotated by evaluating the semantic rules for the attributes.
- S-attributed grammars are a class of attribute grammars, comparable with L-attributed grammars but characterized by having no inherited attributes at all.

Example 3.3:
The S-attributed definition in Example 3.1 specifies a desk calculator that reads an input line containing an arithmetic expression involving digits, parentheses, and the operators + and *, followed by a newline character n, and prints the value of the expression. For example, given the expression 3*5+4 followed by a newline, the program prints the value 19. Figure 3.5 contains an annotated parse tree for the input 3*5+4n. The output, printed at the root of the tree, is the value of E.val at the first child of the root.

Fig. 3.5 Annotated parse tree for 3*5+4n

To see how attribute values are computed, consider the leftmost, bottommost interior node, which corresponds to the use of the production F → digit. The corresponding semantic rule, F.val := digit.lexval, defines the attribute F.val at that node to have the value 3, because the value of digit.lexval at the child of this node is 3. Similarly, at the parent of this F-node, the attribute T.val has the value 3.
Now consider the node for the production T → T * F. The value of the attribute T.val at this node is defined by
PRODUCTION     SEMANTIC RULE
T → T1 * F     T.val := T1.val × F.val
When we apply the semantic rule at this node, T1.val has the value 3 from the left child and F.val has the value 5 from the right child. Thus, T.val acquires the value 15 at this node. The rule associated with the production for the starting nonterminal, L → En, prints the value of the expression generated by E.
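Since all the attributes here are synthesized, they can be evaluated by one bottom-up traversal. A minimal Python sketch of such an evaluator for the desk-calculator definition above (the tuple-based tree encoding is an assumption made for illustration):

def val(node):
    # Leaves carry digit.lexval directly as an integer.
    if isinstance(node, int):
        return node
    op, *kids = node
    if op == "+":                  # E -> E1 + T : E.val = E1.val + T.val
        return val(kids[0]) + val(kids[1])
    if op == "*":                  # T -> T1 * F : T.val = T1.val * F.val
        return val(kids[0]) * val(kids[1])
    if op == "()":                 # F -> (E)    : F.val = E.val
        return val(kids[0])
    raise ValueError("unknown production")

# Tree for 3 * 5 + 4; L -> En prints E.val at the root: 19.
print(val(("+", ("*", 3, 5), 4)))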

3.2.1 Inherited Attributes

- An inherited attribute is one whose value is defined in terms of attributes at the parent and/or siblings of a node.
- It is used for finding out the context in which a construct appears.
- Inherited attributes are convenient for expressing the dependence of a programming language construct on the context in which it appears.
- For example, we can use an inherited attribute to keep track of whether an identifier appears on the left or right side of an assignment, in order to decide whether the address or the value of the identifier is needed. Although it is always possible to rewrite a syntax-directed definition to use only synthesized attributes, it is often more natural to use a syntax-directed definition with inherited attributes.
- An inherited-attribute system may be replaced by an S-attributed system, but it is more natural to use inherited attributes.
In the following example, an inherited attribute distributes type information to the various identifiers in a declaration.

Example 3.4: A declaration generated by the nonterminal D in the syntax-directed definition below consists of the keyword int or real, followed by a list of identifiers. The nonterminal T has a synthesized attribute type, whose value is determined by the keyword in the declaration. The semantic rule L.in := T.type, associated with production D → TL, sets inherited attribute L.in to the type in the declaration. The rules then pass this type down the parse tree using the inherited attribute L.in. The rules associated with the productions for L call procedure addtype to add the type of each identifier to its entry in the symbol table (pointed to by attribute entry).
D → T L        L.in := T.type
T → real       T.type := real
T → int        T.type := int
L → L1, id     L1.in := L.in; addtype(id.entry, L.in)
L → id         addtype(id.entry, L.in)
- Inherited attributes help to find the context (type, scope, etc.) of a token, e.g., the type of a token, or the scope when the same variable name is used multiple times in a program in different functions.
- Figure 3.6 shows an annotated parse tree for the sentence real id1, id2, id3.
- The value of L.in at the three L-nodes gives the type of the identifiers id1, id2 and id3.
- These values are determined by computing the value of the attribute T.type at the left child of the root and then evaluating L.in top-down at the three L-nodes in the right subtree of the root.
- At each node we also call the procedure addtype to insert into the symbol table the fact that the identifier at the right child of this node has type real.

© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission Fig, 3.6 parse tree with inherited  Another simplification found in syntax attribute in at each node labeled L trees is that chains of single productions may be collapsed. 3.2.2 Construction of Syntax Trees

In this section, we show how syntax-directed definitions can be used to specify the construction of syntax trees and other graphical representations of language constructs.
The use of syntax trees as an intermediate representation allows translation to be decoupled from parsing. Translation routines that are invoked during parsing must live with two kinds of restrictions. First, a grammar that is suitable for parsing may not reflect the natural hierarchical structure of the constructs in the language. For example, a grammar for Fortran may view a subroutine as consisting simply of a list of statements. However, analysis of the subroutine may be easier if we use a tree representation that reflects the nesting of do loops. Second, the parsing method constrains the order in which nodes in a parse tree are considered. This order may not match the order in which information about a construct becomes available. For this reason, compilers for C usually construct syntax trees for declarations.

3.3 Syntax Trees

- An (abstract) syntax tree is a condensed form of parse tree useful for representing language constructs.
- The production S → if B then S1 else S2 might appear in a syntax tree as

Fig. 3.7 Syntax tree

- In a syntax tree, operators and keywords do not appear as leaves, but rather are associated with the interior node that would be the parent of those leaves in the parse tree.
- Another simplification found in syntax trees is that chains of single productions may be collapsed.

Fig. 3.8 Syntax tree for 3 * 5 + 4

Syntax-directed translation can be based on syntax trees as well as parse trees. The approach is the same in each case: we attach attributes to the nodes as in a parse tree.

3.3.1 Constructing Syntax Trees for Expressions

- The construction of a syntax tree for an expression is similar to the translation of the expression into postfix form.
- We construct subtrees for the subexpressions by creating a node for each operator and operand.
- The children of an operator node are the roots of the nodes representing the subexpressions constituting the operands of that operator.
- Each node in a syntax tree can be implemented as a record with several fields.
- In the node for an operator, one field identifies the operator and the remaining fields contain pointers to the nodes for the operands. The operator is often called the label of the node. When used for translation, the nodes in a syntax tree may have additional fields to hold the values (or pointers to values) of attributes attached to the node.
- In this section, we use the following functions to create the nodes of syntax trees for expressions with binary operators. Each function returns a pointer to a newly created node.

1. mknode(op, left, right) creates an operator node with label op and two fields containing pointers to left and right.
2. mkleaf(id, entry) creates an identifier node with label id and a field containing entry, a pointer to the symbol-table entry for the identifier.
3. mkleaf(num, val) creates a number node with label num and a field containing val, the value of the number.

Figure 3.10 contains an S-attributed definition for constructing a syntax tree for an expression containing the operators + and -. It uses the underlying productions of the grammar to schedule the calls of the functions mknode and mkleaf that construct the tree. The synthesized attribute nptr for E and T keeps track of the pointers returned by the function calls.

Example 3.5: The following sequence of function calls creates the syntax tree for the expression a - 4 + c shown in Fig. 3.9. In this sequence, p1, p2, ..., p5 are pointers to nodes, and entry_a and entry_c are pointers to the symbol-table entries for the identifiers a and c, respectively.

p1 := mkleaf(id, entry_a);
p2 := mkleaf(num, 4);
p3 := mknode('-', p1, p2);
p4 := mkleaf(id, entry_c);
p5 := mknode('+', p3, p4);

PRODUCTION   | SEMANTIC RULES
E → E1 + T   | E.nptr := mknode('+', E1.nptr, T.nptr)
E → E1 - T   | E.nptr := mknode('-', E1.nptr, T.nptr)
E → T        | E.nptr := T.nptr
T → (E)      | T.nptr := E.nptr
T → id       | T.nptr := mkleaf(id, id.entry)
T → num      | T.nptr := mkleaf(num, num.val)

Fig. 3.10 Syntax-directed definition for constructing a syntax tree for an expression
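The functions mknode and mkleaf are easy to realize with records and pointers. The following self-contained C sketch shows one possible layout; the field names, the single Node type used for all three node kinds, and the postfix printer are assumptions of this example, not part of the definition above.

#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int op;                 /* label: an operator, 'i' for id, 'n' for num */
    struct Node *left, *right;
    const char *entry;      /* symbol-table pointer (id leaves only) */
    int val;                /* numeric value (num leaves only) */
} Node;

static Node *mknode(int op, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    n->op = op; n->left = l; n->right = r; n->entry = NULL; n->val = 0;
    return n;
}
static Node *mkleaf_id(const char *entry) {
    Node *n = mknode('i', NULL, NULL); n->entry = entry; return n;
}
static Node *mkleaf_num(int val) {
    Node *n = mknode('n', NULL, NULL); n->val = val; return n;
}

static void print_postfix(Node *n) {     /* postfix walk to check the tree */
    if (!n) return;
    print_postfix(n->left); print_postfix(n->right);
    if (n->op == 'i') printf("%s ", n->entry);
    else if (n->op == 'n') printf("%d ", n->val);
    else printf("%c ", n->op);
}

int main(void) {
    Node *p1 = mkleaf_id("a");            /* the calls of Example 3.5 */
    Node *p2 = mkleaf_num(4);
    Node *p3 = mknode('-', p1, p2);
    Node *p4 = mkleaf_id("c");
    Node *p5 = mknode('+', p3, p4);
    print_postfix(p5);                    /* prints: a 4 - c + */
    printf("\n");
    return 0;
}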

Fig. 3.9 Syntax tree for a-4+c

- The tree is constructed bottom up.
- The function calls mkleaf(id, entry_a) and mkleaf(num, 4) construct the leaves for a and 4; the pointers to these nodes are saved using p1 and p2.
- The call mknode('-', p1, p2) then constructs the interior node with the leaves for a and 4 as children. After two more steps, p5 is left pointing to the root.

3.3.2 A Syntax-Directed Definition for Constructing Syntax Trees

Example 3.6: An annotated parse tree depicting the construction of a syntax tree for the expression a-4+c is shown in Fig. 3.11; the parse tree is shown dotted. The parse-tree nodes labeled by the nonterminals E and T use the synthesized attribute nptr to hold a pointer to the syntax-tree node for the expression represented by the nonterminal.

The semantic rules associated with the productions T → id and T → num define attribute T.nptr to be a pointer to a new leaf for an identifier and a number, respectively. Attributes id.entry and num.val are the lexical values assumed to be returned by the lexical analyzer with the tokens id and num. When an expression E is a single term, corresponding to a use of the production E → T, the attribute E.nptr gets the value of T.nptr. When the semantic rule E.nptr := mknode('-', E1.nptr, T.nptr) associated with the production E → E1 - T is invoked, previous rules have set E1.nptr and T.nptr to be pointers to the leaves for a and 4, respectively.

In interpreting Fig. 3.11, it is important to realize that the lower tree, formed from records, is a "real" syntax tree that constitutes the output, while the dotted tree above it is the parse tree, which may exist only in a figurative sense. In the next section, we show how an S-attributed definition can be simply implemented using the stack of a bottom-up parser to keep track of attribute values.

Fig. 3.11 Construction of a syntax tree for a-4+c

3.3.3 Directed Acyclic Graphs for Expressions

- A directed acyclic graph (dag) for an expression identifies the common subexpressions in the expression.
- Like a syntax tree, a dag has a node for every subexpression of the expression; an interior node represents an operator and its children represent its operands.
- The difference is that a node in a dag representing a common subexpression has more than one "parent"; in a syntax tree, the common subexpression would be represented as duplicated subtrees.
- When constructing a dag, we make a leaf or interior node only if it is not already present; otherwise we return a pointer to the existing node.
- In the dag for the expression a + a*(b - c) + (b - c)*d, the leaf for a has two parents, because a is common to the two subexpressions a and a*(b - c). Likewise, both occurrences of the common subexpression b - c are represented by the same node, which also has two parents.

Fig. 3.12 Dag for the expression a + a*(b - c) + (b - c)*d

- The syntax-directed definition of Fig. 3.10 will construct a dag instead of a syntax tree if we modify the operations for constructing nodes.
- A dag is obtained if the function constructing a node first checks whether an identical node already exists.
- For example, before constructing a new node with label op and fields with pointers to left and right, mknode(op, left, right) can check whether such a node has already been constructed. If so, mknode(op, left, right) can return a pointer to the previously constructed node. The leaf-constructing function mkleaf can behave similarly.

Example 3.7: The sequence of instructions in Fig. 3.13 constructs the dag in Fig. 3.12, provided mknode and mkleaf create new nodes only when necessary, returning pointers to existing nodes with the correct label and children whenever possible. In Fig. 3.13, a, b, c and d point to the symbol-table entries for identifiers a, b, c, and d.

(1)  p1  := mkleaf(id, a);
(2)  p2  := mkleaf(id, a);
(3)  p3  := mkleaf(id, b);
(4)  p4  := mkleaf(id, c);
(5)  p5  := mknode('-', p3, p4);
(6)  p6  := mknode('*', p2, p5);
(7)  p7  := mknode('+', p1, p6);
(8)  p8  := mkleaf(id, b);
(9)  p9  := mkleaf(id, c);
(10) p10 := mknode('-', p8, p9);
(11) p11 := mkleaf(id, d);
(12) p12 := mknode('*', p10, p11);
(13) p13 := mknode('+', p7, p12);

Fig. 3.13 Instructions for constructing the dag

When the call mkleaf(id, a) is repeated on line 2, the node constructed by the previous call mkleaf(id, a) is returned, so p1 = p2. Similarly, the nodes returned on lines 8 and 9 are the same as those returned on lines 3 and 4, respectively. Hence, the node returned on line 10 must be the same one constructed by the call of mknode on line 5.

In many applications, nodes are implemented as records stored in an array, as in Fig. 3.14; each record has a label field that determines the nature of the node. We can refer to a node by its index or position in the array. The integer index of a node is often called a value number, for historical reasons. For example, using value numbers, we can say node 3 has label +, its left child is node 1, and its right child is node 2. The following algorithm can be used to create nodes for a dag representation of an expression.

Fig. 3.14 Nodes in a dag for i := i + 10, allocated from an array

Algorithm 3.1 Value-number method for constructing a node in a dag.

Suppose that nodes are stored in an array, as in Fig. 3.14, and that each node is referred to by its value number. Let the signature of an operator node be the triple <op, l, r> consisting of its label op, left child l, and right child r.

Input: Label op, node l, and node r.

Output: A node with signature <op, l, r>.

Method: Search the array for a node m with label op, left child l, and right child r. If there is such a node, return m; otherwise, create a new node n with label op, left child l, right child r, and return n.

- An obvious way to determine whether node m is already in the array is to keep all previously created nodes on a list and to check each node on the list to see if it has the desired signature.
- The search for m can be made more efficient by using k lists, called buckets, and using a hashing function h to determine which bucket to search.
- The hash function h computes the number of a bucket from the values of op, l and r. It will always return the same bucket number, given the same arguments. If m is not in the bucket h(op, l, r), then a new node n is created and added to this bucket, so subsequent searches will find it there.
- Several signatures may hash into the same bucket number, but in practice we expect each bucket to contain a small number of nodes. Each bucket can be implemented as a linked list, as shown in Fig. 3.15. Each cell in a linked list represents a node. The bucket headers, consisting of pointers to the first cell in a list, are stored in an array. The bucket number returned by h(op, l, r) is an index into this array of bucket headers.
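A compact C rendering of Algorithm 3.1 follows. It keeps nodes in an array indexed by value number and uses small fixed-size buckets instead of linked lists; the array bounds, the particular hash function, and the encoding of leaves are assumptions of this sketch.

#include <stdio.h>

#define MAXNODES 128
#define NBUCKETS 16

/* Signature <op, l, r>; leaves encode an identifier index in l, r = -1. */
typedef struct { int op, l, r; } Sig;

static Sig nodes[MAXNODES];
static int nnodes = 0;
static int bucket[NBUCKETS][MAXNODES];   /* value numbers, grouped by hash */
static int bcount[NBUCKETS];

static int h(int op, int l, int r) {
    unsigned u = (unsigned)op * 31u + (unsigned)l * 37u + (unsigned)r * 41u;
    return (int)(u % NBUCKETS);
}

/* Return the value number of <op,l,r>, creating a node only if absent. */
static int vnode(int op, int l, int r) {
    int b = h(op, l, r);
    for (int i = 0; i < bcount[b]; i++) {
        int m = bucket[b][i];
        if (nodes[m].op == op && nodes[m].l == l && nodes[m].r == r)
            return m;                    /* reuse: common subexpression */
    }
    nodes[nnodes].op = op; nodes[nnodes].l = l; nodes[nnodes].r = r;
    bucket[b][bcount[b]++] = nnodes;
    return nnodes++;
}

int main(void) {
    enum { A, B, C, D };                 /* identifier indices */
    int a   = vnode('i', A, -1);
    int bc1 = vnode('-', vnode('i', B, -1), vnode('i', C, -1));
    int bc2 = vnode('-', vnode('i', B, -1), vnode('i', C, -1));
    printf("b-c built twice, same value number: %d == %d\n", bc1, bc2);
    printf("dag root: %d\n",
           vnode('+', vnode('+', a, vnode('*', a, bc1)),
                      vnode('*', bc2, vnode('i', D, -1))));
    return 0;
}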

© Copyright Reserved by Gateflix.in No part of this material should be copied or reproduced without permission evaluate inherited attributes of m; dfvisit(m) end; evaluate synthesized attributes of n end Fig. 3.16 Depth-first evaluation order for attributes at the nodes of a parse tree

Fig.3.15 Data structure for searching We now introduce a class of syntax- buckets directed definitions, called L-attributed - In many compilers, nodes are allocated definitions, whose attributes can always be as they are needed, to avoid pre- evaluated in depth-first order. (The L is for allocating an array that may hold too "left,” because attribute information appears many nodes most of the time and not to flow from left to right.) Implementation enough nodes some of the time. of progressively larger classes of L.- - In this case, we cannot assume that attributed definitions is covered in the next nodes are in sequential storage, so we three sections of this chapter. L-attributed have to use pointers to refer to nodes. definitions include all syntax-directed" - If the hash function can be made to definitions based on LL(1) grammars; compute the bucket number from a 3.3.5 L-Attributed Definitions label and painters to children, then we can use pointers to nodes instead of A syntax-directed definition is L-attributed value numbers, Otherwise, if each inherited attribute of Xj, l  j n, on we can number the nodes in any way the right side of A X , X ...... X , depends and use this number as the value 1 2 n number of the node. only on - Dags can also be used to represent sets 1. the attributes of the symbols X1, X2 ..... of expressions, since a dag can have Xj-2 to the left of Xj in the production and more than one root 2. the inherited attributes of A. Note that every S-attributed definition is L- 3.3.4 L-Attributed Definitions attributed, because the restrictions (1) and (2) apply only to inherited attributes. - When translation takes place during parsing, the order of evaluation of Example 3.8: The syntax-directed attributes is linked to the order in definition in Fig.3.18 is not I.-attributed which nodes of a parse tree are because the inherited attribute Q.i of the "created" by the parsing method. grammar symbol Q depends on the - A natural order that characterizes many attribute R.s of the grammar symbol to its top-down and bottom-up translation right methods is depth- first order. - A class of syntax-directed definitions, 3.3.6 Bottom-Up Evaluation of S- called L-attributed definitions, whose Attributed Definitions attributes can always be evaluated in  Can be evaluated while parsing depth - first order.  Whenever reduction is made, value of Procedure fvisit(n: node); new synthesized attribute is computed begin from the attributes on the stack for each child m of n, Train left to right  The parser can keep the values of the do begin synthesized attributes associated with the grammar symbols on its stack.

Whenever a reduction is made, the values of the new synthesized attributes are computed from the attributes appearing on the stack for the grammar symbols on the right side of the reducing production. The following figure shows a parser stack with an extra field for attribute values.

State stack | Value stack
...         | ...
X           | X.x
Y           | Y.y
Z           | Z.z
...         | ...

Fig. 3.17 Parser stack with a field for synthesized attributes

- The current top of the stack is indicated by the pointer top.
- We assume that synthesized attributes are evaluated just before each reduction. Suppose the semantic rule A.a := f(X.x, Y.y, Z.z) is associated with the production A → X Y Z.
- Before X Y Z is reduced to A, the value of the attribute Z.z is in val[top], that of Y.y in val[top-1], and that of X.x in val[top-2].
- If a symbol has no attribute, then the corresponding entry is undefined.
- After the reduction, top is decremented by 2, the state covering A is put in state[top] (i.e., where X was), and the value of the synthesized attribute A.a is put in val[top].

Example: Consider the syntax-directed definition of the desk calculator:

Production  | Code fragment
L → E n     | print(val[top])
E → E1 + T  | val[ntop] := val[top-2] + val[top]
E → T       |
T → T1 * F  | val[ntop] := val[top-2] × val[top]
T → F       |
F → (E)     | val[ntop] := val[top-1]
F → digit   | F.val := digit.lexval

The following table shows the sequence of moves made by the parser on input 3*5+4n. The contents of the state and val fields of the parsing stack are shown after each move.

Input  | State | Val    | Production used
3*5+4n |       |        |
*5+4n  | 3     | 3      |
*5+4n  | F     | 3      | F → digit
*5+4n  | T     | 3      | T → F
5+4n   | T*    | 3 -    |
+4n    | T*5   | 3 - 5  |
+4n    | T*F   | 3 - 5  | F → digit
+4n    | T     | 15     | T → T*F
+4n    | E     | 15     | E → T
4n     | E+    | 15 -   |
n      | E+4   | 15 - 4 |
n      | E+F   | 15 - 4 | F → digit
n      | E+T   | 15 - 4 | T → F
n      | E     | 19     | E → E+T
       | En    | 19 -   |
       | L     | 19     | L → En

- Consider the sequence of events on seeing the input symbol 3. In the first move, the parser shifts the state corresponding to the token digit (whose attribute value is 3) onto the stack.
- On the second move the parser reduces by the production F → digit and implements the semantic rule F.val := digit.lexval.
- On the third move the parser reduces by T → F. No code fragment is associated with this production, so the val array is left unchanged.
- Note that after each reduction the top of the val stack contains the attribute value associated with the left side of the reducing production.
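The val-stack discipline can be mimicked directly in code. The sketch below evaluates single-digit expressions over * and + with the calculator's precedences by keeping a stack of attribute values and reducing eagerly, in analogy with the table above; it is not an actual LR parser, and the reduce-on-lower-precedence strategy is an assumption of the sketch.

#include <stdio.h>
#include <ctype.h>

/* Evaluate a string of single digits, '*' and '+' the way the desk
   calculator's reductions do: a value stack plays the role of val[],
   and each "reduction" pops operands and pushes the new attribute. */
static int eval(const char *s) {
    int val[32], nval = 0;      /* the val stack of Fig. 3.17 */
    char op[32]; int nop = 0;   /* pending operators */
    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            val[nval++] = *s - '0';          /* F -> digit: shift value */
        } else {
            /* reduce T -> T * F while higher-precedence '*' is pending */
            while (nop > 0 && op[nop-1] == '*' && *s == '+') {
                nval--; val[nval-1] *= val[nval]; nop--;
            }
            op[nop++] = *s;
        }
    }
    while (nop > 0) {                        /* remaining reductions */
        nop--; nval--;
        if (op[nop] == '*') val[nval-1] *= val[nval];
        else                val[nval-1] += val[nval];
    }
    return val[0];
}

int main(void) {
    printf("%d\n", eval("3*5+4"));           /* prints 19 */
    return 0;
}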

3.4 Translation Schemes

- A translation scheme is a context-free grammar in which attributes are associated with the grammar symbols and semantic actions, enclosed between braces { }, occur within the right sides of productions. We shall use translation schemes in this chapter as a useful notation for specifying translation during parsing.

When designing a translation scheme, we must observe some restrictions to ensure that an attribute value is available when an action refers to it. These restrictions, motivated by L-attributed definitions, ensure that an action does not refer to an attribute that has not yet been computed.

The easiest case occurs when only synthesized attributes are needed. For this case, we can construct the translation scheme by creating an action consisting of an assignment for each semantic rule, and placing this action at the end of the right side of the associated production. For example, the production and semantic rule

T → T1 * F      T.val := T1.val × F.val

yield the following production and semantic action:

T → T1 * F {T.val := T1.val × F.val}

If we have both inherited and synthesized attributes, we must be more careful:

1. An inherited attribute for a symbol on the right side of a production must be computed in an action before that symbol.
2. An action must not refer to a synthesized attribute of a symbol to the right of the action.
3. A synthesized attribute for the nonterminal on the left can only be computed after all the attributes it references have been computed. The action computing such attributes can usually be placed at the end of the right side of the production.

In the next two sections, we show how a translation scheme satisfying these three requirements can be implemented by generalizations of top-down and bottom-up parsers. As an example, the following translation scheme maps infix expressions to postfix:

E → T R
R → addop T {print(addop.lexeme)} R1 | ε
T → num {print(num.val)}

Figure 3.20 shows the parse tree for the input 9-5+2 with each semantic action attached as the appropriate child of the node corresponding to the left side of its production. In effect, we treat actions as though they are terminal symbols, a viewpoint that is a convenient mnemonic for establishing when the actions are to be executed. We have taken the liberty of showing the actual numbers and additive operator in place of the tokens num and addop. When performed in depth-first order, the actions in Fig. 3.20 print the postfix output 9 5 - 2 +.

Fig. 3.20 Parse tree for 9-5+2, with actions shown as leaves
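Because this scheme is L-attributed and its grammar is LL(1), it can be run directly by a recursive-descent parser in which each embedded action becomes a statement at the corresponding point. The sketch below makes the usual simplifying assumptions: single-digit operands and a global input cursor in place of a lexical analyzer.

#include <stdio.h>
#include <ctype.h>

static const char *lookahead;

static void T(void) {
    if (isdigit((unsigned char)*lookahead))
        printf("%c", *lookahead++);          /* T -> num {print(num.val)} */
}

static void R(void) {
    if (*lookahead == '+' || *lookahead == '-') {
        char op = *lookahead++;
        T();
        printf("%c", op);       /* R -> addop T {print(addop.lexeme)} R1 */
        R();
    }                           /* R -> epsilon */
}

int main(void) {
    lookahead = "9-5+2";
    T(); R();                   /* E -> T R; prints 95-2+ */
    printf("\n");
    return 0;
}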

The following translation scheme does not satisfy the first of these three requirements:

S → A1 A2 {A1.in := 1; A2.in := 2}
A → a {print(A.in)}

We find that the inherited attribute A.in in the second production is not defined when an attempt is made to print its value during a depth-first traversal of the parse tree for the input string aa. That is, a depth-first traversal starts at S and visits the subtrees for A1 and A2 before the values of A1.in and A2.in are set. If the action defining the values of A1.in and A2.in is embedded before the A's on the right side of S → A1 A2, instead of after, then A.in will be defined each time print(A.in) occurs.

It is always possible to start with an L-attributed syntax-directed definition and construct a translation scheme that satisfies the three requirements above. The next example illustrates this construction. Given the input

E sub 1 .val

EQN places E, 1, and .val in the relative positions and sizes shown in Fig. 3.19. Notice that the subscript 1 is printed in a smaller size and font, and is moved down relative to E and .val.

Fig. 3.19 Syntax-directed placement of boxes

Production     | Semantic rules
S → B          | B.ps := 10; S.ht := B.ht
B → B1 B2      | B1.ps := B.ps; B2.ps := B.ps; B.ht := max(B1.ht, B2.ht)
B → B1 sub B2  | B1.ps := B.ps; B2.ps := shrink(B.ps); B.ht := disp(B1.ht, B2.ht)
B → text       | B.ht := text.h × B.ps

Fig. 3.20 Syntax-directed definition for size and height of boxes

After putting actions in the right place, we obtain:

S → {B.ps := 10}
    B {S.ht := B.ht}
B → {B1.ps := B.ps}
    B1 {B2.ps := B.ps}
    B2 {B.ht := max(B1.ht, B2.ht)}
B → {B1.ps := B.ps}
    B1 sub {B2.ps := shrink(B.ps)}
    B2 {B.ht := disp(B1.ht, B2.ht)}
B → text {B.ht := text.h × B.ps}

Fig. 3.21 Translation scheme constructed from Fig. 3.20, following the three requirements given above

For readability, each grammar symbol in a production is written on a separate line and actions are shown to the right. Note that the actions setting the inherited attributes B1.ps and B2.ps appear just before B1 and B2 on the right sides of productions.

Differences between S-attributed and L-attributed definitions:

S-attributed                                | L-attributed
1. Uses synthesized attributes only         | 1. Uses both synthesized and inherited attributes
2. Semantic actions can be placed only at   | 2. Each inherited attribute is restricted to inherit
   the end of rules                         |    from the parent or a left sibling
3. Attributes can be evaluated during       | 3. Semantic actions can be placed anywhere on the
   bottom-up parsing                        |    right-hand side
4. Both bottom-up and top-down parsing      | 4. Attributes are evaluated by traversing the parse
   play a role                              |    tree depth-first, left to right

Bottom-Up Evaluation of Inherited Attributes

In this section, we present a method to implement L-attributed definitions in the framework of bottom-up parsing. The method is capable of handling all L-attributed definitions considered in the previous section, in that it can implement any L-attributed definition based on an LL(1) grammar. It can also implement

many (but not all) L-attributed definitions based on LR(1) grammars. The method is a generalization of the bottom-up translation technique.

3.4.1 Removing Embedded Actions from Translation Schemes

So far we have needed to embed actions at various places within the right side of a production. To begin our discussion of how inherited attributes can be handled bottom up, we introduce a transformation that makes all embedded actions in a translation scheme occur at the right ends of their productions. The transformation inserts new marker nonterminals generating ε into the base grammar: we replace each embedded action by a distinct marker nonterminal M and attach the action to the end of the production M → ε. For example, the translation scheme

E → T R
R → + T {print('+')} R | - T {print('-')} R | ε
T → num {print(num.val)}

is transformed using marker nonterminals M and N into

E → T R
R → + T M R | - T N R | ε
T → num {print(num.val)}
M → ε {print('+')}
N → ε {print('-')}

The grammars in the two translation schemes accept exactly the same language and, by drawing a parse tree with extra nodes for the actions, we can show that the actions are performed in the same order. Actions in the transformed translation scheme terminate productions, so they can be performed just before the right side is reduced during bottom-up parsing.

3.4.2 Inheriting Attributes on the Parser Stack

A bottom-up parser reduces the right side of production A → X Y by removing X and Y from the top of the parser stack and replacing them by A. Suppose X has a synthesized attribute X.s. Since the value of X.s is already on the parser stack before any reductions take place in the subtree below Y, this value can be inherited by Y. That is, if the inherited attribute Y.i is defined by the copy rule Y.i := X.s, then the value X.s can be used where Y.i is called for.

Example: real p, q, r. We show how the value of attribute T.type can be accessed when the productions for L are applied. The translation scheme we wish to implement is

D → T {L.in := T.type} L
T → int {T.type := integer}
T → real {T.type := real}
L → {L1.in := L.in} L1 , id {addtype(id.entry, L.in)}
L → id {addtype(id.entry, L.in)}

Fig. 3.23 At each node for L, L.in = T.type

If we ignore the actions in the above translation scheme, the sequence of moves made by the parser on the input of Fig. 3.23 is as in Fig. 3.24. For clarity, we show the corresponding grammar symbol instead of a stack state and the actual identifier instead of the token id.

INPUT        | STACK    | PRODUCTION USED
real p, q, r |          |
p, q, r      | real     |
p, q, r      | T        | T → real
, q, r       | T p      |
, q, r       | T L      | L → id
q, r         | T L ,    |
, r          | T L , q  |
, r          | T L      | L → L , id
r            | T L ,    |
             | T L , r  |
             | T L      | L → L , id
             | D        | D → T L

Fig. 3.24

The parser stack is implemented as a pair of arrays, state and val. If state[i] is for grammar symbol X, then val[i] holds a synthesized attribute X.s. The contents of the state array are shown in Fig. 3.24. Note that every time the right side of a production for L is reduced in Fig. 3.24, T is in the stack just below the right side. We can use this fact to access the attribute value T.type.

The implementation in Fig. 3.25 uses the fact that attribute T.type is at a known place in the val stack, relative to the top. Let top and ntop be the indices of the top entry in the stack just before and after a reduction takes place, respectively. From the copy rules defining L.in, we know that T.type can be used in place of L.in. When the production L → id is applied, id.entry is on top of the val stack and T.type is just below it. Hence, addtype(val[top], val[top-1]) is equivalent to addtype(id.entry, T.type). Similarly, since the right side of the production L → L , id has three symbols, T.type appears in val[top-3] when this reduction takes place. The copy rules involving L.in are eliminated because the value of T.type in the stack is used instead.

PRODUCTION  | CODE FRAGMENT
D → T L     |
T → int     | val[ntop] := integer
T → real    | val[ntop] := real
L → L , id  | addtype(val[top], val[top-3])
L → id      | addtype(val[top], val[top-1])

Fig. 3.25 The value of T.type is used in place of L.in

3.4.3 Replacing Inherited by Synthesized Attributes

- It is sometimes possible to avoid the use of inherited attributes by changing the underlying grammar.

For example, a declaration in Pascal can consist of a list of identifiers followed by a type, e.g., m, n : integer. A grammar for such declarations may include productions of the form

D → L : T
T → integer | char
L → L , id | id

Since the identifiers are generated by L but the type is not in the subtree for L, we cannot associate the type with an identifier using synthesized attributes alone. In fact, if nonterminal L inherits a type from T to its right in the first production, we get a syntax-directed definition that is not L-attributed, so translations based on it cannot be done during parsing.

A solution to this problem is to restructure the grammar to include the type as the last element of the list of identifiers:

D → id L
L → , id L | : T
T → integer | char

Now, the type can be carried along as a synthesized attribute L.type. As each identifier is generated by L, its type can be entered into the symbol table.

3.4.4 A Difficult Syntax-Directed Definition

Algorithm 3.3, for implementing inherited attributes during bottom-up parsing, extends to some, but not all, LR grammars. The L-attributed definition in Fig. 3.26 is based on a simple LR(1) grammar, but it cannot be implemented during LR parsing. Nonterminal L in L → ε inherits the count of the number of 1's generated by S. Since the production L → ε is the first that a bottom-up parser would reduce by, the translator at that time cannot know the number of 1's in the input.

PRODUCTION | SEMANTIC RULES
S → L      | L.count := 0
L → L1 1   | L1.count := L.count + 1
L → ε      | print(L.count)

Fig. 3.26 Difficult syntax-directed definition
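Returning to Fig. 3.25, the offsets val[top-1] and val[top-3] can be watched in action with a small simulation. The sketch below drives the exact reduction sequence of Fig. 3.24 for the input real p, q, r by hand; the tagged stack representation and the scripted moves are assumptions of the example, standing in for a real LR driver.

#include <stdio.h>

/* One val-stack slot: either a type code or an identifier name. */
typedef struct { int type; const char *id; } Val;

static Val val[16];
static int top = -1;     /* index of top entry, as in Fig. 3.25 */

static void push_id(const char *id)  { ++top; val[top].id = id; val[top].type = 0; }
static void push_type(int t)         { ++top; val[top].type = t; val[top].id = 0; }
static void pop(int n)               { top -= n; }

static void addtype(Val id, Val type) {
    printf("%s : %s\n", id.id, type.type == 1 ? "real" : "int");
}

int main(void) {
    /* input: real p, q, r -- moves of Fig. 3.24, actions of Fig. 3.25 */
    push_type(1);                      /* T -> real: val[ntop] := real */
    push_id("p");
    addtype(val[top], val[top-1]);     /* L -> id: T.type is at top-1  */
    pop(1); push_id("L");              /* slot for L itself is unused  */
    push_id(","); push_id("q");
    addtype(val[top], val[top-3]);     /* L -> L , id: T.type at top-3 */
    pop(3); push_id("L");
    push_id(","); push_id("r");
    addtype(val[top], val[top-3]);     /* L -> L , id again            */
    pop(3); push_id("L");
    return 0;
}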

3.5 Top-Down Translation

In this section, L-attributed definitions will be implemented during predictive parsing. We work with translation schemes rather than syntax-directed definitions, so that we can be explicit about the order in which actions and attribute evaluations take place. We also extend the algorithm for left-recursion elimination to translation schemes with synthesized attributes.

3.5.1 Eliminating Left Recursion from a Translation Scheme

Since most arithmetic operators associate to the left, it is natural to use left-recursive grammars for expressions. We now extend the algorithm for eliminating left recursion to allow for attributes when the underlying grammar of a translation scheme is transformed. The transformation applies to translation schemes with synthesized attributes. The next example motivates the transformation.

E → E1 + T {E.val := E1.val + T.val}
E → E1 - T {E.val := E1.val - T.val}
E → T {E.val := T.val}
T → (E) {T.val := E.val}
T → num {T.val := num.val}

Fig. 3.27 Translation scheme with left-recursive grammar

Example 3.10: The transformed scheme below produces the annotated parse tree of Fig. 3.29 for the expression 9-5+2.

E → T {R.i := T.val}
    R {E.val := R.s}
R → + T {R1.i := R.i + T.val}
    R1 {R.s := R1.s}
R → - T {R1.i := R.i - T.val}
    R1 {R.s := R1.s}
R → ε {R.s := R.i}
T → (E) {T.val := E.val}
T → num {T.val := num.val}

Fig. 3.28 Transformed translation scheme with right-recursive grammar

In Fig. 3.29, the individual numbers are generated by T, and T.val takes its value from the lexical value of the number, given by attribute num.val. The 9 in the subexpression 9-5 is generated by the leftmost T, but the minus operator and 5 are generated by the R at the right child of the root. The inherited attribute R.i obtains the value 9 from T.val. The subtraction 9-5 and the passing of the result 4 down to the middle node for R are done by embedding the following action between T and R1 in R → - T R1:

{R1.i := R.i - T.val}

A similar action adds 2 to the value of 9-5, yielding the result 6 at the bottom node for R. The result is needed at the root as the value of E.val; the synthesized attribute s for R, not shown in Fig. 3.29, is used to copy the result up to the root. The arrows in the figure suggest a way of determining the value of the expression.

Fig. 3.29 Evaluation of the expression 9-5+2

For top-down parsing, we can assume that an action is executed at the time that a symbol in the same position would be expanded. Thus, in the second production in Fig. 3.28, the first action (the assignment to R1.i) is done after T has been fully expanded to terminals, and the second action is done after R1 has been fully expanded.
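In a predictive parser, the inherited attribute R.i naturally becomes a function parameter and the synthesized attribute R.s becomes the return value. The following sketch evaluates 9-5+2 with the scheme of Fig. 3.28; single-digit operands and a global cursor are simplifying assumptions.

#include <stdio.h>
#include <ctype.h>

static const char *lookahead;

/* T -> num {T.val := num.val}, restricted to single digits here. */
static int T(void) {
    return isdigit((unsigned char)*lookahead) ? *lookahead++ - '0' : 0;
}

/* R(i) receives the inherited attribute R.i and returns R.s. */
static int R(int i) {
    if (*lookahead == '+' || *lookahead == '-') {
        char op = *lookahead++;
        int t = T();
        return R(op == '+' ? i + t : i - t);  /* {R1.i := R.i +/- T.val} */
    }
    return i;                                 /* R -> epsilon {R.s := R.i} */
}

int main(void) {
    lookahead = "9-5+2";
    int t = T();                              /* E -> T {R.i := T.val} R */
    printf("%d\n", R(t));                     /* E.val = 6 */
    return 0;
}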

An inherited attribute of a symbol must be computed by an action appearing before the symbol, and a synthesized attribute of the nonterminal on the left must be computed after all the attributes it depends on have been computed.

In order to adapt other left-recursive translation schemes for predictive parsing, we shall express the use of the attributes R.i and R.s in Fig. 3.28 more abstractly. Suppose we have the following translation scheme:

A → A1 Y {A.a := g(A1.a, Y.y)}
A → X {A.a := f(X.x)}

Each grammar symbol has a synthesized attribute written using the corresponding lower-case letter, and f and g are arbitrary functions. The generalization to additional A-productions and to strings in place of the symbols X and Y can be done as below. The algorithm for eliminating left recursion constructs the following grammar:

A → X R
R → Y R | ε

Taking the semantic actions into account, the transformed scheme becomes:

A → X {R.i := f(X.x)}
    R {A.a := R.s}
R → Y {R1.i := g(R.i, Y.y)}
    R1 {R.s := R1.s}
R → ε {R.s := R.i}

The transformed scheme uses attributes i and s for R, as in Fig. 3.28. To see why the results are the same, consider the two annotated parse trees in Fig. 3.30. The value of R.i at the bottom is passed up unchanged as R.s, and it becomes the correct value of A.a at the root (R.s is not shown in Fig. 3.30(b)).

Fig. 3.30 Two ways of computing an attribute value

Example 3.11: If the syntax-directed definition for constructing syntax trees (Fig. 3.10) is converted into a translation scheme, then the productions and semantic actions for E become:

E → E1 + T {E.nptr := mknode('+', E1.nptr, T.nptr)}
E → E1 - T {E.nptr := mknode('-', E1.nptr, T.nptr)}
E → T {E.nptr := T.nptr}

When left recursion is eliminated from this translation scheme, nonterminal E corresponds to A, the strings +T and -T in the first two productions correspond to Y, and nonterminal T in the third production corresponds to X. The transformed translation scheme is shown in Fig. 3.31.

E → T {R.i := T.nptr}
    R {E.nptr := R.s}
R → + T {R1.i := mknode('+', R.i, T.nptr)}
    R1 {R.s := R1.s}
R → - T {R1.i := mknode('-', R.i, T.nptr)}
    R1 {R.s := R1.s}
R → ε {R.s := R.i}
T → (E) {T.nptr := E.nptr}
T → id {T.nptr := mkleaf(id, id.entry)}
T → num {T.nptr := mkleaf(num, num.val)}

Fig. 3.31 Transformed translation scheme for constructing syntax trees

Figure 3.32 shows how the actions in Fig. 3.31 construct a syntax tree for a-4+c. Synthesized attributes are shown to the right of the node for a grammar symbol, and inherited attributes are shown to the left. A leaf in the syntax tree is constructed by actions associated with the productions T → id and T → num. At the leftmost T, attribute T.nptr points to the leaf for a. A pointer to the node for a is inherited as attribute R.i on the right side of E → T R.

When the production R → - T R1 is applied at the right child of the root, R.i points to the node for a, and T.nptr to the node for 4. The node for a-4 is constructed by applying mknode to the minus operator and these pointers. Finally, when production R → ε is applied, R.i points to the root of the entire syntax tree. The entire tree is returned through the s attributes of the nodes for R (not shown in the figure) until it becomes the value of E.nptr.

Fig. 3.32 Use of inherited attributes to construct syntax trees

3.5.2 Design of a Predictive Translator

The following algorithm generalizes the construction of predictive parsers to implement a translation scheme based on a grammar suitable for top-down parsing.

Algorithm: Construction of a predictive syntax-directed translator.

Input: A syntax-directed translation scheme with an underlying grammar suitable for predictive parsing.

Output: Code for a syntax-directed translator.

Method: The technique is a modification of the predictive-parser construction.

1. For each nonterminal A, construct a function that has a formal parameter for each inherited attribute of A and that returns the values of the synthesized attributes of A (possibly as a record, as a pointer to a record with a field for each attribute, or using the call-by-reference mechanism for passing parameters). For simplicity, we assume that each nonterminal has just one synthesized attribute. The function for A has a local variable for each attribute of each grammar symbol that appears in a production for A.
2. The function for nonterminal A decides what production to use based on the current input symbol.
3. The code associated with each production does the following. We consider the tokens, nonterminals, and actions on the right side of the production from left to right.
   i. For token X with synthesized attribute x, save the value of x in the variable declared for X.x. Then generate a call to match token X and advance the input.
   ii. For nonterminal B, generate an assignment c := B(b1, b2, ..., bk) with a function call on the right side, where b1, b2, ..., bk are the variables for the inherited attributes of B and c is the variable for the synthesized attribute of B.
   iii. For an action, copy the code into the parser, replacing each reference to an attribute by the variable for that attribute.

3.5.3 Activation Records

- Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record, consisting of the collection of fields shown in Fig. 3.33.
- Not all languages, nor all compilers, use all of these fields; often registers can take the place of one or more of them.
- For languages like Pascal and C it is customary to push the activation record of a procedure on the run-time stack when the procedure is called, and to pop the activation record off the stack when control returns to the caller.
- The purpose of the fields of an activation record is as follows, starting from the field for temporaries.

1. Temporary values, such as those arising in the evaluation of expressions, are stored in the field for temporaries.
2. The field for local data holds data that is local to an execution of a procedure. The layout of this field is discussed below.
3. The field for saved machine status holds information about the state of the machine just before the procedure is called. This information includes the values of the program counter and machine registers that have to be restored when control returns from the procedure.
4. The optional access link is used to refer to nonlocal data held in other activation records. For a language like Fortran, access links are not needed, because nonlocal data is kept in a fixed place. Access links, or the related "display" mechanism, are needed for Pascal.
5. The optional control link points to the activation record of the caller.
6. The field for actual parameters is used by the calling procedure to supply parameters to the called procedure. We show space for parameters in the activation record, but in practice parameters are often passed in machine registers for greater efficiency.
7. The field for the returned value is used by the called procedure to return a value to the calling procedure. Again, in practice this value is often returned in a register for greater efficiency.

returned value
actual parameters
optional control link
optional access link
saved machine status
local data
temporaries

Fig. 3.33 A general activation record

The sizes of each of these fields can be determined at the time a procedure is called. In fact, the sizes of almost all fields can be determined at compile time. An exception occurs if a procedure may have a local array whose size is determined by the value of an actual parameter, available only when the procedure is called at run time.
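As a concrete (and deliberately simplified) picture, an activation record can be modeled as a C struct on a downward-growing stack; the field types, the fixed array sizes, and the choice of what to store are assumptions of this sketch, since real layouts are target- and compiler-specific.

/* A toy activation record mirroring Fig. 3.33. */
struct activation_record {
    long   returned_value;
    long   actual_params[4];        /* often passed in registers instead */
    struct activation_record *control_link;  /* caller's record          */
    struct activation_record *access_link;   /* enclosing scope's record */
    void  *saved_pc;                /* saved machine status (simplified) */
    long   saved_regs[8];
    long   locals[8];               /* local data                        */
    long   temporaries[8];          /* compiler-generated temporaries    */
};

/* A downward-growing run-time stack: a call pushes a record,
   a return pops it. */
static unsigned char stack_area[1 << 16];
static unsigned char *stack_top = stack_area + sizeof stack_area;

static struct activation_record *push_record(void) {
    stack_top -= sizeof(struct activation_record);
    return (struct activation_record *)stack_top;
}

static void pop_record(void) {
    stack_top += sizeof(struct activation_record);
}

int main(void) {
    struct activation_record *caller = push_record();
    struct activation_record *callee = push_record();
    callee->control_link = caller;  /* dynamic link to the caller        */
    callee->access_link  = caller;  /* static link (same scope here)     */
    pop_record();                   /* return: callee's record is freed  */
    pop_record();
    return 0;
}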

3.6 Run Time Environment

- When a program runs, some data is available only at run time, so we must relate the static text of a program to the actions that must occur at run time to implement the program.
- We need to understand the relationship between names and data objects (addresses and values).
- Allocation and de-allocation are managed by the run-time support package, which consists of routines loaded with the generated target code.
- Each execution of a procedure is referred to as an activation of the procedure.
- If the procedure is recursive, several of its activations may be alive at the same time.

3.6.1 Procedures

- A procedure definition is a declaration that associates an identifier (the procedure name) with a statement (the procedure body).
- When a procedure name appears in an executable statement, the procedure is called at that point.
- Formal parameters are the ones that appear in the declaration. Actual parameters are the ones that appear when the procedure is called.

3.6.2 Activation Trees

- Control flows sequentially.
- Execution of a procedure starts at the beginning of its body, and returns control to the place the procedure was called from.
- A tree, called an activation tree, can be used to depict the way control enters and leaves activations.
- The root represents the activation of the main program; each node represents an activation of a procedure.
- The node a is the parent of b if control flows from a to b; the node a is to the left of node b if the lifetime of a occurs before that of b.
- Each execution of a procedure body is referred to as an activation of the procedure. The lifetime of an activation of a procedure p is the sequence of steps during the execution of the procedure body. If a and b are procedure activations, then their lifetimes are either non-overlapping or nested. A procedure is recursive if a new activation can begin before an earlier activation of the same procedure has ended.

Consider, for example, the activation tree for a program that sorts numbers. The program calls the read function (r), then the quicksort function q(1,9), which in turn calls the partition function p(1,9); partition splits the array, returns index 4, and results in further calls as shown in the tree. The flow of control for the call quicksort(1,9) would be:

- execution begins
- enter readarray
- exit readarray
- enter quicksort(1,9)
- enter partition(1,9)
- exit partition(1,9)
- enter quicksort(1,3)
- exit quicksort(1,3)
- enter quicksort(5,9)
- exit quicksort(5,9)
- exit quicksort(1,9)

3.6.3 Properties of Activation Trees

i. Each node represents an activation of a procedure.
ii. The root represents the activation of the main program.
iii. The node for a is the parent of the node for b if and only if control flows from activation a to b.
iv. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the lifetime of b.

The flow of control in a program corresponds to a depth-first traversal of the activation tree that starts at the root, visits a node before its children, and recursively visits the children at each node in left-to-right order.

3.6.4 Control Stacks

- We can use a stack, called a control stack, to keep track of live procedure activations.
- The idea is to push the node for an activation onto the control stack as the activation begins, and to pop the node when the activation ends.
- Then the contents of the control stack are related to paths to the root of the activation tree: when node n is at the top of the control stack, the stack contains the nodes along the path from n to the root.

3.6.5 The Scope of a Declaration

A declaration associates information with a name, for example, int a. Scope rules determine which declaration of a name

applies when the name appears in the text of a program. The portion of the program to which a declaration applies is called the scope of that declaration. There are two types of scope: local and nonlocal. At compile time, the symbol table can be used to find the declaration that applies to an occurrence of a name; a symbol-table entry is created whenever a declaration is encountered.

3.6.6 Binding of Names

The same name may denote different data objects (storage locations) at run time. The term environment refers to a function that maps a name to a storage location, and the association is referred to as a binding. The state refers to a function that maps a storage location to the value held there. An assignment changes the state but not the environment. A binding is the dynamic counterpart of a declaration. A local name in a procedure can have a different binding in each activation of the procedure.

3.6.7 Static and Dynamic Notions

Static notion               | Dynamic counterpart
Definition of a procedure   | Activations of the procedure
Declaration of a name       | Bindings of the name
Scope of a declaration      | Lifetime of a binding

3.6.8 Storage Organization

Most block-structured languages require manipulation of run-time structures to maintain efficient access to appropriate data and machine resources.

3.6.9 Subdivision of Runtime Memory

Runtime storage can be subdivided to hold:
i. Target code: static (its size can be determined at compile time)
ii. Static data objects: static
iii. Dynamic data objects: heap
iv. Automatic data objects: stack

A typical layout places the code and static data first, with the heap and stack growing toward each other:

Code
Static Data
Heap
  ↓
  ↑
Stack

The advantage of statically allocating a data object is that its address can be compiled into the target code.

Static Allocation

Statically allocated names are bound to storage at compile time. Storage bindings of statically allocated names never change, so even if a name is local to a procedure, it is always bound to the same storage. The compiler uses the type of a name (retrieved from the symbol table) to determine the storage size required. The required number of bytes (possibly aligned) is set aside for the name. The address of the storage is fixed at compile time.

Limitations:
- The size required must be known at compile time.
- Recursive procedures cannot be implemented, as all locals are statically allocated.
- No data structure can be created dynamically, as all data is static.

Stack-Dynamic Allocation

- Storage is organized as a stack.
- Activation records are pushed and popped.
- Locals and parameters are contained in the activation record for the call.
- This means locals are bound to fresh storage on every call.
- If we have a stack growing downwards, we just need a stack_top pointer.

- To allocate a new activation record, we just increase stack_top.
- To deallocate an existing activation record, we just decrease stack_top.

Heap Allocation

Some languages do not have tree-structured allocation. In these cases, activations have to be allocated on the heap. This allows strange situations, like callee activations that live longer than their callers' activations; this is not common. The heap is used for allocating space for objects created at run time, for example the nodes of dynamic data structures such as linked lists and trees.

3.6.10 Parameter Passing Techniques

Call by value:

In call-by-value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function (frequently by copying the value into a new memory region). In this mechanism a function's argument is evaluated before being passed to the function.

Example:
procedure swap(x, y: integer);
var
  temp: integer;
begin
  temp := x;
  x := y;
  y := temp;
end;

NOTE: In Pascal, a parameter is passed by value unless the corresponding formal has the keyword var; an expression can be passed by value but not by reference. In C, parameters are passed by value; passing &x (the address of a variable) passes a pointer by value, which lets the callee simulate call by reference. An expression can be passed by value but not by reference in C.

Call by reference:

In call-by-reference evaluation (also referred to as pass-by-reference), a function receives an implicit reference to a variable used as argument, rather than a copy of its value. This typically means that the function can modify the variable used as argument.

Example:
procedure swap(var x, y: integer);
var
  temp: integer;
begin
  temp := x;
  x := y;
  y := temp;
end;

NOTE: FORTRAN supports only call by reference. If an expression is passed, then it is evaluated and kept in a temporary location. The address of this temporary location is passed to the procedure, and this location may be changed by the procedure.

Call by copy-restore:

Call-by-copy-restore (also called copy-in copy-out, call-by-value-result or call-by-value-return) is a special case of call-by-reference. The formal parameter acts as a local variable which is initialized to the value of the actual parameter. Within the routine, changes to the formal parameter only affect the local copy. On returning from the routine, the final value of the formal parameter is assigned to the actual parameter.

The semantics of call-by-copy-restore also differ from those of call-by-reference when two or more function arguments alias one another, that is, point to the same variable in the caller's environment. Under call-by-reference, writing to one will affect the other; call-by-copy-restore avoids this by giving the function distinct copies, but leaves the result in the caller's environment undefined, depending on which of the aliased arguments is copied back first. The results of call by reference and call by copy-restore also differ when there is an exception during execution of the function. When the reference is passed to the callee uninitialized, this evaluation strategy may be called call-by-result.
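The two swap procedures above can be contrasted directly in C; this is a standard illustration, not code from the text above.

#include <stdio.h>

/* Call by value: the callee gets copies, so the caller's variables
   are unchanged. */
static void swap_value(int x, int y) {
    int temp = x; x = y; y = temp;
}

/* Call by reference, simulated in C by passing pointers by value. */
static void swap_ref(int *x, int *y) {
    int temp = *x; *x = *y; *y = temp;
}

int main(void) {
    int a = 1, b = 2;
    swap_value(a, b);
    printf("after swap_value: a=%d b=%d\n", a, b);  /* a=1 b=2 */
    swap_ref(&a, &b);
    printf("after swap_ref:   a=%d b=%d\n", a, b);  /* a=2 b=1 */
    return 0;
}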

Call by name:

The arguments to a function are not evaluated before the function is called; rather, they are substituted directly into the function body and then left to be evaluated whenever they appear in the function. If an argument is not used in the function body, the argument is never evaluated; if it is used several times, it is re-evaluated each time it appears. If the values of variables in the expression are changed, the change will be seen immediately (similar to call by reference). Algol used the call-by-name parameter-passing mechanism.
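C macros give a rough feel for call-by-name, since the macro argument's text is substituted and re-evaluated at each use; this analogy (and the side-effecting argument) is an illustration only, not a claim that C supports call by name.

#include <stdio.h>

static int n = 0;
static int next(void) { return ++n; }    /* side-effecting "argument" */

/* The argument text EXPR is substituted, so it is re-evaluated
   at each use, as a call-by-name parameter would be. */
#define SUM3(EXPR) ((EXPR) + (EXPR) + (EXPR))

int main(void) {
    printf("%d\n", SUM3(next()));        /* next() runs 3 times: prints 6 */
    return 0;
}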

Call by need:

Call-by-need is a memoized version of call-by-name: if the function argument is evaluated, that value is stored for subsequent uses. In a "pure" (effect-free) setting, this produces the same results as call-by-name; when the function argument is used two or more times, call-by-need is almost always faster.
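A thunk with a cached value captures the difference between the two strategies; the struct layout and the example computation are assumptions of this sketch.

#include <stdio.h>

/* A thunk: a suspended computation plus a one-shot cache. */
typedef struct {
    int (*compute)(void);
    int cached, value;
} Thunk;

static int force(Thunk *t) {             /* call-by-need: evaluate once */
    if (!t->cached) { t->value = t->compute(); t->cached = 1; }
    return t->value;
}

static int expensive(void) {
    puts("evaluating...");               /* visible side effect */
    return 42;
}

int main(void) {
    Thunk t = { expensive, 0, 0 };
    /* The argument is used twice, but "evaluating..." prints only once. */
    printf("%d\n", force(&t) + force(&t));   /* prints 84 */
    return 0;
}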

GATE QUESTIONS

Q.1 In a bottom-up evaluation of a syntax-directed definition, inherited attributes can
a) always be evaluated
b) be evaluated only if the definition is L-attributed
c) be evaluated only if the definition has synthesized attributes
d) never be evaluated
[GATE-2003]

Q.2 Consider the translation scheme shown below:
S → T R
R → + T {print('+');} R | ε
T → num {print(num.val);}
Here, num is a token that represents an integer and num.val represents the corresponding integer value. For the input string '9 + 5 + 2', this translation scheme will print
a) 9 + 5 + 2
b) 9 5 + 2 +
c) 9 5 2 + +
d) + + 9 5 2
[GATE-2003]

Q.3 Consider the syntax-directed definition shown below.
S → id = E   {gen(id.place = E.place;);}
E → E1 + E2  {t = newtemp(); gen(t = E1.place + E2.place;); E.place = t;}
E → id       {E.place = id.place;}
Here, gen is a function that generates the output code, and newtemp is a function that returns the name of a new temporary variable on every call. Assume that ti's are the temporary variable names generated by newtemp. For the statement 'X = Y + Z', the 3-address code sequence generated by this definition is
a) X = Y + Z
b) t1 = Y + Z; X = t1
c) t1 = Y; t2 = t1 + Z; X = t2
d) t1 = Y; t2 = Z; t3 = t1 + t2; X = t3
[GATE-2003]

Q.4 Consider the grammar rule E → E1 - E2 for arithmetic expressions. The code generated is targeted to a CPU having a single user register. The subtraction operation requires the first operand to be in the register. If E1 and E2 do not have any common sub-expression, in order to get the shortest possible code
a) E1 should be evaluated first
b) E2 should be evaluated first
c) Evaluation of E1 and E2 should necessarily be interleaved
d) Order of evaluation of E1 and E2 is of no consequence
[GATE-2004]

Q.5 Consider the following translation scheme:
S → E R
R → * E {print('*');} R | ε
E → F + E {print('+');} | F
F → (S) | id {print(id.value);}
Here, id is a token that represents an integer and id.value represents the corresponding integer value. For an input '2*3+4', this translation scheme prints
a) 2 * 3 + 4
b) 2 * + 3 4
c) 2 3 * 4 +
d) 2 3 4 + *
[GATE-2006]

Q.6 In a simplified computer, the instructions are:
OP Ri, Rj - Performs Ri OP Rj and stores the result in register Rj.
OP m, Rj - Performs val OP Rj and stores the result in Rj; val denotes the content of memory location m.
MOV m, Rj - Moves the content of memory location m to register Rj.
MOV Rj, m - Moves the content of register Rj to memory location m.
The computer has only two registers, and OP is either ADD or SUB. Consider the following basic block:
t1 = a + b
t2 = c + d
t3 = e - t2
t4 = t1 - t3
Assume that all operands are initially in memory. The final value of the computation should be in memory. What is the minimum number of MOV instructions in the code generated for this basic block?
a) 2
b) 3
c) 5
d) 6
[GATE-2006]

Q.7 Consider two binary operators ↑ and ↓ with the precedence of operator ↓ being lower than that of operator ↑. Operator ↑ is right associative while operator ↓ is left associative. Which one of the following represents the parse tree for the expression (7 ↓ 3 ↑ 4 ↑ 3 ↓ 2)?
a) / b) / c) / d) None of these
(The option tree figures are omitted here.)
[GATE-2011]

Q.8 Consider the expression tree shown. Each leaf represents a numerical value, which can either be 0 or 1. Over all possible choices of the values at the leaves, the maximum possible value of the expression represented by the tree is ___.
(The tree figure is omitted here.)
[GATE-2014]

Q.9 One of the purposes of using intermediate code in compilers is to
a) make parsing and semantic analysis simpler
b) improve error recovery and error reporting
c) increase the chances of reusing the machine-independent code optimizer in other compilers
d) improve the register allocation
[GATE-2014]

Q.10 The attributes of three arithmetic operators in some programming language are given below.
Operator | Precedence | Associativity | Arity
+        | High       | Left          | Binary
−        | Medium     | Right         | Binary
*        | Low        | Left          | Binary
The value of the expression 2 − 5 + 1 − 7 * 3 in this language is ______.
[GATE-2016]

Q.11 Consider the following Syntax Directed Translation Scheme (SDTS), with non-terminals {S, A} and terminals {a, b}.
S → aA {print 1}
S → a {print 2}
A → Sb {print 3}
Using the above SDTS, the output printed by a bottom-up parser for the input aab is
a) 1 3 2
b) 2 2 3
c) 2 3 1
d) syntax error
[GATE-2016]

ANSWER KEY:

Q.No. | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8 | 9   | 10 | 11
Ans.  | (b) | (b) | (b) | (b) | (d) | (b) | (b) | 6 | (c) | 9  | (c)

EXPLANATIONS

Q.1 (b)
A syntax-directed definition is a generalization of a context-free grammar in which each grammar symbol has an associated set of attributes, partitioned into two subsets called the synthesized and inherited attributes of that grammar symbol. Each production rule has an associated set of semantic rules. Semantic rules set up dependencies between attributes, which can be represented by a dependency graph; this dependency graph determines the evaluation order of the semantic rules. Evaluating a semantic rule defines the value of an attribute (and may also have side effects, such as printing a value). For example, in the declaration grammar of Section 3.2, symbol T has a synthesized attribute type and symbol L has an inherited attribute in. During bottom-up parsing, inherited attributes can be evaluated when the definition is L-attributed, for instance by keeping attribute values at known offsets on the parser stack, as in Section 3.4. Hence option (b).

Q.2 (b)
Build the parse tree for 9 + 5 + 2 under S → T R and perform a depth-first traversal, executing each print action where it appears. The first T prints 9; each expansion of R → + T {print('+')} R prints the next operand and then '+'. Therefore, the translation scheme prints 9 5 + 2 +.

Q.3 (b)
Apply the semantic rules to X = Y + Z. The production E → id sets E.place := id.place without generating any code, so the sub-expressions Y and Z contribute no instructions. The production E → E1 + E2 then calls newtemp() and generates t1 = Y + Z, and S → id = E generates X = t1. The generated sequence is therefore t1 = Y + Z; X = t1.

Q.4 (b)
Evaluate each option. If E1 is evaluated first, its value must be moved out of the single register before E2 can be evaluated, and brought back for the subtraction, which costs extra move/store instructions. If E2 is evaluated first, E1 can then be evaluated into the register, leaving the first operand in the register so that the subtraction can be performed directly. Hence E2 should be evaluated first.

Q.5 (d)
Construct the parse tree for 2*3+4. S → E R: the first E prints 2; R matches * E {print('*')}: the inner E → F + E prints 3, then 4, then '+'; finally '*' is printed. The translation scheme prints 2 3 4 + *.

Q.6 (b)
Three MOV instructions suffice, as the following code for the basic block shows:

Operation  | Explanation
MOV b, R1  | b is brought into R1
ADD a, R1  | R1 := a + b = t1
MOV d, R2  | d is brought into R2
ADD c, R2  | R2 := c + d = t2
SUB e, R2  | R2 := e - t2 = t3
SUB R1, R2 | R2 := t1 - t3 = t4
MOV R2, t4 | the result is stored back to memory

Q.7 (b)
Since ↓ has the lower precedence and is left associative while ↑ is right associative, the expression groups as ((7 ↓ (3 ↑ (4 ↑ 3))) ↓ 2); option (b) shows the corresponding parse tree. (The tree figures are omitted here.)

Q.8 (6)
Choosing the leaf values (0 or 1) that maximize each operator in the given tree, the maximum possible value of the expression represented by the tree is 6. (The tree figure is omitted here.)

Q.9 (c)
Intermediate code is machine-independent, which makes it easy to retarget the compiler and to reuse the machine-independent code optimizer when generating code for newer and different processors.

Q.10 (9)
The + operator has the highest precedence, so 5 + 1 is computed first, giving 2 − 6 − 7 * 3. The − operator comes next and is right associative: 2 − (6 − 7) = 2 − (−1) = 3. Finally, * has the lowest precedence: 3 * 3 = 9.

Q.11 (c)
A bottom-up parser performs the reductions of a rightmost derivation in reverse. For the input aab: the second a is reduced by S → a (print 2); then S b is reduced by A → Sb (print 3); finally a A is reduced by S → aA (print 1). The output is 2 3 1.

4 INTERMEDIATE CODE GENERATION

4.1 Introduction

4.1.1 An Intermediate Representation

Intermediate representation (IR) is a language for an abstract machine (or a language that can be easily evaluated by an abstract machine).

Properties of good intermediate representations:
 Convenient to produce in the semantic analysis phase.
 Convenient to translate into code for the desired target architecture.
 Each construct must have a clear and simple meaning, such that optimizing transformations that rewrite the IR can be easily specified and implemented.
 It should not include too much machine-specific detail.
 It provides a separation between the front and back ends, which helps compiler portability: code generation and the assignment of temporaries to registers are clearly separated from semantic analysis.
 It allows optimization independent of the target machine; intermediate representations are by design more abstract and uniform, so optimization routines are simpler.

The benefits of using a machine-independent form are:

1. Retargeting is facilitated; a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.

This chapter shows how syntax-directed methods can be used to translate into intermediate form programming-language constructs such as declarations, assignments, and flow-of-control statements. For simplicity, we assume that the source program has already been parsed and statically checked, as in the organization of Fig. 4.1.

Fig. 4.1 Position of intermediate code generator

4.1.2 Intermediate Language

Syntax trees and postfix notation are two kinds of intermediate representations. A third, called three-address code, will be used in this chapter. The semantic rules for generating three-address code from common programming-language constructs are similar to those for constructing syntax trees or for generating postfix notation.

A graphical representation such as a syntax tree depicts the natural hierarchical structure of a source program. A DAG gives the same information, but in a more compact way, because common subexpressions are identified. A syntax tree and DAG for the assignment statement a := b*-c + b*-c appear in Fig. 4.2.

Fig. 4.2 Graphical representation of a := b*-c + b*-c
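The sharing that distinguishes a DAG from a syntax tree can be seen in a few lines of code. The following is a minimal illustrative sketch of our own (the class and function names are not part of any compiler API): it builds the DAG for a := b*-c + b*-c by reusing any node that already exists for the same operator and operands.

class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

_dag_cache = {}

def make_node(op, left=None, right=None):
    # Hash-consing constructor: reuse an existing node for an identical
    # (op, left, right) combination, so common subexpressions become shared.
    key = (op, id(left), id(right))
    if key not in _dag_cache:
        _dag_cache[key] = Node(op, left, right)
    return _dag_cache[key]

b, c = make_node('b'), make_node('c')
minus_c = make_node('uminus', c)
t = make_node('*', b, minus_c)                     # b * -c
root = make_node('assign', make_node('a'),
                 make_node('+', t, make_node('*', b, minus_c)))

# Both operands of '+' are literally the same object: the DAG shared them.
print(root.right.left is root.right.right)         # True

A plain syntax tree would instead build the subtree for b*-c twice; the cache is exactly what collapses the two occurrences into one shared node.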

Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree in Fig. 4.2(a) is:
a b c uminus * b c uminus * + assign

4.2 Three-Address Code

Three-address code is a sequence of statements of the general form
x := y op z
where x, y and z are names, constants or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on Boolean-valued data. Note that no built-up arithmetic expressions are permitted, as there is only one operator on the right side of a statement. Thus a source-language expression like x + y*z might be translated into the sequence
t1 := y * z
t2 := x + t1
where t1 and t2 are compiler-generated temporary names.

4.2.1 Types of Three-Address Statements

 Three-address codes are a form of IR similar to assembler for an imaginary machine.
 Three-address statements are similar to assembly code.
 Statements can have symbolic labels, and there are statements for flow of control.
 A symbolic label represents the index of a three-address statement in the array holding the intermediate code.
 Actual indices can be substituted for the labels, either by making a separate pass or by using "backpatching".

Each three-address code instruction has the form x := y op z. The common kinds of statements are:

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
6. param x and call p, n for procedure calls, and return y, where y (representing a returned value) is optional. Their typical use is as the sequence of three-address statements
   param x1
   param x2
   ...
   param xn
   call p, n
   generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n, indicating the number of actual parameters in "call p, n", is not redundant because calls can be nested.
7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the value in the location i memory units beyond location y. The statement x[i] := y sets the contents of the location i units beyond x to the value of y. In both these instructions, x, y and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y, and *x := y. The first of these sets the value of x to the location of y. Presumably y is a name, perhaps a temporary, that denotes an expression with an l-value such as A[i,j], and x is a pointer name or temporary. That is, the r-value of x is the l-value (location) of some object. In the statement x := *y, presumably y is a pointer or a temporary whose r-value is a location. The r-value of x is made equal to the contents of that location. Finally, *x := y sets the r-value of the object pointed to by x to the r-value of y.
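As a concrete summary of the list above, here is a small, purely illustrative sketch; the tuple encoding is our own assumption, not a standard format. It writes one example of each statement kind.

# One example of each kind of three-address statement listed above.
# Each statement is a tuple whose first field plays the role of op.
tac = [
    ('+',      'y',  'z',  't1'),   # 1. x := y op z   (binary)
    ('uminus', 'c',  None, 't2'),   # 2. x := op y     (unary)
    (':=',     't1', None, 'x'),    # 3. copy          x := y
    ('goto',   None, None, 'L1'),   # 4. unconditional jump
    ('if<',    'x',  'y',  'L2'),   # 5. if x relop y goto L
    ('param',  'x1', None, None),   # 6. parameter for a call
    ('call',   'p',  2,    None),   # 6. call p, n
    ('[]=',    'y',  'i',  'x'),    # 7. indexed assignment x[i] := y
    ('=&',     'y',  None, 'x'),    # 8. address assignment x := &y
]
for stmt in tac:
    print(stmt)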

The choice of allowable operators is an important issue in the design of an intermediate form. The operator set must clearly be rich enough to implement the operations in the source language. A small operator set is easier to implement on a new target machine. However, a restricted instruction set may force the front end to generate long sequences of statements for some source-language operations. The optimizer and code generator may then have to work harder if good code is to be generated.

4.2.2 Syntax-Directed Translation into Three-Address Code

 We give below an S-attributed definition generating three-address code for assignments.
 When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree.
 The value of the non-terminal E on the left side of E → E1 + E2 is computed into a new temporary t. In general, the three-address code for id := E consists of code to evaluate E into some temporary t, followed by the assignment id.place := t.
 If an expression is a single identifier, say y, then y itself holds the value of the expression.

We use the following notation:
1. S.code represents the three-address code for the assignment S.
2. E.place is the name that will hold the value of E.
3. E.code represents the sequence of three-address statements evaluating E.
4. id.place is the name that holds the value of id, which may be found by consulting the symbol table at id.entry.

The S-attributed definition in Fig. 4.3 generates three-address code for assignment statements. Given input a := b*-c + b*-c, it produces the code in Fig. (a). The synthesized attribute S.code represents the three-address code for the assignment S; the non-terminal E has the two attributes E.place and E.code. The function newtemp returns a sequence of distinct names t1, t2, ... in response to successive calls. For convenience, we use the notation gen(x ':=' y '+' z) in Fig. 4.3 to represent the three-address statement x := y + z. Expressions appearing instead of variables like x, y and z are evaluated when passed to gen, and quoted operators or operands, like +, are taken literally. In practice, three-address statements might be sent to an output file rather than built up into the code attributes.

PRODUCTION       SEMANTIC RULES
S → id := E      S.code := E.code || gen(id.place ':=' E.place)
E → E1 + E2      E.place := newtemp;
                 E.code := E1.code || E2.code || gen(E.place ':=' E1.place '+' E2.place)
E → E1 * E2      E.place := newtemp;
                 E.code := E1.code || E2.code || gen(E.place ':=' E1.place '*' E2.place)
E → - E1         E.place := newtemp;
                 E.code := E1.code || gen(E.place ':=' 'uminus' E1.place)
E → ( E1 )       E.place := E1.place; E.code := E1.code
E → id           E.place := id.place; E.code := ''

Fig. 4.3 Syntax-directed definition to produce three-address code for assignments

Flow-of-control statements can be added to the language of assignments in Fig. 4.3 by productions and semantic rules like the ones for while statements in Fig. 4.3.1. The code for S → while E do S1 is generated using new attributes S.begin and S.after to mark the first statement in the code for E and the statement following the code for S, respectively. These attributes represent labels created by a function newlabel that returns a new label every time it is called. Note that S.after becomes the label of the statement that comes after the code for the while statement. We assume that a non-zero expression represents true; that is, when the value of E becomes zero, control leaves the while statement. Expressions that govern the flow of control may in general be boolean expressions containing relational and logical operators.

Fig. 4.3.1 While statement

Production          Semantic Rules
S → while E do S1   S.begin := newlabel;
                    S.after := newlabel;
                    S.code := gen(S.begin ':') ||
                              E.code ||
                              gen('if' E.place '=' '0' 'goto' S.after) ||
                              S1.code ||
                              gen('goto' S.begin) ||
                              gen(S.after ':')

Fig. 4.4 Semantic rules generating code for a while statement

Postfix notation can be obtained by adapting the semantic rules in Fig. 4.3. The postfix notation for an identifier is the identifier itself. The rules for the other productions concatenate only the operator after the code for the operands. For example, associated with the production E → -E1 is the semantic rule E.code := E1.code || 'uminus'.

In general, the intermediate form produced by the syntax-directed translations in this chapter can be changed by making similar modifications to the semantic rules.
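The workings of newtemp and gen in Fig. 4.3 can be sketched directly. The following is a minimal illustration under our own assumptions: the expression arrives as nested tuples rather than from a parser, and the names translate, newtemp and gen are ours.

code = []
_counter = 0

def newtemp():
    # Returns the distinct names t1, t2, ... on successive calls.
    global _counter
    _counter += 1
    return f"t{_counter}"

def gen(stmt):
    # Emits one three-address statement.
    code.append(stmt)

def translate(e):
    # Returns E.place for expression e, appending E.code to `code`.
    if isinstance(e, str):                     # E -> id
        return e
    if e[0] == 'uminus':                       # E -> -E1
        p1 = translate(e[1])
        t = newtemp()
        gen(f"{t} := uminus {p1}")
        return t
    op, e1, e2 = e                             # E -> E1 + E2 | E1 * E2
    p1, p2 = translate(e1), translate(e2)
    t = newtemp()
    gen(f"{t} := {p1} {op} {p2}")
    return t

# a := b*-c + b*-c
rhs = ('+', ('*', 'b', ('uminus', 'c')), ('*', 'b', ('uminus', 'c')))
gen(f"a := {translate(rhs)}")                  # S -> id := E
print("\n".join(code))

Running the sketch prints the six statements t1 := uminus c, t2 := b * t1, t3 := uminus c, t4 := b * t3, t5 := t2 + t4, a := t5, the same code the quadruples of Fig. 4.5(a) represent.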

4.2.3 Implementations of Three-Address Statements

A three-address statement is an abstract form of intermediate code. In a compiler, these statements can be implemented as records with fields for the operator and the operands. Three such representations are quadruples, triples, and indirect triples.

4.2.4 Quadruples

 A quadruple is a record structure with four fields, which we call op, arg1, arg2 and result.
 The op field contains an internal code for the operator.
 The three-address statement x := y op z is represented by placing y in arg1, z in arg2, and x in result. Statements with unary operators like x := -y or x := y do not use arg2.
 Operators like param use neither arg2 nor result. Conditional and unconditional jumps put the target label in result. The quadruples in Fig. 4.5(a) are for the assignment a := b*-c + b*-c.
 The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.
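A minimal sketch of the quadruple array for a := b*-c + b*-c follows; the field names mirror the text, but storing plain strings instead of symbol-table pointers is a simplification of ours.

quads = [
    # (op,     arg1, arg2, result)
    ('uminus', 'c',  None, 't1'),
    ('*',      'b',  't1', 't2'),
    ('uminus', 'c',  None, 't3'),
    ('*',      'b',  't3', 't4'),
    ('+',      't2', 't4', 't5'),
    (':=',     't5', None, 'a'),
]
for i, (op, a1, a2, res) in enumerate(quads):
    print(f"({i}) {op:7} {a1 or '':3} {a2 or '':3} {res}")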

4.2.5 Triples

To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it. If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2, as in Fig. 4.5(b).

     op       arg1   arg2   result
(0)  uminus   c             t1
(1)  *        b      t1     t2
(2)  uminus   c             t3
(3)  *        b      t3     t4
(4)  +        t2     t4     t5
(5)  :=       t5            a
(a) Quadruples

     op       arg1   arg2
(0)  uminus   c
(1)  *        b      (0)
(2)  uminus   c
(3)  *        b      (2)
(4)  +        (1)    (3)
(5)  assign   a      (4)
(b) Triples

Fig. 4.5 Quadruple and triple representations of three-address statements

The triples in Fig. 4.5(b) correspond to the quadruples in Fig. 4.5(a). Note that the copy statement a := t5 is encoded in the triple representation by placing a in the arg1 field and using the operator assign. A ternary operation like x[i] := y requires two entries in the triple structure, as shown in Fig. 4.6(a), while x := y[i] is naturally represented as two operations in Fig. 4.6(b).

     op       arg1   arg2
(0)  []=      x      i
(1)  assign   (0)    y
(a) x[i] := y

     op       arg1   arg2
(0)  =[]      y      i
(1)  assign   x      (0)
(b) x := y[i]

Fig. 4.6 More triple representations
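The position-based referencing that defines triples can be sketched as follows; the ('pos', i) encoding for "the value computed at statement i" is our own illustrative choice.

triples = [
    # (op,     arg1,       arg2)
    ('uminus', 'c',        None),
    ('*',      'b',        ('pos', 0)),
    ('uminus', 'c',        None),
    ('*',      'b',        ('pos', 2)),
    ('+',      ('pos', 1), ('pos', 3)),
    ('assign', 'a',        ('pos', 4)),
]

def show(arg):
    # Positions print as (i), names print as themselves.
    return f"({arg[1]})" if isinstance(arg, tuple) else (arg or '')

for i, (op, a1, a2) in enumerate(triples):
    print(f"({i}) {op:7} {show(a1):5} {show(a2)}")

Because temporaries are identified by position, no temporary names exist to be entered into the symbol table; the cost, discussed next, is that moving a triple renumbers every reference to it.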

 The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table (for programmer-defined names or constants) or pointers into the triple structure (for temporary values).
 Since three fields are used, this intermediate code format is known as triples. Except for the treatment of programmer-defined names, triples correspond to the representation of a syntax tree or DAG by an array of nodes.
 Parenthesized numbers represent pointers into the triple structure, while symbol-table pointers are represented by the names themselves. In practice, the information needed to interpret the different kinds of entries in the arg1 and arg2 fields can be encoded into the op field or some additional fields.

4.2.6 Indirect Triples

Another implementation of three-address code that has been considered is that of listing pointers to triples, rather than listing the triples themselves. This implementation is naturally called indirect triples. For example, let us use an array statement to list pointers to triples in the desired order. Then the triples in Fig. 4.5(b) might be represented as in Fig. 4.7.

statement
(0)  (14)
(1)  (15)
(2)  (16)
(3)  (17)
(4)  (18)
(5)  (19)

      op       arg1    arg2
(14)  uminus   c
(15)  *        b       (14)
(16)  uminus   c
(17)  *        b       (16)
(18)  +        (15)    (17)
(19)  assign   a       (18)

Fig. 4.7 Indirect triples representation of three-address statements
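The following sketch (our own illustration) shows the point of the indirection: reordering execution touches only the statement list, never the triples themselves, so references such as (14) stay valid.

triple_table = {
    14: ('uminus', 'c', None),
    15: ('*', 'b', 14),
    16: ('uminus', 'c', None),
    17: ('*', 'b', 16),
    18: ('+', 15, 17),
    19: ('assign', 'a', 18),
}
statement = [14, 15, 16, 17, 18, 19]   # execution order

# An optimizer moving a computation earlier edits only this list;
# triple numbers, and hence all references to them, are stable.
statement = [16, 17, 14, 15, 18, 19]
for s in statement:
    print(s, triple_table[s])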

GATE QUESTIONS

Q.1 Consider the following C code segment:
for (i = 0; i

d) x = 4*5 ⇒ x = 20 is an example of common subexpression

3) Dynamic allocation of activation records is essential to implement recursion.
4) Both heap and stack are essential to implement recursion.
a) 1 and 2 only   b) 2 and 3 only
c) 3 and 4 only   d) 1 and 3 only
[GATE-2014]

Q.7 Consider the basic block given below.
a = b + c
c = a + d
d = b + c
e = d - b
a = e + b
The minimum number of nodes and edges present in the DAG representation of the above basic block respectively are
a) 6 and 6    b) 8 and 10
c) 9 and 12   d) 4 and 4
[GATE-2014]

Q.8 A variable x is said to be live at a statement Si in a program if the following three conditions hold simultaneously:
1. There exists a statement Sj that uses x.
2. There is a path from Si to Sj in the flow graph corresponding to the program.
3. The path has no intervening assignment to x, including at Si and Sj.
The variables which are live both at the statement in basic block 2 and at the statement in basic block 3 of the above control flow graph are
a) p, s, u   b) r, s, u
c) r, u      d) q, v
[GATE-2015]

Q.9 The least number of temporary variables required to create a three-address code in static single assignment form for the expression q + r/3 + s - t * 5 + u * v/w is ______.
[GATE-2015]

Q.10 In the context of abstract-syntax-tree (AST) and control-flow-graph (CFG), which one of the following is TRUE?
a) In both AST and CFG, let node N2 be the successor of node N1. In the input program, the code corresponding to N2 is present after the code corresponding to N1.
b) For any input program, neither AST nor CFG will contain a cycle.
c) The maximum number of successors of a node in an AST and a CFG depends on the input program.
d) Each node in AST and CFG corresponds to at most one statement in the input program.
[GATE-2015]

Q.11 Consider the intermediate code given below.
(1) i = 1
(2) j = 1
(3) t1 = 5 * i
(4) t2 = t1 + j
(5) t3 = 4 * t2
(6) t4 = t3
(7) a[t4] = -1
(8) j = j + 1
(9) if j <= 5 goto (3)
(10) i = i + 1
(11) if i < 5 goto (2)
The number of nodes and edges in the control-flow-graph constructed for the above code, respectively, are
a) 5 and 7   b) 6 and 7
c) 5 and 5   d) 7 and 8
[GATE-2015]

Q.12 Consider the following code segment.

x = u - t;
y = x * v;
x = y + w;
y = t - z;
y = x * y;
The minimum number of total variables required to convert the above code segment to static single assignment form is ______.
[GATE-2016]

Q.13 A student wrote two context-free grammars G1 and G2 for generating a single C-like array declaration. The dimension of the array is at least one. For example,
int a[10][3];
The grammars use D as the start symbol, and use six terminal symbols int ; id [ ] num.

Grammar G1        Grammar G2
D → int L;        D → int L;
L → id [E         L → id E
E → num]          E → E [num]
E → num][E        E → [num]

Which of the grammars correctly generate the declaration mentioned above?
a) Both G1 and G2
b) Only G1
c) Only G2
d) Neither G1 nor G2
[GATE-2016]

Q.14 Consider the following intermediate program in three address code
p = a - b
q = p * c
p = u * v
q = p + q
Which one of the following corresponds to a static single assignment form of the above code?
a) p1 = a - b
   q1 = p1 * c
   p1 = u * v
   q1 = p1 + q1
b) p3 = a - b
   q4 = p3 * c
   p4 = u * v
   q5 = p4 + q4
c) p1 = a - b
   q1 = p2 * c
   p3 = u * v
   q2 = p4 + q3
d) p1 = a - b
   q1 = p * c
   p2 = u * v
   q2 = p + q
[GATE-2017]

ANSWER KEY:
1   2   3   4   5   6   7   8   9   10  11  12  13  14
d   b   d   a   a   d   a   c   3   c   b   7   a   b

EXPLANATIONS

Q.1 (d)
Considering the options: when we evaluate the code, it emerges that there is no scope for common subexpression elimination in this code, and there is no dead code in the given code segment.

Q.2 (b)
Code optimizations are generally carried out on intermediate code. This is done before the code is converted to the machine language of the target machine by the back-end tools. It enhances the portability of the compiler to other target processors.

Q.3 (d)
x = 4 * 5 ⇒ x = 20 is not an example of common subexpression elimination; it is constant folding. In constant folding, an expression consisting of constants is replaced by its final value at compile time, rather than doing the calculation at run time.

Q.4 (a)
Symbol table management is done during compilation to store and retrieve information about tokens. Type checking is one of the checks performed during the semantic analysis phase of compilation. Inline expansion is a compiler optimization that replaces a function call by the body of the respective function. Dynamic memory allocation is when an executing program requests the operating system to give it a block of main memory; it is performed during run time, not during compile time. Option (a) is the answer.

Q.5 (a)
It is given that the size of int is 4B and of char is 1B, and the memory is byte-addressable. Let the array be declared as Type X[A][B][C] (where Type = int/char and A, B, C are natural numbers).
From t0 = i * 1024, we conclude that B*C*(size of Type) = 1024.
From t1 = j * 32, we conclude that C*(size of Type) = 32.
From t2 = k * 4, we conclude that the size of Type is 4.
⇒ Type = int, C = 8, and B = 32.

Q.6 (d)
To implement recursion, activation records should be implemented by providing dynamic memory allocation. This dynamic allocation is done from the runtime stack. The heap is essential to allocate memory for data structures at run time, but not for recursion. So statements 1 and 3 are correct.

Q.7 (a)
The given basic block can be rewritten as:
a = b + c        a = b + c
c = a + d        c = b + c + d
d = b + c   ⟹   d = b + (b + c + d) = 2b + c + d
e = d - b        e = (2b + c + d) - b = b + c + d
a = e + b        a = (b + c + d) + b = 2b + c + d
From the above simplification it is visible that e is the same as c, and the final value of a is the same as d. So the final basic block can be written as follows:
a = b + c
c = a + d
d = 2b + c + d
e = c
a = d
The DAG generated for the above basic block is as shown in the figure. The minimum number of nodes and edges in the above DAG is (6, 6).
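The substitution argument above can be checked mechanically. Here is a small sketch of our own (not part of the solution) that tracks each variable as a multiset over the initial values of b, c and d, so that, for example, 2b + c + d is Counter({'b': 2, 'c': 1, 'd': 1}).

from collections import Counter

env = {v: Counter({v: 1}) for v in ('b', 'c', 'd')}
env['a'] = env['b'] + env['c']        # a = b + c
env['c'] = env['a'] + env['d']        # c = a + d  ->  b + c + d
env['d'] = env['b'] + env['c']        # d = b + c  ->  2b + c + d
env['e'] = env['d'] - env['b']        # e = d - b  ->  b + c + d
env['a'] = env['e'] + env['b']        # a = e + b  ->  2b + c + d

print(env['e'] == env['c'])           # True: e recomputes c
print(env['a'] == env['d'])           # True: the final a recomputes d

Both checks print True, which is exactly why the DAG needs only three leaves (b, c, d) and three interior nodes, giving 6 nodes and 6 edges.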

Q.10 (c)
Option (a) is not true when the CFG contains a cycle. Option (b) is false, as a CFG can contain a cycle. Option (d) is false, as a single CFG node can contain a block of statements.

Q.14 (b)
Considering each option:
 In option (a), p1 and q1 are assigned again, which is not allowed under static single assignment.
 In option (c), in the 2nd line it should be q1 = p1 * c.
 In option (d), in the 2nd line it should be q1 = p1 * c.
The correct code is:
p3 = a - b
q4 = p3 * c
p4 = u * v
q5 = p4 + q4

ASSIGNMENT QUESTIONS

Q.1 A cross compiler is a compiler
a) which is written in a language that is different from the source language
b) that generates object code for its host machine
c) which is written in a language that is the same as the source language
d) that runs on one machine but produces object code for another machine

Q.2 An incremental compiler is a compiler
a) which is written in a language that is different from the source language
b) that generates object code for its host machine
c) which is written in a language that is the same as the source language
d) that allows a modified portion of a program to be recompiled

Q.3 For which of the following reasons is an interpreter preferred to a compiler?
a) It takes less time to execute.
b) It is much helpful in the initial stages of program development.
c) Debugging can be faster and easier.
d) It needs less computer resources.

Q.4 For which of the following reasons is a compiler preferable to an interpreter?
a) It can generate stand-alone programs that often take less time for execution.
b) It is much helpful in the initial stages of program development.
c) Debugging can be faster and easier.
d) If one changes a statement, only that statement needs recompilation.

Q.5 The cost of developing a compiler is proportional to the
a) complexity of the source language
b) complexity of the architecture of the target machine
c) flexibility of the available instruction set
d) none of the above

Q.6 An ideal compiler should
a) be smaller in size
b) take less time for compilation
c) be written in a high level language
d) produce object code that is smaller in size and executes faster

Q.7 An optimizing compiler
a) is optimized to occupy less space
b) is optimized to take less time for execution
c) optimizes the code
d) none of the above

Q.8 In a compiler, grouping of characters into tokens is done by the
a) scanner
b) parser
c) code generator
d) code optimizer

Q.9 Whether a given pattern constitutes a token or not
a) depends on the source language
b) depends on the target language
c) depends on the compiler
d) none of the above comments is true

Q.10 A grammar will be meaningless if the
a) terminal set and the non-terminal set are not disjoint
b) left hand side of a production is a single terminal
c) left hand side of a production has no non-terminal
d) left hand side of a production has more than two non-terminals

Q.11 Which of the following grammars are not phrase-structured?
a) Regular
b) Context-free
c) Context-sensitive
d) None of the above

Q.12 Which of the following is the most general phrase-structured grammar?
a) Regular
b) Context-free
c) Context-sensitive
d) None of the above

Q.13 In a context-sensitive grammar,
a) Ɛ can't be the right hand side of any production
b) the number of grammar symbols on the left hand side of a production can't be greater than the number of grammar symbols on the right hand side
c) the number of grammar symbols on the left hand side of a production can't be greater than the number of terminals on the right hand side
d) the number of grammar symbols on the left hand side of a production can't be greater than the number of non-terminals on the right hand side

Q.14 In a context-free grammar,
a) Ɛ can't be the right hand side of any production
b) terminal symbols can't be present in the left hand side of any production
c) the number of grammar symbols in the left hand side is not greater than the number of grammar symbols in the right hand side
d) all of the above

Q.15 If w is a string of terminals and A, B are two non-terminals, then which of the following are right-linear grammars?
a) A → Bw
b) A → Bw | w
c) A → wB | w
d) None of the above

Q.16 If a, b are terminals and S, A, B are three non-terminals, then which of the following are regular grammars?
a) S → aA | a, A → aS | b
b) A → aB | a, B → bA | b
c) A → Ba | Bab
d) A → abB | aB

Q.17 Representing the syntax by a grammar is advantageous because
a) it is concise
b) it is accurate
c) automation becomes easy
d) intermediate code can be generated easily and efficiently

Q.18 A CFG can be recognized by a
a) push-down automaton
b) 2-way linear bounded automaton
c) finite state automaton
d) none of the above

Q.19 A CSG can be recognized by a
a) push-down automaton
b) 2-way linear bounded automaton
c) finite state automaton
d) none of the above

Q.20 Choose the correct statements.
a) A sentence of a grammar is a sentential form without any non-terminals.
b) A sentence of a grammar should be derivable from the start state.
c) A sentence of a grammar should be the frontier of a derivation tree in which the root node has the start state as the label.
d) All of the above.

Q.21 A grammar can have
a) a non-terminal A that can't derive any string of terminals
b) a non-terminal A that can be present in any sentential form
c) Ɛ as the only symbol on the left hand side of a production
d) none of the above

Q.22 A top-down parser generates a
a) left-most derivation
b) right-most derivation
c) right-most derivation in reverse
d) left-most derivation in reverse

Q.23 A bottom-up parser generates a
a) left-most derivation
b) right-most derivation
c) right-most derivation in reverse
d) left-most derivation in reverse

Q.24 A given grammar is said to be ambiguous if
a) two or more productions have the same non-terminal on the left hand side
b) a derivation tree has more than one associated sentence
c) there is a sentence with more than one derivation tree corresponding to it
d) parentheses are not present in the grammar

Q.25 The grammar E → E + E | E * E | a, is
a) ambiguous
b) unambiguous
c) ambiguous or not, depending on the given sentence
d) none of the above

Q.26 Choose the correct statements.
a) The language corresponding to a given grammar is the set of all strings that can be generated by the given grammar.
b) A given language is ambiguous if no unambiguous grammar exists for it.
c) Two different grammars may generate the same language.
d) None of the above.

Q.27 Consider the grammar
S → ABC | Abc
BA → AB
Bb → bb
Ab → ab
Aa → aa
Which of the following sentences can be derived by this grammar?
a) abc    b) aab
c) abcc   d) abbc

Q.28 The language generated by the above grammar is the set of all strings made up of a, b, c, such that
a) the number of a's, b's and c's will be equal
b) a's always precede b's
c) b's always precede c's
d) the number of a's, b's and c's are the same, and the a's precede the b's, which precede the c's

Q.29 In an incompletely specified automaton,
a) no edge should be labeled Ɛ
b) from any given state, there can't be any token leading to two different states
c) some states have no transition on some tokens
d) the start state may not be there

Q.30 The main difference between a DFSA and an NDFSA is
a) in a DFSA, Ɛ-transitions may be present
b) in an NDFSA, Ɛ-transitions may be present
c) in a DFSA, from any given state, there can't be any alphabet leading to two different states
d) in an NDFSA, from any given state, there can't be any alphabet leading to two different states

Q.31 Two finite state machines are said to be equivalent if they
a) have the same number of states
b) have the same number of edges
c) have the same number of states and edges
d) recognize the same set of tokens

Q.32 Choose the correct answer. FORTRAN is a
a) regular language
b) context-free language
c) context-sensitive language
d) Turing language

Q.33 If two finite state machines M and N are isomorphic, then M can be transformed to N by relabeling
a) the states alone
b) the edges alone
c) both the states and edges
d) none of the above

Q.34 In a syntax-directed translation scheme, if the value of an attribute of a node is a function of the values of the attributes of its children, then it is called a
a) synthesized attribute
b) inherited attribute
c) canonical attribute
d) none of the above

Q.35 Synthesized attributes can easily be simulated by an
a) LL grammar
b) ambiguous grammar
c) LR grammar
d) none of the above

Q.36 For which of the following situations is an inherited attribute a natural choice?
a) Evaluation of arithmetic expressions
b) Keeping track of variable declarations
c) Checking for the correct use of L-values and R-values
d) All of the above

Q.37 The graph depicting the inter-dependencies of the attributes of different nodes in a parse tree is called a
a) flow graph
b) dependency graph
c) Karnaugh's graph
d) Steffi graph

Q.38 Choose the correct statements.
a) Topological sort can be used to obtain an evaluation order of a dependency graph.
b) The evaluation order for a dependency graph dictates the order in which the semantic rules are done.
c) Code generation depends on the order in which the semantic actions are performed.
d) Only (a) and (c) are correct.

Q.39 A syntax tree
a) is another name for a parse tree
b) is a condensed form of parse tree
c) should not have keywords as leaves
d) none of the above

Q.40 A syntax-directed translation scheme is desirable because
a) it is based on the syntax
b) its description is independent of any implementation
c) it is easy to modify
d) only (a) and (c) are correct

Q.41 Which of the following is not an intermediate code form?
a) Postfix notation
b) Syntax tree
c) Three-address code
d) Quadruples

Q.42 Three-address codes can be implemented by
a) indirect triples
b) direct triples
c) quadruples
d) none of the above

Q.43 Three-address code involves
a) exactly 3 addresses
b) at the most 3 addresses
c) no unary operators
d) none of the above

Q.44 A symbol table can be used for
a) checking type compatibility
b) suppressing duplicate error messages
c) storage allocation
d) none of the above

Q.45 The best way to compare the different implementations of a symbol table is to compare the time required to
a) add a new name
b) make an inquiry
c) add a new name and make an inquiry
d) none of the above

Q.46 Which of the following symbol table implementations is based on the property of locality of reference?
a) Linear list
b) Search tree
c) Hash table
d) Self-organizing list

Q.47 Which of the following symbol table implementations is best suited if access time is to be minimum?
a) Linear list
b) Search tree
c) Hash table
d) Self-organizing list

Q.48 Which of the following symbol table implementations makes efficient use of memory?
a) List
b) Search tree
c) Hash table
d) Self-organizing list

Q.49 Access time of the symbol table will be logarithmic if it is implemented by a
a) linear list
b) search tree
c) hash table
d) self-organizing list

Q.50 An ideal compiler should
a) detect errors
b) detect and report errors
c) detect, report and correct errors
d) none of the above

Q.51 Which of the following is not a source of error?
a) Faulty design specification
b) Faulty algorithm
c) Compilers themselves
d) None of the above

Q.52 Any transcription error can be repaired by
a) insertion alone
b) deletion alone
c) insertion and deletion alone
d) replacement alone

Q.53 Hamming distance is a
a) theoretical way of measuring errors
b) technique for assigning codes to a set of items known to occur with a given probability
c) technique for optimizing the intermediate code
d) none of the above

Q.54 Error repair may use a
a) theoretical way of measuring errors
b) technique for assigning codes to a set of items known to occur with a given probability
c) technique for optimizing the intermediate code
d) none of the above

Q.55 A parser with the valid-prefix property is advantageous because
a) it detects errors as early as possible
b) it detects errors as and when they occur
c) it limits the amount of erroneous output passed to the next phase
d) all of the above

Q.56 The advantage of panic mode of error recovery is that
a) it is simple to implement
b) it is very effective
c) it never gets into an infinite loop
d) none of the above

Q.57 To recover from an error, the operator-precedence parser may
a) insert symbols onto the stack
b) insert symbols onto the input
c) delete symbols from the stack
d) delete symbols from the input

Q.58 Which of the following optimization techniques are typically applied on loops?
a) Removal of invariant computation
b) Elimination of induction variables
c) Peephole optimization
d) Constant folding

Q.59 The technique of replacing run-time computations by compile-time computations is called
a) constant folding
b) code hoisting
c) peephole optimization
d) invariant computation

Q.60 The graph that shows the basic blocks and their successor relationship is called a
a) control graph
b) flow graph
c) DAG
d) Hamiltonian graph

Q.61 Reduction in strength means
a) replacing run-time computation by compile-time computation
b) removing loop-invariant computation
c) removing common subexpressions
d) replacing a costly operation by a relatively cheaper one

Q.62 A basic block can be analyzed by a
a) DAG
b) graph which may involve cycles
c) flow graph
d) none of the above

Q.63 ud-chaining is useful for
a) determining whether a particular definition is used anywhere or not
b) constant folding
c) checking whether a variable is used without prior assignment
d) none of the above

Q.64 Which of the following concepts can be used to identify loops?
a) Dominators
b) Unrolling
c) Depth-first ordering
d) None of the above

Q.65 Which of the following are not loop optimization techniques?
a) Jamming
b) Unrolling
c) Induction variable elimination
d) None of the above

Q.66 Running time of a program depends on the
a) way the registers are used
b) order in which computations are performed
c) way the addressing modes are used
d) usage of machine idioms

Q.67 du-chaining
a) stands for use-definition chaining
b) is useful for copy propagation removal
c) is useful for induction variable removal
d) none of the above

Q.68 Which of the following comments about peephole optimization are true?
a) It is applied to a small part of the code.
b) It can be used to optimize intermediate code.
c) To get the best out of this technique, it has to be applied repeatedly.
d) It can be applied to a portion of the code that is not contiguous.

Q.69 Shift-reduce parsers are
a) top-down parsers
b) bottom-up parsers
c) may be top-down or bottom-up
d) none of the above

Q.70 Recursive descent parsing is an example of
a) top-down parsing
b) bottom-up parsing
c) may be top-down or bottom-up parsing
d) none of the above

Q.71 In operator-precedence parsing, precedence relations are defined
a) for all pairs of non-terminals
b) for all pairs of terminals
c) for some pairs of terminals
d) none of the above

Q.72 For parsing arithmetic expressions involving (, ), + and -, the precedence relation between - and - will be <, if
a) we are talking of unary minus
b) minus is right associative
c) minus is left associative
d) a ( comes in between

Q.73 For the previous problem, the precedence relation between ) and ( will be
a) >         b) <
c) =         d) undefined

Q.74 LR stands for
a) left to right
b) left to right reduction
c) right to left
d) left-to-right scan and right-most derivation in reverse

Q.75 LR parsers are attractive because
a) they can be constructed to recognize CFGs corresponding to almost all programming constructs
b) they do not backtrack
c) they detect errors as and when they occur
d) all of the above

Q.76 Which of the following is the most powerful parser?
a) SLR
b) LALR
c) Canonical LR
d) Operator-precedence

Q.77 Choose the correct statements.
a) There are CFGs that are not LR.
b) An ambiguous grammar can never be LR.
c) An ambiguous grammar can be LR.
d) Any CFG has to be LR.

Q.78 YACC builds up a
a) SLR parsing table
b) canonical LR parsing table
c) LALR parsing table
d) none of the above

Q.79 Choose the correct statements.
a) An LL(k) grammar has to be a CFG.
b) An LL(k) grammar has to be unambiguous.
c) There are LL(k) grammars that are not context-free.
d) LL(k) grammars cannot have left-recursive non-terminals.

Q.80 Consider an Ɛ-free CFG. For every pair of productions A → u and A → v,
a) if FIRST(u) ∩ FIRST(v) is empty then the CFG has to be LL(1)
b) if the CFG is LL(1) then FIRST(u) ∩ FIRST(v) has to be empty
c) if FIRST(u) ∩ FIRST(v) is empty then the CFG cannot be LL(1)
d) none of the above

Q.81 LR(k) grammars
a) can only examine a maximum of k input symbols
b) can be used to identify handles
c) can be used to identify the production associated with a handle
d) cover the LL(k) class

Q.82 The set of all viable prefixes of right sentential forms of a given grammar
a) can be recognized by a finite state machine
b) cannot be recognized by a finite state machine
c) can be used to control an LR(k) parser
d) none of the above

Q.83 The 'k' in LR(k) cannot be
a) 0   b) 1
c) 2   d) none of the above

The next three questions are based on the following grammar:
E → E/X | X
X → T-X | X*T | T
T → T+F | F
F → (E) | id

Q.84 This grammar is
a) unambiguous
b) ambiguous
c) context-free
d) none of the above

Q.85 The above grammar is used to generate all valid arithmetic expressions in a hypothetical language in which
a) / associates from the left
b) - associates from the left
c) + associates from the left
d) * associates from the left

Q.86 The above grammar is used to generate all valid arithmetic expressions in a hypothetical language in which
a) + has the highest precedence
b) * has the highest precedence
c) - has the highest precedence
d) / has the highest precedence

Q.87 Backpatching is useful for handling
a) conditional jumps
b) unconditional jumps
c) backward references
d) forward references

Let x be a string and let A be a non-terminal. FIRSTk(x) is the set of all leading terminal strings, of length k or less, in the strings derivable from x. FOLLOWk(A) is the set of all derivable terminal strings, of length k or less, that can follow A in some left-most sentential form. The next three questions are based on the above definition.

Q.88 Consider the grammar
E → TE'
E' → +TE' | Ɛ
T → FT'
T' → *FT' | Ɛ
F → (E) | id
FIRST1(E) will be the same as that of
a) FIRST1(T)
b) FIRST1(id) (id stands for identifier)
c) FIRST1(T')
d) all of the above

Q.89 FOLLOW1(F) is
a) { +, *, ), $ }   b) { +, ), $ }
c) { *, ), $ }      d) { +, (, ), * }

Q.90 Which of the following remarks logically follow?
a) FIRSTk(Ɛ) = {Ɛ}
b) If FOLLOWk(A) contains Ɛ, then A is the start symbol.
c) If A → w is a production in the given grammar G, then FIRSTk(A) contains FIRSTk(w).
d) If A → w is a production in the given grammar G, then FIRSTk(w) contains FIRSTk(A).

Q.91 Merging states with a common core may produce ___ conflicts, but does not produce ___ conflicts in an LALR parser.
a) reduce-reduce; shift-reduce
b) shift-reduce; reduce-reduce
c) shift-reduce; shift-reduce
d) none of the above

Q.92 For a CFG, FOLLOW(A) is the set of all terminals that can immediately appear to the right of the non-terminal A in some sentential form. We define two sets LFOLLOW(A) and RFOLLOW(A) by replacing the word "sentential" by "left-most sentential" and "right-most sentential" respectively in the definition of FOLLOW(A).
a) FOLLOW(A) and LFOLLOW(A) may be different
b) FOLLOW(A) and RFOLLOW(A) are always the same
c) all the three are the same
d) all the three are different

Q.93 In a programming language, an identifier is permitted to be a letter followed by any number of letters or digits. If L and D denote the sets of letters and digits respectively, which of the following expressions defines an identifier?
a) (L ∪ D)+
b) L.(L ∪ D)*
c) (L.D)*
d) L.(L.D)*

Q.94 A shift-reduce parser carries out the actions specified within braces immediately after reducing with the corresponding rule of the grammar
S → xxW {print "1"}
S → y {print "2"}
W → Sz {print "3"}
What is the translation of xxxxyzz using the syntax-directed translation scheme described by the above rules?
a) 23131   b) 11233
c) 11231   d) 33211

Q.95 Which of the following features cannot be captured by a CFG?
a) Syntax of if-then-else statements
b) Syntax of recursive procedures
c) Whether a variable is declared before its use
d) Matching nested parentheses

Q.96 A language L allows the declaration of arrays whose sizes are not known during compilation. It is required to make efficient use of memory. Which one of the following is true?
a) A compiler using static memory allocation can be written for L.
b) A compiler cannot be written for L; an interpreter must be used.
c) A compiler using dynamic memory allocation can be written for L.
d) None of the above.

ANSWER KEY:

1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
(d)  (d)  (b)  (a)  (a)  (a)  (c)  (a)  (a)  (a)  (d)  (c)  (a)  (b)  (c)
16   17   18   19   20   21   22   23   24   25   26   27   28   29   30
(b)  (a)  (a)  (b)  (b)  (a)  (a)  (c)  (c)  (a)  (a)  (a)  (d)  (c)  (c)
31   32   33   34   35   36   37   38   39   40   41   42   43   44   45
(d)  (b)  (a)  (a)  (c)  (b)  (b)  (b)  (b)  (a)  (d)  (b)  (b)  (a)  (c)
46   47   48   49   50   51   52   53   54   55   56   57   58   59   60
(d)  (c)  (a)  (b)  (c)  (d)  (c)  (a)  (a)  (a)  (a)  (a)  (b)  (a)  (b)
61   62   63   64   65   66   67   68   69   70   71   72   73   74   75
(d)  (a)  (a)  (a)  (d)  (a)  (b)  (a)  (b)  (a)  (c)  (a)  (d)  (d)  (a)
76   77   78   79   80   81   82   83   84   85   86   87   88   89   90
(c)  (a)  (c)  (a)  (a)  (a)  (a)  (d)  (a)  (a)  (a)  (d)  (a)  (a)  (a)
91   92   93   94   95   96
(a)  (a)  (b)  (a)  (c)  (c)

EXPLANATIONS

Q.25 (a)
Consider the string a+a*a. It can be derived as
E → E+E → E+E*E → a+E*E → a+a*E → a+a*a
or
E → E*E → E+E*E → a+E*E → a+a*E → a+a*a
Since we have a string that can be derived in more than one way, the given grammar is ambiguous.

Q.27 (a)
abc can be derived as follows:
S → Abc → abc (using Ab → ab)
As we see, any production from the start state has to end in c, so aab is impossible. Options (c) and (d) are also not possible.

Q.28 (d)
Generate some of the strings that can be derived from the start state and verify that they fall into the category covered by option (d).

Q.47 (c)
If memory space is not the constraint, then by increasing the number of bins to K, the access time can be reduced by a factor of K; the average number of items in a bin decreases as the number of bins increases. In the case of a list, the access time will be proportional to n, the number of items, but we will be using only as much memory space as is absolutely necessary. In the case of the search tree implementation, the access time will be logarithmic.
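The bin argument in the Q.47 explanation can be made concrete with a small sketch of our own: a chained hash table whose average chain length, and hence lookup time, shrinks by roughly a factor of K.

K = 64
bins = [[] for _ in range(K)]

def insert(name, info):
    bins[hash(name) % K].append((name, info))

def lookup(name):
    # Only one short chain is scanned, not the whole table.
    for entry_name, info in bins[hash(name) % K]:
        if entry_name == name:
            return info
    return None

for i in range(1000):
    insert(f"id{i}", {"type": "int"})
print(lookup("id42"), max(len(b) for b in bins))   # chains are about 1000/K long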

Q.69 (b)
Any shift-reduce parser typically works by shifting entries onto the stack. If a handle is found on the top of the stack, it is popped and replaced by the corresponding left hand side of the production. If ultimately we have only the starting non-terminal on the stack when there are no more tokens to be scanned, the parsing is successful. So, it is bottom-up.

Q.94 (a)
The right-most derivation of the string xxxxyzz is
S → xxW → xxSz → xxxxWz → xxxxSzz → xxxxyzz
A shift-reduce parser performs the right-most derivation in reverse. So, first it reduces the y to S by the production S → y; as a consequence, 2 is immediately printed. Next, Sz is reduced to W by the production W → Sz, so 3 is printed. Proceeding this way, we get the output string 23131.

Q.95 (c)
It is because it is equivalent to recognizing wcw, where the first w is the declaration and the second is its use; wcw is not a context-free language.
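The reduction sequence for Q.94 can be replayed with a small sketch of our own: a naive shift-reduce loop that is sufficient for this particular grammar, though not for grammars needing lookahead.

rules = [(('x', 'x', 'W'), 'S', '1'),
         (('y',),           'S', '2'),
         (('S', 'z'),       'W', '3')]

stack, out = [], []
for symbol in "xxxxyzz":
    stack.append(symbol)                        # shift
    reduced = True
    while reduced:                              # reduce as long as possible
        reduced = False
        for rhs, lhs, action in rules:
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]
                stack.append(lhs)               # replace handle by its LHS
                out.append(action)              # fire the {print ...} action
                reduced = True
                break

print("".join(out), stack)                      # prints: 23131 ['S']

The loop performs exactly the reductions listed in the explanation, ending with only the start symbol S on the stack and the output 23131.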
