Syntax Analysis

Amitabha Sanyal (www.cse.iitb.ac.in/~as)
Department of Computer Science and Engineering, Indian Institute of Technology, Bombay
September 2007, College of Engineering, Pune

Syntax Analysis: Recap
A syntax analyzer or parser:
• Ensures that the input program is well-formed by attempting to group tokens according to certain rules. This is called syntax checking.
• May actually create the hierarchical structure that arises out of such grouping. This information is required by subsequent phases.

Syntax Analysis: Example
    main ()
    { int i,sum;
      sum = 0;
      for (i=1; i<=10; i++);
      sum = sum + i;
      printf("%d\n",sum);
    }
[Figure: partial parse tree for this program. The root fundef has children fname, params and compound-stmt; fname derives identifier (main), params derives ( ), and compound-stmt derives { vdecl slist }. vdecl derives type varlist ;, with type deriving int and varlist deriving varlist , var, whose identifiers are i and sum.]

Syntax Analysis: Recap
• Checking whether a program is well-formed requires a specification of what a well-formed program is. The specification must be:
  1. precise;
  2. complete, i.e. it must cover all the syntactic details of the language;
  3. convenient to use for both the language designer and the implementer.
  A context free grammar meets these requirements.
• How is the hierarchical structure of the program represented?
  Using a data structure called a parse tree.

Syntax Analysis: How are parsers constructed?
• Till the early seventies, parsers (in fact, all of the compiler) were written manually.
• A better understanding of parsing algorithms has resulted in tools that can automatically generate parsers.
• Examples of parser-generating tools:
  - Yacc/Bison: bottom-up (LALR) parser generator.
  - ANTLR: top-down (LL) scanner-cum-parser generator.

Syntax Analysis: Interface of a parser with the rest of the compiler
[Figure: the source program is read by the lexical analyser; the parser repeatedly asks it to "get next token" and receives a token in return; the parser's output is passed on to the rest of the compiler.]

Specification of Syntax by Context Free Grammars
Informal description of variable declarations in C:
• starts with integer or real as the first token,
• followed by one or more identifier tokens, separated by the token comma,
• followed by the token semicolon.
Question: can the list of identifier tokens be empty?
    declaration → type idlist ;
    idlist → id | idlist , id
    type → integer | real
This illustrates the usefulness of a formal specification: since idlist → id | idlist , id always derives at least one id, the grammar answers the question unambiguously (the list cannot be empty).

Context Free Grammar
A CFG G is formally defined to have four components (N, T, S, P):
1. T is a finite set of terminals.
2. N is a finite set of nonterminals.
3. S is a special nonterminal (from N) called the start symbol.
4. P is a finite set of production rules of the form $A \to \alpha$, where $A \in N$ and $\alpha \in (N \cup T)^*$.
In the declaration grammar above, declaration is the start symbol; declaration, idlist and type are non-terminals; id, ',', ';', integer and real are terminals; each line is a production (| separates alternative right-hand sides).

Derivation
Example: G = (N, T, S, P) with N = {list}, T = {id, ','}, S = list, and P consisting of list → list , id and list → id.
A derivation is traced out as follows:
    list ⇒ list , id ⇒ list , id , id ⇒ id , id , id
• The transformation of a string of grammar symbols by replacing a non-terminal by the corresponding right-hand side of a production is called a derivation.
• The set of all possible terminal strings that can be derived from the start symbol of a CFG is the language generated by the CFG. This grammar generates a list of one or more ids separated by commas.

Why the Term Context Free?
1. The only kind of productions permitted are of the form: non-terminal → sequence of terminals and non-terminals.
2. Rules are used to replace an occurrence of the lhs non-terminal by its rhs. The replacement is made regardless of the context (the symbols surrounding the non-terminal).

Notational Conventions
    Symbol type                  Convention
    single terminal              letters a, b, c; operators; delimiters; keywords
    single nonterminal           letters A, B, C and names such as declaration, list; S is the start symbol
    single grammar symbol        X, Y, Z (a symbol from N ∪ T)
    string of terminals          letters x, y, z
    string of grammar symbols    α, β, γ
    null string                  ε

Formal Definitions
Let
• $A \to \gamma$ be a production rule, and
• $\alpha A \beta$ be a string of grammar symbols.
Then
  - replacing the nonterminal $A$ in $\alpha A \beta$ yields $\alpha \gamma \beta$;
  - formally, this is stated as "$\alpha A \beta$ derives $\alpha \gamma \beta$ in one step";
  - symbolically, $\alpha A \beta \Rightarrow \alpha \gamma \beta$.
• $\alpha_1 \overset{*}{\Rightarrow} \alpha_2$ means $\alpha_1$ derives $\alpha_2$ in zero or more steps. Clearly $\alpha \overset{*}{\Rightarrow} \alpha$ holds for any $\alpha$.
• $\alpha_1 \overset{+}{\Rightarrow} \alpha_2$ means $\alpha_1$ derives $\alpha_2$ in one or more steps.

Formal Definitions (continued)
• The language $L(G)$ generated by a context free grammar $G$ is defined as $L(G) = \{ w \mid S \overset{+}{\Rightarrow} w,\ w \in T^* \}$. Strings in $L(G)$ are called sentences of $G$.
• A string $\alpha \in (N \cup T)^*$ such that $S \overset{*}{\Rightarrow} \alpha$ is called a sentential form of $G$.
• Two grammars are equivalent if they generate the same language.
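To make the four components (N, T, S, P) and the derivation relation concrete, here is a minimal Python sketch for the list grammar of the derivation example. The encoding (symbols as strings, sentential forms as tuples, P as a dict from a nonterminal to its right-hand sides) and the helper names derive_step and sentences are assumptions made for illustration only. derive_step performs one ⇒ step, and sentences enumerates short members of L(G) by a breadth-first search over leftmost derivations.

```python
from collections import deque

# Hypothetical encoding of G = (N, T, S, P) for:  list -> list , id | id
N = {"list"}
T = {"id", ","}
S = "list"
P = {"list": [("list", ",", "id"), ("id",)]}


def derive_step(form, index, rhs):
    """One derivation step: replace the nonterminal at position `index` of the
    sentential form `form` (a tuple of symbols) by the right-hand side `rhs`."""
    if form[index] not in N or rhs not in P[form[index]]:
        raise ValueError("no such production for this symbol")
    return form[:index] + rhs + form[index + 1:]


def sentences(max_len):
    """Enumerate the sentences of L(G) having at most `max_len` symbols,
    by breadth-first search over leftmost derivations from S."""
    found = set()
    seen = {(S,)}
    queue = deque([(S,)])
    while queue:
        form = queue.popleft()
        nonterminals = [i for i, sym in enumerate(form) if sym in N]
        if not nonterminals:  # only terminals left: a sentence of G
            found.add(" ".join(form))
            continue
        i = nonterminals[0]  # always expand the leftmost nonterminal
        for rhs in P[form[i]]:
            new = derive_step(form, i, rhs)
            # Pruning by length is safe here: no production shrinks a form.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=len)


if __name__ == "__main__":
    f0 = (S,)
    f1 = derive_step(f0, 0, ("list", ",", "id"))  # list , id
    f2 = derive_step(f1, 0, ("list", ",", "id"))  # list , id , id
    f3 = derive_step(f2, 0, ("id",))              # id , id , id
    print(" ".join(f3))   # id , id , id
    print(sentences(5))   # ['id', 'id , id', 'id , id , id']
```

The length-based pruning relies on a property of this particular grammar (it has no ε-productions); a grammar with ε-productions would need a different bound.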
Basic Concepts in Parsing
• For constructing a derivation, there are choices at each sentential form:
  - the choice of the nonterminal to be replaced;
  - the choice of a rule corresponding to that nonterminal.
• Instead of choosing the nonterminal to be replaced in an arbitrary fashion, it is possible to make a uniform choice at each step:
  - replace the leftmost nonterminal in a sentential form, or
  - replace the rightmost nonterminal in a sentential form.
  The corresponding derivations are known as leftmost and rightmost derivations respectively.
• Given a sentence w of a grammar G, there may be several distinct derivations for w.

Parse Trees
A parse tree is a pictorial form of depicting a derivation.
1. The root of the tree is labeled with S.
2. Each leaf node is labeled by a token or by ε.
3. An internal node of the tree is labeled by a nonterminal.
4. If an internal node has A as its label and the children of this node, from left to right, are labeled with X1, X2, ..., Xn, then there must be a production A → X1 X2 ... Xn, where each Xi is a grammar symbol.

Illustration
    E → E + T | T
    T → T ∗ F | F
    F → ( E ) | id
Leftmost derivation:
    E ⇒ E + T ⇒ E + T + T ⇒ T + T + T ⇒ F + T + T ⇒ id + T + T ⇒ id + F + T ⇒ id + id + T ⇒ id + id + F ⇒ id + id + id
Rightmost derivation:
    E ⇒ E + T ⇒ E + F ⇒ E + id ⇒ E + T + id ⇒ E + F + id ⇒ E + id + id ⇒ T + id + id ⇒ F + id + id ⇒ id + id + id
[Figure: the parse tree for id + id + id. The root E has children E, +, T; that E again has children E, +, T; the innermost E derives T, then F, then id; each T derives F, then id.]

Derivations and Parse Trees
The following summarize some interesting relations between the two concepts:
• A parse tree filters out the choice of replacements made in the sentential forms, i.e. the order in which nonterminals are replaced: the leftmost and the rightmost derivations above correspond to the same parse tree.
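Because a parse tree records which production was applied at each internal node but not the order in which the nodes were expanded, both derivations of the illustration can be read back off the same tree. The minimal Python sketch below assumes a hypothetical Node class, helper constructors n and t_id, and a derivation function (none of these come from the slides): it keeps the current sentential form as the frontier of tree nodes and repeatedly expands the leftmost, or the rightmost, internal node into its children.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A parse-tree node: a grammar symbol plus its children, left to right.
    Leaves (tokens, or ε) have no children; internal nodes are nonterminals."""
    label: str
    children: tuple = ()


def n(label, *children):
    """Hypothetical convenience constructor for building trees by hand."""
    return Node(label, tuple(children))


def derivation(tree, leftmost=True):
    """Read a leftmost (or rightmost) derivation off a parse tree.

    Each step expands the leftmost (or rightmost) internal node of the current
    frontier into its children, i.e. applies the production recorded there."""
    form = [tree]
    steps = [" ".join(x.label for x in form)]
    while True:
        internal = [i for i, x in enumerate(form) if x.children]
        if not internal:  # only leaves remain: the sentence has been derived
            return steps
        i = internal[0] if leftmost else internal[-1]
        form[i:i + 1] = form[i].children
        steps.append(" ".join(x.label for x in form))


def t_id():
    """Subtree T -> F -> id, used three times below."""
    return n("T", n("F", n("id")))


# Parse tree for id + id + id under E -> E + T | T, T -> T * F | F, F -> ( E ) | id
tree = n("E",
         n("E", n("E", t_id()), n("+"), t_id()),
         n("+"),
         t_id())

if __name__ == "__main__":
    print(" => ".join(derivation(tree, leftmost=True)))
    print(" => ".join(derivation(tree, leftmost=False)))
```

Both runs start from the same tree and end in id + id + id; they differ only in the order of expansion, which is exactly the information the parse tree discards.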
