Uninformed Search

Parsing • A grammar describes syntactically legal strings in a language 8 Parsing • A recogniser simply accepts or rejects strings • A generator produces strings • A parser constructs a parse tree for a string • Two common types of parsers: –bottom-up or data driven –top-down or hypothesis driven • A recursive descent parser easily implements a top-down parser for simple grammars S Top down vs. bottom up parsing Top down vs. bottom up parsing • The parsing problem is to connect the root • Both are general techniques that can be made to work node S with the tree leaves, the input for all languages (but not all grammars!) • Top-down parsers: starts constructing • Recall that a given language can be described by the parse tree at the top (root) and move A = 1 + 3 * 4 / 5 down towards the leaves. Easy to implement by several grammars hand, but requires restricted grammars. E.g.: • Both of these grammars describe the same language - Predictive parsers (e.g., LL(k)) E -> E + Num E -> Num + E • Bottom-up parsers: build nodes on the bottom of E -> Num E -> Num the parse tree first. Suitable for automatic parser generation, handles larger class of grammars. E.g.: • The first one, with it’s left recursion, causes problems for top down parsers – shift-reduce parser (or LR(k) parsers) • For a given parsing technique, we may have to transform the grammar to work with it Parsing complexity Top Down Parsing Methods • How hard is the parsing task? How to we measure that? • Simplest method is a full-backup, recur- • Parsing an arbitrary CFG is O(n3) -- it can take time propor- sive descent parser tional the cube of the number of input symbols • This is bad! (why?) • Often used for parsing simple languages • If we constrain the grammar somewhat, we can always parse • Write recursive recognizers (subroutines) in linear time. This is good! (why?) for each grammar rule • Linear-time parsing • LL(n) : Left to right, – LL parsers Leftmost derivation, –If rules succeeds perform some action • Recognize LL grammar look ahead at most n (i.e., build a tree node, emit code, etc.) symbols. • Use a top-down strategy • LR(n) : Left to right, –If rule fails, return failure. Caller may – LR parsers Right derivation, try another choice or fail • Recognize LR grammar look ahead at most n symbols. –On failure it “backs up” • Use a bottom-up strategy 1 Top Down Parsing Methods: Problems Recursive Decent Parsing: Example For the grammar: • When going forward, the parser consumes tokens from the input, so what happens if <term> -> <factor> {(*|/)<factor>}* we have to back up? We could use the following recursive – suggestions? descent parsing subprogram (this one is • Algorithms that use backup tend to be, in written in C) general, inefficient void term() { factor(); /* parse first factor*/ – There might be a large number of possibilities while (next_token == ast_code || to try before finding the right one or giving up next_token == slash_code) { lexical(); /* get next token */ • Grammar rules which are left-recursive factor(); /* parse next factor */ } lead to non-termination! } Problems Left-recursive grammars • Some grammars cause problems for top down parsers • A grammar is left recursive if it has • Top down parsers do not work with left- rules like recursive grammars X -> X – E.g., one with a rule like: E -> E + T • Or if it has indirect left recursion, as in – We can transform a left-recursive grammar into X -> A one which is not A -> X • A top down grammar can limit backtracking if it only has one rule per non-terminal • Q: Why is this a problem? – The technique of rule factoring can be used to –A: it can lead to non-terminating eliminate multiple rules for a non-terminal recursion! Direct Left-Recursive Grammars Elimination of Direct Left-Recursion • Consider left-recursive • Concretely • Consider grammar T -> T + id E -> E + Num S S T-> id E -> Num S -> • T generates strings • S generates strings id • We can manually or automatically id+id rewrite a grammar removing left- id+id+id … recursion, making it ok for a top-down … • Rewrite using right- recursion parser. • Rewrite using right- T -> id recursion T -> id T S S’ S’ S’| 2 General Left Recursion Summary of Recursive Descent • The grammar S A | • Simple and general parsing strategy A S – Left-recursion must be eliminated first – … but that can be done automatically is also left-recursive because S + S • Unpopular because of backtracking – Thought to be too inefficient where + means “can be rewritten in one or more steps” • In practice, backtracking is eliminated by • This indirect left-recursion can also be further restricting the grammar to allow us automatically eliminated (not covered) to successfully predict which rule to use Predictive Parsers Predictive Parsers • That there can be many rules for a non-terminal • An important class of predictive parser only makes parsing hard peek ahead one token into the stream • A predictive parser processes the input stream • An LL(k) parser, does a Left-to-right parse, a typically from left to right Leftmost-derivation, and k-symbol lookahead – Is there any other way to do it? Yes for programming languages! • Grammars where one can decide which rule • It uses information from peeking ahead at the to use by examining only the next token are upcoming terminal symbols to decide which LL(1) grammar rule to use next • LL(1) grammars are widely used in practice • And always makes the right choice of which rule – The syntax of a PL can usually be adjusted to to use enable it to be described with an LL(1) grammar • How much it can peek ahead is an issue Predictive Parser Remember… Example: consider the grammar • Given a grammar and a string in the language defined by the grammar … S if E then S else S S begin S L • There may be more than one way to derive the string S print E leading to the same parse tree L end – It depends on the order in which you apply the rules An S expression starts either with L ; S L – And what parts of the string you choose to rewrite next E num = num an IF, BEGIN, or PRINT token, and an L expression start with an • All of the derivations are valid END or a SEMICOLON token, • To simplify the problem and the algorithms, we often and an E expression has only one production. focus on one of two simple derivation strategies – A leftmost derivation – A rightmost derivation 3 LL(k) and LR(k) parsers Predictive Parsing and Left Factoring • Two important parser classes are LL(k) and LR(k) • Consider the grammar Even if left recursion is • The name LL(k) means: E T + E E T removed, a grammar – L: Left-to-right scanning of the input T int may not be parsable – L: Constructing leftmost derivation T int * T with a LL(1) parser – k: max # of input symbols needed to predict parser action T ( E ) • The name LR(k) means: • Hard to predict because – L: Left-to-right scanning of the input – For T, two productions start with int – R: Constructing rightmost derivation in reverse – For E, it is not clear how to predict which rule to use – k: max # of input symbols needed to select parser action • Must left-factor grammar before use for predictive • A LR(1) or LL(1) parser never need to “look ahead” parsing more than one input token to know what parser • Left-factoring involves rewriting rules so that, if a non- production rule applies terminal has > 1 rule, each begins with a terminal Left-Factoring Example Using Parsing Tables Add new non-terminals X and Y to factor out • LL(1) means that for each non-terminal and token common prefixes of rules there is only one production • Can be represented as a simple table E T + E E T X – One dimension for current non-terminal to expand E T – One dimension for next token T int X + E – A table entry contains one rule’s action or empty if error T int * T X • Method similar to recursive descent, except T ( E ) T ( E ) – For each non-terminal S – We look at the next token a T int Y – And chose the production shown at table cell [S, a] For each non-terminal the • Use a stack to keep track of pending non-terminals revised grammar, there is either Y * T only one rule or every rule Y • Reject when we encounter an error state, accept when begins with a terminal or we encounter end-of-input E T X LL(1) Parsing Table Example LL(1) Parsing Table Example X + E | T ( E ) | int Y Left-factored grammar •Consider the [E, int] entry Y * T | E T X – “When current non-terminal is E & next input int, use production E T X” X + E | – It’s the only production that can generate an int in next place T ( E ) | int Y •Consider the [Y, +] entry Y * T | – “When current non-terminal is Y and current token is +, get rid of Y” – Y can be followed by + only in a derivation where Y End of input symbol •Consider the [E, *] entry – Blank entries indicate error situations The LL(1) parsing table – “There is no way to derive a string starting with * from non-terminal E” int * + ( ) $ E T X T X X + E int * + ( ) $ T int Y ( E ) E T X T X Y * T X + E T int Y ( E ) Y * T 4 LL(1) Parsing Algorithm LL(1) Parsing Example Stack Input Action E $ int * int $ pop();push(T X) initialize stack = <S $> and next T X $ int * int $ pop();push(int Y) repeat int Y X $ int * int $ pop();next++ case stack of Y X $ * int $ pop();push(* T) <X, rest> : if T[X,*next] = Y1…Yn * T X $ * int $ pop();next++ then stack <Y1… Yn rest>; T X $ int $ pop();push(int Y) else error (); int Y X $ int $ pop();next++; <t, rest> : if t == *next ++ Y X $ $ pop() then stack <rest>; X $ $ pop() else error (); $ $ ACCEPT! until stack == < > E TX int * + ( ) $ X +E E T X T X (1) next points to the next input token X where: T (E) X + E (2) X matches some non-terminal T int Y T int Y ( E ) (3) t matches some terminal Y *T Y Y * T Constructing Parsing Tables Computing First Sets • No table entry can be multiply defined Definition: First(X) = {t| X*t}{|X*} • If A , where in the line of A do we place Algorithm sketch (see book for details): ? 1.

Uninformed Search

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support