PARSING AS TREE TRAVERSAL

Dale Gerdemann*
Seminar für Sprachwissenschaft, Universität Tübingen


ABSTRACT

This paper presents a unified approach to parsing, in which top-down, bottom-up and left-corner parsers are related to preorder, postorder and inorder tree traversals. It is shown that the simplest bottom-up and left-corner parsers are left recursive and must be converted using an extended Greibach normal form. With further partial execution, the bottom-up and left-corner parsers collapse together, as in the BUP parser of Matsumoto.

1 INTRODUCTION

In this paper, I present a unified approach to parsing, in which top-down, bottom-up and left-corner parsers are related to preorder, postorder and inorder tree traversals. To some extent, this connection is already clear, since for each parsing strategy the nodes of the parse tree are constructed according to the corresponding tree traversal. It is somewhat trickier, though, to actually use a tree traversal program as a parser, since the resulting parser may be left recursive. This left recursion can be eliminated, however, by employing a version of Greibach Normal Form which is extended to handle argument instantiations in definite clause grammars. The resulting parsers resemble the standard Prolog versions of such parsers. One can then go one step further and partially execute the parser with respect to a particular grammar, as is normally done with definite clause grammars (Pereira & Warren [10]). A surprising result of this partial execution is that the bottom-up and left-corner parsers become identical when they are both partially executed. This may explain why the BUP parser of Matsumoto et al. [6] [7] was referred to as a bottom-up parser even though it clearly follows a left-corner strategy.

* The research presented in this paper was partially sponsored by Teilprojekt B4 "Constraints on Grammar for Efficient Generation" of the Sonderforschungsbereich 340 of the Deutsche Forschungsgemeinschaft. I would also like to thank Guido Minnen and Dieter Martini for helpful comments. All mistakes are of course my own. Kl. Wilhelmstr. 113, D-72074 Tübingen, Germany, [email protected].

2 TREE TRAVERSAL PROGRAMS

Following O'Keefe [8], we can implement preorder, postorder and inorder tree traversals as DCGs, which will then be converted directly into top-down, bottom-up and left-corner parsers, respectively. The general schema is:

    x_order(Tree) -->
        (x_ordered node labels in Tree).

Note that in this case, since we are most likely to call x_order with the Tree variable instantiated, we are using the DCG in generation mode rather than as a parser. When used as a parser on the string S, the procedure will return all trees whose x_order traversal produces S. The three instantiations of this procedure are as follows:

    % preorder traversal
    pre(empty) --> [].
    pre(node(Mother,Left,Right)) -->
        [Mother],
        pre(Left),
        pre(Right).

    % postorder traversal
    post(empty) --> [].
    post(node(Mother,Left,Right)) -->
        post(Left),
        post(Right),
        [Mother].

    % inorder traversal
    in(empty) --> [].
    in(node(Mother,Left,Right)) -->
        in(Left),
        [Mother],
        in(Right).
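As a quick illustration of the two modes (this worked example is mine, not the paper's), the preorder DCG can be run forwards on an instantiated tree to enumerate its labels, or backwards on an instantiated label list to enumerate all trees with that traversal:

    ?- pre(node(s, node(np, empty, empty),
                   node(vp, empty, empty)),
           Labels, []).
    Labels = [s, np, vp].

    % "Parsing" the string [s, np, vp] returns every tree whose
    % preorder traversal yields it (five trees in all):
    ?- pre(Tree, [s, np, vp], []).
    Tree = node(s, empty, node(np, empty, node(vp, empty, empty))) ;
    Tree = node(s, empty, node(np, node(vp, empty, empty), empty)) ;
    ...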
2.1 DIRECT ENCODING OF PARSING STRATEGIES

Analogous to these three traversal programs, there are three parsing strategies, which differ from the tree traversal programs in only two respects. First, the base case for a parser should be to parse a lexical item rather than to parse an empty string. And second, in the recursive clauses, the mother category figures into the parse tree and is licensed by the auxiliary predicate rule/3, but it does not figure into the string that is parsed.

    % top-down parser
    td(node(PreTerm,lf(Word))) -->
        [Word],
        {word(PreTerm,Word)}.
    td(node(Mother,Left,Right)) -->
        {rule(Mother,Left,Right)},
        td(Left),
        td(Right).

    % bottom-up parser
    bu(node(PreTerm,lf(Word))) -->
        [Word],
        {word(PreTerm,Word)}.
    bu(node(Mother,Left,Right)) -->
        bu(Left),
        bu(Right),
        {rule(Mother,Left,Right)}.

    % left-corner parser
    lc(node(PreTerm,lf(Word))) -->
        [Word],
        {word(PreTerm,Word)}.
    lc(node(Mother,Left,Right)) -->
        lc(Left),
        {rule(Mother,Left,Right)},
        lc(Right).

As seen here, the only difference between the three strategies concerns the choice of when to select a phrase structure rule.² Do you start with a rule and then try to satisfy it, as in the top-down approach; do you parse the daughters of a rule first, before selecting the rule, as in the bottom-up approach; or do you take an intermediate strategy, as in the left-corner approach?

As was the case for the three tree traversal programs, the three parsers differ from each other only with respect to the right-hand-side order. For simplicity, I assume that phrase structure rules are binary branching, though the approach can easily be generalized to non-binary branching.¹

¹ The only problematic case is left corner, since the corresponding tree traversal, inorder, is normally defined only for binary trees. But inorder is easily extended to non-binary trees as follows: i. visit the left daughter in inorder, ii. visit the mother, iii. visit the rest of the daughters in inorder.

² As opposed to, say, a choice of whether to use operations of expanding and matching or operations of shifting and reducing.

3 GREIBACH NORMAL FORM PARSERS

While this approach reflects the logic of the top-down, bottom-up and left-corner parsers in a clear way, the resulting programs are not all usable in Prolog, since the bottom-up and the left-corner parsers are left-recursive. There exists, however, a general technique for removal of left-recursion, namely conversion to Greibach normal form. The standard Greibach normal form conversion does not allow for DCG-type rules, but we can easily take care of the Prolog arguments by a technique suggested by Problem 3.118 of Pereira & Shieber [9] to produce what I will call Extended Greibach Normal Form (EGNF).³ Pereira & Shieber's idea has been more formally presented in the Generalized Greibach Normal Form of Dymetman ([1] [2]); however, the simplicity of the parsers here does not justify the extra complication of Dymetman's procedure. Using this transformation, the bottom-up parser becomes as follows:⁴

    % EGNF bottom-up
    bu(node(PreTerm,lf(Word))) -->
        [Word],
        {word(PreTerm,Word)}.
    bu(Node) -->
        [Word],
        {word(PreTerm,Word)},
        b(node(PreTerm,lf(Word)),Node).

    b(L,node(Mother,L,R)) -->
        bu(R),
        {rule(Mother,L,R)}.
    b(L,Node) -->
        bu(R),
        {rule(Mother,L,R)},
        b(node(Mother,L,R),Node).

³ EGNF is similar to normal GNF except that the arguments attached to non-terminals must be manipulated so that the original instantiations are preserved. For specific grammars, it is pretty easy to see that such a manipulation is possible. It is much more difficult (and beyond the scope of this paper) to show that there is a general rule for such manipulations.

⁴ The Greibach NF conversion introduces one auxiliary predicate, which (following Hopcroft & Ullman [4]) I have called b. Of course, the GNF conversion also does not tell us what to do with the auxiliary procedures in curly brackets. What I've done here is simply to put these auxiliary procedures in the transformed grammar in positions corresponding to where they occurred in the original grammar. It's not clear that one can always find such a "corresponding" position, though in the case of the bottom-up and left-corner parsers such a position is easy to identify.

This, however, is not very efficient, since the two clauses of both bu and b differ only in whether or not there is a final call to b. We can reduce the amount of backtracking by encoding this optionality in the b procedure itself:

    % Improved EGNF bottom-up
    bu(Node) -->
        [Word],
        {word(PreTerm,Word)},
        b(node(PreTerm,lf(Word)),Node).

    b(Node,Node) --> [].
    b(L,Node) -->
        bu(R),
        {rule(Mother,L,R)},
        b(node(Mother,L,R),Node).

By the same EGNF transformation and improvements, the resulting left-corner parser is only minimally different from the bottom-up parser:

    % Improved EGNF left-corner
    lc(Node) -->
        [Word],
        {word(PreTerm,Word)},
        b(node(PreTerm,lf(Word)),Node).

    b(Node,Node) --> [].
    b(L,Node) -->
        {rule(Mother,L,R)},
        lc(R),
        b(node(Mother,L,R),Node).
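The extract does not show how rule/3 and word/2 are defined, so the toy grammar below is my own sketch of one encoding that is compatible with the parsers above: each rule/3 fact licenses a mother category together with the skeletons of its two daughter trees (preterminal leaves are node/2 terms, internal nodes are node/3 terms).

    % Hypothetical toy grammar, not from the paper.
    % rule(Mother, LeftDaughter, RightDaughter) licenses a local tree.
    rule(s,  node(np, lf(_)), node(vp, _, _)).
    rule(vp, node(v,  lf(_)), node(np, lf(_))).

    % word(PreTerm, Word) is the lexicon.
    word(np, john).
    word(np, mary).
    word(v,  saw).

With these clauses loaded, the top-down parser and the improved EGNF parsers (though not the directly encoded bu and lc, which loop on their left recursion) can be queried in the usual DCG fashion:

    ?- bu(Tree, [john, saw, mary], []).
    Tree = node(s, node(np, lf(john)),
                   node(vp, node(v,  lf(saw)),
                            node(np, lf(mary)))).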
4 PARTIAL EXECUTION

The improved EGNF bottom-up and left-corner parsers now differ only in the position of the auxiliary predicate in curly brackets. If this auxiliary predicate is partially executed out with respect to a particular grammar, the two parsers will become identical.

In standard presentations, top-down, bottom-up and left-corner parsers are described in terms of pairs of operations such as expand/match, shift/reduce or sprout/match. But it is entirely unclear what expanding and matching have to do with shifting, reducing or sprouting. By relating parsing to tree traversal, however, it becomes much clearer how these three approaches to parsing relate to each other. This is a natural comparison, since clearly the possible orders in which a tree can be traversed should not differ from the possible orders in which a parse tree can be constructed. What is new in this paper, however, is the idea that such tree traversal programs can be used directly as parsers.
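The extract breaks off before showing the partially executed parser, but the effect can be sketched against the hypothetical toy grammar above (again my illustration, not the paper's code). Unfolding the word/2 and rule/3 calls at compile time specializes the improved EGNF bottom-up parser to:

    % Partial execution sketch, assuming the toy grammar above.
    % One lexical clause per word/2 fact:
    bu(Node) --> [john], b(node(np, lf(john)), Node).
    bu(Node) --> [mary], b(node(np, lf(mary)), Node).
    bu(Node) --> [saw],  b(node(v,  lf(saw)),  Node).

    b(Node, Node) --> [].
    % One clause per rule/3 fact; the rule call itself is gone:
    b(node(np, lf(W)), Node) -->
        bu(node(vp, L, R)),
        b(node(s, node(np, lf(W)), node(vp, L, R)), Node).
    b(node(v, lf(W)), Node) -->
        bu(node(np, lf(W2))),
        b(node(vp, node(v, lf(W)), node(np, lf(W2))), Node).

Specializing the improved EGNF left-corner parser in the same way yields exactly these clauses again, with lc in place of bu: the only difference between the two parsers was whether the rule/3 call preceded or followed the recursive call, and once that call has been executed away at compile time, nothing distinguishes them. This is the sense in which the two parsers become identical, and why a BUP-style partially executed parser can be read equally as bottom-up or as left-corner.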