Compiler Construction

CSCI 742 - Compiler Construction Lecture 12 Cocke-Younger-Kasami (CYK) Algorithm Instructor: Hossein Hojjat February 20, 2017 Recap: Chomsky Normal Form (CNF) A CFG is in Chomsky Normal Form if each rule is of the form A BC ! A a ! where a is any terminal • A,B,C are non-terminals • B, C cannot be start variable • We allow the rule S if L ! 2 1 Recap: Conversion to CNF 1. remove unproductive non-terminals (optional) 2. remove unreachable non-terminals (optional) 3. make terminals occur alone on right-hand side 4. reduce arity of every production to 2 ≤ 5. remove -production rules X ! 6. remove unit productions X Y ! 7. unproductive non-terminals 8. unreachable non-terminals 2 Approaches to Parsing Top Down (Goal driven) S Start from the start non-terminal S • Grow parse tree downwards to match • Expr Plus Term the input word Bottom Up (Data Driven) Term num Start from the input word • Build up parse tree whose root is the • start non-terminal S num 3 CYK (Cocke-Younger-Kasami) One of the earliest recognition and parsing algorithms • Bottom-up parsing (starts with words) • Independently developed by J.Cocke, D.Younger, T.Kasami (CYK) • Build solutions compositionally from sub-solutions • Based on a “dynamic programming” approach • 4 CYK algorithm: Basic idea For a grammar G and a word w: For every substring v of length 1, find all non-terminals A such that • 1 A ∗ v ) 1 For every substring v of length 2, find all non-terminals A such that • 2 A ∗ v2 . ) . For the unique substring w of length w , find all non-terminals A such that • j j A ∗ w ) Check whether S belongs to the last set 5 CYK algorithm Standard CYK operates only on context-free grammars in • Chomsky Normal Form (CNF) To illustrate the benefits of CNF we use a non-CNF grammar • 6 Example word: 12.3e+4 Number ! Integer j Real Integer ! Digit j Integer Digit Real ! Integer Fraction Scale Fraction ! : Integer Scale ! e Sign Integer j Digit ! 0 j 1 j · · · j 8 j 9 Sign ! + j − 1 2 . 3 e + 4 7 Grammar has unit-rules: after finding a non-terminal, • we need to repetitively search for other non-terminals that derive the same word Chain of rules: Number Integer Digit • ) ) Example word: 12.3e+4 Number ! Integer j Real Integer ! Digit j Integer Digit Real ! Integer Fraction Scale Fraction ! : Integer Scale ! e Sign Integer j Sign Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Digit Sign ! + j − 1 2 . 3 e + 4 We start with substrings of length 1 • 7 Example word: 12.3e+4 Number ! Integer j Real Integer ! Digit j Integer Digit Real ! Integer Fraction Scale Fraction ! : Integer Scale ! e Sign Integer j Number Number Number Number Integer Integer Integer Sign Integer Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Digit Sign ! + j − 1 2 . 3 e + 4 We start with substrings of length 1 • Grammar has unit-rules: after finding a non-terminal, • we need to repetitively search for other non-terminals that derive the same word Chain of rules: Number Integer Digit 7 • ) ) Example word: 12.3e+4 Number ! Integer j Real Integer ! Digit j Integer Digit Real ! Integer Fraction Scale Fraction ! : Integer Number, Integer Fraction Scale ! e Sign Integer j Number Number Number Number Integer Integer Integer Sign Integer Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Digit Sign ! + j − 1 2 . 3 e + 4 7 Example word: 12.3e+4 Number ! Integer j Real Integer ! Digit j Integer Digit Real ! Integer Fraction Scale Scale Fraction ! : Integer Number, Integer Fraction Scale ! e Sign Integer j Number Number Number Number Integer Integer Integer Sign Integer Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Digit Sign ! + j − 1 2 . 3 e + 4 7 Example word: 12.3e+4 Number ! Integer j Real Integer ! Digit j Integer Digit Number, Real Real ! Integer Fraction Scale Scale Fraction ! : Integer Number, Integer Fraction Scale ! e Sign Integer j Number Number Number Number Integer Integer Integer Sign Integer Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Digit Sign ! + j − 1 2 . 3 e + 4 7 Example word: 12.3e+4 Number ! Integer j Real Number, Real Integer ! Digit j Integer Digit Number, Real Real ! Integer Fraction Scale Scale Fraction ! : Integer Number, Integer Fraction Scale ! e Sign Integer j Number Number Number Number Integer Integer Integer Sign Integer Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Digit Sign ! + j − 1 2 . 3 e + 4 7 Example word: 12.3 Number ! Integer j Real Integer ! Digit j Integer Digit Real ! Integer Fraction Scale Number, Integer Fraction Fraction ! : Integer Number Number Number Scale ! e Sign Integer j Integer Integer Integer Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Sign ! + j − 1 2 . 3 8 Example word: 12.3 Number ! Integer j Real Integer ! Digit j Integer Digit Number, Real Real ! Integer Fraction Scale Number, Real Number, Integer Fraction Scale Fraction ! : Integer Number Number Number Scale ! e Sign Integer j Integer Integer Integer Digit ! 0 j 1 j · · · j 8 j 9 Digit Digit Digit Sign ! + j − 1 2 . 3 We neglected -production rules until now • -rules can be used between any two symbols, at the beginning • and at the end 8 Substring Recognition s1,4 s2,4 word: w = a a a • 1 2 ··· n s1,3 Xi;j set of all non-terminals deriving the s3,4 • substring s s2,3 i;j s1,2 Start non-terminal S derives w (= s ) • 1;n s1,1 s2,2 s3,3 s4,4 if and only if S is a member of X1;n a1 a2 a3 a4 9 Substring Recognition How to check if the RHS of rule A A A A derives s ? • ! 1 2 ··· n i;j A1 A2 An (i < k < j) ai ak 1 ak ak+1 aj ··· − We need to check all the possible matching of substrings with A • i non-terminals Results of checking subproblem solutions are reusable • Subproblem results are computed once and then memoized for later • A1 A2 Matching in CNF: ai ak 1 ak aj ··· − ··· 10 Recognition Table X1,5 X1,4 X2,5 X1,3 X2,4 X3,5 X1,2 X2,3 X3,4 X4,5 X1,1 X2,2 X3,3 X4,4 X5,5 a1 a2 a3 a4 a5 Each row corresponds to one length of substrings Bottom Row: words of length 1 • Second from Bottom Row: words of length 2 • . • Top Row: word w • 11 Example S AB CB ! j A BA a grammar: ! j intput: babab B BC b ! j C AC a ! j B A, C B A, C B b a b a b 12 Example S AB CB ! j A BA a grammar: ! j intput: babab B BC b ! j C AC a ! j A, B S A, B S B A, C B A, C B b a b a b 12 Example S AB CB ! j A BA a grammar: ! j intput: babab B BC b ! j C AC a ! j S S S A, B S A, B S B A, C B A, C B b a b a b 12 Example S AB CB ! j A BA a grammar: ! j intput: babab B BC b ! j C AC a ! j S, A S S S A, B S A, B S B A, C B A, C B b a b a b 12 Example S AB CB ! j A BA a grammar: ! j intput: babab B BC b ! j C AC a ! j S S, A S S S A, B S A, B S B A, C B A, C B b a b a b 12 Exercise What is the time and space complexity of CYK? • • Assume size of input word is n, number of non-terminals is r 13 CYK Problem Parsers usually parse the original grammar without further • modification Grammar reflects the actual structure of language • Conversion to CNF can make it difficult to keep the intended • structure 14.

Compiler Construction

The Earley Algorithm Is to Avoid This, by Only Building Constituents That Are Compatible with the Input Read So Far

LATE Ain't Earley: a Faster Parallel Earley Parser

Lecture 10: CYK and Earley Parsers Alvin Cheung Building Parse Trees Maaz Ahmad CYK and Earley Algorithms Talia Ringer More Disambiguation

A Mobile App for Teaching Formal Languages and Automata

An Earley Parsing Algorithm for Range Concatenation Grammars

Principles and Implementation of Deductive Parsing

Analysis of Cocke-Younger-Kasami and Earley Algorithms on Probabilistic Context-Free Grammar

Syntactic Parsing: Introduction, CYK Algorithm

Parallel Parsing of Context-Free Grammars

Parsing of Context-Free Grammars

An Efficient Context-Free Parsing Algorithm for Natural Languages1

14.5 Supplemental: Context Free Grammars: the CYK Algorithm