Statistical Parsing and CKY Algorithm
Instructor: Wei Xu Ohio State University
Many slides from Ray Mooney and Michael Collins TA Office Hours for HW#2
• Dreese 390: - 03/28 Tue 10:00AM-12:00 noon - 03/30 Thu 10:00AM-12:00 noon - 04/04 Tue 10:00AM-12:00 noon Wuwei Lan
• Readings: - textbook http://ciml.info/dl/v0_99/ciml-v0_99-ch17.pdf - slide #28,29.30: https://cocoxu.github.io/courses/ 5525_slides_spring17/15_more_memm.pdf
Syntax
Syntactic Parsing Parsing
• Given a string of terminals and a CFG, determine if the string can be generated by the CFG: - also return a parse tree for the string - also return all possible parse trees for the string
• Must search space of derivations for one that derives the given string. - Top-Down Parsing - Bottom-Up Parsing Simple CFG for ATIS English
Grammar Lexicon S → NP VP Det → the | a | that | this S → Aux NP VP Noun → book | flight | meal | money S → VP Verb → book | include | prefer NP → Pronoun Pronoun → I | he | she | me NP → Proper-Noun Proper-Noun → Houston | NWA NP → Det Nominal Aux → does Nominal → Noun Prep → from | to | on | near | through Nominal → Nominal Noun Nominal → Nominal PP VP → Verb VP → Verb NP VP → VP PP PP → Prep NP Parsing Example
S
VP
Verb NP book that flight
book Det Nominal
that Noun
flight Top Down Parsing • Start searching space of derivations for the start symbol. S
NP VP
Pronoun Top Down Parsing
S
NP VP
Pronoun X book Top Down Parsing
S
NP VP
ProperNoun Top Down Parsing
S
NP VP
ProperNoun X book Top Down Parsing
S
NP VP
Det Nominal Top Down Parsing
S
NP VP
Det Nominal X book Top Down Parsing
S
Aux NP VP Top Down Parsing
S
Aux NP VP X book Top Down Parsing
S
VP Top Down Parsing
S
VP
Verb Top Down Parsing
S
VP
Verb
book Top Down Parsing
S
VP
Verb X book that Top Down Parsing
S
VP
Verb NP Top Down Parsing
S
VP
Verb NP
book Top Down Parsing
S
VP
Verb NP
book Pronoun Top Down Parsing
S
VP
Verb NP
book Pronoun X that Top Down Parsing
S
VP
Verb NP
book ProperNoun Top Down Parsing
S
VP
Verb NP
book ProperNoun X that Top Down Parsing
S
VP
Verb NP
book Det Nominal Top Down Parsing
S
VP
Verb NP
book Det Nominal
that Top Down Parsing
S
VP
Verb NP
book Det Nominal
that Noun Top Down Parsing
S
VP
Verb NP
book Det Nominal
that Noun
flight Bottom Up Parsing • Start searching space of reverse derivations from the terminal symbols in the string.
book that flight Bottom Up Parsing
Noun
book that flight Bottom Up Parsing
Nominal
Noun
book that flight Bottom Up Parsing
Nominal
Nominal Noun
Noun
book that flight Bottom Up Parsing
Nominal
Nominal Noun X Noun
book that flight Bottom Up Parsing
Nominal
Nominal PP
Noun
book that flight Bottom Up Parsing
Nominal
Nominal PP
Noun Det
book that flight Bottom Up Parsing
Nominal
Nominal PP
NP
Noun Det Nominal
book that flight Bottom Up Parsing
Nominal
Nominal PP
NP
Noun Det Nominal
book that Noun
flight Bottom Up Parsing
Nominal
Nominal PP
NP
Noun Det Nominal
book that Noun
flight Bottom Up Parsing
Nominal
Nominal PP S
NP VP
Noun Det Nominal
book that Noun
flight Bottom Up Parsing
Nominal
Nominal PP S
NP VP Noun Det Nominal X
book that Noun
flight Bottom Up Parsing
Nominal
Nominal PP X NP
Noun Det Nominal
book that Noun
flight Bottom Up Parsing
NP
Verb Det Nominal
book that Noun
flight Bottom Up Parsing
VP NP
Verb Det Nominal
book that Noun
flight Bottom Up Parsing
S
VP NP
Verb Det Nominal
book that Noun
flight Bottom Up Parsing
S
VP X NP
Verb Det Nominal
book that Noun
flight Bottom Up Parsing
VP
VP PP NP
Verb Det Nominal
book that Noun
flight Bottom Up Parsing
VP
VP PP X NP
Verb Det Nominal
book that Noun
flight Bottom Up Parsing
VP NP NP Verb Det Nominal
book that Noun
flight Bottom Up Parsing
VP NP
Verb Det Nominal
book that Noun
flight Bottom Up Parsing
S
VP NP
Verb Det Nominal
book that Noun
flight Top Down vs. Bottom Up
• Top down never explores options that will not lead to a full parse, but can explore many options that never connect to the actual sentence. • Bottom up never explores options that do not connect to the actual sentence but can explore options that can never lead to a full parse. • Relative amounts of wasted search depend on how much the grammar branches in each direction.
Syntax CYK Algorithm Dynamic Programming Parsing
• CKY (Cocke-Kasami-Younger) algorithm based on bottom-up parsing and requires first normalizing the grammar. • First grammar must be converted to Chomsky normal form (CNF) in which productions must have either exactly 2 non- terminal symbols on the RHS or 1 terminal symbol (lexicon rules). • Parse bottom-up storing phrases formed from all substrings in a triangular table (chart). Dynamic Programming
• a general algorithm design technique for solving problems defined by recurrences with overlapping subproblems • first invented by Richard Bellman in 1950s • “programming” here means “planning” or finding an optimal program, as also seen in the term “linear programming” • Main idea: • setup a recurrence of smaller subproblems • solve subproblems once and record solutions in a table (avoid any recalculation)
ATIS English Grammar Conversion Original Grammar Chomsky Normal Form
S → NP VP S → NP VP S → Aux NP VP S → X1 VP X1 → Aux NP S → VP S → book | include | prefer S → Verb NP S → VP PP NP → Pronoun NP → I | he | she | me NP → Proper-Noun NP → Houston | NWA NP → Det Nominal NP → Det Nominal Nominal → Noun Nominal → book | flight | meal | money Nominal → Nominal Noun Nominal → Nominal Noun Nominal → Nominal PP Nominal → Nominal PP VP → Verb VP → book | include | prefer VP → Verb NP VP → Verb NP VP → VP PP VP → VP PP PP → Prep NP PP → Prep NP
Note that, although not shown here, original grammar contain all the lexical entires. Exercise CKY Parser Book the flight through Houston j= 1 2 3 4 5 i= 0 Cell[i,j] contains all 1 constituents (non-terminals) covering words 2 i +1 through j
3
4 CKY Parser Book the flight through Houston j= 1 2 3 4 5 i= 0 Cell[i,j] contains all 1 constituents (non-terminals) covering words 2 i +1 through j
3
4 CKY Parser
Book the flight through Houston
S, VP, Verb, Nominal, Noun None
NP Det
Nominal, Noun CKY Parser
Book the flight through Houston
S, VP, Verb, Nominal, VP Noun None
NP Det
Nominal, Noun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP Noun None
NP Det
Nominal, Noun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP Noun None
NP Det
Nominal, Noun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP Noun None None
NP Det None
Nominal, Noun None
Prep CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP Noun None None
NP Det None
Nominal, Noun None
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP Noun None None
NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP Noun None None
NP NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP Noun None None VP
NP NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP S Noun None None VP
NP NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston
S, VP, Verb, S Nominal, VP VP S Noun None None VP
NP NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston
S, VP, Verb, S S Nominal, VP VP S Noun None None VP
NP NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston Parse S, VP, Verb, S S Nominal, VP VP Tree S Noun None None #1 VP
NP NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun CKY Parser
Book the flight through Houston Parse S, VP, Verb, S S Nominal, VP VP Tree S Noun None None #2 VP
NP NP Det None
Nominal, Noun None Nominal
Prep PP
NP ProperNoun The Problem with Parsing: Ambiguity
INPUT: She announced a program to promote safety in trucks and vans
+ POSSIBLE OUTPUTS:
S S S S S S
NP VP NP VP NP VP NP VP She Sh e NP VP She NP VP She
announced NP She She announced NP announced NP announced NP
NP VP NP VP a program announced NP announced NP NP VP a program NP PP
to promote NP a progra m to promote NP PP in NP NP VP safety PP safety in NP a program trucks and va ns to promote NP in NP to promote trucks and vans NP safe ty
trucks and vans NP and NP NP and NP va ns vans NP and NP NP VP va ns NP VP sa fety PP a program a program in NP to promote NP PP trucks to promote NP safe ty in NP
trucks safety PP
in NP
trucks
And there are more...
Syntax
Probabilistic Context Free Grammars (PCFG)
split point
O(n3|N|3)
O(n2) for l, i choices
Parsing with Lexicalized CFGs S(saw)
I The new form of grammar looksNP(man) just like a Chomsky normalVP(saw) form CFG, but with potentially O( ⌃ 2 N 3) possible rules. DT(the) | NN(man)| ⇥ | | the man VP(saw) PP(with) I Naively, parsing an n word sentence using the dynamic programming algorithm will take O(n3 ⌃ 2 Vt(saw)N 3) time.NP(dog)But IN(with) NP(telescope) saw with ⌃ can be huge!! | | | | DT(the) NN(dog) DT(the) NN(telescope) | | the dog the telescope 2 3 I Crucial observation: at most O(n N ) rules can be p(t) = q(S(saw)⇥ | | 2 NP(man) VP(saw)) applicable to a given sentence w1,w2,...w! n of length n. q(NP(man) 2 DT(the) NN(man)) This is because any rules which⇥ containq(VP(saw) a lexical! VP(saw) item that PP(with) is ) ⇥ !1 not one of w1 ...wn, can be safelyq(VP(saw) discarded. Vt(saw) NP(dog)) ⇥ !1 q(PP(with) 1 IN(with) NP(telescope)) ⇥5 3 ! I The result: we can parse in O(n ...N ) time. ⇥ | |
Syntax
Dependency Parsing