Problems with Parsing as Search Problems with Parsing as Search The CYK Algorithm The CYK Algorithm 1 Problems with Parsing as Search Grammar Restructuring Chart Parsing: the CYK Algorithm Complexity Informatics 2A: Lecture 16 2 The CYK Algorithm Parsing as Dynamic Programming The CYK Algorithm Bonnie Webber Visualizing the Chart School of Informatics Properties of the Algorithm University of Edinburgh [email protected] Reading: 30 October 2009 J&M (2nd ed), ch. 13, Section 13.3 – 13.4 NLTK Book, ch. 8 (Analyzing Sentence Structure), Section 8.4 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 1 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 2 Problems with Parsing as Search Grammar Restructuring Problems with Parsing as Search Grammar Restructuring The CYK Algorithm Complexity The CYK Algorithm Complexity Grammar Restructuring Left Recursion But grammars for natural human languages should be revealing, allowing strings to be associated with meaning (semantics) in a Deterministic parsing (Lectures 14 and 15) aims to address a systematic way. Re-structuring the grammar may destroy this. limited amount of local ambiguity – the problem of not being able to decide uniquely which grammar rule to use next in a Example: (Indirectly) left-recursive rules are needed in English. left-to-right analysis of the input string, even if the string is not NP → DET N globally ambiguous. NP → NPR By re-structuring the grammar, the parser can make a unique DET → NP ’s decision, based on a limited amount of look-ahead. These rules generate NPs with possesive modifiers such as: Recursive Descent parsing (Lecture 13) demands grammar restructuring, in order to eliminate left-recursive rules that can get John’s sister it into a hopeless loop. John’s mother’s sister John’s mother’s uncle’s sister John’s mother’s uncle’s sister’s niece Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 3 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 4 Problems with Parsing as Search Grammar Restructuring Problems with Parsing as Search Grammar Restructuring The CYK Algorithm Complexity The CYK Algorithm Complexity Left Recursion Complexity Tree structures for possessives: Recursive descent parsing demonstrates other problems with NP NP NP “parsing as search”: DET N DET N DET N 1 Structural ambiguity in the grammar and lexical ambiguity in NP NP NP the words can lead the parser down a wrong path. 2 So the same sub-tree may be built several times: whenever a NPR DET N DET N path fails, the parser undoes its work, backtracks and starts NP NP John 's sister mother 's sister uncle 's sister again. DET N NPR The complexity of this blind backtracking is exponential in the NP John 's mother 's worst case because of repeated re-analysis of the same sub-string. NPR The next 4 lectures will look at other ways of handling ambiguity: John 's Chart parsing: using the parser alone; We don’t want to re-structure our grammar rules just to be able to Probabilistic Grammars: using both grammar and parser. use a particular approach to parsing. Need an alternative. Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 5 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 6 Parsing as Dynamic Programming Parsing as Dynamic Programming Problems with Parsing as Search The CYK Algorithm Problems with Parsing as Search The CYK Algorithm The CYK Algorithm Visualizing the Chart The CYK Algorithm Visualizing the Chart Properties of the Algorithm Properties of the Algorithm Dynamic Programming Parsing as Dynamic Programming Dynamic Programming: With a CFG, a parser should be able to avoid re-analyzing Given a problem, systematically fill a table of solutions to sub-strings because the analysis of any sub-string is independent of sub-problems: this is called memoization. the rest of the parse. Once solutions to all sub-problems have been accumulated, solve the overall problem by composing them.. The dog saw a man in the park For parsing, the sub-problems are analyses of sub-strings, and they NP NP NP are memoized in a chart (aka well-formed substring table, WFST). PP Each analysis may correspond to: The parser’s exploration of its search space can exploit this a complete constituent (sub-tree) that has been found, independence if the parser uses dynamic programming. indexed by the start and end of the sub-strings that it covers; Dynamic programming is the basis for all chart parsing algorithms. a hypothesis about what complete constituent might be found, indexed by the start and end of the sub-strings that support it; Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 7 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 8 Parsing as Dynamic Programming Parsing as Dynamic Programming Problems with Parsing as Search The CYK Algorithm Problems with Parsing as Search The CYK Algorithm The CYK Algorithm Visualizing the Chart The CYK Algorithm Visualizing the Chart Properties of the Algorithm Properties of the Algorithm Depicting a WFST/Chart Depicting a WFST as a Matrix A well-formed substring table (aka chart) can be depicted as either 1 2 3 4 5 6 a matrix or a graph. Both contain the same information. 0 V When a WFST (aka chart) is depicted as a matrix: 1 Prep PP Rows and columns of the matrix correspond to the start and end positions of a span (ie, starting right before the first word, 2 Det NP ending right after the final one); A cell in the matrix corresponds to the sub-string that starts 3 N at the row index and ends at the column index. It can contain information about the type of constituent (or constituents) 4 that span(s) the substring, pointers to its sub-constituents, 5 and/or predictions about what constituents might follow the substring. 0 See 1 with 2 a 3 telescope 4 in 5 hand 6 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 9 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 10 Parsing as Dynamic Programming Parsing as Dynamic Programming Problems with Parsing as Search The CYK Algorithm Problems with Parsing as Search The CYK Algorithm The CYK Algorithm Visualizing the Chart The CYK Algorithm Visualizing the Chart Properties of the Algorithm Properties of the Algorithm Depicting a WFST as a Graph Algorithms for Chart Parsing When a WFST (aka chart) is depicted as a graph: nodes/vertices represent positions in the text string, starting before the first word, ending after the final word. Important members of the chart parsing family include: arcs/edges connect vertices at the start and the end of a span ◦ the CYK algorithm, which memoizes only constituents; to represent a particular substring. Edges can be labelled with the same information as in a cell in the matrix representation. ◦ three algorithms that memoize both constituents and predictions: PP a bottom-up chart parser a top-down chart parser NP the Earley algorithm Prep Det N with a telescope 1 23 4 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 11 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 12 Parsing as Dynamic Programming Parsing as Dynamic Programming Problems with Parsing as Search The CYK Algorithm Problems with Parsing as Search The CYK Algorithm The CYK Algorithm Visualizing the Chart The CYK Algorithm Visualizing the Chart Properties of the Algorithm Properties of the Algorithm CYK Algorithm Chart Parsing with the CYK Algorithm Let Close(X) = {B | B →* A, using unary productions, and A X} CYK (Cocke, Younger, Kasami) is an algorithm for recognizing and recording constituents in the chart (WFST). Build CYK Chart(t,[w1,. ,wn]) for j ← 1 to n The simplest version of CYK is for a CFG whose rules have at do most two symbols on their RHS. t(j-1, j) ← Close({wj }) We can enter constituent A in cell (i, j) if there is a rule A → B for k ← 1 to n and B is found in cell (i, j), or if A → BC and B is found in cell for j ← k to n (i, k) and C is found in cell (k, j). for m ← 1 to k-1 Proceeding systematically, CYK guarantees that the parser only do looks for rules that use a constituent from i to j after it has t(j-k, j) ← t(j-k, j) ∪ Close({A | A → BC processed all the constituents that end at i. This avoids anything for some B ∈ t(j-k, j-m) and C ∈ t(j-m, j)}) being missed. This algorithm is complete and performs recognition in time O(n3). Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 13 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 14 Parsing as Dynamic Programming Parsing as Dynamic Programming Problems with Parsing as Search The CYK Algorithm Problems with Parsing as Search The CYK Algorithm The CYK Algorithm Visualizing the Chart The CYK Algorithm Visualizing the Chart Properties of the Algorithm Properties of the Algorithm Visualizing the Chart Visualizing the Chart Grammatical rules Lexical rules 1 2 3 4 S → NP VP Det → a | the (determiner) 0 NP → Det Nom N → fish | frogs | soup (noun) NP → Nom Prep → in | for (preposition) Nom → N SRel TV → saw | ate (transitive verb) 1 Nom → N IV → fish | swim (intransitive verb) VP → TV NP Relpro → that (relative pronoun) VP → IV PP 2 VP → IV PP → Prep NP SRel → Relpro VP 3 Nom: nominal (the part of the NP after the determiner, if any). the frogs ate fish SRel: subject relative clause, as in the frogs that ate fish. Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 15 Informatics 2A: Lecture 16 Chart Parsing: the CYK Algorithm 16 Parsing as Dynamic Programming Parsing as Dynamic Programming Problems with Parsing as Search The CYK Algorithm Problems with Parsing as Search The CYK Algorithm The CYK Algorithm Visualizing the
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages8 Page
-
File Size-