The CYK Algorithm
Natural Language Processing

• The membership problem:
  – Problem:
    • Given a context-free grammar G and a string w
      – G = (V, Σ, P, S) where
        » V: finite set of variables
        » Σ (the alphabet): finite set of terminal symbols
        » P: finite set of rules
        » S: start symbol (a distinguished element of V)
        » V and Σ are assumed to be disjoint
      – G is used to generate the strings of a language
  – Question:
    • Is w in L(G)?

The CYK Algorithm
• J. Cocke
• D. Younger
• T. Kasami
  – Independently developed an algorithm to answer this question.

The CYK Algorithm Basics
– Relies on the structure of the rules in a Chomsky Normal Form grammar
– Uses a "dynamic programming" or "table-filling" algorithm

Chomsky Normal Form
• A normal form is described by a set of conditions that each rule in the grammar must satisfy
• A context-free grammar is in CNF if each rule has one of the following forms:
  – A → BC (exactly two variables on the right side)
  – A → a (a single terminal symbol), or
  – S → λ (the null string)
  where B, C ∈ V − {S}

Construct a Triangular Table
• Each row corresponds to one length of substrings
  – Bottom row: strings of length 1
  – Second row from the bottom: strings of length 2
  . . .
  – Top row: the whole string w

Construct a Triangular Table
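• Xi,i is the set of variables A such that A → wi is a production of G
• For j > i, Xi,j is the set of variables A such that A → BC is a production of G, with B in Xi,k and C in Xk+1,j for some k with i ≤ k < j
• To compute Xi,j, compare at most n pairs of previously computed sets:
  (Xi,i , Xi+1,j), (Xi,i+1 , Xi+2,j), …, (Xi,j-1 , Xj,j)
  e.g. for i=1, j=5:
  (X1,1 , X2,5), (X1,2 , X3,5), (X1,3 , X4,5), (X1,4 , X5,5)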
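The pair enumeration above can be sketched in a few lines of Python. This is only an illustration: the helper name `split_pairs` is hypothetical and not part of the slides, and cells are written as 1-based `(i, j)` tuples to match the Xi,j notation.

```python
def split_pairs(i, j):
    """List the pairs of cells to compare when computing X[i, j] (1-based, i < j).

    Each split point k with i <= k < j pairs the cell for w_i..w_k
    with the cell for w_(k+1)..w_j.
    """
    return [((i, k), (k + 1, j)) for k in range(i, j)]


# The example from the slide: i = 1, j = 5 gives four pairs.
for left, right in split_pairs(1, 5):
    print(f"(X{left[0]},{left[1]} , X{right[0]},{right[1]})")
```

There are j - i pairs for a cell Xi,j, which is why the number of comparisons per cell never exceeds n.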
Construct a Triangular Table
X1,5
X1,4  X2,5
X1,3  X2,4  X3,5
X1,2  X2,3  X3,4  X4,5
X1,1  X2,2  X3,3  X4,4  X5,5
w1    w2    w3    w4    w5

Table for a string w of length 5

Construct a Triangular Table
X1,5
X1,4  X2,5
X1,3  X2,4  X3,5
X1,2  X2,3  X3,4  X4,5
X1,1  X2,2  X3,3  X4,4  X5,5
w1    w2    w3    w4    w5

Looking for pairs to compare

Example CYK Algorithm
• Show the CYK Algorithm with the following example:
  – CNF grammar G
    • S → NP VP
    • NP → DET NP | NP NP | time | flies | arrow
    • VP → VP NP | VP PP | flies | like
    • PP → P NP
    • DET → an
    • P → like
  – w is "time flies like an arrow"
  – Question: Is "time flies like an arrow" in L(G)?

Constructing The Triangular Table
{NP} (X1,1)      {NP,VP} (X2,2)   {VP,P} (X3,3)    {DET} (X4,4)     {NP} (X5,5)
time (1)         flies (2)        like (3)         an (4)           arrow (5)

Calculating the bottom row: Xi,i

Constructing The Triangular Table
{NP,S} (X1,2)    {S} (X2,3)       {} (X3,4)        {NP} (X4,5)
{NP}             {NP,VP}          {VP,P}           {DET}            {NP}
time (1)         flies (2)        like (3)         an (4)           arrow (5)

X1,2: (X1,1 , X2,2)    X3,4: (X3,3 , X4,4)
X2,3: (X2,2 , X3,3)    X4,5: (X4,4 , X5,5)

Constructing The Triangular Table
{S} (X1,3)       {} (X2,4)        {VP,PP} (X3,5)
{NP,S}           {S}              {}               {NP}
{NP}             {NP,VP}          {VP,P}           {DET}            {NP}
time (1)         flies (2)        like (3)         an (4)           arrow (5)

X1,3: (X1,1 , X2,3), (X1,2 , X3,3)
X3,5: (X3,3 , X4,5), (X3,4 , X5,5)

Constructing The Triangular Table
{} (X1,4)        {S,VP} (X2,5)
{S}              {}               {VP,PP}
{NP,S}           {S}              {}               {NP}
{NP}             {NP,VP}          {VP,P}           {DET}            {NP}
time (1)         flies (2)        like (3)         an (4)           arrow (5)

X1,4: (X1,1 , X2,4), (X1,2 , X3,4), (X1,3 , X4,4)
X2,5: (X2,2 , X3,5), (X2,3 , X4,5), (X2,4 , X5,5)

Constructing The Triangular Table
{S} (X1,5)
{}               {S,VP}
{S}              {}               {VP,PP}
{NP,S}           {S}              {}               {NP}
{NP}             {NP,VP}          {VP,P}           {DET}            {NP}
time (1)         flies (2)        like (3)         an (4)           arrow (5)

X1,5: (X1,1 , X2,5), (X1,2 , X3,5), (X1,3 , X4,5), (X1,4 , X5,5)
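The table built over the preceding slides can be reproduced with a short set-based sketch. The grammar and sentence are taken from the example; the names `TERMINAL_RULES`, `BINARY_RULES`, and `cyk_table`, and the choice to store the table as a dict keyed by 1-based `(i, j)`, are assumptions made for illustration only.

```python
# Example grammar from the slides, split into terminal and binary rules.
TERMINAL_RULES = {
    "time": {"NP"},
    "flies": {"NP", "VP"},
    "like": {"VP", "P"},
    "an": {"DET"},
    "arrow": {"NP"},
}
BINARY_RULES = [  # (A, B, C) stands for the rule A -> B C
    ("S", "NP", "VP"),
    ("NP", "DET", "NP"),
    ("NP", "NP", "NP"),
    ("VP", "VP", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]


def cyk_table(words):
    """Fill the triangular table X[i, j] (1-based, inclusive) row by row."""
    n = len(words)
    X = {}
    # Bottom row: X[i, i] holds every A with a rule A -> w_i.
    for i in range(1, n + 1):
        X[i, i] = set(TERMINAL_RULES.get(words[i - 1], set()))
    # Higher rows: substrings of length 2, 3, ..., n.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # compare (X[i, k], X[k+1, j])
                for a, b, c in BINARY_RULES:
                    if b in X[i, k] and c in X[k + 1, j]:
                        cell.add(a)
            X[i, j] = cell
    return X


words = "time flies like an arrow".split()
table = cyk_table(words)
# Print the rows from the top (whole string) down to the bottom (single words).
for length in range(len(words), 0, -1):
    print([sorted(table[i, i + length - 1]) for i in range(1, len(words) - length + 2)])
print("in L(G):", "S" in table[1, len(words)])
```

Because S appears in X1,5, the sentence is in L(G), in agreement with the final slide of the example.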
CYK algorithm: Pseudocode

let the input be a string S consisting of n characters: a1 ... an.
let the grammar contain r nonterminal symbols R1 ... Rr.
This grammar contains the subset Rs which is the set of start symbols.
let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
for each i = 1 to n
  for each unit production Rj → ai
    set P[i,1,j] = true
for each i = 2 to n -- Length of span
  for each j = 1 to n-i+1 -- Start of span
    for each k = 1 to i-1 -- Partition of span
      for each production RA → RB RC
        if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
if any of P[1,n,x] is true (x is iterated over the set s, where s are all the indices for Rs) then
  S is member of language
else
  S is not member of language
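One possible Python rendering of the pseudocode is sketched below. It keeps the boolean array P indexed by start position, span length, and nonterminal, with 1-based positions as in the pseudocode. The function name `cyk_recognize`, its parameter layout, and the small a^n b^n grammar used to exercise it are illustrative assumptions, not part of the slides.

```python
def cyk_recognize(tokens, nonterminals, unit_rules, binary_rules, start_symbols):
    """Return True if tokens can be derived from one of start_symbols.

    unit_rules:   list of (R_j, a) pairs for rules R_j -> a
    binary_rules: list of (R_A, R_B, R_C) triples for rules R_A -> R_B R_C
    """
    n, r = len(tokens), len(nonterminals)
    index = {name: idx for idx, name in enumerate(nonterminals)}
    # P[start][length][nonterminal]; row and column 0 are unused padding
    # so that the 1-based indices of the pseudocode carry over directly.
    P = [[[False] * r for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for head, terminal in unit_rules:
            if terminal == tokens[i - 1]:
                P[i][1][index[head]] = True
    for i in range(2, n + 1):              # length of span
        for j in range(1, n - i + 2):      # start of span
            for k in range(1, i):          # partition of span
                for a, b, c in binary_rules:
                    if P[j][k][index[b]] and P[j + k][i - k][index[c]]:
                        P[j][i][index[a]] = True
    # The empty string is rejected here; a rule S -> λ would need separate handling.
    return n > 0 and any(P[1][n][index[s]] for s in start_symbols)


# Hypothetical test grammar in CNF for { a^n b^n : n >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
NONTERMINALS = ["S", "T", "A", "B"]
UNIT_RULES = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]

for text in ["ab", "aabb", "aab", "ba"]:
    print(text, cyk_recognize(list(text), NONTERMINALS, UNIT_RULES, BINARY, ["S"]))
```

The three nested loops over span length, start position, and partition point, together with the loop over productions, give the running time of O(n³ · |P|) usually quoted for CYK.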