TAL Recognition in O(M(N2)) Time
Total Page:16
File Type:pdf, Size:1020Kb
TAL Recognition in O(M(n2)) Time Sanguthevar Rajasekaran Dept. of CISE, Univ. of Florida raj~cis.ufl.edu Shibu Yooseph Dept. of CIS, Univ. of Pennsylvania [email protected] Abstract time needed to multiply two k x k boolean matri- ces. The best known value for M(k) is O(n 2"3vs) We propose an O(M(n2)) time algorithm (Coppersmith, Winograd, 1990). Though our algo- for the recognition of Tree Adjoining Lan- rithm is similar in flavor to those of Graham, Har- guages (TALs), where n is the size of the rison, & Ruzzo (1976), and Valiant (1975) (which input string and M(k) is the time needed were Mgorithms proposed for recognition of Con- to multiply two k x k boolean matrices. text Pree Languages (CFLs)), there are crucial dif- Tree Adjoining Grammars (TAGs) are for- ferences. As such, the techniques of (Graham, et al., malisms suitable for natural language pro- 1976) and (Valiant, 1975) do not seem to extend to cessing and have received enormous atten- TALs (Satta, 1993). tion in the past among not only natural language processing researchers but also al- gorithms designers. The first polynomial 2 Tree Adjoining Grammars time algorithm for TAL parsing was pro- A Tree Adjoining Grammar (TAG) consists of a posed in 1986 and had a run time of O(n6). quintuple (N, ~ U {~}, I, A, S), where Quite recently, an O(n 3 M(n)) algorithm N is a finite set of has been proposed. The algorithm pre- nonterminal symbols, sented in this paper improves the run time is a finite set of terminal symbols disjoint from of the recent result using an entirely differ- N, ent approach. is the empty terminal string not in ~, I is a finite set of labelled initial trees, A is a finite set of auxiliary trees, 1 Introduction S E N is the distinguished start symbol The Tree Adjoining Grammar (TAG) formalism was The trees in I U A are called elementary trees. All introduced by :loshi, Levy and Takahashi (1975). internal nodes of elementary trees are labelled with TAGs are tree generating systems, and are strictly nonterminal symbols. Also, every initial tree is la- more powerful than context-free grammars. They belled at the root by the start symbol S and has belong to the class of mildly context sensitive gram- leaf nodes labelled with symbols from ~3 U {E}. An mars (:loshi, et al., 1991). They have been found auxiliary tree has both its root and exactly one leaf to be good grammatical systems for natural lan- (called the foot node ) labelled with the same non- guages (Kroch, Joshi, 1985). The first polynomial terminal symbol. All other leaf nodes are labelled time parsing algorithm for TALs was given by Vi- with symbols in E U {~}, at least one of which has a jayashanker and :loshi (1986), which had a run time label strictly in E. An example of a TAG is given in of O(n6), for an input of size n. Their algorithm figure 1. had a flavor similar to the Cocke-Younger-Kasami A tree built from an operation involving two other (CYK) algorithm for context-free grammars. An trees is called a derived tree. The operation involved Earley-type parsing algorithm has been given by is called adjunction. Formally, adjunction is an op- Schabes and Joshi (1988). An optimal linear time eration which builds a new tree 7, from an auxiliary parallel parsing algorithm for TALs was given by tree fl and another tree ~ (a is any tree - initial, aux- Palls, Shende and Wei (1990). In a recent paper, iliary or derived). Let c~ contain an internal node m Rajasekaran (1995) shows how TALs can be parsed labelled X and let fl be the auxiliary tree with root in time O(n3M(n)). node also labelled X. The resulting tree 7, obtained In this paper, we propose an O(M(n2)) time by adjoining fl onto c~ at node m is built as follows recognition algorithm for TALs, where M(k) is the (figure 2): 166 S S Initial tree O~ EI S Auxiliary tree G = {{S},{a,b,c,e }, { or}, { ~}, S} b S* Figure 1: Example of a TAG 1. The subtree of a rooted at m, call it t, is excised, and proceeds to find the transitive closure b + of this leaving a copy of m behind. matrix. (If b + is the transitive closure, then Ak E 2. The auxiliary tree fl is attached at the copy of b.$,J+. ¢:~ Ak-~ ai .... aj-1) m and its root node is identifed with the copy Instead of finding the transitive closure by the cus- of m. tomary method based on recursively splitting into disjoint parts, a more complex procedure based on 3. The subtree t is attached to the foot node of fl 'splitting with overlaps' is used. The extra cost in- and the root node of t (i.e. m) is identified with volved in such a strategy can be made almost negligi- the foot node of ft. ble. The algorithm is based on the following lemma This definition can be extended to include adjunc- Lemma : Let b be an n x n upper triangular ma- tion constraints at nodes in a tree. The constraints trix, and suppose that for any r > n/e, the tran- include Selective, Null and Obligatory adjunction sitive closure of the partitions [1 < i,j < r] and constraints. The algorithm we present here can he [n- r < i,j < n] are known. Then the closure of b modified to include constraints. can be computed by For our purpose, we will assume that every inter- nal node in an elementary tree has exactly 2 children. I. performing a single matrix multiplication, and Each node in a tree is represented by a tuple < 2. finding the closure of a 2(n - r) × 2(n - r) up- tree, node index, label >. (For brevity, we will refer per triangular matrix of which the closure of the to a node with a single variable m whereever there partitions[1 < i,j < n- r] and [n- r < i,j < is no confusion) 2(n - r)] are known. A good introduction to TAGs can be found in (Partee, et al., 1990). Proof: See (Valiant, 1975)for details The idea behind (Valiant, 1975) is based on visu- 3 Context Free recognition in alizing Ak E b+j as spanning a tree rooted at the O( M(n)) Time node Ak with l~aves ai through aj-1 and internal nodes as nonterminals generated from Ak according The CFG G = (N,~,P, A1), where to the productions in P. Having done this, the fol- N is a set of Nonterminals {A1, A2, .., Ak}, lowing observation is made : is a finite set of terminals, Given an input string al...a, and 2 distinct sym- P is a finite set of productions, bol positions, i and j, and a nonterminal Ak such A1 is the start symbol that Ak E b + ., where i' < i,j' > j, then 3 a non- I P3 is assumed to be in the Chomsky Normal Form. terminal A k, which is a descendent of Ak in the Valiant (1975) shows how the recognition problem tree rooted at Ak, such that A k, E bi + d' . where can be reduced to the problem of finding Transitive Closure and how Transitive Closure can be reduced i" < i, j" > j and A k, has two children Ak~ and Ak2 to Matrix Multiplication. such thatAk~ Eb +, andAk2 Eb +..withi<s<j. Given an input string aza2 .... an E ~*, the recur- A k, can be thought of as a minimal node in this sive algorithm makes use of an (n+l)× (n+l) upper sense.(The descendent relation is both reflexive and triangular matrix b defined by transitive) hi,i+1 = {Ak I(Ak --* a,) E P}, Thus, given a string al...a, of length n, (say r = bi,j = ¢, for j • i + 1 2/3), the following steps are done : 167 k t t Figure 2: Adjunction Operation 1. Find the closure of the first 2/3 ,i.e. all nodes nodes from the first and last 2/3's ,(say r =2/3). spanning trees which are within the first 2/3 . Thus, if we discard the mid 1/3, we will not be able 2. Find the closure of the last 2/3 , i.e. all nodes to infer that the adjunction had indeed taken place spanning trees which are within the last 2/3. at node m. 3. Do a composition operation (i.e. matrix multi- 4 Notations plication) on the nodes got as a result of Step 1 with nodes got as a result of Step 2. Before we introduce the algorithm, we state the no- tations that will be used. 4. Reduce problem size to az...an/zal+2n/3...an We will be making use of a 4-dimensional matrix and find closure of this input. A of size (n + 1) x (n + 1) x (n + 1) x (n + 1), where The point to note is that in step 3, we can get rid n is the size of the input string. of the mid 1/3 and focus on the remaining problem (Vijayashanker, Joshi, 1986) Given a TAG G and size. an input string aza2..an, n > 1, the entries in A will This approach does not work for TALs because of be nodes of the trees of G. We say, that a node m the presence of the adjunction operation.