Multiset-Valued Linear Index Grammars: Imposing Dominance Constraints on Derivations

Owen Rambow Univesit@ Paris 7 UFR Linguistique, TALANA Case 7003, 2, Place Jussieu F-75251 Paris Cedex 05, France rambow©linguist, j ussieu, fr Abstract While LIG-equivalent formalisms have been shown to This paper defines multiset-valued linear index gram- provide adequate formal power for a wide range of lin- mar and unordered vector grammar with dominance guistic phenomena (including the aforementioned Swiss links. The former models certain uses of multiset- German construction), the need for other mathemati- valued feature structures in unification-based for- cal formalisms has arisen in several unrelated areas. In malisms, while the latter is motivated by word order this paper, we discuss three such cases. First, captur- variation and by "quasi-trees", a generalization of trees. ing several semantic and syntactic issues in unification- The two formalisms are weakly equivalent, and an im- based formalisms leads to the use of multiset-valued portant subset is at most context-sensitive and polyno- feature structures. Second, word order facts from lan- mially parsable. guages such as German, Russian, or Turkish cannot be derived by LIG-equivalent formalisms. Third, a gener- alization of trees to "quasi-trees" (Vijay-Shanker, 1992) Introduction in the spirit of D-Theory (Marcus et al., 1983) leads Early attempts to use context-free grammars (CFGs) as to the definition of a new formal system. In this pa- a mathematical model for natural language syntax have per, we introduce two new equivalent mathematical for- largely been abandoned; it has been shown that (un- malisms which provide adequate descriptions for these der standard assumptions concerning the recursive na- three phenomena. ture of clausal embedding) the cross-serial dependencies The paper is structured as follows. First, we present found in Swiss German cannot be generated by a CFG the three phenomena in more detail. We then introduce (Shieber, 1985). Several mathematical models have multiset-valued LIG and present some formal proper- been proposed which extend the formal power of CFGs, ties. Thereafter, we introduce a second rewriting sys- while still maintaining the formal properties that make tem and show that it is weakly equivalent to the LIG CFGs attractive formalisms for formal and computa- variant. We then briefly mention some related for- tional linguists, in particular, polynomial parsability malisms. We conclude with a brief summary. and restricted weak generative capacity. These mathe- matical models include tree adjoining grammar (TAG) Three Problems for LIG-Equivalent (Joshi et al., 1975; Joshi, 1985), head grammar (Pollard, 1984), combinatory categorial grammar (CCG) (Steed- Formalisms man, 1985), and linear index grammar (LIG) (Gaz- The three problems we present are of a rather differ- dar, 1988). These formalisms have been shown to be ent nature. The first arises from the way a linguis- weakly equivalent to each other (Vijay-Shanker et al., tic problem is treated in a specific type of framework 1987; Vijay-Shanker and Weir, 1994); we will refer to (unification-based formalisms). The second problem them as "LIG-equivalent formalisms". LIG is a vari- derives directly from linguistic data. The third prob- ant of index grammar (IG) (Aho, 1968). Like CFG, IG lem is a formalism which has been motivated on in- is a context-free string rewriting system, except that dependent, methodological grounds, but whose formal the nonterminal symbols in a CFG are augmented with properties are unknown. stacks of index symbols. The rewrite rules push or pop indices from the index stack. In an IG, the index stack Multiset-Valued Feature Structures is copied to all nonterminal symbols on the right-hand HPSG (Pollard and Sag, 1987; Pollard and Sag, 1994) side of a rule. In a LIG, the stack is copied to exactly uses typed feature structures as its formal basis, which one right-hand side nonterminal. 1 are Turing-equivalent. However, it is not necessarily 1Note that a LIG is not an IG that is linear (i.e., whose side), but rather, it is a context-free grammar with linear productions have at most one nonterminal on the right-hand indices (i.e., the indices are never copied).

263 the case that the full power of the system is used in (2) ... dab [dem Kunden]i [den Kfihlschrank]j the linguistic analyses that are expressed in it. HPSG ... that the clientDAW the refrigeratorAcc analyses include information about constituent struc- bisher noch niemand ti [[tj ZU reparieren] ture which can be represented as a context-free phrase- so far yet no-oneNoM to repair structure tree. In addition, various mechanisms have zu versuchen] versprochen hat been proposed to handle certain linguistic phenomena to try promised has that relate two nodes within this tree. One of these ... that so-far, no-one yet has promised the client to is a multiset-valued feature that is passed along the repair the refrigerator phrase-structure tree from daughter node to mother node. Multiset-valued features have been proposed for Similar data has been observed in the literature for the SLASH feature which handles wh-dependencies (Pol- other languages, for example for Finnish by Karttunen lard and Sag, 1994, Chapter 4), and for certain semantic (1989). Becker et al. (1991) argue that a simple TAG purposes, including the representation of stored quan- (and the other LIG-equivalent formalisms) cannot de- tifiers in a mechanism similar to Cooper-storage. An- rive the full range of scrambled sentences. Rambow and other use may be the representation of anti-coreference Satta (1994) propose the use of unordered vector gram- constraints arising from Principle C of Binding Theory mar (UVG) to model the data. In UVG (Cremers and (be it that of (Chomsky, 1981) or of Pollard and Sag Mayer, 1973), several context-free string rewriting rules (1992)). are grouped into vectors, as for verspricht 'promises': It is desirable to be able to assess the formal power (3) ((S --+ NPnom VP), (VP -4 NPdat VP), of such a system, for both theoretical and practical (VP --~ Sinf V), (V ~ verspricht) ) reasons. Theoretically, it would be interesting if it turned out that the linguistic principles formulated in During a derivation, rules from a vector can be ap- HPSG naturally lead to certain restricted uses of the plied in any order, and rules from different vectors call unification-based formalism. Clearly this would repre- be interleaved, but at the end, all rules from an instance sent an important insight into the nature of grammat- of a vector must have been used in the derivation. By ical competence. On the practical side, formal equiv- varying the order in which rules from different vectors alences can guide the building of applications such as are applied, we can derive different word orders. Ob- parsers for existing HPSG grammars. For example, it serve that the vector in (3) contains exactly one ter- has been proposed that HPSG grammars can be "com- minal symbol (the verb); grammars in which every el- piled" into TAGs in order to obtain a computationally ementary structure (vector in UVG, tree in TAG, rule more tractable system (Kasper, 1992), thus sidestep- in CFG) contains at least one terminal symbol we will ping the issue of building parsers for HPSG directly. call lexicalized. However, LIG-equivalent formalisms cannot serve as Languages generated by UVG are known to be targets for compilations in cases in which HPSG uses context-sensitive and semilinear (Cremers and Mayer, multiset-valued feature structures. 1974) and polynomially parsable (Satta, 1993). How- ever, they are not adequate for modeling natural lan- guage syntax. In the following example, (4a) is out since Word Order Variation there is no analysis in which the moved NP c-commands Becket et al. (1991) discuss scrambling, which is the its governing verb, as is the case in (4b). permutation of verbal arguments in languages such as (4) a. *... dab niemand [dem Kundeu] [ti German, Korean, Japanese, Hindi, Russian, and Turk- ... that no-onesoM the clientDAT ish. If there are embedded clauses, scrambling in many ZU versuchen] [den Kiihlschrank]j versprochen languages can affect arguments of more than one verb to try the refrigeratorAcc promised ("long-distance" scrambling). hat [tj zu reparieren]i (1) ... dab [den Kiihlschrank]i bisher noch has to repair ... that the refrigeratorAcc so far yet b. ? ... daft niemand [dem Kunden] [den niemand [ti zu reparieren] versprochen hat Kiihlschrank]j [ti zu versuchen] versprochen hat no-onesoM to repair promised has [tjzu reparieren]i ... that so far, no-one has promised to repair the re- What is needed is an additional mechanism that en- frigerator forces a dominance relation between the sister node of Scrambling in German is "doubly unbounded" in the an argument and its governing verb. sense that there is no bound on the number of clause Quasi-Trees boundaries over which an element can scramble, and an element scrambled (long-distance or not) from one Vijay-Shanker (1992) introduces "quasi-trees" as a gen- clause does not preclude the scrambling of an element eralization of trees. He starts from the observation from another clause: that the traditional definition of tree adjoining gram-

264 $ mar (TAG) is incompatible with a unification-based ap- proach because the trees of a TAG start out as fully specified objects, which are later modified; in particu- NP S lar, immediate dominance relations in a tree need not I hold after another tree is adjoined into it. In order to ar- s rive at a definition that is compatible with a unification- based approach, he makes three minimal assumptions NP VP 1 about the nature of the objects used for the representa- I tion of natural language syntax. The first assumption vP (left implicit) is that these objects represent phrase- structure. The second assumption is that they "give V e a sufficiently enlarged domain of locality that allows localization of dependencies such as subcategorization, Figure 1: Sample quasi-tree and filler-gap" (Vijay-Shanker, 1992, p.486). The third assumption is that dominance relations can be stated between different parts of the representation. These long as dominance constraints are respected); there are assumptions lead Vijay-Shanker to define quasi-trees, no operations in regular TAG that correspond to such which are partial descriptions of trees in which "quasi- operations. We will say that a quasi-tree is derived if in nodes" (partial descriptions of nodes) are related by all quasi-node pairs, the two quasi-nodes are equated, dominance constraints. Each node in a traditional tree meaning that they have the same label and the two (as used in TAG) corresponds to two quasi-nodes, a top feature structures are unified, and furthermore, if all and a bottom version, such that the top dominates the frontier quasi-nodes have terminal labels. The string bottom. associated with this quasi-tree is defined in the usual way. There are two ways of interpreting quasi-trees: ei- We have now fully defined a formalism (if informally): ther quasi-trees can be seen as data structures in their its data structures (quasi-trees), its combination oper- own right; or quasi-trees can be seen as descriptions ation (substitution), and the notion of derived struc- of trees whose denotations are sets of (regular) trees. ture. We will call this formalism Quasi-Tree Substitu- If quasi-trees are defined as data structures, we can tion Grammar (QTSG). It can easily be seen that all define operations such as adjunction and substitution examples discussed by Vijay-Shanker (1992) are deriva- and notions such as "derived structure". More pre- tions in QTSG. The question arises as to the formal and cisely, we define quasi-trees to be structures consisting computational properties of QTSG. of pairs of nodes, called quasi-nodes, such that one is the "top" quasi-node and the other is the "bottom" Multiset-Valued LIG quasi-node. The top and bottom quasi-node of a pair are linked by a dominance constraint. Bottom quasi- In order to find a mathematical model for certain uses nodes immediately dominate top quasi-nodes of other of multiset-valued feature structures, discussed above, quasi-node pairs, and each top quasi-node is immedi- we now introduce a multiset-valued variant of LIG. We ately dominated by exactly one bottom quasi-node. For denote by .A4(A) the set of multisets over the elements simplicity, we will assume that there is only a bottom of A, and we use the standard set notation to refer to root quasi-node (i.e., no top root quasi-node), and that the corresponding multiset operations. bottom frontier quasi-nodes are omitted (i.e., frontier Definition 1 A multlset-valued Linear Index nodes just consist of top quasi-nodes). Furthermore, Grammar ({}-LIG) is a 5-tuple (tiN, VT, ~, P, S), we will assume that each quasi-node has a label, and where VN, VT, and VI are disjoint sets of terminals, is equipped with a finite feature structure. A sample non-terminals, and indices, respectively; S E VN is the quasi-tree is shown in Figure 1 (quasi-tree a5 of Vijay- start symbol; and P is a set of productions of the fol- Shanker (1992, p.488)). lowing form: We follow Vijay-Shanker (1992, Section 2.5) in defin- p : As ) voBlslvl ...v~-lB, snvn ing substitution as the operation of forming a quasi-node pair from a frontier node of one tree (which becomes the for some n > O, A, B1,...,Bn E VN, s, sl,...,sn mul- top node) and the root node of another tree (which be- tisets of members of VI, and vo, . . ., vn E V~. comes the bottom node). As always, a dominance link The derivation relation ~ for a {}-LIG is defined relates the two quasi-nodes of the newly formed pair. as follows. Let ~,7 • (VN-A4(~) U VT)*, t,tl,...,tn Adjunction is not defined separately: it suffices to say multisets of members of VI, and p • P of the form that a pair of quasi-nodes is "broken up", thus forming given above. Then we have two quasi-trees. We then perform two substitutions. ~At7 ~ ~voBltlvx • .. vn-lB, t~v~7 Observe that nothing keeps us from breaking up more than one pair of quasi-nodes in either of two quasi-trees, such that t = U~=l(ti \ si)Us. If G is a {}-LIG, L(G) = and then performing more than two substitutions (as {w IS=Z=~c w,w • v4}.

265 Suppose we want to apply rule p to an instance of now define two normal forms which will be used later. nonterminal A with an index multiset t in a sentential We omit the proofs and refer to (Rambow, 1994) for form. First, we remove the indices in s from t, then we details. rewrite the nonterminal, then we distribute the remain- Definition 3 A {}-LIG G = (VN, VT, VI, P, S) is in ing indices freely among the newly introduced nonter- restricted index normal form or RINF if all pro- minals B1,..., Bn, creating new multisets, and finally ductions in P are of one of the following forms 'where we add si to the new multiset for each Bi, creating the new ti. A, B E VN, f E VI and a E (VTU VN)*): The reader will have noticed, and hopefully excused, 1. A )a the abuse of notation in this definition, which results g. A )Bf from mixing set-notation and string-notation. We can 3. AI ~B also define {}-LIG as a pure string-rewriting system which does not require the definition of additional data Theorem 1 For any {}-LIG, there is an equivalent structures (the multisets) for the notion of "derivation" {}-LIG in RINF. (see (Rambow, 1994)). However, the definition pro- Definition 4 A {}-LIG G = (VN, VT, V~, P, S) is in vided here (using an explicit representation of multi- Extended Two Form (ETF) if every production in sets) has the advantage of corresponding more directly P has the form As --+ BlSlB~S2, As --* Bs', or to the intuition underlying {}-LIG and is much easier A -* a, where A, Bx,B2 E VN, s, sl,s2, s' E VI*, and to understand and use in proofs. The issue is purely a E VT U {e}. notational. We now introduce a restriction on derivations, which Theorem 2 For any {}-LIG, there is an equivalent will be useful later. {}-LIG in ETF. Definition 2 A linearly-restrlcted derivation in a We now discuss some formal properties of {}-LIG. For reasons of space limitation, we only sketch the {}-LIG is a derivation 0 : S ~ w with w E V~. such proofs; full versions can be found in (Rainbow, 1994). that: We start with the weak generative power. We have al- I. The number of index symbols added (and hence re- ready seen that {}-LIG can generate languages not in moved) during the derivation is linearly bounded by £(LIG) (and hence not in £(TAG)). We will now show Iwl. that linearly restricted {}-LIGs are at most context- 2. The number ore-productions used during the deriva- sensitive. tion is linearly bounded by Iwl. Theorem 3 £R({}-LIG) _C £(CSG). We let LR(G) = {w I there is aderivation e : Outline of the proof. We simulate a derivation in a S ~ w such that 0 is linearly-restricted}, and we let . The space needed for this is £R({}-LIG) = {LMG ) [ G a {}-LIG}. If G is a {}-LIG bounded linearly in the length of the input word, since such that LR(G) = L(G), we say that G is linearly the number of the symbols that are erased, the index restricted. Many of the results that we will show ap- symbols and nonterminals that rewrite to ¢, is linearly ply only to linearly restricted {}-LIGs. However, as we bounded. • will see, all linguistic applications will make use of this What sort of languages could a {}-LIG possibly not restricted version. generate? Consider the copy language L = {ww ]w E EXAMPLE 1 {a, b}*}, and let us suppose that it is generated by G, a The following grammar derives the language {}-LIG. This language cannot be generated by a CFG. COUNT-5, where COUNT-5 = {anbncndne n In > 0}. We therefore know that for any integer M, there are in- Let G1 = (VN, VT, VI, P, S) with: finitely many strings in L whose derivation in G is such VN = {S,A,B,C,D,E} that at some point, an index multiset in the sentential form contains more than M index symbols (since any V T = {a,b,c,d,e} ¼ = {s~,Sb, S~,Sd,S~} finite use of index symbols can be simulated by a pure P = {PI :S > S{Sa,Sb, Sc,Sd, S~} CFG). It must be the case that this unbounded multiset P2 : S ~ ABCDE is crucial in restricting the second half of the generated P3 : A{s~} ~ Aa, P4 : A--~ E string in such a way that it copies the first half (again, p5 : B{Sb } > Bb, P6 : B -----+ E since a pure CFG cannot derive such strings). However, pT : C{s~} ----r Cc, ps : C ~ e it is impossible for a data structure like a (multi-)set p9 : D{sd} > Dd, Plo : D > (over a finite index alphabet) to record the required se- quential information. Therefore, the second half of the Pll : E{se} ) Ee, Pl~ : E > e } A sample derivation is shown in Figure 2. string cannot be adequately constrained, and G cannot exist. This argument nmtivates the following conjec- This example shows that Z:({}-LIG) is not contained ture. in/:(LIG), since the latter cannot derive COUNT-5. We Conjecture 4 {wwlw E {a,b}*} is not in £:({}-LIG).

266 S S{Sa, 8b, Se, 8d, Be} S{8a, Sb, 8e, Sd, Se, Sa, 8b, Se, Sd, Se, Sa, Sb, Se, Sd, Se } A{sa, sa, sa}B{sb, Sb, sb}C{sc, sc, sc}D{sd, Sd, sd}E{so, se, se} A{s., s., s.}B{sb, Sb, sb}C{so, so}eD{Sd, Sd, Sd}E{s , s., aaaB{Sb, 8b, 8b}C{ o, o}eD{ d, sd}E{80, s.} aaabbbcccdddeee

Figure 2: Sample derivation in {}-LIG G1

We now turn to closure properties. UVG with Dominance Links

Theorem 5 L:({}-LIG) is a substitution-closed full ab- We now formally define UVG with dominance links stract family of languages (AFL). (UVG-DL), which serves as a formal model for the sec- ond and third phenomena introduced above, word order Outline of the proof. Since £({}-LIG) contains all variation and quasi-trees. The definition differs from context-free languages, it contains all regular languages, that of UVG only in that vectors are equipped with and therefore it is sufficient to show that L:({}-LIG) is dominance relations which impose an additional condi- closed under intersection with regular languages and tion on derivations. Note that the definition refers to substitution. These results are shown by adapting the the notion of derivation tree of a UVG, which is defined techniques used to show the corresponding results for as for CFG. CFGs. • Finally, we turn to the recognition and parsing prob- Definition 5 An Unordered Vector Grammar lem. Again, we will restrict our attention to the linearly with Dominance Links (UVG-DL) is a 4-tuple restricted version of {}-LIG. (VN, VT, V, S), where VN and VT are sets of nonter- minals and terminals, respectively, S is the start sym- Theorem 6 Each language in/~R({}-LIG) can be rec- bol, and V is a set of vectors of context-free produc- ognized in polynomial deterministic time. tions equipped with dominance links. For a given vec- tor v E V, the dominance links form a binary relation Outline of the proof. We extend the CKY parser for domv over the set of occurrences of non-terminals in CFG. Let G be a {}-LIG in ETF. Since G may contain the productions of v such that if domv(A, B), then A e-productions, the algorithm is adapted by letting the (an instance of a symbol) occurs in the right-hand side indices of the matrix refer to positions between sym- of some production in v, and B is the left-hand symbol bols in the input string, not the symbols themselves. (instance) of some production in v. In order to account for the index multiset, we let the IfG is a UVG-DL, L(G) consists of all words w E VYt entries in the recognition matrix be pairs consisting of which have a derivation p of the form a nonterminal symbol and a [Y}l-tuple of integers:

(A, (nl,..., nlv, I)) such that ~ meets the following two conditions: The IVil-tuple of integers represents a multiset, with each integer designating the number of copies of a given 1. piP2 • • .Pr is a permutation of a member of V*. index symbol that the set contains. In an entry of 2. The dominance relations of V, when interpreted as the matrix, each pair represents a partial derivation the standard dominance relation defined on trees, of a substring of the input string. More precisely, if hold in the derivation tree of ~. the input word is al .-.an, and if ~ = {il,...,ilv, I}, then we have (A, (nl,...,nlvd)) in entry ti,j of the The second condition can be formulated as follows: recognition matrix if and only if there is a derivation if v in V contributes instances of productions Pl and As ::=¢. ai+l ...aj, where multiset s contains nk copies P2 (and perhaps others), and the kth daughter in the of index symbol it,, 1 < k < I vii. Clearly, there right-hand side of Pl dominates the left-hand nonter- is a derivation in the grammar if and only if entry minal of P2, then in the context-free derivation tree as- t0,n contains the pair (S, (0,...,0)). Now since the sociated with # (the unique node associated with) the grammar is linearly restricted, each nk is bounded by kth daughter node of pl dominates (the unique node n, and hence the number of different pairs is linearly associated with) P2. We now give an example. (The bounded by IVNInW'I. Thus each entry in the matrix superscripts distinguish instances of symbols and are can be computed in O(n l+21vd) steps, and since there not part of the nonterminal alphabet.) are O(n 2) entries, we get an overall time complexity of EXAMPLE 2 O(n3+21v, I). • Let G2 ---- (VN, VT, V, S t) with:

267 v1: {(S' ~ daft VP)} with dome, = I~ v2: {(VP (1) ~ NPnom VP(2)), (VP (3) ) NPdat Vp(4)), (VP (5) ---+ VP (6) VP(r)), (VP (s) > verspricht)} with. dom~ = {(VP(2), Vp(S)), (VP(4), VP(S)), (VP(~),VP(S))} vz: {(VP (1) ----+ VP(D Vp(2)), (Vp(3) ----+ zu versuchen)} with domvs = {(VP(2), Vp(3))} v4: {(Vp(D ----+ NFacc VP(2)), (VP (3) > zu reparieren)} with dome, = {(VP(~), Vp(3))} vh: {(NPnom -'-+ der Meister)} with domvs = v6: {(NPdat ~ niemandem)} with dome. = 0 vr: {(NPacc ~ den K~hlschrank)} with dome, = 0

Figure 3: Definition of V for UVG-DL G2

NP?~) "...vP(p41)

'°'°o* =oO.Oo.. der Meister NP(Pn) "'"'.. ._.~.. (~l) ..... "..

den Kuehlschrank VP(_Pz2) : ""... VP(IP42) ~..... ".. ! * % • NP(i61) " VP(P23) "'..oi zu reparieren

niemandem VPIP~2) VPIP24)

zu versuchen verspricht

Figure 4: Sample UVG-DL derivation

VN = {S', VP, NPnom, NPdat, NPaec} in Section 2. VT = {daft, verspricht, zu versuchen, zu reparieren, We now turn to the formal properties of UVG-DL. der Meister, niemandem, den Kiihlschrank} 2 Our main result is that UVG-DL is weakly equivalenl~ V = {vx, v2, v3, v4, vh, vr, VT} to {}-LIG. The sets of a {}-LIG implement the domi- where the vi are as defined in Figure 3. nance links and make sure that all members from one A sample derivation is shown in Figure 4, where the set of rules are used during a derivation. We first in- dominance relations are shown by dotted lines. Ob- troduce some more terminology with which to describe serve that the example grammar is lexicalized. We will the derivations of UVG-DLs. If two productions P~,1 denote the class of lexicalized UVG-DL by UVG-DLLex. and Pv,2 from vector v are linked by a dominance link It is clear that the dominance links of UVG-DL are from a right-hand side nonterminal of p~,l to the left.- the additional constraints that we argued above are nec- hand nonterminal Pv,2, then we will denote this link by essary to adequately restrict the structural relation be- l,,1,~. We will say that p~.l (or the right-hand side non- tween arguments and their verbs. Furthermore, UVG- terminal in question) has a passive dominance require- DL is a notational variant of QTSG: every vector rep- ment of Iv,l,2, and that Pv,2 has an active dominance resents a quasi-tree, and identifying quasi-nodes cor- requirement of Iv,l,2. If Pv,1 or Pv,2 is used in a partial responds to rewriting. The condition on a successful derivation such that the other production is not used derivation in QTSG - that all nonterminal nodes be in the derivation, the dominance requirement (passive identified - corresponds to the definition of a derivation or active) will be called unfulfilled. Let ~0 be a (partial) in UVG-DL. We have therefore found a mathematical derivation. We associate with # a multi-set which rep- model for the second and third phenomenon mentioned resent all the unfulfilled active dominance requirements of ~0, written T(L0). ~Gloss (in order): that, promises, to try, to repair, the master, no-one, the refrigerator. Theorem 7 ~(UVG-DL) = L:({}-LIG)

268 Outline of the proof. The theorem is proved in two UVG-DL maps to a linearly restricted {}-LIG. For lin- parts (one for each inclusion). We first show the inclu- guistic purposes we are only interested in lexicalized sion Z(UVG-DL) C_ L:({}-LIG). Let G = (VN, Vw, V, S) grammars, and therefore the linear restriction is quite be a UVG-DL, where V = {Vl,..., vK} with vi = natural. We obtain the following corollaries thanks to (pi,1,...,pi,k,), kl = Ivil, 1 < i < K. We construct Theorem 7. a {}-LIG G' = (VN, VT, Yi, P, S). Let Yi = {li,j,k I 1 < Corollary 8 L:(UVG-DLLex) C_ LI(CSG). i < K, 1 < j, k < ki }. Define P as follows. Corollary 9 L:(UVG-DL) is a substitution-closed full Let v in V, and let p in v be the production A ) AFL. WoBlWl...B~w, be in yr. In the following, we will denote by T(p) the multiset of active dominance re- Corollary 10 Each language in /:(UVG-DLLex) can quirements of p, and by .l-i(p) the multiset of passive be recognized in polynomial deterministic time. dominance requirements of Bi, 1 < i < n. Add to P the following production: Related Formalisms Based on word-order facts from Turkish, Hoffman A T(p) ~ woBI.J-I(p)wl'" "Bn-l-n(p)wn (1992) proposes an extension to CCG called {}-CCG, in which arguments of functors form sets, rather than be- P contains no other productions. We show by induc- ing represented in a curried notation. Under function tion that for A in VN, and w in V.~, we have A =~a w composition, these sets are unioned. Thus the move iff A =~=:'c' w. Specifically, we show that for all integers from CCG to {}-CCG corresponds very much to the k k, 0 : A =:~c w, w E V~, with unfulfilled active domi- move from LIG to {}-LIG. We conjecture that (a ver- nance requirements T(0), implies that there is a deriva- sion of) {}-CCG is weakly equivalent to {}-LIG. tion AT(0) =~:¢'G' w, and, conversely, we show that for Staudacher (1993) defines a related system called dis- i tributed index grammar or DIG. DIG is like LIG, except all integers k, At ==~G, a, A E VN, t a multiset of ele- that the stack of index symbols can be split into chunks ments of l/i, and a E V~, implies that there is a deriva- and distributed among the daughter nodes. However, tion 0 : A =~G a such that T(0 ) = t. the formalism is not convincingly motivated by the lin- For the inclusion/:({}-LIG) C L:(UVG-DL), we take guistic data given (which can also be handled by a sim- a slightly different approach to avoid notational com- ple LIG) or by other considerations. plexity. Let G = (VN, VT, Vx,P,S) be a {}-LIG in Several extensions to {}-LIG and UVG-DL are de- RINF. We construct a UVG-DL G' = (VN, VT, V,S), fined in (Rambow, 1994). First, we can introduce the where V is defined as follows: "integrity" constraint suggested by Becker et al. (1991) which restricts long-distance relations through nodes. 1. Ifp E P is a {}-LIG production of RINF type 1, then This is necessary to implement the linguistic notion of ((p),0) E V. "barrier" or "island". Second, we can define the tree- 2. If p E P is a {}-LIG production of RINF type 2, rewriting version of UVG-DL, called V-TAG. This is with p = A ----+ Bf for A, B E VN, f E I,~,then for motivated by Conjecture 4, which (if true) means that all q E P such that q = Cf ~ D, v = ((A UVG-DL cannot derive Swiss German. Under either ex- B,C . ~ D),domv(B,C)) is in V. tension, the weak generative power is extended, but the Let A be in tiN, and w in V~. We show by induction formal and computational results obtained for {}-LIG and UVG-DL still hold. that S =~:~a w iff S =~:~a' w. Specifically, we first show that for all integers k, for all {}-LIGs G and the corre- Conclusion sponding UVG-DL G' as constructed above, if there is This paper has presented two equivalent formalisms, a derivation t~ : S {} ~::~e w with k instances of ap- {}-LIG and UVG-DL, which provide formal models for plications of rules of type 2, then there is a deriva- the three different phenomena that we identified in the tion 0 ' : S :~::~a' w such that 0 and O~ are identical beginning of the paper. We have shown that both for- except for the index symbols in the sentential forms malisms, under certain restrictions that are compati- of 0. For the converse inclusion, we show that for all ble with the motivating phenomena, are restricted ill integers k, for all {}-LIGs G and the correspond UVG- their generative capacity and polynomially parsable, DL G ~ as constructed above, if there is a derivation thus making them attractive candidates for modeling O' : S {} ~a, w with k instances of applications of natural language. Furthermore, the formalisms are rules from vectors with two elements, then there is a substitution-closed AFLs, suggesting that the defini- derivation O : S ::~=~a w such that g and 0 ~ are identical tions we have given are "natural" from the point of except for the index symbols in the sentential forms of view of theory. 0. • This equivalence lets us transfer results from {}-LIG Acknowledgments to UVG-DL. It can easily be seen from the construction I would like to thank Bob Kasper, Gai~lle Recourcd, employed in the proof of Theorem 7 that a lexicalized Giorgio Satta, Ed Stabler, two anonymous reviewers,

269 and especially K. Vijay-Shanker for useful comments Pollard, Carl (1984). Generalized phrase structure and discussions. The research reported in this paper grammars, head grammars and natural language. was conducted while the author was with the Com- PhD thesis, Stanford University, Stanford, CA. puter and Information Science Department of the Uni- Pollard, Carl and Sag, Ivan (1987). Information- versity of Pennsylvania. The research was sponsored Based Syntax and Semantics. Vol 1: Fundamen- by the following grants: ARO DAAL 03-89-C-0031; tals. CSLI. DARPA N00014-90-J-1863; NSF IRI 90-16592; and Ben Franklin 91S.3078C-1. Pollard, Carl and Sag, Ivan (1992). Anaphors in En- glish and the scope of binding theory. Linguistic Bibliography Inquiry, 23(2):261-303. Aho, A. V. (1968). Indexed grammars - an extension Pollard, Carl and Sag, Ivan (1994). Head-Driven to context free grammars. J. ACM, 15:647-671. Phrase Structure Grammar. University of Chicago Becker, Tilman; Joshi, Aravind; and Rambow, Owen Press, Chicago. Draft distributed at the Third Eu- (1991). Long distance scrambling and tree adjoin- ropean Summer School in Language, Logic and In- ing grammars. In Fifth Conference of the European formation, Saarbriicken, 1991. Chapter of the Association for Computational Lin- Rambow, Owen (1994). Formal and Computational guistics (EACL'91), pages 21-26. ACL. Models for Natural Language Syntax. PhD thesis, Chomsky, Noam (1981). Lectures in Government and Department of Computer and Information Science, Binding. Studies in generative grammar 9. Foris, University of Pennsylvania, Philadelphia. Dordrecht. Rambow, Owen and Satta, Giorgio (1994). A rewriting Cremers, A. B. and Mayer, O. (1973). On matrix lan- system for free word order syntax that is non-local guages. Information and Control, 23:86-96. and mildly context sensitive. In Martfn-Vide, Car- los, editor, Current Issues in Mathematical Lin- Cremers, A. B. and Mayer, O. (1974). On vector lan- guistics, North-Holland Linguistic series, Volume guages. J. Comput. Syst. Sei., 8:158-166. 56. Elsevier-North Holland, Amsterdam. Gazdar, G. (1988). Applicability of indexed grammars to natural languages. In Reyle, U. and Rohrer, C., Satta, Giorgio (1993). Recognition of vector languages. editors, Natural Language Parsing and Linguistic Unpublished manuscript, Universith di Venezia. Theories. D. Reidel, Dordrecht. Shieber, Stuart B. (1985). Evidence against the Hoffman, Beryl (1992). A CCG approach to free word context-freeness of natural language. Linguistics order languages. In 30th Meeting of the Associa- and Philosophy, 8:333-343. tion for Computational Linguistics (ACL'92). Staudacher, Peter (1993). New frontiers beyond Joshi, Aravind; Levy, Leon; and Takahashi, M (1975). context-freeness: DI-grammars and DI-automata. Tree adjunct grammars. J. Comput. Syst. Sci., In Sixth Conference of the European Chapter 10:136-163. of the Association for Computational Linguistics (EA CL '93). Joshi, Aravind K. (1985). How much context- sensitivity is necessary for characterizing struc- Steedman, Mark (1985). Dependency and coordination tural descriptions -- Tree Adjoining Grammars. in the grammar of Dutch and English. Language, In Dowty, D.; Karttunen, L.; and Zwicky, A., ed- 61. itors, Natural Language Processing -- Theoreti- Vijay-Shanker, K. (1992). Using descriptions of trees in cal, Computational and Psychological Perspective, a Tree Adjoining Grammar. Compvtational Lin- pages 206-250. Cambridge University Press, New guistics, 18(4) :481-518. York, NY. Originally presented in 1983. Vijay-Shanker, K. and Weir, David (1994). The equiva- Karttunen, Lauri (1989). Radical lexicalism. In Baltin, lence of four extensions of context-free grammars. Mark and Kroch, Anthony S., editors, Alternative Math. Syst. Theory. Also available as Technical conceptions of phrase structure, pages 43-65. Uni- Report CSRP 236 from the University of Sussex, versity of Chicago Press, Chicago. School of Cognitive and Computing Sciences. Kasper, Robert (1992). Compiling head-driven phrase Vijay-Shanker, K.; Weir, D.J.; and Joshi, A.K. (1987). structure grammar into lexicalized tree adjoining Characterizing structural descriptions produced by grammar. Presented at the TAG+ Workshop, Uni- various grammatical formalisms. In 25th Meeting versity of Pennsylvania. of the Association for Computational Lingvistics Marcus, Mitchell; Hindle, Donald; and Fleck, Margaret (ACL '87}, Stanford, CA. (1983). D-theory: Talking about talking about trees. In Proceedings of the 21st Annual Meeting of the Association f or Computational Linguistics, Cambridge, MA.

270