On the Succinctness Properties of Unordered Context-Free Grammars M
Total Page:16
File Type:pdf, Size:1020Kb
ON THE SUCCINCTNESS PROPERTIES OF UNORDERED CONTEXT-FREE GRAMMARS M. Drew Moshier and William C. Rounds Electrical Engineering and Computer Science Department University of Michigan Ann Arbor, Michigan 48109 1 Abstract ally in all cases. (Unfortunately, this does not solve the P - NP question!) We prove in this paper that unordered, or ID/LP Barton's reduction took a known NP-complete prob- grammars, are e.xponentially more succinct than context- lem, the vertex-coverproblem, and reduced it to the URP free grammars, by exhibiting a sequence (L,~) of finite for ID/LP. The reduction makes crucial use of grammars languages such that the size of any CFG for L,~ must whose production size can be arbitrarilylarge. Define the grow exponentially in n, but which can be described by fan-out of a grammar to be the largest total number of polynomial-size ID/LP grammars. The results have im- symbol occurrences on the right hand side of any produc- plications for the description of free word order languages. tion. For a CFG, this would be the maximum length of any RHS; for an ID/LP grammar, we would count sym- bols and their multiplicities. Barton's reduction does the 2 Introduction following. For each instance of the vertex cover problem, Context free grammars in immediate dominance and of size n, he constructs a string w and an ID/LP grammar of fanout proportional to n such that the instance has a linear precedence format were used in GPSG [3] as a skele- vertex cover if and only if the string is generated by the ton for metarule generation and feature checking. It is in- grammar. He also notes that if all ID/LP grammars have tuitively obvious that grammars in this form can describe fanout bounded by a fixed constant, then the URP can languages which are closed under the operation of taking be solved in polynomial time. arbitrary permutations of strings in the language. (Such languages will be called symmetric.) Ordinary context- This brings us to the statement of our results. Let Pn be the language described above. Clearly this language free grammars, on the other hand, "seem to require that can be generated by the ID/LP grammar all permutations of right-hand sides of productions be ex- plicitly listed, in order to describe certain symmetric lan- S -- al,...,an guages. For an explicit example, consider the n-letter al- phabet E,~ = {al ..... a,~}. Let P,~ be the set of all strings whose size in bits is O(n log n). which are permutations of exactly these letters. It seems obvious that no context-free grammar could generate this Theorem 1 There is a constant c > I such that any language without explicitly listing it. Now try to prove contezt-free gr.mmar Gn generating Pn must have size that this is the case. This is in essence what we do in this ~(cn). 2 Moreover, every [D/LP grammar'generating pn, paper. We also hope to get the audience for the paper whose fanout is bounded by a fized constant, must likewise interested in why the proof works! have ezponential size. To give some idea of the difficulty of our problem, we The theorem does not actually depend on having a begin by recounting Barton's results [1] in this confer- vocabulary which grows with n. It is possible to code ence in 1985. (There is a general discussion in [2].) He everything homomorphically into a two-letter alphabet. showed that the universal recognition problem (URP) for However, we think that the result shows that ordinary ID/LP grammars is NP-complete. 1 This means that if CFGs, and bounded-fanout ID/LP grammars, are inade- P :~ NP, then no polynomial algorithm can solve this quate for giving succinct descriptions of languages whose problem. The difficulty of the problem seems to arise vocabulary is open, and whose word order can be very from the fact that the translation from an ID/LP gram- free. Thus, we prefer the statement of the result as it is. mar to a weakly equivalent CFG blows up exponentially. We start the paper with the technical results, in Sec- It is easy to show, assuming P ~ NP, that any reason- tion 3, and continue with a discussion of the implications able transformation from ID/LP grammars to equivalent for linguistics in Section 4. The final section contains CFGs cannot be done in polynomial time; Rounds has a proof of the Interchange Lemma of Ogden, Ross, and done this as a remark in [8]. In this paper, we remove Winklmann [7], which is the main tool used for our re- the hypothesis P ~: NP. That is, we can show that no suits. This proof is included, not because it is new, but algorithm whatever can effect the translation polynomi- because we want to show a beautiful example of the use of The universal recognition problem is to tell for an ID/LP gram- mar G and a string w, whether or not w E L(G). 2This notation meam.s that for inKnitely ramW n, the size of Gn must be bigger than cn. 112 combinatorial principles in formal linguistics, and because Now we can give a more formal statement of the we think the proof may be generalized to other classes of lem/'fla. grammars. Definition. Suppose that A is a subset {zl ..... -p} of L(n). A has the k-interchangeability property iff there are substrings Zh ..., z v of zl, ..., z v respectively, such 3 Technical Results that each z, has length k, each z~ occurs in the same relative position in each zi, and such that if z~ = wiziy( As we have said, our basic tool is the Interchange and z i = wjziVj for any i and j, then wi~jVl is an element Lemma, which was first used to show that the "embedded of L(n). reduplication" language { wzzy I w, z, and y E {a, b, c}" } is not context-free. It was also used in Kac, Manaster- Interchange Lemma. Let G be a CFG or ID/LP Ramer, and Rounds [6] to show that English is not CF, grammar with fanout r, and with nonterminal alphabet and by Rounds, Manaster-Ramer, and Friedman to show N. Let m and n be any positive natural numbers with that reduplication even over length n strings requires r < m_< n. Let L(n) be the set of length nstringsin context-free grammar size exponential in n. The cur- L(G), and Q(n) be a subset of L(n). Then we can find rent application uses the last-mentioned technique, but a k-interchangeable subset A of Q(n), such that m/r <_ the argument is more complicated. k _< m, and such that We will discuss the Interchange Lemma informally, then state it formally. We will then show how to apply it Ial >_ IQ(n)ll (INI" n2). in our case. The IL relies on the following basic observation. Sup- Now we can prove our main theorem. First we show pose we have a context-free language, and two strings in that no CFG of fanout 2 can generate L(n) without an that language, each of which has a substring which is the exponential number of nonterminals. The theorem for yield of a subtree labeled by the same nonterminal sym- any CFG then follows, because any CFG can be trans- bol at the respective roots of the subtrees. Then these formed, into a CFG with fanout 2 by a process essentially substrings can be interchanged, and the resulting strings like that of transforming into Chornsky normal form, but will still be in the language. This is what distinguishes without having to eliminate e-productions or unit produc- the IL from the Pumping Lemma, which finds repeated tions. This process at most cubes the grammar size, and nonterminals in the derivation tree of just one string. the result follows because the cube root of an exponen- The next observation about the IL is that it attempts tial is stillan exponential. The proof for bounded-fanout to find these interchangeable strings among the length n ID/LP is a direct adaptation of the proof for fanout 2, strings of the given language. Moreover, we want to find which we now give. a whole set of such strings, such that in the set, the inter- Let Pn be the permutation language above, and let changed substrings all have the same length, and all start G be a fanout 2 grammar for this language. Apply the at the same position in the host string. The lemma lets Interchange Lemma to G, choosing Q(n) = P~, r = 2, us select a number m less than n, and tells us that the and m = n/2. (n will be chosen as a multiple of 4.) length k of the interchangeable substrings is between role Observe that IQ(n)l = IL(n)[ = n!. From the IL, we get a and m, where r is the fanout of the grammar. Finally, the k-interchangeable subset A of L(n), such that n/4 < k < lemma gives us an estimate of the size of the interchange- n/2, and such that able subset. We may choose an arbitrary subset Q(n) of n! L(n), where L(n) is the set of length n strings in the lan- IAI _> INI" n'-" guage L. If we also choose an integer m < n, then the IL tells us that there is an interchangeable set A C_ Q(n) Next we use the fact that A is k-interchangeable to get an such that IAI _> IQ(n)I/(INI" n=), where the vertical bars upper bound on its cardinality.