
JOURNAL OF COMPUTER AND SYSTEM SCIENCES 30, 249-273 (1985)

Pattern Selector Grammars and Several Parsing Algorithms in the Context-free Style

J. GONCZAROWSKI AND E. SHAMIR*

Institute of Mathematics and Computer Science, The Hebrew University of Jerusalem, Jerusalem 91904, Israel Received March 30, 1982; accepted March 5, 1985

Pattern selector grammars are defined in general. We concentrate on the study of special grammars, the pattern selectors of which contain precisely k "ones" ($0^*(10^*)^k$) or k adjacent "ones" ($0^*1^k0^*$). This means that precisely k symbols (resp. k adjacent symbols) in each sentential form are rewritten. The main results concern parsing algorithms and the complexity of the membership problem. We first obtain a polynomial bound on the shortest derivation and hence an NP time bound for parsing. In the case k = 2, we generalize the well-known context-free dynamic programming type algorithms, which run in polynomial time. It is shown that the generated languages, for k = 2, are log-space reducible to the context-free languages. The membership problem is thus solvable in $\log^2$ space. © 1985 Academic Press, Inc.

1. INTRODUCTION

The parsing and membership testing algorithms for context-free (CF) grammars occupy a peculiar border position in the complexity hierarchy. The dynamic programming algorithm runs in $O(n^3)$ steps [21], but more refined methods reduce the problem to matrix multiplication with $O(n^{2+\alpha})$ run-time, where $\alpha < 1$ [20]. Earley's algorithm runs in time $O(n^2)$ for grammars with bounded ambiguity [4]. This is too much for compiler applications, so that restricted grammars such as LL(k) with linear complexity are used for the syntactic analysis of programming languages. On the other hand, even modest attempts to extend the model of context-free grammars run the risk of escalating the parsing complexity to the NP-hard zone. In the popular model of EOL systems (we use [15] as a standard reference), the parsing can still be done in $O(n^4)$ (actually, $O(n^{3+\alpha})$) run-time [14]. EDTOL membership is still in nondeterministic log space, and thus in P [9]. For ETOL systems, however, it is NP-complete; the reduction proving this was first given in [2] and later in [12]. For an overview of the time and space complexity of the membership testing problem for various L families, see [15] and [10]. In particular, the tape complexity for CF and EOL is $O(\log^2 n)$. Moreover, the parsing problem for EOL, as well as for various other language families, was shown to be log-space reducible to the corresponding problem for CF [18, 19].

In the present article, we study extensions in the spirit of EOL systems. In successive sentential forms that constitute a derivation, one insists on parallel (or synchronized) application of productions to all the symbols in the EOL case, or to selected subsequences in our case here. A subsequence to be rewritten is specified by assigning "1" to symbols in it and "0" to symbols outside. The entire sequence of 0's and 1's is called a pattern or a mask. A grammar is now defined by context-free-like productions plus a language P of patterns over $\{0, 1\}$, which is called the pattern selector. For context-free grammars, the pattern selector is $0^*10^*$ or its star closure $\{0, 1\}^*$. For EOL systems, it is $1^*$.

Even for very simple regular pattern selectors, one can obtain families (models) which are very hard to classify (i.e., to say what their generative power is in comparison to EOL or ETOL). On the other hand, regular pattern grammars are extremely powerful. In [11] it was, for instance, shown that there is a regular pattern selector grammar that generates all context-sensitive (and thus all RE) languages through weak identity. Our emphasis in this article is, however, not on generative power, but on parsing algorithms. We concentrate on $P = 0^*1^k0^*$, i.e., rewriting of k adjacent symbols at every step, on $0^*(10^*)^k$, i.e., rewriting any k symbols, and on their star closures. In both cases, we can obtain all the context-free languages, for any $k \ge 1$.

It is easy to see that, in the $0^*(10^*)^k$ case, one can generate non-context-free languages (cf. Example 2.1), for all $k \ge 2$. It was recently shown [3] that there are non-context-free (in fact, also non-EOL) languages that can be generated by rewriting k = 2 adjacent symbols together. But nothing is known about a hierarchy in k.

For the parsing problems, we find polynomial time (or $\log^2 n$ space) algorithms for the families with pattern selectors $0^*110^*$ and $0^*10^*10^*$. These algorithms are nontrivial extensions of the classical algorithms for context-free grammars. We shall also see that the language families with pattern selectors $0^*1^k0^*$ and $0^*(10^*)^k$ are parsable in nondeterministic polynomial time (NP) and thus form proper subfamilies of the context-sensitive languages. Further results about the complexity of these problems, in particular, a polynomial time parsing algorithm for the pattern selector $0^*(10^*)^k$, were found by [6] during the revision process of this paper.

The plan of this paper is as follows. The formal definition of pattern selector grammars is given in Section 2. In Section 3 we give combinatorial results that bound the length of the shortest derivations for a word w by a linear function of its length. Bounds of this kind are essential in analyzing the complexity of parsing, as they limit the height of the derivation trees that have to be considered. Bounds of this nature, with different techniques and difficulties, were also used, e.g., in [1, 13, 19, 4]. In Section 4, we present dynamic programming algorithms for the languages defined with the pattern selectors $0^*110^*$ and $0^*10^*10^*$. In the last section, we show that the membership problem for these languages is log-space reducible to context-free membership. Its space complexity is thus $\log^2 n$. Our results are summarized in table form at the end of this paper. We thank the referee for suggesting this, and many other useful remarks incorporated in the revised version.

* This work was partially supported by the U.S.-Israel Binational Science Foundation, Grant 3432183.

0022-0000/85 $3.00 Copyright © 1985 by Academic Press, Inc. All rights of reproduction in any form reserved.

2. BASIC CONCEPTS AND DEFINITIONS

We assume the reader to be familiar with formal language theory as, e.g., in the scope of [16]. An overview of L system theory is given in [15]. Some notations need, perhaps, an additional explanation. For a word w, $|w|$ denotes its length. $\lambda$ denotes the empty word. For a finite set X, $\#X$ denotes the cardinality of X. We shall usually identify a singleton set with its element. Alphabets are finite sets of symbols. Let $L_1$ and $L_2$ be languages. Then $L_1$ and $L_2$ are considered equal if $L_1 \cup \{\lambda\} = L_2 \cup \{\lambda\}$. Let G be a rewriting system. Then L(G) denotes the language of G. Two rewriting systems are equivalent if the languages they generate are equal. Let $\Sigma$ and $\Phi$ be alphabets. We denote the family of total finite homomorphisms from $\Sigma^*$ into $\Phi^*$ by $\mathrm{HOM}(\Sigma, \Phi)$. In context-free grammars only non-terminal symbols can be rewritten. Very often it is convenient to permit the rewriting of terminal symbols as well. Thus we arrive at EOS systems (see, e.g., [11, 5]).

DEFINITION 2.1. An EOS system F is a quadruple $(\Sigma, h, S, \Delta)$, where $\Sigma$ is the alphabet of F, h is a total substitution from $\Sigma^*$ into (nonempty subsets of) $\Sigma^*$ called the substitution of F, with $h(a) \neq \emptyset$ for all $a \in \Sigma$, $S \in \Sigma - \Delta$ is the start symbol of F, and $\Delta \subseteq \Sigma$ is the set of terminal symbols of F. As customary, if $a \in \Sigma$ and $w \in h(a)$ then (a, w) is called a production in F. Prod(F) denotes the set of all productions in F, $\mathrm{Pprod}(F) = \mathrm{Prod}(F) \cap \{(a, b) : a \in \Sigma - S \text{ and } b \in h(a)\}$ (see remark below) and $\mathrm{Maxr}(F) = \max\{|w| : (a, w) \in \mathrm{Prod}(F)\}$. Whenever an EOS system is propagating (it does not contain productions of the form $(a, \lambda)$, called erasing productions), we call it an EPOS system.

Remark. (1) Throughout this paper, we will assume that the start symbol of an EOS system does not occur in any right-hand side of a production rule. We have introduced the set Pprod(F) to allow us to refer only to the "proper productions" of F, i.e., all productions but those that have S as their left-hand side.

(2) Note that, unlike in context-free grammars, it is required that the substitution of an EOS system is a total mapping. However, a finite substitution h' on $\Sigma^*$ that is not total can be "completed" to a total finite substitution h as follows. Let f be a "new" non-terminal symbol, called the failure symbol, for which $h(f) = f$. Then, we let $h(a) = f$ for all those symbols a for which h' is not defined.

The traditional way of defining derivations from production systems is rewriting a single symbol, or alternatively, all symbols in parallel, in each step. This leads to context-free and EOL languages, respectively. Our intention is to further increase the generative power of rewriting by "masking" the sentential forms. A mask for a sentential form x is a $\{0, 1\}$-word of length $|x|$. Those places marked 1 in a mask are the places where a production is to be applied.

To an EOS "base grammar" we thus add a language of masks $K \subseteq \{0, 1\}^*$, which is called a "pattern selector"; after the initial step rewriting the start symbol S, only masks from the pattern selector K can be used at each rewriting step. Thus, in the original definition of context-free grammars and EOS systems, the pattern selector $K = 0^*10^*$ is used, meaning that precisely one symbol is rewritten. EOL systems correspond to the pattern selector $1^*$. The reader may prefer at this point to skip the formal definitions, and jump to Example 2.1, where a further example is given.

DEFINITION 2.2. An EOS-based pattern selector grammar (EOS-based ps-grammar, for short) G is a pair (F, K), where Base(G) = F is an EOS system and Patt(G) = K is a language over the alphabet $\{0, 1\}$. Let G = (F, K) be an EOS-based ps-grammar where $F = (\Sigma, h, S, \Delta)$. Then we may specify G also in the form $G = (\Sigma, h, S, \Delta, K)$.

Remark. (1) In our study we will be concerned with EOS-based ps-grammars only. Hence, we will write "ps-grammar" rather than "EOS-based ps-grammar."

(2) We will carry over all the notations from EOS systems to ps-grammars.

As pointed out above, we mask sentential forms to find out which of the symbols are to be rewritten. In the sequel, we need a more detailed description of a rewriting step. Each occurrence of a symbol to be rewritten is represented by the production applied to it. The remaining occurrences are represented by "identity markers"; distinct markers are used for distinct symbols. The following homomorphisms, applied to such a "description word," reconstruct the mask, the word to be rewritten (the "left-hand side"), and the word obtained (the "right-hand side").

Let $G = (\Sigma, h, S, \Delta, K)$ be a ps-grammar. Let $I_\Sigma = \{\iota_a : a \in \Sigma\}$ be the set of identity markers, where $I_\Sigma \cap \mathrm{Prod}(G) = \emptyset$. Then

- mask is the homomorphism in $\mathrm{HOM}(I_\Sigma \cup \mathrm{Prod}(G), \{0, 1\})$ defined by

$\mathrm{mask}((a, w)) = 1$ for all $(a, w) \in \mathrm{Prod}(G)$ and $\mathrm{mask}(\iota_a) = 0$ for all $\iota_a \in I_\Sigma$.

- lhs is the homomorphism in $\mathrm{HOM}(I_\Sigma \cup \mathrm{Prod}(G), \Sigma)$ defined by

$\mathrm{lhs}((a, w)) = a$ for all $(a, w) \in \mathrm{Prod}(G)$ and $\mathrm{lhs}(\iota_a) = a$ for all $\iota_a \in I_\Sigma$.

- rhs is the homomorphism in $\mathrm{HOM}(I_\Sigma \cup \mathrm{Prod}(G), \Sigma)$ defined by

$\mathrm{rhs}((a, w)) = w$ for all $(a, w) \in \mathrm{Prod}(G)$ and $\mathrm{rhs}(\iota_a) = a$ for all $\iota_a \in I_\Sigma$.
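The three homomorphisms can be illustrated with a small sketch. The encoding below is an assumption made only for this illustration: a description word is a Python list whose items are either `("id", a)` for an identity marker or `("prod", a, w)` for a production (a, w). The sample word is the third step of the derivation in Table I.

```python
# Hypothetical encoding of a description word (not from the paper):
#   ("id", a)       stands for the identity marker of symbol a
#   ("prod", a, w)  stands for the production (a, w)

def mask(word):
    """1 for every production, 0 for every identity marker."""
    return "".join("1" if item[0] == "prod" else "0" for item in word)

def lhs(word):
    """The sentential form that is being rewritten."""
    return "".join(item[1] for item in word)

def rhs(word):
    """The sentential form obtained after the rewriting step."""
    return "".join(item[2] if item[0] == "prod" else item[1] for item in word)

# Third step of Table I: A and C are rewritten, a, b, c are kept.
step = [("prod", "A", "Aa"), ("id", "a"), ("id", "b"),
        ("prod", "C", "bCc"), ("id", "c")]

print(mask(step), lhs(step), rhs(step))
```

The mask of this step contains exactly two ones, so it lies in $0^*10^*10^*$, the pattern selector of Example 2.1.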

We proceed now to define derivations. As said above, we distinguish between the first step of the derivation of a word in the language, and the remaining steps; the start symbol is always rewritten alone. This distinction is formalized by the notions of “ps-derivation” (every step is governed by the pattern selector), versus “derivation” (either rewriting the start symbol alone, or rewriting according to the pattern selector).

DEFINITION 2.3. Let $G = (\Sigma, h, S, \Delta, K)$ be a ps-grammar.

- A ps-derivation of length 1 (in G) is a word $w \in (I_\Sigma \cup \mathrm{Pprod}(G))^*$ with $\mathrm{mask}(w) \in K$. For $x, y \in \Sigma^*$ we say that x directly ps-derives y (in G) if there exists a ps-derivation w of length 1 with $\mathrm{lhs}(w) = x$ and $\mathrm{rhs}(w) = y$; we write then $x \Rightarrow^{ps}_G y$.

- A ps-derivation of length i > 1 (in G) is a sequence $(w_1, \dots, w_i)$ of words from $(I_\Sigma \cup \mathrm{Prod}(G))^*$ such that $(w_1, \dots, w_{i-1})$ is a ps-derivation of length i - 1, $w_i$ is a ps-derivation of length 1 and $\mathrm{rhs}(w_{i-1}) = \mathrm{lhs}(w_i)$. For $x, y \in \Sigma^*$ and $i \ge 1$ we say that x ps-derives y in i steps (in G) if there is a ps-derivation of length i, $(w_1, \dots, w_i)$, where $\mathrm{lhs}(w_1) = x$ and $\mathrm{rhs}(w_i) = y$; we write then $x \Rightarrow^{ps,i}_G y$.

- A ps-derivation (in G) is a ps-derivation of length i for some $i \ge 1$. If $D = (w_1, \dots, w_i)$, $i \ge 1$, is a ps-derivation then D is a ps-derivation of $\mathrm{rhs}(w_i)$ from $\mathrm{lhs}(w_1)$ (in G). For $x, y \in \Sigma^*$ we say that x properly ps-derives y (in G) if there is a ps-derivation of y from x; we write then $x \Rightarrow^{ps,+}_G y$. Let $\Rightarrow^{ps,*}_G$ be the reflexive closure of $\Rightarrow^{ps,+}_G$. We say that x ps-derives y (in G) for $x, y \in \Sigma^*$ if $x \Rightarrow^{ps,*}_G y$. We write $x \Rightarrow^{ps,0}_G y$ whenever x = y.

- Let i > 0. A derivation D is a sequence $(w_1, \dots, w_i)$ of words from $(I_\Sigma \cup \mathrm{Prod}(G))^*$ such that either (a) D is a ps-derivation, or (b) $w_1 = (S, x)$ for some $(S, x) \in \mathrm{Prod}(G)$, and $(w_2, \dots, w_i)$ is a ps-derivation. If $D = (w_1, \dots, w_i)$ is a derivation, then D is a derivation (of length i) of $\mathrm{rhs}(w_i)$ from $\mathrm{lhs}(w_1)$ (in G); the length of a derivation D is denoted by $|D|$. If D is a derivation in G, then D is a derivation of Patt(G). For $x, y \in \Sigma^*$ we say that x derives y (in G) if there is a derivation of y from x; we write $x \Rightarrow^*_G y$.

- For a derivation $D = (w_1, \dots, w_i)$, the trace of D (Trace(D)) is the sequence of words $(\mathrm{lhs}(w_1), \dots, \mathrm{lhs}(w_i), \mathrm{rhs}(w_i))$. The elements of Trace(D) are called sentential forms.

- The language of G is the set $L(G) = \{w \in \Delta^* : S \Rightarrow^*_G w\}$.

The above notions are illustrated in the following example.

EXAMPLE 2.1. Let $G = (\{A, C, a, b, c, S\}, h, S, \{a, b, c\}, 0^*10^*10^*)$, where h is defined by

$h(A) = \{Aa, a\}$, $h(C) = \{bCc, bc\}$, and $h(S) = AC$.

Table I shows a derivation and its trace. Obviously $L(G) = \{a^n b^n c^n : n > 0\}$.

TABLE I

Derivation                                                        Trace
(S, AC)                                                           S
(A, Aa)(C, bCc)                                                   AC
(A, Aa) $\iota_a \iota_b$ (C, bCc) $\iota_c$                      AabCc
(A, Aa) $\iota_a \iota_a \iota_b \iota_b$ (C, bCc) $\iota_c \iota_c$    AaabbCcc
                                                                  AaaabbbCccc

If $\mathcal{F}$ is a family of EOS systems and $\mathcal{K}$ is a family of pattern selectors, then we denote by $\mathcal{L}(\mathcal{F}, \mathcal{K})$ the family of the languages generated by those ps-grammars (F, K) with $F \in \mathcal{F}$ and $K \in \mathcal{K}$. In particular, $\mathcal{L}(\mathrm{EOS}, K)$ is the family of the languages of all those ps-grammars G with Patt(G) = K. It was shown in [11] that ps-grammars with regular pattern selectors generate all the recursively enumerable languages, modulo a small (fixed) number of marker symbols. On the other hand, $\mathcal{L}(\mathrm{EOS}, 1^*)$ is obviously EOL and $\mathcal{L}(\mathrm{EOS}, 0^*10^*)$ is exactly the family of context-free languages. The EOL pattern selector indicates that all occurrences of symbols in a sentential form have to be rewritten, whereas the CF pattern selector causes a single symbol to be rewritten in each sentential form.

It is now natural to study the families of languages obtainable by rewriting k symbols at each step. We consider two variations: the symbols are required to be adjacent (pattern selector $0^*1^k0^*$), or there is no such restriction (pattern selector $0^*(10^*)^k$). The problem was originally posed to the authors by Rozenberg. It is still an open problem whether these families form a hierarchy for increasing k. The relationship between the adjacent and nonadjacent cases is also unknown.

Obviously, for all $k \ge 1$, every context-free language is in $\mathcal{L}(\mathrm{EPOS}, 0^*1^k0^*)$ and in $\mathcal{L}(\mathrm{EPOS}, 0^*(10^*)^k)$. On the other hand, Example 2.1 shows a ps-grammar of the nonadjacent kind that generates a non-CF language already for k = 2. It is easy to see that the language $\{a_1^n a_2^n \cdots a_{2k}^n : n > 0\}$ can be generated using the pattern selector $0^*(10^*)^k$. Recently it was shown in [3] that the language

{WE (a, b}*: w=u”‘l. b4m for some m, n>O}, where I denotes the shuffle operation, is in Z(EPOS, O*llO*).
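The derivation of Example 2.1 can be replayed mechanically. The following is a minimal sketch, not part of the paper: the dictionary `H` lists only the productions given in the example (the productions for the terminals, which h needs in order to be total, are omitted because they are never applied here), and the helper names `ps_step` and `derive` are hypothetical.

```python
import re

# Productions of Example 2.1 (terminal productions omitted; assumption above).
H = {"A": ["Aa", "a"], "C": ["bCc", "bc"], "S": ["AC"]}
PATTERN = re.compile(r"0*10*10*")   # the pattern selector 0*10*10*

def ps_step(form, choices):
    """Rewrite `form` at the positions in `choices` (position -> right-hand
    side), checking that the induced mask belongs to the pattern selector."""
    mask = "".join("1" if i in choices else "0" for i in range(len(form)))
    if not PATTERN.fullmatch(mask):
        raise ValueError("mask %s is not in 0*10*10*" % mask)
    for i, w in choices.items():
        if w not in H.get(form[i], []):
            raise ValueError("no production (%s, %s)" % (form[i], w))
    return "".join(choices.get(i, ch) for i, ch in enumerate(form))

def derive(n):
    """Derive a^n b^n c^n for n >= 1, following the shape of Table I."""
    form = "AC"                      # the first step rewrites S alone
    for _ in range(n - 1):
        form = ps_step(form, {form.index("A"): "Aa", form.index("C"): "bCc"})
    return ps_step(form, {form.index("A"): "a", form.index("C"): "bc"})

print(derive(3))
```

Rewriting A alone, as in `ps_step("AC", {0: "Aa"})`, is rejected because the mask 10 contains only one 1; this synchronization of A and C is what keeps the three blocks of letters equally long.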

3. OPERATIONS AND BOUNDS ON DERIVATIONS AND PS-DERIVATIONS

In this section we show that applying the Kleene star closure to certain pattern selectors preserves the derivation relation. In particular, this holds for the pattern selectors of the form $0^*1^k0^*$ and $0^*(10^*)^k$. For instance, using masks from $(0^*1^k0^*)^*$ means that any number of blocks of k consecutive symbols are rewritten in parallel. This leads to a bound on shortest derivation lengths of words in the Kleene star closure version.

LEMMA 3.1 (Star closure of K). Let $K = 0^*L0^*$, where $L \subseteq \{0, 1\}^*$. Let $G = (\Sigma, h, S, \Delta, K)$ be a ps-grammar and let $H = (\mathrm{Base}(G), K^*)$. Then G and H define the same derivation relation.

Proof. Since $K \subseteq K^*$, every derivation in G is also a derivation in H. For the converse direction it can easily be seen by induction on i that for every ps-derivation (w) of length 1 in H with $\mathrm{mask}(w) \in K^i$, $i \ge 1$, there is an equivalent ps-derivation of length i, $(u_1, \dots, u_i)$, in G. ∎
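For the special case $L = 1^k$, the converse direction can be pictured as splitting one mask from $(0^*1^k0^*)^*$ into its k-blocks, each of which is then rewritten in a separate step of G. The sketch below is only an illustration under that assumption; the function name `k_blocks` is hypothetical. It returns the start positions of the blocks in the original sentential form; rewriting the blocks from right to left keeps the positions of the remaining blocks valid even when right-hand sides are longer than one symbol.

```python
def k_blocks(mask, k):
    """Split a mask with at least one 1 from (0*1^k0*)* into its k-blocks.

    Returns the start position of every block of k adjacent ones, left to
    right. Raises ValueError if the mask does not have that shape.
    """
    ones = [i for i, c in enumerate(mask) if c == "1"]
    if not ones or len(ones) % k != 0:
        raise ValueError("number of ones is not a positive multiple of k")
    starts = []
    for n in range(0, len(ones), k):
        block = ones[n:n + k]
        # a block must consist of k adjacent positions
        if block[-1] - block[0] != k - 1:
            raise ValueError("ones do not form blocks of k adjacent symbols")
        starts.append(block[0])
    return starts

print(k_blocks("0110110", 2))
```

A run of 2k ones, such as 1111 for k = 2, is treated as two adjacent blocks, matching the concatenation of two masks from $0^*1^k0^*$.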

DEFINITION 3.1. Let D = ( w1 ,..., w,) be a ps-derivation in a ps-grammar G= (C, h, S, A, K). D is stationary if w~E(Z,U {(a, b): a, bEC})* for all 1

LEMMA 3.2 (Pattern interchange). Let G be a ps-grammar and let

$D = (u_1, \dots, u_{m_1 - 1}, u_{m_1}, \dots, u_{m_2}, u_{m_2 + 1}, \dots, u_i)$

be a stationary ps-derivation in G. Then for all $1 \le m_1 < m_2 \le i$ there is an equivalent stationary ps-derivation

$D' = (u_1, \dots, u_{m_1 - 1}, v_{m_1}, \dots, v_{m_2}, u_{m_2 + 1}, \dots, u_i)$

in G, such that: (a) $\mathrm{mask}(v_j) = \mathrm{mask}(u_j)$ for all $m_1 < j < m_2$, (b) $\mathrm{mask}(v_{m_1}) = \mathrm{mask}(u_{m_2})$, and (c) $\mathrm{mask}(v_{m_2}) = \mathrm{mask}(u_{m_1})$.

Proof. Let p be the width of D. We construct $v_{m_1}, \dots, v_{m_2}$ by "shifting" productions in fixed positions from word to word. If there is a production in a specific position in both $u_{m_1}$ and $u_{m_2}$, or in neither of these words, no change is necessary in this position. If there is a production in $u_{m_1}$ and not in $u_{m_2}$, then we shift each production on to the next word in which there was a production in this position. If there is a production in $u_{m_2}$ but not in $u_{m_1}$, we shift the productions backwards. Thus, D' satisfies (a), (b), and (c) above. The formal proof and description of D' are as follows. For all $1 \le j \le p$,

urn,Cjl = llhs(u,,[ j]) if u,,CA E1, and urn,Cjl = a,[A if u,,[ j] E Prod(G), where t=min{n:m,

v,Cjl = trhs(v,ml[j]) if u,[ j] E I,, v,C.A= WI if u,[ j] E Prod(G) and u,,[j], u,,[j] are both either in Prod(G) or in Z,; 0,C.d= u,Cjl if u,[jJEProd(G),u,,[j]EProd(G),andu,,[j]EZ,, where

t= max(n: m, Qn

t = min(n: s < n < m2 and u,[ j] E Prod(G)}. PATTERNSELECTOR GRAMMARSPARSING 257

For all 1

Since the construction changes neither the number of times each symbol in $\mathrm{lhs}(u_{m_1})$ is rewritten nor the sequence of productions that rewrite it, $(v_{m_1}, \dots, v_{m_2})$ is indeed a stationary ps-derivation in G that is equivalent to $(u_{m_1}, \dots, u_{m_2})$. It is now easy to see from the construction of D' that

$\mathrm{mask}(v_{m_1}) = \mathrm{mask}(u_{m_2})$,

$\mathrm{mask}(v_{m_2}) = \mathrm{mask}(u_{m_1})$, and, for all $m_1 < j < m_2$, $\mathrm{mask}(v_j) = \mathrm{mask}(u_j)$.

Hence, the lemma holds. ∎

DEFINITION 3.2. Let G be a ps-grammar and let D be a ps-derivation in G of length i. Let $1 \le m_1 < m_2 \le i$. We call the ps-derivation D' that was constructed in the proof of Lemma 3.2 the $(m_1, m_2)$-interchange of D.

LEMMA 3.3. Let $G = (\Sigma, h, S, \Delta, 0^*1^k0^*)$ be a ps-grammar and let $H = (\Sigma, h, S, \Delta, (0^*1^k0^*)^*)$. Then for every stationary ps-derivation D in G there is an equivalent stationary ps-derivation D' in H such that $|D'| \le (2k - 1)(\#\Sigma)^k$.

Proof. We shall prove the lemma by induction on derivation length. Let D be a stationary ps-derivation in G of width p.

BASIS. If $|D| \le (2k - 1)(\#\Sigma)^k$, we let D' = D. The lemma holds then because D is also a ps-derivation in H.

INDUCTION. We assume that for every stationary ps-derivation $D_1$ in G of length $\le i$, there is an equivalent stationary ps-derivation $D_2$ in H, such that $|D_2| \le (2k - 1)(\#\Sigma)^k$. Let $D = (u_1, \dots, u_{i+1})$ be a stationary ps-derivation in G. By the induction hypothesis, there is a stationary ps-derivation $D_2 = (v_1, \dots, v_{i'})$ in H, where $i' \le (2k - 1)(\#\Sigma)^k$, such that $D_2$ is equivalent to $(u_1, \dots, u_i)$. Let

$u_{i+1} = \iota_{a_1} \cdots \iota_{a_j} (b_1, c_1)(b_2, c_2) \cdots (b_k, c_k)\, \iota_{a_{j+k+1}} \cdots \iota_{a_p}$

be the last derivation step in D. In $u_{i+1}$, the "(j+1)-block" (j + 1, ..., j + k) is rewritten (i.e., masked by $1^k$). There are 2k - 1 different l-blocks ($j - k + 2 \le l \le j + k$) which intersect our fixed (j+1)-block. If in some sentential form in the trace of our derivation the fixed block is masked by $0^k$, then we "push" the last step into it. Otherwise, if for some l, more than $(\#\Sigma)^k$ sentential forms contain the l-block masked by $1^k$, then there is a precise repetition of the l-block. We can now "contract" it (replace $1^k$-masks by $0^k$-masks, in between), thus reducing the number of sentential forms in which this l-block is masked by $1^k$ to less than $(\#\Sigma)^k$. Doing this for all the 2k - 1 blocks indicated above, there is now a sentential form in which the fixed (j+1)-block is masked by $0^k$, into which we can push the last step. We proceed now to the formal construction, in which we distinguish between several cases.

lDzl<(2k-l)(#C)“-1, (3.1) let D’ = (v, ,..., ui,, u ,+ ,). Obviously, D’ is a stationary ps-derivation in H, 1D’l < (2k - 1 )( # E)“, and D’ is equivalent to D. (B) Let lDzl=(2k-l)(#C)k andu,[j+l],...,u,[j+k]EZ, forsomel

Let D, = (w , ,..., wi.) be the (m, i’)-interchange of D,. By the pattern interchange lemma (3.2), D3 is equivalent to Dz, and hence to (u,,..., ui). Let now

D’= (w ,,..., w:- ,, Wis[j+ 1: (bl, cl),-, j+k: (bk, ck)]).

D’ is a stationary ps-derivation in H that is equivalent to D. Moreover, ID’1 = (2k - I)( #,E)k. Hence the theorem holds. (C) Let ID,1 =(2k-1)(#z)k, and let at least one of o,[j+ l]...o,[j+k] be in Pprod( H), for all 1 Q m 6 i. Let 16 I < p -k + 1. We say that a word urn, 1 d m < i’, contains an l-block if the word u,[ 1 ] . . . u,[Z - 1 ] contains 0 mod k occurrences of symbols from Pprod(H), and u,[Z],..., u,[Z + k - 1) E Pprod(H). For any stationary ps-derivation D” in H of width p, let M(D”)={Z:j-k+2~Z~j+kandthereareatleast(#~)kwords in D” containing an Z-block}.

Note that M(D,) # @, due to the condition on i’. We shall construct a stationary ps-derivation D, in H that is equivalent to D2, such that M(D,) is empty. In this case, we have at most ( #C)k - 1 words containing an Z-block, for any given 1 between j- k + 2 and j+ k. But then, for some word u in D,,

u[j+ l],..., u[j+k] are all in Zz. PATTERN SELECTOR GRAMMARS PARSING 259

Let

be the sequence of ps-derivations that is constructed from D, as follows:

DI#M(D~)=Dz. Let us assume that we have already obtained D:,= (xl,..., x,0,,1 for some # M(D,) > n 2 2. We shall now obtain Dl-, . Let I = min M(DA) and let s be the number of words in 0; that contain an Z-block. Without loss of generality we may assume that those words that contain an Z-block are the last s words in Da. (Otherwise, let x,, ,..., x,$ be those words in DA that contain the Z-blocks, such that l

lhs(x,[j+ t]) = rhs(x,.[j+ t]) for all 1 < t Q k and

ID;l-q’+l+q-(ID;I-s)+l<(#C)‘T

Let

for q 6 m < q’. Let (z , ,.,., z,) be the sequence consisting of those words in the sequence ( yy,..., y,,) that are not in Ix*. Let

Ok-, = (x1 ,..., xy- I, zl,..., z,, x,,.+ ,,.. ., x,,;,).

Q-1 is a stationary ps-derivation in ZZ. Note that M(Di _ 1) = M(DA) - 1. Moreover, Di- I is equivalent to DA and thus also to D,. Db is thus a stationary ps- derivation in H that is equivalent to D2, and M(Db) = 0. But now either (3.1) or (3.2) holds for 0;. We return thus to case (A) or (B), respectively. 1

THEOREM 3.1. Let G be a ps-grammar with $\mathrm{Patt}(G) = 0^*1^k0^*$. For every word $w \in L(G)$ there is a derivation of w in the ps-grammar $(\mathrm{Base}(G), (0^*1^k0^*)^*)$ of length $\le |w|(2k - 1)(\#\Sigma)^k$.

Proof. Immediate by Lemma 3.3. ∎

LEMMA 3.4. Let $G = (\Sigma, h, S, \Delta, 0^*(10^*)^k)$ be a ps-grammar and let $H = (\Sigma, h, S, \Delta, (0^*(10^*)^k)^*)$. Then for every stationary ps-derivation D in G there is an equivalent stationary ps-derivation D' in H such that $|D'| \le k^{2k}(\#\Sigma)^{k(k+1)/2}$.

Proof outline. Again, as in Lemma 3.3, if there are two masks in the original derivation which do not overlap, then we can combine them into one mask in the star version of the pattern selector. If a particular mask is used more than $(\#\Sigma)^k$ times in the derivation, then we can make those steps, at which the mask is used, consecutive (using the pattern interchange lemma). Finally, we eliminate duplicates. In Lemma 3.3, there were only 2k - 1 possibilities of overlaps of different masks. Here, the number of overlaps is a function of the derivation width.

The way we overcome this difficulty can easily be seen in the case of k = 2. If there are $2\#\Sigma$ subsequent masks that overlap in one position and that are all different in the other position, we can find a symbol A that occurs at least 3 times in that position. In particular, there is one such subderivation of even length l in which the symbol A derives itself. We replace this subderivation by l/2 derivation steps which do not rewrite A, and which combine all the now "single" positions into pairs.

This construction can be generalized to arbitrary values of k, by reverse induction on the length of overlaps. Let us consider, for each s, $k \ge s \ge 1$, overlaps of size s. We claim that we can eliminate duplicates in overlaps until each overlap of size s occurs at most $r_s = k^{2s}(\#\Sigma)^{s(s+1)/2}$ times. This is true for s = k, as stated before. Let the induction hypothesis be true for all $k \ge t \ge s + 1$. If it is also true for s, the claim holds. Otherwise, we may assume that there is a set of s positions that are covered by more than $r_s$ occurrences of masks. These occurrences may be assumed consecutive in view of the pattern interchange lemma.
Since rs = (k2 # Z’) r, + 1, it follows from the induction hypothesis that there are more than k2 # Z” occurrences of masks which overlap in s positions but differ from each other in the remaining positions. Again we may assume these masks to be used at consecutive steps. This derivation contains at least k + 1 duplicates of the symbols in the s overlapping positions. We may thus pick a subderivation of length c x k, for some c > 0, which acts like the identity rewrite on the overlap positions, and where all the other positions are distinct from mask to mask. The overlap positions may then be replaced by identity, and the other positions may be rearranged to yield a derivation of length c(k - s), which is of length

THEOREM 3.2. Let G be a ps-grammar with $\mathrm{Patt}(G) = 0^*(10^*)^k$. Then for every word w in L(G) there is an equivalent derivation of w in the ps-grammar $(\mathrm{Base}(G), (0^*(10^*)^k)^*)$ of length $\le |w|\, k^{2k}(\#\Sigma)^{k(k+1)/2}$.

Proof. Immediate from Lemma 3.4. ∎
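To get a feeling for the size of these bounds, they can be evaluated for concrete parameters. The numbers below are chosen only for illustration: k = 2, an alphabet of $\#\Sigma = 6$ symbols (the size of the alphabet in Example 2.1), and a word of length $|w| = 9$.

```latex
% Theorem 3.1 (adjacent case, pattern selector 0^*1^k0^*):
|w|\,(2k-1)\,(\#\Sigma)^{k} \;=\; 9 \cdot 3 \cdot 6^{2} \;=\; 972 .

% Theorem 3.2 (nonadjacent case, pattern selector 0^*(10^*)^k):
|w|\,k^{2k}\,(\#\Sigma)^{k(k+1)/2} \;=\; 9 \cdot 2^{4} \cdot 6^{3} \;=\; 31104 .
```

For fixed k and a fixed grammar both expressions grow linearly in $|w|$, which is the property the parsing results of Section 4 rely on.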

4. TIME COMPLEXITY

In Section 3 we have shown that in propagating ps-grammars with the pattern selector $0^*1^k0^*$ or $0^*(10^*)^k$, we can impose a limit on the derivation length of words in the language. In this section, we shall first present a necessary and sufficient condition that helps us determine, for a given EPOS system F, if a given "context-free" style derivation (which is, in essence, a derivation in $(F, \{0, 1\}^*)$) is equivalent to a derivation in $(F, 0^*110^*)$. This condition will allow us to modify algorithms that recognize context-free languages, yielding algorithms recognizing languages in $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$. A similar method will be pointed out that allows the parsing of languages in $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$. All these algorithms are shown to run in polynomial time. Finally, we shall see that $\mathcal{L}(\mathrm{EPOS}, 0^*1^k0^*)$ and $\mathcal{L}(\mathrm{EPOS}, 0^*(10^*)^k)$ are in NP.

For the purpose of "filtering out" derivations conforming to the pattern selector $0^*110^*$, we associate with each derivation D of the "free" pattern $\{0, 1\}^*$ a set of pairs of integers, as follows. The meaning of $(i, j) \in \mathrm{Valse}(D)$ is that one can mask the derivation D according to the pattern $(0^*110^*)^*$, except for i single productions on the leftmost "branch" of D and for j single productions on the rightmost branch of D. The definition will use induction on the structure of derivations, and the parsing algorithm will keep track of this information.

DEFINITION 4.1. The valence set of a derivation D in the ps-grammar $G = (\Sigma, h, S, \Delta, \{0, 1\}^*)$, Valse(D), is a set of pairs of non-negative integers and is defined as follows.

(a) If D is an empty derivation, then $\mathrm{Valse}(D) = \{(i, i) : i \ge 0\}$.

(b) If D consists of a single word with a single symbol that is a production, then $\mathrm{Valse}(D) = \{(1, 0), (0, 1)\}$.

(c) If D consists of a single word with a single symbol that is in $I_\Sigma$, then $\mathrm{Valse}(D) = \{(0, 0)\}$.

(d) If there are derivations $D_1 = (u_1, \dots, u_n)$ and $D_2 = (v_1, \dots, v_n)$, such that $D = (u_1 v_1, \dots, u_n v_n)$, then $\mathrm{Valse}(D) = \{(i, k) : (i, j) \in \mathrm{Valse}(D_1) \text{ and } (j, k) \in \mathrm{Valse}(D_2) \text{ for some } j \ge 0\}$.

(e) If D is of the form $(u_1, \dots, u_k, v_1, \dots, v_l)$, then $\mathrm{Valse}(D) = \{(i + i', j + j') : (i, j) \in \mathrm{Valse}((u_1, \dots, u_k)) \text{ and } (i', j') \in \mathrm{Valse}((v_1, \dots, v_l))\}$. ∎

In particular, valence sets characterize ps-derivations of the pattern selector $(0^*110^*)^*$.

LEMMA 4.1. Let $G = (\Sigma, h, S, \Delta, (0^*110^*)^*)$ be a ps-grammar and let D be a ps-derivation in the ps-grammar $H = (\Sigma, h, S, \Delta, \{0, 1\}^*)$. Then D is a ps-derivation in G if and only if (0, 0) is in Valse(D).

Proof. Let D be a ps-derivation in G. We shall see by induction on the length of D that $(0, 0) \in \mathrm{Valse}(D)$.

BASIS. Let D = (w) be a ps-derivation of length 1.
(i) If $w \in I_\Sigma$, then, obviously, $\mathrm{Valse}(D) = \{(0, 0)\}$.
(ii) If $w = \pi\pi'$, where $\pi, \pi' \in \mathrm{Prod}(H)$, let $D_1 = (\pi)$ and $D_2 = (\pi')$. Since $(0, 1) \in \mathrm{Valse}(D_1)$ and $(1, 0) \in \mathrm{Valse}(D_2)$, it follows that $(0, 0) \in \mathrm{Valse}(D)$.
(iii) Otherwise, w is of the form $w_1 \cdots w_n$, where each $w_i$ is either in $I_\Sigma$ or in $\mathrm{Prod}(H)\,\mathrm{Prod}(H)$. Therefore $(0, 0) \in \mathrm{Valse}(D)$.

INDUCTION. Let $k \ge 1$. We assume that if D' is a ps-derivation in G of length i, $1 \le i \le k$, then $(0, 0) \in \mathrm{Valse}(D')$. Let $D = (w_1, \dots, w_{k+1})$. It follows from the induction hypothesis that $(0, 0) \in \mathrm{Valse}((w_1, \dots, w_k))$ and $(0, 0) \in \mathrm{Valse}((w_{k+1}))$. Thus, $(0, 0) \in \mathrm{Valse}(D)$.

We shall now see the converse direction. Let D be a ps-derivation in H with $(0, 0) \in \mathrm{Valse}(D)$. We shall prove by induction that D is also a ps-derivation in G.

BASIS. Let D = (w) be of length 1. Let $w_1, \dots, w_n$ be words such that $w = w_1 \cdots w_n$, $(0, 0) \in \mathrm{Valse}((w_i))$ for $1 \le i \le n$, and such that there is no proper subword w' of any $w_i$, $1 \le i \le n$, with $(0, 0) \in \mathrm{Valse}((w'))$. For all $1 \le i \le n$, $w_i$ is either a symbol in $I_\Sigma$ or a word in $\mathrm{Prod}(H)^*$. It remains thus to show that if $w_i \in \mathrm{Prod}(H)^*$ then it is of even length. Let $w_i = \pi_1 \cdots \pi_m$. Since $\mathrm{Valse}((\pi_j)) = \{(1, 0), (0, 1)\}$ for all $1 \le j \le m$, it follows that

$\mathrm{Valse}((w_i)) = \{(1, 0), (0, 1)\}$ if $|w_i|$ is odd, and $\mathrm{Valse}((w_i)) = \{(0, 0), (1, 1)\}$ otherwise.

Hence, $|w_i|$ must be even.

INDUCTION. We assume that, if D' is a ps-derivation in H, $|D'| \le k$ and $(0, 0) \in \mathrm{Valse}(D')$, then D' is also a ps-derivation in G. Let $D = (w_1, \dots, w_{k+1})$. By the definition of a valence set,

$(0, 0) \in \mathrm{Valse}((w_1, \dots, w_k)) \cap \mathrm{Valse}((w_{k+1}))$.

Thus, $(w_1, \dots, w_k)$ and $(w_{k+1})$ are ps-derivations in G. It follows that $(w_1, \dots, w_{k+1})$ is a ps-derivation in G, because we know that it is a ps-derivation in H. ∎
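For a single rewriting step, the valence set depends only on the mask of the description word, so Lemma 4.1 can be checked experimentally. The sketch below is an illustration, not the authors' procedure: it builds the valence set from cases (b), (c), and (d) of Definition 4.1 by composing the one-symbol sets from left to right, and compares the result with the direct test that every maximal run of ones has even length. Treating an all-zero mask as acceptable, as the proof does in case (i), is an assumption of the sketch; the function names are hypothetical.

```python
PROD = {(1, 0), (0, 1)}    # case (b): a single production
IDENT = {(0, 0)}           # case (c): a single identity marker

def compose(v1, v2):
    """Case (d): valence set of two derivations placed side by side."""
    return {(i, k) for (i, j) in v1 for (j2, k) in v2 if j == j2}

def valence_of_mask(mask):
    """Valence set of a one-step derivation whose description word has the
    given non-empty mask (1 = production, 0 = identity marker)."""
    sets = [PROD if c == "1" else IDENT for c in mask]
    result = sets[0]
    for s in sets[1:]:
        result = compose(result, s)
    return result

def runs_even(mask):
    """Direct test: every maximal run of ones has even length."""
    return all(len(run) % 2 == 0 for run in mask.split("0"))

for m in ("0110", "1111", "0100", "101", "11011"):
    print(m, (0, 0) in valence_of_mask(m), runs_even(m))
```

A run of an even number of productions yields the set {(0, 0), (1, 1)} and a run of an odd number yields {(1, 0), (0, 1)}, in agreement with the case distinction in the proof above.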

LEMMA 4.2. Let H = (Z, h, S, A, (0, 1 } * ) be a ps-grammar and let D= (w,,..., w,) be a derivation in H. Then (a) Valse((u, We,..., w,))=Valse(D), where u is the word in I,* with Ihs(u) = Ihs(w,) and (b) for aN 1

The Valse information was used by [3] to construct a non-EOL language in $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$. We shall now show how to use valences in the recognition and parsing problem for G. We want to determine whether the word $w = a_1 \cdots a_n$ is in L(G) for a propagating ps-grammar G with $\mathrm{Patt}(G) = 0^*110^*$. Let H be the ps-grammar $(\Sigma, h, S, \Delta, \{0, 1\}^*)$. We shall use a Cocke-Younger-Kasami [21] style algorithm, working on a 4-dimensional matrix $A = [A_{i,j,k,l}]$, where $1 \le i, j \le n$ and $0 \le k, l \le 3n(\#\Sigma)^2$. Each element in this matrix will contain symbols from $\Sigma$, such that $b \in A_{i,j,k,l}$ if and only if there is a derivation D in H of $a_i \cdots a_{i+j-1}$ from b with $(k, l) \in \mathrm{Valse}(D)$. By the star closure lemma (3.1), G is equivalent to $(\mathrm{Base}(G), \mathrm{Patt}(G)^*)$. It follows thus from Lemma 4.1 that $w \in L(G)$ if and only if

$S \in A_{1,n,0,1}$.

ALGORITHM 4.1.
Input: A ps-grammar $G = (\Sigma, h, S, \Delta, 0^*110^*)$ and a word $w = a_1 \cdots a_n \in \Delta^*$.
Output: "YES" if $w \in L(G)$, otherwise "NO".


begin
  for i := 1 to n do A_{i,1,0,0} := a_i;
  for j := 1 to n do for i := n to 1 do
    for k := 0 to 3n(#Σ)^2 do for l := 0 to 3n(#Σ)^2 do
      for s := 0 to 1 do
      begin
        p_0 := k - s; p_m := l - 1 + s;
        if p_0 ≥ 0 and p_m ≥ 0 then
        begin
          for all (b, c_1 ... c_m) ∈ Prod(G) do
            for all 1 ≤ t_1, ..., t_m

LEMMA 4.3. Let $H = (\Sigma, h, S, \Delta, (0^*110^*)^*)$ be a ps-grammar and let $w = a_1 \cdots a_n \in \Delta^*$. Let $A = [A_{i,j,k,l}]$ be the matrix computed by Algorithm 4.1. Then

$b \in A_{i,j,k,l}$ if and only if there is a derivation $D$ of $a_i \cdots a_{i+j-1}$ from $b$ in $H$ with $(k,l) \in \mathrm{Valse}(D)$.

Proof. Let $D$ be a derivation in $H$ of $a_i \cdots a_{i+j-1}$ from $b$ with $(k,l) \in \mathrm{Valse}(D)$, such that

O

We shall see by induction on the length of $D$ that $b \in A_{i,j,k,l}$.

BASIS. Let $|D| = 1$. If $D = (w)$ for some $w \in I_\Sigma^*$ then $w$ must be $\iota_{a_i}$. Since $\mathrm{Valse}((\iota_{a_i})) = \{(0,0)\}$ and $a_i \in A_{i,j,0,0}$, the induction hypothesis holds for $b = a_i$ and $|D| = 1$. Otherwise, $w = (b, a_i \cdots a_{i+j-1})$. Since $w \in \mathrm{Prod}(H)$, $\mathrm{Valse}((w)) = \{(0,1), (1,0)\}$. It is easy to see from the algorithm that $b \in A_{i,j,0,1}$ and $b \in A_{i,j,1,0}$. Hence, the induction hypothesis holds for $|D| = 1$.

INDUCTION. Let $D$ be a derivation of $a_i \cdots a_{i+j-1}$ from $b$ that is of the form

$$((b, c_1 \cdots c_m),\ w_{2,1} \cdots w_{2,m},\ \ldots,\ w_{|D|,1} \cdots w_{|D|,m}),$$

where $D_r = (w_{2,r}, \ldots, w_{|D|,r})$ is a derivation of

$$a_{i+t_1+\cdots+t_{r-1}} \cdots a_{i+t_1+\cdots+t_r-1}$$

from $c_r$, for all $1 \le r \le m$. Let

$$p_0, \ldots, p_m \in \{0, \ldots, 3n(\#\Sigma)^2\}$$

be integers such that $(p_{r-1}, p_r) \in \mathrm{Valse}(D_r)$ for $1 \le r \le m$, and $(p_0, p_m) = (k-1, l)$ or $(p_0, p_m) = (k, l-1)$. This choice is possible by the definition of a valence set and by Lemma 4.1. By the induction hypothesis, $c_r \in A_{i+t_1+\cdots+t_{r-1},\, t_r,\, p_{r-1},\, p_r}$ for all $1 \le r \le m$. By the definition of a valence set,

$\mathrm{Valse}(D) = \{(p_0 + s,\ p_m + 1 - s)$: there are $p_1, \ldots, p_{m-1}$, such that $(p_{r-1}, p_r) \in \mathrm{Valse}(D_r)$ for $1 \le r$

Theorem 3.1 allows us to choose $p_1, \ldots, p_m$ from the set $\{0, \ldots, 3n(\#\Sigma)^2\}$ only. It is now easy to see that $b$ is added by the algorithm to $A_{i,j,k,l}$. Hence the induction hypothesis holds.

We shall now see the other direction of the lemma, proving by double induction on $j$ and $k, l$ that if $b \in A_{i,j,k,l}$ then there is a derivation $D$ of $a_i \cdots a_{i+j-1}$ from $b$ in $H$ with $(k,l) \in \mathrm{Valse}(D)$.

BASIS. Let $j = 1$ and $k, l = 0$. Then $b = a_i$. Since $H$ is propagating, it follows that $D$ must be of the form $(\iota_{a_i}, \ldots, \iota_{a_i})$. By the definition of a valence set, $\mathrm{Valse}(D) = \{(0,0)\}$.

Let $j = 1$ and let either $k \ge 1$ or $l \ge 1$ or both. Let $b \in A_{i,1,k,l}$. Since $H$ is propagating, the algorithm can only add symbols to $A_{i,1,k,l}$ using chain productions. Thus, there must be a symbol $c$ such that $(b, c) \in \mathrm{Prod}(H)$ and $c \in A_{i,1,k-1,l}$ or $c \in A_{i,1,k,l-1}$. From the induction hypothesis it follows now that there is a derivation $D_1 = (w_1, \ldots, w_{|D_1|})$ of $a_i$ from $c$, such that either $(k, l-1)$ or $(k-1, l)$ is in $\mathrm{Valse}(D_1)$. Let

$$D = ((b, c), w_1, \ldots, w_{|D_1|}).$$

By the definition of a valence set, $(k,l) \in \mathrm{Valse}(D)$. Moreover, $D$ is a derivation of $a_i$ from $b$. Thus, the induction hypothesis holds for $j = 1$ and for all $k, l$.

INDUCTION. We assume that, for all $1 \le j' \le j$ and for all $k, l$, there is a derivation $D$ of $a_i \cdots a_{i+j'-1}$ from $b$ with $(k,l) \in \mathrm{Valse}(D)$ if $b \in A_{i,j',k,l}$. Note that $A_{i,j+1,0,0} = \emptyset$ for all $1 \le j \le n-1$. Since there are no derivations $D$ of a word of length $\ge 2$ from a single symbol such that $(0,0) \in \mathrm{Valse}(D)$, the induction hypothesis holds for $j+1$ and $(k,l) = (0,0)$. Let $k \ge 1$ or $l \ge 1$ or both and let $b \in A_{i,j+1,k,l}$. Let $(q_0, q_m) = (k, l-1)$ or $(q_0, q_m) = (k-1, l)$. Then there is a production

$$(b, c_1 \cdots c_m) \in \mathrm{Prod}(H)$$

and there are

$0 < t_1, \ldots, t_m$

and

$0 \le p_0, \ldots, p_m \le 3n(\#\Sigma)^2$, such that

$$t_1 + \cdots + t_m = j$$

and

$$c_r \in A_{i+t_1+\cdots+t_{r-1},\, t_r,\, p_{r-1},\, p_r}$$

for all $1 \le r \le m$. We distinguish between two cases. If $m > 1$, then $t_1, \ldots, t_m < j$. By the induction hypothesis (on $j$), there are thus derivations $D_1, \ldots, D_m$, such that

$D_r = (w_{1,r}, \ldots, w_{|D_r|,r})$ is a derivation of

$$a_{i+t_1+\cdots+t_{r-1}} \cdots a_{i+t_1+\cdots+t_r-1}$$

from $c_r$ and $(p_{r-1}, p_r) \in \mathrm{Valse}(D_r)$, for all $1 \le r \le m$. By Lemma 4.2, we may assume that $|D_1| = \cdots = |D_m| = q$. (Otherwise we "stretch" all the derivations to the length of the longest derivation by inserting words in $I_\Sigma^*$.) Let

$$D = ((b, c_1 \cdots c_m),\ w_{1,1} \cdots w_{1,m},\ \ldots,\ w_{q,1} \cdots w_{q,m}).$$

$D$ is a derivation of $a_i \cdots a_{i+j-1}$ from $b$. Moreover, by the definition of a valence set, $(k,l) \in \mathrm{Valse}(D)$.

If $m = 1$, then $t_1 = j + 1$. By the induction hypothesis (on $k, l$) there is a derivation

$D_1 = (w_1, \ldots, w_{|D_1|})$ of $a_i \cdots a_{i+j-1}$ from $c_1$ with $(q_0, q_m) \in \mathrm{Valse}(D_1)$. Let

$$D = ((b, c_1), w_1, \ldots, w_{|D_1|}).$$

$D$ is a derivation of $a_i \cdots a_{i+j-1}$ from $b$. Moreover, by the definition of a valence set, $(k,l) \in \mathrm{Valse}(D)$. This completes the proof of the induction hypothesis. Thus, the lemma holds. ∎
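The matrix characterised by Lemma 4.3 can be approximated by a short program. The following is a minimal sketch, not Algorithm 4.1 itself: it assumes the valence recurrence stated in the proof above (a production with right-hand side $c_1 \cdots c_m$ chains the pairs $(p_{r-1}, p_r)$ of its sub-derivations and then adds one to either the left or the right value), represents productions as hypothetical `(lhs, rhs)` string pairs, uses 0-based positions, and takes the valence bound as a parameter instead of computing $3n(\#\Sigma)^2$. The function names are illustrative.

```python
from collections import defaultdict


def chain(rhs, i, j, lookup):
    """Return every (p_0, p_m) obtained by splitting a span of length j
    starting at position i among the symbols of rhs and chaining valences."""
    m = len(rhs)
    partial = {(0, None, None)}  # (letters used, p_0, right valence so far)
    for r, sym in enumerate(rhs):
        symbols_left = m - r - 1
        nxt = set()
        for used, p0, last in partial:
            # each remaining symbol needs at least one letter (propagating)
            for t in range(1, j - used - symbols_left + 1):
                if symbols_left == 0 and used + t != j:
                    continue
                for left, right in lookup(i + used, t, sym):
                    if last is None or left == last:
                        nxt.add((used + t, left if p0 is None else p0, right))
        partial = nxt
    return {(p0, last) for _, p0, last in partial if p0 is not None}


def valence_table(word, prods, bound):
    """cell[(i, j)][b] = set of (k, l) with k, l <= bound for symbol b."""
    n = len(word)
    cell = {}
    for j in range(1, n + 1):
        for i in range(n - j + 1):
            vals = defaultdict(set)
            if j == 1:
                vals[word[i]].add((0, 0))

            def lookup(pos, length, sym, vals=vals, j=j):
                # chain productions read the cell that is still being filled
                source = vals if length == j else cell[(pos, length)]
                return set(source.get(sym, ()))

            changed = True
            while changed:  # iterate to a fixpoint because of chain productions
                changed = False
                for lhs, rhs in prods:
                    for p0, pm in chain(rhs, i, j, lookup):
                        for s in (0, 1):
                            pair = (p0 + s, pm + 1 - s)
                            if max(pair) <= bound and pair not in vals[lhs]:
                                vals[lhs].add(pair)
                                changed = True
            cell[(i, j)] = vals
    return cell


def accepts(word, prods, start, bound):
    """Final test: the start symbol spans the whole word with valence (0, 1)."""
    top = valence_table(word, prods, bound)[(0, len(word))]
    return (0, 1) in top.get(start, ())
```

For example, with the toy productions `[("S", "AB"), ("A", "a"), ("B", "b")]` the word `"ab"` is accepted, since `A` and `B` can be rewritten together in one step; replacing `("B", "b")` by `("B", "C"), ("C", "b")` leaves `C` without a partner and the word is rejected.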

THEOREM 4.1. Languages from $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$ admit a parsing algorithm of time complexity $O(n^{2+2\mathrm{Maxr}(G)})$.

Proof. Let $G = (\Sigma, h, S, \Delta, 0^*110^*)$. By the star closure lemma (3.1), $G$ is equivalent to

$$H = (\Sigma, h, S, \Delta, (0^*110^*)^*).$$

Every derivation $D$ of a terminal word $a_1 \cdots a_n$ from $S$ in $H$ (and also in $G$) must be of the form $((S, x), w_1, \ldots, w_m)$, where $D_1 = (w_1, \ldots, w_m)$ is a ps-derivation in $H$. Thus, $(0,1) \in \mathrm{Valse}(D)$ if and only if $(0,0) \in \mathrm{Valse}(D_1)$. By Lemmas 4.1 and 4.3, Algorithm 4.1 thus recognizes $L(H)$ and hence $L(G)$.

In Algorithm 4.1, there are 4 nested loops (on $i, j, k, l$). The loops on $t_1, \ldots, t_m$ and $p_1, \ldots, p_{m-1}$ cause repetition of the inner statements (at most) $O(n^{2(m-1)})$ times. Since $m$ is bounded by $\mathrm{Maxr}(G)$, it follows that the innermost statement of the algorithm is executed at most $O(n^{4+2(\mathrm{Maxr}(G)-1)}) = O(n^{2+2\mathrm{Maxr}(G)})$ times. ∎
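The exponent arithmetic in this proof can be checked with a small script. The sketch below counts the ways to split a span of length $j$ into $m$ positive parts by brute force and compares the count with the closed form $\binom{j-1}{m-1} \le j^{m-1}$, then evaluates the exponents for a given value of $\mathrm{Maxr}(G)$. The function names are illustrative, and the second exponent anticipates the analogous count for the pattern $0^*10^*10^*$, where three rather than two groups of $m-1$ variables are free.

```python
from itertools import product
from math import comb


def count_splits(j, m):
    """Number of tuples (t_1, ..., t_m), each >= 1, with t_1 + ... + t_m = j."""
    return sum(1 for ts in product(range(1, j + 1), repeat=m) if sum(ts) == j)


def cyk_exponent_adjacent(maxr):
    """Exponent for 0*110*: 4 outer loops plus 2(m - 1) inner variables."""
    return 4 + 2 * (maxr - 1)


def cyk_exponent_scattered(maxr):
    """Exponent for 0*10*10*: 4 outer loops plus 3(m - 1) inner variables."""
    return 4 + 3 * (maxr - 1)


if __name__ == "__main__":
    print(count_splits(5, 3), comb(4, 2))
    print(cyk_exponent_adjacent(2), cyk_exponent_scattered(2))
```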

Remark. Algorithm 4.1 can easily be extended to parse $\mathcal{L}(\mathrm{EPOS}, 0^*110^* \cup 10^*1)$. These languages are generated by "cyclic" pairwise rewriting. Using similar arguments as above, it can be seen that a ps-derivation $D$ of the pattern selector $\{0,1\}^*$ is a ps-derivation of the pattern selector $0^*110^* \cup 10^*1$ if and only if $(i,i) \in \mathrm{Valse}(D)$ for some $0 \le i \le 3n(\#\Sigma)^2$. We have thus just to modify the final test in Algorithm 4.1 and, therefore, the time complexity remains unchanged. ∎

Algorithm 4.1 can be adapted to the case of $\mathrm{Patt}(G) = 0^*10^*10^*$ in a straightforward manner. Let Val2 be defined like Valse, except that in case (d), for ps-derivations $D_1 = (u_1, \ldots, u_m)$ and $D_2 = (w_1, \ldots, w_m)$,

$$\mathrm{Val2}((u_1 w_1, \ldots, u_m w_m)) = \{(i_1 + i_2,\ j_1 + j_2) : (i_k, j_k) \in \mathrm{Val2}(D_k) \text{ for } k \in \{1, 2\}\}.$$

It can be shown, in a similar manner as in Lemma 4.1, that $D$ is a ps-derivation in $G$ if and only if $D$ is a ps-derivation in $(\mathrm{Base}(G), \{0,1\}^*)$ and $(i,i) \in \mathrm{Val2}(D)$ for some $0 \le i \le 16(\#\Sigma)^3$. Thus, $(i,j) \in \mathrm{Val2}(D)$ means that we can "pair" the productions in the words of $D$ (i.e., following the mask $(0^*10^*10^*)^*$), leaving $i$ leftmost and $j$ rightmost single productions. The following algorithm recognizes $L(G)$.

ALGORITHM 4.2.
Input: A ps-grammar $G = (\Sigma, h, S, \Delta, 0^*10^*10^*)$ and a word $w = a_1 \cdots a_n \in \Delta^*$.
Output: "YES" if $w \in L(G)$, otherwise "NO".

begin
  for i := 1 to n do A_{i,1,0,0} := a_i;
  for j := 1 to n do for i := n to 1 do
    for k := 0 to 16n(#Σ)^3 do for l := 0 to 16n(#Σ)^3 do
      for s := 0 to 1 do
      begin
        p_1 := k - s; p_{2m} := l - 1 + s;
        if p_1 ≥ 0 and p_{2m} ≥ 0 then
        begin
          for all (b, c_1 ... c_m) ∈ Prod(G) do
            for all 1 ≤ t_1, ..., t_m ≤ j with t_1 + ... + t_m = j do
              for all 0 ≤ p_2, ..., p_{2m-1} ≤ 16n(#Σ)^3 with Σ_{r=1}^{m} p_{2r} = Σ_{r=1}^{m} p_{2r-1} do
              begin
                prodfound := true;
                for r := 1 to m do
                  if c_r ∉ A_{i+t_1+...+t_{r-1}, t_r, p_{2r-1}, p_{2r}} then prodfound := false;
                if prodfound then A_{i,j,k,l} := A_{i,j,k,l} ∪ b;
              end;
        end;
      end;
  flag := "NO";
  for k := 1 to 16n(#Σ)^3 do
    if S ∈ A_{1,n,k-1,k} ∪ A_{1,n,k,k-1} then flag := "YES";
  output(flag);
end;
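The difference between Val2 and Valse lies only in how side-by-side sub-derivations are combined. The following is a minimal sketch of the sum rule from the definition of Val2 together with the balance test $(i,i) \in \mathrm{Val2}(D)$. It assumes that a single production has the same valence set $\{(1,0), (0,1)\}$ as under Valse, which the text implies by defining Val2 "like Valse" apart from case (d); the function names are illustrative.

```python
# Assumed valence set of one production (same base case as for Valse).
SINGLE = {(1, 0), (0, 1)}


def val2_concat(v1, v2):
    """Case (d) for Val2: left and right counts of the two parts are added."""
    return {(i1 + i2, j1 + j2) for (i1, j1) in v1 for (i2, j2) in v2}


def is_balanced(valences):
    """True if some pair (i, i) is present, i.e. every single can be paired."""
    return any(i == j for (i, j) in valences)


if __name__ == "__main__":
    two = val2_concat(SINGLE, SINGLE)
    three = val2_concat(two, SINGLE)
    print(sorted(two), is_balanced(two))
    print(sorted(three), is_balanced(three))
```

With two productions the set contains $(1,1)$, so the pair can be matched; with three productions every pair sums to an odd number and no balanced pair exists.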

THEOREM 4.2. Languages from $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$ admit a parsing algorithm of time complexity $O(n^{1+3\mathrm{Maxr}(G)})$.

Proof. By similar arguments as in Theorem 4.1, it can easily be seen that Algorithm 4.2 does actually recognize the language $L(G)$. The extra $\mathrm{Maxr}(G) - 1$ in the exponent stems from the fact that now we have the variables $q_1, \ldots, q_{2m}$ instead of the variables $q_0, \ldots, q_m$ to take care of the valence values. We get thus $O(n^{4+3(\mathrm{Maxr}(G)-1)}) = O(n^{1+3\mathrm{Maxr}(G)})$ steps. ∎

There are also Earley style algorithms for the families $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$ and $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$. We shall only present the algorithm for the family $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$. The language family $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$ can be parsed similarly. Let $a_1 \cdots a_n$ be the given word. The algorithm operates on two vectors $A = [A_0, \ldots, A_n]$ and $B = [B_0, \ldots, B_n]$ of sets of "dotted items." The meaning of a dotted item $[a, u \cdot v, i, k, l]$ to be in $A_j \cup B_j$ is the following. There is a derivation $D_1$ of a sentential form $wax$ from $S$, a derivation $D_2$ of $a_1 \cdots a_{i-1}$ from $w$ and a derivation $D$ of $a_i \cdots a_{i+j-1}$ from $u$, such that $(0, m) \in \mathrm{Valse}(D_2)$ for some $m$ and $(k,l) \in \mathrm{Valse}(D)$. The difference between sets $A$ and $B$ is for marking purposes.

ALGORITHM 4.3.
Input: A ps-grammar $G = (\Sigma, h, S, \Delta, 0^*110^*)$ and a word $a_1 \cdots a_n \in \Delta^*$.
Output: "YES" if $a_1 \cdots a_n \in L(G)$ and "NO" otherwise.

begin
  for all (S, w) ∈ Prod(G) do B_0 := B_0 ∪ [S, ·w, 0, 0, 0];
  for j := 1 to n do
  begin
    for all [b, u·cv, i, k, l] ∈ B_{j-1} do
      if c = a_j then A_j := A_j ∪ [b, uc·v, i, k, l];
    while A_j ≠ ∅ do
    begin
      for all [c, u·, i, k, l] ∈ A_j do
      begin
        A_j := A_j - [c, u·, i, k, l]; B_j := B_j ∪ [c, u·, i, k, l];
        for all [b, w·cx, i', m, k+1] ∈ B_i do
          if [b, wc·x, i', m, l] ∉ B_j then A_j := A_j ∪ [b, wc·x, i', m, l];
        for all [b, w·cx, i', m, k] ∈ B_i do
          if [b, wc·x, i', m, l+1] ∉ B_j then A_j := A_j ∪ [b, wc·x, i', m, l+1];
      end;
      for all [b, u·cv, i, k, l] ∈ A_j do
      begin
        A_j := A_j - [b, u·cv, i, k, l]; B_j := B_j ∪ [b, u·cv, i, k, l];
        for all (c, w) ∈ Prod(G) do
          if [c, ·w, j, 0, 0] ∉ B_j then A_j := A_j ∪ [c, ·w, j, 0, 0];
      end;
    end;
  end;
  if [S, x·, 0, 0, 0] ∈ B_n for some x then output("YES") else output("NO");
end;

Algorithm 4.3 and its analog for the $0^*10^*10^*$ case yield the following time bounds for parsing.

THEOREM 4.3. Languages from $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$ and $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$ admit parsing algorithms of time complexity $O(n^6)$ and $O(n^7)$, respectively.

Remark. Note that the time complexity results of the Earley style algorithms (4.3) are identical to those of the Cocke-Kasami-Younger style algorithms (4.1 and 4.2) whenever $\mathrm{Maxr}(G) = 2$.

Unfortunately, these methods do not generalize to $\mathcal{L}(\mathrm{EPOS}, 0^*1^k0^*)$ or $\mathcal{L}(\mathrm{EPOS}, 0^*(10^*)^k)$. Whenever we want to join two sub-derivations in the $0^*110^*$ case, we have only to take care that the number of rightmost "single" symbols in the left sub-derivation and the number of leftmost such symbols in the right sub-derivation are the same. In the case of $0^*1110^*$, there are two ways to "split" a mask in two; either we rewrite one symbol in the left sub-derivation and two in the right sub-derivation, or vice versa. Now the order becomes important, because one cannot join two sub-derivations if the order of the "partial" rewritings is not the same. In [6], scheduling theory techniques were used to present deterministic polynomial parsing algorithms for languages from $\mathcal{L}(\mathrm{EPOS}, 0^*(10^*)^k)$. We get the following upper bounds.

THEOREM 4.4. The languages in $\mathcal{L}(\mathrm{EPOS}, 0^*1^k0^*)$ and $\mathcal{L}(\mathrm{EPOS}, 0^*(10^*)^k)$ are parsable in nondeterministic polynomial time.

Proof. Let $L \in \mathcal{L}(\mathrm{EPOS}, 0^*1^k0^*)$ and let $G = (\Sigma, h, S, \Delta, 0^*1^k0^*)$ be a ps-grammar such that $L(G) = L$. Let $w \in L$. By Theorem 3.1, there is a derivation of $w$ of length at most $|w|(2k-1)(\#\Sigma)^k$. Beginning with the start symbol, we can now nondeterministically choose the symbols to be rewritten and the productions to be applied at each derivation step. Then we apply the productions, obtaining thus a new sentential form. We repeat this process, halting when $w$ is obtained. Since each sentential form contains at most $|w|$ symbols, it follows that languages in $\mathcal{L}(\mathrm{EPOS}, 0^*1^k0^*)$ are in NP. A similar argument holds for the pattern selector $0^*(10^*)^k$. ∎
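The guess-and-check procedure of this proof can be imitated deterministically on very small inputs by exploring all choices. The following is a minimal sketch for $k = 2$ (pattern $0^*110^*$), not the nondeterministic algorithm itself: it searches breadth-first over sentential forms and prunes forms longer than the target, which is sound because the grammar is propagating. The toy grammar, the dictionary representation of productions, and the choice to start from a two-symbol axiom (so that the treatment of the very first step does not matter) are all assumptions made for illustration.

```python
from collections import deque


def rewrite_adjacent_pair(form, prods):
    """Yield each form obtained by rewriting exactly two adjacent symbols."""
    for i in range(len(form) - 1):
        for left in prods.get(form[i], ()):
            for right in prods.get(form[i + 1], ()):
                yield form[:i] + left + right + form[i + 2:]


def derivable(axiom, target, prods):
    """Exhaustive search; forms longer than the target are never useful."""
    seen = {axiom}
    queue = deque([axiom])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for nxt in rewrite_adjacent_pair(form, prods):
            if len(nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical toy grammar: every symbol that may be rewritten needs a production.
TOY = {"A": ["Aa", "a"], "B": ["bB", "b"], "a": ["a"], "b": ["b"]}

if __name__ == "__main__":
    for word in ["ab", "aab", "abb", "ba", "a"]:
        print(word, derivable("AB", word, TOY))
```

The search is exponential in general; the point of Theorem 4.4 is only that a single lucky sequence of guesses is polynomially long.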

5. SPACE COMPLEXITY

In this section we shall see that the space complexity of the languages in $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$ and $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$ is the same as the space complexity of the context-free languages, namely $O(\log^2 n)$ (see, e.g., [7]). This will be achieved using a similar technique as in [18]. It was shown there that the family of EOL languages is log-space reducible to the CF languages. The following theorem is used in the reduction.

THEOREM 5.1 [17]. The set of languages which are log-space reducible to context-free languages is the set of languages recognizable in polynomial time by nondeterministic log-space bounded auxiliary pushdown automata.

An auxiliary pushdown automaton (APDA) is a pushdown automaton with an auxiliary work tape, and with acceptance on empty store. The formal definition can be found, e.g., in [8].

THEOREM 5.2. $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$ and $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$ are log-space reducible to the context-free languages.

Proof. Given a ps-grammar $G$ with $\mathrm{Patt}(G) = 0^*110^*$, we will construct a nondeterministic APDA that recognizes $L(G)$ in polynomial time and logarithmic work space. The automaton will act like a pushdown automaton that simulates context-free derivations bottom up (see, e.g., [7]). We add to each occurrence of a symbol $A$ on the stack two integers $i$ and $j$, in binary notation; the APDA will enforce that $(i,j)$ is in the valence set of the subderivation that starts at $A$. Finally, if we encounter the start symbol $S$ with the pair $(0,1)$, and if there is nothing below, the APDA will pop them off the stack, and will thus accept if it is at the end of its input and if the stack is now empty.

In more detail, we have two kinds of (sequences of) moves. A shift move takes the next input symbol $a$ and pushes down the five symbols $[0a1]$ (we assume that the symbols 0, 1, [, ] do not occur in $G$). A reduce move pops off the stack a sequence of triplets $[i_0 A_1 i_1][i_1 A_2 i_2] \cdots [i_{m-1} A_m i_m]$ (in reverse order). If there is a production $(A, A_1 \cdots A_m)$ in $G$, then the APDA pushes down either $[i_0 A (i_m + 1)]$ or $[(i_0 + 1) A i_m]$. Finally, if $]1S0[$ is on top of the stack, the automaton empties its stack if there is nothing below. If there is anything below, then the run enters a failure state. It is easy to see that the valence correctness is enforced by the automaton. Hence, by Lemma 4.1, the APDA recognizes $L(G)$.

It remains to show that the APDA requires $\log n$ work space and that it operates in polynomial time in $n$, where $n$ is the length of the input word. Theorem 3.1 implies that the height of the derivation tree, and thus also the valence values, can be bounded linearly by $n$. Thus, a valence value can be kept in $\log n$ space. During a shift move, the work tape is not used at all; during a reduce move, the work tape has to hold at most 2 valence values at any point of time ($i_0$ and the current $i_r$). Hence, the work space bound holds.

For the time bound we note that the automaton makes exactly one shift or reduce move for each node in the derivation tree. But the total number of nodes is polynomial in $n$ (the width of the tree is at most $n$, and the height is bounded linearly in $n$). A shift move consists of 5 individual moves. A reduce move consists of about $3\,\mathrm{Maxr}(G) \times \log n$ individual moves. It follows that the automaton operates in polynomial time. A similar APDA can be constructed for the case where the selector is $0^*10^*10^*$, using the recurrence from the definition of Val2. ∎
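The reduce move described in this proof can be sketched as a function on an explicit stack. The following is a minimal illustration under simplifying assumptions: stack entries are Python tuples `(i, symbol, j)` rather than bracketed binary strings, the nondeterministic choice between the two possible pushes is passed in as the flag `bump_left`, and the work-tape bookkeeping is ignored. The function name and representation are not from the paper.

```python
def reduce_move(stack, lhs, rhs, bump_left):
    """Apply the production (lhs, rhs) to the top of the stack.

    The top len(rhs) entries must spell rhs and their valences must chain:
    the right value of each entry equals the left value of the next one.
    Returns the new stack, or None if the move is not applicable.
    """
    m = len(rhs)
    if m == 0 or len(stack) < m:
        return None
    top = stack[-m:]
    if [sym for _, sym, _ in top] != list(rhs):
        return None
    if any(top[r][2] != top[r + 1][0] for r in range(m - 1)):
        return None
    i0, im = top[0][0], top[-1][2]
    # The production itself contributes one single, on the left or the right.
    new_entry = (i0 + 1, lhs, im) if bump_left else (i0, lhs, im + 1)
    return stack[:-m] + [new_entry]


if __name__ == "__main__":
    stack = [(0, "B", 1), (1, "C", 1)]
    print(reduce_move(stack, "A", "BC", bump_left=False))
    print(reduce_move(stack, "A", "BC", bump_left=True))
```

Only $i_0$ and the valence currently being compared need to be remembered while the entries are popped, which is where the logarithmic work-space bound comes from.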

Theorem 5.2 gives us the following space bounds.

COROLLARY 5.1. Languages in $\mathcal{L}(\mathrm{EPOS}, 0^*110^*)$ and $\mathcal{L}(\mathrm{EPOS}, 0^*10^*10^*)$ are recognizable in $O(\log^2 n)$ space.

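TABLE II
The complexity of $\mathcal{L}(\mathrm{EPOS}, K)$

Pattern selector $K$                      Derivation length in $K^*$ $\le$              Time complexity $\le$                               Space complexity $\le$

$0^*1^k0^*$                               $(2k-1)(\#\Sigma)^k \times |w|$               NP
$0^*(10^*)^k$                             $k^{2k}(\#\Sigma)^{k(k+1)/2} \times |w|$      NP ([GW]: P)
$0^*110^*$ and $0^*110^* \cup 10^*1$      $3(\#\Sigma)^2 \times |w|$                    (1) $n^{2+2\mathrm{Maxr}(G)}$, (2) $n^6$            $\log^2 n$
$0^*10^*10^*$                             $16(\#\Sigma)^3 \times |w|$                   (1) $n^{1+3\mathrm{Maxr}(G)}$, (2) $n^7$            $\log^2 n$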

6. SUMMARY

Table II gives an overview of the results obtained in this paper. It shows the various pattern selectors that have been examined, the respective derivation lengths obtained (in the star version of the grammar), and the time and space complexity of the respective languages.
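As a consistency check on the derivation-length column, the general bounds can be evaluated at $k = 2$ and compared with the two specialised rows. The sketch below assumes that the bound for $0^*(10^*)^k$ reads $k^{2k}(\#\Sigma)^{k(k+1)/2} \times |w|$, which at $k = 2$ agrees with the $16(\#\Sigma)^3 \times |w|$ entry for $0^*10^*10^*$; the function names are illustrative.

```python
def adjacent_bound(k, alphabet_size, word_length):
    """Derivation-length bound for the pattern 0*1^k0*."""
    return (2 * k - 1) * alphabet_size ** k * word_length


def scattered_bound(k, alphabet_size, word_length):
    """Assumed derivation-length bound for the pattern 0*(10*)^k."""
    return k ** (2 * k) * alphabet_size ** (k * (k + 1) // 2) * word_length


if __name__ == "__main__":
    sigma, n = 3, 5
    print(adjacent_bound(2, sigma, n), 3 * sigma ** 2 * n)
    print(scattered_bound(2, sigma, n), 16 * sigma ** 3 * n)
```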

REFERENCES

1. P. ASVELD, Time and space complexity of inside-out macro languages, Internat. J. Comput. Math. 10 (1981), 3-14.
2. C. BEERI AND E. SHAMIR, Checking stacks and context-free programmed grammars accept P-complete languages, in "Proc. 2nd Colloq. Automata, Lang. Programming," Lecture Notes in Computer Science Vol. 14, pp. 27-33, Springer-Verlag, New York/Berlin, 1974.
3. E. DAHLHAUS AND H. GAIFMAN, private communication, Hebrew University of Jerusalem, 1985.
4. J. ENGELFRIET, "The Complexity of Languages Generated by Attribute Grammars," technical report, Technische Hogeschool, Twente, 1982.
5. J. GONCZAROWSKI, H. C. M. KLEIJN, AND G. ROZENBERG, "Grammatical Constructions in Selective Substitution Grammars," Technical Report TWI 82-19, University of Leiden, 1982.
6. J. GONCZAROWSKI AND M. WARMUTH, "Applications of Scheduling Theory to Formal Language Theory," Technical Report 84-15, Hebrew University of Jerusalem, 1984.
7. M. A. HARRISON, "Introduction to Formal Language Theory," Addison-Wesley, Reading, Mass., 1978.
8. J. E. HOPCROFT AND J. D. ULLMAN, "Introduction to Automata Theory, Languages, and Computation," Addison-Wesley, Reading, Mass., 1979.
9. N. D. JONES AND S. SKYUM, Recognition of deterministic ETOL languages in logarithmic space, Inform. and Control 35 (1977), 177-181.
10. N. D. JONES AND S. SKYUM, Complexity of some problems concerning L systems, in "Proc. 4th Colloq. Automata, Lang. Programming," Lecture Notes in Computer Science Vol. 52, pp. 301-308, Springer-Verlag, New York/Berlin, 1977.
11. H. C. M. KLEIJN AND G. ROZENBERG, Context-free like restrictions on selective rewriting, Theoret. Comput. Sci. 16 (1981), 237-269.
12. J. VAN LEEUWEN, The membership question for ETOL languages is polynomially complete, Inform. Process. Lett. 3 (1975), 138-143.
13. J. VAN LEEUWEN, The tape complexity of context-independent developmental languages, J. Comput. System Sci. 11 (1975), 203-211.
14. J. OPATRNY AND K. CULIK II, Time complexity of recognition and parsing of EOL languages, in "Automata, Languages, Development" (A. Lindenmayer and G. Rozenberg, Eds.), pp. 243-250, North-Holland, Amsterdam, 1976.
15. G. ROZENBERG AND A. SALOMAA, "The Mathematical Theory of L Systems," Academic Press, New York, 1979.
16. A. SALOMAA, "Formal Languages," Academic Press, New York, 1973.
17. I. H. SUDBOROUGH, On the tape complexity of deterministic context-free languages, in "Proc. 8th Annu. ACM Sympos. Theory of Computing," 1976, pp. 141-148.
18. I. H. SUDBOROUGH, The time and tape complexity of developmental languages, in "Proc. 4th Colloq. Automata, Lang. Programming," Lecture Notes in Computer Science Vol. 52, pp. 509-523, Springer-Verlag, Berlin/New York, 1977.
19. I. H. SUDBOROUGH, The complexity of the membership problem for some extensions of context-free languages, Internat. J. Comput. Math. 6 (1977), 191-215.
20. L. G. VALIANT, General context-free recognition in less than cubic time, J. Comput. System Sci. 10 (1975), 308-315.
21. D. H. YOUNGER, Recognition and parsing of context-free languages in time n^3, Inform. and Control 10 (1967), 189-208.