Properties of LL-Regular Languages
Total Page:16
File Type:pdf, Size:1020Kb
Purdue University Purdue e-Pubs Department of Computer Science Technical Reports Department of Computer Science 1977 Properties of LL-Regular Languages David A. Poplawski Report Number: 77-241 Poplawski, David A., "Properties of LL-Regular Languages" (1977). Department of Computer Science Technical Reports. Paper 177. https://docs.lib.purdue.edu/cstech/177 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. PROPERTIES OF LL-REGULAR LANGUAGES David A. Poplawski Computer Science Department Purdue University W. Lafayette, Ind 47907 CSD TR-241 PROPERTIES OF LL-REGULAR LRNGURGES David R. Pop: aujski Rugust 1, 1377' HSSTRROT: The requirements for a context-free grammar to be !_!_- regular are relaxed, producing a larger ciasa of grammars than the original definition. Relationships between LL-regular prannMars and 1 .arrzuaees and other classes of t'ramma^s and languages are investigated, decidability questions are considered and linear time parsing algorithm is described. INTRODUCTION Until recently, efficient deterministic top-down parsing of context- free languages ^as limited to the class of LL(k) languages [8,8], and some variants [1,7,9]. In addition the claso of l.L(f) languages [3] was introduced, but due to the arbitrary nature of the function f, efficient parsing was ofter, irripossi!:^ e. Efficient top-::'oiyn parsing has bean achieved by restricting f to the class of functions computab1, e by generalized sequential machines [5], thereby defining the class of LL-reg'j"iar languages. This paper extends the class of grammars which define LL-regular languages and in addition provides some neu; results not covered in [5], including a two pass parsing algorithm similar to the parsing algorithm for LR-regular languages [2]. 2. LL-REGULAR GRrlMilPRS AND LANGUAGES We wi11 use the foilowing notation throughout this paper: P context-free grammar G wi1! be represented by a four-tuple (N.TpP^), where 1M is the finite set of non-terminal •symbols, T is the finite set of terminal symbols, P is the set of productions of the form where A € N and $ € (I\!l»T)*, AND S is the start symbol. We will occasionally identify a production in P by a unique number j by writing • Let ex,, aa i (NuT)*. We will write o^Ra-, if there is 5 production ? P in the grammar G (omitting the subscript G when it is obvious from the context). If'tXj j ? e T* then we ujrite oc,Rcr.s =» cc^iXg. ~i** then we write a,rla'. cr.^ct"- snd are the transitive completions of l & rm 1 t p m 1 arid respectively, and are the reflexive, transitive completions of and => respectively. If p = p 1 P2 • • • Pr, a sequence o f production identifiers, then a: £ represents a leftmost derivation from a to p using in sequence the productions p,,p£,...If n is an integer, then we write a P if there exists a sequence of productions p = Pi?2---Pn such that a j3. The set L(G) = {x ( T* | S 4 x) is the language generated by the context- free grammar G. Unless otherwise indicated, upper case 1etters S, fl, B lux 1 be symbols in X, upper case letter X wi 1 1 be a symbol in (NuT), lower case letters a, b, c, d, e wi11 be symbols in T, lower case letters w, x, y, z will be strings of symbols in T*, and greek letters will be strings of symbols in (NuT)*. If a e (NuT)*, then is the reverse of a. FIRSTk(x) = x if the length of x is less than or equal to k and FIRST^(x) = y where the length of y equals k and there is a z e T* such that x = yz. Let 77 = (Rj,...,Rn) be a collection of regular sets over T. If the R; are oairwise disjoint and the union of all R; in 7i is equal to T*, then n is called a regular partition of T*. For strings x,y t T*, if x € Rj and y R^ for some i, then we will write xsy (mod n). R deterministic finite automata N will be represented by the five-tuple (Q,T, S , q0, F) , where Q is the set of states, T is the input alphabet, S is a mapping from QxT Q, q0 € Q is the initial state, and FcQ is the set of .final states. DEFINITION 2.1: Let G= (N.T.P.S] be a cfg. Let n = (Rj Rn) be a regular partition of T*. G is said to be strong LL-regular far n if, given any two leftmost derivations: S u^Ra, is* "Mpa, ^ u),x S w2fi«2 ujaVtta # uiay such that x=y (mod n), then it follows that 0 = Y. This definition is equivalent to the definition of. an LL- regular grammar given in [5]. Since it is simi1ar in form to the definition of strong LL(U) grammars [8], grammars satisfying it will be called strong LL-regular grammars. Corresponding to the definition of LT_ (k) grammars [6] are the LL-regular grammars. DEFINITION 2.2: Let G = (N.T.P.S) be a cfg. Let n - (R,,...,Rn) be a regular partition of T*. G is said to be LL-regular for n (written LL(tt)) if, given any two leftmost derivations: S wP.cc utfa wx 5 # wRoc ^ ujYcx # uiy such that x=y (mod , then it follows that fi = y. DEFINITION 2,3: R language L over alphabet T is said to be LL-regu'ar if there exists a grammar G and a regular partition ?i of T* such that G is LL (tt) and L = L(G) . 3. GRRMMRR PROPERTIES RND RELATIONSHIPS The following set of propositions establish the relationship of the class of LL-regular grammars to other grammar classes. The class of LL-regular grammars fall between the LL(l<) and the LR-reguiar [2] class of grammars, and are incomparable with the class of LR(k) grammars. PROPOSITION 3.1: Every strong LL-regular grammar is LL- regular. PROOF: Follows immediately from the definitions, since the strong LL-regiilar definition is a special case of the LL-regular definition. PROPOSITION 3.2: There is an LL-regul^r grammar that is not strong LL-reguiar. PROOF: Let G = ({S,R},{b,c,d,e} ,P,S), ahere P {S^Ab,S^Rc,R->d,R-e}, and let n ({db,ec}, feb,de},T*-(db, ec.eb.dc}). G is not strong LL-regi:iar for tt since the two derivations S # fib db $ db S IT Rc I m ec ITn ec have dbsec (mod n) but d^e. Since there are only four derivations posible in G, we have S Rb & db # db 5 # Rb & ^ & eb but db^eb (mod n) and S # Rc dc # dc S # fic ec # ec but dc^ec (mod tO and therefore G is LL-reguJ ar for n. PROPOSITION 3.3: Let G = (N,T,P,S) be a grammar that is LL (k) for some integer k>1 . Then G is LL(ti) for n= (w, , w2, . , u, T* f u2T*, . .) , where w,,^,... are all the strings over T* of length less than k and U|,u2>... are all the strings over T* of length exactly k. PROOF: Assume G is LL(I<). Consider the two derivations 5 =5? # ujx S # ujRa =* wVa ^ wy and suppose x=y (mod n) . If x,y = uj^ then FIRSTk(x) = FIRSTk(y). If x,y € UlT*, then FIRSTk(x) = FIRSTk(y) also. Since G is LL(!<) iue conclude that £ = 7. Therefore G is LL(rt). 6 PROPOSITION . >i: The grammar G = (N,T,P,S), where N = {S, R, B) , T = fa,b,c,d,e}, and P = (S-?Ra, S^Bb, R">cfid, R~>e, S^cBd, B^e} is LLCtO for the regular J partition n = (cT*a, eT*a, cT*b, eT*b, T*- (c^'e)T* (avb)) , but is not LR(k) for any integer k. PROOF: That G is not LR(k) for any k can be shown by considering the following derivations: S c-Wa =? c" ed" a S e"Bd"b => c" ed" b for n>0. For any fixed integer, k there is always an n>k n n such that FIRSTk (ed a) - FIRSTk (ed" b) . But c RdT a * cMBd"a so G is not LR(k) for any integer k. To show that G is LLW it is sufficient to show that the contrapositive of the definition is satisfied, that is, instead of showing that x=y (mod tt) implies that a = , show that o: ^ implies that xjsy (mod 71). To this end, consider the following three places where the definition may be not satisfied: Case 1 - using productions S-»Ra and S-Bb. S # S Ra # cTedTa S # S Bb # c" edn b n>0 In this case, Ra * Bb, ' and for all n, cnedna^cr'ed"b (mod n) . Case 2 - using productions R-^cRd and fl-»e. S J^ crnRdma =$> crncRddma cmcnedndma 7 S c^rPa cr,V-dma ^ crnedma m>0,n>1 In this case, cRd * e, and for al 1 n and m, cn ed'"1 cma£edrna (mod tt) . Case 3 - using productions B^cBd and B-*e. ' m m n n m S ^it i c'Wb IFI c cBdd b t=mk c 'c ed"d b S # crnBdrob omedmb cmedmb m>D,n>1 In this case, cBd ^ e, and for all n and m, cnedrtcmb£edmb (mod n).