<<

A representation of context-free grammars with the help of finite digraphs Krasimir Yordzhev

Faculty of Mathematics and Natural Sciences, South-West University, Blagoevgrad, Bulgaria Email address: [email protected]

Abstract: For any context-free grammar, we build a transition diagram, that is, a finite directed graph with labeled arcs, which describes the work of the grammar. This approach is new, and it is different from previously known graph models. We define the concept of proper walk in this transition diagram and we prove that a word belongs to a given context-free language if and only if this word can be obtained with the help of a proper walk. Keywords: Context-free grammar, Finite digraph, Transition diagram, Proper walk, Chomsky normal form

where N and Σ are finite sets, N Σ = , which 1. Introduction are called, respectively, alphabet of nonterminals (or Context-free grammars are widely used to describe variables) and alphabet of terminals∩, where∅ Π is a the syntax of programming languages and natural finite subset of the Cartesian product (N Σ) × languages [1,2]. On the other hand, graph theory is a (N Σ) , and we will write u v if (u, v) ∗ Π good apparatus for the modeling of computing de- (see [3]).∗ We will call Π the set of productions∪ or vices and computational processes. So a lot of graph rules.∪ → ∈ algorithms have been developed [4,5]. The purpose Let Γ = (N, Σ, Π) be a and let of this article is to show how for an arbitrary con- w, y (N Σ) . We will write w y, if there are text-free grammar, we can construct a finite digraph, z , z , u, v (N∗ Σ) , such that w = z uz , which describes the work of the grammar. As is well y = z∈ vz ∪and u ∗v is a production→ of Π. We will known, any finite state machine (deterministic or 1 2 1 2 write w ∈y, if∪ w = y, or there are w , w , … , w nondeterministic finite automaton) and any push- 1 2 such that w =→w and w w for all down automaton can be described by a transition 0 1 r i = 0,1, …⇒, r 1 . The sequence w w diagram, which is essentially a finite directed graph 0 i i+1 w is called a derivation of length →r. with labeled arcs [2]. The finite digraph, which we 0 1 For every −A N, a subset L(Γ, A)→ Σ →defined⋯ → will construct in this work and who will simulate a r given context-free grammar will be different from by ∗ previously known graph models [1,2]. ∈ ⊆ L( , A) = { | A } ∗ 2. The Main Definitions and Notations is called the languageΓ generatedα ∈ Σ by⇒ theα grammar Γ If with the start symbol A (see [3]). The formal grammar Γ = {N, Σ, Π} is called con- Σ = {x , x , … , x } text-free if all the productions of Π are of the form A , where A N, (N Σ) . The language 1 2 p is a finite set, then with Σ we will denote the free L is called context-free if there is∗ a context-free → ω ∈ ω ∈ ∪ monoid over Σ , ie the∗ set of all sequences grammar Γ = (N, Σ, Π) and a nonterminal A N, x x … x , including the empty one which we de- such that L = L(Γ, A). ∈ note with ε. Elements of Σ is called words, and Transition diagram is called a finite directed graph, i1 i2 in all of whose arcs are labelled by an element of a any subset of Σ (formal) language∗ . semigroup (see also [2]). Formal (or generative∗ ) grammar Γ is a triple If is a walk in a transition diagram, then the Γ = (N, Σ, Π), label of this walk l( ) is the product of the labels of π π 2 K. Yordzhev. A Representation of Context-free Grammars with the Help of Finite Digraphs the arcs that make up this walk, taken in passing the arcs. Let Γ = (N, Σ, Π) be a context-free grammar and let Figure 1. Production , , and the corresponding part of the transition diagram∗ ( ). = { | A N} = { | x }. 퐴 → 훼 퐴 ∈ 푁 훼 ∈ 훴 퐻 훤

We considerN′ A′the monoid∈ ΣT′ withx′ the∈ setΣ of gener- If the production is of the form ators , and with the defining rela- A B B … B , where Σ tions ( i = 0,1, … k ), A, B N ( j = 1,2, … , k ), then the∗ 0 1 1 2 2 k−1 k k i 푁 ∪ Σ ∪ 푁′ ∪ Σ′ corresponding→ α α partα ofα the transitionα diagram isα shown∈ j = , = , (1) in Figure 2 ∈ where , 퐴′퐴 , 휀 푥′푥 , 휀 and is the empty word (the identity of the monoid ). We set by definition퐴′ ∈ 푁′ 퐴( ∈)푁= 푥′ ,∈ Σ(′ 푥) ∈=Σ , if휀 = … , then will mean 푇… and = . 푥′ ′ 푥 퐴′ ′ ′ 퐴′ ′ 휔 푧1푧Let2 푧푘 ∈ 푇 휔′ 푧푘푧푘−1 푧1 휀′ 휀 = × = {( , ) | , }. ∗ ∗ In 푅 we Σintroduce푇 the휔 operation푧 휔 ∈ Σ'' ''푧 as∈ follows:푇 If ( , ), ( , ) , then 푅 ∘ 휔1( 푧1, 휔) 2 (푧2 ,∈ 푅) = ( , ). (2)

It 휔is1 easy푧1 ∘ to휔 2see푧2 that 휔 1with휔2 푧 2this푧1 operation is a monoid with identity ( , ). 푅 Figure 2. Production … and 휀 휀 the corresponding part of the transition diagram ( ). 3. Finite Digraphs and Context-free 퐴 → 훼0퐵1훼1퐵2훼2 훼푘−1퐵푘훼푘 Grammars 퐻 훤

Definition 1. For the context-free grammar , we Definition 2. A walk H( ) will be called a construct a transition diagram H( ) with the set of proper walk if: V = {u , v | A N} Γ π ∈ Γ vertices and the set of arcs 1. For each A N the arcs from u to v (if E V × V which is made as follows:Γ A A there exist) are proper walks; ∈ A A 1. For each production A , where A N , ∈ ⊆ 2. Let , , … , be proper walks, A N and Σ there is an arc from u to v with a ( , ) let: bel ∗ (see Fig. 1); → α ∈ π1 π2 πk ∈ α ∈ A A • there is an arc from the vertex u to the 2. For each production A Bz, where A, B N, start vertex of ; α ε A Σ , z (N Σ) there is an arc from u to u • for every i = 1ρ,2, … , k 1 there are arcs ( , z) 1 with a∗ label ; ∗ → α ∈ from the finalπ vertex of to the start vertex of α ∈ ∈ ∪ A B 3. For each production A z B B z where ; − α i i A, B , B N, Σ , z , z (N Σ) , there is an σ • there is an arc fromπ the final vertex of 1 ′1 2 2 i+1 arc from v to u ∗ with a label→( , B ∗α′); π to the vertex v . 1 2 1 2 ∈ α ∈ ∈ ∪ If l( … τ ) = ( , ) for some 4. For eachB1 productionB2 A zB , where2 A, B N, k A z (N ) α α v v π , then the walk Σ , Σ there is an arc from to 1 1 2 2 k−1 k with a∗ label( , ′); ∗ → α ∈ ∗ ρπ σ π σ σ π τ α ε α ∈ ∈ ∪ B A α ∈ Σ = … 5. There are no other arcs in H(Γ) except de- α α scribed in 1 ÷ 4. 1 1 2 2 푘−1 푘 is the proper 휋walk휌 (see휋 휎 Figure휋 휎 3).휎 휋 휏

K. Yordzhev. A Representation of Context-free Grammars with the Help of Finite Digraphs 3

3. There are no other proper walks in ( ) ex- with a label ( , B ). cept described in 1 and 2. c) According to ′1, item′ 1 there is an arc from v i i+1 i 퐻 Γ to v with a labelα ( α, ). Bk We consider the walk ′ = …τ A k H( ). For this walk, αwe αhave:k π ρπ1σ1π2 σk−1πkτ ∈ l( Γ) =

=π( , B … B ) ( , ) ( , B ) ( , ) ( , B ′ ′)( , ) 0 1 2 2 k k 1 1 2 1 α α α α ∘ β ε ∘ α ′ ′ α ′ = ( … ∘ β , 2 εB ∘ ⋯ α…k−B1 kαkB−1 …αk αkB ) ′ ′ ′ ′ ′ = (α0, β)1 α1 βkαk αk kαk−1 2α1α1 2 αk−1 kαk

Sinceα ε , … are proper walks then by defi- Figure 3. A proper walk in ( ). nition 2 is a proper walk in H( ) with the start 1 2 k vertex uπ, theπ finalπ vertex v and labelled l( ) = 퐻 훤 ( , ). Theπ necessity is proved. Γ The following theorem is true. A A π αSufficiencyε . Let be a proper walk in H( ) with Theorem 1 Let = (N, , ) be a context-free the start vertex u , the final vertex v and a label grammar and let A N . Then L( , A) if and l( ) = ( , ). If theπ length of is equal to 1,Γ then only if there is a properΓ walkΣ inΠ H( ) with the start is an arc and this Ais possible if and onlyA if there is a vertex u , the final vertex∈ v andα with∈ aΓ label( , ). productionπ α εA , ie L( ,πA). π Γ A A Suppose that for all A N and for all proper walks Proof. Necessity. Let L( , A). Then thereα isε a with the start vertex→ α u ,α the∈ finalΓ vertex v , length less derivation A .The proof we will make by induc- than or equal to t and with∈ a label ( , ) is satisfied α ∈ Γ A A tion on the length of the derivation. L( , A). Let be a proper walk with the start ⇒ α Let a production A exists. Then according to vertex u , the final vertex v , labelαlεed ( , ) and Definition 1, item 1 there exists an arc u v with a lengthα ∈ Γt + 1 > 1. πSince is a proper walk, then there → α A A label ( , ). is a vertex B N such that the first arc inα theε walk A N AL(A , A) Suppose that for all and for all for starts at u and finishesπ at u . Hence there is a which thereα ε is a derivation A no longer than t, there is 1 A ∈ B B … B ρ a proper walk with the start∈ vertex u , the αfinal∈ vertexΓ v production 1 in such that π A B and labeled ( , ). ⇒ α l(u u ) = ( , B … B ) for some B N , A A → α0 1α1 2 kαk Γ Let L( , A), and there is a derivation A of 1, i = 2,3, … , k , j = 0,1, … , k . According to α ε A B 0 1 2 k k i length t + 1. Let A B … B be the first Definition∗ 2, α isα given byα = … ∈ , j productionα ∈ ofΓ this derivation, where , B⇒ α N, αwhere∈ Σ are proper walks with the start vertices u 0 1 1 k k 1 1 2 s−1 s i = 0,1, … k, j = 1,2, …→kα. Thenα there isα a derivation∗ and the the finalπ vertices v π for ρsomeπ σ πC σN (obvπ τi- i j i Ci α ∈ Σ ∈ ously πC = B ), i = 1,2, … , s , arcs = v u Ci i … … = , from v to u , i = 1,2, … , s 1 , and∈ an arc 1 1 σi Ci Ci+1 0 1 1 푘 푘 0 1 1 푘 푘 = v v from v to v . Let l( ) = ( , C ), wher퐴e → 훼 퐵L(훼, B )퐵 and훼 for⇒ 훼all훽 훼 there훽 exist훼 deriv훼 a- Ci Ci+1 , i = 1,2, … , s 1 , l( ) =−( , ) , ′ ′ . tions B with the length less than or equal to t, Cs A Cs A i i i+1 i i i i τThen we∗ obtain σ ′ γ γ ∗ i = 1,2β, …∈k . ByΓ the inductiveβ hypothesis, for all i s s i i γ ∈ Σ − τ γ γ γ ∈ Σ i = 1,2, …⇒, kβ, there are proper walks with the start l( ) = vertices u , the final vertices v and labeled ( , ). i For the first production A Bπ … B of Bi Bi i =πl u u l( ) l( ) l( ) … l( ) l( ) l( ) the derivation A we have: β ε 0 1 1 k k A C1 1 1 2 s−1 s a) According to Definition 1,→ itemα 2 thereα existsα an = (� , �C∘ π ∘…σC ∘ π B∘ …∘ Bσ )∘, π ∘ τ arc from ⇒uα to u with a label ( , B … B ). whereω γs′= s′γ ands−1 C2′γ1′α1…2C kαBk′ … B = . A B1 b) ρAccording to Definition 1, item 3 for all Having in mind (1), it′ is′ easy′ to see′ ′that this is possible 0 1 2 k k i α= 1α,2, … k 1α there are arcs from v to u if and ωonly αif s =γks , sCγs−=1 B 2forγ1 αi1=22,3, …kα, k andε

− σi Bi Bi+1 i i 4 K. Yordzhev. A Representation of Context-free Grammars with the Help of Finite Digraphs

= for j = 1,2, … , k. Therefore = … , where are j j properγ α walks with the start vertices u and the the 1 1 k−1 k i final vertices vπ (iρ=π 1σ,2, …σ, k); π τ are arcs fromπ v Bi to u (i = 1,2, … , k 1); is an arc from u to Bi i Bi u ; is an arc from v to v andσ there are Bi+1 − ρ A (i = 1,2, … , k) such that l( ) = ( , ). By the induc-∗ B1 Bk A i tive hypothesis,τ we have L( , B ) ie thereβ exist∈ Σ i i derivations B for allπ i = 1β,2, …ε , k. Hence, we have the following derivationβi ∈ Γ i Figure 4. Production and the corresponding i ⇒ βi part of the transition diagram ( ). A B B … B … = , 퐴 → 퐵퐵 퐻 훤 which→ impliesα0 1α1 that2 kαLk(⇒, Aα)0.β The1α1 βtheorem2 βkα isk proved.α Obviously, if we consider a context-free grammar in Chomsky normal form, then the corresponding α ∈ Γ transition diagram is in quite simple and convenient 4. Context-free Grammars in Chomsky form. Normal Form Discussed above model is particularly useful when 5. Conclusions and future work applied to some context-free grammars having a spe- Described in this article a graph model of an arbi- cial form, for example, when the grammar is in trary context-free grammar is new and original. This Chomsky normal form. is different from the familiar until now graph models. Some polynomial algorithms for testing the in- Definition 3. Let be а context-free grammar with- clusion of any regular or linear language in a group out the empty word ε in which all productions are in language are described in [7]. These algorithms are one of two simple forms,Γ either: based on the search for a cycle in the corresponding 1. A BC, where A, B, and C, are each nonter- transition diagram. This idea can be used for any minals, or context-free language L by the transition diagram → described in Section 3. The difficulty here is to find, 2. A x, where A is а nonterminal and x is а in general, the proper walk, which corresponds to any terminal. given word L. We hope this open problem to be → Such а grammar is said to be in Chomsky Normal solved in the near future. Form. The researchα ∈ is partly supported by the project SRP-B3/13, funded by the Ministry of Education, As is well known [2], every context-free language Youth and Science (Bulgaria) and South-West Uni- can be generated by a grammar in Chomsky normal versity “Neofit Rilsky”, Blagoevgrad. form. 퐿 An algorithm that recognizes whether a word Σ belongs to the language , that is generated by a grammar∗ in Chomsky normal form. This algorithm References worksα ∈ in time ( ), where 퐿= | |. Described in [1] A.V. Aho, J.D. Ullman, The theory of , section 3 graph model2 will help us to understand in translation and computing. Vol. 1, 2, Pren- detail this algorithm푂 푛 for more accurate푛 α description. tice-Hall, 1972. If the production is of the form A BC, i.e. it is [2] J.E. Hopcroft, R. Motwani, J.D. Ullman, Intro- one of the productions of grammar in Chomsky nor- duction to automata theory, languages, and mal form, then the appropriate fragment→ of the transi- computation. Addison-Wesley, 2001. tion diagram is shown in Figure 4. [3] G. Lallemant, Semigroups and combinatorial applications. John Wiley & Sons, 1979. [4] I. Mirchev, Graphs. Optimization algorithms in networks. SWU “N. Rilsky”, Blagoevgrad, 2001. [5] M. Swami, K. Thulasirman, Graphs, networks and algorithms. John Wiley & Sons, 1981. 2 [6] K. Yordzhev, An n Algorithm for Recognition of Context-free Languages. Cybern. Syst. Anal. 29, K. Yordzhev. A Representation of Context-free Grammars with the Help of Finite Digraphs 5

No.6, (1993), 922-927. [7] K. Yordzhev, Inclusion of Regular and Linear Languages in Group Languages. International J. of Math. Sci. & Engg. Appls. (IJMSEA) ISSN 0973-9424, Vol. 7 No. I ( 2013), 323-336