Automata and Grammars (Automaty a Gramatiky)
Automata and Grammars, Grammars 6, March 30, 2017

Context-Free Grammars and Languages

Context-free grammars are more powerful than finite automata. They are used to describe document formats, for example via the DTD (document-type definition) used in XML (extensible markup language).

We introduce:
- context-free grammar notation,
- parse trees.

There exists a kind of automaton, the pushdown automaton, that describes all and only the context-free languages. It will be described later.

Palindrome example

A palindrome is a string that reads the same forward and backward, like otto or Madam, I'm Adam. Formally, w is a palindrome iff w = w^R.

The language Lpal of palindromes is not a regular language. We use the pumping lemma. If Lpal were a regular language, let n be the associated constant, and consider w = 0^n 1 0^n. For a regular language, we can break w = xyz such that y consists of one or more 0's from the first group. Thus xz would also be in Lpal if Lpal were regular. But xz has fewer 0's to the left of the 1 than to the right of it, so it is not a palindrome, a contradiction.

A context-free grammar for palindromes:
1. P → λ
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1

A context-free grammar (such as the one above) consists of one or more variables that represent classes of strings, i.e., languages.

Definition (Grammar)
A grammar G = (V, T, P, S) consists of:
- a finite set of terminal symbols (terminals) T, like {0, 1} in the previous example;
- a finite set of variables V (nonterminals, syntactic categories), like {P} in the previous example;
- a start symbol S, a variable that represents the language being defined (P in the previous example);
- a finite set of rules (productions) P that represent the recursive definition of the language. Each has the form
    αAβ → ω,  A ∈ V,  α, β, ω ∈ (V ∪ T)*.
  Notice that the left side (head) contains at least one variable. A production consists of the head (the left side), the production symbol →, and the body (the right side).

Definition (Context-free grammar, CFG)
A context-free grammar (CFG) is a grammar G = (V, T, P, S) that has only productions of the form
    A → α,  A ∈ V,  α ∈ (V ∪ T)*.
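
As an illustration of how rules 1-5 of the palindrome grammar define a language recursively, here is a minimal Python sketch. The function name derives_palindrome is hypothetical (it does not come from the lecture). It undoes rules 4 and 5 from the outside in until one of the base rules 1-3 applies.

```python
def derives_palindrome(w: str) -> bool:
    """Return True if w can be derived from P using rules 1-5."""
    # Rules 1-3: P -> λ | 0 | 1 (the empty string stands for λ).
    if w in ("", "0", "1"):
        return True
    # Rules 4-5: P -> 0P0 | 1P1; strip the matching outer symbols.
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
        return derives_palindrome(w[1:-1])
    return False


print(derives_palindrome("0110"))  # derivable: P => 0P0 => 01P10 => 0110
print(derives_palindrome("0010"))  # not derivable: outer symbols match, inner "01" does not
```

For strings over {0, 1} the function agrees with the direct test w == w[::-1], which is the definition w = w^R.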
Chomsky hierarchy

Grammar types according to the productions allowed:
- Type 0 (recursively enumerable languages L0): general rules α → β, α, β ∈ (V ∪ T)*, where α contains at least one variable.
- Type 1 (context-sensitive languages L1): productions of the form αAβ → αωβ, A ∈ V, α, β ∈ (V ∪ T)*, ω ∈ (V ∪ T)+, with the only exception S → λ, in which case S does not appear on the right side of any production.
- Type 2 (context-free languages L2): productions of the form A → ω, A ∈ V, ω ∈ (V ∪ T)*.
- Type 3 (regular (right linear) languages L3): productions of the form A → ωB, A → ω, A, B ∈ V, ω ∈ T*.

The classes of languages are ordered: L0 ⊇ L1 ⊇ L2 ⊇ L3. Later we show that the inclusions are proper: L0 ⊃ L1 ⊃ L2 ⊃ L3.
- L0 ⊇ L1: recursively enumerable languages contain the context-sensitive languages, since productions αAβ → αωβ have the variable A in the head.
- L2 ⊇ L3: context-free languages contain the regular languages, since productions A → ωB, A → ω have a string from (V ∪ T)* in the body.
- L1 ⊇ L2: context-sensitive languages contain the context-free languages; we have to eliminate the rules A → λ, which can be done (shown later).

Derivations Using a Grammar

Definition (One-step derivation)
Suppose G = (V, T, P, S) is a grammar. Let α, ω, η, ν ∈ (V ∪ T)* and let α → ω be a production of G. Then one derivation step is
    ηαν ⇒_G ηων,  or just  ηαν ⇒ ηων.

We extend ⇒ to any number of derivation steps as follows.

Definition (Derivation ⇒*)
Let G = (V, T, P, S) be a CFG.
- Basis: any string α ∈ (V ∪ T)* derives itself, α ⇒*_G α.
- Induction: if α ⇒*_G β and β ⇒_G γ, then α ⇒*_G γ.
If the grammar G is understood, we use ⇒* in place of ⇒*_G.
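
The two derivation definitions can be sketched in Python. This is only an illustration under stated assumptions: every symbol is a single character, a production is a (head, body) pair of strings, and the names one_step, derives and PALINDROME are hypothetical. Because ⇒* allows any number of steps, the sketch explores ⇒ breadth-first only up to a chosen bound max_steps.

```python
def one_step(form, productions):
    """Return every string reachable from `form` in one derivation step.

    A step rewrites one occurrence of a head α inside ηαν to its body ω.
    """
    results = set()
    for head, body in productions:
        start = form.find(head)
        while start != -1:
            results.add(form[:start] + body + form[start + len(head):])
            start = form.find(head, start + 1)
    return results


def derives(source, target, productions, max_steps):
    """Check source ⇒* target using at most `max_steps` derivation steps."""
    frontier, seen = {source}, {source}
    for _ in range(max_steps):
        if target in frontier:  # also covers the basis: α ⇒* α
            return True
        frontier = {new for form in frontier
                    for new in one_step(form, productions)} - seen
        seen |= frontier
    return target in frontier


# The palindrome grammar P → λ | 0 | 1 | 0P0 | 1P1 ("" stands for λ).
PALINDROME = [("P", ""), ("P", "0"), ("P", "1"), ("P", "0P0"), ("P", "1P1")]

print(sorted(one_step("0P0", PALINDROME)))
print(derives("P", "0110", PALINDROME, max_steps=3))  # P ⇒ 0P0 ⇒ 01P10 ⇒ 0110
```

The bound is a design choice of the sketch, not part of the definition: without it the search would not terminate for targets that are not derivable.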
Example (derivation E ⇒* a ∗ (a + b00))
E ⇒ E ∗ E ⇒ I ∗ E ⇒ a ∗ E ⇒ a ∗ (E) ⇒ a ∗ (E + E) ⇒ a ∗ (I + E) ⇒ a ∗ (a + E) ⇒ a ∗ (a + I) ⇒ a ∗ (a + I0) ⇒ a ∗ (a + I00) ⇒ a ∗ (a + b00)

The Language of a Grammar, Notation

Convention for CFG derivations:
- a, b, c: terminals
- A, B, C: variables
- w, z: strings of terminals
- X, Y: either terminals or variables
- α, β, γ: strings of terminals and/or variables

Definition (The Language of a Grammar)
Let G = (V, T, P, S) be a CFG. The language L(G) of G is the set of terminal strings that have derivations from the start symbol:
    L(G) = {w ∈ T* | S ⇒*_G w}.
The language of a variable A ∈ V is defined as L(A) = {w ∈ T* | A ⇒*_G w}.

Example (a language that is not context-free)
L = {0^n 1^n 2^n | n ≥ 1} is not context-free; there is no CFG generating it.

Type 3 grammars and regular languages

Productions have the form A → wB, A → w, A, B ∈ V, w ∈ T*.

An example of a derivation:
    P: S → 0S | 1A | λ,  A → 0A | 1B,  B → 0B | 1S
    S ⇒ 0S ⇒ 01A ⇒ 011B ⇒ 0110B ⇒ 01101S ⇒ 01101

Observations:
- each string in the derivation contains exactly one variable (except the last one);
- the variable is always in the rightmost position;
- the production A → w is the last one of the derivation;
- each step generates a terminal string and (possibly) changes the variable.

The relation between the grammar and a finite automaton:
- variable = state of the finite automaton;
- productions = transition function.

Example of the reduction of an FA to a grammar

Example (G, FA: binary numbers divisible by 5)
L = {w | w ∈ {0, 1}* and w is a binary number divisible by 5}

(Figure: transition diagram of the automaton with states A, B1, C2, D3, E4.)

    A → 1B | 0A | λ
    B → 0C | 1D
    C → 0E | 1A
    D → 0B | 1C
    E → 0D | 1E

Derivation examples:
    A ⇒ 0A ⇒ 0   (0)
    A ⇒ 1B ⇒ 10C ⇒ 101A ⇒ 101   (5)
    A ⇒ 1B ⇒ 10C ⇒ 101A ⇒ 1010A ⇒ 1010   (10)
    A ⇒ 1B ⇒ 11D ⇒ 111C ⇒ 1111A ⇒ 1111   (15)

FA to Grammar reduction

Theorem (L ∈ RE ⇒ L ∈ L3)
For any language recognized by a finite automaton there exists a Type 3 grammar generating the language.

Proof: FA to Grammar reduction
Let L = L(A) for some automaton A = (Q, Σ, δ, q0, F). We define a grammar G = (Q, Σ, P, q0) with productions P:
    p → aq  iff  δ(p, a) = q,
    p → λ   iff  p ∈ F.
Is L(A) = L(G)?
- λ ∈ L(A) ⇔ q0 ∈ F ⇔ (q0 → λ) ∈ P ⇔ λ ∈ L(G).
- a1...an ∈ L(A)
    ⇔ ∃ q0, ..., qn ∈ Q such that δ(q_i, a_{i+1}) = q_{i+1} and qn ∈ F
    ⇔ (q0 ⇒ a1q1 ⇒ ... ⇒ a1...anqn ⇒ a1...an) is a derivation of a1...an
    ⇔ a1...an ∈ L(G).

Grammar to FA reduction

Opposite direction: we aim to construct an automaton from a grammar.
- Productions A → aB are encoded into the transition function.
- Productions A → λ define the accepting states.
- Productions A → a1...anB, A → a1...an with more terminals are rewritten: we introduce new variables H2, ..., Hn and define the productions
    A → a1H2, H2 → a2H3, ..., Hn → anB   or   A → a1H2, H2 → a2H3, ..., Hn → an.
- Productions A → B correspond to λ-transitions.

Standard form of a Type 3 grammar

Lemma
For any Type 3 grammar there exists a Type 3 grammar generating the same language with all productions of the form A → aB, A → B, A → λ, A, B ∈ V, a ∈ T.

Proof.
We define G' = (V', T, P', S); for each rule we introduce a set of new variables Y2, ..., Yn, Z1, ..., Zn and define:

    P                  P'
    A → aB             A → aB
    A → λ              A → λ
    A → a1...anB       A → a1Y2, Y2 → a2Y3, ..., Yn → anB
    Z → a1...an        Z → a1Z1, Z1 → a2Z2, ..., Zn−1 → anZn, Zn → λ

We may also eliminate the rules A → B: compute the transitive closure U(A) = {B | B ∈ V and A ⇒* B}, and add A → w for all Z ∈ U(A) and (Z → w) ∈ P'.

Theorem (Reduction of a Type 3 grammar to a λ-NFA)
For any language L of a Type 3 grammar there exists a λ-NFA recognizing the same language.
Proof: Reduction of a Type 3 grammar to a λ-NFA
We take a grammar G = (V, T, P, S) generating L with all productions of the form A → aB, A → B, A → λ, A, B ∈ V, a ∈ T (previous lemma). We define a λ-NFA A = (V, T, δ, {S}, F), where:
    F = {A | (A → λ) ∈ P},
    δ(A, a) = {B | (A → aB) ∈ P},
    δ(A, λ) = {B | (A → B) ∈ P}.
Then L(G) = L(A):
- λ ∈ L(G) ⇔ (S → λ) ∈ P ⇔ S ∈ F ⇔ λ ∈ L(A).
- a1...an ∈ L(G)
    ⇔ there exists a derivation (S ⇒* a1H1 ⇒* ... ⇒ a1...anHn ⇒ a1...an)
    ⇔ ∃ H0, ..., Hn ∈ V such that H0 = S, Hn ∈ F, and
        Hi+1 ∈ δ(Hi, ak) for the step a1...ak−1Hi ⇒ a1...ak−1akHi+1,
        Hi+1 ∈ δ(Hi, λ) for the step a1...akHi ⇒ a1...akHi+1
    ⇔ a1...an ∈ L(A).

Left (and right) linear grammars

Definition (Left (and right) linear grammars)
Type 3 grammars are also called right linear (the variable is always on the right). A grammar G is left linear iff all productions have the form A → Bw, A → w, A, B ∈ V, w ∈ T*.
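
Returning to the two reductions for right linear grammars (FA to grammar, and standard-form Type 3 grammar to λ-NFA), the following Python sketch chains them and checks the result on the divisible-by-5 example. Several things here are assumptions and not part of the lecture: the encoding of a production as a triple (head, terminal, next_variable), all function and constant names, and the rule "reading bit b in the state with remainder r leads to remainder (2r + b) mod 5" used to build the automaton. That rule was chosen because it reproduces the productions A → 1B | 0A | λ, ..., E → 0D | 1E of the example.

```python
LAMBDA = ""  # the empty string stands for λ


def dfa_to_grammar(states, alphabet, delta, finals):
    """FA to grammar: p -> a q iff delta(p, a) = q, and p -> λ iff p in F.

    A production is a triple (head, terminal, next_variable);
    p -> λ is encoded as (p, LAMBDA, None).
    """
    productions = {(p, a, delta[p, a]) for p in states for a in alphabet}
    productions |= {(p, LAMBDA, None) for p in finals}
    return productions


def grammar_to_nfa(productions):
    """Grammar to λ-NFA: F = {A | A -> λ}, δ(A, a) = {B | A -> aB},
    δ(A, λ) = {B | A -> B}. Expects productions in the standard form."""
    finals = {head for head, a, nxt in productions
              if nxt is None and a == LAMBDA}
    delta = {}
    for head, a, nxt in productions:
        if nxt is not None:
            delta.setdefault((head, a), set()).add(nxt)
    return delta, finals


def lambda_closure(states, delta):
    """All states reachable from `states` using λ-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, LAMBDA), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(word, start, delta, finals):
    """Simulate the λ-NFA on `word`, tracking the set of current states."""
    current = lambda_closure({start}, delta)
    for a in word:
        moved = {r for q in current for r in delta.get((q, a), ())}
        current = lambda_closure(moved, delta)
    return bool(current & finals)


# Assumed automaton: the state with index r stands for remainder r,
# and reading bit b moves to the state with index (2 * r + b) % 5.
NAMES = "ABCDE"
DFA_DELTA = {(NAMES[r], str(b)): NAMES[(2 * r + b) % 5]
             for r in range(5) for b in (0, 1)}

GRAMMAR = dfa_to_grammar(NAMES, "01", DFA_DELTA, finals={"A"})
NFA_DELTA, NFA_FINALS = grammar_to_nfa(GRAMMAR)

print(sorted(p for p in GRAMMAR if p[0] == "C"))    # productions of C
print(accepts("1010", "A", NFA_DELTA, NFA_FINALS))  # 10 is divisible by 5
print(accepts("111", "A", NFA_DELTA, NFA_FINALS))   # 7 is not
```

The simulation keeps a set of current states, so it handles the nondeterminism and the λ-transitions that unit productions A → B introduce, even though this particular example has none.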