Chomsky and Greibach Normal Forms


Teodor Rus
[email protected]
The University of Iowa, Department of Computer Science
Computation Theory – p.1/27

Simplifying a CFG

- It is often convenient to simplify a CFG.
- One of the simplest and most useful simplified forms of a CFG is the Chomsky normal form.
- Another normal form, usually used in algebraic specifications, is the Greibach normal form.

Note the difference between grammar cleaning and grammar simplification!

Fact

Normal forms are useful when more advanced topics in computation theory are approached, as we shall see further on.

Definition

A context-free grammar G is in Chomsky normal form if every rule is of the form:

  A → BC
  A → a

where a is a terminal, A, B, C are nonterminals, and B, C may not be the start variable (the axiom).

Observation

The rule S → ε, where S is the start variable, is not excluded from a CFG in Chomsky normal form.

Theorem 2.9

Any context-free language is generated by a context-free grammar in Chomsky normal form.

Proof idea:

- Show that any CFG G can be converted into a CFG G′ in Chomsky normal form.
- The conversion procedure has several stages, in which the rules that violate the Chomsky normal form conditions are replaced with equivalent rules that satisfy these conditions.
- Order of transformations: (1) add a new start variable, (2) eliminate all ε-rules, (3) eliminate unit rules, (4) convert the remaining rules.
- Check that the obtained CFG G′ defines the same language as the initial CFG G.

Proof

Let G = (N, Σ, R, S) be the original CFG.

Step 1: add a new start symbol S0 to N, and the rule S0 → S to R.

Note: this change guarantees that the start symbol of G′ does not occur on the rhs of any rule.

Step 2: eliminate ε-rules

Repeat:
1. Eliminate an ε-rule A → ε from R, where A is not the start symbol.
2. For each occurrence of A on the rhs of a rule, add a new rule to R with that occurrence of A deleted. Examples: (1) replace B → uAv by B → uAv | uv; (2) replace B → uAvAw by B → uAvAw | uvAw | uAvw | uvw.
3. Replace the rule B → A (if it is present) by B → A | ε, unless the rule B → ε has been previously eliminated.
until all ε-rules are eliminated.

Step 3: remove unit rules

Repeat:
1. Remove a unit rule A → B ∈ R.
2. For each rule B → u ∈ R, add the rule A → u to R, unless B → u is a unit rule previously removed.
until all unit rules are eliminated.

Note: u is a string of variables and terminals.

Convert all remaining rules

Repeat:
1. Replace a rule A → u1u2...uk, k ≥ 3, where each ui, 1 ≤ i ≤ k, is a variable or a terminal, by: A → u1A1, A1 → u2A2, ..., Ak−2 → uk−1uk, where A1, A2, ..., Ak−2 are new variables.
2. If k ≥ 2, replace any terminal ui with a new variable Ui and add the rule Ui → ui.
until no rules of the form A → u1u2...uk with k ≥ 3 remain.

Example CFG conversion

Consider the grammar G6 whose rules are:

  S → ASA | aB
  A → B | S
  B → b | ε

Notation: in the original slides, removed symbols are shown in green and added ones in red.
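Before tracing the example, the conversion stages above can be sketched in code. The following Python sketch implements only the last stage ("convert all remaining rules"); the dict-of-lists grammar representation, the lowercase-terminals convention, and every helper name are assumptions of this sketch, not part of the slides.

```python
def convert_remaining(rules):
    """Replace A -> u1 u2 ... uk (k >= 3) by a chain
    A -> u1 A1, A1 -> u2 A2, ..., A_{k-2} -> u_{k-1} u_k,
    then lift terminals out of two-symbol bodies with fresh
    variables U -> a.  `rules` maps a variable to a list of
    bodies; each body is a list of symbols."""
    new_rules = {}
    counter = [0]
    term_var = {}  # terminal -> fresh variable U with rule U -> terminal

    def fresh(prefix):
        counter[0] += 1
        return f"{prefix}{counter[0]}"

    def add(lhs, body):
        new_rules.setdefault(lhs, []).append(body)

    def lift(sym):
        # convention of this sketch: terminals are lowercase
        if sym.islower():
            if sym not in term_var:
                term_var[sym] = fresh("U")
                add(term_var[sym], [sym])
            return term_var[sym]
        return sym

    for lhs, bodies in rules.items():
        for body in bodies:
            if len(body) <= 1:
                add(lhs, body)                 # A -> a (or A -> ε) stays
            elif len(body) == 2:
                add(lhs, [lift(s) for s in body])
            else:                              # k >= 3: binarize the body
                head = lhs
                for sym in body[:-2]:
                    nxt = fresh("X")
                    add(head, [lift(sym), nxt])
                    head = nxt
                add(head, [lift(body[-2]), lift(body[-1])])
    return new_rules
```

Applied to the G6 rules, S → ASA becomes a two-symbol chain through a new variable and S → aB gets its leading terminal lifted into a rule U → a, mirroring the A1 and U variables of the slides (the generated names here are arbitrary).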
After the first step of the transformation we get:

  S0 → S
  S → ASA | aB
  A → B | S
  B → b | ε

Removing ε-rules

Removing B → ε:

  S0 → S
  S → ASA | aB | a
  A → B | S | ε
  B → b

Removing A → ε:

  S0 → S
  S → ASA | aB | a | SA | AS | S
  A → B | S
  B → b

Removing unit rules

Removing S → S:

  S0 → S
  S → ASA | aB | a | SA | AS
  A → B | S
  B → b

Removing S0 → S:

  S0 → ASA | aB | a | SA | AS
  S → ASA | aB | a | SA | AS
  A → B | S
  B → b

More unit rules

Removing A → B:

  S0 → ASA | aB | a | SA | AS
  S → ASA | aB | a | SA | AS
  A → S | b
  B → b

Removing A → S:

  S0 → ASA | aB | a | SA | AS
  S → ASA | aB | a | SA | AS
  A → b | ASA | aB | a | SA | AS
  B → b

Converting the remaining rules:

  S0 → AA1 | UB | a | SA | AS
  S → AA1 | UB | a | SA | AS
  A → b | AA1 | UB | a | SA | AS
  A1 → SA
  U → a
  B → b

Observation

- The conversion procedure produces several variables Ui along with several rules Ui → a.
- Since all these represent the same rule, we may simplify the result using a single variable U and a single rule U → a.

Greibach Normal Form

A context-free grammar G = (V, Σ, R, S) is in Greibach normal form if each rule r ∈ R has the property: lhs(r) ∈ V, rhs(r) = aα, with a ∈ Σ and α ∈ V*.

Note: Greibach normal form provides a justification of the operator prefix-notation usually employed in algebra.

Greibach Theorem

Every CFL L with ε ∉ L can be generated by a CFG in Greibach normal form.

Proof idea: Let G = (V, Σ, R, S) be a CFG generating L, and assume that G is in Chomsky normal form.

- Let V = {A1, A2, ..., Am} be an ordering of the nonterminals.
- Construct the Greibach normal form from the Chomsky normal form.

Construction

1. Modify the rules in R so that if Ai → Ajγ ∈ R then j > i.
2. Starting with A1 and proceeding to Am, this is done as follows:
   (a) Assume that the productions have been modified so that for 1 ≤ i ≤ k, Ai → Ajγ ∈ R only if j > i.
   (b) If Ak → Ajγ is a production with j < k, generate a new set of productions by substituting for Aj the rhs of each Aj-production.
   (c) Repeating (b) at most k − 1 times, we obtain rules of the form Ak → Apγ, p ≥ k.
   (d) Replace the rules Ak → Akγ by removing left-recursive rules.

Facts

Let Am be the highest-order nonterminal generated during the Greibach normal form construction algorithm. If no left-recursive rules (Ai → Aiα, for some 1 ≤ i ≤ m) are in the grammar, then:

1. The leftmost symbol on the rhs of each rule for Am must be a terminal, since Am is the highest-order variable.
2. The leftmost symbol on the rhs of any rule for Am−1 must be either Am or a terminal symbol. If it is Am, we can generate new rules replacing Am with the rhs of the Am-rules.
3. Repeating (2) for Am−2, ..., A2, A1, we obtain only rules whose rhs start with terminal symbols.

Left recursion

A CFG containing rules of the form A → Aα | β is called left-recursive in A. Such rules generate derivations of the form A ⇒* βα^n. If we replace the rules A → Aα | β by

  A → βB | β
  B → αB | α

where B is a new variable, then the language generated by A is the same, while no left-recursive A-rules are used in the derivation.

Removing left-recursion

Left-recursion can be eliminated by the following scheme:

- If A → Aα1 | Aα2 | ... | Aαr are all the left-recursive A-rules, and A → β1 | β2 | ... | βs are all the remaining A-rules, then choose a new nonterminal, say B.
- Add the new B-rules B → αi | αiB, 1 ≤ i ≤ r.
- Replace the A-rules by A → βi | βiB, 1 ≤ i ≤ s.

This construction preserves the language L.
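The scheme above can be written out as a small Python function (a minimal sketch; the list-of-bodies representation and the externally supplied fresh variable name are assumptions of this illustration):

```python
def remove_left_recursion(A, bodies, new_var):
    """Apply the scheme: split the A-rules into left-recursive
    ones (A -> A alpha) and the rest (A -> beta), then rebuild
    them as A -> beta B | beta and B -> alpha B | alpha, where
    B = new_var.  Bodies are lists of symbol strings."""
    alphas = [b[1:] for b in bodies if b and b[0] == A]  # A -> A alpha
    betas = [b for b in bodies if not b or b[0] != A]    # A -> beta
    if not alphas:            # A is not left-recursive: nothing to do
        return {A: bodies}
    B = new_var
    return {
        A: [beta + [B] for beta in betas] + betas,       # A -> beta B | beta
        B: [alpha + [B] for alpha in alphas] + alphas,   # B -> alpha B | alpha
    }
```

For A → Aa | b this returns A → bB | b and B → aB | a, exactly the replacement described in the scheme.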
Conversion algorithm

  for (k = 1; k <= m; k++) {
      for (j = 1; j <= k-1; j++) {
          for (each rule A_k -> A_j alpha) {
              for (all rules A_j -> beta)
                  add A_k -> beta alpha;
              remove A_k -> A_j alpha;
          }
      }
      // eliminate left recursion on A_k, once the substitutions are done
      for (each rule A_k -> A_k alpha) {
          add rules B_k -> alpha | alpha B_k;
          remove A_k -> A_k alpha;
      }
      if (a variable B_k was introduced)
          for (each rule A_k -> beta, where beta does not begin with A_k)
              add rule A_k -> beta B_k;
  }

More on Greibach NF

See Introduction to Automata Theory, Languages, and Computation, J.E. Hopcroft and J.D. Ullman, Addison-Wesley, 1979, pp. 94–96.

Example

Convert the CFG G = ({A1, A2, A3}, {a, b}, R, A1), where

  R = {A1 → A2A3, A2 → A3A1 | b, A3 → A1A2 | a},

into Greibach normal form.

Solution

1. Ordering the rules: only the A3-rules violate the ordering condition, hence only the A3-rules need to be changed. Following the procedure we replace the A3-rules by: A3 → A3A1A3A2 | bA3A2 | a.
2. Eliminating left recursion we get: A3 → bA3A2B3 | aB3 | bA3A2 | a, with the new rules B3 → A1A3A2 | A1A3A2B3.
3. All A3-rules now start with a terminal. We use them to rewrite the A2-rules, and then the resulting A2-rules to rewrite A1 → A2A3, so that all A2- and A1-rules start with a terminal.
4. Finally, the B3-rules start with A1; we use the A1-productions to make them start with a terminal.

The result

  A3 → bA3A2B3 | bA3A2 | aB3 | a
  A2 → bA3A2B3A1 | bA3A2A1 | aB3A1 | aA1 | b
  A1 → bA3A2B3A1A3 | bA3A2A1A3 | aB3A1A3 | aA1A3 | bA3
  B3 → bA3A2B3A1A3A3A2B3 | bA3A2B3A1A3A3A2
  B3 → bA3A3A2B3 | bA3A3A2 | bA3A2A1A3A3A2B3 | bA3A2A1A3A3A2
  B3 → aB3A1A3A3A2B3 | aB3A1A3A3A2
  B3 → aA1A3A3A2B3 | aA1A3A3A2

Disclaimer: I copied this result from Hopcroft and Ullman, pages 98–99! Hence, I cannot guarantee its correctness.
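The substitution and left-recursion phases of the conversion algorithm can be rendered in Python as below. This is a sketch only: the grammar representation and the B_k naming are assumptions of this illustration, and the final back-substitution pass that makes every remaining rule start with a terminal is left out.

```python
def greibach_substitute(rules, order):
    """Run the k/j loops of the conversion algorithm: for each A_k
    (in the given order), expand bodies that begin with an earlier
    A_j using the A_j bodies, then remove left recursion on A_k
    with a fresh variable named "B_" + A_k.  `rules` maps each
    variable to a list of bodies (lists of symbols)."""
    rules = {a: [list(b) for b in bodies] for a, bodies in rules.items()}
    for k, Ak in enumerate(order):
        # substitution phase: expand A_k -> A_j alpha for all j < k
        for Aj in order[:k]:
            expanded = []
            for body in rules[Ak]:
                if body and body[0] == Aj:
                    for beta in rules[Aj]:
                        expanded.append(beta + body[1:])
                else:
                    expanded.append(body)
            rules[Ak] = expanded
        # left-recursion phase: A_k -> A_k alpha becomes
        # A_k -> beta | beta B_k and B_k -> alpha | alpha B_k
        alphas = [b[1:] for b in rules[Ak] if b and b[0] == Ak]
        if alphas:
            betas = [b for b in rules[Ak] if not b or b[0] != Ak]
            Bk = "B_" + Ak
            rules[Ak] = betas + [b + [Bk] for b in betas]
            rules[Bk] = alphas + [a + [Bk] for a in alphas]
    return rules
```

On the example grammar above, this reproduces steps 1 and 2 of the solution: the A3-rules come out as bA3A2 | a | bA3A2B3 | aB3 (with B3 here named "B_A3") and B3 → A1A3A2 | A1A3A2B3.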