Appendix A: Hierarchy of Grammar Formalisms

Total Page:16

File Type:pdf, Size:1020Kb

Appendix A: Hierarchy of Grammar Formalisms Appendix A: Hierarchy of Grammar Formalisms The following figure recalls the language hierarchy that we have developed in the course of the book. '' $$ '' $$ '' $$ ' ' $ $ CFG, 1-MCFG, PDA n n L1 = {a b | n ≥ 0} & % TAG, LIG, CCG, tree-local MCTAG, EPDA n n n ∗ L2 = {a b c | n ≥ 0}, L3 = {ww | w ∈{a, b} } & % 2-LCFRS, 2-MCFG, simple 2-RCG & % 3-LCFRS, 3-MCFG, simple 3-RCG n n n n n ∗ L4 = {a1 a2 a3 a4 a5 | n ≥ 0}, L5 = {www | w ∈{a, b} } & % ... LCFRS, MCFG, simple RCG, MG, set-local MCTAG, finite-copying LFG & % Thread Automata (TA) & % PMCFG 2n L6 = {a | n ≥ 0} & % RCG, simple LMG (= PTIME) & % mildly context-sensitive L. Kallmeyer, Parsing Beyond Context-Free Grammars, Cognitive Technologies, DOI 10.1007/978-3-642-14846-0, c Springer-Verlag Berlin Heidelberg 2010 216 Appendix A For each class the different formalisms and automata that generate/accept exactly the string languages contained in this class are listed. Furthermore, examples of typical languages for this class are added, i.e., of languages that belong to this class while not belonging to the next smaller class in our hier- archy. The inclusions are all proper inclusions, except for the relation between LCFRS and Thread Automata (TA). Here, we do not know whether the in- clusion is a proper one. It is possible that both devices yield the same class of languages. Appendix B: List of Acronyms The following table lists all acronyms that occur in this book. (2,2)-BRCG Binary bottom-up non-erasing RCG with at most two vari- ables per left-hand side argument 2-SA Two-Stack Automaton ACG Abstract Categorial Grammar AFL Abstract Family of Languages CCG Combinatory Categorial Grammar CFG Context-Free Grammar CNF Chomsky Normal Form EPDA Extended Push-Down Automaton FSA Finite State Automaton GCFG Generalized Context-Free Grammar GNF Greibach Normal Form HPSG Head-Driven Phrase Structure Grammar IG Indexed Grammar LCFRS Linear Context-Free Rewriting System LFG Lexical Functional Grammar LIG Linear Indexed Grammar LTAG Lexicalized TAG LMG Literal Movement Grammar MCFG Multiple Context-Free Grammar MCTAG Multicomponent Tree Adjoining Grammar MG Minimalist Grammar NRCG Negative Range Concatenation Grammar NPDA Nested Push-Down Automaton PDA Push-Down Automaton PMCFG Parallel Multiple Context-Free Grammar PRCG Positive Range Concatenation Grammar RCG Range Concatenation Grammar SD-2SA Strongly-Driven Two-Stack Automaton 218 Appendix B SNMCTAG tree-local MCTAG with shared nodes SRCG Simple Range Concatenation Grammar TA Thread Automaton TAG Tree Adjoining Grammar TSG Tree Substitution Grammar TT-MCTAG Tree-Tuple MCTAG with Shared Nodes V-TAG Vector-TAG Solutions Problems of Chapter 2 n n 2.1 L2 := {a b | n ≥ 0} 1. G = N,T,P,S with N = {S}, T = {a, b}, start symbol S and produc- tions S → aSb, S → ε. 2. Assume that such a CFG exists. Its productions are then all of the form X → αaβbγ with X ∈ N, α, β, γ ∈ N ∗ such that if such a production is applied when generating a string a1 ...anb1 ...bn, then the a and b of the production necessarily end up at positions i and n+i for some i, 1 ≤ i ≤ n. Then replacing each of these productions X → αaβbγ with X → αaβaγ and X → αbβbγ leads to a CFG generating the copy language. This contradicts the fact that the copy language is not context-free. 2.2 A first homomorphism can the homomorphism f from (Shieber, 1985). After having applied f to Swiss German, we intersect with the regular lan- ∗ ∗ ∗ ∗ guage w{a, b} x{c, d} y, which leads to {wv1xv2y | v1 ∈{a, b} ,v2 ∈{c, d} such that |v1| = |v2| and for all i,1≤ i ≤|v1|,iftheith symbol in v1 is an a (a b), the ith symbol in v2 is a c (a d)}. Finally we apply a second homomorphism g to the result of the intersection with g(w):=g(x):=g(y):=ε, g(a):=g(c):=a, g(b):=g(d):=b. This leads to the copy language. 2.3 There are several possibilities. The simplest one: S VP VP NP VP ADV VP V John always laughs 220 Solutions Linguistically, this is unsatisfying since the S node comes with the lexical item John even though it is the maximal projection of the verb. A lexicalized TSG for the given CFG where the S node comes with the verb is not possible. 2.4 1. The copy language L := {ww | w ∈ T ∗} is letter equivalent to L := {wwR | w ∈ T ∗ and wR is w in reverse order},whichisaCFL:Itis generated by the CFG with productions S → ε and S → xSx for all x ∈ T . Consequently (with Parikh’s theorem) L and also L are semilinear. 2n 2. Assume that {a | n ≥ 0} satisfies the constant growth property with c0 2m m and C. Then take a w = a with |w| =2 >max({c0}∪C). Then, m+1 according to the definition of constant growth, for w = a2 there must k be a w = a2 with |w| = |w| + c for some c ∈ C. I.e., 2m+1 =2k + c. Consequently (since k ≤ m) c ≥ 2m. Contradiction. Problems of Chapter 3 3.1 The items can have the form [•A, i, −], 0 ≤ i ≤ n (for predicted categories) and [A•,i,j], 0 ≤ i<j≤ n (for completed categories). The goal item is [S•, 0,n]. We need the following deduction rules: 1. An operation scan-predict that, starting from a predicted A-item, predicts a B-item, based on the existence of a production A → aB: [•A, i, −] there is a production A → aB ∈ P with wi+1 = a. [•B,i +1, −] 2. An operation scan that turns a predicted A into a completed A based on the existence of a production A → a: [•A, i, −] there is a production A → a ∈ P with wi+1 = a. [A•,i,i+1] 3. An operation complete that turns a predicted A into a completed A based on the existence of a completed B and a production A → aB: [•A, i, −][B•,i+1,j] there is a production A → aB ∈ P with wi+1 = a. [A•,i,j] 3.2 ∗ ∗ To show: If [A → α • β,i,j] then S ⇒ w1 ···wiAγ ⇒ w1 ···wiαβγ ⇒ ∗ w1 ···wjβγ for some γ ∈ (N ∪ T ) . We show this by an induction on the deduction rules: • Axioms: S → α ∈ P [S →•α, 0, 0] ∗ Trivially, if [S →•α, 0, 0] is obtained by this rule, then S ⇒ εS ⇒ εα. Solutions 221 [A → α • Bβ,i,j] • Predict: B → γ ∈ P [B →•γ,j,j] We assume that the claim holds for the antecedent item. We then obtain ∗ ∗ (because of the production B → γ) S ⇒ w1 ···wiAγ ⇒ w1 ···wiαBβγ ⇒ w1 ···wjBβγ ⇒ w1 ···wjγγ . [A → α • aβ,i,j] • Scan: wj+1 = a [A → αa • β,i,j +1] Since the claim holds for the antecedent item, with the side condition, it holds immediately for the consequent item. [A → α • Bβ,i,j], [B → γ•,j,k] • Complete: [A → αB • β,i,k] Since the induction claim holds for the antecedent items, we have in par- ∗ ticular B ⇒ wj+1 ···wk. With this and the claim for the antecedent A- ∗ ∗ item, we obtain S ⇒ w1 ···wiAγ ⇒ w1 ···wiαBβγ ⇒ w1 ···wjBβγ ⇒ w1 ···wkβγ . The algorithm is sound since, as a special case, we obtain for the goal ∗ items that [S → α•, 0,n] implies S ⇒ α ⇒ w1 ···wn. 3.3 The space complexity of the CYK algorithm is determined by the memory requirement for the chart. Since this is an |N|×n×n table, we obtain a space complexity O(n2). 3.4 The most complex rule is complete with three indices ranging form 0 to 2 n and with |P | possible productions and lrhs = maxA→α∈P (|α| +1)pos- sible positions for the dot in the antecedent A-items. Note that, once the A-production is fixed, the position of the dot determines the left-hand side symbol of the second production. Let mrhs = maxA∈N |{A → α ∈ P }|. Then 3 we have ≤ lrhs ·|P |·mrhs(n +1) possible different applications of complete. Consequently, as in the CYK case, the time complexity of the fixed recognition problem is O(n3). Problems of Chapter 4 4.1 1. TAG for L3: SNA S aS ε ∗ bSNA c 222 Solutions 2. Assume that a TAG G for L3 without adjunction constraints exists. As- sume without loss of generality that G contains no substitution nodes. G has at least one auxiliary tree β with leaves labeled with terminals. β must contain equal numbers of as, bsandcs. (Otherwise one could de- rive a word with different numbers of as, bsandcs.) One can adjoin β at its root, which leads to a derived auxiliary β. If the yield of β is aibici (i ≥ 1), there are the following possibilities for the foot node: a) The foot node is left of all as or right of all cs ⇒ using β a word with substring aibiciaibici can be derived. Contradiction. b) The foot node is right of the kth a for some k,1≤ k ≤ i ⇒ using β a word with substring ai−kbiciai−kbici can be derived. Contradiction. c) The foot node is right of the kth b for some k,1≤ k ≤ i ⇒ using β a word with substrings aibkaibk and bi−kcibi−kci can be derived. Contradiction. d) The foot node is left of the kth c for some k,1≤ k ≤ i ⇒ using β a word with substring aibick−1aibick−1 can be derived. Contradiction. Consequently, there is no TAG without adjunction constraints for L3.
Recommended publications
  • The Structure of Index Sets and Reduced Indexed Grammars Informatique Théorique Et Applications, Tome 24, No 1 (1990), P
    INFORMATIQUE THÉORIQUE ET APPLICATIONS R. PARCHMANN J. DUSKE The structure of index sets and reduced indexed grammars Informatique théorique et applications, tome 24, no 1 (1990), p. 89-104 <http://www.numdam.org/item?id=ITA_1990__24_1_89_0> © AFCET, 1990, tous droits réservés. L’accès aux archives de la revue « Informatique théorique et applications » im- plique l’accord avec les conditions générales d’utilisation (http://www.numdam. org/conditions). Toute utilisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit contenir la présente mention de copyright. Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques http://www.numdam.org/ Informatique théorique et Applications/Theoretical Informaties and Applications (vol. 24, n° 1, 1990, p. 89 à 104) THE STRUCTURE OF INDEX SETS AND REDUCED INDEXED GRAMMARS (*) by R. PARCHMANN (*) and J. DUSKE (*) Communicated by J. BERSTEL Abstract. - The set of index words attached to a variable in dérivations of indexed grammars is investigated. Using the regularity of these sets it is possible to transform an mdexed grammar in a reducedfrom and to describe the structure ofleft sentential forms of an indexed grammar. Résumé. - On étudie Vensemble des mots d'index d'une variable dans les dérivations d'une grammaire d'index. La rationalité de ces ensembles peut être utilisée pour transformer une gram- maire d'index en forme réduite, et pour décrire la structure des mots apparaissant dans les dérivations gauches d'une grammaire d'index. 1. INTRODUCTION In this paper we will further investigate indexed grammars and languages introduced by Aho [1] as an extension of context-free grammars and lan- guages.
    [Show full text]
  • Genome Grammars
    Genome Grammars Genome Sequences and Formal Languages Andreas de Vries Version: June 17, 2011 Wir sind aus Staub und Fantasie Andreas Bourani, Nur in meinem Kopf (2011) Contents 1 Genetics 5 1.1 Cell physiology ............................. 5 1.2 Amino acids and proteins ....................... 6 1.2.1 Geometry of peptide bonds .................. 8 1.2.2 Protein structure ......................... 10 1.3 Nucleic acids ............................... 12 1.4 DNA replication ............................. 14 1.5 Flow of information for cell growth .................. 16 1.5.1 The genetic code ......................... 18 1.5.2 Open reading frames, coding regions, and genes ...... 19 2 Formal languages 23 2.1 The Chomsky hierarchy ......................... 25 2.2 Regular languages ............................ 28 2.2.1 Regular expressions ....................... 30 2.3 Context-free languages ......................... 31 2.3.1 Linear languages ........................ 33 2.4 Context-sensitive languages ...................... 34 2.4.1 Indexed languages ....................... 35 2.5 Languages and machines ........................ 38 3 Grammar of DNA Strings 42 3.1 Searl’s approach to DNA language .................. 42 3.2 Gene regulation and inadequacy of context-free grammars ..... 45 3.3 DNA splicing rule ............................ 46 A Mathematical Foundations 49 A.1 Notation ................................. 49 A.2 Sets .................................... 49 A.3 Maps ................................... 52 A.4 Algebra .................................
    [Show full text]
  • Probabilistic Grammars and Their Applications This Discretion to Pursue Political and Economic Ends
    Probabilistic Grammars and their Applications this discretion to pursue political and economic ends. and the Law; Monetary Policy; Multinational Cor- Most experiences, however, suggest the limited power porations; Regulation, Economic Theory of; Regu- of privatization in changing the modes of governance lation: Working Conditions; Smith, Adam (1723–90); which are prevalent in each country’s large private Socialism; Venture Capital companies. Further, those countries which have chosen the mass (voucher) privatization route have done so largely out of necessity and face ongoing efficiency problems as a result. In the UK, a country Bibliography whose privatization policies are often referred to as a Armijo L 1998 Balance sheet or ballot box? Incentives to benchmark, ‘control [of privatized companies] is not privatize in emerging democracies. In: Oxhorn P, Starr P exerted in the forms of threats of take-over or (eds.) The Problematic Relationship between Economic and bankruptcy; nor has it for the most part come from Political Liberalization. Lynne Rienner, Boulder, CO Bishop M, Kay J, Mayer C 1994 Introduction: privatization in direct investor intervention’ (Bishop et al. 1994, p. 11). After the steep rise experienced in the immediate performance. In: Bishop M, Kay J, Mayer C (eds.) Pri ati- zation and Economic Performance. Oxford University Press, aftermath of privatizations, the slow but constant Oxford, UK decline in the number of small shareholders highlights Boubakri N, Cosset J-C 1998 The financial and operating the difficulties in sustaining people’s capitalism in the performance of newly privatized firms: evidence from develop- longer run. In Italy, for example, privatization was ing countries.
    [Show full text]
  • Grammars and Normal Forms
    Grammars and Normal Forms Read K & S 3.7. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no, AND if yes, describe the structure a + b * c Now it's time to worry about extracting structure (and doing so efficiently). Optimizing Context-Free Languages For regular languages: Computation = operation of FSMs. So, Optimization = Operations on FSMs: Conversion to deterministic FSMs Minimization of FSMs For context-free languages: Computation = operation of parsers. So, Optimization = Operations on languages Operations on grammars Parser design Before We Start: Operations on Grammars There are lots of ways to transform grammars so that they are more useful for a particular purpose. the basic idea: 1. Apply transformation 1 to G to get of undesirable property 1. Show that the language generated by G is unchanged. 2. Apply transformation 2 to G to get rid of undesirable property 2. Show that the language generated by G is unchanged AND that undesirable property 1 has not been reintroduced. 3. Continue until the grammar is in the desired form. Examples: • Getting rid of ε rules (nullable rules) • Getting rid of sets of rules with a common initial terminal, e.g., • A → aB, A → aC become A → aD, D → B | C • Conversion to normal forms Lecture Notes 16 Grammars and Normal Forms 1 Normal Forms If you want to design algorithms, it is often useful to have a limited number of input forms that you have to deal with. Normal forms are designed to do just that.
    [Show full text]
  • CS351 Pumping Lemma, Chomsky Normal Form Chomsky Normal
    CS351 Pumping Lemma, Chomsky Normal Form Chomsky Normal Form When working with a context-free grammar a simple and useful form is called the Chomsky Normal Form (CNF). A CFG in CNF is one where every rule is of the form: A ! BC A ! a Where a is any terminal and A,B, and C are any variables, except that B and C may not be the start variable. Note that we have two and only two variables on the right hand side of the rule, with the exception that the rule S!ε is permitted where S is the start variable. Theorem: Any context free language may be generated by a context-free grammar in Chomsky normal form. To show how to make this conversion, we will need to do three things: 1. Eliminate all ε rules of the form A!ε 2. Eliminate all unit rules of the form A!B 3. Convert remaining rules into rules of the form A!BC Proof: 1. First add a new start symbol S0 and the rule S0 ! S, where S was the original start symbol. This guarantees that the start symbol doesn’t occur on the right hand side of a rule. 2. Remove all ε rules. Remove a rule A!ε where A is not the start symbol For each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence of A deleted. Ex: R!uAv becomes R!uv This must be done for each occurrence of A, so the rule: R!uAvAw becomes R! uvAw and R! uAvw and R!uvw This step must be repeated until all ε rules are removed, not including the start.
    [Show full text]
  • A Declarative Characterization of Different Types of Multicomponent
    A Declarative Characterization of Different Types of Multicomponent Tree Adjoining Grammars† Laura Kallmeyer ([email protected]) Collaborative Research Center 833, University of T¨ubingen, Nauklerstr. 35, D-72074 T¨ubingen, Germany. Abstract. Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG however is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. I.e., this way of characterizing MCTAG does not allow to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) the MCTAG licences. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, they allow a more systematic comparison of different types of MCTAG, and, furthermore, they can be exploited for parsing. Keywords: Tree Adjoining Grammar, MCTAG, multicomponent rewriting, Simple Range Concatenation Grammar 1. Introduction 1.1. Tree Adjoining Grammars Tree Adjoining Grammar (TAG, see Joshi and Schabes, 1997) is a tree- rewriting formalism. A TAG consists of a finite set of trees (elementary trees). The nodes of these trees are labelled with nonterminals and ter- minals (terminals only label leaf nodes).
    [Show full text]
  • What Was Wrong with the Chomsky Hierarchy?
    What Was Wrong with the Chomsky Hierarchy? Makoto Kanazawa Hosei University Chomsky Hierarchy of Formal Languages recursively enumerable context- sensitive context- free regular Enduring impact of Chomsky’s work on theoretical computer science. 15 pages devoted to the Chomsky hierarchy. Hopcroft and Ullman 1979 No mention of the Chomsky hierarchy. 450 INDEX The same with the latest edition of Carmichael, R. D., 444 CNF-formula, 302 Cartesian product, 6, 46 Co-Turing-recognizableHopcroft, language, 209 Motwani, and Ullman (2006). CD-ROM, 349 Cobham, Alan, 444 Certificate, 293 Coefficient, 183 CFG, see Context-free grammar Coin-flip step, 396 CFL, see Context-free language Complement operation, 4 Chaitin, Gregory J., 264 Completed rule, 140 Chandra, Ashok, 444 Complexity class Characteristic sequence, 206 ASPACE(f(n)),410 Checkers, game of, 348 ATIME(t(n)),410 Chernoff bound, 398 BPP,397 Chess, game of, 348 coNL,354 Chinese remainder theorem, 401 coNP,297 Chomsky normal form, 108–111, 158, EXPSPACE,368 198, 291 EXPTIME,336 Chomsky, Noam, 444 IP,417 Church, Alonzo, 3, 183, 255 L,349 Church–Turing thesis, 183–184, 281 NC,430 CIRCUIT-SAT,386 NL,349 Circuit-satisfiability problem, 386 NP,292–298 CIRCUIT-VALUE,432 NPSPACE,336 Circular definition, 65 NSPACE(f(n)),332 Clause, 302 NTIME(f(n)),295 Clique, 28, 296 P,284–291,297–298 CLIQUE,296 PH,414 Sipser 2013Closed under, 45 PSPACE,336 Closure under complementation RP,403 context-free languages, non-, 154 SPACE(f(n)),332 deterministic context-free TIME(f(n)),279 languages, 133 ZPP,439 P,322 Complexity
    [Show full text]
  • LING83600: Context-Free Grammars
    LING83600: Context-free grammars Kyle Gorman 1 Introduction Context-free grammars (or CFGs) are a formalization of what linguists call “phrase structure gram- mars”. The formalism is originally due to Chomsky (1956) though it has been independently discovered several times. The insight underlying CFGs is the notion of constituency. Most linguists believe that human languages cannot be even weakly described by CFGs (e.g., Schieber 1985). And in formal work, context-free grammars have long been superseded by “trans- formational” approaches which introduce movement operations (in some camps), or by “general- ized phrase structure” approaches which add more complex phrase-building operations (in other camps). However, unlike more expressive formalisms, there exist relatively-efficient cubic-time parsing algorithms for CFGs. Nearly all programming languages are described by—and parsed using—a CFG,1 and CFG grammars of human languages are widely used as a model of syntactic structure in natural language processing and understanding tasks. Bibliographic note This handout covers the formal definition of context-free grammars; the Cocke-Younger-Kasami (CYK) parsing algorithm will be covered in a separate assignment. The definitions here are loosely based on chapter 11 of Jurafsky & Martin, who in turn draw from Hopcroft and Ullman 1979. 2 Definitions 2.1 Context-free grammars A context-free grammar G is a four-tuple N; Σ; R; S such that: • N is a set of non-terminal symbols, corresponding to phrase markers in a syntactic tree. • Σ is a set of terminal symbols, corresponding to words (i.e., X0s) in a syntactic tree. • R is a set of production rules.
    [Show full text]
  • The Logic of Categorial Grammars: Lecture Notes Christian Retoré
    The Logic of Categorial Grammars: Lecture Notes Christian Retoré To cite this version: Christian Retoré. The Logic of Categorial Grammars: Lecture Notes. RR-5703, INRIA. 2005, pp.105. inria-00070313 HAL Id: inria-00070313 https://hal.inria.fr/inria-00070313 Submitted on 19 May 2006 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE The Logic of Categorial Grammars Lecture Notes Christian Retoré N° 5703 Septembre 2005 Thème SYM apport de recherche ISRN INRIA/RR--5703--FR+ENG ISSN 0249-6399 The Logic of Categorial Grammars Lecture Notes Christian Retoré ∗ Thème SYM — Systèmes symboliques Projet Signes Rapport de recherche n° 5703 — Septembre 2005 —105 pages Abstract: These lecture notes present categorial grammars as deductive systems, and include detailed proofs of their main properties. The first chapter deals with Ajdukiewicz and Bar-Hillel categorial grammars (AB grammars), their relation to context-free grammars and their learning algorithms. The second chapter is devoted to the Lambek calculus as a deductive system; the weak equivalence with context free grammars is proved; we also define the mapping from a syntactic analysis to a higher-order logical formula, which describes the seman- tics of the parsed sentence.
    [Show full text]
  • Mildly Context-Sensitive Grammar-Formalisms
    Mildly Context Sensitive Grammar Formalisms Petra Schmidt Pfarrer-Lauer-Strasse 23 66386 St. Ingbert [email protected] Overview ,QWURGXFWLRQ«««««««««««««««««««««««««««««««««««« 3 Tree-Adjoining-Grammars 7$*V «««««««««««««««««««««««««« 4 CoPELQDWRU\&DWHJRULDO*UDPPDU &&*V ««««««««««««««««««««««« 7 /LQHDU,QGH[HG*UDPPDU /,*V ««««««««««««««««««««««««««« 8 3URRIRIHTXLYDOHQFH«««««««««««««««««««««««««««««««« 9 CCG ĺ /,*«««««««««««««««««««««««««««««««««« 9 LIG ĺ 7$*«««««««««««««««««««««««««««««««««« 10 TAG ĺ &&*«««««««««««««««««««««««««««««««««« 13 Linear Context-)UHH5HZULWLQJ6\VWHPV /&)56V «««««««««««««««««««« 15 Multiple Context-Free-*UDPPDUV««««««««««««««««««««««««««« 16 &RQFOXVLRQ«««««««««««««««««««««««««««««««««««« 17 References««««««««««««««««««««««««««««««««««««« 18 Introduction Natural languages are not context free. An example non-context-free phenomenon are cross-serial dependencies in Dutch, an example of which we show in Fig. 1. ik haar hem de nijlpaarden zag helpen voeren Fig. 1.: cross-serial dependencies A concept motivated by the intention of characterizing a class of formal grammars, which allow the description of natural languages such as Dutch with its cross-serial dependencies was introduced in 1985 by Aravind Joshi. This class of formal grammars is known as the class of mildly context-sensitive grammar-formalisms. According to Joshi (1985, p.225), a mildly context-sensitive language L has to fulfill three criteria to be understood as a rough characterization. These are: The parsing problem for L is solvable in
    [Show full text]
  • Cs.FL] 7 Dec 2020 4 a Model Programming Language and Its Grammar 11 4.1 Alphabet
    Describing the syntax of programming languages using conjunctive and Boolean grammars∗ Alexander Okhotin† December 8, 2020 Abstract A classical result by Floyd (\On the non-existence of a phrase structure grammar for ALGOL 60", 1962) states that the complete syntax of any sensible programming language cannot be described by the ordinary kind of formal grammars (Chomsky's \context-free"). This paper uses grammars extended with conjunction and negation operators, known as con- junctive grammars and Boolean grammars, to describe the set of well-formed programs in a simple typeless procedural programming language. A complete Boolean grammar, which defines such concepts as declaration of variables and functions before their use, is constructed and explained. Using the Generalized LR parsing algorithm for Boolean grammars, a pro- gram can then be parsed in time O(n4) in its length, while another known algorithm allows subcubic-time parsing. Next, it is shown how to transform this grammar to an unambiguous conjunctive grammar, with square-time parsing. This becomes apparently the first specifica- tion of the syntax of a programming language entirely by a computationally feasible formal grammar. Contents 1 Introduction 2 2 Conjunctive and Boolean grammars 5 2.1 Conjunctive grammars . .5 2.2 Boolean grammars . .6 2.3 Ambiguity . .7 3 Language specification with conjunctive and Boolean grammars 8 arXiv:2012.03538v1 [cs.FL] 7 Dec 2020 4 A model programming language and its grammar 11 4.1 Alphabet . 11 4.2 Lexical conventions . 12 4.3 Identifier matching . 13 4.4 Expressions . 14 4.5 Statements . 15 4.6 Function declarations . 16 4.7 Declaration of variables before use .
    [Show full text]
  • From Indexed Grammars to Generating Functions∗
    RAIRO-Theor. Inf. Appl. 47 (2013) 325–350 Available online at: DOI: 10.1051/ita/2013041 www.rairo-ita.org FROM INDEXED GRAMMARS TO GENERATING FUNCTIONS ∗ Jared Adams1, Eric Freden1 and Marni Mishna2 Abstract. We extend the DSV method of computing the growth series of an unambiguous context-free language to the larger class of indexed languages. We illustrate the technique with numerous examples. Mathematics Subject Classification. 68Q70, 68R15. 1. Introduction 1.1. Indexed grammars Indexed grammars were introduced in the thesis of Aho in the late 1960s to model a natural subclass of context-sensitive languages, more expressive than context-free grammars with interesting closure properties [1,16]. The original ref- erence for basic results on indexed grammars is [1]. The complete definition of these grammars is equivalent to the following reduced form. Definition 1.1. A reduced indexed grammar is a 5-tuple (N , T , I, P, S), such that 1. N , T and I are three mutually disjoint finite sets of symbols: the set N of non-terminals (also called variables), T is the set of terminals and I is the set of indices (also called flags); 2. S ∈N is the start symbol; 3. P is a finite set of productions, each having the form of one of the following: (a) A → α Keywords and phrases. Indexed grammars, generating functions, functional equations, DSV method. ∗ The third author gratefully acknowledges the support of NSERC Discovery Grant funding (Canada), and LaBRI (Bordeaux) for hosting during the completion of the work. 1 Department of Mathematics, Southern Utah University, Cedar City, 84720, UT, USA.
    [Show full text]