The Tibidabo Treebank

Total Page:16

File Type:pdf, Size:1020Kb

The Tibidabo Treebank The Tibidabo Treebank El treebank Tibidabo Montserrat Marimon Universitat de Barcelona Gran Via de les Corts Catalanes 585, 08007-Barcelona [email protected] Resumen: En este art´ıculopresentamos el desarrollo de un nuevo recurso de c´odigo abierto para el espa~nol:el treebank Tibidabo. La anotaci´onse est´allevando a cabo de forma semi–autom´aticaen la que, en primer lugar, el corpus es analizado au- tom´aticamente con una gram´aticasimb´olicadel espa~nolbasada en HPSG e im- plementada en el sistema Linguistic Knowledge Builder, y, en segundo lugar, los resultados del proceso de an´alisisse desambiguan manualmente. La existencia del treebank Tibidabo nos permitir´afuturos trabajos de investigaci´onpara el desar- rollo y evaluaci´onde una arquitectura h´ıbridaque combine metodos simb´olicosy estad´ısticospara el PLN, as´ıcomo investigaciones orientadas a la hibridizaci´onde t´ecnicasde bajo y alto nivel para el PLN. Palabras clave: treebank, espa~nol,HPSG. Abstract: This paper describes work in progress for the creation of a new open{ source resource for Spanish: an HPSG{based treebank so{called Tibidabo. The annotation is performed semi{automatically. First, the corpus is automatically an- notated by a symbolic HPSG{based grammar for Spanish implemented on the Lin- guistic Knowledge Builder system; then, the output is manually disambiguated. The existence of the Tibidabo treebank will facilitate research into the development and evaluation of a hybrid architecture combining symbolic and stochastic approaches to NLP, as well as investigations oriented to hybridization of shallow{deep techniques for NLP. Keywords: Treebank, Spanish, HPSG. 1 Introduction mar, a multi{purpose large{coverage HPSG{ based grammar for Spanish implemented on Linguistically interpreted natural language the Linguistic Knowledge Builder system. texts constitute a crucial resource both Then, ambiguous outputs are manually dis- for theoretical linguistic investigations about ambiguated. language use, for instance, and for practical The goal of the work we present is NLP purposes, such as the acquisition and twofold: (i) to create the training data which evaluation of parsing systems. Thus, in re- will allow us to build up a parse selection cent years, there has been an increasing in- model over the hand{built grammar follow- terest in the construction of treebanks and, ing (Toutanova et al., 2005), and (ii) since nowadays, both theory{neutral and theory{ language resource development and evalua- grounded treebanks have been developed for tion go hand in hand, to create the gold stan- a great variety of languages. Some of these dard to evaluate the hybrid system. The treebanks are presented in (Hinrichs and Tibidabo treebank, in addition, will enable Simov, 2004). to evaluate foreseen investigations oriented to This paper describes ongoing work for the the hybridization of shallow{deep techniques creation of a new resource for Spanish, an for NLP aiming at efficient, robust, and ac- HPSG{based treebank so{called Tibidabo. curate rule{based parsing. The annotation is carried out in two The primary objective of our research steps. First, the corpus is automatically an- work is to produce open{source reusable re- notated with the Spanish Resource Gram- sources both for theoretical linguistic inves- tigations and for application{oriented NLP. The Tibidabo treebank is open{source and, together with the Spanish Resource Gram- mar, is part of the DELPH{IN open{source repository of linguistic resources, which also includes HPSG{based grammars for a wide variety of languages, including En- glish (Flickinger, 2002), German (Crysmann, 2005), Japanese (Siegel and Bender, 2002), French, Korean (Kim and Yangs, 2003), mod- ern Greek (Kordoni and Neu, 2005), Norwe- gian (Hellan and Haugereid, 2005), and Por- tuguese (Branco and Costa, 2008), as well as HPSG{based treebanks, such as the LinGO Figure 1: System architecture. Redwoods for English (Oepen et al., 2004) and Hinoki for Japanese (Hashimoto, Bond, 2.1 The FreeLing Tool and Siegel, 2007).1 The paper is organized as follows. First, Before parsing input sentences with the LKB in the following three sections, we present the system, raw text is pre{processed by FreeL- main features of the system we use to anno- ing, an open{source language analysis tool tate the corpus automatically (the architec- suite performing shallow processing function- alities ranging from text tokenization to de- ture, the development environment and theo- 2 retical background, and the linguistic compo- pendency parsing (Atserias et al., 2006). nents and coverage) and the disambiguation Our system integrates the morpho pro- process, and we show the representation of cessing module of FreeLing which receives a derivations produced by the grammar. Then, sentence and morphologically annotates each in section 5, we present the corpus which is word. This module applies a cascade of spe- the basis of the treebank and shows some fig- cialized processors that includes: ures about the treebank. Finally, in section 6, we conclude this paper with a summary • Punctuation symbol annotator. and some directions for future work. • Multi{word recognizer. 2 Treebank annotation • Numerical expression recognizer. We have developed a hybrid architecture for • Date/time expression recognizer. the automatic processing of Spanish, shown in Figure 1, which integrates a shallow pro- • Ratio and percentage expression and cessing tool { FreeLing { into an HPSG{ monetary amount recognizer. based grammar implemented on the LKB • Proper noun recognizer. A fast and sim- system { the Spanish Resource Grammar. ple pattern{matching module based on The advantage of our hybrid architecture capitalization which yields an accuracy is that it allows us to release the parser from near 90%.3 certain tasks {namely, morphological analy- sis and recognition and classification of spe- • Dictionary look{up and affixes handler. cial text expressions that have been consid- • Lexical probabilities annotator and un- ered peripheral to the lexicon, e.g. numbers, known word handler. dates, percentages, currencies, proper names, etc.{ that may be robustly, efficiently, and Our system plugs the FreeLing tool into reliably dealt with by shallow external com- the system by means of the LKB Simple Pre- ponents, thus making the whole system more Processor Protocol (SPPP), which assumes adequate to deal with real world text. 2The FreeLing toolkit may be downloaded from: http://www.lsi.upc.edu/~nlp/freeling. 3FreeLing also provides a NE recognizer based on the CoNLL{2002 shared task winning system (Car- reras, M´arquez,and Padr´o,2002) with higher accu- 1See http://www.delph-in.net/. racy {over 92%{ but which is rather slower. that a preprocessor runs as an external pro- rules, a lexicon, and syntactic rules. cess to the LKB system and communicates The inflectional rules. The inflectional with its caller through its standard input and rules in the LKB system perform the mor- output channels.4 phological analysis of the words in the input The SPPP, therefore, integrates into the sentences. parsing process the PoS tags and lemmata of Since we use an external morphological each word in the sentence. analyzer, the SRG does not need a mor- FreeLing and the Spanish Resource Gram- phology component, instead we use the in- mar are two independently developed com- flectional rule component to propagate the ponents and show some discrepancies in the morpho{syntactic information associated to tagset and the lexical categories. We also use full{forms, in the form of PoS tags, to this model to handle them. the morpho{syntactic features of the lexical 2.2 The Spanish Resource items. We have defined as many rules as tags Grammar we have in the FreeLing tagset. The lexicon. The lexicon component in The Spanish Resource Grammar (SRG) is the LKB system contains the lexical entries designed as multi{purpose (abstracted away of the grammar. from any particular application), and broad{ coverage (aiming to cover not only all varia- The SRG has a full coverage lexicon of tions of the phenomena that have been im- closed word classes (pronouns, determiners, plemented, but also the combinations of dif- prepositions and conjunctions) and it con- ferent phenomena). tains about 50,000 lexical entries for open word classes (7,865 verbs, 28,025 nouns, 2.2.1 Development environment and 10,410 adjectives and 4,110 adverbs).5 These theoretical background lexical entries are organized into a multiple The grammar is implemented on the Lin- inheritance type hierarchy of about 500 leaf guistic Knowledge Builder (LKB) system { types which represent the type of words we an interactive grammar development envi- have in our lexicon. The grammar also has ronment for typed feature structure gram- 64 lexical rules to perform valence changing mars { (Copestake, 2002), based on the ba- operations on lexical items (e.g. movement sic components of the LinGO Grammar Ma- and removal of complements) which reduces trix, an open{source starter{kit for rapid de- the number of lexical entries to be manually velopment of precision broad{coverage gram- encoded in the lexicon. mars compatible with the LKB system (Ben- The syntactic rules. The syntactic der and Flickinger, 2005). rules in the LKB system are phrase structure The SRG is grounded in the theoretical rules that combine words and phrases into framework of Head{driven Phrase Structure larger constituents and compositionally build Grammar (HPSG) (Pollard and Sag, 1994), a up their semantic representation. The SRG constraint{based lexicalist approach to gram- has 191 phrase structure rules. matical theory where all linguistic objects (i.e. words and phrases) are represented as With these linguistic resources, the SRG typed feature structures, and uses Minimal handles the following range of linguistic Recursion Semantics (MRS) (Copestake et phenomena: all types of subcategorization al., 2006) for the semantic representation. structures, surface word order variation and MRS is not a semantic theory in itself, but a valence alternations, subordinate clauses, kind of meta{level which has been defined for raising and control, determination, null{ describing semantic structures.
Recommended publications
  • Language Structure: Phrases “Productivity” a Property of Language • Definition – Language Is an Open System
    Language Structure: Phrases “Productivity” a property of Language • Definition – Language is an open system. We can produce potentially an infinite number of different messages by combining elements differently. • Example – Words into phrases. An Example of Productivity • Human language is a communication system that bears some similarities to other animal communication systems, but is also characterized by certain unique features. (24 words) • I think that human language is a communication system that bears some similarities to other animal communication systems, but is also characterized by certain unique features, which are fascinating in and of themselves. (33 words) • I have always thought, and I have spent many years verifying, that human language is a communication system that bears some similarities to other animal communication systems, but is also characterized by certain unique features, which are fascinating in and of themselves. (42 words) • Although mainstream some people might not agree with me, I have always thought… Creating Infinite Messages • Discrete elements – Words, Phrases • Selection – Ease, Meaning, Identity • Combination – Rules of organization Models of Word reCombination 1. Word chains (Markov model) Phrase-level meaning is derived from understanding each word as it is presented in the context of immediately adjacent words. 2. Hierarchical model There are long-distant dependencies between words in a phrase, and these inform the meaning of the entire phrase. Markov Model Rule: Select and concatenate (according to meaning and what types of words should occur next to each other). bites bites bites Man over over over jumps jumps jumps house house house Markov Model • Assumption −Only adjacent words are meaningfully (and lawfully) related.
    [Show full text]
  • Chapter 30 HPSG and Lexical Functional Grammar Stephen Wechsler the University of Texas Ash Asudeh University of Rochester & Carleton University
    Chapter 30 HPSG and Lexical Functional Grammar Stephen Wechsler The University of Texas Ash Asudeh University of Rochester & Carleton University This chapter compares two closely related grammatical frameworks, Head-Driven Phrase Structure Grammar (HPSG) and Lexical Functional Grammar (LFG). Among the similarities: both frameworks draw a lexicalist distinction between morphology and syntax, both associate certain words with lexical argument structures, both employ semantic theories based on underspecification, and both are fully explicit and computationally implemented. The two frameworks make available many of the same representational resources. Typical differences between the analyses proffered under the two frameworks can often be traced to concomitant differ- ences of emphasis in the design orientations of their founding formulations: while HPSG’s origins emphasized the formal representation of syntactic locality condi- tions, those of LFG emphasized the formal representation of functional equivalence classes across grammatical structures. Our comparison of the two theories includes a point by point syntactic comparison, after which we turn to an exposition ofGlue Semantics, a theory of semantic composition closely associated with LFG. 1 Introduction Head-Driven Phrase Structure Grammar is similar in many respects to its sister framework, Lexical Functional Grammar or LFG (Bresnan et al. 2016; Dalrymple et al. 2019). Both HPSG and LFG are lexicalist frameworks in the sense that they distinguish between the morphological system that creates words and the syn- tax proper that combines those fully inflected words into phrases and sentences. Stephen Wechsler & Ash Asudeh. 2021. HPSG and Lexical Functional Gram- mar. In Stefan Müller, Anne Abeillé, Robert D. Borsley & Jean- Pierre Koenig (eds.), Head-Driven Phrase Structure Grammar: The handbook.
    [Show full text]
  • Lexical-Functional Grammar and Order-Free Semantic Composition
    COLING 82, J. Horeck~(eel) North-HoOandPubllshi~ Company O A~deml~ 1982 Lexical-Functional Grammar and Order-Free Semantic Composition Per-Kristian Halvorsen Norwegian Research Council's Computing Center for the Humanities and Center for Cognitive Science, MIT This paper summarizes the extension of the theory of lexical-functional grammar to include a formal, model-theoretic, semantics. The algorithmic specification of the semantic interpretation procedures is order-free which distinguishes the system from other theories providing model-theoretic interpretation for natural language. Attention is focused on the computational advantages of a semantic interpretation system that takes as its input functional structures as opposed to syntactic surface-structures. A pressing problem for computational linguistics is the development of linguistic theories which are supported by strong independent linguistic argumentation, and which can, simultaneously, serve as a basis for efficient implementations in language processing systems. Linguistic theories with these properties make it possible for computational implementations to build directly on the work of linguists both in the area of grammar-writing, and in the area of theory development (cf. universal conditions on anaphoric binding, filler-gap dependencies etc.). Lexical-functional grammar (LFG) is a linguistic theory which has been developed with equal attention being paid to theoretical linguistic and computational processing considerations (Kaplan & Bresnan 1981). The linguistic theory has ample and broad motivation (vide the papers in Bresnan 1982), and it is transparently implementable as a syntactic parsing system (Kaplan & Halvorsen forthcoming). LFG takes grammatical relations to be of primary importance (as opposed to the transformational theory where grammatical functions play a subsidiary role).
    [Show full text]
  • Psychoacoustics, Speech Perception, Language Structure and Neurolinguistics Hearing Acuity Absolute Auditory Threshold Constant
    David House: Psychoacoustics, speech perception, 2018.01.25 language structure and neurolinguistics Hearing acuity Psychoacoustics, speech perception, language structure • Sensitive for sounds from 20 to 20 000 Hz and neurolinguistics • Greatest sensitivity between 1000-6000 Hz • Non-linear perception of frequency intervals David House – E.g. octaves • 100Hz - 200Hz - 400Hz - 800Hz - 1600Hz – 100Hz - 800Hz perceived as a large difference – 3100Hz - 3800 Hz perceived as a small difference Absolute auditory threshold Demo: SPL (Sound pressure level) dB • Decreasing noise levels – 6 dB steps, 10 steps, 2* – 3 dB steps, 15 steps, 2* – 1 dB steps, 20 steps, 2* Constant loudness levels in phons Demo: SPL and loudness (phons) • 50-100-200-400-800-1600-3200-6400 Hz – 1: constant SPL 40 dB, 2* – 2: constant 40 phons, 2* 1 David House: Psychoacoustics, speech perception, 2018.01.25 language structure and neurolinguistics Critical bands • Bandwidth increases with frequency – 200 Hz (critical bandwidth 50 Hz) – 800 Hz (critical bandwidth 80 Hz) – 3200 Hz (critical bandwidth 200 Hz) Critical bands demo Effects of masking • Fm=200 Hz (critical bandwidth 50 Hz) – B= 300,204,141,99,70,49,35,25,17,12 Hz • Fm=800 Hz (critical bandwidth 80 Hz) – B=816,566,396,279,197,139,98,69,49,35 Hz • Fm=3200 Hz (critical bandwidth 200 Hz) – B=2263,1585,1115,786,555,392,277,196,139,98 Hz Effects of masking Holistic vs. analytic listening • Low frequencies more effectively mask • Demo 1: audible harmonics (1-5) high frequencies • Demo 2: melody with harmonics • Demo: how
    [Show full text]
  • Phrase Structure Rules, Tree Rewriting, and Other Sources of Recursion Structure Within the NP
    Introduction to Transformational Grammar, LINGUIST 601 September 14, 2006 Phrase Structure Rules, Tree Rewriting, and other sources of Recursion Structure within the NP 1 Trees (1) a tree for ‘the brown fox sings’ A ¨¨HH ¨¨ HH B C ¨H ¨¨ HH sings D E the ¨¨HH F G brown fox • Linguistic trees have nodes. The nodes in (1) are A, B, C, D, E, F, and G. • There are two kinds of nodes: internal nodes and terminal nodes. The internal nodes in (1) are A, B, and E. The terminal nodes are C, D, F, and G. Terminal nodes are so called because they are not expanded into anything further. The tree ends there. Terminal nodes are also called leaf nodes. The leaves of (1) are really the words that constitute the sentence ‘the brown fox sings’ i.e. ‘the’, ‘brown’, ‘fox’, and ‘sings’. (2) a. A set of nodes form a constituent iff they are exhaustively dominated by a common node. b. X is a constituent of Y iff X is dominated by Y. c. X is an immediate constituent of Y iff X is immediately dominated by Y. Notions such as subject, object, prepositional object etc. can be defined structurally. So a subject is the NP immediately dominated by S and an object is an NP immediately dominated by VP etc. (3) a. If a node X immediately dominates a node Y, then X is the mother of Y, and Y is the daughter of X. b. A set of nodes are sisters if they are all immediately dominated by the same (mother) node.
    [Show full text]
  • Ccgbank: a Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
    CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank ∗ Julia Hockenmaier University of Pennsylvania ∗∗ Mark Steedman University of Edinburgh This article presents an algorithm for translating the Penn Treebank into a corpus of Combina- tory Categorial Grammar (CCG) derivations augmented with local and long-range word–word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used to train wide- coverage statistical parsers that obtain state-of-the-art rates of dependency recovery. In order to obtain linguistically adequate CCG analyses, and to eliminate noise and incon- sistencies in the original annotation, an extensive analysis of the constructions and annotations in the Penn Treebank was called for, and a substantial number of changes to the Treebank were necessary. We discuss the implications of our findings for the extraction of other linguistically expressive grammars from the Treebank, and for the design of future treebanks. 1. Introduction In order to understand a newspaper article, or any other piece of text, it is necessary to construct a representation of its meaning that is amenable to some form of inference. This requires a syntactic representation which is transparent to the underlying seman- tics, making the local and long-range dependencies between heads, arguments, and modifiers explicit. It also requires a grammar that has sufficient coverage to deal with the vocabulary and the full range of constructions that arise in free text, together with a parsing model that can identify the correct analysis among the many alternatives that such a wide-coverage grammar will generate even for the simplest sentences.
    [Show full text]
  • The Minimalist Program 1 1 the Minimalist Program
    The Minimalist Program 1 1 The Minimalist Program Introduction It is my opinion that the implications of the Minimalist Program (MP) are more radical than generally supposed. I do not believe that the main thrust of MP is technical; whether to move features or categories for example. MP suggests that UG has a very different look from the standard picture offered by GB-based theories. This book tries to make good on this claim by outlining an approach to grammar based on one version of MP. I stress at the outset the qualifier “version.” Minimalism is not a theory but a program animated by certain kinds of methodological and substantive regulative ideals. These ideals are reflected in more concrete principles which are in turn used in minimalist models to analyze specific empirical phenomena. What follows is but one way of articulating the MP credo. I hope to convince you that this version spawns grammatical accounts that have a theoretically interesting structure and a fair degree of empirical support. The task, however, is doubly difficult. First, it is unclear what the content of these precepts is. Second, there is a non-negligible distance between the content of such precepts and its formal realization in specific grammatical principles and analyses. The immediate task is to approach the first hurdle and report what I take the precepts and principles of MP to be.1 1 Principles-Parameters and Minimalism MP is many things to many researchers. To my mind it grows out of the per- ceived success of the principles and parameters (P&P) approach to grammatical competence.
    [Show full text]
  • Lexical Functional Grammar Carol Neidle, Boston University
    Lexical Functional Grammar Carol Neidle, Boston University The term Lexical Functional Grammar (LFG) first appeared in print in the 1982 volume edited by Joan Bresnan: The Mental Representation of Grammatical Relations, the culmination of many years of research. LFG differs from both transformational grammar and relational grammar in assuming a single level of syntactic structure. LFG rejects syntactic movement of constituents as the mechanism by which the surface syntactic realization of arguments is determined, and it disallows alteration of grammatical relations within the syntax. A unique constituent structure, corresponding to the superficial phrase structure tree, is postulated. This is made possible by an enriched lexical component that accounts for regularities in the possible mappings of arguments into syntactic structures. For example, the alternation in the syntactic position in which the logical object (theme argument) appears in corresponding active and passive sentences has been viewed by many linguists as fundamentally syntactic in nature, resulting, within transformational grammar, from syntactic movement of constituents. However, LFG eliminates the need for a multi-level syntactic representation by allowing for such alternations to be accomplished through regular, universally constrained, productive lexical processes that determine multiple sets of associations of arguments (such as agent, theme) with grammatical functions (such as SUBJECT, OBJECT)— considered within this framework to be primitives—which then map directly into the syntax. This dissociation of syntactic structure from predicate argument structure (a rejection of Chomsky’s Projection Principle, in essence) is crucial to the LFG framework. The single level of syntactic representation, constituent structure (c-structure), exists simultaneously with a functional structure (f-structure) representation that integrates the information from c-structure and from the lexicon.
    [Show full text]
  • LSA 112 Notes on Basic Phrase Structure
    LSA 112 Notes on basic phrase structure Ida Toivonen 1 Syntactic phrases A syntactic phrase is a syntactic unit that consists of one or more words. We’ll assume here that the major word classes project phrases. By ‘major’ word classes, we mean the lexical (N, V, A, Adv, P) as opposed to functional word classes (Det, Conj, Comp, Aux). Each syntactic phrase has a head. • The head of a noun phrase (NP) is a noun. • The head of a verb phrase (VP) is a verb. • The head of an adjective phrase (AP) is an adjective. • The head of an adverb phrase (AdvP) is an adverb. • The head of a prepositional phrase (PP) is a preposition. 2 Constituents The smallest syntactic unit of a sentence is a word. Larger units are phrases. A sentence (S) is of course also a unit. These units are constituents of a sentence. Each word is a constituent, each phrase is a constituent, and the sentence itself (S) is a constituent. (1) Mary saw the cat. (2) [[[Mary]] [[saw][[the][cat]]]] (3) [S[NP [N Mary]] [VP [V saw][NP [Det the][N cat]]]] Sometimes one might want to show a simplified representation. For example, one may decide that it is unnecessary (in a given situation) to bracket the individual words. (Note that Mary is the only word in the NP.) (4) [S[NP Mary] [VP saw [NP the cat]]] An alternative notation to bracketing notation is the syntactic tree (phrase marker, phrase struc- ture tree). 1 (5) S NP VP N V NP Mary saw det N the cat Phrase markers represent the hierarchical structure and linear order of sentences.
    [Show full text]
  • The Psycholinguistics of Ellipsis
    Available online at www.sciencedirect.com ScienceDirect Lingua 151 (2014) 78--95 www.elsevier.com/locate/lingua Review The psycholinguistics of ellipsis a,b, a Colin Phillips *, Dan Parker a Department of Linguistics, University of Maryland, College Park, 20742, USA b Program in Neuroscience and Cognitive Science, University of Maryland, College Park, 20742, USA Received 13 September 2012; received in revised form 19 September 2013; accepted 4 October 2013 Available online 27 November 2013 Abstract This article reviews studies that have used experimental methods from psycholinguistics to address questions about the representa- tion of sentences involving ellipsis. Accounts of the structure of ellipsis can be classified based on three choice points in a decision tree. First: does the identity constraint between antecedents and ellipsis sites apply to syntactic or semantic representations? Second: does the ellipsis site contain a phonologically null copy of the structure of the antecedent, or does it contain a pronoun or pointer that lacks internal structure? Third: if there is unpronounced structure at the ellipsis site, does that structure participate in all syntactic processes, or does it behave as if it is genuinely absent at some levels of syntactic representation? Experimental studies on ellipsis have begun to address the first two of these questions, but they are unlikely to provide insights on the third question, since the theoretical contrasts do not clearly map onto timing predictions. Some of the findings that are emerging in studies on ellipsis resemble findings from earlier studies on other syntactic dependencies involving wh-movement or anaphora. Care should be taken to avoid drawing conclusions from experiments about ellipsis that are known to be unwarranted in experiments about these other dependencies.
    [Show full text]
  • Syntax Corrected
    01:615:201 Introduction to Linguistic Theory Adam Szczegielniak Syntax: The Sentence Patterns of Language Copyright in part: Cengage learning Learning Goals • Hierarchical sentence structure • Word categories • X-bar • Ambiguity • Recursion • Transformaons Syntax • Any speaker of any human language can produce and understand an infinite number of possible sentences • Thus, we can’ t possibly have a mental dictionary of all the possible sentences • Rather, we have the rules for forming sentences stored in our brains – Syntax is the part of grammar that pertains to a speaker’ s knowledge of sentences and their structures What the Syntax Rules Do • The rules of syntax combine words into phrases and phrases into sentences • They specify the correct word order for a language – For example, English is a Subject-Verb-Object (SVO) language • The President nominated a new Supreme Court justice • *President the new Supreme justice Court a nominated • They also describe the relationship between the meaning of a group of words and the arrangement of the words – I mean what I say vs. I say what I mean What the Syntax Rules Do • The rules of syntax also specify the grammatical relations of a sentence, such as the subject and the direct object – Your dog chased my cat vs. My cat chased your dog • Syntax rules specify constraints on sentences based on the verb of the sentence *The boy found *Disa slept the baby *The boy found in the house Disa slept The boy found the ball Disa slept soundly Zack believes Robert to be a gentleman *Zack believes to be a gentleman Zack tries to be a gentleman *Zack tries Robert to be a gentleman What the Syntax Rules Do • Syntax rules also tell us how words form groups and are hierarchically ordered in a sentence “ The captain ordered the old men and women of the ship” • This sentence has two possible meanings: – 1.
    [Show full text]
  • Lexical-Functional Grammar∗
    Lexical-Functional Grammar∗ Ash Asudeh & Ida Toivonen August 10, 2009 [19:17] CORRECTED FINAL DRAFT To appear in Bernd Heine and Heiko Narrog, eds., The Oxford Handbook of Linguistic Analysis. Oxford: Oxford University Press. Contents 1 Introduction 1 2 C-structure and F-structure 2 2.1 C-structure ........................................ 3 2.2 F-structure ......................................... 4 3 Structures and Structural Descriptions 8 3.1 Constraints on C-structures ................................ 8 3.2 Constraints on F-structures ................................ 10 4 The C-structure to F-structure Correspondence 15 4.1 How the C-structure to F-structure Mapping Works ................... 15 4.2 Flexibility in Mapping .................................. 18 5 The Correspondence Architecture 19 6 Some Recent Developments 20 7 Concluding Remarks 21 A Example: Unbounded Dependency, Adjuncts, Raising, Control 23 References 25 ∗This work was supported by the Social Sciences and Humanities Research Council of Canada, Standard Research Grant 410-2006-1650. We would like to thank Mary Dalrymple, Bernd Heine, Daniela Isac, Helge Lødrup, Kumiko Murasugi, Heiko Narrog, Kenji Oda, Chris Potts, and Nigel Vincent for helpful comments and suggestions. We would also like to thank Daniela Isac and Cristina Moldoveanu for checking the Romanian data. Any remaining errors are our own. i Abstract Lexical-Functional Grammar (LFG) is a lexicalist, declarative (non-transformational), constraint- based theory of generative grammar. LFG has a detailed, industrial-strength computational imple- mentation. The theory has also proven useful for descriptive/documentary linguistics. The gram- matical architecture of LFG, sometimes called the ‘Correspondence Architecture’, posits that dif- ferent kinds of linguistic information are modelled by distinct, simultaneously present grammatical structures, each having its own formal representation.
    [Show full text]