Parsing Japanese with a PCFG Treebank Grammar


Proceedings of the 20th Annual Meeting of the Association for Natural Language Processing (March 2014)

Tsaiwei FANG*   Alastair BUTLER†‡   Kei YOSHIMOTO*‡
*Graduate School of International Cultural Studies, Tohoku University
†PRESTO, Japan Science and Technology Agency
‡Center for the Advancement of Higher Education, Tohoku University

Abstract

This paper describes constituent parsing of Japanese using a probabilistic context-free (PCFG) treebank grammar that is enhanced with Parent Encoding, reversible tree transformations, refinement of treebank labels and Markovisation. We evaluate the quality of the resulting parsing.

Table 1: Keyaki Treebank content

  Domain             Number of trees
  blog posts                     217
  Japanese Law                   484
  newspaper                     1600
  telephone calls               1177
  textbooks                     7733
  Wikipedia                     2464
  Total                        13675

1 Introduction

The focus of this paper is on the task of obtaining a constituent parser for Japanese using a treebank (the Keyaki Treebank) as training data and a Probabilistic Context-Free Grammar (PCFG) as the parsing method. We report results for a vanilla PCFG model that is directly read off the training data and for an enhanced PCFG model obtained with transformations of the training data that aim to tune the treebank representations to the specific needs of probabilistic context-free parsers, while allowing the original annotation to be restored. Specifically, we follow best practices from Johnson (1998), Klein and Manning (2003) and Fraser et al. (2013), among others, of Parent Encoding, reversible tree transformations, refinement of treebank labels and Markovisation. PCFGs so derived can be used by a parser to construct maximal probability (Viterbi) parses. We evaluate the quality of the resulting parsing using standard PARSEVAL constituency measures. Our fully labelled bracketing score for a held-out portion of the Keyaki Treebank (1,300 trees) is 79.97 (recall), 80.61 (precision) and 80.29 (F-score). We show a learning curve suggesting that parser performance will continue to improve strongly with access to more training data.

2 The parser

The parsing reported in this paper is made possible by the unlexicalised statistical parser BitPar (Schmid, 2004), which allows any grammar rule file in the proper format to be used for parsing. BitPar uses a fast bitvector-based implementation of the Cocke-Younger-Kasami algorithm, storing the parse chart as a large bit vector. This enables full parsing (without search space pruning) with large treebank grammars. BitPar can extract from the parse chart the most likely parse tree (Viterbi parse), the full set of parses in the form of a parse forest, or the n-best parse trees.
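To make the parsing method concrete, the following is a minimal sketch of Viterbi parsing with a PCFG in Chomsky normal form, the algorithm that BitPar implements far more efficiently with its bitvector-based chart. The toy grammar, its probabilities and the function names are invented for illustration and are not taken from BitPar or from the grammars described in this paper.

# Minimal Viterbi CKY sketch for a PCFG in Chomsky normal form.
# Toy grammar and probabilities are invented for the example.
from collections import defaultdict

binary = {                         # P(A -> B C)
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 1.0,
}
lexical = {                        # P(A -> word)
    ("NP", "birds"): 0.5, ("NP", "worms"): 0.5,
    ("V", "eat"): 1.0,
}

def viterbi_cky(words):
    n = len(words)
    best = defaultdict(float)      # (i, j, label) -> best inside probability
    back = {}                      # backpointers for recovering the best parse
    for i, w in enumerate(words):
        for (label, word), p in lexical.items():
            if word == w:
                best[i, i + 1, label] = p
                back[i, i + 1, label] = w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, (left, right)), p in binary.items():
                    score = p * best[i, k, left] * best[k, j, right]
                    if score > best[i, j, parent]:
                        best[i, j, parent] = score
                        back[i, j, parent] = (k, left, right)
    return best, back

def best_tree(back, i, j, label):
    entry = back[i, j, label]
    if isinstance(entry, str):     # lexical rule: leaf word
        return "(" + label + " " + entry + ")"
    k, left, right = entry
    return ("(" + label + " " + best_tree(back, i, k, left) + " "
            + best_tree(back, k, j, right) + ")")

best, back = viterbi_cky(["birds", "eat", "worms"])
print(best[0, 3, "S"], best_tree(back, 0, 3, "S"))
# 0.25 (S (NP birds) (VP (V eat) (NP worms)))

Keeping every derivation in the chart rather than only the highest scoring one gives the parse forest from which the n-best trees mentioned above can be read off.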
3 The treebank

The grammar and lexicon used by the BitPar parser are extracted from the Keyaki Treebank (Butler et al. 2012). The current composition of the Keyaki Treebank is detailed in Table 1. The treebank uses an annotation scheme that follows, with adaptations for Japanese, the general scheme proposed in the Annotation manual for the Penn Historical Corpora and the PCEEC (Santorini 2010). Constituent structure is represented with labelled bracketing and augmented with grammatical functions and notation for recovering discontinuous constituents. The primary motivation for the annotation has been to facilitate automated searches for linguistic research (e.g., via CorpusSearch[1]), and to provide a syntactic base that is sufficiently rich to enable an automatic generation of (higher-order) predicate logic based meaning representations.[2] A typical parse, shown as labelled bracketing, looks like:

  (IP-MAT (PP (NP (N 弟)) (P は))
          (NP-SBJ *)
          (PP (NP (IP-EMB (PP (NP (N テレビ)) (P を))
                          (NP-OB1 *を*)
                          (VB つけ)
                          (AXD た))
                  (N まま))
              (P で))
          (VB 寝)
          (P て)
          (VB2 しまい)
          (AX まし)
          (AXD た)
          (PU 。))

  [1] http://corpussearch.sourceforge.net
  [2] http://www.compling.jp/ts

Every word has a word-level part-of-speech label. Phrasal nodes (NP, PP, ADJP, etc.) immediately dominate the phrase head (N, P, ADJ, etc.), so that the phrase head has as sisters both modifiers and complements. Modifiers and complements are distinguished because there are extended phrase labels to mark function (e.g., -EMB encodes that the clause テレビをつけた is a complement of the phrase head まま). All noun phrases immediately dominated by IP are marked for function (NP-SBJ=subject, NP-OB1=direct object, NP-TMP=temporal NP, etc.). The PP label is never extended with function marking. However, the immediately following sibling of a PP may be present in the annotation to provide disambiguation information for the PP. Thus, (NP-OB1 *を*) indicates that the immediately preceding PP (with case particle を) is the object, while (NP-SBJ *) indicates that the immediately preceding PP (without case particle) is the subject. All clauses have extended labels to mark function (IP-MAT=matrix clause, IP-ADV=adverbial clause, IP-REL=relative clause, etc.).

4 Extracted grammars

Figure 1 shows the growth of extracted phrase structure rules for a vanilla grammar model directly read off the Keyaki Treebank and also for an enhanced model. The enhanced model is obtained after reversible changes are made to the treebank aimed at improving parsing quality. This section focuses on the changes made for the enhanced model. Specifically, the enhanced model was created with techniques from Johnson (1998), Klein and Manning (2003) and Fraser et al. (2013), among others, of:

1. eliminating discontinuous constituents (Section 4.1),

2. transforming and augmenting treebank annotations (Section 4.2), and

3. following the extraction of phrase structure rules, lexical rules, and their frequencies from the annotated parse trees, markovising the grammar (Section 4.3).

  [Figure 1: growth of phrase structure rules. Number of phrase structure rules (y-axis) plotted against number of training sentences (x-axis), with curves for the enhanced model and the vanilla model, each counting all rules and only rules with frequency f > 1, f > 2 and f > 4.]

Note that all curves of Figure 1 remain steep at the maximum training set size of 12,375 trees, suggesting more data would lead to more significant growth. As a comparison, the treebank grammar of Schmid (2006) extracted from the Penn Treebank of English has 52,297 phrase structure rules (enabling a labelled bracketing F-score of 86.6%).
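As an illustration of what reading a grammar off the treebank involves, the following minimal sketch counts phrase structure rules and lexical rules from labelled-bracketed trees and turns the counts into relative-frequency probabilities. The bracket reader, the helper names and the one-tree input are illustrative assumptions, not the authors' extraction scripts, and the sketch ignores traces and the transformations described below.

# Sketch of reading a vanilla PCFG off labelled-bracketed trees by
# relative-frequency estimation over rule counts.
from collections import defaultdict

def read_tree(s):
    """Turn '(NP (N 弟))' into nested (label, children) tuples; leaves are words."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # consume the closing ")"
        return (label, children)
    return node()

def count_rules(tree, rule_counts, lhs_counts):
    """Count one phrase structure or lexical rule per local tree."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, rule_counts, lhs_counts)

rule_counts, lhs_counts = defaultdict(int), defaultdict(int)
tree = read_tree("(IP-MAT (PP (NP (N 弟)) (P は)) (VB 寝) (PU 。))")
count_rules(tree, rule_counts, lhs_counts)
for (lhs, rhs), c in sorted(rule_counts.items()):
    print(lhs, "->", " ".join(rhs), c / lhs_counts[lhs])

With a single training tree every rule receives probability 1; run over the full treebank, the per-left-hand-side relative frequencies give the vanilla PCFG whose rule growth is plotted in Figure 1.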
4.1 Discontinuous constituents

The Keyaki Treebank annotates trace nodes, for example with relative clauses. But unlike in the Penn Treebank (Bies et al. 1995), trace nodes are not indexed and typically appear clause initially, with precise attachment points unspecified, since it is enough to assume that constituents at the IP level are dependents of the main verb of the clause. For trace nodes within embedded contexts (cases of long distance dependency), it is the phrase level of attachment for the trace node that is the relevant indicator of the dependency. For the current work we assume parse trees from which trace nodes are removed and aim to recognize discontinuous constituents in a post-processing step, following for example Johnson (2001).

4.2 Transforming and augmenting annotations

In addition to removing trace nodes, transformations and augmentations of the trees are performed. Specifically, interjection, punctuation or parenthetical materials occurring at the left or right periphery of a constituent are moved to a new projection of the constituent. There is also Parent Encoding, following Johnson (1998), which copies the syntactic label of a parent node (minus the functional information) onto the labels of its children. Finally there are refinements to the POS tags.

Refinements to the POS tags include the P (particle) tag becoming either P-CASE, P-CONJ, P-COORD, P-CP-THT, P-FINAL, P-NOUN or P-OPTR, depending on the functional role of the particle and/or the syntactic context in which the particle occurs. Verbs are also split to encode information about the clause of occurrence: VB-THT (verb with a clausal complement), VB-TRNS (transitive verb), VB-INTRNS (intransitive verb), and so on. Applying these changes to the example parse from section 3 results in the following tree representation:

  (IP-MAT^TOP (IP-MAT^TOP (PP^IP (NP^PP (N 弟)) (P-OPTR-は は))
                          (PP^IP (NP^PP (IP-EMB^NP (PP^IP (NP^PP (N テレビ)) (P-CASE-を を))
                                                   (VB-TRNS つけ)
                                                   (AXD た))
                                        (N-STRUCTURAL まま))
                                 (P-CASE-で で))
                          (VB-INTRNS 寝)
                          (P-CONJ-て て)
                          (VB2 しまい)
                          (AX2 まし)
                          (AXD た))
              (PU-。 。))

4.3 Markovisation

The Keyaki Treebank uses rather flat structures, particularly at the clause level, with nodes having up to 31 child nodes. As Fraser et al. (2013) note, this causes problems because only some rules of that length appear in the training data. This sparse data problem is solved by markovisation (Collins 1997), which splits long rules into a set of shorter rules. The following shows the consequences for our running example of markovising the rule IP-MAT^TOP -> PP^IP PP^IP VB-INTRNS P-CONJ-て VB2 AX2 AXD:

  (IP-MAT^TOP
    (IP-MAT^TOP
      (<IP-MAT^TOP[VB-INTRNS]AX2|AXD>
        (<IP-MAT^TOP[VB-INTRNS]VB2|AX2>
          (<IP-MAT^TOP[VB-INTRNS]P-CONJ-て|VB2>
            (<IP-MAT^TOP[VB-INTRNS]VB-INTRNS|P-CONJ-て>
              (<IP-MAT^TOP[VB-INTRNS]PP^IP|VB-INTRNS>
                (PP^IP (NP^PP (N 弟)) (P-OPTR-は は))
                (PP^IP (NP^PP (IP-EMB^NP (PP^IP (NP^PP (N テレビ)) (P-CASE-を を))
                                         (VB-TRNS つけ)
                                         (AXD た))
                              (N-STRUCTURAL まま))
                       (P-CASE-で で)))
              (VB-INTRNS 寝))
            (P-CONJ-て て))
          (VB2 しまい))
        (AX2 まし))
      (AXD た))
    (PU-。 。))

The auxiliary symbols that are created encode information about the parent category, the head child, the child that is generated next and the previously generated child.
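To make the Parent Encoding step of section 4.2 concrete, here is a minimal sketch, assuming trees are the (label, children) tuples used in the extraction sketch above, that only phrasal (non-preterminal) nodes receive the annotation, as in the transformed tree, and that stripping the functional information means dropping the hyphenated extension (IP-MAT -> IP, NP-SBJ -> NP). The function names and the ^TOP label for the root are illustrative assumptions rather than the authors' implementation.

# Sketch of Parent Encoding in the spirit of Johnson (1998): the parent's
# category, stripped of functional extensions, is appended to each phrasal
# child's label with "^"; preterminals (POS tags) are left unchanged.
def strip_function(label):
    return label.split("-")[0]

def is_preterminal(node):
    label, children = node
    return all(not isinstance(c, tuple) for c in children)

def parent_encode(node, parent=None):
    label, children = node
    new_label = label
    if parent is not None and not is_preterminal(node):
        new_label = label + "^" + strip_function(parent)
    return (new_label,
            [parent_encode(c, label) if isinstance(c, tuple) else c
             for c in children])

example = ("IP-MAT", [("PP", [("NP", [("N", ["弟"])]), ("P", ["は"])]),
                      ("VB", ["寝"]), ("PU", ["。"])])
print(parent_encode(example, "TOP"))
# ('IP-MAT^TOP', [('PP^IP', [('NP^PP', [('N', ['弟'])]), ('P', ['は'])]),
#                 ('VB', ['寝']), ('PU', ['。'])])

Run on the section 3 example with traces removed, this yields the phrasal labels IP-MAT^TOP, PP^IP, NP^PP and IP-EMB^NP seen in the transformed tree above; the POS refinements are a separate step not sketched here.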
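The markovisation of section 4.3 can likewise be sketched as a rule-rewriting step. The sketch below reproduces the auxiliary-symbol format <Parent[Head]Next|Prev> and the right-to-left splitting visible in the running example; since the text does not spell out how the head child is chosen or when splitting stops, the head is passed in as a parameter and splitting stops when two children remain, both of which are assumptions made for illustration.

# Sketch of markovisation: a long rule becomes a chain of binary rules whose
# auxiliary symbols record the parent category, the head child, the child
# generated next and the previously generated child.
def markovise(parent, children, head):
    rules = []
    remaining = list(children)
    lhs = parent
    while len(remaining) > 2:
        prev = remaining.pop()            # split off the rightmost child
        nxt = remaining[-1]               # the child to be generated next
        aux = "<" + parent + "[" + head + "]" + nxt + "|" + prev + ">"
        rules.append((lhs, [aux, prev]))
        lhs = aux
    rules.append((lhs, remaining))        # the last two children stay together
    return rules

rhs = ["PP^IP", "PP^IP", "VB-INTRNS", "P-CONJ-て", "VB2", "AX2", "AXD"]
for lhs, rule_rhs in markovise("IP-MAT^TOP", rhs, "VB-INTRNS"):
    print(lhs, "->", " ".join(rule_rhs))
# IP-MAT^TOP -> <IP-MAT^TOP[VB-INTRNS]AX2|AXD> AXD
# ...
# <IP-MAT^TOP[VB-INTRNS]PP^IP|VB-INTRNS> -> PP^IP PP^IP

Because the auxiliary symbols forget everything except the parent, the head and the neighbouring children, many of the resulting binary rules recur across different long rules, which is what relieves the sparse data problem described above.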