Discourse-Givenness of Noun Phrases Theoretical and Computational Models

Total Page:16

File Type:pdf, Size:1020Kb

Discourse-Givenness of Noun Phrases Theoretical and Computational Models Discourse-Givenness of Noun Phrases Theoretical and Computational Models Dissertation zur Erlangung des akademischen Grades Doktor der Philosophie (Dr. phil.) der Humanwissenschaftlichen Fakult¨at der Universit¨atPotsdam vorgelegt von Julia Ritz Stuttgart, Mai 2013 Published online at the Institutional Repository of the University of Potsdam: URL http://opus.kobv.de/ubp/volltexte/2014/7081/ URN urn:nbn:de:kobv:517-opus-70818 http://nbn-resolving.de/urn:nbn:de:kobv:517-opus-70818 Erkl¨arung(Declaration of Authenticity of Work) Hiermit erkl¨areich, dass ich die beigef¨ugteDissertation selbstst¨andigund ohne die un- zul¨assigeHilfe Dritter verfasst habe. Ich habe keine anderen als die angegebenen Quellen und Hilfsmittel genutzt. Alle w¨ortlich oder inhaltlich ¨ubernommenen Stellen habe ich als solche gekennzeichnet. Ich versichere außerdem, dass ich die Dissertation in der gegenw¨artigenoder einer an- deren Fassung nur in diesem und keinem anderen Promotionsverfahren eingereicht habe, und dass diesem Promotionsverfahren keine endg¨ultiggescheiterten Promotionsverfahren vorausgegangen sind. Im Falle des erfolgreichen Abschlusses des Promotionsverfahrens erlaube ich der Fakult¨at, die angef¨ugtenZusammenfassungen zu ver¨offentlichen. Ort, Datum Unterschrift iii iv iv Contents Preface xiii 1 Introduction 1 1.1 Discourse-Givenness and Related Concepts . .3 1.2 Goal . .4 1.3 Motivation . .6 1.4 The Field of Discourse-Givenness Classification . 10 1.5 Structure of this Document . 11 2 Concepts and Definitions 13 2.1 Basic Concepts and Representation . 13 2.1.1 Basic Concepts in Discourse . 14 2.1.2 Formal Representation . 15 2.2 Reference . 18 2.2.1 Identifying Basic Units . 20 2.2.2 Non-Referring Use of Expressions . 24 2.2.3 World of Reference . 25 2.2.4 Specificity . 27 2.2.5 Generalizations . 29 2.2.6 Vagueness and Ambiguity . 33 2.3 Coreference . 35 2.3.1 Consequences of Vagueness, Ambiguity and Non-Specificity . 36 2.3.2 Coreference vs. Lexical, Intensional or Extensional Identity . 37 2.3.3 Identity of Binding Index . 39 2.3.4 Asserted Identity . 42 2.3.5 Identity and Time Dependence . 45 2.3.6 Accommodation of Referents . 45 2.3.7 Bridging . 46 2.4 Coreference, Discourse-Givenness and Information Status . 48 3 Corpora and Annotation Schemes 53 3.1 Corpora Annotated with Discourse-Givenness or Coreference . 53 3.1.1 MUC-7 . 54 3.1.2 Nissim's Annotation of the Switchboard Corpus . 55 3.1.3 OntoNotes . 55 3.1.4 ARRAU . 55 v vi 3.1.5 PCC . 56 3.1.6 T¨uBa-D/Z. 56 3.1.7 DIRNDL . 57 3.2 Comparison of Annotation Schemes . 57 3.2.1 Markables and Preconditions . 58 3.2.1.1 Markables . 58 3.2.1.2 Preconditions . 68 3.2.2 Coreference and Context . 69 3.2.2.1 Discourse-Givenness and Coreference . 69 3.2.2.2 Notions of Context . 74 3.2.3 Formalization and Evaluation of the Annotation . 74 3.2.3.1 Formalization . 74 3.2.3.2 Evaluation . 75 3.2.4 A Critical Assessment of Existing Corpora . 77 4 Related Work and Technical Background 81 4.1 Data and Categories . 81 4.2 Features . 83 4.2.1 Properties of NPs . 83 4.2.2 NPs' Relations to their Context . 85 4.3 Algorithms . 87 4.3.1 Evaluation Measures . 93 4.4 Experiments and Results . 95 5 Discourse-Givenness Classification: New Experiments and Results 101 5.1 Data . 101 5.2 Methods . 102 5.2.1 Features . 102 5.2.2 Algorithms and Evaluation Measures . 109 5.3 Classification Results . 110 5.3.1 Quantitative Results . 110 5.3.1.1 OntoNotes . 110 5.3.1.2 MUC-7 . 116 5.3.1.3 ARRAU . 123 5.3.1.4 T¨uBa-D/Z . 126 5.3.2 Discussion . 132 5.3.2.1 Machine Learning Aspects . 132 5.3.2.2 Linguistic Aspects . 134 5.4 Comparison to Related Work . 135 5.4.1 OntoNotes . 136 5.4.2 MUC-7 . 136 5.4.3 Related Work in General . 136 6 Conclusions and Outlook 139 6.1 Conclusions . 139 6.2 Outlook . 140 vi vii 7 Summaries 143 7.1 Summary in English . 143 7.2 Summary in German (Zusammenfassung in deutscher Sprache)...... 146 References 150 Index 166 vii viii viii List of Figures 1 Coreference visualisation in ANNIS2 . xiv 1.1 Example Discourse . .1 1.2 Example Structure according to Rhetorical Structure Theory (RST) . .9 1.3 Coreference Resolution: Terminology . 10 2.1 DRS of `Jones owns Ulysses. It fascinates him.' . 16 2.2 DRS of `Susan has found every book which Bill needs. They are on his desk.' . 17 2.3 DRS of `Most students bought books that would keep them fully occupied during the next two weeks.' . 17 2.4 DRS of `Mary wants to marry a rich man. He is a banker.' . 26 2.5 DRS of `Mary wants to marry a rich man. He must be a banker.' . 26 2.6 DRS of `If parents are dissatisfied with a school, they should have the option of switching to another.' . 27 2.7 DRS of `Parents should have the option of switching to another school, if they are dissatisfied with the current school.' . 27 2.8 DRS of `Jones does not own a Porsche.' . 30 2.9 DRS of `A wolf takes a mate for life.' . 30 2.10 DRS of `Sheep are quadrupedal, ruminant mammals typically kept as live- stock. Sheep are members of the order Artiodactyla.' . 33 2.11 Syntactic Tree Containing Traces . 42 2.12 DRS for `John took Mary to Acapulco. They had a lousy time.' . 46 2.13 DRS for `The students went camping. They enjoyed it.' . 47 2.14 Prince's (1981) Hierarchical Scheme of Familiarity . 49 2.15 Information Status, Discourse Status, and Hearer Status . 50 3.1 Syntactic Analysis of Possessives in MUC-7 . 60 3.2 Syntactic Analysis of Possessives in OntoNotes 1.0 . 60 3.3 Syntactic analysis of a PP with Fusion of Preposition and Definite Determiner 63 4.1 Example Decision Tree for Discourse-Givenness. 88 4.2 Pseudo-code for Decision Tree Construction . 88 4.3 Example Rules for Discourse-Givenness. 89 4.4 Pseudo-code for Rule Learners . 89 4.5 Support Vector Machines . 91 5.1 OntoNotes 1.0: Decision Tree . 113 ix x 5.2 OntoNotes 1.0: Learning Curve . 114 5.3 MUC-7: Decision Tree . 118 5.4 MUC-7: Learning Curve (original split) . 120 5.5 MUC-7: Learning Curve (random split) . 122 5.6 MUC-7: Learning Curve (all vs. Formaleval) . 122 5.7 ARRAU 1.2: Learning Curve . 124 5.8 ARRAU 1.2: Decision Tree . 125 5.9 T¨uBa-D/Z6.0: Learning Curve . 129 5.10 T¨uBa-D/Z6.0: Decision Tree (coreferential/anaphoric/bound) . 130 5.11 T¨uBa-D/Z6.0: Decision Tree (coreferential) . 131 x List of Tables 2.1 Types of Reference (Tentative) . 19 2.2 Specificity, Genericity and Coreference . 32 2.3 Concepts and Definitions . 51 3.1 Overview of Corpora Annotated with Coreference/Discourse-Givenness . 54 3.2 Markable Definitions (part I: English Corpora) . 64 3.2 Markable Definitions (part II: German Corpora) . 66 3.3 Definitions of Coreference (part I: English Corpora) . 70 3.3 Definitions of Coreference (part II: German Corpora) . 72 4.1 Overview: Approaches to Identifying Information Structure . 82 4.2 Comparison of Learning Algorithms . 92 4.3 Naming Conventions for Evaluation . 93 4.4 Example Contingency Table . 95 4.5 State of the Art Classification Results (part I) . 96 4.5 State of the Art Classification Results.
Recommended publications
  • Questions Under Discussion: from Sentence to Discourse∗
    Questions under Discussion: From sentence to discourse∗ Anton Benz Katja Jasinskaja July 14, 2016 Abstract Questions under discussion (QUD) is an analytic tool that has recently become more and more popular among linguists and language philosophers as a way to characterize how a sentence fits in its context. The idea is that each sentence in discourse addresses a (of- ten implicit) QUD either by answering it, or by bringing up another question that can help answering that QUD. The linguistic form and the interpretation of a sentence, in turn, may depend on the QUD it addresses. The first proponents of the QUD approach (von Stut- terheim & Klein, 1989; van Kuppevelt, 1995) thought of it as a general approach to the analysis of discourse structure where structural relations between sentences in a coherent discourse are understood in terms of relations between questions they address. However, the idea did not find broad application in discourse structure analysis, where theories aiming at logical discourse representations are predominantly based on the notion of coherence re- lations (Mann & Thompson, 1988; Hobbs, 1985; Asher & Lascarides, 2003). The problem of finding overt markers for QUDs means that they are also difficult to utilize for quanti- tative measures of discourse coherence (McNamara et al., 2010). At the same time QUD proved useful in the analysis of a wide range of linguistic phenomena including accentua- tion, discourse particles, presuppositions and implicatures. The goal of this special issue is to bring together cutting-edge research that demonstrates the success of QUD in linguistics and the need for the development of a comprehensive model of discourse coherence that incorporates the notion of QUD as its integral part.
    [Show full text]
  • From Phenomenological Self-Givenness to the Notion of Spiritual Freedom IRIS HENNIGFELD
    PhænEx 13, no. 2 (Winter 2020): 38-51 © 2020 Iris Hennigfeld From Phenomenological Self-Givenness to the Notion of Spiritual Freedom IRIS HENNIGFELD The program of the phenomenological movement, “Back to the things themselves!” does not point to any particular topic of phenomenological investigation, for example the question of being (ontology) or the question of God (metaphysics). Rather, it designates a specific methodological approach toward the things, and, as its final aim, toward a particular mode in which the things are given in experience. One particular mode in which the things are given with full evidence and in original intuition is “self-givenness”. In the experience of self-givenness, the object is fully present, or, in Edmund Husserl’s words, given in “originary presentive intuition” (Husserl, Ideas I 44, cf. 36). The correlative act on the part of the subject, the noesis, is a “seeing” which is not restricted to the senses but must be understood “in the universal sense as an originally presentive consciousness of any kind whatever” (Husserl, Ideas I 36). Self-givenness also represents the criterion of evidence and truth. The phenomenological principles of givenness and self-givenness can be made fruitful in particular for a description and investigation of those phenomena which, according to their very nature, resist other theoretical approaches. Such more “resilient” phenomena include interpersonal, religious or aesthetic experiences. Presentation becomes, in accordance with philosophical tradition, the dominant model in phenomenological research. But first-person experience shows that presentation is not the only mode of givenness and self-givenness. Rather, pushing phenomenology to its limits, the full range of experience shows that there are certain kinds of phenomena of which the genuine mode of givenness cannot be reduced to the way in which objects are presented to a subject.
    [Show full text]
  • Givenness and Its Realization in a Linguistic and in a Non- Linguistic Environment
    VILNIUS PEDAGOGICAL UNIVERSITY FACULTY OF FOREIGN LANGUAGES DEPARTMENT OF ENGLISH PHILOLOGY NATALIJA ZACHAROVA GIVENNESS AND ITS REALIZATION IN A LINGUISTIC AND IN A NON- LINGUISTIC ENVIRONMENT MA Paper Academic advisor: prof. Dr. Hab. Laimutis Valeika Vilnius, 2008 VILNIUS PEDAGOGICAL UNIVERSITY FACULTY OF FOREIGN LANGUAGES DEPARTMENT OF ENGLISH PHILOLOGY GIVENNESS AND ITS REALIZATION IN A LINGUISTIC AND IN A NON- LINGUISTIC ENVIRONMENT This MA paper is submitted in partial fulfillment of requirements for the degree of the MA in English Philology By Natalija Zacharova I declare that this study is my own and does not contain any unacknowledged work from any source. (Signature) (Date) Academic advisor: prof. Dr. Hab. Laimutis Valeika (Signature) (Date) Vilnius, 2008 2 CONTENTS ABSTRACT………………………………………………………………………….4 INTRODUCTION…………………………………………………………………...5 1. THE PROBLEMS OF THE INFORMATIONAL STRUCTURE OF THE SENTENCE ……………………………………………………………….8 1.1. The sentence as dialectical entity of given and new……………………...8 1.2. Givenness vs. Newnness………………………………………………….11 1.3. The realization of Givenness……………………………………………..12 1.4. Givenness expressed by the definite article ……………………………..15 1.5. Givenness expressed by the indefinite article…………………………….19 1.6. Givenness expressed by semi-grammatical definite determiners and lexical determiners………………………………………………………..20 2. THE REALIZATION OF GIVENNESS IN A LINGUISTIC ENVIRONMENT……………………………………………………………………21 2.1. Anaphoric Givenness………………………………………………………22 2.2. Cataphoric Givenness……………………………………………………...29 2.3. Givenness expressed by the use of the indefinite article…………………..30 3. THE REALIZATION OF GIVENNESS IN A NON- LINGUISTIC ENVIRONMENT……………………………………………………………………..32 3. 1. The environment of the home……………………………………………...33 3.2. The environment of the town/country, world………………………………35 3.3. The environment of the universe…………………………………………....37 3.4. Cultural environment……………………………………………………….38 4.
    [Show full text]
  • UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations
    UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations Title Syntax & Information Structure: The Grammar of English Inversions Permalink https://escholarship.org/uc/item/2sv7q1pm Author Samko, Bern Publication Date 2016 License https://creativecommons.org/licenses/by-nc-nd/4.0/ 4.0 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA SANTA CRUZ SYNTAX & INFORMATION STRUCTURE: THE GRAMMAR OF ENGLISH INVERSIONS A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in LINGUISTICS by Bern Samko June 2016 The Dissertation of Bern Samko is approved: Professor Jim McCloskey, chair Associate Professor Pranav Anand Associate Professor Line Mikkelsen Assistant Professor Maziar Toosarvandani Tyrus Miller Vice Provost and Dean of Graduate Studies Copyright © by Bern Samko 2016 Contents Acknowledgments x 1 Introduction 1 1.1 Thequestions ............................... 1 1.1.1 Summaryofresults ........................ 2 1.2 Syntax................................... 6 1.3 InformationStructure . .. .. .. .. .. .. .. 7 1.3.1 Topic ............................... 8 1.3.2 Focus ............................... 11 1.3.3 Givenness............................. 13 1.3.4 The distinctness of information structural notions . ....... 14 1.3.5 TheQUD ............................. 17 1.4 The relationship between syntax and information structure ....... 19 1.5 Thephenomena .............................. 25 1.5.1
    [Show full text]
  • Focus and Givenness: a Unified Approach Michael Wagner Mcgill University
    1 Focus and Givenness: A Unified Approach Michael Wagner McGill University Abstract Theories about information structure differ in treating focus-, contrast-, and givenness-marking as either a single phenomenon or as distinct phenomena. This paper argues for a unified approach, following the lead of Schwarzschild (1999), but a unified approach based on local alternatives (Wagner, 2005, 2006b).1 The basic insights of the local alternatives approach are captured using a new formalization, which aims to fix several problems in the original proposal noted in B¨uring (2008). The unified approach is compared with alternative approaches that invoke separate mechanisms for givenness and focus: the disanaphora approach (Williams, 1997), the F-Projection account (Selkirk, 1984, 1995), and the reference set approach in Szendr¨oi (2001) and Reinhart (2006). All of these have fundamental problems, and the requisite revisions render them equivalent to the local alternatives approach. Keywords: alternatives, focus, givenness, prosody, contrast, information struc- ture 1 The label ‘local alternatives’ for the ‘relative giveneness’ account in Wagner (2005, 2006b) was proposed in Rooth (2007) and seems like a more appropriate term. 2 Michael Wagner 1.1 Three Phenomena, or Two, or One? The distribution of accents in a sentence in English is affected by informa- tion structure in a systematic way. Information structure comprises a broad set of phenomena. This paper looks at three types of effects that influence the prosody of a sentence, those of question-answer congruence, contrast, and givenness, and presents evidence—some new, some already presented in Wag- ner (2005, 2006b)—that all three should be treated as reflexes of the same underlying phenomenon.
    [Show full text]
  • Coreference Resolution for Spoken French Loïc Grobol
    Coreference resolution for spoken French Loïc Grobol To cite this version: Loïc Grobol. Coreference resolution for spoken French. Computation and Language [cs.CL]. Université Sorbonne Nouvelle - Paris 3, 2020. English. tel-02928209v2 HAL Id: tel-02928209 https://hal.archives-ouvertes.fr/tel-02928209v2 Submitted on 9 Mar 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License École Doctorale 622 — Sciences du langage Lattice, Inria Thèse de doctorat en sciences du langage de l’Université Sorbonne Nouvelle Coreference resolution for spoken French présentée et soutenue publiquement par Loïc Grobol le 15 juillet 2020 Sous la direction d’Isabelle Tellier† et de Frédéric Landragin Et co-encadrée par Éric Villemonte de la Clergerie et Marco Dinarelli Jury : Massimo Poesio, Professor (Queen Mary University), Rapporteur, Sophie Rosset, Directrice de recherche (LIMSI), Rapportrice, Béatrice Daille, Professeur (Université de Nantes), Examinatrice, Pascal Amsili, Professeur (Université Sorbonne Nouvelle), Examinateur, Frédéric Landragin, Directeur de recherche (Lattice), Directeur, Éric Villemonte de la Clergerie, Chargé de recherche (Inria), Co-encadrant, Marco Dinarelli, Chargé de recherche (LIG), Co-encadrant. Reconnaissance automatique de chaînes de coréférences en français parlé Résumé Une chaîne de coréférences est l’ensemble des expressions linguistiques — ou mentions — qui font référence à une même entité ou un même objet du discours.
    [Show full text]
  • BORE ASPECTS OP MODERN GREEK SYLTAX by Athanaaios Kakouriotis a Thesis Submitted Fox 1 the Degree of Doctor of Philosophy Of
    BORE ASPECTS OP MODERN GREEK SYLTAX by Athanaaios Kakouriotis A thesis submitted fox1 the degree of Doctor of Philosophy of the University of London School of Oriental and African Studies University of London 1979 ProQuest Number: 10731354 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a com plete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. uest ProQuest 10731354 Published by ProQuest LLC(2017). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States C ode Microform Edition © ProQuest LLC. ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106- 1346 II Abstract The present thesis aims to describe some aspects of Mod Greek syntax.It contains an introduction and five chapters. The introduction states the purpose for writing this thesis and points out the fact that it is a data-oriented rather, chan a theory-^oriented work. Chapter one deals with the word order in Mod Greek. The main conclusion drawn from this chapter is that, given the re­ latively rich system of inflexions of Mod Greek,there is a freedom of word order in this language;an attempt is made to account for this phenomenon in terms of the thematic structure. of the sentence and PSP theory. The second chapter examines the clitics;special attention is paid to clitic objects and some problems concerning their syntactic relations .to the rest of the sentence are pointed out;the chapter ends with the tentative suggestion that cli­ tics might be taken care of by the morphologichi component of the grammar• Chapter three deals with complementation;this a vast area of study and-for this reason the analysis is confined to 'oti1, 'na* and'pu' complement clauses; Object Raising, Verb Raising and Extraposition are also discussed in this chapter.
    [Show full text]
  • Mind the GAP: a Balanced Corpus of Gendered Ambiguous Pronouns
    Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns Kellie Webster and Marta Recasens and Vera Axelrod and Jason Baldridge Google AI Language fwebsterk|recasens|vaxelrod|[email protected] Abstract (1) In May, Fujisawa joined Mari Motohashi’s rink as the team’s skip, moving back from Karuizawa to Ki- Coreference resolution is an important task tami where she had spent her junior days. for natural language understanding, and the resolution of ambiguous pronouns a long- With this scope, we make three key contributions: standing challenge. Nonetheless, existing • We design an extensible, language- corpora do not capture ambiguous pronouns in sufficient volume or diversity to accu- independent mechanism for extracting rately indicate the practical utility of mod- challenging ambiguous pronouns from text. els. Furthermore, we find gender bias in • We build and release GAP, a human-labeled existing corpora and systems favoring mas- corpus of 8,908 ambiguous pronoun-name culine entities. To address this, we present pairs derived from Wikipedia.2 This dataset and release GAP, a gender-balanced labeled targets the challenges of resolving naturally- corpus of 8,908 ambiguous pronoun-name occurring ambiguous pronouns and rewards pairs sampled to provide diverse coverage systems which are gender-fair. of challenges posed by real-world text. We explore a range of baselines which demon- • We run four state-of-the-art coreference re- strate the complexity of the challenge, the solvers and several competitive simple base- best achieving just 66.9% F1. We show lines on GAP to understand limitations in that syntactic structure and continuous neu- current modeling, including gender bias.
    [Show full text]
  • Constraints on Donkey Pronouns
    Constraints on Donkey Pronouns The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Grosz, P. G., P. Patel-Grosz, E. Fedorenko, and E. Gibson. “Constraints on Donkey Pronouns.” Journal of Semantics 32, no. 4 (July 15, 2014): 619–648. As Published http://dx.doi.org/10.1093/jos/ffu009 Publisher Oxford University Press Version Author's final manuscript Citable link http://hdl.handle.net/1721.1/102962 Terms of Use Creative Commons Attribution-Noncommercial-Share Alike Detailed Terms http://creativecommons.org/licenses/by-nc-sa/4.0/ CONSTRAINTS ON DONKEY PRONOUNS Patrick Grosz, Pritty Patel-Grosz, Evelina Fedorenko, Edward Gibson Abstract This paper reports on an experimental study of donkey pronouns, pronouns (e.g. it) whose meaning covaries with that of a non-pronominal noun phrase (e.g. a donkey) even though they are not in a structural relationship that is suitable for quantifier-variable binding. We investigate three constraints, (i) the preference for the presence of an overt NP antecedent that is not part of another word, (ii) the salience of the position of an antecedent that is part of another word, and (iii) the uniqueness of an intended antecedent (in terms of world knowledge). We compare constructions in which intended antecedents occur in a context such as who owns an N / who is an N-owner with constructions of the type who was without an N / who was N-less. Our findings corroborate the existence of the overt NP antecedent constraint, and also show that the salience of an unsuitable antecedent’s position matters.
    [Show full text]
  • Definiteness in Languages with and Without Articles
    Definiteness in languages with and without articles Jingting Ye & Laura Becker Fudan University & Leipzig University 13th Conference on Typology and Grammar for Young Scholars 24.11.2016 ILS RAN, St Petersburg Jingting Ye & Laura Becker Definiteness and articles 1 / 49 1 Introduction 2 (In)definiteness and specificity 3 Outline: A pilot study based on multiple parallel movie subtitles 4 Results Comparing the use of articles Examples Factor strength based on random forests Other strategies to mark (in)definiteness Demonstratives Adnominal indefinites The numeral one Word order The levels of givenness Relevant factors for definiteness Trees Random forests 5 Concluding remarks Jingting Ye & Laura Becker Definiteness and articles 2 / 49 Introduction Introduction There are many accounts for definiteness, however, most rely on language-specific expressions, e.g. definite articles. Although comparative studies exist, no empirically based cross-linguistic study seems to be available yet that makes expressions of definiteness comparable directly. This pilot study explores the possibilities of parallel texts for comparing the expression of definiteness in languages with and without articles. The languages we examined are German, Hungarian, Russian, and Chinese. Jingting Ye & Laura Becker Definiteness and articles 3 / 49 (In)definiteness and specificity Definiteness Definiteness has been associated with the following concepts: uniqueness (Frege 1892; Strawson 1950; Heim & Kratzer 1998; Stanley & Gendler Szabó 2000) familiarity (Heim 1988; Roberts 2003; Chierchia 1995)
    [Show full text]
  • Arxiv:1805.11824V1 [Cs.CL] 30 May 2018
    Artificial Intelligence Review manuscript No. (will be inserted by the editor) Anaphora and Coreference Resolution: A Review Rhea Sukthanker · Soujanya Poria · Erik Cambria · Ramkumar Thirunavukarasu Received: date / Accepted: date Abstract Entity resolution aims at resolving repeated references to an entity in a document and forms a core component of natural language processing (NLP) research. This field possesses immense potential to improve the performance of other NLP fields like machine translation, sentiment analysis, paraphrase detection, summarization, etc. The area of entity resolution in NLP has seen proliferation of research in two separate sub-areas namely: anaphora resolution and coreference resolution. Through this review article, we aim at clarifying the scope of these two tasks in entity resolution. We also carry out a detailed analysis of the datasets, evaluation metrics and research methods that have been adopted to tackle this NLP problem. This survey is motivated with the aim of providing the reader with a clear understanding of what constitutes this NLP problem and the issues that require attention. Keywords Entity Resolution · Coreference Resolution · Anaphora Resolution · Natural Language Processing · Sentiment Analysis · Deep Learning 1 Introduction A discourse is a collocated group of sentences which convey a clear understanding only when read together. The etymology of anaphora is ana (Greek for back) and pheri (Greek for to bear), which in simple terms means repetition. In computational linguistics, anaphora is typically defined as references to items mentioned earlier in the discourse or \pointing back" reference as described by (Mitkov, 1999). The most prevalent type of anaphora in natural language is the pronominal anaphora (Lappin and Leass, 1994).
    [Show full text]
  • Event Coreference Resolution: a Survey of Two Decades of Research
    Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18) Event Coreference Resolution: A Survey of Two Decades of Research Jing Lu and Vincent Ng Human Language Technology Research Institute University of Texas at Dallas Richardson, TX 75083-0688 fljwinnie,[email protected] Abstract Recent years have seen a gradual shift of focus from entity-based tasks to event-based tasks in informa- tion extraction research. Being a core event-based task, event coreference resolution is less studied but arguably more challenging than entity corefer- ence resolution. This paper provides an overview of the major milestones made in event coreference research since its inception two decades ago. Figure 1: The standard information extraction pipeline 1 Introduction It should be easy to see from this example that to per- Compared to entity coreference resolution, event coreference form end-to-end event coreference resolution, one has to resolution is less studied but arguably more challenging. To build an information extraction (IE) pipeline (cf. Figure 1) see its difficulty, consider the following example on within- that involves (1) extracting the entity mentions from a given document event coreference resolution, whose goal is to de- document (the entity extraction component) and determining termine which event mentions in a document refer to the same which of them are coreferent (the entity coreference compo- real-world event: nent); (2) extracting the event mentions by identifying their trigger words/phrases and determining which entity mentions Georges Cipriani fleftg a prison in Ensisheim ev1 are their arguments (the event extraction component); and (3) in northern France on parole on Wednesday.
    [Show full text]