Computational Modelling of Coreference and Bridging Resolution

Computational modelling of coreference and bridging resolution

Dissertation approved by the Faculty of Computer Science, Electrical Engineering and Information Technology of the University of Stuttgart for the degree of Doctor of Philosophy (Dr. phil.).

Submitted by Ina Verena Rösiger, from Göppingen
Main examiner: Prof. Dr. Jonas Kuhn
Co-examiner: Prof. Dr. Simone Teufel
Date of the oral examination: 28.01.2019
Institut für Maschinelle Sprachverarbeitung der Universität Stuttgart, 2019

Statement of Authorship

I hereby declare that I wrote this thesis independently and used no literature other than that cited. All quotations and paraphrased borrowings are marked as such, with the exact source given.

I hereby declare that this text is the result of my own work and that I have not used sources without declaration in the text. Any thoughts from others or literal quotations are clearly marked.

Place, date    Signature

Contents

1. Introduction
  1.1. Motivation
  1.2. Research questions
  1.3. Contributions and publications
  1.4. Outline of the thesis

I. Background

2. Anaphoric reference
  2.1. Coreference
  2.2. Bridging
3. Related NLP tasks
  3.1. Coreference resolution
  3.2. Bridging resolution

II. Data and tool creation

4. Annotation and data creation
  4.1. Coreference annotation and existing corpora
  4.2. Bridging annotation and existing corpora
  4.3. Newly created corpus resources
    4.3.1. BASHI: bridging in news text
    4.3.2. SciCorp: coreference and bridging in scientific articles
    4.3.3. GRAIN: coreference and bridging in radio interviews
    4.3.4. Conclusion
5. Coreference resolution
  5.1. Existing tools and related work
  5.2. A coreference system for German
    5.2.1. System and data
    5.2.2. Adapting the system to German
    5.2.3. Evaluation
    5.2.4. Ablation experiments
    5.2.5. Pre-processing pipeline: running the system on new texts
    5.2.6. Application on DIRNDL
  5.3. Conclusion
6. Bridging resolution
  6.1. A rule-based bridging system for English
    6.1.1. Reimplementation
    6.1.2. Performance
    6.1.3. Generalisability of the approach
  6.2. CRAC 2018: first shared task on bridging resolution
    6.2.1. The ARRAU corpus
    6.2.2. Data preparation
    6.2.3. Evaluation scenarios and metrics
    6.2.4. Applying the rule-based system to ARRAU
  6.3. A refined bridging definition
    6.3.1. Referential bridging
    6.3.2. Lexical bridging
    6.3.3. Subset relations and lexical givenness
    6.3.4. Near-identity
    6.3.5. Priming and bridging
  6.4. Shared task results
    6.4.1. Rules for bridging in ARRAU
    6.4.2. A learning-based method
    6.4.3. Final performance
  6.5. A rule-based bridging system for German
    6.5.1. Adaptation to German
    6.5.2. Performance
  6.6. Conclusion

III. Linguistic validation experiments

7. Using prosodic information to improve coreference resolution
  7.1. Motivation
  7.2. Background
  7.3. Related work
  7.4. Experimental setup
  7.5. Prosodic features
  7.6. Manual prosodic information
  7.7. Automatically predicted prosodic information
  7.8. Results and discussion
  7.9. Conclusion and future work
8. Integrating predictions from neural-network relation classifiers into coreference and bridging resolution
  8.1. Relation hypotheses
  8.2. Experimental setup
  8.3. First experiment
    8.3.1. Semantic relation classification
    8.3.2. Relation analysis
    8.3.3. Relations for bridging resolution
    8.3.4. Relations for coreference resolution
  8.4. Second experiment
    8.4.1. Semantic relation classification
    8.4.2. Relation analysis
    8.4.3. Relations for coreference and bridging resolution
  8.5. Final performance of the bridging tool
  8.6. Discussion and conclusion
9. Conclusion
  9.1. Summary of contributions
  9.2. Lessons learned
  9.3. Future work

Bibliography

List of figures

1.1. Four levels of contribution
1.2. Contributions to coreference resolution
1.3. Contributions to bridging resolution
2.1. Reference as the relation between referring expressions and referents
3.1. Latent trees for coreference resolution: data structures
4.1. Contribution and workflow pipeline for coreference: data creation
4.2. Contribution and workflow pipeline for bridging: data creation
5.1. Contribution and workflow pipeline for coreference: tool creation
6.1. Contribution and workflow pipeline for bridging: tool creation
6.2. Contribution and workflow pipeline for bridging: task definition (reloaded)
7.1. Contribution and workflow pipeline for coreference: validation, part 1
7.2. One exemplary pitch accent shape
7.3. The relation between phrase boundaries and intonation phrases
7.4. The relation between boundary tones and nuclear and prenuclear accents
7.5. Convolutional neural network model for prosodic event recognition
7.6. The relation between coreference and prominence: example from the DIRNDL dataset with English translation
8.1. Contribution and workflow pipeline for coreference: validation, part 2
8.2. Contribution and workflow pipeline for bridging: validation
8.3. Neural net relation classifier: example of a non-related pair
8.4. Neural net relation classifier: example of a hypernym pair
8.5. Neural net relation classifier in the second experiment
9.1. A data structure based on latent trees for the joint learning of coreference and bridging

List of tables

4.1. Guideline comparison: overview of the main differences between OntoNotes, RefLex and NoSta-D
4.2. Existing corpora annotated with coreference used in this thesis
4.3. Existing corpora annotated with bridging used in this work
4.4. An overview of the newly created data
4.5. BASHI: corpus statistics
4.6. BASHI: inter-annotator agreement on five WSJ articles
4.7. SciCorp: categories and links in our classification scheme
4.8. SciCorp: overall inter-annotator-agreement (in κ)
4.9. SciCorp: inter-annotator-agreement for the single categories (in κ)
4.10. SciCorp: corpus statistics
4.11. SciCorp: distribution of information status categories, in absolute numbers
4.12. SciCorp: distribution of information status categories, in percent
5.1. IMS HotCoref DE: performance of the mention extraction module on TüBa-D/Z version 8
5.2. IMS HotCoref DE: performance of the mention extraction module after the respective parse adjustments, on TüBa-D/Z version 8
5.3. IMS HotCoref DE: performance of the mention extraction module on TüBa-D/Z version 10
5.4. Performance of IMS HotCoref DE on TüBa-D/Z version 10: gold vs. predicted annotations
5.5. SemEval-2010 official shared task results for German
5.6. SemEval-2010 post-task evaluation
5.7. SemEval-2010: post-task evaluation, excluding singletons
5.8. Performance of IMS HotCoref DE on TüBa-D/Z version 10: ablation experiments
5.9. CoNLL-12 format overview: tab-separated columns and content
5.10. Markable extraction for the DIRNDL corpus
5.11. Performance
Recommended publications
  • Questions Under Discussion: from Sentence to Discourse
    Questions under Discussion: From sentence to discourse. Anton Benz, Katja Jasinskaja. July 14, 2016. Abstract: Questions under discussion (QUD) is an analytic tool that has recently become more and more popular among linguists and language philosophers as a way to characterize how a sentence fits in its context. The idea is that each sentence in discourse addresses a (often implicit) QUD either by answering it, or by bringing up another question that can help answering that QUD. The linguistic form and the interpretation of a sentence, in turn, may depend on the QUD it addresses. The first proponents of the QUD approach (von Stutterheim & Klein, 1989; van Kuppevelt, 1995) thought of it as a general approach to the analysis of discourse structure where structural relations between sentences in a coherent discourse are understood in terms of relations between questions they address. However, the idea did not find broad application in discourse structure analysis, where theories aiming at logical discourse representations are predominantly based on the notion of coherence relations (Mann & Thompson, 1988; Hobbs, 1985; Asher & Lascarides, 2003). The problem of finding overt markers for QUDs means that they are also difficult to utilize for quantitative measures of discourse coherence (McNamara et al., 2010). At the same time QUD proved useful in the analysis of a wide range of linguistic phenomena including accentuation, discourse particles, presuppositions and implicatures. The goal of this special issue is to bring together cutting-edge research that demonstrates the success of QUD in linguistics and the need for the development of a comprehensive model of discourse coherence that incorporates the notion of QUD as its integral part.
  • From Phenomenological Self-Givenness to the Notion of Spiritual Freedom IRIS HENNIGFELD
    PhænEx 13, no. 2 (Winter 2020): 38-51 © 2020 Iris Hennigfeld From Phenomenological Self-Givenness to the Notion of Spiritual Freedom IRIS HENNIGFELD The program of the phenomenological movement, “Back to the things themselves!” does not point to any particular topic of phenomenological investigation, for example the question of being (ontology) or the question of God (metaphysics). Rather, it designates a specific methodological approach toward the things, and, as its final aim, toward a particular mode in which the things are given in experience. One particular mode in which the things are given with full evidence and in original intuition is “self-givenness”. In the experience of self-givenness, the object is fully present, or, in Edmund Husserl’s words, given in “originary presentive intuition” (Husserl, Ideas I 44, cf. 36). The correlative act on the part of the subject, the noesis, is a “seeing” which is not restricted to the senses but must be understood “in the universal sense as an originally presentive consciousness of any kind whatever” (Husserl, Ideas I 36). Self-givenness also represents the criterion of evidence and truth. The phenomenological principles of givenness and self-givenness can be made fruitful in particular for a description and investigation of those phenomena which, according to their very nature, resist other theoretical approaches. Such more “resilient” phenomena include interpersonal, religious or aesthetic experiences. Presentation becomes, in accordance with philosophical tradition, the dominant model in phenomenological research. But first-person experience shows that presentation is not the only mode of givenness and self-givenness. Rather, pushing phenomenology to its limits, the full range of experience shows that there are certain kinds of phenomena of which the genuine mode of givenness cannot be reduced to the way in which objects are presented to a subject.
  • Givenness and Its Realization in a Linguistic and in a Non-Linguistic Environment
    VILNIUS PEDAGOGICAL UNIVERSITY, FACULTY OF FOREIGN LANGUAGES, DEPARTMENT OF ENGLISH PHILOLOGY
    NATALIJA ZACHAROVA: GIVENNESS AND ITS REALIZATION IN A LINGUISTIC AND IN A NON-LINGUISTIC ENVIRONMENT. MA Paper. Academic advisor: prof. Dr. Hab. Laimutis Valeika. Vilnius, 2008.
    This MA paper is submitted in partial fulfillment of requirements for the degree of the MA in English Philology by Natalija Zacharova. I declare that this study is my own and does not contain any unacknowledged work from any source. (Signature) (Date)
    CONTENTS
    ABSTRACT
    INTRODUCTION
    1. THE PROBLEMS OF THE INFORMATIONAL STRUCTURE OF THE SENTENCE
      1.1. The sentence as dialectical entity of given and new
      1.2. Givenness vs. Newness
      1.3. The realization of Givenness
      1.4. Givenness expressed by the definite article
      1.5. Givenness expressed by the indefinite article
      1.6. Givenness expressed by semi-grammatical definite determiners and lexical determiners
    2. THE REALIZATION OF GIVENNESS IN A LINGUISTIC ENVIRONMENT
      2.1. Anaphoric Givenness
      2.2. Cataphoric Givenness
      2.3. Givenness expressed by the use of the indefinite article
    3. THE REALIZATION OF GIVENNESS IN A NON-LINGUISTIC ENVIRONMENT
      3.1. The environment of the home
      3.2. The environment of the town/country, world
      3.3. The environment of the universe
      3.4. Cultural environment
    4.
  • UC Santa Cruz Electronic Theses and Dissertations
    Title: Syntax & Information Structure: The Grammar of English Inversions
    Permalink: https://escholarship.org/uc/item/2sv7q1pm
    Author: Samko, Bern
    Publication Date: 2016
    License: https://creativecommons.org/licenses/by-nc-nd/4.0/ (4.0)
    Peer reviewed | Thesis/dissertation
    eScholarship.org, powered by the California Digital Library, University of California
    UNIVERSITY OF CALIFORNIA SANTA CRUZ. SYNTAX & INFORMATION STRUCTURE: THE GRAMMAR OF ENGLISH INVERSIONS. A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in LINGUISTICS by Bern Samko, June 2016. The Dissertation of Bern Samko is approved: Professor Jim McCloskey, chair; Associate Professor Pranav Anand; Associate Professor Line Mikkelsen; Assistant Professor Maziar Toosarvandani; Tyrus Miller, Vice Provost and Dean of Graduate Studies. Copyright © by Bern Samko 2016.
    Contents
    Acknowledgments
    1 Introduction
      1.1 The questions
        1.1.1 Summary of results
      1.2 Syntax
      1.3 Information Structure
        1.3.1 Topic
        1.3.2 Focus
        1.3.3 Givenness
        1.3.4 The distinctness of information structural notions
        1.3.5 The QUD
      1.4 The relationship between syntax and information structure
      1.5 The phenomena
        1.5.1
  • Focus and Givenness: a Unified Approach. Michael Wagner, McGill University
    Focus and Givenness: A Unified Approach. Michael Wagner, McGill University.
    Abstract: Theories about information structure differ in treating focus-, contrast-, and givenness-marking as either a single phenomenon or as distinct phenomena. This paper argues for a unified approach, following the lead of Schwarzschild (1999), but a unified approach based on local alternatives (Wagner, 2005, 2006b). (Footnote: The label 'local alternatives' for the 'relative givenness' account in Wagner (2005, 2006b) was proposed in Rooth (2007) and seems like a more appropriate term.) The basic insights of the local alternatives approach are captured using a new formalization, which aims to fix several problems in the original proposal noted in Büring (2008). The unified approach is compared with alternative approaches that invoke separate mechanisms for givenness and focus: the disanaphora approach (Williams, 1997), the F-Projection account (Selkirk, 1984, 1995), and the reference set approach in Szendröi (2001) and Reinhart (2006). All of these have fundamental problems, and the requisite revisions render them equivalent to the local alternatives approach.
    Keywords: alternatives, focus, givenness, prosody, contrast, information structure
    1.1 Three Phenomena, or Two, or One?
    The distribution of accents in a sentence in English is affected by information structure in a systematic way. Information structure comprises a broad set of phenomena. This paper looks at three types of effects that influence the prosody of a sentence, those of question-answer congruence, contrast, and givenness, and presents evidence (some new, some already presented in Wagner (2005, 2006b)) that all three should be treated as reflexes of the same underlying phenomenon.
  • Coreference Resolution for Spoken French Loïc Grobol
    Coreference resolution for spoken French. Loïc Grobol.
    To cite this version: Loïc Grobol. Coreference resolution for spoken French. Computation and Language [cs.CL]. Université Sorbonne Nouvelle - Paris 3, 2020. English. tel-02928209v2
    HAL Id: tel-02928209, https://hal.archives-ouvertes.fr/tel-02928209v2. Submitted on 9 Mar 2021.
    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Distributed under a Creative Commons Attribution 4.0 International License.
    École Doctorale 622: Sciences du langage. Lattice, Inria. Doctoral thesis in language sciences, Université Sorbonne Nouvelle. Coreference resolution for spoken French, presented and publicly defended by Loïc Grobol on 15 July 2020. Supervised by Isabelle Tellier† and Frédéric Landragin, and co-supervised by Éric Villemonte de la Clergerie and Marco Dinarelli.
    Jury: Massimo Poesio, Professor (Queen Mary University), reviewer; Sophie Rosset, Research Director (LIMSI), reviewer; Béatrice Daille, Professor (Université de Nantes), examiner; Pascal Amsili, Professor (Université Sorbonne Nouvelle), examiner; Frédéric Landragin, Research Director (Lattice), supervisor; Éric Villemonte de la Clergerie, Research Scientist (Inria), co-supervisor; Marco Dinarelli, Research Scientist (LIG), co-supervisor.
    Automatic recognition of coreference chains in spoken French. Abstract: A coreference chain is the set of linguistic expressions, or mentions, that refer to the same entity or the same discourse object.
  • Some Aspects of Modern Greek Syntax by Athanasios Kakouriotis: a Thesis Submitted for the Degree of Doctor of Philosophy Of
    SOME ASPECTS OF MODERN GREEK SYNTAX by Athanasios Kakouriotis. A thesis submitted for the degree of Doctor of Philosophy of the University of London. School of Oriental and African Studies, University of London, 1979.
    ProQuest Number: 10731354. All rights reserved. INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. Published by ProQuest LLC (2017). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.
    Abstract: The present thesis aims to describe some aspects of Mod Greek syntax. It contains an introduction and five chapters. The introduction states the purpose for writing this thesis and points out the fact that it is a data-oriented rather than a theory-oriented work. Chapter one deals with the word order in Mod Greek. The main conclusion drawn from this chapter is that, given the relatively rich system of inflexions of Mod Greek, there is a freedom of word order in this language; an attempt is made to account for this phenomenon in terms of the thematic structure of the sentence and FSP theory. The second chapter examines the clitics; special attention is paid to clitic objects and some problems concerning their syntactic relations to the rest of the sentence are pointed out; the chapter ends with the tentative suggestion that clitics might be taken care of by the morphological component of the grammar. Chapter three deals with complementation; this is a vast area of study and for this reason the analysis is confined to 'oti', 'na' and 'pu' complement clauses; Object Raising, Verb Raising and Extraposition are also discussed in this chapter.
  • Mind the GAP: a Balanced Corpus of Gendered Ambiguous Pronouns
    Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns. Kellie Webster and Marta Recasens and Vera Axelrod and Jason Baldridge. Google AI Language. fwebsterk|recasens|vaxelrod|[email protected]
    Abstract: Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun-name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines which demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neu-
    (1) In May, Fujisawa joined Mari Motohashi's rink as the team's skip, moving back from Karuizawa to Kitami where she had spent her junior days.
    With this scope, we make three key contributions:
      • We design an extensible, language-independent mechanism for extracting challenging ambiguous pronouns from text.
      • We build and release GAP, a human-labeled corpus of 8,908 ambiguous pronoun-name pairs derived from Wikipedia. This dataset targets the challenges of resolving naturally-occurring ambiguous pronouns and rewards systems which are gender-fair.
      • We run four state-of-the-art coreference resolvers and several competitive simple baselines on GAP to understand limitations in current modeling, including gender bias.
  • Constraints on Donkey Pronouns
    Constraints on Donkey Pronouns. The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.
    Citation: Grosz, P. G., P. Patel-Grosz, E. Fedorenko, and E. Gibson. "Constraints on Donkey Pronouns." Journal of Semantics 32, no. 4 (July 15, 2014): 619–648.
    As Published: http://dx.doi.org/10.1093/jos/ffu009
    Publisher: Oxford University Press
    Version: Author's final manuscript
    Citable link: http://hdl.handle.net/1721.1/102962
    Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike
    Detailed Terms: http://creativecommons.org/licenses/by-nc-sa/4.0/
    CONSTRAINTS ON DONKEY PRONOUNS. Patrick Grosz, Pritty Patel-Grosz, Evelina Fedorenko, Edward Gibson.
    Abstract: This paper reports on an experimental study of donkey pronouns, pronouns (e.g. it) whose meaning covaries with that of a non-pronominal noun phrase (e.g. a donkey) even though they are not in a structural relationship that is suitable for quantifier-variable binding. We investigate three constraints, (i) the preference for the presence of an overt NP antecedent that is not part of another word, (ii) the salience of the position of an antecedent that is part of another word, and (iii) the uniqueness of an intended antecedent (in terms of world knowledge). We compare constructions in which intended antecedents occur in a context such as who owns an N / who is an N-owner with constructions of the type who was without an N / who was N-less. Our findings corroborate the existence of the overt NP antecedent constraint, and also show that the salience of an unsuitable antecedent's position matters.
  • Definiteness in Languages with and Without Articles
    Definiteness in languages with and without articles. Jingting Ye & Laura Becker, Fudan University & Leipzig University. 13th Conference on Typology and Grammar for Young Scholars, 24.11.2016, ILS RAN, St Petersburg. Jingting Ye & Laura Becker, Definiteness and articles, 1 / 49.
    Outline:
    1 Introduction
    2 (In)definiteness and specificity
    3 Outline: A pilot study based on multiple parallel movie subtitles
    4 Results
      Comparing the use of articles
      Examples
      Factor strength based on random forests
      Other strategies to mark (in)definiteness
      Demonstratives
      Adnominal indefinites
      The numeral one
      Word order
      The levels of givenness
      Relevant factors for definiteness
      Trees
      Random forests
    5 Concluding remarks
    Introduction: There are many accounts for definiteness, however, most rely on language-specific expressions, e.g. definite articles. Although comparative studies exist, no empirically based cross-linguistic study seems to be available yet that makes expressions of definiteness comparable directly. This pilot study explores the possibilities of parallel texts for comparing the expression of definiteness in languages with and without articles. The languages we examined are German, Hungarian, Russian, and Chinese.
    (In)definiteness and specificity: Definiteness has been associated with the following concepts: uniqueness (Frege 1892; Strawson 1950; Heim & Kratzer 1998; Stanley & Gendler Szabó 2000); familiarity (Heim 1988; Roberts 2003; Chierchia 1995)
  • arXiv:1805.11824v1 [cs.CL] 30 May 2018
    Artificial Intelligence Review manuscript No. (will be inserted by the editor). Anaphora and Coreference Resolution: A Review. Rhea Sukthanker · Soujanya Poria · Erik Cambria · Ramkumar Thirunavukarasu. Received: date / Accepted: date.
    Abstract: Entity resolution aims at resolving repeated references to an entity in a document and forms a core component of natural language processing (NLP) research. This field possesses immense potential to improve the performance of other NLP fields like machine translation, sentiment analysis, paraphrase detection, summarization, etc. The area of entity resolution in NLP has seen proliferation of research in two separate sub-areas namely: anaphora resolution and coreference resolution. Through this review article, we aim at clarifying the scope of these two tasks in entity resolution. We also carry out a detailed analysis of the datasets, evaluation metrics and research methods that have been adopted to tackle this NLP problem. This survey is motivated with the aim of providing the reader with a clear understanding of what constitutes this NLP problem and the issues that require attention.
    Keywords: Entity Resolution · Coreference Resolution · Anaphora Resolution · Natural Language Processing · Sentiment Analysis · Deep Learning
    1 Introduction
    A discourse is a collocated group of sentences which convey a clear understanding only when read together. The etymology of anaphora is ana (Greek for back) and pheri (Greek for to bear), which in simple terms means repetition. In computational linguistics, anaphora is typically defined as references to items mentioned earlier in the discourse or "pointing back" reference as described by (Mitkov, 1999). The most prevalent type of anaphora in natural language is the pronominal anaphora (Lappin and Leass, 1994).
  • Event Coreference Resolution: a Survey of Two Decades of Research
    Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). Event Coreference Resolution: A Survey of Two Decades of Research. Jing Lu and Vincent Ng, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688. fljwinnie,[email protected]
    Abstract: Recent years have seen a gradual shift of focus from entity-based tasks to event-based tasks in information extraction research. Being a core event-based task, event coreference resolution is less studied but arguably more challenging than entity coreference resolution. This paper provides an overview of the major milestones made in event coreference research since its inception two decades ago.
    1 Introduction
    Compared to entity coreference resolution, event coreference resolution is less studied but arguably more challenging. To see its difficulty, consider the following example on within-document event coreference resolution, whose goal is to determine which event mentions in a document refer to the same real-world event:
    Georges Cipriani {left}_ev1 a prison in Ensisheim in northern France on parole on Wednesday.
    [Figure 1: The standard information extraction pipeline]
    It should be easy to see from this example that to perform end-to-end event coreference resolution, one has to build an information extraction (IE) pipeline (cf. Figure 1) that involves (1) extracting the entity mentions from a given document (the entity extraction component) and determining which of them are coreferent (the entity coreference component); (2) extracting the event mentions by identifying their trigger words/phrases and determining which entity mentions are their arguments (the event extraction component); and (3)