Improving Coreference Resolution with Automatically Predicted Prosodic Information

Ina Rösiger*, Sabrina Stehwien*, Arndt Riester, Ngoc Thang Vu
Institute for Natural Language Processing
University of Stuttgart, Germany
{roesigia,stehwisa,arndt,thangvu}@ims.uni-stuttgart.de

Proceedings of the First Workshop on Speech-Centric Natural Language Processing, pages 78–83, Copenhagen, Denmark, September 7–11, 2017. © 2017 Association for Computational Linguistics

Abstract

Adding manually annotated prosodic information, specifically pitch accents and phrasing, to the typical text-based feature set for coreference resolution has previously been shown to have a positive effect on German data. Practical applications on spoken language, however, would rely on automatically predicted prosodic information. In this paper we predict pitch accents (and phrase boundaries) using a convolutional neural network (CNN) model from acoustic features extracted from the speech signal. After an assessment of the quality of these automatic prosodic annotations, we show that they also significantly improve coreference resolution.

1 Introduction

Noun phrase coreference resolution is the task of grouping noun phrases (NPs) together that refer to the same discourse entity in a text or dialogue. In Example (1), taken from Umbach (2002), the question for the coreference resolver, besides linking the anaphoric pronoun he back to John, is to decide whether an old cottage and the shed refer to the same entity.

(1) {John}₁ has {an old cottage}₂.
    Last year {he}₁ reconstructed {the shed}?.

Coreference resolution is an active NLP research area, with its own track at most NLP conferences and several shared tasks such as the CoNLL or SemEval shared tasks (Pradhan et al., 2012; Recasens et al., 2010) or the CORBON shared task 2017¹. Almost all work is based on text, although there exist a few systems for pronoun resolution in transcripts of spoken text (Strube and Müller, 2003; Tetreault and Allen, 2004). It has been shown that there are differences between written and spoken text that lead to a drop in performance when coreference resolution systems developed for written text are applied to spoken text (Amoia et al., 2012). For this reason, it may help to use additional information available from the speech signal, for example prosody.

In West-Germanic languages, such as English and German, there is a tendency for coreferent items, i.e. entities that have already been introduced into the discourse (their information status is given), to be deaccented, as the speaker assumes the entity to be salient in the listener's discourse model (cf. Terken and Hirschberg (1994); Baumann and Riester (2013); Baumann and Roth (2014)).

We can make use of this fact by providing prosodic information to the coreference resolver. Example (2), this time marked with prominence information, shows that prominence can help us resolve cases where the transcription is potentially ambiguous².

(2) {John}₁ has {an old cottage}₂.
    a. Last year {he}₁ reconstructed {the SHED}₃.
    b. Last year {he}₁ reconSTRUCted {the shed}₂.

The pitch accent on shed in (2a) leads to the interpretation that the shed and the cottage refer to different entities, where the shed is a part of the cottage (they are in a bridging relation). In contrast, in (2b), the shed is deaccented, which suggests that the shed and the cottage corefer.

A pilot study by Rösiger and Riester (2015) has shown that enhancing the text-based feature set for a coreference resolver, consisting of e.g. automatic part-of-speech (POS) tags and syntactic information, with pitch accents and prosodic phrasing information helps to improve coreference resolution of German spoken text. The prosodic labels used in the experiments were annotated manually, which is not only expensive but also not applicable in an automatic pipeline setup. In our paper, we present an experiment in which we replicate the main results from the pilot study by annotating the prosodic information automatically, thus omitting any manual annotations from the feature set. We show that adding prosodic information significantly helps in all of our experiments.

2 Prosodic features for coreference resolution

Similar to the pilot study, we make use of pitch accents and prosodic phrasing. We predict the presence of a pitch accent³ and use phrase boundaries to derive nuclear accents, which are taken to be the last (and perceptually most prominent) accent in an intonation phrase. This paper tests whether previously reported tendencies for manual labels are also observable for automatic labels, namely:

Short NPs. Since long, complex NPs almost always have at least one pitch accent, the presence or absence of a pitch accent is more helpful for shorter phrases.

Long NPs. For long, complex NPs, we look for nuclear accents that indicate the phrase's overall prominence. If the NP contains a nuclear accent, it is assumed to be less likely to take part in coreference chains.

We test the following features that have proven beneficial in the pilot study. These features are derived for each NP.

Pitch accent presence focuses on the presence of a pitch accent, disregarding its type. If one accent is present in the NP, this boolean feature gets assigned the value true, and false otherwise.

Nuclear accent presence is a boolean feature comparable to pitch accent presence. It gets assigned the value true if there is a nuclear accent present in the NP, false otherwise.

3 Data

To ensure comparability, we use the same dataset as in the pilot study, namely the DIRNDL corpus (Eckart et al., 2012; Björkelund et al., 2014), a German radio news corpus annotated with both manual coreference and manual prosody labels. We adopt the official train, test and development split⁴ designed for research on coreference resolution. The recorded news broadcasts in the DIRNDL-anaphora corpus were spoken by 13 male and 7 female speakers, in total roughly 5 hours of speech. The prosodic annotations follow the GToBI(S) standard for pitch accent types and boundary tones and are described in Björkelund et al. (2014). In this study we make use of two class labels of prosodic events: all accent types (marked by the standard ToBI *) grouped into a single class (pitch accent presence) and the same for intonational phrase boundaries (marked by %).

4 Automatic prosodic information

In this section we describe the prosodic event detector used in this work. It is a binary classifier that is trained separately for either pitch accents or phrase boundaries and predicts for each word whether it carries the respective prosodic event.

4.1 Model

We apply a convolutional neural network (CNN) model, illustrated in Figure 1. The input to the CNN is a matrix spanning the current word and its left and right context words. The input matrix is a frame-based representation of the speech signal. The signal is divided into overlapping frames of 20 ms with a 10 ms shift, and each frame is represented by a 6-dimensional feature vector.

Figure 1: CNN for prosodic event recognition with an input window of 3 successive words and position indicating features.

We use acoustic features as well as position indicator features following Stehwien and Vu (2017) that are simple and fast to obtain. The acoustic features were extracted from the speech signal using the OpenSMILE toolkit (Eyben et al., 2013). The feature set consists of 5 features that comprise acoustic correlates of prominence: smoothed fundamental frequency (f0), RMS energy, PCM loudness, voicing probability and Harmonics-to-Noise Ratio. The position indicator feature is appended as an extra feature to the input matrices (see Figure 1) and aids the modelling of the acoustic context by indicating which frames belong to the current word or the neighbouring words.

We apply two convolution layers in order to expand the input information and then use max pooling to find the most salient features.

[...] News Corpus (Ostendorf et al., 1995) that contains speech from 3 female and 2 male speakers and that includes manually labelled pitch accents and intonational phrase boundary tones. Hence, both corpora consist of read speech by radio news anchors.

The prediction accuracy on the DIRNDL anaphora corpus is 81.9% for pitch accents and 85.5% for intonational phrase boundary tones⁶. The speaker-independent performance of this model on the Boston dataset is 83.5% accuracy for pitch accent detection and 89% for phrase boundary detection. We conclude that the prosodic event detector generalises well to the DIRNDL dataset and the obtained accuracies are appropriate for our experiments.

5 Coreference resolution

In this section, we describe the coreference resolver used in our experiments and how it was applied to create the baseline system using only automatic annotations.

5.1 IMS HotCoref DE

The IMS HotCoref DE coreference resolver is a state-of-the-art tool for German⁷ (Rösiger and Kuhn, 2016). It is data-driven, i.e. it learns from annotated data with the help of pre-defined features using a structured perceptron that models coreference within a document as a directed tree. This way, it can exploit the tree structure to create non-local features (features that go beyond a pair [...]

* The two first authors contributed equally to this work.
¹ http://corbon.nlp.ipipan.waw.pl/
² The anaphor under consideration is typed in boldface, its antecedent is underlined. Accented syllables are capitalised.
³ We do not predict the pitch accent type (e.g. fall H*L or rise L*H) as this distinction was not helpful in the pilot study and is generally more difficult to model.
⁴ http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/dirndl.en.html
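
As an illustration of the two NP-level features described in Section 2 (pitch accent presence and nuclear accent presence), the following Python sketch derives them from per-word detector output. It is only a sketch under stated assumptions and not the authors' implementation: the `Word` record, the function names, the convention that a boundary flag marks the end of an intonation phrase after that word, and the treatment of the last word as closing a phrase are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Word:
    """Hypothetical per-word record holding the two binary predictions."""
    token: str
    accented: bool  # assumed: detector predicts a pitch accent on this word
    boundary: bool  # assumed: an intonation phrase ends after this word


def nuclear_accent_indices(words: List[Word]) -> Set[int]:
    """Return the index of the last accented word in each intonation phrase."""
    nuclear: Set[int] = set()
    last_accent = None
    for i, word in enumerate(words):
        if word.accented:
            last_accent = i
        # Assumption: the final word always closes the current phrase.
        if word.boundary or i == len(words) - 1:
            if last_accent is not None:
                nuclear.add(last_accent)
            last_accent = None
    return nuclear


def np_features(words: List[Word], span: Tuple[int, int]) -> Dict[str, bool]:
    """Boolean features for the NP covering words[start:end] (end exclusive)."""
    start, end = span
    nuclear = nuclear_accent_indices(words)
    return {
        "pitch_accent_presence": any(w.accented for w in words[start:end]),
        "nuclear_accent_presence": any(i in nuclear for i in range(start, end)),
    }


def utterance(tokens: List[str], accented: Set[int], boundaries: Set[int]) -> List[Word]:
    """Small helper to build the toy utterances below."""
    return [Word(t, i in accented, i in boundaries) for i, t in enumerate(tokens)]


tokens = ["Last", "year", "he", "reconstructed", "the", "shed"]
example_2a = utterance(tokens, accented={5}, boundaries={5})  # "the SHED"
example_2b = utterance(tokens, accented={3}, boundaries={5})  # "reconSTRUCted"

print(np_features(example_2a, (4, 6)))
print(np_features(example_2b, (4, 6)))
```

With the toy encodings of Examples (2a) and (2b), the NP "the shed" gets both features set to true in the accented reading and both set to false in the deaccented reading, which mirrors the contrast the examples are meant to show. The sketch does not condition on NP length, so the short-NP and long-NP distinction from Section 2 would have to be handled separately.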