
Discourse-Givenness of Noun Phrases
Theoretical and Computational Models

Dissertation

zur Erlangung des akademischen Grades Doktor der Philosophie (Dr. phil.) der Humanwissenschaftlichen Fakultät der Universität Potsdam

vorgelegt von Julia Ritz

Stuttgart, Mai 2013

Published online at the Institutional Repository of the University of Potsdam:
URL http://opus.kobv.de/ubp/volltexte/2014/7081/
URN urn:nbn:de:kobv:517-opus-70818
http://nbn-resolving.de/urn:nbn:de:kobv:517-opus-70818

Erklärung (Declaration of Authenticity of Work)

Hiermit erkläre ich, dass ich die beigefügte Dissertation selbstständig und ohne die unzulässige Hilfe Dritter verfasst habe. Ich habe keine anderen als die angegebenen Quellen und Hilfsmittel genutzt. Alle wörtlich oder inhaltlich übernommenen Stellen habe ich als solche gekennzeichnet. Ich versichere außerdem, dass ich die Dissertation in der gegenwärtigen oder einer anderen Fassung nur in diesem und keinem anderen Promotionsverfahren eingereicht habe, und dass diesem Promotionsverfahren keine endgültig gescheiterten Promotionsverfahren vorausgegangen sind. Im Falle des erfolgreichen Abschlusses des Promotionsverfahrens erlaube ich der Fakultät, die angefügten Zusammenfassungen zu veröffentlichen.

Ort, Datum Unterschrift


Contents

Preface xiii

1 Introduction 1
1.1 Discourse-Givenness and Related Concepts ...... 3
1.2 Goal ...... 4
1.3 Motivation ...... 6
1.4 The Field of Discourse-Givenness Classification ...... 10
1.5 Structure of this Document ...... 11

2 Concepts and Definitions 13
2.1 Basic Concepts and Representation ...... 13
2.1.1 Basic Concepts in Discourse ...... 14
2.1.2 Formal Representation ...... 15
2.2 Reference ...... 18
2.2.1 Identifying Basic Units ...... 20
2.2.2 Non-Referring Use of Expressions ...... 24
2.2.3 World of Reference ...... 25
2.2.4 Specificity ...... 27
2.2.5 Generalizations ...... 29
2.2.6 and ...... 33
2.3 Coreference ...... 35
2.3.1 Consequences of Vagueness, Ambiguity and Non-Specificity ...... 36
2.3.2 Coreference vs. Lexical, Intensional or Extensional Identity ...... 37
2.3.3 Identity of Index ...... 39
2.3.4 Asserted Identity ...... 42
2.3.5 Identity and Time Dependence ...... 45
2.3.6 Accommodation of ...... 45
2.3.7 Bridging ...... 46
2.4 Coreference, Discourse-Givenness and Information Status ...... 48

3 Corpora and Annotation Schemes 53
3.1 Corpora Annotated with Discourse-Givenness or Coreference ...... 53
3.1.1 MUC-7 ...... 54
3.1.2 Nissim's Annotation of the Switchboard Corpus ...... 55
3.1.3 OntoNotes ...... 55
3.1.4 ARRAU ...... 55


3.1.5 PCC ...... 56
3.1.6 TüBa-D/Z ...... 56
3.1.7 DIRNDL ...... 57
3.2 Comparison of Annotation Schemes ...... 57
3.2.1 Markables and Preconditions ...... 58
3.2.1.1 Markables ...... 58
3.2.1.2 Preconditions ...... 68
3.2.2 Coreference and ...... 69
3.2.2.1 Discourse-Givenness and Coreference ...... 69
3.2.2.2 Notions of Context ...... 74
3.2.3 Formalization and Evaluation of the Annotation ...... 74
3.2.3.1 Formalization ...... 74
3.2.3.2 Evaluation ...... 75
3.2.4 A Critical Assessment of Existing Corpora ...... 77

4 Related Work and Technical Background 81
4.1 Data and Categories ...... 81
4.2 Features ...... 83
4.2.1 Properties of NPs ...... 83
4.2.2 NPs' Relations to their Context ...... 85
4.3 Algorithms ...... 87
4.3.1 Evaluation Measures ...... 93
4.4 Experiments and Results ...... 95

5 Discourse-Givenness Classification: New Experiments and Results 101
5.1 Data ...... 101
5.2 Methods ...... 102
5.2.1 Features ...... 102
5.2.2 Algorithms and Evaluation Measures ...... 109
5.3 Classification Results ...... 110
5.3.1 Quantitative Results ...... 110
5.3.1.1 OntoNotes ...... 110
5.3.1.2 MUC-7 ...... 116
5.3.1.3 ARRAU ...... 123
5.3.1.4 TüBa-D/Z ...... 126
5.3.2 Discussion ...... 132
5.3.2.1 Machine Learning Aspects ...... 132
5.3.2.2 Linguistic Aspects ...... 134
5.4 Comparison to Related Work ...... 135
5.4.1 OntoNotes ...... 136
5.4.2 MUC-7 ...... 136
5.4.3 Related Work in General ...... 136

6 Conclusions and Outlook 139
6.1 Conclusions ...... 139
6.2 Outlook ...... 140


7 Summaries 143
7.1 Summary in English ...... 143
7.2 Summary in German (Zusammenfassung in deutscher Sprache) ...... 146

References 150

Index 166


List of Figures

1 Coreference visualisation in ANNIS2 ...... xiv

1.1 Example Discourse ...... 1
1.2 Example Structure according to Rhetorical Structure Theory (RST) ...... 9
1.3 Coreference Resolution: Terminology ...... 10

2.1 DRS of 'Jones owns Ulysses. It fascinates him.' ...... 16
2.2 DRS of 'Susan has found every book which Bill needs. They are on his desk.' ...... 17
2.3 DRS of 'Most students bought books that would keep them fully occupied during the next two weeks.' ...... 17
2.4 DRS of 'Mary wants to marry a rich man. He is a banker.' ...... 26
2.5 DRS of 'Mary wants to marry a rich man. He must be a banker.' ...... 26
2.6 DRS of 'If parents are dissatisfied with a school, they should have the option of switching to another.' ...... 27
2.7 DRS of 'Parents should have the option of switching to another school, if they are dissatisfied with the current school.' ...... 27
2.8 DRS of 'Jones does not own a Porsche.' ...... 30
2.9 DRS of 'A wolf takes a mate for life.' ...... 30
2.10 DRS of 'Sheep are quadrupedal, ruminant mammals typically kept as livestock. Sheep are members of the order Artiodactyla.' ...... 33
2.11 Syntactic Tree Containing Traces ...... 42
2.12 DRS for 'John took Mary to Acapulco. They had a lousy time.' ...... 46
2.13 DRS for 'The students went camping. They enjoyed it.' ...... 47
2.14 Prince's (1981) Hierarchical Scheme of Familiarity ...... 49
2.15 Information Status, Discourse Status, and Hearer Status ...... 50

3.1 Syntactic Analysis of Possessives in MUC-7 ...... 60
3.2 Syntactic Analysis of Possessives in OntoNotes 1.0 ...... 60
3.3 Syntactic Analysis of a PP with Fusion of Preposition and Definite Determiner ...... 63

4.1 Example Decision Tree for Discourse-Givenness ...... 88
4.2 Pseudo-code for Decision Tree Construction ...... 88
4.3 Example Rules for Discourse-Givenness ...... 89
4.4 Pseudo-code for Rule Learners ...... 89
4.5 Support Vector Machines ...... 91

5.1 OntoNotes 1.0: Decision Tree ...... 113


5.2 OntoNotes 1.0: Learning Curve ...... 114
5.3 MUC-7: Decision Tree ...... 118
5.4 MUC-7: Learning Curve (original split) ...... 120
5.5 MUC-7: Learning Curve (random split) ...... 122
5.6 MUC-7: Learning Curve (all vs. Formaleval) ...... 122
5.7 ARRAU 1.2: Learning Curve ...... 124
5.8 ARRAU 1.2: Decision Tree ...... 125
5.9 TüBa-D/Z 6.0: Learning Curve ...... 129
5.10 TüBa-D/Z 6.0: Decision Tree (coreferential/anaphoric/bound) ...... 130
5.11 TüBa-D/Z 6.0: Decision Tree (coreferential) ...... 131

List of Tables

2.1 Types of Reference (Tentative) ...... 19
2.2 Specificity, Genericity and Coreference ...... 32
2.3 Concepts and Definitions ...... 51

3.1 Overview of Corpora Annotated with Coreference/Discourse-Givenness ...... 54
3.2 Markable Definitions (part I: English Corpora) ...... 64
3.2 Markable Definitions (part II: German Corpora) ...... 66
3.3 Definitions of Coreference (part I: English Corpora) ...... 70
3.3 Definitions of Coreference (part II: German Corpora) ...... 72

4.1 Overview: Approaches to Identifying Information Structure ...... 82
4.2 Comparison of Learning Algorithms ...... 92
4.3 Naming Conventions for Evaluation ...... 93
4.4 Example Contingency Table ...... 95
4.5 State of the Art Classification Results (part I) ...... 96
4.5 State of the Art Classification Results (part II) ...... 97

5.1 Feature 'maxsimilar mention': Calculation Example ...... 105
5.2 Features Used in the Classification Experiments ...... 109
5.3 OntoNotes 1.0: Class Distribution ...... 111
5.4 OntoNotes 1.0: Class Distribution in Training, Development and Test Set (random split) ...... 111
5.5 OntoNotes 1.0: Classification Results ...... 112
5.6 OntoNotes 1.0: Classification Results (different algorithms) ...... 114
5.7 MUC-7: Class Distribution ...... 116
5.8 MUC-7: Class Distribution in Train, Dryrun and Formaleval Set (original split) ...... 116
5.9 MUC-7: Proportions of in Different Sets ...... 117
5.10 MUC-7: Classification Results (original split), part I ...... 119
5.10 MUC-7: Classification Results (original split), part II ...... 120
5.11 MUC-7: Classification Results (5-fold cross-validation, hold-out) ...... 121
5.12 ARRAU 1.2: Class Distribution ...... 123
5.13 ARRAU 1.2: Class Distribution in Training, Development and Test Set (random split) ...... 123
5.14 ARRAU 1.2: Classification Results (random split, all NPs vs. referring NPs) ...... 124
5.15 TüBa-D/Z 6.0: Class Distribution ...... 126


5.16 TüBa-D/Z 6.0: Class Distribution in Training, Development and Test Set (random split) ...... 127
5.17 TüBa-D/Z 6.0: Classification Results ...... 128
5.18 OntoNotes 1.0: κ values for classifiers vs. original annotation ...... 133
5.19 MUC-7: κ values for classifiers vs. original annotation ...... 133
5.20 ARRAU 1.2: κ values for classifiers vs. original annotation ...... 133
5.21 TüBa-D/Z 6.0: κ values for classifiers vs. original annotation ...... 134
5.22 MUC-7: Comparison of Classification Results ...... 137

Preface

Acknowledgements

Having completed this thesis, I have to admit that the most important things I have learnt are not documented in it.

First of all, I thank my advisors Stefan Evert and Manfred Stede for their technical and organisational advice, their readiness to discuss ideas, and their general support. I also thank the members of the commission. I am grateful for having met so many researchers at Potsdam University who share an interest in linguistics and Natural Language Processing (and coffee). In particular, these were my fellow PhD students from the PhD colloquium, Timo Baumann, Okko Buss, Konstantina Garoufi, Peter Kolb, Florian Kuhn, and Andreas Peldszus, as well as my colleagues Christian Chiarcos, Stefanie Dipper, Michael Götze, Amir Zeldes, and Florian Zipser. Thank you for giving your views, asking questions, sharing your knowledge and information, and for listening.

Thanks for inspiring papers, discussions and reviews to Heike Bieler, Gerlof Bouma, Halyna and Jan Finzen, Aurélie Herbelot, Alexander Koller, Jonas Kuhn, Stavros Skopeteas, Malte Zimmermann and Heike Zinsmeister. Thanks to Ruben van de Vijver for his advice to "write every day". I am indebted to Massimo Poesio and Arndt Riester for providing me with their corpora ARRAU and DIRNDL, as well as to Aoife Cahill and Kerstin Eckart for additional information on corpora and classification studies. I am also indebted to John Gill and my family for proofreading. Finally, I thank from the heart my friends and family for their patience, understanding, and constant support.

Much has been said on the topics of reference, coreference, and discourse-givenness. A most concise illustration of the challenges to the definition of these concepts can be found in the literature:

“I love talking about nothing [...]. It is the only thing I know anything about.” (Oscar Wilde, An Ideal Husband)


About This Document

Preliminary results of this work have been published in Ritz (2010) (Sections 5.2.1 and 5.4.1), and an excerpt of related work in Lüdeling et al. (to appear). An earlier version of Chapter 2 has been used for teaching purposes by Stefan Evert and Stefanie Dipper.

The corpus search and visualisation tool ANNIS2 was used to inspect the data in context and to test hypotheses on the effectiveness of certain features. A description of the ANNIS2 framework can be found in Zeldes et al. (2009), and a description of the procedure of error analysis with ANNIS2 in Chiarcos and Ritz (2010). In ANNIS2, search matches are highlighted in red; in the figure below, this is the expression 'Messrs. Lee and Bynoe' and its antecedent 'the partners'. Coreferent expressions are underlined in the same colour. When clicking on an expression, e.g. 'Lee', the expression and all its coreferent expressions appear shaded in the same colour (blue or purple, respectively, in the figure below).

Figure 1: Coreference visualisation in ANNIS2: Classification error ‘Messrs. Lee and Bynoe’ in its context

Examples have been chosen with respect to their linguistic significance; their content does not represent views of the author. In some of the explanatory examples, annotation from the original corpora has been simplified to some extent, leaving out irrelevant parts for reasons of readability. Textbook examples cited from the literature may be provided with additional or altered markup for the purpose of a uniform presentation across this work.


List of abbreviations, symbols and technical terms

The expressions listed below are assumed familiar (as are words like mention or to combine, which maintain their original meaning when used as technical terms). All other relevant concepts will be explained and can be found using the index at the end of this volume.

#: the following utterance (or part of an utterance) is infelicitous
?: the annotation or the following (part of an) utterance is questionable
anaphor: 1. phrase referring back to another phrase; 2. phrase only interpretable with the help of the preceding context; 3. stylistic device (repeatedly starting sentences with the same word or phrase)
annotation: additional information anchored to (parts of) a text, e.g. parts of speech for each word
antecedent: a referent's earlier mention in the same text
baseline: point of reference performance for the purpose of comparison, usually produced by a simpler or more shallow system
binding: relation, e.g. between a pronoun and its antecedent
cataphor: pronominal first mention of a referent (before any nominal mention), e.g. When he retires, Al hopes to travel to New Zealand.
collocation: recurring word combination with non-compositional meaning; further criteria: non-modifiability and non-substitutability of the words it consists of (Manning and Schütze, 1999), e.g. to pay attention (better than to give attention), strong tea (not intense tea)
corpus: large collection of (linguistically annotated) texts
denote: to signify; a word/phrase denotes all objects compliant with its meaning (the phrase green bottles denotes all green bottles that have been and will be in the world)
feature, binary: (also: boolean) taking one of two possible values, 1 or 0 (1 is typically interpreted as yes, 0 as no)
feature, categorical: having an enumerable, predefined set of mutually exclusive values
idiom: word or phrase with figurative meaning; prototypical example: to kick the bucket; criteria: translation test (cannot be translated literally), its meaning is not obvious to a non-native speaker (though some languages have analogous expressions)
illocutionary act: the use of an utterance to bring about a certain consequence (e.g. an instruction, a promise, etc.)
iff: if and only if
NP: noun phrase
operator: an element representing an instruction, e.g. and for the combination of two truth values
prosody: melodic and rhythmic component of speech
quantifier: a symbol representing a generalization (e.g. the universal quantifier ∀, 'for all')
relation, unary: relation taking one argument
relation, binary: relation taking two arguments
string matching: method for comparing sequences of letters
variable: a symbol used in formal representations


Chapter 1

Introduction

A large part of the world's information is exchanged in the form of natural language (Manning et al., 2009). Human readers have a capacity for text understanding, i.e. they are able to access the information contained in language data. Consider the text in Figure 1.1¹, in particular the last sentence in the second paragraph.

Figure 1.1: Example Discourse

When processing this sentence, we segment it into words, then form constituents (e.g. German reunification, the city), and identify the verb’s arguments (subject the city, its status as the capital of all Germany). We recognize that the sentence says something about one particular city. As there are many cities in the world, we use

1 Source of text: http://en.wikipedia.org/wiki/Berlin, last access February 22nd, 2012.


the text preceding the noun phrase the city (underlined in red) to infer which city the author is referring to. In other words, we categorize the noun phrase as discourse-given, i.e. as referring to something already mentioned, and establish a relation to that entity already mentioned. In this case, the city refers to the same entity in the real world as Berlin. Finally, we retrieve the fact that Berlin became the capital of Germany again after the reunification in 1990. Thus, determining the discourse-givenness of an expression is one important step in text understanding. With vast amounts of language data available, there is an increasing need for Natural Language Processing (NLP), i.e. automated methods and tools that make it possible to access and exploit the informational content of this data in an efficient way. NLP tools have been developed for many specialized tasks, including the following:

• Information Extraction: in information extraction, facts relevant to a certain domain (e.g. changes in the management of companies etc.) are extracted and represented in a more uniform or concise way (tables, timelines, generated texts etc.).

• Question Answering: a question answering system accepts user-defined ques- tions, extracts relevant facts and generates answers.

• Summarization: summarization systems, when provided with one or more texts, produce a summary, i.e. a text that is shorter than the original text(s) while retaining the most important information.

• Machine Translation: machine translation systems translate a text in the source language into a text in the target language.

• Textual Entailment: a system for textual entailment checks whether, given two fragments of text, the facts in the second can be inferred from the facts in the first.

• Content Assessment: systems for content assessment score texts (e.g. students’ essays), rating textual cohesion etc.

All of these computational applications produce and/or process structured representations – usually tables or graphs – of what a discourse (e.g. a written text or a dialogue) conveys. This involves the following steps (a toy illustration is given after the list):

1. identifying referring expressions, i.e. expressions which correspond to objects in the real world (e.g. to the city of Berlin in the example above),

2. determining relations between those objects that are expressed in sentences (e.g. Berlin is the capital city of Germany),

3. coreference resolution, i.e. a grouping together of all expressions that refer to the same object (e.g. Berlin, It, the city etc.), and

4. accumulating all the information given on a certain object (Berlin is the capital city of Germany, and has 3.4 million inhabitants, etc.).
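To make the target of these steps concrete, the following is a minimal sketch (in Python, not taken from the thesis) of the kind of structured representation that steps 1-4 aim at; the entity identifiers, relation names and values are invented for illustration only.

```python
# Toy structured representation aimed at by steps 1-4 above. Entity identifiers,
# relation names and values are invented for illustration only.

# Step 3: coreference resolution groups mentions into chains (one chain per entity).
coreference_chains = {
    "e1": ["Berlin", "It", "the city"],
    "e2": ["Germany", "the country"],
}

# Steps 2 and 4: relations between entities, accumulated per entity.
facts = [
    ("e1", "capital_of", "e2"),
    ("e1", "population", 3_400_000),
]

def accumulate(entity_id):
    """Collect every fact recorded about one entity (step 4)."""
    return [fact for fact in facts if fact[0] == entity_id]

print(accumulate("e1"))   # [('e1', 'capital_of', 'e2'), ('e1', 'population', 3400000)]
```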


Coreference Resolution (step 3) is crucial to the accumulation of information (step 4). At the same time, it is one of the most difficult steps in processing, both for human readers (cf. Poesio and Artstein (2005) for English and Versley (2006) for German) and for computers. Sometimes, there are even several resolution possibilities. To solve this complex task, a decomposition into two subtasks has been suggested:2

1. identify those expressions that are discourse-given, i.e. referring back to something mentioned earlier;

2. locate the corresponding expressions referred back to.

In this way, a considerable number of NLP tools can benefit from an automatic detection of discourse-given expressions.
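The sketch below illustrates this two-step decomposition schematically. It is an invented toy, not the system developed in this thesis: the two helper functions are deliberately crude stand-ins for the actual subtasks.

```python
# Schematic sketch of the two-step decomposition; the two helper functions are
# deliberately crude stand-ins, not the classifiers developed in this thesis.

def classify_givenness(np):
    # Stand-in for subtask 1: treat pronouns and definite NPs as discourse-given.
    return np.lower() in {"it", "he", "she", "they"} or np.lower().startswith("the ")

def find_antecedent(np, preceding):
    # Stand-in for subtask 2: naively pick the closest preceding NP.
    return len(preceding) - 1 if preceding else None

def resolve(noun_phrases):
    """Link each discourse-given NP (subtask 1) to an antecedent index (subtask 2)."""
    links = {}
    for i, np in enumerate(noun_phrases):
        if classify_givenness(np):
            links[i] = find_antecedent(np, noun_phrases[:i])
    return links

print(resolve(["Berlin", "the city", "Germany", "It"]))   # {1: 0, 3: 2}
```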

1.1 Discourse-Givenness and Related Concepts

Within a text or discourse, we can use different linguistic expressions to refer to the same object (referent) in the real world. An expression is discourse-given if its referent has been mentioned in the previous context. In Example (1)³ below, all discourse-given expressions referring to the same object (here: an organization) as USACafes Limited Partnership are put in bold face. A previous mention of the same referent is called an antecedent: USACafes Limited Partnership is the antecedent of it; USACafes Limited Partnership and it are antecedents of its, etc. The first mention of an object (USACafes Limited Partnership in the example) is not discourse-given, but it is coreferent: there are expressions in the text referring to the same object, but not in the expression's previous context. In the example, all expressions that are coreferent with USACafes Limited Partnership are underlined.

(1) USACafes Limited Partnership said it completed the sale of its Bonanza restaurant franchise system to a subsidiary of Metromedia Co. for $ 71 million in cash. USACafes, which is nearly half-owned by Sam and Charles Wyly of Dallas, said it will distribute proceeds from the sale to unit holders as a liquidating dividend as soon as possible. The Bonanza franchise system, which generates about $ 600 million in sales annually, represented substantially all of the partnership's assets. The sale of the system has been challenged in a class-action suit on behalf of unit holders filed last week in a Delaware court, USACafes said. The company said it believes the suit is without merit.

While discourse-givenness prediction rates an expression as either discourse-given or new, coreference resolution identifies which expressions refer to the same entities: it forms one group of all expressions referring to USACafes, another group of all expressions referring to USACafes' Bonanza franchise system, etc. (see Example (2)). Throughout this dissertation, subscript indices and underlining will be used. Identical indices within a

2 Other decomposition approaches have been suggested; these are discussed in Section 1.4.
3 All examples in this chapter are adapted from the OntoNotes corpus (Hovy et al., 2006) unless specified otherwise.


text represent coreference. The underlining indicates the constituent the index is assigned to.

(2) USACafes Limited Partnership1 said it1 completed the sale of its1 Bonanza restaurant franchise system2 to a subsidiary of Metromedia Co. for $ 71 million in cash3. USACafes1, which is nearly half-owned by Sam and Charles Wyly of Dallas, said it1 will distribute proceeds from the sale3 to unit holders as a liquidating dividend as soon as possible. The Bonanza franchise system2, which generates about $ 600 million in sales annually, represented substantially all of the partnership's1 assets. The sale of the system23 has been challenged in a class-action suit4 on behalf of unit holders filed last week in a Delaware court, USACafes1 said. The company1 said it1 believes the suit4 is without merit.

Other terms for the phenomenon of discourse-givenness are discourse status (given/old vs. new information, cf. Prince (1981; 1992)) and anaphoricity (anaphoric4 vs. non-anaphoric noun phrases, cf. Ng and Cardie (2002), Ng (2004), Denis and Baldridge (2007)). A related concept is information status (also called cognitive status), which comprises categories for expressions that are neither given in the discourse nor new to the recipient. Different distinctions have been proposed (Prince, 1981; Prince, 1992; Gundel et al., 1993; Calhoun et al., 2005; Götze et al., 2007; Riester et al., 2010). An overview of these distinctions is given in Section 2.4.
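Binary discourse-givenness labels can be read off coreference indices like those in Example (2): the first mention of each chain is discourse-new, every later mention is discourse-given. The small sketch below illustrates this; the mention list is an abridged, hand-built rendering of Example (2).

```python
# Reading binary discourse-givenness labels off coreference indices (cf. Example (2)).
# The mention list is an abridged, hand-built rendering: (surface form, chain index).
mentions = [
    ("USACafes Limited Partnership", 1),
    ("it", 1),
    ("its", 1),
    ("Bonanza restaurant franchise system", 2),
    ("The Bonanza franchise system", 2),
    ("a class-action suit", 4),
    ("the suit", 4),
]

seen_chains = set()
for form, chain in mentions:
    # First mention of a chain: discourse-new; any later mention: discourse-given.
    discourse_given = chain in seen_chains
    seen_chains.add(chain)
    print(f"{form!r:40} discourse-given: {discourse_given}")
```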

1.2 Goal

The goal of this dissertation is to build classifiers for English and German that determine for any noun phrase (NP) whether or not it is discourse-given, i.e. referring back to some entity mentioned earlier in the same text. Full anaphora or coreference resolution is beyond the scope of this work. I consider the task of discourse-givenness classification for NPs a clearly delimitable task and the resulting classification component a useful, reusable resource (see Section 1.3). At the same time, it represents a step in the direction of coreference resolution. Anaphora and coreference resolution differ from discourse-givenness classification in that they require different strategies for the identification of antecedent relations as well as for the evaluation. Antecedent identification requires the comparison of an anaphor to either a set of potential antecedents, or alternatively to each potential antecedent. This might involve, for instance, testing syntactic constraints like c-command5 etc. As for the evaluation, a range of evaluation measures has been proposed, with different methods for taking partially correct assignments to coreference sets into account, and the debate is ongoing. As human judgement is needed throughout the development process and the evaluation, the languages were chosen based on the language skills of the author.

4 Generally, the term anaphor is used for pronouns or for expressions that are not interpretable without their context. Here, it is used in its broader sense, including all expressions having an antecedent that is coreferent with or binds the anaphor.
5 "Node A c(onstituent)-commands node B if neither A nor B dominates the other[,] and the first branching node which dominates A dominates B." (Reinhart, 1976, p. 32)


Classifiers need large amounts of data for training. To the best of my knowledge, there is no publicly and/or freely available corpus annotated with discourse-givenness, information status or a similar concept. For this reason, I use coreference annotation to deduce the status of each NP, discourse-given or discourse-new6. The corpora with coreference annotation available for English and German include the following (see Poesio and Artstein (2008), Ng (2010) and Pradhan et al. (2011) for overviews):

• MUC-6 and MUC-7 (originating from the Message Understanding Conferences, Chinchor and Sundheim (2003), Chinchor (2001)) and its successor ACE (Automatic Content Extraction, Doddington et al. (2000))7

• OntoNotes (Weischedel et al., 2007)

• ARRAU (Anaphora Resolution and Underspecification, Poesio and Artstein (2008)) and its predecessors GNOME (Generating Nominal Expressions, Poesio (2004a)), MATE (Multilevel Annotation Tools Engineering, Poesio et al. (1999), Poesio (2004c)), and the so-called Vieira-Poesio corpus (Poesio and Vieira, 1998)

• PCC (Potsdam Commentary Corpus, (Stede, 2004))

• TüBa-D/Z (Tübinger Baumbank des Deutschen/Zeitungskorpus, 'Tübingen Treebank of German/Newswire Section', Telljohann et al. (2003), Naumann (2006))

• DIRNDL (Discourse Information Radio News Database for Linguistic analysis, Eckart et al. (2012))

These corpora implement different definitions of coreference. On the basis of a theoretical concept of coreference (Chapter 2), these different definitions are compared in Chapter 3, and several classifiers are trained (Chapter 5). MUC-7, OntoNotes, ARRAU and TüBa-D/Z are used for this purpose. Despite considerable research effort, current models for the classification of discourse-givenness still leave substantial room for improvement before they are applicable for automated corpus annotation or in NLP systems like Coreference Resolution or Text-to-Speech (TTS) systems.8 Thus, one task is to implement new features that help distinguish discourse-given NPs. In particular, this is done by applying new, fuzzier strategies for comparing NPs, as well as by making more use of the NP's immediate context. In Example (3), the repeated use of the verb sell (in italics) can be used to help establish a coreference relation between the NPs four savings-and-loan institutions and The four S&Ls.9

6 The class discourse-new contains all NPs that are not discourse-given; thus, non-referring NPs are included in the class discourse-new (see Sections 2.2.2 and 3.2.1.2 for discussion), as are NPs referring to an object that is mentioned only once (so-called singletons) and NPs that introduce a referent (so-called first mentions).
7 Annotation in this corpus is limited to expressions referring to persons, organizations, locations, facilities, weapons, vehicles, and geo-political entities.
8 Previous work is discussed in Section 4.4 in more detail.
9 Cf. Mitkov's (1999) syntactic and semantic parallelism.


(3) The government sold the deposits of four savings-and-loan institutions1, in its first wave of sales of big, sick thrifts, but low bids prevented the sale of a fifth. The four S&Ls1 were sold to large banks, as was the case with most of the 28 previous transactions [...].
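As a rough illustration of such a fuzzy comparison (an invented toy, not the feature set of Chapter 5), the token overlap between the two NPs of Example (3) can be computed with and without the shared context verb sold:

```python
# Toy fuzzy comparison: Dice coefficient over token sets, optionally widened by a
# word from the immediate context (here the shared verb 'sold' from Example (3)).
# This is an illustration only, not the feature set used in Chapter 5.

def tokens(phrase):
    return set(phrase.lower().split())

def dice(a, b):
    ta, tb = tokens(a), tokens(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))

np_to_classify = "The four S&Ls"
candidate_antecedent = "four savings-and-loan institutions"

print(round(dice(np_to_classify, candidate_antecedent), 2))                      # 0.33
print(round(dice(np_to_classify + " sold", candidate_antecedent + " sold"), 2))  # 0.5
```

Including the shared verb raises the similarity score, which is the intuition behind using the NP's immediate context as additional evidence.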

1.3 Motivation

A classifier for discourse-givenness can be applied in various NLP systems, e.g.

• as a preprocessing filtering step in Coreference Resolution,

• to enhance prosody (mainly intonation and stress) in Text-to-Speech systems, and

• as part of an automatic Information Structure and Discourse Analysis.

The options and benefits are sketched in the following.

Coreference Resolution

Coreference resolution systems group together expressions that have the same referent. The task can be broken down into two subtasks:

(i) filtering out non-anaphoric noun phrases, and then

(ii) identifying the antecedents of the remaining entities.

It has been claimed that ruling out non-anaphoric NPs helps limit the search space and can thus improve a system's quality and performance (Harabagiu et al., 2001; Ng and Cardie, 2002; Elsner and Charniak, 2007; Uryupina, 2009; Rahman and Ng, 2009; Ng, 2010; Poesio et al., 2004; Kabadjov, 2007; the last two refer to definite descriptions)10. Recent approaches have re-integrated the two subtasks: Denis and Baldridge (2007) and Rahman and Ng (2009), for instance, train joint models. While this work does not extend to Coreference Resolution itself, its outcomes are of relevance to solving the task: methods and features which improve the classification of discourse-givenness are likely to yield improvements in Coreference Resolution, independently of a separate or joint modelling. As mentioned above, Coreference Resolution is crucial in various NLP tools, including Information Extraction, Question Answering, Summarization and many more.

Text to Speech (TTS)

Text-to-Speech (or speech synthesis) systems read out text. TTS systems are of service to persons with impaired vision or speech. They can also help in situations where the visual attention of the user is needed elsewhere: for instance, they can read out instructions to a person repairing a complex device.

10Poesio (2004b) defines definite descriptions as extending to “proper names, ‘the’-[NPs], ‘this’-[NPs], ‘that’-[NPs], pronouns, and possessive NPs” (p. 2).


Current TTS systems are reported to sound "unnatural" (Rashad et al., 2010, p. 87) and "monotonous" (Bonafonte et al., 2009, p. 131), and to have "poorer clarity and prosody of synthesis relative to natural speech" (Stevens et al., 2005, p. 130). Prosodic cues like intonation, stress and rhythm help hearers understand an utterance. Prosodic deficits in existing TTS systems could be overcome based on observations that relate a constituent's information structural properties to its prosodic realization. For English, Chafe (1970) and Brown (1983) found that expressions representing given information receive "low pitch" (Brown, 1983, p. 68).11 Umbach (2003) shows that a noun phrase without an accent is interpreted as given, whereas it is interpreted as new12 if it is accented; see Example (4) (taken from Umbach (2003), p. 313; capital letters are used to represent accent).

(4) (John has an old cottage1.)

a. Last summer he reconstructed the shed1.

b. Last summer he reconstructed the SHED2.

Similar patterns have been observed for German: Féry and Kügler (2008) find that "givenness lowers [tones] in prenuclear position and cancels them out postnuclearly" (p. 681). Baumann and Riester (2012) show that "[g]iven referents [...] encoded by given lexical items [...] are deaccented" (Baumann and Riester, 2012, p. 137, 139) in read speech. This evidence leads to the conclusion that using information on the discourse-givenness of constituents for modelling pitch accent can increase the intelligibility and 'naturalness' (i.e. closeness to the prosody of a human reader) of utterances. Hiyakumoto et al. (1997) and Cassell et al. (2001) have used givenness and theme/rheme analyses of sentences to generate an appropriate intonation for these sentences. Albrecht et al. (2005) use givenness and . The respective systems have only been evaluated qualitatively and on a general scale. The influence of givenness as a feature has not been investigated. This may be one reason why givenness is not used on a larger scale.
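As a minimal sketch of how such a givenness label could be exploited in a TTS front end, the snippet below wraps an NP in SSML markup with reduced emphasis when it is classified as discourse-given. The function and the choice of SSML emphasis levels are my own illustration, not a component of any of the systems cited above.

```python
# Toy rendering of the deaccentuation findings cited above: wrap a discourse-given
# NP in SSML markup with reduced emphasis. The givenness label is assumed to come
# from a classifier; this is an illustration, not part of any cited TTS system.

def render_np(np_text, discourse_given):
    if discourse_given:
        # Given material: reduce prosodic prominence.
        return f'<emphasis level="reduced">{np_text}</emphasis>'
    # New material: leave accentuation to the synthesizer (or strengthen it).
    return f'<emphasis level="moderate">{np_text}</emphasis>'

print("Last summer he reconstructed " + render_np("the shed", discourse_given=True) + ".")
```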

Information Structure and Discourse Analysis

Understanding a discourse incorporates comprehending its structure. An explicit structure analysis can be used for further processing of the discourse, e.g. for creating excerpts or summaries, or for selecting the documents which are most relevant to a certain user query in Information Retrieval. A tool for the automatic analysis of discourse with respect to information structure and rhetorical structure would therefore be a valuable resource. The term information structure relates to the 'packaging' (Chafe, 1976) and ordering of the content one wants to convey when making an utterance or writing a text. Structuring aims at optimizing the utterance or text in a way that makes the message easily

11 Besides givenness, other factors may play a role, like surface position and the "persistence of grammatical function" (Hirschberg and Terken, 1993, p. 1362). Nakatani (1996) gives an overview of studies that relate givenness/newness and accent.
12 The distinction between discourse-new and hearer-new will only be made from Section 2.4 onwards.


understandable to the recipient. This structuring can take place at sentence level and text level.13 According to Götze et al. (2007), which is based on Krifka's (2008) basic notions of information structure, three information structural layers can be distinguished:

(i) information status (which comprises discourse-givenness)

(ii) topic and comment

(iii) focus and background

Information status categorizes the "retrievability" (Götze et al. (2007), p. 150) of a referent either as given, accessible (the referent is related to a referent in the discourse, and "the inference relation is shared between speaker and hearer" (Götze et al., 2007, p. 150)) or else new. A topic is what a sentence is about. The term is used for the referring expression as well as for the referent (Krifka, 2008). Statements are divided into topic and comment (what is said about the topic). Expressions that restrict the main predication (e.g. adverbials like financially in the sentence 'Financially, the Joneses are fine.') are called frame-setters. For illustrating the term topic and how it relates to the mental storage of information, the metaphor of a file-card-like system (Reinhart, 1982) is often used: the topic of a sentence provides the recipient with a location ('anchor') where a piece of information should be stored. The new information focus is the information which develops the discourse and brings it forward (stating the unfamiliar or unexpected). Less relevant information forms the background. According to Götze et al. (2007), "focus on a subexpression indicates that it is selected from possible alternatives that are either implicit or given explicitly, whereas the background can be derived from the context of the utterance" (p. 170). These categories of topic and focus are by definition related to, but not based on, discourse-givenness. Often, a sentence's topic is a given NP, about which the sentence makes a new predication (cf. the discussions in Chafe (1976) and Gundel (1988), among others). This suggests that discourse-givenness, in combination with other features, represents an important input feature for the automatic classification of topics.14 Concerning focus, Meurers et al. (2011) report on a pilot study for automatic focus detection. Their goal is Content Assessment, i.e. to assess whether a student's reply to a text comprehension question answers the question satisfactorily. To this end, they first try to identify the new information in the answer. They suggest that a sentence's new information focus domain can be determined by first assuming all-new focus (i.e. the new information focus extends over the whole sentence), and then successively excluding given information from the domain.15 This simple model fails in the case of answers to

13 The terms sentence and text refer to written language; regarding speech, the corresponding segments are utterance and discourse.
14 Postolache (2005) and Postolache et al. (2005) report on attempts at the classification of sentence topics for Czech. Note, however, that these works are based on the topic definition of the Prague school: topic is defined as a non-contrastive contextually bound dependency node, focus as a contextually non-bound dependency node, and contrast as a contrastive contextually bound dependency node.
15 This procedure needs a definition of givenness extending not only to nominal elements, but also to verbs, adverbs etc. See the discussion in Section 2.4.


alternative questions, where the focus consists entirely of given information (see their Example (5)). Yet, it was not conceived as a standalone strategy for focus detection, but to work in combination with other factors, as in sentence topic identification.

(5) Question: Is the flat in a new building or in an old building?
    Answer: The flat is in a new building.

Another aspect related to discourse-givenness is textual cohesion.16 In Rhetorical Structure Theory (RST, Mann and Thompson (1987), Mann et al. (1993)), tree representations of texts are constructed based on segments of texts (sentences or subtrees) and relations between those segments. An analysis according to RST foresees directed relations from the so-called nucleus (which contains the central information) to the so-called satellite. Directed relations include elaboration (the nucleus segment provides basic information, the satellite additional information), background (the nucleus is facilitated by the information provided in the satellite, e.g. headings prepare a news article), preparation (the satellite is of help to the understanding of the nucleus, e.g. headings prepare for news texts), and many more. Undirected relations, connecting segments on equal terms, are also provided. These so-called multinuclear relations include e.g. the contrast relation (holding between elements of a juxtaposition). See Figure 1.2 for an example tree structure.17

Figure 1.2: Example Structure according to Rhetorical Structure Theory (RST; taken from http://www.sfu.ca/rst/01intro/intro.html, last access 27.03.2013).
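To make the tree representation more concrete, here is a minimal sketch of an RST-style structure in code. The segments, relation labels and the simplified nucleus/satellite encoding are invented for illustration and gloss over details of full RST (e.g. spans and multinuclear relations).

```python
# Minimal sketch of an RST-style tree: a relation links a satellite to a nucleus.
# Segments, relation labels and the simplified encoding are invented for illustration.

from dataclasses import dataclass

@dataclass
class Segment:
    text: str

@dataclass
class Relation:
    label: str                # e.g. 'elaboration', 'preparation', 'contrast'
    nucleus: object           # a Segment or another Relation
    satellite: object = None  # None could be used for multinuclear relations

heading = Segment("New rules for online banking")
body = Segment("From May on, transfers will require a second confirmation step.")
detail = Segment("The new step applies to amounts above 500 euros.")

# The heading prepares for the news text; the detail elaborates on the body.
tree = Relation("preparation",
                nucleus=Relation("elaboration", nucleus=body, satellite=detail),
                satellite=heading)
print(tree.label, "->", tree.satellite.text)   # preparation -> New rules for online banking
```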

Reitter (2003), for instance, uses pronouns and lexical similarity as features for the recognition of rhetorical relations. Louis et al. (2010) use features like coreference, givenness, syntactic form and the grammatical role of entities to predict the implicit discourse relation between adjacent sentences. Research in the area of information structure and rhetorical structure recognition is at a very early stage, but its outcomes can be expected to have a decisive impact, e.g. on Question Answering and Summarization.

16 Asher and Lascarides (2003) record that "rhetorical relations play an essential role in predicting anaphoric bindings" (p. 6).
17 In the graph, the arrows point from satellites to nuclei, allowing relations to be read as 'x is a preparation/etc. of y'.


1.4 The Field of Discourse-Givenness Classification

As mentioned in Section 1.3, the task of discourse-givenness classification has emerged from coreference resolution. Resolving coreference requires 1/2 · n(n − 1) comparisons (where n is the number of NPs in the text)18: every NP_i needs to be compared to each NP_j in its preceding context (i.e. at position j < i). From a very early stage, attempts at limiting the complexity have been made: the number of NPs n can be reduced if one excludes, e.g., NPs that do not refer (e.g. in phrases like to be on the books), singletons (referents that are mentioned just once in a text), and discourse-new NPs (which do not have to be compared to NPs to their left; they only need to be taken into account as potential antecedents for NPs to their right).
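A small worked example of this count, with invented figures, is given below.

```python
# Worked example of the comparison count, with invented figures.

def all_pairs(n):
    # every NP compared to every preceding NP: 1/2 * n * (n - 1)
    return n * (n - 1) // 2

n = 200   # NPs in a text
k = 60    # of these, NPs classified as discourse-given

print(all_pairs(n))   # 19900 comparisons without any filtering
print(k * (n - 1))    # 11940: loose upper bound if only given NPs look for antecedents
```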

basic unit
  non-referring
  referring
    singleton
    coreferent
      antecedent
      discourse-given

Figure 1.3: Coreference Resolution: Terminology

Thus, coreference resolution can be broken down into a filtering step and a resolution step. The filtering step can be realized with rule-based or learning-based methods; the same holds for the resolution step. Different divisions of the task into subtasks have been proposed, as shown below. The two approaches are labeled A and B; the filtering subtask is labeled 1., and the resolution step is labeled 2.

A. 1. identification of discourse-given entities
   2. antecedent detection

B. 1. identification of referring or coreferent entities ("mention detection")
   2. building of equivalence classes¹⁹

Variant B has been realized in many contributions to the CoNLL shared task 2011²⁰. However, this variant has a disadvantage: mentions are usually assumed to be noun phrases. This strategy ignores antecedents which are, e.g., whole sentences (see Example (6)). Webber (1988) gives the following example:

(6) It's always been presumed that when the glaciers receded, the area got very hot. The Folsum men couldn't adapt, and they died out1. That1's what is supposed to have happened. It's the textbook dogma. But it's wrong. They were human and smart. They adapted their weapons and culture, and they survived.

18 It is usually NPs that are taken as basic units for coreference resolution. This topic is discussed in detail in Section 2.2.1.
19 If the filtering step does not exclude singletons from the set of entities to be resolved, equivalence classes with one element may occur.
20 http://conll.cemantix.org/2011/, last access March 27th, 2013.


In variant A, in contrast, step 2 allows phrases to be added to those identified in step 1. These additional phrases need not meet the same conditions as the phrases in step 1 (e.g. the condition of being an NP). Approaches to variant B are not considered in this work. Early approaches to A used a heuristics-based identification of given entities, starting with pronouns (Winograd (1972), Wilks (1973), Hobbs (1976); see Mitkov (1999) for an overview). Refinements to these heuristics, e.g. for excluding pleonastic it, have been proposed (Paice and Husk, 1987; Kennedy and Boguraev, 1996). Approaches following these used a wider range of expressions: Hobbs (1993) considers "pronouns, definite noun phrases, and 'one' anaphora" (p. 91). Kim and Evens (1996) specialize in proper names. Vieira and Poesio (2000) use syntactically definite descriptions, excluding non-anaphoric definite descriptions on the basis of syntactic and lexical features of the noun phrase. Byron and Gegg-Harrison (2004) take all noun phrases into account, excluding non-anaphoric noun phrases based on syntactic features. Later, the task of creating a filter was automated: Bergsma et al. (2008), for instance, use distributional methods to exclude non-anaphoric it. Evans (2001), Ng and Cardie (2002), Müller (2006), Denis and Baldridge (2007) and Versley et al. (2008a) apply machine learning techniques to the task. Machine learning techniques for excluding non-anaphoric noun phrases in general have been applied by Ng and Cardie (2002), Ng (2009), Uryupina (2003; 2009) and Zhou and Kong (2011). Approaches using machine learning methods for more than two categories include Hempelmann's (2005) model of a three-way distinction between given, new, and inferable NPs, Nissim's (2006) classification of old vs. mediated vs. new NPs (the same task was taken up by Rahman and Ng (2011) and Markert et al. (2012)), and Cahill and Riester's (2012) finer-grained categorization, including experiments with categories conflated to old, mediated, new and other. Research in the field of the classification of discourse-givenness focuses mainly on English; Cahill and Riester's work on German data forms an exception. A comparison of existing models for the classification of discourse-givenness and the data they are based on is a desideratum. This will be done in the following chapters.

1.5 Structure of this Document

In Chapter 2, attempts at the formal definition of discourse-givenness and related concepts are presented. The list of controversial phenomena resulting from Chapter 2 forms the basis on which the annotation schemes of existing corpora will be compared and discussed in Chapter 3. These corpora are used to train classifiers for the discourse-givenness of noun phrases. After a review of previous work in the field of discourse-givenness classification (Chapter 4), my own procedure, experimental setup and results are described in Chapter 5. Conclusions and an outlook are provided in Chapter 6.


Chapter 2

Concepts and Definitions

This chapter serves the purpose of a formal definition of discourse-givenness and related concepts (coreference, information status). Previous works in the field of discourse-givenness use definitions that differ from each other, without setting them into relation. The definitions formulated here form the basis of a comparison of annotation schemes which have been used for the classification of discourse-givenness or information status. The structure of this chapter is as follows: first, relevant conceptual and representational conventions of discourse description in general are introduced (Section 2.1). Second, a definition of discourse-givenness is provided. This definition is based on the definition of coreference, which in turn is based on the definition of reference. Each concept 'inherits' the definitional difficulties and controversial issues of the concept it is based on. For this reason, they will be presented in the following order: definition of reference (Section 2.2), coreference (Section 2.3) and discourse-givenness (Section 2.4). Within the respective sections, attempts at the definition of each concept will be presented, including the respective difficult or controversial issues. These issues will be illustrated with examples (all examples are taken from OntoNotes 1.0 unless specified otherwise). How prevalent these issues are, and how they are resolved in different corpus resources, is addressed in Chapter 3. A summary is provided at the end of each definition section, which contains the information necessary for the following chapter.

2.1 Basic Concepts and Representation

Mapping discourses to representations of the information they convey needs some conception of how discourse in general functions; this will be given in Section 2.1.1. Based on this, one way of representing this information will be outlined in Section 2.1.2: Discourse Representation Theory (DRT, Kamp and Reyle (1993)).1 I will assume such formalized representations of discourse as the target representation in a text understanding component. A formalization provides the background for the definitions in the sections that follow.

1 DRT is only one formalism for representing discourse. Alternative formalisms, for instance Heim's (1982) file change semantics, would also serve the purpose. The choice of formalism is irrelevant to the argumentation.


2.1.1 Basic Concepts in Discourse

Basic elements that make up discourse are words on the syntactic level and concepts on the semantic level. Words stand for concepts, and concepts are meanings that combine into the total meaning of the discourse. A concept in language can be described in terms of its extension and intension. A concept's extension consists of all the objects in the world it is used to describe. A concept's intension consists of all the properties that describe the concept, and make an object belong to the set that the concept describes (Washburn, 1898). The prototypical example for concepts with the same intension is bachelor and unmarried man.2 Different intensions can have the same extension. Quine (1986) gives the example of 'cordate' (creature with a heart) and 'renate' (creature with a kidney); both concepts have the same extension (they denote the same set of creatures in the real world), though they are not synonymous.3 The same intension can have different extensions, e.g. at different times: for instance, the concept 'Berliner' (inhabitant of Berlin) in the present day describes a different set of people from 'Berliner', say, in 1900. Intensions can be combined to form expressions, which in turn can be used to refer to objects in the real world. According to Bach (1987), "to refer to something is simply to express an attitude about it" (p. 52); referring is "part [...] of performing a larger, illocutionary act" (p. 51).4 Expressions that refer will be called referring expressions, and the objects they refer to in the world will be called referents. The meaning or content of a declarative sentence will be called a proposition. By means of a discourse, an exchange of information takes place between speaker(s) and hearer(s).5 Aiming at a better description of this interactive process, the concept of 'Common Ground' was conceived by Stalnaker (1974) and Karttunen (1974). Its purpose is "to model the information that is mutually known to be shared and continuously modified in communication" (Krifka, 2008, p. 15). The Common Ground (CG), according to Krifka (2008, p. 16), consists of

• "a set of propositions that is presumed to be mutually accepted (or the conjunction of this set, one proposition)" and

• “a set of entities that had been introduced into the CG before”. Regarding the latter, Krifka elaborates that “[s]uch entities can be explicitly introduced, e.g. by an indefinite NP, or [...] accommodated” (p. 16), and illustrates this with the examples given in (7): (7a) first presents the existence of a cat in the speaker’s possession,

2 It has been pointed out that there are contexts in which these concepts are not interchangeable (Tye, 1991, p. 144f.), e.g.: (1) The pope is an unmarried man. (2) #The pope is a bachelor. Stefanie Dipper (personal communication) suggested using the German words Samstag and Sonnabend ('Saturday' and 'eve of Sunday') as examples.
3 The terms renate and cordate are not part of any biological classification system, and would not serve the purpose of a taxonomy as they do not discriminate the objects in our world. Soames (2003) notes that Quine introduced these concepts for the purpose of illustration.
4 A more formal definition of reference will be given in Section 2.2.
5 Author and reader (for written text) are included, respectively.


and then states that the speaker had to bring this cat to the vet – this order optimizes the comprehensibility of the sentence. In (7b), after the utterance of 'I had to bring my cat to the vet', the recipient accommodates that the speaker owns a cat. Then, that same piece of information is presented explicitly. In this case, the information is redundant because it is already given. Givenness, according to Krifka, is the category "indicat[ing] that the denotation of an expression is present in the immediate CG content" (p. 37).

(7) a. I have a cat, and I had to bring my cat to the vet.
    b. #I had to bring my cat to the vet, and I have a cat.

On the basis of "ample evidence that human languages have devices with which speakers can make addressees aware that something that is present in the immediate linguistic context is taken up again" (p. 25), Krifka argues for a givenness feature:

(8) definition: givenness feature
"A feature X of an expression α is a Givenness feature if X indicates whether the denotation of α is present in the CG or not, and/or indicates the degree to which it is present in the immediate CG." (p. 25)

Regarding anaphoric expressions in particular, he describes them as "specific linguistic forms that indicate the givenness status of their denotations, including personal pronouns, clitics and person inflection, demonstratives, definite articles [...]. Definite articles can be used to indicate whether a denotation is given in a CG in general, whereas clitics and pronouns typically indicate that their denotations are given in the immediate CG [and] indefinite articles [...] indicate that their referent is not given" (p. 27). For the classification task, this observation suggests that the expression's form can provide important clues for determining its discourse-givenness or information status.
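As a minimal sketch of this view (with invented class and method names, not a formal reconstruction of Krifka's proposal), the Common Ground can be modelled as a set of accepted propositions plus a set of introduced entities, and the givenness of an expression's denotation reduces to membership in the latter:

```python
# Minimal sketch of the Common Ground as characterized above: a set of mutually
# accepted propositions plus a set of introduced entities. Class and method names
# are my own; this is an illustration, not a formal reconstruction of Krifka (2008).

class CommonGround:
    def __init__(self):
        self.propositions = set()   # propositions presumed to be mutually accepted
        self.entities = set()       # entities introduced into the CG so far

    def update(self, proposition, introduced_entities=()):
        self.propositions.add(proposition)
        self.entities.update(introduced_entities)

    def is_given(self, entity):
        # Givenness feature: is the denotation already present in the CG?
        return entity in self.entities

cg = CommonGround()
cg.update("speaker has a cat", introduced_entities={"cat1"})   # 'I have a cat, ...'
print(cg.is_given("cat1"))   # True: 'my cat' can now be taken up as given
```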

2.1.2 Formal Representation

Discourse updates and the truth conditions of a discourse can be described by representation structures. In DRT (Kamp and Reyle, 1993), such structures are called Discourse Representation Structures (DRSs). For example DRSs, see Figures 2.1 to 2.3; these DRSs are adapted from Kamp and Reyle (1993, pp. 69, 310, and 332) and will be explained in the following. DRSs consist of

(i) a set of discourse referents (notated in the separated top parts of the boxes) and

(ii) a set of conditions representing the information conveyed by the discourse (notated in the main parts of the boxes).

Discourse referents (individuals/sets) are an intermediate representation between referring expressions and referents in the world. They are notated as variables.6 Conditions are formulated by combining these variables with relations. These relations represent

(i) properties of individuals (or sets) and relations in the real world, such as the predicate book in Figure 2.2 or the relation owns in Figure 2.1, or

6In some of the literature, proper names are represented by constants rather than variables.


(ii) logical operators (e.g. junctors like and, if-then ‘⇒’) or

(iii) quantifiers: quantifiers express either the existence of an object or proportional relations in the real world, e.g. the universal quantifier ’∀’ or the quantifier ‘most’ as in Figure 2.3. Conditions are implicitly combined using the logical operator and.
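A DRS can thus be thought of as a pair of a referent set and a condition list. The following sketch encodes the DRS of Figure 2.1 in this way; the tuple encoding of conditions is my own simplification, not Kamp and Reyle's notation.

```python
# A DRS as a set of discourse referents plus a list of conditions, instantiated
# with the DRS of Figure 2.1. The tuple encoding of conditions is a simplification.

from dataclasses import dataclass, field

@dataclass
class DRS:
    referents: set = field(default_factory=set)
    conditions: list = field(default_factory=list)

drs_fig_2_1 = DRS(
    referents={"x", "y", "u", "v"},
    conditions=[
        ("Jones", "x"), ("Ulysses", "y"),
        ("owns", "x", "y"),
        ("=", "u", "y"), ("=", "v", "x"),   # identity conditions linking the pronouns
        ("fascinates", "u", "v"),
    ],
)
print(len(drs_fig_2_1.conditions))   # 6 conditions, implicitly conjoined
```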

x y u v
Jones(x)
Ulysses(y)
x owns y
u = y
v = x
u fascinates v

Figure 2.1: DRS of ‘Jones owns Ulysses. It fascinates him.’ (taken from Kamp and Reyle (1993), p. 69)

A mention (usually a noun phrase) introduces a variable into a discourse representation structure. There are different kinds of variables: (i) those for atomic discourse referents (representing individual objects), which are represented by lower case letters (x, y, u, and v in Figure 2.1),

(ii) those for non-atomic discourse referents (representing sets of individuals), which are represented by upper case letters (such as Y and U in Figure 2.2), and

(iii) those for vague discourse referents (neither atomic nor non-atomic, e.g. bare plurals inside the scope of a quantifier), which are represented by lower case Greek letters (e.g. η in Figure 2.3⁷).

Variables for semantically definite descriptions are always introduced at the top level (the main DRS), see Figures 2.1 (variables x, y, u, v) and 2.2 (variables x, z, U, t, w). Variables for indefinite descriptions are introduced at the nesting level at which they are processed (variable η in Figure 2.3), and universally quantified NPs (NPs beginning with 'every') are introduced at a level subordinate to that at which they are processed (variables y in Figure 2.2 and x in Figure 2.3). In the case of coreference, the newly introduced variable is linked to the context by the identity relation (e.g. 'x = y'), see the bold-faced conditions in Figures 2.1, 2.2 and 2.3. For pronouns, Kamp and Reyle (1993) provide two alternative suggestions: either to introduce a discourse referent and stipulate identity ('x = y'), or to replace the pronoun by the referent variable that is chosen as an antecedent. The validity and appropriateness of these DRSs is checked against models (each model instantiating a 'possible world'). A model "is a certain information structure, relative to

7 The predicate book is marked with a star ('*') here. The star notates that variant of a predicate which may also be applied to a set of individuals (book(X)), instead of one individual at a time (book(x)).


x z Y U w t
Susan(x)
Bill(z)
[ y : book(y), z needs y ] ⇒ [ x has found y ]
Y = Σ y : [ book(y), z needs y, x has found y ]
U = Y
t = z
t's desk(w)
U are on w

Figure 2.2: DRS of ‘Susan has found every book which Bill needs. They are on his desk.’ (taken from Kamp and Reyle (1993), p. 310)

[ x : student(x) ]  ⟨most x⟩  [ η u : book*(η), x bought η, η would keep u fully occupied during the next two weeks, u = x ]

Figure 2.3: DRS of ‘Most students bought books that would keep them fully occupied during the next two weeks.’ (taken from Kamp and Reyle (1993), p. 332)

which it is possible to evaluate the expressions of some given language, and in particular to evaluate the sentences of that language in respect of truth and falsity” (Kamp and Reyle, 1993, p. 93). Models consist of objects and relations between them (properties can be regarded as unary relations). For the evaluation, each discourse referent in a DRS K is mapped to an element in UK , the universe of K. A DRS “K is true in [model] M iff there are corresponding to the members of u1, ..., un of UK objects a1, ..., an in M such that the conditions in K are satisfied in M by the objects which correspond to the referents that these conditions contain as arguments” (Kamp and Reyle, 1993, p. 130).


The DRS in Figure 2.2, for instance, would be true evaluated with respect to model M1:⁸

(9) M1:

U_M1 = {a, b, c, d, e, f}

Name_M1 = {⟨Susan, a⟩, ⟨Bill, b⟩}

Pred_M1:

Pred_M1(book) = {c, d, e};

Pred_M1(need) = {⟨b, c⟩, ⟨b, d⟩};

Pred_M1(has found) = {⟨a, c⟩, ⟨a, d⟩, ⟨a, e⟩};

Pred_M1(x's desk) = {⟨b, f⟩};

Pred_M1(be on) = {⟨c, f⟩, ⟨d, f⟩}

Susan(x), for instance, evaluated with respect to this model, is mapped to a:

⟦Susan(x)⟧^{M1,t} = a.

The formalization presented in this section will be used throughout the theoretical part of this work. It enables a precise definition of coreference; a formal representation of discourse provides a basis for its further computational processing.
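For illustration, the core of this formalization can be mirrored in a few lines of code. The following sketch is my own and not part of Kamp and Reyle's system: it covers only flat DRSs (atomic and identity conditions, no nested or duplex conditions), and all names in it (DRS, Model, is_true_in, the toy model for Figure 2.1) are invented for the example. It checks truth in a model by searching for an embedding of the discourse referents into the model's universe, in the spirit of the truth definition quoted above.

from itertools import product

class DRS:
    """A flat DRS: discourse referents plus atomic or identity conditions (illustrative sketch)."""
    def __init__(self, referents, conditions):
        self.referents = list(referents)      # e.g. ['x', 'y', 'u', 'v']
        self.conditions = list(conditions)    # e.g. [('Jones', 'x'), ('=', 'u', 'y')]

class Model:
    """A model: a universe of objects and an extension for each predicate."""
    def __init__(self, universe, extensions):
        self.universe = set(universe)
        self.extensions = extensions          # predicate name -> set of tuples

def satisfies(model, condition, assignment):
    """Check one condition under an assignment of referents to model objects."""
    pred, *args = condition
    values = tuple(assignment[a] for a in args)
    if pred == '=':                           # identity condition, e.g. u = y
        return values[0] == values[1]
    return values in model.extensions.get(pred, set())

def is_true_in(drs, model):
    """A flat DRS is true in a model iff some embedding of its referents into
    the universe satisfies all conditions (cf. the definition quoted above)."""
    for objects in product(model.universe, repeat=len(drs.referents)):
        assignment = dict(zip(drs.referents, objects))
        if all(satisfies(model, c, assignment) for c in drs.conditions):
            return True
    return False

# Figure 2.1: 'Jones owns Ulysses. It fascinates him.'
drs = DRS(['x', 'y', 'u', 'v'],
          [('Jones', 'x'), ('Ulysses', 'y'), ('owns', 'x', 'y'),
           ('=', 'u', 'y'), ('=', 'v', 'x'), ('fascinates', 'u', 'v')])

# A toy model (objects 'j' and 'b' are arbitrary labels, not from the literature).
m = Model({'j', 'b'},
          {'Jones': {('j',)}, 'Ulysses': {('b',)},
           'owns': {('j', 'b')}, 'fascinates': {('b', 'j')}})

print(is_true_in(drs, m))   # True

The assignment x=j, y=b, u=b, v=j is the embedding that verifies the DRS, just as the objects a1, ..., an do in the quoted truth definition.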

2.2 Reference

An expression is termed 'referring' if it designates (i.e. stands for) some entity in the real world: a concrete object, state, event, etc. or set thereof, cf. Clément (2000), Bußmann (2000).⁹ This entity (or set of entities, respectively) is called the referent. Bach's (1987) definition is given in (10).¹⁰

(10) definition "To refer to something is to use a singular term with the intention (part of one's communicative intention) of indicating to one's audience the object of the attitude one is expressing" (p. 52). A singular term is "any expression that can be used to refer to an individual, whether or not it [determinately] denotes" (p. 62).

Bach suggests representing reference as "a four-place relation between a speaker, a word [or other expression], an audience and an object" (p. 39; see (11) for a formalized version).

(11) definition Ref = {⟨s, e, a, o⟩ | speaker s uses expression e in front of audience a to refer to object o}

He acknowledges suggestions of alternative definitions,¹¹ which include the expression's context, e.g. one that could be formalized as shown in (12). He rejects these definitions, arguing that reference is not a fully determinate function, even if textual and situative context (such as discourse participants, location, etc.) were included in the parameter c.¹²

8 Abbreviations: U - universe, M - model, Pred - Predicate(s).
9 Entities in fictional worlds (e.g. book or film characters) will also be assumed to be referents here. The issue of the world of reference is discussed in Section 2.2.3.
10 It can be argued that this definition is redundant, if not circular.
11 Unfortunately, he does not give references for these definitions.
12 In my opinion, the argument is invalid: if reference is not a function, but can be partially turned into a function by adding a certain parameter, it does not follow that this additional parameter is useless in general. It is true, however, that neither definition results in a function.


However, he points out that the context plays a role in the interpretation of expressions that possibly refer: "referring never occurs by itself (...) [but] is always part and parcel of performing a larger, illocutionary act" (Bach, 1987, p. 51).

(12) definition Ref = {⟨e, c, o⟩ | expression e in context c refers to object o}

Van Deemter and Kibble (2000) use Bach's definition (11) in a functional notation (see (13)) as a means to later define coreference. This notation "suppresses the role of context", assuming that the expressions involved "have a unique referent in the context in which they occur (i.e., their context in the corpus makes them unambiguous)" (van Deemter and Kibble, 2000, p. 629). Problematic aspects of this notational trick manifest themselves when the definition is used in the definition of coreference; they are discussed in Section 2.3. Expressed formally, to obtain a functional relation, the function's domain needs to be restricted to unambiguous, specific expressions α.

(13) definition Ref(α) = {r | r is the entity referred to by α}

In my opinion, reference is a relation as defined in (14), an extension of (12), where the context c covers both textual and situative context, including speaker and audience. Neither of these relations, however, is a function.

(14) definition Ref = {⟨e, c, r⟩ | expression e in context c refers to an object r with r ∈ D(e), or a set of objects r with r ⊆ D(e), where D is the denotation of e (excluding determiners).}
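The difference between the functional notation in (13) and the relational definitions in (12) and (14) can be made concrete with a small sketch. The code below is my own illustration (the entries are invented, loosely modeled on the ambiguous that in Example (51) below): reference is stored as a set of ⟨expression, context, referent⟩ triples, and a simple check shows that such a relation need not be a function.

# Illustrative sketch, not from the cited works: Ref as a set of triples.
Ref = {
    ("the boxcar", "dialogue-1", "boxcar-at-Elmira"),
    ("that",       "dialogue-1", "boxcar-at-Elmira"),
    ("that",       "dialogue-1", "bad-wheel"),     # ambiguous: two candidate referents
}

def is_functional(relation):
    """A relation is a function iff no (expression, context) pair is mapped
    to more than one referent."""
    seen = {}
    for expression, context, referent in relation:
        key = (expression, context)
        if key in seen and seen[key] != referent:
            return False
        seen[key] = referent
    return True

print(is_functional(Ref))   # False: 'that' has two possible referents in this context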

A tentative distinction of types of reference is given in Table 2.1; some cases of reference may show characteristics of more than one type. A stricter definition of reference, specific reference, which includes subtypes 1, 2 and 4, is discussed in Section 2.2.4.

1. D(e) := r (e is used for r by convention): names (unique, e.g. John Gill, or less unique, e.g. John) and nicknames (including uses of expressions where the literal meaning of the signifier does not hold, e.g. using the professor for referring to the musician Benny Goodman)

2. D(e) = r; e is a description sufficient for r's unique identification: e.g. the first man on the moon, and deictic expressions (Dc(e) = r in context c)

3. the set r = D(e); e is used to distinguish x ∈ D(e) from y ∉ D(e): generalizations (e.g. Cats need calcium) are discussed in Section 2.2.5

4. r ∈ D(e) or set r ⊂ D(e), disregarding other x ∈ D(e) and y ∉ D(e): types (2.) and (4.) have some overlap, as referents of expressions like the dean in a conversation between two students are uniquely identifiable with little background knowledge about that part of the world the discourse is about

Table 2.1: Types of Reference (Tentative)


On a final note, referents may remain ‘anonymous’, i.e. the hearer might not identify the referent(s) in the world: Allan (2012) notes that for an expression to be used referringly, “[p]hysical identification is not necessary, a hearer only needs to have a cogent grasp of what differentiates the speaker’s (presumed) referent from any distractors” (p. 20; cf. also Strawson (1950) and Donnellan (1966)).

2.2.1 Identifying Basic Units

In this section, the question of what constitutes a referring expression will be addressed. Typically, referring is associated with (but not limited to) noun phrases, including names and pronouns. Karttunen points out a "suggestion by Noam Chomsky (1965) [...] that the base component of a transformational grammar associates with each noun phrase a referential index, say, some integer" (Karttunen (1969a, p. 5); see (15)).¹³ ¹⁴

(15) task definition Associate a referential index (e.g. an integer) with each noun phrase. (Chomsky, 1965; Karttunen, 1969a)

Consider the text given in (16), analyzed according to this suggestion.15

(16) [Four former [Cordis Corp.]1 officials]2 were acquitted of federal charges related to [the Miami-based company's]1 sale of [pacemakers]3, including conspiracy to hide [pacemaker defects]6. Jurors in [[U.S. District Court]8 in [Miami]9]8 cleared [[[Harold Hershhenson]11, [a former executive vice president]att:11]11; [[John Pagones]12, [a former vice president]att:12]12; and [[[Stephen Vadas]13 and [Dean Ciporkin]14]15, [[who]15 had been engineers with [Cordis]1]att:15]15]2.

From this analysis, we observe that referring expressions can occur embedded into other referring expressions, e.g. Cordis Corp. in Four former Cordis Corp. officials. We also observe expressions which embed expressions with the same referential index, e.g. indices 8 (U.S. District Court in Miami containing U.S. District Court), 11 (Harold Hershhenson, a former executive vice president containing Harold Hershhenson) and 15 (Stephen Vadas and Dean Ciporkin, who had been engineers with Cordis containing Stephen Vadas and Dean Ciporkin and who). According to this analysis, one constituent would contain two or more mentions of the same referent. Moreover, the relative pronoun (index 15) only serves the purpose of linking between two parts of the same constituent. Categorizing it

13 Chomsky supposes that "certain lexical items are designated as 'referential' and that by a general convention each occurrence of a referential item is assigned a marker, say, an integer, as a feature" (Chomsky, 1965, p. 145).
14 There are two equivalent options of handling indices: (1) introduce a new integer for each expression, later replace those referring to the same referent by the index of the first mention or (2) re-use an integer for all expressions referring to the same referent. Option (2) is applied here for reasons of representation.
15 For the sample text in (16), an intuitive understanding of referring vs. non-referring expressions and of coreference is sufficient. Definitions are given in Sections 2.2.2 (non-referring expressions) and 2.3 (coreference). Attributions, which are non-referring in general, are marked with att and the index of the expression they apply to. Noun phrases are annotated to include possessive 's' if applicable (the Miami-based company's).


as a mention of the referent set {Stephen Vadas, Dean Ciporkin} does not seem adequate. Thus, in the case of embedding with identical indices, I argue for the annotation of the maximal referring expression only.

Another question is whether or not it is necessary for a mention of a referent to form a noun phrase in its own right: the referent Miami (indexed 9) – the venue of the legal dispute – has been mentioned earlier as a part of the word Miami-based. Similarly, the word pacemaker as a premodifier of pacemaker defects could be interpreted as referring to a set of pacemakers (either the same set as, a subset of, or a set intersecting with the pacemakers indexed 3): the compound pacemaker defects can be rephrased as either defects in pacemakers or defects in the pacemakers. The text implies the existence of a set of pacemakers, which were meant to be sold – with the fact that they were defective concealed or unknown. Malte Zimmermann (personal communication) suggests the following referentiality test:

(17) A noun in a compound is referring if and only if it can be taken up with a pronoun.

See his minimal pair example in (18). Referentiality of nominal modifiers (indexed 1 in Examples (18) and (19))¹⁶ may be dependent on their compositional relation to the noun they modify (indexed 2 in the examples): it seems that a noun n1 can be taken up more easily if it occurs as a subject premodifier of a deverbal noun n2 than if it was an object premodifier of noun n2. The exact conditions need further investigation. This, however, is outside the scope of this work. It is only important here to state as a result that it is possible for premodifiers to refer.

(18) a. # Lion1 (sg; obj) hunting2 is fun because they1 (pl; subj) are dangerous animals.
     b. [Hunting2 lions1 (pl; obj)]2 is fun because they1 (pl; subj) are dangerous animals.

(19) a. Sheep1 (pl; subj) grazing2 is unsuitable because they1 (pl; subj) selectively feed on the foodplant and can reduce or even eliminate it from sites.
     b. The distances involved in bird1 (sg; subj) migration2 mean that they1 (pl; subj) often cross political boundaries of countries and conservation measures require international cooperation.

In principle, referents do not have to be explicitly mentioned, e.g. in pro-drop languages (see (20) for a Spanish example; the dropped pronouns are marked with 'dropped' in the English translation below).¹⁷ This, however, is beyond the scope of this work.

(20) Pedro trabaja en una mina de oro. Un día será rico, pero no lo sabe aún.
     Pedro work.3pers in a goldmine. One day, be.3pers.fut rich, but not it know.3pers yet.

     'Pedro works in a goldmine. One day, he (dropped) will be rich, but he (dropped) does not know it yet.'

16 Sources: http://butterfly-conservation.org/files/bcw narrow-bordered-bee-hawk-moth - nbbh eng.pdf and http://en.wikipedia.org/wiki/Bird_migration (last access April 9th, 2013). Sentence (19b) occurs after 'Human activities have threatened many migratory bird species.' An interpretation of they in (19b) referring back to many migratory bird species is implausible (it is not species who cross political boundaries but groups of objects).
17 Abbreviations in glosses: 3pers - 3rd person; fut - future tense.


Referents can be mentioned in the form of sentences and even larger sections of text (cf. Webber, 1988; Dipper and Zinsmeister, 2010): propositions and sets of propositions can be referred back to, as in Examples (21) (taken from Dipper and Zinsmeister (2010), p. 56) and (22) (taken from the Trains-91 section of the ARRAU corpus).

(21) I would like to draw particular attention to the fact that people who have made their lives here in the European Union1;part1 still do not have the right to vote1;part2, even though the European Parliament has called for it1 on many occasions. (22) S : okay ss okay so we have all right let ’s see the E1 goes to Bath and picks up a boxcar and comes back then loads bananas and goes to Corning via Dansville and drops off the bananas ’ n meanwhile engine 2 takes a boxcar to Corning1;part 1 M : m hm S : loads the oranges hooks up the tanker goes to Elmira makes the OJ and then goes back to Avon via Bath1;part 2 M : right

S : okay now that1 would work except we just found out that the boxcar at Elmira is n’t working and will n’t be repaired until 8 AM

There is consensus that reference of such complex forms is different from reference of NPs (cf. Webber (1988), Asher (1993), Byron (2002) and Schwarz-Friesel et al. (2007), among others). Complexly-formed referents are introduced into the discourse representation only if they are taken up again. I suggest a reformulation of the task, based on Chomsky's suggestion (see (15)), as follows:

(23) task definition Consider each noun phrase and each compositional part of a compound for the following steps. Discard nonreferring uses of expressions. Enumerate and provide with an index
(i) all different referents, i.e. entities mentioned in the text (if not sure about the identity of a referent, create a new index), and
(ii) all different referring expressions, i.e. all mentions of each referent (if one mention embeds a mention of the same referent, only use the maximal referring expression).

Which uses of expressions are considered nonreferring is discussed in the following section. Performing step (i) on example text (16), we get the following list of referents:

1. Cordis Corp., a seller of pacemakers, based in Miami

2. Harold Hershhenson, a former executive vice president of 1.

3. John Pagones, a former vice president of 1.

4. Stephen Vadas

5. Dean Ciporkin


6. {4., 5.}, former engineers with 1.

7. {2., 3., 4., 5.}, four former officials of 1.

8. pacemakers

9. the sale of 8. by 1.

10. pacemakers (probably a subset of or a set intersecting with 8.)

11. defects in 10.

12. U.S. District Court in Miami

13. Miami, location of 12, base of 1.

14. Jurors in 12.

15. charges of 7. by 14., related to 9.

16. alleged conspiracy of 7. to hide 11. (16. is subset of 15.)

Performing step (ii) results in the following list of referring expressions (the first number represents the referent’s index, the second number represents the mention’s index):

1.1. Cordis Corp.

1.2. the Miami-based company’s

1.3. Cordis

2.1. Harold Hershhenson, a former executive vice president

3.1. John Pagones, a former vice president

4.1. Stephen Vadas

5.1. Dean Ciporkin

6.1. Stephen Vadas and Dean Ciporkin, who had been engineers with Cordis

7.1. Four former Cordis Corp. officials

7.2. Harold Hershhenson, a former executive vice president; John Pagones, a former vice president; and Stephen Vadas and Dean Ciporkin, who had been engineers with Cordis

8.1. pacemakers

9.1. the Miami-based company’s sale of pacemakers

10.1. pacemaker (in the NP pacemaker defects)

11.1. pacemaker defects


12.1. U.S. District Court in Miami

13.1. Miami (in the adjective Miami-based)

13.2. Miami (in the NP Jurors in U.S. District Court in Miami)

14.1. Jurors in U.S. District Court in Miami

15.1. federal charges related to the Miami-based company’s sale of pacemakers

16.1. conspiracy to hide pacemaker defects

Performing the steps in this order helps to identify expressions that are less obviously referring (like the first mention of Miami in Example (16) and the sentence(s) in Examples (21) and (22)).
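The two-step procedure in (23) also suggests a simple data model for the annotation. The sketch below is my own illustration (the class and function names are invented): step (i) corresponds to creating a Referent with a fresh index, step (ii) to attaching mentions, whose identifiers combine the referent's index with a running mention number, as in the lists above.

from dataclasses import dataclass, field

@dataclass
class Referent:
    """Illustrative sketch of the data model behind task definition (23)."""
    index: int                # step (i): one index per entity mentioned in the text
    description: str
    mentions: list = field(default_factory=list)   # step (ii): maximal referring expressions

def add_mention(referent, surface):
    """Record a mention; its ID is <referent index>.<running mention number>."""
    mention_id = f"{referent.index}.{len(referent.mentions) + 1}"
    referent.mentions.append((mention_id, surface))
    return mention_id

# A few referents from example (16):
cordis = Referent(1, "Cordis Corp., a seller of pacemakers, based in Miami")
add_mention(cordis, "Cordis Corp.")
add_mention(cordis, "the Miami-based company's")
add_mention(cordis, "Cordis")

miami = Referent(13, "Miami, location of the court, base of Cordis")
add_mention(miami, "Miami (in the adjective Miami-based)")

for r in (cordis, miami):
    for mention_id, surface in r.mentions:
        print(mention_id, surface)
# 1.1 Cordis Corp.
# 1.2 the Miami-based company's
# 1.3 Cordis
# 13.1 Miami (in the adjective Miami-based)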

2.2.2 Non-Referring Use of Expressions

Expressions used non-referringly include expletive pronouns (Example (24a)) and expressions in phrases with non-compositional meaning, such as collocational expressions (Examples (24b) and (24c)) and idiomatic expressions (Examples (24d) to (24g)): there is no concrete object, role, etc. involved in the utterance situation of these sentences (NPs under discussion are underlined and marked nonref).

(24) a. Itnonref seems to me that this is the pin that has finally pricked the balloon [...]

b. [...] the ministry played a rolenonref in orchestrating recent moves by Japanese banks

c. Certainly the federal government should take a hard looknonref at it. d. It should be noted that Mr. Schwartz (...) is a puckish sort who likes to give his colleagues the needlenonref .

e. Bank officials, however, showed him the doornonref , and the sale never came off.

f. [...] line supervisors slice up the merit pienonref .

g. Gives me the williesnonref just thinking about it. Expressions in their attributive use also do not refer (Donnellan, 1966).18 Explicit uses of attributions are predications, appositions, and attributive relative clauses, see Exam- ples (25a) to (25c) (marked att) as well as the second sentence in Example (16).19 (25) a. The ULI is a non-profit research and education group based in Washington,

D.C. [...]att.

18 Kamp and Reyle (1993) treat attributives with a definite determiner as referring, see Section 2.3.4.
19 In OntoNotes, adjective phrases can also form appositions, e.g. 46 years old in Example (25b) and Born in a Baltic town in an area which is now part of Poland in the following example:

(1) Born in a Baltic town in an area which is now part of Polandatt, he has dedicated his life to the party apparatus.


b. Mr. Amon, 46 years oldatt, is the company's director of quality assuranceatt [...].

c. Jurors in U.S. District Court in Miami cleared [...] Stephen Vadas and Dean Ciporkin, who had been engineers with Cordisatt.

Donnellan (1966) points out that definite descriptions²⁰ like Smith's murderer (see his example in (26)) can be used referringly or attributively (the latter meaning 'whoever has intentionally killed Smith').²¹ These cases, however, are extremely hard to distinguish: definite descriptions do not always give away the referent's identity; their referents may or may not be uniquely identifiable. Thus, to distinguish whether the speaker has a certain referent in mind, exact information on the speaker's model of the world is needed. For the annotation process, the referring use can be assumed as the default case. The attributive use is only annotated if the context makes it explicit.

(26) Smith’s murdererreferring/attributive is insane.

2.2.3 World of Reference

Reference is evaluated with respect to a certain world. First attempts at a definition of reference used existence in the real world as a criterion for reference. However, it is also possible to make propositions about objects that do not exist in the real world, but in a hypothetical or fictional world, see Examples (27) and (28).

(27) If Lucy had a cat, she'd name it Pebbles.

(28) Watson is Sherlock Holmes's assistant.

The world of reference is not made explicit. Consider Karttunen's example in (29) (Karttunen, 1968, p. 21). Sentence (29a), considered on its own, is ambiguous. If it is continued with sentence (29b), the referent exists in the real world. If continued with (29c), however, it exists only in Mary's imaginative world. In the DRS in Figure 2.4, the variable y is part of the main DRS, whereas it is part of the inner DRS in Figure 2.5.

(29) a. Mary wants to marry a rich man1.

b. He1 is a banker.

c. He1 must be a banker. (in the sense of 'It is necessary that he be rich.')

Karttunen (1969a, p. 34) postulates that "an indefinite NP establishes a discourse referent just in case the sentence is an affirmative assertion". He assumes certain "'world-creating' verbs" (p. 34), such as intend and want (also called intensional verbs). In Donnellan's (1966) categorization, (29b) would invoke a referring, (29c) an attributive reading. Referring expressions in conditional sentences can be interpreted in two different ways in DRT. Consider the example sentences (30) and (31)²² and the corresponding DRSs (2.6

20 Definite descriptions are noun phrases of the form 'the N' where N is a singular common noun or a noun phrase (cf. Ludlow (2011)). The definite determiner may be replaced by a possessive.
21 This distinction resembles the distinction between de dicto and de re: a de dicto use is one where an object or set of objects must meet the description; a de re use is one where an object or set of objects happen to meet the description.
22 Example (30) is taken from OntoNotes 1.0, (31) is constructed to form a minimal pair.


[ x y | Mary(x), want(x, e), rich man(y), e: marry(x, y), z = y, banker(z) ]

Figure 2.4: DRS of ‘Mary wants to marry a rich man. He is a banker.’

[ x | Mary(x), want(x, e), e: [ y | rich man(y), marry(x, y), z = y, banker(z) ] ]

Figure 2.5: DRS of ‘Mary wants to marry a rich man. He must be a banker.’

and 2.7, respectively). Regarding the semantic content, they are equivalent in meaning. Regarding the pragmatic content (i.e. also the ), they differ: in Figure 2.6, the discourse referent for the set of parents is declared in the inner DRS. In Figure 2.7, it forms part of the main DRS, presupposing the existence of parents.

(30) If parents1 are dissatisfied with a school, they1 should have the option of switching to another.

(31) Parents1 should have the option of switching to another school, if they1 are dissatisfied with the current school.

It is important to note that DRSs are designed to represent the semantic content of discourses. As to presuppositions, the "level of accommodation [...] is also pragmatically determined" (Asher and Lascarides, 1998, p. 287), i.e. discourse is interpreted with the help of world knowledge. In most cases, like 'parents' above, it is clear whether a presupposition is plausible or whether it has to be treated with caution. Regarding presuppositions in fictional texts, Asher and Lascarides state that "[t]here is presupposition failure but the reader doesn't really care" (Asher and Lascarides, 1998, p. 287).²³

23 A predicate's actual extension or temporal dimension only becomes apparent when the DRS is mapped to a model.


[ | [ X y | parent*(X), school(y), X is dissatisfied with y ] ⇒ [ z | school(z), X should have the option of switching to z, z ≠ y ] ]

Figure 2.6: DRS of ‘If parents are dissatisfied with a school, they should have the option of switching to another.’

[ X | parent*(X), [ y | the current school(y), X is dissatisfied with y ] ⇒ [ z | school(z), X should have the option of switching to z, z ≠ y ] ]

Figure 2.7: DRS of ‘Parents should have the option of switching to another school, if they are dissatisfied with the current school.’

2.2.4 Specificity

In this section, the concept of specificity (semantic definiteness) will be discussed. The purpose of this concept is to narrow down referring expressions to those expressions with a determinate referent, so reference can be considered a function. This function is used for the definition of coreference in Section 2.3. An expression is specific or semantically definite if it is used to select an individual object or set of objects against other objects of the same kind, to "single out the entity uniquely" (van Deemter and Kibble, 2000, p. 631). Van Deemter and Kibble (2000) elaborate that

(32) "[a] semantically definite NP α is one whose set-theoretic denotation takes the form of a principal filter (Partee et al., 1990), i.e., a set of the form {X : Y ⊆ X} for some set of individuals Y" (p. 635).

Example (33) contains references to four specific bridges or sets of bridges: the nation's old bridges (subscript index 1), G Street Bridge (index 2), Charter Oak Bridge (index 3), and a bridge in Peninsula, Ohio (index 4). The expressions underlined but not indexed, in contrast, are non-specific uses involving the concept of bridges: Bridges and older bridges refer to bridges (or older bridges, respectively) in general; just an ugly bridge and one that blocks the view of a new park below are predications applying to G Street Bridge. Noun phrases that are subject to generalizations (quantified, negated or kind-referring expressions), as well as vague and ambiguous reference will be discussed in detail in the following sections.


(33) Beauty Takes Backseat To Safety on Bridges Everyone agrees that most of the nation’s old bridges1 need to be repaired or replaced. But there’s disagreement over how to do it. Highway offi- cials insist the ornamental railings on older bridges aren’t strong enough to prevent vehicles from crashing through. But other people don’t want to lose the bridges’ beautiful, sometimes historic, features. “The primary pur- pose of a railing is to contain a vehicle and not to provide a scenic view,” says Jack White, a planner with the Indiana Highway Department. He and others prefer to install railings such as the “type F safety shape,” a four-foot-high concrete slab with no openings. In Richmond, Ind., the type F railing is being used to replace arched openings on the G Street Bridge2. Garret Boone, who teaches art at Earlham College, calls the new structure2 “just an ugly bridgeatt:2” and one that blocks the view of a new park belowatt:2. In Hartford, Conn., the Charter Oak Bridge3 will soon be replaced, the cast- iron medallions from its3 railings relegated to a park. Compromises are possi- ble. Citizens in Peninsula, Ohio, upset over changes to a bridge4, negotiated a deal: The bottom half of the railing will be type F, while the top half will have the old bridge’s4 floral pattern. An expression is specific even if the actual identity of the referent (its extension) is not revealed, as the NP indexed 4 in Example (33) above, and the NP indexed 1 in Example (34). An indefinite first mention may be non-specific to the hearer (and even the speaker may not know the exact identity of the referent) like in Example (29a). Later mentions of this referent, or an occurrence in an event-presenting sentence (marked by the past tense or the present progressive) like in Example (34), however, may make it clear that the expression is used specifically. In file change semantics24, indefinites systematically introduce new file cards: “For every indefinite, start a new card. For every definite, update an old card” (Heim, 1982, p. 227).

(34) Last year a gunner1 shot a whooper by mistake thinking that it was a snow goose. He1 paid an immense fine [...].

Enç (1991) and von Heusinger (2002), among others, have observed that there is neither consensus on the notion of specificity nor on definitional criteria. Terminology is used inconsistently: what von Heusinger calls specificity is termed referentiality by Givón (1978). Criteria like identifiability of the referent by the speaker and the existential presupposition have been postulated and later given up (von Heusinger, 2002). Enç (1991) defines specific expressions as expressions "linked to previously established discourse referents" (p. 9). According to this definition, (syntactically) definite expressions are those linked to an antecedent by the identity relation. Specific expressions are those linked by the inclusion relation (being a subset or standing in "some recoverable relation" (Enç, 1991, p. 24) to the antecedent). Von Heusinger (2002) terms the latter expressions relative specifics, and the expressions they are linked to anchors. The reference of the expression "depends on the 'anchor' expression [in that] [o]nce the reference for the anchored is determined, the reference for the specific term is also determined" (von Heusinger, 2002, p. 36).

24 File change semantics uses the metaphor of file cards: all the information on a referent is stored on the referent's file card.


There are two problems with this definition: first, unanchored specific expressions (e.g. first mentions) are not provided for. Second, the definition includes anchored non-specific expressions: an expression which is anchored to a non-specific antecedent would have to be categorized as specific, because its reference is identical to or can be determined relative to its antecedent. This is the case in Example (29a) continued with c (Mary wants to marry a rich man. He must be a banker.), and in the examples in (35). I prefer to categorize these examples as non-specific, because their reference is dependent on the reference of the antecedent; the antecedent does not refer to a single concrete object or event. In particular, these definite expressions (indexed 2 in Example (35a), and 2 and 3, respectively, in (35b)) should be categorised as generic. This will be explained in the following section.

(35) a. Running for president in early 1980, he was also quoted as supporting federal funding for abortions1 in cases of rape, incest and to save the life of the mother2; specific relative to 1.

b. When husbands1 take on more housework, they tend to substitute for chores done by the kids2; specific relative to 1 rather than by the wife3; specific relative to 1.

Another issue with Enç and von Heusinger's anchoring-based definition of specificity is that if it is integrated as a precondition into a definition of coreference, the whole definition becomes circular.²⁵ For this reason, I give a negative definition of specificity, excluding non-specific expressions. Non-specific expressions challenge the modeling of reference as a function: a function should return exactly one referent, which can be delimited against other referents in the text. Non-specific expressions include expressions that are subject to generalizations, as well as vague and ambiguous expressions. These kinds of expressions are discussed in the following.

2.2.5 Generalizations

One type of expressions complementary to specific reference is generalizations, i.e. statements that abstract from concrete events. Generalizations include negation (36a), quantification (36b), and genericity ((36c) to (36f); (36f) from Krifka et al. (1995)).

(36) a. No lawyers or tape recorders were present.
     b. Every issue is multisided.
     c. [...] trout have very soft mouths.
     d. Underclass neighborhoods offer relatively few employment opportunities [...]
     e. Small neighborhood businesses could provide more jobs, if crime were not so harmful to creating and maintaining those businesses.
     f. The lion has a bushy tail.


[ x | Jones(x), ¬[ y | Porsche(y), x owns y ] ]

Figure 2.8: DRS of ‘Jones does not own a Porsche.’ (taken from Kamp and Reyle (1993), p. 102)

For negation, consider the DRS in Figure 2.8. Note that the variable for Porsche is under the scope of the negation, whereas the variable for Jones is not. For quantification, reconsider the DRSs in Figures 2.2 and 2.3: the variables y for every book in (2.2) and x for most students are under the scope of quantifiers. They are not introduced at the main level but at a subordinate level, and thus they do not form part of the representation of the main world of reference. Generic expressions can also be analyzed as binding variables, see Figure 2.9 (Kamp and Reyle, 1993, p. 294). The generic operator is represented by a wavy arrow (‘≈>’), similarly to the arrow representation of the universal quantifier.

[ | [ x | wolf(x) ] ≈> [ y | mate(y), x takes y for life ] ]

Figure 2.9: DRS of ‘A wolf takes a mate for life.’ (taken from Kamp and Reyle (1993), p. 294)

There is controversy over which variables are bound in generalizations, and also how many variables are bound. Krifka et al. (1995) differentiate

(a) utterances containing an expression referring to a kind (as opposed to expressions referring specifically, i.e. to concrete objects), see Example (36a), and

(b) those expressing a “regularity which summarizes groups of particular episodes or facts” (p. 2).

Genericity can be tested in the following way26:

25 An expression is coreferent if it is specific and its referent is identical to its antecedent's referent. An expression is specific if it is coreferent with or related to another expression in the text.
26 If the entities to be tested are already of the form suggested in the test, they can be directly assigned the respective class.


(37) Type (a): Insert 'as a whole' or 'in general' after the noun phrase you wish to test for genericity. If the sentence maintains its meaning, then the noun phrase refers to a kind. Otherwise apply the following test:

(38) Type (b): Insert 'generally' or 'typically' into the sentence. If it maintains its meaning, then it is a regularity (or characterization in Cohen's (2001) terms).

(39) definition Gn is an operator which changes a particular predicate to a charac- terizing one. Whenever Gn(α)(β) holds, there are several times t and realizations y of β, R(y,β), such that α(y) holds at t. (with α being a verbal predicate and β a term denoting an individual (i.e. a kind or object); realizations may include stages of individuals, i.e. temporal slices of an individual, individuals-at-a-certain- time-interval; John smokes, for instance, is represented as Gn(smoke)(John))

(40) definition Q[x1,..,xi;y1,..,yj](Restrictor[x1,..,xi]; Matrix[{x1},..,{xi},y1,..,yj]), Q is a dyadic adverbial quantifier; x1, ..., xi are the variables to be bound by Q, and y1, ..., yj are the variables to be bound existentially with scope just in the matrix. Depending on the analysis one chooses, expressions may (but do not have to) have scope over other expressions. Thus, the choice on the specificity of one expression has an influence on the interpretation options of following expressions. Regarding Example (41a), one option is to interpret both expressions companies as in- tensional. This interpretation would read as ‘It is a typical trait of objects of the type company that they buy (or are bought by) other objects of the same type, and this will carry on in the future’. Another option is to interpret both as specific: there will be some companies which buy other companies. A third option is to interpret the first as inten- sional and the second as specific. In those cases where the first expression is interpreted generically, the second expression is under the scope of this expression. It is not possible to interpret the first expression as specific and the second as generic. Example (41b) can be paraphrased as ‘the total number of genes identified is 2’. As to truth conditions, it is not necessary that both of these genes have been identified by one and the same scientist (or group of scientists).

(41) a. Companies1 are still going to buy companies2 around the world. b. To date, scientists have fingered two of these cancer-suppressors.


                           NP1 (antecedent candidate)
                           specific      generic
NP2 specific, definite     coref         ¬coref
NP2 specific, indefinite   ¬coref        ¬coref
NP2 generic                ¬coref        coref*

Table 2.2: Specificity, Genericity and Coreference (*under the assumption that generic NPs refer to a property or to all individuals of the respective kind)
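The pattern in Table 2.2 can be written down as a small filter, e.g. for discarding impossible antecedent candidates before any further processing. The following sketch is my own encoding of the table; the type labels ('specific-definite', 'specific-indefinite', 'generic' for the NP under consideration, 'specific' and 'generic' for the antecedent candidate) are invented for the example.

def coreference_possible(np2_type, np1_type):
    """Illustrative encoding of Table 2.2.
    np2_type: 'specific-definite', 'specific-indefinite' or 'generic' (the NP under consideration);
    np1_type: 'specific' or 'generic' (the antecedent candidate)."""
    if np2_type == "specific-indefinite":
        return False                       # specific indefinites introduce a new referent
    if np2_type == "generic":
        return np1_type == "generic"       # coref* only between two generics
    # np2 is a specific definite: possible only with a specific antecedent
    return np1_type == "specific"

assert coreference_possible("specific-definite", "specific")
assert not coreference_possible("specific-definite", "generic")
assert not coreference_possible("specific-indefinite", "specific")
assert coreference_possible("generic", "generic")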

One acknowledged difficulty with generic uses of expressions is that they are sometimes hard to distinguish from specific reference (consider the examples in (42)): Carlson (1989) notices a lack of formal features, and Herbelot's (2008) annotation scheme for genericity consists of a 14-step decision tree. What is more, a text can implicitly restrict the class of objects of interest. For instance, consider the text in (33): the expression older bridges refers to older bridges in general, but only those located in the U.S., not in Canada etc.

(42) a. Mobil Corp. is preparing to slash the size of its work force in the U.S., possibly as soon as next month. [...] Employees haven't yet been notified.
     b. [...] developers may have to put in a lot of money and time.
     c. [...] political dissidents were being certified as insane
     d. Successful American business owners do the same thing.

Distinguishing specifically referring expressions, however, is crucial for coreference resolution because of the asymmetry in Table 2.2²⁷: whereas coreference is not possible between NPs of distinct types (one specific, one generic), and possible between generics (see Example (43)²⁸) and between definites, it is not possible between specific indefinites.

(43) Sheep1 (Ovis aries) are quadrupedal, ruminant mammals typically kept as live- stock. Like all ruminants, sheep1 are members of the order Artiodactyla, the even-toed ungulates. Although the name “sheep” applies to many species in the genus Ovis, in everyday usage it almost always refers to Ovis aries. Numbering a little over one billion, domestic sheep are also the most numerous species of sheep1. Sheep1 are most likely descended from the wild mouflon of Europe and Asia. One of the earliest animals to be domesticated for agricultural purposes, sheep1 are raised for fleece, meat (lamb, hogget or mutton) and milk. A sheep1’s wool is the most widely used animal fiber, and is usually harvested by shearing. Ovine meat is called lamb when from younger animals and mutton when from older ones. Sheep1 continue to be important for wool and meat today, and are also occasionally raised for pelts, as dairy animals, or as model organisms for science.

27 Abbreviations: coref - coreference is possible, ¬coref - coreference is not possible, def - syntactically definite, indef - syntactically indefinite.
28 Example taken from http://en.wikipedia.org/wiki/Sheep, last access January 29th, 2011.


[ | [ X | sheep*(X) ] ≈> [ | quadrupedal*(X), ruminant*(X), mammal*(X), X is typically kept as livestock ],
    [ Y | sheep*(Y) ] ≈> [ | member of the order Artiodactyla*(Y), X = Y ] ]

Figure 2.10: DRS of ‘Sheep are quadrupedal, ruminant mammals typically kept as live- stock. Sheep are members of the order Artiodactyla.’

Coreference between generic expressions is not provided for in DRT: identity of the variables X and Y ('X = Y') cannot be postulated because the variables X and Y only have scope inside the respective DRSs, see Figure 2.10. This is the case even if the generic operator ('≈>') is substituted by a universal quantifier ('⇒'), which is a plausible analysis in the example case. In the DRT analysis, the variables are identical only in extension.
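The accessibility problem behind Figure 2.10 can also be stated procedurally: a discourse referent that is only declared inside the restrictor or scope of a generic (or quantified) condition is not available at the main DRS level, so an identity condition 'X = Y' cannot be placed there. The sketch below is my own, deliberately simplified illustration (the Box class and the '~>' marker are invented stand-ins for the DRT notation).

class Box:
    """A (sub-)DRS: referents plus conditions; a condition may be a pair of
    sub-boxes connected by an operator such as '=>' or '~>' (generic).
    Illustrative sketch only."""
    def __init__(self, referents=(), conditions=()):
        self.referents = set(referents)
        self.conditions = list(conditions)

def accessible_at_top(box):
    """Only referents declared in the main box are accessible at the main level;
    referents inside quantified or generic sub-boxes are not."""
    return set(box.referents)

# Figure 2.10, simplified: two generic conditions, each introducing its referent
# only inside its own restrictor box.
main = Box(referents=(), conditions=[
    ("~>", Box({"X"}, ["sheep*(X)"]), Box((), ["mammal*(X)", "kept as livestock(X)"])),
    ("~>", Box({"Y"}, ["sheep*(Y)"]), Box((), ["member of Artiodactyla*(Y)"])),
])

def can_add_identity(box, var1, var2):
    """An identity condition 'var1 = var2' may be added at the main level only if
    both variables are accessible there."""
    acc = accessible_at_top(box)
    return var1 in acc and var2 in acc

print(can_add_identity(main, "X", "Y"))   # False: X and Y live inside the generic boxes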

2.2.6 Vagueness and Ambiguity

Unique identifiability, i.e. the clear delimitation of an expression's referent, is difficult if the text is not explicit as to

(i) the identity, or the exact extension of the referent or

(ii) whether it refers to the intension or extension of the expression.

If the speaker does not specify the identity or exact extension of the referent, identity (here: sameness) or overlap with referents mentioned earlier or later in the text is possible, but not specified. Expressions that do not signal unique identifiability of the referent will be termed vague here. In particular, this is the case with interrogatives and expressions denoting uncountable items or abstract concepts rather than concrete objects. Expressions with several possible referents are called ambiguous.

Interrogative constituents, i.e. constituents containing interrogative determiners, pro-forms or pro-adverbs, may be used to introduce or narrow down a discourse referent, presupposing the existence of a specific referent, as in Examples (44a) and (44b). Their purpose, however, is to explicitly ask for or leave open the extension of a certain discourse referent (cf. Krifka (2008)), thus the referent is not uniquely identified. It is debatable whether they refer at all.

(44) a. What must your management team do to establish corporate due process?


b. Dr. Knudson [...] assumed the missing piece contained a gene or genes whose loss had a critical role in setting off the cancer. But he didn't know which gene or genes had disappeared.

The exact extension of the referent is hard to define for uncountables (Examples (45) to (47)). This also holds for immaterial goods (abstract concepts like peace in Example (48)²⁹). They are representations existing in the mind rather than in the real world, which leads to a conflict with the strict definition of referentiality.

(45) The Valujet statement did not mention smoke in the cabin, but other reports said the pilot had spoken to air-traffic controllers of smoke in the cabin.

(46) Santa Fe International Corp. [...] is stepping up development of a well off Texas' Matagorda Island where it found gas in 1987.

(47) Businesses "want to verify information and ensure accuracy," says John Hiltunen [...].

(48) "[...] Blessed indeed are our friends and colleagues who perished on a mission of peace." [...] Galbraith said the most fitting memorial would be to make peace "a reality on the ground" and to offer "a vastly better future to people who in the last five years have suffered so much."

Vagueness is a very common issue: in principle, any indefinite expression is vague considering that names could be used instead (see Examples (49) and (50)³⁰).

(49) GE Chairman John Welch has been "besieged with phone calls" [...], according to a person close to him.

(50) Benson said investigators are simulating air loads on the 737's rudder.

Finally, expressions can be ambiguous with respect to their type (specific/non-specific/generic), or their referent (multiple possible antecedents). Karttunen notes that "[i]n general, indefinite noun phrases have both a specific and non-specific interpretation" (Karttunen, 1969a, p. 6). Lyons supposes the "vagueness between specific and non-specific is in principle always present" (Lyons, 1999, p. 173). Examples for ambiguities between specific/generic uses of expressions are given in (42). For an example with multiple possible antecedents, see Example (51) (taken from Poesio and Artstein (2005), p. 76, linebreaks changed and markup added), where that might refer to a bad wheel or the boxcar.

(51) it turns out that the boxcar at Elmira1 has a bad wheel2 and they're .. gonna start fixing that1 or 2 at midnight but it won't be ready until 8

To conclude Sections 2.2.4 to 2.2.6, the idea behind the definition of specificity was to break down a task into smaller subtasks, namely to identify those expressions for which coreference is clearly and efficiently decidable. The corpus data suggests that non-specificity of expressions is a problem indeed, but deciding on specificity is, too. To sum up the definition of reference (Section 2.2), being able to refer is a property primarily associated with noun phrases (cf. also Webber (1988)); exceptions to this rule

29 Examples (45) and (48) are taken from the MUC-7 corpus; (46) and (47) from OntoNotes 1.0.
30 This example is taken from the MUC-7 corpus.


have been discussed above. Basically, an expression is referring in the strict sense if it is used to select a concrete object (or set of objects) (van Deemter and Kibble, 2000), see the definition in (14).

2.3 Coreference

Coreference is a relation between expressions where identity holds between the referents of these expressions (Definition (52) taken from van Deemter and Kibble (2000), p. 629).31

(52) definition C(α1,α2), i.e. α1 and α2 corefer if and only if Ref(α1) = Ref(α2), where Ref(α) is short for ‘the entity referred to by α’

In logic, identity between objects (denoted by the equals sign '=') is defined as given in (53) (Reinhardt and Soeder, 1974, p. 19): with respect to every property P, the objects are equal in truth values.

(53) definition x = y :⇔ ∀P (P(x) ⇔ P(y))

Identity is an equivalence relation (cf. the discussion in Kibble and van Deemter (2000)). An equivalence relation is reflexive, symmetric, and transitive (see Definitions (54) to (56); cf. Meschkowski (1966), Behnke et al. (1958), Hasse (1963)).

(54) definition: reflexivity ∀α ∈ X,C(α, α)

(55) definition: symmetry ∀α1, α2 ∈ X,C(α1, α2) ⇒ C(α2, α1)

(56) definition: transitivity ∀α1, α2, α3 ∈ X, C(α1, α2) ∧ C(α2, α3) ⇒ C(α1, α3)

Identity is a complex philosophical concept with many controversial issues (see Noonan (2009) for an overview). Those issues linguistically relevant to this work will be discussed in the following, in particular the issues in determining coreference, and the distinction of coreference from other forms of identity.

31 In their functional definition Ref(α), the parameter of the expression's context is concealed (cf. the discussion in 2.2). Taking the context c into account (and the sentence context might not be enough, consider Examples (36d) to (36e) and the second sentence in (59)), the resulting function Referent(α, c) might contain circular interdependencies of the sort Ref(α1, c1) =? Ref(α2, c2) where α1 ∈ c2 and α2 ∈ c1. Considering (36e), let α1 be Small neighborhood businesses, and α2 be those businesses. For resolving what α1 (Small neighborhood businesses) refers to (a specific set, or a kind), at least the whole sentence needs to be taken into account; the sentence includes α2. For resolving what α2 refers to, again, the whole sentence (and probably the previous discourse) needs to be taken into account; this also includes α1.
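Since coreference, like identity, is expected to behave as an equivalence relation (54) to (56), pairwise coreference decisions can be closed off into chains with a standard union-find structure, under which reflexivity, symmetry and transitivity hold by construction. The sketch below is a generic illustration of this bookkeeping (the names and the sample mentions are mine), not a description of any particular system discussed in this work.

class CoreferenceChains:
    """Union-find over mention IDs: merging pairwise coreference links yields
    equivalence classes (coreference chains). Illustrative sketch only."""
    def __init__(self):
        self.parent = {}

    def find(self, mention):
        self.parent.setdefault(mention, mention)       # reflexivity: C(a, a)
        while self.parent[mention] != mention:
            self.parent[mention] = self.parent[self.parent[mention]]   # path halving
            mention = self.parent[mention]
        return mention

    def add_link(self, m1, m2):
        """Record C(m1, m2); symmetry and transitivity follow from the merge."""
        r1, r2 = self.find(m1), self.find(m2)
        if r1 != r2:
            self.parent[r2] = r1

    def corefer(self, m1, m2):
        return self.find(m1) == self.find(m2)

chains = CoreferenceChains()
chains.add_link("Cordis Corp.", "the Miami-based company's")
chains.add_link("the Miami-based company's", "Cordis")
print(chains.corefer("Cordis Corp.", "Cordis"))        # True, by transitivity
print(chains.corefer("Cordis", "Cordis Corp."))        # True, by symmetry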


2.3.1 Consequences of Vagueness, Ambiguity and Non-Specificity

As mentioned in Section 2.2.6, what an expression refers to can

(i) be left vague by the speaker,

(ii) have multiple solutions or

(iii) be subject to a controversial debate.

In the case of vague expressions, the discourse does not provide details on the expression’s exact extension. As a consequence, the recipient is not sufficiently informed to determine whether identity between such expressions holds or not (consider multiple mentions of investigators in Example (57)32); coreference should not be assumed. As to the speaker’s motivation for not being more specific, the recipient can only speculate; possibly, he judges the exact relation between the two referents (or groups of referents) irrelevant to the discourse.

(57) [...] Federal investigators1 continued their1 search for the cause of the crash Tues- day, the 20th day since Flight 800 exploded in midair off Long Island and plunged into the Atlantic Ocean, killing all 230 people on board. On the seas and on the shore, investigators2 said they2 made a modest amount of progress, though they2 still have not determined if the plane crash was caused by a bomb, a missile at- tack or a mechanical malfunction. At the former Grumman hangar in Calverton, investigators3 on Tuesday began piecing together the fractured parts of the air- plane. They3 also pulled about one-third of the cockpit wreckage off the one-ton ball of metal, essentially unwrapping it.

Ambiguity mainly challenges the representation of the coreference annotation. Example (58) (taken from Poesio and Artstein (2005), p. 76, and in parts discussed above) shows an utterance with two ambiguous expressions, that and it: by that, the speaker can refer to the boxcar at Elmira or a bad wheel. By it, he can refer to the boxcar at Elmira; if the task is to find the closest antecedent, that is also a candidate, but only in its reading as referring to the boxcar. The analyses are given in (58a) and (58b).³³ (58c) gives an indexing annotation, where each discourse referent d points to the discourse referent first mentioning d's referent.³⁴

(58) it turns out that the boxcar at Elmira1 has a bad wheel2 and they’re .. gonna start fixing that3 at midnight but it4 won’t be ready until 8 a. {1,3,4} (the boxcar, that, it)

32 This example is taken from MUC-7.
33 The numbers correspond to the indices used in the example, the sets represent equivalence sets. Analyses b. and c. are 'packed'; the pipe character '|' stands for or, i.e. '{1|2}' is short for '1 or 2'. It is important that 'or' is not interpreted as 'and' (set union), otherwise transitivity and symmetry are not preserved.
34 Note that if a discourse referent d is annotated with the ID of the closest preceding coreferent expression, the antecedent's ambiguities are 'inherited': it in Example (58), for instance, would be annotated with '1|3', and the dispreferred analysis {2,3,4} referring to a bad wheel would arise.


b. {1,4} (the boxcar, it) and {2,3} (the wheel, that)

c. it turns out that the boxcar at Elmira1 has a bad wheel2 and they're .. gonna start fixing that3;coref(1|2) at midnight but it4;coref(1) won't be ready until 8

Generic ('kind-referring') utterances make statements about types of objects, events, states or situations (Parents in Example (59), for instance, refers to parents in general). There is controversy, however, whether they should be interpreted as referring to all objects etc. of a certain kind, most, or some (cf. Section 2.2.5), or to the intension. If they refer to the intension or to all objects/states/events of this kind, coreference holds between the mentions of this kind (elements indexed 1, 2 and 3 in Example (59)). If they refer to most objects/states/events of this kind, their extension is again underspecified, and it is not clearly resolvable whether coreference holds or not.

(59) Parents1 should be involved with their1 children’s education at home, not in school. They1 should see to it that their1 kids don’t play truant; they1 should make certain that the children spend enough time doing homework; they1 should scrutinize the report card. Parents2 are too likely to blame schools for the edu- cational limitations of their2 children. If parents3 are dissatisfied with a school, they3 should have the option of switching to another.

2.3.2 Coreference vs. Lexical, Intensional or Extensional Identity

Coreference can be delimited against

(i) lexical identity,

(ii) identity of intension and

(iii) identity of extension.

(i) Multiple uses of the same signifiers (or of the corresponding pro-forms) in one discourse do not necessarily have the same referent (cf. Examples (60a)³⁵ and (60b)³⁶). In Example (60a), the second use of his or her successor can only be interpreted as referring to the successor's successor. In Example (60b), the expression producers is used two times, generically in the first case (referring to producers in general, see Section 2.2.5 for a discussion of genericity) and specifically in the second case (referring to some producers). Generic and specific uses ('types and tokens' in Hirschman and Chinchor's terms) are distinct in reference. On a related note, Hirschman and Chinchor also call attention to metonymic expressions. Using Example (60c), they explain that the first use of the New York Times refers to a newspaper copy, the second to a publishing house.³⁷

35 Text source: http://www.oi.edu.pl/old/php/ceoi2004.php?module=show&file=history, last accessed August 9th, 2011.
36 Examples in this section are taken from Hirschman and Chinchor (1997), pp. 12 and 13 unless specified. Markup changed and inserted.
37 Although the example is questionable (the second use of the New York Times can be interpreted as also referring to the newspaper), the point is made clear.


(60) a. Suppose somebody sends a message to his or her successor, who in turn transmits the message to his or her successor, etc.
     b. producers don't like to see a hit wine increase in price... Producers have seen this market opening up and they're now creating wines that appeal to these people.
     c. I bought the New York Times this morning. I read that the editor of the New York Times is resigning.

(ii) Pronouns can be used as identity-of-sense anaphors, as Karttunen (1969b) points out (consider his example in (61), markup added).

(61) The man who gave his paycheck to his wife was wiser than the man who gave it to his mistress.

(iii) Uses of different descriptions that apply to the same physical object do not necessarily result in coreferential use of these references: in a discourse, one can refer to different aspects or roles of the same object in the real world, see Examples (62) (Example (62a) is made up, (62b) is from OntoNotes 1.0). Recasens et al. (2010) call relations between objects and their aspects near-identity (see their Examples (62c) and (62d), pp. 151 and 149, respectively). Objects may take on roles, see Example (62e) from Naumann (2006, p. 13; original annotation of discourse referents). The characteristics of personal identity may be referred to as persisting even after death (i.e. after an object's existence in the world has ended), cf. Example (62f). Whether an individual is identical to stages of this individual (cf. the definition of stages in the context of generalizations (39)) is an open question.

(62) a. He wrote a book on the morning star1, and another one on the evening star2. (That was before people knew that the morning star and the evening star are the same object.)

b. [...] A few months later, Mr. Bush1 became Ronald Reagan's running mateatt:1. Suddenly, George Bush the pro-choice advocate1 became George Bush the anti-abortionist1?. [...] In addition to supporting the landmark Roe vs. Wade Supreme Court decision legalizing abortion, Mr. Bush1 said he1 opposed the constitutional ban on abortion that Mr. Reagan was promising to promote. As Mr. Reagan's running mate, though, Mr. Bush1? plunged headlong into the anti-abortion position [...].

c. “Your father1 was the greatest” commented an anonymous old lady while she was shaking Alessandro’s hand – Gassman1’s best known son. ”I will miss the actor2?, but I will be lacking my father1? especially,” he said.

d. On homecoming night Postville1 feels like Hometown, USA, but a look around this town of 2,000 shows it1’s become a miniature Ellis Island ... For those who prefer the old Postville1?, Mayor John Hyman has a simple answer.

e. John Travolta1 as a lawyer from Boston sues two companies which he1 considers responsible for the death of eight children as a result of leukaemia. In the beginning, the calculating high flying advocate1 only scents high compensation sums, but slowly the case becomes a selfdestroying obsession. Court drama, environmental thriller and great actor's cinema, where Travolta1 and his1 antagonist Robert Duvall achieve top form.


f. During the Korean War, Gen. Douglas MacArthur1 demanded and got, in addition to his U.N. command in Korea, his own naval command in Japan, NavforJapan. Those obsolete operations cost less than $ 2 billion a year, and keep Mac's1 ghost quiet.

Expressions can be used intensionally (attributively/generically) or extensionally. Whether an expression is used in its intensional or extensional sense is usually not explicitly marked. See also the discussion in Section 2.3.4.

2.3.3 Identity of Binding Index

Heim and Kratzer (1998) define binding as follows, distinguishing syntactic and semantic binding, see (64) (on the basis of (63), both definitions are denoted as 'standard definitions') and (65), respectively:

(63) definition C-command "A node α c-commands a node β iff (i) neither node dominates the other, and (ii) the first branching node dominating α dominates β." (Heim and Kratzer, 1998, p. 261)

(64) definition Syntactic binding "A node α syntactically binds a node β iff (i) α and β are co-indexed, (ii) α c-commands β, (iii) α is in an A-position, and (iv) α does not c-command any other node which also is co-indexed with β, c-commands β, and is in an A-position. 'A-positions' are the positions of subjects and objects: 'non-A (A-bar) positions' are adjoined and complementizer positions." (Heim and Kratzer, 1998, p. 261)

(65) definition Semantic binding "A DP α semantically binds a DP β (in the derivative sense) iff β and the trace of α are (semantically) bound by the same variable binder. [Above this definition, Heim and Kratzer note:] (On the literal notion, only variable binders in the semantic sense can bind anything.)" (Heim and Kratzer, 1998, p. 263)
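Clause (ii) of (64), the c-command condition defined in (63), is straightforward to operationalize on a constituency tree. The following sketch is my own illustration (the Node class, the helper functions and the toy tree for (66) are invented): it climbs from a node to its first branching ancestor and tests whether that ancestor dominates the other node.

class Node:
    """A constituency-tree node (illustrative sketch)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def dominates(a, b):
    """a dominates b iff b lies below a in the tree (a != b)."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False

def c_commands(a, b):
    """(63): a c-commands b iff neither dominates the other and the first
    branching node dominating a also dominates b."""
    if dominates(a, b) or dominates(b, a) or a is b:
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)

# Toy tree for 'Every cat chased its tail' (flat and purely illustrative):
#   S -> NP(every cat)  VP -> V(chased) NP -> Det(its) N(tail)
det_its = Node("its"); n_tail = Node("tail")
np_obj  = Node("NP", [det_its, n_tail])
vp      = Node("VP", [Node("chased"), np_obj])
np_subj = Node("NP", [Node("every"), Node("cat")])
s       = Node("S", [np_subj, vp])

print(c_commands(np_subj, det_its))   # True: the subject c-commands the pronoun
print(c_commands(det_its, np_subj))   # False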

Similarly, Büring (2007) observes that the term binding has been used for the following three phenomena:

"First, for the relation between quantified expressions and pronouns that referentially depend on them [see his example in (66)]. Second for coreference, the relation between two referring expression with the same referent, [see his examples in (67)] including hypothesized empty pronouns, [see his example in (67c)]. Third, in theories that assume transformations, for the relation between a dislocated phrase and its trace, [see his example in (68)]".³⁸

38 No page numbering available.


(66) Every cat chased its tail.

(67) a. Sue hopes that she won.
     b. Edgar spoke for himself.
     c. Wesley called PRO to apologize.

(68) a. Which book did Kim read t?
     b. Antonia was promoted t.

Büring elaborates that “[s]emantically, only [(66)] and [(68)] are clear instances of binding”39. Throughout this work, the term binding will be used in this stricter sense, excluding coreference.
In contrast to the coreference relation, the binding relation involves a binder (the antecedent) and a bindee (the bound pronoun or noun phrase) (Büring (2005), cf. also Heim and Kratzer (1998)). Thus, it is not a symmetric relation (see (55) for a definition of symmetry, cf. the discussion in Kibble and van Deemter (2000)). If binding were included in coreference, transitivity could no longer be guaranteed as a consequence.
Anaphors bound by quantifiers (Examples (66) and (69a)) or in generic utterances (Examples (69b and c)) differ from specific anaphors as in (69d), where the referents of this robin and it are identical. In (69a, b and c), the whole utterance makes a proposition about a type, i.e. a set of objects (in this case, robins), whereas the embedded proposition (when [...] hungry) is made for each of the objects in the set, but not for the set as a whole. Thus, ‘identity of referent’ does not seem to describe the relation between these expressions precisely.40

(69) a. Every robin sings when it is hungry.
     b. The robin sings when it is hungry.
     c. Robins sing when they are hungry.
     d. This robin sings when it is hungry.

Carlson (1977b) showed that “anaphoric bindings are possible across kind[-]referring and apparently object-referring uses” (Krifka, 2004, p. 113), more specifically bindings in either direction (consider Carlson’s examples in (70)), cf. Rooth (1995).41

(70) a. Even though my brother hates snakeskind, I’ve had themspecific for pets my whole life.

b. Bob the hunter killed buffalospecific until theykind were extinct.

39 Example numbers have been changed in this citation to match numbering in this work.
40 Kamp and Reyle (1993) distinguish what they call collective and distributive readings. Their Example (1) can be read as (a) ‘There is a secretary that the group of lawyers liked, and they hired her’ (collective) or (b) ‘Each of the lawyers hired a secretary he liked’ (distributive). Strictly speaking, ‘they’ is coreferent to ‘The lawyers’ only in the collective reading; ‘they’ in the distributive reading stands in an anaphoric relation to ‘The lawyers’.
(1) The lawyers hired a secretary they liked.

41 For details on the distinction of reference to kinds vs. reference to objects (also called type/token distinction), see Section 2.3.2.


Even non-referring phrases can be referred back to anaphorically. The following Examples (71)42 and (72) are taken from TüBa-D/Z. Note the differences between glosses and translations. The expressions Krieg führen and Staub aufwirbeln cannot be translated compositionally (word by word); they are collocational and idiomatic, respectively. This special case of anaphoric reference may be categorised as a play on words. It is probably found more frequently in commentaries than in news reports. Nevertheless, it is relevant to the distinction of coreference from anaphoric relations.

(71) Wirklich kalt ist nur der, der den Krieg ebenso führen wie auch auf ihn verzichten kann.
Really cold is only that who the war just as lead just as from it refrain can.
‘The really cold party is the one which can wage war just as it can refrain from it.’

(72) Und die vor zwei Jahren vom Senat beauftragte Unternehmensberatung McKinsey begutachtete die Bremer Kulturszene und wirbelte damit viel Staub auf. Der hat sich zwar wieder gelegt.
And the ago two years by the senate hired business consultancy McKinsey surveyed the Bremen cultural scene and swirled with this much dust up. He has itself indeed again settled.
‘Management consultants McKinsey did a survey of the Bremen cultural scene, commissioned by the senate two years ago, which caused a great stir. It has calmed down since.’ [...]

Syntactic rules account for a substantial proportion of relations like pronoun binding (e.g., of reflexive, reciprocal, possessive and relative pronouns) and argument control (identity of arguments represented by traces, see Figure 2.11) (Büring, 2005). These rules are, however, not fully deterministic (cf. Büring (2005)):43 Example (73), taken from Hemforth et al. (2000), p. 266, is ambiguous as to the attachment point of the relative clause who came from Germany; both the teacher and the daughter of the teacher are possible points of attachment. Example (74) (taken from Sag and Pollard (1991), p. 85) is ambiguous with respect to the subject of the embedded clause to be allowed to attend the reception: syntactically, both Jim and Mary are plausible arguments.

(73) [The daughter of [the teacher]1]2 who1/2 came from Germany met John.44

(74) Jimj promised Marym [PROj/m to be allowed to attend the reception].

The need for manual disambiguation of such relations as binding and control (represented by links between nodes and traces) might be a motivation to treat them like coreference;

42 The immediate context of Example (71) is as follows: Die Linke, die heute den Krieg führt, tut dies nur halbherzig. Sie liebäugelt mit der Kälte, aber sie rechtfertigt sich mit der Wärme. ‘The left wing, which is conducting the war [in Kosovo, NB], is doing it half-heartedly. It is flirting with coldness, but justifying itself with warmth.’
43 Markup/traces were added to the examples.
44 In Hemforth et al. (2000)’s study, this and other examples were presented to participants (in German), who were asked to disambiguate them. In German, all relative clauses are comma-separated (in English, only attributive relative clauses are); a distinction between the two categories is not always trivial.


Figure 2.11: Syntactic Tree Containing Traces (OntoNotes 1.0 corpus, visualized in ANNIS2)

not all binding relations, however, are coreference relations.
Theory distinguishes attributive and restrictive relative clauses (cf. Fabb (1990)): In Example (75a), the relative clause gives additional information on a referent mentioned by the name Robert Wussler; the relative pronoun is (in the strict sense) coreferent with the named entity expression. In (75b), the relative clause restricts the set of persons to the subset of persons with a defective or early-damaged copy of a suppressor gene; strictly speaking, the relative pronoun is bound.45

(75) a. Comsat Video is headed by Robert Wussler, who resigned his No. 2 executive post with Turner Broadcasting System Inc. just two weeks ago to take the Comsat position.
     b. A person who is born with one defective copy of a suppressor gene, or in whom one copy is damaged early in life, is especially prone to cancer because he need only lose the other copy for a cancer to develop.

As discussed in Section 2.2.1, categorizing relative pronouns as mentions of referents is not desirable. This also applies to reflexive pronouns that the verb requires. Reflexive verbs include e.g. to enjoy oneself, to perjure oneself in English; sich konzentrieren - ‘to concentrate’, sich verlieben - ‘to fall in love’, sich erinnern - ‘to remember’, sich freuen - ‘to be delighted’ etc. in German.

2.3.4 Asserted Identity

Identity can be postulated explicitly. Kamp and Reyle (1993), rather in passing, distinguish stipulated from asserted coreference (Examples from Kamp and Reyle (1993), pp. 37 and 257, indices added; cf. also Büring’s (2005) ‘identity statements’ (p. 155)): in Example (76a), the recipient infers the identity between Jones and him, and between

45 Kamp and Reyle (1993) suggest that nouns with their respective restrictive relative clauses should be “treated as ‘complex nouns’” (p. 46), using only one variable for all restriction predicates. In that respect, all relative pronouns, whether bound or coreferent, are treated equally.


Ulysses and it (represented by co-indexing in the examples), respectively. This involves a process of “interpretative stipulation, to the effect that a certain anaphoric NP is being used as a means for picking up a certain element introduced into the discourse by independent means” (p. 260). In Example (76b), in contrast, the identity between Fred and the manager of Silver Griffin is asserted, i.e. stated explicitly. Kamp and Reyle interpret the constituent the manager of Silver Griffin as referring. They suggest representing asserted identity as ‘x is y’ (as opposed to ‘x = y’ for stipulated identity), though with the same truth conditions, arguing that the assertion’s “particular contribution [...] to the interpreter’s information” (Kamp and Reyle, 1993, p. 259) should be made explicit.46 Thus, x is y and x = y are semantically equivalent, but pragmatically different.

(76) a. Jones1 owns Ulysses2. It2 fascinates him1.

b. Fred1 is the manager of Silver Griffin=1|att:1.

However, Kamp and Reyle also note that “[n]ot all uses of the verb be express identity” (p. 260). They call this other use, which “attribut[es] properties to individuals[,] [...] predicational use” (Kamp and Reyle, 1993, p. 260), see their example in (77), with the predicate printed in italics.

(77) John is happy.

The construction in Example (76b) is also possible without the definite determiner in the predicate, see Example (78a)47 (and Example (78b) for additional evidence from the OntoNotes corpus).

(78) a. Fred is manager of Silver Griffin.
     b. Mr. Tucker, 44 years old, is president of Trivest Securities Corp.

While the interpretation as a referring expression seems plausible for a definite description like the manager of Silver Griffin, such an interpretation seems less natural if the determiner is omitted: the corresponding indefinite singular is not commonly used referringly (see Examples (79a) and (79b)). This holds for English and German.

(79) a. # Manager of Silver Griffin announced a new strategy yesterday.
     b. # Fred met manager of Silver Griffin/president of Trivest Securities Corp.

46 Cf. Frege’s (1892) argumentation that sentences of the form a=b often ‘valuably increase our insight’ (as compared to sentences of the form a=a, i.e. tautologies). Considering Washburn’s (1898) observations below, it seems that such sentences have definitional character, and are most adequate if at least one of the objects or concepts involved is already part of the hearer’s knowledge, providing him with an anchor point for storing further information. “What mental process is associated with the word ‘is’ or ‘are’? What is the state of consciousness when we declare that A is B? [She later gives the example ‘Horses are quadrupeds’, NB.] The fundamental process of mind involved in judgment would seem to be the process by which in a complex conscious state a certain element is fixed upon, analysed out, by the attention, and thus given a greater clearness in consciousness than it had before.” (Washburn, 1898, p. 526).

47 In contrast to Example (76b), Example (78a) lacks the connotation that Fred is the only manager of Silver Griffin.


I interpret this as an argument for an analysis of identity assertions as predicational rather than referential. A referential analysis of the predicates is arguable for purposes of information extraction, where all the information available in the text needs to be assigned to the referent it holds for. As noted above, there is no difference between a predicational and a referential analysis from the point of view of truth conditions; the difference lies only on the textual and pragmatic levels. An annotation of referentiality could be applied to exclude non-referring expressions as possible antecedents in systems for anaphora and coreference resolution.
A distinction between predicational and referential uses is not always obvious even for definite descriptions, consider the examples in (80).48

(80) a. The winner is Max.
     b. Max is the winner.

Example (80a) can be argued to presuppose two discourse referents, someone who won, and a person named Max (who happen to be identical). It cannot be understood as a predication. Example (80b), however, which is equivalent to (80a) in terms of truth conditions, can be used both as a statement asserting identity between two referents (the person who won, and the person named Max) and as a predication (ascribing the predicate winner to the person Max, while additionally implying there is only one such winner). This is relevant to the incremental processing of discourse: Kamp and Reyle (1993) note that in cases like (76b) and (80b), the first processing step would introduce two distinct variables, e.g. x and y. In the second step, on processing ‘x is y’, y could be replaced by x and dropped from the universe. The difference between the examples lies in their topic-comment structure (see Section 1.3).
Predications involving generics describe an asymmetric (directed) relation rather than a symmetric one, cf. Examples (81). Their logical representation is of the form ∀x(A(x) → B(x)), i.e. B follows from A; the predicates A and B are not commutable.49

(81) a. Whales are mammals.
     b. #Mammals are whales.
     c. The raven is a bird.
     d. #The bird is a raven.

Appositions, which can be interpreted as abbreviated relative clauses ‘x, (who/which is) Y, ...’, can also be grouped into the category of asserted identity (see Example (82)).

(82) Earlier this year, Cordis, a maker of medical devices, agreed to plead guilty to felony and misdemeanor charges [...].

Identity assertions can also be made using copular verbs like seem and become (see Example (83), repeated from (62b) above).

48 The minimal pair (a, b) is a made-up example.
49 In German, the predicative noun can be fronted, e.g. as a reply to a clarification question like What are whales?: SÄUgetiere sind sie. (engl. MAMmals are they, i.e. ‘Mammals is what they are.’). Repeating the word ‘whales’ in the answer, however, makes the sentence sound unnatural. Capital letters in the examples represent stressed syllables.


(83) [...] A few months later, Mr. Bush1 became Ronald Reagan’s running mateatt:1. Suddenly, George Bush the pro-choice advocate1 became George Bush the anti-abortionist1?. [...] In addition to supporting the landmark Roe vs. Wade Supreme Court decision legalizing abortion, Mr. Bush1 said he1 opposed the constitutional ban on abortion that Mr. Reagan was promising to promote. As Mr. Reagan’s running mate, though, Mr. Bush1? plunged headlong into the anti-abortion position [...].

It is not clear whether constructions with ‘as’ (As Mr. Reagan’s running mate) should be treated as identity assertions. They introduce a distinction between aspects of an entity that has previously been treated as a single entity (and might continue to be treated as such).

2.3.5 Identity and Time Dependence

Just as it is valid to use different descriptors for the same referent, the validity or appropriateness of a descriptor may change over time. Reconsider Examples (62) (Section 2.3.2). Kibble and van Deemter (2000) remark that disregarding time dependence may result in wrong conclusions on identity, which challenges the transitivity property of coreference. They argue that in Example (84a)50, from identity of referent between Henry Higgins and sales director of Sudsy Soaps, and between Henry Higgins and president of Dreamy Detergents, it would follow that identity of referent holds between sales director of Sudsy Soaps and president of Dreamy Detergents (which is not the case at any point in time). Similarly, the function value of the stock price changes over time; thus, this expression has different ‘referents’ (values) over time (again, these are not identical at any point in time).

(84) a. Henry Higgins1, who was formerly sales director of Sudsy Soaps1, became president of Dreamy Detergents1.

b. The stock price1 fell from $4.021 to $3.851?. Later that day, it fell to an even lower value, at $3.821?.

It should be noted, however, that identity of intension does not follow from identity of reference.
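The consequence for transitivity can be made concrete with a small computation. The sketch below is my own illustration, not Kibble and van Deemter's; the mention strings and links are invented stand-ins for an annotated version of (84a). Closing the two pairwise identity links transitively with a simple union-find yields a single chain, and thereby the unwanted identity between the two office descriptions.

mentions = ["Henry Higgins",
            "sales director of Sudsy Soaps",      # holds at an earlier time
            "president of Dreamy Detergents"]     # holds at a later time

# pairwise identity links as they might be annotated (each valid at some time)
links = [("Henry Higgins", "sales director of Sudsy Soaps"),
         ("Henry Higgins", "president of Dreamy Detergents")]

# transitive closure via a minimal union-find
parent = {m: m for m in mentions}

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

for a, b in links:
    parent[find(a)] = find(b)

chains = {}
for m in mentions:
    chains.setdefault(find(m), []).append(m)
print(list(chains.values()))
# -> one chain containing all three mentions, i.e. the closure also asserts
#    'sales director of Sudsy Soaps' = 'president of Dreamy Detergents',
#    which does not hold at any single point in time.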

2.3.6 Accommodation of Referents

Anaphoric expressions can refer back to sets of referents, to events, states or propositions. These may have been mentioned in forms other than a single discourse referent introduced by a nominal constituent; see the examples in (85). The hearer has to reorganize the representation of the information already given to get ready for ‘docking’ the new information.
As to the grouping of previously mentioned referents, Kamp and Reyle (1993, p. 307ff.) propose an operation called summation: a set variable is introduced which groups together the variables it contains, e.g. Z=u⊕v for Example (85a), where u and v are the variables John and Mary are mapped to (and analogously for Example (85b)). This set variable can later be set equal to the variable they is mapped to (see the DRS in Figure 2.12).

50 Both examples in (84) taken from Kibble and van Deemter (2000), p. 632f. Markup added.


In cases involving quantifiers (Example (85c)), they suggest an abstraction operation: a set variable is introduced that contains all the elements fulfilling the constraints specified in the first sentence. Again, this set variable can later be set equal to the variable they is mapped to, see Figure 2.2.

(85) a. John1 took Mary2 to Acapulco. They1⊕2 had a lousy time.

b. Last month John1 took Mary2 to Acapulco. Fred3 and Suzie4 were already there. The next morning they1⊕2⊕3⊕4 set off on their sailing trip.

c. Susan has found {every book/most books} which Bill needs1. They1 are on his desk.

[ x y z X Y | John(x), Mary(y), Acapulco(z), x took y to z, X = x⊕y, Y = X, Y had a lousy time ]

Figure 2.12: DRS for ‘John took Mary to Acapulco. They had a lousy time.’
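The summation and abstraction operations just described can be mimicked computationally if plural discourse referents are modelled as sets of individual referents: ⊕ then corresponds to set union, and abstraction to collecting all individuals that satisfy the relevant conditions. The following sketch is my own modelling, not Kamp and Reyle's notation, and all extensions are invented toy data.

u = frozenset({"john"})
v = frozenset({"mary"})

def summation(*referents):
    """Group discourse referents into one plural referent (Z = u ⊕ v ⊕ ...)."""
    out = frozenset()
    for r in referents:
        out = out | r
    return out

Z = summation(u, v)                          # for (85a): Z = u ⊕ v
they = Z                                     # 'they' is later set equal to Z
print(they == frozenset({"john", "mary"}))   # True

# abstraction for (85c): collect every x satisfying the constraints of the
# first sentence (here: books which Bill needs and which Susan has found)
universe = {"book1", "book2", "book3"}
needs    = {"book1", "book2"}                # invented extension of 'Bill needs x'
found    = {"book1", "book2"}                # invented extension of 'Susan found x'
They = frozenset(x for x in universe if x in needs and x in found)
print(They == {"book1", "book2"})            # True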

As to other complex antecedents like events and propositions (see Examples (86)) and their representation in DRT, see Asher’s example in Figure 2.13 (Asher, 1993, p. 91).

(86) a. Weatherford International Inc. said it canceled plans for a preferred-stock swap [...]. Weatherford said market conditions led to the cancellation of the planned exchange.
     b. Reducing those rates moderately [...] would still provide substantial assistance to borrowers. But it would also encourage lenders to choose more creditworthy customers [...].
     c. I saw what he did to them firsthand. It made my shoelaces dance with terror.
     d. In the neighbourhoods with the highest crime rates, small business generally relies on the public police force for protection. This creates several problems.

2.3.7 Bridging

Besides identity, a referent can be related to the context in many different ways. This causes referents to be implicitly given (from the speaker’s perspective) or accommodated (from the hearer’s perspective) (Clark (1975), also see Poesio and Vieira (1998) for an overview).51 Such relations include set relations (subset, see Example (87), superset,

51 All examples in this subsection are taken from Clark (1975) unless specified otherwise.


[ Y e Z z e′ | students(Y), e-go-camping(Y), e′-enjoy(Z, z), Z = Y, z = e ]

Figure 2.13: DRS for ‘The students went camping. They enjoyed it.’

member of the same set), the part-whole relation (88), and the entity-attribute relation (89), as well as typical role fillers (“Often the [g]iven information characterizes a role that something implicitly plays in an event or circumstance mentioned before” (Clark, 1975, p. 171), see Example (90)). Clark not only extends his definition to necessary parts, roles, etc. (Examples (88a), (89a), (90a)), but also to probable parts, roles etc. (Examples (88b), (89b), (90b)).

(87) I met two people1 yesterday. The woman2; 2⊆1 told me a story.

(88) a. I walked into the room1. The ceiling2; part−of 1 was very high.

b. I walked into the room1. The windows2; possiblepart−of 1 looked out to the bay.

(89) a. I ran two miles the other day1. It1 did me good.

b. I ran two miles the other day1. The whole stupid business2 att:1 bored me.

(90) a. John was murdered yesterday1. The murderer2; 2 role−in 1 got away.

b. John died yesterday1. The murderer2; 2 possible role−in 1 got away.

In summary, expressions α1 and α2 are considered coreferent in terms of DRT if and only if α1 has introduced a variable into the main DRS which the variable introduced by α2 can be linked to via the identity relation (‘=’, see definition (52)). Inferring whether or not identity between referring expressions holds is easier the more specific these expressions are. A general issue of coreference is that it is defined as a relation between variables at the discourse representation level but is annotated as a relation between expressions at the linguistic level. As discussed in Section 2.2.5, expressions on the linguistic level represent abstractions from the world. The more abstract the expression, the more difficult is the decision on the identity of referents. DRT is a means of representing coreference, but does not provide resolution strategies. I suggest interpreting the conditions specified in DRSs (initially without the coreference conditions) as a filter. Reconsider Example (91) (repeated from (4), but without prosodic information).

(91) John1 has an old cottage2. Last year, he1 reconstructed the shed2?.


Suppose we have resolved the pronoun he as referring to the same entity as the male name John on the basis that both expressions are masculine singular.52 Next, we have the predicates old cottage and shed, and it is up to the reader to decide whether they are used to describe the same object in the world. One factor that could give clues on coreference is the role the objects play in relations: the discourse contains a possession relation between John and the old cottage, and a reconstruction relation between John and the shed. In this case, it is just as likely that the object reconstructed by its owner is the object of possession as that it is a part of that object (namely, the shed, as opposed to the main building). Yet, if the reader considers it impossible that ⟦old cottage⟧w,t and ⟦shed⟧w,t intersect,53 he will infer two separate referents. He might be inclined to infer coreference between these expressions to the extent to which he considers the extensions of the two predicates likely to intersect. Thus, knowledge on the coincidence of properties might be another factor used to resolve coreference.
Referents of deictic and generic expressions can be accommodated at any point in the discourse. Coreferent or not, they can be interpreted independently. Creating relations between the respective entities is not necessary in terms of truth conditions.
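The filter view suggested here can be illustrated in a few lines: agreement conditions rule out antecedent candidates outright, while the compatibility of the predicates (cottage vs. shed) is left to a later, knowledge-based decision. The mention records and feature names below are invented for illustration.

mentions = [
    {"id": 1, "form": "John",           "gender": "masc", "number": "sg"},
    {"id": 2, "form": "an old cottage", "gender": "neut", "number": "sg"},
    {"id": 3, "form": "he",             "gender": "masc", "number": "sg"},
    {"id": 4, "form": "the shed",       "gender": "neut", "number": "sg"},
]

def candidate_antecedents(anaphor, context):
    """Keep only preceding mentions whose agreement features do not clash."""
    return [m for m in context
            if m["id"] < anaphor["id"]
            and m["gender"] == anaphor["gender"]
            and m["number"] == anaphor["number"]]

he = mentions[2]
print([m["form"] for m in candidate_antecedents(he, mentions)])
# -> ['John']; whether 'the shed' corefers with 'an old cottage' is not decided
#    by the filter, but by how likely the two predicates' extensions are to intersect.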

2.4 Coreference, Discourse-Givenness and Information Status

Riester (2009) observes that “in most [...] [annotation schemes], discourse givenness is understood as being equivalent to coreference” (p. 147). Following Riester (2009), for this work I define an expression α as discourse-given in a context c if it has a coreferent antecedent in that part of c to the left of itself. A formalisation is given in (92).

(92) definition ∃α1 (Ref(α1, c) = Ref(α2, c) & α1 < α2); α1 < α2 stands for ‘α1 occurs before α2’; optional additional condition: NP(α1), i.e. α1 is an NP

Annotating discourse-givenness does not involve specifying which expression is the antecedent. Thus, resolving ambiguities of referent such as that and it in Example (93) (repeated from (58)) is avoided. Discourse-givenness, as opposed to coreference, is a relation of an expression to its context, rather than a relation between two expressions.

(93) it turns out that the boxcar at Elmira has a bad wheel and they’re .. gonna start fixing that at midnight but it won’t be ready until 8
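Under the operationalization in (92), discourse-givenness labels can be derived mechanically from coreference annotation: a mention is given iff it is a non-first member of some coreference chain. The sketch below assumes an invented toy format in which chains are lists of mention indices in document order; it illustrates the definition, not any particular corpus format.

def discourse_given_labels(num_mentions, chains):
    """chains: iterable of lists of 0-based mention indices in document order."""
    labels = ["new"] * num_mentions
    for chain in chains:
        for mention in sorted(chain)[1:]:   # every non-first mention of a chain
            labels[mention] = "given"
    return labels

# toy document with five NPs; NPs 0, 2 and 4 corefer, NPs 1 and 3 are singletons
print(discourse_given_labels(5, [[0, 2, 4], [1], [3]]))
# -> ['new', 'new', 'given', 'new', 'given']

Note that, just as in the definition, no particular antecedent has to be identified for an individual mention: only membership in a chain with an earlier mention matters.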

In the operationalization used in this work, the concept of discourse-givenness is dependent on the definition of coreference, which differs across the resources used here. Categorizations subsuming discourse-givenness include Prince’s (1981; 1992) hierarchical scheme of familiarity, Gundel et al.’s (1993) givenness hierarchy, and Calhoun et al.’s (2005) and Götze et al.’s (2007) information status. Besides discourse-givenness, these schemes include categories for discourse referents related to the context via bridging (see

52 The fact that both expressions occur as subjects and agents in the respective sentences is further evidence for a coreference relation between them.
53 See Section 2.1.2 for the definition of the notation.


Section 2.3.7). These categories are termed inferable, mediated or accessible, respectively. They also take into account the extralinguistic context (the specific utterance situation; some of them also consider frames, i.e. prototypical situations). Figure 2.14 gives an overview of Prince’s categories.

Figure 2.14: Prince’s Hierarchical Scheme of Familiarity (taken from Prince 1981, p. 237).

Prince gives the following examples for her categories (Prince, 1981, p. 233):54

(94) a. I got on a busbrand−new unanchored yesterday and the drivernoncontaining inferrable was drunk.

b. A guy I work withbrand−new anchored says hetextually evoked knows your sister.

c. Noam Chomskyunused went to Penn.

d. Hey, one of these eggscontaining inferrable is broken!

e. Pardon, would yousituationally evoked have change of a quarter?

Under the term cognitive status, Gundel et al. (1993) distinguish the categories type identifiable, referential, uniquely identifiable, familiar, activated and in focus. This distinction is also called the givenness hierarchy: the strongest category is in focus; an expression in focus is also activated, familiar and so on. They give the following examples (p. 442f.):55

(95) a. I couldn’t sleep last night. {A/This} dog next doortype identifiable kept me awake.

b. I couldn’t sleep last night. The dog next dooruniquely identifiable kept me awake.

c. I couldn’t sleep last night. That dog next doorfamiliar kept me awake. [“the addressee already knows that the speaker’s neighbor has a dog”, p. 443]

d. I couldn’t sleep last night. Thatactivated kept me awake. [“if a dog has actually been barking during the speech event or if barking had been introduced in the immediate context”, p. 443]

54 Annotations added as specified in Prince (1981).
55 Annotations added as specified in Gundel et al. (1993).


e. My neighbor’s Bull Mastiff bit George. {It’s/That’s}in focus the same dog that bit Mary Ben.

Calhoun et al. (2005) distinguish three categories of information status: discourse referents which are neither previously introduced nor inferable by the hearer from the discourse are labeled new. Discourse referents generally known (such as the moon etc.), related to the context via possessives, aggregation or bridging (set- or part-whole relation, situatively, or roles in an event), bound pronouns and function values are labeled mediated. Generic pronouns, pronouns referring to the dialogue participants, and discourse referents previously mentioned are labeled old. The scheme is also hierarchical: each of the categories has subcategories, e.g. old/ident for a previously mentioned discourse referent.
Götze et al. (2007) suggest a similar distinction: the category new is for discourse referents neither mentioned nor inferable. The category accessible is for discourse referents that are situationally available (as part of the discourse situation), generally known (as part of the hearer’s world knowledge), aggregated or inferable (via part-whole, entity-attribute, or set relations). The category given is for discourse referents that have been explicitly mentioned in the previous discourse. This scheme is also organized hierarchically. See Figure 2.15 (adapted from Götze et al. (2007)) for an overview of the categories and their relation to other concepts.

discourse status:    discourse-old | discourse-new
hearer status:       hearer-old | hearer-new
information status:  given | accessible/mediated/inferable | new

Figure 2.15: Information Status, Discourse Status, and Hearer Status (taken from Götze et al. 2007, p. 151)

Commonly, the categories of information status are interpreted as a scale, i.e. if an expression is, in a certain context, given and accessible at the same time, the label discourse-given ‘overrules’. With all these categorizations, expressions related to the context by the entity-attribute relation (Example (89b)) are classified as given rather than as accessible/mediated/inferable, which would follow from Clark’s (1975) definition of bridging.56
To sum up this section, discourse-givenness and information status are categories which qualify the relation between a referring expression and its context: an expression is given if its referent has been mentioned in the context (if it is coreferent to any expression in the context). It is accessible (other terms for this category include mediated and inferable) if it is related to the context by set relations etc.; otherwise it is new.

56 Clark proposes a definition of givenness extending to other units of discourse (rather than just NPs), see his example in (1) (Clark, 1975, p. 173). So does Schwarzschild (1999), see his definition in (2). These definitions, however, are beyond the scope of this work.
(1) Alex went to a party last night. He’s going to get drunk again tonight.
(2) “An utterance U is GIVEN iff it has a salient antecedent A and A entails U, modulo ∃-type shifting.” (Type shifting “raises expressions to type t, by existentially binding unfilled arguments”; both citations originate from Schwarzschild (1999), p. 146.)
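The ‘given overrules accessible’ convention described above amounts to choosing the strongest applicable label on the scale given > accessible > new. The following is a minimal sketch of this precedence rule (my own illustration; the set-of-labels input format is invented):

PRECEDENCE = ["given", "accessible", "new"]   # strongest category first

def information_status(applicable):
    """applicable: set of category labels that hold for the expression in its context."""
    for label in PRECEDENCE:
        if label in applicable:
            return label
    return "new"

print(information_status({"accessible", "given"}))   # -> 'given'
print(information_status({"accessible"}))            # -> 'accessible'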


Concepts and definitions to remember from this section are shown in Table 2.3. They will be reused in the comparison of annotation schemes in Section 3.2.2.

non-referring: expressions that do not contribute to the sentence’s meaning in a fully compositional way (expletives, predications, collocational and idiomatic expressions)
antecedent: an expression preceding the current expression, with a coreference or bridging relation holding between these expressions
binding: a relation between expressions where the second expression is dependent on the first expression (the first expression being quantified or representing a generalization)
event anaphor: an expression referring back to an event previously mentioned using a VP (instead of an NP)
proposition anaphor: an expression referring back to a proposition (not an NP)
predication: a construction explicitly attributing a certain property to the subject NP, e.g. Max is an actor: actor(Max)
asserted identity: a construction stating equality between two objects (or groups of objects), e.g. Max is the winner.
function-value: relation between an expression stating an attribute (e.g. price) and an expression stating its value (e.g. $ 3.99)
apposition: a construction attributing a certain property to a referent, e.g. Max, the famous actor, won the Oscar.
kind-referring: property of an expression which, in its context, relates to a type of objects rather than a concrete set of these objects (e.g. Dogs in Dogs are my favourite pets as opposed to the concrete set of dogs in Dogs devastated my garden last night.)
abstract concept: immaterial goods; complement to concrete (sets of) objects
specificity/definiteness: property of expressions that refer to concrete (sets of) objects; the expression is used to single out this object (or these objects, respectively)
bridging: relation between expressions, where the expressions’ referents stand in a set relation (subset, superset, member of, etc.) or part-whole relation to each other

Table 2.3: Concepts and Definitions


Chapter 3

Corpora and Annotation Schemes

In this chapter, corpora annotated with discourse-givenness, coreference or related concepts are introduced. Their respective annotation schemes are compared to each other, drawing on the theoretical definitions presented and discussed in the previous chapter. These corpora form the basis of classifiers for discourse-givenness, which are presented in Chapters 4 (previous work) and 5 (my advancements and further experiments).

3.1 Corpora Annotated with Discourse-Givenness or Coreference

In the literature, diverse data sets have been used for training classifiers for discourse-givenness; they will be introduced in turn, characterized by their

1. availability and provider of annotations (publicly available/available on demand/private),

2. source texts (e.g. newspaper texts, dialogues),

3. language and mode (written language, transcribed speech)

4. relevant layers of annotation

5. size (in tokens and/or NPs where available, average discourse length)

6. usage in related work (cf. Chapter 4)

A first overview is given in Table 3.1.


name (year of origin)    language      type of text    docs    NPs        tokens     NPs/doc
MUC-7 (1997)             US English    newswire        53      9,963      30,002     188
Switchboard (2004)       US English    dialogue        147     64,000     (n.a.)     436
OntoNotes 1.0 (2006)     US English    newswire        597     129,781    370,789    217
ARRAU (2008)             US English    mixed           256     62,209     217,485    243
PCC (2004)               German        commentaries    220     4,979      43,652     23
TüBa-D/Z 6.0 (2010)      German        newswire        2,777   373,763    975,836    135
DIRNDL (2012)            German        radio news      55      13,489*    50,000     245

Table 3.1: Overview of Corpora Annotated with Coreference/Discourse-Givenness
Abbreviations: doc - document (news article or dialogue), US - as spoken in the United States
* Of these, approximately 10,000 are referring and thus labeled for information status. The corpus is under construction; the numbers are reported as of February 2013 (personal communication).

3.1.1 MUC-7

The MUC-7 corpus originates from the Message Understanding Conferences (MUC), a series of conferences in the late 1980s and 1990s which issued shared tasks related to information extraction. At MUC-6 and 7 (held in 1995 and 1997, respectively), the shared tasks included coreference resolution. The corpora used in this shared task are available from the Linguistic Data Consortium (LDC)1. I used the more recent MUC-7 corpus.
MUC-7’s primary data consists of New York Times newswire and covers selected topics in aviation and space missions. The language is American English, the mode written language. Distributed as unparsed text, it is manually annotated for coreference according to Hirschman and Chinchor (1997) and for named entity types according to Chinchor (1998). The corpus contains approx. 30,000 tokens in total. As shared task data, it is divided into the sets Training, Dryrun and Formaleval for training, dryrun and evaluation, comprising 3,058, 16,018, and 10,926 tokens, respectively. This corresponds to 998, 5,388 and 3,577 NPs after parsing the data with Charniak’s self-trained parser (McClosky et al., 2006).2
This corpus has been used for the classification of discourse-givenness by Ng and Cardie (2002) and Uryupina (2003; 2009), and for coreference resolution, for instance, by Soon et al. (2001). MUC-7’s successor in coreference resolution for information extraction applications is ACE (Automatic Content Extraction; Walker et al. (2006)).3 The anaphora resolution system BART (Beautiful Anaphora Resolution Toolkit, Versley et al. (2008b)) has been trained on MUC-6 data.

1 http://www.ldc.upenn.edu/, last access 27.03.2013.
2 Numbers of tokens and NPs vary across different studies due to tokenization, parsing, and inclusion/exclusion of meta information in article headers (author, news agency, title, short title etc.).
3 ACE is geared towards Information Extraction tasks, i.e. coreference is used as a means of collecting all the information on a certain entity available in a text. Its coreference definition is thus focussed on certain entity types, such as person, organization etc. (cf. Uryupina and Poesio (2012)), and extends to predication (this issue is addressed in Section 2.3.4).


3.1.2 Nissim’s Annotation of the Switchboard Corpus

The Switchboard corpus (Godfrey et al., 1992) is a corpus of spontaneous two-party telephone conversations. Nissim et al. (2004; 2006) describe the annotation of information status on this corpus. Whereas the original Switchboard corpus is available via the LDC, Nissim’s annotation is not publicly available, to my knowledge. The language of the source data is US English, the mode is transcribed speech. The original annotation of Switchboard comprises phonetic and orthographic transcriptions, parts of speech, and speech acts. According to Nissim et al. (2004), a third of the Switchboard corpus is part of the Penn Treebank (Marcus et al., 1993) and thus syntactically annotated according to Penn Treebank conventions. Nissim’s annotations extend over 147 dialogues (approx. 64,000 NPs).
This corpus has been used for the classification of discourse-givenness and information status by Nissim (2006) and Rahman and Ng (2011).

3.1.3 OntoNotes

OntoNotes (Hovy et al., 2006; Weischedel et al., 2007) is a corpus of American English, Chinese and Arabic annotated at multiple syntactic and semantic layers. It is available via the LDC. In this work, only the English portion of version 1.0 will be used; from now on, the term OntoNotes will be used to refer only to this portion unless specified otherwise. OntoNotes consists of 597 documents (370,789 tokens, 129,781 NPs) of newswire from the nonfinancial news portion of the Wall Street Journal (WSJ). The English portion of the most recent version 4.0 contains more newswire (now 600,000 tokens), as well as broadcast news (200,000 tokens), broadcast conversation (200,000 tokens) and web text (300,000 tokens).
OntoNotes builds on the Penn Treebank (Marcus et al., 1993) for syntax (including parts of speech) and on the Penn PropBank for predicate-argument structure. It contains annotations of named entity types, and word senses are disambiguated with reference to the WordNet ontology (Miller, 1995). It is annotated for coreference according to the OntoNotes coreference guidelines (2007, authorless). Markert et al. (2012) have added annotation of information status according to Nissim et al. (2004), reusing the existing coreference annotation. The CoNLL (Conference on Computational Natural Language Learning) 2011 shared task on coreference resolution and anaphoric mention detection makes use of the English portion of OntoNotes, version 4.0, as a task specification.
Markert et al. (2012) have built a classifier for information status based on OntoNotes. They used the Wall Street Journal section of OntoNotes with its coreference annotation and added annotation of mediated expressions according to Nissim et al. (2004). Uryupina and Poesio (2012) used OntoNotes, along with other corpora, for coreference resolution.

3.1.4 ARRAU

The Anaphora Resolution and Underspecification corpus (ARRAU, Poesio and Artstein (2008)) builds on several existing resources: Trains-91 (Gross et al., 1993), Trains-93 (Heeman and Allen, 1995), GNOME (Poesio, 2004a; Poesio, 2004c), the English Pear Stories corpus (Chafe, 1980), and the Penn Treebank (Marcus et al., 1993). It is available on


demand from the authors (provided that licenses to the source corpora are obtained); a release via LDC is planned. The genres it includes are dialogues (Trains-91, Trains-93), narrative text (GNOME, Pear Stories), and newswire (Wall Street Journal portion of the Penn Treebank). The Trains corpora consist of task-oriented dialogues. GNOME consists of pharmaceutical patient information leaflets, informative texts on museum objects, and tutorial dialogues from the Sherlock corpus (Lesgold et al., 1992).
The language is American English. Parts of ARRAU are transcribed speech (Trains-91, Trains-93, Pear Stories, parts of GNOME), other parts are written language (Penn Treebank, parts of GNOME).
ARRAU contains syntax annotation, re-using existing annotation where available and otherwise applying Charniak’s (2000) parser and manually correcting its output. Each of the resulting noun phrases is annotated with a set of features including number, gender, person, grammatical function, a feature combining animacy and a concrete/abstract distinction, referentiality, and, if applicable, a link to the expression (or expressions, respectively) it is related to anaphorically or via bridging. Unlike MUC and OntoNotes, the focus of this resource is ambiguity with respect to the referents of expressions, as well as reference to abstract entities (actions, events, plans).
ARRAU consists of 294 discourses, containing 217,485 tokens (62,383 NPs) in total. The corpus has been used for coreference resolution (Uryupina and Poesio, 2012). I am not aware of any work on discourse-givenness classification based on this corpus.

3.1.5 PCC

The Potsdam Commentary Corpus (PCC, Stede (2004)4) is a corpus of newspaper commentaries from the Märkische Allgemeine Zeitung, a German regional daily. The corpus is available on demand from the author. The language of the corpus is German, the mode is written language. Annotation includes the following layers: parts of speech according to the Stuttgart Tübingen Tagset (STTS (Schiller et al., 1999), automatically annotated with Brants’ (2000) Trigrams’n’Tags (TnT) tagger5), manually annotated syntactic structures according to the TIGER guidelines (Brants et al., 2002) (partially also with TIGER morphology), rhetorical structure according to Rhetorical Structure Theory (Mann and Thompson, 1988), as well as coreference according to the Potsdam Coreference Scheme (PoCoS, Krasavina and Chiarcos (2007))6 and, in parts, information structure (information status, topic, focus according to Götze et al. (2007)). The corpus consists of 43,652 tokens and 4,979 NPs.

3.1.6 TüBa-D/Z

TüBa-D/Z (short for Tübinger Baumbank des Deutschen/Zeitungskorpus, ‘Tübingen Treebank of German/Newswire Section’) is a German newswire corpus. License is granted

4 http://www.ling.uni-potsdam.de/pcc/pcc.html, last access 27.03.2013.
5 http://www.coli.uni-sb.de/~thorsten/tnt, last access 27.03.2013.
6 PoCoS consists of a core scheme (comparable to MUC, among others), and an extended scheme (comparable to GNOME). The main differences are (i) only constituents can receive annotations (not parts of constituents, as in MUC), and (ii) heuristics for resolving ambiguity and vagueness.


by Eberhard Karls Universität Tübingen (free academic license). In this work, version 6.0 of TüBa-D/Z will be used, which consists of nearly 1 million tokens (373,763 NPs, 2,777 articles) from the newspaper die tageszeitung (taz) and has been manually annotated with parts of speech (Stuttgart Tübingen Tagset, STTS, (Schiller et al., 1999)), topological fields, syntactic phrase structure trees, and coreference (for documentation, see Telljohann et al. (2006), Naumann (2006)). TüBa-D/Z has been used for automatic coreference resolution (Versley, 2006), but, to the best of my knowledge, not to classify discourse-givenness.

3.1.7 DIRNDL

DIRNDL (short for Discourse Information Radio News Database for Linguistic analysis, Eckart et al. (2012)) is a corpus of German broadcast news. It consists of transcribed speech. Annotation is ongoing, and licensing conditions are to be defined. The data is parsed with the XLE parser in order to obtain analyses according to Lexical Functional Grammar (Rohrer and Forst, 2006). A subset of around 10,000 referring expressions is manually annotated with information status according to Riester et al. (2007) (Riester et al. (2010) represents an update of the more detailed earlier German version of the annotation guidelines). Cahill and Riester (2012) have built a classifier for information status in German based on this corpus.

3.2 Comparison of Annotation Schemes

This comparison puts the different annotation schemes in relation to each other and gives an overview of how close they are to the theoretical definition given in Section 2.3. The annotation schemes differ with respect to their definitions of coreference (including preconditions and formalization) and the documentation of annotation quality. A juxtaposition of the annotation schemes (in tabular format) will identify where selective comparisons are possible. Comparability of annotation schemes will be of relevance when evaluating different models trained on different resources.
In this section, the schemes will first be compared (irrespective of the fact that resources annotated according to these schemes are available only in certain languages), see Sections 3.2.1 to 3.2.3. Then, decisions of scheme design and the advantages and limitations resulting from them will be discussed (Section 3.2.4).
The annotation schemes differ slightly in their definitions of terms and use of the concepts of (1.) referent, (2.) referentiality, (3.) coreference, and (4.) context, as well as (5.) the formalization of the annotation task and (6.) the evaluation of the resulting annotation. In particular, they give distinct answers to the following groups of questions:

1. Which linguistic entity can form a markable (i.e. an entity that could be annotated with a label and/or link)?

2. Are there preconditions for coreference (e.g. specificity)? Are the same preconditions applied to antecedent and to anaphor candidates? Which markables cannot enter coreference relations? Is this distinction made explicit?


3. How is coreference characterized and distinguished from, e.g., binding, bridging, etc.? Does the scheme further differentiate types of coreference? If so, on which basis, and are they explicitly labelled? Does the scheme require a linking of coreferent expressions?

4. What type of context (textual or situative) is considered in the annotation?

5. What are the annotation model’s structural properties? Does it include links? Are labels attached to nodes or edges (if existing)? What interpretation does the annotation allow for, and to what consequence?

6. How was each of the schemes evaluated? Are the evaluation results comparable? How useful are the annotated corpora with respect to replicability of the annotator’s decision and interpretability of the annotation?

3.2.1 Markables and Preconditions

Markables are those units of text that may receive annotation (consisting of a label and/or link). There are two ways of proceeding with the annotation:

• the annotator marks only those expressions that are relevant (i.e. involved in a link, either as antecedent or as anaphor), or

• the annotator is given a set of expressions and has to go through a decision process for each of these expressions (e.g. a decision tree as in Nissim et al. (2004)).

In the latter case, it is common practice to define that only instances of certain syntactic categories are eligible to form markables. This limits the number of candidate instances. It also ensures consistency, especially if the annotation is accomplished by multiple annotators. The aim is to reduce annotation time and cost, while ensuring a high quality standard.
In either case, preconditions may be defined, which are to be checked at the beginning of the annotation process. Markables not meeting these requirements are considered irrelevant and are discarded from further annotation, either because later decisions are not applicable, or because a full specification would substantially complicate the decision process.
The definition of markables and preconditions will later be of relevance to the automatic preprocessing of the data as input to the classification process.

3.2.1.1 Markables

There is consensus across all schemes that noun phrases form markables. There are divergences, however, in conceptions of what exactly constitutes a noun phrase, and whether other elements (such as adverbs, verbs or clauses) should also be admitted. Further controversial aspects include

(i) referentiality (i.e. an expression’s ability to introduce or take up again a discourse referent). In particular, this is the case with possessive, relative, reflexive/reciprocal pronouns, traces/zero forms, interrogative constituents, temporal, local and numeric expressions, mentions embedded in compounds or complex names,


present participles vs. gerunds, and verbs/clauses/paragraphs as antecedents of event/discourse anaphors.

(ii) markable boundaries. This applies to titles in Named Entities, genitive-s, and fusion of definite determiners with prepositions in German.

(iii) phrases that are discontinuous or have multiple heads. This occurs in cases of aggregation/summation or conjunctions.

This section is structured as follows: the controversial phenomena listed above will be discussed briefly. Where possible, I will give an estimate of how widespread each phenomenon is. Table 3.2 shows how the different annotation schemes handle these issues.7 Line numbers and note numbers in the following text refer to these tables.

Referentiality (line 1 in Table 3.2)
As a general rule, noun phrases (including pronouns) are considered potentially referring. However, in MUC-7 and OntoNotes, an expression only receives an annotation on the coreference level if it is involved in a linking relation (in OntoNotes, however, the phrases annotated on the syntactic level are available). In the other corpora, all noun phrases in a corpus form markables; it is left to the annotator to exclude the exceptions (e.g. idiomatic expressions, expressions bound by a quantifier). As to certain kinds of pronouns, there are arguments for and against including them in the set of markables.

Possessive Pronouns (line 2)
On the one hand, possessive pronouns (see Example (96a)8) can be used to replace a genitive NP (consider The instructor’s in (96b) as a replacement of Her in (96a)). Thus, they can definitely refer.

(96) a. Kubeck1 studied at the Berkeley Hall School in Bel-Air [...]. She1 began her career as a pilot instructor at small airfields and working at various commuter and freight airlines. Her1 career breakthrough came in 1989, when she1 crossed the picket lines at Eastern Airlines.

b. Kubeck1 studied at the Berkeley Hall School in Bel-Air [...]. She1 began her career as a pilot instructor at small airfields and working at various commuter and freight airlines. The instructor’s1 career breakthrough came in 1989, when she1 crossed the picket lines at Eastern Airlines.

On the other hand, attributive possessive pronouns do not have the status of an NP in most syntactic analyses, such as those of the Penn Treebank, TIGER and TüBa-D/Z. See e.g. Her in Figure 3.1 (from MUC-7) and his in Figure 3.2 (from OntoNotes); the latter figure also shows the analysis of genitive NPs, here his daughter’s. In OntoNotes, the proportion of possessive pronouns in NPs (NPs including possessive pronouns) is 2.29%.
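Proportions of this kind can be estimated directly from Penn-Treebank-style parses by counting NPs that immediately contain a possessive pronoun (tag PRP$). The sketch below shows how such a count could be computed, assuming bracketed parse trees and the NLTK library; the toy tree is invented, not taken from OntoNotes.

from nltk.tree import Tree

def possessive_pronoun_np_ratio(trees):
    nps = 0
    with_possessive = 0
    for t in trees:
        for node in t.subtrees(lambda s: s.label() == "NP"):
            nps += 1
            # PRP$ is the Penn Treebank tag for attributive possessive pronouns
            if any(isinstance(child, Tree) and child.label() == "PRP$"
                   for child in node):
                with_possessive += 1
    return with_possessive / nps if nps else 0.0

toy = Tree.fromstring(
    "(S (NP (PRP$ Her) (NN career)) (VP (VBD came) (PP (IN in) (NP (CD 1989)))))")
print(possessive_pronoun_np_ratio([toy]))   # -> 0.5 (one of the two NPs contains a PRP$)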

7 Entries yes mean the respective category may form a markable, no means it may not form a markable, impl. means the scheme gives information only implicitly, e.g. in examples. n.s. stands for ‘not specified in the respective scheme’, n.a. for ‘not applicable’. Numbers in brackets give page numbers in the original annotation schemes. Footnotes are used for explanations, special conditions, exceptions and examples. Line numbers (first column of tables) and superscript letters are used to organize these additional notes. Markup in examples may be reformatted for uniform appearance.
8 Example taken from MUC-7. Example (96b) adapted.


Figure 3.1: Syntactic Analysis of Possessives in MUC-7, visualised in ANNIS2

Figure 3.2: Syntactic Analysis of Possessives in OntoNotes 1.0, visualised in ANNIS2

Relative pronouns (line 3)
Relative pronouns function as linkers between a relative clause and the NP being modified or restricted. It is controversial whether relative pronouns form NPs in the syntactic analysis: they do in TüBa-D/Z, but form WHNPs9 instead in analyses according to the Penn Treebank (which is used in all English corpora presented here). One argument for including them in the set of markables is that relative pronouns in attributive relative clauses are (in the strict sense) coreferent with their antecedents. Also, they can have the

9 Wh-noun phrases (WHNPs) introduce clauses “with an NP gap. May be null (containing the 0 complementizer) or lexical, containing some wh-word, e.g. who, which book, whose daughter, none of which, or how many leopards.” (Source: http://bulba.sdsu.edu/jeanette/thesis/PennTags.html. Last access April 9th, 2013. Page numbering not available.) Clauses containing the relative pronoun that are analysed in the same way. The fact that relative pronouns form WHNPs instead of NPs needs to be taken into account during processing. The processing step would then have to distinguish relative clauses from question clauses.


status of an argument to the predicate of the relative clause. As arguments, they and their coreferent descriptions could be of interest, e.g. for the extraction of subcategorization frames of verbs. On the other hand, relative pronouns are syntactically dependent, i.e. they only ever occur inside a relative clause (and only the relative clause as a whole specifies the referent). What is more, the coreference relation is not adequate for modeling a linking relation such as (i) syntactic binding, i.e. a linking of the relative clause to the NP it attributes on or restricts (see discussion in Section 2.3.3), or (ii) having the same referential index, i.e. a linking of the relative pronoun to the constituent as a whole (see discussion in Section 2.2.1). In OntoNotes, 1.76% of all NPs have a relative clause with a relative pronoun (an additional 0.77% have a relative clause where the relative pronoun has been dropped). Note that whereas the distinction between attributive and restrictive relative clauses is marked overtly in English (attributive clauses are usually comma-separated), this does not hold for German. Also, relative pronouns cannot be dropped in German.

Reflexive and Reciprocal Pronouns (lines 4 and 5)
Reflexive and reciprocal pronouns have argument status as well. Some reflexive pronouns, however, are required by the verb (see discussion in Section 2.3.3), i.e. they do not represent a referent independently. OntoNotes contains 1.00% reflexive and 0.01% reciprocal pronouns (of all NPs).

Zero Forms/Traces (line 6)
According to some syntactic analyses (including the Penn Treebank analysis), traces are used to represent arguments that have moved (see discussion in Section 2.3.3). Zeros are used to represent arguments not realized in the utterance. I do not consider traces or zero forms referents. In OntoNotes, 4.65% of all NPs are traces (labeled *T*), and 5% are zeros (labelled *PRO*).

Interrogative Constituents (line 7)
Like the pronouns listed above, interrogative constituents can also have argument status; however, it is debatable whether they refer (see also Section 2.2.6). The proportion of interrogative NPs is 0.31% in OntoNotes.

Temporal, Locational and Numeric Expressions (lines 8 to 10)
Expressions referring to time and place can be realized as NPs, e.g. yesterday, next week, last year or home, Cambridge, Mass., editorial page. Alternatively, they can be realized as adverbial phrases (ADVPs), such as here, there, abroad, nearby, below, as well as now, then, currently, recently, early, late, so far, two years ago, once. Usually, ADVPs are not considered referring. According to the Penn Treebank scheme, however, adverbs like tomorrow and yesterday are part-of-speech tagged NN (common noun). Their German counterparts morgen and gestern are tagged ADV (adverb). Equivalent entities may therefore be assigned different categories depending on the tag set that is used. An estimation is hardly possible based on the given annotation. Numeric expressions, like quantified expressions, raise the question whether they refer specifically (see discussion in Section 2.2.5). In OntoNotes, 2.86% of all NPs have a cardinal number as a modifier.

Embedding of Expressions (line 11)
Referring expressions may embed other referring expressions (see discussion in Section 2.2.1). However, there is controversy as to whether they should be marked for coreference, and which elements (e.g. modifying NPs/PPs, premodifiers of compound


nouns; names as part of complex names) should be marked. In OntoNotes, 39.59% of all NPs are complex (29.04% of all NPs embed NPs; 10.55% embed PPs). 1.77% of coreferent expressions (expressions with a coreferent mention in the left or right context) embed other coreferent expressions. (Note, however, that these numbers do not include premodifiers or names embedded in other names, as they are not annotated in OntoNotes as a rule, see note 11b. in Table 3.2.)

Expressions Derived from Verbs; Clauses and Paragraphs (lines 12 to 15)
The potential to refer is usually attributed to nominal word forms (see Section 2.2.1). It is controversial whether present participles and gerunds refer. The same applies to verbs, clauses, and paragraphs10. They can describe more or less complex events, states or situations that can, as a whole, be referred to again. Quantifying the proportion of non-nominal, potentially referring elements on the basis of existing annotation is possible only for verbs. In OntoNotes, 6.55% of all first mentions are verbs (in 2.47% of all identity relations, the antecedent is a verb).

Markable Boundaries
There is some disagreement regarding which parts of a mention a markable should include.

Titles (line 16)
Titles before names do not form part of the NP to be annotated in MUC-7 (note 16a. in Table 3.2). In OntoNotes, however, they do: there, 0.25% of all NPs are names that have a common noun modifier.11

Possessive ’s or ’ (line 17)
Possessive NPs in English end in an ’s or ’ (the latter representing the plural form, or the singular form for words ending in s or x). According to some tokenization conventions (including those used in the Penn Treebank), these suffixes represent separate tokens. These tokens form part of the same NP (and thus form part of the markable if NP boundaries from syntax are re-used for coreference) in OntoNotes, but not in MUC-7. In OntoNotes, 2.25% of all NPs are possessive. In German, apostrophes are used only for forms ending in s or x. Tokenization usually does not separate such tokens; the ending forms a regular part of the token it is attached to.

Fusion (line 18)
German has fusion of prepositions and definite determiners. Fusion is not always optional, even in written language. The TüBa-D/Z annotation scheme defines a noun phrase consisting of the noun only, excluding the definiteness feature. This noun phrase forms a prepositional phrase together with the fused preposition-determiner complex (see Figure 3.3). 4.12% of NPs are preceded by a preposition-determiner complex.

Discontinuous or Multiple-head Phrases (lines 19 and 20)
The mapping between phrases and referents is not always 1:1. One referent may be represented by several (parts of) phrases (combined into so-called discontinuous phrases), and one phrase may refer to more than one (group of) referent(s). For discontinuous constituents, Chiarcos and Krasavina (2005) give the following examples:

(97) You’ll meet a man tomorrow carrying a heavy parcel

10Here, I use the term paragraph for a sequence of sentences or clauses; these need not necessarily form a typographic paragraph.
11Note that in OntoNotes, titles like Mr., Rep., Sen. as well as Senator etc. are tagged NNP as part of the named entity. These names, which include titles, amount to an estimated additional 1.67% of all NPs. This number is estimated using heuristics to search for the open class of titles.


Figure 3.3: Syntactic analysis of a PP with fusion of preposition and definite determiner in TüBa-D/Z (adapted from Telljohann et al. 2009, page 29); gloss of the example: 'So this is how one becomes a problematic case.'

(98) Bücher hat Anna drei.
     books  has Anna three
     'Anna has three books.' / 'As for books, Anna has three of them.' (split-NP)

As to multiple-head phrases, suppose one phrase is used to refer to more than one (group of) referent(s), e.g. the conjunction young men and women. Then it is possible to refer back to one (group of) referent(s) separately later on in the text, e.g. the young women. This group of referents has been mentioned before, but the expression is structured in a way that does not make it trivial to form a markable. In ARRAU, it is possible to create a markable from parts of expressions (see note 19b. in Table 3.2).

To sum up this section, each of these phenomena may seem marginal when considered in isolation. Most of the proportions range around 2%. In sum, however, they lead to considerable disagreement between the different resources (cf. Table 3.2).

Table 3.2: Markable Definitions (part I: English Corpora)

[The table compares MUC-7 (Hirschman and Chinchor, 1997), OntoNotes (Weischedel et al., 2007), Nissim (2003) and ARRAU with respect to candidate markables and preconditions (rows 0 and 1) and the phenomena discussed above (rows 2-20): possessive, relative, reflexive and reciprocal pronouns; zero forms/traces; interrogative constituents; temporal, locational and numeric expressions; embedded expressions; present participles, gerunds, verbs and clauses/paragraphs; titles in named entities; possessive 's; fusion; discontinuous markables; and conjunctions. The accompanying notes 0a.-20b. give sources, page references and examples for the individual cells.]

Table 3.2: Markable Definitions (part II: German Corpora)

[Analogous comparison for PCC (Krasavina and Chiarcos, 2007), TüBa-D/Z (Naumann, 2006) and DIRNDL (Riester et al. 2007, 2010); here the central precondition is definiteness. The accompanying notes give sources, page references and German examples for the individual cells.]

3.2.1.2 Preconditions

An explicit marking of relevant entities (entities meeting the preconditions) has two advantages:

(i) during the annotation process, the completeness of the annotation can be checked: every markable must receive an annotation; a markable left unannotated signals that it might have been overlooked.

(ii) using the corpus, one can distinguish expressions that are non-referring (or non-specific or indefinite, respectively) from expressions whose referents are mentioned just once (so-called 'singletons').

Across all annotation schemes, referentiality is commonly considered a necessary precondition for coreference. In MUC-7 and ARRAU, referentiality is the only precondition.12 Nissim (2003) additionally excludes adverbial, temporal, locational and directional NPs (see note 8b. in Table 3.2). OntoNotes and TüBa-D/Z require specificity/definiteness. They do not point to the literature for definitions of specificity/definiteness, but give the following definitions: in OntoNotes, specificity is defined depending on the surface form. Proper nouns, referring pronouns and demonstratives, NPs with a definite determiner, and indefinite specific NPs (e.g. a man I know) are considered specific. "Bare plurals [...] are always generic" (Authorless, 2007, p. 4). Annotators are instructed to annotate an expression if its referent is mentioned at least once using such a specific expression. Earlier (or later) mentions may be bare plurals, indefinite singulars, or verbs. In TüBa-D/Z, in contrast, only "definite descriptions, i.e. definite NPs, including complex (e.g. coordinated) noun phrases" (p. 3) are linked to their antecedents. Expletives are labeled as such. In PCC, non-referring expressions are labeled as such. In DIRNDL, this is also the case (with the exception of expletives, which are annotated EXPLETIVE). Definite expressions are differentiated from indefinite expressions (indefinites receive a label with the prefix INDEF).

Neither OntoNotes nor MUC nor TüBa-D/Z makes the annotator's decision on an expression's referentiality explicit (with the exception of expletives in TüBa-D/Z). Entities excluded from the annotation per definition of the scheme and singletons are equally left unannotated; the original reason why an NP was left unannotated cannot be reconstructed. In Nissim's scheme, in contrast, markables where the text material is unclear are labeled unclear, and non-referential markables are labeled not applicable. In ARRAU, non-referring expressions are also annotated. They receive the label non referring, along with their subtype: expletive, predicate, quantifier, or idiom. In the PCC coreference annotation, only nominal reference is annotated. Markables are organized in two groups: referential definite NPs constitute primary markables; antecedents that are not definite NPs constitute secondary markables. VPs do not form markables. Expletives and parts of idiomatic or collocational expressions are left unannotated. Parts of productive metaphors, however, may form markables.

12In reliability studies on ARRAU, temporal expressions have been excluded (Poesio and Artstein, 2005).


3.2.2 Coreference and Context

Coreference of expressions means that these expressions refer to the same thing, i.e. identity holds between their respective referents (for a more formal definition and discussion, see Section 2.3). Coreference, as well as discourse-givenness, depends on the notion of context, i.e. on which parts of this context are considered as antecedent candidates.

3.2.2.1 Discourse-Givenness and Coreference

An overview of the definitions of discourse-givenness, information status and coreference is given in Table 3.3 (see Table 2.3 for the underlying concepts and definitions).

In MUC-7, all forms of identity between nominal elements are annotated, including coreference between kinds or abstract concepts, binding, predications, asserted identity, appositions and function-value relations. The relation ident (short for identity) is used for all of these cases.

In OntoNotes, the ident relation is only used if the referent has been mentioned as a specific expression at least once (neither between abstract entities, nor between generic you or generic indefinite NPs). An antecedent may be a verb. For appositions, the appos relation is used.

According to Nissim's (2003) scheme, only specifically referring and kind-referring anaphoric NPs are linked to their antecedents. One should "not annotate coreference links for 'I' and 'you' and their forms" (Nissim, 2003, p. 3). Neither should links to VPs or predicative phrases be created. Expressions related to the context via bridging, binding or the function-value relation are annotated as mediated or func value, respectively, but not linked to their antecedents.

In ARRAU, coreference between specific expressions, kinds, abstract concepts, and events, as well as proposition anaphors, binding relations, and even bridging is annotated. The anaphor is labeled old. Expressions related to the context via bridging or not related at all are labeled new. Genericity is also annotated.

In TüBa-D/Z, several relations are distinguished: anaphoric for the relation of definite anaphoric pronouns to their antecedents, cataphoric between a cataphor and its first full mention, bound for the relation of a bound pronoun to its binder, and coreferential for coreference between NPs where one of these mentions is specific. The relations instance and split antecedent represent bridging relations.

PCC's coreference annotation only extends to nominal coreference, i.e. links are created only between elements that are noun phrases. Coreference is defined via the substitution test: for expressions to be coreferent, they need to be substitutable by one another.

DIRNDL's focus is on coreference between definite expressions. The annotation scheme, however, has been adapted to include indefinite expressions as well. Bridging relations are also annotated; referring expressions not related to the context are labeled new.

How the linking is realized in the different corpora will be discussed in Section 3.2.3.

Table 3.3: Definitions of Coreference (part I: English Corpora)

[The table summarizes, for MUC-7, OntoNotes, Nissim (2003) and ARRAU, whether non-referring expressions are explicitly marked up, and whether and with which relation or label the following are annotated: cataphora, binding, event and proposition anaphora, predication/asserted identity, function-value relations, apposition, coreference between kinds, between abstract concepts and between specific/definite NPs, bridging, and expressions with no relation to the context. The accompanying notes give sources, page references and examples for the individual cells.]

Table 3.3: Definitions of Coreference (part II: German Corpora)

[Analogous summary for PCC (Krasavina and Chiarcos, 2007), TüBa-D/Z (Naumann, 2006) and DIRNDL (Riester et al. 2007, 2010), covering the relations and labels used (e.g. anaphoric, cataphoric, bound, coreferential and instance/split antecedent in TüBa-D/Z; d-given, bridging and new/accessible/situative/indef in DIRNDL). The accompanying notes give sources, page references and German examples.]

3.2.2.2 Notions of Context

The annotation schemes take different kinds of context into account. As a result, some expression α in some text c may receive different labels (and/or links) according to different annotation schemes. In MUC-7 and ARRAU, an NP's antecedent is to be sought among all nominals, including premodifiers, in the (left) co-text13 of this NP. In ARRAU, besides nominals, propositions are considered. In OntoNotes, all noun phrases and verbs in the co-text are regarded; in PCC's coreference scheme, all noun phrases are considered. Nissim's annotation scheme defines relations from NPs to the co-text and consequences thereof (from propositions, events and aggregations through to frames). Additionally, it uses the non-textual context, i.e. the utterance situation, and world knowledge (e.g. 'the universe' and proper names referring to generally known entities). The TüBa-D/Z scheme allows for relations to nominals in the co-text, as well as some consequences (like instantiation ('instance') and aggregation ('split antecedent')), and the right co-text in the case of cataphors. Similarly, in DIRNDL and PCC's information status annotation, relations to noun phrases in the co-text are considered, including bridging relations (e.g. the subset relation etc.). Using only the co-text is the option that is easiest to operationalize; taking consequences into account requires extra criteria, e.g. WordNet or FrameNet relations (Nissim, 2003), or topic maps (Goecke et al., 2007), to ensure consistency.

3.2.3 Formalization and Evaluation of the Annotation

Trivial as it may seem, the interpretation and exploitation of a corpus depends on what is annotated and how it is annotated, as well as how consistent the annotation is, and how well it is documented.

3.2.3.1 Formalization

Annotation schemes differ as to whether they demand an explicit annotation of the relation itself, i.e. a linking of entities. The structural model of annotation has consequences for the interpretation of the annotated data, both on the content side and on the technical side.

Content-related aspects
An explicit linking to the antecedent or binding expression has a practical advantage: for a given entity, it allows the related entities (typically the antecedents) to be retrieved. As a consequence, the properties of these related entities are accessible, e.g. whether an entity is an NP (as opposed to a VP or S), a pronoun, definite, quantified, etc. This allows for an automatic assignment of labels, which facilitates the comparison to categorizations in other schemes. For instance, event anaphors are allowed in OntoNotes but not in MUC-7; excluding event anaphors could be one step towards better comparability between the two resources.

Technical aspects
Two different representations of coreference have been used in the corpora presented

13By co-text, I mean the text within the same document. (This term is introduced to distinguish textual and situative context.) All schemes take the whole document into account, rather than a window of text.


above:

(i) a coreference set: each referent14 has an ID, and every mention of a referent is annotated with this referent's ID;

(ii) a directed graph: each expression has an ID, and a coreferent expression is annotated with its antecedent's ID.15

In either case, the edge type can be specified. Coreference sets (option i) can be interpreted as representing equivalence classes, implying symmetry and transitivity. This model is adequate for strict coreference, but not for binding relations and 1:n relations (multiple antecedents). It is used in OntoNotes for each of the relations IDENT and APPOS. A directed graph (option ii, the so-called chain model) is used in MUC-7, Nissim (2003), ARRAU, TüBa-D/Z and PCC's coreference annotation.

In contrast to these schemes, PCC's information status annotation (Götze et al., 2007) demands 'flat' annotations, i.e. labels (attribute/value pairs), instead of pointing structures. For the purpose of this work, any annotation will automatically be turned into such 'flat' annotation labels during preprocessing. These labels specify (i) whether or not an entity has been previously mentioned (i.e. has a co-referring expression in its left context), and (ii), if applicable, in which relation it stands to this context (e.g. anaphor, instance, etc.).
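The conversion into flat labels can be illustrated with a minimal sketch, assuming a simple mention data structure with optional antecedent links; the structure, function and label names below are hypothetical and do not reproduce the actual preprocessing pipeline used in this work.

```python
# Minimal sketch of turning linked coreference annotation into 'flat'
# discourse-givenness labels. Mention structure and label names are
# illustrative assumptions, not the actual preprocessing code.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Mention:
    id: str
    start: int                           # token offset of the mention
    antecedent_id: Optional[str] = None  # ID of the linked antecedent, if any
    relation: Optional[str] = None       # e.g. 'anaphoric', 'instance'

def flatten(mentions: List[Mention]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Label each mention 'given' if it is linked to an antecedent in its
    left context, otherwise 'new'; keep the relation type where present."""
    by_id = {m.id: m for m in mentions}
    labels = {}
    for m in mentions:
        ante = by_id.get(m.antecedent_id) if m.antecedent_id else None
        if ante is not None and ante.start < m.start:
            labels[m.id] = ("given", m.relation or "coreferential")
        else:
            labels[m.id] = ("new", None)
    return labels
```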

3.2.3.2 Evaluation

High annotation quality and consistency is crucial for machine learning experiments. Evaluations and documentations of the annotation process help to estimate what performance can be expected from classifiers based on the respective data.

A common way of evaluating annotation guidelines is to let two or more annotators annotate the same texts independently and to assess the extent to which they agree. Various measures have been used for this purpose, from simple percent agreement (the proportion of instances whose labels are agreed on, out of the total number of instances to be labeled) and precision, recall and f-measure16 to more sophisticated measures like Cohen's (1960) κ17 (which accounts for agreement occurring by chance) and its adaptations. For an overview of agreement measures, see e.g. Artstein and Poesio (2005).

For MUC, Hirschman et al. report an inter-annotator agreement of "84% precision and recall" (Hirschman et al., 1998, p. 4), which corresponds to 84% f-measure. Inter-annotator agreement rose to 91% f-measure (on a small number of test documents) after the task had been broken down into a first step of identifying all markables in a text and a separate second step of linking coreferring elements.

14Or each referent mentioned more than once, respectively.
15In the case of cataphors or bound pronouns, this 'antecedent' could be found in the right context instead of the left context. In the case of multiple antecedents (e.g. multiple phrases in ARRAU, split antecedent in TüBa-D/Z), the expression is annotated with its antecedents' IDs.
16The same measures are used for evaluating classifiers. Definitions are given in Section 4.3.1, definitions 108, 109 and 110.
17κ is calculated as follows: κ = (p_o - p_c) / (1 - p_c), where p_o is the proportion of observed agreement and p_c the proportion of agreement by chance (i.e. as if the events were independent). The resulting value lies between 0 and 1. The closer κ is to 1, the more consistent the annotations are. According to Carletta (1996), κ > .8 is considered "good reliability"; 0.67 < κ < .8 "allowing [for] tentative conclusions" (Carletta, 1996, p. 252).
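For concreteness, percent agreement and Cohen's κ as given in footnote 17 can be computed as in the following sketch (a toy two-annotator implementation over nominal labels; no annotation-specific library is assumed). Krippendorff's α, mentioned below for ARRAU, generalizes this idea to more than two annotators and to missing values.

```python
from collections import Counter

def agreement(labels_a, labels_b):
    """Percent agreement and Cohen's kappa for two annotators' label sequences."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    # chance agreement: product of the two annotators' marginal label distributions
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_c = sum((freq_a[l] / n) * (freq_b[l] / n) for l in freq_a.keys() | freq_b.keys())
    kappa = (p_o - p_c) / (1 - p_c)
    return p_o, kappa

# Example: agreement(["old", "new", "old", "new"], ["old", "old", "old", "new"])
# returns (0.75, 0.5).
```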


Nissim (2006) reports κ-values of .902 for distinguishing old vs. med/new (based on 1,502 instances marked by two annotators).18 Hovy et al. report "average agreement scores between each annotator and the adjudicated results [of] 91.8%" (Hovy et al., 2006, p. 59) for OntoNotes.

Markert et al. (2012) report on the annotation of information status of 26 texts from the Wall Street Journal portion of OntoNotes. The annotation makes use of OntoNotes' original coreference annotation and is carried out by 3 annotators. Out of 5,905 NPs from the syntactic annotation, 1,499 were pre-marked as old using OntoNotes' coreference annotation, "leaving 4406 potential mentions for annotation and agreement measurement" (Markert et al., 2012, p. 797). A pairwise evaluation of annotators' agreement yields percentage values between 86.3% and 87.5% for a coarse-grained 4-category distinction (values for Cohen's κ between 0.747 and 0.773) and between 85.3% and 86.6% for a finer-grained 9-category distinction (κ between 0.773 and 0.801). For the category old only, they report κ values between 0.793 and 0.832.

For ARRAU, Poesio and Artstein (2005) and Poesio and Artstein (2008) report on evaluation experiments set up as follows: "multiple annotators (as many as 20) worked independently on the same text, and formal reliability measures such as α (Krippendorff, 1980) were used to compare the annotations and identify easy and difficult parts of the task; agreement on anaphoric chains was in the range of α ≈ 0.6-0.7" (Poesio and Artstein, 2008, p. 1171).

For TüBa-D/Z, Versley (2006) reports an f-measure of 83% (85% after mapping spans to nodes) for the full coreference task. In a detailed analysis, separating the referring expressions by their semantic class, he finds that agreement is higher on NPs referring to persons and organizations than on NPs referring to temporal entities, events, objects, and locations.

For PCC, Krasavina and Chiarcos (2007) report κ values of 0.61-0.77 for 19 texts annotated by 2 annotators using the core scheme of PoCoS.19 As to information status, Ritz et al. (2008) report κ values of 0.55 for the extended scheme (9 different labels plus non-referring) and 0.60 for the core scheme (3 different labels plus non-referring) between 2 annotators (220 NPs). The commentary texts were chosen for the purpose of discourse structure research. They express subjective views on events and topics introduced elsewhere in the newspaper and frequently refer to entities known to locals at the time of publication (politicians, buildings, etc.). Reconstructing such references thus requires detailed background knowledge.

For DIRNDL, Riester et al. (2010) report κ values of 0.78 for a six-category core scheme, and 0.66 for the entire scheme. These numbers are based on 1,149 DPs/PPs20 labeled by two annotators.

Obviously, these evaluations are not comparable: they use different measures, different

18For the distinction of old vs. mediated vs. new, results are reported as .845, and .788 for an even more fine-grained distinction. Note that instances tagged non-applicable (idioms, expletives, parsing errors etc.) and not understood (where the annotator did not fully understand the text) were excluded beforehand (Nissim, 2006).
19Their experiments with the English adaptation of the scheme yielded κ values of 0.71-0.96 on 8 texts of the RST Discourse Treebank (Wall Street Journal articles). Six annotators had annotated these texts with pair-wise overlapping portions.
20Riester uses the term determiner phrase (DP) for referring expressions; PPs are included due to fusion of preposition and definite determiner in German.


numbers of annotators, and different numbers of instances. Some of the evaluations are poorly reported, e.g. information on the number of annotators or on the number of instances annotated is missing. What can be read from these studies, however, is that there is a considerable portion of cases that human annotators disagree on.

A discussion of annotation quality should also take into account other factors, like reusability and comparability. The reuse of an existing corpus is facilitated if it is provided with additional layers of annotation (e.g. syntactic structures, genericity information, semantic properties, named entity types etc.). This ensures that the basic information (e.g. NP boundaries in the case of this work) is the same for all users, which makes comparisons between studies possible.

Beyond that, comparability between schemes is facilitated by fine-grained categories that (at least in part) correspond to each other. For instance, assume that scheme 1 distinguishes categories A1 and B1. Further assume that, for reasons of theoretical modelling, someone suggested that a subset S of A1 be grouped with B1 instead of A1. Then it would be advisable if the revised scheme 2 defined three categories D2, E2 and F2 with D2 = S, E2 = A1\S and F2 = B1, so that experiments according to the new theory can be carried out (with E2 replacing the old category A1 and D2∪F2 replacing the old category B1), while at the same time a categorization according to the old scheme (D2∪E2 for A1 and F2 for B1) remains available. On a meta level, efforts towards better comparability will hopefully lead to a distinction of essential vs. less essential, or uncontroversial vs. controversial, categories.
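The backward compatibility argued for in the example above can be made concrete with a small sketch; the label names A1, B1, D2, E2 and F2 are the hypothetical ones from the example and do not correspond to categories of any of the corpora discussed.

```python
# Hypothetical refined labels (scheme 2) mapped back to the original scheme 1
# (D2 and E2 together correspond to A1, F2 to B1) and regrouped according to
# the revised theory (E2 alone plays the role of A1; D2 and F2 together play
# the role of B1). Purely illustrative.
TO_SCHEME_1 = {"D2": "A1", "E2": "A1", "F2": "B1"}
TO_REVISED_GROUPING = {"D2": "B1", "E2": "A1", "F2": "B1"}

def regroup(labels, mapping):
    """Map a sequence of scheme-2 labels onto a coarser categorization."""
    return [mapping[label] for label in labels]

# regroup(["D2", "E2", "F2"], TO_SCHEME_1)         -> ["A1", "A1", "B1"]
# regroup(["D2", "E2", "F2"], TO_REVISED_GROUPING) -> ["B1", "A1", "B1"]
```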

3.2.4 A Critical Assessment of Existing Corpora

This section provides a summary of the strengths and limitations of existing resources. Criteria include corpus size, theory-foundedness of the notion of coreference/discourse-givenness, technical modeling, and consistency of annotation. The general picture is that fundamental improvements have been made.

Corpus Size
An overview of corpus sizes is given in Table 3.1 in Section 3.1. They range roughly from several thousand NPs (MUC, PCC and DIRNDL) to nearly 374,000 NPs (TüBa-D/Z). In general, classifiers benefit from larger amounts of data, such as those provided in OntoNotes or TüBa-D/Z. After all, the corpus has to be divided into subsets for training, development and testing, respectively. Ng (2011) suggests a 60-20-20 split, i.e. 60% of the data is used for training purposes, and 20% each for validation and the final testing (or, alternatively, 80-10-10). The effect of the amount of data used for training will be shown in Section 5.3.1.
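Such a split can be realized, for example, as in the following minimal sketch. The function name and the shuffling policy are illustrative assumptions; in practice, whole documents rather than individual NPs would typically be assigned to the subsets so that mentions from the same text do not end up in both training and test data.

```python
import random

def split_60_20_20(documents, seed=42):
    """Shuffle a list of documents and split it into
    60% training, 20% development and 20% test data."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n = len(docs)
    n_train, n_dev = int(0.6 * n), int(0.2 * n)
    return docs[:n_train], docs[n_train:n_train + n_dev], docs[n_train + n_dev:]
```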

Definition of Coreference
Cases that are uncontroversial across the different corpora are coreference between specific singulars and between specific definite plurals. However, there is a large number of subphenomena of coreference and of phenomena close to coreference (see Sections 3.2.1 and 3.2.2 for examples and proportions). In the optimal case, the annotation distinguishes these (sub-)phenomena, so that a user of the corpus can selectively access instances of each of them.


The definition of coreference in early resources deviated in parts from the strict definition presented above. Since then, the conception of markables and relations has been refined: issues in the MUC annotation scheme listed under 'further directions'21 and those criticized e.g. by van Deemter and Kibble (2000) and Kibble and van Deemter (2000) have been solved in succeeding resources. As to markables, for instance, OntoNotes and ARRAU allow markables that are not NPs; ARRAU additionally allows discontinuous markables. As to relations, the ident relation was used in MUC for coreference as well as for appositions; it also included function-value assertions and predicative constructions. In OntoNotes, separate relations (ident and appos) were used for coreference vs. apposition; function-value assertions and predicative constructions are not included. Finer-grained distinctions are made in ARRAU (labeling of generic expressions) and TüBa-D/Z (including the relations coreferential vs. anaphoric vs. bound). This shows that the coreference definition in corpora is converging towards the theoretical definition presented above. One phenomenon that remains hard to annotate is the entity-attribute relation. Issues that have not yet been solved satisfactorily in existing resources include time dependence and predication.

Advanced Technical Modelling
Technical solutions have been implemented that enable the annotation of discontinuous markables (e.g. in ARRAU and PCC), directed edges (e.g. anaphors vs. cataphors in TüBa-D/Z and PCC), and ambiguities (e.g. in ARRAU and DIRNDL22). Increasingly, synergy effects of multi-layer corpora are being used: constituents annotated at the syntactic layer form markables for information status annotation, e.g. in PCC. No syntactic annotation existed for the MUC corpora, and the annotation rule for constituents was lax: spans did not have to include determiners, for instance. The MUC evaluation scripts equally accepted NPs with or without determiners to accommodate the lax definition. The downside of a lax definition is less consistent annotation. In OntoNotes, there are rarely differences between NPs on the syntactic level and markables on the coreference level.23 However, as a consequence of reusing syntactic NPs as markables, definite NPs with a fusion of determiner and preposition in TüBa-D/Z do not include their definiteness feature (see Figure 3.3, Section 3.2.1 for details of the analysis), as it is attached to the preposition outside the NP.

Quality Assurance
Consistency of annotation is important for any corpus research, in particular for applications in machine learning. It is increasingly provided for: annotation schemes are tested

21Hirschman and Chinchor’s (1997) suggestions for further directions are: “1) coreference to cover clause (verbal) level relations 2) a method for handling discontinuous elements, including conjoined elements 3) a distinction between function/type coreference and instance coreference, which has caused some problems with the unintended merging of coreference chains 4) set/subset coreference, part/whole and other kinds of coreference” (p. 3).

22For technical reasons, one label is created per antecedent link. 23Verbal antecedents of event anaphors form an exception, of course.


for agreement among annotators. Studies on inter-annotator agreement are reported in more and more detail (see Section 3.2.3.2; ARRAU is a positive example here). Unfortunately, evaluation standards have evolved only recently, so earlier work cannot easily be compared.

As a side note, figures of inter-annotator agreement should never be emphasized over theory-foundedness: the downside of an orientation towards inter-annotator agreement is that more superficial definitions may seem superior judging just by the numbers. For instance, the notion of specificity in OntoNotes does not coincide with that in the literature, but can obviously be annotated more consistently and with higher agreement.

To sum up, each of the resources has implemented improvements over earlier resources. This has led to corpora with annotations that are coming closer to the theoretical notion of coreference presented above.


Chapter 4

Related Work and Technical Background

Several different sets of features and classification algorithms have been employed in previous approaches. This chapter will give an outline of the data and categories used, the variety of features and algorithms that have been proposed, the classification experiments that have been carried out, and the results reported for these experiments. In addition, this chapter contains technical background information on the procedure of experiment evaluation (Section 4.3.1).

4.1 Data and Categories

An overview of the corpus data and categories used in the different approaches is given in Table 4.1. Details on these corpora and categorizations have been described in Chapter 3. Features, methods and results will be presented in more detail in the following sections. The overview shows that a large part of the work in the field of discourse-givenness classification has been carried out on the MUC corpora or their successor ACE. A shift towards the classification of information status, rather than just discourse-givenness, can be observed in recent years.


author | category | corpus | agreement | method*
(Ng and Cardie, 2002) | anaphoric vs. non-anaphoric | MUC-6 and MUC-7 | 84% f-measure | J48, Ripper
(Uryupina, 2003) | ±discourse new | MUC-7 | 84% f-measure | Ripper
(Hempelmann et al., 2005) | given, inferrable, new | textbook texts | κ=.72 | logistic regression
(Nissim, 2006) | old, mediated, new | Switchboard | κ=.845 | J48
(Denis and Baldridge, 2007) | anaphoric vs. non-anaphoric | ACE | n.a. | ILP
(Ng, 2009) | anaphoric vs. non-anaphoric | ACE | n.a. | MaxEnt
(Uryupina, 2009) | ±discourse new | MUC-7 | 84% f-measure | SVM, Ripper
(Rahman and Ng, 2011) | old, mediated, new | Switchboard | κ=.845 | SVM
(Zhou and Kong, 2011) | anaphoric vs. non-anaphoric | ACE | n.a. | label propagation with polynomial kernel
(Cahill and Riester, 2012) | old, mediated, new, other1 | DIRNDL | κ2 | CRF
(Markert et al., 2012) | non-mention, old, mediated, new3 | OntoNotes | κ=.747 to .773 | J48, SVM, ICA

Table 4.1: Overview: Approaches to Identifying Information Structure

1. The more fine-grained versions of the label set are: given, situative, bridging, unused, new, generic, expletive; given-pronoun, given-reflexive, given-noun, situative, bridging, unused-known, unused-unknown, new, generic, expletive; given-pronoun, given-reflexive, given-epithet, given-repeated, given-short, situative, bridging, unused-known, unused-unknown, new, generic, expletive.
2. The evaluation was carried out on an earlier version of the scheme, which distinguishes 21 categories, resulting in κ=.66; κ=.78 for the 6 top-level categories given, situative, bridging, accessible, indefinite, other (Riester et al., 2010).
3. Non-mention stands for non-referring expressions. Sub-categories of mediated include knowledge, synt, aggregate, func, comp and bridging. κ for the fine-grained category set ranges from .773 to .801.
* Algorithms are discussed in Section 4.3. Abbreviations: J48 - WEKA's implementation of C4.5 decision trees (according to Witten and Frank, 2005; see discussion in Section 4.3), Ripper - repeated incremental pruning to produce error reduction, ILP - integer linear programming, SVM - support vector machines, CRF - conditional random field, ICA - iterative collective classification.


4.2 Features

The features proposed in different approaches make use of various levels of linguistic processing, such as the (tokenized) surface form, morphological and syntactic analysis, comparison to the previous context, salience ranking, classification of named entity types, and lexical or distributional information. Several approaches use features that are variants of previously suggested features, e.g. a feature combining several other features, or a different coding (e.g. several boolean features instead of one numeric or categorial feature). For this reason, the features will be introduced in groups according to what they describe.1

4.2.1 Properties of NPs

Regarded in isolation, an NP can be described in terms of its

1. surface form (the words it consists of),

2. length (the number of tokens or characters it contains),

3. spelling (capitalization of some or all characters, whether it contains digits or special characters),

4. morphological features (number, gender, person),

5. syntactic form and structure (pronominalization, existence and form of determiners, use of common or proper nouns, parts of speech contained, modification and complexity), as well as

6. its semantic class (e.g. NE type).

Some of the following approaches use features for coreference resolution which they do not use for the detection of anaphoric expressions. These will not be listed.

1. Surface Form
Nissim (2006)2 reports on running experiments including the NP string, hoping it would help with the classification of "general mediated instances (common knowledge entities), such as 'the sun', 'people', 'Mickey Mouse', and so on" (Nissim, 2006, p. 97). However, she observes a negative effect on the classifier's performance and excludes the feature from the model. Rahman and Ng (2011) and Markert et al. (2012) use all unigrams (i.e. words) appearing in any mention in the training set. Markert et al. (2012) additionally have a feature capturing whether the mention is modified by a comparative marker ('another', 'such', 'similar' etc.) from a list of 10 markers.

2. Length
Very long NPs tend to be first mentions. The length of an NP, measured in tokens, is used by Nissim (2006), Denis and Baldridge (2007), Uryupina (2009), Rahman and Ng (2011), Cahill and Riester (2012) and Markert et al. (2012). Cahill and Riester (2012) include a discretized version of the feature 'length': they have boolean features for phrases consisting of less than 2, less than 5 and less than 10 words.

1Groups are sequentially numbered; the numbering in Section 4.2.2 continues from Section 4.2.1.
2Nissim's (2006) feature set has been reused and extended by Rahman and Ng (2011) and Markert et al. (2012).


3. Spelling
In English, capital letters are contained only in names and abbreviations, not in common nouns (in German, common nouns are also capitalized). Names are specific mentions, although they do not need a definite determiner. Many first mentions of abbreviations contain the spelt-out version. Ng and Cardie (2002)3 have a boolean feature for NPs that are entirely in uppercase. Uryupina (2003; 2009) uses features capturing the proportion of upper- or lowercase letters, of digits and of special characters.

4. Morphological Features
Ng and Cardie (2002), Rahman and Ng (2011), Cahill and Riester (2012) and Markert et al. (2012) use number as a feature ('singular'/'plural'/'unknown'). Zhou and Kong (2011) use boolean features stating whether the NP is in the singular, and whether it is a male/female pronoun. Information on number, person and gender is used by Uryupina (2009).

5. Syntactic Form and Structure
An NP's pronominalization, the existence and form of determiners, the use of common or proper nouns, and information on parts of speech4, modification and complexity have been used in various codings and combinations. Ng and Cardie (2002), Ng (2004), Hempelmann et al. (2005), Denis and Baldridge (2007), Zhou and Kong (2011) and Cahill and Riester (2012) use a boolean feature representing whether or not the NP in question consists of a pronoun. Cahill and Riester (2012) additionally use the pronoun type (e.g. 'demonstrative'). Information on pronominalization is contained in an NP's parts of speech. Uryupina indirectly accesses this information using the parts of speech of the NP's head word (Uryupina, 2003), or the parts of speech of all words in the NP (Uryupina, 2009), respectively. Nissim (2006), Rahman and Ng (2011) and Markert et al. (2012) have a feature 'NP type' which combines several pieces of information and can take any of the values 'pronoun', 'common', 'proper', or 'other'.

Regarding determiners, Ng and Cardie (2002), Ng (2004), Hempelmann et al. (2005) and Zhou and Kong (2011) use a boolean feature coding whether or not an NP has a definite determiner. Ng and Cardie (2002) and Zhou and Kong (2011) add a feature for demonstrative NPs. Uryupina (2003; 2009) as well as Nissim (2006), Rahman and Ng (2011) and Markert et al. (2012) have a compound feature specifying the determiner type (in the latter three works, the feature's value set contains 'definite', 'demonstrative', 'indefinite', 'bare', 'possessive', and 'not applicable'). Ng and Cardie (2002) have boolean features for indefinite NPs (starting with the determiner 'a' or 'an') and quantified NPs (starting with 'every', 'some', 'all', 'most', 'many', 'much', 'few' or 'none'), as well as a combined categorial feature 'article' (definite/indefinite/quantified). Other features cover whether the NP is a proper noun, possessive, bare singular or bare plural, whether it has a prenominal modifier, premodifier or postmodifier, and whether it contains special nouns (a comparative noun, or a noun premodified by a superlative). They also use patterns of parts of speech (e.g. 'the PN N', etc.).

Cahill and Riester (2012) use the determiner type (e.g. 'definite'), and the type of the head noun (e.g. 'common'), where applicable. Boolean features are used to represent whether the phrase contains a compound noun and whether it contains a time expression. The parts of speech of the leftmost and rightmost token, respectively, are

3Ng and Cardie’s (2002) feature set has been reused in Ng (2009). 4Pronominalisation and the presence or absence of determiners is of course evident from the list of an NP’s parts of speech.


also recorded. Features describing an NP's modifiers include Uryupina's "heuristics for restrictive postmodification" (Uryupina, 2003, p. 83) and her (2009) (not further specified) features covering "pre- and postmodification". Markert et al. (2012) use a feature representing the presence of adjectives, and one representing the presence of adverbs or adverbs combined with a comparative (see number (1.) above). Zhou and Kong (2011) use a feature capturing whether the NP embeds another NP. Ng and Cardie (2002) and Cahill and Riester (2012) use a boolean feature that is true if the NP is a conjunction. As a measure of the NP's complexity, Cahill and Riester (2012) record whether or not the respective phrase contains more than 1 DP5 and 1 NP and count the number of DPs and NPs contained in the respective NP, the number of top category children, as well as the depth of the syntactic phrase (with and without unary branching, respectively). The syntactic shape is described in another feature, e.g. 'apposition with a determiner and attributive modifier'. Also included are features counting cardinal numbers and year phrases. A boolean feature is used to flag phrases that do not have complete parses.

6. Semantic Class
Ng (2004) and Uryupina (2009) make use of the semantic class of the NP, i.e. the named entity type (e.g. person, organization, location, date, time, etc.). Markert et al. (2012) also use the semantic class (one of 12 classes like location, organization, person, date, money, percent etc.), which is derived from the OntoNotes entity type annotation and an "automatic assignment of semantic class via WordNet hypernyms for common nouns" (Markert et al., 2012, p. 800). Ng and Cardie (2002) use titles and positions. Cahill and Riester (2012) use the adverbial type (e.g. locative), and also the number of titles ("# Labels/titles" as one of the countable features, p. 234).
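Several of the NP-internal features listed above (e.g. length, spelling, pronominalization and determiner type) can be computed directly from an NP's tokens and part-of-speech tags. The following minimal sketch illustrates the general idea using Penn Treebank tags; the feature names and value sets are simplified assumptions and do not reproduce any of the cited feature sets exactly.

```python
def np_features(tokens, pos_tags):
    """Toy extraction of NP-internal features from tokens and PTB POS tags."""
    feats = {
        "np_string": " ".join(tokens),                       # group 1: surface form
        "length_tokens": len(tokens),                        # group 2: length
        "contains_digit": any(c.isdigit() for t in tokens for c in t),  # group 3
        "all_uppercase": all(t.isupper() for t in tokens if t.isalpha()),
        "is_pronoun": len(tokens) == 1 and pos_tags[0] in {"PRP", "PRP$", "WP"},
        "has_definite_det": tokens[0].lower() == "the",      # group 5: determiners
        "has_indefinite_det": tokens[0].lower() in {"a", "an"},
        "contains_proper_noun": any(p in {"NNP", "NNPS"} for p in pos_tags),
    }
    return feats

# Example: np_features(["the", "ocean", "drilling", "company"],
#                      ["DT", "NN", "NN", "NN"])
```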

4.2.2 NPs’ Relations to their Context
An NP’s relation to its context6 (sentence, discourse, and corpus) can be characterized by

7. its grammatical function (or, as a heuristic, position within the sentence),

8. words (or parts of speech of the words) occurring in its local context (i.e. the sentence or a window of fixed size),

9. its salience,

10. its occurrence in certain constructions (predications, appositions, modal constructions, or under negation),

11. agreement, identity with (or similarity to) other mentions in the discourse, as well as the distance to identical/similar mentions,

12. the probability of its occurring as a definite description.

7. Grammatical Function and Position
The feature ‘grammatical function’ is used by Nissim (2006), Uryupina (2009), Rahman

5Riester uses the term Determiner Phrase (DP) for referring expressions. 6Item numbers follow up the numbering in Section 4.2.1.


and Ng (2011), Cahill and Riester (2012) and Markert et al. (2012); Nissim specifies the value set as ‘subject’, ‘passive subject’, ‘object’, ‘pp’, ‘other’. Ng and Cardie (2002) use the NP’s position (‘first sentence’, ‘first paragraph’, ‘header’). Zhou and Kong (2011) use the NP’s semantic role: whether the NP has a semantic role, whether it is ‘Arg0’ (agent), and whether it is ‘Arg0’ (agent) of the main predicate of the sentence. Additionally, they use its position in the sentence: they mark NPs that are the first NP in the sentence.
8. Local Context
Zhou and Kong (2011) take into account whether the NP is embedded in another NP. They also count ‘Forward and Backward Distance’, i.e. “the distance between the current NP and the nearest [forward/] backward clause, indicated by coordinating words (e.g. that, which)” (p. 38). Rahman and Ng (2011) and Markert et al. (2012) use partial parse trees (the respective node’s parent and sibling nodes without the lexical leaves). Cahill and Riester (2012) use the label of the highest syntactic node that dominates the phrase.
9. Salience
Uryupina (2009) tests various salience rankings (without giving further details), later observing that they “show virtually no performance gain over the baseline” (Uryupina, 2009, p. 117).
10. Sentence Construction
Ng and Cardie (2002), Uryupina (2003), Ng (2004) and Zhou and Kong (2011) use a feature stating whether or not the NP is used as an apposition. Ng and Cardie (2002) also have a boolean feature for predicate nominal constructions. Uryupina, in her later work (2009), adds features for “copula, negation, [and] modal constructions” (p. 117). Cahill and Riester (2012) use one feature indicating predications and one indicating appositions.
11. Agreement, Identity and Similarity
Number and gender agreement is used by Ng (2004). Ng and Cardie (2002) use the boolean features ‘string match’ (after discarding determiners), ‘head match’, ‘alias’ and ‘subclass’. Uryupina (2003) computes the distance to the previous NP with the same head (if existent), measured in NPs and in sentences. In her (2009) classification, she additionally uses the (self-explanatory) boolean feature ‘same head exists’. Ng (2004) has four identity features. There are three boolean features, one each for pronouns, proper names, and non-pronominal NPs, capturing whether (after the deletion of determiners and demonstrative pronouns) there is a matching NP in the left context of the NP in question. Additionally, there is a numeric value measuring the distance between markables in sentences. Nissim (2006) has three features of group (11): a numeric feature ‘full previous mention’ counting the number of identical NPs in the left context; its categorial version ‘mention time’ (with the values ‘first’, ‘second’, or ‘more’); and ‘partial previous mention’ (with the values ‘yes’, ‘no’, ‘not applicable’). The latter is explained only by example (“for example, ‘your children’ would be considered a partial previous mention of ‘my children’ or ‘your four children’. The value [...] ‘non-applicable’ [...] is mainly used for pronouns” (Nissim, 2006, p. 97)). Denis and Baldridge (2007) use a feature capturing whether there is a previous mention with a matching string, as well as the number of preceding mentions. Rahman and Ng (2011) and Markert et al. (2012) reuse Nissim’s (2006) features; Markert et al. (2012), however, change the numeric feature ‘full previous mention’ into a categorical one (‘yes’, ‘no’, ‘NA’) and add features


that count the number of times an NP has been partially mentioned previously, as well as the number of times an NP’s content words have been mentioned previously. Zhou and Kong (2011) have one boolean feature indicating whether there exists an NP consisting of the same string. Cahill and Riester (2012) have a boolean feature stating whether the head noun appears (partly or completely) in the previous 10 sentences. Additionally, they use GermaNet7 (Hamp and Feldweg, 1997) to calculate (i) the distance of the head noun’s synset to the root node (assuming that a term that is more general is more likely to be used generically), and (ii) the sum and the maximum of semantic relatedness (Lin, 1998)8 of the head noun to its immediately preceding neighbour phrases (using GermaNet Pathfinder (Finthammer and Cramer, 2008) for calculations). Zhou and Kong (2011) have a boolean feature ‘NameAlias’ capturing whether the NP and any of the NPs in the context are name aliases or abbreviations of each other. They also use a feature stating whether there is an NP in the context which agrees with the current NP in word sense (annotations of WordNet senses are used for this). Hempelmann et al. (2005) use Latent Semantic Analysis (LSA) and ‘span’, a variant of LSA, to measure the similarity between a text item (probably an NP) and its context. LSA is based on vectors representing word cooccurrences, drawn from a large text corpus. The similarity is computed as the cosine of the angle between two vectors, each representing a piece of text (the NP or the context, respectively).
12. Definiteness Probability
Uryupina (2003) includes “definite probability features”, four features measuring the proportion of an NP’s occurrences as a definite description (compared to its occurrences as an indefinite description, i.e. as a bare noun or with the determiner a or an). These features are based on web counts and calculated as shown in (99).
(99) p_1 = #"the Y" / #Y;  p_2 = #"the H" / #H;  p_3 = #"the Y" / #"a Y";  p_4 = #"the H" / #"a H", where Y stands for the full NP (without the determiner) and H for its head noun.
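To make the calculation in (99) concrete, the following minimal sketch derives the four definite probability features from raw counts. The web_count function is a hypothetical stand-in for the web counts used in Uryupina's original setup; any sufficiently large corpus or n-gram resource could back it.

def web_count(query):
    """Hypothetical frequency lookup (a web hit count in Uryupina's original setup)."""
    raise NotImplementedError("plug in a corpus or web-count backend here")

def definite_probabilities(full_np, head_noun):
    """The four definite probability features from (99); Y is the NP without its
    determiner, H its head noun. Returns None where a denominator count is zero."""
    def ratio(a, b):
        return a / b if b else None
    y, h = full_np, head_noun
    return (ratio(web_count(f'"the {y}"'), web_count(y)),           # p1
            ratio(web_count(f'"the {h}"'), web_count(h)),           # p2
            ratio(web_count(f'"the {y}"'), web_count(f'"a {y}"')),  # p3
            ratio(web_count(f'"the {h}"'), web_count(f'"a {h}"')))  # p4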

4.3 Algorithms

Various classification algorithms have been employed; they are briefly characterized in the following. More details can be found, for instance, in Witten and Frank (2005), Kotsiantis (2007), and Ng (2011), among others. Basically, the learning methods can be divided into supervised and semi-supervised methods. Within the group of supervised methods, symbolic methods (decision tree learners and rule learners), functions and sequential learners can be distinguished. Symbolic methods produce models that represent the conditions for an instance to belong to a certain class as a logical combination of the input features. The resulting models are commonly considered intuitively interpretable by humans, like decision trees or sets of rules.
7GermaNet, the German version of WordNet, is a lexical resource in the form of a net. Nodes represent word senses, edges represent relations, like antonymy, hyperonymy/hyponymy etc.
8Lin distinguishes semantic relatedness from semantic similarity. The former is calculated based on paths in an ontological resource, the latter is calculated based on word distributions in texts.


Decision trees are hierarchical models, where leaves represent class values, inner nodes represent features, and edges represent values of these features. Figure 4.1 illustrates what a decision tree for discourse-givenness could look like.

head previously mentioned>0
| determiner=definite:given
| determiner!=definite:new
head previously mentioned<=0
| form=pronoun:given
| form!=pronoun
| | determiner=none
| | | position=initial:given
| | | position!=initial:new
| | determiner=indef:new
...

Figure 4.1: Example Decision Tree for Discourse-Givenness.

The tree is read as follows (‘|’ stands for a branching): If the head of the respective NP has been mentioned before (more than 0 times) and it has a definite determiner, it is classified as given, otherwise it is classified as new. If the head has not been mentioned, and if it is a pronoun, it is classified as given; NPs other than pronouns, if they do not have a determiner and occur in initial position, are classified as given, in non-initial position as new; if they have an indefinite determiner, they are classified as new, etc. The method of constructing decision trees is described by the following piece of pseudo-code taken from Kotsiantis (2007, p. 252):9

"Check for base cases [technical requirement due to recursion, N.B.] For each attribute a Find the feature that best divides the training data [using a measure] such as information gain [the reference provided is Hunt et al. (1966)] from splitting on a Let a best be the attribute with the highest normalized information gain Create a decision node node that splits on a best Recurse on the sub-lists obtained by splitting on a best and add those nodes as children of node"

Figure 4.2: Pseudo-code for Decision Tree Construction (Kotsiantis, 2007, p. 252).
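As an illustration of how such a tree is induced from feature vectors, the following minimal sketch trains a decision tree on a few invented NP instances. Note that scikit-learn's DecisionTreeClassifier implements CART rather than the C4.5/J48 learner used in the works cited here, and that the feature values and labels are made up for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented training instances: one feature dict per NP, plus a class label.
nps = [
    {"head_prev_mentioned": 1, "determiner": "definite", "form": "common"},
    {"head_prev_mentioned": 0, "determiner": "indef",    "form": "common"},
    {"head_prev_mentioned": 0, "determiner": "none",     "form": "pronoun"},
    {"head_prev_mentioned": 0, "determiner": "none",     "form": "proper"},
]
labels = ["given", "new", "given", "new"]

vec = DictVectorizer(sparse=False)          # one-hot encodes the categorical values
X = vec.fit_transform(nps)
tree = DecisionTreeClassifier().fit(X, labels)
print(export_text(tree, feature_names=list(vec.get_feature_names_out())))

The printed tree can then be read in the same way as the example in Figure 4.1.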

Algorithms for learning decision trees include C4.5 (Quinlan, 1993). According to Witten and Frank (2005), J48 in their machine learning toolkit WEKA10 represents a reimplementation of C4.5. Moore et al. (2009), however, in a study on the performance of the decision tree algorithms J48, C4.5 and C5.0 (“an updated commercial version of C4.5”,

9Kotsiantis uses the term attribute in the sense that feature has been used here, and feature for feature values (or ranges thereof).
10Waikato Environment for Knowledge Analysis (Witten and Frank, 2005), http://www.cs.waikato.ac.nz/~ml/weka/index.html


p. 185) on a range of different data sets, come to the conclusion that J48 “performs much more similarly to C5.0 than to C4.5” (p. 186). Ng and Cardie (2002) report they used C4.5 in their experiments; Nissim (2006) used J48. Markert et al. (2012) also use J48 in comparison to ICA; the results for the class old are only slightly better using ICA, but they are substantially better for the classes mediated and new. A decision tree can be translated into a set of rules (Quinlan, 1993; Kotsiantis, 2007). The set of rules corresponding to the excerpt of the decision tree in Figure 4.1 is shown in Figure 4.3.

(head previously mentioned>0) and (determiner=definite) => class=given
(head previously mentioned>0) and (determiner!=definite) => class=new
(head previously mentioned<=0) and (form=pronoun) => class=given
(head previously mentioned<=0) and (form!=pronoun) and (determiner=none) and (position=initial) => class=given
(head previously mentioned<=0) and (form!=pronoun) and (determiner=none) and (position!=initial) => class=new
(head previously mentioned<=0) and (form!=pronoun) and (determiner=indef) => class=new
...

Figure 4.3: Example Rules for Discourse-Givenness.

The method for constructing a set of rules from a data set is described by Kotsiantis (2007) with the following piece of pseudo-code:

"On presentation of training examples [...]: 1. Initialise rule set to a default (usually empty, or a rule assigning all objects to the most common class). 2. Initialise examples to either all available examples or all examples not correctly handled by rule set. 3. Repeat (a) Find best, the best rule with respect to examples. (b) If such a rule can be found i. Add best to rule set. ii. Set examples to all examples not handled correctly by rule set. until no rule best can be found (for instance, because no examples remain)."

Figure 4.4: Pseudo-code for Rule Learners (Kotsiantis, 2007, p. 253).
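The following toy sketch mirrors the covering loop above: it repeatedly picks the single feature=value test with the highest precision (and coverage) on the remaining examples and removes the examples that the new rule handles correctly. It only illustrates the control flow under these simplifying assumptions; Ripper's pruning and rule optimization are not modelled.

from collections import Counter

def learn_rules(examples, max_rules=10):
    """examples: list of (feature_dict, label); returns an ordered list of
    ((feature, value), predicted_label) rules, most useful rule first."""
    rules, remaining = [], list(examples)
    while remaining and len(rules) < max_rules:
        best = None                                   # (precision, coverage, condition, label)
        conditions = {(f, v) for feats, _ in remaining for f, v in feats.items()}
        for cond in conditions:
            covered = [lab for feats, lab in remaining if feats.get(cond[0]) == cond[1]]
            label, hits = Counter(covered).most_common(1)[0]
            candidate = (hits / len(covered), len(covered), cond, label)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
        if best is None:                              # no condition left to test
            break
        rules.append((best[2], best[3]))
        # step 3(b)ii: keep only examples the rule set does not yet handle correctly
        remaining = [(feats, lab) for feats, lab in remaining
                     if not (feats.get(best[2][0]) == best[2][1] and lab == best[3])]
    return rules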

Algorithms for learning rules include Ripper (‘Repeated Incremental Pruning to Produce Error Reduction’ (Cohen, 1995); JRip in WEKA). Ripper has been applied by Ng and Cardie (2002) (they report C4.5 yielded better results, though). Uryupina also used Ripper arguing that “[f]irst, we need an algorithm that does not always require all the features to be specified. [...] and [s]econd, we want to control precision-recall tradeoff” (Uryupina, 2003, p. 83), both in her (2003) and (2009) work.


The second group of supervised methods is functions. These methods consist in learning weights for the input features during training. The set of weights is optimized with regard to a certain criterion (the cost function) characteristic of each of the different approaches (ILP has the additional restriction that one or more of the variables need to be integer values, i.e. not continuous values). Assume a set of m training instances in the form of a matrix, with vectors x^{(1)}, ..., x^{(m)}. Each vector contains the values of the input features x_1^{(i)}, ..., x_n^{(i)} (n values in numeric or binary coding), and y^{(i)} contains the class (0 or 1). In Logistic Regression, the hypothesis which class a given instance belongs to can be formulated as shown in (100) (formulae adapted from Ng (2011) for a uniform representation).11

(100) h_\Theta(x) = g(\Theta^T x), where \Theta is the vector of weights to be learnt, and g a transformation function with the additional requirement 0 \leq g \leq 1.
For classification purposes, where the value of y needs to be either 0 or 1, the following function is used:

(101) g(y) = \frac{1}{1 + e^{-y}}
Ng (2011) gives the cost function as in (102). Parameters \Theta need to be fit to minimize this cost function.
(102) J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\Theta(x^{(i)})) \right)
Logistic Regression is applied to the discourse-givenness classification task by Hempelmann et al. (2005). In comparison to Logistic Regression, the prediction condition for SVMs is given in (103) (Ng, 2011).
(103) y = 1 is predicted if \Theta^T x \geq 0
For SVMs (Vapnik, 1995) without a kernel (also termed ‘with a linear kernel’), Ng (2011) gives the optimization criterion in (104), with the additional parameter \lambda to control the bias/variance tradeoff12.

(104) \min_\Theta \; \lambda \sum_{i=1}^{m} \left( y^{(i)} \text{cost}_1(\Theta^T x^{(i)}) + (1 - y^{(i)}) \text{cost}_0(\Theta^T x^{(i)}) \right) + \frac{1}{2} \sum_{j=1}^{n} \Theta_j^2,
with \text{cost}_1(\Theta^T x^{(i)}) = -\log(h_\Theta(x^{(i)})) and \text{cost}_0(\Theta^T x^{(i)}) = -\log(1 - h_\Theta(x^{(i)}))
Here, minimizing the cost corresponds to maximizing the margin between positive and negative instances. In Figure 4.5, for instance, model A would be preferred over model B. SVMs can learn non-linear functions with the help of different kernels (an additional parameter again allows for a control of the bias/variance tradeoff), see (105) for an example from Ng (2011).

11Notation: ^T transpose of a vector; \Theta^T and x are combined with the dot product operator.
12High bias refers to a model’s property of not being complex enough to fit the data. High variance refers to a model’s property of overfitting the training data and thus potentially poor performance on unseen data. Small values for \lambda produce models with lower bias and high variance; large values for \lambda produce models with higher bias and low variance (Ng, 2011).


Figure 4.5: Support Vector Machines aim at maximizing the margin between instances of different classes (Source: http://de.wikipedia.org/wiki/Support_Vector_Machine, last access 6.02.2013)

(105) Predict ‘y = 1’ if \Theta_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_1 x_2 + \Theta_4 x_1^2 + \Theta_5 x_2^2 + ... \geq 0
See Witten and Frank (2005) or Ng (2011), for instance, for further details13, as well as for calculation methods solving the minimization problems formulated above. Uryupina (2009) uses SVMlight, an SVM implementation in C (Joachims, 1999). Rahman and Ng (2011) use SVMs with a composite kernel (for handling the complex syntactic tree features). Markert et al. (2012) also use SVMlight (with a composite kernel), with better results for the class old in comparison to ICA (but lower results than ICA for the classes mediated and new). MaxEnt (Maximum Entropy) models represent “multinomial logistic regression model[s,] [...] generaliz[ing] logistic regression by allowing more than two discrete outcomes”14, predicting the probability of each class value for a categorial class. A maximum-entropy based classifier is used by Ng (2009). In ILP (Integer Linear Programming; Nemhauser and Wolsey (1988)), the problem is formulated as an optimization task with the additional requirement that variables (or some of the variables) represent integers (i.e. are not continuous and not categorial). Denis and Baldridge (2007) use ILP, claiming that it is “much more efficient than conditional random fields [a technique to be briefly discussed below, N.B.], especially when long-distance features are utilized” (p. 237). Algorithms that sequentially label input data, i.e. take into account the features and the labels they predict for the elements’ context, include ICA (Iterative Collective Classification; Lu and Getoor (2003)) and CRF (Conditional Random Field; see Lafferty et al. (2001), and Klinger and Tomanek (2007) for a comparison to other statistical models).

13As for practical advice on using Logistic Regression and SVMs, Ng suggests, as a rule of thumb, the use of Logistic Regression or SVMs without a kernel for learning tasks with m \geq 50,000 training instances and 1 \leq n \leq 1,000 features.
14http://en.wikipedia.org/wiki/Multinomial_logistic_regression, last access 24.05.2013.


CRF is an undirected-graph-based model and represents an extension of Logistic Regression to sets of interdependent variables.15 The most recent works employ methods for sequential tagging: Cahill and Riester (2012) use CRF, and Markert et al. (2012) ICA. A semi-supervised method applied to discourse-givenness classification by Zhou and Kong (2011) is Label Propagation (Zhu and Ghahramani, 2002), where the “natural clustering structure in data is represented as a connected graph” (Zhou and Kong, 2011, p. 980). They describe this method as follows: each instance (labeled or unlabeled) is represented as a vertex. Edges between vertices are weighted by the similarity of the instances they represent. Similarity is modeled by a kernel; Zhou and Kong test a feature-based RBF (Radial Basis Function) kernel and a convolution tree kernel. These weights are used to propagate labels from any vertex to neighboring vertices to “finally infer [...] the labels of unlabeled instances until a global stable stage is achieved” (Zhou and Kong, 2011, p. 980). Kotsiantis’ (2007) comparison of different supervised classification techniques covers rule learners, decision tree learners, and SVMs. Table 4.2 represents an excerpt of his results, completed for logistic regression with the help of King et al. (1995).16

                                               rule          decision                    logistic
                                               learners      trees         SVM           regression
dealing with discrete/binary/continuous        *** (not      ****          ** (not       **** (p. 27)
attributes                                     directly                    discrete)
                                               continuous)
accuracy in general                            **            **            ****          ***
speed of learning with respect to number of    **            ***           *             ** (p. 27)
attributes and number of instances
tolerance to highly interdependent attributes  **            **            ***           *2
tolerance to irrelevant attributes             **            ***           ****1         ****
model parameter handling                       ***           ***           *             ** (p. 22)
interpretability/comprehensibility of model    ****          ****          *             ***

Table 4.2: Comparison of Learning Algorithms. ‘****’ represents best performance, ‘*’ worst performance (excerpt from Kotsiantis (2007), p. 263, ordered by relevance to this work; rightmost column completed according to King et al. (1995)). 1. In contrast to this, Witten and Frank (2005) warn that SVMs with RBF kernels cannot deal effectively with irrelevant attributes (p. 234). 2. CRF is an extension to cover for sets with interdependent attributes, see above.
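To make the function-based learners of (100) to (103) concrete, the following NumPy sketch trains a logistic regression model by plain gradient descent on the cost J(\Theta) from (102) and predicts with the decision rule from (103). It is a minimal illustration under simplifying assumptions (numeric 0/1 labels, arbitrary learning rate and step count), not the WEKA implementations used later in this work.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))                       # g as in (101)

def train_logistic_regression(X, y, lr=0.1, steps=1000):
    """X: (m, n) feature matrix, y: length-m array of 0/1 labels."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])          # bias column for Theta_0
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(steps):
        h = sigmoid(X @ theta)                            # h_Theta(x) as in (100)
        theta -= lr * (X.T @ (h - y)) / m                 # gradient step on J(Theta), (102)
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))   # final J(Theta), (102)
    return theta, cost

def predict(theta, X):
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return (X @ theta >= 0).astype(int)                   # decision rule as in (103)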

15http://en.wikipedia.org/wiki/Logistic regression, last access 24.05.2013. 16To the best of my knowledge, there is no comparable literature available for the other approaches.


4.3.1 Evaluation Measures
When having to choose the best performing classification model among a set of possible models, or when having to decide whether changes really bring an improvement in quality, one needs evaluation measures. This section is a compilation of the common practices relevant to the evaluation, as described, for instance, in Manning and Schütze (1999) (p. 268f.) and Ng (2011). The naming conventions used for this purpose are shown in Table 4.3: each instance has an actual class and a class as predicted by the classifier. Instances belonging to class 1 for which the classifier also predicts class 1 are called ‘true positives’, instances of class 0 for which the classifier predicts class 1 are called ‘false positives’, etc.17

                        actual class
                        1         0
predicted class    1    TP        FP
                   0    FN        TN

Table 4.3: Naming Conventions for Evaluation: True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN).

Some of the most commonly used measures for a classifier’s performance are error rate and accuracy: error rate is the proportion of wrong predictions among all predictions; accuracy (also called success rate) is the proportion of correct predictions (see definitions (106) and (107)). These measures are two sides of the same coin: they sum to 100%, and the aim is, of course, to obtain a low error rate, i.e. high accuracy.

(106) definition Error Rate
error = #false predictions / #predictions = \frac{FP + FN}{TP + FP + FN + TN}
(107) definition Accuracy (also termed Success Rate)
acc = #correct predictions / #predictions = \frac{TP + TN}{TP + FP + FN + TN}
Accuracy works well for data sets with balanced class distributions, i.e. where each of the classes occurs similarly frequently. However, there are data sets with a skewed class distribution, i.e. where one class occurs much more often than the other. In these cases, it can be misleading to rely on accuracy (or error rate, respectively) alone. Ng (2011) illustrates this fact with the example of a classifier predicting whether a patient’s tumor is malignant (i.e. cancerous) or benign, based on a range of examination values. Assume that 0.5% of all patients actually have cancer; an extremely skewed distribution. A classifier always predicting that the patient does not have cancer would have an accuracy value of 99.5%. The high accuracy value suggests that the classifier is performing well, though it is obviously of no practical use, as none of the patients receives further treatment. For this reason, another set of measures exists: precision and recall, along with the harmonic mean of both measures, the f measure. They are defined as shown in (108) to (110). Precision measures how many of the instances predicted positive actually are positive. Recall measures how many of the positive cases have been detected as being positive.

17Multi-way classification is discussed below.


The aim is to obtain high values for both Precision and Recall. The classifier always predicting ‘no cancer’ described above has a Recall value of 0%, which reflects its deficiency in detecting actually positive cases.

(108) definition Precision
P = TP / #predicted positive = \frac{TP}{TP + FP}
(109) definition Recall
R = TP / #actual positive = \frac{TP}{TP + FN}
(110) definition F measure (also termed F score)
F = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}
Besides calculating precision, recall and f measure per class, there is also the option of a weighted average of precision, recall or f measure: the weighted average is the sum of all per-class values (either precision, recall or f measure), where each of the values is “weighted according to the number of instances with that particular class label”18, see the definition in (111) (F is substitutable by P or R, respectively).

(111) definition Weighted Average
W_F = \sum_{i=1}^{c} \frac{N_i}{N} F_i, with c the number of classes, F_i the f measure of class i, N the number of instances, and N_i the number of instances in class i.
In the work at hand, the class values are not equally distributed (for details, see Sections 5.3.1.1 to 5.3.1.4), but also not as skewed as in the tumor classification example above. For this reason, accuracy will be used as the measure for optimization, while precision, recall and f measure of the smaller class, as well as the weighted averages, will be monitored.19
If the classification task is not binary, i.e. includes more than two classes, the measures are calculated as follows: for accuracy, a confusion matrix of all classes needs to be created. Accuracy is then calculated as the sum along the diagonal (correct predictions) divided by the total number of instances. Precision, Recall and F-measure are calculated per class: assuming we have three classes, given, mediated and new, and are calculating Precision etc. for the class given, then we consider given as class 1 and the other classes mediated and new as the complementary class 0 (i.e. not given) and proceed with the calculation as described above.
An interpretation of data sets involves comparisons. To this end, one usually starts by assuming that the distribution of values of a certain feature follows the same pattern across different subsets of the data. This assumption is the so-called null hypothesis, which can then be put to the test by applying one of the following test statistics: for independent samples, Pearson’s chi-square (\chi^2, Pearson (1900), p. 165) with Yates’ correction (Yates, 1934) if needed; for dependent samples, McNemar’s (1947) test, also with Yates’ correction.
18https://list.scms.waikato.ac.nz/pipermail/wekalist/2009-December/046789.html, see also http://weka.sourceforge.net/doc.dev/weka/classifiers/Evaluation.html, last access 25.05.2013.
19Experiments with this work’s data have shown that optimization for f measure of the smaller class does not lead to substantially different results. I assume a distribution to be skewed if there is one class with a proportion in the lower single-digit percentage area or less.
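The measures defined in (106) to (111) can be computed directly from the counts in Table 4.3, as the following small sketch shows; the last lines reproduce the ‘always benign’ tumor classifier from the example above, assuming 1,000 patients, 5 of whom have cancer.

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)                      # (107)

def precision_recall_f(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0                      # (108)
    r = tp / (tp + fn) if tp + fn else 0.0                      # (109)
    f = 2 * p * r / (p + r) if p + r else 0.0                   # (110)
    return p, r, f

def weighted_average(per_class_values, per_class_counts):
    n = sum(per_class_counts)                                    # (111)
    return sum(ni / n * vi for vi, ni in zip(per_class_values, per_class_counts))

print(accuracy(tp=0, fp=0, fn=5, tn=995))       # 0.995 -- deceptively high
print(precision_recall_f(tp=0, fp=0, fn=5))     # (0.0, 0.0, 0.0) -- recall reveals the problem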


The test statistics are defined as follows:20
(112) definition Pearson’s \chi^2
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, where n is the number of cells in the contingency table, O_i is the observed frequency, and E_i the expected frequency (according to the null hypothesis). Yates’ correction: if any E_i < 5, |O_i - E_i| - 0.5 is used instead of O_i - E_i.
(113) definition McNemar’s \chi^2 with Yates’ correction
\chi^2 = \frac{(|FP - FN| - 0.5)^2}{FP + FN}
From \chi^2 and the degrees of freedom, a p-value can be calculated (this is realized as a lookup in a pre-calculated table). This p-value represents the level of significance. The levels of significance are commonly defined as: highly significant (p<0.001), very significant (p<0.01), significant (p<0.05).
Class distributions are compared using Pearson’s \chi^2 (different sets are mutually exclusive). Classifier performances are compared using McNemar’s test (the same instance is classified by different classifiers). For this purpose, a contingency table is computed which contains for each classifier the correctly vs. incorrectly classified instances (see Table 4.4 for an example contingency table).

                            classifier 1
                            corr       incorr
classifier 2    corr        1,600      150
                incorr      200        250

Table 4.4: Example Contingency Table (corr = correctly classified, incorr = incorrectly classified)
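Both tests can be computed with standard tools, as sketched below: Pearson's chi-square via scipy.stats.chi2_contingency (with Yates' correction), and McNemar's statistic directly from (113), using the discordant cells of Table 4.4. The class counts passed to the first test are invented for illustration.

from scipy.stats import chi2, chi2_contingency

# Pearson's chi-square comparing two class distributions (counts invented for
# illustration); correction=True applies Yates' correction where appropriate.
observed = [[1200, 4800],   # set 1: given vs. new
            [1100, 4900]]   # set 2: given vs. new
chi2_stat, p_pearson, dof, expected = chi2_contingency(observed, correction=True)

# McNemar's test with Yates' correction as in (113): only the discordant cells
# of Table 4.4 enter the statistic.
b, c = 150, 200                                  # classifier 2 correct/1 wrong, and vice versa
mcnemar_stat = (abs(b - c) - 0.5) ** 2 / (b + c)
p_mcnemar = chi2.sf(mcnemar_stat, df=1)          # p-value from the chi-square distribution
print(p_pearson, p_mcnemar)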

4.4 Experiments and Results

In this section, the results of previous work are presented. For each approach, a description of the category (terminology, values, and class distribution), as well as of the data (and how it was processed, e.g. parsed, split into training and test set), features (by groups introduced in Section 4.2), experiment setup and results is given. The class distribution and results are shown in Table 4.5. Differences in class distributions can originate from differences between text types and from differences in annotation schemes: Hempelmann et al.’s (2005) textbook texts, for instance, contain a proportion of given expressions which is higher than in other texts, as the textbook texts are highly cohesive. Additionally, situationally evoked expressions are considered given. Similarly, Nissim’s (2003) scheme defines all pronouns referring to the dialogue participants as given. Machine Learning methods generally tend to perform better on balanced data sets, i.e. data sets with classes that are nearly equally distributed.

20Formulae adapted from http://en.wikipedia.org/wiki/Pearson%27s chi-squared test and http://en.wikipedia.org/wiki/McNemar%27s test, respectively (last access 28.03.2013).

Table 4.5: State of the Art Classification Results (part I). Columns: authors/experiments, corpus, class label (distribution), P, R, F, acc, and #NPs in set (training/development/test splits). The rows cover Ng and Cardie (2002) on MUC-6 and MUC-7, Uryupina (2003) on MUC-7, Hempelmann et al. (2005) on textbook texts, Nissim (2006) on Switchboard (old vs. med/new and old vs. med vs. new), Denis and Baldridge (2007) on ACE, Ng (2009) on ACE, Uryupina (2009) on MUC-7, Rahman and Ng (2011) on Switchboard, and Zhou and Kong (2011) on ACE.

Table 4.5: State of the Art Classification Results (part II). Columns as in part I; the rows cover Cahill and Riester (2012) on DIRNDL and Markert et al. (2012) on OntoNotes, both evaluated by 10-fold cross-validation. ø marks average values; * marks values not reported in the respective paper but calculated based on the numbers reported. Distributions are calculated on the whole corpus unless stated otherwise; they may vary slightly across training/dev/test sets.


Ng and Cardie (2002) train a classifier to distinguish whether NPs are anaphoric or not anaphoric for application in a coreference resolution system. As a basis, they use the MUC-6 and MUC-7 data, training the classifiers on the ‘dry run’ data sets, and applying them to the ‘formal evaluation’ sets. No information is given on how the data is parsed and on the resulting number of NPs. Ng and Cardie use features from groups 3, 4, 5, 6, 10, and 11.
Uryupina (2003) trains a classifier for the category ±discourse new, with the goal of classifying discourse new entities with high precision (in order to ‘lose’ as few candidates for coreference resolution as possible). As a basis, she uses the 20 texts from the Formaleval set of the MUC-7 corpus, parsed using Charniak’s (2000) parser. The data contains 3,710 noun phrases in total. Features from groups 3, 5, 10, 11 and 12 are used. The experiment was carried out as follows: one text at a time is held out. The remaining texts are used to first optimize Ripper classifier parameters (using 5-fold cross validation) and then train a model. The resulting model, in turn, is applied to the held-out text. The average performance on these texts is reported.
Hempelmann et al. (2005) model a three-way distinction between given, new, and inferable NPs. As basic data, they use four 4th grade textbook texts. These texts, containing 478 NPs, are manually annotated; inter-rater agreement is reported with κ = .72 for a Prince (1981)-based 5-way distinction (disagreement is reported for 18% of the cases). The class labels are encoded as a numeric scale for applying ordinal logistic regression, and conflated to a 3-way distinction (given, new, and inferable). The features used are from groups 5 and 11 (LSA and span represent measures of ‘indirect’ similarity with the context: pieces of text are similar if their word cooccurrences are similar).
Nissim (2006) classifies old vs. mediated vs. new NPs and, among other things, reports on experiments where the classes mediated and new are conflated. As a basis, she uses the Switchboard corpus, splitting it into a training set (40,865 NPs), a development set (10,565 NPs) and an evaluation set (12,624 NPs), with instances randomized in a way that “NPs from the same dialogue were possibly split across the different sets” (Nissim, 2006, p. 95). The features she uses are of groups 1, 2, 5, 7 and 11. Nissim also reports on the contribution of each feature, evaluating single-feature and leave-one-out classifiers.
Denis and Baldridge (2007) classify anaphoric vs. non-anaphoric NPs using the ACE corpus21. They use features from groups 2, 5 and 11 as an input for a joint model of anaphoricity classification and coreference resolution formulated in ILP (Integer Linear Programming).
Ng (2009) trains a joint model for anaphoricity determination and coreference resolution using the ACE corpus. The features he employs are the same as in Ng and Cardie (2002) (this earlier study was based on the MUC corpora). Unfortunately, Ng does not provide an evaluation of the anaphoricity determination component in isolation, but only an evaluation of its effect on coreference resolution: he reports significant increases in MUC and CEAF22 F-scores for all three types of news text.
Uryupina (2009), building on her earlier work (Uryupina, 2003), classifies NPs as ±discourse new. Again, she uses the MUC-7 corpus, parsed with Charniak’s (2000)

21As mentioned before, only certain types of entities are annotated for coreference in ACE, e.g. person, organisation etc. 22MUC (Vilain et al., 1995) and CEAF (Luo, 2005) are some of the most widely used scoring systems in anaphora and coreference resolution.


parser, this time using additional annotation of named entity types performed by the C&C NE-tagging system (Curran and Clark, 2003). The documents of the ‘Dryrun’ set are used for training, the ‘Formaleval’ set is used for the evaluation.
Rahman and Ng (2011) classify for information status like Nissim (2006), reusing her annotation of the Switchboard corpus. They randomly split the corpus into sets for training (nearly 88%) and testing (12%), maintaining the documents (unlike Nissim, who randomly split the instances). The feature set they use includes features from groups 1, 2, 4, 5, 7, 8 and 11.
Zhou and Kong (2011) classify for anaphoricity, using the ACE corpus. They use features from groups 4, 5, 7, 8, 10 and 11. Their label-propagation based approach with a polynomial kernel yields accuracies between 71.8% and 76.2%, depending on the domain (newswire vs. newspaper vs. broadcast news).
Cahill and Riester (2012) classify for information status in 4 different granularities (one coarse-grained 4-category distinction comparable to Nissim’s (2006) scheme23, and others with 7, 10 or 12 categories, respectively). The features they use are in groups 2, 4, 5, 6, 7, 8, 10 and 11, as well as automatically detected coreference information. Their 10-fold cross-validation experiment yields results for average accuracy between 69.56% (12 categories) and 79.61% (4 categories).24
Markert et al. (2012) classify for information status similar to Nissim (2006) and Rahman and Ng (2011) (coarse-grained: 3 categories old, mediated, new; fine-grained: 9 categories, i.e. with six subtypes of mediated), using OntoNotes. Their features are included in groups 1, 2, 4, 5, 6, 7, 8 and 11. They use collective classification, performing 10-fold cross-validation. For comparison purposes, they reimplemented and applied Nissim’s (2006) and Rahman and Ng’s (2011) classifiers, which are outperformed by their model.
To conclude this section, no direct comparisons are possible between the approaches presented above, mainly due to different categories and amounts of training data. As for different features, their impact is dependent on the definition of the categories, but also on the experiment settings (choice of training set) and on the features they are combined with. The impact of each feature is studied in Nissim (2006); the impact of different feature sets is studied in Hempelmann et al. (2005), Uryupina (2009), Rahman and Ng (2011), Cahill and Riester (2012) and Markert et al. (2012).
To summarize the state of the art, there are two groups of classification approaches: those to discourse-givenness (with 2 possible values, discourse-given and not discourse-given) and those to information status (with more than 2 possible values). The former are based either on MUC or ACE corpora. The MUC corpora represent relatively small data sets, which do not include syntactic annotation (with the consequence that different studies use different NP boundaries). As a result, the models trained on this data are not comparable. The definition of coreference in the MUC annotation scheme has been criticized as being too lax (including, e.g., predications, appositions and function-value relations); that in the ACE corpus is limited to entities of certain predefined types (persons, organizations, locations etc.). The latter approaches are based on Switchboard, OntoNotes or DIRNDL. The works on Switchboard exclude non-referring expressions beforehand.

23Nissim’s scheme conflates Riester’s classes given and situative to form the class old. 24When using the gold standard coreference annotation as additional clues, results range between 76.62% (12 categories) and 84.76% (7 categories). Here, only the best results are extracted; Cahill and Riester’s study includes more experiments with different subsets of the feature set.


Chapter 5

Discourse-Givenness Classification: New Experiments and Results

This chapter provides information on the data and methods employed in this work’s experiments and motivates the choices taken. It also contains a presentation and discussion of the results.

5.1 Data

The corpora used for training the models are MUC-7, OntoNotes 1.0 and ARRAU (for English) and TüBa-D/Z 6.0 (for German).1 Each corpus is used with its original annotations; the corresponding annotation schemes have been set into relation in Chapter 3. The resources used include three English corpora and one German corpus. I experiment with several corpora for one language (English) to avoid an overfitting of models to one annotation scheme. An additional corpus in another language (German) is employed to avoid an overfitting of models to special traits of one language, e.g. the relatively poor inflectional morphology and the writing of compounds as separate words in English. The following two examples illustrate linguistic differences between English and German relevant for resolving coreference. Features commonly used for discourse-givenness classification and coreference resolution in English include ‘same head’2 and ‘grammatical function’. These features are popular because preprocessing tools for English are readily available and work in a simpler way than for many other languages, e.g. (i) lemmatization (determining a word’s uninflected form) and (ii) parsing (among other things, determining a constituent’s grammatical function). Inflection, for instance, in many cases involves the suffixation of s (for plural forms of nouns and for third person singular verbs). As to grammatical function, the subject is usually the constituent before the main verb. German, in contrast, has four inflectionally marked cases and relatively free constituent order. Preprocessing is thus more complex.

1PCC requires a relatively large amount of background knowledge, either from articles in the same newspaper, or regional and time-specific knowledge, to interpret the commentaries. This is expected to complicate the classification task; experiments to that purpose remain to be done in future research. 2In anaphoricity classification, this is a feature that is true iff there is a noun phrase in the previous context that has the same head word as the current noun phrase. In coreference resolution, this feature is true for a pair of noun phrases that have the same word as their head word.


5.2 Methods

The goal of this work is to find methods for an optimal modelling of discourse-givenness of noun phrases in English and German. As the performance of a classification model depends on the discriminatory power and combinational interaction of the features it uses, one of the tasks is to find additional features that complement previously suggested features. The features newly introduced in this work mainly cover the area of semantic similarity and the use of the NPs’ local context. All features used are described in Section 5.2.1. Another influence on the model’s performance is the learning algorithm and its aptness to the respective task. The decisions regarding algorithms are briefly discussed in Section 5.2.2. In addition, classification can profit from taking into account some of the data’s characteristics, e.g. its skewed class distribution: the vast majority of NPs is not anaphoric (see Table 4.5; more details will be given in Sections 5.3.1.1 to 5.3.1.4). On that score, the choice of evaluation measures is crucial (see the discussion in Section 4.3.1). In some of the previous work (Nissim (2006), Rahman and Ng (2011) and Markert et al. (2012)), the classifier is trained and applied only to a subset of the instances:3 expressions that are nonreferring, temporal, locational, directional etc. are excluded. What remains is, for the most part, instances of specific reference (e.g. reference to persons, organizations, etc.), with occasional instances of reference to kinds, abstract concepts and the like.4 This restriction leads to a more balanced class distribution, which a classifier can learn more easily. The aim of the present work, however, is to help a coreference resolution system performing on full text (i.e. any instance), not on a subset of the data, which for the time being would have to be selected manually.

5.2.1 Features
Each feature stands for a categorization or measurement of a certain aspect of a noun phrase or its relation to the context.5 A feature’s usefulness in a model depends, among other things, on the coding of the features in the model, and on how the learning algorithm rates and/or combines them. For instance, Nissim (2006) experiments with an NP’s head word as a feature (hoping that the model would learn some mediated entities) in decision trees, getting a negative effect on the model’s performance. Rahman and Ng (2011), however, report on the successful use of this feature (in boolean coding) in their SVM model. An overview of the features employed in my experiments is given in Table 5.2. Numbering of feature groups follows Sections 4.2.1 and 4.2.2. New features that have not been used in previous work are marked (‘*’). The newly introduced features are supposed to measure (i) how specific (or abstract, respectively) an expression is, (ii) how similar single words contained in the expression are to the words in the previous context, and (iii) whether the (local) context provides hints on the expression’s discourse-givenness.

3Besides this, deictic personal pronouns are always categorized as old. 4This measure leads to a coreference definition close to that realized in the ACE corpus. 5Only NPs will be regarded; possessive pronouns, WHNPs etc. are not included in the experiments.


Abstractness/Specificity
News texts usually report on events involving concrete referents. Concrete referents tend to be described rather specifically, often resulting in longer expressions (length in chars). Also, embedded definite phrases or expressions with a possessive ‘s’ often have specific referents: typical possessors are e.g. persons or organizations. On the other hand, certain suffixes point to abstract common nouns, such as -ion, -ness etc. (suffix n). Specificity is a precondition for coreference in many corpora. I understand specificity as a feature representing two components: a) an expression’s descriptive content (preciseness of description), and b) relatedness to the context it occurs in. When referring to a particular object, the speaker needs to distinguish it a) from other objects in the world, and b) from other discourse referents in the context. A distinction from other objects in the world can be accomplished by using proper names or precise descriptions (e.g. the blue book or the lexicon is more precise than the book). A distinction from other discourse referents can be made by using different concepts, or, when re-using lexical material, by adding descriptive content (e.g. the town - the new town, a bus - another bus) or by using the indefinite plural form (e.g. Investigators continued their search. At Calverton, investigators began piecing together the airplane). For an operationalisation of specificity, I assume that 1) the more precise the description, the fewer documents it occurs in, and 2) the more important a discourse referent is for a discourse d, the more frequently it occurs in d. As a measure for these two components of specificity, I use 1) inverse document frequency (idf) and 2) term frequency (tf), combined to tfidf, a well-known measure from the field of Information Retrieval. For the purpose of discourse-givenness classification, tfidf is calculated using the full expression as a term on the one hand, and using a sliding window of characters as a term on the other hand (4 characters in the experiments presented here). The sliding window was implemented to make the method more robust against inflection, composition, and the shortening of names (e.g. Alex for Alexander), considering in particular that it is applied not only to English. The calculation of tf-idf-related features is sketched at the end of this section.
Similarity
Synonyms are sometimes used for referring to the same entity for stylistic reasons, e.g. to avoid word repetitions. In earlier work, latent semantic analysis and variants (Hempelmann et al., 2005) have been used, as well as semantic relatedness, measured e.g. by means of WordNet or GermaNet (Markert et al., 2012; Cahill and Riester, 2012). In the present work, semantic similarity as calculated by DISCO (Kolb, 2008) is used in the feature maxsimilar mention.6 The calculation of this feature is sketched at the end of this section.
Context
Motivated by the findings in the theoretical part (Section 2.3), I also introduce features exploiting an expression’s context. These contextual features are designed to complement

6Distributional similarity (LSA and span) between all words of the NP and the NP’s preceding context has been used by Hempelmann et al. (2005) in their logistic regression experiments.


the other features, e.g. similarity and identity. The syntactic node dominating the NP (feature mother cat) can help discourse-givenness classification: an NP dominated by an adjective phrase (ADJP), for instance, is less likely to be given than an NP dominated by a prepositional phrase (PP).7 Information on the verb’s tense, on a change in tense compared to the previous sentence, and on whether the verb has been used previously can also hint at given expressions (for instance, the verb’s subject). These features are operationalized as the verb’s suffix (v suffix), a boolean value representing whether the verb’s suffix is equal to the preceding verb’s suffix (equal v suffix), and a numeric value (v previous) representing the number of times the verb has been mentioned. The size of the expression’s left context (in tokens) is also taken into account (size left context), as the first few NPs in a text are rarely given.8 The impact of contextual features (group 8) is particularly interesting from the theoretical perspective. These features will play a prominent role in the classification experiments.
Calculation of tf-idf-related Features
This subsection summarizes the calculation of tf-idf-related features as described in Ritz (2010). For a term t in a document d, tfidf is commonly calculated according to the formula in (114).
(114) definition tfidf
tfidf_{t,d} = tf_{t,d} \cdot idf_t, with tf_{t,d} the relative frequency of t in d and idf_t = \log\left(\frac{|D|}{|D_t|}\right), with D the document collection and D_t the documents containing t.
For the feature ‘mention tfidf’ (and ‘mention idf’), tfidf (and idf) is calculated for the whole NP. For the features ‘sum tfidf’, ‘max tfidf’ etc., the NP is sliced into terms by a sliding window (here, a window of 4 characters is used). Across the tfidf values of each term, the sum and maximum are calculated (see (115) and (116), respectively).

(115) sumtfidf_{NP_{s,e},d_s} = \sum_{i=s}^{e-l+1} tfidf_{t_i,d_i}, with l the window size (here: 4)
(116) maxtfidf_{NP_{s,e},d_s} = \max_{i \in [s, e-l+1]} tfidf_{t_i,d_i}
In addition to the term frequency, the increase in term frequency with the current mention is calculated and used as a replacement for term frequency:
(117) tf_{t,d} - tf_{t,d_k}, with k the starting position of the current NP and d_k document d up to character position k.
Sum and maximum are calculated as above. For all tf-idf-related features, the whole corpus is used as the document collection.
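A simplified sketch of the sliding-window features (115) and (116) is given below. It assumes that document frequencies for character 4-grams have been precomputed over the whole corpus and passed in as a dictionary; unseen 4-grams are smoothed to a document frequency of 1, which is an implementation choice made here for illustration only.

import math

WINDOW = 4

def char_ngrams(text, n=WINDOW):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tfidf(term, document, doc_freq, n_docs):
    grams = char_ngrams(document)
    tf = grams.count(term) / len(grams) if grams else 0.0     # relative frequency, (114)
    idf = math.log(n_docs / doc_freq.get(term, 1))            # unseen terms smoothed to df=1
    return tf * idf

def window_tfidf_features(np_string, document, doc_freq, n_docs):
    values = [tfidf(t, document, doc_freq, n_docs) for t in char_ngrams(np_string)]
    return {"sum_tfidf": sum(values),                         # (115)
            "max_tfidf": max(values, default=0.0)}            # (116)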

7To a certain extent, the feature n pos left/right captures similar information: for instance, an expression dominated by a PP node is preceded by a preposition; the part of speech tag to its left would thus be IN (the tag for prepositions or subordinating conjunctions).
8This feature is related, but not equal, to Ng and Cardie’s (2002) feature position with the values ‘first sentence’, ‘first paragraph’, ‘header’. It is a numeric feature. Typographic information (information on paragraphs and headers) is not available for most corpora.


Calculation of maxsimilar mention
The feature ‘maxsimilar mention’ is calculated as follows: for each head word, the similarity between this word and each of the words preceding it in the text is calculated using DISCO (Kolb, 2008). Consider Example 118 (repeated from Example 16 above). The similarity values for this example are given in Table 5.1, maximum values are printed in bold face (‘n.a.’ means one of the words does not occur in the index, either because it occurred too infrequently in the source texts, like Miami-based, or too often, like of, the).

(118) Four former Cordis Corp.1 officials2 were acquitted of federal charges related to the Miami-based company’s1 sale of pacemakers3, including conspiracy to hide 45 pacemaker defects67.

             officials  charges  company  sale    pacemakers  conspiracy  defects
Four         0.0020     0.0095   0.0005   0.0022  0.0         0.0010      0.0008
former       0.0058     0.0008   0.0050   0.0024  0.0         0.0         0.0
Cordis       n.a.       n.a.     n.a.     n.a.    n.a.        n.a.        n.a.
Corp         0.0031     0.0004   0.0142   0.0073  0.0039      0.0014      0.0
officials               0.0169   0.0066   0.0079  0.0         0.0187      0.0008
were                    n.a.     n.a.     n.a.    n.a.        n.a.        n.a.
acquitted               0.0511   0.0      0.0     0.0         0.0313      0.0009
of                      n.a.     n.a.     n.a.    n.a.        n.a.        n.a.
federal                 0.0238   0.0072   0.0135  0.0         0.0225      0.0009
charges                          0.0075   0.0087  0.0         0.0684      0.0070
related                          0.0018   0.0037  0.0         0.0031      0.0010
to                               n.a.     n.a.    n.a.        n.a.        n.a.
the                              n.a.     n.a.    n.a.        n.a.        n.a.
Miami-based                      n.a.     n.a.    n.a.        n.a.        n.a.
company                                   0.0129  0.0013      0.0049      0.0025
s                                         n.a.    n.a.        n.a.        n.a.
sale                                              0.0         0.0109      0.0030
of                                                n.a.        n.a.        n.a.
pacemakers                                                    0.0         0.0097
including                                                     0.0028      0.0009
conspiracy                                                                0.0081
to                                                                        n.a.
hide                                                                      0.0046
pacemaker                                                                 0.0107
defects

Table 5.1: Feature ‘maxsimilar mention’: Calculation Example (column maximum in bold face)

Considered in isolation, the values of these features might not predict the discourse- givenness of an NP. Together with other features, however, e.g. the NP’s determiner, they help discriminate discourse-given NPs.
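A sketch of the computation of maxsimilar mention is given below. The similarity function is a hypothetical placeholder for a lookup in DISCO's word space; word pairs missing from the index (the 'n.a.' cells in Table 5.1) simply do not contribute to the maximum.

def similarity(word_a, word_b):
    """Hypothetical stand-in for a DISCO word-space lookup; None if a word is not indexed."""
    raise NotImplementedError("plug in a distributional similarity resource here")

def maxsimilar_mention(head_word, preceding_words):
    values = [similarity(w, head_word) for w in preceding_words]
    values = [v for v in values if v is not None]             # skip 'n.a.' pairs
    return max(values, default=0.0)

# e.g. maxsimilar_mention("charges",
#          ["Four", "former", "Cordis", "Corp.", "officials", "were", "acquitted", "of", "federal"])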


(1) Surface Form
suffix n* (categorical), e.g. ter, ist, ion, ... for n=3: suffix (last n characters) of the NP’s head lemma, where n is between 1 and 3
(2) Length
length in chars* (numeric): the NP’s length in characters
length in tokens (numeric): the NP’s length in tokens
(3) Spelling
all capitalized (boolean), {1, 0}: 1 if the NP consists only of capital letters and special characters (e.g. ABC, A&M, U.S.), else 0
(4) Morphological Features
pron morph (categorical), e.g. 1.pl, 2.sg/pl, 3.sg.f: if the NP is a pronoun, person, number and, in case of 3rd person, gender features of this pronoun; in English, number of proper and common nouns is encoded in the feature ‘n form’ (values NNS and NNPS, respectively); in German, all the morphological information available is used (number, person, gender of the head)
(5) Syntactic Form and Structure
phrase form (categorical), {def, indef, dem, pronposs, pron, pronrel, interrog, n.a.}: def if the NP dominates a definite determiner, indef if it dominates an indefinite determiner, dem if it dominates a demonstrative determiner, pronposs if it dominates a possessive, pron if it consists of a personal pronoun, pronrel if it consists of a relative pronoun, interrog if it consists of an interrogative pronoun, else n.a.
DT form (categorical), e.g. the, both, many, such, etc.: lemma of the NP’s determiner if present
DT type (categorical), {DT, WDT, PDT} for English, {ART, PIAT} for German: type of the leftmost determiner dominated by the NP;


for English: DT determiner, PDT predeterminer (half/PDT the/DT level/NN), WDT wh-determiner (the show on which/WDT Ms. Chung appears); for German: ART definite or indefinite determiner, PIAT attributive indefinite pronoun without determiner (as in kein/PIAT Mensch)
n form (categorical), {NN, NNS, NNP, NNPS} for English, {NN, NE} for German: pos tag of the NP’s rightmost noun (if it contains a noun); for English: NN common noun, NNP proper noun, suffix S plural form; for German: NN common noun, NE proper noun
embedded phrase form* (categorical), {def, indef, dem, pronposs, pron, pronrel, interrog, n.a.}: if the NP embeds another NP, the form (analogous to phrase form) of this embedded NP
possessive s* (boolean), {1, 0}: 1 if the NP’s rightmost token is an apostrophe or apostrophe+‘s’, respectively; in TüBa-D/Z, morphological information of the head is used instead
(6) Semantic Class
ne type (categorical), e.g. organization, person, location, product, language, etc.: taken from the Penn Treebank annotations (in the Penn Treebank, only Named Entities consisting of exactly one word are annotated with their respective Named Entity type); this feature is only available for OntoNotes and ARRAU
has title (categorical), e.g. Mr., Mrs., Dr., Rep., etc.: if the NP contains a token matching the regular expression "^[A-Z][a-z]+\.$", this token represents the value of the feature
(7) Grammatical Function and Position
grammfunc (categorical), e.g. SBJ, TPC, ADV, LOC, TMP, PRD etc.: grammatical function taken from the Penn Treebank annotation or the TüBa-D/Z syntax annotation (ON: object nominative, i.e. subject, OA object accusative, OPP PP object, etc.)
(8) Local Context
mother cat* (categorical), e.g. NP, S, PP, VP, ADJP, etc.: category of the phrase directly dominating the NP


n pos left/right (categorical), e.g. ‘JJR NN TO’, ‘. NNPS VBP’ for n=3, left: combination of n pos tags to the left of the NP (or to the right, respectively), for n ∈ {1, 2, 3}
v previous* (numeric): number of times the clause’s verb has been mentioned before
v suffix n* (categorical): suffix of length n ∈ {1, 2} of the verb of the clause containing the current NP
equal v suffix n* (boolean), {1, 0}: true iff the nearest preceding verb has the same suffix of length n (n ∈ {1, 2})
(9) Salience
- (not used in my experiments)
(10) Sentence Construction
- (predication is encoded via the grammatical function value PRD)
(11) Agreement, Identity and Similarity
exact previous mention (numeric): number of times the NP has been mentioned before in the same text (in German, lemmatization using TreeTagger (Schmid, 1994) is performed to cover mismatches which are only due to differences in case)
head previous mention (numeric): number of times the NP’s head has been mentioned before in the same text (the lemma is used here)
mention tfidf/idf* (numeric): idf and tfidf with the precise NP form as the term
sum/max of ngram-based tf, ... * (numeric): term frequency (tf), inverse document frequency (idf), and tf*idf with 4-grams of characters as the term; sum and maximum are calculated across all terms of an NP (Ritz, 2010)
maxsimilar mention* (numeric): maximum of the similarity of the NP’s head word to any word in the previous context, calculated with DISCO (Kolb, 2008)

108 109

(12) Definiteness Probability – (not used in my experiments) (13) Other size left context (in tokens)* (numeric) number of tokens before the NP

Table 5.2: Features Used in the Classification Experiments
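To make the surface-oriented features above concrete, the following is a minimal sketch (in Python, not the extraction code actually used in this work) of how a handful of them could be computed from a tokenized NP and its left context; the function names, the idf variant and the simplified lemma handling are illustrative assumptions.

```python
import math

def char_ngrams(s, n=4):
    """Character n-grams of a string (used as 'terms' for the ngram-based tf/idf features)."""
    s = s.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def count_sublist(haystack, needle):
    """Number of times the token sequence 'needle' occurs in 'haystack'."""
    n = len(needle)
    return sum(1 for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle)

def extract_features(np_tokens, head_lemma, previous_tokens, document_frequency, n_documents):
    """Sketch of a few surface and agreement features from Table 5.2.

    np_tokens: tokens of the noun phrase
    head_lemma: lemma of the NP's head
    previous_tokens: all tokens preceding the NP in the same text
    document_frequency: mapping from a character 4-gram to the number of documents containing it
    n_documents: size of the document collection
    """
    np_string = " ".join(np_tokens)
    features = {
        # (1) Surface form: suffixes of the head lemma
        "suffix_1": head_lemma[-1:],
        "suffix_2": head_lemma[-2:],
        "suffix_3": head_lemma[-3:],
        # (2) Length
        "length_in_chars": len(np_string),
        "length_in_tokens": len(np_tokens),
        # (3) Spelling: only capital letters and special characters
        "all_capitalized": int(all(not c.islower() for c in np_string if c.isalpha())),
        # (13) Other
        "size_left_context": len(previous_tokens),
        # (11) Agreement: exact and (approximate) head previous mention counts
        "exact_previous_mention": count_sublist(previous_tokens, np_tokens),
        "head_previous_mention": sum(1 for t in previous_tokens if t.lower() == head_lemma.lower()),
    }
    # (11) ngram-based idf: sum and maximum over all character 4-grams of the NP
    idfs = [math.log(n_documents / (1 + document_frequency.get(g, 0))) for g in char_ngrams(np_string)]
    features["sum_idf"] = sum(idfs)
    features["max_idf"] = max(idfs) if idfs else 0.0
    return features
```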

5.2.2 Algorithms and Evaluation Measures

Algorithms used in previous work are described in Section 4.3. The experiments in this work are carried out with a subset of these algorithms, in particular rule-based and similarity-based ones. Only recently has the question been raised whether the classification of discourse-givenness could profit from the application of sequential models; this, however, is beyond the scope of this work. The classification experiments are evaluated using selected standard measures, learning curves, comparisons to human interrater agreement (where reported), and comparisons to related work where applicable.
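The per-class precision, recall and F-measure values and the weighted averages reported in the result tables below follow the standard definitions; a minimal sketch (independent of WEKA's implementation, with illustrative labels) is:

```python
def per_class_scores(gold, predicted, positive_class):
    """Precision, recall and F1 of one class from parallel lists of gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive_class and p == positive_class)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive_class and p == positive_class)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive_class and p != positive_class)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def weighted_average_f1(gold, predicted, classes):
    """F-measure of each class, weighted by the class's frequency in the gold standard."""
    total = len(gold)
    return sum(per_class_scores(gold, predicted, c)[2] * gold.count(c) / total for c in classes)

# Tiny example: the smaller class '+given' is the one plotted in the learning curves.
gold = ["+given", "-given", "-given", "+given", "-given"]
pred = ["+given", "-given", "+given", "-given", "-given"]
print(per_class_scores(gold, pred, "+given"))            # (0.5, 0.5, 0.5)
print(weighted_average_f1(gold, pred, ["+given", "-given"]))
```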


5.3 Classification Results

In this section, the results of the classification experiments are presented and interpreted. The effects of factors such as feature sets, learning algorithms, and the method of splitting the data into training, development and test sets are investigated.

5.3.1 Quantitative Results

The splitting of the data into sets for training, development and testing is carried out randomly using WEKA's supervised instance filter Resample, which produces samples with similar class distributions. Where available, original splits are also used in the experiments.
For each of the corpora, four sets of features are used: a baseline classifier, comparable to Nissim's (2006)9, using the features phrase form, exact previous mention, head previous mention, grammfunc and length in tokens; a classifier using all features presented in Section 5.2.1; a classifier ‘no local context’, which makes use of all features presented in Section 5.2.1 except those from group (8); and a classifier ‘no new’, which uses all features except those newly introduced (marked ‘*’ in Table 5.2). The latter two groups of features are held out to investigate the contribution of the newly introduced features and the role of the local context in the resolution of coreference.
Three different algorithms are used: decision trees (J48 in WEKA), Ripper (JRip in WEKA), and SVMs (SMO in WEKA, with a linear kernel), as these have been used successfully and most commonly in the literature. WEKA's standard parameter settings are used. Throughout the experiments, the results using J48 are slightly, in most cases significantly, better than those using JRip or SMO. A rough sketch of this setup is given below.
The evaluation of the classification results follows the methods explained in Section 4.3.1. The influence of the size of the training set is shown using learning curves (here, the F-measure of the smaller class, discourse-given, is used).
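As a rough illustration of this setup (not the original WEKA pipeline), the following sketch uses scikit-learn's DecisionTreeClassifier as a stand-in for J48 (both are C4.5-style trees, but they are not identical), a DictVectorizer to encode the categorical features of Table 5.2, and a stratified random split in place of WEKA's supervised Resample filter; all names are illustrative, and, unlike the document-retaining split described above, individual instances rather than documents are distributed across sets.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

def train_and_evaluate(feature_dicts, labels, seed=0):
    """Stratified train/dev/test split (e.g. 80/10/10, cf. Table 5.4) and a decision-tree classifier.

    feature_dicts: one dict of (mostly categorical) features per NP, as in Table 5.2
    labels: '+discourse-given' / '-discourse-given' per NP
    """
    # One-hot encode categorical features; numeric features are kept as they are.
    vectorizer = DictVectorizer(sparse=True)
    X = vectorizer.fit_transform(feature_dicts)

    # First split off the test set, then split the remainder into train and dev,
    # preserving the class distribution in each set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, labels, test_size=0.10, stratify=labels, random_state=seed)
    X_train, X_dev, y_train, y_dev = train_test_split(
        X_rest, y_rest, test_size=1 / 9, stratify=y_rest, random_state=seed)

    clf = DecisionTreeClassifier(random_state=seed)  # stand-in for J48 with default settings
    clf.fit(X_train, y_train)
    print(classification_report(y_dev, clf.predict(X_dev)))
    print(classification_report(y_test, clf.predict(X_test)))
    return clf
```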

5.3.1.1 OntoNotes

OntoNotes is the largest of the English corpora; it will thus be used as a point of reference. Around 16% of NPs in OntoNotes are discourse-given (see the class distribution in Table 5.3).10 The corpus was randomly split into three sets, one for training, one for development, and one for testing. Documents were retained (i.e. instances from one article were not split across sets). Document length was controlled for in the random split to avoid a concentration of very short or very long documents in any of the sets. The distributions are shown in Table 5.4. Differences between the sets are not significant (χ2 test).

9Nissim’s features are (value sets are given in curly brackets, otherwise the value type is given in round brackets): full prev mention (numeric), mention time {first,second,more}, partial prev mention {yes,no,na}, determiner {bare,def,dem,indef,poss,na}, NP length (numeric), grammatical role {subject,subjpass,object,pp,other}, NP type {pronoun,common,proper,other} (Nissim, 2006, p. 97).
10For determining an NP’s discourse-givenness, only the ident-relation is used; the appos-relation is disregarded.


+discourse-given    20,473 (15.78%)
-discourse-given   109,308 (84.22%)
total              129,781 (100.00%)

Table 5.3: OntoNotes 1.0: Class Distribution

Set     #NPs +discourse-given   #NPs -discourse-given   #NPs in total
Train   16,363 (15.79%)         87,271 (84.21%)         103,634
Dev      2,144 (15.98%)         11,270 (84.02%)          13,414
Test     1,966 (15.44%)         10,767 (84.56%)          12,733

Table 5.4: OntoNotes 1.0: Class Distribution in Training, Development and Test Set (random split)

Table 5.511 shows the J48 (decision tree)-based classifiers’ performance on the development set and on the test set, respectively.12 The development set has been used for intermediate testing during the development process; results on this set are only reported to show that development set and test set are relatively consistent. Besides the random split, 5-fold cross-validation was carried out. In this setting, the instances from one document may be split across different sets. In the literature, either approach has been used, but to my knowledge no direct comparison has been reported. The results are very similar to the results on the document-retaining split.
The results show a significant influence of the additional features on performance, in particular a substantial gain in recall. Figure 5.1 gives an excerpt of the model (J48 tree, all features, trained on the training set). Local context features can be found in several subtrees and at different depths in the decision tree.
As to the different learning algorithms tested, J48 performed best throughout the experiments. It performed slightly, in most cases significantly, better than JRip and SMO (see Table 5.6).

11Results of significance tests (McNemar with correction) are given in superscript: numbers refer to classifier numbers, the levels of significance are represented by asterisks (‘*’). These levels are: ‘***’ highly significant (p<0.001), ‘**’ very significant (p<0.01), ‘*’ significant (p<0.05). For instance, all1∗∗∗,2∗ means the performance of classifier all differs highly significantly from that of classifier 1 (baseline), and significantly from that of classifier 2.
12For a later comparison with MUC, which does not contain information on a constituent’s grammatical function, I ran additional experiments leaving out this feature, with the following results: For the classifier ‘no local context’, the results are not significantly different on the development set (Accuracy 92.11%, Precision 78.5%, Recall 69.6%, F-measure 73.8%) but very significantly different on the test set (Accuracy 91.83%, Precision 76.9%, Recall 67.3%, F-measure 71.8%). For the classifier ‘all’, leaving out the feature ‘grammatical function’ does not make a significant difference, neither on the development set (Accuracy 92.83%, Precision 80.8%, Recall 72.2%, F-measure 76.3%) nor on the test set (Accuracy 92.96%, Precision 80.8%, Recall 71.4%, F-measure 75.8%).
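The pairwise significance tests mentioned in footnote 11 compare two classifiers' decisions on the same instances; the following is a generic sketch of McNemar's test with continuity correction (assuming SciPy is available), not the exact tool used in this work.

```python
from scipy.stats import chi2

def mcnemar_corrected(gold, pred_a, pred_b):
    """McNemar's test with continuity correction for two classifiers' paired predictions.

    b = instances classifier A gets right and B gets wrong; c = the reverse.
    Returns the chi-square statistic and its p-value (1 degree of freedom).
    """
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, chi2.sf(stat, df=1)
```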


training: Train, test: Dev
                           acc      P      R      F      class
1. baseline                89.39%   74.5%  51.1%  60.6%  +discourse-given
                                    91.2%  96.7%  93.9%  -discourse-given
2. no local context1∗∗∗    92.38%   80.8%  68.7%  74.2%  +discourse-given
                                    94.2%  96.9%  95.5%  -discourse-given
3. no new1∗∗∗              92.19%   80.2%  67.9%  73.6%  +discourse-given
                                    94.1%  96.8%  95.4%  -discourse-given
4. all1∗∗∗,2∗,3∗∗∗         92.73%   80.1%  72.5%  76.1%  +discourse-given
                                    94.9%  96.6%  95.7%  -discourse-given
                                    92.5%  92.7%  92.6%  weighted average

training: Train, test: Test
1. baseline                89.72%   76.6%  48.2%  59.1%  +discourse-given
                                    91.6%  97.7%  94.5%  -discourse-given
2. no local context1∗∗∗    92.31%   79.5%  67.7%  73.1%  +discourse-given
                                    94.2%  96.8%  95.5%  -discourse-given
3. no new1∗∗∗              92.26%   80.1%  66.4%  72.6%  +discourse-given
                                    94.0%  97.0%  95.5%  -discourse-given
4. all1∗∗∗,2∗∗∗,3∗∗∗       93.02%   80.6%  72.1%  76.1%  +discourse-given
                                    95.0%  96.8%  95.9%  -discourse-given
                                    92.8%  93.0%  92.9%  weighted average

5-fold cross-validation on full dataset
1. baseline                89.63%   75.0%  51.4%  61.0%  +discourse-given
                                    91.4%  96.8%  94.0%  -discourse-given
2. no local context        92.60%   80.0%  70.7%  75.1%  +discourse-given
                                    94.6%  96.7%  95.7%  -discourse-given
3. no new                  92.43%   79.9%  69.5%  74.3%  +discourse-given
                                    94.4%  96.7%  95.6%  -discourse-given
4. all                     93.11%   82.3%  71.7%  76.7%  +discourse-given
                                    94.8%  97.1%  96.0%  -discourse-given
                                    92.9%  93.1%  92.9%  weighted average

Table 5.5: OntoNotes 1.0: Classification Results (J48, random split/5-fold cross-validation)


exact previous mention <= 0 | has title = yes | | max tfrel <= 0.1182 | | | sum idf <= 31.9596: NO (118.0/1.0) | | | sum idf > 31.9596 | | | | 1 pos right = :: coref (0.0) | | | | 1 pos right = NNP: coref (0.0) | | | | 1 pos right = IN: coref (0.0) | | | | 1 pos right = CC: NO (1.0) ... | | | | 1 pos right = VBD: coref (10.0/3.0) ... | | max tfrel > 0.1182 | | | length in tokens <= 3 | | | | grammfunc = TMP-PRD: coref (0.0) | | | | grammfunc = DIR: coref (0.0) | | | | grammfunc = LOC-PRD: coref (0.0) | | | | grammfunc = SBJ | | | | | v previous <= 20: coref (384.0/14.0) | | | | | v previous > 20 | | | | | | v suffix 2 = ’s: coref (0.0) | | | | | | v suffix 2 = to: coref (0.0) | | | | | | v suffix 2 = rs: coref (0.0) ... | | | | | | v suffix 2 = is | | | | | | | 1 pos right = :: NO (0.0) | | | | | | | 1 pos right = NNP: NO (0.0) ... | | | | | | | 1 pos right = VBZ | | | | | | | | equal v suffix 2 = 0: NO (19.0/3.0) | | | | | | | | equal v suffix 2 = 1: coref (2.0) ... exact previous mention > 0 | n form = NNP | | mother cat = S: coref (1165.42/30.7) | | mother cat = SQ: coref (6.18/1.0) | | mother cat = ADJP: NO (1.03/0.03) ... | | mother cat = NP | | | 1 pos right = :: NO (8.01/3.01) | | | 1 pos right = NNP: NO (4.01/0.01) | | | 1 pos right = IN | | | | grammfunc = TMP-PRD: NO (0.0) | | | | grammfunc = DIR: NO (0.0) ...

Figure 5.1: OntoNotes 1.0: Decision Tree (excerpt)


training: Train, test: Test
                       acc      P      R      F      class
4a. all (SMO)          91.84%   78.0%  68.1%  72.7%  +discourse-given
                                94.1%  91.8%  91.6%  -discourse-given
                                91.5%  91.8%  91.6%  weighted average
training: Train, test: Test
4b. all (JRip)4a∗∗∗    92.85%   83.3%  67.1%  74.3%  +discourse-given
                                94.2%  97.5%  95.8%  -discourse-given
                                92.5%  92.8%  92.5%  weighted average
training: Train, test: Test
4c. all (J48)4a∗∗∗     93.02%   80.6%  72.1%  76.1%  +discourse-given
                                95.0%  96.8%  95.9%  -discourse-given
                                92.8%  93.0%  92.9%  weighted average

Table 5.6: OntoNotes 1.0: Classification Results (SMO vs. JRip vs. J48, random split)

As for the influence of the number of training instances, the learning curve of classifier 4 (all features; evaluated on the randomly drawn sets for training and testing, respectively) is shown in Figure 5.2.

Figure 5.2: OntoNotes 1.0: Learning Curve

Each point represents the performance of a classifier trained on a randomly drawn subset of the size of its x value (note the logarithmic scale of the x axis). Lines represent mean values. Performance on the training set is drawn in blue. Testing the classifier on data it ‘has already seen’ in the training phase gives an upper bound of what can be expected, assuming that the data is consistently annotated throughout the sets. Performance on the test set is drawn in red. The curve shows a steady increase of F-measure with the number of training instances.

A detailed manual error analysis of the first 100 classifier errors in the test set shows that the remaining errors are of very different types. In particular, there are 6 annotation errors (see Example (119)) and 6 cases of nested annotations where the opening tags carrying the referent’s id are in the wrong order (see Example (120); errors marked with an asterisk ‘*’).

(119) Limited volume ahead of the September trade data showed the market is nervous, but dealers added that the day’s modest gains also signaled some support1 for London equities. They pegged the support∗(missing: 1) largely to anticipation that Britain’s current account imbalance can’t be much worse than the near record deficits seen in July and August.

(120) In Tokyo, the Nikkei index1 added 99.14 to 35585.52. [...] On Monday, traders noted that some investors took profits against the backdrop of the Nikkei’s1 fast-paced recovery following its∗2 plunge last Monday∗1. [...] Traders said the thin trading volume points to continued uncertainty by most investors following last Monday’s record 13% loss2.

Some instances are especially hard to classify for the following reasons: 12 instances are aliases of a name mentioned previously, 7 contain additional information on the referent, and 4 instances are parts of idiomatic expressions (Example (121)). Another 4 instances occur in or relate to referents mentioned in direct speech, and 3 refer back to referents in subheadings. 3 instances are presupposition anaphors (Example (122)), 2 are aggregations of referents mentioned previously, and 2 are cataphors.

(121) NBC ’s Mr. Wright led the way in decrying the networks’ inability to match a Time-Warner combination. (122) Studios are “powerless” to get shows in prime-time lineups and keep them there long enough to go into lucrative rerun sales, he contends. And that’s why the rules, for the most part, must stay in place, he says.

The remaining 63 instances fit in none of these classes; there is no obvious reason for their misclassification. It is striking, however, that 50 of the 100 inspected misclassified instances start with a definite determiner13, whereas this is the case for only 19% of NPs in the corpus in general. 12 of the inspected instances are pronouns, whereas in the corpus in general, less than 6% of NPs are pronouns. As OntoNotes does not claim perfection – Hovy et al. (2006) is subtitled “the 90% solution” – I manually inspected 600 randomly drawn instances. There were only 21 (3.5%) errors with respect to the discourse-givenness of the NP. Whether or not the NP was linked to the correct antecedent was disregarded. From this, we can estimate that the actual error rate for discourse-givenness in the corpus lies between 2% and 5% (at 95% confidence).14 This is substantially better than the claim made for coreference annotation.
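The confidence interval reported here is an exact binomial (Clopper-Pearson) interval, as computed by R's binom.test (see footnote 14); a small sketch of the same calculation, assuming SciPy is available, is:

```python
from scipy.stats import beta

def clopper_pearson(errors, n, confidence=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion,
    equivalent to the interval reported by R's binom.test."""
    alpha = 1.0 - confidence
    lower = beta.ppf(alpha / 2, errors, n - errors + 1) if errors > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, errors + 1, n - errors) if errors < n else 1.0
    return lower, upper

# 21 annotation errors in 600 inspected instances:
print(clopper_pearson(21, 600))  # approximately (0.022, 0.053), i.e. between 2% and 5%
```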

13Of these, 37 are +discourse-given and 13 -discourse-given. Most of the latter are related to the context via bridging.



5.3.1.2 MUC-7

MUC-7 is about 10% of the size of OntoNotes 1.0. The data does not contain syntax annotation; therefore, Charniak’s (2000) parser was applied to it. Since the parser’s output does not provide grammatical functions, the feature ‘grammatical function’ is not available for this data. During the mapping of the coreference annotation to NPs, determiners outside the annotation span (see the discussion in Section 3.2.4) were included in the span where necessary. In the MUC-7 corpus, around 25% of NPs are discourse-given (see Table 5.7 for the class distribution).

+discourse-given   2,499 (25.08%)
-discourse-given   7,465 (74.92%)
total              9,963 (100.00%)

Table 5.7: MUC-7: Class Distribution

As shared task data, MUC-7 is originally split into three sets: a training set (‘Train’), a set for a dryrun (‘Dryrun’), and a set for the formal evaluation (‘Formaleval’). Table 5.8 gives the more detailed class distributions in the respective sets. There are significant differences between the training and the other two sets with respect to class distribution.15

Set            #NPs +discourse-given   #NPs -discourse-given   #NPs in total
Train            156 (15.62%)            843 (84.38%)            999
Dryrun         1,401 (26.00%)          3,987 (74.00%)          5,388
Formaleval       942 (26.33%)          2,635 (73.67%)          3,577
Train+Dryrun   1,557 (24.38%)          4,830 (75.62%)          6,387

Table 5.8: MUC-7: Class Distribution in Train, Dryrun and Formaleval Set (original split)

In the first set of experiments, this original split was maintained; in a second set, the data was shuffled. The classification results of the first set of experiments are shown in Table 5.10. They include experiments with the settings used by Ng and Cardie (2002) and Uryupina (2009) for better comparability (trained on Dryrun, tested on Formaleval); their results are summarized in Section 4.4 and compared to this work in Section 5.4.2. Note, however, that neither the numbers of NPs nor the class distributions are identical. This may be due to parsing and the mapping of coreference spans to noun phrases. From the results, we observe that recall is remarkably low when training on the Training set only. The Training set is the smallest set; besides that, it differs from the other sets, for instance with respect to the proportion of pronouns (see Table 5.9).16

14The interval is a binomial confidence interval; the test was carried out using R’s binom.test. 15Train vs. Dryrun (χ2=49.32, p<0.0005), Train vs. Formaleval (χ2=49.20, p<0.005). 16Differences between Train and Dryrun, and between Train and Formaleval are highly significant.


Set          pronouns   proportion of NPs
Train        118        11.81%
Dryrun       307        5.70%
Formaleval   204        5.70%

Table 5.9: MUC-7: Proportions of Pronouns in Different Sets

Figure 5.3 gives an excerpt of classification model 4 (J48 decision tree, trained on 80% of Formaleval, using all features). As for the effect of different feature sets, the picture is not clear: while in most settings the addition of local context features does not seem to have a significant effect, it does in the fourth setting (training on Formaleval).

Due to the differences between sets, a second set of experiments was carried out with the data shuffled. The results are shown in Table 5.11. The results of the first two experiments, using 5-fold cross-validation, are comparable to the results on the original split regarding the effect of different feature sets; in particular, the quality of performance is comparable to that of the classifiers trained on the larger sets (Train+Dryrun, Formaleval or Dryrun, see Table 5.10). Also, the results of the two cross-validation settings are comparable, although in the second setting only 36% of the data was used. Again, an experiment with Uryupina’s setting was carried out, using the Formaleval set only and holding out one document at a time for testing (Uryupina, 2003). JRip and SMO models perform slightly (in many cases significantly) lower, but behave similarly with respect to the dependence on the feature sets used.

As to the amount of training data, the classifier of course profits from additional training data from the Dryrun set; in particular, recall increases. A learning curve of the classifier trained on Train and Dryrun, tested on Formaleval, is shown in Figure 5.4. It shows that doubling the number of training instances from 3,200 to 6,400 does not increase the classifier’s performance. A learning curve of the shuffled data is shown in Figure 5.5. This curve shows an increase in performance up to 2,600 instances, and a slight decrease after that. This is probably due to differences between sets. Figure 5.6 again shows the curve based on the shuffled data and, added to this, the curve based on instances from the Formaleval set only (inserted in lighter colors). In comparison, the classifier trained on the Formaleval subset of the data consistently shows higher performance. This again is evidence that Formaleval is a highly consistent subset of the data. It is obvious that the evaluation set of a shared task should be, and is, annotated most carefully.

50 instances misclassified by a classifier using all features, trained on 4 of 5 folds (80%) of the Formaleval set, were manually inspected. This error analysis reveals that the texts in MUC contain harder cases than those in OntoNotes, in particular vague reference (in headlines), identity assertions and metonymy. 6 instances were annotation errors. 3 instances were parts of idiomatic or collocational phrases, 2 were expletives. MUC contains headlines, which give a very short summary of the text (e.g. pilot dies in plane crash). These often contain referring bare nouns, the referents of which are introduced properly only later.
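The ‘hold out one document at a time’ protocol (Uryupina’s 2003 setting) used in the second set of experiments can be outlined as follows; train_fn and predict_fn stand for whatever training and prediction routines are being compared and are purely illustrative.

```python
def hold_one_document_out(documents, train_fn, predict_fn):
    """Evaluate by training on all documents but one and testing on the held-out document.

    documents: list of (feature_dicts, labels) pairs, one pair per document
    train_fn:  callable training a classifier from feature dicts and labels
    predict_fn: callable applying a trained classifier to feature dicts
    Returns gold and predicted labels pooled over all held-out documents.
    """
    gold, predicted = [], []
    for i, (test_X, test_y) in enumerate(documents):
        train_X = [x for j, (xs, _) in enumerate(documents) if j != i for x in xs]
        train_y = [y for j, (_, ys) in enumerate(documents) if j != i for y in ys]
        model = train_fn(train_X, train_y)
        gold.extend(test_y)
        predicted.extend(predict_fn(model, test_X))
    return gold, predicted
```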


exact previous mention <= 0 | mention tfidf <= 0.1192 | | possessive s = 1 | | | mention tfidf <= 0.0269 | | | | DT form = card.: NO (0.0) | | | | DT form = half: NO (0.0) | | | | DT form = the | | | | | n form = NNP | | | | | | maxsimilar mention <= 0.0311: NO (6.0/2.0) | | | | | | maxsimilar mention > 0.0311: coref (4.0) | | | | | n form = NNPS: NO (3.0) | | | | | n form = PRP$: coref (0.0) | | | | | n form = NN: coref (16.0/2.0) | | | | | n form = NNS: coref (3.0) | | | | | n form = na: coref (0.0) ... | | possessive s < 1 | | | phrase form = indef: NO (651.0/56.0) | | | phrase form = def | | | | prev mention time tl <= 0 | | | | | v previous s np <= 0: NO (1059.0/207.0) | | | | | v previous s np > 0: coref (67.0/31.0) | | | | prev mention time tl > 0 | | | | | mother cat = S1: coref (0.0) | | | | | mother cat = S: coref (72.37/15.68) | | | | | mother cat = VP | | | | | | length in tokens <= 3 | | | | | | | maxsimilar mention <= 0.0041: NO (2.07/0.07) | | | | | | | maxsimilar mention > 0.0041: coref (11.29/0.14) | | | | | | length in tokens > 3: NO (2.14/0.14) | | | | | mother cat = SQ: coref (0.0) | | | | | mother cat = PRN: coref (0.0) | | | | | mother cat = ADJP: coref (0.0) | | | | | mother cat = NP: NO (57.89/17.35) ...

Figure 5.3: MUC-7: Decision Tree (excerpt)


training: Train, test: Dryrun
                           acc      P      R      F      class
1. baseline                75.98%   88.5%  8.8%   16.0%  +discourse-given
                                    75.7%  99.6%  86.0%  -discourse-given
2. no local context1∗∗∗    79.08%   78.0%  27.3%  40.4%  +discourse-given
                                    79.2%  97.3%  87.3%  -discourse-given
3. no new1∗∗∗,2∗∗∗         75.30%   97.3%  5.1%   9.8%   +discourse-given
                                    75.0%  99.9%  85.7%  -discourse-given
4. all1∗∗∗,3∗∗∗            78.97%   83.2%  24.0%  37.2%  +discourse-given
                                    78.6%  98.3%  87.4%  -discourse-given
                                    79.8%  79.0%  74.3%  weighted average

training: Train, test: Formaleval
1. baseline                75.17%   75.5%  8.5%   15.3%  +discourse-given
                                    75.2%  99.0%  85.5%  -discourse-given
2. no local context1∗∗∗    74.48%   91.4%  3.4%   6.6%   +discourse-given
                                    74.3%  99.9%  85.2%  -discourse-given
3. no new1∗∗∗,2∗∗∗         79.06%   76.4%  29.6%  42.7%  +discourse-given
                                    79.4%  96.7%  87.2%  -discourse-given
4. all1∗∗∗,3∗∗∗            79.37%   85.7%  26.0%  39.9%  +discourse-given
                                    78.8%  98.4%  87.5%  -discourse-given
                                    80.6%  79.4%  75.0%  weighted average

training: Train+Dryrun, test: Formaleval
1. baseline                82.89%   79.2%  47.6%  59.4%  +discourse-given
                                    83.6%  95.5%  89.2%  -discourse-given
2. no local context1∗∗∗    84.88%   79.6%  57.2%  66.6%  +discourse-given
                                    86.1%  94.8%  90.2%  -discourse-given
3. no new1∗∗,2∗            83.95%   84.3%  48.0%  61.2%  +discourse-given
                                    83.9%  96.8%  89.9%  -discourse-given
4. all1∗∗∗                 84.54%   80.9%  54.0%  64.8%  +discourse-given
                                    85.3%  95.4%  90.1%  -discourse-given
                                    84.2%  84.5%  83.4%  weighted average

training: Formaleval, test: Train+Dryrun
1. baseline                82.84%   72.1%  48.3%  57.8%  +discourse-given
                                    84.9%  94.0%  89.2%  -discourse-given
2. no local context        83.10%   67.6%  58.9%  63.0%  +discourse-given
                                    87.3%  90.9%  89.1%  -discourse-given
3. no new                  83.09%   67.9%  58.1%  62.6%  +discourse-given
                                    87.1%  91.1%  89.1%  -discourse-given
4. all1∗∗,2∗               83.82%   68.7%  61.7%  65.0%  +discourse-given
                                    88.1%  91.0%  89.5%  -discourse-given
                                    83.3%  83.8%  83.5%  weighted average

Table 5.10: MUC-7: Classification Results (original split), part I


training: Dryrun, test: Formaleval (Ng and Cardie’s 2002, Uryupina’s 2009 setting)
                           acc      P      R      F      class
1. baseline                82.97%   79.3%  47.9%  59.7%  +discourse-given
                                    85.9%  92.7%  89.2%  -discourse-given
2. no local context1∗∗∗    84.99%   78.8%  58.8%  67.4%  +discourse-given
                                    86.5%  94.3%  90.3%  -discourse-given
3. no new2∗∗               83.90%   82.3%  49.5%  61.8%  +discourse-given
                                    84.2%  96.2%  89.8%  -discourse-given
4. all1∗∗∗                 84.96%   81.2%  55.8%  66.2%  +discourse-given
                                    85.8%  95.4%  90.3%  -discourse-given
                                    84.6%  85.0%  84.0%  weighted average

Table 5.10: MUC-7: Classification Results (original split), part II

Figure 5.4: MUC-7: Learning Curve (original split, training: Train+Dryrun, test: Formaleval)


                           acc      P      R      F      class
5-fold cross-validation using Train+Dryrun+Formaleval
1. baseline                82.92%   74.8%  48.0%  58.5%  +discourse-given
                                    84.5%  94.6%  89.2%  -discourse-given
2. no local context        85.27%   78.3%  57.1%  66.0%  +discourse-given
                                    86.8%  94.7%  90.6%  -discourse-given
3. no new                  84.32%   77.9%  52.3%  62.6%  +discourse-given
                                    85.6%  95.0%  90.1%  -discourse-given
4. all                     85.33%   80.2%  55.0%  65.3%  +discourse-given
                                    86.4%  95.5%  90.7%  -discourse-given
                                    84.8%  85.3%  84.3%  weighted average

5-fold cross-validation using Formaleval
1. baseline                82.94%   75.0%  47.9%  58.5%  +discourse-given
                                    84.5%  94.7%  89.3%  -discourse-given
2. no local context        85.27%   78.8%  56.4%  65.8%  +discourse-given
                                    86.7%  94.9%  90.6%  -discourse-given
3. no new                  83.39%   74.7%  55.8%  63.9%  +discourse-given
                                    85.5%  93.2%  89.2%  -discourse-given
4. all                     85.44%   79.7%  56.2%  65.9%  +discourse-given
                                    86.7%  95.2%  90.7%  -discourse-given
                                    84.9%  85.4%  84.5%  weighted average

hold out one document at a time using Formaleval (Uryupina’s 2003 setting)
1. baseline                83.04%   77.8%  47.1%  58.0%  +discourse-given
                                    83.5%  95.8%  89.2%  -discourse-given
2. no local context        83.92%   74.6%  55.7%  58.1%  +discourse-given
                                    85.6%  93.9%  89.5%  -discourse-given
3. no new                  83.59%   72.7%  56.4%  62.7%  +discourse-given
                                    85.9%  93.1%  89.3%  -discourse-given
4. all                     84.68%   76.0%  59.1%  66.0%  +discourse-given
                                    86.5%  93.9%  89.9%  -discourse-given
                                    84.4%  84.7%  83.9%  weighted average

Table 5.11: MUC-7: Classification Results (5-fold cross-validation, hold-out)


Figure 5.5: MUC-7: Learning Curve (random split)

Figure 5.6: MUC-7: Learning Curve (random split; all instances vs. Formaleval)


In the error analysis, 2 cases could not be resolved as referring back to a part of the headline. Additional challenges are predications (6 instances), entity-attribute relations (1 instance) and function-value relations (1 instance). Contrary to expectation, the local context features did not in all cases help with the classification of these phenomena. This is probably due to their relatively rare occurrence. Further, there was one case of a presupposition anaphor and one case of metonymy. As in OntoNotes, 21 of the 50 (42%) inspected misclassified instances start with a definite determiner, whereas in general (in the Formaleval set) this is the case for 27% of the instances. 6 of 50 (12%) are pronouns, whereas in general the proportion of pronouns is less than 6%.

5.3.1.3 ARRAU

The ARRAU corpus, version 1.2, is about half the size of OntoNotes, and there is an overlap of 89 documents (texts from the Wall Street Journal), which contain 25,500 NPs (72,593 tokens). 23% of all NPs in ARRAU are discourse-given (see Table 5.12).

+discourse-given   14,381 (23.12%)
-discourse-given   47,828 (76.88%)
total              62,209 (100.00%)

Table 5.12: ARRAU 1.2: Class Distribution

The corpus was randomly split into sets for training, development and testing (see Table 5.13). There are no significant differences in class distribution between the sets.

Set     #NPs +discourse-given   #NPs -discourse-given   #NPs in total
Train   11,505 (23.12%)         38,262 (76.88%)         49,767
Dev      1,438 (23.12%)          4,783 (76.88%)          6,221
Test     1,438 (23.12%)          4,783 (76.88%)          6,221

Table 5.13: ARRAU 1.2: Class Distribution in Training, Development and Test Set (random split)

The classification results of J48 are given in Table 5.14. The baseline classifier performs similarly (measured in F-measure) to how it performs on OntoNotes, even though ARRAU is only half the size of OntoNotes. The additional features, however, do not have a similarly large effect: whereas the newly introduced features significantly improve the classification, the impact of the local context features is positive on the development set but negative on the test set. The learning curve for the classifier using all NPs and all features is shown in Figure 5.7; classification model 4 is shown in Figure 5.8.


training: Train, test: Dev
                           acc      P      R      F      class
1. baseline                82.90%   66.1%  53.3%  59.0%  +discourse-given
                                    86.7%  91.8%  89.2%  -discourse-given
2. no local context1∗∗∗    87.88%   75.0%  63.6%  68.8%  +discourse-given
                                    89.5%  93.6%  91.5%  -discourse-given
3. no new1∗∗∗,2∗∗          86.69%   75.2%  60.4%  67.0%  +discourse-given
                                    88.8%  94.0%  91.3%  -discourse-given
4. all1∗∗∗,3∗∗∗            88.07%   81.0%  63.3%  71.0%  +discourse-given
                                    89.6%  95.5%  92.5%  -discourse-given
                                    87.6%  88.1%  87.5%  weighted average

training: Train, test: Test
1. baseline                83.06%   66.2%  54.5%  59.8%  +discourse-given
                                    87.0%  91.6%  89.3%  -discourse-given
2. no local context1∗∗∗    87.72%   75.8%  68.8%  72.1%  +discourse-given
                                    90.9%  93.4%  92.1%  -discourse-given
3. no new1∗∗∗,2∗∗          86.63%   75.5%  63.7%  69.1%  +discourse-given
                                    89.6%  93.8%  91.6%  -discourse-given
4. all1∗∗∗,3∗∗∗            87.59%   80.8%  60.7%  69.3%  +discourse-given
                                    89.0%  95.7%  92.2%  -discourse-given
                                    87.1%  87.6%  86.9%  weighted average

Table 5.14: ARRAU 1.2: Classification Results (random split, all NPs vs. referring NPs).

Figure 5.7: ARRAU 1.2: Learning Curve


head previous mention <= 0 | length in chars <= 4 | | phrase form = def: NO (547.0/128.0) | | phrase form = pronposs: coref (4.0) | | phrase form = interrog: NO (2.0) | | phrase form = pron ... | | | np suffix 1 = I | | | | 2 pos left = VBN IN: coref (0.0) | | | | 2 pos left = NNP --: coref (0.0) | | | | 2 pos left = PRP VB: NO (2.01) ... | | | | 2 pos left = PRP VBP | | | | | size left context <= 270: NO (2.02) | | | | | size left context > 270: coref (4.0) ... | length in chars > 4 | | phrase form = def: NO (5993.0/600.0) | | phrase form = pronposs: NO (668.0/49.0) | | phrase form = interrog: NO (2.0) | | phrase form = pron | | | length in tokens <= 1: coref (67.0/10.0) | | | length in tokens > 1: NO (23.0/6.0) | | phrase form = na: NO (19348.0/823.0) head previous mention > 0 | all capitalized = yes | | ne type = location: coref (2.0) | | ne type = person: NO (2.0/1.0) | | ne type = organization: coref (97.0/15.0) | | ne type = na: NO (1287.0/21.0) | all capitalized = NO | | ne type = location | | | max tf <= 508 | | | | maxsimilar mention <= 0.0025: NO (136.0/45.0) | | | | maxsimilar mention > 0.0025: coref (423.0/96.0) | | | max tf > 508: coref (107.0/2.0) ... | | | DT form = both | | | | sum tfrel <= 0.2382: coref (14.0/1.0) | | | | sum tfrel > 0.2382: NO (11.0/1.0) | | | DT form = that | | | | maxsimilar mention <= 0.0566: coref (50.0/13.0) | | | | maxsimilar mention > 0.0566 | | | | | size left context <= 303: coref (4.0/1.0) | | | | | size left context > 303: NO (17.0) ...

Figure 5.8: ARRAU 1.2: Decision Tree (excerpt)

50 instances misclassified by classifier 4 were manually inspected. This corpus differs from the other corpora in that it contains transcribed speech. The speech data contains a larger proportion of reference to situationally given objects (e.g. the oranges, the boxcar in the TRAINS dialogues), as well as some vagueness, e.g. And it ends..., where it could either refer to the story (previously mentioned in It starts out there ...) or could be analyzed as a kind of expletive. The referent is unclear in 4 cases. There are 2 annotation errors. 3 expressions are event anaphors, 3 are generic, and another 3 contain additional information on a previously mentioned referent. 2 are temporal expressions; one instance occurred in direct speech, one was contained in an apposition (according to the annotation scheme, only maximal phrases are annotated), and one instance was an idiomatic phrase. The other 30 could not be categorized. There was one interesting case in which two indefinite singular expressions refer to the same object (see Example (123); this example is from the Pear Stories section of ARRAU). In this case, identity of extension definitely holds, and, in my opinion, it also makes sense to classify this as coreference.

(123) [...] Then a kid came along on a bicycle and parked the bicycle and checked the tree to make sure the man wasn’t looking and he was about to steal a pear but he t he he picked up a basket lnstead and went riding along the road on a bicycle And he passed a little girl on a bicycle and turned head and hit a rock and fell over and spilled the pears all over the road. In the meantime some other little kids came along and helped him pick up the pears and brushed him off and that sort of thing and he went on walking and one of them stopped him cause he had forgotten hat And took him hat and he gave them um he gave the three kids each a pear They were walking back in the direction uh toward the man who was picking pears in the pear tree and about the time he came down the pear tree and discovered that a basket of pears was missing

5.3.1.4 TüBa-D/Z

In TüBa-D/Z, relations to the context are labeled, allowing for a fine-grained distinction of coreferential, anaphoric, bound, cataphoric, split antecedent, and instance. I experimented with two settings: in the first, an NP is considered +discourse-given if it is related to the context via one of the following relations: coreferential, anaphoric, or bound; otherwise, it is considered -discourse-given. According to this definition, 15% of the NPs are discourse-given. This setting is used for a rough comparison with the experiments presented above, in particular with OntoNotes, as both OntoNotes and TüBa-D/Z (but not MUC-7 and ARRAU) have specificity as a precondition. In the second setting, an NP is considered +discourse-given only if it is related to the context via the coreferential relation. This holds for less than 9% of the NPs (see Table 5.15 for the class distributions). The second setting corresponds more closely to the strict theoretical definition given in Section 2.3, but also constitutes a more ambitious task.

coreferential       33,324 (8.91%)
anaphoric           21,405 (5.72%)
bound                1,304 (0.35%)
+discourse-given    56,033 (14.98%)
-discourse-given   317,902 (85.02%)
total              373,935 (100.00%)

Table 5.15: TüBa-D/Z 6.0: Class Distribution
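The two groupings of the TüBa-D/Z relation labels into a binary target can be written down as a small mapping; this is a sketch only, and the label strings are assumed to match the names used above.

```python
GIVEN_RELATIONS_BROAD = {"coreferential", "anaphoric", "bound"}   # first setting
GIVEN_RELATIONS_STRICT = {"coreferential"}                        # second setting

def discourse_given(relation_label, strict=False):
    """Map a relation label (or None for NPs without a relation) to the binary target."""
    given = GIVEN_RELATIONS_STRICT if strict else GIVEN_RELATIONS_BROAD
    return "+discourse-given" if relation_label in given else "-discourse-given"
```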


The corpus was randomly split into three sets as shown in Table 5.16.17 There are no significant differences between the respective sets.

Set     #NPs +discourse-given   #NPs -discourse-given   #NPs in total
Train   33,620 (14.98%)         190,741 (85.02%)        224,361
Dev     11,207 (14.99%)          63,580 (85.01%)         74,787
Test    11,206 (14.98%)          63,581 (85.02%)         74,787

Table 5.16: TüBa-D/Z 6.0: Class Distribution in Training, Development and Test Set (random split)

Again, J48 performed best. Results of the J48 classifiers are shown in Table 5.17. Whereas the impact of the newly introduced features is significant, the impact of the local context features is not. The results of the baseline classifier are similar to those on OntoNotes (just above 60% F-measure), even though the training set is approximately twice as large. The results of the classifier using all features are lower (around 73% vs. 76% on OntoNotes). This may be due to linguistic differences between English and German, in particular differences in the construction of compound nouns: in English, compound nouns are written as separate words, whereas in German they are written as one word. See Example (124a) from OntoNotes (and its German translation in b) for a coreferent instance using the same head noun as its compound antecedent but without the modifier. In English, this case would be covered by the feature head previous mention.

(124) a. Hewlett-Packard Co. will announce today a software program1 that allows computers in a network to speed up computing tasks by sending the tasks to each other. Called Task Broker, the program1 acts something like an auction- eer among a group of computers wired together.

b. Hewlett-Packard Co. wird heute ein Softwareprogramm1 vorstellen, das es Computern in einem Netzwerk erlaubt, Rechenprozesse zu beschleunigen, indem sie sich diese gegenseitig zuschicken. Das Programm mit dem Namen Task Broker1 agiert dabei wie eine Art Auktionator in einer Gruppe vernetzter Computer.

To a certain extent, this difference can be covered by the semantic similarity component. It is not fully compensated, however, as good coverage of the vocabulary is harder to obtain for German due to the combinatorial explosion of compound forms. Results of the second setting for the grouping of categories (coreferential vs. all other) are shown in the second half of Table 5.17. This classification task yields lower results in general, as it is a harder task. Here, each feature group has a significant impact. The learning curve is shown in Figure 5.9. It is worth noting that the difference between the performance on the training set and that on the test set is consistently small, smaller than for any other corpus in this study. An excerpt of the model learnt using all features for the first grouping of categories is shown in Figure 5.10, and a corresponding model for the second grouping in Figure 5.11.

17This corpus is the largest and was split into sets with the proportions 60%-20%-20%.


trained on Train, tested on Dev (coreferential/anaphoric/bound vs. all other)
                           acc      P      R      F      class
1. baseline                89.73%   71.5%  52.4%  60.5%  +discourse-given
                                    92.0%  96.3%  94.1%  -discourse-given
2. no local context1∗∗∗    92.90%   89.4%  59.7%  71.6%  +discourse-given
                                    93.3%  98.8%  95.9%  -discourse-given
3. no new1∗∗∗              92.93%   88.6%  60.6%  72.0%  +discourse-given
                                    93.4%  98.6%  96.0%  -discourse-given
4. all1∗∗∗                 92.95%   87.6%  61.7%  72.4%  +discourse-given
                                    93.6%  98.5%  96.0%  -discourse-given
                                    92.7%  92.9%  92.4%  weighted average

trained on Train, tested on Test (coreferential/anaphoric/bound vs. all other)
1. baseline                89.93%   72.1%  53.5%  61.4%  +discourse-given
                                    92.2%  96.4%  94.2%  -discourse-given
2. no local context1∗∗∗    92.88%   87.1%  61.6%  72.2%  +discourse-given
                                    93.6%  98.4%  95.9%  -discourse-given
3. no new1∗∗∗,2∗∗          93.02%   89.0%  61.0%  72.0%  +discourse-given
                                    93.5%  98.7%  96.0%  -discourse-given
4. all1∗∗∗,3∗∗             93.14%   87.9%  62.9%  73.3%  +discourse-given
                                    93.8%  98.5%  96.1%  -discourse-given
                                    92.9%  93.1%  92.7%  weighted average

trained on Train, tested on Dev (coreferential vs. all other)
1. baseline                92.40%   64.8%  32.2%  43.0%  +discourse-given
                                    93.7%  98.3%  95.9%  -discourse-given
2. no local context1∗∗∗    94.20%   80.8%  45.9%  58.5%  +discourse-given
                                    94.9%  98.9%  96.9%  -discourse-given
3. no new1∗∗∗              94.19%   83.8%  43.1%  56.9%  +discourse-given
                                    94.7%  99.2%  96.9%  -discourse-given
4. all1∗∗∗,2∗∗,3∗∗∗        94.32%   82.6%  46.0%  59.1%  +discourse-given
                                    94.9%  99.1%  96.9%  -discourse-given
                                    93.8%  94.3%  93.6%  weighted average

trained on Train, tested on Test (coreferential vs. all other)
1. baseline                92.55%   66.1%  33.6%  44.6%  +discourse-given
                                    93.8%  98.3%  96.0%  -discourse-given
2. no local context1∗∗∗    94.35%   81.4%  47.4%  59.9%  +discourse-given
                                    95.1%  98.9%  97.0%  -discourse-given
3. no new1∗∗∗              94.33%   84.8%  44.3%  58.2%  +discourse-given
                                    94.8%  99.2%  97.0%  -discourse-given
4. all1∗∗∗,2∗∗,3∗∗∗        94.48%   83.8%  47.2%  60.4%  +discourse-given
                                    95.0%  99.1%  97.0%  -discourse-given
                                    94.0%  94.5%  93.8%  weighted average

Table 5.17: TüBa-D/Z 6.0: Classification Results (random split, coreferential/anaphoric/bound and coreferential vs. all other categories)


Figure 5.9: TüBa-D/Z 6.0: Learning Curve

100 of the instances misclassified by the classifier for coreferential/anaphoric/bound vs. all other categories were inspected manually. There were 9 annotation errors (missing coreference relations). 9 instances contained additional information relative to their antecedent, 6 have an entity-attribute relation to their antecedent, 4 expressions consist of a holonym of their antecedent expression, and 2 are synonyms. 3 are short forms of named entities, and 3 are reflexives. Among the errors, there is a disproportionately large number of reflexive pronouns, demonstrative expressions, definite expressions and pronouns.


exact previous mention <= 0 ... | max idf > -1 | | phrase form = indef: NO (5966.89/79.0) | | phrase form = def | | | maxsimilar mention <= 0.0564: NO (39591.7/4041.19) | | | maxsimilar mention > 0.0564 | | | | grammfunc = ONK: NO (0.0) | | | | grammfunc = PRED: NO (7.36/2.08) ... | | | | grammfunc = APP: NO (199.98/0.24) ... | | phrase form = refl: NO (1058.44/172.86) | | phrase form = interrog: NO (163.72/1.95) | | phrase form = pronrel: NO (250.63/86.67) | | phrase form = indefpron: NO (4112.27/40.0) | | phrase form = pron | | | morphology = asf: coref (0.0) | | | morphology = ns*2: NO (0.95) | | | morphology = d**: coref (0.0) | | | morphology = as*1: NO (29.0/8.0) ... | | | | | | | func = APP: NO (59.0/3.0) ... exact previous mention > 0 | n form = PRELAT: coref (0.0) | n form = PRELS: coref (3514.0/38.0) ... | n form = NN | | phrase form = indef: NO (154.35/10.0) | | phrase form = def | | | grammfunc = ONK: NO (0.0) | | | grammfunc = PRED: NO (6.0/1.0) ... | n form = PPER | | morphology = asf: coref (0.0) | | morphology = ns*2 | | | size left context <= 17: NO (10.0) | | | size left context > 17: coref (63.0/10.0) | | morphology = d**: coref (0.0) ...

Figure 5.10: TüBa-D/Z 6.0: Decision Tree (excerpt; coreferential/anaphoric/bound vs. all other categories)


contains ne = yes | exact previous mention <= 0 | | head previous mention <= 0: NO (28285.02/1735.0) | | head previous mention > 0 | | | sum idf <= 87.2244 | | | | n form = PRELAT: coref (0.0) | | | | n form = PRELS: coref (0.0) | | | | n form = PTKZU: coref (0.0) | | | | n form = PTKNEG: coref (0.0) | | | | n form = PIAT: coref (0.0) | | | | n form = NE | | | | | func = ONK: coref (0.0) | | | | | func = PRED: NO (6.0/1.0) | | | | | func = OADVP-MO: coref (0.0) | | | | | func = PREDMODK: coref (0.0) | | | | | func = FOPP-MOD: coref (0.0) | | | | | func = OD-MOD: coref (0.0) | | | | | func = APP: NO (302.74/6.0) ... | exact previous mention > 0 | | func = ONK: coref (0.0) | | func = PRED: coref (16.16/4.16) | | func = OADVP-MO: coref (0.0) | | func = PREDMODK: coref (0.0) | | func = FOPP-MOD: coref (0.0) | | func = OD-MOD: coref (0.0) | | func = APP: NO (702.65/30.0) ... | | func = OD: coref (90.0/1.0) | | func = FOPP: coref (0.0) | | func = KONJ: coref (668.16/132.16) | | func = HD | | | 1 pos left = $(: coref (119.0/58.0) | | | 1 pos left = VMFIN: coref (2.0/1.0) ... contains ne = no | DT form = der | | exact previous mention <= 0 | | | func = ONK: NO (0.0) | | | func = PRED | | | | maxsimilar mention <= 0.038: NO (67.64/1.0) | | | | maxsimilar mention > 0.038: coref (4.36/1.36) ...

Figure 5.11: TüBa-D/Z 6.0: Decision Tree (excerpt; coreferential vs. all other categories)


5.3.2 Discussion

This section discusses the quantitative results presented in the previous section. First, aspects of the machine learning side of the task are discussed, including algorithms, experimental settings and feature sets; then, linguistic and cross-linguistic aspects are discussed.

5.3.2.1 Machine Learning Aspects

Regarding the different algorithms, decision trees seem particularly apt for modelling discourse-givenness. This suggests that there are no strong interactions between features, i.e. it is not a combination of a large number of features that tips the balance toward one or the other class. The finding that C4.5 generally performs a little better on this task than Ripper is consistent with Ng and Cardie (2002).
Another aspect that has proved important is the choice of the training data. Experiments with the same sets of features but different corpora and different methods for drawing subsets of these corpora were carried out. The use of different corpora, of course, yielded different ranges of results due to differences in the annotation schemes. This was also the case with the use of different subsets of MUC-7 (originally split into the sets Training, Dryrun and Formaleval). No differences were found, however, between results from training on data where documents were retained vs. data where instances were randomly distributed across sets (tested on OntoNotes and MUC-7 Formaleval)18. During the development of a classifier, considering each of the experiments in isolation could have led to contradictory decisions about how to proceed, for instance regarding the impact of certain feature groups: the best performing classifier is classifier 2 when training on MUC-7 Train and testing on Dryrun, whereas it is classifier 3 when Formaleval is used for testing instead; when using MUC-7 Formaleval for training and Train+Dryrun for testing, it is classifier 4. In the present work, a comparison of the results under different settings has helped to minimize the effect of overfitting to a particular training set.
The effect of the different feature groups can be summarised as follows: the baseline classifier produces relatively stable results across all corpora (60%±1% F-measure), despite the differences in the size of the training sets. As for the different groups of features, both the local context and the newly introduced features delivered a significant improvement on OntoNotes. On MUC-7, their effect is highly dependent on the choice of a subset for training; here, it is probably advisable to take the results of the cross-validation experiment on all data (Train+Dryrun+Formaleval), assuming that the total amount of data describes the concept of coreference better than any of the subsets. From these results we observe that the newly introduced features contribute to an improvement in classifier performance. In ARRAU, it is also the newly introduced features that are crucial for improving the classification. Finally, this positive effect on classifier performance can also be shown on TüBa-D/Z; this holds for both groupings of categories (coreferential/anaphoric/bound vs. all other, as well as coreferential vs. all other). Thus, the usefulness of the newly introduced features from the areas of semantic similarity and

18The only exception is the relatively low performance of classifier 2 in the hold-out experiment on MUC-7 Formaleval compared to the 5-fold cross-validation experiment, see Table 5.11.


specificity for the classification of discourse-givenness of noun phrases has been shown for all corpora; the usefulness of local context features has been shown for coreference of specific entities in English (specific as defined in OntoNotes).
To get an idea of how similar the models obtained by training are to the human annotation, consider the κ values in Tables 5.18 to 5.21.19

training:             Train   Train   5-fold
testing:              Dev     Test    cross-validation
1. baseline           0.55    0.54    0.55
2. no local context   0.70    0.69    0.71
3. no new             0.69    0.68    0.70
4. all                0.72    0.72    0.73

Table 5.18: OntoNotes 1.0: κ values for classifiers vs. original annotation
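The κ values in Tables 5.18 to 5.21 compare each classifier's output with the original annotation; Cohen's κ for two parallel label sequences can be computed as in the following sketch.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two parallel sequences of categorical labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0
```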

For OntoNotes, the values show consistency increasing with the use of additional features. They reach the level classified as ‘allowing for tentative conclusions’.

                      5-fold cross-validation
                      all sets   Formaleval
1. baseline           0.48       0.48
2. no local context   0.57       0.57
3. no new             0.53       0.57
4. all                0.56       0.57

Table 5.19: MUC-7: κ values for classifiers vs. original annotation

Consistency rises a little with additional features but is generally relatively low.

training:             Train   Train
testing:              Dev     Test
1. baseline           0.48    0.49
2. no local context   0.65    0.64
3. no new             0.60    0.61
4. all                0.64    0.62

Table 5.20: ARRAU 1.2: κ values for classifiers vs. original annotation

In ARRAU, consistency is quite low. In TüBa-D/Z, for the first grouping of categories (coreferential/anaphoric/bound vs. all other categories), consistency reaches the level ‘allowing for tentative conclusions’ for classifiers trained on feature sets 2 to 4. The effect of additional features on consistency is relatively low. In the second grouping (coreferential vs. all other categories), additional features bring a massive improvement, but consistency is still low due to the difficulty of the task.

19For the interpretation of κ values, see Section 3.2.3.2.


                      coreferential/anaphoric/bound    coreferential
training:             Train   Train                    Train   Train
testing:              Dev     Test                     Dev     Test
1. baseline           0.55    0.56                     0.39    0.41
2. no local context   0.68    0.69                     0.56    0.57
3. no new             0.68    0.69                     0.54    0.56
4. all                0.68    0.70                     0.56    0.58

Table 5.21: TüBa-D/Z 6.0: κ values for classifiers vs. original annotation

Inter-annotator agreement provides a measure of what can be expected of a classifier. As mentioned in Section 3.2.3.2, on MUC, agreement is reported to be 84% for precision and recall, which corresponds to 84% F-measure, rising to 91% F-measure if markable identification and coreference linking are carried out in separate steps (Hirschman et al., 1998). On OntoNotes, an average agreement of 91.8% between each annotator and the adjudicated results was obtained for the full coreference task (including antecedent identification). It can be assumed that a substantial proportion of the disagreement is due to disagreement on the antecedent, and that agreement on the discourse-givenness task alone is somewhat higher, in the 90% range. On ARRAU, values around 0.6-0.7 (Krippendorff’s α) were obtained, and for TüBa-D/Z, agreement values of 83-85% F-measure for the full coreference task are reported.
A comparison of accuracy or κ values of the classification models to inter-annotator agreement gives the following picture: on MUC-7, the best classifier yields a weighted F-measure of 84.5%, which is comparable to human agreement (84%). On OntoNotes, accuracy reaches 93%. This is at least not lower than the performance of an average human annotator performing the full coreference task (91.8%). A more precise assessment would be possible with a detailed evaluation of each step and with F-measure values available. Regarding ARRAU, a comparison is not possible (α vs. κ). On TüBa-D/Z, the weighted F-measure of the best classifiers (92-93%) is higher than inter-annotator agreement on the full coreference task (83-85%). In summary, where a comparison is possible, the classifiers’ performance reaches the level of human performance.

5.3.2.2 Linguistic Aspects

Across all corpora, it could be shown that an improvement in classification results can be obtained by using a combination of features modelling semantic similarity and ontological specificity. Local context features have been shown to have a positive effect in the case of OntoNotes, where discourse-givenness is restricted to noun phrases that are concrete and specific (the definition of specific being based on surface features), but includes bound expressions as well as events given in the form of verbs. For the concept of discourse-givenness (and thus for coreference), this means that there are three components to the concept: explicit marking of specificity/vagueness, conceptual relatedness, and an object's taking certain roles in a discourse. Tendencies toward any of these factors can


be attributed to both the basic data and the definitions in the annotation schemes. News texts, for instance, often report on events with specific agents (usually persons or organizations). The schemes of OntoNotes and TüBa-D/Z concentrate on the marking of specific expressions, whereas in MUC-7 and ARRAU, generic (and in ARRAU also abstract) expressions are accepted for coreference. Coreference in the strict sense, i.e. with specificity (defined in a superficial way) as a precondition, seems easier to model than coreference in the broad sense: while the baseline classifier performs similarly on all corpora (around 60% F-measure irrespective of the size of the training set and of the class distribution), the additional features do not yield the same gains across corpora: on OntoNotes and TüBa-D/Z, both corpora defining specificity as a precondition for coreference, the gain is larger, despite the more skewed class distribution.20
From a practical point of view, I suggest making the following compromises on the definition of coreference: (i) include binding, aggregations, and event anaphors in the annotation schemes and (ii) exclude presupposition anaphors, predication, apposition and function-value relations in order not to complicate the classification task (in other words, a union of the expressions annotated in OntoNotes and TüBa-D/Z, without identity assertions).
As is observable from the error analyses, there are a number of rather infrequent phenomena, like idioms/collocations, expletives (in TüBa-D/Z also reflexives that are not inherent to the verb), coreference across boundaries of direct speech, aliases of names, metonymy, as well as shortened forms of referring expressions in headlines. The first two, idioms/collocations and expletives/reflexives, could be approached with distributional methods, i.e. using the main verb (and its other arguments, where present) to decide on the referentiality of an expression. Remaining challenges include cataphors and entity-attribute relations. Expressions generally do not contain an explicit marking of whether they use the concepts they are composed of in a restrictive or an attributive way. This is what I consider the main challenge in the study of reference and coreference.
As is observable from the comparison between English and German, some of the features are not portable without adaptation, e.g. exact previous mention (an issue that can be solved with the help of lemmatization, e.g. using TreeTagger (Schmid, 1994)), and in particular the local context features. This is probably due to differences in the realization of tense. A more sophisticated operationalization might help overcome this gap.

5.4 Comparison to Related Work

Table 4.4 in Section 4.4 gives an overview of results from related work; in this section, they are compared to my results.

20Using the learning curve data, the training set size can be taken into account: compared to the MUC-7 classifiers from the 5-fold cross-validation experiment (using around 5,000 instances, yielding 65% F-measure), a classifier trained on OntoNotes yields over 70%, and a classifier trained on TüBa-D/Z yields 66% using 5,000 instances. Compared to ARRAU (around 50,000 training instances), a classifier trained on OntoNotes yields over 74%, and one trained on TüBa-D/Z also yields around 70%.


5.4.1 OntoNotes

There is one previous study on OntoNotes, Markert et al. (2012). This study presents a three-way categorisation (old, mediated, new) after having excluded non-referring expressions. They reuse OntoNotes’ original coreference annotation and add annotation according to Nissim’s (2004) annotation scheme. According to my calculations, OntoNotes originally contains a proportion of discourse-given instances of 15.78% of all NPs (including non-referring ones). Markert et al. (2012), after having excluded non-referring expressions, and with adaptations to the annotation21, report proportions of 29.48% for old, 33.77% for mediated and 36.75% for new instances on a subset of 50 texts from the corpus (approx. 11,000 referring expressions). Their classifier reaches 85% F-measure, but the results are not directly comparable.

5.4.2 MUC-7

Studies on MUC-7 include Ng and Cardie (2002) and Uryupina (2003, 2009). Ng and Cardie (2002) find a proportion of 26.8% of ‘old’ instances, which is roughly comparable to the proportion I calculated (25.08%). Uryupina (2003) finds 29% of ‘old’ instances in the Formaleval set; in her 2009 study, the ‘old’ instances make up approximately 34% of NPs in the Dryrun and in the Formaleval set. Differences are probably due to parsing and the mapping of coreference spans to noun phrases. Because of these differences, a direct comparison to Uryupina’s (2009) results is not entirely unproblematic. Table 5.2222 shows the results already presented above. Unfortunately, Ng and Cardie only report their classifier’s accuracy, as their study is part of a larger work on coreference resolution.
It is noticeable that the tradeoff between accuracy and F-measure (similar values for accuracy, different values for the F-measure of the smaller class) differs between Uryupina’s (2009) work and this work. I attribute this to the differences in the original class distributions (the proportion of discourse-given instances in the set Dryrun+Formaleval is 34% according to Uryupina (2009), 26.13% in this work). The weighted average values of F-measure, however, are very similar (Uryupina 84.74%, this work 84.00%). In both settings, the classifiers are trained on a relatively small number of instances (around 5,000 in the first setting, around 3,000 in the second). In the first setting (where values are averaged), the standard deviation in this work is 15% of the F-measure of the smaller class for classifiers 1 and 3, 13% for classifier 2 and 11% for classifier 4. Given this deviation, conclusions drawn from results on such small data sets should be treated with caution.

5.4.3 Related Work in General

Unfortunately, a comparison to related work is difficult: the tasks are very similar, but not identical. Most of the work on the classification of information status assumes that non-referring expressions have been excluded beforehand.

21According to Nissim’s annotation scheme, situationally given expressions are also annotated as old. 22Numbers marked with asterisks ‘*’ represent values calculated a posteriori.


                           acc       P        R        F        class
hold out one document at a time (Formaleval)
Uryupina (2003)            ø81.1%*   ø65.8%*  ø73.4%*  ø69.4%*  -discourse-new
                                     ø88.5%   ø84.3%   ø86.3%   +discourse-new
this work
1. baseline                83.04%    77.8%    47.1%    58.0%    +discourse-given
                                     83.5%    95.8%    89.2%    -discourse-given
2. no local context        83.92%    74.6%    55.7%    58.1%    +discourse-given
                                     85.6%    93.9%    89.5%    -discourse-given
3. no new                  83.59%    72.7%    56.4%    62.7%    +discourse-given
                                     85.9%    93.1%    89.3%    -discourse-given
4. all                     84.68%    76.0%    59.1%    66.0%    +discourse-given
                                     86.5%    93.9%    89.9%    -discourse-given

trained on Dryrun, tested on Formaleval
Ng and Cardie (2002)       84.0%                                +discourse-given
                                                                -discourse-given
Uryupina (2009)            84.4%*    71.6%*   88.7%*   79.2%*   -discourse-new
                                     93.5%    82.3%    87.6%    +discourse-new
this work
1. baseline                82.97%    79.3%    47.9%    59.7%    +discourse-given
                                     85.9%    92.7%    89.2%    -discourse-given
2. no local context1∗∗∗    84.99%    78.8%    58.8%    67.4%    +discourse-given
                                     86.5%    94.3%    90.3%    -discourse-given
3. no new2∗∗               83.90%    82.3%    49.5%    61.8%    +discourse-given
                                     84.2%    96.2%    89.8%    -discourse-given
4. all1∗∗∗                 84.96%    81.2%    55.8%    66.2%    +discourse-given
                                     85.8%    95.4%    90.3%    -discourse-given

Table 5.22: MUC-7: Comparison of Classification Results


For a rough comparison of these works with my experiments, Markert et al.'s results for an information status classification model (trained on 9 of 10 folds of around 10,000 NPs from OntoNotes) can be juxtaposed with my results for the discourse-givenness classification model trained on around 31,000 NPs from ARRAU (the basic data of which largely overlaps with OntoNotes). Whereas Markert et al.'s model reaches around 85% f-measure for the class old and 79% accuracy (see Section 4.4), my model reaches around 79% f-measure for the class discourse-given and 85% accuracy (see Section 5.3.1.3 for more details). A more meaningful comparison, focussing on the overlapping data, remains to be done.
The main differences from work on the classification of discourse-givenness lie in the experimental setup: Uryupina's and Ng and Cardie's work differs in preprocessing steps such as parsing and mapping and, as a result, in the sizes of the training sets. Cahill and Riester's work differs in that they make use of coreference information (automatically detected or manually annotated gold standard). In contrast to previous studies, in the present work feature sets have been tested on several corpora, and their impact can be related to the concepts annotated.

Chapter 6

Conclusions and Outlook

In the first part of this work, the concept of discourse-givenness was examined from a theoretical perspective. In the second part, several resources, each of them instantiating a different definition, were used to build computational models of discourse-givenness. Applying the same methods to all resources and analyzing the respective results makes it possible to relate the methods to the annotation definitions of the respective resources. From this, conclusions can be drawn that might be fed back into the advancement of theoretical definitions. These conclusions are presented in the following.

6.1 Conclusions

From the analysis of the theoretical work in Chapter 2, it becomes obvious that the concepts of discourse-givenness and coreference are not yet at a stage of being exhaustively defined or even algorithmized,¹ mainly due to open issues in the definition of reference (see Section 2.2). Yet, the analysis of resources annotated with these concepts (Chapter 3) shows a general consensus that a subset with a clearer-cut definition exists, namely coreference presupposing specific reference.² An overview of related work is given in Chapter 4, showing some obstacles in the way of comparability of approaches. In Chapter 5, I introduced features representing (i) how specific (or abstract, respectively) an expression is, (ii) how similar the single words contained in the expression are to the words in the previous context, and (iii) hints the (local) context provides on the expression's discourse-givenness. Features representing properties (i) and (ii) proved useful on all corpora. Features representing property (iii) proved useful for distinguishing specific coreferent expressions from other NPs in English (see Section 5.3.1.1 for more details). In general, the concept of specific coreference was learnt better by the classifiers than the concept of coreference including generic and more abstract reference. The distinction of bound vs. coreferential expressions (experiments on TüBa-D/Z) remains a challenge.
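As an illustration of the kind of computation behind features (i) and (ii), the following self-contained sketch approximates an expression's specificity by a concept-depth lookup and its similarity to the previous context by token overlap; the depth table and the whitespace tokenisation are placeholders standing in for the resource-based features used in the experiments.

    # Minimal sketch of features (i) and (ii): ontological specificity and lexical
    # similarity to the previous context. TOY_DEPTHS and the whitespace tokenisation
    # are placeholders; they only illustrate the shape of the computation.

    TOY_DEPTHS = {"entity": 1, "organisation": 3, "company": 4, "airline": 5}

    def specificity(head_lemma):
        # Deeper (more specific) concepts get higher values; unknown words a default.
        return TOY_DEPTHS.get(head_lemma, 2)

    def context_overlap(np_tokens, context_tokens):
        # Proportion of the NP's tokens that already occurred in the previous context.
        np_set, ctx_set = set(np_tokens), set(context_tokens)
        return len(np_set & ctx_set) / len(np_set) if np_set else 0.0

    if __name__ == "__main__":
        context = "the airline reported record losses last year".split()
        np = "the airline".split()
        print(specificity("airline"), round(context_overlap(np, context), 2))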

¹ Attempts, including Heim and Kratzer (1998) (from p. 261) and Büring (2005), mainly focus on pronouns.
² Focussing on a clear-cut subset is practiced for instance in the ACE corpus (see Section 3.1.1), which only contains coreference for certain predefined entities. This corpus, however, includes predication (see Section 2.3.4) for reasons of the application it is geared to.


For the theory, this means that reference (in particular, non-specific reference) needs to be defined more precisely. In particular, the open issues are: which kinds of expressions may refer? How do we cope with possible worlds? How do we account for the context implicitly narrowing the referent set (see Example (125), repeated from (33); the NP the nation narrows Highway officials to U.S. highway officials here; it could even be argued that Everyone is narrowed to everyone in (or from) the U.S.)? How do we determine the scope of generalizations? (See Section 2.2 for a discussion of these issues.)

(125) Everyone agrees that most of the nation’s old bridges need to be repaired or replaced. But there’s disagreement over how to do it. Highway officials insist the ornamental railings on older bridges aren’t strong enough to prevent vehicles from crashing through.

In coreference, the open issues are: how do we analyze discourses referring to different aspects of the same object? How do we analyze discourses referring to different (temporal) stages of the same object? Does coreference hold between generic NPs? (See Section 2.3 for a discussion of these issues.)
Linguistic diagnostics for reference, specificity and coreference are still a desideratum, as can be seen from the operationalizations in the annotation guidelines (in particular, the specific/generic distinction, with the effect that binding and coreference are practically conflated); see Chapter 3. Based on the finding that the local context helps in determining specific coreference in English, diagnostics could be constructed using the local context (e.g. the tense of the main verb in comparison to previous sentences, continuity of meaning after a change of tense, etc.).
Finally, it seems that in coreference resolution, a nominal referring expression's informational content has been considered sufficient for resolving what it refers to. This might have to be reconsidered.

6.2 Outlook

In this section, lines of further research will be sketched. In particular, these include possible improvements to the current system for discourse-givenness classification, general suggestions for classification approaches in the field of discourse-givenness and information status, as well as methods from neighboring disciplines that might help to gain new insights on the concept of discourse-givenness, its cognitive processing and acquisition.
The existing classification system could be improved with more sophisticated operationalizations of the verb tense features (currently, suffixes of verbs are used). A rule-based preprocessing step determining the tense of the clause's main verb is likely to yield more precise results, though at higher cost (manual effort for the declaration of rules and computation time); a sketch of such a step follows below. Another improvement could be made with the use of selected verb lemmas that tend to subcategorize expletives (or reflexives), or with recurrent idioms and phrases (e.g. lead the way, have the option, be the subject). For different modalities (written vs. spoken data, monolog vs. dialog), separate models should be trained.
For distinguishing bound expressions from coreferential expressions (see Section 5.3.1.4, Table 5.17 for a preliminary study), the additional use of an expression's neighbor expressions and their features seems indicated.
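A minimal sketch of what such a rule-based tense step might look like for English, operating on Penn-Treebank-style part-of-speech tags; the rules are deliberately coarse and only meant to illustrate the preprocessing suggested above, not the system's actual implementation.

    # Minimal sketch of a rule-based tense heuristic for an English clause, given
    # (token, POS) pairs with Penn-Treebank-style tags. Deliberately coarse; a real
    # preprocessing step would need finer rules and clause segmentation.

    def clause_tense(tagged_tokens):
        tags = [pos for _, pos in tagged_tokens]
        words = [w.lower() for w, _ in tagged_tokens]
        if "MD" in tags:                                   # modal auxiliary present
            return "modal/future"
        if any(w in ("has", "have", "had") for w in words) and "VBN" in tags:
            return "perfect"
        if "VBD" in tags:                                  # simple past verb form
            return "past"
        if "VBZ" in tags or "VBP" in tags:                 # present-tense verb forms
            return "present"
        return "unknown"

    if __name__ == "__main__":
        clause = [("Officials", "NNS"), ("insisted", "VBD"), ("on", "IN"), ("repairs", "NNS")]
        print(clause_tense(clause))  # -> past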


Regarding the classification of discourse-givenness and information status in general, it seems natural to test a cascade architecture (see the sketch at the end of this section). In the first step, non-referring expressions are excluded. For this step, ARRAU could be used as training material, as it contains referentiality annotation. In the second step, new (referring) expressions are excluded. The resulting expressions can be passed to a system for coreference resolution. Evidence that breaking down the task might be successful is provided by Hirschman et al. (1998), who observed an increase in human inter-annotator agreement after breaking down the task into a first step of markable identification and a second step of coreference linking (cf. also Goecke et al. (2007)).
Aiming at a better understanding of discourse-givenness in general, experiments in the neighboring fields of psycholinguistics and language acquisition could help to gain insights into the following questions: There seem to be two concurrent strategies for resolving the reference of locational and temporal expressions: deictic (corresponding to instant accommodation) and coreferent. Does deixis/accommodation overrule coreference, i.e. are temporal and locational expressions perceived primarily as deictic or as coreferent? Do temporal and locational referents have a shorter span of activation than other referents? The fact that some of the annotation errors found during the error analyses were locational or temporal expressions (last year, next year, the state, etc.), as well as the fact that in many studies on inter-annotator agreement (see Section 3.2.3.2) such expressions were explicitly excluded, suggests that these matters need further investigation.
Studies on the cognitive processing and the acquisition of coreference resolution (both in children and in second language learners) could provide clues for an algorithmization (or at least for an ordered set of heuristics) of coreference resolution. In particular, what cues do humans use to determine whether an expression is specific or generic? (Eye-tracking could be used to reconstruct which parts of sentences are used during the decision process.) Do humans perceive repeated reference to the same kind as coreference? What are the exact conditions for entity-attribute relations (e.g. can an entity be taken up using one of its attributes several sentences after its mention; is it necessary that the lexical content of the expressions be related)? From which age can a human differentiate expressions that are bound from expressions that are coreferential? When does a human acquire the ability to resolve entity-attribute relations (as opposed to pronominal coreference)? Are there differences between languages with respect to the syntactic conditions under which these relations can occur?
With these issues resolved, advancements in the classification of discourse-givenness and information status, as well as in coreference resolution, can be expected.
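The cascade mentioned above might be organised as in the following sketch; the two filter classifiers are stand-ins (plain callables here) for models trained, for instance, on ARRAU's referentiality annotation and on the discourse-given/new distinction.

    # Minimal sketch of the suggested cascade. The two classifiers are stand-ins;
    # only the control flow is shown: filter non-referring NPs, then discourse-new
    # NPs, and hand the remainder to a coreference resolution component.

    def cascade(noun_phrases, is_referring, is_discourse_given):
        referring = [np for np in noun_phrases if is_referring(np)]
        return [np for np in referring if is_discourse_given(np)]

    if __name__ == "__main__":
        nps = ["it (expletive)", "a new bridge", "the bridge"]
        candidates = cascade(
            nps,
            is_referring=lambda np: "expletive" not in np,       # placeholder model
            is_discourse_given=lambda np: np.startswith("the"),  # placeholder model
        )
        print(candidates)  # -> ['the bridge']; these would go to coreference resolution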


Chapter 7

Summaries

This chapter provides a short summary of each chapter. An index is provided at the end of the document to facilitate targeted look-up of selected passages.

7.1 Summary in English

Introduction

A central component in text understanding is coreference resolution, i.e. finding those expressions in a text that relate to the same object in the world. One part of coreference resolution is the distinction of discourse-given noun phrases. A noun phrase is discourse-given if it refers to something mentioned in the previous context. In the work at hand, concepts like reference, coreference, discourse-givenness and information status are defined based on Discourse Representation Theory (DRT, Kamp and Reyle (1993)). The goal of this work is to build a theoretically well-founded computational model of discourse-givenness for English and German using Machine Learning methods. Possible applications include automatic coreference resolution, speech synthesis, and deeper linguistic discourse analysis.

Theoretical Background

Reference, Coreference, Discourse-givenness and Information Status are concepts that have been widely discussed in the literature. However, there is disagreement on the definition criteria as well as on the terminology, in particular regarding the concept of reference. Formal definitions are available only for some of these concepts. Criteria for the definition of reference, once postulated, had to be given up (the identifiability of the referent by the speaker, for instance, or the existential presupposition of referring expressions, see von Heusinger (2002)). Reference in its broader sense includes reference to kinds or abstract concepts, while in its strict sense, this is excluded. Expressions referring in the strict sense are also termed specific or definite. These terms, however, are used in their strict sense to refer to noun phrases with a definite determiner, proper names and non-bound pronouns. In this work, the definitions discussed in Bach (1987) are taken up and formalised.


Open questions in the identification and distinction of referring expressions are pointed out with the help of examples, for instance a delimitation of potentially referring expressions by means of their syntactic category (Chomsky, 1965), or the issue of possible worlds (with the consequence that the recipient cannot be sure about the world of evaluation). A simplified version of Bach's definition of reference – reference as a function – is used in definitions of coreference as a relation between expressions with identical referents (Kamp and Reyle, 1993; Kibble and van Deemter, 2000). Discourse-givenness is then defined as a constituent's having a coreferent expression in the context preceding it (Riester, 2009).
These definitions in conjunction allow for a representation of the content of a discourse. This content, however, discloses itself only after the interpretation of the discourse. The interpretation is partly dependent on world knowledge. An algorithm for the interpretation process, for identifying referring expressions and resolving coreference, remains a desideratum.
Consulting DRT as a formalisation framework, however, serves the purpose of distinguishing coreference from other phenomena. In particular, it is a necessary precondition for discourse-givenness that a variable has been introduced into the main DRS. Only there is it available for being “docked on” by another referent's variable using the identity relation. If, for postulating identity, such a variable has to be generated (for instance, by summation), the relation is not a coreference relation.
Some of the relations similar to coreference are subsumed under categories that form part of the concept of information status. Information status includes the categories discourse-given, inferable (also called accessible or mediated, meaning situatively or indirectly given, for instance via bridging, summation etc.) and new (any other expression, except non-referring expressions).
As a result, this chapter contains definitions of reference, coreference and discourse-givenness that are more precise than in previous approaches.
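Stated schematically (my notation, simplifying the definitions just summarised: ref is the reference function, D the discourse, and ≺ linear precedence):

    coref(α, β)   iff   ref(α) = ref(β)
    given(α, D)   iff   there is an expression β in D with β ≺ α and coref(α, β)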

Existing Resources

Machine Learning models need large amounts of training instances. This training data can be found in existing corpora annotated with coreference and/or information status. After having surveyed the theoretical side, I investigated how the concepts have been implemented in practice. The following corpora are reviewed in detail: for English, the Message Understanding Conference 7 corpus (MUC-7), OntoNotes, Nissim et al.'s (2004) annotations of the Switchboard corpus, and ARRAU; for German, the Potsdam Commentary Corpus (PCC), the Tübingen Treebank (TüBa-D/Z) and the Discourse Information Radio News Database for Linguistic analysis (DIRNDL). The analysis makes evident that the annotation schemes pursue different paths with respect to phenomena like coreference between generic expressions, as well as event and proposition anaphors. Also, some of the schemes make simplifications for the sake of consistency of annotation, for instance dropping the distinction between coreference and binding.

Related Work

There are about a dozen related studies, many of them based on MUC or its successor ACE. Most studies are concerned with finding useful features.


New features have been sought mainly in the areas of syntactic form and structure, as well as semantic relatedness to preceding instances or agreement with respect to the semantic class. As is the case with many tasks, discourse-givenness classification relies in large part on rather simple features, for instance ‘head previous mention’, a feature counting the number of times the NP's head lemma has been used in its previous context. Improvements are thus to be expected from a number of smaller refinements rather than from one measure capturing all instances.
Classification algorithms repeatedly used include decision trees and support vector machines, among others. Recently, sequential classification methods have been used, which also take an instance's neighbors into account.
Direct comparisons are difficult due to differences in the annotation schemes and the sizes of training sets. Even works on the same corpus are hard to compare due to differences in preprocessing (MUC, for instance, does not include noun phrase boundaries). In some studies (Nissim (2006) and followers), non-referring expressions are excluded in advance, which simplifies, or at least changes, the classification task to a certain extent.
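To make the ‘head previous mention’ feature mentioned above concrete, a minimal sketch (lower-cased tokens stand in for lemmas; a real implementation would lemmatise both the head and the context):

    # Minimal sketch of the 'head previous mention' feature: count how often the
    # NP's head lemma has already occurred in the preceding context.

    def head_previous_mention(head_lemma, previous_tokens):
        head = head_lemma.lower()
        return sum(1 for token in previous_tokens if token.lower() == head)

    if __name__ == "__main__":
        context = "The bridge was closed . Officials said the bridge was unsafe .".split()
        print(head_previous_mention("bridge", context))  # -> 2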

Discourse-Givenness Classification: New Experiments and Results

In the present work, the English corpora MUC-7, OntoNotes 1.0 and ARRAU and the German corpus TüBa-D/Z are used to train classifiers for discourse-givenness. The class, +discourse-given or -discourse-given, is defined by the original coreference annotation of each of the corpora. For the modelling, features from the literature are adapted on the one hand; on the other hand, new features are added from the following areas: semantic similarity between expressions, ontological specificity¹ of the concepts used in the respective noun phrase, and the local context of the respective noun phrase. The resulting classifier is compared to a baseline classifier, which uses features comparable to Nissim (2006). Across all corpora, the additional features measuring specificity and similarity to previously mentioned entities have significantly contributed to the improvement of classification results, in particular to a rise in recall (i.e., of the discourse-given entities, a higher proportion can be found with the new features). Features describing the local context of a noun phrase proved helpful in the distinction of specific discourse-given expressions in OntoNotes. In general, the classifiers reach the level of performance of human annotators where a comparison is possible.

Conclusions and Outlook

Given that better results were achieved for specific coreference (OntoNotes and TüBa-D/Z) than for coreference in a broader sense (MUC-7 and ARRAU), it can be concluded that clues for coreference between (superficially) non-specific expressions need a more precise definition. A couple of improvements to the system have been proposed, including a refinement of the local context features.

¹ Here, the term specificity is used in the sense of strength of distinction (specific as opposed to common) and is a property of concepts. It is not to be confused with specificity in the sense of semantic definiteness, a property of noun phrases used above (specific as opposed to vague or anonymous).


Further, a cascade model has been suggested, in which non-referring expressions are filtered out in a first pass, discourse-new expressions in a second pass, and the remaining expressions are handed over to a coreference resolution component. Finally, an outline is given of how research on discourse-givenness can make use of methods from neighboring disciplines to answer the questions identified in this work.

7.2 Summary in German (Zusammenfassung in deutscher Sprache)

Es folgt eine kapitelweise Zusammenfassung der vorliegenden Arbeit. Ein Index am Ende des Dokuments soll das gezielte Nachlesen ausgewählter Passagen erleichtern.

Einführung

Einen zentralen Bestandteil des Textverstehens bildet die Koreferenzresolution, also das Identifizieren derjenigen sprachlichen Ausdrücke innerhalb eines Textes, die sich auf dieselben Objekte in der realen Welt beziehen. Eine Komponente davon wiederum bildet das Auffinden diskursgegebener Nominalphrasen, d.h. derjenigen Nominalphrasen, die sich zurückbeziehen auf bereits Erwähntes. In der vorliegenden Arbeit werden die jeweils aufeinander aufbauenden Konzepte Referenz, Koreferenz, Diskursgegebenheit und Informationsstatus definiert, und zwar unter Rückgriff auf Kamp und Reyle's (1993) Diskursrepräsentationstheorie (DRT). Das Ziel der Arbeit ist eine theoretisch fundierte Modellierung von Diskursgegebenheit mit Hilfe von Verfahren des Maschinellen Lernens. Motiviert ist die Modellierung durch mögliche Anwendungen in Systemen zur automatischen Koreferenzresolution, der Sprachsynthese und der linguistischen Diskursanalyse.

Theoretischer Hintergrund

Die Konzepte Referenz, Koreferenz, Diskursgegebenheit und Informationsstatus werden in einer Vielzahl von Veröffentlichungen diskutiert. Allerdings herrscht Uneinigkeit über Definitionskriterien und Terminologie, v.a. bezüglich des grundlegenden Konzeptes Referenz. Formale Definitionen existieren nur zum Teil. Definitionskriterien für Referenz, die zunächst postuliert wurden, wurden wieder aufgegeben, z.B. die Identifizierbarkeit des Referenten durch den Sprecher und die Existenzpräsupposition von referierenden Ausdrücken (von Heusinger, 2002). Während der Term Referenz im weiteren Sinne auch Ausdrücke einschließt, die Arten von Objekten (kinds) oder abstrakte Konzepte beschreiben, sind diese Ausdrücke bei der strikten Definition von Referenz ausgeschlossen. Im strikten Sinne referierende Ausdrücke werden gelegentlich auch als spezifisch oder definit bezeichnet; allerdings werden die letzteren beiden Begriffe wiederum im strikten Sinne verwendet, um Nominalphrasen mit definitem Artikel, Eigennamen und Pronomina mit konkretem Bezug zu bezeichnen. In der vorliegenden Arbeit werden die in Bach (1987) diskutierten Definitionen aufgegriffen und formalisiert.


Anhand von Beispielen werden einige offene Fragen in der Identifikation und Abgrenzung von referierenden Ausdrücken aufgezeigt, z.B. eine Eingrenzung von Ausdrücken, die referieren können, hinsichtlich der syntaktischen Kategorie (Chomsky, 1965), sowie die Problematik möglicher Welten (mit der Folge der Unklarheit über das Auswertungsuniversum für den Rezipienten). Aufbauend auf einer simplifizierten Version der Bachschen Definition von Referenz – der Definition von Referenz als Funktion – wird Koreferenz definiert als eine Relation zwischen Ausdrücken mit identischen Referenten (Kamp and Reyle, 1993; Kibble and van Deemter, 2000). Die Diskursgegebenheit einer Konstituente ist definiert über das Vorhandensein eines koreferenten Ausdrucks im vorangehenden Kontext (Riester, 2009).
Diese Definitionen erlauben zwar die Repräsentation des Gehalts eines Diskurses, der sich aus dessen Interpretation ergibt. Ein Algorithmus für den Interpretationsschritt zur Erkennung und Auflösung von Koreferenz – der z.T. Weltwissen erfordert – bleibt jedoch ein Desiderat.
Nichtsdestotrotz ist das Hinzuziehen der DRT insofern sinnvoll, als diese Formalisierung eine eindeutige Abgrenzung von Koreferenz gegen andere Phänomene ermöglicht: Diskursgegebenheit setzt voraus, dass eine Variable in der Haupt-DRS bereits eingeführt ist, an die eine neue Referentenvariable mit der Identitätsrelation “andocken” kann. Befindet sich diese Variable nicht in der Haupt-DRS oder muss, um Identität herzustellen, erst eine Variable gebildet werden (z.B. durch Summation, d.h. Zusammenfassen mehrerer Einzelreferenten), liegt keine Koreferenz im strikten Sinne vor.
Einige der Koreferenz ähnliche Relationen werden schließlich unter der Kategorie Informationsstatus erfasst. Dieses Konzept beinhaltet die Unterkategorien diskursgegeben, erschließbar (d.h. situativ oder indirekt gegeben, z.B. durch Bridging, Summation etc.) und neu (alle anderen, ausschließlich der nichtreferentiellen).
Im Ergebnis stellt dieses Kapitel eine Präzisierung bisheriger Definitionen von Referenz, Koreferenz und damit auch Diskursgegebenheit dar.

Existierende Ressourcen

Zur Modellierung mittels maschineller Lernverfahren werden größere Mengen von Trainingsdaten benötigt. Hier bietet es sich an, auf existierende Ressourcen zurückzugreifen. In diesem Teil der Arbeit wird analysiert, wie die theoretischen Konzepte bei der Korpusannotation in die Praxis umgesetzt wurden. Insbesondere werden die folgenden Korpora detailliert besprochen: die Korpora der Message Understanding Conference 7 (MUC-7), OntoNotes, Nissims (2004) Annotationen des Switchboard-Korpus und ARRAU fürs Englische, sowie das Potsdamer Kommentarkorpus (PCC), die Tübinger Baumbank Deutsch/Zeitungssprache (TüBa-D/Z) und die Discourse Information Radio News Database for Linguistic analysis (DIRNDL, basierend auf dem Stuttgarter Radionachrichtenkorpus) fürs Deutsche. Bei der Analyse zeigt sich zum einen, dass sehr unterschiedliche Entscheidungen getroffen wurden zu Phänomenen wie z.B. Koreferenz zwischen generischen Ausdrücken, sowie Ereignis- und Propositionsanaphern. Das erschwert Vergleiche zwischen Modellen, die auf diesen unterschiedlichen Daten basieren. Zum anderen zeigt sich, dass zugunsten der Annotationskonsistenz zum Teil Vereinfachungen vorgenommen wurden, z.B. werden nach manchen Annotationsschemata Fälle von Binding als Fälle von Koreferenz behandelt.


Stand der Forschung

Ein großer Teil der Arbeiten zur Klassifikation von Diskursgegebenheit basiert auf dem MUC-Korpus bzw. seinem Nachfolge-Korpus ACE. Die meisten Studien beschäftigen sich mit dem Auffinden geeigneter Merkmale – diese werden vor allem gesucht im Bereich der syntaktischen Form und Struktur, sowie der semantischen Verwandtschaft oder Übereinstimmung hinsichtlich der semantischen Klasse. Wie bei vielen Problemen zeigt sich auch bei Diskursgegebenheit, dass ein großer Teil der Fälle mit relativ simplen Mitteln abgedeckt werden kann, z.B. mit dem Merkmal ‘head previous mention’, das festhält, wie oft das Lemma des Kopfes einer NP im linken Kontext bereits verwendet wurde. Es ist zu erwarten, dass hier mehrere kleine Ergänzungen gemacht werden müssen, um eine Verbesserung zu erzielen.
Häufig verwendete Klassifikationsalgorithmen sind unter anderem Entscheidungsbäume und Support Vector Machines. Neuerdings werden auch sequentielle Methoden eingesetzt; diese berücksichtigen bei der Klassifikation einer Instanz auch die benachbarten Instanzen.
Ein Vergleich der bestehenden Verfahren ist äußerst schwierig: zwischen verschiedenen Korpora unterscheiden sich die Annotationsschemata stark; auch die Anzahl der Trainingsinstanzen ist unterschiedlich. Bei Verfahren, die auf denselben Daten arbeiten, ergeben sich z.T. durch die Vorverarbeitung Unterschiede, z.B. im Falle von MUC, das keine Syntaxannotation enthält und damit keine Nominalphrasengrenzen. In einigen Arbeiten (Nissim (2006) und daran anschließende Arbeiten) werden nichtreferentielle Ausdrücke vorab ausgeschlossen, was zu einer leichten Veränderung der Klassifikationsaufgabe führt.

Neue Experimente und Ergebnisse zur Klassifikation von Diskursgegebenheit

In der vorliegenden Arbeit werden Klassifikatoren für Diskursgegebenheit auf den Korpora MUC-7, OntoNotes 1.0 und ARRAU (fürs Englische) und TüBa-D/Z 6.0 (fürs Deutsche) trainiert, wobei die jeweilige Original-Koreferenzannotation die Zielklasse jeder Nominalphrase (diskursgegeben oder nicht diskursgegeben) definiert. Zur Modellbildung werden einerseits Merkmale, die in der Literatur beschrieben sind, übernommen. Diese werden ergänzt durch neue Merkmale, die sich in folgende drei Bereiche gliedern lassen: Ähnlichkeit zu Ausdrücken im Kontext, ontologische Spezifizität² der in der Nominalphrase verwendeten Konzepte, sowie der lokale Kontext der Nominalphrase. Zum Vergleich wird ein Baseline-Klassifikator herangezogen, der auf Merkmalen basiert, die mit Nissim (2006) vergleichbar sind. Auf allen Korpora konnten durch das Hinzufügen neuer Merkmale aus den Bereichen Spezifizität und Ähnlichkeit zum vorangehenden Kontext Verbesserungen im Vergleich zur Baseline erzielt werden, v.a. im Hinblick auf den Recall diskursgegebener Entitäten (d.h. von den diskursgegebenen Entitäten wurde ein höherer Anteil identifiziert als bisher).

² Der Begriff Spezifizität wird hier im Sinne von Genauigkeit der Charakterisierung verwendet (spezifisch in Abgrenzung zu allgemein). Dabei handelt es sich um eine Eigenschaft von Konzepten; sie ist nicht zu verwechseln mit Spezifizität im oben verwendeten Sinn von semantischer Definitheit als Eigenschaft von Nominalphrasen (spezifisch in Abgrenzung zu vage oder anonym).


Merkmale, die den lokalen Kontext einer Nominalphrase näher charakterisieren, erzielten Verbesserungen bei der Klassifikation von spezifischen diskursgegebenen Ausdrücken in OntoNotes. Insgesamt erreichen die Klassifikatoren Ergebnisse, die sich mit denen menschlicher Annotatoren messen können.

Fazit und Ausblick

Aufgrund der Tatsache, dass bei der Klassifikation spezifischer Koreferenz (auf der Basis von OntoNotes und TüBa-D/Z) bessere Resultate erzielt wurden als bei der Klassifikation von Koreferenz im weiteren Sinne (auf der Basis von MUC-7 und ARRAU), komme ich zu dem Ergebnis, dass Kriterien für Koreferenz zwischen nicht-spezifischen Ausdrücken noch feiner ausgearbeitet werden müssen. Für das bestehende Klassifikationssystem wurden Verbesserungsvorschläge gemacht, z.B. eine differenziertere Version für einige der Merkmale, die den lokalen Kontext beschreiben. Zudem wurde ein Kaskadenmodell skizziert, bei dem im ersten Schritt die nichtreferentiellen Ausdrücke, im zweiten die diskursneuen herausgefiltert werden, sodass schließlich die verbliebenen, diskursgegebenen Ausdrücke an eine Koreferenzresolutionskomponente übergeben werden können. Zum Schluss des Ausblicks wird aufgezeigt, wie die (computer-)linguistische Forschung im Bereich Diskursgegebenheit von den Methoden in angrenzenden Disziplinen wie der Psycholinguistik und der Spracherwerbsforschung profitieren kann, um zu Fortschritten hinsichtlich der aufgezeigten offenen Fragen zu gelangen.


References

Albrecht, Irene, Schr¨oder,Marc, Haber, J¨org,and Seidel, Hans-Peter. 2005. Mixed feelings: Expression of non-basic emotions in a muscle-based talking head. Virtual Reality, 8:201–212, August. Allan, Keith. 2012. Referring to ’what counts as the referent’: a view from linguistics. In Capone, Alessandro, Piparo, Franco Lo, and Carapezza, Marco, editors, Perspectives on and Philosophy. Springer Verlag, Milan. Artstein, Ron and Poesio, Massimo. 2005. Kappa3 = Alpha (or Beta). Technical report, Department of Computer Science, University of Essex, UK. Asher, Nicholas and Lascarides, Alex. 1998. The Semantics and Pragmatics of Presup- position. Journal of Semantics, 15:239–299. Asher, Nicholas and Lascarides, Alex. 2003. Logics of Conversation. Cambridge Univer- sity Press. Asher, Nicholas. 1993. Reference to Abstract Objects in Discourse. Kluwer Academic Publishers, Boston MA. Authorless. 2007. Co-reference Guidelines for English OntoNotes. Version 6.0. Bach, Kent. 1987. Thought and Reference. Clarendon Press, Oxford. Baumann, Stefan and Riester, Arndt. 2012. Referential and Lexical Givenness: Semantic, Prosodic and Cognitive Aspects. Prosody and Meaning (Trends in Linguistics), 25 of Interface Explorations:119–162. Behnke, Heinrich, Bachmann, Friedrich, Fladt, Kuno, and S¨uß,Wilhelm, editors. 1958. Grundz¨ugeder Mathematik. Vandenhoeck & Ruprecht, G¨ottingen. Bergsma, Shane, Lin, Dekang, and Goebel, Randy. 2008. Distributional Identification of Non-Referential Pronouns. In Proceedings of the 46th Annual Meeting of the Associa- tion for Computational Linguistics: Human Language Technologies (ACL2008:HLT), pages 10–18, Columbus, Ohio. Bonafonte, Antonio, Aguilar, Lourdes, Esquerra, Ignasi, Oller, Sergio, and Moreno, Asunci´on. 2009. Recent work on the FESTCAT database for speech synthesis. In Iberian SLTech 2009, pages 131–132. Brants, Sabine, Dipper, Stefanie, Hansen, Silvia, Lezius, Wolfgang, and Smith, George. 2002. The TIGER Treebank. In Hinrichs, Erhard and Simov, Kiril, editors, Proceedings of the First Workshop on and Linguistic Theories, pages 24–41, Sozopol, Bulgaria.


Brants, Thorsten. 2000. Tnt - a statistical part-of-speech tagger. In 6th Applied Natural Language Processing (ANLP ’00), April 29 - May 4, pages 224–231, Seattle, USA. Association for Computational Lingusitics. Brown, Gilian. 1983. Prosodic structure and the given/new distinction. In Cutler, A. and Ladd, D.R., editors, Prosody: Models and measurements, pages 67–77. Springer Verlag, Berlin. B¨uring,Daniel. 2005. Binding Theory. Cambridge Textbooks in Linguistics. Cambridge University Press, Cambridge. B¨uring,Daniel. 2007. Binding. To appear in Patrick Hogan, editor, The Cambridge Encyclopedia of the Language Sciences. Bußmann, Hadumod. 2000. Lexikon der Sprachwissenschaft. Kr¨oner,Stuttgart. Byron, Donna and Gegg-Harrison, Whitney. 2004. Eliminating non-referring noun phrases from coreference resolution. In Proceedings of DAARC, pages 21–26. Byron, Donna K. 2002. Resolving Pronominal Reference to Abstract Entities. In Pro- ceedings of the 2002 annual meeting of the Association for Computational Linguistics (ACL2002), pages 80–87, Philadelphia, PA, USA. Cahill, Aoife and Riester, Arndt. 2012. Automatically Acquiring Fine-Grained Informa- tion Status Distinctions in German. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 232–236, Seoul, South Korea, July. Association for Computational Linguistics. Calhoun, Sasha, Nissim, Malvina, Steedman, Mark, and Brenier, Jason M. 2005. A Framework for Annotating Information Structure in Discourse. In Proceedings of the ACL 2005 Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 45–52, Ann Arbor, Michigan. Association for Computational Linguistics. Carletta, Jean. 1996. Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics, 22(2):249–254. Carlson, Gregory N. 1977a. A Unified Analysis of the English Bare Plural. Linguistics and Philosophy, 1:413–456. Carlson, Gregory N. 1977b. Reference to Kinds in English. Ph.D. thesis, University of Massachusetts, Amherst. Carlson, Gregory N. 1989. The semantic composition of english generic sentences. In Chierchia, Gennaro, Partee, Barbara Hall, and Turner, Raymond, editors, Properties, Types and Meaning: Semantic Issues, pages 167–191. Kluwer, Dordrecht. Cassell, Justine, Vilhj´almsson,Hannes H¨ogni,and Bickmore, Timothy. 2001. BEAT: the Behavior Expression Animation Toolkit. In Proceedings of SIGGRAPH, pages 477–486, New York. ACM. Chafe, Wallace L. 1970. Meaning and the Structure of Language. The University of Chicago Press, Chicago. Chafe, Wallace L. 1976. Givenness, contrastiveness, definiteness, subjects, topics and point of view. In Li, Charles N., editor, Subject and Topic, pages 27–55. Academic


Press, New York. Chafe, Wallace L. 1980. The pear stories: Cognitive, cultural, and linguistic aspects of narrative production. Advances in Discourse Processes, 3:323. (editor). Charniak, Eugene. 2000. A maximum-entropy-inspired parser. In Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguis- tics (NAACL-2000), pages 132–139. Chiarcos, Christian and Krasavina, Olga. 2005. Annotation Guidelines PoCoS Potsdam Coreference Scheme: Core Scheme. ms. (Draft version 0.912). Chiarcos, Christian and Ritz, Julia. 2010. Qualitative and Quantitative Error Analysis in Context. In Proceedings of KONVENS 2010, Saarbr¨ucken, Germany, September. Chinchor, Nancy and Sundheim, Beth. 2003. Message Understanding Conference (MUC) 6. Chinchor, Nancy. 1998. MUC-7 named entity task definition (version 3.5). In Proceedings of the Seventh Message Understanding Conference, Fairfax, Virginia. Science Applica- tions International Corporation. Chinchor, Nancy. 2001. Message Understanding Conference (MUC) 7. Chomsky, Noam. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, Mass. Clark, Herbert H. 1975. Bridging. In Proceedings of the 1975 workshop on Theoretical issues in natural language processing, TINLAP ’75, pages 169–174, Stroudsburg, PA, USA. Association for Computational Linguistics. Cl´ement, Dani`ele.2000. Linguistisches Grundwissen. In Einf¨uhrungin die germanistis- che Linguistik. Westdeutscher Verlag, Wiesbaden, 2 edition. Cohen, Ariel and Erteschik-Shir, Nomi. 2002. Topic, Focus, and the Interpretation of Bare Plurals. Natural Language Semantics, 10(2):125–165. Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37–46. Cohen, William W. 1995. Fast Effective Rule Induction. In Proceedings of the Twelfth International Conference on Machine Learning, pages 115–123. Morgan Kaufmann. Cohen, Ariel. 2001. On the generic use of indefinite singulars. Journal of Semantics, 18(3):183–209. Curran, James R. and Clark, Stephen. 2003. Language independent NER using a max- imum entropy tagger. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, CONLL ’03, pages 164–167, Stroudsburg, PA, USA. Association for Computational Linguistics. Dahl, Osten.¨ 1975. On Generics. Formal Semantics of Natural Language, pages 99–111. Denis, Pascal and Baldridge, Jason. 2007. Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Compu- tational Linguistics; Proceedings of the Main Conference, pages 236–243, Rochester,


New York, April. Association for Computational Linguistics. Dipper, Stefanie and Zinsmeister, Heike. 2010. Towards a standard for annotating ab- stract anaphora. In Proceedings of the LREC workshop on Language Resource and Language Technology Standards, pages 54–59, Valletta, Malta. Doddington, George R., Mitchell, Alexis, Przybocki, Mark, Ramshaw, Lance, Strassell, Stephanie, and Weischedel, Ralph. 2000. The automatic content extraction (ACE) program-tasks, data, and evaluation. In Proceedings of Conference on Language Re- sources and Evaluation (LREC 2004). Donnellan, Keith S. 1966. Reference and Definite Descriptions. The Philosophical Re- view, 75(3):281–304. Eckart, Kerstin, Riester, Arndt, and Schweitzer, Katrin. 2012. A Discourse Information Radio News Database for Linguistic Analysis. In et al., Christian Chiarcos, editor, Linked Data in Linguistics. Springer, Berlin. Elsner, Micha and Charniak, Eugene. 2007. A Generative Discourse-New Model for Text Coherence. Technical Report CS-07-04, Brown University, Providence, RI, USA. En¸c,M¨urvet. 1991. The semantics of specificity. Linguistic Inquiry, 22(1):1–25. Evans, Richard. 2001. Applying machine learning toward an automatic classification of it. Literary and Linguistic Computing, 16(1):45–57. Fabb, Nigel. 1990. The difference between English restrictive and nonrestrictive relative clauses. Journal of Linguistics, 26:57–77. F´ery, Caroline and K¨ugler,Frank. 2008. Pitch accent scaling on given, new and focused constituents in German. Journal of Phonetics, 36:680–703. Finthammer, Marc and Cramer, Irene. 2008. Exploring and Navigating: Tools for Ger- maNet. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco. ELRA, Paris. Frege, Gottlob. 1892. Uber¨ Sinn und Bedeutung. Zeitschrift f¨ur Philosophie und philosophische Kritik, 100(1):25–50. Giv´on,T. 1978. Definiteness and Referentiality. In Greenberg, J., Ferguson, C., and Moravcsik, E., editors, Universals of Human Language, pages 291–330. Stanford Uni- versity Press, Stanford. Godfrey, J., Holliman, E., and McDaniel, J. 1992. SWITCHBOARD: Telephone speech corpus for research and development. In Proceedings of ICASSP-92, pages 517–520. Goecke, Daniela, St¨uhrenberg, Maik, and Holler, Anke. 2007. Koreferenz, Kospez- ifikation und Bridging: Annotationsschema. Technical report, Universit¨at Biele- feld/Universit¨atG¨ottingen.Interne Reports der DFG-Forschergruppe 437 “Texttech- nologische Informationsmodellierung”. G¨otze,Michael, Weskott, Thomas, Endriss, Cornelia, Fiedler, Ines, Hinterwimmer, Ste- fan, Petrova, Svetlana, Schwarz, Anne, Skopeteas, Stavros, and Stoel, Ruben. 2007. Information Structure. In Dipper, Stefanie, G¨otze, Michael, and Skopeteas, Stavros, editors, Information Structure in Cross-Linguistic Corpora: Annotation Guidelines for


Phonology, Morphology, Syntax, Semantics, and Information Structure., pages 147– 187. Universit¨atsverlag of Potsdam, Potsdam. ISIS. Working Papers of the SFB 632. Volume 7. Gross, Derek, Allen, James F., and Traum, David R. 1993. The Trains 91 Dialogues. Technical report, University of Rochester, Rochester, NY, USA. Gundel, Jeanette K., Hedberg, Nancy, and Zacharski, Ron. 1993. Cognitive Status and the Form of Referring Expressions in Discourse. Language, 69(2):274–307. Gundel, Jeanette K. 1988. The role of topic and comment in linguistic theory. Garland Pub., New York. Hamp, Birgit and Feldweg, Helmut. 1997. GermaNet - a Lexical-Semantic Net for Ger- man. In Proceedings of the ACL Workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 9–15. Harabagiu, Sanda M., Bunescu, R˘azvan C., and Maiorano, Steven J. 2001. Text and Knowledge Mining for Coreference Resolution. In Proceedings of NAACL-01, pages 55–62, Pittsburgh, PA. Hasse, Helmut. 1963. Zahlentheorie. Akademie-Verlag, Berlin, 2 edition. Heeman, Peter and Allen, James F. 1995. The Trains 93 Dialogues. Technical report, University of Rochester, Rochester, NY, USA. Heim, Irene and Kratzer, Angelika. 1998. Semantics in . Blackwell textbooks in linguistics. Blackwell publishers, Cambridge (Mass.), Oxford. Heim, Irene. 1982. The Semantics of Definite and Indefinite Noun Phrases. Ph.D. thesis, University of Massachusetts, Amherst. SFB-Papier 73 University of Konstanz. Hemforth, Barbara, Konieczny, Lars, and Scheepers, Christoph. 2000. Syntactic at- tachment and anaphor resolution: The two sides of relative clause attachment. In Crocker, Matthew W., Pickering, Martin J., and Clifton, Charles, editors, Architec- tures and mechanisms for language processing, pages 259–281. Cambridge University Press, Cambridge. Hempelmann, Christian F., Dufty, David, McCarthy, Philip M., Graesser, Arthur C., Cai, Zhiqiang, and McNamara, Danielle S. 2005. Using LSA to automatically identify givenness and newness of noun-phrases in written discourse. In Bara, Bruno, Barsalou, Larry, and Bucciarelli, Monica, editors, Proceedings of the 27th Annual Meetings of the Cognitive Science Society, pages 941–946, Mahwah, NJ: Erlbaum. Herbelot, Aur´elieand Copestake, Anne. 2008. Annotating Genericity: How Do Humans Decide? (A Case Study in Ontology Extraction). The Fruits of Empirical Linguistics, 1. Herbelot, Aur´elie.2011. Underspecified quantification. Ph.D. thesis, University of Cam- bridge, U.K. Hirschberg, Julia and Terken, Jacques M. B. 1993. Deaccentuation and persistence of grammatical function and surface position. In EUROSPEECH. ISCA. Hirschman, Lynette and Chinchor, Nancy. 1997. MUC-7 coreference task definition (ver-


sion 3.0). In Proceedings of the Seventh Message Understanding Conference, Fairfax, Virginia. Science Applications International Corporation. Hirschman, Lynette, Robinson, Patricia, Burger, John D., and Vilain, Marc B. 1998. Automating Coreference: The Role of Annotated Training Data. Technical report, AAAI. Hiyakumoto, Laurie, Prevost, Scott, and Cassell, Justine. 1997. Semantic and Discourse Information for Text-to-Speech Intonation. In Proc. ACL Workshop on Concept-to- Speech Generation. Hobbs, Jerry. 1976. Pronoun Resolution. Technical report, Research Report 76-1, New York: Department of Computer Science, City University of New York. Hobbs, Jerry R. 1993. The generic information extraction system. In Proceedings of the 5th conference on Message understanding, MUC5 ’93, pages 87–91, Stroudsburg, PA, USA. Association for Computational Linguistics. Hovy, Eduard, Marcus, Mitchell, Palmer, Martha, Ramshaw, Lance, and Weischedel, Ralf. 2006. OntoNotes: The 90% Solution. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, pages 57– 60, New York City, USA. Association for Computational Linguistics. Hunt, Earl B., Martin, Janet, and Stone, Philip T. 1966. Experiments in Induction. Academic Press, New York. Joachims, Thorsten. 1999. Making large-Scale SVM Learning Practical. In Sch¨olkopf, B., Burges, C., and Smola, A., editors, Advances in Kernel Methods - Support Vector Learning, pages 169–184. MIT Press. Kabadjov, Mijail Alexandrov. 2007. A Comprehensive Evaluation of Anaphora Resolu- tion and Discourse-new Classification. Ph.D. thesis, University of Essex, U.K. Kamp, Hans and Reyle, Uwe. 1993. From Discourse to Logic. Introduction to Modelthe- oretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer, Dordrecht. Karttunen, Lauri. 1968. What do referential indices refer to? Rand Corporation, Santa Monica, California. Karttunen, Lauri. 1969a. Discourse Referents. International Conference on Computa- tional Linguistics. Karttunen, Lauri. 1969b. Pronouns and variables. In Binnik, Robert I., Green, Geor- gia M., Morgan, Jerry L., and Davidson, Alice, editors, Papers from the Fifth Regional Meeting of the Chicago Linguistic Society, pages 108–116. University of Chicago, USA, Chicago. Karttunen, Lauri. 1974. Presupposition and Linguistic Context. Theoretical Linguistics, 1:181–193. Kennedy, Christopher and Boguraev, Branimir. 1996. Anaphora for everyone: pronom- inal anaphora resoluation without a parser. In Proceedings of the 16th conference on Computational linguistics - Volume 1, COLING ’96, pages 113–118, Stroudsburg, PA,


USA. Association for Computational Linguistics. Kibble, Rodger and van Deemter, Kees. 2000. Coreference Annotation: Whither? In In Proceedings of the 2nd International Conference on Language Resources and Evalua- tion, pages 1281–1286. Kim, Jong-Sun and Evens, Martha W. 1996. Efficient Coreference Resolution for Proper Names in the Wall Street Journal Text. In MAICS 96, Bloomington. King, Ross D., Feng, Cao, and Sutherland, Alistair. 1995. STALOG: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence, 9(3):289–333. Klinger, Roman and Tomanek, Katrin. 2007. Classical Probabilistic Models and Con- ditional Random Fields. Technical report, Dortmund University of Technology. Algo- rithm Engineering Report TR07-2-013. Kolb, Peter. 2008. DISCO: A Multilingual Database of Distributionally Similar Words. In et al., Angelika Storrer, editor, KONVENS 2008 - Erg¨anzungsband: Textressourcen und lexikalisches Wissen, Berlin. Kotsiantis, Sotiris B. 2007. Supervised Machine Learning: A Review of Classification Techniques. Informatica, 31:249–268. Krasavina, Olga and Chiarcos, Christian. 2007. PoCoS: Potsdam Coreference Scheme. In Proceedings of the First Linguistic Annotation Workshop (LAW). Held in conjunction with ACL-2007, pages 156–163, Prague. Krifka, Manfred, Pelletier, Francis J., Carlson, Gregory N., ter Meulen, Alice, Chiercha, Gennaro, and Link, Godehard. 1995. Genericity: An Introduction. In Carlson, Gre- gory N. and Pelletier, Francis J., editors, The Generic Book, pages 1–124. University of Chicago Press, Chicago. Krifka, Manfred. 2004. Bare NPs: Kind-referring, Indefinites, Both, or Neither? In Young, R. B. and Zhou, Y., editors, Proceedings of Semantics and Linguistic Theory (SALT) XIII, University of Washington, Seattle. CLC Publications, Cornell. Krifka, Manfred. 2008. Basic Notions of Information Structure. Acta Linguistica Hun- garica, 55:243–276. Krippendorff, Klaus. 1980. Content Analysis: An Introduction to its Methodology. Sage Publications, Beverly Hills, CA. Lafferty, John, McCallum, Andrew, and Pereira, Fernando C. N. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282–289, San Francisco, CA. Morgan Kaufmann. Lawler, John. 1972. Generic to a Fault. CLS, 8:247–258. Lesgold, Alan, Lajole, Susanne P., Bunzo, Marilyn, and Eggan, Gary. 1992. Sherlock: A Coached Practice Environment for an Electronics Troubleshooting Job. In Larkin, Jill H., Chabay, Ruth W., and Scheftic, Carol, editors, Computer assisted instruction and intelligent tutoring systems. Erlbaum, Hillsdale, NJ.


Lin, Dekang. 1998. An Information-Theoretic Definition of Similarity. In International Conference on Machine Learning, pages 296–304. Louis, Annie, Joshi, Aravind, Prasad, Rashmi, and Nenkova, Ani. 2010. Using entity features to classify implicit discourse relations. In Proceedings of the SIGDIAL 2010 Conference, pages 59–62, Tokyo, Japan, September. Association for Computational Linguistics. Lu, Qing and Getoor, Lise. 2003. Link-based classification. In Proceedings of the 20th International Conference on Machine Learning, pages 496–503, Washington, D.C. L¨udeling,Anke, Ritz, Julia, Stede, Manfred, and Zeldes, Amir. to appear. Corpus Lin- guistics. In F´ery, Caroline and Ishihara, Shinichiro, editors, The Oxford Handbook of Information Structure. Oxford University Press, Oxford. Ludlow, Peter. 2011. Descriptions. The Stanford Encyclopedia of Philosophy (Winter 2011 Edition). Luo, Xiaoqiang. 2005. On coreference resolution performance metrics. In Proceedings of the Joint Conference on Human Language Technology and Empirical Methods in Natu- ral Language Processing (HLT-EMNLP 2005), pages 25–32, Vancouver, B.C., Canada. Lyons, Christopher. 1999. Definiteness. Cambridge University Press. Mann, William and Thompson, Sandra. 1987. Rhetorical Structure Theory: A Theory of Text Organization. Technical report, ISI: Information Sciences Institute, Los Angeles, CA. Technical Report ISI/RS-87-190. Mann, William and Thompson, Sandra. 1988. Rhetorical Structure Theory: A Theory of Text Organization. TEXT, 8. Mann, William, Matthiessen, Christian, and Thompson, Sandra. 1993. Rhetorical Struc- ture Theory and Text Analysis. In Mann, William and Thompson, Sandra, editors, Text Description: Diverse Analyses of a Fund Raising Text. John Benjamins, Amster- dam. Manning, Christopher D. and Sch¨utze,Hinrich. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA. Manning, Christopher D., Raghavan, Prabhakar, and Sch¨utze,Hinrich. 2009. An Intro- duction to Information Retrieval. Cambridge University Press, Cambridge. Marcus, Mitchell P., Santorini, Beatrice, and Marcinkiewicz, Mary Ann. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330. Special Issue on Using Large Corpora. Markert, Katja, Hou, Yufang, and Strube, Michael. 2012. Collective Classification for Fine-grained Information Status. In ACL (1), pages 795–804. McClosky, David, Charniak, Eugene, and Johnson, Mark. 2006. Reranking and self- training for parser adaptation. In ACL’06. McNemar, Quinn. 1947. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153–157.


Meschkowski, Herbert. 1966. Mathematisches Begriffsw¨orterbuch. Bibliographisches In- stitut AG, Mannheim, 2 edition. Meurers, Detmar, Ziai, Ramon, Ott, Niels, and Kopp, Janina. 2011. Evaluating Answers to Reading Comprehension Questions in Context: Results for German and the Role of Information Structure. In Proceedings of the TextInfer 2011 Workshop on Textual Entailment, pages 1–9, Edinburgh, Scotland, UK, July. Association for Computational Linguistics. Miller, George A. 1995. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39–41. Mitkov, Ruslan. 1999. Anaphora Resolution: the state of the art. Unpublished manuscript. Moore, Samuel, D’Addario, Daniel, Kurinskas, James, and Weiss, Gary M. 2009. Are Decision Trees Always Greener on the Open (Source) Side of the Fence? In Stahlbock, Robert, Crone, Sven F., and Lessmann, Stefan, editors, DMIN, pages 185–188. CSREA Press. M¨uller,Christoph. 2006. Automatic detection of nonreferential it in spoken multi-party dialog. In Proceedings of the 11th Conference of the European Chapter of the Associa- tion for Computational Linguistics, pages 49–56. Nakatani, Christine H. 1996. Discourse Structural Constraints on Accent in Narrative. In Santen, Jan P. H. Van, Sproat, Richard W., Olive, Joseph P., and Hirschberg, Julia, editors, Progress in Speech Synthesis, pages 139–156. Springer, New York. Naumann, Karin. 2006. Manual for the Annotation of in-document Referential Relations. Technical report, Seminar f¨urSprachwissenschaft, Universit¨atT¨ubingen,T¨ubingen. Nemhauser, George L. and Wolsey, Laurence A. 1988. Integer and combinatorial opti- mization. Wiley. Ng, Vincent and Cardie, Claire. 2002. Identifying Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference Resolution. In Proceedings of the 19th International Conference on Computational Linguistics (COLING-2002), pages 730–736. ACL. Ng, Vincent. 2004. Learning noun phrase anaphoricity to improve coreference resolution: issues in representation and optimization. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL ’04, Stroudsburg, PA, USA. Asso- ciation for Computational Linguistics. Ng, Vincent. 2009. Graph-Cut-Based Anaphoricity Determination for Coreference Res- olution. In Proceedings of Human Language Technologies 2009: The Conference of the North American Chapter of the Association for Computational Linguistics, pages 575–583, Boulder, Col. Ng, Vincent. 2010. Supervised Noun Phrase Coreference Research: The First Fifteen Years. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1396–1411, Uppsala, Sweden, July. Association for Computational Linguistics. Ng, Andrew. 2011. Machine Learning. Online Course, http://www.ml-class.org.


Nissim, Malvina, Dingare, Shipra, Carletta, Jean, and Steedman, Mark. 2004. An Anno- tation Scheme for Information Status in Dialogue. In Proceedings of the 4th Language Resources and Evaluation Conference (LREC 2004), pages 1023–1026, Lisbon, Portu- gal. Nissim, Malvina. 2003. Annotation Scheme for Information Status in Dialogue. Unpub- lished manuscript. Nissim, Malvina. 2006. Learning Information Status of Discourse Entities. In Proceedings of the 2006 Conference on Emprical Methods in Natural Language Processing (EMNLP 2006), pages 94–102, Sydney, Australia. Noonan, Harold. 2009. Identity. In Zalta, Edward N., editor, The Stanford Encyclopedia of Philosophy. The Metaphysics Research Lab, Center for the Study of Language and Information, Stanford University, Stanford, CA, winter 2009 edition. Paice, Chris and Husk, Gareth. 1987. Towards the automatic recognition of anaphoric features in English text: the impersonal pronoun it. Computer Speech and Language, 2:109–132. Partee, Barbara, ter Meulen, Alice, and Wall, Robert. 1990. Mathematical Methods in Linguistics. Kluwer, Dordrecht. Pearson, Karl. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302):157–175. Poesio, Massimo and Artstein, Ron. 2005. The Reliability of Anaphoric Annotation, Reconsidered: Taking Ambiguity into Account. In Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky, CorpusAnno ’05, pages 76–83, Stroudsburg, PA, USA. Association for Computational Linguistics. Poesio, Massimo and Artstein, Ron. 2008. Anaphoric Annotation in the ARRAU Corpus. In Proceedings of LREC’2008. Poesio, Massimo and Vieira, Renata. 1998. A corpus-based investigation of definite description use. Comput. Linguist., 24:183–216, June. Poesio, Massimo, Bruneseaux, Florence, and Romary, Laurent. 1999. The MATE meta- scheme for coreference in dialogues in multiple languages. In Proceedings of the ACL Workshop on Standards for Discourse Tagging, Maryland. Poesio, Massimo, Uryupina, Olga, Vieira, Renata, Alexandrov-Kabadjov, Mijail, and Goulart, Rodrigo. 2004. Discourse-new detectors for definite description resolution: a survey and preliminary proposal. In Proceedings of the ACL Workshop on Reference Resolution at ACL 2004, Barcelona, Spain, July. Poesio, Massimo. 2004a. Discourse Annotation and Semantic Annotation in the GNOME Corpus. In Proceedings of the ACL Workshop on Discourse Annotation, Barcelona. Poesio, Massimo. 2004b. An empirical investigation of definiteness. In Proceedings of International Conference on Linguistic Evidence, T¨ubingen,Germany. Poesio, Massimo. 2004c. The MATE/GNOME Scheme for Anaphoric Annotation, Re-


Postolache, Oana, Kruijff-Korbayová, Ivana, and Kruijff, Geert-Jan M. 2005. Data-driven approaches for information structure identification. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 9–16, Stroudsburg, PA, USA. Association for Computational Linguistics.
Postolache, Oana. 2005. Learning information structure in the Prague treebank. In Proceedings of the ACL Student Research Workshop, pages 115–120, Stroudsburg, PA, USA. Association for Computational Linguistics.
Pradhan, Sameer, Ramshaw, Lance, Marcus, Mitchell, Palmer, Martha, Weischedel, Ralph, and Xue, Nianwen. 2011. CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes. In Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, pages 1–27, Portland, Oregon.
Prince, Ellen F. 1981. Toward a Taxonomy of Given-New Information. Radical Pragmatics, pages 223–255.
Prince, Ellen F. 1992. The ZPG Letter: Subjects, Definiteness, and Information-Status. In Mann, William C. and Thompson, Sandra A., editors, Discourse Description. Diverse linguistic analyses of a fund-raising text, pages 295–325. Benjamins, Amsterdam.
Quine, Willard Van Orman. 1986. Philosophy of logic. Harvard University Press, 2nd edition.
Quinlan, J. Ross. 1993. C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Rahman, Altaf and Ng, Vincent. 2009. Supervised Models for Coreference Resolution. In Proceedings of EMNLP, pages 968–977, Singapore.
Rahman, Altaf and Ng, Vincent. 2011. Learning the information status of noun phrases in spoken dialogues. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1069–2142, Edinburgh, Scotland, U.K.
Rashad, M. Z., El-Bakry, Hazem M., Isma'il, Islam R., and Mastorakis, Nikos. 2010. An Overview of Text-To-Speech Synthesis Techniques. In Mastorakis, N., Mladenov, V., and Bojkovic, Z., editors, LATEST TRENDS on COMMUNICATIONS and INFORMATION TECHNOLOGY, pages 84–89. WSEAS Press.
Recasens, Marta, Hovy, Eduard H., and Martí, Maria Antònia. 2010. A typology of near-identity relations for coreference (NIDENT). In LREC, pages 149–156.
Reinhardt, Fritz and Soeder, Heinrich. 1974. dtv-Atlas zur Mathematik. Deutscher Taschenbuch Verlag GmbH & Co. KG, München, 6th edition, 1984.
Reinhart, Tanya M. 1976. The Syntactic Domain of Anaphora. Ph.D. thesis, Massachusetts Institute of Technology, Dept. of Foreign Literatures and Linguistics.
Reinhart, Tanya. 1982. Pragmatics and Linguistics: An Analysis of Sentence Topics. Philosophica, 27:53–94.
Reitter, David. 2003. Simple Signals for Complex Rhetorics: On Rhetorical Analysis with Rich-Feature Support Vector Models. In Proceedings of LDV Forum, pages 38–52.


Riester, Arndt, Killmann, Lorena, Lorenz, David, and Portz, Melanie. 2007. Richtlinien zur Annotation von Gegebenheit und Kontrast in Projekt A1. Draft version, November 2007. Manuscript.
Riester, Arndt, Lorenz, David, and Seemann, Nina. 2010. A recursive annotation scheme for referential information status. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC-2010), pages 717–722, La Valletta.
Riester, Arndt. 2009. Partial Accommodation and Activation in Definites. In Proceedings of the 18th International Congress of Linguists (CIL), Session on Information Structure, pages 134–152, Seoul. Linguistic Society of Korea.
Ritz, Julia, Dipper, Stefanie, and Götze, Michael. 2008. Annotation of Information Structure: an Evaluation across different Types of Texts. In Calzolari, Nicoletta (Conference Chair), Choukri, Khalid, Maegaard, Bente, Mariani, Joseph, Odijk, Jan, Piperidis, Stelios, and Tapias, Daniel, editors, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/.
Ritz, Julia. 2010. Using tf-idf-related Measures for Determining the Anaphoricity of Noun Phrases. In Proceedings of KONVENS 2010, Saarbrücken, Germany, September.
Rohrer, Christian and Forst, Martin. 2006. Improving coverage and parsing quality of a large-scale LFG for German. In Proceedings of the Language Resources and Evaluation Conference (LREC-2006), Genoa, Italy.
Rooth, Mats. 1995. Indefinites, Adverbs of Quantification and Focus Semantics. In Carlson, Gregory N. and Pelletier, Francis Jeffry, editors, The generic book, pages 265–299. University of Chicago Press.
Sag, Ivan A. and Pollard, Carl. 1991. An Integrated Theory of Complement Control. Language, 67(1):63–113.
Schiller, Anne, Teufel, Simone, Stöckert, Christine, and Thielen, Christine. 1999. Guidelines für das Tagging deutscher Textcorpora mit STTS (kleines und großes Tagset). Technical report, University of Stuttgart and University of Tübingen.
Schmid, Helmut. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of the International Conference on New Methods in Language Processing, pages 44–49, Manchester, UK. Extended version available at http://www.ims.uni-stuttgart.de/ftp/pub/corpora/tree-tagger1.pdf.
Schwarz-Friesel, Monika, Consten, Manfred, and Knees, Mareile, editors. 2007. Anaphors in Text: Cognitive, formal and applied approaches to anaphoric reference. John Benjamins, Amsterdam/Philadelphia.
Schwarzschild, Roger. 1999. GIVENness, AvoidF and other constraints on the placement of accent. Natural Language Semantics, 7(2):141–177.
Soames, Scott. 2003. Philosophical Analysis in the Twentieth Century: The Age of Meaning. Princeton University Press.


Soon, Wee Meng, Ng, Hwee Tou, and Lim, Daniel Chung Yong. 2001. A Machine Learning Approach to Coreference Resolution of Noun Phrases. Computational Linguistics, 27(4):521–544.
Stalnaker, Robert. 1974. Pragmatic Presuppositions. In Munitz, Milton K. and Unger, Peter K., editors, Semantics and Philosophy, pages 197–213. New York University Press, New York. Reprinted in Davis (1991) and in Stalnaker (1999).
Stede, Manfred. 2004. The Potsdam Commentary Corpus. In Association for Computational Linguistics (ACL) 2004 Workshop on Discourse Annotation, pages 96–102, Barcelona, Spain.
Stevens, Catherine, Lees, Nicole, Vonwiller, Julie, and Burnham, Denis. 2005. On-line experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference. Computer Speech and Language, 19(2):129–146.
Strawson, Peter F. 1950. On Referring. Mind, 59(235):320–344.
Telljohann, Heike, Hinrichs, Erhard W., and Kübler, Sandra. 2003. Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Technical report, University of Tübingen.
Telljohann, Heike, Hinrichs, Erhard W., Kübler, Sandra, and Zinsmeister, Heike. 2006. Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). Revised Version. Technical report, Seminar für Sprachwissenschaft, Universität Tübingen, Tübingen.
Tye, Michael. 1991. The Imagery Debate. MIT Press, Cambridge, MA.
Umbach, Carla. 2003. Anaphoric Restriction of Alternative Sets: On the Role of Bridging Antecedents. In Proceedings of "Sinn und Bedeutung" 7, Konstanz Linguistics Working Papers, No. 114, pages 310–323.
Uryupina, Olga and Poesio, Massimo. 2012. Domain-specific vs. Uniform Modeling for Coreference Resolution. In Calzolari, Nicoletta et al., editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, May. European Language Resources Association (ELRA).
Uryupina, Olga. 2003. High-precision Identification of Discourse New and Unique Noun Phrases. In Proceedings of the ACL Student Workshop, pages 80–86, Stroudsburg, PA, USA. Association for Computational Linguistics.
Uryupina, Olga. 2009. Detecting Anaphoricity and Antecedenthood for Coreference Resolution. In Procesamiento del Lenguaje Natural, volume 42, pages 113–120. Sociedad Española para el Procesamiento del Lenguaje Natural.
van Deemter, Kees and Kibble, Rodger. 2000. On Coreferring: Coreference in MUC and Related Annotation Schemes. Computational Linguistics, 26(4):629–637.
Vapnik, Vladimir N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc., New York, NY, USA.
Versley, Yannick, Moschitti, Alessandro, Poesio, Massimo, and Yang, Xiaofeng. 2008a. Coreference systems based on kernels methods. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 961–968.


Versley, Yannick, Ponzetto, Simone, Poesio, Massimo, Eidelman, Vladimir, Jern, Alan, Smith, Jason, Yang, Xiaofeng, and Moschitti, Alessandro. 2008b. BART: A Modular Toolkit for Coreference Resolution. In Companion Volume of the Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008).
Versley, Yannick. 2006. A constraint-based approach to noun phrase coreference resolution in German newspaper text. In Konferenz zur Verarbeitung Natürlicher Sprache (KONVENS 2006).
Vieira, Renata and Poesio, Massimo. 2000. Corpus-based Development and Evaluation of a System for Processing Definite Descriptions. In Proceedings of the 18th Conference on Computational Linguistics - Volume 2, COLING '00, pages 899–903, Stroudsburg, PA, USA. Association for Computational Linguistics.
Vilain, Marc, Burger, John, Aberdeen, John, Connolly, Dennis, and Hirschman, Lynette. 1995. A model-theoretic coreference scoring scheme. In Proceedings of the Sixth Message Understanding Conference (MUC-6), pages 45–52.
von Heusinger, Klaus. 2002. Specificity and Definiteness in Sentence and Discourse Structure. Journal of Semantics, pages 245–274.
Walker, Christopher, Strassel, Stephanie, Medero, Julie, and Maeda, Kazuaki. 2006. ACE 2005 Multilingual Training Corpus.
Washburn, Margaret. 1898. The Psychology of Deductive Logic. Mind, 7(28):523–530.
Webber, Bonnie L. 1988. Discourse Deixis and Discourse Processing. Technical report, Department of Computer and Information Science, University of Pennsylvania. Technical Report MS-CIS-88-75.
Weischedel, Ralph, Pradhan, Sameer, Ramshaw, Lance, Micciulla, Linnea, Palmer, Martha, Xue, Nianwen, Marcus, Mitchell, Taylor, Ann, Babko-Malaya, Olga, Hovy, Eduard, Belvin, Robert, and Houston, Ann. 2007. OntoNotes Release 1.0. Technical report, Linguistic Data Consortium, Philadelphia.
Wilks, Yorick. 1973. Preference Semantics. Technical report, Stanford AI Laboratory Memo AIM-206, Stanford University.
Winograd, Terry. 1972. Understanding natural language. Academic Press, New York.
Witten, Ian H. and Frank, Eibe. 2005. Data mining: Practical machine learning tools and techniques. Morgan Kaufmann, San Francisco, 2nd edition.
Yates, Frank. 1934. Contingency tables involving small numbers and the χ² test. Supplement to the Journal of the Royal Statistical Society, 1(2):217–235.
Zeldes, Amir, Ritz, Julia, Lüdeling, Anke, and Chiarcos, Christian. 2009. ANNIS: A Search Tool for Multi-Layer Annotated Corpora. In Proceedings of 2009, Liverpool, UK.
Zhou, Guo-Dong and Kong, Fang. 2011. Learning noun phrase anaphoricity in coreference resolution via label propagation. Journal of Computer Science and Technology, 26(1):34–44.


Zhu, Xiaojin and Ghahramani, Zoubin. 2002. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, CMU.

Index

κ, 75

abstract concepts, 51, 70, 72
accessible, 49
ACE, 139
aggregation, 59
anaphoric expressions, 15
apposition, 51, 70, 72
ARRAU, 5, 54, 55, 101, 123
asserted identity, 51, 70
asymmetry
    specific vs. generic, 32

bias, 90
binary, xv
bindee, 40
binder, 40
binding, 39, 51, 70, 72, 134
    semantic, 39
    syntactic, 39
binding theory, 39
bridging, 48, 51, 70, 72
Büring, 39

c-command, 4, 39
C4.5 decision trees, 88
cataphor, 69, 70, 72
classification functions, 90
clause, 64, 66
clauses as referents, 62
cognitive status, 4
Common Ground, 14
comparison of classification techniques, 92
conditional random field, 92
conjunction, 64, 66
context, 57, 74
coreference, 57, 69
coreference between indefinites, 32, 126
coreference resolution, 2, 44, 140
coreference set, 75
coreferent, 3
cost function, 90
CRF, 92

decision trees, 88
definiteness, 51, 68, 70, 72
DIRNDL, 5, 54, 57, 82
discontinuous markable, 59, 62, 64, 66, 78
discourse-given, 3
discourse-givenness, 69
DRS, 15
    examples, 15
DRT, 13, 15, 25, 26
dyadic operator, 31

embedded expression, 61, 64, 66
embedding, 61
equivalence relation, 35
evaluation
    measures for annotation evaluation, 76
evaluation of algorithms, 93
evaluation of annotation, 57, 75
event anaphor, 46, 51, 70, 72
event anaphors, 134
expression
    form of an expression, 15
extension, 14, 33

f-measure, 75
false negative, 93
false positive, 93
feature coding
    boolean, 83
    categorial, 83
    numeric, 83
formalization, 74
formalization of annotation task, 57
function value, 51, 70, 72
fusion, 62, 64, 66

genericity, 37
GermaNet, 87
gerund, 64, 66
givenness, 15
givenness hierarchy, 49
graph
    directed, 75
Gundel et al. (1993), 4, 49

hearer, 14

ICA, 92
ID, 75
identity, 35
illocutionary act, 19
ILP, 91
inferable, 49
information status, 4, 69
information structure, 7
integer linear programming, 91
intension, 14, 33
inter-annotator agreement, 75
interrogative constituent, 61, 64, 66
iterative collective classification, 92

J48 decision trees, 88, 110
junctors, 16

kappa, 75
kernels, 90
kinds
    reference to, 29, 51, 70, 72
Kotsiantis (2007), 92
Krifka's cat, 15

label propagation, 92
locational expression, 61, 64, 66
logarithmic scale, 114
logical operators, 16
logistic regression, 90

markable, 57, 64
markable boundaries, 62
McNemar's χ², 95
measures for annotation evaluation, 76
mediated, 49
metonym, 37
metonymy, 123
monadic operator, 31
MUC-6, 5, 82
MUC-7, 5, 54, 82, 101, 116, 136
multiple heads, 62

near-identity, 38
negation, 29, 30
new, 4, 7, 70, 72
non-referring, 41, 51, 70, 72
    anaphor to, 41
notation
    index, 3
    subscript, 3
    underline, 3
null hypothesis, 94
numeric expression, 61, 64, 66

OntoNotes, 5, 54, 55, 82, 101, 110, 136
overfitting, 90

paragraph, 64, 66
paragraphs as referents, 62
paycheck-example (Karttunen), 38
PCC (Potsdam Commentary Corpus), 5, 54, 56, 101
Pearson's χ², 95
percent agreement, 75
possessive pronoun, 64, 66
possessive s, 62, 64, 66
precision, 75
predication, 51, 70, 72
present participle, 64, 66
Prince (1981, 1992), 4, 49
pronoun
    possessive, 59, 64, 66
    reciprocal, 61, 64, 66
    reflexive, 61, 64, 66
    relative, 60, 64, 66
property, 15
proposition anaphor, 51, 70, 72
prosody, 6

quantification, 29
quantifier, 16

random split, 110
recall, 75
reciprocal pronoun, 64, 66
referent, 14, 57
referentiality, 57, 59, 68
referring expression, 14
referring expressions, 2
reflexive pronoun, 64, 66
reflexivity, 35
relation, 15
    unary, 17
relative pronoun, 64, 66
Ripper rule learner, 89, 110
rule learners, 89

scope, 16, 30, 31, 33
skewed distribution, 93
speaker, 14
specificity, 51, 68, 70, 72
    in OntoNotes, 68
subscript index, 3
summation, 59
support vector machines, 90, 110
SVM, 90, 110
Switchboard corpus, 54, 55, 82
symmetry, 35, 40

TüBa-D/Z, 5, 54, 56, 101, 126
temporal expression, 61, 64, 66
titles, 64, 66
titles in markables, 62
trace, 61, 64, 66
transitivity, 35
true negative, 93
true positive, 93
type/token, 40

unary, xv, 17
underline, 3
universal quantifier, 16

variable, 15
variance, 90
verb, 64, 66
verbs as referents, 62

weighted average, 94
word sense, 55
WordNet, 55, 87

Yates' correction, 95

zero form, 61, 64, 66