
Computational modelling of coreference and bridging resolution

Von der Fakultät Informatik, Elektrotechnik und Informationstechnik der Universität Stuttgart zur Erlangung der Würde eines Doktors der Philosophie (Dr. phil.) genehmigte Abhandlung.

Vorgelegt von Ina Verena Rösiger aus Göppingen

Hauptberichter: Prof. Dr. Jonas Kuhn
Mitberichter: Prof. Dr. Simone Teufel

Tag der mündlichen Prüfung: 28.01.2019

Institut für Maschinelle Sprachverarbeitung der Universität Stuttgart 2019

Erklärung (Statement of Authorship) Hiermit erkläre ich, dass ich die vorliegende Arbeit selbständig verfasst habe und dabei keine andere als die angegebene Literatur verwendet habe. Alle Zitate und sinngemäßen Entlehnungen sind als solche unter genauer Angabe der Quelle gekennzeichnet. I hereby declare that this text is the result of my own work and that I have not used sources without declaration in the text. Any thoughts from others or literal quotations are clearly marked.

Ort, Datum Unterschrift


Contents

1. Introduction
   1.1. Motivation
   1.2. Research questions
   1.3. Contributions and publications
   1.4. Outline of the thesis

I. Background

2. Anaphoric reference
   2.1. Coreference
   2.2. Bridging

3. Related NLP tasks
   3.1. Coreference resolution
   3.2. Bridging resolution

II. Data and tool creation

4. Annotation and data creation
   4.1. Coreference annotation and existing corpora
   4.2. Bridging annotation and existing corpora
   4.3. Newly created corpus resources
      4.3.1. BASHI: bridging in news text
      4.3.2. SciCorp: coreference and bridging in scientific articles
      4.3.3. GRAIN: coreference and bridging in radio interviews
      4.3.4. Conclusion

5. Coreference resolution
   5.1. Existing tools and related work
   5.2. A coreference system for German
      5.2.1. System and data
      5.2.2. Adapting the system to German
      5.2.3. Evaluation
      5.2.4. Ablation experiments
      5.2.5. Pre-processing pipeline: running the system on new texts
      5.2.6. Application on DIRNDL
   5.3. Conclusion

6. Bridging resolution
   6.1. A rule-based bridging system for English
      6.1.1. Reimplementation
      6.1.2. Performance
      6.1.3. Generalisability of the approach
   6.2. CRAC 2018: first shared task on bridging resolution
      6.2.1. The ARRAU corpus
      6.2.2. Data preparation
      6.2.3. Evaluation scenarios and metrics
      6.2.4. Applying the rule-based system to ARRAU
   6.3. A refined bridging definition
      6.3.1. Referential bridging
      6.3.2. Lexical bridging
      6.3.3. Subset relations and lexical ...
      6.3.4. Near-identity
      6.3.5. Priming and bridging
   6.4. Shared task results
      6.4.1. Rules for bridging in ARRAU
      6.4.2. A learning-based method
      6.4.3. Final performance
   6.5. A rule-based bridging system for German
      6.5.1. Adaptation to German
      6.5.2. Performance
   6.6. Conclusion

III. Linguistic validation experiments

7. Using prosodic information to improve coreference resolution
   7.1. Motivation
   7.2. Background
   7.3. Related work
   7.4. Experimental setup
   7.5. Prosodic features
   7.6. Manual prosodic information
   7.7. Automatically predicted prosodic information
   7.8. Results and discussion
   7.9. Conclusion and future work


8. Integrating predictions from neural-network relation classifiers into coreference and bridging resolution
   8.1. Relation hypotheses
   8.2. Experimental setup
   8.3. First experiment
      8.3.1. Semantic relation classification
      8.3.2. Relation analysis
      8.3.3. Relations for bridging resolution
      8.3.4. Relations for coreference resolution
   8.4. Second experiment
      8.4.1. Semantic relation classification
      8.4.2. Relation analysis
      8.4.3. Relations for coreference and bridging resolution
   8.5. Final performance of the bridging tool
   8.6. Discussion and conclusion

9. Conclusion
   9.1. Summary of contributions
   9.2. Lessons learned
   9.3. Future work

Bibliography

List of figures

1.1. Four levels of contribution
1.2. Contributions to coreference resolution
1.3. Contributions to bridging resolution

2.1. Reference as the relation between referring expressions and referents

3.1. Latent trees for coreference resolution: data structures

4.1. Contribution and workflow pipeline for coreference: data creation
4.2. Contribution and workflow pipeline for bridging: data creation

5.1. Contribution and workflow pipeline for coreference: tool creation

6.1. Contribution and workflow pipeline for bridging: tool creation
6.2. Contribution and workflow pipeline for bridging: task definition (reloaded)

7.1. Contribution and workflow pipeline for coreference: validation, part 1
7.2. One exemplary pitch accent shape
7.3. The relation between phrase boundaries and intonation phrases
7.4. The relation between boundary tones and nuclear and prenuclear accents
7.5. Convolutional neural network model for prosodic event recognition
7.6. The relation between coreference and prominence: example from the DIRNDL dataset with English translation

8.1. Contribution and workflow pipeline for coreference: validation, part 2
8.2. Contribution and workflow pipeline for bridging: validation
8.3. Neural net relation classifier: example of a non-related pair
8.4. Neural net relation classifier: example of a hypernym pair
8.5. Neural net relation classifier in the second experiment

9.1. A data structure based on latent trees for the joint learning of coreference and bridging

List of tables

4.1. Guideline comparison: overview of the main differences between OntoNotes, RefLex and NoSta-D
4.2. Existing corpora annotated with coreference used in this thesis
4.3. Existing corpora annotated with bridging used in this work
4.4. An overview of the newly created data
4.5. BASHI: corpus statistics
4.6. BASHI: inter-annotator agreement on five WSJ articles
4.7. SciCorp: categories and links in our classification scheme
4.8. SciCorp: overall inter-annotator agreement (in κ)
4.9. SciCorp: inter-annotator agreement for the single categories (in κ)
4.10. SciCorp: corpus statistics
4.11. SciCorp: distribution of information status categories, in absolute numbers
4.12. SciCorp: distribution of information status categories, in percent

5.1. IMS HotCoref DE: performance of the mention extraction module on TüBa-D/Z version 8
5.2. IMS HotCoref DE: performance of the mention extraction module after the respective parse adjustments, on TüBa-D/Z version 8
5.3. IMS HotCoref DE: performance of the mention extraction module on TüBa-D/Z version 10
5.4. Performance of IMS HotCoref DE on TüBa-D/Z version 10: gold vs. predicted annotations
5.5. SemEval-2010 official shared task results for German
5.6. SemEval-2010 post-task evaluation
5.7. SemEval-2010: post-task evaluation, excluding singletons
5.8. Performance of IMS HotCoref DE on TüBa-D/Z version 10: ablation experiments
5.9. CoNLL-12 format overview: tab-separated columns and content
5.10. Markable extraction for the DIRNDL corpus
5.11. Performance of IMS HotCoref DE on DIRNDL, using predicted annotations

6.1. Overview of rules in Hou et al. (2014)
6.2. Contingency table for the Noun1 + preposition + Noun2 pattern
6.3. Exemplary semantic connectivity scores
6.4. Exemplary argument-taking ratios
6.5. A bridging system for English: performance of the individual rules, their precision as well as their firing rate


6.6. Performance of the reimplementation of Hou et al. (2014), with different settings
6.7. Performance of the bridging system with different coreference information, gold setting
6.8. Performance of the bridging system with different coreference information, predicted setting
6.9. Performance of the rule-based method on other corpora
6.10. Number of bridging anaphors in the single domains of the ARRAU corpus
6.11. Bridging relations in ARRAU
6.12. The CoNLL-12-style format used in our bridging experiments
6.13. Number of bridging anaphors in the shared task after filtering out problematic cases
6.14. Applying Hou et al. (2014) on the RST part of the ARRAU corpus: rule performance
6.15. Performance of the rule-based method on other corpora
6.16. Shared task performance on the domains of ARRAU
6.17. Shared task results, more detailed evaluation
6.18. Performance of the single rules on the test set of the RST dataset
6.19. Overview of German corpora annotated with bridging
6.20. Bridging resolution on DIRNDL: precision of the firing rules
6.21. Bridging resolution on DIRNDL: overall performance
6.22. Bridging resolution on DIRNDL: predicted vs. gold mentions
6.23. Bridging resolution with different coreference information in DIRNDL

7.1. ToBI types in GToBI(S)
7.2. Performance of pitch accent presence (in CoNLL score)
7.3. Performance of nuclear accent presence (in CoNLL score)
7.4. Additional features based on manual prosodic information (gold setting)

8.1. Results of the intrinsic evaluation on BLESS (without lexical overlap)
8.2. Average cosine similarities and relation classifier probabilities for coreferent and bridging pairs in comparison to other pairs of nouns
8.3. Correct and wrong bridging pairs found by the additional semantic rule
8.4. Effect of the cosine threshold constraint, for the relation meronymy
8.5. Results of the intrinsic evaluation on WordNet
8.6. Average relation classifier probabilities and cosine similarities for coreferent and bridging pairs in comparison to other pairs of nouns, experiment 2
8.7. Final performance of the English bridging system
8.8. Final performance of the English bridging system with different coreference information

9.1. Comparison of different German coreference systems
9.2. Performance of the English bridging system

9.3. Performance of the German bridging resolver (on DIRNDL)
9.4. Performance of pitch accent and nuclear accent presence (in CoNLL score)
9.5. Final performance of the English bridging system

List of abbreviations

BCUBE    Coreference evaluation metric proposed by Bagga and Baldwin (1998)
BLANC    Bilateral assessment of noun-phrase coreference
BNC      British National Corpus
CEAFE    Constraining Entity-Alignment F-Measure, entity-based
CEAFM    Constraining Entity-Alignment F-Measure, mention-based
CNN      Convolutional neural network
CNP      Coordinated noun phrase
CL       Computational linguistics
COS      Cosine similarity
DL       Constituency parse tag for names
F1       F1 score
GEN      Genetics
GPE      Geopolitical entity
IP       Intonation phrases
ip       Intermediate phrases
LEA      Link-based entity-aware metric
MUC      Message understanding conference score
N        Noun
NE       Named entity
NLP      Natural language processing
NN       Neural network
NP       Noun phrase
n1       Final accent of an intermediate phrase
n2       Final accent of an intonation phrase
LSTM     Long short-term memory network
ORG      Organisation
P        Precision
PDS      Demonstrative pronoun
PER      Person
pn       Prenuclear (non-final) accent
POS      Part-of-speech
PP       Prepositional phrase
PPER     Personal pronoun
PPOSAT   Possessive pronoun
PRELS    Relative pronoun
PREP     Preposition
PRF      Reflexive pronoun
PWS      Interrogative pronoun
R        Recall
ReLU     Rectified linear units
ToBI     Tones and Break Indices
VP       Verbal phrase
WN       WordNet

Acknowledgements

This thesis wouldn’t exist if it weren’t for the many people who have supported me and my work over the last five years. I would like to thank my advisor Jonas Kuhn for his encouragement and advice, for letting me explore my own ideas while always providing directions in case I needed them. I am very grateful to Simone Teufel, not only for accepting to be the second reviewer of this thesis and for her many detailed comments that helped to make this thesis better, but also for introducing me to this topic many years ago during my stay in Cambridge and for sparking my interest in pursuing a PhD. I would like to thank Arndt Riester for all his advice, for the very helpful feedback on so many of my publications and posters, and for being a great role model of a scientist who always goes the extra mile. I have always enjoyed it when we joined forces and turned our combined knowledge into fruitful collaborations.

Writing a PhD thesis can be lonely at times, but because of the many friendly and helpful faces at IMS, I’ve rarely ever felt alone. I want to thank my colleagues at IMS for this lovely time, particularly the people I’ve had the pleasure of working with: Sabrina, Markus, Kerstin, Janis, Nils, Sarah, Max, Maximilian, Sabine, Fabienne, Kim-Anh, Thang, Johannes, Tanja, Anja, Simon, Julia and Uli Heid. A special shout-out goes to my office mates Wiltrud and Yvonne and to my Mensa group!

One of the benefits of doing a PhD is that you get to travel to so many interesting places for conferences. Some of my highlights (besides all the academic input of course) include exploring the wonderful Kyushu, staying in a ryokan, relaxing in an onsen, visiting the Alhambra or the many temples in Kyoto, taking a steamboat cruise on the Mississippi River, climbing what felt like the steepest part of the Great Wall of China, and a little detour trip from Denver to Chicago, just to name a few. To everyone who has accompanied me on these trips (you know who you are), thanks for the great time and all the memories!

Last but not least, the biggest thank you goes to my family and Micha for their love and support throughout the years.

Abstract

Anaphora resolution is an important task in natural language understanding, where the aim is to automatically extract meaning from text. Anaphora resolution subsumes the two tasks coreference resolution and bridging resolution. Coreference resolution deals with coreference or identity anaphora, where a context-dependent expression refers to a previously mentioned entity. This includes pronouns such as Tim ... he, but also definite descriptions such as Laura ... the girl. Bridging resolution revolves around bridging or associative anaphora, where the context-dependent expression itself has not been introduced into the discourse, but, due to an already mentioned and associated entity, the expression can be interpreted, e.g. in a school ... the headmaster. The goal of this thesis is to improve coreference and bridging resolution for English and German. Our contributions comprise the four levels task definition, data creation, tool creation and linguistic validation experiments. Based on the state of the art and previous work on both tasks, our focus for coreference resolution is set on later steps in the pipeline, while for bridging resolution work on all levels was required. Whereas the task definition for coreference is well-defined and compatible across previous research, the bridging annotations we found in available corpora contained very different phenomena and motivated us to propose a refined bridging definition. We introduce the term referential bridging to cover two types of bridging on the level of referring expressions: (i) argument slot filling, as in the wheel (of the car), and (ii) referential subset expressions, as in the small pug (out of the previously mentioned group of dogs). In both cases, context-dependence is the main criterion for referential bridging. This is not the case for lexical or lexically induced bridging, where we have a non-anaphoric or anaphoric expression that stands in some relation with a previously introduced entity. This relation typically exists either on the word level or models a real-world relation on the concept level (Europe ... Spain). In terms of data, we create three new corpus resources annotated with bridging and coreference information to overcome the lack of data, which is particularly evident for bridging.

We have annotated BASHI, an English corpus of Wall Street Journal articles, SciCorp, an English corpus of scientific articles, and GRAIN, a German corpus of radio interviews. While many English coreference resolvers are available, not many systems exist for German. We adapt a data-driven coreference resolver designed for English to German by integrating features designed to address the specificities of German. The tool achieves state-of-the-art performance on the benchmark dataset TüBa-D/Z. For bridging resolution, there are no openly available systems. Building on a rule-based approach, we develop bridging resolvers for English and German, which both achieve state-of-the-art performance. We show that the English bridging resolver generalises well to other in-domain corpora if they contain the same type of bridging, namely referential bridging. Finally, inspired by theoretical studies, we improve the developed tools by integrating linguistic information that is assumed to be beneficial for the tasks. First, we show that the theoretical claims on the interaction between coreference and prosody hold true in an automatic setting: we improve the performance of our coreference resolver by integrating prosodic information, either in the form of manual prosodic labels or of automatic labels predicted by a CNN classifier. In a second experiment, we test the use of semantic relations predicted by a neural-network relation classifier and show that automatically predicted meronymy pairs improve our bridging resolver.

Deutsche Zusammenfassung

Anaphernresolution befasst sich mit Methoden zur automatischen Auflösung von kontextabhängigen sprachlichen Ausdrücken. Sie umfasst die zwei Aufgaben Koreferenzauflösung und Bridgingauflösung. Die Auflösung kontextabhängiger Ausdrücke ist ein wichtiger Teilschritt des automatischen Textverstehens. Koreferenzauflösung bildet kontextabhängige koreferente Anaphern, die ohne Hinzunahme bisherigen Kontexts nicht interpretierbar sind, auf bereits eingeführte Entitäten ab. Das umfasst klassischerweise Pronomen wie z.B. Tim ... er, aber auch andere nominale Ausdrücke wie z.B. definite Deskriptionen in Laura ... das Mädchen. Bridgingauflösung beschäftigt sich mit der Abbildung kontextabhängiger Ausdrücke auf bereits eingeführte Entitäten, die im Gegensatz zur Koreferenz nicht in einer identitären Relation stehen, sondern nur assoziiert sind (die Schule ... der Rektor). Das Ziel dieser Arbeit ist es, die automatische Auflösung von Koreferenz und Bridging für Englisch und Deutsch zu verbessern. Die Forschungsbeiträge dieser Arbeit umfassen dabei die vier Ebenen Problemdefinition, Erstellung von manuell annotierten Daten, Entwicklung von Werkzeugen zur automatischen Analyse sowie linguistische Validierungsexperimente. Da der Fortschritt im Bereich Koreferenz aufgrund des großen Forschungsaufkommens deutlich weiter ist als im Bereich Bridging und es viele große, zuverlässig mit Koreferenz annotierte Korpora gibt, liegt der Schwerpunkt im Bereich Koreferenz auf den Schritten Werkzeugerstellung und darauf basierenden linguistischen Experimenten. Im Bereich Bridging sind unsere Forschungsbeiträge auf allen vier Ebenen zu finden. Während bisherige, verwandte Arbeiten im Bereich Koreferenz und Koreferenzauflösung vergleichbare und klare Definitionen verwenden, enthalten die annotierten Korpora im Bereich Bridging sehr unterschiedliche Phänomene, was eine genauere Betrachtung und Charakterisierung der verschiedenen Bridgingdefinitionen motivierte. Unsere Charakterisierung unterscheidet referentielles Bridging, das zwei Untertypen umfasst: (i) Bridging als Einsatz von impliziten Argumenten, wie in das Lenkrad (des Autos), und (ii) referentielle Teilmengenbeziehung wie z.B. in der Mops (aus der bereits erwähnten

Gruppe der Hunde). Das Hauptkriterium für referentielles Bridging ist dabei stets die Kontextabhängigkeit des sprachlichen Ausdrucks. Im Gegensatz dazu beschreibt lexikalisches Bridging eine Relation auf Wort- oder Konzeptebene, bei der der sprachliche Ausdruck nicht notwendigerweise kontextabhängig sein muss (Europa ... Spanien). Im Bereich der Korporaerstellung motivierte vor allem der Mangel an annotierten Daten im Bereich Bridging die Annotation von drei verschiedenen Korpora: BASHI, ein englisches Korpus aus Wall-Street-Journal-Artikeln, SciCorp, ein englisches Korpus aus wissenschaftlichen Veröffentlichungen sowie GRAIN, ein deutsches Korpus aus Radiointerviews. Während für das Englische viele verfügbare Koreferenzauflöser existieren, gibt es für Deutsch vergleichsweise wenig Werkzeuge zur automatischen Auflösung. Basierend auf einem englischen, lernbasierten Werkzeug entwickeln wir daher ein frei verfügbares Koreferenzsystem fürs Deutsche, wobei wir besonderen Stellenwert auf die Implementierung von Features legen, die die Eigenheiten des Deutschen reflektieren. Das entwickelte Koreferenzwerkzeug erzielt die bisher besten veröffentlichten Ergebnisse auf dem Referenzkorpus TüBa-D/Z. Für die automatische Auflösung von Bridging existieren bisher für die Sprachen Englisch und Deutsch keine frei verfügbaren Werkzeuge. Basierend auf der besten veröffentlichten Methode für englische Daten implementieren wir daher Auflösungswerkzeuge für Englisch und Deutsch, die beide den aktuellen Stand der Technik definieren. Abschließend nutzen wir die erstellten Daten und Werkzeuge, um unsere Werkzeuge mit aus der theoretischen Literatur aufgegriffenen Ideen zur Integration von linguistischem Wissen zu verbessern und gleichzeitig die Ideen auf ihre Anwendbarkeit in einem computerlinguistischen Experiment zu überprüfen. Wir zeigen, dass der aus der theoretischen Literatur bekannte Zusammenhang von Koreferenz und Prosodie genutzt werden kann, um unser Koreferenztool zu verbessern. Auf Sprachdaten konnten wir unseren Koreferenzresolver sowohl mit manuell annotierten Pitchakzenten als auch mit Akzenten, die mit einem neuronalen Netz automatisch vorhergesagt wurden, verbessern. In einem zweiten Experiment, in dem die Integration von semantischen Relationen in die Koreferenz- und Bridgingauflösung getestet wurde, hatten automatisch vorhergesagte Meronymiepaare einen signifikant positiven Einfluss auf unseren Bridgingauflöser.

1. Introduction

1.1. Motivation

In natural language understanding, the aim is to extract meaning from text automatically. In order to interpret any sentence in a discourse, we need to know who or what entity is being talked about, as Karttunen (1969)’s early vision illustrates in the following quote.

“Consider a device designed to read a text in some natural language, interpret it, and store the content in some manner, say, for the purpose of being able to answer questions about it. To accomplish this task, the machine will have to fulfill at least the following basic requirement. It has to be able to build a file that consists of records of all the individuals, that is, events, objects, etc., mentioned in the text, and, for each individual, record whatever is said about it.”

Since then, a lot of work went into the question of constructing records of the entities mentioned in a text, or, in other words, into grouping references to the same discourse entity together. This includes determining where new entities get introduced in a text and where they get referred to again. In natural language, this task is non-trivial, as humans use pronouns and descriptions to establish dependencies between expressions rather than referring to an entity by always using the same surface form. This is shown in the little extract from Alice in Wonderland1 in Example (1). In the modified version in Example (2), we have replaced every pronoun and paraphrase with the original surface form, which makes the text sound very unnatural.

(1) It was the White Rabbit, trotting slowly back again, and looking anxiously about as it went, as if it had lost something; and Alice heard it muttering to itself [...]

1Text from https://www.gutenberg.org/files/11/11-h/11-h.htm


Alice guessed in a moment that it was looking for the fan and the pair of white kid gloves, and she very good-naturedly began hunting about for them, but they were nowhere to be seen.

(2) It was the White Rabbit, trotting slowly back again, and looking anxiously about as the White Rabbit went, as if the White Rabbit had lost something; and Alice heard the White Rabbit muttering to the White Rabbit [...] Alice guessed in a moment that the White Rabbit was looking for the fan and the pair of white kid gloves, and Alice very good-naturedly began hunting about for the fan and the pair of white kid gloves, but the fan and the pair of white kid gloves were nowhere to be seen.
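To make Karttunen's idea of a "file of records" concrete, the following minimal Python sketch groups the referring expressions of the extract into one record per discourse entity. It is purely illustrative and not part of any system described in this thesis; the entity labels and the mention list are a simplified manual reading of Example (1).

# Illustrative sketch only: discourse entities as records of their mentions.
# Entity labels and the mention list are a simplified reading of Example (1).
from collections import defaultdict

mentions = [  # (entity label, referring expression), in order of appearance
    ("RABBIT", "the White Rabbit"),
    ("RABBIT", "it"),
    ("RABBIT", "it"),
    ("ALICE", "Alice"),
    ("RABBIT", "it"),
    ("RABBIT", "itself"),
    ("ALICE", "Alice"),
    ("RABBIT", "it"),
    ("FAN_AND_GLOVES", "the fan and the pair of white kid gloves"),
    ("ALICE", "she"),
    ("FAN_AND_GLOVES", "them"),
    ("FAN_AND_GLOVES", "they"),
]

records = defaultdict(list)  # Karttunen's "file": one record per discourse entity
for entity, expression in mentions:
    records[entity].append(expression)

for entity, expressions in records.items():
    print(entity, "->", expressions)
# e.g. RABBIT -> ['the White Rabbit', 'it', 'it', 'it', 'itself', 'it']

Coreference resolution is then the task of recovering exactly this grouping when only the referring expressions, and not the entity labels, are given.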

The task is considered particularly difficult because it involves the use of knowledge and reasoning, as the famous example from the Winograd Schema challenge shows.2 Depending on which verb is chosen in the subordinate clause, the pronoun they either refers to the city councilmen or the demonstrators.

(3) The city councilmen_a refused the demonstrators_b a permit
    a. because they feared violence.
    b. because they advocated violence.

The subtask of natural language understanding that deals with the fundamental task of determining what entities occur in a text and where they are mentioned again is called coreference resolution and is one of the two tasks which we will investigate in this thesis. It has been proven beneficial for many applications, including question answering (Voorhees et al., 1999), text summarisation (Steinberger et al., 2007), sentiment analysis (Nicolov et al., 2008), textual entailment (Mirkin et al., 2010) and machine translation (Hardmeier and Federico, 2010), to name only a few. Some expressions are anaphoric, i.e. they are not interpretable on their own without previous context. This includes pronouns such as he or definite descriptions such as the rabbit, which refer back to entities that have already been introduced and are covered by coreference resolution. There are, however, also context-dependent entities which do not refer to an already introduced entity, but are only related to previously introduced entities. These are called bridging anaphors. When we look at how the little snippet from Alice in Wonderland continues in Example (4), we for example find the expressions the glass table and the little door, which have not yet been introduced in the text and

2Example taken from Winograd (1972).

are only interpretable because the great hall has been mentioned before, and so we can infer that they are part of the great hall.

(4) Everything seemed to have changed since her swim in the pool, and the great hall, with the glass table and the little door, had vanished completely.

Bridging resolution is the second task that this thesis is concerned with. It is important because it can help in tasks which use the concept of textual coherence, for example Barzilay and Lapata (2008)’s entity grid or Hearst (1994)’s text segmentation. Resolving bridging references is also of help in aspect-based sentiment analysis (Kobayashi et al., 2007), where the aspects of an entity, for example the zoom of a camera, are often bridging anaphors. It might also be of use in higher-level text understanding tasks such as textual entailment (Mirkin et al., 2010), question answering (Harabagiu et al., 2001) or summarisation (Fang and Teufel, 2014).

1.2. Research questions

This thesis arose from the interest in developing and improving coreference and bridging resolution, with a focus on English and German. Improving coreference and bridging resolution can be done on different levels, some of them more on the theoretical, some of them more on the computational side. We have identified the following four levels on which contributions can benefit the two tasks: task definition, data creation, tool creation and linguistic validation experiments, as shown in Figure 1.1. In the standard setting, they represent a workflow, i.e. work on data creation requires a satisfying task definition, tool creation is only possible with at least a little bit of evaluation data, and linguistic validation experiments can only be carried out once tools and data are available. However, all these levels are also interlinked and influence each other. Hence, it might sometimes be necessary to go back one or two steps, or, after having conducted some experiments, one might also go back to the first step, task definition, and repeat another round of the pipeline with an improved understanding of the task.

Our research questions reflect the four levels of contribution. Before we can annotate a text with coreference or bridging information or work on tools that can provide automatic annotations, we need to be sure that we have developed a good understanding of the anaphoric phenomenon and that our annotations will be in line with previous definitions and guidelines, or, in cases where previous annotation efforts have shortcomings, we need to address them to avoid coming up with a new, non-compatible scheme.



Figure 1.1.: Four levels of contribution

The first research question thus addresses whether the tasks are conceptually clear and whether previous work uses compatible annotation guidelines.

Once this question has been answered satisfactorily, we need to think about the corpus resources on which we base the development or improvement of resolution tools. The research question here is whether there is enough consistently annotated data to enable the creation of automatic tools, including ones making use of statistical algorithms. If not, can we create data resources to fill the research gap?

With consistently annotated data being available, we can move on to the next step in the pipeline, tool creation. In this step, the availability of coreference and bridging resolution tools is addressed. Are there openly available tools aiming at providing automatic annotations on unseen text? If not, can we create tool resources to fill the research gap?

As one of our main interests is to enrich the coreference and bridging resolution systems with linguistically informed new features, we are now at a point in the pipeline where data and tools providing automatic coreference and bridging annotations are available and where we can perform experiments based on these tools and data. On the one hand, the experiments are meant to improve the tools’ performances, but they can also give insight into how theoretical claims can be integrated into an automatic setting. The final research question is thus concerned with linguistic validation experiments: with tools and data being available, do theoretical assumptions about the tasks hold true on actual data? Can we use the theoretical notions to improve the tools?


1.3. Contributions and publications

Parts of the research described in this thesis have been published in conference proceedings. The corresponding publications are listed below and are grouped into publications on coreference resolution and publications on bridging resolution.

For coreference and bridging resolution, the contributions of this work are asymmetrical, i.e. we set our focus on different parts of the pipeline based on previous work and the state of the art. Due to the larger progress in previous work on coreference resolution, we focus on the later steps in the pipeline, mostly tool creation and validation experiments, while also contributing a couple of corpus resources on the data level. In bridging resolution, we encountered problematic issues already in the first step, task definition, and thus set our focus on all four steps in the pipeline. The contributions of this work are summarised in the following.

[Figure 1.2 arranges the publications contributing to coreference resolution along the four pipeline steps: Rösiger and Riester 2015 (ACL); Rösiger et al. 2017 (SCNLP@EMNLP); Schweitzer et al. 2018 (LREC); Rösiger et al. 2018 (CRAC@NAACL); Rösiger 2016 (LREC); Rösiger and Kuhn 2016 (LREC).]

Figure 1.2.: Contributions to coreference resolution

Coreference resolution Coreference is an anaphoric phenomenon which has been studied in theoretical linguistics and philosophy since the late nineteenth century (see for example Frege (1892); Russell (1905)). Work on coreference resolution started in the 1960s with some prototypical experiments and has progressed, particularly due to the use of statistical methods, to be one of the most-researched natural language processing


(NLP) tasks. As a result, the linguistic understanding of the phenomenon as well as the task definition of the NLP task coreference resolution is rather clear, with a couple of exceptions that involve special cases, e.g. the handling of generic entities. In our background chapter, we will give a detailed overview of the definition of coreference and coreference resolution. In terms of data, many large corpora have been created for many languages, including OntoNotes for English (Hovy et al., 2006) and TüBa-D/Z (Naumann and Möller, 2006) for German. There are also a number of smaller corpora for specific domains, e.g. for the biomedical domain or the literary domain. Therefore, our contributions focus mostly on the third and fourth step, tool creation and linguistic validation experiments.

[Figure 1.3 arranges the publications contributing to bridging resolution along the four pipeline steps: Schweitzer et al. 2018 (LREC); Rösiger 2018 (LREC); Rösiger et al. 2018 (CRAC@NAACL); Rösiger 2016 (LREC); Rösiger et al. 2018 (COLING); Rösiger 2018 (CRAC@NAACL); Pagel and Rösiger 2018 (CRAC@NAACL).]

Figure 1.3.: Contributions to bridging resolution

Bridging resolution The phenomenon of bridging was first mentioned in Clark (1975). Back then, it was a term used for a couple of different phenomena, including cases of rhetorical connection and different-head-coreference. As bridging has always been a term with very different understandings and very few corpus resources, the focus in the pipeline is set on all four steps, including task definition and data creation, which enables

us to implement tools and perform validation experiments on the newly created corpus resources. In the following, we will give an overview of the four contribution levels and list the corresponding publications.

Task definition The task of coreference resolution is generally well-studied and conceptually clear. Apart from minor differences, most previous work uses compatible guidelines. For coreference resolution, we answer this research question by providing a summary of the anaphoric phenomenon, the task definition as well as best practices in annotating coreference in Sections 2.1 and 3.1. As just noted, bridging is a term that has been used to describe many different phenomena. Some of the critical issues have been controversial for a long time, e.g. the question of definiteness being a requirement for a bridging anaphor. We give an overview of bridging and bridging resolution in Sections 2.2 and 3.2. While working on the creation of an automatic bridging resolver, we realised that there was an even more fundamental problem: non-anaphoric pairs that stand in a particular relation, for example meronymy, are included in the bridging annotation, as shown in Example (5).

(5) In Europe, Spain is the fourth largest country.

To distinguish these two different phenomena, we introduce the concepts of referential vs. lexical bridging and provide a detailed analysis of bridging types (cf. Section 6.3).

Corresponding publication:

• Ina Rösiger, Arndt Riester and Jonas Kuhn (2018)3 Bridging resolution: task definition, corpus resources and rule-based experiments. Proceedings of COLING. Santa Fe, US 2018.

Data creation For coreference, we provide an overview of available corpora in Section 4.1, where we show that large corpora annotated with coreference information are available for English and German. For bridging, however, there are only a few, small-scale corpus resources. We give an overview of available bridging corpora in Section 4.2

3In this publication, I was responsible for the assessment of the available corpus resources, the imple- mentation of the bridging tool as well as the evaluation of the tool’s performance on the respective corpora. The refined bridging definition was the result of a joint effort with Arndt Riester.

and annotate three corpus resources to overcome the lack of annotated data for bridging. This includes an English corpus of newspaper text called BASHI, an English corpus of scientific texts called SciCorp, as well as a German corpus of radio interviews called GRAIN. All newly created corpora also contain coreference annotations, so that the two anaphoric phenomena can be studied jointly in future experiments.

Corresponding publications:

• Ina Rösiger (2018) BASHI: A corpus of Wall Street Journal articles annotated with bridging links. Proceedings of LREC. Miyazaki, Japan 2018.

• Ina Rösiger (2016) SciCorp: A corpus of English scientific articles annotated for information status analysis. Proceedings of LREC. Portoroz, Slovenia 2016.

• Katrin Schweitzer, Kerstin Eckart, Markus Gärtner, Agnieszka Faleńska, Arndt Riester, Ina Rösiger, Antje Schweitzer, Sabrina Stehwien and Jonas Kuhn (2018)4. German radio interviews: The GRAIN release of the SFB732 Silver Standard Collection. Proceedings of LREC. Miyazaki, Japan 2018.

Tool creation Many coreference resolvers have been developed for English (see for example Clark and Manning (2016a); Björkelund and Kuhn (2014), etc.). For German, however, there is less work. Over the last couple of years, only the rule-based CorZu (Klenner and Tuggener, 2011; Tuggener and Klenner, 2014) has been developed and improved. Our contribution to coreference resolution in this step is thus an adaptation of an English data-driven coreference resolver to German. For bridging, there is no openly available resolution system. We thus provide a reimplementation of the state-of-the-art system by Hou et al. (2014) and test the system’s generalisability on our newly developed corpora. We also develop an openly available bridging system for German and perform experiments on German data.

4For this resource, I have taken part in the creation of the manual information status annotations. For the paper itself, I have contributed a section describing this part of the resource.


Corresponding publications:

• Ina Rösiger (2018) Rule- and learning-based methods for bridging resolution in the ARRAU corpus. Proceedings of NAACL-HLT: Workshop on Computational Models of Reference, Anaphora, and Coreference. New Orleans, US 2018.

• Janis Pagel and Ina Rösiger (2018)5 Towards bridging resolution in German: data analysis and rule-based experiments. Proceedings of NAACL-HLT: Workshop on Computational Models of Reference, Anaphora, and Coreference. New Orleans, US 2018.

• Massimo Poesio, Yulia Grishina, Varada Kolhatkar, Nafise Sadat Moosavi, Ina Rösiger, Adam Roussell, Alexandra Uma, Olga Uryupina, Juntao Yu, Heike Zinsmeister6. Anaphora resolution with the ARRAU corpus. Proceedings of NAACL-HLT: Workshop on Computational Models of Reference, Anaphora and Coreference. New Orleans, US 2018.

• Ina Rösiger and Jonas Kuhn (2016)7 IMS HotCoref DE: A data-driven co-reference resolver for German. Proceedings of LREC. Portoroz, Slovenia 2016.

Linguistic validation experiments We address this research question by performing two experiments, which are meant to motivate further experiments using the available tools and data to assess theoretical assumptions about the tasks. The first experiment deals with the question of how prosodic information can be used to improve coreference resolution in spoken data. We show that using both manually annotated and automatically predicted prosodic information significantly improves results. In the second experiment, we test the use of automatically predicted semantic relations for coreference and bridging resolution. We show that our newly integrated features significantly improve our bridging resolver, but not our coreference resolver.

5 I was responsible for the implementation of the bridging tool and for the experiments on DIRNDL, while Janis Pagel performed experiments on the newly created GRAIN corpus.
6 I contributed my shared task results in the form of evaluation tables.
7 I was responsible for the creation of the coreference tool and for writing the paper.


Corresponding publications:

• Ina Rösiger and Arndt Riester (2015)8 Using prosodic annotations to improve coreference resolution of spoken text. Proceedings of ACL-IJCNLP, Beijing, China 2015.

• Ina Rösiger, Sabrina Stehwien, Arndt Riester, Ngoc Thang Vu (2017)9 Improving coreference resolution with automatically predicted prosodic information. 1st Workshop on Speech-Centric Natural Language Processing (SCNLP). Copenhagen, Denmark 2017.

• Ina Rösiger, Maximilian Köper, Kim Anh Nguyen and Sabine Schulte im Walde (2018)10. Integrating predictions from neural-network relation classifiers into coreference and bridging resolution. Proceedings of NAACL-HLT: Workshop on Computational Models of Reference, Anaphora, and Coreference. New Orleans, US 2018.

1.4. Outline of the thesis

This thesis has three main parts. In the first part, we give some background on the anaphoric phenomena and the computational modelling of coreference and bridging. Chapter 2 introduces the basic concepts of coreference, bridging and anaphoricity. We also analyse differences in annotation guidelines and divergent understandings of the phenomena. Chapter 3 explains the NLP tasks coreference and bridging resolution and gives an overview of previous automatic approaches. In the second part, data and tool creation, we present available corpus resources in Chapter 4, before we introduce our newly annotated corpora. To overcome the lack of available data for bridging, we annotate a newspaper corpus called BASHI. In order to be

8 Arndt Riester and I jointly developed ideas taken from the theoretical literature to be tested in a coreference resolver. I was responsible for integrating the ideas into the resolver and evaluating different scenarios. The paper was written jointly with Arndt Riester.
9 Sabrina Stehwien provided the automatically predicted prosodic information, which I integrated into the coreference resolver. I was also responsible for the evaluation of the newly integrated prosodic information and for the error analysis. The paper was written in a joint effort.
10 Maximilian Köper and Kim-Anh Nguyen provided the automatically predicted relations for word pairs that I have extracted from the corpora used in the experiments. I was responsible for integrating the predicted information into the coreference and bridging tools and also for the evaluation of the newly integrated information. The paper was written in a joint effort.

able to test the generalisability of automatic approaches, we also create SciCorp, a corpus of a different domain, namely scientific text. For German, we annotate a corpus of radio interviews with bridging information. All corpora also contain coreference annotations. Chapter 5 addresses the adaptation of a data-driven coreference resolver for English to German, where we focus on the integration of features designed to address specificities of German. The tool achieves state-of-the-art performance on the latest version (version 10) of the benchmark dataset TüBa-D/Z. Chapter 6 is devoted to bridging resolution, where we reimplement the state-of-the-art approach for bridging resolution in English by Hou et al. (2014) and test the generalisability of the approach on our own new corpora as well as other available corpus resources. Besides the expected out-of-domain effects, we observe low performance on some of the in-domain corpora. Our analysis shows that this is the result of two very different phenomena being defined as bridging, which we call referential and lexical bridging. We think that the distinction between referential and lexical bridging is a valuable contribution towards the understanding of the phenomenon of bridging and that it can also help design computational approaches. The diverging bridging annotations became obvious when we worked on a shared task submission for the first shared task on bridging. After discussing the different properties of the two types of bridging, we compare our rule-based system against a learning-based one and design new rules to also handle lexical bridging. We also create a bridging resolution system for German, where we investigate new rules and the role of coreference information. The third part addresses two linguistic validation experiments. Chapter 7 explains how prosodic information can be used to improve coreference resolution. Our results show that both manually annotated and automatically predicted prosodic information improve a coreference system for German. Chapter 8 explores the use of automatically predicted semantic relations for both coreference and bridging resolution. While our coreference resolver does not benefit from the newly added information, our bridging resolver can be improved by including automatically predicted meronymy pairs.


Part I.

Background


2. Anaphoric reference

Research Question 1: Task definition: Are the tasks conceptually clear? Does previous work use compatible annotation guidelines or are there very different understandings of the tasks?

This section gives an overview of anaphora and introduces the two main phenomena, coreference and bridging. In doing so, we give an answer to the question of whether the tasks are conceptually clear (Research Question 1), to which we will come back in Section 4.

Reference Reference, in traditional semantics, is a relation between certain expressions in a text and objects of our thought (Bussmann, 1990). Here, the referent is the mental entity to which is referred, and the referring expression is a noun phrase (NP) in a text which identifies some individual object. Reference thus denotes the ability of language expressions to refer to discourse entities, which may be linked to extralinguistic objects (Zikánová et al., 2015). In Figure 2.1, the relation between discourse or mental entities and referring expressions is illustrated.

[Figure 2.1 shows referents (mental entities such as an apple, a favourite pizza, a person, the sun) on one side and referring expressions on the other; the expressions the 28-year-old and she are coreferring expressions pointing to the same referent.]

Figure 2.1.: Reference as the relation between referring expressions and referents


In that sense, reference can either be specific, as in Example (1), describing a particular specimen of a class, or generic, where the reference holds for any member of the class, as shown in Example (2).

(1) Have you seen my cat?

(2) Cats are small, typically furry, carnivorous mammals. Cats are the second-most popular pet in the U.S.

A second interpretation of the term reference comprises textual links to preceding or following context. In the case of textual context, we speak of anaphoric reference, while extra-textual reference is called exophora (Zikánová et al., 2015). Anaphoric reference means that an expression cannot be interpreted on its own: it refers back to an already introduced entity or event, which enables us to identify a referent that could otherwise not be established. In other words, reference to an entity that has been previously introduced into the discourse is called anaphora, and the referring expression is said to be anaphoric.

(3) Peter went into the supermarket. He bought a pizza.

Anaphoric reference can comprise relations of identity, i.e. coreference, where two or more expressions have the same referent, as in Example (3) or in Figure 2.1, where the expressions the 28-year-old and she refer to the same person. It can also comprise bridging (Clark, 1975), where the relation between the anaphoric expression and the expression it refers back to is only one of association: the referents are related, but not identical.

(4) We went to see a movie last night. The tickets were rather expensive.

In Example (4), it is clear that a movie and the tickets do not refer to the same entity, but one cannot interpret the expression the tickets without the previously introduced expression a movie.
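Viewed purely as data, the difference between the two phenomena lies in the type of link between the context-dependent expression and its antecedent. The following Python sketch is only meant to illustrate this; the tuple format is invented for this example and is not the annotation format of any corpus discussed later.

# Illustrative only: anaphor-antecedent links as (anaphor, antecedent, link type) triples.
# Example (3): "Peter went into the supermarket. He bought a pizza."
# Example (4): "We went to see a movie last night. The tickets were rather expensive."
links = [
    ("He", "Peter", "coreference"),          # identical referents
    ("the tickets", "a movie", "bridging"),  # related, but not identical referents
]

def resolve(anaphor):
    """Return the antecedent and the link type for a given anaphoric expression."""
    for ana, antecedent, link_type in links:
        if ana == anaphor:
            return antecedent, link_type
    return None

print(resolve("the tickets"))  # ('a movie', 'bridging')

Both resolution tasks, described in Chapter 3, amount to predicting such links for the context-dependent expressions in a text.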

In general, the referring expression which cannot be interpreted on its own is called the anaphor, while the expression to which it refers back is called the antecedent (sometimes also called the anchor). In the remainder of this thesis, anaphors are marked in boldface and their antecedents are underlined.

There is also the special case of cataphora or backward anaphora (Carden, 1982), in which the context-dependent expression, in this case called the cataphor, appears before the antecedent (sometimes called the postcedent).

(5) Speaking in his home state of Texas, Mr Cruz urged other Republicans to quit the race and unite against Mr Trump.

Anaphors can be pronominal, as in Example (6) or nominal as in Example (7).

(6) Peter went into the supermarket. He bought a pizza.

(7) Peter bought a new book yesterday. The novel turned out to be very entertaining.

In fact, a lot of different pronouns, NP types, as well as adverbs can function as anaphors, as the following enumeration shows.

• Pronouns

– Personal pronouns:

(8) Peter likes watching football matches. He also likes baseball.

– Demonstrative pronouns:

(9) My friend just played the piano for us. That was great.

– Relative pronouns:

(10) Have you seen the man who wears a striped shirt?

– Possessive pronouns:

(11) My sister is saying that the shirt is hers.

– Reflexive pronouns:

(12) Peter washed himself.

– Reciprocal pronouns:


(13) The five people looked at each other.

– Indefinite pronouns:

(14) He greeted the students. One raised his hand in greeting.

• Definite and demonstrative NPs:

(15) Peter gave Bill a strange look. The man was crazy.

• Temporal, local and manner adverbs:

(16) The wedding is at 2 pm. See you then!

• Indefinite NPs (in bridging):

(17) Starbucks is planning their own take on the unicorn frappuccino. One employee accidentally leaked a picture of the secret new drink.

2.1. Coreference

Coreference and anaphora Coreference and anaphora are both basic means of achieving text cohesion, as for example studied in Halliday and Hasan (1976). However, the two terms are not synonymous. Anaphora, as explained in the previous section, is the phenomenon that anaphoric expressions are dependent on the previous context and need to be linked to their antecedent in order to be interpretable. Coreference is defined as the identity of referents signified by language expressions in discourse (Zikánová et al., 2015). As such, anaphoricity, i.e. context-dependence, is not a requirement for two expressions to be considered coreferent. Often, coreference and anaphora occur simultaneously, e.g. in Example (3). However, not all coreferring expressions are anaphoric, e.g. in Example (18), where the second and third occurrences of Google are not dependent on the first occurrence, but of course, all expressions have the same referent (Google, the company).

(18) US lawmakers on Wednesday sent a letter to Google CEO Sundar Pichai expressing concerns regarding Huawei’s ties with the Chinese government. The


lawmakers said the strategic partnership between Google and Huawei on instant messaging, announced in January, poses serious threats to US national security and consumers. The letter also addressed Google’s recent refusal to renew a research partnership, Project Maven, with the Department of Defense.

For the sake of being compatible with previous research, we nevertheless use the term coreference anaphor for all expressions which are coreferent with some expression in the previous context, e.g. also for Google in Example (18), and the term antecedent for coreferred expressions (in the context of coreference, of course, as there are also bridging antecedents). In the following section, we will introduce important concepts and special cases related to coreference.

Predication The theory of reference is based on logical semantics (Frege, 1892; Strawson, 1950), where the relation between language expressions and referents was studied. One notion that was distinguished from the referential use of NPs already back then was the predicative use, sometimes also called attributive use (Donnellan, 1966).

(19) Donald Trump is the US President.

In Example (19), the expressions Donald Trump and the US President are not coreferent, as being the US President is a property of Donald Trump.

Genericity While the distinction between the predicative and the referential use of NPs seems to be generally accepted and is considered in most guidelines, opinions on generic entities and their ability to refer have been more diverse, and as a result, generic entities have been annotated rather diversely across many corpora. It is clear that we want to distinguish between generic and non-generic entities, as in Example (20), where coreference class 1 refers to the generic class of lions and coreference class 2 refers to the specific lions at the zoo.

(20) Today I saw {a bunch of {lions}1}2 at the zoo. {They}1 are great animals. {The lions in our zoo}2 seemed sad, though.

Reference to the type differs from reference to a concrete object, as it does not need to refer to all objects of that type, but is rather a statement about the prototypical member of the class (Zikánová et al., 2015). In Example (21), while it may be true that most cats do not like water, it might not be true for all cats. However, generic entities should

still be considered in coreference, as the repetition of generic entities is important for text cohesion, and of course, generic entities can be pronominalised.

(21) Cats do not like water.

As a result, the handling of generic entities in the annotation of coreference is a controversial issue. Some work left them out completely, while others have suggested that generic anaphoric expressions always start their own coreference chain and should thus only be linked back to their own antecedent and not to other occurrences of the same entity, e.g. in the OntoNotes guidelines (Weischedel et al., 2011). This accounts for pronominalised generic entities, but it does not capture the repetition of generic entities throughout a text. Besides generic entities, there are a number of other special cases which are worth mentioning.

Abstract anaphora Coreferent anaphors can also have non-nominal antecedents, e.g. verbal phrases (VPs) or clauses, as shown in Examples (22) and (23). Because of the often abstract nature of these expressions, this phenomenon is called abstract anaphora or event reference (Asher, 1993).

(22) We found that eating a lot of sugar is detrimental to your health. This has also been shown in previous studies.

(23) I heard him singing last night. That was funny.

As in most of the work on coreference resolution, in our experiments, we focus on nominal antecedents.

Aggregation or split antecedents Often, an anaphoric expression refers to a set of referents, e.g. in Example (24), where the pronoun they refers back to the set consisting of the referents Peter and Sam. As they occur in a conjunction, they can be captured by a single antecedent. Sometimes, however, the two expressions are separated by other syntactic elements, e.g. verbs, as in Example (25). Although conceptually this is the same case as in Example (24), the fact that an annotation now requires two links to express that the anaphor refers to the set of two entities has caused some problems. In some previous work, for example in the PDT corpus (Hajič et al., 2018), the second case is thus treated as bridging of the type set-subset rather than

coreference. We think that aggregation is a special case of coreference, which should not be mixed with bridging.

(24) Peter and Sam met a couple of years ago. They now like to go on holiday together.

(25) Peter met Sam a couple of years ago. They now like to go on holiday together.

Anaphoric zeros Anaphoric zeros or zero anaphora (Saeboe, 1996), the textual ellipsis of a dependent element that can be determined from the context, occur in many languages (e.g. Russian, Japanese and Chinese). As they do not occur in nominal coreference in German and English, they are not a focus of this work.

Bound coreference For pronouns, some work uses the concept of bound coreference: pronouns that appear in quantified contexts are considered to be bound rather than referring.

(26) Every female teacher raised her arm.

In this theory, her does not refer to anything but behaves like a variable bound to the quantified expression every female teacher. In practice, the distinction between bound pronouns and other, non-bound coreference is not always made. In this thesis, we include these cases in coreference resolution, but do not distinguish between bound and non-bound coreference.

Near-identity As the question of the identity of referents is not always trivial, Recasens and Hovy (2010a) have introduced a third concept in between coreference and bridging, called near-identity, which has been picked up by others, e.g. Grishina (2016). Near-identity is defined to hold between an anaphor and an antecedent whose referents are almost identical, but differ in one of four respects: name metonymy, meronymy, class or spatio-temporal functions. Example (27), taken from Recasens and Hovy (2010a), for example contains a near-identity relation between Jews and the crowd.

(27) Last night in Tel Aviv, Jews attacked a restaurant that employs Palestinians. “We want war”, the crowd chanted.

They believe that most of the near-identity types can be grasped on the level of grammar, semantics and concepts. However, the concept has also been criticised, e.g. by Zikánová

et al. (2015), who argue that coreference is a property of the discourse world. They claim that introducing an additional term is not helpful for the understanding of anaphora and mixes up separate levels of language systems and speech. We will get back to this concept in our refined bridging definition in Section 6.3.

Constraints and preferences In order for two expressions to be coreferent, they need to fulfil a number of constraints, some of which are hard constraints while others are tendencies or preferences. The first type are agreement constraints: anaphor and antecedent typically agree in number, person and gender. There are, however, exceptions. Regarding number agreement, some expressions can be referred to with either a singular or a plural pronoun, e.g. the police. In languages with grammatical gender, gender agreement does not need to hold, at least not between two non-pronominal noun phrases, as shown in Example (28).

(28) DE: Der Stuhl [masc.] ... die Sitzgelegenheit [fem.] ... das Plastikmonster [neut.]
EN: the chair ... the seating accommodation ... the plastic monster

Syntactic constraints are generally thought to be hard constraints and are based on the binding theory (Chomsky, 1981). While a full explanation of the binding theory and its underlying assumptions on syntactic structure would go beyond the scope of this thesis, we will give a short explanation of its main ideas based on simplified syntactic structures. Please refer to Chomsky (1981) for more details. One important notion of the binding theory with respect to coreference is the concept of c-commanding nodes in the syntactic tree. NPx c-commands NPy if and only if neither NPx nor NPy dominates the other, and every branching node that dominates NPx also dominates NPy. Or, in simpler terms, c-command summarises the relationships brother, uncle, great-uncle, great-great-uncle, etc. In Example (29), NP1 c-commands NP2.

(29) [S [NP1 John] [VP [V washed] [NP2 himself]]]
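
To make the definition concrete, the following minimal sketch checks c-command on a toy phrase-structure tree; the Node class and the tree encoding are illustrative assumptions, not part of any existing toolkit:

    class Node:
        """A minimal phrase-structure tree node."""
        def __init__(self, label, children=None):
            self.label = label
            self.children = children or []

        def dominates(self, other):
            """True if `other` occurs anywhere below this node."""
            return any(child is other or child.dominates(other)
                       for child in self.children)

    def all_nodes(root):
        """Yield every node in the tree."""
        yield root
        for child in root.children:
            yield from all_nodes(child)

    def c_commands(root, np_x, np_y):
        """np_x c-commands np_y iff neither dominates the other and every
        branching node that dominates np_x also dominates np_y."""
        if np_x.dominates(np_y) or np_y.dominates(np_x):
            return False
        branching = [n for n in all_nodes(root)
                     if len(n.children) > 1 and n.dominates(np_x)]
        return all(n.dominates(np_y) for n in branching)

    # The tree from Example (29): NP1 c-commands NP2, but not vice versa.
    np1 = Node("NP1", [Node("John")])
    np2 = Node("NP2", [Node("himself")])
    tree = Node("S", [np1, Node("VP", [Node("V", [Node("washed")]), np2])])
    assert c_commands(tree, np1, np2) and not c_commands(tree, np2, np1)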


There are three main principles of the binding theory. The first one is concerned with reflexive pronouns and states that reflexives must have local antecedents (must be c-commanded). Local in this case means that they must be bound in their governing category, the clause containing the anaphor.1 Consider Example (30), for which we have already seen a simplified syntactic structure in Example (29).

(30) John washed himself.

(31) * John asked Mary to wash himself.

In contrast, Example (31) is ungrammatical because the reflexive pronoun himself does not have a local, c-commanding antecedent and can thus not be bound in its governing category. The second principle is that personal pronouns must not have local antecedents, i.e. must not be c-commanded by a local antecedent. This means that when we replace the reflexive pronoun in Example (31) with a personal pronoun like him, as in Example (32), the resulting sentence is grammatical.

(32) John asked Mary to wash him.

The third principle is that a full NP cannot have a c-commanding antecedent. In Example (33), it is not possible that both occurrences of John are coreferent because the first occurrence c-commands the second.

(33) * John saw John.

Another constraint, or rather a preference, is called selectional restriction (Chomsky, 1988): in Example (34), the verb eat requires that its direct object denote something that can be eaten, such as a pizza, but not a supermarket.

(34) John bought a pizza from the supermarket. Peter ate it.

Whereas the constraints above are typically considered hard constraints, there are a number of soft factors that affect the salience of an expression, i.e. its degree of accessibility in the addressee’s consciousness at the time of the speaker’s utterance (Prince, 1981). This has the effect that we favour some expressions over others as potential antecedents.

1Note that this definition is overly simplistic and not entirely accurate; for details, see Chomsky (1981).


For example, the verb semantics in Examples (35) and (36) influences the antecedent preferences: the implicit cause of a shouting event is considered to be its object, whereas the implicit cause of a calling event is considered to be its subject (Garvey and Caramazza, 1974). The higher degree of salience for the entity in the respective argument position thus leads to the different preferences.

(35) Peter called Adam. He had broken the TV.

(36) Peter shouted at Adam. He had broken the TV.

Another constraint is the non-accessibility of expressions under negation, e.g. in Example (37), where it is not possible to refer to the house with the pronoun it, as the house appears under negation and is thus not accessible for future (co)reference (Kamp, 1981).

(37) * Peter did not buy a house. It was big.

Recency is another factor that makes expressions more salient, i.e. also more likely to be referred to. In Example (38), we have a tendency to favour a burger as the antecedent, rather than a pizza, simply due to recency.

(38) Bill is eating a pizza. John is eating a burger. Mary wants a taste of it, too.

Grammatical roles are another factor which influences the salience of antecedents (Alshawi, 1987). It is generally assumed that entities introduced in subject position are more likely to be referred to by a pronoun than entities in object position, which in turn are considered more salient than other grammatical roles, such as prepositional objects or adjuncts.

(39) John went to the supermarket with Bill. He bought a pizza.

Plain word repetition also affects salience, as can be seen in the following example.

(40) John went to the supermarket with Bill. He bought a pizza. Later, he met with Peter. He had a nice time.

Parallelism is another factor contributing to salience: pronouns preferentially refer to entities in syntactically parallel positions. In Example (41), Peter is the preferred antecedent, although Bill is in subject position.

(41) Bill took Peter to the supermarket. Sue took him to the park.


The role of prosody Prosody, or more specifically accentuation, is an important means of influencing the meaning of language expressions.

(42) If the going gets tough you don’t want a criminal lawyer – you want a criminal lawyer. (J. Pinkman, Breaking Bad)

In Example (42), the different accentuations lead to different interpretations: in one case, we refer to lawyers specialised in criminal law, in the other case we refer to a lawyer who is also a criminal. In spoken language, pitch accents are often used to emphasise new information, while given (=coreferent) information is often deaccented (Terken and Hirschberg, 1994). Thus, it is important to include prosody in the analysis of coreference, as default preferences and interpretations can be overridden by it. Consider Example (43), taken from Lakoff (1971), where the default interpretation without prosody is that he refers to John and him refers to Peter, due to the preference for role parallelism. In Example (44), we can override the default interpretation by accenting the two pronouns.

(43) {John}1 called {Peter}2 a republican. And then {he}1 insulted {him}2.

(44) {John}1 called {Peter}2 a republican. And then {HE}2 insulted {HIM}1.


2.2. Bridging

Coreferential anaphoric links are not the only type of anaphoricity that is important for establishing textual coherence. Consider Example (45), where the definiteness of the front door signals uniqueness, which can only be fulfilled if the reader accepts the implication that this door is part of the house mentioned in the preceding sentence.

(45) She spends nearly four hours measuring each room in the 50-year-old house. Afterwards, she snaps photos of the front door and the stairway.

This is an example of a bridging anaphor (Clark, 1975; Asher and Lascarides, 1998), also called associative anaphor (Hawkins, 1978): an expression which cannot be interpreted without the previous context. To interpret it, we need to build a “bridge” in order to link the expression to previously mentioned material (Riester and Baumann, 2017). In contrast to coreference, the antecedent is not identical to the anaphor’s referent but associated with it. In other words, bridging is an anaphoric phenomenon where the interpretation of a bridging anaphor is based on a non-identical, associated antecedent.

(46) Our correspondent in Egypt is reporting that the opposition is holding a rally against the constitutional referendum.

(47) What is the movie about? The answer isn’t easy.

One can think of bridging anaphors as expressions with an implicit argument, e.g. the opposition (in Egypt) or the answer (to this question). The term bridging was first introduced in Clark (1975), where a broad classification of different types was presented. In this work, three main groups are distinguished:

• Set-subset:

(48) I met two people yesterday. The woman told me ...

(49) I swung three times. The first time ...

• Indirect reference by association:

(50) I went shopping. The walk was nice.

(51) I looked into the room. The size was overwhelming.


• Indirect reference by characterisation:

(52) John was murdered yesterday. The murderer ...

However, the definition of bridging in Clark (1975) is quite broad, also covering cases which are nowadays covered by coreference. Bridging has also been studied in Hawkins (1978), where the term associative anaphora is used to refer to typically definite associative anaphors, such as the bride in Example (53), that can be interpreted because a previous expression has triggered the reader’s associations, in this case the wedding.

(53) I went to a wedding yesterday. The bride was a friend of mine.

Furthermore, Prince (1981, 1992) introduced the term inferrables to refer to anaphors that can be inferred from certain other discourse entities already mentioned. In Prince (1992), she introduced the term information status to describe the degree of givenness of a language expression and presented a classification based on the notions hearer-new/hearer-old and discourse-new/discourse-old. We will come back to the notion of bridging as an information status category in Section 4. Based on this work, Nissim et al. (2004) picked up the term information status, distinguishing old entities from new ones, with mediated in between. The category mediated comprises a number of different types, including generally-known entities like the pope, but also mediated/bridging. Bridging as a subcategory of information status has been applied in many works (Markert et al. (2012); Baumann and Riester (2012), among others). To date, bridging has been studied in many theoretical studies (Clark, 1975; Hawkins, 1978; Hobbs et al., 1993; Asher and Lascarides, 1998; Prince, 1981) as well as in corpus and computational studies (Fraurud, 1990; Poesio et al., 1997; Vieira and Teufel, 1997; Poesio and Vieira, 1998; Poesio et al., 2004; Nissim et al., 2004; Nedoluzhko et al., 2009; Lassalle and Denis, 2011; Baumann and Riester, 2012; Cahill and Riester, 2012; Markert et al., 2012; Hou et al., 2013a,b; Hou, 2016b; Zikánová et al., 2015; Grishina, 2016; Roitberg and Nedoluzhko, 2016; Riester and Baumann, 2017; Hou, 2018). One big issue is that, unlike in work on coreference, these studies do not follow an agreed-upon definition of bridging. On the contrary, many different phenomena have been described as bridging. As a result, guidelines for bridging annotation differ in many respects, so that they cannot be easily combined to create a larger bridging corpus resource. The latter would, however, be necessary to further research in this area, as statistical approaches

to bridging resolution are constrained by the small size of the existing corpora, as for example stated in Hou (2016b). This section summarises the main issues of diverging bridging definitions.

Overlap with coreference One issue that came up in early work on bridging and is still present in some work is the overlap with coreference anaphora. As mentioned above, Clark (1975) proposed a very broad definition, including the anaphoric use of NPs that have an identity relation with their antecedent, e.g. in

(54) I met a man yesterday. The man stole all my money.

While it is nowadays uncontroversial that these coreferent cases should not fall under the label of bridging, the more difficult cases of coreference where the anaphor and the antecedent do not share the same head but stand in a synonymy, hyponymy or metonymy relation are sometimes treated as bridging, e.g. in Poesio and Vieira (1998), among others. We think that, independent of the surface form, context-dependent expressions whose referent is identical to that of the antecedent should be covered as cases of coreference.

(55) I met a man yesterday. The bastard stole all my money.

Clark (1975) and Asher and Lascarides (1998) also included rhetorical relation or connection cases, e.g. in

(56) John partied all night yesterday. He’s going to get drunk again today.

While these are interesting cases of anaphoric use, most work nowadays limits anaphors to nominal referring expressions.

Definiteness Another important point of discussion is the question of whether definiteness should be a requirement for bridging anaphors. Many studies (Poesio and Vieira, 1998; Baumann and Riester, 2012, among others) have excluded indefinite expressions as potential bridging candidates, stating that indefinite expressions introduce new information that can be processed without the context of the previous discourse. Löbner (1998) suggested that bridging anaphors can also be indefinite, as indefinite expressions can occur in whole-part or part-of-event relations, with the consequence that many studies have linked them as bridging (e.g. in ISNotes).

(57) I bought a bicycle. A tire was already flat.


(58) Standing under the old oak tree, she felt leaves tumbling down her shoulders.

Riester and Baumann (2017) suggested restricting the annotation of bridging to definite expressions as part of their information status annotation of referring expressions (r-level) and treating lexical relations (in indefinite and definite expressions) on another level (called the l-level). We will get back to this question in Section 6.

Pre-defined relations Another common issue is the restriction of bridging to pre-defined relations, such as part-of, set-membership, possession or event relations, e.g. in the Switchboard corpus (Nissim et al., 2004). Other corpora do not make such limitations (e.g. ISNotes). We believe that bridging is a versatile phenomenon that cannot be satisfactorily captured with pre-defined relations. Furthermore, some work has excluded certain relations from the bridging category, e.g. comparative anaphora in the ISNotes corpus (Markert et al., 2012), arguing that they can be found via surface markers such as other, another, etc., for instance in Example (59).

(59) About 200,000 East Germans marched in Leipzig and thousands more staged protests in three other cities.

Comparative anaphora have different properties from “regular bridging” cases, as they indicate co-alternativity, i.e. a relationship on equal terms, between the antecedent and the anaphor, while for typical bridging cases the relation between the anaphor and the antecedent is a hierarchical one, with the bridging anaphor being subordinate to the antecedent.

Bridging-contained Related to bridging is a special case where the antecedent modifies the bridging anaphor, sometimes called containing inferrable (Prince, 1981) or bridging-contained (Baumann and Riester, 2012), as the antecedent is a syntactic argument within the markable, as shown in Examples (60), (61) and (62).

(60) the windows in the room

(61) the room’s windows

(62) their interest.

As these cases of bridging are not context-dependent as such, we think that they should not be included in the regular bridging category.


3. Related NLP tasks

Research Question 1: Task definition: Are the tasks conceptually clear? Does previous work use compatible annotation guidelines or are there very different understandings of the tasks?

3.1. Coreference resolution

Task definition Coreference as an anaphoric phenomenon has been described in Section 2.1. The related NLP task of noun phrase coreference resolution is about determining which NPs in a text or dialogue refer to the same discourse entities. Many definitions revolve around NPs that refer to real-world entities, e.g. in Ng (2010)’s definition. However, the existence of objects in the real world is not essential, as we can also have reference to fictional or hypothetical characters or objects of our thoughts. Prior to work on noun phrase coreference resolution, which has become one of the core topics in NLP, there was much work on pronoun resolution (Hobbs, 1979; Lappin and Leass, 1994), which aimed at finding an antecedent for every pronoun. Coreference resolution nowadays focuses on grouping references to the same discourse entity together, where language expressions referring to the same entity can be understood as an equivalence class, set or chain. This means that, in contrast to anaphora resolution, which aims at finding an antecedent for each anaphor in a text, coreference resolution is about partitioning the set of discourse entities (NPs) in a text into equivalence classes. This also includes named entities (NEs) and non-anaphoric nominal expressions. Example (1) shows the different types of entities involved, marked with numbers to highlight the equivalence classes. Note that the classes contain truly anaphoric expressions (the former secretary of state) as well as non-anaphoric coreferent expressions (Trump).

(1) {Democrat Hillary Clinton}1 and {Republican Donald Trump}2 have won the most states on the biggest day of the race for the US presidential nominations. {{The former secretary of state}1 and {Trump, a property tycoon,}2}3 entered Super Tuesday as favourites to win the vast majority of states for {their}3 respective parties. {Mr Trump}2 won seven states while {{his}2 closest rival, Ted Cruz,}4 took three. Speaking in {his}4 home state of Texas, {Mr Cruz}4 urged other Republicans to quit the race and join {him}4 against {Trump}2.

Coreference resolution can be divided into two subtasks. The first subtask is to figure out what language expressions need to be partitioned into clusters, i.e. determining the span of (mostly) NPs1. These language expressions that make up the set of discourse entities are generally called mentions or markables. The second subtask is then to group these mentions into equivalence classes referring to the same entity.

Computational approaches to coreference resolution Coreference resolution has become one of the core NLP tasks, with its own track at most NLP conferences. It dates back to the 1960s with the very first prototypical experiments and the 1970s, when work on pronoun resolution began, e.g. the syntax-based pronoun resolution algorithm for third-person pronouns by Hobbs (1979). The method requires a syntactic parser as well as a morphological number and gender checker. It searches the syntactic trees of the current and preceding sentences and stops when it finds a matching NP. The algorithm starts with a right-to-left search in the current sentence and checks whether it finds a matching NP node as antecedent that agrees in number and gender and is not c-commanded (except for reflexive pronouns). If not, we move on to the preceding sentence and perform a left-to-right search in a breadth-first manner. If still none is found, we check left-to-right in the sentence of the pronoun, to account for cataphora.

In subsequent years, a couple of rule-based systems were developed based on linguistic information, such as the salience-based model by Lappin and Leass (1994). They employ a simple weighting scheme based on recency and syntactic preferences such as grammatical roles. There are two types of steps: a discourse model update when new sentences are read into the model, and a pronoun resolution step each time a pronoun is encountered. The discourse model step updates the weights of the introduced entities by adding weights according to the factors mentioned above. Several NPs denoting the same referent are hereby treated as an equivalence class, i.e. the weights for each factor are summed. In the pronoun resolution step, we compute two extra factors that can only be computed based on the pronoun and possible antecedent

1The question of which language expressions should be considered is non-controversial. NPs are always included and VPs or clauses are typically also allowed as antecedents. However, the latter are typically ignored in automatic approaches.

pair: role parallelism and whether the two are cataphoric. The entity that has the highest score and does not violate syntactic (c-command) and morphological constraints is considered the most salient and is proposed as the antecedent. Progress in theoretical work on local coherence and salience, e.g. by Grosz and Sidner (1986) or Grosz et al. (1995), enabled the development of a number of additional pronoun resolution algorithms (Tetreault (1999); Brennan et al. (1987), among others).
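
To illustrate how such a salience-based scheme operates, here is a minimal sketch in the spirit of Lappin and Leass (1994); the factor weights, the decay by halving and the helper predicates are simplified placeholders rather than the original specification:

    from dataclasses import dataclass, field

    # Illustrative factor weights; the original system uses a fixed set of
    # factors (sentence recency, subject, direct object, ...) with hand-set
    # values. The numbers below are placeholders.
    FACTOR_WEIGHTS = {"recency": 100, "subject": 80, "direct_object": 50, "other": 40}

    @dataclass
    class Entity:
        """A discourse entity: an equivalence class of NPs plus a salience score."""
        mentions: set = field(default_factory=set)
        salience: float = 0.0

    def update_discourse_model(entities, new_sentence_mentions):
        """Discourse-model update: degrade old salience, add weights for new mentions."""
        for entity in entities:
            entity.salience /= 2  # salience decays with every new sentence
        for mention, role, entity in new_sentence_mentions:
            entity.mentions.add(mention)
            entity.salience += FACTOR_WEIGHTS["recency"]
            entity.salience += FACTOR_WEIGHTS.get(role, FACTOR_WEIGHTS["other"])

    def resolve_pronoun(pronoun, entities, agrees, violates_binding):
        """Pronoun resolution: the most salient entity that agrees morphologically
        and does not violate the syntactic (c-command) constraints wins."""
        candidates = [e for e in entities
                      if agrees(pronoun, e) and not violates_binding(pronoun, e)]
        return max(candidates, key=lambda e: e.salience, default=None)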

With the creation of large, manually annotated corpora, e.g. the MUC-6 and MUC-7 corpora, rule-based approaches were soon superseded by probabilistic, data-driven models. Ge et al. (1998) presented a statistical model for pronoun resolution, Soon et al. (2001) the first machine learning approach to nominal coreference resolution. The basic idea follows that of any supervised approach: we start with gold annotated data, extract positive examples of coreferent pairs and negative examples of non-coreferent pairs and then define features for the pairs to create a classification task. As such, this represents the mention-pair approach, where we pair mentions and let the classifier decide whether this individual pair is coreferent or not. Soon et al. (2001) use simple features including string match, NP type, number and semantic class agreement, to name a few. To overcome the problem that there are many more non-coreferent pairs in the training data than coreferent ones, they restrict the negative instances to those mentions appearing between a coreferent pair, i.e. the gold anaphor is paired with all mentions that appear between it and its closest coreferent antecedent. As a decoding strategy, they use closest-first decoding: if for one anaphor the classifier judges several antecedent candidates to be coreferent, the closest is chosen (a sketch of the training instance creation is given below).

The mention-pair model has a couple of rather obvious weaknesses, the main one being that the pairs are considered individually by the classifier, so the transitivity which is inherent in coreference chains cannot be ensured. For example, the classifier can predict pair A-B to be coreferent as well as pair B-C, but not pair A-C. To solve the non-transitivity problem, a number of clustering or graph partitioning algorithms have been proposed, e.g. in Ng and Cardie (2002). Another weakness is the fact that the classifier only knows about the pair and not the clusters that have already been formed. Hence, the entity-mention model was introduced (Luo et al., 2004), in which the NP to be resolved is compared against already formed clusters or entities. Still, each pair is considered individually, so antecedent candidates are never compared against each other to determine the most probable antecedent. To acknowledge this, mention-ranking models have been developed, where all antecedent candidates are considered simultaneously and ranked for an anaphor to find the most likely antecedent, e.g. in Denis and Baldridge (2007).
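
The following minimal sketch illustrates the training-instance creation of the mention-pair approach as described for Soon et al. (2001); mention representation and feature extraction are deliberately omitted, and the helper gold_chain_of is a hypothetical accessor:

    def create_training_pairs(mentions, gold_chain_of):
        """Create mention-pair instances in the style of Soon et al. (2001).

        mentions: the document's mentions in textual order.
        gold_chain_of(mention): id of the mention's gold chain, or None.
        The closest preceding coreferent mention yields a positive instance;
        every mention in between yields a negative instance.
        """
        pairs = []  # (antecedent_candidate, anaphor, label)
        for j, anaphor in enumerate(mentions):
            chain = gold_chain_of(anaphor)
            if chain is None:
                continue
            antecedent_idx = None
            for i in range(j - 1, -1, -1):  # search backwards for the closest antecedent
                if gold_chain_of(mentions[i]) == chain:
                    antecedent_idx = i
                    break
            if antecedent_idx is None:
                continue  # first mention of its chain: not an anaphor
            pairs.append((mentions[antecedent_idx], anaphor, 1))
            for i in range(antecedent_idx + 1, j):
                pairs.append((mentions[i], anaphor, 0))
        return pairs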


Figure 3.1.: Latent trees for coreference resolution: data structures in the pair-based approach (left) and the tree-based approach (right)

Ideally, we would like to combine the ranking models with the models that have information about the already formed clusters. Hence, the cluster-ranking approach was proposed by Rahman and Ng (2009). Since then, numerous algorithms have been developed, enabled by the creation of many annotated corpora, in recent years particularly the OntoNotes corpus (Weischedel et al., 2011). There has also been progress in the evaluation methodology, which resulted in the CoNLL score and its de facto official reference scorer (Pradhan et al., 2014), which will be explained in more detail in the following section. The machine learning involved in the models has progressed and more sophisticated structures have been learned, e.g. latent trees in Fernandes et al. (2012) and Björkelund and Kuhn (2014). These approaches assume a hidden structure underlying the data in the form of latent trees. The benefits of such a tree-based system are that the system has access to more informative antecedent candidates during learning and that features can be defined over the tree, e.g. involving siblings or parents, whereas the standard approach is limited to pairs. Consider Example (2)2, for which Figure 3.1 shows the standard data structure and the tree structure in latent tree approaches.

(2) {Drug Emporium Inc.}1 said {Gary Wilber}2 was named CEO of {this drugstore chain}1. {He}2 succeeds {Philip T. Wilber, who founded {the company}1 and remains chairman}3. {Robert E. Lyons III, who headed {the company's}1 Philadelphia region}4, was appointed president and chief operating officer, succeeding {Gary Wilber}2.

Of course, there are also other approaches which are not based on supervised methods. Haghighi and Klein (2007) presented an unsupervised approach based on a nonparametric Bayesian model. Rule-based models have also been developed, e.g. in the sieve-based

2Example taken from Björkelund and Kuhn (2014)


Stanford deterministic system (Raghunathan et al., 2010) or in CorZu, a rule-based system for German (Klenner and Tuggener, 2011). More recently, with the advances in deep learning, neural models have also been applied to coreference resolution (Clark and Manning, 2016b). This neural cluster-ranking approach works with relatively few features, mostly word embeddings plus a number of additional features, including string match and distance features. Lee et al. (2017) were the first to introduce a neural end-to-end coreference resolver which is not based on mention extraction that relies on a syntactic parser, but which considers all spans in a document as potential mentions. It includes a head-finding attention mechanism. Based on this, Zhang et al. (2018) suggested a biaffine attention model to obtain antecedent scores for each possible mention and jointly optimised mention detection and mention clustering. Due to the power of the word and character embeddings, the only additional features used are speaker information, document genre, span distance and span width, encoded as 20-dimensional learned embeddings.
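
As a small illustration of the span-based candidate space used by such end-to-end models, the sketch below enumerates all spans up to a maximum width; in the actual models these candidates are pruned with learned span scores, which is omitted here:

    def candidate_spans(tokens, max_width=10):
        """Enumerate all (start, end) token spans of up to max_width tokens.
        End-to-end resolvers treat every such span as a potential mention."""
        spans = []
        for start in range(len(tokens)):
            for end in range(start, min(start + max_width, len(tokens))):
                spans.append((start, end))
        return spans

    # Example: a 20-token sentence already yields 155 candidate spans.
    print(len(candidate_spans(["tok"] * 20)))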

Evaluating coreference resolution Scoring the performance of a coreference system is an important and non-trivial task, which is why there is not one standard measure, but rather a number of evaluation measures. This section gives an overview of the standard measures as well as the CoNLL metric, which is currently the standard metric for experiments on coreference resolution.

There are a number of relevant terms in coreference evaluation. Singletons are referring expressions that could potentially corefer but occur only once in a document, in contrast to expressions which are never used to refer to anything, e.g. expletive pronouns or idioms like on the other hand. There is an ongoing debate on whether the determination of singletons should be a part of the evaluation, as including them affects the evaluation metrics. The term key refers to the manually annotated coreference chains (the gold standard) while response refers to the coreference chains output by a system (Vilain et al., 1995). Recall that coreferring mentions form clusters or equivalence classes.

The CoNLL score is an average of the three evaluation metrics MUC, BCUBE and CEAFE. Pradhan et al. (2014) have developed an official scorer, which has also been used in previous shared tasks.3

The MUC algorithm (Vilain et al., 1995) is a link-based version of precision and recall. The recall error is computed by taking each equivalence class, counting the number of

3https://github.com/conll/reference-coreference-scorers Note that earlier versions of the script contained a number of bugs, which heavily affected the performance scores.

links that are missing and dividing it by the number of correct links. Reversing the roles of the key and the response yields the precision error. MUC has a few disadvantages. As the algorithm is link-based, singletons are ignored during the computation. It also sometimes fails to distinguish system outputs of different quality and systematically favours systems that produce fewer equivalence classes.

The B3/BCUBE algorithm (Bagga and Baldwin, 1998) tries to overcome the problem of ignored singletons in MUC by looking at the presence of entities relative to other entities in the equivalence class. Thus, it is mention-based rather than being link-based like MUC.
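
As an illustration, the following minimal sketch computes B3 precision and recall; it is a simplified version that averages over the union of key and response mentions and treats mentions missing from one side as singletons there, which glosses over details handled by the official scorer:

    def b_cubed(key_clusters, response_clusters):
        """Simplified B3: key_clusters and response_clusters are lists of sets of mention ids."""
        def cluster_lookup(clusters):
            return {mention: cluster for cluster in clusters for mention in cluster}

        key_of = cluster_lookup(key_clusters)
        resp_of = cluster_lookup(response_clusters)
        mentions = set(key_of) | set(resp_of)

        def averaged_overlap(numerator_of, denominator_of):
            total = 0.0
            for m in mentions:
                denom = denominator_of.get(m, {m})  # missing mentions count as singletons
                num = numerator_of.get(m, {m})
                total += len(denom & num) / len(denom)
            return total / len(mentions) if mentions else 0.0

        precision = averaged_overlap(key_of, resp_of)  # overlap relative to the response cluster
        recall = averaged_overlap(resp_of, key_of)     # overlap relative to the key cluster
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1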

The Constrained Entity-Alignment F-Measure, short CEAF, produces a one-to-one mapping between subsets of key equivalence classes and system output equivalence classes, with the constraint that a system equivalence class is aligned with at most one key equivalence class. In contrast to other metrics, it penalises systems that produce too many or too few equivalence classes. The metric is based on two similarity measures: one is equivalence-class based, called CEAFE, the other mention-based, called CEAFM. For the rather complex formulae, see Luo (2005).

There is another popular evaluation metric that is not taken into account for the CoNLL score, called BLANC (Recasens and Hovy, 2010b). The motivation for BLANC was to handle singletons correctly as well as to reward correct coreference chains according to their length. BLANC is short for bilateral assessment of noun-phrase coreference and is based on applying the Rand index to coreference. It is bilateral in that it takes into consideration both coreference and non-coreference links. It rewards each link to a class depending on how large the class is, overcoming problems that both MUC and BCUBE have (in BCUBE and CEAF, for example, singletons are rewarded as correct full links).

Very recently, Moosavi and Strube (2016) have proposed another metric called LEA, a link-based entity-aware metric. They state that MUC, BCUBE and CEAFE all have their shortcomings and that the agreement between the metrics is often low, and they argue that using the CoNLL score as an average of three unreliable metrics does not result in a reliable score. Moosavi and Strube (2016) report a detailed analysis of the shortcomings of the previous metrics and an illustrative example of their newly proposed LEA metric.

As the LEA metric became available after our experiments had already been performed, we aim to use the new LEA metric for future experiments but report the CoNLL score for the experiments in this thesis.


3.2. Bridging resolution

Bridging as an anaphoric phenomenon has been described in Section 2.2. The corresponding NLP task of bridging resolution is about linking these anaphoric noun phrases and their antecedents, which do not refer to the same referent but are related in a way that is not explicitly stated. Reasoning is needed in the identification of the textual antecedent (Poesio and Vieira, 1998).

Bridging anaphora recognition and bridging anaphora resolution There is actually not just one NLP task, but several subtasks that revolve around the phenomenon of bridging. Full bridging resolution is the task of determining that a certain NP is a bridging anaphor and finding the correct antecedent that is necessary for the interpretation of the anaphor. As this task is complex and rather difficult, it has been broken down into two subtasks: (i) determining that a certain NP is a bridging anaphor, called bridging anaphora recognition/determination, and (ii) finding an antecedent for a given bridging anaphor, called bridging anaphora resolution. Bridging anaphora recognition is often a part of fine-grained information status classification, where bridging is one of the information status categories. Additionally, some bridging approaches also determine the relation between the bridging anaphor and the antecedent.

Computational approaches to bridging resolution Bridging recognition as a subtask of fine-grained information status classification has been performed in Rahman and Ng (2012), based on the Switchboard corpus (Nissim et al., 2004; Calhoun et al., 2010). Switchboard contains annotated bridging anaphors, but does not contain the respective annotated antecedents. The bridging category is also limited to the pre-defined relations mediated/part, mediated/situation, mediated/event and mediated/set. On this dataset and for the four bridging subtypes, Rahman and Ng (2012) reported rather high F1 scores, using predicted coreference: 63% for the event subcategory, 87% for set, 83% for part and 80% for situation. As these restricted bridging types do not reflect bridging in data where there is no restriction on the relation or type, these results have to be taken with a grain of salt.

In the same year, Markert et al. (2012) presented a study on fine-grained information status classification on their corpus ISNotes based on collective classification, where they achieved an F1 score of 18.9% for the subcategory bridging (recall 12.1%, precision 41.7%). In Hou et al. (2013a), the model of Markert et al. (2012) was extended to better recognise bridging anaphors. For this, more linguistic features targeting genericity,

discourse structure and lexico-semantic patterns were integrated. This improved the performance for the subcategory significantly, with an F1 score of 42.2%. Hou (2016a) implemented an LSTM-based model for fine-grained information status prediction, also based on the corpus ISNotes, where she showed that a model based on word embeddings and a couple of simple additional features achieves results comparable to Markert et al. (2012) in terms of overall information status prediction. However, the performance for bridging was lower than in Hou et al. (2013b): in the best setting of the network, the F1 score for the subcategory bridging was only 24.1%.

For German, there has been little work so far. Cahill and Riester (2012) presented a CRF-based automatic classification of information status, which included bridging as a subclass. However, they did not state the accuracy per class, which is why we cannot derive any performance estimate for the task of bridging anaphor detection. They stated that bridging cases “are difficult to capture by automatic techniques”, which confirms similar intuitions about information status classification for English, where bridging is typically a category with rather low accuracy (Markert et al., 2012; Rahman and Ng, 2012; Hou, 2016a).

The other subtask, bridging anaphora resolution, i.e. determining an antecedent for a given bridging anaphor, has so far been the main focus of most previous work. The first work was presented by Poesio and Vieira (1998), who investigated the use of definite descriptions. Note that cases of coreference where the anaphor and antecedent do not share the same head were included in the bridging category. In this study, they used a corpus of 20 Wall Street Journal articles.4 Based on this corpus, a number of papers revolved around resolving these bridging anaphors, mostly based on WordNet (Fellbaum, 1998), e.g. Poesio et al. (1997); Vieira and Teufel (1997); Schulte im Walde et al. (1998); Poesio et al. (2002); Markert et al. (2003).

Vieira and Teufel (1997) aimed at resolving definite bridging anaphors. They tested how WordNet can help find the correct antecedent by looking for synonyms (mainly for the coreferent cases), meronyms or hyponyms. They found that only 19% of the bridging cases could be handled by WordNet. In Schulte im Walde et al. (1998), automatic ways of deducing the semantic information necessary to interpret the relation between anaphor and antecedent were explored using cluster information. Their category inferential again comprised different-head coreference and bridging cases. Their main idea was to find the best

4The paper and the corpus were actually already published in 1997, in a manuscript from the University of Edinburgh. The paper in Computational Linguistics appeared in 1998.

antecedent by creating a high-dimensional vector space from the BNC corpus (Clear, 1993) and computing a similarity measure, including cosine similarity, Euclidean distance and the Manhattan metric. They found that using the cosine similarity worked best, with an accuracy of 22.7%.

Poesio et al. (2002) included syntactic patterns to find meronyms in the BNC corpus, including the “NP of NP” pattern designed to find semantically connected concepts, like the windows in the room, or the genitive pattern “NP's NP” (the room's windows). The pattern is useful for finding meronym-holonym pairs, as these often occur in the above-mentioned pattern, i.e. “meronym of holonym”. For example, if we consider a bridging anaphor the windows and we find an occurrence of the room in the previous context, it is likely that the room is the antecedent, as they often occur as the windows in the room. Markert et al. (2003) restricted their study to the 12 bridging cases classified as meronymy in Poesio and Vieira (1998). They then used the “NP of NP” pattern (and many variations thereof) to query the web. The pair with the highest frequency was chosen as the bridging pair. They found the correct antecedent in seven out of 12 cases.

In Poesio et al. (2004), the first machine-learning-based model for bridging anaphora resolution was presented: a pair-wise model fed with lexical and salience features, focusing on cases of meronymy in the GNOME corpus. Lassalle and Denis (2011) adapted the learning-based approach to French and reported an accuracy of 23% for meronymic bridging pairs in the DEDE corpus. Most of the work on anaphora resolution presented here is restricted to definite descriptions and includes a mixture of coreference and bridging cases. However, a lot of the ideas proposed in these approaches are still very relevant, e.g. the prepositional pattern introduced in Poesio et al. (2002).

More recently, Hou et al. (2013b) presented a study on anaphora resolution that was not restricted to definites or certain semantic relations, based on the ISNotes corpus (Markert et al., 2012). They started with a pair-wise model and a rich feature set, but then stated that considering anaphor and antecedent pairs in isolation does not seem to be reasonable, as they often appear in clusters. This means that one expression is introduced, e.g. the house, and then several aspects of it are discussed, e.g. the floors, the garage, etc. As such, antecedents are often the antecedent of several anaphors, so-called sibling anaphors. To acknowledge this, they switched to a global Markov model, in which they used the same features as in the first experiment, but added the following constraints: (i) anaphors are likely to share the same antecedent, (ii) the semantic connectivity of one antecedent to all anaphors should be modelled globally and (iii) the

union of potential antecedents is considered for all anaphors instead of a fixed window size. This way, they could achieve a significant improvement over the baseline, with an accuracy of 41.32% on the ISNotes corpus. Hou (2018) presented an experiment on bridging anaphora resolution where she created word embeddings based on matches of the “NP of NP” pattern extracted from the Gigaword corpus (Napoles et al., 2012) to capture the semantic connectivity between two words. She showed that using these word embeddings alone, one can achieve 30% accuracy. When integrating the PP word embeddings into the global model of Hou et al. (2013b), the state of the art on the ISNotes corpus could be improved, reaching 45.85% accuracy (a sketch of the underlying pattern idea is given below).

In this thesis, we will mainly consider the task of full bridging resolution, i.e. a combination of bridging anaphor detection and resolution, but will also report numbers for bridging anaphor detection. The first work on full bridging resolution was performed by Vieira and Poesio (2000), who classified each definite description as either direct anaphora (same-head coreference), discourse-new, or a bridging description. For those definite descriptions classified as bridging, the system then identifies an antecedent. The system made use of syntactic and lexical information to classify the definite descriptions and used WordNet to resolve bridging descriptions. They did not state the performance for the category bridging, which again included cases of coreference where the anaphor and antecedent do not share the same head.

Bunescu (2003) also presented a system for full bridging resolution for definite descriptions, using lexico-syntactic patterns obtained by searching the web. He distinguished identity and associative (=bridging) anaphors. The method was evaluated on the first 32 documents of the Brown section of the corpus, but performances for the individual classes (identity or associative) were not reported. Hou et al. (2014) presented a rule-based system that consists of eight hand-crafted rules. Some of the rules are rather specific, for example aiming to find buildings and their parts, while other rules make use of the “NP of NP” pattern to determine the semantic connectivity of two words. The system will serve as a baseline in one of our experiments and will be explained in more detail in Section 6. In our own previous work (Rösiger and Teufel, 2014), we aimed at resolving bridging anaphors in scientific text by training a coreference resolver on bridging references, together with some additional WordNet features.
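
To illustrate the “NP of NP” preposition pattern that recurs throughout this line of work (Poesio et al., 2002; Hou, 2018), here is a minimal, purely string-based sketch; real systems operate over parsed NPs and much larger corpora, so the regular expression and the ranking function below are simplifying assumptions:

    import re
    from collections import Counter

    def preposition_pattern_counts(corpus_sentences):
        """Count head-noun pairs occurring as 'X of the Y', e.g. 'windows of the room'."""
        pattern = re.compile(r"\b(\w+) of the (\w+)\b", re.IGNORECASE)
        counts = Counter()
        for sentence in corpus_sentences:
            for x, y in pattern.findall(sentence):
                counts[(x.lower(), y.lower())] += 1
        return counts

    def rank_antecedent_candidates(anaphor_head, candidate_heads, counts):
        """Rank antecedent candidates for a bridging anaphor by how often
        'anaphor of the candidate' was observed in the corpus."""
        return sorted(candidate_heads,
                      key=lambda cand: counts.get((anaphor_head, cand), 0),
                      reverse=True)

    counts = preposition_pattern_counts(["She admired the windows of the room."])
    print(rank_antecedent_candidates("windows", ["room", "walk"], counts))  # ['room', 'walk']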


Sasano and Kurohashi (2009) presented a probabilistic model to resolve bridging anaphors in Japanese. Their model considers bridging anaphora as a kind of zero anaphora and applies techniques used to resolve zero anaphora, based on automatically acquired lexical knowledge. For full bridging resolution in German, Hahn et al. (1996) and Markert et al. (1996) presented a resolver for bridging anaphors, back then called textual ellipsis or functional anaphora, in which they resolved bridging anaphors in German technical texts using centering theory (Grosz et al., 1995) and a knowledge base. The corpus and the knowledge base, as well as the overall system, are, however, not available.

Evaluating bridging resolution We adopt the evaluation metrics applied in previous research (Hou, 2016b), where bridging resolution is evaluated using the widely known measures precision and recall (and their harmonic mean, F1). The precision of a rule or a system is computed by dividing the number of correctly predicted bridging pairs by the number of all predicted bridging pairs. The recall is computed by dividing the number of correctly predicted bridging pairs by the number of all gold bridging pairs. The bridging anaphor is considered a mention, while the antecedent is considered an entity, which is taken into account by including gold coreference chains during the evaluation: if the predicted antecedent is one of the mentions in the coreference chain of the gold antecedent, the bridging pair is considered correct.

This rather basic evaluation has a few shortcomings. Firstly, the evaluation is rather strict, as overlapping markables (where, for example, the predicted anaphor contains an adverb which the gold anaphor does not contain) are considered wrong. This is particularly relevant for experiments with automatically predicted markables, as they may differ from the annotated gold markables. Furthermore, bridging anaphors with more than one link, e.g. comparative anaphora as in Example (3), are only correct if all antecedents have been found by the system. Partial correctness is not taken into account, i.e. when the pair the US and other countries is suggested by the system in Example (3), it is considered wrong.

(3) Canada, the US and other countries

The same holds for discontinuous antecedents, as in Example (4), where the anaphor those in Europe wanting to invest in IT technology was annotated and a part of the NP, or Asia, was left out. It is probably controversial whether allowing parts of NPs to be

markables is a good annotation strategy, but as this is present in some of the corpora, it would be nice to have some way of dealing with it in the evaluation.

(4) those in Europe or Asia wanting to invest in IT technology.

Another special case is that of anaphors without antecedents, so-called empty antecedents, which are also contained in some of the bridging corpora. We adopt the rather simple evaluation metrics precision and recall in order to be comparable with previous research, but it should be noted that a more complex evaluation metric, as available for coreference resolution, would be preferable to ensure a fairer evaluation. A minimal sketch of the adopted evaluation is given below.
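
The following sketch implements this precision/recall computation; the data structures are illustrative assumptions, and the special cases discussed above (overlapping markables, multi-antecedent anaphors, empty antecedents) are deliberately not handled:

    def evaluate_bridging(gold_pairs, predicted_pairs, gold_coref_chains):
        """gold_pairs / predicted_pairs: sets of (anaphor, antecedent) tuples.
        gold_coref_chains: list of sets of mentions; a predicted antecedent is
        correct if it lies in the same gold chain as the gold antecedent."""
        def same_entity(gold_ante, pred_ante):
            if gold_ante == pred_ante:
                return True
            return any(gold_ante in chain and pred_ante in chain
                       for chain in gold_coref_chains)

        gold_antecedent = {ana: ante for ana, ante in gold_pairs}
        correct = sum(1 for ana, ante in predicted_pairs
                      if ana in gold_antecedent and same_entity(gold_antecedent[ana], ante))
        precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
        recall = correct / len(gold_pairs) if gold_pairs else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1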

Part II.

Data and tool creation


4. Annotation and data creation

Research Question 2: Data creation: Is there enough consistently annotated data to enable the creation of automatic tools, including ones making use of statistical algorithms? If not, can we create data resources to fill the research gap?

This section gives an overview of previous work on coreference and bridging annotation, particularly on the compatibility of annotation guidelines, and summarises available corpus resources, with a focus on the corpora used in the remainder of the thesis. Thus, it gives an answer to Research Question 1 on how well defined the phenomenon and the tasks are, which was already partially answered in Sections 2.1 and 2.2, as well as to Research Question 2 on the availability of data.

4.1. Coreference annotation and existing corpora

As explained in Section 2.1, the phenomenon of coreference is generally well understood and clearly defined, with the exception of a few special cases. The remaining differences between guidelines can, of course, be of importance when the aim of the work is to study one of these special phenomena. To give an impression of what differences remain, Table 4.1 compares three exemplary coreference guidelines: the OntoNotes guidelines (Weischedel et al., 2011), the RefLex guidelines (Baumann and Riester, 2012; Riester and Baumann, 2017) as well as the NoSta-D guidelines developed for non-standard text (Dipper et al., 2013). The first important question is always how the markables, the expressions that we want to annotate, are defined. Most work suggests annotating the maximum span of NPs as well as embedded NPs as additional markables. As can be seen in Table 4.1, RefLex includes prepositions in the markables, which means that prepositional phrases (PPs) are annotated rather than NPs in cases where an NP is embedded in a PP. This is due to the fact that the guideline scheme was developed for German, where there are merged forms of a preposition and a determiner, for example in am Bahnhof (at the station). Another common difference is that other types of pronouns are included or

excluded, as well as the handling of certain pronouns. Relative pronouns, for example, are sometimes annotated as a markable, whereas in the RefLex scheme they are part of the relative clause and not annotated as a separate markable, because they trivially corefer with the referent of the head noun (or the whole span, respectively). Other types of difference stem from special constructions such as aggregation, which is for example not annotated in OntoNotes, or the handling of generic entities. In OntoNotes, generic pronouns can be linked to their antecedent, but they always only make up a coreference chain of two, consisting of the anaphor-antecedent pair. Additionally, some guidelines (e.g. NoSta-D) distinguish non-anaphoric and anaphoric coreference1, whereas most guidelines do not make such a distinction. Since we are interested in general coreference resolution and not in a certain special case, e.g. generic anaphors or abstract anaphors, we accept the minor differences contained in most corpora.

Criterion | OntoNotes | RefLex | NoSta-D
Prepositions | excluded from markable | included in markable | excluded from markable
Relative pronouns | annotated separately | part of complex relative markable | annotated separately
Antecedent of abstract anaphor | verbal head | entire clause or VP | not annotated
Aggregation | no | yes | yes
Apposition | separate link | included in markable | included in markable
Generic expressions | annotated | annotated | not annotated
Generic anaphors | only pronouns are linked | not linked | linked
Non-anaphoric/anaphoric coreference | not distinguished | not distinguished | distinguished

Table 4.1.: Guideline comparison: overview of the main differences between OntoNotes, RefLex and NoSta-D

As a result of the well-understood phenomenon, many high-quality corpus resources have been developed. Nowadays, automatic tools are typically trained on the benchmark dataset OntoNotes, which spans multiple genres (mostly newswire, broadcast news, broadcast conversation and web text, among others) across three languages – English, Chinese and Arabic (Weischedel et al., 2011). Before OntoNotes, the (much smaller) benchmark

1As explained in the introduction, non-anaphoric coreference occurs for example when certain named entities, such as Google, occur several times throughout a document.

datasets used were the MUC (Hirschman and Chinchor, 1998) and ACE (Doddington et al., 2004) corpora. OntoNotes differs from these two corpora with respect to corpus size and the inclusion of a few more genres. Benchmark datasets have of course also been created for other languages, e.g. the Prague Dependency Treebank (Hajič et al., 2018) for Czech, the ANCORA newspaper corpora of Spanish and Catalan (Martí et al., 2007) or TüBa-D/Z (Naumann and Möller, 2006) as a newspaper corpus for German, to name only a few.

Despite the fact that OntoNotes contains multiple genres, it is unsuited as a data basis for other domains with very different properties. Hence, annotated corpora have also been created for many other domains. One example of such a domain is the biomedical domain, for which Gasperin and Briscoe (2008) have shown that the text differs considerably from other text genres such as news or dialogue and that the complex nature of the texts is, for example, reflected in the heavy use of abstract entities, such as results or variables. As a result, many corpora have been annotated for this domain (Castaño et al. (2002), Cohen et al. (2010), Gasperin et al. (2007), Batista-Navarro and Ananiadou (2011), among others). It has been shown that coreference resolution for the biomedical domain benefits a lot from in-domain training data (Rösiger and Teufel, 2014). Another example of a domain for which corpora have been developed is scientific text, e.g. in Schäfer et al. (2012), where a large corpus of computational linguistics papers has been annotated. Due to the large number of published coreference corpora, we refrain from including a full literature review. For a more detailed analysis of the most important available corpus resources, see Poesio et al. (2016).

Corpora used in this thesis This section presents the three corpora containing coreference annotation that we will use in our experiments. Table 4.2 shows in which sections the existing corpora are used.

Corpus | Language | Annotations | Used in
OntoNotes | EN | Coreference | Validation experiments (Section 8)
TüBa-D/Z | DE | Coreference | Tool development (Section 5.2)
DIRNDL | DE | Coreference, bridging | Tool development (Section 5.2.6), Validation experiments (Section 7)

Table 4.2.: Existing corpora annotated with coreference used in this thesis


OntoNotes The OntoNotes corpus (Weischedel et al., 2011, 2013) has been the benchmark dataset for English, Arabic and Chinese since the shared tasks on coreference resolution in 2011 and 2012 (Pradhan et al., 2011, 2012). We will use the English portion, which contains 1.6M words, in our linguistic validation experiments, where we include automatically predicted semantic relation information to improve coreference resolution.

TüBa-D/Z The reference corpus for coreference resolution experiments on German data is TüBa-D/Z2 (Naumann and Möller, 2006). The TüBa-D/Z treebank is a syntactically and referentially annotated German newspaper corpus of 1.8M tokens based on data taken from the daily issues of ‘die tageszeitung’ (taz). We will use the TüBa-D/Z data in the tool development section, where we adapt an existing coreference tool to German. The NoSta-D guidelines, as shown in Table 4.1, are based on the TüBa-D/Z guidelines3.

DIRNDL The DIRNDL corpus (Eckart et al., 2012; Björkelund et al., 2014) is a German corpus of spoken radio news. Coreference anaphors have been annotated as a subcategory of referential information status according to the RefLex scheme (Riester et al., 2010; Baumann and Riester, 2012) (also contained in Table 4.1), and the coreference anaphors have also been linked to their antecedents. DIRNDL contains nominal, verbal and clausal antecedents. We adopt the official training, test and development split. As DIRNDL is a corpus of spoken text, we will also use it to validate theoretical claims on the effect of prosody on coreference.

Conclusion In the area of coreference, much theoretical work on coreference and coherence has built the foundation for a good understanding of the phenomenon. Hence, annotation guidelines typically differ only in minor aspects, such as the handling of genericity or abstract anaphors. Although an agreed-upon handling of all special cases would of course be desirable, it is not essential for studies that do not focus on these special cases. Since we are concerned with general coreference, we accept the minor differences that are present in the corpus resources. Large corpora have been developed for many domains and languages, including the benchmark datasets for English, OntoNotes with 1.6M tokens, and for German, TüBa-D/Z with 1.8M tokens. These enable

2http://www.sfs.uni-tuebingen.de/ascl/ressourcen/corpora/tueba-dz.html
3For a description of the coreference annotation scheme, please refer to the stylebook for anaphoric annotation, which can be found at http://www.sfs.uni-tuebingen.de/resources/tuebadz-coreference-manual-2007.pdf

the creation of automatic resolvers, including statistical models that require larger amounts of data in order to work properly. Hence, at least for coreference, we can answer the research questions positively: yes, the task is conceptually clear, almost all previous work uses compatible annotation guidelines, and as a result, large corpus resources have been created, which have already enabled substantial progress in the field of coreference resolution. There is therefore not much need for the development of new data, and the annotation and creation of new corpora is not a focus of our workflow pipeline for coreference. However, for the sake of completeness, all the resources which we will create to overcome the data problem for bridging will also contain coreference annotations. This way, the relation between the two anaphoric phenomena can later be analysed and computational approaches can learn coreference and bridging jointly.

4.2. Bridging annotation and existing corpora

As explained in Section 2.2, the term bridging covers many different phenomena, and many aspects thereof have long been controversial. It is therefore no surprise that annotation guidelines and the resulting corpus resources also vary considerably, in terms of pre-defined relations, the definiteness requirement for bridging anaphors, whether the antecedents can be nominal or also verbal or clausal, and whether there is an overlap with coreferent anaphors. We have discussed many of the controversial issues in Section 2.2. This section aims to give an overview of the corpus resources and their properties. Although our main focus is on English and German, there are far fewer corpora for bridging than for coreference, so we also include other languages, as well as information on inter-annotator agreement where available.

Poesio/Vieira corpus The first real dataset that was introduced was the one in Poesio and Vieira (1998), which consists of 33 Wall Street Journal articles annotated according to their classification scheme of definite descriptions.
Anaphors: definite NPs.
Relations: identity (overlap with coreference), compound noun and meronymy.
Antecedent: entity (nominal) or event (verbal, clausal).

GNOME The overlap with coreference is not present in the GNOME corpus (Poesio, 2004), which comprises about 500 English sentences in museum object descriptions and drug leaflets.
Anaphors: all NPs.
Relations: set membership, subset and generalised possession, including meronymy and ownership relations.
Antecedent: entity (nominal).

PAROLE The bridging subcorpus of the PAROLE corpus (Gardent et al., 2003) is a 65k-word corpus of French newspaper texts.
Anaphors: definite NPs.
Relations: set membership, thematic, definitional (including meronymy, attribute, associate), co-participants and non-lexical circumstantial.
Antecedent: strictly nominal or verbal, not clausal.

DEDE corpus Gardent and Manuélian (2005) presented a French newspaper corpus of roughly 5000 definite descriptions, with bridging as one of the categories in their classification scheme.
Anaphors: definite NPs.
Relations: meronymy, modifier-modified relation and predicate-argument.
Antecedent: entity (nominal).

Caselli/Prodanof Caselli and Prodanof (2006) presented a corpus study of definite descriptions in Italian news text (17 articles, 10k words). They reported high inter-annotator agreement for bridging anaphora recognition (κ = 0.58–0.71) and antecedent selection (κ = 0.78).4
Anaphors: definite NPs.
Relations: not restricted.
Antecedent: entity (nominal).

Switchboard The Switchboard corpus (Nissim et al., 2004) comprises information status annotations, which refer to the degree of givenness of an NP. Bridging was contained in the category mediated, namely in the subcategories part-of, set, situation or event. Other subcategories of mediated do not contain cases of bridging. The information status scheme was annotated in a subpart of the Switchboard corpus. The

4κ is a statistical measure for assessing the reliability of agreement between a fixed number of annotators. It measures the degree of agreement over what would be expected by chance, taking into account the distribution of the categories (Fleiss, 1971).

annotation consisted only of labelling NPs with their information status and did not include linking bridging anaphors to their antecedents. Hence, the corpus only contains bridging anaphors, and no bridging pairs. There were some additional constraints for the bridging annotation, for example the restriction that anaphors of the part-whole type could only be annotated if they appeared in WordNet (Fellbaum, 1998), or the restriction to FrameNet (Baker et al., 1998) frames for the type mediated/situation.
Anaphors: all NPs.
Relations: part-of (WordNet), set, situation (FrameNet) and event.
Antecedent: not annotated.

CESS-ECE Recasens et al. (2007) presented guidelines to add different coreference subtype annotations to the Spanish CESS-ECE corpus, with bridging as one subtype of coreference. How much of this corpus was actually annotated remains unclear.
Anaphors: all NPs.
Relations: bridging as a subtype of coreference, not further restricted.
Antecedent: nominal or verbal, not clausal.

SemDok In a subset of the German corpus SemDok (Bärenfänger et al., 2008), definite descriptions were annotated in three scientific articles and one newspaper text. However, the exact number of bridging anaphors in this corpus is unknown and the corpus is currently not available.
Anaphors: all NPs.
Relations: possession, meronymy, holonym, hasMember, setMember and undefined.
Antecedent: entity (nominal), event (verbal, clausal).

ARRAU The ARRAU corpus, first released in Poesio and Artstein (2008), contains English texts from three domains: newspaper, spoken narratives and dialogue. In the newest version (Uryupina et al., 2018), the corpus contains 5,512 bridging pairs. Most annotated bridging pairs are of the category subset or element-of.
Anaphors: all NPs.
Relations: set membership, subset, possession, other and unrestricted.
Antecedent: entity (nominal), event (verbal, clausal).

COREA corpus Hendrickx et al. (2008) presented guidelines and a corpus for Dutch, which mainly focused on coreference, but also included bridging as a subtype. Bridging was restricted to superset-subset or group-member relations. Bridging turned out to be the subtype with the lowest inter-annotator agreement (33% MUC F1 score).
Anaphors: all NPs.
Relations: bridging as a subcategory of coreference, with the annotated relations group-member and subset.
Antecedent: entity (nominal).

Prague dependency treebank (PDT) Bridging has been annotated in a subset of the Czech PDT corpus (annotation guidelines described in Nedoluzhko et al. (2009), corpus last released in Hajič et al. (2018)). The authors state that they did not perform an unrestricted annotation of bridging because they feared it would be too inconsistent, as Czech lacks a definite article. Therefore, they specified a set of relations to be annotated, including meronymy, subset, function and contrast, among others.
Anaphors: all NPs.
Relations: meronymy, subset, function, contrast, explicit anaphoricity (demonstrative article without coreference), rest (with some additional, quite specific subcategories: relatives, event-argument and a few others).
Antecedent: entity (nominal), event (verbal, clausal).

Italian Live Memories Corpus The Italian Live Memories Corpus (Rodríguez et al., 2010) is an Italian corpus of annotated Wikipedia articles and blog texts. It is relatively large, with 142k tokens of Wikipedia articles and 50k tokens of blog texts, but restricts bridging to only three pre-defined relations.
Anaphors: all NPs.
Relations: part-of, set-member and attribute.
Antecedent: entity (nominal).

Copenhagen Dependency Treebank A subpart of the Copenhagen Dependency Treebank (Korzen and Buch-Kromann, 2011), a multi-language corpus, has also been annotated with anaphoric information, including bridging. The exact size of the subcorpus is not known.
Anaphors: all NPs.
Relations: 16 quite detailed relations under the two categories semantic role and generativity.
Antecedent: entity (nominal).

DIRNDL The DIRNDL corpus (Eckart et al., 2012; Björkelund et al., 2014), as mentioned above in the coreference section, is a German corpus of spoken radio news. Bridging has been annotated as a subcategory of referential information status (Riester et al., 2010; Baumann and Riester, 2012). In this scheme, indefinite expressions introduce new information and are thus excluded from the bridging category. As a result, all bridging anaphors in DIRNDL are definite.
Anaphors: definite NPs.
Relations: unrestricted.
Antecedent: entity (nominal) and event (verbal, clausal).

ISNotes The ISNotes corpus (Markert et al., 2012) is a corpus of newspaper text (50 Wall Street Journal articles). It contains bridging as a subclass of information status annotation, with 633 annotated bridging pairs. It contains definite and indefinite bridging anaphors, but no comparative anaphors, as these cases were considered a different information status category. For the bridging category, the kappa values are over 0.6 for all three possible annotator pairings.
Anaphors: all NPs.
Relations: not restricted, with the exception of comparative anaphora, which are not included in the bridging category.
Antecedent: entity (nominal).

Coref corpus Grishina (2016) recently described a parallel corpus of German, English and Russian texts with 432 German bridging pairs that have been transferred to their English and Russian counterparts, resulting in 188 transferred English bridging pairs. The corpus contains narrative and news text as well as medicine instruction leaflets. In contrast to the other corpora, she applies a three-way classification: anaphors can be coreferent, bridging or of the category near-identity. In terms of the bridging definition, the work is based on the assumption that the speaker intends the listener to be able to compute the shortest possible bridge from the previous knowledge to the antecedent, which is therefore unique (determinate) in the discourse. Hence, only definite descriptions are annotated as bridging anaphors. On a subset of the German part of the corpus, the paper reports rather high inter-annotator agreement for bridging anaphora recognition (F1 score of 64%) and antecedent selection (F1 score of 79%).
Anaphors: definite NPs.
Relations: meronymy, set-membership, entity-attribute/function (Kosovo – the government), event-attribute (the attack – the security officers), location-attribute (Germany – in the south), and other to capture other types of bridging.
Antecedent: entity (nominal).

RuGenBridge Roitberg and Nedoluzhko (2016) presented a Russian corpus annotated with genitive bridging. They define genitive bridging as “the case where two elements (an anchor/antecedent and a bridging element/anaphor) can form a genitive construction, where the anchor is marked with the genitive case in Russian”. In other words, they only mark bridging cases which can be paraphrased as a genitive construction, i.e. the room and the ceiling could be a bridging pair as it is possible to utter the ceiling of the room. They argue that this limitation helps overcome the vagueness of many previous annotation efforts, which is often reflected in the low inter-annotator agreement. As the paper mainly presented the annotation scheme, the annotation and the corpus development were still underway.
Anaphors: all NPs.
Relations: only genitive bridging cases.
Antecedent: entity (nominal).

GUM The GUM corpus (Zeldes, 2017), an English multi-domain corpus of (currently) 85,350 tokens annotated with bridging links and coarse-grained information status, has recently been released. As the corpus is expanded by students as part of a curriculum at Georgetown University, it continues to grow.
Anaphors: all NPs.
Relations: not restricted to certain relations.
Antecedent: entity (nominal) and event (verbal, clausal).

Corpus     Language   Annotations             Used in
ISNotes    EN         Bridging                Tool development (Section 6.1), validation (Section 8)
ARRAU      EN         Coreference, bridging   Tool development (Section 6.2)
GUM        EN         Coreference, bridging   Tool development (Section 6.1)
DIRNDL     DE         Coreference, bridging   Tool development (Section 6.5), validation (Section 7)

Table 4.3.: Existing corpora annotated with bridging used in this work

Corpora used in this thesis We make use of a few of the corpora presented in this section. To develop a freely available bridging tool, we use the corpus ISNotes, as it contains reliable and unrestricted bridging annotations. To check how well the approaches presented in Hou (2016b) generalise to other in-domain corpora, we also use the ARRAU corpus. This assessment of generalisability will also include some GUM annotations. For German, the DIRNDL corpus was the only available corpus containing bridging annotations at the time of this research. We will thus use this corpus to develop a bridging resolver for German. Some of the corpora will also be used in the validation experiments. Table 4.3 shows in which sections the existing corpora are used. The other corpora presented above are not included, as they either contain data in a language other than our two languages of interest, English and German, have major restrictions such as an overlap with coreference in their bridging definition, or are not openly available.

Conclusion The phenomenon of bridging has been studied in many theoretical and computational works, as highlighted in the previous sections. Different phenomena have been described as bridging, and, as a result, the corpora have very different properties. One of the most apparent differences is the limitation to a certain set of pre-defined relations (e.g. Poesio and Vieira (1998); Poesio (2004); Nedoluzhko et al. (2009), among many others). The reason given for this limitation is often improved annotation quality, e.g. in Poesio (2004), as the annotation of bridging without any relation restrictions tends to result in low inter-annotator agreement. Reducing bridging to, for example, only cases of meronymy makes the task clearer for human annotators, but in our opinion does not reflect the complexity inherent in bridging relations. We see bridging as a versatile phenomenon on the pragmatic level, where anaphoricity is signalled by the speaker or writer. Simplifying the task to finding anaphoric cases of pre-defined relations can, of course, be a subtask, but it leaves the difficult cases of bridging, where the relation cannot be described with relations such as meronymy, subset-member or attribute-of, unresolved. With the improved understanding of the phenomenon of coreference, the overlap between coreference and bridging, i.e. considering non-identical-head coreference as bridging, seems to be a thing of the past, although the terminological confusion remains, e.g. in Feuerbach et al. (2015), where the term “bridging” in the title of the work actually refers to non-identical-head coreferent mentions. Other limitations, like the definiteness requirement for bridging anaphors, are still very present in current work on bridging, for example in Grishina (2016). The restriction to NP antecedents is also common in previous work (Poesio, 2004; Gardent and Manuélian, 2005; Grishina, 2016). This excludes a small percentage of bridging anaphors with a verbal or clausal antecedent. In the

corpus ISNotes, where they are included, they make up about 10% of all bridging cases. One could argue that computational work typically focuses on NP antecedents, even in coreference resolution, where there is much more progress, and that event reference is thus a special case which is not of great overall importance for the current state of bridging resolution. On the other hand, it is, of course, part of the complex phenomenon that is bridging, and it cannot be studied if there is not enough data that includes it in the annotation. In addition to the different interpretations of the task, the size of the annotated data resources is the biggest issue for researchers aiming to apply statistical algorithms to the data. As a comparison, OntoNotes, the benchmark dataset for coreference resolution, contains 35,000 coreference pairs, taking into account the transitivity of the coreferent pairs. ISNotes, the corpus on which most recent work has been reported (Hou et al., 2014; Hou, 2016b, 2018), comprises only 633 bridging pairs. Of course, coreference anaphors are also more frequent than bridging anaphors, and the relation is transitive, i.e. we can pair every mention in a chain with another member of the chain to create more data for learning, but still, the difference in corpus size is major, to say the least. As a consequence, and to answer the last part of Research Question 2, much work is needed on the creation of reliable and unrestricted bridging data. We think that even small, reliably annotated resources can help check how generalisable previous approaches are, and can help make the approaches less tuned to the very small available datasets. Hence, during the course of the last four years, we have developed three resources for our main languages of interest, English and German, which we think will be beneficial for our own work as well as for future work in this area:

BASHI: a corpus of English Wall Street Journal (WSJ) articles, in which bridging is defined as unrestrictedly as possible, in order to be compatible with ISNotes, which also contains WSJ articles. We define a number of subcategories (definite anaphors, indefinite anaphors, comparative anaphors) so that the corpus can also be combined with other corpora which do not include indefinite anaphors, and so that researchers can choose to focus on one type of bridging. The corpus is presented in Section 4.3.1. The work on BASHI was published in Rösiger (2018a).

SciCorp: an English corpus of a different domain, scientific text, which contains genetics articles as well as computational linguistics articles. We want to use this corpus to assess how well approaches developed for news text transfer to other domains. As this corpus was developed quite early in the course of the PhD, it is, however, limited to definite anaphors. Nowadays, with our growing understanding of the task, we would strongly argue against making this restriction. However, the corpus is not limited in terms of the annotated relations, and can still be used in computational approaches that focus on the subset of definite anaphors. The corpus is presented in Section 4.3.2. The work concerning SciCorp was published in Rösiger (2016).

GRAIN: a corpus of German radio interviews, annotated for referential information status, including coreference and bridging, and a number of other annotation layers, including syntax. The corpus contains twenty 10-minute interviews. As information status was annotated according to the RefLex scheme (Baumann and Riester, 2012; Riester and Baumann, 2017), all anaphors are again definite. As mentioned above, we would ideally also like to include indefinite bridging. In contrast to BASHI and SciCorp, the corpus was created in a joint effort of many researchers at IMS, and we were involved in the training and guidance of the information status annotators. The corpus is presented in Section 4.3.3. The work on GRAIN has been published in Schweitzer et al. (2018).

4.3. Newly created corpus resources

[Figure 4.1 maps the publications onto the workflow pipeline for coreference (task definition – data creation – tool creation – linguistic validation experiments): Rösiger and Riester 2015 (ACL); Rösiger 2016 (LREC); Schweitzer et al. 2018 (LREC); Rösiger et al. 2017 (SCNLP@EMNLP); Rösiger and Kuhn 2016 (LREC); Rösiger et al. 2018 (CRAC@NAACL).]

Figure 4.1.: Contribution and workflow pipeline for coreference: data creation


For coreference resolution, there are many high-quality corpus resources. Hence, as can be seen in Figure 4.1, most publications revolve around later steps in the pipeline, including tool creation and validation, but we have included coreference in all three newly created corpora, in order to have a complete picture of anaphoric relations. These joint bridging and coreference annotations could also be exploited in future work. As explained in the last section, not many reliably annotated bridging corpora are available. As a basis for the tool development and validation steps in the pipeline, we create three corpus resources annotated with coreference and bridging links: BASHI, a corpus of news text, to check whether current methods designed for news text generalise well to other corpora of the same domain, and SciCorp, a corpus of scientific text, to see how well these methods work on out-of-domain text. We also create a bridging resource for German, GRAIN. Figure 4.2 presents the contributions for bridging in the first two steps. In contrast to coreference, task definition and data creation are very important steps in our work on bridging. This section gives an overview of the newly created corpora, provides details on the respective annotations and guideline decisions, and compares the corpora with previously created data. Table 4.4 presents the newly created corpora and the sections in which they are used.

[Figure 4.2 maps the publications onto the workflow pipeline for bridging (task definition – data creation – tool creation – linguistic validation experiments): Rösiger et al. 2018 (COLING); Schweitzer et al. 2018 (LREC); Rösiger 2018 (LREC); Rösiger 2016 (LREC); Rösiger et al. 2018 (CRAC@NAACL); Rösiger 2018 (CRAC@NAACL); Pagel and Rösiger 2018 (CRAC@NAACL).]

Figure 4.2.: Contribution and workflow pipeline for bridging: data creation


Corpus    Language   Annotations             Used in
BASHI     EN         Bridging                Tool development (Section 6.1.3)
SciCorp   EN         Coreference, bridging   Tool development (Section 6.1.3)
GRAIN     DE         Coreference, bridging   Tool development (Section 6.5)

Table 4.4.: An overview of the newly created data

4.3.1. BASHI: bridging in news text

This section presents the annotation guidelines and annotation process of the BASHI corpus, as well as the resulting resource. We annotated 50 articles from the WSJ that are already part of OntoNotes, meaning they already come with coreference annotations. The articles were selected blindly, but we excluded articles that were already annotated as part of the ISNotes corpus (Markert et al., 2012), as well as articles that give an overview of what happened in a certain time frame and thus contain several separate stories in one document. The corpus is named BASHI, bridging anaphors hand-annotated inventory5. It is a relatively small corpus, but because of its categorised bridging links it can be combined with many other corpus resources (e.g. ISNotes) in order to create a larger resource. Our annotation guidelines are, on the one hand, broad enough to cover many cases, following these principles:

• Bridging anaphors have to be truly anaphoric, i.e. not interpretable without an antecedent;

• Bridging relations are not restricted to certain pre-defined relations;

• Bridging anaphora can be definite or indefinite, but we use two different labels to distinguish them;

• Bridging antecedents can be nominal entities or events (VPs or clauses).

On the other hand, we propose a clear separation from other tasks:

• No overlap with coreference resolution: context-dependent anaphors that refer to the same entity as their antecedent are considered given information (independent of their surface realisation), and are thus covered by coreference resolution;

5Bashi can mean “bridge” in Japanese. The corpus was presented at LREC 2018 in Miyazaki, Japan.


• Hence, bridging anaphors are context-dependent expressions that do not refer to the same entity as their antecedent, but to a related entity;

• We focus on referring expressions, excluding rhetorical or connection cases (Asher and Lascarides, 1998): anaphors are nominal; antecedents can be nominal, verbal or clausal.

The annotation guidelines are tailored to Germanic languages like English and German as they focus on the distinction between definiteness and indefiniteness. The idea of a broad, but clear definition of bridging without an overlap with the concept of coreference can of course also be applied to other languages.

Annotation scheme

Markables Markables (and thus candidates for bridging anaphors) are all NPs that have been gold annotated in the OntoNotes corpus (Weischedel et al., 2011). Pre-marked NPs in OntoNotes include:

• definite and demonstrative nominal phrases: the president,

• proper names: Mr Bush,

• quantifier phrases: all the products,

• pronouns: personal, possessive, demonstrative, reflexive.

If the annotator thought that an NP had not been pre-marked, he or she added a markable to the set of markables (this was rarely the case). The annotators were told to mark the longest span of the NP that refers to an entity, including determiners and adjectives, dependent PPs and relative clauses.

(1) There have been concerns that the Big Board’s basket could attract investors with a short term perspective who would rapidly turn over the product, thus increasing volatility.

Non-markables The pre-marked NPs do not include

• nominal premodification: the US president,

• interrogative or relative pronouns.


Bridging anaphors

In our annotation, bridging anaphors are discourse-new, anaphoric expressions which depend on the previous context, and for which the text presents an antecedent NP which does not stand in the relation of identity, but in some other form of relation, to the associative phrase. The antecedent may be an associate in a typical relation such as part-of or part-of-event, or any other kind of associate, as long as there is a clear relation between the two phrases.

(2) My sister celebrated her birthday last weekend. I offered to help her make the cake.

Often, the anaphor lacks an argument (the antecedent) which enables the interpretation of the expression. This is also reflected in the bridging definition of Roitberg and Nedoluzhko (2016), called genitive bridging, where bridging cases are restricted to those that can form a genitive construction with the antecedent. While genitive constructions might be a bit too restrictive and their use is very language-dependent, we agree that bridging pairs can often be seen as head-argument constructions.

(3) the cake (at her birthday)

Definite Use Most bridging anaphors are definite NPs. Note that bare singulars can sometimes also count as definite, in cases where the insertion of the definite article is more plausible than the insertion of an indefinite article. Bare plurals usually count as indefinites.

(4) I went into the room. The windows were broken.

(5) We performed the experiments using ... . Evaluation is done by means of 10-fold cross validation.

Indefinite Use Some bridging anaphors are indefinite expressions. In this case, we label the NP as indefinite and link it to the preferred antecedent. Indefinite cases of bridging are typically either part-of or part-of-event relations. We annotate them as bridging in cases where we feel that the interpretation strongly benefits from an argument in the form of the antecedent.

(6) I bought a bicycle. A tire was already flat.


(7) Afghanistan ... Millions of refugees would rush home.

Comparative anaphors Comparative anaphors were excluded from the bridging category and treated as a separate category in the ISNotes corpus. We include them in the bridging cases, but label them as comparative and link the comparative markable to the antecedent.

(8) About 200,000 East Germans marched in Leipzig and thousands more staged protests in three other cities.

(9) President Bush, the Canadian prime minister and 14 other members of the Committee.

Antecedents

As a general principle, one antecedent has to be chosen. In special cases, e.g. comparative cases where two antecedents are needed, the annotator may create two or more links.

(10) President Bush, the Canadian prime minister and 14 other members of the Committee.

We include nominal and abstract antecedents, where the anaphors link back to a VP or a clause.

(11) What is the meaning of life? The answer cannot be expressed in one sentence.

The antecedent should be the best-fitting semantically related expression. In the case of several possible antecedents, the closest should be chosen. Bridging should not be used as a substitute category for aggregated coreference, where we would need two coreference links to state, for example, that all sides involve the media and the congressman (in a context where these two expressions do not appear in a coordination).

Link types

As different types of links are covered under the term bridging in previous annotation efforts, we distinguish a number of bridging types, for purely pragmatic reasons. The phenomena can then be studied separately, if needed, or certain anaphor types can be excluded when merging data from different source corpora. Cases of the category bridging-contained, as described in Baumann and Riester (2012), are not annotated as bridging, because bridging-contained is not an anaphoric phenomenon but rather a special case in which the antecedent modifies the bridging anaphor.

(12) the windows in the room

(13) the mother’s room or her room

The annotated bridging link categories are the following: (i) definite bridging links, (ii) indefinite bridging links and (iii) comparative bridging links. Cataphoric bridging links are not allowed.

Annotation procedure

The annotation was performed with the annotation tool Slate (Kaplan et al., 2012), following our own annotation guidelines.6 The markables, i.e. the gold-annotated NPs in OntoNotes, are presented in green. Coreferent entities, shown in red, are already marked and can thus not be marked as bridging anaphors. An exception is the first mention in a coreference chain, which can, of course, be of the category bridging. We refrained from annotating attributes in order not to complicate the annotation process. The annotation involved two annotators (both graduate students in computational linguistics who had previously been involved in information status annotation) for five WSJ articles, to establish the inter-annotator agreement. The rest of the corpus was annotated by a single annotator.

Difficult annotation decisions

Some cases of bridging are very clear, particularly for definite anaphors that occur in a well-defined relation with their antecedent, e.g. whole-part (the house – the window). In this case, it is obvious that the definite anaphor requires the antecedent for its interpretation.

Generic use vs. bridging Other cases are less clear, and they are often a question of generic use vs. bridging. Consider the following example that is taken from the Wall

6Annotation guidelines: http://www.ims.uni-stuttgart.de/institut/mitarbeiter/roesigia/guidelines-bridging-en.pdf


Street Journal and is thus concerned with the US (which is often not explicitly stated, but obvious given the WSJ’s location).

(14) The police would be waiting.

The question of whether the police is a generic reference to the concept police, or whether a bridging link should be placed between the police and the previously mentioned the US, is not obvious. When does such an entity need an antecedent, and when does it simply add (optional) information? In cases of obvious generic use, we do not link the two entities. If we are not referring to the generic class police, but more specifically to the police in, say, Baltimore, we link the two entities. As a general rule, if the entity is interpretable on its own, we do not link it, e.g. in

(15) When you annotate a text, bridging anaphors are the most difficult issue (not linked in this case).

Still, this distinction remains a little vague.

Unused vs. bridging Another difficult choice is the distinction between the information status category unused (sometimes called mediated-general) and bridging, i.e. in a case like

(16) Iran ... foreign secretary Mottaki

Some people might consider this a bridging case, as the foreign secretary Mottaki is probably not interpretable on its own for a typical WSJ reader without Iran being mentioned first. However, others might argue that his discourse referent might already be identified by his name. Furthermore, while we typically assume entities like the moon to be unique, known entities, and thus of the category unused/mediated-general, there might be contexts where there are several moons, and one might want to link the moon to the entity the earth via a bridging relation.

Determining a single antecedent In some contexts, the writer/speaker introduces a topic into the discourse and then talks about aspects referring to this topic. In cases where there are several noun phrases representing this topic, it is not always obvious which NP should be chosen as the antecedent.


(17) No age group is more sensitive than younger voters, like Ms. Ehman. A year ago this fall, voters under 30 favored George Bush by 56 to 39% over Michael Dukakis, [..]. Voters in the same age group backed Democrat Florio 55% to 20% over Republican Courter.

It is relatively obvious that the same age group is a bridging anaphor, but whether younger voters, like Ms. Ehman, Ms. Ehman or voters under 30 should be chosen as the antecedent remains unclear (and does not really make a big difference in terms of the interpretation of the anaphor).

Resulting corpus

As can be seen in Table 4.5, the corpus consists of 459 bridging links, of which 275 involve a definite anaphor, 114 an indefinite anaphor and 70 a comparative anaphor. Out of these 70 comparative anaphors, 12 have more than one link to an antecedent. The corpus contains 57,709 tokens.

Bridging links   459
Definite         275
Indefinite       114
Comparative       70

Table 4.5.: BASHI: corpus statistics

Inter-annotator agreement

Five WSJ articles were annotated by a second annotator in order to assess the inter-annotator agreement. Table 4.6 shows the agreement for the respective categories. We only report the observed agreement, as the expected agreement for linking markables is considered extremely low (one can potentially link every NP with all preceding NPs) and can thus be neglected. It can be seen that the agreement is high for comparative anaphora: as these almost always occur with surface markers such as other, another, etc., they can be easily spotted. For comparative anaphors, the agreement on the chosen antecedent is also higher, as the antecedents are typically local and occur within a rather narrow window. As expected, the agreement for anaphor detection as well as for full bridging resolution is higher for definites than for indefinites. This confirms our hypothesis that for definites, it is easier to decide whether they are anaphoric or not.


Bridging anaphor     Anaphor                    Anaphor + antecedent
type                 same   diff.  agreement    same   diff.  agreement
Definite             34     13     73.9%        30     17     63.8%
Indefinite           15     11     57.7%        11     15     42.3%
Comparative          12     2      85.2%        10     4      71.4%
Total                31     25     70.9%        51     36     59.3%

Table 4.6.: BASHI: inter-annotator agreement on five WSJ articles

Overall, for anaphor detection, we achieve an agreement of 70.9%, and 59.3% agreement for the overall links. As the overall agreement on the bridging links is rather low (as it is for other corpora), one could think about evaluating the task of bridging resolution differently than with the typical precision/recall metrics, particularly for contexts such as Example (17).
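Note that the figures in Table 4.6 are plain observed agreement, i.e. the proportion of cases in which the two annotators made the same decision. For instance, for definite anaphors together with their antecedents:

observed agreement = same / (same + diff.) = 30 / (30 + 17) ≈ 63.8%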

Format and download

The corpus is made available via a download link7. The download contains the annotations in an offset-based XML format as well as in CoNLL-12 style columns. For the individual anaphor types (definite, indefinite, comparative) we have created separate columns, as well as one joint column which contains all the bridging links. For copyright reasons (the OntoNotes data has to be obtained separately via the LDC), the download includes instructions on how to merge the annotations with the actual corpus data and the annotations in the OntoNotes release (words, part-of-speech, coreference, etc.).
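As a rough illustration of this merging step, the following Python sketch appends one bridging-annotation column to CoNLL-12 style token rows. The file names and the one-label-per-token column format are assumptions for illustration only; the released download documents the actual format and procedure.

# Minimal sketch: append one bridging label per token line of a CoNLL-12 style file.
# File names and the column format are hypothetical.
def merge_bridging_column(conll_path, bridging_path, out_path):
    with open(bridging_path, encoding="utf-8") as f:
        labels = [line.rstrip("\n") for line in f]  # e.g. "-" or a bridging link id

    label_iter = iter(labels)
    with open(conll_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as out:
        for line in src:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):  # keep document markers and blank lines
                out.write(line + "\n")
            else:
                out.write(line + "\t" + next(label_iter) + "\n")

merge_bridging_column("wsj_article.gold_conll", "bashi_bridging.col", "merged.conll")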

4.3.2. SciCorp: coreference and bridging in scientific articles

In this section, we present SciCorp, a corpus of scientific articles from two different disciplines, namely computational linguistics and genetics.8 Apart from automatic pre-processing layers, the corpus features three types of manual annotation: coreference clusters, bridging entities and their antecedents, and information status labels. In this thesis, we will focus on the

7http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/bashi.html
8We addressed the research question of resolving coreferent and bridging references in scientific literature in Rösiger and Teufel (2014). In this paper, an earlier version of the corpus was used as a basis for the experiments, but the corpus was not made publicly available as it was only annotated by one person. Over the course of this dissertation, the annotation guidelines and the annotation setting have been improved and new inter-annotator agreement evaluations are provided.

coreference and bridging information. For more information on the information status labelling, see Rösiger (2016). We chose scientific text as a domain because it differs considerably from news text, and we are interested in testing the generalisability of our bridging approaches. Scientific text differs from news text mostly with respect to the heavy use of abstract entities such as results or variables, while easy-to-resolve named entities are used less frequently (Gasperin and Briscoe, 2008). The more complex nature of the texts is also reflected in the high proportion of definite descriptions (Watson et al., 2003), which typically require domain knowledge to be resolved. It has been shown in Rösiger and Teufel (2014) that in-domain training data helps improve coreference resolution on scientific text. This section presents details of the annotation process and describes the new corpus, which was annotated by three independent annotators and can be downloaded from our website.9

Corpus creation

The computational linguistics (CL) papers were taken from the ACL anthology, the genetics (GEN) papers from PubMed. Papers were selected blindly, not focusing on one topic, any specific year or the first language of the authors. The CL papers cover various topics ranging from dialogue systems to machine translation; the GEN papers deal mostly with the topic of short interfering RNAs, but focus on different aspects of it. The corpus contains a number of short papers as well as some long papers (see Table 4.10 for details). The manual annotations were performed on plain text versions of the papers.10 After the annotation, we enriched the corpus with a number of automatic annotations.

Manual annotations

We manually annotated the corpus using the annotation tool Slate (Kaplan et al., 2012). Slate does not feature pre-defined mentions, so the identification of markables was part of the annotation task. The tool shows the whole text with a slide bar at the side and the annotator is asked to mark the markables with different colours depending on the information status category. Coreference and bridging links are also highlighted in

9www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/scicorp.html
10The papers were provided in the framework of the FUSE project (Foresight and Understanding from Scientific Exposition) (McKeown et al., 2016). The CL papers were converted from LaTeX source by Simone Teufel, the GEN papers by Dain Kaplan and Diarmuid Ó Séaghdha as well as other members of the FUSE project.

different colours. Three annotators, all graduate students of computational linguistics, independently annotated the documents according to the following annotation scheme. Detailed annotation guidelines were provided.11 The annotators were given two papers (one from genetics, one from computational linguistics) to familiarise themselves with the task before starting the annotation work on the texts included in this corpus. The remainder of this section describes the annotation scheme in detail. This fine-grained scheme is based on other schemes (Riester et al., 2010; Poesio and Vieira, 1998), but has been adapted to this particular domain.

Markables

To limit the number of markables, we originally decided to restrict the annotation to definite NPs and to allow only nominal phrases as antecedents for both coreference and bridging anaphors. Therefore, no event reference is covered in the corpus. These are two serious limitations, and in hindsight, we would recommend not imposing them. It proved difficult for the annotators to determine which markables are definite (particularly for special cases such as bare singulars, bare plurals, modifiers, etc.), and in the end, the initial purpose of the restriction, namely being able to annotate more data in less time, did not hold true. However, the corpus annotation has been performed this way, and we will report the annotation scheme as it was designed at the time. Despite these two limitations, we still think it will be beneficial to see how well approaches designed for newspaper text work on out-of-domain corpora, even if only for the (very large) subset of definite anaphors.

We consider the following types of NPs as definite:

• Definite descriptions or similar: NPs starting with the definite determiner the, a demonstrative determiner such as this, a possessive pronoun like my or a universal quantifier such as all. Examples: the most efficient siRNAs, the siRNAs, all algorithms.

• Named entities such as Noam Chomsky, siRNAs but also variables like x and y.

• Personal pronouns (we, it, they), possessive pronouns (our, their, its) and demon- strative pronouns like this or these.

11www.ims.uni-stuttgart.de/institut/mitarbeiter/roesigia/annotationguidelines.pdf


Non-markables

Non-markables are the following:

• We do not mark relative pronouns and expletive or pleonastic it. It in cases like since it was discovered that ... is not considered a markable.

• Indefinite NPs, including indefinite descriptions with the indefinite determiner a, such as an important part. This category also comprises existential quantifier phrases like some siRNAs, most siRNAs or 15 siRNAs. Bare plurals such as proteins are also considered indefinite and are thus not included in the annotation.

• Bare singulars and the existential there are also not annotated.

Overview: Annotated categories and links

We label information status and create reference links for a subset of the information status categories. Table 4.7 shows the categories in the annotation scheme and how they interact with the coreference and bridging links: we create coreference links for all entities of the category given and bridging links for all bridging entities.

                               Category                      Example

Coreference links              Given                         We present the following experiment. It ...

Bridging links                 Bridging                      Xe-Ar was found to be in a layered structure with Ar on the surface.

Categories without links       Bridging (self-containing)    The structure of the protein ...
                               Description                   The fact that the accuracy improves ...
                               Unused                        Noam Chomsky introduced the notion of ...
                               Deictic (non-anaphoric use)   This experiment deals with ...
                               Predicative                   Pepsin, the enzyme, ...
                               Idiom                         On the one hand ... on the other hand

Table 4.7.: SciCorp: categories and links in our classification scheme


Information status categories

Table 4.7 gives an overview of the information status categories. As mentioned above, we will not go into detail here, but focus on the given and bridging categories and their coreference and bridging links. For a description of the remaining information status categories, please refer to Rösiger (2016).

Given We consider a definite noun phrase given when the entity refers back to a discourse entity that has already been introduced in the discourse and is thereby known to the reader. This includes lexically new material, pronouns and repetitions or short forms of entities that have been referred to before. Given entities include synonyms and are not limited to entities that have the same head.

Bridging For bridging anaphors, the text presents an antecedent NP which does not stand in the relation of identity, but in some other form of relation to the associative phrase. The antecedent may be an associate in a typical relation such as part-of, is-a, or any kind of associate as long as there is a clear relation between the two phrases. We do not limit bridging references to any predefined relations.

Bridging (self-containing) In some constructions, e.g. genitives or PP modifiers, we identify a bridging relation between the head noun phrase and the modifier. We consider them bridging self-containing and do not create a link.

(18) The structure of the protein

(19) the thoracic circuit stage in HK mutants

(20) the giant fiber escape pathway of Drosophila

Attributes We additionally annotate two attributes that are only applied to entities in a coreference chain (mostly given entities, but also to the first-mention entities).

• +/- Generic: Generic expressions include reference to a kind, i.e. a general quantification, whereas a specific reading has a fixed referent. This means that we know which exact referent is selected from the set of entities that fulfil the description.

(21) Generic: In 2006, they shared the Nobel Prize in Physiology or Medicine for their


work on RNA interference in the nematode worm C. elegans. C. elegans is unsegmented, vermiform, and bilaterally symmetrical.

(22) Specific: We present the following experiment. It deals with ...

• +/- Part of compound: It is controversial whether non-heads of compounds should be markables (when they are definite, since we only mark definite NPs). On the one hand, one might want to include them in the set of coreferential entities, but on the other hand, they do not allow for anaphoric reference, cf. Example (23).

(23) The siRNA activity. *It ...

We decided to include them in the list of mentions when they can be coreferenced to other mentions, but to mark them with the attribute part-of-compound so that they can be filtered out if required. Adjectives and common nouns are never marked. This means that in Example (24), we have two markables, the siRNA experiments and siRNA.

(24) The siRNA experiments

Coreference annotation

All anaphors must be definite noun phrases. Definite noun phrases include bare singulars if the insertion of a definite determiner is possible and more plausible than the insertion of an indefinite determiner. Again, bare plurals are excluded as they are considered indefinite.

(25) The efficiency of RNAi is ... . RNAi efficiency can also be influenced by ...

The antecedent can be any type of nominal phrase (indefinite, definite, named entity, etc.). Abstract anaphora are not included in the corpus, i.e. verbal phrases or clauses are excluded as antecedents of a coreferent anaphor. The links follow the chain principle, so we always choose the closest occurrence of the entity as the antecedent.
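To make the chain principle explicit, a coreference chain can be unfolded into anaphor–antecedent links by pairing each mention with its closest preceding mention in the chain. The following Python sketch illustrates this; the mentions are simplified to plain strings, and the final pronoun is a hypothetical continuation of Example (25):

def chain_to_links(chain):
    # Pair each mention with its closest preceding chain member (chain principle).
    return [(chain[i], chain[i - 1]) for i in range(1, len(chain))]

chain = ["The efficiency of RNAi", "RNAi efficiency", "it"]  # "it" is hypothetical
print(chain_to_links(chain))
# [('RNAi efficiency', 'The efficiency of RNAi'), ('it', 'RNAi efficiency')]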


Bridging annotation

As for coreference anaphors, bridging anaphors must also be definite noun phrases as described before. The antecedent can be any type of nominal phrase. The links do not have to follow the chain principle; the annotators are told to choose the best-fitting antecedent, not the last occurrence in the text. Bridging anaphors can also have two antecedents (and two links), if this fits best. In our scheme, bridging links are only annotated when there is a clear relation between the two entities. As we do not pre-define possible bridging relations, this definition is a little vague, but this is necessary to keep the task as general as possible.

Agreement study

After the annotators had familiarised themselves with the annotation task and annotated two papers that are not part of the final corpus, we analysed the inter-annotator agreement on two papers (one GEN, one CL) that are part of the corpus and computed Fleiss' κ (Fleiss, 1971). As can be seen in Table 4.8, for information status we achieve a κ between 0.68 (GEN) and 0.73 (CL), which is considered substantial agreement (Landis and Koch, 1977).12 It is not surprising that the number for CL is a little higher, given that the annotators are students of computational linguistics.

Agreement    GEN    CL
Actual       0.79   0.82
By chance    0.34   0.34
κ            0.68   0.73

Table 4.8.: SciCorp: overall inter-annotator-agreement (in κ)
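The κ values in Table 4.8 follow the standard chance-corrected formula applied to the actual and by-chance agreement listed above:

κ = (actual − chance) / (1 − chance), e.g. GEN: (0.79 − 0.34) / (1 − 0.34) ≈ 0.68 and CL: (0.82 − 0.34) / (1 − 0.34) ≈ 0.73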

Table 4.9 shows the inter-annotator agreement for the single categories.13 It can be seen that given, deictic and idiom entities are easier to reliably annotate while bridging, description, unused and predicative entities are more difficult. For the coreference links the agreement was 0.81 and for bridging links it was 0.62. The agreement for the attribute generic was 0.51 and for part-of-compound 0.85.

12Calculation based on markables. When there was disagreement about the markables, we resolved these cases via discussion between the three annotators. Parameters of the kappa computation: k=8, N=3, n=552 for GEN and n=482 for CL.
13Calculation for category x based on those mentions where one of the annotators classified it as category x.


Category          GEN    CL
κ given           0.72   0.77
κ bridging        0.62   0.63
κ bridging (sc)   0.68   0.74
κ description     0.67   0.69
κ unused          0.65   0.67
κ deictic         0.73   0.76
κ predicative     0.53   0.57
κ idiom           0.85   0.83

Table 4.9.: SciCorp: inter-annotator-agreement for the single categories (in κ)

Annotation challenges

This section presents a few observations concerning some of the difficulties that came up during the annotation. We include them here because we think they might be helpful for further, similar annotation experiments. One major obstacle was that not all the texts were written by native speakers. For example, sometimes the authors clearly had problems with definite articles. If the annotators are asked to mark only definite NPs and the authors leave out the definiteness marker, this is problematic. We resolved these cases by adding a rule to the guidelines that in cases where it was very clear that the author had made a mistake, the entity should be marked. However, a few cases remained where it was less clear, and we did not mark these cases. Paying more attention to paper selection in the first place would have helped here. With this aspect in mind: while we originally intended to limit the annotation to definite NPs due to time constraints, in hindsight we think that it turned out to be more difficult, and as a result also slower, to identify definite markables than to simply annotate every NP, disregarding definiteness. We now think that indefinites can be bridging anaphors and should, in any case, be included. The annotation of the attribute generic turned out to be difficult for the annotators, with an agreement of only 0.51. As the decision whether an entity is generic or not is not trivial (and probably needs much more detailed guidelines), the annotation of +/- generic should be the focus of an annotation task, not a by-product. Nevertheless, we include this attribute in the distribution of the data. For part-of-compound, this problem did not exist: deciding whether something is part of a compound is trivial enough to be annotated at the same time.


For the GEN texts, it would have been helpful to include domain experts, as it was difficult in a few cases to understand what refers to what.

Resulting corpus

CL                                     GEN
(doc id)   words    sentences          (doc id)   words    sentences
9704004    6,104    217                346034     3,040    116
9505025    5,085    222                135797     2,437    74
9812005    1,368    59                 340083     4,030    154
9708001    4,416    160                149283     5,404    228
9607011    2,804    104                152674     5,711    223
9606028    1,981    68                 148263     7,286    253
9606011    3,276    138                153544     8,103    275
Total      25,034   968                Total      36,011   1,320

Table 4.10.: SciCorp: corpus statistics

                              Total    CL      GEN
Markables (incl. Unmarked)    9,407    3,879   5,528
Markables (excl. Unmarked)    8,708    3,564   5,144
Given                         4,730    1,851   2,879
Bridging                      1,366    561     805
Bridging (sc)                 321      113     208
Description                   1,034    507     527
Unused                        1,026    424     602
Deictic                       70       45      25
Predicative                   147      58      89
Idiom                         14       5       9
(Unmarked                     699      315     384)
Links                         6,201    2,436   3,765
Coreference                   4,712    1,837   2,875
Bridging                      1,489    599     890

Table 4.11.: SciCorp: distribution of information status categories, in absolute numbers

Our annotated corpus contains 14 full-text scientific papers, 7 papers for each of the two disciplines. As shown in Table 4.10 and 4.11, the annotated computational linguistics papers contain 968 sentences, 25,034 words and 3,564 annotated definite descriptions

while the annotated genetics papers contain 1,320 sentences, 36,011 words and 5,144 definite descriptions; the genetics subcorpus is thus a little bigger than the CL one. The gold annotation was created by taking the majority vote of the three annotators. Disagreements with respect to the annotation or the markables were resolved via discussion between the annotators. Table 4.11 and Table 4.12 show the distribution of categories in absolute numbers and in percent.
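As a minimal sketch of the majority-vote step (the category labels below are only illustrative), a gold label is adopted whenever at least two of the three annotators agree; the remaining cases were resolved by discussion, as described above:

from collections import Counter

def majority_label(labels):
    # Adopt a label if at least two of the three annotators chose it.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None  # None: resolve by discussion

print(majority_label(["given", "given", "bridging"]))   # -> given
print(majority_label(["given", "bridging", "unused"]))  # -> None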

Category        CL      GEN
Given           51.9    56.0
Bridging        15.7    15.6
Bridging(sc)    3.2     4.0
Description     14.2    10.2
Unused          11.9    11.7
Deictic         1.3     0.5
Predicative     1.6     1.7
Idiom           0.1     0.2

Table 4.12.: SciCorp: distribution of information status categories, in percent

Automatic annotations and format

For the pre-processing of the texts, we used the Stanford CoreNLP pipeline14 to perform tokenisation, part-of-speech (POS) tagging, constituency parsing and named entity recognition. Our distribution of the data contains the source PDF and plain text versions of the papers, the annotated categories and links in an offset-based format, as well as the coreference annotations in the tabular CoNLL-12 format.

4.3.3. GRAIN: coreference and bridging in radio interviews

GRAIN is a corpus of German radio interviews and is annotated on multiple linguistic layers.15 We will not go into as much detail as for the other two corpora, for two reasons: (i) the creation of GRAIN was a joint effort of a number of IMS collaborators, where I was involved in the training of the information status annotators and the overall guidance of

14 nlp.stanford.edu/software/corenlp.html
15 Persistent identifier: http://hdl.handle.net/11022/1007-0000-0007-C632-1

the annotation process, and (ii) the annotation followed the RefLex scheme (Baumann and Riester, 2012), based on the newest guidelines in Riester and Baumann (2017). However, we will present the main properties of the corpus and also introduce the main ideas of the RefLex scheme.

The corpus consists of German radio interviews of about 10 minutes each. A subpart of the corpus has been annotated manually; the larger part contains a number of automatic annotations in parallel, which serve as a silver standard. Twenty of the interviews have been selected for the gold standard (manually annotated) part of the corpus. Three additional interviews have been used to introduce the annotators to the annotation task and for training. The 20 gold interviews have been manually annotated with syntactic information (part-of-speech tags, parses for a subset of the corpus) and with referential information status according to the RefLex scheme, which was also the guideline schema for the DIRNDL corpus.

The RefLex scheme RefLex distinguishes information status along two dimensions, a referential and a lexical one. The referential level analyses the information status of referring expressions (i.e. noun phrases) according to a fine-grained version of the given/new distinction, whereas the lexical level analyses information status at the word level, where content words are analysed as to whether the lemma or a related word has occurred before. In the case of GRAIN, only referential information status was annotated, i.e. every NP in the text has been categorised as to whether it is given/coreferential, bridging, deictic, discourse-new, idiomatic, etc. In contrast to the information status annotation in SciCorp, indefinites are also marked (as discourse-new). Bridging anaphors are thus a subclass of referential information status and are labelled as r-bridging. Coreferent expressions are labelled as r-given (except the first mention in a coreference chain, which can, of course, be of a different category). On the referential level, indefinite expressions are considered discourse-new and are thus treated as expressions of the information status category r-new. Therefore, the bridging and coreferent anaphors in our data are always definite. However, there are no further restrictions in terms of pre-defined relations between the bridging anaphor and antecedent, or in terms of entity and event antecedents. Antecedents can be nominal, verbal or clausal. Besides the labels for referring expressions, the annotations also contain coreference chains and bridging links.


Inter-annotator agreement Each of the interviews was annotated independently by two annotators using the Slate tool (Kaplan et al., 2012). Adjudication was either done by a third person or in a discussion round of the project group. The inter-annotator agreement has been studied in two recent student theses (Pagel, 2018; Draudt, 2018). They report that for markables with the same span, the inter-annotator agreement is substantial, with a Cohen's κ of 0.75. Five different annotators were involved in the annotation (all students of computational linguistics), and the pair-wise agreement for different annotator pairs (Cohen's κ) for information status ranges between 0.64 and 0.82. For the bridging category, Pagel (2018) reports κ values from 0.2 up to acceptable values of 0.6, depending on the annotator pair. For more details on the inter-annotator agreement, please refer to Pagel (2018) and Draudt (2018).

4.3.4. Conclusion

We have presented three resources for bridging. The first resource, BASHI, is an English corpus of Wall Street Journal articles. It can be used together with the ISNotes corpus, on which most current experiments have been conducted, as both corpora are from the same domain and were annotated with comparable guidelines. The BASHI corpus contains 459 bridging links. In terms of inter-annotator agreement, the agreement is 71% for anaphor detection and 59% for full bridging resolution, with higher numbers for the subsets of definite bridging anaphors and comparative anaphors and lower numbers for indefinite anaphors. Coreference annotations are already contained in the corpus as part of the OntoNotes annotations. We will use this corpus to design our bridging resolver and to test the generalisability of previous experiments performed on the ISNotes corpus. The second resource, SciCorp, is an English corpus of scientific articles from two disciplines, computational linguistics and genetics. As this corpus is from a different domain than ISNotes and BASHI, we will use it to assess how well our bridging resolver works on domains other than news text. It contains 1,366 bridging pairs. The inter-annotator agreement for bridging resolution is in a similar range to that of BASHI, with 62% for genetics and 63% for computational linguistics. We have additionally annotated coreference as well as some other information status classes. The third resource, GRAIN, is a German corpus of radio interviews. The annotations follow the same guidelines as the ones used for the DIRNDL corpus, the only available German corpus at the time of the experiments, and contain 274 bridging pairs.


The inter-annotator agreement has been studied in two student theses, which report an agreement of up to 60% for bridging resolution. Coreference has also been annotated. Overall, we have created three medium-sized corpora aimed at providing data for bridging resolution in English and German. Bridging has been annotated reliably in these corpora, with inter-annotator agreement values around 60%. The corpora will serve as a basis for the experiments in the remainder of the thesis. As the annotations in GRAIN were only recently completed, experiments using GRAIN could not be included in this thesis. However, bridging in GRAIN has been the subject of a recent student thesis (Pagel, 2018), and our joint results on bridging in German data have been published in Pagel and Rösiger (2018). In the next part of the thesis, tool creation, the focus is set on developing anaphora resolution tools based on the available and newly created data.

5. Coreference resolution

Research Question 3: Tool creation Are there openly available tools aiming at providing automatic annotations on unseen text? If not, can we create tool resources to fill the research gap?

Coreference resolution has been extensively addressed in NLP research, e.g. in the CoNLL shared tasks 2011 and 2012 (Pradhan et al., 2011, 2012) and in the SemEval shared task 2010 (Recasens et al., 2010). Nowadays, most NLP conferences feature coreference resolution as a track of its own, as well as workshops focusing on coreference resolution. The recent CORBON workshops in 2016 and 2017 and the CRAC workshop in 2018 were designed to address special cases of coreference that go beyond “simple” entity coreference, such as abstract anaphora/event reference. Coreference resolution, at least for English, has reached a state where the performance for standard entity coreference is at a satisfactory level, and performance on the standard benchmark datasets keeps improving year by year. Furthermore, work on the handling of special cases such as event reference or zero anaphora is in progress. Most coreference research focuses on English, resulting in a number of high-performing, openly available English coreference systems, e.g. Clark and Manning (2016b), Durrett and Klein (2013) or Björkelund and Kuhn (2014). For German, however, there has been less work. Since the SemEval shared task 2010, only a few systems have been improved or developed, such as the rule-based CorZu system (Klenner and Tuggener, 2011; Tuggener and Klenner, 2014) or the system by Krug et al. (2015), which is tailored to the domain of historical novels and focuses on the resolution of characters. For coreference, our contribution to the tool creation step is thus to adapt an existing learning-based coreference resolver for English to German. Figure 5.1 highlights this contribution in our pipeline. The newly developed coreference tool for German will serve as the basis for further validation experiments in the next step, e.g. on the role of prosody in coreference.


[Figure 5.1 shows the workflow pipeline (task definition, data creation, tool creation, linguistic validation experiments) for coreference, with the tool creation contribution highlighted: Rösiger and Kuhn 2016 (LREC), "IMS HotCoref DE: A data-driven co-reference resolver for German".]

Figure 5.1.: Contribution and workflow pipeline for coreference: tool creation

5.1. Existing tools and related work

In the SemEval shared task 2010 on coreference resolution in multiple languages, a number of systems participated in the German track: BART (Broscheit et al., 2010a,b), SUCRE (Kobdani and Schütze, 2010), TANL-1 (Attardi et al., 2010) and UBIU (Zhekova and Kübler, 2010). Four different settings were evaluated in the shared task, using external resources (open) or only the resources provided (closed), combined with gold vs. regular preprocessing. In our own SemEval post-task evaluation, we will compare the performance of the three best-performing systems, BART, SUCRE and TANL-1, in Section 5.2.3. Since then, only a few systems have been developed or improved. Ziering (2011) improved the scores of SUCRE by integrating linguistic features. This resulted in an improvement of about 5 percentage points in the average of MUC and BCUBE. It is, however, difficult to compare these numbers, as the official scorer scripts have changed and as neither the system output nor the system itself is available. Klenner and Tuggener (2011) implemented CorZu, a rule-based incremental entity-mention coreference system which has achieved the best results on TüBa-D/Z, the benchmark dataset for German, since SemEval. The system was improved in Tuggener

and Klenner (2014). Krug et al. (2015) compared their own rule/pass-based system tailored to the domain of historical novels with CorZu in this specific domain, restricting coreference resolution to the resolution of persons, and found that their own system outperformed the rule-based CorZu. As this system does not aim at resolving general coreference, we will not include it in our overview of general German coreference systems. Mikhaylova (2014) adapted the IMSCoref system (Björkelund and Farkas, 2012), a predecessor of IMS HotCoref (Björkelund and Kuhn, 2014), to German as part of a Master's thesis. To the best of our knowledge, this system was not made publicly available. The following section introduces the available systems that have been proposed for coreference resolution in German text in more detail.

CorZu Klenner and Tuggener (2011) presented an entity-mention model for German and English with restrictive antecedent accessibility. The motivation for this approach was to address the flaws of the mention-pair approach, such as the restriction to local decisions, i.e. only pairs of mentions are classified prior to the construction of the coreference clusters, without being able to enforce global constraints. A post-processing clustering step has been shown to help remove inconsistencies by ensuring the transitivity of the pairs, but the problem of unbalanced data remains. Therefore, they implemented an incremental entity-mention model where the candidate pairs are evaluated on the basis of the already formed coreference sets. The main idea is that one virtual prototype of the cluster bears all morphological and semantic information of the members of the cluster and is used to compare the cluster with a new mention. The system uses only automatic preprocessing, including a syntactic parser, and extracts markables from the chunks based on part-of-speech tags delivered by the preprocessing pipeline. The extracted markables are then resolved per type, in the following way (a schematic sketch of two of these rules is given after the list):

• reflexive pronouns are resolved to the subject governed by the same verb;

• relative pronouns are resolved to the nearest preceding NP;

• personal and possessive pronouns are resolved to morphologically compatible candidates (NE, nouns and pronouns) within a window of three sentences;

• named entities either match completely or the antecedent must be more than one token and all tokens of the anaphor must be contained in the antecedent (Hillary Clinton ... Clinton);


• demonstrative pronouns are mapped to nominal NPs by matching their heads;

• definite NPs are resolved to other NPs if they match completely, without the determiner;

• to find non-matching anaphors, they perform hyponymy and synonymy search in GermaNet (Hamp and Feldweg, 1997).
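To make the rule ordering more concrete, the following sketch illustrates how two of the rules above (for reflexive and relative pronouns) might be applied. The mention representation (dictionaries with governing verb, grammatical function and token offsets) is purely hypothetical and not taken from the actual CorZu implementation.

```python
# A schematic sketch of two of the CorZu rules listed above; all data structures
# and function names are hypothetical illustrations, not the actual system.
def resolve_reflexive(pronoun, mentions):
    """Reflexive pronouns are resolved to the subject governed by the same verb."""
    for m in mentions:
        if m["governing_verb"] == pronoun["governing_verb"] and m["function"] == "subject":
            return m
    return None

def resolve_relative(pronoun, mentions):
    """Relative pronouns are resolved to the nearest preceding NP."""
    preceding = [m for m in mentions if m["type"] == "NP" and m["end"] < pronoun["start"]]
    return max(preceding, key=lambda m: m["end"]) if preceding else None

mentions = [
    {"text": "der Mann", "type": "NP", "start": 0, "end": 1,
     "function": "subject", "governing_verb": "wäscht"},
]
reflexive = {"text": "sich", "start": 2, "governing_verb": "wäscht"}
print(resolve_reflexive(reflexive, mentions)["text"])  # der Mann
```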

As can be seen from the rules, the model makes heavy use of binding theory (Chomsky, 1981) and the c-commanding constraints explained in Section 2.1. In Example (1), sie and Clinton cannot be coreferent, as the pronoun is c-commanded by Clinton.

(1) DE: Clinton traf sie.
EN: Clinton met her.

Hence, the pair does not need to be considered at all. All mentions in the already formed Clinton cluster are transitively exclusive and can be disregarded as antecedents. Based on TüBa-D/Z as the gold standard dataset, they calculate the salience of a dependency label as the number of coreferent mentions in the gold standard that bear that label, divided by the total number of coreferent mentions. As a result, they obtain a hierarchy of salient dependency categories according to which the antecedent candidates are ranked, where, for example, subjects are more salient than objects, which are in turn more salient than other categories. We will include CorZu in the evaluation and compare its performance against that of our newly developed model.

BART Broscheit et al. (2010a,b) presented an adaptation of their system BART to German. They base their system on the simple pair-wise approach by Soon et al. (2001), which we explained in Section 3.1, using TüBa-D/Z for training and testing. First, they extract all nominal projections unless their grammatical function is one of the following: appositions, items in copula constructions, noun phrases governed by als, and the Vorfeld-es. They state that all cases of non-referring es can be easily identified by their grammatical function label, making use of hand-annotated information. As features, they use common features taken from the literature, including distance, part-of-speech tags, grammatical functions and head matching, as well as semantic class distinctions. The semantic class labels are based on GermaNet. They also include a couple of additional features, including information on quoted speech, the distance in the parse tree, partial match and GermaNet relatedness.


SUCRE SUCRE (Kobdani and Schütze, 2010) is a coreference system that is able to separately carry out noun, pronoun and full coreference resolution. It is based on a relational database model and a regular feature definition language. The main algorithm is based on Soon et al. (2001), where positive and negative training instances are extracted from the gold data and then classified as to whether they are coreferent or not. After the classification, they apply best-first decoding, i.e. the antecedent candidate with the highest likelihood is chosen as the antecedent. Four classifiers are integrated into SUCRE: decision tree, Naive Bayes, support vector machines and maximum entropy. They state that the best results were achieved using decision trees. SUCRE also participated in the SemEval-2010 shared task, in the gold and regular closed annotation tracks of six languages. SUCRE's feature set for German was improved in a Master's thesis (Ziering, 2011).

UBIU UBIU (Zhekova and Kübler, 2010) is a language-independent system for full coreference resolution of named entities, pronouns, and full noun phrases. It applies a statistical model based on memory-based learning. It is language-independent in the sense that it only requires syntactic dependency parses and some effort to adapt the feature extractor to the language. UBIU was also one of the participating systems in the SemEval-2010 shared task, where systems were submitted for all languages.

TANL-1 TANL-1 (Attardi et al., 2010) is another system that participated in the SemEval-2010 shared task. The system makes use of dependency parses and similarity clustering. In the first phase of the system, a binary classifier based on maximum entropy is used to classify pairs of mentions. In the second phase, the mentions detected in the first phase are clustered according to the output of the classifier, using a greedy clustering algorithm: each mention is compared to all previous mentions, and if the pair-wise classifier suggests a probability greater than a given threshold, the mention is assigned to that entity. They also apply best-first decoding.
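The following sketch illustrates best-first decoding with a probability threshold, combining the descriptions of SUCRE and TANL-1 above. The pairwise scoring function and the mention representation are hypothetical stand-ins for the actual classifiers.

```python
def best_first_decode(anaphor, antecedent_candidates, pair_score, threshold=0.5):
    """Best-first decoding (sketch): among all preceding candidates, pick the one with
    the highest classifier score; only link the pair if the score exceeds a threshold.
    `pair_score` stands in for the pairwise classifier."""
    if not antecedent_candidates:
        return None
    best = max(antecedent_candidates, key=lambda cand: pair_score(anaphor, cand))
    return best if pair_score(anaphor, best) > threshold else None

# Hypothetical usage with a toy scoring function based on head match.
def toy_score(anaphor, candidate):
    return 0.9 if anaphor["head"] == candidate["head"] else 0.1

candidates = [{"head": "Kanzler"}, {"head": "Regierung"}]
print(best_first_decode({"head": "Kanzler"}, candidates, toy_score))  # {'head': 'Kanzler'}
```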

Wallin and Nugues 2017 Wallin and Nugues (2017) present a coreference system for Swedish and German based on distant supervision that does not use manually annotated data. For training, they apply the Stanford CoreNLP pipeline including coreference to parallel corpora in English-Swedish and English-German. To transfer the coreference annotations from the English text, they automatically align words and afterwards carry

out the mention transfer. Based on these transferred mentions, they then apply the mention-based approach of Soon et al. (2001) using a number of different classifiers: C4.5, random forest, and logistic regression. For German, they evaluate on a subpart of TüBa-D/Z, where they obtain a CoNLL score of 13.16 using the transferred mentions and 36.98 using gold mentions. These results are of course somewhat lower than the state-of-the-art results on TüBa-D/Z for rule-based and supervised methods (although we cannot directly compare against this method, as it does not use the whole TüBa-D/Z corpus), as errors in the alignment stage and in the predicted coreference resolution for English are propagated to the actual coreference resolution step.

5.2. A coreference system for German

This section presents a data-driven coreference resolution system for German that has been adapted from IMS HotCoref, a coreference resolver for English. It describes the difficulties of resolving coreference in German text, the adaptation process and the features designed to address the linguistic challenges posed by German. We report performance on the reference dataset TüBa-D/Z and include a post-task SemEval 2010 evaluation, showing that the resolver achieves state-of-the-art performance. We also include ablation experiments which indicate that integrating linguistic features improves the results. Furthermore, this section describes the steps and the format necessary to use the pre-trained resolver on new texts. The tool is freely available for download. Parts of this research have been published in Rösiger and Kuhn (2016).

5.2.1. System and data

IMS HotCoref

As a basis for the adaptation, we chose the English IMS HotCoref system (Björkelund and Kuhn, 2014). The IMS HotCoref system models coreference within a document as a directed latent rooted tree.1 The benefits of such a latent tree-based approach have already been illustrated in Section 2.1, the most important one being that one can learn more meaningful antecedents and can profit from non-local features, which are not restricted to only the current pair of mentions. The problem with using non-local features is that it requires an approximate search algorithm to keep the problem tractable. The focus in the original paper was set on the machine learning side, particularly on search

1The name HotCoref stands for higher order tree coreference.

strategies. They investigate different perceptron techniques and suggest using a modified version of LaSo (Learning as Search Optimization, Daumé and Marcu (2009)), where updates are delayed until each document is processed. As we base our adaptation on the already implemented features, we will give an overview of the available feature types.

Local features The local features are the same as in the predecessor, Björkelund and Farkas (2012), and include different types of (mostly linguistic) information. Features are for example based on the surface forms of the anaphor and the antecedent, the part-of-speech tags of (parts of) the mentions or the previous and following word, syntactic features where subcategorisation frames and paths in the syntax tree are analysed, distance-based features, semantic class information as well as a number of other features.

Non-local features As non-local features, they introduce features such as the size of the already formed clusters, the shape of a cluster in terms of mention type or local syntactic context, e.g. paths in the syntax tree.

TüBa-D/Z

The reference corpus for coreference resolution experiments in German is TüBa-D/Z2 (Naumann and Möller, 2006), a gold-annotated newspaper corpus of 1.8 M tokens with articles from the daily issues of “die tageszeitung” (taz). To evaluate our system, we use version 10 (v10), the newest dataset available, as well as version 8 (v8), as the latter was used in the SemEval shared task. We adopt the official test, development and training set splits for the shared task data. For version 10, there was no standard split available, so we split the data ourselves.3

TüBa-D/Z gold annotated version The gold annotations for both v8 and v10 were obtained via download from the TüBa-D/Z download page. TüBa-D/Z v10 comes in a number of different formats, including PENN for c-structure trees (with fine-grained syntactical annotations where topological fields such as “Vorfeld” are marked) and a CoNLL-2011 file.

In order to use TüBa-D/Z with the coreference resolver IMS HotCoref, we took the following steps:

2 http://www.sfs.uni-tuebingen.de/ascl/ressourcen/corpora/tueba-dz.html
3 We take the first 727 docs as test, the next 727 docs (728-1455) as dev and the remaining 2190 documents as training data. This equals a 20-20-60 test-dev-train ratio.


• Named entities (NEs): as the coreference resolver cannot process embedded named entities, we removed nested entries. In Example (2), we would for example remove the information that New York is a location.

(2) (Die (New York)LOC Times) ORG

• Lemmata: we removed all characters which are not part of the actual lemma, e.g. the # in hinter#ziehen or the %aux in müssen%aux

• Parses: for the syntactic features to work, we simplified the rich vocabulary of the annotated gold parses, i.e. we removed the subcategories after :–, :- and =. This means that we changed labels such as NX:OA to NX and, in a second step, changed NX, PX and SIMPX to NP, PP and S (a sketch of this mapping follows the list).

• Format: we adjusted the format so that it matches the conventions for the CoNLL-12 format.
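The following sketch illustrates how such a label simplification could be implemented. The mapping (NX/PX/SIMPX to NP/PP/S, removal of subcategories after the colon or equals sign) follows the steps above; the function names and the handling of the parse-bit notation are assumptions, not the actual conversion script.

```python
import re

# Mapping from TüBa-D/Z phrase labels to the simplified label set used by the resolver.
LABEL_MAP = {"NX": "NP", "PX": "PP", "SIMPX": "S"}

def simplify_label(label):
    """Strip subcategory suffixes (e.g. NX:OA -> NX) and map NX/PX/SIMPX to NP/PP/S."""
    base = re.split(r"[:=]", label)[0]
    return LABEL_MAP.get(base, base)

def simplify_parse_bit(parse_bit):
    """Apply the label simplification to every label inside a CoNLL-style parse bit."""
    # Labels appear directly after an opening bracket, e.g. "(NX:OA*" -> "(NP*".
    return re.sub(r"\(([^()*\s]+)", lambda m: "(" + simplify_label(m.group(1)), parse_bit)

if __name__ == "__main__":
    print(simplify_label("NX:OA"))              # NP
    print(simplify_parse_bit("(SIMPX(NX:OA*"))  # (S(NP*
```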

TüBa-D/Z predicted version The predicted version for v8 (i.e. using only automatic preprocessing) was obtained from the SemEval shared task, to be compatible with the other shared task systems, and had to be converted into the CoNLL-12 format. We parsed the text with the Berkeley parser (Durrett and Klein, 2013), post-processed the parses by inserting NPs into flat PPs (also embedded ones), inserted NP labels for single-word NPs which were not marked as NPs before, and adjusted PNs where they overlap with NPs. We also inserted NPs into coordinated NPs (CNPs) in order to be able to extract them as markables. The parsing adaptations are described in more detail below. We also included part-of-speech and morphological tagging using the Mate tool (Bohnet and Nivre, 2012) and named entities as recognised by the Stanford named entity system (Faruqui and Padó, 2010). As we are using external tools, our system can only be evaluated in the open track of the shared task. For the predicted version of TüBa-D/Z v10, we processed the data using the same steps as described above. The steps involved are also explained in the section on how to run the tool on new texts, Section 5.2.5.


5.2.2. Adapting the system to German

Mention extraction

The goal of the mention extraction module is to achieve high recall and to provide the coreference resolver with a high number of correctly determined mentions. This is crucial for the performance of the final system.

Mention extraction for TüBa-D/Z v8 (the SemEval data) The following experiments were performed on the predicted version of TüBa-D/Z v8. First, we computed the recall for a number of different markable types. As can be seen in Table 5.1, the recall is quite low when extracting NPs only (28 percent). Adding other types, e.g. personal pronouns (PPER), increases the recall to 36 percent and finally to 41 percent by extracting possessive pronouns (PPOSAT), relative pronouns (PRELS) and interrogative pronouns (PWS). As names are sometimes annotated as DL in the constituency parse, adding DL as a category increases recall to 46.6 percent. However, the final recall is still low, which is why further adjustments are necessary.

Tag         Description               Recall
NT-NP       noun phrases              28.2
T-PPER      personal pronouns         36.5
T-PPOSAT    possessive pronouns       39.1
T-PWS       interrogative pronouns    41.1
T-PRELS     relative pronouns         41.5
NT-DL       names                     46.6

Table 5.1.: IMS HotCoref DE: performance of the mention extraction module: markable types and their recall in percent, for TüBa-D/Z 8
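The recall figures in Table 5.1 can be thought of as the fraction of gold mention spans for which an exactly matching markable is extracted. The following sketch shows one way to compute this; the span representation and the toy data are hypothetical.

```python
def markable_recall(gold_mentions, extracted_markables):
    """Recall of the mention extraction module: fraction of gold mention spans
    that have an exactly matching extracted markable span.

    Both arguments are collections of (doc_id, start_token, end_token) tuples.
    """
    gold = set(gold_mentions)
    extracted = set(extracted_markables)
    if not gold:
        return 0.0
    return len(gold & extracted) / len(gold)

# Hypothetical usage: add markable categories incrementally and watch the recall grow,
# mirroring the rows of Table 5.1 (NPs only, + personal pronouns, ...).
gold = {("doc1", 0, 2), ("doc1", 5, 5), ("doc2", 3, 4)}
nps_only = {("doc1", 0, 2)}
with_pronouns = nps_only | {("doc1", 5, 5)}
print(markable_recall(gold, nps_only))       # 0.33...
print(markable_recall(gold, with_pronouns))  # 0.66...
```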

Post-processing of the parse bits Parse bits are parts of a constituency parse that span a certain number of words. There are a number of reasons why the extracted parse bits from the constituency parser (Durrett and Klein, 2013) and the annotated coreferent NPs do not match. The first problem is that the annotated PPs are flat, i.e. they do not contain embedded NPs. Hence, we need to insert NPs into flat PPs. In Example (3), markable 22 (seinem umfänglichen dichterischen Schaffen) does not have a matching NP in the original parse bit.


(3)
    Token          POS      Parse bit (before)   Coreference   Parse bit (after)
    Aus            APPR     (S(PP*               -             (S(PP*
    seinem         PPOSAT   *                    (22           (NP*
    umfänglichen   ADJA     *                    -             *
    dichterischen  ADJA     *                    -             *
    Schaffen       NN       *)                   22)           *))

Of course, embedded PPs also require the insertion of NPs, as illustrated in Example (4).

(4)
    Token         POS      Parse bit (before)   Coreference   Parse bit (after)
    wegen         APPR     (VP(PP*              -             (VP(PP*
    seiner        PPOSAT   *                    (16           (NP*
    Gegnerschaft  NN       *                    -             *
    zur           APPRART  (PP*                 -             (PP*
    Diktatur      NN       *                    (18           (NP*
    Primo         NE       *                    -             (NP*
    de            NE       *                    -             *
    Riveras       NE       *))                  16) 18)       *)))))

One issue with the parser output is that single-word proper nouns or common nouns do not have an NP label in the parses, so we need to insert an NP label, as shown in Example (5). We cannot just extract all proper or common nouns as markables as they are typically part of larger NPs, where the single word alone is not considered a markable.

(5)
    Token   POS      Parse bit (before)   Coreference   Parse bit (after)
    Gott    NN       (S*                  (497)         (S(NP*)
    guckt   VVFIN    *                    -             *
    uns     PPER     *                    -             *
    nicht   PTKNEG   *                    -             *
    zu      PTKVZ    *)                   -             *)


Coordinated NPs (CNPs) do not have embedded NPs, which is why we additionally need to insert NPs into CNPs, as shown in Example (6).

(6)
    Token           POS   Parse bit (before)   Coreference   Parse bit (after)
    Übersetzungen   NN    (CNP*                (492 (42)     (CNP(NP*)
    und             KON   *                    -             *
    Inszenierungen  NN    *)                   (43) 492)     (NP*)))

Independently of the presence of PPs, some NPs are not annotated by the parser. We have implemented a script that inserts NPs if it finds a determiner that is followed by an NN or NE (with at most 10 arbitrary tokens in between). One example is given in Example (7); a sketch of this heuristic follows the example.

(7)
    Token          POS    Parse bit (before)   Coreference   Parse bit (after)
    der            ART    *                    (2            (NP*
    deutsche       ADJA   *                    -             *
    Bundeskanzler  NN     *                    2)            *)
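A sketch of this determiner-based insertion heuristic is given below. It only proposes candidate spans over POS sequences; whether an NP node can actually be added at that position in the parse bit is a separate step not modelled here, and the function name is hypothetical.

```python
def propose_np_spans(pos_tags, max_gap=10):
    """Propose NP spans for the determiner heuristic described above:
    a determiner (ART) followed by a common or proper noun (NN/NE),
    with at most `max_gap` tokens in between.

    Returns (start, end) token index pairs.
    """
    spans = []
    for i, tag in enumerate(pos_tags):
        if tag != "ART":
            continue
        # Scan ahead for the noun that closes the candidate NP.
        for j in range(i + 1, min(i + 2 + max_gap, len(pos_tags))):
            if pos_tags[j] in ("NN", "NE"):
                spans.append((i, j))
                break
    return spans

# "der deutsche Bundeskanzler" -> ART ADJA NN, yielding the span (0, 2) as in Example (7).
print(propose_np_spans(["ART", "ADJA", "NN"]))  # [(0, 2)]
```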

The parsing adjustments have a large effect on the recall, as can be seen in Table 5.2. The final recall when using predicted annotations is about 78%. The remaining 22% are not extracted mainly due to parsing errors. With gold annotations, the recall is about 99%.

After all these adjustments, there are still gold markables for which we do not have a matching NP in the parse tree. Adding these in automatically (where the tree allows it) leads to an increase in markable detection from 78 to 91%. As this involves gold information, we do not use this information in our experiments. In some situations, the tree does not allow a multi-word NP, e.g. if the node is the start of a markable but the parse has a phrase end. These account for the remaining difference between 91 and 100 percent recall.


                           Recall
Basic markables            46.6
NPs in PPs                 66.2
NPs in PPs (embedded)      68.0
Single-word NPs            74.6
Adjusting CNPs             75.6
Inserting NPs              78.0

Table 5.2.: IMS HotCoref DE: performance of the mention extraction module after the respective parse adjustments, recall in percent on TüBa-D/Z version 8.

Mention extraction for TüBa-D/Z v10 We also analysed whether we could use the same markables with the newer version, TüBa-D/Z v10. As some changes have been made in the newer version, we ended up with a different set of markables, which is presented in Table 5.3. Interestingly, despite the slightly different markables, the markable extraction module has the same performance on TüBa-D/Z v8 and v10: 78% using the predicted version and 99% using gold annotations.

Tag                        Description               Recall
NPs (after adjustments)    noun phrases              43.0
PPER                       personal pronouns         59.3
PPOSAT                     possessive pronouns       68.1
PRELS                      relative pronouns         74.0
PDS                        demonstrative pronouns    74.9
PRF                        reflexive pronouns        76.1
PN                         proper noun phrases       76.1
NE (ORG, LOC, PER, GPE)    named entities            78.4

Table 5.3.: IMS HotCoref DE: performance of the mention extraction module after the respective parse adjustments, recall in percent on TüBa-D/Z version 10.

As the final pre-trained model is based on TüBa-D/Z v10, the final default markables for German were set to be NPs with the label NP or PN in the parse bit, personal pronouns (PPER), possessive pronouns (PPOSAT), relative pronouns (PRELS), demonstrative pronouns (PDS), reflexive pronouns (PRF) and named entities with the label LOC, PER, GPE and ORG.


Number and gender information

In the English version of IMS HotCoref, number and gender information comes in the form of a lookup from lists created by Bergsma and Lin (2006). For German, we decided to include gender and number prediction in the pre-processing and to rely on the predicted information. We have included gender and number lookup lists for personal and possessive pronouns in case the morphological analyser does not predict a label.

Head rules

The system includes a module that tries to identify the syntactic head of certain syntactic phrases. We have adapted the rules to German. The main rule for German noun phrases is to take the left-most common or proper noun (or named entity), if present, and if this fails, to look for the left-most pronoun. If this also fails, there are a number of backup strategies to find a suitable head.
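The following sketch illustrates this head rule over a list of (word, POS) pairs with STTS tags. The fallback to the last token stands in for the backup strategies mentioned above and is a simplification; the function name is hypothetical.

```python
def find_np_head(tokens):
    """Simplified head rule for German NPs: take the left-most common noun, proper noun
    or named entity if present; otherwise fall back to the left-most pronoun;
    otherwise (backup, simplified) the last token of the phrase.

    `tokens` is a list of (word, pos) pairs with STTS tags.
    """
    noun_tags = {"NN", "NE"}
    pronoun_tags = {"PPER", "PPOSAT", "PDS", "PRELS", "PRF", "PWS"}
    for word, pos in tokens:
        if pos in noun_tags:
            return word
    for word, pos in tokens:
        if pos in pronoun_tags:
            return word
    return tokens[-1][0]

print(find_np_head([("der", "ART"), ("deutsche", "ADJA"), ("Bundeskanzler", "NN")]))  # Bundeskanzler
```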

Features for German

IMS HotCoref offers a wide range of language-independent features (single and pair-based). We added a number of new features and modifications, which are explained in the following. After the implementation of the new features, we ran a number of feature selection experiments to come up with a final set of features that performed best. The feature selection process is described after the new features have been introduced.

Lemma-based rather than word form-based Whereas word form-based features are effective for English, they are less suitable for German due to its rich inflection. This is why we chose lemmata as the basis for all features. The following example illustrates the difference: a feature that captures the exact repetition of the word form suffices in English, but lemmata are needed for German.

(8) DE: Sie nahm das Buch des Vaters [gen.] und hoffte, der Vater [nom.] würde es nicht bemerken. EN: She took the book of the father and hoped the father wouldn’t notice.

F1: Gender agreement Number agreement is one of the standard features used to find suitable antecedents for pronouns. For German, we additionally need gender agreement. Contrary to English, non-animate entities are often not neuter, but feminine or masculine. On the one hand, this makes the resolution more difficult as it introduces

ambiguity, see Example (9). On the other hand, as shown in Example (10), it might also make the resolution of inanimate objects easier. Note that this feature is mainly relevant for pronominal reference, as nominal anaphor-antecedent pairs do not need to have the same gender, see Example (11).

(9) DE: Emma schaute hoch zur Sonne. Sie [fem.] schien heute sehr stark. EN: Emma looked up at the sun. It was shining quite brightly.

(10) DE: Das neue Auto [neut.] stand in der Garage [fem.]. Es [neut.] sah ziemlich sauber aus.

EN: The new car? was parked in the garage?. It was rather clean.

(11) DE: Der Stuhl [masc.] ... die Sitzgelegenheit [fem.] ... das Plastikmonster [neut.]. EN: the chair ... the seating accommodation ... the plastic monster.

F2: Compound head match Whereas English compounds are written as multiple words, so that a simple (sub-)string match feature suffices to find similar compounds, German compounds are single words. Therefore, matching a compound and its head, as shown in Example (12), is more complicated.

(12) DE: Menschenrechtskomitteevorsitzender ... der Vorsitzende EN: human rights committee chairman ... the chairman

We have implemented two versions to treat these compound cases: a lazy one and a more sophisticated approach. The lazy version is a boolean feature that returns true if the lemma of the head of the anaphor span ends with the same five letters as the head of the antecedent span, excluding derivatives ending in ung, nis, tum, schaft, heit or keit to avoid matches for cases like Regierung (government) and Formulierung (phrasing). The more sophisticated version uses the compound splitting tool COMPOST (Cap, 2014). The tool splits compounds into their morphemes using morphological rules and corpus frequencies. Split lists for TüBa-D/Z as produced by COMPOST have been integrated into the resolver. Split lists for new texts can be integrated via a parameter. In this case, the boolean feature is true if the two markables are compounds that share the same head, or if one markable is the head of the other markable, which is a compound.
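A minimal sketch of the lazy variant is given below; the function name is hypothetical and the suffix list is the one mentioned above.

```python
EXCLUDED_SUFFIXES = ("ung", "nis", "tum", "schaft", "heit", "keit")

def lazy_compound_head_match(anaphor_head_lemma, antecedent_head_lemma):
    """Lazy compound head match: true if both head lemmas end in the same five letters,
    unless one of them ends in a derivational suffix such as -ung or -heit
    (to avoid matches like Regierung / Formulierung)."""
    ana = anaphor_head_lemma.lower()
    ante = antecedent_head_lemma.lower()
    if len(ana) < 5 or len(ante) < 5:
        return False
    if ana.endswith(EXCLUDED_SUFFIXES) or ante.endswith(EXCLUDED_SUFFIXES):
        return False
    return ana[-5:] == ante[-5:]

print(lazy_compound_head_match("Menschenrechtskomitteevorsitzender", "Vorsitzender"))  # True
print(lazy_compound_head_match("Regierung", "Formulierung"))                           # False
```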


F3: GermaNet lookup A GermaNet interface based on the Java API4 is implemented to include world knowledge and to allow the lookup of similar words. We have added three features that search for synonyms, hypernyms and hyponyms. They return true if the antecedent candidate is a synonym (hypernym or hyponym, respectively) of the anaphor.

F4: Distributional information Another source of semantic knowledge comes from distributional models, where the similarity in a vector space can be used to find similar concepts. This type of information is particularly important in cases where string match does not suffice, as in Example (13), and GermaNet does not contain both head words.

(13) DE: Malaria wird von Stechmücken übertragen. Die Krankheit ... EN: Malaria is transmitted by mosquitoes. The disease ...

We thus implemented a boolean feature that is true if two mentions have a similarity score above a defined threshold (a cosine similarity of 0.8 in our experiments; this can be adjusted), and false otherwise. To compute the similarity scores, we use a module in the coreference resolver that extracts the syntactic head of every noun phrase predicted by the constituency parser, in order to create our list of noun-noun pairs and their similarity values. To get the similarity values, we built a vector space from the SdeWaC corpus (Faaß and Eckart, 2013), part-of-speech tagged and lemmatised using TreeTagger (Schmid, 1994). From the corpus, we extracted lemmatised sentences and trained a CBOW model (Mikolov et al., 2013). This model builds distributed word vectors by learning to predict the current word based on its context. We use lemma-POS pairs as both target and context elements, 300 dimensions, negative sampling set to 15, and no hierarchical softmax. We used the DISSECT toolkit (Dinu et al., 2013) to compute the cosine similarity scores between all nouns of the corpus.5 This idea is explored in more detail on English data in our validation experiments in Section 8.
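The following sketch shows how such a feature could be derived, assuming gensim (version 4 or later) as the CBOW implementation rather than the original word2vec/DISSECT setup; the toy sentences are for illustration only, and the threshold of 0.8 follows the description above.

```python
# A minimal sketch of the distributional feature, assuming gensim >= 4.0 is installed.
from gensim.models import Word2Vec

# Each training sentence is a list of lemma-POS tokens, e.g. "Krankheit-NN".
sentences = [
    ["Malaria-NN", "werden-VA", "von-APPR", "Stechmücke-NN", "übertragen-VV"],
    ["Krankheit-NN", "sein-VA", "gefährlich-ADJ"],
]

# CBOW (sg=0), 300 dimensions, negative sampling 15, no hierarchical softmax (hs=0).
model = Word2Vec(sentences, vector_size=300, sg=0, negative=15, hs=0, min_count=1)

def distributional_match(head1, head2, threshold=0.8):
    """Boolean feature: true if the two head lemmas are distributionally similar."""
    if head1 not in model.wv or head2 not in model.wv:
        return False
    return model.wv.similarity(head1, head2) >= threshold

print(distributional_match("Malaria-NN", "Krankheit-NN"))
```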

F5/F6: Animacy and name information Three knowledge sources taken from Klenner and Tuggener (2011) have been integrated: a list of words which refer to people, e.g. Politiker (politician) or Mutti (Mummy), a list of names which refer to females, e.g. Laura, Anne, and a list of names which refer to males, e.g. Michael, Thomas, etc. We use this information in two features:

4 https://github.com/Germanet-sfs/GermaNetApi/
5 The cosine similarity values based on the CBOW model were provided by Max Kisselew.


The first feature, called person match, is true if the anaphor is a masculine or feminine pronoun and the antecedent is on the people list. It is also true if the antecedent and the anaphor are both on the people list. The second feature, called gender match names, is true if the antecedent is a female name and the anaphor a singular female pronoun or if the antecedent is a male name and the anaphor a singular male pronoun, respectively.
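The two features can be sketched as follows; the word and name lists shown here are tiny illustrative stand-ins for the actual lists from Klenner and Tuggener (2011), and the pronoun handling is simplified (e.g. the plural reading of sie is ignored).

```python
# Illustrative word lists; the actual lists are taken from Klenner and Tuggener (2011).
PEOPLE_WORDS = {"Politiker", "Mutti", "Lehrer"}
FEMALE_NAMES = {"Laura", "Anne"}
MALE_NAMES = {"Michael", "Thomas"}

FEMALE_PRONOUNS = {"sie", "ihr", "ihre"}
MALE_PRONOUNS = {"er", "ihm", "ihn", "sein", "seine"}

def person_match(anaphor, antecedent_head):
    """True if the anaphor is a masculine or feminine pronoun and the antecedent head
    is on the people list, or if both anaphor and antecedent head are on the people list."""
    is_gendered_pronoun = anaphor.lower() in FEMALE_PRONOUNS | MALE_PRONOUNS
    return (is_gendered_pronoun and antecedent_head in PEOPLE_WORDS) or \
           (anaphor in PEOPLE_WORDS and antecedent_head in PEOPLE_WORDS)

def gender_match_names(anaphor, antecedent_head):
    """True if a female name is taken up by a singular female pronoun,
    or a male name by a singular male pronoun."""
    return (antecedent_head in FEMALE_NAMES and anaphor.lower() in {"sie", "ihr"}) or \
           (antecedent_head in MALE_NAMES and anaphor.lower() in {"er", "ihm", "ihn"})

print(person_match("er", "Politiker"))     # True
print(gender_match_names("sie", "Laura"))  # True
```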

Other newly implemented features There are a couple more features that are not included in the final set of features, but might still be helpful for other settings or training data. We give a short explanation for each of the features. For more details, please refer to the source code.

• NumberMatch: a boolean feature that returns true if two expressions match in number.

• GenderMatch: a boolean feature that returns true if two expressions match in gender.

• HeadLemmaExactStringMatch: a boolean feature that returns true if the head lemmata of two expressions match.

• SubStringMatch: a boolean feature that returns true if the two noun phrases match in either an adjective or a common noun.

• Anaphor is Definite: a boolean feature that is true if the anaphor contains a definite marker.

• Anaphor is Demonstrative: a boolean feature that is true if the anaphor contains a demonstrative marker.

• PronounTreat: a boolean feature that adapts string match for pronouns, reflecting the fact that the same pronouns tend to refer to the same entity.

Feature selection In IMS HotCoref, three things make the feature selection process complicated: (i) features can have a negative effect on the overall performance, (ii) features can be combined with other features and contribute more as a combined feature

than the two separate features, and (iii) features can interact negatively with other features. Therefore, we have implemented a feature selection script that adds features incrementally. If a feature improves the overall performance, it is added as a candidate to the list of features; if not, it is excluded. When adding the next feature, we check whether the combination of the current and the previous feature improves the performance. If so, the previous feature is added as a feature and the current feature becomes a feature candidate. If the performance decreases, we check whether the current feature alone improves performance. If so, the previous feature candidate is removed and the current feature becomes a feature candidate. Finally, we also combine features with other features to check whether the combination of two features gives an additional increase in performance.
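A simplified sketch of this incremental procedure is given below. It only models the basic add-if-it-helps loop; the handling of feature candidates and of pairwise combinations described above is left out, and the evaluation function is a toy stand-in for training and scoring on the development set.

```python
def greedy_feature_selection(candidate_features, evaluate):
    """Greedy incremental feature selection (simplified sketch).

    `candidate_features` is an ordered list of feature names; `evaluate` maps a list of
    features to a score (e.g. the CoNLL score on the development set).
    """
    selected = []
    best_score = evaluate(selected)
    for feature in candidate_features:
        score = evaluate(selected + [feature])
        if score > best_score:
            # The feature helps in combination with what we already have: keep it.
            selected.append(feature)
            best_score = score
        # Otherwise the feature is discarded for now.
    return selected, best_score

# Hypothetical usage with a toy evaluation function.
def toy_evaluate(features):
    gains = {"lemma-based": 2.0, "gender-agreement": 0.7, "noisy-feature": -0.3}
    return 63.0 + sum(gains.get(f, 0.0) for f in features)

print(greedy_feature_selection(["lemma-based", "noisy-feature", "gender-agreement"], toy_evaluate))
```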

5.2.3. Evaluation

Performance on TüBa-D/Z-v10

On the newest test set available (TüBa-D/Z, version 10), our resolver currently achieves a CoNLL score of 65.76. Table 5.4 compares the performance of our system using gold annotations with our system trained on predicted annotations (Section 5.2.5 lists the tools involved). Since TüBa-D/Z v10 is a rather new dataset, other systems have not reported their performance on this data. In this thesis, the best result on a dataset is always marked in bold face.

IMS HotCoref DE using ...    MUC      BCUBE    CEAFM    CEAFE    BLANC    CoNLL
gold annotations             69.64    62.85    66.63    64.79    57.18    65.76
predicted annotations        52.57    45.13    52.44    48.22    41.23    48.54

Table 5.4.: Performance of IMS HotCoref DE on TüBa-D/Z version 10: gold vs. predicted annotations

Performance on TüBa-D/Z v8: SemEval post-task evaluation

In Table 5.5, the official results given on the SemEval-2010 shared task website6 are presented. Note that these results have to be taken with a grain of salt, as an older scorer script was used for the evaluation which was later corrected due to a number of

6http://stel.ub.edu/semeval2010-coref/

bugs. As mentioned above, four different settings were evaluated in the shared task, using external resources (open) or only the provided resources (closed), combined with gold vs. regular preprocessing.

System     CEAFE    MUC     BCUBE    BLANC    CoNLL
Closed, gold setting
SUCRE      72.9     58.4    81.1     66.4     70.8
TANL-1     77.7     25.9    85.9     57.4     55.5
UBIU       68.2     21.9    75.7     64.5     55.3
Closed, predicted setting
SUCRE      59.9     40.9    64.3     53.6     54.7
TANL-1     49.5     15.4    50.7     44.7     38.5
UBIU       44.8     10.4    46.6     48.0     33.9
Open, gold setting
BART       66.9     51.1    73.4     62.8     63.8
Open, predicted setting
BART       61.3     45.5    65.7     57.3     57.5

Table 5.5.: SemEval-2010 official shared task results for German: F1 values taken from the website.

However, the system outputs are available on the shared task webpage, which is why we can use the newest, bug-free version of the official CoNLL scorer (Pradhan et al., 2014) and re-evaluate the system results as well as compare our own performance against those of the shared task systems. In a post-task SemEval 2010 evaluation our system achieves a CoNLL score of 48.61 in the open, regular track and a CoNLL score of 63.61 in the open, gold track. Table 5.6 compares our scores with the three best-performing systems in the shared task, BART, SUCRE and TANL-1 as well as with the newer system CorZu.7 The CoNLL scores for all systems participating in the shared task have been computed using the official CoNLL scorer v8.01 and the system outputs provided on the SemEval webpage. The scores differ from those published on the SemEval website due to the newer, improved scorer script and because we did not include singletons in the evaluation, as we think they should not be part of the actual coreference evaluation. More detailed scores can be found in Table 5.7.

7Performance of CorZu: Don Tuggener, personal communication.


System                     CoNLL gold^8    CoNLL regular
IMS HotCoref DE (open)     63.61*          48.61*
CorZu (open)               58.11           45.82
BART (open)                45.04           39.07
SUCRE (closed)             51.55           36.32
TANL-1 (closed)            20.39           14.17

Table 5.6.: SemEval Shared Task 2010 post-task evaluation for track regular and gold (on TüBa 8), excluding singletons

The difference in CoNLL score between CorZu and our system is statistically significant. We compute significance using the Wilcoxon signed-rank test (Siegel and Castellan, 1988) at the 0.05 (**) or the 0.01 (*) level. The compared pairs are the documents in TüBa-D/Z.
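The test can be run, for example, with SciPy, pairing the per-document CoNLL scores of the two systems; the scores below are made up for illustration.

```python
# A sketch of the document-level significance test, assuming SciPy is installed.
# In the actual evaluation, each value would be the CoNLL score of one TüBa-D/Z
# test document under the respective system.
from scipy.stats import wilcoxon

scores_system_a = [62.1, 58.4, 70.2, 64.8, 61.0, 66.3]  # e.g. IMS HotCoref DE
scores_system_b = [58.7, 55.9, 68.4, 60.1, 59.2, 63.0]  # e.g. CorZu

statistic, p_value = wilcoxon(scores_system_a, scores_system_b)
print(f"W = {statistic}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```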

                           MUC      BCUBE    CEAFE    CEAFM    BLANC    CoNLL
IMS (open, gold)           67.43    60.90    62.50    64.12    55.49    63.61
IMS (open, regular)        52.11    45.55    48.61    48.17    38.47    48.61
CorZu (open, gold)         61.63    55.18    58.35    58.35    -        58.11
CorZu (open, regular)      -        -        -        -        -        45.82
BART (open, gold)          50.56    40.74    46.35    43.82    31.78    45.88
BART (open, regular)       42.46    34.64    42.01    39.52    26.64    39.70
SUCRE (closed, gold)       58.42    47.25    50.26    48.99    38.86    51.98
SUCRE (closed, regular)    37.64    32.32    39.00    37.31    21.70    36.32
TANL-1 (closed, gold)      25.87    16.56    23.72    18.73    14.21    22.05
TANL-1 (closed, regular)   15.36    9.84     17.32    13.36    7.37     14.17

Table 5.7.: SemEval-2010: post-task evaluation, excluding singletons

5.2.4. Ablation experiments

For the features presented above, we perform ablation experiments using the gold annotations of TüBa-D/Z v10. Statistical significance is computed for all comparisons against the best performing version, again using the Wilcoxon signed-rank test. Table 5.8 shows the results when leaving out one of the previously described features at a time. Computing all the features on a word form rather than a lemma basis results in the biggest decrease in performance (about 2 CoNLL points), followed by leaving out gender agreement, GermaNet and the animacy features. Two features, compound head

match and distributional information, only had a minor influence on the performance. We include them here because they have proven to be effective in other settings, e.g. when using regular annotations.

IMS HotCoref DE                        CoNLL
Best performing version                65.76
- lemma-based                          63.80*
- F1: gender agreement                 65.03*
- F2: compound head match              65.72
- F3: GermaNet                         65.32**
- F4: Distributional information       65.76
- F5: Animacy: gender match names      65.59**
- F6: Animacy: person match            65.58**

Table 5.8.: Performance of IMS HotCoref DE on TüBa-D/Z version 10: ablation experiments

5.2.5. Pre-processing pipeline: running the system on new texts

One of the main problems for people who want to apply a coreference resolver to new text is the pre-processing of the texts. Most systems, like ours, require a few annotation layers such as part-of-speech tags or constituency parses. In order to achieve the best results, one should use the same tools with which the training data has been processed, so that the annotations are compatible. Together with the specific CoNLL-12 format, this has led to people having to spend a lot of time setting up their own pipeline, or giving up during pre-processing and not using the tool at all. To simplify the application of IMS HotCoref DE to new texts, we have set up a pipeline that takes plain text as input, performs all the pre-processing steps with the same tools that we have used, creates the right format and runs the coreference resolver as a final step, with default settings and the model pre-trained on the predicted version of TüBa-D/Z v10.9 In this section, we describe the required annotations as well as the final format that IMS HotCoref DE takes as input.

9 The pre-processing pipeline can be found here: https://github.com/InaRoesiger/conversion2conll12

Required annotations The system requires preprocessed text with the following annotations in CoNLL-12 format: POS tags, lemmata, constituency parse bits, number and gender information and (optionally) named entities.

The mention extraction module, the part of the resolver that chooses the markables which we want to resolve in a later step, is based on the constituency parse bits and POS tags. It can be specified which POS tags and which non-terminal categories should be extracted. Per default, noun phrases, named entities and personal, possessive, demonstrative, reflexive and relative pronouns, as well as a set of named entity labels, are extracted. Note that most parsers for German do not annotate NPs inside PPs, i.e. they are flat, so these need to be inserted before running the tool.

Pre-trained models There are two pre-trained models available. One is trained on the gold annotations (this one is preferable if you can find a way to create annotations similar to the TüBa gold annotations for your own texts). We have also uploaded a model trained on predicted annotations: we used the Berkeley parser (Petrov et al., 2006) (out of the box, with the standard models trained on Tiger) to create the parses, the Stanford NER system for German (Faruqui and Padó, 2010) to find named entities, and mate10 (Bohnet and Nivre, 2012) to lemmatise, tag part-of-speech and produce the morphological information.11

Format The tool takes input in CoNLL-12 format. The CoNLL-12 format is a standardised, tab-separated format in a one-word-per-line setup. Table 5.9 shows the information contained in the respective columns.

Column    Content
1         docname
2         part number
3         word number in sentence
4         word form
5         POS tag
6         parse bit
7         lemma
8         number information: pl or sg
9         gender information: fem, masc or neut
10        named entity (optional)
11        coref information

Table 5.9.: CoNLL-12 format overview: tab-separated columns and content

10 www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/matetools.html
11 Two example documents for the annotations are provided on the webpage.
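A minimal reader for this format could look as follows; it is a sketch based on the column layout of Table 5.9, not the tool's own input module.

```python
def read_conll12_columns(path):
    """Read the tab-separated input format from Table 5.9 into a list of token dicts.
    Lines starting with '#' (document delimiters) and empty lines (sentence breaks)
    are skipped."""
    columns = ["docname", "part", "word_number", "word", "pos", "parse_bit",
               "lemma", "number", "gender", "named_entity", "coref"]
    tokens = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            tokens.append(dict(zip(columns, fields)))
    return tokens

# Hypothetical usage:
# for tok in read_conll12_columns("interview.conll"):
#     print(tok["word"], tok["pos"], tok["coref"])
```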


Annotating coreference in new texts This section explains how to use the pre-trained models to annotate coreference in new documents if you do not want to use the standard pipeline or want to play around with a few of the parameters. A detailed manual on how to train a model is contained in the webpage documentation.

• Download the tool, the model and the manual from the webpage;

• Pre-process your texts so that you have all the necessary annotation layers;
  – make sure that the parse bits have NPs annotated inside of PPs;
  – the parse bits should be comparable to those in the example document: either the gold ones or the ones created by the Berkeley parser;

• Get your texts into the right format: see example document;

• Specify the markables you want to extract;

• Specify the additional information: you can include distributional information, compound splits, etc. for your own texts. Details on the single formats are con- tained in the manual;

• Specify the features (you can play around with this or just use the default features);

• Training and testing commands can be found in the manual;

• If you have plain text and want to use the tool with default settings, simply apply the pipeline script.

5.2.6. Application on DIRNDL

So far, the experiments have been conducted with the TüBa-D/Z corpus, as it is the benchmark dataset for coreference resolution in German. It is also by far the largest corpus, a fact from which data-driven systems like ours benefit. However, there are also other corpora, such as the DIRNDL corpus, which could be of interest for studies on coreference resolution, as DIRNDL is of a different text type, spoken radio news, and was, for example, also manually labelled with prosodic information. To study the interaction between coreference and prosody, as we plan to do in Section 7, we need a system that is applicable to DIRNDL. The system presented above, pre-trained on TüBa-D/Z, yields a CoNLL score of 37.04 on the DIRNDL test set with predicted annotations. One issue here is that the predicted

annotations of TüBa-D/Z and DIRNDL are not completely compatible. Hence, the learned features are not as effective as they are on TüBa-D/Z. This comparatively low score also confirms the assumption that the performance of a system trained on written text drops when it is applied to spoken text. The drop in performance can also be explained by the slightly different domains (newspaper text and radio news). However, the DIRNDL corpus is big enough to train a model on the concatenation of its training and development sets, which is why we decided to train a model based on DIRNDL. We first check whether we should use different markable types for DIRNDL.

Mention extraction for DIRNDL

As DIRNDL was annotated according to the RefLex guidelines (Baumann and Riester, 2012), it has different mentions than TüBa-D/Z, for example no possessive pronouns and no relative pronouns. The most important difference is that PPs are annotated instead of NPs. This is to include cases where the determiner and the preposition are merged into one word, such as in

(14) am Bahnhof = an dem Bahnhof (at the station)

To deal with this, we insert NPs into PPs, as described in Section 5.2.2. As can be seen in Table 5.10, the recall with the best performing markables is about 85.6%, which is slightly higher than the 78% achieved for TüBa-D/Z.

Tag          Description                        Recall
NT-NP        nominal phrases                    35.2
+T-PPER      personal pronouns                  40.8
+T-PPOSAT    attributive possessive pronouns    40.8
+T-PWS       interrogative pronouns             40.8
+T-PDS       demonstrative pronouns             42.7
+T-NE        named entities                     49.9
+T-PRF       reflexive pronouns                 55.5
+NT-PP       PPs                                75.4
+T-PROAV     pronominal adverbs                 78.7
+NT-CNP      coordinated NPs                    79.8
+T-ADV       adverbs                            80.3
+NT-PN       proper NPs                         82.0

Table 5.10.: Markable extraction for the DIRNDL corpus


In DIRNDL, abstract anaphors can have a VP or clausal antecedent, such as in Example (15), taken from the DIRNDL corpus. These cannot be captured by a system based on nominal antecedents.

(15) DE: Der niedrigen Geburtenrate sei durch mehr Krippenplätze nicht beizukommen, meinte der Kardinal. Dies belege die Situation in Ostdeutschland, wo das Betreuungsangebot besonders hoch sei, die Geburtenrate aber besonders niedrig. EN: You cannot overcome low birth rates with more places in day nurseries, said the cardinal. This is proven by the situation in East Germany ...

Another issue is that some NPs that have been annotated with coreference do not have a PP or NP label. This is due to errors in the automatic pre-processing and has to be accepted as part of the automatic setting.

Feature engineering We repeated our process of feature selection, as explained above, for DIRNDL. The result is a list of features that slightly deviates from the list of features used for TüBa-D/Z.

Performance on DIRNDL

As can be seen in Table 5.11, the system trained on DIRNDL achieves a CoNLL score of 46.11, which is comparable to the score reported on the predicted version of TüBa-D/Z v10 (48.61). As we will show in Section 7, it can be further improved by including prosodic features.

MUC      BCUBE    CEAFM    CEAFE    BLANC    CoNLL
44.93    45.13    50.94    48.27    35.14    46.11

Table 5.11.: Performance of IMS HotCoref DE on DIRNDL, using predicted annotations

5.3. Conclusion

As mentioned in the beginning of this section, there are many well-performing and openly available coreference resolvers for English. For German, there is the rule-based CorZu, as well as a number of mostly learning-based systems from the SemEval shared task


2010, whose performance on the benchmark dataset TüBa-D/Z is worse than that of CorZu. Most of these systems, for example SUCRE, are also not publicly available. Therefore, we have adapted the learning-based system IMS HotCoref, which at the time of the experiments achieved state-of-the-art results for English on the benchmark dataset OntoNotes, to German by integrating linguistic features designed to address specificities of German, such as gender agreement. In ablation experiments we have shown that computing all features based on lemmata rather than word forms had the biggest influence on the performance of the system. The adapted system achieves state-of-the-art results on TüBa-D/Z. We have also shown that the system works well when trained on other data, e.g. on the DIRNDL corpus, which is of a different domain than TüBa-D/Z (radio news instead of newspaper text). We have described the steps involved in using the system on unseen text and presented some of the parameters with which the system can be optimised. IMS HotCoref DE is used in one of our linguistic validation experiments, where we integrate prosodic information into coreference resolution. In the next chapter, we will continue with the creation of bridging resolution tools for English and German.


6. Bridging resolution

Research Question 3: Tool creation Are there openly available tools aiming at providing automatic annotations on unseen text? If not, can we create tool resources to fill the research gap?

An overview of work in the area of bridging resolution has already been presented in Section 3.2. Of all the previous approaches to bridging anaphora detection, bridging anaphora resolution or full bridging resolution, no system has been made publicly available. Openly available systems would be necessary, however, to assess the generalisability of the approaches, i.e. to check how well they work on data or domains other than the ones on which they were designed, without much reimplementation work. Open-source systems can also easily be extended, instead of implementing entirely new systems. In this chapter, we describe the reimplementation of the state-of-the-art system for full bridging resolution (Hou et al., 2014), which will serve as a basis to assess the tool's performance on other corpora and domains, including our newly created newspaper corpus BASHI and our scientific corpus SciCorp, as well as a shared task submission for the first shared task on bridging at CRAC 2018. The tool is openly available.1 Besides reimplementing this tool for English, we will also describe an adaptation to German. We are thus making a first step towards filling the research gap of non-existing openly available bridging tools for English and German. The contributions in this step (tool creation) are shown in Figure 6.1. Parts of this research have been published in Rösiger (2018b), Poesio et al. (2018) and Rösiger et al. (2018b).

6.1. A rule-based bridging system for English

This section describes the reimplementation and adaptation of the rule-based bridging resolver proposed by Hou et al. (2014).

1https://github.com/InaRoesiger/BridgingSystem



Figure 6.1.: Contribution and workflow pipeline for bridging: tool creation

As this system was never made publicly available, we think it is a valuable effort to reimplement it and provide it as a baseline that can then be further adapted to other domains or enriched with new knowledge sources, as we will do in the linguistic validation experiments step, where we use the system to assess how semantic relations can help bridging resolution. We describe the original system in the next section, together with details on where the reimplementation differs from the original. We also include additional experiments in which we compare the use of predicted and gold markables and investigate the effect of coreference information. We find that filtering out gold or even just predicted coreference anaphors before bridging resolution significantly improves bridging resolution.

Experimental setup The system was designed for the corpus ISNotes (Markert et al., 2012). Hou et al. (2014) split the corpus into a development set (10 documents) and a test set (40 documents). The rules were optimised on the development set and the performance of the system was reported on the test set. Unfortunately, the concrete development/test split is not specified. We report numbers for our own development/test split2 as well as for the whole corpus.

6.1.1. Reimplementation

While most work on bridging resolution has focused on one of the subtasks, i.e. either identifying an expression as a bridging anaphor or finding an antecedent for a given bridging anaphor, Hou et al. (2014) tackled the task of full bridging resolution, designing eight hand-crafted rules based on linguistic intuitions about bridging. Most of the rules are very specific, aiming at high precision, while two rules are designed to capture more general bridging cases, thus increasing recall. The reimplementation comprises all three components of the original paper: preprocessing, rule application and postprocessing. During preprocessing, markables are extracted, which are then passed on to the eight rules, which predict bridging anaphor-antecedent pairs. In the postprocessing step, the rules are applied in order of descending precision.

Preprocessing We extract NPs as our predicted markables. We also extract the markables of the information status annotation as our set of gold markables. These form the initial set of anaphors and antecedents. In the predicted setting, by extracting NPs only, we miss 13 out of the 663 gold anaphors and 79 out of the 663 antecedents. An analysis of the missing markables yielded the following missing candidates:

• Anaphors:
  – Constituents with the tag NML (embedded modifying noun phrases, left-branching), embedded in NPs:

(1) Crude oil prices have exploded during the last few weeks. The market ... (NP (NML (NN crude) (NN oil) ) (NNS prices) )

• Antecedents:
  – Pronouns: we, our, his, her, ...

2The 10 dev docs are: wsj1101, wsj1123, wsj1094, wsj1100, wsj1121, wsj1367, wsj1428, wsj1200, wsj1423, wsj1353.


  – Other POS tags: Anti-abortion (JJ), AIDS (no label); the reason for these annotations is that annotators were not limited to NPs or other pre-defined categories when determining the antecedent.
  – Verbal antecedents or clauses: we only focus on nominal antecedents.

As our system is designed for nominal antecedents only, we inevitably lose verbs, adjectives or clauses that are labelled as non-nominal antecedents. Other nominal antecedents, like those of the NML category, should, however, be extracted. Thus, NML is added as a markable. Now we find all but one of the 663 anaphors, but 74 antecedents are still not found. A few pronouns have been tagged as bridging antecedents, so we add them as potential antecedents. By adding personal and possessive pronouns as antecedent candidates, we can reduce the number of missed antecedents to 65. The remainder of the antecedents is non-nominal, i.e. either verbal, clausal or of another non-nominal category.
Certain NPs are removed from the list of potential anaphors in order not to suggest too many candidates, namely NPs which have a complex syntactic structure (i.e. that have embedded mentions) and NPs which contain comparative markers (this is due to the exclusion of comparative anaphora from the category bridging in the corpus ISNotes).3 Contrary to Hou et al. (2014), we filter out pronouns as anaphor candidates, as they should, in principle, always be coreference anaphors rather than bridging anaphors. We follow Hou et al. (2014)'s suggestion to exclude NPs whose head appeared before in the document, as these cases are typically involved in coreference chains. We also experiment with filtering out predicted and gold coreference anaphors before applying the rules.
After filtering out mentions that have embedded mentions (complex NPs) and NPs with clear comparative markers, 92 of the 663 anaphors are no longer available as candidates. After additionally filtering out definite NPs that have the same head as a previous NP, 128 of the 663 gold anaphors are no longer included in the list of candidates. To sum up, after the filtering step we have lost 65 of the antecedents and 128 of the anaphors. This means that with the current filtering strategy the best possible recall is around 70%.
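The anaphor-candidate filtering described above can be summarised in the following sketch. The NP dataclass, its attribute names and the comparative-marker list (taken from footnote 3) are simplifying assumptions about the data structures, not the exact implementation.

```python
from dataclasses import dataclass
from typing import List

COMPARATIVE_MARKERS = {"similar", "another", "such", "other", "related",
                       "different", "additional", "comparable", "same",
                       "further", "extra"}

@dataclass
class NP:                         # simplified markable representation (assumption)
    tokens: List[str]
    head_lemma: str
    is_pronoun: bool = False
    is_definite: bool = False
    has_embedded_mentions: bool = False

def filter_anaphor_candidates(nps):
    """Drop pronouns, complex NPs, comparative anaphora and definite NPs
    whose head lemma was already seen earlier in the document."""
    candidates, seen_heads = [], set()
    for np in nps:                                  # NPs in document order
        keep = (not np.is_pronoun
                and not np.has_embedded_mentions
                and not any(t.lower() in COMPARATIVE_MARKERS for t in np.tokens)
                and not (np.is_definite and np.head_lemma in seen_heads))
        if keep:
            candidates.append(np)
        seen_heads.add(np.head_lemma)
    return candidates
```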

Rules

Each rule is applied separately to the list of extracted markables and proposes pairs of bridging anaphors and antecedents. Table 6.1 gives an overview of the rules implemented.

3 The list taken from Hou et al. (2014) is: similar, another, such, other, related, different, additional, comparable, same, further, extra.


Rule   Example                                Anaphor                      Antecedent search                   Window
1      A white woman's house ← The basement   building part                semantic connectivity               2
2      She ← Husband David Miller             relative                     closest person NP                   2
3      The UK ← The prime minister            GPE job title                most frequent GEO entity            -
4      IBM ← Chairman Baker                   professional role            most frequent ORG NP                4
5      The firms ← Seventeen percent          percentage expression        modifying expression                2
6      Several problems ← One                 number/indefinite pronoun    closest plural subject/object NP    2
7      Damaged buildings ← Residents          head of modification         modifying expression                -
8      A conference ← Participants            arg-taking noun, subj pos.   semantic connectivity               2

Table 6.1.: Overview of rules in Hou et al. (2014)

Each rule can have its own parameters, for example the allowed distance between the anaphor and the antecedent. Two measures are computed independently of the actual bridging resolver and are needed as input for several rules: the semantic connectivity and the argument-taking ratio.

Computing the semantic connectivity The semantic connectivity goes back to the “NP of NP” pattern in Poesio et al. (2004) and was extended to a more general preposition pattern in Hou et al. (2014). The semantic connectivity between two words can be approximated by the number of times the two words occur in a “noun (N) preposition (PREP) noun” pattern in a big corpus. This means that two nouns like window and room have a high semantic connectivity because they often occur in patterns such as windows in the room, whereas other noun pairs do not appear often in such a construction and are therefore not highly semantically connected. The Dunning root log-likelihood ratio (Dunning, 1993) is computed as a measure of the strength of association. To compute the measure, we need to calculate the counts shown in Table 6.2. For an example computation and more details, please refer to Hou (2016b). In contrast to Hou et al. (2014), we do not limit prepositional patterns to the three most common prepositions for a noun but count every N PREP N pattern. Also, we allow for optional adjectives and determiners in the N PREP N pattern.


             Noun 1   Not noun 1   Total
Noun 2       a        b            a+b
Not noun 2   c        d            c+d
Total        a+c      b+d

Table 6.2.: Contingency table for the Noun1 + preposition + Noun2 pattern

Following Hou et al. (2014), we take the GigaWord corpus (Parker et al., 2011), a large corpus of 1,200M tokens, as the basis for the computation of the scores. The result is a list of noun pairs and their respective connectivity scores, in a tabular text format. The scores are not normalised (to values between 0 and 1) because we only use them to find the pair with the highest score, not a relative score or threshold.

Noun pair             Score
wife - husband         28.6
husband - wife         30.7
husband - daughter     14.1
husband - carpet        2.8
husband - Wednesday   -10.3

Table 6.3.: Exemplary semantic connectivity scores
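For concreteness, the following sketch shows how a signed root log-likelihood ratio can be computed from the contingency counts a, b, c and d of Table 6.2. The formulation via entropy terms is mathematically equivalent to Dunning's G² statistic; the exact counting and smoothing details of our implementation may differ.

```python
import math

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # unnormalised entropy term used in the G^2 computation
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def root_log_likelihood(a, b, c, d):
    """Signed root of Dunning's log-likelihood ratio for the counts of
    Table 6.2: a = noun1 with noun2, b = noun2 without noun1,
    c = noun1 without noun2, d = neither."""
    g2 = 2.0 * (entropy(a + b, c + d)      # row sums
                + entropy(a + c, b + d)    # column sums
                - entropy(a, b, c, d))     # full table
    # negative sign if the pair co-occurs less often than expected
    sign = 1.0 if a * (c + d) >= c * (a + b) else -1.0
    return sign * math.sqrt(max(g2, 0.0))
```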

One problem with the Wall Street Journal corpus is that it is not lemmatised. Some nouns are mapped onto gold senses, and those are always lemmatised, which means that we can copy the lemmatisation from these annotations. For all other nouns, this is not available. Our solution is to use the sense mappings where possible (e.g. child for children). For nouns which do not have a sense mapping, we save all word forms and their lemmatisations as they were tagged in the GigaWord corpus. We also use these word form - lemma pairs when applying the rules.

(2) children → child

(3) husband’s → husband

If we do not find a mapping and it is not contained in our scores, we use a simple approximation for default pluralisation: we add or remove an “s” to/from the word to see whether scores exist for this slightly modified form.
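The lookup with the lemma mapping and the add/remove-“s” fallback can be sketched as follows; the dictionary-based representation of the score list and the lemma map is an assumption made for illustration.

```python
def lookup_connectivity(noun1, noun2, scores, lemma_map):
    """Look up a semantic connectivity score, falling back to simple
    default pluralisation (adding or removing a final 's')."""
    def variants(word):
        lemma = lemma_map.get(word, word)
        plural = lemma[:-1] if lemma.endswith("s") else lemma + "s"
        return [lemma, plural]

    for n1 in variants(noun1):
        for n2 in variants(noun2):
            if (n1, n2) in scores:
                return scores[(n1, n2)]
    return None  # no score available for this pair
```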


Computing the argument-taking ratio The argument-taking ratio of a mention's head reflects how likely a noun is to take arguments (Hou et al., 2014). This can be used for bridging resolution, as we assume that a bridging anaphor lacks an implicit argument, which is provided by the antecedent. If a noun has a low argument-taking ratio, the likelihood of an expression headed by it being a bridging anaphor is also low. For example, the lemma child is often used without arguments, when we are generically speaking about children. Brainchild, however, seems to be an expression that is exclusively used with an argument, e.g. in the brainchild of . . . .

Noun         Score
child        0.21
childhood    0.83
brainchild   1
husband      0.9

Table 6.4.: Exemplary argument-taking ratios

The argument-taking ratio is calculated by dividing the head's frequency in the NomBank annotation by the head's total frequency in the WSJ corpus. The argument-taking scores thus lie between 0 and 1. Again, we apply the techniques described above to deal with lemmatisation.
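A minimal sketch of this computation, assuming the NomBank and WSJ head frequencies are available as Counter objects keyed by head lemma (an assumption about the data layout, not the exact implementation):

```python
from collections import Counter

def argument_taking_ratio(head_lemma, nombank_heads: Counter, wsj_heads: Counter):
    """Frequency of the head in the NomBank annotation divided by its
    total frequency in the WSJ corpus, capped at 1.0."""
    total = wsj_heads[head_lemma]
    if total == 0:
        return 0.0
    return min(nombank_heads[head_lemma] / total, 1.0)
```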

Rule 1: building part NPs Rule 1, called building part NPs, is designed to capture cases of meronymy that have to do with buildings, as in the following example.

(4) At age eight, Josephine Baker was sent by her mother to a white woman's house to do chores in exchange for meals and a place to sleep, a place in the basement with coal.

For this, a list of 45 nouns which specify building parts (e.g. windows, basement) is taken from the General Inquirer lexicon (Stone et al., 1966). For an anaphor to be added to the list of bridging anaphors proposed by this rule, the head form has to be on the building list and may not contain any nominal pre-modification. Then for each potential anaphor, the NP with the strongest semantic connectivity is chosen as the antecedent within the same sentence and the previous two sentences. We additionally exclude NPs containing a PP, as in Example (5), and exclude NPs in the idiom leaves room for as they are metaphorical uses that do not have to do with actual building parts.


(5) the windows in the room
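To illustrate how the rules are implemented, the following sketch of Rule 1 links building-part anaphors to the preceding mention with the strongest semantic connectivity. The Mention dataclass, its attributes and the excerpt of the building-part list are illustrative assumptions; the real system operates on the full 45-noun list and richer markable information.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Mention:                      # simplified mention representation (assumption)
    sent: int                       # sentence index
    start: int                      # token offset of the first word
    head_lemma: str
    has_nominal_premod: bool = False
    contains_pp: bool = False

# excerpt of the 45-noun building-part list from the General Inquirer lexicon
BUILDING_PARTS = {"basement", "window", "door", "roof", "kitchen", "wall"}

def rule1_building_parts(mentions: List[Mention],
                         connectivity: Callable[[str, str], float],
                         window: int = 2) -> List[Tuple[Mention, Mention]]:
    """Sketch of Rule 1: link building-part anaphors to the preceding NP
    with the strongest semantic connectivity within a two-sentence window."""
    pairs = []
    for i, ana in enumerate(mentions):              # mentions in document order
        if (ana.head_lemma not in BUILDING_PARTS
                or ana.has_nominal_premod or ana.contains_pp):
            continue
        candidates = [m for m in mentions[:i] if 0 <= ana.sent - m.sent <= window]
        if candidates:
            ante = max(candidates,
                       key=lambda m: connectivity(m.head_lemma, ana.head_lemma))
            pairs.append((ana, ante))
    return pairs
```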

Rule 2: relative person NP Rule 2 is meant to capture bridging relations between a relative (husband) and its antecedent (she, the wife). For this, a list of 110 nouns which denote relatives, e.g. husband, cousin or granddaughter, is extracted from WordNet. One issue is that some of these nouns are often used generically (e.g. children). To overcome this, the argument-taking ratio, a measure for the likelihood of a noun to take an argument, is computed. According to Hou et al. (2014), for an anaphor to be added to the list of bridging anaphors, the anaphor's head must appear on the relative person list, the argument-taking ratio of its head must be bigger than 0.5, and the NP must not contain nominal or adjectival premodification. As the antecedent, the closest non-relative person NP among all mentions preceding the anaphor in the same sentence or in the previous two sentences is chosen.

(6) She ... Husband David Miller

In our reimplementation, we first created a relative list by listing all sorts of relatives that came to our mind. The list contains 102 entries. The anaphor must have an argument-taking ratio larger than 0.5, must not be modified by an adjective or a noun and must not contain an embedded PP or be followed by a PP. As the antecedent, we choose the closest proper name that is tagged as a person (PER) and not as an organisation (does not have ORG in the named entity column), or a personal pronoun, except those with the lemma they or you.

Rule 3: GPE job title NPs This rule aims at job titles that revolve around a geopolitical entity. Hou et al. (2014) state that “in news articles, it is common that a globally salient geopolitical entity (hence GPE, e.g., Japan or the U.S.) is introduced in the beginning, then later a related job title NP (e.g., officials or the prime minister) is used directly without referring to this GPE explicitly”.

(7) USA ... the president

Hou et al. (2014) set up a list of 12 job titles (president, governor, minister, etc.). The anaphor is added to the list of potential anaphors if it does not contain a country adjective such as US. As the antecedent, the most frequent GPE is chosen. In case of a tie, the closest NP is chosen.


We take the job list from Hou (2016b)4, but leave out president because in most cases, it is a central notion in the text and typically present in a coreference chain and thus not a bridging anaphor. Hou et al. (2014) stated that the anaphor must not contain a country adjective (e.g. the German president). We additionally remove mentions containing an embedded PP or followed by a PP, or an organisation (ORG in the named entity column). The antecedent is chosen to be the geopolitical entity with the highest frequency in the document.

Rule 4: role NPs While Rule 3 is designed to capture rather specific cases of bridging revolving around GPEs, Rule 4 aims at finding more general cases of bridging, where the job titles are not restricted to GPEs but extend to all organisations. For this, a list of 100 nouns which specify professional roles (chairman, president, professor) is extracted from WordNet. For a mention to be considered a potential anaphor candidate, its head must be on the role list. The most salient proper name NP which stands for an organisation is chosen as the antecedent. Most salient here means most frequent in the document before the anaphor. In case of a tie, the closest NP is chosen.

(8) IBM ... Chairman Baker

Our list of professional job roles (e.g. doctor, CEO, chairman, employee, etc.) contains 132 nouns. The head word of the anaphor must be on this list and the NP must not contain a country adjective, a PP, a proper name or an indefinite article. We choose the most frequent organisation within the same sentence or the previous two sentences as the antecedent.

Rule 5: percentage NPs Rule 5 is a rather specific rule, designed to address percentage expressions. If the anaphor is a percentage expression, the antecedent is predicted to be the closest NP which modifies another percentage NP via the preposition of, among all mentions occurring in the same or up to two sentences prior.

(9) 22% of the firms said employees or owners had been robbed on their way to or from work. Seventeen percent reported their customers being robbed.

4president, official, minister, governor, senator, mayor, chancellor, ambassador, autocrat, premier, commissioner, dictator, secretary


In our version, the head form of the anaphor must be either percent or %, must not itself be modified by the preposition of, must not be at the end of the sentence and must be in subject position. As we do not have grammatical roles in our version of the corpus, we use the approximation that a subject is followed by a verb. The antecedent must modify a percentage expression with the preposition of and must be in the same or in the previous two sentences. We choose the closest NP that matches these criteria.
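The subject approximation mentioned above can be written down in a few lines; representing a sentence as a list of POS tags is an assumption made for illustration.

```python
def is_subject_approx(np_end_index, pos_tags):
    """An NP is treated as a subject if the token immediately following
    it is tagged as a verb (POS tag starting with 'VB')."""
    nxt = np_end_index + 1
    return nxt < len(pos_tags) and pos_tags[nxt].startswith("VB")
```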

Rule 6: other set members Rule 6 aims at finding bridging pairs that are labelled as set bridging in ISNotes. The idea behind this rule is that numbers and indefinite pronouns are good indicators for bridging anaphors (if they are contained in the corpus, of course, which is not the case for all corpora). In order for an NP to be considered a bridging anaphor candidate, it must be a number expression (e.g. one) or an indefinite pronoun (some) and in subject position. The antecedent is chosen to be the closest NP among all plural subject mentions preceding the potential anaphor. If no such mention exists, object mentions are checked.

(10) This creates several problems. One is that there are not enough police to satisfy small businesses.

(11) Reds and yellows went about their business with a kind of measured grimness. Some frantically dumped belongings into pillowcases.

We have compiled a list of indefinite pronouns and number expressions5. The anaphor must be on this list, and in subject position. We also define a number of unsuited verbal expressions6 as these typically occur in contexts where the subject is used generically, e.g. in Example (12).

(12) One has to wonder ...

The antecedent is predicted to be the closest subject NP (again, we use our approximation) in the same sentence as well as in the previous two sentences. If we do not find one in subject position, we look for the closest object NP (defined as following a verb).

Rule 7: argument-taking NPs Rule 7 is a more general rule to find bridging pairs and is based on an observation in Laparra and Rigau (2013) who found that different instances of the same predicate in a document likely maintain the same argument fillers.

5 one, some, none, many, most, two, three, four, five, ten, dozen, hundred, million, first, second, third
6 feel, claim, fear, see, think, proclaim, may, might, argue


A common NP is considered an anaphor if its argument-taking ratio is greater than 0.5, if it has no nominal or adjectival premodification and if it does not come with a determiner. The antecedent is chosen as follows: we collect syntactic modifications and arguments (nominal premodification, possessive as well as PP modification or PP arguments) for the anaphor's head lemma form. All realisations are potential antecedent candidates. As the antecedent, we choose the most recent NP from the candidate list.

(13) Out on the streets, some residents of badly damaged buildings were allowed a 15 minute scavenger hunt through their possessions. ... After being inspected, buildings with substantial damage were color-coded. Green allowed residents to re-enter; red allowed residents one last entry to gather everything they could within 15 minutes.

We search for common NPs by extracting all anaphors containing the POS tag “NN” or “NNS”. In our reimplementation, the anaphor must not be modified by any noun or adjective and must not contain an embedded PP or be followed by a PP. The antecedent must be in the same sentence or in the two previous sentences and must be the closest similar modification or argument found in the document, as described above.

Rule 8: argument-taking NPs II Rule 8 is even more general than Rule 7, in that it does not only search for similar contexts in the document but generally looks for semantically related words. It uses the concepts of argument-taking and semantic connectivity to determine semantically related words. The argument-taking ratio of an anaphor must be greater than 0.5, the anaphor cannot have nominal or adjectival premodification and it must be in subject position. As the antecedent, the mention with the highest semantic connectivity is chosen.

(14) Initial steps were taken at Poland's first international environmental conference which I attended last month. [...] While Polish data have been freely available since 1980, it was no accident that participants urged the free flow of information.

We additionally exclude as anaphors mentions that are bare singulars, that contain some, a demonstrative pronoun or a negation, or whose head is on the relative list (cf. Rule 2).


Post-processing Each rule proposes a number of bridging pairs, independently of the decisions of the other rules. We order the rules according to their precision. In case of conflicts, the pair proposed by the rule with the higher precision is chosen.
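A minimal sketch of this post-processing step, under the assumption that a conflict means that two rules propose an antecedent for the same anaphor and that each rule comes with a precision estimated on the development set:

```python
def combine_rule_predictions(rule_outputs):
    """rule_outputs: list of (precision, pairs) tuples, one per rule,
    where pairs is a list of (anaphor, antecedent) proposals.
    Higher-precision rules win in case of conflicting proposals."""
    resolved = {}
    for precision, pairs in sorted(rule_outputs, key=lambda x: -x[0]):
        for anaphor, antecedent in pairs:
            resolved.setdefault(anaphor, antecedent)  # keep the most precise proposal
    return resolved
```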

6.1.2. Performance

In this section, we compare the performance of the original system with our reimplementation.

Rule performance

Table 6.5 shows the performance of the individual rules. The numbers in brackets are taken from Hou (2016b). The precision with respect to the anaphor tells us how many of the proposed bridging pairs contain gold bridging anaphors. The precision with respect to the pair tells us how many of the pairs, i.e. both anaphor and antecedent, are correct gold pairs. Of course, the precision of the pair can never be higher than the precision of the anaphor; if we chose the right antecedent in all cases, the two precisions would be the same. The firing rate tells us how often the rule was applied. The number in brackets in the column Rule gives the respective rank of the rule when ordering the rules according to their precision. As can be seen in the table, the numbers of Hou (2016b) are not always the same as ours: we achieve a higher performance for some of the rules and a lower performance for others. On average, however, the performance is comparable.

Rule                                  P of anaphor     P of pair        Firing rate
Rule 1 [2] building part NPs          63.6% (75.0)     54.5% (50.0)      9.2% (6.1)
Rule 2 [5] relative person NPs        55.5% (69.2)     44.4% (46.2)      7.5% (6.1)
Rule 3 [6] GPE job title NPs          76.2% (52.6)     61.9% (44.7)     17.5% (19.4)
Rule 4 [7] role NPs                   77.7% (61.7)     59.3% (32.1)     22.5% (28.6)
Rule 5 [1] percentage NPs            100.0% (100.0)   100.0% (100.0)     4.2% (2.6)
Rule 6 [3] other set member NPs       71.4% (66.7)     50.0% (46.7)     11.7% (7.8)
Rule 7 [4] argument-taking NPs I      72.7% (53.8)     54.5% (46.4)      9.2% (6.1)
Rule 8 [8] argument-taking NPs II     63.6% (64.5)     36.3% (25.0)     18.3% (25.5)

Table 6.5.: A bridging system for English: performance of the individual rules, their precision as well as their firing rate
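The two precision figures reported in Table 6.5 can be computed as in the following sketch, where anaphors and antecedents are assumed to be identified by their spans:

```python
def rule_precision(predicted_pairs, gold_pairs):
    """Precision with respect to the anaphor alone and with respect to
    the full (anaphor, antecedent) pair."""
    if not predicted_pairs:
        return 0.0, 0.0
    gold_anaphors = {anaphor for anaphor, _ in gold_pairs}
    gold_pair_set = set(gold_pairs)
    p_anaphor = sum(ana in gold_anaphors for ana, _ in predicted_pairs) / len(predicted_pairs)
    p_pair = sum(pair in gold_pair_set for pair in predicted_pairs) / len(predicted_pairs)
    return p_anaphor, p_pair
```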


Overall performance

Hou et al. (2014) report a precision of 61.7%, a recall of 18.3% and an F1 score of 28.2% for anaphor detection, and a precision of 42.9%, a recall of 11.9% and an F1 score of 18.6% for full bridging resolution. In both settings, they use gold markables but no coreference information. Table 6.6 contains the scores of the reimplementation for the test set and the whole corpus when using gold or predicted markables. As mentioned above, we have defined a different development/test split, which is why the results are not directly comparable. In general, however, we think that our reimplementation achieves comparable results, as our rules also achieve similar precision values and firing rates as in Hou (2016b). As we have simply reimplemented the system from the original paper without any hand-tuning on the development set, we also report the numbers on the whole ISNotes corpus. Here, our reimplementation yields 65.9% precision, 14.1% recall and 23.2% F1 score for the task of anaphor recognition, and a precision of 49.6%, a recall of 10.6% and an F1 score of 17.4% for full bridging resolution. Compared to the original numbers in Hou et al. (2014), we achieve higher precision but lower recall, resulting in an overall slightly lower F1 measure. Note that we do not carry out significance tests here, as the experiments were not performed on the same datasets.

Setting                                     Corpus         Anaphor recognition     Full bridging
                                                           P     R     F1          P     R     F1
Hou (2014), gold mark.                      test set       61.7  18.3  28.2        42.9  11.9  18.6
Reimplementation with gold markables        test set       73.4  12.6  21.6        60.6  10.4  17.8
                                            whole corpus   65.9  14.1  23.2        49.6  10.6  17.4
Reimplementation with predicted markables   test set       69.3  12.2  20.7        57.7  10.1  17.2
                                            whole corpus   65.2  13.6  22.5        49.2  10.3  17.0
Filtering out coreferent anaphors, with gold markables
No coreference                              whole corpus   65.9  14.1  23.2        49.6  10.6  17.4
Predicted coreference                       whole corpus   79.6  14.1  23.9        59.8  10.6  18.0
Gold coreference                            whole corpus   79.6  14.1  23.9        59.8  10.6  18.0

Table 6.6.: Performance of the reimplementation of Hou et al. (2014), with different settings


Coreference information

As bridging anaphors are difficult to distinguish from coreference anaphors, we think it may be beneficial for the precision of our system to filter out coreference anaphors before applying the bridging system. We experiment with three settings: (i) no coreference information, (ii) predicted coreference information and (iii) gold annotated coreference information. For predicted coreference, we applied the IMS HotCoref system (Björkelund and Kuhn, 2014) with its default settings on the ISNotes corpus.7 We report the change in performance on the whole corpus, as there was no optimisation involved in the filtering of the coreference anaphors. In Table 6.6, it can be seen that both predicted and gold coreference significantly improve the precision of the system.8 Surprisingly, there is no difference between gold and predicted coreference. The same effect can be observed with predicted mentions. We also experimented with coreference information in the final bridging system (as described in Chapter 8), where the observed effect is much stronger.
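The coreference filter can be sketched as follows, assuming coreference chains are given as lists of mention spans in document order; only non-initial chain members are removed from the bridging-anaphor candidates.

```python
def remove_coreference_anaphors(candidate_spans, coref_chains):
    """Drop candidates that are anaphoric mentions of a coreference chain
    (every chain mention except the first one)."""
    coref_anaphors = {span
                      for chain in coref_chains
                      for span in chain[1:]}
    return [span for span in candidate_spans if span not in coref_anaphors]
```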

Setting           Precision   Recall   F1
No coref          49.6        10.6     17.4
Predicted coref   59.8        10.6     18.0
Gold coref        59.8        10.6     18.0

Table 6.7.: Performance of the bridging system with different coreference information, gold mention setting

Setting           Precision   Recall   F1
No coref          49.2        10.3     17.0
Predicted coref   55.1        10.3     17.3
Gold coref        55.1        10.3     17.3

Table 6.8.: Performance of the bridging system with different coreference information, predicted mention setting

7 We made sure to exclude the ISNotes part of OntoNotes from the training data for the coreference system, of course.
8 Again, we use the Wilcoxon signed rank test to compute significance, at the p=0.01 level. In this case, all comparisons were significant, which is why they are not marked. Boldface indicates the overall best results.


Corpus                 Domain        Anaphor recognition       Full bridging
                                     Prec.   Recall   F1       Prec.   Recall   F1
ISNotes (gold mark.)   news          65.9    14.1     23.2     49.6    10.6     17.4
ISNotes (pred mark.)   news          65.2    13.6     22.5     49.2    10.3     17.0
BASHI (pred mark.)     news          49.4    20.2     28.7     24.3    10.0     14.1
SciCorp (pred mark.)   scientific    17.7     0.9      8.1      3.2     0.9      1.5

Table 6.9.: Performance of the rule-based method on other corpora. We use predicted mentions for BASHI and SciCorp as they do not contain gold markables.

6.1.3. Generalisability of the approach

Recent work on bridging resolution has so far been based on the corpus ISNotes (Markert et al., 2012), as this was the only corpus available with unrestricted bridging annotation. Hou et al. (2014)’s rule-based system currently achieves state-of-the-art performance on this corpus, as learning-based approaches suffer from the lack of available training data. To test the generalisability of the approach by Hou et al. (2014), we apply our reimplementation to the newly annotated corpora (as presented in Section 4.3).

Experimental setup

BASHI BASHI9 is a newspaper corpus that we annotated with bridging links according to guidelines compatible with those of the ISNotes corpus. The corpus can be used to assess the generalisability on in-domain corpora, as ISNotes and BASHI are of the same domain. As we simply apply our systems to this data, we report performance on the whole corpus.

SciCorp SciCorp10 is a corpus of a different domain, scientific text, that can be used to assess how well the system generalises to a completely different domain. Again, we report numbers on the whole corpus.

BASHI (in-domain) results

We first apply our reimplementation to a corpus of the exact same domain as ISNotes, BASHI. As can be seen in Table 6.9, the F1 score for anaphor recognition is 28.7, which is comparable with the score on ISNotes, although we observe a much lower precision on

9 http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/bashi.html
10 http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/scicorp.html


BASHI. Lower precision is also the reason for the overall lower score on BASHI for full bridging resolution, which means that the performance for anaphor detection is about the same, while the performance of finding the correct antecedent is worse. Still, the system performs relatively well on this data.

SciCorp (out-of-domain) results

SciCorp is an out-of-domain corpus. When applying our system, we observe that it does not generalise well to completely different domains, as the F1 score for full bridging resolution drops to 1.46. SciCorp differs from BASHI and ISNotes with respect to the definiteness criterion: all bridging anaphors are definite. Of course, rules designed for indefinite anaphors cannot work here. While we expected some of the rules designed for news text to perform poorly (e.g. building parts, relatives, job titles, etc.), the rules designed to find more general cases of bridging also do not seem to predict a lot of pairs in this domain. The reason for this might lie in the coverage of the semantic connectivity and argument-taking ratio resources, which are applied in these general rules: only 32% of the nouns in SciCorp are represented in the argument-taking-ratio lists, and only 3.9% of the noun pairs are contained in the semantic connectivity scores. Adding in-domain text (e.g. large PubMed/ACL corpora) to the general corpora used to create these resources would be necessary for the general rules of the system to work. We are confident that some form of domain adaptation, i.e. designing specific rules for scientific text and combining them with the improved general rules, would lead to better results.

6.2. CRAC 2018: first shared task on bridging resolution

The workshop Computational models of Reference, Anaphora and Coreference (CRAC) 2018 featured a shared task on bridging resolution, based on the ARRAU dataset. This is another opportunity to test the generalisability of our reimplementation, so we also apply our system to this dataset. As these experiments involved a shared task submission, we will provide a more detailed analysis for the ARRAU corpus.


6.2.1. The ARRAU corpus

The second release of the ARRAU corpus, first published in Poesio and Artstein (2008), was used as the data basis for the shared task. It is a multi-domain corpus that aims at “providing much needed data for the next generation of coreference/anaphora resolution systems” (Uryupina et al., 2018). The current version of the dataset contains 350k tokens and 5,512 bridging anaphors. The shared task data comprises text from three domains: RST (newspaper), TRAINS (dialogues) and the PEAR stories (narrative text). Following earlier attempts on the reliable annotation of bridging (Poesio, 2004), where it became evident that better annotation quality could be achieved by limiting the annotation to the three relations subset, element and poss, most of the bridging relations in ARRAU are of these types, as shown in Table 6.11. Additionally, comparative anaphora are included and marked as other, and bridging cases which do not fit the pre-defined relations, but are obvious cases of bridging, are marked with the relation undersp-rel. The newest release of the ARRAU corpus (Uryupina et al., 2018) was used as data for the first shared task on bridging at CRAC 2018. The data was obtained from the LDC and consists of training, development and test sets for the three domains newspaper, narrative text and dialogue, with most of the text being news text. As the number of bridging anaphors in the narrative and dialogue part is quite small, the shared task focused on the RST (news) domain, but we also give numbers for the other domains.

Domain          Number of bridging anaphors
RST             3777
TRAINS           710
PEAR stories     333
Total           5512

Table 6.10.: Number of bridging anaphors in the single domains of the ARRAU corpus

6.2.2. Data preparation

Relation           Number of bridging relations
Element            1126
Subset             1092
Underspecified      588
Subset-inv          368
Other               332
Element-inverse     152
Poss                 87
Poss-inverse         25
Other-inverse         7

Table 6.11.: Bridging relations in ARRAU

The ARRAU corpus was published in the MMAX format, an XML-based format with different annotation layers. We converted the data into our own CoNLL-12-style format and used the following annotation layers to extract information: the word level, to obtain the words, document names and word numbers; the sentence level, to obtain sentence numbers; the part-of-speech level, to extract POS tags; and the phrase level, to extract bridging anaphors, their antecedents, the bridging relation and coreference information, as well as the following attributes of the markables: gender, number, person, category, genericity, grammatical function and head word. The format is shown in Table 6.12, which illustrates the annotation of bridging anaphors: they are numbered and specify the sentence number as well as the start and end word numbers of their antecedents. For example, bridging anaphor number 1 (plans that give advertisers discounts for maintaining or increasing ad spending) has an antecedent which can be found in sentence 1, words 23-28. The markables are also shown; they come with a number of attributes given at the start of the markable. Due to lack of space, we only show the attribute “genericity” in the table.

S  W   Word         Pos   Coref          Bridging              Markable             Genericity
3  1   Plans        nns   (23            (bridging$1$1-23-28   (m$18                18$gen-no
3  2   that         wdt   -              -                     -                    -
3  3   give         vbp   -              -                     -                    -
3  4   advertisers  nns   (4)            -                     (m$19)               19$gen-yes
3  5   discounts    nns   (24            -                     (m$20                20$gen-no
3  6   for          in    -              -                     -                    -
3  7   maintaining  vbg   -              -                     -                    -
3  8   or           cc    -              -                     -                    -
3  9   increasing   vbg   -              -                     -                    -
3  10  ad           nn    (25|(3)        -                     (m$21)|(m$22         21$gen-yes
3  11  spending     nn    23)|24)|25)    bridging$1)           m$18)|m$20)|m$22)    |22$gen-no
3  12  have         vbp   -              -                     -                    -
3  13  become       vbn   -              -                     -                    -

Table 6.12.: The CoNLL-12-style format used in our bridging experiments

A couple of special cases of bridging annotations came up during the preparation of the data.

126 6.2. CRAC 2018: first shared task on bridging resolution

Domain    Number of bridging anaphors
          Train/dev   Test   Total
RST       2715        588    3303
TRAINS     419        139     558
PEAR       175        128     303

Table 6.13.: Number of bridging anaphors in the shared task after filtering out problematic cases

• Multiple antecedents: our data structure only allows one antecedent per anaphor, which is why we cannot handle cases of one anaphor having multiple antecedents.

• Discontinuous markables:

(15) those in Europe or Asia seeking foreign stock-exchange.

In this example, the anaphor those in Europe seeking foreign stock-exchange was marked as a subset bridging case, with customers seeking foreign stock-exchange as its antecedent. As mentioned above in the paragraph on the evaluation of bridging, it is controversial whether annotating parts of NPs as markables is a good annotation strategy. In the ARRAU corpus, discontinuous anaphors and antecedents were allowed. Unfortunately, our system cannot handle discontinuous markables, as it takes NPs as its basic markables.

• Bridging antecedents spanning more than one sentence: as our markable extraction module is based on extracting certain constituency categories, we cannot handle markables spanning more than one sentence.

• Empty antecedents: some bridging anaphors do not have an annotated antecedent.

After filtering out these cases, the corpus statistics changed; the updated numbers are given in Table 6.13.

6.2.3. Evaluation scenarios and metrics

We report the performance of our systems for four different tasks.


Full bridging resolution This task is about finding bridging anaphors and linking them to an antecedent. Gold bridging anaphors are not given. We use gold markables.

Bridging anaphora resolution (all) This subtask is about finding antecedents for given bridging anaphors. In this setting, we predict an antecedent for every anaphor. This is the official task of the bridging shared task.

Bridging anaphora resolution (partial) This subtask is about finding antecedents for given bridging anaphors, but in this case, we only predict an antecedent if we are relatively sure that this is a bridging pair. This means that we miss a number of bridging pairs, but the precision for the predicted pairs is much higher.

Bridging anaphora detection This subtask is about recognising bridging anaphors (without linking them to an antecedent), again using gold markables.

Data splits We design rules and optimise parameters on the training/development sets of the RST domain, and report performance on the test sets.

6.2.4. Applying the rule-based system to ARRAU

When applying our reimplementation to the complete RST dataset, the performance drops to an F1 score of 0.3 for the task of full bridging resolution, although both datasets are of the same domain (WSJ articles). We carefully analysed the reasons for the huge difference in performance between ISNotes/BASHI and ARRAU: since all of them contain Wall Street Journal articles, the difference cannot be explained by domain effects. To do so, we started with an analysis of the rules and their predicted bridging pairs. Table 6.14 shows the rules and their performance on the RST dataset. Before discussing the difference between the annotations in ISNotes and ARRAU in the next section, we give examples of some of the pairs proposed by the respective rules. We also state whether the example was considered wrong or correct according to the ARRAU gold annotations, which do not always reflect our opinion, as we will soon see.


                              Anaphor recognition           Bridging resolution
Rule                          Correct pairs  Wrong pairs    Correct pairs  Wrong pairs
Rule 1: Building parts         2              28             1              29
Rule 2: Relatives              1              26             0              27
Rule 3: GPE jobs               0              30             0              30
Rule 4: Professional roles    10             251             1             260
Rule 5: Percentage NPs         6               3             5               4
Rule 6: Set members            8               4             4               8
Rule 7: Arg-taking I           3              38             0              41
Rule 8: Arg-taking II         14             163             4             173

Table 6.14.: Applying Hou et al. (2014) on the RST part of the ARRAU corpus: rule performance

Rule 1: building parts

(16) Once inside, she spends nearly four hours measuring and diagramming each room in the 80-year-old house [...] She snaps photos of the buckled floors ... (correct)

(17) And now Kellogg is indefinitely suspending work on what was to be a 1 billion cereal plant. The company said it was delaying construction ... (wrong)

Rule 2: relatives

(18) I heard from friends that state farms are subsidized ... (wrong)

Rule 3: GPE jobs

(19) The fact that New England proposed lower rate increases [...] complicated negotiations with state officials (wrong)

It is probably controversial whether state officials should be annotated as bridging, as it can also be a generic reference to the class. However, in this case, it is neither annotated as generic nor as bridging.


Rule 4: professional roles

(20) Meanwhile the National Association of Purchasing Management said its latest survey indicated [...] . The purchasing managers, however, also said that orders turned up in October ... (correct)

(21) A series of explosions tore through the huge Phillips Petroleum Co.[pred] plastics plant near here[gold], injuring more than a hundred and [...]. There were no immediate reports of deaths, but officials said a number of workers ... (different antecedent/antecedent overlap)

Rule 5: percentage expressions

(22) Only 19% of the purchasing managers reported better export orders [...]. And 8% said export orders were down ... (correct)

Rule 6: set members

(23) Back in 1964, the FBI had five black agents. Three were chauffeurs for ... (correct)

(24) ... a substantial number of people will be involved. Some will likely be offered severance package ... (wrong)

Rule 7: argument-taking I

(25) In ending Hungary’s part of the project, Parliament authorized Prime Minister Miklos Meneth ... (wrong)

(26) Sales of information-processing products[pred] increased and accounted for 46% of total sales[gold]. In audio equipment, sales rose 13% to ... (different antecedent)

Rule 8: argument-taking II

(27) As aftershocks shook the San Francisco Bay Area, rescuers searched through rubble for survivors of Tuesday’s temblor, and residents picked their way through ... (correct)

(28) Lonnie Thompson, a research scientist at Ohio State[pred][gold] who dug for and analyzed the ice samples. To compare temperatures over the past 10,000 years, researchers analyzed ... (different antecedent/antecedent overlap)

Conclusion We soon realised that the annotations differ quite a lot with respect to the understanding of the category bridging. We noticed that, besides predicting wrong pairs, the original system would suggest bridging pairs which are fine from the point of view of bridging as annotated in ISNotes, but are not annotated in the ARRAU corpus, such as Example (29).

(29) As competition heats up in Spain’s crowded bank market, [. . . ]. The government directly owns 51.4% and . . .

Additionally, it would miss a lot of annotated bridging pairs, which are of a different type, such as in Example (30) or (31). As these often involve mentions with matching heads, they are filtered out as anaphor candidates in the preprocessing step of the system.

(30) Her husband and older son [. . . ] run a software company. Certainly life for her has changed considerably since the days in Kiev, when she lived with her parents, her husband and her two sons in a 2 1/2-room apartment. (relation: element-inverse).

(31) Dennis Hayes and Dale Heatherington, two Atlanta engineers, were co-developers of the internal modems that allow PCs to share data via the telephone. IBM, the world leader in computers ... (relation: subset-inverse)

This is why the performance is so poor: the system predicted a lot of reasonable bridging pairs which are not annotated, while it missed almost all instances that are annotated as bridging in the corpus, which follows a different concept of bridging that we will discuss in the next section. The differences between ISNotes and ARRAU are fundamental and need to be discussed in more detail. Hence, we will go back to the first step in the pipeline, task definition, and present a categorisation scheme that explains these differences.

6.3. A refined bridging definition

At this point, we take a step back (or two steps, to be more precise) and return to the task definition. Some of the issues in bridging and bridging resolution have been controversial for a long time, e.g. the question of definiteness. The differences between the annotations in ISNotes and ARRAU, however, are not yet covered in previous discussions of the phenomenon.


Figure 6.2.: Contribution and workflow pipeline for bridging: task definition (reloaded)

We introduce the concepts of referential and lexical bridging, inspired by the two-level RefLex annotation scheme by Baumann and Riester (2012). The two terms describe two different phenomena which are currently both defined and annotated as bridging. This refined characterisation of bridging is the result of a collaboration with Arndt Riester and has been presented in Rösiger et al. (2018b).

6.3.1. Referential bridging

Referential bridging describes bridging at the level of referring expressions, i.e. we are considering noun phrases that are truly anaphoric, in the sense that they need an antecedent in order to be interpretable, like in Example (32). As such, (referential) bridging anaphors are non-coreferent, context-dependent expressions.


(32) The city is planning a new town hall and the construction will start next week.

Referential bridging is often a subclass of (referential) information status annotation. We claim that there are two types of referential bridging: the first (and most frequent) type consists of expressions which require for their interpretation the antecedent as an implicit argument, e.g. the construction of the new town hall in Example (32). When uttered out of context, their referent is unidentifiable. The second type involves anaphoric subset expressions, as shown in Example (33).

(33) I saw some dogs yesterday. The small pug was the cutest.

Again, context-dependence is taken as the main criterion for classifying this as referential bridging. The subset type is, however, different from the first type of referential bridging, as we are not filling an argument slot (*the small pug of some dogs), but only expressing the fact that the expression is only interpretable because we have mentioned the set some dogs before and the small pug is a subset of this group.
Referential bridging anaphors are typically short, definite expressions (the construction, the door), and several accounts explicitly restrict bridging to definites, e.g. Poesio and Vieira (1998), Nedoluzhko et al. (2009), Grishina (2016), Rösiger (2016) or Riester and Baumann (2017), while others also allow for indefinite bridging, e.g. Löbner (1998) or Markert et al. (2012), with the consequence that some studies have linked indefinites as bridging anaphors (e.g. in ISNotes and others). Although having held different views on this issue, we now think that indefinite expressions can indeed, in some cases, be referential bridging anaphors, for example in Example (34) or Example (35), where the (partitive) expressions one employee (of Starbucks) or leaves (of the old oak tree) are introduced.

(34) Starbucks has a new take on the unicorn frappuccino. One employee accidentally leaked a picture of the secret new drink.

(35) Standing under the old oak tree, she felt leaves tumbling down her shoulders.

However, while short, definite expressions signal identifiability and are thus either anaphoric expressions or familiar items, it is much harder to decide which indefinite expressions are bridging anaphors, since indefinite expressions are prototypically used to introduce new discourse referents and principally do not need an antecedent/argument

in order to be interpretable. This is, for example, also reflected in the higher inter-annotator agreement for definite than for indefinite bridging anaphors (Rösiger, 2018a). Thus, despite the interpretational uncertainty surrounding indefinites, we take linguistic anaphoricity/context-dependence to be the defining criterion for referential bridging. Semantic relations like meronymy will be addressed in the next section under the notion of lexical bridging. It is important to concede, however, that the reason why certain definite or indefinite expressions function as bridging anaphors (while others do not) is typically due to some kind of semantic proximity between antecedent and anaphor. However, the specific relation we are dealing with may be rather abstract, vague and difficult to define, as Example (34) shows.

6.3.2. Lexical bridging

Baumann and Riester (2012) use the term “lexical accessibility” to describe lexical semantic relations, such as meronymy or hyponymy, at the word or concept level (e.g. house – door). It is important to bring to mind that lexical relations are defined as part of the intrinsic meaning of a pair of concepts, thus abstracting away from specific discourse referents: it is the words house and door which stand in a meronymic relation, not two actual physical objects or their mental images, although typically the referents of a holonym-meronym combination will, at the same time, stand in a physical whole-part relation. Since this physical relation has often been taken as one of the defining criteria for bridging, e.g. by Gardent et al. (2003), Nissim et al. (2004), Nedoluzhko et al. (2009) or Grishina (2016), we suggest using the term lexical (or lexically induced) bridging for this phenomenon.
The referents of the proper nouns Europe and Spain are in a whole-part relation,11 and the referring expressions can thus be considered a case of lexical bridging. However, the expression Spain is not anaphoric, since its interpretation does not depend on the “antecedent” Europe. Whole-part is probably the prototypical pre-defined relation, and it is a straightforward concept to annotate in the case of nouns denoting physical objects. However, it is less applicable in connection with abstract nouns, which is why many additional relations have been suggested, including, for instance, thematic role in an event, attribute of an object (like price), professional function in an organisation (like president), kinship (like mother), possessed entity and so on.

11Note that for proper nouns (names), like Spain, there is a one-to-one mapping between the word and its referent in the real world, which is not the case for common nouns, cf. Kripke (1972).


And yet, few schemes get by without an other category for the many examples which cannot be naturally classified into one of the assumed classes. It should be noted that lexical and referential bridging are two different concepts with completely different properties: one deals with the question of pragmatic anaphoricity (or grammatical saturation) of an expression, the other with lexical proximity between two words and the relation between entities in the real world, although the two types of bridging often co-occur within one and the same pair of expressions, such as in Example (36), where we have a relation of meronymy between the content words sea urchin(s) and spine(s), but also an anaphoric relation between the referring expressions most sea urchins and the spines, i.e. a case of referential bridging.

(36) In most sea urchins, touch elicits a prompt reaction from the spines.

The second release of the ARRAU corpus (Uryupina et al., 2018), as used in the first shared task on bridging resolution, for example, contains instances of both referential and lexical bridging, with the majority of the bridging links being purely lexical bridging pairs, i.e. most expressions labelled as bridging are actually not context-dependent.

6.3.3. Subset relations and lexical givenness

Another relation often brought up in connection with (lexical) bridging is the subset or element-of relation, which is the most common relation in ARRAU.12 In principle, an expression referring to an element or a subset of a previously introduced group can be of the referential type of bridging, like in Example (37), where the anaphor is interpreted as the small pug (from the prementioned group of dogs), but this is not always the case, as Example (38) shows, where the bridging anaphor is not context-dependent.

(37) I saw some dogs yesterday. The small pug was the cutest.

(38) Newsweek said it will introduce the Circulation Credit Plan, which awards space credits to advertisers on renewal advertising. The magazine will reward with page bonuses advertisers who in 1990 meet or exceed their 1989 spending, [. . . ]

12The graphics in this section were provided by Arndt Riester.


The subset relation can sometimes be reversed, as shown in Example (39), where, again, no context-dependence is involved.

(39) I saw a small pug yesterday. I like many dogs.

It should be noted, however, that subset/element-of pairs also have much in common with coreference pairs, since the lexical relation between their head nouns tends to be hypernymy, synonymy or plain word repetition (lexical relations which are summarised as lexical givenness in Baumann and Riester, 2012) or hyponymy (i.e. lexical accessibility). Note that, although the antecedent and anaphor expressions in Example (40) stand in a hypernym-hyponym relation (or reverse), their respective referent is the same. Hence, these cases do not exemplify bridging but coreference.

(40) a. I saw a dog yesterday. The small pug was very cute. b. I saw small pugs yesterday. The dogs were very cute.

Note that element-of bridging is also conceptually very close to the phenomenon of aggregation/summation, in which the group entity follows a list of elements, and which also counts as a case of coreference.

(41) I saw a pug and a Yorkshire terrier. The dogs were very cute.

A final case, which is treated as a special class of information status in Markert et al. (2012) and annotated as a subclass of bridging in ARRAU, is that of so-called comparative or other-anaphors. The head noun of the anaphor must be lexically given (Riester and Piontek, 2015, 242f.), and the two expressions are marked as two contrastive elements from the same alternative set (Rooth, 1992). Comparative anaphors can be considered cases of referential bridging where the implicit argument is the implicit or explicit alternative set, i.e. another dog (from a specific or unspecific set of dogs).

(42) I saw a small pug two days ago and another dog yesterday.


6.3.4. Near-identity

While many approaches distinguish only between coreferent anaphors, which refer to the same referent as their antecedent, and bridging anaphors, which refer to a different referent, Recasens and Hovy (2010a) and Recasens et al. (2012) have introduced a third concept, the concept of near-identity, which has been picked up by others, e.g. Grishina (2016). Near-identity is defined to hold between an anaphor and an antecedent whose referents are almost identical, but differ in one of four respects: name metonymy, meronymy, class or spatio-temporal functions.

(43) On homecoming night Postville feels like Hometown, USA, but a look around this town of 2,000 shows it's become a miniature Ellis Island . . . For those who prefer the old Postville, Mayor John Hyman has a simple answer.

We believe that the introduction of this additional category in between coreference and bridging introduces more uncertainty and, therefore, potentially makes the annotation process more difficult. Example (43), for instance, is structurally analogous to comparative anaphors.

6.3.5. Priming and bridging

Another issue that we observed in the GUM corpus is that a referring expression is sometimes annotated as bridging because the entity has been “primed”, i.e. something in the context has raised our expectations so that we can now easily build a bridge to the previously mentioned entity. Consider Example (44), where the Dark Knight refers to a rather popular Batman movie.

(44) The Batman movies ... The Dark Knight is my favourite.13

Of course, the context of the Batman movies makes it more likely that The Dark Knight is mentioned in the following text. Still, The Dark Knight, as the title of a movie, is not context-dependent and, in our opinion, belongs to either of the information status categories unused-known or unused-unknown, depending on the reader's knowledge. As such, it is a case of a non-anaphoric subset relation. Softening the border between the category unused and bridging by introducing the concept of an expression that has been primed by some previous context again results, in our opinion, in a less clear bridging definition.

13Example by Amir Zeldes, personal communication


Apart from these cases of “primed” bridging, GUM contains mostly referential bridging in the form of argument filling or referential subset relations. We also found some cases of aggregation annotated as bridging, which we see as a special case of coreference.

6.4. Shared task results

ARRAU seems to contain a rather small fraction of referential bridging pairs and a large number of lexical pairs. This is probably because the focus of the annotation was set on the pre-defined relations, such as subset. The following example, in which the rule-based system has identified a gold bridging anaphor, shows the different views with respect to the antecedent chosen: the gold annotations tell us that Jan Leemans, research director is a subset of researchers, whereas the predicted antecedent tells us that he is the research director at Plant Genetic Systems, reflecting the argument slot filling type of referential bridging.

(45) At Plant Genetic Systemspred, researchersgold have isolated a pollen-inhibiting gene that [...] . Jan Leemans, research director, said ...

6.4.1. Rules for bridging in ARRAU

With the modular approach of the rule-based system, one can define new rules to also capture lexical bridging and lexical givenness. We add a number of rather specific rules, which are meant to increase precision, but also include more general rules to increase recall. The rules have been developed on the training and development sets of the RST domain of the corpus. We also leave in three rules of the original rule-based system: building parts (Rule 1), percentage expressions (Rule 5) as well as set members (Rule 6). The final performance of the adapted system (F-score of 19.5) is given in Table 6.15. The new rules are presented in the following. While this adaptation was done to achieve high results on the shared task data, we argue that this is generally not a good way to achieve progress in bridging resolution. As ARRAU contains a mix of referential and lexical bridging and lexical givenness, it should not be used as a data basis for a general bridging system. As referential and lexical bridging are two different phenomena, they should not be modelled in a mixed bridging system. We suggest creating different systems for the two tasks, either by using corpora that only contain one type (such as ISNotes for referential bridging) or by labelling the bridging relations according to their type and treating them separately if one corpus containing both types is used.


                                            Anaphor recognition        Full bridging resolution
Corpus                            Domain    Prec.   Recall   F1        Prec.   Recall   F1
ISNotes (gold markables)          news      65.9    14.1     23.2      49.6    10.6     17.4
ISNotes (pred markables)          news      65.2    13.6     22.5      49.2    10.3     17.0
BASHI (pred markables)            news      49.4    20.2     28.7      24.3    10.0     14.1
ARRAU (original, gold markables)  news      13.3     0.9      1.7       2.2     0.2      0.3
ARRAU (adapted, gold markables)   news      29.2    32.3     30.8      18.5    20.6     19.5

Table 6.15.: Performance of the rule-based method on other corpora. We use predicted mentions for BASHI and SciCorp as they do not contain gold markables.

Comparative anaphora In contrast to ISNotes, the ARRAU corpus contains comparative anaphors, which are labelled with the relation other. For a markable to be considered a comparative anaphor, it must contain a comparative marker14, e.g. two additional rules, the other country, etc. We then search for the closest markable which is of the same category as the anaphor and whose head matches its head in the last seven sentences. If this search is not successful, we search for an antecedent of the same category as the anaphor in the same and the previous sentence. If this fails too, we search for a markable with the same head or a WordNet (WN) synonym appearing before the anaphor.

(46) the issue ... other issues in memory

We exclude a couple of very general terms, such as things or matters, as potential anaphors, as they are typically used non-anaphorically, such as in Example (47).15

(47) Another thing is that ...
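The backward-search logic of this rule can be sketched roughly as follows. The markable representation (dicts with words, head, category and sentence index) and the synonymy helper are simplifications introduced for illustration, not the actual implementation:

```python
# Hypothetical sketch of the comparative-anaphora rule, not the original code.
# A markable is assumed to be a dict with keys "words", "head", "category" and "sent"
# (sentence index); `markables` is the document-order list of all markables.

COMPARATIVE_MARKERS = {"other", "another", "similar", "such", "related", "different",
                       "same", "extra", "further", "comparable", "additional"}
GENERAL_TERMS = {"thing", "matter", "year", "week", "month"}

def find_comparative_antecedent(markables, i, synonyms=lambda a, b: False):
    """Propose an antecedent for the markable at index i, or return None."""
    ana = markables[i]
    if not {w.lower() for w in ana["words"]} & COMPARATIVE_MARKERS:
        return None                                  # no comparative marker present
    if ana["head"].lower() in GENERAL_TERMS:
        return None                                  # 'another thing' etc. is usually non-anaphoric
    preceding = markables[:i]
    # 1) same category and matching head within the last seven sentences
    for m in reversed(preceding):
        if (ana["sent"] - m["sent"] <= 7 and m["category"] == ana["category"]
                and m["head"].lower() == ana["head"].lower()):
            return m
    # 2) same category in the same or the previous sentence
    for m in reversed(preceding):
        if ana["sent"] - m["sent"] <= 1 and m["category"] == ana["category"]:
            return m
    # 3) same head or WordNet synonym anywhere before the anaphor
    for m in reversed(preceding):
        if m["head"].lower() == ana["head"].lower() or synonyms(m["head"], ana["head"]):
            return m
    return None
```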

Subset/Element-of bridging This is a rather general rule to capture mostly lexical bridging and lexical givenness cases of the relations subset/element. As the anaphor is typically more specific than the antecedent (except for cases of the relations subset-inverse/element-inverse), it must be modified by an adjective, a noun or a relative clause. We then search for the closest antecedent of the same category with matching heads in the last three sentences.

14other, another, similar, such, related, different, same, extra, further, comparable, additional
15The full list is: thing, matter, year, week, month.

(48) computers ... personal computers

If this fails, we check whether the head of the anaphor is a country. If so, we look for the closest antecedent with country or nation as its head in the same sentence or the previous five sentences. This is rather specific but helps to find many pairs in the news domain.

(49) countries... Malaysia

If this also fails, we take the closest WordNet synonym of the same category within the last three sentences as the antecedent. Again, we use our small list of general terms to exclude rather frequent general expressions, which are typically not of the category bridging.

Time subset For this rule, we list a number of time expressions, such as 1920s, 80s, etc. The anaphor must be of the category time and must be one of those time expressions. We then search for the closest antecedent of the same category in the last seven sentences for which the decade number matches.

(50) 1920s ... 1929

(51) the 1950s ... the early 1950s
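A minimal sketch of the decade matching, under the same simplified markable representation as above; the regular expressions are illustrative approximations rather than the patterns actually used:

```python
import re

# Simplified sketch of the time-subset rule; markable format as in the previous sketch.
YEAR = re.compile(r"\b(\d{2,4})s?\b")

def same_decade(text_a, text_b):
    """True if both expressions contain year-like numbers from the same decade,
    e.g. '1920s' and '1929', or 'the 1950s' and 'the early 1950s'."""
    a, b = YEAR.search(text_a), YEAR.search(text_b)
    return bool(a and b) and a.group(1)[:-1] == b.group(1)[:-1]

def find_time_antecedent(markables, i, max_dist=7):
    ana = markables[i]
    ana_text = " ".join(ana["words"])
    # the anaphor must be a time expression of the decade type (1920s, 80s, ...)
    if ana["category"] != "time" or not re.search(r"\d+s\b", ana_text):
        return None
    for m in reversed(markables[:i]):
        if (ana["sent"] - m["sent"] <= max_dist and m["category"] == "time"
                and same_decade(ana_text, " ".join(m["words"]))):
            return m
    return None
```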

One anaphora We search for expressions where one is followed by a common noun. We then remember the common noun part of the expression and search for the closest plural entity of the same category whose common noun part matches the common noun part of the anaphor. Taking into account all words with a common noun tag turned out to work better than just comparing the heads of the phrases.

(52) board members ... one board member

If this rule does not apply, we look for anaphor candidates of the pattern one of the N and again search for the closest plural entity for which the common noun part of the expressions matches.

(53) the letters ... one of the letters


As in a few of the other rules, we exclude a couple of very general terms as they typically do not refer back to something that has been introduced before.

Locations In the RST data, many cities or areas are linked to their state/country. We can find these bridging pairs with the WordNet relation partHolonym. To be considered an anaphor, the markable must be of the category space or organization and its size must be three words or less (so as to exclude modification and arguments). We then search for the closest antecedent of the same category that is in a WN partHolonym relation with the anaphor.

(54) California ... Los Angeles

(55) Lebanon ... Beirut
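The WordNet lookup behind this rule could look roughly like this, assuming NLTK's WordNet interface and the simplified markable format from the earlier sketches:

```python
from nltk.corpus import wordnet as wn

# Sketch of the locations rule; assumes NLTK's WordNet data is installed.

def part_holonym_heads(word):
    """Collect lemma names of all part holonyms of a noun, e.g. 'Beirut' -> {'lebanon'}."""
    heads = set()
    for syn in wn.synsets(word.replace(" ", "_"), pos=wn.NOUN):
        for hol in syn.part_holonyms():
            heads.update(lemma.lower() for lemma in hol.lemma_names())
    return heads

def find_location_antecedent(markables, i):
    ana = markables[i]
    if ana["category"] not in ("space", "organization") or len(ana["words"]) > 3:
        return None
    holonyms = part_holonym_heads(ana["head"])
    for m in reversed(markables[:i]):        # closest antecedent first
        if m["category"] == ana["category"] and m["head"].lower().replace(" ", "_") in holonyms:
            return m
    return None
```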

Same heads This rule is very similar to the subset/element-of rule, but is designed to find more cases that have not yet been proposed by the subset/element-of rule. For a markable to be considered an anaphor, it must be a singular, short NP (containing four words or less). We then search for the closest plural expression of the same category whose head matches the head of the anaphor or that is in a WordNet synonym relation with the anaphor’s head, in the last five sentences.

(56) Democrats ... a democrat

If this fails, we look at singular markables with a maximal size of three words which contain an adjective as anaphor candidates, and then search for a plural antecedent of the same category whose head matches the head of the anaphor or that is in a WordNet synonymy relation with the anaphor’s head, in the last seven sentences.

(57) the elderly ... the young elderly

(58) market conditions ... current market conditions

If this also fails, we look for inverse relations, i.e. a plural anaphor and a singular antecedent of the same category and matching heads/WN synonym in the last seven sentences.

(59) an automatic call processor that ...... Automatic call processors


Persons In this rather specific rule, we search for expressions containing an apposition which refer to a person, e.g. David Baker, vice president. For this, the anaphor candidate must match such a pattern and be of the category person. As the antecedent, we choose the closest plural person NP whose head matches the head of the apposition.

(60) Specialists ... John Williams, a specialist

The rest This rule is also very specific and aims to resolve occurrences of the rest, which, in many cases, is annotated as a bridging anaphor. We thus search for occurrences of the rest and propose as an antecedent a number expression within the last three sentences.

(61) 90 % of the funds ... The rest

Proposing antecedents for all remaining anaphors For the task of bridging anaphora resolution, i.e. choosing an antecedent for a given anaphor, we need to force the system to propose an antecedent for every bridging anaphor. This is why we include a couple of rules, which are applied in the order presented here and which propose an antecedent for every gold anaphor that has not yet been assigned one by the other rules.

• Pronoun anaphors: The anaphor must be a pronoun of the category person. As the antecedent, we choose the closest plural person NP in the last two sentences.

(62) At a recent meeting of manufacturing executives, everybody I talked with was very positive, he says. Most say they plan to ...

This is in a way a strange annotation, as pronouns should in theory always be coreference anaphors, not bridging anaphors. An alternative annotation would be to link they back to most, and most as a bridging anaphor to manufacturing executives.

• WordNet synonyms in the last three sentences.

(63) The purchasing managers ... 250 purchasing executives

• Cosine similarity greater than 0.5 in the last seven sentences.


This rule is meant to find more general related cases of bridging. For the cosine similarity, we take the pre-trained word2vec vectors (Mikolov et al., 2013); a sketch of this fallback follows after this list.

(64) “Wa” is Japanese for team spirit and Japanese ballplayers have miles and miles of it. A player’s commitment to practice ...

• The anaphor is a person and the antecedent is the closest organisation in the last two sentences.

• First word head match: choose the closest antecedent within the last two sentences, where the anaphor and antecedent both start with a proper noun.

• Same category in the last three sentences, choose the closest.

(65) ... that have funneled money into his campaign. After his decisive primary victory over Mayor Edward I. Koch

• Global headmatch/WordNet synonyms: “global” in this case means that we search for an antecedent in the whole document.

• Global same category.

• Choose the closest NP as a fallback plan.
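To make the cosine-similarity fallback mentioned in the list above more concrete, the following sketch shows one possible implementation using gensim to load pre-trained word2vec vectors; the vector file path is a placeholder, and selecting the most similar candidate is an assumption, since the rule only specifies a 0.5 threshold within the last seven sentences:

```python
from gensim.models import KeyedVectors

# The vector file path is a placeholder; any pre-trained word2vec model in the
# word2vec binary format can be used.
vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def head_similarity(head_a, head_b):
    """Cosine similarity between two head words; 0.0 if a word is out of vocabulary."""
    if head_a in vectors and head_b in vectors:
        return float(vectors.similarity(head_a, head_b))
    return 0.0

def find_similar_antecedent(markables, i, threshold=0.5, max_dist=7):
    """Return the most similar preceding markable above the threshold, or None."""
    ana = markables[i]
    best, best_sim = None, threshold
    for m in markables[:i]:
        if ana["sent"] - m["sent"] <= max_dist:
            sim = head_similarity(ana["head"], m["head"])
            if sim > best_sim:
                best, best_sim = m, sim
    return best
```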

6.4.2. A learning-based method

To compare the performance of the rule-based system with a learning-based method, we set up an SVM classifier16, which we provide with the same information as the rule-based system. The classifier follows a pair-based approach similar to Soon et al. (2001), where the instances to be classified are pairs of markables. For training, we pair every gold bridging anaphor with its gold antecedent as a positive instance. As negative instances, we pair every gold bridging anaphor with the markables that occur in between the gold anaphor and the gold antecedent.17 During testing, we pair every markable except the first one in the document with all preceding markables. As the classifier can classify more than one antecedent-anaphor pair as bridging for one anaphor, we choose the closest antecedent (closest-first decoding).

16Using Weka's SMO classifier with a string-to-vector filter
17This is a common technique in coreference resolution, done in order to reduce the number of negative instances and help the imbalance issue of having more non-coreferent/non-bridging cases than coreferent/bridging ones.

As the machine learning architecture is not designed to predict at least one antecedent for every given bridging anaphor (it can classify all antecedent-anaphor pairs for one anaphor as “not bridging”), we cannot report results for bridging anaphora resolution (all). However, we report results for partial bridging anaphora resolution, where, during training, we pair the gold bridging anaphors with all preceding markables, instead of pairing all markables with all preceding markables as in the full bridging scenario. We define the following features. Features marked with a ? are boolean features.

Markable features: words in the markable, gold head form, predicted head form, noun type (proper, pronoun, nominal), category, determiner (def, indef, demonstr, bare), number, gender, person, nested markable?, grammatical role, genericity, partial previous mention?, full previous mention?, containing a comparative marker?, containing an adjective?, containing one?, containing a number?, length in words.

Pair features: distance in sentences, distance in words, head match?, modifier/argument match?, WordNet synonym?, WordNet hyponym?, WordNet meronym?, WordNet partHolonym?, semantic connectivity score, highest semantic connectivity score in document?, cosine similarity.
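The pair creation and closest-first decoding described above can be summarised in the following sketch; the feature extraction and the actual Weka SMO classifier are abstracted behind a placeholder classify function:

```python
# Sketch of the pair-based instance creation and closest-first decoding; feature
# extraction and the Weka SMO classifier are hidden behind the classify placeholder.

def training_pairs(markables, gold_pairs):
    """gold_pairs: iterable of (antecedent_index, anaphor_index) tuples over markables."""
    instances = []
    for ante_i, ana_i in gold_pairs:
        instances.append((ante_i, ana_i, 1))            # positive instance
        for mid in range(ante_i + 1, ana_i):             # markables in between the pair
            instances.append((mid, ana_i, 0))            # negative instances
    return instances

def closest_first_decode(markables, classify):
    """classify(ante_i, ana_i) -> True if the pair is predicted as bridging.
    Returns a dict mapping each anaphor index to its closest predicted antecedent."""
    links = {}
    for ana_i in range(1, len(markables)):               # the first markable is never an anaphor
        for ante_i in range(ana_i - 1, -1, -1):           # walk backwards: closest candidate first
            if classify(ante_i, ana_i):
                links[ana_i] = ante_i
                break
    return links
```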

6.4.3. Final performance

                         Bridging recognition    Anaphora-res.-all   Anaphora-res.-partial   Full bridging
                         P     R     F1          acc                 P     R     F1           P     R     F1
RST      rule-based      29.2  32.5  30.7        39.8                63.6  22.0  32.7         18.5  20.6  19.5
         ML-based        -     -     -           -                   47.0  22.8  30.7         17.7  20.3  18.6
PEAR     rule-based      75.0  16.0  26.4        28.2                69.2  13.7  22.9         57.1  12.2  20.1
         ML-based        -     -     -           -                   26.6   5.7   9.4          5.47 12.5   7.61
TRAINS   rule-based      39.3  21.8  24.2        48.9                66.7  36.0  46.8         27.1  21.8  24.2
         ML-based        -     -     -           -                   56.6  23.6  33.3         10.3  14.6  12.1

Table 6.16.: Performance of the different systems on the test sets of the single domains of ARRAU, using gold markables and gold bridging anaphors in the two bridging anaphora resolution settings.


Table 6.16 shows the results of the modified rule-based approach and the learning-based approach for all tasks. It can be seen that the rule-based approach significantly outperforms the learning-based one in every setting.18 Surprisingly, in spite of the fact that the rules were designed on the training/dev sets of the RST domain, the performance on the PEAR and TRAINS domains is even better in most settings. However, this might be an effect of TRAINS and PEAR being small datasets. Recently, the official scorer for the evaluation of the shared task has become available, which differs from our internal evaluation in the handling of some of the special cases. Table 6.17 compares our internal scores against the scores of the official scorer. In most cases, as we ignored the special cases, the scores of the official scorer are lower. However, there are also some cases where the official score is higher. In some cases, this also leads to different results: for example, in the PEAR domain, the official scores of the learning-based approach outperform those of the rule-based approach in full bridging resolution, although, with our internal scorer, the rule-based approach is clearly ahead. This again shows the need for a refined evaluation metric. As there were no other participants in the shared task, the results in Table 6.16 are the best published results on the ARRAU datasets so far.

                          Anaphor recognition   Anaphora-res.-all    Anaphora-res.-partial   Full bridging
                          P     R     F1        P     R     F1       P     R     F1           P     R     F1
RST     Rule (internal)   29.2  32.5  30.7      39.8  39.8  39.8     63.6  22.0  32.7         18.5  20.6  19.5
        Rule (official)   -     -     -         36.5  35.7  36.1     58.4  20.6  30.5         16.8  13.2  14.8
        ML (internal)     -     -     -         -     -     -        47.0  22.8  30.7         17.7  20.3  18.6
        ML (official)     -     -     -         -     -     -        51.7  16.2  24.7         12.6  15.0  13.7
PEAR    Rule (internal)   75.0  16.0  26.4      28.2  28.2  28.2     69.2  13.7  22.9         57.1  12.2  20.1
        Rule (official)   -     -     -         30.5  28.2  29.3     62.5  11.3  19.1         53.1   4.8   8.8
        ML (internal)     -     -     -         -     -     -        26.6   5.7   9.4          5.47 12.5   7.61
        ML (official)     -     -     -         -     -     -        37.5   4.2   7.6         23.6   7.3  11.2
TRAINS  Rule (internal)   39.3  21.8  24.2      48.9  48.9  48.9     66.7  36.0  46.8         27.1  21.8  24.2
        Rule (official)   -     -     -         47.5  47.3  47.4     64.4  36.0  46.2         28.4  11.3  16.2
        ML (internal)     -     -     -         -     -     -        56.6  23.6  33.3         10.3  14.6  12.1
        ML (official)     -     -     -         -     -     -        63.2  12.8  21.3         19.0  11.0  13.9

Table 6.17.: Performance of the different systems on the test sets of ARRAU, using gold markables (and gold bridging anaphors in the anaphora resolution settings). We report performance using the official and our own internal scorer.

Table 6.18 shows the rules and their performance in the final system for full bridging resolution. As we only applied the rules to the test set after having developed them, some rules are included which do not predict any pairs, because they predicted pairs in the training/dev setting (on which the system was designed).

18Significance computed using the Wilcoxon signed rank test, at the p=0.05 level.

                       Anaphor recognition            Full bridging resolution
Rule                   Correct  Wrong  Precision      Correct  Wrong  Precision
1: Building parts      0        0      -              0        0      -
2: Percentage          1        0      100.0          1        0      100.0
3: Set members         1        1      50.0           0        2      0.0
4: Comp anaphora       44       16     73.3           26       34     43.3
5: Subset/element      57       247    18.8           34       270    11.2
6: Time subset         3        6      33.3           3        6      33.3
7: One anaphora        0        0      -              0        0      -
8: Locations           25       11     69.4           22       14     61.1
9: Head matching       72       236    23.4           42       266    13.6
10: The rest           1        1      50.0           0        2      0.0
11: Person             10       1      90.9           8        3      72.7

Table 6.18.: Performance of the single rules for full bridging resolution on the test set of the RST dataset, using gold markables

6.5. A rule-based bridging system for German

Most of the work on bridging resolution, with its subtasks of anaphor detection and antecedent selection, has focused on English (e.g. Hou et al., 2014; Markert et al., 2012; Rahman and Ng, 2012). For German, Grishina (2016) has presented a corpus of 432 bridging pairs as well as an in-depth analysis of some properties of bridging, e.g. the distance between anaphors and their antecedents and the distribution of bridging relations. Apart from Cahill and Riester (2012)'s work on bridging anaphor detection as a subclass in information status classification and Hahn et al. (1996)'s early work on bridging resolution, there have been no automatic approaches to bridging resolution in German. German corpora containing bridging annotations have been presented in Section 4.2. Apart from the Coref pro corpus (Grishina, 2016), which has recently been made available, there is only the SemDok corpus by Bärenfänger et al. (2008), which is not openly available. To the best of our knowledge, DIRNDL is currently the largest German dataset containing bridging annotations. Hence, we think it is a valuable effort to adapt the bridging system described for English to German. While the adaptation process addresses specificities of German, it also needs to take into account the properties of the available training data. This section presents the adaptation to German and experiments on bridging anaphor detection and full bridging resolution. As the annotation of the newly created corpus GRAIN has only recently been completed, experiments on bridging in GRAIN are not featured in this thesis. However, the GRAIN corpus has been used as the data basis in a recent Master's thesis (Pagel, 2018). Our joint results have been published in Pagel and Rösiger (2018). In this thesis, we focus on bridging resolution in the DIRNDL corpus.

6.5.1. Adaptation to German

Related work

Corpora Table 6.19 compares the corpora containing bridging annotations in German. As can be seen, DIRNDL is the largest resource, with 655 bridging pairs, followed by the Coref pro corpus and GRAIN. Unfortunately, not much is known about the SemDok corpus, which seems to be currently unavailable. As the Coref pro and GRAIN corpora only became available after our experiments had been performed, we based our adaptation process on the DIRNDL corpus.

Corpus     Available  Genre                      Bridging pairs  Anaphors  Other properties
DIRNDL     Yes        radio news                 655             definite  -
Coref pro  Yes        news, narrative, medical   432             definite  near-identity involved
SemDok     No         scientific + news          ?               all NPs   -
GRAIN      Yes        interviews                 274             definite  -

Table 6.19.: Overview of German corpora annotated with bridging

A quick recap on DIRNDL The DIRNDL corpus (Eckart et al., 2012; Björkelund et al., 2014), a corpus of radio news, contains bridging annotations as part of its information status annotation (on transcripts of the news), following older guidelines of the RefLex scheme (Baumann and Riester, 2012). Overall, 655 bridging pairs have been annotated. Apart from the manual information status annotation, other linguistic annotation layers (POS tagging, parsing, morphological information) have been created automatically.

Computational approaches Cahill and Riester (2012) presented a CRF-based automatic classification of information status, which included bridging as a subclass. However, they did not state the accuracy per class, which is why we cannot derive any performance estimate for the task of bridging anaphor detection. They stated that bridging cases “are difficult to capture by automatic techniques”, which confirms intuitions from information status classification for English, where bridging is typically a category with rather low accuracy (Markert et al., 2012; Rahman and Ng, 2012; Hou, 2016a). Hahn et al. (1996) and Markert et al. (1996) presented a resolver for bridging anaphors, back then called textual ellipsis or functional anaphora, in which they resolved bridging anaphors in German technical texts based on centering theory and a knowledge base. The corpus and the knowledge base as well as the overall system are, however, not available, which makes a comparison with our system difficult.

Bridging definition in RefLex

As both the DIRNDL and GRAIN corpora were annotated according to the RefLex scheme (Baumann and Riester, 2012; Riester and Baumann, 2017), we repeat the main idea of this scheme, as well as its implications for bridging anaphors. RefLex distinguishes information status on two different dimensions, namely a referential and a lexical dimension. The referential level analyses the information status of referring expressions (i.e. noun phrases) according to a fine-grained version of the given/new distinction, whereas the lexical level analyses information status at the word level, where content words are analysed as to whether the lemma or a related word has occurred before. Bridging anaphors are a subclass of referential information status. On the referential level, indefinite expressions are considered to be discourse-new and are thus treated as expressions of the information status category new. Therefore, the bridging anaphors in our data are always definite. This is a major difference between the annotations in DIRNDL and GRAIN compared to the ISNotes annotations. This fact needs to be considered during the adaptation process, but it is of course not a specificity of German but rather a guideline decision. In RefLex, bridging-contained is a separate information status class, where the antecedent is contained within the anaphor, either in a prepositional phrase or a possessive construction, e.g. in the approach's accuracy or the accuracy of the approach. In this thesis, we do not cover these cases. Unlike ISNotes, which features gold annotations, DIRNDL and GRAIN were processed using automatic NLP tools. Systems trained on automatic annotations typically achieve lower performance, as errors during other pre-processing steps are propagated to the bridging resolution step.


Finally, RefLex suggests annotating PPs rather than their embedded NPs. This has to be reflected during markable extraction.

Experimental setup

DIRNDL revision One issue that we observed is that the DIRNDL annotations grew while the RefLex guidelines were still being optimised. As a result, not all rules that are nowadays stated in the guidelines have been implemented correctly. Firstly, many cases are of the type shown in Examples (66) and (67).19

(66) DE: Der Iran ... an seinem Atomprogramm EN: Iran ... their nuclear programme

(67) DE: Der Iran ... an deren Atomprogramm EN: Iran ... whose nuclear programme

These cases, where the anaphor contains a possessive or a demonstrative pronoun, are typical cases of the category bridging-contained and should not be labelled as bridging according to the final version of the guidelines. Secondly, in cases where the annotators could not find a suitable antecedent, they did not annotate one. As a result, some anaphors do not have an antecedent. Also, although indefinite expressions should not be considered bridging anaphor candidates, there are some indefinite expressions that have been labelled as bridging anaphors, e.g. in Example (68).

(68) DE: Die USA haben die verschärfte UNO-Resolution des UNO-Sicherheitsrates begrüsst. US-Staatssekretär Burns sprach von einer aussagekräftigen Zurechtweisung. EN: The US have welcomed the tightened UNO resolution ... US state secretary Burns called it a meaningful rebuke.

Thus, an automatic revision of the DIRNDL data was necessary to make the annotations more consistent. We automatically filtered out the following bridging anaphor candidates using part-of-speech patterns (a simplified sketch follows the list below):

• Indefinite bridging anaphors (and their antecedents);

– expressions with an indefinite article, ein Gedanke (a thought);

19DIRNDL is not a parallel corpus; the translations are only included for readers who do not understand German.


– indefinite number expressions, 23 Deutsche (23 Germans);
– negated indefinite expressions, kein Deutscher (no German);
– adverbs are taken into account, i.e. rund 23 Deutsche (about 23 Germans) is also filtered out.

• Bridging anaphors of the type bridging-contained, as in sein Atomprogramm (their nuclear programme), nach seinem Beginn (after its start). Adverbs are again taken into account, i.e. erst nach ihrer Bergung (only after their rescue) is also changed to the type bridging-contained;

• Anaphors without an antecedent.
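The sketch announced above illustrates this filtering as a simple word-form heuristic; the actual revision used part-of-speech patterns, and the word lists and returned labels here are illustrative only:

```python
# Word-form-based approximation of the revision; the original used part-of-speech
# patterns, and the word lists and returned labels here are illustrative only.
INDEFINITE = {"ein", "eine", "einen", "einem", "einer", "eines",
              "kein", "keine", "keinen", "keinem", "keiner", "keines"}
POSSESSIVE_OR_DEM = {"sein", "seine", "seinem", "seinen", "seiner", "seines",
                     "ihr", "ihre", "ihrem", "ihren", "ihrer", "ihres", "deren", "dessen"}

def revised_label(tokens, has_antecedent=True):
    """Return the revised label for a markable originally annotated as bridging."""
    words = [t.lower() for t in tokens]
    if words and words[0] in {"rund", "erst", "nur", "etwa"}:
        words = words[1:]                  # ignore a leading adverb ('rund 23 Deutsche')
    head_zone = words[:3]                  # preposition/determiner region of the PP/NP
    if any(w in POSSESSIVE_OR_DEM for w in head_zone):
        return "bridging-contained"        # 'an seinem Atomprogramm'
    if any(w in INDEFINITE or w.isdigit() for w in head_zone):
        return "new"                       # indefinite anaphors are discourse-new in RefLex
    if not has_antecedent:
        return None                        # drop anaphors without an annotated antecedent
    return "bridging"
```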

Note that other, more complicated patterns have not been taken into account, e.g. the bridging-contained cases (which are still marked as bridging), as in Example (69), where the information that the north and south of the country refers to Sri Lanka is already established by linking the country and Sri Lanka as a coreferent pair. The information is thus contained in the markable and it is not necessary to link this as a bridging case.

(69) DE: Seit Wochen geht die Luftwaffe von Sri Lanka gegen Rebellenstellungen im Norden und Süden des Landes vor. EN: For weeks, the air forces of Sri Lanka have bombed rebellion posts in the north and south of the country.

In very obviously wrongly marked cases (like in the example above), the label has been changed by hand. This affects only a few of the bridging cases. An open question remains what should be done with markables of the category generic. In DIRNDL, these cases have their own information status label generic. The new corpus GRAIN and the new annotation guidelines do not contain a category generic; instead, genericity has been turned into an attribute, i.e. another information status category is annotated and additionally given the attribute generic. DIRNDL, however, contains about 1500 generic markables, so a re-annotation effort would be rather costly. As we will see later, some reasonable candidates proposed by the system, e.g. die Jugend - Europa in Example (70), are as a result considered wrong because they are annotated as generic.

(70) DE: Sie kommen aus allen 27 Mitgliedstaaten und tauschen ihre Vorstellung von der Zukunft Europas aus. Die EU-Kommission bezeichnet das Treffen in Rom als Auftakt für einen neuen Dialog zwischen den


europäischen Institutionen und der Jugend. EN: ... they are sharing their views on the future of Europe. ... a new dialogue between the European institutions and the Youth.

The newly improved annotations have been made available on the DIRNDL webpage. For optimisation, we use the development set and we report performance on the test set, if not indicated otherwise. We also report the performance on the whole DIRNDL corpus.

Preprocessing and rules for German

Markables RefLex suggests annotating PPs rather than their embedded NPs, in order to handle merged forms of determiners and prepositions. Therefore, we extract NPs (if not embedded in a PP) and PPs as our predicted markables. We extract all markables with information status annotation as our set of gold markables.

Filtering of bridging anaphor candidates As potential bridging anaphor candidates, we filter out a number of noun types, as they are not considered bridging anaphors:

• Pronouns: all pronouns are excluded as they are typically either pleonastic or coreferent with an already introduced entity.

• Indefinite expressions: all indefinite markables should, as stated in the guidelines, not be bridging anaphor candidates. We use a set of definite determiners to determine the definiteness of the markables.

• Proper names: proper names are also definite, but are not suited as bridging anaphors as they typically occur as expressions of the category unused/mediated-general. NPs containing embedded proper names can, of course, be of the category bridging and should not be excluded.

• Markables whose head has appeared before in the document are excluded. This is meant as an approximation for coreference anaphors.

• NPs that have embedded NPs are excluded. In practice, this leads to the exclusion of long NPs that have embedded markables, e.g. in Example (71).

(71) DE: unter dem Deckmantel der zivilen Nutzung der Nuklearenergie EN: under the guise of civilian use of nuclear energy


These expressions are typically of the information status category unused-unknown.

Filtering of bridging antecedent candidates When using predicted markables, it sometimes happens that overlapping markables are extracted. To overcome this, we filter out embedded named entities (NEs) in NPs or PPs from the set of potential antecedents, but only if the NP or PP differs from the NE solely in the form of a determiner, preposition or a pre-modifying noun, as in the following examples:

(72) Der Iran

(73) Im Iran

(74) Bundesaußenminister Steinmeier (Foreign secretary Steinmeier)

Not excluded are embedded NPs in other constructions, for example involving genitives, e.g. in Example (75).

(75) auf Wunsch Spaniens (at Spain’s discretion)

Rules

We have implemented and adapted to German all eight rules as proposed by Hou et al. (2014). The input to the rules are the extracted markables. Each rule then proposes bridging pairs, independently of the other rules. The rules have been described in detail in the reimplementation description of the English rule-based system.

A distributional lexical resource for German

Similar to the English bridging system, some of the rules require a distributional lexical resource, which is described in the following.

Computing the semantic connectivity

The concept of semantic connectivity was introduced in the reimplementation of the English bridging resolver. In a nutshell, the semantic connectivity between two words can be approximated by the number of times the two words occur in an “N PREP N” pattern. This means that two nouns like Sand and Strand (sand and beach) have a high semantic connectivity because they often occur as Sand am Strand (sand on the beach), whereas other nouns do not often appear in such a construction and are therefore not highly semantically connected.


We take the SdeWaC corpus (Faaß and Eckart, 2013), a web corpus of 880 M tokens, to compute the semantic connectivity for all combinations of nouns that occur in this prepositional pattern in the corpus. This way, we not only compute the numbers for nouns in DIRNDL, but also for other nouns, making the approach applicable to new texts. In contrast to English, German has many one-word compounds, like Hüpfkind (jumping kid) or Schreikind (screaming kid). Many of these are infrequent, thus leading to sparsity issues. To overcome this, we apply the compound splitter Compost (Cap, 2014) and compute the semantic connectivity for the heads of the respective compounds. This reduces the number of pairs from 12,663,686 to 8,294,725.
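A minimal sketch of how such connectivity counts can be collected from a POS-tagged, compound-split corpus; the STTS tag names and the (word, tag) sentence format are assumptions made for illustration:

```python
from collections import Counter

# The corpus is assumed to be a list of sentences, each a list of (word, STTS tag)
# pairs; NN = common noun, APPR/APPRART = (fused) preposition. `splitter` maps a
# compound to its head (identity by default).

def connectivity_counts(sentences, splitter=lambda noun: noun):
    """Count 'N PREP N' occurrences as an approximation of semantic connectivity."""
    counts = Counter()
    for sent in sentences:
        for (w1, t1), (w2, t2), (w3, t3) in zip(sent, sent[1:], sent[2:]):
            if t1 == "NN" and t2 in ("APPR", "APPRART") and t3 == "NN":
                counts[(splitter(w1).lower(), splitter(w3).lower())] += 1
    return counts

# 'Sand am Strand' contributes one count to the pair (sand, strand).
example = [[("Sand", "NN"), ("am", "APPRART"), ("Strand", "NN")]]
print(connectivity_counts(example))   # Counter({('sand', 'strand'): 1})
```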

Argument-taking ratio

The argument-taking ratio is a measure that describes the likelihood of a noun to take an argument. In the English bridging resolver, this was computed with the help of the NomBank annotations. These manual annotations list, for every occurrence in the WSJ corpus, the arguments of the nouns. To compute the argument-taking ratio, one then simply has to divide the number of NomBank annotations for one noun by the total frequency of the noun in the corpus. This is only possible because both the ISNotes and the NomBank annotation were performed on the same corpus. For other languages, we need to derive the number of cases in which the noun takes an argument automatically. To do this, we define these patterns of modification/argumenthood:

1. PP-postmodification/PP argument:
   Ntarget PREP (Det) (ADJ)* N, e.g. Türen im Haus (doors in the house)

2. NPgen arguments:
   Ntarget (Det) (ADJ)* N, e.g. die Kinder der Frau (the woman's kids)

3. Possessive pre-modification:
   POSS Ntarget, e.g. ihr Ehemann (her husband)

We then divide the frequency of a noun in these constructions by the total frequency of the noun in a large corpus. Again, we use the SdeWaC corpus to derive the argument-taking ratio scores. As in the computation of the semantic connectivity scores, we run into sparsity issues due to infrequent compounds. Thus, we also apply the compound splitter to get more stable ratios. The argument-taking ratios are compiled for the head of the noun if a compound split exists. This reduces the number of nouns from 5,527,197 to 2,335,293.
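Analogously, the argument-taking ratio can be approximated over the same kind of POS-tagged input; treating a following article as a proxy for a genitive argument is a crude simplification of pattern 2, since case information is not available from the tags alone:

```python
from collections import Counter

# Approximation of the argument-taking ratio over (word, STTS tag) sentences;
# the genitive heuristic (a following article) is a deliberate simplification.

def argument_taking_ratio(sentences, splitter=lambda noun: noun):
    """Return {noun head: share of occurrences with a PP, genitive or possessive argument}."""
    total, with_arg = Counter(), Counter()
    for sent in sentences:
        tags = [t for _, t in sent]
        for i, (w, t) in enumerate(sent):
            if t != "NN":
                continue
            head = splitter(w).lower()
            total[head] += 1
            nxt = tags[i + 1] if i + 1 < len(sent) else None
            prev = tags[i - 1] if i > 0 else None
            has_pp_arg = nxt in ("APPR", "APPRART")   # pattern 1: Türen im Haus
            has_gen_arg = nxt == "ART"                # crude proxy for pattern 2: die Kinder der Frau
            has_poss = prev == "PPOSAT"               # pattern 3: ihr Ehemann
            if has_pp_arg or has_gen_arg or has_poss:
                with_arg[head] += 1
    return {noun: with_arg[noun] / total[noun] for noun in total}
```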

In the following section, we describe the rules and how we adapted them to German.

Rule 1: building parts The anaphor is a part of a building (e.g. window, room, etc.) and is not pre-modified by a common or proper noun. The antecedent is selected as the one with the highest semantic connectivity in the same or the previous two sentences.

(76) im Zimmer ... Die Fenster (in the room ... the windows)

We translated the nouns on the building list to German but found that there is no noun in the DIRNDL corpus that is present on the building list, i.e. this rule is not particularly suited for our domain. It is left in anyway as it could be relevant for other data.

Rule 2: relative person NPs The anaphor is contained in a list of relative nouns (e.g. child, son, husband, etc.) and its argument-taking ratio is greater than 0.4, meaning that it is not used generically (as in children like toys) but typically appears with an argument (husband of ...). It is not modified by an adjective or a noun, does not contain an embedded PP and is not followed by a PP. Antecedents must be in the same sentence or the two previous ones and must be either a proper noun that is not a location, or a named entity tagged as a person, or a personal pronoun except second person du (you).

(77) Martha ... Ihr Mann (Martha ... her husband)

Rule 3: GPE job titles The anaphor is on a list of official job titles for a country (e.g. commissioner, secretary, etc.). It does not contain a country adjective, as in der argentinische Außenminister (the Argentinian foreign secretary), and does not contain and is not followed by a PP or an organisation. The antecedent is the most salient geopolitical entity in the document. Salience is determined by frequency in the document. In case of ties, the closest is chosen.

(78) Deutschland ... Der Außenminister (Germany ... the foreign secretary)


Rule 4: professional roles

(79) IBM ... CEO Peter Müller

(80) der SPD ... Der Vorstand (SPD ... the executive board)

The head of the anaphor appears on a list of professional roles (like manager, doctor) and does not contain a country adjective, a PP, a proper name or an organisation. The most salient antecedent is chosen within the last four sentences. Salience is determined by frequency in the document.

Rule 5: percentage expressions

(81) 10% der Deutschen ... 5% (10% of all Germans ... 5%)

The anaphor is a percentage expression containing % or “Prozent”. As antecedent, the modifier expression of another percentage expression is chosen, e.g. der Deutschen in 10% der Deutschen. This rule is not applicable to DIRNDL as these percentage expressions are indefinite.

Rule 6: other set members This rule is not applicable to our data as it is designed for indefinite anaphora. It is left unimplemented in the resolver, in case one wants to implement it for other corpora.

Rule 7: argument-taking ratio I The anaphor is a common noun phrase (non-modified and without arguments) with an argument-taking ratio over 0.4. The antecedent is determined by finding the closest similar modification in the document. For details, refer to Section 6.1.1.

Rule 8: argument-taking ratio II The anaphor is a definite, non-modified expression without arguments in subject position (where it is likely to either be coreferent or bridging) with an argument-taking ratio over 0.4. The antecedent is chosen as the entity with the highest semantic connectivity in the last three sentences.

New rules

In addition to adapting the rules from the English system to German, we also added two new rules, which are tailored to our domain of radio news.


Rule 9: country part-of It is common in our data that a country is introduced into the discourse and then some aspect related to the country or a part of the country is picked up later as a bridging anaphor.

(82) Australien ... Die Regierung (Australia ... the government)

(83) Japan ... Die Westküste (Japan ... the west coast)

Therefore, we introduce a new rule: if the anaphor is a non-demonstrative definite expression that occurs on our list of country parts, without adjectival or nominal pre-modification and without a PP modification or argument, we search for the most salient country. Salience is determined by frequency in the document, with the exception of the subject in the very first sentence, which overrides frequency in terms of salience. The list of country parts consists of terms like Regierung (government), Einwohner (residents), etc.
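The salience heuristic of this rule (document frequency with a first-sentence-subject override) can be sketched as follows; the markable attributes role and modified as well as the country-part list are illustrative assumptions:

```python
from collections import Counter

# Markable format as in the earlier sketches; 'role', 'modified' and the
# country-part list are illustrative assumptions.
COUNTRY_PARTS = {"regierung", "einwohner", "hauptstadt", "parlament", "westküste"}

def most_salient_country(markables, is_country):
    """Most frequent country so far; the subject of the first sentence overrides frequency."""
    countries = [m for m in markables if is_country(m)]
    if not countries:
        return None
    first_subject = next((m for m in countries
                          if m["sent"] == 0 and m.get("role") == "subject"), None)
    if first_subject is not None:
        return first_subject
    freq = Counter(m["head"].lower() for m in countries)
    best_head = freq.most_common(1)[0][0]
    return next(m for m in countries if m["head"].lower() == best_head)

def country_part_antecedent(markables, i, is_country):
    ana = markables[i]
    if ana["head"].lower() not in COUNTRY_PARTS or ana.get("modified", False):
        return None
    return most_salient_country(markables[:i], is_country)
```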

Rule 10: high semantic connectivity Rule 10 is similar to Rule 8 in Hou et al. (2014), but without the constraint that the anaphor has to be in subject position. However, it must be a non-modified NP or PP without any arguments. If the semantic connectivity score to a previously introduced mention is higher than a certain threshold (15.0 in our experiments), it is proposed as the antecedent. The antecedent must appear in the last four sentences. The feature is designed to capture more general cases of bridging, which can be found by looking for a high semantic connectivity between the anaphor and the antecedent.

Post-processing

The rules are ordered and applied according to their precision. Due to PPs being markables in DIRNDL, it is sometimes the case that the antecedent is in principle correct, but because of errors in syntactic parsing or other constraints, the resolver chooses a slightly different span, e.g. without the preposition or the determiner. We count these cases, where the difference consists only of a determiner or preposition, as correct. For example,

(84) Ägypten, if embedded in in Ägypten, should also count as correct.
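A possible implementation of this relaxed span matching is sketched below; the list of German function words is an illustrative subset, not the full set used in the evaluation:

```python
# Relaxed span matching: a predicted antecedent counts as correct if it differs from
# the gold span only by leading prepositions and/or determiners. The function-word
# list is an illustrative subset.
FUNCTION_WORDS = {"in", "im", "an", "am", "auf", "für", "mit", "von", "vom", "zu", "zum",
                  "zur", "der", "die", "das", "den", "dem", "des"}

def strip_function_words(tokens):
    toks = [t.lower() for t in tokens]
    while toks and toks[0] in FUNCTION_WORDS:
        toks = toks[1:]
    return toks

def spans_match(predicted_tokens, gold_tokens):
    return strip_function_words(predicted_tokens) == strip_function_words(gold_tokens)

# 'in Ägypten' (predicted) vs. 'Ägypten' (gold) counts as correct.
assert spans_match(["in", "Ägypten"], ["Ägypten"])
```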


           Anaphor recognition            Full bridging resolution
Rule       Correct  Wrong  Precision      Correct  Wrong  Precision
Rule 4     3        0      100.0          1        2      33.3
Rule 8     26       31     45.6           13       44     22.8
Rule 9     29       2      93.5           21       10     67.7
Rule 10    57       40     58.8           25       72     25.7

Table 6.20.: Bridging resolution on DIRNDL: precision of the firing rules

                   Anaphor recognition      Full bridging resolution
Corpus             P     R     F1           P     R     F1
Whole corpus       61.2  17.6  27.3         31.9  9.2   14.2
Test corpus        60.6  21.3  31.4         38.3  13.6  20.1
Train/dev corpus   62.6  16.1  25.6         30.0  7.7   12.3

Table 6.21.: Bridging resolution on DIRNDL: overall performance

6.5.2. Performance

Table 6.20 shows the performance of the single rules when applied to DIRNDL. From the original English system, only Rule 4 (GPE job titles) and the very general Rule 8 (which is based on semantic connectivity) fire. Our new rules also propose pairs. Rule 9 is rather specific and therefore has a high precision, while Rule 10 proposes a lot of pairs with mediocre precision, as it was designed to increase recall. Most of the rules transferred from the English bridging resolver do not predict any bridging pairs in our data. For some cases, this can be explained by the different bridging definitions and guidelines, such as the fact that there are no indefinite bridging anaphors in our data. Rule 6, for example, which is designed to resolve anaphors containing a number expression or indefinite pronouns, cannot propose any correct pairs. Of course, ISNotes, the corpus on which the experiments with the English bridging resolver were based, and DIRNDL also cover slightly different domains (news text in ISNotes vs. radio news in DIRNDL), which might explain some of the differences. Table 6.21 presents the performance of the overall system for anaphor detection and full bridging resolution. Overall, we achieve an F1 score of 14.2% for full bridging resolution, with a precision of 31.9% and 9.2% recall. Surprisingly, the performance on the test corpus is better than on the development set.


Gold vs. predicted markables

Setting              Precision  Recall  F1
Predicted mentions   29.9       9.2     14.0
Gold mentions        31.9       9.2     14.2

Table 6.22.: Bridging resolution on DIRNDL: predicted vs. gold mentions

Table 6.22 shows the scores for full bridging resolution for predicted and gold markables. As can be seen, the precision is slightly lower for predicted mentions. However, as the annotations on DIRNDL were performed on an earlier version of automatic syntactic annotations, the difference is only small and not statistically significant in this case.

Bridging resolution with gold coreference

Setting                 Precision  Recall  F1
No coreference          21.4       9.2     12.8
Predicted coreference   22.4       9.2     13.0
Gold coreference        31.9       9.2     14.2

Table 6.23.: Bridging resolution with different types of coreference information in DIRNDL (using gold markables)

In the bridging system above, gold coreferent entities are removed from the list of potential anaphors. In a purely automatic system, this information is of course not available, but it could be approximated using a state-of-the-art coreference resolver. To test the difference between predicted, gold and no coreference information at all, we experiment with different coreference settings; to measure the effect of coreference information, we also run the system without filtering out coreferent anaphors. For the predicted version, we used the coreference system IMS HotCoref DE as described in Section 5, applying the default model trained on TüBa-D/Z to our data. Table 6.23 shows that, as expected, the precision and F1 score are significantly higher in the setting with gold coreference.20 Predicted coreference, however, still improves precision (and the final F1 score) a little bit.

Error analysis

We found that there are a number of issues that affect the performance of the system. They are discussed in the following.

Preprocessing There are a couple of cases where a markable does not have an NP or PP annotated in the automatic constituency parse (due to parsing errors). No annotated NP means that we do not have the expression available as a markable in our experiments.

Abstract anaphors Bridging anaphors typically have nominal antecedents, but in some cases, they refer back to a VP or clausal antecedent. In these cases, we cannot find the right antecedent as our system only considers nominal antecedents.

Span mismatch When using predicted markables, there are some cases where there is an overlap between the gold and the predicted antecedent, but in the evaluation, they are considered wrong. More sophisticated evaluation metrics for bridging resolution would help here.

(85) [auch in der Hauptstadt Tokio]annotated span, PP (also in the capital Tokyo)

Wrong annotations We found some annotations in DIRNDL that are against the RefLex guidelines, e.g. where annotators have marked single nouns or NPs that are embedded in PPs instead of PPs. These cannot be resolved correctly by the system.

Information status category generic As mentioned above, the information status category generic is present in DIRNDL, but not in the newest guidelines. This means that some bridging anaphors are labelled as generic rather than bridging as they are generic entities. In the newest version, generic is an attribute that bridging NPs (and other categories) can have.

20We compute significance using the Wilcoxon signed rank test (Siegel and Castellan, 1988) at the 0.05 level.


Indefinites Many of the rules in Hou (2016b) focus on (indefinite) part-whole relations: in DIRNDL, these indefinite part-whole cases are not annotated on the referential level, so they are not contained as bridging pairs. They are, however, included in the lexical layer of RefLex, which could in principle also be used in the bridging experiments. However, the part-whole annotations here are based on the word level (in contrast to the NP level in the referential layer), where anaphoricity is not a criterion. For example, wheel would be marked as a part of a car on the word level, and we do not know whether this was actually a context-dependent case. Thus it is non-trivial to infer bridging pairs from the lexical-level part-whole annotations, as we do not know which of the part-whole relations between two words also constitutes an anaphoric relation between the two markables.

6.6. Conclusion

We have implemented a state-of-the-art bridging resolver for English. In our experiments, we have made a couple of observations. First, filtering out coreference anaphors before resolving bridging anaphors helps to increase the performance, as coreference and bridging anaphors are difficult to distinguish. When applying the system to BASHI, we found that the bridging resolver as described in Hou et al. (2014) generalises well to other in-domain data, if they contain similar bridging annotations. In experiments with SciCorp, we found that most of the rules are rather domain-specific and do not generalise well to other domains. The two more general rules also do not work as well on other domains because the two resources on which they are based, the argument-taking ratio list and the semantic connectivity scores, are computed on the basis of GigaWord, which surprisingly does not seem to contain many of the words that appear in the scientific articles. Adding some domain-specific data to these resources would certainly make the approach more applicable to the new domains. When working with the ARRAU corpus, we realised that there are very different understandings of bridging that have not been addressed as such in previous research. Our bridging characterisation thus distinguishes referential bridging, lexical bridging and subset bridging as three rather different types of bridging that also have different properties. Non-identical anaphoricity is the main criterion for referential bridging, while lexical and subset bridging can also occur with non-anaphoric expressions. After this theoretical contribution, we focused on setting up a well-performing system on the ARRAU corpus, since the first shared task on bridging used this dataset for the evaluation of the submitted systems. Therefore, we have implemented many new rules to also deal with lexical and non-anaphoric subset bridging. As our system was the only participating system, our results on ARRAU are the best published results on this dataset so far. Finally, we have extended the bridging resolver to German by adapting the eight rules as well as implementing two new rules. The system was tested on the DIRNDL corpus and achieved results similar to those of the English resolver on ISNotes and BASHI. Again, the positive effect of removing coreference anaphors could be shown. Overall, the performance of the openly available systems lies between 14 and 18% F1 score on newspaper text. Of course, this means that there is still a lot of room for improvement. One of our linguistic validation experiments will thus be about integrating automatically predicted semantic relations into bridging resolution.


Part III.

Linguistic validation experiments


7. Using prosodic information to improve coreference resolution

Research Question 4: Linguistic validation experiments With tools and data being available, do theoretical assumptions about the tasks hold true on actual data? Can we use the theoretical notions to improve the tools?

Now that we have developed tools and created data for coreference and bridging resolution, we can use the tools to validate theoretical claims about the task that have been made in theoretical or experimental studies. We will present two experiments that give examples of how the tools can be used. If the theoretical assumptions hold true, the tools' performance should benefit from the inclusion of the newly integrated information. In our first experiment, described in this chapter, we examine the effect of prosodic features on coreference resolution in spoken discourse. We test features from different prosodic levels and investigate which strategies can be applied to include prosodic information in coreference resolution. We also perform experiments on whether it is beneficial for the task to include prosodic boundaries and to determine whether an accent is the nuclear accent. We perform experiments using manually annotated as well as automatically predicted prosodic information. Our study deals with German data, but the prosodic properties are comparable to other West Germanic languages, like English or Dutch. Figure 7.1 shows our contributions in this step. Parts of this research have been published in Rösiger and Riester (2015) and Rösiger et al. (2017).

7.1. Motivation

In Example (1), taken from Umbach (2002), the question for the coreference resolver, besides linking the anaphoric pronoun he back to John, is to decide whether an old

cottage and the shed refer to the same entity. The problem here is that the transcript of this little snippet is ambiguous: even for humans (without further context), it remains unclear whether the shed is only a part of the cottage or whether the two expressions are used as synonyms.

Figure 7.1.: Contribution and workflow pipeline for coreference: validation, part 1

(1) {John}1 has {an old cottage}2.

Last year {he}1 reconstructed {the shed}?.

Almost all work on coreference resolution is based on text, although there exist a few systems for pronoun resolution in transcripts of spoken text (Strube and Müller, 2003; Tetreault and Allen, 2004). It has been shown that there are differences between written and spoken text that lead to a drop in performance when coreference resolution systems developed for written text are applied on spoken text (Amoia et al., 2012). For this reason, it may help to use additional information available from the speech signal, for example prosody. In West-Germanic languages, such as English and German, there is a tendency for coreferent items, i.e. entities that have already been introduced into the discourse (their information status is given), to be deaccented, as the speaker assumes the entity to be salient in the listener’s discourse model (cf. Terken and Hirschberg (1994); Baumann and Riester (2013); Baumann and Roth (2014)). We can make use of this fact by provid-

166 7.2. Background ing prosodic information to the coreference resolver. Example (2), this time marked with prominence information, shows that prominence can help us resolve cases where the transcription is potentially ambiguous. The accented syllables in the example are capitalised. Coreferent anaphors are marked in bold face.

(2) {John}1 has {an old cottage}2.

a. Last year {he}1 reconstructed {the SHED}3.

b. Last year {he}1 reconSTRUCted {the shed}2.

The pitch accent on shed in (2-a) leads to the interpretation that the shed and the cottage refer to different entities, where the shed is a part of the cottage (they are in a bridging relation). In contrast, in (2-b), the shed is deaccented, which suggests that the shed and the cottage corefer.

7.2. Background

Pitch accents Pitch accents are changes in fundamental frequency, often combined with an increased intensity or longer duration. In West Germanic languages, accentuation is used as a means to emphasise something. There are different shapes that describe the change in fundamental frequency, such as a rise or a fall. Figure 7.2 shows the change in fundamental frequency for one exemplary pitch accent type. The shapes are typically described with the help of the so-called ToBI labels (Tones and Break Indices), where the accent type categories consist of (sequences of) high and low targets, H and L. The GToBI(S) guidelines for German by Mayer (1995), for example, distinguish the following categories: H*L, L*H, !H*L, H*, L*, HH*L and L*HL.

Figure 7.2.: One exemplary pitch accent shape

GToBI(S) stands in the tradition of autosegmental-metrical phonology, cf. Pierrehumbert (1980), Gussenhoven (1984), Féry (1993), Ladd (2008), Beckman et al. (2005). Speakers mainly make use of pitch accents and prosodic phrasing. The annotations form a hierarchy: intonation phrases (IP), terminated by a major boundary (%), contain intermediate phrases (ip), which are closed by a minor boundary (-), as shown in Figure 7.3.

Figure 7.3.: The relation between phrase boundaries and intonation and intermediate phrases

The available pitch accent and boundary annotations allow us to automatically derive a secondary layer of prosodic information which represents a mapping of the pitch accents onto a prominence scale in which the nuclear (i.e. final) accents of an intonation phrase (n2) rank as the most prominent, followed by the nuclear accents of intermediate phrases (n1) and prenuclear (i.e. non-final) accents which are perceptually the least prominent. To put it simply, the nuclear accent is the most prominent accent in a prosodic phrase while prenuclear accents are less prominent. See Figure 7.4 for the relation between nuclear accents and boundary annotations.

Figure 7.4.: The relation between boundary tones and nuclear and prenuclear accents
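The mapping onto the prominence scale can be made concrete with a small sketch. This is our own illustration, not the implementation used in this thesis: words are assumed to be (token, accent, boundary) triples, the last accent before a major boundary (%) is labelled n2, the last accent before a minor boundary (-) is labelled n1, and all remaining accents are prenuclear (pn).

# Sketch (illustrative): derive a prominence label (n2, n1, pn) for every
# accented word from GToBI(S)-style word-level annotations.
def prominence_labels(words):
    labels = [None] * len(words)
    accents_in_ip = []   # accent indices since the last minor boundary
    accents_in_IP = []   # accent indices since the last major boundary
    for i, (_token, accent, boundary) in enumerate(words):
        if accent is not None:
            labels[i] = "pn"              # default: prenuclear
            accents_in_ip.append(i)
            accents_in_IP.append(i)
        if boundary == "-" and accents_in_ip:
            labels[accents_in_ip[-1]] = "n1"   # nuclear accent of the ip
            accents_in_ip = []
        if boundary == "%":
            if accents_in_IP:
                labels[accents_in_IP[-1]] = "n2"   # nuclear accent of the IP
            accents_in_ip, accents_in_IP = [], []
    return labels

# Toy input corresponding to Example (4) further below:
words = [("Wegen", None, None), ("KÖRperverletzung", "H*L", None),
         ("mit", None, None), ("TOdesfolge", "L*H", "-"),
         ("und", None, None), ("fahrlässiger", None, None),
         ("TÖtung", "H*L", "%"), ("MÜSsen", None, None)]
print(prominence_labels(words))
# [None, 'pn', None, 'n1', None, None, 'n2', None]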

Pitch accents and coreference Many theoretical and experimental studies have shown that there is a tendency for coreferent items, i.e. entities that have already been introduced into the discourse, to be deaccented, as the speaker assumes the entity to be salient in the listener’s discourse (cf. Terken and Hirschberg (1994); Schwarzschild (1999); Cruttenden (2006) for English or Baumann and Riester (2013); Baumann and Roth (2014); Baumann et al. (2015) for German).


(3) Anaphoric complex NP (DIRNDL sentences 9/10):

9: Im Mittelpunkt steht eine von der Ratspräsidentin, Bundeskanzlerin Merkel, vorbereitete “Berliner Erklärung”.
10: Die Präsidenten [. . . ] wollen [den TEXT über die ZIEle und ZUkunft der EU] unterzeichnen.
    the presidents [. . . ] want [the text about the aims and future the EU] sign
    (( L*H L*H-) ( H*L H*L H*L -)%)
    pn n1 pn pn

Central is the ’Berlin Declaration’ that was prepared by the president of the Council of the EU, Chancellor Merkel. The presidents want to sign [the text about the aims and future of the EU.]

(4) Non-anaphoric complex NP (DIRNDL sentences 2527/2528):

2527: Der Prozess um den Tod eines Asylbewerbers aus Sierra Leone in Polizeigewahrsam ist [. . . ] eröffnet worden.
2528: [Wegen KÖRperverletzung mit TOdesfolge und fahrlässiger TÖtung] MÜSsen . . .
      [Due to assault with lethal consequence, and reckless homicide] must
      (( H*L L*H -) ( H*L -)%)
      pn n1 n2

The trial about the death of an asylum seeker from Sierra Leone during police custody has started. Charges include [assault with lethal consequence, and reckless homicide], . . .

While we expect the difference between the presence or absence of pitch accents to influence the classification of short NPs as in Example (1), we do not expect complex NPs to be fully deaccented. For complex NPs, we nevertheless hypothesise that the prosodic structure of coreferential NPs differs from the structure of discourse-new NPs to a degree that yields a measurable effect. Examples (3) and (4) show the prosodic realisation of two expressions with different information status. In Example (3), the complex NP the text about the aims and future of the EU refers back to the Berlin Declaration, whereas in Example (4), the complex NP assault with lethal consequences and reckless homicide is not anaphoric. The share of prenuclear accents is higher in the anaphoric case, which indicates lower overall prominence.

7.3. Related work

Baumann and Riester 2013 Baumann and Riester (2013) examined the question of whether different types and degrees of givenness trigger different prosodic markings. The paper discusses the prosodic realisation of referential expressions in annotated corpora of read and spontaneous speech, with a focus on the relation between information status and accent position as well as accent type. Their starting point is the two-level RefLex scheme. They claim that givenness can occur on (i) the referential level: coreference with an antecedent already introduced into the discourse (referential givenness), or (ii) the lexical level: availability of a lexical unit in the discourse (lexical givenness). They study the prosodic realisations of different referential and lexical category combinations and confirm the relevance of both the referential and the lexical level. The data on read speech shows a tendency of a stepwise increase in prosodic prominence from given to new items. For spontaneous speech, the results are less clear. As this thesis is concerned with anaphoricity and focuses on coreference and bridging anaphors, we will only analyse the relation between prosody and referential givenness, although we do think that givenness on the lexical level also plays a role, as already discussed in combination with comparative anaphors in Section 6.3.3.

Amoia’s study on coreference in written and spoken text Amoia et al. (2012) described an empirical study of coreference in English spoken vs. written text, in which they aimed at defining specific parameters that characterise differences between genres of spoken and written text, such as the preferred segmentation strategy, the maximal allowed distance, or the length and size of the coreference chains. They also performed a precision-based evaluation on two corpora, one containing spontaneous interviews and one containing popular science texts, using the deterministic coreference system in the Stanford CoreNLP pipeline (Lee et al., 2011). The system achieved a MUC precision of 51% on spoken text, while on written text it achieved 64%. This confirms previous findings that coreference systems perform differently on spoken and written text, and that they perform better on written text, which is also the type of text on which they are typically developed. We think that improving the performance on spoken text by including prosodic features is thus a worthwhile effort.

7.4. Experimental setup

Data We use the DIRNDL corpus (Eckart et al., 2012; Björkelund et al., 2014) as it contains both manual coreference and manual prosody labels. We adopt the official train, test and development split1 designed for research on coreference resolution. The recorded news broadcasts in the DIRNDL corpus were spoken by 13 male and 7 female speakers, in total roughly 5 hours of speech. The prosodic annotations follow the GToBI(S) standard for pitch accent types and boundary tones (Mayer, 1995). In the experiments where we use automatic predictions, we make use of two class labels of prosodic events: all accent types (marked by the standard ToBI *) grouped into a single class (pitch accent presence) and the same for intonational phrase boundaries (marked by %). In the experiments based on manual prosodic information, we make use of both the simplified scheme and the fine-grained GToBI labels and phrase boundaries.

System and baseline We use the IMS HotCoref DE coreference resolver, a state-of-the-art coreference resolver for German, as described in Chapter 5. The standard features are text-based and consist mainly of string matching, part of speech, constituent parses, morphological information and combinations thereof. As we aim at coreference resolution applicable to new texts, particularly in the setting using automatically predicted prosodic information, all annotations used to create the text-based features are automatically predicted using NLP tools. When training the system on the concatenation of the train and development sets of DIRNDL, as described in Section 5.2.6, we achieve a CoNLL score of 46.11. This serves as the baseline in the following experiments.

7.5. Prosodic features

Our prosodic features mainly target definite descriptions, where it is difficult for the resolver to decide whether the potential anaphor is actually anaphoric or not. In these cases, accentuation is an important means to distinguish between given entities (often deaccented) and other categories (i.e. bridging anaphors or new information) that are typically accented, particularly for entities whose heads have a different lexeme than their potential antecedent. Pronouns are not the case of interest here, as they are (almost) always coreference anaphors. Some of the features only take into account the presence, absence or type of a pitch accent, while others additionally employ prosodic phrasing. To get a better picture of the effect of these features, we implement, for each feature, one version for all noun phrases and a second version only for short noun phrases (<4 words).

1http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/dirndl.en.html

As explained above, this is to take into account the fact that longer phrases are rarely completely deaccented and are thus different from short NPs.

Two main features

The following two main features are tested in both the automatic and the manual setting.

Pitch accent presence focuses on the presence of a pitch accent, disregarding its type. If one accent is present in the markable, the Boolean feature gets assigned the value true, and false otherwise.

Nuclear accent presence is a Boolean feature comparable to pitch accent presence. It gets assigned the value true if there is a nuclear (n2 or n1) accent present in the markable. In contrast to the first feature, this feature makes use of prosodic phrasing and takes the greater prominence of nuclear accents into account.
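As a minimal sketch, the two Boolean features could be computed for a markable as follows. This is our own illustration, not the feature code of IMS HotCoref DE; markables are assumed to be lists of word dictionaries carrying the accent and prominence annotations described above.

# Sketch (illustrative) of the two main Boolean prosodic features.
def pitch_accent_presence(markable_words):
    """True if any word in the markable carries a pitch accent."""
    return any(w["accent"] is not None for w in markable_words)

def nuclear_accent_presence(markable_words):
    """True if any word carries a nuclear (n1 or n2) accent."""
    return any(w["prominence"] in ("n1", "n2") for w in markable_words)

# Hypothetical markable "der Koalition" from Figure 7.6 (deaccented):
markable = [{"form": "der", "accent": None, "prominence": None},
            {"form": "Koalition", "accent": None, "prominence": None}]
print(pitch_accent_presence(markable))    # False -> likely given
print(nuclear_accent_presence(markable))  # False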

In the setting using manual prosodic information, we test a number of additional features.

Other features ignorant of phrase boundaries

Pitch accent type corresponds to the pitch accent types that are present in the GToBI(S) based annotations, as shown in Table 7.1. The types describe the shape of the change in fundamental frequency.

Description      Label
Fall             H*L
Rise             L*H
Downstep fall    !H*L
High target      H*
Low target       L*
Early peak       HH*L
Late peak        L*HL

Table 7.1.: ToBI types in GToBI(S)

In case there are several ToBI types present, we look at the last label in the markable. As the ToBI types are not predicted in the automatic setting, we can only test the feature using manually annotated prosodic information.


Other features including phrase boundary information

The following set of features takes into account the degree of prominence of pitch accents, which encodes information about prosodic phrasing. How to determine and compare the overall prominence of complex NPs is an ongoing research question. The features described below are meant to test what works in an applied setting.

Nuclear accent presence (n2) is a variant of nuclear accent presence, where the Boolean feature gets assigned the value true if there is a nuclear accent of type n2 present in the markable. This is meant to assess the helpfulness of the distinction between n1 and n2 accents. As this distinction is not contained in the automatic setting, it can only be tested using manual information.

Nuclear accent type looks at the different degrees of accent prominence. The markable gets assigned the type n2, n1 or pn if the last accent in the phrase matches one of these types (and none if it is deaccented).

Nuclear bag of accents treats accents like a bag-of-words approach treats words: if one accent type is present once (or multiple times), the accent type is considered present. This means we get a number of different combinations (2³ = 8 in total) of accent types that are present in the markable, e.g. pn and n1 but no n2 for Example (3), and pn, n1 and n2 for Example (4).

Nuclear: first and last includes linear information while avoiding an explosion of combinations. It only looks at the (degree of the) first pitch accent present in the markable and combines it with the last accent.
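The two combination features can likewise be sketched in a few lines. Again, this is an illustration under the assumption that markables carry word-level prominence labels; it is not the resolver's actual feature code.

# Sketch (illustrative) of the combination features built on pn/n1/n2 labels.
def nuclear_bag_of_accents(markable_words):
    """Set of prominence degrees occurring in the markable (one of 2^3 = 8 values)."""
    return frozenset(w["prominence"] for w in markable_words
                     if w["prominence"] in ("pn", "n1", "n2"))

def nuclear_first_and_last(markable_words):
    """Degree of the first and the last accent in the markable."""
    degrees = [w["prominence"] for w in markable_words
               if w["prominence"] in ("pn", "n1", "n2")]
    if not degrees:
        return ("none", "none")
    return (degrees[0], degrees[-1])

# Example (4), "KÖRperverletzung mit TOdesfolge und fahrlässiger TÖtung":
np4 = [{"prominence": p} for p in ("pn", None, "n1", None, None, "n2")]
print(sorted(nuclear_bag_of_accents(np4)))   # ['n1', 'n2', 'pn']
print(nuclear_first_and_last(np4))           # ('pn', 'n2')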

7.6. Manual prosodic information

We present a study on German spoken text that uses manual prosodic marking to show the principled usefulness of prosodic features for coreference resolution. In the long run and for application-based settings, of course, we do not want to rely on manual annotations. The features based on manual prosodic information investigate the potential of prominence information and are meant to motivate the use of automatic prosodic features, which we will also explore.


To the best of our knowledge, this is the first work on coreference resolution in spoken text that tests the theoretical claims regarding the interaction between coreference and prominence in a general, state-of-the-art coreference resolver. The manual prosodic information is taken from the DIRNDL corpus. We test all the features described above.

7.7. Automatically predicted prosodic information

Practical applications on spoken language need to rely on automatically predicted prosodic information, as manual labels are not only expensive to obtain but also unavailable in an automatic pipeline setup. In this section, we annotate the prosodic information automatically, thus omitting any manual annotations from the feature set. We predict pitch accents (and phrase boundaries) using a convolutional neural network (CNN) model from acoustic features extracted from the speech signal. We assess the quality of these annotations before we include them in the coreference resolver. This part of the experiment was a collaboration between projects A6 and A8 of the SFB-732. The CNN classifier experiments and the resulting prosodic accents were provided by Sabrina Stehwien. The results of the collaboration were published in Rösiger et al. (2017). In the following, we describe the prosodic event detector used in this work. It is a binary classifier that is trained separately for either pitch accents or phrase boundaries and predicts for each word whether it carries the respective prosodic event. We apply a CNN model, illustrated in Figure 7.5. The input to the CNN is a matrix spanning the current word and its right and left context word. The input matrix is a frame-based representation of the speech signal. The signal is divided into overlapping frames of 20 ms with a 10 ms shift and is represented by a 6-dimensional feature vector for each frame. We use acoustic features as well as position indicator features following Stehwien and Vu (2017) that are simple and fast to obtain. The acoustic features were extracted from the speech signal using the OpenSMILE toolkit (Eyben et al., 2013). The feature set consists of 5 features that comprise acoustic correlates of prominence: smoothed fundamental frequency (f0), root-mean-square (RMS) energy, loudness, voicing probability and Harmonics-to-Noise Ratio. The position indicator feature is appended as an extra feature to the input matrices (see Figure 7.5) and aids the modelling of the acoustic context by indicating which frames belong to the current word or the neighbouring words.

2Graphic provided by Sabrina Stehwien.


Figure 7.5.: CNN for prosodic event recognition with an input window of 3 successive words and position indicating features.

We apply two convolution layers in order to expand the input information and then use max pooling to find the most salient features. In the first convolution layer, we ensure that the filters always span all feature dimensions. All resulting feature maps are concatenated into one feature vector, which is fed into the two-unit softmax layer.
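The following sketch shows a CNN of this type in PyTorch. The number of filters, the kernel sizes and the use of global max pooling are our own assumptions chosen to illustrate the architecture described above; they are not the values of the published system.

# Sketch (illustrative) of a CNN prosodic event detector.
import torch
import torch.nn as nn

class ProsodicEventCNN(nn.Module):
    def __init__(self, n_feats=6, n_filters=32):
        super().__init__()
        # First convolution spans all feature dimensions (kernel width = n_feats).
        self.conv1 = nn.Conv2d(1, n_filters, kernel_size=(5, n_feats))
        # Second convolution operates on the resulting feature maps.
        self.conv2 = nn.Conv2d(n_filters, n_filters, kernel_size=(5, 1))
        self.relu = nn.ReLU()
        self.out = nn.Linear(n_filters, 2)   # two-unit output: event vs. no event

    def forward(self, x):
        # x: (batch, 1, n_frames, n_feats) -- frames of the current word plus its
        # left and right neighbour, 6 features per 20 ms frame.
        h = self.relu(self.conv1(x))
        h = self.relu(self.conv2(h))
        # Global max pooling keeps the most salient value per feature map;
        # the pooled maps form one feature vector.
        h = torch.amax(h, dim=(2, 3))
        return torch.softmax(self.out(h), dim=-1)

model = ProsodicEventCNN()
dummy = torch.randn(4, 1, 120, 6)   # 4 words, 120 frames, 6 features each
print(model(dummy).shape)           # torch.Size([4, 2])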

Predicting prosodic labels on DIRNDL

We predict prosodic events for the whole DIRNDL corpus used in this work. To simulate an application setting, we train the CNN model on a different dataset. Since the acoustic correlates of prosodic events, as well as the connection between sentence prosody and information status, are similar in English and German, we train the prosodic event detector on English data and apply the model to the German DIRNDL corpus.3 The data used to train the model is a 2.5 hour subset of the Boston University Radio News Corpus (Ostendorf et al., 1995) that contains speech from 3 female and 2 male speakers and that includes manually labelled pitch accents and intonational phrase boundary tones. Hence, both corpora consist of read speech by radio news anchors. The prediction accuracy on the DIRNDL anaphora corpus is 81.9% for pitch accents and 85.5% for intonational phrase boundary tones. The per-class accuracy is 82.1% for pitch accents and 37.1% for phrase boundaries. Despite the low quality of the phrase boundary annotations, we believe that, as a first step, their effectiveness can still be tested. The speaker-independent performance of this model on the Boston dataset is 83.5% accuracy for pitch accent detection and 89% for phrase boundary detection. We conclude that the prosodic event detector generalises well to the DIRNDL dataset and that the obtained accuracies are appropriate for our experiments.

7.8. Results and discussion

We test our prosodic features by adding them to the feature set used in the baseline. We define short NPs to be of length 3 or shorter. In the short NP setting, the feature is applied only to short NPs; in the all NP setting, the feature is used for all NPs. The ratio of short vs. longer NPs in DIRNDL is roughly 3:1. Note that we evaluate on the whole test set in both cases.

3Rosenberg et al. (2012) report good cross-language results of pitch accent detection on this dataset.

We report how the performance of the coreference resolver is affected in three settings:

(a) trained and tested on manual prosodic labels (short gold),

(b) trained on manual prosodic labels, but tested on automatic labels (short gold/auto) (this simulates an application scenario where a pre-trained model is applied to new texts) and

(c) trained and tested on automatic prosodic labels (short auto).

We predict the presence of a pitch accent and use phrase boundaries to derive nuclear accents, which are taken to be the last (and perceptually most prominent) accent in an intonation phrase. We do not predict the pitch accent type (e.g. fall H*L or rise L*H) as this distinction is generally difficult to model in the automatic setting. We will perform experiments based on manual labels using pitch accent type as a feature later.

We hypothesise the following:

Short NPs Since long, complex NPs almost always have at least one pitch accent, the presence or absence of a pitch accent is more informative for shorter phrases.

Long NPs For long, complex NPs, we look for nuclear accents that indicate the phrase’s overall prominence. If the NP contains a nuclear accent, it is assumed to be less likely to take part in coreference chains.

Table 7.2 shows the effect of the pitch accent presence feature on our data. All features perform significantly better than the baseline.4 As expected, the numbers are higher if we limit this feature to short NPs. We believe that this is due to the fact that the feature contributes most when it is most meaningful: on short NPs, a pitch accent makes it more likely for the NP to contain new information, whereas long NPs almost always have at least one pitch accent, regardless of their information status. We achieve the highest performance using manual annotations (gold), followed by the version that has been trained on manual annotations and tested on automatically predicted prosodic labels (gold/auto), with a score that is not significantly worse than the gold version.

4We compute significance using the Wilcoxon signed rank test (Siegel and Castellan, 1988) at the 0.01 level.

This is important for applications, as it suggests that the loss in performance is small when training on gold data and testing on predicted data. As expected, the version that is trained and tested on predicted data performs worse, but it is still significantly better than the baseline. Hence, prosodic information is helpful in all three settings. It also shows that the assumption about short NPs holds for automatic labels as well. Table 7.3 shows the effect of adding nuclear accent presence as a feature to the baseline. Again, we report results that are all significantly better than the baseline. The improvement is largest when we apply the feature to all NPs, i.e. also including long, complex NPs. When restricted to nuclear accents, the accent presence feature receives the value true for only a few of the short NPs that would otherwise have been assigned true in terms of general pitch accent presence. Therefore, nuclear pitch accents do not provide sufficient information for a majority of the short NPs. For long NPs, however, the presence of a nuclear accent is more meaningful, as these tend to always have at least one accent. The performance of the nuclear accent presence feature follows the pattern present for pitch accent presence: gold > gold/auto > auto. Again, automatic prosodic information contributes to the system’s performance. The highest CoNLL score when using automatic labels is 50.64, as compared to 53.99 with gold labels. To the best of our knowledge, these are the best results reported on the DIRNDL anaphora dataset so far.

Baseline                           46.11
+ Accent                short NPs   all NPs
+ Presence gold           53.99      49.68
+ Presence gold/auto      52.63      50.08
+ Presence auto           49.13      49.01

Table 7.2.: Performance of pitch accent presence (in CoNLL score)

More detailed experiments based on manual annotations

We perform additional experiments in which we further investigate the use of prosodic boundaries and of certain ToBI or nuclear accent types. As the prediction quality of the boundaries was rather low (per-class accuracy of 37.1%) and ToBI types are difficult to predict automatically, we base these experiments on manual annotations.


Baseline                           46.11
+ Nuclear accent        short NPs   all NPs
+ Presence gold           48.63      52.12
+ Presence gold/auto      48.46      51.45
+ Presence auto           48.01      50.64

Table 7.3.: Performance of nuclear accent presence (in CoNLL score)

Table 7.4 examines the effect of the respective new features in terms of the CoNLL scores. Features that achieved a significant improvement over the baseline are marked with a star. As can be seen, features based on GToBI(S) accent type (pitch accent type) did not result in any significant improvements.

Baseline                                              46.11
+ Accent                                   short NPs   all NPs
+ Pitch accent type                          45.31      46.23
+ Nuclear accent type (n1 vs. n2 vs. pn vs. none)   47.17      46.79
+ Nuclear accent type (n1/n2 vs. pn vs. none)       48.55*     45.24
+ Nuclear accent presence (n2)               46.69      48.88*
+ Nuclear bag of accents                     46.09      48.45*
+ Nuclear first+last                         46.41      46.74

Table 7.4.: Additional features based on manual prosodic information (gold setting)

In terms of features that are phonologically more informed, the picture is less clear. Distinguishing between prenuclear and nuclear accents (nuclear accent type) is a feature that works best for short NPs where there is only one accent. A significant increase in performance was achieved by distinguishing nuclear (n1/n2) vs. prenuclear accents. Distinguishing n1 and n2 accents did not lead to significant improvements. Nuclear accent presence of an n2 accent, on the other hand, works well for all NPs, but not as well as the more general nuclear presence in the main experiments. The nuclear bag of accents feature works quite well, too: this is a feature designed for NPs that have more than one accent and so it works best for complex NPs. The feature Nuclear first+last did not lead to significant improvements.


Overall, these features perform worse than the two main features, accent presence and nuclear accent presence. Still, it becomes clear that one has to be very careful about how the prosodic information is used. In general, the presence of an accent works better than the distinction between certain accent types, and including intonation boundary information also contributes to the system’s performance when applying the feature to all NPs, including complex NPs. As the ratio of short vs. longer phrases in DIRNDL is 3:1, applying the feature only to short NPs without boundary information leads to the highest overall result (53.99). However, depending on the ratio of short and long NPs in other data, including the boundaries in order to better treat complex NPs can be beneficial. The best version including prosodic boundaries and applying the feature to all NPs leads to a CoNLL score of 52.12. To conclude, the overall best score was achieved by looking at the presence of an accent for short phrases. Here, the presence alone is a useful signal for determining whether the markable is a coreference anaphor. The second best score was achieved by determining whether a nuclear accent is contained in the markable, without limiting the feature to short NPs. For the two main features, the most important point is that prosodic information was beneficial in every setting, whether it was based on manual or on automatically predicted prosodic information.

Analysis

In the following, we discuss two examples from the DIRNDL dataset that provide some insight into how the prosodic features helped coreference resolution in our experiments. The first example is shown in Figure 7.6. The coreference chain marked in this example was not predicted by the baseline version. With prosodic information, however, the fact that the NP der Koalition (the coalition) is deaccented helped the resolver to recognise that this was given information: it refers to the recently introduced antecedent der Großen Koalition (the grand coalition). This effect clearly supports our assumption that the absence of pitch accents helps for short NPs. An additional effect of adding prosodic information that we observed concerns the length of antecedents determined by the resolver. In several cases, e.g. in Example (5), the baseline system incorrectly chose an embedded NP (1A) as the antecedent for a pronoun. The system with access to prosodic information correctly chose the longer NP (1B).5


EXPERTEN {der Großen KOALITION}1 haben sich auf [...] ein Niedriglohn-Konzept VERSTÄNDIGT.
Experts (of) the grand coalition have themselves on a low wage concept agreed.
Die strittigen Themen [...] sollten bei der nächsten Spitzenrunde {der Koalition}1 ANGESPROCHEN werden.
The controversial topics shall at the next meeting (of) the coalition raised be.

EN: Experts within the grand coalition have agreed on a strategy to address [problems associated with] low income. At the next meeting the coalition will talk about the controversial issues.

Figure 7.6.: The relation between coreference and prominence: example from the DIRNDL dataset with English translation. The candidate NP (anaphor) of the coreference chain in question is marked in boldface, the antecedent is underlined. Pitch accented words are capitalised.

Our analysis confirms that this is due to the accent on the short NP (on Phelps). The presence or absence of a pitch accent on the adjunct NP (on USA) does not appear to have an impact.

(5) {{Michael PHELPS}1A aus den USA}1B. {Er}1 ... Michael Phelps from the USA. He ...

Further work is necessary to investigate the feature interaction and the impact on the length of the predicted antecedent.

7.9. Conclusion and future work

We have shown that enhancing the text-based feature set for a coreference resolver, consisting of e.g. automatic part-of-speech (POS) tags and syntactic information, with pitch accents and prosodic phrasing information helps to improve coreference resolution of German spoken text. Our results on the basis of manual prosodic labelling show that the presence of an accent is a helpful feature in a machine-learning setting. Including prosodic boundaries and determining whether the accent is the nuclear accent also improves results when applying the feature to all NPs (including complex NPs).

5The TüBa-D/Z guidelines state that the maximal extension of the NP should be chosen as the markable. http://www.sfs.uni-tuebingen.de/fileadmin/static/ascl/resources/tuebadz-coreference-manual-2007.pdf


We show that using prosodic labels that have been obtained automatically also significantly improves the performance of a coreference resolver. In this work, we predict these labels using a CNN model and use them as additional features. Despite the quality of the predicted labels being slightly lower than that of the gold labels, we are still able to achieve significant improvements. This encouraging result also confirms that prosodic information is not only helpful to coreference resolution but also has a positive effect even when predicted by a system. We interpret this as a promising result, which motivates further research on the integration of coreference resolution and spoken language. As a first step, our results on German spoken text are promising, and we expect them to be generalisable to other languages with similar prosody.

8. Integrating predictions from neural-network relation classifiers into coreference and bridging resolution

Research Question 4: Linguistic validation experiments With tools and data being available, do theoretical assumptions about the tasks hold true on actual data? Can we use the theoretical notions to improve the tools?

The second validation experiment concerns both coreference and bridging resolution and investigates whether automatically predicted semantic knowledge can be used to improve the two tasks. The most difficult cases in NP coreference are those which require semantic knowledge to infer the relation between the anaphor and the antecedent, as in Example (1), where we need to know that Malaria is a disease.

(1) Malaria is a mosquito-borne infection. The disease is transmitted via a bite ...

Bridging resolution always requires semantic information. For example, in order to resolve the windows in Example (2) to the room, we need to know that a room typically has windows. The relation can also be rather abstract, as shown in Example (3).

(2) I went into the room. The windows were broken.

(3) Over the first few weeks, Mancuso FBI has sprung straight from the headlines. The opening show featured a secretary of defense designate accused of womanizing.

The semantic relation information necessary for anaphora resolution is typically integrated into a system through a knowledge base, by relying on WordNet, Wikipedia or similar resources (cf. Vieira and Poesio (2000), Ponzetto and Strube (2007), a.o.). To date, few approaches have tried to integrate automatically induced information about semantic relations (e.g. Poesio et al. (2002); Feuerbach et al. (2015)). In the current study, we use state-of-the-art neural-network classifiers trained on relation benchmarks to predict semantic relations between noun pairs, and integrate the relation predictions into existing systems for coreference and bridging resolution. Two experiments with representations differing in noise and complexity improve our bridging but not our coreference resolver. This work was a collaboration between project A6 and the SemRel project headed by Sabine Schulte im Walde. The neural-net experiments as well as the resulting relation predictions were provided by Maximilian Köper and Kim-Anh Nguyen. Contributions in the respective pipelines are shown in Figures 8.1 and 8.2. Parts of this research have been published in Rösiger et al. (2018a).

Figure 8.1.: Contribution and workflow pipeline for coreference: validation, part 2

8.1. Relation hypotheses

Coreference signals a relation of identity, so we assume that coreference resolution should benefit from relations that link identical or highly similar entities. Obviously, synonymy is a member of this set of relations, as exemplified in Example (4):

(4) I live on Shortland Street. The road will be closed for repair work next week.


Figure 8.2.: Contribution and workflow pipeline for bridging: validation

Hypernymy can also be used to refer to a previously introduced entity, as in Example (5):

(5) My neighbour’s dog has been getting on my nerves lately. The stupid animal kept barking all night.

Note that the direction of this relation is important, as we can introduce a hyponym and then later refer to it via a hypernym, but not vice versa. However, in news text one might find a writing style in which a hypernym is later referred to via a hyponym, as in Example (6).

(6) Today we are celebrating a great athlete. The Olympic swimmer has always been one of our personal favorites.

The relations between a bridging anaphor and its antecedent are assumed to be more diverse. The prototypical bridging relation is represented by meronymy:

(7) My car broke down yesterday. It turned out to be a problem with the engine.


However, other relations come into play, too, such as attribute-of and part-of-event (Hou, 2016b).

8.2. Experimental setup

This section describes the data, tools and evaluation metrics used in the two experiments.

Data We base our experiments on the OntoNotes corpus (Weischedel et al., 2011). For bridging, we use the ISNotes corpus (Markert et al., 2012). In order to obtain candidate pairs for semantic relation prediction, we consider all heads of noun phrases in the OntoNotes corpus and combine them with preceding heads of noun phrases in the same document. Due to the different corpus sizes, the generally higher frequency of coreferent anaphors and the transitivity of the coreference relation, we obtained many more coreference pairs (65,113 unique pairs) than bridging pairs (633 in total, including 608 unique pairs).
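The candidate pair extraction just described can be sketched as follows. The document representation (a mapping from document ids to NP heads in textual order) is an assumption for illustration; it is not the exact extraction code used in the experiments.

# Sketch (illustrative): pair every NP head with the heads of all preceding NPs
# in the same document.
def candidate_pairs(docs):
    pairs = set()
    for doc_id, heads in docs.items():
        for i, anaphor_head in enumerate(heads):
            for antecedent_head in heads[:i]:
                pairs.add((antecedent_head, anaphor_head))
    return pairs

docs = {"wsj_0001": ["cottage", "shed", "room", "windows"]}
print(sorted(candidate_pairs(docs)))
# [('cottage', 'room'), ('cottage', 'shed'), ('cottage', 'windows'),
#  ('room', 'windows'), ('shed', 'room'), ('shed', 'windows')]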

Bridging resolver We base our experiment on our bridging resolver presented in Section 6.1. It contains eight rules which all propose anaphor-antecedent pairs, independently of the other rules. The rules are applied in order of their precision. Apart from information on the connectivity of two nouns, which is derived from counting how often two nouns appear in a “noun1 preposition noun2” pattern in a large corpus, the tool does not contain information about semantic relations.

Coreference resolver We use the IMS HotCoref resolver (Björkelund and Kuhn, 2014) as our coreference resolver, because it allows an easy integration of new features. While its performance is slightly worse than that of the state-of-the-art neural coreference resolvers, the neural resolvers rely on word embeddings, which already implicitly contain semantic relations and would therefore obscure the effect of adding explicit relation predictions.

Evaluation metrics For coreference resolution, we report performance as the CoNLL score, version 8.01 (Pradhan et al., 2014). For bridging resolution, we report precision, recall and F1. For bridging evaluation, we take coreference chains into account, i.e. the predicted antecedent is considered correct if it is in the same coreference chain as the gold antecedent. We apply train-development-test splits, use the training and development sets for optimisation, and report performance on the test set.
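The bridging evaluation criterion can be made explicit with a small sketch. The mention ids and the representation of coreference chains as sets are assumptions for illustration only.

# Sketch (illustrative): a predicted antecedent counts as correct if it lies in
# the same coreference chain as the gold antecedent.
def antecedent_correct(predicted, gold, coref_chains):
    if predicted == gold:
        return True
    return any(predicted in chain and gold in chain for chain in coref_chains)

chains = [{"m3", "m7", "m12"}]                   # one coreference chain of mention ids
print(antecedent_correct("m7", "m12", chains))   # True: same chain
print(antecedent_correct("m7", "m5", chains))    # False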


8.3. First experiment

8.3.1. Semantic relation classification

We used the publicly available relation resource BLESS (Baroni and Lenci, 2011), containing 26,546 word pairs across six relations, including co-hyponymy/coordination, attribute, meronymy, hypernymy, and random (no relation). As classification method, we relied on the findings from Shwartz and Dagan (2016) and used a plain distributional model combined with a non-linear classifier (a neural network) based only on word representations. As many of our target word pairs rarely or never occurred together in a shared sentence, we could not integrate intervening words or paths as additional features. We took the publicly available 300-dimensional vectors from ConceptNet (Speer et al., 2017), combined the word representations with the semantic relation resource, and trained a feed-forward neural network for classification. The input of the network is simply the concatenation of the two word representations, and the output is the desired semantic relation. At test time, we present two words and output the class membership probability for each relation. In addition, we provide information about the semantic similarity of the pair by computing the cosine. We relied on the training, test and validation split from Shwartz and Dagan (2016). The hyper-parameters were tuned on the validation set; we obtained the best performance with two hidden layers of 200 and 150 neurons, respectively. As activation function, we applied rectified linear units (ReLU). We set the batch size to 100 and used a dropout rate of 20%. In Figure 8.3, we present the neural-net classifier used in the first experiment. As can be seen, only the concatenated word representations are used as input. For the pair dog-aquarium, the classifier’s output is a high membership degree for the class random, i.e. it considers the two words to be non-related. Another output that is computed directly from the word representations is the cosine similarity, which is low in this case. Figure 8.4 shows the output of the same neural net for the pair dog-animal. The output contains a high cosine similarity and a high membership degree for the relation hypernymy.
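A PyTorch sketch of the classifier described above follows. The hidden layer sizes, ReLU activations and 20% dropout are taken from the description; the number of output classes (here five, following Table 8.1), the initialisation and the optimiser are assumptions, and this is not the original implementation.

# Sketch (illustrative) of the feed-forward relation classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationClassifier(nn.Module):
    def __init__(self, emb_dim=300, n_relations=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, 200), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(200, 150), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(150, n_relations),
        )

    def forward(self, vec1, vec2):
        # Input: the concatenated word representations of the two nouns.
        logits = self.net(torch.cat([vec1, vec2], dim=-1))
        membership = torch.softmax(logits, dim=-1)          # class membership degrees
        cosine = F.cosine_similarity(vec1, vec2, dim=-1)     # second output
        return membership, cosine

model = RelationClassifier()
v1, v2 = torch.randn(1, 300), torch.randn(1, 300)
probs, cos = model(v1, v2)
print(probs.shape, float(cos))   # torch.Size([1, 5]) and the cosine of the pair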

Figure 8.3.: Neural net relation classifier: example of a non-related pair

Figure 8.4.: Neural net relation classifier: example of a hypernym pair

1The neural-net illustrations in this section were provided by Maximilian Köper.

Intrinsic Evaluation To validate that the semantic relation classification works to a sufficient degree, we performed an intrinsic evaluation.

On the test set from Shwartz and Dagan (2016), our model achieved an accuracy of 87.8%*, which is significantly2 better than the majority class baseline (i.e. the random class with 45%). Shwartz and Dagan report a weighted average F-score of 89, which is only marginally better than our reimplementation (88). While this performance seems very good and confirms the quality of our reimplementation, the work by Levy et al. (2015) pointed out that such supervised distributional models often just memorise whether a word is a prototypical example for a certain relation. Indeed, we found many of these cases in our dataset. For example, the term ‘gas’ appeared 10 times in a meronym relation in training and 4 times as a meronym in the test set. To counter this effect we conducted a second evaluation where we made sure that the training and test set contained different terms. With an accuracy of 58.6%* and a weighted mean F-score of 52, the performance of this second evaluation was still significantly better than the majority class baseline but considerably worse than the reported results on the BLESS train/test split with lexical overlap. Still, we assume that this evaluation provides a more realistic view of the relation classification. Results per relation are given in Table 8.1. It can be seen that the model is skewed towards the majority class (random), whereas in particular the hypernym relation seems to be difficult. Here we observed many false decisions between coordination and hypernymy.

Relation    P      R      F1
Random     63.7   93.8   75.9
Coord      46.6   41.2   43.7
Attri      68.9   18.7   29.4
Mero       31.1   22.4   26.0
Hyper      25.0    0.4    0.7

Table 8.1.: Results of the intrinsic evaluation on BLESS (without lexical overlap)
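One possible way to construct the overlap-free split used for this second evaluation is sketched below. The thesis does not specify the exact procedure, so the strategy of reserving a random subset of terms for testing and discarding mixed pairs is an assumption.

# Sketch (illustrative): split relation pairs so that no term occurs in both
# the training and the test data.
import random

def split_without_lexical_overlap(pairs, test_ratio=0.2, seed=0):
    terms = sorted({t for pair, _rel in pairs for t in pair})
    random.Random(seed).shuffle(terms)
    test_terms = set(terms[:int(len(terms) * test_ratio)])
    train, test = [], []
    for (w1, w2), rel in pairs:
        if w1 in test_terms or w2 in test_terms:
            # keep the pair for testing only if both words are unseen in training
            if w1 in test_terms and w2 in test_terms:
                test.append(((w1, w2), rel))
        else:
            train.append(((w1, w2), rel))
    return train, test

pairs = [(("gas", "car"), "mero"), (("dog", "animal"), "hyper"),
         (("dog", "cat"), "coord"), (("engine", "car"), "mero")]
print(split_without_lexical_overlap(pairs))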

8.3.2. Relation analysis

Before using the predicted relations for coreference and bridging resolution, we analysed the distribution of relations across the bridging and coreference pairs annotated in our corpora, as well as across all other, non-related pairs. Table 8.2 shows the average cosine similarities (COS) of these pairs. As expected, the average cosine similarity is highest for coreference pairs and a little lower for bridging pairs, but still much higher in comparison to all other pairs.

2We used the χ² test; * indicates significance at p < 0.001.

In the rows below the cosine similarity, we give the averages of the output probabilities of the classifier for each relation. Random represents the class for non-related pairs without a relation. Such non-related pairs indeed have a high score for not being in a relation, whereas coreference and bridging pairs have lower scores in this category. Both coreference and bridging pairs have high meronym values, which is surprising for the coreference pairs. Bridging pairs also have a higher coordination value (i.e. co-hyponymy), and a slightly higher value for hypernymy.

          Coref pairs   Bridging pairs   Other pairs
COS          0.26           0.19            0.05
Random       0.39           0.49            0.78
Coord        0.22           0.13            0.03
Attri        0.07           0.07            0.06
Mero         0.22           0.23            0.10
Hyper        0.09           0.07            0.02

Table 8.2.: Average cosine similarities and relation classifier probabilities for coreferent and bridging pairs in comparison to other pairs of nouns, experiment 1

8.3.3. Relations for bridging resolution

                 without cosine threshold                  with cosine threshold of 0.2
Relation       Correct  Wrong  Precision  Recall  F1      Correct  Wrong  Precision  Recall  F1
Baseline          -       -      59.82     10.58  18.0       -       -       -         -      -
Coord             5      41      45.57     11.37  18.20      5      32      48.3      11.37  18.41
Attri             3      46      43.48     11.06  17.63      2       8      56.56     10.9   18.28
Mero             14     101      35.69     12.80  18.84     14      36      50.00     12.80  20.38
Hyper             2       7      57.02     10.90  18.3       2       4      58.47     10.9   18.38
Not random       17     105      35.90     13.27  19.37     15      54      45.3      12.95  20.15

Table 8.3.: Correct and wrong bridging pairs which are found by the additional semantic rule, with and without additional cosine threshold constraint (> 0.2)

As short, unmodified NPs are generally considered useful bridging anaphor candidates, because they often lack an antecedent in the form of an implicit modifier, we add the following new rule to our bridging resolver: search for an unmodified NP of the form “the N”, e.g. the advantages. As bridging antecedents typically appear in a rather close window (Hou, 2016b), we search for an antecedent within the last three sentences. As bridging pairs have a higher cosine value than non-related pairs, we experiment with an additional cosine similarity constraint: a pair is only proposed if it is in the respective relation and its cosine similarity is greater than 0.2. Table 8.3 shows the results for the different relations as well as the versions with and without the cosine similarity threshold; the effect of the threshold is explored further in Table 8.4. Note that the correct and wrong counts in both tables refer only to the bridging pairs proposed by the newly added semantic rule, not to the system output as a whole.

Threshold   Correct   Wrong   P       R       F1
0.15          16        56    44.20   12.64   19.66
0.20          14        36    50.00   12.80   20.38
0.25          10        26    52.03   12.16   19.72
0.30           2        22    50.74   10.90   17.95

Table 8.4.: Effect of the cosine threshold constraint, for the relation meronymy

Meronymy seems to be the best predictor for bridging, with a significant gain of 2.38% in F1 score3, followed by the not-random version. The precision decreased slightly, but since the rule was designed to increase recall, this is acceptable. In the best setting (meronymy, cosine threshold of 0.2), we now find 14 additional correct pairs, for example:

(8) IBM said it expects industrywide efforts to become prevalent because semiconductor manufacturing has become so expensive. A state-of-the-art plant cost $40 million in the mid-1970s but costs $500 million today because the technology is so complex.

We also find 36 more wrong pairs, for example:

(9) In the 1980s, the Justice Department and lower federal courts that enforce the Voting Rights Act have required state legislatures and municipal governments to create the maximum number of “safe” minority election districts – districts where minorities form between 65% and 80% of the voting population.

3We compute significance using the Wilcoxon signed rank test (Siegel and Castellan, 1988) at the 0.05 level.


The reasons for a wrongly proposed candidate pair can be two-fold: (i) the relation predicted by the classifier is wrong and there is no actual relation between the two words, or (ii) the relation predicted by the classifier is correct but the anaphoricity criterion is not met. As ISNotes contains solely referential bridging pairs, a meronymy relation alone is not sufficient for the annotation of a bridging pair.
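To summarise the new rule in code form, the following sketch proposes an antecedent for an unmodified definite NP. It is our own illustration, not the resolver's implementation; the mention representation, the treatment of the sentence window and the way the trained classifier is wrapped (predict_relation) are assumptions.

# Sketch (illustrative) of the added semantic rule for bridging.
def semantic_rule(anaphor, candidates, predict_relation,
                  relation="mero", cos_threshold=0.2):
    """anaphor: dict with keys 'head', 'determiner', 'modified', 'sent_id'.
    candidates: preceding mentions with keys 'head' and 'sent_id'.
    predict_relation: returns (best_relation, probabilities, cosine) for a pair."""
    if anaphor["determiner"] != "the" or anaphor["modified"]:
        return None
    # antecedent window: current sentence plus the three preceding ones (assumption)
    window = [c for c in candidates
              if 0 <= anaphor["sent_id"] - c["sent_id"] <= 3]
    best = None
    for cand in window:
        rel, probs, cos = predict_relation(cand["head"], anaphor["head"])
        if rel == relation and cos > cos_threshold:
            if best is None or probs[relation] > best[1]:
                best = (cand, probs[relation])
    return best[0] if best else None

def dummy_predict(antecedent_head, anaphor_head):
    # stand-in for the trained classifier from Section 8.3.1
    probs = {"mero": 0.6, "random": 0.2, "coord": 0.1, "attri": 0.05, "hyper": 0.05}
    return max(probs, key=probs.get), probs, 0.35

anaphor = {"head": "windows", "determiner": "the", "modified": False, "sent_id": 2}
candidates = [{"head": "room", "sent_id": 1}, {"head": "garden", "sent_id": 0}]
print(semantic_rule(anaphor, candidates, dummy_predict))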

8.3.4. Relations for coreference resolution

We used the following features in the resolver:

• Random as the highest class: a Boolean feature which returns true if the random class got assigned the highest value of all the relations.

• Cosine binned into low, middle, high: a binned version of the cosine similarity. We experimented with two different binnings, {0–0.3, 0.3–0.49, >0.49} and {0–0.3, 0.3–0.6, >0.6}.

• Relation with the highest value: a multi-value feature with six potential values: none, mero, coord, attri, hyper and random. The class with the highest value is returned.
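These three features can be derived from the classifier output as in the following sketch (illustrative only; the probability dictionary uses the relation names from Table 8.2 and is not the resolver's actual feature code).

# Sketch (illustrative) of the three coreference features.
def random_is_highest(probs):
    return max(probs, key=probs.get) == "random"

def binned_cosine(cos, bins=(0.3, 0.49)):
    if cos <= bins[0]:
        return "low"
    return "middle" if cos <= bins[1] else "high"

def highest_relation(probs):
    return max(probs, key=probs.get) if probs else "none"

probs = {"random": 0.39, "coord": 0.22, "attri": 0.07, "mero": 0.22, "hyper": 0.09}
print(random_is_highest(probs), binned_cosine(0.26), highest_relation(probs))
# True low random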

We added one feature at a time and analysed the change in CoNLL score. The results are not shown in detail, as the score decreased in every version. For coreference resolution, where the baseline performance is already quite high, the additional semantic information thus does not seem to improve results. This is in line with Björkelund and Kuhn (2014), where integrating a WordNet synonym/hypernym lookup did not improve the performance, as well as Durrett and Klein (2013), where increased semantic information was not beneficial either.

8.4. Second experiment

The first experiment had a few major shortcomings. First, we did not have lemmatised vectors, and as a result, singular and plural forms of the same lemma had different values. Sometimes, this led to the wrong analysis, as in Example (10), where the singular and plural versions of novel make different predictions, and where a lemmatised version would have preferred the correct antecedent:


Word 1       Word 2    COS    coord   attri   mero
characters   novel     0.35   0.69    0.02    0.27
characters   novels    0.43   0.28    0.05    0.38

(10) In novels of an earlier vintagepredicted, David would have represented excitement and danger; Malcom, placid, middle-class security. The irony in this novelgold is that neither man represents a “safe” middle class haven: Nora’s decision is between emotional excitement and emotional security, with no firm economic base. The characters confront a world in which it seems increasingly difficult to find a “middle way” between the extremes of success and failure, ...

Second, many proper nouns were assigned zero values, as they were not covered by our vector representations. These pairs thus could not be used in the new rule. Third, the relations in the benchmark dataset BLESS do not completely match our hypotheses, as synonymy for example is not included. We thus designed a second experiment to overcome these shortcomings.

8.4.1. Semantic relation classification

To address the problem of out-of-vocabulary words, we relied on fastText (Bojanowski et al., 2017), which uses subword information to create representations for unseen words. We created 100-dimensional representations by applying a window of 5 to a lemmatised and lower-cased version of DECOW14 (Schäfer, 2015). The semantic relations were induced from WordNet (Fellbaum, 1998) by collecting all noun pairs from the relations synonymy, antonymy, meronymy, hyponymy and hypernymy. To obtain a balanced setup, we sampled 2,010 random pairs from each relation, and in addition, we created random pairs without relations across files. Hyper-parameters of the neural network were identical to the ones used in the first experiment, as shown in Figure 8.5.
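The WordNet-based pair extraction and balancing can be sketched with NLTK as follows. This is an illustration under assumptions: the restriction to part meronyms, the exclusion of multiword lemmas and the sampling procedure are our own choices, not necessarily those used in the experiment.

# Sketch (illustrative): induce labelled noun pairs from WordNet and sample a
# balanced subset. Requires the NLTK WordNet data (nltk.download('wordnet')).
import random
from nltk.corpus import wordnet as wn

def wordnet_pairs():
    pairs = {"syn": set(), "ant": set(), "mero": set(), "hypo": set(), "hyper": set()}
    for s in wn.all_synsets(pos="n"):
        lemmas = [l.name() for l in s.lemmas() if "_" not in l.name()]
        for w1 in lemmas:                      # synonyms: lemmas of the same synset
            for w2 in lemmas:
                if w1 != w2:
                    pairs["syn"].add((w1, w2))
        for l in s.lemmas():                   # antonyms via lemma relations
            for ant in l.antonyms():
                pairs["ant"].add((l.name(), ant.name()))
        # (w1, w2) where w2 belongs to the related synset; directions are assumptions
        for rel, getter in (("hyper", s.hypernyms), ("hypo", s.hyponyms),
                            ("mero", s.part_meronyms)):
            for other in getter():
                for w1 in lemmas:
                    for l2 in other.lemmas():
                        if "_" not in l2.name():
                            pairs[rel].add((w1, l2.name()))
    return pairs

def balanced_sample(pairs, n=2010, seed=0):
    rng = random.Random(seed)
    return {rel: rng.sample(sorted(ps), min(n, len(ps))) for rel, ps in pairs.items()}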

Intrinsic evaluation We obtained a similar performance as before, an accuracy of 55.8%* (exp1: 58.6) and a weighted mean F-score of 55 (exp1: 52). Results per relation are shown in Table 8.5. Interestingly, the performances with respect to the individual relations differ strongly from the first experiment. In this second experiment, with balanced relations, meronymy and antonymy are detected well, whereas the random class performs worse.

Figure 8.5.: Neural net relation classifier in the second experiment

4Again, this graphic was provided by Maximilian Köper.

Relation    P      R      F1
Random     56.7   39.0   46.2
Ant        70.0   83.4   76.3
Syn        46.3   46.5   46.4
Mero       62.1   69.5   65.6
Hyper      48.9   49.1   49.0
Hypo       47.5   47.6   47.6

Table 8.5.: Results of the intrinsic evaluation on WordNet

8.4.2. Relation analysis

Table 8.6 shows that, unexpectedly, the probabilities of the coreference and bridging pairs in comparison to other pairs differ much less than in the first experiment.

8.4.3. Relations for coreference and bridging resolution

The two setups for integrating the relation classification into bridging and coreference resolution were exactly the same as in the first experiment. The outcome is, however, a little disappointing. The baseline system for bridging resolution was only improved in one condition, for the relation meronymy and with a cosine threshold of 0.3, reaching an F1 of 18.92 (in comparison to 20.38 in the first experiment).


          Coref pairs   Bridging pairs   Other pairs
COS          0.38           0.31            0.22
Random       0.13           0.15            0.21
Mero         0.18           0.15            0.17
Hyper        0.25           0.23            0.23
Hypo         0.20           0.27            0.19
Syn          0.16           0.15            0.15
Ant          0.08           0.06            0.05

Table 8.6.: Average relation classifier probabilities and cosine similarities for coreferent and bridging pairs in comparison to other pairs of nouns, experiment 2

Regarding coreference resolution, we did not obtain any improvements over the baseline, as in the first experiment. These results correspond to the less clear differences in the relation analysis (cf. Table 8.6), but they are unexpected because, in our opinion, the setup of experiment 2 was clearly improved over that of experiment 1 with regard to the task requirements.

8.5. Final performance of the bridging tool

While the coreference resolver could not benefit from the additional semantic knowledge, the bridging tool’s performance increases, as shown in the previous experiments. The final performance of the bridging resolver is given in Table 8.7. We also show that the added features work best if we include coreference information (gold or predicted), as illustrated in Table 8.8.

                              Anaphor recognition       Full bridging resolution
                              P      R      F1          P      R      F1
Without semantic relations    79.6   14.1   23.9        59.8   10.6   18.0
With predicted meronymy       71.6   18.3   29.2        50.0   12.8   20.4

Table 8.7.: Final performance of the English bridging system


Setting                  Precision   Recall   F1
No coreference           32.0        12.8     18.3
Predicted coreference    47.9        12.8     20.2
Gold coreference         50.0        12.8     20.4

Table 8.8.: Final performance of the English bridging system with different coreference information, gold mention setting

8.6. Discussion and conclusion

As the data for which we predicted the relations do not contain labelled relations that match the categories in our hypotheses, it is difficult to assess how well the classifiers work on this data. Despite the fact that we applied state-of-the-art methods, annotating at least a small part of the data would be necessary to assess the quality of the predictions. Our analysis shows that while some of our hypotheses have been confirmed, e.g. that meronymy is the most important relation for bridging and can be used to improve the performance of a bridging resolver, the distribution of the relations in actual corpus data seems to be more complex than our initial hypotheses suggested: we find, for example, cases of meronymy also among the coreference pairs.

For some of the relations, the missing direction can be problematic. In Example (11), the goal of the bridging resolver is to find an antecedent for the city's. The city itself has not been introduced before, only the Marina neighborhood (the gold antecedent). As no direction is encoded in our data, we obtain a high meronymy score for resident - city, although the part-of relation clearly holds in the other direction: one can introduce a city and then talk about parts of it, e.g. the residents, but not the other way around. This information is unfortunately not given in the data. The pair neighborhood - city has a low meronymy score and a high score for coord (co-hyponymy).

(11) In the hard-hit Marina neighborhood [gold], life after the earthquake is often all too real, but sometimes surreal. Some scenes: Saturday morning, a resident [predicted] was given 15 minutes to scurry into a sagging building and reclaim what she could of her life's possessions. Saturday night she dined in an emergency shelter on salmon steaks prepared by chefs from one of the city's four-star restaurants.
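
If the relation data encoded direction, a resolver could rule out such pairs with a simple check along the lines of the hypothetical sketch below, in which part_of_prob(x, y) is assumed to return the probability that x is a part of y; both the function and the toy scores are invented for illustration.

    def directed_meronymy_ok(anaphor_head, antecedent_head, part_of_prob):
        # Accept the pair only if the anaphor is plausibly a *part* of the
        # antecedent, not the other way around.
        forward = part_of_prob(anaphor_head, antecedent_head)
        backward = part_of_prob(antecedent_head, anaphor_head)
        return forward > backward and forward > 0.5

    # With directed scores, "the city's" would not be linked to "resident":
    toy_part_of = {("resident", "city"): 0.8, ("city", "resident"): 0.1}
    prob = lambda x, y: toy_part_of.get((x, y), 0.0)
    print(directed_meronymy_ok("resident", "city", prob))  # True
    print(directed_meronymy_ok("city", "resident", prob))  # False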

As the performance for coreference resolution is already quite high, the predicted relations did not improve it further.

For bridging resolution, however, the performance is typically low, and further work on finding general cases of bridging seems promising.


9. Conclusion

9.1. Summary of contributions

The aim of this thesis is to improve coreference and bridging resolution, both on the theoretical and the computational level. In this section, we summarise the contributions presented in this thesis.

A refined bridging definition The need for the first contribution became evident during bridging experiments on available corpora: our bridging resolver did not generalise well across corpora due to the very different types of bridging annotated in these resources. We introduced the term referential bridging to cover two types of bridging on the level of referring expressions: (i) argument slot filling (the wheel (of the car)) and (ii) referential subset expressions (the small pug (out of the previously mentioned group of dogs)). In both cases, context-dependence is the main criterion for referential bridging. This is not the case for lexical or lexically induced bridging, where an anaphoric or non-anaphoric expression stands in some relation with a previously introduced entity. This relation typically holds on the word level or models a real-world relation between the underlying concepts (Europe - Spain). One special case that has sometimes been annotated as bridging is that of non-referential subset cases, where a non-anaphoric expression is a subset or a superset of a previously introduced entity (computers - small computers). These are cases of lexical givenness, as the head word is considered lexically given.

Three new bridging corpora To overcome the lack of available data with compatible bridging definitions, we have annotated three medium-sized corpora: a newspaper (in-domain) corpus of about 57k tokens called BASHI and a scientific (out-of-domain) corpus of 61k tokens called SciCorp. For German, we have annotated a radio interview corpus called GRAIN, containing 20 interviews of about 10 minutes each.

SciCorp and GRAIN were also annotated with coreference information, while BASHI already contained coreference annotations, since we used articles from the OntoNotes corpus.

A state-of-the-art coreference resolver for German Our adaptation of an English data-driven coreference resolver to German mainly focused on features designed to capture specificities of German. The adapted tool achieves state-of-the-art performance on the German benchmark dataset TüBa-D/Z and enables us to do linguistic validation experiments on the use of prosodic features for coreference resolution. Table 9.1 shows the performance on TüBa-D/Z version 8 in comparison to other resolvers.1

System                    CoNLL gold   CoNLL regular
IMS HotCoref DE (open)    63.61        48.61
CorZu (open)              58.11        45.82
BART (open)               45.04        39.07
SUCRE (closed)            51.55        36.32
TANL-1 (closed)           20.39        14.17

Table 9.1.: Comparison of different German coreference systems

A state-of-the-art bridging resolver for English Based on existing work on rule-based bridging resolution and motivated by the lack of an openly available bridging resolver, we have developed a system for full bridging resolution for English that achieves state-of-the-art performance. We have shown that filtering out gold or automatically predicted coreference before performing the bridging resolution step improves performance. Coreference information is helpful because bridging and coreference anaphors are difficult to distinguish, as they are both typically short, often definite expressions. Table 9.2 shows the performance of the bridging resolver as well as the effect of coreference.

Setting                  Precision   Recall   F1
No coreference           49.6        10.6     17.4
Predicted coreference    59.8        10.6     18.0
Gold coreference         59.8        10.6     18.0

Table 9.2.: Performance of the English bridging system

1 The performance on the newest TüBa-D/Z version 10 is presented in Section 5.2.3.


The resolver generalises well to other corpora if they contain referential bridging as annotated in ISNotes, the corpus on which the original system was designed. We have also proposed extensions of the system that can handle lexical bridging and lexical givenness, and we have compared the system against a learning-based approach.
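
The coreference filtering summarised above can be pictured as a simple pre-processing step: mentions that a coreference resolver has already identified as anaphoric are removed from the pool of bridging anaphor candidates. The following sketch assumes generic mention and chain representations and is not the actual system code.

    def filter_bridging_candidates(mentions, coref_chains):
        # Remove mentions that are coreference anaphors, i.e. non-first
        # members of a coreference chain, from the bridging candidate set.
        coref_anaphors = set()
        for chain in coref_chains:
            coref_anaphors.update(chain[1:])
        return [m for m in mentions if m not in coref_anaphors]

    mentions = ["the company", "it", "the profits", "the CEO"]
    chains = [["the company", "it"]]
    print(filter_bridging_candidates(mentions, chains))
    # -> ['the company', 'the profits', 'the CEO']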

A state-of-the-art bridging resolver for German We have developed the first publicly available bridging resolver for German, a rule-based system that achieves state-of-the-art performance on the DIRNDL corpus. We show that, again, filtering out gold or automatically predicted coreference anaphors improves performance, as presented in Table 9.3.

Setting                  Precision   Recall   F1
No coreference           21.4        9.2      12.8
Predicted coreference    22.4        9.2      13.0
Gold coreference         31.9        9.2      14.2

Table 9.3.: Performance of the German bridging resolver (on DIRNDL)

Prosodic information improves coreference resolution Our linguistic validation experiments have shown that both manually annotated and automatically predicted prosodic information improves coreference resolution. We showed that the presence of a pitch accent is a useful feature in a learning-based setting, and that including prosodic boundaries and inferring nuclear accents improves the performance for complex NPs. Surprisingly, the drop in performance was small when training the system on gold pitch accents and applying it to automatically predicted pitch accents in unseen texts. This is a promising result and shows that this strategy can also be used in an application scenario. Table 9.4 shows the effect of our two main features, pitch accent presence and nuclear accent presence.


Baseline                       46.11
                               short NPs   all NPs
+ Accent presence
  + gold                       53.99       49.68
  + gold/auto                  52.63       50.08
  + auto                       49.13       49.01
+ Nuclear accent presence
  + gold                       48.63       52.12
  + gold/auto                  48.46       51.45
  + auto                       48.01       50.64

Table 9.4.: Performance of pitch accent and nuclear accent presence (in CoNLL score)

Automatically predicted meronymy improves bridging resolution We have shown that meronymy, as predicted by a state-of-the-art neural-net relation classifier, improves bridging resolution, as shown in Table 9.5. Our results indicate that the often-made assumption that meronymy is the prototypical bridging relation holds true in our data, as it was the only relation with which we could improve our bridging resolver. As the bridging antecedent and anaphor are generally thought to be related, i.e. to have a high similarity in a word vector space, adding a cosine similarity threshold also improved results.

                             Anaphor recognition      Full bridging resolution
                             P      R      F1          P      R      F1
Without semantic relations   79.6   14.1   23.9        59.8   10.6   18.0
With predicted meronymy      71.6   18.3   29.2        50.0   12.8   20.4

Table 9.5.: Final performance of the English bridging system

9.2. Lessons learned

Over the course of this thesis, our understanding of the problems involved in coreference and bridging resolution has continuously grown. The main contributions have already been presented in the last section, and many of the smaller, more detailed issues as well as the suggested solutions are already contained in the individual chapters. In this section, we want to reflect on some of the more meta-level lessons we learned during the preparation of this thesis.


Is the concept of bridging too vague to be modelled? Yes and no. When we started our experiments on bridging, the type of bridging we had in mind was referential bridging as annotated in ISNotes, where non-identical context dependence is the main criterion for bridging. When working on available corpora such as ARRAU, we noticed that many different phenomena had been annotated as bridging. We think that our introduction of the concepts referential and lexical bridging helps to make the bridging definition clearer, and we hope that this very important distinction will also be acknowledged in the creation of new data or in validation checks for existing corpora. With the current state of bridging annotations, combining several corpora in order to have more data is not advisable, as the corpora contain all kinds of contradictory phenomena.

If we concentrate on one type of bridging, namely referential bridging, the annotations in the corpora that only contain this type of bridging (ISNotes or BASHI) show an inter-annotator agreement (κ) of about 0.6. Thus, we think that the definition of referential bridging is clear enough to be modelled automatically, although the kappa values will always only be moderate (or borderline substantial), as the annotations depend very much on subjective interpretations of the text. The largest problem remains that the corpora annotated with this type of bridging contain only a few bridging pairs.

This is different in corpora such as ARRAU, where the focus was set on specific pre-defined relations independently of context-dependence. These are typically easier and faster to annotate and, as a result, the corpus also contains many more so-called bridging pairs. However, since referential bridging, lexical bridging and non-anaphoric subset relations are not distinguished in this corpus, we would argue that the bridging definition there is too vague to be modelled and that the annotations should be enriched with a layer which tells us which expressions are actually anaphoric and which are not. The non-anaphoric lexical bridging and subset cases are a completely different task and have much in common with the prediction of semantic relations between words, which is an NLP task in its own right that has received a lot of attention over the last years.

Should we think more about annotation decisions and limitations before doing corpus annotation? Yes, definitely. Before deciding on serious annotation limitations, such as limiting bridging anaphors to definite NPs or only annotating nominal antecedents, we should reflect more on the consequences of these limitations.

Antecedents can, for example, always be labelled as non-nominal and filtered out if desired, but if they are not annotated in the first place, the cost of re-annotating them later will be much higher. When combining several corpora, excluding cases that are labelled as such is much easier than working with corpora where certain things are not annotated at all. In the worst case, the result is non-compatible corpus resources.

At the start of this thesis, we were heavily influenced by theoretical studies assuming that indefinite NPs can be interpreted without context because they introduce new information. As a result, and also because we thought this would make the annotation process easier, we decided to restrict bridging anaphors in SciCorp and GRAIN to definite NPs. It turned out that it was sometimes difficult for the annotators to decide which markables were definite expressions (for example in cases involving bare singulars), so this decision complicated rather than facilitated the annotation process. In the meantime, many examples have convinced us that indefinite expressions can also be bridging anaphors (Starbucks – an employee), and we would suggest not making such restrictions when annotating anaphoric phenomena. More generally, introducing meaningful extra labels that might not seem necessary now can help the compatibility or later use of the corpus resources.

Are linguistic validation experiments still helpful in the age of neural-net models? In our case, yes. Besides contributing an interesting applied perspective on the theoretical claims, both our experiments also managed to improve the performance of the respective tools: our coreference resolver could be improved by including prosodic information, and our bridging resolver benefitted from automatically predicted meronyms. This shows that in the age of neural-net models based on word embeddings, linguistic information can still help enable state-of-the-art resolvers.

Should coreference and bridging be learned jointly? Learning coreference and bridging jointly remains an interesting and promising idea that is unfortunately difficult to put into practice due to the lack of bridging data. Despite the fact that this thesis is concerned with both coreference and bridging resolution, you might have noticed that the tasks are treated in separate chapters and that there is not much interaction except the occasional filtering out of coreference anaphors before performing bridging resolution. The idea to model coreference and bridging in the same framework and to learn them in a joint setting was one of the main ideas we had at the start of preparing this thesis.

Learning these two rather closely related anaphoric tasks jointly makes sense because they are two sides of anaphora resolution that involve similar steps: first, one has to determine that an expression is anaphoric, and in a second step, the best fitting antecedent has to be selected. Coreference and bridging anaphors are also similar in appearance and are as a result often confused by an automatic system, as they are often definite expressions and in any case typically short. Filtering out coreference anaphors before performing bridging resolution has proven to improve results in our experiments. We are confident that removing bridging anaphors from the set of potential coreference anaphor candidates would also improve results for coreference resolution. The antecedent search principles applied in both tasks are also similar, and there is a huge overlap in terms of factors that determine the salience of an antecedent, such as recency or grammatical roles.

Grishina (2016) presented interesting correlations between coreference clusters and bridging anaphors, for example the fact that 56% of all clusters have associated bridging markables, and that clusters connected to a bridging markable differ in average size (6.1 markables) from non-bridging clusters (2.4 markables). In their data, the largest bridging cluster contained 22 markables, while the largest non-bridging cluster only contained 9 markables. This means that a cluster connected to a bridging markable is usually larger than an average cluster. These differences could be exploited in a joint learning setting. In terms of evaluating bridging, there is also the dependency that a predicted bridging antecedent does not have to be the exact gold antecedent, as long as the two are in the same coreference chain.
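
This lenient antecedent scoring can be stated very compactly; the sketch below assumes that coreference chains are given as plain lists of mentions and is only meant to make the rule explicit, not to reproduce an official scorer.

    def antecedent_correct(predicted, gold, coref_chains):
        # A predicted bridging antecedent counts as correct if it is the gold
        # antecedent itself or belongs to the same coreference chain.
        if predicted == gold:
            return True
        return any(predicted in chain and gold in chain for chain in coref_chains)

    chains = [["the great hall", "it", "the hall"]]
    print(antecedent_correct("the hall", "the great hall", chains))  # True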

We assume that the latent tree approach as used in our coreference resolution system is particularly well suited for this purpose. Figure 9.1 shows the structure for Example (1). As explained in Section 3.1, the approach assumes a hidden structure underlying the data in the form of latent trees. Besides the coreference pairs, one could also integrate the bridging pairs into the latent tree with a different type of relation (the two bridging anaphors are marked in green and the bridging relation is illustrated with green arrows). This way, we can not only learn from the relation between the bridging anaphor and its antecedent (as in the pair-based learning approach in our bridging chapter), but can also make use of information coming from the coreference cluster of the antecedent. For example, the great hall could be in a coreference cluster with other head words such as room or atrium. This could help establish that the little door is a part of the great hall, as room and door are rather prototypical examples of a whole-part relation. Note that we do not show all mentions and coreference chains, but focus on a few to make our point.


[Figure: a latent tree with a Root node dominating the mentions Alice, The White Rabbit and The great hall; coreference edges link She and It to their antecedents, and bridging edges link the glass table and the little door to The great hall.]

Figure 9.1.: A data structure based on latent trees for the joint learning of coreference and bridging

(1) It was the White Rabbit, trotting slowly back again, and looking anxiously about as it went, as if it had lost something; and Alice heard it muttering to itself [...] Alice guessed in a moment that it was looking for the fan and the pair of white kid gloves, and she very good-naturedly began hunting about for them, but they were nowhere to be seen – everything seemed to have changed since her swim in the pool, and the great hall, with the glass table and the little door, had vanished completely.
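
A minimal sketch of how such a structure could be represented is given below, with a dummy root and two distinct edge types for coreference and bridging; the mention strings are taken from Example (1), but the class itself is only an illustration, not the data structure used in our resolver.

    from dataclasses import dataclass, field

    @dataclass
    class LatentTree:
        # Edges are (head, dependent, type) triples; 'coref' edges link a
        # mention to its latent antecedent (or to the dummy root for
        # discourse-new mentions), 'bridging' edges link a bridging anaphor
        # to its antecedent.
        root: str = "ROOT"
        edges: list = field(default_factory=list)

        def add(self, head, dependent, edge_type):
            self.edges.append((head, dependent, edge_type))

        def cluster_of(self, mention):
            # Follow coref edges upwards to collect the mention's cluster.
            cluster, frontier = {mention}, [mention]
            while frontier:
                m = frontier.pop()
                for head, dep, t in self.edges:
                    if t == "coref" and dep == m and head != self.root and head not in cluster:
                        cluster.add(head)
                        frontier.append(head)
            return sorted(cluster)

    tree = LatentTree()
    tree.add("ROOT", "The White Rabbit", "coref")  # discourse-new
    tree.add("ROOT", "the great hall", "coref")
    tree.add("The White Rabbit", "It", "coref")
    tree.add("the great hall", "the glass table", "bridging")
    tree.add("the great hall", "the little door", "bridging")
    print(tree.cluster_of("It"))  # -> ['It', 'The White Rabbit']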

We performed experiments using this data structure and the ISNotes corpus in some prototypical tests. However, with the 633 bridging pairs in ISNotes, the outcome was a bit disappointing: the data is just too small to learn the complex dependencies that exist between the two tasks, and as a result, we could not make the approach work, which is why this experiment is not included in the main chapters of this thesis. We assume that a positive effect on coreference resolution would require a very large bridging corpus. As the state of the art for bridging resolution is not as advanced as for coreference resolution, improvements for bridging resolution could probably be achieved with a smaller bridging corpus, but we would still need a corpus of a much larger size than the ones which are currently available. Even when combining the ISNotes corpus with the newly created BASHI corpus, the data size is still too small. In order to improve bridging resolution, there are also other interactions which could be exploited. In previous research on coreference resolution, e.g. in Lassalle and Denis (2015), anaphoricity and coreference were learned jointly.

A similar approach could be exploited for bridging resolution, for example to jointly learn anaphoricity and certain semantic relations, as these are the two requirements for referential bridging. When larger bridging corpora become available, learning coreference and bridging jointly seems like a promising direction for future research. Of course, one could then also use a different data structure than the one presented here, for example in a deep learning approach. With the current lack of bridging data, the answer to this question is unfortunately that learning coreference and bridging jointly with only little bridging data does not seem to work.

9.3. Future work

In this section, we discuss ideas for future work based on our contributions in this thesis.

Create a large-scale corpus for referential bridging Having created BASHI as a medium-sized corpus annotated with referential bridging, BASHI and ISNotes together now contain about 1000 bridging pairs. This is enough to perform experiments using learning-based methods, but for being able to generalise well, and for neural-net approaches, where we typically need about as many data points as parameters in the neural net, it might still be too little data. Therefore, a large-scale corpus of referential bridging would greatly benefit further development in bridging resolution.

Apply neural-net approaches to coreference and bridging resolution During the preparation of this thesis, neural-net approaches emerged in the field of coreference resolution and have replaced other learning-based approaches as the state of the art (the first approach was presented in Clark and Manning (2016b)). For German, to the best of our knowledge, no one has applied this method to TüBa-D/Z. As this is a rather large benchmark dataset, the approach should also work for German. The advantage of such an approach is the rather slim feature set, which mainly consists of word embeddings and a number of rather basic features. However, systems relying heavily on lexical features such as word embeddings should also be used with caution: Moosavi and Strube (2017) warned that they generalise badly to unseen data, as there is often an overlap in terms of lexical material between the training and test set, which the lexical features then simply memorise.


Find better ways to generalise in bridging resolution One of the main issues in state-of-the-art bridging resolution is that the rules or the learned models do not generalise well to unseen bridging pairs. In the bridging chapter, we have seen that a system designed for news text does not work for other domains due to the many specific rules contained in the resolver. Our learning-based experiments also showed that, with the current features, the statistical system does not seem to generalise better than the rule-based approach. Hou (2018) has presented word embeddings based on prepositional modification patterns (wheels of the car) to capture semantic relatedness in the word embeddings. Due to the implicit semantic relations contained in the word embeddings, this works better than using plain prepositional modification patterns such as the ones used in our bridging resolver. The approach has so far only been used to select antecedents for given bridging anaphors, not for full bridging resolution. If more bridging corpora annotated with referential bridging are released in the future, using word embeddings based on specific syntactic patterns is a promising direction to be applied in a neural-net setting to resolve bridging references.
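
As a rough illustration of this direction (and not of Hou's actual model), antecedent candidates for a given bridging anaphor could be ranked by cosine similarity in such a pattern-based embedding space; in the sketch below, the vectors are toy stand-ins for embeddings trained on 'N of N' modification patterns.

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def rank_antecedents(anaphor, candidates, embeddings):
        # Rank candidate antecedent heads for a bridging anaphor head by
        # cosine similarity in a (bridging-aware) embedding space.
        scored = [(cosine(embeddings[anaphor], embeddings[c]), c)
                  for c in candidates if c in embeddings]
        return sorted(scored, reverse=True)

    # Toy vectors standing in for pattern-based embeddings:
    emb = {"wheels": np.array([0.9, 0.1, 0.2]),
           "car":    np.array([0.8, 0.2, 0.1]),
           "party":  np.array([0.1, 0.9, 0.3])}
    print(rank_antecedents("wheels", ["car", "party"], emb))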

Linguistic validation experiments Our two experiments were meant to motivate further work on the usefulness of linguistic and semantic information. As our experiments were mainly pilot studies testing the principled usefulness, follow-up experiments could provide more insight into how the information can best be applied. For prosody, it would be interesting to include the features used for pitch accent detection directly as features in the machine learning, particularly in a neural-net setting, where complicated dependencies can be learned. Of course, there are many more types of linguistic information that could be integrated. One idea would be to investigate the relation between focus/topic and coreference and bridging. This would be interesting from a theoretical point of view, but could also benefit the tools' performance, although the current state of automatic focus/topic prediction is probably not advanced enough to be applied as predicted information. However, experiments on gold-annotated focus/topic could give us an idea of how this could benefit anaphora resolution. The corpus GRAIN, for example, contains both coreference and bridging as well as focus and topic information.

Bibliography

Alshawi, H. (1987). Memory and context for language interpretation. Amoia, M., Kunz, K., and Lapshinova-Koltunski, E. (2012). Coreference in spoken vs. written texts: a corpus-based analysis. In Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey. European Language Resources Association (ELRA). Asher, N. (1993). Reference to abstract objects in discourse. Asher, N. and Lascarides, A. (1998). Bridging. Journal of Semantics, 15(1):83–113. Attardi, G., Simi, M., and Dei Rossi, S. (2010). TANL-1: Coreference Resolution by Parse Analysis and Similarity Clustering. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 108–111, Uppsala, Sweden. Association for Computational Linguistics. Bagga, A. and Baldwin, B. (1998). Algorithms for scoring coreference chains. In The First International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pages 563–566. Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet Project. In Proceedings of the 17th International Conference on Computational Linguistics - Volume 1, COLING ’98, pages 86–90, Stroudsburg, PA, USA. Association for Com- putational Linguistics. Bärenfänger, M., Goecke, D., Hilbert, M., Lüngen, H., and Stührenberg, M. (2008). Anaphora as an indicator of elaboration: A corpus study. JLCL, 23(2):49–73. Baroni, M. and Lenci, A. (2011). How we blessed distributional semantic evaluation. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, GEMS ’11, pages 1–10, Stroudsburg, PA, USA. Barzilay, R. and Lapata, M. (2008). Modeling local coherence: An entity-based ap- proach. Computational Linguistics, 34(1):1–34. Batista-Navarro, R. and Ananiadou, S. (2011). Building a coreference-annotated corpus from the domain of biochemistry. Proceedings of The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, page 83.


Baumann, S. and Riester, A. (2012). Referential and lexical givenness: Semantic, pros- odic and cognitive aspects. Prosody and meaning, 25:119–162. Baumann, S. and Riester, A. (2013). Coreference, lexical givenness and prosody in German. Lingua, 136:16–37. Baumann, S., Röhr, C., and Grice, M. (2015). Prosodische (De-)Kodierung des Inform- ationsstatus im Deutschen. Zeitschrift für Sprachwissenschaft, 34(1):1–42. Baumann, S. and Roth, A. (2014). Prominence and coreference – On the perceptual relevance of F0 movement, duration and intensity. In Proceedings of Speech Prosody, pages 227–231. Beckman, M., Hirschberg, J., and Shattuck-Hufnagel, S. (2005). The original ToBI system and the evolution of the ToBI framework. In Jun, S.-A., editor, Prosodic Typology – The Phonology of Intonation and Phrasing, pages 9–54. Oxford University Press. Bergsma, S. and Lin, D. (2006). Bootstrapping path-based pronoun resolution. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 33–40, Sydney, Australia. Association for Computational Linguistics. Björkelund, A., Eckart, K., Riester, A., Schauffler, N., and Schweitzer, K. (2014). The extended DIRNDL corpus as a resource for automatic coreference and bridging resol- ution. In Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, pages 3222–3228. Björkelund, A. and Farkas, R. (2012). Data-driven multilingual coreference resolution using resolver stacking. In Joint Conference on Empirical Methods in Natural Lan- guage Processing and Computational Natural Language Learning - Shared Task, pages 49–55. Association for Computational Linguistics. Björkelund, A. and Kuhn, J. (2014). Learning structured perceptrons for coreference resolution with latent antecedents and non-local features. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 47–57, Baltimore, Maryland. Association for Computational Linguist- ics. Bohnet, B. and Nivre, J. (2012). A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Compu- tational Natural Language Learning, pages 1455–1465, Jeju Island, Korea. Association for Computational Linguistics.


Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2017). Enriching word vec- tors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146. Brennan, S. E., Friedman, M. W., and Pollard, C. J. (1987). A centering approach to pronouns. In Proceedings of the 25th annual meeting on Association for Computational Linguistics, pages 155–162. Association for Computational Linguistics. Broscheit, S., Poesio, M., Ponzetto, S. P., Rodriguez, K. J., Romano, L., Uryupina, O., Versley, Y., and Zanoli, R. (2010a). Bart: A multilingual anaphora resolution system. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 104–107, Uppsala, Sweden. Association for Computational Linguistics. Broscheit, S., Ponzetto, S. P., Versley, Y., and Poesio, M. (2010b). Extending BART to Provide a Coreference Resolution System for German. In Proceedings of the Inter- national Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta. Bunescu, R. (2003). Associative anaphora resolution: A web-based approach. In Pro- ceedings of the 2003 EACL Workshop on The Computational Treatment of Anaphora. Bussmann, H. (1990). Lexikon der Sprachwissenschaft. Kröners Taschenausgabe. Kröner. Cahill, A. and Riester, A. (2012). Automatically acquiring fine-grained information status distinctions in German. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 232–236. Association for Computational Linguistics. Calhoun, S., Carletta, J., Brenier, J., Mayo, N., Jurafsky, D., Steedman, M., and Beaver, D. (2010). The NXT-format Switchboard Corpus: A rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue. In Language Resources and Evaluation, volume 44, pages 387–419. Cap, F. (2014). Morphological processing of compounds for statistical machine trans- lation. Dissertation, Institute for Natural Language Processing (IMS), University of Stuttgart. Carden, G. (1982). Backwards anaphora in discourse context. Journal of Linguistics, 18(2):361387. Caselli, T. and Prodanof, I. (2006). Annotating bridging anaphors in Italian: in search of reliability. Relation, 27:9–03. Castaño, J., Zhang, J., and Pustejovsky, J. (2002). Anaphora resolution in biomedical literature. In Proceedings of the International Symposium on Reference Resolution for NLP.


Chomsky, N. (1981). Lectures on Government and Binding. Foris, Dordrecht. Chomsky, N. (1988). Current issues in linguistic theory, volume 38. Walter de Gruyter. Clark, H. H. (1975). Bridging. In Proceedings of the 1975 workshop on Theoretical issues in natural language processing, pages 169–174. Association for Computational Linguistics. Clark, K. and Manning, C. D. (2016a). Deep reinforcement learning for mention-ranking coreference models. In Proceedings of The 2016 Conference on Empirical Methods on Natural Language Processing. Clark, K. and Manning, C. D. (2016b). Improving coreference resolution by learning entity-level distributed representations. Proceedings of The 54th Annual Meeting of the Association for Computational Linguistics. Clear, J. H. (1993). The digital word. chapter The British National Corpus, pages 163–187. MIT Press, Cambridge, MA, USA. Cohen, K. B., Lanfranchi, A., Corvey, W., Jr., W. A. B., Roeder, C., Ogrena, P. V., Palmer, M., and Hunter, L. E. (2010). Annotation of all coreference in biomedical text: Guideline selection and adaptation. In Proceedings of the Second Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2010), LREC 2010. Cruttenden, A. (2006). The de-accenting of given information: a cognitive universal? In Bernini, G. and Schwartz, M., editors, Pragmatic Organization of Discourse in the Languages of Europe, pages 311–355. De Gruyter, Berlin. Daumé, H. and Marcu, D. (2009). Learning as search optimization: Approximate large margin methods for structured prediction. Proceedings of the 22nd international con- ference on Machine learning, abs/0907.0809. Denis, P. and Baldridge, J. (2007). A ranking approach to pronoun resolution. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, volume 158821593. Dinu, G., Pham, N. T., and Baroni, M. (2013). DISSECT – DIStributional SEmantics Composition Toolkit. In Proceedings of The 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria. Dipper, S., Lüdeling, A., and Reznicek, M. (2013). NoSta-D: A corpus of German non- standard varieties. Non-Standard Data Sources in Corpus-Based Research, (5):69–76. Doddington, G. R., Mitchell, A., Przybocki, M. A., Ramshaw, L. A., Strassel, S., and Weischedel, R. M. (2004). The Automatic Content Extraction (ACE) Program - Tasks, Data, and Evaluation. In Proceedings of the fourth international conference on


Language Resources and Evaluation. European Language Resources Association. Donnellan, K. S. (1966). Reference and definite descriptions. The philosophical review, 75(3):281–304. Draudt, A.-C. (2018). Inter-Annotator Agreement von Informationsstatus-Annotationen im GRAIN-Korpus (BSc thesis). Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1):61–74. Durrett, G. and Klein, D. (2013). Easy victories and uphill battles in coreference resolu- tion. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1971–1982. Eckart, K., Riester, A., and Schweitzer, K. (2012). A discourse information radio news database for linguistic analysis. In Christian Chiarcos, S. N. and Hellmann, S., ed- itors, Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata, pages 65–76. Springer. Eyben, F., Weninger, F., Groß, F., and Schuller, B. (2013). Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proceedings of the 21st ACM international conference on Multimedia, pages 835–838. Faaß, G. and Eckart, K. (2013). SdeWaC – a corpus of parsable sentences from the web. In Language Processing and Knowledge in the Web, Lecture Notes in Computer Science. Springer. Fang, Y. and Teufel, S. (2014). A summariser based on human memory limitations and lexical competition. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistic. Association for Computational Lin- guistics. Faruqui, M. and Padó, S. (2010). Training and Evaluating a German Named Entity Recognizer with Semantic Generalization. In Proceedings of Die Konferenz zur Ver- arbeitung Natürlicher Sprache (KONVENS) 2010, Saarbrücken, Germany. Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. Bradford Books. Fernandes, E., dos Santos, C., and Milidiú, R. (2012). Latent structure perceptron with feature induction for unrestricted coreference resolution. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning - Shared Task, pages 41–48, Jeju Island, Korea. Association for Computational Linguistics. Féry, C. (1993). German Intonational Patterns. Niemeyer, Tübingen.


Feuerbach, T., Riedl, M., and Biemann, C. (2015). Distributional semantics for resolving bridging mentions. In Proceedings of the International Conference Recent Advances in Natural Language Processing, pages 192–199. Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psycholo- gical bulletin, 76(5):378. Fraurud, K. (1990). Definiteness and the processing of noun phrases in natural discourse. Journal of Semantics, 7(4):395–433. Frege, G. (1892). Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosoph- ische Kritik, 100:25–50. Gardent, C. and Manuélian, H. (2005). Création dun corpus annoté pour le traitement des descriptions définies. Traitement Automatique des Langues, 46(1):115–140. Gardent, C., Manuelian, H., and Kow, E. (2003). Which bridges for bridging definite de- scriptions? In Proceedings of EACL: Fourth International Workshop on Linguistically Interpreted Corpora, pages 69–76, Budapest. Garvey, C. and Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5(3):459–464. Gasperin, C. and Briscoe, T. (2008). Statistical anaphora resolution in biomedical texts. In Proceedings of the 22nd International Conference on Computational Linguistics- Volume 1, pages 257–264. Association for Computational Linguistics. Gasperin, C., Karamanis, N., and Seal, R. (2007). Annotation of anaphoric relations in biomedical full-text articles using a domain-relevant scheme. In Proceedings of the 6th Discourse Anaphora and Anaphor Resolution Colloquium. Ge, N., Hale, J., and Charniak, E. (1998). A statistical approach to anaphora resolution. In Sixth Workshop on Very Large Corpora. Grishina, Y. (2016). Experiments on bridging across languages and genres. In Proceed- ings of The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: the CORBON 2016 Workshop on Coreference Resolution Beyond OntoNotes, pages 7–15. Grosz, B., Weinstein, S., and Joshi, A. (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics. Grosz, B. J. and Sidner, C. L. (1986). Attention, intentions, and the structure of dis- course. Computational Linguistics, 12(3):175–204. Gussenhoven, C. (1984). On the Grammar and Semantics of Sentence Accents. Foris, Dordrecht.


Haghighi, A. and Klein, D. (2007). Unsupervised coreference resolution in a nonpara- metric bayesian model. In Proceedings of the 45th annual meeting of the Association of Computational Linguistics, pages 848–855. Hahn, U., Strube, M., and Markert, K. (1996). Bridging textual ellipses. In Proceedings of the 16th Conference on Computational Linguistics - Volume 1, COLING ’96, pages 496–501, Stroudsburg, PA, USA. Association for Computational Linguistics. Hajič, J., Bejček, E., Bémová, A., Buráňová, E., Hajičová, E., Havelka, J., Homola, P., Kárník, J., Kettnerová, V., Klyueva, N., Kolářová, V., Kučová, L., Lopatková, M., Mikulová, M., Mírovský, J., Nedoluzhko, A., Pajas, P., Panevová, J., Poláková, L., Rysová, M., Sgall, P., Spoustová, J., Straňák, P., Synková, P., Ševčíková, M., Štěpánek, J., Urešová, Z., Vidová Hladká, B., Zeman, D., Zikánová, Š., and Žabokrtský, Z. (2018). Prague dependency treebank 3.5. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. Halliday, M. A. K. and Hasan, R. (1976). Cohesion in English. Longman, London. Hamp, B. and Feldweg, H. (1997). GermaNet - a Lexical-Semantic Net for German. In In Proceedings of ACL workshop Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 9–15. Harabagiu, S. M., Moldovan, D. I., Pasca, M., Surdeanu, M., Mihalcea, R., Girju, R., Rus, V., Lacatusu, V. F., Morarescu, P., and Bunescu, R. C. (2001). Answering com- plex, list and context questions with lcc’s question-answering server. In Proceedings of Text Retrieval Conference (TREC). Hardmeier, C. and Federico, M. (2010). Modelling pronominal anaphora in statistical machine translation. In IWSLT (International Workshop on Spoken Language Trans- lation); Paris, France; December 2nd and 3rd, 2010., pages 283–289. Hawkins, J. A. (1978). Definiteness and indefiniteness: A study in reference and gram- maticality prediction. Hearst, M. A. (1994). Multi-paragraph segmentation of expository text. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 9–16. Hendrickx, I., Bouma, G., Coppens, F., Daelemans, W., Hoste, V., Kloosterman, G., Mineur, A.-M., Van Der Vloet, J., and Verschelde, J.-L. (2008). A coreference corpus and resolution system for Dutch. Hirschman, L. and Chinchor, N. (1998). Appendix F: MUC-7 Coreference Task Defini- tion (version 3.0). In Proceedings of the Seventh Message Understanding Conference


(MUC-7). Hobbs, J. R. (1979). Coherence and coreference. Cognitive science, 3(1):67–90. Hobbs, J. R., Stickel, M. E., Appelt, D. E., and Martin, P. (1993). Interpretation as abduction. Artificial Intelligence, 63:69–142. Hou, Y. (2016a). Incremental Fine-grained Information Status Classification Using Attention-based LSTMs. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 1880–1890. Hou, Y. (2016b). Unrestricted Bridging Resolution. PhD thesis. Hou, Y. (2018). Enhanced word representations for bridging anaphora resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pages 1–7. Hou, Y., Markert, K., and Strube, M. (2013a). Cascading collective classification for bridging anaphora recognition using a rich linguistic feature set. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 814–820. Hou, Y., Markert, K., and Strube, M. (2013b). Global inference for bridging anaphora resolution. In Proceedings of The 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 907–917. Hou, Y., Markert, K., and Strube, M. (2014). A rule-based system for unrestricted bridging resolution: Recognizing bridging anaphora and finding links to antecedents. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 2082–2093. Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., and Weischedel, R. (2006). OntoNotes: the 90% solution. In Proceedings of the human language technology conference of the NAACL, Companion Volume: Short Papers, pages 57–60. Association for Computa- tional Linguistics. Kamp, H. (1981). A theory of truth and semantic representation. Formal semantics-the essential readings, pages 189–222. Kaplan, D., Iida, R., Nishina, K., and Tokunaga, T. (2012). Slate – a tool for creating and maintaining annotated corpora. Journal for Language Technology and Computational Linguistics, pages 89–101. Karttunen, L. (1969). Discourse referents. In Proceedings of the 1969 Conference on Computational Linguistics, COLING ’69, pages 1–38, Stroudsburg, PA, USA. Associ-


ation for Computational Linguistics. Klenner, M. and Tuggener, D. (2011). An incremental entity-mention model for core- ference resolution with restrictive antecedent accessibility. In Proceedings of the In- ternational Conference Recent Advances in Natural Language Processing 2011, pages 178–185, Hissar, Bulgaria. Kobayashi, N., Inui, K., and Matsumoto, Y. (2007). Extracting aspect-evaluation and aspect-of relations in opinion mining. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Kobdani, H. and Schütze, H. (2010). Sucre: A modular system for coreference resolu- tion. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 92–95, Uppsala, Sweden. Association for Computational Linguistics. Korzen, I. and Buch-Kromann, M. (2011). Anaphoric relations in the copenhagen de- pendency . Corpus-based Investigations of Pragmatic and Discourse Phe- nomena, 3:83–98. Kripke, S. (1972). Naming and necessity. In Davidson, D. and Harman, G., editors, Semantics of Natural Language, pages 253–355. Springer, Dordrecht. Krug, M., Puppe, F., Jannidis, F., Macharowsky, L., Reger, I., and Weimar, L. (2015). Rule-based coreference resolution in German historic novels. In Proceedings of the Fourth Workshop on Computational Linguistics for Literature, pages 98–104. Ladd, D. R. (2008). Intonational Phonology (2nd ed.). Cambridge University Press. Lakoff, G. (1971). The role of deduction in grammar. Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, pages 159–174. Lappin, S. and Leass, H. (1994). An algorithm for pronominal anaphora resolution. Lassalle, E. and Denis, P. (2011). Leveraging different meronym discovery methods for bridging resolution in French. In Discourse Anaphora and Anaphor Resolution Colloquium, pages 35–46. Springer. Lassalle, E. and Denis, P. (2015). Joint anaphoricity detection and coreference resolution with constrained latent structures. Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., and Jurafsky, D. (2011). Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In Proceedings of the fifteenth conference on computational natural language learning: Shared task, pages 28–34. Association for Computational Linguistics.


Lee, K., He, L., Lewis, M., and Zettlemoyer, L. (2017). End-to-end neural coreference resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 188–197. Association for Computational Linguistics. Levy, O., Remus, S., Biemann, C., and Dagan, I. (2015). Do supervised distributional methods really learn lexical inference relations? In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 970–976, Denver, USA. Löbner, S. (1998). Definite associative anaphora. Manuscript: http://user. phil-fak. uniduesseldorf. de/˜ loebner/publ/DAA-03. pdf. Luo, X. (2005). On coreference resolution performance metrics. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Lan- guage Processing, pages 25–32. Association for Computational Linguistics. Luo, X., Ittycheriah, A., Jing, H., Kambhatla, N., and Roukos, S. (2004). A mention- synchronous coreference resolution algorithm based on the bell tree. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04). Markert, K., Hou, Y., and Strube, M. (2012). Collective classification for fine-grained information status. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 795–804. Association for Computational Linguistics. Markert, K., Nissim, M., and Modjeska, N. (2003). Using the web for nominal ana- phora resolution. In Proceedings of the 2003 EACL Workshop on the Computational Treatment of Anaphora. Markert, K., Strube, M., and Hahn, U. (1996). Inferential realization constraints on functional anaphora in the centering model. In In Proceedings of the 18 th Annual Conference of the Cognitive Science Society; La, pages 609–614. Martí, M., Taulé, M., Bertran, M., and Márquez, L. (2007). AnCora: Multilingual and Multilevel Annotated Corpora. Mayer, J. (1995). Transcription of German Intonation. The Stuttgart System. University of Stuttgart. McKeown, K., Daume III, H., Chaturvedi, S., Paparrizos, J., Thadani, K., Barrio, P., Biran, O., Bothe, S., Collins, M., Fleischmann, K. R., et al. (2016). Predicting the impact of scientific concepts using full-text features. Journal of the Association for Information Science and Technology, 67(11):2684–2696. Mikhaylova, A. (2014). Koreferenzresolution in mehreren Sprachen. Msc thesis, Center for Information and Language Processing, University of Munich.


Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, abs/1301.3781. Mirkin, S., Dagan, I., and Padó, S. (2010). Assessing the role of discourse references in entailment inference. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, pages 1209–1219. Association for Compu- tational Linguistics. Moosavi, N. S. and Strube, M. (2016). Which coreference evaluation metric do you trust? A proposal for a link-based entity aware metric. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. Moosavi, N. S. and Strube, M. (2017). Lexical features in coreference resolution: To be used with caution. arXiv preprint arXiv:1704.06779, abs/1704.06779. Napoles, C., Gormley, M., and Van Durme, B. (2012). Annotated gigaword. In Proceed- ings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, AKBC-WEKEX ’12, pages 95–100, Stroudsburg, PA, USA. Association for Computational Linguistics. Naumann, K. and Möller, V. (2006). Manual for the annotation of in-document refer- ential relations. University of Tübingen. Nedoluzhko, A., Mírovsk`y, J., Ocelák, R., and Pergler, J. (2009). Extended coreferential relations and bridging anaphora in the prague dependency treebank. In Proceedings of the 7th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2009), Goa, India, pages 1–16. Ng, V. (2010). Supervised noun phrase coreference research: The first fifteen years. In Proceedings of the 48th Annual Meeting of the Association for Computational Lin- guistics, pages 1396–1411. Association for Computational Linguistics. Ng, V. and Cardie, C. (2002). Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual Meeting on Association for Computa- tional Linguistics, ACL ’02, pages 104–111, Stroudsburg, PA, USA. Association for Computational Linguistics. Nicolov, N., Salvetti, F., and Ivanova, S. (2008). Sentiment analysis: Does coreference matter. In AISB 2008 convention communication, interaction and social intelligence, volume 1, page 37. Nissim, M., Dingare, S., Carletta, J., and Steedman, M. (2004). An annotation scheme for information status in dialogue. Proceedings of the 4th international conference on language resources and evaluation (LREC).


Ostendorf, M., Price, P., and Shattuck-Hufnagel, S. (1995). The Boston University Radio News Corpus. Technical Report ECS-95-001, Boston University. Pagel, J. (2018). Rule-based and learning-based approaches for automatic bridging detection and resolution in German (MSc thesis). Pagel, J. and Rösiger, I. (2018). Towards bridging resolution in German: Data analysis and rule-based experiments. In Proceedings of NAACL-HLT: Workshop on Compu- tational Models of Reference, Anaphora and Coreference (CRAC), pages 50–60, New Orleans, US. Parker, R., Graff, D., Kong, J., Chen, K., and Maeda, K. (2011). English Gigaword fifth edition. Philadelphia: Linguistic Data Consortium. Petrov, S., Barrett, L., Thibaux, R., and Klein, D. (2006). Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 433–440. Association for Computational Linguistics. Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. PhD thesis, Massachusetts Institute of Technology. Poesio, M. (2004). Discourse annotation and semantic annotation in the gnome corpus. In Proceedings of the 2004 ACL Workshop on Discourse Annotation, pages 72–79. Association for Computational Linguistics. Poesio, M. and Artstein, R. (2005). The reliability of anaphoric annotation, reconsidered: Taking ambiguity into account. In Proceedings of the workshop on frontiers in corpus annotations ii: Pie in the sky, pages 76–83. Association for Computational Linguistics. Poesio, M. and Artstein, R. (2008). Anaphoric Annotation in the ARRAU Corpus. In International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco. Poesio, M., Grishina, Y., Kolhatkar, V., Moosavi, N. S., Rösiger, I., Roussell, A., Uma, A., Uryupina, O., Yu, J., and Zinsmeister, H. (2018). Anaphora resolution with the arrau corpus. In Proceedings of NAACL-HLT: Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, USA. Poesio, M., Ishikawa, T., Im Walde, S. S., and Vieira, R. (2002). Acquiring lexical knowledge for anaphora resolution. In Proceedings of the 3rd international conference on language resources and evaluation (LREC). Poesio, M., Mehta, R., Maroudas, A., and Hitzeman, J. (2004). Learning to resolve bridging references. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 143. Association for Computational Linguistics.


Poesio, M., Stuckardt, R., and Versley, Y., editors (2016). Anaphora Resolution - Al- gorithms, Resources, and Applications. Theory and Applications of Natural Language Processing. Springer. Poesio, M. and Vieira, R. (1998). A corpus-based investigation of definite description use. Computational Linguistics, 24(2):183–216. Poesio, M., Vieira, R., and Teufel, S. (1997). Resolving bridging references in unrestric- ted text. In Proceedings of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts, pages 1–6. Association for Computational Linguistics. Ponzetto, S. P. and Strube, M. (2007). Knowledge derived from Wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research (JAIR), 30:181–212. Pradhan, S., Luo, X., Recasens, M., Hovy, E., Ng, V., and Strube, M. (2014). Scoring coreference partitions of predicted mentions: A reference implementation. In Pro- ceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 30–35, Baltimore, Maryland. Association for Com- putational Linguistics. Pradhan, S., Moschitti, A., Xue, N., Uryupina, O., and Zhang, Y. (2012). CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Proceed- ings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning: Shared Task, pages 1–40. Pradhan, S., Ramshaw, L., Marcus, M., Palmer, M., Weischedel, R., and Xue, N. (2011). Conll-2011 shared task: modeling unrestricted coreference in ontonotes. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task, CONLL Shared Task ’11, pages 1–27, Stroudsburg, PA, USA. Association for Computational Linguistics. Prince, E. F. (1981). Toward a taxonomy of given-new information. In Radical Prag- matics, pages 223–55. Academic Press. Prince, E. F. (1992). The zpg letter: Subjects, definiteness, and information-status. Discourse description: diverse analyses of a fund raising text, pages 295–325. Raghunathan, K., Lee, H., Rangarajan, S., Chambers, N., Surdeanu, M., Jurafsky, D., and Manning, C. (2010). A multi-pass sieve for coreference resolution. In Proceedings of The 2010 Conference on Empirical Methods on Natural Language Processing. Rahman, A. and Ng, V. (2009). Supervised models for coreference resolution. In Proceed- ings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2-Volume 2, pages 968–977. Association for Computational Linguistics.


Rahman, A. and Ng, V. (2012). Learning the fine-grained information status of discourse entities. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL '12, pages 798–807, Stroudsburg, PA, USA. Association for Computational Linguistics.
Recasens, M. and Hovy, E. (2010a). A typology of near-identity relations for coreference (NIDENT). In Proceedings of the International Conference on Language Resources and Evaluation (LREC).
Recasens, M. and Hovy, E. (2010b). BLANC: Implementing the Rand index for coreference evaluation. Journal of Natural Language Engineering, 16(5).
Recasens, M., Màrquez, L., Sapena, E., Martí, M. A., Taulé, M., Hoste, V., Poesio, M., and Versley, Y. (2010). SemEval-2010 Task 1: Coreference resolution in multiple languages. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval '10, pages 1–8, Stroudsburg, PA, USA.
Recasens, M., Martí, M. A., and Orasan, C. (2012). Annotating near-identity from coreference disagreements. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), pages 165–172.
Recasens, M., Martí, M. A., and Taulé, M. (2007). Where anaphora and coreference meet: Annotation in the Spanish CESS-ECE corpus. In Proceedings of Recent Advances in Natural Language Processing (RANLP), pages 504–509.
Riester, A. and Baumann, S. (2017). The RefLex Scheme – Annotation guidelines. SinSpeC: Working Papers of the SFB 732, Vol. 14, University of Stuttgart.
Riester, A., Lorenz, D., and Seemann, N. (2010). A recursive annotation scheme for referential information status. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC), pages 717–722.
Riester, A. and Piontek, J. (2015). Anarchy in the NP. When new nouns get deaccented and given nouns don't. Lingua, 165(B):230–253.
Rodríguez, K. J., Delogu, F., Versley, Y., Stemle, E. W., and Poesio, M. (2010). Anaphoric annotation of Wikipedia and blogs in the Live Memories corpus. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC).
Roitberg, A. and Nedoluzhko, A. (2016). Bridging corpus for Russian in comparison with Czech. In Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), at the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 59–66.
Rooth, M. (1992). A theory of focus interpretation. Natural Language Semantics, 1(1):75–116.
Rosenberg, A., Cooper, E., Levitan, R., and Hirschberg, J. (2012). Cross-language prominence detection. In Speech Prosody.
Rösiger, I. (2016). SciCorp: A corpus of English scientific articles annotated for information status analysis. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC).
Rösiger, I. (2018a). BASHI: A corpus of Wall Street Journal articles annotated with bridging links. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
Rösiger, I. (2018b). Rule- and learning-based methods for bridging resolution in the ARRAU corpus. In Proceedings of the NAACL-HLT Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, USA.
Rösiger, I., Köper, M., Nguyen, K. A., and Schulte im Walde, S. (2018a). Integrating predictions from neural-network relation classifiers into coreference and bridging resolution. In Proceedings of the NAACL-HLT Workshop on Computational Models of Reference, Anaphora and Coreference, New Orleans, USA.
Rösiger, I. and Kuhn, J. (2016). IMS HotCoref DE: A data-driven co-reference resolver for German. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC).
Rösiger, I. and Riester, A. (2015). Using prosodic annotations to improve coreference resolution of spoken text. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-IJCNLP), pages 83–88, Beijing.
Rösiger, I., Riester, A., and Kuhn, J. (2018b). Bridging resolution: Task definition, corpus resources and rule-based experiments. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), pages 3516–3528, Santa Fe, NM, US.
Rösiger, I., Stehwien, S., Riester, A., and Vu, N. T. (2017). Improving coreference resolution with automatically predicted prosodic information. In Proceedings of the First Workshop on Speech-Centric Natural Language Processing, pages 78–83, Copenhagen. Association for Computational Linguistics.
Rösiger, I. and Teufel, S. (2014). Resolving coreferent and associative noun phrases in scientific text. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL).
Russell, B. (1905). On denoting. Mind, pages 479–493.
Saeboe, K. J. (1996). Anaphoric presuppositions and zero anaphora. Linguistics and Philosophy, 19(2):187–209.
Sasano, R. and Kurohashi, S. (2009). A probabilistic model for associative anaphora resolution. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3, EMNLP '09, pages 1455–1464, Stroudsburg, PA, USA. Association for Computational Linguistics.
Schäfer, R. (2015). Processing and querying large web corpora with the COW14 architecture. In Bański, P., Biber, H., Breiteneder, E., Kupietz, M., Lüngen, H., and Witt, A., editors, Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora, pages 28–34.
Schäfer, U., Spurk, C., and Steffen, J. (2012). A fully coreference-annotated corpus of scholarly papers from the ACL Anthology. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), December 10–14, Mumbai, India, pages 1059–1070.
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In Proceedings of NeMLaP, Manchester, UK.
Schulte im Walde, S., Poesio, M., and Brew, C. (1998). Resolution of inferential descriptions in lexical clusters. In Proceedings of the ECML Workshop 'Towards Adaptive NLP-driven Systems: Linguistic Information, Learning Methods and Applications', pages 41–52, Chemnitz, Germany.
Schwarzschild, R. (1999). GIVENness, AvoidF, and other constraints on the placement of accent. Natural Language Semantics, 7(2):141–177.
Schweitzer, K., Eckart, K., Gärtner, M., Falenska, A., Riester, A., Rösiger, I., Schweitzer, A., Stehwien, S., and Kuhn, J. (2018). German radio interviews: The GRAIN release of the SFB732 Silver Standard Collection. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018).
Shwartz, V. and Dagan, I. (2016). Path-based vs. distributional information in recognizing lexical semantic relations. In Proceedings of the 26th International Conference on Computational Linguistics (COLING).
Siegel, S. and Castellan, N. J. J. (1988). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, Berkeley, CA, 2nd edition.
Soon, W. M., Ng, H. T., and Lim, D. C. Y. (2001). A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544.
Speer, R., Chin, J., and Havasi, C. (2017). ConceptNet 5.5: An open multilingual graph of general knowledge. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).
Stehwien, S. and Vu, N. T. (2017). Prosodic event detection using convolutional neural networks with context information. In Proceedings of Interspeech.
Steinberger, J., Poesio, M., Kabadjov, M. A., and Ježek, K. (2007). Two uses of anaphora resolution in summarization. Information Processing & Management, 43(6):1663–1680.
Stone, P. J., Dunphy, D. C., and Smith, M. S. (1966). The General Inquirer: A Computer Approach to Content Analysis.
Strawson, P. F. (1950). On referring. Mind, 59(235):320–344.
Strube, M. and Müller, C. (2003). A machine learning approach to pronoun resolution in spoken dialogue. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 168–175.
Terken, J. and Hirschberg, J. (1994). Deaccentuation of words representing 'given' information: Effects of persistence of grammatical function and surface position. Language and Speech, 37(2):125–145.
Tetreault, J. and Allen, J. (2004). Dialogue structure and pronoun resolution. In Proceedings of the 5th Discourse Anaphora and Anaphor Resolution Colloquium.
Tetreault, J. R. (1999). Analysis of syntax-based pronoun resolution methods. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, ACL '99, pages 602–605, Stroudsburg, PA, USA. Association for Computational Linguistics.
Tuggener, D. and Klenner, M. (2014). A hybrid entity-mention pronoun resolution model for German using Markov logic networks. In Proceedings of the Konferenz zur Verarbeitung natürlicher Sprache (KONVENS), pages 21–29.
Umbach, C. (2002). (De)accenting definite descriptions. Theoretical Linguistics, 2/3:251–280.
Uryupina, O., Artstein, R., Bristot, A., Cavicchio, F., Delogu, F., Rodriguez, K., and Poesio, M. (2018). Annotating a broad range of anaphoric phenomena, in a variety of genres: The ARRAU corpus. Journal of Natural Language Engineering.
Vieira, R. and Poesio, M. (2000). An empirically based system for processing definite descriptions. Computational Linguistics, 26(4).
Vieira, R. and Teufel, S. (1997). Towards resolution of bridging descriptions. In Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics, pages 522–524. Association for Computational Linguistics.
Vilain, M., Burger, J., Aberdeen, J., Connolly, D., and Hirschman, L. (1995). A model-theoretic coreference scoring scheme. In Proceedings of the 6th Conference on Message Understanding, MUC6 '95, pages 45–52, Stroudsburg, PA, USA. Association for Computational Linguistics.
Voorhees, E. M. et al. (1999). The TREC-8 question answering track report. In Text Retrieval Conference (TREC), volume 99, pages 77–82.
Wallin, A. and Nugues, P. (2017). Coreference resolution for Swedish and German using distant supervision. In Proceedings of the 21st Nordic Conference on Computational Linguistics, pages 46–55.
Watson, R., Preiss, J., and Briscoe, T. (2003). The contribution of domain-independent robust pronominal anaphora resolution to open-domain question-answering. In Proceedings of the Symposium on Reference Resolution and its Applications to Question Answering and Summarization, Venice, Italy, June, pages 23–25.
Weischedel, R., Hovy, E., Marcus, M., Palmer, M., Belvin, R., Pradhan, S., Ramshaw, L., and Xue, N. (2011). OntoNotes: A large training corpus for enhanced processing, pages 54–63.
Weischedel, R., Palmer, M., Marcus, M., Hovy, E., Pradhan, S., Ramshaw, L., Xue, N., Taylor, A., Kaufman, J., Franchini, M., et al. (2013). OntoNotes Release 5.0 LDC2013T19. Linguistic Data Consortium, Philadelphia, PA.
Winograd, T. (1972). Understanding natural language. Cognitive Psychology, 3(1):1–191.
Zeldes, A. (2017). The GUM corpus: Creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3):581–612.
Zhang, R., Nogueira dos Santos, C., Yasunaga, M., Xiang, B., and Radev, D. (2018). Neural coreference resolution with deep biaffine attention by joint mention detection and mention clustering. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 102–107. Association for Computational Linguistics.
Zhekova, D. and Kübler, S. (2010). UBIU: A language-independent system for coreference resolution. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 96–99, Uppsala, Sweden. Association for Computational Linguistics.
Ziering, P. (2011). Feature engineering for coreference resolution in German: Improving the link feature set of SUCRE for German by using a more linguistic background. Diploma thesis, Institute for Natural Language Processing, University of Stuttgart.
Zikánová, Š., Hajičová, E., Hladká, B., Jínová, P., Mírovský, J., Nedoluzhko, A., Poláková, L., Rysová, K., Rysová, M., and Václ, J. (2015). Discourse and Coherence. From the Sentence Structure to Relations in Text, volume 14 of Studies in Computational and Theoretical Linguistics. Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Praha, Czechia.