Broad-Coverage Rule-Based Processing of Temporal Expressions
Paweł Piotr Mazur
Master of Science (M.Sc.)

Macquarie University
Centre for Language Technology
Department of Computing
Faculty of Science

and

Wroclaw University of Technology
Software Engineering Department
Institute of Informatics
Faculty of Computer Science and Management

This thesis is presented for the degree of Doctor of Philosophy
Submitted in partial fulfilment of joint institutional requirements for the double-badged degree
March 2012

Contents

Abstract
Abstract
Statement of Candidate
Acknowledgements
Publications
Terms, Definitions and Notational Conventions

1 Introduction
  1.1 The Problem: Processing Temporal Expressions in Texts
  1.2 An Overview of the State-Of-The-Art
  1.3 What is Still Missing?
  1.4 The Aims of this Work
  1.5 The Contributions of the Thesis
  1.6 The Structure of the Thesis

2 A Review of the Literature
  2.1 Defining Temporal Expressions
    2.1.1 Temporal Ontology
    2.1.2 What Constitutes a Temporal Expression?
    2.1.3 Summary
  2.2 Taxonomising Temporal Expressions
    2.2.1 Point Expressions
    2.2.2 Period Expressions
    2.2.3 Set Expressions
    2.2.4 Summary
  2.3 Representing Temporal Expressions
    2.3.1 Temporal Information in Logic and Formal Semantics
    2.3.2 Representation via Attributes
    2.3.3 The Temporal Expression Language
    2.3.4 Temporal Concepts
    2.3.5 The Time Calculus for Natural Language
    2.3.6 A Functional Approach
    2.3.7 Timeline Finite-State Transducers
    2.3.8 The Computational Treatment of Temporal Notions
    2.3.9 Summary
  2.4 Annotating Temporal Expressions
    2.4.1 TIMEX
    2.4.2 TIMEX2
    2.4.3 TIMEX3 in TimeML
    2.4.4 Summary
  2.5 Corpora with Annotated Temporal Expressions
    2.5.1 The MUC Corpora
    2.5.2 The TIDES Parallel Temporal Corpus
    2.5.3 The ACE 2004 Corpora
    2.5.4 The ACE 2005 Corpora
    2.5.5 The ACE 2007 Corpora
    2.5.6 The TimeBank Corpus
    2.5.7 Summary
  2.6 Approaches to the Processing of Temporal Expressions
    2.6.1 The Architecture of Taggers
    2.6.2 Rule-based Systems
    2.6.3 Machine-learning-based Systems
    2.6.4 Evaluation and State-of-the-Art Performance
    2.6.5 Summary
  2.7 Conclusions

3 The WikiWars Corpus
  3.1 The Limitations of Existing Corpora
  3.2 Creating WikiWars
    3.2.1 Selecting Data Sources
    3.2.2 Text Extraction and Preprocessing
    3.2.3 Creating Gold-Standard Annotations
    3.2.4 Observed Deficiencies of TIMEX2
    3.2.5 Corpus Statistics
  3.3 The Nature of Wikipedia Articles
    3.3.1 Broken Narratives
    3.3.2 Ambiguous Writing
    3.3.3 Restarting the Time of Narrative
    3.3.4 The Use of Deictic Expressions
    3.3.5 The Use of Time-Zone Information
    3.3.6 Quotes Missing a Time-Stamp
    3.3.7 Grammatical Errors
  3.4 Conclusions

4 A Taxonomy of Temporal Expressions
  4.1 Basic Concepts
  4.2 What Counts as a Temporal Expression?
  4.3 Temporal Expressions Referring to Points
    4.3.1 Explicit Expressions
    4.3.2 Indexical Expressions
  4.4 Temporal Expressions Referring to Periods
    4.4.1 Unanchored Periods
    4.4.2 Anchored Periods
  4.5 Temporal Expressions Referring to Sets
    4.5.1 Regularly Recurring Temporal Entities
    4.5.2 Irregularly Recurring Temporal Entities
    4.5.3 What Counts as a Set?
  4.6 Non-specific Expressions
  4.7 Conclusions

5 Extent Recognition
  5.1 The Extent of Temporal Expressions
  5.2 Syntactic Information for Extent Recognition
  5.3 The Selection of Triggers
  5.4 The Dependency-based Approach
    5.4.1 The Parsers
    5.4.2 Results
    5.4.3 Error Analysis
    5.4.4 Extent Recognition of Event-based Expressions
  5.5 The Constituency-based Approach
    5.5.1 The Experimental Set-up
    5.5.2 The Experiments
    5.5.3 Extent Recognition of Event-based Expressions
  5.6 Conclusions

6 The Interpretation of Temporal Expressions
  6.1 The Interpretation Task
    6.1.1 Local and Global Semantics
    6.1.2 The Representation of Global Semantics in TIMEX2
  6.2 LTIMEX: A String-based Representation of Local Semantics
    6.2.1 Explicit Expressions
    6.2.2 Underspecified Expressions
    6.2.3 Offset Expressions
    6.2.4 Event-based Point Expressions
    6.2.5 Period Expressions
    6.2.6 Event-based Period Expressions
    6.2.7 Modified Expressions
    6.2.8 Ordinally-specified expressions
    6.2.9 Non-Specific Expressions
    6.2.10 Set Expressions
    6.2.11 Summary
  6.3 Temporal Focus Tracking
    6.3.1 The Phenomenon
    6.3.2 Related Work
    6.3.3 The Experimental Set-up
    6.3.4 The Experiments
    6.3.5 Summary
  6.4 The Interpretation of Bare Weekday Names
    6.4.1 What's the Problem?
    6.4.2 Related Work
    6.4.3 The Experimental Corpus and Set-up
    6.4.4 Evaluated Approaches
    6.4.5 Results
    6.4.6 Error Analysis
    6.4.7 Summary
  6.5 Other Challenges and Problems
    6.5.1 Calendar Arithmetics
    6.5.2 The Interpretation of Some Underspecified Expressions
    6.5.3 The Twelve-Hour Clock
    6.5.4 Ambiguous Triggers
    6.5.5 Providing an Anchor for a Duration
    6.5.6 The Interpretation of Event-Based Expressions
    6.5.7 Time Flow in Speech Transcripts
    6.5.8 Sentence-initial Temporal Adverbials
  6.6 Conclusions

7 The DANTE System
  7.1 A Description of the System
    7.1.1 Preprocessing Components