
Developing Language Processing Components with GATE Version 5 (a User Guide)

For GATE version 5.0 (built May 28, 2009)

Hamish Cunningham
Diana Maynard
Kalina Bontcheva
Valentin Tablan
Cristian Ursu
Marin Dimitrov
Mike Dowman
Niraj Aswani
Ian Roberts
Yaoyong Li
Andrey Shafirin
Adam Funk

© The University of Sheffield 2001-2009

http://gate.ac.uk/

HTML version: http://gate.ac.uk/userguide

Work on GATE has been partly supported by EPSRC grants GR/K25267 (Large-Scale Information Extraction), GR/M31699 (GATE 2), RA007940 (EMILLE), GR/N15764/01 (AKT) and GR/R85150/01 (MIAKT), AHRB grant APN16396 (ETCSL/GATE), and several EU-funded projects (SEKT, TAO, NeOn, MediaCampaign, MUSING, KnowledgeWeb, PrestoSpace, h-TechSight, enIRaF).

Brief Contents

1 Introduction 2
  1.1 How to Use This Text ... 3
  1.2 Context ... 4
  1.3 Overview ... 5
  1.4 Structure of the Book ... 9
  1.5 Further Reading ... 10

2 Change Log 16
  2.1 Version 5.0 (May 2009) ... 16
  2.2 Version 4.0 (July 2007) ... 20
  2.3 Version 3.1 (April 2006) ... 24
  2.4 January 2005 ... 27
  2.5 December 2004 ... 28
  2.6 September 2004 ... 28
  2.7 Version 3 Beta 1 (August 2004) ... 28
  2.8 July 2004 ... 29
  2.9 June 2004 ... 30
  2.10 April 2004 ... 30
  2.11 March 2004 ... 31
  2.12 Version 2.2 - August 2003 ... 31
  2.13 Version 2.1 - February 2003 ... 31
  2.14 June 2002 ... 32

3 How To... 33
  3.1 Download GATE* ... 33
  3.2 Install and Run GATE* ... 34
  3.3 [D,F] Use System Properties with GATE ... 35
  3.4 [D,F] Use (CREOLE) Plug-ins ... 37
  3.5 Troubleshooting ... 38
  3.6 [D] Get Started with the GUI* ... 39
  3.7 [D,F] Configure GATE ... 40
  3.8 Build GATE ... 41
  3.9 [D] Use GATE with Maven or JPF ... 42
  3.10 [D,F] Create a New (CREOLE) Resource ... 43
  3.11 [F] Instantiate (CREOLE) Resources ... 46
  3.12 [D] Load Resources: document, tokenizer...* ... 49
  3.13 [D,F] Configure (CREOLE) Resources ... 51
  3.14 [D] Create and Run an Application* ... 51
  3.15 [D] Run PRs Conditionally on Document Features ... 52
  3.16 [D] View Annotations* ... 52
  3.17 [D] Do Information Extraction with ANNIE* ... 53
  3.18 [D] Modify ANNIE ... 54
  3.19 [D] Create and Edit Annotations* ... 54
  3.20 [D] Saving annotations* ... 58
  3.21 [D,F] Create a New Annotation Schema ... 58
  3.22 [D] Save and Restore LRs in Data Stores ... 59
  3.23 [D] Save Resource Parameter State to File ... 60
  3.24 [D] Save an application with its resources (e.g. GATE Teamware) ... 60
  3.25 [D,F] Perform Evaluation with the AnnotationDiff tool ... 62
  3.26 [D] Use the Corpus Benchmark Evaluation tool ... 63
  3.27 [D] Write JAPE Grammars ... 64
  3.28 [F] Embed NLE in other Applications ... 65
  3.29 [F] Use GATE within a Spring application ... 66
  3.30 [F] Use GATE within a Tomcat Web Application ... 68
  3.31 [F] Use GATE in a Multithreaded Environment ... 70
  3.32 [D,F] Add support for a new document format ... 72
  3.33 [D] Dump Results to File ... 73
  3.34 [D] Stop GUI 'Freezing' on Linux ... 74
  3.35 [D] Stop GUI Crashing on Linux ... 75
  3.36 [D] Stop GATE Restoring GUI Sessions/Options ... 75
  3.37 Work with Unicode ... 75
  3.38 Work with Oracle and PostgreSQL ... 76
  3.39 Annotate using ontologies ... 77

4 CREOLE: the GATE Component Model 79
  4.1 The Web and CREOLE ... 80
  4.2 Java Beans: a Simple Component Architecture ... 81
  4.3 The GATE Framework ... 82
  4.4 Language Resources and Processing Resources ... 83
  4.5 The Lifecycle of a CREOLE Resource ... 84
  4.6 Processing Resources and Applications ... 85
  4.7 Language Resources and Datastores ... 86
  4.8 Built-in CREOLE Resources ... 86
  4.9 CREOLE Resource Configuration ... 87

5 Visual CREOLE 100
  5.1 Gazetteer Visual Resource - GAZE ... 100
  5.2 Ontogazetteer ... 102
  5.3 The Document Editor ... 104

6 Language Resources: Corpora, Documents and Annotations 108
  6.1 Features: Simple Attribute/Value Data ... 108
  6.2 Corpora: Sets of Documents plus Features ... 109
  6.3 Documents: Content plus Annotations plus Features ... 109
  6.4 Annotations: Directed Acyclic Graphs ... 109
  6.5 Document Formats ... 114
  6.6 XML Input/Output ... 128

7 JAPE: Regular Expressions Over Annotations 129
  7.1 Matching operators in detail ... 136
  7.2 Use of Context ... 138
  7.3 Use of Priority ... 140
  7.4 Use of negation ... 141
  7.5 Useful tricks ... 142
  7.6 Ontology aware grammar transduction ... 145
  7.7 Using Java code in JAPE rules ... 146
  7.8 Optimising for speed ... 150
  7.9 Serializing JAPE Transducer ... 151
  7.10 The JAPE Debugger ... 151
  7.11 Notes for Montreal Transducer users ... 154

8 ANNIE: a Nearly-New Information Extraction System 158
  8.1 Tokeniser ... 159
  8.2 Gazetteer ... 162
  8.3 Sentence Splitter ... 164
  8.4 RegEx Sentence Splitter ... 164
  8.5 Part of Speech Tagger ... 165
  8.6 Semantic Tagger ... 166
  8.7 Orthographic Coreference (OrthoMatcher) ... 167
  8.8 Pronominal Coreference ... 167
  8.9 A Walk-Through Example ... 173

9 (More CREOLE) Plugins 176
  9.1 Document Reset ... 177
  9.2 Verb Group Chunker ... 177
  9.3 Noun Phrase Chunker ... 177
  9.4 OntoText Gazetteer ... 178
  9.5 Flexible Gazetteer ... 180
  9.6 Gazetteer List Collector ... 181
  9.7 Tree Tagger ... 183
  9.8 Stemmer ... 184
  9.9 GATE Morphological Analyzer ...
