Documentation for the NITE XML Toolkit
Document revisions
0.3 - 11 June 09
0.2 - 28 March 07
0.1 - 12 Jan 07

JEAN CARLETTA
STEFAN EVERT
JONATHAN KILGOUR
CRAIG NICOL
DENNIS REIDSMA
JUDY ROBERTSON
HOLGER VOORMANN

Table of Contents

1. Drafts before v1.0
2. A Basic Introduction to the NITE XML Toolkit
3. Downloading and Using NXT
  3.1. Prerequisites
  3.2. Getting Started
    Sample Corpora
  3.3. Setting the CLASSPATH
  3.4. How to Play Media signals in NXT
  3.5. Programmatic Controls for NXT
  3.6. Compiling from Source and Running the Test Suites
4. Data
  4.1. The NITE Object Model
  4.2. The NITE Data Set Model
  4.3. Data Storage
    Namespacing
    Coding Files
    Links
    Data And Signal File Naming
  4.4. Metadata
    Preliminaries
    Top-level corpus description
    Reserved Elements and Attributes (optional)
    CVS Details (optional, beta)
    Independent Variables on Observations
    Agents (optional)
    Signals (optional)
    Corpus Resources (optional)
    Ontologies (optional)
    Object Sets (optional)
    Codings and Layers
    Callable Programs (optional)
    Observations
  4.5. Dependency Structures
    Resource File Syntax
    Behaviour
    Validation
  4.6. Data validation
    Limitations in the validation process
    Preliminaries - setting up for schema validation
    The validation process
  4.7. Data Set Design
    Children versus Pointers
    Possible Tag Set Representations
    Orthography as Textual Content versus as an String Attribute
    Use of Namespaces
    Division into Files for Storage
    Skipping Layers
  4.8. Data Builds
    Examples and explanation of format
5. The NXT Query Language (NQL)
  5.1. General structure of a simple query
    Declaration part
    Condition part
  5.2. Property tests
    Simple and functional property expressions
    Property existence tests
    String and number comparisons using ==, !=, <=, <, >, and >=
    Regular expression comparisons
  5.3. Comments
  5.4. Structural relations
    Identity
    Dominance
    Precedence
  5.5. Temporal relations
  5.6. Quantifier
  5.7. Query results
  5.8. Complex queries
  5.9. Known Problems
    Multiple observations and timings
    Search GUI and forall
    Immediate Precedence
    Arithmetic
    Inability to handle namespacing
    Speed and Memory Use
  5.10. Helpful hints
  5.11. Related documentation
6. Analysis
  6.1. Command line tools for data analysis
    Preliminaries
    Common Arguments
    SaveQueryResults
    CountQueryResults
    MatchInContext
    NGramCalc: Calculating N-Gram Sequences
    FunctionQuery: Time ordered, tab-delimited output, with aggregate functions
    Indexing
  6.2. Projecting Images Of Annotations
    Notes
  6.3. Reliability Testing
    Generic documentation
    MultiAnnotatorDisplay
    CountQueryMulti
    Example reliability study
7. Graphical user interfaces
  7.1. Preliminaries
    Invoking the GUIs
    Time Highlighting
    Search Highlighting
  7.2. Generic tools that work on any data
    The NXT Search GUI
    The Generic Display
  7.3. Configurable end user coding tools
    The signal labeller
    The discourse entity coder
    The discourse segmenter
    The non-spanning comparison display
    The dual transcription comparison display
    How to configure the end user tools
  7.4. Libraries

    The Observer
    Other Formats
  8.4. Exporting Data from NXT into Other Tools
    TGREP2 via Penn Treebank Format
    TigerSearch
  8.5. Knitting and Unknitting NXT Data Files
    Knit using Stylesheet
    Knit using LT XML2
    Unknit using LT XML2
  8.6. General Approaches to Processing NXT Data
    Option 1: Write an NXT-based application
    Option 2: Make a tree, process it, and (for re-importation) put it back
    Option 3: Process using other XML-aware software
  8.7. Manipulating media files
Appendix A. FAQ
Appendix B. How To Use Metadata
  B.1. What metadata files do
  B.2. What metadata files look like
  B.3. Metadata examples
  B.4. Using Metadata to validate data
    Validation limitations
Appendix C. Comparison to other efforts