Array TFS storage for unification grammars

Glenn C. Slayden

A thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science

University of Washington 2012

Committee: Emily M. Bender, Stephan Oepen

Program Authorized to Offer Degree: Linguistics – Computational Linguistics

Abstract

Constraint-based grammar formalisms such as Head-Driven Phrase Structure Grammar (HPSG) model linguistic entities as sets of attribute-value tuples headed by types drawn from a connected multiple-inheritance hierarchy. These typed feature structures (TFSes) describe directed graphs, allowing for the application of graph-theoretic analyses. In particular, graph unification—the computation of the most general structure that is consistent with a set of argument graphs (if such a structure exists)—can be interpreted as expressing the satisfiability and combination functions for the represented linguistic entities, thus providing a principled method for describing syntactic elaboration. In competent natural language grammars, however, the graphs are typically large and numerous, and computational efficiency is a key engineering concern. This thesis describes a method for the storage of typed feature structures where each TFS comprises a self-contained, contiguous memory allocation with a tabular internal structure. Also detailed is an efficient unification algorithm for this storage mode. The techniques are evaluated in agree, a new managed-execution concurrent unification chart parser which supports both syntactic analysis (parsing) and surface realization (generation) within the framework of the DELPH-IN (Deep Linguistic Processing with HPSG Initiative) joint reference formalism.

Contents

Acknowledgements
1 Introduction
2 Typed feature structures
2.1 TFS formalism
2.1.1 Type hierarchy and feature appropriateness
2.1.2 Functional formalism
2.1.3 DAG interpretation
2.1.4 Contrasting the functional and graph approaches
2.1.5 Well-formedness
2.2 Design considerations for TFS storage
2.2.1 Node allocation and discarding
2.2.2 Structure sharing
2.2.3 The feature arity problem
2.3 Array storage TFS formalism
2.3.1 Notation
2.3.2 Formal description
2.3.3 Node governance
2.3.4 Array TFS graphical form
2.3.5 Properness of the out-tuple relation
2.3.6 Feature enumeration
2.4 Summary
3 Array TFS storage implementation
3.1 Managed-code
3.1.1 Value types
3.1.2 Direct access
3.2 Engineering
3.2.1 Root out-tuple
3.2.2 4-tuple layout
3.2.3 Mark assignment
3.2.4 Hash access
3.3 Example
3.4 Future work in array storage
3.5 Summary
4 Unification
4.1 Well-formed unification
4.2 Prior work in linguistic unification
4.2.1 UNION-FIND
4.2.2 PATR-II: environments and structure sharing
4.2.3 D-PATR
4.2.4 Incremental unification
4.2.5 Lazy approaches
4.2.6 Chronological dereferencing
4.2.7 Later work in term unification
4.2.8 Strategic lazy incremental copy graph unification
4.2.9 Quasi-destructive unification
4.2.10 Concurrent and parallel unification
4.3 n-way unification
4.3.1 Motivation
4.3.2 Procedure
4.3.3 Evaluation
4.3.4 Classifying unifier performance
4.3.5 Summary
4.4 Array TFS unification
4.4.1 COMP-ARC-LIST
4.4.2 Scratch field mapping
4.4.3 Scratch slot initialization and discarding
4.4.4 Scratch slot implementation
4.4.5 Array TFS storage slot mappings
4.4.6 Slot mapping selection
4.4.7 Disjoint feature coverage
4.4.8 Fallback fetch
4.4.9 Summary of the first pass
4.4.10 Node counting
4.4.11 Writing pass
4.5 Example
4.6 Summary
5 Evaluation
5.2 agree
5.3 Existing systems
5.4 Methodology
5.4.2 Evaluation grammar and corpus
5.4.3 Correctness
5.5 Results and analysis
5.5.1 Batch parsing
5.5.2 Real-time requirement
5.5.3 agree parser throughput and scaling
5.6 Evaluation summary
6 Conclusion
7 References

Acknowledgements

Thanks to my advisor, Emily Bender, and to my reader Stephan Oepen. Remaining errors in the text are the fault of myself alone. Thanks also to Ann Copestake and Dan Flickinger for their inspiration and guidance. Although not described in this thesis, the agree generation and transfer components are extensions of this work which could not have been completed without the insights and development skills of Spencer Rarrick. Woodley Packard also provided encouragement and helpful advice regarding the generator. I completed additional work on agree while I was at King Mongkut’s University of Technology in Thonburi, and I thank my colleagues there, especially Nuttanart Facundes, for the helpful discussions. Mostly, I thank my family and friends for their support and love.


In memory of Laurie Poulson


1 Introduction

Constraint-based linguistic formalisms use hierarchical sets of attribute-value tuples—feature structures—to represent linguistic entities. When each such structure is associated with a type drawn from a carefully arranged multiple-inheritance hierarchy, and when the value of every attribute-value tuple is another such typed feature structure (TFS), an elegant system for declaratively modeling natural language arises (Carpenter 1992).

The suitability of this arrangement is in part due to an equivalent interpretation of feature structures as directed graphs, which allows graph-theoretical analysis to be applied to the model. In particular, graph unification—the computation of the most general graph that preserves all of the information in a set of conjoined argument graphs (if such a graph exists)—expresses the satisfiability and combination functions for the represented linguistic entities, providing a principled method for modeling the elaboration of linguistic structure.

The unification of directed graphs representing linguistic TFSes is the most costly computational operation in parsing and generating with unification grammars (Callmeier 2001, 39). Processing a simple sentence with a competent natural language grammar can involve hundreds of thousands of unification operations, each operating on graphs of several hundred nodes or more. This intense computational sink has motivated considerable research into algorithms for efficient graph unification.

Unification algorithms require intermediate structure to be built during the course of unification. With realistic grammars, unification failures far outnumber successful unifications during parsing (Tomabechi 1991), and any structure created up to the point of detecting the failures is discarded. Efficiently discarding these unwanted structures has been an important theme in the unification literature. Wroblewski (1987) introduced the standard and effective technique of using a generation counter to categorically invalidate a large swath of nodes. Tomabechi (1991, 1992) improved on this work, describing an algorithm which exhibits no excess copying; accordingly, this method is used by many modern unification parsers.
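To illustrate the generation-counter idea, consider the following minimal Java sketch (my own illustration, not code from Wroblewski's or Tomabechi's systems): each scratch write is stamped with the current generation, and bumping the counter invalidates every stamped value at once, so the structure built by a failed unification is "discarded" in constant time.

    class ScratchNode {
        static int currentGeneration = 1;

        int scratchType;    // tentative per-node result data ...
        int generation;     // ... valid only while this stamp is current

        void write(int type) {
            scratchType = type;
            generation = currentGeneration;
        }

        // Reads ignore stale data left over from earlier (failed) unifications.
        Integer read() {
            return generation == currentGeneration ? scratchType : null;
        }

        // Categorically invalidate all scratch data in all nodes: O(1).
        static void discardAll() {
            currentGeneration++;
        }
    }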

Central to the discarding problem is the fact that the most naïve implementation of a TFS is as a set of disjoint nodes, with each node independently allocated from the heap. Although such a per-node implementation affords the advantage that the memory address of each node is a globally perfect hash, the lack of an intermediate node-grouping structure complicates the bulk-discarding task. In this thesis, I also observe that node-centric modeling is at variance with Carpenter’s seminal formal analysis of feature structures, which accords a central role to only the structure’s topmost—or root—node.

Engineering considerations affect unifier performance. Although a per-node design can be implemented in either a native- or managed-execution environment, the method has increased penalties in the latter. In some managed environments, automatic garbage collection introduces additional indirection between object references and their underlying memory; these require extra time to traverse each time the object is referenced. Creating a distinct garbage-collected object for each of the millions of TFS nodes that occur during parsing or generation can result in considerable overhead.

Nevertheless, managed execution environments remain alluring, owing to the ease of development and comprehensive runtime infrastructure they provide. Given the preceding observations, a managed-code implementation of graph unification may obtain improved performance by departing from per-node allocation. This thesis presents a technique for representing the complete set of nodes for a given typed feature structure within a single distinct array. Such an approach trivializes the discarding problem and reduces the number of garbage collected objects, allowing a managed-execution system to enter the performance realm of native code.

To demonstrate that array storage is effective requires that it be challenged with realistic application scenarios. The techniques described here are implemented in agree, a new grammar engineering environment and chart analyzer that is introduced as part of this work. The software supports a number of specialized features and optimizations described in the TFS parsing literature, and is compatible with the technical infrastructure and established grammatical practice of the DELPH-IN (Deep Linguistic Processing with HPSG Initiative; http://www.delph-in.net) collaborative effort. This international consortium is a research community united in pursuing deep natural language processing with linguistically-motivated precision computational grammars based on Head-Driven Phrase Structure Grammar (HPSG, Pollard and Sag 1994). DELPH-IN compatible declarative grammars permit bidirectional mapping between semantic representations—in the format of Minimal Recursion Semantics (MRS, Copestake et al. 2005)—and surface realizations (i.e., sentences).

The thesis is structured as follows. In Chapter 2, I review typed feature structure formalisms, focusing specifically on the DELPH-IN joint reference formalism (Copestake 2002a). The chapter continues with a description of per-node TFS storage—describing how it corresponds most closely with a graph-theoretic view of the TFS node. I then compare this prevailing method with a new method, the main contribution of this thesis. In this method, a synthetic allocation domain—essentially a specialized allocation space private to each TFS—is superimposed upon the array primitive provided by the runtime environment. The detailed presentation includes a formal definition, contrasts with the traditional approach, illustrative examples, and a discussion of motivating considerations.

Chapter 4 describes a unification algorithm adapted to array TFS storage. I begin with a review of the historical progression of research on algorithms for linguistic TFS unification. Next, I present the new unifier, originally inspired by the Tomabechi (1991) method but adapted for array storage, enabled for concurrency, and with an important simplification. Specifically, the unifier obtains a guarantee not available to previous methods, namely, that the result structure can always be described by simple (scalar-valued) forwarding between the pre-existing set of arity maps which describe the input structures. The operation of the array storage unification algorithm is also demonstrated via a sequence of diagrams illustrating the internal states of the unifier as it progresses through a simple TFS unification operation.

In Chapter 5, I evaluate array TFS storage and the array storage unifier, comparing agree to the LKB (Copestake 2002b) and PET (Callmeier 2000) parsers, which represent broadly-delineated managed- and native-execution performance baselines, respectively. In these end-to-end tests, the new system joins the performance class corresponding to the faster of those two systems. In particular, agree’s built-in concurrency and relatively unfettered throughput scaling predispose it to the real-time parsing of long, complex sentences. Under this condition, evaluation shows parse times for the most complex sentence in the test corpus improved by up to 24% versus the comparison system.

Continuing work is discussed at several points throughout this thesis, and specifically in Sections 3.4, 4.4 and 5.2. Conclusions can be found in Chapter 6.


2 Typed feature structures

This chapter discusses typed feature structures (TFS) and their application in constraint-based grammars. The discussion is divided into three parts. The first part reviews prior work in the formal description of TFSes. Special attention is given to elective aspects of the formalism which have been adopted within the DELPH-IN research community. More to the point, aspects of the formalism—such as disjunction and default constraints—which the consortium’s research program eschews, are not reviewed.

To place the new storage method in context, Section 2.2 introduces some of the engineering issues relevant to TFS storage. For example, any representation scheme for TFS nodes must arrange the information from each node distinctly with respect to some singleton domain. A natural choice for this domain is the memory address space of the process. In this simple scheme, each node is sufficiently identified by a pointer, and a TFS can be identified by a pointer to one of these nodes, the root node of the TFS.

The fundamental problem of TFS storage is to encode mixed-arity data in a compact manner that permits extremely efficient access. An overview of this feature arity problem—the storage and efficient manipulation of typed feature structures used in linguistic modeling—is also given in Section 2.2.

The final part of this chapter, Section 2.3, gives a formal description of array TFS storage, a specialized storage and retrieval abstraction over system memory. This storage mode, a key contribution of this thesis, is formally described using notation adopted from the relational algebra (Codd 1970).

It is important to state at the start, however, that taking relational algebra as descriptive vehicle is in no way meant to suggest that conventional relational databases might present a suitable storage mode for linguistic typed feature structures; the relational algebra expressions in Section 2.3 are understood to describe memory-resident data structures. Linguistic parsing expresses extreme performance requirements which are generally considered the domain of in-memory processing, and so locally attached DRAM—in fact specifically uniform-memory architecture (UMA)—was the sole storage substrate considered in the context of the method described in this thesis. Relational algebra notation was adopted for its facility in describing tabular data. The notation was freely extended to suit the needs of my presentation, with no concern for the significance of these departures to RDBMS theory or practice, and these departures—for example, the co-opting of pre-made arity maps from the physical storage layout—likely have no implementation parallel in RDBMS systems.

2.1 TFS formalism

The use of typed feature structures to represent linguistic objects in constraint-based formalisms was most influentially summarized by Carpenter (1992), when he established connections between—and theoretical foundations for—earlier work by Pereira and Warren (1980), Aït-Kaci (1984), Gazdar et al. (1985), Shieber (1984, 1986), and others. Much of the emerging research of this period was closely related to work in logic programming (Aït-Kaci 1986), lattice computation (Davey and Priestley 1990), and the Prolog computer language (Colmerauer 1986).

In the following sections, I contrast two alternative formal views of typed feature structures. Where possible, I present the notation of the more prominent approach, which is that of Carpenter (1992). As regards notation, one exception—also adopted by Copestake (2002b)—is my use of the symbol ⊓, as opposed to ⊔, for unification, the binary operator which gives the greatest-lower-bound (GLB) type. As noted by Copestake (ibid., 55), this seems consistent with a visual presentation of (inverted) trees which increase their specificity downwards.2 Accordingly, in this thesis the symbol ⊔ represents the least-upper-bound (LUB) binary operator. These operators can also be used as unary prefix operators on sets of types.

2.1.1 Type hierarchy and feature appropriateness

Fundamental to typed unification grammars is a single organizing type hierarchy ⟨TYPE, ⊑⟩, a finite meet semilattice, or bounded complete partial order (BCPO), with a single maximal element ⊤ ∈ TYPE. That the partial ordering on ⟨TYPE, ⊑⟩ is complete entails that for any set of compatible types T ⊆ TYPE, there will be a unique GLB type t = ⊓T. This condition is important because we would like the type unification between any two types s ∈ TYPE and t ∈ TYPE, written as s ⊓ t, to be deterministic. The type hierarchy can be viewed as a directed graph, and in the DELPH-IN joint reference formalism it is taken to be absent of cycles—a directed acyclic graph, or DAG.

Figure 1. Type hierarchy for a simple grammar.3

An example of a type hierarchy for a simple grammar is shown in Figure 1. The unique top type ⊤ ∈ TYPE subsumes every type in the connected hierarchy. Below this, types are organized so that linguistic generalizations are exposed. In this case, type syn subsumes syntactic entities, cat subsumes categorical distinctions, and agr encodes a singular versus plural distinction. The list types support lists of values, and string is a special type which subsumes all nodes containing textual values.

2 The alternate view is justified in Carpenter (1992, 13).

3 Except where specifically noted, the grammar examples—modified to suit the concerns of this thesis—are adapted from the LKB tutorial grammar ‘g5agr’ (Copestake 2002b); any errors in these were introduced by my alterations. Diagrams shown in this thesis are produced by agree/WPF, one of the tools developed for this research.
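As a concrete (and purely illustrative) rendering of the determinism condition, the following Java sketch computes s ⊓ t over a small hierarchy by intersecting "self-plus-descendants" sets; in a BCPO, the intersection for compatible types always has a unique most general member. The class and method names are my own and belong to no DELPH-IN system; real implementations precompute this relation rather than searching it.

    import java.util.*;

    // Minimal sketch: a type hierarchy as a DAG, with type unification (GLB)
    // computed by intersecting lower-bound sets.
    public class TypeHierarchy {
        private final Map<String, Set<String>> parents = new HashMap<>();

        void addType(String type, String... supertypes) {
            parents.put(type, new HashSet<>(Arrays.asList(supertypes)));
        }

        // True if s equals t or t is reachable from s via parent links,
        // i.e., t subsumes s.
        boolean subsumedBy(String s, String t) {
            if (s.equals(t)) return true;
            for (String p : parents.get(s))
                if (subsumedBy(p, t)) return true;
            return false;
        }

        // The set containing t and every type it subsumes.
        Set<String> lowerBounds(String t) {
            Set<String> result = new HashSet<>();
            for (String s : parents.keySet())
                if (subsumedBy(s, t)) result.add(s);
            return result;
        }

        // s ⊓ t: the unique most general common lower bound, or null if the
        // two types are incompatible (unification failure).
        String glb(String s, String t) {
            Set<String> common = lowerBounds(s);
            common.retainAll(lowerBounds(t));
            for (String c : common) {
                boolean subsumesAll = true;
                for (String o : common)
                    if (!subsumedBy(o, c)) { subsumesAll = false; break; }
                if (subsumesAll) return c;  // unique by bounded completeness
            }
            return null;
        }

        public static void main(String[] args) {
            TypeHierarchy h = new TypeHierarchy();
            h.addType("*top*");
            h.addType("agr", "*top*");
            h.addType("sg", "agr");
            h.addType("pl", "agr");
            System.out.println(h.glb("agr", "pl"));  // pl
            System.out.println(h.glb("sg", "pl"));   // null: sg ⊓ pl fails
        }
    }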

Every type in this example hierarchy (except ⊤) has exactly one parent, so this hierarchy does not illustrate multiple inheritance. Multiple inheritance is an expressive technique for grammar writers to exploit, but often the resulting lattice—as authored—will violate the unique lower bound condition between some pair (or pairs) of compatible types.4 Figure 2 shows a fragment of a type hierarchy taken from the English Resource Grammar (Flickinger 2000) which illustrates the point. The shaded type, which is not included in the grammar as authored, is required to ensure deterministic type unification. As a convenience to grammar writers, implemented systems automatically compute the BCPO of the multiple-inheritance type hierarchy, a process that is briefly described next.

First, a partial order is completed by inserting extra anonymously-named types, as required, to eliminate all cases of non-deterministic type unification between consistent types. The agree grammar engineering environment accomplishes this using the technique described in Aït-Kaci (1984). This process may introduce redundant inheritance relationships between types, which should be removed for tidiness and efficiency, so the next step is to compute the transitive reduction of the resulting graph.5 For this step, rather than reducing the lattice from scratch, agree maintains the transitive reduction during and throughout the GLB computation phase.

Figure 2. Fragment of the ERG type hierarchy showing a greatest lower bound (GLB) type which was automatically inserted so that the type unification result 0-1-list ⊓ olist is deterministic. Absent glbtype1, the result would be indeterminate, namely, between onull and 1-ocons.

4 Specifically, the existence of any pair of distinct types for which there are multiple types which are compatible with both in the pair—yet themselves manifest no subsumption relationship—implies that the partial order of the lattice is incomplete. The undesirable result in the operation of the grammar is that the type unification binary operator is non-deterministic.

5 The main reason for computing an optimal transitive reduction—the removal of redundant edges—is that a later step in preparing the grammar is to unify each type with the constraints on its parents, a process known as expanding the grammar. An optimal transitive reduction ensures that the minimum number of unifications will be performed during expansion. Other than this—and for the convenience of visualizing the type hierarchy with a minimum number of connecting lines—there is no operational use for the reduced graph.
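The completion step itself can be sketched as a fixed-point computation: wherever a pair of types has more than one maximal common lower bound, synthesize an anonymous glbtype that subsumes exactly those common bounds, and repeat until every pair has a unique answer. The Java sketch below works over explicit lower-bound sets; it illustrates the idea behind the Aït-Kaci (1984) technique but is not agree's implementation, and all names are illustrative.

    import java.util.*;

    // Sketch of BCPO completion: lower.get(t) holds t itself plus every type
    // t subsumes. Inserted glbtypes restore deterministic type unification.
    public class GlbCompletion {
        static int counter = 0;

        static void complete(Map<String, Set<String>> lower) {
            boolean changed = true;
            while (changed) {
                changed = false;
                List<String> types = new ArrayList<>(lower.keySet());
                for (int i = 0; i < types.size() && !changed; i++)
                    for (int j = i + 1; j < types.size() && !changed; j++) {
                        Set<String> common = new HashSet<>(lower.get(types.get(i)));
                        common.retainAll(lower.get(types.get(j)));
                        if (maximalElements(common, lower).size() > 1) {
                            // Non-deterministic GLB: insert an anonymous type
                            // that subsumes exactly the common lower bounds.
                            String glb = "glbtype" + (++counter);
                            Set<String> bounds = new HashSet<>(common);
                            bounds.add(glb);
                            for (String t : types)
                                if (lower.get(t).containsAll(common))
                                    lower.get(t).add(glb);
                            lower.put(glb, bounds);
                            changed = true;  // rescan until a fixed point
                        }
                    }
            }
        }

        // The members of s not subsumed by any other member of s.
        static Set<String> maximalElements(Set<String> s, Map<String, Set<String>> lower) {
            Set<String> max = new HashSet<>();
            for (String c : s) {
                boolean dominated = false;
                for (String o : s)
                    if (!o.equals(c) && lower.get(o).contains(c)) { dominated = true; break; }
                if (!dominated) max.add(c);
            }
            return max;
        }

        public static void main(String[] args) {
            Map<String, Set<String>> lower = new HashMap<>();
            // As in Figure 2: 0-1-list and olist share onull and 1-ocons.
            lower.put("0-1-list", new HashSet<>(List.of("0-1-list", "onull", "1-ocons")));
            lower.put("olist", new HashSet<>(List.of("olist", "onull", "1-ocons")));
            lower.put("onull", new HashSet<>(List.of("onull")));
            lower.put("1-ocons", new HashSet<>(List.of("1-ocons")));
            complete(lower);
            System.out.println(lower.keySet());  // now includes glbtype1
        }
    }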

The formalism further extends the type system by introducing zero or more named features f ∈ FEAT for each type in the hierarchy. In the variant presented here (and as adopted by DELPH-IN), each feature must be introduced by exactly one type, t = Intro(f), t ∈ TYPE. By definition, the feature becomes appropriate for that type and all the types it subsumes—that is, all of the types which are descendants of t in the type hierarchy. The appropriateness function6 Approp(f, t) is thus defined as the unique partial function in FEAT × TYPE that globally satisfies (2.1):

∀f ∈ FEAT ∀t ∈ TYPE   Approp(f, t)↓ ⟺ t ⊔ Intro(f) = Intro(f)    (2.1)

The grammar’s set of types and features can be arbitrarily arranged to describe constrained entities. To do so, they are arranged in independent hierarchical structures called typed feature structures (TFS), which are formally described in the next section, Section 2.1.2. An example is shown in Figure 3, where the type hierarchy from Figure 1 has been extended by describing a canonical constraint for some of the types. Canonical constraints are authored as TFSes which are headed by the respective type, and which optionally list a set of named properties—the features—which are each paired with a value. Each value in these feature-value pairs refers to further substructure, expressed in the form of a nested TFS.7 The type heading this substructure may have no appropriate features, which signals a leaf in the arbitrarily nested structure. A feature is introduced by exactly one type in ⟨TYPE, ⊑⟩, an association deduced from context. For each type, the features it introduces plus the features inherited from its parent type (or types) form its set of appropriate features.

Figure 3. The type hierarchy from Figure 1 is extended by describing canonical constraints for each type. These typed feature structures (TFS) optionally pair a set of features with additional constraints. Feature names are shown with black text when they are introduced by their node type, or gray text if they are inherited from a parent type.

6 Regarding notation for partial functions, the downward arrow suffix (↓) defines a total predicate which indicates whether partial function f is defined for argument g. This form for defining the appropriateness function is adopted from Carpenter (1992).

7 In this thesis, the term “TFS” will generally not be used to refer to the organization of TFS substructure. For the purposes of this presentation, the fact that typed feature structures are structurally recursive is secondary to the engineering considerations related to the lifecycle of top-level structures. Therefore, unless otherwise indicated, TFS refers only to an entire structure whose topmost node is not referenced from within any other TFS (excluding for purposes of a non-linguistic structure sharing optimization). This definition includes all of the structures which comprise a loaded grammar, structures created via unification, parse chart edges (including partial rule applications), etc. To highlight this point, consider rule-daughter unification: the mother and daughter are each TFSes, but the point within the mother at which the daughter is unified will not be called a “TFS” (because it has outer structure relative to this internal node); this use of nomenclature is chosen because it correctly predicts the additional complication these types of unifications present for monolithically stored structures.

Completing the definition of the grammar itself, in constraint-based grammars, the linguistic entities corresponding to the entity classes in the traditional formal grammar tuple G = ⟨N, Σ, R, S⟩—viz., intermediate syntactic fragments, the lexicon, the rules, and the set of root symbols—are all represented as typed feature structures.

The set of constraints explicitly written for inclusion in the grammar becomes a reference set which, taken as a whole, comprises the intended model of linguistic structure. Constraints can be composed in infinite variation under the operation of TFS unification, which serves as the dual satisfiability and monotonic composition functions in unification grammars. TFS unification produces a new TFS (if such a structure is possible), namely, the necessarily unique most general structure which jointly satisfies all of the constraints in the argument structures; this is the topic of Chapter 4.

Formally, the solution to the grammar’s system of simultaneous constraints under unification is the (infinite) set of permissible linguistic structures. In these terms, the ambition of a grammar is to set forth the most succinct set of canonical feature structures which permit, via any sequence of successful unifications, only and exactly those structures which are judged to be grammatical. In practice, exploration of this vast solution space is guided via specialized unification-based chart parsers. The process is computationally intensive; a parser may produce tens of thousands of transient structures during the course of analyzing a sentence.

Figure 4. Every linguistic entity in a unification grammar is a TFS. Here, the TFS representing the lexical entry for the word “these” is unified with the first of two surface positions in a rule which forms a noun phrase from a determiner and a noun. The result reflects an incomplete rule application. Note that the type of the NUM feature—which was ‘agr’ in the rule—has become more constrained (‘pl’) in the result structure, reflecting that only plural nouns are eligible to complete this parse hypothesis.

An example of how unification composes new structures is shown in Figure 4. In this case, a lexical entry representing the plural determiner “these” is unified with (a subsection of) a rule which forms a noun phrase from a determiner and a noun. The rule TFS belongs to a certain class of structure that I refer to as mother-daughter TFSes; these make available one or more internal substructure nodes (‘ARGS’ or daughter positions) at which unification operations can be initiated. In this vein, Figure 4 shows a mother-daughter unification, which unifies the root node of one TFS with one of the daughter positions in a mother-daughter TFS. If successful, the operation yields a result structure whose root coincides with the mother TFS.

The typed feature structure defines the manner in which the types and features defined by a grammar are arranged in order to describe arbitrarily constrained linguistic entities. The arrangement is simple: a typed feature structure consists of exactly one type t ∈ TYPE and a set of attribute-value pairs which pair each feature in the set

{ f ∈ FEAT : Approp(f, t)↓ }    (2.2)

—the attributes—with another “typed feature structure”—the value. Although the elegance of this informally stated definition is compelling, it is not the basis for the most widely cited formal treatment, that of Carpenter (1992). In Sections 2.1.2 and 2.1.3, I compare approaches evolved from (2.2) with Carpenter’s canonical formalism, hereafter the functional formalism.

2.1.2 Functional formalism

This section reviews the analysis of typed feature structures advanced by Carpenter (1992). To begin, it is convenient to have a shorthand notation for feature paths within a TFS. These definitions make use of the list-concatenation operator, ⊕.

PATH = FEAT*   the set of all paths   (2.3)
ϵ   the empty path   (2.4)
α ∈ PATH   a path8   (2.5)
α = ϵ ⊕ α   traversal of the empty path is vacuous   (2.6)
α = ⟨f0, f1, …, fn⟩   a path is an ordered sequence of features   (2.7)
fα = ⟨f⟩ ⊕ α   path extension   (2.8)

A typed feature structure instance is defined by Carpenter as follows.

F = ⟨Q, q̄, θ, δ⟩   typed feature structure F is a 4-tuple of:   (2.9)
Q   a finite set of nodes subsumed by a root node, q̄ ⊑ q;   (2.10)
q̄ ∈ Q   a distinguished root node;   (2.11)
θ : Q → TYPE   node typing function (total);   (2.12)
δ : Q × FEAT → Q   feature value function (partial).   (2.13)

8 To avoid a conflict with the relational algebra projection operator π, which is used later in this thesis, paths are designated by α, rather than π, as in Carpenter’s presentation.

The set of all feature structures is given by ℱ. From a graph perspective, the unappealing aspect of this definition is that the two properties of a node—its type and its feature-value pairs—are bifurcated between θ and δ. The following convenience extensions allow nodes to be identified by their distinct paths from the root node.

δ : Q × PATH → Q   the feature value partial function can apply to paths   (2.14)
δ(q, ϵ) = q   traversing the empty path from any node is vacuous   (2.15)
δ(q, fα) = δ(δ(q, f), α)   path traversal can be expressed with recursion   (2.16)

By the end of the 1990s, this formal approach to typed feature structures had become HPSG canon, but an alternative formulation of typed feature structures—as directed graphs—was also well studied. First investigated in separate work by Johnson (1987), Smolka (1989) and King (1989), the graph idea was later expanded by King, who lobbied for its official adoption within the then-emergent HPSG school (King 1994).
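Read as an engineering blueprint, (2.9)–(2.16) amount to a pair of lookup tables keyed on opaque node identifiers. The following Java sketch is a literal (and purely illustrative) transcription: the node itself carries no data, its type and outgoing arcs live in the θ and δ maps, and path traversal follows the recursion of (2.16). All names are mine, not any system's API.

    import java.util.*;

    // Sketch of Carpenter's functional formalism: nodes are opaque integers;
    // θ and δ are modeled as maps rather than as per-node object state.
    public class FunctionalTfs {
        final int qBar = 0;                                  // q̄: the root node
        final Map<Integer, String> theta = new HashMap<>();  // θ : Q → TYPE
        final Map<Long, Integer> delta = new HashMap<>();    // δ : Q × FEAT → Q

        static long key(int node, int feature) {
            return ((long) node << 32) | feature;            // pack the pair ⟨q, f⟩
        }

        // δ extended to paths, per (2.14)-(2.16). Returns null where the
        // partial function is undefined.
        Integer deltaPath(int node, int[] path, int i) {
            if (i == path.length) return node;               // (2.15): empty path
            Integer next = delta.get(key(node, path[i]));
            if (next == null) return null;
            return deltaPath(next, path, i + 1);             // (2.16): recursion
        }
    }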

As for Carpenter’s views, he methodically dismisses the graph approach in his seminal book (Carpenter 1992, discussed further in Section 2.1.4). In time, it was his functional approach that was endorsed by Pollard and Sag (ibid., 1). Also influential was the adoption of this approach by Copestake (2002b) in her book which documents the LKB grammar engineering tool.

Notwithstanding this, the graph interpretation of the TFS is relevant to the work presented in this thesis, particularly as a fairly literal blueprint of the predominant engineering practice for DAG implementations. The next section gives a graphical formulation of the typed feature structure. Because the fundamental conception is straightforward, my presentation does not recall the detailed formal analyses of Smolka and his contemporaries. Instead, I introduce a new formulation which is intended to focus attention on issues relevant to the main thesis topics which follow.

2.1.3 DAG interpretation

As noted at the beginning of Section 2.1.1, the type hierarchy ⟨TYPE, ⊑⟩ is equivalently a directed acyclic graph (DAG), where each type is a node, and the transitive closure of the graph exactly expresses the grammar’s type unification partial function. Unrelated to this particular graph equivalence, DAGs have further utility in the modeling of typed feature structures. Specifically, the extension of ⟨TYPE, ⊑⟩ with features f ∈ FEAT means that each TFS also has an equivalent DAG representation. In this correspondence, the graph edges are labeled with features and they represent not an inheritance relationship expressing linguistic generalization, but rather the topological structure of the typed feature structure. Figure 5 illustrates this interpretation.

The figure also illustrates how the backbone of a traditional context-free grammar (CFG) is encoded in TFS grammars. A TFS can contain list-valued values by employing the ‘cons cell’ mechanism (adopted from the LISP programming language), and the terminals or pre-terminals comprising the right side of a CFG rule can then be encoded as a list of sub-structural TFS positions that correspond to proximal positions in the outermost input structure.9

9 For parsing, the proximity condition is adjacency (of lexemes) in the surface string (“sentence”). In the case of generation, the list in question is the set of elementary predications (EPs) in the semantics built thus far, and the proximity condition is mutual-exclusion of EPs from the input semantics with this (unordered) set. Other than operating on a proximity condition which is abstracted in this way, the agree chart analyzer has little parse vs. generate distinction.

NP → DET N

Figure 5. Typed feature structures can be represented as directed acyclic graphs where the types are the nodes and the features are connecting “edges.” At right is the DAG representation of the grammar rule depicted (with a conventional linguistic diagram) above. Note that the reentrancy depicted with green boxes in the latter manifests in the graph as multiple DAG edges pointing to the same node.
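For concreteness, the cons-cell list encoding mentioned above can be sketched as follows. This is an illustration only; the type names *cons* and *null* follow common DELPH-IN conventions, but the Java class itself is hypothetical.

    import java.util.*;

    // Sketch: lists inside a TFS as chains of *cons* nodes with FIRST/REST
    // features, terminated by *null*, mirroring LISP's cons cells.
    public class ConsDemo {
        static class Node {
            final String type;
            final Map<String, Node> arcs = new HashMap<>();
            Node(String type) { this.type = type; }
        }

        static Node list(Node... items) {
            Node tail = new Node("*null*");
            for (int i = items.length - 1; i >= 0; i--) {
                Node cons = new Node("*cons*");
                cons.arcs.put("FIRST", items[i]);
                cons.arcs.put("REST", tail);
                tail = cons;
            }
            return tail;
        }

        public static void main(String[] args) {
            // The DET and N daughters of an NP rule, as a two-element list.
            Node args = list(new Node("det"), new Node("noun"));
            System.out.println(args.arcs.get("FIRST").type);                  // det
            System.out.println(args.arcs.get("REST").arcs.get("FIRST").type); // noun
        }
    }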

I refer to any scheme in which TFS substructures (“nodes”) are conceived as TFSes—the view carefully avoided by Carpenter—as a graphical formalism. Engineering implementations which fundamentally parallel the graph treatment are called per-node implementations. Further remarks on the contrast with the functional formalism are presented in Section 2.1.4. In contrast, this thesis proposes a method that fundamentally diverges from per-node modeling. Although the array storage scheme proposed in this thesis does not instead embrace the functional formalism (certain of its aspects remain impractical for direct implementation), in at least one regard, the new work does hew more closely to Carpenter’s traditional formulation. Specifically, it incorporates a requirement that the root node of a TFS—viz., the topmost linguistically relevant structure—be inherently distinguished from its substructure.

Conceptually, the graph view of the TFS is simple to express. Extending (2.2), in the graph treatment, a node q ∈ Q is defined as a 2-tuple of a type and a feature-node pairing for each distinct feature appropriate to the type:

qi = ⟨ ti ∈ TYPE,  ⋃_{f ∈ FEAT, Approp(f, ti)↓} ⟨f, qj ∈ Q⟩ ⟩    (2.17)

In this formulation, qj is taken to be a reference to another node in Q, such that more than one node in Q may include it amongst its feature values. The fact that nodes are modeled in the notational domain in this description makes it tedious to describe specific structures with this type of notation. This issue, a reduced degree of axiomatization (as compared to Carpenter’s approach), will be addressed further in the next section.
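In per-node engineering terms, (2.17) is simply an object that owns its type and its arcs, with coreference expressed by reference identity. The following Java sketch (illustrative names, not any parser's actual classes) makes the point that reentrancy requires no extra machinery: two arcs that store the same object reference designate one and the same node.

    import java.util.*;

    // Sketch of the per-node (graph) view of (2.17): each node carries its
    // own type and feature-value pairs; coreference is object identity.
    public class PerNodeDemo {
        static class Node {
            String type;                                     // ti ∈ TYPE
            final Map<String, Node> arcs = new HashMap<>();  // ⟨f, qj⟩ pairs
            Node(String type) { this.type = type; }
        }

        public static void main(String[] args) {
            Node agr = new Node("pl");
            Node det = new Node("det"), noun = new Node("noun");
            det.arcs.put("NUM", agr);    // reentrancy: both NUM features
            noun.arcs.put("NUM", agr);   // reference the very same node
            System.out.println(det.arcs.get("NUM") == noun.arcs.get("NUM")); // true
        }
    }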

Note that this recursive definition does not distinguish the modeling of a top-level TFS—used in linguistic modeling—from the modeling of its substructure. Although this characteristic—that nodes and substructure are equally described by (2.17)—does not provide an obvious or inherent way to differentiate top-level linguistic entities from their contents, the simplicity of the conception is nevertheless appealing, as (2.17) alone can be taken as a generally sufficient description of the essence of typed feature structures.10

This same characteristic—a purely uniform treatment of sub-structural and root TFS nodes—also facilitates an effective engineering optimization which existing DELPH-IN parsers exploit, and which is not available, in an immediately obvious way, to the new storage method described in this thesis. At issue is the sharing of identical substructure between disjoint TFSes in order to reduce the storage cost for large numbers of TFSes. The technique, known as structure sharing, is examined in Section 2.2.2. Observations regarding structure sharing with respect to array storage can be found at the end of that section, and also on page 52, in the discussion of the array storage example.

2.1.4 Contrasting the functional and graph approaches

I now briefly contrast the two dissimilar conceptions of the TFS summarized in the preceding sections. Given the elegance and simplicity of the graph notion—that is, thinking of each TFS node as having a list of tuples of the form ⟨f ∈ FEAT, TFS⟩—this section considers why the formalism that appears more cumbersome was adopted, and why Carpenter favored it.

Endorsement by Pollard and Sag signaled a degree of acceptance for Carpenter’s formalism that the graph formalism was not able to attract in the linguistic community. One reason that this is surprising is the fact that, as an engineering abstraction, the functional formalism is not nearly as compelling—a point mentioned in Sections 2.1.2 and 2.3—and that de facto practice in computer engineering encourages a view closer to (2.17), the node-centric graphical conception. Best practices in object-oriented programming gravitate towards simple models in which like behaviors and properties are identified and conflated. Indeed, the internal representation of typed feature structures found in existing unification parsers, such as PET and the LKB, most closely corresponds to the graph abstraction described in this section.

To a certain degree, the elegance of the recursive TFS definition hides co-identity relationships between coreferenced nodes by promoting them to the domain of notation. For example, in (2.17), nodes qj and qk are coreferenced if and only if j = k. Although fundamentally adequate as a description, (2.17) resists further analysis. Indeed, ‘unrolling’ the recursive definition by manually writing down the graphical description of some specific TFS would be a tedious process with no reusable result. Carpenter’s axiomatized approach to TFS analysis may have garnered wider acceptance because it avoids this problem. Functionally abstracting over TFS topology relegates unruly structural variation to the domain of homogenous modeled data—as opposed to instances of lengthy, opaque singleton formulae—more readily enabling further analysis of the structures’ theoretical properties.

10 A valid criticism, however, is that while the formulation succinctly and elegantly describes all possible TFSes, it is awkward for describing some particular TFS. This point is addressed in more detail in Section 2.1.4.

It is not necessary to speculate about Carpenter’s demurral of the recursive view. As mentioned in Section 2.1.2, Carpenter knew of the graph treatment, and carefully promoted his method by pointing out a subtle case where the two approaches yield inconsistent analyses, his functional analysis giving what he characterized as a more robust interpretation. The divergent case (Carpenter 1992, 39) involves feature structures containing cycles. Although structures containing cycles are not permitted in the DELPH-IN formalism today, other variants which permit self-referential structures were receiving healthy research attention in the early 1990s. One use for cyclical structures cited by Carpenter is in a feature structure which models the liar’s paradox sentence.

Specifically, Carpenter observes that co-identifying entire sub-structures, as opposed to co-identifying only individual nodes, as his functional formalism does, complicates the deterministic processing of structures containing cycles. This is because coreferencing in a graph necessarily creates the appearance of identical substructure at each reentrancy. Carpenter interprets this result as meaning that cyclical structures introduce an infinitude of nodes. Conversely, in his functional formalism, the feature value function δ entails that each node in the feature structure corresponds to exactly one unique substructure, even in the face of cyclic structures, and despite the structure having an infinite number of paths.

The preceding sections reviewed two approaches to the formal description of TFSes. To complete the discussion of established theoretical practices related to typed feature structures, the next section discusses the TFS well-formedness requirement that many comprehensive formalisms, such as the DELPH-IN joint reference formalism, superimpose upon the fundamental TFS formal entity.

2.1.5 Well-formedness

As discussed in Section 2.1.1, the operational content of a unification grammar is a set of typed feature structures, each belonging to a predefined class of grammar machinery. Relevant classes include grammar rules, start symbols (root conditions), morphological or other lexical rules, lexical entries, and so on.

In the DELPH-IN joint reference formalism, typed feature structures are always well-formed, which means that no part of any TFS may be incompatible with any canonical (authored) constraint on the types that it references. As part of the process of loading a grammar, all of its feature structures are made well-formed. In this process—which is called expanding the grammar—the TFS representing the authored constraint for each type is unified with the constraint on its parent types (that is, with each parent’s own expanded TFS). Additionally, the internal nodes of each TFS are unified with the expanded TFS for their type.11

11 For this, I point out an optimization which is not noted in the DELPH-IN literature. The latter describes “each node [being unified] with the [expanded] constraint of the type on each node” (Copestake 1993, 2002b). However, as a TFS is expanded, many of its nodes become collaterally well-formed as a result of unifications with other well-formed TFSes. The agree unifier treats the lack of well-formedness as a sort of contagion, introducing the condition only on the root node of each un-expanded definition. As unification proceeds, nodes are trivially ‘inoculated’ or ‘infected,’ as appropriate. Combined with the technique of using a single unifier call to simultaneously perform all of the required parent type unifications, this reduces the number of well-formedness unifications needed for grammar expansion.

An example illustrating the well-formedness requirement is shown in Figure 6. Although neither t1 nor t2 constrain the type at path F.F in their nodes of type b and c, respectively, their unification results in a structure, t3, containing a node of type d. Because d is elsewhere constrained such that feature F must be consistent with g, the well-formedness requirement introduces g into t3.

Figure 6. Because unifying compatible types b and c results in a distinct type (d), the unification of t1 and t2 will introduce an additional unification with the constraint on d, thus introducing g to t3. Note also from this example that a path can use the same feature—viz. F.F—more than once without violating (2.1), nor necessarily introducing a cycle.

Because the unifier emits new TFSes during the course of grammar operation, it becomes responsible for maintaining the well-formedness condition. Accordingly, well-formedness will be revisited in Section 4.1, where the issue is discussed in the context of unification.

The remainder of this chapter contrasts two engineering approaches to TFS representation in implemented systems: the standard technique and the new method investigated in this thesis. Of the descriptive forms presented so far—the functional and the graphical—both engineering implementations parallel more closely the graphical treatment, but in at least one respect—its incorporation of inherent distinction of the root node of the TFS—the new method evokes aspects of Carpenter’s more widely accepted formal treatment.

2.2 Design considerations for TFS storage

This section reviews important design considerations in TFS storage. Most such issues have been studied in the context of TFS unification algorithms. A chronological review of the published literature on the subject can be found in Section 4.2, so the objective of this section is to describe the significant themes which motivate the design of array TFS storage, the description of which immediately follows this material.


2.2.1 Node allocation and discarding

Per-node TFS representations are characterized by the direct, individual accessibility of all TFS nodes with respect to some singleton addressing domain. Such a design can be used within the native execution environment of a C/C++ program—where node pointers correspond directly to virtual memory locations—or in the managed environment of a LISP, Java, or C# program—which typically insert a level of indirection between object references and their virtual memory locations. The PET parser (Callmeier 2000) is an example of a native-code parser, where nodes are accessed directly with native pointers, and the LKB system (Copestake 2002b) is an example of a managed-code parser, where nodes are accessed indirectly via handles to garbage-collected memory.

In general, because the feature-value pairs represent mixed-arity data, efficient storage is a challenge. This section introduces some of the concerns, but full examination of this topic is the subject of Section 2.2.3. The node object in a per-node design can refer to its list of feature-value pairs by containment, by reference, or by polymorphism. In a containment design, the list of attribute-value pairs is entirely stored directly within the node, so each node is a relatively self-contained, variably-sized storage entity. In native programs, arbitrarily-sized memory blocks can be allocated for any purpose, and the layout imposed on this memory is entirely defined by the application.

Alternatively, each node is allocated with a fixed size, and thus refers to its list of attribute-value pairs, which are stored separately elsewhere, by pointer or reference. Managed environments provide assorted lists, dictionaries, or maps for this purpose; they can be included by reference or by inheritance. In the former case, a reference to the list or map is stored in the user-defined object, rendering the size of the latter again fixed. For the latter case, list object instances are extended into user-defined objects by selecting the list object type as a base type for the user-defined object type. A benefit of the latter technique is that, depending on the details of the runtime environment, instantiating the derived object may result in just one, rather than two, heap allocations.
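The reference and inheritance options can be sketched as follows (illustrative Java only; a containment layout, in which the pairs are stored inline in a variable-sized block, requires manual memory layout and has no direct analogue in a managed language):

    import java.util.ArrayList;

    // (a) By reference: a fixed-size node holding its mixed-arity pairs in
    // a separately allocated structure—two heap objects per node.
    class NodeByReference {
        int typeId;
        int[] featureIds;          // parallel arrays: featureIds[i] is the
        NodeByReference[] values;  // attribute paired with values[i]
    }

    // (b) By inheritance: the node *is* its pair list. Depending on the
    // runtime, this may save one of the two allocations.
    class NodeByInheritance extends ArrayList<Object> {
        int typeId;                // elements alternate: featureId, value, ...
    }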

Section 2.1.4 noted that the graph interpretation of a TFS corresponds more closely to a per-node TFS representation, because just as node coreferencing is implicit in the memory location of a node in per-node implementations, node coreferencing is implicit in the notational variables of the graph interpretation. Furthermore, engineered systems generally exhibit no correlates to the functions which comprise an axiomatized form. This is all to say that directed graphs seem to naturally model the most straightforward computer-internal representation for TFSes. Such a design adheres to the object-oriented programming tenets of encapsulation and proper abstraction: defining entities that reify the most deterministic local relationships. Since types exhibit a one-to-many relationship with their features, they are reified over features: a node is an entity which associates that type with a set of feature-value tuples by containment, proximity, or distinct reference. Reifying features, on the other hand, is a weaker design because—feature appropriateness being inherited via the type hierarchy—features are lesser predictors of the range of types with which they may be associated.

The overhead of per-node allocation is minimal in native environments. In a naïve approach, each node is independently allocated from the global process address space. Allocations, once granted, are permanently fixed until discarded, so administrative block headers, if present, are simple—and in the simplest case, no such headers are needed. However, few sophisticated systems use a general-purpose allocator for TFS node storage. One problem is that freeing these allocations can result in undesirable fragmentation of the space available for future allocations, and this slows subsequent allocations because longer lists of free blocks must be traversed before a block of adequate size is found. Secondly, when a TFS’s constituent nodes are dispersed across memory, each must be individually located and freed. Structurally traversing a TFS just to discover—and then discard—its nodes is untenable. Recent experiments with agree show this to be less efficient than a sequential scan of up to 40% more memory, a result observed in unifier node-counting experiments which will be discussed in Section 4.4.10.

To avoid these problems, sophisticated modern parsers (such as PET) pool the node allocations related to a single TFS into privately-managed high-performance heaps which can be discarded as a whole. This basic characteristic—monolithic encapsulation to facilitate bulk discarding—is incorporated in the array TFS storage design as well, although the details are quite different (and these details ultimately disqualify array TFS storage from being categorized as “per-node” storage). In short, issues that threaten to hamper naïve graph abstractions are straightforwardly mitigated and the per-node abstraction remains aligned with sound, well-understood engineering practice.

In managed execution environments, individual allocations entail a different assortment of penalties. Each allocation may include one or more small administrative blocks which are not co-located with the block itself. The actual value of the reference returned to the application may point to such a control item, and this indirection allows the environment to relocate the actual allocation, if necessary. Relocations occur during garbage collection cycles, which allow free spaces to be re-consolidated, eliminating fragmentation. Unamortized for garbage collection, allocation in a managed environment is faster than with the default (general-purpose) process heap in a typical native program; there is only a single pointer to the beginning of free space, which is simply incremented by the required allocation size.12 There is no free list to traverse (modulo obscure internal details of the runtime environment). On the other hand, when garbage collection is triggered, a lengthy delay can occur.

A potential disadvantage of per-node storage is that nodes from distinct TFSes may be randomly interspersed across the process heap, an effect which is more pronounced as the heap becomes fragmented. Cache locality may be hindered when a processor has to range across large swaths of memory in order to retrieve co-relevant data. Managed heaps are less prone to fragmentation, but are subject to a different penalty. In managed per-node designs, there is no way to inform the garbage collection system that the lifetimes of the objects which comprise a TFS are bound together, so the garbage collector expends considerable effort ascertaining the set of objects that are eligible for disposal.

12 In fact, this type of speedy allocation is how sophisticated native-code parsers such as PET implement their private domain-aware heaps.
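The bump-pointer discipline just described—used both by managed runtimes and by private, per-TFS arenas in native parsers—reduces to a few lines. This Java sketch uses hypothetical names (it is neither PET's nor agree's code, and bounds checks are omitted); it shows why allocation is one increment and bulk discarding is one reset:

    // A private arena for one TFS's node records: allocation bumps a single
    // offset; discarding the whole TFS resets it. No free lists, no headers.
    class NodeArena {
        final int[] cells;   // backing store for fixed-size node records
        int next = 0;        // the single pointer to the start of free space

        NodeArena(int capacity) { cells = new int[capacity]; }

        // Reserve 'size' cells; the caller indexes cells[at .. at+size-1].
        int alloc(int size) {
            int at = next;
            next += size;
            return at;
        }

        // Discard every node in the TFS at once: O(1).
        void discardAll() { next = 0; }
    }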

2.2.2 Structure sharing

An advantage of any per-node system is that the address of any node is a perfect hash of the node’s identity. Because TFS membership is not a property of a node’s memory address, sharing substructures between disjoint TFSes is trivially implemented by simply sharing a reference to the same node. The discussion here focuses not on linguistically-motivated reentrancy, but rather on the sharing, for efficiency reasons, of any physical structure that happens to be identical.

Ignoring the issue of finding permissible structure-sharing candidates, when the substructure nodes of every TFS are openly accessible in non-movable process memory, it is trivial for any portion of one TFS to co-opt the substructure of another by simply pointing into it. Structure-sharing schemes appropriate to TFS parsing have been well studied (Malouf et al. 2000, van Lohuizen 2001).

However, this same property—the indifference, or blitheness, of a node to the top-level structure or structures to which it belongs—creates problems as well. Most significantly, distinguishing nodes only on the basis of their physical—as opposed to logical—identity places a severe limit on the sharing of substructures between top-level graphs. In a problem known as spurious structure sharing (or, in theorem proving, the renaming problem [Pereira 1985]), coreferenced nodes become incorrectly conflated if the same sub-structure appears more than once amongst all the participants in a unification operation (Malouf et al. 2000). For many unification algorithms, this problem means that any structures that are part of the grammar, or which contain internal coreferencing,13 must be banned from structure sharing. As an obvious—but expensive—solution to this problem, any time it is necessary to introduce distinct instances of a structure at multiple points within the same TFS, the structure can be copied beforehand.

The structure sharing limitations associated with TFS-blithe node storage can be mitigated by explicitly tracking each node’s TFS memberships—within the TFS, within the node, or both—and ensuring that the unifier always distinguishes nodes not just on their memory addresses, but on a joint ⟨TFS, address⟩ tuple (van Lohuizen 2001). This enhancement comes naturally to concurrent unifier implementations, which are prohibited from using methods that rely on per-node in situ scratch fields, since these would be contended resources. Schemes which assign a detached set of slots to the argument TFSes are, as a bonus, likely to resolve the node-identity problem mentioned here. In fact, van Lohuizen notes exactly this, and the restrictions on structure sharing described in Malouf et al. (2000) are lifted in his concurrent parser. For the same reason, agree inherently permits unrestricted sharing of all structures in its unifier, regardless of whether they are grammar-canonical, contain coreferencing, or are multiply-introduced during a given unification operation (a moot point since agree currently lacks a sophisticated structure sharing implementation).

13 A structure G that contains reentrancies δ(q̄, αi) = δ(q̄, αj) is excluded because if such a structure is introduced into result graph R at node ri and then again at node rj, nodes δ(ri, αi) and δ(rj, αi) may become incorrectly conflated.

Having said this, agree does currently implement a simple form of structure sharing. When loading the grammar, the system automatically identifies top-level grammar TFSes which can be treated as functionally identical by the unifier, perhaps via specialized feature restriction— or by substitution of just the distinguished type value on their root node. These manipulations are simple for agree to detect and deploy. As an example, in the English Resource Grammar14 (Flickinger 2000), the feature ‘RNAME’ is used for diagnostic purposes but does not contrib- ute to linguistic analysis. Restricting this feature renders a number of grammar entries func- tionally identical, modulo their root type. Similar techniques allow agree to share (at least once) 653 of the 8450 array storage relations (which each represent an entire TFS body, but not its root out-tuple) for the ERG’s type hierarchy.

2.2.3 The feature arity problem

The fundamental problem of TFS storage is to efficiently encode mixed-arity data. A TFS typically consists of several hundred nodes, and the number of attribute-value pairs stored in each of these nodes—and the specific set of attribute identities they exploit—are variable. For complex grammars, the number of pairs associated with each node ranges from zero to over a dozen. Attributes in these pairs are drawn from a set of a couple hundred, but they appear in nodes in a small number of fixed combinations. For example, the English Resource Grammar uses 206 features which appear in only 120 distinct configurations, ranging in size from zero to fourteen features.

In many TFS formalisms, including the DELPH-IN joint reference formalism, each feature must be introduced by exactly one type in the type hierarchy. The feature becomes appropriate for this type and all of its descendants, a condition which is an invariant property of the grammar. In a naïve per-node implementation, details of this invariant aspect of the grammar are implicitly proliferated in the list of feature-value tuples stored with every node, since each value in the list must indicate the feature it constrains. Contained in each node as such, the dispersal of this redundant information increases with the total number of nodes in the TFS. In other words, the mere appearance of a feature $f$ amongst the node's feature-value tuples asserts an appropriateness condition that could have been deduced from the type hierarchy alone.

Callmeier (2000) examines this issue and evaluates alternative methods of feature encoding within the per-node framework. In particular, he notes that, when unifying feature structures, any type unification that results in a type with additional appropriate features may require features in the input nodes to be rearranged, to ensure a position for each appropriate feature of the type of the output node, a process he calls coercion. This penalty can be avoided by pre-assigning a unique slot for each feature in every node. As conflicting desiderata, coercion—a time factor—competes with node size—a space factor—in the per-node design. Callmeier studies alternative systems for pre-computing feature layouts for each node type. These fixed-arity schemes try to minimize the number of coercions by guaranteeing that each feature always has the same slot in any type. The obvious and naïve implementation wastefully allocates slots in every TFS, some of which will never be used. An improved scheme pre-computes a partitioning of types into sets that do not interfere with each other, and re-uses a smaller number of feature slots on this basis. A third plan, the most space-efficient, gives each type its own feature layout, while attempting to align feature configurations where possible. A similar scheme (adapted for the array storage method described in this thesis) is described in Section 3.4. Callmeier's evaluation shows that, although the third approach is the best-performing fixed-arity scheme, it does not significantly outperform the naïve method, where each feature-value tuple explicitly carries a (numerical) feature identifier.
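As an illustration of the contrast, the hypothetical C# declarations below sketch the naïve encoding against a fixed-arity layout; they do not reproduce Callmeier's or agree's actual data structures.

```csharp
// (a) Naive per-node encoding: every attribute-value tuple explicitly
//     carries a numerical feature identifier, redundantly restating what
//     the grammar's appropriateness condition already determines.
sealed class Node { public int TypeId; public FeatureValue[] Pairs; }
struct FeatureValue { public int FeatureId; public Node Value; }

// (b) Fixed-arity encoding: a precomputed layout maps each feature that is
//     appropriate for a type to a fixed slot index, so stored values carry
//     no feature tags. Unifying to a type with additional appropriate
//     features may force the slots to be rearranged ("coercion").
sealed class FixedArityNode
{
    public int TypeId;
    public FixedArityNode[] Slots;   // indexed by layout[TypeId][featureId]
}
```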

The fixed-arity schemes evaluated by Callmeier are attempts to address a fundamental problem with the intuitive type-centric TFS representation, namely, that it fails to fully capitalize on the invariant configuration of the grammar's directed type hierarchy. As noted by Carroll (1993, 40), systems can be designed to take advantage of the fact that only a fixed subset of features is permitted on a node.


Feature coercion during type unification arises in systems where a typed node contains attribute-value pairs which bind features to nodes. This is the fundamental conception of per-node storage: consider a TFS path as a hierarchical sequence of node-feature alternations $(q_0, f_1, q_1, f_2, q_2, \ldots)$,15 where each feature $f_i$ is bound rightwards to a node $q_i$:

$(q_0, f_1(q_1), f_2(q_2), \ldots)$  (2.18)

15 Taking the features alone from this sequence describes a traditional TFS feature path, as described functionally in (2.7), but for storage purposes the type at each step of the way must also be accommodated, hence this interleaving.

In an alternative approach, the feature is bound leftwards in the sequence, giving a model where each node-feature key addresses a subsequent node:

$(q_0(f_1), q_1(f_2), q_2(f_3), \ldots)$  (2.19)

In the abstract, this subtle reformulation underlies the approach of array TFS storage, the main contribution of this thesis. Essentially, the method pushes the feature arity problem into the domain of node addressing. Integrating domain-specific knowledge—namely, details of the grammar's feature appropriateness condition—into the node addressing scheme means that the maximally efficient native addressing (pointers) can no longer be used—since each address must now encode a contextually determined parameter—but in managed environments direct access is already compromised by additional indirection anyway.
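Rendered as data-structure shapes, the two bindings differ as in the following illustrative C# sketch; the types and integer encodings are hypothetical, though they anticipate the mark-based conventions introduced later in this chapter.

```csharp
using System.Collections.Generic;

// (2.18) Rightward binding: each node owns a private map from feature to
// the child node it constrains, i.e. the per-node conception.
sealed class PerNode
{
    public int TypeId;
    public Dictionary<int, PerNode> Arcs;   // feature f -> node q'
}

// (2.19) Leftward binding: the <node, feature> pair is itself the address.
// One central table per TFS resolves it to the next node's <type, mark>,
// pushing the feature arity problem into the node addressing scheme.
sealed class KeyedTfs
{
    public Dictionary<(int mark, int feature), (int type, int outMark)> Table;
}
```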

It is not entirely accidental that I depict one additional binding in (2.19) as compared to (2.18). Although the groupings are presented only as an informal conceptual aid, the additional tuple hints at an important result which follows from the new technique: a simplification that can be applied to a well-known TFS unification algorithm. This material is presented later in this thesis, and that presentation will recall this point. Details are given in Section 4.4.1.

It turns out that strongly embedding the feature appropriateness condition into the storage addressing system ultimately results in a simplification of the unification algorithm. This is the subject of Chapter 4, but a summary of the insight is that, if the unifier can be guaranteed that the result TFS is entirely latent within a small, fixed set of fragmentary, mixed-arity feature mappings which are invariant precursors to the operation,16 then the number of data structures it must maintain and manipulate is reduced. This guarantee is indeed provided—by the mappings which comprise the operation's input arguments themselves. The mixed-arity feature mappings which describe their internal storage layouts are efficient and sufficient for this purpose because the result structure can always be represented by the arrangement of these pre-existing mappings. When these maps are co-opted, the unifier is freed from the requirement that it be able to assemble arbitrary structural fragments in unforeseen ways, resulting in simplification over previous methods. In the end, the unifier work product is reduced to a simple list of scalar values which indicate its sequence of map selections.

16 To be precise, the well-formedness requirement entails that some TFSes are not incorporated into the unification operation until it is already underway, but even for these the spirit of the statement holds, because their mappings were invariant prior to—and remain so throughout—the operation.

At the beginning of this section, I noted that, with per-node storage, the feature arity problem typically entails that redundant traces of the grammar's feature-appropriateness condition accumulate in each node. To conclude this section, I extend the analysis to the conceptually shifted storage mode just introduced. Does it exhibit similar abstraction leakage? In fact, the proposed method still reveals the set of features which are appropriate for the type associated with a node, but the leak is more benign.

The key difference is that in the new conception, the set of features cannot be independently retrieved from a node. To incorporate the feature identity into each node's address, a one-way function—specifically, a hash of the combined node and feature identity—is used, and this prohibits the direct recovery of a node's feature set. To accomplish this, a dependence on the grammar's globally-invariant feature appropriateness condition is fundamentally incorporated in the new access model; see Section 2.3.6, below. By forcing recourse to this single repository, the new model more strongly forfends against features being stored with a type for which they are inappropriate.

To summarize this point, in some designs, per-node storage can exhibit a normalization leak, namely, that two sources of information can exist in conflict with little guidance about which is correct: the presence of a feature-value tuple in a TFS node might be taken as conclusive evidence for the appropriateness of that feature for the type associated with the node—and this could be an error with regard to the global appropriateness condition. With array TFS storage, rogue features can still be stored, but since access requires that each caller obtain a feature key from the global invariant itself, later reference to such a feature would require two separate errors.17

17 The normalization issue is raised only as a hypothetical vehicle for illustrating a difference between the two methods. Naturally, implemented systems are designed and tested so that the conditions of the linguistic formalism are correctly enforced at all times and normalization errors cannot be introduced.

This concludes the discussion of motivating considerations in the design of a TFS storage system. I now turn to the main contribution of this thesis. The next section presents a formal description of array TFS storage. An implementation of the method is described in Chapter 3.

2.3 Array storage TFS formalism

Carpenter's formal presentation—with its separate, monolithic node-typing and feature-value functions which are only associated with the TFS itself—fares poorly as an engineering abstraction. Because both native and managed execution environments offer a cheap and extremely efficient perfect hash for a node—its memory address or object reference value—it would be quite arcane to consider a literal implementation of the node-typing and feature-value functions $\theta$ and $\delta$, respectively. Instead, it is obvious programming practice to interconnect a hierarchy of node objects, yielding the per-node implementation of the graph formalism.

But, as noted in Section 2.2.2, adopting the per-node view has problems of its own. Revisiting Carpenter's theoretical model, which more strongly reifies the top-level TFS, may yield insights appropriate to managed implementations. At the outset, discarding from the functional formalism the notion of bifurcated $\theta$ and $\delta$ functions immediately cuts the lookup overhead in half. Next, we consider the failed-unification discarding penalty; associating each allocation more strongly with the TFS to which it belongs has the potential to trivialize the discarding problem. Details on this are given in Section 4.4.3.

A final benefit of reasserting the primacy of the top-level TFS is the potential for increased structure sharing. Generally, as discussed in Section 2.2.2, encapsulating nodes within larger-scale movable structures inhibits the trivial sharing of substructure, but ironically, the identification of each node with a particular TFS may also facilitate, in the unifier, dealing with shared structures. This is because operating on a $\langle TFS, node \rangle$ association allows the unifier to distinguish between repeated instances of identical nodes which should be recognized as distinct, and spurious sharing is eliminated.18 Despite not implementing sophisticated substructure sharing, this is an advantage that the agree array storage unifier enjoys when unifying the same (top-level) argument TFS at multiple points in a single target,19 and which any future design for array storage sub-structure sharing should be careful to preserve.

18 The issue here is that some single TFS instance which is part of the grammar may be introduced at multiple points in the target structure during the course of a single unification, and the coreferences in each of these instances must not be conflated.

19 With the ERG, the case happens most often with certain difference-list sub-types which can trigger well-formedness unifications at multiple points in the same structure during the course of a single operation. As noted earlier in this chapter, concurrent systems such as agree and CaLi (Callmeier 2001) trivially permit unfettered re-use of multiply-introduced structures.

To further motivate departing from the per-node paradigm, I note that, in managed programming environments, node access via a system reference may not result in the direct access of memory anyway; it may be mediated through the additional indirection required to support automatic garbage collection.20 Taken together, these factors suggest that independently centralizing the node storage for each top-level typed feature structure could result in improved performance in managed code systems, especially if combined with a synthetic access key that efficiently solves the feature arity problem. This is the motivation for array TFS storage.

20 Many modern runtime environments implement Just-In-Time (JIT) compiling, where native code is prepared from the Intermediate Language (IL) on demand, and there is in fact no provision whatsoever for the direct interpretation of IL. In such systems, which include the platform used by agree, certain types of memory access end up resembling the native memory access patterns.

The remainder of this chapter describes a system where all of the nodes belonging to a single top-level TFS are monolithically encapsulated. Because this engineering treatment departs from the formal treatments of typed feature structures reviewed in Chapter 2, it is useful to explicitly clarify the terminology I will use. Where appropriate, notation concerning top-level structures from the functional formalism is adopted; for example, $Q$ from (2.10) will refer to all nodes in a (top-level) TFS. To reiterate the issue raised in Footnote 7 on page 7, although the DAG conception of typed feature structures entails that each node has the formal form of another typed feature structure, in this thesis, "TFS"—typed feature structure—will be understood to refer only to structures whose topmost node is not referred to by any other TFS node. Excluded from the category are arbitrary portions of substructure within a TFS, namely any node $q_i: q_i \neq \bar{q}$. The term will refer to linguistic TFSes in the abstract as well as implemented array storage TFSes, but in both cases will designate only top-level structures.

The present work describes a system which abandons the fundamental assumption of per-node storage, namely, that each node necessarily has exactly one physical representation. A corollary, also not adopted, is that reentrancy, or coreferencing between nodes, manifests exclusively as referential equality. Referential equality means that two entities are taken to be the same if and only if their addresses are discovered to be the same, implying that they are physically co-extant. In the case of computer memory, nodes are construed as globally unique, and coreferencing is signaled by value equality amongst memory addresses—that is, a single memory address—which is reached via multiple paths. I raise this issue because, while preserving the conflation of logical nodes, the physical node storage model in array TFS storage does not incorporate the same reliance on implicit referential equality, and this difference becomes significant in the unifier implementation.

In array TFS storage, a single, contiguous allocation (per TFS) stores all of the structure's nodes (except the root node) as an array of 4-tuples. The main advantages of this scheme are that it is trivial to collectively discard the nodes of a TFS, and the overheads for $|Q| - 1$ runtime allocations,21 per TFS, are avoided. Indeed, although implemented in a managed-execution environment, agree achieves performance comparable to a high-performance native parser.

21 Recall from Carpenter's notation that $Q$ is the set of all nodes in a TFS.
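The following minimal C# sketch illustrates this storage shape; the field names are illustrative stand-ins, not agree's actual declarations (the implementation proper is the subject of Chapter 3).

```csharp
// One value-type 4-tuple per entry, and a single contiguous array per TFS.
struct ArrayTuple            // in-tuple (f, mF) bound to out-tuple (t, mT)
{
    public int Feature;      // f  : encoded feature
    public int InMark;       // mF : non-zero in-mark
    public int Type;         // t  : encoded type
    public int OutMark;      // mT : out-mark; zero when t selects no features
}

sealed class ArrayTfs
{
    public int RootType;          // root out-tuple: type t ...
    public int RootMark;          // ... and mark m
    public ArrayTuple[] Tuples;   // all remaining nodes, one allocation
}

// Discarding the TFS is trivial: dropping the ArrayTfs reference frees a
// single array in one garbage-collection unit, with no per-node bookkeeping.
```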

2.3.1 Notation

This section summarizes notation from the relational algebra (Codd 1970) which is used to define array TFS storage and describe its operation. A relation $R$ is an unordered set of zero or more homotypical $n$-tuples, or entities,

$r := {}_R\langle p_0{:}\tau_{p_0}, p_1{:}\tau_{p_1}, \ldots, p_n{:}\tau_{p_n} \rangle$  (2.20)

Each $n$-tuple $r$ associates a set of values $\tau_{p_j}$ with a set of named, typed properties, or attributes $\mathcal{P}_R = (p_0, p_1, \ldots, p_n)$. The property-value mapping is uniform across all tuples in a relation. One can think of $R$ as a table of rows $\{r_0, r_1, \ldots, r_n\}$ with column headings $\mathcal{P}_R$, with $p_i$ giving the name of each property. Tuple properties are typed within some global domain of types $\tau \in \mathrm{T}$ particular to the application. A less cluttered method for indicating the types of tuple properties is

$R = {}_{(p_0, p_1, \ldots, p_n)}\langle \tau_{p_0}, \tau_{p_1}, \ldots, \tau_{p_n} \rangle.$  (2.21)

The tuples' values are referenced with 'dot' notation, so that $r.p_0$ is shorthand for the value of property $p_0$ for tuple $r$:

$r.p_i = \tau_{p_i}, \quad r = {}_{(\ldots, p_i, \ldots)}\langle \ldots, \tau_{p_i}, \ldots \rangle.$  (2.22)

This shortcut can be applied to relations as well—which entails removed abstraction, of course—so that $R.p_0$ refers to the name of property $p_0$ on relation $R$:

$R.p_0 = \mathcal{P}_R(\ldots, p_0, \ldots).$  (2.23)

Relations can be restricted by property or by range, with the $\pi$ and $\sigma$ operators, respectively. In the first case, the PROJECTION operator

$\pi_{\mathcal{P}' = (p_i, p_j, \ldots)}(R)$  (2.24)

expresses a new relation where each tuple from $R$ has the values corresponding to the properties not in $\mathcal{P}'$ deleted. This can result in duplicate tuple values, which are removed, since well-formed relations are, by definition, distinct. In some discussions, I relax the requirement that the range of a relation be distinct; these sections of the text will explicitly state the conditions of the suspension. Related to this is the need to retrieve a specific relation entity in tuple form, which is discouraged in relational algebra, where every operation is conceived as a total map from finite relation to finite relation (Tannen et al. 1992). For this, I permit referring to individual tuples via zero-based subscripting, meaning that $r_i$ refers to some entity which is present in relation $R$. As relations are not ordered, further utility of this type of indexing is limited to cases when relation distinctness has been suspended and all of a relation's tuples are known to be identical, or when the relation is known to contain a single $n$-tuple, and I wish to manipulate the singular distinct tuple. In either case, that tuple is—arbitrarily—any member of the relation, so it can be extracted from the relation as follows:

$r = \mathrm{argany}(R).$  (2.25)

Later in the presentation, in order to facilitate working with storage indexes, I incorporate an entity's storage index within its relation as a tuple property, but this will be understood as not changing the underlying storage of the relation. In practice, scalar values pertaining to the current node (such as a storage index) are explicitly propagated and carried forward within the algorithm. This relaxed approach to indexing is possible because the array storage relations discussed in this thesis are read-only, and entity indexes, once established, never change.


Returning to a review of relational algebra operators, the SELECT operator $\sigma_\varphi(R)$ applies the propositional predicate $\varphi$ to each tuple $r \in R$ in turn, giving a new relation $S \subseteq R$ which contains zero or more matching tuples from $R$.

Finally, $Q = R \bowtie S$ describes the inner—or 'natural'—JOIN operation for relations, which, for each property in $\mathcal{P}_R \cap \mathcal{P}_S$, yields a new relation $Q$ containing zero or more tuples with $\mathcal{P}_Q = \mathcal{P}_R \cup \mathcal{P}_S$

$Q = \{\,{}_{\mathcal{P}_Q}\langle \ldots \rangle_0, {}_{\mathcal{P}_Q}\langle \ldots \rangle_1, \ldots\,\}$  (2.26)

according to

$\sigma_{\bigwedge_{p \,\in\, \mathcal{P}_R \cap \mathcal{P}_S}\, r.p \,=\, s.p}(R \times S),$  (2.27)

that is, a set of tuples of the form ${}_{\mathcal{P}_R \cap \mathcal{P}_S}\langle \ldots \rangle$ selected from $R \times S$, where $r \in R$ and $s \in S$ share the same value for all corresponding property pairs, if any. It follows that, if $\mathcal{P}_R$ and $\mathcal{P}_S$ are disjoint, their inner join collapses to their Cartesian product

$(\mathcal{P}_R \cap \mathcal{P}_S = \emptyset) \rightarrow (R \bowtie_\emptyset S = R \times S).$  (2.28)
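Although the formal operators need no implementation, they map naturally onto familiar sequence operations; the following fragment is illustrative only, expressing π, σ, and ⋈ as C# LINQ over the ArrayTuple shape sketched earlier.

```csharp
using System.Linq;

static class RelationalOps
{
    static void Demo(ArrayTuple[] R, int m)
    {
        // PROJECTION (2.24): keep only the (feature, type) columns; Distinct()
        // removes duplicates, as required of a well-formed relation.
        var projection = R.Select(a => (a.Feature, a.Type)).Distinct();

        // SELECTION: sigma with the predicate mF = m, applied to each tuple.
        var selection = R.Where(a => a.InMark == m);

        // JOIN (2.27): tuples of R x R agreeing on a shared column (here,
        // out-mark against in-mark, anticipating Section 2.3.2).
        var joined = R.Join(R, a => a.OutMark, b => b.InMark,
                            (a, b) => (parent: a, child: b));
    }
}
```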

2.3.2 Formal description

I describe array storage with the relational algebra. At the outset, I point out that the objective is a theoretical model of a particular implemented system, as opposed to an idealized relational algebra model of the typed feature structure. Much of the vocabulary developed in this section is applicable to other designs, but the specific characteristics of the model presented in this section ultimately derive from engineering considerations—discussed in Chapter 3—and not from a criterion of descriptive elegance. The most egregious departure from an idealized model is that the out-tuple relation, taken alone, is not a well-formed (relational algebra) relation, a point which is the topic of Section 2.3.5.

I introduce the treatment by recalling the graph conception of a node. This is extended to a form suitable for relational algebra representation. As a storage model, this object model will at first contain no intrinsic provision for describing coreferenced nodes within the TFS. As the model is incrementally extended, this limitation is removed, and steps at which coreferencing is relevant will be pointed out. Notation from the functional formalism will be adapted where possible. However, Carpenter's use of $\pi$ as a TFS path instance, $\sigma$ as a type instance, and of $\bowtie$ as an equivalence relation, are in conflict with the relational algebra use of these symbols for the PROJECT, SELECT, and JOIN operations, respectively. To avoid ambiguity, I will designate paths with $\alpha$ and types with, e.g., $t$ or $\bar{t}$.22

22 TFS equivalence is not discussed in this thesis, so the symbol $\bowtie$, used in (2.40), refers to JOIN.

In unification grammar $G = \langle \langle \mathrm{FEAT}, \mathrm{TYPE}, \sqsubseteq \rangle, \mathrm{Approp}(f,t), \ldots \rangle$, a typed feature structure consists of a set of nodes $Q$ which each associate a type $t \in \mathrm{TYPE}$ with zero or more feature/node tuples:

$q_i = \langle t, \{ \langle f, q_j \rangle, \ldots \} \rangle, \quad t \in \mathrm{TYPE},\ f \in \mathrm{FEAT},\ q \in Q.$  (2.29)

One node, $\bar{q} \in Q$, is distinguished as the root of the TFS:

$F = \langle \bar{q}, Q \rangle, \quad \bar{q} \in Q.$  (2.30)

Additional conditions apply to this definition, for example, that every $q \in Q$ be reachable from $\bar{q}$, that the structure contain no cycles, or that all of the features in $q_i = \langle t, \{ \langle f, q_j \rangle, \ldots \} \rangle$, if any, be appropriate for $t$. Other authors have thoroughly developed these points for the functional treatment (Carpenter 1992) and the graph treatment (King 1994).

Direct mapping of such structures as flattened relational algebra relations, while possible, is ill-considered as an engineering design. For example, because each node constrains a small fraction of the features in FEAT, a relational model with properties $\mathcal{P} = (\bar{t}, f_0, f_1, \ldots, f_{|\mathrm{FEAT}|-1})$ implies a vast and sparsely populated table of nodes. And although one could imagine distasteful workarounds, $\mathcal{P} = (t_0, t_1, \ldots, t_{|\mathrm{TYPE}|-1})$ is effectively ruled out because $\pi_t(F)$ provides no way to identify the recovered features. This thesis discusses unworkable schemes such as these no further. Rather than exposing contentful linguistic features from the grammar in a relational algebra, the focus of array TFS storage will be on schemes of the variety $\mathcal{P} = (\mathrm{FEAT}, \mathrm{TYPE}, \ldots)$, that is, fixed-arity relational models into which the linguistic elements are subsumed as data (as opposed to relation properties).

The fundamental structural entity in array TFS storage is a 4-tuple, conceived as a pair of physically contiguous 2-tuples

$a_i = \langle {}_{(\mathrm{FEAT},\mathbb{Z})}\langle f, m_F \rangle, {}_{(\mathrm{TYPE},\mathbb{Z})}\langle t, m_T \rangle \rangle, \quad m_F \neq 0.$  (2.31)

Each 2-tuple pairs an encoded linguistic feature or type, respectively, with an integer. The "mark" integers $m_F$ and $m_T$ will be discussed shortly. As a notational convenience, entities $a_i$ can be equivalently expressed in flattened 4-tuple form

$a_i = \langle f, m_F, t, m_T \rangle.$  (2.32)

In this thesis, a tuple of the form ${}_{(\mathrm{FEAT},\mathbb{Z})}\langle f, m_F \rangle$ is called an in-tuple, and a tuple of the form ${}_{(\mathrm{TYPE},\mathbb{Z})}\langle t, m_T \rangle$ (or ${}_{(\mathrm{TYPE},\mathbb{Z})}\langle \bar{t}, m_T \rangle$) is called an out-tuple.23 In graph terms, the in-tuple corresponds to an arc, while the out-tuple corresponds to a node, but this presentation generally avoids the former term (except in the context of prior work) and more methodically develops the latter. Henceforth, the unwieldy property-typing prefix may be omitted from the 2-tuples, since the tuple type can be trivially deduced. Relations of entity $a$ (unordered collections of zero or more entities of the form ${}_{(\mathrm{FEAT},\mathbb{Z},\mathrm{TYPE},\mathbb{Z})}\langle f, m_F, t, m_T \rangle$) are denoted by $A$.

Array storage for a given TFS consists of exactly one contiguous array of these tuples, given as the relation $\mathbb{A}$. Although a single array storage relation could, in principle, be used for all TFSes within the entire grammar,24 designating the TFS as the extent of a storage relation is preferred, as it trivializes the problem of TFS discarding. The 4-tuples in $\mathbb{A}$ are unordered, which is to say the fixed order in which they happen to be arranged—and which happens to be of great interest to the unification algorithm presented in Section 4.4—is not formally significant.

23 I will try to avoid using the term edge for describing the out-tuple, even though it is pervasive in the source code of the implemented system. Among other confusions, it has an unrelated sense in chart parsing.

24 A related idea was investigated early in this research. In the earlier implementation, each feature was associated with a singleton array which comingled out-tuples from all (disjoint) TFSes. This "diffuse TFS storage" method did not offer an acceptably efficient solution to the TFS discarding problem.

In array storage, a typed feature structure is defined by a root tuple $\bar{q}_F = {}_{(\mathrm{TYPE},\mathbb{Z})}\langle t, m_T \rangle$ and a relation $\mathbb{A} \supseteq A$ on $\mathcal{P}_A = (f \in \mathrm{FEAT},\ m_F \in \mathbb{Z},\ t \in \mathrm{TYPE},\ m_T \in \mathbb{Z})$: $F = \langle \bar{q}_F, \mathbb{A} \rangle$. The definition will have more utility with its root tuple decomposed:

$F = \langle {}_{(\mathrm{TYPE},\mathbb{Z})}\langle t, m_T \rangle, \mathbb{A} \rangle, \quad \sigma_{m_F=0}(\mathbb{A}) = \emptyset.$  (2.33)

For convenient property access, the latter form is given flattened property names,

$F = \langle t, m, \mathbb{A} \rangle.$  (2.34)

For this thesis, this definition is complete, meaning that instances of $F$ are given, and they encompass the entire storage and sufficiently describe all substructure of a TFS. When it is clear that only a single TFS is being discussed, reference to the relation $F.\mathbb{A}$ can be abbreviated $\mathbb{A}$. As additional descriptive vocabulary is developed, it is important to remember that only (2.33) most closely models the implementation representation. For example, consider the description of TFS node $q$ developed later in this section:

$q = \langle \langle t, m_T \rangle, A \rangle.$  (2.35)

In this expression, the 4-tuple relation $A \subseteq \mathbb{A}$ is taken to represent the subset of zero or more feature-value pairs directly associated with node $q$, which is not exactly a generalization of $\langle t, m, \mathbb{A} \rangle$, where the 4-tuple relation $\mathbb{A}$ represents the unrestricted set of all nodes in a TFS. In fact, entities such as $q$—which encapsulate a (most probably) restricted 4-tuple relation—have no formal corollary in the storage implementation, wherein every operation ultimately involves only $F.t$, $F.m$, and $\mathbb{A}$. Only as fleeting memoranda in TFS manipulation algorithms—such as the array TFS unifier described in Section 4.4—do further derived entities find implementation purview. To summarize this point, I contrast a typed feature structure definition, $F = \langle t, m, \mathbb{A} \rangle$, with a description of its root node, $\bar{q}_F = \langle F.t, F.m, A \rangle$. The key task of the remainder of this section is to characterize this latter tuple, and, in particular, the (useful but theoretical) relation $A_q$, for all remaining $q \in Q$, giving the complete set of out-tuples in $Q$, the nodes of typed feature structure $F$.

The notation $m$ denotes a mark, an arbitrary integer. In the discussion that follows, marks may be distinguished as in-marks or out-marks, depending on the context in which they appear. Typically the context is the type of tuple (feature vs. type, respectively) the mark came from. But mark values serve to relate, by inter-equality, paired 2-tuples within the same TFS as belonging to the same logical TFS node, so no mark value (except for zero, which is forbidden as an in-mark) is inherently context-bound. In particular, for mark $m$, the set of tuples

$\pi_{f,t}(\sigma_{m \neq 0 \,\wedge\, m_F = m}(\mathbb{A}))$  (2.36)

selects the set of attribute-value pairs $\langle f, t \rangle$ in feature structure $F$ which belong to node

$q_m = \langle \bar{t}, \pi_{f,t}(\sigma_{m \neq 0 \,\wedge\, m_F = m}(\mathbb{A})) \rangle.$  (2.37)

푚 푓 푡 푚≠0 ⋀ 푚퐹=푚 The conjunction term excluding푞 in-mark〈𝑡𝑡̅ π zero in the SELECT픸 operation〉 is redundant given the array TFS definition. From here on, it is omittedσ . This expression represents node reconstitu- tion, as it permits the outward links of a traditionally-conceived node , { , , … } to be re- covered. In prose, from every 4-tuple whose (non-zero) in-mark shares the same value as the 〈𝑡𝑡̅ 〈𝑓𝑓 𝑡𝑡〉 〉 mark of some governing out-tuple, a set of tuples , is extracted, and this node is of type , where is the type stored in a “governing” out-tuple, discussed below. In the current design, 〈𝑓𝑓 𝑡𝑡〉 𝑡𝑡̅ for any out tuple , where type has no appropriate features, is required to have the 𝑡𝑡̅ value zero, a value which cannot be used to select further feature-value pairs. This is elaborat- 〈𝑡𝑡 𝑚𝑚𝑇𝑇〉 𝑡𝑡 𝑚𝑚𝑇𝑇 ed in Section 3.2.3.

The adequacy of array storage for linguistic modeling is achieved via two joining mechanisms.

(2.38) Each one-to-$n$ equivalence between a non-zero out-mark and the $n \geq 1$ occurrences of in-mark $m_F$ represents the set of constraints on node $q_m$, or simply $q_m$. Nodes with a type that has no appropriate features are given $m_T = 0$, which explicitly selects no features. Observe that because $m_F$ is barred from having the value zero, every constraint belongs to some out-tuple25 which expresses that non-zero value as its out-mark.

(2.39) Every in-tuple is permanently bound (by storage adjacency) to exactly one out-tuple—although the latter may be value-equivalent with other occurrences in $\mathbb{A}$, giving a total of $n \geq 1$ copies. This is a form of $n$-to-one equivalence between in-mark $m_F$ and out-mark $m_T$.

a. If $n = 1$, then the one-to-one equivalence represents that $q_{m_T}$ is a constraint for feature $f$ in $q$.

b. If $n > 1$, the many-to-one value equivalencies represent the reentrancies from assorted node-feature pairs to coreferenced node $q_{m_T}$.

25 More precisely, the correctness condition for any in-mark $m$ in array TFS $\mathbb{A}$ is that it be equal to either some non-zero out-mark present in $\mathbb{A}$ or equal to $F.m$, the mark of the root out-tuple: $\forall m_F \in \pi_{m_F}(\mathbb{A}): m_F \in \{F.m\} \cup \pi_{m_T}(\sigma_{m_T \neq 0}(\mathbb{A}))$.

Although (2.37) appears similar to an inner join on the property $m$ between the paired tuples in $\mathbb{A}$, this is not a precisely correct interpretation in relational algebra. Because array storage physically couples in- with out-tuples, the relation $\mathbb{A}_{OUT}$ cannot be considered an independent relation. In this form, it is not necessarily distinct. As noted in (2.39), coreferences in the modeled feature structure introduce a duplicate copy of ${}_{(\mathrm{TYPE},\mathbb{Z})}\langle t, m_T \rangle$ for each reentrancy, an issue discussed further in Section 2.3.4. However, if we imagine for a moment an implementation where independent, well-formed relations $\mathbb{A}_{IN}$ and $\mathbb{A}_{OUT}$ each contain a non-zero property $m$, then node reconstitution is equivalently recast as a projection of a natural join

$q_m = \langle \bar{t}, \pi_{f,t}(\sigma_{m'=m}(\mathbb{A}_{OUT} \bowtie \mathbb{A}_{IN})) \rangle.$  (2.40)

Inner join is commutative, but here the out-relation is shown on the left to emphasize that it is the out-to-in join (2.38) that is explicit, whereas the in-to-out binding (2.39) is implicit in the storage. The coupled 4-tuple approach was chosen precisely because it avoids this extra join operation, and also for the simplicity of having a single garbage-collected allocation encompass the entire contentful storage of a typed feature structure. Additional engineering considerations are addressed in Chapter 3.

2.3.3 Node governance

The presentation of the node expression, above, elided discussion of the node's governing out-tuple, wherefrom the node's type $\bar{t}$ was recovered. This will now be addressed. In array TFS storage, the address of every node (except $\bar{q}$) fundamentally incorporates its immediate context with regard to some other node. At the outset, it is important to note that this context is not a path expression—from $\bar{q}$ to $q$, or otherwise. Rather, the context is a single node wherein the type and mark which selects $q$ can be found. In traditional terms, the governing out-tuple corresponds to one specific parent node (out of the one or more) which has a path to $q$.

As an important first case, for the root node of a TFS, the governing out-tuple is given in the TFS definition. The governing out-tuple for TFS $F$ is explicitly given as $\bar{q}_F = \langle F.t, F.m \rangle$, so the type of feature structure $F$ is $F.t$. To reconstitute a node given its non-zero mark $m$, Equation (2.37) first gathers feature-value pairs according to $\sigma_{m_F=m}(\mathbb{A})$. Taking this step alone, the set of 4-tuples corresponding to the feature-value pairs for $\bar{q}_F$, the root node of $F$, is given by

$A_{\bar{q}_F} = \sigma_{m_F = F.m}(\mathbb{A}).$  (2.41)

Next, (2.37), repeated here as (2.42), projects $\langle f, t \rangle$ pairs that include only the type associated with each feature. In practice, this is incomplete because it does not permit a way to recover further substructure. For this, I extend the node tuple notation given in (2.29) and (2.19) so that it preserves the out-mark from its canonical 4-tuple, that is, the 4-tuple in which $\bar{t}$ was stored. It is straightforward to do this by including the entire governing out-tuple in the generalized node description, as shown in (2.43). For convenience, this tuple is given the flattened notation (2.44).

$q_m = \langle \bar{t}, \pi_{f,t}(\sigma_{m \neq 0 \,\wedge\, m_F = m}(\mathbb{A})) \rangle.$  (2.42)

$q = \langle \langle t, m_T \rangle, A \rangle.$  (2.43)

$q = \langle t, m, A \rangle.$  (2.44)

But as a reminder, note that this—and all other—extensions to the notation are developed for expository convenience, and do not change the underlying array storage definition of a typed feature structure $F$, which is still given by

퐹 = , , { , , , , … } . (2.45)

̅ From an engineering standpoint,퐹 〈 〈it𝑡𝑡 ̅ is𝑚𝑚 𝑇𝑇worth〉 〈〈 𝑓𝑓understanding𝑚𝑚𝐹𝐹〉 〈𝑡𝑡 𝑚𝑚𝑇𝑇〉 〉the abstraction〉 captured by these extensions. Later in this chapter, I show that they mirror transient structures that facilitate array TFS manipulation. Having defined property names for the graph node tuple, I can now express the appropriateness condition which was elided in (2.29):

$\forall q \in (Q \cup \{\bar{q}\}): \forall f \in \pi_f(q.A): \mathrm{Approp}(f, q.t)\downarrow.$  (2.46)

As this is a condition of formal correctness, it can also be viewed as a constraint on relation $A_q$, the set of rows in $\mathbb{A}$ corresponding to node $q$'s feature-value pairs. It is not complete as such, however, since it does not assert, for example, that the set of features in $A$ be distinct:

$|\pi_f(A)| = |A|$  (2.47)

I have yet to describe the process of recovering the governing out-tuple from a given feature-value pair. This is required, for example, in order to ascertain the node's type. Here, the intricate structure of array TFS storage is more deeply revealed. Capitalizing on the property of a TFS that it is fully connected—that is, that all of its nodes are reachable from the root node $\bar{q}_F$—array storage associates the canonical storage location of node $q$ with the one or more immediate parent in-tuples $\langle f, m_F \rangle$ that connect it to $\bar{q}_F$. More specifically, the governing out-tuple for node $q$ is found amongst the set of 4-tuples

$S = {}_{(\mathrm{FEAT},\mathbb{Z},\mathrm{TYPE},\mathbb{Z})}\,\sigma_{m_T = q.m}(\mathbb{A}),$  (2.48)

that is, those with out-mark value equal to node $q$'s in-mark. From this set, each 4-tuple $s = \langle f, m_F, t, m_T \rangle$ binds $q$, now equivalently

$q = \langle s.t, s.m_T, \sigma_{m_T \neq 0 \,\wedge\, m_F = s.m_T}(\mathbb{A}) \rangle,$  (2.49)


as a value for feature $s.f$ in some node $q_{s.m_F}$.26 Continuing now with the discussion of the governing out-tuple, observe that relation $\pi_{t,m_T}(S)$ will always collapse to either zero elements (if and only if $q.m$ is zero), or exactly one element,

$s = \mathrm{argany}(\sigma_{m_T = m}(\mathbb{A})).$  (2.50)

In the latter case, this gives the governing out-tuple for $q$, $\langle s.t, s.m \rangle$, or simply $\langle s.t, m \rangle$. This allows us to form a complete relational algebra expression for node $q_m$, the node corresponding to mark $m$:

$q_m = \langle s.t, m, \sigma_{m_F = m}(\mathbb{A}) \rangle.$  (2.51)

However, this does not exemplify a useful operational mode in the actual array storage implementation. First, being physically bound into a 4-tuple, the out-tuple relation alone is not well-formed (details follow in Section 2.3.4), and value equivalence is not automatically recognized amongst its distinct entities. But furthermore, as I explain in Section 4.4.2, this is a problem for the unification implementation, which requires that each node be given a perfect mapping. It would not be immediately clear how to privilege one of the rows in $S$ over the others. But more critically, array storage is not indexed in a way that permits efficiently selecting 4-tuples according to out-mark $m_T$, as expressed by $S$. The actual access method, which exploits the grammar's type appropriateness condition, is described in Section 2.3.6.

In array TFS storage, knowing value $m$ permits retrieval of feature-value pairs for $q_m$, according to a match of $m$ with zero or more in-marks, but given only $m$, it is not practical to discover $q_m.t$, the type of node $q_m$. Instead, based on a pragmatic observation, array TFS storage delegates to the caller the responsibility for maintaining $q_m.t$. The key observation is that navigation is implicit in every TFS operation; the requirement for examining a node never occurs in the absence of navigational context. When this appears not to be the case, there is always an outer operational scope which ultimately resolves to the root node of the TFS under consideration.

It is especially logical to delegate the responsibility for maintaining this information when considering the fact that the out-tuple governing $q_m$ may be repeated, by value, across multiple rows in $\mathbb{A}$. Although each of these rows, if there is more than one, necessarily holds the same type value, the lack of a distinguished storage location poses a dilemma for the array storage unification algorithm presented later in this thesis. Section 4.4.2 describes how the conflation of coreferenced nodes, since it is not implicit in array storage, is instead effected by the unification procedure.
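A traversal sketch makes the contract concrete: the caller threads the whole governing out-tuple (type and mark) through each recursive frame, since neither field is recoverable from the mark alone. AppropFeatures and Lookup are assumed helpers standing in for the grammar's invariant feature lists and the hashed $\langle f, m \rangle$ query described in the following sections; they are not agree's actual signatures.

```csharp
static class Traversal
{
    public static void Walk(ArrayTfs tfs, int type, int mark)
    {
        if (mark == 0) return;                      // type has no features
        foreach (int f in AppropFeatures(type))     // Approp(f, type)
        {
            var (t, mT) = Lookup(tfs, f, mark);     // out-tuple for <f, mark>
            Walk(tfs, t, mT);                       // pass whole out-tuple down
        }
    }

    // Assumed helpers, not defined here (see Sections 2.3.6 and 3.2.4).
    static int[] AppropFeatures(int type) => throw new System.NotImplementedException();
    static (int t, int mT) Lookup(ArrayTfs tfs, int f, int mark) => throw new System.NotImplementedException();
}
```

To start the walk, the caller supplies the root out-tuple from the TFS definition itself: Walk(tfs, tfs.RootType, tfs.RootMark).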

26 The conjunction excluding zero is restored here as a reminder that the node we are studying—node $q$—is usually not realized as this sort of full-form tuple when the type of its governing out-tuple has no appropriate features. As will be discussed later in this section, tuples of the form $q = \langle \langle t, m_T \rangle, A \rangle$ model entities which most closely parallel fields which are only present in an operational, recursive, daughter stack frame. When a node has no features, it is likely that such a call is never made, in which case the frame will never exist.

Reviewing to this point, I first showed that the value of mark $m$ pertaining to node $q$ is required for selecting its features via $\sigma_{m_F=m}(\mathbb{A})$. Next, we saw that the type of a node is stored in its governing out-tuple, but the resulting expression

$q_m = \langle s_0.t, m, \sigma_{m_F=m}(\mathbb{A}) \rangle$  (2.52)

still took as 'given' the mark value $m$ that was sought. What I did not yet unambiguously illustrate is precisely where this mark comes from. Thus far, the presentation has been licensed by tacitly asserting a correspondence between $m$ and the role of the node "pointer" in a traditional per-node TFS implementation. A practical implementation of any system will implicitly maintain references to entities that it is currently working with. In a per-node system, nodes are referenced by pointers whose values signify memory addresses. These addresses carry no inherent significance and require no further discussion. In contrast, the description of array storage is expanded such that the token used to reference a node incorporates domain knowledge. Therefore, it is instructive to examine more closely the runtime provenance and operational use of $m$.

We know that the first two fields in this tuple are notational extensions with undefined storage manifestation, but their form nevertheless provides a helpful insight. The mark $m$ was in fact collapsed from the trivial identity

푚푇=푚 𝑇𝑇 applied to . Indeed, taken as a𝑚𝑚 single entity, the singular픸 element𝑚𝑚 of , ( ) is simply the full governing out-tuple. Accordingly, just휎 as the node’s governing type . is not acces- 푠 π푓 푡휎푚푇=푚 픸 sible given the value of alone, neither is its governing mark . . Both of these fields—the 푞푚 𝑡𝑡 governing type together with the governing mark—are stored as one or more copies of an iden- 𝑚𝑚 푞푚 𝑚𝑚 tical out-tuple . , . , none of which are accessible via the node’s own mark.

It is now obvious that it is the responsibility of the storage consumer to pass not just the governing type, but rather the entire governing out-tuple into its recursive daughter frame. This burden is not onerous. All TFS operations—whether operating on traditional data structures or array storage—already pass a node pointer down to recursive stack frames, and this role in array storage is assumed by $m$. The difference with array TFS storage is that it may require the node type $\bar{t}$ to be passed down as well. Acknowledging this returns us to the earlier description of a node, adjusted so that marks are not discarded, and where the node's type is provided from without:

$q_m = \langle \bar{t}, m, \sigma_{m_F=m}(\mathbb{A}) \rangle.$  (2.54)

This concludes the core formal description of array TFS storage. The design suggests a new graphical form which is presented in the next section. Following this, Section 2.3.5 will briefly explain an important practical implication of the bound 4-tuple design. To conclude the formal discussion, Section 2.3.6 will present an analysis of the operational retrieval mode.

2.3.4 Array TFS graphical form

Because array TFS storage disrupts the referential equality of joined nodes (by representing them with multiple disjoint, albeit value-identical, physical entities), the DAG interpretation of typed feature structures (presented in Section 2.1.3) could be taken as no longer applying to the new storage form. Instead, the form can be modeled according to the graphical representation presented in this section.

Figure 7. Typed feature structure (a.), shown with the traditional DAG interpretation (b.) and a new graph form which parallels the array TFS storage representation (c.). Each "node" in (c.) corresponds to an array storage 4-tuple, a bound pair consisting of an in-tuple (shown as a connection point on the left side of a node) and an out-tuple (on the right side). Connections are labeled with mark values, which relate in-tuples to in-tuples (left-to-left, black line), out-tuples to in-tuples (right-to-left, blue line), and coreferenced out-tuples (right-to-right, dotted black line).

Formally, two entity types are represented, the in-tuple (the feature "arc" in the traditional DAG) and the out-tuple, or "node," and the two appear to establish disjoint structures. However, in array storage each in-tuple is physically bound with exactly one out-tuple, complicating matters. One approach might be to model each on a graph of its own, with the graphs then interconnected. Instead, I fuse each tuple pair into a single "hybrid" node with exactly two connection points. Then, every unique mark value in the TFS (except zero) is represented by exactly one connected set of arcs which attach to these points. Connections to the left side of a node signify value equality with the node's in-mark, $m_F$, while connections to the right side of a node signify value equality with the node's out-mark, $m_T$. A special node for the root of the TFS therefore does not permit left-side connection.

33 𝑚𝑚𝑇𝑇 𝑚𝑚𝐹𝐹 a blue line is drawn. This represents a node (out-tuple) selecting its immediate substructure (feature-value pairs) so the line always connects the right side of a node with the left side of another node. For example, the blue line labeled ‘6’ indicates that this is the value of the out- mark in a node with type ‘cons’ that has feature-value pairs ‘FIRST’ and ‘REST.’ Note that although there are two feature-value pairs, only one of them—an arbitrary selection—is given this type of connection. This is because all connections are transitive in this non-directed graph, a point which nicely leads to a discussion of connections between feature-value pairs. These are shown with black lines, which connect amongst two or more in-tuples, and therefore always connect the left sides of nodes. Lastly, coreferencing—which relates a common mark value amongst two or more out-tuples—is shown with a dotted line connecting the right-sides of the corresponding nodes. In the example TFS in Figure 7, there is just a single coreference, which transitively relates arcs = 1 and nodes = 1, illustrating in this case all three types of connections. 𝑚𝑚𝐹𝐹 − 𝑚𝑚𝑇𝑇 − This graphical representation will be useful for the array TFS unification examples in Section 4.4 because it emphasizes the particular internal structure (of the storage pattern) which is co- opted by the new algorithm. In particular, the discussion will note that any TFS unification re- sult (except of course ) yields a structure which is guaranteed to be representable as a simple list of interconnections between structures which have the form illustrated here—namely, the ⊥ original argument TFSes. Seen in this way, the unification procedure is greatly simplified. It becomes the task of revealing a structure which is necessarily latent amongst the given set of invariant argument graphs. The benefit of moving complexity out of the unifier—by allowing it select from amongst pre-made structures—is that parsing re-uses argument graphs across multiple unification operations, so work moved to other components may engender the amorti- zation of the transferred costs.

2.3.5 Properness of the out-tuple relation In Section 2.3.2, I mentioned that projections of the form

, ( ) (2.55)

푡 푚푇 may require special treatment in array storage.π Because퐴 binds in- and out-tuples together into a single 4-tuple, the intended identity of the out-tuples in manifests as value—as opposed to 퐴 referential—identity. In other words, in the current design, any (FEAT, ,TYPE, ) —and in fact 푆 itself—resembles an inner join (2.40) where identical out-tuples are replicated by value. The ℤ ℤ 퐴 픸 effect is transparent to most consumers of array storage. Because out-tuples are delivered to them by-value, they have no way to discern any difference between the (identical) copies of a coreferenced out-tuple that originate from different rows in . The issue is of great concern to the array storage unifier, however. Because it uses a special access mechanism to obtain stor- 픸 age indexes, the lack of physical (as opposed to logical) node equality in array storage be- comes patently clear. It becomes the responsibility of any client that “peeks behind the curtain”

34 like this to re-establish or somehow account for coreferencing that is not referentially “auto- matic.” Fortunately, the unifier has at hand an ideal mechanism for this task—the scratch slot forwarding mechanism which is central to unification. The technique will be discussed throughout Section 4.4.

Singularity of the out-tuple relation

$\pi_{t,m_T}(\sigma_{m_T = m}(\mathbb{A}))$  (2.56)

for all $m$ in the context of the improper storage relation $\mathbb{A}$ insists that each distinct, non-zero value of $m_T$ in $\mathbb{A}$ always be associated with the same type:

$\forall m\,(m \in \pi_{m_T}(\mathbb{A}) \wedge m \neq 0): |\pi_t(\sigma_{m_T = m}(\mathbb{A}))| = 1.$  (2.57)

This is a required property of array storage which is not intrinsically entailed by the bound 4-tuple design. Accordingly, the requirement is transferred to the domain of operational enforcement, an acknowledged compromise attributed to well-motivated engineering considerations.
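The condition lends itself to a straightforward validation pass; the C# sketch below is a hypothetical debugging check of (2.57) over the ArrayTuple shape assumed earlier, not a component of the storage itself.

```csharp
using System.Collections.Generic;

static class Validation
{
    // Enforce (2.57): every non-zero out-mark must always pair with the
    // same type, across all of its value-equivalent copies.
    public static bool OutTuplesAreSingular(ArrayTuple[] A)
    {
        var seen = new Dictionary<int, int>();      // mT -> t
        foreach (var a in A)
        {
            if (a.OutMark == 0) continue;           // type selects no features
            if (seen.TryGetValue(a.OutMark, out int t) && t != a.Type)
                return false;                       // conflicting types
            seen[a.OutMark] = a.Type;
        }
        return true;
    }
}
```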

In the next section, the node description is further extended to more closely model the actual implementation, where selecting from $\mathbb{A}$ always requires both feature $f$ and mark $m$ to be supplied.

2.3.6 Feature enumeration

Although the preceding section serves as a minimally sufficient description of array TFS storage, additional important concerns reside in a zone which blurs the distinction between practical implementation and formal necessity. Efficient access being critical, a presentation of the precise storage and access methods will follow in Section 3.2. But first, it is important to reiterate the essential reliance that the current design establishes with respect to a particular aspect of the TFS formalism, namely, the feature appropriateness condition. This will be of particular interest to readers considering the application of the present work to other feature structure formalisms which do not adopt the condition.

Consider that, as presented thus far, array storage does not offer a practical solution to the feature arity problem described in Section 2.2.3. An expression of the form

$\sigma_{m_F = m}(\mathbb{A}),$  (2.58)

while succinct for formal presentation, implies, for every node access, an exhaustive enumeration of every tuple in $\mathbb{A}$, clearly an unsupportable operational mode. As a reminder, in this example it does not suffice to stop the enumeration after finding the first match, because although the collection of in-tuples $\langle f, m_F \rangle$, taken alone, is distinct, when considering only $m_F$ there may be multiple satisfying tuples. This is as desired, of course, for these represent the zero or more feature-value tuples of a node.


A key point, however, is that the collection of in-tuples, taken as a whole, is distinct. In Section 3.2.4, I explain that the implemented design hashes $\mathbb{A}$ on $\langle f, m_F \rangle$. For the current discussion, suffice it to note that $O(1)$ retrieval can only be obtained via expressions of the form

$\sigma_{\langle f, m \rangle}(\mathbb{A}).$  (2.59)

Considering this, it is apparent that we need to add feature hints to the node reconstitution expression developed previously.

Recall that the set of features appropriate for type $t$ is an invariant property of the grammar, given by $\mathrm{Approp}(f,t)$. From the start, the design of array storage has intentionally incorporated intrinsic, as-given reliance on this feature appropriateness condition, and here we see why. Without a short list of features to pair with a node's mark $m$—forming $\langle f, m_F \rangle$ tuples—and to then try, in turn, as hashable queries, array storage has no practical means of node retrieval. Naturally, that short list of features is the set of features that are appropriate for the node type.

From the previous section, we know that the out-tuple containing a node's type and mark—which, as an opaque entity, might be considered the array storage simulacrum of a node pointer—is explicitly passed down from a higher call frame. By this means, the node type is available for obtaining a list of appropriate features. Also at hand is the node's mark, for recovering the node's feature-value pairs. For convenience, give this externally-provided out-tuple $\langle q_m.t, q_m.m \rangle$ the alias $\langle \bar{t}, m \rangle$. We are now ready to give the general expression for array storage retrieval of node $q_m$:

$q_m = \langle \bar{t}, m, \bigcup_{f' \in \mathrm{FEAT}:\, \mathrm{Approp}(f',\bar{t})\downarrow} \sigma_{(f = f') \,\wedge\, m_F = m}(\mathbb{A}) \rangle.$  (2.60)

With the exception of employing logical conjunction as a stand-in for $\langle f, m_F \rangle$ hashing, this expression now conveys a sufficiently accurate representation of the implemented storage method. Applying the transformations demonstrated in (2.41)–(2.44), we can express the feature-value pairs as relation $A \subseteq \mathbb{A}$:

$q_m = \langle t, m, A \rangle.$  (2.61)

Processing the feature-value pairs in $q.A$ is generally a task-specific concern which is peripheral to this exposition. A thorough example can be found in the presentation of array storage unification in Section 4.4. To conclude this section, then, it should be sufficient to note that the process described in this section can be continued with the zero or more out-tuples formed by $\langle a.t, a.m_T \rangle$, for each $a \in q.A$.
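Operationally, (2.60) corresponds to the retrieval loop sketched below, under the same assumptions as the earlier fragments: AppropFeatures stands in for the grammar's invariant feature list, and TryLookup for the hashed $\langle f, m_F \rangle$ query of Section 3.2.4; neither is agree's actual signature. A missed query yields the unconstrained out-tuple $\langle \top, 0 \rangle$, as discussed next.

```csharp
using System.Collections.Generic;

static class Retrieval
{
    const int TOP = 0;   // assumed encoding of the maximally general type

    public static IEnumerable<(int f, int t, int mT)> Enumerate(
        ArrayTfs tfs, int type, int mark)
    {
        // Query only the features appropriate for the node's type; nothing
        // in the storage itself can enumerate a node's feature set.
        foreach (int f in AppropFeatures(type))
            yield return TryLookup(tfs, f, mark, out var ot)
                ? (f, ot.t, ot.mT)
                : (f, TOP, 0);   // appears fully featured, albeit unconstrained
    }

    static int[] AppropFeatures(int type) => throw new System.NotImplementedException();
    static bool TryLookup(ArrayTfs tfs, int f, int mark, out (int t, int mT) ot)
        => throw new System.NotImplementedException();
}
```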

The original system design prohibited storing an out-tuple with type $\top$ (unless coreferenced, in which case it has a negative mark value [Section 3.2.3]). The original intention of this design was to reduce the size of TFSes representing unexpanded type definitions, which tend to be sparsely constrained. With expanded TFSes from realistic grammars, the actual space savings are limited. Although array storage balks at storing out-tuple $\langle \top, 0 \rangle$, this is precisely the out-tuple which is returned for any $\langle f, m_F \rangle$ query for which no data is found. This is generally desirable behavior, as it automatically gives every node the appearance of a full set of appropriate features, albeit unconstrained. In reality, these identities are chimeric, for the unconstrained out-tuple $\langle \top, 0 \rangle$ is also happily returned for features which could never be appropriate for $q$. Array TFS storage strongly promotes feature appropriateness as the recommended retrieval vehicle, but in fact, array storage is agnostic to feature appropriateness. It falls to the caller to record out-tuples only for features which are appropriate for the node type; otherwise, later callers may have no efficient, systematic way to discover them.

In summary, although in principle the set of attribute-value pairs selected by a node can be recovered by exhaustively searching $\mathbb{A}$, this would be prohibitively inefficient. Appropriateness of $f$ for $t$ is a property of the grammar which—while technically extraneous to the array storage description—plays a critical role in a practical implementation. Accordingly, pairs are enumerated by querying only those features which are appropriate to a node's type. Although essential to efficient operation, the association is neither stored within nor referenced by the storage component and must be externally obtained.

2.4 Summary

This chapter began by reviewing established approaches to the formal description of typed feature structures, contrasting two quite dissimilar conceptions. Next, motivating considerations in the design of TFS storage systems were discussed. The chapter concluded with the formal description of a new mode of TFS storage, in which a single, independent monolithic tabular allocation holds all of the nodes for each "TFS," here taken to refer only to top-level structures which undergo (meta-)operational manipulation. Although the formal description borrowed notation from the relational algebra, the intended implemented form for an array storage TFS is a simple block of process memory. The details of an implementation of the scheme are provided in the next chapter.


3 Array TFS storage implementation

This chapter describes the implementation of the array TFS storage system which was formally described in the previous chapter. The method amounts to a specialized storage and retrieval abstraction over system memory. The demonstration harness, agree, is a new grammar engineering platform developed as part of this research. agree is implemented as an ECMA-335 Common Language Infrastructure (CLI)27 application, and key aspects of the array TFS storage design were informed by engineering considerations related to specific features and limitations of this environment; these will be discussed in this chapter.

Section 3.2 presents specific details of the array TFS storage implementation. Section 3.3 provides example representations of linguistically motivated TFSes stored in the new array form. Lastly, and within the context of ongoing research, Section 3.4 describes an alternative to the hashing method currently used by the array TFS storage implementation; this section includes the results of preliminary analyses of the concept as it would manifest for the English Resource Grammar (Flickinger 2000). This approach, based on modulo division, shows potential as a faster, more compact solution to the feature arity problem.

3.1 Managed-code

While offering numerous advantages for rapid development of large-scale software projects, the "managed" CLI poses unique performance challenges. Automatic garbage-collection, a boon for design convenience, may require the introduction of a penalizing shim layer between every user-defined entity ("object") in the system and its physical representation. These extra headers consume a few bytes of extra space per object, but more importantly, may impose additional memory indirection each time an object is referenced.

The key observation is that engineering options are fewer—and thus extra care is required—when seeking the highest possible performance within managed programming environments. This was the experience of this project. Several storage schemes were investigated—and abandoned—before arriving at the configuration presented in this thesis. The CLI provides very few mechanisms for departing from its canonical paradigm; inevitably, the system design presented here is intrinsically tied to the specifics of the particular patterns, which are very limited in number, that enable extreme performance. This is not to say that the design of array TFS storage is fully implied by the constraints of its target platform, nor that any system which approximates the performance of the demonstration system within the CLI must be using the design described in this thesis. Rather, I am suggesting that when high performance is critical, there is a considerable reduction—relative to native-code programming—in the number of fundamental engineering abstractions that remain available for consideration.

27 European Computer Manufacturer's Association (ECMA), Common Language Infrastructure (CLI), 5th Edition (December 2010). http://www.ecma-international.org/publications/standards/Ecma-335.htm

Most notably, in the CLI, the array is the only managed object that provides the requisite flexibility. CLI arrays are distinguished with first-class recognition as internal primitives with special properties. Dedicated instructions in the intermediate language (IL) provide index-based access to the (obligatorily homotypical) array elements, which are internally allocated as a contiguous block of memory. The number of elements in a CLI array (which can be zero), along with the single designation of their strong type, are permanently fixed when the array is created.

Were a CLI array limited to containing a contiguously allocated collection of (strongly-typed) references to garbage-collected objects, this thesis research could not exist in its current form. It would be extremely cumbersome to avoid per-node allocation and difficult to achieve adequate performance. Fortunately, the CLI platform allows certain types of managed array, under specific conditions, to be treated as simple variably-sized blocks of unsupervised virtual memory. The platform supports the declaration and instantiation of user-defined value types, which circumvent the garbage collection mechanism (at the expense of referential equality) by everywhere exhibiting by-value semantics. Moreover, when used as array elements, the value type imposes an overlaid tuple structure on the underlying managed memory at no cost. This relatively obscure capability, without which an efficient managed implementation of array TFS storage would be impossible, is discussed in the next section.

3.1.1 Value types

In some managed languages, such as Java, arrays of user-defined types are always arrays of object references. Only in arrays of the built-in scalar primitives, such as integer, float, Boolean, etc., are the elements deployed by value in situ. Fortunately, C#—the CLI language used in this project—supports arrays of user-defined value types. A value type is a user-defined tuple prototype (or, when the term is used informally, an instance of the same) that manifests an altered semantics wherever it occurs or is manipulated. The mechanism is triggered by tagging the tuple declaration with the struct—as opposed to the class—language keyword (3.1). Value type instances receive a host of special treatments. The key semantic difference is that assigning a value type instance always causes its bitwise layout to be copied, a semantics referred to as by-value. Like the primitive scalars, they never share state with other instances and thus the notion of referential equality is undefined for these entities.28 Additional examples of 'value type semantics' are that the type declarations for value types cannot express inheritance (although they can implement interfaces), and that their fields can be used without initialization (3.3) (although the runtime still zeros out their memory for security reasons; see Section 4.4.3).

28 As will be noted in the next section, pointers can be used in C#, and thus a concept of referential equality—i.e. equality of physical location—can be foisted on value types. The end can also be rudely achieved via the 'StructLayoutAttribute/LayoutKind.Explicit' construct, which can permit (C-style) unions to be formed, but neither of these treatments imparts a notion of reference equality on the value types themselves. Curiously, there is such a notion in the CLI itself—so-called 'managed pointers'—but the capability is not exposed in the C# language. One language that provides access to this mechanism is Microsoft's C++/CLI, a version of the C/C++ programming language which targets the CLI.

struct VT                 // the struct keyword declares a value type                 (3.1)
{
    public int f;
    public int m_F;
    public int t;
    public int m_T;
}

VT vt;                    // vt is a local instance of value type VT on the stack     (3.2)
vt.t = 3;                 // fields of vt can be assigned without initialization      (3.3)
vt = new VT();            // this is one way to (re-)initialize an instance           (3.4)

Despite the vastly altered semantics, value type instances can be created in the same way as normal objects, by using the new keyword (3.4). This is an unfortunate conflation, because value type instances are essentially never29 independently allocated within the garbage-collected heap, which is what new implies to many programmers. It may help to think of a value type declaration as an overlay on some piece of memory. As a local variable or function argument, a value type instance is overlaid on a portion of the stack frame. As a member of a managed object, it is laid out, in situ, within a managed object. And as an array element, zero or more value type instances are laid out adjacently, in sequence.

To avoid excessive memory shuffling in a regime of by-value assignment, value types, when handled as standalone instances, are best suited to small tokens of no more than a couple dozen bytes. Significantly for the present work, however, declaring an array of value types does not devolve into an array of boxed value type references.30 Rather, the declaration results in a true array, where a single, contiguous block of memory contains by-value images of each instance, adjacently and in sequence. This means that an array of value types subsumes a (relational algebra) relation, where individual fields of zero or more homotypical tuples are accessible by range and property name. Of course, being memory-resident constructs, these arrays enable capabilities beyond those defined for relational algebra tables, for example, the trivial, direct access of individual tuples via index values particular to the physical storage order.

Use as array elements licenses the declaration of value types larger than the practical limit mentioned above; in the role of an array element type, a value type is simply a property-naming template which is repeatedly overlaid on the array's memory, and its fields—rather than its contents as a whole—can be written to, and read from, where they lie. This use of value types as patterning templates incurs no operational cost at runtime, and largely serves as a convenience to the developer. However, in working within this system, the programmer must be keenly aware of the altered semantics, so as not to accidentally make any direct reference to the stored instances. Doing so invokes by-value semantics, of course, resulting in the (possibly oversized) tuple being (unintentionally) copied out of the array.

29 Boxing is the exception; see footnote 30.
30 Boxing allows a value type to be transported by reference (i.e., by persisting in the heap), but by-value semantics still apply: the "unboxed" value type, once recovered, shares no state with the source instance.

To illustrate, recall value type declaration (3.1). Each instance of this tuple is 16 bytes, so the value type array rgvt (3.5) represents a single allocation of 16,000 bytes. Note that the managed array of value types itself is a normal garbage-collected object; the value type simply organizes its internal fine structure. Accordingly, each array storage TFS can be considered an independent, miniature heap which implements a specialized, domain-aware (because it capitalizes on the grammar's type-appropriateness condition) addressing scheme.

Naturally, the size of a managed array—which, once set, becomes permanently fixed—can be late-bound, at runtime. In (3.6) and (3.7), respectively, an integer is stored in the f field of the i-th tuple in rgvt, and then retrieved. This demonstrates direct, efficient access. On the other hand, (3.8) is not only inefficient, but probably incorrect with regard to the likely intention. By assigning the i-th array tuple to the local value type instance vt, the entire 16-byte entity is imaged into the local stack frame, where it no longer has any connection with its array source. Furthermore, the lack of referential equality means that assigning a value to field f will have no effect on the persistent array data. Only the local entity vt, which will expire with the stack frame, is altered.

VT[] rgvt = new VT[1000];  // a managed array of value types                          (3.5)

rgvt[i].f = 1;             // efficient in situ property access (writing)             (3.6)
int f = rgvt[i].f;         //                                   (reading)             (3.7)

vt = rgvt[i];              // probably not intended: value types manifest             (3.8)
vt.f = 1;                  // by-value assignment

Absent value type arrays, efficiently accomplishing the array storage scheme described in this thesis would require allocating each of the four properties of the 4-tuple relation as a separate array of primitive types. Each TFS would comprise four allocations instead of just one, and programming access in the language would be clumsier. In addition, the fragility of the overall system would increase by the requirement to operationally enforce tuple consistency across disjoint arrays. With the value type array, this is not a concern; obviously, if a tuple is a single, bound entity, it is impossible for it to become crossed with another.31

3.1.2 Direct access

The C# language also permits so-called unsafe access to primitive data, using C-style pointers, as shown in (3.9). Access to the entire memory block which comprises a value type array is also permitted. This means that, under controlled conditions, C# code can freely range over proscribed memory areas using native pointers. Strict limitations are enforced so that the guarantees of the managed environment—in particular, pertaining to type safety and garbage collection—are fully preserved. In a nutshell, the limitations amount to the prohibition of unmanaged access to managed references. There is also a mechanism, shown in (3.10), for accessing (permissible) fields directly within managed objects.

31 Of course, with concurrent editing by multiple threads—and with a value type whose total size is larger than the processor's native integer size—reading multiple fields of a value type can still result in a "torn read." This term describes the situation where one thread witnesses inter-field inconsistencies that arise due to non-atomic updating activities of another. In the current work, this possibility is ruled out by treating each TFS's storage array, once initialized, as read-only.

The fixed keyword establishes a scope block within which direct manipulation of the entire array with C-style pointers is permitted, even though the array itself is a managed heap object.

VT* p = &vt;              // trivially, the address of a stack instance can be taken  (3.9)
p->t = 3;                 // C syntax is used for pointers

fixed (VT* pvt = rgvt)    // explicit pinning of the managed array from (3.5)        (3.10)
{
    pvt[10].f = 5;
}

A question is, why should this approach be used instead of the normal access mode illustrated in (3.7)—after all, value type array rgvt is itself a managed object? The answer is that (3.7) is actually a lighter-weight mechanism whereby the runtime environment automatically pins down the array for the duration of a single access.32 To manually pin a full-blown managed object—such as a CLI array—so that its internal fields can be directly accessed requires the explicit syntax of (3.10).33 A summary of what is permitted when using pointers in C#: regardless of whether it points into the layout of a managed object, into a managed array, to a field within a value type, or to the stack, everything accessible to an unsafe pointer must include no managed object references.

Earlier in this research, I hoped that agree could be developed to adequate performance without appealing to the unsafe manipulation of value types. Eventually, I had to suspend this diktat for parts of the array storage system that were performance-critical. To retrieve a relation entity, the unsafe version pins 𝔸—yielding a pointer to its base address—and then maintains the pinning whilst navigating the hash tables, all using direct access and C-style pointer arithmetic. However, I found that even carefully tuned unsafe code that uses explicit pinning (3.10) is sometimes outperformed by managed array access, for which the environment itself can use lightweight, automatic pinning (3.7). For the most critical parts of the array storage system I experimented with hand-crafted IL34 code, which can exploit the best of both worlds—automatic pinning and hand-crafted optimization—and this code does outperform canonical access.

Beyond array storage itself, the array storage unifier also heavily embraces speedy pointer-based access. Section 4.4.3 details how the unifier's scratch fields can be instantiated as an array of value types on the stack. This technique relies on another unsafe facility of C# that has not yet been mentioned. With restrictions, the language allows for stack-based arrays. With these so-called stackalloc allocations (3.12), an amount is subtracted from the current thread's stack pointer to reserve a region of unmanaged memory on the stack that can only be accessed with unsafe pointers.35 Importantly for agree's purposes, the size of the stack allocation can be late-bound (determined at runtime); the point is illustrated by (3.11) and (3.12).

32 The way this works is that the garbage collector will examine the stack, looking for special invisible local variables that are specially marked for pinning. Merely placing an object reference in one of these local variables instructs the garbage collector not to move the object. The condition can be proactively canceled (i.e. prior to the function returning) by storing a null reference in the special local variable. Otherwise, it becomes moot when the frame expires.
33 In yet another pinning technique—not used in agree—a managed object can be explicitly pinned for an arbitrary duration, including across function boundaries.
34 IL—intermediate language—refers to the synthetic intermediate byte code that is produced by an ECMA-335 compiler. The runtime environment's just-in-time compiler converts IL to native instructions upon first use.

A value type can be used as the element type in stackalloc arrays. As noted, this technique—including its ability to accommodate a late-bound size—was tested in some versions of the array storage unifier presented in Section 4.4 of this thesis. As before, the value type specified as the type of the array element may not contain any object references.

int cb = 1000;                                                                       (3.11)
VT2* pvt = stackalloc VT2[cb];   // VT2, declared in (3.16), contains the            (3.12)
                                 // fixed buffer rgi used below

(*pvt).rgi[10] = 5;       // store an integer value at index 10                      (3.13)
pvt->rgi[10] = 5;         // equivalent, using arrow syntax                          (3.14)
*(pvt->rgi + 10) = 5;     // pointer arithmetic can be used                          (3.15)

Finally, with another use of the fixed keyword—a use unrelated to (3.10)—C# offers the even more obscure ability to instantiate an entire unmanaged array, by value, within a value type, as shown in (3.16). In this case, only the scalar primitives—and not user-defined value types—are permitted as the element type, and the number of elements may not be late-bound. These restrictions limit the utility of the feature, but nevertheless, it was used in the n-way unifier (Section 4.3). Since the time that code was retired, agree no longer uses fixed unsafe buffers.

unsafe struct VT2 { public fixed int rgi[20]; }   // 20 integers, in situ in a value type   (3.16)

3.2 Engineering

3.2.1 Root out-tuple

In array TFS storage, the root out-tuple q̄ of each TFS is not stored in the main storage relation 𝔸. The decision to store the root out-tuple separately followed from the idea that q̄ should not be bound to an in-tuple. This idea, in turn, was adopted because it enforces the notion that a TFS represented in array form does not represent the substructure of some other TFS. Since q̄ is the (exactly) one node in A which is not paired with an in-tuple, it is impossible to specify a feature "arc" which leads to q̄. Because each TFS A is essentially a private address space unto itself, substructure above its root is either an undefined notion, or entails a cycle.

A potential consideration might have been that deploying a distinguished out-tuple in 𝔸 would result in an additional system-managed allocation, but because out-tuples are represented as value types, this is not an issue. As discussed in Section 3.1.1, the value type is laid out in situ within 𝔸, so excess allocation is not a factor in the design decision. Ultimately, the separate-root design was chosen for the formal rigor it enforces, but the design did later present an unforeseen practical drawback, namely, that a set of unifier scratch fields (called a unifier scratch slot) for the root must always be set aside by the unification algorithm, a point that will be presented in Section 4.4.2. Of course, the adjustment always amounts to exactly one extra slot per TFS, which is easily arranged.

35 I hesitate to describe these as 'arrays' here, because these stack-based allocations are unrelated to normal managed arrays, which always exist in the garbage-collected heap.

On the other hand, the design does degrade nicely when storing a TFS N whose root type t_N has no appropriate features. Modeling these structures is important because it is desirable, in some cases, to consistently assume that every type in the type hierarchy proffers an (expanded) feature structure which represents its constraints. The array form of N has distinguished root tuple q̄ = ⟨t_N, 0⟩ and storage relation 𝔸 = ∅. An example is shown as Figure 8. It is not permitted for the root node of any TFS to be coreferenced—since this would necessarily introduce a cycle—so the out-mark value stored in the root node is normally positive—in practice, the value 1. Currently, the only exception to the root out-mark having the value 1 is for the array TFS storage instances which represent the expanded constraint for a type with no appropriate features. In this case, the number of 4-tuple rows is zero, and the TFS consists of just a root out-tuple, which will then have a mark value of zero.
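As a concrete illustration, the following minimal C# sketch shows one way the separate-root arrangement could be laid out. The type and member names (OutTuple, ArrayTfs, idAgr) are hypothetical and are not agree's actual API; VT is the 4-tuple value type from (3.1).

// Hypothetical sketch of the separate-root layout; VT is the value type of (3.1).
struct OutTuple
{
    public int t;     // type identifier (with flag bits; see Section 3.2.2)
    public int m_T;   // out-mark: 1 for a normal root, 0 if the root type has no features
}

sealed class ArrayTfs
{
    public readonly OutTuple Root;   // distinguished root out-tuple q̄, never stored in A
    public readonly VT[] A;          // the 4-tuple storage relation

    public ArrayTfs(OutTuple root, VT[] a) { Root = root; A = a; }
}

// Degenerate case: the expanded TFS for a type with no appropriate features
// consists of only a root out-tuple with mark zero and an empty relation:
//   var agrTfs = new ArrayTfs(new OutTuple { t = idAgr, m_T = 0 }, new VT[0]);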

3.2.2 4-tuple layout

This section briefly describes the engineering implementation of the 4-tuple relation

a_i = ⟨f, m_F, t, m_T⟩.   (3.17)

In the current implementation, each property of a is a 32-bit integer, so each tuple is 16 bytes wide.36 This means that a typical TFS of around 600 nodes occupies around 10,000 bytes of memory, all told. The sets of features FEAT and types TYPE are fixed aspects of the grammar, so individual features and types are selected by integers which are permanently assigned when the grammar loads.

Using a 4-byte integer for each member of the 4-tuple is wasteful. In particular, the use of 32 bits for identifying a feature means that realistic grammars use only a few millionths of one percent of the available range. Revision 10342 of the ERG uses only 206 distinct features, so using 16-bit integers for a.f would pose no overflow danger.

As for the mark values, with the English Resource Grammar, ample headroom is still available with a.m_F and a.m_T at 16 bits. Since each TFS has its own domain of marks, this would limit each TFS to around 60,000 nodes. A brief analysis follows. Although there is little reason to disable rule daughter (ARGS) deletion during parsing—except perhaps for debugging—this limit can be approached in that artificial scenario. I suspend for a moment the question of whether such a large structure could be produced in a reasonable amount of time. Still assuming no ARGS deletion, a binary branching grammar, and an average rule TFS size of around 500 nodes, the length of a sentence corresponding to a TFS with 60,000 nodes is about

36 Rather than introduce additional notation to distinguish feature- and type-identifier integers from features and types themselves (as conceived within the grammar), this thesis uses f and t, respectively, for both. Where the distinction is material to the presentation, it will be noted.

2^(log₂(60000/500 + 1) − log₂ 2) ≈ 60 (words).   (3.18)

Unrelated considerations (namely, the inherent complexity of parsing) render the analysis of sentences of this length (with realistic grammars) intractable, so, for the time being, 60,000 nodes is an acceptable limit for the number of nodes in a single TFS.
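For concreteness, (3.18) can be unpacked as follows, on the stated assumptions (strictly binary branching, so a parse of n words comprises 2n − 1 constituents, each contributing roughly 500 nodes when ARGS are retained):

\[
500\,(2n - 1) = 60000
\;\Longrightarrow\;
n = \frac{1}{2}\left(\frac{60000}{500} + 1\right) = 60.5 \approx 60 .
\]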

In agree, the type integer a.t incorporates the most intricate structure. As stored in array TFS storage, the type identifier value is stored in the low-order twenty-seven bits. The remaining bits—the upper five bit positions—are flag bits whose functions are summarized in Table 1. The top type ⊤ in ⟨TYPE, ⊑⟩ receives special treatment. For all grammars, it is always assigned a type identifier of zero. The flag bits are polarized such that a value of zero for the integer as a whole reflects the correct state for non-coreferenced ⊤.

flag     high 5 bits   description                                                        ERG count
coref    1___0         Coreferenced. Bit is set if the TFS node has more than one         n/a
                       path leading to it.
leaf     _1__0         Leaf. Bit is set if the type is a leaf type. This is used to       3,003
                       optimize type unification.
appf     __1_0         Bit is set if the type has appropriate features. Used to           5,915
                       optimize TFS traversal, including by the unifier.
type     ___00         Non-string type.                                                   8,019
string   ___10         Special string type.                                               1
string   _1010         The low 27 bits indicate a string from the grammar's table of      46,832
value                  strings. String identifiers are assigned when the grammar loads.
skolem   _1011         The low bits indicate a Skolem constant used in tactical           ~30
                       realization.

Table 1. High-order bit flags encoded in type identifier a.t. The low 27 bits indicate a type from the grammar's type hierarchy or a string identifier. Each type is assigned an identifying integer when the grammar loads. The identifier values are assigned according to a topological order on ⟨TYPE, ⊑⟩, such that t₀ ⊏ t₁ → id(t₀) < id(t₁), a property that is heavily exploited by the unifier implementation, which gives it an asymmetrical form. The order enables the trivial (and obligatory) alignment of the more subsumed type (putatively—since this is prior to the actual type unification calculation) with a designated unifier function call position. Several code paths within the unifier are eliminated by entailment. For example, the latest code (where the unifier never conceives new structure, and instead describes the result structure via scalar forwarding between the input structures alone) is streamlined when it can explicitly refer to the input which will ultimately be chosen as the forwarding representative (viz., the more derived type, or the greater-valued type identifier).

The high bit of a.t indicates a coreferenced node. For historical reasons, this information is redundant with the out-tuple's mark being negative. In fact, coreferencing is indicated by the same bit position (the most significant bit) in both a.t and a.m_T. It is an error for these to ever be inconsistent in any out-tuple.

Bit 30 always accurately indicates whether the type has any appropriate features. Because array storage mandates close coordination with the grammar's appropriateness condition, this flag is a powerful optimization used in every graph traversal. It is pervasively referenced throughout agree.
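A sketch of how these tests might be expressed in C# follows; the constant names are illustrative (not agree's identifiers), and the bit positions follow the prose above, with the remaining high positions reserved for the leaf and string flags of Table 1.

static class TypeIdBits
{
    public const int  IdMask = (1 << 27) - 1;   // low 27 bits: type or string identifier
    public const uint Coref  = 0x80000000;      // high bit: coreferenced node
    public const int  Appf   = 1 << 30;         // set if the type has appropriate features

    public static int  Id(int t)       { return t & IdMask; }
    public static bool IsCoref(int t)  { return (t & unchecked((int)Coref)) != 0; }
    public static bool HasFeats(int t) { return (t & Appf) != 0; }
}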


Summarizing the overall bit consumption of array storage entity a: a.t legitimately needs at least twenty bits, so it would be reasonable to continue to use a 32-bit integer. Combining this with the analysis of the other fields in the 4-tuple, we see that the current width of 16 bytes could comfortably be recast with only ten bytes:

f (16) + m_F (16) + t (32) + m_T (16) = 80 bits = 10 bytes.   (3.19)

Unfortunately, a record size of ten bytes is not very alignment-friendly. Whether the memory savings outweigh the penalties for accessing unaligned fields is a question for further empirical study.

3.2.3 Mark assignment

As an operational convenience, a negative integer is assigned as the out-mark for any node ⟨t, m_j⟩ where m_j has more than one appearance as an in-mark. This indicates that ⟨t, m_j⟩ is a coreferenced node, that is, a node which can be reached by more than one other node. Coreferenced nodes select their attribute-value pairs in the same way as non-coreferenced nodes, according to equality of their out-mark with a set of zero or more in-marks.

Therefore, negative mark values always have more than one occurrence amongst the relation's in-tuples. This is the set of reentrancies to that coreferenced node. Every out-tuple in the array TFS which has a particular negative value as its out-mark must share the same type value t. If this requirement—or the requirement that there be more than one such node—is violated, the array TFS is corrupt, and this signals not any linguistic condition, but a programming error.

As for non-coreferenced nodes: if a type has no appropriate features—a condition indicated by a special bit in the type-identifier integer—its out-mark will always be zero. It follows that any mark value greater than zero is guaranteed to occur in only a single out-tuple. That node's feature-value pairs are found by iteratively pairing the mark with each of t's appropriate features and querying with each resulting key. Conceptually, the out-mark matches with zero or more in-tuples, which identify the node's substructure: its feature-value pairs. Coreferenced nodes work the same way, if they have appropriate features. This is illustrated with an example in Figure 11 in Section 3.3. ⊤ nodes do not need to be explicitly stored unless coreferenced, because the storage system returns ⊤ in reply to any failed query.37 For coreferenced ⊤ nodes—or in fact any logically coreferenced node, regardless of whether its type has appropriate features—a negative out-mark must be stored so as to record the logical equivalence of the multiple storage representations.

37 Historically in agree, attempting to store ⟨⊤, 0⟩ was flagged as a fatal error. In recent work, this aspect of the design has been altered, with the ultimate result that the unifier's fallback fetch mechanism, described in Section 4.4.8, also became unnecessary. The minimum required condition was to ensure that the canonical expanded TFS for every type in the grammar manifest root coverage, meaning that its storage layout necessarily include a row for every feature appropriate to its root type (at a minimum), regardless of whether the feature is constrained. When combined with the formalism's well-formedness requirement, doing so guarantees that any result TFS can be described by a patchwork formed exclusively from the input arguments' storage layouts. Because the unifier—which co-opts these layouts—can always elect to incorporate the well-formedness TFS before unifying the actual argument nodes, this assures the unifier that a full complement of scratch slots will be available for any type it could possibly encounter.
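The retrieval procedure just described can be sketched as follows (hypothetical names: find stands for the hashed in-tuple query of Section 3.2.4, returning a row index in A or −1, and OutTuple is the sketch from Section 3.2.1; ⊤'s type identifier is zero, as noted above).

using System;
using System.Collections.Generic;

static IEnumerable<KeyValuePair<int, OutTuple>> NodePairs(
    VT[] A, IEnumerable<int> appropriateFeats, int m, Func<int, int, int> find)
{
    foreach (int f in appropriateFeats)      // appropriateness is supplied by the grammar
    {
        int ix = find(f, m);                 // hashed query on in-tuple ⟨f, m⟩
        OutTuple val;
        if (ix >= 0)
            val = new OutTuple { t = A[ix].t, m_T = A[ix].m_T };
        else
            val = new OutTuple { t = 0, m_T = 0 };   // unconstrained: the implicit ⟨⊤, 0⟩
        yield return new KeyValuePair<int, OutTuple>(f, val);
    }
}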

To achieve the highest performance, formal correctness of array storage is asserted only in the context of a cooperative contract between the storage system itself and the storage consumer. An example was already given wherein the canonical access paradigm necessitates reference to the grammar's feature-appropriateness condition, which is entirely outside the purview of the storage component. Additional conditions must be operationally observed by the storage consumer. First, marks must be assigned such that the set of in-tuples is distinct, in order to prevent the unintentional conflation of nodes. Out-tuple ⟨t, m_j⟩ necessarily recurs when m_j < 0, but all occurrences must consistently record the same type t. So a second condition is that no TFS may contain out-tuples ⟨t, m⟩ and ⟨t′, m⟩ where t ≠ t′.

It is also clear that, within the same 4-tuple, if the in-mark equals the out-mark, a cycle is described. Although the DELPH-IN joint reference formalism does not permit cycles, there is nothing inherent in array TFS storage which precludes them. Therefore the detection and prohibition of cycles is not the responsibility of the array storage system.38 Finally, in any out-tuple where t has no appropriate features, m_T will be zero, and this, in turn, entails that no in-tuple may ever exhibit m_F = 0.

3.2.4 Hash access

Access is the heart of array storage, and is obviously the most performance-critical aspect of agree. For O(1) access, the 4-tuples in 𝔸 are hash-indexed by in-tuple ⟨f, m_F⟩, which is a range-complete projection of 𝔸. In other words, the set of in-tuples in 𝔸 is distinct:

|π_{f, m_F}(𝔸)| = |𝔸|.   (3.20)

Abetted by a feature-appropriateness lookup, this hash key form permits the complete set of attribute-value pairs corresponding to a particular in-mark to be quickly recovered, in accordance with (2.60). The array storage scheme is itself agnostic about feature appropriateness; it will gracefully return out-tuple ⟨⊤, 0⟩ for any ⟨f, m_F⟩ tuple for which no constraint is stored.

Hashed fetching of a single 4-tuple a according to its in-tuple ⟨a.f, a.m_F⟩—supplemented with the ability to return a's storage index (later given as a.ix)—is the only operational mode supported by array storage, and it is sufficient for all of agree's grammar processing tasks, including expansion of the type hierarchy, unification, parsing, and generation.

It bears repeating that array storage does not maintain any index over a TFS's out-tuples, so the node type governing a given mark is not readily retrieved. As discussed at the end of Section 2.3.2, such a design delegates to the querying party the responsibility for enumerating only the appropriate features for a node, and this further implies knowledge of the node's type. In practice, this is never a problem, since—to prime the process—the root type of the TFS is explicitly available (it is explicitly stored, separate from 𝔸), and from that point on, it is natural and trivial for code that recursively traverses a TFS to privately maintain the information needed to build the hash keys of interest.

38 Only tiny cycles such as the one mentioned here are detected "for free" by the storage system. Therefore, a comprehensive approach to cycle detection, if needed, may properly lie beyond the scope of node storage and retrieval.

Hashing requires that each entity a_i ∈ 𝔸 be augmented with a 'next' field, which implements a singly-linked list of tuples whose hashes collide. In the implementation evaluated here, this is a managed CLI array of scalars ℤ_H which is allocated adjunct to 𝔸 such that |ℤ_H| = |𝔸|. The inelegance of this disjoint storage approach was justified by a few factors. Chiefly, controlled evaluation of the 5-tuple plan against the 4+1 disjoint plan was decisive in favor of the latter, a result that I attribute to better alignment. If each field in 4-tuple a is 32 bits, or four bytes, then the 4-tuple is 16 bytes in total, which is obviously an alignment-friendly figure. On the other hand, packed alignment of 5-tuples (20 bytes each) causes every alternate tuple to be misaligned for 64-bit (8-byte) access.

Because n-way unification (Section 4.3) is no longer the primary unification engine in agree, a second factor which influenced allocating the 'next' field as an adjunct array is no longer relevant. The issue is nevertheless interesting. The n-way procedure is able to determine when each node in the result TFS becomes definitive, so it makes sense for that algorithm to directly produce each nearly-complete a record at that time. However, because unification may fail later on, it is not wise to go so far as to allocate the final CLI array. In any case, n-way unification does not know the required size for such an array until its single pass completes. Instead the a records are imaged to n-way's topmost stack frame. Before starting n-way unification, the method calculates the worst-case maximum number of tuples that could possibly occur in the result TFS, and obtains an incubation buffer of this size. It is this estimation procedure that influenced the decision to keep the array storage tuple compact: the estimate is always in vast excess. The estimate is then multiplied by the byte size of each a entity to get the size of the stack allocation.

In addition to associating a 'next' field with each 𝔸 entity, each TFS has a dedicated table of initial hashes, ℍ. As with ℤ_H, this is a relation of ⟨h⟩ entities. ℍ is used to abstract the physical storage order from the hashing mechanism, which allows the unifier to privately prepare the CLI array 𝔸, adding records in arbitrary order and without concern for hash collisions. It then transfers the completed array to array storage, which trivially (i.e., by reference) installs it in the TFS. There is no opportunity in this procedure to rearrange records so that they are stored at their initial hash index—or added to the collision bucket of the record that is already there. ℍ provides the mechanism for each tuple a_i ∈ 𝔸 to be hashed regardless of the physical array index at which it happens to be stored.
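A sketch of the resulting probe follows (hypothetical names): H is the initial-hash table ℍ, next is the adjunct collision-chain array ℤ_H, and empty buckets and chain ends are assumed to hold −1. The clip shown is the power-of-two masking of (3.21), discussed next.

static int Find(VT[] A, int[] H, int[] next, int f, int mF)
{
    int ix = H[(f + mF) & (H.Length - 1)];   // clip the additive seed to the table size
    while (ix >= 0)
    {
        if (A[ix].f == f && A[ix].m_F == mF)
            return ix;                        // A[ix].t and A[ix].m_T give the out-tuple
        ix = next[ix];                        // follow the collision chain
    }
    return -1;                                // no row stored: the node is unconstrained for f
}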


Within this basic setup, I evaluated the performance of several hash function variations. The results are instructive, possibly illustrating complex interactions between orthogonal factors. For example, further research along these lines might seek to isolate differences which are due to a specific set of feature- and type-identifier value assignments. A related research program is described in Section 3.4. Also requiring further investigation are issues of which hash function yields the most efficient IL instruction sequences—or which of these sequences is then rendered most efficiently by the 64-bit JIT compiler. As one of the very few parts of the system which lies below the unifier, the storage hash described here is performance critical, so these issues merit close scrutiny.

Table 2 gives results for the best-performing of the hash functions that were considered across multiple scenarios. The function that has performed the best, regardless of |𝔸|, uses a table of 2⁸ = 256 initial hashes and the trivial hash function

h = (2⁸ − 1) & (f + m_F).   (3.21)

What is surprising about this result is that, parsing with the English Resource Grammar (Flickinger 2000), TFSes are produced in a size range of about 400–1200 nodes. With only 256 initial hashes, the bucket load is high, which can translate into up to five expensive list traversal operations per query. The explanation must be that the processor instruction for loading a byte from a 32-bit integer—which replaces, for example, modulo division for clipping the computed value down to the size of the hash table—is much faster than doing modulo division with an arbitrary divisor, and even measurably faster than the bitwise masking used when the table size is a power of two other than 256.

clip function    index size                 h = f + m_F   h = f ∗ m_F   h = (m_F ≪ 8) + f   h = (f ≪ 8) + m_F
k % npp(|𝔸|)     next pseudoprime ≥ |𝔸|     40.7          42.1          43.1                44.0
k % |𝔸|          |𝔸|                        40.6          43.0          43.0                44.8
k & (2⁸ − 1)     256                        38.2          41.3          38.9                38.6
k & (2¹⁰ − 1)    1024                       39.9          40.9          40.0                39.7

Table 2. Evaluation of array storage hash functions. Four runtime hash functions—which create a seed k by adding, multiplying, or shifting the feature identifier and in-mark—are examined with four clipping methods—which truncate k down to the size of the index table by performing modulo division (%) or bit masking (&). Results report the total parse time, in seconds, for 287 sentences in the 'hike' corpus.

Although it is easy to understand that multiplication is more expensive than addition, it is surprising that bit-shifting and masking are relatively expensive also, so much so that well-intentioned strategies that aim for better hash distribution by integrating all available input entropy may actually be better off just ignoring the lower-entropy f altogether. Recently, a possible explanation for this has come to my attention. Since 2001, PC microprocessors have included a set of specialized instruction-set extensions (known as SSE2) which facilitate, among other things, high-performance bitwise operations. It is possible, given the inestimable value of each square micron of real estate on today's CPUs, that the availability of a high-performance alternative on the same silicon justified compromises in the performance of the traditional x64 bitwise instructions. Furthermore, it is known that the current version of the .NET x64 JIT compiler does not emit SSE2 bit operations. Taken together, these considerations could explain the poor performance of the bitwise tests in Table 2.

3.3 Example

This section provides examples of typed feature structures as represented according to the array storage scheme described in this thesis. As described in Section 2.3.2, an array TFS storage entity A binds a table of 4-tuples 𝔸 with a single 2-tuple q̄ = ⟨t, m_T⟩, which serves as the root entry point. For logical clarity in the depictions that follow, the root out-tuple will be adjoined to the top of the table depicting 𝔸, but this does not imply that the root out-tuple is stored in 𝔸; as noted in Section 3.2, it is stored separately. Tables representing array TFS storage relations are always arranged with the in-tuples ("arcs") on the left and out-tuples ("nodes") on the right. The distinguished root tuple, being an out-tuple, is thus shifted to the right side.

The examples will not show data related to the hash access mechanism described in Section 3.2.4, since this aspect of the design employs well-known techniques. I also will not label the rows of 𝔸 with index values, since storage order is irrelevant to the canonical access mode outlined in Section 2.3.6. The unifier, however, does examine row ordering, so index values will be shown alongside the array rows in the unification examples in Section 4.5.

The first example shows the trivial case of a TFS which represents the expanded constraint for a type which has no features. Details were provided in Section 3.2.1 and the case is illustrated in Figure 8.

     in-tuple          out-tuple
     m_F   f           t     m_T
     ROOT→             agr   0

Figure 8. Every type in the type hierarchy must have an expanded TFS, even if it has no appropriate features. The requirement that m_T be set to zero for all non-coreferenced nodes whose type has no appropriate features applies equally to the root out-tuple.

Next, consider the NP rule from the demonstration grammar used throughout this thesis. The array TFS storage representation is shown in Figure 9. The example illustrates coreferencing in array TFS storage. Coreferenced nodes are always indicated by a negative mark value; in this TFS, there is a single coreferenced node, associated with mark value -1. Interpreted as an out-mark, any negative value must be consistently paired with the same type across all out-tuples in 𝔸 that share that mark value; accordingly, the tuple ⟨agr, -1⟩ is repeated three times. This is the by-value equivalence which was the topic of Section 2.3.4. Contrast this with the DAG representation of Figure 5, where coreferencing is implemented according to referential equality, that is, via an assumption of global node uniqueness which is implicit in the model.39 The contrast is also evident from the fact that there are ten nodes (and eleven arcs) depicted in Figure 5, whereas there are eleven rows in the array storage representation (that is, not including the root tuple) of the same TFS in Figure 9. In fact, the number of rows in array storage relation 𝔸—and thus by extension, in any array storage TFS—will always equal the number of DAG arcs, and not the number of DAG nodes.

39 Here, I am referring to physical—not logical—node uniqueness; the latter is of course a non-negotiable correctness requirement of the linguistic formalism. Physical uniqueness means that a node is unique across the virtual memory of the process. Array TFS storage abandons this condition while preserving logical uniqueness; in the DAG view, each logical node has a unique physical representation.

     in-tuple           out-tuple
     m_F   f            t        m_T
     ROOT→              phrase   1
     1     CAT          np       0
     1     NUM          agr      -1
     1     ARGS         cons     5
     5     FIRST        syn      2
     5     REST         cons     4
     2     CAT          det      0
     2     NUM          agr      -1
     4     FIRST        syn      3
     4     REST         null     0
     3     CAT          n        0
     3     NUM          agr      -1

Figure 9. Array TFS storage. The typed feature structure for the grammar rule NP → DET N is represented as a single root out-tuple, plus a relation 𝔸 of paired 2-tuples which has eleven (additional) rows. Compare to Figure 5, where the same structure is depicted as a directed graph.

This suggests an intuitive view of array storage as a feature-centric storage paradigm (whereas DAG conceptions could be considered node-centric). Although this tabular, feature-centric approach removes the convenience of "automatic" referential equality between nodes—which in turn requires the unifier to take steps which reinstate the condition—it turns out that the model simplifies the unifier itself, overall. This conclusion is the topic of Chapter 4, but in a nutshell, when unifier scratch fields are available on a per-arc (as opposed to per-node) basis, a new guarantee arises, namely, that the unification result structure, if it exists, can always be represented as a simple interconnection of pre-existing structures, resulting in no (additional) manipulation of operationally-variant structures.

Returning now to the discussion of Figure 9, it is clear that, for coreferenced nodes, there is no storage row in 𝔸 that might be taken as the canonical physical storage location for logical node ⟨agr, -1⟩; amongst the three rows that express this out-tuple, there is nothing that obviously distinguishes one of them relative to the others. Logical node identity is still preserved, though, because the same out-tuple is retrieved no matter which of these rows is queried. Furthermore, since this out-tuple will always contain the same (negative) value of m_T, its substructure is correctly conflated. Details on this point are given next.

The coreferenced node ⟨agr, -1⟩ in Figure 9 does not itself have any substructure. To illustrate such a case, the modeling of syntactic agreement in the demonstration grammar is expanded to incorporate the notion of linguistic gender. The new type and its subtypes are shown in Figure 10. Together with the number distinction from earlier, it is deployed in a new structure 'num-gend' intended to model agreement. This allows us to consider the TFS shown in Figure 11, which expands upon the example shown above by illustrating the use of negative mark values within in-tuples.

In fact, any integer value—whether negative or positive—receives equivalent treatment when interpreted as an in-mark by the in-tuple hashing system. The one consideration regarding m_F values is that zero is never permitted. This is to avoid confusion with the use of zero as the out-mark value m_T in any out-tuple whose type has no appropriate features (and which is also not coreferenced).

Figure 10. Some unconstrained types are added to the type hierarchy of the simple grammar in order to illustrate the occurrence of substructure below a coreferenced node.

Setting m_T = 0 for such cases is obligatory in the array storage implementation presented here. Therefore, although the fact that a given type has no appropriate features is operationally obtained from the grammar (which immediately short-circuits further queries), forbidding m_F = 0 prevents the accidental association of random substructure (viz., the substructure it would otherwise select) with (all of the) non-coreferenced leaf nodes in the TFS. In the examples shown thus far, the feature which selects the coreferenced node happens to always be the same, but this is not necessarily always the case. In fact, as evinced by the fact that the set of in-tuples is necessarily distinct across any given storage relation 𝔸, the repetition of feature 'NUMGEND' does not imply any relationship between the tuples that contain it. Rather, those tuples refer to arcs which are logically distinct. In other words, while a logical node may have diffuse physical representation in array TFS storage, this is not the case for the TFS's feature-value pairs, where each logical arc has exactly one physical correlate.

Note that I was careful not to state that in-tuple distinctness applies "globally," because in fact the requirement applies only within a single array TFS storage entity. Stated more generally, mark values obtained from a given TFS A = ⟨t, m, 𝔸⟩ have no meaning outside the context of 𝔸, the storage relation from which they were obtained.40 This summarily interdicts the sharing of substructure between TFSes A and B in the current design, but several solutions can be envisioned. Most straightforwardly, any TFS-relative datum, such as a reference to some portion of internal substructure, can be made globally unique by simply pairing it with (a reference to) the TFS it pertains to, and then externally carrying the resulting tuple.

Another solution would be to have the unifier issue mark integers from a global sequence counter as it emits new structures. In the current implementation, the unifier starts over with an independent sequence for each result TFS, a design chosen in order to reduce contention in concurrent operation. Using a global sequence is a viable alternative, providing mark values are stored with adequate bit-width in 𝔸. The issue I hint at here is sequence overflow during lengthy batch processing sessions. Naturally, the occurrence of integer overflow would defeat the primary purpose, which is to guarantee unique addressing. To combat the problem, a middle-ground solution would be to use independent mark sequences on a per-parser-input basis.

40 Exceptions are the special out-mark values, 0 and 1, discussed earlier in this section and in Section 3.2.3.

     in-tuple              out-tuple
     m_F   f               t          m_T
     ROOT→                 phrase     1
     1     CAT             np         0
     1     NUMGEND         num-gend   -1
     1     ARGS            cons       6
     6     FIRST           syn        3
     6     REST            cons       5
     3     CAT             det        0
     3     NUMGEND         num-gend   -1
     5     FIRST           syn        4
     5     REST            null       0
     4     CAT             n          0
     4     NUMGEND         num-gend   -1
     -1    NUM             num        0
     -1    GEND            gender     0

Figure 11. Because the out-mark m_T obtained from the multiple instances of a given coreferenced out-tuple always has the same value, substructure of the modeled node is uniquely identified. This is because, as an in-mark, a negative value selects nodes in exactly the same way as the (positive-valued) mark of a non-coreferenced node. Compare the table shown here to Figure 9, which had no negative values in the in-tuple column m_F. This shows that the problem of recovering a physical node mapping from array TFS storage has a local solution, that is, the re-correlation of duplicated out-tuples can always be resolved within a single layer of structural nesting.

In other words, each sentence parsed or semantics generated would have a private sequence counter, with respect to which mark values would be unique. Such a solution would also mitigate contention to a small degree. The detailed exploration of proposals such as these is left for future work.


This concludes the presentation of examples of the representation of array storage TFSes. The examples in this section focused on the persisted representation of fixed storage instances. Further examples, given in Section 4.5, show how static structures such as these are manipulated via an active process, TFS unification, that provides a principled foundation for the modeling of linguistic structure.

3.4 Future work in array storage

The access paradigm used in agree's array storage is an area of continued research. In the current design, and as noted in Section 2.3.2, the ordering of the 4-tuples in array storage relation 𝔸 is not operationally significant. Related to this, a research avenue I am currently pursuing is to abandon hashing and order the 4-tuples in 𝔸 according to primary key m_F and secondary key f. This would permit a single 4-tuple pointer to be handed off to the unifier. From here, the unifier would itself walk forward to directly examine all of the 4-tuples for a node, thereby eliminating n − 1 array storage queries for each node with n feature-value pairs. For a variation on the method, each in-mark value m_F might be assigned to be the index, in 𝔸, of its first feature-value pair.

Specifically, the research involves a variation on array storage that even further exploits the feature appropriateness condition. Because the 206 features used by the English Resource Grammar always appear in one of only 120 different configurations, the latter can be taken as fixed in a study of the feature arity problem (Section 2.2.3). In other words, assuming grammars exhibit the property that reentrancies in the type hierarchy ⟨TYPE, ⊑⟩ are, on average, moderate in number and relatively distant from the root ⊤, arity conflicts between consistent collections of sets of co-occurring features will be limited, and an expensive arity analysis phase during grammar startup is justified.

I extend the work of Callmeier (2001, 50), who proposes three schemes for pre-computing fixed feature layouts. The methods achieve a neutral result in Callmeier's evaluation. A key point, however, is that this work is constrained by the fact that the Tomabechi implementation uses in situ scratch slots, and so must generally be able—during unification—to adapt to dynamic changes in the set of features appropriate for a node. Once a feature arity map is selected for a node, it is inefficient to switch to another, so Callmeier studies methods for minimizing these 'coercions.'

I instead note that, with detached scratch slots and array storage, it is not necessary to predict nodes' feature arity unification dynamics. It is sufficient to pre-compute only the limited, global set of actual appropriate feature configurations—and not consider the (set of) all of the ways in which they can be consistently combined. This is because, with the array systems described in this thesis, a TFS is written only once, during the second pass of the unifier, at which time each node's exact feature configuration is definitively known, and at which time it is trivial to select the pre-computed arity map for the node's type. Thereafter, the TFS is read-only.

Therefore, the feature arity encoding problem is both simplified and pushed down into the storage domain. The requirement is now to find the most efficient way to model a graph which associates (type) values around a limited, known set of arity (feature) configurations. Efficiency here entails that both space consumption and runtime fetching be minimized. In this early stage of the research, I have identified a simple and promising approach.

A feature configuration 𝒲 is an (unordered) set of features that co-occur in some TFS node:

𝒲 = {f : f ∈ FEAT}   (3.22)

As noted, with its 206 features, the ERG manifests only 120 distinct feature configurations, and the well-formedness requirement ensures that this figure is a globally-invariant upper bound for the grammar. Recall from Section 3.2 that, when a grammar is loaded, each feature f ∈ FEAT is given a unique identifying integer. Let all possible such mappings be designated 𝒦, so that a particular mapping is 𝓀 ∈ 𝒦.

Given some 𝓀, for each possible feature configuration 𝒲 there exists an optimal encoding modulus û_𝒲. This is the integer which, when used as divisor for each feature identifier f ∈ 𝒲, gives the set of remainder values which spans a minimal range:

û_𝒲 = argmin_{u ∈ {1 … (max_𝒲 f − min_𝒲 f)}} ( max_{f∈𝒲}(f % u) − min_{f∈𝒲}(f % u) )   (3.23)

The optimal modulus û_𝒲 is an invariant property of each feature configuration—given some 𝓀—and each type in ⟨TYPE, ⊑⟩ is associated with exactly one feature configuration, its set of appropriate features.

Because the feature-to-integer mapping is produced at the discretion of the implementation, it is possible to optimize (3.23) by seeking the mapping 𝓀̂ of integer values to features for which the maximum optimal modulus max_{𝒲ᵢ} û_{𝒲ᵢ} amongst all feature configurations in the grammar is minimized:

𝓀̂ = argmin_{𝓀 ∈ 𝒦} ( max_{𝒲ᵢ} û_{𝒲ᵢ} )   (3.24)

A concern of the modulus encoding scheme for TFS storage, to be described shortly, is that it could be wasteful of memory. (3.24) identifies the feature-to-integer assignment which, when used in the scheme described below, results in a minimum of wasted memory. Unfortunately, the number of possible mappings |𝒦| is very large, and (3.24) is not tractably computed. Nevertheless, experiments with the ERG show that total wasted memory using the method would not be excessive.
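The following brute-force C# sketch computes û_𝒲 for a single configuration (hypothetical names). Note one detail left implicit in (3.23): a usable modulus must also give every feature of 𝒲 a distinct remainder—otherwise u = 1 would trivially minimize the span—so the sketch rejects colliding candidates; a divisor one greater than the identifier range always succeeds.

using System;
using System.Collections.Generic;
using System.Linq;

static int OptimalModulus(int[] feats)           // integer identifiers of one configuration W
{
    int limit = feats.Max() - feats.Min() + 1;   // this divisor always yields distinct remainders
    int bestU = limit, bestSpan = int.MaxValue;
    for (int u = 1; u <= limit; u++)
    {
        var seen = new HashSet<int>();
        int lo = int.MaxValue, hi = int.MinValue;
        bool usable = true;
        foreach (int f in feats)
        {
            int r = f % u;
            if (!seen.Add(r)) { usable = false; break; }   // collision: u cannot encode W
            if (r < lo) lo = r;
            if (r > hi) hi = r;
        }
        if (usable && hi - lo < bestSpan) { bestSpan = hi - lo; bestU = u; }
    }
    return bestU;
}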


Now detailing the modulus encoding array TFS storage approach: the feature-value pairs (if any) for node q = ⟨t, m⟩ are stored sequentially in the rows of a relation 𝔹 of out-tuples ⟨t, m⟩, beginning at the zero-based row index given by

m + (f % û_{𝒲_t}).   (3.25)

There is no hash table and no feature identifier is stored. Mark value m—now in a use unrelated to the current array storage design—indicates the starting physical index in 𝔹 of the first in a set of rows. The number of rows reserved for a node is known to equal the optimal modulus û_{𝒲_t} of the feature configuration 𝒲_t associated with the node's type t. Within the set of contiguous slots (that is, relative to m), the node or nodes associated with each feature appropriate to t occupy an offset which is the result of the modulus division.

As with array TFS storage, it is the responsibility of the clients of the storage to supply a feature identifier as part of the address for the node they wish to retrieve—or walk the sequential entries in the storage relation directly themselves. Naturally, this information is available from the grammar, and it can be managed in whatever manner befits the different storage uses and activities. The variation in client uses, however, poses more of a problem for this method than it does for ordinary (i.e. non-modulus) array TFS storage. The next paragraph examines this issue.
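Before turning to that issue, the fetch described by (3.25) can be sketched in a few lines (hypothetical names: B is the row relation 𝔹, uHat the optimal modulus of the node's type's feature configuration, and OutTuple the sketch from Section 3.2.1):

static OutTuple Fetch(OutTuple[] B, int m, int f, int uHat)
{
    // zero-based row index per (3.25); the caller must guarantee that f is
    // appropriate for the node's type, since no check is possible here
    return B[m + (f % uHat)];
}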

In modulus feature encoding, each encoding may contain holes, which represent wasted space. Attempting to retrieve one of these slots would represent the error of asking for the value of a feature which is inappropriate for the type. Recall that in the ordinary array TFS storage method contributed by this thesis, requesting the value of an inappropriate feature silently returns the successful result that there is no constraint. Lacking a hash table, however, the lighter-weight modulus scheme described here would not inherently exhibit this graceful fallback. Although the modulus division for some errant feature requests might happen to land on an empty slot, most would wrap around to the index of the value associated with some other unrelated feature. Because inappropriate requests return random data, it is more critical with the modulus method to establish external measures for ensuring that values are only requested for appropriate features. To maximize performance, internal subsystems which are known to request only appropriate features, such as the unifier, might be allowed to bypass this safeguard, while storage activities that are less controlled, such as user interaction, maintain the checks.

The main savings of modulus feature encoding is that it does not require feature identifiers f to be stored, immediately cutting the storage requirements for every TFS. To evaluate this method, I analyzed the feature configurations of the ERG. The 120 feature configurations exhibit between zero and fourteen features each, but for these experiments, the diagnostic feature 'RNAME' was excluded, so max |𝒲ᵢ| = 13. Table 4 shows the results of these experiments, comparing the minimum modulus computation using unaltered feature identifier assignments—where feature identifiers are assigned in the order they are encountered while loading the grammar—to a mapping 𝓀 which represents the lowest value for max_{𝒲ᵢ} û_{𝒲ᵢ} found so far in a program of exploring equation (3.24).41

56 the grammar—and comparing them to a mapping which represents the lowest value for max u found so far in a program of exploring equation (3.24). 41 퓀 풲 풲i ̂ un-optimized optimized actual number of features per excess excess node features퓀 per features퓀 per node node 푖 푖 min 0.00 푢̂풲 0.00 0.00 푢̂풲 0.00 0.00 average 5.65 7.76 2.11 7.25 1.60 weighted avg 6.47 7.27 0.80 7.59 1.12 max 13.00 25.00 15.00 16.00 2.00 Table 3. Analysis of the 120 feature configurations in the ERG for optimal-modulus feature arity encoding.

    ERG                    total nodes    Waste
    actual requirement        53,412        -
    un-optimized 𝓀            60,010      12.35%
    optimized 𝓀               62,589      17.18%

Table 4. Analysis of the total number of TFS (substructure) nodes stored for all expanded type definitions in the ERG. The actual requirement is compared to the proposed modulus encoding scheme using two different feature-identifier mappings 𝓀.

The results are instructive; the mapping which should be space-optimized does indeed waste less when all of the 120 feature configurations are given equal weight, but when the results are adjusted to model the total number of nodes consumed by the loaded grammar, the optimized mapping actually wastes 5% more space than the un-optimized method. This means that a disproportionately large fraction of the nodes that comprise the loaded ERG use the smaller feature configurations, and that the presumably random mapping is lucky enough to be less wasteful for these smaller configurations than the hard-earned “optimal” configuration.

This analysis is preliminary. In particular, additional work is needed to characterize the use distribution of the feature configurations, and to then reformulate objective function (3.24) in terms of this distribution. Future work should seek a mapping that minimizes waste in the smaller configurations at the expense of fitting the larger ones (which are much more difficult to solve anyway). From these results I am able to conclude that further investigation is justified, because it appears that the total amount of memory used in this scheme—waste plus non-waste—will not exceed the memory requirement of the (non-modulus) array storage method proposed in this thesis. The latter stores a feature identifier for every arc, which the proposed scheme does not, and the tests indicate that the number of wasted array slots in the proposed method will not be egregious.42

This section gave a preview of my ongoing work in array TFS storage research. An alternative to the main method of this thesis was sketched, and the method promises to reduce the memory footprint compared to the current array TFS storage design. The method is a considerable departure from the implemented system, and only the preliminary feasibility experiments presented here—and no prototype—have been completed. It is important to mention that the evaluation and results presented in Chapter 5 pertain not to this sketch, but rather to the array storage system described as the main contribution of this thesis.

41 For these experiments, a brute-force search was used to explore (3.24). Solutions with max û_𝒲ᵢ = 17 can be found relatively easily, but only a single solution with modulus 16 was obtained in several hours of computation.
42 Note also that each 2-tuple in the proposed scheme is half the size (or less) of a standard array TFS 4-tuple.

3.5 Summary

This concludes the discussion of array TFS storage. The chapter began by contrasting an axiomatized formal conception (Carpenter 1992) with graphical treatments (King 1989, 1994) which more closely parallel engineering practice. The bulk of the discussion was a formal presentation, in notation adapted from relational algebra, of the array storage scheme proposed in this thesis. In this method, all of a TFS's nodes are stored within a single monolithic allocation. Engineering motivations for this approach were presented, including the observation that such a system, by reducing pressure on the system garbage collector, may be especially suited to use in managed-runtime environments. Details particular to the agree implementation of array TFS storage were also discussed.

To establish the efficacy of array TFS storage in realistic linguistic application, it was deployed in the agree grammar engineering platform. The next chapter discusses the unification algorithms investigated as part of that system. The chapter begins with a review of prior work in algorithms for linguistic unification, including a description of n-way unification, a novel and interesting (but now superseded) method developed as part of this research. The latest unification algorithm developed for agree is presented in detail. The method is implemented over array TFS storage, but aspects that are applicable to traditional TFS storage are noted.


4 Unification

Given the descriptions of two objects, unification is defined as the computation of the most general object which satisfies both descriptions, if such an object exists. Research into determining this most general unifier began in the context of term unification, the unification of variables in mathematical equations (Herbrand 1930). A thorough review of early work in term unification can be found in Knight (1989).

In linguistic applications, grammatical objects are encoded as typed feature structures, and their interactions are modeled via unification, the lone operation applied for this purpose. A total binary operator ℱ × ℱ → {⊥} ∪ ℱ,43 feature structure unification produces the single most general TFS which contains all of the information from two input TFSes, if such a structure exists.

Chapter 4 has the following structure. The first section provides an overview of well-formed unification. The well-formedness requirement is an instructive point of departure because it generally demands, as a fundamental design requirement, that the unifier support reentrancy, which in turn can interfere with certain unification algorithms and efficiency techniques that have been proposed in the prior literature.

Modern unifier designs draw from a rich set of research literature. Section 4.2 reviews this work, focusing attention on the milestones most relevant to the present work. For example, particular attention will be paid to van Lohuizen’s work on concurrent unification, since the agree unifier presented in this thesis is also thread-safe, supporting the operation of the agree concurrent chart parser and generator components.

Section 4.3 describes n-way unification, a previous research direction that received a complete implementation in agree, but which has since been deprecated in favor of the array TFS unifier. Although the efficiency of the n-way implementation was hampered by uninteresting technical details, the method itself explores a certain theoretical bound on unifier complexity. The n-way algorithm also resists categorization by an informally-defined model that has focused the research discussion since the late 1980s, suggesting that the n-way method is fairly unusual in comparison to the other reviewed work. Specifically, as an implemented expression of a theoretical lower bound on unifier work that nevertheless shows acceptable—but not stellar—performance, the n-way implementation invites renewed examination of those traditional categories. Two questions that emerge are, first, whether classifying algorithms according to variously interpreted descriptions of “copying” behavior is a technique which the research community has outgrown, and second, whether the described classes remain relevant to contemporary implementations.

43 Note that well-formed TFS unification is not a partial function ℱ × ℱ ⇸ ℱ, since ⊥ is an interpretable result.

Following the discussion of n-way unification, Section 4.4 presents the description of array TFS unification. This, along with the array TFS storage that it relies on, is the primary contribution of this thesis. The section contains formal discussion as well as details on implementation. The final two sections of this chapter give examples of the operation of the array TFS unifier, and offer summarizing comments, respectively.

4.1 Well-formed unification

It is important at the outset of this chapter on unification to distinguish feature structure unification—the subject of this chapter—from type unification, upon which it depends. The two are similar in that they both produce either a most general mutually-consistent result, or an indication that no such result exists. Type unification, however, appeals only to the type hierarchy ⟨TYPE, ⊑⟩ for determining the consistency between simple scalar types. Recall that, upon loading, ⟨TYPE, ⊑⟩ was automatically modified so that every such outcome is deterministic.

Typed feature structure unification, on the other hand, is the process of producing a new TFS: the most general TFS that is consistent with a set of input TFSes. This is the role of a unification algorithm, or a TFS unifier. In TFS unification, type unification is the operation applied between every pair of isomorphically coincident nodes—to be precise, between the types of the pair—in order to form the result structure.
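
As a point of reference, the following sketch shows type unification reduced to a table lookup, assuming (as described above) that the hierarchy has been closed at load time so that every pairwise outcome is deterministic. The class and member names are hypothetical.

```csharp
// Sketch: type unification as a greatest-lower-bound (GLB) lookup.
// A dense precomputed table is one way to cache the deterministic
// outcomes; 'TypeHierarchy' and 'NoGlb' are illustrative names only.
sealed class TypeHierarchy
{
    public const int NoGlb = -1;    // ⊥: the two types are incompatible
    readonly int[,] glb;            // glb[t1, t2], precomputed at load time

    public TypeHierarchy(int[,] glb) { this.glb = glb; }

    // Returns the most general type consistent with both inputs,
    // or NoGlb if unification of the two types fails.
    public int Unify(int t1, int t2) => glb[t1, t2];
}
```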

Section 2.1.5 described well-formedness for typed feature structures. Recall that this means that at no time may any feature structure be inconsistent with any constraint authored as part of the grammar. In the DELPH-IN joint reference formalism, every TFS is required to be well-formed at all times. After initially making all of the entries in the grammar well-formed, the burden, during parsing and generation, of maintaining well-formedness is reduced, but not eliminated. Because the unifier produces new structures during parsing or generation, it assumes the responsibility for ensuring that those structures are well-formed.

During parsing, unification can successfully combine structures in ways that produce results that are not automatically well-formed, even though all of the input arguments were well-formed. If these feature structures cannot be made well-formed—that is, if the additional unifications required to make them consistent with the relevant parts of the grammar fail—then the original unification is taken to have failed. For the unifier, this is a condition that must be evaluated at every node. The condition that signals that the additional step must be taken is that the type unification between two nodes results in a type which is not identical to either of the input node types. When this is detected, the unifier must temporarily cease its pairwise traversal of the input arguments to conduct an additional unification. It must fetch the TFS representing the expanded constraint for the type in question, and unify this entire structure at the target node in the new structure before resuming with the original work.
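
The trigger condition can be stated compactly in code. The sketch below assumes hypothetical delegates for type unification, grammar constraint lookup, and recursive TFS unification; it illustrates only the control flow described above, not agree's implementation.

```csharp
using System;

// Control flow of the well-formedness check at a node pair: when the
// unified type differs from both input types, the expanded constraint
// TFS for that type must be unified in at the target node before the
// pairwise traversal resumes. All delegates and names are hypothetical.
sealed class WellFormedness
{
    public const int Bottom = -1;                     // ⊥: type unification failed
    readonly Func<int, int, int> glb;                 // type unification (GLB)
    readonly Func<int, object> expandedConstraint;    // grammar lookup by type
    readonly Func<object, object, bool> unifyAt;      // unify a TFS at a result node

    public WellFormedness(Func<int, int, int> glb,
                          Func<int, object> expandedConstraint,
                          Func<object, object, bool> unifyAt)
    {
        this.glb = glb;
        this.expandedConstraint = expandedConstraint;
        this.unifyAt = unifyAt;
    }

    // Returns the result type for the node pair, or Bottom if the pair
    // (or the constraint unification it triggers) fails.
    public int UnifyNodeTypes(int tq, int tr, object resultNode)
    {
        int t = glb(tq, tr);
        if (t == Bottom)
            return Bottom;
        if (t != tq && t != tr)
        {
            // New information introduced: the result is well-formed only
            // if the expanded constraint for 't' also unifies in here.
            if (!unifyAt(expandedConstraint(t), resultNode))
                return Bottom;
        }
        return t;
    }
}
```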

With simple unifier designs, the interruption is easily accommodated with a recursive call to the unification function. The matter is complicated by issues discussed throughout this thesis:


• If shared structures containing coreferences are introduced into the same result structure more than once, their coreferences must not be conflated.
• If a reference TFS is used in a recursive call, and if it contains in situ scratch slots which are in use, then alternate measures must be available.
• A concurrent unifier must permit reentrancy with regard to its scratch slot mechanism.

In short, well-formed unification is a formal requirement that affects nearly all areas of unifier design. Research into linguistic computational unification has been active for three decades and a wide variety of avenues have been explored; many of these ideas inform the solutions used in today's increasingly sophisticated algorithms. A chronological review of published work in linguistic unification is the topic of the next section.

4.2 Prior work in linguistic unification

Early work in unification grew from research on term unification in the Prolog logic programming language. In term unification, each of the input arguments must have identical property sets in order for unification to succeed. Later applications, especially in linguistics, began to adopt the more flexible graph unification paradigm, a generalization of term unification which allows the combination of nodes with disjoint feature sets. As the body of literature in linguistic unification grew, researchers in computational linguistics established descriptive terminology that has been generally adopted to describe three desiderata for an efficient unification procedure:

• Unification should avoid early copying. Wroblewski (1987) proffered this term as describing wholesale copying of input arguments prior to the start of a destructive unification operation, but later researchers commonly adopt Tomabechi's (1992) stricter interpretation, which also counts all output structures produced during any unification that eventually fails.
• Unification should avoid over copying. Also introduced by Wroblewski, this refers to creation of any structure which does not end up becoming part of the actual result.
• Unification should minimize redundant copying. Kogure (1990) points out that a result TFS often includes sub-structures that are identical to structures that exist elsewhere, and these, ideally, do not need to be copied either. In other words, structure sharing should be maximized across the entire system.
• Lazy copying (Godden 1990, Tomuro and Lytinen 1997) variously refers to schemes which attempt to defer copying activities until they are proven absolutely necessary.

For broad analysis, these are useful distinctions, but their inherent blurriness has not previously been emphasized. For example, the over-copying definition penalizes the production of “result” structure, but not excessive manipulation of temporary structures. The difference between the two is often not clear-cut, since at a fundamental level, both simply involve storing values in memory. Consider hyper-active parsing (Oepen et al. 2005), where temporary structures are retained, unrealized, for participation in further unifications. Should these count as “result structure?”

Furthermore, although the categories implicitly reward two-pass unifiers for exhibiting no over-copying, intensive scratch slot activity during the first pass could erode this advantage. A more meaningful measure would consider the average memory bus throughput, per unification. Further pursuit of these types of experiments is left for future work; for the present thesis, the definitions are adopted with requisite caution.

Next, I review the research chronology in algorithms for linguistic unification. Where appropriate, connections to array storage unification are highlighted. Several research threads have been pursued in four decades of work, all aimed at reducing the computational burden of graph unification. For completeness, research avenues that were not further pursued are also placed in context with brief summaries. Two broad classifications to note at the outset are the classification of methods as single- or two-pass, and the contrast between destructive and non-destructive techniques. Single-pass methods are obviously simpler than two-pass methods, but the latter are preferred when an inexpensive first pass can detect failures. Destructive methods consume at least one of the input graphs in producing the result structure, and I begin with a description of the seminal algorithm which falls into this category (Section 4.2.1). Linguistic applications require non-destructive methods, such as those reviewed in Sections 4.2.2 through 4.2.10.

4.2.1 UNION-FIND

The first unification algorithms suitable for linguistic applications derived from earlier work on UNION-FIND (Aho et al. 1976, after fellow Bell Labs researcher Robert Morris44). This method introduces the basic technique—which is present in most subsequent work—of processing the nodes of the input argument graphs in step, recursively. An equivalence class is built by combining arcs from each of the instant nodes. At each step, one of the input nodes is chosen as a representative of the class, and the other node is redirected, or forwarded, to it. Where both inputs contribute an arc with a matching label, the procedure descends recursively on the representative of the joined inputs.

Most critically, immediately upon arriving at any node, its forwarding chain, if any, is followed to a terminal node, which is then considered a stand-in for the original node, including for the purpose of further forwarding. This is the fundamental insight of UNION-FIND, which I refer to as dereference-before-forward. With dereference-before-forward, membership in a coreference equivalence class is signified by the identity of some terminal node. It follows that one-way (singly-linked) forwarding is sufficient to ensure that two disjoint chains—of any length, and which each represent distinct equivalence classes—can be joined by the trivial operation of setting the link field in one terminus to a value that identifies the other terminus.45 Upon passing out of the input structures, the terminal nodes describe a graph which contains all of the arcs and reentrancies of both input graphs. Pseudo-code for the basic scheme, attributed to Boyer and Moore by Pereira (1985), is provided in Wroblewski (1987).

44 Aho credits Morris without citation. According to Doug McIlroy, also then at Bell Labs, Bob Morris' “path compression” idea was unpublished: “To Morris it was just one of those little tricks that one comes up with in the course of normal programming.” (M. Doug McIlroy, personal correspondence 5/28/2011)
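
A minimal sketch of the dereference-before-forward discipline follows. It is a generic illustration of the technique described above, not code from any of the cited systems.

```csharp
// Dereference-before-forward, from UNION-FIND: follow a node's forwarding
// chain to its terminus before doing any further work, so that joining
// two equivalence classes reduces to linking one terminus to the other.
sealed class ForwardingNode
{
    public ForwardingNode Forward;   // null when this node is a class representative

    // FIND: follow the forwarding chain to the terminal node.
    public ForwardingNode Dereference()
    {
        ForwardingNode n = this;
        while (n.Forward != null)
            n = n.Forward;
        return n;
    }

    // UNION: join the equivalence classes of 'a' and 'b' by forwarding
    // one terminus to the other; returns the chosen representative.
    public static ForwardingNode Union(ForwardingNode a, ForwardingNode b)
    {
        ForwardingNode ra = a.Dereference();
        ForwardingNode rb = b.Dereference();
        if (ra != rb)
            rb.Forward = ra;         // rb's entire class is redirected to ra
        return ra;
    }
}
```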

4.2.2 PATR-II: environments and structure sharing

The earliest linguistic uses of unification were strongly influenced by work in term unification. Definite-clause grammars (Pereira and Warren 1980) and related term unification linguistic formalisms arose in response to advances in Prolog—such as Prolog II (Colmerauer 1982)—and the mature state of term unification research.

An early implementation in this vein was the PATR-II system (Shieber et al. 1984). This system uses for its term unification algorithm the basic procedure of UNION-FIND. To avoid the prohibitive expense of pre-copying every structure which participates in (destructive) unification, the system uses virtual-copy arrays, a familiar feature from Prolog, to virtualize the DAG representation. In another structure sharing scheme (Pereira 1985), TFS instances are ⟨skeleton, update⟩ tuples and the unification algorithm manipulates only the updates, which obtain context from their association with an environment, an input structure that is not altered. When accessing a derived instance, its updates must be applied, which may involve an O(log n) search at each node, and merging of environments.

As noted, Pereira's scheme is based on UNION-FIND, an algorithm that is necessarily destructive to at least one of the input graphs. This “ravaging” (Wroblewski 1987) of the argument graphs is not acceptable when they represent invariant reference structures—such as grammar rules or lexical entries—or memoized46 parsing results. Since this describes virtually all of the unifications performed in the normal course of grammar processing, destructive unification has limited utility in linguistic application.

The technique Pereira describes is complex, and certain of its aspects are justified, in part, by specific linguistic considerations from the parsing application. Perhaps the most important contribution of Pereira's work was to identify memory access throughput as the fundamental problem in unification. Although Pereira's response to the problem—that of crafting clever data structures that consolidate and defer these manipulations—was a worthy pursuit, he might not have suspected the severity of the issue: so fundamental is the throughput bottleneck in linguistic unification that most modern algorithms benefit from running each unification in two passes—the first to simply determine whether or not the costly operation might fail, in which case the second pass can be avoided altogether. It would be a few years before this type of approach was formally embraced.

45 In fact, the forwarded node could be pointed to any member of the other node's chain.
46 Memoization refers to storing intermediate results that may be needed in later stages of a computation.

4.2.3 D-PATR

In work developed across a few different implementation variations, Karttunen (1986) documents important early investigations of reversible unification. D-PATR was an influential unification parser which evolved from his earlier work at the Scandinavian Summer Workshop in Finland.

To prevent the destruction of grammar elements, one version of the system, Z-PATR, copies feature structures prior to invoking every (destructive) unification operation. An improvement in the D-PATR system hints at the insight—formally stated by Tomabechi a few years later—that unification failure is the operational case that should be favored in optimization. Specifically, D-PATR proceeds with the destructive operation forthwith, saving every modification it enacts to a list. In case of failure, these changes can be reversed, and an expensive, futile full copy of the input structures has been avoided. If the unification succeeds, it is the result structure that is fully copied, prior to reversing the changes which restore the input structures.
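
The undo-list idea can be illustrated with a short sketch: destructive changes are recorded on a trail of closures, which are replayed in reverse to restore the inputs. The types here are hypothetical, and no attempt is made to reproduce D-PATR's actual data structures.

```csharp
using System;
using System.Collections.Generic;

// Sketch of a D-PATR-style undo trail: destructive unification writes
// directly into the argument structures, but records each change so
// that the inputs can always be restored (on success, the *result* is
// copied out before the reversal). The cell representation is illustrative.
sealed class UndoTrail
{
    readonly Stack<Action> undo = new Stack<Action>();

    // Record the prior value of a type cell before overwriting it.
    public void SetType(int[] typeCells, int index, int newType)
    {
        int old = typeCells[index];
        undo.Push(() => typeCells[index] = old);
        typeCells[index] = newType;
    }

    // Roll back all destructive changes, restoring the input structures.
    public void UndoAll()
    {
        while (undo.Count > 0)
            undo.Pop()();
    }
}
```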

4.2.4 Incremental unification

Recall, with UNION-FIND, that the procedure was not usable because it ravaged one or both of the input graphs. An obvious workaround is to produce the result graph anew, without conscripting one of the argument graphs as a representative, a method explored by Wroblewski (1987). This method adds a COPY field to each node's scratch fields. This field allows a node in the result structure to be associated with the equivalence classes built by joining nodes from the input arguments.

Applying the efficiency categories he promulgated to his own work, Wroblewski concludes that, while his method eliminates early copying, it is still prone to over copying, specifically when joining coreference equivalence classes that were previously encountered and interpreted as distinct. Intuitively, we can see why any method that proactively creates result nodes will be subject to over copying: a greedy process cannot foresee whether those nodes may need to be joined later, meaning that only a single node should have been created. This is the problem of transitive coreference spreading (Figure 12), which directly inspired the n-way single-pass unification method detailed in Section 4.3.

Figure 12. Coreference spreading. The unification of t₁ and t₂, each independently derived from a, equates two coreference equivalence classes which were distinct within t₁. This transitive spreading can extend to arbitrary length.

Also widely cited (inter alia, Emele 1991, Tomabechi 1992, Callmeier 2000) from Wroblewski's work is the idea of the generation counter,47 which became important to the bevy of methods that dedicate scratch fields—in situ within the input nodes—expressly to unification. In such schemes, it is important to be able to categorically abandon the values of these scratch fields, across all nodes in the system, so that subsequent operations do not consider them relevant. Wroblewski notes that, by maintaining the value of a global sequence counter in each scratch slot—and only respecting the slot's contents if its value matches the current generation number—it becomes possible to globally invalidate all scratch slots by simply incrementing the counter.

47 Wroblewski attributes the idea to Ph.D. student Mark Tarlton, now a Distinguished Researcher at Motorola.
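
The generation counter technique just described is simple enough to show in a few lines. The field and type names below are illustrative; only the stamp-and-compare discipline is taken from the description above.

```csharp
// Sketch of generation-counter invalidation: a scratch slot is honored
// only if its stamp equals the current global generation, so a single
// increment abandons every slot in the system at once.
struct ScratchSlot
{
    public int Generation;    // stamp recorded when the slot was last written
    public int Forward;       // example payload: a forwarding index

    public bool IsValid(int currentGeneration) => Generation == currentGeneration;

    public void Write(int forward, int currentGeneration)
    {
        Forward = forward;
        Generation = currentGeneration;
    }
}

static class UnifierGeneration
{
    static int generation = 1;

    // Called when a unification completes or fails: every scratch slot
    // written under the old generation becomes stale without being touched.
    public static int InvalidateAll() => ++generation;
}
```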

The use of in situ scratch fields is not compatible with multi-threaded access to the structures involved. The easiest way to remedy this is to devise a method to associate a private set of detached scratch slots with each input structure. This is the technique used by van Lohuizen (2001) and by the array storage unifier described in Section 4.4. In detaching the scratch slots, the generation counter technique for categorical invalidation may become less important, as the slots can simply be discarded as a whole. Section 4.4.3 discusses such an approach in detail.

4.2.5 Lazy approaches

A number of different “lazy” approaches to unification have been investigated. In Godden's (1990) elegant variant, which the author contrasts with “eager” methods, the closure feature of the LISP programming language is used to capture the unification work required to join co-occurring input arguments. The hope is that the expense of the bulky language abstraction is outweighed by a reduction in unnecessary copying. In effect, however, the closure itself can be viewed as—already—a copy of the relevant result node. The effort undertaken to create the closure is wasted if the lazy result is never demanded, or if no further changes occur to the underlying nodes. Godden finds merit in his preliminary results, but the research avenue appears not to have been pursued further.

Tomuro and Lytinen (1997) investigate another method for deferring the wholesale copying of input graphs until a destructive modification is about to occur. The authors describe a copy environment which globally records a copy history for every top-level TFS that participates in a parsing operation. Each TFS's copy history contains a list of tuples which record the superseding of one of its internal nodes by some copy operation. These copy history entries are consulted when dereferencing nodes in various contexts.

Coming some years after the successful Tomabechi method, described in Section 4.2.9, Tomuro and Lytinen take aim at the assumption—inherent in Tomabechi's method—that failure is fundamentally more frequent than success in linguistic unification. It is true that any two-pass method strongly implies the incorporation of this tenet, but the authors of the newer work do not mention whether they have encountered a realistic natural language grammar which challenges it.

More significantly for the method proposed in this thesis, Tomuro and Lytinen also question the “extra bookkeeping” required by Tomabechi's method, suggesting that his approach is necessarily complicated by its two-pass character. Although further elaboration on this point is not provided, it is safe to assume that the authors are referring to the COMP-ARC-LIST in Tomabechi's method, a data structure which propagates result arcs—representing features that are not covered by the scratch slots of the designated representative node—from the first pass to the second pass. A key contribution of this thesis is the result that this complication can be eliminated from a two-pass method. The array TFS unifier, presented in this thesis, demonstrates the technique. Although still subject to Tomuro and Lytinen's first objection, the present work—by describing a two-pass method that is simpler than any published lazy method—disqualifies their second.

4.2.6 Chronological dereferencing

Noting that Wroblewski's generation counter describes a sort of chronology of TFS changes, Emele (1991) describes a method which seeks to mitigate the third type of categorical inefficiency—redundant copying—by associating a snapshot of the generation counter with a “copynode.” Copynode environments are extended—and the generation counter incremented—whenever a destructive change is about to occur. In other respects, the copynode is similar to the environment used in earlier methods (viz. Pereira 1985). Node dereferencing is extended so that it not only traverses across nodes, but across the generation chronology.

The system is designed for the storage of structures that persist outside the context of unification, but as a bonus, the same mechanism facilitates unification itself. A potential disadvantage of the system is the unchecked accretion of copynode layers that might occur when building elaborate structure, such as during parsing. As with the earlier environment methods, following a lengthy chronological chain exacts penalties during node dereferencing.

4.2.7 Later work in term unification

In term unification—a special case of the graph unification discussed in this thesis—the list of features appearing in each unification argument must be identical in order for the unification to succeed. Because they work with isomorphic binding lists, term unification systems are simpler and thus are likely to exhibit a performance advantage over graph unifiers. Carroll (1993) compares his ANLT term unification and parsing system with the contemporary work, noting that term unification parsers are little studied (ibid, 40). This remains the case today; modern-day linguistic research has largely adopted graph unification methods. To a limited extent, one advantage of term unification cited by Carroll—that every category contains only a fixed set of features—can be enjoyed by graph unification implementations which exploit the grammar's feature appropriateness condition. The method contributed by this thesis is a concrete example of such an approach.

4.2.8 Strategic lazy incremental copy graph unification

The 1990s saw continued advances in linguistic unification research. With his LING system, Kogure (1990) describes a technique which enables structure sharing while retaining O(1) node access performance, a substantial improvement on earlier structure sharing methods (viz. Pereira 1985). Structure sharing addresses the problem of redundant copying—the waste of time and space associated with reproducing any structure which appears elsewhere in the grammar. From a baseline of Wroblewski's method, LING adds a field to each node that tracks dependencies between structures. A special routine for copying nodes examines these dependencies to determine the cascading effects of a modification. This permits maximal structure sharing without worrying about spurious sharing, but the complexity of copying increases.

The same report also discusses the SING unifier, which incorporates a stochastic model to predict likely unification failure paths, so that these paths can be followed first. Kogure's two methods are combined in the strategic lazy incremental copy graph (SLING) system, but no evaluation is provided.

Methods similar to SING are commonplace in today's systems. Quick-check (Kiefer et al. 1999, Malouf et al. 2000) is implemented in all DELPH-IN parsers, including agree, the new system presented in this thesis. Like Kogure's system, quick-check uses offline training to develop a set of failure-prone feature paths.

In a similar vein, agree also offers an adaptive (unsupervised) tuning feature which operates on single-feature frequencies. When enabled, an independent tally for the immediate feature which causes each unification failure is incremented. After each sentence (in batch parsing), or at some other prescribed interval, the persistent feature evaluation order used by the unifier is adjusted according to descending order of these tallies. The simple method shows promise but is not discussed further in this thesis; evaluation is left for future work.
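
The following sketch illustrates the tally-and-reorder cycle just described; the class name and API shape are hypothetical.

```csharp
using System.Linq;

// Sketch of the adaptive feature-ordering described above: tally the
// immediate feature blamed for each unification failure, and at a
// prescribed interval re-sort the unifier's persistent feature
// evaluation order by descending tally.
sealed class AdaptiveFeatureOrder
{
    readonly int[] failureTally;
    public int[] EvaluationOrder;          // feature ids, most failure-prone first

    public AdaptiveFeatureOrder(int featureCount)
    {
        failureTally = new int[featureCount];
        EvaluationOrder = Enumerable.Range(0, featureCount).ToArray();
    }

    // Called by the unifier with the immediate feature at which a
    // unification failed.
    public void RecordFailure(int feature) => failureTally[feature]++;

    // Called after each sentence in batch parsing, or at another interval.
    public void Retune() =>
        EvaluationOrder = EvaluationOrder.OrderByDescending(f => failureTally[f]).ToArray();
}
```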

4.2.9 Quasi-destructive unification

Contemporary with Kogure's work, Tomabechi developed “quasi-destructive unification” (Tomabechi 1991, 1992), a method which became one of the most cited. This is the method implemented in the LKB (Copestake 2002b) and PET (Callmeier 2000) parsers. The key insight of this work was to explicitly acknowledge that unification failure is far more commonplace than unification success in realistic linguistic application. To avoid over copying, therefore, unification should occur in two passes, with no memory allocations occurring during the first pass.

Like the array storage unification presented in this thesis, quasi-destructive unification is based on UNION-FIND (Aho et al. 1976, after Morris) and comprises two synchronous passes. The first pass joins the input structures by manipulating in situ scratch fields, but never writes any output structure. The argument TFSes are recursively traversed from their root nodes, in tandem. At each step, nodes are paired according to their co-occurrence in the traversal. When arriving at a pair, each node's forwarding pointer is individually dereferenced, if non-terminal, by following a forwarding chain until reaching a terminating node. The terminus of the forwarding chain, and its corresponding scratch slot, replace the original node for further operations with the node pair. Next, pre-existing node identity and type unification (or failure) are checked between the dereferenced pair. If type unification succeeds and the nodes are not already joined, then one of the nodes is designated as the representative, and the forwarding field in the other node's scratch slot is pointed to the representative.


In order to avoid modifying the input TFSes, the first pass records all of its modifications in scratch fields; as in Wroblewski's method, a global generation counter allows them to be summarily discarded should unification later fail. The scratch fields consist of the node forwarding pointer (from UNION-FIND) and a list called COMP-ARC-LIST. Between the feature-value pairs of the instant nodes, the method calculates the set intersection and the set complement. The former set is maintained in the representative node, with modifications handled according to the forwarding pointer. The pairs from the latter—that is, features that are only constrained by the input argument that was not chosen as representative—are added to the COMP-ARC-LIST, which is then considered adjunct to the representative. The joint traversal continues recursively on the set of feature-value pairs constrained by both nodes. TFS realization occurs in the second pass of the algorithm, which is only undertaken after overall success has been guaranteed.
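
The first pass can be sketched compactly. The code below is a simplified, generic rendering of the discipline described above (generation-stamped scratch fields, forwarding, and a per-representative COMP-ARC-LIST), not Tomabechi's published code; type unification is abstracted behind a delegate, and cycle checking and other details are omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A node with its permanent content (Type, Arcs) and generation-stamped
// scratch fields (Gen, Forward, NewType, CompArcs).
sealed class TNode
{
    public int Type;
    public Dictionary<int, TNode> Arcs = new Dictionary<int, TNode>();

    public int Gen;                                    // stamp for scratch validity
    public TNode Forward;                              // UNION-FIND forwarding pointer
    public int NewType;                                // unified type, this generation
    public List<KeyValuePair<int, TNode>> CompArcs;    // arcs propagated to pass two
}

sealed class QdUnifier
{
    public static int Generation = 1;                  // bump to discard all scratch data
    readonly Func<int, int, int> glb;                  // type unification; -1 on clash
    public QdUnifier(Func<int, int, int> glb) { this.glb = glb; }

    TNode Deref(TNode n)                               // dereference-before-forward
    {
        while (n.Gen == Generation && n.Forward != null)
            n = n.Forward;
        return n;
    }

    void Stamp(TNode n)                                // (re)initialize scratch fields
    {
        if (n.Gen != Generation)
        {
            n.Gen = Generation;
            n.Forward = null;
            n.NewType = n.Type;
            n.CompArcs = new List<KeyValuePair<int, TNode>>();
        }
    }

    static TNode Find(TNode n, int f)                  // feature lookup incl. comp-arcs
    {
        if (n.Arcs.TryGetValue(f, out TNode v)) return v;
        foreach (var a in n.CompArcs)
            if (a.Key == f) return a.Value;
        return null;
    }

    // First pass: join the structures in scratch fields only. No output
    // is written, so a failure costs no allocation of result structure.
    public bool Unify1(TNode q, TNode r)
    {
        q = Deref(q); r = Deref(r);
        if (q == r) return true;                       // already in one class
        Stamp(q); Stamp(r);
        int t = glb(q.NewType, r.NewType);
        if (t < 0) return false;                       // type clash: unification fails
        q.NewType = t;
        r.Forward = q;                                 // q represents the joined class
        foreach (var arc in r.Arcs.Concat(r.CompArcs))
        {
            TNode qv = Find(q, arc.Key);
            if (qv != null)                            // feature in both: recurse
            {
                if (!Unify1(qv, arc.Value)) return false;
            }
            else
                q.CompArcs.Add(arc);                   // only in r: defer to pass two
        }
        return true;
    }
}
```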

Tomabechi emphasizes that no output structure is written in the case of unification failure—thus, there is no “over copying.” This should be optimal assuming his observation about the preponderance of unification failures. For the decade that followed, his observation inadvertently galvanized much of the subsequent linguistic unification research around the notion of avoiding unification altogether, rather than optimizing its enactment. The culmination of the trend was the description of several pre-unification checks (Kiefer et al. 1999, Malouf et al. 2000), notably a rule compatibility pre-filter and the trivial—but highly effective—quick-check technique. In this latter technique, an empirically-tuned set of likely failure paths are probed for failure of type unification, prior to initiating unification.

Pre-filtering techniques—and unification research—quickly and properly became preliminary and adjunct to unification proper, with the result that Tomabechi's method has remained the staple unifier for many modern grammar engineering environments, including all prominent DELPH-IN parsers: LKB, PET, and Ace.48 What has not been re-examined in the linguistic unification literature is where the quasi-destructive unifier—and two-pass parsing in general—stands after the widespread adoption of effective preprocessing filters, a question left for future evaluation.

I conclude this section with mention of a pertinent issue which relates to the present thesis. In the most general sense, Tomabechi's method describes two separate “forwarding” mechanisms. The scalar forwarding pointer, inherited from UNION-FIND, forwards nodes, while COMP-ARC-LIST “forwards” arcs. This dichotomy is examined in Section 4.4.1, which culminates in the description of a new unification algorithm, adapted to array TFS storage, which produces a simpler, more consolidated representation of the result TFS. The array TFS unifier exploits a simplifying guarantee offered by the storage model, resulting in the assurance that operationally variant auxiliary data structures such as the COMP-ARC-LIST described in this section are not needed. The next section concludes the chronological review of the TFS unification literature with a discussion of the application of computational concurrency to the problem.

48 Ace is a DELPH-IN compatible parser and tactical realization system developed by Woodley Packard. http://moin.delph-in.net/AceTop, http://sweaglesw.org/linguistics/ace/

4.2.10 Concurrent and parallel unification

After a decade and a half of gains in unifier efficiency, research began on unification algorithms that permit multiple concurrent accesses to communal input TFSes. In the terminology of van Lohuizen (2000), parallel unification refers to the ability to schedule multiple processors on the same unification task, whereas concurrent unification refers to the ability to schedule multiple single-threaded unification tasks—involving the same grammar instance and/or parse chart—on multiple processors. Thus parallel unification is more fine-grained than concurrent unification. van Lohuizen provides an exhaustive analysis of the inter-CPU communication implications of finer-grained tasking.

Any method that treats publicly visible data—such as canonical TFS instances—as read-only is inherently thread-safe. Unfortunately, this criterion is not met by many of the unification algorithms reviewed above, including Tomabechi’s quasi-destructive method, which is widely used. Methods which predicate their efficiency upon exclusive access to express scratch fields in the input TFSes are, by that same requirement, rendered unsuitable for concurrent use.

A natural solution to this problem is to detach the scratch fields from the nodes of the input structures (van Lohuizen 2000). This requires adopting an efficient and perfect index which associates each node of a given TFS with a distinct scratch structure. Efficient means that finding the scratch slot for a node should be fast, and that few of the allocated slots should remain unused. Perfect means that the mapping should be distinct (no slot is assigned to more than one node) and unique (no node is given more than one slot).

This reduces to the general TFS feature arity problem (Section 2.2.3), to which array TFS storage—presented in Section 2.3—describes a solution. Over this storage, agree implements a new, thread-safe unifier. The agree unifier achieves this via the simple means described above: by detaching the unifier's scratch slots so that input structures are not modified in any way. The precise method of mapping, presented in detail in Section 4.4.2, is contrasted with van Lohuizen's approaches next.

With any type of TFS storage, a simple method of detaching scratch slots is to assign a unique index from an integer sequence to each node of a TFS. This requires a traversal of the structure, an operation of no consequence for persistent grammar structures, but which accumulates a burden for structures built during parsing. Using such an index is trivial; the TFS is considered read-only, and any threads that wish to unify with the TFS obtain an array of private scratch slots which remain isolated for the duration of their operation.

van Lohuizen investigates two improvements on the basic node numbering idea, and evaluates them in his CaLi concurrent unification parser. One scheme simply hashes the node pointer, and this likens it to the array storage method proposed in this thesis, whereby scratch slot mappings are co-opted wholesale from the underlying array storage, which, in turn, hashes a node-feature tuple ⟨q′, fᵢ⟩ to obtain the (array storage index of a) node q. By piggybacking on the existing array storage mapping, my scheme offers the advantage that the scratch slot index is a free by-product of the node lookup that has to be performed anyway.
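
As a sketch of this piggybacking, the scratch area can simply be an array parallel to the TFS's node storage, indexed by the same value the storage lookup already produces. All names here are illustrative.

```csharp
// Scratch slots detached from the (read-only) input TFS: one entry per
// storage row, indexed by the same array index that the TFS's own
// <node, feature> hash lookup already produces.
struct ScratchEntry
{
    public int ForwardTfs;      // which argument TFS the node forwards into
    public int ForwardIndex;    // storage index of the forwarding target
}

sealed class DetachedScratch
{
    readonly ScratchEntry[] slots;

    // One scratch slot per storage row of the subject TFS; the input
    // structure itself is never written, so it may be shared by many
    // concurrent unifications, each holding its own DetachedScratch.
    public DetachedScratch(int storageRowCount)
        => slots = new ScratchEntry[storageRowCount];

    // 'storageIndex' is the index returned by the node lookup that the
    // unifier performs anyway, making the slot address a free by-product.
    public ref ScratchEntry SlotFor(int storageIndex) => ref slots[storageIndex];
}
```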

For the second method, van Lohuizen describes a scheme of numbering nodes with relative offsets, such that nodes of different graphs obtain unique scratch buffers, with no collisions. As a happy side-effect, both schemes preclude spurious sharing, allowing the restrictions on structure sharing discussed in Section 2.2.2 to be relaxed (van Lohuizen 2001, 78).

Returning to the discussion of van Lohuizen's distinction between inter-unification concurrency and its finer-grained sibling, intra-unification parallelism, I concur with van Lohuizen when he notes that a detailed research agenda for the latter seems premature:

as long as the number of unification operations in one parse is large, it is preferable to choose concurrent unification. Especially when a large number of unifications terminate quickly…, the overhead incurred by [parallelism] can be considerable. (van Lohuizen 2001, 72)

This concludes the review of prior work in TFS unification algorithms. The next section describes an unusual approach to TFS unification which was investigated during the research for this thesis. In particular, by describing the optimal unifier for a certain theoretical class, the work establishes an upper bound on the performance of single-pass unification.

4.3 n-way unification

The emphasis in the unification literature on categorizing unification algorithms according to classes of copying behavior suggests an approach to unification that trivially minimizes over copying—the production of any structure which does not contribute to the result graph—by producing result structure only when it is provably definitive. For example, recalling the “lazy unification” work of Godden (1990), consider a lazy method—that aims to skip effort that might later be revised—but with perfect foresight about whether or not such revision will actually occur. With perfect knowledge of whether any given piece of substructure will be definitive (or not), an algorithm can ensure that every task that the unifier undertakes is productive, moving forward to either unification success—or the determination of failure. To implement this idea, a new unification algorithm—n-way unification—was developed and evaluated. The following sections describe this method.

Although—for engineering reasons discussed at the end of this section—the method was eventually abandoned in favor of the two-pass array storage unifier described in Section 4.4, it performed nearly as well. Results from an evaluation of the work will also be summarized. These are supportive of the hypothesis, suggested at the end of Section 4.2.9, that the empirical considerations that originally motivated Tomabechi's emphasis on two-pass unification may have since been eroded by effective unification pre-filtering.

4.3.1 Motivation

Despite being deemphasized in agree, n-way unification remains significant as an expression of two theoretical bounds on single-pass unifier performance. First, depending on interpretation—discussed below—the method exhibits neither early copying nor over copying. Second, the algorithm traverses the theoretically minimal number of links (that is, issues the minimum number of recursive calls) required for a unification operation. Specifically, worst-case node visitation is O(n) in the number of input arcs, regardless of the number of reentrant nodes. It achieves this by consolidating the complete, arbitrarily-sized set of reentrancies into a single high-arity function call. UNION-FIND based methods tend towards O(n²), because dereference-before-forward can involve following a chain through n − 1 nodes (in the worst case) for each node. This analysis matches the intuition that, in a duplex (two-way) unifier, each recursive function call can only join, at most, a single node to an equivalence class, and equivalence classes can be arbitrarily extensive.

Because it is a single-pass algorithm—the writing pass is also the visitation pass—n-way complexity is also favorable for writing the result structure, never traversing more than the O(n) edges of the result graph.

The key idea behind n-way unification is that it is preferable to simultaneously unify the set of nodes that form a coreference equivalence class. To do this requires definitive knowledge of whether each equivalence class has coreferences outstanding, so as to defer descending into it. The algorithm tracks whether an equivalence class is complete (meaning that it cannot possibly obtain additional members); if this is not the case the class is summarily skipped. Only when complete does it undertake the class's unification, emit a definitive result node, and descend into its substructure. Each descent takes the form of a single variadic (variable-arity) function call. For each deferred class, two bookkeeping items are maintained: a list of nodes comprising its membership thus far, and a simple integer tally representing the number of reentrancies remaining until downwards descent is permitted. Ideally, the list of nodes directly implements the format of a variadic stack frame for the (eventual) recursive call.

4.3.2 Procedure

n-way unification can be outlined very simply. The algorithm traverses the argument TFSes in step, accumulating the n coincident argument nodes and summing their expected reentrancy counts (minus one, for each, to reflect the current visit), or—for previously seen nodes—decrementing the existing sum. If this total is zero after all of the input arguments have been processed, then the class is definitive: no further nodes can possibly join the equivalence class, and the node type will be the greatest lower bound between the class members' node types.49 Only in this case is the variadic gang-descent below the nodes undertaken. The accumulated set of nodes is simply submitted as the variadic argument set of a recursive call. In a preferred implementation, the set has been maintained in a form immediately suitable for this. Termination is signaled in the same way as any depth-first traversal: by completing the processing of all of the root node's feature-value pairs.
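
The per-class bookkeeping can be sketched as follows. The member list and the outstanding-reentrancy count mirror the two items described above; the class name and method shape are hypothetical.

```csharp
using System.Collections.Generic;

// Per-class bookkeeping for n-way unification: a member list (ideally
// kept in the layout of the eventual variadic call) plus a count of
// reentrancies still outstanding. The class may be descended into only
// when the count reaches zero.
sealed class EquivalenceClass
{
    public readonly List<int> Members = new List<int>();  // accumulated argument nodes
    public int Outstanding;                               // reentrancies not yet seen

    // Called once per arrival at a member node. 'expectedReentrancies'
    // is the node's precomputed tally of immediate in-arcs. Returns true
    // when the class is complete and the gang-descent may proceed.
    public bool Visit(int node, int expectedReentrancies)
    {
        if (!Members.Contains(node))
        {
            Members.Add(node);
            Outstanding += expectedReentrancies;   // add the node's expected visits...
        }
        Outstanding--;                             // ...and count off the current one
        return Outstanding == 0;
    }
}
```

A class whose Outstanding count is still non-zero when the traversal ends corresponds to the cyclic-structure failure case discussed later in this section.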

Since writing a node is definitive, as soon as this determination is made, its scratch data is no longer needed, and the algorithm might as well write the result node, if this can be done cheaply and in a manner that permits trivial discarding. As noted in Section 4.3, doing so raises interesting questions concerning the performance categorization model traditionally used in unification research.

The tested implementation of n-way unification does not, in fact, write array storage records in their final, persistent storage location, opting instead to image array TFS nodes—in nearly final form, but without their hashing data—onto an upper stack frame of the unifier thread. By this technique, n-way unification avoids a heavyweight system array allocation in the case of failure. If unification succeeds, the entire block of nearly-complete result nodes is copied to persistent storage with a single memcpy instruction, and the hash index is built. Because it does not perform system allocations, this seems to qualify as ‘no early copying’ under the spirit of that definition, but clearly the memcpy and hashing would then count as a second pass, albeit a speedy one.50

Figure 13. Unpacking comparison of simultaneous-daughter versus separate-daughter unification. The n-way unifier is used to exhaustively validate all of the derivations of a packed parse forest. In the red (upper) line, it simulates duplex unification, so multiple top-level unification calls are required to validate binary rules. The blue (lower) line shows that validating the entire corpus is faster when the unifier is permitted to unify binary rules all at once.

n-way unification also provides an elegant means for introducing additional TFS arguments during unification. This ability accommodates the formalism's well-formedness condition (Section 2.1.5), and, by permitting the simultaneous unification of all of a rule's daughters,51 also improves the efficiency of validating derivations in a packed parse forest. The improvement is shown in Figure 13. Simultaneously unifying all of a rule's daughters with a single unifier operation improves total parse time for the ‘hike’ corpus by 13%.

49 If sufficiently many diverse types are present at this step, a single multi-type GLB calculation via bit-array operations (Aït-Kaci 1989) may outperform multiple pairwise retrievals from a (duplex) GLB cache.
50 This is a simple O(n) loop over the array, rather than a graph traversal.

The main cost of n-way unification is that each TFS that participates in unification must provide to the algorithm an accurate set of tallies corresponding to the number of reentrancies expected for each coreference.52 Although there are administrative costs to maintaining these structures, each invariant set is trivially produced during the (n-way) unification that creates the structure it summarizes, and is thereafter valid for that structure's lifetime. For the prolifically referenced structures that constitute the grammar itself, the amortized cost of computing these tallies is zero.

The method summarily skips incomplete classes, and this leads to a surprising and useful property: with monotonic traversal alone—no backtracking—there is an adequate completeness guarantee. Walking forward through the input structures, any skipped node will always be found accessible via some outstanding reentrancy—in the remaining unvisited structure—of some argument TFS. This is the condition which limits the worst-case traversal to O(n) in the total number of input arcs. Input structures with cyclic (or more relevantly—‘mutually-cyclic’—since the DELPH-IN formalism does not permit cycles in any completed, well-formed structure) coreferences violate the condition, but these cases are easily detected: if, upon reaching the end of the traversal, any classes have non-zero reentrancy counts, then this can be taken as a true positive indicator of unification failure owing to cyclic structure, with the peculiarity that the exact failure path is not necessarily known.

The case described above—failing to visit substructures that express mutually cyclic coreferences—occurs quite infrequently, even in large grammars such as the English Resource Grammar. But the case does highlight another illuminating characteristic of the n-way method: on the one hand, unification unnecessarily proceeded to completion, when perhaps it could have been abandoned earlier had the cycle been detected. But on the other hand, a potentially large chunk of substructure was skipped in reaching this ‘completion.’ Whether this obscure trait is an advantage or a disadvantage, therefore, will be grammar-dependent.

4.3.3 Evaluation

Evaluation of n-way unification was undertaken by comparing it to an earlier unification implementation which was available in the agree grammar engineering system. In fact, the array storage method and its accompanying unifier represent the fourth iteration53 in the design and development of agree's unification. Figure 14 compares the n-way unifier with the second unifier in this sequence, an implementation of Wroblewski's incremental method.

51 While the improvement is applicable to unpacking, it is not generally useful in parsing, where the daughters of binary (or higher arity) rules must be unified individually so that partial structures (“active” edges in the parse chart) that support the pursuit of alternate parse hypotheses can be retained.
52 This is the number of immediate arcs leading to a coreferenced node, not the number of TFS paths leading to the node. The latter can be greater than or equal to the former.

4.3.4 Classifying unifier performance

The nature of n-way unification confuses the discussion of early copying and over copying. Recall the comparison with the “lazy” method mentioned in Section 4.3, the introduction to n-way unification. When every step in a unification algorithm is provably useful, it seems foolish not to produce the result structure along the way (if it can be done very cheaply) even if failure is eventually determined. Because the algorithm never returns to a node once it is written, writing the result could be considered a substitute for the manipulation of the scratch fields of other methods, an activity which is exempt from categorical review.

Figure 14. Intra-agree evaluation of n-way unification vs. incremental unification (Wroblewski 1987) for parsing and exhaustively unpacking a subset of the ‘hike’ corpus. Multi-threaded, dual pipeline (MT p2) vs. single-threaded octal pipeline (ST p8). Methodology details are given in Chapter 5. n-way apparently outperforms the incremental method, especially when multi-threading with long input sentences. For comparison purposes, note that the array storage unifier presented in this thesis completes this particular task in 35.4 seconds (MT p1).

It is worth examining why such an approach should appear to challenge de facto unification research dogma, which holds that no result structure should be produced if unification does eventually fail. The chief question raised here is, precisely what distinguishes “result structure” in the definition of over-copying, and why should its production be treated differently than the manipulation of temporary “scratch” fields by the categorization model?

At issue is the nature of the control data which every unification algorithm carries to facilitate its operation. The same measurement bias which ignores the manipulation of scratch fields as a performance hazard category allows the n-way method to wholly subvert the informal performance model. By taking the injunction against over-copying to an extreme, the n-way technique demonstrates a weakness in the model. Methods with substantial and complex control structures, like n-way and Godden's (1990) lazy method, subvert the model by deploying analytical techniques which make too much effort to reduce high-profile performance hazards, and thus succeed only in shifting complexity to obscure hiding places. Accordingly, the field may benefit from the development of a more rigorous unifier performance evaluation model.

53 This does not include a fifth implementation, a variation of the n-way code which proceeds only downwards on the thread's stack, so that when passing out of the traversal, the result TFS is entirely encoded on the deep set of calling stack frames. The method, which completely eliminates the need to pre-allocate scratch structures, has the advantage that the size required for the TFS becomes known at that deepest stack frame. The persistent storage for the actual final TFS is allocated at that time and the TFS is realized at each step of returning from the nesting. After completing a working prototype, I abandoned the method because evaluation suggested the overhead of traversing O(n) stack frames (in the number of result graph nodes) would be limiting.

4.3.5 Summary

To conclude the discussion of n-way unification, I mention some engineering details that significantly hampered the implementation. The preceding discussion does not describe the manner in which the variadic function call is implemented, and this is critical to the performance of the method. Although the algorithm is theoretically elegant, efficient implementation in the CLI runtime environment proved a challenge. That the n-way method nearly matches the performance of the two-pass array storage unifier is surprising, because the source code for the former is so much more belabored and overwrought. Either algorithm could have been carried forward in agree, but—even discounting the need to maintain per-TFS coreference tallies, which I generally did not consider a demerit against n-way unification—in the end, the n-way implementation just required too many unappealing platform-induced workarounds.54

Future work will investigate whether the theoretical elegance of n-way unification survives better under alternative tooling. In any case, the result supports the hypothesis, mentioned at the beginning of this section and detailed in Section 4.2.9, that unification pre-filters have eroded the original motivation for two-pass unification to such a degree that the overhead of its extra traversal has become a more significant liability, allowing efficient single-pass methods, such as n-way, to become relevant again. In effect, pre-filtering increasingly serves the function of the first pass and diminishes the effectiveness of strategies that incorporate a failure-based elimination pass.

The next section presents the new unification algorithm adapted for the array TFS storage presented in Section 2.3. This unifier supports concurrent operation and improves on Tomabechi’s method by eliminating the need for post-initialization operationally-variant list structures such as the COMP-ARC-LIST.

4.4 Array TFS unification
This section describes a unifier adapted to array TFS storage. Like Tomabechi’s (1991, 1992) method, the technique is based on UNION-FIND, the 1976 technique introduced in Section 4.2.1 and described in Section 4.2.9. The array storage unifier differs from Tomabechi’s method in several ways. A key contribution of the new method is a novel scheme for indexing detached scratch slots, so as to permit concurrent unification.

As noted in Section 4.2.10, while the algorithm is universally thread-safe, it is also passive, meaning that it has no provision for dispatching additional sub-tasks to assist with a portion of the unification job with which it was initially charged; nor does it capture or retain threads for any purpose. It permits concurrency only by virtue of its thread-safety, and becomes concurrent only as so utilized by the caller. In agree, the caller is most often the concurrent chart parser, which in fact heavily exploits the unifier’s thread-safety by deploying numerous simultaneous unifications.

54 A charitable description of the ugly hacks and kludges involved.

A second contribution of the new method—related to the scratch slot assignment technique—is the elimination of excess bookkeeping structures. Section 4.2.9 observed that, in Tomabechi’s method, the forwarding pointer mechanism and the COMP-ARC-LIST mechanism seem, to a certain degree, to overlap. This section completes the earlier analysis, presenting a non-destructive, two-pass unification algorithm that exhibits no over-copying and does not maintain a list of extraneous arcs for each result node.

Notably, a powerful guarantee emerges when invariant arity mappings are co-opted from the array storage system for use as pre-made structural descriptions of each input argument. Ultimately, the guarantee allows the unifier to adopt a simplified conception of its task as one of selecting from the small number of options available at each node—as opposed to the more conventional idea of a unifier as an algorithm that must be able to “build” new structure. Freed of excess data structures which anticipate the synthesis of arbitrary new structure, the complete and essential work product of the unifier is reduced to a single set of scalar forwarding pointers relative to its array of scratch fields. In short, the guarantee is that the result structure is necessarily latent within the joint set of array storage layouts of the TFS arguments, from the very instant they are co-opted, and so there is always a sufficient read-only substrate available, ready-made for efficiently expressing the result structure prior to the start of the operation. Naturally, this implies that no manner of operationally variant allocation is needed, and the progress of the operation itself is generally streamlined.

4.4.1 COMP-ARC-LIST
In Tomabechi’s quasi-destructive unification, the function of the COMP-ARC-LIST is to gather, for each node in a TFS, zero or more feature-value tuples which are required in the output node, but which are not present in the designated representative node. The need arises in two situations. Consider the unification of nodes q and r. In the first situation, neither q nor r constrains a set of features which is a superset of the other:

    |f(q.A) ∪ f(r.A)| > max(|q.A|, |r.A|).    (4.1)

In a second situation, the type unification result between two nodes is a type which is identical to neither of the argument nodes’ types:

    q.t ⊓ r.t = g,  g ≠ q.t,  g ≠ r.t.    (4.2)

When this situation occurs, the well-formedness requirement is triggered, meaning that the canonical constraint for type g must also be unified with q ⊓ r. In either situation, neither q nor r contains a complete set of slots sufficient for holding the features needed in the result. Although the Tomabechi method wisely seeks to avoid allocations during the compatibility-checking pass, it does need to maintain, somewhere, a consolidated set of features for each developing result node. Since only one of the nodes can be designated, its list of arcs is supplemented with those from the COMP-ARC-LIST.

My analysis begins with equation (2.19) in Section 2.2.3, repeated here, which suggested an informal conception of TFS storage where features alternate with nodes:

    (q f), (q f), (q f) …    (4.3)

In Tomabechi’s method, each node q contains a forwarding pointer, deployed as an in situ scratch field, which allows the unifier to forward any node to another. Meanwhile, the COMP-ARC-LIST mechanism handles the features (or ‘arcs’) needed—but not available in—the target of the node forwarding.

Consider now if unifier scratch slots, rather than existing within each node, were instead available according to the groupings as parenthesized in (4.3). This shift entails that there would be a unifier scratch slot corresponding to every feature arc, as opposed to every node. This, in turn, means that it is a feature arc, rather than the node containing it, that contains the scratch fields for the node it leads to—and that a separate mechanism for temporarily storing forwarded arcs, the COMP-ARC-LIST of the prevailing method, would be unnecessary.
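To make the contrast concrete, the following C# sketch juxtaposes the two placements of scratch data; all type and member names here are invented for illustration and do not reproduce any actual implementation.

    using System.Collections.Generic;

    // Tomabechi-style: scratch data lives inside each node, and arcs gathered
    // for the result must be accumulated in a side list (the COMP-ARC-LIST).
    class NodeWithScratch
    {
        public int Type;
        public NodeWithScratch Forward;                              // in situ scratch field
        public List<(int Feature, NodeWithScratch Value)> Arcs;
        public List<(int Feature, NodeWithScratch Value)> CompArcs;  // scratch: COMP-ARC-LIST
    }

    // Per-arc alternative suggested here: the scratch slot accompanies the
    // feature arc itself, so the arc that leads to a node carries that node's
    // scratch state, and no separate list of forwarded arcs is needed.
    struct ArcWithScratch
    {
        public int Feature;      // the feature labeling this arc
        public int TargetType;   // the node the arc leads to
        public int ForwardSlot;  // scratch: forwarding index; 0 = not forwarded
    }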

An immediate problem with this idea is that a coreferenced node has several feature arcs leading to it, and it is not clear which of these should serve as its unique scratch slot. But this issue is reminiscent of how array TFS storage, as detailed in Section 2.3.4, replicates, by value, a given coreferenced out-tuple for each feature arc—or in-tuple—that references it. This design, whereby graph reentrancy is modeled by the (value, as opposed to referential) equality of out-tuples across relation 𝔸, and where conflation of coreference thus becomes a task external to the storage paradigm, actually matches the shifted scratch slot condition suggested here for simplifying Tomabechi’s algorithm.

It should now be clear that the key insight is for the unifier to exploit the pre-existing storage layout of any given array storage TFS—to be precise, its in-tuple hash table—for the purpose of arranging a unique scratch slot for each feature arc in that TFS. In doing so, the new unifier assumes responsibility for ensuring the conflation of joined out-tuples, but because coreferenced nodes are marked by a flag bit in the type identifier, this is trivially accomplished.

Each slot mapping co-opted from an array storage TFS represents a fixed grouping of substructure slot mappings which reflects the actual feature expression of some node in one of the input argument TFSes. When superimposed on the global set of scratch slots which incorporates multiple argument TFS layouts, these slot mappings can be intricately interconnected to assemble a representation of the result graph using only a single forwarding mechanism.


Because the co-opted mappings represent pre-configured, content-based retrieval keys for each input argument, another way to look at the idea is as a shift in the nature of the node retrieval key, so that it is arcs that are forwarded, rather than nodes. Because arcs (in-tuples) explicitly incorporate a binding to their target node, they also carry an implicit association with that (target) node’s other inbound arcs (if any—that is, if it is coreferenced).

By carefully exploiting the existence of these pre-configured, implicit groupings, a single forwarding mechanism can describe the result configuration. Informally, instead of a conception where a list of zero or more ⟨fᵢ, qᵢ⟩ tuples is retrieved from each node q, the simplified scheme imagines that only a single node qᵢ (or no node) is retrieved from each ⟨q′, fᵢ⟩ tuple. The next section continues with a formal treatment of the method.

4.4.2 Scratch field mapping
van Lohuizen importantly noted that the fundamental problem in adapting a Tomabechi-style algorithm for parallel operation is to associate each TFS with a set of scratch fields that are private to each concurrent unification operation. More specifically, within this private set, each TFS node must be uniquely mapped to a distinct scratch slot. Uniqueness ensures that coreferenced nodes are correctly conflated, and distinctness ensures that non-coreferenced nodes are not incorrectly conflated.

It is obvious that, with array storage, a set of indexes is already implicit in the underlying storage organization. By adding the ability to query the storage index of a 4-tuple, these can form the basis for identifying a detached scratch slot with each node. To facilitate the discussion, the relational algebra expression for an array storage 4-tuple is extended to incorporate its array storage index, ix:

    aᵢ = ⟨ ix(ℤ), ⟨f(Feat), m_F(ℤ)⟩, ⟨t(Type), m_T(ℤ)⟩ ⟩,    (4.4)

which has flattened form aᵢ = ⟨ix, f, m_F, t, m_T⟩. As before, this notational extension does not imply a change in the underlying storage. In this case, rather, the notation is extended by making explicit an intrinsic property of the storage.
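Rendered as a C# value type (a sketch; the field names are assumed), the extension amounts to surfacing the tuple’s position in the backing array:

    // The stored 4-tuple <f, m_F, t, m_T>; ix is not a fifth stored field but
    // the tuple's own position in the backing array, returned on request.
    struct Tuple4
    {
        public int Feature;  // f   (in-tuple)
        public int MarkF;    // m_F (in-tuple)
        public int Type;     // t   (out-tuple)
        public int MarkT;    // m_T (out-tuple)
    }

    static class Storage
    {
        // Realizes the flattened form a_i = <ix, f, m_F, t, m_T>: the index is
        // a free by-product of the access.
        public static (int ix, Tuple4 row) FetchWithIndex(Tuple4[] A, int ix) => (ix, A[ix]);
    }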

Although the set of node-index mappings

    π_{ix, m_T} 𝔸    (4.5)

satisfies the distinctness requirement, it does not ensure that each node has a unique index. Recall that, in array storage, the single 4-tuple which corresponds most canonically to a particular node q is not contained in the set of 4-tuples selected by its out-mark:

    σ_{m_F = q.m} 𝔸.    (4.6)

This expression selects a number of array storage rows which represent the feature-value pairs, or onward ‘arcs,’ for q, but the storage location for the node itself, that is, a row which stores

its type, cannot be recovered without further maintaining context. To be precise, we seek one of the 4-tuples selected by a governing node q′ and one of its appropriate features, f′:

    σ_{f = f′ ∧ m_F = q′.m ∧ m_T = q.m} 𝔸.    (4.7)

Although each row selected by this expression contains the same out-tuple value—specifically, ⟨q.t, q.m⟩, or simply ⟨q.t, m⟩—there may be multiple such rows, each representing different in-tuples belonging to different nodes q′₀ … q′ₙ. These represent the one or more feature paths from q′ᵢ to q. This is no problem for ordinary TFS access, where value identity amongst the out-tuples is sufficient to guarantee correct behavior—after all, the copies are indistinguishable. In fact, the fundamental design of the array TFS storage scheme is optimized to serve precisely that case: by having an identical copy of the out-tuple installed at each feature path, an additional join or array fetch is avoided, per access. But when these nodes’ array storage indexes—rather than their values—are examined, the illusion of identity, or referential equality, is lost, violating the uniqueness condition required of a scratch slot mapping.

Summarizing this, the underlying reason that a node’s array storage index is not ready-made as a scratch slot index is that the set of out-tuples is not a true relation. In order to avoid an extra join operation and an extra object allocation, array storage fuses the in- and out-tuples into permanently bound 4-tuples, with the result that coreferenced out-tuples are explicitly duplicated. To clients of the array storage system who receive and process out-tuples (and the mark values they contain) this fact is invisible, because numerical mark values naturally conflate. But consumers of the array storage index find that the index values for nodes that are supposed to be coreferenced are not conflated; each is reported as having a distinct index depending on the governing 4-tuple from which it was reached. If the unifier wishes to co-opt array TFS storage indexes for the purpose of scratch slot mapping, it concomitantly assumes the responsibility for the conflation of coreferenced nodes.

One way to effect this conflation would be to privilege one of the rows by fiat. For example, one could specify that the lowest index value

    min_{a ∈ σ_{m_T = m} 𝔸} a.ix    (4.8)

serve as the scratch slot index for the set of all rows which contain ⟨q.t, m⟩. With such an edict, scratch slot uniqueness is restored, and unification will correctly experience the conflation of coreferenced nodes. This particular solution, however, is not workable. When working with node q = ⟨t, m, A⟩, the relevant storage index is the index from which the governing out-tuple ⟨q′.t, q′.m_T⟩ was retrieved. This is the origin of t and m, passed down from a higher caller, and incorporated as q = ⟨t, m, …⟩.55 Certainly q′.ix could be passed down from the caller along with t and m. But a more serious problem foils the fiat plan: there is no practical way for the callee—or the caller, in fact—to determine whether q′.ix has the lowest value amongst its set of matching out-tuples, because this set is not available. The set would be expensive to compute, and is of no use to general-purpose TFS operations.

55 Recall that the storage indexes associated with rows q.A, which are easily accessible, are unrelated to the notion of a canonical array location for q. For one thing, there may be no such rows.

A second solution to the index uniqueness problem—which reflects the implementation of the system evaluated in Chapter 5—takes advantage of the convention that coreferenced out-tuples are always given a mark value less than zero. As is required in order to co-identify nodes which are coreferenced, a unique negative mark value is assigned even when the type of the node has no appropriate features, which would normally require that the mark value be zero. The details of this were discussed in Sections 3.2 and 3.2.3. By giving special treatment to rows where the mark in the out-tuple is negative, clients who use storage indexes for their own purposes—such as the unifier described here—are able to manually effect the necessary conflation of nodes. Further details are given in Section 4.4.4, which describes scratch slot manipulation.

The latest implementation of agree implements the most elegant solution to the problem.56 In this method, also mentioned at the beginning of Section 4.5, the unifier simply uses its own forwarding mechanism to restore the referential equivalence, within its scratch slots, of the array storage out-tuples. Doing so requires, per TFS, an array of pointers—one per coreference—that stores the identity of the first slot encountered from each equivalence class and which serves to map the negative out-mark (treated here as an array index) into the forwarding chain during subsequent encounters. Because each array TFS carries, as an invariant property, the number of coreferences it contains, this adjunct list is trivially established prior to commencing the unification, and is not considered an operational variant which would violate the tenet that the array storage unifier does not synthesize. The result of this scheme is that the scratch slot assignment for each unification argument has two sections, not three. A section dedicated to coreferenced nodes is no longer needed; only a single slot (for the root node) and the main range which parallels 𝔸 are needed. It is clear by now that unifier scratch slots are key to most unification algorithms. The next section discusses scratch slot allocation and the discarding problem.
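A minimal sketch of this pointer-array scheme follows; the names, and the use of integer slot indexes in place of pointers, are illustrative assumptions:

    // One entry per coreference in the argument TFS (a count each array TFS
    // carries as an invariant property); established before unification starts,
    // so it is not an operationally variant allocation.
    static class CorefDesignation
    {
        // designated[k] holds the global scratch slot claimed by the first node
        // encountered from equivalence class k; 0 = not yet claimed.
        public static int Resolve(int[] designated, int outMark, int currentSlot)
        {
            int k = -outMark - 1;            // negative out-marks descend from -1
            if (designated[k] == 0)
            {
                designated[k] = currentSlot; // first encounter claims the designated slot
                return currentSlot;
            }
            return designated[k];            // later encounters join its forwarding chain
        }
    }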

4.4.3 Scratch slot initialization and discarding
A major difference between array TFS storage and van Lohuizen’s approach to scratch slots is that van Lohuizen maintains a limited set of scratch buffers in a single, global pool. While it is crucial that frequent allocations be avoided, this solution introduces contention amongst concurrent unification threads for the leased slots in the common pool. Also facing contention is the global generation counter value which controls the invalidation of every slot in the pool.57

56 The technique described in this paragraph is not included in the system evaluated in Chapter 5.
57 Shared, writable scalars such as the generation counter usually benefit from being padded so as to preclude false sharing. CPU cache lines nowadays are 128 or, more commonly, 64 bytes. Absent compiler alignment directives, padding a 4-byte integer into a dedicated cache line entails placing inert, dummy entities of 60 (or 124) bytes on both sides of the oft-written field. These guarantee that no shared, read-only fields are placed within the volatile cache line, where they would needlessly incur penalizing cache misses.

One method, used in the tested implementation of agree, deploys the entire scratch slot array on the stack. For each unification operation, an initialization function subtracts a value from the stack pointer, allocating a block of memory in the topmost frame of the unifying thread’s stack. The entire scratch buffer is organized in this buffer as a contiguous array. Stack-based allocation eliminates a costly system allocation cycle per unification and trivially gives each unifying thread a private set of scratch slots, eliminating the need for centralized coordination of scratch slot leasing. Naturally, with stack-based allocation, the entire set of scratch slots is instantly discarded, in the case of unification failure, with a single subtraction instruction. Indeed, this subtraction is implicit and automatic upon execution returning to the caller.

The first call which starts the recursive unification, proper, is initiated from within the initialization function, so pointers to the scratch buffer are accessible to, and valid throughout, any call below the initialization function. These frames access these fields via direct pointers to the initialization frame. The buffer automatically disappears only when the unification is complete.
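In C#, the arrangement might be sketched as follows; Slot, Tfs, and UnifyNodes are placeholders, and the layout is simplified to a single pair of arguments:

    // Simplified sketch of per-operation stack allocation of the scratch array.
    struct Slot { public int Type, MarkT, IxFwd, MarkCopy; }

    class Tfs { public int NodeCount; }

    static unsafe class StackScratch
    {
        public static bool Unify(Tfs a, Tfs b)
        {
            // One root slot plus one slot per storage row, for each participant.
            int n = (1 + a.NodeCount) + (1 + b.NodeCount);
            Slot* slots = stackalloc Slot[n];   // zeroed by the runtime (localsinit)

            // The recursive unification proper starts from within this frame, so
            // pointers into 'slots' remain valid in every frame below; on return
            // (success or failure) the buffer is discarded implicitly.
            return UnifyNodes(slots, 1, 1 + a.NodeCount + 1);
        }

        static bool UnifyNodes(Slot* slots, int q, int r) => true;  // placeholder
    }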

A disadvantage of this method is that the second pass—the realization or writing pass, which traverses the scratch slots to produce a result TFS—cannot be deferred, a technique used in “hyper-active” parsing (Oepen et al. 2000). This is because there is no way to prevent the loss of the stack frame when the unification function returns to its caller. Also, when we root out exactly where the task performed by the generation counter ‘went,’ we find that its function is simply hidden: security considerations in the runtime platform mean that it is not possible to opt out of the zeroing of the memory block that is allocated from the stack as described. Although the runtime environment probably effects the clearing with a fast block-storage processor instruction—or perhaps even an outboard direct memory access (DMA) executor—the courtesy surely comes at some cost.

The reason I detail this point is that the array storage unifier substitutes this runtime guarantee for the function formerly performed by the generation counter. The generation counter is not needed in array storage because each unification operation gets a fresh set of scratch slots which can never be contaminated with abandoned values. Because memory-clearing applies to every type of allocation, it is difficult to avoid this platform-dependent penalty without explicitly managing long-term buffers—as van Lohuizen does. The trade-off is that his method invites contention where the stack method incurs none. Finally, note that zeroing a stack allocation is not automatically performed in C/C++, so adopting stack-based allocation and eliminating the generation counter there would require some kind of explicit initialization pass over the slots, in order to distinguish their first use from randomly occurring values.

Recent work (not reflected in the results presented in Chapter 5) has altered the scratch slot allocation method used in agree; the generation counter method introduced by Wroblewski (1987) is now used.


To permit concurrent unification, a platform mechanism called thread-static storage automatically associates—via a single object reference—a complete and independent unifier implementation with each thread. Each of these unifiers has its own private set of reusable scratch fields governed by a private generation counter. It is impossible for these resources to be contended for, because they are contained within the ‘unifier’ object, and this is always known to be privately owned. The operating system provides an efficient guarantee that each access to what appears to be a singleton, thread-static ‘unifier’ object by each different CPU actually returns a private object invisible to the other threads. A small number of dedicated unifiers quickly become available for each physical CPU in the system and remain active for the lifetime of the agree process; this minimizes the costs of reinitializing the fixed set of resources used by each instance. In turn, by reusing a single scratch slot array throughout its own lifetime, each unifier avoids expensive memory-zeroing penalties after each operation.
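The following sketch illustrates the thread-static arrangement; the names and sizes are assumed, and the slot layout anticipates (4.9) below:

    using System;

    class Unifier
    {
        [ThreadStatic]
        static Unifier perThread;

        // Each thread lazily creates its own unifier, so the scratch slots and
        // generation counter below can never be contended between threads.
        public static Unifier Current => perThread ??= new Unifier();

        struct Slot { public int Generation, Type, MarkT, IxFwd, MarkCopy; }

        Slot[] slots = new Slot[8192];   // reused for the lifetime of the unifier
        int generation;

        public void BeginOperation() => generation++;   // invalidates every slot at once

        // A slot whose Generation lags the counter is treated as uninitialized,
        // so no per-operation memory-zeroing is required.
        bool IsFresh(int ix) => slots[ix].Generation == generation;
    }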

4.4.4 Scratch slot implementation
This section describes the scratch slot implementation used in the array TFS unifier. Each scratch slot has the form

    ⟨ t(Type), m_T(ℤ), ix_fwd(ℤ), m_copy(ℤ) ⟩.    (4.9)

Not shown is an additional field related to node counting: the calculation of the total number of nodes in a newly-unified TFS. Node counting is required for pre-allocating an empty array 𝔸 of the exact required size, prior to the writing pass in a successful unification.58 A detailed presentation of this topic is not provided in this thesis, but a summary of the salient points is provided in Section 4.4.10. Note the absence of a COMP-ARC-LIST field; as described in Section 4.4.1, after initialization the array storage unifier manipulates no operationally variant data structures, and is always able to completely describe the result structure via the ix_fwd field alone. With regard to this claim, accounting for the other three fields in (4.9) is simple. t and m_T are copies of the out-tuple. With the exception of a trivial optimization (described in Section 4.5 on page 97) which modifies t but is substantially unimportant, these fields remain unchanged and can be interpreted as performance-related caching. The m_copy field pertains to the writing pass, which represents a trivial re-expression of the result structure which has already been fully described by the unification proper. It is also worth mentioning that, as is the case with any scratch slot traversal, reference to the set of read-only arity-maps of the input arguments is also needed to read out the result structure, but these are also not counted as a unifier work product since they remain unchanged since before the operation began.
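As a C# value type, the slot of (4.9) might look as follows; the node-counting field, which (4.9) does not show, appears under an assumed name:

    // The unifier scratch slot <t, m_T, ix_fwd, m_copy> of (4.9).
    struct ScratchSlot
    {
        public int Type;      // t: running type-unification result for this node
        public int MarkT;     // m_T: cached out-mark of the governing out-tuple
        public int IxFwd;     // ix_fwd: global index of the forwarded-to slot; 0 = none
        public int MarkCopy;  // m_copy: mark issued to the node during the writing pass
        public int Refs;      // node counting: referring feature-value tuples (assumed name)
    }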

To be completely precise, the set of all ix_fwd after unification is not quite complete, because a starting slot value where the read-out of the result begins must also be specified. For example, in the case of mother-daughter unification, this will be unifier slot index 1, which corresponds (in the latest agree slot layout) to the root node of the mother argument TFS. This is because the topmost node of the result of a mother-daughter unification will always have the type of the mother. Also, of course, no daughter structure is available to choose from here.

58 In the target platform, instances of the array primitive, the lone vehicle capable of providing adequate performance for this application, cannot be resized after creation.

The stack-based allocation described in the previous section, considered as a single array of scratch slots, is demarcated into adjacent regions of contiguous scratch slots dedicated to each TFS that participates in the unification. Within each of these regions, there are three sub-regions. First, the unifier designates a special set of slots equal to the number of coreferenced nodes in the TFS. As noted in Section 4.4.2, rather than trying to choose one amongst the multiple scratch slots which may be reported for a single coreferenced node, in this design the unifier ignores all array storage indexes selected by negative in-marks. Taken together, the negative marks for the set of coreferenced nodes in a TFS comprise a set of consecutive negative integers descending from -1, so no mapping is required. Any node with a negative mark is redirected to one of these special slots by using—instead of the node’s (non-unique) slot index—the actual value of its mark itself to select (as a negative offset) its scratch slot, and this redirection effects the conflation of coreferenced nodes.

Immediately following the special coreferencing singletons, a single scratch slot is designated for the TFS root node. Because the root out-tuple is not stored in 𝔸, its array storage index is undefined and therefore requires special treatment. Additional discussion of this point can be found in Section 3.2.1. Finally, a set of scratch slots, equal in number to the size of the array TFS storage, is created for non-coreferenced nodes. Note that there may be numerous scratch slots in this sub-region which go unused, due to the redirection of coreferenced nodes to their own special sub-region. These ‘holes’ are a benign condition.

The unifier operates according to the dereference-before-forward principle from UNION-FIND and most later unifiers (but not n-way unification). As with all methods based on the technique, the ix_fwd (forwarding) field is perhaps the most important. Forwarding between scratch slots observes a system which spans the entire scratch array, encompassing all participating TFSes. Specifically, ix_fwd is an integer which represents a global index of a scratch slot, or zero if the slot is not forwarded.59 This permits any scratch slot to be forwarded to any other, within the same, or any other, TFS.
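A sketch of this global forwarding discipline appears below, using the ScratchSlot layout sketched above (names assumed; the loop-forming fallback variant of Section 4.4.8 would additionally require a loop-master check, omitted here):

    static class Forwarding
    {
        // Dereference-before-forward: follow ix_fwd links to the terminal slot
        // before comparing or forwarding. Indexes are global, so a chain may
        // freely cross from one participant TFS's slot range into another's.
        public static int Dereference(ScratchSlot[] slots, int ix)
        {
            while (slots[ix].IxFwd != 0)
                ix = slots[ix].IxFwd;
            return ix;
        }

        public static void Forward(ScratchSlot[] slots, int from, int to)
        {
            if (from != to)
                slots[from].IxFwd = to;   // joins the two equivalence classes
        }
    }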

Since coreferenced nodes use their (negative) mark to directly select a scratch slot, it is worth describing why this technique is not used for all nodes. The answer is that, in the current design, nodes whose type has no appropriate features are always given out-mark value zero. Because of this, unilaterally using a node’s out-mark as a scratch slot index would incorrectly conflate all nodes of the same type across the TFS. The design whereby a zero mark value is given to featureless nodes is a legacy from early in the evolution of this project whose original motivation has been deprecated.

59 The first scratch slot, at global index zero, is reserved and not issued to any argument TFS.

Within the scratch slot region for a given TFS, a single base address provides access to all of the TFS’s scratch slots. From the start of the TFS’s region, this base address is given a positive (or zero) displacement of −min m_T(A), so as to permit direct indexing in the inclusive range

    [ min m_T(A), |A| ].    (4.10)

The displacement allows negative mark values, which indicate coreferenced nodes, to directly index scratch slots in the special coreferencing sub-region. The special scratch slot for the root node is at offset zero from the base. Non-coreferenced nodes add one to the node’s zero-based array storage index to directly access slots in the range 1 … |A|. During unification, a special array storage access mode is used to request that the array storage index, which is a free by-product of the access, be returned along with each node access. Naturally, this index, along with the node’s type and out-mark, is only accessible during the feature-value enumeration performed by the caller. To summarize, for each argument node, three pieces of information are maintained: the array index of the governing out-tuple in the argument TFS, and, from the tuple itself, the type and out-mark.
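The addressing arithmetic reduces to a few branches, sketched here in C#; the negative storage index used to signal the root is an assumed convention:

    static class SlotAddress
    {
        // baseIx: the TFS's base slot, pre-displaced by -min m_T(A) from the
        // start of its region, per (4.10).
        public static int SlotIndex(int baseIx, int outMark, int storageIx)
        {
            if (outMark < 0)
                return baseIx + outMark;    // coreferenced: mark selects the special sub-region
            if (storageIx < 0)
                return baseIx;              // root node: offset zero from the base
            return baseIx + 1 + storageIx;  // ordinary node: 1-based storage index offset
        }
    }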

To process a substructure node, only a single pointer is passed down in the recursive call. This is a pointer directly to a single scratch slot, calculated from the TFS’s base address and an offset as described above. The slot index, one of the three items, is incorporated into the pointer itself, of course. For efficiency, the other two values which must be passed down are cached within the slot. Referring to (4.9), these are t, the type of the governing node, and m_T, the out-mark of the governing node. The latter is combined with features appropriate to the former, each in turn, to select follow-on nodes. Caching the node type in the slot allows the slot to be more flexibly used with the fallback fetch mechanism described in Section 4.4.8. Caching of out-marks is also helpful because they mediate the accessibility provided by the mappings co-opted from TFS storage, as described in Section 4.4.7, and are thus referenced frequently during traversal.60 From a formal standpoint, however, caching the out-tuple in the unifier slot is not strictly necessary.

At this point, the alert reader may quibble that caching contentful values (such as the node type and out-mark) in the scratch slots amounts to a form of over-copying, denounced in Section 4.2. In fact, however, the presence of these values in a persistent portion of the stack frame is little different from their presence as arguments to recursive function calls. Function calls involve stacking and unstacking numerous values. While acknowledging that the control triple discussed here contains non-opaque internal structure—and is physically wider—than a simple, opaque node pointer, it should nevertheless not be considered “copying” to pass control information to a lower stack frame. The optimization here is simply to consolidate the multiple control values under a single pointer. This pointer happens to refer to an entity—an upper-frame scratch slot—that is more persistent than a conventional function argument. The savings of consolidating arguments and operating exclusively on scratch slot pointers is that slots which undergo multiple operations during the course of an operation are prepared only once.

60 They also allow an extra TFS access to be avoided when establishing the fallback condition, but this point is not detailed further here.

4.4.5 Array TFS storage slot mappings
Sections 4.4.1 and 4.4.2 described how array storage layouts are co-opted by the unifier for the purpose of organizing a set of scratch slots, but elided the issue of why these layouts have value in the first place. Surely the unifier could, on its own, arrange an appropriate configuration of slots. The value of the storage layouts lies in the fact that each is a pre-made arity map for every 4-tuple in 𝔸, essentially a feature-arity configuration customized to each TFS. Each mapping provides slots for the constrained features associated with every node. Taken by node, each of these subsets can be chosen as a standalone slot-mapping for that set of features. For each argument TFS, the proper number and type of slot mappings are already laid out, which means that each has a complete schema that is fully established and available prior to starting the unification operation.

A slot mapping is defined as the subset of slots (represented as slot indexes relative to some array storage TFS) that are obtained with a particular out-mark value (again, from that TFS). This is the familiar set σ_{m_F = m}(𝔸), augmented now with (notational) array storage index values. Slot mappings, which are each local to their TFS Fᵢ, become useful to the unifier because for each argument TFS it assigns a global range of unifier scratch slots to the entire storage relation 𝔸_{Fᵢ} (in simple one-to-one correspondence with the local index values), thus trivially and instantly relating all of Fᵢ’s local maps into the range. Scratch slots can be globally forwarded, crossing local TFS boundaries, enabling the unifier to assemble a patchwork from the incorporated maps which ultimately describes the unification result TFS.

To summarize, array TFS storage internal layouts are valuable because they represent the work of compacting a set of mixed-arity feature-value data—the structure of a particular argument TFS—into a ready-made hash table. While there is no way to alter the slot mappings that this hash table subsumes, the unifier is free to choose a slot mapping from any input TFS that offers one which is associated with the type of the current node. Choosing the representative slot amounts to choosing the best m_F, where the set of scratch slots accessed by ⟨f, m_F⟩ is fixed.

4.4.6 Slot mapping selection
In this section, I show how the array storage unifier constructs the result TFS by forming a strategic patchwork of pre-made slot mappings.


The set of slots in a slot mapping is organized under a single governing slot. During the processing of feature-value pairs, this is the view looking “upwards” (i.e. towards the TFS root, backwards over structure just traversed). Looking “downwards” (i.e. towards the next iteration, forward away from the TFS root) invokes a different sense: for each feature-value pair, a representative slot is chosen, based on the slot mapping it offers, over other available slot mappings. The representative slot becomes the governing slot for the node’s substructure. At any given time, then, there is a single governing slot, and the task at hand is to iterate over feature-value pairs, choosing representative slots.

It is important to realize that the choice of slot mapping does not alter the possibility of choosing representative slots from either input argument—or a mixture from both—amongst the joint set of feature-value pairs. Consider how a governing scratch slot denotes some list of features beyond those referenced by its corresponding slot mapping. In grammars with a feature-appropriateness condition, this might mean switching the slot’s current type to a (more constrained) subtype, so that a larger set of features becomes appropriate. Conceptually, increasing the set of accessible features is a simple matter of changing the slot’s type. Recall from Section 2.3.6 that, in array storage, doing so results in the instantaneous appearance of unconstrained nodes ⟨f, ⊤⟩ for the enlarged feature set; we have expanded the accessibility that the mapping provides. This does not suggest a solution for the current problem, however, because these “chimeric” nodes, being synthetic, have no storage index and hence no scratch slot of their own.

Nevertheless, the preceding example is instructive, because it illustrates that the choice of representative slot only establishes the accessibility (or governance) of a set of scratch slots via some out-mark. Some of these—or all, or none—may be designated as representatives (versus being forwarded to a representative) themselves. But beyond providing access to a fixed grouping of scratch slots, the mapping constrains nothing. The task of the unifier is to designate, as governing slots, a subset of the predefined, fixed set of slot mappings offered by the argument TFSes, and arrange them such that the resulting accessible substructure corresponds to the result TFS.

Any mapping that provides slots for the set union of constrained features between the arguments can be selected—regardless of the contents of the nodes it selects in its argument TFS. Those contents never affect the choice of mapping because the type of any unifier scratch slot can be arbitrarily changed, and each slot selected by the chosen mapping can be forwarded to reflect the necessary substructure. Therefore, the choice of mapping is based solely on the feature accessibility offered by the mapping.

A problem case arises when none of the available slot mappings have an out-mark which offers slots for the complete set of required features. This is the disjoint coverage problem, discussed in the next section.


4.4.7 Disjoint feature coverage
To summarize the issue and the discussion thus far, the task of the unifier is to construct an arrangement of slot mappings that makes a unique scratch slot accessible for every node required by the result feature structure. We recalled that, for attaching data to a set of variable-arity features, array TFS storage uses the technique of predicating TFS node extraction upon the grammar’s feature-appropriateness condition. But this does not work for the unifier’s problem; since slot mappings are co-opted from array storage layouts, they do not report valid slot indexes when queried with features newly appropriate to the subsumed type, or which were not stored for any other reason. Although we can expand the accessibility that a mapping provides, we cannot give it slots that it did not originally have.

Array TFS unification begins by reducing the frequency of the problem in two ways. First, because unification is commutative, cases of disjoint feature coverage can be further reduced by unifying-in the well-formedness constraint prior to unifying the subsuming argument nodes. For example, if q.t ⊓ r.t = g and the root node of canonical constraint TFS G is node s̄, then unification can proceed according to q ⊓ s̄ ⊓ r, as opposed to q ⊓ r ⊓ s̄. Because

    |⋃_{f ∈ FEAT, Approp(f, q.t)↓} f| ≤ |⋃_{f ∈ FEAT, Approp(f, g)↓} f|  and  |⋃_{f ∈ FEAT, Approp(f, r.t)↓} f| ≤ |⋃_{f ∈ FEAT, Approp(f, g)↓} f|,    (4.11)

s̄ is most likely to express the greatest coverage of features expressed at the result node. Second, the decision of which slot to select as the representative is made only after examining which of them provides the slot mapping with the most coverage. By tallying the number of features found in q but not r—and vice-versa—the slot mapping which provides the greater coverage is always selected.61 Note that we evaluate each slot based on how well it would work in the role of governing its substructure, even though here we are selecting the representative slot. After being selected, the type stored in the representative slot is updated to reflect the running type unification result. This is the current type for the equivalence class represented by the joined scratch slots, and this type always carries precedence over the type values in forwarded slots or the type values in their original argument TFSes.
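The coverage tally might be sketched as follows; ISlotMapping and the feature enumeration stand in for the storage query and the grammar’s appropriateness table:

    using System.Collections.Generic;

    interface ISlotMapping
    {
        bool HasSlot(int feature);   // does this co-opted layout store a slot for f?
    }

    static class Selection
    {
        // Choose the representative by counting, over the features appropriate
        // to the (unified) result type, which mapping actually covers more.
        public static bool PreferQ(IEnumerable<int> appropriateFeatures,
                                   ISlotMapping q, ISlotMapping r)
        {
            int onlyQ = 0, onlyR = 0;
            foreach (int f in appropriateFeatures)
            {
                bool inQ = q.HasSlot(f), inR = r.HasSlot(f);
                if (inQ && !inR) onlyQ++;
                else if (inR && !inQ) onlyR++;
            }
            return onlyQ >= onlyR;
        }
    }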

If, for every node q, array storage physically stored a value for every feature that is appropriate for q.t, then the two techniques detailed above would be sufficient to guarantee that at least one of the two nodes being unified would have an equal- or superset of the other’s features. A slot mapping which covers the maximal set of appropriate features at each node would always be found. This is not necessarily the case in the array storage design described in this thesis, where unconstrained feature-value pairs ⟨f, ⊤⟩ are stored only when they are coreferenced (but see footnote 37 on page 46). This allows the disjoint coverage problem to persist in rare cases, as it is possible for arbitrary scratch slots to be missing by virtue of one of the input nodes leaving a feature neither constrained nor coreferenced.

61 This implies that the node forwarding decision is deferred—and no forwarding enacted—until outbound from depth-first traversal. Tomabechi (citing Marie Boyle at the University of Tuebingen) notes that this deferral, which was in effect in his 1992 algorithm, is incompatible with the successful unification of cyclic structures, so he switches to inbound forwarding in his 1993 method. The consideration does not affect the current work, which assumes a formalism that prohibits cyclic structures.

4.4.8 Fallback fetch
This section continues the discussion of the solution to the problem of insufficient coverage amongst the available feature maps at a given node during unification. Because the motivating problem has been obviated by recent modifications, the method described in this section is no longer used by agree, but the section is retained since it accurately describes the implementation which is evaluated in Chapter 5. The more recent solution is summarized in footnote 37 on page 46, and proceeding directly to Section 4.4.9 will not significantly disrupt the narrative.

Disjoint coverage can occur either because the TFS from which the mapping was co-opted did not store a constraint for the feature, or because the type in the mapping’s scratch slot has been changed to a subtype of the value that was originally present in the source TFS. The mitigation techniques described above reduce instances of the problem but do not eliminate it entirely. To address the remaining cases, a mechanism dubbed fallback fetch is used.

The method follows from the observation that every feature that ends up needing representation in the governing node’s set of features—because it is constrained—must ultimately originate from one of the input structures. What is needed is a way to designate one of the parent slots for that node as an alternate to the selected governing slot. Accordingly, in the array storage unifier, the slot forwarding mechanism is enhanced so as to permit an alternate forwarding condition to be signaled, if necessary, and this feature is activated whenever the best available slot mapping is missing a slot index for one or more features that are appropriate to its type. After being set for a slot, the fallback fetch condition asserts that, for any features that are found to lack coverage in the primary mapping, all future feature queries will consult one or more backup slot mappings.

In short, the array storage unifier provides the ability for a representative slot to signal that zero or more additional mappings be used as backup slot mappings. The mechanism is only activated by the unifier in the rare case when the representative slot, which has been chosen so as to maximize its feature coverage, is found to lack coverage for a feature for which one of the rejected mappings supplies a constraint.

Note that it is not the slot of the disjoint constraint that is configured for fallback. The reason goes back to accessibility: fallback allows extra coverage to be discovered—which is preliminary to retrieving additional content. Ultimately, fallback fills in gaps in feature coverage, but it must always be configured between the feature’s governing slot and its alternates.

The engineering implementation of fallback fetch is described next. Note that the set of slots that need to be related by fallback fetch always meet the criterion that they are all already forwarded to the primary slot. For this reason, the implementation overloads the forwarding mechanism to signal the condition. To activate fallback, the forwarding chain is configured as a loop—rather than a singly-linked list—with the governing slot specially marked as the loop master. Note that only those mappings that need to supply fallback constraints for one or more of their subjugate feature-value pairs need to join the fallback loop. Additional singly-linked chains of slot mappings whose features were fully covered can still lead to the governing slot with normal one-way forwarding chains.


By default, fallback is disabled, and a forwarding chain is followed until reaching a terminal scratch slot, respecting the dereference-before-forward procedure as normal. When activated for a governing slot, fallback fetch works as follows. When any query (for an appropriate feature) returns ⟨⊤, 0⟩, the loop is followed to an alternate scratch slot. This scratch slot represents an alternate mapping: it contains the mark of some alternate node—and, of course, which of the participating TFSes that node is in. This allows the unifier to re-query a (possibly) different TFS with an alternate mark value, and the same feature f. If successful, this operation returns an array storage index (or negative mark value) which selects a scratch slot which controls the substructure below f (or is forwarded to the same). In the extremely unlikely event that the first alternate mapping does not cover the feature, the process continues with additional alternates. If no alternates respond—meaning that the loop was followed back to the master governing slot—then the node is known to be truly unconstrained.
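Using the ScratchSlot layout sketched in Section 4.4.4, the query logic can be approximated as below; Query stands in for the hash lookup against a slot’s source TFS and cached out-mark:

    static class FallbackFetch
    {
        // Query feature f at the loop master; on a miss, walk the fallback loop,
        // re-querying each alternate mapping, until a constraint is found or the
        // loop closes back on the master (node truly unconstrained: return 0).
        public static int QueryWithFallback(ScratchSlot[] slots, int master, int f)
        {
            int hit = Query(slots, master, f);
            for (int alt = slots[master].IxFwd;          // non-zero only when the loop is active
                 hit == 0 && alt != 0 && alt != master;
                 alt = slots[alt].IxFwd)
            {
                hit = Query(slots, alt, f);              // alternate TFS, alternate mark
            }
            return hit;
        }

        // Placeholder: hash lookup of <f, m_T> in the slot's source TFS, yielding
        // a storage index (or negative mark); 0 if the feature is unconstrained.
        static int Query(ScratchSlot[] slots, int ix, int f) => 0;
    }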

4.4.9 Summary of the first pass
The material contributions of the array storage unifier are confined to its first pass; the writing pass proceeds in a conventional way. Accordingly, before proceeding to a description of the writing pass, I briefly review Sections 4.4.2 to 4.4.8, highlighting just the areas where the operation of the array storage unifier differs from previously published work.

As in earlier methods, the new method associates a scratch slot with each node. The slot contains a forwarding pointer which can indicate any other scratch slot (or none) belonging to any other TFS that is participating in the unification operation. Each slot also contains a current type, and the information necessary to access its array TFS source. These last two items are not formally required, and are cached for performance improvement only. Each scratch slot contains, in total, six scalar values (one is used in the realization pass, and one is related to node counting, described in the next section), and only the forwarding pointer is logically list-valued (though physically scalar). Naturally, the lists implemented by forwarding—being singly-linked lists (or singly-linked loops confined to the active scratch slots, in the case of fallback fetch)—are allocation-free. To enable concurrent access, scratch slots are detached from the TFS by conscripting array storage indexes as indexes into a contiguous array of slots.

Unification proceeds according to UNION-FIND. At each node, the unifier ensures that it chooses, as the representative, the slot mapping which covers the most features. Only actual coverage is evaluated for this decision. The unifier also enacts the well-formedness unification, if required, on one of the input nodes prior to unifying the actual arguments. This increases the likelihood of a complete coverage map being found—in the well-formedness TFS. Both of these techniques are optional, but they reduce the frequency with which fallback fetch must be established, and fallback fetch is expensive.62

After mitigation, if the representative slot provides full coverage, then normal, one-way forwarding is established from the non-chosen slot to the representative. Despite mitigation, however, incomplete coverage can still persist, owing to the array storage design where non-coreferenced ⊤ nodes are never stored. In this case, a special forwarding relationship is established between the “master” governing slot and one or more alternates, to signal that an alternate mapping must be consulted whenever the master slot fails to cover any feature appropriate to its type. This incurs extra lookups, but by having selected the slot which provides the most coverage, these cases are rare. The special forwarding relationship, the presence of which signals the need for the fallback lookup, is that the master’s forwarding pointer is non-zero and, in fact, is part of a forwarding loop.

4.4.10 Node counting
Prior to writing any new structure, the array storage unifier must allocate a fixed-size array to accommodate the tabular storage. This entails that the number of nodes in the new structure be known after the initial unification pass. During the first pass, a mechanism not described in detail in this thesis monitors how many nodes the result structure will contain. Unfortunately, it is not trivial to determine the number of nodes in a result structure when the unifier has skipped over sections of substructure due to the application of feature restriction or rule-daughter truncation.63 Furthermore, coreferencing can join structures containing substructures that were previously joined, and those portions must not be counted twice when determining how many nodes to subtract from the running total. Another complication is that nearly all parsing unifications involve three entry points (the tops of the candidate and mother TFSes, respectively, plus the daughter ARG position within the mother). Obviously two of these are within the same TFS, and this should not lead to double-counting.

But most problematically, the seemingly harmless array storage design—wherein out-tuple ⟨⊤, 0⟩ is never stored—creates (at least) two additional complications: first, merely applying a type to an unconstrained node—even if the new type has no appropriate features—requires the node count to be incremented. Failing to account for this case results in undercounting, which is fatal because the realization pass will run out of array storage entries.

〈⊤ 〉 62 Avoiding the activation of fallback fetch is desirable because it is a blunt instrument: if needed due to the lack of cov- erage for just some single feature, it must be activated on the governing (i.e. mother) slot and thus affects all of node’s features; even those features which none of the argument TFSes care to constrain will be exhaustively queried. 63 Note that the unifier’s introduction of additional TFSes for the purpose of enforcing well-formedness is not necessarily a category of problem here, since total visited nodes are easily counted; see Section 4.4.10. In general, the problem occurs because an efficient unifier will not bother—in its first pass—to visit below any structure that only one of the input struc- tures constrains. Static analysis of the input structures does not help: the amount of restriction in a result TFS is not relat- ed to the (invariant) amount of the restriction in the input TFSes. To see this, consider rule-daughter pruning: any coref- erence in a rule (mother) that spans both above and below the daughter entry (i.e. ARGS) allows the daughter to join— and/or publish to the result TFS—arbitrary sections of the mother’s substructure, and these sections might contain nodes that are subject to restriction, while not being subject to pass 1 visitation. 90 type to an unconstrained node—even if the new type has no appropriate features—requires the node count to be incremented. Failing to account for this case results in undercounting, which is fatal because the realization pass will run out of array storage entries.

Second, when structures are joined such that existing coreferences in their substructure become vacuous, and when these newly non-coreferenced nodes also happen to be unconstrained, the total result node count must be decremented—because now a result node will not be stored. For example, consider the unification of feature structures C and D in Figure 15. In the result TFS E, the coreference between paths G.F and H.F established by TFS C has become vacuous because the two paths are no longer distinct. The issue is important in agree, where every node carries a bit that indicates whether or not it is coreferenced. Any error in the correctness of this bit is considered unrecoverable corruption.

Figure 15. Coreferences can become vacuous during the course of unification. When TFSes C and D are joined giving E, the reentrancy that was originally authored in C becomes moot since paths G.F and H.F refer to the same node.

For the unifier’s application of node counting, this very rare case—which results in over-counting—can safely be ignored. The benign but unaesthetic result is that storage relation 𝔸 ends up with one or two extra unused rows—out of several hundred—in some TFSes. For these and other reasons (primarily related to feature restriction), node counting is a challenging aspect of the array storage unifier. It is also the case that node counting worsens the theoretical complexity of the unifier, because it requires pursuing unary descent—to count nodes below singleton nodes in an input argument—in some cases. Absent node counting, the unifier could simply ensure that unary substructures are accessible in the governing slot mapping, and skip over the node, without descent.

Whether by explicit counting or some other method, the array storage unifier requires—after the completion of the first pass—some estimate of the (maximum possible) size of the result structure. In the target runtime environment, arrays, once allocated, cannot be resized, and realization cannot begin until the destination array has been allocated. Therefore, without node counting during the first pass, the implementation would require three passes; an additional traversal of the entire result structure (as manifested in the scratch slots) would be needed in between the first and second passes. As noted throughout Section 4.4 and in footnote 37, recent work in array TFS storage has modified the design so that the storage of unconstrained, non-coreferenced nodes is no longer prohibited. Eliminating the special treatment of this case has enabled several simplifications, but as the discussion of this section has shown, accurately predicting the number of nodes in a unifier result structure is one area where the potential for simplification is considerable. At this time, examination of this problem is an area of very active investigation.

4.4.11 Writing pass
If the joint traversal completes successfully, unification success is guaranteed. Having reached this point, the result structure is encoded across the scratch slots, in a configuration of interwoven slot mappings borrowed from the input TFSes. The second pass of the array storage unifier writes the final TFS. The primary task is to copy data from the scratch fields into the rows of a new array storage relation 𝔸, and allocate a system object A to bind together, by containment, the sundry chunks of data that comprise the feature structure. Most significantly, before the writing pass can begin, a persistent 4-tuple array of the required size, as determined by node counting, is allocated. Also initialized is the root out-tuple, which, in accordance with (2.33) and Section 3.2.1, is maintained external to the 4-tuple relation. This standalone out-tuple is created with the proper type and is given out-mark value 1.64 Two sequence counters, each incremented away from zero, are initialized for assigning marks. The sequence for coreferenced nodes starts at -1 and the sequence for non-coreferenced nodes starts at 2.

The writing routine is a simple traversal of the graph embedded in the scratch slot tapestry. If configured, the fallback fetch condition is respected when navigating slot mappings. Entering at the root, the algorithm traverses the structure in full, following forwarding chains to their terminus when they are encountered, arriving at a representative slot for each node. The type stored in this scratch slot is the final type to store as an out-tuple.

Note that the terminal slot in a forwarding chain determines only the out-tuple that is recorded. Although the representative slot which we arrived at—like all scratch slots—appears in one or more feature-arity maps of its corresponding TFS, these are irrelevant to what we are now recording.65 The correct feature identifier to emit as the in-tuple for the current node is the feature value, appropriate to the node type, that we originally queried from the governing slot, prior to following any forwarding.

The representative slot stores information about how to proceed into its substructure, if any. This is the slot mapping, which consists of a reference to one of the argument TFSes (implicitly, via its global slot index), and an out-mark to be used with its hash addressing scheme. This mark is paired, in turn, with each feature appropriate to the result node type to obtain a set of storage indexes relative to the indicated TFS. These TFS-local indexes are trivially mapped into the range of scratch slots assigned to the TFS, thus designating a set of unifier scratch slots. If forwarded, these are followed as the process is repeated.

64 By using a positive value, the system is asserting that root nodes are never coreferenced, which must be the case since cycles are not permitted in the adopted formalism. As with the out-tuples resident in 𝔸, the root out-tuple is described by a C# value type, so it would not be precisely correct to say that the root out-tuple is itself “allocated.” Its physical storage exists in situ (along with a handle (reference) to the C# array which implements 𝔸) within A, the object that represents the array TFS storage instance as a whole.
65 Recall that an in-tuple records the ‘arc,’ or the identity of the last feature in the path to the node.

During the first pass, in addition to determining the total node count, the node counting mechanism records, for each scratch slot, how many feature-value tuples refer to the slot.66 If there is only one, the result node is not coreferenced. In this case, the next positive mark value is issued, unless the node type has no appropriate features, in which case mark zero is used. If there is more than one governing slot which references the node, the result node is coreferenced, and the next mark in the negative sequence is issued. For coreferenced nodes, the mark value is also placed in the scratch slot’s m_src field, so that later references to the node will use the same mark value, which establishes coreferencing in the array storage relation. When the value for each of the four fields in a 4-tuple has been gathered, the tuple is written to the current position in 𝔸, and the writing position is advanced.
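Mark issuance during the writing pass reduces to two counters, sketched here over the same assumed ScratchSlot layout; which slot field remembers the issued mark is an implementation detail, shown here as MarkCopy:

    class MarkIssuer
    {
        int nextCoref = -1;   // descending sequence for coreferenced result nodes
        int nextPlain = 2;    // ascending sequence; 1 is reserved for the root out-tuple

        public int Issue(ref ScratchSlot s, bool typeHasFeatures)
        {
            if (s.Refs > 1)                 // referenced by more than one governing slot
            {
                if (s.MarkCopy == 0)        // first visit: issue and remember the mark
                    s.MarkCopy = nextCoref--;
                return s.MarkCopy;          // later visits reuse it, establishing coreference
            }
            return typeHasFeatures ? nextPlain++ : 0;   // featureless nodes take mark zero
        }
    }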

4.5 Example

This section presents step-by-step diagrams which depict the operation of the unifier at each stage of a unification operation. The section begins with some explanatory remarks to assist with the interpretation of the diagrams.

Recently, the method by which the unifier assigns scratch slots to the coreferenced nodes of the argument TFSes has been altered to capitalize on the forwarding mechanism used by the unification procedure itself. In short, the distinct range of scratch slots dedicated to the coreferences of each argument structure (as described in Section 4.4.2) is replaced with a simple array of pointers to unifier slots. The slot indicated for a given coreference becomes its 'designated' slot, a condition trivially claimed by the first node the unifier encounters from each equivalence class. Henceforth, each node from the corresponding set of joined argument nodes is forwarded—using the unifier's own forwarding mechanism—to the designated slot (or according to its forwarding chain). This enhancement will be described in more detail in future work, but is mentioned briefly here because the diagrams shown in this section reflect the modified design. One motivation for this is that illustrative diagrams are more concise and compact when the number of scratch slots is fewer.
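A sketch of the designated-slot scheme, with hypothetical names, might look like this:

    // Hypothetical sketch of coreference 'designation.' ppCorefs[c] holds
    // the scratch slot designated for coreference class c, or 0 when no
    // node from that class has been encountered yet.
    using System;

    sealed class CorefSlots
    {
        readonly int[] ppCorefs;
        readonly Action<int, int> forward;   // the unifier's own forwarding

        public CorefSlots(int corefCount, Action<int, int> forwardSlot)
        { ppCorefs = new int[corefCount]; forward = forwardSlot; }

        // The first node encountered from the class claims the designation;
        // every subsequent node is simply forwarded to the designee.
        public int ClaimOrForward(int corefClass, int slot)
        {
            int designated = ppCorefs[corefClass];
            if (designated == 0)
                return ppCorefs[corefClass] = slot;
            forward(slot, designated);
            return designated;
        }
    }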

In agree, a single "top-level" TFS unification operation67 can incorporate any number of input arguments, called unification participants. At least one of the participants must be designated as denoting the outermost root of the putative result TFS, and each of the remaining participants is unified into a particular substructure position (or the root) relative to the root of that argument. Additional participants can be introduced into the operation at any time, and this allows well-formedness TFSes to be incorporated, as needed, without the expense of starting an entirely new operation. For mother-daughter unifications, this also allows an entire set of rule daughters to be unified as a single operation, an optimization that showed significant benefit when evaluated with n-way unification (see Figure 13 on page 72).

66 As was the case for n-way unification (and noted in footnote 52 on page 61), the criterion here is not the number of referencing paths, but rather just the number of immediate parent 'arcs.' In terms of unifier scratch slots, this notion of immediate parents is subtle. We wish to mark as coreferenced any slot which is reached as a forwarding target more than once, but this determination is unrelated to whether the target slot is itself reachable, that is—via some slot mapping within its own TFS. In fact, it is very common for slots representing arcs that no longer contribute to the result structure to nevertheless remain as active governing slots which maintain a type and mark value for a result node.
67 By "unification operation," I refer to (re-)setting the number of input participants to zero, and (re-)initializing the unifier's scratch slots. In the most recent implementation and as described in Wroblewski (1987), the latter amounts to incrementing a single integer value which serves as the unifier's generation counter.

The example that will be used to illustrate the operation of the array storage unifier is the mother-daughter unification shown in Figure 16. The operation involves two TFS participants—M, the mother, or outer argument, and D, the daughter, or inner argument. As noted above, additional participants can be introduced, but if they do not coincide with the outermost structure, they all must be positioned relative to (i.e., within) the same outer participant, which must be specially designated as such.

Overall, there are two basic steps. First, D is unified into a particular sub-structural position within M. This step is the only part of the overall unification that is subject to failure, so it is evaluated directly first, making no reference to any parts of the mother that fall outside the reach of the daughter. No traversal is required to locate the daughter entry points of the outer structure, because each mother-daughter (i.e. "rule") TFS caches all of its own daughter marks (and slot indexes, for the unifier's benefit). If the first step is successful, the second step builds a result structure by incorporating the unification result—as manifested across the unifier's scratch slots—into a copy of the untouched portions of the outer structure.

Figure 16. The TFS unification used as an example in this section.

Each example in the sequence correlates tabular values from the unifier scratch slots with the same information presented in the graphical form introduced in Section 2.3.4, a new representation suited to the modeling of the invariant arity mappings that are exposed by array storage TFSes. The use of these non-directed graphs emphasizes an important feature of the array storage unifier, namely, that it expresses the result structure amongst its scratch slots without having to alter the given slot mapping substrate.

Figure 17 shows the state of the unifier's scratch slots after the two participant TFSes have been introduced but prior to the start of unification proper. The main array of scratch slots is labeled ps_base. Here, the entire set of storage rows for each participant TFS, plus one slot for each root node, have been mapped into a set of contiguous scratch slots. This mapping is instantaneously accomplished by simply asserting 1-to-1 correspondence between the (storage) ordering of every tuple in each participating 픸 and some range of unifier slots. In other words, each TFS is given a unique, abutting range of slots.

ps_base, step 0

slot  TFS  m    f                gen     ixfwd  t    m
0          (global slot 0)       unused
1     M    (root slot M)         old
2     M    1    CAT              old
3     M    1    NUMGEND          old
4     M    1    ARGS             old
5     M    6    FIRST                    0           3
6     M    6    REST             old
7     M    3    CAT              old
8     M    3    NUMGEND          old
9     M    5    FIRST            old
10    M    5    REST             old
11    M    4    CAT              old
12    M    4    NUMGEND          old
13    M    -1   NUM              old
14    M    -1   GEND             old
15    D    (root slot D)                 0           1
16    D    1    CAT              old
17    D    1    STEM             old
18    D    1    NUMGEND          old
19    D    2    NUM              old
20    D    2    GEND             old

pp_corefs
TFS   coref   slot
M     -1      (null)

Figure 17. The state of the unifier slots after preparing for the mother-daughter unification shown in Figure 16 and prior to initiating the unification traversal. Two scratch slots have been initialized: the slot corresponding to the root node of the daughter, and the slot for the daughter's position within the mother. These are the slots which will be joined first, in a call to the recursive procedure that unifies two TFSes, node by node. The remaining scratch slots are untouched, as indicated by an 'old' generation value. See the text for further discussion.

The first four columns in ps_base have no manifestation in the scratch slots, and their contents—as mediated by the participant mapping—are provided only to facilitate this presentation. One of the remaining four columns, 'gen,' implements a generation counter (Wroblewski 1987) that allows an entire set of unifier slots to be efficiently re-initialized (for reuse in a subsequent operation). This mechanism substitutes for the stack-based slot allocation method which was also described in Section 4.4.3 (as noted, this recent aspect of the agree implementation is not incorporated in the system evaluated in Chapter 5). For any slot that belongs to an 'old' generation, the table shows no further information, because the slot has not been touched by the current unification operation, so the values in those fields are unpredictable.

The three remaining columns are described by (4.9) in Section 4.4.3. To review, ixfwd contains the slot's forwarding value, an integer which indicates the index of any other unifier slot, allowing slots to be globally forwarded across participant boundaries. A value of zero indicates that the slot is not forwarded.68

Figure 18. Continuing from the state shown in Figure 17, the unifier takes five steps to join the mother and daughter structures, completing the first pass. Unifier forwarding, a directed relationship, is indicated by red arrows. Green coloring indicates nodes that have been visited. When slot 8 is touched in step 2, its index is associated with the coreferencing slot pointer "-1" in pp_corefs. Compare Figure 17 to Figure 19. After step 5, the final step of the first pass, all of the daughter nodes have been visited, but outer structure from the mother remains untouched. This state is examined in more detail in Figure 19.

For performance optimization, the t and m_T fields of the slot are cached by copying the out-tuple from the source TFS into the unifier slot. This initialization happens on demand, the first time the slot is accessed by the unifier during the operation (as determined by the slot's generation value, which is updated accordingly). The type value stored in the slot represents the current type for the slot, which the unifier can alter if desired. However, the utility of doing so is limited, because the unifier cannot alter the set of slots that is selected by querying with the value m_T. Queries using m_T must be submitted to the TFS that owns the slot—since m_T has no meaning otherwise—and that TFS will never provide a valid storage index—and by extension, extant unifier slot—for a feature that is not appropriate to the original type value. Because of this, the current implementation only changes a slot's type in one particular case: when type unification between two slots results in a third type which still has no appropriate features.69

68 The convention of using 0 to terminate forwarding chains is accommodated by the fact that unifier slot zero receives special treatment; it is always excluded from the range of slots assigned to participant TFSes. This can be seen in the illustration, which includes scratch slot zero.
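The generation-counter discipline and the on-demand caching of the source out-tuple can be sketched as follows; this is a simplification under hypothetical names, not agree's actual slot layout:

    // Hypothetical sketch of generation-counted scratch slots. A slot whose
    // Gen field differs from the current generation is 'old': its remaining
    // fields are stale and must not be read. Re-initializing the entire
    // pool for the next operation is then a single increment.
    struct ScratchSlot { public int Gen, Forward, Type, Mark; }

    sealed class SlotPool
    {
        readonly ScratchSlot[] slots;
        int currentGen;

        public SlotPool(int capacity) { slots = new ScratchSlot[capacity]; }

        public void BeginOperation() => currentGen++;   // O(1) reset

        // The first access in an operation caches the out-tuple (type, mark)
        // copied from the source TFS that owns the slot.
        public ref ScratchSlot Touch(int ix, int srcType, int srcMark)
        {
            if (slots[ix].Gen != currentGen)
            {
                slots[ix].Type = srcType;
                slots[ix].Mark = srcMark;
                slots[ix].Forward = 0;
                slots[ix].Gen = currentGen;
            }
            return ref slots[ix];
        }
    }

Note that BeginOperation must be called before the first operation so that the default Gen value of zero counts as 'old.'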

ps_base, step 5

slot  TFS  m    f                gen     ixfwd  t         m
0          (global slot 0)       unused
1     M    (root slot M)         old
2     M    1    CAT              old
3     M    1    NUMGEND          old
4     M    1    ARGS             old
5     M    6    FIRST                    15
6     M    6    REST             old
7     M    3    CAT                      16
8     M    3    NUMGEND                  18
9     M    5    FIRST            old
10    M    5    REST             old
11    M    4    CAT              old
12    M    4    NUMGEND          old
13    M    -1   NUM                      19
14    M    -1   GEND                     20
15    D    (root slot D)                 0      pl-word   1
16    D    1    CAT                      0      det       0
17    D    1    STEM                     0      "these"   0
18    D    1    NUMGEND                  0      num-gend  2
19    D    2    NUM                      0      pl        0
20    D    2    GEND                     0      gender    0

pp_corefs
TFS   coref   slot
M     -1      8

Figure 19. The unifier slots from the running example are depicted after the first pass.

Below the set of slots, the small set of coreference-claiming pointers (described in the introduction to this section) is illustrated. Like ps_base, this is a contiguous list of pointers (to scratch slots) that is prepared according to the number of coreferences each participant contains, a figure which is known in advance because it is permanently recorded in each array TFS instance. Further akin to the ps_base depiction, the first two columns in pp_corefs are for reference purposes only, as this data structure is just an array of pointers whose interpretations can be sufficiently deduced by their list positions (combined with contextual information from unifier control structures). For this example, M has one coreference and D has none, so only a single uninitialized pointer is present.

For the example case, the unifier joins the two structures in five steps, which are shown in Figure 18. This is essentially the UNION-FIND method described by Aho et al. (1976).

69 "Still" because the condition entails that the two parent types have no appropriate features. Note also that, in the case described, if the result type does have appropriate features, then a well-formedness TFS must be introduced.

The next illustration, Figure 19, shows the state of the slots after the daughter traversal has completed. At this point the success of the overall operation is guaranteed. Several slots are no longer ‘old,’ but portions of the mother TFS remain untouched, and will be visited for the first time—and copied—while writing the result structure.

As noted at the end of Section 4.4.2, because the array storage model does not physically conflate logically coreferenced nodes, the unifier must take steps to re-establish their referential equivalence for each TFS participant. This is achieved by constructing a representation of the intra-participant coreferencing according to the unifier's inter-participant forwarding mechanism. Mirroring coreferences from each structure into the unifier is inexpensive and is incrementally completed on demand as nodes are first visited. For example, TFS M has three arcs leading to node m_T = -1 (they correspond to unifier slots 3, 8, and 12). One of these was visited during the daughter pass. This is slot 8, as we can see by the fact that the unifier has claimed, on behalf of slot 8, the slot pointer (in pp_corefs) dedicated to coreference m_T = -1. This ensures that when the writing pass encounters the remaining two slots which have m_T = -1 (while completing outer structure M), it will assign the same mark value to all three.

ps_base, step 6

slot  TFS  m    f                gen     ixfwd  t         m
0          (global slot 0)       unused
1     M    (root slot M)                 0      phrase    1
2     M    1    CAT                      0      np        0
3     M    1    NUMGEND                  8
4     M    1    ARGS                     0      cons      6
5     M    6    FIRST                    15
6     M    6    REST                     0      cons      5
7     M    3    CAT                      16
8     M    3    NUMGEND                  18
9     M    5    FIRST                    0      syn       4
10    M    5    REST                     0      null      0
11    M    4    CAT                      0      n         0
12    M    4    NUMGEND                  8
13    M    -1   NUM                      19
14    M    -1   GEND                     20
15    D    (root slot D)                 0      pl-word   1
16    D    1    CAT                      0      det       0
17    D    1    STEM                     0      "these"   0
18    D    1    NUMGEND                  0      num-gend  2
19    D    2    NUM                      0      pl        0
20    D    2    GEND                     0      gender    0

pp_corefs
TFS   coref   slot
M     -1      8

Figure 20. The unifier slots from the example case at the end of the operation, after the writing pass. Notice that the referential equality between coreferenced nodes in M, a property which is absent from the array TFS storage representation, has been restored via the unifier slot forwarding mechanism, as is necessary to ensure the correct conflation of coreferencing.

To illustrate this particular case, consider one of the remaining joined slots, for example, slot 3. The writing pass will encounter this slot for the first time as it copies the untouched mother structure. Because the slot is 'old,' slot initialization will be triggered, meaning that the out-tuple from the source TFS will be retrieved. Next, noticing that the value of m_T is negative, the unifier checks pp_corefs and observes that slot 8 has already been designated for the class. If there were no designation yet, the designation would be claimed.

      in-tuple (F)          out-tuple (T)
m     f                     t          m
      ROOT→                 phrase     1
1     CAT                   np         0
1     NUMGEND               num-gend   -1
1     ARGS                  cons       6
6     FIRST                 pl-word    3
6     REST                  cons       5
5     FIRST                 syn        4
5     REST                  null       0
4     CAT                   n          0
4     NUMGEND               num-gend   -1
3     CAT                   det        0
3     STEM                  "these"    0
3     NUMGEND               num-gend   -1
-1    NUM                   pl         0
-1    GEND                  gender     0

Figure 21. The result structure produced by the unification example, after being written as a new array storage TFS.

At this point, the forwarding value for slot 3 could be set to 8, and this is in fact what happens if slot 8 is not itself forwarded.70 Recall that a forwarding slot value of 0 indicates the end of a forwarding chain. However, in this case, slot 8 is forwarded—across participant boundaries—to slot 18. This is because the unifier chose slot 18 to be the representative slot as it was joining coincident nodes between M and D. Since the unifier must walk this chain to its end anyway, a "free" optimization is possible, namely short-circuiting the chain for slot 3, the newly-visited slot, so that it points directly to slot 18. Thus, in practice the unifier will set 18 as the forwarding value for slot 3, but for clarity the example figures do not assume the optimization.

70 In this example, such forwarding is not strictly necessary since each slot is visited only once—and for the last time—during the writing pass, entailing that the slot's forwarding value will never be accessed, and the correct coreferencing in the outer part of the result could, in principle, be determined from information in M. It is during the first pass that re-establishing referential equivalence amongst coreferenced nodes is critical, and generally speaking, the joining pass has no way of predicting the operant condition, namely, how many times a coreferenced node in M will appear within its daughter region.
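Under a hypothetical slot representation like the one sketched earlier (a Forward value of 0 terminates a chain), the chain walk with its "free" short-circuit is only a few lines:

    // Hypothetical sketch: follow a forwarding chain to its terminus.
    sealed class Forwarding
    {
        struct Slot { public int Forward; }
        readonly Slot[] slots;
        public Forwarding(int n) { slots = new Slot[n]; }

        // Since the walk must be paid for anyway, the entry slot can be
        // repointed directly at the terminus, shortening the chain for
        // any later visit.
        public int Dereference(int slot)
        {
            int t = slot;
            while (slots[t].Forward != 0)      // 0 terminates the chain
                t = slots[t].Forward;
            if (t != slot)
                slots[slot].Forward = t;       // 'free' short-circuit
            return t;
        }
    }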

Prior to beginning the writing pass, the unifier initializes the root slot for the outer TFS, M. If one or more of the other participants were unified into the root node of the designated outer TFS, then this step is unnecessary, since the slot—which signifies the root of the result TFS—would have been initialized during the first pass. Once this slot is initialized, a traversal of the structure embedded in the scratch slots is initiated. This is the result TFS, which is emitted as a set of contiguous 4-tuples as the traversal progresses. The result is a new storage relation, 픸.

Figure 20 shows the state of the unifier slots after the writing pass. There are no remaining slots belonging to an old generation, which indicates that every slot has now been visited. Type and out-mark values for forwarded slots may have been stored, but for clarity they are not displayed here. In practice, type values for slots which are immediately forwarded upon first visit need not be stored. After the writing pass, a new array storage TFS has been produced. It is shown in Figure 21.

4.6 Summary

This concludes the discussion of Chapter 4, which concerned TFS unification and the array storage unifier. The chapter began with a brief description of TFS unification, and with a discussion of how adopting a well-formedness requirement for typed feature structures implies that any component which produces TFSes—most significantly, a unifier—assumes a responsibility for maintaining the condition.

Section 4.2 presented a literature review of prior work in linguistic unification algorithms. n-way unification, a new method investigated as part of this research, was also examined. Although auxiliary to the main thrust of this thesis, the novel method nevertheless serves as a vehicle for raising relevant points of discussion, both theoretical and empirical.

The core of the chapter was the presentation of the array storage unification algorithm. Taken together, this method and the array TFS storage method (presented in Section 2.3 and Chapter 3) comprise the chief contribution of this thesis. The chapter concluded with a diagrammatic walk-through of the operation of the array TFS storage unifier for a simple linguistic example. In the next chapter, I present an evaluation of the new methods, as implemented in agree, a new grammar engineering platform which is wholly based on array TFS storage and its companion unifier.


5 Evaluation

Array TFS storage and the array storage unifier are implemented in agree, a new grammar engineering environment which supports the established grammatical formalism and standard practices of the DELPH-IN consortium. agree joins the LKB (Copestake 2002b), PET (Callmeier 2000), and Ace (Packard, see footnote 48) as a fourth platform for development and deployment of computational grammars based on the DELPH-IN joint reference formalism (Copestake 2002a), itself a computational variant of Head-Driven Phrase Structure Grammar (HPSG, Pollard and Sag 1994). The typed feature structures comprising these grammars are authored using TDL (Krieger and Schäfer 1994), a declarative text-based notation for describing TFSes and the type hierarchy they extend.

Daughter ARG deletion:     Removes daughter substructure after unifying rule daughters
Key driven:                Orients active edges according to a grammar condition
Span-only rules:           Limits application of tagged rules to spanning structures only
Quick-check:               Probes failure-prone paths in the paired argument TFSes prior to unification
Chart dependency filter:   Predicates the promotion of each morphology result to the parse phase upon the satisfaction of one or more directed conditions evaluated between the candidate and some other satisfying analysis with a disjoint span
Ambiguity packing:         Controls chart size by allowing a feature structure to be represented by a subsuming surrogate during parsing
Rule-filter:               Skips unifications precluded by pre-computed mother-daughter grammar rule compatibilities

Table 5. Common DELPH-IN parser optimizations.

The LKB, PET, and Ace implement Tomabechi's quasi-destructive unification with the structure sharing adaptation described in Malouf et al. (2000), whereas agree implements the unifier described in this thesis, and performs structure sharing only in the simple cases described at the end of Section 2.2.2. All four systems offer bottom-up chart parsers and implement a bevy of parsing optimizations and research innovations published by DELPH-IN researchers and others over the past two decades. A selection of these is listed in Table 5; further details are provided in Kiefer et al. (1999) and Oepen et al. (2002).

Because these optimizations are so effective at reducing a parse to just its critical sequence of unifications, it is expected that unifier performance is the most constraining factor in the performance of these parsers. Even so, implemented computational systems involve thousands of free variables which are difficult to control, so end-to-end evaluation must not be taken as a wholly conclusive measure of the performance of the underlying unification algorithm. For a self-evident demonstration of this, we can see that the LKB and PET, which use the same underlying unification algorithm, present vastly different performance on an identical task. Table 6 shows the time required to parse and exhaustively unpack 287 sentences71 from the 'Hike' corpus, using the English Resource Grammar (Flickinger 2000). To calculate a scaling figure (which expresses the exponent according to which parse time increases with respect to sentence length), each sentence is paired with its parse time and the best least-squares exponential function

y = ae^(bx)    (5.1)

is fit. R² is a figure which reflects the accuracy of this type of fit. The derivative of the logarithm of (5.1),

(d/dx)(ln a + bx)    (5.2)

(equivalently the slope of a log-linear regression line), is a measure of the degree with which parse time compounds as sentence length (in number of words) increases. Thus linearized, this scaling 'exponent' can be normalized (in this case to agree=1.0) for comparison.

71 The full 'hike' corpus contains 330 sentences. For all three systems, 43 sentences which either contain numeric digits (not currently supported by agree), or which the LKB could not exhaustively unpack within 2 hours (per sentence), were omitted, and these are the results shown in Figure 22. The results in Table 6 exclude—for the LKB only—11 additional sentences which took more than 10 minutes for the LKB to complete. This accounts for the 94 minute difference versus Figure 22, but because the time scale is exponential the conclusions remain the same. Additional details on evaluation methodology are provided in Section 5.4.

         t_min (sec.)  t_max (sec.)  t_total (m:ss)  y = ae^(bx): a    b      R²     scaling (agree=1.00)
LKB      .080          1070.0        67:14                        .0133  .4179  .665   2.08
PET      .007          8.4           1:47                         .0091  .2378  .649   1.18
agree    .008          7.7           3:04                         .0395  .2005  .687   1.00

Table 6. Performance scaling versus sentence length. Bold-faced values show best result. Details on methodology are given in Section 5.4. agree concurrency and pipelining is disabled. This summarizes the data of Figure 22, except as noted in footnote 71.

The two systems which implement Tomabechi's unification method exhibit performance scaling that differs by a factor of 1.75. On the other hand, PET and agree, which employ quite different storage and unification technologies, perform similarly.
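To make the normalization explicit: each system's scaling figure is its fitted exponent b divided by agree's. For example, using the b values from Table 6,

    \mathrm{scaling}_{\mathrm{LKB}} \;=\; \frac{b_{\mathrm{LKB}}}{b_{\mathrm{agree}}} \;=\; \frac{0.4179}{0.2005} \;\approx\; 2.08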

Not only do these results confirm the hazards of using system-level parser testing as a proxy for the performance of the unification algorithm, they also disrupt the notion that the managed- vs. native-code distinction necessarily delineates classes of system performance. Although agree and the LKB are both managed-code systems, in this test their respective results diverge. And as noted above, agree and PET, managed and native implementations respectively, show similar performance.

Considering these factors, Table 6 constitutes a sufficient validation of the array storage method and its complementary unifier. Later in this chapter, Section 5.5 documents a few additional experiments of a more exploratory nature. Before this, however, the next sections provide a technical overview of each evaluated system and document the test methodologies which were observed.

5.2 agree

Development of agree progressed over several years in conjunction with the research objectives of this thesis. The system targets the ECMA-335 Common Language Infrastructure (CLI),72 which is implemented for Windows (".NET") and other platforms ("Mono"). In the interim, agree has evolved into a competent DELPH-IN-compliant grammar engineering environment with inherent support for concurrency. In addition to providing a core implementation of the DELPH-IN joint reference formalism, including the set of optimizations listed in Table 5, the system has evolving support for tactical realization of surface strings (sentences) from semantic inputs specified in the format of Minimal Recursion Semantics (MRS, Copestake et al. 2005).

Generation is a recent addition to agree. The core functionality of the agree generator is in place (including key optimizations such as index accessibility filtering), and further optimizations are under development.73 To support generation, support for trigger rules, surface read-back, skolemization, and MRS extraction and rewriting were added to agree. The agree parse chart exposes an abstracted edge proximity condition which maximizes the amount of common functionality between parsing and generation. In particular, a novel lock-free mechanism which synchronizes the manipulation of chart edges is shared, allowing both modes to deploy proactive and retroactive ambiguity packing (Oepen and Carroll 2000b)—in a new concurrent adaptation. This work will be described in a separate publication.

The core functions in agree are implemented in a library module such that they are only accessible through programmatic interfaces. Accordingly, several client applications which control and provide access to the system have been developed. The most basic is a console process which loads a grammar, and parses (or generates) one or more input items. Input adapter modules allow inputs to be supplied from different types of sources, such as from the console, from an [incr tsdb()] profile (Oepen and Carroll 2000a) or from a simple text file. This console program is the control harness that was used to evaluate agree in this thesis.

The core library and console application are compatible with, and have been tested on, both of the target environments, .NET and Mono. In addition to the console front-end, agree also sports a few different graphical user interfaces74 which variously support visualization and interactive manipulation of grammar entities, and specialized grammar diagnostics. Future plans include a program which enables a rich editing workflow for grammar engineers. Discussing these programs is beyond the scope of this thesis.

72 European Computer Manufacturers Association, see footnote 27 on page 4.
73 Certain parts of the generator have been developed with the assistance of Spencer Rarrick.
74 These make use of WPF, a vast, next-generation system of .NET system libraries that, sadly, will not be available on the open-source Mono platform. The dependency extends to the agree library that provides graphical TFS, MRS, parse tree, and parse chart rendering. Aware of this penalty, the route was chosen because the rapid GUI development that WPF enables was too compelling.

Facilitated by agree's ability to read configuration settings in either LKB or PET format, several grammars are regularly tested. These include the English Resource Grammar (Flickinger 2000), the JACY Japanese grammar (Siegel and Bender 2002), a grammar of Mandarin Chinese (Zhang 2011), and my own grammar of the Thai language.75 The latter two grammars are based on the Grammar Matrix (Bender et al. 2002), which includes a customization system (Bender et al. 2010) that facilitates the rapid prototyping of new grammars for typologically diverse languages.

The DELPH-IN consortium is committed to open-source development. The source code for agree is released under the MIT license and will be available for community comment and collaborative development. With the exception of the graphical client programs mentioned above (because they require WPF), agree can be built and tested with Mono, an open-source environment which supports all modern hardware platforms, including Windows, Mac, and Linux.

5.3 Existing systems

The LKB, evolved from earlier systems which date to the early 1990s, supports both parsing and generation, and—with its graphical user interface—facilitates interactive grammar engineering. Ace is a speedy native-execution parser developed by Woodley Packard which implements support for advanced features such as generation and selective unpacking; formally released in 2011, this system is not included in this evaluation. PET, originally developed as a platform for experimentation in parsing and unification technologies, is now the first choice for batch-oriented parsing in production and high-volume research environments; it contains state-of-the-art statistical parse selection and n-best selective unpacking. PET enjoys continued, active development; in its distributed release—or as a source-code branch—it is widely used for grammar development, parser research, and grammar instrumentation.

Because the PET system prepares a compiled image of a grammar as a first step, it can also facilitate experimental parser development. An example is the work mentioned in Section 4.2.10; by designing his thread-safe parser CaLi so that it loaded PET grammar images, van Lohuizen (2001, 96) did not have to implement the infrastructure for parsing TDL files, closing the type lattice, and other tasks associated with loading DELPH-IN grammars.

Existing DELPH-IN parsers are single-threaded.76 For several reasons, there has historically been little reason to develop concurrent systems within the consortium's research program. First, questions of grammar engineering are equivalently answered between single- and multi-threaded parsers. Second, multi-core systems—and the decline in the progression of CPU core frequency increases—have only taken hold in recent years. Third, parsing and generation are thought to be "embarrassingly parallel" tasks; if the overall task requires processing more than one sentence, multiple CPUs might be trivially harnessed by spooling sentences to several single-threaded processes.

75 The Thai grammar was initially developed as part of coursework supervised by Emily Bender. The grammar incorporates some linguistic analyses published by Nuttanart Facundes, who also offered language insights for the project during my lectureship at King Mongkut's University in Thonburi, Thailand.
76 The CaLi system developed by Marcel van Lohuizen appears to be no longer maintained, so it is not considered a current DELPH-IN system.

Subtended by the third consideration, however, are two assumptions that may not hold: that the processing is batch-oriented, and that provisioning multiple parser instances is functionally equivalent to finer-grained concurrency. Clearly the first assumption is not applicable to some applications. The performance objective in real-time applications, for example, is simply to minimize, by any means possible, the parse time for a single sentence. Text stream sources which establish a real-time requirement include voice recognition systems and web interactions. As for the second assumption, two counterexamples are summarized in Section 5.6.

5.4 Methodology

For consistency, the test configuration used swappable hard drives to alternate between Linux (LKB/PET) and Windows (agree) on the same machine; it is an 8-core (2 × X5460) with 32GB of RAM. The Mono build of agree was not performance tested.

In accordance with the research concerns of this thesis, a stress configuration is used for all testing in order to maximally expose the efficiency of each parser’s unifier and its underlying TFS storage model. Specifically, the parsers are always configured for exhaustive unpacking of every derivation in the parse forest, for every sentence. In the case of PET, the [incr tsdb()] (Oepen and Carroll 2000a) profiling system was configured such that PET’s ‘cheap’ parser operated with the following options:

packing=7 -default-les -cm english.grm    (5.3)

Because the -nsolutions option and packing bit 0x8 are not specified, PET will unpack the entire parse forest. Unfortunately, the LKB is not able to complete exhaustive unpacking for several of the sentences, owing to resource exhaustion. Sentences for which the LKB consumes all 32GB of available physical memory and also has not completed the individual sentence after two hours are abandoned and notated as having not finished. Finally, for all tests, and for all systems, timings do not include grammar start-up.

5.4.2 Evaluation grammar and corpus

The evaluation grammar is the English Resource Grammar (ERG, Flickinger 2000) revision 10342. This is a publicly available broad-coverage precision grammar, the most comprehensive of its kind. The grammar is distributed with the LinGO Redwoods Treebanks (Oepen et al. 2002), a set of curated corpora with reference parses. Tests were conducted with the 'hike' set, a collection of 330 sentences gleaned from Norwegian tourism brochures which is often used in DELPH-IN parser evaluation. The average sentence length is 11.67 words, and every sentence is fully covered in the lexicon. The task entails performing around 20 million top-level unifications.


As noted in Section 5.4, the LKB is not able to complete exhaustive unpacking of a number of sentences in the 'hike' corpus. In an attempt to address this problem early in this thesis research, these complex sentences were excluded from the test corpus, yielding a subset which contains 287 of the 330 'hike' sentences (the 43 excluded sentences also include 12 sentences which contain numeric digits, since they are not currently accepted by agree). Because a large number of evaluation trials was archived on this basis, and in order to support consistent comparison with these data, this is the corpus that is used in the batch processing evaluation, presented in Section 5.5.1. Note, however, that this advisory applies neither to the real-time requirements evaluation (Section 5.5.2) nor the throughput analysis (Section 5.5.3), which both use the 318 'hike' sentences which do not contain numerals.

5.4.3 Correctness

Correctness of the agree results was established over the 'hike' corpus. On the assumption that LKB and PET were cross-validated in previous work, derivation correctness was only validated in full between the agree and PET results. Discounting well-understood differences77 which account for around 0.6% of the 277,946 derivations, derivation correctness was exact.

This validation was performed by rendering text-based (path list) feature structures using the PET feature ordering, filtering out token-mapping paths (agree currently does not inject stand-off positions into the feature structure), and comparing with a text 'diff' utility. The entire process was automated. In contrast to the performance tests, agree derivation correctness was also established for the Mono/Linux build. In this test, full results from agree were compared between its Windows and Linux builds, and they were found to be identical.

5.5 Results and analysis

This section presents results from a variety of experiments which aim to characterize the performance of agree's managed-code parser against two existing parsers, one managed and one native, as a baseline. Parser comparison is notoriously difficult (Dridan 2010); results from this section should be calibrated accordingly. Dridan suggests organizing evaluation around high-level application scenarios. Two basic scenarios are contrasted. In the first, batch processing, the objective is to parse, as quickly as possible, a set of sentences which are all available at the outset. Within this scenario, agree is evaluated with concurrency disabled. Naïve concurrency, as it applies to existing parsers, is primarily discussed in Section 5.5.3, but is not evaluated in the two batch-oriented test configurations of Section 5.5.1, for reasons that will be noted in the text.

The second test scenario concerns applications which express a real-time requirement (Section 5.5.2). In this case the objective is to parse a single sentence as quickly—and by whatever means—possible. Modes which allow existing parsers to compete in this category are briefly surveyed. Finally, in Section 5.5.3, experiments exploring agree's throughput and scaling are discussed, with analysis.

77 A single difference in tokenization can result in a large difference in the number of derivations. Twelve differences in tokenization accounted for 1,668 derivations not generated by one system or the other. Each difference was carefully investigated so that derivations that were not affected by the difference were correctly cross-validated.

5.5.1 Batch parsing

This section examines batch parsing performance, with experimental contrasts between the LKB, PET and agree parsers. Two setups are tested. First, since any parser can be configured for sentence pipelining—that is, accepting sentences from a central spooler—agree is tested with all concurrency disabled. To understand the rationale in this test design requires a summary understanding of agree's concurrency modes. A brief overview is provided in the following paragraphs.

agree supports two distinct concurrency mechanisms. First, a sentence spooler is a built-in capability it explicitly exposes. Zero or more submitter instances can be created to spool sentences to a corresponding parser instance. Each operates within the same process and on the same grammar, without requiring the expense of an additional OS process or the memory cost of redundantly loading the grammar.

The second concurrency mechanism in agree is a configuration option, exposed by the parser, which places a limit on the number of tasks which it may submit to the Common Language Runtime (CLR). This mechanism is independent of the number of submitters provisioned. To understand the effect of this limit, the parser's scheduling is briefly reviewed next.

Fundamentally, the agree parser issues fire-and-forget tasks to the CLR's lightweight task dispatcher, Task,78 on a linguistic basis—that is, according to conditions specific to the grammar and the sentence being parsed. Each parse task conceived by agree may invoke up to several dozen unifications before exiting. For example, one task is dedicated to evaluating a new passive chart edge against all pre-existing active edges, so the number of unifications depends on the sentence length and complexity. Another task generates new active edges for the new passive chart edge, so the amount of work performed by this task depends on the number of grammar rules.

If dispatching a new task would exceed the task limit, then the work is immediately executed by the executing thread (the caller who attempted to queue the task). Otherwise the task is queued to the operating system and agree has no further involvement in the scheduling or prioritizing of the task, except to decrement the total task count when the task eventually completes. Thus, the 'parser task limit' mechanism places an upper limit on the number of tasks that the CLR's Task subsystem becomes aware of. Because Task is well designed, this limitation rarely improves performance, and it is largely used for performance testing, as in the experiments presented here.

78 Task is responsible for mapping application-submitted tasks to a limited set of threads from the system thread pool, and for dynamically monitoring the number of threads and other parameters so as to maximize performance. It is a general-purpose concurrency service which has no knowledge of agree's linguistic parsing application. The Task facility is highly regarded for its sophisticated self-tuning capabilities and for incorporating results from contemporary concurrency research, including runtime tuning via empirical hill-climbing and work-stealing.
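A minimal sketch of this dispatch policy might look as follows; the class and member names are hypothetical, and the real scheduler is considerably more involved:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical sketch of the parser task limit. When the limit would be
    // exceeded, the closure runs synchronously on the calling thread;
    // otherwise it is queued fire-and-forget, and the only remaining
    // bookkeeping is the decrement when the task eventually completes.
    sealed class BoundedDispatcher
    {
        readonly int taskLimit;
        int activeTasks;

        public BoundedDispatcher(int limit) { taskLimit = limit; }

        public void Dispatch(Action work)
        {
            if (Interlocked.Increment(ref activeTasks) > taskLimit)
            {
                Interlocked.Decrement(ref activeTasks);
                work();   // execute inline; no new task is created
            }
            else
            {
                Task.Run(work)
                    .ContinueWith(_ => Interlocked.Decrement(ref activeTasks));
            }
        }
    }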

Beyond the independent configurations of the sentence submitter and the parser task limit, there is no single-threaded mode in agree. Instead, the single-threaded restriction in agree amounts to provisioning a single submitter and setting the parser task limit value to 1. Because one task is always assumed to be running, doing so effectively forbids the chart parser from creating new threads for agenda tasks.

Figure 22. Batch processing performance on the 'hike' corpus. Discussion can be found in the chapter introduction.

Single-threaded operation could be construed as a common ground between the systems. The rationale for limiting agree to single-threaded mode is the assumption that throughput scaling—whether achieved through multiple parser processes or through intrinsic concurrency—would benefit all systems relatively equally. Unfortunately, as the preceding discussion has intimated, because agree does not differentiate a tasking limit of 1 from any other value, its concurrency expenses—such as the use of atomic processor instructions and, significantly, the act of forming closures over parser work items—are still active when it is limiting itself to a single thread. Single-threaded batch processing performance is shown in Figure 22. These results were discussed in the introduction to this chapter.

Figure 23. Single-sentence real-time performance on the 'hike' corpus.

The second batch processing model admits that penalizing agree by saddling it with concurrency overheads that go unused may be just as arbitrary as comparing single- to multi-threaded results. In this test, the corpus is exhaustively parsed and unpacked by each parser according to its best means. With 8-way concurrency enabled, agree parses the corpus in 42 seconds (Figure 23).

The introduction to this chapter stressed the difficulty of interpreting end-to-end parser testing results, and presented data to illustrate the point. With this caveat in mind, the result of this experiment (Figure 22) can be described in general terms. Despite being configured with concurrency disabled, agree shows the best scaling performance, meaning that it exhibits the smallest increases in slowdown as sentences get longer. However, the sentences in the test corpus are not long enough, on average, for this factor to guarantee the fastest parsing for the corpus as a whole, and PET completes the task fastest.

5.5.2 Real-time requirement

The intention of the second overall evaluation scenario is to model applications with real-time requirements, such as speech processing or web interaction. Here, the objective is to obtain the fastest result for a single sentence. Sentence pipelining is not relevant because there is only a single sentence to process. As with the second batch processing test, any attributes particular to a parser which do not alter its linguistic results are permitted. The discussion of this section naturally overlaps with the discussion of parsing throughput which follows in Section 5.5.3.

Today we might easily get tempted to take a hike on the glacier when we look down on the glacier from the Oslo plane.    (5.4)

For these tests, exhaustive parsing and unpacking of sentence (5.4) (giving 195,605 analyses) is studied. This is one of the sentences that was excluded from the tests performed in Section 5.5.1 because it could not be exhaustively unpacked by the LKB. The same sentence will be used to analyze the throughput scaling of agree and PET in Section 5.5.3. The job requires a large number of top-level unifications: about 500 in morphological analysis, 100,000 in the main parse, and 900,000 in exhaustive unpacking. To support that discussion, the presentation begins with detailed results from agree. Comparison with the baseline parsers will be presented shortly.

Sentence (5.4) is parsed and exhaustively unpacked with the agree parser concurrency limited over the range {1, 2, … 25, 100, ∞}. Configuring the parser with an infinite task limit means that every task conceived by the parser is queued, which delegates full responsibility for managing processor over-commitment to Task. Results, shown in Figure 24, show competent scaling for agree. After exhibiting poor performance of 126 seconds when limited to just one processor, parse time monotonically decreases as additional processors are permitted. As expected on a machine with eight physical CPUs, a near-minimum value of 28.19 seconds occurs when the parser limits itself to creating eight tasks. This is a bit surprising because not all phases of the parse task are expected to be able to submit that many tasks. For example, after morphological analysis but before the main parse, chart dependencies are evaluated as a single-threaded operation. The explanation is that, as intended, this test is overwhelmingly dominated by unification work.

Figure 24. 8-CPU performance scaling of the agree parser with a 24-word sentence. When the parser would exceed the task limit, it executes the closure immediately instead of queuing another task to the runtime. Performance improvement is not yet constrained by other resource limitations when adding the eighth physical processor (the maximum configuration tested). Beyond this point, no over-commitment penalty is seen.

System                                      parse time, sec.
LKB                                         d.n.f., > 7200.00
PET                                         39.47
agree   parser task limit = 1               126.13
        min (at parser task limit = 8)      28.19
        parser task limit > 8 (avg.)        29.84

Table 7. Comparison of the results from Figure 24 with other parsers.

When physical processors are over-committed, the Task system exhibits sensible behavior. Average parse time in this region is 29.84 seconds with a standard deviation of only 1.13 seconds (3.8%). Table 7 extends the analysis of Figure 24 to the LKB and PET. With only a single processor, PET parses the sentence in 39.47 seconds, clearly surpassing both the LKB (which was not able to complete the task within two hours), and agree, which took just over two minutes.

5.5.3 agree parser throughput and scaling

To evaluate the benefit of concurrency to the agree implementation, scaling was studied across a range of concurrency levels. The objective was to determine the degree to which contention degrades parsing performance. Concurrency overheads can be categorized as those for which agree is responsible, and those which occur within the runtime environment's Task facility. In the former category, primarily, is contention in the lock-free parse chart. Details of this component are beyond the scope of this thesis, but in short, each cell of the parse chart is independently protected by an extremely brief "optimistic" sequence of atomic processor instructions. This means that every chart operation always assumes that it will succeed, and proceeds without locking, protected only by detecting rejected CMPEXCH results (expected value collisions). When detected, this contention requires that the operation be retried anew. The mechanism is simple, but suffers extreme penalties when multiple CPU cores crowd around a popular chart cell.
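The optimistic cell update can be sketched as follows; EdgeList here is a hypothetical immutable stand-in for the actual cell contents, not agree's real chart representation:

    using System.Threading;

    sealed class Edge { }

    // Hypothetical immutable cell contents; With() builds a new value.
    sealed class EdgeList
    {
        readonly Edge[] edges;
        public EdgeList(params Edge[] es) { edges = es; }
        public EdgeList With(Edge e)
        {
            var a = new Edge[edges.Length + 1];
            edges.CopyTo(a, 0);
            a[edges.Length] = e;
            return new EdgeList(a);
        }
    }

    static class Chart
    {
        // Every operation assumes it will succeed; a rejected CMPEXCH
        // (the cell changed since it was read) forces a retry anew.
        public static void AddEdge(ref EdgeList cell, Edge e)
        {
            while (true)
            {
                EdgeList seen = Volatile.Read(ref cell);
                EdgeList proposed = seen.With(e);
                if (Interlocked.CompareExchange(ref cell, proposed, seen) == seen)
                    return;
            }
        }
    }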

Results from the real-time requirements testing were used to analyze whether contention was negatively affecting agree's CPU scaling. If it were, parse time might begin to trend upwards before reaching the number of physical processors. This is not observed, so contention does not seem to be implicated. This same data could also provide evidence of disproportionate administrative overheads associated with the CLR's tasking support. If drastically overcommitting the runtime's task pool incurred a steep penalty, parse time for the unconstrained configuration—which actually ends up queuing a peak of over 4,313 parser tasks for this one sentence—would rise in contrast to the parse time for, say, 5 CPUs. This, too, is not observed, suggesting that Task is, as claimed, sophisticated and well-tuned.

Beyond the overheads internal to Task just mentioned above are overheads associated with creating closures over the parser work items, in order that they can be submitted to Task. In the current design, this penalty is borne even when the parser is configured for single-threaded operation. In fact, the expensive atomic processor instructions that protect the lock-free chart, and several other concurrency considerations that become unnecessary, are all still observed when only a single CPU is active. As mentioned in Section 5.5.1, this complicates evaluation because it entails that limiting agree to single-threaded operation does not trivially enable conclusive performance comparison between its unifier and storage systems and those of the single-threaded control group.

Figure 25. Comparison of agree's scaling mechanisms—the parser task limit, and the number of submitter instances—aggregated over the 'hike' corpus. The y-axis is normalized to the single-task performance for either method.79

In Figure 25, the two distinct scaling mechanisms available in agree are compared. The performance data are aggregated over parsing the entire 'hike' corpus and independently normalized, for each mechanism, to 1.0. In the parser tasks method, the parser limits the number of tasks it creates, as described in Section 5.5.1. This method is compared to the submitter method. Though sharing the same grammar, each submitter supervises the submission of one sentence at a time to a parser instance. Sentences can be spooled from a source shared amongst multiple submitters, or from entirely separate sources. Thus the second tested configuration is the number of submitters provisioned, where each submitter's parser instance is configured to limit itself to creating a single task.

79 Thanks to Woodley Packard for suggesting this presentation format.

These data indicate good scaling throughout the range of physical CPUs. The submitter mechanism is more coarse-grained than the parser tasks mechanism; as expected, it incurs greater degradation when the number of CPUs is oversubscribed (greater than eight). Over-subscription of parser tasks fares better, a credit to the operating system facility which adaptively manages their scheduling.

Dividing each net throughput observation by the total number of CPUs it engages gives throughput per CPU. The values are also normalized—independently for each mechanism—to 1.0, and linear trends are then plotted. A perfectly horizontal line is optimal, since it would indicate that adding additional processors has no effect on those already operating. With eight processors, overall performance has deteriorated to 57% of what would be expected if there were no concurrency overhead for the submitter, and to 51% for the finer-grained mechanism.

The data highlight a few positive results. First and most obviously, performance losses from medium-grained concurrency are nearly identical to those from the embarrassingly parallel approach. Reviewing what each is expected to be susceptible to, both will experience contention for system memory bandwidth, but only the former should also experience shared data contention and cache miss penalties.80 That they decrease in such close accord is an important result for this thesis and for agree as a whole, because it means that its ability to bring multiple processors to bear on a single parsing task—that is, on parsing a single, complex sentence—does not imply disproportionate waste of the machine's resources. Across the entire range tested in Figure 25, the more flexible, finer-grained mechanism adds the capability of accommodating real-time parsing requirements with no sacrifice of agree's fundamental unification throughput.

The preceding discussion suggests that, for batch processing, there is little need to worry about which agree concurrency mode or modes are used, and experiments largely bear this out. Nevertheless, minuscule—but repeatable—performance differences are obtained when provisioning different submitter and parser task limit configurations. In fact, peak performance on the 8-core test machine is obtained when using two submitters, each limited to queuing a dozen tasks or so. This balance between coarse-grained and finer-grained tasking maintains a 2-3% performance improvement versus other configurations. An intuitive explanation for the result is that having more than one task running at all times helps smooth over the brief phases of single-threaded activity (tokenization, chart dependencies, etc.) inherent in parsing a sentence; but more than two coarse-grained tasks causes memory strain because the simultaneous parsing of unrelated sentences produces too many intermediate analysis structures.

80 Even though multiple submitters share the same grammar, their access is read-only, which should cause no contention. The exception is the case of false sharing—described in footnote 57—which is easy to fix, but hard to detect and locate.


A second notable result from Figure 25 is that, with the finer-grained mechanism, increasing from one CPU to two increases the throughput per CPU slightly, reaching the normalization maximum with two—rather than just one—active processors. The result indicates the degree to which contention affects the parser in single-threaded mode. Concurrency overheads—the so-called 'communication' penalties—inherent in having multiple threads work on the same task, are found to be minimal. Figure 26 zooms in to the interesting part of the graph.

Although the slight degree of concurrency overhead is properly attributed to the parser component—which is not itself a subject of this thesis—it is important to confirm that its implementation details are not significantly coloring agree's scaling and throughput results.

Figure 26. Detail from Figure 25. The overheads from agree's medium-grained concurrency are exposed.

5.6 Evaluation summary

This chapter evaluated array storage and the array storage unifier in realistic grammar processing tasks by exercising them with agree's concurrent chart parser. Overall, the new methods are deemed competitive contributions to contemporary research. Concurrency support adds some overhead to agree which cannot be disabled, so the system becomes most competitive when running on multi-core hardware. This property is most relevant to applications with real-time requirements, where parse results for complex sentences are needed as quickly as possible, and without regard for computational expense. In this test, concurrency allows the new system to parse and exhaustively unpack a twenty-four word sentence 24% faster than the comparison system.

In agree, two distinct concurrency features—sentence spooling and finer-grained parser tasking—are both coordinated by the same adaptive supervisor. This flexible model simplifies configuration and provisioning, enabling agree to rapidly and automatically respond to changing parsing conditions. This suggests a scenario for which agree is ideally suited: the batch processing of corpora which contain unpredictable variation in sentence complexity. Because agree dynamically adapts its parser concurrency level based on a process-wide consideration of memory consumption, it is able to add more tasks when processing simple inputs, and hold back on starting additional sentences when working on a complex input.


6 Conclusion

This thesis describes a new approach to the storage of typed feature structures appropriate for linguistic modeling with constraint-based unification grammars. Nodes comprising a single TFS are stored as tuples in a fixed-size, read-only array which serves to consolidate a large number of system allocations into just one.

Also described is a unification algorithm suited to the new storage paradigm. By incorporating reference to the mixed-arity feature-value maps of the participating input structures in its description of the putative result structure, the unification algorithm obtains a simplifying guarantee, namely, that the result structure is necessarily latent within that joint set of maps. The formal result is that the unifier requires no operationally variant data structures relevant to the traversal conditions, and is always able to sufficiently express the output TFS within an exclusively a priori framework.

The techniques are demonstrated in agree, a new DELPH-IN-compatible grammar engineering environment developed for this thesis. Comparisons between agree and existing parsers show that array TFS storage and its companion unifier are effective. Evaluated on an identical task, the new system manifests performance in league with a widely used high-performance system. For comparison purposes, these tests required that agree’s concurrency features be disabled, but in other testing, the system’s concurrency scaling has proven to be one of its most compelling qualities.


7 References

Hassan Aït-Kaci. 1984. A lattice theoretic approach to computation based on a calculus of partially ordered type structures. Ph.D. Thesis, University of Pennsylvania, Philadelphia, PA.

Hassan Aït-Kaci, Robert Boyer, Patrick Lincoln, and Roger Nasr. 1989. Efficient Implementation of Lattice Operations. ACM Transactions on Programming Languages and Systems 11(1), 115-146.

Emily M. Bender, Dan Flickinger, and Stephan Oepen. 2002. The grammar matrix: an open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. in John Carroll et al., eds., Proceedings of the workshop on grammar engineering and evaluation (COLING-GEE 2002), vol. 15. Association for Computational Linguistics, Stroudsburg, Pennsylvania, 1-7.

Emily M. Bender, Scott Drellishak, Antske Fokkens, Laurie Poulson, and Safiyyah Saleem. 2010. Grammar Customization. Research on Language & Computation 8(1). Springer Netherlands, 23-72.

Ulrich Callmeier. 2000. PET: a platform for experimentation with efficient HPSG processing techniques. Natural Language Engineering 6(1): 99-107.

Ulrich Callmeier. 2001. Efficient Parsing with Large-Scale Unification Grammars. MA Thesis, Universität des Saarlandes—Fachrichtung Informatik.

Bob Carpenter. 1992. The Logic of Typed Feature Structures. Cambridge University Press.

John Carroll. 1993. Practical Unification-based Parsing of Natural Language. Ph.D. Thesis. University of Cambridge.

Edgar Frank Codd. 1970. A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377-387.

Alain Colmerauer. 1982. Prolog II Reference Manual and Theoretical Model. Internal Report, Groupe d’Intelligence Artificielle, Faculté des Sciences de Luminy, Marseille, Université d’Aix-Marseille.

Ann Copestake. 1993. The Compleat LKB. University of Cambridge Computer Laboratory, Technical Report No. 316 and ACQUILEX-II Deliverable 3.1.

Ann Copestake. 2002a. Implementing typed feature structure grammars. CSLI Publications, Stanford, California.

Ann Copestake. 2002b. Definitions of Typed Feature Structures. in Oepen et al., eds., Collaborative Language Engineering. CSLI Publications, Stanford, California, 227-230.

Ann Copestake and Dan Flickinger. 2000. An open-source grammar development environment and broad-coverage English grammar using HPSG. in Proceedings of the Second conference on Language Resources and Evaluation (LREC-2000), Athens, Greece.

Ann Copestake, Dan Flickinger, Carl Pollard, and Ivan A. Sag. 2005. Minimal recursion semantics: An introduction. Research on Language & Computation 3(4):281–332.

Rebecca Dridan. 2010. Using lexical statistics to improve HPSG parsing. Ph.D. dissertation, Universität des Saarlandes.

Martin C. Emele. 1991. Unification with lazy non-redundant copying. in Proceedings of the 29th annual meeting on Association for Computational Linguistics (ACL 1991). Association for Computational Linguistics, Stroudsburg, Pennsylvania, 323-330.

Dan Flickinger. 2002. On building a more efficient grammar by exploiting types. Natural Language Engineering, 6, 15-28.

Gerald Gazdar, Ewan Klein, Geoffrey K. Pullum, and Ivan Sag. 1985. Generalized phrase structure grammar. Harvard University Press, Cambridge, Massachusetts.

Kurt Godden. 1990. Lazy unification. in Proceedings of the 28th annual meeting on Association for Computational Linguistics (ACL 1990). Association for Computational Linguistics, Stroudsburg, Pennsylvania, 180-187.

Jacques Herbrand. 1930. Recherches sur la théorie de la démonstration. Ph.D. Thesis. in W. Goldfarb, ed., 1971. Logical Writings of Jacques Herbrand. Harvard University Press, Cambridge, Massachusetts.

Mark Johnson. 1987. A Logic of Attribute-Value Structures and the Theory of Grammar, Ph.D. Thesis, Stanford University, Stanford, California.

Lauri Karttunen. 1986. D-PATR: a development environment for unification-based grammars. in Proceedings of the 11th conference on Computational linguistics (COLING 1986). Association for Computational Linguistics, Stroudsburg, Pennsylvania, 74-80.

Bernd Kiefer, Hans-Ulrich Krieger, John Carroll, and Rob Malouf. 1999. A Bag of Useful Techniques for Efficient and Robust Parsing. in Proceedings of the 37th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Stroudsburg, Pennsylvania, 473-480.

Hans-Ulrich Krieger and Ulrich Schäfer. 1994. TDL: a type description language for constraint-based grammars. in Proceedings of the 15th conference on Computational linguistics (COLING 1994), Vol. 2. Association for Computational Linguistics, Stroudsburg, Pennsylvania, 893-899.


Paul King. 1989. A Logical Formalism for Head-Driven Phrase Structure Grammar. Ph.D. Thesis, University of Manchester.

Paul King. 1994. An expanded logical formalism for Head-Driven Phrase Structure Grammar. Arbeitspapiere des SFB 340, Bericht Nr. 59, University of Tübingen.

Kevin Knight. 1989. Unification: A Multidisciplinary Survey. ACM Computing Surveys 21(1), 93-124.

Kiyoshi Kogure. 1990. Strategic lazy incremental copy graph unification. in Hans Karlgren, ed., Proceedings of the 13th conference on Computational linguistics (COLING 1990) vol. 2. Association for Computational Linguistics, Stroudsburg, Pennsylvania, 223-228.

Robert Malouf, John Carroll, and Ann Copestake. 2000. Efficient feature structure operations without compilation. Natural Language Engineering 6(1):29-46.

Kurt Mehlhorn. 1984. Data Structures and Algorithms 2: Graph Algorithms and NP-Completeness. Springer-Verlag, Heidelberg.

Stephan Oepen and John Carroll. 2000a. Parser engineering and performance profiling. Natural Language Engineering 6(1), 81-97.

Stephan Oepen and John Carroll. 2000b. Ambiguity packing in constraint-based parsing: practical results. in Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference (NAACL 2000). Association for Computational Linguistics, Stroudsburg, Pennsylvania, 162-169.

Stephan Oepen, Kristina Toutanova, Stuart Shieber, Christopher Manning, Dan Flickinger, and Thorsten Brants. 2002. The LinGO Redwoods Treebank: Motivation and preliminary applications. in Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan. Association for Computational Linguistics, Stroudsburg, Pennsylvania.

Stephan Oepen, Dan Flickinger, Jun-ichi Tsujii, and Hans Uszkoreit, eds. 2002. Collaborative Language Engineering: A Case Study in Efficient Grammar-Based Processing. CSLI Lecture Notes, CSLI Publications, Stanford, California.

Fernando C. N. Pereira. 1985. A structure-sharing representation for unification-based grammar formalisms. in Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, Chicago, Illinois, 8-12 July 1985. Association for Computational Linguistics, Stroudsburg, Pennsylvania, 137-144.

Fernando C. N. Pereira and David H. D. Warren. 1986. Definite clause grammars for language analysis. in Barbara J. Grosz, Karen Sparck-Jones, and Bonnie Lynn Webber, eds., Readings in natural language processing. Morgan Kaufmann Publishers Inc., San Francisco, California, 101-124.

Carl Pollard and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press and CSLI Publications, Stanford, California.

Stuart M. Shieber, Lauri Karttunen, and Fernando C. N. Pereira. 1984. Notes From The Unification Underground: A Compilation Of Papers On Unification-Based Grammar Formalisms. Technical Note 327. AI Center, SRI International.

Stuart M. Shieber. 1985. Using Restriction to Extend Parsing Algorithms for Complex-feature-based Formalisms. in Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics, Chicago, Illinois, 8-12 July 1985. Association for Computational Linguistics, Stroudsburg, Pennsylvania, 145-152.

Stuart M. Shieber. 1986. An introduction to unification-based approaches to grammar. CSLI Publications, Stanford, California.

Melanie Siegel and Emily Bender. 2002. Efficient Deep Processing of Japanese. in Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization, 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan. Association for Computational Linguistics, Stroudsburg, Pennsylvania.

Gert Smolka. 1989. Logic Programming over Polymorphically Order-Sorted Types. Ph.D. Thesis, DFKI Saarbrücken.

Val Tannen, Peter Buneman, and Limsoon Wong. 1992. Naturally embedded query languages. in Proceedings of the 4th International Conference on Database Theory (ICDT 1992). Springer-Verlag, London.

Hideto Tomabechi. 1991. Quasi-destructive graph unification. in Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, Berkeley, California. Associa- tion for Computational Linguistics, Stroudsburg, Pennsylvania.

Hideto Tomabechi. 1992. Quasi-destructive graph unification with structure-sharing. in Proceedings of the 15th International Conference on Computational Linguistics (COLING 1992), Nantes, France. Association for Computational Linguistics, Stroudsburg, Pennsylvania.

Noriko Tomuro and Steven Lytinen. 1997. Efficient Lazy Unification. DePaul CTI Tech Report 97-006.

Marcel P. van Lohuizen. 1999. Parallel processing of natural language parsers. in Proceedings of the International Conference on Parallel Computing (PARCO 1999), Delft University of Technology, Delft, Netherlands.

Marcel P. van Lohuizen. 2000. Exploiting parallelism in unification-based parsing. in Proceedings of the Sixth International Workshop on Parsing Technologies (IWPT 2000), Trento, Italy.

Marcel P. van Lohuizen. 2000. Memory-efficient and Thread-safe Quasi-Destructive Graph Unification. in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, China. Association for Computational Linguistics, Stroudsburg, Pennsylvania.

Marcel P. van Lohuizen. 2001. A generic approach to parallel chart parsing with an application to LinGO. in Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France. Association for Computational Linguistics, Stroudsburg, Pennsylvania.

Marcel P. van Lohuizen. 2001. Parallel Natural Language Parsing: From Analysis to Speedup. Ph.D. Thesis. Technische Universiteit Delft.

David A. Wroblewski. 1987. Nondestructive graph unification. in Proceedings of the 6th National Conference on Artificial Intelligence (AAAI-87). Morgan Kaufmann, 582-589.

Xianchao Wu, Takuya Matsuzaki, and Jun’ichi Tsujii. 2010. Fine-grained tree-to-string translation rule extraction. in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010). Association for Computational Linguistics, Stroudsburg, Pennsylvania, 325-334.

Yi Zhang, Rui Wang, and Yu Chen. 2011. Engineering a Deep HPSG for Mandarin Chinese. in Proceedings of the 9th Workshop on Asian Language Resources, Chiang Mai, Thailand. Association for Computational Linguistics, Stroudsburg, Pennsylvania.