arXiv:cmp-lg/9709013v1 23 Sep 1997

An Abstract Machine for Unification Grammars
with Applications to an HPSG Grammar for Hebrew

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Doctor of Science

Shalom Wintner

Submitted to the Technion – Israel Institute of Technology

Tevet 5757    Haifa    January 1997

The work described herein was supervised by Prof. Nissim Francez under the auspices of the Faculty of Computer Science.

Acknowledgments

This work could never have been what it is without the support I received from my advisor, Nissim Francez. I am grateful to Nissim for introducing me to this subject, leading and advising me throughout the project, bearing with me when I messed things up and encouraging me all along the way. Many thanks are due to the members of my Thesis Committee: Bob Carpenter, Michael Elhadad, Alon Itai, Uzzi Ornan and Mori Rimon. I am especially indebted to Bob for his major help, mostly in the preliminary stages of this project. I also want to thank Uzzi for replacing Nissim when he was on sabbatical. While working on this thesis I spent a fruitful summer in the University of Tübingen, Seminar für Sprachwissenschaft, during which I learned a lot from many discussions. I wish to thank Paul King for his generosity and his wisdom. Thanks are also due to Erhard Hinrichs, Dale Gerdemann, Thilo Götz, Detmar Meurers, John Griffith and Frank Morawietz. I am grateful to Evgeniy Gabrilovich for many fruitful discussions and for going over my code. Special thanks to Holger Maier and Katrine Kirk for their hospitality and their company. Finally, I want to thank Yifat and Galia for being there when I needed it. The generous financial help of the Minerva Stipendien Komitee and Intel Israel Ltd. is gratefully acknowledged. The research was supported by a grant from the Israeli Ministry of Science: “Programming Languages Induced Computational Linguistics” and by the Fund for the Promotion of Research in the Technion. The project was carried out under the auspices of the Laboratory for Computational Linguistics of the Technion.

Contents

Abstract

1 Introduction
  1.1 Motivation
  1.2 Literature Survey
    1.2.1 Grammatical Formalisms
    1.2.2 The Current Role of HPSG
    1.2.3 Abstract Machine Techniques
    1.2.4 Processing HPSG
    1.2.5 Computational Grammars for Hebrew
  1.3 Achievements of the Thesis
  1.4 Structure of this Document

2 Parsing with Typed Feature Structures
  2.1 Theory of Feature Structures
  2.2 Subsumption
  2.3 Well-Foundedness of Subsumption
  2.4 Abstract Feature Structures
  2.5 Unification
  2.6 A Linear Representation of Feature Structures
  2.7 Multi-rooted Structures
  2.8 Rules and Grammars
  2.9 Parsing as Operational Semantics
  2.10 Proof of Correctness
    2.10.1 Soundness
    2.10.2 Completeness
    2.10.3 Subsumption Check
    2.10.4 Termination

3 AMALIA – An Abstract Machine for Linguistic Applications
  3.1 The Input Language
    3.1.1 Type Specification
    3.1.2 Rules and Grammars
  3.2 A TFS Unification Engine
    3.2.1 First-Order Terms vs. Feature Structures
    3.2.2 Processing Scheme
    3.2.3 Memory Representation of Feature Structures
    3.2.4 Flattening Feature Structures
    3.2.5 Processing of a Query
    3.2.6 Compilation of the Type Hierarchy
    3.2.7 Processing of a Program
  3.3 Parsing
    3.3.1 A Parsing Algorithm
    3.3.2 Data Structures
    3.3.3 Compilation
    3.3.4 Effect of the Machine Instructions
  3.4 Optimizations and Extensions
    3.4.1 Lazy Evaluation of Feature Structures
    3.4.2 Partial Descriptions
    3.4.3 Empty Categories
    3.4.4 Lexical Ambiguity
    3.4.5 Functional Attachments
  3.5 Implementation

4 An HPSG-based Grammar for Hebrew
  4.1 Phrase Structure Schemata
  4.2 The Structure of Noun Phrases in Hebrew
  4.3 The Status of the Definite Article in Hebrew
    4.3.1 The Data
    4.3.2 HPSG Approach
    4.3.3 An Analysis of Hebrew Definites
  4.4 Noun-Noun Constructs

5 Conclusion

A List of Machine Instructions

B The Hebrew Grammar

Bibliography

List of Figures

2.1 A feature structure
2.2 An infinite decreasing sequence of TFSs
2.3 An abstract feature structure
2.4 A graph- and AVM-representation of a MRS
2.5 An example grammar
2.6 Strong derivation
2.7 A leftmost derivation

3.1 An example type hierarchy
3.2 An example grammar in ALE format
3.3 Heap representation of the TFS b(b(1 d, 1), d)
3.4 Feature structures as sets of equations
3.5 The effect of the put instructions
3.6 Compiled code for the query b(b(1 d, 1), d)
3.7 Type unification tables
3.8 unify type[a,b]
3.9 The effect of the type unification instructions
3.10 Compiled code for the program a(3 d1, 3)
3.11 Implementation of the get/unify instructions
3.12 The code of the unify function
3.13 Parsing – informal description
3.14 The effect of the proceed instruction
3.15 The effect of put rule
3.16 Compiled rule
3.17 Compiled code for a grammar with k rules
3.18 Compiled code obtained for the example grammar
3.19 The effect of the key-manipulation instructions
3.20 The effect of the edge traversal instructions
3.21 The effect of the call instruction
3.22 The effect of the copy instructions
3.23 The fail function
3.24 Performance comparison of AMALIA and ALE

4.1 Subject-Head schema
4.2 Head-Complement schema
4.3 Head-Marker schema
4.4 Head-Adjunct schema
4.5 The lexical entry of the noun sepr
4.6 The effect of the Definite Lexical Rule
4.7 The effect of the ‘nismak’ lexical rule

Abstract

Contemporary linguistic formalisms have become so rigorous that it is now possible to view them as very high level declarative programming languages. Consequently, grammars for natural languages can be viewed as programs; this view enables the application of various methods and techniques that have proved useful for programming languages to the study of natural languages.

One of the most successful implementation techniques for such languages involves the use of an abstract machine. In this approach one defines an abstract machine with the following properties: it is close enough to the high-level language, thus allowing efficient compilation to the abstract machine language; and it is sufficiently low-level to allow efficient interpretation of the machine instructions on a variety of host architectures. Abstract machines were used for processing procedural and functional languages, but they gained much popularity for logic programming languages since the introduction of the Warren Abstract Machine (WAM). Most current implementations of Prolog, as well as other logic languages, are based on abstract machines. The incorporation of such techniques usually leads to very efficient compilers in terms of both space and time requirements.

In this work we have designed and implemented an abstract machine, AMALIA, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. AMALIA is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware.

The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with a well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, AMALIA is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of AMALIA results in much improved performance over existing systems.

In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using AMALIA as the development platform. This is the first application of HPSG to a Semitic language.

List of Abbreviations

AFS     Abstract Feature Structure
ALE     Attribute Logic Engine
AMALIA  Abstract MAchine for LInguistic Applications
AMRS    Abstract Multi-rooted Feature Structure
AVM     Attribute Value Matrix
DLR     Definite Lexical Rule
FOT     First-Order Term
GUI     Graphical User Interface
HPSG    Head-Driven Phrase Structure Grammar
LFG     Lexical-Functional Grammar
LUB     Least Upper Bound
MRS     Multi-rooted Feature Structure
NL      Natural Language
NLR     ‘Nismak’ Lexical Rule
NP      Noun Phrase
PT      Pre-terminal
REF     REFerence
STR     STRucture
TFS     Typed Feature Structure
WAM     Warren Abstract Machine

Chapter 1

Introduction

1.1 Motivation

Research in linguistics has traditionally been aimed at describing the structure of Natural Languages (NLs). Since the 1950s, however, the focus has shifted from attempts to provide such descriptions to the definition of the right way in which to stipulate them. During the past few decades many such formalisms were devised. A ‘good’ model, according to Shieber (1986), is linguistically felicitous, expressive and computationally effective. It must be powerful enough to capture the wealth and diversity of NLs, yet it must be computationally tractable to allow for computational processing.

Contemporary linguistic formalisms such as LFG (Kaplan and Bresnan, 1982) or HPSG (Pollard and Sag, 1994) have become so rigorous that it is now possible to view them as very high level declarative programming languages. In this metaphor a grammar for a natural language, formally specified using one of the modern frameworks described above, can be viewed as a program. The execution of a grammar on an input sentence yields an output which represents the sentence’s structure. This view enables the application of various methods and techniques that have proved useful for programming languages to the study of natural languages.

Historically, many computational fields of research originated from the study of natural languages: important aspects of the theory of formal languages are due to Chomsky, for example; and more recently, Prolog originated out of an attempt to provide a language for the description of natural languages. Today, however, much progress has been achieved in the area of programming languages. Tools and techniques were developed that enable efficient processing of such languages and, more importantly, formal propositions to be made and proved over languages in general and specific programs in particular. These advances are now being incorporated into the realm of natural languages. Grammars for natural languages are specified more precisely; their properties can be mathematically stated; and their processing becomes more efficient. For a survey of some such approaches, see (Shieber, 1986); for examples of the advantages of regarding natural language formalisms as programming languages, see (Barton, Berwick, and Ristad, 1987; Manaster-Ramer, 1987).

This work introduces such an application: an implementation technique that is common for logic programming languages, namely the use of an abstract machine, is applied to (a subset of) the ALE formalism (Carpenter, 1992a), originally designed for specifying feature-structure based phrase-structure grammars. Abstract machines were used for processing procedural and functional

languages, but they gained much popularity for logic programming languages since the introduction of the Warren Abstract Machine (WAM – see (Warren, 1983) and a tutorial in (Aït-Kaci, 1991)). Most current implementations of Prolog, as well as other logic languages, are based on abstract machines. The incorporation of such techniques usually leads to very efficient compilers in terms of both space and time requirements.

AMALIA is an abstract machine, specifically tailored for processing ALE grammars. It is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with a well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, AMALIA is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of AMALIA results in much improved performance over existing systems (in particular, ALE itself).

1.2 Literature Survey

1.2.1 Grammatical Formalisms

Much of the recent research in computational linguistics has been directed towards defining a good model in which natural languages would be naturally describable. Having its roots in the study of formal languages, this endeavor started with considering the computational power needed for describing natural languages in general; context-free grammars were thus ruled out quite early. But even if one did believe that natural languages were, indeed, within the scope of context-free languages, one had to admit that context-free grammars were not the ideal framework in which to develop grammars for natural languages. It was understood that the weak generation properties of a grammar in a given formalism (i.e., its ability to recognize all and only the sentences of a language) are not sufficient – there is a need to provide syntactic descriptions that cohere with the way linguists capture the language. The resulting trend in computational linguistics was to use unification-based formalisms to achieve these two goals.

While many such frameworks were developed (see (Shieber, 1986) for a good review), some notions are common to most of them. They are all based on a context-free skeleton, where non-terminal symbols are replaced with structured, more complex entities; and the basic operation on these structures is unification. Among these frameworks are Functional Unification Grammar (Kay, 1983), Lexical Functional Grammar (Kaplan and Bresnan, 1982), Generalized Phrase-Structure Grammar (Gazdar et al., 1985) and many others.

A unification-based grammar formalism is a meta-language for describing grammars for (natural) languages. The basic entity of such formalisms is the feature structure – a data structure consisting of a set of feature-value pairs. While different frameworks define feature structures differently, they can in general be captured as directed graphs, where the arcs are labeled with feature names and an f-labeled arc connects nodes v and u if and only if the value of the feature f in the feature structure associated with v is the feature structure associated with u. For a good, informal survey of feature structures and their properties refer to Shieber (1986). Feature structures can be thought of as an extension of first order terms (see (Carpenter, 1991;

Aït-Kaci and Podelski, 1993)), where the sub-terms are coded by feature names rather than by positions. They extend first order terms in that they are in general graphs, whereas first order terms are trees, with possibly shared leaves. Hence, a term might be a common part of more than one sub-term. Some grammatical formalisms decorate feature structures with types, or sorts, that can be captured by labels on the nodes of the graph. Feature structures are used by grammatical formalisms to represent linguistic concepts such as words, phrases and sometimes even grammatical rules. The basic operation on feature structures is unification. Being very much like first-order term unification, this operation combines the information that is encoded by two feature structures and produces a result that contains the unified information, provided that the two arguments don’t contain contradicting information. If the arguments are inconsistent, unification is said to fail.

In this work we are mainly concerned with typed feature structures (TFSs), as described in (Carpenter, 1992b). As their name suggests, each such structure has a type, drawn from a pre-defined, partially ordered set of types. The type hierarchy helps the grammar writer to organize linguistic knowledge in a similar way to common knowledge representation languages. The hierarchy is accompanied by an appropriateness specification that associates features with types; for example, the ‘case’ feature might be defined to be appropriate for feature structures of type ‘noun’ but not for structures of type ‘verb’. Moreover, appropriateness is inherited: if a feature is appropriate for a type t, then it is appropriate for all the sub-types of t as well. This property is reminiscent of object-oriented systems; in particular, multiple inheritance is supported in this framework.

It is important to note that while some of the above-mentioned formalisms were designed as computational frameworks for developing grammars, others were linguistically oriented in the sense that a grammatical theory was encoded within them in one way or another. Obviously, any such formalism defines at least the expressive power of grammars that can be stipulated within it. But many other linguistic considerations and generalizations can be, and actually are, hard-wired into some formalisms.
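The graph view of feature structures sketched above can be made concrete in a few lines of code. The following fragment is purely illustrative and is not taken from the thesis or from any of the systems cited here: it shows feature-value pairs as nested Python dictionaries, with reentrancy expressed by letting two features share a single object.

```python
# A minimal, illustrative encoding of a feature structure as nested
# feature-value pairs.  Reentrancy (value sharing) is expressed by letting
# two features point at the very same object.  The feature names and values
# below are invented for this example.

agr = {"NUMBER": "singular", "GENDER": "feminine"}   # one shared agreement value
phrase = {
    "SYN":  "s",
    "SUBJ": {"AGR": agr},
    "HEAD": {"AGR": agr},          # SUBJ AGR and HEAD AGR share their value
}
assert phrase["SUBJ"]["AGR"] is phrase["HEAD"]["AGR"]  # the two paths are reentrant
```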

1.2.2 The Current Role of HPSG

Recently HPSG has become prominent among the various unification-based formalisms. HPSG (Pollard and Sag, 1987; Pollard and Sag, 1994) was developed by Pollard and Sag as a variant of GPSG and Categorial Grammar, but immediately gained the position of a well-founded, promising formalism for the description of natural languages. It incorporates the notion of typed feature structures, where types are partially ordered according to a defined hierarchy, thus enabling very concise, general rules to be stipulated. Much of the information carried by linguistic entities is stored in the lexicon; as a result, grammar rules become few and very general. HPSG defines a set of linguistically plausible schemas, or universal principles, that are said to hold for all natural languages and are part of every grammar. In addition, language-specific rules can be specified in any given grammar.

Due to its generality and elegance, HPSG has gained a lot of popularity. It enables the design of grammars for various, linguistically different, languages: work has been done on HPSG grammars for English, German (Nerbonne, Netter, and Pollard, 1994), French, Japanese (JPSG Working Group, In Preparation), Korean and many other languages (see a bibliography in (Calcagno, Kathol, and Pollard, 1993) and an electronic bibliography in (Müller, 1996)). HPSG principles were used to describe not only the syntax and semantics of languages, but also their morphology (e.g., (Nerbonne, 1992)) and phonology (e.g., (Bird, 1990; Bird, 1992)). It seems that linguists find this kind of typed-feature-structures based formalism, with lexical rules and a small set of very

general grammatical rules, very convenient.

In spite of the interest that HPSG arouses, no formal definition of the formalism exists. Both (Pollard and Sag, 1987) and (Pollard and Sag, 1994) are rather linguistically oriented, in the sense that no mathematical definitions are given for the language of HPSG itself. King (1989; 1992) gives a logical formalization of Pollard and Sag (1987); it is an attempt to provide a logical framework within which both the elementary entities of HPSG, such as feature structures and types, and the principles and the rules, can be described. While King (1989) encompasses Pollard and Sag (1987) in its entirety, it does not provide a characterization of all possible HPSG grammars, nor does it describe the current formulation of the theory as expressed in Pollard and Sag (1994). A similar drawback can be found in (Pollard and Moshier, 1990): while it gives a denotational semantics for a typed feature structure system, it is not directed specifically towards HPSG, and no formulation of the properties of HPSG grammars is given. A different work is described in (Carpenter, 1992b); a broad, concise theory of the logic of typed feature structures is presented, with many variations and applications. Carpenter (1992b) serves as the main reference point for any attempt to define such formalisms; however, as it is not concentrated on HPSG per se, no formal definition for it can be found there either.

It is not only a denotational semantics for HPSG that is lacking; an operational semantics for the formalism is missing, too. While some compilers for HPSG were developed (see section 1.2.4), they all rely on (Pollard and Sag, 1987) and (Pollard and Sag, 1994) as their source for interpreting the formalism, and as we mentioned above, neither reference is formal enough. Since HPSG is not defined formally enough, we opted in this work to implement ALE (see below), which is the most common platform for designing HPSG grammars.

1.2.3 Abstract Machine Techniques

High-level programming languages, especially ones with dynamic structures, have always been hard to develop compilers for. A common technique for overcoming the problems involves the notion of an abstract machine. It is a machine that, on one hand, captures the essentials of the high-level language in its architecture and its instruction set, such that a compiler from the source language to the (abstract) machine language becomes relatively simple to design. On the other hand, the architecture must be simple enough for the machine language to be easily interpretable by common, von Neumann machine languages. This approach also enables the design of portable front ends for the compilers: as the machine language is abstract, it can be easily interpreted by different (concrete) machine languages. The design of such an abstract architecture must be careful enough to balance the two, usually conflicting, requirements: the closer the machine architecture is to common architectures, the harder it is to develop compilers for it; and on the other hand, if such a machine is too complex, then while a compiler for it is easier to produce, it becomes more complicated to execute its language on normal architectures.

Abstract machines were used for various kinds of languages: they date back to the P-Code for Pascal. Starting from Landin’s SECD (Landin, 1964), many compilers for functional languages were designed this way. When logic programming languages appeared, such techniques were applied to them as well. While Prolog has gained recognition as a practical implementation of the idea of programming in logic, a method for interpreting the declarative logical statements was needed for such an implementation to be well-founded. In 1983 David Warren designed an abstract machine for the execution of Prolog, consisting of a memory architecture and a set of instructions (Warren, 1983; Aït-Kaci, 1991). Even though there were prior attempts to construct both interpreters and

compilers for Prolog, it was the Warren Abstract Machine (WAM) that gave Prolog not only a good, efficient compiler, but, perhaps more importantly, an elegant operational semantics. The WAM consists of a machine architecture, augmented by a compiler from Prolog to the instruction set of the abstract machine. The operational semantics of each instruction is defined using a low-level language that can be trivially mapped to any ordinary hardware. In fact, there is even a formal verification of the correctness of the WAM compiler (Russinoff, 1992). The WAM captures in an elegant way the essential elements of Prolog. First-order term unification is supported by special data structures and instructions of the machine architecture. Several instructions that deal with control issues implement the backtracking mechanism.

The WAM immediately became the starting point for many compiler designs for Prolog. The techniques it delineates serve not only for Prolog proper, but also for constructing compilers for related languages. To list just a few examples, abstract machine techniques were used for a parallel Prolog compiler (Hermenegildo, 1986), for variants of Prolog that use different resolution methods (Swift and Warren, 1993), that extend Prolog with types (Beierle and Meyer, 1994) or with record structures (Smolka and Treinen, 1994), and for a general theorem prover (Schumann, 1991). There have even been attempts to construct a methodology for the design of abstract machines for logic programming languages (Kursawe, 1987; Nilsson, 1993).
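To make the abstract-machine idea tangible, the following toy interpreter loop is a minimal sketch under invented assumptions: it is neither the WAM nor AMALIA, and the two instruction names are made up for this example. It only illustrates the general scheme of compiled instructions being executed over a simple memory model by a small, portable interpreter.

```python
# A toy illustration of the abstract-machine idea (not the WAM, not AMALIA):
# a "compiled program" is a list of instructions, executed over a heap and a
# set of registers.  The instruction names are invented for this sketch.

def run(program):
    heap, registers, pc = [], {}, 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "put_const":          # place a constant on the heap, record its address
            reg, value = args
            registers[reg] = len(heap)
            heap.append(value)
        elif op == "move":             # copy one register into another
            dst, src = args
            registers[dst] = registers[src]
        elif op == "halt":
            break
        pc += 1
    return heap, registers

# A two-instruction compiled program:
heap, regs = run([("put_const", "X0", 42), ("move", "X1", "X0"), ("halt",)])
```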

1.2.4 Processing HPSG

Linguistic formalisms provide means for describing the structure of natural languages; they do not specify methods for determining whether a given string is indeed a member of the language defined by a grammar, nor do they prescribe ways for computing the structure that the grammar assigns to the permissible strings. These tasks are performed by parsing algorithms. Different parsing algorithms exist for various classes of languages, both formal (see (Aho and Ullman, 1972) for a survey) and natural (see, e.g., (Gazdar and Mellish, 1989; Sikkel, 1993; Pereira and Warren, 1983)). In this work we implement a simple chart parsing algorithm; such parsers were first introduced by (Kaplan, 1973; Kay, 1973) and are widely used nowadays.

Various parsers for HPSG have been designed in the past, some of which compile their input grammars into an executable program. The first work is described in (Prudian and Pollard, 1985); it is an implementation of a very early version of HPSG. For instance, most of the features are limited to accommodate only a small set of atomic values. Rules are specified in a way reminiscent of GPSG rules. This work cannot be considered as reflecting HPSG today.

Franz has implemented an HPSG parser in LISP (Franz, 1990). This parser was designed in accord with Pollard and Sag (1987), and doesn’t cover the modifications introduced by Pollard and Sag (1994). It is rather limited, for example by allowing only tree-shaped type hierarchies to be defined – no multiple inheritance is permitted. While a specific HPSG grammar for English is a part of this implementation, the system can be used as a framework for developing different grammars. According to Franz’s reports, the parser is very slow, even when used on a limited grammar and short inputs: example sentences were parsed in 12-65 seconds.

A different implementation is HPSG-PL (Popowich and Vogel, 1991; Kodrič, Popowich, and Vogel, 1992). This system allows more complex type hierarchies to be defined; it enables the definition of grammar rules, principles and lexical rules, and an HPSG grammar for English is supplied, based on Pollard and Sag (1987). It incorporates a chart parser where the parsing algorithm makes specific use of some grammar features (e.g., HEAD-DTR), and thus the stipulation of rules does not involve explicit phrase structure. The grammar is compiled into a Prolog program where each feature structure is transformed to a fixed-arity list. Yet the performance of the parser

is rather low: according to (Popowich and Vogel, 1991), simple sentences take 1-25 seconds of CPU time to parse.

Another system that was adapted for HPSG is Unicorn (Gerdemann and Hinrichs, 1988). Having originated as a generalization of a context-free parser, this system uses Shieber’s extension to Earley’s algorithm, thus enabling the definition of various augmented context-free grammars. An HPSG grammar was defined in terms of the Unicorn framework, with some divergence from Pollard and Sag (1994), by Russell (1993). The most important aspect in which Unicorn differs from the above-mentioned parsers is that it doesn’t incorporate a typing system for feature structures at all. This system was not specifically designed with HPSG implementation in mind, and the grammar was not intended to be complete in any sense, as it was used only as part of a more complex project. We have no reports on the results of this parser for HPSG.

It is important to note that HPSG falls naturally into the class of general constraint systems, and thus the problem of providing the correct structure for an input sentence can be naturally reduced to the problem of solving a constraint system. Many general constraint solvers have been developed recently that were used for linguistic applications, including some for which HPSG grammars were designed. A typical representative is ALE (Carpenter, 1992a). Not being specifically designed for HPSG, this system is a general Attribute Logic Engine incorporating a chart parser with a formalism for specifying relations among typed feature structures in a way that enables encoding of HPSG grammars in a very natural manner. In fact, an HPSG grammar for English has been constructed in this framework by Penn (1993) that covers most of Pollard and Sag (1994). Compilation of ALE programs generates rather efficient Prolog code.

A very similar project is Troll (Gerdemann, 1993). It is a framework for processing typed feature structures, much in the same way as ALE does, albeit with a slightly different underlying theory. As Troll is still in its preliminary phases, not much is reported regarding its use. Another work, aiming at covering as many as possible of the extensions to simple unification formalisms, is CUF (Dörre and Dorna, 1993). This system is still under development. Two more general systems that were used for developing HPSG grammars are TFS (Zajac, 1992), which is a general constraint solver, and PROFIT (Erbach, 1994), which simply compiles TFS-based specifications to Prolog.

1.2.5 Computational Grammars for Hebrew

The Hebrew language poses some interesting problems for the grammar designer. The Hebrew script¹ is highly ambiguous, a fact that results in many part-of-speech tags for almost every word (Ornan, 1994). Another problem of the script is that short prepositions, articles and conjunctions are usually attached to the words that immediately succeed them, which makes it harder to parse the input sentences. In addition to these two features, the Hebrew morphology is very rich. A noun base form might have over fifteen different derivations, and a verb base form – more than thirty. All these call for some pre-processing of the input to the parser. Disambiguation of the script, as well as morphological analysis, were covered by different works (Bentur, Angel, and Segev, 1992; Choueka and Ne’eman, 1995; Ornan and Katz, 1995); some major decisions have to be taken, including the representation of the Hebrew script and the treatment of morphological analysis. As the current trend is to use constraint-based formalisms for tasks other than syntax and semantics, this is the approach we choose.

¹ We refer here to the non-vocalized script which is in everyday use, and not to the vocalized script that is used for special purposes (such as poems or children’s books) only.

At the syntactic level, Hebrew exhibits a rather free constituent order, although many constraints are placed on the order of words within constituents. The use of agreement features in Hebrew is more extensive than in, say, English. For example, nouns and adjectives must agree on number, gender and definiteness. Agreement checking becomes more complicated in coordinated constructs (see (Wintner, 1991)) and much thought must be given to the correct treatment of agreement.

There have been some attempts to provide a computational grammar for (the syntax of) the Hebrew language. The first work was done by Cohen (1984), who wrote a special software system for performing both the morphological and the syntactic analysis of Hebrew sentences. This work was very preliminary and its coverage was limited. Another preliminary work is described in (Nirenburg and Ben-Asher, 1984): it is a small-scale ATN for Hebrew, capable of recognizing very limited structures. A transformation-based grammar is suggested in (Chayen and Dror, 1976).

Unification-based formalisms were used for developing Hebrew grammars only recently. A very limited experiment was done using PATR-II (Wintner, 1992) but was later extended (Wintner and Ornan, 1991; Wintner and Ornan, 1996) to a reasonable subset of the language, on a more convenient platform: Tomita’s LR Parser/Compiler, which is based on LFG. The grammar is capable of recognizing sentences of rather wide variety and complexity, but produces only the syntactic structures of the input sentences. See (Wintner, 1991) for a detailed discussion. A different work along the same lines is (Yizhar, 1993): it uses the same framework but concentrates on the syntax of NPs in Hebrew, employing ideas from different linguistic theories. All in all, no broad-coverage, efficient, concise computational grammar for Hebrew exists.

1.3 Achievements of the Thesis

The main objective of this work was to formally define an operational semantics for a unification-based grammar formalism, suitable for specifying HPSG grammars, through the use of an abstract machine. To this end we have first conducted a theoretical investigation into the properties of such formalisms. The main contributions of this endeavor are:

• Formalization and explication of the notion of multi-rooted feature structures (MRSs) that are used implicitly in the computational linguistics literature;

• Concise definitions of a TFS-based linguistic formalism, based on abstract MRSs;

• Algebraic specification of a parsing step operator, TG,w, that induces algebraic semantics for this formalism;

• Treatment of parsing as a model for computation, assigning operational semantics to the linguistic formalism;

• Specification and correctness proofs for parsing in this framework;

• A new definition for off-line parsability, less strict than the existing one, and a termination proof for off-line parsable grammars.

This more theoretical work was presented as (Wintner and Francez, 1995b). The off-line parsability result is presented in (Wintner and Francez, to appear).

Once the theoretical background was set, we have designed AMALIA, an abstract machine for unification-based grammars. This is the first application of abstract machine techniques to

a linguistic formalism. The core engine of the machine was presented in (Wintner and Francez, 1995a), and a more detailed presentation is in preparation. The machine is accompanied by a compiler from the ALE specification language to the machine instructions, an interpreter for the machine instructions and a debugger for machine language programs. The abstract machine endows natural language grammars with an operational meaning; furthermore, its use results in highly efficient processing: the compiled grammars are executed much faster than with the existing ALE processor. Some tests we have conducted showed a speed-up of a factor of 20 in compilation time, and a factor of 5-15 in execution time.

In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using AMALIA as the development platform. This is the first application of HPSG to a Semitic language.

Another track of research we are exploring (with Evgeniy Gabrilovich) is the adaptation of AMALIA to perform natural language generation, as opposed to parsing. Based upon the algorithm of (Samuelsson, 1995), a characterization of generation with unification-based grammars can be found in (Gabrilovich, In preparation). A separate compiler is constructed, based upon AMALIA’s compiler, that transforms a grammar to an inverted, normalized form, more suited for generation. To execute the inverted grammar on AMALIA, very few modifications to the machine are needed. Once this project is completed, AMALIA will become a unified framework for processing grammars, supporting both parsing and generation.

1.4 Structure of this Document

In chapter 2 a theory of parsing with typed feature structures is presented. We start with a survey of the theory of TFSs, along the lines of Carpenter (1992b), but we extend it to multi-rooted structures in section 2.7. In particular, we discuss the computational properties of TFS-based grammars and show in section 2.10.4 a condition on grammars that guarantees termination of parsing. Chapter 3 describes the abstract machine itself, starting with its core, aimed at unifying two feature structures. In section 3.3 this engine is enveloped with control structures to accommodate parsing. We conclude this chapter with a discussion of some implementation details (section 3.5). Chapter 4 describes the HPSG-based grammar for Hebrew. Conclusions and suggestions for further research are given in chapter 5. Appendix A lists all the machine instructions, and the Hebrew grammar is listed in appendix B.

Chapter 2

Parsing with Typed Feature Structures

This chapter provides the theoretical background for the design of the machine: we discuss below the theory of typed feature structures and the details of parsing with grammars that are based upon them. Section 2.1 outlines the theory of TFSs of (Carpenter, 1991; Carpenter, 1992b). We repeat it here in order to make this document as self-contained as possible. However, the well-foundedness result (section 2.3) is an original contribution. We deviate from the presentation of Carpenter (1992b) in section 2.4, where we emphasize abstract typed feature structures (AFSs). Encoding the essential information of TFSs, AFSs were introduced by Moshier (1988), but we use a different presentation that is suited for typed feature structures. Unification is defined over AFSs rather than TFSs. Section 2.7 introduces an explicit construct of multi-rooted feature structures (MRSs) that naturally extend TFSs, used to represent phrasal signs as well as grammar rules. Abstraction is extended to MRSs and the mathematical foundations needed for manipulating them are given. The concepts of grammars and the languages they generate are formally defined in section 2.8, and the TFS-based formalism thus acquires a denotational semantics. In section 2.9 a model for computation, corresponding to bottom-up chart parsing for the formalism, is presented. The TFS-based formalism is thus endowed with an operational semantics. Next, we prove that both semantics coincide. Finally, we discuss the class of grammars for which computations terminate. We give a more relaxed definition for off-line parsability and prove that termination is guaranteed for off-line parsable grammars. The presentation is accompanied by a running example of a grammar and the parsing process it induces.

2.1 Theory of Feature Structures

The first part of this section summarizes some preliminary notions along the lines of (Carpenter, 1992b). For the following discussion we fix non-empty, finite, disjoint sets Types and Feats of types and feature names, respectively. We assume that the set Feats is totally ordered. We also fix an infinite set Nodes of nodes, disjoint of Types and Feats, each member of which is decorated by a type from Types through a fixed typing function θ : Nodes → Types. The set Nodes is ‘rich’ in the sense that for every t ∈ Types, the set {q ∈ Nodes | θ(q)= t} is infinite. Below, the metavariable T ranges over subsets of types, t – over types, f – over features and q

11 – over nodes. When dealing with partial functions the notation ‘F (x) ↓’ means that F is defined for the value x and the symbol ‘↑’ means undefinedness. Whenever the result of an application of a partial function is used as an operand, it is meant that the function is defined for its arguments.

Definition 2.1.1 (Type hierarchy) A partial order ⊑ over Types × Types is a type hierarchy (or inheritance hierarchy) if it is bounded complete, i.e., if every up-bounded subset T of Types has a (unique) least upper bound, ⊔T, referred to as the unification of the types in T. If t1 ⊑ t2 we say that t1 subsumes, or is more general than, t2; t2 is a subtype of (more specific than) t1. Let ⊥ = ⊔∅ be the most general type. Let the most specific type be ⊤ = ⊔Types. If ⊔T = ⊤ we say that T is inconsistent. Let ⊓T = ⊔{t′ | t′ ⊑ t for every t ∈ T} be the greatest lower bound of the set T.
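As an illustration of bounded completeness, the following sketch computes the unification (least upper bound) of a set of types in a finite hierarchy by intersecting upper bounds. It is illustrative only and is not the algorithm used by AMALIA; `subsumes` is an assumed oracle for the ⊑ order, and the inconsistent result ⊤ is represented here by the absence of a common upper bound among the listed types.

```python
# Illustrative computation of ⊔T in a finite, bounded complete type hierarchy.
# `subsumes(t1, t2)` is an assumed helper deciding t1 ⊑ t2 (t1 at least as
# general as t2); `all_types` is the finite set of types, excluding ⊤.

def lub(types, all_types, subsumes):
    # Common upper bounds of all the types in `types`.
    uppers = [u for u in all_types if all(subsumes(t, u) for t in types)]
    if not uppers:
        return None            # no consistent common subtype: unification to ⊤
    # Bounded completeness guarantees a least element among the upper bounds.
    for u in uppers:
        if all(subsumes(u, v) for v in uppers):
            return u
    return None                # unreachable if the hierarchy is bounded complete
```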

Definition 2.1.2 (Feature structures) A (typed) feature structure (TFS) is a directed, connected, labeled graph consisting of a finite, nonempty set of nodes Q ⊆ Nodes, a root q¯ ∈ Q, and a partial function δ : Q × Feats → Q specifying the arcs such that every node q ∈ Q is accessible from q¯.

The nodes of a feature structure are thus labeled by types while the arcs are labeled by feature names. The root q̄ is a distinguished node from which all other nodes are reachable. A feature structure is of type t when θ(q̄) = t. When we say that a feature structure A exists we mean that no node of A is typed ⊤. Let FS be the collection of all feature structures over the given Feats and Types. We use upper-case letters (with or without tags, subscripts etc.) to refer to feature structures. We use Q, q̄, δ (with the same tags or subscripts) to refer to constituents of feature structures. Figure 2.1 depicts an example feature structure, represented both as an Attribute-Value Matrix (AVM) and as a graph.

(Graph representation and AVM representation of the example feature structure.)

Figure 2.1: A feature structure

Note that all feature structures are, by definition, graphs. Some grammatical formalisms used to have a special kind of feature structures, namely atoms; atoms are represented in our framework as nodes with no outgoing edges. For a discussion regarding the implications of such an approach, refer to (Carpenter, 1992b, Chapter 8).
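For readers who prefer to see the formal components spelled out, the following sketch writes out the feature structure of figure 2.1 as an explicit typing function θ and arc function δ. The node names q0–q4 are arbitrary labels invented for this illustration and are not part of the theory.

```python
# The feature structure of Figure 2.1 in the (Q, q̄, δ) presentation, with the
# typing function θ given as a dictionary.  Node names are arbitrary labels.

theta = {"q0": "phrase", "q1": "s", "q2": "head", "q3": "agr", "q4": "head"}
q_bar = "q0"                              # the root
delta = {                                 # (node, feature) -> node
    ("q0", "SYN"):  "q1",
    ("q0", "SUBJ"): "q2",
    ("q0", "HEAD"): "q4",
    ("q2", "AGR"):  "q3",
    ("q4", "AGR"):  "q3",                 # reentrancy: both AGR arcs reach one node
}
```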

12 Definition 2.1.3 (Appropriateness) An appropriateness specification over the type hier- archy and the set Feats is a partial function Approp : Feats × Types → Types, such that:

• let Tf = {t ∈ Types | Approp(f, t)↓}; then for every f ∈ Feats, Tf ≠ ∅ and ⊓Tf ∈ Tf.

• if Approp(f, t1)↓ and t1 ⊑ t2 then Approp(f, t2)↓ and Approp(f, t1) ⊑ Approp(f, t2).

I.e., every feature is introduced by some most general type, and is appropriate for all its subtypes; and if the appropriate type for a feature in t1 is some type t, then the appropriate type of the same feature in t2, which is a subtype of t1, must be at least as specific as t. If Approp(f, t)↓ we say that f is appropriate for t and that Approp(f, t) is the appropriate type for the feature f in the type t. The set of features appropriate for some type is ordered (since Feats is ordered).

Definition 2.1.4 (Well-typed feature structures) A feature structure (Q, q̄, δ) is well typed iff for every q ∈ Q, θ(q) ≠ ⊤ and for all f ∈ Feats and q ∈ Q, if δ(q, f)↓ then Approp(f, θ(q))↓ and Approp(f, θ(q)) ⊑ θ(δ(q, f)); i.e., if an arc labeled f connects two nodes, then f is appropriate for the type of the source node, and the appropriate type for f in the type of the source node subsumes the type of the target node.

Definition 2.1.5 (Total well-typedness) A feature structure is totally well-typed iff it is well typed and for all f ∈ Feats and q ∈ Q, if Approp(f,θ(q))↓ then δ(q,f)↓. i.e., every feature which is appropriate for the type labeling some node labels an outgoing arc from that node.
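The two typing disciplines just defined can be paraphrased as executable checks. The sketch below is illustrative only: `approp` and `subsumes` are assumed helpers standing in for Approp and ⊑, and the requirement that no node is typed ⊤ is omitted for brevity.

```python
# Illustrative checks of well-typedness (Definition 2.1.4) and total
# well-typedness (Definition 2.1.5) over the dictionary encoding used above.
# approp(f, t) returns the appropriate type of feature f at type t, or None
# when f is not appropriate for t; subsumes(t1, t2) decides t1 ⊑ t2.

def well_typed(theta, delta, approp, subsumes):
    for (q, f), target in delta.items():
        a = approp(f, theta[q])
        if a is None or not subsumes(a, theta[target]):
            return False              # f not appropriate, or target type too general
    return True

def totally_well_typed(theta, delta, approp, subsumes, feats):
    if not well_typed(theta, delta, approp, subsumes):
        return False
    for q, t in theta.items():        # every appropriate feature must label an arc
        for f in feats:
            if approp(f, t) is not None and (q, f) not in delta:
                return False
    return True
```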

Definition 2.1.6 (Appropriateness loops) The appropriateness specification contains a loop if there exist t1,t2,...,tn ∈ Types such that for every i, 1 ≤ i ≤ n, there is a feature fi ∈ Feats such that Approp(fi,ti)= ti+1, where tn+1 = t1.

Definition 2.1.7 (Paths) A path is a finite sequence of feature names, and the set Paths = Feats∗ is the collection of paths. We use π, α (with or without subscripts) to refer to paths. ǫ is the empty path. The definition of δ is extended to paths in the natural way:

δ(q, ǫ) = q
δ(q, fπ) = δ(δ(q, f), π)

The paths of a feature structure A are Π(A) = {π | π ∈ Paths and δ(q̄A, π)↓}.

Definition 2.1.8 (Cycles) A feature structure A = (Q, q,δ¯ ) is cyclic if there exist a non-empty path α ∈ Paths and a node q ∈ Q such that δ(q, α)= q. It is acyclic otherwise.

Definition 2.1.9 (Path values) The value of a path π in a feature structure A = (Q, q̄, δ), denoted by val(A, π), is non-trivial if and only if δ(q̄, π)↓, in which case it is a feature structure A′ = (Q′, q̄′, δ′), where:

• q̄′ = δ(q̄, π)

• Q′ = {q′ | there exists a path π′ such that δ(q̄′, π′) = q′} (Q′ is the set of nodes reachable from q̄′)

• for every feature f and for every q′ ∈ Q′, δ′(q′, f) = δ(q′, f) (δ′ is the restriction of δ to Q′)

If δ(q̄, π)↑, val(A, π) is defined to be a single node whose type is ⊤.

Definition 2.1.10 (Reentrancy) A feature structure A is reentrant iff there exist two different paths π1, π2 such that δ(¯q, π1) = δ(¯q, π2). In this case the two paths are said to share the same value.
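Definitions 2.1.7 and 2.1.9 translate directly into a path-following procedure. The following sketch uses the same hypothetical dictionary encoding as above; it returns None where the text writes ↑, and otherwise extracts the substructure rooted at the node the path reaches.

```python
# A sketch of δ extended to paths (Definition 2.1.7) and of path values
# (Definition 2.1.9) over the dictionary encoding used earlier.

def delta_path(delta, q, path):
    for f in path:                     # path is a sequence of feature names
        q = delta.get((q, f))
        if q is None:
            return None                # corresponds to ↑ (undefined)
    return q

def val(theta, q_bar, delta, path):
    root = delta_path(delta, q_bar, path)
    if root is None:
        return None                    # stands in for the single ⊤-typed node
    # Collect the nodes reachable from the new root and restrict δ to them.
    reachable, stack = {root}, [root]
    while stack:
        q = stack.pop()
        for (p, f), target in delta.items():
            if p == q and target not in reachable:
                reachable.add(target)
                stack.append(target)
    sub_delta = {(p, f): t for (p, f), t in delta.items() if p in reachable}
    sub_theta = {q: theta[q] for q in reachable}
    return sub_theta, root, sub_delta
```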

2.2 Subsumption

Definition 2.2.1 (Subsumption) A1 = (Q1, q¯1,δ1) subsumes A2 = (Q2, q¯2,δ2) (denoted by A1 ⊑ A2) iff there exists a total function h : Q1 → Q2, called a subsumption morphism, such that

• h(¯q1)=¯q2

• for every q ∈ Q1, θ(q) ⊑ θ(h(q))

• for every q ∈ Q1 and for every f such that δ1(q,f)↓, h(δ1(q,f)) = δ2(h(q),f).

A1 ❁ A2 iff A1 ⊑ A2 and A1 ≠ A2. h associates with every node in Q1 a node in Q2 with at least as specific a type; moreover, if an arc labeled f connects q with q′, then such an arc connects h(q) with h(q′). If A ⊑ B then every path defined in A is defined in B, and if two paths are reentrant in A they are reentrant in B.

Lemma 2.2.2 If A ⊑ B then Π(A) ⊆ Π(B).

Lemma 2.2.3 If A ⊑ B then for every π1, π2 ∈ Π(A), δA(¯qA, π1) = δA(¯qA, π2) implies that δB(¯qB , π1)= δB(¯qB , π2).
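Since a subsumption morphism, if it exists, is forced by the requirement h(δ1(q, f)) = δ2(h(q), f), it can be constructed by a single traversal from the root. The sketch below is an illustrative decision procedure for ⊑ under the same assumed dictionary encoding; `type_subsumes` is an assumed oracle for the type order.

```python
# A sketch of a subsumption test following Definition 2.2.1: build h from the
# root down.  Each feature structure is a (theta, root, delta) triple in the
# encoding used above.

from collections import deque

def subsumed_by(fs1, fs2, type_subsumes):
    (theta1, root1, delta1), (theta2, root2, delta2) = fs1, fs2
    h = {root1: root2}
    queue = deque([root1])
    while queue:
        q = queue.popleft()
        if not type_subsumes(theta1[q], theta2[h[q]]):
            return False                    # node type not general enough
        for (p, f), target in delta1.items():
            if p != q:
                continue
            image = delta2.get((h[q], f))
            if image is None:
                return False                # arc has no counterpart in fs2
            if target in h and h[target] != image:
                return False                # reentrancy in fs1 not respected in fs2
            if target not in h:
                h[target] = image
                queue.append(target)
    return True
```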

2.3 Well-Foundedness of Subsumption

Definition 2.3.1 A partial order ≻ on D is well-founded iff there exists no infinite decreasing sequence d0 ≻ d1 ≻ d2 ≻ ... of elements of D.

We prove below that subsumption of TFSs is well-founded iff they are acyclic.

Lemma 2.3.2 A TFS A is cyclic iff Π(A) is infinite.

Proof: If A is cyclic, there exist a node q ∈ Q and a non-empty path α such that δ(q, α) = q. Let π be such that q = δ(q̄, π); then the infinite set of paths {παi | i ≥ 0} is contained in Π(A). If Π(A) is infinite then, since Q is finite, there exists a node q ∈ Q such that δ(q, πi)↓ for an infinite number of different paths πi. Since Feats is finite, the out-degree of every node in Q is finite; hence q must be part of a cycle.

14 Definition 2.3.3 (Rank) Let r : Types → IN be a total function such that r(t) < r(t′) if t ❁ t′.

For an acyclic TFS A, let ∆(A) = |Π(A)| − |QA| and let Θ(A) = Σ_{π ∈ Π(A)} r(θ(δ(q̄, π))). Define a rank for acyclic TFSs: rank(A) = ∆(A) + Θ(A).

By lemma 2.3.2, rank is well defined for acyclic TFSs. ∆(A) can be thought of as the number of reentrancies in A, or the number of different paths that lead to the same node in A. For every acyclic TFS A, ∆(A) ≥ 0 and hence rank(A) ≥ 0.
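For acyclic structures the rank is directly computable. The following sketch enumerates Π(A) and evaluates ∆(A) + Θ(A); the numeric function `r` is an assumed parameter that must be strictly monotone with respect to the type order, as definition 2.3.3 requires.

```python
# A sketch of rank(A) = Δ(A) + Θ(A) for an acyclic TFS in the dictionary
# encoding used earlier.  `r` maps types to natural numbers, strictly
# increasing along the subsumption order.

def all_paths(delta, q_bar):
    """Enumerate Π(A) (with the node each path reaches) for an acyclic structure."""
    paths = [((), q_bar)]
    frontier = [((), q_bar)]
    while frontier:
        path, q = frontier.pop()
        for (p, f), target in delta.items():
            if p == q:
                ext = (path + (f,), target)
                paths.append(ext)
                frontier.append(ext)
    return paths

def rank(theta, q_bar, delta, r):
    paths = all_paths(delta, q_bar)
    delta_A = len(paths) - len(theta)              # Δ(A) = |Π(A)| − |Q_A|
    theta_A = sum(r(theta[q]) for _, q in paths)   # Θ(A) = Σ r(θ(δ(q̄, π)))
    return delta_A + theta_A
```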

Lemma 2.3.4 If A ❁ B and both are acyclic then rank(A) < rank(B).

Proof: Assume A ❁ B and both are acyclic; hence by lemma 2.2.2, Π(A) ⊆ Π(B) and by lemma 2.3.2 both are finite. Let h : QA → QB be a subsumption morphism.

1. If Π(A) = Π(B) then |Π(A)| = |Π(B)|. A ❁ B, hence either there exists a node q ∈ QA such that θ(q) ❁ θ(h(q)), and hence Θ(A) < Θ(B) (while ∆(A) ≤ ∆(B)); or (by lemma 2.2.3) there exist two paths π1, π2 such that δA(q̄A, π1) ≠ δA(q̄A, π2), but δB(q̄B, π1) = δB(q̄B, π2), in which case ∆(A) < ∆(B) (while Θ(A) ≤ Θ(B)). In any case, rank(A) < rank(B).

2. If Π(A) ⊂ Π(B) then |Π(A)| < |Π(B)|; as above, Θ(A) < Θ(B). However, it might be the case that |QA| < |QB|. But for every node q ∈ QB that is not the image of any node in QA, there exists a path π such that δ(q̄B, π) = q and π ∉ Π(A). Hence |Π(A)| − |QA| ≤ |Π(B)| − |QB|, and rank(A) < rank(B).

Theorem 2.3.5 Subsumption of TFSs is not well-founded.

Proof: Consider the infinite sequence of TFSs A0, A1, . . . depicted graphically in figure 2.2. For every i ≥ 0, Ai ❂ Ai+1: to see this, consider the morphism h that maps q̄i+1 to q̄i and δi+1(q, f) to δi(h(q), f) (i.e., the first i + 1 nodes of Ai+1 are mapped to the first i + 1 nodes of Ai, and the additional node of Ai+1 is mapped to the last node of Ai). Thus there exists a decreasing infinite sequence of cyclic TFSs and subsumption is not well-founded.

Theorem 2.3.6 Subsumption of acyclic TFSs is well-founded.

Proof: For every acyclic TFS A, rank(A) is finite and rank(A) ≥ 0. By lemma 2.3.4, if A ❁ B then rank(A) < rank(B). If an infinite decreasing sequence of acyclic TFSs existed, rank would have mapped them to an infinite decreasing subsequence of IN, which is a contradiction. Hence subsumption is well-founded.

2.4 Abstract Feature Structures

The essential properties of a feature structure, excluding the identities of its nodes, can be captured by three components: the set of paths, the type that is assigned to every path, and the sets of paths that lead to the same node. In this section we elaborate on ideas presented in (Moshier and Rounds, 1987; Moshier, 1988); in contrast to the approach pursued in (Carpenter, 1992b), we first define abstract feature structures and then show their relation to concrete ones. The representation of graphs as sets of paths is inspired by works on the semantics of concurrent programming languages, and the notion of fusion-closure is due to (Emerson, 1983).

(Figure 2.2, not reproduced here, depicts the cyclic TFSs A0, A1, A2, A3, . . . used in the proof of theorem 2.3.5: each Ai is a chain of nodes of type t connected by f-arcs and closed by an f-cycle, and each Ai+1 extends Ai by one additional node.)

Figure 2.2: An infinite decreasing sequence of TFSs

Definition 2.4.1 (Alphabetic variants) Two feature structures A and B are alphabetic vari- ants (A ∼ B) iff A ⊑ B and B ⊑ A.

Alphabetic variants have exactly the same structure, and corresponding nodes have the same types. Only the identities of the nodes distinguish them.

Definition 2.4.2 (Abstract feature structures) A pre-abstract feature structure (pre-AFS) is a triple ⟨Π, Θ, ≈⟩, where

• Π ⊆ Paths is a non-empty set of paths

• Θ : Π → Types is a total function, assigning a type to every path

• ≈ ⊆ Π × Π is a relation specifying reentrancy.

An abstract feature structure (AFS) is a pre-AFS A for which the following requirements hold:

• Π is prefix-closed: if πα ∈ Π then π ∈ Π (where π, α ∈ Paths)

• A is fusion-closed: if πα ∈ Π and π′α′ ∈ Π and π ≈ π′ then πα′ ∈ Π (as well as π′α ∈ Π) and πα′ ≈ π′α′ (as well as π′α ≈ πα)

• ≈ is an equivalence relation with a finite index (with [≈] the set of its equivalence classes)

• Θ respects the equivalence: if π1 ≈ π2 then Θ(π1) = Θ(π2).

An AFS ⟨Π, Θ, ≈⟩ is well-typed if Θ(π) ≠ ⊤ for every π ∈ Π and if πf ∈ Π then Approp(f, Θ(π))↓ and Approp(f, Θ(π)) ⊑ Θ(πf). It is totally well typed if, in addition, for every π ∈ Π, if Approp(f, Θ(π))↓ then πf ∈ Π. Abstract feature structures can be related to concrete ones in a natural way: If A = (Q, q̄, δ) is a TFS then Abs(A) = ⟨ΠA, ΘA, ≈A⟩ is defined by:

• ΠA = {π | δ(¯q, π)↓}

• ΘA(π)= θ(δ(¯q, π))

• π1 ≈A π2 iff δ(¯q, π1)= δ(¯q, π2).
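The Abs construction can be read off these three clauses almost verbatim. The sketch below is illustrative and handles acyclic structures only; applied to the dictionary encoding of figure 2.1 given earlier, it yields exactly the abstraction shown in figure 2.3.

```python
# A sketch (acyclic structures only) of the Abs construction defined above:
# collect the paths, the type at the end of each path, and the reentrancy
# relation induced by paths that lead to the same node.

def abstract(theta, q_bar, delta):
    paths = {(): q_bar}                      # Π_A, with the node each path reaches
    frontier = [((), q_bar)]
    while frontier:
        path, q = frontier.pop()
        for (p, f), target in delta.items():
            if p == q:
                new = path + (f,)
                paths[new] = target
                frontier.append((new, target))
    Pi = set(paths)
    Theta = {pi: theta[paths[pi]] for pi in Pi}          # Θ_A(π) = θ(δ(q̄, π))
    reentrant = {(p1, p2) for p1 in Pi for p2 in Pi
                 if paths[p1] == paths[p2]}              # π1 ≈_A π2
    return Pi, Theta, reentrant
```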

Lemma 2.4.3 If A is a feature structure then Abs(A) is an abstract feature structure.

Proof:

1. Π is prefix-closed: Π = {π | δ(¯q, π)↓}. If πα ∈ Π then δ(¯q,πα)↓ and by the definition of δ, δ(¯q, π)↓, too. 2. Abs(A) is fusion-closed: Suppose that πα ∈ Π, π′α′ ∈ Π and π ≈ π′. Then δ(¯q, π)= δ(¯q, π′). Hence δ(¯q,πα′)↓ (therefore πα′ ∈ Π), and δ(¯q,πα′) = δ(π′α′), therefore πα′ ≈ π′α′. In the same way, π′α ∈ Π and π′α ≈ π′α′.

3. ≈ is an equivalence relation with a finite index: π1 ≈ π2 iff δ(¯q, π1)= δ(¯q, π2), namely iff π1 and π2 lead to the same node (fromq ¯) in A. Hence ≈ is an equivalence relation and since Q is finite, ≈ has a finite index.

4. Θ respects the equivalence: Θ(π)= θ(δ(¯q, π)) and if π1 ≈ π2 then δ(¯q, π1)= δ(¯q, π2), hence Θ(π1)=Θ(π2). For the reverse direction, consider an AFS A = hΠ, Θ, ≈i. First construct a ‘pseudo-TFS’, Conc(A) = (Q, q,δ¯ ), that differs from a TFS only in that its nodes are not drawn from the set Nodes. Let Q = {q[π] | [π] ∈ [≈]}, making use of the fact that ‘≈’ is of finite index. Let θ(q[π]) = Θ(π) for every node – since A is an AFS, Θ respects the equivalence and therefore θ is representative-independent. Letq ¯ = q[ǫ] and δ(q[π],f) = q[πf] for every node q[π] and feature f. Since A is fusion-closed, δ is representative-independent. By injecting Q into Nodes, making use of the richness of Nodes, a concrete TFS Conc(A) is obtained, representing the equivalence class of alphabetic variants that can be obtained that way. We abuse the notation Conc(A) in the sequel to refer to this set of alphabetic variants. Figure 2.3 depicts the abstraction of the example feature structure of figure 2.1.

Theorem 2.4.4 If A′ ∈ Conc(A) then Abs(A′)= A.

Proof: Let A = ⟨ΠA, ΘA, ≈A⟩, A′ = (Q, q̄, δ), Abs(A′) = ⟨Π, Θ, ≈⟩. If A′ ∈ Conc(A) then Q can be mapped by a one-to-one function to the set of equivalence classes of ≈A and δ determines the paths in ΠA. By the definition of Abs, Π = ΠA. Given a path π ∈ ΠA, Θ(π) = θ(δ(q̄, π)) = θ(q[π]) = ΘA(π). If π1 ≈A π2 then δ(q̄, π1) = δ(q̄, π2) (since A is fusion-closed) and hence π1 ≈ π2.

AFSs can be partially ordered: ⟨ΠA, ΘA, ≈A⟩ ⪯ ⟨ΠB, ΘB, ≈B⟩ iff ΠA ⊆ ΠB, ≈A ⊆ ≈B and for every π ∈ ΠA, ΘA(π) ⊑ ΘB(π). This order corresponds to the subsumption ordering on TFSs, as the following theorems show.

Π = {ǫ, SYN, SUBJ, SUBJ AGR, HEAD, HEAD AGR}
Θ: ǫ ↦ phrase, SYN ↦ s, SUBJ ↦ head, SUBJ AGR ↦ agr, HEAD ↦ head, HEAD AGR ↦ agr
≈: SUBJ AGR ≈ HEAD AGR (every other path is equivalent only to itself)

Figure 2.3: An abstract feature structure

Theorem 2.4.5 A ⊑ B iff Abs(A)  Abs(B).

Proof: Let Abs(A) = hΠA, ΘA, ≈Ai,Abs(B) = hΠB, ΘB, ≈Bi. Assume that A ⊑ B, that is, a subsumption morphism h : QA → QB exists. If π ∈ ΠA then (from the definition of Abs(A)) δA(¯qA, π) ↓, that is, there exists a sequence q0, q1,...,qn of nodes and a sequence f1,...,fn of features such that for every i, 0 ≤ i < n,δA(qi,fi+1) = qi+1, q0 =q ¯A and π = f1 ··· fn. Due to the subsumption morphism, there exists a sequence of nodes h(q0),...,h(qn) such that δB(h(qi),fi+1)= h(qi+1) for every i, 0 ≤ i

Theorem 2.4.6 For every A ∈ Conc(A′),B ∈ Conc(B′), A ⊑ B iff A′  B′.

Proof: Select some A ∈ Conc(A′), B ∈ Conc(B′). If A ⊑ B then, by theorem 2.4.5, Abs(A) ⪯ Abs(B). By the definition of Conc, Abs(A) = A′ and Abs(B) = B′, so that A′ ⪯ B′.
If A′ ⪯ B′, construct a function h : QA → QB as follows: First, let h(q̄A) = q̄B. Then, perform a depth-first search on the graph A and for every node q′ = δA(q, f) encountered, if h(q′)↑ set h(q′) = δB(h(q), f). The order of the search is irrelevant: since A′ ⪯ B′, ≈A′ ⊆ ≈B′ and therefore if π1 ≈A′ π2 then π1 ≈B′ π2. Since A′ ⪯ B′, ΠA′ ⊆ ΠB′ and hence δB(h(q), f) is defined whenever δA(q, f) is defined. Hence h is total and h(q̄A) = q̄B. For every node q ∈ QA, some path π exists that leads from q̄A to q and from q̄B to h(q). ΘA(π) ⊑ ΘB(π), and therefore θ(q) ⊑ θ(h(q)). Hence h is a subsumption morphism.

Corollary 2.4.7 A ∼ B iff Abs(A)= Abs(B).

Proof: Immediate from theorem 2.4.5.

Corollary 2.4.8 Conc(A) ∼ Conc(B) iff A = B.

Proof: Immediate from theorem 2.4.6.

18 2.5 Unification

As there exists a one to one correspondence between abstract feature structures and (alphabetic variants of) concrete ones, we define unification directly over AFSs. This leads to a simpler definition that captures the essence of the operation better than the traditional definition. We use the term ‘unification’ to refer to both the operation and its result.

Lemma 2.5.1 If A = hΠA, ΘA, ≈Ai is a pre-AFS then there exists a pre-AFS B = hΠB , ΘB, ≈Bi such that B is the least extension of A to a fusion-closed structure and ΘB(π)=ΘA(π) for every π ∈ ΠA.

Lemma 2.5.2 If A = hΠA, ΘA, ≈Ai is a pre-AFS then there exists a pre-AFS B = hΠB , ΘB, ≈Bi such that ΠA = ΠB, ΘA =ΘB and ≈B is the least extension of ≈A to an equivalence relation.

Definition 2.5.3 (Closure operations) Let Cl be a fusion-closure operation on pre-AFSs: Cl(A)= A′, where A′ is the least extension of A to a fusion-closed structure. Let Eq(hΠ, Θ, ≈i)= hΠ, Θ, ≈′i) where ≈′ is the least extension of ≈ to an equivalence relation. Let Ty(hΠ, Θ, ≈i) = ′ ′ hΠ, Θ , ≈i where Θ (π)= π′≈π Θ(π). F Definition 2.5.4 (Unification) The unification A ⊔ B of two AFSs A = hΠA, ΘA, ≈Ai and B = ′ hΠB , ΘB, ≈Bi is the AFS C = Ty(Eq(Cl(C))), where:

• C = ⟨ΠC, ΘC, ≈C⟩

• ΠC = ΠA ∪ ΠB

• ΘC(π) = ΘA(π) ⊔ ΘB(π) if π ∈ ΠA and π ∈ ΠB; ΘC(π) = ΘA(π) if π ∈ ΠA only; ΘC(π) = ΘB(π) if π ∈ ΠB only

• ≈C = ≈A ∪ ≈B

The unification fails if there exists a path π ∈ ΠC′ such that ΘC′ (π)= ⊤.
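To make the role of the closure operations concrete, the following is a rough Python sketch of definition 2.5.4, not the implementation described in this thesis. It assumes that structures, including the result, are acyclic (so path sets stay finite), that an AFS is encoded as a triple of a path set, a type assignment and a set of path pairs, and that a hypothetical function lub computes the least upper bound of two types, returning the value 'TOP' when they are inconsistent.

# A rough sketch of AFS unification (definition 2.5.4).  An AFS is a triple
# (paths, theta, eq): a set of feature paths (tuples of feature names), a dict
# mapping each path to a type, and a set of ordered path pairs encoding the
# reentrancy relation.  `lub` is an assumed type-join function.
def unify_afs(A, B, lub):
    paths = set(A[0]) | set(B[0])                      # Pi_C = Pi_A union Pi_B
    theta = {}                                         # Theta_C, pointwise
    for p in paths:
        if p in A[1] and p in B[1]:
            theta[p] = lub(A[1][p], B[1][p])
        else:
            theta[p] = A[1][p] if p in A[1] else B[1][p]
    eq = set(A[2]) | set(B[2])                         # the relation of C

    changed = True
    while changed:                                     # Cl, Eq and Ty as one fixpoint
        changed = False
        # Cl: fusion-closure -- copy extensions across reentrant paths
        for (p1, p2) in list(eq):
            for q in list(paths):
                for (x, y) in ((p1, p2), (p2, p1)):
                    if q[:len(x)] == x:
                        new = y + q[len(x):]
                        if new not in paths:
                            paths.add(new)
                            theta[new] = theta[q]
                            changed = True
                        if (q, new) not in eq:
                            eq.add((q, new))
                            changed = True
        # Eq: reflexive, symmetric and transitive closure of the relation
        for p in paths:
            if (p, p) not in eq:
                eq.add((p, p)); changed = True
        for (p1, p2) in list(eq):
            if (p2, p1) not in eq:
                eq.add((p2, p1)); changed = True
        for (p1, p2) in list(eq):
            for (p3, p4) in list(eq):
                if p2 == p3 and (p1, p4) not in eq:
                    eq.add((p1, p4)); changed = True
        # Ty: force Theta to respect the equivalence classes
        for (p1, p2) in eq:
            t = lub(theta[p1], theta[p2])
            if theta[p1] != t or theta[p2] != t:
                theta[p1] = theta[p2] = t
                changed = True

    if any(t == 'TOP' for t in theta.values()):
        return None                                    # the unification fails
    return paths, theta, eq

The three phases of the loop mirror Cl, Eq and Ty; a practical implementation would of course operate on node graphs with a union-find structure rather than on explicit path sets.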

Lemma 2.5.5 Cl preserves prefixes: If A is a prefix-closed pre-AFS and A′ = Cl(A) then A′ is prefix-closed.

Proof: Let π be a path in Π′. If π ∈ Π then every prefix of π is in Π′, since Π is prefix-closed and Cl only adds paths. Suppose that π ∈ Π′ \ Π. Then there exist π1, π2, α1, α2 ∈ Paths such that π1α1 ∈ Π, π2α2 ∈ Π, π1 ≈ π2 and π = π1α2 (otherwise, π could be removed from Π′, in contradiction to the minimality of Cl). If π′ is a prefix of π then either π′ is a prefix of π1, in which case π′ ∈ Π′ since Π is prefix-closed, or π′ = π1α′ for some α′ that is a prefix of α2. Since Π is prefix-closed, π2α′ ∈ Π; therefore, as π1 ≈ π2, π1α′ is added to Π′ by the closure operation.

Lemma 2.5.6 Eq preserves prefixes and fusions: If A is a prefix- and fusion-closed pre-AFS and A′ = Eq(A) then A′ is prefix- and fusion-closed.

Proof: Eq extends ≈ to an equivalence relation. Since only ≈ is modified, prefix-closure is trivially maintained. Select a pair (π1, π2) ∈ ≈′ \ ≈. Then either (1) π2 = π1; (2) π2 ≈ π1; or (3) there exists a path π3 such that π1 ≈ π3 and π3 ≈ π2. Trivially, (1) and (2) preserve the closure properties. In the case of (3), to show that fusion-closure is maintained we have to show that if π1α1 ∈ Π′ and π2α2 ∈ Π′ then π1α2 ∈ Π′ and π1α2 ≈′ π2α2. Since Π′ = Π, π1α1 ∈ Π and π2α2 ∈ Π. Since Π is fusion-closed and π2 ≈ π3, π3α2 ∈ Π and π3α2 ≈ π2α2. Since π1 ≈ π3, π1α2 ∈ Π and π1α2 ≈ π3α2, too. ≈′ is an extension of ≈ to an equivalence relation, and thus π1α2 ≈′ π2α2.

Corollary 2.5.7 If A and B are AFSs, then so is A ⊔ B.

Proof: If A and B are AFSs then the pre-AFS C, defined as in 2.5.4, is prefix-closed (since A and B are). Cl(C) is prefix- and fusion-closed, as is Eq(Cl(C)), in which, additionally, ≈ is an equivalence relation. Ty(Eq(Cl(C))) is an AFS, since Ty only modifies Θ so that it respects the equivalences.

C′ is the smallest AFS that contains ΠC and ≈C. Since ΠA and ΠB are prefix-closed, so is ΠC. However, ΠC and ≈C might not be fusion-closed; this is why Cl is applied to them. As a result of its application, new paths and equivalence classes might be added. By lemma 2.5.5, if a path is added then all its prefixes are added, too, so prefix-closure is preserved. Then Eq extends ≈ to an equivalence relation, without harming the prefix- and fusion-closure properties (by lemma 2.5.6). Finally, Ty sees to it that Θ respects the equivalences.

Lemma 2.5.8 Unification is commutative: A ⊔ B = B ⊔ A.

Proof: Observe that unification is defined using set union (∪) and type unification (⊔) which are commutative. Therefore, the unification is commutative, too.

Lemma 2.5.9 Unification is associative: (A ⊔ B) ⊔ C = A ⊔ (B ⊔ C).

Proof: As above.

The result of a unification can differ from any of its arguments in three ways: paths that were not present can be added; the types of nodes can become more specific; and reentrancies can be added, that is, the number of equivalence classes of paths can decrease. Consequently, the result of a unification is always at least as specific as any of its arguments.

Theorem 2.5.10 If C′ = A ⊔ B then A ⪯ C′.

Proof: ΠC = ΠA ∪ ΠB and hence ΠA ⊆ ΠC. ≈C = ≈A ∪ ≈B and hence ≈A ⊆ ≈C. If π ∈ ΠA then ΘC(π) = ΘA(π) or ΘC(π) = ΘA(π) ⊔ ΘB(π), and in either case ΘA(π) ⊑ ΘC(π). Cl and Eq cannot remove paths or equivalences and Ty only makes types more specific, and therefore A ⪯ C′.

Theorem 2.5.11 A ⊔ B = A iff B ⪯ A.

Proof: Suppose B ⪯ A. Then ΠB ⊆ ΠA, ≈B ⊆ ≈A and for every π ∈ ΠB, ΘB(π) ⊑ ΘA(π). A ⊔ B = Ty(Eq(Cl(C))) where C = ⟨ΠC, ΘC, ≈C⟩ and

• ΠC = ΠA ∪ ΠB = ΠA

• ΘC(π) = ΘA(π) ⊔ ΘB(π) if π ∈ ΠA and π ∈ ΠB; ΘC(π) = ΘA(π) if π ∈ ΠA only; ΘC(π) = ΘB(π) if π ∈ ΠB only. In all cases ΘC(π) = ΘA(π).

• ≈C = ≈A ∪ ≈B = ≈A

Hence A = C and therefore A ⊔ B = A.

Suppose A ⊔ B = A and assume toward a contradiction that B ⋠ A. Then at least one of the following cases holds:

• ΠB ⊈ ΠA. Then there exists π ∈ ΠB ⊆ ΠC such that π ∉ ΠA, and hence A ⊔ B ≠ A.

• There exists some π such that ΘB(π) ⋢ ΘA(π). Then ΘA(π) ⊔ ΘB(π) ≠ ΘA(π) and hence A ⊔ B ≠ A.

• ≈B ⊈ ≈A. Then there exist π1, π2 such that π1 ≈B π2 but not π1 ≈A π2. Hence (≈A ∪ ≈B) ≠ ≈A and A ⊔ B ≠ A.

TFSs (and therefore AFSs) can be seen as a generalization of first-order terms (FOTs) (see (Carpenter, 1991)). Accordingly, AFS unification resembles FOT unification; however, the notion of substitution that is central to the definition of FOT unification is missing here, and, as far as we know, no analog of substitutions in the domain of feature structures has ever been presented.

2.6 A Linear Representation of Feature Structures

Representing feature structures as either graphs or attribute-value matrices is cumbersome; we now define a linear representation for feature structures, based upon Aït-Kaci’s ψ-terms (though the order relation we use is reversed).

Definition 2.6.1 (Arity) The arity of a type t is the number of features appropriate for it, i.e. |{f | Approp(f,t)↓}|.

Note that in every totally well-typed feature structure of type t the number of edges leaving the root is exactly the arity of t. Consequently, we use the term ‘arity’ for (totally well-typed) feature structures: the arity of a feature structure of type t is defined to be the arity of t. In order to define the set of well-formed linear terms over Feats and Types, we assume that the feature names are ordered in a fixed order. Let { i | i is a natural number} be the set of tags.

Definition 2.6.2 (Terms) A term τ of type t is an expression of the form i t(τ1,...,τn) where i is a tag, n ≥ 0 and every τi is a term of some type.

Definition 2.6.3 (Totally well-typed terms) A term τ = i t(τ1, ..., τn) of type t is totally well-typed iff:

• t is a type of arity n;

• the appropriate features for the type t are f1,...,fn, in this order;

• for every i, 1 ≤ i ≤ n, Approp(fi,t)= ti;

• for every i, 1 ≤ i ≤ n, if τi is a term of type t′i then either ti ⊑ t′i or t′i = ⊥

We distinguish tags that appear in terms according to the type they are attached to: if a sub-term consists of a tag and the type ⊥, we say that the tag is independent; otherwise, the tag is dependent. We will henceforth consider only terms that are normal in the following sense:

Definition 2.6.4 (Normal terms) A totally well-typed term ψ = i t(τ1, ..., τn) is normal iff:

• t ≠ ⊤;

• if a tag j appears in ψ, only its first (leftmost) occurrence may be dependent; if it appears more than once, all its other occurrences are independent;

• τ1,...,τn are normal terms.

We use terms to represent feature structures. We define below an algebra over which terms are to be interpreted. The denotation of a normal term is a totally well-typed feature structure.

Definition 2.6.5 (Feature structure algebra) A feature structure algebra is a structure A = ⟨DA, {tA | t ∈ Types}, {fA | f ∈ Feats}⟩, such that:

• DA is a non-empty set, the domain of A;

• for each t ∈ Types, tA ⊆ DA and, in particular:

– ⊤A = ∅;

– ⊥A = DA;

– if t1 ⊔ t2 = t then t1A ∩ t2A = tA

• for each f ∈ Feats, fA is a total function fA : DA → DA

Let DG be the domain of all typed feature structures over Types and Feats. The interpretation of tG over this domain is the set of feature structures whose roots have the type t; the interpretation of fG : DG → DG is the function that, given a feature structure A, returns val(A, f). We associate a normal term ψ with a totally well-typed feature structure A in the following way:

• if ψ = i t() then A = ({ i }, i ,δ↑,θt) where δ↑ is undefined for every input and θt(q)= t if q = i and undefined otherwise;

• if ψ = i t(τ1,...,τn) then A = (Q, i ,δ,θ) where θ( i ) = t and for every j, if fj is the j-th appropriate feature of the type t, then δ( i ,fj )= qj and qj is the root of the feature structure associated with τj .

Conversely, associate a feature structure A = (Q, q,δ,θ¯ ) with a normal term ψ = i t(τ1,...,τn), where:

• i = q̄;

• θ(q̄) = t;

• n is the number of outgoing edges from q̄;

• for every j,1 ≤ j ≤ n, τj is the term associated with δ(¯q,fj) where fj is the j-th appropriate feature of t;

• if the tag i occurs elsewhere in τ1, ..., τn, we replace the term that i depends on with the term ⊥(), making this occurrence of i independent.

To summarize, there is a one-to-one correspondence between totally well-typed feature structures and normal terms. Note that the tags are only a means of encoding reentrancy in feature structures. Therefore, when displaying a term in which a tag i appears just once, we will sometimes omit the tag for the sake of compactness. Similarly, we sometimes omit the type of independent tags, which are implicitly typed by ⊥, and display them as tags only.
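As an illustration (under the assumption, not made in the text, that the features appropriate for phrase are ordered SYN, SUBJ, HEAD, that s, head and agr introduce only the features shown, and that the two AGR paths are reentrant, as in figure 2.3), such a feature structure could be written as the normal term

1 phrase( 2 s(), 3 head( 4 agr()), 5 head( 4 ⊥()))

or, with the display conventions above, simply as phrase(s, head( 4 agr), head( 4 )): the second occurrence of the tag 4 is independent and encodes the reentrancy between the two AGR values.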

2.7 Multi-rooted Structures

To be able to represent complex linguistic information, such as phrase structure, the notion of feature structures is usually extended. There are two different approaches to representing phrase structure with feature structures: adding special, designated features to the FSs themselves, or defining an extended notion of FSs. The first approach is employed by HPSG: special features, such as DTRS (daughters), encode trees in TFSs as lists. This makes it impossible to access a particular daughter directly. Shieber (1992) uses a variant of this approach, in which a denumerable set of special features, namely 0, 1, ..., is added to encode the order of daughters in a tree. In a typed system such as ours, this method would necessitate the addition of special types as well; in general, no bound can be placed on the number of features and types necessary to state rules (see (Carpenter, 1992b, p. 194)). As a more coherent, mathematically elegant solution, we adopt below the other approach: a new notion of multi-rooted feature structures, suggested by (Sikkel, 1993), is defined as a natural extension of TFSs. These structures provide a means to represent phrasal signs and grammar rules. They are used implicitly in the computational linguistics literature, but to the best of our knowledge no explicit, formal theory of these structures and their properties has been formulated before.

Definition 2.7.1 (Multi-rooted structures) A multi-rooted feature structure (MRS) is a pair ⟨Q̄, G⟩ where G = ⟨Q, δ⟩ is a finite, directed, labeled graph consisting of a set Q ⊆ Nodes of nodes and a partial function δ : Q × Feats → Q specifying the arcs, and where Q̄ is an ordered, (repetition-free) list of distinguished nodes in Q called roots. G is not necessarily connected, but the union of all the nodes reachable from all the roots in Q̄ is required to yield exactly Q. The length of a MRS is the number of its roots, |Q̄|. λ denotes the empty MRS, where Q = ∅.

Meta-variables σ, ρ range over MRSs, and δ, Q and Q̄ over their constituents. If ⟨Q̄, G⟩ is a MRS and q̄i is a root in Q̄ then q̄i naturally induces a feature structure Pr(Q̄, i) = (Qi, q̄i, δi), where Qi is the set of nodes reachable from q̄i and δi = δ|Qi. One can view a MRS ⟨Q̄, G⟩ as an ordered sequence ⟨A1, ..., An⟩ of (not necessarily disjoint) feature structures, where Ai = Pr(Q̄, i) for 1 ≤ i ≤ n. Note that such an ordered list of feature structures is not a sequence in the mathematical sense: removing an element from the list may affect the other elements (due to reentrancy among elements). Nevertheless, we can think of a MRS as a sequence where a subsequence is obtained by taking a subsequence of the roots and considering only the feature structures they induce. We use the two views interchangeably. Figure 2.4 depicts a MRS and its view as a sequence of feature structures. The shaded nodes (ordered from left to right) constitute Q̄. A MRS is well-typed if all its constituent feature structures are well-typed, and is totally well-typed if all its constituents are. Subsumption is extended to MRSs as follows:

Figure 2.4: A graph- and AVM-representation of a MRS (a MRS with three roots of type phrase whose SYN values are n, v and s, shown once as a graph and once as a sequence of AVMs sharing their HEAD and AGR values)

Definition 2.7.2 (Subsumption of multi-rooted structures) A MRS σ = ⟨Q̄, G⟩ subsumes a MRS σ′ = ⟨Q̄′, G′⟩ (denoted by σ ⊑ σ′) if |Q̄| = |Q̄′| and there exists a total function h : Q → Q′ such that:

• for every root q̄i ∈ Q̄, h(q̄i) = q̄′i

• for every q ∈ Q, θ(q) ⊑ θ′(h(q))

• for every q ∈ Q and f ∈ Feats, if δ(q, f)↓ then h(δ(q, f)) = δ′(h(q), f)

We define abstract multi-rooted structures in a way analogous to abstract feature structures.

Definition 2.7.3 (Abstract multi-rooted structures) A pre-abstract multi-rooted structure (pre-AMRS) is a quadruple A = ⟨Ind, Π, Θ, ≈⟩, where:

• Ind, the indices of A, is the set {1, ..., n} for some n

• Π ⊆ Ind × Paths is a set of indexed paths, such that for each i ∈ Ind there exists some π ∈ Paths such that (i, π) ∈ Π

• Θ : Π → Types is a total type-assignment function

• ≈ ⊆ Π × Π is a relation

An abstract multi-rooted structure (AMRS) is a pre-AMRS A for which the following requirements, naturally extending those of AFSs, hold:

• Π is prefix-closed: if (i, πα) ∈ Π then (i, π) ∈ Π

• A is fusion-closed: if (i, πα) ∈ Π and (i′, π′α′) ∈ Π and (i, π) ≈ (i′, π′) then (i, πα′) ∈ Π (as well as (i′, π′α) ∈ Π), and (i, πα′) ≈ (i′, π′α′) (as well as (i′, π′α) ≈ (i, πα))

• ≈ is an equivalence relation with a finite index

• Θ respects the equivalence: if (i1, π1) ≈ (i2, π2) then Θ(i1, π1) = Θ(i2, π2)

An AMRS ⟨Ind, Π, Θ, ≈⟩ is well-typed if for every (i, π) ∈ Π, Θ(i, π) ≠ ⊤ and if (i, πf) ∈ Π then Approp(f, Θ(i, π))↓ and Approp(f, Θ(i, π)) ⊑ Θ(i, πf). It is totally well-typed if, in addition, for every (i, π) ∈ Π, if Approp(f, Θ(i, π))↓ then (i, πf) ∈ Π. The length of an AMRS A is len(A) = |IndA|. We use λ to denote the empty AMRS, too, where Indλ = ∅ and Πλ = ∅ (so that len(λ) = 0).

The closure operations Cl and Eq are naturally extended to AMRSs: if A is a pre-AMRS then Cl(A) is the least extension of A that is prefix- and fusion-closed, and Eq(A) is the least extension of A to a pre-AMRS in which ≈ is an equivalence relation. In addition, Ty(⟨Ind, Π, Θ, ≈⟩) = ⟨Ind, Π, Θ′, ≈⟩ where Θ′(i, π) = ⊔{Θ(i′, π′) | (i′, π′) ≈ (i, π)}. The partial order ⪯ is extended to AMRSs: ⟨IndA, ΠA, ΘA, ≈A⟩ ⪯ ⟨IndB, ΠB, ΘB, ≈B⟩ iff IndA = IndB, ΠA ⊆ ΠB, ≈A ⊆ ≈B and for every (i, π) ∈ ΠA, ΘA(i, π) ⊑ ΘB(i, π). In the rest of this chapter we overload the symbol ‘⊑’ so that it denotes subsumption of AMRSs as well as MRSs.

AMRSs, too, can be related to concrete ones in a natural way: if σ = ⟨Q̄, G⟩ is a MRS then Abs(σ) = ⟨Indσ, Πσ, Θσ, ≈σ⟩ is defined by:

• Indσ = {1,..., |Q¯|}

• Πσ = {(i, π) | δ(¯qi, π)↓}

• Θσ(i, π)= θ(δ(¯qi, π))

• (i, π1) ≈σ (j, π2) iff δ(¯qi, π1)= δ(¯qj , π2)

It is easy to see that Abs(σ) is an AMRS. In particular, notice that for every i ∈ Indσ there exists a path π such that (i, π) ∈ Πσ since for every i,δ(¯qi,ǫ)↓. The reverse operation, Conc, can be defined in a similar manner. AMRSs are used to represent ordered collections of AFSs. However, due to the possibility of value sharing among the constituents of AMRSs, they are not sequences in the mathematical sense, and the notion of sub-structure has to be defined in order to relate them to AFSs.

Definition 2.7.4 (Sub-structures) Let A = ⟨IndA, ΠA, ΘA, ≈A⟩; let IndB be a finite (contiguous) subset of IndA; let n + 1 be the index of the first element of IndB. The sub-structure of A induced by IndB is an AMRS B = ⟨IndB, ΠB, ΘB, ≈B⟩ such that:

• (i − n, π) ∈ ΠB iff i ∈ IndB and (i, π) ∈ ΠA

• ΘB(i − n, π)=ΘA(i, π) if i ∈ IndB

• (i1 − n, π1) ≈B (i2 − n, π2) iff i1 ∈ IndB,i2 ∈ IndB and (i1, π1) ≈A (i2, π2)

A sub-structure of A is obtained by selecting a subsequence of the indices of A and considering the structure they induce. Trivially, this structure is an AMRS. We use Aj..k to refer to the sub-structure of A induced by {j, ..., k}. If IndB = {i}, Ai..i can be identified with an AFS, denoted Ai. The notion of concatenation has to be defined for AMRSs, too. Notice that by definition, concatenated AMRSs cannot share elements between them.

Definition 2.7.5 (Concatenation) The concatenation of A = ⟨IndA, ΠA, ΘA, ≈A⟩ and B = ⟨IndB, ΠB, ΘB, ≈B⟩ of lengths nA, nB, respectively (denoted by A · B), is an AMRS C = ⟨IndC, ΠC, ΘC, ≈C⟩ such that

• IndC = {1,...,nA + nB}

• ΠC = ΠA ∪{(i + nA, π) | (i, π) ∈ ΠB }

• ΘC(i, π) = ΘA(i, π) if i ≤ nA, and ΘC(i, π) = ΘB(i − nA, π) if i > nA

• ≈C = ≈A ∪ {((i1 + nA, π1), (i2 + nA, π2)) | (i1, π1) ≈B (i2, π2)}

As usual, A · λ = λ · A = A.

We now extend the definition of unification to AMRSs: we want to allow the unification of two AMRSs, according to a specified set of indices. Therefore, one operand is a pair consisting of an AMRS and a set of indices, specifying some of its elements. The second operand is either an AMRS or an AFS, considered as an AMRS of length 1. Recall that due to reentrancies, other elements of the first AMRS can be affected by this operation. Therefore, the result of the unification is a new AMRS. We refer to AMRS unification as “unification in context” in the sequel, to emphasize the effect that the operation might have on elements that are not directly involved in it.

Definition 2.7.6 (Unification of AMRSs) Let A = ⟨IndA, ΠA, ΘA, ≈A⟩ be an AMRS. Let B = ⟨IndB, ΠB, ΘB, ≈B⟩ be an AMRS (if B is an AFS it is interpreted as an AMRS of length 1). Let J be a set of indices such that J ⊆ IndA. Let f(i) = i if B is an AMRS, and f(i) = 1 if B is an AFS. (A, J) ⊔ B is defined if B is an AMRS and J ⊆ IndB, or if B is an AFS and |J| = 1; in either case, it is the AMRS C′ = Ty(Eq(Cl(⟨IndC, ΠC, ΘC, ≈C⟩))), where

• IndC = IndA

• ΠC = ΠA ∪ {(i, π) | i ∈ J and (f(i), π) ∈ ΠB}

• ΘC(i, π) = ΘA(i, π) if i ∉ J; ΘA(i, π) ⊔ ΘB(f(i), π) if i ∈ J, (i, π) ∈ ΠA and (f(i), π) ∈ ΠB; ΘA(i, π) if i ∈ J, (i, π) ∈ ΠA and (f(i), π) ∉ ΠB; ΘB(f(i), π) if i ∈ J, (i, π) ∉ ΠA and (f(i), π) ∈ ΠB

• ≈C = ≈A ∪ {((i1, π1), (i2, π2)) | i1, i2 ∈ J and (f(i1), π1) ≈B (f(i2), π2)}

The unification fails if there exists some pair (i, π) ∈ ΠC′ such that ΘC′ (i, π)= ⊤. Many of the properties of AFSs, proven in the previous section, hold for AMRSs, too. In particular, if A, B are AMRSs then so is (A, J) ⊔ B if it is defined, len((A, J) ⊔ B)= len(A) and (A, J) ⊔ B ⊒ A. Also, for every two AMRSs A, B, (A, {1 ...len(A)}) ⊔ B = A iff B1...len(A) ⊑ A. The linear representation of TFSs, suggested in section 2.6, is naturally extended to MRSs: a multi-term is a sequence of terms, where the scope of tags is extended to the entire sequence.
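A rough sketch of how definition 2.7.6 can be reduced to the AFS case: encode every indexed path (i, π) as a path whose first component is the index, build the relevant part of B re-indexed through f, and reuse a function like the hypothetical unify_afs of the earlier sketch; the closure operations then propagate information to elements outside J automatically. The encoding and both helpers below are illustrative assumptions, not part of the text.

# A sketch of "unification in context": an AMRS (ind, pi, theta, eq) is
# encoded as an AFS whose paths are prefixed with their root index.
def encode(amrs):
    ind, pi, theta, eq = amrs
    paths = {(i,) + p for (i, p) in pi}
    types = {(i,) + p: theta[(i, p)] for (i, p) in pi}
    reent = {((i1,) + p1, (i2,) + p2) for ((i1, p1), (i2, p2)) in eq}
    return paths, types, reent

def unify_in_context(A, B, J, lub, b_is_afs=False):
    encA = encode(A)
    f = (lambda i: 1) if b_is_afs else (lambda i: i)   # the index mapping of 2.7.6
    ind, pi, theta, eq = B
    # only the elements of B selected through J are imported, re-indexed
    paths = {(i,) + p for i in J for (fi, p) in pi if fi == f(i)}
    types = {(i,) + p: theta[(f(i), p)] for i in J for (fi, p) in pi if fi == f(i)}
    reent = {((i1,) + p1, (i2,) + p2)
             for i1 in J for i2 in J
             for ((g1, p1), (g2, p2)) in eq if g1 == f(i1) and g2 == f(i2)}
    return unify_afs(encA, (paths, types, reent), lub)  # None signals failure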

2.8 Rules and Grammars

We define rules and grammars over a fixed set Words of words (in addition to the fixed sets Feats and Types). We use w to refer to elements of Words, wi to refer to strings over Words. We assume that the lexicon associates with every word wi a set of feature structures Cat(wi), its category,1 so we can ignore the terminal words and consider only their categories. The input for the parser, therefore, is a sequence2 of sets of TFSs rather than a string of words.

Definition 2.8.1 (Pre-terminals) Let w = w1 ... wn ∈ Words∗. PTw(j, k) is defined iff 1 ≤ j, k ≤ n, in which case it is the set of AMRSs Abs(⟨Aj, Aj+1, ..., Ak⟩) where Ai ∈ Cat(wi) for j ≤ i ≤ k. If j > k then PTw(j, k) = {λ}. We omit the subscript w when it is clear from the context.
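The pre-terminals can be enumerated directly from the lexicon; the following sketch assumes two hypothetical helpers, cat (the lexical categories of a word) and abs_mrs (an implementation of Abs on sequences of TFSs), which are not defined in the text.

from itertools import product

# A sketch of definition 2.8.1: PT_w(j, k) contains one AMRS for every way of
# choosing a category for each of the words w_j ... w_k.
def pre_terminals(words, j, k, cat, abs_mrs):
    if j > k:
        return [abs_mrs([])]                         # the empty AMRS, lambda
    choices = [cat(w) for w in words[j - 1:k]]       # words are 1-indexed in the text
    return [abs_mrs(list(seq)) for seq in product(*choices)]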

Lemma 2.8.2 If w = w1 ··· wn, 1 ≤ i ≤ j ≤ k ≤ n, A ∈ PTw(i, j) and B ∈ PTw(j +1, k) then A · B ∈ PTw(i, k).

Proof: An immediate corollary of the definition.

Definition 2.8.3 (Rules) A rule is a MRS of length n > 0 with a distinguished last element.3,4 If ⟨A1, ..., An−1, An⟩ is a rule then An is its head and ⟨A1, ..., An−1⟩ is its body. We write such a rule as ⟨A1, ..., An−1 ⇒ An⟩. In addition, every category of a lexical item is a rule (with an empty body). We assume that such categories don’t head any other rule.

Notice that the definition supports ǫ-rules, i.e., rules with null bodies.

Definition 2.8.4 (Grammars) A grammar G = (R, As) is a finite set of rules R and a start symbol As that is a TFS.

Figure 2.5 depicts an example grammar (we use AVM notation for its rules; tags such as 1 denote reentrancy). While this example grammar has no linguistic pretensions, it might be viewed as generating simple sentences in which the predicates are headed by transitive and intransitive verbs. The type hierarchy on which the grammar is based is omitted here. A discussion of the methodological status of the start symbol appears later on in this section, prior to the definition of languages.

For the following discussion we fix a particular grammar G = (R, As). We define a derivation relation over AMRSs as the basis for defining the language of TFS-based grammars. Checking whether two given AMRSs A and B stand in the derivation relation is accomplished by the following steps: first, an element of A has to be selected; this element has to unify with the head of some rule ρ; then, a sub-structure of B is selected; this sub-structure has to unify with the body of ρ. All unifications are done in context, so that other components of the AMRSs involved may be affected, too. Moreover, there must be some way to record the effects of successive unifications; to this end, derivation is defined only for pairs of AMRSs that are already “as specific as needed”; that is to say, if the rule adds any information to the AMRSs, this information already has to be recorded in them in order for them to be related by derivation. This is why, in the definition below, we use

1 Cat(wi) is a singleton if wi is unambiguous.
2 We assume that there is no reentrancy among lexical items.
3 This use of head must not be confused with the linguistic one, the core features of a phrase.
4 Notice that the traditional direction is reversed and that the head and the body need not be disjoint.

Initial symbol: a TFS of type phrase with CAT : s.

Lexicon: ‘John’ and ‘her’ are of type word with CAT : n (‘her’ also bears CASE : acc), and ‘loves’ is of type word with CAT : v; all three have third-person singular AGR values, and their SEM values carry the predicates john, she and love, respectively.

Rules: rule (2.1) rewrites a phrase with CAT : s as a nominative n followed by a v, both sharing their AGR value with the mother, and identifies the n’s predicate with the ARG1 of the v’s semantics; rule (2.2) rewrites a phrase with CAT : v as a v followed by an accusative n, and identifies the n’s predicate with the ARG2 of the v’s semantics.

Figure 2.5: An example grammar

an AMRS R that is at least as specific as some rule ρ, and not ρ itself, to guide the derivation. This is also why the definition requires that all the unifications do not add information. Strong derivation is the relation that holds between such AMRSs; another relation, derivation, relaxes that requirement by allowing two AMRSs to be related even if they contain only part of the information that is required for strong derivation to hold.

Since elements of AMRSs involve indices that denote their linear position in the sequence of roots, the operation of replacing some element in one AMRS with a sub-structure, whose length might be greater than one, becomes notationally complicated. Conceptually, though, it very much resembles the replacement of some symbol with a sequence of symbols in context-free derivation, or the replacement of the selected goal (the one that unifies with the head of some rule) with the body of the rule in Prolog SLD-resolution. One main difference in our definition is that we do not carry substitutions through sequences of derivations; rather, we treat all the pairs in a derivation sequence as if the appropriate substitutions have already been applied to them (recall that members of these pairs are “as specific as needed”).

Definition 2.8.5 (Strong Derivation) An AMRS A = ⟨IndA, ΠA, ΘA, ≈A⟩ whose length is k strongly derives an AMRS B (denoted A → B) iff

• there exist a rule ρ ∈ R and an AMRS R ⊒ Abs(ρ) (with len(R) = n), such that:

• some element of A unifies with the head of R, and some sub-structure of B unifies with the body of R; namely, there exist j ∈ IndA and i ∈ IndB such that:

A = (A, {j}) ⊔ Rn,  R = (R, {n}) ⊔ Aj,
Bi...i+n−2 = (B, {i ... i + n − 2}) ⊔ R1...n−1,  R = (R, {1 ... n − 1}) ⊔ Bi...i+n−2

• B is the replacement of the j-th element of A with the body of R; namely, let

f(i) = i if 1 ≤ i < j, and f(i) = i + n − 2 if j < i ≤ k;  g(i) = i + j − 1 if 1 ≤ i ≤ n − 1

then B = Ty(Eq(Cl(⟨IndB′, ΠB′, ΘB′, ≈B′⟩))), where

– IndB′ = {1, ..., k + n − 2}

– (i, π) ∈ ΠB′ iff either i = f(i′) and (i′, π) ∈ ΠA, or i = g(i′) and (i′, π) ∈ ΠR

– ΘB′(i, π) = ΘA(i′, π) if i = f(i′), and ΘB′(i, π) = ΘR(i′, π) if i = g(i′)

– (i1, π1) ≈B′ (i2, π2) if
  ∗ i1 = f(i1′) and i2 = f(i2′) and (i1′, π1) ≈A (i2′, π2), or
  ∗ i1 = g(i1′) and i2 = g(i2′) and (i1′, π1) ≈R (i2′, π2), or
  ∗ i1 = f(i1′) and i2 = g(i2′) and there exists a path π3 such that (i1′, π1) ≈A (j, π3) and (n, π3) ≈R (i2′, π2), or
  ∗ i1 = g(i1′) and i2 = f(i2′) and there exists a path π3 such that (i1′, π1) ≈R (n, π3) and (j, π3) ≈A (i2′, π2)

The reflexive transitive closure of ‘→’, denoted ‘→∗’, is defined as follows: A →∗ A′′ if A = A′′ or if there exists A′ such that A → A′ and A′ →∗ A′′.

Intuitively, A strongly derives B through some AFS Aj in A, if some rule ρ ∈ R licenses the derivation. Aj is unified with the head of the rule, and if the unification succeeds, the (possibly modified) body of the rule replaces Aj in B. The definition is graphically demonstrated in figure 2.6.

Lemma 2.8.6 If A → B and A ⊑ A′ then there exists B′ such that B ⊑ B′ and A′ → B′.

Proof: A → B, therefore there exists a rule ρ ∈ R, an AMRS R ⊒ Abs(ρ) and an index j such that A unifies with the head of R, and B is obtained by replacing Aj with the body of R. A and B are already “as specific as needed”; thus, since A ⊑ A′ and A = (A, {j}) ⊔ Rn, A′ = (A′, {j}) ⊔ Rn. Hence there exists R′ ⊒ R such that R′ = (R′, {n}) ⊔ A′j , A′ unifies with its head and B′ is obtained by replacing the j-th element of A′ with its body.

Lemma 2.8.7 If A →∗ B and A ⊑ A′ then there exists B′ such that B ⊑ B′ and A′ →∗ B′.

Proof: By induction on the derivation sequence and lemma 2.8.6.

Lemma 2.8.8 If A1...k →∗ B and Ak+1 →∗ C then A1...k+1 →∗ B · C.

Proof: The derivation is obtained by applying first the derivation steps that derive B from A1...k and then those that derive C from Ak+1. Since A1...k →∗ B, A is “as specific as needed” and the application of the derivation steps from A to B does not affect the applicability of the derivation steps to C.


Figure 2.6: Strong derivation

Lemma 2.8.9 If A ⊒ Abs(ρ) for some ρ ∈ R of length n then An → A1...n−1.

Proof: Immediate from the definition of derivation. There are various definitions in the literature for the language that is defined by a grammar G expressed in a unification-based grammar formalism. For example, (Shieber, 1985; Shieber, 1992) do not include a start symbol in the grammar at all, and define L(G) as the set of strings derivable from some feature structure. In (Shieber, Schabes, and Pereira, 1994) a start symbol is defined (notated goal axiom), and L(G) is defined as the set of strings that are derivable from some generalization of the start symbol, i.e., from some feature structure that subsumes it. (Sikkel, 1993), on the other hand, assumes that a specific feature cat is present in every feature structure (the value of which simulates non-terminals in a context-free “underlying” grammar), and uses this feature to single out the start symbol: L(G) is the set of strings that are derivable from some feature structure in which the cat feature is S (the start symbol of the underlying context-free grammar). A similar definition is given by (Haas, 1989): L(G) is the set of strings derivable from the start symbol, where the start symbol is a constant (that is, an atomic feature structure). There is a good motivation to employ a start symbol: the grammar writer might want to specify a certain criterion for the permissible strings in the language, for example, that they are all sentences. Moreover, it makes sense to include in the language such strings that are not derived directly by the start symbol, but rather by a TFS that is related to the start symbol. For example, the grammar writer might state that only TFSs with a cat feature valued S are permissible, meaning that every TFS that is subsumed by the start symbol (that is, contains all the information it encodes) is a sentence. However, such a definition prevents the incorporation of subsumption test (see section 2.10.3 below) into the parsing, since the correctness of the computation can not be maintained.

Due to these considerations we chose a relaxed condition on the start symbol in our definition of languages. We define a derivation relation between AMRSs in a way that allows the initial symbol of the grammar to derive a sequence of lexical entries even if the actual (strong) derivation is between a TFS that unifies with the start symbol and a more specific instance of the pre-terminals.

Definition 2.8.10 (Derivation) An AMRS A derives an AMRS B (A ❀ B) iff there exist AMRSs A′, B′ such that (A, {1, ..., len(A)}) ⊔ A′ ≠ ⊤, B ⊑ B′ and A′ →∗ B′.

Definition 2.8.11 (Language) The language of a grammar G is L(G) = {w = w1 ··· wn ∈ Words∗ | Abs(As) ❀ B for some B ∈ PTw(1, n)}.

Figure 2.7 depicts a derivation of the string “John loves her” with respect to the example grammar. The scope of reentrancy tags is limited to one MRS, of course, but we use the same tags across different MRSs to emphasize the flow of information during derivation. This example shows that the sentence “John loves her” is in the language of the example grammar, since the derivation starts with a TFS that is more specific than the initial symbol and ends in a specification of the lexical entries of the sentence’s words.

(Figure 2.7 shows three successive MRSs: first, a single TFS of type phrase with CAT : s, third-person singular AGR and a SEM whose PRED is love, ARG1 john and ARG2 she; then, after applying rule 2.1, a nominative n and a v sharing that AGR and semantic information; and finally, after applying rule 2.2, the lexical entries of John, loves and her.)

Figure 2.7: A leftmost derivation

2.9 Parsing as Operational Semantics

Parsing is the process of determining whether a given string belongs to the language defined by a given grammar, and of assigning a structure to the permissible strings. Various parsing algorithms exist for various classes of grammars. In this section we formalize and explicate some of the notions of (Carpenter, 1992b, chapter 13). We give direct definitions for rules, grammars and languages, based on our new notion of AMRSs. This presentation is more adequate for current TFS-based systems than (Haas, 1989; Shieber, Schabes, and Pereira, 1994), which use first-order terms. Moreover, it does not necessitate special, ad-hoc features and types for encoding trees in TFSs as (Shieber, 1992) does. We don’t assume any explicit context-free backbone for the grammars, as do (Kaplan and Bresnan, 1982) or (Sikkel, 1993).

The parsing algorithm we describe is a pure bottom-up one that makes use of a chart to record edges. The formalism we presented is aimed at being a platform for specifying grammars in HPSG, which is characterized by employing a few very general rules (or rule schemata); selecting the rules that are applicable in every step of the process requires unification anyhow. Therefore we choose a particular parsing algorithm that does not make use of top-down predictions but rather assumes that every rule might be applied in every step. This assumption is realized by initializing the chart with predictive edges for every rule, in every position.

As is well known (see, e.g., (Lloyd, 1987)), the meaning of a logic program P can be specified algebraically as the least fix-point (lfp) of the immediate consequence operator TP of the program. A similar approach can be applied to a context-free grammar G, such that L(G) equals (a projection of) the least fix-point of an analogous immediate derivation operator, TG. Let G = (V, T, P, S) be a context-free grammar⁵. Let I ⊆ V × T∗. Define TG(I) = {⟨A, w⟩ | A → w ∈ P, w ∈ T} ∪ {⟨A, w1 ··· wk⟩ | A → A1 ··· Ak ∈ P, ⟨Ai, wi⟩ ∈ I, 1 ≤ i ≤ k}. Then the least fix-point of TG is the union over A ∈ V of {⟨A, w⟩ | w ∈ LA(G)}. In a sense, computing the lfp of TG corresponds to computing the language generated by G. Parsing, then, amounts to checking if the input w is in the language. This process induces an inherently inefficient computation: since w is given, it can be used to optimize the computation. This is achieved by defining TG,w, a parsing step operator, which is dependent on the input sentence w. The set of items I has to be extended, too: an item is a triple [i, A, j] where 0 ≤ i, j ≤ n (n being the length of w) and A ∈ V. Informally, an item [i, A, j] represents the existence of a derivation from the symbol A to a substring of w, namely wi ... wj. w ∈ LS(G) if and only if [1, S, n] ∈ lfp(TG,w), so that parsing now amounts to computing the least fixed point of TG,w, which is more efficient, and then checking whether the appropriate item is in the lfp.

We now return to TFS-based formalisms and define TG,w for a TFS-based grammar G, thus providing means for defining the meaning of G. A computation is triggered by some input string of words w = w1 ··· wn of length n ≥ 0. For the following discussion we fix a particular grammar G = (R, As) and a particular input string w of length n. A state of the computation is a set of items, and states are related by a transition relation. The presentation below corresponds to a pure bottom-up parsing algorithm, as it is both simple and efficient.
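As a concrete illustration of the context-free, input-indexed case, the following is a small sketch (not the thesis code) of computing the least fix-point of TG,w by brute-force iteration. It assumes the normal form of footnote 5, with rules given as pairs of a left-hand symbol and either a single terminal or a list of non-terminals, and uses 0-based positions, so that an item (i, A, j) records that A derives wi+1 ... wj.

from itertools import product

def parse_cfg(rules, start, w):
    n = len(w)
    items = set()
    while True:                                        # iterate T_{G,w} to its lfp
        new = set(items)
        for (A, rhs) in rules:
            if isinstance(rhs, str):                   # A -> a, a a terminal
                for i in range(n):
                    if w[i] == rhs:
                        new.add((i, A, i + 1))
            elif not rhs:                              # A -> epsilon
                for i in range(n + 1):
                    new.add((i, A, i))
            else:                                      # A -> A1 ... Ak
                spans = [[(i, j) for (i, B, j) in items if B == X] for X in rhs]
                for combo in product(*spans):
                    if all(combo[t][1] == combo[t + 1][0] for t in range(len(combo) - 1)):
                        new.add((combo[0][0], A, combo[-1][1]))
        if new == items:
            break
        items = new
    return (0, start, n) in items                      # w in L_S(G) iff this item is derived

# e.g. parse_cfg([('S', ['NP', 'VP']), ('VP', ['V', 'NP']),
#                 ('NP', 'John'), ('NP', 'her'), ('V', 'loves')],
#                'S', ['John', 'loves', 'her'])        # -> True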

Definition 2.9.1 (Items) An item is a four-tuple [i,A,j,c], where i, j ∈ IN, i ≤ j, A is an AMRS and c is either Act, in which case the item is active, or Comp, in which case it is complete. Let Items be the collection of all items.

⁵We assume a normal form, where for A → α ∈ P, either α ∈ T or α ∈ V∗.

If [i, A, j, c] is an item, we say that A spans the input from position i + 1 to position j (inclusive). A can be seen as a representation of a dotted rule, or edge: during parsing, all generated items

are such that A is (possibly more specific than) a prefix of some grammar rule. The notion of items usually employs edges that contain entire rules, whereas we only use prefixes of rules. This difference is not essential, and in an actual implementation of a parser that is induced by TG,w, edges indeed include a reference to the rule on which they rely. In what follows we define TG,w, a parsing operator that corresponds to (bottom-up) chart parsing. However, it is possible to characterize algebraic operators that correspond to other parsing schemas as well.

Definition 2.9.2 Let TG,w : 2^Items → 2^Items be a transformation on sets of items, where for I ⊆ Items, x ∈ TG,w(I) iff one of the following holds:

(2.1) dot movement: there exist a rule ρ ∈ R with Abs(ρ) = R = ⟨A1, ..., Am−1 ⇒ Am⟩, m > 1, a number k, 0 ≤ k < m − 1, an active item α = [iα, Aα, jα, Act] ∈ I with len(Aα) = k and a complete item β = [jα, Aβ, jβ, Comp] ∈ I (whose AMRS has length 1); B = (R, {1 ... k}) ⊔ Aα, C = (B, {k + 1}) ⊔ Aβ and, provided these unifications do not fail, x = [iα, C1...k+1, jβ, Act]; or

(2.2) completion: there exist a rule ρ ∈ R with Abs(ρ) = R = ⟨A1, ..., Am−1 ⇒ Am⟩, m > 1, and an active item α = [iα, Aα, jα, Act] ∈ I with len(Aα) = m − 1; C = (R, {1 ... m − 1}) ⊔ Aα and, provided this unification does not fail, x = [iα, Cm, jα, Comp]; or

(2.3) prediction: there exists i, 0 ≤ i ≤ n, and x = [i, λ, i, Act]; or

(2.4) ǫ-rules: there exist a rule ρ ∈ R with len(ρ) = 1 and i, 0 ≤ i ≤ n, and x = [i, Abs(ρ), i, Comp]; or

(2.5) scanning: w = w1 ... wn, n ≥ 1, and there exist i, 0 < i ≤ n, and A ∈ Cat(wi) such that x = [i − 1, Abs(A), i, Comp].

Intuitively, the computation starts by initializing the chart with an empty active edge in every

position (by operation 2.3) and with complete edges for every input word (by operation 2.5). Then, operations 2.1 and 2.2 are used to apply the grammar rules using the chart, in an unspecified order. We prove below that the process is indeed analogous to parsing w with respect to G.
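The following sketch shows only the control structure that definition 2.9.2 induces: the chart is a set of items and TG,w is applied repeatedly until a fix-point is reached. All feature-structure operations are abstracted away; rules are abstract rules supporting len(), lexicon(word) yields the AFSs in Cat(word), empty stands for λ, advance and complete stand for the unifications of operations 2.1 and 2.2 (returning None on failure), and unifies tests unifiability with the start symbol. All of these helpers are assumptions of the sketch, not definitions from the text.

# A control-structure sketch of computing lfp(T_{G,w}) (definitions 2.9.2, 2.9.7).
def chart_parse(rules, start, words, lexicon, empty, advance, complete, unifies):
    n = len(words)
    chart = set()
    while True:
        new = set(chart)
        for i in range(n + 1):                                  # prediction (2.3)
            new.add((i, empty, i, 'Act'))
        for rho in rules:
            if len(rho) == 1:                                   # epsilon-rules (2.4)
                for i in range(n + 1):
                    new.add((i, rho, i, 'Comp'))
        for i in range(1, n + 1):                               # scanning (2.5)
            for a in lexicon(words[i - 1]):
                new.add((i - 1, a, i, 'Comp'))
        for (i, A, j, s) in chart:                              # dot movement (2.1)
            if s != 'Act':
                continue
            for (j2, B, k, s2) in chart:
                if s2 == 'Comp' and j2 == j:
                    for rho in rules:
                        C = advance(rho, A, B)
                        if C is not None:
                            new.add((i, C, k, 'Act'))
        for (i, A, j, s) in chart:                              # completion (2.2)
            if s == 'Act':
                for rho in rules:
                    C = complete(rho, A)
                    if C is not None:
                        new.add((i, C, j, 'Comp'))
        if new == chart:                                        # fix-point reached
            break
        chart = new
    return any(s == 'Comp' and i == 0 and j == n and unifies(A, start)
               for (i, A, j, s) in chart)                       # success, definition 2.9.7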

Theorem 2.9.3 TG,w is monotone: if I1 ⊆ I2 then TG,w(I1) ⊆ TG,w(I2).

Proof: Suppose I1 ⊆ I2. If x ∈ TG,w(I1) then x was added by one of the five operations; operations 2.3, 2.4 and 2.5 add the same items every time TG,w is applied, and thus x ∈ TG,w(I2), too. If x was added by operation 2.1, then there exist items α, β in I1 to which this operation applies. Since I1 ⊆ I2, α, β are in I2, too, and hence x ∈ TG,w(I2), too. The same applies for operation 2.2. In any case, x ∈ TG,w(I2) and hence TG,w(I1) ⊆ TG,w(I2).

Theorem 2.9.4 TG,w is continuous: if Ii, i ≥ 0, is a chain, then TG,w(⋃i Ii) = ⋃i TG,w(Ii).

Proof: First, TG,w is monotone. Second, let I0 ⊆ I1 ⊆ ... be a chain of items. If x ∈ TG,w(⋃i≥0 Ii) then there exist α, β ∈ ⋃i≥0 Ii as required, due to which x is added. Then there exist i, j such that α ∈ Ii and β ∈ Ij. Let k be the maximum of i, j. Then α, β ∈ Ik, x ∈ TG,w(Ik) and hence x ∈ ⋃i≥0 TG,w(Ii). Conversely, if x ∈ ⋃i≥0 TG,w(Ii) then there exists some i such that x ∈ TG,w(Ii). Ii ⊆ ⋃i≥0 Ii and since TG,w is monotone, TG,w(Ii) ⊆ TG,w(⋃i≥0 Ii), and hence x ∈ TG,w(⋃i≥0 Ii). Therefore TG,w is continuous.

Corollary 2.9.5 The least fix-point of TG,w can be obtained by iteratively computing Im+1 = TG,w(Im), starting from I0 = ∅ and stopping when a fix-point is reached.

Proof: By the Tarski-Knaster theorem, the lfp exists since TG,w is monotone; by Kleene's theorem, since TG,w is continuous, the lfp can be obtained by applying the operator iteratively, starting from ∅.

Definition 2.9.6 (Algebraic meaning) The meaning of a grammar G with respect to an input sentence w is the least fix-point of the operator TG,w.

Definition 2.9.7 (Computation) The w-computation triggered by w ∈ Words∗ is the infinite sequence of sets of items Ii, i ≥ 0, such that I0 = ∅ and for every m ≥ 0, Im+1 = TG,w(Im). The computation is terminating if there exists some m ≥ 0 for which Im = Im+1 (i.e., a fix-point is reached in finite time). The computation is successful if there exists some m such that [0, A, n, Comp] ∈ Im, where len(A) = 1 and A ⊔ Abs(As) ≠ ⊤; otherwise, the computation fails.

Notice that we check whether the generated items are unifiable with the initial symbol, in ac- cordance with the definition of languages. If the initial symbol of the grammar is interpreted differently when languages are defined, a corresponding modification has to be made in the condi- tion for success.

2.10 Proof of Correctness

In this section we show that parsing, as defined above, is (partially) correct. First, the algorithm is sound: a w-computation succeeds only if w ∈ L(G); second, it is complete: if w ∈ L(G), it triggers a successful w-computation. Computations are not guaranteed to terminate, but we show that termination is assured for a certain subset of the grammars that are off-line parsable. We discuss off-line parsability in section 2.10.4.

34 2.10.1 Soundness

In what follows we fix a particular w-computation I0, I1,..., triggered by some input w = w1 ··· wn.

Lemma 2.10.1 If [i, A, j, Comp] ∈ Il for some l then len(A)=1.

Proof: By definition of TG,w, complete items are generated by operations 2.2, 2.3 and 2.4. All these operations add items in which the AMRS is of length 1.

Lemma 2.10.2 If [i, A, j, Act] ∈ Il for some l and len(A)= k > 0 then there exists ρ ∈ R such that Abs(ρ)1...k ⊑ A.

Proof: By induction on l. If l = 0 then Il = ∅ and the proposition holds vacuously. Assume that the proposition holds for every l′ < l. Suppose that x = [i, A, j, Act] ∈ Il and A ≠ λ. Then x must have been added by operation 2.1 (dot movement), so x = [iα, C1...k+1, jβ, Act] where C = ((R, {1 ... k}) ⊔ Aα, {k + 1}) ⊔ Aβ; namely C ⊒ R and thus C1...k+1 ⊒ R1...k+1.

Theorem 2.10.3 (Parsing invariant (a)) If [i, A, j, c] ∈ Il and i < j then there exist B ∈ PTw(i + 1, j) and A′ ⊒ B such that A →∗ A′. If i = j then A →∗ λ.

Proof: By induction on l. If l = 0 then Il = ∅ and the proposition holds vacuously. Assume that the proposition holds for every l′ < l, and consider the operation by which an item x = [i, A, j, c] ∈ Il was added:

1...k+1 2.1. dot movement: x = [iα, C , jβ, Act] where there exist α, β ∈ Il−1 as required and C = (B, {k +1}) ⊔ Aβ, B = (R, {1 ...k}) ⊔ Aα. By the induction hypothesis, there exist ′ ∗ ′ ′ ′ Aα,Bα such that Aα → Aα and Aα ⊒ Bα ∈ PT (iα+1, jα). Also, there exist Aβ,Bβ such that ∗ ′ ′ 1...k Aβ → Aβ and Aβ ⊒ Bβ ∈ PT (jα +1, jβ). B = (R, {1 ...k}) ⊔ Aα; if k > 0, Aα 6= λ and 1...k 1...k 1..k ∗ ′ by lemma 2.10.2 Aα ⊒ R, hence B = Aα. If k = 0, B = λ = Aα. Hence B → Aα. 1..k 1..k ′′ ′ 1..k ∗ ′′ C ⊒ B , and by lemma 2.8.7 there exists Aα ⊒ Aα such that C → Aα. In the same ′′ ′ k+1 ∗ ′′ 1...k+1 ∗ ′′ ′′ way, there exists Aβ ⊒ Aβ such that C → Aβ. By lemma 2.8.8, C → Aα · Aβ. But ′′ ′′ ′ ′ Aα · Aβ ⊒ Aα · Aβ ⊒ Bα · Bβ, and since Bα ∈ PT (iα +1, jα) and Bβ ∈ PT (jα +1, jβ), by lemma 2.8.2 Bα · Bβ ∈ PT (iα +1, jβ). The cases in which iα = jα or iβ = jβ are trivial.

m 2.2. completion: x = [iα, C , jα, Comp] where C = (R, {1 ...m − 1}) ⊔ Aα and there exist an 1...m−1 1...m−1 abstract rule R and an item α ∈ Il−1 as required, and (by lemma 2.10.2) Aα ⊒ R . ′ ∗ ′ If iα < jα then by the induction hypothesis, there exist Aα,Bα such that Aα → Aα and ′ 1...m−1 Aα ⊒ Bα ∈ PT (iα +1, jα). C = (R, {1 ...m − 1}) ⊔ Aα, hence C = Aα and thus 1...m−1 ∗ m 1...m−1 m ∗ ′ C → Aα. From lemma 2.8.9, C → C , and thus C → Aα. If iα = jα then ∗ m ∗ Aα → λ and hence C → λ. 2.3. prediction: x = [i, λ, i, Act] and PT (i +1,i)= ∅. 2.4. ǫ-rules: x = [i,Abs(ρ), i, Comp] and PT (i +1,i)= ∅.

∗ 2.5. scanning: x = [i − 1,Abs(Ai), i, Comp] where Ai ∈ Cat(wi), and Abs(Ai) → Abs(Ai) triv- ially. Abs(Ai) ∈ PT (i +1, j) by definition.

35 Theorem 2.10.4 If a computation, triggered by w, is successful, then w ∈ L(G).

Proof: If a computation is successful then there exists some m ≥ 0 such that x = [0, A, n, Comp] ∈ Im where len(A) = 1 and A ⊔ Abs(As) ≠ ⊤. By the parsing invariant, A →∗ A′ for some A′ ⊒ B ∈ PTw(1, n). Hence Abs(As) ❀ B and w ∈ L(G).

2.10.2 Completeness

The following theorem shows that one derivation step, licensed by a rule Abs(ρ) of length r, corresponds to r + 1 applications of TG,w, starting with an item that predicts the rule and advancing the dot r times, until a complete item for that rule is generated.

Theorem 2.10.5 (Parsing invariant (b)) If A →∗ A′ and A′ ⊒ B ∈ PTw(i + 1, j) then for every k, 0 < k ≤ len(A), there exists lk such that [ik, Ck, jk, Comp] ∈ Ilk, where Ck ⊑ Ak, i1 = i, jlen(A) = j and jk = ik+1 if k < len(A).

Proof: By induction on l, the number of derivation steps from A to A′: ′ If l = 0, A = A ⊒ B. Since B ∈ PTw(i +1, j), B = Abs(Ai+1) · ... · Abs(Aj ) where Ak ∈ Cat(wk) for i +1 ≤ k ≤ j. The scanning operation (2.5) of TG,w adds appropriate items whenever it is applied. ∗ Assume that A → D → B ⊒ PTw(i+1, j) and the proposition holds for D and B. By the induction Comp hypothesis, for every k, 0 < k ≤ len(D), there exists lk such that [ik, Ck, jk, ] ∈ Ilk , where k y x...x+r−1 Ck ⊑ A . Suppose that A → D through a rule ρ of length r by expanding A to D . x...x+m−1 Then the following sequence of items is generated, where for every m, C1...m ⊑ D , and y Cr ⊑ A : [i, λ, i, Act] ∈ I1 by prediction (2.3) Comp [i1, C1, j1, ] ∈ Il1 by the induction hypothesis Act [i, C1, j1, ] ∈ Il1 by dot movement (2.1) Comp [i2, C2, j2, ] ∈ Il2 by the induction hypothesis Act [i, C1...2, j2, ] ∈ Imax(l1,l2) by dot movement (2.1) . . Act [i, C1...r−1, jr−1, ] ∈ Imax(l1,...,lr−1) by dot movement (2.1) Comp [i, Cr, j, ] ∈ Imax(l1,...,lr−1)+1 by completion (2.2) Items are generated by the dot movement (2.1) operation since the conditions for its application x...x+m−1 obtain: it is easy to see that the indices (i, j) match; in addition, if for some m, C1...m ⊑ D , k and for every k, Ck ⊑ A , then there exists C1...m+1 that is obtained by unifying some R ⊒ Abs(ρ) m+1 x...x+m first with C1...m and then with C , such that C1...m+1 ⊑ D as required. Therefore, by induction on m it can be shown that all the items that result from dot movement are indeed generated. Finally, the completion (2.2) operation is applicable and (since A → D) we have y Cr ⊑ A .

Theorem 2.10.6 If w ∈ L(G) then the computation triggered by w is successful.

′ ′ Proof: w ∈ L(G), hence Abs(As) ❀ B, where B ∈ PTw(1,n). Hence there exist A ,B such ′ ′ ′ ∗ ′ that A ⊔ Abs(As) 6= ⊤,B ⊒ B and A → B . By the parsing invariant, there exists l such that ′ [0, C, n, Comp] ∈ Il where C ⊑ A . Hence C ⊔ Abs(As) 6= ⊤, and therefore the computation is successful.

2.10.3 Subsumption Check

To assure efficient computation and to eliminate redundant items, many parsing algorithms employ a mechanism called the subsumption check (see, e.g., (Shieber, 1992; Sikkel, 1993)) to filter out certain generated items. We introduce this mechanism below and show that it does not affect the correctness of the computation. Define a (partial) order over items: [i1, A1, j1, c1] ⪯ [i2, A2, j2, c2] iff i1 = i2, j1 = j2, c1 = c2 and A1 ⊑ A2. Modify the ordering on sets of items as follows: I1 ⪯ I2 iff for every x1 ∈ I1 there exists x2 ∈ I2 such that x2 ⪯ x1. Sets of items are no longer ordered by inclusion, but rather by a weaker condition that only requires the existence of a more general item (in the higher set) for every item (in the lower set). The subsumption filter is realized by modifying TG,w: x ∈ TG,w(I) only if there does not exist any item x′ ∈ TG,w(I) such that x′ ⪯ x. Namely, of all items that span the same substring and have the same status (Act or Comp), only the most general one is preserved across successive applications of TG,w. Given the new ordering of sets of items, it can be shown that this modification harms neither monotonicity nor continuity, and hence every computation is guaranteed to reach a least fix-point. Obviously, the soundness of the computation is also maintained. More interestingly, completeness is preserved, too: recall that the parsing invariant (b) states that if A →∗ A′ ⊒ B then for every k some item [ik, Ck, jk, Comp] is generated such that Ck ⊑ Ak. Since the subsumption test only leaves out an item if a more general one exists, the invariant still holds and hence the correctness of the computation is guaranteed. Notice that if L(G) had been defined as the set of strings that are derivable from the start symbol itself, the subsumption check might have removed crucial items, and the computation could cease to be correct.
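Operationally, the filter can be realized by a small guard applied whenever an item is about to enter the chart; the sketch below is illustrative and assumes a function subsumes(A, B) implementing AMRS subsumption (A at least as general as B), which is not given here.

# A sketch of the subsumption check: for each span and status, keep only the
# most general items.
def add_item(chart, new_item, subsumes):
    i, A, j, status = new_item
    for (i2, B, j2, s2) in list(chart):
        if (i2, j2, s2) == (i, j, status):
            if subsumes(B, A):              # an item at least as general exists
                return chart                # discard the new item
            if subsumes(A, B):              # the new item is more general
                chart.discard((i2, B, j2, s2))
    chart.add(new_item)
    return chart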

2.10.4 Termination It is well-known (see, e.g., (Pereira and Warren, 1983; Johnson, 1988)) that unification-based grammar formalisms are Turing-equivalent, and therefore decidability cannot be guaranteed in the general case. However, for grammars that satisfy a certain restriction, termination of the computation can be proven. We make use of the well-foundedness result (section 2.3) to prove that parsing is terminating for off-line parsable grammars. Off-line parsability was introduced by (Kaplan and Bresnan, 1982) and adopted by (Pereira and Warren, 1983), according to which “A grammar is off-line parsable if its context-free skeleton is not infinitely ambiguous”. As (Johnson, 1988) points out, this restriction (defined in slightly different terms) “ensures that the number of constituent structures that have a given string as their yield is bounded by a computable function of the length of that string”. The problem with this definition is demonstrated by (Haas, 1989): “Not every natural unification grammar has a context-free backbone”. A context-free backbone is inherent in LFG, due to the separation of c-structure from f-structure and the explicit demand that the c-structure be context-free. However, this notion is not well- defined in HPSG, where phrase structure is encoded within feature structures (indeed, HPSG itself is not well-defined in the formal language sense). Such a backbone is certainly missing in Categorial Grammar, as there might be infinitely many categories. (Shieber, 1992) generalizes the concept of off-line parsability but doesn’t prove that parsing with off-line parsable grammars is terminating. We use an adaptation of his definition below and provide a proof. To overcome this problem, (Haas, 1989) uses a different restriction: “A grammar is depth- bounded if for every L> 0 there is a D > 0 such that every parse tree for a sentential form of L

symbols has depth less than D”. (Shieber, 1992) generalizes it and we use an adaptation of his definition below.

Definition 2.10.7 (Finite-range decreasing functions) A function F : D → D, where D is a partially-ordered set, is finite-range decreasing (FRD) iff the range of F is finite and for every d ∈ D, F(d) ⪯ d.

Definition 2.10.8 (Strong off-line parsability) A grammar is strongly off-line parsable iff there exists an FRD-function F from AMRSs to AMRSs (partially ordered by subsumption) such that for every string w and different AMRSs A, B such that A → B, if A → PTw(i +1, j) and B → PTw(i +1, j) then F (A) 6= F (B).

Strong off-line parsability guarantees that any particular sub-string of the input can only be spanned by a finite number of AMRSs: if a grammar is strongly off-line parsable, there can not exist an infinite set S of AMRSs, such that for some 0 ≤ i ≤ j ≤ |w|, s → PTw(i +1, j) for every s ∈ S. If such a set existed, F would have mapped its elements to the set {F (s) | s ∈ S}. This set is infinite since S is infinite and F doesn’t map two different items to the same image, and thus the finite range assumption on F is contradicted. As (Shieber, 1992) points out, “there are non-off-line-parsable grammars for which termination holds”. We use below a more general notion of this restriction: we require that F produce a different output on A and B only if they are incomparable with respect to subsumption. We thereby extend the class of grammars for which parsing is guaranteed to terminate (although there still remain decidable grammars for which even the weaker restriction doesn’t hold).

Definition 2.10.9 (Weak off-line parsability) A grammar G is weakly off-line parsable iff there exists an FRD-function F from AMRSs to AMRSs (partially ordered by subsumption) such that for every string w and different AMRSs A, B such that A → B, if A → PTw(i + 1, j), B → PTw(i +1, j), A 6⊑ B and B 6⊑ A, then F (A) 6= F (B).

Clearly, strong off-line parsability implies weak off-line parsability. However, as we show below, the inverse implication does not hold. We now prove that weakly off-line parsable grammars guarantee termination of parsing in the presence of acyclic AMRSs. We prove that if these conditions hold, only a finite number of different items can be generated during a computation. The main idea is the following: if an infinite number of different items were generated, then an infinite number of different items must span the same sub-string of the input (since the input is fixed and finite). By the parsing invariant, this would mean that an infinite number of AMRSs derive the same sub-string of the input. This, in turn, contradicts the weak off-line parsability constraint.

Theorem 2.10.10 If G is weakly off-line parsable and AMRSs are acyclic then every computation terminates.

Proof: Fix a computation triggered by w of length n. We claim that there is only a finite number of generated items. Observe that the indices that determine the span of items are bounded: 0 ≤ i ≤ j ≤ n. It remains to show that only a finite number of AMRSs are generated. Let x = [i,A,j,c] be a generated item. Suppose another item is generated where only the AMRS is different: x′ = [i,B,j,c] and A 6= B. If A ⊑ B, x′ will not be preserved because of the subsumption test. There is only a finite number of AMRSs A′ such that A′ ⊑ A (since subsumption is well- founded for acyclic AMRSs). Now suppose A 6⊑ B and B 6⊑ A. By the parsing invariant (a) there

38 ′ ′ ∗ ′ ∗ ′ exist A ,B such that A → A ∈ PTw(i +1, j) and B → B ∈ PTw(i +1, j). Since G is off-line parsable, F (A) 6= F (B). Since the range of F is finite, there are only finitely many items with equal span that are pairwise incomparable. Since only a finite number of items can be generated and the computation uses a finite number of operations, the least fix-point is reached within a finite number of steps. The above proof relies on the well-foundedness of subsumption, and indeed termination of pars- ing is not guaranteed by weak off-line parsability for grammars based on cyclic TFSs. Obviously, cycles can occur during unification even if the unificands are acyclic. However, it is possible (albeit costly, from a practical point of view) to spot them during parsing. Indeed, many implementations of logic programming languages, as well as of unification-based grammars (e.g., ALE (Carpenter, 1992a)) do not check for cycles. If cyclic TFSs are allowed, the more strict notion of strong off-line parsability is needed. Under the strong condition the above proof is applicable for the case of non-well-founded subsumption as well. To exemplify the difference between strong and weak off-line parsability, consider a grammar G that contains the following single rule:

1 [ t, f : ⊥ ]  ⇒  [ t, f : 1 ]

and the single lexical entry, w1, whose category is:

Cat(w1) = [ t, f : ⊥ ]

t t t t ... →   → t → = Cat(w1) f : t f :  f : ⊥ f :  f : ⊥    f : ⊥             It is easy to see that no FRD-function can distinguish (in pairs) among these TFSs, and hence the grammar is not strongly off-line parsable. The grammar is, however, weakly off-line parsable: since the TFSs that derive each lexical entry form a subsumption chain, the antecedent of the implication in the definition for weak off-line parsability never holds; even trivial functions such as the function that returns the empty TFS for every input are appropriate FRD-functions. Thus parsing is guaranteed to terminate with this grammar. It might be claimed that the example rule is not a part of any grammar for a natural language. It is unclear whether the distinction between weak and strong off-line parsability is relevant when “natural” grammars are concerned. Still, it is important when the formal, mathematical and computational properties of grammars are concerned. We believe that a better understanding of formal properties leads to a better understanding of “natural” grammars as well. Furthermore, what might be seem un-natural today can be common practice in the future.

Chapter 3

AMALIA – An Abstract Machine for Linguistic Applications

This chapter details the design and implementation of the abstract machine. Section 3.1 presents the formal language in which specifications are input, including the type hierarchy, the grammar and the lexicon. The machine is explained incrementally: section 3.2 describes its core engine, dedicated to the unification of two feature structures. The engine is enveloped with control structures to accommodate parsing in section 3.3. Optimizations and extensions are discussed in section 3.4, and the actual implementation of the machine, including some comparative performance analyses, is described in section 3.5.

3.1 The Input Language

3.1.1 Type Specification

A program (i.e., a grammar) must specify the type hierarchy and the appropriateness specification. We adopt ALE’s syntax (Carpenter, 1992a) for this specification: it is a sequence of statements of the form:

t sub [t1,t2,...,tn] intro [f1:r1,...,fm:rm].

where t, t1, ..., tn, r1, ..., rm are types, f1, ..., fm are features and n, m ≥ 0. If m = 0, the ‘intro’ part is omitted. This statement, which is said to characterize t, means that t1, ..., tn are all – and the only – (immediate) subtypes of t (i.e., for every i, 1 ≤ i ≤ n, t ⊑ ti), and that t has the features f1, ..., fm appropriate for it. Moreover, these features are introduced by t, i.e., they are not appropriate for any type t′ such that t′ ⊏ t. Finally, the statement specifies that Approp(fi, t) = ri for every i. Each type (except ⊤ and ⊥) must be characterized by exactly one statement. The arity of a type t, Ar(t), is the number of features appropriate for it.

The full subsumption relation is the reflexive transitive closure of the immediate relation determined by the characterization statements. If this relation is not a bounded complete partial order, the specification is rendered invalid. The same is true in case it is not an appropriateness specification. We use the type hierarchy in figure 3.1 as a running example, where bot stands for ⊥. The type ⊤ is systematically omitted from type specifications.

40 bot sub [g,d]. c[f4:bot] e ✓ ❙ ✓ g sub [a,b] intro [f3:d]. ✓ ❙ ✓ a sub [c] intro [f1:bot]. c sub [] intro [f4:bot]. a[f1:bot] b[f2:bot] d1 d2 b sub [c,e] intro [f2:bot]. ❙ ✁ ❈ ☞ ❙ ✁ ❈ ☞ e sub []. ☞ d sub [d1,d2]. g[f3:d] d ❍❍ ✟✟ d1 sub []. ❍❍ ✟✟ d2 sub []. bot Figure 3.1: An example type hierarchy

3.1.2 Rules and Grammars A grammar consists of a non-empty set of rules and a set of lexical entries which associate a feature structure with every word. Each rule is a sequence of feature structures of length greater than 1, with possible reentrancies among its elements, and a designated (last) element that is the rule’s head. The rest of the elements in a rule form its body. We use multi-terms for representing rules and lexical entries; however, we employ a simple description language in which such terms are expressible, compatible to the ALE input language (Carpenter, 1992a). ALE’s description language for feature structures is based on a logical language developed by Kasper and Rounds (1986), extended to accommodate types, where path sharing is replaced by the notion of variables due to Smolka (1988). A TFS is described as a conjunction of specifications that might include its type, its features, along with their values, or a variable (whose name begins with a capitalized letter) that refers to it. Multiple occurrences of the same variable denote reentrancy. The syntax for specifying rules consists of a rule name, the reserved word ‘rule’, a description for the rule’s head, the reserved symbol ‘===>’ and then the elements of the rule’s body, each preceded by the reserved word ‘cat>’. Lexical items consist of the word itself, the symbol ‘--->’ and a description. ALE’s descriptions can include several other features that are not supported by AMALIA, most notably disjunctions and inequations. Refer to Carpenter (1992a) for a formal definition of ALE’s input language. As an example, the specification of the example grammar of figure 2.5 is depicted in figure 3.2 below. Notice that ALE’s syntax places the head of rules before the body. Comments are preceeded by ‘%’.

3.2 A TFS Unification Engine 3.2.1 First-Order Terms vs. Feature Structures While TFSs resemble first-order terms (FOTs) in many aspects, it is important to note the dif- ferences between them. Most importantly, while FOTs are essentially trees, with possibly shared leaves, TFSs are directed graphs, within which variables can occur anywhere. Moreover, our sys- tem doesn’t rule out cyclic structures, so that infinite terms can be represented, too. Two FOTs are mutually consistent only if they have the same functor and the same arity. TFSs, on the other hand, can be unified even if their types differ (as long as they have a non-degenerate LUB). Moreover, their arity can differ, and the arity of the unification result can be greater than that of

41 %%%********************** Grammar Rules %grammar s_np_vp rule (phrase,cat:s,agr:Agr,sem:(Sem,arg1:SubjSem)) ===> cat> (cat:(n,case:nom),agr:Agr,sem:pred:SubjSem), cat> (cat:v,agr:Agr,sem:Sem). np_v_np rule (phrase,cat:v,agr:Agr,sem:Sem,sem:arg2:ObjSem) ===> cat> (cat:v,agr:Agr,sem:Sem), cat> (cat:(n,case:acc),sem:pred:ObjSem).

%%%********************** Lexical Entries %lexicon john ---> (word,cat:n,agr:(per:third,num:sg),sem:pred:john). her ---> (word,cat:(n,case:acc),agr:(per:third,num:sg),sem:pred:she). loves ---> (word,cat:v,agr:(per:third,num:sg),sem:pred:love).

Figure 3.2: An example grammar in ALE format any of the unificands. Consequently, many diversions from the original WAM were necessary in our design. In the following sections we try to emphasize the points where such diversions were made. We assume familiarity with basic WAM concepts in this section.

3.2.2 Processing Scheme AMALIA’s engine is designed for unifying two TFSs: a program and a query. Many queries (representing input Natural Language phrases) can be executed with respect to a given program (the grammar of the Natural Language). The program is compiled only once to produce machine instructions. Each query is compiled before its execution; the resulting code is executed prior to the execution of the compiled program. Execution of the instructions, produced by compiling a query, builds a graph representation of the feature structure denoted by the query in the machine’s memory. The processing of a program produces code that, during run-time, unifies the feature structure denoted by the program with a query already resident in memory. The result of the unification is a new TFS, represented as a graph in the machine’s memory. In what follows we interleave the description of the machine, the TFS language it is designed for and the compilation of

42 programs in this language. In section 3.3, queries are extended to sequences of TFSs, representing input strings, and programs are extended to sets of rules, i.e., grammars.

3.2.3 Memory Representation of Feature Structures The major operation of AMALIA’s engine is feature structure unification; therefore, the major data structure of the machine is aimed at storing feature structures in an efficient way. Following the WAM, we use a global, one-dimensional array of data cells called HEAP. A global register H points to the top element of HEAP. Data cells are tagged: STR cells represent nodes, and store their types, while REF cells represent arcs, and contain the address of their targets.1 The number of arcs leaving a node of type t is Ar(t), fixed due to total well-typedness. Hence, we can keep the WAM’s convention of storing all the outgoing arcs from a node consecutively following the node. Given a type t and a feature f that is appropriate for t, the position of the arc corresponding to f (f-arc) in any TFS of type t can be statically determined; the value of f can be accessed in one step. This is a major difference from the approach presented in (A¨ıt-Kaci and Di Cosmo, 1993); it leads to a more time-efficient system without harming the elegance of the machine design. Computations performed on the machine involve TFS unification, during which new structures are built on top of the heap and existing structures might be modified. It is important to note that STR cells differ from their WAM analogs in that they can be dereferenced when a type is becoming more specific. In such cases, a chain of REF cells leads to the dereferenced STR cell. Thus, if a TFS is modified, only its STR cell has to be changed in order for all pointers to it to ‘feel’ the modification automatically. The use of self-referential REF cells is different, too: there are no real (Prolog-like) variables in our system, and such cells stand for features whose values are temporarily unknown. One cell is required for every node and arc, so for representing a graph of n nodes and m arcs, n + m cells are needed. Of course, during unification nodes can become more specific and a chain of REF cells is added to the count, but the length of such a chain is bounded by the depth of the type hierarchy and path compression during dereferencing cuts it occasionally. As an example, figure 3.3 depicts a possible heap representation of the TFS b(b( 1 d, 1 ),d).

address: 1 2 3 4 5 6 7 8 tag: STR REF REF STR REF REF STR STR contents: b 4 8 b 7 7 d d

Figure 3.3: Heap representation of the TFS b(b( 1 d, 1 ),d)

3.2.4 Flattening Feature Structures Before processing a TFS, its linear representation is transformed to a set of “equations”, each having a flat (nesting free) format, using a set of registers {Xi} that store addresses of TFSs in memory. A register Reg[j] is associated with each tag j of a normal term (recall that a term is normal only if all its types are tagged). The flattening algorithm is straight-forward and similar to the WAM’s. The order of the equations correspond to a depth-first, postorder search of the term, where new registers are allocated for sub-terms before the sub-terms are processed. Figure 3.4 depicts examples of the equations corresponding to two TFSs. 1A third tag of heap cells, VAR, in introduced in section 3.4.1.

43 Linear representation: Set of equations a( 3 d1, 3 ) X1= a(X2,X2) X2= d1 b(b( 1 d, 1 ),d) X1= b(X2,X3) X2= b(X4,X4) X4= d X3= d

Figure 3.4: Feature structures as sets of equations

3.2.5 Processing of a Query

When processing an equation of the form Xi0 = t(Xi1 ,Xi2 ,...), representing part of a query, two different kinds of instructions are generated. The first is put node t/n, Xi0 , where n = Ar(t).

Then, for every argument Xij , an instruction of the form put arc Xi0 , j, Xij is generated. Execution of the put node instruction creates a representation of a node of type t on top of the heap and stores its address in Xi0 ; it also increments H to leave space for the arcs of the newly created node. Execution of the subsequent put arc instructions fills this space with REF cells. In order for put arc to operate correctly, the registers it uses must be initialized. Since only put node sets the registers, one way of ensuring correctness is having all put node instructions executed before any put arc instruction is. Hence, the machine maintains two separate streams of instructions, one for put node and one for put arc, and executes all elements of the first before moving to the other. This compilation scheme is called for by the cyclic character of TFSs: as explained in (A¨ıt-Kaci and Di Cosmo, 1993), the original single-streamed WAM scheme would fail on cyclic terms. The effect of the two instructions is given in figure 3.5. We use syntax similar to that of (A¨ıt- Kaci, 1991) for describing the effect of instructions; in particular, the arguments of an instruction are listed succeeding its mnemonic. We use ‘’to denote an STR cell of type t, and ‘’ to denote a REF cell pointing to the address a. Figure 3.6 lists the result of compiling the term b(b( 1 d, 1 ),d). When this code is executed (first the put node instructions, then the put arc ones), the resulting representation of the TFS in memory is the one shown above in figure 3.3.

put node t/n,Xi ≡ HEAP[H] ← ; Xi ← H; H ← H+n+1;

put arc Xi,offset,Xj ≡ HEAP[Xi+offset] ← ;

Figure 3.5: The effect of the put instructions – put

44 put_nodeb/2,X1 % X1 = b( put_arcX1,1,X2 % X2, put_arcX1,2,X3 % X3) put_nodeb/2,X2 % X2 = b( put_arcX2,1,X4 % X4, put_arcX2,2,X4 % X4) put_noded/0,X4 % X4 = d put_noded/0,X3 % X3 = d

Figure 3.6: Compiled code for the query b(b( 1 d, 1 ),d)

3.2.6 Compilation of the Type Hierarchy One of the reasons for the efficiency of our implementation is that it performs a major part of the unification during compile-time: the type unification. The WAM’s equivalent of this operation is a simple functor and arity comparison. It is due to the nature of a typed system that this check has to be replaced by a more complex computation. Efficient methods were suggested for performing LUB computation at run time, relying on efficient encoding of types (see (A¨ıt-Kaci et al., 1989)). We compute LUBs only once, at compile time, using a simple transitive closure computation. Since type unification adds information by returning the features of the unified type, this operation builds new structures, in our design, that reflect this addition. Moreover, the WAM’s special register S is here replaced by a stack. S is used by the WAM to point to the next sub-term to be matched against, but in our design, as the arity of the two terms can differ, there might be a need to hold the addresses of more than one such sub-term. These addresses are stored in the stack (more details and an example are given below). When the type hierarchy is processed, the (full) subsumption relation is computed and checked for bounded-completeness (using a straight-forward implementation of Warshall algorithm (War- shall, 1962)). Then, a table is generated which stores, for every two types t1,t2, the least upper bound t = t1 ⊔ t2. In addition, this table lists also the arity of t, its features and their “origin”: whether they are inherited from t1, t2, both or none of them. Figure 3.7 graphically depicts the LUB and appropriateness tables generated for the running example type hierarchy.

t least upper bound Ar Approp ⊥ g d a b c e d1 d2 ⊥ ⊥ g d a b c e d1 d2 0 g g g a b c e 1 f3:d d d d d1 d2 0 a a a a c c 2 f3:d,f1:⊥ b b b c b c e 2 f3:d,f2:⊥ c c c c c c 4 f3:d,f1:⊥,f4:⊥,f2:⊥ e e e e e 2 f3:d,f2:⊥ d1 d1 d1 d1 0 d2 d2 d2 d2 0

Figure 3.7: Type unification tables

45 Out of this table a series of abstract machine language functions are generated. The functions are arranged as a two-dimensional array called unify type, indexed by two types t1,t2. Each such function receives one parameter, the address of a TFS on the heap. Recall that the machine is designed to unify two feature structures, one of which is part of a program, represented as code, and the other, which is part of the query, already resides on the heap. Each unify type function receives the address of this second unificand as a parameter. When executed, it builds on the heap a skeleton for the unification result: an STR cell of the type t1 ⊔ t2, and a REF cell for each appropriate feature of it. Consider unify type[t1,t2](addr) where addr is the address of some TFS, A (of type t2), in memory. Let t = t1 ⊔ t2, and let f be some feature appropriate for t. If f is inherited from t2 only, the value of the REF cell in the skeleton result is simply set to point to the f-arc in A. In this case, a build ref i instruction is generated, where i is the position of the feature f in t2. If f is inherited from t1 only, a self-referential REF cell is created in the result. But an indication that the actual value for this cell is yet to be determined must be recorded. This is done by means of the global stack S, every element of which is a pair , where action is either ‘copy’ or ‘unify’. In the case we describe, the action is ‘copy’ and the address is that of the REF cell. Thus, the instruction that is generated is build self ref. If f is appropriate for both t1 and t2, a REF cell with the address of the f-arc in A is created, and a ‘unify’ cell is pushed onto the stack. The generated instruction is build ref and unify i, where i is the position of f in t. Finally, if f is introduced by t, a VAR cell is created, with t′ = Approp(t,f) as its value, by the instruction build var t’ (VAR cells are explained in section 3.4.1). As an example, we list in figure 3.8 the resulting code for the unification the two types a and b of the running example. Since a ⊔ b = c, the first instruction of the function is build str c. For every feature that is appropriate for c an instruction is generated according to the rules described above. Finally, a return instruction completes the function.

unify type[a,b] (b addr) build str(c); %since a ⊔ b = c build self ref; % the value of f1 is yet unknown. build ref(1); % f2 is the firstfeatureof b, build ref and unify(2); % f3 is the second, and it still % has to be unified with a. build var(bot); % f4 is a new structure. return;

Figure 3.8: unify type[a,b]

This example code is rather complex; often the code is much simpler: for example, when t2 is subsumed by t1, nothing has to be done. As another example, if t1 is subsumed by t2, then only additional features of the program term have to be added to A. For each such feature, a unify feat i instruction is generated, where i is the position of the feature. Another case is when t1 and t2 are not compatible: unify type[t1,t2] returns ‘fail’. This leads to a call to the function fail, which aborts the unification.2 The effect of the type unification instructions is given in figure 3.9. 2The notion of failure is elaborated in section 3.3.4; rather than aborting all operations, failure will indicate the

46 The special purpose register ADDR is used for passing the parameter; the exact details of control transfer mechanisms, including the effect of return, are straight-forward and won’t be specified here.

build str t ≡ build self ref i ≡ HEAP[H] ← ; HEAP[H] ← ; H ← H + 1; push(copy,H); build ref and unify i ≡ H ← H + 1; HEAP[H] ← ; build var t ≡ push(unify,H); HEAP[H] ← ; H ← H + 1; H ← H + 1; build ref i ≡ unify feat i ≡ HEAP[H] ← ; push(unify,ADDR+i+1); H ← H + 1;

Figure 3.9: The effect of the type unification instructions

3.2.7 Processing of a Program The program is stored in a special memory area, the CODE area. Unlike the WAM, in our framework registers that are set by the execution of a query are not helpful when processing a program. The reason is that there is no one-to-one correspondence between the sub-terms of the query and the program, as the arities of the TFSs can differ. The registers are used, but (with the exception of X1) their old values are not retained during execution of the program. Three kinds of machine instructions are generated when processing a program equation of the form Xi0 = t(Xi1 ,. . . ,Xin ). The first one is get structure t/n,Xi0 , where n = Ar(t). For each argument Xij of t an instruction of the form unify variable Xij is generated if Xij is

first encountered; if it was already encountered, unify value Xij is generated. For example, the machine code that results from compiling the program a( 3 d1, 3 ) is depicted in figure 3.10. The implementation of these three instructions is given in figure 3.11. get_structure a/2,X1 % X1 = a( unify_variableX2 % X2, unify_valueX2 % X2) get_structure d1/0,X2 % X2 = d1

Figure 3.10: Compiled code for the program a( 3 d1, 3 )

The get structure instruction is generated for a TFS Ap (of type t) which is associated with a register Xi. Execution of this instruction matches Ap against a TFS Aq that resides in memory, using Xi as a pointer to Aq. Since Aq might have undergone some type inference (for example, due to previous unifications caused by other instructions), the value of Xi must first be dereferenced. need in backtracking to an alternative solution.

47 get structure t/n,Xi ≡ addr ← deref(Xi); Xi ← addr; case HEAP[addr] of : %uninstantiatedcell HEAP[H] ← ; bind(addr,H); %HEAP[addr] ← for j ← 1 to n do HEAP[H+j] ← for j ← n downto 1 do push(); H ← H+n+1; : %anode if (unify type[t,t’](addr) = fail) then fail;

unify variable Xi ≡ ← pop(); Xi ← addr;

unify value Xi ≡ ← pop(); case action of copy: HEAP[addr] ← HEAP[Xi]; unify: if (unify(addr,Xi) = fail) then fail;

Figure 3.11: Implementation of the get/unify instructions

This is done by the function deref which follows a chain of REF cells until one that does not point to another, different REF-cell, is reached. The address of this cell is the value it returns. The dereferenced value of Xi, addr, can either be a self-referential REF cell or an STR cell. In the first case, the TFS has to be built by executing the program. A new TFS is being built on top of the heap (using code similar to that of put structure) with addr set to point to it. For every feature of this structure, a ‘copy’ item is pushed onto the stack. The second case, in which ′ Xi points to an existing TFS of type t , is the more interesting one. An existing TFS has to be unified with a new one whose type is t. Here the pre-compiled unify type[t,t’] is invoked. To readers familiar with the WAM, the unify variable instruction resembles very much its WAM analog, in the read mode of the latter. There is no equivalent of the WAM’s write mode as there are no real variables in our system. However, in unify value there is some similarity to the WAM’s modes, where the ‘copy’ action corresponds to write mode and the ‘unify’ action to read mode. In this latter case the function unify is called, just like in the WAM. This function (figure 3.12) is based upon unify type. In contrast to unify type, the two TFS arguments of unify reside in memory, and full unification is performed. The first difference is the reason for removing an item from the stack S and using it as a part of the unification process; the second is realized by recursive calls to unify for subgraphs of the unified graphs. Notice that the function returns immediately if its arguments point to the same address, and binds its arguments otherwise. This guarantees correctness even in the face of cyclic structures. When a sequence of instructions that were generated for some TFS is successfully executed on

48 function unify(addr1,addr2:address): boolean; begin addr1 ← deref(addr1); addr2 ← deref(addr2); if (addr1 = addr2) then return(true); if (HEAP[addr1] = ) then bind(addr1,addr2); return(true); if (HEAP[addr2] = ) then bind(addr2,addr1); return(true); t1 ← HEAP[addr1].type; t2 ← HEAP[addr2].type; if (unify type[t1,t2](addr2) = fail) then return (false); for i ← 1 to Ar(t1) do ← pop(); case action of copy: HEAP[addr] ← ; unify: if (not (unify (addr,addr1+i))) then return(false); bind(addr1,addr2); return(true); end;

Figure 3.12: The code of the unify function some query, the result of the unification of both structures is built on the heap and every register Xi stores the value of its corresponding node in this graph. The stack S is empty.

3.3 Parsing

The previous section delineated the core engine of AMALIA; this section shows how it is extended with control instructions to accommodate for parsing. This constitutes the major difference be- tween AMALIA and abstract machines that were devised for variants of Prolog: computations performed on AMALIA amount to parsing with respect to the input grammar, as opposed to SLD resolution. The parsing process described in section 2.9 above is a generic, abstract one: there is no specification of the order in which new items are computed during each application of TG,w. When designing the control mechanisms of the machine, several parameters have to be explicated and their values determined. In what follows we describe how the machine (and a compiler for it) are designed to allow for efficient implementation of parsing, that is, computation of the least fix-point of TG,w for a given grammar G and an input string of lexical elements w of length n. The control modules of AMALIA are motivated by the abstract process of chart parsing and are not, in general, inspired by the specific TFS-based formalism that we deal with. For example, the machine can also be used for parsing with respect to “plain” context-free grammars. Notice that cases 2.3, 2.4 and 2.5 of TG,w are independent of the argument I and add the same items in every application of the operator. Therefore, when computing the least fix-point of the operator, they are computed only once, when the process is initiated. Cases 2.1 and 2.2 are more

49 interesting. We treat completion as a special case of dot movement where the dot is moved from the penultimate position to the final one, so that completion can take place immediately after the final application of dot movement. Dot movement creates an item on the basis of two items in I, an active one [l, A, m, Act] and a complete one [m, B, r, Comp], where l ≤ m ≤ r.3 Since for every off-line parsable grammar the number of items that span a particular substring of the input is finite, it is possible first to generate the items spanning (m, r), for all m>l, and all the items spanning (l,m), where m < r, and only then the items spanning (l, r). This is the invariant underlying our design. When generating items that span (l, r), the active items that span (l,m) are combined with complete items that span (m, r), where m decreases from r − 1 to l. We use a chart to store generated items, since they may be used more than once.

3.3.1 A Parsing Algorithm A chart of size n is a data structure that can be accessed by a key that is a triple (l,m,r), where 1 ≤ r ≤ n, 0 ≤ l ≤ r − 1 and l ≤ m ≤ r − 1. Given these restrictions, a chart of size n can be n r−1 r−1 n r−1 n 3 accessed by r=1 l=0 m=l 1 = r=1 l=0 (r − l) = r=1 r(r − 1)/2 = (n − n)/6 different keys. Keys are linearly ordered as follows: (l,m,r) ≺ (l′,m′, r′) iff r < r′ or ((r = r′) and (l>l′)) or ((r = r′) andP (l P= l′) andP (m

50 a. Building chart entries

right Procedure main left 0 12 3 4 n 0 for right ← 0 to n do 1 for left-1 ← right downto 0 do 2 build chart entry (left,right)

n-1

b. Constructing one chart entry

right left l r n

Procedure build chart entry (left,right) l mm-1 2 1 m for mid ← right-1 downto left do m-1 combine(left,mid,right) 2 r-1 1

n-1

c. Combining two entries Procedure combine(left,mid,right) for every active edge e1 in chart[left,mid] for every complete edge e2 in chart[mid,right] e ← dot movement(e1,e2) chart[left,right] ← chart[left,right] ∪ {e}

Figure 3.13: Parsing – informal description

The last part of the process requires some precaution: application of dot movement to an active edge in (l,m) and a complete one in (m, r) results in a new edge in (l, r). This edge might be complete; in the special case where l = m, the complete edge that is thus created is added to (l, r) = (m, r). Such edges can now be combined with active edges in (l,m) again. Notice that the situation occurs only when l = m. The only way an active edge in the (l,l) entry of the chart (that is, an edge with the dot in the initial position) can become complete is if the rule on which the edge is based is of length 2 (a unit rule). Therefore, for unit rules a special treatment is required: first, active edges that originate from unit rules are placed before edges that stem from other rules within the same chart entry. This guarantees that complete edges that result from application of dot movement on (an active edge that stems from) a unit rule are constructed before they are needed. The only problem left is the possibility of more than one unit edge in the same chart entry. Such rules are required to be ordered by the grammar writer, such that if ρ1 can “feed” ρ2, 4 ρ1 precedes ρ2 in the grammar. The advantage of this parsing algorithm is that it is simple; in particular, it does not require an agenda. In the above description we assumed that the chart is initialized with complete edges for the lexical entries of the input words, and with active edges – with the dot in the initial position 4Such an ordering must always exist for off-line parsable grammars.

51 – for the rules. Following this informal description, several observations about the parsing process can be made: • After the edges in the [l, r] chart entry are constructed, no more edges will be added to entries that were constructed earlier; • When complete edges of column c are used, no complete edges of columns 1,...,c − 1 will be used any more. Consequently, at every point during the process, the complete edges in chart entries whose second index is less than r will not be used any more; • active edges may be used over and over again. These observations guide the specification of the machine architecture, which is divided into data structures and machine instructions.

3.3.2 Data Structures The chart is represented as a two-dimensional array, indexed by two integers, where chart[l,r] refers to a chart entry. Each chart entry contains two lists of edges, which are referred to as chart[l,r].active and chart[l,r].complete. A complete edge is represented in memory as a structure e containing an address, e.addr, of some HEAP cell that is the root of a feature structure. An active edge is represented as a structure containing a pointer, e.label, and a set of register values, e.regs. An active edge always stems from some grammar rule; it records a major part of the state of the machine at some given time. e.label is the address (in CODE) of the first instruction in the code that is generated for the TFS immediately following the dot of the edge; e.regs records the values of the machine registers. The registers represent the part of the edge prior to the dot, whereas the pointer represents the part following it.5 During a computation of AMALIA, edges are repeatedly stored in the chart and loaded from it. When an edge e is loaded, e.label is used to determine the next instruction that is to be executed; this implies an implicit branch whenever an active edge is loaded (see the effect of the call instruction below). The auxiliary function make edge creates an edge from its components (an address and a set of values for the registers). Special purpose registers record the current values of the chart indices LEFT,RIGHT and MID and the input length LEN. Like all the machine’s data structures, the control structures are initialized before every execution of a program: the registers LEFT,MID,RIGHT and LEN are set to 0 and new is applied to every chart entry. The control structures are affected by the execution of machine instructions that are generated for both the program and the query, as explained below.

3.3.3 Compilation Compilation of a grammar produces code that, when executed, realizes the parsing process with respect to the grammar and an input query (that is a sequence of TFSs) representing the Natural Language input. In this section we describe the compilation scheme in terms of the resulting code. First, the chart is initialized. Three sets of edges have to be inserted to the chart, in corre- spondence to the three cases of TG,w (definition 2.9.2) that are independent of the input: complete edges for lexical items, active edges (with the dot in the initial position) for rules and complete edges for ǫ-rules. 5The machine’s stack is always empty after the execution of a code that was generated for a feature structure. Therefore, the stack does not have to be included in the state.

52 The first set, corresponding to case 2.5 of TG,w, is inserted to the chart by processing the query. Recall that processing a query results in the generation of machine code that is executed prior to the execution of the program code. When the query is composed of more than one feature structure, the generated code contains proceed instructions that are inserted after every feature structure in the query. The effect of proceed is given in figure 3.14; the parameter Xi is the register that points to the root of the feature structure that was just created (that is, the first register mentioned in the code that immediately precedes the proceed instruction). Processing

proceed Xi ≡ LEN ← LEN + 1; add edge(make edge(Xi,null),chart[LEN-1,LEN].complete);

Figure 3.14: The effect of the proceed instruction of the query results in edges that are added to the [i − 1,i] diagonal of the chart; in addition, the value of LEN is set to the length of the input. The second set, corresponding to case 2.3, is added by means of specific machine instructions, put rule l, that are generated for each of the rules in the input grammar, where l is the label of the first instruction of the compiled rule. put rule adds an active edge for the rule starting at address L, with no registers bindings, to the [i,i] entries of the chart (figure 3.15). Those instructions are placed by the compiler prior to any other instruction of the program.

put rule L ≡ for i← 0 to LEN do add edge(make edge(L,null), chart[i,i].active);

Figure 3.15: The effect of put rule

Rules of length one (ǫ-rules, or empty categories), corresponding to case 2.4, are processed by the compiler in a special way, that is described in section 3.4.3. The main product of the compiler is the code that corresponds to dot movement and completion (cases 2.1 and 2.2). For a rule of the form: A1, A2,...,An ⇒ A0 the compiler generates the code given in figure 3.16, where ri is the index of the first register that is mentioned in the code for Ai, and where X is the first register mentioned in the code for A0. The effect of this code is discussed in section 3.3.4. Controlling the order in which chart entries are constructed is independent of the grammar. Consequently, the compiler generates a few pieces of identical code for every grammar. On the basis of the generated code for one rule, the code for a grammar consisting of k rules is given in figure 3.17. An important observation regarding the control flow of a compiled program has to be made here. In general it is impossible to determine, during compile time, the order in which machine instructions will be executed. Indeed, some control instructions (in particular, the key manipu- lation instructions) resemble ordinary conditional branches. However, other instructions (call, copy active edge and copy complete edge) ‘hide’ implicit branches to addresses that are only known at run time. Further details are given below.

53 put rule L1 ; add active edges to the main diagonal L1: load fs r1 [ (program) code for A1 ] copy active edge L2 . .

Li: load fs ri ; Xri ← ADDR [ (program) code for Ai ] copy active edge Li+1 ; add edge to chart and return . . Ln: load fs rn [ (program) code for An ] [ (query) code for A0 ] copy complete edge X

Figure 3.16: Compiled rule

To conclude this section, figure 3.18 depicts (part of) the code that was generated by the com- piler for the example grammar of figure 3.2. Lines 2-11 contain the constant, grammar independent code; lines 12-49 list the code that was generated for the first rule; and the code on the right side was generated for the lexical entries of the words “john” and “love”.

3.3.4 Effect of the Machine Instructions This section details the effect of the control module machine instructions. While the effect of each instruction is given independently of its context, we assume throughout the description that a sequence of instructions is present that were generated by the compiler for some grammar. The instructions first key, next key and check key, constantly generated for every grammar, are aimed at implementing the outermost control flow during parsing. Motivated by the invariant stated in section 3.3.1 above, these instructions form a loop that causes AMALIA to build all the necessary chart entries in the order specified there. The body of the loop contains code whose effect corresponds to the procedure Combine of figure 3.13. The effect of the key-manipulation instructions is depicted in figure 3.19. Every iteration of the main loop corresponds to the combination of two chart entries, taking the active edges from the entry indexed by the values of LEFT and MID and the complete edges from the entry indexed by the values of MID and RIGHT. To loop over all the active edges (in the designated chart entry), two instructions are used: tst active edges and next active edge. The instructions tst complete edges and next complete edge scan the complete edges in the chart entry that is indexed by the values of MID and RIGHT. The effect of these four instructions is given in figure 3.20. The two instructions that loop over the active edges are straight-forward; the other two are quite similar, with a few differences. First, tst complete edges initializes the list of complete edges in the [MID,RIGHT] chart entry. Second, next complete edge calls the auxiliary function reset trail, whose purpose will be discussed presently.

54 put rule L1 . . put rule Lk first key Lstart: next key Lact: tst active edges Lend ′ Lcomp: tst complete edges Lcomp load machine state call next complete edge Lcomp ′ Lcomp: next active edges Lact Lend: check key Lstart end program L1: codefor1strule . . Lk: codefor k-th rule

Figure 3.17: Compiled code for a grammar with k rules

call (figure 3.21) sets the stage for the operation of dot movement. Let e1 be the current active edge in the [LEFT,MID] entry of the chart, end e2 – the current complete edge in the [MID,RIGHT] entry. The address in the code area of the next instruction to be executed is stored in e1.label; this code is to be executed on the feature structure pointed to by e2.addr. In other words, the code that was generated for the next element of e1, viewed as a procedure call, is to be executed on the complete edge e2, whose address is viewed as a parameter. call loads the registers’ values from e1, and saves e2.label in the special purpose register ADDR, which is used for passing parameters. Then, it saves the address to return to (that is, the address of the instruction following the call) in a special purpose stack of return addresses and branches to e1.label The first instruction that is executed in a ‘procedure’ is load fs X, which loads the value stored in ADDR onto the register X. Then, the instructions that were generated for this part of the rule are executed in order, thus unifying the TFS immediately after the dot (in the active edge) with the TFS that Xr points to. If the unification succeeds, control flows to the copy active edge instruction that adds the newly created MRS to the chart, and returns the control to the address stored in the stack of return addresses. If the entire body of the rule was consumed, the last instruction is copy complete edge (see figure 3.16), which adds the newly created complete edge to the chart and returns. The auxiliary function copy mrs copies the MRS accessible from the current registers on top of the heap. copy fs X copies the feature structure rooted in X on top of the heap. The effect of these three instructions is depicted in figure 3.22. When the code that is associated with some program feature structure is executed, the heap is modified. Sometimes the same code has to be executed on several TFSs (since one active edge might be combined with several complete ones). If the unification fails, that is, fail is called, the heap must be restored to its original form. To this end a new data structure is introduced: the trail. It is an array whose contents are pairs of the form , which record modifications

55 0 put_ruleL6(1) 0 L91 : put_node word, X1 1 put_ruleL8(2) 1 put_noden,X2 2 first_key 2 put_nodecase,X5 3 L1 : next_key 3 put_nodeagr,X3 4 L2 : tst_active_edges L5 4 put_nodethird,X6 5 L3 : tst_complete_edges L4 5 put_nodesg,X7 6 call 6 put_nodesem,X4 7 next_complete_edge L3 7 put_nodejohn,X8 8 L4 : next_active_edge L2 8 put_nodeatom,X9 9 L5 : check_key L1 9 put_nodeatom,X10 10 end_of_program 10 put_arcX1,1,X2 11 L6 : load_fs X1 11 put_arcX1,2,X3 12 get_structuresign,X1 12 put_arcX1,3,X4 13 unify_variableX2 13 put_arcX2,1,X5 14 unify_variableX3 14 put_arcX3,1,X6 15 unify_variableX4 15 put_arcX3,2,X7 16 get_structuren,X2 16 put_arcX4,1,X8 17 unify_variableX5 17 put_arcX4,2,X9 18 get_structurenom,X5 18 put_arcX4,3,X10 19 get_structureagr,X3 19 proceedX1 20 unify_variableX6 21 unify_variableX7 40 L93 : put_node word, X1 22 get_structureper,X6 41 put_nodev,X2 23 get_structurenum,X7 42 put_nodeagr,X3 24 get_structuresem,X4 43 put_nodethird,X5 25 unify_variableX8 44 put_nodesg,X6 26 unify_variableX9 45 put_nodesem,X4 27 unify_variableX10 46 put_nodelove,X7 28 get_structureatom,X8 47 put_nodeatom,X8 29 get_structureatom,X9 48 put_nodeatom,X9 30 get_structureatom,X10 49 put_arcX1,1,X2 31 copy_active_edge L7 50 put_arcX1,2,X3 32 L7 : load_fs X11 51 put_arcX1,3,X4 33 get_structuresign,X11 52 put_arcX3,1,X5 34 unify_variableX12 53 put_arcX3,2,X6 35 unify_valueX3 54 put_arcX4,1,X7 36 unify_variableX13 55 put_arcX4,2,X8 37 get_structurev,X12 56 put_arcX4,3,X9 38 get_structuresem,X13 57 proceedX1 39 unify_variableX14 40 unify_valueX8 41 unify_variableX15 42 get_structureatom,X14 43 get_structureatom,X15 44 put_nodephrase,X16 45 put_nodes,X17 46 put_arcX16,1,X17 47 put_arcX16,2,X3 48 put_arcX16,3,X13 49 copy_complete_edge 16

Figure 3.18: Compiled code obtained for the example grammar to HEAP cells. Pairs are being added to the trail by means of the bind function, whenever the value of a heap cell is modified. If all the unifications are successful, and control flows naturally to next complete edge, the trail is reset using the auxiliary function reset trail. Consider now the case where some unification fails. The effect of the fail function has to be modified: failure of a “local” unification no longer means termination of the program; rather, it indicates the need to try different edges to combine. Failure can be detected during the execution of any of the instructions in the program code. In this case, the previous bindings are undone, using a call to unwind trail, and the stack is initialized using reset stack. Then, a branch is made to the last tst complete edges instruction executed. This instruction’s address is stored in the special purpose register RETURN ADDR. The definition of fail is given in figure 3.23.

56 first key ≡ RIGHT ← 0; LEFT ← -1; MID ← -1; next key ≡ MID ← MID-1; if (MID < LEFT) then LEFT ← LEFT - 1; if (LEFT < 0) then RIGHT ← RIGHT + 1; LEFT ← RIGHT - 1; MID ← RIGHT - 1; init(chart[LEFT,MID].active); init(chart[MID,RIGHT].complete); check key l ≡ if (RIGHT 6= LEN or LEFT 6= 0 or MID 6= LEFT) then branch l;

Figure 3.19: The effect of the key-manipulation instructions

tst active edges l ≡ if exhausted(chart[LEFT,MID].active) then branch l;

next active edge l ≡ advance(chart[LEFT,MID].active) branch l

l: tst complete edges l′ ≡ if exhausted(chart[MID,RIGHT].complete) then init(chart[MID,RIGHT].complete); branch l′;

next complete edge l ≡ reset trail; advance(chart[MID,RIGHT].complete); branch l;

Figure 3.20: The effect of the edge traversal instructions

57 l: call ≡ registers ← current(chart[LEFT,MID].active).regs; ADDR ← current(chart[MID,RIGHT].complete).addr; push return addr(l +1); branch current(chart[LEFT,MID].active).label;

Figure 3.21: The effect of the call instruction

load fs r ≡ Xr ← ADDR;

copy active edge l ≡ copy mrs; add edge(make edge(l,registers),chart[LEFT,RIGHT].active); branch pop return address();

copy complete edge X ≡ copy fs(X); add edge(make edge(X,null),chart[LEFT,RIGHT].complete); branch pop return address();

Figure 3.22: The effect of the copy instructions

The WAM uses a trail to undo ‘side effects’ on the stack and the heap upon backtracking to a choice point (see (A¨ıt-Kaci, 1991, chapter 4.2)). In AMALIA no backtracking is performed and so the trail could have been eliminated. Notice that after execution of program code, the newly created edge (whether active or complete) is copied onto the heap. A different strategy could have been chosen, in which the active edge is copied prior to the execution of the program code. In this case, all that has to be done upon failure is restoring the value of the heap pointer H, so that the cells that were used by (ineffective) instructions can be re-used. While the gain in this strategy is that no trail is needed, it doesn’t seem to be too effective: active edges would have to be copied before they are used, which means that many MRSs will be copied in vain. Since copying is one of the most time-consuming operations, we opt for the method described above.

3.4 Optimizations and Extensions 3.4.1 Lazy Evaluation of Feature Structures One of the drawbacks of maintaining total structures is that when two TFSs are unified, the values of features that are introduced by the unified type have to be built. For example, unify type[a,b] (figure 3.8) has to build a TFS of type bot, which is the value of the f4 feature of type c. This

58 procedure fail; unwind trail; reset stack; branch pop return address();

Figure 3.23: The fail function is expensive in terms of both space and time; the newly built structure might not be used at all. Therefore, it makes sense to defer it. To optimize the design in this aspect, a new kind of heap cells, VAR-cells, is introduced. A VAR cell whose contents is a type t stands for the most general TFS of type t. VAR cells are generated by the various unify type functions for introduced features; they are expanded only when the explicit values of such features are needed: either during the execution of get structure, where the dereferenced value is a VAR cell, or during unify.6 In both cases the TFS has to be built, by means of executing the pre-compiled function build most general fs with the contents of the VAR cell as an argument. This function (which is automatically generated by the type hierarchy compiler) builds a TFS of the designated type on the heap, with VAR cells instead of REF cells for the features. These cells will, again, only be expanded when needed. We thus obtain a lazy evaluation of TFSs that weakly resembles G¨otz’s notion of unfilled feature structures ((G¨otz, 1994)). Moreover, we gain another important property, namely that our type hierarchies can now contain loops, since appropriateness loops can only cause non termination when introduced features are fully constructed. This approach might not be applicable in the presence of type constraints, which are currently not supported by AMALIA.

3.4.2 Partial Descriptions AMALIA requires that its input be total: both grammar rules and lexical entries are required to consist of totally well-typed feature structures. This requirement might be problematic for the grammar writer, who might prefer to specify only partial information. To this end, the com- piler employs a pre-processor that performs type inference on the partial input; the result of this processing is almost total, but partiality is maintained in certain cases. Recall that a normal term consists of a tag, a type and a sequence of arguments, each of which is a normal term. Whenever some sub-term is the most general term of its type, it is substituted by the type name only. Using the running example of figure 3.1, the term a(bot, d()) can be replaced by the term a. When the compiler encounters such a partial description, it creates one of the following two instructions: put var t/n, Xi, if the type t is part of a query code, or get var t/n, Xi, if it is part of a program code. put var is very similar to put node, with two differences: it creates a VAR-cell, rather than an STR-cell, on the heap; and it does not leave space for REF-cells, as there won’t be any. get var is the analog of get structure, but is much simpler: it uses build most general fs to create the most general feature structure of type t on top of the heap, and then calls unify to unify this newly created TFS with the one that is pointed to by Xi. Thus, 6The effect of get structure and the definition of unify are modified in a straight-forward way to accommodate VAR cells.

59 partial descriptions in the input result in a more efficient code, and consequently in a faster, more space-economic processing.

3.4.3 Empty Categories The presence of empty categories (ǫ-rules) in a grammar causes both theoretical and practical problems. There is a current trend in HPSG of avoiding empty categories altogether, due to theoretical linguistic and cognitive reasons (see, e.g., (Sag and Fodor, 1994)). From a computational point of view, such categories always cause considerable efficiency degradation. AMALIA is designed to support empty categories as an inherent part of the input grammars. Empty categories are processed by the compiler at compile time. Each category is matched against every element i in the body of every rule r, and if the unification succeeds, a new rule is created: this rule consists of r, modified by the effects of the unification, in which the i-th element is removed. This process can be shown to yield an equivalent grammar, if it terminates. However, for certain grammars, the process will never terminate, since it can lead to the creation of new empty categories (when it is applied to rules with just one element in their bodies). A typical example would be the rule list 2 list =⇒ hd : a   tl : 2       When it is applied to the empty category elist , a new empty category is created:

list  hd : a tl : elist      This new empty category can, in turn, be unified with the head of the rule, etc. To eliminate such infinite loops and to maintain efficiency even in face of empty categories, the compiler limits their application: new rules, that were obtained by applying some empty category to an original grammar rule, can not by applied to other empty categories. This implies that a single grammar rule cannot derive two empty categories. Since usually empty categories are designed to operate in a very limited context, this seems to be a reasonable compromise.

3.4.4 Lexical Ambiguity The lexicon associates every word w with a set of feature structures Cat(w). If this set contains more than one element, w is said to be ambiguous. AMALIA processes the lexicon at compile time: to every input word wi the lexicon assigns a normal term, which is transformed to machine instructions. If wi is ambiguous the lexicon assigns it several normal terms. The code that is generated for these terms is regular query code; however, the instruction that separates the code of one term from the code of another, if both are associated with the same word, is same word instead of proceed. The only difference between the two instructions is that the former does not increment the value of the special purpose register LEN. At run time, proceed causes the machine to search for the next lexical entry, whereas same word does not. Thus, the execution of the code that was generated for an ambiguous lexical entry w causes several complete edges to be inserted into the chart, one for each element of Cat(w).

60 3.4.5 Functional Attachments While the phrase structure grammar organization underlying our design is usually appropriate for constructing grammars for natural languages, there is sometimes need in computations that are not easily expressed using the formalism. Although contemporary grammatical formalisms tend to be highly declarative in nature, grammar writers might find it useful to resort to some mecha- nism that enables simple computations to be executed without the full power of the grammatical formalism. ALE supports this need to the fullest, by incorporating a complete system of definite clause attachments to grammar rules. Basically, this is a version of a Prolog-like programming language, where the basic units are TFSs rather than FOTs. AMALIA does not include such a module. As a limited solution, we implemented a small set of functions that can be used by the grammar writer; these functions are executed during the parsing process and their results might be integrated with the parsing. As an example, consider the pre-defined function append. It receives two parameters, which must be lists of TFSs, and returns a list consisting of their concatenation. The grammar writer can use append by integrating it in the grammar: following the body of any rule a goal of the form ‘goal> append(L1,L2,L3)’ can be placed. The variables L1 and L2 must be associated with lists, and after the goal is executed, the variable L3 will be bound to the concatenation of the input lists. Now, L3 can be used in the head of the rule. Since parsing is performed bottom-up, goals can only be placed after all the elements of the body of a rule; their input parameters must be instantiated, and the output parameter can only be used in the head of the rule. Currently, only a small number of functions (mainly for handling lists and sets) are integrated into AMALIA, but more can be easily added. It must be noted, though, that this situation is very different from ALE, in which the user can define just any definite clause relation.

3.5 Implementation

AMALIA is implemented as a complete grammar development system, containing a compiler from the ALE input language to the abstract machine language, an interpreter for the machine instructions, a simple debugger for the machine language and a graphical user interface (GUI) that eases the process of grammar design and debugging. The major part of the software is written in C; the compiler is written using yacc and lex, and the graphical user interface is implemented using Tcl/Tk (Ousterhout, 1994). The system is implemented on a Sun Sparc station under the Solaris operating system. The system was tested with a wide variety of grammars, mostly adaptations of existing ALE grammars. It is important to note that AMALIA does not provide the wealth of input specifica- tions ALE does. Some of ALE’s features that are not included in AMALIA include lexical rules, free use of definite clause attachments and disjunctive descriptions. On the other hand, develop- ment of grammars in AMALIA is made easier due to the GUI and its improved performance over ALE. A complete description of AMALIA’s implementation, its deviations from ALE’s input language and a complete users’ guide, is given in Wintner, Gabrilovich, and Francez (1997). To compare AMALIA with ALE we have used a few benchmark grammars. Both systems were used to compile the same grammar and to parse the same strings. We shortly describe below each of the grammars, and summarize the results of a performance comparison of AMALIA and ALE in figure 3.24. All times are in seconds; in ALE we measured the time for the first result to be displayed, and in AMALIA – the time for all the results.

61 The first grammar is an early version of the HPSG-based Hebrew grammar described in the next chapter. It consists of 4 rules and one empty category; the type hierarchy contains 84 types and 32 features, and the lexicon contains 13 words. The second grammar is an HPSG- based grammar for a subset (emphasizing relative clauses) of the Russian language, developed by Evgeniy Gabrilovich and Arkady Estrin. It consists of 8 rules and 76 lexical entries; the type hierarchy contains 151 types and 31 features. The third example is a simple grammar generating the language {anbn | n> 0}. While the execution times for this simple grammar are less important, the differences in compilation time indicate a major advantage in using AMALIA for instructional purposes; in such cases grammars are compiled over and over again, while they are usually executed only a few times.

task ALE AMALIA Grammar 1 Compilation 35.0 1.4 Parsing, 6 words, 2 results 0.5 0.5 Parsing, 10 words, 8 results 3.2 0.8 Parsing, 14 words, 125 results 140.0 9.0 Grammar 2 Compilation 68.0 2.3 Parsing, 2 words, 2 results 0.5 0.8 Parsing, 4 words, 2 results 2.4 0.9 Parsing, 7 words, 2 results 5.1 1.1 Parsing, 8 words, 2 results 7.8 1.2 Parsing, 12 words, 2 results 17.0 1.5 Grammar 3 Compilation 6.5 0.2 Parsing, n=4 0.1 0.2 Parsing, n=8 0.8 0.3 Parsing, n=16 2.8 1.1 Parsing, n=32 26.0 16.0

Figure 3.24: Performance comparison of AMALIA and ALE

62 Chapter 4

An HPSG-based Grammar for Hebrew

In order to test the validity of the abstract machine and its appropriateness for designing HPSG- based grammars, we have devised a small-scale grammar for a fragment of the Hebrew language, based upon the principles of HPSG as stipulated in Pollard and Sag (1994). It must be emphasized that the main objective of the grammar design was to verify the machine, and therefore its linguistic contributions are minor. Still, it might serve as the starting point for the construction of a larger scale, broad coverage grammar for the language. The Hebrew script uses a character set that differs from the one that appears on an ordinary keyboard. The script is highly ambiguous, as most of the vowels are not written; furthermore, many particles (prepositions, articles and conjunctions) are attached (in the script) to the words succeeding them. Since the problem of morphological analysis of Hebrew, even when represented in the Hebrew script, is practically solved, we have decided in this work to use a transcription of Hebrew, known as Phonemic Script1 (Ornan, 1986; Ornan, 1994; Ornan and Katz, 1995). First, it uses only symbols that appear on any computer keyboard; second, it is unambiguous, similarily to average European languages. We first list (section 4.1) some of the major HPSG schemata that serve to combine different kinds of phrases, along with their adaptation to our needs. Section 4.2 describes the structure of noun phrases, and we concentrate in section 4.3 on the status of the definite article in Hebrew. Section 4.4 briefly discusses noun-noun constructs. The complete grammar is listed in appendix B.

4.1 Phrase Structure Schemata

HPSG “rules” are organized as a set of principles that set constraints on the properties of well- formed phrases, along with a set of ID schemata that license certain phrase structures. The schemata are independent of the categories of the involved phrases; they state general conditions for the construction of larger phrases out of smaller ones, according to the function of the sub- phrases. In (Pollard and Sag, 1994) six schemata are listed; we have adopted four of them in our grammar. 1This script was accepted as a standard number ISO-DIS 259-3.

63 ID schemata only license certain phrase combinations. They do not specify all the constraints imposed on the involved sub-phrases, as these are articulated by the principles. However, in a system that is based on phrase-structure rules (e.g., ALE) the principles and the schemata must be interleaved: each rule encodes not only the phrase structure, but also constraints imposed by the grammar principles. Consider, for example, the head-subject schema of HPSG, which states that a phrase with an empty subj list can be constructed by combining a (head) phrase, whose subj list is of length 1, with a (subject) phrase. Nothing in the schema relates the subject to the head; it is the subcategorization principle that requires that the subject be unifiable with the single element in the head’s subj list. Furthermore, the head feature principle requires that the values of the head features in both the phrase itself and its head sub-phrase be identical. The first rule listed below (figure 4.1) combines these constraints: it states that a phrase can be constructed out of two sub-phrases, the subject and the head, where the first element (the value of the hd feature) in the subj list of the head is token-identical to the subject (through the use of the Subj variable), and the head features of the phrase and its head are token-identical (through the use of the Head variable). Subject-Head schema Most importantly, this schema licenses the combination of a subject with a predicate to form a sentence. The properties of the subject are taken from the subj feature of the head daughter. The schema is listed in figure 4.1. Head-Complement schema The rest of the complements, other than the subject, are combined with the head by the head-complement schema. Once again, the appropriate complements are determined by the head and are specified as the elements in the list comps, as shown in figure 4.2. Head-Marker Schema Markers are used to guarantee that a certain modifier combines only once with a certain head. A typical example is quantifiers (such as ‘every’) modifying nouns. This schema is listed in figure 4.3. Head-Adjunct schema Adjuncts can be combined with the heads they modify over and over again. In HPSG adjuncts select their heads – it is the adjunct that determines the features of the head it might be attached too, through the value of the feature mod, as depicted in figure 4.4.

4.2 The Structure of Noun Phrases in Hebrew

A noun phrase (NP) is a phrase that is headed by a noun (N), optionally modified or complemented by various adjuncts. (Elliptic NPs might not contain a noun, but we do not discuss ellipsis here.) In this section we list the possible adjuncts and briefly discuss their character. A more thorough discussion of selected phenomena is provided in the next sections. More Hebrew data, as well as further references, can be found in (Ornan, 1964; Ornan, 1979; Glinert, 1989; Wintner, 1991; Yizhar, 1993). A noun is a word whose head feature is of type noun and whose cont feature is of type nom_obj. The head feature of nouns carries an additional (boolean) feature, def, which is explained in section 4.3.3 below. Hebrew nouns are specified for gender, number and person (only pronouns are specified for person; other nouns are inherently third person), and these three features are listed as part of the index feature of nouns. Figure 4.5 depicts the lexical entry of the common noun sepr (book), where ‘⟨⟩’ represents the empty list and ‘{ }’ denotes a set.

% Schema 1 (ch. 9, p. 347)
% Subject - Head
subject_head rule
(phrase,cat:(cat,head:Head,subj:e_list,comps:Comps,spr:Spr,marking:Marking),
        cont:Cont,conx:backgr:BM,qstore:QM)
===>
cat>   % subject
 (Subj,sign,cat:cat,cont:sem_obj,conx:backgr:BS,qstore:QS),
cat>   % head
 (sign,cat:(cat,head:Head,subj:(hd:Subj,tl:e_list),
            comps:(Comps,e_list),spr:Spr,marking:Marking),
       cont:(Cont,sem_obj),conx:backgr:BH,qstore:QH),
goal> union(QS,QH,QM),
goal> union(BS,BH,BM).

Figure 4.1: Subject-Head schema

% Schema 2 (ch. 9, p. 348)
% Head - Complement
head_complement rule
(phrase,cat:(cat,head:Head,subj:Subj,comps:Comps,spr:Spr,marking:Marking),
        cont:Cont,conx:backgr:BM,qstore:QM)
===>
cat>   % head
 (sign,cat:(cat,head:Head,subj:Subj,
            comps:(hd:Comp,tl:Comps),
            spr:Spr,marking:Marking),
       cont:Cont,conx:backgr:BH,qstore:QH),
cat>   % complement
 (Comp,sign,cat:cat,cont:sem_obj,conx:backgr:BC,qstore:QC),
goal> union(BH,BC,BM),
goal> union(QH,QC,QM).

Figure 4.2: Head-Complement schema

Hebrew is a language with relatively free constituent order. Still, the order of the NP elements is sometimes fixed. In particular, the adjuncts can be strictly classified as either pre-head or post-head. Within each category a default ordering exists, from which some deviations are allowed. In the discussion below the adjuncts are listed by this default ordering.

Pre-head adjuncts

Determiners This is a closed class of words such as koll (all/every), robb (most-of), kamma (some), etc.

Cardinal numbers Such as $lo$a (three). Cardinals appear in two forms: the regular one and the ‘nismak’ form, discussed in section 4.3.3.

% Schema 4 (ch. 1, p. 51)
% Marker - Head
marker_head rule
(phrase,cat:(cat,head:Head,subj:Subj,comps:Comps,spr:Spr,marking:Marking),
        cont:Cont,conx:backgr:BM,qstore:(elt:Elt,elts:Elts))
===>
cat>   % marker
 (word,cat:(cat,head:(mark,spec:HeadDtr),
            subj:list,comps:list,spr:list,marking:(Marking,marked)),
       cont:(Elt,quant,det:sem_det,restind:sem_obj),
       conx:backgr:BD,qstore:e_set),
cat>   % head
 (HeadDtr,sign,cat:(cat,head:Head,subj:Subj,comps:Comps,
                    spr:Spr,marking:unmarked),
          cont:Cont,conx:backgr:BH,qstore:Elts),
goal> union(BD,BH,BM).

Figure 4.3: Head-Marker schema

% Schema 5 (ch. 9, p. 403)
% Head - Adjunct
head_adjunct rule
(phrase,cat:Cat,cont:Cont,conx:backgr:BM,qstore:QM)
===>
cat>   % head
 (HeadDtr,sign,cat:Cat,cont:sem_obj,conx:backgr:BH,qstore:QH),
cat>   % adjunct
 (sign,cat:head:(adj,defness:defness,mod:HeadDtr),
       cont:Cont,conx:backgr:BA,qstore:QA),
goal> union(BH,BA,BM),
goal> union(QH,QA,QM).

Figure 4.4: Head-Adjunct schema

Definite article The definite article ha- is separated from the other determiners for reasons that are explicated in section 4.3.

Post-head adjuncts

Nominal complement Hebrew has a very elaborate system of nominal-nominal compounds. The first nominal might be a noun or, rarely, an adjective; it is the syntactic head of the compound, and it is morphologically marked. The second nominal can be any NP. Compounds are discussed in section 4.4.

word
    phon:    sepr
    cat:     head: (noun, def: −);   subj: ⟨⟩;   comps: ⟨⟩;   spr: ⟨⟩;
             marking: marking
    cont:    nom_obj;   index: [2] (per: third, num: sg, gen: masc);
             restr: { (psoa, nucleus: (book, instance: [2])) }
    qstore:  { }

Figure 4.5: The lexical entry of the noun sepr

Adjectives Hebrew adjectives are marked for number, gender and definiteness, on which they must agree with the head noun.

Ordinal numbers Such as $eni (second).

Possessives These include possessive pronouns such as $elli (mine) as well as phrases ($ell dan – Dan’s).

Prepositional phrases The rules that govern the combination of prepositional phrases with head nouns in Hebrew are very similar to those of English.

Subcategorized complements Certain nouns subcategorize for particular complements. For example, verbal nouns such as racon (wish) permit an infinitival verb phrase as a complement. This is encoded in the list of complements (the value of comps) in the lexical entries of such nouns.

Relative clauses These are not covered in our grammar.

As mentioned above, a thorough and complete description of the linguistic data is outside the scope of this work. The reader is referred to, e.g., (Ornan, 1964; Ornan, 1979; Glinert, 1989; Yizhar, 1993) for more details.
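Returning to the lexical entry of sepr in figure 4.5: in the grammar listed in appendix B, lexical entries such as this one are not spelled out in full but are abbreviated by macros. Modulo minor differences in feature names between the figure and the appendix, the entry corresponds to the following noun macro and lexicon line, both copied from appendix B; the macro is instantiated with the noun’s number, gender, semantic relation and definiteness.

noun(Num,Gen,Sem,Def) macro
 (word,cat:(head:(noun,defness:Def),subj:e_list,comps:e_list,spr:e_list),
       cont:(nom_obj,index:(Ind,per:third,num:Num,gend:Gen),
             restr:(elt:(psoa,nucleus:(Sem,instance:Ind)),elts:e_set)),
       qstore:e_set).

sepr ---> (@ noun(sg,masc,book,indef)).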

4.3 The Status of the Definite Article in Hebrew

4.3.1 The Data

Hebrew marks definiteness in a way that differs considerably from English (but resembles other Semitic languages, notably Arabic, and also Modern Greek, as will be shown below). The definite article ha- in Hebrew attaches to words, not to phrases. It combines with various kinds of nominals: common nouns, a few proper nouns, adjectives, ordinal numbers, cardinal numbers and demonstratives. Moreover, definite noun phrases in Hebrew are polydefinite: most of the elements of the phrase are required to be explicitly definite, and there is a strict requirement that these elements agree on definiteness for the phrase to be grammatical. Hebrew does have indefinite articles (’exxad, ’axxat, ’xadim), but their use is optional and not common. It is therefore useful to view bare nominals (with no attached definite article) as indefinite by default. See examples (1) – (6) for some data.

(1) ha- sepr
    the book
    “the book”

(2) ha- sepr ha- gadol
    the book the big
    “the big book”

(3) ha- sepr ha- $eni
    the book the second
    “the second book”

(4) sepr (’exxad)
    book (one)
    “a book”

(5) sepr gadol (’exxad)
    book big (one)
    “a big book”

(6) sepr $eni
    book second
    “a second book”

4.3.2 HPSG Approach

HPSG (as formulated in (Pollard and Sag, 1994)) uses two schemata to form simple noun phrases (NPs): the head-marker schema combines a determiner (DET) with a noun (N), and the head-adjunct schema combines any number of adjectives (ADJs) with an NP. In English, nouns subcategorize for DET: the lexical entry of a singular noun explicitly states an expectation for a determiner. The combination of DET and N results in a full NP; the effect of the determiner is recorded in the semantics of the phrase, as the value of the QSTORE feature propagates from the determiner to the mother. Adjectives ‘select’ the NP they modify in the sense that the NP is the value of the MOD feature in the adjunct’s lexical entry. The head-adjunct schema treats the NP as the (syntactic) head and the ADJ as the semantic head, so that the semantics of the phrase is inherited from the adjunct.

The HPSG account of Pollard and Sag (1994) would not be appropriate for Hebrew due to the differences in the structure of NPs in the two languages. Most notably, Hebrew nouns do not subcategorize for determiners, since bare nouns qualify perfectly as complete NPs, as shown in examples (1) – (6) above. An alternative is the HPSG analysis of Modern Greek NPs presented in (Kolliakou, 1996). It appears that in Greek, too, the definite article can attach to various kinds of nominals, and the language exhibits both monadic definites and polydefinites. Thus, all three phrases in (7) – (9) (taken from Kolliakou (1996)) are grammatical:

(7) to kokino podilato
    the red bike
    “the red bike”

(8) to kenurio to kokino podilato
    the new the red bike
    “the new red bike”

(9) ta dio ta podilata ta kokina
    the two the bikes the red
    “the two red bikes”

Kolliakou (1996) concludes that the Greek definite article is not a regular determiner, but constitutes a category of its own. It does not head the phrase it occurs in (as suggested by Netter (1994) for Germanic languages); rather, it functions as an adjunct: it is optional, and it selects the head it modifies by specifying this head as the value of the MOD feature. Furthermore, the definite article marks the phrase it occurs in as definite. This is achieved by introducing a new feature of nominals, UNIQUE, whose (boolean) value is ‘+’ iff the nominal is definite. (The specification of uniqueness also has a semantic contribution in addition to its syntactic marking, but we suppress a complete discussion of semantics here.) Naturally, the value of this feature in the lexical entries of nominals is ‘−’ (since nominals are indefinite by default).

4.3.3 An Analysis of Hebrew Definites

The analysis of Kolliakou (1996) employs a non-quantificational approach to the semantics of definites. The UNIQUE feature is a semantic one (it is part of the CONTent of a phrase), and is the only indication of the definiteness of the phrase. This is in contrast to the approach of Pollard and Sag (1994), which is based on Cooper storage of quantifiers. Whatever approach to semantics is taken, it is clear from examples (1) – (6) that agreement in definiteness among the elements of an NP in Hebrew is a morpho-syntactic process, and we account here for this component of the grammar only. In contrast to Greek, Hebrew exhibits no cases of monadic definites, so all we have to account for is the case of polydefinites.

A major observation here is that the definite article attaches only to words. Therefore, it seems reasonable to account for the combination of the definite article by means of a lexical rule that creates a definite nominal out of an indefinite one. For a detailed discussion of the definite article in Modern Hebrew, see (Ornan, 1964). The Definite Lexical Rule (DLR) operates on various kinds of nominals: nouns (e.g., sepr), adjectives (e.g., gadol), ordinals (e.g., $eni), demonstratives (e.g., ze) and cardinals (e.g., $lo$a). In all categories its effect on the phonology is to prefix the word with ha-. To emphasize the fact that definiteness agreement in Hebrew is not a semantic process, we add a boolean feature DEF to the CATegory of nominals (rather than to their CONTent). The DLR changes the value of the path SYNSEM|LOC|CAT|DEF from ‘−’ to ‘+’. When the DLR operates on adjuncts, it additionally changes the value of the path MOD|LOC|CAT|DEF in the same manner. Thus it is guaranteed that definite adjectives, for example, are not only specified as definite but also select definite heads. The effect of the DLR when applied to a few nominals is exemplified in figure 4.6. Since the process of adding the definite article takes place in the lexicon, the head-adjunct schema can remain intact. Moreover, the agreement in definiteness between a nominal and its adjuncts is stated in the lexical entries of the adjuncts, just as agreement in number and gender is.

Cardinals introduce an irregularity into the analysis of definites. As mentioned above, cardinals can combine with the definite article in Hebrew. However, such constructs are used only in elliptic phrases. In full noun phrases, when the head noun is present, the definiteness agreement between the head noun and the cardinal number is realized in a unique way: a definite noun does not combine with a definite cardinal, but rather with the construct form of the cardinal, the ‘nismak’.
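For concreteness, the DLR discussed above could be stated roughly as follows in an ALE-style lexical-rule notation. This is a sketch only and is not part of the grammar of appendix B, since AMALIA does not support lexical rules; the rule name dlr and the identity morph are illustrative assumptions, and the feature geometry follows appendix B (where definiteness is the defness feature of nominal heads) rather than the full HPSG paths used in the text above.

% A sketch only: the Definite Lexical Rule in ALE-style lex_rule notation.
% The phonological effect (prefixing ha-) is abstracted away by the
% identity morph; for adjuncts, the defness of the mod value must
% additionally be set to def, as explained above.
dlr lex_rule
 (word,cat:(head:(nominal,defness:indef),
            subj:Subj,comps:Comps,spr:Spr,marking:Marking),
       cont:Cont,conx:Conx,qstore:Q)
 **>
 (word,cat:(head:(nominal,defness:def),
            subj:Subj,comps:Comps,spr:Spr,marking:Marking),
       cont:Cont,conx:Conx,qstore:Q)
 morphs X becomes X.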

sepr (before the DLR):
  word
    phon:   sepr
    cat:    head: noun;   subcat: ⟨⟩;   def: −
    cont:   nom_obj;   index: [1] (per: 3rd, num: sg, gen: m);
            restr: { (book, inst: [1]) }

ha-sepr (after the DLR): identical, except
    phon:   ha-sepr
    def:    +

gadol (before the DLR):
  word
    phon:   gadol
    cat:    head: (adj, mod: synsem);   subcat: ⟨⟩;   def: [3] −
    cont:   nom_obj;   index: [1] (per: 3rd, num: sg, gen: m);
            restr: { (big, inst: [1]) } ∪ [2]

ha-gadol (after the DLR): identical, except
    phon:   ha-gadol
    def:    [3] +

Figure 4.6: The effect of the Definite Lexical Rule

Many other nominals also have ‘nismak’ forms, derived from their absolute forms, that are used in noun-noun constructs (see section 4.4). Cardinals in this form, however, are implicitly definite, as they combine only with definite NPs. The data are summarized in examples (10) to (14) below.

(10) $lo$a sparim
     three books
     “three books”

(11) ?$lo$a ha- sparim
     three the books
     “?the three books”

(12) $lo$t ha- sparim
     three-‘nismak’ the books
     “the three books”

(13) ?$lo$t sparim
     three-‘nismak’ books
     “?three books”

(14) ∗ha- $lo$a ha- sparim
     the three the books
     “?the three books”

(This rule has a few exceptions: the cardinal $nei (two-‘nismak’) combines with both definite and indefinite nouns, and there are a few indefinite nouns, such as me’ot (hundreds) or ’lapim (thousands), that require ‘nismak’ cardinals. The phrases preceded by ‘?’ are marked, archaic forms.)

Notice that (14) is ungrammatical because the correct way of marking the definiteness of a cardinal in Hebrew is by using the ‘nismak’ form, and not because the phrase ha-$lo$a is itself ungrammatical. Indeed, this phrase can be used in elliptical structures such as (15):

(15) qaniti $lo$a sparim. koll ha- $lo$a b-’anglit
     I-bought three books. All the three in-English
     “I bought three books. All three of them are in English.”

The ‘nismak’ form of nominals is an inflection of their regular form, and is therefore obtained as the outcome of a lexical rule. As far as definiteness is concerned, when the DLR operates on an indefinite cardinal, its output is the ‘nismak’ form rather than the regular combination of ha- and the cardinal. All other details remain the same. The definite ha- is combined with cardinals by a different mechanism that is not discussed here.

4.4 Noun-Noun Constructs

Noun-noun compounds are constructed in a special way in Hebrew: the head noun, which appears first in the compound, has a marked morphological form, the ‘nismak’. (For many nouns, especially singular masculine and plural feminine ones, this form is identical to the regular form.) Most NPs can serve as the second member of a compound. Syntactically, the compound inherits all the features of the ‘nismak’, with the exception of definiteness, which is inherited from the NP complement. Consider the following examples:

(16) pirxei gann yapim
     flowers-pl-‘nismak’ garden-sg beautiful-pl
     “beautiful garden flowers”

(17) pirxei ha- gann ha- yapim
     flowers-pl-‘nismak’ the garden-sg the beautiful-pl
     “the beautiful garden flowers”

In both examples the entire phrase is plural, as can be seen from the adjective, because the head noun pirxei is plural. However, (17) is definite while (16) is not, due to the definite article modifying the complement gann. The process of compounding is recursive, as the resulting compound is a legitimate NP that can combine with some other ‘nismak’ form. When more than two nouns are combined, the resulting phrase might be syntactically ambiguous (if the nouns have the same gender and number): example (18) can be translated as “my fat aunt’s cow” or as “my aunt’s fat cow”.

(18) parat dodati ha- $mena
     cow-‘nismak’ my-aunt the fat-f
     “my fat aunt’s cow / my aunt’s fat cow”

The ‘nismak’ form is generated from the regular noun form by means of the ‘nismak’ lexical rule (NLR). Apart from modifying the phonology of the noun, this rule has a double effect. First, it adds a subcategorized NP complement to the comps list of the noun, to express the expectation for an NP complement; second, it unifies the values of the DEF feature of the noun and of its newly added complement. When the noun is complemented, the resulting phrase inherits its definiteness from the complement. Figure 4.7 depicts the effect of applying the NLR to the noun praxim (flowers).

praxim (before the NLR):
  word
    phon:   praxim
    cat:    head: noun;   subcat: ⟨⟩;   def: −
    cont:   nom_obj;   index: [1] (per: 3rd, num: pl, gen: m);
            restr: { (flower, inst: [1]) }

pirxei (after the NLR):
  word
    phon:   pirxei
    cat:    head: noun;
            subcat: ⟨ (synsem, loc:cat:head: nominal, loc:cat:def: [2]) ⟩;
            def: [2] boolean
    cont:   nom_obj;   index: [1] (per: 3rd, num: pl, gen: m);
            restr: { (flower, inst: [1]) }

Figure 4.7: The effect of the ‘nismak’ lexical rule
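Along the same lines, the NLR itself could be sketched in an ALE-style lexical-rule notation as follows. Again, this is a sketch only and not part of the grammar of appendix B, where ‘nismak’ forms such as pirxei are simply listed in the lexicon; the rule name nlr and the identity morph (which abstracts away the morphological change from praxim to pirxei) are illustrative assumptions, and the feature geometry follows appendix B.

% A sketch only: the 'nismak' lexical rule in ALE-style lex_rule notation.
% It adds an NP complement to the comps list of the noun and shares the
% noun's definiteness (Def) with that of the new complement, leaving it
% otherwise unspecified.
nlr lex_rule
 (word,cat:(head:(noun,defness:indef),
            subj:e_list,comps:e_list,spr:Spr,marking:Marking),
       cont:Cont,conx:Conx,qstore:Q)
 **>
 (word,cat:(head:(noun,defness:Def),
            subj:e_list,
            comps:(hd:(sign,cat:head:(nominal,defness:Def)),tl:e_list),
            spr:Spr,marking:Marking),
       cont:Cont,conx:Conx,qstore:Q)
 morphs X becomes X.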

Notice that the lexical entry of the ‘nismak’ noun pirxei, listed in figure 4.7, does not specify any value for the DEF feature. Hence, the DLR cannot be applied to pirxei, as it only applies to nominals that are specified as DEF −. This corresponds to the observation that ‘nismak’ nouns cannot be modified by the definite article in Hebrew. Once the ‘nismak’ lexical rule has been applied to a noun, the resulting lexical entry specifies that the noun subcategorizes for a nominal complement.

Noun-noun compounds can thus be constructed by the head-complement schema (figure 4.2).

Chapter 5

Conclusion

As linguistic formalisms become more rigorous, the necessity of a well-defined semantics for grammars increases. We presented an operational semantics for TFS-based formalisms, making use of an abstract machine specifically tailored to this kind of application. In addition, we described a compiler for a general TFS-based language. The compiled code, in terms of abstract machine instructions, can be interpreted and executed on ordinary hardware.

We have formalized in this thesis the concepts of grammars and languages for linguistic formalisms that are based on typed feature structures, using the notion of multi-rooted structures, which generalize feature structures. We use multi-rooted structures for representing grammar rules as well as (the equivalent of) sentential forms that are generated during parsing. We described a computational process that corresponds to parsing with respect to such formalisms. We thus obtained two different specifications of the semantics of those formalisms, namely a declarative (derivation-based) one and an algebraic (computation-based) one. Next, we proved that the two specifications coincide, namely that the computational process induced by the algebraic specification is correct with respect to the declarative specification. Finally, we formally characterized a subset of the grammars, the off-line parsable ones, for which termination of parsing can be guaranteed. Making use of the well-foundedness of the subsumption relation, we proved that for every grammar in this class, parsing is finitely terminating.

This view of parsing with typed feature structures is the basis for the design of AMALIA, an abstract machine specifically tailored for executing code that is compiled from grammars. We detailed the architecture of the machine, its data structures and instruction set, along with the process of compilation of ALE grammars. The use of abstract machine techniques results in highly efficient processing. The system was implemented, and a comparison with ALE shows a great improvement in both compilation and execution times.

The current implementation of AMALIA is not fully compatible with ALE. Several features of ALE are missing in our implementation, and therefore a natural extension of this project would be to add them. Most notably, AMALIA does not support the use of lexical rules, which are considered important for any reasonable grammar of natural languages. ALE also includes a component of definite clauses over TFSs which is missing in AMALIA; an interesting extension would be to link the abstract machine with a WAM-like machine that can handle definite clauses.

AMALIA's current compiler is relatively basic, and several optimizations might be introduced to it in the future. A major optimization might be achieved by incorporating static (compile-time) analysis of grammars.

Several interesting questions, relating grammars to computer programs, arise: for example, what is the equivalent of dead-code elimination? How can concepts of structured programming be transferred to grammars? Can modules be defined for grammars, too?

A different line of improvements concerns the parsing algorithm incorporated in AMALIA. Currently only one, relatively simple, algorithm is inherent to the machine. An interesting extension would be to implement various algorithms, probably with user control over them, and to experiment with their time and space efficiency.

AMALIA is currently being used as a platform for developing an HPSG grammar for the Hebrew language. While this endeavor is still underway, it serves as a realistic use of the system. The development of the grammar has already resulted in many improvements and extensions to AMALIA, and the system has proved stable and reliable enough to support it. We presented a partial HPSG-based grammar for a fragment of Hebrew, concentrating on noun phrases. We hope that this endeavor will serve as the basis for a more comprehensive, broad-coverage grammar of Hebrew.

Appendix A

List of Machine Instructions

The following table lists, for quick reference, the machine instructions and functions, accompanied by a reference to the page in the text in which they are described.

Query processing
  put node t/n, Xi .................. 44
  put arc Xi,offset,Xj .............. 44
  proceed Xi ........................ 53

Type unification
  build str t ....................... 47
  build ref and unify i ............. 47
  build ref i ....................... 47
  build self ref i .................. 47
  build var t ....................... 47
  unify feat i ...................... 47

Program processing
  get structure t/n,Xi .............. 48
  unify variable Xi ................. 48
  unify value Xi .................... 48
  put rule L ........................ 53

Control
  first key ......................... 57
  next key .......................... 57
  check key ......................... 57
  tst active edges l ................ 57
  next active edge l ................ 57
  tst complete edges l′ ............. 57
  next complete edge l .............. 57
  call .............................. 58
  load fs r ......................... 58
  copy active edge l ................ 58
  copy complete edge X .............. 58

Auxiliary functions
  bind(addr1,addr2) ................. 56
  deref(a):address .................. 47
  unify(addr1,addr2):boolean ........ 49
  fail() ............................ 59

Appendix B

The Hebrew Grammar

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% File: Hebrew grammar
%
% Includes:  1. Schema 1
%            2. Schema 2
%            3. Schema 4
%            4. Schema 5
%            5.
%            6. Head feature principle
%            7. Valence principle (subj, comps, spr)
%            8. Semantics principle (cont) - partial
%            9. Contextual Consistency (conx) - approximate
%           10. Quantifier storage - preliminary
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%********************** Type Hierarchy
%th
bot sub [sign,list,set,cat,sem_obj,sem_det,conx,qfpsoa,index,per,num,gend,
         head,vform,pform,defness,marking,boolean].
sign sub [word,phrase]
     intro [cat:cat,cont:sem_obj,conx:conx,qstore:set_quant].
word sub [].   phrase sub [].
cat sub [] intro [head:head,subj:list,comps:list,spr:list,marking:marking].
sem_obj sub [psoa,nom_obj,quant].

nom_obj sub [pron,npro] intro [index:index,restr:set_psoa].
pron sub [].   npro sub [].
psoa sub [] intro [nucleus:qfpsoa].
quant sub [] intro [det:sem_det, restind:npro].

sem_det sub [forall,exists,the]. forall sub []. exists sub []. the sub []. conx sub [] intro [backgr:set_psoa]. qfpsoa sub [un_relation,cn,naming]. un_relation sub [walk,sing,red,big, bin_relation] intro [agent:index]. walk sub []. sing sub []. red sub []. big sub []. bin_relation sub [see,eat, tri_relation] intro [theme:index]. see sub []. eat sub []. tri_relation sub [sell,give] intro [patient:index]. sell sub []. give sub []. cn sub [book,apple] intro [instance:index]. book sub []. apple sub []. naming sub [dan,dana] intro [bearer:index]. dan sub []. dana sub []. index sub [] intro [per:per,num:num,gend:gend]. per sub [first,second,third]. first sub []. second sub []. third sub []. num sub [sg,pl]. sg sub []. pl sub []. gend sub [masc,fem]. masc sub []. fem sub []. head sub [subst,func]. func sub [mark] intro [spec:sign]. mark sub [det].

det sub [].
subst sub [nominal,verb,prep].
nominal sub [noun,adj,numeral] intro [defness:defness].
noun sub [].
adj sub [] intro [mod:sign].
numeral sub [].
prep sub [] intro [pform:pform].
verb sub [] intro [vform:vform].
% I decided not to add a ’mod’ feature to all substantials, since in
% most of the cases (excluding adjectives) its value is ’none’.
defness sub [indef,def].   indef sub [].   def sub [].
vform sub [fin,bse].   fin sub [].   bse sub [].
pform sub [to,in].   to sub [].   in sub [].
marking sub [marked,unmarked].
marked sub [comp,determiner,quantifier].
comp sub [].   determiner sub [].   quantifier sub [].
unmarked sub [].
boolean sub [yes,no].   yes sub [].   no sub [].
list sub [e_list,ne_list].
ne_list sub [] intro [hd:bot,tl:list].
e_list sub [].
set sub [e_set,ne_set,set_psoa,set_quant].
e_set sub [].
ne_set sub [ne_set_psoa,ne_set_quant] intro [elt:bot,elts:set].
set_psoa sub [e_set, ne_set_psoa].
ne_set_psoa sub [].   % intro [elt:psoa, elts:set_psoa].
set_quant sub [e_set, ne_set_quant].
ne_set_quant sub [].  % intro [elt:quant, elts:set_quant].

%%%********************** Macros

%macros
propn(Num,Gen,Name) macro
 (word,cat:(cat,head:noun,subj:e_list,comps:e_list,spr:e_list),
       cont:(npro,index:(Ind,per:third,num:Num,gend:Gen),restr:elt:Sem),
       conx:backgr:(ne_set_psoa,elt:(Sem,psoa,nucleus:(Name,bearer:Ind)),
                    elts:e_set)).

np(Per,Num,Gen,Index) macro
 (sign,
  cat:head:noun,
  cont:(nom_obj,index:(Index,per:Per,num:Num,gend:Gen))).

noun(Num,Gen,Sem,Def) macro
 (word,cat:(head:(noun,defness:Def),subj:e_list,comps:e_list,spr:e_list),
       cont:(nom_obj,index:(Ind,per:third,num:Num,gend:Gen),
             restr:(elt:(psoa,nucleus:(Sem,instance:Ind)),elts:e_set)),
       qstore:e_set).

intrans macro
 (cat:comps:e_list).

trans macro
 (cat:comps:(hd:(@ np(Per,Num,Gen,Theme)),tl:e_list),
  cont:nucleus:theme:Theme).

ditrans macro
 (cat:comps:(hd:(@ np(Per,Num,Gen,Theme)),
             tl:hd:(@ np(Per1,Num1,Gen1,Patient)),
             tl:tl:e_list),
  cont:nucleus:patient:Patient).

verb(Per,Num,Gen,Sem) macro
 (word,cat:(cat,head:(verb,vform:fin),
            subj:hd:(@ np(Per,Num,Gen,SubjInd)),
            marking:unmarked),
       cont:nucleus:(Sem,agent:SubjInd),
       conx:backgr:e_set,
       qstore:e_set).

nominal(Def,Ind) macro
 (sign,cat:(head:(nominal,defness:Def),
            subj:e_list,comps:e_list,spr:e_list,marking:unmarked),
  cont:(nom_obj,index:Ind)).

adj(Num,Gen,Sem,Def) macro
 (word,cat:(head:(adj,defness:Def,
                  mod:(@ nominal(Def,Ind))),
            subj:e_list,comps:e_list,spr:e_list),
       cont:(nom_obj,index:(Ind,num:Num,gend:Gen),
             restr:(elt:(psoa,nucleus:Sem,nucleus:agent:ModInd))),
       qstore:e_set).

%%%********************** Empty Categories

%%%********************** Grammar Rules
%grammar

% Schema 1 (ch. 9, p. 347)
% Subject - Head
subject_head rule

(phrase,cat:(cat,head:Head,
             subj:e_list,
             comps:Comps,
             spr:Spr,marking:Marking),
        cont:Cont,
        conx:backgr:BM,
        qstore:QM)
===>
cat>   % subject
 (Subj,sign,cat:cat,
       cont:sem_obj,
       conx:backgr:BS,
       qstore:QS),
cat>   % head
 (sign,cat:(cat,head:Head,
            subj:(hd:Subj,tl:e_list),
            comps:(Comps,e_list),
            % comps is required to be empty so that
            % subject is added after all the complements.
            spr:Spr,marking:Marking),
       cont:(Cont,sem_obj),
       conx:backgr:BH,
       qstore:QH),
goal> union(QS,QH,QM),
goal> union(BS,BH,BM).

% Schema 2 (ch. 9, p. 348)
% Head - Complement
head_complement rule

(phrase,cat:(cat,head:Head,
             subj:Subj,
             comps:Comps,
             spr:Spr,marking:Marking),
        cont:Cont,
        conx:backgr:BM,
        qstore:QM)
===>
cat>   % head
 (sign,cat:(cat,head:Head,
            subj:Subj,
            comps:(hd:(Comp,sign,cat:cat,
                            cont:sem_obj,
                            conx:backgr:BC,
                            qstore:QC),
                   tl:Comps),
            spr:Spr,marking:Marking),
       cont:Cont,
       conx:backgr:BH,
       qstore:QH),
cat> Comp,   % complement
goal> union(BH,BC,BM),
goal> union(QH,QC,QM).

% Schema 4 (ch. 1, p. 51)
% Marker - Head
marker_head rule

(phrase,cat:(cat,head:Head,
             subj:Subj,
             comps:Comps,
             spr:Spr,marking:Marking),
        cont:Cont,
        conx:backgr:BM,
        qstore:(elt:Elt,elts:Elts))
===>
cat>   % marker
 (word,cat:(cat,head:(mark,spec:HeadDtr),
            subj:list,comps:list,spr:list,marking:(Marking,marked)),
       cont:(Elt,quant,det:sem_det,restind:sem_obj),
       conx:backgr:BD,
       qstore:e_set),
cat>   % head
 (HeadDtr,sign,cat:(cat,head:Head,
                    subj:Subj,
                    comps:Comps,
                    spr:Spr,
                    marking:unmarked),
          cont:Cont,
          conx:backgr:BH,
          qstore:Elts),
goal> union(BD,BH,BM).

empty (sign,cat:(head:(det,spec:(sign,cat:(head:noun,subj:e_list,comps:e_list,
                                           spr:e_list),
                                      cont:(Npro,index:(per:third,num:sg)),
                                      qstore:e_set)),
                 subj:e_list,comps:e_list,spr:e_list,marking:quantifier),
            cont:(quant,det:exists,restind:Npro),
            conx:backgr:e_set,
            qstore:e_set).

% Schema 5 (ch. 9, p. 403)
% Head - Adjunct
%
% modification: the marking feature is shared by the adjunct and the
% head (to require definiteness agreement)
head_adjunct rule

(phrase,cat:Cat,
        cont:Cont,
        conx:backgr:BM,
        qstore:QM)
===>
cat>   % head
 (HeadDtr,sign,cat:Cat,
          cont:sem_obj,
          conx:backgr:BH,
          qstore:QH),
cat>   % adjunct
 (sign,cat:head:(adj,defness:defness,mod:HeadDtr),
       cont:Cont,
       conx:backgr:BA,
       qstore:QA),
goal> union(BH,BA,BM),
goal> union(QH,QA,QM).

%%%********************** Lexical Entries
%lexicon
dan ---> (@ propn(sg,masc,dan)).
dana ---> (@ propn(sg,fem,dana)).

sepr ---> (@ noun(sg,masc,book,indef)).
ha-sepr ---> (@ noun(sg,masc,book,def)).
sparim ---> (@ noun(pl,masc,book,indef)).

$ar ---> (@ verb(third,sg,masc,sing),(@ intrans)).

$ara ---> (@ verb(third,sg,fem,sing),(@ intrans)).

^akal ---> (@ verb(third,sg,masc,eat),(@ trans)).
natan ---> (@ verb(third,sg,masc,give),(@ ditrans)).

^adomm ---> (@ adj(sg,masc,red,indef)).
gadol ---> (@ adj(sg,masc,big,indef)).
ha-gadol ---> (@ adj(sg,masc,big,def)).
gdolim ---> (@ adj(pl,masc,big,indef)).

Bibliography

[Aho and Ullman1972] Aho, Alfred V. and Jeffrey D. Ullman. 1972. The Theory of Parsing, Trans- lation and Compiling, volume 1: Parsing. Prentice-Hall, Inc., Englewood Cliffs, N.J. [A¨ıt-Kaci1991] A¨ıt-Kaci, Hassan. 1991. Warren’s Abstract Machine: A Tutorial Reconstruction. Logic Programming. The MIT Press, Cambridge, Massachusetts.

[A¨ıt-Kaci et al.1989] A¨ıt-Kaci, Hassan, Robert Boyer, Patrick Lincoln, and Roger Nasr. 1989. Ef- ficient implementation of lattice operations. ACM Transactions on Programming Languages and Systems, 11(1):115–146, January. [A¨ıt-Kaci and Di Cosmo1993] A¨ıt-Kaci, Hassan and Roberto Di Cosmo. 1993. Compiling order- sorted feature term unification. PRL Technical Note 7, Digital Paris Research Laboratory, December. [A¨ıt-Kaci and Podelski1993] A¨ıt-Kaci, Hassan and Andreas Podelski. 1993. Towards a meaning of LIFE. Journal of Logic Programming, 16(3-4):195–234, July-August. [Barton, Berwick, and Ristad1987] Barton, G. Edward, Robert C. Berwick, and Eric Sven Ristad. 1987. Computational Complexity and Natural Language. Computational Models of Cognition and Perception. MIT Press, Cambridge, Mass. [Beierle and Meyer1994] Beierle, Christoph and Gregor Meyer. 1994. Run-time type computations in the Warren abstract machine. Journal of Logic Programming, 18:123–148. [Bentur, Angel, and Segev1992] Bentur, Esther, Aviella Angel, and Danit Segev. 1992. Computer- ized analysis of Hebrew words. Hebrew Linguistics, 36:33–38, December. (in Hebrew). [Bird1990] Bird, Steven. 1990. Constraint-Based Phonology. Ph.D. thesis, . [Bird1992] Bird, Steven. 1992. Finite state phonology in HPSG. In Proceedings of COLING-92, pages 74–80. [Calcagno, Kathol, and Pollard1993] Calcagno, Mike, Andreas Kathol, and Carl Pollard. 1993. A bibliography of books, theses and articles in or on HPSG. Unpublished manuscript, August. [Carpenter1991] Carpenter, Bob. 1991. Typed feature structures: A generalization of first-order terms. In Vijai Saraswat and Ueda Kazunori, editors, Logic Programming – Proceedings of the 1991 International Symposium, pages 187–201, Cambridge, MA. MIT Press.

85 [Carpenter1992a] Carpenter, Bob. 1992a. ALE – the attribute logic engine: User’s guide. Technical report, Laboratory for Computational Linguistics, Philosophy Department, Carnegie Mellon University, Pittsburgh, PA 15213, December.

[Carpenter1992b] Carpenter, Bob. 1992b. The Logic of Typed Feature Structures. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press. [Chayen and Dror1976] Chayen, M. J. and Z. Dror. 1976. Introduction to Hebrew Transformational Grammar. University Publishing Projects Ltd., . (in Hebrew). [Choueka and Ne’eman1995] Choueka, Yaacov and Yoni Ne’eman. 1995. ”Nakdan-T”, a text vo- calizer for modern Hebrew. In Proceedings of the Fourth Bar-Ilan Symposium on Foundations of Artificial Intelligence, June. [Cohen1984] Cohen, Daniel. 1984. Mechanical Syntactic Analysis of a Hebrew Sentence. Ph.D. thesis, Hebrew University of Jerusalem. (In Hebrew). [D¨orre and Dorna1993] D¨orre, Jochen and Michael Dorna. 1993. CUF – a formalism for linguistic knowledge representation. In Jochen D¨orre, editor, Computatioal Aspects of Constrained-Based Linguistic Description I, DYANA-2 delivarable R1.2.A, ILLC/Department of Philosophy, Uni- versity of Amsterdam, Nieuwe Doelenstraat 15, NL-1012 CP Amsterdam, August. [Emerson1983] Emerson, E. Allen. 1983. Alternative semantics for temporal logics. Theoretical Computer Science, 26:121–130. [Erbach1994] Erbach, Gregor. 1994. ProFIT: Prolog with features, inheritance and templates. CLAUS Report 42, Computerlinguistik, Universit¨at des Saarlandes, D-66041, Saarbr¨ucken, Germany, July. [Franz1990] Franz, Alex. 1990. A parser for HPSG. Report CMU-LCL-90-3, Laboratory for Computational Linguistics, Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15213, July. [GabrilovichIn preparation] Gabrilovich, Evgeniy. In preparation. Natural language generation by abstract machine. Master’s thesis, Technion, Israel Institute of Technology, Haifa, Israel. [Gazdar et al.1985] Gazdar, G. E., E. Klein, K. Pullum, and Ivan A. Sag. 1985. Generalized Phrase Structure Grammar. Harvard University Press, Cambridge, Mass. [Gazdar and Mellish1989] Gazdar, Gerald and Chris Mellish. 1989. Natural Language Processing in PROLOG. Addison-Wesley. [Gerdemann1993] Gerdemann, Dale. 1993. Troll: Type resolution system – fundamental principles and user’s guide. Unpublished manuscript, September. [Gerdemann and Hinrichs1988] Gerdemann, Dale and Erhard W. Hinrichs. 1988. Unicorn: A unification parser for attribute-value grammars. Studies in the Linguistic Sciences, 18(2):41– 86. [Glinert1989] Glinert, Lewis. 1989. The Grammar of Modern Hebrew. Cambridge University Press, Cambridge.

86 [G¨otz1994] G¨otz, Thilo W. 1994. A normal form for types feature structures. Master’s thesis, Eberhard-Karls Universit¨at, T¨ubingen, March. [Haas1989] Haas, Andrew. 1989. A parsing algorithm for unification grammar. Computational Linguistics, 15(4):219–232, December. [Hermenegildo1986] Hermenegildo, Manuel V. 1986. An Abstract Machine Based Execution Model for Computer Architecture Design and Efficient Implementation of Logic Programs in Parallel. Ph.D. thesis, Department of Computer Science, The University of Texas at Austin, Austin, Texas 78712-1188, August. [Johnson1988] Johnson, Mark. 1988. Attribute-Value Logic and the Theory of Grammar, volume 16 of CSLI Lecture Notes. CSLI, Stanford, California. [JPSG Working GroupIn Preparation] JPSG Working Group. In Preparation. JPSG: A constraint based approach to japanese grammar. Technical memo, Institute for New Generation Computer Technology. [Kaplan and Bresnan1982] Kaplan, R. and J. Bresnan. 1982. Lexical functional grammar: A formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations. MIT Press, Cambridge, Mass., pages 173–281. [Kaplan1973] Kaplan, Ronald M. 1973. A general syntactic processor. In Randall Rustin, editor, Natural Language Processing, number 8 in Courant Computer Science Symposium. Algorithmics Press, P.O. Box 97, , N.Y. 10012, pages 194–241. [Kasper and Rounds1986] Kasper, Robert T. and William B. Rounds. 1986. A logical seman- tics for feature structures. In Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, pages 257–265. [Kay1973] Kay, Martin. 1973. The MIND system. In Randall Rustin, editor, Natural Language Processing, number 8 in Courant Computer Science Symposium. Algorithmics Press, P.O. Box 97, New York, N.Y. 10012, pages 155–188. [Kay1983] Kay, Martin. 1983. Unification grammar. Technical report, Xerox Palo Alto Research Center, Palo Alto, CA. [King1989] King, Paul. 1989. A Logical Formalism for HPSG. Doctoral dissertation, , Manchester, UK. [King1992] King, Paul. 1992. Unification grammars and descriptive formalisms: Lecture notes for a graduate level course. Unpublished manuscript. [Kodri˘c, Popowich, and Vogel1992] Kodri˘c, Sandi, Fred Popowich, and Carl Vogel. 1992. The HPSG-PL system. CSS-IS TR 92-05, School of Computing Science, Centre for Systems Science, Simon Fraser University, Burnaby, B. C. Canada V5A 1S6, Jun. [Kolliakou1996] Kolliakou, Dimitra. 1996. Definiteness and the make-up of nominal categories. In Claire Grover and Enric Vallduv´ı,editors, Studies in HPSG, volume 12 of Edinburgh Work- ing Papers in Cognitive Science. Centre for Cognitive Science, The University of Edinburgh, chapter 4, pages 121–164.

87 [Kursawe1987] Kursawe, Peter. 1987. How to invent a Prolog machine. New Generation Comput- ing, 5:97–114. [Landin1964] Landin, P. J. 1964. The mechanical evaluation of expressions. The Computer Journal, 6(4):308–320. [Lloyd1987] Lloyd, John Wylie. 1987. Foundations of Logic Programming. Springer series in Symbolic Computation – Artificial Intelligence. Springer, Berlin, second edition. [Manaster-Ramer1987] Manaster-Ramer, Alexis. 1987. Mathematics of Language. John Benjamins Publishing Company, Amsterdam/Philadelphia. [Moshier1988] Moshier, Drew. 1988. Extensions to Unification Grammars for the Description of Programming Languages. Ph.D. thesis, University of Michigan, Ann Arbor. [Moshier and Rounds1987] Moshier, Drew M. and William C. Rounds. 1987. A logic for partially specified data structures. In 14th Annual ACM Symposium on Principles of Programming Languages, pages 156–167, January. [M¨uller1996] M¨uller, Stefan. 1996. HPSG literature. Available as an electronic document at http://cl-www.dfki.uni-sb.de/HPSG/. [Nerbonne1992] Nerbonne, John. 1992. Feature-based lexicons: An example and a comparison to DATR. Research Report RR-92-04, Deutsches Forschungszentrum f¨ur K¨unstliche Intelligenz GmbH, February. [Nerbonne, Netter, and Pollard1994] Nerbonne, John, Klaus Netter, and Carl Pollard, editors. 1994. German in Head-Driven Phrase Structure Grammar. Number 46 in CSLI lecture notes. CSLI, Stanford, Ca. [Netter1994] Netter, Klaus. 1994. Towards a theory of functional heads. In John Nerbonne, Klaus Netter, and Carl Pollard, editors, German in Head-Driven Phrase Structure Grammar, volume 46 of CSLI Lecture Notes. CSLI, Stanford, CA, chapter 9, pages 297–340. [Nilsson1993] Nilsson, Ulf. 1993. Towards a methodology for the design of abstract machines for logic programming languages. Journal of Logic Programming, 16:163–189. [Nirenburg and Ben-Asher1984] Nirenburg, Sergei and Yosef Ben-Asher. 1984. HUHU – the He- brew University Hebrew understander. Computer Languages, 9(3/4). [Ornan1964] Ornan, Uzzi. 1964. Noun Phrases in Modern Hebrew Literature. Ph.D. thesis, Hebrew University, Jerusalem. (in Hebrew). [Ornan1979] Ornan, Uzzi. 1979. The Simple Sentence. Academon, Jerusalem, Israel. (in Hebrew). [Ornan1986] Ornan, Uzzi. 1986. Phonemic script: A central vehicle for processing natural language – the case of Hebrew. Technical Report 88.181, IBM Research Center, Haifa, Israel. [Ornan1994] Ornan, Uzzi. 1994. Basic concepts in ”romanization” of scripts. Technical Report LCL 94-5, Laboratory for Computational Linguistics, Technion, Haifa, Israel, March. [Ornan and Katz1995] Ornan, Uzzi and Michael Katz. 1995. A new program for Hebrew index based on the Phonemic Script. Technical Report LCL 94-7, Laboratory for Computational Linguistics, Technion, Haifa, Israel, July.

88 [Ousterhout1994] Ousterhout, John K. 1994. Tcl and the Tk Toolkit. Addison-Wesley Professional Computing Series. Addison-Wesley. [Penn1993] Penn, Gerald. 1993. A comprehensive HPSG grammar in ALE. Technical report, Laboratory for Computational Linguistics, Carnegie Mellon University, Pittsburgh, PA. [Pereira and Warren1983] Pereira, Fernando C. N. and David H. D. Warren. 1983. Parsing as deduction. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, pages 137–144, June. [Pollard and Sag1987] Pollard, Carl and Ivan A. Sag. 1987. Information Based Syntax and Seman- tics. Number 13 in CSLI Lecture Notes. CSLI. [Pollard and Sag1994] Pollard, Carl and Ivan A. Sag. 1994. Head-Driven Phrase Structure Gram- mar. University of Chicago Press and CSLI Publications. [Pollard and Moshier1990] Pollard, Carl J. and M. Drew Moshier. 1990. Unifying partial descrip- tions of sets. In Philip P. Hanson, editor, Information, Language and Cognition, volume 1 of Vancouver Studies in Cognitive Science. University of British Columbia Press, Vancouver 1990, chapter 10, pages 285–322. [Popowich and Vogel1991] Popowich, Fred and Carl Vogel. 1991. A logic-based implementation of head-driven phrase structure grammar. In Charles Grant Brown and Gregers Koch, editors, Natural Language Understanding and Logic Programming, III, pages 227–245. Elsevier Science Publishers (North-Holland). [Prudian and Pollard1985] Prudian, Derek and Carl Pollard. 1985. Parsing head-driven phrase structure grammar. In Proceedings of the 23rd Annual Meeting of the Association for Compu- tational Linguistics, Chicago, IL. University of Chicago. [Russell1993] Russell, Dale W. 1993. Language Acquisition in a Unification-Based Grammar Pro- cessing System Using a Real-World Knowledge Base. Ph.D. thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, July. [Russinoff1992] Russinoff, David M. 1992. A verified Prolog compiler for the Warren Abstract Machine. Journal of Logic Programming, 13:367–412. [Sag and Fodor1994] Sag, Ivan A. and Janet Dean Fodor. 1994. Extraction without traces. In Raul Aranovich, William Byrne, Susanne Preuss, and Martha Senturia, editors, Proceedings of the Thirteenth West Coast Conference on Formal Linguistics, Stanford. CSLI Publications/SLA. [Samuelsson1995] Samuelsson, Christer. 1995. An efficient algorithm for surface generation. In Proceedings of the International Joint Conference on Artificial Intelligence. [Schumann1991] Schumann, Johann M. Ph. 1991. Efficient Theorem Provers based on an Abstract Machine. Ph.D. thesis, Technische Universit¨at M¨unchen, Institut F¨ur Informatik. [Shieber1985] Shieber, Stuart M. 1985. Using restriction to extend parsing algorithms for complex- feature-based formalisms. In Proceedings of the 23rd Annual Meeting of the ACL, pages 145– 152, July. [Shieber1986] Shieber, Stuart M. 1986. An Introduction to Unification Based Approaches to Gram- mar. CSLI Lecture Notes. CSLI.

89 [Shieber1992] Shieber, Stuart M. 1992. Constraint-Based Grammar Formalisms. MIT Press, Cam- bridge, Mass. [Shieber, Schabes, and Pereira1994] Shieber, Stuart M., Yves Schabes, and Fernando C. N. Pereira. 1994. Principles and implementation of deductive parsing. Technical Report TR-11-94, Center for Research in Computing Technology, Division of Applied Sciences, Harvard University, April. [Sikkel1993] Sikkel, Klaas. 1993. Parsing Schemata. Klaas Sikkel, Enschede. [Smolka1988] Smolka, Gert. 1988. A feature logic with subsorts. LILOG Report 33, IBM Deutsch- land, May. [Smolka and Treinen1994] Smolka, Gert and Ralf Treinen. 1994. Records for logic programming. Journal of Logic Programming, 18:229–258. [Swift and Warren1993] Swift, Terrance and David Warren. 1993. Compiling OLDT evaluation: Background and overview. Technical Report 92/04, SUNY Stony Brook, March. [Warren1983] Warren, David H. D. 1983. An abstract Prolog instruction set. Technical Note 309, SRI International, Menlo Park, CA., August. [Warshall1962] Warshall, Stephen. 1962. A theorem on boolean matrices. Journal of the ACM, 9(1):11–12, January. [Wintner1991] Wintner, Shuly. 1991. Syntactic analysis of Hebrew sentences. Master’s thesis, Technion, Israel Institute of Technology, Haifa, Israel, July. (In Hebrew, abstract in English). [Wintner1992] Wintner, Shuly. 1992. Syntactic analysis of Hebrew sentences using PATR. In Uzzi Ornan, Gideon Ariely, and Edit Doron, editors, Hebrew Computational Linguistics. Ministry of Science and Technology, chapter 4, pages 105–115. (In Hebrew). [Wintner and Francez1995a] Wintner, Shuly and Nissim Francez. 1995a. An abstract machine for typed feature structures. In Proceedings of the 5th Workshop on Natural Language Understand- ing and Logic Programming, pages 205–220, Lisbon, May. [Wintner and Francez1995b] Wintner, Shuly and Nissim Francez. 1995b. Parsing with typed fea- ture structures. In Proceedings of the Fourth International Workshop on Parsing Technologies, pages 273–287, Prague, September. [Wintner and Francezto appear] Wintner, Shuly and Nissim Francez. to appear. Off-line parsability and the well-foundedness of subsumption. Journal of Logic, Language and Information. [Wintner, Gabrilovich, and Francez1997] Wintner, Shuly, Evgeniy Gabrilovich, and Nissim Francez, 1997. AMALIA – Abstract MAchine for LIinguistic Applications – User’s Guide. Laboratory for Computational Linguistics, Computer Science Deparmtent, Technion, Israel Institute of Technology, 32000 Haifa, Israel, January. [Wintner and Ornan1991] Wintner, Shuly and Uzzi Ornan. 1991. Syntactic analysis of Hebrew sentences. In Proceedings of the 8th Israeli Symposium on Artificial Intelligence and Computer Vision, pages 201–230. Information Processing Association of Israel, December. [Wintner and Ornan1996] Wintner, Shuly and Uzzi Ornan. 1996. Syntactic analysis of hebrew sentences. Natural Language Engineering, 1(3):261–288.

90 [Yizhar1993] Yizhar, Dana. 1993. Computational grammar for Hebrew noun phrases. Master’s thesis, Computer Science Department, Hebrew University, Jerusalem, Israel, June. (In He- brew).

[Zajac1992] Zajac, R´emi. 1992. Inheritance and constraint-based grammar formalisms. Computa- tional Linguistics, 18(2):159–182.
