Experiences with the GTU Grammar Development Environment


Martin Volk
University of Zurich
Department of Computer Science, Computational Linguistics
Winterthurerstr. 190
CH-8057 Zurich
[email protected]

Dirk Richarz
University of Koblenz-Landau
Institute of Computational Linguistics
Rheinau 1
D-56075 Koblenz
[email protected]

arXiv:cmp-lg/9707010v1 21 Jul 1997

Abstract

In this paper we describe our experiences with a tool for the development and testing of natural language grammars called GTU (German: Grammatik-Testumgebung; grammar test environment). GTU supports four grammar formalisms under a window-oriented user interface. Additionally, it contains a set of German test sentences covering various syntactic phenomena as well as three types of German lexicons that can be attached to a grammar via an integrated lexicon interface. What follows is a description of the experiences we gained when we used GTU as a tutoring tool for students and as an experimental tool for CL researchers. From these we will derive the features necessary for a future grammar workbench.

1 Introduction

GTU (German: Grammatik-Testumgebung; grammar test environment) was developed as a flexible and user-friendly tool for the development and testing of grammars in various formats. Over the last 7 years it has been used successfully as a tutoring tool to supplement syntax courses in computational linguistics at the Universities of Koblenz and Zurich.

GTU has been implemented in Arity Prolog under DOS and OS/2, and in SICStus Prolog under UNIX. In this paper we will concentrate on the UNIX version. GTU in this version is a stand-alone system of about 4.5 MB compiled Prolog code (not counting the lexicons) [1]. GTU interacts with 3 German lexicons:

1. a small hand-coded stem-lexicon whose vocabulary has been tailored towards the test sentences (this lexicon also contains selectional restrictions for all its nouns and adjectives),

2. GerTWOL (Oy, 1994), a fast morphology analysis program, and

3. PLOD, a full-form lexicon that has been derived from the CELEX lexical database (Baayen, Piepenbrock, and van Rijn, 1995).

GTU supports grammars under four formalisms:

1. Definite Clause Grammar (DCG; Pereira and Shieber, 1987) augmented with feature structures,

2. Immediate Dominance / Linear Precedence Grammar (ID/LP; a subset of GPSG),

3. Generalized Phrase Structure Grammar (GPSG; Gazdar et al., 1985),

4. Lexical Functional Grammar (LFG; Kaplan and Bresnan, 1982).

Additionally, GTU provides a first step towards semantic processing of LFG f-structures. Thus a grammar developer may use semantic rules to specify how the semantic module computes logical expressions for an f-structure. In another module the selectional restrictions of the hand-coded lexicon can be used to determine whether (a reading of) a sentence is semantically anomalous. This module can be switched on and off when parsing a sentence.

GTU's features have been published before (see (Jung, Richarz, and Volk, 1994) or (Volk, Jung, and Richarz, 1995)). In this paper we concentrate on evaluating GTU's features, comparing them to some other workbenches that we have access to (mostly GATE (Gaizauskas et al., 1996) and the Xerox LFG workbench (Kaplan and Maxwell, 1996)). From this we derive recommendations for future grammar workbenches.

[1] Owing to rearrangements by the operating system, the actual memory requirements total about 7 MB for both SUN OS 4.x and SUN OS 5.x.

2 GTU - its merits and its limits

Grammar rule notation

One of the primary goals in the GTU project was to support a grammar rule notation that is as close as possible to the one used in the linguistics literature. This has been a general guideline for every formalism added to the GTU system. Let us give some examples. Typical ID-rules in GTU are:

    (1) S -> NP[X], VP[X] | X = [kas=nom].

    (2) NP[kas=K] -> Det[kas=K, num=N],
                     (AdjP[kas=K, num=N]),
                     N[kas=K, num=N].

Rule (1) says that a constituent of type S consists of constituents of type NP and VP. The feature structures are given in square brackets. A capital letter in a feature structure represents a variable. Identical variables within a rule stand for shared values. Hence, the feature structures for NP and VP in rule (1) are declared to be identical. In addition the feature structure equation behind the vertical bar | specifies that X must be unified with the feature structure [kas=nom]. Rule (2) says that an NP consists of a Det, an optional AdjP and an N. It also says that the features kas and num are set to be identical across constituents while only the feature kas is passed on to the NP-node.

There are further notational means for terminal symbols within a grammar, and a reserved word representing an empty constituent.

In our experience the grammar rule notation helps the students to get acquainted with the system. But students still need some time to understand the syntax. In particular they are sometimes misled by the apparent similarity of GTU's ID-rules to Prolog DCG-rules. While in Prolog constituent symbols are atoms and are usually written with lower case letters, GTU requires upper case letters as is customary in the linguistic literature. In addition students need a good understanding of feature structure unification to be able to manipulate the grammatical features within the grammar rules.

For writing grammar rules GTU has an integrated editor that facilitates loading the grammar into GTU's database. A grammar thus becomes immediately available for testing. Loading a grammar involves the translation of each grammar rule into Prolog. This is done by various grammar processors (one for each formalism). The grammar processors are SLR parsers generated from metagrammars. There is one metagrammar for each grammar formalism describing the format of all admissible grammar rules and lexicon interface rules under this formalism.

Writing large grammars with GTU has sometimes led to problems in navigating through the grammar files. A grammar browser could be used to alleviate these problems. The Xerox LFG-WB contains such a browser. It consists of a clickable index of all rule heads (i.e. all defined constituent symbols). Via this index the grammar developer can comfortably access the rule definitions for a given constituent.

Static grammar checks

For the different formalisms in GTU, different types of parsers are produced. GPSG grammars are processed by a bottom-up chart parser, while DCG and LFG grammars are processed by top-down depth-first parsers. All parsers have specific problems with some structural properties of a grammar, e.g. top-down depth-first parsers may run into infinite loops if the grammar contains (direct or indirect) left recursive rules.

Therefore GTU provides a static check for detecting left recursions. This is done by building up a graph structure. After processing all grammar rules and inserting all possible edges into the graph, the grammar contains a possible left recursion if this graph contains at least one cycle. In a similar manner we can detect cycles within transitive LP rules or within alias definitions.

These checks have proven very helpful in uncovering structural problems once a grammar has grown to more than two dozen rules. The static checks in GTU have to be explicitly called by the grammar developer. It would be better to perform these checks automatically any time a grammar is loaded into the system.

A model for the employment of grammar checks is the workbench for affix grammars introduced by (Nederhof et al., 1992), which uses grammar checks in order to report on inconsistencies (conflicts with well-formedness conditions such as that every nonterminal should have a definition), properties (such as LL(1)), and information on the overall grammar structure (such as the is-called-by relation).

Output in different granularities

One of GTU's main features is the graphics display of parsing results. All constituent structures can be displayed as parse trees. For LFG-grammars GTU additionally outputs the f-structure. For DCG and GPSG the parse tree is also displayed in an indented fashion with all features used during the parsing process. Output can be directed into one or multiple windows. The multiple window option facilitates the comparison of the tree structures on screen. Parsing results can also be saved into files in order to use them in documentation or for other evaluation purposes.

The automatic graphic display of parsing results is an important feature for using GTU as a tutoring tool. For students this is the most striking advantage over coding the grammar directly in a programming language. The GTU display works with structures of arbitrary size. But a structure that does not fit on the screen requires extensive scrolling. A zoom option could remedy this problem.

Zooming into output structures is nicely integrated into the Xerox LFG-WB. Every node in the parse tree output can be enlarged by a mouse click [...]

[...] with the saved structures and to inform the user that there now is an additional reading. In our comparison tool, series of comparisons for multiple sentences can be run in the background. Their results are displayed in a table which gives the number of readings for every sentence.

This comparison tool is considered very helpful once the user understands how to use it. It should be complemented with the option to compare the output structures of two readings of the same input sentence.

Tracing the parsing process

Within GTU the parsing of natural language input can be traced on various levels.
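
The feature sharing described for rules (1) and (2) under "Grammar rule notation" rests on feature structure unification. The following is a minimal sketch in Python, assuming flat attribute-value structures represented as dictionaries. It is not GTU's implementation, which is written in Prolog and not shown in this paper; the function name `unify` and the feature values `sg` and `akk` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

FeatureStructure = Dict[str, str]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Unify two flat feature structures; return None on a value clash."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # conflicting values: unification fails
        result[feature] = value
    return result


# Rule (1): NP and VP share the variable X, and X must unify with [kas=nom].
# The concrete feature values below are illustrative assumptions.
np_features = {"kas": "nom", "num": "sg"}
vp_features = {"num": "sg"}
shared = unify(np_features, vp_features)
x = unify(shared, {"kas": "nom"}) if shared is not None else None

# An NP carrying a different case value clashes with [kas=nom].
clash = unify({"kas": "akk"}, {"kas": "nom"})
```

In this sketch `x` holds the merged structure and `clash` is `None`, which mirrors the way a rule application fails when shared variables receive incompatible values.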
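
The static left recursion check described under "Static grammar checks" can be sketched along the following lines. This is an illustration in Python under stated assumptions, not GTU's Prolog code: the rule representation, the function names `left_corner_graph` and `has_left_recursion`, and the treatment of optional constituents (an edge is added for every symbol that can appear leftmost) are our own reading of "inserting all possible edges into the graph".

```python
from typing import Dict, List, Set, Tuple

# A rule is (left-hand side, right-hand side); each right-hand-side item is
# (symbol, is_optional). This representation is a hypothetical simplification.
Rule = Tuple[str, List[Tuple[str, bool]]]


def left_corner_graph(rules: List[Rule]) -> Dict[str, Set[str]]:
    """Add an edge from each rule head to every symbol that may come first."""
    graph: Dict[str, Set[str]] = {}
    for lhs, rhs in rules:
        edges = graph.setdefault(lhs, set())
        for symbol, optional in rhs:
            edges.add(symbol)
            if not optional:
                break  # symbols further right can never be leftmost
    return graph


def has_left_recursion(rules: List[Rule]) -> bool:
    """Report a possible left recursion if the graph contains a cycle."""
    graph = left_corner_graph(rules)
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in visiting:
            return True  # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


# Rules (1) and (2) from the text, without features.
rules_ok: List[Rule] = [
    ("S", [("NP", False), ("VP", False)]),
    ("NP", [("Det", False), ("AdjP", True), ("N", False)]),
]
# A directly left recursive rule, added for illustration.
rules_bad = rules_ok + [("NP", [("NP", False), ("PP", False)])]

ok_result = has_left_recursion(rules_ok)
bad_result = has_left_recursion(rules_bad)
```

Here `ok_result` is `False` and `bad_result` is `True`. The same depth-first cycle search could in principle be applied to a graph built from LP rules or alias definitions, which is how we read the remark that those cycles are detected "in a similar manner".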
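
The table of reading counts produced by the comparison tool can be pictured with a small sketch. It is deliberately simplified: it compares only the number of readings per sentence, whereas the text speaks of comparing saved structures. The function `compare_readings`, the stand-in parser callback, and the example sentences and counts are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def compare_readings(
    saved: Dict[str, int],
    count_readings: Callable[[str], int],
) -> List[Tuple[str, int, int, str]]:
    """Return one table row per sentence: (sentence, saved, current, status)."""
    rows = []
    for sentence, old in saved.items():
        new = count_readings(sentence)
        if new > old:
            status = "additional reading(s)"
        elif new < old:
            status = "missing reading(s)"
        else:
            status = "unchanged"
        rows.append((sentence, old, new, status))
    return rows


# Illustrative data only: saved counts from an earlier run, and a stub that
# stands in for parsing with the current grammar.
saved_counts = {"Der Hund bellt.": 1, "Er sieht den Mann mit dem Fernglas.": 1}
current_counts = {"Der Hund bellt.": 1, "Er sieht den Mann mit dem Fernglas.": 2}
table = compare_readings(saved_counts, lambda s: current_counts[s])

for row in table:
    print("%-40s saved=%d current=%d %s" % row)
```

Passing the parser in as a callback keeps the sketch independent of any particular formalism or parser.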
