XTAG - a Graphical Workbench for Developing Tree-Adjoining Grammars*

Patrick Paroubek**, Yves Schabes and Aravind K. Joshi
Department of Computer and Information Science
University of Pennsylvania
Philadelphia PA 19104-6389 USA
pap/schabes/[email protected]

*This work was partially supported by NSF grants DCR-84-10413, ARO Grant DAAL03-87-0031, and DARPA Grant N0014-85-K0018.

**Visiting from the Laboratoire Informatique Théorique et Programmation, Institut Blaise Pascal, 4 place Jussieu, 75252 PARIS Cedex 05, France.

Abstract

We describe a workbench (XTAG) for the development of tree-adjoining grammars and their parsers, and discuss some issues that arise in the design of the graphical interface.

Contrary to string rewriting grammars generating trees, the elementary objects manipulated by a tree-adjoining grammar are extended trees (i.e. trees of depth one or more) which capture syntactic information of lexical items. The unique characteristics of tree-adjoining grammars, namely the elementary objects found in the lexicon (extended trees) and the derivational history of derived trees (also a tree), require a specially crafted interface in which the perspective has shifted from a string-based to a tree-based system. XTAG provides such a graphical interface in which the elementary objects are trees (or tree sets) and not symbols (or strings).

The kernel of XTAG is a predictive left-to-right parser for unification-based tree-adjoining grammar [Schabes, 1991]. XTAG includes a graphical editor for trees, a graphical tree printer, utilities for manipulating and displaying feature structures for unification-based tree-adjoining grammar, facilities for keeping track of the derivational history of TAG trees combined with adjoining and substitution, a parser for unification-based tree-adjoining grammars, utilities for defining grammars and lexicons for tree-adjoining grammars, a morphological recognizer for English (75 000 stems deriving 280 000 inflected forms) and a tree-adjoining grammar for English that covers a large range of linguistic phenomena.

Considerations of portability, efficiency, homogeneity and ease of maintenance led us to the use of Common Lisp without its object language addition and to the use of the X Window interface to Common Lisp (CLX) for the implementation of XTAG.

XTAG without the large morphological and syntactic lexicons is public domain software. The large morphological and syntactic lexicons can be obtained through an agreement with ACL's Data Collection Initiative.

XTAG runs under Common Lisp and X Window (CLX).

1 Introduction

Tree-adjoining grammar (TAG) [Joshi et al., 1975; Joshi, 1985; Joshi, 1987] and its lexicalized variant [Schabes et al., 1988; Schabes, 1990; Joshi and Schabes, 1991] are tree-rewriting systems in which the syntactic properties of words are encoded as tree-structured objects of extended size. TAG trees can be combined with adjoining and substitution to form new derived trees.¹

Tree-adjoining grammar differs from more traditional tree-generating systems such as context-free grammar in two ways:

1. The objects combined in a tree-adjoining grammar (by adjoining and substitution) are trees and not strings. In this approach, the lexicon associates with a word the entire structure it selects (as shown in Figure 1) and not just a (non-terminal) symbol as in context-free grammars.

2. Unlike string-based systems such as context-free grammars, two objects are built when trees are combined: the resulting tree (the derived tree) and its derivational history (the derivation tree).²

These two unique characteristics of tree-adjoining grammars, the elementary objects found in the lexicon (extended trees) and the distinction between a derived tree and its derivational history (also a tree), require a specially crafted interface in which the perspective must be shifted from a string-based to a tree-based system.

¹We assume familiarity throughout the paper with the definition of TAGs. See Joshi [1987] for an introduction to tree-adjoining grammar. We refer the reader to Joshi [1985], Joshi [1987], Kroch and Joshi [1985], Abeillé et al. [1990a], Abeillé [1988] and to Joshi and Schabes [1991] for more information on the linguistic characteristics of TAG such as its lexicalization and factoring recursion out of dependencies.

²The TAG derivation tree is the basis for semantic interpretation [Shieber and Schabes, 1990b], generation [Shieber and Schabes, 1991] and machine translation [Abeillé et al., 1990b] since the information given in this data structure is richer than the one found in the derived tree. Furthermore, it is at the level of the derivation tree that ambiguity must be defined.

Figure 1: Elementary trees found in a tree-adjoining grammar lexicon

XTAG provides such a graphical interface in which the elementary objects are trees (or tree sets) and not symbols (or strings of symbols).

Skeletons of such workbenches have been previously realized on Symbolics machines [Schabes, 1989; Schifferer, 1988]. Although they provided some insights on the architectural design of a TAG workbench, they were never expanded to a full-fledged natural language environment because of inherent limitations (such as their lack of portability).

XTAG runs under Common Lisp [Steele, 1990] and it uses the Common LISP X Interface (CLX) to access the graphical primitives defined by the X11 protocol. XTAG is portable across machines and Common Lisp compilers.

The kernel of XTAG is a predictive left-to-right parser for unification-based tree-adjoining grammar [Schabes, 1991]. The system includes the following components and features:

• Graphical editing of trees. The graphical display of a tree is the only representation of a tree accessible to the user. Some of the operations that can be performed graphically on trees are:
  - Add and edit nodes.
  - Copy, paste, move or delete subtrees.
  - Combine two trees with adjunction or substitution. These operations keep track of the derivational history and update attributes stated in the form of feature structures as defined in the framework of unification-based tree-adjoining grammar [Vijay-Shanker and Joshi, 1988].
  - View the derivational history of a derived tree and its components (elementary trees).

• A tree display module for efficient and aesthetic formatting of a tree based on a new tree display algorithm [Chalnick, 1989]. The algorithm is an improvement of the ones developed by Reingold and Tilford [1981] and Lee [1987]. It guarantees in linear time that trees which are structural mirror images of one another are drawn such that their displays are reflections of one another while achieving minimum width of the tree.

• Capabilities for grouping trees into sets which can be linked to a file. This is particularly useful since lexicalized TAGs organize trees into tree families which capture all variations of a predicative lexical item for a given subcategorization frame.

• Utilities for editing and processing equations for unification-based tree-adjoining grammar [Vijay-Shanker and Joshi, 1988; Schabes, 1990].

• A predictive left-to-right parser for unification-based tree-adjoining grammar [Schabes, 1991].

• Utilities for defining a grammar (set of trees, set of tree families, set of lexicons) which the parser uses.

• Morphological lexicons for English [Karp et al., 1992].

• A tree-adjoining grammar for English that covers a large range of linguistic phenomena.

2 XTAG Components

The communication with the user is centralized around the interface manager window (see Figure 2) which gives the user control over the different modules of XTAG.

Figure 2: Manager Window.

This window displays the contents of the tree buffers currently loaded into the system. The different functions of XTAG are available by means of a series of pop-up menus associated with buttons, and by means of mouse actions performed on the mouse-sensitive items (such as the tree buffer names and the tree names).

A tree editor for a tree contained in one of the tree buffers shown in the window can be called up by clicking on its tree name. Each tree editor manages one tree, and as many tree editors as needed can run concurrently. For example, Figure 2 holds a set of files (such as Tnx0Vs1.trees)³ which each contain trees (such as αnx0Vs1). When this tree is selected for editing, the window shown in Figure 3 is displayed. Files can be handled independently or as a group, in which case they form a tree family (flag F next to a buffer name).

XTAG uses a centralized clipboard for all binary operations on trees (all operations are either unary or binary). These operations (such as paste, adjoin or substitute) are always performed between the tree contained in XTAG's clipboard and the current tree. The contents of the clipboard can be displayed in a special view-only window.

A request to view the derivational history of a tree that results from a combining operation opens a view-only window which displays the associated derivation tree.
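The clipboard-based combination of trees described above can be illustrated with a small sketch. It is not XTAG's implementation (XTAG is written in Common Lisp and its internals are not shown here); it is a minimal Python model under stated assumptions. The names `Node`, `Entry`, `Workbench`, `leaves`, `derivation` and `build_example` are hypothetical, the example trees are simplified (no feature structures, no node indices), and the derivational history is kept as a flat list of operations rather than a tree addressed by node position.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution site waiting for an initial tree
    foot: bool = False   # foot node of an auxiliary tree


@dataclass
class Entry:
    """A named tree together with the operations that produced it."""
    name: str
    root: Node
    history: List[Tuple[str, "Entry"]] = field(default_factory=list)


def clone(n: Node) -> Node:
    return Node(n.label, [clone(c) for c in n.children], n.subst, n.foot)


def find(n: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node in preorder that satisfies pred, or None."""
    if pred(n):
        return n
    for c in n.children:
        hit = find(c, pred)
        if hit is not None:
            return hit
    return None


def leaves(n: Node) -> List[str]:
    if not n.children:
        return [n.label]
    return [w for c in n.children for w in leaves(c)]


def derivation(e: Entry):
    """Nested (name, [(operation, sub-derivation), ...]) view of the history."""
    return (e.name, [(op, derivation(sub)) for op, sub in e.history])


class Workbench:
    """Binary operations always combine the clipboard tree with the current tree."""

    def __init__(self) -> None:
        self.clipboard: Optional[Entry] = None

    def copy(self, entry: Entry) -> None:
        self.clipboard = entry

    def _clip(self) -> Entry:
        if self.clipboard is None:
            raise ValueError("clipboard is empty")
        return self.clipboard

    def substitute(self, current: Entry) -> Entry:
        clip, root = self._clip(), clone(current.root)
        site = find(root, lambda n: n.subst and n.label == clip.root.label)
        if site is None:
            raise ValueError("no matching substitution site")
        site.children, site.subst = clone(clip.root).children, False
        return Entry(current.name, root, current.history + [("substitute", clip)])

    def adjoin(self, current: Entry) -> Entry:
        clip, root = self._clip(), clone(current.root)
        aux = clone(clip.root)
        foot = find(aux, lambda n: n.foot)
        target = find(root, lambda n: n.label == aux.label
                      and not n.subst and bool(n.children))
        if foot is None or target is None:
            raise ValueError("adjunction not possible")
        # The subtree below the target moves under the foot node,
        # and the auxiliary tree takes its place.
        foot.children, foot.foot = target.children, False
        target.children = aux.children
        return Entry(current.name, root, current.history + [("adjoin", clip)])


def build_example() -> Entry:
    saw = Entry("saw", Node("S", [Node("NP", subst=True),
                Node("VP", [Node("V", [Node("saw")]), Node("NP", subst=True)])]))
    john = Entry("John", Node("NP", [Node("N", [Node("John")])]))
    mary = Entry("Mary", Node("NP", [Node("N", [Node("Mary")])]))
    often = Entry("often", Node("VP", [Node("ADV", [Node("often")]),
                                       Node("VP", foot=True)]))
    wb, current = Workbench(), saw
    for tree, op in ((john, "substitute"), (mary, "substitute"), (often, "adjoin")):
        wb.copy(tree)
        current = getattr(wb, op)(current)
    return current
```

Calling `build_example()` yields a derived tree whose frontier reads "John often saw Mary", while `derivation(...)` on the result lists the three operations that produced it. Keeping both objects side by side mirrors the distinction the paper draws between the derived tree and its derivational history.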
