
Integration of syntactic and lexical information in a hierarchical dependency grammar

Cristina Barbero and Leonardo Lesmo and Vincenzo Lombardo
Dipartimento di Informatica
Università di Torino - Italy

Paola Merlo
Université de Genève - Switzerland
IRCS - University of Pennsylvania

Abstract

In this paper, we propose to introduce syntactic classes in a lexicalized dependency formalism. Subcategories of words are organized hierarchically from a general, abstract level (syntactic categories) to a word-specific level (single lexical items). The formalism is parsimonious, and useful for processing. We also sketch a parsing model that uses the hierarchical mixed-grain representation to make predictions on the structure of the input.

1 Introduction

Much recent work in linguistics and computational linguistics emphasizes the role of lexical information in syntactic representation and processing.

This emphasis given to the lexicon is the result of a gradual process. The original trend in linguistics has been to identify categories of words having related characteristics - the traditional syntactic categories like verb, noun, adjective, etc. - and to express the structure of a sentence in terms of constituents, or phrases, built around these categories. Subsequent considerations led to a lexicalization of grammar. Linguistically, the constraints expressed on syntactic categories are too general to explain facts about words - e.g. the relation between a verb and its nominalization, "destroy the city" and "destruction of the city" - or to account uniformly for a number of phenomena across languages - e.g. passivization. In parsing, the use of individual item information reduces the search space of the possible structures of a sentence. From a mathematical point of view, lexicalized grammars exhibit properties - like finite ambiguity (Schabes, 1990) - that are of practical interest (especially in writing realistic grammars). Dependency grammar is naturally suited to lexicalization, as the binary relations representing the structure of a sentence are defined with respect to the head (that is, a word).

Pure lexicalized formalisms, however, also have several disadvantages. Linguistically, the abstract level provided by syntactic rules is necessary to avoid the loss of generalization which would arise if class-level information were repeated in all lexical items. In parsing, a predictive component is required to guarantee the valid prefix property, namely the capability of detecting as soon as possible whether a substring is a valid prefix for the language defined by the grammar. Knowledge of syntactic categories, which does not depend on the input, is needed for a parser to be predictive.

In this paper we address the problem of the interaction between syntactic and lexical information in dependency grammar. We introduce many intermediate levels between lexical items and syntactic categories, by organizing the grammar around the notion of subcategorization. Intuitively, a subcategorization frame for a lexical item L is a specification of the number and type of elements that L requires in order for an utterance that contains L to be well-formed. For example, within the syntactic category VERB, different verbs require different numbers of nominal dependents for a well-formed sentence. In Italian (our case study), an intransitive verb such as dormire, "sleep", subcategorizes for only one nominal element (the subject), while a transitive verb such as baciare, "kiss", subcategorizes for two nominal elements (the subject and the object) [1]. Grammatical relations such as subject and object are primitive concepts in a dependency paradigm, i.e. they directly define the structure of the sentence. Consequently, the dependency paradigm is particularly suitable to define the grammar in terms of constraints on subcategorization frames.

[1] We include the subject relation in the subcategorization, or valency, of a verb - cf. (Hudson, 1990) (Mel'cuk, 1988). In most constituency theories, on the contrary, the subject is not part of the valency of a verb.

Our proposal is to use subcategories organized in a hierarchy: the upper level of the hierarchy corresponds to the syntactic categories, the other levels correspond to subcategories that are more and more specific as one descends the hierarchy. This representation is advantageous because of its compactness, and because the hierarchical mixed-grained organization of the information is useful in processing. In fact, using the general knowledge at the upper level of the hierarchy, we can make predictions on the structure of the sentence before encountering the lexical head.

Hierarchical formalisms have been proposed in some theories. Pollard and Sag (1987) suggested a hierarchical organization of lexical information: as far as subcategorization is concerned, they introduced a "hierarchy of lexical types". A specific formalisation of this hierarchy has never reached a wide consensus in the HPSG community, but several proposals have been developed - see for example (Meurers, 1997), which uses head subtypes and lexical principles to express generalizations on the valency properties of words.

Hudson (1990) adopts a dependency approach and uses hierarchies to organize different kinds of linguistic information, for instance a hierarchy including word classes and lexical items. The subcategorization constraints, however, are specified for each lexical item (for instance STAND → STAND-intrans, STAND-trans): this is highly redundant and misses important generalizations.

In LTAG (Joshi and Schabes, 1996), pure syntactic information is grouped around shared subcategorization constraints (tree families). Hierarchical representations of LTAG have been proposed: (Vijay-Shanker and Schabes, 1992), (Becker, 1993), (Evans et al., 1995), (Candito, 1996), (Doran et al., 1997). However, none of these works proposes to use the hierarchical representation in processing - only Vijay-Shanker and Schabes (1992) mention, as a possible future investigation, the definition of parsing strategies that take advantage of the hierarchical representation.

The goal of our hierarchical formalism is twofold. On one side, we want to provide a hierarchical organization to a lexicalized dependency formalism: similarly to the hierarchical representations of LTAG, the aim is to solve the problems of redundancy and lexicon maintenance of pure lexicalized approaches. On the other side, we want to explore how a hierarchical formalism can be used in processing in order to get the maximum benefit from it.

The paper is organized as follows: in section 2 we describe a lexicalized dependency formalism that is a simplified version of (Lombardo and Lesmo, 1998). Starting from this formalism, we define in section 3 the hierarchy of subcategories. In section 4, we sketch a parsing model that uses the hierarchical grammar. In section 5, we describe an application of the formalism to the classification of 101 Italian verbs. Section 6 concludes the paper.

2 A dependency formalism

The basic idea of dependency is that the syntactic structure of a sentence is described in terms of binary relations (dependency relations) on pairs of words, a head (or parent), and a dependent (daughter), respectively; these relations form a tree, the dependency tree. In this section we introduce a formal dependency system, which expresses the syntactic knowledge through dependency rules. The grammar and the lexicon coincide, since the rules are lexicalized: the head of the rule is a word of a certain category, namely the lexical anchor. The formalism is a simplified version of (Lombardo and Lesmo, 1998); we have left out the treatment of long-distance dependencies to focus on the subcategorization knowledge, which is to be represented in a hierarchy.

A dependency grammar is a five-tuple <W, C, S, D, H>, where

W is a finite set of words of a natural language;
C is a finite set of syntactic categories;
S is a non-empty set of categories (S ⊆ C) that can act as head of a sentence;
D is the set of dependency relations, for instance SUBJ, OBJ, XCOMP, P-OBJ, PRED;
H is a set of dependency rules of the form
    x:X (<r_1 Y_1> ... <r_{i-1} Y_{i-1}> # <r_{i+1} Y_{i+1}> ... <r_m Y_m>)
  1) x ∈ W is the head of the rule;
  2) X ∈ C is its syntactic category;
  3) an element <r_j Y_j> is a d-pair (which describes a dependent); the sequence of d-pairs, including the special symbol # (representing the linear position of the head), is called the d-pair sequence. We have that
     3a) r_j ∈ D, j ∈ {1, ..., i-1, i+1, ..., m};
     3b) Y_j ∈ C, j ∈ {1, ..., i-1, i+1, ..., m}.

Intuitively, a dependency rule constrains one node (head) and its dependents in a dependency tree: the d-pair sequence states the order of elements, both the head (# position) and the dependents (d-pairs). The grammar is lexicalized, because each dependency rule has a lexical anchor in its head (x:X). A d-pair <r_i Y_i> identifies a dependent of category Y_i, connected with the head via a dependency relation r_i.

As an example, consider the grammar [2]:

G = <
  W : {gli, un, amici, eroe, lo, credevano}
  [...]

[2] We use Italian terms to label grammatical relations - see table 1. Since subcategorization frames are language-dependent, we prefer to avoid confusions due to different terminology across languages. For example, the relation Termine - see the caption of figure 4 - actually corresponds to the indirect object in English. However, I-Obj undergoes the double accusative transformation into Obj, while Termine does not.

Figure 1: Dependency tree of the sentence Gli amici lo credevano un eroe, "The friends considered him a hero", given the grammar G.

[...] grammar) and sets of subcategories, L : W → 2^T − {∅}. That is, each word can "belong" to one or more subcategories;
D is a set of dependency relations (as in section 2);
Q is a set of subcategorization frames. Each subcategorization frame is a total mapping q : D → R × 2^T, where R is the set of pairs of natural numbers <n_1, n_2> such that n_1 ≥ 0, n_2 ≥ 0 and n_1 ≤ n_2;
F is a bijection between subcategories and subcate- [...]
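To make the notion of a subcategorization frame as a total mapping q : D → R × 2^T more concrete, the following Python fragment gives a minimal sketch. It is not the authors' implementation. It assumes that the pair <n_1, n_2> is read as the minimum and maximum number of dependents allowed for a relation, and that the associated set lists the subcategories admitted for those dependents; this reading is an assumption, as are the names Slot, make_frame and satisfies, the subcategory label NOUN, and the choice to fill unlisted relations with <0, 0> and the empty set so that the mapping stays total. The relation labels are the examples given for D in section 2.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

# Example relation inventory D (labels quoted from section 2).
RELATIONS = ("SUBJ", "OBJ", "XCOMP", "P-OBJ", "PRED")


@dataclass(frozen=True)
class Slot:
    """Value q(d) of a frame: a range <n1, n2> plus a set of subcategories."""
    n1: int
    n2: int
    subcats: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        # R contains only pairs with 0 <= n1 <= n2.
        if not (0 <= self.n1 <= self.n2):
            raise ValueError(f"invalid range <{self.n1},{self.n2}>")


Frame = Dict[str, Slot]


def make_frame(partial: Dict[str, Slot]) -> Frame:
    """Extend a partial specification to a total mapping over RELATIONS.

    Assumption: a relation that is not listed is forbidden, i.e. <0, 0>.
    """
    unknown = set(partial) - set(RELATIONS)
    if unknown:
        raise ValueError(f"unknown relations: {sorted(unknown)}")
    return {d: partial.get(d, Slot(0, 0)) for d in RELATIONS}


def satisfies(frame: Frame, dependents: Iterable[Tuple[str, str]]) -> bool:
    """Check (relation, subcategory) pairs of one head against a frame."""
    counts = {d: 0 for d in frame}
    for relation, subcat in dependents:
        slot = frame.get(relation)
        if slot is None or subcat not in slot.subcats:
            return False
        counts[relation] += 1
    # Every relation must occur a number of times within its range.
    return all(frame[d].n1 <= counts[d] <= frame[d].n2 for d in frame)


# Hypothetical frames for the intransitive/transitive contrast of section 1.
NOUN = frozenset({"NOUN"})
INTRANSITIVE = make_frame({"SUBJ": Slot(1, 1, NOUN)})
TRANSITIVE = make_frame({"SUBJ": Slot(1, 1, NOUN), "OBJ": Slot(1, 1, NOUN)})
```

Under these assumptions, a verb like dormire would be checked against INTRANSITIVE and a verb like baciare against TRANSITIVE: satisfies(TRANSITIVE, [("SUBJ", "NOUN"), ("OBJ", "NOUN")]) holds, while the same dependents fail against INTRANSITIVE because OBJ is restricted to <0, 0>.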