Animacy Encoding in English: Why and How
Annie Zaenen, PARC & Stanford University, 3333 Coyote Hill Road, Palo Alto, CA 94304
Jean Carletta, HCRC-University of Edinburgh, 2, Buccleuch Place, Edinburgh EH8LW, UK
Gregory Garretson, Boston University, Program in Applied Linguistics, 96 Cummington St., Boston, MA 02215
Joan Bresnan, CSLI-Stanford University, 220, Panama Street, Stanford CA 94305
Andrew Koontz-Garboden, CSLI-Stanford University, 220, Panama Street, Stanford CA 94305
Tatiana Nikitina, CSLI-Stanford University, 220, Panama Street, Stanford CA 94305
M. Catherine O’Connor, Boston University, Program in Applied Linguistics, 96 Cummington St., Boston, MA 02215
Tom Wasow, CSLI-Stanford University, 220, Panama Street, Stanford CA 94305

Abstract

We report on two recent medium-scale initiatives annotating present day English corpora for animacy distinctions. We discuss the relevance of animacy for computational linguistics, specifically generation, the annotation categories used in the two studies and the interannotator reliability for one of the studies.

1 Introduction

It has long been known that animacy is an important category in syntactic and morphological natural language analysis. It is less evident that this dimension should play an important role in practical natural language processing. After reviewing some linguistic facts, we argue that it does play a role in natural language generation and translation, describe a schema for the annotation of animacy distinctions, evaluate the reliability of the scheme and discuss some results obtained. We conclude with some remarks on the importance of animacy and other accessibility dimensions for the architecture of generation schemes.

2 The animacy dimension in natural language

The animacy hierarchy is one of the accessibility scales that are hypothesized to influence the grammatical prominence that is given to the realization of entities within a discourse. Three important scales (sometimes conflated into one, also called animacy hierarchy in Silverstein, 1976) are the definiteness, the person and the animacy hierarchy proper. We assume these are three different hierarchies that refer to different aspects of entity representation within language: the definiteness dimension is linked to the status of the entity at a particular point in the discourse, the person hierarchy depends on the participants within the discourse, and the animacy status is an inherent characteristic of the entities referred to. Each of these aspects, however, orders entities on a scale that makes them more or less salient or ‘accessible’ when humans use their language.

The importance of accessibility scales is not widely recognized in computational treatments of natural language. This contrasts with the situation in linguistics, where such scales have been recognized as playing an important role in the organization of sentence syntax and discourse. Even in natural language studies, however, their importance has been underestimated because the role of these scales is not always to distinguish between grammatical and ungrammatical utterances but often that of distinguishing mainly between felicitous and infelicitous ones, especially in languages such as English.

Grammaticality and acceptability

As long as one’s attention is limited to the distinction between grammatical and ungrammatical sentences, the importance of the animacy hierarchy in particular is mainly relevant for languages with a richer morphology than English. In such languages animacy distinctions can influence grammaticality of e.g. case-marking and voice selection. To give just one example, in Navaho, a bi-form is used rather than a yi-form whenever the patient is animate and the agent is inanimate, whereas the yi-form is used when the agent is animate and the patient is inanimate, as illustrated in (1) (from Comrie 1989 p. 193).

(1) (a) At’ééd nímasi yi-díílíd
        girl   potato burnt
        The girl burnt the potato.
    (b) At’ééd nímasi bi-díílíd
        girl   potato burnt
        The potato burnt the girl.

Other phenomena discussed in the literature are agreement limited to animate noun phrases (Comrie, 1989), overt case-marking of subject limited to inanimates or overt case-marking of objects limited to animates (Aissen, 2003, see also Bossong 1985 and 1991), object agreement in Bantu languages (see e.g. Woolford, 1999), choice between direct and inverse voice in Menominee (Bloomfield, 1962, see also Trommer, n.d.).

Recent linguistic studies have highlighted the importance of the category in languages such as English. For instance, the choices between the Saxon genitive and the of-genitive (Rosenbach, 2002, 2003, O’Connor et al. 2004, Leech et al. 1994), between the double NP and the prepositional dative (Cueni et al., work in progress), between active and passive (Rosenbach et al. 2002, Bock et al. 1992, McDonald et al. 1993) and between pronominal and full noun reference (Dahl and Fraurud, 1996, based on Swedish data) have all been shown to be sensitive to the difference between animate and inanimate referents for the arguments with variable realization. In these cases the difference between animate and inanimate does not lead to a difference between a grammatical and an ungrammatical sentence, as in the cases cited in the previous paragraph, but to a difference in acceptability.

Interaction between animacy and other scales

As mentioned above, the term ‘animacy hierarchy’ is used in two ways, one to refer to an ordering imposed on definiteness, animacy and person and the other where it refers to animacy proper. The reason for this lies in the interaction between the different factors that determine accessibility.

It is conceptually desirable to distinguish between animacy and definiteness but in practice it is frequently the case that a linguistic phenomenon is conditioned by more than one of these conceptually independent factors (see e.g. Comrie, 1989 for discussion). The projects in the context of which the annotation tasks described here were performed (see description below, section 5) also encode some of these interacting factors. The LINK-project encodes information status (see Nissim et al., 2004) and the Boston project encodes definiteness and expression type (i.e., NP form) as proxies for information status.

3 Animacy as a factor in generation and translation

As long as animacy was discussed as a relevant grammatical category in languages that had not been studied from a computational point of view, its importance for computational linguistics was perceived as rather limited. The fact that it permeates the choice between constructions in languages such as English changes this perception. The category is of obvious importance for high-quality generation and translation.

For instance, if one is faced with the task of generating a sentence from a three-place predicate, P(a,b,c), and one has the choice of rendering either b or c as the direct object, knowing that c is human and b is abstract would lead one to choose c ceteris paribus. However, everything else is rarely equal and the challenge for generation will be to assign the exact relative weights to factors such as animacy, person distinction and recency. Moreover, the importance of these factors needs to be combined with that of heterogeneous considerations such as the length of the resulting expression.

In the context of translation we need also to keep in mind the possibility that the details of the animacy ranking might be different from language to language and that the relative impact of animacy and other accessibility scale factors might be different from construction to construction.

4 The Animacy Hierarchy

Given the pervasive importance of animacy information in human languages one might expect it to be a well-understood linguistic category. Nothing could be farther from the truth. Linguistic descriptions appeal to a hierarchy that in its minimal form distinguishes between human, non-human animate and inanimate but can contain more distinctions, such as distinctions between higher and lower animals (see Yamamoto, 1999 for a particularly elaborated scheme).

What makes it difficult to develop clear categories of animacy is that the linguistically relevant distinctions between animate and non-animate and between human and non-human are not the same as the biological distinctions. Part of this research is devoted to discovering the principles that underlie the distinctions; and the type of distinctions proposed depends on the assumptions that a researcher makes about the underlying motivation for them, e.g. as a reflection of the language user’s empathy with living beings (e.g. Yamamoto, 1999). What is of particular interest for natural language processing is the observation that the distinctions are most likely not the same across languages (cf. Comrie, 1989) and can even change over time in a given language. They are similar to other scalar phenomena such as voicing

animacy will, however, be important for more detailed studies of the interaction between the various accessibility hierarchies. This more precise notion will be needed for cross-linguistic studies, and, in the context of natural language processing, for high-quality generation and translation.

5 Animacy Annotation

As we have discussed above, the animacy scale is an important factor in the choice of certain constructions in English. But it is only a soft constraint and as such outside of the realm of