TOWARDS BETTER UNDERSTANDING OF ANAPHORA

Barbara Dunin-Kęplicz
Institute of Informatics, Warsaw University
P.O. Box 1210
00-901 Warszawa, Poland

ABSTRACT

This paper presents a syntactical method of interpreting pronouns in Polish. Using the surface structure of the sentence as well as grammatical and inflexional information accessible during syntactic analysis, an area of reference is marked out for each personal and possessive pronoun. This area consists of a few internal areas inside the current sentence and an external area, i.e. the part of the text preceding it. In order to determine that area of reference, several syntactic sentence-level restrictions on anaphora interpretation are formulated. Next, when looking at the area of the pronoun's reference, all NPs which agree with the pronoun in number and gender can be selected, and in this way the set of surface referents of each pronoun can be created. It can be used as data for further semantic analysis.

I INTRODUCTION

Reference is one of the central concepts of any linguistic theory. In recent research into anaphora the term "reference" has been used in three different senses (Szwedek, 1981):

(a) as a relation between the name and the thing named (Hall Partee, 1978)
(b) as an association between noun phrases and mental entities in the language user's mind (Nash-Webber, 1978)
(c) as an association between the occurrence of phrases in the text (Reinhart, 1981)

However the reference is understood, in order to interpret anaphora correctly on the semantic level ((a) and (b)), first a stage (c) is necessary.

In this paper I have taken the point of view presented under (c). I shall discuss the problem of anaphora in Polish sentences. My attention is focused on personal and possessive pronouns explicitly occurring in the text and moreover on zero pronouns, i.e. ellipsis of NP in the subject position, specific for Slavonic languages.

My purpose is the description of regularities of anaphora in the Polish language. I shall express them by defining the area of the pronoun's reference, i.e. those regions of the text where its antecedents should be found. These surface referents will be selected from among NPs occurring in the sentence.

The research on anaphora made for English has led to the formulation of some structural rules using such relations as command, c-command and precede-and-command (Reinhart, 1981). I have been searching for analogous rules for Polish. But two essential differences have to be considered:

(i) grammatical and morphological properties of Polish and English;
(ii) different grammatical traditions.

For English the rules concerning the coreference of entities were formulated on the basis of generative-transformational grammar. For Polish, the first precise description was formulated only recently by Szpakowicz, who based his work on the framework created by Saloni (Saloni, 1976; Saloni and Świdziński, 1981). It is a kind of immediate-constituent grammar; the grammatical categories (case, gender, etc.) are applied not only to single words, but also to compound phrases. In my present work I have limited my attention to the subset of Polish described by Szpakowicz (Szpakowicz, 1983).

Polish is a highly inflexional language and this fact has many and varied consequences. Surface referents of the pronoun will be selected from among those NPs which agree with the pronoun in number and gender. Strictly speaking, the grammatical categories of the pronoun should be compatible with the categories of the NP, but in cases of neutralization they cannot be fully determined.

My method of determining the areas of the pronoun's reference is a syntactic one, because it is based on morphological and syntactical properties of the Polish language. I assume

the availability of the surface structure of the sentence as well as grammatical and inflexional information accessible during a syntactic analysis. I deliberately do not make use of any semantic information, trying to get the most out of grammar. The feature I intend to provide is a complete definition of the area of the pronoun's reference.

II AREA OF REFERENCE

A. Internal and external areas of reference

In the process of determining the surface referents of the pronoun, first the area of its reference should be marked out. This area, i.e. those regions of the text where its antecedents should be found, is usually made up of several internal reference areas, i.e. the appropriate bits of the current sentence, and an external area, the part of the text preceding the current sentence. The list of internal areas depends on the syntactic position of the pronoun in the sentence.

To determine these areas it is necessary to formulate sentence-level anaphora restrictions for Polish. These rules will determine the conditions of both obligatory coreference and obligatory non-coreference of entities. Thus we have two situations to consider:

(i) in the case of obligatory coreference, one internal area of reference containing the appropriate referent should be marked out;
(ii) in the case of obligatory non-coreference, the elements which are forbidden as surface referents of the pronoun should be excluded from the internal area.

The coreference of entities which is qualified on the basis of some other premises will be called admissible coreference.

At our disposal we have a multileveled, hierarchic surface structure of the sentence. Generally, it seems that internal areas can be identified with the constituents on the highest level: subject, objects, modifiers, regardless of their syntactic realization. Strictly speaking, a noun as well as an NP or any sentential structure can be an instance of an internal area of reference. The partitioning of sentence (1) illustrates it:

(1) "(Ewa i Piotr) poszli (do niego) (z dziewczyną, którą właśnie spotkali)".
    "Eva and Peter went to him with a girl whom they had just met".

B. Rules concerning coreference of entities in Polish

1. The basic criterion of excluding coreference

The following rules of excluding the coreference of entities concern a level deeper than that on the surface, because they refer to syntactical functions of phrases in the sentence. The first rule presents the problem of coreference of the subject and other nominal groups, i.e. objects and nominal modifiers, in short called objects. It concerns reflexive pronouns, so it should be noted first that they differ from those in English, e.g.:

- the possessive pronoun "swój" may have one of the following meanings: his, her, its;
- "siebie" can mean: himself, herself, itself, myself, ourselves, yourself, themselves.

The basic criterion of excluding coreference I have formulated from the analytical point of view:

(R 1) If the object is expressed by means of a reflexive pronoun, then it is coreferential with the subject; in other cases the referential identity of the subject and object is excluded.

This criterion is applied both in looking for coreferents of objects - blocking the subject - and in testing the possible antecedents of the subject - blocking the objects.

Let us consider some examples.

Meaning of symbols marked on the examples: obligatory coreference; obligatory non-coreference; admissible coreference; reference to external area; zero pronoun (written ∅ in the examples).

(2) "Ewa zapytała ją o to"
    "Eva asked her about it"

(3) "∅ Zapytała ją o to"
    "Asked(fem) her about it"

(4) "Ona zapytała ją o to"
    "She asked her about it"

(5) "On zapytał Jana o Piotra"
    "He asked John about Peter"

(6) "Piotr nalał sobie piwa"
    "Peter poured himself beer"

Rule R 1 holds for possessive pronouns:

(7) "Ewa uwielbia swoją przyjaciółkę"
    "Eva adores her friend"

Now let us have a look at the case of preposed PPs, so difficult to interpret in English. The basic criterion of excluding coreference covers these phrases too:

(8) "Nagle, obok Jana, ∅ zobaczył węża"
    "Suddenly, near John, saw(masc) a snake"

(9) "Nagle, obok niego, ∅ zobaczył węża"
    "Suddenly, near him, saw(masc) a snake"

(10) "Nagle, obok siebie, ∅ zobaczył węża"
     "Suddenly, near himself, saw(masc) a snake"

(11) "Nagle, obok siebie, on zobaczył węża"
     "Suddenly, near himself, he saw a snake"

In examples (10) and (11) the reflexive pronoun has appeared. These are the only two cases in which the coreference with the subject of the main sentence is permitted and even obligatory. Such an interpretation is correct irrespective of the position of the PP in the sentence, i.e. it does not depend on whether this phrase precedes or follows the subject.

The basic criterion of excluding coreference works as follows:

(i) it is valid only for a simple clause, without blocking coreference between the elements of the main sentence and the constituents of embedded clauses;
(ii) it is obligatory on every level of the sentence, i.e. it concerns all the sentence constructions irrespective of their position in the structure of the whole sentence.

Examples (12) to (14) illustrate this:

(12) "Piotr nie wiedział, czy ∅ pójdzie do kina"
     "Peter did not know whether would go to the movies"

(13) "Jan zapomniał, o co Piotr go pytał"
     "John forgot what Peter asked him about"

(14) "Jan spotkał chłopca, który go dawno nie odwiedzał"
     "John met a boy who didn't visit him for long"

The interpretation of reflexive pronouns is not so easy as the criterion R 1 suggests. These pronouns can be involved in various compound phrases which often are ambiguous. Infinitive phrases are especially hard to interpret. In order to do this correctly, an implicit agent, which will be called further the deep subject, should be obtained. It often needs a few hypotheses to be formulated. Let us consider an example. The sentence:

(15) "Jan kazał służącemu umyć się"

can be translated in two ways which exactly give the sense of possible Polish interpretations:

(15.1) "John told (the servant) (to wash him)"
(15.2) "John told (the servant) (to wash himself)"

In the infinitive phrase "umyć się" ("to wash him" or "to wash himself"), which is standing in the object position, the reflexive pronoun "się" is coreferential with the deep subject of this phrase. Thus its interpretation has to be determined. Here we have two possibilities:

(i) the previous object - "servant" - interpretation (15.1)
(ii) the subject of the main sentence - "John" - interpretation (15.2)

One of them is the referent of the deep subject. And so we come to the next rule:

(R 2) In order to interpret the infinitive phrase, the deep subject of the phrase has to be selected from among the previous object (if any) and the subject of the main sentence.

2. Excluding the coreference between objects

The next sentence-level restriction of anaphora interpretation regulates the problem of coreference of NPs other than a subject, i.e. objects, between them.

(R 3) The coreference of particular objects is excluded. This is an obligatory non-coreference.

(16) "Jan zapytał go o Piotra"
     "John asked him about Peter"

(17) "Jan zapytał go o niego"
     "John asked him about him"

(18) "Jan zapytał Piotra o niego"
     "John asked Peter about him"

This rule does not hold for possessive pronouns, which in Polish do not create NPs by themselves. If these pronouns occur in objects, they may be coreferential with objects preceding them (admissible coreference).

(19) "Jan zapytał Piotra o jego brata"
     "John asked Peter about his brother"

Rule R 2 is only valid for a simple clause, but it concerns all the sentence constructions irrespective of their position in the whole sentence.
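Rules R 1 and R 3 can be sketched for a single simple clause as follows. This is only a minimal illustration in Python, not the implementation used in this work. The names NP, Areas, agrees and interpret are hypothetical and introduced here for the example. The sketch assumes that the clause has already been parsed into a flat list of its top-level constituents, and it models a possessive pronoun as a separate item listed after the objects that precede it, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels for pronoun kinds; "reflexive" covers "siebie"/"sobie",
# "reflexive_possessive" covers "swój".
REFLEXIVE = {"reflexive", "reflexive_possessive"}


@dataclass(frozen=True)
class NP:
    text: str
    role: str             # "subject" or "object" (objects include nominal modifiers)
    kind: str = "noun"    # "noun", "personal", "possessive", or a REFLEXIVE kind
    gender: str = "masc"
    number: str = "sg"


@dataclass
class Areas:
    obligatory: Optional[NP] = None                    # obligatory coreference
    admissible: List[NP] = field(default_factory=list)  # admissible coreference
    blocked: List[NP] = field(default_factory=list)     # obligatory non-coreference


def agrees(pronoun: NP, np: NP) -> bool:
    """Number-gender agreement between a pronoun and a candidate NP."""
    return (pronoun.gender, pronoun.number) == (np.gender, np.number)


def interpret(pronoun: NP, clause: List[NP]) -> Areas:
    """Apply R 1 and R 3 to one pronoun inside one simple clause."""
    subject = next((np for np in clause if np.role == "subject"), None)

    # R 1, first half: a reflexive object is coreferential with the subject.
    if pronoun.kind in REFLEXIVE:
        return Areas(obligatory=subject)

    # R 1 seen from the subject: non-reflexive objects are blocked.
    if pronoun.role == "subject":
        return Areas(blocked=[np for np in clause
                              if np.role == "object" and np.kind not in REFLEXIVE])

    # R 1, second half: a non-reflexive object cannot be the subject.
    blocked = [subject] if subject is not None else []
    position = next(i for i, np in enumerate(clause) if np is pronoun)

    # Exception to R 3: a possessive pronoun may corefer with preceding objects.
    if pronoun.kind == "possessive":
        preceding = [np for np in clause[:position] if np.role == "object"]
        return Areas(admissible=[np for np in preceding if agrees(pronoun, np)],
                     blocked=blocked)

    # R 3: the other objects of the clause are blocked as well.
    blocked += [np for np in clause if np.role == "object" and np is not pronoun]
    return Areas(blocked=blocked)


# Example (16): "Jan zapytał go o Piotra"
jan = NP("Jan", "subject")
go = NP("go", "object", "personal")
piotra = NP("Piotra", "object")
print(interpret(go, [jan, go, piotra]))
```

In this sketch a pronoun with neither an obligatory nor an admissible referent inside the clause is left with only its list of blocked elements, so its antecedent would have to be sought outside the clause.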

3. Rules of interpreting compound sentences

The next group of problems concerns the coreference of entities in a compound sentence, including the question of the subject. In a Polish sentence it need not be explicit. Ellipsis of the NP in the subject position, often called "the elided subject", is a natural way of expressing "thematic continuity" and exemplifies an unaccented position in the sentence. On the other hand, the pronoun as the subject stands in syntactic opposition to the elided subject (zero pronoun) and exemplifies an accented position in the sentence.

While determining the antecedent of the subject of a simple sentence or a main clause in a compound sentence (explicit or implicit), we reach out to the external area of reference. However, the basic criterion of excluding coreference is still valid.

(20) "On zapytał go o Piotra"
     "He asked him about Peter"

The interpretation of compound sentences is difficult and sometimes leads to ambiguous results. The following rules concern mainly the coreference (or non-coreference) of elided subjects in co-ordinate and subordinate clauses.

In the case of co-ordinate clauses two rules can be formulated:

(R 4) For each two clauses in a sequence, if the elided subject is in the second clause, then the subject of the first clause should be extrapolated there (obligatory coreference).

(21) "Piotr wstał od stołu i ∅ podszedł do okna"
     "Peter left the table and approached the window"

(R 5) For each two clauses in a sequence, the pronoun or zero pronoun subject in the first clause cannot be coreferential with the non-pronoun subject of the second clause (obligatory non-coreference).

(22) "On wstał od stołu i Piotr podszedł do okna"
     "He left the table and Peter approached the window"

Interpreting subordinate clauses depends on the relative position of the main and the embedded clause.

(R 6) If the embedded clause precedes the main clause and if both have elided subjects, these have to be coreferential (obligatory coreference).

(23) "Zanim ∅ wyszedł, ∅ zgasił światło"
     "Before left(masc), turned off(masc) the light"

(24) "Ponieważ ∅ zapomniał, ∅ zapytał o to"
     "Because forgot(masc), asked(masc) about it"

(R 7) The elided subject in the embedded clause is a natural way of indicating the nearest candidate - the previous object (if it is there) or the subject of the main sentence (admissible coreference).

(25) "Jan zapewnił Piotra, że ∅ pójdzie do kina"
     "John promised Peter that will go to the movies"

(R 8) The pronoun or zero pronoun subject in the main sentence can be coreferential with the non-pronoun subject of the embedded clause which precedes the main sentence (admissible coreference), but cannot be coreferential with the non-pronoun subject of the embedded clause following the main sentence (obligatory non-coreference).

(26) "Zanim Jan wyszedł, ∅ zgasił światło"
     "Before John left, turned off(masc) the light"

(27) "∅ Zgasił światło, zanim Jan wyszedł"
     "Turned off(masc) the light before John left"

(28) "On nie wiedział, czy Piotr pójdzie do kina"
     "He didn't know whether Peter will go to the movies"

4. Interpretation of relative clauses

Relative clauses are quite easy to interpret in Polish. Either their subject or object is replaced with the pronoun "which" or "what" or their equivalents (only such types of relative clauses are described in the Szpakowicz grammar). These pronouns always indicate the NP next to which they stand and inherit gender, number and person from it. Thus the obligatory coreference of the relative pronoun and this NP is determined.

Let us have a look at some examples:

(29) "Ewa zaprosiła Anię, którą ∅ znała od dawna"
     "Eva invited Ann, whom had known(fem) for long"

(30) "Ewa zaprosiła Anię, która ją znała od dawna"
     "Eva invited Ann, who (subject) had known her for long"

III CONCLUSION

The above syntactic method of interpreting pronouns yields only partial results - the list of internal areas of reference or the external area, both with certain restrictions on coreference, are determined. Next, more detailed results can be obtained. When looking at the internal areas, all NPs which agree with the pronoun in number and gender should be selected, and a list of surface referents of the pronoun together with a list of elements blocked as the referents can be drawn up. If no internal areas are marked out, the external area with the list of blocked elements is the result of the method presented here. Similarly, while only admissible coreference is determined, the external area is marked out too and the list of blocked elements remains valid. On the other hand, the obligatory coreference makes it possible to define the appropriate antecedent of the pronoun. The list of surface referents may be ordered by assuming the specific method of traversing the parsing tree. I expect that, as for English, recency understood as a physical distance between the pronoun and its antecedent can be the first approximation of the probability.

As expected, the results of the method applied here need semantic verification. But at the same time they are reasonable data for further semantic analysis. Data arrived at in this way make this process much easier.

It seems that a similar procedure can be carried out for other languages. Full grammatical information should be used wherever it can simplify such a complex process as semantic analysis.

IV REFERENCES

HIRST, Graeme (1979). Anaphora in Natural Language Understanding: A Survey. Dept. of Computer Science, University of British Columbia.

HOBBS, Jerry R. (1976). Computational Approach to Discourse Analysis. Artificial Intelligence Center, SRI International.

HOBBS, Jerry R. (1978). Coherence and Coreference. Technical note 168. Artificial Intelligence Center, SRI International.

NASH-WEBBER, Bonnie Lynn (1978). A Formal Approach to Discourse Anaphora. PhD thesis, Harvard University.

PARTEE, Barbara Hall (1978). Bound Variables and Other Anaphors. In: Waltz 1978, 79-85.

REINHART, Tanya (1981). Definite NP Anaphora and C-Command Domains. In: Linguistic Inquiry, Vol 12, No 4, Fall 1981.

SALONI, Zygmunt (1976). Cechy składniowe polskiego czasownika (Syntax Properties of Polish Verb). Ossolineum, Prace językoznawcze, 1976.

SALONI, Zygmunt, ŚWIDZIŃSKI, Marek (1981). Składnia współczesnego języka polskiego (Syntax of Contemporary Polish Language). Wydawnictwa Uniwersytetu Warszawskiego, 1981.

SZPAKOWICZ, Stanisław (1983). Formalny opis składniowy zdań polskich (Formal Syntactic Description of Polish Sentences). Wydawnictwa Uniwersytetu Warszawskiego, 1983.

SZWEDEK, Aleksander (1981). Word Order, Sentence Stress and Reference in English and Polish. WSP Bydgoszcz, 1981.

V ACKNOWLEDGEMENTS

I would like to acknowledge Janusz Bień and Stanisław Szpakowicz for their helpful comments on this paper.