A Machine Learning Approach to German Pronoun Resolution

Beata Kouchnir
Department of Computational Linguistics
Tübingen University
72074 Tübingen, Germany
[email protected]

Abstract

This paper presents a novel ensemble learning approach to resolving German pronouns. Boosting, the method in question, combines the moderately accurate hypotheses of several classifiers to form a highly accurate one. Experiments show that this approach is superior to a single decision-tree classifier. Furthermore, we present a standalone system that resolves pronouns in unannotated text by using a fully automatic sequence of preprocessing modules that mimics the manual annotation process. Although the system performs well within a limited textual domain, further research is needed to make it effective for open-domain question answering and text summarisation.

1 Introduction

Automatic coreference resolution, pronominal and otherwise, has been a popular research area in Natural Language Processing for more than two decades, with extensive documentation of both the rule-based and the machine learning approach. For the latter, good results have been achieved with large feature sets (including syntactic, semantic, grammatical and morphological information) derived from hand-annotated corpora. However, for applications that work with plain text (e.g. question answering, text summarisation), this approach is not practical.

The system presented in this paper resolves German pronouns in free text by imitating the manual annotation process with off-the-shelf language software. As the availability and reliability of such software is limited, the system can use only a small number of features. The fact that most German pronouns are morphologically ambiguous poses an additional challenge.

The choice of boosting as the underlying machine learning algorithm is motivated both by its theoretical concept and by its performance on other NLP tasks. The fact that boosting uses the method of ensemble learning, i.e. combining the decisions of several classifiers, suggests that the combined hypothesis will be more accurate than one learned by a single classifier. On the practical side, boosting has distinguished itself by achieving good results with small feature sets.

2 Related Work

Although extensive research has been conducted on statistical anaphora resolution, the bulk of the work has concentrated on the English language. Nevertheless, comparing different strategies helped shape the system described in this paper.

(McCarthy and Lehnert, 1995) were among the first to use machine learning for coreference resolution. RESOLVE was trained on data from the MUC-5 English Joint Venture (EJV) corpus and used the C4.5 decision tree algorithm (Quinlan, 1993) with eight features, most of which were tailored to the joint venture domain. The system achieved an F-measure of 86.5 for full coreference resolution (no values were given for pronouns). Although a number this high must be attributed to the specific textual domain, RESOLVE also outperformed the authors' rule-based algorithm by 7.6 percentage points, which encouraged further research in this direction.

Unlike the other systems presented in this section, (Morton, 2000) does not use a decision tree algorithm but opts instead for a maximum entropy model. The model is trained on a subset of the Wall Street Journal, comprising 21 million tokens. The reported F-measure for pronoun resolution is 81.5. However, (Morton, 2000) only attempts to resolve singular pronouns, and there is no mention of what percentage of total pronouns are covered by this restriction.

(Soon et al., 2001) use the C4.5 algorithm with a set of 12 domain-independent features, ten syntactic and two semantic. Their system was trained on both the MUC-6 and the MUC-7 datasets, for which it achieved F-scores of 62.6 and 60.4, respectively. Although these results are far worse than the ones reported in (McCarthy and Lehnert, 1995), they are comparable to the best-performing rule-based systems in the respective competitions. Like (McCarthy and Lehnert, 1995), (Soon et al., 2001) do not report separate results for pronouns.

(Ng and Cardie, 2002) expanded on the work of (Soon et al., 2001) by adding 41 lexical, semantic and grammatical features. However, since using this many features proved to be detrimental to performance, all features that induced low-precision rules were discarded, leaving only 19. The final system outperformed that of (Soon et al., 2001), with F-scores of 69.1 and 63.4 for MUC-6 and MUC-7, respectively. For pronouns, the reported results are 74.6 and 57.8, respectively.

The experiment presented in (Strube et al., 2002) is one of the few dealing with the application of machine learning to German coreference resolution, covering definite noun phrases, proper names and personal, possessive and demonstrative pronouns. The research is based on the Heidelberg Text Corpus (see Section 4), which makes it ideal for comparison with our system. (Strube et al., 2002) used 15 features modeled after those used by state-of-the-art resolution systems for English. The results for personal and possessive pronouns are 82.79 and 84.94, respectively.

3 Boosting

All of the systems described in the previous section use a single classifier to resolve coreference. Our intuition, however, is that a combination of classifiers is better suited for this task. The concept of ensemble learning (Dietterich, 2000) is based on the assumption that combining the hypotheses of several classifiers yields a hypothesis that is much more accurate than that of an individual classifier.

One of the most popular ensemble learning methods is boosting (Schapire, 2002). It is based on the observation that finding many weak hypotheses is easier than finding one strong hypothesis. This is achieved by running a base learning algorithm over several iterations. Initially, an importance weight is distributed uniformly among the training examples. After each iteration, the weight is redistributed, so that misclassified examples get higher weights. The base learner is thus forced to concentrate on difficult examples.
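To make the reweighting scheme concrete, the following sketch shows a generic AdaBoost-style training loop in Python. It illustrates the general technique only, not the toolkit used in this work (introduced below); the base learner, data representation, and function names are placeholders of our own.

    import math

    def boost(examples, labels, train_weak_learner, rounds=50):
        """Generic AdaBoost sketch. `examples` and `labels` are parallel lists,
        labels are +1/-1, and train_weak_learner(examples, labels, weights)
        returns a callable weak hypothesis h(x) -> +1/-1."""
        n = len(examples)
        weights = [1.0 / n] * n          # importance weight, initially uniform
        ensemble = []                    # list of (alpha, hypothesis) pairs

        for _ in range(rounds):
            h = train_weak_learner(examples, labels, weights)
            # weighted training error of the current weak hypothesis
            error = sum(w for w, x, y in zip(weights, examples, labels)
                        if h(x) != y)
            if error == 0 or error >= 0.5:
                break                    # perfect or useless weak learner
            alpha = 0.5 * math.log((1 - error) / error)
            ensemble.append((alpha, h))
            # raise the weight of misclassified examples, lower the rest
            weights = [w * math.exp(-alpha * y * h(x))
                       for w, x, y in zip(weights, examples, labels)]
            total = sum(weights)
            weights = [w / total for w in weights]

        def classify(x):
            # sign of the weighted vote; its magnitude can serve as a confidence score
            score = sum(alpha * h(x) for alpha, h in ensemble)
            return 1 if score >= 0 else -1

        return classify

The magnitude of the weighted vote in this sketch plays the same role as the confidence weights that the toolkit described below attaches to its label decisions.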
Although boosting has not yet been applied to coreference resolution, it has outperformed state-of-the-art systems for NLP tasks such as part-of-speech tagging and prepositional phrase attachment (Abney et al., 1999), word sense disambiguation (Escudero et al., 2000), and named entity recognition (Carreras et al., 2002).

The implementation used for this project is BoosTexter (Schapire and Singer, 2000), a toolkit freely available for research purposes. In addition to labels, BoosTexter assigns confidence weights that reflect the reliability of the decisions.

4 System Description

Our system resolves pronouns in three stages: preprocessing, classification, and postprocessing. Figure 1 gives an overview of the system architecture, while this section provides details of each component.

[Figure 1: System Architecture]

4.1 Training and Test Data

The system was trained with data from the Heidelberg Text Corpus (HTC), provided by the European Media Laboratory in Heidelberg, Germany. The HTC is a collection of 250 short texts (30-700 tokens) describing architecture, historical events and people associated with the city of Heidelberg. To examine its domain (in)dependence, the system was tested on 40 unseen HTC texts as well as on 25 articles from the Spiegel magazine, the topics of which include current events, science, arts and entertainment, and travel.

4.2 The MMAX Annotation Tool

The manual annotation of the training data was done with the MMAX (Multi-Modal Annotation in XML) annotation tool (Müller and Strube, 2001). The first step of coreference annotation is to identify the markables, i.e. noun phrases that refer to real-world entities. Each markable is annotated with the following attributes:

np form: proper noun, definite NP, indefinite NP, personal pronoun, possessive pronoun, or demonstrative pronoun.

grammatical role: subject, object (direct or indirect), or other.

agreement: this attribute is a combination of person, number and gender. The possible values are 1s, 1p, 2s, 2p, 3m, 3f, 3n, 3p.

semantic class: human, physical object (includes animals), or abstract. When the semantic class is ambiguous, the "abstract" option is chosen.

type: if the entity that the markable refers to is new to the discourse, the value is "none". If the markable refers to an already mentioned entity, the value is "anaphoric". An anaphoric markable has another attribute for its relation to the antecedent. The values for this attribute are "direct", "pronominal", and "ISA" (hyponym-hyperonym).

To mark coreference, MMAX uses coreference sets, such that every new reference to an already mentioned entity is added to the set of that entity. Implicitly, there is a set for every entity in the discourse: if an entity occurs only once, its set contains one markable.
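As an illustration of this annotation scheme, the following sketch models markables and coreference sets as plain Python objects. The class and field names are our own and mirror the attributes listed above; they are not MMAX's actual XML representation.

    from dataclasses import dataclass

    @dataclass
    class Markable:
        """A noun phrase referring to a real-world entity, carrying the
        attributes used in the HTC annotation (values as free strings)."""
        text: str
        np_form: str        # proper noun, definite NP, ..., demonstrative pronoun
        gram_role: str      # subject, object, or other
        agreement: str      # one of 1s, 1p, 2s, 2p, 3m, 3f, 3n, 3p
        sem_class: str      # human, physical object, or abstract
        ana_type: str = "none"   # "none" or "anaphoric"
        relation: str = ""       # "direct", "pronominal" or "ISA" if anaphoric

    class CoreferenceSets:
        """Every discourse entity owns one set of markables; a new mention
        of a known entity is added to that entity's set."""
        def __init__(self):
            self.sets = []                       # list of lists of Markable

        def add(self, markable, antecedent=None):
            if antecedent is None:               # discourse-new entity
                self.sets.append([markable])
                return
            for s in self.sets:                  # extend the antecedent's set
                if antecedent in s:
                    s.append(markable)
                    return
            self.sets.append([antecedent, markable])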
4.3 Feature Vector

The features used by our system are summarised in Table 1. The individual features for anaphor and antecedent correspond to the attributes used for manual annotation (see Section 4.2).

Feature          Description
pron             the pronoun
ana npform       NP form of the anaphor
ana gramrole     grammatical role of the anaphor
ana agr          agreement of the anaphor
ana semclass*    semantic class of the anaphor
ante npform      NP form of the antecedent
ante gramrole    grammatical role of the antecedent
ante agr         agreement of the antecedent

Table 1: Features used by our system

[...]

The chunker is based on a head-lexicalised probabilistic context-free grammar (H-L PCFG) and achieves an F-measure of 92 for range only and 83 for range and label, where the range of a noun chunk is defined as "all words from the beginning of the noun phrase to the head noun". This is different from manually annotated markables, which can be complex noun phrases.

Despite good overall performance, the chunker fails on multi-word proper names, in which case it marks each word as an individual chunk.
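Returning to the feature set in Table 1, the following sketch shows how a feature vector might be assembled for one pronoun-candidate pair. It is illustrative only: the keys match the table, but the function, the minimal Markable stand-in, and the example values are our own, not the authors' code or data.

    from collections import namedtuple

    # Minimal stand-in for the Markable sketch given earlier
    Markable = namedtuple(
        "Markable",
        "text np_form gram_role agreement sem_class ana_type relation",
        defaults=("none", ""),
    )

    def feature_vector(anaphor, antecedent):
        """Build a flat feature dictionary for one anaphor-antecedent pair,
        following the feature set in Table 1."""
        return {
            "pron": anaphor.text.lower(),       # the pronoun itself
            "ana npform": anaphor.np_form,
            "ana gramrole": anaphor.gram_role,
            "ana agr": anaphor.agreement,
            "ana semclass": anaphor.sem_class,  # only as reliable as preprocessing
            "ante npform": antecedent.np_form,
            "ante gramrole": antecedent.gram_role,
            "ante agr": antecedent.agreement,
        }

    # Hypothetical pair: pronoun "es" referring back to "das Schloss" (the castle)
    ana = Markable("es", "personal pronoun", "subject", "3n", "abstract",
                   "anaphoric", "pronominal")
    ante = Markable("das Schloss", "definite NP", "subject", "3n", "physical object")
    print(feature_vector(ana, ante))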