Using Decision Trees to Select the Grammatical Relation of a Noun Phrase

Simon CORSTON-OLIVER
Microsoft Research
One Microsoft Way, Redmond WA 98052, USA
[email protected]

Abstract

We present a machine-learning approach to modeling the distribution of noun phrases (NPs) within clauses with respect to a fine-grained taxonomy of grammatical relations. We demonstrate that a cluster of superficial linguistic features can function as a proxy for more abstract discourse features that are not observable using state-of-the-art natural language processing. The models constructed for actual texts can be used to select among alternative linguistic expressions of the same propositional content when generating discourse.

1. Introduction

Natural language generation involves a number of processes, ranging from planning the content to be expressed through making encoding decisions involving syntax, the lexicon and morphology. The present study concerns decisions made about the form and distribution of each "mention" of a discourse entity: should reference be made with a lexical NP, a pronominal NP or a zero anaphor (i.e. an elided mention)? Should a given mention be expressed as the subject of its clause or in some other grammatical relation?

If all works well, a natural language generation system may end up proposing a number of possible well-formed expressions of the same propositional content. Although these possible formulations would all be judged to be valid sentences of the target language, it is not the case that they are all equally likely to occur.

Research in the area of Preferred Argument Structure (PAS; Corston 1996, Du Bois 1987) has established that in discourse in many languages, including English, NPs are distributed across grammatical relations in statistically significant ways. For example, transitive clauses tend not to contain lexical NPs in both subject and object positions, and subjects of transitives tend not to be lexical NPs nor to be discourse-new.

Unfortunately, the models used in PAS research have involved only simple chi-squared tests to identify statistically significant patterns in the distribution of NPs with respect to pairs of features (e.g. part of speech and grammatical relation). A further problem from the point of view of computational discourse analysis is that many of the features used in empirical studies are not observable in texts using state-of-the-art natural language processing. Such non-observable features include animacy, the information status of a referent, and the identification of the gender of a referent based on world knowledge.

In the present study, we treat the task of determining the appropriate distribution of mentions in text as a machine learning classification problem: what is the probability that a mention will have a certain grammatical relation, given a set of linguistic features? In particular, how accurately can we select appropriate grammatical relations using only superficial linguistic features?

2. Data

A total of 5,252 mentions were annotated from the Encarta electronic encyclopedia and 4,937 mentions from the Wall Street Journal (WSJ). Sentences were parsed using the Microsoft English Grammar (Heidorn 1999) to extract mentions and linguistic features. These analyses were then hand-corrected to eliminate noise in the training data caused by inaccurate parses, allowing us to determine the upper bound on accuracy for the classification task if the computational analysis were perfect. Zero anaphors were annotated only when they occurred as subjects of coordinated clauses. They have been excluded from the present study since they are invariably discourse-given subjects.
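Framed this way, each annotated mention becomes one labeled training example. The following minimal sketch (ours, not the paper's code) shows how such an example might be represented: the task is to predict the grammatical relation [GrRel] from the other annotated features. The field names follow the feature inventory given in the next section; the values are invented purely for illustration.

```python
# One labeled training example for the classification task: predict
# [GrRel] from the other features of a mention. Field names follow the
# paper's feature inventory; the example values are invented.
from dataclasses import dataclass

@dataclass
class Mention:
    pos: str          # [POS] part of speech of the head
    noun_class: str   # [NounClass] common noun, "Geo" or "ProperName"
    definite: bool    # [Definite] definite article or demonstrative
    top_level: bool   # [TopLevel] not embedded in another mention
    words: str        # [Words] discretized length, e.g. "2" or "6to10"
    gr_rel: str       # [GrRel] target label, e.g. "St"

example = Mention(pos="Noun", noun_class="Geo", definite=True,
                  top_level=True, words="2", gr_rel="Si")
```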
3. Features

Nineteen linguistic features were annotated, along with information about the referent of each mention. On the basis of the reference information we extracted the feature [InformationStatus], distinguishing "discourse-new" versus "discourse-old". All mentions without a prior coreferential mention in the text were classified as discourse-new, even if they would not traditionally be considered referential. [InformationStatus] is not directly observable, since it requires the analyst to make decisions about the referent of a mention.

In addition to the feature [InformationStatus], the following eighteen observable features were annotated. These are all features that we can reasonably expect syntactic parsers to extract with sufficient accuracy today or in the near future.

• [ClausalStatus] Does the mention occur in a main clause ("M"), complement clause ("C"), or subordinate clause ("S")?
• [Coordinated] The mention is coordinated with at least one sibling.
• [Definite] The mention is marked with the definite article or a demonstrative pronoun.
• [Fem] The mention is unambiguously feminine.
• [GrRel] The grammatical relation of the mention (see below, this section).
• [HasPossessive] The mention is modified by a possessive pronoun or a possessive NP with the clitic 's or s'.
• [HasPP] The mention contains a postmodifying prepositional phrase.
• [HasRelCl] The mention contains a postmodifying relative clause.
• [InQuotes] The mention occurs in quoted material.
• [Lex] The specific inflected form of a pronoun, e.g. he, him.
• [Masc] The mention is unambiguously masculine.
• [NounClass] We distinguish common nouns versus proper names. Within proper names, we distinguish the name of a place ("Geo") versus other proper names ("ProperName").
• [Plural] The head of the mention is morphologically marked as plural.
• [POS] The part of speech of the head of the mention.
• [Prep] The governing preposition, if any.
• [RelCl] The mention is a child of a relative clause.
• [TopLevel] The mention is not embedded within another mention.
• [Words] The total number of words in the mention, discretized to the following values: {0, 1, 2, 3, 4, 5, 6to10, 11to15, above15} (a code sketch of this bucketing appears at the end of this section).

Gender ([Fem], [Masc]) was annotated only for common nouns whose default word sense is gendered (e.g. "mother", "father"), for common nouns with specific morphology (e.g. with the -ess suffix), and for gender-marked proper names (e.g. "John", "Mary"). Gender was not marked for pronouns, to avoid difficult encoding decisions such as the use of generic "he".¹ Gender was also not marked in cases that would require world knowledge.

The feature [GrRel] was given a much finer-grained analysis than is usual in computational linguistics. Studies in PAS have demonstrated the need to distinguish finer-grained categories than the traditional grammatical relations of English grammar ("subject", "object" etc.) in order to account for distributional phenomena in discourse. For example, subjects of intransitive verbs pattern with the direct objects of transitive verbs as the preferred locus for introducing new mentions. Subjects of transitives, however, are strongly dispreferred slots for the expression of new information. The use of fine-grained grammatical relations enables us to make rather specific claims about the distribution of mentions. The taxonomy of fine-grained grammatical relations is given below in Figure 1.

[Figure 1: The taxonomy of grammatical relations. The diagram is not recoverable from this copy; the legible category labels include subject of transitive (St), subject of copula (Sc), subject of intransitive (non-copula) (Si), object of transitive, (PN), possessor, PP complement of an adjective (PPa), PP complement of a noun, PP complement of a verb (PPv), and other (Oth).]

¹ The feature [Lex] was sufficient for the decision tree tools to learn idiosyncratic uses of gendered pronouns.
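The [Words] bucketing referenced in the feature list above is simple to state exactly. The sketch below (ours, not the paper's code) maps a raw word count to the listed values:

```python
def discretize_words(n: int) -> str:
    """Return the [Words] bucket for a mention that is n words long."""
    if n <= 5:
        return str(n)      # counts 0-5 keep their exact value
    if n <= 10:
        return "6to10"
    if n <= 15:
        return "11to15"
    return "above15"

assert discretize_words(3) == "3"
assert discretize_words(8) == "6to10"
assert discretize_words(20) == "above15"
```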
4. Decision trees

For a set of annotated examples, we used decision-tree tools to construct the conditional probability of a specific grammatical relation, given the other features in the domain.² The decision trees are constructed using a Bayesian learning approach that identifies tree structures with high posterior probability (Chickering et al. 1997). In particular, a candidate tree structure S is evaluated against data D using Bayes' rule as follows:

p(S|D) = constant · p(D|S) · p(S)

For simplicity, we specify a prior distribution over tree structures using a single parameter kappa (k). Assuming that N(S) probabilities are needed to parameterize a tree with structure S, we use:

p(S) = c · k^N(S)

where 0 < k ≤ 1, and c is a constant such that p(S) sums to one. Note that smaller values of kappa cause simpler structures to be favored. As kappa grows closer to one (k = 1 corresponds to a uniform prior over all possible tree structures), the learned decision trees become more elaborate. Decision trees were built for k ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 0.999}.

Having selected a decision tree, we use the posterior means of the parameters to specify a probability distribution over the grammatical relations. To avoid overfitting, nodes containing fewer than fifty examples were not split during the learning process. In building decision trees, 70% of the data was used for training and 30% for held-out evaluation.

The decision trees constructed can be rather complex, making them difficult to present visually. Figure 2 gives a simpler decision tree that predicts the grammatical relation of a mention for Encarta at k = 0.7. The tree was constructed using a subset of the morphological and syntactic features: [Coordinated], [HasPP], [Lex], [NounClass], [Plural], [POS], [Prep], [RelCl], [TopLevel], [Words]. Grammatical relations with only a residual probability are omitted from the figure.

[Figure 2: Decision tree for Encarta, at k = 0.7. The diagram is not recoverable from this copy; its leaf nodes give probability distributions over the fine-grained grammatical relations, but the individual values are not legible.]

² Comparison experiments were also done with Support Vector Machines (Platt 2000, Vapnik 1998) using a variety of kernel functions. The results obtained were indistinguishable from those reported here.

5. Evaluating decision trees

Decision trees were constructed and evaluated for each corpus. We were particularly interested in the accuracy of models built using only observable features.
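As a rough illustration of this construction-and-evaluation regime, the sketch below trains and scores a classification tree. It is an approximation, not the paper's method: scikit-learn's CART learner stands in for the Bayesian structure search with the kappa prior of Chickering et al. (1997), which has no scikit-learn equivalent. The fifty-example minimum for splitting a node and the 70/30 train/test division are carried over from Section 4; the feature matrix and labels are random stand-ins so that the code runs.

```python
# A rough approximation of the Section 4 regime, NOT the paper's
# Bayesian decision-tree learner. The data below is randomly generated
# purely for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(5252, 10))  # stand-in encoded features per mention
y = rng.choice(["St", "Si", "Sc", "O", "Oth"], size=5252)  # stand-in [GrRel] labels

# 70% training, 30% held-out evaluation, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)

# Nodes with fewer than fifty examples are not split, mirroring the
# paper's guard against overfitting.
tree = DecisionTreeClassifier(min_samples_split=50, random_state=0)
tree.fit(X_train, y_train)

# Each leaf yields a probability distribution over grammatical
# relations, analogous to the leaf distributions in Figure 2.
print(tree.predict_proba(X_test[:1]))
print("held-out accuracy:", tree.score(X_test, y_test))
```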
