
Inferring Selectional Preferences from Part-Of-Speech N-grams

Hyeju Jang and Jack Mostow
Project LISTEN (www.cs.cmu.edu/~listen), School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213, USA
[email protected], [email protected]

Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 377–386, Avignon, France, April 23–27, 2012. © 2012 Association for Computational Linguistics

Abstract

We present the PONG method to compute selectional preferences using part-of-speech (POS) N-grams. From a corpus labeled with grammatical dependencies, PONG learns the distribution of word relations for each POS N-gram. From the much larger but unlabeled Google N-grams corpus, PONG learns the distribution of POS N-grams for a given pair of words. We derive the probability that one word has a given grammatical relation to the other. PONG estimates this probability by combining both distributions, whether or not either word occurs in the labeled corpus. PONG achieves higher average precision on 16 relations than a state-of-the-art baseline in a pseudo-disambiguation task, but lower coverage and recall.

1 Introduction

Selectional preferences specify plausible fillers for the arguments of a predicate, e.g., celebrate. Can you celebrate a birthday? Sure. Can you celebrate a pencil? Arguably yes: Today the Acme Pencil Factory celebrated its one-billionth pencil. However, such a contrived example is unnatural because unlike birthday, pencil lacks a strong association with celebrate. How can we compute the degree to which birthday or pencil is a plausible and typical object of celebrate?

Formally, we are interested in computing the probability Pr(r | t, R), where (as Table 1 specifies) t is a target word such as celebrate, r is a word possibly related to it, such as birthday or pencil, and R is a possible relation between them, whether a semantic role such as the agent of an action, or a grammatical dependency such as the object of a verb. We call t the “target” because originally it referred to a vocabulary word targeted for instruction, and r its “relative.”

Notation   Description
R          a relation between words
t          a target word
r, r'      possible relatives of t
g          a word N-gram
g_i, g_j   the ith and jth words of g
p          the POS N-gram of g

Table 1: Notation used throughout this paper

Previous work on selectional preferences has used them primarily for natural language analytic tasks such as word sense disambiguation (Resnik, 1997), dependency parsing (Zhou et al., 2011), and semantic role labeling (Gildea and Jurafsky, 2002). However, selectional preferences can also apply to natural language generation tasks such as sentence generation and question generation. For generation tasks, choosing the right word to express a specified argument of a relation requires knowing its connotations – that is, its selectional preferences. Therefore, it is useful to know selectional preferences for many different relations. Such knowledge could have many uses. In education, it could help teach word connotations. In machine learning, it could help computers learn languages. In machine translation, it could help generate more natural wording.

This paper introduces a method named PONG (for Part-Of-Speech N-Grams) to compute selectional preferences for many different relations by combining part-of-speech information and Google N-grams. PONG achieves higher precision on a pseudo-disambiguation task than the best previous model (Erk et al., 2010), but lower coverage.

The paper is organized as follows. Section 2 describes the relations for which we compute selectional preferences. Section 3 describes PONG.
Section 4 evaluates PONG. Section 5 relates PONG to prior work. Section 6 concludes.

2 Relations Used

Selectional preferences characterize constraints on the arguments of predicates. Selectional preferences for semantic roles (such as agent and patient) are generally more informative than for grammatical dependencies (such as subject and object). For example, consider these semantically equivalent but grammatically distinct sentences:

Pat opened the door.
The door was opened by Pat.

In both sentences the agent of opened, namely Pat, must be capable of opening something – an informative constraint on Pat. In contrast, knowing that the grammatical subject of opened is Pat in the first sentence and the door in the second sentence tells us only that they are nouns.

Despite this limitation, selectional preferences for grammatical dependencies are still useful, for a number of reasons. First, in practice they approximate semantic role labels. For instance, typically the grammatical subject of opened is its agent. Second, grammatical dependencies can be extracted by parsers, which tend to be more accurate than current semantic role labelers. Third, the number of different grammatical dependencies is large enough to capture diverse relations, but not so large as to have sparse data for individual relations. Thus in this paper, we use grammatical dependencies as relations.

A parse tree determines the basic grammatical dependencies between the words in a sentence. For instance, in the parse of Pat opened the door, the verb opened has Pat as its subject and door as its object, and door has the as its determiner. Besides these basic dependencies, we use two additional types of dependencies.

Composing two basic dependencies yields a collapsed dependency (de Marneffe and Manning, 2008). For example, consider this sentence:

The airplane flies in the sky.

Here sky is the prepositional object of in, which is the head of a prepositional phrase attached to flies. Composing these two dependencies yields the collapsed dependency prep_in between flies and sky, which captures an important semantic relation between these two content words: sky is the location where flies occurs. Other function words yield different collapsed dependencies. For example, consider these two sentences:

The airplane flies over the ocean.
The airplane flies and lands.

Collapsed dependencies for the first sentence include prep_over between flies and ocean, which characterizes their relative vertical position, and conj_and between flies and lands, which links two actions that an airplane can perform. As these examples illustrate, collapsing dependencies involving prepositions and conjunctions can yield informative dependencies between content words.

Besides collapsed dependencies, PONG infers inverse dependencies. Inverse selectional preferences are selectional preferences of arguments for their predicates, such as a preference of a subject or object for its verb. They capture semantic regularities such as the set of verbs that an agent can perform, which tend to outnumber the possible agents for a verb (Erk et al., 2010).

3 Method

To compute selectional preferences, PONG combines information from a limited corpus labeled with the grammatical dependencies described in Section 2, and a much larger unlabeled corpus. The key idea is to abstract word sequences labeled with grammatical relations into POS N-grams, in order to learn a mapping from POS N-grams to those relations. For instance, PONG abstracts the parsed sentence Pat opened the door as NN VB DT NN, with the first and last NN as the subject and object of the VB. To estimate the distribution of POS N-grams containing particular target and relative words, PONG POS-tags Google N-grams (Franz and Brants, 2006).

Section 3.1 derives PONG’s probabilistic model for combining information from labeled and unlabeled corpora. Sections 3.2 and 3.3 describe how PONG estimates probabilities from each corpus. Section 3.4 discusses a sparseness problem revealed during probability estimation, and how we address it in PONG.

3.1 Probabilistic model

We quantify the selectional preference for a relative r to instantiate a relation R of a target t as the probability Pr(r | t, R), estimated as follows. By the definition of conditional probability:

    Pr(r | t, R) = Pr(r, t, R) / Pr(t, R)

We care only about the relative probability of different r for fixed t and R, so we rewrite it as:

    Pr(r | t, R) ∝ Pr(r, t, R)

We use the chain rule:

    Pr(r, t, R) = Pr(R | r, t) Pr(r | t) Pr(t)

and notice that t is held constant:

    Pr(r | t, R) ∝ Pr(R | r, t) Pr(r | t)

We estimate the second factor as follows:

    Pr(r | t) = Pr(t, r) / Pr(t) ≈ freq(t, r) / freq(t)

We calculate the denominator freq(t) as the number of N-grams in the Google N-gram corpus that contain t, and the numerator freq(t, r) as the number of N-grams containing both t and r.

To estimate the factor Pr(R | r, t) directly from a corpus of text labeled with grammatical relations, it would be trivial to count how often a word r bears relation R to target word t. However, the results would be limited to the words in the corpus, and many relation frequencies would be estimated sparsely or missing altogether; t or r might not even occur. Instead, we sum over the possible POS N-grams p of the word N-grams containing t and r:

    Pr(R | t, r) = Σ_p Pr(R | t, r, p) Pr(p | t, r) Pr(t, r) / Pr(t, r)

Cancelling the common factor yields:

    Pr(R | t, r) = Σ_p Pr(R | p, t, r) Pr(p | t, r)

We approximate the first term Pr(R | p, t, r) as Pr(R | p), based on the simplifying assumption that R is conditionally independent of t and r, given p. In other words, we assume that given a POS N-gram, the target and relative words t and r give no additional information about the probability of a relation. However, their respective positions i and j in the POS N-gram p matter, so we condition the probability on them:

    Pr(R | p, t, r) ≈ Pr(R | p, i, j)

Summing over their possible positions, we get:

    Pr(R | r, t) ≈ Σ_{i,j} Σ_p Pr(R | p, i, j) Pr(p | t = g_i, r = g_j)

As Figure 1 shows, we estimate Pr(R | p, i, j) by abstracting the labeled corpus into POS N-grams. We estimate Pr(p | t = g_i, r = g_j) based on the frequency of partially lexicalized POS N-grams
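As a concrete illustration, the scoring formula above can be sketched with tiny hand-built tables standing in for the two learned distributions. All names, counts, and probabilities below are illustrative assumptions, not PONG’s actual data or implementation:

```python
# Toy sketch of the PONG scoring formula: score(r) ∝ Pr(R | r, t) * Pr(r | t).
# The tables below are hypothetical stand-ins, not learned from real corpora.

# Pr(R | p, i, j): probability that the words at positions i and j of POS
# N-gram p stand in relation R (learned from a labeled corpus in PONG).
PR_R_GIVEN_PIJ = {
    ("dobj", ("VB", "DT", "NN"), 0, 2): 0.9,  # verb-determiner-noun pattern
}

# POS-tagged word N-grams with counts (standing in for tagged Google N-grams):
# word N-gram -> (POS N-gram, count)
NGRAMS = {
    ("celebrate", "a", "birthday"): (("VB", "DT", "NN"), 50),
    ("celebrate", "a", "pencil"):   (("VB", "DT", "NN"), 1),
    ("sharpen", "a", "pencil"):     (("VB", "DT", "NN"), 30),
}

def freq(*words):
    """Number of N-gram tokens containing all the given words."""
    return sum(c for g, (_, c) in NGRAMS.items()
               if all(w in g for w in words))

def score(R, t, r):
    """Unnormalized Pr(r | t, R), proportional to Pr(R | r, t) * Pr(r | t)."""
    if freq(t, r) == 0:
        return 0.0
    # Pr(R | r, t) ~ sum over positions i, j and POS N-grams p of
    #   Pr(R | p, i, j) * Pr(p | t = g_i, r = g_j)
    pr_R = 0.0
    for g, (p, c) in NGRAMS.items():
        for i, gi in enumerate(g):
            for j, gj in enumerate(g):
                if gi == t and gj == r:
                    pr_p = c / freq(t, r)  # Pr(p | t, r)
                    pr_R += PR_R_GIVEN_PIJ.get((R, p, i, j), 0.0) * pr_p
    return pr_R * freq(t, r) / freq(t)  # multiply by Pr(r | t)

# birthday should be a far more plausible object of celebrate than pencil
print(score("dobj", "celebrate", "birthday") >
      score("dobj", "celebrate", "pencil"))  # prints True
```

Because the Pr(R | p, i, j) table is keyed on POS N-grams rather than words, the sketch would assign a nonzero score even to a (t, r) pair never observed with relation R in the labeled corpus, which is the point of the abstraction.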