
VERBOCEAN: Mining the Web for Fine-Grained Semantic Verb Relations

Timothy Chklovski and Patrick Pantel
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292
{timc, pantel}@isi.edu

Abstract

Broad-coverage repositories of semantic relations between verbs could benefit many NLP tasks. We present a semi-automatic method for extracting fine-grained semantic relations between verbs. We detect similarity, strength, antonymy, enablement, and temporal happens-before relations between pairs of strongly associated verbs using lexico-syntactic patterns over the Web. On a set of 29,165 strongly associated verb pairs, our extraction algorithm yielded 65.5% accuracy. Analysis of error types shows that on the relation strength we achieved 75% accuracy. We provide the resource, called VERBOCEAN, for download at http://semantics.isi.edu/ocean/.

1 Introduction

Many NLP tasks, such as question answering, summarization, and machine translation, could benefit from broad-coverage semantic resources such as WordNet (Miller 1990) and EVCA (English Verb Classes and Alternations) (Levin 1993). These extremely useful resources have very high precision entries but have important limitations when used in real-world NLP tasks due to their limited coverage and prescriptive nature (i.e. they do not include semantic relations that are plausible but not guaranteed). For example, it may be valuable to know that if someone has bought an item, they may sell it at a later time. WordNet does not include the relation "X buys Y" happens-before "X sells Y" since it is possible to sell something without having bought it (e.g. having manufactured or stolen it).

Verbs are the primary vehicle for describing events and expressing relations between entities. Hence, verb semantics could help in many natural language processing (NLP) tasks that deal with events or relations between entities.
For tasks which require canonicalization of natural language statements or derivation of plausible inferences from such statements, a particularly valuable resource is one which (i) relates verbs to one another and (ii) provides broad coverage of the verbs in the target language.

In this paper, we present an algorithm that semi-automatically discovers fine-grained verb semantics by querying the Web using simple lexico-syntactic patterns. The verb relations we discover are similarity, strength, antonymy, enablement, and temporal relations. Identifying these relations over 29,165 verb pairs results in a broad-coverage resource we call VERBOCEAN. Our approach extends previously formulated ones that use surface patterns as indicators of semantic relations between nouns (Hearst 1992; Etzioni 2003; Ravichandran and Hovy 2002). We extend these approaches in two ways: (i) our patterns indicate verb conjugation, to increase their expressiveness and specificity, and (ii) we use a measure similar to mutual information to account both for the frequency of the verbs whose semantic relations are being discovered and for the frequency of the pattern.

2 Relevant Work

In this section, we describe application domains that can benefit from a resource of verb semantics. We then introduce some existing resources and describe previous attempts at mining semantics from text.

2.1 Applications

Question answering is often approached by canonicalizing the question text and the answer text into logical forms. This approach is taken, inter alia, by a top-performing system (Moldovan et al. 2002). In discussing future work on the system's logical form matching component, Rus (2002, p. 143) points to incorporating entailment and causation verb relations to improve the matcher's performance. In other work, Webber et al. (2002) have argued that successful question answering depends on lexical reasoning, and that lexical reasoning in turn requires fine-grained verb semantics in addition to troponymy (is-a relations between verbs) and antonymy.

In multi-document summarization, knowing verb similarities is useful for sentence compression and for determining sentences that have the same meaning (Lin 1997). Knowing that a particular action happens before another, or is enabled by another, is also useful for determining the order of events (Barzilay et al. 2002). For example, to order summary sentences properly, it may be useful to know that selling something can be preceded by either buying, manufacturing, or stealing it. Furthermore, knowing that a particular verb has a meaning stronger than another (e.g. rape vs. abuse and renovate vs. upgrade) can help a system pick the most general sentence.

In lexical selection of verbs in machine translation and in work on document classification, practitioners have argued for approaches that depend on wide-coverage resources indicating verb similarity and membership of a verb in a certain class. In work on translating verbs with many counterparts in the target language, Palmer and Wu (1995) discuss inherent limitations of approaches which do not examine a verb's class membership, and put forth an approach based on verb similarity. In document classification, Klavans and Kan (1998) demonstrate that document type is correlated with the presence of many verbs of a certain EVCA class (Levin 1993). In discussing future work, Klavans and Kan point to extending coverage of the manually constructed EVCA resource as a way of improving the performance of the system.
A wide-coverage repository of verb relations, including verbs linked by the similarity relation, would provide a way to automatically extend the existing verb classes to cover more of the English lexicon.

2.2 Existing resources

Some existing broad-coverage resources on verbs have focused on organizing verbs into classes or annotating their frames or thematic roles. EVCA (English Verb Classes and Alternations) (Levin 1993) organizes verbs by similarity and participation / nonparticipation in alternation patterns. It contains 3200 verbs classified into 191 classes. Additional manually constructed resources include PropBank (Kingsbury et al. 2002), FrameNet (Baker et al. 1998), VerbNet (Kipper et al. 2000), and the resource on verb selectional restrictions developed by Gomez (2001).

Our approach differs from the above in its focus. We relate verbs to each other rather than organize them into classes or identify their frames or thematic roles. WordNet does provide relations between verbs, but at a coarser level. We provide finer-grained relations such as strength, enablement and temporal information. Also, in contrast with WordNet, we cover more than the prescriptive cases.

2.3 Mining semantics from text

Previous web mining work has rarely addressed extracting many different semantic relations from a Web-sized corpus. Most work on extracting semantic information from large corpora has focused on the extraction of is-a relations between nouns. Hearst (1992) was the first, followed by recent larger-scale and more fully automated efforts (Pantel and Ravichandran 2004; Etzioni et al. 2004; Ravichandran and Hovy 2002). Recently, Moldovan et al. (2004) present a learning algorithm to detect 35 fine-grained noun phrase relations.

Turney (2001) studied word relatedness and synonym extraction, while Lin et al. (2003) present an algorithm that queries the Web using lexical patterns for distinguishing noun synonymy and antonymy. Our approach addresses verbs and provides for a richer and finer-grained set of semantics. The reliability of estimating bigram counts on the Web via search engines has been investigated by Keller and Lapata (2003).

Semantic networks have also been extracted from dictionaries and other machine-readable resources. MindNet (Richardson et al. 1998) extracts a collection of triples of the type "ducks have wings" and "duck capable-of flying". This resource, however, does not relate verbs to each other or provide verb semantics.

3 Semantic relations among verbs

In this section, we introduce and motivate the specific relations that we extract. Whilst the natural language literature is rich in theories of semantics (Barwise and Perry 1985; Schank and Abelson 1977), large-coverage manually created semantic resources typically only organize verbs into a flat or shallow hierarchy of classes (such as those described in Section 2.2). WordNet identifies synonymy, antonymy, troponymy, and cause. As summarized in Figure 1, Fellbaum (1998) discusses a finer-grained analysis of entailment; the WordNet database, however, does not distinguish, e.g., backward presupposition (forget :: know, where know must have happened before forget) from proper temporal inclusion (walk :: step). In formulating our set of relations, we have relied on the finer-grained analysis, explicitly breaking out the temporal precedence between entities.

[Figure 1. Fellbaum's (1998) entailment hierarchy. Entailment splits by temporal inclusion: with temporal inclusion, +troponymy gives coextensiveness (march :: walk) and -troponymy gives proper inclusion (walk :: step); without temporal inclusion, the subtypes are backward presupposition (forget :: know) and cause (show :: see).]

In selecting the relations to identify, we aimed both at covering the relations described in WordNet and at covering the relations present in our collection of strongly associated verb pairs. We relied on the strongly associated verb pairs, described in Section 4.4, for computational efficiency. The relations we identify were experimentally found to cover 99 out of 100 randomly selected verb pairs.

Our algorithm identifies six semantic relations between verbs. These are summarized in Table 1, along with their closest corresponding WordNet category and the symmetry of the relation (whether V1 rel V2 is equivalent to V2 rel V1).

Table 1. Semantic relations identified in VERBOCEAN. Siblings in the WordNet column refers to terms with the same troponymic parent, e.g. swim and fly.

SEMANTIC RELATION | EXAMPLE                       | Alignment with WordNet               | Symmetric
similarity        | transform :: integrate        | synonyms or siblings                 | Y
strength          | wound :: kill                 | siblings                             | N
antonymy          | open :: close                 | antonymy                             | Y
enablement        | fight :: win                  | cause                                | N
happens-before    | buy :: sell; marry :: divorce | cause; entailment, no temporal incl. | N

Similarity. As Fellbaum (1998) and the tradition of organizing verbs into similarity classes indicate, verbs do not neatly fit into a unified is-a (troponymy) hierarchy. Rather, verbs are often similar or related. Similarity between action verbs, for example, can arise when they differ in connotations about manner or degree of action. Examples extracted by our system include maximize :: enhance, produce :: create, reduce :: restrict.
Strength. When two verbs are similar, one may denote a more intense, thorough, comprehensive or absolute action. In the case of change-of-state verbs, one may denote a more complete change. We identify this as the strength relation. Sample verb pairs extracted by our system, in the order weak to strong, are: taint :: poison, permit :: authorize, surprise :: startle, startle :: shock. Some instances of strength map to WordNet's troponymy relation.

Strength, a subclass of similarity, has not been identified in broad-coverage networks of verbs, but may be of particular use in natural language generation and summarization applications.

Antonymy. Also known as semantic opposition, antonymy between verbs has several distinct subtypes. As discussed by Fellbaum (1998), it can arise from switching thematic roles associated with the verb (as in buy :: sell, lend :: borrow). There is also antonymy between stative verbs (live :: die, differ :: equal) and antonymy between sibling verbs which share a parent (walk :: run) or an entailed verb (fail :: succeed both entail try).

Antonymy also systematically interacts with the happens-before relation in the case of restitutive opposition (Cruse 1986). This subtype is exemplified by damage :: repair, wrap :: unwrap. In terms of the relations we recognize, it can be stated that restitutive-opposition(V1, V2) = happens-before(V1, V2) and antonym(V1, V2). Examples of antonymy extracted by our system include: assemble :: dismantle; ban :: allow; regard :: condemn; roast :: fry.

Enablement. This relation holds between two verbs V1 and V2 when the pair can be glossed as V1 is accomplished by V2. Enablement is classified as a type of causal relation by Barker and Szpakowicz (1995). Examples of enablement extracted by our system include: assess :: review and accomplish :: complete.

Happens-before. This relation indicates that the two verbs refer to two temporally disjoint intervals or instances. WordNet's cause relation, between a causative and a resultative verb (as in buy :: own), would be tagged as instances of happens-before by our system. Examples of the happens-before relation identified by our system include marry :: divorce, detain :: prosecute, enroll :: graduate, schedule :: reschedule, tie :: untie.

4 Approach

We discover the semantic relations described above by querying the Web with Google for lexico-syntactic patterns indicative of each relation. Our approach has two stages. First, we identify pairs of highly associated verbs co-occurring on the Web with sufficient frequency using previous work by Lin and Pantel (2001), as described in Section 4.4. Next, for each verb pair, we test lexico-syntactic patterns, calculating a score for each possible semantic relation as described in Section 4.2. Finally, as described in Section 4.3, we compare the strengths of the individual semantic relations and, preferring the most specific and then strongest relations, output a consistent set as the final output. As a guide to consistency, we use a simple theory of semantics indicating which semantic relations are subtypes of other ones, which are compatible, and which are mutually exclusive.

4.1 Lexico-syntactic patterns

The lexico-syntactic patterns were manually selected by examining pairs of verbs in known semantic relations. They were refined to reduce the capture of wrong parts of speech or incorrect semantic relations. We used 50 verb pairs, and the overall process took about 25 hours.

We use a total of 35 patterns, which are listed in Table 2 along with the estimated frequency of hits.
Table 2. Semantic relations and the 35 surface patterns used to identify them. The total number of patterns for each relation is shown in parentheses. In patterns, "*" matches any single word. Punctuation does not count as a word by the search engine used (Google).

SEMANTIC RELATION     | Surface Patterns                                          | Hits_est for patterns
narrow similarity (2)* | X ie Y; Xed ie Yed                                       | 219,480
broad similarity (2)*  | Xed and Yed; to X and Y                                  | 154,518,326
strength (8)           | X even Y; Xed even Yed; X and even Y; Xed and even Yed; Y or at least X; Yed or at least Xed; not only Xed but Yed; not just Xed but Yed | 1,016,905
enablement (4)         | Xed * by Ying the; Xed * by Ying or; to X * by Ying the; to X * by Ying or | 2,348,392
antonymy (7)           | either X or Y; either Xs or Ys; either Xed or Yed; either Xing or Ying; whether to X or Y; Xed * but Yed; to X * but Y | 18,040,916
happens-before (12)    | to X and then Y; to X * and then Y; Xed and then Yed; Xed * and then Yed; to X and later Y; Xed and later Yed; to X and subsequently Y; Xed and subsequently Yed; to X and eventually Y; Xed and eventually Yed | 8,288,871

*Narrow- and broad-similarity patterns overlap in their coverage and are treated as a single category, similarity, in postprocessing. Narrow similarity tests for rare patterns, and Hits_est for it had to be approximated rather than estimated from the smaller corpus.

Note that our patterns specify the tense of the verbs they accept. When instantiating these patterns, we conjugate as needed. For example, "both Xed and Yed" instantiates on sing and dance as "both sung and danced".
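Since the conjugation step matters for pattern precision, a small sketch may help. The following Python fragment is illustrative only: the paper does not publish its morphology code, and the irregular-form table here is a tiny hypothetical stand-in.

```python
# Illustrative sketch of pattern instantiation with conjugation.
# IRREGULAR_PAST is a hypothetical stand-in for a real morphology table.

IRREGULAR_PAST = {"sing": "sung", "buy": "bought", "know": "known"}

def past_form(verb):
    """Return a past/past-participle form: irregular lookup, else add -ed."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"

def instantiate(pattern, v1, v2):
    """Fill an X/Y pattern, conjugating where the pattern asks for it:
    'Xed'/'Yed' slots take the past form; bare 'X'/'Y' slots take the stem."""
    out = []
    for token in pattern.split():
        if token == "Xed":
            out.append(past_form(v1))
        elif token == "Yed":
            out.append(past_form(v2))
        elif token == "X":
            out.append(v1)
        elif token == "Y":
            out.append(v2)
        else:
            out.append(token)  # literal words and the '*' wildcard
    return " ".join(out)

# "both Xed and Yed" on (sing, dance) -> "both sung and danced"
print(instantiate("both Xed and Yed", "sing", "dance"))
```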

4.2 Testing for a semantic relation

In this section, we describe how the presence of a semantic relation is detected. We test the relations with patterns exemplified in Table 2. We adopt an approach inspired by mutual information to measure the strength of association, denoted S_p(V_1, V_2), between three entities: a verb pair V_1 and V_2 and a lexico-syntactic pattern p:

    S_p(V_1, V_2) = P(V_1, p, V_2) / (P(p) × P(V_1) × P(V_2))

The probabilities in the denominator are difficult to calculate directly from search engine results. For a given lexico-syntactic pattern, we need to estimate the frequency of the pattern instantiated with appropriately conjugated verbs. For verbs, we need to estimate the frequency of the verbs, but avoid counting other parts of speech (e.g. chair as a noun or painted as an adjective). Another issue is that some relations are symmetric (similarity and antonymy), while others are not (strength, enablement, happens-before). For symmetric relations only, the verbs can fill the lexico-syntactic pattern in either order. To address these issues, we estimate S_p(V_1, V_2) using

    S_p(V_1, V_2) ≈ (hits(V_1, p, V_2) / N) / ((hits_est(p) / N) × (hits("to V_1") × C_v / N) × (hits("to V_2") × C_v / N))

for asymmetric relations, and

    S_p(V_1, V_2) ≈ ((hits(V_1, p, V_2) + hits(V_2, p, V_1)) / N) / (2 × (hits_est(p) / N) × (hits("to V_1") × C_v / N) × (hits("to V_2") × C_v / N))

for symmetric relations.

Here, hits(S) denotes the number of documents containing the string S, as returned by Google. N is the number of words indexed by the search engine (N ≈ 7.2 × 10^11), and C_v is a correction factor to obtain the frequency of the verb V in all tenses from the frequency of the pattern "to V". Based on several verbs, we have estimated C_v = 8.5. Because pattern counts, when instantiated with verbs, could not be estimated directly, we computed the frequencies of the patterns in a part-of-speech tagged 500M word corpus and used them to estimate the expected number of hits hits_est(p) for each pattern. We estimated N with a similar method.

We say that the semantic relation indicated by lexico-syntactic pattern p is present between V_1 and V_2 if

    S_p(V_1, V_2) > C_1

As a result of tuning the system on a tuning set of 50 verb pairs, C_1 = 8.5.

Additional test for asymmetric relations. For the asymmetric relations, we require not only that S_p(V_1, V_2) exceed a certain threshold, but also that the relation be strongly asymmetric:

    S_p(V_1, V_2) / S_p(V_2, V_1) = hits(V_1, p, V_2) / hits(V_2, p, V_1) > C_2

From the tuning set, C_2 = 5.
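To make the estimation concrete, here is a minimal sketch of the S_p test, assuming the hit counts have already been fetched; the function names and count plumbing are our own scaffolding, not the authors' code.

```python
# Sketch of the S_p scoring test from Section 4.2, using the paper's
# constants: N ~ 7.2e11 indexed words, C_v = 8.5, C1 = 8.5, C2 = 5.
# All hit counts are assumed to come from search engine queries:
#   hits_v1_p_v2  -- hits for pattern p instantiated as (V1, p, V2)
#   hits_v2_p_v1  -- hits for the reversed instantiation
#   hits_est_p    -- expected hits for p, estimated from a tagged corpus
#   hits_to_v1/2  -- hits for the strings "to V1" / "to V2"

N, C_V, C1, C2 = 7.2e11, 8.5, 8.5, 5.0

def s_p(hits_v1_p_v2, hits_v2_p_v1, hits_est_p,
        hits_to_v1, hits_to_v2, symmetric):
    """Approximate S_p(V1, V2) as defined in Section 4.2."""
    p_v1 = hits_to_v1 * C_V / N        # verb frequency, all tenses
    p_v2 = hits_to_v2 * C_V / N
    p_pat = hits_est_p / N             # expected pattern frequency
    if symmetric:
        joint = (hits_v1_p_v2 + hits_v2_p_v1) / N
        return joint / (2 * p_pat * p_v1 * p_v2)
    return (hits_v1_p_v2 / N) / (p_pat * p_v1 * p_v2)

def relation_present(hits_v1_p_v2, hits_v2_p_v1, hits_est_p,
                     hits_to_v1, hits_to_v2, symmetric):
    """S_p must exceed C1; asymmetric relations additionally require a
    strong directional preference, hits(V1,p,V2)/hits(V2,p,V1) > C2."""
    if s_p(hits_v1_p_v2, hits_v2_p_v1, hits_est_p,
           hits_to_v1, hits_to_v2, symmetric) <= C1:
        return False
    if symmetric:
        return True
    if hits_v2_p_v1 == 0:              # no reverse hits: maximally asymmetric
        return hits_v1_p_v2 > 0
    return hits_v1_p_v2 / hits_v2_p_v1 > C2
```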
4.3 Pruning identified semantic relations

Given a pair of semantic relations from the set we identify, one of three cases can arise: (i) one relation is more specific than the other (strength is more specific than similarity; enablement is more specific than happens-before); (ii) the relations are compatible (antonymy and happens-before), where the presence of one neither implies nor rules out the presence of the other; or (iii) the relations are incompatible (similarity and antonymy).

It is not uncommon for our algorithm to identify the presence of several relations, with different strengths. To produce the most likely output, we use the semantics of compatibility of the relations to output the most likely one(s). The rules are as follows (a code sketch follows the list):

- If the frequency was too low (less than 10 on the pattern "X * Y" OR "Y * X" OR "X * * Y" OR "Y * * X"), output that the statements are unrelated and stop.
- If happens-before is detected, output presence of happens-before (an additional relation may still be output, if detected).
- If happens-before is not detected, ignore any detection of enablement (because enablement is more specific than happens-before, but is sometimes falsely detected in the absence of happens-before).
- If strength is detected, the score of similarity is ignored (because strength is more specific than similarity).
- Of the relations strength, similarity, opposition and enablement which were detected (and not ignored), output the one with the highest S_p.
- If nothing has been output to this point, output unrelated.
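A compact way to read the cascade is as code. This sketch is our paraphrase of the rules above, not the authors' implementation; `scores` maps each detected relation to its S_p score, and `freq_ok` stands in for the broad co-occurrence test.

```python
# Sketch of the Section 4.3 pruning cascade. `scores` holds only the
# relations that passed the detection tests of Section 4.2.

def prune(scores, freq_ok):
    """Return the final, mutually consistent list of output relations."""
    if not freq_ok:                     # < 10 hits on "X * Y"-style patterns
        return ["unrelated"]
    scores = dict(scores)
    output = []
    if "happens-before" in scores:
        output.append("happens-before")  # more relations may still follow
        scores.pop("happens-before")
    else:
        # enablement implies happens-before; without it, drop enablement
        scores.pop("enablement", None)
    if "strength" in scores:
        scores.pop("similarity", None)   # strength is the more specific tag
    if scores:
        output.append(max(scores, key=scores.get))
    return output or ["unrelated"]

# e.g. prune({"happens-before": 12.0, "antonymy": 9.1}, True)
# -> ["happens-before", "antonymy"]
```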
Essen- tially, if two paths tend to link the same sets of 1 VERBOCEAN is available for download at words, they hypothesized that the meanings of the http://semantics.isi.edu/ocean/. corresponding paths are similar. It is from paths of 2 The 1.5GB corpus consists of San Jose Mercury, the form subject-verb-object that we extract our set Wall Street Journal and AP Newswire articles from the TREC-9 collection. Table 3. Five randomly selected pairs along with the system tag (in bold) and the judges’ responses.

5.2 Accuracy

We classified each verb pair according to the semantic relations described in Section 3. If the system does not identify any semantic relation for a verb pair, then the system tags the pair as having no relation. To evaluate the accuracy of the system, we randomly sampled 100 of these verb pairs and presented the classifications to two human judges. The adjudicators were asked to judge whether or not the system classification was acceptable (i.e. whether or not the relations output by the system were correct). Since the semantic relations are not disjoint (e.g. mop is both stronger than and similar to sweep), multiple relations may be acceptable for a given verb pair. The judges were also asked to identify their preferred semantic relations (i.e. those relations which seem most plausible). Table 3 shows five randomly selected pairs along with the judges' responses. The Appendix shows sample relationships discovered by the system.

Table 3. Five randomly selected pairs along with the system tag (in brackets) and the judges' responses.

PAIR WITH SYSTEM TAG (IN BRACKETS)                       | CORRECT? (Judge 1 / Judge 2) | PREFERRED RELATION (Judge 1 / Judge 2)
X absolve Y [is similar to] X vindicate Y                | Yes / Yes | is similar to / is similar to
X bottom Y [has no relation with] X abate Y              | Yes / Yes | has no relation with / has no relation with
X outrage Y [happens-after / is stronger than] X shock Y | Yes / Yes | happens-before / is stronger than (both judges)
X pool Y [has no relation with] X increase Y             | Yes / No  | has no relation with / can result in
X insure Y [is similar to] X expedite Y                  | No / No   | has no relation with / has no relation with

Table 4. Accuracy of system-discovered relations.

        | Tags Correct | Preferred Tags Correct | Baseline Correct
Judge 1 | 66%          | 54%                    | 24%
Judge 2 | 65%          | 52%                    | 20%
Average | 65.5%        | 53%                    | 22%

Table 5. Accuracy of each semantic relation.

SEMANTIC RELATION | SYSTEM TAGS | Tags Correct | Preferred Tags Correct
similarity        | 41          | 63.4%        | 40.2%
strength          | 14          | 75.0%        | 75.0%
antonymy          | 8           | 50.0%        | 43.8%
enablement        | 2           | 100%         | 100%
no relation       | 35          | 72.9%        | 72.9%
happens-before    | 17          | 67.6%        | 55.9%

Table 4 shows the accuracy of the system. The baseline system consists of labeling each pair with the most common semantic relation, similarity, which occurs 33 times. The Tags Correct column represents the percentage of verb pairs whose system output relations were deemed correct. The Preferred Tags Correct column gives the percentage of verb pairs whose system output relations matched exactly the judges' preferred relations. The Kappa statistic (Siegel and Castellan 1988) for the task of judging system tags as correct or incorrect is κ = 0.78, whereas the task of identifying the preferred semantic relation has κ = 0.72. For the latter task, the two judges agreed on 73 of the 100 semantic relations; 73% therefore gives an idea of an upper bound for humans on this task. On these 73 relations, the system achieved a higher accuracy of 70.0%.
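For readers who want to reproduce the agreement figure on their own judgments, Cohen's kappa can be computed as follows. This is a generic illustration with made-up labels, not the authors' evaluation script.

```python
from collections import Counter

def cohens_kappa(labels1, labels2):
    """kappa = (p_o - p_e) / (1 - p_e), with chance agreement p_e
    computed from each judge's marginal label frequencies."""
    assert len(labels1) == len(labels2)
    n = len(labels1)
    p_o = sum(a == b for a, b in zip(labels1, labels2)) / n
    c1, c2 = Counter(labels1), Counter(labels2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with hypothetical judgments:
j1 = ["similar", "similar", "none", "stronger", "none"]
j2 = ["similar", "none",    "none", "stronger", "none"]
print(round(cohens_kappa(j1, j2), 2))  # 0.69 on this toy data
```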

The system is allowed to output the happens-before relation in combination with other relations. On the 17 happens-before relations output by the system, 67.6% were judged correct. Ignoring the happens-before relations, we achieved a Tags Correct precision of 68%.

Table 5 shows the accuracy of the system on each of the relations. The stronger-than relation is a subset of the similarity relation. Considering a coarser extraction where stronger-than relations are merged with similarity, the accuracies of both judging system tags and identifying the preferred semantic relation jump to 68.2%. Also, the overall accuracy of the system climbs to 68.5%.

As described in Section 2, WordNet contains verb semantic relations. A significant percentage of our discovered relations are not covered by WordNet's coarser classifications. Of the 40 verb pairs whose system relation was tagged as correct by both judges in our accuracy experiments and whose tag was not 'no relation', only 22.5% existed in a WordNet relation.

5.3 Discussion

The experience of extracting these semantic relations has clarified certain important challenges. While relying on a search engine allows us to query a corpus of nearly a trillion words, some issues arise: (i) the number of instances has to be approximated by the number of hits (documents); (ii) the number of hits for the same query may fluctuate over time; and (iii) some needed counts are not directly available. We addressed the last issue by approximating these counts using a smaller corpus.
We do not detect entailment with lexico-syntactic patterns. In fact, we propose that whether the entailment relation holds between V1 and V2 depends on the absence of another verb V1' in the same relationship with V2. For example, given the relation marry happens-before divorce, we can conclude that divorce entails marry. But, given the relation buy happens-before sell, we cannot conclude entailment, since manufacture can also happen before sell. This also applies to the enablement and strength relations.

Corpus-based methods, including ours, hold the promise of wide coverage but are weak at discriminating senses. While we hope that applications will benefit from this resource as is, an interesting next step would be to augment it with sense information.

6 Future work

There are several ways to improve the accuracy of the current algorithm and to detect relations between low-frequency verb pairs. One avenue would be to automatically learn or manually craft more patterns and to extend the pattern vocabulary (when developing the system, we noticed that different registers and verb types require different patterns). Another possibility would be to use more relaxed patterns when part-of-speech confusion is unlikely (e.g. "eat" is a common verb which does not have a noun sense, so patterns need not protect against noun senses when testing such verbs).

Our approach can potentially be extended to multiword paths. DIRT actually provides two orders of magnitude more relations than the 29,165 single-verb relations (subject-verb-object) we extracted. On the same 1.5GB corpus described in Section 5.1, DIRT extracted over 200K paths and 6M unique paraphrases. These provide an opportunity to create a much larger corpus of semantic relations, or to construct smaller, in-depth resources for selected subdomains. For example, we could extract that take a trip to is similar to travel to, and that board a plane happens before deplane.

If the entire database is viewed as a graph, we currently leverage and enforce only local consistency. It would be useful to enforce global consistency: e.g., V1 stronger-than V2 and V2 stronger-than V3 indicate that V1 stronger-than V3, which may be leveraged to identify additional relations or inconsistent ones (e.g. V3 stronger-than V1). Finally, as discussed in Section 5.3, entailment relations may be derivable by processing the complete graph of the identified semantic relations.
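The global-consistency check suggested above can be sketched as a transitive closure over stronger-than edges, flagging cycles as inconsistencies. The graph encoding is our own; the example chain reuses the paper's surprise :: startle :: shock ordering.

```python
# Sketch: saturate stronger-than edges under transitivity and flag
# contradictory pairs (V_a stronger-than V_b and V_b stronger-than V_a).

def close_stronger_than(edges):
    """edges: set of (v1, v2) pairs meaning v1 is stronger than v2.
    Returns (transitive closure, set of inconsistent pairs)."""
    closure = set(edges)
    changed = True
    while changed:                      # naive saturation; fine for a sketch
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    inconsistent = {(a, b) for (a, b) in closure if (b, a) in closure}
    return closure, inconsistent

# shock > startle and startle > surprise imply shock > surprise
closure, bad = close_stronger_than({("shock", "startle"),
                                    ("startle", "surprise")})
print(("shock", "surprise") in closure, bad)  # True set()
```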
7 Conclusions

We have demonstrated that certain fine-grained semantic relations between verbs are present on the Web and are extractable with a simple pattern-based approach. In addition to discovering relations identified in WordNet, such as opposition and enablement, we obtain strong results on strength relations (for which no wide-coverage resource is available). On a set of 29,165 associated verb pairs, experimental results show an accuracy of 65.5% in assigning similarity, strength, antonymy, enablement, and happens-before.

Further work may refine the extraction methods and further process the mined semantics to derive other relations such as entailment. We hope to open the way to inferring implied, but not stated, assertions and to benefit applications such as question answering, information retrieval, and summarization.

Acknowledgments

The authors wish to thank the reviewers for their helpful comments and Google Inc. for supporting high volume querying of their index. This research was partly supported by NSF grant #EIA-0205111.

Appendix. Sample relations extracted by our system.

SEMANTIC RELATION | EXAMPLES
similarity        | maximize :: enhance, produce :: create, reduce :: restrict
strength          | permit :: authorize, surprise :: startle, startle :: shock
enablement        | assess :: review, accomplish :: complete, double-click :: click
antonymy          | assemble :: dismantle, regard :: condemn, roast :: fry
happens-before    | detain :: prosecute, enroll :: graduate, schedule :: reschedule

References

Baker, C.; Fillmore, C.; and Lowe, J. 1998. The Berkeley FrameNet project. In Proceedings of COLING-ACL. Montreal, Canada.

Barker, K. and Szpakowicz, S. 1995. Interactive Semantic Analysis of Clause-Level Relationships. In Proceedings of PACLING '95. Brisbane.

Barwise, J. and Perry, J. 1985. Semantic innocence and uncompromising situations. In Martinich, A. P. (ed.), The Philosophy of Language. New York: Oxford University Press. pp. 401–413.

Barzilay, R.; Elhadad, N.; and McKeown, K. 2002. Inferring strategies for sentence ordering in multidocument summarization. JAIR, 17:35–55.

Cruse, D. 1992. Antonymy Revisited: Some Thoughts on the Relationship between Words and Concepts. In Lehrer, A. and Kittay, E. V. (eds.), Frames, Fields, and Contrasts. Hillsdale, NJ: Lawrence Erlbaum Associates. pp. 289–306.

Etzioni, O.; Cafarella, M.; Downey, D.; Kok, S.; Popescu, A.; Shaked, T.; Soderland, S.; Weld, D.; and Yates, A. 2004. Web-scale information extraction in KnowItAll. To appear in WWW-2004.

Fellbaum, C. 1998. A semantic network of English verbs. In Fellbaum, C. (ed.), WordNet: An Electronic Lexical Database. MIT Press.

Gomez, F. 2001. An Algorithm for Aspects of Semantic Interpretation Using an Enhanced WordNet. In NAACL-2001. CMU, Pittsburgh.

Harris, Z. 1985. Distributional Structure. In Katz, J. J. (ed.), The Philosophy of Linguistics. New York: Oxford University Press. pp. 26–47.

Hearst, M. 1992. Automatic acquisition of hyponyms from large text corpora. In COLING-92. pp. 539–545. Nantes, France.

Keller, F. and Lapata, M. 2003. Using the Web to Obtain Frequencies for Unseen Bigrams. Computational Linguistics, 29:3.

Kingsbury, P.; Palmer, M.; and Marcus, M. 2002. Adding semantic annotation to the Penn TreeBank. In Proceedings of HLT-2002. San Diego, California.

Kipper, K.; Dang, H.; and Palmer, M. 2000. Class-based construction of a verb lexicon. In Proceedings of AAAI-2000. Austin, TX.

Klavans, J. and Kan, M. 1998. Role of verbs in document analysis. In Proceedings of COLING-ACL '98. Montreal, Canada.

Levin, B. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL.

Lin, C-Y. 1997. Robust Automated Topic Identification. Ph.D. Thesis. University of Southern California.

Lin, D. and Pantel, P. 2001. Discovery of inference rules for question answering. Natural Language Engineering, 7(4):343–360.

Lin, D.; Zhao, S.; Qin, L.; and Zhou, M. 2003. Identifying synonyms among distributionally similar words. In Proceedings of IJCAI-03. pp. 1492–1493. Acapulco, Mexico.

Miller, G. 1990. WordNet: An online lexical database. International Journal of Lexicography, 3(4).

Moldovan, D.; Badulescu, A.; Tatu, M.; Antohe, D.; and Girju, R. 2004. Models for the semantic classification of noun phrases. To appear in Proceedings of the HLT/NAACL-2004 Workshop on Computational Lexical Semantics.

Moldovan, D.; Harabagiu, S.; Girju, R.; Morarescu, P.; Lacatusu, F.; Novischi, A.; Badulescu, A.; and Bolohan, O. 2002. LCC tools for question answering. In Notebook of the Eleventh Text REtrieval Conference (TREC-2002). pp. 144–154.

Palmer, M. and Wu, Z. 1995. Verb Semantics for English-Chinese Translation. Machine Translation, 9(4).

Pantel, P. and Ravichandran, D. 2004. Automatically labeling semantic classes. To appear in Proceedings of HLT/NAACL-2004. Boston, MA.

Ravichandran, D. and Hovy, E. 2002. Learning surface text patterns for a question answering system. In Proceedings of ACL-02. Philadelphia, PA.

Richardson, S.; Dolan, W.; and Vanderwende, L. 1998. MindNet: acquiring and structuring semantic information from text. In Proceedings of COLING '98.

Rus, V. 2002. Logic Forms for WordNet Glosses. Ph.D. Thesis. Southern Methodist University.

Schank, R. and Abelson, R. 1977. Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures. Lawrence Erlbaum Associates.

Siegel, S. and Castellan Jr., N. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill.

Turney, P. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of ECML-2001. Freiburg, Germany.

Webber, B.; Gardent, C.; and Bos, J. 2002. Position statement: Inference in question answering. In Proceedings of LREC-2002. Las Palmas, Spain.