Appendix for Chapter 5, Section 5.3.5: Verbs with Narrow Selectional Constraints
To test our method of resolving Broad RefExs using expectations about selectional constraints (Section 5.3.5), we used the dataset below. It includes 202 verbs and typical fillers of a given case-role. The description of the task given to the undergraduate student who carried it out is as follows:

The goal is to prepare for a system evaluation using a "lightweight" text analysis system, not the full OntoSem. We hypothesize that we can automatically resolve instances of the referring expressions it/this/that with relatively high confidence if (a) they fill a case-role that is quite narrowly constrained semantically and (b) the preceding context contains a word that reflects exactly such a meaning. For example, if you see the sentence "It was flown by a madman." and the preceding context contains "plane" or "airplane" or "helicopter", then that's probably what 'it' refers to. Ontologically speaking, that's because the THEME of FLY is some sort of AIRCRAFT. However, for this experiment we're dealing with text strings, NOT ontological concepts. In effect, we're creating a text-string proxy for concept-level constraints by listing the most common strings that could fill the position.

Below are two lists of words. The first contains verbs that can be used in the pattern "It was/got/must be VERBed": e.g., abandon: plan, idea, scheme says that if we find the input "It got abandoned", and the preceding context contains plan, idea, or scheme, we guess that the given word is the referent for it. (Again, we're simplifying to words, not full NPs and so on.) The second list contains verbs that can be used in the pattern "It VERBed". So babble: baby, brook, stream, river means that if we find "It babbled", and the preceding context contains baby, brook, stream, or river, then that string is the referent for it.

Your task is to expand the inventory of strings that can fill the case-roles. We're looking for highly predictive, highly common strings; this is not meant to be an exhaustive listing of possibilities. (In some cases I specifically point out places where things need to be added, but add things wherever you see fit.) You can use several methods of finding more strings:

• Use your own introspection.
• Use WordNet online to find synonyms and hyponyms of what we have.
• Use the BYU corpus online (http://corpus.byu.edu/coca/) to search for contextual usages of the verbs (sign up for a free account to the right when you get in). If you use this method, don't get bogged down in it. Search for strings like "abandoned it" to see what the direct objects of 'abandon' can be, then see whether they work in our configuration of interest: "It was abandoned". You will find A LOT of metaphorical usages in the corpus, which is why you should use this only as a quick check method, not quicksand.

Note that we won't include proper names, though they might be common in some cases: e.g., Jerusalem/Haifa/other cities got bombed. We are also not trying to be comprehensive: if any kind of MAMMAL would be an appropriate filler, you don't need to list them all. We're trying to list what we might find in actual corpora. However, there's no significant extra computational load in listing a lot of reasonable possibilities (e.g., for it barked, any kind of dog will do; so if you find a list of dog breeds that you want to copy in, that's fine).

The amount of time to be spent on this task: 10 hours.
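To make the string-proxy heuristic concrete, the following is a minimal sketch, not the actual OntoSem implementation. It assumes the two verb lists are stored as plain dictionaries keyed on the verb's surface form; the function name resolve_it, the tiny sample lexicon, and the regex matching are illustrative (only the abandon, fly, and babble entries come from the examples above).

import re

# Hypothetical mini-lexicon in the format described above; the real
# dataset contains 202 verbs.
PASSIVE_FILLERS = {          # pattern: "It was/got/must be VERBed"
    "abandoned": ["plan", "idea", "scheme"],
    "flown": ["plane", "airplane", "helicopter"],
}
INTRANSITIVE_FILLERS = {     # pattern: "It VERBed"
    "babbled": ["baby", "brook", "stream", "river"],
}

def resolve_it(sentence: str, preceding_context: str) -> str | None:
    """Guess the referent of 'it' by matching filler strings in context.

    Returns the candidate string mentioned latest in the preceding
    context (i.e., the most recent mention), or None if no filler from
    the verb's list appears there.
    """
    m = re.search(r"\bIt (?:was|got|must be) (\w+)", sentence, re.IGNORECASE)
    if m:
        candidates = PASSIVE_FILLERS.get(m.group(1).lower(), [])
    else:
        m = re.search(r"\bIt (\w+)", sentence, re.IGNORECASE)
        candidates = INTRANSITIVE_FILLERS.get(m.group(1).lower(), []) if m else []

    context = preceding_context.lower()
    # Simple substring check for the sketch; a real implementation
    # would respect token boundaries.
    found = [(context.rfind(c), c) for c in candidates if c in context]
    return max(found)[1] if found else None

# Example from the text: "plane" in the prior context licenses
# resolving 'it' in "It was flown by a madman."
print(resolve_it("It was flown by a madman.",
                 "The plane took off at dawn."))   # -> 'plane'

Choosing the most recently mentioned candidate is one plausible tie-breaking policy; the text above only requires that some filler string from the verb's list occur in the preceding context.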