Semantic Pleonasm Detection
Omid Kashefi, Andrew T. Lucas, Rebecca Hwa
School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260
{kashefi, andrew.lucas, hwa}@cs.pitt.edu

Abstract

Pleonasms are words that are redundant. To aid the development of systems that detect pleonasms in text, we introduce an annotated corpus of semantic pleonasms. We validate the integrity of the corpus with inter-annotator agreement analyses. We also compare it against alternative resources in terms of their effects on several automatic redundancy detection methods.

1 Introduction

Pleonasm is the use of extraneous words in an expression such that removing them would not significantly alter the meaning of the expression (Merriam-Webster, 1983; Quinn, 1993; Lehmann, 2005). Although pleonastic phrases may serve literary functions (e.g., to add emphasis) (Miller, 1951; Chernov, 1979), most modern writing style guides caution against them in favor of concise writing (Hart et al., 1905; Williams, 2003; Turabian, 2013; Gowers, 2014; Strunk, 1920).

An automatic pleonasm detector would be beneficial for natural language processing (NLP) applications that support student writing, such as grammar error correction (GEC) (Han et al., 2006; Rozovskaya and Roth, 2010; Tetreault et al., 2010; Dahlmeier and Ng, 2011), automatic essay grading (Larkey, 1998; Landauer, 2003; Ong et al., 2014), and intelligent writing tutors (Merrill et al., 1992; Aleven et al., 2009; Atkinson, 2016). Pleonastic phrases may also negatively impact NLP applications in general because they introduce unnecessary complexity to the language; their removal might facilitate NLP tasks such as parsing, summarization, and machine translation. However, automated pleonasm detection is a challenging problem, in part because there are no appropriate resources to support the development of such systems. While some GEC corpora do annotate some words or phrases as "redundant" or "unnecessary," these are typically a manifestation of grammar errors (e.g., "we still have room to improve our current for welfare system") rather than a stylistic redundancy (e.g., "we aim to better improve our welfare system").

This paper presents a new Semantic Pleonasm Corpus (SPC), a collection of three thousand sentences. Each sentence features a pair of potentially semantically related words (chosen by a heuristic); human annotators determine whether either (or both) of the words is redundant. The corpus offers two improvements over current resources. First, the corpus filters for grammatical sentences so that the question of redundancy is separated from grammaticality. Second, the corpus is filtered for a balanced set of positive and negative (i.e., no redundancy) examples. The negative examples may make useful benchmark data: because they all contain a pair of words that are deemed to be semantically related, a successful system cannot rely on simple heuristics, such as semantic distances, for discrimination. We evaluate the corpus in terms of inter-annotator agreement and in terms of its usefulness for developing automatic pleonasm detectors.

2 Semantic Pleonasm

Although pleonasm is generally a semantic and rhetorical concept, it can have different aspects and be formed in different layers of language, including the morphemic layer (e.g., "irregardless" (Berube, 1985)) and the syntactic layer (e.g., "the most unkindest cut of all"). Detecting and correcting morphemic and syntactic pleonasms fall more within the scope of GEC research, especially when they cause errors. Semantic pleonasm, on the other hand, is "a question of style or taste, not grammar" (Evans and Evans, 1957). It occurs when the meaning of a word (or phrase) is already implied by other words in the sentence. For example, the following is a grammatical sentence that contains a redundant word: "I received a free gift." While writers might intentionally include the redundant word for emphasis, the overuse of pleonasm may weaken the expression, making it "boring rather than striking the hearer" (Fowler, 1994).

3 A Semantic Pleonasm Corpus

Semantic pleonasm is a complex linguistic phenomenon; to develop a useful corpus for it, we need to make some design decisions, trading off between the breadth and depth of our coverage.

3.1 Data Source

We want to start from a source that is likely to contain semantic redundancies. Because good writers are trained to guard against redundant phrasings, professionally written text from Project Gutenberg or the Wall Street Journal would not be appropriate. Because we want to separate the issues of grammaticality from redundancy, learner corpora would also not be appropriate. A data source that seems promising is amateur product reviews: the writers tend to produce more emotional prose that is at times exasperated or gushing, and the writing is more off-the-cuff and casual, so it may contain more redundancy. Ultimately, we chose to work with restaurant reviews from Round Seven of the Yelp Dataset Challenge (https://www.yelp.com/dataset/challenge) because it is widely distributed.

3.2 Filtering

Although redundant words and phrases occur frequently enough that exhortations to excise them are a constant refrain in writing guides, most sentences still skew toward not containing pleonasms. Annotating all sentences would dilute the impact of the positive examples, further complicate the annotation scheme, and increase the cost of corpus creation. Thus, we opt to construct a balanced corpus of positive and negative examples for a specific kind of redundancy in a specific configuration. In particular, we extract all sentences that contain a pair of adjacent words that are likely to be semantically similar. We restrict our attention to adjacent word pairs to increase the chance of finding redundancy, since semantically related words that are farther apart are more likely to have different syntactic and semantic roles. To determine semantic similarity, we use the TextBlob Python interface (http://textblob.readthedocs.io/en/dev/), which, for a given word, provides access to the WordNet synsets (Miller, 1995) corresponding to each of the word's senses. We compare each pair of adjacent words in the dataset to see whether they share any synsets. Since WordNet serves only as a coarse filter, we take a further step to improve recall: we also select any sentence that contains a pair of adjacent words such that one of the words has a synset that is similar to a synset of the other word. TextBlob provides this "similar to" functionality, which finds synsets that are close to a given synset in WordNet's taxonomy tree (note, however, that these words may not be used in those senses in the sentence). Applying these filtering rules, we are able to eliminate a large percentage of sentences that do not contain semantic redundancy; the method also helps us identify the pair of words in each sentence that is most likely to constitute a redundancy. In the second step of filtering, we manually removed sentences that contained obvious grammatical mistakes.
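As a concrete illustration of this filtering heuristic, the sketch below checks adjacent word pairs for shared or "similar to" WordNet synsets through TextBlob. It is a minimal sketch, not the authors' implementation: the tokenization, the lowercasing, and the use of WordNet's similar_tos relation for the "similar to" test are our assumptions.

# Requires: pip install textblob, then run `python -m textblob.download_corpora` once.
from textblob import TextBlob, Word

def related_adjacent_pair(sentence):
    """Return the first adjacent word pair whose WordNet synsets overlap
    or are 'similar to' each other; return None if no such pair exists."""
    words = TextBlob(sentence).words
    for first, second in zip(words, words[1:]):
        syns1 = set(Word(first.lower()).synsets)
        syns2 = set(Word(second.lower()).synsets)
        # Rule 1: the two words share at least one synset.
        if syns1 & syns2:
            return first, second
        # Rule 2: a synset of one word is "similar to" a synset of the other
        # (WordNet's similar_tos relation, populated mainly for adjectives).
        similar1 = {s for syn in syns1 for s in syn.similar_tos()}
        similar2 = {s for syn in syns2 for s in syn.similar_tos()}
        if (similar1 & syns2) or (similar2 & syns1):
            return first, second
    return None

if __name__ == "__main__":
    # Sentences for which no candidate pair is found would be dropped from the pool.
    print(related_adjacent_pair("They served a big large pizza."))
    print(related_adjacent_pair("The soup arrived quickly."))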
3.3 Annotation

We set up an Amazon Mechanical Turk task to determine whether the potentially redundant word pairs are actually redundant. Because we want to build a balanced corpus, we first perform a quick internal pass, marking each sentence as either "possibly containing redundancy" or "probably not containing redundancy," so that we can distribute the two kinds of instances to the Turkers with equal probability (the Turkers do not see our internal annotations). The Turkers are given six sentences at a time, each containing a highlighted pair of words. The workers have to decide whether to delete the first word, the second word, both, or neither. They then indicate their confidence: "Certain," "Somewhat certain," or "Uncertain." Lastly, they are given the opportunity to provide additional explanations. Each sentence has been reviewed by three different workers. For about ninety percent of the sentences, three annotations proved sufficient to achieve a consensus. We collect a fourth annotation for the remaining sentences and are then able to declare a consensus.

In only a small fraction (1%) of the sentences, both words are marked as redundant. Table 2 shows the statistics of the annotators' consensus. The corpus, including all annotations and the final consensus, is available in JSON format from http://pleonasm.cs.pitt.edu.

Table 2: Statistics of the Semantic Pleonasm Corpus

  Consensus                      Count    Percent
  First word redundant             955        32%
  Second word redundant            765        25%
  Both words redundant              16         1%
  Neither word redundant         1,283        42%
  Total                          3,019       100%

  (One word redundant, first or second: 1,720 sentences, 57%)

A Few Examples

  Sentence: Freshly squeezed and no additives, just plain pure fruit pulp.
  Consensus: "plain" is redundant.

  Sentence: It is clear that I will never have another prime first experience like the one I had at Chompies.
  Consensus: neither word is redundant.

  Sentence: The dressing is absolutely incredibly fabulously flavorful!
  Consensus: both words are redundant.
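To make the consensus procedure above concrete, the sketch below aggregates the workers' choices (delete the first word, the second word, both, or neither) and reports whether the current annotations already single out one label; sentences that do not are the ones sent for a fourth annotation. This is a minimal sketch under our own assumptions: the paper does not spell out the exact tie-breaking rule, so "strictly more votes than any other label" is used here for illustration.

from collections import Counter

LABELS = {"first", "second", "both", "neither"}  # which word(s) to delete

def consensus(annotations):
    """Return (label, resolved): resolved is False when the sentence still
    needs another annotation before a consensus can be declared."""
    assert all(label in LABELS for label in annotations)
    ranked = Counter(annotations).most_common()
    top_label, top_count = ranked[0]
    # Consensus: the most frequent label strictly beats the runner-up.
    if len(ranked) == 1 or top_count > ranked[1][1]:
        return top_label, True
    return top_label, False

if __name__ == "__main__":
    print(consensus(["first", "first", "neither"]))            # resolved with 3 annotations
    print(consensus(["first", "second", "neither"]))           # unresolved, needs a 4th
    print(consensus(["first", "second", "neither", "first"]))  # resolved with 4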
3.4 Inter-Annotator Agreement

Because our corpus is annotated by many Turkers, with some labeling only a handful of sentences while others contributed hundreds, the typical pair-wise inter-annotator agreement measure is not appropriate. Instead, we compute Fleiss's Kappa (Fleiss, 1971), which measures the degree of agreement in classification over what would be expected by chance for more than two annotators.

Table 1: Inter-Annotator Agreement

  Level              Fleiss's Kappa
  Word Level                  0.384
  Sentence Level              0.482

4 Automatic Pleonasm Detection

Given our design choices, the current SPC is not a large corpus; we posit that it can nonetheless serve as a valuable resource for developing systems to detect semantic pleonasm. For example, the earlier work of Xue and Hwa (2014) might have benefited from this resource. They wanted to detect the word in a sentence that contributes the least to the meaning of the sentence; however, their experiments were hampered by a mismatch between their intended domain and the corpus they evaluated on: while their model estimated a word's semantic redundancy, their experiments were performed on NUCLE (Dahlmeier et al., 2013), a