Analyzing the Impact of Spelling Errors on POS-Tagging and Chunking in Learner English
Tomoya Mizumoto¹ and Ryo Nagata²
¹ RIKEN Center for Advanced Intelligence Project
² Konan University
[email protected], nagata-ijcnlp @hyogo-u.ac.jp

Abstract

Part-of-speech (POS) tagging and chunking have been used in tasks targeting learner English; however, to the best of our knowledge, few studies have evaluated their performance, and no studies have revealed the causes of POS-tagging/chunking errors in detail. Therefore, we investigate their performance and analyze the causes of failure. We focus on spelling errors, which occur frequently in learner English. We demonstrate that spelling errors reduce POS-tagging performance by only 0.23%, and that a spell checker is not necessary for POS-tagging/chunking of learner English.

1 Introduction

Part-of-speech (POS) tagging and chunking have been essential components of Natural Language Processing (NLP) techniques that target learner English, such as grammatical error correction and automated essay scoring. In addition, they are frequently used to extract linguistic features relevant to the given task. For example, in the CoNLL-2014 Shared Task (Ng et al., 2014), 10 of the 12 teams used one or both of POS-tagging and chunking¹ to extract features for grammatical error correction.

They have also been used for linguistic analysis of learner English, particularly in corpus-based studies. Aarts and Granger (1998) explored characteristic POS patterns in learner English. Nagata and Whittaker (2013) demonstrated that POS sequences obtained by POS-tagging can be used to distinguish mother tongue interferences effectively.

The heavy dependence on POS-tagging and chunking suggests that failures could degrade the performance of NLP systems and linguistic analyses (Han et al., 2006; Sukkarieh and Blackmore, 2009). For example, failure to recognize noun phrases in a sentence could lead to failure in correcting related errors in article use and noun number. More importantly, such failures make it more difficult to simply count the numbers of POSs and chunks, thereby causing inaccurate estimates of their distributions. Note that such estimates are often employed in linguistic analysis, including the above-mentioned studies.

Despite their importance in related tasks, few studies have focused on performance evaluations of POS-tagging and chunking. Only a few studies, including Nagata et al. (2011), Berzak et al. (2016), and Sakaguchi et al. (2012), have reported the performance of POS taggers on learner English and found a performance gap between native and learner English. However, none of those studies described the root causes of POS-tagging and chunking errors in detail. Detailed investigations would certainly improve performance, which, in turn, would improve related tasks. Furthermore, to the best of our knowledge, no study has reported chunking performance when applied to learner English.

Unknown words are a major cause of POS-tagging and chunking failures (Manning, 2011). In learner English, spelling errors, which occur frequently, are a major source of unknown words. Spell checkers (e.g., Aspell) are used to correct spelling errors prior to POS-tagging and chunking. However, their effectiveness remains unclear.

Thus, we evaluate the extent to which spelling errors in learner English affect POS-tagging and chunking performance. More precisely, we analyze the performance of POS-tagging/chunking to determine (1) the extent to which performance is reduced due to spelling errors, (2) what types of spelling errors impact performance, and (3) the effect of correcting spelling errors with a spell checker. Our analysis demonstrates that a spell checker is not a required preliminary step for POS-tagging and chunking in NLP analysis of learner English.

¹ It appears that parsing doubles as chunking; however, chunking considers only minimal phrases (non-recursive structures).

2 Performance Analysis of Spelling Errors

Here, we explain how we analyzed POS-tagging/chunking performance relative to spelling errors.

Extent of performance degradation due to spelling errors
Spelling errors occur frequently in learner English. For example, the learner corpus used in Flor et al. (2013) includes 3.4% spelling errors. Thus, assuming that POS-tagging and chunking fail for all unknown words, performance would be reduced by up to 3.4% owing to spelling errors. Realistically, performance does not drop the full 3.4% because POS-taggers and chunkers can infer POSs/chunks from surrounding words. However, it is not clear how often POS-tagging/chunking predicts them correctly; conversely, even when the POS/chunk of a misspelled word can be estimated from its surrounding words, tagging of those surrounding words themselves may fail because of the spelling error. To investigate the extent to which performance is reduced due to spelling errors, we compared the results of POS-tagging and chunking on learner English without correcting spelling errors to results obtained on learner English in which spelling errors were first corrected. In addition, we measured the effect misspelled words had on themselves and on their surrounding words by counting the number of correctly identified POSs/chunks, as in the sketch below.
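This per-position count can be made concrete as follows. This is a minimal sketch, not the authors' code: it assumes predicted and gold tags are token-aligned (which holds here because split and merge errors are excluded), and all function and variable names are illustrative.

```python
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, POS tag) pairs

def neighbor_accuracy(
    predicted: List[Sentence],
    gold: List[Sentence],
    misspelled: List[List[int]],  # per sentence, indices of misspelled tokens
) -> Dict[str, int]:
    """Count correctly tagged tokens at each misspelled position s_i and
    at its neighbors s_{i-1} and s_{i+1} (cf. Table 3)."""
    counts = {"s_i": 0, "s_i-1": 0, "s_i+1": 0}
    for pred_sent, gold_sent, positions in zip(predicted, gold, misspelled):
        for i in positions:
            if pred_sent[i][1] == gold_sent[i][1]:
                counts["s_i"] += 1
            if i > 0 and pred_sent[i - 1][1] == gold_sent[i - 1][1]:
                counts["s_i-1"] += 1
            if i + 1 < len(pred_sent) and pred_sent[i + 1][1] == gold_sent[i + 1][1]:
                counts["s_i+1"] += 1
    return counts
```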
Types of spelling errors
There are various types of spelling errors in learner English (Sakaguchi et al., 2012). The most common type is the typographical error (e.g., *studing/studying). In learner English, other types include homophones (e.g., *see/sea), confusion (e.g., *form/from), splits (e.g., *home town/hometown), merges (e.g., *airconditioner/air conditioner), inflections (e.g., *program/programming), and derivations (e.g., *smell/smelly).² Some spelling errors, such as typographical and merge errors, result in unknown words, whereas others, such as homophone and split errors, result in known words. For unknown words, it is possible to predict POSs/chunks from surrounding words, whereas for known words (e.g., homophone errors), POS-tagging/chunking fails. We use specific examples to investigate which types of spelling errors impact the performance of POS-tagging.

² Note that we do not address split and merge errors.

Some spelling errors contain useful information that helps determine POSs. For example, for the above typographical error (i.e., *studing/studying), it may be possible to predict the corresponding POS as "gerund or present participle verb" based on the suffix "ing." We also consider the effectiveness of prefix and suffix (i.e., affix) information in determining the corresponding POS for misspelled words. For this investigation, we compared POS-tagging systems with and without affix information.

Effects of a spell checker
Some previous studies on grammatical error correction have investigated using a spell checker as a preprocessing step to reduce the negative impact of spelling errors. However, as noted above, little is known about the performance of POS-tagging and chunking for misspelled words and their surrounding words; therefore, the effectiveness of a spell checker as a preprocessing step for POS-tagging and chunking of learner English remains unclear. Spell checkers can correct some errors, particularly unknown-word errors; thus, POS-tagging and chunking have the potential to predict correct tags. We therefore examined the effect a spell checker has on POS-tagging and chunking performance by comparing results obtained with and without a spell checker; a sketch of such preprocessing follows.
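The sketch below shows the kind of dictionary-based preprocessing meant here. It is illustrative only: the paper mentions Aspell, while this example uses the pyenchant library (which can be backed by an Aspell dictionary), and taking the first suggestion is a simplifying assumption rather than the paper's method.

```python
import enchant  # pyenchant; can be configured with an Aspell-derived dictionary

def correct_unknown_words(tokens: list[str], lang: str = "en_US") -> list[str]:
    """Replace dictionary-unknown tokens with the checker's top suggestion.

    Known-word errors (e.g., *see/sea) pass through unchanged, which is
    why a spell checker mainly helps with unknown-word errors."""
    d = enchant.Dict(lang)
    corrected = []
    for tok in tokens:
        if tok.isalpha() and not d.check(tok):
            suggestions = d.suggest(tok)
            corrected.append(suggestions[0] if suggestions else tok)
        else:
            corrected.append(tok)
    return corrected

# Example: correct_unknown_words(["I", "like", "studing", "English"])
# may return ["I", "like", "studying", "English"] if the suggester
# ranks "studying" first; suggestion order is not guaranteed.
```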
3 Experimental Setup

To evaluate the performance of POS-tagging and chunking, we used the Konan-JIEM (KJ) corpus (Nagata et al., 2011), which consists of 3,260 sentences and 30,517 tokens. Note that the essays in the KJ corpus were written by Japanese university students. The number of spelling errors targeted in this paper was 654 (i.e., 2.1% of all words). We used a proprietary dataset comprising English teaching materials for reading comprehension for Japanese students. We annotated this dataset with POS tags and chunks to train a model for POS-tagging and chunking. This corpus consists of 16,375 sentences and 213,017 tokens, and does not contain grammatical errors. We also used sections 0-18 of the Penn Treebank, but only to train the model for POS-tagging.

We formulated POS-tagging and chunking as sequence labeling problems. We used a conditional random field (CRF) (Lafferty et al., 2001) for sequence labeling and CRF++ with default parameters as the CRF tool. The features used for POS-tagging were based on the widely used features employed in Ratnaparkhi (1996). These features consist of the surface form, original form, presence of specific characters (e.g., numbers, uppercase letters, and symbols), and prefix and suffix (i.e., affix) information.

Table 1: Performance of spelling error correction.

  #TP   #FP   #FN   Precision   Recall   F-score
  409   197   120   67.49       77.32    72.07

Table 2: Results of POS-tagging. Accuracies of POS-tagging trained on the Penn Treebank are shown in parentheses.

  Method             Accuracy
  Baseline           93.97 (92.71)
  Base+Aff           95.31 (93.93)
  Base+Checker       94.21 (93.13)
  Base+Aff+Checker   95.37 (94.05)
  Base+Aff+Gold      95.54 (94.16)

Table 3: Results of POS-tagging for misspelled words and their surrounding words. s_i indicates a misspelled word.

  Method          # of s_i correct   # of s_{i-1} correct   # of s_{i+1} correct
  Baseline        344                540                    590
  Base+Aff        465                542                    598
  Base+Aff+Gold   528                547                    596

4 POS-tagging Experiments

We conducted POS-tagging experiments to investigate the questions introduced in Section 2.
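To make the Baseline vs. Base+Aff contrast in Tables 2 and 3 concrete, the following sketch illustrates the kind of per-token feature extraction described in Section 3. It is an illustration under assumptions, not the paper's actual CRF++ templates: the feature names, the maximum affix length of 3, and the use of the lowercased form in place of the paper's "original form" feature are all choices made for this example.

```python
def token_features(tokens: list[str], i: int, use_affix: bool = True) -> dict:
    """Ratnaparkhi-style features for token i. With use_affix=True
    (Base+Aff), a misspelling such as *studing still fires suf3=ing,
    which points toward the gerund/present-participle tag."""
    w = tokens[i]
    feats = {
        "w": w,
        "lower": w.lower(),  # stand-in for the paper's "original form" feature
        "has_digit": any(c.isdigit() for c in w),
        "has_upper": any(c.isupper() for c in w),
        "has_symbol": not w.isalnum(),
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }
    if use_affix:
        for n in (1, 2, 3):  # affix length up to 3 is an assumption of this sketch
            feats[f"pre{n}"] = w[:n]
            feats[f"suf{n}"] = w[-n:]
    return feats

# Baseline and Base+Aff differ only in use_affix:
# token_features(["I", "like", "studing"], 2, use_affix=True)["suf3"] == "ing"
```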