Assessment of an Index for Measuring Pronunciation Difficulty
Total Page:16
File Type:pdf, Size:1020Kb
Assessment of an Index for Measuring Pronunciation Difficulty Katsunori Kotani Takehiko Yoshimi Kansai Gaidai University Ryukoku University [email protected] [email protected] Abstract pronounceability) has not been examined exten- sively. Given that readability and the difficulty of This study assesses an index for measuring listening influence learners’ motivation and out- the pronunciation difficulty of sentences comes (Hwang 2005, Lai 2015, Yoon et al. 2016), (henceforth, pronounceability) based on we consider that CALL for pronunciation learning the normalized edit distance from a refer- ence sentence to a transcription of learn- should consider pronounceability in evaluating ers’ pronunciation. Pronounceability learners’ pronunciation. should be examined when language teach- Pronounceability can be represented as the ers use a computer-assisted language phonetic edit distance from reference pronuncia- learning system for pronunciation learning tion to a learner’s expected pronunciation based to maintain the motivation of learners. on the proficiency. Phonetic edit distance can be However, unlike the evaluation of learners’ measured using a modified version of the Le- pronunciation performance, previous re- venshtein edit distance (Wieling et al. 2014) or a search did not focus on pronounceability deep-neural-network-based classifier (Li et al. not only for English but also for Asian 2016). languages. This study found that the nor- malized edit distance was reliable but not This study measured normalized edit distance valid. The lack of validity appeared to be (NED) using the orthographical transcription of because of an English test used for deter- learners’ pronunciation of reference sentences. An mining the proficiency of learners. advantage of the NED based on orthographic tran- scription is the availability of data. This is because 1 Introduction language teachers can obtain orthographical tran- scription without being trained for phonetic tran- Research on computer-assisted language learning scription. (CALL) has been carried out for learning the pro- This study measures pronounceability using nunciation of European languages as a foreign multiple regression analysis considering ortho- language such as English (Witt & Young 2002, graphic NED as a dependent variable and the fea- Mak et al. 2004, Ai & Xu 2015, Liu & Hung tures of a sentence and a learner as independent 2016) and Swedish (Koniaris 2014). CALL re- variables. First, a corpus for multiple regression search on Asian languages has considered Japa- analysis is developed. This corpus includes the da- nese as a foreign language (Hirata 2004) and ta for NED and the proficiency data in a score- Chinese as a foreign language (Zhao et al. 2012). based scale of Test of English for International The primary goal of CALL systems for the learn- Communication (TOEIC). TOEIC is a widely ing of foreign language pronunciation is to re- used English test in Asian countries, and its test solve interference from the first language of score ranges from 10 to 990. In previous research learners. For instance, a CALL system can ana- (Grahma et al. 2015, Delais-Roussarie 2015, Gósy lyze the speech in which a learner reads English et al. 2015), proficiency was demonstrated using a sentences aloud and presents pronunciation errors point-scale such as the Common European that a learner must read aloud again for reducing Framework of Reference for Languages (six lev- the errors. els from A1 to C2). Even though the methods of evaluating learn- This study assessed our phonetic learner corpus ers’ pronunciation performance have received data by answering the following research ques- considerable attention in previous research, the tions: pronunciation difficulty of sentences (henceforth, 119 Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications, pages 119–124 Melbourne, Australia, July 19, 2018. c 2018 Association for Computational Linguistics How stable is NED as a pronounceability 2.2 Annotation of Pronunciation Data index? Our phonetic learner corpus includes NED, the To what extent does NED classify learn- linguistic features of sentences, and learner fea- ers depending on their proficiency? tures. NED was derived as the Levenshtein edit dis- How strongly does NED correlate with a tance normalized by sentence length. It reflected learner’s proficiency? the differences from the reference sentences to the transcription of learners’ pronunciation due to the How accurately is NED measurable substitution, deletion, or insertion of letters. Be- based on linguistic and learner features fore measuring the edit distance, symbols such as for pronounceability measurement? commas and periods were deleted and expressions were uncapitalized in the transcription and refer- 2 Compilation of Phonetic Learner ence data. Corpus The pronunciation was manually transcribed by a transcriber who was a native speaker of English 2.1 Collection of Pronunciation Data and trained to replicate interviews and meetings Our phonetic learner corpus was compiled by re- but was unaccustomed to the English spoken by cording pronunciation data for English texts that learners. The transcriber examined the texts before learners read aloud sentence by sentence. In addi- starting the transcription task. The transcriber was tion, after reading a sentence aloud, learners sub- required to replicate learners’ pronunciation with- jectively determined the pronounceability of sen- out adding, deleting, and substituting any expres- tences on a five-point Likert scale (1: easy; 2: sions for improving grammaticality and/or accept- somewhat easy; 3: average; 4: somewhat difficult; ability (except the addition of symbols such as 5: difficult) (henceforth, SBJ). commas and periods). The texts for reading aloud (the title of Text I is Linguistic features were automatically derived the North Wind and the Sun and that of Text II is from a sentence as follows: Sentence length was the Boy who Cried Wolf) were selected from the derived as the number of words in a sentence. texts distributed by the International Phonetic As- Word length was derived as the number of sylla- sociation (International Phonetic Association bles in a word. The number of multiple-syllable 1999). Even though these texts contain only 15 words in a sentence were derived by calculating sentences, they cover the basic sounds of English ∑ (푆 −1), where n was the number of words (International Phonetic Association 1999, Deterd- in a sentence, and Si was the number of syllables ing 2006). This enables us to analyze which types in the i-th word (Fang 1966). This derivation elim- of English sounds influence learners’ pronuncia- inated the presence of single-syllable words. Word tion. Deterding (2006) reported that Text I failed difficulty was derived as the rate of words not to cover certain sounds, such as initial and medial listed in a basic vocabulary list (Kiyokawa 1990) /z/ and syllable-initial /θ/, and then developed ma- relative to the total number of words in a sentence. terial that covered the English pronunciation for Table 1 summarizes the linguistic features of the these sounds by rewriting a well-known fable by texts that learners read aloud, i.e., text length and Aesop (Text II). The corpus data were compiled from 50 learn- Text I Text II ers of English as a foreign language at university Text length (sentences) 5 10 (28 males, 22 females; mean age: 20.8 years Text length (words) 113 216 (standard deviation, SD, 1.3)). The learners were Sentence length 22.6 (8.3) 21.6 (7.6) compensated for their participation. In our sample, (words) Word length 1.3 (0.1) 1.2 (0.1) the mean TOEIC score was 607.7 (SD, 186.2). (syllables) The minimum and maximum scores were 295 and Multiple syllable word 6.4 (2.8) 5.7 (3.0) 900, respectively. (syllables) Word difficulty 0.3 (0.1) 0.2 (0.1) Table 1: Linguistic features of the texts that learners read aloud. 120 the mean (standard deviation, SD) values of sen- appear to overvalue pronounceability. On the con- tence length, word length, multiple-syllable trary, if NED fails to explain pronounceability, words, and word difficulty. learners appear to undervalue pronounceability. Learner features were determined using the This provides a solution for the improvement of scores of TOEIC for the current or previous year. NED. Even though TOEIC consists of listening and reading tests, it is strongly correlated with the 4 Assessment of NED as a Pronouncea- Language Proficiency Interview, which is a well- bility Index established direct assessment of oral language proficiency developed by the Foreign Service In- In Sections 4.1, 4.2, and 4.3, research questions 1– stitute of the U.S. Department of State (Chauncey 3 are assessed using the classical test theory Group International 1998). (Brown 1996). The fourth question is answered in Section 4.4. 3 Properties of Phonetic Learner Cor- 4.1 Reliability of NED pus The reliability of NED was examined through in- Our phonetic learner corpus was compiled using ternal consistency in terms of Cronbach’s α the method described in Section 2, and this corpus (Cronbach 1970). Internal consistency refers to included 750 instances (15 sentences read aloud whether NED demonstrates similar results for sen- by 50 learners). Table 2 shows the descriptive sta- tences with similar pronounceability. Cronbach’s tistics for NED and SBJ in the phonetic learner α is a reliability coefficient defined by the follow- corpus. ing equation: