The Percentage of Words Known in a Text and Reading Comprehension
NORBERT SCHMITT
University of Nottingham
Nottingham, United Kingdom
Email: norbert.schmitt@nottingham.ac.uk

XIANGYING JIANG
West Virginia University
Morgantown, WV
Email: xiangying.jiang@mail.wvu.edu

WILLIAM GRABE
Northern Arizona University
Flagstaff, AZ
Email: [email protected]

This study focused on the relationship between the percentage of vocabulary known in a text and the level of comprehension of the same text. Earlier studies have estimated the percentage of vocabulary necessary for second language learners to understand written texts as being between 95% (Laufer, 1989) and 98% (Hu & Nation, 2000). In this study, 661 participants from 8 countries completed a vocabulary measure based on words drawn from 2 texts, read the texts, and then completed a reading comprehension test for each text. The results revealed a relatively linear relationship between the percentage of vocabulary known and the degree of reading comprehension. There was no indication of a vocabulary "threshold," where comprehension increased dramatically at a particular percentage of vocabulary knowledge. Results suggest that the 98% estimate is a more reasonable coverage target for readers of academic texts.

IN A RECENT ARTICLE, NATION (2006) CONCLUDED that much more vocabulary is required to read authentic texts than had previously been thought. Whereas earlier research suggested that around 3,000 word families provided the lexical resources to read authentic materials independently (Laufer, 1992), Nation argues that in fact 8,000–9,000 word families are necessary. The key factor in these widely varying estimates is the percentage of vocabulary in a text that one needs to know to comprehend it. An earlier study (Laufer, 1989) came to the conclusion that around 95% coverage was sufficient for this purpose. However, Hu and Nation (2000) reported that their participants needed to know 98%–99% of the words in texts before adequate comprehension was possible. Nation used the updated figure of 98% in his analysis, which led to the 8,000–9,000 vocabulary figure.

As reading is a crucial aid in learning a second language (L2), it is necessary to ensure that learners have sufficient vocabulary to read well (Grabe, 2009; Hudson, 2007; Koda, 2005). However, there is a very large difference between learning 3,000 and 9,000 word families, and this has massive implications for teaching methodology. When the instructional implications of vocabulary size hinge so directly on the coverage figure, it is important to better establish the relationship between vocabulary coverage and reading comprehension. Common sense dictates that more vocabulary is better, and there is probably no single coverage figure, for example 98%, above which good comprehension occurs and short of which one understands little. Indeed, both Laufer (1989, 1992) and Hu and Nation (2000) found increasing comprehension with increasing vocabulary coverage. This suggests that there is a coverage/comprehension "curve," indicating that more coverage is generally better, although the curve may or may not be linear. This study builds on Laufer's and Hu and Nation's earlier studies and uses an enhanced research methodology to describe this curve from a relatively low vocabulary coverage of 90% up to knowledge of 100% of the words in a text.
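To make the coverage figures concrete, the following minimal sketch computes lexical coverage as the proportion of running words (tokens) in a text that a reader knows, and converts a coverage figure into the density of unknown words. It assumes pre-tokenized text and a set of known lowercase word forms; the function names are hypothetical, and this is an illustration of the arithmetic, not the measurement procedure used in any of the studies discussed here.

```python
def coverage(tokens, known):
    """Return the proportion of tokens (running words) found in `known`.

    Assumes `known` holds lowercase word forms; each token is lowercased
    before lookup, and a repeated token is counted every time it occurs.
    """
    if not tokens:
        raise ValueError("text has no tokens")
    known_count = sum(1 for token in tokens if token.lower() in known)
    return known_count / len(tokens)


def running_words_per_unknown(cov):
    """Average number of running words per unknown word at coverage `cov`."""
    if not 0 <= cov < 1:
        raise ValueError("coverage must be at least 0 and below 1")
    return 1 / (1 - cov)


def unknown_tokens(text_length, cov):
    """Expected number of unknown tokens in a text of `text_length` words."""
    return round(text_length * (1 - cov))


if __name__ == "__main__":
    # Compare coverage levels within the 90%-100% range examined in the study.
    for cov in (0.90, 0.95, 0.98):
        print(f"{cov:.0%} coverage: about 1 unknown word in every "
              f"{running_words_per_unknown(cov):.0f} words, "
              f"{unknown_tokens(1000, cov)} unknown in a 1,000-word text")
```

Run as a script, this reports about 1 unknown word in every 20 at 95% coverage and about 1 in every 50 at 98% coverage (50 versus 20 unknown words in a hypothetical 1,000-word text), which illustrates how much denser unknown words are at the lower estimate.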
The Modern Language Journal, 95, i, (2011) DOI: 10.1111/j.1540-4781.2011.01146.x 0026-7902/11/26–43 $1.50/0 © 2011 The Modern Language Journal

BACKGROUND

Reading is widely recognized as one of the most important skills for academic success, in both first language (L1) and L2 environments (Johns, 1981; National Commission on Excellence in Education, 1983; Rosenfeld, Leung, & Oltman, 2001; Sherwood, 1977; Snow, Burns, & Griffin, 1998). In many cases, L2 reading represents the primary way that students can learn on their own beyond the classroom. Research has identified multiple component skills and knowledge resources as important contributors to reading abilities (Bowey, 2005; Grabe, 2004; Koda, 2005; Nassaji, 2003; Perfetti, Landi, & Oakhill, 2005). However, one of the primary factors consistently shown to affect reading is knowledge of the words in the text. In general, research is increasingly demonstrating what practitioners have always known: that it takes a lot of vocabulary to use a language well (for more on this, see Nation, 2006; Schmitt, 2008). This is particularly true for reading. Vocabulary knowledge and reading performance typically correlate strongly: .50–.75 (Laufer, 1992); .78–.82 (Qian, 1999); .73–.77 (Qian, 2002). Early research estimated that it took 3,000 word families (Laufer, 1992) or 5,000 individual words (Hirsh & Nation, 1992) to read texts. Similarly, Laufer (1989) came up with an estimate of 5,000 words. More recent estimates are considerably higher, in the range of 8,000–9,000 word families (Nation, 2006).

These higher figures are daunting, but even so, they probably underestimate the lexis required. Each word family includes several individual word forms: the root form (e.g., inform), its inflections (informed, informing, informs), and its regular derivations (information, informative). Nation's (2006) British National Corpus lists show that the most frequent 1,000 word families average about six members (types per family), decreasing to about three members per family at the 9,000 frequency level. According to his calculations, a vocabulary of 8,000 word families (enabling wide reading) entails knowing 34,660 individual word forms, although some of these family members are low-frequency items. The upshot is that students must learn a large number of individual word forms to be able to read a variety of texts in English, especially when one considers that the figures above do not take into account the multitude of phrasal lexical items that have been shown to be extremely widespread in language use (e.g., Grabe, 2009; Schmitt, 2004; Wray, 2002).

Unfortunately, most students do not learn this much vocabulary. Laufer (2000) reviewed a number of vocabulary studies from eight different countries and found that the vocabulary size of high school/university English-as-a-second-language and English-as-a-foreign-language (EFL) learners ranged from 1,000–4,000.¹ Whereas a 3,000–5,000 word family reading target may seem attainable for learners with hard work, the 8,000–9,000 target might appear so unachievable that teachers and learners may well conclude it is not worth attempting. Thus, the lexical size target is a key pedagogical issue, and one might ask why the various estimates are so different.

The answer rests in the relationship between vocabulary knowledge and reading comprehension. In a text, readers inevitably come across words they do not know, which affects their comprehension. This is especially true of L2 learners with smaller vocabularies. Thus, the essential question is how much unknown vocabulary learners can tolerate and still understand a text. Or we can look at the issue from the converse perspective: What percentage of lexical items in a text do learners need to know in order to successfully derive meaning from it?

Laufer (1989) explored how much vocabulary is necessary to achieve a score of 55% on a reading comprehension test. This percentage was the lowest passing mark in the Haifa University system, even though earlier research suggested that 65%–70% was the minimum needed to comprehend the English on the Cambridge First Certificate in English examination (Laufer & Sim, 1985). She asked learners to underline the words they did not know in a text, and then adjusted this figure on the basis of the results of a translation test. From this she calculated the percentage of vocabulary in the text that each learner knew. She found that 95% was the point that best distinguished learners who achieved 55% on the reading comprehension test from those who did not. Using the 95% figure, Laufer drew on Ostyn and Godin's (1985) research and concluded that approximately 5,000 words would supply this vocabulary coverage. Although this was a good first attempt to specify the vocabulary requirements for reading, it has a number of limitations (see Nation, 2001, pp. 144–148 for a complete critique). Ostyn and Godin's frequency counts are of Dutch, and it is not clear that they can be applied directly to English. Their methodology also mixed academic texts and newspaper clippings, but different genres can have different frequency profiles (see Nation, 2001, Table 1.7). Perhaps most importantly, the comprehension criterion of 55% seems very modest, and most language users would probably hope for better understanding than this. Nevertheless, the 95% coverage figure and the related 3,000–5,000 vocabulary size figure were widely cited.

A decade later, Hu and Nation (2000) compared reading comprehension of fiction texts at 80%, 90%, 95%, and 100% vocabulary coverage. Sixty-six students studying on a pre-university course were divided into four groups of 16–17 participants. Each group read a 673-word story at one of the aforementioned coverage levels. They then completed multiple-choice (MC) and cued written recall (WR) comprehension tests.

[…] equate comprehension (establishing 12 correct answers out of 14 on an MC test and 70 out of 124 on a WR test) and then determined whether learners at the four coverage points reached these criteria. They concluded that 98% coverage was the level at which this was likely to happen, although their study also clearly showed increasing comprehension with increasing vocabulary: 80% coverage = 6.06 MC and 24.60 WR; 90% = 9.50, 51.31;