DOCUMENT RESUME

ED 382 928    CS 012 139

AUTHOR  Breland, Hunter M.; And Others
TITLE  The College Board Vocabulary Study. College Board Report No. 94-4.
INSTITUTION  College Entrance Examination Board, New York, N.Y.
REPORT NO  ETS-RR-94-26
PUB DATE  94
NOTE  56p.
AVAILABLE FROM  College Board Publications, Box 886, New York, NY 10101-0886 ($12).
PUB TYPE  Reports - Research/Technical (143)
EDRS PRICE  MF01/PC03 Plus Postage.
DESCRIPTORS  College Freshmen; Higher Education; High Schools; High School Students; Language Patterns; Lexicology; *Reading Materials; Reading Research; *Vocabulary; *Word Frequency
IDENTIFIERS  Words

ABSTRACT
A study provided an up-to-date source of word frequency information based on the kinds of reading materials to which high school and first-year college students are exposed. A corpus of 14,360,884 words was assembled from a comprehensive listing of reading materials from curriculum surveys, state curriculum guides, private school reading lists, research surveys, federal reports, recommended reading lists, and other sources. Included in the sample of reading materials were American and British novels, poetry, drama, essays, biographies, autobiographies, current periodicals, historical documents, and text from an encyclopedia. The following statistics were generated: (1) the overall frequency of occurrence of each word in the corpus; (2) an index of dispersion for each word over 27 text categories; (3) an estimate of the number of occurrences per one million words of running text for each word that would be expected in a similar but different corpus; and (4) a standard frequency index developed from a logarithmic transformation. (Contains 43 references and 7 tables of data. Appendixes present a list of materials surveyed, and a list of materials sampled for word count.) (RS)

Reproductions supplied by EDRS are the best that can be made from the original document.
College Board Report No. 94-4
ETS RR No. 94-26

The College Board Vocabulary Study

HUNTER M. BRELAND, ROBERT J. JONES, and LAURA JENKINS
with the assistance of Marion Paynter, Judith Pollack, and Y. Fai Fong

College Entrance Examination Board, New York, 1994

Acknowledgments

The authors are indebted to a number of people who served as consultants and advisers during the course of the project. Since the project was in some ways similar to one completed in 1971 by John B. Carroll when he was with Educational Testing Service (ETS), we contacted him at the University of North Carolina in the early stages of the project. He provided much useful information, including extensive FORTRAN programs he had developed over the years that were not available from any other source. He was always willing to take the time to answer questions. ETS colleague Isaac Bejar and Roger Chaffin of Trenton State College also gave early advice on the project.

We also consulted with Louis T. Milic of Cleveland State University, author of the Augustan Prose Sample and well versed in the intricacies of corpus development and language analysis. He introduced us to two associations that proved to be especially valuable sources of information, the Association for Literary and Linguistic Computing and the Association for Computers in the Humanities. Randall Jones of Brigham Young University, secretary of the Association for Computers in the Humanities, helped immensely in obtaining text in electronic form. He advised us on the use of the WORDCRUNCHER text retrieval system and put us in contact with his associates at the Electronic Text Corporation, which markets books and other materials in electronic form.

The Oxford Text Archive was also a critical contributor. At the time the project began, it was the most important source of text in electronic form. Probably one-third of the text we were able to obtain in electronic form came from the Archive. Judith Proud and Lou Burnard of the Archive were especially helpful.

Richard Venezky of the University of Delaware helped orient us to the esoteric world of lexicography and advised on a number of related issues, including sources of electronic text. Some hard-to-find works of William Faulkner were obtained from the U.S. Military Academy at West Point. Touchstone Applied Science Associates supplied word counts for numerous textbooks and other difficult to obtain titles.

Finally, we are indebted to staff at ETS for assistance and timely expertise in several areas. Marion Paynter, formerly associate librarian, connected us with DIALOG Information Services and with the NEXIS service of Mead Data Central. These two services provided most of the periodical text. She also gave us access to the Rutgers Inventory of Machine-Readable Text, and she introduced us to the world of CD-ROM from which we obtained encyclopedia text. Judith Pollack, veteran data analyst, wrote the program that combined all the text word counts into the final listing and computed word frequency statistics. Laura Jenkins developed the data base and conducted the data analyses. Y. Fai Fong designed the original format for the alphabetical word list. Peggy Fisher helped assemble some of the early book lists.

Hunter M. Breland is a senior research scientist at ETS.
Robert J. Jones was formerly a senior examiner at ETS.
Laura Jenkins is a principal research data analyst at ETS.
Marion Paynter was formerly associate librarian at ETS.
Judith Pollack is an advanced research systems specialist.
Y. Fai Fong was formerly associate research data analyst.

Researchers are encouraged to freely express their professional judgment. Therefore, points of view or opinions stated in College Board Reports do not necessarily represent official College Board position or policy.

The College Board is a national nonprofit association that champions educational excellence for all students through the ongoing collaboration of more than 2,900 member schools, colleges, universities, education systems, and organizations. The Board promotes, by means of responsive forums, research, programs, and policy development, universal access to high standards of learning, equity of opportunity, and sufficient financial support so that every student is prepared for success in college and work.

Ordering Information

Additional copies of this report may be obtained from College Board Publications, Box 886, New York, New York 10101-0886. The price is $12.

The data base used in this study is available on request on 3.5" IBM- or Mac-formatted disks. If you would like to order copies of the data base on disk, please write Dr. Howard Everson, Research and Development, The College Board, 45 Columbus Avenue, New York, NY 10023-6992.

Copyright © 1994 by College Entrance Examination Board. All rights reserved. College Board, SAT, AP, CLEP, and the acorn logo are registered trademarks of the College Entrance Examination Board.

Printed in the United States of America.

Contents

Abstract
Introduction
Purpose of the Study
Vocabulary Acquisition
Word Frequency
Text Sampling Procedures
Compilation of Works
Leading Authors
Leading Works
Other Materials Sampled
Sampled Text by Categories
Computational Procedures
Analysis of the College Board Corpus
Comparisons with Subjective Estimates of Word Frequency
Comparisons with Word Difficulty Estimates
Comparisons of Selected Words
References
Appendix A: Materials Surveyed for the College Board Vocabulary Study
Appendix B: Materials Sampled for Word Count

Tables

1. Leading Authors
2. Leading Works
3. Textbooks, Periodicals, and Other Materials Sampled
4. Text by Categories
5. Comparisons between Subjective Estimates of Word Frequencies and Objective Estimates Based on Word Counts
6. Comparisons of Dale and O'Rourke Word Difficulties and U Values for Four Corpora
7. Comparisons of BWVT Word Difficulties and U Values for Four Corpora
8. Comparisons of Selected Words in Different Corpora

Abstract

This study was conducted to provide an up-to-date source of word frequency information based on the kinds of reading materials to which high school and first-year college students are exposed. It began with a comprehensive listing of reading materials from curriculum surveys, state curriculum guides, private school

The corpus described in this report came about because of deficiencies in previous corpora. They were out of date, not large enough, focused on younger age groups, or were not representative of English as studied in U.S. high schools. Because of such deficiencies, there was a reluctance to rely on the information derived from these corpora. The present corpus is not perfect, but it attempts to address some of the problems of previous efforts. In order to develop a corpus of sufficient