A Computer Science Academic Vocabulary List
Portland State University
PDXScholar
Dissertations and Theses
7-27-2020

A Computer Science Academic Vocabulary List
David Roesler
Portland State University

Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds
Part of the Applied Linguistics Commons, and the Computer Sciences Commons

Recommended Citation
Roesler, David, "A Computer Science Academic Vocabulary List" (2020). Dissertations and Theses. Paper 5540.
https://doi.org/10.15760/etd.7414

This Thesis is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected].

A Computer Science Academic Vocabulary List
by
David Roesler

A thesis submitted in partial fulfillment of the requirements for the degree of
Master of Arts in Teaching English to Speakers of Other Languages

Thesis Committee:
Lynn Santelmann, Chair
Susan Conrad
Alissa Hartig

Portland State University
2020

A COMPUTER SCIENCE ACADEMIC VOCABULARY LIST

Abstract

This thesis documents the development of the Computer Science Academic Vocabulary List (CSAVL), a pedagogical tool intended for use by English-for-specific-purpose educators and material developers. A 3.5-million-word corpus of academic computer science textbooks and journal articles was developed in order to produce the CSAVL. This study draws on the improved methodologies used in the creation of recent lemma-based word lists such as the Academic Vocabulary List (AVL) (Gardner & Davies, 2014) and the Medical Academic Vocabulary List (MAVL) (Lei & Liu, 2016), which take into account the discipline-specific meanings of academic vocabulary.
The CSAVL provides specific information for each entry, including part of speech and CS-specific meanings, in order to provide users with clues as to how each item is used within the context of academic CS. Based on the comparative analyses performed in this study, the CSAVL was found to be a more efficient tool for reaching a minimal level of academic CS reading comprehension than the widely used Academic Word List (AWL) (Coxhead, 2000), or the combination of the AWL with the Computer Science Word List (CSWL) (Minshall, 2013). Through coverage tests performed on a variety of corpora, the CSAVL was shown to be representative of the written language of academic computer science and focused on the lemmas that are the most relevant to the context of written academic CS.

Acknowledgements

I would like to thank Lynn Santelmann, Susan Conrad, Alissa Hartig, and Tanya Sydorenko of the Portland State University Department of Applied Linguistics for their feedback and support during this project. Their suggestions provided vital guidance in the development of this document. I would also like to thank Errin Beck, Julie Haun, and Alyx Cesar at the Portland State University Intensive English Language Program for providing me with the inspiration for the subject of this thesis.

Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures
Glossary
Chapter 1: Introduction
Chapter 2: Literature Review
  2.1 Academic vocabulary
  2.2 An overview of word lists
  2.3 Organizing and defining a ‘word’
  2.4 Lexical coverage and comprehension
  2.5 Developing a corpus
  2.6 Creating and evaluating word lists
  2.7 Evaluations of GSL-based word lists
  2.8 Evaluations of lemma-based word lists
  2.9 The Computer Science Academic Word List (Minshall, 2013)
  2.10 Research Purposes of the CSAVL project
Chapter 3: Methodology
  3.1 Corpus design overview
  3.2 Text types
  3.3 Journal article selection criteria
  3.4 Textbook selection criteria
  3.5 CSAC1 size and balance
  3.6 CSAC2 size and balance
  3.7 Data collection & manual processing
  3.8 Automated processing
  3.9 Word selection criteria
  3.10 Tokenization and part of speech tagging
  3.11 Extracting CSAVL candidate lemmas
  3.12 Extracting CSAVL-S candidate lemmas
  3.13 Additions to the CSAVL and CSAVL-S
  3.14 Removing undesired content from the CSAC2
  3.15 Comparing structurally different word lists
  3.16 Summary of methodology
Chapter 4: Results
  4.1 Academic word list coverage comparisons
  4.2 Supplemental CS word list coverage comparisons
  4.3 General word list coverage comparisons for minimum reading comprehension
  4.4 Coverage across general, academic, and CS corpora
  4.5 Content comparison with technical dictionaries
  4.6 Word list content comparisons
  4.7 Coverage tests of overlapping content
  4.8 Summary of results
Chapter 5: Discussion
  5.1 The coverage and efficiency of the CSAVL and CSAVL-S
  5.2 Representativeness and technicality of the CSAVL and CSAVL-S
  5.3 The content of the CSAVL and CSAVL-S
  5.4 Usage of the CSAVL and CSAVL-S
  5.5 The limitations of the CSAVL project
  5.6 Possibilities for future corpus and word list research
  5.7 Conclusion
References
Appendix A: The Computer Science Academic Vocabulary List (CSAVL)
Appendix B: The Computer Science Academic Vocabulary Supplemental List (CSAVL-S)
Appendix C: Textbook Usage Survey Data
Appendix D: Textbook Usage Counts