Building High-Frequency Word Lists for the Semantic Domain of ʻĀINA (‘Land’) Using a Raw Corpus of Spoken ʻŌlelo Hawaiʻi
University of Hawaiʻi at Mānoa

Building High-frequency Word Lists for the Semantic Domain of ʻĀINA (‘land’) Using a Raw Corpus of Spoken ʻŌlelo Hawaiʻi

Catherine Elizabeth Lee Brockway

April 2021

A dissertation submitted to the University of Hawaiʻi at Mānoa Graduate Division in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics

Dissertation committee:
Andrea Berez-Kroeker, Chairperson
Gary Holton
Christina Higgins
Kristopher Kyle
Katrina-Ann R. Kapāʻanaokalāokeola Nākoa Oliveira

He aliʻi ka ʻāina; he kauā ke kanaka.

Acknowledgments

Mahalo au i ke kōkua hoʻomanawanui a nā kānaka ʻŌlelo Hawaiʻi, especially the kānaka ʻŌlelo Hawaiʻi who participated in my freelisting elicitation sessions as well as Larry Kimura and the members of Aloha ʻĀina Tuahine and everyone who called in to share their leo and manaʻo on the Ka Leo Hawaiʻi radio program.

Mahalo au i kuʻu mau kumu ʻŌlelo Hawaiʻi: Pōmaikaʻi Stone, Kamaka Gramberg, Kaliko Baker, Lolena Nichols, a me Kainoa Wong (Ke Kulanui o Hawaiʻi ma Mānoa), a me nā kumu ma Hoʻokahua (Nā Kula ʻo Kamehameha).

Mahalo iā kuʻu poʻo, ʻO Dr. Andrea Berez-Kroeker, for your steadfast encouragement, for being a scholar I can look up to, and for believing in this project and my ability to complete it. Thank you to the members of my committee for your challenging questions, article recommendations, deep conversations, and the countless other ways you have all shaped this dissertation and my understanding of scholarship and language.

I also wish to express my thanks for the institutional and financial support of the Department of Linguistics, the Bilinski Summer Research Award, the Bilinski Doctoral Fellowship, the Language Beyond the Classroom program, the Kaipuleohone archive, and the Kaniʻāina project.
Mahalo i ke kōkua a kuʻu mau hoa: my accountabilibuddies, especially Dannii Yarbrough and Claire Stabile, Brad Rentz for his help with LaTeX, Amber Camp for her editorial support in the final stretch, and Noah Haʻalilio Solomon for crucial help with glosses and other ʻŌlelo questions.

Mahalo iā kuʻu ʻohana: my wife Amanda Brockway, for helping me persevere and taking me on adventures when I needed it; my parents Linda Jones and Jeff Lee, for preparing me for this my whole life; and my sister Charlotte Stockton-Lee, who always believed in me.

Mahalo iā kuʻu ʻīlio, ʻŌ Joxer, who loyally napped behind me through 90% of the writing of this dissertation.

Abstract

This dissertation presents high frequency word lists of ʻŌlelo Hawaiʻi ‘Hawaiian language’ within an emic ontology of the semantic domain of ʻāina ‘land, landscape’ using word frequency in the raw spoken language corpus of the radio program Ka Leo Hawaiʻi. Using freelisting and salience analysis, this study defines the core and periphery of the semantic domain of ʻāina within the minds of current speakers, indicating that the domain includes not only geomorphological features such as mauna ‘mountain’ and pali ‘cliff’, but also plants and animals, humans and human-made structures, and water features, including kai ‘sea, saltwater’. Based on this emic definition of the semantic domain, twelve target words are identified and used in a word association strength analysis of the Ka Leo Hawaiʻi corpus using T-scores to identify words likely to co-occur with the target ʻāina words.

This dissertation provides the first 95% text coverage list of high frequency words in the Ka Leo Hawaiʻi corpus, with 712 of the highest-frequency words broken into seven bands of descending word frequencies. This list is cross-referenced with the results of the association strength analysis to identify high frequency ʻāina-related words, which yields a vocabulary list structured by frequency bands for the target semantic domain.
Table of Contents

Acknowledgments
Abstract
List of Figures
List of Tables
List of Abbreviations
1 Introduction
1.1 Research Questions
1.2 Context of the Research Questions
1.2.1 Geography of Hawaiʻi
1.2.2 Hawaiian Scholarly Expectations
1.2.3 My Position as a Researcher
1.3 Roadmap of the Dissertation
2 Background and Relevant Literature
2.1 ʻŌlelo Hawaiʻi
2.1.1 Typology and Variation
2.1.2 Documentation
2.1.3 Revitalization
2.2 Landscape Ontology Research
2.2.1 Linguistic Ontologies
2.2.2 Landscape Ethnoecology and Ethnophysiography
3 Salience Analysis of ʻĀina-related Categories
3.1 Freelisting and Salience Analysis Literature Review
3.2 Data Collection
3.2.1 Participants
3.2.2 Interaction Order
3.2.3 Recordings
3.2.4 Data Extraction
3.3 Salience Analysis
3.4 Determination of High Salience Categories
3.5 High Salience ʻĀina Categories
3.6 Chapter Conclusion
4 Ka Leo Hawaiʻi Corpus and Frequency-based ʻĀina Vocabulary Lists
4.1 Frequency and Percent Coverage Literature Review
4.2 Data Collection
4.2.1 Participants
4.2.2 Interaction Order and Genres
4.2.3 Recordings
4.2.4 Data Extraction
4.3 95% Coverage Word Lists
4.4 Salience and Frequency
4.4.1 High Salience and High Frequency
4.4.2 High Salience and Lower Frequency
4.5 Association Strength Analysis
4.6 Frequency-based ʻĀina Vocabulary Lists for Beginner Learners of ʻŌlelo Hawaiʻi
4.7 Chapter Conclusion
5 Discussion
5.1 Discussion of Salience Analysis
5.2 Discussion of Frequency Analysis
5.3 Discussion of Association Strength Analysis
5.4 Discussion of Frequency-based ʻĀina Vocabulary Lists for Beginner Learners
6 Conclusion
6.1 Recommended Next Steps
Appendices
A R Code
A.1 R Code for Salience Analysis
A.2 R Code for Percent Coverage
B Salience Analysis Results
B.1 Salience Analysis Tables
B.2 Salience Analysis Graphs
C Frequency Bands for 95% Coverage of Ka Leo Hawaiʻi Corpus
D T Score Collocations
E ʻĀina-related Word Lists by Frequency Band
References

List of Figures

2.1 Simplified linguistic family tree of ʻŌlelo Hawaiʻi. Adapted from Walworth (2014).
3.1 Screenshot of ELAN software showing three tiers for freelist analysis.
3.2 Screenshot of ELAN window with Question 14 selected on Org Structure Tier. Note that corresponding categories on the Categories tier are also selected and are listed in the grid view in the top, right-hand corner. Participant 42: Pōmai.
3.3 Example text files of Question 1 individual (a) and combined (b) freelists.
4.1 AntConc after running the Word List function.
4.2 Settings used in AntConc for Association Strength Analysis
B.1 Smith’s S for all categories in response to Question 1: He aha kekahi mau mea o ka ʻāina?
B.2 Smith’s S for all categories in response to Question 2: He aha kekahi mau ʻano kahakai?
B.3 Smith’s S for all categories in response to Question 3: He aha kekahi mau mahele o kahakai?
B.4 Smith’s S for all categories in response to Question 4: He aha kekahi mau ʻano nalu?
B.5 Smith’s S for all categories in response to Question 5: He aha kekahi mau mahele o ka nalu?
B.6 Smith’s S for all categories in response to Question 6: He aha kekahi mau ʻano kai holo?
B.7 Smith’s S for all categories in response to Question 7: He aha kekahi mau mahele o ke kai holo?
B.8 Smith’s S for all categories in response to Question 8: He aha kekahi mau ʻano papa koʻa?
B.9 Smith’s S for all categories in response to Question 9: He aha kekahi mau mahele o ka papa koʻa?
B.10 Smith’s S for all categories in response to Question 10: He aha kekahi mau ʻano one?
B.11 Smith’s S for all categories in response to Question 11: He aha kekahi mau ʻano puʻu?
B.12 Smith’s S for all categories in response to Question 12: He aha kekahi mau mahele o ka puʻu?
B.13 Smith’s S for all categories in response to Question 13: He aha kekahi mau ʻano mauna?
B.14 Smith’s S for all categories in response to Question 14: He aha kekahi mau mahele o ka mauna?
B.15 Smith’s S for all categories in response to Question 15: He aha kekahi mau ʻano kahawai?
B.16 Smith’s S for all categories in response to Question 16: He aha kekahi mau mahele o kahawai?
B.17 Smith’s S for all categories in response to Question 17: He aha kekahi mau ʻano loko wai?
B.18 Smith’s S for all categories in response to Question 18: He aha kekahi mau mahele o ka loko wai?
B.19 Smith’s S for all categories in response to Question 19: He aha kekahi mau ʻano lua?
B.20 Smith’s S for all categories in response to Question 20: He aha kekahi mau mahele o ka lua?
B.21 Smith’s S for all categories in response to Question 21: He aha kekahi mau ʻano ʻā?
B.22 Smith’s S for all categories in response to Question 22: He aha kekahi mau ʻano pōhaku?
B.23 Smith’s S for all categories in.