Quantitative Determinants of Prefabs: A Corpus-Based, Experimental Study of Multiword Units in the Lexicon (2013)

University of New Mexico
UNM Digital Repository

Linguistics ETDs
Electronic Theses and Dissertations

7-1-2013

Quantitative determinants of prefabs: A corpus-based, experimental study of multiword units in the lexicon

Clayton Beckner

Follow this and additional works at: https://digitalrepository.unm.edu/ling_etds

Recommended Citation
Beckner, Clayton. "Quantitative determinants of prefabs: A corpus-based, experimental study of multiword units in the lexicon." (2013). https://digitalrepository.unm.edu/ling_etds/3

This Dissertation is brought to you for free and open access by the Electronic Theses and Dissertations at UNM Digital Repository. It has been accepted for inclusion in Linguistics ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact [email protected].

Clayton Beckner
Candidate

Linguistics
Department

This dissertation is approved, and it is acceptable in quality and form for publication:

Approved by the Dissertation Committee:

Jill Morford, Chairperson
Joan Bybee
William Croft
Andrew Wedel

QUANTITATIVE DETERMINANTS OF PREFABS: A CORPUS-BASED, EXPERIMENTAL STUDY OF MULTIWORD UNITS IN THE LEXICON

by

CLAYTON BECKNER

B.A., Physics, Bradley University, 1994
M.S., English, Illinois State University, 1999
M.A., Linguistics, University of New Mexico, 2005

DISSERTATION

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy
Linguistics

The University of New Mexico
Albuquerque, New Mexico

July 2013

ACKNOWLEDGEMENTS

This dissertation would not have been possible without the support of many individuals, both personally and academically. I thank, first of all, the many participants who generously volunteered their time for my studies, since I could not pursue empirical research without their help. I am grateful to all of my committee members, Jill Morford, Joan Bybee, Bill Croft, and Andy Wedel, for their mentorship.
My research is better due to the challenges they posed, and the assistance they gave me. Special thanks are due to my chair, Jill Morford, for her probing questions, feedback, and support throughout the research process. She was a reliable technical resource, voice of reason, and source of encouragement, and could always help me get half-baked ideas to be more fully baked. I am also especially grateful to Joan Bybee, whose pivotal work has been indispensable to my psycholinguistic research. I thank her for immersing me in usage-based linguistics, first as a student, then as a research assistant, and finally as a collaborator.

In addition to my committee members, I’m ever grateful to other UNM faculty, past and present, in linguistics and adjoining fields, including George Luger, Caroline Smith, Melissa Axelrod, Christian Koops, Phyllis Wilcox, Catherine Travis, Larry Gorbet, and Vera John-Steiner. They taught me the science of language and empirical research, gave technical help, and offered encouragement. I thank Nick Ellis, Vsevolod Kapatsinski, Michael Barlow, and Dawn Nordquist for helpful discussions, and for directing me toward useful resources at various stages. I thank Mark Davies for maintaining freely available corpora online, and for answering numerous queries about the workings of his search engines. I am grateful to UNM’s Graduate Resource Center for assistance with statistics.

At UNM, I’ve been fortunate to be part of a thriving community of graduate students in linguistics. I have greatly benefitted from this community in various forms, including moral support and random breeze-shooting.
There are too many people to thank everyone individually, but I should specifically acknowledge help at various stages of this project from Keri Holley, Gabe Waters, Jason Timm, Evan Ashworth, Susan Metheney, Logan Sutton, Susan Brumbaugh, Jeannine Kammann, Motomi Kajitani, Iphigenia Kerfoot, Benjamin Sienicki, Laura Hirrel, Sook-Kyung Lee, and Amy Lindstrom.

I’m also lucky to have a wonderful network of friends and neighbors outside of linguistics, including Kristen Fedesco, Drew Sedrel, Meisha Sedrel, Maggie Faber, Andrew Faber, Liz Bowden, Laura Tomedi, Karla Koch, Thondup Saari, Alexa Wheeler, Caleb Wheeler, Richard Frieday, Laura Lance, Kendra Watkins, Holly von Winckel, and Greg von Winckel. Many of these friends have been hugely supportive of my family, and watched our children in times of need. Many also tried out strange experimental tasks at early stages (some of which were destined for the cutting-room floor), and still they remain my friends. Thank you.

Finally, my family has provided me indispensable support and companionship. I am thankful to my children, Saoirse and Roan, for being an endless source of joy and hilarity. Roan, you learned the word ‘dissertation’ at an age that is surely abnormal, and you said it with alarming frequency. (To this day, though, I’m pleased you taught the word to your Pre-K class.) Now we can get back to reading Tolkien, and flipping in the living room. To my wife, Danielle, I am indebted in countless ways. Thank you for goading and soothing in the right measure, thank you for your partnership, and your love. On to the next adventure!
QUANTITATIVE DETERMINANTS OF PREFABS: A CORPUS-BASED, EXPERIMENTAL STUDY OF MULTIWORD UNITS IN THE LEXICON

by

CLAYTON BECKNER

B.A., Physics, Bradley University, 1994
M.S., English, Illinois State University, 1999
M.A., Linguistics, University of New Mexico, 2005
Ph.D., Linguistics, University of New Mexico, 2013

ABSTRACT

In recent years many researchers have been rethinking the ‘Words and Rules’ model of syntax (Pinker 1999), instead arguing that language processing relies on a large number of preassembled multiword units, or ‘prefabs’ (Bolinger 1976). A usage-based perspective predicts that linguistic units, including prefabs, arise via repeated use, and prefabs should thus be associated with the frequency with which words co-occur (Langacker 1987). Indeed, in several recent experiments, corpus analysis is found to be associated with behavioral measures for multiword sequences (Kapatsinski and Radicke 2009, Ellis and Simpson-Vlach 2009). This dissertation supplements such findings with two new psycholinguistic investigations of prefabs. Study 1 revisits a dictation experiment by Schmitt et al. (2004), in which participants are asked to listen to stretches of speech and repeat the input verbatim, after performing a distractor task intended to encourage reliance on prefabs. I describe the results of an updated experiment which demonstrates that participants are less likely to interrupt or partially alter high-frequency multiword sequences. Although the original study by Schmitt et al. (2004) reported null findings, the revised methodology suggests that frequency indeed plays a role in the creation of prefabs. Study 2 investigates the distribution of affix positioning errors (he go aheads) which give evidence that some multiword sequences (e.g., go ahead) are retrieved from memory as a unit. As part of this study, I describe a novel methodology which elicits the errors of interest in an experimental setting.
Errors evincing holistic retrieval are induced more often among multiword sequences that are high in Mutual Dependency, a corpus measure that weighs a sequence’s frequency against the frequencies of its component words. Follow-up analyses indicate that sequence frequency is positively associated with affix errors, but only if component-word frequencies are included as variables in the model. In sum, the studies in this dissertation provide evidence that prefabricated, multiword units are associated with high frequency of a sequence, in addition to statistical measures that take component words’ frequency into account. These findings provide further support for a usage-based model of the lexicon, in which linguistic units are both gradient and changeable with experience.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES

CHAPTER 1. INTRODUCTION
1.0 The notions of ‘prefab,’ and frequency of co-occurrence
1.1. The gradient nature of holistic retrieval
1.2 Storage vs. retrieval, frequency, and the maximalist lexicon

CHAPTER 2. QUANTITATIVE MEASURES OF PREFABS: BEHAVIORAL INVESTIGATIONS AND THEORETICAL ISSUES
2.0. Introduction
2.1. Evidence that token frequency is associated with holistic retrieval
2.2. Problems with a purely token frequency-based account
2.3. Experimental support for relative frequency accounts
2.4. Complications with Mutual Information, and Mutual Dependency as an alternative
2.5. The need for absolute frequency alongside relative frequency
2.6. Toward an integrated model

CHAPTER 3. PREFABS AND VERBATIM MEMORY: A DICTATION METHODOLOGY RECONSIDERED
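
The abstract describes Mutual Dependency as a measure that weighs a sequence's frequency against the frequencies of its component words, and Section 2.4 contrasts it with Mutual Information. The front matter does not state either formula, so the sketch below assumes the definitions commonly used in the collocation literature: MI = log2(P(w1 w2) / (P(w1) P(w2))) and MD = log2(P(w1 w2)^2 / (P(w1) P(w2))). The function names, the corpus size, and all counts are hypothetical values chosen for illustration, not data from the dissertation.

```python
import math


def mutual_information(pair_count, w1_count, w2_count, corpus_size):
    """Pointwise MI in bits: log2( P(w1 w2) / (P(w1) * P(w2)) )."""
    p_pair = pair_count / corpus_size
    p_w1 = w1_count / corpus_size
    p_w2 = w2_count / corpus_size
    return math.log2(p_pair / (p_w1 * p_w2))


def mutual_dependency(pair_count, w1_count, w2_count, corpus_size):
    """MD in bits: log2( P(w1 w2)^2 / (P(w1) * P(w2)) ).

    Assumed formula; the source text does not spell it out.
    """
    p_pair = pair_count / corpus_size
    p_w1 = w1_count / corpus_size
    p_w2 = w2_count / corpus_size
    return math.log2(p_pair ** 2 / (p_w1 * p_w2))


# Hypothetical toy corpus and counts (not taken from the dissertation).
CORPUS_SIZE = 2 ** 20
frequent = dict(pair_count=1024, w1_count=2048, w2_count=2048)
rare = dict(pair_count=2, w1_count=4, w2_count=4)

for label, counts in (("frequent", frequent), ("rare", rare)):
    mi = mutual_information(corpus_size=CORPUS_SIZE, **counts)
    md = mutual_dependency(corpus_size=CORPUS_SIZE, **counts)
    print(f"{label}: MI = {mi:.2f} bits, MD = {md:.2f} bits")
```

Under these assumed definitions, MD equals MI plus log2 P(w1 w2), so it gives extra weight to how often the sequence itself occurs. In the toy data, both pairs co-occur in half of their component words' occurrences; MI scores the rare pair far higher than the frequent one, while MD scores the two pairs equally.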
