An Analysis of the Processing of Multiword Units in Sentence Reading and Unit Presentation Using Eye Movement Data: Implications for Theories of MWUs
University of Alberta

An analysis of the processing of multiword units in sentence reading and unit presentation using eye movement data: Implications for theories of MWUs

by Georgina Columbus

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor in Philosophy

Department of Linguistics

© Georgina Columbus
Spring 2012
Edmonton, Alberta

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms.

The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Abstract

Multiword units (MWUs) have been the subject of much research in psycholinguistics, due to their syntactic and semantic idiosyncrasies. While studies traditionally focused on idioms (a piece of cake), more recent work has focused on another type: lexical bundles (in the middle of). How MWUs are stored and retrieved remains a central question in the literature, the answer to which will add to our understanding of language processing. To date, though, there have been few investigations comparing the processing of different types of MWUs. This dissertation aims to fill that gap through analyses of eye movement data during normal sentence reading and trigram reading. The sentence reading results suggest that the familiarity rating for the MWU types analysed here is a relevant predictor of MWU processing.
Surprisingly, however, individual word frequency has more predictive capacity for MWU reading times than does MWU frequency; much of the variance is explained by individual word frequency instead. Overall, the three MWU types investigated here are distinguished from one another in fixation durations on words, particularly for idioms and lexical bundles. For sentence reading times, in contrast, the effects of the MWU types are cancelled out, suggesting that processing difficulties may have been resolved at the sentence level.

The second study investigates MWU type effects when the units are read without context. Each MWU in this study is a trigram taken from the Google Web1T n-gram corpus (Brants & Franz, 2006) using stratified sampling across n-gram frequencies. The trigrams were coded for MWU type based on the categories used in Chapter 1. The results show that MWU effects are visible at the trigram level even without context. Somewhat surprisingly, however, there is also evidence of MWU types affecting processing of the first word in the first fixation duration, and of the first bigram in the subgaze duration. The findings suggest the semantic composition of MWUs is apparent to the reader very early. Our results support a usage-based model of language access and storage, such as those put forward by Bybee (e.g., 2006), Pierrehumbert (2001) and Bod (1998), where individual and unit frequency both affect reading times.

Acknowledgments

I wish to thank the many people who have helped me complete this dissertation. Firstly, thank you for all the guidance, mentoring and support to my supervisors, Patrick Bolger and Harald Baayen; to my former supervisors at the University of Alberta, John Newman and Gary Libben; and to my Masters and Honours supervisor, Kon Kuiper at the University of Canterbury, without whose influence I wouldn’t be studying MWUs at all.
Thanks, too, to the fantastic staff at the Department of Linguistics office who helped ensure that I could find jobs and get paid, and who processed varia for me: Elizabeth French, Jana Tomasovic, Diane McKen and Oliver Rossier. Special thanks to the team of linguists in the graduate offices, both former and current, for their helpful comments on my work, particularly Antoine Tremblay and Laura Teddiman, and Cyrus Shaoul from Psychology. Other people whose discussions and feedback have aided this dissertation are Norbert Schmitt, Kathy Conklin, Walter van Heuven, Jen Hay, Anna Siyanova-Chanturia, Kon Kuiper, Alison Wray, Janet van Hell, Chris Westbury, Debra Titone, and Octavian Ion; many thanks to you all as well. All errors, of course, remain my own.

Thanks are also due to the many lab assistants at the Centre for Comparative Psycholinguistics who ran experimental sessions: Kyla Coole, Reagan Cyr, Janelle Dickout, Christen Flottemesch, Simon Fung, Allison Husband, Mikyung Lee, Danielle Markewic, Caelan Marr, Vince Porretta, Laura Teddiman and Antoine Tremblay; to the CCP lab managers over the years: Anamaria Popescu, Chelsi Mitchell, Lindsay Griener, Karin Nault, Laura Teddiman, and Alexandra Mateu Martin; as well as to the excellent technical support staff at the Arts Resource Centre and the Department of Linguistics: Lee Ramsdell, Tom Welz, Al Dennington and Clare Peters.

Funding for this dissertation was provided by the University of Alberta’s Provost Doctoral Scholarship, the GRA Rice Communications Scholarship, the GSA Graduate Student Research Assistant Award, the GSA Doctoral Travel Fund Award, numerous Travel Awards from the Department of Linguistics at the University of Alberta, and funding from the Mental Lexicon research group through their CFI grant for early experimental costs (SSHRC MCRI 412-2001-1009; Gary Libben PI).

On a more personal note, huge gratitude is owed to the people who helped me get through the dissertation: to my family, especially Lisa Dooling for her hospitality on many conference trips, and to my parents for their support, especially during their own times of stress, and to my other siblings for their long phone calls and cards and treats from home; to Tamara Sorenson Duncan and her husband James for the hospitality and moral support; to Laura Teddiman for lab-related and relaxation-related support, and finally, to my hugely supportive and fellow academic husband Andrew D.F. Ross, who ‘gets it’, and helped me to focus and write and keep on working when the light at the end of the tunnel seemed too far away.

Arohanui.

Table of Contents

Introduction
    Background
    On methodologies employed
        Materials and Design
        Analysis
    References
Chapter 1. Processing Multiword Units: Degrees of idiomaticity seen through eye movements
    Method
        Participants
        Materials
        Experimental procedure
        Procedure
    Results
        Analysis methodology
        Predictors in the analysis
        Results and discussion: First Fixation Duration
            L1
            L2
            Joint analysis of L1 and L2 readers
        Results and discussion: Word Reading Time
            L1
            L2
            Joint analysis of L1 and L2 readers
        Results and discussion: Sentence Reading Time
            L1
            L2
            Joint analysis of L1 and L2 readers
    General discussion and conclusion
    References
Chapter 2. When three become one: eye movements and fixations in trigram reading
    Method
        Participants
        Materials
        Procedure
    Results
        Analysis methodology
        Predictors in the analysis
        Constituency judgements for trigrams
        Model results
        Results: First Fixation Duration times on the first word
        Results: Subgaze Fixation Duration times on the first bigram
        Results: Total Fixation Duration times on the trigram
    General Discussion
    References
General Discussion and Conclusions
    References
Appendix A
Appendix B

List of Tables

Table 1-1. Predictors in the analyses
Table 1-2. Numerical values of predictors
Table 1-3. Increase in goodness of fit (as indicated by AIC) with addition of each predictor in the First Fixation Duration L1 model
Table 1-4. Increase in goodness of fit (as indicated by AIC) with addition of each predictor in the First Fixation Duration L2 model
Table 1-5. Increase in goodness of fit (as indicated by AIC) with addition of each predictor in the First Fixation Duration Joint L1 and L2 model
Table 1-6. Increase in goodness of fit (as indicated by AIC) with addition of each predictor in the total Word Reading Time L1 model
Table 1-7. Increase in goodness of fit (as indicated by AIC) with addition of each predictor in the total Word Reading Time L2 model
Table 1-8. Increase