L2 Processing Advantages of Multiword Sequences: Evidence from Eye-Tracking
Total Page:16
File Type:pdf, Size:1020Kb
L2 Processing Advantages of Multiword Sequences: Evidence from Eye-Tracking Elma Kerz Arndt Heilmann Stella Neumann RWTH Aachen University RWTH Aachen University RWTH Aachen University elma.kerz@ arndt.heilmann@ stella.neumann@ ifaar.rwth-aachen.de ifaar.rwth-aachen.de ifaar.rwth-aachen.de Abstract the frequencies of individual words but also to the frequencies of word sequences (see, e.g., Chris- A substantial body of research has demon- strated that native speakers are sensitive to the tiansen and Arnon, 2017, for a recent overview). frequencies of multiword sequences (MWS). This questions the strict compartmentalization be- Here, we ask whether and to what extent tween the lexicon as a storage of individual words intermediate-advanced L2 speakers of English and a grammar as a set of rules or constrained used can also develop the sensitivity to the statistics to combine them. of MWS. To this end, we aimed to replicate Moving away from the traditional ‘words and the MWS frequency effects found for adult na- rules’ approach, emergentist approaches have put tive language speakers based on evidence from self-paced reading and sentence recall tasks forward alternative theoretical models of lan- in an ecologically more valid eye-tracking guage. Following the literature (see, e.g. Arnon study. L2 speakers’ sensitivity to MWS fre- and Snider, 2010; Kidd et al., 2017; MacWhin- quency was evaluated using generalized linear ney and O’Grady, 2015; Mitchell et al., 2013), mixed-effects regression with separate mod- we use the term ‘emergentist’ as a cover term els fitted for each of the four dependent mea- for a broad class of approaches to language in- sures. Mixed-effects modeling revealed sig- cluding usage-based (a.k.a. experience-based) nificantly faster processing of sentences con- models, constraint-based approaches, exemplar- taining MWS compared to sentences contain- ing equivalent control items across all eye- based models and connectionist models (for more tracking measures. Taken together, these find- general overviews, see, e.g., Beckner et al., ings suggest that, in line with emergentist ap- 2009; Christiansen and Chater, 2016a,b; Ellis and proaches, MWS are important building blocks Larsen-Freeman, 2006; Ellis, 2019; MacWhin- of language and that similar mechanisms un- ney, 2012; McClelland et al., 2010). Distinct derlie both native and non-native language from nativist/generative approaches, emergentist processing. approaches share the folowing two central as- 1 Introduction sumptions: First, emergentist approaches eschew the existence of Universal Grammar and instead 1.1 Emergentist approaches and statistical emphasize that language is learnable via general learning cognitive mechanisms. Second, these approaches A widely held assumption in the language sci- put the emphasis on usage and/or experience with ences, including psycholinguistics, has long been language and assume a direct and immediate re- the ‘words and rules’ view (Levelt, 1993; Jackend- lationship between processing and learning, con- off and Jackendoff, 2002; Pinker, 1999): In this ceiving of them as inseparable rather than gov- view, speakers/writers generate sentences by com- erned by different mechanisms (‘two sides of the bining words according to the grammatical rules same coin’). In these approaches, language acqui- of their language, and listeners/readers compre- sition is viewed as learning how to process effi- hend sentences by looking up words in their men- ciently (see, the ‘learning-as-processing’ assump- tal lexicon and combining them using the same tion, Chang, Dell, and Bock, 2006; see also ‘lan- rules. This view has been challenged recently by guage acquisition as skill learning’ Chater and an accumulating body of evidence demonstrating Christiansen, 2018). One of the major advances that language users are highly sensitive not only to in the language sciences across theoretical orienta- 60 Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), pages 60–69 Florence, Italy, August 2, 2019. c 2019 Association for Computational Linguistics tions has been the recognition that language con- occur is thus a driving force behind chunking and, sists of complex, highly variable patterns occur- all else being equal, each exposure to a given se- ring in sequence, and as such can be described quence of words (sounds or graphemes) will affect in terms of statistical or distributional relations its subsequent processing. But why is there a need among language units (see, e.g., Redington and for chunking? To ameliorate the effects of the Chater, 1997). Thus, learning a language heav- ‘real-time’ constraints on language processing im- ily involves figuring out the statistics inherent in posed by the limitations of human sensory system language input. This is supported by a large body and human memory in combination with the con- of evidence from the literature on statistical learn- tinual deluge of language input (cf., Christiansen ing. Statistical learning – defined as the mecha- and Chater, 2016a,b, for the ‘Now-or-Never bot- nism by which language users discover the pat- tleneck’), through constant exposure to (both au- terns inherent in the language input based on its ditory and visual) language input, humans learn distributional properties – has been shown to facil- to rapidly and efficiently recode incoming infor- itate the acquisition of various aspects of language mation into larger sequences. The fact that lan- knowledge, including phonological learning (e.g., guage is abundant in statistical regularities at mul- Maye et al., 2008; Thiessen and Saffran, 2003), tiple levels of language representations and that word segmentation (e.g., Onnis et al., 2008; Saf- humans are able to detect such regularities via sta- fran et al., 1996), learning the graphotactic and tistical learning allows for such chunking to take morphological regularities of written words (e.g., place. The by-products of statistical learning and Pacton et al., 2005), learning to form syntactic and chunking enable anticipatory language processing semantic categories and structures (e.g., Lany and humans rely on to integrate the greatest possible Saffran, 2010; Saffran and Wilson, 2003; Thomp- amount of available information as fast as pos- son and Newport, 2007). Furthermore, an impres- sible. Processing a MWS as a chunk will min- sive body of evidence has been accumulating over imize memory load and speed up integration of the last years indicating a close relationship be- the MWS with prior context (see, a chunk-based tween individual differences in statistical learning computational model presented in a recent study ability and variation in native language learning in by McCauley and Christiansen, 2019). both child and adult L1 populations (e.g., Conway et al., 2010; Kidd and Arciuli, 2016; Misyak and 1.2 MultiWord Frequency Effects in Online Christiansen, 2012; Siegelman and Frost, 2015), Processing and in adult L2 populations (e.g., Ettlinger et al., 2016; Frost et al., 2013; Onnis et al., 2016). Thus, There is now an extensive body of evidence from an emergentist perspective, language acqui- demonstrating that language users are sensitive sition is essentially an ‘intuitive statistical learning to the input frequency across all levels of lin- problem’ (Ellis, 2008, p. 376). guistic analysis (Ellis, 2002; Diessel, 2007; Juraf- sky, 2003). An accumulating body of evidence Emergentist approaches have developed a grow- now suggests that frequency effects also extend ing interest in the role of multiword sequences to the processing of MWS. Children and adults (henceforth MWS), also commonly referred to as are shown to be sensitive to the statistics of MWS ’formulaic sequences’ (Wray, 2013). MWS are and rely on knowledge of such statistics to facil- succinctly defined as variably-sized compositional itate language processing and boost their acquisi- recurring sequence patterns comprised of multiple tion (for overviews, see, Christiansen and Arnon, words (for a recent overview, see Arnon and Chris- 2017; Shaoul and Westbury, 2011). tiansen, 2017). Three mechanisms that have been In the area of native language processing, a proposed to underpin frequency effects specifi- number of comprehension and production stud- cally in learning word sequences are described as ies have provided evidence of processing advan- follows (Diessel, 2007): [1] increased frequency tages for MWS over non-MWS (see, e.g., Arnon causes the strengthening of linguistic representa- and Snider, 2010; Bannard and Matthews, 2008; tions, [2] increased frequency causes the strength- Conklin and Schmitt, 2012; Durrant and Doherty, ening of expectations and [3] increased frequency 2010; Tremblay et al., 2011). Many of these leads to the automatization of chunks. The fre- studies follow an approach where the target stim- quency with which building blocks of language uli are restricted to a certain frequency thresh- 61 old. The threshold-approach studies aimed to de- than two words. The few existing studies have termine whether and to what extent MWS – i.e., produced inconsistent results: Some studies found more precisely ‘lexical bundles’ (LB)– are pro- frequency effects in processing of MWS in non- cessed faster over less frequent counterparts (non- native speakers (e.g. Jiang and Nekrasova, 2007), LB). The stimulus material is typically derived whereas other studies found no such effects (e.g. from language corpora based on predefined fre- Babaei et al., 2015). In addition, these previ- quency