Language learning research at the intersection of experimental, computational and corpus-based approaches
Patrick Rebuschat (1,2), Detmar Meurers (2,3), and Tony McEnery (1,4)
(1) Department of Linguistics and English Language, Lancaster University; (2) LEAD Graduate School and Research Network, University of Tübingen; (3) Seminar für Sprachwissenschaft, University of Tübingen; (4) ESRC Centre for Corpus Approaches to Social Science
Keywords: Psycholinguistics; corpus linguistics; computational linguistics; natural language processing; first language acquisition; second language acquisition; multimethod approaches; interdisciplinary research
Acknowledgements: Our research was supported by the Economic and Social Research Council, UK (grant ES/K002155/1) and by the LEAD Graduate School & Research Network (grant DFG-GSC1028), a project of the Excellence Initiative of the German federal and state governments.
Correspondence: Correspondence concerning this article should be addressed to Patrick Rebuschat, Department of Linguistics and English Language, Lancaster University, Lancaster LA1 4YL, United Kingdom.
Language acquisition occupies a central place in the study of human cognition, and research on how we learn language can be found across many disciplines, from developmental psychology and linguistics to education, philosophy and neuroscience. Language acquisition is a challenging topic to investigate because the learning target in both first and second language acquisition is highly complex; part of the challenge lies in identifying how different domains of language are acquired to form a fully functioning system of usage (Ellis, this volume). Moreover, the evidence about language use and language learning is shaped by many factors, including the characteristics of the task in which the language is produced (Alexopoulou et al., this volume).
The challenge is further complicated by the fact that language acquisition is affected by individual learner characteristics. Individual differences are particularly well studied in second language acquisition, where it is clear that factors such as native language, type of instruction, and motivation affect learning rate and ultimate attainment (Ushioda & Dörnyei, 2012; Williams, 2012). However, recent research indicates that there is also considerable individual variation in child language development (see Rowland, 2013). A full understanding of language acquisition therefore needs to take these individual differences into account (MacWhinney, this volume).
Despite these and other challenges, the past decades have witnessed significant progress in our understanding of how children and adults learn languages. This conceptual and empirical progress has arguably been fueled by the growing range of methods and approaches used to study language acquisition (see Hoff, 2011; Mackey & Gass, 2012). For example, experimental approaches using artificial or natural languages have made it possible to investigate, in rigorously controlled environments, how changes in exposure conditions such as input frequency, instruction type, or prior knowledge affect learning. Learner corpora are growing in size and in the range of tasks they cover, and increasingly rich annotation supports detailed analyses with sophisticated statistical methods. Digital learning environments that integrate computational methods hold the promise of supporting the systematic exploration of learning mechanisms in authentic teaching and learning contexts, providing new sources of evidence on the roles that the linguistic environment, interaction, and feedback play in learning. The investigation of a complex phenomenon like language acquisition can benefit significantly from the insights, tools, and methods of many disciplines, yet studies that combine multiple approaches are still relatively rare.
The research described in Monaghan and Mattock (2012), Ellis, Römer, and O’Donnell (2016), and Christiansen and Chater (2016) clearly illustrates the potential of multimethod approaches to language. Monaghan and Mattock’s (2012) investigation of word learning, for example, shows how corpus research can be connected with experimental research. Monaghan and Mattock first conducted corpus analyses of child-directed speech. They then used the information derived from these analyses to construct an artificial language that reflects the statistics of natural language. On this basis, they investigated the acquisition of nouns and verbs by adult learners in an artificial language experiment. While artificial language research is occasionally criticized for its limited ecological validity, basing the construction of the artificial language on distributional information from natural language corpora mitigates some of this criticism (see also Monaghan & Rowland, this volume). Another impressive example of multimethod research is Ellis, Römer, and O’Donnell (2016). Ellis et al. investigate the acquisition, processing, and use of Verb-Argument Constructions (VACs), and their monograph contains a series of behavioral experiments, large-scale corpus analyses supported by Natural Language Processing (NLP) techniques, and several computational simulations (connectionist and agent-based). The result of this systematic multimethod exploration is an in-depth understanding of how we learn, process, and use VACs, as well as a research model for others to follow. Finally, Christiansen and Chater’s (2016) theoretical framework for understanding language acquisition, evolution, and processing is the direct result of multimethod research; it would not have been possible without the insights the authors gained from working at the intersection of experimental, computational, and corpus-based approaches for more than two decades.
The question of how to promote multidisciplinary research across methodological boundaries has been central to the work of the three editors of this volume. A series of review articles that aim to connect research areas and introduce methodologies exemplifies this (e.g., Meurers, 2012, 2015; Meurers & Dickinson, this volume; Rebuschat, 2013). One of the editors, Tony McEnery, directs the ESRC Centre for Corpus Approaches to Social Science (CASS, http://cass.lancs.ac.uk) at Lancaster University; the Centre's primary objective is to enable colleagues in other, non-linguistic disciplines to use the corpus approach. The two other editors are part of Tübingen’s unique LEAD Graduate School and Research Network, which brings together over 130 scientists from Education, Psychology, Linguistics, Neuroscience, Informatics, Sociology, and Economics to investigate learning and educational achievement.1 The LEAD initiative includes an interdisciplinary research and training program for doctoral students and postdocs, which is funded by Germany’s Excellence Initiative. In the same spirit, we have enjoyed organizing numerous symposia, workshops, summer schools, and conferences, and we have edited several books and special journal issues with the specific aim of bringing together leading researchers from different disciplines whose paths would not normally cross (e.g., Andringa & Rebuschat, 2015; Meurers, 2009; Monaghan & Rebuschat, in prep.; Rebuschat, 2015; Rebuschat, Rohrmeier, Hawkins, & Cross, 2012; Rebuschat & Williams, 2012). The present volume is part of this ongoing effort.
1 For more information on the LEAD Graduate School & Research Network, please see http://www.lead.uni-tuebingen.de
This volume
This volume was inspired by a symposium on “Connecting data and theory: Corpora and second language research”, which was jointly organized by the editors and took place in Lancaster, UK, on July 19, 2015.
The symposium was jointly funded by the Language Learning Roundtable Grant Program and by the ESRC Centre for Corpus Approaches to Social Science (CASS). Its objective was to establish a dialogue between experts on second language acquisition, corpora, and computational analysis methods. Such a dialogue can significantly enrich the empirical basis of second language research, but collaborations across these fields are still rare, and the symposium aimed to address this gap directly. There were three sessions, each approaching the symposium topic from a distinct research area: Nick Ellis and Brian MacWhinney provided the view from cognitive psychology, Detmar Meurers and Markus Dickinson the view from computational linguistics, and Anke Lüdeling and Sylviane Granger the view from corpus linguistics. The symposium concluded with a general discussion. The discussion was lively and the feedback very positive, so when the opportunity arose to produce a volume of Currents in Language Learning, we readily agreed to do so. Five presentations from the symposium provided the basis for four expanded and updated chapters (Ellis; Lüdeling et al.; MacWhinney; Meurers & Dickinson). Additional chapters were written by colleagues who attended the symposium and made thoughtful contributions (Alexopoulou et al.; Gablasova et al.; Monaghan & Rowland; Ziegler et al.). Based on the symposium discussions, we decided to expand the scope of the special issue in two areas. We solicited a manuscript that would contribute a language testing angle (Wisniewski), and we broadened the topic to language learning in general, given the long and fruitful tradition of using corpora, NLP tools, and computational modeling in child language research. As a result, the third