Proceedings of the Joint Workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC 2016

Preface Elena Volodina, Gintarė Grigonytė, Ildikó Pilán, Kristina Nilsson Björkenstam and Lars Borin

Conference description

The joint workshop on Natural Language Processing (NLP) for Computer-Assisted Language Learning (CALL) & NLP for Language Acquisition (LA), NLP4CALL&LA for short, is an effort to provide a space for debate and collaboration between two closely related areas. Both focus on language acquisition and on the related resources and technologies that can support research on the language learning process, and both aim to bring interdisciplinary advantage to the field. The individual workshop areas are outlined below.

The area of NLP4CALL is applied in essence: tools, algorithms, and ready-to-use programs play an important role. It has a traditional focus on second or foreign language learning, with a target age group of school children or older. The intersection of Natural Language Processing and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This fact has given the area of research its name: Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition (SLA) theories and practices and second language assessment, as well as knowledge of L2 pedagogy and didactics.

The workshop on Language Processing for Research in Language Acquisition (NLP4LA) broadens the scope of the joint workshop to also include theoretical, empirical, and experimental investigation of first, second and bilingual language acquisition. NLP4LA aims to foster collaboration between the NLP, linguistics, psychology and cognitive science communities. The workshop is targeted at anyone interested in the relevance of computational techniques for first, second and bilingual language acquisition.

The joint workshop series on NLP4CALL&LA arose in 2016 and has become a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in systems supporting language learning and the research around it, and exploring the theoretical and methodological issues arising during language acquisition.

Motivation

Results of the Survey of Adult Competencies (PIAAC, 2013), in which literacy as a skill was assessed among the adult population (16–65 years), have shown that on average Sweden scored among the top 5 countries out of the 23 OECD participants. However, the national Swedish report quotes the difference between the average literacy levels of native-born (L1) citizens and citizens with an immigrant (L2) background as the largest observed among all participating countries (OECD, 2013:6). The low-literacy population in Sweden has a three times higher risk of being unemployed or reporting poor health. The survey results point to an acute need to support immigrants and other low-literacy groups in building stronger language skills as a way of getting jobs and improving their lifestyle (SCB, 2013:8).

Besides, in the setting of an escalating refugee crisis in Europe and a growing number of people seeking asylum in Sweden (Migrationsverket 2016), research supporting second language acquisition, assessment and teaching is in every way important to Swedish society. The government has recently initiated a project on learning among the newly arrived (Skolverket 2014), where one of the foci is on producing tools for evaluation of Swedish as a second language. The NLP4CALL workshop series contributes to this aim in a most robust way by bringing together people capable of influencing the situation through intelligent solutions. Exchange of information, ideas, experiences, methods, etc. between researchers dealing with ICALL questions leads to new insights and, as a result, to progress in the field.

In recent debates, the Swedish government has been strongly encouraging immigrants to take a “fast path” to learning Swedish so that they can be considered for work in Sweden sooner. However, the fast path is not a solution, according to SLA researchers (Josefsson 2016). Professor Gunlög Josefsson argues in her article that the two immediate investments for improving the teaching of L2 Swedish should be:

1. Development of effective IT-based solutions that can be used anywhere, regardless of whether a teacher is present
2. Education of a larger number of second language teachers who can offer SFI (Swedish For Immigrants) and other types of courses to a greater number of immigrants, especially to those planning to take Swedish university courses as a step to validate their education.

The research outlined for this workshop directly targets the first point on Josefsson's agenda and indirectly supports the second. Language technologies can be used to create more effective tools and computerized solutions for online teaching of target languages, as well as to support teachers and relieve them of tedious tasks that can be modelled, such as exercise generation, essay grading, etc. Most importantly, the use of Language Technologies can make IT solutions for language learning more “intelligent”.

Through this workshop, we intend to profile ICALL and LA research in the Nordic countries and to provide a dissemination venue for researchers active in this area. The broad motivation of the NLP4CALL & NLP4LA workshop is to provide a meeting place for researchers working on language learning issues, including both empirical and experimental studies and NLP-based applications, and to bring together competences from these areas for sharing experiences and brainstorming the future of the field.

Elena Volodina, Gintarė Grigonytė, Ildikó Pilán, Kristina Nilsson Björkenstam and Lars Borin 2016. Preface. Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition at SLTC 2016. Linköping Electronic Conference Proceedings 130: i–viii.

Research background

Intelligent tools for language learning are within reach given the availability of key components: corpora, lexicons, tokenizers, lemmatizers, morphological analyzers, parsers, etc. (Nerbonne & Smit, 1996; Tufis, 1996). ICALL applications are based on (language-specific) tools that are used to process language samples (text, speech, words, etc.) and that have the generative power of applying the same analysis model to different language samples over and over again, being an infinite source of language “wisdom” (e.g. automatic error correction, automatic exercise generation, etc.). Depending upon the aim of an ICALL application, the above-named key software can be assembled in various ways, making use of their different features and thus facilitating diverse learning aims.

Nowadays various ICALL applications can support reading and writing activities (Heilman et al., 2006; Mitkov & Ha, 2003), vocabulary (Volodina et al., 2014a), grammar (Meurers et al., 2010; Reynolds et al., 2014) as well as pronunciation and listening skills (Wik & Hjalmarsson, 2007). However, very often these efforts remain prototypes and do not lead to fully functional systems that can be incorporated into educational establishments. To successfully build a full-fledged ICALL system, a wide spectrum of issues needs to be addressed and solved:

* collection and annotation of learner-specific data, such as learner-specific lexicons, grammar profiles, annotated essays, reading comprehension corpora, etc.
* incorporation of the results of (S)LA research to gain appropriate linguistic features in combination with pedagogically relevant criteria to base automatic evaluation/assessment on
* algorithms, methods, heuristic rules, etc. for data handling
* evaluation of tools, algorithms and programs with teachers and learners
* modeling of learners and learner progress for individualized learning
* feedback generation for encouraging progress on the learner side

As long as these areas are treated separately, the vision of a full-fledged system remains utopian. However, unless each of the outlined issues is solved or researched, there is no hope of making that utopia a reality. That is why it is important to create a network of researchers working on various tasks within ICALL so that the solutions they propose can be tested in other projects. The workshop creates a meeting space for sharing insights into ICALL problems, uniting efforts and creating a network of experts in the field.

This workshop series covers all Language Acquisition-relevant research areas as outlined above, including studies where NLP-enriched tools are used for testing (S)LA and pedagogical theories, and vice versa, where (S)LA theories/pedagogical practices are modeled in hands-on tools. This year our focus has been on how to transfer from small individual research projects to a full-scale application ready for use in educational establishments: What remains to be done? Which approach is the most effective? What time estimate is realistic? Do we have enough expertise? Which collaborations do we need to establish, and how? Do we lack manpower, financial support, or both?

The two invited speakers presented ICALL from two points of view: commercial and academic.

The first invited speaker, Jill Burstein, is a Research Director of the Natural Language Processing Group in Research & Development at Educational Testing Service in Princeton, New Jersey2. Her research interests span Natural Language Processing for educational technology, automated essay scoring and evaluation, discourse and sentiment analysis, argumentation mining, education policy, English language learning, and writing research. The intersection of her interests has led to two extensively used commercial applications for English L2 learners: E-rater®, ETS' automated essay evaluation application, and the Language Muse Activity Palette™, a new classroom tool under development targeting English learners that automatically generates language activities for classroom texts to support content comprehension. Jill Burstein is one of the most successful researchers within ICALL who, together with a group of bright researchers, made ICALL tools a reality for many teachers of L2 English. Her expertise and experience will be a highlight of the workshop.

The second invited speaker, Piet Desmet, is Full Professor of French and Applied Linguistics and Computer-Assisted Language Learning at KU Leuven and KU Leuven KULAK. He coordinates the imec-research team ITEC (Interactive Technologies), focusing on domain-specific educational technology with a main interest in language learning & technology.
He leads a range of research projects in this field, focusing on such topics as adaptive and personalized learning, input enhancement, intelligent feedback, and automated analysis and annotation of text corpora using natural language processing. He also coordinates the large-scale research project TECOL, which focuses on technology-enhanced collaborative learning. He has supervised more than 15 PhDs (finished and ongoing) and has published in journals such as Language Learning & Technology, System, ReCALL and CALL Journal. He has presented at many international conferences (CALICO, WORLDCALL, EUROCALL, UNTELE, EDMEDIA, etc.) and organized several international symposia. He was involved in the creation of two spin-offs in the field. All this makes him a renowned scholar in our field with theoretical as well as practical contributions to the integration of NLP into CALL.

The two speakers represented two different worlds: the first that of a commercial company, the second that of a university. As practice shows, most tools, solutions and technologies developed at a university remain prototypes, whereas commercial companies tend to take such solutions to the users. However, the two worlds depend on each other. Both invited speakers represented projects that over time have grown from small-scale initiatives into influential, trend-setting intelligent solutions in language learning.

2 The text is copied from

Previous NLP4CALL workshops

The first five editions of this workshop series have attracted participants from all over the world, including researchers from Australia, Canada, Central, South and Northern Europe, Russia and the USA. The workshops have shown the vast potential that Language Technology (LT) holds for language learning and, most importantly, the interest that LT researchers have in the domain of CALL. Previous workshop editions have covered numerous topics that can be grouped into

• research directly aimed at ICALL, such as the analysis of learner-produced texts and the generation of L2 learning materials
• practices demonstrating actual or potential use of existing Speech Technologies, NLP tools or resources for language learning, such as automatic essay grading or using speech synthesis in spelling exercises
• research aimed at development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. collecting and annotating ICALL-relevant corpora; developing tools and algorithms for readability analysis, selecting optimal corpus examples, etc.
• discussion of challenges, visions and research agenda for ICALL

Special focus has always been given to discussion of the above-mentioned themes for the Nordic languages. Submissions to the workshop editions have targeted a wide variety of languages, ranging from well-resourced languages (German, English, French, Russian, Spanish) to under-resourced ones (Estonian, Saami, Võro), among which several Nordic languages have been targeted: Danish, Estonian, Icelandic, Norwegian, Saami, Swedish, and Võro.

To date, the acceptance rate has varied between 50% and 77% (Table 1), the average being 66.5%. Although the acceptance rate is rather high, the reviewing process has always been very strict, with two to three double reviews per submission. This indicates that submissions to the workshops have always been of high quality.

Workshop year   Submitted   Accepted   Acceptance rate
2012            12          8          67%
2013            8           4          50%
2014            13          10         77%
2015            9           6          67%
2016            14          10         71.5%

Table 1. Submissions and acceptance rates, 2012-2016

Our many thanks go to our program committee, which consists of internationally acknowledged researchers from many countries and continents representing different competences within the ICALL and LA areas:

• Lars Ahrenberg, Linköping University, Sweden
• Florencia Alam, CONICET, Argentina
• Christina Bergmann, Centre National de la Recherche Scientifique, France
• Eckhard Bick, University of Southern Denmark, Denmark
• Lars Borin, University of Gothenburg, Sweden
• Antonio Branco, University of Lisboa, Portugal
• Jill Burstein, Educational Testing Service, USA
• Alex Cristia, Centre National de la Recherche Scientifique, ENS-DEC, EHESS, France
• Piet Desmet, KU Leuven Kulak, Belgium
• Simon Dobnik, University of Gothenburg, Sweden
• Thomas Francois, UCLouvain, Belgium
• Gintarė Grigonytė, Stockholm University, Sweden
• Anna Gudmundsson, Stockholm University, Sweden
• Jana Götze, KTH, Sweden
• Björn Hammarberg, Stockholm University, Sweden
• Katarina Heimann Mühlenbock, DART, Sahlgrenska Universitetssjukhuset, Sweden
• Sofie Johansson Kokkinakis, University of Gothenburg, Sweden
• Chigusa Kurumada, University of Rochester, USA
• Peter Ljunglöf, Chalmers Tekniska Högskolan, Sweden
• Staffan Larsson, University of Gothenburg, Sweden
• Montse Maritxalar, University of the Basque country, Spain
• Ellen Marklund, Stockholm University, Sweden
• Detmar Meurers, University of Tübingen, Germany
• Kristina Nilsson Björkenstam, Stockholm University, Sweden
• John K. Pate, The University at Buffalo, USA
• Martí Quixal, University of Tübingen, Germany
• Lena Renner, Stockholm University, Sweden
• Gerold Schneider, University of Konstanz, Germany
• Mathias Schulze, University of Waterloo, Canada
• Iris-Corinna Schwarz, Stockholm University, Sweden
• Philip Shaw, Stockholm University, Sweden
• Jennifer Spenader, University of Groningen, Netherlands
• Sofia Strömbergsson, Karolinska Institutet, Sweden
• Joel Tetreault, Yahoo! Labs, USA
• Trond Trosterud, Universitetet i Tromsø, Norway
• Cornelia Tschichold, Swansea University, UK
• Francis Tyers, The Arctic University of Norway, Norway
• Sowmya Vajjala, Iowa State University, USA
• Paul Vogt, Tilburg University, Netherlands
• Elena Volodina, University of Gothenburg, Sweden
• Torsten Zesch, University of Duisburg-Essen, Germany
• Robert Östling, Stockholm University, Sweden

Workshop organizers:
Elena Volodina, Ildikó Pilán, Lars Borin (University of Gothenburg)
Gintarė Grigonytė, Kristina Nilsson Björkenstam (Stockholm University)

Workshop website:

Acknowledgements: Financial support for the organization of the workshop has come from the University of Gothenburg, Språkbanken, and the Swedish Research Council through its funding of the conference organization (Dnr. 2016-00447).

References

Josefsson, Gunlög. 2016. . Svenska dagbladet. Retrieved 2016-02-22.

Heilman, M., Collins-Thompson, K., Callan, J. and Eskenazi, M. 2006. Classroom Success of an Intelligent Tutoring System for Lexical Practice and Reading Comprehension. ICSLP.

Detmar Meurers, Ramon Ziai, Luiz Amaral, Adriane Boyd, Aleksandar Dimitrov, Vanessa Metcalf, Niels Ott. 2010. Enhancing Authentic Web Pages for Language Learners. Proceedings of the 5th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL-HLT 2010, Los Angeles.

Migrationsverket. 2016. http://www.migrationsverket.se/Om-Migrationsverket/Statistik/Aktuell-statistik.html

Mitkov, R. and Ha, L.A. 2003. Computer-Aided Generation of Multiple-Choice Tests. Proceedings of the HLT-NAACL 2003 Workshop on Building Educational Applications Using Natural Language Processing, 17-22.

Nerbonne, J. and Smit, P. 1996. GLOSSER-RuG: In Support of Reading. COLING-96. The 16th International Conference on Computational Linguistics. Proceedings, vol. 2, 830-835. Copenhagen: Center for Sprogteknologi.

OECD. 2013. OECD Skills Outlook 2013. First Results from the Survey of Adult Skills. http://skills.oecd.org/skillsoutlook.html

PIAAC. 2013. http://www.oecd.org/site/piaac/

Robert Reynolds, Eduard Schaf and Detmar Meurers. 2014. A VIEW of Russian: Visual Input Enhancement and adaptive feedback. Proceedings of the 3rd Workshop on NLP for Computer-Assisted Language Learning at the 5th Swedish Language Technology Conference, Uppsala University, Sweden.

Skolverket. 2014. http://www.skolverket.se/skolutveckling/larande/nyanlandas-larande

SCB, Statistiska centralbyrån. 2013. Tema utbildning, rapport 2013:2, Den internationella undersökningen av vuxnas färdigheter. http://www.scb.se/statistik/_publikationer/UF0546_2013A01_BR_00_A40BR1302.pdf

Tufis, D. 1996. CALL: The Potential of Lingware and the Use of Empirical Linguistic Data. COLING-96. The 16th International Conference on Computational Linguistics. Proceedings, vol. 2, 1010-1011. Copenhagen: Center for Sprogteknologi.

Preben Wik & Anna Hjalmarsson. 2009. Embodied conversational agents in computer assisted language learning. Speech communication 51 (10), 1024-1037
