Language and Linguistics in a Complex World Data, Interdisciplinarity, Transfer, and the Next Generation
Total Page:16
File Type:pdf, Size:1020Kb
ICAME 41 Heidelberg Digital Conference Language and Linguistics in a Complex World Data, Interdisciplinarity, Transfer, and the Next Generation The International Computer Archive of Modern and Medieval English Annual Conference Heidelberg University and University of Cologne, Germany May 20-23, 2020 EXTENDED BOOK OF ABSTRACTS Conference Organizers: Beatrix Busse, Ingo Kleiber, Nina Dumrukcic, Franziska Wagner, Ruth Möhlig-Falke, and Kellie Gonçalves. Edited by Beatrix Busse, Nina Dumrukcic, and Ruth Möhlig-Falke ICAME41 Heidelberg Digital Conference Heidelberg University and University of Cologne, Germany Published by University and City Library (USB) Cologne Editors: Beatrix Busse Nina Dumrukcic Ruth Möhlig-Falke Cologne University Publication Server KUPS Universitäts- und Stadtbibliothek Köln Universitätsstr. 33 50931 Köln https://kups.ub.uni-koeln.de/ Conference organized with support from DFG Deutsche Forschungsgemeinschaft (German Research Foundation) John Benjamins Publishing Company University of Cologne This publication contributes to the Open Access movement by offering free access to its articles and permitting any users to read, download, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software. The copyright is shared by authors and the University of Cologne to control over the integrity of their work and the right to be properly acknowledged and cited. The name of the author must be mentioned at the place of use. The work may be distributed, but not altered or remixed. The work may not be used for commercial purposes. To view a copy of this license, visit: https://creativecommons.org/licenses/by-nc- nd/3.0/de/deed.en Preface ICAME conferences aim to provide the research community with a view of the state-of-the-art in English corpus linguistics and corpus linguistics in general. For ICAME 41 our aim was to go beyond this: we wanted to discuss how we can take corpus linguistics out of its comfort zone and realize its (inter- )disciplinary potential. Given the great societal challenges we are currently experiencing, Harari (2019, p. 266) has claimed that “the heat is on” in teaching and education to find solutions. As a result of, for example, (hyper)globalization, digitization, and developments in artificial intelligence and machine-learning, including big-data research, we are faced with a growing global and local complexity and interdependence of matter, lives, people and things, which also includes language. However, we have not even begun to understand how all of these issues and concepts will interact (humanely, sensibly, peacefully and sustainably) under the new circumstances of rapid change - nor have we yet considered what role linguistics and corpus linguistics may have to play in the solution of global challenges, and how education and teaching will consequently have to be transformed in order to do this. Language is everywhere, we use language(s) every day, and language and discourse shape our thinking as well as our lives. And yet, the general public has little knowledge of what linguistics and corpus linguistics is and does. Even though (historical) corpus linguistics has produced pioneering research findings on (quantitative and qualitative) linguistic and other semiotic patterns of language use for more than half a century, and has compiled various kinds of groundbreaking synchronic and diachronic language corpora and developed software tools, the impact of corpus linguistics is still fairly low and its social functions have not been sketched, let alone realized. Therefore, we think there is a need to transfer the methods and insights of corpus linguistics and its related sub-branches to society more forcefully than has been attempted in the past. This involves making accessible the excellent digital research on language, as well as the software, tools and methods of analyzing linguistic data big and small, past and present. Corpus linguistics needs to become more interdisciplinary to contribute to solving the grand societal challenges and to emphasize that language is the crucial social and cultural factor in human 1 interaction. At the same time, the corpus linguistic community needs to take on responsibility for educating and training the next generation, with these challenges in mind as they do so. Hence, the theme we chose for this conference was Language and Linguistics in a Complex World: Data, Interdisciplinarity, Transfer, and the Next Generation. Besides facilitating the exchange and discussion of cutting-edge research, our aims were as follows: • make participants reflect upon the relevance of their research for the community and society generally, and consider how it can be transferred through, for example, communication, counseling or even patent management; • make participants think about ways in which we can join forces to find answers to the grand societal challenges, which are always interdisciplinary and can only be addressed in teams. Applying our methodologies and approaches to ‘real-world problems’ and communicating well the value that corpus linguistics can bring to different areas and domains will also facilitate the renewal of corpus linguistics as a methodology/discipline; • critically discuss the interplay between artificial intelligence and corpus linguistics and address the growing divide between ‘traditional’ corpus linguistics and primarily tech-driven emerging fields such as ‘data science’ and ‘natural language processing’; • set up topic-related working groups of senior and junior experts that address these aims and volunteer, for example, to prepare innovative (digital) teaching and learning material and environments for the next generation of students of corpus linguistics and junior researchers. The process of moving the conference into the digital space and what it takes to organize a digital conference is described in more detail in Busse and Kleiber (2020). Instead of publishing a book of abstract as is the norm for academic conferences, we wanted our participants to describe their work in a compact yet informative way. This is a collection of papers, work-in- progress reports, and other contributions that were part of the ICAME41 2 digital conference. We would like to sincerely thank Heidelberg University, the University of Cologne, as well as the DFG Deutsche Forschungsgemeinschaft (German Research Foundation) and the John Benjamins Publishing Company for their support. Beatrix Busse and the ICAME41 team References Busse, B., & Kleiber, I. (2020). Realizing an online conference: Organization, management, tools, communication, and co-creation. International Journal of Corpus Linguistics, 25(3), 322-346. 3 Ajšić, Adnan American University of Sharjah [email protected] Type of Contribution: Work-in-Progress Report Large-scale lexical patterns in discourse as indicators of ideologies Comparing exploratory factor analysis and topic modeling Abstract Recent (lexical) approaches to identification of discourses (e.g. Baker et al., 2008; Partington, 2010), and language ideologies in particular (e.g. Subtirelu, 2013, 2015; Vessey, 2017), focus on the application of quantitative corpus- linguistic techniques to large data sets as a way to ensure more objective sampling methods and replicability of analytical procedures, as well as to minimize researcher inference. In addition to studies applying exploratory factor analysis (Ajšić, 2015, 2016, 2021; Berber Sardinha, 2019; Fitzsimmons- Doolan, 2014; cf. Biber, 1988), corpus-based discourse analysis research, as well as research in neighboring disciplines, increasingly relies on a similar probabilistic, algorithmic technique called topic modeling (Brookes & McEnery, 2019; DiMaggio, Nag & Blei, 2013; Jaworska & Nanda, 2016; Murakami et al., 2017; Törnberg & Törnberg, 2016a, 2016b). Based on a downsampled research corpus (1,118,454 tokens from 1,257 articles) resulting from analyses of two comprehensive, specialized research (11,656,247 tokens from 16,148 articles) and reference (22,493,804 tokens from 37,227 articles) corpora comprising articles from select Serbian newspapers and magazines published between 2003 and 2008, this study uses WordSmith Tools (Scott, 2014) and SPSS (IBM, 2012), on the one hand, and MALLET (McCullum, 2002), on the other, to compare the relative effectiveness of exploratory factor analysis and topic modeling for identifying large-scale patterns in discourse. 4 As is well known, both approaches are data-driven and more or less automated and therefore reasonably objective even though they involve subjective decisions at certain points in the analytical process; both offer data amenable to a variety of follow-up synchronic and diachronic analyses, as well as visualization; and both provide an insight into the heteroglossia (multivoicedness) of texts. Exploratory factor analysis, however, is shown to be a more reliable, if also more challenging, methodological solution for large-scale lexical analyses of discursive profiles of unstructured corpora targeting specific research questions (cf. Murakami et al., 2017). Preliminary findings include the following: • topic modeling (at least using MALLET) is more automated and less cumbersome to run • although topic modeling is non-deterministic and therefore not entirely replicable, topic models can still be relatively stable and useful • topic modeling produces results similar and complementary to results produced by exploratory factor analysis, and • topic modeling can account for a portion