Exploring the use of English on Instagram in the Finnish Capital Region:

Spatial and Temporal Perspectives

Niko Huhtala
Master’s Thesis
English Studies
Faculty of Arts
University of Helsinki
March 2021

Faculty: Faculty of Arts
Degree Programme: Master’s programme in English Studies
Study Track: Master’s programme in English Studies
Author: Huhtala, Niko
Title: Exploring the use of English on Instagram in the Finnish Capital Region: Spatial and Temporal Perspectives
Level: Master’s Thesis
Month and year: March 2021
Number of pages: 59

Abstract

This MA thesis explores the use of English on Instagram in the Finnish capital region, which consists of the municipalities of Helsinki, Espoo, Vantaa and Kauniainen. Building on previous research on Virtual Linguistic Landscapes and English as a lingua franca, this thesis investigates the extent to which English is used in the study area and how different types of areas and locations differ in terms of English use. For this purpose, I use geotagged social media data and methods from the fields of natural language processing and geoinformatics. Firstly, I analyse the general linguistic make-up of the study area to understand the use of English in relation to other languages. Secondly, I analyse and compare how the use of English and Finnish is spread geographically across the Finnish capital region on Instagram and identify spatial clusters by means of spatial autocorrelation analysis. Lastly, I seek to provide further insights into the different types of locations where English, Finnish and other languages are used by using the Corine Land Cover inventory for categorising different types of locations.

The results of this study show that the English language has a very strong presence as the second most used language in the Virtual Linguistic Landscape of the Finnish capital region. English is used especially often by users who use more than one language on Instagram. The spatial patterns of English use show that the language is used particularly often in the Helsinki city centre, western Helsinki and eastern Espoo and least in north-eastern Helsinki and Vantaa. English has a strong presence in essentially all the studied location types, especially in commercial and urban contexts. The relative proportions of English use are highest at airport areas and lowest in various sport and leisure facilities, where Finnish is used significantly more than any other language. In the analysis, I also include frequent observations on Finnish and other languages, which provide further insights into the rich Virtual Linguistic Landscape of the capital city region.

Keywords: Virtual Linguistic Landscapes, English as a lingua franca, geotagged social media, Instagram, Corine Land Cover, multilingualism, spatial autocorrelation
Where deposited: Helsinki University Library

Table of Contents

1 Introduction and Research Questions
2 Theoretical Framework
2.1 Physical and Virtual Linguistic Landscapes
2.2 English as a Lingua Franca and Multilingual Practices
2.3 The English Language in Finland
2.4 Ethical Considerations in Social Media Research
3 Data and Methods
3.1 Geotagged Instagram posts
3.2 Corine Land Cover Inventory
3.3 QGIS, GeoDa and Excel
3.4 Spatial Autocorrelation
4 Analysis and Results
4.1 General statistics and temporal patterns
4.2 The Spatial Distribution of English and Finnish
4.3 Corine Land Cover Classes
4.3.1 Urban environments
4.3.2 Transportation
4.3.3 Recreational and natural areas
5 Discussion
5.1 Limitations of the study
5.2 Research Questions
5.3 Suggestions for further research
6 Conclusion
7 References
8 Appendices

Tables:
Table 1. Top 16 languages – General statistics.
Table 2. Multilingual users.
Table 3. CLC classes - Distribution of English.
Table 4. CLC classes - Distribution of Finnish.
Table 5. CLC classes - Distribution of languages other than English and Finnish.

Figures:
Figure 1. Main axis & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.
Figure 2. Daily temporal patterns in the Finnish capital region. Line: average number of sentences per hour. Grey area: standard deviation.
Figure 3. Weekly temporal patterns in the Finnish capital region. Line: average number of sentences per hour. Grey area: standard deviation.
Figure 4. Population density in the Finnish capital region in 2016.
Figure 5. Spatial distribution of English.
Figure 6. Spatial distribution of Finnish.
Figure 7. Spatial distribution of English – Relative difference from average (38% of all posts).
Figure 8. Spatial distribution of Finnish – Relative difference from average (51% of all posts).
Figure 9. Spatial distribution of English – statistically significant spatial clusters.
Figure 10. Spatial distribution of Finnish – statistically significant spatial clusters.
Figure 11. Total number of sentences in each CLC class.
Figure 12. Relative differences from sentence averages (English 36.7%, Finnish 50.6%, other languages 12.7%) in each CLC class.
Figure 13. Urban environments. Differences from the sentence averages.
Figure 14. Urban environments. Main axes & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.
Figure 15. Daily and weekly temporal patterns in Commercial units. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 16. Daily and weekly temporal patterns in Urban fabric. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 17. Daily and weekly temporal patterns in Industrial units. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 18. Daily and weekly temporal patterns in Sport and leisure areas and facilities. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 19. Transportation. Differences from the sentence averages.
Figure 20. Transportation. Main axes & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.
Figure 21. Daily and weekly temporal patterns in Road and rail networks and associated land. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 22. Daily and weekly temporal patterns in Airports. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 23. Daily and weekly temporal patterns in Port areas. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 24. Recreational and natural areas. Differences from the sentence averages.
Figure 25. Recreational and natural areas. Main axes & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.
Figure 26. Daily and weekly temporal patterns in Green urban areas. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 27. Daily and weekly temporal patterns in Forests and semi-natural areas. Line: average number of sentences per hour/day. Grey area: standard deviation.
Figure 28. Daily and weekly temporal patterns in Water bodies and wetlands. Line: average number of sentences per hour/day. Grey area: standard deviation.

Appendices

Appendix A. Top 15 location labels in Commercial units.
Appendix B. Top 15 location labels in Urban fabric.
Appendix C. Top 15 location labels in Industrial units.
Appendix D. Top 15 location labels in Sport and leisure areas and facilities.
Appendix E. Top 15 location labels in Road and rail networks and associated land.
Appendix F. Top 15 location labels in Airports.
Appendix G. Top 15 location labels in Port areas.
Appendix H. Top 15 location labels in Green urban areas.
Appendix I. Top 15 location labels in Forests and semi-natural areas.
Appendix J. Top 15 location labels in Water bodies and wetlands.
Appendix K. Top 15 locations in Mine, dump and construction sites.
Appendix L. Top 15 locations in Agricultural areas.

1 Introduction and Research Questions

Instagram is a widely popular social media platform where users share photographs, videos, and captions. Like many other social media platforms, Instagram allows users to share their location in posts at user-defined points-of-interest (POI) (Hochmair et al., 2018). This practice is also known as geotagging. Access to geotagged social media data has opened a world of novel opportunities for researching the use of languages online, as geotagged data provides significant amounts of information about how people use languages in specific locations (e.g. Laitinen et al., 2018; Coats, 2019; Hiippala et al., 2019; Hovy et al., 2019; Hiippala et al., 2020). Besides having a purely spatial dimension, most spatial phenomena also have a temporal dimension (e.g. Crampton et al., 2013; Tenkanen, 2017; Hiippala et al., 2019), making such data even more intricate. With the help of computational methods, large amounts of geotagged social media data can be used effectively for various research purposes, such as researching multilingualism and the use of English online in new ways.

In recent years, social media platforms such as Instagram and Twitter have been popular sources of data among linguists studying online multilingualism and the significance of English as the language of online communication (Lee, 2016). However, the relationship between Virtual Linguistic Landscapes (Ivkovic & Lotherington, 2009) found on social media and their corresponding physical environments is a relatively new area of interest. For example, the effect of location on the choice of language on social media is a topic that has been explored in recent studies. Hiippala et al. (2019) found that Finnish Instagram users write half of their posts in English when their post is geotagged at the Helsinki Senate Square, which is a significant cultural landmark and tourist attraction in Finland. Using the same set of data, Loikkanen (2020) found that mentioning the Senate Square or Finland in the post significantly increased the odds of choosing English over Finnish, whereas mentioning popular Finnish events and holidays decreased the odds. The findings of these studies are notable examples of situations where a specific location and context have observable influences on language choices on social media.

Both of the above-mentioned studies focus on a relatively narrow geographical area in an urban context, but broader approaches have also been taken. For example, Hiippala et al. (2020) used geotagged data from Twitter to map the linguistic diversity of the whole of Finland at the level of regions and municipalities, and the results revealed distinct regional patterns in linguistic richness and the uses of different languages. Coats (2019) used Twitter data for studying the relationships between language choice, multilingualism, gender and location in the Nordic countries and found that location and gender appear to have an effect on the languages used. By demonstrating novel visualization methods for studying geographic language variation across the entire continent of Europe, Hovy et al. (2019) showed how new methods and geotagged social media data can be used to complement traditional research on regional language variation. The results and methods of these large-scale studies offer an excellent basis for comparisons with the findings of smaller-scale studies. These studies also demonstrate the benefits that linguists can gain from collaborating with scholars in the field of geography and adopting new methodologies.

The purpose of the current study is to continue the investigation of Virtual Linguistic Landscapes found on geotagged social media, in this case Instagram, with a particular focus on the use of English in the Finnish capital region, which consists of the cities of Helsinki, Espoo, Vantaa and Kauniainen. This exploration is supported by methods from the fields of natural language processing and geoinformatics. As previous studies (Hiippala et al., 2019; Hiippala et al., 2020) have shown, the capital city region has a rich and diverse Virtual Linguistic Landscape, and English and Finnish are widely used in the area. Therefore, Finnish, the first official language, as well as other languages found in the data are also given attention throughout this study, both to allow comparisons and because focusing exclusively on the use of English would provide only a limited point of view on the inherently multilingual nature of online communication and English as a lingua franca (Lee, 2016; Cogo, 2017; Sangiamchit, 2017).

I approach the topic from three different perspectives. Firstly, I explore the general linguistic make-up of the geotagged Instagram data to provide a wider context for the analysis of the use of English. Secondly, I analyse and compare how the use of English and Finnish is spread geographically across the Finnish capital region on Instagram. Lastly, by using the Corine Land Cover inventory, I seek to provide further insights into the different types of locations where English, Finnish and other languages are used. The research questions for this study are as follows:

1. To what extent is English used on Instagram in the Finnish capital region?
2. How is the multilingual nature of English as a lingua franca reflected in the geotagged Instagram data?
3. How do different areas and types of locations in the Finnish capital region differ in terms of English use on Instagram?

In Chapter 2 I present the theoretical framework relevant to the current study. In Chapter 3 I introduce the data and methods used. In Chapter 4 I present the analysis and results. In Chapter 5 I first discuss the limitations of the current study before moving on to a discussion on the key findings of this study and suggestions for further research. In Chapter 6 I present the conclusions of this study.

2 Theoretical Framework

In this chapter I present the theoretical framework relevant to this study, which explores the use and spread of the English language on Instagram in the Finnish capital region. By providing an overview of previous research, relevant theoretical perspectives and methodological issues, I demonstrate how geotagged social media data is still a somewhat untapped source of data that can be effectively used for studying Virtual Linguistic Landscapes and English as a lingua franca.

The starting point of this chapter is the study of Linguistic Landscapes (LL), which has over time been extended to also include Virtual Linguistic Landscapes (VLL) since physical places and virtual spaces are becoming increasingly intertwined. Closely related to the study of LL and VLL, the topics of urban and online multilingualism and English as a lingua franca (ELF) are then discussed. To further contextualize the current study, I will also discuss the use and importance of English in the Finnish context. Lastly, I discuss ethical considerations relevant to social media research.

2.1 Physical and Virtual Linguistic Landscapes

The study of LLs is an interdisciplinary field of research that draws from several fields of study, such as applied linguistics, sociolinguistics, cultural geography, sociology and anthropology (Shohamy et al., 2010). Various definitions have been given by scholars to describe this diverse research domain, and Gorter (2018) has proposed the following as the latest and most comprehensive definition: ‘the field of linguistic landscapes [...] attempts to understand the motives, uses, ideologies, language varieties and contestations of multiple forms of ‘languages’ as they are displayed in public spaces’ (pp. 41-42). The study of LL encompasses all kinds of signs and any form of writing on display issued by public authorities as well as individuals, companies, associations and so forth (Shohamy et al., 2010). One of the main purposes of LL research is to analyse and characterise multilingual signage in public spaces and its role in defining the cultural, socioeconomic, political and sociolinguistic character of a given territory (Shohamy et al., 2010).

Most LL research has taken place in metropolitan areas around the world (hence the alternative name ‘multilingual cityscape’ suggested by Gorter, 2006), but smaller and more remote locations have also been studied as LL research has evolved. The most common method of data collection in LL studies has been photography and presenting photographic material (Gorter, 2018). LL research has often focused on relatively small and easily observable areas such as shopping streets (e.g. Cenoz & Gorter, 2006), shopping malls (e.g. Coluzzi, 2017) or transportation hubs (e.g. Soler-Carbonell, 2016; Henricson, 2020). However, many studies have also sourced data from multiple locations for larger surveys. For example, Tufi & Blackwood (2016) gathered data from 20 sites in cities on the coasts of France and Italy, Pietikäinen et al. (2011) researched the LLs of smaller villages in Norway, Sweden, Finland and Russia, and Hult (2014) collected data in Texas by capturing the LL on video from a moving train.

In his paper on the methodological challenges of LL research, Blackwood (2015) argues that during the first years of LL research two approaches became dominant. The quantitative approach has been somewhat reduced to the counting of signs, whereas qualitative research focuses on the analysis of selected signs from which wider conclusions can then be drawn. Blackwood (2015) argues that adopting only one approach in LL research can be problematic and lead to inaccurate conclusions, and that a hybrid approach combining statistical data with qualitative analysis is therefore necessary for future research in order to gain a better understanding of the LL of a given space. Furthermore, technological developments such as augmented reality and omnipresent digital screens further complicate the study of LL. In general, it has become clear that methodological innovations and an expansion of the current approaches are needed for LL research to keep up with the modern age (Gorter, 2018).

Originally coined by Ivkovic & Lotherington (2009), the concept of Virtual Linguistic Landscapes (VLL) brought forth a new area of interest closely related to LL. Whereas LL describes the linguistic ecology of urban environments, VLL attempts to describe the interaction of languages in cyberspace. As with LL, VLL research has focused on linguistic content generated by both individuals and authorities. In their study on plurilingual interaction on YouTube, Thorne & Ivkovic (2015) demonstrated how even relatively simple social media platforms such as YouTube comment fields make visible the ordinariness of multilingual practices. Berezkina (2018) studied language policies and linguistic homogenization of the VLL of the public sector in Norway and illustrated how English has superseded the use of minority languages on state websites.

Although the digital world is described as conceptually grounded in the physical world, Ivkovic & Lotherington (2009) argue that physical linguistic landscapes are always local, whereas virtual interaction is mediated and delocalized. However, with the swift development of technology and the Internet, the concept of virtual space and its relationship to the physical world is changing. Kellerman (2010) has suggested that cyberspace can nowadays in fact be considered “simultaneously an entity of its own and an entity converged with real space” (p. 2994). This notion of “double space” is an issue mainly considered in the field of human geography (Kellerman, 2014), but the same concept offers interesting possibilities for sociolinguistic enquiries, as digital screens and smartphones containing ever-changing virtual language on display have become ubiquitous in physical spaces (Gorter, 2018).

These changes in our understanding of space have gained increasing attention in the studies of LL and VLL. Maly & Blommaert (2019) argue that “City-scapes cannot be grasped from a merely synchronic perspective. A diachronic perspective, understanding the landscape as a multilayered historically and socially constructed space on the online/offline nexus, is thus necessary” (p. 19). Maly & Blommaert demonstrate their approach by analysing how websites tied to a specific location affect our understanding of the physical place. Similarly, Hiippala et al. (2019) have proposed that “geotagged social media posts anchored to a specific geographic location act as an extension of the linguistic landscape of the corresponding physical environment” (p. 292). For many, the use of social media has become an inseparable part of day-to-day life, and therefore social media holds great potential for LL and VLL research. One of the benefits of geotagged social media data is that the large amount of user-generated content can help us to effectively sense the highly dynamic interaction of humans and languages on both large and small scales across time and space (Steiger et al., 2016).

Both physical and virtual LLs are always very dynamic in nature (Papen, 2012; Soler-Carbonell, 2016; Hiippala et al., 2019; Hiippala et al., 2020). The effects of the varying degrees of mobility of people are especially important to consider when we try to understand the VLL and LL of a given space. As pointed out by Soler-Carbonell (2016), “[t]he very fact that particular types of people with particular linguistic profiles move around these spaces co-creates the possibility for particular language resources to emerge more prominently in some settings and not in others” (pp. 18-19). If we consider the VLL of a given space an extension of the LL, then this notion can be extended to geotagged social media as well, albeit on a different scale due to the nearly unrestricted accessibility of social media from anywhere in the world. As Steiger et al. (2016) have suggested, “The spatial and social structures of a city as well as the dynamic nature of human activities result in certain collective and individual human behaviour patterns. Social media data can help to “sense” this type of information from urban environments in an in-situ manner” (p. 238).

To conclude, it has become clear that the future of LL and VLL research lies in the combination of online and offline perspectives (Gorter, 2018; Maly & Blommaert, 2019). Research methodology has evolved continuously since the birth of LL and VLL and continues to develop as innovations and technological developments change our understanding of the world around us. At the same time, these changes give us new means for exploring the use of languages in public spaces. Geotagged social media, which can be considered both local and accessible from anywhere around the world, is a prime example of a modern yet rather untapped source of data that offers new points of view on language practices taking place in the multidimensional and highly dynamic nexus of the online and offline worlds.

2.2 English as a Lingua Franca and Multilingual Practices

Mauranen (2017) defines ELF as “a contact language between speakers or speaker groups when at least one of them uses it as a second language” (p. 8). According to Jenkins (2017), the defining feature of English nowadays is that the language has spread across the globe more widely than any other language before. The widespread use of ELF has resulted in exceptionally heterogeneous and varied uses of the language, as English is used extensively in politics, business, and academia as well as by tourists, immigrants and more or less everyone over digital media. There are also more people who speak English as a second language than there are people who speak it as a first language. However, it is important to point out that very often English is not the only language present in ELF exchanges, as speakers continuously adapt to one another in their interaction and draw on a wide range of multi- and translingual resources at the same time. Increasing attention has been given in ELF research to the importance of multilingualism “as ELF’s overarching framework rather than one of its characteristics, with translanguaging seen as an intrinsic part of ELF communication” (Jenkins, 2017, p. 2). Translanguaging (Wei, 2018) means using one’s entire linguistic repertoire without regard for the socially and politically defined boundaries of languages. Taking the multilingual nature of ELF into consideration is considered essential when we try to understand the way English is used in today’s world (Cogo, 2017).

Previous research on ELF has included a wide range of topics on both offline and online uses of English (Jenkins et al., 2017). In her paper on the use of ELF in the European context, Sherman (2017) argues that ELF is widely used for solving communicative and sociocultural problems in Europe, but also emphasizes the distinct differences between regions and individual national contexts. Similarly, Sangiamchit (2017) demonstrates how English is used in the online context to fulfil diverse and highly context-specific communication purposes. As discussed by Cogo (2017), the most recent ELF research has put increased focus on the situatedness of multilingual linguistic practices within specific contexts. Two common inferences that can be drawn from ELF research are that, firstly, the use of English is tremendously widespread and enormously diverse, and secondly, English use is always specific to the location and context in which it is used.

Pennycook & Otsuji (2015) emphasize the idea that language use is always local and emerges first and foremost from the activities it performs as people use creative and mixed linguistic practices across cultural, historical, and political borders. Pennycook & Otsuji (2015) argue that mixed and varied language use is the norm in today’s world, and for this they have coined the term metrolingualism: “By viewing language use as profoundly bound up with space and activity, metrolingualism also raises questions about how we understand languages not only in relation to each other but also in relation to all that is going on in a particular place” (p. 182). Pennycook & Otsuji (2015) stress the importance of on-the-spot observation and argue against the enumeration of speakers or languages. As mentioned earlier, however, Blackwood (2015), for instance, has advocated the adoption of hybrid approaches in LL research, where statistical data is combined with qualitative analysis, as essential for understanding LLs in depth. If we consider the VLL an extension of the LL as discussed in the previous section, then geotagged social media and other location-specific data found online hold great potential for studying when, where, and why English and other languages are used in specific contexts.

The significance of English as the language of the Internet is a topic that has been discussed and studied for decades (Lee, 2016). Different uses of English continue to grow on the Internet, as web users see English as both a natural and a neutral choice that allows them to reach a wide global audience. Lee (2016) argues that the concurrent use of English and other languages also functions as a projection of a modern cosmopolitan identity, especially for the younger generations. Furthermore, the shift to English in online communication among people with a common language speaks volumes about the social status and perceived value of English in today’s world. From the user’s perspective, English dominates the most common globalized social media platforms, and even people who do not consider themselves especially fluent in English seem to be accustomed to using the language online (Lee, 2016).

Different social media platforms such as Instagram, Twitter and Facebook hold great potential for studying ELF and online multilingualism on both small and large scales due to the ever-increasing amount of user-generated content. For example, a study by Laitinen et al. (2018) explored the prospects of using Twitter data as a diagnostic tool in evaluating the changing roles of English as a lingua franca, and the results showed that English accounts for almost a third of all Twitter posts in the Nordic countries. Another study on language choice and gender in the Nordics (Coats, 2019) revealed a consistent gender-based pattern in the uses of the principal national language and English, as males seemed to prefer the national language and females used more English. A study of regional language variation in Germany based on data from the anonymous mobile chat application Jodel (Purschke & Hovy, 2019) revealed the existence of distinct “digital regiolects”, which share linguistic features with their regional offline equivalents, but which also show substantial differences in structure and dynamics. All the above-mentioned studies offer big data perspectives on the ways in which languages are used in specific locations online.

As already mentioned in the introduction, the studies by Hiippala et al. (2019) and Loikkanen (2020) showed how a specific location and context can have an observable influence on language choices on social media. In a multimodal study on geotagged Instagram data from the Paris Orly airport, Blackwood (2019) explored approaches to self-representation on social media. The results indicate that French passengers often use English in their photo arrangements and captions as a way of presenting themselves as mobile global citizens. One common reason for the above-mentioned results could be that distinct locations such as popular tourist attractions and airports are considered “international” locations and, consequently, the posts that are geotagged at these places function as a means of connecting with a prestigious global culture rather than the local culture; as a result, English is used more often than on average. Whatever the reasons behind these language choices might be, it can be argued that the choice of language on social media is always intentional, and location is one factor affecting that choice.

As demonstrated by the studies discussed so far (e.g. Blackwood, 2019; Hiippala et al., 2019; Maly & Blommaert, 2019; Hiippala et al., 2020), new research methodology and data from social media can help us gain a more comprehensive understanding of ELF and multilingual language practices. Information found in VLLs, such as geotagged social media data, can and should be used to complement research endeavours in the corresponding physical world to better understand the complex nature of languages in modern urban spaces.

2.3 The English Language in Finland

The English language has a strong presence in Finland, as the language is used to a great extent in research, higher education, business, and the media, and for many it is also the first foreign language taught in school (e.g. Taavitsainen & Pahta, 2003; Leppänen & Nikula, 2007). Taavitsainen & Pahta (2008) argue that the language is not only used in Finland as a lingua franca for international communication, but has also been widely appropriated for local uses and meanings. Taavitsainen & Pahta suggest that the English language is increasingly becoming “a natural part of language resources for ” (2008, p. 37). Nowadays, English can in fact be considered the most important foreign language in Finland (Leppänen & Nikula, 2007).

According to the largest nationwide survey on the English language in Finland to date (Leppänen et al., 2011), the language is seen, heard, and used constantly in the everyday lives of Finnish people both at work and in free time. In public spaces, English is encountered most frequently in the street, shops, stores, restaurants, cafes and public transport and least often in institutional settings, offices, libraries, churches, and hospitals. The presence of English is especially evident at workplaces and educational establishments, as nearly 80% of respondents in working life encountered English at work and around 70% of students encountered it in their place of study. In general, the younger and more urban a Finnish person is, the more important English is to them, whereas the attitudes of the oldest generations are somewhat less positive. Nevertheless, English is a legitimate linguistic resource for self-expression and communication in a variety of different cultural and social contexts among the Finnish people. It is also fair to assume that much has changed since the study by Leppänen et al. (2011) was published and that the role of English has by no means diminished. For example, various social media platforms, where people encounter and use English on a day-to-day basis, are more popular than ever before.

The soaring popularity of various social media platforms and the consumption of online content in recent years have led to an increasing amount of research on the use of English in Finland in online contexts. As an example of a smaller-scale study, Kytölä (2013) studied multilingual practices on Finnish football discussion forums and found that although the studied virtual spaces were essentially aimed at Finnish-speaking football fans, frequent and marked use of English was regularly present in the discussions for different purposes. On the other hand, a large-scale study by Laitinen et al. (2018) found that English accounts for 26% of the languages used on Twitter in Finland, which is slightly less than in other Nordic countries, but still clearly more than any other language besides Finnish. Previous studies (Hiippala et al., 2019; Hiippala et al., 2020) have also shown that the capital city Helsinki has a rich VLL where Finnish and English have a very strong presence and both are used by locals as well as foreigners, even though it is fair to assume that most of the users do not speak English as their first language.

Large-scale geotagged social media data holds great potential for further studies in multilingualism and the use of English in Finland, especially in urban contexts. As Shohamy et al. (2010) have proposed, “to an ever-increasing extent, cities are places where different cultures, languages and identities interact; they are also places where this interaction can be observed” (p. 3). The multicultural and diverse Finnish capital region can thus be considered an excellent site for studying urban multilingualism and the use of English as a lingua franca in Finland in the context of social media.

2.4 Ethical Considerations in Social Media Research

An aspect that somewhat complicated and redefined research on social media was the introduction of the General Data Protection Regulation (GDPR) in 2018. According to Kotsios et al. (2019), some of the current problems include difficulties in pseudonymizing the data, the impossibility of guaranteeing respondent anonymity, the difficulty of sending information to millions of users and the inclusion of data about individuals not involved in the study. In the context of social media research, these matters are not easily solved, and for some of them there are no standard procedures, as the principles in the regulation can be rather difficult to interpret and apply (Kotsios et al., 2019).

Using data collected from social media demands caution, and there are certain ethical considerations that must be taken into account, as the data has been collected from people who have not explicitly agreed to participate in the study (Moreno et al., 2013). The data used in the current study is highly personal and sensitive and has been processed accordingly. The risk level of the current study is, however, relatively low, as the data is studied and evaluated based only on the languages used in the captions and the associated geotags. For the purposes of the current study, the actual written content of the captions, photographs and other content of individual posts are omitted. Likewise, I have made sure that captions and all other parts of the data are treated anonymously throughout the process and that the data is erased afterwards. Furthermore, the Digital Geography Lab has taken care of GDPR-compliant use and storage of the data.

3 Data and Methods

In this chapter I present the data and methods used in the current study. In Section 3.1 I report the origins of the geotagged Instagram data and explain the initial processing of the data. In Section 3.2 I discuss the Corine Land Cover inventory which is used in the analysis of the geotagged Instagram data for categorizing different types of locations. In Section 3.3 I give a brief overview of the software used for processing the data, namely QGIS, GeoDa and Excel. In Section 3.4 I discuss the method of spatial autocorrelation analysis used in this study.

3.1 Geotagged Instagram posts

Geotagging is a widely used means of associating virtual content with geographical locations and the feature is found on various social networking platforms such as Instagram and Facebook. Geotagged data contain explicit geographic references in the form of latitude and longitude coordinates but may also contain other types of information such as location names, time stamps, accuracy data and more.

The data used in this study were originally collected from Instagram via its own Application Programming Interface (API) for the purposes of an earlier study by Tenkanen et al. (2017). The API has since been closed (Bruns, 2019), and collecting up-to-date data in a similar way is no longer possible. The data consist of 1 210 762 Instagram posts uploaded between June 2014 and March 2016. The data include public posts that were geotagged within the Greater Helsinki area that includes the municipalities of Helsinki, Espoo, Vantaa, Kauniainen, , , Nurmijärvi, Tuusula, and . However, as the majority of the posts are located within the borders of Helsinki, Espoo, Vantaa and Kauniainen, as can be seen from the maps presented in Section 4.2, I will focus on the capital city region in the analysis.

To capture sentence-level code switching within individual posts, the caption texts were processed as individual sentences for most parts of the analysis. The posts contained 1 886 215 individual sentences, which had been processed beforehand with the fastText automatic language detection tool (Bojanowski et al., 2017), which recognizes more than 170 languages1. The predictions made by fastText are accompanied by probability values that reflect the confidence of the model in the prediction made. The data was filtered for confident predictions by removing three kinds of sentences: sentences with a probability value of less than 0.7 (as suggested by Hiippala et al., 2019), sentences that start with a hashtag (in order to exclude sentences that consist of only hashtags) and sentences that consist of only a single word (as very short sentences do not offer enough information for fastText to detect the language with enough confidence). The filtered data set that is used in the final analysis contains 634 747 posts with 897 569 sentences in total, which is approximately half of the original, unfiltered data.
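The three filtering rules described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the `Sentence` record, its field names and the example sentences are hypothetical, and only the 0.7 threshold and the three rules come from the text.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 0.7  # threshold stated in the text (after Hiippala et al., 2019)


@dataclass
class Sentence:
    """Hypothetical record for one caption sentence (field names are assumptions)."""
    text: str          # sentence text taken from a caption
    language: str      # language label predicted by the detector
    confidence: float  # probability value attached to the prediction


def keep_sentence(s: Sentence) -> bool:
    """Return True if the sentence passes all three filters described above."""
    stripped = s.text.strip()
    if s.confidence < MIN_CONFIDENCE:  # prediction is not confident enough
        return False
    if stripped.startswith("#"):       # sentence starts with a hashtag
        return False
    if len(stripped.split()) < 2:      # single-word (or empty) sentence
        return False
    return True


# Invented example sentences, for illustration only.
sentences = [
    Sentence("Beautiful evening in Helsinki", "en", 0.98),
    Sentence("#helsinki #finland", "en", 0.91),
    Sentence("Kiitos", "fi", 0.99),
    Sentence("Ihana ilta merellä", "fi", 0.55),
]
filtered = [s for s in sentences if keep_sentence(s)]
```

Only the first example sentence survives: the second starts with a hashtag, the third is a single word, and the fourth falls below the confidence threshold.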

To begin with, the pre-processed data contained the following information on each post: sentence identification number, post identification number, user identification number, caption text, sentence text, detected language of the sentence, confidence of language detection, number of characters in the sentence, name of the location, location coordinates and local time. These data fields were further complemented by calculating the count of words for each sentence and by breaking down the local time to hour, day of the week, month, and year for temporal analyses.
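As a rough illustration of how the additional fields could be derived, the following sketch computes a word count and breaks a local timestamp into hour, day of the week, month and year. The function name, the dictionary keys and the ISO-style timestamp format are assumptions made for this example; the actual format of the time field in the data is not described here.

```python
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def derive_fields(sentence_text: str, local_time: str) -> dict:
    """Derive the word count and time components for one sentence record."""
    ts = datetime.fromisoformat(local_time)  # assumes an ISO 8601 style timestamp
    return {
        "word_count": len(sentence_text.split()),  # whitespace-separated tokens
        "hour": ts.hour,
        "weekday": WEEKDAYS[ts.weekday()],         # Monday is index 0
        "month": ts.month,
        "year": ts.year,
    }


# Invented example record, for illustration only.
record = derive_fields("Beautiful evening in Helsinki", "2015-07-18 21:05:00")
```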

It is very important to keep in mind that the positional accuracy of geotagged Instagram posts varies greatly, as discussed by Cvetojevic et al. (2016). A geotagged Instagram post very seldom includes the precise geographical coordinates of the location from which the image was taken or posted. Instead, users can choose the location they want to associate the post with from a predefined list. These location labels are associated with a geotag that may or may not correspond to the location of the location label. A location label can be either the name of a point, such as the ‘Sibelius Monument’, or the name of a larger area, such as Helsinki or Finland. Instagram’s location database was originally linked to the location data platform Foursquare but has since been merged with Facebook Places. Until August 2015, users were also able to add custom locations on Instagram. As a result of the constantly evolving location database, real-world places can have multiple location labels assigned to them on Instagram with varying geolocations and names. The unreliability of geotagging and its consequences for interpreting the results of this study are discussed in more detail in Section 5.1.

1 For a detailed description of the language identification process, see Hiippala et al. (2019).

3.2 Corine Land Cover Inventory

Corine Land Cover (Coordination of Information on the Environment Land Cover, CLC) is a Europe-wide land cover and land use monitoring programme2. CLC provides geospatial data on the state of the European landscapes and their change over time in six-year cycles. CLC data are used in various social and environmental applications by a wide range of national environmental agencies, private companies as well as researchers and academics, most often working in the field of environmental sciences. In general, the CLC databases are created with the help of automatic satellite image interpretation as well as manual digitizing. The land cover of Finland has been mapped by the Finnish Environment Institute (SYKE) and all the mappings are available for free online3. The CLC inventory is used in this study as a means of categorizing the different types of locations found in the data. This is necessary because during the study period Instagram did not provide a categorization for its POI database, and the locations can therefore only be characterized by their name and geolocation. The CLC mapping conducted in 2018 is used in this study.

The original nomenclature of the Finnish CLC inventory has four levels of hierarchy. At the first and most generic level there are five classes: 1) Artificial surfaces, 2) Agricultural areas, 3) Forests and seminatural areas, 4) Wetlands and 5) Water bodies. The second level has 15 classes and the third level 44 sub-classes. The fourth level has five additional classes in some of the sub-classes that are specific to the Finnish landscape, and thus the total number of different land cover classes is 49. An adapted version of the categorization is used in this study, as some of the sub-classes are overly specific for the purposes of a purely linguistic study (for example, there are 18 classes for forests and other natural areas), whereas others are more semantically appropriate. The following list contains the 12 classes used in this study together with brief descriptions of each class and the level four codes for all the subclasses included in each category4.

2 https://land.copernicus.eu/pan-european/corine-land-cover
3 https://www.syke.fi/en-US/Open_information/Spatial_datasets/Downloadable_spatial_dataset

1) Urban fabric
▪ Residential areas with apartment buildings, row houses and small residential buildings.
▪ Classes: 1111 and 1121
2) Commercial units
▪ Commercial complexes, office buildings, administrative and public buildings.
▪ Class: 1211
3) Industrial units
▪ Industrial and warehouse buildings and associated land.
▪ Class: 1212
4) Road and rail networks and associated land
▪ Motorways and railways, including associated stations, platforms and land.
▪ Class: 1221
5) Port areas
▪ Infrastructure of port areas, including quays, dockyards, and marinas.
▪ Class: 1231
6) Airports
▪ Airports, runways and associated buildings and land.
▪ Class: 1241
7) Mine, dump and construction sites
▪ Artificial areas occupied by extractive activities, construction sites, man-made waste dump sites and their associated lands.
▪ Classes: 1311, 1312, 1321 and 1331
8) Green urban areas
▪ Formal parkland within or partly surrounded by urban areas.
▪ Class: 1411
9) Sport and leisure areas and facilities
▪ Areas and facilities used for sports, leisure and recreation purposes, such as camping grounds, summer cottages, sports grounds, leisure parks, golf courses, racecourses and indoor .
▪ Classes: 1421, 1422, 1423 and 1424
10) Agricultural areas
▪ Arable land, permanent crops, pastures and other agricultural land.
▪ Classes: 2111, 2221, 2311, 2312, 2431 and 2441
11) Forests and semi-natural areas
▪ Forests, shrubs and herbaceous vegetation associations and open spaces with little or no vegetation.
▪ Classes: 3111, 3112, 3121, 3122, 3123, 3131, 3132, 3133, 3211, 3221, 3241, 3242, 3243, 3244, 3246, 3311, 3321 and 3331
12) Water bodies and wetlands
▪ Inland and coastal wetlands and bodies of water.
▪ Classes: 4111, 4112, 4121, 4122, 4211, 4212, 5111, 5121 and 5231

4 For a complete description of the CLC nomenclature, see the links below. Note that there are some differences between the Pan-European and Finnish classification systems. For example, Commercial and Industrial units are separate classes in the Finnish nomenclature, whereas in the European classification they are listed as a single class. The adapted classification used in this study is based on the Finnish nomenclature.
https://land.copernicus.eu/user-corner/technical-library/corine-land-cover-nomenclature-guidelines/html
http://geoportal.ymparisto.fi/meta/julkinen/dokumentit/CorineMaanpeite2018Luokkakuvaus.pdf
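The adapted 12-class scheme listed above can also be expressed as a simple lookup table. The following Python sketch is only an illustration of the mapping, not the workflow used in the study (where the categorization was carried out in GIS software); the names `CLC_CATEGORIES`, `CODE_TO_CATEGORY` and `categorise`, as well as the "Unclassified" fallback, are assumptions introduced for this example.

```python
# Level four CLC codes grouped into the 12 categories listed above.
CLC_CATEGORIES = {
    "Urban fabric": [1111, 1121],
    "Commercial units": [1211],
    "Industrial units": [1212],
    "Road and rail networks and associated land": [1221],
    "Port areas": [1231],
    "Airports": [1241],
    "Mine, dump and construction sites": [1311, 1312, 1321, 1331],
    "Green urban areas": [1411],
    "Sport and leisure areas and facilities": [1421, 1422, 1423, 1424],
    "Agricultural areas": [2111, 2221, 2311, 2312, 2431, 2441],
    "Forests and semi-natural areas": [
        3111, 3112, 3121, 3122, 3123, 3131, 3132, 3133, 3211,
        3221, 3241, 3242, 3243, 3244, 3246, 3311, 3321, 3331,
    ],
    "Water bodies and wetlands": [
        4111, 4112, 4121, 4122, 4211, 4212, 5111, 5121, 5231,
    ],
}

# Reverse lookup: level four code -> category name.
CODE_TO_CATEGORY = {
    code: name for name, codes in CLC_CATEGORIES.items() for code in codes
}


def categorise(code: int) -> str:
    """Return the adapted category for a level four code (fallback is an assumption)."""
    return CODE_TO_CATEGORY.get(code, "Unclassified")
```

The 12 categories together cover 49 level four codes, which agrees with the total number of land cover classes in the Finnish nomenclature mentioned above.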

When working with the CLC inventory it must be kept in mind that there is bound to be overlap between the classes, especially in urban contexts, as the classification is based on land cover. Urban areas are very often built in layers, and overlapping layers are always assigned a single classification based on the dominant land cover type. This issue and its implications are discussed in more detail in Section 5.1.

3.3 QGIS, GeoDa and Excel

Geographical Information System (GIS) software is the primary tool used in the field of geoinformatics. GIS software packages are essentially frameworks for gathering, managing, and analysing spatial and geographic data. The main software used in this study is called QGIS5 (QGIS Development Team, 2021), which is free and open source software that is widely used by geographers, public and private organizations and others working with spatial data. QGIS has a graphical user interface that is relatively easy to learn and use, and there are numerous introductory courses and tutorials available for free online6. All map visualisations presented in this study were created with QGIS.

Another free and open source software package used in this study is GeoDa7 (Anselin et al., 2010). The software has been designed as an easy-to-approach tool for exploratory spatial data analysis. As with QGIS, numerous beginner-friendly tutorials for learning the use of GeoDa can be found online8. The statistically significant spatial clusters presented in Figures 9 and 10 were calculated with GeoDa.

The third software package used in this study was Microsoft Excel and its PivotTable functions. Excel’s PivotTable is an easy-to-use tool for summarising, sorting, counting, reorganising, and grouping large amounts of data stored in a table format. All the various graphs presented in this study were created with PivotChart and other Excel graphing tools.

3.4 Spatial Autocorrelation

Spatial autocorrelation is a central concept in spatial statistics that is used for explaining the presence of systematic spatial variation in a variable, or in other words, spatial dependency. Positive spatial autocorrelation means that areas that are close to each other tend to have similar values, whereas negative spatial autocorrelation means that nearby areas tend to have dissimilar values. Spatial autocorrelation can be measured in two ways, globally and locally. Global measures describe the data set as a whole, whereas local measures are used for detecting local clusters and assessing their significance.

5 https://qgis.org/en/site/

6 https://docs.qgis.org/3.16/en/docs/training_manual/index.html

7 https://geodacenter.github.io/

8 https://geodacenter.github.io/documentation.html

In this study I am primarily interested in identifying local spatial patterns and therefore a local measure is used. The measure of local spatial autocorrelation used in this study is called the Local Geary statistic (Anselin, 1995), which is a local version of one of the standard measures of global spatial autocorrelation, Geary’s c index. The Local Geary statistic of spatial autocorrelation is formulated as follows:

\[ LG_i = \sum_{j} w_{ij} (x_i - x_j)^2 \]

Essentially, the Local Geary measures the squared difference between the value at a geographic location and that at each neighbouring location and summarises this in the form of a weighted sum. Small differences imply similarity between locations. Local clusters are identified by means of a Moran scatter plot (Anselin, 1996) and assessment of the statistical significance of the observations. The classification of local clusters is discussed in Section 4.2 when I present the results of the spatial autocorrelation analysis.
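The weighted sum of squared differences described above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the routine used by GeoDa: the function name, the dictionary-based neighbour structure and the choice between binary and row-standardised weights are assumptions made for this example, and the input values are not standardised here.

```python
def local_geary(values, neighbours, row_standardise=False):
    """Return LG_i = sum_j w_ij * (x_i - x_j)^2 for every location i.

    values:     list of observed values, one per location
    neighbours: dict mapping a location index to a list of neighbour indices
    """
    statistics = []
    for i, x_i in enumerate(values):
        js = neighbours.get(i, [])
        # Binary weights by default (w_ij = 1 for neighbours, 0 otherwise);
        # optionally row-standardised so that the weights of each i sum to 1.
        w = 1.0 / len(js) if (row_standardise and js) else 1.0
        statistics.append(sum(w * (x_i - values[j]) ** 2 for j in js))
    return statistics


# Three locations in a row: 0 - 1 - 2 (invented values for illustration).
values = [1.0, 1.0, 5.0]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
lg = local_geary(values, neighbours)
lg_std = local_geary(values, neighbours, row_standardise=True)
```

In this toy example location 0 has a statistic of zero because its only neighbour has the same value, whereas locations 1 and 2 obtain larger values because they border a dissimilar neighbour.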

The statistical significance of the observations was tested with 9 999 permutations. Permutation tests offer a data-driven approach to testing statistical significance (Good, 2013). By repeatedly resampling the initial test values an approximate reference distribution is generated. For each permutation, the initial test values are randomly relocated in space to calculate a new distribution. This randomly created reference distribution can then be compared with the initial test statistics to determine where they fall in comparison and to see how likely it would be to observe the initial test values under conditions of spatial randomness.
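The logic of the permutation test can be sketched as follows. The sketch assumes a conditional permutation scheme in which the value at location i is held fixed while values drawn from the remaining locations are randomly assigned to its neighbours, and it computes a one-sided pseudo p-value for similarity (small values of the statistic). Both choices, as well as all function and variable names, are assumptions made for illustration and may differ from the exact procedure implemented in GeoDa.

```python
import random


def local_geary_at(i, values, neighbour_values):
    """Local Geary statistic at location i with binary weights."""
    return sum((values[i] - v) ** 2 for v in neighbour_values)


def pseudo_p_value(i, values, neighbours, permutations=9999, seed=0):
    """One-sided pseudo p-value for positive local autocorrelation at i."""
    rng = random.Random(seed)  # fixed seed so that the example is reproducible
    observed = local_geary_at(i, values, [values[j] for j in neighbours[i]])
    others = [v for k, v in enumerate(values) if k != i]
    n_neighbours = len(neighbours[i])
    as_extreme = 0
    for _ in range(permutations):
        # Randomly relocate values from the other locations next to i.
        simulated = local_geary_at(i, values, rng.sample(others, n_neighbours))
        if simulated <= observed:
            as_extreme += 1
    # Add one to numerator and denominator so the observed value is counted.
    return (as_extreme + 1) / (permutations + 1)


# Invented data: location 0 and its two neighbours share the same value,
# while the 20 remaining locations have clearly different values.
values = [10.0, 10.0, 10.0] + [float(v) for v in range(100, 120)]
neighbours = {0: [1, 2]}
p = pseudo_p_value(0, values, neighbours)
```

Because a random draw rarely places two identical values next to location 0, the observed statistic of zero is matched in only a small fraction of the permutations, and the resulting pseudo p-value is small.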

The spatial autocorrelation analysis was based on the percentage of posts with at least one sentence in English or Finnish within each hexagon of a 500-meter hexagonal grid, expressed as a difference from the areawide average. The process involved the following steps. First, I aggregated all posts into a 500-meter hexagonal grid and calculated the percentage of posts with at least one sentence in English (Figure 5) and Finnish (Figure 6) within each hexagon. Next, I calculated the difference between these percentages and the global average of posts with at least one sentence in English (38.4%) and Finnish (50.6%). Thus, I had a percentage value for each hexagon indicating the degree to which the proportion of posts with English and Finnish differed from the global averages (Figures 7 and 8). These values were then used for calculating the Local Geary statistic for each hexagon and its neighbours to identify clusters of hexagons with similar and dissimilar values (Figures 9 and 10). The neighbourhood of adjacent hexagons was determined by a Euclidean distance of 510 meters.
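The steps described above can be sketched in Python as follows. The sketch makes two assumptions that should be read as such: the difference from the average is computed in percentage points, and hexagons are represented by the coordinates of their centroids in metres. The function names and the toy grid are invented for illustration; in the study itself these steps were carried out in QGIS and GeoDa.

```python
import math


def share_difference(hex_counts, global_share):
    """Difference between each hexagon's share (in %) and the global share.

    hex_counts: dict hex_id -> (posts with the language, all posts)
    """
    return {
        h: 100.0 * with_language / total - global_share
        for h, (with_language, total) in hex_counts.items()
        if total > 0  # hexagons without posts are skipped
    }


def distance_band_neighbours(centroids, threshold=510.0):
    """Neighbours are hexagons whose centroids lie within the threshold (metres)."""
    ids = list(centroids)
    neighbours = {h: [] for h in ids}
    for index, a in enumerate(ids):
        ax, ay = centroids[a]
        for b in ids[index + 1:]:
            bx, by = centroids[b]
            if math.hypot(ax - bx, ay - by) <= threshold:
                neighbours[a].append(b)
                neighbours[b].append(a)
    return neighbours


# Toy grid: a centre hexagon, its six adjacent hexagons (centroids 500 m away)
# and one hexagon two steps away from the centre.
centroids = {"centre": (0.0, 0.0), "far": (1000.0, 0.0)}
for k in range(6):
    angle = math.radians(60 * k)
    centroids[f"n{k}"] = (500.0 * math.cos(angle), 500.0 * math.sin(angle))

neighbours = distance_band_neighbours(centroids)
differences = share_difference({"centre": (45, 100), "far": (0, 0)}, 38.4)
```

With a side-to-side distance of 500 meters, the centroids of adjacent hexagons are 500 meters apart, so a threshold of 510 meters selects exactly the six adjacent hexagons and excludes those further away.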

4 Analysis and Results

In this chapter I analyse the data and present the findings of the study. Throughout this chapter I present statistics, graphs, and maps on the use of English, Finnish and other languages. Even though the main focus of this study is on the use of English, I also provide statistics and observations on all the other languages present in the data for the purpose of comparison and to take the fundamentally multilingual nature of ELF communication into account (Cogo, 2017). Previous studies (Hiippala et al., 2019; Hiippala et al., 2020) have also shown that the capital city region has a rich and diverse VLL where English and Finnish have a very strong presence. Furthermore, comparing the similarities and differences between multiple languages further highlights the unique as well as shared spatial and temporal patterns of English and other languages.

A brief clarification on the terminology used henceforth: the term ‘all languages’ is used as a reference to all 75 languages recognised in the data set and their combined values (such as count of sentences) and ‘other languages’ stands for all languages other than Finnish or English, which are studied separately. The term ‘global average(s)’ is used to reference the areawide distribution of languages based on count of sentences (Table 1, fifth column) or posts (Table 1, third column) depending on the context. In Section 4.1 I analyse the general statistics on the geotagged Instagram posts, languages, and users. In Section 4.2 I analyse the spatial distribution of English and Finnish in the form of seven maps. In Section 4.3 I analyse the use of English, Finnish and other languages within the different CLC classes together with their temporal patterns.

4.1 General statistics and temporal patterns

In this section I aim to provide an overview of the VLL found in the Finnish capital region focusing on the use of English. To start with, Table 1 lists the top 16 languages based on the number of sentences written in each language.

| Language | Count of posts in which the language was used at least once | Percentage of all posts | Count of sentences | Percentage of all sentences | Count of unique users who used the language at least once | Percentage of all users | Count of unique users who used the language more frequently than any other language | Percentage of all users |
|---|---|---|---|---|---|---|---|---|
| Finnish | 325 007 | 51.2% | 453 968 | 50.6% | 63 347 | 48.8% | 53 105 | 40.9% |
| English | 243 302 | 38.3% | 328 986 | 36.7% | 63 119 | 48.6% | 49 507 | 38.1% |
| Russian | 41 817 | 6.6% | 61 639 | 6.9% | 16 019 | 12.3% | 15 096 | 11.6% |
| Swedish | 13 383 | 2.1% | 18 116 | 2.0% | 4 964 | 3.8% | 3 290 | 2.5% |
| Japanese | 4 309 | 0.7% | 4 631 | 0.5% | 1 473 | 1.1% | 1 174 | 0.9% |
| Korean | 3 098 | 0.5% | 4 794 | 0.5% | 1 240 | 1.0% | 1 159 | 0.9% |
| Spanish | 2 687 | 0.4% | 3 578 | 0.4% | 1 615 | 1.2% | 1 056 | 0.8% |
| Portuguese | 2 348 | 0.4% | 3 492 | 0.4% | 1 099 | 0.9% | 750 | 0.6% |
| German | 2 177 | 0.3% | 3 002 | 0.3% | 1 340 | 1.0% | 641 | 0.5% |
| Arabic | 2 138 | 0.3% | 2 515 | 0.3% | 413 | 0.3% | 371 | 0.3% |
| Italian | 1 723 | 0.3% | 1 983 | 0.2% | 1 277 | 1.0% | 421 | 0.3% |
| French | 1 529 | 0.2% | 1 681 | 0.2% | 1 131 | 0.9% | 432 | 0.3% |
| Estonian | 1 306 | 0.2% | 1 357 | 0.2% | 991 | 0.8% | 485 | 0.4% |
| Thai | 848 | 0.1% | 891 | 0.1% | 340 | 0.3% | 276 | 0.2% |
| Chinese | 742 | 0.1% | 748 | 0.1% | 492 | 0.4% | 368 | 0.3% |

Table 1. Top 16 languages – General statistics.

To recap, the filtered data set contains a total of 634 747 posts with 897 569 sentences that were posted by 129 944 unique users. A total of 75 languages were recognized in the data indicating that the VLL of the capital city region is indeed very rich, as has also been shown in previous research (Hiippala et al., 2019; Hiippala et al., 2020). The top 16 list in Table 1 includes languages that are spoken in countries geographically close to Finland, such as Russian, Swedish, Estonian, and German, but also further away, such as Japanese and Korean. The reasons behind these languages appearing in the VLL of the Finnish capital region are numerous, but the ever-increasing mobility of people in the globalized world is arguably the most obvious factor.

As may be expected based on previous research (Hiippala et al., 2019; Hiippala et al., 2020), English and Finnish are the two languages with the strongest presence in the VLL of the capital city region. Out of the 897 569 sentences, 50.6% were recognised as Finnish and 36.7% as English, making the two the most frequently used languages by a significant margin, as all other languages combined accounted for 12.7% of all sentences. The third language on the list, Russian, accounted for 6.9% of all the sentences and was used by a considerable percentage of all users, 12.3%. For Swedish, fourth on the list, both the number of unique users and the number of captions written in the language are perhaps surprisingly low, at roughly 2–4% of the totals, even though Swedish is the second official language of Finland.

The extent to which English is used in the region appears to reflect the significant role of English as a lingua franca in Finland (e.g. Taavitsainen & Pahta, 2008; Leppänen et al., 2011), a notion which can also be extended to social media. Table 1 shows that out of the 129 944 users, 48.6% used English at least once whereas 48.8% used Finnish at least once. These numbers show the widespread use of English, as it was used in the study area by essentially as many users as Finnish was. The proportion of users who used English more often than any other language (38.1%) is also close to the percentage of users who had Finnish as their dominant language (40.9%). Furthermore, Table 2 shows that drawing on multiple languages was by no means uncommon on Instagram in the study area and that the use of English was exceedingly high among multilingual users.

| Number of recognised languages per user | Count of unique users | Percentage of all users | Users who used English | Users who used English in % | Users who used Finnish | Users who used Finnish in % |
|---|---|---|---|---|---|---|
| 1 | 102 109 | 78.6% | 37 234 | 36.5% | 40 713 | 39.9% |
| 2 | 24 039 | 18.5% | 22 206 | 92.4% | 19 194 | 79.8% |
| 3 | 3 104 | 2.4% | 3 009 | 96.9% | 2 789 | 89.9% |
| 4 | 504 | 0.4% | 488 | 96.8% | 473 | 93.8% |
| 5 or more | 188 | 0.1% | 182 | 96.8% | 178 | 94.7% |

Table 2. Multilingual users.

As discussed in previous ELF research (Cogo, 2017), multilingualism and translanguaging are always a part of ELF exchange and this kind of behaviour seems to be evident also in the studied data. Table 2 shows that a total of 27 835 users, or 21.4% of all users, used more than one language. Out of these users a total of 25 885 users, or 93%, used English which is a very significant number. The same number for Finnish was 22 634, or 81.3% of all multilingual users. Furthermore, out of the 53 105 predominantly Finnish users 11 307, or 21.3% of the total, also used English which is a good indication of the amount of bilingual exchange between the English and Finnish languages that speakers of Finnish carried out on Instagram in the capital city region.
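The figures cited above follow directly from the counts in Table 2, as the following short Python sketch shows. The counts are taken from Table 2; the variable names are chosen for this example only.

```python
# Counts from Table 2, keyed by the number of recognised languages per user.
users = {"1": 102_109, "2": 24_039, "3": 3_104, "4": 504, "5+": 188}
english_users = {"1": 37_234, "2": 22_206, "3": 3_009, "4": 488, "5+": 182}
finnish_users = {"1": 40_713, "2": 19_194, "3": 2_789, "4": 473, "5+": 178}

total_users = sum(users.values())
multilingual = total_users - users["1"]  # users with more than one language

# Multilingual users who used English or Finnish at least once.
english_multilingual = sum(n for key, n in english_users.items() if key != "1")
finnish_multilingual = sum(n for key, n in finnish_users.items() if key != "1")

multilingual_share = round(100 * multilingual / total_users, 1)
english_share = round(100 * english_multilingual / multilingual, 1)
finnish_share = round(100 * finnish_multilingual / multilingual, 1)
```

The sums reproduce the totals reported in the text: 129 944 users in all, of whom 27 835 (21.4%) used more than one language, with 25 885 (93.0%) of these using English and 22 634 (81.3%) using Finnish.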

[Figure 1: combined bar and line chart. Main axis: sentences per month (0 to 45 000). Secondary axis: percentage of all sentences (0% to 90%). Horizontal axis: months from June 2014 to March 2016. Series: English, Finnish, Other, and the proportion of each.]

Figure 1. Main axis & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.

Figure 1 presents the growth of English, Finnish and other languages over time and shows an evident rising trend in the number of sentences geotagged in the Finnish capital region between June 2014 and March 2016. On average, the absolute number of sentences written in English grew by 8.4% per month, whereas the number of sentences written in Finnish grew by 9.7% per month. The combined number of sentences written in all other languages also grew by 9.7% per month on average. The steady growth in all languages is most likely explained by the growing popularity of Instagram in general. However, the relative proportion of Finnish was the only one to grow, with an average increase of 0.9% per month, whereas the proportion of English decreased by 0.4% and that of other languages by 0.1% per month on average.
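One way of arriving at figures of this kind is to average the month-over-month percentage changes of a monthly series. The sketch below illustrates this calculation under that assumption; the way the averages were actually computed in Excel is not specified here, and the monthly counts in the example are hypothetical and are not taken from the data.

```python
def mean_monthly_change(series):
    """Average of the month-over-month percentage changes in a series."""
    changes = [
        100.0 * (current - previous) / previous
        for previous, current in zip(series, series[1:])
    ]
    return sum(changes) / len(changes)


def shares(english, finnish, other):
    """Relative proportions (in %) of the three language groups in one month."""
    total = english + finnish + other
    return tuple(100.0 * n / total for n in (english, finnish, other))


# HYPOTHETICAL monthly sentence counts, for illustration only.
english_counts = [1000, 1100, 1210]
growth = mean_monthly_change(english_counts)
example_shares = shares(400, 440, 160)
```

The same function can be applied both to the absolute monthly counts and to the monthly proportions, which is why the absolute number of sentences in a language can grow while its relative proportion declines.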

The relative proportions of sentences written in Finnish, English and other languages show some fluctuation throughout the period of 21 months. In July 2014 the proportions of sentences written in English and Finnish were quite close to each other: Finnish at 44%, English at 40% and other languages at 16%. In April 2015 the gap between Finnish and all other languages had widened: 55% of all sentences were in Finnish, 34% in English and 10% in other languages. In March 2016 the numbers came quite close to the average of the entire period: Finnish at 52%, English at 37% and other languages at 11%. Such temporal changes were to be expected on Instagram in the studied period, as the popularity of the application grew rather drastically and the user base developed on a constant basis. However, it is interesting to observe that English maintained a very stable position as the second most used language in the area throughout the studied period. The stability of the VLL of the capital city region suggests that the English language has become an established and significant linguistic resource that is drawn on regularly.

[Figure 2: three line charts (English, Finnish, Other languages). Vertical axis: sentences (0 to 100). Horizontal axis: hour of the day.]

Figure 2. Daily temporal patterns in the Finnish capital region. Line: average number of sentences per hour. Grey area: standard deviation.

[Figure 3: three line charts (English, Finnish, Other languages). Vertical axis: sentences (0 to 1200). Horizontal axis: day of the week, Monday to Sunday.]

Figure 3. Weekly temporal patterns in the Finnish capital region. Line: average number of sentences per day. Grey area: standard deviation.

Figures 2 and 3 present the average number of sentences written in English, Finnish and other languages per hour of day and day of week. The daily and weekly temporal patterns reveal the most common hours and days of using Instagram, which appear to be very similar for all languages. Figure 2 shows that the daily patterns consist of three peaks: the first and smallest peak at 11am around lunch time, the second peak at around 5pm when people have finished work, and the last peak at 9pm when people go to bed, or alternatively, go out. The number of sentences also rises towards the weekend in Figure 3, indicating that the use of Instagram increases in free time, when people have more time to go out, visit places and take photos to post on social media. As these daily and weekly patterns represent the combined totals of the entire region, they cannot really be used to infer anything beyond the fact that the general temporal patterns of English use do not seem to differ greatly from the use of Finnish or other languages. Such similarities can be expected, as most of the population in the area have relatively similar daily and weekly rhythms of life, usually working from 9am to 5pm, Monday to Friday. A more detailed analysis of the temporal patterns in different types of locations is presented in Section 4.3.
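The hourly averages and standard deviations of the kind shown in Figure 2 can be calculated as in the following sketch. It assumes that each sentence is represented by a (date, hour) pair and that the population standard deviation is used; the graphs in this study were produced with Excel, so the function name, the input format and the example observations are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def hourly_profile(observations):
    """Mean and standard deviation of sentences per hour across all days.

    observations: list of (date, hour) pairs, one pair per sentence
    """
    counts = Counter(observations)            # sentences per (date, hour)
    days = sorted({date for date, _ in counts})
    if not days:
        return {}
    profile = {}
    for hour in range(24):
        # Days without sentences in a given hour count as zero.
        per_day = [counts.get((day, hour), 0) for day in days]
        profile[hour] = (mean(per_day), pstdev(per_day))
    return profile


# Invented observations for two days, for illustration only.
observations = [
    ("2015-07-17", 17), ("2015-07-17", 17),
    ("2015-07-18", 17), ("2015-07-18", 21),
]
profile = hourly_profile(observations)
```

The weekly patterns can be obtained in the same way by grouping the sentences by week and day of the week instead of by date and hour.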

4.2 The Spatial Distribution of English and Finnish

In this section I present seven maps for the purpose of analysing the spatial distribution of English and Finnish. All the presented maps are based on a 500-meter hexagonal grid (the distance from side to side is 500 meters), except for the first map, which is based on a one-square-kilometre grid. The first map in Figure 4 presents the population density in the Finnish capital region and is provided here to give the reader a general idea of the geographical distribution of the population across the study area.

Figure 4. Population density in the Finnish capital region in 2016 9.

Figures 5 and 6 present the spatial distribution of English and Finnish based on the number of posts with at least one sentence of the language in question in the caption, thus providing a general overview of the areas where the two languages are used the most and least. As mentioned earlier, Figures 5 and 6 show that the majority of the posts geotagged in the Greater Helsinki area are located inside the borders of the Helsinki Metropolitan area that consists of the municipalities of Helsinki, Vantaa, Espoo and Kauniainen. For this reason, all the maps have been framed with a focus on the Helsinki Metropolitan area and its immediate surroundings instead of the entire Greater Helsinki area.

9 Population density data retrieved from: https://www.stat.fi/org/avoindata/paikkatietoaineistot/vaestoruutuaineisto_1km_en.html

Figure 5. Spatial distribution of English.

Figure 6. Spatial distribution of Finnish.

In general, the spatial patterns in Figures 5 and 6 look relatively similar, except for the fact that the use of Finnish is more spread out. The distributions of both English and Finnish seem to follow the spatial patterns of the population density in Figure 4 to a degree, which seems rather natural. The more people there are going about their daily lives and activities within a certain area, the more activity there can be assumed to be in the corresponding virtual environments on social media. As the study by Leppänen et al. (2011) showed, English is most often encountered in Finland in the urban context and locations where the mobility of people is high. The best example of such a place in the studied area is the Helsinki city centre and the surrounding areas.

The entire Helsinki city centre stands out as a large hotspot for English and Finnish use, indicating that Instagram is used actively in the area. This does not come as a surprise as the area is the centre of the Finnish capital region with a high population density that also hosts numerous tourist attractions and all the most important transportation hubs such as metro stations, bus terminals, ports and the central railway station, with the exception of the Helsinki-Vantaa airport which also stands out as a hotspot in Vantaa. Major public transportation lines such as railroads leading northeast and northwest and the metro line east of the Helsinki city centre also stand out as hotspots in both maps. Very often traffic hubs also have commercial and leisure activity in the close vicinity, such as shopping malls or sports centres, where users post often (see Subsection 4.3.1).

Figure 7. Spatial distribution of English – Relative difference from average (38% of all posts).


Figure 8. Spatial distribution of Finnish – Relative difference from average (51% of all posts).

Figures 7 and 8 present the relative differences from the global averages of posts with at least one sentence in English and Finnish within each hexagon 10. The most remarkable feature in Figure 7 is the apparent transition from orange and yellow in the Helsinki city centre, western Helsinki and south-eastern Espoo, indicating that within these areas the percentage of posts with English is higher than on average, to blue in eastern and northern Helsinki and Vantaa, where English is used less than on average. This shift becomes even more evident when compared with the spatial patterns in Figure 8, which look very similar to the patterns found in Figure 7, except for the fact that the colours are inverted. In eastern and northern Helsinki and Vantaa there are large areas where the posts include significantly more Finnish than on average in the studied area. The fact that Figures 7 and 8 look like mirror images of each other is yet another indication of the dominance of English and Finnish in the study area. Since the relative proportions of these two languages are in general significantly higher than those of other languages, variance from the global average in one dominant language is in most cases reflected directly in the proportion of the other dominant language.

10 It should be noted that in Figures 7 and 8 there are numerous dark red outlier hexagons which can be disregarded for the most part. Many of the outlier hexagons contain only small numbers of posts and therefore the difference from average fluctuates easily.

Figure 9. Spatial distribution of English – statistically significant spatial clusters.

Figure 10. Spatial distribution of Finnish – statistically significant spatial clusters.

Figures 9 and 10 present the results of the spatial autocorrelation analysis calculated with the Local Geary statistic using the relative differences from averages as described in Section 3.3. The results of the spatial autocorrelation analysis further highlight the areas where the use of English and Finnish are clustered. Figures 9 and 10 are interpreted as follows. Spatial clusters are classified either as ‘High-High’ when adjacent hexagons have high values, ‘Low-Low’ when adjacent hexagons have low values, or ‘Other positive’ if adjacent hexagons have similar values but cannot be classified as either of the two because the squared difference crosses the mean. In the case of negative spatial autocorrelation, it is not possible to determine whether the association is between High-Low or Low-High outliers due to the squaring of the difference, and therefore a single classification ‘Negative’ is used. Hexagons that do not have any adjacent hexagons are classified as ‘Neighborless’. Hexagons where the values are close to the mean are classified as ‘Not significant’.11
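The classification described above can be illustrated with a minimal sketch. It assumes the standard definition of the Local Geary statistic, c_i = Σ_j w_ij (z_i − z_j)², with standardised values and row-standardised weights. The function name, the neighbour dictionary and the toy values are hypothetical, and the permutation-based significance test performed in GeoDa is omitted, so the labels below only indicate the type of association, not whether it is significant.

```python
from statistics import mean, pstdev


def local_geary(values, neighbors):
    """Return (c, labels) for each cell.

    values    -- one number per hexagon (e.g. relative difference from average)
    neighbors -- dict: cell index -> list of adjacent cell indices (hypothetical input)
    Assumes the values are not all identical (non-zero standard deviation).
    """
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]            # standardised values
    c, labels = [], []
    for i, zi in enumerate(z):
        nbrs = neighbors.get(i, [])
        if not nbrs:                               # isolated hexagon
            c.append(None)
            labels.append("Neighborless")
            continue
        w = 1.0 / len(nbrs)                        # row-standardised weight
        c.append(sum(w * (zi - z[j]) ** 2 for j in nbrs))
        lag = sum(w * z[j] for j in nbrs)          # mean of the neighbours
        if zi > 0 and lag > 0:
            labels.append("High-High")
        elif zi < 0 and lag < 0:
            labels.append("Low-Low")
        else:                                      # cell and neighbours straddle the mean
            labels.append("Other")
    return c, labels


# Toy example: two tight groups of three cells and one isolated cell.
toy_values = [2.0, 2.0, 2.0, -2.0, -2.0, -2.0, 0.0]
toy_neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
toy_c, toy_labels = local_geary(toy_values, toy_neighbors)
```

A small c_i means that a hexagon resembles its neighbours (positive spatial autocorrelation), whereas a large c_i means that it differs from them. Because the differences are squared, the sign of the difference is lost, which is why negative associations cannot be split into High-Low and Low-High.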

Figure 9 shows that the following locations feature clusters of high percentages of English use. In Helsinki the use of English is most clearly clustered in the inner city of Helsinki and its immediate surroundings (e.g. Jätkäsaari, Töölö, Ruskeasuo, Kallio, Sörnäinen, Vallila, ), northwest of the inner city in the suburbs of Haaga and Munkkiniemi as well as east in and west in . In Espoo the use of English is clustered in Otaniemi and Keilaniemi as well as in various locations around the Espoonlahti, Matinkylä, Tapiola and Leppävaara areas. In Vantaa the Helsinki-Vantaa airport stands out. ‘Low-Low’ clusters of English use seem to correspond quite often with ‘High-High’ clusters of Finnish use, which are discussed below.

Figure 10 shows that high percentages of Finnish use are clustered in the north-eastern inner city of Helsinki (Kallio, Sörnäinen and Vallila), eastern Helsinki (, Itäkeskus, Roihupelto, Puotila), north-eastern Helsinki (Viikki, Malmi and Tapanila) and northern Helsinki (Pasila, Käpylä, Kumpula, Pohjois-Haaga, Konala). In Vantaa ‘High-High’ clusters of Finnish use can be observed in the west in Hämeenkylä, Pähkinärinne and Petikko, and in the east around and Päiväkumpu areas. ‘Low-Low’ clusters of Finnish use correspond largely with ‘High-High’ clusters of English use.

A common factor for both English and Finnish clusters (‘High-High’ as well as ‘Low-Low’) found in Figures 9 and 10 seems to be that they are very often located near transportation hubs, i.e. intersections of railroad lines, the metro line, and Ring roads. The difference between the two languages is that English is used more (and Finnish less) closer to and around the Helsinki city centre as well as in Espoo, whereas Finnish is used more (and English less) farther from the city centre especially along the Ring 3 highway, railroad lines in northern Helsinki and Vantaa as well as along the metro line in eastern Helsinki.

11 For a detailed description of the Local Geary statistic calculation in GeoDa, see: https://geodacenter.github.io/workbook/6b_local_adv/lab6b.html

To conclude, the relative amounts of English use are higher in the Helsinki city centre, western Helsinki and south-eastern Espoo. Although the total count of posts and sentences written in Finnish is higher than those in English and the use of Finnish is in general more spread out, there are distinct areas where the use of English is notably higher than on average in the capital city region. However, as with all urban locations, the regions found here are highly dynamic and complex. Without zooming in on specific locations it is not possible to draw highly specific conclusions and therefore the observations made here should be considered as broad generalizations.

4.3 Corine Land Cover Classes

In this section I analyse and compare the distribution of English, Finnish and other languages across the CLC classes. For a description of the CLC categorization used in this study, see Section 3.4. I begin with an overview of the distribution of languages across the CLC classes, which is then followed by a more in-depth analysis of the CLC data in three parts. Firstly, I analyse urban environments in Subsection 4.3.1, i.e. Commercial units, Urban fabric, Industrial units and Sport and leisure areas and facilities. Secondly, I discuss transportation in Subsection 4.3.2, i.e. Road and rail networks and associated land, Airports and Port areas. Thirdly, I examine recreational and natural areas in Subsection 4.3.3, i.e. Forests and semi-natural areas, Water bodies and wetlands and Green urban areas. The last two classes, Mine, dump and construction sites and Agricultural areas, are not discussed in more detail as the number of sentences is too low for a meaningful analysis.

To start with, Tables 3, 4 and 5 present the distribution of posts and sentences in English, Finnish and other languages across the CLC classes. Furthermore, Tables 3 and 4 present the average distribution of the classes within 50- and 100-meter buffer zones for English and Finnish. Buffer zone statistics are used here for analysing the general distribution of land cover classes in the surroundings of the point observations.


| Corine Land Cover Class | Count of posts with at least one sentence in English | Distribution of posts in English in percentages | Count of sentences in English | Distribution of sentences in English in percentages | Average distribution of classes within 50-meter buffer zone | Average distribution of classes within 100-meter buffer zone |
|---|---|---|---|---|---|---|
| Commercial units | 80 365 | 33.0% | 107 967 | 32.8% | 31.7% | 30.5% |
| Urban fabric | 55 932 | 23.0% | 77 067 | 23.4% | 25.2% | 25.9% |
| Road and rail networks and associated land | 38 711 | 15.9% | 52 904 | 16.1% | 13.6% | 13.0% |
| Airports | 15 486 | 6.4% | 20 851 | 6.3% | 6.5% | 6.3% |
| Industrial units | 13 718 | 5.6% | 18 509 | 5.6% | 5.3% | 4.6% |
| Sport and leisure areas and facilities | 13 039 | 5.4% | 17 547 | 5.3% | 5.0% | 4.0% |
| Forests and semi-natural areas | 11 191 | 4.6% | 14 838 | 4.5% | 5.1% | 7.0% |
| Water bodies and wetlands | 6 433 | 2.6% | 8 329 | 2.5% | 3.5% | 4.2% |
| Green urban areas | 3 751 | 1.5% | 4 820 | 1.5% | 2.1% | 2.4% |
| Port areas | 3 631 | 1.5% | 4 758 | 1.4% | 1.5% | 1.2% |
| Mine, dump and construction sites | 655 | 0.3% | 840 | 0.3% | 0.3% | 0.5% |
| Agricultural areas | 390 | 0.2% | 556 | 0.2% | 0.2% | 0.3% |
| Grand Total | 243 302 | 100% | 328 986 | 100% | 100% | 100% |

Table 3. CLC classes - Distribution of English.

| Corine Land Cover Class | Count of posts with at least one sentence in Finnish | Distribution of posts in Finnish in percentages | Count of sentences in Finnish | Distribution of sentences in Finnish in percentages | Average distribution of classes within 50-meter buffer zone | Average distribution of classes within 100-meter buffer zone |
|---|---|---|---|---|---|---|
| Commercial units | 117 775 | 36.2% | 167 299 | 36.9% | 34.2% | 31.9% |
| Urban fabric | 71 605 | 22.0% | 99 618 | 21.9% | 23.4% | 25.5% |
| Road and rail networks and associated land | 46 664 | 14.4% | 65 586 | 14.4% | 13.0% | 13.4% |
| Airports | 9 619 | 3.0% | 12 460 | 2.7% | 2.8% | 2.7% |
| Industrial units | 19 220 | 5.9% | 28 706 | 6.3% | 5.8% | 5.3% |
| Sport and leisure areas and facilities | 27 301 | 8.4% | 38 515 | 8.5% | 8.1% | 6.5% |
| Forests and semi-natural areas | 15 166 | 4.7% | 19 367 | 4.3% | 4.7% | 7.1% |
| Water bodies and wetlands | 6 685 | 2.1% | 8 492 | 1.9% | 4.5% | 3.4% |
| Green urban areas | 5 428 | 1.7% | 6 872 | 1.5% | 1.9% | 2.3% |
| Port areas | 3 898 | 1.2% | 4 909 | 1.1% | 1.1% | 1.0% |
| Mine, dump and construction sites | 1 066 | 0.3% | 1 440 | 0.3% | 0.3% | 0.5% |
| Agricultural areas | 580 | 0.2% | 704 | 0.2% | 0.2% | 0.3% |
| Grand Total | 325 007 | 100% | 453 968 | 100% | 100% | 100% |

Table 4. CLC classes - Distribution of Finnish.


| Corine Land Cover Class | Count of posts in other languages | Distribution of posts in percentages | Count of sentences in other languages | Distribution of sentences in percentages |
|---|---|---|---|---|
| Commercial units | 30 475 | 36.8% | 42 471 | 37.1% |
| Urban fabric | 13 439 | 16.2% | 18 377 | 16.0% |
| Road and rail networks and associated land | 12 130 | 14.6% | 17 052 | 14.9% |
| Airports | 8 360 | 10.1% | 11 278 | 9.8% |
| Industrial units | 4 317 | 5.2% | 5 902 | 5.1% |
| Sport and leisure areas and facilities | 3 954 | 4.8% | 5 698 | 5.0% |
| Forests and semi-natural areas | 3 118 | 3.8% | 4 241 | 3.7% |
| Water bodies and wetlands | 2 252 | 2.7% | 3 043 | 2.7% |
| Green urban areas | 1 491 | 1.8% | 2 037 | 1.8% |
| Port areas | 3 137 | 3.8% | 4 286 | 3.7% |
| Mine, dump and construction sites | 97 | 0.1% | 133 | 0.1% |
| Agricultural areas | 81 | 0.1% | 97 | 0.1% |
| Grand Total | 82 851 | 100% | 114 615 | 100% |

Table 5. CLC classes - Distribution of languages other than English and Finnish.

The general statistics on the distribution of posts and sentences across the CLC classes in Tables 3, 4 and 5 reveal that most of the sentences, regardless of language, are situated in the urban environment and in the proximity of transportation networks with recreational and natural areas forming the third biggest group. The majority of the sentences written in English (67%), Finnish (74%) as well as other languages (63%) fall into the classes Commercial units, Urban fabric, Industrial units and Sport and leisure areas and facilities. Roughly a fourth of the sentences in English (24%), Finnish (18%) and other languages (28%) fall into the categories of transportation, namely Road and rail networks and associated land, Airports and Port areas. The proportion of recreational and natural areas, namely Forests and semi-natural areas, Water bodies and wetlands and Green urban areas, is around 9% for English, 8% for Finnish and 8% for other languages.
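The grouped percentages above can be recomputed from the sentence counts in Tables 3, 4 and 5. The following is a minimal sketch of that arithmetic; the dictionary layout, the group names and the function name are illustrative assumptions rather than part of the original analysis.

```python
# Sentence counts per CLC class as (English, Finnish, other languages),
# copied from Tables 3, 4 and 5.
SENTENCES = {
    "Commercial units": (107967, 167299, 42471),
    "Urban fabric": (77067, 99618, 18377),
    "Road and rail networks and associated land": (52904, 65586, 17052),
    "Airports": (20851, 12460, 11278),
    "Industrial units": (18509, 28706, 5902),
    "Sport and leisure areas and facilities": (17547, 38515, 5698),
    "Forests and semi-natural areas": (14838, 19367, 4241),
    "Water bodies and wetlands": (8329, 8492, 3043),
    "Green urban areas": (4820, 6872, 2037),
    "Port areas": (4758, 4909, 4286),
    "Mine, dump and construction sites": (840, 1440, 133),
    "Agricultural areas": (556, 704, 97),
}

# Grouping of classes as used in Subsections 4.3.1 to 4.3.3.
GROUPS = {
    "urban": ["Commercial units", "Urban fabric", "Industrial units",
              "Sport and leisure areas and facilities"],
    "transportation": ["Road and rail networks and associated land",
                       "Airports", "Port areas"],
    "recreational and natural": ["Forests and semi-natural areas",
                                 "Water bodies and wetlands",
                                 "Green urban areas"],
}


def group_shares(lang_index):
    """Percentage of one language's sentences falling into each group.

    lang_index -- 0 for English, 1 for Finnish, 2 for other languages
    """
    total = sum(row[lang_index] for row in SENTENCES.values())
    return {
        group: 100 * sum(SENTENCES[c][lang_index] for c in classes) / total
        for group, classes in GROUPS.items()
    }


english, finnish, other = (group_shares(i) for i in range(3))
```

Rounded to whole percentages, the urban and transportation shares computed this way agree with the figures quoted in the text.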

The statistics on the average distribution of CLC classes within 50- and 100-meter buffer zones show some variation from the general distribution of sentences. The rise in the proportions of Urban fabric, Forests and semi-natural areas, Water bodies and wetlands and Green urban areas indicates that quite often the urban areas where the majority of the geotagged posts are located in the study area are to some extent surrounded by not only residential areas, but also natural areas and the coastline. However, most of the differences between the point observations and the buffer zone statistics are relatively small, which indicates that with the studied data the point location statistics also seem to give a fairly accurate picture of the surroundings.


[Bar chart. X-axis: the twelve CLC classes. Y-axis: number of sentences (0 to 180 000). Series: English, Finnish, Other languages.]

Figure 11. Total number of sentences in each CLC class.

[Bar chart. X-axis: the twelve CLC classes. Y-axis: relative difference from average (-60% to 140%). Series: English, Finnish, Other languages.]

Figure 12. Relative differences from sentence averages (English 36.7%, Finnish 50.6%, other languages 12.7%) in each CLC class.

Figure 11 presents the total number of sentences in each class and Figure 12 shows the differences from the global averages of sentences in English, Finnish and other languages in each class. The distributions of English, Finnish and other languages look initially quite similar in most classes in Figure 11, but Figure 12 provides more insights into the differences within the classes. Figure 12 shows that the relative proportions of English are higher than on average in six of the twelve classes: Urban fabric (+8%), Road and rail networks and associated land (+6%), Airports (+27%), Forests and semi-natural areas (+5%), Water bodies and wetlands (+14%) and Agricultural areas (+12%). The relative proportions of English are lower than on average in Commercial units (-7%), Industrial units (-5%), Sport and leisure areas and facilities (-23%), Green urban areas (-4%), Port areas (-7%) and Mine, dump and construction sites (-5%). These findings are discussed in more detail in the following subsections.
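As an illustration of how such relative differences can be obtained, the sketch below assumes that the difference is the share of a language within a class divided by its share of all sentences, minus one. This formula is an assumption inferred from the reported numbers, and the function and variable names are hypothetical. The counts come from the Airports rows and column totals of Tables 3, 4 and 5, with the total for other languages taken as the sum of the rows of Table 5.

```python
def relative_difference(class_counts, total_counts):
    """Relative difference of each language's share in one class
    from that language's share of all sentences (0.27 means +27%)."""
    class_total = sum(class_counts.values())
    grand_total = sum(total_counts.values())
    result = {}
    for lang, count in class_counts.items():
        share_in_class = count / class_total
        overall_share = total_counts[lang] / grand_total
        result[lang] = (share_in_class - overall_share) / overall_share
    return result


# Sentence counts in the Airports class and in the whole data set.
airports = {"English": 20851, "Finnish": 12460, "Other": 11278}
totals = {"English": 328986, "Finnish": 453968, "Other": 114615}

airport_diff = relative_difference(airports, totals)
```

Under this assumption the Airports values come out at roughly +28% for English, -45% for Finnish and +98% for other languages, which is close to the +27%, -45% and +99% reported in this section; the small deviations may stem from rounding.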

4.3.1 Urban environments

[Bar chart. X-axis: Commercial units, Urban fabric, Industrial units, Sport and leisure facilities. Y-axis: relative difference from average. Series: English, Finnish and other languages.]

Figure 13. Urban environments. Differences from the sentence averages.

[Four panels: Commercial units, Urban fabric, Industrial units, Sport and leisure areas and facilities. X-axis: months from June 2014 to March 2016. Main y-axis: sentences. Secondary y-axis: percentage. Series: English, Finnish, Other languages.]

Figure 14. Urban environments. Main axes & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.


The class with the highest number of sentences out of all classes is Commercial units, as around a third of all sentences are in this class. A closer look at the most popular location labels associated with Commercial units (see Appendix A) reveals locations such as Finland’s biggest convention centre Messukeskus, the popular tourist attraction Helsinki Senate Square, music clubs Tavastia and Circus as well as contemporary art museum, Linnanmäki amusement park and shopping centres Kamppi and Stockmann. All these locations, like commercial contexts in general, are places where people go to spend their free time, shop, attend events and visit tourist attractions. Therefore, in these locations users are also active on social media and use geotags relatively often to show other users where they have been (Tasse et al., 2017).

According to Leppänen et al. (2011), the English language is encountered very frequently in commercial contexts in Finland. It is therefore slightly surprising that in the studied data English is used 7% less in the Commercial units class than on average, as seen in Figure 13. On the other hand, Finnish (+4%) and other languages (+5%) are used somewhat more than on average. The differences from averages are, however, relatively small, and Commercial units is the class with the highest count of sentences in English (see Figure 11), indicating that English is indeed encountered often in the VLL of Commercial units.

The class with the second highest count of sentences overall is Urban fabric. A closer look reveals location labels (see Appendix B) referring to various neighbourhoods such as Punavuori, Töölö, Kruununhaka, and Tapiola as well as more specific labels referring to, for example, parks and cafes in these neighbourhoods. Areas such as Töölö, Kallio and Punavuori in Helsinki are very prestigious and popular, which encourages the use of geotags on social media. English and Finnish are clearly the dominant languages in Urban fabric whereas other languages are used 26% less than on average. Out of the four classes analysed in this subsection, Urban fabric features the highest relative proportion of English as the language is used 8% more than on average.

The location labels associated with Industrial units (see Appendix C) reveal locations such as , , Suvilahti, Flow Festival, training and sports facilities as well as restaurants and night clubs. Especially in Helsinki, industrial areas very often have commercial space and are actively used for organising festivals as well as other types of events. The use of English (-5%) and other languages (-13%) is somewhat lower than on average, whereas the use of Finnish (+7%) is higher. The effects of events organised in these locations could be one explaining factor for the prevalence of Finnish in certain contexts, as I discuss next.

Sport and leisure areas and facilities is somewhat different from the other classes analysed in this subsection, as the class includes both outdoor areas and indoor facilities. However, the majority of the data in this class comes from various arenas and stadiums. The most popular locations associated with Sport and leisure areas and facilities (see Appendix D) include such as Hartwall , , Sonera Stadium and . In Sport and leisure areas and facilities there is a rather significant decrease in the use of English (-23%) and other languages (-27%), whereas the use of Finnish is up by 23%, indicating a rather clear prevalence of Finnish over English and other languages. It is very likely that various events, concerts and sports competitions organised in these locations have a significant effect on the VLL of Sport and leisure areas and facilities, and Finnish seems to be the most common language in these contexts. What is more, the prevalence of Finnish could also be explained by the fact that smaller areas and facilities, such as training centres, football fields and golf courses, are visited most often by local Finnish residents.

[Six panels: English, Finnish and Other languages, each by hour of the day (0 to 22) and by day of the week (Mon to Sun). Y-axis: sentences.]

Figure 15. Daily and weekly temporal patterns in Commercial units. Line: average number of sentences per hour/day. Grey area: standard deviation.

[Six panels: English, Finnish and Other languages, each by hour of the day (0 to 22) and by day of the week (Mon to Sun). Y-axis: sentences.]

Figure 16. Daily and weekly temporal patterns in Urban fabric. Line: average number of sentences per hour/day. Grey area: standard deviation.

[Six panels: English, Finnish and Other languages, each by hour of the day (0 to 22) and by day of the week (Mon to Sun). Y-axis: sentences.]

Figure 17. Daily and weekly temporal patterns in Industrial units. Line: average number of sentences per hour/day. Grey area: standard deviation.

[Six panels: English, Finnish and Other languages, each by hour of the day (0 to 22) and by day of the week (Mon to Sun). Y-axis: sentences.]

Figure 18. Daily and weekly temporal patterns in Sport and leisure areas and facilities. Line: average number of sentences per hour/day. Grey area: standard deviation.

In Figures 15, 16, 17 and 18 there do not seem to be major differences between the daily and weekly patterns of English, Finnish and other languages, besides the fact that the number of sentences written in English and Finnish is greater than the number written in other languages. The daily temporal rhythm of Commercial units shows a pattern with three peaks in activity, similar to the one observed in Figure 2. In Urban fabric the peaks appear flatter and the use of Instagram is more spread out across the afternoon and evening. Events and other activities organised in Industrial units and Sport and leisure areas and facilities stand out as a higher peak in the use of Finnish in the evenings and at weekends. The activity rises towards the end of the week in all classes.
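The daily curves in the temporal figures of this subsection show an average number of sentences per hour together with a standard deviation. A minimal sketch of how such a profile could be computed is given below. It assumes one timestamp per sentence, averages over the days that occur in the data, and uses the population standard deviation; these choices, as well as the function name and the toy timestamps, are assumptions and not a description of the processing actually used.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def hourly_profile(timestamps):
    """Mean and standard deviation of sentence counts per hour of day.

    timestamps -- one datetime per sentence (hypothetical input format)
    Returns a dict: hour (0-23) -> (mean count per day, standard deviation).
    """
    per_day_hour = Counter((t.date(), t.hour) for t in timestamps)
    days = sorted({t.date() for t in timestamps})   # only days present in the data
    profile = {}
    for hour in range(24):
        # Hours without any sentences on a given day count as zero.
        counts = [per_day_hour.get((day, hour), 0) for day in days]
        profile[hour] = (mean(counts), pstdev(counts))
    return profile


# Toy data: three sentences at 5pm on one day, one at 5pm and two at 11am on the next.
toy_timestamps = [
    datetime(2015, 6, 1, 17, 5), datetime(2015, 6, 1, 17, 20),
    datetime(2015, 6, 1, 17, 45), datetime(2015, 6, 2, 17, 10),
    datetime(2015, 6, 2, 11, 0), datetime(2015, 6, 2, 11, 30),
]
toy_profile = hourly_profile(toy_timestamps)
```

The weekly profile can be obtained in the same way by grouping on the weekday of each timestamp instead of the hour.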


4.3.2 Transportation

[Bar chart. X-axis: Road and rail networks and associated land, Airports, Port areas. Y-axis: relative difference from average (-80% to 160%). Series: English, Finnish, Other languages.]

Figure 19. Transportation. Differences from the sentence averages.

[Three panels: Road and rail networks, Airports, Port areas. X-axis: months from June 2014 to March 2016. Main y-axis: sentences. Secondary y-axis: percentage. Series: English, Finnish, Other languages.]

Figure 20. Transportation. Main axes & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.

The mobility of people has significant effects on the LL of a given space and these effects are especially prominent in the context of transportation (Soler-Carbonell, 2016). If we consider the VLL an extension of the LL (Hiippala et al., 2019), then the same concept of mobility as a factor that enables the emergence of particular linguistic resources in a given space can be extended to social media as well. Furthermore, geotagging is an often-used feature for keeping friends, family and other followers updated on one’s travels (Tasse et al., 2017).

Road and rail networks and associated land features the third highest number of sentences overall. As a form of transportation, travel by roads and railways involves travelling relatively short distances within the city or between municipalities for various reasons such as work, study, school or hobbies. However, with Road and rail networks and associated land it should be kept in mind that the class is very closely tied to the urban environments discussed in the previous subsection and that the associated locations are for the most part not directly related to transportation. Location labels associated with Road and rail networks and associated land (see Appendix E) do include, for example, the Helsinki Central railway station and various labels referring to the municipalities of Helsinki, Espoo and Vantaa that have most likely been geotagged at railway stations or other transportation hubs. However, the list also includes locations such as restaurants, night clubs, event venues, museums and hotels. Geotags placed on the streets end up in this class even if the associated location label refers to, for example, a museum in a nearby building. As a result, most of the location labels are not related to transportation beyond the fact that they are located in the vicinity of roads and railways and most users will have used the road and rail networks to get to the locations found in this class. Therefore, Road and rail networks and associated land should essentially be considered an extension of the urban and commercial environments discussed in the previous subsection.

The distribution of languages in Road and rail networks and associated land is quite close to the global averages. However, Figure 19 shows that English is used somewhat more than on average (+6%). In January 2016 English even becomes the most used language in the class (Figure 20). A closer look at the monthly data revealed that this peak in English and other languages is caused by a sudden and drastic increase in the use of a single location label, ‘helsinki’. The fact that a single location label can have such a striking effect on the distribution of languages across a land cover class indicates that the results presented here must be considered with caution and as broad generalizations.

Airports are mainly hubs for international travel and this is reflected in the significant differences from the global averages. Location labels associated with Airports (see Appendix F) refer to the Helsinki-Vantaa airport and its immediate surroundings almost exclusively as it is the only major airport in the region. In Airports English is used 27% more, other languages 99% more and Finnish 45% less than on average. As the global lingua franca, English can be expected to appear frequently alongside a wide range of other languages in the rather out of the ordinary virtual as well as physical linguistic landscapes of airports (Blackwood, 2019; Woo & Riget, 2020).

As the capital city region is located on the coast, an important mode of transportation in the area is travel by sea. Location labels associated with Port areas (see Appendix G) refer to the and associated terminals and surroundings as well as various cruise ships that operate in the area. In Port areas Finnish is used 30% less and English is used 7% less than on average, whereas other languages show a rather significant increase of 142% from the average. The Port of Helsinki is in fact one of Europe’s busiest passenger ports with popular connections to Sweden, , Russia and Germany, which is the most obvious explanation for the finding that other languages appear in this context considerably more often than on average in the Finnish capital region. Furthermore, work-, shopping-, services- and social-related cross-border mobility of transnationals, such as working in Finland and Finns working in Estonia (Järv et al., 2021), is most likely another factor affecting the VLL of Port areas.

[Six panels: English, Finnish and Other languages, each by hour of the day (0 to 22) and by day of the week (Mon to Sun). Y-axis: sentences.]

Figure 21. Daily and weekly temporal patterns in Road and rail networks and associated land. Line: average number of sentences per hour/day. Grey area: standard deviation.

[Six panels: English, Finnish and Other languages, each by hour of the day (0 to 22) and by day of the week (Mon to Sun). Y-axis: sentences.]

Figure 22. Daily and weekly temporal patterns in Airports. Line: average number of sentences per hour/day. Grey area: standard deviation.

[Six panels: English, Finnish and Other languages, each by hour of the day (0 to 22) and by day of the week (Mon to Sun). Y-axis: sentences.]

Figure 23. Daily and weekly temporal patterns in Port areas. Line: average number of sentences per hour/day. Grey area: standard deviation.

The three classes feature three very different types of daily rhythms. The three peaks seen in Road and rail networks and associated land look very similar to the ones observed in the previous section, which is only natural as these transportation networks are located within the urban environments. The Airports class features a rather distinct daily pattern that is especially evident in the use of English. The most popular hours and days for flights are reflected in the daily peaks at 7am, 1pm and 4pm together with weekly peaks on Mondays and Fridays. The temporal patterns in Port areas show weekend-focused mobility between leisure, work and home. Cruise tourism is focused on the weekend, whereas transnationals commute on Mondays and Fridays.
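The lines and grey areas in Figures 21 to 23 summarise the average number of sentences per hour and its standard deviation. The following is a minimal sketch of how such an hourly profile could be computed, assuming that each sentence comes with a timestamp. The function name `hourly_profile`, the toy data, the choice to average only over days that contain at least one sentence, and the use of the population standard deviation are all illustrative assumptions, not the procedure used in this study.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def hourly_profile(timestamps):
    """Return {hour: (mean, std)} of sentence counts per hour.

    `timestamps` is an iterable of datetime objects, one per sentence.
    Averages run over the days that contain at least one sentence; an
    hour without sentences on such a day counts as zero.
    """
    timestamps = list(timestamps)
    days = sorted({ts.date() for ts in timestamps})
    # Count sentences for every (day, hour) pair that actually occurs.
    counts = Counter((ts.date(), ts.hour) for ts in timestamps)
    profile = {}
    for hour in range(24):
        per_day = [counts.get((day, hour), 0) for day in days]
        profile[hour] = (mean(per_day), pstdev(per_day)) if per_day else (0.0, 0.0)
    return profile


# Hypothetical toy data: three sentences posted on two days.
example = [
    datetime(2015, 6, 1, 7, 15),
    datetime(2015, 6, 1, 7, 40),
    datetime(2015, 6, 2, 16, 5),
]
profile = hourly_profile(example)
print(profile[7])   # mean and standard deviation for 07:00-07:59
```

Running the same function on `ts.weekday()` instead of `ts.hour` would give the corresponding weekly profile.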

4.3.3 Recreational and natural areas

[Figure 24: bar chart of percentage differences from the sentence averages by language in three classes: Forests and semi-natural areas, Water bodies and wetlands, Green urban areas. Data labels recovered from the chart: 21%, 17%, 14%, 5%, 0%, -1%, -4%, -13%, -16%.]

Figure 24. Recreational and natural areas. Differences from the sentence averages

[Figure 25: three bar-and-line panels (Forests and semi-natural areas, Water bodies and wetlands, Green urban areas) with monthly values from 6/2014 to 3/2016. Primary axis: sentences (0 to 2 000). Secondary axis: proportion (0% to 80%). Series: English, Finnish, Other languages.]

Figure 25. Recreational and natural areas. Main axes & bars: Monthly total of sentences in English, Finnish and other languages. Secondary axis & lines: Relative proportions of English, Finnish and the combined total of all other languages per month.

Natural and coastal areas in the capital city region attract a great number of visitors all year round. Also using data from Instagram, Heikinheimo et al. (2020) analysed the languages found in the VLL of green spaces in Helsinki. Heikinheimo et al. used different registers for identifying green spaces, not the CLC inventory used in this study, but their findings can be related to the results discussed in this subsection. Heikinheimo et al. found that linguistic richness was highest in green spaces close to the Helsinki city centre and in locations that also serve as touristic attractions. On the other hand, green spaces further away from the Helsinki city centre featured significantly more Finnish compared to other languages.

The most popular locations associated with Forests and semi-natural areas (see Appendix I) include places such as Lauttasaari, , , Helsinki and national park. In Forests and semi-natural areas, the use of English is somewhat above the average (+5%) whereas other languages are used 13% less. Finnish is used exactly as much as on average in the study area.

The locations associated with Water bodies and wetlands (see Appendix J) include mostly beaches and park areas close to water such as Eiranranta, , Tokoinranta, and Haukilahti. The most popular location label is, however, the ‘Market Square, Helsinki’. Whereas Finnish is used 16% less than on average, Water bodies and wetlands feature quite significant increases of 14% in the use of English and 21% in the use of other languages. Such high percentages are most likely the result of the popular tourist attraction Market Square appearing high on the list of most used location labels.

The most popular locations associated with Green urban areas (see Appendix H) include places such as the Sibelius Monument, Lauttasaari, Esplanade park, Kaivopuisto, Alppipuisto, Töölönlahti and the Viikki fields. Green urban areas feature other languages 21% more than on average, whereas English and Finnish are slightly below the average values. The high proportion of other languages in Green urban areas can be related to the abovementioned findings by Heikinheimo et al. (2020), as many park areas associated with this class are located close to the Helsinki city centre and host popular tourist attractions.

Figure 25 shows similar fluctuation in the numbers of sentences in all three classes with peaks during the summer and autumn. These seasonal changes are also reflected in the rather significant fluctuation in the proportions of languages. The use of outdoor areas is highly seasonal and therefore such fluctuation is to be expected. It appears that in all three classes the proportion of Finnish is highest during the spring, summer and autumn when locals spend more time outdoors, whereas the relative proportions of English use are highest during the winter. The relatively low monthly sentence counts should also be considered, as the distributions are therefore quite prone to sudden fluctuations.

[Figure 26: six line-chart panels. Columns: English, Finnish, Other languages. Top row: sentences by hour of the day (0 to 22). Bottom row: sentences by day of the week (Mon to Sun).]

Figure 26. Daily and weekly temporal patterns in Green urban areas. Line: average number of sentences per hour/day. Grey area: standard deviation.

[Figure 27: six line-chart panels. Columns: English, Finnish, Other languages. Top row: sentences by hour of the day (0 to 22). Bottom row: sentences by day of the week (Mon to Sun).]

Figure 27. Daily and weekly temporal patterns in Forests and semi-natural areas. Line: average number of sentences per hour/day. Grey area: standard deviation.

[Figure 28: six line-chart panels. Columns: English, Finnish, Other languages. Top row: sentences by hour of the day (0 to 22). Bottom row: sentences by day of the week (Mon to Sun).]

Figure 28. Daily and weekly temporal patterns in Water bodies and wetlands. Line: average number of sentences per hour/day. Grey area: standard deviation.

In Green urban areas, Forests and semi-natural areas and Water bodies and wetlands there do not seem to be major differences between the temporal patterns of English, Finnish and other languages. Temporal patterns for the three classes feature peaks during the evenings and weekends, which is logical as natural areas are used for recreational purposes during free time.

5 Discussion

I begin this section by discussing the limitations of the study in Section 5.1. In Section 5.2 I discuss the results of this study and seek to answer the research questions for this study based on the data presented, bearing in mind the theoretical background of this study and the discussed limitations. In Section 5.3 I discuss my suggestions for further research.

5.1 Limitations of the study

Interpreting geotagged social media data must always be approached with caution. Although it is very likely that a high percentage of the data used in this study has been produced by Finnish users, drawing conclusions about the use of English among the Finnish people as a whole based on the findings of this study warrants caution. Firstly, I did not take into account the country of origin of each user and it can be assumed that a significant proportion of the data has been generated by foreigners instead of locals. Secondly, most users of Instagram are relatively young (Longley et al., 2015; Hausmann et al., 2018; Lopez et al., 2019) and therefore do not represent the population at large. Thirdly, the use of Instagram also clearly focuses on free time. It is therefore easy to get a picture of language use and its spatiality that is skewed towards positive and out-of-the-ordinary things. Events and various special occasions are clearly emphasised when users decide to geotag their location (Tasse et al., 2017) whereas everyday activities are not as clearly visible.

An aspect of the data that makes establishing clear links between language use on geotagged social media and the corresponding offline environments particularly challenging is the inaccuracy of locations found on Instagram. According to Tasse et al. (2017), associating a post with a location is very often the result of a conscious choice. Furthermore, any user can geotag their post to any location in the world even if they have never actually been to that place. The actual coordinates of the geotags might not even correspond to the intended location due to user error, a poor GPS signal or other reasons. Geotagging at various spatial scales also further degrades the accuracy of spatial observations. In hindsight, filtering out the location labels referring to large geographical areas, such as Finland or Helsinki, would probably have made the data somewhat more accurate. Such location labels are used very often, which results in false hotspots that do not correspond with the actual location of the users. The tables on the most popular location labels in the Appendices show that various tags referring to large geographical areas are used often alongside more specific location labels. During the study period, Instagram also allowed all users to easily create their own location labels and geotags, which makes the locations found in the data even more inconsistent and prone to errors. This feature was deactivated in August 2015 and creating custom locations has since become more difficult.
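The filtering step suggested in hindsight above could look roughly as follows. This is a minimal sketch, assuming that each post is a dictionary with a `label` field. The record layout, the function name `drop_coarse_labels` and the contents of `COARSE_LABELS` (beyond the two examples named in the text, Finland and Helsinki) are hypothetical.

```python
# Hypothetical set of labels that name large areas rather than specific places.
# "finland" and "helsinki" come from the text; the others are illustrative.
COARSE_LABELS = {"finland", "helsinki", "espoo", "vantaa"}


def drop_coarse_labels(posts, coarse_labels=COARSE_LABELS):
    """Keep only posts whose location label is not a coarse, area-level label."""
    return [
        post for post in posts
        if post["label"].strip().lower() not in coarse_labels
    ]


# Hypothetical toy data.
posts = [
    {"label": "Helsinki", "lang": "en"},
    {"label": "Market Square, Helsinki", "lang": "en"},
    {"label": "Finland", "lang": "fi"},
    {"label": "Kaivopuisto", "lang": "fi"},
]
specific = drop_coarse_labels(posts)
print([p["label"] for p in specific])
```

Matching on the whole normalised label, rather than on substrings, keeps specific labels such as ‘Market Square, Helsinki’ in the data.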

Regarding the use of the CLC inventory as a means of categorizing the locations, it should be noted that there is bound to be overlap between the land cover classes, which makes the classification at times vague and imprecise for the purposes of a linguistic analysis. For example, the shopping complex Kamppi in the Helsinki city centre covers a large area of land, which is all categorised as Commercial units. Besides shops and other types of commercial space, the shopping centre also hosts an underground metro station, a large training centre and housing, which could theoretically speaking be classified into the Road and rail networks and associated land, Sport and leisure areas and facilities and Urban fabric categories respectively. Urban environments are almost always built in layers, one on top of the other, which complicates the accuracy of classifications based on land cover. Furthermore, Road and rail networks and associated land proved especially difficult to analyse, as many of the associated locations fall into the class simply because the geotag happens to be placed on a road. We also need to keep in mind the inaccuracy of smartphone GPS devices and Instagram location information (Cvetojevic et al., 2016). Especially in the presence of tall apartment buildings, the quality of the GPS signal is poor, which reduces the reliability of the position accuracy.

Automatic language recognition and big data have certain challenges. The data analysed in this study consists of roughly half of the original, unfiltered data. However, as human language use is highly varied and complex, it will probably never be possible to recognise such large quantities of unstandardized language use without at least minor errors here and there. Instagram is also primarily a photo sharing application where the emphasis is on visual content and relatively short caption texts. Short sentences can be very difficult to interpret without the associated image and as a result a large part of the text cannot be identified with sufficient certainty. In general, social media data are always highly varied and even ‘raw data’ sourced directly from the platform interface is not neutral, as it has already been filtered and classified by the platform itself. Therefore, the only way to pursue research questions such as the ones presented in this study is to accept these facts and try to find the boundaries that provide enough data without compromising the confidence of the predictions made by automatic language recognition tools.
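The trade-off between the amount of data and the confidence of the predictions can be illustrated with a small sketch. It assumes that the language identification tool returns a label and a confidence score for each sentence. The triples, the function names and the candidate thresholds below are hypothetical and do not reproduce the values used in this study.

```python
def filter_by_confidence(predictions, threshold):
    """Keep predictions whose confidence is at least `threshold`."""
    return [p for p in predictions if p[2] >= threshold]


def retained_share(predictions, threshold):
    """Share of the input that survives filtering at `threshold`."""
    if not predictions:
        return 0.0
    return len(filter_by_confidence(predictions, threshold)) / len(predictions)


# Hypothetical (sentence, predicted language, confidence) triples.
predictions = [
    ("Lovely evening by the sea", "en", 0.98),
    ("Ihana ilta meren rannalla", "fi", 0.95),
    ("<3", "en", 0.31),
    ("Helsinki", "fi", 0.52),
]

# A stricter threshold keeps fewer, but more reliable, predictions.
for threshold in (0.3, 0.5, 0.9):
    print(threshold, retained_share(predictions, threshold))
```

Tabulating the retained share for a range of thresholds is one simple way of looking for a boundary that keeps enough data while discarding short or ambiguous captions.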

It must be admitted that the rather simplistic way of grouping all languages other than English and Finnish into one seemingly homogenous group is by no means unproblematic and only usable to a very limited extent. Every language found in the data and their users are unique and different from each other. Although the sensibleness of grouping large numbers of vastly differing languages together can be questioned in many situations, I believe that the observations made here hold their place, as the presented data serve the sole purpose of contextualizing the bigger picture and providing a broad overview of what goes on with languages other than the ‘big two’, Finnish and English. I have done my best to avoid assuming all 73 languages in the group of ‘other languages’ are used in a similar manner. The focus of this study is first and foremost on the use of English and without making some compromises I would not have been able to fit everything that I have presented here into a single study without straying too far from the main objective. On the other hand, languages are always in constant interaction with each other and I believe that observing only the use of English on its own would have provided an all too narrow point of view.

5.2 Research Questions

To what extent is English used on Instagram in the Finnish capital region?

In light of the data that has been presented in this study it is clear that the English language has a very strong presence in the linguistically diverse VLL of the Finnish capital region found on Instagram. English is the second most used language in the studied area with a total of 328 986 sentences, or 36.7% of all sentences. Furthermore, 63 119 unique users, 48.6% of all users, used English at least once and a total of 49 507, or 38.1% of all users, used English more than any other language in their captions, both of which come very close to the number of users who used Finnish. The absolute number of sentences written in English grew by 8.4% per month on average, increasing steadily throughout the studied period of 21 months. However, the relative proportion of sentences in English out of all sentences actually decreased slightly, by 0.4% per month on average. The slightly downward trend could be explained by the fact that the popularity of Instagram grew very fast during the study period. As vast numbers of new users started using the platform, the relative proportions of the languages fluctuated accordingly.
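The two trends above, a growing absolute count alongside a slightly shrinking share, can occur together whenever the total grows faster than the English count. The sketch below illustrates this with the arithmetic mean of month-over-month percentage changes. The monthly figures are invented for illustration, and the averaging method is an assumption, since a compound growth rate would be an equally plausible definition.

```python
def mean_monthly_change(monthly_values):
    """Arithmetic mean of month-over-month percentage changes."""
    changes = [
        (curr - prev) / prev * 100
        for prev, curr in zip(monthly_values, monthly_values[1:])
    ]
    return sum(changes) / len(changes)


# Hypothetical monthly sentence counts, not values from this study.
english = [1000, 1100, 1210, 1150]
total = [2500, 2800, 3100, 3000]

# Growth of the absolute English count.
count_growth = mean_monthly_change(english)
# Change in the English share of all sentences.
share_change = mean_monthly_change([e / t for e, t in zip(english, total)])
print(round(count_growth, 2), round(share_change, 2))
```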

The English language clearly has a key role in communication on Instagram in the Finnish capital region and the findings of this study largely reflect what has been written on ELF (e.g. Jenkins et al., 2017), the status and extensive use of English in Finland (e.g. Taavitsainen & Pahta, 2008; Leppänen et al., 2011) and the use of English in online communication (e.g. Lee, 2016). The stable position of English as the second most used language in the VLL of the capital city region in the study period indicates that English has become a significant linguistic resource that is drawn on frequently in the area, at least on the studied platform Instagram. The wider notion of English as a significant language resource on social media in Finland is supported by previous studies on Twitter data as well (Laitinen et al., 2019; Hiippala et al., 2020) and the results of this study further confirm that idea.

How is the multilingual nature of English as a lingua franca communication reflected in the data?

It can be assumed that English is by no means the first language of the majority of the people using Instagram in the study area. The language is, however, a significant linguistic resource that both the Finns as well as foreigners in the Finnish capital region draw on very often as noted above. In light of the data, English is also not only a secondary linguistic resource, but also the primary language on Instagram for many in the study area and the presence of English is especially evident with multilingual users. A total of 27 835 unique users, or 21.4% of all users, used more than one language and out of those users a significant number of 25 885, or 93%, used English. Out of the predominantly Finnish users 11 307, or 21%, also used English, indicating that English is used rather frequently among the predominantly Finnish users in the area.
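The user-level figures above rest on counting, for each user, the set of languages found in their captions. A minimal sketch of that counting step is given below, assuming a mapping from user identifiers to language sets. The mapping, the function name `multilingual_summary` and the toy data are hypothetical. The last line recomputes the reported share from the two counts given in the text.

```python
def multilingual_summary(user_languages, language="en"):
    """Count multilingual users and the share of them who use `language`.

    `user_languages` maps a user id to the set of languages in that
    user's captions.
    """
    multilingual = {u: langs for u, langs in user_languages.items() if len(langs) > 1}
    with_language = sum(1 for langs in multilingual.values() if language in langs)
    share = with_language / len(multilingual) if multilingual else 0.0
    return len(multilingual), with_language, share


# Hypothetical toy data.
users = {
    "a": {"fi"},
    "b": {"fi", "en"},
    "c": {"en", "sv"},
    "d": {"fi", "sv"},
    "e": {"en"},
}
print(multilingual_summary(users))

# The same ratio with the counts reported in the text:
# 25 885 of the 27 835 users who used more than one language.
print(round(25885 / 27835 * 100))  # 93
```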

As discussed in the theoretical framework, in ELF communication speakers continuously adapt to one another in their interaction and as a result end up drawing on a wide range of multi- and translingual resources at the same time. The findings of this study support this idea, as English clearly forms a significant part of the linguistic resources of users who use multiple languages on Instagram. However, this matter deserves more attention in the future and as has been shown in this study, geotagged social media data holds great potential for future research on ELF and multilingual practices in specific contexts. Also, as has been demonstrated in previous research (Hiippala et al., 2019; Loikkanen, 2020) physical locations can have an effect on the choice of language online. Future research on VLLs and ELF should continue investigating the effects that different types of locations have on users and their multilingual practices. For example, future research could look into whether multilingual practices emerge more often in specific types of locations.

How do different areas and types of locations in the Finnish capital region differ in terms of English use on Instagram?

The results of this study revealed a rich VLL from which certain general observations can be made. The use of geotagged Instagram content is clearly focused on densely populated urban environments and transportation networks, and the use of English is no exception. The absolute numbers of sentences in English are highest in the Helsinki city centre and its close surroundings as well as eastern Espoo. Major public transportation lines such as railroads and the metro line as well as the Helsinki-Vantaa airport also stand out as hotspots. When considered in relation to Finnish and other languages, the use of English is highest in the Helsinki city centre, western Helsinki and eastern Espoo and lowest in north-eastern Helsinki and Vantaa. A detail that was especially interesting to observe was that the relative spatial distributions of English and Finnish look like mirror images of each other. Behind this phenomenon is the fact that these two languages are by far the most dominant languages in the region.
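The spatial clusters referred to in this study were identified by means of spatial autocorrelation analysis. As a simplified illustration of the underlying idea, the sketch below computes the global Moran's I statistic for a handful of grid cells. It is not the local (LISA) analysis cited in the references, and the cell values, the weights matrix and the function name `morans_i` are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix.

    `weights[i][j]` is the weight between units i and j (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between all pairs.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / variance_sum)


# Four hypothetical grid cells in a row; neighbouring cells share an edge.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [0.6, 0.5, 0.2, 0.1]    # similar proportions sit next to each other
alternating = [0.6, 0.1, 0.6, 0.1]  # neighbouring proportions differ sharply
print(morans_i(clustered, weights), morans_i(alternating, weights))
```

Positive values indicate that cells with similar proportions of a language tend to be neighbours, which is the pattern that shows up as clusters on a map. Negative values indicate the opposite.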

Analysis of the different types of locations revealed interesting insights into the VLL of the capital city region. The total numbers of sentences in English are highest in various commercial and urban contexts as well as transportation networks. The English language has a strong presence in essentially all the studied location types as the language is usually used almost as much as Finnish and at times even more. The fact that English is used extensively in a wide range of contexts is yet another indication that English has become a significant linguistic resource in the Finnish capital region. In most location types the use of English is fairly close to the average of the entire area, i.e. 37% of all sentences, as the relative difference from average is between -10% and 10% in seven out of the ten CLC categories analysed, namely Commercial units, Urban fabric, Road and rail networks and associated land, Industrial units, Forests and semi-natural areas, Green urban areas and Port areas. The use of English is especially high at airport areas (essentially the Helsinki-Vantaa airport) as the language is used 27% more than on average. The high proportions of English as well as languages other than Finnish are arguably the result of the international mobility of people. The relative proportions of English use are lowest in sport and leisure facilities where the language is used 23% less than on average. Finnish is clearly the dominant language in these types of locations, which could be the result of various events organised in the arenas and stadiums of the capital city region. Lastly, around a tenth of all sentences are located in the natural and coastal areas and in these locations the use of English is relatively close to the average.
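The percentages quoted for the location types are relative differences from the study-area average rather than differences in percentage points. A minimal sketch of that calculation follows, assuming the overall English share of 37% given above. The class share used in the example is a hypothetical value chosen for illustration, not a figure reported in this study.

```python
def relative_difference(class_share, overall_share):
    """Relative difference of a class share from the overall share, in percent."""
    return (class_share - overall_share) / overall_share * 100


overall_english = 0.37           # share of English in the whole study area (from the text)
hypothetical_class_share = 0.47  # illustrative value, not taken from this study

diff = relative_difference(hypothetical_class_share, overall_english)
print(round(diff))  # 27
```

Under this reading, a location type where English is used 27% more than on average would have an English share of roughly 47% of its sentences, not 64%.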

For the most part, the monthly data indicates that the VLLs of different types of locations are relatively stable. The use of English also seems very similar at different times of day and days of week in different types of locations when compared with Finnish and other languages. Similarly, the temporal data showed that different location types have distinct seasonal patterns that appear to be very similar for English, Finnish and other languages. The most significant seasonal changes were observed in natural and coastal areas, which is to be expected as Finland is a country with four distinct seasons and the use of outdoor areas varies accordingly throughout the year.

The figures presented in this study must be considered with the inaccuracy of locations found on Instagram and the limitations of categorization based on land cover in mind. The results should therefore be interpreted mainly as indicative figures and broad generalizations. The tables presented in the Appendices provide an overview of the location labels that fall into each of the CLC categories and it is easy to see that there is a great amount of variation in the distributions of languages in the individual locations found in each category. To gain an even better understanding of where and when the use of English is highest and lowest, zooming in on specific locations is required. The maps presented in this study, especially the detected spatial clusters and statistics on different types of locations, could be used as a starting point for further research on the VLL of the capital city region.

5.3 Suggestions for further research

This study provided an overview perspective on the use of English on Instagram in the Finnish capital region. However, the intriguing questions that are left somewhat unanswered in the current study are as follows: What are the reasons behind English emerging more prominently in some settings and not in others? Furthermore, what kinds of effects do different types of physical locations have on language choices on social media? Answering these questions would most likely involve tracking user behaviour and factoring in the likeliest country of origin of each user as demonstrated by Hiippala et al. (2019) and Hiippala et al. (2020). A thorough analysis of geotagging behaviour on social media could also be used for pursuing these questions. Nowadays, Instagram and most other social media platforms use hierarchical location categorization which can be used to define whether a location is, for example, a restaurant, cafe, sports facility, night club, tourist attraction, city or street. Using the built-in POI categorizations of social media platforms (Hochmair et al., 2018) could provide us with a new source of detailed information about distinct location types that could be used for effectively analysing user behaviour in different environments.

Although the Corine Land Cover inventory used in this study was not originally developed for purely linguistic enquiries, it proved to be a reasonably valid tool for analysing the data and the inventory might prove beneficial in future research. The CLC classification provides an efficient way of categorising locations into broad categories. As already mentioned, the biggest problem with the CLC inventory is that it is based on land cover and therefore cannot take into account the multilayered nature of densely built cities. Thus, complementary methods and points of view are most likely needed in the future when working with the CLC inventory and linguistic analyses.

The multilingual Helsinki Metropolitan area is such a diverse area that large-scale studies such as this can offer only broad generalizations of the VLL of the area. Mapping the way languages are used in geotagged social media involves conducting studies at various spatial scales. Variation in individual locations is evident and therefore zooming in on specific types of locations is needed if we want to explore individual areas in more detail, as demonstrated by previous studies (Hiippala et al., 2019; Heikinheimo et al., 2020; Loikkanen, 2020). Furthermore, comparing the VLLs of other cities and areas of Finland with the results of this study would further complement our understanding of the use of English, Finnish and other languages.

While this study focused mainly on the use of English, it is by no means the only language worth exploring. A central interest in the study of VLLs and LLs is the interaction of all the languages present in the given space. As I showed in Section 4.1, the Finnish capital region features a rich VLL. Studying the differences in linguistic diversity across the Finnish capital region as demonstrated by Heikinheimo et al. (2020) and Hiippala et al. (2020) would provide us with an even better sense of the linguistic richness of the area.

As a last note, collecting an up to date set of data for the purposes of this study would have been very exciting as the amounts of geotagged posts in the Finnish capital region showed a clear rising trend and the use of Instagram is nowadays more popular than ever. However, access to data from Instagram has become significantly more restricted after they closed their API. It is largely up to the social media platforms themselves to decide whether they want to give researchers access to their archives through their API services. As Bruns (2019) discusses, Facebook (the owner of Instagram among other social media platforms) in particular wants to manage its PR image, which is why their API services are closed and Facebook can decide for themselves what kind of research is done with their data and by whom. On the other hand, Twitter opened its entire archive for free for academic research in January 2021¹². Big data from social media is a valuable source for various fields of research, so let us hope major social media platforms continue to support academic research in the future.

¹² https://blog.twitter.com/developer/en_us/topics/tools/2021/enabling-the-future-of-academic-research-with-the-twitter-api.html

6 Conclusion

In this MA thesis, I reported on an empirical study on the use of English on Instagram in the Finnish capital region. Building on previous research on VLL and ELF, I set out to explore the role of English in the study area by using geotagged social media data and methods of natural language processing and geoinformatics. Based on the geotagged Instagram data, the English language has a remarkably strong presence in the rich VLL of the Finnish capital region. Over a third of all users used English more than any other language, almost half used English at least once and out of the users who used more than one language over nine out of ten used English. The stable position of English as the second most used language throughout the study period indicates that the language has become a significant linguistic resource in the area. The spatial patterns of English use show that the language is used, in relation to other languages, most in the Helsinki city centre, western Helsinki and eastern Espoo and least in north-eastern Helsinki and Vantaa. The English language has a strong presence in essentially all the studied location types, especially in commercial and urban contexts. The use of English is especially high at the Helsinki-Vantaa airport whereas the language seems to be used least in various sports and leisure facilities, event arenas and stadiums.

The data and methods used in this study demonstrate how the study of Applied Linguistics as well as other fields of research can benefit from the application of existing computational research methodologies used in other fields of research (see e.g. Hiippala et al., 2019; Hiippala et al., 2020). The methods demonstrated here are relatively straightforward and by no means novel in the fields of geography and geoinformatics, and more advanced methods hold even more potential and new perspectives. Especially with some collaboration, linguists can easily learn and adapt new tools to use in their research.

7 References

Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical analysis, 27(2), 93-115.

Anselin, L. (2019). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In Fischer, M., Scholten, H., Unwin, D. (Eds.), Spatial Analytical Perspectives on GIS (pp. 111–126). Routledge.

Anselin, L., Syabri, I., & Kho, Y. (2010). GeoDa: an introduction to spatial data analysis. In Handbook of applied spatial analysis (pp. 73-89). Springer, Berlin, Heidelberg.

Berezkina, M. (2018). ‘Language is a costly and complicating factor’: a diachronic study of language policy in the virtual public sector. Language Policy, 17(1), 55-75.

Blackwood, R. (2015). LL explorations and methodological challenges: Analysing France’s regional languages. Linguistic Landscape, 1(1-2), 38-53.

Blackwood, R. (2019). Language, images, and Paris Orly airport on Instagram: multilingual approaches to identity and self-representation on social media. International journal of multilingualism, 16(1), 7-24.

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.

Bruns, A. (2019). After the ‘APIcalypse’: social media platforms and their fight against critical scholarly research. Information, Communication & Society, 22(11), 1544-1566.

Cenoz, J., & Gorter, D. (2006). Linguistic landscape and minority languages. International journal of multilingualism, 3(1), 67-80.

Coats, S. (2019). Language choice and gender in a Nordic social media corpus. Nordic Journal of Linguistics, 42(1), 31-55.

Cogo, A. (2017). ELF and multilingualism. In Jenkins, J., Baker, W., & Dewey, M. (Eds.), The Routledge handbook of English as a lingua franca (pp. 357-368). Routledge.

Coluzzi, P. (2017). Italian in the linguistic landscape of Kuala Lumpur (Malaysia). International Journal of Multilingualism, 14(2), 109-123.

Crampton, J. W., Graham, M., Poorthuis, A., Shelton, T., Stephens, M., Wilson, M. W., & Zook, M. (2013). Beyond the geotag: situating ‘big data’ and leveraging the potential of the geoweb. Cartography and geographic information science, 40(2), 130-139.

Cvetojevic, S., Juhasz, L., & Hochmair, H. H. (2016). Positional accuracy of Twitter and Instagram images in urban environments. GI_Forum, 1, 191-203.

General Data Protection Regulation (GDPR). 2018. Available at: https://gdpr-info.eu/

Good, P. (2013). Permutation tests: a practical guide to resampling methods for testing hypotheses. Springer Science & Business Media.

Gorter, D. (2018). Methods and techniques for linguistic landscape research: About definitions, core issues and technological innovations. Expanding the Linguistic Landscape: Multilingualism, Language Policy and the Use of Space as a Semiotic Resource, 38, 57.

Hausmann, A., Toivonen, T., Slotow, R., Tenkanen, H., Moilanen, A., Heikinheimo, V., & Di Minin, E. (2018). Social media data can be used to understand tourists’ preferences for nature‐based experiences in protected areas. Conservation Letters, 11(1), e12343.

Heikinheimo, V., Tenkanen, H., Bergroth, C., Järv, O., Hiippala, T., & Toivonen, T. (2020). Understanding the use of urban green spaces from user-generated geographic information. Landscape and Urban Planning, 201, 103845.

Henricson, S. (2020). Aktivistinen kielimaisema: pilottitutkimus. AFinLAn vuosikirja, 95-114.

Hiippala, T., Hausmann, A., Tenkanen, H., & Toivonen, T. (2019). Exploring the linguistic landscape of geotagged social media content in urban environments. Digital Scholarship in the Humanities, 34(2), 290-309.

Hiippala, T., Väisänen, T. L. A., Toivonen, T., & Järv, O. (2020). Mapping the languages of Twitter in Finland: Richness and diversity in space and time. Neuphilologische Mitteilungen.

Hochmair, H. H., Juhász, L., & Cvetojevic, S. (2018). Data quality of points of interest in selected mapping and social media platforms. In LBS 2018: 14th International Conference on Location Based Services (pp. 293-313). Springer, Cham.

Hovy, D., Rahimi, A., Baldwin, T., & Brooke, J. (2019). Visualizing regional language variation across Europe on Twitter. Handbook of the Changing World Language Map, 3719-3742.

Hult, F. M. (2014). Drive-thru linguistic landscaping: Constructing a linguistically dominant place in a bilingual space. International Journal of Bilingualism, 18(5), 507-523.

Ivkovic, D., & Lotherington, H. (2009). Multilingualism in cyberspace: Conceptualising the virtual linguistic landscape. International Journal of Multilingualism, 6(1), 17-36.

Jenkins, J. (2017). Introduction. In Jenkins, J., Baker, W., & Dewey, M. (Eds.), The Routledge handbook of English as a lingua franca. Routledge.

Jenkins, J., Baker, W., & Dewey, M. (2017). The Routledge handbook of English as a Lingua Franca. Routledge.

Järv, O., Tominga, A., Müürisepp, K., & Silm, S. (2021). The impact of COVID-19 on daily lives of transnational people based on smartphone data: Estonians in Finland. Journal of Location Based Services, 1-29.

Kellerman, A. (2010). Mobile broadband services and the availability of instant access to cyberspace. Environment and planning A, 42(12), 2990-3005.

Kellerman, A. (2014). The Internet as Second Action Space. Routledge.

Kotsios, A., Magnani, M., Vega, D., Rossi, L., & Shklovski, I. (2019). An analysis of the consequences of the general data protection regulation on social network research. ACM Transactions on Social Computing, 2(3), 1-22.

Kytölä, S. (2013). Multilingual language use and metapragmatic reflexivity in Finnish internet football forums: a study in the sociolinguistics of globalization (No. 200). University of Jyväskylä.

Laitinen, M., Lundberg, J., Levin, M., & Martins, R. M. (2018). The Nordic Tweet Stream: A dynamic real-time monitor corpus of big and rich language data. In Mäkelä, E., Tolonen, M., & Tuominen, J. (Eds.), DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018 (pp. 349-362). CEUR-WS.org. CEUR Workshop Proceedings.

Wei, L. (2018). Translanguaging as a practical theory of language. Applied linguistics, 39(1), 9-30.

Loikkanen, K. (2020). Language Choice between English and Finnish: Insights from Geotagged Social Media (Master’s thesis). University of Helsinki.

Longley, P. A., Adnan, M., & Lansley, G. (2015). The geotemporal demographics of Twitter usage. Environment and Planning A, 47(2), 465-484.

Lopez, B. E., Magliocca, N. R., & Crooks, A. T. (2019). Challenges and opportunities of social media data for socio-environmental systems research. Land, 8(7), 107.

Lee, C. (2016). Multilingualism online. Taylor & Francis.

Leppänen, S., & Nikula, T. (2007). Diverse uses of English in Finnish society: Discourse-pragmatic insights into media, educational and business contexts. Multilingua, 26(4), 333-380.

Leppänen, S., Pitkänen-Huhta, A., Nikula, T., Kytölä, S., Törmäkangas, T., Nissinen, K., ... & Jousmäki, H. (2011). National survey on the English language in Finland: Uses, meanings and attitudes. Studies in Variation, Contacts and Change in English Vol.5, Helsinki: VARIENG.

Maly, I., & Blommaert, J. (2019). Digital Ethnographic Linguistic Landscape Analysis. Tilburg Papers in Culture Studies; No. 233. Available at: https://www.tilburguniversity.edu/sites/default/files/download/TPCS_233-Maly-Blommaert.pdf

Mauranen, A. (2017). Conceptualising ELF. In Jenkins, J., Baker, W., & Dewey, M. (Eds.), The Routledge handbook of English as a lingua franca (pp. 7-24). Routledge.

Moreno, M. A., Goniu, N., Moreno, P. S., & Diekema, D. (2013). Ethics of social media research: common concerns and practical considerations. Cyberpsychology, behavior, and social networking, 16(9), 708-713.

Papen, U. (2012). Commercial discourses, gentrification and citizens’ protest: The linguistic landscape of Prenzlauer Berg, Berlin. Journal of Sociolinguistics, 16(1), 56-80.

Pennycook, A. (2010). Language as a local practice. Routledge.

Pennycook, A., & Otsuji, E. (2015). Metrolingualism: Language in the city. Routledge.

Pietikäinen, S., Lane, P., Salo, H., & Laihiala-Kankainen, S. (2011). Frozen actions in the Arctic linguistic landscape: A nexus analysis of language processes in visual space. International Journal of Multilingualism, 8(4), 277-298.

Purschke, C., & Hovy, D. (2019). Lörres, Möppes, and the Swiss: Discovering regional patterns in anonymous social media data. Journal of Linguistic Geography, 7(2), 113-134.

QGIS Development Team. (2021). QGIS geographic information system. Open source geospatial foundation project. http://qgis.osgeo.org

Sangiamchit, C. (2017). ELF in electronically mediated intercultural communication. In Jenkins, J., Baker, W., & Dewey, M. (Eds.), The Routledge handbook of English as a lingua franca (pp. 345-356). Routledge.

Shohamy, E. G., Rafael, E. B., & Barni, M. (Eds.). (2010). Linguistic landscape in the city. Multilingual Matters.

Soler-Carbonell, J. (2016). Complexity perspectives on linguistic landscapes: a scalar analysis. Linguistic Landscape, 2(1), 1-25.

Steiger, E., Westerholt, R., & Zipf, A. (2016). Research on social media feeds–A GIScience perspective. European Handbook of Crowdsourced Geographic Information, 237-254.

Taavitsainen, I., & Pahta, P. (2003). English in Finland: Globalisation, language awareness and questions of identity. English Today, 19(4), 3-15.

Taavitsainen, I., & Pahta, P. (2008). From global language use to local meanings: English in Finnish public discourse. English today, 24(3), 25-38.

Tasse, D., Liu, Z., Sciuto, A., & Hong, J. (2017). State of the geotags: Motivations and recent changes. Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2017). 250-259.

Tenkanen, H. (2017). Capturing time in space: Dynamic analysis of accessibility and mobility to support spatial planning with open data and tools. Department of Geosciences and Geography A55. Unigrafia, Helsinki.

Tenkanen, H., Di Minin, E., Heikinheimo, V., Hausmann, A., Herbst, M., Kajala, L., & Toivonen, T. (2017). Instagram, Flickr, or Twitter: Assessing the usability of social media data for visitor monitoring in protected areas. Scientific reports, 7(1), 1-11.

Thorne, S. L., & Ivković, D. (2015). Multilingual Eurovision meets plurilingual YouTube. Dialogue in multilingual and multimodal communities, 27, 167.

Tufi, S., & Blackwood, R. J. (2016). The linguistic landscape of the Mediterranean: French and Italian coastal cities. Springe

Woo, W. S., & Nora Riget, P. (2020). Linguistic landscape in Kuala Lumpur international airport, Malaysia. Journal of Multilingual and Multicultural Development, 1-20.

8 Appendices

The following tables present the top 15 location labels for each CLC class. They are included to give the reader a general idea of the most popular location labels in each land cover category. It is important to keep in mind that location labels are user-generated names for locations, and that the geographical coordinates associated with them correspond to the intended location with varying degrees of accuracy.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Helsinki, Finland | 8 183 | 50.5% | 2 949 | 18.2% | 5 085 | 31.4% | 16 217 |
| Messukeskus Helsinki | 1 647 | 29.4% | 3 563 | 63.6% | 392 | 7.0% | 5 602 |
| Helsinki Senate Square | 1 699 | 37.3% | 788 | 17.3% | 2 074 | 45.5% | 4 561 |
| Linnanmäki | 968 | 21.7% | 2 868 | 64.4% | 619 | 13.9% | 4 455 |
| TAVASTIA-klubi | 1 072 | 29.7% | 2 357 | 65.2% | 184 | 5.1% | 3 613 |
| Kiasma | 1 410 | 42.1% | 1 024 | 30.6% | 914 | 27.3% | 3 348 |
| Helsinki | 1 396 | 49.2% | 227 | 8.0% | 1 212 | 42.8% | 2 835 |
| The Circus | 1 010 | 51.3% | 803 | 40.8% | 156 | 7.9% | 1 969 |
| Kauppakeskus Kamppi | 340 | 19.1% | 1 261 | 70.7% | 183 | 10.3% | 1 784 |
| Stockmann | 615 | 36.0% | 641 | 37.6% | 450 | 26.4% | 1 706 |
|  | 792 | 48.5% | 779 | 47.7% | 62 | 3.8% | 1 633 |
| The Cock | 1 052 | 64.5% | 512 | 31.4% | 66 | 4.0% | 1 630 |
| , Helsinki | 757 | 49.7% | 108 | 7.1% | 657 | 43.2% | 1 522 |
| Finlandia-talo | 661 | 45.3% | 631 | 43.2% | 167 | 11.4% | 1 459 |
| Ruoholahti | 594 | 40.8% | 664 | 45.6% | 199 | 13.7% | 1 457 |

Appendix A. Top 15 location labels in Commercial units.
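The percentage columns in these tables can be recomputed from the sentence counts. The following is a minimal sketch, assuming that each percentage is the language's sentence count divided by the total count for the location label, rounded to one decimal place. The function name `language_shares` is hypothetical and not part of the analysis code used in this thesis. The input values are taken from the "Helsinki, Finland" row of Appendix A.

```python
def language_shares(english, finnish, other):
    """Return percentage shares (one decimal) and the total sentence count.

    Assumption: a share is count / total * 100 for a single location label.
    """
    total = english + finnish + other
    if total == 0:
        raise ValueError("location label has no sentences")
    # Share of each language out of all sentences posted under the label
    shares = tuple(round(100 * n / total, 1) for n in (english, finnish, other))
    return shares, total


# Sentence counts for the "Helsinki, Finland" label in Appendix A
shares, total = language_shares(english=8183, finnish=2949, other=5085)
print(shares, total)
```

Under this assumption the computed shares and total agree with the values reported for that row. Because of rounding, the three percentages of a row do not always sum to exactly 100%.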

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Finland | 2 588 | 56.1% | 288 | 6.2% | 1 738 | 37.7% | 4 614 |
| Punavuori | 1 138 | 47.7% | 1 109 | 46.5% | 140 | 5.9% | 2 387 |
| Töölö | 907 | 41.2% | 1 176 | 53.5% | 116 | 5.3% | 2 199 |
| Cafe Kokko | 598 | 53.7% | 473 | 42.5% | 43 | 3.9% | 1 114 |
| Katajanokka | 488 | 45.7% | 459 | 43.0% | 121 | 11.3% | 1 068 |
| Kannelmäki Hoodz | 226 | 25.8% | 592 | 67.5% | 59 | 6.7% | 877 |
| Karhupuisto | 305 | 36.3% | 493 | 58.6% | 43 | 5.1% | 841 |
| Arabianrannan Rantapuisto | 347 | 42.9% | 378 | 46.7% | 84 | 10.4% | 809 |
| Kruununhaka | 385 | 49.4% | 361 | 46.3% | 34 | 4.4% | 780 |
| LÄVISTYSLIIKE | 441 | 62.0% | 269 | 37.8% | 1 | 0.1% | 711 |
| Gloria | 202 | 29.1% | 468 | 67.4% | 24 | 3.5% | 694 |
| Espoo, Tapiola | 272 | 39.2% | 309 | 44.5% | 113 | 16.3% | 694 |
| Koti | 115 | 16.9% | 543 | 79.9% | 22 | 3.2% | 680 |
| Viikki | 206 | 30.4% | 390 | 57.5% | 82 | 12.1% | 678 |
| TFW Helsinki | 222 | 32.7% | 452 | 66.7% | 4 | 0.6% | 678 |

Appendix B. Top 15 location labels in Urban fabric.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Suomenlinna Island | 2 958 | 41.8% | 1 818 | 25.7% | 2 298 | 32.5% | 7 074 |
| Kaapelitehdas | 1 817 | 37.9% | 2 715 | 56.6% | 268 | 5.6% | 4 800 |
| Optimal Performance Center | 103 | 4.6% | 2 147 | 95.3% | 4 | 0.2% | 2 254 |
| Suomenlinna | 765 | 34.4% | 1 028 | 46.3% | 428 | 19.3% | 2 221 |
| Flow Festival Helsinki | 709 | 47.5% | 499 | 33.4% | 284 | 19.0% | 1 492 |
| Suvilahti | 421 | 42.5% | 538 | 54.3% | 31 | 3.1% | 990 |
| M/S Silja Serenade | 174 | 19.7% | 467 | 52.8% | 244 | 27.6% | 885 |
| SuperPark Vantaa | 156 | 19.7% | 613 | 77.4% | 23 | 2.9% | 792 |
| Suomenlinna Sveaborg | 212 | 29.0% | 386 | 52.7% | 134 | 18.3% | 732 |
| Hernesaaren Ranta | 203 | 31.3% | 403 | 62.1% | 43 | 6.6% | 649 |
| Angel Films Oy | 44 | 7.1% | 567 | 91.9% | 6 | 1.0% | 617 |
| Siltanen | 175 | 28.9% | 407 | 67.2% | 24 | 4.0% | 606 |
| Ääniwalli | 187 | 31.3% | 371 | 62.1% | 39 | 6.5% | 597 |
| Radio NRJ Finland | 24 | 4.1% | 557 | 95.2% | 4 | 0.7% | 585 |
| Kalasatama / Fiskehamnen | 232 | 40.3% | 325 | 56.5% | 18 | 3.1% | 575 |

Appendix C. Top 15 location labels in Industrial units.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Hartwall Arena | 2 282 | 24.2% | 6 124 | 65.0% | 1 022 | 10.8% | 9 428 |
| Helsinki Ice Hall | 1 177 | 38.6% | 1 388 | 45.5% | 485 | 15.9% | 3 050 |
| Helsinki Olympic Stadium | 1 070 | 36.8% | 1 244 | 42.8% | 592 | 20.4% | 2 906 |
| Espoo Metro Areena | 507 | 19.4% | 1 974 | 75.7% | 128 | 4.9% | 2 609 |
| Nordis, Stadi | 354 | 20.8% | 1 220 | 71.5% | 132 | 7.7% | 1 706 |
| Sonera Stadium | 254 | 19.7% | 901 | 69.8% | 135 | 10.5% | 1 290 |
| Aurinkolahden ranta | 374 | 30.6% | 602 | 49.2% | 247 | 20.2% | 1 223 |
| Café Regatta | 476 | 41.4% | 397 | 34.5% | 277 | 24.1% | 1 150 |
| Töölön Kisahalli | 306 | 30.8% | 652 | 65.6% | 36 | 3.6% | 994 |
| Cafe Regatta (official) | 400 | 42.4% | 346 | 36.7% | 198 | 21.0% | 944 |
| Tapiolan urheiluhalli | 84 | 11.2% | 657 | 87.7% | 8 | 1.1% | 749 |
| Otahalli | 126 | 17.4% | 582 | 80.5% | 15 | 2.1% | 723 |
| Kaisaniemen puisto | 252 | 35.4% | 346 | 48.6% | 114 | 16.0% | 712 |
| Trio Areena | 68 | 11.0% | 539 | 86.8% | 14 | 2.3% | 621 |
| Leppävaaran urheilupuisto | 160 | 25.8% | 436 | 70.3% | 24 | 3.9% | 620 |

Appendix D. Top 15 location labels in Sport and leisure areas and facilities.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| helsinki | 6 961 | 51.0% | 3 010 | 22.1% | 3 673 | 26.9% | 13 644 |
| Espoo, Finland | 4 526 | 51.5% | 2 957 | 33.6% | 1 313 | 14.9% | 8 796 |
| Vantaa, Finland | 1 781 | 38.3% | 1 947 | 41.8% | 926 | 19.9% | 4 654 |
| Helsinki Central railway station | 1 035 | 42.2% | 551 | 22.5% | 868 | 35.4% | 2 454 |
| Ullanlinna | 893 | 44.9% | 844 | 42.4% | 252 | 12.7% | 1 989 |
| RUPLA | 649 | 34.7% | 1 206 | 64.5% | 16 | 0.9% | 1 871 |
| Helsinki - Finland | 734 | 51.2% | 200 | 14.0% | 499 | 34.8% | 1 433 |
| (Helsinki) | 440 | 32.7% | 842 | 62.6% | 64 | 4.8% | 1 346 |
| Helsinki Sörnäinen | 454 | 39.5% | 612 | 53.3% | 82 | 7.1% | 1 148 |
| Sinebrychoffin puisto | 411 | 39.6% | 569 | 54.8% | 58 | 5.6% | 1 038 |
|  | 371 | 38.6% | 445 | 46.4% | 144 | 15.0% | 960 |
| Kuudes Linja | 235 | 25.5% | 652 | 70.6% | 36 | 3.9% | 923 |
| Kino Sheryl | 76 | 8.9% | 757 | 88.8% | 19 | 2.2% | 852 |
| Suomen kansallismuseo - The National Museum of Finland | 234 | 34.1% | 301 | 43.9% | 151 | 22.0% | 686 |
| Hotel Kämp | 316 | 46.8% | 219 | 32.4% | 140 | 20.7% | 675 |

Appendix E. Top 15 location labels in Road and rail networks and associated land.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Helsinki Airport | 10 857 | 48.4% | 5 317 | 23.7% | 6 278 | 28.0% | 22 452 |
| Helsinki Airport T2 International | 1 922 | 44.4% | 1 521 | 35.1% | 889 | 20.5% | 4 332 |
| Helsinki-Vantaa Airport T2 | 841 | 38.9% | 853 | 39.4% | 469 | 21.7% | 2 163 |
| Helsinki-Vantaa Airport T1 | 799 | 43.9% | 660 | 36.3% | 359 | 19.7% | 1 818 |
| Helsinki, Finland | 629 | 51.2% | 62 | 5.0% | 538 | 43.8% | 1 229 |
| Helsinki-Vantaa Airport Terminal 2 | 432 | 36.2% | 436 | 36.5% | 326 | 27.3% | 1 194 |
| Helsinki Airport (HEL) | 443 | 39.4% | 417 | 37.1% | 264 | 23.5% | 1 124 |
| Finnair Business Lounge | 337 | 58.4% | 138 | 23.9% | 102 | 17.7% | 577 |
| Airplane | 450 | 79.8% | 4 | 0.7% | 110 | 19.5% | 564 |
| The Oak Barrel Irish Pub | 147 | 29.4% | 330 | 66.0% | 23 | 4.6% | 500 |
| Starbucks Helsinki Airport | 192 | 45.3% | 139 | 32.8% | 93 | 21.9% | 424 |
| Helsinki-Malmin lentoasema | 174 | 43.5% | 193 | 48.3% | 33 | 8.3% | 400 |
| Hilton Helsinki Airport | 205 | 52.2% | 70 | 17.8% | 118 | 30.0% | 393 |
| Helsinki-Vantaa Airport (hel) Flygstationsvägen 1 01530 Vantaa Finland | 128 | 43.8% | 109 | 37.3% | 55 | 18.8% | 292 |
| Finnair Premium Lounge | 168 | 64.1% | 42 | 16.0% | 52 | 19.8% | 262 |

Appendix F. Top 15 location labels in Airports.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Port of Helsinki | 506 | 34.9% | 47 | 3.2% | 896 | 61.8% | 1 449 |
| helsinki | 634 | 51.5% | 232 | 18.8% | 365 | 29.7% | 1 231 |
| M/S Baltic Queen | 132 | 14.7% | 668 | 74.3% | 99 | 11.0% | 899 |
| Länsiterminaali | 171 | 23.3% | 332 | 45.2% | 231 | 31.5% | 734 |
| Tallink Silja M/S Superstar | 218 | 30.9% | 203 | 28.8% | 285 | 40.4% | 706 |
| SkyWheel Helsinki | 242 | 35.2% | 118 | 17.2% | 328 | 47.7% | 688 |
| Vanha Kauppahalli | 263 | 46.3% | 167 | 29.4% | 138 | 24.3% | 568 |
| Viking Line M/S Mariella | 117 | 23.9% | 189 | 38.6% | 184 | 37.6% | 490 |
| Olympia Terminal | 89 | 25.1% | 148 | 41.7% | 118 | 33.2% | 355 |
| MS Viking XPRS | 71 | 23.7% | 123 | 41.1% | 105 | 35.1% | 299 |
| Halkolaituri | 112 | 39.0% | 125 | 43.6% | 50 | 17.4% | 287 |
| M/S Silja Symphony | 67 | 24.3% | 104 | 37.7% | 105 | 38.0% | 276 |
| Tallink Star | 62 | 23.9% | 64 | 24.7% | 133 | 51.4% | 259 |
| Finnair Skywheel | 108 | 43.0% | 58 | 23.1% | 85 | 33.9% | 251 |
| Katajanokan terminaali | 52 | 21.4% | 111 | 45.7% | 80 | 32.9% | 243 |

Appendix G. Top 15 location labels in Port areas.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Sibelius Monument | 588 | 36.6% | 134 | 8.3% | 884 | 55.0% | 1 606 |
| Töölönlahti | 332 | 44.1% | 362 | 48.1% | 58 | 7.7% | 752 |
| Kaivopuisto, Helsinki / Brunnsparken, Helsingfors | 315 | 45.6% | 308 | 44.6% | 68 | 9.8% | 691 |
| Lauttasaari, Etelä-Suomen Lääni, Finland | 222 | 37.2% | 338 | 56.6% | 37 | 6.2% | 597 |
| Viikin Pellot / Viikki Fields | 202 | 34.2% | 335 | 56.8% | 53 | 9.0% | 590 |
| Kaivopuisto / Brunnsparken | 192 | 38.4% | 252 | 50.4% | 56 | 11.2% | 500 |
| Lauttasaari, Drumsö | 170 | 34.5% | 261 | 52.9% | 62 | 12.6% | 493 |
| Alppipuisto | 91 | 24.3% | 276 | 73.8% | 7 | 1.9% | 374 |
| Malminkartanon Portaat | 82 | 24.9% | 230 | 69.9% | 17 | 5.2% | 329 |
| Lauttasaari / Drumsö | 124 | 40.9% | 157 | 51.8% | 22 | 7.3% | 303 |
| Bergga | 92 | 32.6% | 182 | 64.5% | 8 | 2.8% | 282 |
| Katri Valan puisto | 94 | 33.7% | 176 | 63.1% | 9 | 3.2% | 279 |
| Esplanadin puisto / Esplanadparken | 88 | 36.2% | 110 | 45.3% | 45 | 18.5% | 243 |
| Torkkelinmäki | 85 | 35.3% | 151 | 62.7% | 5 | 2.1% | 241 |
| Sibeliuspuisto | 69 | 29.5% | 107 | 45.7% | 58 | 24.8% | 234 |

Appendix H. Top 15 location labels in Green urban areas.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Lauttasaari | 1 253 | 44.4% | 1 393 | 49.4% | 176 | 6.2% | 2 822 |
| Nuuksio National Park | 908 | 52.6% | 578 | 33.5% | 239 | 13.9% | 1 725 |
| Seurasaari | 711 | 45.1% | 616 | 39.1% | 248 | 15.7% | 1 575 |
| , Helsinki, Finland | 276 | 30.3% | 266 | 29.2% | 369 | 40.5% | 911 |
| Kauppatori | 259 | 35.6% | 245 | 33.7% | 224 | 30.8% | 728 |
| Suomenlinnan lautta - Suomenlinna ferry | 193 | 30.9% | 184 | 29.5% | 247 | 39.6% | 624 |
| Pasila | 219 | 38.6% | 321 | 56.5% | 28 | 4.9% | 568 |
| Kivinokka | 179 | 35.6% | 304 | 60.4% | 20 | 4.0% | 503 |
| Espoon Keskuspuisto | 159 | 32.9% | 268 | 55.4% | 57 | 11.8% | 484 |
| Seurasaari - Fölisön | 191 | 40.5% | 210 | 44.5% | 71 | 15.0% | 472 |
| , Helsinki | 200 | 47.7% | 146 | 34.8% | 73 | 17.4% | 419 |
| Herttoniemenranta / Hertonäs strand | 183 | 45.0% | 210 | 51.6% | 14 | 3.4% | 407 |
| Uutelan Luonnonpuisto | 234 | 62.4% | 122 | 32.5% | 19 | 5.1% | 375 |
| Keskuspuisto | 149 | 40.1% | 215 | 57.8% | 8 | 2.2% | 372 |
| Hietaniemi | 147 | 40.5% | 163 | 44.9% | 53 | 14.6% | 363 |

Appendix I. Top 15 location labels in Forests and semi-natural areas.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Market Square, Helsinki | 1 555 | 49.0% | 387 | 12.2% | 1 231 | 38.8% | 3 173 |
| Tokoinranta | 370 | 36.4% | 608 | 59.8% | 39 | 3.8% | 1 017 |
| Eiranranta | 302 | 43.6% | 342 | 49.4% | 49 | 7.1% | 693 |
| Haukilahden ranta | 272 | 39.7% | 347 | 50.7% | 66 | 9.6% | 685 |
| Kaivopuiston ranta | 282 | 45.9% | 291 | 47.3% | 42 | 6.8% | 615 |
| Kruunuvuori "Ghost Town" | 230 | 50.8% | 208 | 45.9% | 15 | 3.3% | 453 |
| Skiffer | 153 | 43.0% | 193 | 54.2% | 10 | 2.8% | 356 |
| Vanhankaupunginkoski / Gammelstadsforsen | 109 | 31.6% | 225 | 65.2% | 11 | 3.2% | 345 |
| Tuusula, Finland | 99 | 30.7% | 191 | 59.1% | 33 | 10.2% | 323 |
| Matinkylän uimaranta | 120 | 37.3% | 156 | 48.4% | 46 | 14.3% | 322 |
| Hakaniemenranta | 95 | 29.9% | 199 | 62.6% | 24 | 7.5% | 318 |
| South Harbour, Helsinki | 153 | 49.4% | 16 | 5.2% | 141 | 45.5% | 310 |
| Gulf Of Finland | 163 | 60.6% | 15 | 5.6% | 91 | 33.8% | 269 |
| Harbour | 94 | 36.0% | 127 | 48.7% | 40 | 15.3% | 261 |
| Rajasaaren koirapuisto | 104 | 42.6% | 106 | 43.4% | 34 | 13.9% | 244 |

Appendix J. Top 15 location labels in Water bodies and wetlands.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Jätkäsaari | 506 | 32.2% | 999 | 63.5% | 67 | 4.3% | 1 572 |
| Pasila, Etelä-Suomen Lääni, Finland | 66 | 46.8% | 69 | 48.9% | 6 | 4.3% | 141 |
| Mad House | 40 | 58.0% | 26 | 37.7% | 3 | 4.3% | 69 |
| Länsisatama Car Check-in | 9 | 18.4% | 32 | 65.3% | 8 | 16.3% | 49 |
| Make Your Mark Gallery | 27 | 58.7% | 18 | 39.1% | 1 | 2.2% | 46 |
| Mad House Helsinki | 8 | 19.5% | 33 | 80.5% | 0 | 0.0% | 41 |
| Korkeasaari | 11 | 47.8% | 7 | 30.4% | 5 | 21.7% | 23 |
| West Harbour, Helsinki | 8 | 42.1% | 1 | 5.3% | 10 | 52.6% | 19 |
| Kalafornia | 9 | 47.4% | 10 | 52.6% | 0 | 0.0% | 19 |
| Ämmässuon Jätteenkäsittelykeskus | 4 | 22.2% | 13 | 72.2% | 1 | 5.6% | 18 |
| Verkkosaari | 6 | 35.3% | 10 | 58.8% | 1 | 5.9% | 17 |
| Kuninkaantammi | 0 | 0.0% | 17 | 100% | 0 | 0.0% | 17 |
| Kalasatama Hoodzzz | 5 | 31.3% | 9 | 56.3% | 2 | 12.5% | 16 |
| Lataa GoGolf App IOS, Android & Windows phone | 0 | 0.0% | 15 | 93.8% | 1 | 6.3% | 16 |
| Ravintola Lämpö | 3 | 23.1% | 10 | 76.9% | 0 | 0.0% | 13 |
| Maijan Kahvila @ Kivinokka | 1 | 7.7% | 12 | 92.3% | 0 | 0.0% | 13 |
| IHANA BAARI | 7 | 53.8% | 6 | 46.2% | 0 | 0.0% | 13 |

Appendix K. Top 15 locations in Mine, dump and construction sites.

| Location label | Sentences in English | English - percentage of location total | Sentences in Finnish | Finnish - percentage of location total | Sentences in other languages | Other languages - percentage of location total | Total count of sentences |
|---|---|---|---|---|---|---|---|
| Haukkalampi, Nuuksio. | 61 | 32.1% | 112 | 58.9% | 17 | 8.9% | 190 |
| Latokaski, Etelä-Suomen Lääni, Finland | 171 | 91.4% | 13 | 7.0% | 3 | 1.6% | 187 |
| Nurmijärvi | 59 | 42.4% | 79 | 56.8% | 1 | 0.7% | 139 |
| Pakkalanrinne | 15 | 25.9% | 39 | 67.2% | 4 | 6.9% | 58 |
| Haltiala | 32 | 59.3% | 21 | 38.9% | 1 | 1.9% | 54 |
| Biokeskus 3 | 0 | 0.0% | 30 | 73.2% | 11 | 26.8% | 41 |
| Sotunki, Etelä-Suomen Lääni, Finland | 13 | 43.3% | 15 | 50.0% | 2 | 6.7% | 30 |
| Kartanon Marja | 8 | 33.3% | 14 | 58.3% | 2 | 8.3% | 24 |
| Viikin Ponitalli | 3 | 13.0% | 12 | 52.2% | 8 | 34.8% | 23 |
| koti | 1 | 5.3% | 17 | 89.5% | 1 | 5.3% | 19 |
| Oulunkylän Siirtolapuutarha | 5 | 26.3% | 9 | 47.4% | 5 | 26.3% | 19 |
| , Etelä-Suomen Lääni, Finland | 4 | 22.2% | 14 | 77.8% | 0 | 0.0% | 18 |
| Arctic Choc | 13 | 72.2% | 5 | 27.8% | 0 | 0.0% | 18 |
|  | 7 | 38.9% | 10 | 55.6% | 1 | 5.6% | 18 |
| Tullisaari | 10 | 55.6% | 8 | 44.4% | 0 | 0.0% | 18 |

Appendix L. Top 15 locations in Agricultural areas.
