Language Change and Evolution in Online Social Networks
Total Page:16
File Type:pdf, Size:1020Kb
Language change and evolution in Online Social Networks Daniel James Kershaw, BSc (Hons), MRes Highwire Centre for Doctoral Training School of Computing and Communications Lancaster University Submitted for the degree of Doctor of Philosophy October, 2018 ii Declaration The material presented in this thesis is the result of original research by the named author, Daniel James Kershaw, under the supervision of Dr. Matthew Rowe, Dr. Patrick Stacey and Dr. Anastasios Noulas; carried in the Highwire Doctoral Training Centre at Lancaster University. This work was funded by the Engineering and Physical Sciences Research Council(EPSRC). Parts of this thesis are based upon prior publications by the author, listed below|all other sources, if quoted, are attributed accordingly in the body of the text. 1. D. Kershaw, M. Rowe, and P. Stacey, \Language innovation and change in on-line social networks", in Proceedings of the 26th ACM Conference on Hypertext & Social Media, ser. HT '15, Guzelyurt, Northern Cyprus: ACM, 2015, pp. 311{314, isbn: 978-1-4503-3395-5. doi: 10.1145/2700171. 2804449. [Online]. Available: http://doi.acm.org/10.1145/2700171.2804449 2. D. Kershaw, M. Rowe, and P. Stacey, \Towards modelling language innovation acceptance in online social networks", in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, ser. WSDM '16, San Francisco, California, USA: ACM, 2016, pp. 553{562, isbn: 978-1-4503-3716-8. doi: 10.1145/2835776.2835784. [Online]. Available: http://doi.acm.org/10. 1145/2835776.2835784 3. D. Kershaw, M. Rowe, A. Noulas, et al., \Birds of a feather talk together: user influence on language adoption", in Hawaii International Conference on System Sciences, Hawaii International Conference on System Sciences, Jan. 2017, isbn: 9780998133102. doi: 10.24251/HICSS.2017.225. [Online]. Available: http://hdl.handle.net/10125/41379 iii iv Funding Statement This work is funded by the Digital Economy programme (RCUK Grant EP/G037582/1), which supports the HighWire Centre for Doctoral Training (http://highwire.lancaster.ac.uk). v Language Change and Evolution in Online Social Networks Daniel James Kershaw Abstract Language is in constant flux, whether through the creation of new terms or the changing meanings of existing words. The process by which language change happens is through complex reinforcing inter- actions between individuals and the social structures in which they exist. There has been much research into language change and evolution [191], though this has involved manual processes that are both time consuming and costly. However, with the growth in popularity of Online Social Networks (OSNs), for the first time, researchers have access to fine-grained records of language and user interactions that not only contain data on the creation of these language innovations but also reveal the inter-user and inter- community dynamics that influence their adoptions and rejections. Having access to these OSN datasets means that language change and evolution can now be assessed and modelled through the application of computational and machine-learning-based methods. Therefore, this thesis looks at how one can detect and predict language change in OSN, as well as the factors that language change depends on. The an- swer to this over-arching question lies in three core components: first, detecting the innovations; second, modelling the individual user adoption process; and third, looking at the collective adoption across a network of individuals. In the first question, we operationalise traditional language acceptance heuristics (used to detect the emergence of new words) into three classes of computation time-series measures computing the variation in frequency, form and/or meaning. The grounded methods are applied to two OSNs, with results demonstrating the ability to detect language change across both networks. By additionally applying the methods to communities within each network, e.g. geographical regions, on Twitter and Subreddits in Reddit, the results indicate that language variation and change can be dependent on the community memberships. The second question in this thesis focuses on the process of users adopting language innovations in relation to other users with whom they are in contact. By modelling influence between users as a function of past innovation cascades, we compute a global activation threshold at which users adopt new terms dependent on exposure to them from their neighbours. Additionally, by testing the user interaction networks through random shuffles, we show that the time at which a user adopts a term is dependent on the local structure; however, a large part of the influence comes from sources external to the observed OSN. The final question looks at how the speakers of a language are embedded in social networks, and how vi the networks' resulting structures and dynamics influence language usage and adoption patterns. We show that language innovations diffuse across a network in a predictable manner, which can be modelled using structural, grammatical and temporal measures, using a logistic regression model to predict the vitality of the diffusion. With regard to network structure, we show how innovations that manifest across structural holes and weak ties diffuse deeper across the given network. Beyond network influence, our results demonstrate that the grammatical context through which innovations emerge also play an essential role in diffusion dynamics - this indicates that the adoption of new words is enabled by a complex interplay of both network and linguistic factors. The three questions are used to answer the over-arching question, showing that one can, indeed, model language change and forecast user and community adoption of language innovations. Additionally, we also show the ability to apply grounded models and methods and apply them within a scalable computational framework. However, it is a challenging process that is heavily influenced by the underlying processes that are not recorded within the data from the OSNs. vii viii Contents 1 Introduction 1 1.1 Research Questions........................................3 1.1.1 Question 1........................................3 1.1.2 Question 2........................................3 1.1.3 Question 3........................................4 1.1.4 Question 4........................................4 1.1.5 Overview.........................................5 1.2 Problem Relevance........................................5 1.3 Thesis Outline..........................................6 1.4 Publication List..........................................7 1.5 Invited Talks...........................................8 1.6 Press Clippings..........................................9 I Background and Methodology 11 2 Language Change and Social Structure 15 2.1 Language and Change...................................... 15 2.2 Language and the Internet.................................... 19 2.3 Language and Social Structure................................. 20 2.4 Summary............................................. 33 3 Social Computing 35 3.1 Innovation Detection....................................... 35 3.2 Innovation Adoption....................................... 41 3.3 Innovation Diffusion....................................... 46 3.4 Summary............................................. 51 ix 4 Research Methodology 55 4.1 Introduction............................................ 55 4.2 Epistemology and Paradigm................................... 56 4.3 Research Framework....................................... 57 4.4 Datasets and Data Collection.................................. 60 4.4.1 Twitter.......................................... 62 4.4.2 Reddit........................................... 65 4.4.3 Data Collection...................................... 66 4.4.4 Limitations........................................ 69 4.5 Research Methods........................................ 70 4.5.1 Innovation Detection................................... 71 4.5.2 User Innovation Adoption................................ 72 4.5.3 Innovation Propagation................................. 73 4.5.4 Summary......................................... 74 4.6 Ethics............................................... 74 4.7 Technical Implementation.................................... 76 4.8 Summary............................................. 78 5 Data Prepossessing and Social Networks 79 5.1 Introduction............................................ 79 5.2 Social Networks.......................................... 79 5.2.1 Graph Notation...................................... 80 5.2.2 User Interaction..................................... 81 5.2.3 Community Interaction................................. 87 5.2.4 Backbone Extraction................................... 95 5.2.5 Community Detection.................................. 96 5.3 Innovation identification..................................... 98 5.3.1 Timings.......................................... 99 5.4 Summary............................................. 100 II Contributions 103 6 Detection of Language Innovations 105 6.1 Introduction............................................ 105 6.2 Related Work........................................... 106 6.3 Language Innovation and Acceptance.............................. 108 x 6.4 Methods.............................................. 109 6.4.1 Data............................................ 110 6.4.2 Data Grouping...................................... 111 6.5 Operationalisation.......................................