
Online Social Networks and Media 1 (2017) 33–43

Bridging big data and qualitative methods in the social sciences: A case study of Twitter responses to high profile deaths by suicide

Dmytro Karamshuk a,∗, Frances Shaw b, Julie Brownlie b,∗, Nishanth Sastry a,∗

a King's College London, London WC2R 2LS, UK
b University of Edinburgh, Edinburgh EH8 9JU, UK

∗ Corresponding authors. E-mail addresses: [email protected] (D. Karamshuk), [email protected] (F. Shaw), [email protected] (J. Brownlie), [email protected] (N. Sastry).

Article history: Received 14 December 2016; Revised 23 January 2017; Accepted 25 January 2017

Keywords: Social media; Crowd-sourcing; Crowdflower; Natural language processing; Social science; Emotional distress; High-profile suicides; Public empathy

Abstract: With the rise of social media, a vast amount of new primary research material has become available to social scientists, but the sheer volume and variety of this make it difficult to access through the traditional approaches: close reading and nuanced interpretations of manual qualitative coding and analysis. This paper sets out to bridge the gap by developing semi-automated replacements for manual coding through a mixture of crowdsourcing and machine learning, seeded by the development of a careful manual coding scheme from a small sample of data. To show the promise of this approach, we attempt to create a nuanced categorisation of responses on Twitter to several recent high profile deaths by suicide. Through these, we show that it is possible to code automatically across a large dataset to a high degree of accuracy (71%), and discuss the broader possibilities and pitfalls of using Big Data methods for Social Science.

© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Social science has always had to find ways of moving between the small-scale, interpretative concerns of qualitative research and the large-scale, often predictive concerns of the quantitative. The quantitative end of that spectrum has traditionally had two interrelated features: active collection of data and creating a suitable sub-sample of the wider population. To the extent that such methods have also captured open-ended or qualitative data, the solution has been to apply manual coding, using a frame developed on the back of intensive qualitative analysis or an exhaustive coding of a smaller sample of responses. Although labour-intensive, manual coding has been critical for obtaining a nuanced understanding of complex social issues.

Social media has created vast amounts of potential qualitative research material – in the form of the observations and utterances of its population of users – that social scientists cannot ignore. Unlike the responses to survey questions, such material is not elicited as part of the research process, nor is its volume limited by the constraints and practicalities of the sample survey. With social media, we now have so much information that it is impossible to process everything using either the detailed analysis methods of qualitative research or the application of manual coding approaches of the kind used in survey research. In short, there are exciting new possibilities but also significant challenges.

For instance, when celebrities die, or deaths become politicised or public in some fashion, hundreds of thousands or even millions of tweets may result. How can some of the traditional concerns of social science – with interpretation (nuance), meaning and social relationships – be pursued within this deluge of largely decontextualised communication? Whereas Big Data methods can easily count the number of tweets, or even attach a 'sentiment score' to individual tweets, it is less clear whether existing methods can identify issues such as the presence or lack of empathy. And yet the application of traditional methods from qualitative social science, such as the close analysis of a small-scale sample of tweets relating to a public death, or the manual application of a coding frame to a larger volume of responses, is likely to miss crucial insights relating to volume, patterning or dynamics. We therefore need a mechanism to train the social scientist's close lens on unmanageably large datasets – to bridge the gap between close readings and large-scale patterning.

This paper develops a possible approach, which we term semi-automated coding. Our three-step method first manually bootstraps a coding scheme from a micro-scale sample of data, then uses a crowdsourcing platform to achieve a meso-scale model, and finally applies machine learning to build a macro-scale model. The bootstrapping is carefully done by trained researchers, creating the nuanced coding scheme necessary for answering social science questions and providing an initial 'golden set' of labelled data. Crowdsourcing expands the labels to a larger dataset using untrained workers. The quality of crowd-generated labels is ensured by checking agreement among crowdworkers and between the crowd workers' labels and the golden set. This larger labelled dataset is then used to train a supervised machine learning model that automatically labels the entire dataset.

We argue that this approach has particular potential for the study of emotions at scale. Emotions have a mutable quality [1] and this is especially true in the context of social media. Thus, intensive manual coding over a small-scale sample may miss some of the temporal and volume dynamics that would be critical for a full sociological understanding of public expressions of emotion, in contrast to the semi-automated coding we propose here, which captures the entire dataset and its dynamics.

As a case study in applying semi-automated coding, this paper looks at public empathy – the expression of empathy that, even if it is imagined to be directed at one other person [2], can potentially be read by many – in the context of high-profile deaths by suicide. Five cases were chosen which had a high rate of public response on Twitter, with the aim of exploring what types of response were more or less common in the space of public Twitter, and what factors might affect these responses.

This paper primarily focuses on the methodological challenges of this research through an engagement with emergent findings, and concludes by considering its potential use for interdisciplinary computational social science.

A key issue, both within the case study and more generally for the success of semi-automated coding as an approach, is the accuracy of the automatically generated labels. One source of error is the quality of crowd-generated labels. As mentioned above, we control for this using different forms of agreement, among crowd workers and with a curated golden set. However, our initial attempts on Crowdflower did not generate a good level of agreement. On closer analysis, we discovered that the crowdworkers were confused by the nuanced classification expected of them. To help them, we developed a second innovation, giving them a decision tree (Fig. 1) to guide their coding. This resulted in around 60% of tweets with agreement. Our tests show that the final machine-generated labels agree with the crowd labels with an accuracy of 71%, which permits nuanced interpretations. Although this is over 5.6 times the accuracy of a random baseline, we still need to reconcile the social side of research interpretations with the potentially faulty automatic classification. We allow for this by explicitly quantifying the errors in each of the labels, and drawing interpretations that still stand despite a margin of safety corresponding to these errors.

2. Related literature

The transformative potential of Big Data for social science is now widely recognised [3,4], with social and emotional phenomena ranging from suicidal expression [5] to cyber hate [6] investigated through computational social scientific approaches. However, epistemological and methodological challenges [7,8] remain, and there is an active debate about several aspects of the use of Big Data methods in social science. One critical question is whether and how Big Data methods can scale up from small samples to big data in relation to complex social practices that may require close analysis and nuanced interpretation.

Our proposed solution for scaling up is to automate some of the manual research process involved in social science coding practices. Although previous efforts have looked at assisting social science through automated coding of dates and events in data [9] and even open-ended survey responses [10], coding of social media data creates new challenges because of its temporality and breadth (unlike, for example, survey data, which tends to be in response to specific questions). The main contribution of this paper is the proposed methodology, mixing machine learning and crowd-sourcing, and using multiple levels of validation and refinement, to achieve a high degree of accuracy in coding nuanced concepts such as mourning and lack of empathy.

The practice of employing crowd-workers to manually label tweets has a short but rich history. Crowdsourcing has been recognised as a valuable research tool in numerous previous works [11–15]. A comprehensive review of this literature is provided in [13], which – among others – recognises the impact of job design on the efficiency of crowd-computations. For instance, Willett et al. [15] describe a crowd-sourcing design for collecting surprising information in charts, and [14] propose a design for online performance evaluations of user interfaces. Our paper contributes to this body of work by proposing a decision tree-based design for crowd-sourcing typologies of social-media posts, with built-in prioritisation of the coding process to meet the aims of the social inquiry being carried out.

Last, but not least, the methods developed here build on recent advances in applying artificial neural networks to natural language processing of short texts [16]. Specifically, we investigate how to adapt this approach for automating nuanced multivariate classification of social media posts related to public mourning.

The underlying social science research is informed by work in social science and media studies on public mourning and grieving, particularly on social media. Previous studies have, for example, looked at the discussion of death and grief on Twitter following a violent tragedy [17]. Social media responses to the deaths of celebrities, and to deaths that have received public attention for other reasons, have also been examined [18–20]. Whereas previous studies have looked at communal grief and individual mourning in untimely deaths such as that of Michael Jackson [18,21], this paper aims to interrogate discourses and practices around suicide in mediated mourning, an area in which there has been much less of a focus to date.

3. Background and approach

As mentioned, we use the study of public expression of empathy in the face of high-profile suicides as a case study for testing the feasibility of semi-automated coding. Below we first describe the suicides we study and the datasets that we examine relating to these deaths. Then we outline our philosophy and approach to developing semi-automated coding.

3.1. Datasets

To analyse public discourses on social media relating to high-profile deaths by suicide, we chose five such deaths which were highly publicised, either because the person was famous before their death or because of the circumstances of their death. We were interested in the range of reactions, from mourning and tributes to activism and actions, that were elicited in public Twitter conversations relating to these deaths. Below, we provide some context about each of the cases:

1. Aaron Swartz, at the time of his death by suicide in 2013, was under federal indictment for data theft, relating to an action he undertook to automatically download academic journal articles from the online database JSTOR at MIT. Prosecutors and MIT were criticised by his family and others after his death. Some critics engaged in hacktivist activities; others suggested the federal prosecutors had engaged in overreach, with Swartz's activism argued to have played a role in his treatment (https://en.wikipedia.org/wiki/Aaron_Swartz).

2. Amanda Todd died by suicide at the age of 15 in 2012 in British Columbia, Canada. Her death was widely publicised as a result of a video detailing her experiences of cyberbullying which she had published on YouTube, and which went viral following her death, accumulating more than 1.6 million views in three days. Part of her experience was the abusive and ongoing sharing of images of her without her consent. An adult male was implicated in this abuse (https://en.wikipedia.org/wiki/Suicide_of_Amanda_Todd).

3. Charlotte Dawson was a New Zealand-Australian television personality and former model, most famous for her roles on Australia's Next Top Model, New Zealand Getaway, and The Contender Australia. She was heavily involved in social media, and was a target of cyberbullying for several years prior to her death, with one incident in 2012 occurring around the same time as a previous suicide attempt. She died by suicide in 2014, aged 47. Prior to her death she was an ambassador against cyberbullying (https://en.wikipedia.org/wiki/Charlotte_Dawson).

4. Leelah Alcorn was an American transgender girl whose parents had reportedly refused to accept her female gender identity and sent her to Christian-based conversion therapy. Her suicide note, posted on Tumblr, attracted wide attention. Since her death, Alcorn's parents have been strongly criticised. Vigils and other activist events have taken place internationally to commemorate her life (https://en.wikipedia.org/wiki/Death_of_Leelah_Alcorn).

5. Robin Williams was a very well-known Hollywood actor and comedian. His suicide attracted an enormous amount of commentary from fans online. At the time of his death he had reportedly been suffering from severe depression and had recently been diagnosed with early-stage Parkinson's disease (https://en.wikipedia.org/wiki/Robin_Williams).

We collected five datasets of related Twitter posts for 20 days following each death. We were able to obtain the full dataset of tweets for three cases (Amanda Todd, Leelah Alcorn and Charlotte Dawson) and sampled datasets for the remaining two (Robin Williams and Aaron Swartz). The number of tweets across the different cases ranged from 40k (Charlotte Dawson) to 749k (Robin Williams), and constituted a total of 1.8M tweets. The datasets are summarised in Table 1.

Table 1. Description of the case studies and datasets. All datasets consist of the responses (tweets in English language) for the first 20 days from the date indicated in the table.

Case study          From          Size      Sampling
Amanda Todd         2012-10-11    553,664   Full
Leelah Alcorn       2014-12-30    390,561   Full
Charlotte Dawson    2014-02-22    40,149    Full
Robin Williams      2014-08-11    749,422   Sampled
Aaron Swartz        2013-01-12    84,126    Sampled

3.2. Analysis approach: semi-automated coding

For each of these deaths by suicide, from a social science perspective, we were interested in understanding the types of responses that were elicited during public conversations. The aim of understanding these responses would traditionally be met through manual coding, or classification through a frame developed after intensive qualitative analysis. Although coding has been a mainstay of social science research, it becomes difficult to apply this at scale given the volume of tweets in Table 1 (for comparison, analysing a sample of ≈200 tweets to develop an initial coding frame, Step 1 below, was an ≈1 person-day job). The social scientist's typical alternative would be to select and focus on a small sample of the dataset. Unfortunately, this is not a fully satisfactory solution for two reasons: First, it is not a priori clear which parts of the dataset would be most interesting and should be selected for intensive analysis. Second, focusing on a small sample misses aggregate characteristics, such as the relative volumes and temporal dynamics of different classes of responses, which can provide a new dimension to many social science questions, including ours, as it focuses on public, or aggregate, expressions of empathy.

We argue therefore that manual coding needs to be adapted using computational methods, to scale up to the volume of data created by social media platforms. Our solution, which we term semi-automated coding, works as follows: we start by noting that manual inspection cannot be avoided, because a) social scientists need to come up with a coding frame that makes sense for the research questions that are of interest, and b) given that the classes of interest encompass nuanced, higher-order social-interaction concepts, it is easiest to define these by example rather than develop complicated rules or heuristics that can identify tweets belonging to the class. Therefore, as a first step, researchers can identify the concepts/classes of interest and provide examples. Subsequently, our goal was to build a machine learning model that can learn these concepts based on the examples given.

In order for the above approach to work, we needed two refinements: First, the machine learning model needs a sufficient number of labelled examples. This was still difficult due to the labour-intensive nature of coding. Therefore, we adopted a two-step approach to generate examples: First, trained researchers created a coding frame and a carefully curated set of example tweets. Next, an untrained set of workers on a crowd-sourcing platform were used to label a larger set of tweets. We controlled for the quality of labelling using agreement between crowd-workers, and agreement between crowd-workers' labels and the labels associated by researchers for the curated set of tweets.

Second, as with any application of machine learning, the automatically generated set of labels is bound to have a few errors. These should be taken into account in any large-scale analysis based on semi-automated coding. We observed that if we are able to quantify the extent of the errors, we can reason about the validity of results within a margin of safety, and ensure that the sociological insights stand despite any shortcomings of the model.

4. Bootstrapping coding using manual effort

The main social science objective of the study was to analyse the typology and dynamics of messages on public Twitter following a high profile death by suicide. To tackle scalability issues, we designed a hybrid methodology in which our coding typology was applied manually and gradually on different data scales. We began by manually coding a few hundred tweets, which were subsequently used to guide the execution of a large-scale labelling experiment on Crowdflower, a crowd-sourcing platform. We then used twelve thousand labelled tweets obtained from the results of the CrowdFlower experiment to train a state-of-the-art machine learning algorithm for short text analysis and to automatically label the full dataset (discussed in Section 5). Below, we describe the design of the manual part, and how it feeds in to bootstrap the machine learning model.
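Before turning to the details, the overall three-step workflow just described can be summarised in a short, runnable sketch. The snippet below uses scikit-learn as a stand-in classifier and toy data values; it illustrates only the micro/meso/macro structure of semi-automated coding, not the study's actual tooling (the model actually used is the convolutional network of Section 5).

```python
# A minimal, runnable sketch of the three-step semi-automated coding pipeline.
# The classifier and all example tweets below are illustrative stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1 (micro-scale): trained researchers develop the coding frame and
# hand-label a small seed sample (~200 tweets per case in this study).
golden_set = [("RIP, so sad to hear this", "rip/mourning"),
              ("Sign the petition to ban conversion therapy", "activism")]

# Step 2 (meso-scale): crowd workers label a larger sample (~12k tweets here),
# and only tweets where the two workers agreed (consensus) are kept.
crowd_consensus = [("We need to talk about cyberbullying", "social issue"),
                   ("TV star found dead, police report suicide", "headline")]

# Step 3 (macro-scale): train a supervised model on the labelled examples and
# apply it to the full corpus (~1.8M tweets in the study).
texts, labels = zip(*(golden_set + crowd_consensus))
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(list(texts), list(labels))
full_corpus = ["Rest in peace, you are missed"]   # placeholder for the full dataset
print(model.predict(full_corpus))
```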

4.1. Coding typology using trained researchers

Initially, a random sample of 200 tweets from each of the five cases was coded qualitatively to identify patterns in communication, a method building on previous Twitter research that divides tweets according to content [22–25]. The initial coding frame was developed from this subset of the dataset.

To begin with, we made lists of all content types emerging from the dataset, made observations on which content types were most common, and found ways to differentiate appropriately between tweet types based on emotional content (blame vs. grief, for example) and whom the tweet was directed at (other Twitter users, the deceased, certain people in particular, or society in general). Our coding frame was then inductively and iteratively developed using cross-validation between two coders, in a manner consistent with previous studies examining emotional content in online settings [25].

These aspects of tweets (emotional content, and to whom the emotions were directed) most strongly shaped the codes chosen. The reason for this focus was that the coding of these tweets was shaped by our research interest in empathy as a concept and as a social practice within the dataset, so we paid particular attention to tweets that either displayed empathic feeling or a lack of empathy toward the deceased or those mourning them. We also identified strong communicative practices in the dataset. Such an approach to developing a coding frame requires analytical insight (as much as empirical knowledge) about the potential and likely feelings of tweeters, and the diversity of responses within the dataset.

Many tweets contain web addresses and links. This presents a challenge in Twitter analysis, particularly in relation to historical data, because of the likelihood of broken links and the difficulty of verifying content. We considered coding according to whether or not a tweet contained a link, or automatically coding these as headlines or informative tweets. However, we ultimately decided that this would strongly skew the dataset, and found that many tweets containing links were not only about information sharing, but also contained emotional content that was relevant to our research.

This initial coding suggested that empathy manifested in a number of different ways. Through a process of detailed coding followed by the building of a coding frame with a smaller number of representative categories, a typology of responses was generated: mourning, where people expressed their personal reactions to the death, including sadness or shock; social issues, where people drew attention to or discussed social issues related to the death, such as bullying or depression; activism, where people discussed taking action in relation to the aforementioned social issues or attending a candlelit vigil; positive actions and negative actions, where people discussed what others were doing or had done in relation to the death; lack of empathy, where people judged either the person who died or those mourning them; and headline, which denotes a straightforward news headline or statement of facts relating to the death. Tweets that did not fit comfortably with any of these classes were coded as uncategorisable.

4.2. Scaling the coding using crowd-sourcing

Next, we created jobs on the Crowdflower crowdsourcing platform to expand the list of human-labelled tweets. Providing instructions for crowdworkers in a brief and descriptive way has been identified as one of the main challenges in conducting crowdsourcing experiments [13], and this was the case in this research. Tweets are often ambiguous, containing multiple communicative acts, and might be coded 'correctly' in several ways. However, we wanted to sensitise coders to particular forms of communication over others. For example, if someone shared a news story about a death but also expressed shock or sadness alongside the sharing of the link, we wanted that tweet to be coded primarily in the RIP/mourning category. In order to help with this, we required each tweet to be coded with exactly one label, and created a decision tree to help coders make decisions about how to code a particular tweet (Fig. 1).

Fig. 1. CrowdFlower job designed as a decision tree. CrowdFlower workers were asked to follow a sequence of binary decisions from the decision tree to label each tweet.

The prioritisation built into the coding process through the decision tree attempts to lessen problems caused by such ambiguities between multiple categories, such as a case in which a tweet identifies a social issue then calls for activism on that basis. It also aims for consistency within and between the datasets in terms of how different types of tweet are understood. However, we still expected some ambiguity within the overall dataset, and allowed for coder disagreement in our initial analysis of the data. The need for a decision tree to focus the work of coders reminds us that working with big data requires interpretation in the same way as qualitative analysis [8].
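The published decision tree itself appears in Fig. 1; since only its overall design is described in the text, the sketch below uses hypothetical question wording. It does, however, reproduce the key property discussed above: a fixed priority order that maps every tweet to exactly one label, with emotional content such as mourning taking precedence over, say, the presence of a shared news link.

```python
# Sketch of the decision-tree labelling logic (Fig. 1). The question wording
# below is hypothetical, not the published tree; only the one-label-per-tweet
# priority structure follows the paper.

def label_tweet(answers) -> str:
    """`answers` maps a yes/no question (str) to a bool, as a crowd worker
    would answer while walking down the tree; unanswered questions count as no."""
    def yes(q):
        return answers.get(q, False)
    if yes("expresses personal sadness, shock or a tribute?"):
        return "rip/mourning"        # prioritised even if a news link is shared
    if yes("judges the deceased or those mourning them?"):
        return "lack of empathy"
    if yes("calls for action, e.g. petitions, vigils or new laws?"):
        return "activism"
    if yes("draws attention to a related social issue, e.g. bullying?"):
        return "social issue"
    if yes("describes what someone else did in response to the death?"):
        return ("positive action"
                if yes("is that action portrayed approvingly?")
                else "negative action")
    if yes("is it a plain news headline or statement of fact?"):
        return "headline"
    return "uncategorised"

print(label_tweet({"expresses personal sadness, shock or a tribute?": True}))
# -> 'rip/mourning'
```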
4.3. Fine-tuning execution parameters

We chose CrowdFlower as a platform for executing our experiments because it provided enough flexibility to fine-tune our experiment and to select coders from specific countries – a requirement imposed by our ethics board. More specifically, we employed workers from the 15 European Union countries with the largest populations (a limit imposed by Crowdflower).

CrowdFlower provided several mechanisms to control the quality of coders for the experiment, of which the coders' agreement with a small golden set of pre-coded answers proved the most effective. Two researchers labelled a sample of 200 tweets (40 from each use case) for the golden set experiment, and refined this after three iterations of test runs on CrowdFlower. Further, we removed ambiguous tweets to ensure every possible chance for crowd workers to agree with the golden set. Finally, we followed CrowdFlower's recommendations (https://success.crowdflower.com/hc/en-us/articles/202702985-How-to-Create-Test-Questions) and balanced the number of tweets in each class, ending up with a golden set of 64 tweets, with 8 tweets from each class. These tweets, rather than being representative of the entire dataset, functioned as a benchmark to test the accuracy and agreement among coders in the experiment, and allowed us to ensure that tweets were coded by Crowdflower workers who had the best understanding of the appropriateness of a particular code for a particular tweet. Coding by those who showed an accuracy of less than 66% and 65% in relation to the golden set was excluded from the results of the first and the second experiments, respectively (cf. Table 2). Note that – although consistent with some previous works [26] – these thresholds are slightly lower than the more frequently used value of 70% [27,28]. Our choice was motivated by the observation that most of the workers with accuracy between 66% and 70% in the first experiment (and between 65% and 70% in the second experiment) provided reasonable feedback on their failed test questions, and so we do not expect their contributions to introduce a systematic error in the results.

In our test runs, we noted very diverse results in the level of coders' conformity with our golden set: whereas over 40% of test questions were missed or contested by low-quality coders, a significant set of high-quality coders exhibited more than 96% agreement with the golden set. The average level of accuracy among selected coders (i.e., among those who scored above the threshold on the golden set) reached 78–82%. To encourage participation of high-quality coders, we doubled the default pay for the job and noted in the description that the job required extra attention and that good performance would be rewarded with bonuses. We then ran two experiments trading off between speed and quality (i.e. the level of conservatism in selecting new coders) and labelled an overall sample of around 12k tweets, with each tweet coded by two CrowdFlower workers. We opted to collect more data points at the cost of having fewer judgments for each label; at the same time, we were conservative in selecting only consensus votes for the next – machine learning – step of our analysis (in Section 5). A few factors contributed to this decision. On the one hand, we had already imposed several measures to control the quality of the labelling process – by choosing only high-quality coders and opting for consensus votes from two coders. On the other hand, we expected our machine learning algorithm to benefit more from a diversity of data points rather than from a diversity of judgments. Since we were interested in analysing the temporal evolution of the discourse in our datasets, we sampled an equal number of tweets from each of the first twenty days in each considered use case. The parameters of our CrowdFlower experiments are summarised in Table 2.
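The golden-set screening described above can be expressed in a few lines. The following is a minimal sketch assuming labels are held in plain dictionaries; the data structures and example values are illustrative, with the threshold set to the 66% used in the first experiment.

```python
# Sketch of the worker quality control: a worker's labels are kept only if
# their accuracy on the curated golden set meets the experiment's threshold
# (66% in experiment 1, 65% in experiment 2). All data here is illustrative.

def worker_accuracy(worker_labels: dict, golden_set: dict) -> float:
    """Share of golden-set tweets the worker labelled like the researchers."""
    hits = sum(worker_labels.get(tweet_id) == label
               for tweet_id, label in golden_set.items())
    return hits / len(golden_set)

def select_workers(all_workers: dict, golden_set: dict, threshold=0.66):
    """all_workers maps worker_id -> {tweet_id: label}; returns trusted workers."""
    return {wid: labels for wid, labels in all_workers.items()
            if worker_accuracy(labels, golden_set) >= threshold}

golden = {"t1": "rip/mourning", "t2": "headline", "t3": "activism"}
workers = {"w1": {"t1": "rip/mourning", "t2": "headline", "t3": "social issue"},
           "w2": {"t1": "lack of empathy", "t2": "social issue", "t3": "headline"}}
print(select_workers(workers, golden))   # keeps w1 (2/3 correct), drops w2 (0/3)
```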

Table 2. The summary of the CrowdFlower experiments. The table indicates the parameters and the main performance indicators from each experiment.

Characteristic                   Exp #1     Exp #2
Tweets labelled                  ≈2k        10k
Test questions                   64         64
Judgments per tweet              2          2
Speed vs quality                 Quality    Speed
Workers quality threshold        66%        65%
Number of selected workers       13         61
Selected workers quality         82%        78%
Workers agreement                67%        59%
Workers feedback                 3.1/5      3.6/5

4.4. Validation of crowdsourced labels

The results of the experiments suggested a reasonably high level (over 60%) of agreement between coders. In Fig. 2 we characterise the cases when workers disagreed in their classification. Each cell in the matrix represents the percentage of tweets which were coded differently by two workers, and the columns and rows represent the higher- and lower-quality workers (as indicated by their level of agreement with the golden set). The first thing to note is that the matrix is predominantly symmetrical, indicating that disagreements have little correlation with the difference in the quality of coders: disagreement between a specific pair of classes A and B can similarly happen when a higher-quality coder voted A as well as when he/she voted B (recall, however, from the previous section that we only included those workers who matched over 65% of the golden set tweets). Secondly, some pairs of classes are confused much more frequently than others: The disagreements are most likely between tweets labelled as "positive action" and "headline" (≈1.6%), and between tweets labelled "uncategorised" and "mourning" or "headline" (≈1.1–1.6%). This result can probably be explained by the fact that many tweets about people's actions in response to the death came in the form of headlines, and by the fact that there was some misunderstanding among coders about when the headline code should be used.

Fig. 2. Confusion matrix from the CrowdFlower experiment. The percentage of tweets coded differently by two workers (columns and rows represent the higher- and lower-quality coders, respectively). The tweets coded similarly (i.e., diagonal elements) are excluded.

We next validated the crowdsourced labels by analysing the sentiments of the tweets for which the labels were generated. Most sentiment analysis tools typically attach a positive or negative 'sentiment score', and are therefore less specific and nuanced than the coding frames typically used in social science. However, understanding the general sentiment scores of the different classes that the crowd has identified provides us with a coarse-grained assurance of the validity of the results. To this end, we used the SentiStrength library [29], considered to be one of the best tools for short texts [30], and associated each tweet with a score between 1 and 5 for positive and negative sentiments.

Fig. 3 presents the mean positive and negative scores for each class in our CrowdFlower dataset. Firstly, we noted that the highest negative and the highest positive sentiment scores are observed among the tweets from the most polarised classes – those of Negative and Positive Action. Similarly, the Mourning/RIP and Lack of Empathy classes in our dataset are associated with expectedly high negative sentiments. Because both results are intuitively expected given the classes, we obtain some assurance about the quality of crowd labels.

Fig. 3. Relation between sentiment scores and the classes of the proposed typology. The mean positive and negative sentiment scores – as measured by the SentiStrength library – for each class in the dataset labelled by CrowdFlower workers.

We also observed a striking difference between the sentiment scores of tweets in the Activism and Social Issues classes, which are semantically close: Whereas the activism-related tweets have relatively neutral sentiment – as indicated by low negative and positive sentiments – the tweets from the Social Issues class show average negative scores of over 2.5 – the second most negative result among all classes in the dataset. This suggests not only that sentiment analysis and coding are complementary analyses (and thus both can add different dimensions when used on the same dataset), but also that the crowd-workers are able to distinguish closely related semantic classes in a way that reflects expected differences, such as sentiment scores.
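This validation step amounts to grouping the crowd-labelled tweets by class and averaging their SentiStrength scores. A minimal pandas sketch follows, assuming the tweets have already been scored by the SentiStrength tool into a CSV; the file name and column names ('class', 'positive', 'negative') are illustrative assumptions.

```python
# Sketch of the validation in Fig. 3: mean SentiStrength scores per class.
# Assumes a pre-computed CSV with one row per crowd-labelled tweet and
# columns 'class', 'positive' (1..5) and 'negative' (1..5); names are
# hypothetical placeholders.
import pandas as pd

scored = pd.read_csv("crowd_labelled_tweets_with_sentiment.csv")
class_sentiment = scored.groupby("class")[["positive", "negative"]].mean()
print(class_sentiment.sort_values("negative", ascending=False))
# Expectation from the paper: Negative/Positive Action are the most polarised
# classes, and Mourning/RIP and Lack of Empathy carry high negative scores.
```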

5. Machine learning approach to understanding online mourning

In order to scale up our analysis from twelve thousand to a million tweets, we used a supervised machine learning algorithm for processing short texts. We describe the model and its performance evaluation on our dataset in the rest of this section.

5.1. Algorithm

The goal of the machine learning model is to mimic the human researcher who codes (i.e., classifies) tweets based on their content. To recreate this effect, we exploited and adapted CharSCNN, a state-of-the-art deep convolutional neural network architecture for short text classification proposed in [16], which was designed to operate at a word level, to capture syntactic and semantic information, and at a character level, to capture morphological and shape information. As argued in [16], the latter is particularly important for short texts such as Twitter posts, which contain abbreviations, misspellings, emoticons and other word forms not common within traditional texts. As a result, CharSCNN showed significant improvement over alternatives – recursive deep neural networks [31] and traditional bag-of-words models – when applied to fine-grained classification of tweets.

Each tweet in this approach is represented by a sequence of $N$ words $[w_0, \ldots, w_N]$, where each word vector $w_i = [r^{wrd}, r^{wch}]$ is composed of two sub-vectors: $r^{wrd}$ for word-level embeddings and $r^{wch}$ for character-level embeddings. We use a one-hot vector representation for character-level embeddings. Whereas in principle the model should be able to extract reasonable word-level embeddings from a one-hot vector representation of words too (if the training dataset is sufficiently large), in practice it proved to be much more efficient to use externally pre-trained word-level embeddings. Such unsupervised pre-training of word representations significantly improved the classification accuracy in the original CharSCNN paper – a result which has also been confirmed in our experiments. In particular, we used the GloVe word vectors pre-trained on a dataset of 2B tweets from Pennington et al. [32]. We used randomly generated values for the minority (25%) of words which did not appear in the GloVe vocabulary.

The neural network we designed in the Theano machine learning package (http://www.deeplearning.net/software/theano/) was composed of two convolution layers with max pooling aggregation – one for character-level and one for word-level embeddings, respectively – followed by two fully connected layers with dropouts to control for over-fitting, and a final softmax layer with eight outputs corresponding to each of the labels in our dataset. The network was trained using mini-batch gradient descent by minimising the negative log likelihood of the training dataset.
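The architecture just described can be sketched compactly. The authors implemented theirs in Theano; the following is a re-implementation sketch in PyTorch, with assumed, illustrative hyperparameters (filter counts, kernel sizes and hidden sizes are not specified in the text). For brevity the character branch here convolves over the tweet's full character sequence, whereas the original CharSCNN composes character features per word.

```python
# A compact PyTorch sketch of the two-branch network described above:
# word-level and character-level convolutions with max pooling, two fully
# connected layers with dropout, and an 8-way output (softmax in the loss).
import torch
import torch.nn as nn

class CharWordCNN(nn.Module):
    def __init__(self, word_emb, n_chars, char_dim=30, n_filters=100,
                 hidden=256, n_classes=8):
        super().__init__()
        # word branch: pre-trained GloVe vectors, fine-tuned during training
        self.word_emb = nn.Embedding.from_pretrained(word_emb, freeze=False)
        self.word_conv = nn.Conv1d(word_emb.size(1), n_filters,
                                   kernel_size=3, padding=1)
        # character branch: learned character embeddings plus convolution
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_conv = nn.Conv1d(char_dim, n_filters,
                                   kernel_size=5, padding=2)
        self.classifier = nn.Sequential(
            nn.Linear(2 * n_filters, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, n_classes))

    def forward(self, word_ids, char_ids):
        # (batch, seq, dim) -> (batch, dim, seq) for Conv1d, then max-pool
        w = self.word_conv(self.word_emb(word_ids).transpose(1, 2)).amax(dim=2)
        c = self.char_conv(self.char_emb(char_ids).transpose(1, 2)).amax(dim=2)
        return self.classifier(torch.cat([w, c], dim=1))

# Training would minimise the negative log likelihood with mini-batch
# gradient descent, e.g. nn.CrossEntropyLoss() with torch.optim.SGD.
```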

5.2. Cross-validation

We validated the performance of the algorithm over the dataset of tweets labelled by the CrowdFlower workers as described in the previous section. Specifically, we used all labels with agreement between the coders, which resulted in a dataset of 7.1k tweets. We note that the model reached an average accuracy of 71% in a 10-fold cross-validation, with approximately 50 training epochs in each experiment and minor improvements thereafter.

Looking at the model's capability of predicting individual classes of messages (Table 3), we note that the precision varied between 60% for predicting 'negative actions' and 89% for discriminating 'activism', with the average precision being over 70% across all classes. In terms of recall, the model was able to capture 69% of instances of each class on average, with a maximum of 79% achieved for 'activism'.
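The evaluation protocol is standard 10-fold cross-validation. A sketch using scikit-learn's splitter is shown below; `train_and_eval` is a hypothetical wrapper around the network above, training for roughly 50 epochs per fold.

```python
# Sketch of the 10-fold cross-validation of Section 5.2. `train_and_eval`
# is a hypothetical callback: it trains on the first pair of arguments and
# returns accuracy on the second pair.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(texts, labels, train_and_eval, n_splits=10):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(texts, labels):
        acc = train_and_eval([texts[i] for i in train_idx],
                             [labels[i] for i in train_idx],
                             [texts[i] for i in test_idx],
                             [labels[i] for i in test_idx])
        accuracies.append(acc)
    return np.mean(accuracies)   # the paper reports ~0.71 on 7.1k tweets
```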

5.3. Manual validation

To provide an intuitive understanding of the algorithm's strong performance in discriminating different classes of tweets, in Table 3 we present the words with the highest relative frequencies in each class with respect to the overall frequency of the words in the dataset. We note that the illustrations contain words that can be expected to signify each category (e.g. the 'activism' messages predominantly consist of highly relevant words such as 'sign', 'law', 'petition', 'ban', etc.).

But beyond the most frequent words, an important question from the social science perspective is whether the machine learning model can interpret nuance in particular cases. In some cases, particularly tweets where people recommended, congratulated or praised what someone else had done or written in response to the death, it did very well in determining subtle changes in tweets and accurately identifying the rhetorical intent of the tweet. These were frequently correctly coded as Positive Action, even though the tweets were otherwise similar to tweet types such as Negative Action or Social Issues (because of the sensitive nature of the topic, all names and identifiable parts, e.g. URLs, have been anonymised or removed):

• Thank-you @xxxxxx for this balanced article that illustrates the danger of a powerful state & those who resist it. <LINK>

However, there were also several instances where it did less well, and the repetition of similar tweets or claims might then lead to inaccuracies in the overall volume of tweets in each category. In relation to the question of add-ons to quoted tweets, this proved problematic in some cases. For example, the following tweet was coded as an RIP tweet, though neither the quoted tweet nor the comment – #blocked – should have been coded in that way:

• #blocked RT @xxxxxx: I don't know much about the case, but what I do know is I don't feel sorry for Aaron Swartz' suicide.

The original tweet should have been coded as Lack of Empathy, and the add-on comment as Negative Action. Clearly, however, this is a very complex tweet in terms of rhetorical intent, and there are likely to be issues with the correct coding of single hashtagged words even in human coding.

In the next section we highlight the greater prevalence of a 'lack of empathy' in responses to the death of Amanda Todd. There were many clear examples of correctly identified 'lack of empathy' in this dataset. However, in some cases the machine learning approach appears to have misinterpreted complex constructions of empathy as a lack of empathy. Here are two examples:

• RT @xxxxxx: I hate that everyone is suddenly buzzing about Amanda Todd now. She doesn't need the sympathy now, she needed it before ...

In this example, although the phrase 'she doesn't need the sympathy' taken on its own would be read as a lack of empathy, the tweet taken as a whole might be understood as saying that suicide is preventable, and that in this case it resulted from a failure of empathy. Likewise:

• RT @xxxxxx: Amanda Todd's story breaks my fucking heart. She made a stupid mistake, and it followed her for all of the wrong reasons.

This is clearly an example of empathy, but it may have been interpreted as a lack of empathy because of the phrases 'stupid mistake' or 'wrong reasons'. The use of the word 'mistake', however, actually refers to her being blackmailed and cyberbullied after sharing images of her body on video chat, rather than to her death.

Despite these distortions, it is clear that the machine learning correctly identifies lack of empathy as more prevalent in this case, and that this changes over time. However, in a multi-case study, we should be aware that individual circumstances surrounding events may have an impact on the accuracy of comparisons between cases, and any large-scale analysis would need to take into consideration that the machine learning model will have some erroneously labelled tweets.

Table 3. Prediction performance of the classifier. The averaged values of precision and recall of the 10-fold cross-validation are reported along with the most frequent terms from each class. We also report the rates of Type-1 and Type-2 errors, which indicate the relative contribution of each class to the total error of the 8-way classifier. Note that the sum of Type-1 and the sum of Type-2 errors (rounded in the table for better presentation) each add up to 0.29 = 1 − reported accuracy (0.71).

Class            Frequent terms                                                                      Precision  Recall  Type-1 Err.  Type-2 Err.
Activism         sign, law, therapy, conversion, lgbtq+, ban, enact, petition                        0.89       0.79    0.01         0.03
Negative action  funeral, parents, death, best, friend, banned                                       0.60       0.41    0.02         0.05
Headline         star, tv, found, dead, australian, suicide, dies                                    0.61       0.78    0.08         0.04
Lack of empathy  blezach, like, shit, people, getting, commit, fuck                                  0.67       0.73    0.03         0.02
Positive action  tribute, billy, crystal, dedicated, emmys, dedicates, transparent                   0.67       0.62    0.04         0.05
RIP/mourning     rip, sad, rest, miss, heart, piece, beautiful, missed                               0.72       0.77    0.06         0.05
Social issue     bullying, people, suicide, stop, stopbullying, cyberbullying, depression, society   0.69       0.61    0.03         0.04
Uncategorised    liked, de, clt, welcome, youtube, amandashires                                      0.80       0.78    0.02         0.02

Fig. 4. Relative volumes of classes across use cases. The relative volumes of tweets from each class along with the estimated error intervals are reported.

6. Analysing dynamics of public empathy

In this section, we highlight the utility of a machine learning approach in assisting and supporting qualitative research by presenting some emergent findings. Specifically, we argue that a machine coding approach can contribute to a nuanced reading of subtle social and discursive changes as an event unfolds. There are still clearly instances using this approach where subtleties are missed and, as a result, qualitative analysis is important to fully understand what is being articulated. The dynamics of communication identified through the machine learning approach, however, provide a focus for this analysis. The combined use of iterative qualitative coding, crowd-coding, machine learning, and qualitative analysis can potentially help us to better understand complex and nuanced social discussions at scale.

6.1. On interpreting semi-automated coding

In the following, we focus on the analysis of the temporal dynamics of expressions of empathy (or lack of empathy) in our case studies. We do so through the analysis of the relative shares of tweets classified by our machine learning algorithm in each class on each day during the events. Both the crowd-coded and the machine-coded datasets allowed for the production of visualisations of the dynamics of each of the cases. However, the machine-coded datasets, because of the volume of tweets coded, allowed for a highly specific and complex reading of the interplay of each of the different tweet types, both across the set of tweets for each suicide and at particular times within each suicide.

Our main approach will be to compare relative volumes of different classes, both in the aggregate (cf. Fig. 4) and over time (cf. Fig. 5). To do so, it is important to estimate the error interval of the predictions made by our algorithm. Specifically, we need to understand how the error of mislabelling tweets in our experiments is distributed across individual classes and how that affects our estimates of relative volumes. To this end, for every class $C$ we estimate the rates of Type-1 and Type-2 errors induced by mislabelling tweets in class $C$. More specifically, the Type-1 error is measured as the share of all cases in which a tweet from a class other than $C$ has been labelled as $C$, whereas the Type-2 error is measured as the share of all cases in which a tweet from class $C$ has been mislabelled as some other class (see the last two columns of Table 3). In other words, the Type-1 ($t_C^1$) and Type-2 ($t_C^2$) errors assess the extent to which the share $\rho_C$ of class $C$ might have been over- or under-estimated in our calculations and, therefore, are represented as an interval $[\rho_C - t_C^1, \; \rho_C + t_C^2]$ in Figs. 4 and 5.

Intuitively, the mislabelling error for each individual class contributes to the overall share of mislabelled cases and can be measured by the complement of accuracy, i.e., the individual Type-1 errors (as well as the individual Type-2 errors) in Table 3 sum up to 0.29, which is equivalent to 1 − accuracy (0.71).

Thus, we plot the relative volumes as well as the dynamics of classes identified by our machine learning algorithm on the full dataset, indicating the error of the estimate with the grey intervals around each class in Fig. 5 and with error bars in Fig. 4. We note that, in general, the errors of estimated shares are relatively smaller than the differences between the shares of the most prominent classes in the vast majority of cases across all suicides considered, which allows us to draw qualitative conclusions about the dynamics of classes. This in turn allows for the selection of particular moments to be investigated through a closer qualitative reading.
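Computing these intervals is straightforward once the Type-1 and Type-2 rates are known. The sketch below uses the published Table 3 rates for two classes as example values; the toy prediction list is illustrative.

```python
# Sketch of the error-interval computation of Section 6.1: each estimated
# class share rho_C is bracketed as [rho_C - t1_C, rho_C + t2_C], where t1_C
# and t2_C are the Type-1 and Type-2 error rates from Table 3.
from collections import Counter

type1 = {"activism": 0.01, "headline": 0.08}  # tweets wrongly labelled as C
type2 = {"activism": 0.03, "headline": 0.04}  # C-tweets labelled as something else

def share_intervals(predicted_labels):
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {c: (counts[c] / total - type1[c], counts[c] / total + type2[c])
            for c in counts}

print(share_intervals(["activism", "headline", "headline", "activism"]))
# -> {'activism': (0.49, 0.53), 'headline': (0.42, 0.54)}
```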

6.2. Qualitative reading through semi-automated coding

To illustrate the benefits of the semi-automated coding approach, we now discuss some qualitative findings which would have been difficult to obtain if only a small subset of data had been used for close reading.

Fig. 5. Dynamics of public response to high-profile suicides. The per-day relative volumes of tweets from each class are reported along with the estimated error intervals for each class, drawn as grey areas, with values obtained from Table 3. Note that the reported errors are static across individual days and are calculated from the last two columns of Table 3.

In terms of overall types of communication, Fig. 4 showed us that each case had a different profile in terms of the kinds of communication that dominated the discussion. We found that for Aaron Swartz, headlines and news articles dominated, but Leelah Alcorn and Aaron Swartz both had high levels of activism compared to the other cases: their deaths were more politicised – both deaths resulted in draft laws named after them. In both of these cases the negative actions of others are also strong in the dataset (particularly for Leelah Alcorn), suggesting that in both cases a sense of injustice or mistreatment drove the politicisation and the activism that followed. Further qualitative analysis will be carried out to understand the connection between this politicisation and the way each death was understood within the dataset and in the broader public sphere.

As previously mentioned, the Amanda Todd case had the highest levels of 'Lack of Empathy' coded tweets as a proportion of the data. Such tweets make up more than a quarter of the Todd data, whereas they form only a small proportion of the data for each of the other cases. This means that participants in the conversation following her death were more likely to judge her harshly or to judge others for caring about her death. Although we have previously discussed the possibility of some minor distortion, lack of empathy remains a feature when we look closely at the data.

This raises the importance of understanding change over time in relation to each case, rather than relying only on comparison of volumes, and we turn to Fig. 5 for this. In the case of Amanda Todd, although 'Lack of Empathy' is present in the data from day one, it does not begin to dominate until the 4th or 5th day of discussion, showing something of a backlash effect.
Prior to that, the dominant themes were mourning and social issues. Such issues were strong in these data because of the discussion of bullying and cyberbullying in relation to the death of Amanda Todd. Over time, however, participants in the conversation increasingly make claims about Amanda being to blame for the bullying she endured. Further qualitative analysis is needed to understand the discourses at play here, though in the case of Amanda Todd this might be related to a continuation of bullying behaviour, as well as perhaps her age and gender [33,34].

In the case of Aaron Swartz, there is a peak in the 'activism' code just over two weeks after his death, which coincides with activities by the group Anonymous. Participants acting under this moniker launched attacks on government websites to protest Aaron's prosecution and death. As described in the section on overall patterns, activism in Aaron's case was linked to discussion of negative actions on the part of the United States Department of Justice, the FBI, and the institutions involved in Aaron's legal case. One example of this type of tweet was:

• Many many American dissidents believe U.S. Officials DROVE #RedditFounder #AaronSwartz to suicide with capriciously aggressive prosecution.

These examples show the uses of machine learning for identifying, at scale, moments during the unfolding of events and public discussions on Twitter where something significant occurs, minds are changed, or new arguments and claims are made. It also provides an opportunity for the examination of relationships between different communicative types, whether across a whole dataset or for individual Twitter users.

7. Discussion, conclusions and lessons

Social science has tended to use small-scale, intensive, qualitative methods to explore issues of nuance and emotion. However, if we are interested in the aggregated or social patterning or collective expression of such phenomena – as in the case of public empathy – we need methods that are capable of bridging from small-scale, intensive study to potentially very large volumes of data that lie beyond the capabilities of manual coding.

The analysis presented here suggests that the combination of qualitative analysis with machine learning can offer both a big-picture view of public events and close analysis of particular turning points or key moments in discussions of such events. As such, it can potentially yield new insights not easily achievable through traditional qualitative social science methods.

Although our specific case study looked at emotions and empathy in relation to high-profile deaths by suicide, the overall approach of semi-automated coding could be adapted to other research questions. Our experience suggests, however, that such adaptation will not be as simple as using a tool or a library. Rather, it is an approach that needs to be tailored to the problem at hand – each research question may require specific tweaks. For instance, if crowdsourcing is used to increase the set of manual labels, slightly different approaches or different decision trees may need to be developed to enable adequate levels of agreement amongst crowd workers. We made a decision to assign each tweet to one unique class. Addressing other problems may lead to ambiguous tweets being treated differently, e.g., allowing simultaneous or fractional (weighted) membership in multiple classes.

With the kind of customisation described above, big data-based methods can give us some purchase on aggregated and collective aspects of emotional expression online. This is increasingly necessary given the significance of social media in mediating and constituting emotional lives. At the same time, however, the analysis above also reminds us that, while decision trees and similar approaches aimed at guiding manual or automated coding can help to narrow differences in classification, the interpretive gap cannot be completely closed.

[17] K. Glasgow, C. Fink, J.L. Boyd-Graber, ”Our grief is unspeakable”: automatically method used in qualitative social science (coding), with algorith- measuring the community impact of a tragedy., in: Proceedings of Interna- mic classification using machine learning. Although the authors of tional Conference on Web and Social Media, ICWSM, 2014 . this article included experts in both these approaches, significant [18] J. Garde-Hansen, Measuring mourning with online media: Michael Jackson and challenges arose in merging the two: in particular, we underesti- real-time memories, Celebr. Stud. 1 (2) (2010) 233–235, doi:10.1080/19392397. 2010.482299 . mated the difficulty of creating a coding scheme that can be inter- [19] G. Terzis , et al. , Death trends: activism and the rise of online grief, Kill preted and applied by crowd workers to create reliable high qual- Your Darlings (22) (2015) 9–24 . ity labels. Our initial efforts were unsuccessful as different crowd [20] S.K. Radford, P.H. Bloch, Grief, commiseration, and consumption following the death of a celebrity, J. Consum. Cult. 12 (2) (2012) 137–155, doi: 10.1177/ workers assigned different priorities to the different labels, leading 1469540512446879 . to inconsistency. In our second attempt, therefore, we provided a [21] C. Sian Lee , D. Hoe-Lian Goh , “Gone too soon”: did twitter grieve for michael clear guide for crowd workers, using the decision tree in Fig. 1 to jackson? Online Inf. Rev. 37 (3) (2013) 462–478 . [22] A. Bruns, J. Burgess, K. Crawford, F. Shaw, #qldfloods and @QPSMedia: Crisis help to create greater consistency in labelling. This improvement, Communication on Twitter in the 2011 South East Queensland Floods, Bris- while simple, was instrumental to the success of our methodology. bane, Australia, 2012 . Technical Report URL http://eprints.qut.edu.au/48241/ . We believe this example also illustrates the nature of potential pit- [23] F. Shaw , J. Burgess , K. Crawford , A. Bruns , Sharing news, making sense, saying thanks, Aust. J. Commun. 40 (1) (2013) 23 . falls, and how they are more likely to be non-technical than tech- [24] Z. Zhou , R. Bandari , J. Kong , H. Qian , V. Roychowdhury , Information resonance nical. Paying attention to matters of interpretation is likely to be on twitter: watching iran, in: Proceedings of the First Workshop on Social Me- an essential feature of future interdisciplinary research in compu- dia Analytics, ACM, 2010, pp. 123–131 .

With the kind of customisation described above, big data-based methods can give us some purchase on aggregated and collective aspects of emotional expression online. This is increasingly necessary given the significance of social media in mediating and constituting emotional lives. At the same time, however, the analysis above also reminds us that, while decision trees and similar approaches aimed at guiding manual or automated coding can help to narrow differences in classification, the interpretive gap cannot be completely closed.

Our method aims to combine a conventional classification method used in qualitative social science (coding) with algorithmic classification using machine learning. Although the authors of this article included experts in both these approaches, significant challenges arose in merging the two: in particular, we underestimated the difficulty of creating a coding scheme that can be interpreted and applied by crowd workers to create reliable high quality labels. Our initial efforts were unsuccessful as different crowd workers assigned different priorities to the different labels, leading to inconsistency. In our second attempt, therefore, we provided a clear guide for crowd workers, using the decision tree in Fig. 1 to help create greater consistency in labelling. This improvement, while simple, was instrumental to the success of our methodology. We believe this example also illustrates the nature of potential pitfalls, and how they are more likely to be non-technical than technical. Paying attention to matters of interpretation is likely to be an essential feature of future interdisciplinary research in computational social science.
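To make the role of such a guide concrete, a coding decision tree can be read as a fixed sequence of yes/no questions that every worker answers in the same order, so that two workers who agree on the answers necessarily agree on the label. The sketch below shows only the general shape of such a guide; the questions and class names are invented for illustration and are not the actual scheme of Fig. 1.

# Illustrative sketch of a decision-tree coding guide for crowd workers.
# The questions and classes are hypothetical, not the tree from Fig. 1.
def coding_guide(answers):
    """answers: dict mapping question id -> bool, from one worker."""
    if not answers["is_about_the_death"]:
        return "unrelated"
    if answers["expresses_personal_emotion"]:
        return "grief"
    if answers["calls_for_action"]:
        return "activism"
    return "news_or_commentary"

worker_answers = {
    "is_about_the_death": True,
    "expresses_personal_emotion": False,
    "calls_for_action": True,
}
print(coding_guide(worker_answers))  # -> activism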

Acknowledgements

The "A Shared Space and A Space for Sharing" project (Grant no. ES/M00354X/1) is one of several funded through the EMoTICON network, which is funded through the following cross-council programmes: Partnership for Conflict, Crime and Security Research (led by the Economic and Social Research Council (ESRC)), Connected Communities (led by the Arts and Humanities Research Council (AHRC)), and Digital Economy (led by the Engineering and Physical Sciences Research Council (EPSRC)).

References

[1] J. Brownlie, Ordinary Relationships: A Sociological Study of Emotions, Reflexivity and Culture, Palgrave Macmillan, 2014.
[2] D. Brake, Sharing Our Lives Online: Risks and Exposure in Social Media, Palgrave Macmillan, Springer, 2014.
[3] A. Halavais, Bigger sociological imaginations: framing big social data theory and methods, Inf. Commun. Soc. 18 (5) (2015) 583–594.
[4] D.V. Shah, J.N. Cappella, W.R. Neuman, Big data, digital media, and computational social science: possibilities and perils, Ann. Am. Acad. Political Soc. Sci. 659 (1) (2015) 6–13.
[5] G.B. Colombo, P. Burnap, A. Hodorog, J. Scourfield, Analysing the connectivity and communication of suicidal users on Twitter, Comput. Commun. 73 (2016) 291–300.
[6] M.L. Williams, P. Burnap, Cyberhate on social media in the aftermath of Woolwich: a case study in computational criminology and big data, Br. J. Criminol. 56 (2) (2016) 211–238.
[7] R. Tinati, S. Halford, L. Carr, C. Pope, Big data: methodological challenges and approaches for sociological analysis, Sociology (2014) 663–681.
[8] d. boyd, K. Crawford, Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon, Inf. Commun. Soc. 15 (5) (2012) 662–679.
[9] P.A. Schrodt, Automated coding of international event data using sparse parsing techniques, in: Proceedings of the Annual Meeting of the International Studies Association, Chicago, 2001.
[10] A. Esuli, F. Sebastiani, Machines that learn how to code open-ended survey data, Int. J. Mark. Res. 52 (6) (2010) 775–800.
[11] A.D. Shaw, J.J. Horton, D.L. Chen, Designing incentives for inexpert human raters, in: Proceedings of ACM Conference on Computer Supported Cooperative Work, CSCW, 2011.
[12] V.S. Sheng, F. Provost, P.G. Ipeirotis, Get another label? Improving data quality and data mining using multiple, noisy labelers, in: Proceedings of the International Conference on Knowledge Discovery and Data Mining, KDD, ACM, 2008, pp. 614–622.
[13] A. Kittur, J.V. Nickerson, M. Bernstein, E. Gerber, A. Shaw, J. Zimmerman, M. Lease, J. Horton, The future of crowd work, in: Proceedings of Conference on Computer Supported Cooperative Work, CSCW, ACM, 2013, pp. 1301–1318.
[14] S. Komarov, K. Reinecke, K.Z. Gajos, Crowdsourcing performance evaluations of user interfaces, in: Proceedings of Conference on Human-Computer Interaction, CHI, ACM, 2013, pp. 207–216.

[15] W. Willett, J. Heer, M. Agrawala, Strategies for crowdsourcing social data analysis, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2012, pp. 227–236.
[16] C.N. dos Santos, M. Gatti, Deep convolutional neural networks for sentiment analysis of short texts, in: Proceedings of International Conference on Computational Linguistics, COLING, 2014, pp. 69–78.
[17] K. Glasgow, C. Fink, J.L. Boyd-Graber, "Our grief is unspeakable": automatically measuring the community impact of a tragedy, in: Proceedings of International Conference on Web and Social Media, ICWSM, 2014.
[18] J. Garde-Hansen, Measuring mourning with online media: Michael Jackson and real-time memories, Celebr. Stud. 1 (2) (2010) 233–235, doi:10.1080/19392397.2010.482299.
[19] G. Terzis, et al., Death trends: activism and the rise of online grief, Kill Your Darlings (22) (2015) 9–24.
[20] S.K. Radford, P.H. Bloch, Grief, commiseration, and consumption following the death of a celebrity, J. Consum. Cult. 12 (2) (2012) 137–155, doi:10.1177/1469540512446879.
[21] C. Sian Lee, D. Hoe-Lian Goh, "Gone too soon": did Twitter grieve for Michael Jackson? Online Inf. Rev. 37 (3) (2013) 462–478.
[22] A. Bruns, J. Burgess, K. Crawford, F. Shaw, #qldfloods and @QPSMedia: Crisis Communication on Twitter in the 2011 South East Queensland Floods, Technical Report, Brisbane, Australia, 2012. URL http://eprints.qut.edu.au/48241/.
[23] F. Shaw, J. Burgess, K. Crawford, A. Bruns, Sharing news, making sense, saying thanks, Aust. J. Commun. 40 (1) (2013) 23.
[24] Z. Zhou, R. Bandari, J. Kong, H. Qian, V. Roychowdhury, Information resonance on Twitter: watching Iran, in: Proceedings of the First Workshop on Social Media Analytics, ACM, 2010, pp. 123–131.
[25] D. Hoe-Lian Goh, C. Sian Lee, An analysis of tweets in response to the death of Michael Jackson, in: Aslib Proceedings, 63, Emerald Group Publishing Limited, 2011, pp. 432–444.
[26] K. De Kuthy, R. Ziai, D. Meurers, Learning what the crowd can do: a case study on focus annotation, in: Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics, 2015.
[27] B. Ionescu, A.-L. Radu, M. Menéndez, H. Müller, A. Popescu, B. Loni, Div400: a social image retrieval result diversification dataset, in: Proceedings of the 5th ACM Multimedia Systems Conference, ACM, 2014, pp. 29–34.
[28] S. Mac Kim, S. Wan, C. Paris, Detecting social roles in Twitter, in: Proceedings of Conference on Empirical Methods in Natural Language Processing, 2016, p. 34.
[29] M. Thelwall, Heart and soul: sentiment strength detection in the social web with SentiStrength, CyberEmotions, 2013, pp. 1–14.
[30] P. Gonçalves, M. Araújo, F. Benevenuto, M. Cha, Comparing and combining sentiment analysis methods, in: Proceedings of ACM Conference on Online Social Networks, COSN, ACM, 2013, pp. 27–38.
[31] R. Socher, A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, C. Potts, et al., Recursive deep models for semantic compositionality over a sentiment treebank, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP, 1631, Citeseer, 2013, p. 1642.
[32] J. Pennington, R. Socher, C.D. Manning, GloVe: global vectors for word representation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP, 14, 2014.
[33] R. Penney, The rhetoric of the mistake in adult narratives of youth sexuality: the case of Amanda Todd, Fem. Media Stud. 16 (4) (2016) 710–725.
[34] J. Ringrose, L. Harvey, Boobs, back-off, six packs and bits: mediated body parts, gendered reward, and sexual shame in teens' sexting images, Continuum 29 (2) (2015) 205–217.
Dmytro Karamshuk is a Senior Data Scientist at Skyscanner. His research focuses on data mining and modeling behaviour of online users. He has previously worked on analysis of BBC iPlayer and various social media websites (Foursquare, Twitter, Pinterest, etc.). He is an active contributor to the computer networks (Infocom, ComMag, JSAC, etc.) and data mining communities (KDD, WWW, ICWSM, etc.). Dmytro's work has been featured in New Scientist and BBC News.

Frances Shaw is a Postdoctoral Researcher in Applied Ethics with the Black Dog Institute. She is a social theorist and qualitative researcher in the area of media and technology, with a background in media studies and politics. She is currently examining the ethics and politics of social media and mobile device interventions for the diagnosis and prevention of mental illness, with a particular focus on questions of data privacy and security, confidentiality, surveillance and consent, algorithmic accountability, and the allocation of moral responsibility in mHealth and eHealth solutions. Previously she was a Research Fellow at the University of Edinburgh on a research project partnered with the suicide reduction charity Samaritans UK, researching the expression of emotional distress on social media, and how trust and empathy are established in online spaces. Her primary research interests include digital ethics, social media cultures, digital methods, health cultures, digital embodiment and the self.

Julie Brownlie is Senior Lecturer in Sociology at the University of Edinburgh. Her research and teaching interests include the sociology of emotions, relationships, digital narratives and the everyday. She is currently researching trust and empathy online as part of the ESRC's EMoTICON programme.

Nishanth Sastry is a Senior Lecturer at King's College London. He holds a PhD from the University of Cambridge, UK, a Master's degree from The University of Texas at Austin, and a Bachelor's degree from Bangalore University, India, all in Computer Science. He has spent several years in industry, at Cisco Systems and at IBM (both in the Software Group and at the TJ Watson Research Center). His work in the last few years has focused on analysing large real-world datasets, funded by several grants from two different UK Research Councils (EPSRC and ESRC), as well as by the European Commission. He has given several keynotes about his work, and has frequently been featured in various TV shows and other media outlets including Nature News, New Scientist and BBC.