
Online Social Networks and Media 1 (2017) 33–43

Bridging big data and qualitative methods in the social sciences: A case study of Twitter responses to high profile deaths by suicide

Dmytro Karamshuk a,∗, Frances Shaw b, Julie Brownlie b,∗, Nishanth Sastry a,∗

a King's College London, London WC2R 2LS, UK
b University of Edinburgh, Edinburgh EH8 9JU, UK

∗ Corresponding authors. E-mail addresses: [email protected] (D. Karamshuk), [email protected] (F. Shaw), [email protected] (J. Brownlie), [email protected] (N. Sastry).

Article history: Received 14 December 2016; Revised 23 January 2017; Accepted 25 January 2017

Keywords: Social media; Crowd-sourcing; Crowdflower; Natural language processing; Social science; Emotional distress; High-profile suicides; Public empathy

Abstract: With the rise of social media, a vast amount of new primary research material has become available to social scientists, but the sheer volume and variety of this make it difficult to access through the traditional approaches: close reading and nuanced interpretations of manual qualitative coding and analysis. This paper sets out to bridge the gap by developing semi-automated replacements for manual coding through a mixture of crowdsourcing and machine learning, seeded by the development of a careful manual coding scheme from a small sample of data. To show the promise of this approach, we attempt to create a nuanced categorisation of responses on Twitter to several recent high profile deaths by suicide. Through these, we show that it is possible to code automatically across a large dataset to a high degree of accuracy (71%), and discuss the broader possibilities and pitfalls of using Big Data methods for Social Science.

© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Social science has always had to find ways of moving between the small-scale, interpretative concerns of qualitative research and the large-scale, often predictive concerns of the quantitative. The quantitative end of that spectrum has traditionally had two interrelated features: active collection of data and creating a suitable sub-sample of the wider population. To the extent that such methods have also captured open-ended or qualitative data, the solution has been to apply manual coding, using a frame developed on the back of intensive qualitative analysis or an exhaustive coding of a smaller sample of responses. Although labour-intensive, manual coding has been critical for obtaining a nuanced understanding of complex social issues.

Social media has created vast amounts of potential qualitative research material – in the form of the observations and utterances of its population of users – that social scientists cannot ignore. Unlike the responses to survey questions, such material is not elicited as part of the research process, nor is its volume limited by the constraints and practicalities of the sample survey. With social media, we now have so much information that it is impossible to process everything using either the detailed analysis methods of qualitative research or the application of manual coding approaches of the kind used in survey research. In short, there are exciting new possibilities but also significant challenges.

For instance, when celebrities die, or deaths become politicised or public in some fashion, hundreds of thousands or even millions of tweets may result. How can some of the traditional concerns of social science – with interpretation (nuance), meaning and social relationships – be pursued within this deluge of largely decontextualised communication? Whereas Big Data methods can easily count the number of tweets, or even attach a 'sentiment score' to individual tweets, it is less clear whether existing methods can identify issues such as the presence or lack of empathy. And yet the application of traditional methods from qualitative social science, such as the close analysis of a small-scale sample of tweets relating to a public death, or the manual application of a coding frame to a larger volume of responses, is likely to miss crucial insights relating to volume, patterning or dynamics. We therefore need a mechanism to train the social scientist's close lens on unmanageably large datasets – to bridge the gap between close readings and large-scale patterning.

This paper develops a possible approach, which we term semi-automated coding. Our three-step method first manually bootstraps a coding scheme from a micro-scale sample of data, then uses a crowdsourcing platform to achieve a meso-scale model, and finally applies machine learning to build a macro-scale model. The bootstrapping is carefully done by trained researchers, creating the nuanced coding scheme necessary for answering social science questions and providing an initial 'golden set' of labelled data. Crowdsourcing expands the labels to a larger dataset using untrained workers. The quality of crowd-generated labels is ensured by checking agreement among crowdworkers and between the crowd workers' labels and the golden set. This larger labelled dataset is then used to train a supervised machine learning model that automatically labels the entire dataset.

We argue that this approach has particular potential for the study of emotions at scale. Emotions have a mutable quality [1] and this is especially true in the context of social media. Thus, intensive manual coding over a small-scale sample may miss some of the temporal and volume dynamics that would be critical for a full sociological understanding of public expressions of emotion, in contrast to the semi-automated coding we propose here, which captures the entire dataset and its dynamics.

As a case study in applying semi-automated coding, this paper looks at public empathy – the expression of empathy that, even if it is imagined to be directed at one other person [2], can potentially be read by many – in the context of high-profile deaths by suicide. Five cases were chosen which had a high rate of public response on Twitter, with the aim of exploring what types of response were more or less common in the space of public Twitter, and what factors might affect these responses.

This paper primarily focuses on the methodological challenges of this research through an engagement with emergent findings, and concludes by considering its potential use for interdisciplinary computational social science.

A key issue, both within the case study and more generally for the success of semi-automated coding as an approach, is the accuracy of the automatically generated labels. One source of error is the quality of crowd-generated labels. As mentioned above, we control for this using different forms of agreement, among crowd workers and with a curated golden set. However, our initial attempts on Crowdflower did not generate a good level of agreement. On closer analysis, we discovered that the crowdworkers were confused by the nuanced classification expected of them. To help them, we developed a second innovation, giving them a decision tree (Fig. 1) to guide their coding. This resulted in around 60% of tweets with agreement. Our tests show that the final machine-generated labels agree with the crowd labels with an accuracy of 71%, which permits nuanced interpretations. Although this is over 5.6 times the accuracy of a random baseline, we still need to reconcile the social side of research interpretations with the potentially faulty automatic classification. We allow for this by explicitly quantifying the errors in each of the labels, and drawing interpretations that still stand despite a margin of safety corresponding to these errors.

2. Related literature

The transformative potential of Big Data for social science is now widely recognised [3,4], with social and emotional phenomena ranging from suicidal expression [5] to cyber hate [6] investigated through computational social scientific approaches. However, epistemological and methodological challenges [7,8] remain, and there is an active debate about several aspects of the use of Big Data methods in social science. One critical question is whether and how Big Data methods can scale up from small samples to big data in relation to complex social practices that may require close analysis and nuanced interpretation.

Our proposed solution for scaling up is to automate some of the manual research process involved in social science coding practices. Although previous efforts have looked at assisting social science through automated coding of dates and events in data [9] and even open-ended survey responses [10], coding of social media data creates new challenges because of its temporality and breadth (unlike, for example, survey data, which tends to be in response to specific questions). The main contribution of this paper is the proposed methodology, mixing machine learning and crowd-sourcing, and using multiple levels of validation and refinement, to achieve a high degree of accuracy in coding nuanced concepts such as mourning and lack of empathy.

The practice of employing crowd-workers to manually label tweets has a short but rich history. Crowdsourcing has been recognised as a valuable research tool in numerous previous works [11–15]. A comprehensive review of this literature is provided in [13], which – among others – recognises the impact of job design on the efficiency of crowd-computations. For instance, Willett et al. [15] describe a crowd-sourcing design for collecting surprising information in charts, and [14] propose a design for online performance evaluations of user interfaces. Our paper contributes to this body of work by proposing a decision tree-based design for crowd-sourcing typologies of social-media posts, with built-in prioritisation of the coding process to meet the aims of the social inquiry being carried out.

Last, but not least, the methods developed here build on recent advances in applying artificial neural networks to natural language processing of short texts [16]. Specifically, we investigate how to adapt this approach for automating nuanced multivariate classification of social media posts related to public mourning.

The underlying social science research is informed by work in social science and media studies on public mourning and grieving, particularly on social media. Previous studies have, for example, looked at the discussion of death and grief on Twitter following a violent tragedy [17]. Social media responses to the deaths of celebrities, and to deaths that have received public attention for other reasons, have also been examined [18–20]. Whereas previous studies have looked at communal grief and individual mourning in untimely deaths such as that of Michael Jackson [18,21], this paper aims to interrogate discourses and practices around suicide in mediated mourning, an area in which there has been much less of a focus to date.

3. Background and approach

As mentioned, we use the study of public expression of empathy in the face of high-profile suicides as a case study for testing the feasibility of semi-automated coding. Below we first describe the suicides we study and the datasets that we examine relating to these deaths. Then we outline our philosophy and approach to developing semi-automated coding.

3.1. Datasets

To analyse public discourses on social media relating to high-profile deaths by suicide, we chose five such deaths which were highly publicised, either because the person was famous before their death or because of the circumstances of their death. We were interested in the range of reactions, from mourning and tributes to activism and actions, that were elicited in public Twitter conversations relating to these deaths. Below, we provide some context about each of the cases:

1. Aaron Swartz, at the time of his death by suicide in 2013, was under federal indictment for data theft, relating to an action he undertook to automatically download academic journal articles from the online database JSTOR at MIT. Prosecutors and MIT were criticised by his family and others after his death. Some critics engaged in hacktivist activities; others suggested the federal prosecutors had engaged in overreach, with Swartz's activism argued to have played a role in his treatment (https://en.wikipedia.org/wiki/Aaron_Swartz).

2. Amanda Todd died by suicide at the age of 15 in 2012 in British Columbia, Canada. Her death was widely publicised as a result of a video detailing her experiences of cyberbullying which she had published on YouTube, and which went viral following her death, accumulating more than 1.6 million views in three days. Part of her experience was the abusive and ongoing sharing of images of her without her consent. An adult male was implicated in this abuse (https://en.wikipedia.org/wiki/Suicide_of_Amanda_Todd).

3. Charlotte Dawson was a New Zealand-Australian television personality and former model, most famous for her roles on Australia's Next Top Model, New Zealand Getaway, and The Contender Australia. She was heavily involved in social media, and was a target of cyberbullying for several years prior to her death, with one incident in 2012 occurring around the same time as a previous suicide attempt. She died by suicide in 2014, aged 47. Prior to her death she was an ambassador against cyberbullying (https://en.wikipedia.org/wiki/Charlotte_Dawson).

4. Leelah Alcorn was an American transgender girl whose parents had reportedly refused to accept her female gender identity and sent her to Christian-based conversion therapy. Her suicide note, posted on Tumblr, attracted wide attention. Since her death, Alcorn's parents have been strongly criticised. Vigils and other activist events have taken place internationally to commemorate her life (https://en.wikipedia.org/wiki/Death_of_Leelah_Alcorn).

5. Robin Williams was a very well-known Hollywood actor and comedian. His suicide attracted an enormous amount of commentary from fans online. At the time of his death he had reportedly been suffering from severe depression and had recently been diagnosed with early-stage Parkinson's disease (https://en.wikipedia.org/wiki/Robin_Williams).

We collected five datasets of related Twitter posts for 20 days following each death. We were able to obtain the full dataset of tweets for three cases (Amanda Todd, Leelah Alcorn and Charlotte Dawson) and sampled datasets for the remaining two (Robin Williams and Aaron Swartz). The number of tweets across the different cases ranged from 40k (Charlotte Dawson) to 749k (Robin Williams), and constituted a total of 1.8M tweets. The datasets are summarised in Table 1.

Table 1. Description of the case studies and datasets. All datasets consist of the responses (tweets in English language) for the first 20 days from the date indicated in the table.

Case study          From          Size      Sampling
Amanda Todd         2012-10-11    553,664   Full
Leelah Alcorn       2014-12-30    390,561   Full
Charlotte Dawson    2014-02-22    40,149    Full
Robin Williams      2014-08-11    749,422   Sampled
Aaron Swartz        2013-01-12    84,126    Sampled

3.2. Analysis approach: semi-automated coding

For each of these deaths by suicide, from a social science perspective, we were interested in understanding the types of responses that were elicited during public conversations. The aim of understanding these responses would traditionally be met through manual coding, or classification through a frame developed after intensive qualitative analysis. Although coding has been a mainstay of social science research, it becomes difficult to apply this at scale given the volume of tweets in Table 1 (for comparison, analysing a sample of ≈200 tweets to develop an initial coding frame, Step 1 below, was an ≈1 person-day job). The social scientist's typical alternative would be to select and focus on a small sample of the dataset. Unfortunately, this is not a fully satisfactory solution for two reasons: First, it is not a priori clear which parts of the dataset would be most interesting and should be selected for intensive analysis. Second, focusing on a small sample misses aggregate characteristics, such as the relative volumes and temporal dynamics of different classes of responses, which can provide a new dimension to many social science questions, including ours, as it focuses on public, or aggregate, expressions of empathy.

We argue therefore that manual coding needs to be adapted using computational methods, to scale up to the volume of data created by social media platforms. Our solution, which we term semi-automated coding, works as follows: we start by noting that manual inspection cannot be avoided, because a) social scientists need to come up with a coding frame that makes sense for the research questions that are of interest, and b) given that the classes of interest encompass nuanced, higher-order social-interaction concepts, it is easiest to define these by example rather than develop complicated rules or heuristics that can identify tweets belonging to the class. Therefore, as a first step, researchers can identify the concepts/classes of interest and provide examples. Subsequently, our goal was to build a machine learning model that can learn these concepts based on the examples given.

In order for the above approach to work, we needed two refinements: First, the machine learning model needs a sufficient number of labelled examples. This was still difficult due to the labour-intensive nature of coding. Therefore, we adopted a two-step approach to generate examples: First, trained researchers created a coding frame and a carefully curated set of example tweets. Next, an untrained set of workers on a crowd-sourcing platform were used to label a larger set of tweets. We controlled for the quality of labelling using agreement between crowd-workers, and agreement between crowd-workers' labels and the labels associated by researchers for the curated set of tweets.

Second, as with any application of machine learning, the automatically generated set of labels is bound to have a few errors. These should be taken into account in any large-scale analysis based on semi-automated coding. We observed that if we are able to quantify the extent of the errors, we can reason about the validity of results within a margin of safety, and ensure that the sociological insights stand despite any shortcomings of the model.

4. Bootstrapping coding using manual effort

The main social science objective of the study was to analyse the typology and dynamics of messages on public Twitter following a high profile death by suicide. To tackle scalability issues, we designed a hybrid methodology in which our coding typology was applied manually and gradually on different data scales. We began by manually coding a few hundred tweets, which were subsequently used to guide the execution of a large-scale labelling experiment on Crowdflower, a crowd-sourcing platform. We then used twelve thousand labelled tweets obtained from the results of the CrowdFlower experiment to train a state-of-the-art machine learning algorithm for short text analysis and to automatically label the full dataset (discussed in Section 5). Below, we describe the design of the manual part, and how it feeds in to bootstrap the machine learning model.
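Before turning to the details, the overall three-step workflow just described can be summarised in a short, runnable sketch. The snippet below uses scikit-learn as a stand-in classifier and toy data values; it illustrates only the micro/meso/macro structure of semi-automated coding, not the study's actual tooling (the model actually used is the convolutional network of Section 5).

```python
# A minimal, runnable sketch of the three-step semi-automated coding pipeline.
# The classifier and all example tweets below are illustrative stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1 (micro-scale): trained researchers develop the coding frame and
# hand-label a small seed sample (~200 tweets per case in this study).
golden_set = [("RIP, so sad to hear this", "rip/mourning"),
              ("Sign the petition to ban conversion therapy", "activism")]

# Step 2 (meso-scale): crowd workers label a larger sample (~12k tweets here),
# and only tweets where the two workers agreed (consensus) are kept.
crowd_consensus = [("We need to talk about cyberbullying", "social issue"),
                   ("TV star found dead, police report suicide", "headline")]

# Step 3 (macro-scale): train a supervised model on the labelled examples and
# apply it to the full corpus (~1.8M tweets in the study).
texts, labels = zip(*(golden_set + crowd_consensus))
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(list(texts), list(labels))
full_corpus = ["Rest in peace, you are missed"]   # placeholder for the full dataset
print(model.predict(full_corpus))
```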

4.1. Coding typology using trained researchers

Initially, a random sample of 200 tweets from each of the five cases was coded qualitatively to identify patterns in communication, a method building on previous Twitter research that divides tweets according to content [22–25]. The initial coding frame was developed from this subset of the dataset.

To begin with, we made lists of all content types emerging from the dataset, made observations on which content types were most common, and found ways to differentiate appropriately between tweet types based on emotional content (blame vs. grief, for example) and whom the tweet was directed at (other Twitter users, the deceased, certain people in particular, or society in general). Our coding frame was then inductively and iteratively developed using cross-validation between two coders, in a manner consistent with previous studies examining emotional content in online settings [25].

These aspects of tweets (emotional content, and to whom the emotions were directed) most strongly shaped the codes chosen. The reason for this focus was that the coding of these tweets was shaped by our research interest in empathy as a concept and as a social practice within the dataset, so we paid particular attention to tweets that either displayed empathic feeling or a lack of empathy toward the deceased or those mourning them. We also identified strong communicative practices in the dataset. Such an approach to developing a coding frame requires analytical insight (as much as empirical knowledge) about the potential and likely feelings of tweeters, and the diversity of responses within the dataset.

Many tweets contain web addresses and links. This presents a challenge in Twitter analysis, particularly in relation to historical data, because of the likelihood of broken links and the difficulty of verifying content. We considered coding according to whether or not a tweet contained a link, or automatically coding these as headlines or informative tweets. However, we ultimately decided that this would strongly skew the dataset, and found that many tweets containing links were not only about information sharing, but also contained emotional content that was relevant to our research.

This initial coding suggested that empathy manifested in a number of different ways. Through a process of detailed coding followed by the building of a coding frame with a smaller number of representative categories, a typology of responses was generated: mourning, where people expressed their personal reactions to the death, including sadness or shock; social issues, where people drew attention to or discussed social issues related to the death, such as bullying or depression; activism, where people discussed taking action in relation to the aforementioned social issues or attending a candlelit vigil; positive actions and negative actions, where people discussed what others were doing or had done in relation to the death; lack of empathy, where people judged either the person who died or those mourning them; and headline, which denotes a straightforward news headline or statement of facts relating to the death. Tweets that did not fit comfortably with any of these classes were coded as uncategorisable.

4.2. Scaling the coding using crowd-sourcing

Next, we created jobs on the Crowdflower crowdsourcing platform to expand the list of human-labelled tweets. Providing instructions for crowdworkers in a brief and descriptive way has been identified as one of the main challenges in conducting crowdsourcing experiments [13], and this was the case in this research. Tweets are often ambiguous, containing multiple communicative acts, and might be coded 'correctly' in several ways. However, we wanted to sensitise coders to particular forms of communication over others. For example, if someone shared a news story about a death but also expressed shock or sadness alongside the sharing of the link, we wanted that tweet to be coded primarily in the RIP/mourning category. In order to help with this, we required each tweet to be coded with exactly one label, and created a decision tree to help coders make decisions about how to code a particular tweet (Fig. 1).

Fig. 1. CrowdFlower job designed as a decision tree. CrowdFlower workers were asked to follow a sequence of binary decisions from the decision tree to label each tweet.

The prioritisation built into the coding process through the decision tree attempts to lessen problems caused by such ambiguities between multiple categories, such as a case in which a tweet identifies a social issue then calls for activism on that basis. It also aims for consistency within and between the datasets in terms of how different types of tweet are understood. However, we still expected some ambiguity within the overall dataset, and allowed for coder disagreement in our initial analysis of the data. The need for a decision tree to focus the work of coders reminds us that working with big data requires interpretation in the same way as qualitative analysis [8].
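The published decision tree itself appears in Fig. 1; since only its overall design is described in the text, the sketch below uses hypothetical question wording. It does, however, reproduce the key property discussed above: a fixed priority order that maps every tweet to exactly one label, with emotional content such as mourning taking precedence over, say, the presence of a shared news link.

```python
# Sketch of the decision-tree labelling logic (Fig. 1). The question wording
# below is hypothetical, not the published tree; only the one-label-per-tweet
# priority structure follows the paper.

def label_tweet(answers) -> str:
    """`answers` maps a yes/no question (str) to a bool, as a crowd worker
    would answer while walking down the tree; unanswered questions count as no."""
    def yes(q):
        return answers.get(q, False)
    if yes("expresses personal sadness, shock or a tribute?"):
        return "rip/mourning"        # prioritised even if a news link is shared
    if yes("judges the deceased or those mourning them?"):
        return "lack of empathy"
    if yes("calls for action, e.g. petitions, vigils or new laws?"):
        return "activism"
    if yes("draws attention to a related social issue, e.g. bullying?"):
        return "social issue"
    if yes("describes what someone else did in response to the death?"):
        return ("positive action"
                if yes("is that action portrayed approvingly?")
                else "negative action")
    if yes("is it a plain news headline or statement of fact?"):
        return "headline"
    return "uncategorised"

print(label_tweet({"expresses personal sadness, shock or a tribute?": True}))
# -> 'rip/mourning'
```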
4.3. Fine-tuning execution parameters

We chose CrowdFlower as a platform for executing our experiments because it provided enough flexibility to fine-tune our experiment and to select coders from specific countries – a requirement imposed by our ethics board. More specifically, we employed workers from the 15 European Union countries with the largest populations (a limit imposed by Crowdflower).

CrowdFlower provided several mechanisms to control the quality of coders for the experiment, of which the coders' agreement with a small golden set of pre-coded answers proved the most effective. Two researchers labelled a sample of 200 tweets (40 from each use case) for the golden set experiment, and refined this after three iterations of test runs on CrowdFlower. Further, we removed ambiguous tweets to ensure every possible chance for crowd workers to agree with the golden set. Finally, we followed CrowdFlower's recommendations (https://success.crowdflower.com/hc/en-us/articles/202702985-How-to-Create-Test-Questions) and balanced the number of tweets in each class, ending up with a golden set of 64 tweets, with 8 tweets from each class. These tweets, rather than being representative of the entire dataset, functioned as a benchmark to test the accuracy and agreement among coders in the experiment, and allowed us to ensure that tweets were coded by Crowdflower workers who had the best understanding of the appropriateness of a particular code for a particular tweet. Coding by those who showed an accuracy of less than 66% and 65% in relation to the golden set was excluded from the results of the first and the second experiments, respectively (cf. Table 2). Note that – although consistent with some previous works [26] – these thresholds are slightly lower than the more frequently used value of 70% [27,28]. Our choice was motivated by the observation that most of the workers with accuracy between 66% and 70% in the first experiment (and between 65% and 70% in the second experiment) provided reasonable feedback on their failed test questions, and so we do not expect their contributions to introduce a systematic error in the results.

In our test runs, we noted very diverse results in the level of coders' conformity with our golden set: whereas over 40% of test questions were missed or contested by low-quality coders, a significant set of high-quality coders exhibited more than 96% agreement with the golden set. The average level of accuracy among selected coders (i.e., among those who scored above the threshold on the golden set) reached 78–82%. To encourage participation of high-quality coders, we doubled the default pay for the job and noted in the description that the job required extra attention and that good performance would be rewarded with bonuses. We then ran two experiments trading off between speed and quality (i.e. the level of conservatism in selecting new coders) and labelled an overall sample of around 12k tweets, with each tweet coded by two CrowdFlower workers. We opted to collect more data points at the cost of having fewer judgments for each label; at the same time, we were conservative in selecting only consensus votes for the next – machine learning – step of our analysis (in Section 5). A few factors contributed to this decision. On the one hand, we had already imposed several measures to control the quality of the labelling process – by choosing only high-quality coders and opting for consensus votes from two coders. On the other hand, we expected our machine learning algorithm to benefit more from a diversity of data points rather than from a diversity of judgments. Since we were interested in analysing the temporal evolution of the discourse in our datasets, we sampled an equal number of tweets from each of the first twenty days in each considered use case. The parameters of our CrowdFlower experiments are summarised in Table 2.
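The golden-set screening described above can be expressed in a few lines. The following is a minimal sketch assuming labels are held in plain dictionaries; the data structures and example values are illustrative, with the threshold set to the 66% used in the first experiment.

```python
# Sketch of the worker quality control: a worker's labels are kept only if
# their accuracy on the curated golden set meets the experiment's threshold
# (66% in experiment 1, 65% in experiment 2). All data here is illustrative.

def worker_accuracy(worker_labels: dict, golden_set: dict) -> float:
    """Share of golden-set tweets the worker labelled like the researchers."""
    hits = sum(worker_labels.get(tweet_id) == label
               for tweet_id, label in golden_set.items())
    return hits / len(golden_set)

def select_workers(all_workers: dict, golden_set: dict, threshold=0.66):
    """all_workers maps worker_id -> {tweet_id: label}; returns trusted workers."""
    return {wid: labels for wid, labels in all_workers.items()
            if worker_accuracy(labels, golden_set) >= threshold}

golden = {"t1": "rip/mourning", "t2": "headline", "t3": "activism"}
workers = {"w1": {"t1": "rip/mourning", "t2": "headline", "t3": "social issue"},
           "w2": {"t1": "lack of empathy", "t2": "social issue", "t3": "headline"}}
print(select_workers(workers, golden))   # keeps w1 (2/3 correct), drops w2 (0/3)
```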

Table 2. The summary of the CrowdFlower experiments. The table indicates the parameters and the main performance indicators from each experiment.

Characteristic                   Exp #1     Exp #2
Tweets labelled                  ≈2k        10k
Test questions                   64         64
Judgments per tweet              2          2
Speed vs quality                 Quality    Speed
Workers quality threshold        66%        65%
Number of selected workers       13         61
Selected workers quality         82%        78%
Workers agreement                67%        59%
Workers feedback                 3.1/5      3.6/5

4.4. Validation of crowdsourced labels

The results of the experiments suggested a reasonably high level (over 60%) of agreement between coders. In Fig. 2 we characterise the cases when workers disagreed in their classification. Each cell in the matrix represents the percentage of tweets which were coded differently by two workers, and the columns and rows represent the higher- and lower-quality workers (as indicated by their level of agreement with the golden set). The first thing to note is that the matrix is predominantly symmetrical, indicating that disagreements have little correlation with the difference in the quality of coders: disagreement between a specific pair of classes A and B can similarly happen when a higher-quality coder voted A as well as when he/she voted B (recall, however, from the previous section that we only included those workers who matched over 65% of the golden set tweets). Secondly, some pairs of classes are confused much more frequently than others: The disagreements are most likely between tweets labelled as "positive action" and "headline" (≈1.6%), and between tweets labelled "uncategorised" and "mourning" or "headline" (≈1.1–1.6%). This result can probably be explained by the fact that many tweets about people's actions in response to the death came in the form of headlines, and by the fact that there was some misunderstanding among coders about when the headline code should be used.

Fig. 2. Confusion matrix from the CrowdFlower experiment. The percentage of tweets coded differently by two workers (columns and rows represent the higher- and lower-quality coders, respectively). The tweets coded similarly (i.e., diagonal elements) are excluded.

We next validated the crowdsourced labels by analysing the sentiments of the tweets for which the labels were generated. Most sentiment analysis tools typically attach a positive or negative 'sentiment score', and are therefore less specific and nuanced than the coding frames typically used in social science. However, understanding the general sentiment scores of the different classes that the crowd has identified provides us with a coarse-grained assurance of the validity of the results. To this end, we used the SentiStrength library [29], considered to be one of the best tools for short texts [30], and associated each tweet with a score between 1 and 5 for positive and negative sentiments.

Fig. 3 presents the mean positive and negative scores for each class in our CrowdFlower dataset. Firstly, we noted that the highest negative and the highest positive sentiment scores are observed among the tweets from the most polarised classes – those of Negative and Positive Action. Similarly, the Mourning/RIP and Lack of Empathy classes in our dataset are associated with expectedly high negative sentiments. Because both results are intuitively expected given the classes, we obtain some assurance about the quality of crowd labels.

Fig. 3. Relation between sentiment scores and the classes of the proposed typology. The mean positive and negative sentiment scores – as measured by the SentiStrength library – for each class in the dataset labelled by CrowdFlower workers.

We also observed a striking difference between the sentiment scores of tweets in the Activism and Social Issues classes, which are semantically close: Whereas the activism-related tweets have relatively neutral sentiment – as indicated by low negative and positive sentiments – the tweets from the Social Issues class show average negative scores of over 2.5 – the second most negative result among all classes in the dataset. This suggests not only that sentiment analysis and coding are complementary analyses (and thus both can add different dimensions when used on the same dataset), but also that the crowd-workers are able to distinguish closely related semantic classes in a way that reflects expected differences, such as sentiment scores.
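This validation step amounts to grouping the crowd-labelled tweets by class and averaging their SentiStrength scores. A minimal pandas sketch follows, assuming the tweets have already been scored by the SentiStrength tool into a CSV; the file name and column names ('class', 'positive', 'negative') are illustrative assumptions.

```python
# Sketch of the validation in Fig. 3: mean SentiStrength scores per class.
# Assumes a pre-computed CSV with one row per crowd-labelled tweet and
# columns 'class', 'positive' (1..5) and 'negative' (1..5); names are
# hypothetical placeholders.
import pandas as pd

scored = pd.read_csv("crowd_labelled_tweets_with_sentiment.csv")
class_sentiment = scored.groupby("class")[["positive", "negative"]].mean()
print(class_sentiment.sort_values("negative", ascending=False))
# Expectation from the paper: Negative/Positive Action are the most polarised
# classes, and Mourning/RIP and Lack of Empathy carry high negative scores.
```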

5. Machine learning approach to understanding online mourning

In order to scale up our analysis from twelve thousand to a million tweets, we used a supervised machine learning algorithm for processing short texts. We describe the model and its performance evaluation on our dataset in the rest of this section.

5.1. Algorithm

The goal of the machine learning model is to mimic the human researcher who codes (i.e., classifies) tweets based on their content. To recreate this effect, we exploited and adapted CharSCNN, a state-of-the-art deep convolutional neural network architecture for short text classification proposed in [16], which was designed to operate at a word level, to capture syntactic and semantic information, and at a character level, to capture morphological and shape information. As argued in [16], the latter is particularly important for short texts such as Twitter posts, which contain abbreviations, misspellings, emoticons and other word forms not common within traditional texts. As a result, CharSCNN showed significant improvement over alternatives – recursive deep neural networks [31] and traditional bag-of-words models – when applied to fine-grained classification of tweets.

Each tweet in this approach is represented by a sequence of $N$ words $[w_0, \ldots, w_N]$, where each word vector $w_i = [r^{wrd}, r^{wch}]$ is composed of two sub-vectors: $r^{wrd}$ for word-level embeddings and $r^{wch}$ for character-level embeddings. We use a one-hot vector representation for character-level embeddings. Whereas in principle the model should be able to extract reasonable word-level embeddings from a one-hot vector representation of words too (if the training dataset is sufficiently large), in practice it proved to be much more efficient to use externally pre-trained word-level embeddings. Such unsupervised pre-training of word representations significantly improved the classification accuracy in the original CharSCNN paper – a result which has also been confirmed in our experiments. In particular, we used the GloVe word vectors pre-trained on a dataset of 2B tweets from Pennington et al. [32]. We used randomly generated values for the minority (25%) of words which did not appear in the GloVe vocabulary.

The neural network we designed in the Theano machine learning package (http://www.deeplearning.net/software/theano/) was composed of two convolution layers with max pooling aggregation – one for character-level and one for word-level embeddings, respectively – followed by two fully connected layers with dropouts to control for over-fitting, and a final softmax layer with eight outputs corresponding to each of the labels in our dataset. The network was trained using mini-batch gradient descent by minimising the negative log likelihood of the training dataset.
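The architecture just described can be sketched compactly. The authors implemented theirs in Theano; the following is a re-implementation sketch in PyTorch, with assumed, illustrative hyperparameters (filter counts, kernel sizes and hidden sizes are not specified in the text). For brevity the character branch here convolves over the tweet's full character sequence, whereas the original CharSCNN composes character features per word.

```python
# A compact PyTorch sketch of the two-branch network described above:
# word-level and character-level convolutions with max pooling, two fully
# connected layers with dropout, and an 8-way output (softmax in the loss).
import torch
import torch.nn as nn

class CharWordCNN(nn.Module):
    def __init__(self, word_emb, n_chars, char_dim=30, n_filters=100,
                 hidden=256, n_classes=8):
        super().__init__()
        # word branch: pre-trained GloVe vectors, fine-tuned during training
        self.word_emb = nn.Embedding.from_pretrained(word_emb, freeze=False)
        self.word_conv = nn.Conv1d(word_emb.size(1), n_filters,
                                   kernel_size=3, padding=1)
        # character branch: learned character embeddings plus convolution
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_conv = nn.Conv1d(char_dim, n_filters,
                                   kernel_size=5, padding=2)
        self.classifier = nn.Sequential(
            nn.Linear(2 * n_filters, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, n_classes))

    def forward(self, word_ids, char_ids):
        # (batch, seq, dim) -> (batch, dim, seq) for Conv1d, then max-pool
        w = self.word_conv(self.word_emb(word_ids).transpose(1, 2)).amax(dim=2)
        c = self.char_conv(self.char_emb(char_ids).transpose(1, 2)).amax(dim=2)
        return self.classifier(torch.cat([w, c], dim=1))

# Training would minimise the negative log likelihood with mini-batch
# gradient descent, e.g. nn.CrossEntropyLoss() with torch.optim.SGD.
```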

5.2. Cross-validation

We validated the performance of the algorithm over the dataset of tweets labelled by the CrowdFlower workers as described in the previous section. Specifically, we used all labels with agreement between the coders, which resulted in a dataset of 7.1k tweets. We note that the model reached an average accuracy of 71% in a 10-fold cross-validation, with approximately 50 training epochs in each experiment and minor improvements thereafter.

Looking at the model's capability of predicting individual classes of messages (Table 3), we note that the precision varied between 60% for predicting 'negative actions' and 89% for discriminating 'activism', with the average precision being over 70% across all classes. In terms of recall, the model was able to capture 69% of instances of each class on average, with a maximum of 79% achieved for 'activism'.
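The evaluation protocol is standard 10-fold cross-validation. A sketch using scikit-learn's splitter is shown below; `train_and_eval` is a hypothetical wrapper around the network above, training for roughly 50 epochs per fold.

```python
# Sketch of the 10-fold cross-validation of Section 5.2. `train_and_eval`
# is a hypothetical callback: it trains on the first pair of arguments and
# returns accuracy on the second pair.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(texts, labels, train_and_eval, n_splits=10):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(texts, labels):
        acc = train_and_eval([texts[i] for i in train_idx],
                             [labels[i] for i in train_idx],
                             [texts[i] for i in test_idx],
                             [labels[i] for i in test_idx])
        accuracies.append(acc)
    return np.mean(accuracies)   # the paper reports ~0.71 on 7.1k tweets
```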

5.3. Manual validation

To provide an intuitive understanding of the algorithm's strong performance in discriminating different classes of tweets, in Table 3 we present the words with the highest relative frequencies in each class with respect to the overall frequency of the words in the dataset. We note that the illustrations contain words that can be expected to signify each category (e.g. the 'activism' messages predominantly consist of highly relevant words such as 'sign', 'law', 'petition', 'ban', etc.).

But beyond the most frequent words, an important question from the social science perspective is whether the machine learning model can interpret nuance in particular cases. In some cases, particularly tweets where people recommended, congratulated or praised what someone else had done or written in response to the death, it did very well in determining subtle changes in tweets and accurately identifying the rhetorical intent of the tweet. These were frequently correctly coded as Positive Action, even though the tweets were otherwise similar to tweet types such as Negative Action or Social Issues (because of the sensitive nature of the topic, all names and identifiable parts, e.g. URLs, have been anonymised or removed):

• Thank-you @xxxxxx for this balanced article that illustrates the danger of a powerful state & those who resist it. <LINK>

However, there were also several instances where it did less well, and the repetition of similar tweets or claims might then lead to inaccuracies in the overall volume of tweets in each category. In relation to the question of add-ons to quoted tweets, this proved problematic in some cases. For example, the following tweet was coded as an RIP tweet, though neither the quoted tweet nor the comment – #blocked – should have been coded in that way:

• #blocked RT @xxxxxx: I don't know much about the case, but what I do know is I don't feel sorry for Aaron Swartz' suicide.

The original tweet should have been coded as Lack of Empathy, and the add-on comment as Negative Action. Clearly, however, this is a very complex tweet in terms of rhetorical intent, and there are likely to be issues with the correct coding of single hashtagged words even in human coding.

In the next section we highlight the greater prevalence of a 'lack of empathy' in responses to the death of Amanda Todd. There were many clear examples of correctly identified 'lack of empathy' in this dataset. However, in some cases the machine learning approach appears to have misinterpreted complex constructions of empathy as a lack of empathy. Here are two examples:

• RT @xxxxxx: I hate that everyone is suddenly buzzing about Amanda Todd now. She doesn't need the sympathy now, she needed it before ...

In this example, although the phrase 'she doesn't need the sympathy' taken on its own would be read as a lack of empathy, the tweet taken as a whole might be understood as saying that suicide is preventable, and that in this case it resulted from a failure of empathy. Likewise:

• RT @xxxxxx: Amanda Todd's story breaks my fucking heart. She made a stupid mistake, and it followed her for all of the wrong reasons.

This is clearly an example of empathy, but it may have been interpreted as a lack of empathy because of the phrases 'stupid mistake' or 'wrong reasons'. The use of the word 'mistake', however, actually refers to her being blackmailed and cyberbullied after sharing images of her body on video chat, rather than to her death.

Despite these distortions, it is clear that the machine learning correctly identifies lack of empathy as more prevalent in this case, and that this changes over time. However, in a multi-case study, we should be aware that individual circumstances surrounding events may have an impact on the accuracy of comparisons between cases, and any large-scale analysis would need to take into consideration that the machine learning model will have some erroneously labelled tweets.

Table 3. Prediction performance of the classifier. The averaged values of precision and recall of the 10-fold cross-validation are reported along with the most frequent terms from each class. We also report the rates of Type-1 and Type-2 errors, which indicate the relative contribution of each class to the total error of the 8-way classifier. Note that the sum of Type-1 and the sum of Type-2 errors (rounded in the table for better presentation) each add up to 0.29 = 1 − reported accuracy (0.71).

Class            Frequent terms                                                                      Precision  Recall  Type-1 Err.  Type-2 Err.
Activism         sign, law, therapy, conversion, lgbtq+, ban, enact, petition                        0.89       0.79    0.01         0.03
Negative action  funeral, parents, death, best, friend, banned                                       0.60       0.41    0.02         0.05
Headline         star, tv, found, dead, australian, suicide, dies                                    0.61       0.78    0.08         0.04
Lack of empathy  blezach, like, shit, people, getting, commit, fuck                                  0.67       0.73    0.03         0.02
Positive action  tribute, billy, crystal, dedicated, emmys, dedicates, transparent                   0.67       0.62    0.04         0.05
RIP/mourning     rip, sad, rest, miss, heart, piece, beautiful, missed                               0.72       0.77    0.06         0.05
Social issue     bullying, people, suicide, stop, stopbullying, cyberbullying, depression, society   0.69       0.61    0.03         0.04
Uncategorised    liked, de, clt, welcome, youtube, amandashires                                      0.80       0.78    0.02         0.02

Fig. 4. Relative volumes of classes across use cases. The relative volumes of tweets from each class along with the estimated error intervals are reported.

6. Analysing dynamics of public empathy

In this section, we highlight the utility of a machine learning approach in assisting and supporting qualitative research by presenting some emergent findings. Specifically, we argue that a machine coding approach can contribute to a nuanced reading of subtle social and discursive changes as an event unfolds. There are still clearly instances using this approach where subtleties are missed and, as a result, qualitative analysis is important to fully understand what is being articulated. The dynamics of communication identified through the machine learning approach, however, provide a focus for this analysis. The combined use of iterative qualitative coding, crowd-coding, machine learning, and qualitative analysis can potentially help us to better understand complex and nuanced social discussions at scale.

6.1. On interpreting semi-automated coding

In the following, we focus on the analysis of the temporal dynamics of expressions of empathy (or lack of empathy) in our case studies. We do so through the analysis of the relative shares of tweets classified by our machine learning algorithm in each class on each day during the events. Both the crowd-coded and the machine-coded datasets allowed for the production of visualisations of the dynamics of each of the cases. However, the machine-coded datasets, because of the volume of tweets coded, allowed for a highly specific and complex reading of the interplay of each of the different tweet types, both across the set of tweets for each suicide and at particular times within each suicide.

Our main approach will be to compare relative volumes of different classes, both in the aggregate (cf. Fig. 4) and over time (cf. Fig. 5). To do so, it is important to estimate the error interval of the predictions made by our algorithm. Specifically, we need to understand how the error of mislabelling tweets in our experiments is distributed across individual classes and how that affects our estimates of relative volumes. To this end, for every class $C$ we estimate the rates of Type-1 and Type-2 errors induced by mislabelling tweets in class $C$. More specifically, the Type-1 error is measured as the share of all cases in which a tweet from a class other than $C$ has been labelled as $C$, whereas the Type-2 error is measured as the share of all cases in which a tweet from class $C$ has been mislabelled as some other class (see the last two columns of Table 3). In other words, the Type-1 ($t_C^1$) and Type-2 ($t_C^2$) errors assess the extent to which the share $\rho_C$ of class $C$ might have been over- or under-estimated in our calculations and, therefore, are represented as an interval $[\rho_C - t_C^1, \; \rho_C + t_C^2]$ in Figs. 4 and 5.

Intuitively, the mislabelling error for each individual class contributes to the overall share of mislabelled cases and can be measured by the complement of accuracy, i.e., the individual Type-1 errors (as well as the individual Type-2 errors) in Table 3 sum up to 0.29, which is equivalent to 1 − accuracy (0.71).

Thus, we plot the relative volumes as well as the dynamics of classes identified by our machine learning algorithm on the full dataset, indicating the error of the estimate with the grey intervals around each class in Fig. 5 and with error bars in Fig. 4. We note that, in general, the errors of estimated shares are relatively smaller than the differences between the shares of the most prominent classes in the vast majority of cases across all suicides considered, which allows us to draw qualitative conclusions about the dynamics of classes. This in turn allows for the selection of particular moments to be investigated through a closer qualitative reading.
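Computing these intervals is straightforward once the Type-1 and Type-2 rates are known. The sketch below uses the published Table 3 rates for two classes as example values; the toy prediction list is illustrative.

```python
# Sketch of the error-interval computation of Section 6.1: each estimated
# class share rho_C is bracketed as [rho_C - t1_C, rho_C + t2_C], where t1_C
# and t2_C are the Type-1 and Type-2 error rates from Table 3.
from collections import Counter

type1 = {"activism": 0.01, "headline": 0.08}  # tweets wrongly labelled as C
type2 = {"activism": 0.03, "headline": 0.04}  # C-tweets labelled as something else

def share_intervals(predicted_labels):
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {c: (counts[c] / total - type1[c], counts[c] / total + type2[c])
            for c in counts}

print(share_intervals(["activism", "headline", "headline", "activism"]))
# -> {'activism': (0.49, 0.53), 'headline': (0.42, 0.54)}
```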

6.2. Qualitative reading through semi-automated coding

To illustrate the benefits of the semi-automated coding approach, we now discuss some qualitative findings which would have been difficult to obtain if only a small subset of data had been used for close reading.

Fig. 5. Dynamics of public response to high-profile suicides. The per-day relative volumes of tweets from each class are reported along with the estimated error intervals for each class, drawn as grey areas, with values obtained from Table 3. Note that the reported errors are static across individual days and are calculated from the last two columns of Table 3.

In terms of overall types of communication, Fig. 4 showed us that each case had a different profile in terms of the kinds of communication that dominated the discussion. We found that for Aaron Swartz, headlines and news articles dominated, but Leelah Alcorn and Aaron Swartz both had high levels of activism compared to the other cases: their deaths were more politicised – both deaths resulted in draft laws named after them. In both of these cases the negative actions of others are also strong in the dataset (particularly for Leelah Alcorn), suggesting that in both cases a sense of injustice or mistreatment drove the politicisation and the activism that followed. Further qualitative analysis will be carried out to understand the connection between this politicisation and the way each death was understood within the dataset and in the broader public sphere.

As previously mentioned, the Amanda Todd case had the highest levels of 'Lack of Empathy' coded tweets as a proportion of the data. Such tweets make up more than a quarter of the Todd data, whereas they form only a small proportion of the data for each of the other cases. This means that participants in the conversation following her death were more likely to judge her harshly or to judge others for caring about her death. Although we have previously discussed the possibility of some minor distortion, lack of empathy remains a feature when we look closely at the data.

This raises the importance of understanding change over time in relation to each case, rather than relying only on comparison of volumes, and we turn to Fig. 5 for this. In the case of Amanda Todd, although 'Lack of Empathy' is present in the data from day one, it does not begin to dominate until the 4th or 5th day of discussion, showing something of a backlash effect.
Prior to that, the dominant themes were mourning and social issues. Such issues were strong in these data because of the discussion of bullying and cyberbullying in relation to the death of Amanda Todd. Over time, however, participants in the conversation increasingly make claims about Amanda being to blame for the bullying she endured. Further qualitative analysis is needed to understand the discourses at play here, though in the case of Amanda Todd this might be related to a continuation of bullying behaviour, as well as perhaps her age and gender [33,34].

In the case of Aaron Swartz, there is a peak in the 'activism' code just over two weeks after his death, which coincides with activities by the group Anonymous. Participants acting under this moniker launched attacks on government websites to protest Aaron's prosecution and death. As described in the section on overall patterns, activism in Aaron's case was linked to discussion of negative actions on the part of the United States Department of Justice, the FBI, and the institutions involved in Aaron's legal case. One example of this type of tweet was:

• Many many American dissidents believe U.S. Officials DROVE #RedditFounder #AaronSwartz to suicide with capriciously aggressive prosecution.

These examples show the uses of machine learning for identifying, at scale, moments during the unfolding of events and public discussions on Twitter where something significant occurs, minds are changed, or new arguments and claims are made. It also provides an opportunity for the examination of relationships between different communicative types, whether across a whole dataset or for individual Twitter users.

7. Discussion, conclusions and lessons

Social science has tended to use small-scale, intensive, qualitative methods to explore issues of nuance and emotion. However, if we are interested in the aggregated or social patterning or collective expression of such phenomena – as in the case of public empathy – we need methods that are capable of bridging from small-scale, intensive study to potentially very large volumes of data that lie beyond the capabilities of manual coding.

The analysis presented here suggests that the combination of qualitative analysis with machine learning can offer both a big-picture view of public events and close analysis of particular turning points or key moments in discussions of such events. As such, it can potentially yield new insights not easily achievable through traditional qualitative social science methods.

Although our specific case study looked at emotions and empathy in relation to high-profile deaths by suicide, the overall approach of semi-automated coding could be adapted to other research questions. Our experience suggests, however, that such adaptation will not be as simple as using a tool or a library. Rather, it is an approach that needs to be tailored to the problem at hand – each research question may require specific tweaks. For instance, if crowdsourcing is used to increase the set of manual labels, slightly different approaches or different decision trees may need to be developed to enable adequate levels of agreement amongst crowd workers. We made a decision to assign each tweet to one unique class. Addressing other problems may lead to ambiguous tweets being treated differently, e.g., allowing simultaneous or fractional (weighted) membership in multiple classes.

With the kind of customisation described above, big data-based methods can give us some purchase on aggregated and collective aspects of emotional expression online. This is increasingly necessary given the significance of social media in mediating and constituting emotional lives. At the same time, however, the analysis above also reminds us that, while decision trees and similar approaches aimed at guiding manual or automated coding can help to narrow differences in classification, the interpretive gap cannot be completely closed.

[17] K. Glasgow, C. Fink, J.L. Boyd-Graber, ”Our grief is unspeakable”: automatically method used in qualitative social science (coding), with algorith- measuring the community impact of a tragedy., in: Proceedings of Interna- mic classification using machine learning. Although the authors of tional Conference on Web and Social Media, ICWSM, 2014 . this article included experts in both these approaches, significant [18] J. Garde-Hansen, Measuring mourning with online media: Michael Jackson and challenges arose in merging the two: in particular, we underesti- real-time memories, Celebr. Stud. 1 (2) (2010) 233–235, doi:10.1080/19392397. 2010.482299 . mated the difficulty of creating a coding scheme that can be inter- [19] G. Terzis , et al. , Death trends: activism and the rise of online grief, Kill preted and applied by crowd workers to create reliable high qual- Your Darlings (22) (2015) 9–24 . ity labels. Our initial efforts were unsuccessful as different crowd [20] S.K. Radford, P.H. Bloch, Grief, commiseration, and consumption following the death of a celebrity, J. Consum. Cult. 12 (2) (2012) 137–155, doi: 10.1177/ workers assigned different priorities to the different labels, leading 1469540512446879 . to inconsistency. In our second attempt, therefore, we provided a [21] C. Sian Lee , D. Hoe-Lian Goh , “Gone too soon”: did twitter grieve for michael clear guide for crowd workers, using the decision tree in Fig. 1 to jackson? Online Inf. Rev. 37 (3) (2013) 462–478 . [22] A. Bruns, J. Burgess, K. Crawford, F. Shaw, #qldfloods and @QPSMedia: Crisis help to create greater consistency in labelling. This improvement, Communication on Twitter in the 2011 South East Queensland Floods, Bris- while simple, was instrumental to the success of our methodology. bane, Australia, 2012 . Technical Report URL http://eprints.qut.edu.au/48241/ . We believe this example also illustrates the nature of potential pit- [23] F. Shaw , J. Burgess , K. Crawford , A. Bruns , Sharing news, making sense, saying thanks, Aust. J. Commun. 40 (1) (2013) 23 . falls, and how they are more likely to be non-technical than tech- [24] Z. Zhou , R. Bandari , J. Kong , H. Qian , V. Roychowdhury , Information resonance nical. Paying attention to matters of interpretation is likely to be on twitter: watching iran, in: Proceedings of the First Workshop on Social Me- an essential feature of future interdisciplinary research in compu- dia Analytics, ACM, 2010, pp. 123–131 .

With the kind of customisation described above, big data-based methods can give us some purchase on aggregated and collective aspects of emotional expression online. This is increasingly necessary given the significance of social media in mediating and constituting emotional lives. At the same time, however, the analysis above also reminds us that, while decision trees and similar approaches aimed at guiding manual or automated coding can help to narrow differences in classification, the interpretive gap cannot be completely closed.

Our method aims to combine a conventional classification method used in qualitative social science (coding) with algorithmic classification using machine learning. Although the authors of this article included experts in both these approaches, significant challenges arose in merging the two: in particular, we underestimated the difficulty of creating a coding scheme that can be interpreted and applied by crowd workers to create reliable high quality labels. Our initial efforts were unsuccessful as different crowd workers assigned different priorities to the different labels, leading to inconsistency. In our second attempt, therefore, we provided a clear guide for crowd workers, using the decision tree in Fig. 1 to help create greater consistency in labelling. This improvement, while simple, was instrumental to the success of our methodology. We believe this example also illustrates the nature of potential pitfalls, and how they are more likely to be non-technical than technical. Paying attention to matters of interpretation is likely to be an essential feature of future interdisciplinary research in computational social science.
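To make the role of such a guide concrete, a coding decision tree can be read as a fixed sequence of yes/no questions that every worker answers in the same order, so that two workers who agree on the answers necessarily agree on the label. The sketch below shows only the general shape of such a guide; the questions and class names are invented for illustration and are not the actual scheme of Fig. 1.

# Illustrative sketch of a decision-tree coding guide for crowd workers.
# The questions and classes are hypothetical, not the tree from Fig. 1.
def coding_guide(answers):
    """answers: dict mapping question id -> bool, from one worker."""
    if not answers["is_about_the_death"]:
        return "unrelated"
    if answers["expresses_personal_emotion"]:
        return "grief"
    if answers["calls_for_action"]:
        return "activism"
    return "news_or_commentary"

worker_answers = {
    "is_about_the_death": True,
    "expresses_personal_emotion": False,
    "calls_for_action": True,
}
print(coding_guide(worker_answers))  # -> activism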

Acknowledgements

The "A Shared Space and A Space for Sharing" project (Grant no. ES/M00354X/1) is one of several funded through the EMoTICON network, which is funded through the following cross-council programmes: Partnership for Conflict, Crime and Security Research (led by the Economic and Social Research Council (ESRC)), Connected Communities (led by the Arts and Humanities Research Council (AHRC)), and Digital Economy (led by the Engineering and Physical Sciences Research Council (EPSRC)).

References

[1] J. Brownlie, Ordinary Relationships: A Sociological Study of Emotions, Reflexivity and Culture, Palgrave Macmillan, 2014.
[2] D. Brake, Sharing Our Lives Online: Risks and Exposure in Social Media, Palgrave Macmillan, Springer, 2014.
[3] A. Halavais, Bigger sociological imaginations: framing big social data theory and methods, Inf. Commun. Soc. 18 (5) (2015) 583–594.
[4] D.V. Shah, J.N. Cappella, W.R. Neuman, Big data, digital media, and computational social science: possibilities and perils, Ann. Am. Acad. Political Soc. Sci. 659 (1) (2015) 6–13.
[5] G.B. Colombo, P. Burnap, A. Hodorog, J. Scourfield, Analysing the connectivity and communication of suicidal users on Twitter, Comput. Commun. 73 (2016) 291–300.
[6] M.L. Williams, P. Burnap, Cyberhate on social media in the aftermath of Woolwich: a case study in computational criminology and big data, Br. J. Criminol. 56 (2) (2016) 211–238.
[7] R. Tinati, S. Halford, L. Carr, C. Pope, Big data: methodological challenges and approaches for sociological analysis, Sociology (2014) 663–681.
[8] d. boyd, K. Crawford, Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon, Inf. Commun. Soc. 15 (5) (2012) 662–679.
[9] P.A. Schrodt, Automated coding of international event data using sparse parsing techniques, in: Proceedings of the Annual Meeting of the International Studies Association, Chicago, 2001.
[10] A. Esuli, F. Sebastiani, Machines that learn how to code open-ended survey data, Int. J. Mark. Res. 52 (6) (2010) 775–800.
[11] A.D. Shaw, J.J. Horton, D.L. Chen, Designing incentives for inexpert human raters, in: Proceedings of ACM Conference on Computer Supported Cooperative Work, CSCW, 2011.
[12] V.S. Sheng, F. Provost, P.G. Ipeirotis, Get another label? Improving data quality and data mining using multiple, noisy labelers, in: Proceedings of the International Conference on Knowledge Discovery and Data Mining, KDD, ACM, 2008, pp. 614–622.
[13] A. Kittur, J.V. Nickerson, M. Bernstein, E. Gerber, A. Shaw, J. Zimmerman, M. Lease, J. Horton, The future of crowd work, in: Proceedings of Conference on Computer Supported Cooperative Work, CSCW, ACM, 2013, pp. 1301–1318.
[14] S. Komarov, K. Reinecke, K.Z. Gajos, Crowdsourcing performance evaluations of user interfaces, in: Proceedings of Conference on Human-Computer Interaction, CHI, ACM, 2013, pp. 207–216.

[15] W. Willett, J. Heer, M. Agrawala, Strategies for crowdsourcing social data analysis, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2012, pp. 227–236.
[16] C.N. dos Santos, M. Gatti, Deep convolutional neural networks for sentiment analysis of short texts, in: Proceedings of International Conference on Computational Linguistics, COLING, 2014, pp. 69–78.
[17] K. Glasgow, C. Fink, J.L. Boyd-Graber, "Our grief is unspeakable": automatically measuring the community impact of a tragedy, in: Proceedings of International Conference on Web and Social Media, ICWSM, 2014.
[18] J. Garde-Hansen, Measuring mourning with online media: Michael Jackson and real-time memories, Celebr. Stud. 1 (2) (2010) 233–235, doi:10.1080/19392397.2010.482299.
[19] G. Terzis, et al., Death trends: activism and the rise of online grief, Kill Your Darlings (22) (2015) 9–24.
[20] S.K. Radford, P.H. Bloch, Grief, commiseration, and consumption following the death of a celebrity, J. Consum. Cult. 12 (2) (2012) 137–155, doi:10.1177/1469540512446879.
[21] C. Sian Lee, D. Hoe-Lian Goh, "Gone too soon": did Twitter grieve for Michael Jackson? Online Inf. Rev. 37 (3) (2013) 462–478.
[22] A. Bruns, J. Burgess, K. Crawford, F. Shaw, #qldfloods and @QPSMedia: Crisis Communication on Twitter in the 2011 South East Queensland Floods, Technical Report, Brisbane, Australia, 2012. URL http://eprints.qut.edu.au/48241/.
[23] F. Shaw, J. Burgess, K. Crawford, A. Bruns, Sharing news, making sense, saying thanks, Aust. J. Commun. 40 (1) (2013) 23.
[24] Z. Zhou, R. Bandari, J. Kong, H. Qian, V. Roychowdhury, Information resonance on Twitter: watching Iran, in: Proceedings of the First Workshop on Social Media Analytics, ACM, 2010, pp. 123–131.
[25] D. Hoe-Lian Goh, C. Sian Lee, An analysis of tweets in response to the death of Michael Jackson, in: Aslib Proceedings, 63, Emerald Group Publishing Limited, 2011, pp. 432–444.
[26] K. De Kuthy, R. Ziai, D. Meurers, Learning what the crowd can do: a case study on focus annotation, in: Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics, 2015.
[27] B. Ionescu, A.-L. Radu, M. Menéndez, H. Müller, A. Popescu, B. Loni, Div400: a social image retrieval result diversification dataset, in: Proceedings of the 5th ACM Multimedia Systems Conference, ACM, 2014, pp. 29–34.
[28] S. Mac Kim, S. Wan, C. Paris, Detecting social roles in Twitter, in: Proceedings of Conference on Empirical Methods in Natural Language Processing, 2016, p. 34.
[29] M. Thelwall, Heart and soul: sentiment strength detection in the social web with SentiStrength, CyberEmotions, 2013, pp. 1–14.
[30] P. Gonçalves, M. Araújo, F. Benevenuto, M. Cha, Comparing and combining sentiment analysis methods, in: Proceedings of ACM Conference on Online Social Networks, COSN, ACM, 2013, pp. 27–38.
[31] R. Socher, A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, C. Potts, et al., Recursive deep models for semantic compositionality over a sentiment treebank, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP, 1631, Citeseer, 2013, p. 1642.
[32] J. Pennington, R. Socher, C.D. Manning, GloVe: global vectors for word representation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP, 14, 2014.
[33] R. Penney, The rhetoric of the mistake in adult narratives of youth sexuality: the case of Amanda Todd, Fem. Media Stud. 16 (4) (2016) 710–725.
[34] J. Ringrose, L. Harvey, Boobs, back-off, six packs and bits: mediated body parts, gendered reward, and sexual shame in teens' sexting images, Continuum 29 (2) (2015) 205–217.
Dmytro Karamshuk is a Senior Data Scientist at Skyscanner. His research focuses on data mining and modeling behaviour of online users. He has previously worked on analysis of BBC iPlayer and various social media websites (Foursquare, Twitter, Pinterest, etc.). He is an active contributor to the computer networks (Infocom, ComMag, JSAC, etc.) and data mining communities (KDD, WWW, ICWSM, etc.). Dmytro's work has been featured in New Scientist and BBC News.

Frances Shaw is a Postdoctoral Researcher in Applied Ethics with the Black Dog Institute. She is a social theorist and qualitative researcher in the area of media and technology, with a background in media studies and politics. She is currently examining the ethics and politics of social media and mobile device interventions for the diagnosis and prevention of mental illness, with a particular focus on questions of data privacy and security, confidentiality, surveillance and consent, algorithmic accountability, and the allocation of moral responsibility in mHealth and eHealth solutions. Previously she was a Research Fellow at the University of Edinburgh on a research project partnered with the suicide reduction charity Samaritans UK, researching the expression of emotional distress on social media, and how trust and empathy are established in online spaces. Her primary research interests include digital ethics, social media cultures, digital methods, health cultures, digital embodiment and the self.

Julie Brownlie is Senior Lecturer in Sociology at the University of Edinburgh. Her research and teaching interests include the sociology of emotions, relationships, digital narratives and the everyday. She is currently researching trust and empathy online as part of the ESRC's EMoTICON programme.

Nishanth Sastry is a Senior Lecturer at King's College London. He holds a PhD from the University of Cambridge, UK, a Master's degree from The University of Texas at Austin, and a Bachelor's degree from Bangalore University, India, all in Computer Science. He has spent several years in industry, at Cisco Systems and at IBM (both in the Software Group and at the TJ Watson Research Center). His work in the last few years has focused on analysing large real-world datasets, funded by several grants from two different UK Research Councils (EPSRC and ESRC), as well as by the European Commission. He has given several keynotes about his work, and has frequently been featured in various TV shows and other media outlets including Nature News, New Scientist and BBC.