D7.2.1 / Annotated Corpus – Initial Version
FP7-ICT Strategic Targeted Research Project PHEME (No. 611233)
Computing Veracity Across Media, Languages, and Social Networks

D7.2.1 Annotated Corpus – Initial Version

Anna Kolliakou, King’s College London
Michael Ball, King’s College London
Rob Stewart, King’s College London

Abstract

FP7-ICT Strategic Targeted Research Project PHEME (No. 611233)
Deliverable D7.2.1 (WP 7)

This deliverable describes the creation and annotation of two corpora of tweets on legal highs and prescribed medication. It provides a short summary of the aims and objectives of WP7, a demonstration work package for the PHEME project, and its demonstration studies. The development of the corpora is presented for each of the two case studies together with a detailed methodology of their annotation. Lastly, both corpora with their corresponding manually-annotated codes are provided. The aim of this deliverable is to describe the initial version of the development and annotation of two corpora of tweets corresponding to the first two demonstration studies of WP7.

Keyword list: tweets, corpora, annotation, mephedrone, paliperidone, pregabalin, clozapine.

Nature: Report
Dissemination: RE
Contractual date of delivery: 28.02.15
Actual date of delivery:
Reviewed By:
Web links:

PHEME Consortium

This document is part of the PHEME research project (No. 611233), partially funded by the FP7-ICT Programme.

University of Sheffield
Department of Computer Science
Regent Court, 211 Portobello St
Sheffield S1 4DP, UK
Tel: +44 114 222 1930
Fax: +44 114 222 1810
Contact person: Kalina Bontcheva
E-mail: [email protected]

Universitaet des Saarlandes
Computer Linguistics
Postfach 15 11 50
D-66041 Saarbrücken
Germany
Contact person: Thierry Declerck
E-mail: [email protected]

MODUL University Vienna GMBH
Am Kahlenberg 1
1190 Wien
Austria
Contact person: Arno Scharl
E-mail: [email protected]

Ontotext AD
Polygraphia Office Center fl.4,
47A Tsarigradsko Shosse,
Sofia 1504, Bulgaria
Contact person: Georgi Georgiev
E-mail: [email protected]

ATOS Spain SA
Calle de Albarracin 25
28037 Madrid
Spain
Contact person: Tomás Pariente Lobo
E-mail: [email protected]

King’s College London
Strand
WC2R 2LS London
United Kingdom
Contact person: Robert Stewart
E-mail: [email protected]

iHub Ltd.
Ngong Road, Bishop Magua Building
4th floor
00200 Nairobi
Kenya
Contact person: Rob Baker
E-mail: [email protected]

SwissInfo.ch
Giacomettistrasse 3
3000 Bern
Switzerland
Contact person: Peter Schibli
E-mail: [email protected]

The University of Warwick
Kirby Corner Road
University House
CV4 8UW Coventry
United Kingdom
Contact person: Rob Procter
E-mail: [email protected]

Executive Summary

This deliverable first provides a summary of the aims and objectives of PHEME’s WP7, together with a short description of the four demonstration studies under development. Demonstration studies 1 and 2 are the focus of the main report. The background and aims of our first demonstration study, on social media and medication choices, are presented. The methodology for the development of the corpus and search terms related to medication in Twitter is then described. The annotation processes of medication-related tweets for clozapine, paliperidone and pregabalin, as well as the results arising from this activity, are reported in detail.

For our second demonstration study, on social media and ‘legal highs’, we provide a short summary of the background and objectives. We present the methodology for corpus and related-terms selection for both Twitter and clinical records. The process and results from the annotation of mephedrone-related tweets and mephedrone-related instances in the clinical notes are then described in detail. We conclude this report by outlining how the development of this deliverable will contribute to future work and summarise the next steps towards deliverable D7.2.

Contents

Executive Summary
Contents
1 Introduction
  1.1 WP7 – Veracity intelligence for patient care
2 Demonstration study 1 – Social media and medication choices
  2.1 Background
  2.2 Aims and objectives
    2.2.1 Specific aims
  2.3 Annotation process
    2.3.1 Corpus selection
    2.3.2 Search terms
    2.3.4 Methodology
    2.3.5 Results
3 Demonstration study 2 – Social media and ‘legal highs’
  3.1 Background
  3.2 Aims and objectives
    3.2.1 Specific aims
  3.3 Annotation process
    3.3.1 Corpus selection
    3.3.2 Search terms
    3.3.3 Methodology
    3.3.3 Results
4 Progress so far and steps to D7.2 for M24
  4.1 Medication choices
  4.2 Mephedrone
  4.3 Mental health stigma
  4.4 Self-harming and suicidal behaviour
Bibliography and references
Appendices
  Appendix 1 Set of annotated tweets for clozapine, paliperidone, pregabalin
  Appendix 2 Set of annotated tweets for mephedrone

1 Introduction

1.1 WP7 – Veracity intelligence for patient care

The PHEME project focuses on social media veracity, a largely unstudied big data computational challenge. It will model, identify and verify information as it spreads across media, languages and social networks, and test this system in two case studies: WP7 and WP8 (digital journalism).

The broad aims of WP7 are to turn the project technologies toward practical applications in the healthcare domain and to enable health professionals and policy makers to analyse Internet content for emerging medically-related patterns, rumours, and other health-related issues. This analysis may in turn be used (i) to develop educational materials for patients and the public, by addressing concerns and misconceptions, and (ii) to link to analysis of the electronic health records.

The objective of WP7 is therefore to carry out research towards a PHEME-based platform for the medical domain for multi-channel media monitoring, extraction, verification, and visualisation of automatically extracted knowledge across media and languages. This case study integrates PHEME’s technology into a (hospital-based) health records application and provides methodological and user verification, with the ultimate goal of monitoring health-related rumours and misinformation in social media (http://www.pheme.eu/).

To this end, four main demonstration studies are being considered in WP7. The aims of these studies will be to:

1) Identify social media attitudes towards certain medications and how these are present in clinical records.
2) Monitor the emergence of novel psychoactive substances in social media, and identify if and how promptly they appear in clinical records.
3) Explore how mental health stigma arises in social media and presents in clinical records.
4) Ascertain the type of influence social media might have on young people at risk of self-harm or suicide.

In order to detect rumours, particularly those