This electronic thesis or dissertation has been downloaded from the King’s Research Portal at https://kclpure.kcl.ac.uk/portal/

Big data approaches to investigating Child Mental Health disorder outcomes

Downs, Jonathan Muir

Awarding institution: King's College London

The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without proper acknowledgement.

END USER LICENCE AGREEMENT

Unless another licence is stated on the immediately following page this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence. https://creativecommons.org/licenses/by-nc-nd/4.0/

You are free to copy, distribute and transmit the work

Under the following conditions:
Attribution: You must attribute the work in the manner specified by the author (but not in any way that suggests that they endorse you or your use of the work).
Non Commercial: You may not use this work for commercial purposes.
No Derivative Works - You may not alter, transform, or build upon this work.

Any of these conditions can be waived if you receive permission from the author. Your fair dealings and other rights are in no way affected by the above.

Take down policy

If you believe that this document breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 07. Oct. 2021

BIG DATA APPROACHES TO INVESTIGATING CHILD MENTAL HEALTH DISORDER OUTCOMES

JOHNNY DOWNS

Thesis submitted for the degree of Doctor of Philosophy
September 2017
Department of Psychological Medicine
Institute of Psychiatry, Psychology & Neuroscience
King’s College London

ABSTRACT

Background: In the UK, administrative data resources continue to expand across publicly funded youth-orientated health, education and social services.
Despite attempts to capture these data in structured formats, which are more accessible for analysis, most health information is stored as free text in electronic records. Big data techniques that combine large-scale data linkage with automatic information extraction from free text, using Natural Language Processing (NLP), have considerable potential for enhancing the depth of information available from routinely collected public service data. Very few published studies have applied these big data techniques to answer questions relevant to child and adolescent psychiatry.

Methods: This thesis examined original and clinically relevant research questions using data from routinely collected electronic health records, enriched by NLP and linkages to external data sources. Five related studies were performed, all using data obtained from the SLaM BRC Clinical Record Interactive Search (CRIS) and extracted using NLP approaches. Two of the studies also used external linkages with routinely collected national electronic datasets (NHS Hospital Episode Statistics and the DfE National Pupil Database, NPD).

Results: Using these data resources, I provide empirical support for the hypothesis that neurodevelopmental comorbidities increase children and adolescents’ risk for potentially more harmful treatments, greater treatment complexity and worse clinical outcomes. The NLP methods employed overcame limitations of structured data extraction, providing better assessment of a diverse range of symptom types, severity and related impairments, including suicidal risk, negative symptoms, antipsychotic treatment failure, and self-harm. External data linkages with the NPD enabled population-level analyses by nesting clinical samples within their source population. NPD linkage also permitted the inclusion of education performance data, which were not routinely available within electronic health records.
Conclusion: The thesis illustrates that the legal, governance and technical challenges of linking NHS and Department for Education public service data were surmountable. It also demonstrates that NLP and data linkage of electronic health records have a clear role in clinical epidemiological studies of child and adolescent mental health. These tools, combined with the continued digitisation of public service activity, can unlock huge and detailed data resources for population-based analyses. However, current approaches have deficiencies, including limitations in accuracy, construct validity, and restrictions in the data available, providing challenges for future research.

ACKNOWLEDGEMENTS

I have been extremely fortunate to have Dr Richard Hayes, Professor Tamsin Ford and Professor Matthew Hotopf as supervisors. It was Tamsin and Matthew’s generosity and guidance that kick-started this research, turning a set of disparate ideas into an MRC-funded research project. Richard was the catalyst for finally undertaking the PhD, and I am very grateful for all his support over the course of the last 4 years. He has been a tremendous mentor, balancing my impetuous nature, and kindly reinforcing what I need to do to move from a research idea, to a grant application, and on to published articles. Richard, Tamsin and Matthew continue to inspire me, and fuel my post-doctoral aspirations for child and adolescent mental health research. This PhD has been the most enjoyable period of my working life, which I believe is due to them, and the research environments they have created. I would also like to thank Professor Robert Stewart and Matthew Broadbent, who have had a considerable influence on the work within this thesis. Over the course of my PhD, they have developed the BRC Nucleus into a lively, multi-disciplinary workspace; it has provided me with rich opportunities to build valuable friendships and collaborations.
Of these, a special mention must go to Dr Laura Pina, who was wonderful to work with, and whose drive and enthusiasm for understanding what might improve the lives of children with early onset psychosis was infectious. Thank you to Professor Ruth Gilbert, who has kept me alert to some of the methodological flaws commonly hidden within data linkage research, but also introduced me to key researchers in the child health and education policy arena, and generously supported me as an early career researcher. Thank you to Richard White, the National Pupil Database Team, and Dr Murat Soncul for helping me drive through the innovative linkage work. I am very grateful to Professor Emily Simonoff, Dr James MacCabe and Dr Rashmi Patel for all their contributions to the research contained in this thesis, and for all their help in shaping my future research ideas. Similarly, I would like to thank Dr Kate Polling, Dr Sumithra Velupillai, Dr Rina Dutta, and Dr George Gkotsis for their important contributions to the thesis and for being great colleagues. Thanks to Dr Omer Moghraby for supporting me in keeping my hand in clinically, and for his enthusiasm for applying research in clinical practice. I am incredibly grateful for all the practical and moral support that Megan Pritchard, Leo Koeser, Clare Taylor, Craig Colling, Anna Kolliakou, David Chandran, Ryan Little, Jyoti Jyoti, Debbie Cummings, Amelia Jewell and Hitesh Shetty have given me whilst I’ve been at the BRC.

Poppy, thanks for keeping it real. None of this work would have been possible without your love.

TABLE OF CONTENTS

ABSTRACT 2
ACKNOWLEDGEMENTS 3
LIST OF TABLES 10
LIST OF FIGURES 13
LIST OF ABBREVIATIONS 15
PUBLICATIONS 16
CONTRIBUTION STATEMENT 19
ETHICS STATEMENT 20

CHAPTER 1. INTRODUCTION 21
1.1 What are Big Data? 22
1.2 Current challenges for Child & Adolescent Mental Health Epidemiology 23
  1.2.1 Issues with conventional epidemiological study approaches: randomised control trials 23
  1.2.2 Issues with conventional epidemiological study approaches: cross-sectional surveys 24
  1.2.3 Issues with conventional epidemiological study approaches: prospective cohorts 25
1.3 Why do we need ‘Big data’ for child and adolescent mental health research? 26
1.4 Psychiatric epidemiology and Big Data 27
  1.4.1 Data linkage of clinical and social care data 27
  1.4.2 Natural language processing and the Electronic Health Records 30
  1.4.3 Applying Natural Language Processing to Electronic Health Records 32
  1.4.4 Natural language processing and its application in child and adolescent psychiatric epidemiology 36
  1.4.5 Examples of other Big data methodologies 45
1.5 Aims and structure of this thesis. 45

CHAPTER 2. CLINICAL PREDICTORS OF ANTIPSYCHOTIC USE IN CHILDREN AND ADOLESCENTS WITH AUTISM SPECTRUM DISORDERS: A HISTORICAL OPEN COHORT STUDY USING ELECTRONIC HEALTH RECORDS 49
2.1 Summary 50
2.2 Introduction 51
2.3 Methods 52
  2.3.1 Study Setting 52
    The Clinical Record Interactive Search (CRIS) system 53
  2.3.2 Study sample 54
  2.3.3 Measurements 55
    Outcome: antipsychotic use 55
    Exposure: psychiatric comorbidity & intellectual disability 56
    Covariates: 56
  2.3.4 Analysis 57
2.4 Results 58
  2.4.1 Demographic and clinical characteristics of the sample 58
  2.4.2 Authentication of co-morbid diagnoses against the SDQ 60
  2.4.3 Socio-demographic and clinical factors and their associations with antipsychotic treatment 61
  2.4.4 Sensitivity analysis 64
2.5 Discussion 65
  2.5.1 Strengths 66
  2.5.2 Limitations 66
  2.5.3 Conclusions 67

CHAPTER 3. DETECTION OF SUICIDALITY IN ADOLESCENTS WITH AUTISM SPECTRUM DISORDERS: DEVELOPING A NATURAL LANGUAGE PROCESSING APPROACH FOR USE IN ELECTRONIC HEALTH RECORDS 69
3.1 Summary 70
3.2 Introduction 71
3.3 Materials and Methods 74
  3.3.1 Data resources 74
  3.3.2 Overall workflow 75
3.3 Results 78
  3.3.1 Distribution of SR annotation within the random selection of test and training set documents 78
  3.3.2 Adaptions to the NLP tool following test and training 78
  3.3.3 Performance NLP tool on SR test and training set documents 79
  3.3.4 Manual review of NLP and gold-standard discrepancies 80
3.4 Discussion 83
  3.4.1 Strengths 84
  3.4.2 Limitations 85
  3.4.3 Conclusion 85

CHAPTER 4. THE ASSOCIATION BETWEEN CO-MORBID AUTISM SPECTRUM DISORDERS AND ANTIPSYCHOTIC TREATMENT FAILURE IN EARLY-ONSET PSYCHOSIS: A HISTORICAL COHORT STUDY USING ELECTRONIC HEALTH RECORDS.