Download Thesis
Total Page:16
File Type:pdf, Size:1020Kb
This electronic thesis or dissertation has been downloaded from the King’s Research Portal at https://kclpure.kcl.ac.uk/portal/ Integrated Approaches to the Risk Prediction of First-episode Psychosis Leirer, Daniel Jonathan Awarding institution: King's College London The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without proper acknowledgement. END USER LICENCE AGREEMENT Unless another licence is stated on the immediately following page this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence. https://creativecommons.org/licenses/by-nc-nd/4.0/ You are free to copy, distribute and transmit the work Under the following conditions: Attribution: You must attribute the work in the manner specified by the author (but not in any way that suggests that they endorse you or your use of the work). Non Commercial: You may not use this work for commercial purposes. No Derivative Works - You may not alter, transform, or build upon this work. Any of these conditions can be waived if you receive permission from the author. Your fair dealings and other rights are in no way affected by the above. Take down policy If you believe that this document breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 30. Sep. 2021 Integrated Approaches to the Risk Prediction of First-episode Psychosis Daniel Jonathan Leirer Supervisor: Prof. Richard Dobson Advisor: Prof. Robin Murray Department of Biostatistics and Health Informatics University of London, King’s College London This dissertation is submitted for the degree of Doctor of Philosophy King’s College London March 2018 "Die Welt zu durchschauen, sie zu erklären, mag großer Denker Sache sein. Mir aber liegt einzig daran, die Welt und mich und alle Wesen mit Liebe und Wunder betrachten zu können." Für Werner Declaration I hereby declare that except where specific reference is made to the work of others, the contents of this dissertation are original and have not been submitted in whole or in part for consideration for any other degree or qualification in this, or any other university. The contents of this dissertation are the result of my own work, and are not the result of collaborations with others, unless specified in the text and acknowledgements in accordance with King’s College London regulations. This PhD Thesis contains 30,000 words, excluding bibliography and appendix. Daniel Jonathan Leirer March 2018 Acknowledgements This work would not have been possible without the contribution and support of many others. Foremost among them were my supervisors Richard Dobson, Steven Newhouse and Robin Murray, who allowed me to undertake this work and offered advice, guidance and support. I also would like to acknowledge Marta Di Forti, Diego Quattrone, Gerome Breen, Olesya Ajnakina and the entire GAP team without whom none of this would have been possible. I am especially grateful to Conrad Iyegbe who initially introduced me to this project, and offered his advice over several years. My deepest gratitude goes to Lorraine Gordon, Philip Asherson, Fruhling Rijsdijk and Baljinder Mankoo for their endless support when I needed it most. I would like to extend my gratitude to all the students and staff at the SGDP and IOPPN who have helped and kept me company over the last 4 years. Especially all the members of the Dobson lab. A special thanks goes to all the friends, family and co-workers who provided encouraging words and commented on early drafts of this work. I would like to thank Werner Leirer, Michelle Leirer, Anika Oellrich and Laura Kutt for reading the first drafts and giving feed- back. My thanks also go to Max Kerz for sharing ideas and providing motivation while writing. I would also like to thank my flatmates Marta Kutt, Neil Tan and Dickson Cheung for keeping my spirits up. Finally, I would like to thank the Guy’s and St’Thomas Charity and King’s Bioscience Insti- tute for generously supporting my studies and the MRC-Social, Genetic and Developmental Psychiatry Centre for its support. Abstract Psychosis is a complex condition that features in many psychiatric disorders, and significantly affects the quality of life for both patients and family members. As part of the Genetics and Psychosis (GAP) study, this thesis presents one of the largest blood gene expression datasets on first-episode psychosis patients to date. This work aimed to characterise the blood-based biological perturbations in psychosis and to investigate the predictive ability of gene expression data. Firstly changes in expression, between healthy controls and first-episode psychosis patients was explored, to identify genes associated with psychosis. I identified hundreds of differentially expressed genes and found associations to pathways involved in transcription, oxidative stress and viral replication. Secondly, network approaches were used to construct modules of genes based on co- expression. I identified modules correlated to the severity of positive symptoms, and enrich- ment for pathways associated with the stress response and multiple brain regions. Thirdly regularised generalised linear models with bootstrapping were used to generate predictions based on combinations of gene expression, genetic and demographic data. The highest performance was found for models incorporating gene expression data, with minimal improvement using additional data. Prediction accuracy for identifying psychosis samples was found to increase with severity of positive symptoms in schizophrenia samples, but not in other psychoses. Finally, machine learning methods were used on public schizophrenia gene expression data to build a variety of predictive models. These models were tested on the Genetics and Psychosis (GAP) gene expression data. The results show increased predictive performance on samples with a schizophrenia diagnosis, compared to other types of psychosis. Overall the thesis presents work analysing a novel gene expression dataset. The results suggest that blood gene expression signatures are more predictive for positive symptoms in schizophrenia than for other psychoses. This work also highlights expression differences in innate immune pathways and the stress response. Table of contents List of figuresx List of tables xii List of Algorithms xiii List of Acronyms and Abbreviations xiv 1 Introduction1 1.1 Psychosis . .1 1.1.1 Symptom Presentation in Psychosis . .2 1.1.2 First Episode Psychosis . .6 1.2 Psychosis Risk Factors . .7 1.2.1 Twin Studies and Heritability . .7 1.2.2 The Genetics of Psychosis . 13 1.2.3 Environmental Risk Factors . 16 1.2.4 Gene Environment interactions . 17 1.2.5 Stress Response and the HPA-axis . 18 1.3 Gene Expression Studies . 21 1.3.1 Gene Expression Studies of Schizophrenia and other Psychosis . 22 1.3.2 The rationale of using Blood for Transcriptomic studies in Psychiatry 23 1.4 Machine Learning for Psychosis . 23 1.5 Aims . 25 2 Methods 26 2.1 Datasets . 26 2.1.1 Genetics and Psychosis (GAP) study . 26 2.1.2 Chronic Schizophrenia Data Set . 31 2.2 Bio-informatics Methods . 32 Table of contents vii 2.2.1 Differential Gene Expression (DGE) Analysis . 32 2.2.2 Weighted Gene Co-Expression Network Analysis (WGCNA) . 32 2.2.3 Gene Enrichment Analysis . 33 2.3 Transcriptomic quality control Pipeline . 33 2.3.1 Gene Expression Preprocessing Overview . 34 2.3.2 Background Correction and Normalisation . 34 2.3.3 Network Analysis for Outlier detection . 34 2.3.4 Correcting for technical and other confounding variables . 35 2.3.5 Expression based Sex detection . 36 2.3.6 Probe Selection . 36 2.4 Machine Learning Methods . 37 2.4.1 Data Preparation . 37 2.4.2 Re-sampling Methods . 38 2.4.3 Generalised Linear Models with Regularisation (Glmnet) . 39 2.4.4 K-Nearest neighbour (KNN) . 39 2.4.5 Naive Bayes (NB) . 40 2.4.6 Random Forests (RF) . 40 2.4.7 Support Vector Machines (SVM) . 41 2.4.8 Artificial Neural Networks (ANN) . 41 2.4.9 Machine Learning Ensembles . 42 3 Differential Expression and Network Analysis 43 3.1 Introduction . 43 3.1.1 Aims . 44 3.2 Methods . 44 3.2.1 Gene Expression Data . 44 3.2.2 Linear Models for Microarray Data (LIMMA) . 45 3.2.3 Weighted Gene Co-Expression Network Analysis (WGCNA) . 45 3.2.4 Gene Enrichment Analysis . 45 3.2.5 Module correlation with Psychosis symptoms . 46 3.2.6 Estimating effect of Medication . 46 3.3 Results . 47 3.3.1 Demographics . 47 3.3.2 Differential expression . 47 3.3.3 Enrichment of Differentially Expressed Genes . 54 3.3.4 Weighted Gene Co-Expression Network Analysis Results . 56 3.3.5 Enrichment analysis of WGCNA modules . 56 Table of contents viii 3.3.6 WGCNA module relationship with Symptom Severity . 59 3.3.7 Medication . 64 3.4 Discussion . 64 3.4.1 Differential Expression Pathways associated with Innate Immunity . 64 3.4.2 Modules associated with Psychosis and Psychosis Severity . 65 3.4.3 Effects of Medication . 69 3.4.4 Conclusion . 70 4 Predictive Modelling using the GAP data 72 4.1 Introduction . 72 4.1.1 Aims . 73 4.2 Methods . 73 4.2.1 Gene Expression Data . 73 4.2.2 Polygenic Risk Score (PRS) . 73 4.2.3 Machine Learning Model for Classification . 74 4.2.4 Classification accuracy . 74 4.3 Results . 75 4.3.1 Polygenic Risk Score Imputation . 75 4.3.2 Performance of Classification Models . 75 4.3.3 Classification accuracy across Bootstrap Iterations . 75 4.3.4 Psychosis severity correlated with classification accuracy . 80 4.4 Discussion . 80 4.4.1 Classification Accuracy was highest for Schizophrenia linked Psychosis 80 4.4.2 Polygenic Risk Score provided no improvement in predictive power 83 4.4.3 Schizophrenia and Schizophreniform disorder comparison .