This Thesis Has Been Submitted in Fulfilment of the Requirements for a Postgraduate Degree (E.G
Total Page:16
File Type:pdf, Size:1020Kb
This thesis has been submitted in fulfilment of the requirements for a postgraduate degree (e.g. PhD, MPhil, DClinPsychol) at the University of Edinburgh. Please note the following terms and conditions of use: This work is protected by copyright and other intellectual property rights, which are retained by the thesis author, unless otherwise stated. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given. A Systems Biological Approach to Parkinson’s Disease Katharina Friedlinde Heil I V N E R U S E I T H Y T O H F G E R D I N B U Doctor of Philosophy Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh 2017 Abstract Parkinson’s Disease (PD) is the second most common neurodegenerative disease in the Western world. It shows a high degree of genetic and phenotypic complexity with many implicated factors, various disease manifestations but few clear causal links. Ongoing research has identified a growing number of molecular alterations linked to the disease. Dopaminergic neurons in the substantia nigra, specifically their synapses, are the key-affected region in PD. Therefore, this work focuses on understanding the disease effects on the synapse, aiming to identify potential genetic triggers and synaptic PD associated mechanisms. Currently, one of the main challenges in this area is data quality and accessibility. In order to study PD, publicly available data were systematically retrieved and analysed. 418 PD associated genes could be identified, based on mutations and curated annotations. I curated an up-to-date and complete synaptic proteome map containing a total of 6,706 proteins. Region specific datasets describing the presynapse, postsynapse and synaptosome were also delimited. These datasets were analysed, investigating similarities and differences, including reproducibility and functional interpretations. The use of Protein-Protein-Interaction Network (PPIN) analysis was chosen to gain deeper knowledge regarding specific effects of PD on the synapse. Thus I generated a customised, filtered, human specific Protein-Protein Interaction (PPI) dataset, con- taining 211,824 direct interactions, from four public databases. Proteomics data and PPI information allowed the construction of PPINs. These were analysed and a set of low level statistics, including modularity, clustering coefficient and node degree, explaining the network’s topology from a mathematical point of view were obtained. Apart from low-level network statistics, high-level topology of the PPINs was stud- ied. To identify functional network subgroups, different clustering algorithms were investigated. In the context of biological networks, the underlying hypothesis is that proteins in a structural community are more likely to share common functions. There- fore I attempted to identify PD enriched communities of synaptic proteins. Once iden- tified, they were compared amongst each other. Three community clusters could be identified as containing largely overlapping gene sets. These contain 24 PD associ- ated genes. Apart from the known disease associated genes in these communities, a total of 322 genes was identified. Each of the three clusters is specifically enriched for specific biological processes and cellular components, which include neurotransmitter secretion, positive regulation of synapse assembly, pre- and post-synaptic membrane, iii scaffolding proteins, neuromuscular junction development and complement activation (classical pathway) amongst others. The presented approach combined a curated set of PD associated genes, filtered PPI information and synaptic proteomes. Various small- and large-scale analytical approaches, including PPIN topology analysis, clustering algorithms and enrichment studies identified highly PD affected synaptic proteins and subregions. Specific disease associated functions confirmed known research insights and allowed me to propose a new list of so far unknown potential disease associated genes. Due to the open design, this approach can be used to answer similar research questions regarding other complex diseases amongst others. iv Lay Summary Parkinson’s Disease is a brain disease with extreme consequences for patients, their families and carers. Treatment only moderates the symptoms and the number of pa- tients is growing on a daily basis. Many research projects identified dysfunctioning intracellular processes mainly lo- cated in a specific part of brain cells. This part, the synapse, is in charge of transport- ing information and its dysfunction leads to known disease symptoms such as tremor, shuffling gait and less known non-motor symptoms such as enhanced sweating. Parkin- son’s can currently only be diagnosed at a stage when brain-cells are dying, making it very hard to treat the disease effectively. Another challenge are the very individual symptoms the disease provokes in patients. A number of dysfunctions are known to appear in the brain cells of patients, but not all of them can be found in all individuals. Therefore this thesis aims towards gaining better understanding of specific disease causes. New knowledge could then help to develop better treatment or even a disease cure. To work towards this aim different systems biological analytical steps were car- ried out. 418 genes which have shown to be affected in Parkinson’s Disease patients were identified. The synapse was analysed and around 6,500 genes were identified in this brain-cell region. To understand the disease influence on the synapse, so called large-scale approaches are required. Protein-Protein-Interaction Networks were used to analyse how proteins interact and allow to identify gene groups which are in charge of specific synaptic functions. Parkinson’s Disease associated genes could be located in the network. By doing so three gene groups with an unexpected, significantly high number of disease associated genes were identified. Apart from the disease genes these contained a set of other genes which were analysed in-depth. It was possible to determine their over- all function which is affected under disease conditions. Amongst others the release of neurotransmitters, the main component of information exchange between brain cells as well as structural aspects, guaranteeing protein interactions and their full functionality could be identified. The set of around 150 specific genes can now be used to i) set up more targeted experiments, ii) help to identify different disease types and iii) develop new treatments. Overall it would not have been possible to obtain these results without the use of large- scale analytical approaches. Hence this work highlights their potential and promising application in the research field of complex diseases. v Acknowledgements It’s been four years since I embarked on this journey and many things have happened, changed, moved forward, and developed. It was not always easy and often felt like a ride on a roller-coaster - during a day, a week, a month, and just generally over the whole four years. Many moments and experiences helped me to keep going, but it was really the people I encountered on this journey who helped the most. Some accompanied me over the entire time, others in specific moments, and everybody has left an imprint on my PhD experience and shaped me - to be the person I am today. Therefore I owe all of you a big, big thank you! Said that, there are a number of people I would like to thank in particular. Douglas Armstrong, you’ve been my supervisor at the University of Edinburgh. Your vision of the scientific future encouraged and motivated me. Thinking about the potential impact my/our research can have has helped me through many moments of struggle, specifi- cally, when the “simple things” proved to be more challenging than expected. Oksana Sorokina, you have accompanied me, especially through the first years of my stud- ies. Thank you for the introduction to network biology and kappa modelling. Colin Mclean, our chats about the concepts and maths behind networks were interesting and helpful and I am also specifically grateful about all your advice during the writing up stage. It has been great to get access to some of your code, learn from it, and use it to answer similar, but different, research questions. Your patience was highly appre- ciated and helped me to grow my knowledge in theoretical network analysis. Thank you David Sterratt for letting me work with you over many, many months. Having a joint project showed me how active exchange can help to solve problems, advance together and motivate each other. I am grateful to have been part of the “review-paper team” and look forward to the day when we submit the work for publication. Emilia Wysocka, you have also been part of that team and your critical thoughts and questions often helped me to rethink concepts that I might have taken for granted. Our discus- sions highly stimulated my thinking process and I hope that I did not distract you too often from your project while asking questions that were slightly off track. Grant Robertson, it has been very valuable to share my research with you. I’ve been uplifted by the constructive conversations that could often shed light over my questions. It was also good to see, that work was not always completely focused on my research topic. I am also very thankful to my second supervisor, Jeanette Hellgren Kotaleski at KTH in Stockholm, who, together with Olivia Eriksson, always found time to listen and give helpful advice. It was great to be welcomed by you and the group during my vii visits at KTH and SciLifeLab - two places that got to be my second research home.