A Bioinformatic Analysis of the Role of Mitochondrial Biogenesis in Human Pathologies
Total Page:16
File Type:pdf, Size:1020Kb
A bioinformatic analysis of the role of mitochondrial biogenesis in human pathologies Robert Bentham A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of University College London. Department of Cell and Developmental Biology University College London Monday 11th July, 2016 Declaration I, Robert Bentham, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the work. July, 2016 Robert Bentham 2 Abstract Disease states are often associated with radical rearrangements of cellular metabolism; suggesting the transcriptome underlying these changes follows a distinctive pattern. Identification of these patterns is complicated by the hugely heterogeneous nature of these diseases, such as cancer, and the patterns remain hidden within noise of large datasets. A new biclustering algorithm called Massively Correlating Biclustering (MCbiclust) was developed to identify these patterns. Taking a large gene set such as those known to be associated with the mitochondria, samples are selected in which these genes are highly correlated. Rigorous benchmarking of this method with other biclustering methods on synthetic gene expression data and an E. coli data set show it to be superior in finding these patterns. This method was used to identify the role mitochondrial biogenesis plays in cancer; applied on the Cancer Cell Line Encyclopedia (CCLE) it identified differences in mitochondrial function based on the different tissue of origin of the cell line. In patient breast tumour samples a change in mitochondrial function was identified and linked to differences in known breast cancer subtypes. Breast cancer cell lines were identified that matched this pattern. Experimen- tally testing these cell lines confirming the significant difference in gene expression expected and also showed significant changes in mitochondrial function demonstrated by measurements in oxygen consumption, proteomics and metabolomics. MCbiclust has been developed into an R package. Using this method, new cancer subtypes can be identified, based on fundamental changes to known pathways. The benefit is twofold: first to increase understanding of these complex systems and second to guide treatment using drug compounds known to target these pathways. The methods described here while applied to cancer and mitochondria, are versatile and can be applied to any large dataset of gene expression measurements. 3 Acknowledgements First, I would like to express my utmost thanks to my supervisors Professor Gyorgy Szabadkai and Dr. Kevin Bryson; without their support, guidance, expert knowledge, kindness, and access to the Department of Computer Science coffee machine, this work could have never been completed. I would like to thank the British Heart Foundation for funding my research and giving me the financial backing that I vitally needed. I would like to thank Professor Michael Duchen and everyone involved with the Szabadkai Lab both past and present: Drs. Jose Vicencio, Zhi Yao, Ronan Astin, Will Kotiadis, Thomas Blacker and Nicoletta Plotegher for their patience in helping me with experimental techniques and welcoming me to the lab. I would also like to thank my fellow PhD students: Julia Hill, Jenny Sharpe, Pedro Dias, Stephanie Sundier, Neta Amior and Gauri Bhosale, all of whom helped me enormously and better than that made the entire experience fun! I would also like to thank Sam Ranasinghe for his work in maintaining much of the equipment in the lab and teaching me how to use the microscopes. I would like to thank everyone involved in CoMPLEX, a department whose exis- tence made possible my transition from mathematics to biological research, and without which I certainly would never have embarked on this work. I thank my fellow Szabadkai lab PhD student Michella Menegollo at the University of Padova who greatly contributed to the experimental work of this project, and Dr. Mariia Yuneva of the Crick Institute for her collaboration and help on this project. Finally, I would have never been able to complete this huge undertaking without the constant support of my family and friends. 4 List of my publications The following publications were produced during my PhD but not related to the topic of this thesis: Astin, R., Bentham, R., Djafarzadeh, S., Horscroft, J. A., Kuc, R. E., Leung, P. S., Skipworth, J. R., Vicencio, J. M., Davenport, A. P., Murray, A. J. et al. (2013), ‘No evidence for a local renin-angiotensin system in liver mitochondria’, Scientific reports 3. Tosatto, A., Sommaggio, R., Kummerow, C., Bentham, R. B., Blacker, T. S., Berecz, T., Duchen, M. R., Rosato, A., Bogeski, I., Szabadkai, G. et al. (2016), ‘The mito- chondrial calcium uniporter regulates breast cancer progression via hif-1a’, EMBO molecular medicine 8(5), 569–585. 5 Contents Declaration 2 Abstract 3 Acknowledgements 4 List of my publications 5 List of Figures 10 List of Tables 13 Abbreviations 15 1 Introduction 19 1.1 Mitochondria . 19 1.1.1 The basics of mitochondrial function . 19 1.1.2 The role of mitochondria in apoptosis and their evolutionary history . 21 1.2 Mitochondrial heterogeneity . 23 1.2.1 The mitochondrial proteome . 23 1.2.2 Variation across tissues and in disease . 25 1.3 Mechanisms of regulation of the mitochondria . 27 1.3.1 Epigenetics . 28 1.3.2 Mitochondrial degradation, quality control and turnover . 29 1.3.3 Mitochondrial biogenesis . 34 6 1.3.4 The transcription factor network underlying mitochondria bio- genesis . 38 1.4 Mitochondria and disease . 52 1.4.1 Cancer . 52 1.4.2 Heart disease . 56 1.4.3 Neurodegeneration, diabetes and ageing . 57 1.5 Investigating the regulation of mitochondria . 60 1.5.1 Experimental methods . 60 1.5.2 Bioinformatics . 61 1.6 Overview and aims of thesis . 68 2 A novel biclustering algorithm 70 2.1 Introduction . 70 2.2 Massively correlated biclustering (MCbiclust) . 74 2.2.1 Defining a method of measuring bicluster quality . 75 2.2.2 A stochastic greedy search for biclusters . 78 2.2.3 Pruning the bicluster . 79 2.2.4 Extending the bicluster . 80 2.2.5 Analysing the bicluster . 81 2.2.6 Thresholding the bicluster . 83 2.2.7 Methods for dealing with multiple runs . 84 2.3 Benchmarking of massively correlated biclustering on a simulated dataset 87 2.3.1 Generation of artificial data . 87 2.3.2 Means of comparison between different biclustering methods . 89 2.3.3 Biclustering methods . 92 2.3.4 Comparison of different biclustering methods . 95 2.4 Case study: Escherichia coli expression data . 98 2.4.1 Rationale . 98 2.4.2 Finding the number of distinct biclusters . 99 2.4.3 Analysis of different bicluster patterns . 101 2.4.4 Analysis of random probe sets . 105 2.5 Conclusion . 107 7 3 Bioinformatic analysis of mitochondrial biogenesis in disease 109 3.1 Introduction . 109 3.1.1 Hypertrophic Cardiomyopathy (HCM) . 110 3.1.2 Cancer cell lines . 112 3.2 Bioinformatic analysis of mitochondrial biogenesis in hypertrophic cardiomyopathy . 113 3.2.1 The data . 113 3.2.2 Silhouette plots and ranking the samples . 114 3.2.3 Comparing the biclusters . 116 3.3 Bioinformatic analysis of mitochondrial biogenesis in cancer cell lines . 124 3.3.1 The data . 124 3.3.2 Silhouette plots and comparison . 124 3.3.3 Understanding the biclusters . 125 3.3.4 Copy number differences . 129 3.3.5 Pharmacology differences . 133 3.4 Conclusion . 136 4 Bioinformatic analysis of mitochondrial biogenesis in breast cancer 140 4.1 Introduction . 140 4.1.1 Breast cancer . 140 4.1.2 Intrinsic subtypes of breast cancer . 143 4.1.3 Examining mitochondrial biogenesis in breast cancer . 147 4.2 Bioinformatic analysis of a breast cancer sample dataset . 147 4.2.1 Using a new gene set . 147 4.2.2 The data . 150 4.2.3 Finding a mitochondrial related bicluster in a breast cancer dataset150 4.2.4 Mutational alterations behind the bicluster . 156 4.3 Identification of a similar bicluster in a breast cancer cell line dataset . 160 4.3.1 The data . 160 4.3.2 Point Scoring algorithm . 161 4.3.3 Selecting breast cancer cell lines . 162 8 4.4 Experimental study of mitochondrial function in different breast cancer cell lines . 164 4.4.1 Methodology . 164 4.4.2 Results . 170 4.5 Conclusion . 177 5 Conclusions 181 Bibliography 185 Appendices 221 A MCbiclust - an R package for massively correlated biclustering 221 A.1 About . 221 A.2 Installation . 221 A.3 Example workflow . 222 B Gene set enrichment result tables 227 C Nanostring gene set 270 D Materials 274 9 List of Figures 1.1 The basic stucture of a mitochondrion . 20 1.2 Oxidative phoshporylation (OXPHOS) system and citric acid cycle within the mitochondrion. 22 1.3 Variation of protein abundance of mitochondrial genes. 25 1.4 Methods of quality control . 30 1.5 Simplified overview of the mitochondrial biogenesis transcription factor network . 38 1.6 Regulation of peroxisome proliferator-activated receptor gamma coacti- vator 1-a (PGC-1a)...........................46 1.7 Mitochondrial dysfunction in the hallmarks of ageing and cancer . 53 1.8 The Warburg effect . 54 1.9 An RNA sequencing (RNA-seq) experiment . 63 2.1 Two models of mitochondrial biogenesis in gene expression data . 71 2.2 Different types of biclusters . 72 2.3 Work pipeline of Massively Correlating Biclustering (MCbiclust) for a single run . 76 2.4 Work pipeline of MCbiclust for multiple runs . 77 2.5 A visual explanation of silhouette widths . 86 2.6 Pipeline used to compare different biclustering algorithms on the syn- thetic data . 91 2.7 Jaccard index matrix from two different discovered MCbiclust patterns compared to the same synthetic bicluster . 94 2.8 Principal component plots from synthetic data results.