Building Models of Small DNA Control Elements for Prediction Of
BUILDING MODELS OF SMALL DNA CONTROL ELEMENTS FOR PREDICTION OF TRANSCRIPTION FACTOR ACTIVITY

A THESIS SUBMITTED TO THE UNIVERSITY OF MANCHESTER FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE FACULTY OF SCIENCE AND ENGINEERING

2020

By José Luis Hernández Domínguez
School of Computer Science

Contents

Abstract
Declaration
Copyright
Acknowledgements
1 Introduction
    1.1 Motivation
    1.2 Contribution of the thesis
        1.2.1 Contributions breakdown
    1.3 Thesis overview
2 Background information
    2.1 Biological background
        2.1.1 Cells
        2.1.2 Molecular biology central dogma
        2.1.3 Control elements of protein regulation
        2.1.4 Experimental methods for extracting transcription factors interactions
            2.1.4.1 ChIP-chip
            2.1.4.2 ChIP-seq
            2.1.4.3 Homology and orthology
        2.1.5 The regulatory network of transcription factors
    2.2 Networks: connecting the world
        2.2.1 Topological properties of networks
        2.2.2 Complex networks models
        2.2.3 Biological networks
    2.3 Mathematical modelling of the TF regulatory network
    2.4 Transcription Factors basal regulatory network
        2.4.1 Historical background
        2.4.2 Modelling the basal regulatory network
        2.4.3 Interaction network and strength interaction extraction method
        2.4.4 Self-loops and motifs
3 Methodology
    3.1 Chapter overview
    3.2 Complexity reduction
        3.2.1 Streamline of known control processes
        3.2.2 Biological model simplification
    3.3 Building the transcription factor network
        3.3.1 Experimental model database
        3.3.2 Identifying the databases for the TFs basal regulatory network
        3.3.3 Data extraction for the human model
        3.3.4 Creating the TFBRN
        3.3.5 CLIQUE database
        3.3.6 Cancer related databases
        3.3.7 Data extraction for the yeast model
    3.4 Mathematical model
    3.5 Impact of the network
        3.5.1 Standard impact
        3.5.2 Alternative impact
        3.5.3 Eigencentrality
    3.6 Impact of the links
    3.7 Experimental design
4 Exploration of the model
    4.1 Exploration of the RAW model
    4.2 Using the RAW model as measure of centrality
5 Results and discussion of the human model
    5.1 Analysis of the topology of the network
    5.2 Impacts of the network
        5.2.1 Impacts of the network based on the RAW model
        5.2.2 Relationship between topological characteristics, the impacts of the network and VRD
        5.2.3 Impacts and Self-loops
        5.2.4 Correlation by Self-loops
    5.3 Impact of the network using the Prime model
        5.3.1 Correlations of impacts with VRD
            5.3.1.1 Correlation of impacts with VRD: only self-loops
            5.3.1.2 Correlation of impacts with VRD: without self-loops
        5.3.2 Correlation of impacts with VRD: positive self-loops
        5.3.3 Correlation of impacts with VRD: Negative self-loops
    5.4 Self-loops structure experiments
6 Results and discussion of the yeast model
    6.1 Analysis of the topology of the network
    6.2 Impacts of the network
        6.2.1 Impacts of the network based on the RAW model
        6.2.2 Correlation between characteristics and phenotype
        6.2.3 Impact and self-loops
        6.2.4 Correlation by self-loop
    6.3 Impacts of the network using the Prime model
        6.3.1 Impacts and Phenotype
        6.3.2 Impacts and Phenotype - positive self-loops
        6.3.3 Impacts and Phenotype - negative self-loops
    6.4 Self-loops analysis
7 Discussion
    7.1 Main findings
        7.1.1 Yeast main findings and contributions
    7.2 Results comparative with existing research
    7.3 Limitation of the thesis
    7.4 Implications of the study
    7.5 Future work
Bibliography
A Human results
B Yeast results

List of Tables

2.1 Databases analysis
3.1 Databases analysis
5.1 Correlation: Impacts, topological characteristics, and cancer on full network — Human
5.2 Correlation: Impacts, topological characteristics, and Diseases on full network — Human
5.3 Correlation between impacts, topological characteristics and eigencentralities — Human
5.4 Correlation: Impact of the network, topological characteristics, and cancer on self-loops network — Human
5.5 Correlation: Impact of the network, topological characteristics, and diseases on self-loops network — Human
5.6 Correlation: Impact of the network, topological characteristics, and cancer on non-self-loops network — Human
5.7 Correlation: Impacts, topological characteristics, and cancer on non-self-loops network — Human
6.1 Correlation: Impacts, topological characteristics, and Phenotype — Yeast
6.2 Correlation: Impacts and topological characteristics — Yeast
6.3 Correlation: Impacts and topological characteristics with self-loops — Yeast
6.4 Correlation: Impacts and topological characteristics without self-loops — Yeast

List of Figures

2.1 Structure of the cell
2.2 Molecular biology central dogma as proposed by Crick
2.3 Simplified model of the molecular biology dogma
2.4 Gene structure
2.5 Transcription process
2.6 Translation process
2.7 DNA compression due to histones modification
2.8 Reference of one ChIP-seq process part
2.9 Simplified example of a genome track
2.10 Transcription factors regulatory network
2.11 Basal regulatory network
2.12 Elements of a network
2.13 Random network
2.14 Small-world network
2.15 Scale-free network
3.1 Complexity reduction of the model
3.2 Transcription factor concentration control flow
3.3 File output from the extraction of each database
3.4 Creation of the adjacency matrix
3.5 CLIQUE disease-gene matching
3.6 Flow of a network
4.1 All subnetworks and networks for the analysis
4.2 Uniform single value model analysis
4.3 Randomised initial condition for the model analysis
4.4 Randomised initial conditions of the human subnetwork run 1,000 times
4.5 Randomised initial conditions of the human random network run 1,000 times
4.6 Randomised initial conditions of the yeast subnetwork run 1,000 times
4.7 Randomised initial conditions of the yeast random network run 1,000 times
4.8 Example of the impact of two TFs in the subnetwork
4.9 Example of the impact difference of two TFs in the subnetwork
4.10 Comparison of the different centrality measures of the human subnetwork
4.11 Comparison of the different centrality measures of the human random network
4.12 Comparison of the different centrality measures of the yeast subnetwork
4.13 Comparison of the different centrality measures of the yeast random network
5.1 Network representation — Human
5.2 In-degree distribution — Human
5.3 Out-degree distribution — Human
5.4 Sorted standard impact, full network — Human
5.5 Sorted alternative impact, full network — Human
5.6 Sorted standard impact, full network self-loops highlighted — Human
5.7 Sorted alternative impact, full network self-loops highlighted — Human
5.8 Self-loops distribution — Human
5.9 Standard Impact on full network — Human
5.10 Alternative Impact on full network — Human
5.11 Standard Impact on TFs with self-loops — Human
5.12 Alternative Impact on TFs with self-loops — Human
5.13 Standard Impact on TFs without self-loops — Human ...