Application of a Novel Triclustering Method in Analyzing Three Dimensional Transcriptomics Data
Dissertation for the award of the degree "Doctor of Philosophy" Ph.D. Division of Mathematics and Natural Sciences of the Georg-August-University, Goettingen, within the doctoral Program for Environmental Informatics (PEI) of the Georg-August University School of Science (GAUSS)

submitted by Anirban Bhar from Rishra, India

Goettingen, 2015

Thesis committee:
Prof. Dr. Edgar Wingender, Institute of Bioinformatics, University Medical Center, Georg-August University, Goettingen, Germany
Prof. Dr. Stephan Waack, Theory and Algorithmic Methods, Institute of Computer Science, Georg-August University, Goettingen, Germany

Members of the examination board:
Referee: Prof. Dr. Edgar Wingender, Institute of Bioinformatics, University Medical Center, Georg-August University, Goettingen, Germany
Co-referee: Prof. Dr. Stephan Waack, Theory and Algorithmic Methods, Institute of Computer Science, Georg-August University, Goettingen, Germany

Other members of the examination board:
Prof. Dr. Burkhard Morgenstern, Institute of Microbiology and Genetics, Department of Bioinformatics, Georg-August University, Goettingen, Germany
Prof. Dr. Anita Schoebel, Institute for Numerical and Applied Mathematics, Georg-August University, Goettingen, Germany
Prof. Dr. Tim Beissbarth, Department of Medical Statistics, University Medical Center, Georg-August University, Goettingen, Germany
Prof. Dr. Dieter Hogrefe, Institute of Computer Science, Georg-August University, Goettingen, Germany

Date of the oral examination: 24th March, 2015.

I hereby declare that I prepared the PhD thesis entitled "Application of A Novel Triclustering Method in Analyzing Three Dimensional Transcriptomics Data" on my own and with no other sources and aids than quoted.

Anirban Bhar

Dedicated to my family...

Contents

1 Introduction
  1.1 Central Dogma of Molecular Biology
    1.1.1 DNA
    1.1.2 RNA
    1.1.3 Protein
    1.1.4 Transcription
    1.1.5 Translation
  1.2 Microarray Technology
    1.2.1 cDNA Microarray
    1.2.2 Oligonucleotide Microarray
  1.3 Computational Analysis of Microarray Gene Expression Data
    1.3.1 Differential Expression Analysis
    1.3.2 Machine Learning in Mining Microarray Gene Expression Data
    1.3.3 Co-expression and Cluster Analysis
  1.4 Structure of the Thesis
  1.5 Bibliography

2 A New Triclustering Approach for Unveiling Biological Processes of Disease Progression from Gene Expression Profiles
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Materials
    2.2.2 Methods: δ-TRIMAX
  2.3 Results and Discussion
    2.3.1 Results on Simulated Dataset
    2.3.2 Results on Real-life Dataset
    2.3.3 Biological Significance
  2.4 Conclusion
  2.5 Bibliography

3 Enhanced Multi-objective Triclustering Based on a Genetic Algorithm and Its Application in Revealing Biological Processes of Development
  3.1 Introduction
  3.2 Summary of δ-TRIMAX
    3.2.1 Aim of δ-TRIMAX
    3.2.2 Pitfalls of δ-TRIMAX
  3.3 Materials and Methods
    3.3.1 Materials
    3.3.2 EMOA-δ-TRIMAX
    3.3.3 Convergence of Solutions
  3.4 Results and Discussions
    3.4.1 Artificial Dataset
    3.4.2 Real-life Datasets
    3.4.3 Performance Comparison
    3.4.4 Identifying Key Genes of Triclusters and Analyzing Their Roles During hiPSC Differentiation into Cardiomyocytes
  3.5 Conclusion
  3.6 Bibliography
  3.7 Appendix

4 Speculating about the Role of ZEB2 During Stem Cell Differentiation into Cardiomyocytes
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Dataset
    4.2.2 Methods
  4.3 Results and Discussion
    4.3.1 SCCs of Module 1
    4.3.2 SCCs of Module 2
    4.3.3 SCCs of Module 3
    4.3.4 SCCs of Module 4
    4.3.5 Elucidating the Roles of ZEB2 and SMADs Transcription Factors During Cardiac Development
  4.4 Conclusion
  4.5 Bibliography

5 Co-regulation Analysis of Time Series Transcriptomics Data Unveils the Roles of Three Node Feed-Forward Loops in Regulating Genes of Signaling Pathways of Breast Cancer Progression
  5.1 Introduction
  5.2 Materials and Method
    5.2.1 Dataset
    5.2.2 Methods
  5.3 Results and Discussion
  5.4 Conclusion
  5.5 Bibliography
  5.6 Appendix

6 Unraveling Potential Signaling Pathways During the Exposure of Several Tissues to Different Toxicants in Different Species
  6.1 Introduction
  6.2 Materials and Methods
    6.2.1 Dataset 1
    6.2.2 Dataset 2
    6.2.3 Dataset 3
    6.2.4 Dataset 4
    6.2.5 Workflow
  6.3 Results and Discussion
    6.3.1 Results on Dataset 1
    6.3.2 Results on Dataset 2
    6.3.3 Results on Dataset 3
    6.3.4 Results on Dataset 4
  6.4 Conclusion
  6.5 Bibliography
  6.6 Appendix

7 Elucidating the Importance of Clustering Replicates in Three Dimensional Microarray Gene Expression Data
  7.1 Introduction
  7.2 Materials and Methods
    7.2.1 Dataset 1 (Accession no.- GSE11324)
    7.2.2 Dataset 2 (Accession Number- GSE35671)
    7.2.3 Dataset 3 (Accession Number- GSE46280)
    7.2.4 Dataset 4 (Accession Number- GSE17693)
    7.2.5 Dataset 5 (Accession Number- GSE17933)
    7.2.6 Dataset 6 (Accession Number- GSE18858)
    7.2.7 Dataset 7 (Accession Number- GSE38513)
    7.2.8 Workflow
  7.3 Results and Discussion
  7.4 Conclusion