OMICS DATA EXPLORATION: ACROSS SCALES AND DIMENSIONS

by

Gang Su

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy () in the University of Michigan 2013

Doctoral Committee:
Professor Brian D. Athey, Co-Chair
Research Associate Professor Fan Meng, Co-Chair
Professor Charles F. Burant
Associate Research Scientist Barbara Mirel
Assistant Professor Maureen Sartor

DEDICATION

I dedicate this thesis especially to my grandparents. My grandfather grew up in a shattered nation and survived countless brutal battles through WWII, but he never lost his hope and discipline, and has lived happily even to this day. Whenever I talk with him, I realize that my peaceful life today is nothing but a gift from the great sacrifices the older generations have made – and that I should strive to make my own contributions in this new era and to build something better for the future. He wished that his grandson could someday reach the summit of knowledge on behalf of the family, and now I am proudly fulfilling his wish. My grandmother loved her family all her life, and has loved me every second since my first day.

I also dedicate this work to my parents; it is with their persistent love and care that I had a carefree and beautiful childhood. My father was the inspiration for me to go out and see the world. As a sailor in his early years, his footprints covered most of the globe, leaving only Europe and North America uncovered due to the political segregation back then. I am now completing the remaining pieces of this big puzzle, and hopefully I will have much more to pass on in the future. My mother developed my interest in reading, drawing and later in science. Her devotion and love for the family made me strong and resilient.

And to my friends – I have always believed that those who surround us define us. I am so grateful for the great experiences we have shared; it is beyond words.

Last but not least, I dedicate this work to dear Yoyo: my great friend, my career partner, and my life-long companion. She gave me a feeling of completeness and made me a better individual every day. There is so much more to explore ahead of us in the writing of our stories together. Get ready; a new chapter is right ahead.

ACKNOWLEDGEMENTS

I would like to first acknowledge my advisor, Dr. Fan Meng, for years of patient guidance and support through the course of my PhD. I also sincerely thank Dr. Brian Athey and Dr. Charles Burant for rigorous and insightful academic advising, Dr. Barbara Mirel for expert knowledge on user-interface design and workflows, and Dr. Maureen Sartor for statistics and the structuring of this thesis. I could not have come this far without this excellent interdisciplinary committee. I would also like to thank Dr. David States and Dr. Steve Qin for their rotation advising early in my PhD. Last but not least, I would like to especially thank Dr. Margit Burmeister. She has been a great mentor and a great friend, and has offered me enormous help in both research and life.

I would also like to acknowledge my friends. Ann Arbor has become a second hometown to me, and my life is tied to the people here. The PhD did take some time, and I have met several batches of friends along the way. Their support and companionship gave me ideas, happiness, and many memorable moments. I sincerely thank you all, for everything.

TABLE OF CONTENTS

Dedication ...... ii

Acknowledgements ...... iii

List of Figures ...... vi

List of Tables ...... viii

Abstract ...... ix

CHAPTER 1 INTRODUCTION ...... 1
1.1 The Omics Landscape ...... 3
1.2 Analytical Methods for Omics Datasets ...... 7
1.3 Visual Exploration of Omics Datasets ...... 9
1.4 Bioinformatics Challenges in the Omics Era ...... 9
1.5 My Thesis Work ...... 15
1.6 Organization of this Thesis ...... 16
1.7 Professional Experience ...... 16

CHAPTER 2 TOOLS FOR EXPLORATION OF OMICS NETWORKS ...... 18
2.1 Introduction ...... 18
2.2 GLay and igraph ...... 20
2.3 Comparison of Different Community Structures ...... 26
2.4 GSearcher ...... 30
2.5 Other Related Tools ...... 37
2.6 Summary ...... 38

CHAPTER 3 CROSS-OMICS KNOWLEDGE-MINING ...... 46
3.1 Introduction ...... 46
3.2 The Analysis of the NCI-60 Dataset ...... 49
3.3 Evaluation of Robust Correlations ...... 57
3.4 Materials and Methods ...... 59
3.5 Results ...... 63
3.6 Discussion ...... 66

CHAPTER 4 COOLMAP FOR INTERACTIVE EXPLORATION OF OMICS DATA ...... 74
4.1 Introduction ...... 74
4.2 The Origins of Key Design Concepts ...... 75
4.3 A Brief History of Matrix Visualization ...... 76
4.4 The Overall Design of CoolMap ...... 79
4.5 Application of CoolMap to Exploratory Data Analysis ...... 82
4.6 Other Applications ...... 86
4.7 Summary ...... 87

CHAPTER 5 CONCLUSION ...... 99

APPENDIX ...... 105

CoolMap Implementation Details ...... 105

BIBLIOGRAPHY ...... 120

LIST OF FIGURES

Figure 2-1 GLay result illustrations...... 41

Figure 2-2 Comparison of GLay and MCODE results...... 42

Figure 2-3 Robustness test on community algorithms...... 43

Figure 2-4 GSearcher result illustration...... 44

Figure 2-5 Node Filter result illustration...... 45

Figure 3-1 Estimated Out-of-Bag (OOB) Error with regard to progressive cancer class removal...... 69

Figure 3-2 Aggregated Heatmap view of gene-metabolite relationship organized according to the KEGG pathway...... 70

Figure 3-3 Outlier Analysis...... 71

Figure 3-4 Illustration of bivariate outliers...... 72

Figure 3-5 Simulated data structure with artificially added outliers...... 73

Figure 4-1 The basic concept diagram of CoolMap...... 89

Figure 4-2 CoolMap screenshot...... 90

Figure 4-3 Quality control of the GDS3678 dataset from figure 3 in the original paper...... 91

Figure 4-4 Illustration of elevated expression of immune related genes in Saturated Fatty Acid (SFA) diet set...... 92

Figure 4-5 Condense the raw data into methylation / metabolite groups...... 93

Figure 4-6 Multi-level exploration of highly correlated mother-child pairs...... 94

Figure 4-7 Using CoolMap for interactive knowledge discovery...... 95

Figure 4-8 CoolMap data quality inspection. We can quickly identify CoolMap regions with missing values or other peculiarities. Top: the color scale from yellow to orange is mapped from 0.0 – 1087.92. Note that the center column, glutamine, has values much higher than the other metabolites. Bottom: adjusting the color mapping so that 0.0 – 100.0 maps to yellow – orange reveals more details in the low-value regions...... 96

Figure 4-9 Illustration of CoolMap on unpublished methylation data from Maureen ...... 97

Figure 4-10 Illustration of using CoolMap for sequence analysis ...... 98

Figure 5-1 Conceptual extension of next generation genome browser from CoolMap...... 104

LIST OF TABLES

Table 2-1 Community Algorithm performance comparison...... 39

Table 2-2 KEGG pathway and GO biological process (BP) enrichment for communities. . 40

Table 3-1 Top 10 highly associated compound-compound pairs...... 68

Table 3-2 Top 10 highly associated gene-compound pairs...... 68

Table 4-1 Feature comparison of CoolMap with some other Tools ...... 88

ABSTRACT

The rapid development and adoption of high-throughput technologies has led to an avalanche of omics data, including those from the genome, transcriptome, proteome and metabolome, from individual laboratories as well as global-scale collaborative efforts. The major ensuing challenge is how to analyze, explore and extract new biomedical knowledge from such omics datasets. This thesis attempts to address some of these challenges by 1) developing novel tools for flexibly searching, clustering and visualizing omics networks and pathways; 2) developing novel robust statistical workflows to identify confident associations that lead to the discovery of new cell-line-specific bio-signatures from NCI-60 omics datasets with high variability and missing measurements; and, most notably, 3) conceiving and developing a novel visual data exploration model, CoolMap, to bring multi-scale, versatile and flexible visual data mining capabilities to structured two-dimensional omics datasets. CoolMap's unique capabilities were demonstrated through several use cases, including a mother-child nutrient/epigenetics study; it enables efficient and flexible identification of strongly correlated high-level ontological concepts as well as low-level specific measurements for data-driven hypothesis generation.

Chapter 1 Introduction

I would like to begin my thesis by paraphrasing an Indian allegory [i]:

A merchant brought an elephant to the Emperor. Knowing that no one had ever seen this creature before, the Emperor summoned the ten most knowledgeable people to his palace and put them to a test. He blindfolded them, sent them to touch the elephant, and then asked them to tell him what they thought the elephant looked like. The first person grabbed the leg of the elephant and said the elephant must have the shape of a pillar. The second person held the trunk of the elephant and stated that the elephant felt like a snake. The third person touched the ear of the elephant and told the Emperor that the elephant was, interestingly, flat like a giant tropical leaf. The last person got hold of the tail of the elephant and claimed that he was holding a snake. The Emperor then ordered them to take off the cloth around their eyes and said: if you draw your conclusions from limited observations, you will miss the big picture. We must analyze the problem as a whole.

What does this allegory tell us? Even about two thousand years ago, people in Asia already understood the importance of studying a problem in its entirety, or, in other words, in a 'systems' manner. This requires the following practices: first, compile a dataset that contains information as complete and detailed as possible; second, analyze the problem in each information domain, identify intra-domain associations, and make domain-specific inferences; and, most importantly, build cross-domain relationships and generate a comprehensive, integrated, systems-level conclusion. The last step produces a new piece of knowledge that can be used to create solutions for existing challenges or to answer new questions.

For sciences that are heavily reliant on measurements, such as physics, chemistry and biology, a major obstacle to analyzing a particular matter from the systems view has been how quickly, accurately and affordably raw data can be obtained. Taking DNA as an example, the initial cost of obtaining raw sequence data could be as high as $10 for 1 base pair (bp) in 1990 [ii], not to mention the cost of data storage and analysis. Therefore, in the pre-high-throughput era, when obtaining large amounts of measurement data was impractical, researchers focused on specific and manageable research problems: often only a handful of genes or proteins were studied in a well-controlled study. Many key molecules were identified and carefully studied, such as the tumor suppressor gene p53 and the transcription modulator gene NF-κB. However, for processes that involve an ensemble of factors, such as obesity, hair loss or cancer, it is necessary to study 'snapshots' of the whole biological system to avoid being 'blindfolded'. While reductionist approaches predominantly drove the major progress in biomedical research in the last century, holistic interpretation and understanding of the data are becoming increasingly necessary for complex processes.

[i] For more details: http://en.wikipedia.org/wiki/Blind_men_and_an_elephant
[ii] http://www.singularity.com/charts/page73.html

With the rapid development in chemistry, biomedical engineering and information sciences, the world has entered an era of information explosion. The Human Genome Project reached its first milestone in 2001, when the first draft genome, containing 3 gigabases (Gb), was released under an international collaborative effort. The cost of sequencing dropped to about $10k per megabase (Mb) and then to as low as a quarter per Mb in 2010. With the arrival of Next Generation Sequencing (NGS), it is now possible to sequence a human genome at 30X coverage in less than 10 days for less than $10k1. Meanwhile, more cost-effective, efficient and precise oligonucleotide-based arrays gradually replaced traditional blotting technology in high-throughput experiments. Molecular signatures were discovered that not only reflected pivotal elements in disease processes, but could also be used as phenotypic classifiers for pathological diagnostics. In this context, the term 'omics', referring to the study of "all constituents considered collectively" in a systems view, has become increasingly popular. The most frequently referenced are genomics, transcriptomics, proteomics and metabolomics, along with integrated omics analysis2.


1.1 The Omics Landscape

Genomics is the systematic elucidation of a target organism's genome3. It entails the determination of DNA sequences and the characterization of the structure and function of genomic elements. With the latest breakthroughs in sequencing technology, there are currently about 60 fully sequenced genomes of multicellular eukaryotic organisms available in the Ensembl database since the first draft completion of the Human Genome Project [iii]. The availability of assembled genomes provides the critical scaffolds needed for a wide variety of biomedical research, such as understanding the evolution of genes and species and identifying differences between individuals (polymorphisms). With the continuously decreasing cost of sequencing, personal genomes are becoming affordable to the general public, and personalized medicine is expected to deliver more cost-effective treatments4–7. Genomics data are deposited in several major repositories, such as UCSC8,9, NCBI10 and Ensembl11. Although genomics research has clarified many biological problems, static DNA sequences cannot accurately represent the dynamic metabolic and physiological state of an organism. Therefore, genomics is often studied along with other, downstream omics datasets.

[iii] Ensembl: http://useast.ensembl.org/index.html

Transcriptomics is the quantitative study of the transcriptome, which consists of the complete set of transcripts in a cell or tissue, including the mRNAs that are directly used as templates for protein synthesis and regulatory RNAs such as non-coding RNAs and small RNAs12. The term was first proposed by Charles Auffray and was one of the early members of the omics family13. The transcript expression profile is directly associated with developmental, physiological and pathological processes14.

A variety of technologies have been developed and applied to transcriptomics research, such as the hybridization-based microarrays offered by Affymetrix [iv] and Illumina [v]. Microarray technology remarkably facilitated the holistic analysis of the transcriptome. The recently developed RNA-Seq technology, considered the successor of microarrays, has demonstrated the capability of accurately measuring transcription activity digitally at single-base resolution, in a high-throughput manner, with high dynamic range, high reproducibility and sensitivity, and at relatively low cost13,14. These advances offer new holistic and detailed insights into the transcriptome. Bioinformatics tools supporting RNA-Seq, such as those for designing RNA-Seq experiments, rapid short-read alignment, detecting splice junctions, identifying differentially expressed genes, or visualizing the transcriptome, are also publicly available15–19. The plethora of publicly available transcriptomics data, such as those deposited in public repositories such as NCBI GEO20 and the European Bioinformatics Institute (EBI)21,22, has also stimulated the rapid development of computational and statistical methods that can be immediately applied to other omics datasets.

[iv] http://www.affymetrix.com/
[v] http://www.illumina.com/

Transcriptomics has been applied to understand research topics such as cell differentiation, the cell cycle, development and carcinogenesis. It has been widely applied in diagnostics and biomarker discovery. Golub et al. demonstrated in their cornerstone paper that molecular signatures from selected genes effectively carry sufficient information to classify different leukemia types23. The RNA-Seq technology has also opened new frontiers in transcriptomics, such as more accurate transcription start site and exon boundary mapping, strand-specific measurements, much more precise characterization of alternative splicing patterns, detection of gene fusions, de novo transcriptomics, in which no prior reference is available, and single-cell transcriptomics (small-sample transcriptomics), in which measurements are made from very small amounts of sample material12–14,24,25.

Proteomics concerns the characterization of all proteins' peptide sequences, structures, functions, abundances and interactions in a cell. Proteins are the actual molecular machinery that executes the commands encoded in the genome26. Although transcriptomics can now accurately capture gene expression profiles, there is another layer of regulation from mRNA to protein. Gene expression can be regulated at the mRNA level via alternative splicing and siRNAs, and proteins are subject to one or more post-translational modifications such as phosphorylation, methylation, glycosylation, ubiquitination, glycation, etc. [vi] Processes such as protein localization, transport, and protein-protein and protein-DNA interaction are also impossible to capture by transcriptomics alone. Moreover, proteomics is more sensitive to cellular state changes and external stimuli26,27.

[vi] http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes

Several technologies have been developed to cater to proteomics needs. Protein sequence and quantity can be determined using Mass Spectrometry (MS)28; physical protein-protein interactions can be validated using yeast two-hybrid systems or high-throughput protein chips29; proteins from a sample can be separated using 2D gels; protein structures can be characterized using X-ray crystallography or Nuclear Magnetic Resonance (NMR) spectroscopy; in vivo protein-DNA interactions can be investigated using ChIP-seq30; and protein localization can be directly observed using fluorescence imaging. Rich proteomics data encompassing sequence, structure, function, abundance and interaction are publicly available online, for example from the Proteomics IDEntifications database (PRIDE)31, the RCSB Protein Data Bank (PDB)32, UniProt33, Reactome34 and, for protein-protein interactions, MiMI35. Similar to transcriptomics, proteomics data have been applied to medical practice, such as biomarker discovery and the diagnostics of diseases such as cancer and diabetes25,36,37, and to other domains such as ecology and population biology38. Proteomics has also been integrated with transcriptomics to reveal the dynamics from transcript to protein and to link the ensemble of interactions to the underlying molecular mechanisms39–41.

Metabolomics, or metabolic profiling, consists of the quantitative characterization of the metabolites in a biological system42. Metabolites are rapidly fluctuating and interchangeable small molecules (<1500 Da) that directly participate in cellular regulatory processes and capture the physiological state of the cell. Currently there are more than 8,000 annotated metabolites and hundreds of metabolic pathways stored in databases such as the Human Metabolome Database (HMDB)43,44, METLIN45, the Edinburgh Human Metabolic Network (EHMN)46 and the Kyoto Encyclopedia of Genes and Genomes (KEGG)47, and used by programs for analyzing metabolomics data such as MetScape48,49. Generally, there are three major metabolic profiling techniques50: targeted quantitative analysis, which profiles only a limited number of well-characterized metabolites; unbiased profiling, which measures a large number of metabolites (from a couple of hundred to a few thousand) simultaneously; and metabolic fingerprinting, which captures an entire 'snapshot' of the metabolic state of a cell culture or tissue.

Along with the technological advances in the other omics, instrumentation for measuring metabolites has also improved remarkably. The pipeline usually consists of a molecular separation procedure, such as Gas Chromatography (GC), Liquid Chromatography (LC), High-Performance Liquid Chromatography (HPLC), Ultra-Performance Liquid Chromatography (UPLC) or Capillary Electrophoresis (CE), coupled with a characterization procedure, such as Mass Spectrometry (MS), Nuclear Magnetic Resonance (NMR) spectroscopy or Fourier Transform Infrared (FT-IR) spectroscopy50–53.

Metabolomics has a variety of applications: in plant sciences and agriculture, to capture the states of growth and development and plants' responses to external stimuli54; in pharmaceuticals and drug discovery53,55; in environmental sciences and natural product research56; and in food sciences, to benchmark processing, quality and safety57. As it has been demonstrated that cells usually undergo significant metabolic changes in pathological processes such as obesity, diabetes, complex disorders and cancer, metabolic profiling is also used in non-invasive early disease diagnostics, prognostics and biomarker discovery53,58–61. After careful examination and testing, many statistical and computational methods developed for gene expression profiles can be directly applied to metabolomics50,62.

Cross-Omics Studies. With this rapid development in each individual omics field, and because each omics dimension captures one aspect of the biological state of the cell, research using multiple omics can reveal novel biological hypotheses involving complex interactions among different omics or pathways that would otherwise be impossible to mine from a single omics data type alone. The results have a wide spectrum of applications in bioinformatics and the biomedical sciences, such as cross-omics network building2, modeling of cellular responses63, regulatory pathway and network reconstruction64 in plant biology41, elucidating pathways in microbial systems40,65,66, understanding human diseases67, and, in association with pharmacogenomics, personalized drug therapy68. Recently, there have also been proposals to integrate omics data into Electronic Health Records (EHRs)69,70; for example, high-quality personal phenomics (the systematic measurement of the physical and biological traits of organisms) could become a crucial component of future diagnostic support systems for doctors and public health practitioners71. Rui Chen et al. proposed an integrative personal omics profile (iPOP) system that integrates genomic, transcriptomic, proteomic, metabolomic and autoantibody profiles for medical risk assessment72. Many tools have also been developed to provide out-of-the-box solutions for cross-omics research, such as the Bioconductor suite73, integrOmics74, MeltDB75, etc. A comprehensive list of tools can be found in Ghosh et al.76 It is expected that with the further development of omics fields such as phenomics and interactomics, and with advances in analytical methods and hardware, integrated omics analysis will become even more effective for understanding biological systems.

1.2 Analytical Methods for Omics Datasets

The most frequently asked question in omics studies is: how do we detect the key molecules, or the key pathways and networks consisting of such molecules, that contribute to the observed phenomenon? Unlike the pre-omics era, when there were only a handful of candidates to analyze manually, researchers now have to mine signals from tens of thousands of genes, proteins or metabolites. More importantly, it is crucial to identify the high-level concepts (protein complexes, biological processes, pathways) that contribute to the observations in order to develop hypotheses that link molecular events to high-level biological functions. High-performance methods with a small false discovery rate (FDR)77 that are robust to high levels of noise have become critical. In general, this includes methods that filter out a small set of molecular features, cluster molecular profiles based on their expression signatures, and map results onto existing interaction networks or create new networks from observed associations.

For filtering the molecular profiles that contain most of the signal, a Variable Selection (VS) procedure is often used to identify a subset of molecular profiles that best differentiates the sample classes. This can be done using any of several statistical routines, such as the conventional t-test and ANOVA78, adjusted t-tests like Significance Analysis of Microarrays (SAM)79, shrinkage t-tests with adjusted variance accounting for multicollinearity (shrinkage t)80,81, a permutation-based approach, or approaches that construct new variables from combinations of existing ones to reduce the data dimensionality, such as principal component analysis (PCA)82, independent component analysis (ICA)83, non-negative matrix factorization (NMF)84,85, support vector machines (SVM)86, K-Nearest Neighbors (KNN), Random Forest-coupled variable selection (varSelRF)87 and Discriminant Function Analysis (DFA)50. Regression models with variable selection can also be built using Partial Least Squares (PLS) and then used for prediction50. The filtered subset of molecular profiles carries most of the variability of the data and can be used for subsequent classification of samples.
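
As a minimal, hedged sketch of the filtering idea (not a reproduction of any specific workflow from this thesis; the matrix, labels and cutoff below are invented for illustration), a simple per-gene t-test filter might look like this:

```python
# Minimal variable-selection sketch (assumption: a genes x samples matrix with two sample classes).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
expr = rng.normal(size=(1000, 20))          # 1000 hypothetical gene profiles, 20 samples
labels = np.array([0] * 10 + [1] * 10)      # two sample classes
expr[:50, labels == 1] += 2.0               # spike in 50 truly differential genes

t, p = stats.ttest_ind(expr[:, labels == 0], expr[:, labels == 1], axis=1)
top = np.argsort(p)[:50]                    # keep the 50 most significant features
print("selected features:", top[:10])
```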

Another approach is to cluster omics variables based on their expression profiles across samples. Clustering is a process that assigns variables to groups, simultaneously minimizing intra-cluster differences and maximizing inter-cluster differences using a distance function88. Because variables that tend to change synchronously are more likely to be involved in the same biological processes, clustering can reveal high-level molecular mechanisms better than the individual omics variables. The most frequently used methods are agglomerative hierarchical clustering (HClust), K-means, Markov clustering, Bayesian clustering, density-based clustering, and community structure detection88–92. Because some hub genes participate in multiple biological processes, fuzzy clustering and probabilistic clustering methods have also been proposed to allow multiple cluster membership assignments50,88. Clustering is an unsupervised method, meaning that no prior membership knowledge is used and memberships are assigned based only on distances computed from the data.
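
A small sketch of the clustering step, assuming a hypothetical genes-by-samples matrix and using a correlation-based distance with average-linkage hierarchical clustering (one of the methods listed above); all names and parameters are placeholders:

```python
# Hierarchical clustering sketch (assumption: rows are gene profiles measured across samples).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 30))                        # 200 hypothetical genes, 30 samples

dist = pdist(expr, metric="correlation")                 # 1 - Pearson correlation as the distance
tree = linkage(dist, method="average")                   # agglomerative, average linkage
membership = fcluster(tree, t=10, criterion="maxclust")  # cut the tree into at most 10 clusters
print(np.bincount(membership))                           # cluster sizes
```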

After obtaining omics profiles, it is then possible to gain a high-level view of the individual molecular entities through their interrelationships. There are two strategies. The first is to map the identified molecules to statistically significantly over-represented pathways or networks using an enrichment analysis. This can be done with a slew of offline and online pipelines for single or modular variable discovery, such as Bioconductor/R, Gene Set Enrichment Analysis (GSEA), DAVID, LRpath and ConceptGen, or commercial services like MetaCore35,62,73,93,94. On the other hand, it is also possible to reconstruct a de novo network from the omics profiles using correlation/coexpression-based methods such as ARACNE95 and WGCNA96. An inferred network can also be queried against an interaction network database using network mapping methods such as SAGA or VANLO97,98. Integrating several omics data types can improve the quality of the resulting networks and models99.
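
The simplest form of de novo correlation-network construction is a thresholded pairwise correlation matrix. The sketch below illustrates only that basic idea, not the ARACNE or WGCNA algorithms themselves; the data and the 0.8 cutoff are arbitrary placeholders:

```python
# Thresholded coexpression network sketch (assumption: genes x samples matrix; |r| > 0.8 defines an edge).
import numpy as np

rng = np.random.default_rng(2)
expr = rng.normal(size=(100, 25))          # 100 hypothetical genes, 25 samples

corr = np.corrcoef(expr)                   # gene-by-gene Pearson correlation matrix
adj = (np.abs(corr) > 0.8) & ~np.eye(len(corr), dtype=bool)   # drop self-loops
edges = np.argwhere(np.triu(adj))          # undirected edge list (i < j)
print(f"{len(edges)} edges among {len(corr)} genes")
```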

1.3 Visual Exploration of Omics Datasets

Complementary to automated pipelines or statistical methods that generate a list of 'research hotspots' is the visual exploration process: the results can be plotted using a variety of static or interactive methods, and the researcher can then directly explore the data and look for interesting features. Frequently used visualization methods for omics data include scatter plots, profile plots and heatmaps100. For example, a numeric gene expression matrix, with each row mapped to a gene, each column mapped to a sample, and each cell holding the corresponding expression value, can be color-coded and plotted as a heatmap. The aforementioned clustering memberships can also be superimposed, or visualized using schemes proposed by Hibbs et al.101, which are much easier for humans to interpret than the raw data. Other annotations can be added, such as hierarchical trees or cluster memberships102. For direct comparison of multiple omics datasets, Baran et al. proposed a color scheme for three-way comparisons103. OmicsViz can illustrate large omics datasets across several sources104. Dimension reduction methods such as PCA and Multi-Dimensional Scaling (MDS), and non-linear dimensionality reduction algorithms such as t-distributed Stochastic Neighbor Embedding (t-SNE), have also been applied to more accurately reflect co-expressed genes in visualizations105. Networks and pathways from databases or generated from experiments can be visualized in various ways using standalone software like R, Cytoscape, NAViGaTOR, VisANT, TranscriptomeBrowser, Pajek, VANTED, GenMAPP, MetaCore and Qlucore, or online applications like Oncomine, KEGG, Lichen and Omics Viewer100,106–108. Hendrik et al. proposed a slew of visualization recommendations for cross-domain knowledge visualization, such as image stacking, network comparison and multi-modal alignment109. Gehlenborg's review compiled a comprehensive list of frequently used software for omics data visualization100.
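
As a hedged illustration of the heatmap idea (using generic matplotlib/scipy calls rather than any particular tool named above), a gene-by-sample matrix can be reordered by a clustering result and rendered as a color-coded image:

```python
# Heatmap sketch (assumption: a small genes x samples matrix; rows reordered by hierarchical clustering).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(3)
expr = rng.normal(size=(50, 12))                        # 50 hypothetical genes, 12 samples

order = leaves_list(linkage(expr, method="average"))    # row order taken from the clustering tree
plt.imshow(expr[order], aspect="auto", cmap="RdBu_r")   # color-code expression values
plt.xlabel("samples"); plt.ylabel("genes (clustered)")
plt.colorbar(label="expression")
plt.savefig("heatmap.png", dpi=150)
```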

1.4 Bioinformatics Challenges in the Omics Era

With rapid technological development and decreasing costs, researchers are now less concerned about obtaining raw data. Instead, analyzing and interpreting the exponentially growing data has become the new challenge – even simply handling massive high-throughput data has become a major challenge for many investigators. Below, I summarize five major bioinformatics challenges of the omics era.

Storage: Although the cost of storage has dropped significantly, to about one dime per GB, there is still an ongoing race between data generation and data storage. As just one example, over one million GEO microarray datasets have been uploaded to this day. Furthermore, the costs of hardware (servers), services (power, network) and maintenance (labor, backup, transfer, data consistency and security) also stack up quickly with the growth of data. For example, installing a mirror of the UCSC Genome Browser requires a compute system with about 8 CPU cores, 64 GB of memory and 40 TB of disk storage for a typical setup [vii]. It was estimated that by the end of 2012 the total number of expression datasets had surpassed one million110. Even though free public-domain data repositories (NCBI GEO20, the UCSC genome center8, KEGG47, Reactome and MiMI34,35) and low-cost cloud storage (Amazon and Google) have alleviated the economic hardship for small research groups, storage remains a big challenge given the rapid instrumental development of high-throughput, high-resolution data generation. Besides hardware, massive data also pose challenges to databases. Traditional relational databases, such as the one used for the UCSC Genome Browser8, have been competent at managing traditional data, but are less suitable for complex biomedical experiments, free-form and heterogeneous data, and media-rich data such as images and 3D models111. Key-value stores such as Google LevelDB, and distributed storage systems for structured data such as Bigtable [viii], along with adaptive informatics111, can be utilized to tackle such challenges. However, this brings other complications, such as data synchronization across different physical servers or data centers. Wiesinger et al. compiled a list of crucial questions to be addressed for data and knowledge management in cross-omics projects112. Ghosh et al. reviewed the software tools and resources, as well as the workflow designs, pertinent to integrated platforms for systems biology76.

[vii] http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/product
[viii] http://research.google.com/archive/bigtable.html

Query: Assuming the storage demand can be met, the next immediate challenge is how to quickly locate the data of interest. It is now entirely impossible to browse a data warehouse manually and pick entries of interest without queries. Taking full-text search as an example, as of 2012 there are over one million publications indexed in MEDLINE [ix]. PubMed currently indexes over 5,600 journals [x], and a Google Scholar search for 'cancer' returns over 3.7 million results [xi]. A search for gene expression profiles with the term MAPK [xii] returns about half a million results in the NCBI GEO Profiles database20. Even for a research institution, it is impractical to manually explore all of these data; and because many hypotheses are built upon new observations and previous knowledge, information query and retrieval are vital to the knowledge generation process.

Notably, there are two major issues related to data query. First, the query syntax must not be so complicated that people who are not information-science savvy cannot obtain the desired results, yet it must be flexible enough to tailor customized searches. Both PubMed and Google offer a simple search optimized for day-to-day use as well as composite searches with syntax qualifiers for building complex criteria, or more structured semantic searches. Second, results should be returned relatively quickly so that the user can iteratively refine queries based on 'draft' search attempts, and they should be ordered in a sufficiently informative and unbiased way so that the user does not need to browse several result pages to find the desired outcome. Research has shown that most people hardly browse search results beyond page one. Such practices may lead to significant biases and consequences in biological research. The development of standardized semantics and structured grammars such as MeSH has improved search efficiency and accuracy113.

Analysis: The large size of raw data introduces many new issues for existing data analysis workflows, both computational and theoretical. Let us use N to denote the size of the data.

[ix] http://dan.corlan.net/cgi-bin/medline-trend?Q=
[x] http://www.nlm.nih.gov/bsd/num_titles.html
[xi] http://scholar.google.com/scholar?hl=en&q=cancer&btnG=&as_sdt=1%2C23&as_sdtp=
[xii] http://www.ncbi.nlm.nih.gov/geoprofiles?term=mapk

For data structures and algorithms that grow even on the order of N^2, personal computers quickly become incapable. For example, the Human Genome U133A 2.0 Array contains 14,500 well-characterized human gene profiles [xiii]. If we wished to generate a correlation matrix of all the expression profiles from a single U133A chip alone, using 32-bit floating-point values, this alone would require around 850 MB of RAM, without counting intermediate storage. More complicated analytical procedures will likely require much more memory and storage; therefore it is very demanding even to analyze a single big dataset, not to mention comparative studies that involve multiple datasets. Algorithms whose running time grows much faster than linearly in N quickly become impractical for large datasets. Distributed or cloud computing, or the use of heuristics, can dramatically improve performance or reduce resource requirements in some conditions, but for researchers without such capabilities this remains a tremendous challenge.
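
A quick back-of-envelope check of that figure, assuming single-precision (32-bit) values and no savings from matrix symmetry:

```python
# Rough memory estimate for a dense 14,500 x 14,500 correlation matrix.
n_profiles = 14_500
bytes_per_value = 4                          # 32-bit floating point
total_bytes = n_profiles ** 2 * bytes_per_value
print(f"{total_bytes / 1e6:.0f} MB")         # ~841 MB, consistent with the ~850 MB cited above
```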

Another aspect of big data analysis is the theoretical challenges. Obtaining tens of thousands of measurements simultaneously has indeed boosted discovery efficiency, but at the cost of less thorough investigation of each individual hypothesis. The potential pitfalls include multiple testing, multivariate outliers, missing values, fuzzy clustering memberships, etc. Taking outliers as an example, the Pearson correlation coefficient (PCC) has been widely used to determine the association between two data vectors. It has been shown that even a few outliers, or multi-modally distributed data, can significantly affect the resulting correlation114. For small datasets, significant candidate correlations can be plotted and visually validated for data peculiarities. However, it is impossible to manually check all false positive and false negative signals. Even when using multivariate outlier detection methods and outlier-resistant robust methods, there is still a good chance that a strong signal results from erroneous measurements or noise. Statistical modeling also becomes difficult with tens of thousands of variables – even after variable selection there may still be thousands of variables left, and they may contribute to a biological process collectively. Moreover, because tens of thousands of hypotheses are tested simultaneously, a considerable number of signals may simply result from random fluctuations. False discovery of omics variables may have a significant impact on subsequent pathway analyses115. Therefore, how to efficiently, correctly and reproducibly perform pathway analysis to identify the key elements in such big datasets has yet to be fully addressed. Many papers have proposed best practices116.

[xiii] http://www.affymetrix.com/estore/browse/products.jsp?productId=131537#1_1
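
To make the outlier sensitivity concrete, here is a small synthetic illustration (not data from this thesis) comparing the Pearson correlation with the rank-based Spearman correlation after a single corrupted measurement:

```python
# Effect of a single outlier on Pearson vs. rank-based Spearman correlation (synthetic sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=30)
y = 0.9 * x + rng.normal(scale=0.3, size=30)    # two genuinely correlated profiles

print("clean   Pearson %.2f  Spearman %.2f" % (stats.pearsonr(x, y)[0], stats.spearmanr(x, y)[0]))
y_out = y.copy()
y_out[0] = 50.0                                  # one corrupted measurement
print("outlier Pearson %.2f  Spearman %.2f" % (stats.pearsonr(x, y_out)[0], stats.spearmanr(x, y_out)[0]))
```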

Meta-analysis, which tests hypotheses on pooled data, has gained strong popularity in the past decade. With data vastly available in the public domain, and given the general notion that a larger sample size leads to more statistical power, it is very tempting to conduct research on merged data from different studies, and especially from different omics measurements. Many previous studies have demonstrated the efficacy of this strategy. However, there are intricacies that must be addressed when merging data, such as differences in experimental conditions, the processing of missing values, and data normalization.

Exploration: Even though many analysis pipelines, such as sequence assembly, network clustering and pattern matching, can now be executed very efficiently without a graphical user interface (GUI), visual exploration still plays a very important role in bioinformatics for general researchers. Translating tabulated data into plots, figures and interactive visualizations helps data-driven hypothesis generation: the user can quickly identify strong signals related to his or her knowledge domain, triggering the intellectual reasoning and discovery process. There is a plethora of publicly available applications, such as Oncomine, Cytoscape and the UCSC Genome Browser, to name just a few100. The omics era has also imposed significant challenges on analysts, even with state-of-the-art hardware and software. It is difficult to build high-performance yet interactive visualizations that run on general consumer computer hardware, while feeding the user an overwhelming amount of information can even be counter-productive. For example, Michael B. Eisen demonstrated the effective visualization of clusters of genes that share similar expression profiles in his seminal publication102. However, although the heatmap evidently showed the overall expression patterns of the selected genes, even for this subset, which contained only a few hundred genes and fewer than a hundred conditions, it was already difficult to examine rows or columns in detail, not to mention a similar heatmap plotted for the entire dataset of around 8,000 genes. The yeast BIND network in Cytoscape contains over 30,000 proteins and 30,000 interactions, and its visualizations often result in a dense 'hair-ball'117. Expanding the SNP track in the UCSC Genome Browser around MAPK9 reveals all the SNPs in the region [xiv], which can take several scrolls to display fully, even on a 27-inch Apple monitor at 2560x1440 resolution. These visualization techniques are more or less deficient in scalability – at a certain dataset size the visualization becomes ineffective, neither revealing much of the underlying structure nor helping to generate new hypotheses through mere eyeball inspection. To cope with this challenge, the idea of 'multi-scale' visualization was proposed118. Employing data clustering hierarchies, or external ontologies, to reduce and visualize raw data at various summarization levels can ease the analytical difficulty. For example, instead of investigating individual expression profile changes for tens of thousands of genes, a researcher could first screen pathways or certain Gene Ontology (GO) terms, which contain groups of genes, check whether interesting signals can be detected, and then dive deeper into the details to identify which genes drive the signals119. A gene interaction network could also first be plotted as high-level pathway modules, and then gradually drilled down to intermediate clusters and finally to the lowest-level gene-gene pairwise interactions118. Many online applications that contain one or more such exploration methods, such as Oncomine120, VANTED107 and Qlucore [xv], have been developed to tackle these challenges. However, new 'future-proof', scalable, flexible and versatile visualization models have yet to be developed.

[xiv] http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr5:179673028-179707608&hgsid=322350387&knownGene=pack&hgFind.matches=uc021yjl.1
[xv] http://www.qlucore.com
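
As a hedged sketch of the 'summarize first, then drill down' idea, the toy example below collapses a gene-level matrix into term-level averages before any individual gene is inspected; the GO term assignments are entirely hypothetical:

```python
# Multi-scale summarization sketch: collapse gene rows into hypothetical GO-term rows by averaging.
import numpy as np

rng = np.random.default_rng(5)
expr = rng.normal(size=(60, 8))                       # 60 hypothetical genes, 8 samples
go = {"GO:apoptosis": range(0, 20),                   # hypothetical term-to-gene assignments
      "GO:cell_cycle": range(20, 45),
      "GO:metabolism": range(45, 60)}

term_level = {term: expr[list(rows)].mean(axis=0) for term, rows in go.items()}
for term, profile in term_level.items():
    print(term, np.round(profile, 2))                 # screen terms first, then drill into genes
```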

Exchange: The challenge of data exchange is two-fold. The first difficulty is the mapping and conversion of data from different sources. At the inception of the bioinformatics era, many research institutions simultaneously developed their own annotation systems for genes, transcripts and proteins. Because of the lack of standardization (one of the most important achievements of the first Emperor of China, Qin, was the national standardization of metrics, so that people would not have to carry around a dozen different types of money to do business), these annotation systems still coexist. Clone121 offers more than 20 ID conversions and the DAVID gene conversion tool122 offers more than 40. Some of these ID mappings, such as probe to gene, gene to protein, or gene to transcript, can be one-to-many, many-to-one or many-to-many. Furthermore, many software programs have their own input-output formats – a node-edge typed network can be stored as an edge list, an adjacency matrix, structured markup files (XML, JSON), etc. Different databases also usually have their own data processing and formatting rules. Together, these variabilities make a typical cross-omics analysis, which involves analyzing multiple datasets with several software utilities, very difficult. A significant amount of labor and resources is usually invested just to pre-process the raw data.
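
As a small illustration of why format conversion is routine work (made-up node names, not any particular database's format), the same undirected network can be held either as an edge list or as an adjacency matrix:

```python
# Converting between two common network representations: edge list and adjacency matrix.
import numpy as np

edges = [("TP53", "MDM2"), ("TP53", "ATM"), ("ATM", "CHEK2")]    # toy undirected edge list
nodes = sorted({n for e in edges for n in e})
index = {n: i for i, n in enumerate(nodes)}

adj = np.zeros((len(nodes), len(nodes)), dtype=int)
for a, b in edges:
    adj[index[a], index[b]] = adj[index[b], index[a]] = 1        # symmetric for an undirected graph

print(nodes)
print(adj)
```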

The second difficulty is exchanging data across research institutions and researchers. As the big problems that require expertise from the life sciences, statistics, and the information and computational sciences are unlikely to be solved by a single research group alone, how to effectively share data and analysis workloads across multiple sites is also critical. Not only do terabytes of raw data and accompanying annotations need to be hosted and shared efficiently, but the shared data also need to be interpreted and explored consistently at the different sites, which requires each site to have comparable hardware and software setups. Online workspaces that allow researchers to collaborate virtually on data exploration from different geographic sites123, re-playable workflows such as DAVID93, EzArray124 and MeRy-B125, as well as public and cloud storage such as Dropbox (https://www.dropbox.com/) and peer-to-peer data sharing such as Tranche, have been developed to address these challenges.

To summarize, the big data era has opened doors to many more possibilities – along with new challenges that call for innovative, disruptive and collaborative solutions.

1.5 My Thesis Work

Through the course of my PhD research, I probed into several omics datasets and tried to explore and mine new knowledge from them. With the aforementioned bioinformatics challenges in mind, I have been developing tools that help users better understand the structure of networks, identify functional modules in a network, create meaningful visualizations, and search network attributes flexibly and efficiently. I worked on a comprehensive transcriptomics-metabolomics study and proposed novel workflows to identify outliers that could point to unknown biological processes and serve as biomarkers, as well as to compute reliable correlations for a large number of pairwise comparisons in data with high variability and multivariate outliers. Finally, as the highlight of the thesis, and to humbly address the imminent bioinformatics challenges of this big data era, I worked with my mentors and advisors to develop a new visual exploration model for large, complex, heterogeneous and structured biological data. While developing CoolMap, I integrated not only my working experience with omics datasets, but also my experience with graphics software, aiming to design a new future-proof, extensible and high-performance visualization paradigm. I hope this generic exploratory model can be extended to other disciplines and address existing problems in data-driven information mining.

1.6 Organization of this Thesis

Chapter 1 gives an overview of the omics era, current methodologies and bioinformatics challenges. Chapter 2 describes the development of a series of Cytoscape plugins for searching, clustering and visualizing omics networks. Chapter 3 presents the results of bio-signature discovery using robust statistical methods on cross-omics datasets. Chapter 4 discusses the design, implementation and usage scenarios of the CoolMap paradigm. Chapter 5 concludes the work of this thesis.

1.7 Professional Experience

Expected Publications 2013

Su G et al, CoolMap: Multiscale and Multidimensional exploration of omics data

Su G et al, Data exploration using CoolMap and Cytoscape: cross-application interoperability through public APIs

Collaborative transcriptomics-metabolomics with Dr Charles Burant’s data

Fan GF, Fu CX, Su G, Lin G, Pappin D, Lucito R and Tonks NK. PTP1B Dephosphorylates and Antagonizes Brk-mediated IGF-1R Signaling in Ovarian Cancer, in preparation

First Author Publications

Su G, Burant CF, Beecher CW, Athey BD, Meng F. Integrated metabolome and transcriptome analysis of the NCI60 dataset. BMC Bioinformatics. 2011 Feb 15;12 Suppl 1:S36.

Su G, Kuchinsky A, Morris JH, States DJ, Meng F. GLay: community structure analysis of biological networks. Bioinformatics. 2010 Dec 15;26(24):3135-7.

Su G, Athey BD, Meng F. GSearcher: agile attribute querying for biological networks. Bioinformatics. 2010 Dec 15;26(24):3138-9. Epub 2010 Nov 18

Collaborative Work

Qin ZH, Bilenky M, Su G, Robertson G, Jones S, MotifOrganizer: A scalable multi-stage model-based clustering approach for grouping conserved non-coding elements in mammalian genomes, Front Biosci (Elite Ed). 2013 Jan 1;5:785-97.

Yan QY, Peng B, Su G, Cohan BE, Major T, Meyerhoff ME. Measurement of Tear Glucose Levels with Amperometric Glucose Biosensor/Capillary Tube Configuration, Anal. Chem. Sep 30, 2011.

Morris JH, Apeltsin L, Newman A, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE. clusterMaker: A Multi-algorithm Clustering Plugin for Cytoscape, BMC bioinformatics 12(1), 436

Zöllner S, Su G, Stewart WC, Chen Y, McInnis MG, Burmeister M. Bayesian EM algorithm for scoring polymorphic deletions from SNP data and application to a common CNV on 8q24. Genet Epidemiol. 2009 May;33(4):357-68.

All papers and citations can be found at the Google Scholar profile link: http://scholar.google.com/citations?user=NWtVOsMAAAAJ

Conference Presentations

Su G, Burant CF, Beecher CW, Athey BD, Meng F. Integrated metabolome and transcriptome analysis of the NCI60 dataset. The ninth Asia Pacific Bioinformatics Conference (APBC), Inchon, Korea, Jan 2011

Su G, Kuchinsky A, Morris JH, States DJ. GLay plugin for Cytoscape, Cytoscape Annual Retreat and Intelligent Systems for Molecular Biology (ISMB) birds of a feather session, Toronto, 2008.

Su G, States DJ. Application of community structure algorithms to protein-protein interaction networks. Ohio Collaborative Conference on Bioinformatics (OCCBIO) 2008, Toledo, 2008

Book

Cytoscape Complex Networks Analysis How-to. Packt Publishing Limited. In progress. Expected 2013.

Journal Reviewer

Peer Reviewed Papers for the following journals:

PLOS Genetics http://www.plosgenetics.org/
BMC Bioinformatics http://www.biomedcentral.com/bmcbioinformatics/
Journal of Clinical Bioinformatics http://www.jclinbioinformatics.com/
MDPI Cancers http://www.mdpi.com/journal/cancers

Chapter 2 Tools for Exploration of Omics Networks

2.1 Introduction

As stated in Chapter 1, the omics era has produced massive amounts of raw data that characterize many facets of organisms. One of the biggest challenges is how to make sense of these data for life-sciences research groups without strong statistical or computational capabilities. There are two different paths to a solution. The first is a 'black box' strategy, in which an automated analytical application takes the user's input and feeds back human-interpretable results. All intermediate steps are encapsulated so that a general user can apply the 'typical' routines for the most commonly executed tasks. For example, by using Gene Set Enrichment Analysis (GSEA)94 or online pipelines such as DAVID93, the user can immediately obtain enriched genes and pathways from the supplied datasets. The advantages and disadvantages of this approach are clear-cut: while 'factory' pipelines can be executed automatically and efficiently, the approach is error-prone if the user does not thoroughly understand the underlying hypotheses and mechanisms. The alternative strategy is visual exploration: by plotting a microarray experiment and its clustering results as a heatmap, or an inferred protein-protein interaction network as a network view, the researcher can intuitively identify hotspots that could lead to new knowledge discovery. Nevertheless, this method is not as efficient, and due to the constraints of human analytical capability and of computational and screen real estate, visual exploration quickly becomes ineffective when a heatmap grows beyond a few hundred rows and columns or a network contains more than a few hundred nodes. In practice, the two strategies are usually coupled – an omics dataset is first processed to generate a subset of potential candidates, and each candidate is then visually explored and validated.

Networks are one of the common visualization formats for omics data. A network, or equivalently a graph in mathematical terms, is a set of nodes (interchangeably, vertices) and edges. In the life sciences, nodes are usually mapped to biological entities and edges to relationships. A network can be directed or undirected; a regulatory pathway can be modeled as a directed network, while a pairwise correlation network can take the undirected form. There are many publicly available databases of molecular interaction networks, such as the Biological General Repository for Interaction Datasets (BioGRID)126, Reactome34, Michigan Molecular Interaction (MiMI)35 and the Biomolecular Interaction Network Database (BIND)127. All molecular interaction data can be pooled together to capture the entire interaction map of a cellular organism, and such a pooled interaction network is called the 'interactome'. A network can first be analyzed using 'black box' pipelines such as the Boost Graph Library [xvi] and its parallel variants, JUNG [xvii] and igraph [xviii], and then interactively visualized using Cytoscape, Prefuse, VisANT or NAViGaTOR100. Cytoscape is one of the most popular open-source network software packages for biological network data exploration. One reason for its outstanding success is its plugin framework and strong community support – it is very easy for third-party developers to integrate 'black box' analytical functions into Cytoscape's visualization framework. Hundreds of plugins have been developed for network generation, cluster analysis, heatmap visualization, remote database query, ontology analysis, etc. For a complete list of current Cytoscape plugins, please refer to the list here [xix].

[xvi] http://www.boost.org/doc/libs/1_52_0/libs/graph/doc/
[xvii] http://jung.sourceforge.net/
[xviii] http://igraph.sourceforge.net/
[xix] http://apps.cytoscape.org/apps/

There are several questions researchers would like to ask when exploring a biological network. First of all, what is the overall structure of the network? Does it consist of a single fully connected component, in which any node can reach any other node via a limited number of steps, or of many small disconnected cliques? Is the network very dense, with a large number of edges, or relatively sparse, with only a few 'hub' nodes? Secondly, is it possible to partition the network into smaller clusters, or communities, so that each cluster contains densely interconnected nodes? Such clusters could contain nodes that share similar attributes or interact frequently with each other, which implies certain biological functions. Thirdly, given a very large visualized network, how can one find nodes that satisfy some complex criteria? For example, in a molecular interaction network, we would like to find all the nodes annotated with a specific Gene Ontology term such as 'apoptosis'; or, in a gene regulatory network coupled with microarray data, we would like to highlight all the differentially expressed genes. Corresponding to Chapter 1, we need solutions for the visualization, analysis and query challenges.
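
A hedged sketch of how such questions can be asked programmatically, here using the python-igraph interface mentioned later in this chapter; the toy random graph and the 'go' attribute are invented for illustration:

```python
# Structure, communities, and attribute queries on a toy graph with python-igraph.
import igraph as ig

g = ig.Graph.Erdos_Renyi(n=100, p=0.05)                  # toy network standing in for an interactome
g.vs["go"] = ["apoptosis" if i % 10 == 0 else "other" for i in range(g.vcount())]

print("components:", len(g.components()))                # overall structure
communities = g.community_fastgreedy().as_clustering()   # community structure detection
print("communities:", len(communities), "modularity: %.2f" % communities.modularity)
print("apoptosis nodes:", g.vs.select(go_eq="apoptosis").indices)   # attribute query
```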

When I began to use Cytoscape in the summer of 2008, there were not many tools available to address these challenges, and the existing ones generally had difficulty with increasingly bigger and more complex data. I noticed that there were quite a few 'black box' analysis libraries available that would notably address the aforementioned questions if added to Cytoscape. Developer rule number one: don't reinvent the wheel.

2.2 GLay and igraph

The GLay project was initially developed as a student project in the Google Summer of Code 2008 (GSoC) and was presented at the Intelligent Systems for Molecular Biology (ISMB) conference in 2008. The project was continuously developed afterwards, with a rich set of community structure detection algorithms and network layout functions, in the hope that it would be more useful for general network data analysis.

2.2.1 Community Structure and igraph

Many real-world networks, including social networks, biological interaction networks, citation networks, etc., are scale-free networks, in which a few 'hub' nodes have a very large number of edges and the rest of the nodes have very few. The overall degree distribution (the degree being the number of edges of a node) asymptotically follows a power law. The overall effect in a scale-free network is that nodes tend to cluster into communities. It is then possible to partition the network into communities and elucidate their functions respectively – namely, a divide-and-conquer strategy.

Community structure detection is a popular way to partition a network. Similar to the Markov clustering available in clusterMaker, a network can be subdivided into communities without using attribute data to compute distance matrices; the resulting communities maximize intra-community connections and minimize inter-community connections. These algorithms have been applied in the social sciences to identify groups that share similar interests or behaviors (Mark Newman). Many high-performance and scalable community detection algorithms have been applied to molecular interaction networks to reveal functional modules, and it has been demonstrated that some of these algorithms are capable of clustering mega-scale social networks. A community structure algorithm usually uses the modularity score Q, proposed by Mark Newman128, to benchmark the quality of a community structure. This score reflects the quality of the computed community structure with reference to a random Erdős–Rényi network, in which the probability of an edge connecting an arbitrary pair of nodes is uniform. A modularity of 0 corresponds to a community structure no better than that of a comparable Erdős–Rényi network, while values approaching 1 indicate a near-perfect community structure. Some researchers have pointed out that relying only on the modularity score may lead to resolution issues: the number of communities found does not scale well with the size of the network, and some large communities may themselves possess strong internal community structure. Iterative methods, quality functions other than the modularity score, and module-checking criteria have been proposed to improve the quality of community detection algorithms129,130.
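
For reference, Newman's modularity referred to above is commonly written in the following standard form (quoted from the general literature rather than from this thesis), where A is the adjacency matrix, k_i the degree of node i, m the total number of edges, and δ(c_i, c_j) equals 1 when nodes i and j are assigned to the same community and 0 otherwise:

```latex
Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```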

igraph (http://igraph.sourceforge.net/) is a C-based library for comprehensive graph analysis131. It has interfaces to major programming environments such as R, Python and Ruby. The library provides a rich collection of community detection algorithms and a variety of layout algorithms that scale to very large networks.

2.2.2 Existing Network Clustering and Layout Functions in Cytoscape

Cytoscape is not preinstalled with any network clustering functions; all clustering functions are contributed by the community. Several frequently used Cytoscape plugins for functional module detection are available, such as MCODE132, NeMo133 and ClusterMaker92. However, these applications have some limitations. Some algorithms in ClusterMaker, such as hierarchical or k-means clustering, require additional numerical attributes to compute a dissimilarity matrix. Others, such as MCODE and NeMo, are engineered only to find small, densely intra-connected local clusters without clustering all the nodes in a network. For example, when applied to a network from Michigan Molecular Interaction (MiMI) containing 11884 nodes and 88134 edges with default parameters, MCODE produced 105 clusters, 52 of which contain fewer than five nodes. It may therefore not be suitable for global partitioning of large networks for exploratory analysis. In addition, some of these plugins were not tailored for large networks: NeMo threw an error when executed on the same network on a 2.67 GHz Intel Core i7 machine. Before GLay, no plugin offered a comprehensive collection of efficient community detection algorithms, which could profoundly improve cluster analysis for Cytoscape.

Besides network clustering, visualizing very large networks is also a big challenge for Cytoscape. It is very desirable to plot the network in such a way that node proximity reflects the network topology. Force-based layout algorithms, which simulate a ball-and-spring model, are frequently used to generate layouts for scale-free networks. However, generating a force-based layout on a large network not only consumes considerable resources and time, but also rarely produces an informative outcome: it has been suggested that direct application of a force-based algorithm to a dense network with more than 500 nodes can produce a massive hairball-like mess134. Additional procedures, such as brushing and visual attribute mapping, may therefore be necessary. Cytoscape ships with several layout functions, with the organic and force-directed layouts being the most popular; however, the bundled force-based layouts can fail when the network size grows beyond 10,000 nodes using default parameters. Adding scalable and versatile layout algorithms, along with community detection functions, would therefore significantly improve the visual analytic capability of Cytoscape on large omics datasets.

2.2.3 Implementation of GLay

The core of GLay was developed as a Cytoscape plugin with high-performance community analysis and graph layout functions ported from the igraph C library131. The bridging was constructed via the Java Native Access (JNA)xx interface. JNA serves as a high-level wrapper that encapsulates the complexities of the Java Native Interface (JNI)xxi: functions can be written as prototypes in Java and are mapped automatically to the native code declarations. JNA also supports native object mapping, which allows the direct coupling of Java objects and native objects (such as primitives and arrays) and thus facilitates efficient data transfer. The functions ported from the igraph C library were compiled only for the Windows 32/64-bit platforms, but could easily be extended to other platforms by recompiling the C code and configuring JNA accordingly. A Google Summer of Code 2011 project ported some igraph functions to Cytoscape under Mac OSXxxii.

xx JNA https://jna.dev.java.net xxi JNI http://en.wikipedia.org/wiki/Java_Native_Interface

The general workflow is as follows: when the user issues a command to use GLay-igraph for community structure detection or network layout, the current network is automatically processed as the input network, with edge directionality, duplicate edges and self-loops removed. This network standardization step makes the resultant community structures from different algorithms comparable and improves overall performance, since, empirically, edge directionality generally does not affect the resulting community structure. Upon completion of an analysis, the community memberships are sent back to Cytoscape, a custom layout is created, nodes are color-coded by their membership assignments, and a custom node attribute storing the memberships is added. The user may browse the resultant community structure with the built-in GLay navigator panel.
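
A minimal sketch of this standardization step using the R interface of igraph (GLay itself calls the igraph C library through JNA); the helper name standardize_network is hypothetical, and function names assume a recent igraph release.

```r
# A minimal sketch (R igraph) of the standardization step: drop edge
# directionality, duplicate edges and self-loops before community detection.
library(igraph)

standardize_network <- function(g) {
  g <- as.undirected(g, mode = "collapse")      # remove edge directionality
  g <- simplify(g, remove.multiple = TRUE,      # remove duplicate edges
                   remove.loops = TRUE)         # remove self-loops
  g
}

g_std <- standardize_network(sample_gnp(100, 0.05, directed = TRUE))
ecount(g_std)
```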

Currently GLay implements the following community structure detection algorithms: edge betweenness89, fast greedy135,136, label propagation137, leading eigenvector138, spin glass139 and walktrap140. Because the algorithms use distinct heuristics, their scalability, running speed and resultant community structures vary. Some algorithms, such as the leading eigenvector algorithm, work well on a small network of a few hundred nodes but may not be suitable for very large networks; others are optimized for large datasets but may be less accurate. For example, the fast greedy algorithm may produce communities with a skewed community size distribution because of its greedy optimization of the modularity score135. Users may test different algorithms and evaluate their performance by various benchmarks such as modularity, number of communities and community size distribution. Empirical guidelines to help the user determine the optimal algorithm for a specific dataset are provided in the following sections.
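
A minimal sketch, using the R interface of the same igraph library (recent release assumed), of how such a benchmark might look: several of the ported algorithms are run on a simulated scale-free network and compared by modularity, number of communities and largest community size. The slower spin-glass and edge-betweenness algorithms are omitted for brevity.

```r
# A minimal sketch (R igraph) of benchmarking several community detection
# algorithms on a toy scale-free network.
library(igraph)

g <- sample_pa(2000, m = 2, directed = FALSE)   # simulated scale-free network

algorithms <- list(
  fast_greedy   = cluster_fast_greedy,
  label_prop    = cluster_label_prop,
  walktrap      = cluster_walktrap,
  leading_eigen = cluster_leading_eigen
)

for (name in names(algorithms)) {
  cm <- algorithms[[name]](g)
  cat(sprintf("%-14s Q=%.3f  communities=%3d  largest=%4d\n",
              name, modularity(cm), length(cm), max(sizes(cm))))
}
```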

xxii http://code.google.com/p/google-summer-of-code-2011-nrnb/

For layout algorithms, GLay offers the following: Fruchterman-Reingold141, graphopt (http://www.schmuhl.org/graphopt/), Kamada-Kawai (a force-based spring layout)142, Large Graph Layout (LGL)143, multidimensional scaling (MDS)144 and Reingold-Tilford (hierarchical and circular)145. These algorithms can efficiently lay out very large networks or generate characteristic views such as hierarchical or circular trees. A key advantage of GLay's layouts is that the layout calculation of any algorithm can be initiated from the current network layout state. This adds significant flexibility, since it enables the user to progressively refine the layout by fine-tuning parameters or by chaining different layout algorithms. For example, when calculating a force-based layout for a very large network, the user may specify a small number of iterations to obtain a draft layout, and then gradually refine this layout by adding more iterations, tuning the parameters or even combining multiple layout algorithms. Once done, the user may superimpose a community or partitioning structure on the resultant layout to investigate the network topology. Further details are available on the plugin homepage and in the igraph library documentation131.
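
A minimal sketch of the progressive-refinement idea using the R interface of igraph (GLay provides the equivalent capability inside Cytoscape): the coords argument restarts the force-based calculation from an existing layout. Function names assume a recent igraph release.

```r
# A minimal sketch (R igraph) of progressive layout refinement: a quick draft
# layout is reused as the starting coordinates of a longer, refined run.
library(igraph)

g <- sample_pa(2000, m = 2, directed = FALSE)

draft   <- layout_with_fr(g, niter = 50)                   # rough placement
refined <- layout_with_fr(g, coords = draft, niter = 500)  # refine from draft

plot(g, layout = refined, vertex.size = 2, vertex.label = NA)
```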

2.2.4 GLay Usage Cases

We have tested GLay on datasets of various scales and structures. GLay demonstrated substantial performance advantages over existing Cytoscape solutions in both network decomposition and layout generation. For example, subdividing the MiMI human interactome, which contains 11884 nodes and 88134 edges, takes 0.7 seconds with the label propagation algorithm and 20 seconds with the fast greedy algorithm on an Intel Core i7 machine with a 2.67 GHz CPU; MCODE takes around 198 seconds to find functional modules (clusters) on the same network. Generating a layout on this network using the Fruchterman-Reingold grid algorithm takes about 20 seconds, whereas the Cytoscape built-in force-directed and spring-embedded algorithms both failed during execution with default configurations and 1.5 GB of heap space. This demonstrates that the Java-C hybrid model has a dramatic performance advantage for processing large networks in Cytoscape. The hybrid model adds the cost of maintaining multiple code bases across platforms, but it can take advantage of high-performance statistics, analysis and numerical processing libraries.

Figure 2-1 shows the Fruchterman-Reingold grid layout applied to the Cytoscape built-in BIND human dataset, which consists of 17961 nodes and 30156 edges. It demonstrates the effectiveness of superimposing community structure on top of a force-based layout. The red circle indicates a group of highly interacting immunoglobulins that do not interact strongly with other proteins.

Users may navigate and explore communities of genes with the GLay browser. For example, clicking a community entry in the community browser table selects all nodes belonging to that community. The user can then create a new subnetwork or nested network (metanodes) from the selected nodes, extract gene lists from the attribute browser, or incorporate other experimental data for various research interests.

In addition, GLay can provide qualitatively different results from existing solutions. Figure 2-2 shows a side-by-side comparison of MCODE using default parameters and GLay using the fast greedy algorithm. With default parameters, MCODE produces much smaller clusters than GLay, leaving the majority of the nodes un-clustered. GLay therefore outperforms MCODE in terms of structural partitioning of the original network. Overall, GLay also has higher sensitivity than MCODE at the expense of specificity, which makes it more suitable for functional interpretation. For example, one MCODE cluster contains five genes, four of which function in the MAPK pathway; the equivalent GLay cluster contains 25 genes. Submitting these genes to DAVID93 reveals one enriched functional cluster for the MCODE cluster and nine enriched functional clusters for the GLay cluster. As some of the genes, such as cdc28 and ste12, are involved in multiple regulatory processes, the GLay cluster recovered more biologically relevant information than the equivalent MCODE cluster. Because GLay contains a variety of community detection algorithms, the user may also evaluate their performance using prior knowledge of their datasets and the inferred functional modules.

In summary, GLay capitalizes on highly optimized C implementations of several social network analysis and network layout algorithms to improve the scalability of Cytoscape for large networks. GLay helps address the increasing need for analysis and visualization of large-scale networks, and the proposed Java-C hybrid model could also benefit the Cytoscape plugin developer community.

2.3 Comparison of Different Community Structures

Although many community detection algorithms have been proposed, very little work has been done to evaluate their actual performance. Most of these algorithms rely on optimizing the modularity score proposed by Newman89; however, this single measure is sometimes insufficient to judge the resultant community structures, as has been demonstrated129,130. There has been some previous research on the evaluation of community structure algorithms146. We therefore also ventured into this problem and proposed some empirical evaluation guidelines.

As mentioned in previous sections, the performance of community structure algorithms relies heavily on the topology of the input data and on the algorithmic heuristics. Although a community detection algorithm can generate a partition on any input network, networks lacking the small-world feature (such as an Erdos-Renyi random network, in which the probability of an edge connecting any pair of nodes is equal) will not produce meaningful community structures. Occasionally the resultant clusters may also arise merely as artifacts of the algorithms135. The challenge is that there is no good 'gold standard' for benchmarking algorithm performance: the frequently used karate club network is far too small for today's practical uses. It therefore usually falls to the expertise, knowledge and discretion of the analyst to determine the optimal algorithm for a given dataset129.
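
In the absence of a large gold standard, the agreement between two partitions can still be quantified. A minimal R igraph sketch on the small Zachary karate club benchmark mentioned above, comparing two algorithms by modularity and by the Adjusted Rand Index (function names assume a recent igraph release):

```r
# A minimal sketch (R igraph): two algorithms on Zachary's karate club, compared
# by modularity and by the agreement of their partitions (Adjusted Rand Index).
library(igraph)

g  <- make_graph("Zachary")
c1 <- cluster_fast_greedy(g)
c2 <- cluster_walktrap(g)

cat("Q(fast greedy) =", round(modularity(c1), 3), "\n")
cat("Q(walktrap)    =", round(modularity(c2), 3), "\n")
cat("ARI            =",
    round(compare(membership(c1), membership(c2), method = "adjusted.rand"), 3), "\n")
```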

2.3.1 Materials and Methods

A Boolean rat interaction network (rat interactome) was constructed from MiMI protein-protein interaction data, with duplicate edges and edge directionality removed. The largest connected component of this network contains 3664 genes and 39150 interactions. The interactome exhibits the typical power-law node degree distribution of a scale-free network (fitted alpha = 1.608).

We analyzed the following community structure detection algorithms used in GLay from the igraph package: leading eigenvector (LE), spin glass (SG), walktrap (WT), fast greedy (FG) and label propagation (LP). The Clauset-Newman-Moore algorithm with the Wakita-Tsurumi community-size heuristic (CNM-WT, referred to as KWT below) was also evaluated, using the executables provided by the authors. We proposed a set of criteria, including modularity, scalability, reproducibility, robustness and interpretability, as quality measures; these are discussed in detail in the Results section.

The performance of the algorithms was also assessed by mapping communities to biological annotation data using standard enrichment analysis. We calculated the total numbers of enriched KEGG pathways and GO Biological Process (BP) terms using Fisher's exact test with the R SubpathwayMiner and topGO packages, respectively. All parameters were set to the package-recommended values. The cut-off for KEGG pathway enrichment is FDR < 0.05; the cut-off for GO enrichment is p-value < 0.01. Gene IDs were also permuted 100 times to compute the mean and standard deviation of the total number of matches under the null distribution. All analyses were run on the Depression Center cluster at the University of Michigan, Ann Arbor.
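
A minimal sketch of the underlying test in base R: Fisher's exact test of one community against one gene set, plus a gene-label permutation for the null distribution. The gene sets are placeholders, and the SubpathwayMiner/topGO wrappers used in practice are not reproduced here.

```r
# A minimal sketch (base R) of community enrichment with a permutation null.
enrich_p <- function(community, geneset, universe) {
  a <- length(intersect(community, geneset))
  b <- length(setdiff(community, geneset))
  c <- length(setdiff(geneset, community))
  d <- length(universe) - a - b - c
  fisher.test(matrix(c(a, b, c, d), nrow = 2), alternative = "greater")$p.value
}

set.seed(1)
universe  <- paste0("gene", 1:3000)
geneset   <- sample(universe, 60)                         # placeholder pathway
community <- unique(c(sample(geneset, 15),
                      sample(setdiff(universe, geneset), 85)))

obs  <- enrich_p(community, geneset, universe)
null <- replicate(100, enrich_p(sample(universe, length(community)),
                                geneset, universe))       # permuted gene labels
cat("observed p =", signif(obs, 3), "  mean permuted p =", signif(mean(null), 3), "\n")
```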

2.3.2 Algorithm Evaluation Criteria

To choose the optimal community detection method for our dataset, we proposed the following criteria for performance evaluation. Table 2-1 summarizes the comparison of these algorithms.

• Modularity. The algorithm must be capable of producing a community structure significantly different from random partitioning, as measured by the modularity score89. Most algorithms perform reasonably well on our dataset because the topology of the rat interactome (for example, the power-law fit of its degree distribution) is significantly different from that of a random Erdos-Renyi network, which has an expected modularity score of 0. We also compared the agreement between the community structures produced by different algorithms. The results showed that even though the differences in modularity scores are small (Table 2-1, modularity ranges from 0.65 to 0.74), the community structures are very dissimilar. For example, the modularity score is almost identical (0.71) for EB and LP, but the agreement between the two resultant community structures, measured by the Adjusted Rand Index (ARI, which ranges from 0 to 1, with 1 for a perfect match and 0 for a comparison of two different random partitions)147, is only 0.15. This demonstrates that optimizing the modularity score alone is not sufficient, as there may be a large number of dissimilar solutions that produce similar modularity scores.

• Scalability. The summary of running times shows that even though these algorithms produce community structures with very similar modularity scores, their running times vary dramatically. An algorithm such as LP, which can partition a network in near-linear time, takes less than 1 second to finish, whereas a single run of SG takes more than 10 minutes, and the original edge-betweenness algorithm requires more than 8 hours of running time. As the size and complexity of biological networks grow rapidly, scalable algorithms such as FG, KWT, LP and WT are more favorable for quick, draft partitioning of very large networks.

• Reproducibility. Some non-deterministic algorithms, such as SG and LP, produce a different community structure in each execution because of their stochastic nature, even though these community structures have similar modularity scores. The researcher needs to run such an algorithm repeatedly to find the best solution or to generate a 'mean' community structure. We estimated the dispersion148 of SG and LP, a score between 0 and 1 representing how often two nodes are consistently placed in the same community over repeated runs. After 100 repeated runs, the dispersion is 0.82 for SG and 0.93 for LP, which demonstrates that communities from LP are much more reproducible and thus easier to replicate.

• Robustness. As real-world datasets are always incomplete and error-prone, an algorithm must be sufficiently robust to cope with noise. To evaluate robustness, the rat interactome was perturbed by randomly adding (ad), removing (rm) or rewiring (rw: edges were randomly broken and reconnected while maintaining the degree distribution of the original network) a certain number of edges. As we do not have a prior probability distribution of edge error rates, we assumed it to be uniform across the entire interactome and perturbed all edges with equal probability. The perturbed network is then analyzed with the same procedure, and the resulting community structure is compared with that of the original dataset using ARI. The network is perturbed 10 times for any given number of edges subject to change, and the average ARI and dispersion scores are computed (a minimal sketch of this perturbation test appears after this list). Figure 2-3 shows the 'deterioration curves' of the ARI and dispersion scores generated by the six algorithms on the perturbed networks. ARI and dispersion for FG decrease rapidly even with only a small number of perturbations, because of FG's greedy optimization of the modularity score. KWT, an improved version of FG with community size balancing, did show a better dispersion score, which implies it is more resistant to noise. LP and SG show no signs of 'deterioration' because of their stochastic nature, and consistent with the reproducibility analysis above, LP outperforms SG in both the ARI and dispersion curves. WT outperforms all the other algorithms in both ARI and dispersion; it is capable of producing very stable communities with about 1% of the edges in the network perturbed.

• Interpretability. It has been shown that some algorithms, such as FG, despite their outstanding scalability, tend to produce community structures with artefacts because of their heuristics135. Some of these artefacts do not severely undermine the modularity score but can make the results difficult to interpret. We evaluated the 'balance' of the community size distribution by its skewness and kurtosis, as shown in Table 2-1; large values of skewness and kurtosis indicate an unbalanced community size distribution. For example, FG, LP and WT tend to produce communities with skewed size distributions, with a large number of singletons and very small communities alongside several very large communities. WT produces 541 communities, 438 of which contain fewer than five nodes, which can also inflate ARI, dispersion and enrichment analyses and makes it less suitable for 'draft' partitioning of a large network for further investigation. KWT outperforms all other algorithms here, as it is designed to account for unbalanced community size distributions.
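
A minimal R igraph sketch of the rewiring ('rw') variant of this perturbation test, under the assumption that a degree-preserving rewire approximates the procedure described above; agreement with the original partition is measured by the Adjusted Rand Index, and function names assume a recent igraph release.

```r
# A minimal sketch (R igraph) of degree-preserving rewiring and ARI agreement.
library(igraph)

set.seed(1)
g  <- sample_pa(1000, m = 3, directed = FALSE)
c0 <- membership(cluster_walktrap(g))

for (frac in c(0.01, 0.05, 0.10)) {
  g_pert <- rewire(g, with = keeping_degseq(niter = round(frac * ecount(g))))
  c1 <- membership(cluster_walktrap(g_pert))
  cat(sprintf("~%2.0f%% edges rewired: ARI = %.3f\n",
              100 * frac, compare(c0, c1, method = "adjusted.rand")))
}
```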

From these analyses we can see that the choice of algorithm depends on the data and the research interest, and each algorithm can be ranked against the proposed criteria. For very large networks, scalability is the utmost concern, and FG, KWT, LP and WT are good choices. If the researcher prefers a rough cut of the network with a relatively small number of communities, FG, KWT and SG are favorable. If the network has high variability (such as a correlation network generated from a gene expression dataset), then a robust algorithm such as WT, or a non-deterministic algorithm such as LP or SG, will work better. In addition, if prior knowledge is available, the researcher could pick the algorithm that produces approximately the desired number of communities, or communities whose sizes for known genes or proteins approximate those of annotated pathways or network modules.

2.3.3 Algorithm Evaluation and Biological Inferences

We performed enrichment analysis of the communities from FG, KWT, LE, LP, SG and WT using KEGG pathways and GO Biological Process (BP) terms. The total numbers of enriched pathways and terms were counted for each algorithm's result. For LP and SG the analysis was repeated 50 times to compute the mean and standard deviation. In order to assess significance, gene IDs were permuted to generate a random network, and the resulting communities were analyzed using the same procedure.

Table 2-2 shows that the communities from each algorithm capture a much larger number of enriched KEGG pathways and GO BP terms than the permuted network. Some algorithms, such as LP and WT, captured many more enriched pathways and GO terms; this could result from the larger number of smaller communities they produce, as their permuted counterparts also captured more enriched terms than those of the other algorithms. In this case, even though the enrichment is very high, it is very difficult to interpret, so other algorithms such as KWT may be favored. The performances of FG, KWT, LE and SG are otherwise very close in terms of biological inference. Nevertheless, partitioning the interaction network reveals densely interacting modules for further functional interpretation.

2.4 GSearcher

A biological network, in which genes and proteins are modeled as nodes and interactions are represented as edges, can be associated with various attribute data such as gene annotations or expression profiles. As stated earlier, one challenge for network exploration is effective and efficient searching. As networks grow in size and attribute data accumulate, highly flexible and scalable search solutions become increasingly necessary.

Cytoscape has a lightweight but powerful tabulated attribute data structure that stores annotation data linked with the loaded networks. It provides some internal search functions, such as Quick Find and Filters, but these are often insufficient for quickly filtering a subset of nodes based on given criteria. The Cytoscape Enhanced Search Plugin (ESP) was developed to add search capabilities using the Java Lucene text search engine149, and the added fuzziness dramatically improved search flexibility and accuracy. However, the major drawback of these search solutions is that they still use a conventional submit-and-wait mechanism: the user supplies a query, hits the 'Enter' key to begin the search, and then waits for the results to appear in the default attribute browser. This submit-wait cycle must be repeated to compare different queries, to correct errors and to progressively improve a query. It not only creates the perception of search slowness, by forcing the user to wait for complete results from unsatisfactory preliminary searches, but also interrupts coherent thought processes. Many modern search interfaces, such as iTunes and Google search, update the search results instantly as the user types, without waiting for the user to press 'Enter'. This interactive auto-completion and live-feedback model enables the user to complete a query from live feedback, dramatically improving the efficiency of searching and the aesthetic appeal of interacting with the software.

Current Cytoscape search tools also have some issues that undermine the efficiency and accuracy of searching. First, all these tools use the default attribute browser to display results, which may collide with other plugins (such as MCODE) or functions that also use the default attribute browser to display data. Second, these tools only select the nodes or edges matching the search criteria, without indicating where in the attribute table the matches occur; the user can only locate the matches by manual scanning, which can be very difficult when browsing the results of a fuzzy search in a very large attribute table with many columns. Third, it is difficult for the user to compare matching and non-matching attributes, as non-matching attributes are always hidden by current tools; comparing matching and non-matching entries would also help the user refine the search criteria. Finally, all these search methods support only a subset of fuzzy matching rules. For example, wildcards are currently not allowed at the beginning of a query, which prevents suffix-initiated searching. GSearcher addresses these issues by providing a fully interactive, highly flexible search interface that supports full Java regular expressions (regex).

2.4.1 Design and Implementation

GSearcher is built on JDK 6. We used the GlazedListsxxiii library as the underlying data model. Upon initialization, the current network's attribute data are transformed into a GlazedLists table model for high-performance searching, sorting and updating. This transformation is very efficient: GSearcher takes only 792 ms to transform a Michigan Molecular Interaction (MiMI)35 human interaction network of 11884 nodes and 88134 edges with 21 attribute fields on a 2.67 GHz Intel Core i7 920 PC. In comparison, ESP takes about 4 s for indexing on the same computer. Numerical primitive data types (Double, Float, Integer) are preserved; Hash/Array attributes are flattened into Strings. Subsequent searches on the same network do not require table model reloading (re-indexing).

In order to provide interactive feedback and result sorting, browsing and highlighting, we built GSearcher's independent data browser using JXTablexxiv. This browser listens to user input and updates search results interactively, independently of the default data browser. Attribute entries that match the query are highlighted in the table. The user may either remove non-matching attribute rows from the browser, or keep them to compare with the matching rows. As in the default browser, the selected rows in the result table are dynamically linked to the network view, but the user may now either 'select' or 'highlight' nodes or edges. When nodes/edges are highlighted, the selection state is preserved so that searches can be performed with minimal interference to other Cytoscape functions (Figure 2-4). Rows in the browser can be sorted with a single click on the column header, either numerically or alphabetically depending on the data type. Undesired attributes can be removed from the view and from the search pool by hiding the corresponding browser table columns.

xxiii http://publicobject.com/glazedlists xxiv http://swinglabs.org/

2.4.2 Search Capabilities

Basic Search: Quick search mode enables the user to apply a single query on all or selected attribute fields. There are currently six different matching modes:

• Terms Anywhere (M): allows a match to occur anywhere in the attribute table. Multiple keywords can be submitted, separated by spaces. Unlike ESP, the default operator for joining multiple terms is AND, which is more similar to typical online searching.
• Begins with Phrase: only matches a phrase at the beginning of an attribute.
• Reg Exp: the query term is treated as a Java regular expression.
• Exact: the query term must match an attribute perfectly.
• Phrase: the query is treated as a phrase with spaces preserved.
• Exclude Phrase: the query is treated as a phrase, and attributes that do not contain the phrase are highlighted.

GSearcher provides some search functions that were unavailable in Cytoscape version 2.6.x or lower. For example, querying 'nuclease' on the MiMI network using 'Terms Anywhere (M)' returns 116 matches, while ESP returns only 13 matches: 'endonuclease' and 'ribonuclease' were left out by ESP because suffix matching is not allowed. Using regular expressions, the user can build even more flexible rules. Using 'CDC\d+' as a regular expression query will only match attributes beginning with 'CDC' followed by a number, such as 'CDC16'; the ESP syntax only allows 'CDC*', in which the wildcard cannot be refined to represent a specific set of characters. 'biological[\_\-\s]+function' will match not only 'biological_function' and 'biological-function', but also 'biological function', which allows fuzzy matching to span separators and spaces. '(?<!ATP)binding.*' will match a binding term not preceded by ATP, such as 'RNA binding'. Therefore, by incorporating regular expressions, GSearcher substantially supplements the current Cytoscape search functions. A comprehensive reference for regular expressions can be found in the Sun Java tutorialxxv.
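
The same Java/PCRE-style syntax is available in R through Perl-compatible regular expressions, so the behaviour of these queries can be sketched outside Cytoscape; the attribute values below are invented, and the lookbehind form expresses 'binding not preceded by ATP'.

```r
# A minimal sketch (base R) of the regular-expression queries on invented
# attribute values; perl = TRUE enables the Java/PCRE-style syntax.
attrs <- c("CDC16", "CDC-like kinase", "endonuclease", "ribonuclease",
           "ATP binding", "RNA binding", "biological_function")

attrs[grepl("CDC\\d+", attrs, perl = TRUE)]            # 'CDC' followed by a number
attrs[grepl("nuclease", attrs, perl = TRUE)]           # suffix matches included
attrs[grepl("(?<!ATP )binding", attrs, perl = TRUE)]   # 'binding' not preceded by 'ATP '
```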

Advanced Search: While Quick Search applies the same search criteria to one or multiple attributes, 'Advanced search' combines an arbitrary number of Quick Search filters that can be cast on different attributes. Filters can be joined either with AND, which requires all filters to be satisfied, or OR, which requires at least one filter to be satisfied (a minimal sketch of combining such filters follows the list below). There are currently three types of filters:

• Text filter: each text filter is one implementation of Quick Search.
• Threshold filter: compares a numerical attribute against a threshold value, using numerical comparison operators (such as >).
• Range filter: tests whether a numerical attribute falls within a specified range.
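
As an illustration of how such filters combine, a minimal base-R analogue applies a regular-expression text filter and a numerical threshold filter to an invented node attribute table and joins them with AND or OR.

```r
# A minimal sketch (base R) of combining a text filter and a threshold filter
# on an invented node attribute table.
nodes <- data.frame(
  name   = c("CDC16", "TP53", "CCNB1", "GAPDH"),
  goTerm = c("cell cycle", "apoptosis", "cell cycle", "glycolysis"),
  logFC  = c(1.8, -0.3, 2.4, 0.1)
)

text_filter      <- grepl("cell cycle", nodes$goTerm)   # text filter
threshold_filter <- nodes$logFC > 1                     # threshold filter

nodes[text_filter & threshold_filter, ]   # AND: both filters must be satisfied
nodes[text_filter | threshold_filter, ]   # OR: at least one filter is satisfied
```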

The combination of filters offers users great flexibility when querying the network. The call-out box in Figure 2-4 also illustrates an example of combining a threshold filter and a regular expression filter.

xxv http://docs.oracle.com/javase/tutorial/essential/regex/

To conclude, GSearcher provides Cytoscape users with an interactive interface and full regular expression support for building complex queries. Its improved flexibility and interactivity supplement the current Cytoscape search functions, help researchers navigate large attribute datasets, and facilitate exploratory analysis of biological networks.

2.5 Other Related Tools

I have also developed and contributed to several other tools for network exploration. My Java implementation of the fast-greedy community detection algorithm and its derivatives, balanced for community size distribution, was incorporated into clusterMaker92. I co-mentored a Google Summer of Code (GSoC) project in 2009 for a phylogenetic tree layout plugin for Cytoscapexxvi, and mentored a GSoC 2011 project for a Cytoscape igraph plugin (similar to GLay) on Mac OSXxxvii.

Another plugin I developed is NodeFilter, for network expansion and traversal. In the previous section, we discussed attribute-based filtering of Cytoscape networks. As networks grow larger, a very common task is to select a subset of nodes by certain criteria in order to pinpoint the nodes or edges of interest. In certain scenarios it is necessary to filter nodes based on the structure of the network: for example, a researcher may need to find all the first or second neighbors of a certain node, or extend the current network by fetching all nodes connected to a node of interest from external databases such as MiMI. One particular use of NodeFilter (Figure 2-5) is to selectively hide the 'hub' nodes in a scale-free network when they become obtrusive to network exploration. NodeFilter provides a series of tools for the user to conveniently filter nodes based on their connectivity with other nodes. After an initial selection, the user is able to show only the first neighbors of a certain set of nodes. This set of nodes is called the 'anchors', and they will not be visible in subsequent operations unless the user releases them. The user can then progressively unhide other nodes to extend the current network, or fetch data remotely to grow the current network from the MBNI Brainarray concept mapping.

xxvi http://apps.cytoscape.org/apps/with_author/Chinmoy%20Bhatiya xxvii http://apps.cytoscape.org/apps/igraphplugin
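
A minimal sketch, using the R interface of igraph, of the neighbor-based selection that NodeFilter performs inside Cytoscape: first- and second-order neighborhoods of a node of interest are collected, and obtrusive hub nodes can be dropped. Function names assume a recent igraph release, and the degree cutoff is arbitrary.

```r
# A minimal sketch (R igraph) of neighbor-based selection and hub hiding.
library(igraph)

g <- sample_pa(500, m = 2, directed = FALSE)

first_nb  <- ego(g, order = 1, nodes = 1)[[1]]   # node 1 plus its first neighbors
second_nb <- ego(g, order = 2, nodes = 1)[[1]]   # up to second neighbors

hubs   <- V(g)[degree(g) > 20]                   # obtrusive hub nodes
g_trim <- delete_vertices(g, hubs)               # hide hubs from the view
```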

2.6 Summary

Thanks to the continuous development of open-source software like Cytoscape and strong community support, visual exploration of omics networks has remarkably facilitated data-driven hypothesis generation. Not only can the structure of the networks be clearly visualized, but a variety of attribute data, such as Gene Ontology (GO) terms, gene families, pathways and experimental measurements such as gene expression profiles from cross-omics studies or time-series experiments, can also be superimposed on a network visualization. Nevertheless, the ongoing omics era will generate even bigger, more complex data with even more dimensions. Tools like GLay and GSearcher have demonstrated their effectiveness in facilitating the clustering and filtering of large biological networks.

One challenge yet to be addressed is the visualization model. With the expansion of data, simply adding data points to the current view, such as increasing the number of nodes or edges of a network or the number of rows or columns in a heatmap, will eventually exhaust screen real estate and resources. As stated in Chapter 1, one strategy is to add hierarchies to the data view. At the time of writing, the latest releases of Cytoscape (2.8.x and the 3.x beta) both permit nested networks: a network can be nested within a node to generate a 'network of networks'. This is especially useful, for example, to collapse genes that share the same GO term into a single node representing a functional group. A similar idea can be extended to other visual exploration models, which will be discussed in later chapters.

Table 2-1 Community Algorithm performance comparison.

Each algorithm was run 50 times. The standard deviations of modularity, skewness, kurtosis and the number of communities are provided for the non-deterministic algorithms (LP and SG).

Algorithm   Running Time (Secs)   Modularity   Skewness   Kurtosis   Number of Communities

FG 2.88±0.16 0.71 3.16 10.68 38

KWT 0.17±0.06 0.66 0.50 -1.10 23

LE 20.81±0.07 0.65 2.76 8.16 56

LP 0.14±0.06 0.71±0.01 7.18±1.72 64.6±28.38 175±8.75

SG 739.23±65.5 0.74±0.13 2.94±0.53 9.49±3.09 24±0.80

WT 2.50±0.09 0.70 9.45 106.45 541

Table 2-2 KEGG pathway and GO biological process (BP) enrichment for communities.

Gene IDs were permuted 100 times for each algorithm to obtain the average totals; the permuted columns show the numbers of enriched KEGG pathways and GO terms expected under the null distribution.

Algorithm   Total enriched KEGG pathways (Original)   Total enriched KEGG pathways (Permuted)   Total enriched GO terms (Original)   Total enriched GO terms (Permuted)

FG 139 3.76±4.05 1317 157.78±36.79

KWT 141 3.02±2.48 1321 140.3±29.19

LE 137 3.25±3.16 1310 222.24±39.71

LP 292±26 17.10±9.57 4015±232 580.49±65.71

SG 166±9 3.68±4.24 1544±63 145.9±28.87

WT 291 12.94±10.99 5372 1605.66±124.69

Figure 2-1 GLay result illustrations.

The top figure illustrates the community structure on the galFiltered network shipped with Cytoscape106. The bottom figure illustrates the fast-greedy community structure superimposed on the Fruchterman-Reingold grid layout of the largest component of the Cytoscape human BIND dataset, which consists of 17961 nodes and 30156 edges. Note that nodes belonging to the same community tend to aggregate spatially, resulting in clusters with good visual separation. The red circle indicates a group of highly interacting immunoglobulins.

Figure 2-2 Comparison of GLay and MCODE results.

Comparison between clusters produced by MCODE with default parameters and GLay using the fast greedy algorithm on the Cytoscape-bundled galFiltered dataset. Node color is determined by cluster membership. Left: MCODE clusters; un-clustered genes are hidden from view. Right: GLay fast-greedy clusters. (A) An MCODE cluster in which four out of five genes are associated with the MAPK pathway. The corresponding GLay cluster contains 25 genes, including more genes in the MAPK pathway, cell cycle and ion binding. (B) A GLay cluster not identified by MCODE. This cluster consists of six genes, four of which are related to RNA processing.

Figure 2-3 Robustness test on community algorithms.

These two figures demonstrate the robustness of the algorithms when the rat interactome network is perturbed by adding (ad), removing (rm) or rewiring (rw) a certain number of edges, measured by ARI (agreement with the original communities) and dispersion (reliability of the community structure).

Figure 2-4 GSearcher result illustration.

A screenshot of GSearcher on the MiMI human interactome. A subnetwork is created from nodes whose attributes contain the keyword 'cyclin'. Nodes can be dynamically selected from the GSearcher browser using the conditions in the two illustrated filters; these nodes are highlighted in green. In contrast, previous selections made by the user or by other plugins are highlighted in yellow, demonstrating how GSearcher interacts with other plugins with minimal interference.

Figure 2-5 Node Filter result illustration.

Illustration of NodeFilter's major functions. The left panel displays the search results, and the user may select or filter out nodes of interest. The popup panel allows the user to query the database and extend the current network from the selected node (yellow). Green highlights one of the nodes marked by NodeFilter. NodeFilter works seamlessly with the built-in Node Attribute Explorer and other Cytoscape plugins.

Chapter 3 Cross-Omics Knowledge-mining

3.1 Introduction

As stated in Chapter 1, the 'omics boom' has made high-throughput integrative data analysis and biomarker discovery possible. As each omics data type captures one aspect of the underlying biology, pooling the pieces of the jigsaw puzzle from different omics measurements can potentially provide a more complete picture of the underlying mechanisms than studying each omics dataset alone. Because of its relatively low cost and time requirements, integrated transcriptomics-metabolomics analysis has become one of the most popular cross-omics paradigms.

Overall, metabolites are transient, intermediate and end products of cellular processes; the metabolite profile therefore provides a snapshot of the physiological state of a cell complementary to its transcriptome and proteome, which reflect the functional status of more upstream events. The metabolome is about 1-2 orders of magnitude smaller than the transcriptome and the proteome, and the steady-state concentration of a metabolite usually reflects the combined effects of multiple upstream factors such as environmental stimuli, nutrient availability and genomic structure. Conceivably, this property may make metabolites better biomarkers for certain conditions, as they respond rapidly to physiological changes. In addition, identifying relationships between metabolites and genome structure (genomics), gene expression (transcriptomics) and protein expression (proteomics) can potentially help elucidate the molecular mechanisms involved in a variety of pathophysiological processes, such as cancer and diabetes.

Studies of gene expression profiles and proteomics data have revealed significant molecular signatures, and it is generally accepted that meta-analysis of pooled omics data would notably improve our knowledge of biological pathways and processes. The quantitative study of gene-to-metabolite associations may not only complement qualitative knowledge of certain metabolic reactions, but also unravel novel biological processes. These associations could also potentially help us identify unknown metabolites by reducing the search space, classify and validate unknown cell lines at the metabolomics level, and serve as biomarkers.

Previous studies have investigated significant associations between the transcriptome and metabolome, which add another layer of quantitative inference to gene-wise correlation. Ferrara et al. coupled metabolic and transcriptional profiling to construct causal networks that demonstrate fluctuations of gene expression in response to changes in metabolite availability150. Xu et al. performed integrated pathway analysis of rat urine metabolic profiles and kidney transcriptomic profiles to study the mechanisms underlying the toxicology of model nephrotoxicants151. Nam et al. demonstrated that the integrative study of transcriptomics and metabolomics could effectively identify metabolic biomarkers for breast cancer152. Some online databases, such as the Cornell Tomato Functional Genomics Database (TFGD)153, provide quick-access portals to query correlations between selected metabolites and gene expression profiles for the plant sciences. Nevertheless, the noisy nature of metabolome data, the limited number of metabolites with known structures, the scarcity of metabolite-gene interaction annotations (despite databases such as the Edinburgh Human Metabolic Network (EHMN)46 and BiGG154), the indirect nature of potential gene-metabolite relationships, and the lack of large-scale metabolome study data in the public domain still limit the knowledge mining of metabolites and their regulation in normal and disease processes.

Conceivably, cancer samples provide excellent opportunities for identifying metabolic biomarkers and gene-metabolome relationships, owing to the dramatic functional alterations at the molecular level in cancer tissues. For example, cancer tissues usually exhibit more than 10-fold changes in the expression level of many genes in numerous microarray studies. Recently, the collaborative NCI-60 project from the Developmental Therapeutics Program (DTP) of the National Cancer Institute (NCI) has made extensive measurements of various omics data publicly available, including microarray, metabolomics, proteomics and epigenetics data. While the number of metabolome-related studies and cancer biomarker discoveries is increasing rapidly in targeted studies, at the time of this analysis there was still no literature comprehensively analyzing metabolic features and their regulatory mechanisms across multiple omics data types for different cancer types.

There are three major challenges in metabolome analysis. First, the number of clearly characterized metabolites is much smaller (a few hundred to a few thousand) than for the transcriptome (tens of thousands), and many metabolites are involved in multiple processes, which makes it difficult to establish clear-cut associations between a metabolite and other omics profiles. Second, metabolites are small interchangeable molecules that fluctuate rapidly in response to various external signals. Measurement of metabolites is therefore more susceptible to experimental design and conditions, and the resultant metabolic profiles are more likely to be non-normally distributed, multi-modal or contaminated with outliers. Some classic analytical methods, such as Pearson correlation (PCC), principal component analysis (PCA) or linear regression models, may fail to work properly; it has been demonstrated that using robust correlation measures instead of PCC or Spearman correlation can improve the estimates155,156. Third, it is more difficult to validate significant findings in the metabolome than in the transcriptome or proteome. The annotated data for gene-wise and protein-protein associations accumulated from literature mining and molecular interaction databases are much more comprehensive for the transcriptome/proteome than for the metabolome. Although some databases, such as EHMN, have compiled gene-to-metabolite relationships from metabolic pathways, the scale and scope of such data are still quite small46. Moreover, some metabolite-to-gene associations may not be attributable to known primary metabolic reactions, but may instead arise via much more subtle and unknown processes. Systematic validation pipelines are also not readily available for the metabolome, whereas for the transcriptome/proteome there is a rich collection of toolkits available to researchers (GSEA94, DAVID93, etc.).

In this cross-omics study, we investigated whether different cancer cell lines have distinct metabolic signatures and whether the available metabolome data for the NCI-60 cell lines could be suitable for cancer subtype classification. We also attempted to identify distinct gene-metabolite relationships in cancer cell lines, where gene expression or genomic structure changes may be associated with synergistic fluctuations in metabolic profiles. Our expectation was that combined metabolome and transcriptome analysis would reveal abnormal regulatory relationships in some of the NCI-60 cell lines that would not be detectable by either metabolome or transcriptome study alone. Such abnormal regulatory relationships would be a starting point for follow-up wet-lab studies of the metabolic and signal transduction pathways involved in cancer genesis. We also proposed a visualization method for bidirectional hierarchical heatmaps that eventually led to the development of CoolMap, described in Chapter 4.

3.2 The analysis of NCI-60 Dataset

3.2.1 Materials and Methods

Data preprocessing: raw molecular datasets were downloaded from the DTP web portal (March 2007 release); 57 cell lines in 9 cancer types have both microarray and metabolomics data. We used our in-house Entrez-based custom CDF version 12 to derive gene-level expression data from the NCI-60 Affymetrix GeneChip CEL files157. The metabolite data, averaged over triplicate experiments, were manually compared with a reference dataset to identify and annotate imputed values (Beecher, unpublished data), which account for 33.9% of the raw dataset. Because imputed data could significantly bias the inferences drawn from subsequent statistical analysis, they were excluded. All metabolite names were manually compared with the KEGG metabolite database and assigned a KEGG compound ID whenever possible. The preprocessing step produced 11961 and 6089 gene expression profiles from the U133A and U133B chips, respectively, and quantitative data for 124 known and 218 unknown compounds.

Statistical analysis: all statistical analyses were performed in Rxxviii. The matrix RV coefficient was computed using the FactoMineR package158. Classification and variable selection were performed with the R randomForestxxix and varSelRF packages87. Robust correlations were computed with the robust packagexxx, using the pair-wise Quadrant Correlation (pairwiseQC) estimator. Multidimensional outliers were identified with the mvoutlier package114. The optimal robust correlation estimator was selected using the methods described in previous sections. We ran our computations on the MBNI cluster of ~100 nodes and loaded all results into our Oracle database server for integrative analysis across multiple datasets.

xxviii http://www.r-project.org/ xxix http://cran.r-project.org/web/packages/randomForest/index.html xxx http://cran.r-project.org/web/packages/robust/index.html

Resources used for validating significant correlations: the annotated gene-to-metabolite relationship data kindly provided by the EHMN project were used to identify biologically relevant known gene-metabolite relationships revealed by our analysis. We also compiled local versions of KEGG and DAVID 2008 for further validation. In order to associate known mutations and CNVs in the NCI-60 cell lines with abnormal gene-metabolite relationships, we built local copies of the Sanger COSMIC159 dataset and of Tumorscape160 from the Broad Institute for cell-line-specific point mutations and copy number variations (CNVs), respectively.

3.2.2 Results

3.2.2.1 Metabolomic Signature of Cancer Cells from Different Tissue Origins

The first question we would like to address is whether cancer cell lines exhibit tissue-origin-specific metabolic signatures. In our initial analysis, we tried classification analyses similar to those performed for microarrays161 to classify the 57 cell lines into 9 cancer types from their metabolite profiles. However, regardless of the method used, the classification error on metabolomics data is much higher than that from microarray data (~0.51 for metabolomics and ~0.34 for microarray). Several factors could have contributed to this result. First, it has been suggested that some of the cell lines may have been assigned wrong labels (unofficial discussions with Beecher). Some cancer samples, such as breast, contain a mixture of different tissue types with high intra-class heterogeneity. In addition, some cancer classes have very few samples; prostate cancer has only two samples, which is insufficient to learn adequate features for separating it from the other cell lines. Moreover, the high variability of metabolic profiles may also contribute to the large classification error. The matrix RV coefficient (RVC) estimates the correlation between two data matrices. The RVC between the U133A and U133B chips is 0.91, which demonstrates high concordance between different microarray measurements, whereas the RVC between U133A and metabolomics is 0.11 and between U133B and metabolomics is 0.09. This implies that there could be strong disagreement between the transcriptome and the metabolome (disagreement also occurs between the transcriptome and the proteome40, and even between microarray and RNA-Seq14, but it is much weaker), which may also elevate the classification error.

As the metabolite profiles were unable to correctly classify the entire set of cell lines, we further inspected whether these classifiers could perform well on a subset of samples. We progressively removed the cancer class that contributed most to the out-of-bag (OOB) error estimate and recomputed the classification errors. After removing Prostate, Breast, Ovarian and CNS, the metabolite classifiers reach performance comparable to the microarray classifiers (Figure 3-1). This indicates that the disagreement between metabolome and transcriptome may arise from a subset of samples with high variability or heterogeneity. The metabolome, if mined with robust methods, still captures many functional features of the cell, and therefore the association between metabolome and transcriptome should provide new insights into the underlying biological processes.
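
A minimal R sketch of the two computations above, with simulated stand-ins for the microarray and metabolomics matrices; the randomForest and FactoMineR (coeffRV) calls mirror the packages named in the Methods, and the class labels are invented.

```r
# A minimal sketch (R): random forest classification with out-of-bag error and
# the matrix RV coefficient between two data matrices. Data are simulated.
library(randomForest)
library(FactoMineR)

set.seed(1)
expr  <- matrix(rnorm(57 * 200), nrow = 57)   # 57 cell lines x 200 genes
metab <- matrix(rnorm(57 * 120), nrow = 57)   # 57 cell lines x 120 metabolites
type  <- factor(sample(paste0("class", 1:9), 57, replace = TRUE))

rf <- randomForest(x = metab, y = type, ntree = 500)
rf$err.rate[nrow(rf$err.rate), "OOB"]         # out-of-bag classification error

coeffRV(expr, metab)$rv                       # matrix RV coefficient (RVC)
```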

3.2.2.2 Correlation Analysis

Although the limited number of cell lines for each cancer type and the noisy nature of metabolome data prevent the use of metabolome data alone as a cancer cell line classifier, we hypothesized that different cell lines should share some basic regulatory and metabolic processes essential for cell growth and metabolism. It is likely that, although different cancer cell lines may have very different levels of gene expression and metabolism, the steady-state relationship between genes and metabolites in the same or highly coupled pathways should be maintained across different cell lines in the absence of dramatic genomic changes such as gene mutation and copy number variation (CNV). Identifying such gene-metabolite relationships at steady state will help us better understand the underlying molecular mechanisms related to metabolic change, and will also help us infer potential metabolic changes from expression data, or vice versa. On the other hand, if a few cell lines deviate significantly from the gene-metabolite relationships exhibited by the rest of the samples, the related genes in such outlier cell lines are likely to be silenced (no expression) or stimulated (over-expression) due to genomic structural changes. Consequently, we are interested both in high correlations, which reflect the steady-state trend across all samples, and in outliers, which reflect signatures of specific cell lines. The latter are usually overlooked in high-throughput analysis; even though outliers can have strong biological significance, they are often removed systematically because doing so leads to better clustering, classification and prediction accuracy114,162,163.

The classic method of inspecting the relatedness of two quantitative features is to compute the Pearson correlation coefficient (PCC). While PCC has been proven to work well in many previous gene expression studies164, it is susceptible to inflation by outliers, which makes it unsuitable for analyzing high-throughput data with high background noise. As described before, metabolic profiles exhibit high variability, and we identified many cases in our gene-metabolite correlation analysis where a large number of high PCC values were merely artifacts resulting from a few extreme outliers (discussed in detail later). High background noise affects PCC in the opposite direction as well: it would underestimate the association if missing values were replaced with very small base values. To tackle this challenge, we proposed using robust correlation estimates instead of PCC. Robust correlations offer a more precise estimate of the true association in the presence of multidimensional outliers, given a sufficient sample size. We chose the pair-wise Quadrant Correlation (PQC) for its good computational performance. In the following correlation analysis, we propose a novel integrative analysis using both PCC and PQC. Gene-metabolite pairs with high PCC and high PQC tend to demonstrate true linear correlation across all cancer types, which implies a general functional association between gene and metabolite profiles; gene-metabolite pairs with high PCC and low PQC possibly result from a few extreme outliers, which are potentially linked to cell-line-specific signatures. A collection of robust correlation estimation methods was evaluated using simulated data, and the results are presented later in this chapter.
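
A minimal R sketch of this combined PCC/PQC screen on a simulated gene-metabolite pair; a single extreme cell line is injected to show how a high PCC can coexist with a near-zero PQC. The covRob call with estim = "pairwiseQC" is assumed to expose the pair-wise quadrant correlation of the robust package named in the Methods.

```r
# A minimal sketch (R): PCC versus PQC on simulated data with one extreme outlier.
library(robust)

set.seed(1)
gene       <- rnorm(57)
metabolite <- rnorm(57)
gene[1] <- 10; metabolite[1] <- 12          # a single extreme outlier cell line

pcc <- cor(gene, metabolite)
pqc <- covRob(cbind(gene, metabolite), corr = TRUE, estim = "pairwiseQC")$cov[1, 2]

cat(sprintf("PCC = %.2f  PQC = %.2f\n", pcc, pqc))
if (pcc > 0.6 && pqc < 0.3) cat("high PCC likely driven by outliers\n")
```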

3.2.2.3 Metabolite-Metabolite Correlations

We computed both PCC and PQC for metabolite-metabolite correlations. Metabolites with fewer than 10 valid measurements were removed from the analysis, as they tend to inflate PCC and PQC because of the small sample size. Table 3-1 shows the top metabolite pairs with both high PCC and high PQC. There are high correlations between compounds of different forms (L-allo-threonine ~ threonine), between different amino acids (leucine ~ methionine) and between mixtures and compounds (isobar6, which includes valine and betaine, ~ valine). There are also some high correlations between unknown and known compounds, such as the almost perfectly linear correlation between tyramine and X-4043. We can therefore reason that these unknown compounds are either structurally or functionally closely associated with the known compounds, and use this information for prediction. Unfortunately, the MS data were missing from the NCI-60 datasets; otherwise it would be possible to search online databases such as HMDB43,44 to predict the unknown metabolites.

3.2.2.4 Metabolite-Gene Correlations

We computed the PCC and PQC for all pairs of gene expression profiles and metabolite measurements. As before, metabolites with fewer than 10 valid measurements were removed. Genes with low expression in the majority of the samples due to tissue/cell-type specificity were also removed. Table 3-2 shows the top 10 gene-metabolite associations.

To investigate the biological significance of the correlation analysis, we mapped all the known gene-compound associations from EHMN to our NCI-60 analysis. 721 association entries, including 352 genes and 81 known metabolites, were mapped. The percentages of genes and metabolites mapping to known pathways were very low, due to the current shortage of metabolite-gene annotations in existing databases, and the fraction of gene-metabolite pairs with moderately high correlation among all the mapped pairs is also very small. We chose the top 9 pathways in the EHMN mapping, ordered by the number of genes contained in each pathway. Nevertheless, some of the gene-compound pairs do show high association across all cell lines. For example, the gene AKR1B1, which reduces L-arabitol to L-arabinose (EHMN reaction IDs R01758 and R01759), is associated with L-arabitol with a PQC of 0.69 and a PCC of 0.36, which implies an association between metabolite level and gene activity. However, it is surprising that most of the direct enzyme-metabolite relationships could not be mapped to high correlations (Figure 3-2). There are several possible explanations for this low correspondence between significant correlations and the EHMN data. A simple explanation is that most of the annotatable gene-metabolite relationships in the NCI-60 dataset may not involve the rate-limiting factor of the related pathways. It is also possible that the high level of abnormal regulation in the cancer cell lines may have masked any global mechanism, making it difficult to identify high gene-to-metabolite correlations across all cell lines; high correlations may be identifiable within cancer subtypes, but this was not testable here because of the small sample size in each cancer class. Finally, metabolites display higher variability than gene expression profiles, which could reduce the consistency of coupled gene and compound levels.

Interestingly, the grouping of genes by pathway in Figure 3-2 enables us to detect many metabolites that exhibit high correlations with other genes in the same pathway. For example, although phosphoenolpyruvate does not overlap with any of the EHMN-annotated direct reaction genes, it is highly correlated with GPI and ALDOA, two genes in the glycolysis module that are known to be highly regulated by hypoxia-inducible factor 1 alpha, a regulation that is related to the aggressive phenotype of hepatocellular carcinoma. Consequently, the high correlation of ALDOA, GPI and other genes with phosphoenolpyruvate in our multi-cancer cell line analysis may suggest that these genes have more significant regulatory or rate-limiting roles in glycolysis than genes such as ENO1 and ENO2 that are directly involved in reactions with phosphoenolpyruvate in these cancer cell lines. Naturally, not all genes in the same pathway strongly correlate with each other, since genes in the same pathway do not always change in the same direction.

High correlations between metabolites and other genes in the same pathway (i.e., genes not directly involved in the specific metabolic reactions) are worth further investigation, as such un-annotated relationships are likely to help us identify rate-limiting enzymes in a pathway, key regulatory genes of related pathways, or novel metabolic mechanisms.

3.2.2.5 Outlier Analysis In the previous analysis we used PQC in addition to PCC to identify molecule pairs with truly high correlation. We also identified many cases where one or a few cell lines have much higher values than the rest of the samples for a given gene-metabolite pair, which directly produces an inflated PCC and a very low PQC. To systematically identify these cases, we used the R package mvoutlier to detect multidimensional outliers, and recomputed the PCC and PQC scores after outlier removal. Our empirical rule is that when PCC > 0.6, PQC < 0.3 (preferably close to 0), and the number of multidimensional outliers is smaller than 3, the high PCC is most likely an artifact of a few extreme outliers.
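The following sketch shows one way this empirical rule could be coded, assuming the pcout function of the mvoutlier package (whose wfinal01 element, to my understanding, marks outliers with 0) and the covRob function of the robust package; function and threshold names are illustrative only.

## Sketch of the empirical flagging rule: high PCC, low PQC and very few
## multidimensional outliers suggest an outlier-driven artifact.
library(mvoutlier)

flag_artifact <- function(x, y, pcc_min = 0.6, pqc_max = 0.3, max_outliers = 3) {
  ok  <- complete.cases(x, y)
  xy  <- cbind(x[ok], y[ok])
  out <- pcout(xy)$wfinal01 == 0            # 0 marks multidimensional outliers
  pcc <- cor(xy[, 1], xy[, 2])
  pqc <- robust::covRob(xy, corr = TRUE, estim = "pairwiseQC")$cov[1, 2]
  pcc_rm <- cor(xy[!out, 1], xy[!out, 2])   # recompute PCC after outlier removal
  list(flagged = pcc > pcc_min && pqc < pqc_max && sum(out) < max_outliers,
       PCC = pcc, PQC = pqc, PCC_after_removal = pcc_rm, n_outliers = sum(out))
}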

To explore the biological significance of these outliers, we compared the outlier cases detected by the criteria above to the Sanger COSMIC database159. Sanger COSMIC contains 177 mutation entries for 28 genes in our 57 NCI-60 cell lines. Interestingly, the top two outliers detected by our approach, the NOTCH1-X-2005 relationship in the MOLT-4 cell line and the KRAS-X-2690 relationship in the OVCAR_5 cell line, correspond to known mutations of those two genes in the Sanger COSMIC database. The common feature of these two gene-metabolite pairs is high PCC and low PQC before outlier removal and low PCC and low PQC after outlier removal. From Figure 3-3 we can verify that the inflated PCC values were indeed the product of single cell-line outliers. In addition, the sample sizes of these two cases are reasonably large (22 for the NOTCH1-X-2005 pair and 26 for the KRAS-X-2690 pair), so the outliers are not likely to be random fluctuations caused by small sample size.

Since our analysis does not use any cell line or gene mutation information, the fact that our top two outliers from this unbiased analysis overlap with documented gene mutations in the Sanger COSMIC dataset suggests that the mutations may be the cause of these abnormal relationships. In addition to point mutations, we also compared the outlier analysis results with Copy Number Variation (CNV) data from the Broad Institute. Only 34 out of the 57 NCI-60 samples have CNV data, but we also found some gene-metabolite outlier pairs that are consistent with CNV outliers. For example, BRIP1, with a CNV of 14.22 in cell line MCF7, has a PCC of 0.90, a PQC of 0.008 and one outlier. The corresponding compound, X-3363, may be associated specifically with this copy number variation.

The fact that some top-ranked gene-metabolite relationship outliers detected by our approach match known genomic structural changes of the same genes in the corresponding cell lines strongly suggests the usefulness of our analysis method in identifying potential molecular mechanisms related to Metabolome and Transcriptome changes.

3.2.3 Summary

Our analysis of the NCI-60 Metabolome and Transcriptome data suggests that 1) while there are metabolic signatures associated with cancer subtypes, the small sample size and the high noise level in the current NCI-60 Metabolome dataset make it unsuitable for cancer subtype classification; 2) there are indeed biologically meaningful, highly correlated gene-metabolite pairs across the NCI-60 cell lines, identifiable by robust correlation estimates; 3) most strikingly, there are several examples of abnormal gene-metabolite coupling that can be directly linked to known gene mutations or copy number variations. These findings will help us investigate the molecular mechanisms involved in metabolic changes in some of the NCI-60 cell lines through more targeted wet-lab experiments.

The high correlations as well as the outliers can be utilized to aid the progressive prediction of unknown metabolites based on annotations in existing pathway databases and the literature. For example, the high correlation of an unknown compound with a known gene, and in particular with multiple known genes in a pathway, can dramatically reduce the search space for the unknown compound, since the most likely candidates will be structurally related molecules or known metabolites from related pathways. We plan to compare the mass spectrometry (MS) features of these unknown compounds against the predicted candidate pool and conduct wet-lab experiments for validation. We have also discovered that many unknown metabolites are strongly associated with each other but not with any known compound. Correct determination of even a small fraction of them would facilitate the identification of the rest, which will in turn improve our understanding of the molecular processes and pathways involving these molecules.

Our results suggest that the 'house-keeping' gene-metabolite associations that dominate fundamental metabolic processes may be more informative when studied in homogeneous cell samples or in cell lines with fewer changes in metabolic state than cancer cell lines, as the mapping of annotated gene-metabolite interactions from databases to the computed PCC/PQC scores is quite poor. Nevertheless, some significant outliers can be nicely matched to gene-specific mutations in a few cell lines by our proposed integrative method. Most current correlation analyses of gene-expression/gene-metabolite associations are performed with PCC only, which not only can be misleading because of multidimensional outliers, but also ignores the significance of these outliers. We have shown that some of these outliers may be a direct 'phenotype' of cell-specific mutations, and are worthy of further investigation of the underlying biological causes.

While the diversity of cancer cell lines masked the underlying common metabolic mechanisms, mutations in genes may result in changes in metabolic state accompanied by drastic fluctuations in certain metabolites. The metabolite-metabolite and gene-metabolite correlations can be explored to help clarify the structure and function of these compounds, which will not only help us add more paths to metabolic pathways and networks, but also add another quantitative dimension to this knowledgebase.

3.3 Evaluation of Robust Correlations

As described previously, there are two common procedures in analyzing Omics data: finding variables that capture the most differences across sample types, and finding variables with expression profiles that are consistent across different samples. The most popular approaches to these two questions are classification and correlation. Other inferences, such as a clustering or network view of the correlations, can also be built165. As the scale and complexity of Omics data continue to grow and software providing analytical aid becomes increasingly easy to use, biological inferences may fall victim to the errors and noise in the input data if not handled with caution, especially when the analytical methods are not noise-resistant (i.e. robust). Given the large number of Omics profiles analyzed at the same time, it is impractical to manually validate every comparison. Automated methods have been proposed: Hardin et al. addressed the problem, proposed Tukey's biweight as a robust substitute for Pearson correlation, and applied it to real-world gene expression analysis116,155.

The noise in the experimental data can be attributed to several factors. Taking microarrays as an example: first of all, there is individual variability across different biological samples. Some tissues, such as colon or breast, contain many different cell types. The growth and nutritional state of the cell may also significantly influence its gene-expression profiles. Secondly, variability, or systematic error, is introduced by the machine when operated by different technicians or under different experimental conditions. Some genes with very low expression may fall below the dynamic range of the machine. As a result, they may be assigned with a baseline value or marked as missing. These missing values are then confounded with genes with actual expression values near the lower bound of the dynamic

range. In addition, experimental errors, such as faulty samples, wrong reagent concentrations or non-optimal experimental conditions, sometimes introduce very large outliers, which can significantly bias classic analytical methods. Thirdly, the probe definitions of the gene chip are also error-prone. Dai et al. have demonstrated that a non-optimal mapping of probes to the genome may have a very significant effect on biological inferences157. Fourthly, the deregulated pathways of certain samples, such as cancer cells, may cause some genes to exhibit highly non-normally distributed profiles; this violation of approximate normality may invalidate non-robust methods. Fifthly, multidimensional outliers may be hard to detect because of masking effects166. Visual inspection is also difficult when the number of gene pairs is very large. The end product, contaminated by the joint effect of all these factors, can be significantly undermined. For example, as discussed previously, the NCI-60 Metabolomics dataset from the National Cancer Institute's Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov) contains measurements of ~350 metabolites from ~60 cancer cell lines in 9 different cancer classes. This dataset also has more than 34% missing values, which were substituted with small, imputed baseline values. Drawing inferences from this dataset with non-robust methods is therefore difficult. Even though this most comprehensive Metabolomics dataset has been in the public domain for about three years, there is still no published literature drawn from it.

We therefore deem it very important to evaluate various robust methods in the context of such noisy data. Most of the focus of this analysis is on correlation, but some of the methods can be adapted to evaluate classification in a straightforward manner. In this analysis, we evaluated the resistance of seven different correlation estimators (Pearson, Spearman, Kendall τ, Hardin's Tukey biweight, Fast MCD, M-estimate and pairwiseQC) on simulated data contaminated with various outlier and noise structures. We then applied the robust correlation estimators to a Transcriptomics-Metabolomics association study of the NCI-60 cell lines. The simulation study and the analysis of the experimental data demonstrated that the classic correlation estimators, even the naively robust estimators such as Spearman and Kendall τ, can produce very misleading results when data are contaminated with missing values and extreme outliers. Robust estimators, on the other hand, may overestimate the actual correlation for data with a small sample size. Therefore we propose an integrative approach using both classic and robust correlation to identify highly correlated

gene-metabolite pairs, as well as extreme outliers, which could indicate cell-line-specific pathological processes.

3.4 Materials and Methods

Correlation Estimators: The most frequently used correlation estimate is the Pearson Correlation Coefficient (PCC), defined as the covariance standardized by the product of the standard deviations:

\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}

This correlation estimator ranges from -1 to 1, with 1 indicating perfect positive linear association and -1 indicating perfect negative linear association. It has been demonstrated that PCC is not outlier resistant, chiefly because neither the sample mean estimator nor the sample variance estimator is outlier resistant. One quick fix is to replace µ and σ with robust versions, such as the median or trimmed mean for the sample mean and the median absolute deviation (MAD) for the sample scale, or to apply a winsorization or huberizing procedure to the data. However, this may cause the resulting ρ to fall outside of [-1, 1].

Charles Spearman introduced the Spearman rank correlation coefficient, which computes the correlation on the transformed ranks:

r_s = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}

Here d_i is the difference between the ranks of X_i and Y_i167. It is more resistant to extreme outliers than PCC because the outliers' numerical values are transformed into integer ranks. The Spearman correlation is frequently used as a robust substitute for PCC.

Kendall's τ is another non-parametric, rank-based correlation measure between two random variables. It is calculated from the difference between the number of concordant pairs and the number of discordant pairs, normalized by the total number of pairs:

\tau = \frac{N_c - N_d}{\tfrac{1}{2} n(n-1)}

Here Nc denotes the number of concordant pairs, and Nd denotes the number of discordant pairs. It is also considered as a robust correlation estimator168.
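The three classic estimators described above are all available through R's built-in cor function. The toy sketch below (simulated data; the values themselves are arbitrary) illustrates the calls and how differently the three estimators react to a single extreme outlier.

## Classic estimators via cor(), and their response to one extreme outlier
set.seed(1)
x <- rnorm(50); y <- 0.9 * x + rnorm(50, sd = 0.3)
for (m in c("pearson", "spearman", "kendall"))
  cat(m, "clean:", round(cor(x, y, method = m), 3),
         "with outlier:", round(cor(c(x, 8), c(y, -8), method = m), 3), "\n")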

Hardin et al. proposed a robust correlation estimate based on Tukey's biweight155. The idea is to use outlier-resistant estimates of location and scatter from the biweight iteration scheme to replace the sample mean and the sample variance in the Pearson correlation formula:

r'_{X,Y} = \frac{s'_{XY}}{s'_X s'_Y}

The Fast Minimum Covariance Determinant (FMCD) method is an improved robust estimator over the original MCD method, using a subsampling technique166. This method determines multivariate location and scatter by finding the h observations out of n total observations (typically the number of samples), for p variables (typically the number of genes; for gene pairs p = 2), whose classical covariance matrix has the lowest determinant. The FMCD location estimate is then the average of these h points, and the scatter estimate is the covariance matrix of these points. Similar to Hardin's approach, the robust location and scale estimates are then plugged into Pearson's formula to obtain the FMCD correlation estimate.

The M-estimate proposed by Maronna is the solution to the system of equations below:

\frac{1}{n} \sum_{i=1}^{n} u_1\!\left[(x_i - t)' V^{-1} (x_i - t)\right](x_i - t) = 0

\frac{1}{n} \sum_{i=1}^{n} u_2\!\left[(x_i - t)' V^{-1} (x_i - t)\right](x_i - t)(x_i - t)' = V

Here u1 and u2 are functions satisfying a set of general assumptions. The resultant t is the multivariate location estimate and V is the multivariate scatter estimate169. Pair-wise correlations can be directly obtained from V.

Maronna and Zamar recently proposed a pair-wise estimation method modified from the Gnanadesikan-Kettenring robust covariance estimate, which uses robust variances in place of the principal components of the corresponding directions and then transforms back to the original coordinates to produce an orthogonalized Gnanadesikan-Kettenring estimator (OGK, pairwiseGK)170. Based on a similar idea, the pair-wise quadrant correlation estimator can also be computed (QC, pairwiseQC).
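To my understanding, most of the robust estimators above are exposed through the covRob function of the robust package; the sketch below shows the corresponding estimator names as an assumption rather than a definitive mapping (Hardin's Tukey biweight came from the paper's supplement and is not part of the package).

## Sketch: robust correlation estimates via covRob() from the 'robust' package
library(robust)
set.seed(1)
xy <- cbind(x = rnorm(50), y = rnorm(50))
xy[1:3, ] <- 6                                # a few extreme outliers in one corner
robust_cor <- function(estim) covRob(xy, corr = TRUE, estim = estim)$cov[1, 2]
c(Pearson    = cor(xy)[1, 2],
  FMCD       = robust_cor("mcd"),
  M          = robust_cor("M"),
  pairwiseQC = robust_cor("pairwiseQC"),
  pairwiseGK = robust_cor("pairwiseGK"))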

It should be noted that most of the correlation estimators are capable of estimating multivariate location and scatter, and the pair-wise correlations can be obtained from the estimated covariance matrix. However, as the dimension of the data grows, it becomes very demanding to run such programs on a desktop computer without parallelization. For example, the usable memory on a 32-bit operating system is about 3.5 GB (gigabytes), and storing a covariance matrix for all ~14,500 genes on the U133A chip alone requires about 1.6 GB as 64-bit doubles. Some meta-analyses even require computing correlations across several different chips or Omics datasets, which makes the estimation of the full covariance matrix even more difficult. Chilson et al. proposed that some steps in the computation of the covariance matrix can be parallelized171. Another way to alleviate the problem is to compute the correlation estimates in a pair-wise manner and then rebuild the covariance matrix from the pair-wise estimates; Hardin's method employs such an approach155. This scheme not only significantly reduces the memory requirement, but also enables simultaneous computation of pair-wise correlations on multi-core desktop computers or computer clusters.
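A minimal sketch of this pair-wise scheme is given below: rather than estimating the full covariance matrix, each column pair is handled independently, which keeps memory use low and parallelizes trivially. The helper names are hypothetical, and mclapply from the parallel package is assumed (it forks processes and is therefore POSIX-only).

## Sketch: pair-wise robust correlations computed in parallel
library(parallel)
library(robust)

pairwise_robust_cor <- function(mat, estim = "pairwiseQC", cores = 4) {
  idx  <- combn(ncol(mat), 2)                      # all column pairs
  vals <- mclapply(seq_len(ncol(idx)), function(k) {
    xy <- mat[, idx[, k]]
    covRob(xy[complete.cases(xy), ], corr = TRUE, estim = estim)$cov[1, 2]
  }, mc.cores = cores)
  data.frame(i = idx[1, ], j = idx[2, ], cor = unlist(vals))
}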

There are some other distance measures which capture the relatedness of two probability distributions, such as Kullback-Leibler divergence (KLD)172 and mutual information (MI)173. The key advantage is that they can capture non-linear association. To apply these methods, the raw data must be transformed into a probability distribution using a 2D histogram. However, as the n << p for most of the Omics data, the small sample size could significantly distort KLD and MI estimates. Therefore we did not compare the efficacy of these methods to the correlation estimators, but for datasets with large n they are worthy of further investigation.

Identification of Outliers: Hardin suggested using a robust measure of correlation to 'flag' gene pairs of poor quality whenever there is a disagreement between the classic correlation

and the robust correlation155. It is also very important to flag samples that consistently appear as outliers, as they most likely reflect the profiles of cell lines that severely deviate from the other samples, or simply faulty experiments. Graphical analyses and statistical tests, such as the Quantile-Quantile (Q-Q) plot and Dixon's test174, can be applied to identify univariate outliers. However, multivariate outliers may only appear extreme along a subset of directions, which makes them difficult to detect using simple extensions of univariate outlier detection methods. Figure 3-4 demonstrates a bivariate case in which the outliers reside along only one direction.

Filzmoser et al. proposed an outlier detection method that weights each observation using principal components in high-dimensional space114. As shown in Figure 3-4, this method effectively detected all the artificial outliers, with a small number of false positives. We therefore used this method to flag outliers in the subsequent Transcriptomics-Metabolomics analysis to investigate cell-line-specific characteristics.
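The sketch below reproduces the Figure 3-4 scenario with simulated data, flagging the injected outlier cluster with the pcout function of the mvoutlier package, which I understand to implement the principal-component-based weighting scheme of Filzmoser et al.; the exact parameters are illustrative.

## Sketch: bivariate data with an outlier cluster, flagged with mvoutlier::pcout
library(MASS)
library(mvoutlier)
set.seed(2)
good <- mvrnorm(95, mu = c(0, 0), Sigma = matrix(c(1, 0.8, 0.8, 1), 2))
bad  <- mvrnorm(5,  mu = c(0, 4), Sigma = matrix(c(1, 0.2, 0.2, 1), 2))
dat  <- rbind(good, bad)
flag <- pcout(dat)$wfinal01 == 0          # 0 = flagged as multivariate outlier
plot(dat, pch = ifelse(flag, 3, 1))       # '+' marks flagged points, as in Figure 3-4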

Data Simulation: We simulated bivariate normally distributed data with noise and outliers that resemble the structures in the real-world Transcriptomics-Metabolomics data. There are six different data structures:

• Bivariate normal data simulated from a given correlation matrix.
• Bivariate normal data with small values, resembling imputed missing values, added along one direction.
• Bivariate normal data with small values, resembling imputed missing values, added along both directions.
• Bivariate normal data with extreme outliers along both directions (first quadrant).
• Bivariate normal data with extreme outliers along only one direction (second quadrant).
• Bivariate normal data with extreme outliers along either one of the two directions (second and third quadrants).

These six data structures are illustrated in Figure 3-5 and represent the noise structures we observed in the Transcriptomics-Metabolomics study. The bivariate normally distributed values were generated with a mean of 0.0 and σ of 1.0. For missing values, we randomly chose a subset of observations and set the

values along one direction to the smallest value in that direction. For outliers, we simulated values centered at 4.0 with σ of 0.2 in the corresponding directions. In each simulation, we tested sample sizes of 25, 50 and 100, and prior correlations of -0.98, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75 and 0.98. Various percentages of small values/outliers were also tested. Each test was performed 100 times, and the mean and standard error of the correlation estimates were calculated.
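A minimal sketch of one replicate of this simulation scheme is shown below, using MASS::mvrnorm; the helper name and parameter choices are hypothetical and only illustrate the contamination structures described above.

## Sketch: one simulated gene/metabolite pair with optional missing values and outliers
library(MASS)
set.seed(3)
simulate_pair <- function(n = 50, rho = 0.75, n_missing = 0, n_outlier = 0) {
  xy <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))
  if (n_missing > 0) {                      # imputed 'missing' values along one direction
    idx <- sample(n, n_missing)
    xy[idx, 1] <- min(xy[, 1])
  }
  if (n_outlier > 0)                        # extreme outliers in the first quadrant
    xy <- rbind(xy, mvrnorm(n_outlier, mu = c(4, 4), Sigma = diag(0.2^2, 2)))
  xy
}
xy <- simulate_pair(n = 100, rho = 0.98, n_missing = 10)
cor(xy)[1, 2]                               # compare with the prior correlation of 0.98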

Computational Environment: The computations were performed using the R project for statistical computing (http://www.r-project.org/). Pearson, Spearman and Kendall correlations were computed using the function cor. The robust correlations were computed using the robust package (http://cran.r-project.org/web/packages/robust/index.html). Hardin's Tukey biweight code was obtained from the paper's supplemental information. Outliers were identified using the mvoutlier package (http://cran.r-project.org/web/packages/mvoutlier/index.html). The bivariate normal data were simulated using the MASS package (http://cran.r-project.org/web/packages/MASS/index.html). The transformations between correlation and covariance matrices were performed using the MBESS package (http://cran.r-project.org/web/packages/MBESS/index.html). The analysis was done on an Intel Core i7 920 computer with 12 GB of memory. We also ran some of the computations in parallel on a Linux cluster with ~100 nodes.

3.5 Results

Results from the simulated data: We investigated the behavior of the seven correlation estimators in the presence of various systematically added errors that resemble the noise observed in the Transcriptomics-Metabolomics data. We use PCC, SP, KD, TB, FMCD, M and QC to denote the Pearson, Spearman, Kendall, Hardin's Tukey biweight, Fast MCD, M-estimate and pairwiseQC correlation estimators, respectively.

The stdRandom tab shows simulation results without the addition of missing values or extreme outliers. We can see that: 1) Sample size does affect the correlation estimators; the standard error of all estimators increases when the sample size decreases from 100 to 25. For example, when the actual correlation is 0.98, the PCC is 0.98 ± 0.0039 at sample size 100, 0.98 ± 0.0059 at sample size 50 and 0.98 ± 0.0082 at sample size 25.

2) The standard errors of the robust estimators are larger, possibly due to algorithmic characteristics such as the subsampling in FMCD. 3) For all sample sizes and all correlations, the Kendall correlation (KD) consistently under-estimates the actual correlation. For example, at sample size 100, the average Kendall correlations over 100 repeated runs at actual correlations of 0.98, 0.75, 0.5, 0.25 and 0 are 0.87, 0.54, 0.33, 0.17 and -0.0018, respectively. 4) FMCD shows a small degree of underestimation when the sample size is reduced from 100 to 50. All the other robust estimators have performance comparable to the classic PCC and Spearman estimators at all actual correlations and sample sizes. In summary, when the data are approximately normally distributed and not contaminated with extreme outliers or missing values, the PCC and Spearman estimators are still preferable due to their faster execution speed.

Next we examined the performance of these estimators when the data are contaminated with missing values along one or both directions. The NCI-60 metabolomics data contain many small and identical values, which were substituted for missing values as 'baseline' values. We therefore similarly pushed a certain percentage of values towards the baseline to replicate the structure of these missing values. The results are shown in the msRandom1D and msRandom2D tabs. It can be seen that 1) the addition of even a few dummy missing values can significantly reduce the classic correlation estimates. For sample size 100 and an actual correlation of 0.98, adding 10 dummy missing values reduces PCC from 0.98 to 0.74 and SP from 0.98 to 0.80; the addition of missing values has a smaller effect on KD, which is reduced from 0.87 to 0.72. Contrary to the general perception, the rank-based SP and KD are thus not resistant enough to counter even a few such values. On the other hand, all the robust correlations at this contamination level are almost identical to the actual correlation of 0.98, which demonstrates their resistance. 2) The sample size only affects the standard error, not the value of the correlation estimates. 3) With 10% missing values, TB, FMCD, M and QC work equally well for all sample sizes. However, when the percentage is increased to 20%, TB shows severe performance deterioration. For example, when the sample size is 100 and the actual correlation is 0.98, TB is reduced to 0.78 when 20% of the values are replaced by 1D dummy missing values and to 0.57 when 30% are replaced. FMCD, M and QC still perform reasonably well when 30% of the values are replaced, even at the sample size of 25. This is

inconsistent with the default TB breakdown value of 0.2, which indicates the maximum fraction of 'bad' values for which the algorithm can still obtain good estimates. We tested percentages from 15% to 20%, and the results showed that the actual breakdown point may be smaller than the theoretical breakdown point. As it is impossible to know the percentage of outliers in the data a priori, it is worth investigating the relationship between the theoretical and actual breakdown points for TB, since an excessively large breakdown point may lead to over-estimation. 4) The results for 2D missing values are similar to those for 1D missing values, with slight improvements for all estimators, possibly because the missing values along the two directions partially offset each other. Overall, the robust estimators demonstrated much superior performance when the data are contaminated with missing values, with FMCD, M and QC outperforming all the other estimators.

Finally, we tested the effect of a few extreme outliers that severely deviate from the rest of the data. The tabs outliers1, outliers2 and outliers23 list the results for outliers added to the first quadrant, the second quadrant, and the second and third quadrants, as described in the Materials and Methods section. The results show that 1) adding outliers to the first quadrant severely changes PCC; SP and KD are less affected than PCC but also underestimate when the actual correlation is negative. TB performed as well as FMCD, M and QC, but overestimated positive correlations and underestimated negative correlations when 15 outliers were added to a sample of size 100. 2) FMCD and M can give reasonably good estimates even when 10 outliers are added to the first quadrant at the sample size of 25. 3) The results for outliers added to the second quadrant and to the second and third quadrants are similar. FMCD and M consistently outperform all the other correlation estimators.

The simulation study only demonstrated the efficacy of robust correlation estimators when either missing values or outliers are present. The actual Transcriptomics-Metabolomics data

contain more complex noise than we have attempted to reproduce here. However, the robust estimators FMCD, M, QC and TB have shown much better performance than the classic PCC, SP and KD estimators. Many recent microarray/metabolomics correlation analyses still use PCC or SP as outlier-resistant correlation estimators, and we can see from the simulation analysis that those estimates can significantly deviate from the actual correlation.

As visual inspection of tens of thousands of such gene/metabolite pairs is impractical, using robust correlation estimators will produce better and more trustworthy estimates.

A remaining problem with robust correlations is significance estimation. For PCC, we can test against the null hypothesis ρ = 0 using a t-test. However, for robust estimators the corresponding statistic is not guaranteed to follow an approximate t-distribution. Nevertheless, a nominal p-value can be obtained by plugging the robust correlation estimate into the same formula. Another approach is to use permutation tests or bootstrapping if the sample size n is sufficiently large. As permutation tests dramatically increase the computational load, they are only feasible to run with parallelization on a computer cluster. The p-values can be corrected using the Holm-Bonferroni procedure or for FDR using the methods proposed by Strimmer77.
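A minimal sketch of such a permutation test is shown below, using the pairwise quadrant correlation via covRob as the test statistic; the number of permutations and the helper name are illustrative, and in practice one would add 1 to the numerator and denominator to avoid zero p-values.

## Sketch: permutation p-value for a robust correlation estimate
library(robust)
perm_pvalue <- function(x, y, n_perm = 1000, estim = "pairwiseQC") {
  rob_cor <- function(a, b) covRob(cbind(a, b), corr = TRUE, estim = estim)$cov[1, 2]
  obs  <- rob_cor(x, y)
  null <- replicate(n_perm, rob_cor(x, sample(y)))   # permute one profile
  mean(abs(null) >= abs(obs))                        # two-sided empirical p-value
}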

We have also observed that when the sample size is very small (n < 10, data not shown), the robust estimators may sometimes significantly over-estimate the actual correlation. For example, because FMCD uses a subsampling technique, the algorithm is likely to find a few points with strong correlation when n is very small. It is therefore very important to understand the behavior of the robust correlation estimators before applying them to real-world data.

3.6 Discussion

In this chapter, we evaluated the performance of seven correlation estimators when missing values and extreme outliers are present in the data using both systematic simulation and experimental data. Our results suggest that even though the classic correlations perform very well on bivariate normally distributed data, the Pearson, Spearman and Kendall correlation estimators are unable to obtain good estimates on noisy omics data. Spearman and Kendall

estimators do have a weak degree of resistance, but they fail quickly when systematic missing values and errors are added. As it is very difficult to verify all variable pairs using graphical plots for very large datasets, using these classic methods could produce very misleading correlations for subsequent clustering or synthesized networks. The robust correlation estimators have shown superior performance, and the strong biological relevance of our results also indicates that the analysis strategy we developed, based on the combination of PCC, QC and outlier detection, is a powerful approach for the integrative analysis of noisy Omics datasets. Our pipeline also flags problematic samples for further investigation, in addition to Hardin's scheme, which only flags gene pairs with outliers. In doing so, we facilitate the characterization of sample-specific features. On the other hand, the resistance of the robust estimators also varies with data size and noise structure, owing to their different algorithmic designs. For example, the subsampling in FMCD may cause over-estimation on very small datasets, and Hardin's biweight by default has a lower breakdown point than the other robust estimators. It is therefore critical to understand both the behavior of the robust estimators and the characteristics of the experimental data to obtain the best correlation estimates. Furthermore, as the computation of robust correlation estimators is more intensive than that of the classic estimators, it is preferable to parallelize large meta-analyses, which involve computing correlations across many Omics datasets.

Conceivably, high correlations as well as outliers can be utilized to aid the progressive prediction of unknown metabolites based on annotations in existing pathway databases and literature. For example, the high correlation of an unknown metabolite with a known gene and in particular, multiple known genes in a pathway can dramatically reduce the search space for the unknown metabolite, since the most likely candidates will be structurally related molecules or known metabolites from related pathways. Correct characterization of even a small fraction of the unknown metabolites would facilitate the identification of the rest, which will in turn improve our understanding of the molecular processes and pathways involving these molecules.

Table 3-1 Top 10 highly associated compound-compound pairs.

With PCC and PQC both > 0.9

Compound A                           Compound B                    PCC       PQC
L-allo-threonine                     threonine                     0.996946  0.992606
tyramine                             X-4043                        0.996749  0.994182
3-phospho-d-glycerate                glyceraldehyde                0.983365  0.988978
Isobar6 (includes valine, betaine)   valine                        0.982947  0.972611
X-1713                               X-4027                        0.982945  0.980403
X-1111                               X-4019                        0.974755  0.985652
leucine                              methionine                    0.968212  0.958808
creatinine                           X-3176 (possible creatine)    0.967171  0.958087
anthranilic acid                     valine                        0.966016  0.940412
leucine                              methionine                    0.964969  0.974087

Table 3-2 Top 10 highly associated gene-compound pairs.

With PCC and PQC both > 0.65

Compound   Gene        PCC       PQC
X-2005     CD7         0.905804  0.665047
taurine    PREB        0.850938  0.714942
X-2005     CDK6        0.84396   0.654139
taurine    CALCRL      0.78668   0.660189
X-2005     LOC390940   0.779039  0.659253
X-2724     MEPCE       0.744048  0.698484
X-3090     ALPK1       0.73805   0.821748
X-2005     IRF1        0.736982  0.654989
X-2139     TMEM77      0.732769  0.684709
X-2724     KBTBD2      0.732675  0.725019

Figure 3-1 Estimated Out-of-Bag (OOB) error with regard to progressive cancer class removal.

At each step, the cancer classes contributing most to the classification error were removed and the OOB error was recalculated. B: breast cancer. P: prostate cancer. O: ovarian cancer. C: CNS. The left plot shows OOB errors on the entire dataset and the right plot shows OOB errors on a subset of classifiers selected by varSelRF. The improvement from variable selection is more remarkable for the microarray data because of the much larger classifier pool to select from (11961 vs. 342). The metabolite classifiers achieve an average OOB error of 0.28 when B, P, O and C are progressively removed, reduced from 0.51 on the full set. The OOB error for U133A was reduced from 0.34 to 0.18. The average OOB errors from random cancer class removal are plotted with the orange dashed line.

Figure 3-2 Aggregated Heatmap view of gene-metabolite relationships organized according to the KEGG pathways.

Mapping of EHMN gene-metabolite association data to the robust correlation matrix heatmap. A: Rows are genes grouped by pathway names; columns are metabolites, also grouped by their corresponding pathway names. Green circles mark the position of a specific EHMN reaction that couples a metabolite and a gene. Light orange cells indicate positive PQC, with black mapped to 0 and orange to 1. It can be seen from the figure that even though there are patterns of gene-metabolite clustering, very few high correlations can be mapped to known reactions. B: Gene-metabolite correlations within the glycolysis pathway. C: Gene-gene correlations within the glycolysis pathway.

Figure 3-3 Outlier Analysis.

Outlier analysis shows that extreme gene-metabolite outlier pairs can result from cell-specific mutations. Top scatter plots: NOTCH1 ~ X-2005, with an outlier in cell line MOLT_4, and KRAS ~ X-2690, with an outlier in OVCAR_5. It can be seen that the high PCC values were both artifacts of extreme outliers. Middle table: PCC and PQC of the corresponding pairs, before and after outlier removal. R_QC: PQC. R_QC_RM: PQC after outlier removal. P_Pearson: PCC. R_P_RM: PCC after outlier removal. Bottom table: annotated mutation records from the Sanger COSMIC database, directly matched to these two outlier pairs.

Figure 3-4 Illustration of bivariate outliers.

‘o’ denotes data generated from a bivariate normal distribution centered at (0,0) and scattered with covariance of (1, 0.8, 0.8, 1). The outliers are added at the center of (0,4) and scattered with the covariance of (1,0.2,0.2,1). ‘+’ denotes the outliers flagged by method proposed by Filzmoser et. al.

Figure 3-5 Simulated data structure with artificially added outliers.

Bivariate random data are simulated with a bivariate normal random number generator with a given location and scatter. Missing 1D denotes data similar to the bivariate random case, but containing missing values along one direction that were replaced with baseline values. Missing 2D denotes data containing such missing values along both directions. The outlier panels show simulated bivariate normal data with outliers added along both directions, only one direction, and either one of the two directions, respectively.

Chapter 4 CoolMap for Interactive Exploration of Omics Data

4.1 Introduction

As discussed in previous chapters, the Omics movement has produced a large number of molecular profiles, usually stored in matrix (profile-sample) format or (profile-profile) correlation format. Such datasets are usually visualized as heatmaps using a value-to-color mapping. As the size of such data matrices grows, it becomes increasingly difficult to explore these Omics datasets interactively. There are several challenges to the classic heatmap visualization: first, a modern chip may contain tens of thousands of molecular profiles, and it is impractical to display the resulting heatmap on a single monitor screen without a significant loss of information. Therefore a scaled-down version of the heatmap is usually displayed, and artifacts may occur due to the image resizing process. Secondly, many heatmaps with hierarchical clustering lack interactivity for data exploration, such as the heatmap renderers in Rxxxi, R packages such as ggplot2 and Bioconductor, InfoViz and Matlabxxxii. It is often easy to identify patterns around a certain heatmap region but difficult to obtain the exact underlying values from the plot. Thirdly, when clustering gene expression data, each gene only appears once in the resulting heatmap view. Some genes may actually have expression profiles similar to genes belonging to different pathways or modules; enforcing a single membership may lead to inaccurate interpretation. Fourthly, it is difficult to examine relationships at a higher concept level – instead of investigating gene-to-gene or gene-to-metabolite associations, there are many research questions to be addressed beyond the raw pairwise data: how do pathways or modules associate or interact? What are the intra-cluster and inter-cluster variabilities of a given clustering result? For replicate

xxxi http://www.r-project.org/ xxxii http://www.mathworks.com/products/matlab/

experiments, is there a way to examine all the raw values from several datasets interactively in a single view? Is there a way to visually compare a clustering result against annotated pathways, or to visualize several ontologies (GO biological process and molecular function) simultaneously? In the previous NCI-60 analysis, we rendered a heatmap of gene-metabolite correlations with the molecules arranged into KEGG pathway groups. Even with only a few hundred rows and columns on each side, the heatmap was already quite difficult to interpret. It would be much more informative if we could obtain a summary matrix that contains representative correlation values (such as the max, min or median) at intersecting pathways. Then we would only need to drill down into pathways that show strong correlation signals and identify the key molecules that drive those correlations.

In the following sections, I will first provide a brief history of matrix visualization, and then address how CoolMap can help answer these research questions. The idea was originally formulated by my mentor Fan Meng, and I then evolved it from the features of conventional heatmaps. Many design concepts were inspired by the best practices of popular software platforms and modern web tools such as Cytoscape, Google Maps and Photoshop.

4.2 The origins of key design concepts

Before diving into the details of matrix visualization, I provide a brief description of the evolution of the overall design. The inception of Web 2.0 revolutionized the way people browse remote data – the experience has become more dynamic, interactive and fluid. People spend much less time waiting for the system to respond, and more time exploring and analyzing data using modern data visualization platforms. As the size and complexity of biomedical data continue to grow rapidly with the development of new high-throughput technologies, it has become more and more critical to provide efficient and effective ways for biomedical researchers to explore these datasets and generate hypotheses. The first attempt was to try the idea of 'heterogeneous aggregation' along only one axis. The resulting dynamic genome browser (DGB) prototype, developed back in 2008, introduced the idea of 'heterogeneous zooming' of tracks. While the majority of genome browsers use a single zoom level for all tracks, DGB allows users to view tracks at different zoom levels, with detail-rich tracks such as SNPs or genomic sequences zoomed in much deeper

and coarse tracks such as chromosomal bands viewed only at the overview level. The tracks are all synchronized at the center and scroll at different speeds. In this way the user can examine certain regions in detail without losing the higher-order context around them. It was one of my earliest attempts to build data visualization applications with hierarchical view structures.

Another inspiration comes from Google Maps. It is very common to have adaptive rendering capability in Geographic Information Systems (GIS). At different zoom levels, the visualization provides different levels of detail to the user, from displaying only the name of a city to a full street-level view. Most current heatmap visualizations are uniform across zoom levels – each cell is represented by a color regardless of the size it occupies on screen. With the introduction of ontology nodes, it is possible to offer more information to the researcher as the region of interest is zoomed in. For example, the exact numerical value underlying a cell can be displayed instead of a color when the cell is sufficiently large. Pie charts, box plots or statistics can also be overlaid on ontology node intersections. With these added capabilities, the heatmap can provide the user with much more contextual information to facilitate the analytical and discovery process.

4.3 A brief history of Matrix Visualization

There are many ways to represent two-way or multi-way associative data. For very small datasets, it is often desirable to use a tree diagram. The tree diagram resembles a hash-of-hashes data structure, and works well for unordered data. For larger two-way or three-way datasets, a table is more suitable. A two-way numeric table can be considered a 'flattened' view of three-dimensional data, with each cell value indicating the z-value at the corresponding xy position. For larger tables, it is generally more appealing to substitute visual cues for the numbers to allow more intuitive observation. For example, bars or shape sizes can be used to fill each cell175. More frequently, the cell values are mapped to color schemes (continuous color gradients) or shade values (black and white). For example, Czekanowski employed a diagram with a discrete mapping of five shading levels to illustrate a 2D table176. Due to its intuitiveness and ease of implementation, the heatmap representation has been adopted in a wide range of disciplines, such as archaeology, cartography, sociology,

and psychology. It was not until Eisen's 1998 paper that this technique was applied widely to gene expression data102, and it still remains the prime choice for visualizing chip-like data. Many software packages have been developed to produce static or interactive heatmaps, such as R (heatmap, gplots, ggplot, Bioconductor), Matlab, GenePattern, VistaClara, InfoViz, TreeViewer, JTreeMap, ClusterMaker, ConceptGen, BioSearch2D, etc.73,92,177–180. However, many of these heatmap implementations are static and cannot be interactively explored. It is also difficult to associate structured annotation data with rows and columns in these applications.

There are also several additional challenges to create a meaningful heatmap – the heatmap should not only function as a simple translation of a numeric space to a color space, but also unravel the otherwise inconspicuous structure of the data. To achieve this goal, the rows and columns should be permuted to maximize the distance of the most dissimilar entries and minimize the distance of the most similar ones. Due to the large number of possible permutations (n!m!), it is very important to use an efficient and effective algorithm. There are currently three major approaches: hierarchical (agglomerative/divisive) clustering, partitional clustering and row/column seriation.

Hierarchical clustering is a greedy step-wise algorithm that progressively merges the most similar (or splits the most dissimilar) rows or columns based on a chosen linkage criterion (single, complete, average, median, Ward, centroid). The algorithm is moderately efficient (depending on the implementation, the complexity varies from around O(N²) to O(N³))181, and probably its most appealing feature is that it produces a hierarchical tree that clearly reveals the underlying grouping structure when the data are clearly bi-modal or contain a small number of modes. However, as almost any dataset will produce a hierarchical tree, the validity of this approach is often questioned if the clustering quality is not evaluated, since the tree may simply be an artifact of the algorithm rather than a true reflection of the data structure. Moreover, as each left and right branch of the tree can be flipped independently (2^(n-1) possible leaf orderings in total), the resulting hierarchical tree may not best reflect the data structure. Some leaf-repositioning algorithms have been developed to optimize the node flipping182, but they are not scalable to large datasets.
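As a tiny illustration of this ordering step, the sketch below clusters the rows of a random matrix with base R's hclust and draws a heatmap using one of the admissible leaf orderings; the data and linkage choice are arbitrary.

## Sketch: agglomerative clustering of rows for heatmap ordering
set.seed(4)
mat <- matrix(rnorm(200), nrow = 20)
hc  <- hclust(dist(mat), method = "average")
row_order <- hc$order                           # one of the 2^(n-1) admissible leaf orderings
heatmap(mat[row_order, ], Rowv = NA, Colv = NA) # draw with the fixed row order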

The partitional algorithms, such as K-means and Model based iterative clustering, attempt to place entities into groups to maximize inter-group difference and minimize intra-group

difference. If the number of clusters is known a priori, this approach often out-performs the hierarchical algorithms in terms of cluster homogeneity. Sometimes the hierarchical and partitional procedures are even combined for better performance183. The problem with visualizing a heatmap using the result of a partitional clustering algorithm is that the best intra-cluster orders are unknown, which matters little for functional inference but is important for heatmap plotting. Therefore, some heuristics or other methods are needed to find an optimal intra-cluster order for heatmap visualization.

The matrix seriation (or reorderable matrix) methods are the least popular approaches in the bio-domain. The main reason is possibly that in a typical chip analysis, it is more important to find a good cluster membership than to determine the optimal ordering of the heatmap's rows and columns. Furthermore, because most heatmaps in publications only plot a subset of selected gene expression profiles (or other signals), usually chosen to show the most drastic contrast across certain groups and give the reader a visual cue of the degree or overall pattern, hierarchical clustering usually works fairly well. However, its effectiveness decays when the underlying data contain multiple small fuzzy groupings or other peculiarities. Seriation, by definition, is to find an ordering for an array of objects such that the position of each object reflects its relatedness to its neighbors. For example, to seriate the dissimilarity matrix for the genes on an expression chip, the matrix would be permuted such that the smallest dissimilarity values are placed close to the diagonal – an approximate anti-Robinson matrix. The first documented formulation was probably from Petrie, who used a sorted matrix to represent tomb relic data. A number of algorithms have been developed, such as those listed here184. Only recently have software packages become available, such as PermutMatrix185, GAP186 and the 'seriation' package for R. The major drawback of seriation algorithms is that they are not very scalable (they can be O(N³) or even O(N⁴)); a combined clustering-seriation scheme has therefore been proposed, in which seriation is only used to refine the ordering within small clusters.
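The sketch below shows how such a row ordering might be obtained with the 'seriation' R package mentioned above; the method choice ("OLO", optimal leaf ordering of a dendrogram, as I understand the package) and the data are illustrative only.

## Sketch: seriating a dissimilarity matrix with the 'seriation' package
library(seriation)
set.seed(5)
mat <- matrix(rnorm(200), nrow = 20)
d   <- dist(mat)
o   <- seriate(d, method = "OLO")
ord <- get_order(o)                  # row permutation placing similar rows adjacently
heatmap(mat[ord, ], Rowv = NA, Colv = NA)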

Some methods have been developed to optimize the display of the hierarchical tree. Treemaps place the tree nodes in alternating rows and columns, and the size of each compartment can be mapped to a tree node attribute value. However, it is still quite difficult to apply these methods to matrix visualization. Although it is possible to combine various views and update them simultaneously to provide multiple aspects of the data to the

user at the same time, the scale and complexity of current datasets may cause a 'mind overflow' for the user187.

Even with a proper permutation of rows and columns, current heatmap visualization applications make little use of the clustering or ontology structure other than plotting it alongside the heatmap for indication purposes. It is difficult to browse large heatmaps, sort a region of a heatmap, or associate structured data with rows and columns. Using gene chips as an example, the user may want to take advantage of gene annotation information such as ontology or function to show only genes linked to certain pathways or biological processes in a heatmap, to place genes from two clusters side by side for comparison, or even to draw a heatmap of pathways versus pathways from gene expression profiles. As described before, we encountered such difficulties when analyzing the NCI-60 dataset, and this subsequently spurred the design of CoolMap.

4.4 The overall design of CoolMap

There have been some previous attempts to integrate multi-level trees into heatmap-type visualizations, but each of these implementations has deficiencies. Matrix Zoom119 and JTreeView188 are both capable of plotting a hierarchical tree along the heatmap, and the user may select a branch for an in-depth view, but the interactivity of the tree diminishes quickly as the size of the heatmap grows. The user has no easy way to investigate a branch of an ontology tree independently at a certain level. There is no way to manipulate the tree nodes, such as expanding, collapsing or hiding certain nodes from the view while keeping the rest of the view in place. In addition, these trees can only be single-inheritance trees, usually the result of hierarchical clustering. As a result, it is impossible to visualize two ontology terms or pathways side by side with shared child nodes, even though this is a very common occurrence, as many pathways share genes and metabolites. Furthermore, the width and height of the heatmap cells in all these applications can only be changed with a single zoom multiplier, which makes it difficult to examine heatmaps with a very large number of rows (genes) and a very small number of samples (columns). As a result, the majority of heatmaps only provide a very coarse overview of the underlying data

characteristics. We believe that by integrating ontology trees with agile and flexible rendering pipelines, the capability of heatmaps can be significantly enhanced for exploring current datasets.

We developed CoolMap specifically to address the above issues.

Table 4-1 Feature comparison of CoolMap with some other tools

Feature                           R (gplots heatmap.2)   JTreeView   MatrixZoom   CoolMap
Interactive                       No                     Yes         Yes          Yes
User-rearrange row/column order   No                     No          No           Yes
Rendering other data-types        No                     Yes         No           Yes
Rendering other than color        No                     Yes         No           Yes
Annotation overlays               Yes                    No          No           Yes
Clustering                        Yes                    Yes         Yes          Under development
Node Expansion/Collapse           No                     No          Yes          Yes
Multiple Ontology                 No                     No          No           Yes
Multiple Plot Link                No                     Yes         Yes          Under development
Extract Sub-portion               Programmatically       Yes         Yes          Yes
Search/Filter                     Programmatically       No          No           Yes

Figure 4-1 illustrates the basic concept of CoolMap. Suppose the rows and columns of a matrix can be aggregated with an ontology tree; the nodes of the tree can then be folded in either direction. Each resulting cell is replaced with a summary value computed by a compatible aggregation function, such as the mean, median, a quantile, or the variance (a toy sketch of this aggregation step is shown after the feature list below). The ontologies can come from hierarchical clustering, user-defined groups, pathways, networks, molecular ontologies or phylogenetic trees exported from databases. With this design, it is possible to view the data at a higher concept level; the details can then be shown contextually via overlays, or by expanding the node pairs of interest. A significant number of peripheral functions was also developed to support the interactive visualization in CoolMap. The following list summarizes the main features of CoolMap:

Major features:

• Capable of exploring 2D data with arbitrary format. Due to the extensible interface, CoolMap potentially can render any type of data with a compatible data loader, aggregator and renderer. The current release only supports numeric data, but can be easily extended to string, character, boolean, image or even composite data types.

• An ontology with arbitrary structure, but without self-loops, can be loaded. An ontology term is a group term that contains an arbitrary number of rows or columns from the base matrix. There can also be hierarchical structure between the ontology terms (such as Gene Ontology or intermediate nodes in the result of hierarchical clustering).

• Rows or columns can be added, removed, or reordered manually or programmatically. For example, the row and column order from matrix seriation can be used to reorder the CoolMap in view. The ontology nodes can be mixed with base nodes in the same view for side-by-side comparison. Ontology nodes can be expanded to reveal ontology terms on the lower level.

• The CoolMap in view can be dragged around; hovering the mouse near a cell immediately shows the underlying value in the matrix; the size of each cell can be adjusted using the resizing grid so that cells of interest can take more space and reveal more details.

81 • A renderer can be assigned to determine how the view matrix is presented. It is very easy to extend the renderer interface with a variety of visualization methods. Right now the renderer supports color mapping and size (bar).

• An annotator can render the base rows and columns associated with a matrix on top of the renderer layer. This interface can also be extended to render the underlying group of cells as a variety of graphics (boxplots, scatter plots, etc).

• It’s possible to use several criteria to filter the CoolMap with filters. Cells in view that pass the composite criteria will be shown. Others will be masked out from view. The user may use numeric or fuzzy string filters. Multiple filters can be joined together using logic AND or OR.

• The user may use the ontology browser and ontology table to search for ontology terms of interest, inspect the hierarchical structure around a term, and then add selected terms at any row/column locations in the view.

• A bridge to R is available to both the user and plugin developers to incorporate R functions, such as hierarchical clustering, correlation, k-means, seriation, etc to CoolMap.
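CoolMap itself is a Java application, so the following is only a toy R analogue of the aggregation concept in Figure 4-1: rows and columns are folded by group (ontology) labels and each block is replaced with a summary value. All names and data here are hypothetical.

## Toy sketch of ontology-based matrix aggregation (not CoolMap's actual code)
aggregate_matrix <- function(mat, row_groups, col_groups, fun = mean) {
  rg <- unique(row_groups); cg <- unique(col_groups)
  out <- matrix(NA_real_, length(rg), length(cg), dimnames = list(rg, cg))
  for (rr in rg) for (cc in cg)
    out[rr, cc] <- fun(mat[row_groups == rr, col_groups == cc, drop = FALSE],
                       na.rm = TRUE)
  out
}

set.seed(6)
expr <- matrix(rnorm(6 * 4), 6, 4,
               dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))
aggregate_matrix(expr, row_groups = rep(c("pathwayA", "pathwayB"), each = 3),
                 col_groups = rep(c("groupSFA", "groupMUFA"), each = 2), fun = median)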

More details can be found in the appendix. Figure 4-2 shows a screenshot with a majority of the CoolMap widgets. A feature comparison is listed in Table 4-1.

4.5 Application of CoolMap to Exploratory Data Analysis

4.5.1 Nutrition experiment with multiple categorical designs

To illustrate how CoolMap can facilitate the exploration of microarray data, we replicated the exploratory process of a nutrition study aimed at identifying how a saturated-fatty-acid-rich diet triggers obesity-linked proinflammatory gene expression in adipose tissue189. The sample consists of 20 abdominally overweight subjects who received saturated fatty acid (SFA) diets or monounsaturated fatty acid rich (MUFA) diets over an

eight-week span. The samples were categorized by factors such as the experimental protocol, the gene expression profiles were analyzed, and the authors discovered that consumption of the SFA diet triggered elevated expression of genes involved in inflammation processes in adipose tissue, whereas the MUFA diet led to anti-inflammatory gene expression fingerprints. We therefore used their dataset to examine how the change in expression profiles can be visually explored. The experimental protocol, time, individual and gender can all be used as group ontologies in the view.

Figure 4-3 shows the overall expression of the well-known immune-related genes shown in Figure 3 of the original paper. With CoolMap, it is possible to access both gene-level and probe-level expression, which was not immediately possible before. From the comparison figure we can see that the genes IL6R and CD209 each have two probes in the probe set. While the expression levels of the IL6R probes are quite similar, the probe-level expression of CD209 differs markedly (~3 for NuGO_eht0340260_at and ~8 for 207277_at). Because the average expression value across all probes in a probe set is usually taken as the gene expression value, the overall value for CD209 may be problematic in this case. For a gene mapped to a larger number of probes, it would be helpful to examine the individual probe expression values of filtered genes before further functional inference. This demonstrates CoolMap's capability for quality control whenever a molecular profile can be mapped to several child profiles, such as time series, technical replicates, or protein or gene families.

Figure 4-4 shows the exploration of gene expression changes in the SFA and MUFA diet groups. The original paper used a heatmap189 to illustrate the increased expression of immune-related genes. The same set of genes was exported from the GEO data files (some genes could not be found, possibly due to changes in the probe mapping annotations) and visualized in CoolMap. By using the experiment groups, it is possible to examine the change of gene expression at the SFA and MUFA group level instead of for each individual gene. Based on mean and median expression fold change, the SFA group indeed shows an overall increase in the expression of immune-related genes (Figure 4-4 B, C, D). Meanwhile, unlike in the original heatmap, CD209 did not show elevated expression in the SFA group. As CD209 is mapped to two probes, expanding CD209 reveals that probe 207277_at shows overall elevated expression in the SFA group, while NuGO_eht0340260_at showed little change in expression across all samples. It is therefore likely that NuGO_eht0340260_at is either a probe with low sensitivity (it also has a low expression value compared with 207277_at) or an incorrect mapping.

These two examples together demonstrate that CoolMap can replicate heatmap functions while providing additional insight through interactive exploration at higher levels (experiment groups) and lower levels (individual probes), which is not possible with existing heatmap-based applications.

4.5.2 Analysis of Mother-Child Nutrient/Epigenome Interaction

Another test study of CoolMap comes from a collaborative research project with Dr Charles Burant, aimed at understanding how mother-child nutrient transfer and epigenetics can affect the development of the child. Mother-to-child placental transfer of nutrients, such as fatty acids, is critical to the fetus's prenatal and postnatal growth. We collected a dataset with targeted serum metabolomics profiling and methylation profiling. There are two major research questions we would like to address. First, how do metabolite levels and gene methylation correlate between mother and child; more specifically, how do genetic changes in the mother lead to epigenetic changes in the child, and how do maternal metabolite (diet) levels transferred across the placenta eventually lead to epigenetic changes in the child? Second, what are the predicted changes in gene expression, and subsequently the potential changes in the child's metabolism? There are several key genes we would like to explore, such as Insulin-like Growth Factor 2 (IGF2, http://www.ncbi.nlm.nih.gov/gene/3481), which affects development and growth; Peroxisome Proliferator-Activated Receptor Alpha (PPARa, http://www.ncbi.nlm.nih.gov/gene/5465), which regulates lipid metabolism; and Estrogen Receptor Alpha (ESR1, http://www.ncbi.nlm.nih.gov/gene/2099), which also regulates development. Gene methylation profiles were obtained from blood samples from 15 mothers (in the first trimester) and from maternal and infant cord blood collected during delivery. Directed metabolomics analysis was performed to quantify serum amino acids, acylcarnitines and total plasma fatty acids from both the mother and the child. Ontology headers were also created so that CoolMap could be used for multi-level exploration of the datasets (LINE1 methylation, ESR methylation sites, IGF2 methylation sites, PPARa methylation, BCAA acylcarnitines, medium-chain acylcarnitines, long-chain acylcarnitines, aromatic amino acids, saturated fatty acids, monounsaturated fatty acids, PUFA, etc.).

In order to investigate the association between the profiles of these biomarkers in mother and child, we computed the Pearson correlation coefficient on these data along with a number of clinical parameters, including preconception weight, weight gain during pregnancy and term fetus weight, among others. Data were z-transformed; in the case of missing values, only complete observations were used to compute correlations. The resultant correlation matrix was also clustered to find close associations. Using CoolMap, instead of browsing the entire dataset, we could first identify which groups of biomarkers have high correlation between mother and child, and then drill down to identify the members within a group that drive the high correlation. Filtering, annotators and combinations of operations further facilitate the knowledge discovery process. Figure 4-5 shows that the aggregated group view makes it much easier to understand the overall trend of the data than a heatmap drawn at the base level. From the aggregated view, we can immediately identify the mother-child measurement groups that are highly correlated overall, such as BCAA acylcarnitines (0.45), long-chain acylcarnitines (0.34), PPARa methylation (0.52) and ESR methylation (0.32).

The next step is to identify the driving forces behind these high correlations within a group (Figure 4-6). Overlaying a boxplot on PPARa shows that the spread of correlations across PPARa gene methylation sites (mean 0.52) is quite small. This implies that the methylation profiles of mother and child are strongly correlated, and points to a genetic factor behind the correlated gene expression profiles. Applying the same method to BCAA acylcarnitines (0.45), we can see that the spread of correlations is much wider, and that the C4 acylcarnitine pulls the overall correlation down considerably. The mean methylation correlation between mother and child for ESR is much lower, only 0.32, and the boxplot overlay shows that the correlation spread is quite wide. Expanding the corresponding nodes to the child level reveals the heterogeneity of site-specific methylation correlations, with site 3 having a much higher correlation (0.838) than the other members. The ESR methylation site 1, on the other hand, has a moderately high correlation with maternal valine (0.42), which implies a potential metabolic influence on gene expression profiles. By using the other search and filter functions of CoolMap, we could identify further highly correlated pairs within an ontology/group context (Figure 4-7). Through multi-level CoolMap exploratory analysis, we could identify interesting signals in the data from a high-level view, and then investigate the lower-level details to mine the data for hypothesis generation. The identified highly correlated methylation/metabolite pairs can lead to further targeted experiments and pathway/functional module elucidation.

4.6 Other Applications

4.6.1 Data Quality Control

As described in Chapter 3, many omics datasets may contain abnormally distributed data, missing values, or extreme outliers. It is therefore very helpful to quickly investigate the input data and identify the number, distribution and pattern of missing values. Figure 4-8 shows the CoolMap visualization of baseline metabolite measurements from the investigational weight management clinic. We can see that there are quite a few missing values for baselines 11, 13 and 14. Furthermore, the glutamine level appears abnormally high. Quickly adjusting the range of the color mapping better reveals the details in the low-value regions.

4.6.2 Data with a Large Number of Missing Values

List-typed data, where each row contains an uneven number of elements, and sparse matrices (for networks) with a large number of missing values can be difficult to visualize using traditional heatmaps. Conceptually, even if there are missing values in a row or column group, it should still be possible to obtain representative or summary values that drastically reduce the number of missing values. Figure 4-9 illustrates an example of using CoolMap on DNA methylation data. These data were obtained from four squamous cell carcinoma cell lines (two positive for human papillomavirus, HPV, and two negative) using the Illumina HumanMethylation27k BeadChip platform190. By building ontology groups for methylation sites, the original matrix view can be translated into a much more condensed view. Expansion of the methylation groups shows the number of methylation sites for each gene and their values. It can be seen that some genes, such as CDKN2A and CDKN2B, have many more methylation sites than others; using only the average would eliminate the intra-group heterogeneity and the possibility of studying which site actually drives the underlying process.

4.6.3 Sequence Analysis

As CoolMap can visualize data types other than numeric values, we also developed a renderer variant that can be used to explore sequence alignments and motifs. Transcription factors usually bind DNA in a sequence-specific manner, and the resultant binding sites are typically represented as consensus sequences, position weight matrices or sequence logos191. CoolMap can display the consensus sequence using IUPAC degeneracy letters while still preserving the underlying individual sequence information. Figure 4-10 illustrates an example of the CRP binding site taken from Schultz et al.192. The consensus sequence can be aggregated at different levels to illustrate the variability of conservation. Base and GC content can also be overlaid on top of the sequence logos to show the underlying base distribution.
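
To make the degenerate-letter aggregation concrete, the following is a minimal Java sketch (Java being the language CoolMap is written in) of how aligned binding-site sequences could be collapsed column by column into an IUPAC consensus. The class and method names are illustrative and are not CoolMap's actual renderer code.

    import java.util.*;

    // Minimal sketch: collapse aligned, equal-length binding-site sequences
    // into an IUPAC degenerate consensus, one letter per column.
    public class IupacConsensus {

        // IUPAC codes keyed by the sorted set of bases observed in a column.
        private static final Map<String, Character> IUPAC = new HashMap<String, Character>();
        static {
            IUPAC.put("A", 'A');   IUPAC.put("C", 'C');   IUPAC.put("G", 'G');   IUPAC.put("T", 'T');
            IUPAC.put("AG", 'R');  IUPAC.put("CT", 'Y');  IUPAC.put("CG", 'S');  IUPAC.put("AT", 'W');
            IUPAC.put("GT", 'K');  IUPAC.put("AC", 'M');
            IUPAC.put("CGT", 'B'); IUPAC.put("AGT", 'D'); IUPAC.put("ACT", 'H'); IUPAC.put("ACG", 'V');
            IUPAC.put("ACGT", 'N');
        }

        public static String consensus(List<String> alignedSeqs) {
            int len = alignedSeqs.get(0).length();
            StringBuilder out = new StringBuilder(len);
            for (int col = 0; col < len; col++) {
                // Collect the distinct bases seen in this alignment column.
                SortedSet<Character> bases = new TreeSet<Character>();
                for (String seq : alignedSeqs) {
                    bases.add(Character.toUpperCase(seq.charAt(col)));
                }
                StringBuilder key = new StringBuilder();
                for (char b : bases) key.append(b);
                Character code = IUPAC.get(key.toString());
                out.append(code == null ? 'N' : code);
            }
            return out.toString();
        }

        public static void main(String[] args) {
            // Toy alignment; real input would be the 49 CRP site sequences.
            List<String> sites = Arrays.asList("TGTGAT", "TGTGAC", "TTTGAT");
            System.out.println(consensus(sites)); // prints TKTGAY
        }
    }

The same column-wise collapse can be applied to any sub-group of the sequences, which is what allows the consensus to be shown at several aggregation levels.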

4.7 Summary

CoolMap extends the visualization capability of the traditional heatmap by incorporating ontology structure along both axes. The usage examples demonstrate that CoolMap can render multi-scale information to the researcher for rapid exploratory analysis and hypothesis generation. Ontology-aided data aggregation helps researchers understand the signals in their data using structured knowledge. We hope that with this new heatmap-based model, versatile rendering options, a streamlined user interface and a range of auxiliary functions, CoolMap will improve the data-driven functional interpretation process.

Table 4-1 Feature comparison of CoolMap with some other tools

Feature                            R (gplots heatmap.2)   JTreeView   MatrixZoom   CoolMap
Interactive                        No                     Yes         Yes          Yes
User rearrange row/column order    No                     No          No           Yes
Rendering other data types         No                     Yes         No           Yes
Rendering other than color         No                     Yes         No           Yes
Annotation overlays                Yes                    No          No           Yes
Clustering                         Yes                    Yes         Yes          Under development
Node expansion/collapse            No                     No          Yes          Yes
Multiple ontology                  No                     No          No           Yes
Multiple plot link                 No                     Yes         Yes          Under development
Extract sub-portion                Programmatically       Yes         Yes          Yes
Search/Filter                      Programmatically       No          No           Yes

Figure 4-1 The basic concept diagram of CoolMap.

In the CoolMap implementation, the rows and columns of a data matrix can be aggregated (collapsed and expanded using a designated aggregation function). The top panel illustrates the basic concept: a row and column at any aggregation level can be collapsed into a summarization view, or expanded to show the underlying details. The bottom panel illustrates the organizational flow of CoolMap: in the first step, the data objects in the raw matrix are aggregated into the view matrix with a designated aggregator. The aggregator translates raw data type R into view object type V. The renderer then renders the view matrix into a graphic CoolMap representation. An example flow: the raw data matrix contains gene expression values in double numeric format, the view matrix contains Gene Ontology groups, and the aggregator translates the raw numeric values into mean log2-transformed expression values. The number-to-color renderer finally converts the resultant view matrix into a heatmap representation.

Figure 4-2 CoolMap screenshot.

A CoolMap screenshot showing the majority of the currently developed modules, including CoolMap listers, filters, the ontology browser, the annotation renderer, the color renderer, the data browser table, etc.

Figure 4-3 Quality control of the GDS3678 dataset from Figure 3 of the original paper.

The genes in Figure 3 of the original paper were arranged in a CoolMap view. Several genes, such as HLA-DQA1, PPARg and HLA-DMA, are missing. The majority of the probes match gene annotations one to one. Two genes, IL6R and CD209, have two probes each in the probe set. After expansion, it can be seen that the expression of the two IL6R probes (yellow rectangles) is quite uniform. However, the two probes of CD209, 207277_at and NuGO_eht0340260_at, have very dissimilar expression profiles, with 207277_at at ~8 across all samples and NuGO_eht0340260_at at ~3. If the resultant expression value is taken as the average, this could be a cause for concern. This demonstrates that CoolMap can be used to investigate raw data at different hierarchies. The two groups on the horizontal axis denote the saturated fatty acid (SFA) group and the monounsaturated fatty acid (MUFA) group, respectively.

Figure 4-4 Illustration of elevated expression of immune-related genes in the saturated fatty acid (SFA) diet set.

The original paper used a heatmap to illustrate the elevated gene expression in the SFA set over an eight-week diet span compared with the MUFA set (A). The fold-change values are mapped from -0.5 to 0.5, from green to red. We illustrated the same set of genes (some could not be found in the downloaded dataset, probably due to changes in probe definitions) using CoolMap (center). The samples can be aggregated using the experiment design (http://www.ncbi.nlm.nih.gov/geo/gds/profileGraph.cgi?gds=3678). In the aggregated view, it is much clearer from the mean or median fold change that the immune-related genes do show elevated expression. Color is coded from orange to blue, from -0.5 to 0.5. A: original heatmap visualization. B: CoolMap replication with matching genes. C: aggregated mean. D: aggregated median. The SFA group demonstrates elevated overall gene expression. E: CD209 did not show marked elevation of expression with the GEO data annotation; expanding CD209 shows that probe 207277_at has elevated expression in SFA, consistent with A, whereas NuGO_eht0340260_at did not show any marked expression change. This probe is therefore either a wrong match or has very low sensitivity.

Figure 4-5 Condensing the raw data into methylation/metabolite groups.

A: illustrates the efficient use of ontologies to reduce the original pairwise correlation matrix to a condensed and manageable group view. From the condensed view we can identify the highly correlated groups, such as BCAA acylcarnitines (0.45), long-chain acylcarnitines (0.34), PPARa methylation (0.52) and ESR methylation (0.32).

Figure 4-6 Multi-level exploration of highly correlated mother-child pairs.

Top row: the PPARa mother-child correlation. The boxplot overlay shows that the overall correlation is quite high. Center row: the BCAA acylcarnitine mother-child correlation. The spread of the data is wider, and the C4 acylcarnitine mother-child correlation is quite low. Bottom row: the ESR1 boxplot and expanded views illustrate the methylation site heterogeneity.


Figure 4-7 Using CoolMap for interactive knowledge discovery.

A: Using a filter to remove low correlations from view significantly reduces the exploration space (showing only correlations between 0.5 and 1.0). The highlighted rectangle shows a high mother-to-child correlation with X20.4. B: Even though there are strong correlations of LINE1 methylation within the mother samples and within the child samples, no strong LINE1 correlation is identified between mother and child; further expansion confirms that there is indeed no strong correlation signal. C: ESR1 methylation is strongly correlated between mother and child, and it can be seen that ESR1 site 3 is the main driver of the strong correlation.

Figure 4-8 CoolMap data quality inspection. We can quickly identify CoolMap regions with missing values or other peculiarities. Top: the color scale from yellow to orange is mapped from 0.0 to 1087.92. Note that the center column, glutamine, has values much higher than the other metabolites. Bottom: adjusting the color mapping to span 0.0 to 100.0 from yellow to orange reveals more detail in the low-value regions.

Figure 4-9 Illustration of CoolMap on unpublished methylation data from Maureen Sartor.

A illustrates the average methylation values and expression values condensed by sites, with Caski.1 and Caski.2 expanded. B illustrates the expansion of four methylation groups. Because genes have different numbers of methylation sites, it is usually difficult to illustrate such list-typed data using a heatmap. Using only the average value across all sites may be biased for genes with a large number of methylation sites. Expansion of the methylation groups shows that CDKN2A and CDKN2B both have around 10 methylation sites. The scale is green (-2.0) to red (2.0). Orange dotted regions indicate missing values.

Figure 4-10 Illustration of using CoolMap for sequence analysis.

Top-left: a sequence logo that illustrates the CRP (Catabolite Activator Protein) binding site, 49 sequences. Bottom-left: the fully expanded view of all the sequences, T(red), G(yellow), A(green) and C(blue). Top-right: the sequences are now collapsed into the CRP family, rendered using degenerate letters following the IUPAC rule. Note that the conserved pattern of TGTGA … TCACA is shown, with additional WWs on each side. Center-right: the base percentages are overlaid as an annotation. Center-bottom: GC content is shown.

Chapter 5 Conclusion

In the course of my PhD research, I began by making incremental changes to existing exploration models for omics data. I developed the GLay plugin for Cytoscape to extend network clustering capabilities with a variety of community structure detection algorithms, GSearcher to offer flexible, intuitive and fuzzy search tools for filtering nodes or edges of interest from large attribute tables, and Node Filter to interactively explore networks by node connectivity and to expand networks from a biological concept interaction database. I also developed a novel workflow, coupling robust statistical methods with classic methods, to analyze transcriptomics-metabolomics data with multivariate outliers and a high level of noise, and used it to identify new metabolite-metabolite and gene-metabolite relationships in the NCI-60 transcriptomics-metabolomics data, which can be used for subsequent characterization of unknown metabolites and biomarker discovery. In this process, I began to realize that the growing size and complexity of omics data made it difficult to continue using existing models for effective visual exploration. Therefore, with the guidance of my advisor Fan Meng and the thesis committee, I designed a novel model that facilitates data-driven exploration of omics datasets and is scalable to the growing dataset sizes of the foreseeable future. After about two years and over five complete overhauls, we came up with CoolMap, in the hope that it will allow researchers to make better use of these enormously information-rich omics data.

Where is the future of omics data exploration and analysis? With the shift of client devices toward tablets and the permeation of cloud storage, we can imagine the following change: the computational model will return to an early thin-client, powerful-server pattern. The client side will be lighter, with less computational power but larger and more versatile screen estate (high-resolution multi-screen setups), more flexible user interactions (touch interfaces) and connectivity (wireless), while the server side will be cheaper but bigger (cloud, rentable servers). A large amount of omics data will then be stored in online repositories using more normalized formats, and heavy-duty statistical analyses, such as network inference, enrichment analysis, clustering and classification, will all be done remotely in batch and in parallel. The analysis cycle will be much more efficient: users will spend less time conducting local analysis and more time interpreting results. It will also be more efficient to collaborate online and to use reproducible workflows. Many open-source software groups, such as the Cytoscape developers, already use Google Hangouts for lightweight teleconferencing and git for community development and code hosting. This model could easily be extended to other collaborative online research projects.

With this ongoing trend, it is critical to develop novel and intuitive visual exploratory analysis tools for future datasets. The heterogeneous view design of CoolMap enables users to investigate data at different aggregation levels within a single view. With the continuous improvement of measurement resolution, it will become impossible to manually investigate every detail of a large dataset. Rapid search functions, efficient and robust statistical methods, and visualizations that constantly adapt to the analyst's focus and provide the most relevant contextual information will be critical to the analysis of future omics datasets. The visual exploration model of Cytoscape for biological network analysis has become a paradigm193: the user can switch between a global overview of the entire network structure and the detailed neighborhood view of a gene within a sub-network representing a protein complex or a pathway; network statistics can be computed using NetworkAnalyzer194; external link tools such as the Agilent literature search plugin can be used to retrieve external annotations of a gene or protein on the fly with one click195; the current gene or protein in a network can be expanded using MiMI or Metscape to inspect its interaction pattern35,49; and external reference networks or pathways can be imported directly from databases such as Reactome or KEGG34,47. The researcher has all the tools needed, and these functions can be accessed conveniently and contextually. This loosely coupled 'core framework' plus 'peripheral services' model has proven its effectiveness for visual exploratory analysis applications.

Nevertheless, network representation is a qualitative approximation of the actual quantitative interactions detected in omics data. The construction of a co-expression network from omics datasets, as proposed in many previous studies165,196–199, requires a 'cut-off': a value that decides whether an interaction qualifies as an edge. If the distribution of pairwise distances is very flat or multi-modal, a slight change in the cut-off may drastically change the number of edges in the network, which would subsequently affect the overall topology of the resultant network and the clustering of network modules. Furthermore, the strength of a pairwise association can be reflected much better in a heatmap view than in a network view, where it is usually visualized using edge thickness and/or color plus transparency, which quickly becomes extremely difficult to interpret with massive numbers of stacked edges once the network grows beyond a few hundred nodes or edges. One reason for using networks to interpret the quantitative molecular interactions within an omics dataset has been the lack of suitable heatmap-based visualization methods. With CoolMap's hierarchical view capabilities it is possible to visualize the interactions between higher-order concepts such as Gene Ontology terms (cellular component, biological process, molecular function), molecular pathway modules or clustering results. Ontologies are also becoming increasingly important in biological knowledge mining. They help reorganize data in a structured way and, in our case, help users normalize data from different sources and understand the data across multiple scales, which is especially important for interactive exploration of large-scale data. Software programs such as Protégé can be used to build, edit and explore ontology trees. As data size and complexity continue to grow, merely flattening out big data at the base variable level will make it difficult for humans to comprehend. Using ontologies in CoolMap has demonstrated its capability for multi-scale visualization. Once an interesting pattern among higher-order concepts is identified, the details can be inspected by expanding the corresponding nodes. In this way all the quantitative data are preserved and can be accessed at any time during the analysis. In the future, more capable ontology loading, remote retrieval, generation and editing functions will be incorporated into CoolMap.

It would also be very beneficial to integrate the network view (for a simplified interaction landscape of the omics dataset) with the CoolMap view (for pairwise interaction details at various hierarchy levels). Similar applications have been developed for traditional heatmaps92,178. We developed one prototype, 'heatnet', to illustrate the efficacy of such approaches using CoolMap.

For future development, we envision that the CoolMap model could be implemented as a general genome data browser. The current genome browser models, such as UCSC8 and Ensembl11, were developed more than a decade ago. The view can be changed at different 'zoom' levels, which represent aggregation levels along the genomic axis. The UCSC genome browser allows each track to be displayed at several predefined visual aggregations, such as 'dense', 'pack' or 'full'. At the inception of genome browsers, the majority of the genome was associated with positional annotation data, so this linear model proved very useful. However, with the accumulation of functional annotation, structural and experimental data, the linear representation of the genome is becoming difficult for integrated visual exploration. For example, it is difficult to visualize long-distance genomic interactions, such as long-distance upstream enhancers, linkage disequilibrium and chromosomal interactions, as the target regions may be very far from each other on the linear scale. It is also not easy to visualize quantitative omics data such as RNA-Seq results from multiple series or metabolomics measurements in time series. Furthermore, the current genome browser model only allows data 'tracks' to differ along the vertical axis. With the incoming personal omics era, it is conceivable that heterogeneous data with different provenance will be aggregated into the same view. It would then be possible to juxtapose genomics (SNPs), transcriptomics (gene expression), metabolomics (metabolic fingerprints) and proteomics (biomarker proteins) data along the horizontal axis, and multiple samples along the vertical axis, to obtain an aggregated overview of integrated omics datasets. We think the CoolMap model, with flexible concept aggregation along both axes, will serve as an excellent starting candidate for the future development of omics data browsers. Figure 5-1 shows some conceptual benefits the CoolMap model could bring to the genome browser. The CoolMap model could also be extended to other uses, such as adaptive display of geographical information, aggregation of spreadsheets and user interface design, to name a few.

In the end, what will the development of omics mean for people's day-to-day lives? At the current rate of development, we can expect omics to become even more affordable, portable, accurate, interpretable and complete. Embedded devices, together with the development of smart tablets, may give rise to a variety of micro measurement devices that we can wear or mount on the head, waist, legs or arms 24/7, generating a time-stamped, cloud-synced fingerprint of our physiological data. Such data could be incorporated into the electronic health record (EHR) and benefit diagnostics and prognostics enormously; personal, customized medicine will no longer be a dream. Future Omics Visual Explorers will be indispensable for life science researchers, medical practitioners, or even 'citizen scientists' to explore and make the best use of such datasets. Such Omics Visual Explorers could also serve as support systems for decision-making, so that accurate and fast diagnostics, along with cost-effective treatments, become more accessible to the general public. I hope my thesis work, as a whole, will humbly contribute to this ongoing movement.

Figure 5-1 Conceptual extension of a next-generation genome browser from CoolMap.

Top: semantic zoom along the x-axis enables visualization of a certain region in detail while keeping the surrounding regions in a high-level overview. Middle: the browser can have ontological structures on both axes, making it possible to view the data in a highly aggregated form. Bottom: long-distance interactions can be addressed in the aggregated form and interlinked to network or other forms of views.

APPENDIX

CoolMap Implementation Details

CoolMap is composed of two major components: the map visualization framework, along with a variety of user interface widgets, and the underlying data structure that contains the numeric matrices and the row/column multiple-inheritance ontology tree. CoolMap is implemented in Java Development Kit version 6 (JDK6), and some of the statistical analysis functions are ported from the R statistical analysis framework (http://www.r-project.org/). The aim is to replicate the Cytoscape model, with a core visualization canvas and numerous interfaces that can be harnessed by third-party developers.

The core concept of CoolMap is to enable aggregation along both the rows and the columns of a data matrix. Initially the design focused only on numerical values: if a row or column node is a group node containing a number of child nodes, the corresponding cell holds a summarization value (such as the mean, minimum or maximum). Later I changed the design to allow the data matrix to contain arbitrary data types, such as strings, hyperlinks or images. As long as a matrix of data objects can be aggregated into a single object, a custom renderer can be developed to visualize such data objects.


Figure 4-1 illustrates the basic design and workflow of CoolMap. In this way the user can have a much more condensed view of a very big dataset: for example, instead of inspecting each individual gene and sample in an experiment, the researcher may look at summaries of gene pathways and sample groups. If anything of particular interest is discovered, the researcher may then expand the rows and columns for a more in-depth view of the details.

Basic Data Structure

There are four core concepts in the CoolMap data structure:

• The base matrix: an interface for a 2D matrix that holds the source matrix data. Source data can take any Java Object type.
• The aggregator: defines methods that aggregate a region of the base matrix (a combination of rows and columns, not necessarily consecutive) into a single value. The resultant view matrix can hold data objects of a different type from the source type.
• The view matrix: holds the data that will be immediately rendered into view via a renderer. The renderer is capable of translating the view object type into an arbitrary type of graphics.
• The ontology: holds mappings of labels (nodes) to groups of rows or columns in the source matrix data, as well as the hierarchical relationships between these labels. An ontology node can have an arbitrary number of parents as well as children.

Next, let us explain each of these components in detail:

Base Matrix: In traditional heatmap-like visualizations, once the matrix is loaded into view, all rows and columns map one to one to the original dataset, even though they can be re-ordered or sorted by a variety of clustering algorithms. CoolMap's base matrix is an abstract 2D matrix that can hold any type of data, such as Double (numeric), String (character), Image, or even composite objects and user-defined classes. It is defined as an interface, which means that the underlying storage of the base matrix does not necessarily need to be in matrix form, as long as it complies with the interface definition and returns function values such as getNumberOfRows(), getNumberOfColumns(), getValueAt(int row, int column), getRowLabel(int row), getColLabel(int col), etc. One example is to define a sparse matrix and use a hash table to store node-to-node pairwise interaction data for a network; another is to define a remote 'base' matrix whose values are pulled from a server. The possibility of coupling the base matrix interface to different data types makes it very flexible for CoolMap to load a variety of data by writing new custom base matrix definitions, instead of imposing major changes on the software structure. CoolMap is preloaded with a base matrix that can handle numeric data (Double, Integer, etc.). Other matrices can easily be added later via the plugin interfaces.
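
As an illustration of this interface-based design, the following is a minimal sketch of a base-matrix contract and a sparse, hash-table-backed implementation for network-style pairwise data. The accessor methods follow the names listed above; the generic parameter and the SparseDoubleMatrix class are illustrative, not the actual CoolMap classes.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the base-matrix abstraction: any backing store works as long
    // as it fulfils the interface contract.
    interface BaseMatrix<T> {
        int getNumberOfRows();
        int getNumberOfColumns();
        T getValueAt(int row, int column);
        String getRowLabel(int row);
        String getColLabel(int col);
    }

    // A sparse base matrix storing pairwise interaction weights in a hash
    // table, e.g. for a network whose adjacency matrix is mostly empty.
    class SparseDoubleMatrix implements BaseMatrix<Double> {
        private final String[] rowLabels;
        private final String[] colLabels;
        private final Map<Long, Double> cells = new HashMap<Long, Double>();

        SparseDoubleMatrix(String[] rowLabels, String[] colLabels) {
            this.rowLabels = rowLabels;
            this.colLabels = colLabels;
        }

        void set(int row, int col, double value) {
            cells.put((long) row * colLabels.length + col, value);
        }

        public int getNumberOfRows()    { return rowLabels.length; }
        public int getNumberOfColumns() { return colLabels.length; }
        public Double getValueAt(int row, int column) {
            return cells.get((long) row * colLabels.length + column); // null means missing
        }
        public String getRowLabel(int row) { return rowLabels[row]; }
        public String getColLabel(int col) { return colLabels[col]; }
    }

A remote base matrix would implement the same interface but fetch values from a server inside getValueAt, which is why the rest of the framework never needs to know how the data are stored.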

Aggregator: an aggregator is also an interface; it defines functions that summarize groups of values in the base matrix into a single value. For example, if a group of rows contains the genes of a certain pathway, and a group of columns contains the samples of a certain cancer type, the summary value can be the mean expression value of all genes in that pathway for that cancer type. There are four possible summarization patterns:

1. Single cell: an aggregation at [row, column] is returned.
2. Single row: an aggregation at [row, columns[]] is returned.
3. Single column: an aggregation at [rows[], column] is returned.
4. Sub-matrix: an aggregation at [rows[], columns[]] is returned.

The region in the base matrix need not be contiguous, as the arrays rows[] and columns[] can contain arbitrary combinations of row and column indices. The summarized single values are then put together to produce a view matrix. The aggregator can produce a view matrix containing a different data type from the base matrix: for example, the base matrix can contain numeric values, and the aggregator can transform the base values into Boolean values (true or false) by comparing them to a threshold. As illustrated in Figure 4-1, it is also possible to simply apply a data transformation (such as log2) using the aggregator. This adds a layer of flexibility in visualizing the base data.
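
A minimal sketch of the aggregator idea, reusing the hypothetical BaseMatrix interface from the sketch above: the aggregate method receives arbitrary (not necessarily contiguous) row and column index arrays and returns one view value. The mean-of-log2 example corresponds to the expression use case described in the text; the names are illustrative.

    // Collapses an arbitrary combination of base rows and columns into one
    // view value; all four summarization patterns are covered by passing
    // index arrays of length one or greater.
    interface Aggregator<R, V> {
        V aggregate(BaseMatrix<R> base, int[] rows, int[] cols);
    }

    // Mean of log2-transformed expression values, skipping missing cells.
    class MeanLog2Aggregator implements Aggregator<Double, Double> {
        public Double aggregate(BaseMatrix<Double> base, int[] rows, int[] cols) {
            double sum = 0;
            int n = 0;
            for (int r : rows) {
                for (int c : cols) {
                    Double v = base.getValueAt(r, c);
                    if (v != null) {
                        sum += Math.log(v) / Math.log(2); // log2 transform
                        n++;
                    }
                }
            }
            return n == 0 ? null : sum / n;
        }
    }

A threshold-based aggregator returning Boolean, or a pass-through aggregator returning the raw value, would implement the same interface with a different view type V.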

View Matrix: A view matrix is a 2D matrix that simply holds the data generated by an aggregator. The rows and columns of a view matrix can be customized by the user, using both rows and columns from the base matrix and ontology nodes, which are mapped to groups of base rows or columns. If the aggregator is simply a pass-through (passing values without modification) and the rows and columns of the view matrix are identical to the base matrix, the rendered view matrix reflects exactly the base matrix. By choosing aggregators and row/column node combinations, the view matrix can reflect a subset, or an aggregated view, of the base matrix. The flexible arrangement of the view matrix enables researchers to browse data in a very condensed way, or to focus only on regions of interest.

Ontology tree: An ontology tree holds a multi-tree of nodes that can be mapped to the rows or columns of the source data. For example, an ontology node can be a Gene Ontology term or KEGG pathway that contains a number of genes from the base matrix, or an intermediate node from the merge result of hierarchical agglomerative clustering. Each node in an ontology tree can have an arbitrary number of parent nodes and an arbitrary number of child nodes. When ontology nodes are added to the rows or columns of the view matrix, the corresponding cell value is replaced with the summary value generated by the designated aggregator. The most striking difference between CoolMap's ontology and other heatmap implementations (such as Eisen's tree102) is that CoolMap's row or column annotations are stored in a multi-tree200. In a conventional single tree, such as a hierarchical tree or the Gene Ontology, each tree node has only a single parent but can have multiple children. This single-tree model does not work when duplicate rows or columns occur in the view: for example, the result of a fuzzy clustering or Non-negative Matrix Factorization (NMF) will have ambiguous gene membership assignments, which could require some genes to be placed more than once in the heatmap; or, if the user would like to inspect the gene expression heatmap results of two pathways, the pathways may contain shared housekeeping genes. In these scenarios, recursive traversal of the tree may fail because the multi-parent structure could lead to loops, which are not allowed. The expansion/collapse mechanisms of row or column nodes also require careful design to handle user-interaction conflicts.
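
The following is a minimal sketch of such a multi-tree node: each node may have several parents and children, and adding an edge that would close a loop is rejected, reflecting the "no loops" constraint described above. Class and method names are illustrative, not CoolMap's actual ontology classes.

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Multi-tree ontology node: multiple parents and children are allowed,
    // but no node may (directly or indirectly) become its own ancestor.
    class OntologyNode {
        final String label;
        final List<OntologyNode> parents = new ArrayList<OntologyNode>();
        final List<OntologyNode> children = new ArrayList<OntologyNode>();

        OntologyNode(String label) { this.label = label; }

        boolean addChild(OntologyNode child) {
            if (child == this || child.reaches(this)) {
                return false; // adding this edge would create a loop
            }
            children.add(child);
            child.parents.add(this);
            return true;
        }

        // True if 'target' is reachable by following child links from this node.
        private boolean reaches(OntologyNode target) {
            return dfs(this, target, new HashSet<OntologyNode>());
        }

        private static boolean dfs(OntologyNode cur, OntologyNode target, Set<OntologyNode> seen) {
            if (cur == target) return true;
            if (!seen.add(cur)) return false;   // already visited
            for (OntologyNode c : cur.children) {
                if (dfs(c, target, seen)) return true;
            }
            return false;
        }
    }

A gene shared by two pathways is simply a node with two parents, which is exactly the case a single-inheritance tree cannot represent.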

Auxiliary Functions

In addition to the basic data structures, several peripheral classes are developed to provide a variety of CoolMap operations and visualization capabilities. The most important ones are the CoolMap Controller, the Renderer and the Annotation Renderer.

The Controller handles all the operations that alter the view matrix. As described before, the layout of the view matrix is determined by its row and column nodes; whenever rows or columns in the view matrix are changed (expanded, collapsed, moved or deleted), the view matrix needs to be updated accordingly. It would not be very economical to recompute the entire matrix with the aggregators every time, as many of the changes happen locally. Therefore, expanding, collapsing or reordering the view matrix only triggers local changes. When a view matrix is created, the active ontology nodes are stored in the ActiveRowNodes and ActiveColumnNodes lists. In addition, the expanded row or column tree nodes are stored in the ActiveRowTreeNodes and ActiveColTreeNodes lists. When a node is expanded, only the nodes at the lowest level (which determine the layout of the view matrix) are stored in the active node lists (Figure); all other nodes are 'pushed' up into a tree branch.

The following operations can be performed to alter the rows and columns of the view matrix:

1. Insert nodes: insert new row or column nodes into the view matrix.
2. Remove nodes: remove rows or columns from the view matrix.
3. Expand to child nodes: remove the current node and insert its immediate child nodes in the ontology tree at the original position.
4. Expand to base: remove the current node and replace it with its base matrix row or column labels.
5. Expand node to all child nodes: remove the current node and insert all of its leaf child nodes in the ontology tree at the original position.
6. Collapse a node: remove all child nodes of the selected node from the active node list, and place this node back in the active node list.
7. Shift nodes: shift a region of rows or columns from its original location to a new location. Used for the drag-and-drop operation so the user may rearrange nodes.
8. Rearrange nodes: rearrange all the rows or columns, or a subset of them, into a new ordering (such as an optimized row/column order that minimizes the distances between adjacent rows or columns, computed by a matrix seriation algorithm201).

Whenever these operations are performed, only the regions that need to be updated are recomputed using the aggregator; the rest of the unchanged regions are copied over to the new view matrix. For example, when the user expands a column node, the other columns are copied to the corresponding locations in the new view matrix. The sub-region corresponding to the newly inserted child nodes is computed and inserted, and only that sub-region is redrawn. In addition, when the view matrix to be updated is large enough (more than 200 rows/columns), it is divided into four sub-regions and updated using parallel threads; a sketch of this banded update appears below. With these optimizations, even updating a view matrix with thousands of rows and columns takes very little time (less than 1 second). In our test case of a matrix with 5,000 rows and 5,000 columns, with 1,000 row groups and 1,000 column groups, the multi-threaded version speeds up the update process by three to four fold.
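
A minimal sketch of the banded, multi-threaded recomputation mentioned above, reusing the hypothetical BaseMatrix and Aggregator sketches from earlier; the band count and thread pool size are illustrative rather than CoolMap's actual values.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Splits the region to recompute into row bands and aggregates each band
    // on its own thread; untouched regions would simply be copied over.
    class ParallelViewUpdate {
        static Double[][] recompute(final BaseMatrix<Double> base,
                                    final Aggregator<Double, Double> agg,
                                    final int[][] rowGroups,
                                    final int[][] colGroups) throws Exception {
            final Double[][] view = new Double[rowGroups.length][colGroups.length];
            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
            int band = Math.max(1, rowGroups.length / 4);
            for (int start = 0; start < rowGroups.length; start += band) {
                final int from = start;
                final int to = Math.min(rowGroups.length, start + band);
                tasks.add(new Callable<Void>() {
                    public Void call() {
                        for (int r = from; r < to; r++) {
                            for (int c = 0; c < colGroups.length; c++) {
                                view[r][c] = agg.aggregate(base, rowGroups[r], colGroups[c]);
                            }
                        }
                        return null;
                    }
                });
            }
            pool.invokeAll(tasks);   // blocks until all bands are done
            pool.shutdown();
            return view;
        }
    }

Because each band writes to a disjoint slice of the view matrix, no locking is needed beyond waiting for all tasks to finish.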

One of the major challenges is how to make node operations intuitive when there are duplicate nodes, since the ontology tree is a multi-inheritance tree and the rows and columns of the view matrix can be arranged very flexibly. The user may get lost if multiple copies of nodes with the same labels are presented without a clear indication of which underlying ontologies they are associated with. The solution is to preserve a reference to the ontology each node is descended from. In this way, multiple occurrences of the same node in the matrix view can be differentiated, which prevents collisions in collapse operations. For example, if the user displays a matrix view containing all genes from MAPK (KEGG:ko04010), the Chemokine signaling pathway (KEGG:ko04062) and Apoptosis (KEGG:ko04210), there will be three copies of NFKB1 (KEGG:K02580) in the view, as NFKB1 functions in all three pathways. Each NFKB1 then has a reference to its respective parent pathway, which can be displayed in the row/column node browser (discussed later). The ontology tree is still structured as a multi-tree in which each node can have multiple parent and child nodes, with the condition that no loops are allowed (no node can trace back to itself). The ontology tree branches in view, resulting from node expansion or other tree-generating algorithms such as hierarchical clustering, are always presented as a single tree. This enables the user to explore any partial branch of an ontology tree with ease, and even make comparisons across multiple ontology trees. Once the layout of the view matrix is determined, the view matrix is passed to the renderer to translate the data into graphics.

Renderer: The renderer is also written as an interface. A renderer takes a value in the view matrix and translates it into a graphics object. Like the other basic components, the renderer can be extended to render any type of data with any type of 2D visualization. For example, a numeric value can be rendered as a colored rectangle (the traditional heatmap), bars, circles, or even text and images. All that is needed is to override the corresponding abstract render methods. As with updating the matrix, the renderer also uses multiple threads to update large graphics objects when necessary, to ensure performance.

One distinct feature of the renderer interface is adaptive rendering. CoolMap allows each cell of the view matrix to take a different render size (covered later), so that the user may investigate regions in more detail while keeping the context in view. The renderer defines three modes of rendering: SD, Normal and HD. The SD mode is used for generating thumbnails or when the view matrix cells are very small and speed is preferred over quality. The Normal mode is used when the cells are moderately large, and the HD mode is used when the cells are sufficiently large that more detailed graphics and annotations can be overlaid. In each of these abstract functions, a Graphics2D object, along with the coordinates in the view matrix, the row/column labels and the dimensions of the cell, is passed in for writing the custom renderer. In addition, the renderer can return an optional tooltip for the underlying value, to be displayed on mouse hover.

CoolMap comes with two basic renderers, the Double to Color renderer and the Double to Bar renderer. The Double to Color renderer is a straightforward implementation of the conventional heatmap. The user may choose a color scale (currently a two-color system, beginning to end) and the min/max values these two colors are mapped to. All intermediate values in the view matrix are then interpolated using the color scale and rendered as rectangles. The user may also assign colors for underflow and overflow, for values that are lower or higher than the min/max values. The Double to Bar renderer maps double values to bar heights in each cell, with the min value mapped to zero height and the max value mapped to the full cell height. Other renderers can be written easily by implementing the renderer interface.
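
To illustrate the Double to Color idea, here is a minimal sketch of a renderer that linearly interpolates a two-color scale and uses dedicated underflow/overflow colors; the class name and drawCell signature are illustrative and not the actual CoolMap renderer interface.

    import java.awt.Color;
    import java.awt.Graphics2D;

    // Maps a Double to a color on a two-color scale, with explicit colors
    // for values outside the configured [min, max] range.
    class DoubleToColorRenderer {
        private final double min, max;
        private final Color low, high, underflow, overflow;

        DoubleToColorRenderer(double min, double max,
                              Color low, Color high,
                              Color underflow, Color overflow) {
            this.min = min; this.max = max;
            this.low = low; this.high = high;
            this.underflow = underflow; this.overflow = overflow;
        }

        Color colorFor(Double v) {
            if (v == null) return Color.GRAY;      // missing value
            if (v < min) return underflow;
            if (v > max) return overflow;
            double t = (v - min) / (max - min);    // 0..1 position on the scale
            int r = (int) Math.round(low.getRed()   + t * (high.getRed()   - low.getRed()));
            int g = (int) Math.round(low.getGreen() + t * (high.getGreen() - low.getGreen()));
            int b = (int) Math.round(low.getBlue()  + t * (high.getBlue()  - low.getBlue()));
            return new Color(r, g, b);
        }

        // A "Normal mode" style draw: fill the cell rectangle with the mapped color.
        void drawCell(Graphics2D g2, Double value, int x, int y, int width, int height) {
            g2.setColor(colorFor(value));
            g2.fillRect(x, y, width, height);
        }
    }

Adjusting the color mapping range, as in the data quality example of Figure 4-8, corresponds simply to constructing this renderer with different min/max values.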

Annotation Renderer: As a cell in the view matrix can reflect a summary of a group of values when the row and/or column node is an ontology node, it is sometimes necessary to take a quick look at the underlying data before expanding the row or column nodes. For example, suppose the row node is a gene pathway, the column node is a group of samples, the base matrix is a table of gene expression profiles, and the summary is the maximum. When the researcher sees a high expression value for that group of samples in that pathway, the next questions are: which gene-sample pair actually has the high expression, and what do the expression profiles of the rest of the group look like? An annotation is an overlaid visualization of the details of the underlying data. Like the renderer, it is defined as an interface and has SD, Normal and HD render functions for adaptive rendering. In addition, the base matrix elements are accessible within the render functions, so that details of the members can be plotted instead of a single summary value. For example, a boxplot of the underlying values within two ontology nodes can be drawn as an annotation instead of the single summary value (such as the mean), to reveal additional detail in the data. It is very easy to extend the interface to write custom annotation renderers.

Standardized operations: To support undo/redo, all operations are defined as subclasses of an operation class. For every reversible operation, a corresponding reverse operation is defined (such as expanding or collapsing a node). Once an operation is performed, it is stored on the operation stack and can be reversed later.
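
A minimal sketch of this reversible-operation pattern, with an operation base class and undo/redo stacks; the class names are illustrative, not the actual CoolMap classes.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Every view-altering action is an Operation with a matching inverse.
    abstract class Operation {
        abstract void apply();   // e.g. expand a node
        abstract void undo();    // e.g. collapse it again
    }

    // Keeps undo and redo stacks of performed operations.
    class OperationHistory {
        private final Deque<Operation> undoStack = new ArrayDeque<Operation>();
        private final Deque<Operation> redoStack = new ArrayDeque<Operation>();

        void perform(Operation op) {
            op.apply();
            undoStack.push(op);
            redoStack.clear();          // a new action invalidates the redo chain
        }

        void undo() {
            if (undoStack.isEmpty()) return;
            Operation op = undoStack.pop();
            op.undo();
            redoStack.push(op);
        }

        void redo() {
            if (redoStack.isEmpty()) return;
            Operation op = redoStack.pop();
            op.apply();
            undoStack.push(op);
        }
    }
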

The Visualization Canvas

With all the underlying models in place, it is critical to provide an intuitive and flexible interface that exploits the exploratory capability of CoolMap. Because of its distinct features (custom graphics rendering, multi-tree ontology nodes as rows and columns), nothing off the shelf immediately satisfied our needs, although we tried a variety of libraries such as the G graphics library, JavaFX and Processing. We eventually developed a custom visualization panel to render the view matrix and offer contextual ontology node operations. The display panel has high performance and provides the user with a smooth-scrolling experience similar to popular browsing software such as Google Maps, with rich interactivity and information prompts. After several trials, I adopted a multi-layer design that is both easily programmable and extensible.

The root container is a JLayeredPane, which allows an arbitrary number of Swing components to be stacked on top of each other and thus separates different functional components into layers. Furthermore, layers can be blended together using a layer alpha (an alpha value determines the transparency of a pixel) and can mask each other's mouse event functions. The user may pan the view, select nodes of interest to show detailed or alternative annotations, permute rows or columns, or expand/collapse certain row/column nodes. We have implemented the following layers:

• Base layer: a static background, which can be a uniform color or gradient.
• CoolMap layer: this layer stores the rendered view matrix (the CoolMap view) using a designated renderer, with zoomX and zoomY determining the default width and height of each cell respectively. If the visible region of the view matrix is larger than the viewport (the dimensions of the display panel), only a portion of the view matrix plus some buffering region is rendered. When the user scrolls the heatmap, or jumps to a different region on the heatmap, the boundary conditions are checked. If the currently rendered view matrix is insufficient to fill the viewport, a new sub-portion of the heatmap is rendered as a replacement. This mechanism offers the user a smooth scrolling experience with a minimal amount of re-rendering.
• Annotation layer: this layer is a container for the aforementioned annotation renderer, using brushing to add context-specific annotations to the base visualization. Each annotation is a small visualization of the cell value, which can be generated with or without other associated data. Each annotation should return two objects: 1) an Image that can be placed on top of the corresponding cell of the matrix view image; 2) an Image containing a tooltip. As rendering annotations can be more computationally intensive than rendering the view matrix, only the selected cells are rendered with an annotation overlay. So far there are two kinds of annotations:
  1. Sub-render of the base matrix: if the current row or cell is an ontology node, the underlying base matrix can be rendered to replace the single summary value. The user can then alternate between the summary value and the base matrix for the selected nodes.
  2. Bar representation of the cell value: sometimes color may not be the best way to contrast values, especially when the differences between values are small or because of the color insensitivity of the human eye. A bar representation can be placed on the selected cells, with the bar height representing the cell value.
• Mask layer: the mask layer provides an interface to block certain cells in the view matrix with a dark shade. This layer is utilized by the filter widget (described later) to show only cells that pass certain criteria.
• Highlight layer: this layer provides a single method to highlight a certain region in the view matrix, to indicate that a change has just taken place (rows/columns were added, removed or reordered).
• Selection layer: the user may perform standard selection operations on this layer to trigger a rectangular selection region, with single mouse click selection and shift + click for area selection. As mentioned before, a selection will trigger rendering of the corresponding annotations if any annotation renderer is active. Selections can also be set by clicking on the row and column labels (described later), or programmatically.
• Hover layer: whenever the mouse cursor hovers within a heatmap, a hover grid is drawn to indicate the currently active row-column coordinate. A tooltip is shown to display the cell value (the toString() method of the underlying object). If the cell is currently selected, a secondary tip is shown to display the tooltip from the corresponding annotation.
• CoolMap mouse listener layer: instead of assigning mouse events to individual layers, this layer captures all mouse events, such as mouse move, drag, single click and double click, and updates the coordinate parameters. All layers use these parameters to make the necessary updates.
• Grid layer: another distinct feature of CoolMap is that it allows each cell to have its own dimensions. This is done via the grid layer, which removes the mouse-event conflict between cell resizing and map panning. When the grid layer is activated (hotkey Alt-C), solid lines marking the boundaries of the underlying cells are drawn on top of the other CoolMap layers below. The user can then adjust the size of each cell by dragging the cell bounds, just like adjusting cell dimensions in an Excel spreadsheet. The adjusted cell dimensions persist when the global zoom is changed.
• Row/Column drawer container: these two layers are containers for additional layers (row/column drawers) that contain annotation information related only to rows or columns, such as the row/column labels and the row/column tree. As this layer is placed on top of the CoolMap mouse listener layer, mouse events captured on these two layers override the heatmap mouse events. We have also implemented an interface to make it possible to add third-party row/column drawers. Currently we have:
  1. Row/Column label drawer: the label drawer displays the row and column node names. A row or column can be selected by a single left mouse click, or shift + left click to select a region of rows or columns. The selected region can also be dragged to rearrange row/column orders.
  2. Row/Column ontology tree: the ontology tree panel displays the active nodes and their parent nodes. The user may expand ontology nodes to child nodes or base nodes, or collapse tree nodes.

Each layer can be turned on or off independently; further layers could be added in the future to extend more functions.
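
The following is a minimal, self-contained Swing sketch of the layering idea: independent, partially transparent layers are stacked on a JLayeredPane with explicit z-order and can be shown or hidden individually. The demo layers are stand-ins for the real CoolMap layers.

    import javax.swing.JComponent;
    import javax.swing.JFrame;
    import javax.swing.JLayeredPane;
    import javax.swing.JPanel;
    import java.awt.Color;
    import java.awt.Graphics;

    public class LayeredCanvasDemo {

        // A simple translucent layer that paints one colored rectangle.
        static JComponent solidLayer(final Color c) {
            return new JPanel() {
                { setOpaque(false); }
                @Override protected void paintComponent(Graphics g) {
                    g.setColor(c);
                    g.fillRect(20, 20, getWidth() - 40, getHeight() - 40);
                }
            };
        }

        public static void main(String[] args) {
            JFrame frame = new JFrame("Layer demo");
            JLayeredPane layers = new JLayeredPane();

            JComponent base = solidLayer(new Color(230, 230, 230));
            JComponent map = solidLayer(new Color(255, 140, 0, 160));     // translucent "heatmap"
            JComponent selection = solidLayer(new Color(0, 0, 255, 60));  // translucent selection

            // Higher layer numbers are painted on top of lower ones.
            layers.add(base, Integer.valueOf(0));
            layers.add(map, Integer.valueOf(100));
            layers.add(selection, Integer.valueOf(200));

            // JLayeredPane has no layout manager, so bounds are set explicitly.
            for (JComponent layer : new JComponent[]{base, map, selection}) {
                layer.setBounds(0, 0, 400, 300);
            }
            selection.setVisible(true);   // each layer can be toggled independently

            frame.add(layers);
            frame.setSize(420, 340);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        }
    }

Calling setVisible(false) on any of the layer components hides just that layer, which mirrors how CoolMap's mask, highlight or grid layers can be switched on and off.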

Widgets

The CoolMap application is developed using a dockable framework with a Multiple Document Interface (MDI), which means that more than one CoolMap instance can be displayed and explored at the same time. The dockable framework (Sanaware Javadocking) allows the user to arrange functional widgets freely (similar to arranging the floating tool panels in Adobe Photoshop). In addition, only the widgets of interest need to be shown, creating a 'workspace' for a certain analysis. For example, if the user only wants to browse heatmap-typed data without using any ontology nodes, the ontology-related widgets can be hidden from view. This makes the entire user interface highly customizable and flexible. So far the following widgets have been developed; other widgets can easily be added by extending the widget interface.

• Canvas widget: contains all the active CoolMap instances. Each CoolMap instance can be moved, resized, maximized or minimized. The selected CoolMap instance is assigned as the 'Active' CoolMap object.
• CoolMap list widget: contains all the loaded CoolMap instances. The user may switch the active CoolMap, read additional descriptions, or delete CoolMap(s).
• Aggregator widget: contains a list of available aggregators. Currently, only numeric aggregators are provided, such as Max, Min, Median, Mean, Standard Deviation, Variance and Sum. An aggregator always checks with the base matrix to determine whether the two are compatible; only compatible aggregators can be assigned.
• Renderer widget: contains a list of available renderers. Each renderer also has a user interface panel for configuring its parameters. When a renderer is assigned to a CoolMap object, a new instance is created so that each CoolMap instance has its own copy of the renderer and its own set of parameters.
• Annotation Renderer widget: contains a list of available annotation renderers. Similar to the renderer widget, it ensures renderer compatibility, and each CoolMap instance maintains its own copy of the annotation renderer and set of parameters.
• View Matrix Value widget: displays a table of the values in the selected region. The values are produced using the toString() method of the objects in the view matrix. This widget allows the user to look at the rendered view matrix and its underlying values simultaneously.
• Radar widget: offers an overview (thumbnail) of the entire rendered map; the user may rapidly jump to regions of interest on the map. The thumbnail is updated whenever the layout or the renderer of the view matrix changes.
• Active Node Searcher widget: lists all the active row/column nodes for the currently active CoolMap instance. The user may search for a row or column using a keyword; the search also supports Java regular expressions for fuzzy search. A single click on a search result immediately centers the CoolMap view on the selected row/column node.
• Ontology Browser widget: allows the user to search an ontology and add intermediate nodes of interest to the view. The user may select a loaded ontology, and the table shows the number of parent and child nodes, as well as their labels. The user may add a single entry or multiple entries of ontology nodes at the end of the current view matrix, or insert them at the currently selected region.

• Ontology Display widget: this custom-developed widget shows the organizational relationship between parent nodes and child nodes. Because the CoolMap ontology is a multi-tree rather than a single-inheritance tree, the standard JTree cannot be used to display the neighborhood of an ontology term. The parents, children and siblings of a node can each be selected and added to the active CoolMap view.
• Filter widget: provides a mechanism that uses the mask layer to 'block' certain nodes from the view based on user-defined criteria. The filter is also defined as an interface so that a variety of filtering mechanisms can be incorporated. Currently CoolMap provides three kinds of filters: by value (range), by row node label, and by column node label. Multiple filters can be applied at the same time, and the user may choose whether to combine them with 'AND' (all active filtering criteria must be satisfied) or 'OR' (at least one filtering criterion must be satisfied).
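To make the aggregator-compatibility check referenced in the list above concrete, the sketch below uses assumed names (Aggregator, BaseMatrix and MeanAggregator are illustrative, not the actual CoolMap classes). It shows how a numeric aggregator might verify that it can be applied to a base matrix before being assigned, and how it would collapse a group of child values into a single parent value.

import java.util.List;

/** Illustrative stand-in for CoolMap's underlying base data matrix. */
interface BaseMatrix {
    /** True if every cell holds a numeric value. */
    boolean isNumeric();
    Double getValue(int row, int column);
}

/** Hypothetical aggregator contract: a compatibility check plus aggregation. */
interface Aggregator {
    boolean isCompatible(BaseMatrix base);
    Double aggregate(List<Double> childValues);
}

/** Example numeric aggregator (Mean); Max, Min, Sum, etc. would be analogous. */
class MeanAggregator implements Aggregator {
    @Override
    public boolean isCompatible(BaseMatrix base) {
        // Numeric aggregators can only be assigned to numeric base matrices.
        return base.isNumeric();
    }

    @Override
    public Double aggregate(List<Double> childValues) {
        if (childValues == null || childValues.isEmpty()) {
            return null; // nothing to aggregate for this parent node
        }
        double sum = 0.0;
        int n = 0;
        for (Double v : childValues) {
            if (v != null) {
                sum += v;
                n++;
            }
        }
        return n == 0 ? null : sum / n;
    }
}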

The widget interface can be extended through plugin development, similar to the approach adopted by Cytoscape. We believe that this open framework will allow community contributions to continuously improve CoolMap.
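As an illustration of what such an extension point might look like, the sketch below defines a minimal widget contract. The names (CoolMapWidget, ActiveMapInfoWidget) are assumptions made for this example and are not taken from the CoolMap source or from the Javadocking library; the point is only that a new widget supplies a title and a Swing component and reacts when the active CoolMap instance changes.

import javax.swing.JComponent;
import javax.swing.JLabel;

/** Hypothetical contract a pluggable CoolMap widget would implement. */
interface CoolMapWidget {
    /** Title shown on the widget's dockable tab. */
    String getTitle();

    /** The Swing component that the docking framework will host. */
    JComponent getContent();

    /** Called when the active CoolMap instance changes. */
    void onActiveCoolMapChanged(Object activeCoolMap);
}

/** A trivial example widget: shows the name of the active CoolMap. */
class ActiveMapInfoWidget implements CoolMapWidget {
    private final JLabel label = new JLabel("No active CoolMap");

    @Override
    public String getTitle() {
        return "Active Map Info";
    }

    @Override
    public JComponent getContent() {
        return label;
    }

    @Override
    public void onActiveCoolMapChanged(Object activeCoolMap) {
        label.setText(activeCoolMap == null ? "No active CoolMap"
                                            : activeCoolMap.toString());
    }
}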

Integration with Cytoscape

As has been noted repeatedly in the literature, linking different views can help the user better understand distinct aspects of the underlying data. There have been previous efforts to integrate a heatmap display into Cytoscape, such as ClusterMaker and VistaClara, but these implementations lack rich interactivity and row/column annotation data integration. CoolMap can be bundled as a Cytoscape plugin to provide such two-way linked-view functions. We have developed a prototype, heatnet, to illustrate the feasibility and effectiveness of such a network-CoolMap linked view.
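To indicate what a two-way linked view could look like in code, the sketch below is a deliberately generic listener pattern; it does not use the real Cytoscape plugin API, and the names (NetworkView, HeatmapView, SelectionListener, LinkedViewBridge) are assumptions for illustration only. Selecting nodes in the network highlights the matching rows in the heatmap, and vice versa, while a guard flag prevents the two callbacks from re-triggering each other.

import java.util.Set;

/** Hypothetical selection callbacks exposed by the two views. */
interface NetworkView {
    void highlightNodes(Set<String> nodeIds);
    void setSelectionListener(SelectionListener listener);
}

interface HeatmapView {
    void highlightRows(Set<String> rowIds);
    void setSelectionListener(SelectionListener listener);
}

interface SelectionListener {
    void onSelection(Set<String> ids);
}

/** Keeps the network and heatmap selections in sync in both directions. */
class LinkedViewBridge {
    private boolean propagating = false; // guards against feedback loops

    LinkedViewBridge(NetworkView network, HeatmapView heatmap) {
        network.setSelectionListener(ids -> {
            if (propagating) return;
            propagating = true;
            try {
                heatmap.highlightRows(ids); // network -> heatmap
            } finally {
                propagating = false;
            }
        });
        heatmap.setSelectionListener(ids -> {
            if (propagating) return;
            propagating = true;
            try {
                network.highlightNodes(ids); // heatmap -> network
            } finally {
                propagating = false;
            }
        });
    }
}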

BIBLIOGRAPHY

1. McGinn, S. & Gut, I. G. DNA Sequencing-Spanning the Generations. New biotechnology (2012).doi:10.1016/j.nbt.2012.11.012

2. Joyce, A. R. & Palsson, B. Ø. The model organism as a system: integrating “omics” data sets. Nature reviews. Molecular cell biology 7, 198–210 (2006).

3. Lockhart, D. & Winzeler, E. Genomics, gene expression and DNA arrays. Nature (2000).

4. Ball, M. P. et al. A public resource facilitating clinical use of genomes. Proceedings of the National Academy of Sciences of the United States of America 109, 11920–7 (2012).

5. Ginsburg, G. S. & Willard, H. F. Genomic and personalized medicine: foundations and applications. Translational research : the journal of laboratory and clinical medicine 154, 277–87 (2009).

6. Cordero, P. & Ashley, E. A. Whole-genome sequencing in personalized therapeutics. Clinical pharmacology and therapeutics 91, 1001–9 (2012).

7. Lunshof, J. E. et al. Personal genomes in progress: from the human genome project to the personal genome project. Dialogues in clinical neuroscience 12, 47–60 (2010).

8. Kent, W. J. et al. The human genome browser at UCSC. Genome research 12, 996–1006 (2002).

9. Chan, P. P., Holmes, A. D., Smith, A. M., Tran, D. & Lowe, T. M. The UCSC Archaeal Genome Browser: 2012 update. Nucleic acids research 40, D646–52 (2012).

10. Pruitt, K., Tatusova, T. & Maglott, D. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic acids research (2005).

11. Hubbard, T., Barker, D. & Birney, E. The Ensembl genome database project. Nucleic acids … (2002).

12. Blow, N. Transcriptomics: The digital generation. Nature (2009).

13. McGettigan, P. A. Transcriptomics in the RNA-seq era. Current Opinion in Chemical Biology (2013).doi:10.1016/j.cbpa.2012.12.008

14. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews. Genetics 10, 57–63 (2009).

15. Busby, M. A., Stewart, C., Miller, C., Grzeda, K. & Marth, G. Scotty: A Web Tool For Designing RNA-Seq Experiments to Measure Differential Gene Expression. Bioinformatics (2013).doi:10.1093/bioinformatics/btt015

16. Lepoivre, C. et al. TranscriptomeBrowser 3.0: introducing a new compendium of molecular interactions and a new visualization tool for the study of gene regulatory networks. BMC bioinformatics 13, 19 (2012).

17. Li, R., Yu, C., Li, Y., Lam, T. & Yiu, S. SOAP2: an improved ultrafast tool for short read alignment. … (2009).

18. Trapnell, C., Pachter, L. & Salzberg, S. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009).

19. Wang, L., Feng, Z., Wang, X. & Zhang, X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics (2010).

20. Barrett, T., Troup, D. & Wilhite, S. NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucleic acids … (2007).

21. Rodriguez-Tome, P. & Stoehr, P. The European Bioinformatics Institute (EBI) databases. Nucleic acids … (1996).

22. Labarga, A. & Valentin, F. Web services at the European Bioinformatics Institute. Nucleic acids … (2007).

23. Golub, T., Slonim, D. & Tamayo, P. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (1999).

24. Ozsolak, F. & Milos, P. M. RNA sequencing: advances, challenges and opportunities. Nature reviews. Genetics 12, 87–98 (2011).

25. Kussmann, M., Raymond, F. & Affolter, M. OMICS-driven biomarker discovery in nutrition and health. Journal of biotechnology 124, 758–87 (2006).

26. De Hoog, C. L. & Mann, M. Proteomics. Annual review of genomics and human genetics 5, 267–93 (2004).

27. Tan, H. T., Lee, Y. H. & Chung, M. C. M. Cancer proteomics. Mass spectrometry reviews 31, 583–605

28. Julka, S. & Regnier, F. Quantification in proteomics through stable isotope coding: a review. Journal of proteome research (2004).

29. Pandey, A. & Mann, M. Proteomics to study genes and genomes. Nature (2000).

30. Buck, M. J. & Lieb, J. D. ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics 83, 349–360 (2004).

31. Martens, L. et al. PRIDE: the proteomics identifications database. Proteomics 5, 3537– 45 (2005).

32. Rose, P. W. et al. The RCSB Protein Data Bank: new resources for research and education. Nucleic acids research 41, D475–D482 (2013).

33. Consortium, U. The universal protein resource (UniProt). Nucleic Acids Res (2008).

34. Joshi-Tope, G. & Gillespie, M. Reactome: a knowledgebase of biological pathways. Nucleic acids … (2005).

35. Tarcea, V. G. et al. Michigan molecular interactions r2: from interacting proteins to pathways. Nucleic acids research 37, D642–6 (2009).

36. Mikami, T., Aoki, M. & Kimura, T. The application of mass spectrometry to proteomics and metabolomics in biomarker discovery and drug development. Current molecular pharmacology 5, 301–16 (2012).

37. Petricoin, E. F., Zoon, K. C., Kohn, E. C., Barrett, J. C. & Liotta, L. A. Clinical proteomics: translating benchside promise into bedside reality. Nature reviews. Drug discovery 1, 683–95 (2002).

38. Karr, T. L. Application of proteomics to ecology and population biology. Heredity 100, 200–6 (2008).

39. Barker, B. M. et al. Transcriptomic and proteomic analyses of the Aspergillus fumigatus hypoxia response using an oxygen-controlled fermenter. BMC genomics 13, 62 (2012).

40. Nie, L., Wu, G., Culley, D. E., Scholten, J. C. M. & Zhang, W. Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications. Critical reviews in biotechnology 27, 63–75

41. Fukushima, A., Kusano, M., Redestig, H., Arita, M. & Saito, K. Integrated omics approaches in plant systems biology. Current opinion in chemical biology 13, 532–8 (2009).

42. Patti, G., Yanes, O. & Siuzdak, G. Innovation: Metabolomics: the apogee of the omics trilogy. Nature Reviews Molecular Cell Biology (2012).

43. Wishart, D. S. et al. HMDB 3.0--The Human Metabolome Database in 2013. Nucleic acids research 41, D801–7 (2012).

44. Wishart, D. S. et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Research 37, D603–D610 (2009).

45. Smith, C., O’Maille, G. & Want, E. METLIN: a metabolite mass spectral database. Therapeutic drug … (2005).

46. Ma, H. et al. The Edinburgh human metabolic network reconstruction and its functional analysis. Molecular systems biology 3, 135 (2007).

47. Kanehisa, M. The KEGG database. Novartis Foundation symposium 247, 91–101; discussion 101–3, 119–28, 244–52 (2002).

48. Gao, J., Tarcea, V. & Karnovsky, A. Metscape: a Cytoscape plug-in for visualizing and interpreting metabolomic data in the context of human metabolic networks. … (2010).

49. Karnovsky, A., Weymouth, T. & Hull, T. Metscape 2 bioinformatics tool for the analysis and visualization of metabolomics and gene expression data. … (2012).

50. Blekherman, G. et al. Bioinformatics tools for cancer metabolomics. Metabolomics 7, 329–343 (2011).

51. Zhang, A., Sun, H., Wang, P., Han, Y. & Wang, X. Modern analytical techniques in metabolomics analysis. The Analyst 137, 293–300 (2012).

52. Patti, G. J., Yanes, O. & Siuzdak, G. Innovation: Metabolomics: the apogee of the omics trilogy. Nature reviews. Molecular cell biology 13, 263–9 (2012).

53. Ma, Y., Zhang, P., Yang, Y., Wang, F. & Qin, H. Metabolomics in the fields of oncology: a review of recent research. Molecular biology reports 39, 7505–11 (2012).

54. Nadella, K. Metabolomics in Agriculture. Omics: a journal of … (2012).

55. Nordström, A. & Lewensohn, R. Metabolomics: moving to the clinic. Journal of neuroimmune pharmacology : the official journal of the Society on NeuroImmune Pharmacology 5, 4– 17 (2010).

56. Rochfort, S. Metabolomics reviewed: a new “omics” platform technology for systems biology and implications for natural products research. Journal of natural products 68, 1813–20 (2005).

57. Cevallos-Cevallos, J. M. & Reyes-De-Corcuera, J. I. Metabolomics in food science. Advances in food and nutrition research 67, 1–24 (2012).

58. Blekherman, G., Laubenbacher, R. & Cortes, D. Bioinformatics tools for cancer metabolomics. Metabolomics (2011).

59. Ellis, D. I. & Goodacre, R. Metabolic fingerprinting in disease diagnosis: biomedical applications of infrared and Raman spectroscopy. The Analyst 131, 875–85 (2006).

60. Sreekumar, A. et al. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 457, 910–4 (2009).

61. German, J., Watkins, S. & Fay, L. Metabolomics in practice: emerging knowledge to guide future dietetic advice toward individualized health. Journal of the American dietetic … (2005).

62. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic acids research 37, 1–13 (2009).

63. Jennen, D. et al. Integrating transcriptomics and metabonomics to unravel modes-of-action of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) in HepG2 cells. BMC systems biology 5, 139 (2011).

64. Zhu, J. et al. Stitching together multiple data dimensions reveals interacting metabolomic and transcriptomic networks that modulate cell regulation. PLoS biology 10, e1001301 (2012).

65. Zhang, W., Li, F. & Nie, L. Integrating multiple “omics” analysis for microbial biology: application and methodologies. Microbiology (Reading, England) 156, 287–301 (2010).

66. Kint, G., Fierro, C., Marchal, K., Vanderleyden, J. & De Keersmaecker, S. C. J. Integration of “omics” data: does it lead to new insights into host-microbe interactions? Future microbiology 5, 313–28 (2010).

67. Romero, R. et al. The use of high-dimensional biology (genomics, transcriptomics, proteomics, and metabolomics) to understand the preterm parturition syndrome. BJOG : an international journal of obstetrics and gynaecology 113 Suppl , 118–35 (2006).

68. Lesko, L. J. & Schmidt, S. Individualization of drug therapy: history, present state, and opportunities for the future. Clinical pharmacology and therapeutics 92, 458–66 (2012).

69. Hammond, W. The EHR Chronicle in USA and Europe from the 60’s to the future. epj-observatoriet.dk

70. Blobel, B. & Pharow, P. Analysis and evaluation of EHR approaches. Methods of information in medicine (2009).

71. Gurwitz, D. High‐Quality Phenomics are Crucial for Informative Omics Studies. Drug Development Research (2012).

72. Chen, R. et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–307 (2012).

73. Gentleman, R. & Carey, V. Bioconductor: open software development for computational biology and bioinformatics. Genome … (2004).

74. Lê Cao, K.-A., González, I. & Déjean, S. integrOmics: an R package to unravel relationships between two omics datasets. Bioinformatics (Oxford, England) 25, 2855–6 (2009).

75. Neuweger, H. et al. MeltDB: a software platform for the analysis and integration of metabolomics experiment data. Bioinformatics (Oxford, England) 24, 2726–32 (2008).

76. Ghosh, S., Matsuoka, Y., Asai, Y., Hsin, K.-Y. & Kitano, H. Software for systems biology: from tools to integrated platforms. Nature reviews. Genetics 12, 821–32 (2011).

77. Strimmer, K. A unified approach to false discovery rate estimation. BMC bioinformatics 9, 303 (2008).

78. Cui, X. & Churchill, G. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol (2003).

79. Tusher, V., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the … (2001).

80. Opgen-Rhein, R. & Strimmer, K. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Statistical Applications in Genetics and … (2007).

81. Zuber, V. & Strimmer, K. Gene ranking and biomarker discovery under correlation. Bioinformatics (Oxford, England) 25, 2700–7 (2009).

82. Yeung, K. & Ruzzo, W. Principal component analysis for clustering gene expression data. Bioinformatics (2001).

83. Lee, S. & Batzoglou, S. Application of independent component analysis to microarrays. Genome biology (2003).

84. Carmona-Saez, P. Biclustering of gene expression data by non-smooth non-negative matrix factorization. BMC … (2006).

85. Fogel, P., Young, S., Hawkins, D. & Ledirac, N. Inferential, robust non-negative matrix factorization analysis of microarray data. Bioinformatics (2007).

86. Rakotomamonjy, A. Variable selection using SVM based criteria. The Journal of Machine Learning Research (2003).

87. Diaz-Uriarte, R. GeneSrF and varSelRF: a web-based tool and R package for gene selection and classification using random forest. BMC bioinformatics 8, 328 (2007).

88. Belacel, N., Wang, Q. & Cuperlovic-Culf, M. Clustering methods for microarray gene expression data. Omics : a journal of integrative biology 10, 507–31 (2006).

89. Newman, M. & Girvan, M. Finding and evaluating community structure in networks. Physical Review E 69, 026113– (2004).

90. Fraley, C. & Raftery, A. E. Model-Based Clustering, Discriminant Analysis, and Density Estimation. Journal of the American Statistical Association 97, 611–631 (2002).

91. Qin, Z. S., Bilenky, M., Su, G. & Jones, S. J. M. MotifOrganizer: a scalable model-based motif clustering tool for mammalian genomes. Frontiers in bioscience (Elite edition) E5, 785–797 (2013).

92. Morris, J. H. et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC bioinformatics 12, 436 (2011).

93. Dennis, G. et al. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome biology 4, P3 (2003).

94. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545–50 (2005).

95. Margolin, A. A. et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC bioinformatics 7 Suppl 1, S7 (2006).

96. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC bioinformatics 9, 559 (2008).

97. Tian, Y., McEachin, R., Santos, C. & Patel, J. SAGA: a subgraph matching tool for biological graphs. Bioinformatics (2007).

98. Brasch, S., Linsen, L. & Fuellen, G. VANLO--interactive visual exploration of aligned biological networks. BMC bioinformatics 10, 327 (2009).

99. Lo, K. et al. Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC systems biology 6, 101 (2012).

100. Gehlenborg, N. et al. Visualization of omics data for systems biology. Nature Methods 7, S56–S68 (2010).

101. Hibbs, M. A., Dirksen, N. C., Li, K. & Troyanskaya, O. G. Visualization methods for statistical analysis of microarray clusters. BMC bioinformatics 6, 115 (2005).

102. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America 95, 14863–14868 (1998).

103. Baran, R., Robert, M., Suematsu, M., Soga, T. & Tomita, M. Visualization of three-way comparisons of omics data. BMC Bioinformatics 8, 72 (2007).

104. Xia, T. & Dickerson, J. OmicsViz: Cytoscape plug-in for visualizing omics data across species. Bioinformatics (2008).

105. Bushati, N., Smith, J., Briscoe, J. & Watkins, C. An intuitive graphical visualization technique for the interrogation of transcriptome data. Nucleic acids research 39, 7380–9 (2011).

106. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 13, 2498–504 (2003).

107. Junker, B. H., Klukas, C. & Schreiber, F. VANTED: a system for advanced data analysis and visualization in the context of biological networks. BMC bioinformatics 7, 109 (2006).

108. Arakawa, K., Kono, N., Yamada, Y., Mori, H. & Tomita, M. KEGG-based pathway visualization tool for complex omics data. In Silico Biology 5, 419–423 (2005).

109. Rohn, H., Klukas, C. & Schreiber, F. Creating views on integrated multidomain data. Bioinformatics (Oxford, England) 27, 1839–45 (2011).

110. Baker, M. Gene data to hit milestone. Nature 487, 282–3 (2012).

111. Millard, B. L., Niepel, M., Menden, M. P., Muhlich, J. L. & Sorger, P. K. Adaptive informatics for multifactorial and high-content biological data. Nature methods 8, 487– 93 (2011).

112. Wiesinger, M. et al. Data and knowledge management in cross-Omics research projects. Methods in molecular biology (Clifton, N.J.) 719, 97–111 (2011).

113. Lowe, H. & Barnett, G. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA: the journal of the American Medical … (1994).

114. Filzmoser, P., Maronna, R. & Werner, M. Outlier identification in high dimensions. Computational Statistics & Data Analysis 52, 1694–1711 (2008).

115. Rog, C. J., Chekuri, S. C. & Edgerton, M. E. Challenges of the information age: the impact of false discovery on pathway identification. BMC research notes 5, 647 (2012).

116. Hardin, J. & Rocke, D. Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Computational Statistics & Data Analysis (2004).

117. Su, G., Kuchinsky, A., Morris, J. H., States, D. J. & Meng, F. GLay: community structure analysis of biological networks. Bioinformatics 26, 3135–3137 (2010).

118. Praneenararat, T., Takagi, T. & Iwasaki, W. Integration of interactive, multi-scale network navigation approach with Cytoscape for functional genomics in the big data era. BMC genomics 13 Suppl 7, S24 (2012).

119. Abello, J. & van Ham, F. Matrix zoom: A visual interface to semi-external graphs. Information Visualization, 2004. INFOVIS … (2004).

120. Rhodes, D. & Yu, J. ONCOMINE: a cancer microarray database and integrated data-mining platform. … (New York, NY) (2004).

121. Alibés, A., Yankilevich, P., Cañada, A. & Díaz-Uriarte, R. IDconverter and IDClight: conversion and annotation of gene and protein IDs. BMC bioinformatics 8, 9 (2007).

122. Huang, D. W. et al. DAVID gene ID conversion tool. Bioinformation 2, 428–30 (2008).

123. Dai, M., Wang, P., Jakupovic, E., Watson, S. J. & Meng, F. Web-based GeneChip analysis system for large-scale collaborative projects. Bioinformatics (Oxford, England) 23, 2185–7 (2007).

124. Yuerong, Z., Yuelin, Z. & Wei, X. EzArray: A web-based highly automated Affymetrix expression array data management and analysis system. BMC Bioinformatics (2008).

125. Ferry-Dumazet, H., Gil, L. & Deborde, C. MeRy-B: a web knowledgebase for the storage, visualization, analysis and annotation of plant NMR metabolomic profiles. BMC plant … (2011).

126. Chatr-Aryamontri, A. et al. The BioGRID interaction database: 2013 update. Nucleic acids research 41, D816–23 (2012).

127. Bader, G. D. et al. BIND--The Biomolecular Interaction Network Database. Nucleic acids research 29, 242–5 (2001).

128. Newman, M. E. J. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America 103, 8577–82 (2006).

129. Fortunato, S. & Barthélemy, M. Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America 104, 36–41 (2007).

130. Ruan, J. & Zhang, W. Identifying network communities with a high resolution. Physical Review E - Statistical, Nonlinear and Soft Matter Physics 77, 016104 (2008).

131. Csardi, G. & Nepusz, T. The igraph software package for complex network research. interjournal.org

132. Bader, G. D. & Hogue, C. W. V. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics 4, 2 (2003).

133. Rivera, C. G., Vakil, R. & Bader, J. S. NeMo: Network Module identification in Cytoscape. BMC bioinformatics 11 Suppl 1, S61 (2010).

134. Merico, D., Gfeller, D. & Bader, G. D. How to visually interpret biological data using networks. Nature biotechnology 27, 921–4 (2009).

135. Wakita, K. & Tsurumi, T. Finding Community Structure in Mega-scale Social Networks. 9 (2007).

136. Clauset, A., Newman, M. & Moore, C. Finding community structure in very large networks. Physical Review E 70, 066111– (2004).

137. Raghavan, U. N., Albert, R. & Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Physical review. E, Statistical, nonlinear, and soft matter physics 76, 036106 (2007).

138. Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 036104– (2006).

139. Reichardt, J. & Bornholdt, S. Statistical mechanics of community detection. Physical Review E 74, 016110– (2006).

140. Pons, P. & Latapy, M. Computing Communities in Large Networks Using Random Walks. Computer and Information Sciences - ISCIS 2005 3733, 284–293 (2005).

141. Fruchterman, T. M. J. & Reingold, E. M. Graph drawing by force-directed placement. Software: Practice and Experience 21, 1129–1164 (1991).

142. Kamada, T. & Kawai, S. An algorithm for drawing general undirected graphs. Information processing letters (1989).

143. Adai, A. T., Date, S. V, Wieland, S. & Marcotte, E. M. LGL: creating a map of protein function with an algorithm for visualizing very large biological networks. Journal of molecular biology 340, 179–90 (2004).

144. Brandes, U. & Pich, C. Eigensolver Methods for Progressive Multidimensional Scaling of Large Data. Graph Drawing 4372, 42–53 (2007).

145. Reingold, E. M. & Tilford, J. S. Tidier Drawings of Trees. IEEE Transactions on Software Engineering SE-7, 223–228 (1981).

146. Lancichinetti, A., Fortunato, S. & Radicchi, F. Benchmark graphs for testing community detection algorithms. Physical Review E (2008).

147. Steinley, D. Properties of the Hubert-Arabie Adjusted Rand Index. Psychological Methods (2004).

148. Kim, H. & Park, H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics (2007).

149. Ashkenazi, M., Bader, G. D., Kuchinsky, A., Moshelion, M. & States, D. J. Cytoscape ESP: simple search of complex biological networks. Bioinformatics 24, 1465–1466 (2008).

150. Ferrara, C. T. et al. Genetic networks of liver metabolism revealed by integration of metabolic and transcriptional profiling. PLoS genetics 4, e1000034 (2008).

151. Xu, E. Y. et al. Integrated pathway analysis of rat urine metabolic profiles and kidney transcriptomic profiles to elucidate the systems toxicology of model nephrotoxicants. Chemical research in toxicology 21, 1548–61 (2008).

152. Nam, H., Chung, B. C., Kim, Y., Lee, K. & Lee, D. Combining tissue transcriptomics and urine metabolomics for breast cancer biomarker identification. Bioinformatics (Oxford, England) 25, 3151–7 (2009).

153. Fei, Z. et al. Tomato Functional Genomics Database: a comprehensive resource and analysis package for tomato functional genomics. Nucleic acids research 39, D1156–63 (2011).

154. Schellenberger, J., Park, J. O., Conrad, T. M. & Palsson, B. Ø. BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC bioinformatics 11, 213 (2010).

155. Hardin, J., Mitani, A., Hicks, L. & VanKoten, B. A robust measure of correlation between two genes on a microarray. BMC bioinformatics (2007).

156. Bickel, D. Robust cluster analysis of microarray gene expression data with the number of clusters determined biologically. Bioinformatics (2003).

157. Dai, M. et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic acids research 33, e175 (2005).

158. Lê, S., Josse, J. & Husson, F. FactoMineR: An R package for multivariate analysis. Journal of statistical software (2008).

159. Forbes, S. A. et al. COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer. Nucleic acids research 38, D652–7 (2010).

160. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).

161. Wang, H. et al. Comparative analysis and integrative classification of NCI60 cell lines and primary tumors using gene expression profiling data. BMC genomics 7, 166 (2006).

162. Shieh, A. D. & Hung, Y. S. Detecting outlier samples in microarray data. Statistical applications in genetics and molecular biology 8, Article 13 (2009).

163. Pearson, R., Gonye, G. & Schwaber, J. Outliers in microarray data analysis. … of Microarray Data Analysis III (2004).

164. Quackenbush, J. Computational analysis of microarray data. Nature Reviews Genetics (2001).

165. Ruan, J., Dean, A. K. & Zhang, W. A general co-expression network-based approach to gene expression analysis: comparison and applications. BMC systems biology 4, 8 (2010).

166. Rousseeuw, P. J. & Van Driessen, K. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999).

167. Wissler, C. The Spearman Correlation Formula. Science (New York, NY) (1905).

168. Brophy, A. An algorithm and program for calculation of Kendall’s rank correlation coefficient. Behavior Research Methods (1986).

169. Maronna, R. Robust M-estimators of multivariate location and scatter. The annals of statistics (1976).

170. Maronna, R. & Zamar, R. Robust estimates of location and dispersion for high-dimensional datasets. Technometrics (2002).

171. Chilson, J., Ng, R., Wagner, A. & Zamar, R. Parallel computation of high-dimensional robust correlation and covariance matrices. Algorithmica (2006).

172. Kullback, S. & Leibler, R. On information and sufficiency. The Annals of Mathematical Statistics (1951).

173. Paninski, L. Estimation of entropy and mutual information. Neural Computation (2003).

174. Rorabacher, D. Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon Q Parameter and Related Subrange Ratios at the 95 percent Confidence Level. Anal. Chem 63, 139 – 146 (1991).

175. Siirtola, H. & Mäkinen, E. Constructing and reconstructing the reorderable matrix. Information Visualization 4, 32–48 (2005).

176. Wilkinson, L. The History of the Cluster Heat Map. American Statistician 63, 179–184 (2009).

177. Reich, M., Liefeld, T., Gould, J. & Lerner, J. GenePattern 2.0. Nature … (2006).

178. Kincaid, R., Kuchinsky, A. & Creech, M. VistaClara: an expression browser plug-in for Cytoscape. Bioinformatics 24, 2112–2114 (2008).

179. Miller, C., Hooper, S. & Lecheler, L. InfoViz for Education: Using Interactive Information Visualization to Enhance Sense- and Decision-Making for Teachers and Students. World Conference on E-Learning in … (2009).

180. Sartor, M. & Mahavisno, V. ConceptGen: a gene set enrichment and gene set relation mapping tool. … (2010).

181. Walter, B., Bala, K., Kulkarni, M. & Pingali, K. Fast agglomerative clustering for rendering. 2008 IEEE Symposium on Interactive Ray Tracing 81–86 (2008).doi:10.1109/RT.2008.4634626

182. Bar-Joseph, Z., Gifford, D. K. & Jaakkola, T. S. Fast optimal leaf ordering for hierarchical clustering. Bioinformatics 17 Suppl 1, S22–S29 (2001).

183. Thalamuthu, A., Mukhopadhyay, I., Zheng, X. & Tseng, G. C. Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics 22, 2405– 2412 (2006).

184. Liiv, I. Seriation and matrix reordering methods: An historical overview. Statistical Analysis and Data Mining 3, 70–91 (2010).

185. Caraux, G. & Pinloche, S. PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order. Bioinformatics 21, 1280–1281 (2005).

186. Wu, H.-M., Tien, Y.-J. & Chen, C.-H. GAP: A graphical environment for matrix visualization and cluster analysis. Computational Statistics & Data Analysis 54, 767–778 (2010).

187. North, C. & Shneiderman, B. Snap-together visualization: a user interface for coordinating visualizations via relational schemata. AVI 00 Proceedings of the working conference on Advanced visual interfaces 128–135 (2000).doi:10.1145/345513.345282

188. Saldanha, A. J. Java Treeview -- extensible visualization of microarray data. Bioinformatics 20, 3246–3248 (2004).

189. Van Dijk, S. J. et al. A saturated fatty acid-rich diet induces an obesity-linked proinflammatory gene expression profile in adipose tissue of subjects at risk of metabolic syndrome. The American journal of clinical nutrition 90, 1656–64 (2009).

190. Sartor, M. A. et al. Genome-wide methylation and expression differences in HPV(+) and HPV(-) squamous cell carcinoma cell lines are consistent with divergent mechanisms of carcinogenesis. Epigenetics : official journal of the DNA Methylation Society 6, 777–87 (2011).

191. Su, G., Mao, B. & Wang, J. MACO: a gapped-alignment scoring tool for comparing transcription factor binding sites. In Silico Biology 6, 307–310 (2006).

192. Schultz, S. C., Shields, G. C. & Steitz, T. A. Crystal structure of a CAP-DNA complex: the DNA is bent by 90 degrees. Science (New York, N.Y.) 253, 1001–7 (1991).

193. Cline, M. S. et al. Integration of biological networks and gene expression data using Cytoscape. Nature protocols 2, 2366–82 (2007).

194. Assenov, Y., Ramírez, F., Schelhorn, S.-E., Lengauer, T. & Albrecht, M. Computing topological parameters of biological networks. Bioinformatics (Oxford, England) 24, 282– 4 (2008).

195. Vailaya, A. et al. An architecture for biological information extraction and representation. Bioinformatics (Oxford, England) 21, 430–8 (2005).

196. Zheng, Z.-L. & Zhao, Y. Transcriptome comparison and gene coexpression network analysis provide a systems view of citrus response to “Candidatus Liberibacter asiaticus” infection. BMC Genomics 14, 27 (2013).

197. Stuart, J., Segal, E., Koller, D. & Kim, S. A gene-coexpression network for global discovery of conserved genetic modules. Science (2003).

198. Reverter, A., Hudson, N. & Wang, Y. A gene coexpression network for bovine skeletal muscle inferred from microarray data. Physiological … (2007).

199. Choi, J., Yu, U., Yoo, O. & Kim, S. Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics (2005).

200. Caseau, Y. Efficient handling of multiple inheritance hierarchies. ACM SIGPLAN Notices 28, 271–287 (1993).

201. Hahsler, M., Hornik, K. & Buchta, C. Getting Things in Order: An introduction to the R package seriation. (2007).
