Large-Scale Integration of Microarray Data: Investigating the Pathologies of Cancer and Infectious Diseases Noor Dawany Aydin Tozeren, Phd
Total Page:16
File Type:pdf, Size:1020Kb
Large-Scale Integration of Microarray Data: Investigating the Pathologies of Cancer and Infectious Diseases A Thesis Submitted to the Faculty of Drexel University by Noor Dawany in partial fulfillment of the requirements for the degree of Doctor of Philosophy June 2010 ii Dedications To my late grandmother, Dr. Siranoosh Raihani, for being a great inspiration… iii Acknowledgements First and foremost I would like to acknowledge my parents, Bassam and Nada, for their unconditional support throughout my educational life. I would not be here if it were not for their sacrifices and encouragement. I also want to thank my brother, Sahil, for telling me almost nine years ago to toughen up and continue; I have never looked back since then. I owe my sincerest gratitude to my advisor, Dr. Aydin Tozeren, for his patience, his support and for all the effort he has put into this accomplishment. He is responsible for where I am today and where I will be, for that I am forever grateful. I would also like to thank my colleagues at the Center for Integrated Bioinformatics: Mahdi Saramdy, Will Dampier, Yichuan Liu and Perry Evans, and my friends from the Biomedical Engineering Department for their help and for entertaining conversations that commemorate our lives as PhD students at Drexel University. Finally, a big thank you for the support of old friends, for always being there for me… iv Table of Contents List of Tables ................................................................................................................................ viii List of Figures .................................................................................................................................. x Abstract .......................................................................................................................................... xii Chapter 1: Introduction .................................................................................................................... 1 1.1 Motivation: ............................................................................................................................. 1 1.2 Transcription, Translation and Control of Gene Expression: ................................................ 1 1.3 Cancer Overview ................................................................................................................... 4 1.4 Viral Infections and Hijacking Cellular Functions ................................................................ 6 1.4.1 Human Immunodeficiency Virus .................................................................................... 7 1.4.2 Hepatitis C ...................................................................................................................... 9 1.4.3 Influenza A .................................................................................................................... 11 Chapter 2: Gene Expression Microarrays ...................................................................................... 15 2.1 Introduction .......................................................................................................................... 15 2.2 Microarray Normalization: .................................................................................................. 17 2.2.1 Robust Multichip Average Algorithm: ......................................................................... 18 2.2.2 Reference Robust Multichip Average ........................................................................... 19 2.2.3 Custom Chip Definition Files ....................................................................................... 20 2.3 Microarray Analysis and Differential Gene Expression ...................................................... 21 2.3.1 Significance Analysis of Microarrays ........................................................................... 22 2.3.2 Meta-Analysis ............................................................................................................... 23 2.4 Databases ............................................................................................................................. 25 2.4.1 Microarray Databases: .................................................................................................. 25 2.4.2 Functional Annotation Databases: ................................................................................ 26 v Chapter 3: Asymmetric integration of microarray data outperforms meta-analysis approach ...... 28 3.1 Summary .............................................................................................................................. 28 3.2 Background .......................................................................................................................... 29 3.3 Materials and Methods ......................................................................................................... 30 3.3.1 Microarray dataset selection ......................................................................................... 30 3.3.2 Normalization and differential expression .................................................................... 31 3.3.3 Common transcriptional profiles across all five tissue types ........................................ 33 3.3.4 Expanding IV analysis to cDNA data ........................................................................... 33 3.4 Results .................................................................................................................................. 35 3.4.1 Datasets and approaches ............................................................................................... 35 3.4.2 IV meta-analysis and merged SAM overlap significantly in results ............................. 37 3.4.3 Cell cycle pathway is commonly enriched in cancers .................................................. 38 3.4.4 Microarray results match cancer research literature with low p-values ........................ 42 3.5 Discussion ............................................................................................................................ 44 3.6 Conclusion ........................................................................................................................... 47 Chapter 4: Large-scale integration of microarray data reveals genes and pathways common to multiple cancer types ..................................................................................................................... 48 4.1 Summary .............................................................................................................................. 48 4.2 Background .......................................................................................................................... 49 4.3 Materials and Methods ......................................................................................................... 50 4.3.1 Microarray dataset selection and normalization ........................................................... 50 4.3.2 Differential gene expression ......................................................................................... 51 4.3.3 Functional annotation of top ranked and conserved genes ........................................... 52 4.3.4 Consistent differential expression across tissues .......................................................... 52 4.3.5 Cancer literature annotation of identified significant SAM genes ................................ 52 vi 4.4 Results .................................................................................................................................. 53 4.4.1 Dataset........................................................................................................................... 53 4.4.2 SAM genes and their match with research literature .................................................... 56 4.4.3 Cellular pathways enriched for top 400 SAM genes .................................................... 56 4.4.4 SAM genes in multiple gene lists ................................................................................. 59 4.5 Discussion ............................................................................................................................ 65 4.6 Conclusion ........................................................................................................................... 67 Chapter 5: Virus and host iron binding protein interactions .......................................................... 68 5.1 Summary .............................................................................................................................. 68 5.2 Background .......................................................................................................................... 69 5.3 Methods ............................................................................................................................... 71 5.3.1 Identification of iron-associated proteins ...................................................................... 71 5.3.2 Identifying direct HIV-1 iron binding protein targets ................................................... 72 5.3.3 Microarray dataset selection on viral infections ........................................................... 72 5.3.4 Microarray data normalization and differential gene expression .................................. 73 5.3.5 Distribution of gene expression levels of iron binding proteins ..................................