Comparative Gene Expression Analysis to Identify Common Factors in Multiple Cancers
COMPARATIVE GENE EXPRESSION ANALYSIS TO IDENTIFY COMMON FACTORS IN MULTIPLE CANCERS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Leszek A. Rybaczyk, B.A.

The Ohio State University
2008

Dissertation Committee:
Professor Kun Huang, Adviser
Professor Jeffery Kuret
Professor Randy Nelson
Professor Daniel Janies

Approved by

Adviser
Integrated Biomedical Science Graduate Program

ABSTRACT

Most current cancer research focuses on tissue-specific genetic mutations. Familial inheritance (e.g., APC in colon cancer), genetic mutation (e.g., p53), and overexpression of growth receptors (e.g., Her2-neu in breast cancer) can potentially lead to aberrant cell replication. Studies of these changes provide extensive information about tissue-specific effects but are less informative about changes common to multiple tissues. The similar behavior of cancers from different organ systems and species suggests that a pervasive mechanism drives carcinogenesis regardless of the specific tissue or species. To detect this mechanism, I applied three tiers of analysis: hypothesis testing on individual pathways to identify significant expression changes within each dataset, intersection of results across datasets to find themes common to different experiments, and Pearson correlations between individual genes to identify correlated genes within each dataset. By comparing a variety of cancers from different tissues and species, I was able to separate tissue- and species-specific effects from cancer-specific effects. I found that downregulation of Monoamine Oxidase A (MAO-A) is an indicator of this pervasive mechanism and can potentially be used to detect pathways and functions related to the initiation, promotion, and progression of cancer.
Dedicated to my wife

ACKNOWLEDGMENTS

I want to thank my adviser, Dr. Kun Huang, for his seemingly unending patience, guidance, and advice, without which I would never have finished this research. I am indebted to Dr. Jared Butcher for his constant support and input, which proved invaluable during my research. I am also grateful to Drs. Donald Holzschu, Meredith Bashaw, and Scott Moody for encouraging me to pursue academia. I especially want to acknowledge my committee members, Drs. Randy Nelson, Jeff Kuret, and Dan Janies, who gave up valuable time and resources so that I could succeed. I wish to thank Dr. Christopher Hans for volunteering to serve as the graduate studies representative on my committee. I want to express my gratitude to both sets of my parents, Drs. Pramod and Dorothy Pathak as well as Mr. and Mrs. Jerome McNally, for all their help during the course of my training. I also wish to acknowledge the administrative staff in my program who shepherded me through this difficult process.

VITA

April 23, 1980: Born, Albuquerque, New Mexico
2005: B.A. Psychology, Ohio University
2005–present: Graduate Research Associate, The Ohio State University

PUBLICATIONS

Research Publications

1. L.A. Rybaczyk, M.J. Bashaw, D.R. Pathak, S. Moody, R. Gilders, D. Holzschu, “An overlooked connection: serotonergic mediation of estrogen-related physiology and pathology.” BMC Women’s Health, 5:12, 2005. (Highly accessed)
2. L.A. Rybaczyk, M.J. Bashaw, D.R. Pathak, K. Huang, “An indicator of cancer: downregulation of Monoamine Oxidase-A in multiple organs and species.” BMC Genomics, 9(1):134, 2008. (Highly accessed)

FIELDS OF STUDY

Major Field: Integrated Biomedical Sciences

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters

1. Introduction
   1.1 Serotonin and Cancer
   1.2 Comparative Analysis of Gene Expression in Multiple Cancers
   1.3 Organization of this Dissertation
2. Genechip Technology
   2.1 Biological Issues
   2.2 Current Statistical Approaches
   2.3 Summary
3. Serotonin Physiology in Multiple Pathologies with a Focus on Cancer
   3.1 Serotonin Regulation
   3.2 Serotonin in the Central Nervous System
   3.3 Serotonin in the Musculoskeletal System
   3.4 Serotonin in the Vascular System
   3.5 Serotonin in the Immune System
   3.6 Serotonin in Cancer
   3.7 Summary
4. Hypothesis Testing of the Tryptophan/Serotonin Metabolic Pathway
   4.1 Methods
   4.2 Results
   4.3 Discussion
   4.4 Summary
5. Whole Genome Analysis
   5.1 Methods
      5.1.1 Dataset Collection
      5.1.2 Dataset Handling
      5.1.3 Gene Selection
   5.2 Results
      5.2.1 Frequency of Differential Expression for Genes
      5.2.2 Human Genes
   5.3 Discussion
   5.4 Summary
6. Correlating MAO-A Expression to Identify Differentially Expressed Pathways
   6.1 Methods
      6.1.1 Dataset Selection
      6.1.2 Correlations
   6.2 Results
   6.3 Discussion
   6.4 Summary
7. Conclusions and Future Directions
   7.1 Conclusions and Future Directions for Tier I: Hypothesis Testing of the Tryptophan/Serotonin Metabolic Pathway
   7.2 Conclusions and Future Directions for Tier II: Whole Genome Analysis
   7.3 Conclusions and Future Directions for Tier III: Correlating MAO-A Expression to Identify Differentially Expressed Pathways
   7.4 Conclusion
References
Appendix A Tables
Appendix B Figures

LIST OF TABLES

1. Description of first datasets identified for analysis
2. Genes listed in the tryptophan pathway in KEGG
3. Descriptive information on human datasets extracted
4. Descriptive information on paired datasets extracted from GEO
5. Descriptive information on animal datasets extracted from GEO
6. The genes with a frequency of 11 out of 19
7. Genes with frequency of occurrences more than 22 out of 40
8. The DAVID output of gene function clustering of the genes with frequency of occurrences more than 22 out of 40
9. The top six signaling networks identified using Ingenuity Pathway Analysis with a frequency of occurrences more than 22 out of 40
10. The genes with significant frequency of occurrences in only human datasets
11. The DAVID output of gene function clustering of the genes with frequency of occurrences more than 19 out of 32
12. The top six signaling networks from Ingenuity Pathway Analysis classification of the genes with frequency of occurrences more than 18 out of 32
13. The top six signaling networks from Ingenuity Pathway Analysis classification of the genes that correlated with MAO-A

LIST OF FIGURES

1. A flow chart representing the analytical technique used
2. Expression of MAO-A in normal and cancer tissue samples
3. CDF of a Beta distribution for L = 19 datasets
4. CDF of a Beta distribution for L = 40 datasets
5. CDF of a Beta distribution for L = 32 datasets
6. A histogram of the frequencies of common differentially expressed genes for the 19 datasets (Group A)
7. A histogram of the gene frequencies for 40 datasets (Group B)
8. A graph representing the significance of the various pathways for 40 datasets based on an Ingenuity Pathway Analysis
9. The distribution of significant genes in humans
10. A graph representing the significance of the various pathways for 32 human datasets based on an Ingenuity Pathway Analysis
11. Graph representing the significance of the various pathways for 32 human datasets based on an Ingenuity Pathway Analysis