Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy

A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University

In partial fulfillment of the requirements for the degree Master of Science

Jens Schmidt
December 2013

© 2013 Jens Schmidt. All Rights Reserved.

This thesis titled Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy by JENS SCHMIDT has been approved for the School of Electrical Engineering and Computer Science and the Russ College of Engineering and Technology by

Lonnie R. Welch
Professor of Electrical Engineering and Computer Science

Dennis Irwin
Dean, Russ College of Engineering and Technology

ABSTRACT

SCHMIDT, JENS, M.S., December 2013, Computer Science
Discovery of Putative STAT5 Transcription Factor Binding Sites in Mice with Diabetic Nephropathy
Director of Thesis: Lonnie R. Welch

Type 1 diabetes mellitus has become a major disease and impacts patients’ lives significantly. Because the underlying pathways and the genetic causes have not been exhaustively examined yet, this thesis focuses on identifying potential binding sites for STAT5, a transcription factor that is hypothesized to play a role in the inflammation process of diabetic nephropathy, a complication of type 1 diabetes. In this study, motif finding was applied to determine a set of putative STAT5 binding sites. This set was filtered by comparison to three gene ontology terms that are associated with processes that can occur in diabetic nephropathy and by comparison to experimentally validated STAT5 binding sites. This work generated a short list of six genes and their associated sites that should be given the highest priority for experimental validation in the laboratory in an effort to demonstrate a direct, repressive role for STAT5 in diabetic nephropathy.

DEDICATION

I would like to dedicate this thesis to Gudrun and Werner Schmidt and Ellen Lubbers.
ACKNOWLEDGMENTS

First, I would like to thank Dr. Lonnie Welch for offering me the opportunity to specialize in the field of bioinformatics and for welcoming me into the bioinformatics laboratory. I would like to thank Dr. Welch for giving me helpful advice and for sharing with me his vision about solving bioinformatics problems and general philosophical content. Dr. Welch always encouraged me to work hard, to continuously build new skills and to extend my knowledge inside and outside the classroom.

Second, I would like to thank Dr. Karen Coschigano, as well as her collaborator Dr. K. Wyatt McMahon, for collaborating with our bioinformatics laboratory and me on her diabetic nephropathy project, for proofreading this thesis several times and giving valuable feedback, for explaining the biological background and for providing feedback on our intermediate results.

Third, I would like to thank Dr. Chang Liu and Dr. Frank Drews for being on my committee.

Fourth, I would like to thank Russ College of Engineering and Ohio University for accepting my application for the computer science program and for giving me, a non-native speaker, the opportunity to experience United States university education at a high level and to get to know American culture better.

Fifth, I would like to thank Xiaoyu Liang (Veronica) and Andrew Bissell for working on the diabetic nephropathy project as well.

Finally, I would like to thank the bioinformatics laboratory (Rami Al-Ouran, Richard Wolfe, Kevin Plis, Yichao Li and Ashwini Naik) for supporting me. And I would like to thank my family and Ellen Lubbers for always supporting me and being there for me. You are great!
TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
List of Abbreviations
1. Introduction
  1.1 Diabetes mellitus
  1.2 Diabetic Nephropathy
  1.3 Transcription factors and their mechanism
  1.4 Transcription factor STAT5
  1.5 Previous laboratory experiments
2. Hypothesis and Specific Aims
  2.1 Hypothesis
  2.2 Specific Aims
3. Literature Review
4. Results
  4.1 Identification of putative STAT5 binding sites
  4.2 Filtering genes by GO terms
  4.3 Comparison of putative STAT5 binding sites to published ChIP-seq results
5. Discussion
  5.1 Significance
  5.2 Future work
6. Methods
  6.1 Microarray experiments
    6.1.1 Description and interpretation of summarized microarray data sets
    6.1.2 Analysis of data sets
  6.2 Literature search for experimentally validated STAT5 binding sites
    6.2.1 Creation of position weight matrices
  6.3 Data storage (MySQL database)
  6.4 Framework for program execution and to display results (Galaxy)
  6.5 Perl programs
    6.5.1 Sequence retrieval from the Ensembl database
    6.5.2 Motif finding (FIMO)
    6.5.3 Filter of potential STAT5 binding sites
      6.5.3.1 Comparison of potential STAT5 binding sites to experimentally validated STAT5 binding sites (ChIP-seq)
      6.5.3.2 Filter potential STAT5 binding sites via gene ontology database
  6.6 System architecture and configuration
  6.7 Limitations
References
Appendix 1: Galaxy GUI screenshots
Appendix 2: Galaxy configuration files (XML format)
Appendix 3: SQL script “create_db.sql”
Appendix 4: Additional charts of potential STAT5 binding sites
  Long STAT5 binding motif
  Short STAT5 binding motif
Appendix 5: Potential STAT5 binding sites (short motif) validated against ChIP-seq data set with GEO accession number GSM784027
Appendix 6: Potential STAT5 binding sites (long motif) validated against ChIP-seq data set with GEO accession number GSM784027
Appendix 7: Position weight matrix file “stat5_pwm__short_gas__default_bg.txt”