Target Genes of Tfs Also May Be Associated with Disease
Total Page:16
File Type:pdf, Size:1020Kb
ABSTRACT WANG, TIANYUAN. Identifying Transcription Factor Targets and Studying Human Complex Disease Genes. (Under the direction of Dr. Elizabeth R. Hauser and Dr. Steffen Heber.) Transcription factors (TFs) have been characterized as mediators of human complex disease processes. The target genes of TFs also may be associated with disease. Identification of potential TF targets could further our understanding of gene-gene interactions underlying complex disease. We focused on two TFs, USF1 and ZNF217, because of their biological importance, especially their known genetic association with coronary artery disease (CAD), and the availability of chromatin immunoprecipitation microarray (ChIP-chip) results. First, we used USF1 ChIP-chip data as a training dataset to develop and evaluate several kernel logistic regression prediction models. Our most accurate predictor significantly outperformed standard PWM-based prediction methods. This novel prediction method enables a more accurate and efficient genome-scale identification of USF1 binding and associated target genes. Second, the results from independent linkage and gene expression studies suggest that ZNF217 also may be a candidate gene for CAD. We further investigated the role of ZNF217 for CAD in three independent CAD samples with different phenotypes. Our association studies of ZNF217 identified three SNPs having consistent association with CAD in three samples. Aorta expression profiling indicated that the proportion of the aorta with raised lesions was also positively correlated to ZNF217 expression. The combined evidence suggests that 1 ZNF217 is a novel susceptibility gene for CAD. Finally, we applied our previously developed TF binding site (TFBS) prediction method to ZNF217. The performance of the prediction models of ZNF217 and USF1 are very similar. We demonstrated that our TFBS prediction method can be extended to other TFs. In summary, the results of this dissertation research are (1) evaluation of two TFs, USF1 and ZNF217, as susceptibility factors for CAD; (2) development of a generalized method for TFBS prediction; (3) prediction of TFBSs and target genes of two TFs, and identification of SNPs within TFBSs. This research allows for the development of study design to access TF based interactions in genetic susceptibility to human complex disease. 2 Identifying Transcription Factor Targets and Studying Human Complex Disease Genes by Tianyuan Wang A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy Bioinformatics Raleigh, North Carolina 2009 APPROVED BY: _______________________________ ______________________________ Dr. Elizabeth R. Hauser Dr. Steffen Heber Chair of Advisory Committee Chair of Advisory Committee ________________________________ ________________________________ Dr. David McK. Bird Dr. Jonathan M. Horowitz _______________________________ Dr. Jeffrey L. Thorne 3 DEDICATION To my wife, daughter and parents ii BIOGRAPHY Tianyuan Wang was born in Beijing, People‘s Republic of China. He graduated from China Agricultural University in Beijing, China in 1991 with a B.S. in Plant Biochemistry and Physiology. He continued his education at the University of Georgia, where he received his M.S. in Plant Pathology in 1997 and M.S. in Pharmacy in 2001. He joined the Bioinformatics program at North Carolina State University in 2004. While working toward his doctoral degree, he worked as a full-time bioinformatics analyst at the Center for Human Genetics at Duke Medical Center under the direction of Dr. Elizabeth R. Hauser. iii ACKNOWLEDGMENTS I would like to express my sincere gratitude to my advisors, Dr. Elizabeth R. Hauser and Dr. Steffen Heber, for their support, encouragement and guidance during my doctoral study. I feel very fortunate to work under Dr. Elizabeth R. Hauser‘s supervision for almost six years. I also would like to thank my other committee members, Dr. David McK. Bird, Dr. Jonathan M. Horowitz, and Dr. Jeffrey L. Thorne. They have guided my research projects and provided valuable suggestions to this dissertation. In addition to my advisory committee, I am very fortunate and grateful to collaborate with Dr. Terrence S. Furey at Duke University. I would like to thank the staff at the Bioinformatics Research Center, in particular, Juliebeth Briseno, for her assistance with graduate school regulations. Further, I thank the staff at the Center for Human Genetics at Duke Medical Center. I really appreciate their supports and friendships. I would like to give special thanks to my wife, Jun Yang, for her encouragement, support and love. Finally, I am very grateful for the supports of my family and many friends during the course of this study. iv TABLE OF CONTENTS LIST OF TABLES .......................................................................................................... viii LIST OF FIGURES .......................................................................................................... ix LIST OF ADDITIONAL FILES ...................................................................................... x 1 Introduction .................................................................................................................. 1 1.1 Human complex diseases........................................................................................ 2 1.2 Genome scan to identify candidate genes .................................................………..3 1.3 Candidate gene analyses ........................................................................................ 6 1.4 Transcription factors and human complex disease ................................................ 7 1.5 TF binding site and target gene prediction ............................................................ 8 1.6 The scope of this dissertation .............................................................................. 12 1.7 Reference ............................................................................................................. 14 2 A general integrative genomic feature transcription factor binding site prediction method applied to analysis of USF1 binding in cardiovascular disease ................ 18 2.1 Abstract ................................................................................................................ 19 2.2 Background .......................................................................................................... 20 2.3 Methods ................................................................................................................ 24 2.3.1 Genome sequence and features ................................................................... 24 2.3.2 USF1 ChIP-chip data .................................................................................. 25 2.3.3 Preliminary prediction based on PWM scoring method ............................. 26 2.3.4 Prediction method based on genomic features ............................................ 27 2.3.5 Genome-scale prediction and validation ..................................................... 28 2.4 Results ................................................................................................................. 29 2.4.1 Prediction method development ................................................................. 29 2.4.2 Genome-scale prediction and validation .................................................... 31 2.4.3 Distributions of predicted USF1-BSs ......................................................... 32 v 2.5 Discussion ............................................................................................................ 33 2.5.1 USF1 binding site prediction method .......................................................... 33 2.5.2 Training dataset from published USF1 ChIP-chip results .......................... 37 2.5.3 Predicted USF1 binding sites and target genes ........................................... 38 2.5.4 Application to human disease study ............................................................ 40 2.6 Conclusions .......................................................................................................... 42 2.7 Acknowledgements .............................................................................................. 42 2.8 Reference .............................................................................................................. 43 2.9 Figures .................................................................................................................. 48 2.10 Tables ................................................................................................................... 53 2.11 Additional files ..................................................................................................... 56 3 ZNF217 is associated with coronary artery disease in multiple samples ……...... 69 3.1 Abstract ................................................................................................................ 70 3.2 Introduction .......................................................................................................... 71 3.3 Results .................................................................................................................. 73 3.3.1 ZNF217 SNP selection and genotyping ...................................................... 73 3.3.2 Single–marker association of ZNF217 SNP in CATHGEN case-control sample ........................................................................................................