
Washington University in St. Louis
Washington University Open Scholarship

Arts & Sciences Electronic Theses and Dissertations
Arts & Sciences
Winter 12-15-2017

A Generalized Biophysical Model of Transcription Factor Binding Specificity and Its Application on High-Throughput SELEX Data

Shuxiang Ruan
Washington University in St. Louis

Follow this and additional works at: https://openscholarship.wustl.edu/art_sci_etds
Part of the Bioinformatics Commons

Recommended Citation
Ruan, Shuxiang, "A Generalized Biophysical Model of Transcription Factor Binding Specificity and Its Application on High-Throughput SELEX Data" (2017). Arts & Sciences Electronic Theses and Dissertations. 1189.
https://openscholarship.wustl.edu/art_sci_etds/1189

This Dissertation is brought to you for free and open access by the Arts & Sciences at Washington University Open Scholarship. It has been accepted for inclusion in Arts & Sciences Electronic Theses and Dissertations by an authorized administrator of Washington University Open Scholarship. For more information, please contact [email protected].

WASHINGTON UNIVERSITY IN ST. LOUIS
Division of Biology and Biomedical Sciences
Computational and Systems Biology

Dissertation Examination Committee:
Gary D. Stormo, Chair
Jeremy D. Buhler
Barak A. Cohen
S. Joshua Swamidass
Ting Wang

A Generalized Biophysical Model of Transcription Factor Binding Specificity and Its Application on High-Throughput SELEX Data
by
Shuxiang Ruan

A dissertation presented to The Graduate School of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

December 2017
St. Louis, Missouri

© 2017, Shuxiang Ruan

Table of Contents

List of Figures
List of Tables
Acknowledgments
Abstract
Chapter 1: Introduction
  1.1 Matrix-Based Models of Transcription Factor Binding Specificity
  1.2 High-Throughput Methods for Measuring Binding Specificity
  1.3 Contribution of DNA Shape Features to Binding Specificity
Chapter 2: Inherent Limitations of Probabilistic Models
  2.1 Models for Transcription Factor Binding Specificity
    2.1.1 Probabilistic Model
    2.1.2 Biophysical Model
  2.2 Methods
    2.2.1 Simulation Procedure
    2.2.2 The Estimated Probabilistic Models
    2.2.3 The Fitted Biophysical Model
  2.3 Results
    2.3.1 Results in the Absence of Measurement Noise
    2.3.2 Results in the Presence of Measurement Noise
  2.4 Discussion
Chapter 3: Estimation of Binding Motifs Using HT-SELEX Data
  3.1 Materials
  3.2 Generalized Biophysical Model
  3.3 BEESEM Algorithm
    3.3.1 Parameter Estimation
    3.3.2 Confidence Intervals
    3.3.3 Motif Length Selection
    3.3.4 Initialization of BEESEM with Seed Sequences
    3.3.5 Evaluation of Binding Models
  3.4 Results
    3.4.1 Characterization of BEESEM PWMs
    3.4.2 Evaluation Results
  3.5 Discussion
Chapter 4: Contribution of DNA Shape Features to Binding Specificity
  4.1 Materials
    4.1.1 JASPAR PFMs
    4.1.2 ChIP-seq Datasets
    4.1.3 DNA Shape Features
  4.2 Methods
    4.2.1 Motif Discovery Algorithms Evaluated
    4.2.2 Training and Testing Binding Models
  4.3 Results
  4.4 Discussion
Future Plans
References
Appendix A: Biophysical Model of BEESEM
Appendix B: Proof of BEESEM Being an Expectation Maximization Algorithm

List of Figures

Figure 1.1 The workflow of high-throughput SELEX
Figure 2.1 The energy matrix, derived probabilistic models and corresponding logos of a typical simulation
Figure 2.2 The non-linear relationship between binding energy and probability
Figure 2.3 The correlation between the predicted and true all sequence distributions
Figure 2.4 The correlation between the predicted and true top 1% sequence distributions
Figure 3.1 The HT-SELEX experiments and the J2013 PFMs
Figure 3.2 The results of the PBM and ChIP-seq evaluation tests
Figure 3.3 The median relative affinities (MRAs) predicted by different binding models
Figure 4.1 Comparison of the AUPRC scores generated by the DNAshapedTFBS-based methods with and without DNA shape features
Figure 4.2 Comparison of the AUPRC scores generated by DNAshapedTFBS_4bit + shape with other methods

List of Tables

Table 2.1 The rank correlation between the predicted and true all sequence distributions for probabilistic models in the absence of noise
Table 2.2 The rank correlation between the predicted and true top 1% sequence distributions for probabilistic models in the absence of noise