Computational Biology of Transcription Factor Binding (Methods in Molecular Biology)

METHODS IN MOLECULAR BIOLOGY™

Series Editor
John M. Walker
School of Life Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK

For other titles published in this series, go to www.springer.com/series/7651

Computational Biology of Transcription Factor Binding

Edited by
Istvan Ladunga
Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, USA

Editor
Istvan Ladunga
Department of Statistics
University of Nebraska-Lincoln
1901 Vine St., E145 Beadle Center
Lincoln, NE 68588-0665, USA

ISSN 1064-3745    e-ISSN 1940-6029
ISBN 978-1-60761-853-9    e-ISBN 978-1-60761-854-6
DOI 10.1007/978-1-60761-854-6
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2010934132

© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Cover illustration: Crystal structure of Fis bound to 27 bp optimal binding sequence F2 from Stella, S., Cascio, D., Johnson, R.C. (2010) The shape of the DNA minor groove directs binding by the DNA-bending protein Fis. Genes Dev. 24: 814–826.

Printed on acid-free paper

Humana Press is part of Springer Science+Business Media (www.springer.com)

Preface

Transcriptional regulation controls the basic processes of life.
Its complex, dynamic, and hierarchical networks control the momentary availability of messenger RNAs for protein synthesis. Transcriptional regulation is key to cell division, development, tissue differentiation, and cancer, as discussed in Chapters 1 and 2. We have witnessed rapid, major developments at the intersection of computational biology, experimental technology, and statistics. A decade ago, researchers were struggling with the notoriously challenging prediction of isolated binding sites from low-throughput experiments. Now we can accurately predict cis-regulatory modules, conserved clusters of binding sites (Chapters 13 and 15), partly based on high-throughput chromatin immunoprecipitation experiments in which tens of millions of DNA segments are sequenced by massively parallel, next-generation sequencers (ChIP-seq, Chapters 9, 10, and 11). These spectacular developments have allowed genome-wide mapping of tens of thousands of transcription factor binding sites in yeast, bacteria, mammals, insects, worms, and plants. Yet there have also been equally spectacular failures in many laboratories around the world. Having access to chromatin immunoprecipitation, next-generation sequencing, and software is no guarantee of success. The productive and creative use of computational and experimental tools requires a high-level understanding of the underlying biology, the technological characteristics, and the potential and limitations of statistical and computational solutions. This is the raison d’être of this volume: guiding scientists of all disciplines through the jungle of regulatory regions, ChIP-seq, about 200 motif discovery tools, and more. As in previous volumes of the series Methods in Molecular Biology™, we help readers understand the basic principles and give detailed guidance for the computational analyses and biological interpretations of transcription factor binding.
We disclose critical practical information and caveats that may be missing from research publications. This volume serves not only computational biologists but also experimentalists who want to better understand how to design and execute experiments and how to communicate effectively with computational biologists, computer scientists, and statisticians. Chapter 1 helps readers find their way through the maze of resources with a high-level overview of the computational, biological, and some experimental approaches to transcription factor binding. Chapter 1 also highlights the other chapters in this volume and discusses some of the issues not covered. Why are there so many failed experiments and analyses? Consider, for example, ChIP-seq, where background noise accounts for more than half of the sequencing reads. Left uncorrected, this can lead to a vast array of false-positive observations. Careful investigators, however, can apply kernel-based density estimates and other background modeling and correction methods to find significantly enriched signals in such noisy observations (Chapters 9 and 10). Density estimation is followed by improved peak calling with a controlled false discovery rate (Chapter 10). Another problem is that ChIP-seq peaks are tens to hundreds of times wider than the footprint of the transcription factor on the DNA. The highest peaks often come from amplification and sequencing bias, not from a bona fide biological signal (Chapter 1). These serious issues mandate the identification of shared, short, and variable DNA motifs, representations of variable binding sites, from moderate-to-low-resolution ChIP-seq data using computational motif discovery algorithms. On the other hand, false negatives are also abundant. Consider the temporary nature of regulation, which responds to transient environmental and internal stimuli. A site is therefore typically bound only a fraction of the time and is easily missed by snapshot techniques like ChIP (Chapter 24).
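The two steps named above, smoothing read positions with a kernel density estimate and then calling peaks at a controlled false discovery rate, can be illustrated with a minimal sketch. It assumes a Gaussian kernel with a fixed, arbitrarily chosen bandwidth and the standard Benjamini–Hochberg procedure; these are generic textbook choices, not the specific methods of Chapters 9 and 10. The functions `gaussian_kde` and `benjamini_hochberg` are hypothetical pure-Python helpers (not the SciPy or statsmodels APIs). How per-peak p-values are obtained, for example by comparing ChIP reads with a control sample, is outside the scope of the sketch, so the p-values in the usage lines are illustrative numbers only.

```python
import math
from typing import List, Sequence


def gaussian_kde(positions: Sequence[float], grid: Sequence[float],
                 bandwidth: float) -> List[float]:
    """Hypothetical helper: Gaussian kernel density of read positions at each grid point."""
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    # Each read contributes one Gaussian bump centred on its position.
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        for x in grid
    ]


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Hypothetical helper: flag candidate peaks rejected at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:  # reject every hypothesis ranked at or below the cutoff
        rejected[idx] = True
    return rejected


# Usage: three reads cluster near position 100; one read is isolated at 500.
reads = [98, 100, 102, 500]
grid = list(range(0, 601))
density = gaussian_kde(reads, grid, bandwidth=10.0)
summit = grid[max(range(len(grid)), key=density.__getitem__)]
print(summit)  # 100

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))  # only the first two entries are True
```

The density summit approximates the most likely binding location within a cluster of reads. The 10 bp bandwidth here is an arbitrary assumption, and a real pipeline would also model the background rather than scoring reads in isolation.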
In order to reduce the number of false positives and negatives, motifs are trained by a wide spectrum of statistical learning methods. In spite of their diverse implementations, most of these tools stem from expectation maximization and Gibbs sampling (Chapters 6, 7, and 11) or support vector machines (Chapter 13). The trained tools can find binding sites missed by experiments in predicted promoter regions (Chapter 5), in all regulatory regions (Chapter 4), or in the whole genome. By itself, de novo computational motif prediction is still not accurate enough (Chapter 8). Confidence can be increased greatly by integrating binding site locations with in vitro protein–DNA affinities (Chapter 12), evolutionarily conserved regions (Chapters 11, 14, and 18), and transposable DNA elements that propagate binding sites through the genome (Chapter 14). Time-delayed co-expression, as inferred from large compendia of gene expression experiments, also indicates binding sites of shared transcription factors. This enormous wealth of information can be retrieved in computationally efficient ways from diverse databases, including ORegAnno (Chapter 20), PlantTFDB (Chapter 21), cis-Lexicon (Chapter 22), and genome browsers (Chapters 1, 10, and 22). The integrated observations and predictions help us reconstruct complex, hierarchical, and dynamic transcriptional regulatory networks (Chapters 23 and 24). This task demands not only new experiments but also the re-annotation of existing experimental data and computational predictions, as well as ongoing, major paradigm changes for all of us.

Istvan Ladunga

Contents

Preface
Contributors
1. An Overview of the Computational Analyses and Discovery of Transcription Factor Binding Sites
   Istvan Ladunga
2. Components and Mechanisms of Regulation of Gene Expression
   Alper Yilmaz and Erich Grotewold
3. Regulatory Regions in DNA: Promoters, Enhancers, Silencers, and Insulators
   Jean-Jack M. Riethoven
4. Three-Dimensional Structures of DNA-Bound Transcriptional Regulators
   Tripti Shrivastava and Tahir H. Tahirov
5. Identification of Promoter Regions and Regulatory Sites
   Victor V. Solovyev, Ilham A. Shahmuradov, and Asaf A. Salamov
6. Motif Discovery Using Expectation Maximization and Gibbs’ Sampling
   Gary D. Stormo
7. Probabilistic Approaches to Transcription Factor Binding Site Prediction
   Stefan Posch, Jan Grau, André Gohr, Jens Keilwagen, and Ivo Grosse
8. The Motif Tool Assessment Platform (MTAP) for Sequence-Based Transcription Factor Binding Site Prediction Tools
   Daniel Quest and Hesham Ali
9. Computational Analysis of ChIP-seq Data
   Hongkai Ji
10. Probabilistic Peak Calling and Controlling False Discovery Rate Estimations in Transcription Factor Binding Site Mapping from ChIP-seq
   Shuo Jiao, Cheryl P. Bailey, Shunpu Zhang, and Istvan Ladunga
11. Sequence Analysis of Chromatin Immunoprecipitation Data for Transcription Factors
   Kenzie D. MacIsaac and Ernest Fraenkel
12. Inferring Protein–DNA Interaction Parameters from SELEX Experiments
   Marko Djordjevic
13. Kernel-Based Identification of Regulatory Modules
   Sebastian J. Schultheiss
14. Identification of Transcription Factor Binding Sites Derived from Transposable Element Sequences Using ChIP-seq
   Andrew B. Conley and I. King Jordan
15. Target Gene Identification via Nuclear Receptor Binding Site Prediction
   Gabor Varga
16. Computing Chromosome Conformation
