Uniprobe: an Online Database of Protein Binding Microarray Data on Protein–DNA Interactions

Total Page:16

File Type:pdf, Size:1020Kb

Load more

UniPROBE: an online database of protein binding microarray data on protein–DNA interactions The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Newburger, D. E., and M. L. Bulyk. “UniPROBE: An Online Database of Protein Binding Microarray Data on protein-DNA Interactions.” Nucleic Acids Research 37.Database (2009): D77–D82. As Published http://dx.doi.org/10.1093/nar/gkn660 Publisher Oxford University Press (OUP) Version Final published version Citable link http://hdl.handle.net/1721.1/73095 Terms of Use Creative Commons Attribution Non-Commercial Detailed Terms http://creativecommons.org/licenses/by-nc/2.5 Published online 8 October 2008 Nucleic Acids Research, 2009, Vol. 37, Database issue D77–D82 doi:10.1093/nar/gkn660 UniPROBE: an online database of protein binding microarray data on protein–DNA interactions Daniel E. Newburger1 and Martha L. Bulyk1,2,3,* 1Division of Genetics, Department of Medicine, 2Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School and 3Harvard-MIT Division of Health Sciences and Technology (HST), Harvard Medical School, Boston, MA 02115, USA Received August 15, 2008; Revised September 18, 2008; Accepted September 21, 2008 ABSTRACT stimuli, differentiation and development in an organism. Despite recent advances in this field, the vast majority The UniPROBE (Universal PBM Resource for of TFs in most major model organisms and pathogens Oligonucleotide Binding Evaluation) database remain either uncharacterized or poorly described (1). hosts data generated by universal protein binding The development of universal (2) protein binding micro- microarray (PBM) technology on the in vitro DNA- array (PBM) technology (3) (Figure 1) offers a new avenue binding specificities of proteins. This initial release for the exploration of protein–DNA binding specificity. of the UniPROBE database provides a centralized Universal PBMs provide an efficient and comprehensive resource for accessing comprehensive PBM data method for in vitro interrogation of DNA-binding pref- on the preferences of proteins for all possible erences. PBM technology complements other currently sequence variants (‘words’) of length k (‘k-mers’), available technologies, such as chromatin immunoprecipi- as well as position weight matrix (PWM) and graphi- tation coupled with either microarray readout (4–7) or cal sequence logo representations of the k-mer high-throughput sequencing (8–10) that identify genomic regions bound in vivo. data. In total, the database hosts DNA-binding Universal PBMs achieve comprehensive, high-resolu- data for over 175 nonredundant proteins from a tion determination of proteins’ DNA-binding preferences diverse collection of organisms, including the pro- by measuring the binding preferences of a protein over karyote Vibrio harveyi, the eukaryotic malarial para- all possible k-mers of a given length (2,11). Currently site Plasmodium falciparum, the parasitic employed custom array designs contain a set of 60-bp Apicomplexan Cryptosporidium parvum, the yeast DNA probes that encompass all possible permutations Saccharomyces cerevisiae, the worm Caenorhabdi- of either 9 (Bulyk Lab, unpublished data) or 10 bp (12), tis elegans, mouse and human. Current web tools depending upon the microarray design (2,12). In addition include a text-based search, a function for asses- to covering all contiguous 9-mers or 10-mers, these array sing motif similarity between user-entered data designs also offer an extensive set of gapped permuta- and database PWMs, and a function for locating tions that provide coverage of binding sites of greater putative binding sites along user-entered nucleotide length. Together, these data can be synthesized to produce high-confidence measurements of the relative preferences sequences. The UniPROBE database is available at of a protein for all possible sequence variants belonging http://thebrain.bwh.harvard.edu/uniprobe/. to a wide range of k-mer patterns typically found in TF binding site motifs (2,12). PBM enrichment scores from the PBM signal intensity data are typically calculated for INTRODUCTION each of the more than 2.3 million 8-mers (i.e. binding site The characterization of transcription factors’ (TFs’) ‘words’ with eight informative nucleotide positions, DNA-binding specificities represents a critical step including all contiguous 8-mers and a large collection of towards understanding the regulation of gene expression gapped 8-mers). These 8-mers encompass the full affinity and elucidating the biophysical properties governing range of DNA binding preferences, from the most prefer- protein–DNA interactions. The study of DNA-binding entially bound k-mers to low-affinity k-mers and nonspe- specificities therefore has profound implications for the cifically associated sequences (2). analysis and prediction of the regulatory networks The TRANSFAC (13) and JASPAR (14) databases con- that govern intracellular function, responses to external tain hundreds of matrices constructed from DNA binding *To whom correspondence should be addressed. Tel: +1 617 525 4725; Fax: +1 617 525 4705; Email: [email protected] ß 2008 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. D78 Nucleic Acids Research, 2009, Vol. 37, Database issue number of binding site sequences in a JASPAR SELEX dataset is just 28, while the median number of nonredun- dant binding site sequences is just 11. In contrast, the DNA binding profiles obtained from our PBM experiments pro- vide information on the direct binding preferences of a given protein over all k-mer DNA binding sequence var- iants, measured in vitro using the universal PBM technol- ogy; the number of sequence variants examined is limited only by the number of features on the microarray. In addi- tion to the k-mer binding profiles, these procedures also provide DNA binding sequence PWMs derived from the k- mer data using our Seed-and-Wobble algorithm (2). The UniPROBE database hosts the high-resolution DNA binding profiles obtained from PBM experiments on known and predicted TFs (2,3,18–21). The database currently contains DNA binding profiles for many pro- teins not included in similar databases such as JASPAR (14) and TRANSFAC (13), and it offers several tools for searching the database and analyzing user-defined binding profiles or DNA sequences. The resources and analysis tools offered by the UniPROBE database promise to facil- itate previously untenable, downstream genomic analyses, Figure 1. Universal PBM schema. Universal PBMs containing all and we anticipate that it will represent an important geno- possible 10-mers within 60-mer probes are first synthesized as single- mic resource as additional PBM data are compiled. stranded oligonucleotide arrays, to which a common primer is annealed and extended in order to biochemically convert the single-stranded array to a double-stranded DNA (dsDNA) array (these steps are not shown in the figure) (2). The dsDNA array is then bound by protein, DATABASE CONSTRUCTION stained with a fluorophore-conjugated antibody, and scanned; the The UniPROBE database is managed by a MySQL rela- quantified array data are then normalized by the relative amounts of DNA in each spot, and used to calculate k-mer binding data (2). tional database that provides the back-end for user queries PWMs can be calculated either from the k-mer binding data using and facilitates the data retrieval necessary for the site’s our Seed-and-Wobble algorithm (2) or from the 60-mer probe data analysis tools. All HTML pages are dynamically gener- using other motif finding algorithms (34). ated by PHP scripts hosted on an Apache server, and several JavaScript libraries provide interactive interfaces that facilitate site navigation and form accessibility. The site data from various data types; indeed, a given position Apache server also hosts all downloadable data, available weight matrix (PWM) in TRANSFAC frequently is by HTTP connection. derived from binding sequence data compiled from mul- tiple experimental methods, which include lower through- put approaches such as gel retardation (i.e. electrophoretic DATABASE CONTENT mobility shift assays), DNase I footprinting, immuno- The UniPROBE database hosts the results of PBM experi- precipitation, supershift assays and methylation protec- ments, subsequent computational analyses performed tion and higher throughput approaches such as in vitro on these data, and protein annotations. The site currently selection (15) (SELEX). The PAZAR database (16) is a hosts PBM data for over 175 nonredundant proteins from meta-database that contains TF binding site data. It con- a wide range of organisms, including the prokaryote Vibrio tains PWMs from the JASPAR core database and in vivo harveyi, the eukaryotic malarial parasite Plasmodium TF binding site data and cis-regulatory module infor- falciparum, the parasitic Apicomplexan Cryptosporidium mation from various sources, including other databases. parvum, the yeast Saccharomyces cerevisiae, the worm A review of databases of cis-regulatory modules is Caenorhabditis elegans, mouse and human (2,18,20,21). beyond the scope of this article. These data already encompass the majority of mouse The universal PBM technology has several
Recommended publications
  • Locating Mammalian Transcription Factor Binding Sites: a Survey of Computational and Experimental Techniques

    Locating Mammalian Transcription Factor Binding Sites: a Survey of Computational and Experimental Techniques

    Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press Review Locating mammalian transcription factor binding sites: A survey of computational and experimental techniques Laura Elnitski,1,4 Victor X. Jin,2 Peggy J. Farnham,2 and Steven J.M. Jones3 1Genomic Functional Analysis Section, National Human Genome Research Institute, National Institutes of Health, Rockville, Maryland 20878, USA; 2Genome and Biomedical Sciences Facility, University of California–Davis, Davis, California 95616-8645, USA; 3Genome Sciences Centre, British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada V5Z-4E6 Fields such as genomics and systems biology are built on the synergism between computational and experimental techniques. This type of synergism is especially important in accomplishing goals like identifying all functional transcription factor binding sites in vertebrate genomes. Precise detection of these elements is a prerequisite to deciphering the complex regulatory networks that direct tissue specific and lineage specific patterns of gene expression. This review summarizes approaches for in silico, in vitro, and in vivo identification of transcription factor binding sites. A variety of techniques useful for localized- and high-throughput analyses are discussed here, with emphasis on aspects of data generation and verification. [Supplemental material is available online at www.genome.org.] One documented goal of the National Human Genome Research and other regulatory elements is mediated by DNA/protein in- Institute (NHGRI) is the identification of all functional noncod- teractions. Thus, one major step in the characterization of the ing elements in the human genome (ENCODE Project Consor- functional elements of the human genome is the identification tium 2004).
  • Specificity Landscapes Unmask Submaximal Binding Site Preferences of Transcription Factors

    Specificity Landscapes Unmask Submaximal Binding Site Preferences of Transcription Factors

    Specificity landscapes unmask submaximal binding site preferences of transcription factors Devesh Bhimsariaa,b,1, José A. Rodríguez-Martíneza,2, Junkun Panc, Daniel Rostonc, Elif Nihal Korkmazc, Qiang Cuic,3, Parameswaran Ramanathanb, and Aseem Z. Ansaria,d,4 aDepartment of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706; bDepartment of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, WI 53706; cDepartment of Chemistry, University of Wisconsin–Madison, Madison, WI 53706; and dThe Genome Center of Wisconsin, University of Wisconsin–Madison, Madison, WI 53706 Edited by Michael Levine, Princeton University, Princeton, NJ, and approved September 24, 2018 (received for review July 13, 2018) We have developed Differential Specificity and Energy Landscape Here, we report the development of Differential Specificity (DiSEL) analysis to comprehensively compare DNA–protein inter- and Energy Landscapes (DiSEL) to compare experimental actomes (DPIs) obtained by high-throughput experimental plat- platforms, computational methods, and interactomes of TFs, forms and cutting edge computational methods. While high-affinity especially those factors that bind identical consensus motifs. Our DNA binding sites are identified by most methods, DiSEL uncovered results reveal that (i) most high-throughput experimental plat- nuanced sequence preferences displayed by homologous transcription forms reliably identify high-affinity motifs but yield less reliable factors. Pairwise analysis of 726 DPIs uncovered homolog-specific dif- information on submaximal sites; (ii) with few exceptions, com- ferences at moderate- to low-affinity binding sites (submaximal sites). putational methods model DPIs with a focus on high-affinity DiSEL analysis of variants of 41 transcription factors revealed that sites; (iii) submaximal sites improve the annotation of biologi- many disease-causing mutations result in allele-specific changes in cally relevant binding sites across genomes; (iv) among members binding site preferences.
  • DNA Binding Site of the Growth Factor-Inducible Protein Zif268

    DNA Binding Site of the Growth Factor-Inducible Protein Zif268

    Proc. Natl. Acad. Sci. USA Vol. 86, pp. 8737-8741, November 1989 Biochemistry DNA binding site of the growth factor-inducible protein Zif268 ("zinc fingers"/transcription factor/cell growth) BARBARA CHRISTY AND DANIEL NATHANS* Howard Hughes Medical Institute and the Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205 Contributed by Daniel Nathans, August 31, 1989 ABSTRACT Zif268, a zinc finger protein whose mRNA is The recombinant plasmid was transfected into E. coli strain rapidly activated in cells exposed to growth factors or other BL21(DE3) and the transfected cells were grown as described signaling agents, is thought to play a role in regulating the (22) to an optical density of about 0.6 before induction of T7 genetic program induced by extracellular ligands. We report RNA polymerase with 0.4 mM isopropyl B-D-thiogalacto- that Zif268 has one of the characteristics of a transcriptional pyranoside (IPTG). Bacterial extracts used for DNA binding regulator, namely, sequence-specific binding to DNA. Zif268 were prepared as described (23). After solubilization of synthesized in Escherichia coli bound to two sites upstream of proteins with 4 M urea the preparation was dialyzed against the zif268 gene and to sites in the promoter regions of other 20 mM Tris, pH 7.7/50 mM KCI/10 mM MgCl2/i mM genes. The nucleotide sequences responsible for binding were EDTA/10 ,uM ZnSO4/20% (vol/vol) glycerol/1 mM dithio- defined by DNase I footprinting, by methylation interference threitol/0.2 mM phenylmethylsulfonyl fluoride/1 mM so- experiments, and by use of synthetic oligonucleotides.
  • Direct Inhibition of the DNA-Binding Activity of POU Transcription Factors

    Direct Inhibition of the DNA-Binding Activity of POU Transcription Factors

    Published online 25 April 2008 Nucleic Acids Research, 2008, Vol. 36, No. 10 3341–3353 doi:10.1093/nar/gkn208 Direct inhibition of the DNA-binding activity of POU transcription factors Pit-1 and Brn-3 by selective binding of a phenyl-furan-benzimidazole dication Paul Peixoto1,2, Yang Liu3, Sabine Depauw1,2, Marie-Paule Hildebrand1,2,4, David W. Boykin3, Christian Bailly1,2, W. David Wilson3 and Marie-He´ le` ne David-Cordonnier1,2,* Downloaded from https://academic.oup.com/nar/article/36/10/3341/2410535 by guest on 23 September 2021 1INSERM U-837, Team 4-‘Molecular and cellular targeting for cancer treatment’, Jean-Pierre Aubert Research Center, Institut de Recherches sur le Cancer de Lille, Place de Verdun, F-59045 Lille, 2IMPRT-IFR114, Lille, France, 3Department of Chemistry, Georgia State University, Atlanta, GA, USA and 4Institut de Recherches sur le Cancer de Lille - IRCL, Lille, France Received February 5, 2008; Revised and Accepted April 8, 2008 ABSTRACT control of transcription factors activity by those The development of small molecules to control gene compounds. expression could be the spearhead of future- targeted therapeutic approaches in multiple pathol- ogies. Among heterocyclic dications developed with INTRODUCTION this aim, a phenyl-furan-benzimidazole dication The aim of exerting precise control over the expression DB293 binds AT-rich sites as a monomer and level of specified genes using a small molecule drug is an ’ 5 -ATGA sequence as a stacked dimer, both in the objective with major consequences for many therapeutic minor groove. Here, we used a protein/DNA array applications including cancer, chronic inflammatory dis- approach to evaluate the ability of DB293 to specif- orders, neuro-degenerative or cardiovascular diseases ically inhibit transcription factors DNA-binding in a (1–4).
  • Preferences in a Trait Decision Determined by Transcription Factor Variants

    Preferences in a Trait Decision Determined by Transcription Factor Variants

    Preferences in a trait decision determined by transcription factor variants Michael W. Dorritya,b, Josh T. Cuperusa,c, Jolie A. Carlislea, Stanley Fieldsa,c,d,1, and Christine Queitscha,1 aDepartment of Genome Sciences, University of Washington, Seattle, WA 98195; bDepartment of Biology, University of Washington, Seattle, WA 98195; cHoward Hughes Medical Institute, University of Washington, Seattle, WA 98195; and dDepartment of Medicine, University of Washington, Seattle, WA 98195 Contributed by Stanley Fields, June 27, 2018 (sent for review April 9, 2018; reviewed by Leah E. Cowen and Andrew W. Murray) Few mechanisms are known that explain how transcription factors mating, Ste12 can bind at pheromone-responsive genes as a can adjust phenotypic outputs to accommodate differing environ- homodimer or with the cofactors Mcm1 or Matα1 (10–13). The ments. In Saccharomyces cerevisiae, the decision to mate or invade consensus DNA-binding site of Ste12 is TGAAAC, known as the relies on environmental cues that converge on a shared transcription pheromone response element (PRE) (11). For invasion, Ste12 factor, Ste12. Specificity toward invasion occurs via Ste12 binding and its cofactor Tec1 are both required to activate genes that me- cooperatively with the cofactor Tec1. Here, we determine the range diate filamentation (8, 14, 15). Some of these genes contain a of phenotypic outputs (mating vs. invasion) of thousands of DNA- Ste12 binding site near a Tec1 consensus sequence of GAATGT, an binding domain variants in Ste12 to understand how preference for organization for heterodimeric binding known as a filamentation invasion may arise. We find that single amino acid changes in the response element (FRE).
  • A STAT Protein Domain That Determines DNA Sequence Recognition Suggests a Novel DNA-Binding Domain

    A STAT Protein Domain That Determines DNA Sequence Recognition Suggests a Novel DNA-Binding Domain

    Downloaded from genesdev.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press A STAT protein domain that determines DNA sequence recognition suggests a novel DNA-binding domain Curt M. Horvath, Zilong Wen, and James E. Darnell Jr. Laboratory of Molecular Cell Biology, The Rockefeller University, New York, New York 10021 Statl and Stat3 are two members of the ligand-activated transcription factor family that serve the dual functions of signal transducers and activators of transcription. Whereas the two proteins select very similar (not identical) optimum binding sites from random oligonucleotides, differences in their binding affinity were readily apparent with natural STAT-binding sites. To take advantage of these different affinities, chimeric Statl:Stat3 molecules were used to locate the amino acids that could discriminate a general binding site from a specific binding site. The amino acids between residues -400 and -500 of these -750-amino-acid-long proteins determine the DNA-binding site specificity. Mutations within this region result in Stat proteins that are activated normally by tyrosine phosphorylation and that dimerize but have greatly reduced DNA-binding affinities. [Key Words: STAT proteins; DNA binding; site selection] Received January 6, 1995; revised version accepted March 2, 1995. The STAT (signal transducers and activators if transcrip- Whereas oligonucleotides representing these selected se- tion) proteins have the dual purpose of, first, signal trans- quences exhibited slight binding preferences, the con- duction from ligand-activated receptor kinase com- sensus sites overlapped sufficiently to be recognized by plexes, followed by nuclear translocation and DNA bind- both factors. However, by screening different natural ing to activate transcription (Darnell et al.
  • On the Prediction of DNA-Binding Preferences of C2H2-ZF Domains Using Structural Models: Application on Human CTCF

    On the Prediction of DNA-Binding Preferences of C2H2-ZF Domains Using Structural Models: Application on Human CTCF

    bioRxiv preprint doi: https://doi.org/10.1101/859934; this version posted November 29, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. On the prediction of DNA-binding preferences of C2H2-ZF domains using structural models: application on human CTCF. Alberto Meseguer1,*, Filip Årman1,*, Oriol Fornes2, Ruben Molina1, Jaume Bonet3, Narcis Fernandez-Fuentes4,5, Baldo Oliva1. 1 Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain, 2 Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada, 3 Laboratory of Protein Design & Immunoengineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, Lausanne 1015, Vaud, Switzerland, 4 Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Vic 08500, Catalonia, Spain, 5IBERS, Institute of Biological, Environmental and Rural Science, Aberystwyth University U.K *Both authors contributed equally to the paper as first authors. ABSTRACT Cis2-His2 zinc finger (C2H2-ZF) proteins are the largest family of transcription factors in human and higher metazoans. However, the DNA-binding preferences of many members of this family remain unknown. We have developed a computational method to predict these DNA-binding preferences. We combine information from crystal structures composed by C2H2-ZF domains and from bacterial one-hybrid experiments to compute scores for protein-DNA binding based on statistical potentials.
  • A-Activating Factor-1 and AP-1 Cooperation Between Serum

    A-Activating Factor-1 and AP-1 Cooperation Between Serum

    Transcriptional Induction of Matrix Metalloproteinase-9 in the Chondrocyte and Synoviocyte Cells Is Regulated via a Novel Mechanism: Evidence for Functional This information is current as Cooperation between Serum Amyloid of September 25, 2021. A-Activating Factor-1 and AP-1 Alpana Ray, B. Sonny Bal and Bimal K. Ray J Immunol 2005; 175:4039-4048; ; Downloaded from doi: 10.4049/jimmunol.175.6.4039 http://www.jimmunol.org/content/175/6/4039 References This article cites 35 articles, 16 of which you can access for free at: http://www.jimmunol.org/ http://www.jimmunol.org/content/175/6/4039.full#ref-list-1 Why The JI? Submit online. • Rapid Reviews! 30 days* from submission to initial decision • No Triage! Every submission reviewed by practicing scientists by guest on September 25, 2021 • Fast Publication! 4 weeks from acceptance to publication *average Subscription Information about subscribing to The Journal of Immunology is online at: http://jimmunol.org/subscription Permissions Submit copyright permission requests at: http://www.aai.org/About/Publications/JI/copyright.html Email Alerts Receive free email-alerts when new articles cite this article. Sign up at: http://jimmunol.org/alerts The Journal of Immunology is published twice each month by The American Association of Immunologists, Inc., 1451 Rockville Pike, Suite 650, Rockville, MD 20852 Copyright © 2005 by The American Association of Immunologists All rights reserved. Print ISSN: 0022-1767 Online ISSN: 1550-6606. The Journal of Immunology Transcriptional Induction of Matrix Metalloproteinase-9 in the Chondrocyte and Synoviocyte Cells Is Regulated via a Novel Mechanism: Evidence for Functional Cooperation between Serum Amyloid A-Activating Factor-1 and AP-11 Alpana Ray,2*† B.
  • Probing DNA Shape and Methylation State on a Genomic Scale with Dnase I

    Probing DNA Shape and Methylation State on a Genomic Scale with Dnase I

    Probing DNA shape and methylation state on a genomic scale with DNase I Allan Lazarovicia,b, Tianyin Zhouc,1, Anthony Shaferd,1, Ana Carolina Dantas Machadoc,1, Todd R. Rileyb,e, Richard Sandstromd, Peter J. Sabod,YanLuc, Remo Rohsc,2, John A. Stamatoyannopoulosd,2, andHarmenJ.Bussemakerb,e,2 Departments of aElectrical Engineering and bBiological Sciences, Columbia University, New York, NY 10027; cMolecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics and Astronomy, and Computer Science, University of Southern California, Los Angeles, CA 90089; dDepartments of Genome Sciences and Medicine, University of Washington, Seattle, WA 98195; and eCenter for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032 Edited by Steven Henikoff, Fred Hutchinson Cancer Research Center, Seattle, WA, and approved February 13, 2013 (received for review October 15, 2012) DNA binding proteins find their cognate sequences within genomic recognition sequence of a TF could affect its binding affinity DNA through recognition of specific chemical and structural features. (18). However, this putative mechanism leaves various observa- Here we demonstrate that high-resolution DNase I cleavage profiles tions unaccounted for. For instance, some TFs interact with the can provide detailed information about the shape and chemical major groove, yet are not affected by DNA methylation despite modification status of genomic DNA. Analyzing millions of DNA the extra methyl group protruding in the major groove. Other backbone hydrolysis events on naked genomic DNA, we show that TFs interact with the minor groove, yet are affected by DNA the intrinsic rate of cleavage by DNase I closely tracks the width of methylation.
  • Predicting Transcription Factor Binding Sites Using Structural Knowledge

    Predicting Transcription Factor Binding Sites Using Structural Knowledge

    Predicting Transcription Factor Binding Sites Using Structural Knowledge Tommy Kaplan1,2, Nir Friedman1,, and Hanah Margalit2,∗ 1 School of Computer Science, The Hebrew University, Jerusalem 91904, Israel {tommy, nir}@cs.huji.ac.il 2 Dept. of Molecular Genetics and Biotechnology, Hadassah Medical School, The Hebrew University, Jerusalem 91120, Israel [email protected] Abstract. Current approaches for identification and detection of tran- scription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid-nucleotide recognition preferences. These are used to predict bind- ing sites for novel transcription factors from the same structural family. We apply our approach to the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with various experimental results. To demonstrate the potential of our al- gorithm, we use the learned preferences to predict binding site models for novel proteins from the same family. These models are then used in genomic scans to find putative binding sites of the novel proteins. 1 Introduction Specific binding of transcription factors to cis-regulatory elements is a crucial component of transcriptional regulation. Previous studies have used both exper- imental and computational approaches to determine the relationships between transcription factors and their targets. In particular, probabilistic models were employed to characterize the binding preferences of transcription factors, and to identify their putative sites in genomic sequences [24, 27]. This approach is useful when massive binding data are available, but it cannot be applied to proteins without extensive experimental binding studies.
  • Fos-Jun Interactions That Mediate Transcription Regulatory Speci®City

    Fos-Jun Interactions That Mediate Transcription Regulatory Speci®City

    Oncogene (2001) 20, 2438 ± 2452 ã 2001 Nature Publishing Group All rights reserved 0950 ± 9232/01 $15.00 www.nature.com/onc Close encounters of many kinds: Fos-Jun interactions that mediate transcription regulatory speci®city Yurii Chinenov1 and Tom K Kerppola*,1 1Howard Hughes Medical Institute, University of Michigan Medical School Ann Arbor, Michigan, MI 48109-0650, USA Fos and Jun family proteins regulate the expression of a Wang et al., 1992). However, only a few of the genes myriad of genes in a variety of tissues and cell types. that mediate the essential functions of speci®c Fos and This functional versatility emerges from their interac- Jun family members have been identi®ed (Bakin and tions with related bZIP proteins and with structurally Curran, 1999; Fu et al., 2000) unrelated transcription factors. These interactions at Fos and Jun family proteins function as dimeric composite regulatory elements produce nucleoprotein transcription factors that bind to AP-1 regulatory complexes with high sequence-speci®city and regulatory elements in the promoter and enhancer regions of selectivity. Several general principles including binding numerous mammalian genes (Curran and Franza, cooperativity and conformational adaptability have 1988). Jun proteins form both homodimers and emerged from studies of regulatory complexes containing heterodimers with Fos proteins, whereas Fos proteins Fos-Jun family proteins. The structural properties of do not form homodimers and require heterodimeriza- Fos-Jun family proteins including opposite orientations tion to bind DNA. The DNA-binding and dimerization of heterodimer binding and the ability to bend DNA can domains among dierent family members are highly contribute to the assembly and functions of such conserved and dierent members of the Fos and Jun complexes.
  • Identification of a Consensus DNA-Binding Site for The

    Identification of a Consensus DNA-Binding Site for The

    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Biochemistry -- Faculty Publications Biochemistry, Department of 2008 Identification of a Consensus DNA-Binding Site for the Arabidopsis thaliana SBP Domain Transcription Factor, AtSPL14, and Binding Kinetics by Surface Plasmon Resonance Xinwen Liang University of Nebraska - Lincoln, [email protected] Tara J. Nazarenus University of Nebraska-Lincoln, [email protected] Julie M. Stone University of Nebraska-Lincoln, [email protected] Follow this and additional works at: https://digitalcommons.unl.edu/biochemfacpub Part of the Biochemistry, Biophysics, and Structural Biology Commons Liang, Xinwen; Nazarenus, Tara J.; and Stone, Julie M., "Identification of a Consensus DNA-Binding Site for the Arabidopsis thaliana SBP Domain Transcription Factor, AtSPL14, and Binding Kinetics by Surface Plasmon Resonance" (2008). Biochemistry -- Faculty Publications. 84. https://digitalcommons.unl.edu/biochemfacpub/84 This Article is brought to you for free and open access by the Biochemistry, Department of at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Biochemistry -- Faculty Publications by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Published in Biochemistry 47:12 (2008), pp 3645–3653; doi: 10.1021/bi701431y Copyright © 2008 American Chemical Society. Used by permission. Published online February 27, 2008. Identification of a Consensus DNA-Binding Site for the Arabidopsis thaliana SBP Domain Transcription