Non-Sequence-Specific Nucleic Acid Interactome

Dürnberger et al. Genome Biology 2013, 14:R81
http://genomebiology.com/2013/14/7/R81

RESEARCH Open Access

Experimental characterization of the human non-sequence-specific nucleic acid interactome

Gerhard Dürnberger1,2, Tilmann Bürckstümmer1,3, Kilian Huber1, Roberto Giambruno1, Tobias Doerks4, Evren Karayel1, Thomas R Burkard1,5, Ines Kaupe1,6, André C Müller1, Andreas Schönegger1, Gerhard F Ecker7, Hans Lohninger8, Peer Bork4, Keiryn L Bennett1, Giulio Superti-Furga1* and Jacques Colinge1*

* Correspondence: [email protected]; [email protected]
1 CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH-BT 25.3, 1090 Vienna, Austria
Full list of author information is available at the end of the article

© 2013 Dürnberger et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: The interactions between proteins and nucleic acids have a fundamental function in many biological processes, including gene transcription, RNA homeostasis, protein translation and pathogen sensing for innate immunity. While our knowledge of the ensemble of proteins that bind individual mRNAs in mammalian cells has been greatly augmented by recent surveys, no systematic study on the non-sequence-specific engagement of native human proteins with various types of nucleic acids has been reported.

Results: We designed an experimental approach to achieve broad coverage of the non-sequence-specific RNA and DNA binding space, including methylated cytosine, and tested for interaction potential with the human proteome. We used 25 rationally designed nucleic acid probes in an affinity purification mass spectrometry and bioinformatics workflow to identify proteins from whole cell extracts of three different human cell lines. The proteins were profiled for their binding preferences to the different general types of nucleic acids. The study identified 746 high-confidence direct binders, 139 of which were novel and 237 devoid of previous experimental evidence. We could assign specific affinities for sub-types of nucleic acid probes to 219 distinct proteins and individual domains. The evolutionarily conserved protein YB-1, previously associated with cancer and drug resistance, was shown to bind methylated cytosine preferentially, potentially conferring upon YB-1 an epigenetics-related function.

Conclusions: The dataset described here represents a rich resource of experimentally determined nucleic acid-binding proteins, and our methodology has great potential for further exploration of the interface between the protein and nucleic acid realms.

Background

Interactions between proteins and nucleic acids play a pivotal role in a wide variety of essential biological processes, such as transcription, translation, splicing, or chromatin remodeling, defects in which can cause multiple diseases [1]. Transcription factors that recognize specific DNA motifs constitute only part of the nucleic acid-binding proteins (NABPs), which also include less sequence-specific interactors.

The global identification of sequence-specific NABPs has so far been achieved through various approaches, such as chromatin immunoprecipitation (ChIP) in combination with either microarrays (ChIP-chip) [2-5] or sequencing technology (ChIP-seq) [6-8] as well as protein-binding microarrays [9] and protein arrays [10]. The rapid development of current proteomic technologies has opened new avenues for performing unbiased proteome-wide investigations of NABPs by affinity purification. An in-depth screen of the yeast chromatin interactome [11] was performed by applying the modified chromatin immunopurification (mChIP) approach [12], revealing several multi-protein chromatin complexes. Other researchers have employed mass spectrometry (MS) approaches to study specific aspects of protein-nucleic acid interactions. For instance, Mann and colleagues [13] demonstrated the power of such techniques by identifying interactors of functional DNA elements. Using synthetic DNA oligonucleotides, DNA sequence-specific-binding proteins and proteins that preferably interact with CpG islands were found. The same group subsequently adapted this method to RNA elements [14]. Recently, mRNA-binding proteins were surveyed by covalent UV crosslinking and affinity purification followed by MS analysis in HeLa cells [15]. This work identified 860 high confidence mRNA-protein interactions including 315 proteins not known before to bind mRNA, thereby illustrating the power of such approaches. The dataset provided new insight into the structural properties of mRNA-binding proteins, such as being enriched for short repetitive amino acid motifs and highly intrinsically disordered.

In this study, we present the first large-scale effort to map human NABPs with generic classes of nucleic acids. Using synthetic DNA and RNA oligonucleotides as baits and affinity purification (AP)-MS methods we previously applied to unravel new immune sensors of pathogen-derived nucleic acids [16,17], we performed pulldown experiments in three cell lines that yielded greater than 10,000 protein-nucleic acid interactions involving more than 900 proteins. Analysis of this rich dataset allowed us to identify 139 new high confidence NABPs, to provide experimental evidence for another 98 proteins whose NABP status had only been inferred computationally, and to determine the significant preferential affinity of 219 NABPs for different subtypes of nucleic acids, thereby complementing existing knowledge greatly. The dataset we obtained provides many entry points for further investigations, which we illustrate by proposing new functions for already characterized as well as uncharacterized proteins and domains. All the interaction data are available to the research community.

Results and discussion

Bait design

The diversity of all possible nucleic acid sequences that can be present in a human cell is virtually infinite and, to reduce the complexity for a general mapping of protein-nucleic acid interactions, we decided to design generic nucleic acids as baits that would capture essential differences between nucleotides. We opted for the synthesis of baits containing all possible dinucleotide combinations comprising single-stranded RNA (ssRNA), single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) (Figure 1a). The use of synthetic oligonucleotides allowed us to control bait sequences and concentrations. All the baits were 30 nucleotides in length and contained two nucleotides only in a one-to-one ratio. The choice of the actual dinucleotide pattern resulted from a maximization of the minimum free energy across all possible [...] nucleotides to consider. To identify proteins binding to epigenetic modifications, we synthesized additional cytosine-methylated analogues of the CG-DNA oligonucleotides. Furthermore, we included several mononucleotide oligos and an ssDNA oligo with random nucleotide composition. The final set of baits comprised 25 oligonucleotides (Supplementary Table S1 in Additional file 1) and the symmetric experimental design (Figure 1a) guaranteed that differential binding of the interacting proteins would be solely due to differences in nucleotide composition.

To increase the coverage of the human proteome, we performed the AP-MS experiments with whole cell lysates from cell lines derived from the three germ layers: U937 (lymphoma, mesoderm), HepG2 (liver carcinoma, endoderm), and HaCat (keratinocyte, ectoderm). To identify proteins that would bind to the streptavidin matrix (but not to the baits) we performed affinity purifications using the uncoupled matrix with each cell lysate. In total, we analyzed 78 biological samples. The synthetic oligonucleotides were coupled to a matrix by a 5' biotin moiety and used to purify NABPs from the biological samples and the enriched proteins were subsequently identified by MS (Figure 1a).

Protein identification and filtering

Altogether, the analysis of the 78 pulldown samples yielded 10,810 protein identifications; that is, on average, 140 proteins per bait, involving 952 distinct proteins. These results were obtained by imposing a stringent protein group false discovery rate of 1% (Materials and methods). To measure the achieved enrichment for NABPs, we compared whole cell lysate proteomes acquired with the same MS technology, which we named core proteomes and published previously [19], with the enriched samples. We found that an average of 21% of proteins in the core proteomes were annotated as NABPs in Gene Ontology (GO) [20], and in the enriched samples this proportion increased to more than 70% (Figure 1b). Among the known NABPs identified in the affinity purifications, 154 were not identified in the core proteomes, indicating that our experimental approach is not limited to rather abundant proteins. Conversely, 252 out of 581 known NABPs observed in the core proteomes were not identified in the pulldowns, thereby suggesting that these NABPs recognize sequence-specific nucleic acids or patterns not present among the baits (Figure 1c). With respect to transcription factors, the purification protocol provided a modest enrichment over the core proteomes
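The counts quoted in the bait design and protein identification paragraphs can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration: it assumes that the 78 samples consist of one pulldown per bait and cell line plus one uncoupled-matrix control per cell line, a split that is inferred from the text and not stated explicitly by the authors. All variable and function names are hypothetical.

```python
"""Consistency check of the sample and protein counts quoted in the text."""

# Values quoted in the text.
N_BAITS = 25                     # synthetic oligonucleotide baits
N_CELL_LINES = 3                 # U937, HepG2, HaCat
TOTAL_SAMPLES_REPORTED = 78      # "we analyzed 78 biological samples"
PROTEIN_IDENTIFICATIONS = 10_810
NEW_NABPS = 139                  # new high confidence NABPs
COMPUTATIONAL_ONLY_NABPS = 98    # previously inferred computationally only
CORE_KNOWN_NABPS = 581           # known NABPs seen in the core proteomes
CORE_KNOWN_NABPS_MISSED = 252    # of those, not identified in the pulldowns


def summarize() -> dict:
    """Recompute derived figures from the quoted counts."""
    # Assumption (not stated explicitly in the text): one pulldown per bait
    # and cell line, plus one uncoupled-matrix control per cell line.
    pulldowns = N_BAITS * N_CELL_LINES
    controls = N_CELL_LINES
    return {
        "total_samples": pulldowns + controls,
        "mean_identifications_per_sample":
            PROTEIN_IDENTIFICATIONS / TOTAL_SAMPLES_REPORTED,
        "no_prior_experimental_evidence":
            NEW_NABPS + COMPUTATIONAL_ONLY_NABPS,
        "fraction_core_nabps_missed":
            CORE_KNOWN_NABPS_MISSED / CORE_KNOWN_NABPS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        if isinstance(value, float):
            print(f"{name}: {value:.3f}")
        else:
            print(f"{name}: {value}")
```

Under that assumption the sample count matches the reported 78, the mean comes to roughly 139 identifications per sample (the text rounds this to 140), 139 + 98 gives the 237 proteins quoted in the abstract as devoid of previous experimental evidence, and about 43% of the known NABPs seen in the core proteomes were not recovered in the pulldowns.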
