Comparative analysis of commonly used peak calling programs for ChIP-Seq analysis

Original article

Hyeongrin Jeon1, Hyunji Lee1, Byunghee Kang1, Insoon Jang1, Tae-Young Roh1,2,3*

1Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, Korea
2Division of Integrative Biosciences and Biotechnology, Pohang University of Science and Technology (POSTECH), Pohang 37673, Korea
3SysGenLab Inc., Pohang 37613, Korea

*Corresponding author: E-mail: [email protected]

Genomics Inform 2020;18(4):e42
eISSN 2234-0742
https://doi.org/10.5808/GI.2020.18.4.e42

Received: October 6, 2020
Revised: October 26, 2020
Accepted: November 22, 2020

Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-Seq) is a powerful technology to profile the location of proteins of interest on a whole-genome scale. To identify the enrichment location of proteins, many programs and algorithms have been proposed. However, none of the commonly used peak calling programs could accurately explain the binding features of target proteins detected by ChIP-Seq. Here, publicly available data on 12 histone modifications, including H3K4ac/me1/me2/me3, H3K9ac/me3, H3K27ac/me3, H3K36me3, H3K56ac, and H3K79me1/me2, generated from a human embryonic stem cell line (H1), were profiled with five peak callers (CisGenome, MACS1, MACS2, PeakSeq, and SISSRs). The performance of the peak calling programs was compared in terms of reproducibility between replicates, examination of enriched regions to variable sequencing depths, the specificity-to-noise signal, and sensitivity of peak prediction. There were no major differences among peak callers when analyzing point source histone modifications.
The peak calling results from histone modifications with low fidelity, such as H3K4ac, H3K56ac, and H3K79me1/me2, showed low performance in all parameters, which indicates that their peak positions might not be located accurately. Our comparative results could provide a helpful guide to choose a suitable peak calling program for specific histone modifications.

Keywords: ChIP-Seq, histone modification, human embryonic stem cell, peak calling program

2020, Korea Genome Organization
This is an open-access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Protein-binding regions in the context of chromatin have been detected by the chromatin immunoprecipitation (ChIP) method. Since the first ChIP coupled with high-throughput DNA sequencing (ChIP-Seq) technology for histone modification mapping was introduced with the combination of ChIP and next-generation sequencing, a large amount of ChIP-Seq data has been produced at the genome level, and the development of data analysis tools should thus be emphasized [1-3].

The basic building block of chromatin, the nucleosome, consists of 146 base pairs (bp) of DNA and a histone octamer composed of four core histones: H2A, H2B, H3, and H4. Post-translational modifications of histone tails play an important role in the epigenetic regulation of genome activity. These modifications include acetylation, methylation, phosphorylation, and ubiquitination. Depending on the types of histone modifications and binding sites, different enrichment patterns and related biological effects are expected. For example, acetylated histones provide a chromatin environment easily accessible to the transcriptional machinery by changing the chromatin conformation. Some histone methylations, such as H3K4me2 and H3K4me3, are mostly located on promoters, whereas H3K36me3 is predominantly found on the gene bodies of transcriptionally active genes [4,5].

The Encyclopedia of DNA Elements (ENCODE) Consortium, aiming at the identification of all functional elements in the human genome, proposed a guideline for categorizing protein-bound regions occupied by point source factors, broad source factors, and mixed source factors [6].

The distribution patterns of ChIP-Seq data on the genome have been analyzed using many different software programs with specific algorithms, which use different strategies for searching potential binding regions, judging the peaks, and calculating significance [7-10]. Most previous studies have focused on detecting the enriched peaks, and several groups have already evaluated peak calling programs [11-16]. Although most previous studies compared the performance of each program for analyzing transcription factor binding patterns, some tested histone modifications, including H3K4me3, H3K9me3, H3K27me3, and H3K36me3 [11,12,14]. However, the performance evaluation of ChIP-Seq analysis programs needs to be more extensively examined to understand the nature of enrichment of various types of histone modifications. Herein, we tested ChIP-Seq data from 12 histone modifications covering three source types with five peak calling programs (CisGenome, MACS1, MACS2, PeakSeq, and SISSRs).

Methods

Data filtering and cross-correlation analysis

The ChIP-Seq datasets of 12 histone modification types, input, and RNA-sequencing of human embryonic stem cell line (H1) were downloaded from the NIH Roadmap Epigenomics Project Gene Expression Omnibus (GEO) repository (http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/) (Supplementary Table 1). The downloaded SRA format files were converted to the FASTQ format via fastq-dump in SRA Toolkit (version 2.4.5). Raw sequencing reads were filtered by fastq_quality_filter (FASTX-Toolkit version 0.0.13.2) with the following options (-p 80, -q 20, and -Q33). High-quality reads were mapped to the human genome (hg19) using Bowtie (version 1.1.1) with the default options (-n 2, -e 70, -l 28, -I 0, -X 250, and -maxbts 250) [17].

To evaluate the signal-to-noise ratio of a ChIP-Seq experiment, strand cross-correlation analysis was performed using the SPP program with the default options (-s -100:5:600, and -x 10), considering two metrics: (1) the normalized strand coefficient, which quantifies the fragment length cross-correlation over the background cross-correlation rate, and (2) the relative strand correlation, which calculates the ratio of cross-correlation observed at the predicted fragment size against the artifactual cross-correlation observed at the read length [18].

Identification of regions enriched with specific histone modifications

To detect peaks, CisGenome (version 2.0), MACS1 (version 1.4.2), MACS2 (version 2.1.0), PeakSeq (version 1.31), and SISSRs (version 1.4) were used with the default options and recommended parameters for a direct comparison without any optimization (Supplementary Table 2). For CisGenome, the Bowtie-format output files were converted into the aln format and the seqpeak command was used. For MACS1, the options of -p 1e-5, -m 10:30, and --keep-dup 1 were used, and for MACS2, the default options (-q 0.01, -m 5:50, and --keep-dup 1) were applied. In MACS2, the broad options (-q 0.1, -m 5:50, and --keep-dup 1) were also used for the broad source peaks. The signal map was prepared from the Bowtie output using the PeakSeq -preprocess command. During the step of PeakSeq -peak_selection, the default options were used, such as Enrichment_mapped_fragment_length 200, target_FDR 0.05, N_Simulations 10, Minimum_interpeak_distance 200, and max_Qvalue 0.05. SISSRs detected peaks with the recommended options (-F 0.001, -e 10, -p 0.001, -m 0.8, -w 20, -E 2, and -L 500).

All peaks in each set were ranked by the following guidelines: CisGenome and PeakSeq, pre-sorted peak lists; MACS1 and MACS2, sorted by the significance level (10 × −log10(p-value)) and then by the fold enrichment; SISSRs, ranked by the fold enrichment and by the significance level (p-value). Frequently detected false positive peaks, regardless of cell line or experiment (called the ENCODE blacklist), were removed for quality control of peaks [19,20].

Comparison of peak calling performance

The coincidence of peak positions obtained by the individual programs was examined using the intersectBed and multiIntersectBed functions (BEDTools version 2.23.0) with a minimum overlapping size of 1 bp [21]. Pearson correlation coefficients based on peak ranks between overlapped peaks were calculated, because the peak rank represents the order of importance according to algorithm characteristics. For the multiple comparison analyses of each histone mark, we used multiIntersectBed in BEDTools. The multiIntersectBed function provided a comparison among the multiple files.

The Jaccard similarity coefficients (or index J) were calculated for the measurement of variability: J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are sets of enriched regions in base pairs identified by peak calling programs. Irreproducibility discovery rate (IDR) analysis with all replicates was performed using the recommended parameters (peak.half.width -1, min.overlap.ratio 0, is.broadpeak F, and ranking.measure p.value for MACS1 and MACS2; q.value for CisGenome and PeakSeq; signal.value for SISSRs) [22]. For the specificity test, the control sequence reads were mixed with the original ChIP-Seq data and then the performance was computed. At a different sequencing read depth, the genomic coverage of the [...]

[...] the shortest peaks. The concordance or co-occupancy of peak regions identified from two different callers was calculated at the same genomic loci. The peaks from H3K4me2, H3K4me3, H3K9ac, H3K27me3, and H3K36me3 varied in length. As a representative example, the number of peaks enriched with H3K4me3, a typical narrow source mark, ranged from 24,000 to 37,000 and its enrichment profile was very similar at promoters of actively transcribed genes with all peak callers (Fig. 2A). The peak positional variability was highly dependent on the histone mark type. Histone marks such as H3K4me2, H3K4me3, H3K27ac, and H3K9ac, which are associated with transcriptional activation, showed a high level of concordance.
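
The Jaccard index J(A, B) = |A ∩ B| / |A ∪ B| used in the Methods is measured in base pairs, so overlapping peaks within one caller's output must be merged before counting. The following is a minimal Python sketch of that calculation for a single chromosome, not the authors' BEDTools-based procedure. The function names and the half-open (start, end) interval representation are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def intersection_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs covered by both sets, found with a two-pointer sweep."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard_bp(a: List[Interval], b: List[Interval]) -> float:
    """J(A, B) = |A ∩ B| / |A ∪ B|, measured in base pairs."""
    inter = intersection_bp(a, b)
    union = covered_bp(a) + covered_bp(b) - inter
    return inter / union if union else 0.0


# Hypothetical peak sets from two callers on the same chromosome.
caller_a = [(100, 200), (300, 400)]
caller_b = [(150, 250), (300, 350)]
print(jaccard_bp(caller_a, caller_b))
```

In this toy case the two callers share 100 bp out of a 250 bp union. A genome-wide value would sum the intersection and union over all chromosomes before dividing.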
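
The ranking rule described in the Methods for MACS1 and MACS2 (significance first, then fold enrichment) can be sketched as below. This assumes the significance level is the usual MACS score of −10 × log10(p-value), which is how the garbled expression in the extracted text is read here. The `Peak` class and its field names are hypothetical and do not correspond to a MACS output parser.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Peak:
    """Hypothetical minimal peak record; field names are illustrative."""
    name: str
    p_value: float
    fold_enrichment: float


def significance(p_value: float) -> float:
    """Assumed MACS-style score: -10 * log10(p-value)."""
    return -10.0 * math.log10(p_value)


def rank_peaks(peaks: List[Peak]) -> List[Peak]:
    """Sort by significance (descending), then fold enrichment (descending)."""
    return sorted(
        peaks,
        key=lambda p: (significance(p.p_value), p.fold_enrichment),
        reverse=True,
    )


# Hypothetical peaks: two tie on p-value and are separated by fold enrichment.
peaks = [
    Peak("peak_1", 1e-5, 4.0),
    Peak("peak_2", 1e-8, 3.0),
    Peak("peak_3", 1e-5, 6.5),
]
for rank, peak in enumerate(rank_peaks(peaks), start=1):
    print(rank, peak.name)
```

Ranks assigned this way are what a rank-based Pearson correlation between overlapping peaks from two callers would compare.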