Empowering the Detection of Chip-Seq “Basic Peaks” (Bpeaks) In
Total Page:16
File Type:pdf, Size:1020Kb
Denecker and Lelandais BMC Res Notes (2018) 11:698 https://doi.org/10.1186/s13104-018-3802-y BMC Research Notes RESEARCH NOTE Open Access Empowering the detection of ChIP‑seq “basic peaks” (bPeaks) in small eukaryotic genomes with a web user‑interactive interface Thomas Denecker* and Gaëlle Lelandais Abstract Objective: bPeaks is a peak calling program to detect protein DNA-binding sites from ChIPseq data in small eukary- otic genomes. The simplicity of the bPeaks method is well appreciated by users, but its use via an R package is challenging and time-consuming for people without programming skills. In addition, user feedback has highlighted the lack of a convenient way to carefully explore bPeaks result fles. In this context, the development of a web user interface represents an important added value for expanding the bPeaks user community. Results: We developed a new bPeaks application (bPeaks App). The application allows the user to perform all the peak-calling analysis steps with bPeaks in a few mouse clicks via a web browser. We added new features relative to the original R package, particularly the possibility to import personal annotation fles to compare the location of the detected peaks with specifc genomic elements of interest of the user, in any organism, and a new organization of the result fles which are directly manageable via a user-interactive genome browser. This signifcantly improves the ability of the user to explore all detected basic peaks in detail. Keywords: ChIP-seq, Peak calling, Protein DNA-binding sites, Small eukaryotic genomes, bPeaks Introduction [5]. Te general idea was to take advantage of sim- ChIP-seq, i.e. chromatin immunoprecipitation sequenc- pler peak calling for species with small genome sizes ing, is an experimental approach to analyze protein (< 20 Mb). Te program bPeaks thus performs an explo- interactions with DNA [1]. Peak detection (also referred ration of ChIP-seq results at the nucleotide scale. It uses as “peak calling”) consists of identifying all the genomic a sliding window, which compares the read distributions regions in which a signifcant enrichment of DNA between the immunoprecipitation (IP) sample and a con- sequences (or reads) in a ChIP sample is observed com- trol sample. We implemented the bPeaks program with pared to a control sample. Tese regions are expected to the R language and the associated package is available represent DNA-binding sites for the studied protein [2]. at the CRAN website [6]. Since its original publication, A considerable number of peak calling software packages the bPeaks R package has been downloaded more than have been developed (for instance [3, 4], etc.) and choos- 11,360 times (July 2018) and successfully used to identify ing the appropriate software, optimized for a specifc bio- DNA-binding sites for diferent proteins in several yeast logical system of interest, is a prerequisite for successful species [7–10]. ChIP-seq data interpretation. Our colleagues, essentially experimental biologists, In this context, we proposed a methodology to iden- appreciate the simplicity of the bPeaks methodology. tify “basic Peaks” (bPeaks) in small eukaryotic genomes Te program uses a combination of only four thresh- olds to mimic “good peak” properties, as described by investigators who visually inspect ChIP-seq results on a *Correspondence: thomas.denecker@u‑psud.fr Institut de Biologie Intégrative de la Cellule (I2BC), Centre National de la genome browser [5]. However, they highlighted several Recherche Scientifque: UMR9198, Université Paris-Saclay, Université Paris- difculties. First, working with an R program is a chal- Sud - Paris 11, 11 ‑ Bâtiment 400, Orsay, France lenging task for people with only limited bioinformatic © The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Denecker and Lelandais BMC Res Notes (2018) 11:698 Page 2 of 7 skills. Initial installation of the necessary software and (i) evaluate the overall quality of the analyzed ChIP-seq libraries, importing of the ChIP-seq data in R, and run- data (Lorenz curve and PBC calculation), (ii) facilitate ning a bPeaks search can be excessively difcult for sim- the exploration of output fles (with a user-interactive ply technical reasons and may thus be an obstacle to the genome browser), and (iii) upload annotation fles to in-depth analysis of the results. Also, bPeaks generates a compare the location of the detected peaks with particu- large number of output fles (several dozen), which are lar genomic elements of interest to the user, in any organ- all automatically written and stored in a single operating ism. bPeaks App is an open source program available on system (OS) folder. Tese fles were meant to be helpful Github [11]. It has the advantage that it can be deployed for further investigation (for example, detection of regu- locally (on a personal workstation) or on a server. Here, latory motifs), but feedback from users brought to our we present the technical solutions that were chosen and attention the need for their better organization and docu- explain the main functionalities of the bPeaks App. mentation. Finally, the bPeaks R package only comprises pre-registered annotations of genes for yeast species Main text because our group’s research activities are focused on Methods functional genomics in yeast. Tis is an important limita- General overview tion for researchers interested in other organisms. Te bPeaks App is a web application which uses the We developed a new application to overcome these lim- original bPeaks R package [6]. Te backend of the appli- itations. Referred hereafter as the “bPeaks App” (bPeaks cation is based on three mainstream open source tech- application), it is used via a web browser and makes it nologies: Github [12], Docker [13], and PostgresSQL possible to perform all the analysis steps required to [14] (see Fig. 1a). Te frontend solutions of the appli- identify protein DNA-binding sites with ChIP-seq results cation were chosen to provide users a particularly in a few mouse clicks. We developed new functionalities easy-to-use experience using Shiny, the Web Applica- in the bPeaks App, relative to the original R package, to tion Framework for R [15]. Te Plotly R package [16] ab Fig. 1 Technical solutions to develop the bPeaks App. a Open source softwares used to develop the application are shown here. It is divided into three sections: (1) development and test (in blue), (2) database storage (in red), and (3) fnal production (in green). b Tables defned in the PostgreSQL database that highlight the functioning of the bPeaks App Denecker and Lelandais BMC Res Notes (2018) 11:698 Page 3 of 7 was used to obtain dynamic graphical representa- Criterion to evaluate ChIP‑seq data quality tions, together with Google chart [17]. Te application Lorenz curves and PCR Bottleneck Coefcients (PBCs) requires a database to control user access. Te solu- are classical criteria to evaluate ChIP-seq data quality tion proposed by Shiny requires payment. Tus, we [23] and were both implemented into the bPeaks App. preferred another approach, based on a PostgreSQL Details of the calculations are presented in Additional database. Put very simply, the database is comprised fle 1. of three tables: one to manage user information, one to manage an information dashboard and one for gene Results annotations (Fig. 1b). Te connection between R and bPeaks App PostgreSQL was accomplished using RPostgreSQL [18] With the new bPeaks App, our aim was to (1) simplify and the protection of user passwords achieved with the the use of the bPeaks peak calling method for those pgcrypto extension. with no bioinformatics skills, (2) add several ChIP-seq data representations to assess the overall quality of the initial experiment, and (3) facilitate the exploration of Strategy for versioning the application peak calling results. Also, we wanted to guarantee the Two complementary axes were considered to ensure reproducibility of any results obtained with bPeaks App, appropriate versioning of the bPeaks App: (i) R package systematically tracing all the analysis steps and computa- dependencies and (ii) OS dependencies. Te package tional tool versions. We decided to divide the application manager packrat [19] was used to precisely follow the lat- into two parts referred hereafter as “bPeaks analyzer” est versions of all the packages used for development of and “bPeaks explorer”. bPeaks analyzer focuses solely the bPeaks App. It saves libraries locally and generates a on the peak calling step. Output fles are automatically packrat.lock fle. Tis fle lists the detailed package ver- renamed and reorganized in diferent OS folders. Tese sions that were used, including all dependencies. Te fles represent the starting point for the bPeaks explorer bPeaks App was built on a containerization paradigm part, which allows dynamic and user-interactive visuali- (see Fig. 1a) with Docker, since R software is also depend- zation of the detected peaks, as well as peak localization ent on the OS. Our objective was to entirely pack the relative to particular genomic elements (coding or pro- application and its dependencies in a virtual container. moter sequences, DNA repeated regions, etc.). All these Tus, it was possible to build images that contain eve- features in bPeaks explorer are novel compared to previ- rything required for the bPeaks App to function.