Application Note

Integrated analysis of RNA-Seq and ChIP-Seq data using Strand NGS to understand the Regulation of Cardiogenesis

Mohammed Toufiq, Sumeet Deshmukh, Srikanthi Ramachandrula, Sunil C. Cherukuri
Strand Life Sciences Pvt Ltd (www.strand-ngs.com)

Overview
Congenital heart disease affects 1% of all human births in the West and constitutes a major burden on public health organizations. Numerous next-generation sequencing studies combine multiple technologies to answer complex biological questions. Here, using Strand NGS bioinformatics software version 3.0 and above, we provide an illustrative example of how to integrate data from different sources (RNA-Seq and ChIP-Seq) in a combined analysis that identifies differential expression profiles and infers transcription factor binding sites, in order to define the regulation of Cardiogenesis. Transcription factors such as NKX2-5 and MEIS1 have been shown to play a critical role in vertebrate heart development. Identifying the expression and targets of these factors, along with their regulatory interactions, will be a major step towards understanding the broader processes of cardiac development.

As a part of this study, we re-analyzed the publicly available dataset GSE44576. These RNA-Seq and ChIP-Seq data were generated on the Illumina platform to investigate the expression profiles and genome-wide binding sites of transcription factors. Consistent with the conclusions of the original publication¹, we identified the mechanism of transcriptional regulation during cardiac differentiation: successive binding of the two homeodomain transcription factors NKX2-5 and MEIS1 to the Popdc2 enhancer. ChIP-Seq data analysis helped in finding the genomic binding regions of NKX2-5 and MEIS1, while RNA-Seq data analysis aided in finding their impact on the expression of other genes. In addition to these results, we also built an architecture of pathway and NLP (natural language processing) networks. Pathway Analysis showed that NKX2-5 plays a pivotal role in cardiogenesis and that any change in its levels has multiplying effects on the heart development pathway. NLP was used to build a network of plausible gene interactions mined from an interaction database and from a proprietary database constructed by Strand using all PubMed abstracts. Further, NLP showed that most proteins that interact with NKX2-5 are either enhancers or growth factors, which might explain the mechanism of its heightened impact on heart development. We intend to showcase Strand NGS as a go-to tool for analyzing and interpreting multi-omics data, with intuitive workflows and user-friendly features.

Figure 1: RNA-Seq data and ChIP-Seq data analysis workflow used for studying the regulation of Cardiogenesis.

Datasets
RNA-Seq and ChIP-Seq datasets from mouse (Mus musculus, mm9 build) were obtained from the NCBI GEO database [GSE44576]². The RNA-Seq data are paired-end and the ChIP-Seq data are single-end, both generated on the Illumina platform (See Table 1 and Table 2). Wild Type (WT) refers to the form of a gene that prevails among individuals of the natural population, whereas a hypomorph is a mutant condition in which the altered gene product has a reduced level of activity (a partial loss of function).

Table 1: RNA-Seq experiment information
Samples      Tissue
SRR748961    E11.5 heart, Wild Type 1
SRR748962    E11.5 heart, Wild Type 2
SRR748963    E11.5 heart, Wild Type 3
SRR748964    E11.5 heart, Hypomorph 1
SRR748965    E11.5 heart, Hypomorph 2
SRR748966    E11.5 heart, Hypomorph 3

Table 2: ChIP-Seq experiment information
Samples      Tissue
SRR748967    E11.5 heart, Input
SRR748968    E11.5 heart, Nkx2-5 ChIP (S1)
SRR748969    E11.5 heart, Nkx2-5 ChIP (S4)

Data analysis methodology
All analyses reported in this study were performed using Strand NGS bioinformatics software version 3.0 and above (See Figure 1). Before proceeding with the alignment, we looked at pre-alignment QC metrics to investigate read quality. In all the samples, most of the reads had an average read quality of ~39. Alignment was performed using our in-house Strand NGS aligner, which follows the Burrows-Wheeler Transform (BWT) approach. The RNA-Seq data was aligned against the Transcriptome and Genome (mm9) using the UCSC model. The alignment parameters allowed for 10% mismatches and 5% gaps in a read. Reads aligning to multiple locations were reported only once, and reads aligning to more than 5 locations were ignored. Since the base quality dipped towards the 3' end, low-quality bases (quality less than or equal to 10) were trimmed from that end. To ensure that this trimming did not result in very short reads, the minimum read length was fixed at 25 bp. In each sample, 79-84% of the total reads were aligned, of which 76-80% were uniquely aligned.

Post alignment, the reads were de-duplicated and quantified based on the methods suggested by Mortazavi et al.³, and quantile normalization was applied. Genes were filtered based on their normalized signal intensity values (between the 20th and 100th percentile), retaining those present in at least 1 out of 2 conditions. A Mann-Whitney unpaired test was performed to find entities showing statistically significant differences, with a p-value cut-off of ≤ 0.05. Genes showing a fold change ≥ 1.5 were retained and used for downstream analysis, including Gene Ontology (GO) and Pathway Analysis.

The raw reads corresponding to the ChIP-Seq data were aligned in Strand NGS against the genome (mm9) with a minimum of 90% identity, a maximum of 5% gaps, and 25 bp as the minimum aligned read length. Post alignment, reads were de-duplicated and peaks were detected with the MACS⁴ algorithm on each of the replicate samples (S1 and S4) against the input separately, using an average fragment size of 300 bases and a p-value cut-off of 1.0E-5. The resulting peak regions were annotated with genes present in a +/-5 kbp window, and the common peak-associated genes were identified using a 2-way Venn diagram. After this, the MEIS1 motif was downloaded from the JASPAR database⁵, imported into Strand NGS and scanned against the whole genome in order to find possible binding sites for MEIS1.

Results and Discussion

RNA-Seq Analysis
Post quantification and filtering, data analysis using Strand NGS revealed a total of 11,929 differentially-expressed genes with p-value ≤ 0.05, of which 6,038 showed a fold change difference of ≥ 1.5 between the Wild Type and Hypomorphic Hearts. Among these, 5,952 genes showed up-regulation and 86 genes showed down-regulation. Figure 2 displays a 2-D scatter plot of the fold change difference. Genes including NKX2-5 and MEIS1 (highlighted in yellow) show ≥ 1.5 fold change expression differences and are statistically significant. Most genes show up-regulated expression in the Wild Type Hearts. Figure 3 displays a profile plot of gene expression regulation.

Figure 2: 2-D scatter plot of the fold change analysis list. Genes including NKX2-5 and MEIS1 (highlighted in yellow) show a ≥ 1.5 fold change difference and are also statistically significant with p-value ≤ 0.05. Most genes show up-regulated expression in the Wild Type Hearts.

Figure 3: Hypomorphic Hearts show lower gene expression levels for NKX2-5, MEIS1, and Popdc2 compared to Wild Type Hearts in the profile plot.

In addition, it was observed that Hypomorphic Hearts with lowered NKX2-5 expression showed lowered expression levels of Popdc2, and the corresponding Popdc2 transcripts also showed similar behaviour in the Gene View (See Figure 4). Principal Component Analysis (PCA) further confirmed the clear distinction and separation between the Wild Type and Hypomorphic Hearts (See Figure 5). In PCA, PC1 is the eigenvector that captures the largest share of variation in the dataset, PC2 captures the second largest, and PC3 the third largest. In our analysis, the sample groups are separated by PC1, PC2, and PC3.

Figure 4: Gene View showing raw counts of Popdc2 transcripts, which follow a pattern similar to NKX2-5 expression levels in Hypomorphic and Wild Type Hearts.

Figure 5: Clear distinction and data separation in PCA between the Hypomorphic and Wild Type Hearts.

Strand NGS offers support for GO analysis, a functional enrichment analysis of the significant genes based on biological process, molecular function, and cellular component. The affected gene list was found to be significantly enriched for genes primarily involved in heart development and other cardiac-related functions (See Figure 6).

Figure 6: A. Gene Ontology analysis of functionally affected genes. B. The affected gene list is found to be significantly enriched for genes involved in heart development and other cardiac-related functions.

ChIP-Seq Analysis
Peak detection is used to identify the interaction pattern of a protein with DNA, such as the binding of the transcription factors that drive gene activation. As part of this analysis, peaks were detected with the MACS algorithm⁴. The total number of peaks detected on the replicate samples (S1 and S4), each called against the input separately, was 14,149, and these peak regions were annotated to 8,178 genes (+/-5 kbp region size). This is as expected, as the Popdc2 enhancer has a prominent role in the binding of NKX2-5 and MEIS1 (See Figure 7).

Figure 8: A. MEIS1 motif downloaded from the JASPAR database. B. MEIS1 motif scanned against the whole genome in Strand NGS to identify the possible binding sites for MEIS1.
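The 3' trimming rule described in the data analysis methodology (remove trailing bases with quality ≤ 10, then discard reads shorter than 25 bp) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Strand NGS implementation: the helper names `phred33` and `trim_3prime` are hypothetical, qualities are assumed to come from Phred+33 FASTQ files, and trimming stops at the first base from the 3' end whose quality is above the cut-off.

```python
from typing import List, Optional, Tuple


def phred33(qual_string: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores (assumed encoding)."""
    return [ord(ch) - 33 for ch in qual_string]


def trim_3prime(seq: str, quals: List[int], max_bad_q: int = 10,
                min_len: int = 25) -> Optional[Tuple[str, List[int]]]:
    """Hypothetical helper: remove trailing bases with quality <= max_bad_q.

    Returns None when the trimmed read is shorter than min_len, mirroring the
    25 bp minimum read length used in the methodology.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is at or below the cut-off.
    while end > 0 and quals[end - 1] <= max_bad_q:
        end -= 1
    if end < min_len:
        return None  # read became too short after trimming
    return seq[:end], quals[:end]


read = "ACGT" * 8                   # 32 bp toy read
quals = [38] * 28 + [9, 12, 4, 2]   # quality dips towards the 3' end
trimmed = trim_3prime(read, quals)
print(len(trimmed[0]), trimmed[1][-2:])  # 30 [9, 12]
```

Only the two trailing bases (qualities 4 and 2) are removed; the base with quality 9 survives because a higher-quality base (12) follows it. A sliding-window trimmer would behave differently, which is one reason to treat this as a sketch.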
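Quantification "based on the methods suggested by Mortazavi et al." most plausibly refers to RPKM (reads per kilobase of gene model per million mapped reads), which that work introduced: for a gene with C mapped reads and length L bp in a sample with N mapped reads, RPKM = C × 10^9 / (N × L). The sketch below assumes that interpretation, computes RPKM, and then applies a plain quantile normalization across samples. Function names are hypothetical, ties are broken by position rather than averaged, and Strand NGS may handle paired-end fragments and ties differently.

```python
from typing import List


def rpkm(count: int, gene_length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of gene model per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped)


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Give every sample the same value distribution.

    `samples` holds one list per sample, with genes in the same order in each.
    The k-th smallest value of every sample is replaced by the mean of the
    k-th smallest values across all samples.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[k] for s in sorted_samples) / len(samples)
                 for k in range(n_genes)]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


print(rpkm(count=500, gene_length_bp=2_000, total_mapped=10_000_000))  # 25.0
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both samples contain exactly the values 1.5, 3.5 and 5.5, so any remaining differences between them reflect only the ranking of genes within each sample.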
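The differential-expression filter from the methodology (Mann-Whitney unpaired test at p ≤ 0.05, then fold change ≥ 1.5) can be illustrated as follows. This sketch assumes the normal approximation to the U statistic with no tie or continuity correction, and defines fold change as the ratio of group means on the linear scale, taken symmetrically so that 1.5 applies in both directions. These are illustrative choices; which variant Strand NGS uses is not stated here. With three replicates per group (Table 1), an exact two-sided Mann-Whitney p-value cannot fall below 0.1, so the choice of approximation matters at the 0.05 cut-off.

```python
import math
import statistics
from typing import Optional, Sequence


def mann_whitney_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    No tie correction or continuity correction is applied (illustrative only).
    """
    # U = number of (x, y) pairs in which the x value is larger; ties count 0.5.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def call_gene(wt: Sequence[float], hypo: Sequence[float],
              p_cut: float = 0.05, fc_cut: float = 1.5) -> Optional[str]:
    """Return 'up in WT', 'down in WT', or None if the gene fails either filter.

    Assumes strictly positive normalized expression values.
    """
    ratio = statistics.mean(wt) / statistics.mean(hypo)
    fold = max(ratio, 1 / ratio)  # symmetric fold change
    if mann_whitney_p(wt, hypo) > p_cut or fold < fc_cut:
        return None
    return "up in WT" if ratio > 1 else "down in WT"


print(round(mann_whitney_p([10, 11, 12], [1, 2, 3]), 4))  # 0.0495
print(call_gene([10, 11, 12], [1, 2, 3]))                  # up in WT
print(call_gene([10, 11, 12], [9, 10.5, 11.5]))            # None
```

Complete separation of three Wild Type and three Hypomorph values gives |z| ≈ 1.96 and therefore p just under 0.05 in this approximation, which is the smallest p-value attainable with this design.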
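The ordering of principal components referred to in the RNA-Seq results can be stated compactly. For a centered expression matrix X with n samples (rows) and p genes (columns), the principal components are the eigenvectors of the sample covariance matrix, ranked by their eigenvalues:

```latex
\Sigma = \frac{1}{n-1} X^{\top} X, \qquad
\Sigma \mathbf{v}_k = \lambda_k \mathbf{v}_k, \qquad
\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots \ge 0,
\qquad
\text{variance explained by PC}_k = \frac{\lambda_k}{\sum_{j} \lambda_j}
```

PC3 therefore carries the third-largest share of the variance. With the six RNA-Seq samples of Table 1, centering leaves at most five non-zero eigenvalues, so PC4 and PC5 account for whatever variation the first three components do not.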
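MACS models ChIP-Seq tag counts with a Poisson distribution and calls a region enriched when its count is improbable under a background rate estimated from the control. The sketch below shows only that final significance test, using the p-value cut-off of 1.0E-5 from the methodology. The background rate `lam` is supplied by the caller, and MACS's local-lambda estimation from the input sample, its shifting of tags by half the fragment size (here 300 bases), and its merging of enriched windows are all omitted. The function names are hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    def pmf(i: int) -> float:
        # Log-space evaluation avoids overflow in lam**i and i!.
        return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))

    if k <= 0:
        return 1.0
    if k <= lam:
        # Large tail: 1 minus the lower sum is numerically safe here.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # Small tail: terms shrink by a factor lam / i < 1 at each step, so terms
    # beyond k + 1000 are negligible for typical per-window background rates.
    return sum(pmf(i) for i in range(k, k + 1000))


def is_enriched(count: int, lam: float, p_cut: float = 1e-5) -> bool:
    """Hypothetical peak test: is `count` tags surprising given background lam?"""
    return poisson_upper_tail(count, lam) <= p_cut


print(is_enriched(10, 5.0))  # False: P(X >= 10) is about 0.032
print(is_enriched(20, 5.0))  # True: P(X >= 20) is well below 1e-5
```

A strict cut-off such as 1.0E-5 means that a window needs several times its expected background count before it is reported, which keeps the number of false-positive peaks low across a whole genome.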
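Annotating peaks with genes in a +/-5 kbp window and intersecting the S1 and S4 gene lists (the 2-way Venn diagram step) reduce to an interval-overlap test and set operations. In this sketch a peak is assigned to a gene when it overlaps the gene body extended by 5 kbp on each side; whether Strand NGS measures the window from the gene body or from the transcription start site is an assumption here, and the gene names and coordinates are hypothetical. The nested loop is fine for illustration, but a genome-scale run would use a sorted or interval-tree index.

```python
from typing import Dict, List, Set, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def genes_near_peaks(peaks: List[Region], genes: Dict[str, Region],
                     window: int = 5_000) -> Set[str]:
    """Names of genes whose body, padded by `window` bp per side, overlaps a peak."""
    hits = set()
    for chrom, p_start, p_end in peaks:
        for name, (g_chrom, g_start, g_end) in genes.items():
            padded_start, padded_end = g_start - window, g_end + window
            if chrom == g_chrom and p_start < padded_end and p_end > padded_start:
                hits.add(name)
    return hits


# Hypothetical annotation and peak calls for the two replicates.
genes = {
    "GeneA": ("chr1", 10_000, 12_000),
    "GeneB": ("chr1", 50_000, 52_000),
    "GeneC": ("chr2", 1_000, 3_000),
}
s1_genes = genes_near_peaks([("chr1", 16_000, 16_500), ("chr2", 8_500, 8_800)], genes)
s4_genes = genes_near_peaks([("chr1", 11_000, 11_200), ("chr1", 56_000, 56_300)], genes)

# The three regions of a 2-way Venn diagram.
print(sorted(s1_genes & s4_genes))  # ['GeneA']
print(sorted(s1_genes - s4_genes))  # []
print(sorted(s4_genes - s1_genes))  # ['GeneB']
```

The S1 peak on chr2 misses GeneC because it starts 500 bp beyond the padded gene end, showing how the window size directly controls which genes count as peak-associated.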
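JASPAR distributes motifs as position frequency matrices (base counts per position). Scanning a genome with such a motif commonly means converting the counts into a log-odds position weight matrix and scoring every window on both strands. The sketch below does this with a pseudocount of 0.5 and a uniform 0.25 background. The 4-column count matrix is a toy example and NOT the JASPAR MEIS1 matrix, and the score threshold is an arbitrary illustrative value rather than a Strand NGS setting.

```python
import math
from typing import Dict, List, Tuple

# Toy position frequency matrix (counts per base per position); NOT the real MEIS1 motif.
TOY_COUNTS: Dict[str, List[int]] = {
    "A": [0, 1, 9, 0],
    "C": [0, 8, 0, 1],
    "G": [1, 0, 1, 0],
    "T": [9, 1, 0, 9],
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_log_odds(counts: Dict[str, List[int]], pseudo: float = 0.5,
                background: float = 0.25) -> List[Dict[str, float]]:
    """Convert counts to log2(probability / background) for each position."""
    width = len(counts["A"])
    pwm = []
    for j in range(width):
        total = sum(counts[b][j] for b in "ACGT") + 4 * pseudo
        pwm.append({b: math.log2((counts[b][j] + pseudo) / total / background)
                    for b in "ACGT"})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Report (start, strand, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]  # reverse complement
        for strand, site in (("+", window), ("-", reverse)):
            score = sum(pwm[j][base] for j, base in enumerate(site))
            if score >= threshold:
                hits.append((start, strand, round(score, 2)))
    return hits


pwm = to_log_odds(TOY_COUNTS)
print(scan("GGTCATGGATGACC", pwm, threshold=5.0))
# [(2, '+', 6.49), (8, '-', 6.49)]
```

The consensus TCAT is found once on the forward strand and once as its reverse complement ATGA, which is why genome-wide motif scans report a strand for every candidate binding site.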
The MEIS1 motif was downloaded from the JASPAR database⁵, imported into Strand NGS and scanned against the whole genome in order to find possible binding sites for MEIS1 (See Figure 8A and 8B).

Integrated analysis of RNA-Seq and ChIP-Seq
To explain the regulation of Cardiogenesis, we conducted an integrated analysis of the