SUPPLEMENTARY INFORMATION

Supplementary Methods

Materials: Antibodies targeting histone marks associated with transcriptional activation (H3K4me3 Abcam #ab8580, H3K9ac Millipore #Mi 06-942), enhancers (H3K27ac Abcam #ab4729) and markers of transcriptional repression (H3K9me3 Abcam #ab8898, H3K27me3 Abcam #ab6002) were used for ChIP-seq. DNA and RNA were extracted using either the Illustra TriplePrep Kit (GE Healthcare #28-9425-44) or the RNeasy Mini Kit (Qiagen #7416).

ChIP-sequencing data processing: Sequencing reads were aligned to the hg19 reference genome using BWA, and the fragment count at any given position was estimated as the number of uniquely aligned reads. Enriched intervals were identified by comparing the mean fragment count in a given window (for instance, 1 kb long) against a sample-specific expected distribution estimated from input (chromatin isolated from cells without antibody immunoprecipitation). The peak-finding algorithm Model-based Analysis of ChIP-Seq (MACS2, version 2.1.1) was applied to the aligned sequences to obtain a reference set of binding sites for the activation marks with narrow peaks (H3K4me3, H3K9ac and H3K27ac). A p-value threshold of 0.01 was used for enrichment in all datasets. Spatial Clustering for Identification of ChIP-Enriched Regions (SICER, version 1.1) was used to identify extended domains of the repressive marks H3K9me3 and H3K27me3, which are associated with diffuse peaks and relatively high background.

Significant changes from diagnosis to relapse were determined by ranking the ratios; those with the highest ratios were called “dynamic”. To account for inter-individual heterogeneity, the cut-off was set at a geometric inflection point determined for each pair individually rather than at an arbitrary fixed value. Dynamic changes were those where histone marks were deposited at both diagnosis and relapse but there was a significant enrichment in one condition or the other (i.e. diagnosis vs relapse).
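The per-pair cut-off described above can be illustrated with a minimal sketch. The exact geometric construction is not specified in these methods, so the sketch assumes one common choice: the ratios are ranked in ascending order, both axes are rescaled to [0, 1], and the cut-off is the point lying farthest below the chord that joins the first and last points of the ranked curve. The function names (`inflection_cutoff`, `call_dynamic`) and the example values are hypothetical.

```python
def inflection_cutoff(ratios):
    """Return the ratio at the knee of the ranked-ratio curve.

    Assumption: the knee is the point farthest below the straight line
    joining the lowest and highest ranked values, after scaling both
    axes to [0, 1]. Other geometric definitions are possible.
    """
    ranked = sorted(ratios)
    n = len(ranked)
    if n < 3 or ranked[0] == ranked[-1]:
        raise ValueError("need at least 3 ratios that are not all equal")
    lo, hi = ranked[0], ranked[-1]
    best_index, best_gap = 0, float("-inf")
    for i, value in enumerate(ranked):
        x = i / (n - 1)               # scaled rank
        y = (value - lo) / (hi - lo)  # scaled ratio
        gap = x - y                   # proportional to distance below the chord y = x
        if gap > best_gap:
            best_index, best_gap = i, gap
    return ranked[best_index]


def call_dynamic(ratios_by_peak):
    """Return the peaks whose ratio lies above the pair-specific cut-off."""
    cutoff = inflection_cutoff(ratios_by_peak.values())
    return {peak for peak, ratio in ratios_by_peak.items() if ratio > cutoff}


# Hypothetical ratios for one diagnosis/relapse pair.
example = {"peak_a": 1.0, "peak_b": 1.1, "peak_c": 1.2,
           "peak_d": 1.3, "peak_e": 4.0, "peak_f": 9.0}
print(inflection_cutoff(example.values()))  # 1.3
print(sorted(call_dynamic(example)))        # ['peak_e', 'peak_f']
```

Because the cut-off is recomputed from each pair's own ranked curve, two patients with very different ratio distributions each get their own threshold, which is the point of avoiding a single fixed value.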
Cases in which we saw no deposition of a mark at diagnosis but enrichment at relapse (or vice versa) were labeled as an “exclusive” gain or loss. Dendrograms were generated by hierarchical clustering of the top 5000 differential H3K27ac or H3K9ac peaks located <3 kb from a TSS, sorted by FDR and fold change. Pathway analyses of the differential ChIP-seq peaks for each mark in individual patients were conducted using the R Bioconductor package GREAT (version 3.0.0) with the PANTHER ontology. The most significant H3K27ac pathways (hypergeometric raw p<0.01) found in at least 2 patients were used as the reference for the bubble plot. For each mark, the number of patients with each reference pathway was plotted, normalized by the number of patients.

RNA sequencing: RNA was extracted from the samples using the Qiagen RNeasy kit. Samples were quantified using a Qubit fluorometer and submitted to the genomic core for sequencing on a HiSeq 2500. Libraries were prepared using the TruSeq RNA sample prep v2 kit (Illumina) with the Ribo-Zero set, allowing analysis of lncRNA expression in addition to protein-coding genes.

Gene Expression Analysis: For cohort A, the TARGET database was used to segregate genes into up-regulated and down-regulated categories based on the log ratio of the fold change at relapse compared to diagnosis. For cohort B, RNA-seq libraries were sequenced using 54 base pair reads on the Illumina Genome Analyzer GAIIx. Image collection and analysis were completed using the Illumina CASAVA pipeline. Adapters were trimmed from RNA-seq paired-end reads and low-quality bases (<30) were removed using Trimmomatic (v0.33) (1). Reads were aligned to the human genome (hg19) using STAR (v2.5.3a) (2), and reads with mapping quality <30 were removed. Raw counts of sequencing reads were obtained from HTseq2. Normalized genome browser tracks were generated using BEDTools (v2.26.0) (3).
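The segregation of genes by log ratio at relapse versus diagnosis can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function name `classify_gene`, the use of a single expression value per condition, and the pseudocount are assumptions made for the example. The default cut-off mirrors the 1.5-fold threshold reported for both cohorts, applied symmetrically on the log2 scale.

```python
import math


def classify_gene(diagnosis, relapse, fold_cutoff=1.5, pseudocount=1.0):
    """Classify one gene as 'up', 'down' or 'unchanged' at relapse.

    The pseudocount is an assumption (not from the methods); it avoids
    division by zero for genes with no expression in one condition.
    """
    log2_ratio = math.log2((relapse + pseudocount) / (diagnosis + pseudocount))
    threshold = math.log2(fold_cutoff)
    if log2_ratio > threshold:
        return "up"
    if log2_ratio < -threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized expression values (diagnosis, relapse).
print(classify_gene(99, 199))   # up
print(classify_gene(199, 99))   # down
print(classify_gene(100, 110))  # unchanged
```

Working on the log2 scale makes the up and down thresholds symmetric: a 1.5-fold increase and a 1.5-fold decrease sit at the same distance from zero.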
Differential gene expression analysis was performed using DESeq2 (4). For both cohorts, an absolute fold change of >1.5 was considered a significant up- or down-regulation. All other genes were considered unchanged.

Enhanced Reduced Representation Bisulfite Sequencing (eRRBS): Genomic DNA was extracted and digested with MspI, and DNA fragments were ligated with Illumina kits following end repair. Library fragments of 150-250 bp and 250-400 bp were isolated and bisulfite treated as previously described (5). eRRBS was used because it increases genomic coverage compared to RRBS while capturing more of the biologically relevant GC-rich loci. The analysis includes a normalization based on expected distributions to account for any bias. For GSEA, genes were ranked by the median methylation difference (relapse vs diagnosis) across the CpGs in the TSS ±1 kb region. Enrichment was calculated from this ranking using FGSEA and MSigDB gene sets for each patient separately, and pathways were ordered by the number of patients in which the pathway was significant (q < 0.01). Based on the GSEA results, gene expression (absolute L2FC >0.32) was determined for the Benporath_PRC2_Target leading edge genes per patient.

Whole genome sequencing: DNA was extracted from diagnosis, remission and relapse samples as detailed. The DNA was quantified using PicoGreen. A Fragment Analyzer was used to determine DNA integrity. Libraries were prepared using the TruSeq PCR-free 450 bp library prep protocol and sequenced at 100x coverage. Sequencing was done at the New York Genome Center (NYGC).
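The gene ranking used as GSEA input in the eRRBS analysis above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: CpGs are supplied as hypothetical `(position, methylation difference)` pairs that all lie on the same chromosome as the genes, the difference is taken as relapse minus diagnosis, and the function names are invented for the example. The enrichment step itself (FGSEA) is not reproduced.

```python
from statistics import median


def tss_median_difference(tss, cpgs, window=1000):
    """Median relapse-minus-diagnosis methylation difference near a TSS.

    cpgs: iterable of (position, difference) pairs. Only CpGs within
    `window` bp of the TSS on either side are used. Returns None when
    no CpG falls inside the window.
    """
    nearby = [diff for pos, diff in cpgs if abs(pos - tss) <= window]
    return median(nearby) if nearby else None


def rank_genes(tss_by_gene, cpgs):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scores = {}
    for gene, tss in tss_by_gene.items():
        score = tss_median_difference(tss, cpgs)
        if score is not None:  # genes without covered CpGs are left out
            scores[gene] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical CpG differences and TSS positions on one chromosome.
cpgs = [(100, 0.25), (500, 0.5), (900, 0.75),
        (5000, -0.5), (5200, -0.25), (9000, 1.0)]
tss_by_gene = {"GENE_A": 500, "GENE_B": 5100, "GENE_C": 20000}
print(rank_genes(tss_by_gene, cpgs))
# [('GENE_A', 0.5), ('GENE_B', -0.375)]
```

Using the median rather than the mean keeps a single outlying CpG in the window from dominating a gene's score.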
Tumor and matched normal DNA sequencing data went through the NYGC somatic pre-processing pipeline, which included aligning reads to the GRCh37 human reference genome using the Burrows-Wheeler Aligner (BWA) aln (6); marking duplicate reads with NovoSort (a multi-threaded bam sort/merge tool by Novocraft technologies, http://www.novocraft.com); realignment around indels (done jointly for all samples derived from one individual, e.g. tumor and matched normal samples, or normal, primary and metastatic tumor trios); and base recalibration via the Genome Analysis Toolkit (GATK) (7). For quality control, a battery of Picard (QualityScoreDistribution, MeanQualityByCycle, CollectBaseDistributionByCycle, CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, CollectGcBiasMetrics, CollectOxoGMetrics) and GATK (FlagStat, ErrorRatePerCycle) metrics was run on all the sequencing data. BedToolsCoverage and custom R scripts were used to compute sequencing depth of coverage. Conpair was run on all tumor-normal pairs to detect cross-individual contamination and sample mix-ups (8). SNVs were called by muTect (9), Strelka (10) and LoFreq (11), and indels were called by Strelka and by somatic versions of Pindel (12) and Scalpel (13). SNVs or indels called by at least 2 of the 3 programs were considered.

References

1. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 2014;30:2114-20
2. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013;29:15-21
3. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010;26:841-2
4. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014;15:550
5. Akalin A, Garrett-Bakelman FE, Kormaksson M, Busuttil J, Zhang L, Khrebtukova I, et al. Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia. PLoS Genet 2012;8:e1002781
6. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754-60
7. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297-303
8. Bergmann EA, Chen BJ, Arora K, Vacic V, Zody MC. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics 2016;32:3196-8
9. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol 2013;31:213-9
10. Saunders CT, Wong WS, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics 2012;28:1811-7
11. Wilm A, Aw PP, Bertrand D, Yeo GH, Ong SH, Wong CH, et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 2012;40:11189-201
12. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 2009;25:2865-71
13. Narzisi G, O'Rawe JA, Iossifov I, Fang H, Lee YH, Wang Z, et al. Accurate de novo and transmitted indel detection in exome-capture data using microassembly. Nat Methods 2014;11:1033-6

Supplemental Table S1: Patient Characteristics

Patient ID | Age (years)/Gender | NCI Risk Group | Time to relapse (months) | Cytogenetics | Day 29 MRD | Histone marks (H3)* | Blast % D/R | Outcome
Cohort A
PASEVJ | 11/F | HR | 23 | Neutral | + | K4me3, K27me3, K9ac, K9me3 | 90/75 | deceased
PAPSPN | 11/M | HR | 17 | Neutral | - | K4me3, K27ac, K27me3, K9ac, K9me3 | NA/95 | deceased
PAPZNK | 10/F | HR | 21 | Neutral | - | K4me3, K27ac, K27me3, K9ac, K9me3 | NA/90 | deceased
PARBVI | 13/F | HR | 32 | Neutral | + | K4me3, K27ac, K27me3, K9ac, K9me3 | NA/90 | deceased
PARFLV