Supporting Information
Dvinge et al. 10.1073/pnas.1413374111 (www.pnas.org/cgi/content/short/1413374111)

SI Materials and Methods

Sample Collection. Peripheral blood specimens were collected at the Fred Hutchinson Cancer Research Center (FHCRC). All samples were obtained under FHCRC Institutional Review Board-approved protocols, and consent was provided according to the Declaration of Helsinki. Specimens were collected from unrelated nonsmoking healthy individuals (two male and two female) by using purple-top BD Vacutainer vials with EDTA as anticoagulant. Cell counts per sample varied from 18 million to 68 million. Samples were incubated at room temperature or on ice, and PBMCs were isolated by using a Ficoll–Hypaque gradient at the specified time points. The volume of the blood samples was brought to 10 mL with RPMI 10% (vol/vol) BCS, and the diluted samples were pipetted over 3 mL of Ficoll–Hypaque, then centrifuged at 1,000 × g for 15 min, before isolating the buffy coat containing the PBMCs. Because of possible imperfect cell isolation, the layer containing the Ficoll and polymorphonuclear cells was extracted along with the PBMCs. Cells from the cryopreserved samples were immediately put in a freezing medium with 90% (vol/vol) BCS and 10% (vol/vol) DMSO and stored in liquid nitrogen. Before RNA sequencing, the samples were thawed in a water bath and washed twice with PBS, and the cells were pelleted.

Mononuclear cells from four normal bone marrow samples were purchased from a commercial vendor (Lonza) as part of a published microarray study (1). RNA sequencing was performed as described below.

RNA Sequencing. PBMCs or mononuclear bone marrow cells were lysed in TRIzol, and total RNA was extracted by using Qiagen RNeasy columns. RNA quality was assessed by using the Agilent 2200 TapeStation. Using 2–4 μg of total RNA, we prepared paired-end, poly(A)-selected, unstranded libraries for Illumina sequencing using the TruSeq protocol modified to select DNA fragments in the 100- to 400-bp range. After the adapter ligation step, the bead-to-library volume ratios were varied as follows: (i) 0.5× Agencourt AMPure XP beads were added to the sample library to select for fragments <400 bp. (ii) Then, 1× beads were added to the library to select for >100-bp fragments. Finally, DNA fragments were amplified by using 15 cycles of PCR and separated by 2% (wt/vol) agarose gel electrophoresis. DNA fragments (300 bp) were purified by using the Qiagen MinElute gel extraction kit. Barcoded RNA-seq libraries were sequenced in four-plex on the Illumina HiSeq 2500 using 2 × 49 bp reads.

Genome Annotations. Alternative splicing events were grouped as skipped exon events, competing 5′ and 3′ splice sites, and retained introns, by using MISO (Version 2.0) annotations (2). Constitutive junctions were defined as splice junctions that were not alternatively spliced in any isoform of the UCSC knownGene track (3). To map the reads, separate annotation files were created for RNA transcripts and RNA splice junctions. The RNA transcript annotation is a combination of isoforms from MISO (Version 2.0) (2), UCSC knownGene (3), and the Ensembl 71 gene annotation (4). The RNA splice junction annotation contains an enumeration of all possible combinations of annotated splice sites as described (5).

RNA-Seq Read Mapping. The read mapping pipeline contains the following steps. (i) Map all reads to the UCSC hg19 (NCBI GRCh37) human genome assembly by using Bowtie (6) and RSEM (7). RSEM (Version 1.2.4) was modified to call Bowtie (Version 1.0.0) with the -v 2 mapping strategy. RSEM was then invoked with the arguments --bowtie-m 100 --bowtie-chunkmbs 500 --calc-ci --genome-bam on the gene annotation file. (ii) Filter the resulting BAM file to remove alignments with mapq scores of 0 and require a minimum splice junction overhang of 6 bp. (iii) Extract all unaligned reads and then align to the splice junction file with TopHat (Version 2.0.8b) (8) with the arguments --bowtie1 --read-mismatches 3 --read-edit-dist 2 --no-mixed --no-discordant --min-anchor-length 6 --splice-mismatches 0 --min-intron-length 10 --max-intron-length 1000000 --min-isoform-fraction 0.0 --no-novel-juncs --no-novel-indels --raw-juncs. The parameters --mate-inner-dist and --mate-std-dev were determined by mapping to constitutive coding exons as determined with MISO’s exon_utils.py script. (iv) Filter the resulting alignments as described above. (v) Merge the results from TopHat and RSEM to generate a final BAM file.

Isoform Expression Measurements. MISO (2) and Version 2.0 of its annotations were used to quantify isoform ratios for all exon-skipping events (cassette exons), competing 5′ and 3′ splice sites, and retained introns. Alternative splicing of constitutive junctions and retention of constitutive introns were quantified using reads directly spanning the splice junctions, as described (5). All subsequent analyses were restricted to splicing events with at least 20 relevant reads (reads supporting either or both isoforms) that were alternatively spliced in our data. Events were defined as differentially spliced between two samples if they satisfied the following criteria: (i) at least 20 relevant reads in both samples, (ii) a change in isoform ratio of at least 10%, and (iii) a Bayes factor ≥1. Wagenmakers’s framework (9) was used to compute Bayes factors for differences in isoform ratios between samples.

RNA Expression Measurements. RNA transcript levels were normalized with the trimmed mean of M values (TMM) method (10) with the scaling factor calculated based on protein-coding transcripts only. Genes were defined as differentially expressed between two samples if they satisfied the following criteria: (i) a change in transcript abundance of 1.5 log fold, and (ii) a Bayes factor ≥100. When considering entire datasets, only genes differentially expressed in at least 25% of the samples were included.

Identification of Upstream Regulators. Upstream regulators driving the changes in gene expression were identified by using the “Upstream Analysis” module of Ingenuity Pathway Analysis (Qiagen). Only RNAs and proteins were considered as regulators, and they were selected based on (i) an activation z score >2, (ii) consistent behavior across time points in a comparison analysis, and (iii) consistent behavior across gene sets of varying stringency (log fold change >1, 2, or 3). The regulators were connected using “Path explorer” using only high-confidence predictions across human, mouse, and rat and excluding interactions based solely on predicted miRNA/mRNA targets.

GO Enrichment. Enrichment of Biological Process terms from GO was performed by using the R package goseq (11), using all protein coding genes as the background universe, the “Wallenius” method, and correcting for gene-length bias. The resulting false discovery rates were corrected by using the Benjamini–Hochberg approach.

External Datasets. For liquid tumors, samples were obtained from The European Genome-phenome Archive, the Sequence Read Archive, or directly from the authors. For solid tumors, raw RNA-seq reads from tumor samples with matched normals were downloaded from CGHub. For samples with data available in FASTQ format, this format was used directly as input. For samples with data available in BAM (generated by MapSplice), the raw reads were extracted by using sam2fastq (Version 1.2; UNC Bioinformatics Utilities). All samples were processed in a manner identical to the PBMCs from healthy donors. For the final comparisons, low-coverage samples with <10 million reads mapping to protein-coding transcripts were excluded.

1. Stirewalt DL, et al. (2008) Identification of genes with abnormal expression changes in acute myeloid leukemia. Genes Chromosomes Cancer 47(1):8–20.
2. Katz Y, Wang ET, Airoldi EM, Burge CB (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7(12):1009–1015.
3. Meyer LR, et al. (2013) The UCSC Genome Browser database: Extensions and updates 2013. Nucleic Acids Res 41(database issue):D64–D69.
4. Flicek P, et al. (2013) Ensembl 2013. Nucleic Acids Res 41(database issue):D48–D55.
5. Hubert CG, et al. (2013) Genome-wide RNAi screens in human brain tumor isolates reveal a novel viability requirement for PHF5A. Genes Dev 27(9):1032–1045.
6. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25.
7. Li B, Dewey CN (2011) RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323.
8. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105–1111.
9. Wagenmakers EJ, Lodewyckx T, Kuriyal H, Grasman R (2010) Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method. Cognit Psychol 60(3):158–189.
10. Robinson MD, Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11(3):R25.
11. Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene ontology analysis for RNA-seq: Accounting for selection bias. Genome Biol 11(2):R14.
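The differential-splicing and differential-expression criteria given under Isoform Expression Measurements and RNA Expression Measurements can be written as two simple predicates. The following is a minimal sketch, not the authors' code. The function names, the pseudocount, and the choice of log base 2 for the "1.5 log fold" threshold are assumptions made for illustration. The Bayes factor is taken as a precomputed input because its calculation is not detailed here.

```python
import math

# Thresholds stated in the text.
MIN_READS = 20             # relevant reads required in each sample
MIN_DELTA_RATIO = 0.10     # change in isoform ratio of at least 10%
MIN_BF_SPLICING = 1.0      # Bayes factor threshold for splicing events
MIN_LOG_FC = 1.5           # "1.5 log fold"; base 2 is an assumption
MIN_BF_EXPRESSION = 100.0  # Bayes factor threshold for gene expression


def is_differentially_spliced(reads_a, reads_b, ratio_a, ratio_b, bayes_factor):
    """Apply the three splicing criteria to one event in two samples."""
    enough_reads = reads_a >= MIN_READS and reads_b >= MIN_READS
    large_change = abs(ratio_a - ratio_b) >= MIN_DELTA_RATIO
    return enough_reads and large_change and bayes_factor >= MIN_BF_SPLICING


def is_differentially_expressed(expr_a, expr_b, bayes_factor, pseudocount=1.0):
    """Apply the two expression criteria to one gene in two samples.

    The pseudocount (hypothetical) avoids division by zero for
    unexpressed genes; the source does not say how zeros were handled.
    """
    log_fc = math.log2((expr_b + pseudocount) / (expr_a + pseudocount))
    return abs(log_fc) >= MIN_LOG_FC and bayes_factor >= MIN_BF_EXPRESSION
```

For example, `is_differentially_spliced(25, 40, 0.30, 0.55, 3.2)` passes all three criteria, whereas the same event with only 15 relevant reads in one sample does not.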
[Figure S1: scatter plots of expression in donor 3 (XX, x axis) versus donor 4 (XY, y axis) on logarithmic axes with ticks at 10^-1, 10^1, and 10^4. Coding RNA: ρ = 0.984 at t = 0 h and ρ = 0.982 at t = 48 h. Noncoding RNA: ρ = 0.878 at t = 0 h and ρ = 0.867 at t = 48 h.]

Fig. S1. Reproducibility between expression of coding and noncoding transcripts for donors 3 and 4. Expression levels were normalized with TMM normalization (1) using the coding genes to account for difference in read coverage. The Spearman correlation between donors remains unchanged from beginning to end of the ex vivo incubation.
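The Spearman correlation reported in Fig. S1 is the Pearson correlation of the ranked expression values. The sketch below illustrates that definition in plain Python; it is not the authors' code, and the function names are hypothetical. Ties receive their average rank, which is the usual convention but is an assumption here.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Calling `spearman(donor3, donor4)` on two lists of normalized expression values for the same genes, in the same order, yields one ρ per panel. Because only ranks enter the calculation, the result is unaffected by the log scaling of the axes.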