Supporting Information

Dvinge et al. 10.1073/pnas.1413374111
Dvinge et al. www.pnas.org/cgi/content/short/1413374111

SI Materials and Methods

Sample Collection. Peripheral blood specimens were collected at the Fred Hutchinson Cancer Research Center (FHCRC). All samples were obtained under FHCRC Institutional Review Board-approved protocols, and consent was provided according to the Declaration of Helsinki. Specimens were collected from unrelated nonsmoking healthy individuals (two male and two female) by using purple-top BD Vacutainer vials with EDTA as anticoagulant. Cell counts per sample varied from 18 million to 68 million. Samples were incubated at room temperature or on ice, and PBMCs were isolated by using a Ficoll–Hypaque gradient at the specified time points. The volume of the blood samples was brought to 10 mL with RPMI 10% (vol/vol) BCS, and the diluted samples were pipetted over 3 mL of Ficoll–Hypaque, then centrifuged at 1,000 × g for 15 min, before isolating the buffy coat containing the PBMCs. Because of possible imperfect cell isolation, the layer containing the Ficoll and polymorphonuclear cells was extracted along with the PBMCs. Cells from the cryopreserved samples were immediately put in a freezing medium with 90% (vol/vol) BCS and 10% (vol/vol) DMSO and stored in liquid nitrogen. Before RNA sequencing, the samples were thawed in a water bath and washed twice with PBS, and the cells were pelleted.

Mononuclear cells from four normal bone marrow samples were purchased from a commercial vendor (Lonza) as part of a published microarray study (1). RNA sequencing was performed as described below.

RNA Sequencing. PBMCs or mononuclear bone marrow cells were lysed in TRIzol, and total RNA was extracted by using Qiagen RNeasy columns. RNA quality was assessed by using the Agilent 2200 TapeStation. Using 2–4 μg of total RNA, we prepared paired-end, poly(A)-selected, unstranded libraries for Illumina sequencing using the TruSeq protocol modified to select DNA fragments in the 100- to 400-bp range. After the adapter ligation step, the bead-to-library volume ratios were varied as follows: (i) 0.5× Agencourt AMPure XP beads were added to the sample library to select for fragments <400 bp. (ii) Then, 1× beads were added to the library to select for >100-bp fragments. Finally, DNA fragments were amplified by using 15 cycles of PCR and separated by 2% (wt/vol) agarose gel electrophoresis. DNA fragments (300 bp) were purified by using the Qiagen MinElute gel extraction kit. Barcoded RNA-seq libraries were sequenced in four-plex on the Illumina HiSeq 2500 using 2 × 49 bp reads.

Genome Annotations. Alternative splicing events were grouped as skipped exon events, competing 5′ and 3′ splice sites, and retained introns, by using MISO (Version 2.0) annotations (2). Constitutive junctions were defined as splice junctions that were not alternatively spliced in any isoform of the UCSC knownGene track (3). To map the reads, separate annotation files were created for RNA transcripts and RNA splice junctions. The RNA transcript annotation is a combination of isoforms from MISO (Version 2.0) (2), UCSC knownGene (3), and the Ensembl 71 gene annotation (4). The RNA splice junction annotation enumerates all possible combinations of annotated splice sites as described (5).

RNA-Seq Read Mapping. The read mapping pipeline contains the following steps.
(i) Map all reads to the UCSC hg19 (NCBI GRCh37) assembly by using Bowtie (6) and RSEM (7). RSEM (Version 1.2.4) was modified to call Bowtie (Version 1.0.0) with the -v 2 mapping strategy. RSEM was then invoked with the arguments --bowtie-m 100 --bowtie-chunkmbs 500 --calc-ci --genome-bam on the annotation file.
(ii) Filter the resulting BAM file to remove alignments with mapq scores of 0 and require a minimum splice junction overhang of 6 bp.
(iii) Extract all unaligned reads and then align to the splice junction file with TopHat (Version 2.0.8b) (8) with the arguments --bowtie1 --read-mismatches 3 --read-edit-dist 2 --no-mixed --no-discordant --min-anchor-length 6 --splice-mismatches 0 --min-intron-length 10 --max-intron-length 1000000 --min-isoform-fraction 0.0 --no-novel-juncs --no-novel-indels --raw-juncs. The parameters --mate-inner-dist and --mate-std-dev were determined by mapping to constitutive coding exons as determined with MISO's exon_utils.py script.
(iv) Filter the resulting alignments as described above.
(v) Merge the results from TopHat and RSEM to generate a final BAM file.

Isoform Expression Measurements. MISO (2) and Version 2.0 of its annotations were used to quantify isoform ratios for all exon-skipping events (cassette exons), competing 5′ and 3′ splice sites, and retained introns. Alternative splicing of constitutive junctions and retention of constitutive introns were quantified using reads directly spanning the splice junctions, as described (5). All subsequent analyses were restricted to splicing events with at least 20 relevant reads (reads supporting either or both isoforms) that were alternatively spliced in our data. Events were defined as differentially spliced between two samples if they satisfied the following criteria: (i) at least 20 relevant reads in both samples, (ii) a change in isoform ratio of at least 10%, and (iii) a Bayes factor ≥1. Wagenmakers's framework (9) was used to compute Bayes factors for differences in isoform ratios between samples.

RNA Expression Measurements. RNA transcript levels were normalized with the trimmed mean of M values (TMM) method (10) with the scaling factor calculated based on protein-coding transcripts only. Genes were defined as differentially expressed between two samples if they satisfied the following criteria: (i) a change in transcript abundance of at least 1.5 log fold, and (ii) a Bayes factor ≥100. When considering entire datasets, only genes differentially expressed in at least 25% of the samples were included.

Identification of Upstream Regulators. Upstream regulators driving the changes in gene expression were identified by using the “Upstream Analysis” module of Ingenuity Pathway Analysis (Qiagen). Only RNAs and proteins were considered as regulators, and they were selected based on (i) an activation z score >2, (ii) consistent behavior across time points in a comparison analysis, and (iii) consistent behavior across gene sets of varying stringency (log fold change >1, 2, or 3). The regulators were connected using “Path explorer” using only high-confidence predictions across human, mouse, and rat and excluding interactions based solely on predicted miRNA/mRNA targets.

GO Enrichment. Enrichment of Biological Process terms from GO was performed by using the R package goseq (11), using all protein-coding genes as the background universe, the “Wallenius” method, and correcting for gene-length bias. The resulting false discovery rates were corrected by using the Benjamini–Hochberg approach.

External Datasets. For liquid tumors, samples were obtained from The European Genome-phenome Archive, the Sequence Read Archive, or directly from the authors. For solid tumors, raw RNA-seq reads from tumor samples with matched normals were downloaded from CGHub. For samples with data available in FASTQ format, this format was used directly as input. For samples with data available in BAM (generated by MapSplice), the raw reads were extracted by using sam2fastq (Version 1.2; UNC Bioinformatics Utilities). All samples were processed in a manner identical to the PBMCs from healthy donors. For the final comparisons, low-coverage samples with <10 million reads mapping to protein-coding transcripts were excluded.
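The alignment filter in steps (ii) and (iv) of the read mapping pipeline can be illustrated on plain SAM text records. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the "splice junction overhang" is assumed to mean the number of aligned bases (CIGAR operations M, = and X) in each exonic segment flanking an N operation, so every segment of a spliced read must reach the minimum.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junction_overhangs(cigar):
    """Return aligned-base counts for each exonic segment of a spliced read.

    Segments are separated by N (skipped region) operations. Only M, = and X
    count toward a segment. Unspliced reads return an empty list.
    """
    segments = [0]
    for length, op in CIGAR_OP.findall(cigar):
        if op == "N":
            segments.append(0)
        elif op in "M=X":
            segments[-1] += int(length)
    return segments if len(segments) > 1 else []


def keep_alignment(sam_line, min_overhang=6):
    """Apply the two filters: drop MAPQ 0 and short junction overhangs."""
    fields = sam_line.rstrip("\n").split("\t")
    mapq = int(fields[4])   # column 5 of a SAM record
    cigar = fields[5]       # column 6 of a SAM record
    if mapq == 0:
        return False
    # all() over an empty list is True, so unspliced reads pass this check.
    return all(seg >= min_overhang for seg in junction_overhangs(cigar))


def make_record(mapq, cigar):
    """Build a toy 11-column SAM record for illustration only."""
    return "\t".join(["read1", "0", "chr1", "100", str(mapq), cigar,
                      "*", "0", "0", "A" * 49, "I" * 49])


if __name__ == "__main__":
    for mapq, cigar in [(255, "43M500N6M"), (255, "44M500N5M"), (0, "49M")]:
        print(mapq, cigar, keep_alignment(make_record(mapq, cigar)))
```

In practice such a filter would run over a BAM file through a SAM/BAM library; working on text lines here keeps the sketch free of external dependencies.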

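The differential splicing and differential expression criteria in SI Materials and Methods can be written as simple predicates. This is a hedged sketch rather than the authors' code: all function and parameter names are hypothetical, isoform ratios are assumed to lie on a 0–1 scale (so "10%" is read as an absolute difference of 0.10), the base of the log fold change is left to the caller because the text does not state it, and Bayes factors are assumed to have been computed elsewhere.

```python
def is_differentially_spliced(reads_a, reads_b, ratio_a, ratio_b, bayes_factor,
                              min_reads=20, min_delta=0.10, min_bf=1.0):
    """Criteria (i)-(iii) for a splicing event compared between two samples."""
    enough_reads = reads_a >= min_reads and reads_b >= min_reads
    large_change = abs(ratio_a - ratio_b) >= min_delta
    return enough_reads and large_change and bayes_factor >= min_bf


def is_differentially_expressed(log_fold_change, bayes_factor,
                                min_abs_lfc=1.5, min_bf=100.0):
    """Criteria (i)-(ii) for a gene compared between two samples."""
    return abs(log_fold_change) >= min_abs_lfc and bayes_factor >= min_bf


def recurrent_in_dataset(per_sample_calls, min_fraction=0.25):
    """Keep a gene only if it is called in at least 25% of the samples."""
    if not per_sample_calls:
        return False
    return sum(per_sample_calls) / len(per_sample_calls) >= min_fraction


if __name__ == "__main__":
    print(is_differentially_spliced(35, 40, 0.50, 0.30, bayes_factor=12.0))
    print(is_differentially_expressed(-2.1, bayes_factor=250.0))
    print(recurrent_in_dataset([True, False, False, False]))
```

Thresholds are exposed as keyword arguments so that the stated cutoffs are visible in one place and can be varied without editing the logic.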
1. Stirewalt DL, et al. (2008) Identification of genes with abnormal expression changes in acute myeloid leukemia. Genes Chromosomes Cancer 47(1):8–20.
2. Katz Y, Wang ET, Airoldi EM, Burge CB (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 7(12):1009–1015.
3. Meyer LR, et al. (2013) The UCSC Genome Browser database: Extensions and updates 2013. Nucleic Acids Res 41(database issue):D64–D69.
4. Flicek P, et al. (2013) Ensembl 2013. Nucleic Acids Res 41(database issue):D48–D55.
5. Hubert CG, et al. (2013) Genome-wide RNAi screens in human brain tumor isolates reveal a novel viability requirement for PHF5A. Genes Dev 27(9):1032–1045.
6. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25.
7. Li B, Dewey CN (2011) RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12:323.
8. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105–1111.
9. Wagenmakers EJ, Lodewyckx T, Kuriyal H, Grasman R (2010) Bayesian hypothesis testing for psychologists: A tutorial on the Savage-Dickey method. Cognit Psychol 60(3):158–189.
10. Robinson MD, Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11(3):R25.
11. Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene ontology analysis for RNA-seq: Accounting for selection bias. Genome Biol 11(2):R14.

[Figure S1 image: scatter plots of transcript expression with donor 3 (XX) on one axis and donor 4 (XY) on the other, log-scaled from 10^-1 to 10^4. Coding RNA: ρ = 0.984 (t = 0 h) and ρ = 0.982 (t = 48 h). Noncoding RNA: ρ = 0.878 (t = 0 h) and ρ = 0.867 (t = 48 h).]

Fig. S1. Reproducibility between expression of coding and noncoding transcripts for donors 3 and 4. Expression levels were normalized with TMM normalization (1) using the coding genes to account for difference in read coverage. The Spearman correlation between donors remains unchanged from beginning to end of the ex vivo incubation.

1. Robinson MD, Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11(3):R25.
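The donor-to-donor comparison in Fig. S1 rests on the Spearman correlation, which is the Pearson correlation of the ranks. The sketch below is a dependency-free illustration with hypothetical function names and toy expression values; it is not taken from the authors' analysis code, and it assumes that tied values receive their average rank.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    donor3 = [0.5, 12.0, 150.0, 3200.0]   # toy normalized expression values
    donor4 = [0.7, 9.0, 210.0, 2800.0]
    print(spearman(donor3, donor4))
```

Because only ranks enter the calculation, the result does not depend on whether expression values are log-transformed first.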

[Figure S2 image: log2 fold change in PBMC (48 h − 0 h) per gene signature. Left panel, "AML classifier signatures": clusters 1–16; t(8;21), t(15;17), t(11q23), and inv(16)/t(16;16), each up and down; LSC; HSC. Right panel, "AML prognostic signatures": Rapin, 2014 (good); Rapin, 2014 (bad); Metzeler, 2008; Bullinger, 2004; Li, 2013.]

Fig. S2. Log2 fold change after 48 h of incubation (intersection of donors 3 and 4) of published gene sets included in prognostic signatures or used for sample classification. Numbers indicate genes within each signature or subtype. (Left) AML classifier signatures: genes characterizing 16 novel AML subtypes (1), genes that are up- or down-regulated across cytogenetic subgroups (2), and genes corresponding to a leukemic stem cell (LSC) or hematopoietic stem cell (HSC) profile (3). (Right) AML prognostic signatures: genes used for stratifying patients by good or bad prognosis (2) or for general prediction of outcome (4–6).

1. Valk PJ, et al. (2004) Prognostically useful gene-expression profiles in acute myeloid leukemia. N Engl J Med 350(16):1617–1628.
2. Rapin N, et al. (2014) Comparing cancer vs normal gene expression profiles identifies new disease entities and common transcriptional programs in AML patients. Blood 123(6):894–904.
3. Eppert K, et al. (2011) Stem cell gene expression programs influence clinical outcome in human leukemia. Nat Med 17(9):1086–1093.
4. Metzeler KH, et al.; Cancer and Leukemia Group B; German AML Cooperative Group (2008) An 86-probe-set gene-expression signature predicts survival in cytogenetically normal acute myeloid leukemia. Blood 112(10):4193–4201.
5. Bullinger L, et al. (2004) Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N Engl J Med 350(16):1605–1616.
6. Li Z, et al. (2013) Identification of a 24-gene prognostic signature that improves the European LeukemiaNet risk classification of acute myeloid leukemia: An international collaborative study. J Clin Oncol 31(9):1172–1181.

[Figure S3 image. (A) CCNDBP1 at 0 h, 4 h, 8 h, 24 h, and 48 h. (B) Cassette exon: frameshift produces new stop codon; retained intron: in-frame stop codon; both labeled as premature termination codon. (C) Accumulation of NMD substrates for PBMC, bone marrow, AML, and AML (pediatric) samples, grouped by peripheral blood vs. bone marrow and by diagnosis vs. relapse.]

Fig. S3. (A) Intron retention and cassette exon skipping in CCNDBP1 (donor 4). (B) Splicing events that introduce premature termination codons. (C) The relative accumulation of NMD substrates is calculated as in Fig. 3E, and stratified depending on sample origin (bone marrow vs. peripheral blood), patient time point (sample extracted at leukemia diagnosis or during relapse), and the percentage of immature blasts present in the sample. Numbers of samples in each subset are shown to the left. Datasets top to bottom are from refs. 1–3 and dbGaP study no. 2447.

1. Meyer JA, et al. (2013) Relapse-specific mutations in NT5C2 in childhood acute lymphoblastic leukemia. Nat Genet 45(3):290–294.
2. Macrae T, et al. (2013) RNA-Seq reveals spliceosome and proteasome genes as most consistent transcripts in human cancer cells. PLoS ONE 8(9):e72884.
3. Wen H, et al. (2012) New fusion transcripts identified in normal karyotype acute myeloid leukemia. PLoS ONE 7(12):e51203.

[Figure S4 image. (A) PHF20 and LEF1 in patients CLL 182 and CLL 004. (B, C) Labels shown: HIF1A, TNF, STAT4, NUPR1, IL8, PTGES, DUSP8, CCR5, NFkB, TGFB1, IL1B, TLR3, DDIT3, PLAU, CD83, CX3CR1, EGF, PDGFBB, RELA. Panel C y axis: average log2 fold change.]

Fig. S4. (A) PHF20 and LEF1 cassette exons in two B-CLL patients (1), using the same gene regions as in Fig. 1F. Introns are truncated 100 bp from exon junctions (dashed lines). (B) Direct (solid) and indirect (dashed) interactions between putative upstream regulators of the changes in protein-coding gene expression caused by incubation, as reported by the IPA software. Double circles, protein families or dimers (PDGFBB, PDGFB homodimer; NFkB, NFKB1 and NFKB2). (C) Log2 fold change (average of donors 3 and 4) of genes that are downstream targets of multiple regulators from B at 4, 8, 24, and 48 h.

1. Quesada V, et al. (2012) Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat Genet 44(1):47–52.

Table S1. Publicly available liquid tumor datasets

Dataset          Cancer                              n*    Coverage, million reads ± SD   Source
AML (pediatric)  AML                                 61    24 ± 6                         dbGaP†
AML (TCGA)       AML                                 169   77 ± 30                        Ref. 1
AML              AML                                 41    31 ± 8                         Ref. 2
AML              AML                                 17    30 ± 8                         Ref. 3
AML              AML                                 43    97 ± 31                        Ref. 4
aCML             Atypical chronic myeloid leukemia   11    41 ± 15                        Ref. 5
B-CLL            B-cell CLL                          61    42 ± 12                        Ref. 6
B-ALL            B-cell acute lymphocytic leukemia   20    48 ± 9                         Ref. 7
T-ALL            T-cell acute lymphocytic leukemia   12    106 ± 13                       Ref. 4
T-ALL            T-cell acute lymphocytic leukemia   30    48 ± 10                        Ref. 8
Lymphoma         Burkitt lymphoma (B cell)           26    76 ± 16                        Ref. 9
Lymphoma         Non-Hodgkin lymphoma                94    183 ± 41                       Ref. 10

Overview of datasets with at least 10 primary patient samples and publicly available RNA-seq data.
*Number of samples that pass our minimum coverage threshold (SI Materials and Methods).
†dbGaP accession no. phs000218, study no. 2447.

1. Cancer Genome Atlas Research Network (2013) Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 368(22):2059–2074.
2. Wen H, et al. (2012) New fusion transcripts identified in normal karyotype acute myeloid leukemia. PLoS ONE 7(12):e51203.
3. McNerney ME, et al. (2013) CUX1 is a haploinsufficient tumor suppressor gene on chromosome 7 frequently inactivated in acute myeloid leukemia. Blood 121(6):975–983.
4. Macrae T, et al. (2013) RNA-Seq reveals spliceosome and proteasome genes as most consistent transcripts in human cancer cells. PLoS ONE 8(9):e72884.
5. Piazza R, et al. (2013) Recurrent SETBP1 mutations in atypical chronic myeloid leukemia. Nat Genet 45(1):18–24.
6. Quesada V, et al. (2012) Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat Genet 44(1):47–52.
7. Meyer JA, et al. (2013) Relapse-specific mutations in NT5C2 in childhood acute lymphoblastic leukemia. Nat Genet 45(3):290–294.
8. Atak ZK, et al. (2013) Comprehensive analysis of transcriptome variation uncovers known and novel driver events in T-cell acute lymphoblastic leukemia. PLoS Genet 9(12):e1003997.
9. Schmitz R, et al. (2012) Burkitt lymphoma pathogenesis and therapeutic targets from structural and functional genomics. Nature 490(7418):116–120.
10. Morin RD, et al. (2011) Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 476(7360):298–303.
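The n and coverage columns of Table S1 follow from the coverage rule in SI Materials and Methods (samples with <10 million reads mapping to protein-coding transcripts are excluded). A minimal sketch of that bookkeeping is shown here; the function name, the sample identifiers, and the read counts are hypothetical, and the use of the sample standard deviation is an assumption because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_coverage(reads_by_sample, min_reads=10_000_000):
    """Drop low-coverage samples, then report kept IDs, mean and SD in millions.

    reads_by_sample maps a sample ID to its number of reads mapping to
    protein-coding transcripts.
    """
    kept = {s: r for s, r in reads_by_sample.items() if r >= min_reads}
    millions = [r / 1e6 for r in kept.values()]
    avg = mean(millions) if millions else float("nan")
    # stdev needs at least two values; report NaN otherwise.
    sd = stdev(millions) if len(millions) > 1 else float("nan")
    return sorted(kept), avg, sd


if __name__ == "__main__":
    toy = {"s1": 9_000_000, "s2": 20_000_000, "s3": 30_000_000}
    ids, avg, sd = summarize_coverage(toy)
    print(f"n = {len(ids)}, coverage = {avg:.0f} ± {sd:.0f} million reads")
```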

Table S2. PBMCs from healthy donors

Donor  Sex  Treatment         Hours  Coverage, million reads  RIN*     Figure†
01     XX   None              0      35.6                     9.2      1 C–E and 3E
01     XX   Cryopreservation  1      83.3                     8.4      1 C–E and 3E
01     XX   Cryopreservation  24     49.7                     8.4      1 C–E and 3E
01     XX   Cryopreservation  48     45.2                     8.5      1 C–E and 3E
02     XY   None              0      42.9                     9.1      1 C–E and 3E
02     XY   Cryopreservation  1      47.5                     8.1      1 C–E and 3E
02     XY   Cryopreservation  24     33.2                     8.2      1 C–E and 3E
02     XY   Cryopreservation  48     55.1                     8.4/8.9  1 C–E and 3E
03     XX   None              0      35.4                     9.0      1–4
03     XX   None              4      33.0                     8.8      1–4
03     XX   None              8      36.9                     8.8      1–4
03     XX   None              24     46.7                     8.7      1–4
03     XX   None              48     46.8                     8.2      1–4
03     XX   Incubated on ice  0      41.9                     9.3      4B
03     XX   Incubated on ice  24     35.4                     9.0      4B
03     XX   Incubated on ice  48     49.0                     8.6/8.7  4B
04     XY   None              0      48.2                     9.0      1–4
04     XY   None              4      45.1                     9.1      1–4
04     XY   None              8      46.3                     9.1      1–4
04     XY   None              24     48.4                     9.2      1–4
04     XY   None              48     42.9                     8.2      1–4
04     XY   Incubated on ice  0      46.4                     9.0      4B
04     XY   Incubated on ice  24     33.2                     8.6      4B
04     XY   Incubated on ice  48     39.9                     9.1      4B

Number of samples from healthy blood donors and sample treatment before RNA sequencing.
*Where two RIN numbers are listed, RNA from two separate vials was pooled before generating the RNA sequencing libraries.
†The figures in which the sample is being used.

Other Supporting Information Files

Dataset S1 (XLS)
Dataset S2 (XLS)
Dataset S3 (XLS)
