bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
1 Recapitulation of human germline coding variation in an ultra-mutated infant leukemia
2 Alexander M Gout1, Rishi S Kotecha2,3,4, Parwinder Kaur5,6, Ana Abad1, Bree Foley7, Kim W 3 Carter8, Catherine H Cole3, Charles Bond9, Ursula R Kees2, Jason Waithman7,*, Mark N 4 Cruickshank1*
5 1Cancer Genomics and Epigenetics Laboratory, Telethon Kids Cancer Centre, Telethon Kids 6 Institute, University of Western Australia, Perth, Australia. 7 2Leukaemia and Cancer Research Laboratory, Telethon Kids Cancer Centre, Telethon Kids 8 Institute, Perth, Australia. 9 3Department of Haematology and Oncology, Princess Margaret Hospital for Children, Perth, 10 Australia. 11 4School of Medicine, University of Western Australia, Perth, Australia. 12 5Personalised Medicine Centre for Children, Telethon Kids Institute, Australia. 13 6Centre for Plant Genetics & Breeding, UWA School of Agriculture & Environment. 14 7Cancer Immunology Unit, Telethon Kids Institute, Perth, Australia. 15 8McCusker Charitable Foundation Bioinformatics Centre, Telethon Kids Institute, Perth, 16 Australia. 17 9School of Molecular Sciences, The University of Western Australia, Perth, Australia. 18 19 * These authors contributed equally to this work. Correspondence and requests for 20 materials should be addressed to J.W. ([email protected]) or to M.N.C. 21 ([email protected]). 22 23 Keywords: infant acute lymphoblastic leukemia, hypermutation, mismatch repair defect, 24 MSH2 25 Running title: Ultra-mutation targets germline alleles in an infant leukemia 26 Word Count: (4608) 27 Total number of Figures/Tables: 5 figures and 2 tables 28 Financial support: Children’s Leukaemia and Cancer Research Foundation (Inc.) Fellowship 29 (MNC); NHMRC Career Development Fellowship 1066869 (JW); Raine Medical Research 30 Foundation Clinician Research Fellowship 2016-2018 (RSK); Children’s Leukaemia and 31 Cancer Research Foundation (Inc.) project grants (URK, MNC, RSK); Ian Potter Foundation 32 (AG); Healy Foundation (AG). Telethon Kids Institute Blue Sky Research Grant (MNC, 33 URK, RSK, JW). Telethon Kids Institute Research Focus Area Precision Medicine Working 34 Group Funding (JW, MNC). 35 Conflicts of interest: The authors declare no conflicts of interest.
1 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
36 Abstract: Mixed lineage leukemia/Histone-lysine N-methyltransferase 2A gene
37 rearrangements occur in 80% of infant acute lymphoblastic leukemia, but the role of
38 cooperating events is unknown. Infant leukemias typically carry few somatic lesions, however
39 we identified a case with over 100 somatic point mutations per megabase. The patient
40 presented at 82 days of age, among the earliest onset of cancer hypermutation recorded. The
41 transcriptional profile showed global similarities to canonical cases. Coding lesions were
42 predominantly clonal and almost entirely targeting alleles reported in human genetic variation
43 databases with a notable exception in the mismatch repair gene, MSH2. The patient’s
44 diagnostic leukemia transcriptome was depleted of rare and low-frequency germline alleles
45 due to loss-of-heterozygosity, while somatic point mutations targeted low-frequency and
46 common human alleles in proportions that offset this discrepancy. Somatic signatures of ultra-
47 mutations were highly correlated with germline single nucleotide polymorphic sites indicating
48 a common role for 5-methylcytosine deamination, DNA mismatch repair and DNA adducts.
49 These data suggest similar molecular processes shaping population-scale human genome
50 variation also underlie evolution of an infant ultra-mutated leukemia case.
2 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
51 Introduction:
52 Histone-lysine N-methyltransferase 2A (KMT2A) (also known as Mixed lineage
53 leukemia) gene rearrangements (KMT2A-R) at 11q23 occur in infant, pediatric, adult and
54 therapy-induced acute leukemias and are associated with a poor prognosis. KMT2A-R occurs
55 in around 80% of infant acute lymphoblastic leukemia (ALL) and 35-50% of infant acute
56 myeloid leukemia (AML) involving more than 60 distinct partner genes 1. KMT2A-R infant
57 ALL (iALL) genomes have an exceedingly low somatic mutation burden, with recurring
58 mutations in RAS-PI3K complex genes. Compared to wild-type KMT2A, the fusion
59 oncoprotein has altered histone-methyltransferase activity, causing epigenetic and
60 transcriptional deregulation. While KMT2A-oncoproteins are strong drivers of
61 leukemogenesis, several lines of evidence suggest that KMT2A-R alone is insufficient for
62 tumor initiation. For example, there is significant variability in latency of disease onset in
63 experimental models of KMT2A-fusions 2, 3 and in patients with KMT2A-R detected in
64 neonatal blood spots 4. Furthermore, KMT2A-R acute leukemias 5 in children are enriched for
65 mutations affecting epigenetic regulatory genes and on average harbor more somatic lesions
66 than infants. These observations have prompted speculation that iALL is a distinct
67 developmental and genetic entity and pre-leukemic transformation occurs in utero, targeting a
68 cell of origin with a genetic/epigenetic permissive predisposition 6.
69 As a step towards defining cooperating events we performed next-generation
70 sequencing on a KMT2A-R iALL patient cohort and integrated the data with a larger set of
71 acute leukemias. We investigated the combined germline and somatic mutation profiles,
72 identifying a highly mutated specimen from a patient diagnosed at 82 days of age. Here we
73 report sequence analysis of this unique genetic entity in relation to its canonical KMT2A-R
74 iALL counterparts, revealing common mechanisms driving leukemia ultra-mutation and
75 population-scale human genome variation.
76
3 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
77 Methods:
78 Patient specimens, external sequencing data and controls
79 The study cohort consisted of ten infants diagnosed at Princess Margaret Hospital for
80 Children (Perth), of which we have previously reported RNA-seq data for five patients 7.
81 Additional RNA-seq data was downloaded for 39 KMT2A-R iALL cases at diagnosis, six
82 cases at relapse, six cases lacking the KMT2A-R (KMT2A-germline) and for pediatric
83 KMT2A-R patients at diagnosis, comprising 10 AML cases, seven ALL cases and one
84 undifferentiated case as reported previously by Andersson et al. 5 (accession
85 EGAS00001000246). RNA-seq data comprising 28 KMT2A-R and 52 KMT2A-germline adult
86 diagnosistic AML cases downloaded from the Short Read Archive (BioProject Umbrella
87 Accession PRJNA2787668; Leucegene: AML sequencing). Normal human bone marrow
88 transcriptomes were downloaded from the Short Read Archive (BioProject PRJNA284950)
89 comprising 20 pooled sorted human hematopoietic progenitor populations 9. The full list of
90 samples analysed in this study including those downloaded from external sources is
91 summarised in Supplementary Table S1.
92 Sample preparation and next-generation sequencing
93 RNA was isolated with TRIzol™ Reagent (Invitrogen Life Science Technologies,
94 Carlsbad, CA, USA) and purified with RNeasy Purification Kits (Qiagen, Hilden, Germany).
95 Genomic DNA was isolated with QIAamp DNA Mini Kit (Qiagen). Nucleic acid integrity
96 and purity was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa
97 Clara, CA). RNA-seq was performed at the Australian Genome Research Facility, Melbourne
98 with 1 μg RNA as previously reported 7. Whole exome sequencing (WES) was performed by
99 the Beijing Genome Institute using SureSelect XT exon enrichment (Agilent Technologies)
100 with 200 ng of remission DNA (v4 low input kit) or 1 μg of diagnostic DNA (v5). Diagnostic
101 RNA and remission genomic DNA sequencing was performed on a HiSeq 2000 (Illumina,
102 Inc., San Diego, CA, USA) instrument with 100bp paired end reads; diagnostic genomic
4 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
103 DNA sequencing was performed on a HiSeq 2500 with 150bp paired end reads. Sequencing
104 libraries were constructed using Illumina TruSeq kits (Illumina, Inc.).
105 Bioinformatics
106 For gene-expression analysis, RNA-seq reads were mapped to hg19 with HISAT and
107 gene-counts generated with HTseq 10. Data was normalized with RUV-seq 11 implementing
108 factor analysis of negative control genes identified by fitting a linear model with cell type
109 (hematopoietic sub-set or leukemia sub-type) as a covariate. voom! 12 was used to adjust for
110 library size and generate counts per million (cpm) values. KMT2A-fusion split-reads were
111 identified using RNA-seq data with FusionFinder 13 using default settings.
112 Single nucleotide variants (SNVs) were identified from RNA-seq data using the
113 intersection of two methodologies: (1) the “Genome Analysis Toolkit (GATK) Best Practices
114 workflow for SNP and indel calling on RNAseq data” and (2) SNPiR 14. WES data was
115 processed based on the GATK Best Practices workflow. Variants were characterized using
116 custom scripts that extract the coverage of sequenced bases from VCF and bam files,
117 determining the overlap among samples, and calculating variant allele fractions (VAFs).
118 High confidence somatic mutations were identified using VarScan2 (v2.3.9) 15 setting
119 cut-offs of P-value<0.005; coverage > 50X; VAF > 0.1 or VAF > 0.3 to enrich for clonal
120 mutations. Variants were annotated with ANNOVAR 16 and with metadata from additional
121 sources including seven functional/conservation algorithms (PolyPhen2 17, Sift 18,
122 MutationTaster 19, likelihood ratio test 20, GERP 21, PhyloP 22, and CADD 23) and databases of
123 human genetic variation (ExAC, 1000 genomes, HapMap, dbSNP, COSMIC and ClinVar).
124 Predicted functional alleles were assigned for variants called as damaged/deleterious by at
125 least 4/7 algorithms.
126 Somatic mutational spectra were investigated using the R package, DeconstructSigs
127 (v1.8.0) 24 calculating the fraction of somatic substitutions within the possible 96-trinucleotide
128 contexts relative to the exome background. Copy number analysis was performed on WES
5 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
129 data using VarScan2 to calculate log2 depth ratio and GC-content and Sequenza (v2.1.2) 25 to
130 identify chromosomal segments with copy loss. Gene ontology enrichment was performed
131 using GREAT (v3.0.0) 26 bioinformatics tools. Mutations were inspected by mapping the
132 genomic position to crystal structures in the Protein DataBank27 and analyzed with Pymol
133 (v1.8.6.2).
134 Data and code availability
135 Scripts used to execute variant detection software and analysis are available at
136 https://github.com/cruicks/ultramutated-case under an MIT license and are being developed
137 on Github (https://github.com/cruicks/vcfbamCompare). The sequencing data has been
138 deposited in the Short Read Archive with study ID PRJNA296746.
139
140 Results:
141 Characteristics of KMT2A-R iALL cohort and RNA-seq variant detection pipeline
142 We reported previously KMT2A-R fusion transcripts for 5/10 cases 7, and of the
143 remaining five KMT2A-R iALL we detected KMT2A-MLLT3 (P401, P438), KMT2A-AFF1
144 (P848), KMT2A-MLLT1 (P706) and KMT2A-EPS15 (P809). KMT2A-fusion transcripts were
145 therefore detected in all samples and reciprocal fusion sequences detected in 5/10 samples
146 (Table 1). To determine the validity of our RNA-seq based pipeline for detecting rare and
147 common alleles, we examined the concordance of single nucleotide variants (SNVs) called
148 from RNA-seq and WES data. Across seven samples with matched WES and RNA-seq data,
149 we identified a total of 54,792 point mutations from RNA-seq reads of which 166 were absent
150 from matched WES (coverage>20 reads), representing a discordance rate of 0.3% (Figure 1a;
151 Supplementary Table S2). In one sample, P706 there were two events called with RNA-seq
152 data that were not identified in matched WES but with evidence extracted using the Integrated
153 Genome Browser at previously reported somatic mutation sites in FLT3 (encoding FLT3
6 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
154 p.V491L 28; 200/1,719 alternate RNA-seq reads; 15/214 alternate WES reads) and KRAS
155 (encoding KRAS p.A146T 29; 24/159 alternate RNA-seq reads; 6/51 alternate WES reads).
156 Furthermore, analysis of 117 discordant events with coverage > 50 reads, revealed 76
157 recurrences in two or more samples (64%) at 26 unique sites, however none of these sites
158 have been validated as somatic mutations in cancer to date in the Catalog of Somatic
159 Mutations in Cancer (COSMIC)30 or The Pediatric Cancer Genome Project (PCGP)31
160 databases. These results indicate a high rate of reproducibility of the RNA-seq variant
161 detection pipeline and suggest that RNA-DNA differences mostly comprise artifacts and/or
162 novel RNA-editing events 32 and a minority of putative somatic alterations exclusively
163 detected by RNA-seq.
164 Identification of a highly mutated infant leukemia
165 Expressed somatic mutations were defined as variants detected in both diagnostic
166 leukemia RNA-seq and WES, but absent in remission WES at sites with >20 reads (Figure
167 1b-d). Most patients (5/6) showed the expected “silent genomic landscape” (Supplementary
168 Table S3) except for patient P337 (Figure 1b-d; Supplementary Table S4). Three patients
169 expressed RAS-family hot-spot somatic mutations (P848 and P706: KRAS p.A146T and
170 P438: NRAS.p.G12S). We detected no expressed somatic missense mutations in patients
171 P399 and P401. The pair of monozygotic twins (P809 and P810) lacked private expressed
172 mutations but we detected two shared expressed missense mutations clustered in the PER3
173 gene (encoding PER3.p.M1006R and PER3.p.K1007E). These results are consistent with
174 recurring RAS-pathway mutations and low mutation rate reported previously 5, 33. However,
175 in patient P337 RNA-seq reads we detected 1,420 expressed somatic mutations
176 (Supplementary Table S4). The overlap of both rare and common alleles across the matched
177 datasets verifies the common origin of RNA-seq and diagnostic/remission WES data ruling
178 out sample contamination or mislabeling (Figure 1a-c). We previously reported a complex
179 KMT2A translocation involving 2q37, 19p13.3 and 11q2334 in the highly mutated leukemia
180 sample and established two cell lines from the diagnostic sample 7.
7 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
181 Characterisation and clonal analysis of somatic lesions
182 Somatic lesions were further evaluated in six cases by comparing diagnostic and remission
183 WES. Two different variant allele fraction (VAF)-cut-offs, either VAF>0.1 or VAF>0.3, were
184 chosen to enrich for sub-clonal and clonal events respectively. There were on average ~82.2
185 (range 69-95) sub-clonal and ~3.4 clonal (range 3-4) somatic point mutations in canonical
186 KMT2A-R iALL exomes, confirming previous studies of a low somatic mutation rate in the
187 dominant clone (Table 1). In contrast, there were 198 sub-clonal and 5,054 clonal somatic
188 point mutations in the highly-mutated sample, equating to a rate of 139 mut/Mb, classifying
189 this sample as ultra-mutated. There were also ~40-fold increased number of indels in the
190 dominant clone of the ultra-mutated sample (n=260) compared to typical cases (average =
191 6.6; range: 3-9) including 13 predicted to cause a frame-shift protein-coding truncation (Table
192 1; Supplementary Table S5). There were 2,420 exonic substitutions encoding 1,109 missense
193 alleles and 12 stop-gain alleles (Supplementary Table S5). We also noted that the ultra-
194 mutated case carried an excess of loss-of-heterozygosity (LOH) positions, with 6,555 high
195 confidence sites; vastly more than typical cases which had a mean of 57.2 LOH positions
196 (range 28-100) (Figure 2a-b; Table 1).
197 WES was performed on two independent cell lines derived from the ultra-mutated diagnostic
198 specimen, PER-784A and PER-826A, to characterize clonal somatic events. (Figure 2c-d).
199 Most somatic and LOH-events detected in the primary ultra-mutated sample were also called
200 in both, PER-784A and PER-826A cell lines using P337 remission WES for all comparisons
201 (Figure 2d). These data suggest that the somatic lesions are largely clonal in agreement with
202 the VAF-distributions (Figure 2b, d). In total, 4,754/5,054 (94%) somatic events
203 (Supplementary Table S6) and 6,150/6,575 (93%) LOH-events (Supplementary Table S7)
204 were called in the derived cell lines. Moreover, there were relatively few discordant sites
205 when directly comparing cell lines with the primary diagnostic sample (PER-784A: 34
206 discordant sites; PER-826A: 44 discordant sites; Supplementary Table S6)). We further
207 investigated evidence of clonal somatic lesions in cell lines using an alternative approach, by
8 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
208 directly extracting quality filtered WES reads mapping to coding somatic point mutations
209 (n=2,411; Q>50 in diagnosis WES; Figure 2e) and LOH-sites (n=3,117; Q>50 in remission
210 WES; Figure 2f; Supplementary Table S8). All coding clonal somatic mutations were
211 supported by reads in both cell lines of which 121 (5%) were homozygous (VAF>0.95 in cell
212 line and VAF>0.9 in primary sample). Most LOH-sites were evidenced by reference alleles
213 (VAF<10%; 2,179/3,117) in cell lines, with ~30% of LOH-sites evidenced by alternate alleles
214 (VAF> 90%; 926/3,117) and only 12 sites (0.3%) with both reference and alternate alleles
215 (VAF range 10-90%).
216 We performed copy number analysis of the ultra-mutated sample to map LOH-sites to
217 chromosomal segments with evidence of deletions. Most genomic regions were called as
218 diploid with limited chromosomal copy loss detected (Supplementary Figure S1a-b). There
219 was a total of 382/6,555 (5.8%) LOH-sites mapping to homozygous deletions, suggesting
220 only a minority of the identified LOH-sites likely arose through chromosome segment
221 deletion. Infant KMT2A-R ALL typically carry a small number of copy-number alterations
222 which may coincide with LOH-sites. We determined the distance of LOH-sites, somatic
223 mutations and germline variants (called by VarScan2) to the nearest homozygous SNV in
224 each leukemia diagnostic sample finding that in canonical cases LOH-sites tend to be located
225 closer to homozygous sites than either germline or somatic variants (Supplementary Figure
226 S1c; left panel) and show a unique bimodal distribution (Supplementary Figure S1c; right
227 panel). In contrast the average distance of LOH-sites to homozygous SNVs in the ultra-
228 mutated sample was similar to that of germline variants and somatic mutations with each
229 displaying a similar distribution (Supplementary Figure S1c). Thus, most LOH-sites in the
230 ultra-mutated specimen do not show evidence of localising within deleted or copy-neutral
231 loss-of-heterozygosity chromosomal segments.
232 In summary, these data demonstrate that somatic lesions in the primary sample were
233 predominantly clonal and LOH-sites appear to comprise a large proportion of somatic
234 substitutions rather than chromosomal alterations. We also note relatively few coding
9 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
235 sequence changes in derived cell lines compared to the dominant diagnostic leukemic clone,
236 both which are capable of long-term propagation. These results could reflect loss of the
237 hypermutator phenotype during in vitro culture or in diagnostic leukemia cells.
238
239 Signatures of somatic mutations and LOH-sites correlate with profiles of germline alleles
240 We next investigated the sequence features of somatic and LOH substitutions in the
241 ultra-mutated sample (Figure 3). We hypothesised that LOH-sites and somatic mutations may
242 show common feature biases depending on the context of alteration. Somatic events were
243 evaluated separately that induce a heterozygous mutation (n=4,773) or homozygous mutation
244 (n=281), and LOH causing reference→alternative changes (n=2,057) were evaluated
245 separately to alternative→reference changes (n=4,453). Since we noted an overlap in the
246 somatic mutations of the ultra-mutated sample with variants in the ExAC database (Figure 1a-
247 c), we first examined the distribution of these sites with respect to the derived allele frequency
248 in ExAC. We observe that each of these categories of sequence alterations in the ultra-
249 mutated sample were distributed across the range of population allele-frequencies (Figure 3a-
250 b). While the distributions of LOH and somatic sites were similar, somatic heterozygous
251 changes were relatively more frequent at higher frequency alleles as were LOH
252 reference→alternate alleles.
253 Somatic signatures of ultra-mutations were investigated in comparison to
254 heterozygous and homozygous single nucleotide polymorphisms (SNPs) in patient remission
255 samples. Heterozygous positions were defined by VAF<0.9 yielding 51,347 heterozygous and
256 31,237 homozygous SNPs in canonical remission samples and 10,681 heterozygous and 6
257 617 homozygous SNPs in the ultra-mutated remission sample. Analysis of WES remission
258 variants from canonical patients and the ultra-mutated sample revealed similar ExAC allele
259 frequency distributions with distinctions between heterozygous and homozygous alleles
260 (Figure 3c-d). The overall enrichment of each category of somatic substitution and LOH
10 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
261 across the 96 possible trinucleotide contexts resembled patient remission SNPs with elevated
262 proportions of C>T transitions at NpCpG motifs (Figure 3e).
263 Pre-defined signatures were used to infer mutational profiles revealing Signature 1A
264 (associated with spontaneous deamination of 5-methylcytosine), Signature 12 (associated
265 with unknown process with strand-bias for T→C substitutions) and Signature 20 (associated
266 with defective DNA mismatch repair, MMR) as enriched in each of the mutation/variant sets,
267 however with distinctive proportions (Figure 3f). Heterozygous somatic mutations and LOH
268 alternative→reference sites were additionally enriched for Signature 1B (variant of Signature
269 1A) and showed highly correlated profiles (Pearson’s correlation 0.919). LOH
270 reference→alternative sites were additionally enriched for Signature 5 (associated with
271 unknown process with strand-bias for T→C substitutions) and were highly correlated with
272 homozygous somatic mutation profiles (0.912) and to profiles of patient heterozygous
273 positions (P337: 0.935; canonical: 0.926). Homozygous somatic mutations also displayed
274 similarities to heterozygous patient SNPs (P337: 0.981; canonical: 0.985). Overall the somatic
275 mutational profiles indicate highly related patterns of both somatic ultra-mutations and LOH-
276 sites. Notably LOH alternative→reference sites showed unique and overlapping features to
277 somatic heterozygous mutations. We also observe underlying similarities in the profiles of
278 somatic substitutions and germline alleles.
279
280 Ultra-mutation distribution recapitulates common and rare germline variation
281 We noted a similar spectrum of rare and common alleles in the ultra-mutated
282 leukemia transcriptome compared to typical KMT2A-R iALL cases; but there were
283 comparatively fewer rare/low-frequency alleles shared with the remission sample (Figure 1a-
284 c; ExAC<0.5%). We further explored this trend to disentangle the contribution of somatic
285 acquired alleles and LOH-sites. Analysis of the remission and diagnostic samples from this
286 patient in isolation, revealed a similar proportion of rare/low-frequency coding alleles
11 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
287 detected by WES and both were proportionally within the same range as canonical
288 counterparts (Figure 4a). However, when directly comparing the remission and diagnostic
289 WES, we noted a modest depletion of germline rare/low-frequency coding SNV alleles
290 defined by VarScan2 (Figure 4a; 2.9% rare alleles in P337 compared to 3.5±0.2% in
291 canonical cases). This apparent depletion of germline alleles in the ultra-mutated leukemic
292 sample was coincident with an increased proportion of somatic acquired rare/low-frequency
293 coding SNV alleles compared to LOH-sites (Figure 4a). Therefore, when considering the total
294 pool of SNVs in the leukemic coding genome, the somatic gains and losses of rare/low-
295 frequency coding SNV alleles were in proportions that resulted in a similar distribution as
296 canonical cases (Figure 4a). We extended this observation by investigating the rare and
297 common allele abundance of expressed germline-encoded variants and somatic mutations in
298 the ultra-mutated leukemic sample compared with a larger cohort of KMT2A-R iALL
299 transcriptomes (Figure 4b; n=32 St. Jude patients and n=8 Perth patients). Most typical
300 KMT2A-R iALL diagnostic specimens expressed similar numbers of rare and low-frequency
301 expressed SNVs, except for 4/32 cases sequenced by St Jude investigators, which had an
302 elevated number of expressed rare and low-frequency ExAC alleles (Figure 4b). Excluding
303 these samples, typical KMT2A-R iALL patients expressed on average 42.3 ±10.4 (standard
304 deviation) variants that were absent from ExAC and 212 ± 32.5 variants with an allele
305 frequency of <0.5% in ExAC. The total number of expressed variants in the ultra-mutated
306 sample was slightly lower than typical cases (38 absent from ExAC and 174 AF <0.5% in
307 ExAC), and were approximately comprised of half germline (17/38 and 88/174 respectively)
308 and half somatic acquired alleles. Overall, the ultra-mutated transcriptome had the lowest
309 prevalence of germline-encoded rare and low-frequency alleles among the larger sample size
310 (Table 2). Furthermore, the somatic mutation abundance within each allele-frequency bin was
311 proportional to their abundance in prototypical silent cases. Consequently, the low numbers of
312 rare germline alleles in the ultra-mutated transcriptome were off-set by somatic mutations
313 such that the total numbers were within the same range as typical cases (Figure 4b-c). We
314 conclude that despite carrying thousands of somatic mutations, the coding-genome of ultra-
12 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
315 mutated patient P337 shows similarities to canonical KMT2A-R iALL cases, vis-à-vis rare
316 allele prevalence and representation of common human genetic variation.
317 RAS-pathway and DNA-replication and repair gene mutations
318 Gene ontology analysis of the expressed non-synonymous somatic mutations (n=535)
319 revealed over-representation of RAS-pathway genes (7/23 genes; 7-fold enriched; P=0.049;
320 RALGDS, RAC1, CHUK, PIK3R1, NFKB1, PLD1 and PIK3CA) and “MLL signature 1 genes”
321 (33/369 genes; 2-fold enriched; P=0.046). Of the 535 non-synonymous expressed somatic
322 mutations, 45 were predicted deleterious/damaging, including in the MMR gene MSH2
323 (p.E489K). The MSH2 somatic mutation localizes to a hot-spot, within the MMR ATPase
324 domain in a region at the periphery of the DNA binding site, approximately 20 Å from the
325 DNA molecule in the complex structure of full-length MSH2, a fragment of MSH6, ADP and
326 DNA containing a G:T mispair 35. The mutation encodes a substitution of a negatively
327 charged glutamate with a positively charged lysine (Figure 4d). There was a non-synonymous
328 somatic mutation in another MMR gene, MUTYH; however, this site is a common human
329 allele with a derived AF of 0.297 and was not called as damaged/deleterious by our prediction
330 pipeline. There were no somatic mutations, LOH-sites nor rare/deleterious coding germline
331 alleles detected in POLE or POLD1 which are frequently co-mutated with MSH2 in ultra-
332 mutated tumours 36, 37 and in sporadic early-onset colorectal cancer 38. Among the 45
333 expressed and predicted damaging somatic mutations, there were two rare substitutions in
334 DNA polymerase encoding genes: POLG2 (p.D308V; ExAC AF 0.0000165), which encodes
335 a mitochondrial DNA polymerase and POLK (p.R298H; ExAC AF 0.0003), which encodes
336 an error prone replicative polymerase that catalyzes translesion DNA synthesis. The POLK
337 mutation maps to a conserved impB/mucB/samB family domain (Pfam: PF00817) involved in
338 UV-protection. The encoded substitution of arginine to histidine is located approximately 20
339 Å from the DNA molecule in the structure of a 526 amino acid N-terminal Pol K fragment
340 complexed with DNA containing a bulky deoxyguanosine adduct induced by the carcinogen
341 benzo[a]pyrene (Figure 4e) 39. Both wild-type and mutant residues are positively charged,
13 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
342 however arginine is strongly basic (pI 11) while histidine is weakly basic (pI 7) and titratable
343 at physiological pH 40. We additionally observed an expressed and predicted functional
344 germline rare allele in ATR, a master regulator of the DNA damage response (p.L2076V;
345 ExAC AF 0.0002).
346 Global transcriptional similarities of the ultra-mutated and canonical infant leukemias
347 We next examined the transcriptional profile of the highly mutated KMT2A-R iALL
348 specimen in comparison to a collection of patient leukemia samples and purified blood
349 progenitor populations (Figure 5). As expected cell type was the major source of variation
350 distinguishing normal hematopoietic progenitor, lymphoblastic leukemias and myeloid
351 leukemias. We also observe clustering according to KMT2A translocation status and partner
352 genes as reported previously 41. Replicate samples of the highly mutated KMT2A-R iALL
353 specimen cluster with canonical counterparts demonstrating global transcriptional similarities
354 (Figure 5).
14 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
355 Discussion:
356 We report sequence analysis of an ultra-mutated acute lymphoblastic leukemia
357 diagnosed in an infant at 82 days of age which, to our knowledge, represents the earliest onset
358 of a hypermutant tumor recorded. The mutation rate observed was at the upper limit of cases
359 reported previously in childhood cancers 36, 37. By integrating analysis of somatic mutations
360 and germline alleles with a larger cohort of KMT2A-R iALL patients we found that the high
361 somatic mutation rate does not generate large numbers of rare alleles compared to typical
362 cases. Rather, we observe that the ultra-mutated KMT2A-R leukemic coding genome closely
363 resembles the rare allele landscape of canonical cases. This apparent paradox is explained by
364 the net effect of somatic gains and losses, both of which were targeted to rare and common
365 alleles in proportions that off-set the depleted pool of germline alleles observed in the
366 leukemic coding sequences. Our data indicate both the rare allele burden and transcriptional
367 profile of the ultra-mutated leukemia were comparable to its canonical counterparts.
368 Therefore, despite its radically different genetic route to leukemogenesis, the molecular
369 profile of the ultra-mutated specimen seems to have converged towards that of canonical
370 cases.
371 Most ultra-mutated childhood cancers reported previously have a combined MMR
372 and proof reading polymerase deficiency 36, 37 - we detect evidence for the former (MSH2
373 point mutation), but not the latter. However, we identify somatic mutations in two genes
374 encoding DNA polymerases (POLK and POLG2) and an additional MMR gene (MUTYH).
375 Notably, the POLK mutation is a rare allele, predicted functional and encodes pol κ, a
376 replicative error-prone translesion DNA polymerase that plays a role in bypassing unrepaired
377 bases that otherwise block DNA replication 42, 43. For example, pol κ preferentially
378 incorporates dAMP at positions complementary to 7,8-dihydro-8-oxo-2′-deoxyguanosine
379 adducts 44. As a corollary to these observations, POLK mutations have been found in prostate
380 cancer with elevated C→T transition 45. Further studies are required to determine if any of
15 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
381 these mutations cooperate to specify the unique mutational pattern observed in the ultra-
382 mutated KMT2A-R iALL genome.
383 The largest fraction of ultra-mutations was attributed to Signature 1, associated with
384 5-methylcytosine deamination inducing C→T transitions, which accumulates in normal
385 somatic cells with age over the human lifespan 46. This signature is found ubiquitously across
386 tumour sub-types 47, and prominently in hypermutated early onset AML patients with
387 germline MBD4 mutations 48. LOH-sites (reference→alternative) were enriched for Signature
388 5 which is also correlated with human aging. We observe enrichment of Signature 20 in
389 LOH-sites and somatic alterations, consistent with an association with a MMR-defect 47. This
390 signature has also been reported as enriched in tumours with germline-MMR and secondary
391 mutation in the POLD1 proof-reading gene 47. We also find enrichment of Signature 12 which
392 arises from a poorly characterized underlying molecular process; however, the dominant base
393 transitions, A→G and T→C have been observed in association with DNA damage (e.g. by
394 adducts) in normal and cancer tissues 49, 50. These observations suggest common mutagenic
395 and repair deficits driving age-related mutation in humans were operative in the cell of origin
396 during pre-natal evolution of the ultra-mutated infant leukemia genome. Collectively, these
397 results implicate the MSH2 point mutation as the primary MMR-hit, followed by an
398 additional event effecting a proof reading polymerase such as POLK, however, our data imply
399 that POLE and POLD1 mutations are not cooperative in this process.
400 The preponderance of ultra-mutated sites to overlap germline alleles in proportion to
401 their population allele frequencies demonstrates that hypermutation can recapitulate aspects
402 of population scale human genetic variation in a cancer genome. Aproximately half of the
403 somatic substitutions appear to target heterozygous SNPs resulting in loss-of-heterozygosity.
404 We cannot rule out a role for focal chromosomal alterations undetectable by our copy-number
405 and zygosity analyses. However collectively our data suggest that a major fraction of the
406 LOH-sites arose by point mutagenesis of heterozygous positions. They more frequently target
407 the alternate allele inducing reversion to the reference base, with around 30% of events
16 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
408 resulting in a homozygous non-reference allele in the diagnostic sample. Most ultra-mutations
409 are therefore likely passengers coinciding with common human alleles, consistent with
410 findings that hypermutated tumours tend to have relatively few drivers proportional to their
411 mutation burden 51; however, as proposed previously by Valentine and colleagues52, a
412 minority of rare alleles may contribute to infant leukemogenesis. While our results do not
413 resolve the temporal order of the key oncogenic events - namely MSH2 point mutation,
414 genomic hypermutation and KMT2A translocation – due to the clonal nature of these events
415 they appear to have occurred in rapid succession. This could be explained by a sentinel DNA
416 repair defect inducing hypermutation and genome instability followed by positive selection of
417 cooperating lesions conferring a growth advantage resulting in clonal expansion and fixation
418 of passengers 53. An alternative model is also possible in which the MSH2 MMR-defect
419 impaired double-strand-break repair 54, 55 promoting KMT2A-rearrangement, but with neutral
420 selection of somatic point mutations. Such a scenario appears inconsistent with the observed
421 predominance of clonal mutations, but could be reconciled if the hypermutator phenotype was
422 lost rapidly after the KMT2A-rearrangement and prior to clonal expansion. Since pervasive
423 positive selection has been observed in cancer and non-cancerous tissues 51, 56, we favour the
424 former model as being more plausible. However, since we noted loss of hypermutator
425 phenotype in patient-derived cell lines, the later model is not ruled out by our data.
426 Collectively our results illuminate a link between mutagenesis during human aging, germline
427 variation and somatic ultra-rmutation in cancer. Further studies are required to determine if
428 combinations of somatic mutations and/or germline alleles promote KMT2A-R iALL
429 leukemogenesis/hypermutation and if somatic mutated sites significantly overlap human
430 germline alleles in other malignancies.
17 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
431 Financial support: Children’s Leukaemia and Cancer Research Foundation (Inc.) Fellowship
432 (MNC); NHMRC Career Development Fellowship 1066869 (JW); Raine Medical Research
433 Foundation Clinician Research Fellowship 2016-2018 (RSK); Children’s Leukaemia and
434 Cancer Research Foundation (Inc.) project grants (URK, MNC, RSK); Ian Potter Foundation
435 (AG); Healy Foundation (AG). Telethon Kids Institute Blue Sky Research Grant (MNC,
436 URK, RSK, JW). Telethon Kids Institute Research Focus Area Precision Medicine Working
437 Group Funding (JW, MNC).
438 Acknowledgments: We thank Daniel MacArthur and Fengmei Zhao for assisting with exome
439 sequence analysis; and researchers at The Pediatric Cancer Genome Project at St. Jude
440 Children's Research Hospital for provision of data. We thank all the patients, families and
441 hospital staff for providing samples.
442 Authorship contributions: AG wrote computer code, performed bioinformatics, conceived
443 analysis strategy, interpreted and analyzed data and edited the manuscript. AA and BF
444 performed experiments. KWC, PK and CB performed additional data analysis and
445 interpretation. RSK provided patient samples and clinical data, assisted with data
446 interpretation and edited the manuscript. CHC, URK and JW acquired funding support,
447 assisted with data interpretation and edited the manuscript. MNC performed bioinformatics,
448 formulated research, conceived analysis strategy, acquired funding support, interpreted data,
449 designed experiments and drafted the manuscript.
450 Disclosure of Conflicts-of-Interest: The authors declare no conflicts of interest.
18 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
451 References:
452 1. Meyer C, et al. The MLL recombinome of acute leukemias in 2017. Leukemia, 453 (2017).
454 455 2. Bergerson RJ, et al. An insertional mutagenesis screen identifies genes that 456 cooperate with Mll-AF9 in a murine leukemogenesis model. Blood 119, 4512-4523 457 (2012).
458 459 3. Chen W, O'Sullivan MG, Hudson W, Kersey J. Modeling human infant MLL leukemia 460 in mice: leukemia from fetal liver differs from that originating in postnatal marrow. 461 Blood 117, 3474-3475 (2011).
462 463 4. Gale KB, et al. Backtracking leukemia to birth: identification of clonotypic gene 464 fusion sequences in neonatal blood spots. Proc Natl Acad Sci U S A 94, 13950-13954 465 (1997).
466 467 5. Andersson AK, et al. The landscape of somatic mutations in infant MLL-rearranged 468 acute lymphoblastic leukemias. Nat Genet 47, 330-337 (2015).
469 470 6. Sanjuan-Pla A, et al. Revisiting the biology of infant t(4;11)/MLL-AF4+ B-cell acute 471 lymphoblastic leukemia. Blood 126, 2676-2685 (2015).
472 473 7. Cruickshank MN, et al. Systematic chemical and molecular profiling of MLL- 474 rearranged infant acute lymphoblastic leukemia reveals efficacy of romidepsin. 475 Leukemia, (2016).
476 477 8. Lavallee VP, et al. The transcriptomic landscape and directed chemical interrogation 478 of MLL-rearranged acute myeloid leukemias. Nat Genet 47, 1030-1037 (2015).
479 480 9. Casero D, et al. Long non-coding RNA profiling of human lymphoid progenitor cells 481 reveals transcriptional divergence of B cell and T cell lineages. Nat Immunol 16, 482 1282-1291 (2015).
483 484 10. Anders L, et al. Genome-wide localization of small molecules. Nat Biotechnol 32, 92- 485 96 (2014).
486 487 11. Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using factor 488 analysis of control genes or samples. Nat Biotechnol 32, 896-902 (2014).
489 490 12. Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model 491 analysis tools for RNA-seq read counts. Genome Biol 15, R29 (2014).
19 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
492 493 13. Francis RW, Thompson-Wicking K, Carter KW, Anderson D, Kees UR, Beesley AH. 494 FusionFinder: a software tool to identify expressed gene fusion candidates from 495 RNA-Seq data. PLoS One 7, e39987 (2012).
496 497 14. Piskol R, Ramaswami G, Li JB. Reliable identification of genomic variants from RNA- 498 seq data. Am J Hum Genet 93, 641-651 (2013).
499 500 15. Koboldt DC, et al. VarScan 2: somatic mutation and copy number alteration 501 discovery in cancer by exome sequencing. Genome Res 22, 568-576 (2012).
502 503 16. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants 504 from high-throughput sequencing data. Nucleic Acids Res 38, e164 (2010).
505 506 17. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense 507 mutations using PolyPhen-2. Curr Protoc Hum Genet Chapter 7, Unit7 20 (2013).
508 509 18. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous 510 variants on protein function using the SIFT algorithm. Nat Protoc 4, 1073-1081 511 (2009).
512 513 19. Schwarz JM, Rodelsperger C, Schuelke M, Seelow D. MutationTaster evaluates 514 disease-causing potential of sequence alterations. Nat Methods 7, 575-576 (2010).
515 516 20. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. 517 Genome Res 19, 1553-1561 (2009).
518 519 21. Cooper GM, Stone EA, Asimenos G, Green ED, Batzoglou S, Sidow A. Distribution and 520 intensity of constraint in mammalian genomic sequence. Genome Res 15, 901-913 521 (2005).
522 523 22. Cooper GM, et al. Single-nucleotide evolutionary constraint scores highlight disease- 524 causing mutations. Nat Methods 7, 250-251 (2010).
525 526 23. Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general 527 framework for estimating the relative pathogenicity of human genetic variants. Nat 528 Genet 46, 310-315 (2014).
529 530 24. Rosenthal R, McGranahan N, Herrero J, Taylor BS, Swanton C. DeconstructSigs: 531 delineating mutational processes in single tumors distinguishes DNA repair 532 deficiencies and patterns of carcinoma evolution. Genome Biol 17, 31 (2016).
533
20 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
534 25. Favero F, et al. Sequenza: allele-specific copy number and mutation profiles from 535 tumor sequencing data. Ann Oncol 26, 64-70 (2015).
536 537 26. McLean CY, et al. GREAT improves functional interpretation of cis-regulatory 538 regions. Nat Biotechnol 28, 495-501 (2010).
539 540 27. Berman HM, et al. The Protein Data Bank. Nucleic Acids Res 28, 235-242 (2000).
541 542 28. He J, et al. Integrated genomic DNA/RNA profiling of hematologic malignancies in 543 the clinical setting. Blood 127, 3004-3014 (2016).
544 545 29. Chang MT, et al. Identifying recurrent mutations in cancer reveals widespread 546 lineage diversity and mutational specificity. Nat Biotechnol 34, 155-163 (2016).
547 548 30. Forbes SA, et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids 549 Res 45, D777-D783 (2017).
550 551 31. Downing JR, et al. The Pediatric Cancer Genome Project. Nat Genet 44, 619-622 552 (2012).
553 554 32. Pickrell JK, Gilad Y, Pritchard JK. Comment on "Widespread RNA and DNA sequence 555 differences in the human transcriptome". Science 335, 1302; author reply 1302 556 (2012).
557 558 33. Chang VY, Basso G, Sakamoto KM, Nelson SF. Identification of somatic and germline 559 mutations using whole exome sequencing of congenital acute lymphoblastic 560 leukemia. BMC Cancer 13, 55 (2013).
561 562 34. Henderson MJ, et al. A xenograft model of infant leukaemia reveals a complex MLL 563 translocation. Br J Haematol 140, 716-719 (2008).
564 565 35. Warren JJ, Pohlhaus TJ, Changela A, Iyer RR, Modrich PL, Beese LS. Structure of the 566 human MutSalpha DNA lesion recognition complex. Mol Cell 26, 579-592 (2007).
567 568 36. Campbell BB, et al. Comprehensive Analysis of Hypermutation in Human Cancer. Cell 569 171, 1042-1056 e1010 (2017).
570 571 37. Shlien A, et al. Combined hereditary and somatic mutations of replication error 572 repair genes result in rapid onset of ultra-hypermutated cancers. Nat Genet 47, 257- 573 262 (2015).
574
21 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
575 38. Palles C, et al. Germline mutations affecting the proofreading domains of POLE and 576 POLD1 predispose to colorectal adenomas and carcinomas. Nat Genet 45, 136-144 577 (2013).
578 579 39. Jha V, Bian C, Xing G, Ling H. Structure and mechanism of error-free replication past 580 the major benzo[a]pyrene adduct by human DNA polymerase kappa. Nucleic Acids 581 Res 44, 4957-4967 (2016).
582 583 40. Szpiech ZA, et al. Prominent features of the amino acid mutation landscape in 584 cancer. PLoS One 12, e0183273 (2017).
585 586 41. Stam RW, et al. Gene expression profiling-based dissection of MLL translocated and 587 MLL germline acute lymphoblastic leukemia in infants. Blood 115, 2835-2844 (2010).
588 589 42. Kanemaru Y, et al. DNA polymerase kappa protects human cells against MMC- 590 induced genotoxicity through error-free translesion DNA synthesis. Genes Environ 591 39, 6 (2017).
592 593 43. Maddukuri L, Ketkar A, Eddy S, Zafar MK, Eoff RL. The Werner syndrome protein 594 limits the error-prone 8-oxo-dG lesion bypass activity of human DNA polymerase 595 kappa. Nucleic Acids Res 42, 12027-12040 (2014).
596 597 44. Irimia A, Eoff RL, Guengerich FP, Egli M. Structural and functional elucidation of the 598 mechanism promoting error-prone synthesis by human DNA polymerase kappa 599 opposite the 7,8-dihydro-8-oxo-2'-deoxyguanosine adduct. J Biol Chem 284, 22467- 600 22480 (2009).
601 602 45. Yadav S, Mukhopadhyay S, Anbalagan M, Makridakis N. Somatic Mutations in 603 Catalytic Core of POLK Reported in Prostate Cancer Alter Translesion DNA Synthesis. 604 Hum Mutat 36, 873-880 (2015).
605 606 46. Alexandrov LB, et al. Clock-like mutational processes in human somatic cells. Nat 607 Genet 47, 1402-1407 (2015).
608 609 47. Alexandrov LB, et al. Signatures of mutational processes in human cancer. Nature 610 500, 415-421 (2013).
611 612 48. Mathijs A Sanders EC, Christoffer Flensburg, Annelieke Zeilemaker, Sarah E Miller, 613 Adil al Hinai, Ashish Bajel, Bram Luiken, Melissa Rijken, Tamara Mclennan, Remco M 614 Hoogenboezem, François G Kavelaars, Marnie E Blewitt, Eric M Bindels, Warren S 615 Alexander, Bob Löwenberg, Andrew W Roberts, Peter J M Valk, Ian Majewski. 616 Germline loss of MBD4 predisposes to leukaemia due to a mutagenic cascade driven 617 by 5mC. bioRxiv, (2018).
22 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
618 619 49. Chen L, Liu P, Evans TC, Jr., Ettwiller LM. DNA damage is a pervasive cause of 620 sequencing errors, directly confounding variant identification. Science 355, 752-756 621 (2017).
622 623 50. Lin J, Shi T. Error-prone DNA polymerase and oxidative stress increase the incidences 624 of A to G mutations in tumors. Oncotarget 8, 45154-45163 (2017).
625 626 51. Martincorena I, et al. Universal Patterns of Selection in Cancer and Somatic Tissues. 627 Cell 171, 1029-1041 e1021 (2017).
628 629 52. Valentine MC, et al. Excess congenital non-synonymous variation in leukemia- 630 associated genes in MLL- infant leukemia: a Children's Oncology Group report. 631 Leukemia, (2013).
632 633 53. Dagogo-Jack I, Shaw AT. Tumour heterogeneity and resistance to cancer therapies. 634 Nat Rev Clin Oncol, (2017).
635 636 54. van Oers JM, et al. The MutSbeta complex is a modulator of p53-driven 637 tumorigenesis through its functions in both DNA double-strand break repair and 638 mismatch repair. Oncogene 33, 3939-3946 (2014).
639 640 55. Burdova K, Mihaljevic B, Sturzenegger A, Chappidi N, Janscak P. The Mismatch- 641 Binding Factor MutSbeta Can Mediate ATR Activation in Response to DNA Double- 642 Strand Breaks. Mol Cell 59, 603-614 (2015).
643 644 56. Martincorena I, et al. Tumor evolution. High burden and pervasive positive selection 645 of somatic mutations in normal human skin. Science 348, 880-886 (2015).
646
647
23 bioRxiv preprint certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder Gout et al Ultra-mutation targets germline alleles in an infant leukemia doi: 648 Table 1: Characterisation of somatic lesions in KMT2A-R patient samples. https://doi.org/10.1101/248690
Reciproc Point al Diagnostic Mutations Point Indels Point mutations or Patient KMT2A- leukemia Constitutional 0.1 > VAF1 Mutations VAF > LOH2- hotspots with RNA- ID KMT2A-fusion fusion DNA remission DNA < 0.3 VAF > 0.3 0.3 sites SEQ reads Not > 1,000 expressed P337 KMT2A-MLLT1 detected Exome Exome 198 5,054 260 6,555 somatic mutations Not No non-silent somatic ; a this versionpostedFebruary7,2018.
P401 KMT2A- MLLT 3 detected Exome Exome 78 3 9 45 mutations CC-BY-NC 4.0Internationallicense AFF1- No non-silent somatic P399 KMT2A-AFF1 KMT2A Exome Exome 95 3 7 67 mutations Not P438 KMT2A- MLLT3 detected Exome Exome 74 4 9 100 NRAS.p.G12S EPS15- PER3.p.M1006R, P810 KMT2A-EPS15 KMT2A Exome Exome 95 4 5 46 PER3.p.K1007E EPS15- PER3 M1006R, P809 KMT2A-EPS15 KMT2A Exome Exome 69 3 3 28 PER3.p.K1007E AFF1- P848 KMT2A-AFF1 KMT2A no Exome NA NA NA NA KRAS.p.A146T . AFF1- The copyrightholderforthispreprint(whichwasnot P287 KMT2A-AFF1 KMT2A no no NA NA NA NA No COSMIC3 hotspots Not P706 KMT2A- MLLT1 detected no Exome NA NA NA NA KRAS.p.A146T Not P272 KMT2A-AFF1 detected no no NA NA NA NA No COSMIC3 hotspots 649 1VAF: Variant allele fraction; 2LOH: Loss-of-heterozygosity; 3COSMIC: Catalogue of Somatic Mutations in Cancer
24 bioRxiv preprint certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder Gout et al Ultra-mutation targets germline alleles in an infant leukemia doi: 650 Table 2: Mean and standard deviation of expressed rare (Not in ExAC or <0.5% allele frequency), low-frequency (0.5-1% allele frequency) and common 651 (>1% allele frequency) alleles in 39 KMT2A-R patient samples compared to ultra-mutated (P337) sample and its constituent expressed germline alleles and https://doi.org/10.1101/248690 652 somatic mutations.
Not in ExAC <0.5% AF1 in ExAC 0.5-1% AF in ExAC >1% AF in ExAC KMT2A-R iALL 42.3±10.4 212.1±32.6 76.8±14.7 7722.3±266.0 P337 expressed 38 174 89 7590 P337 Germline 17 88 47 6235 P337 Somatic 21 86 42 1355
1 2 ; a 653 AF: allele frequency; ExAC: Exome aggregation consortium this versionpostedFebruary7,2018. CC-BY-NC 4.0Internationallicense . The copyrightholderforthispreprint(whichwasnot
25 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
654 Figure Legends:
655 Figure 1: Characterization of germline and somatic point mutations identifies an ultra-
656 mutated infant leukemia case. (a) Overlap of variants identified from diagnostic leukemia
657 specimen by RNA-seq and whole exome sequencing (WES). (b) Overlap of variants called in
658 diagnostic RNA-seq and remission bone marrow WES. (c) Overlap of variants called in
659 diagnostic and remission WES. Barplots colored showing overlap binned by derived allele
660 frequency as follows: present with AF<0.5% in ExAC (green), present with AF 0.5-1% in
661 ExAC (yellow), present with AF>1% in ExAC (purple), absent from the ExAC (orange), non-
662 overlapping variants are colored to show sites with coverage <20 reads (grey) or coverage
663 >20 reads (black) in the comparator. (d) Scatterplots of diagnostic leukemia (x-axis) versus
664 remission (y-axis) variant allele fraction (VAF) from WES for coding variants detected in
665 diagnostic leukemia RNA-seq.
666 Figure 2: Clonal analysis of somatic ultra-mutations. Histograms showing the VAF
667 density distribution of somatic mutations (left, x-axis: dark shaded bars) or LOH-sites (right;
668 x-axis: light shaded bars) in diagnostic leukemia WES from canonical cases (a) and the ultra-
669 mutated case (b). (c) Illustration of cell lines derived from ultra-mutated specimen and
670 histograms displaying the VAF density distribution of somatic mutations (left, x-axis: dark
671 shaded bars) or LOH-sites (right; x-axis: light shaded bars) in cell lines (PER-784A: red;
672 PER-826A: green). (d) Venn diagram displaying the overlap in somatic mutations (left) and
673 LOH-sites (right) identified by VarScan2 comparing the primary diagnostic specimen and
674 derived cell lines (PER-784A: red; PER-826A: green) to the primary remission sample. (e)
675 Somatic mutation and (f) LOH coding sites showing total read coverage (left y-axis) and VAF
676 (right y-axis) from primary specimens (left panels) and cell lines (right panels) ordered by
677 decreasing VAF in the diagnostic sample (x-axis).
678
26 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
679 Figure 3: Germline-like somatic signatures of ultra-mutations. (a-d) Histograms showing
680 the number (a-b) or frequency (c-d) of events overlapping the spectrum of ExAC derived
681 allele frequency bins with 5% increments. (a) Somatic mutations called by VarScan2
682 grouping homozygous and heterozygous events separately. (b) Loss-of-heterozygosity sites
683 called by VarScan2 grouping reference→alternative and alterantive→reference sites
684 separately. (c-d) All quality filtered coding variants from each of the canonical patients or
685 P337 ultra-mutated specimen called by GATK grouped by single nucleotide polymorphisms
686 (SNPs) that were homozygous and heterozygous events from (c) remission sample or (d)
687 diagnostic sample. (e) Normalized fraction of variants within unique trinucleotide context
688 (ordered alphabetically on the x-axis) for VarScan2 calls (left panel; as shown for panels a-b),
689 or remission variants (right panel; as shown for panel c). (f) Somatic signatures identified by
690 deconstructSigs for categories of mutations/SNPs (displayed in panel e) showing their relative
691 proportions (bar plot) and pair-wise Pearson’s correlation coefficients (heatmap), ordered by
692 unsupervised hierarchical clustering.
693
694 Figure 4: Ultra-mutations target rare and common human alleles and are associated
695 with somatic point mutations encoding a novel MSH2 allele and a rare POLK allele.
696 (a) Proportion of coding single nucleotide variants in diagnostic (D) and remission (R) whole
697 exome sequencing (WES) data binned by derived allele frequency as follows: present with
698 AF<0.5% in ExAC (green), present with AF 0.5-1% in ExAC (yellow), present with AF>1%
699 in ExAC (purple), absent from the ExAC (orange). Grey shading denotes data from the ultra-
700 mutated specimen. The distribution of variants called as germline, somatic or LOH by
701 VarScan2 are displayed for comparison. (b) Number of variants (y-axis) called from RNA-seq
702 reads from St Jude and Perth cohorts (open circles) are plotted with the total variants detected
703 in the ultra-mutated sample (shaded circles), germline-encoded variants (shaded triangle
704 pointing up) and somatic point mutations (shaded triangle pointing down) binned according to
27 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
705 the allele count in ExAC database (red: not in ExAC; green < 0.5%; yellow: 0.5-1% and blue
706 > 0.5%). (c) Bar plot displaying fraction of variants within rare, low-frequency and common
707 allele frequency bins (as described for panel a) ordered by increasing proportion of common
708 alleles (i.e. derived AF>1%) and excluding outliers (SJINF018, SJINF021, SJINF010 and
709 SJINF008). Proportion of RNA-seq germline and somatic variants identified by overlap with
710 remission/diagnostic WES data are displayed for comparison. (d) Crystal structure of the
711 MutS lesion recognition complex (PDB CODE 2O8D35) MSH2:MSH6 mismatch repair
712 proteins (MSH2: green, MSH6: blue) bound to ADP and G:T mismatch DNA (orange)
713 showing the location of the amino acid change (NM_000251:exon9:c.G1465A:p.E489K). (e)
714 Crystal structure of Pol k (green and blue representing different molecules) in complex with a
715 bulky DNA adduct (orange) (PDB CODE 4U6P39) showing locations of the amino acid
716 change (NM_016218:exon7:c.G893A:p.R298H).
717 Figure 5: Global transcriptional concordance of ultra-mutated infant leukemia. Multi-
718 dimensional scaling of RNA-seq count data from acute leukemias and normal bone marrow
719 progenitor populations displaying translocation partner gene by color and blood/leukemia
720 type by symbol shape according to the legend. Labels show replicate measurements from the
721 ultra-mutated diagnostic specimen, P337.
28 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license. Gout et al Ultra-mutation targets germline alleles in an infant leukemia
722 Supplementary Fig. S1. Copy-number alterations and loss-of-heterozygosity in ultra-
723 mutated KMT2A-R iALL diagnostic sample. (a) Genomic copy number ratios of the ultra-
724 mutated sample comparing diagnostic and remission WES data. (b) Proportion of reads
725 mapping to different copy-number bins (up to 4). (c) Box and whisker plots (median with
726 upper and lower quartiles) displaying the nearest distance of a variant/mutation to a
727 homozygous variant in the diagnostic WES data for canonical samples (left) compared to the
728 ultra-mutated sample (right) for VarScan2 germline, somatic and LOH SNV calls.
729 Supplementary Table S1: Summary of samples
730 Supplementary Table S2: Read evidence of RNA-seq variants that were not detected in
731 matched WES data.
732 Supplementary Table S3: Comparison of RNA-seq reads from diagnosis sample with WES
733 reads.
734 Supplementary Table S4: Comparison of RNA-seq reads from diagnosis sample with WES
735 reads from diagnosis, remission and derived cell lines.
736 Supplementary Table S5: Annotation of somatic mutations and indels from ultra-mutated
737 patient P337.
738 Supplementary Table S6: Summary of LOH-sites called in primary P337 and derived cell
739 lines.
740 Supplementary Table S7: Summary of somatic SNVs called in primary P337 and derived
741 cell lines.
29 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
FIGURE 1 a b c P810 P810 P810 P809 P809 P809 P438 P438 P438 P399 P399 P399 P401
Diagnosis P401 P401 P337 RNA vs exome
P337 RNA Diagnosis P337 Diagnosis exome Diagnosis vs remission exome vs remission exome 0 0 20 40 0 20 40 50 100 150 200 250 400 600 800 100 200 300 6000 7000 8000 100 200 300 6000 7000 8000 1000 16000180002000022000 Number of variants Number of variants Number of variants Not in ExAC <0.5% AF in ExAC 0.5-1% AF in ExAC >1% AF in ExAC Discordant >20 reads Discordant <20 reads d “Silent genomic landscape” Ultra-mutated P399 P401 P438 P809 P810 P337 1.0 1.0 1.0 1.0 1.0 1.0
0.5 0.5 0.5 0.5 0.5 0.5
Remission exome VAF Remission 0 0 0 0 0 0 0 0.5 1.0 0 0.5 1.0 0 0.5 1.0 0 0.5 1.0 0 0.5 1.0 0 0.5 1.0 Diagnosis exome VAF b e Frequency a FIGURE 2 400 1 000 1 000 Read coverage Frequency 10 15 800 600 400 200 200 400 600 800 0 0 5 P337 diagnosisVAF . . 1.0 0.5 0.1 P399 diagnosisVAF 0
. . 1.0 0.5 0.1 P337 coding somatic variants somatic coding P337
500 bioRxiv preprint certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder Remission Diagnosis 1000 4 000 150 300 0 0 . 1.0 0.5 0
1500 1.0 0.5 0
2000 doi: Reads Reads VAF VAF https://doi.org/10.1101/248690
10 1.0 0.5 0.0 0.5 1.0 c
P337 diagnosis 0 4 8
n fractio allele ariant V P401 diagnosisVAF . . 1.0 0.5 0.1
1 000 Read coverage 1 000 800 600 400 200 200 400 600 800
0 Cell line read evidence read line Cell 100 150 50 0 500 PER-826A PER-784A . 1.0 0.5 0 PER-784A PER-826A 1000 300
1500 0 ; PER-784A VAF . . 1.0 0.5 0.1 a this versionpostedFebruary7,2018. CC-BY-NC 4.0Internationallicense 10 15 0 5
2000 P438 diagnosisVAF . . 1.0 0.5 0.1 Reads Reads VAF VAF
1.0 0.5 0.0 0.5 1.0 4 000
n fractio allele ariant V 0 f . 1.0 0.5 0 200 400 0 1 000 Read coverage 1 000 800 600 400 200 200 400 600 800 . 1.0 0.5 0
0 P337 coding LOH sites LOH coding P337
500 400
1000 0 . . 1.0 0.5 0.1 PER-826A VAF Remission Diagnosis 0 4 8 P809 diagnosisVAF
1500 1.0 0.5 0.1 . The copyrightholderforthispreprint(whichwasnot 2000 4 000 2500 0 Reads Reads 100 150 VAF VAF . 1.0 0.5 0 3000 50 0
1.0 0.5 0.0 0.5 1.0 . 1.0 0.5 0
n fractio allele ariant V
1 000 Read coverage 1 000 800 600 400 200 200 400 600 800 PER-784A 0 d P337 diagnosis
940 99 Cell line read evidence read line Cell 0 4 8 7 305 177 . . 1.0 0.5 0.1
500 P810 diagnosisVAF Somatic 4 272 300
1000 PER-826A VarScan2overlap PER-784A PER-826A 127 1500 100 2000 PER-784A 0 115 P337 diagnosis . 1.0 0.5 0 2500 405 260 5 485 Reads Reads LOH 405 45 VAF VAF
3000 PER-826A
129 1.0 0.5 0.0 0.5 1.0
n fractio allele ariant V bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
FIGURE 3 a P337 somatic homozygous b P337 LOH Ref→Alt 10 000 10 000 P337 somatic heterozygous P337 LOH Alt→Ref 1 000 1 000
100 100
10 10
1 1 Number of events Number of events 0.1 0.1
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 0.00 0.05 0.10 0.15 0.20 0.250.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.750.80 0.85 0.90 0.95 1.00 ExAC derived allele frequency ExAC derived allele frequency c d Remission coding Diagnosis coding 1 P337 homozygous Canonical homozygous 1 P337 homozygous Canonical homozygous P337 heterozygous Canonical heterozygous P337 heterozygous Canonical heterozygous 0.1 0.1
0.01 0.01
0.001 0.001
0.0001 0.0001 Relative frequency (fractions ) Relative frequency (fractions ) 0.00 0.050.10 0.15 0.200.25 0.30 0.35 0.400.45 0.50 0.55 0.600.65 0.70 0.750.80 0.85 0.90 0.951.00 0.00 0.050.10 0.15 0.200.25 0.30 0.35 0.400.45 0.50 0.55 0.600.65 0.70 0.750.80 0.85 0.90 0.951.00 ExAC derived allele frequency ExAC derived allele frequency e VarScan2 P337 somatic lesions Remission coding variants P337 somatic heterozygous P337 heterozygous 0.08 N = 4 773 0.08 N = 10 681
0.04 0.04 Fraction Fraction 0 0 P337 somatic homozygous 0.08 P337 homozygous N = 281 N = 6 617 0.04 0.04 Fraction Fraction 0 0 P337 LOH Alt→Ref 0.08 Canonical heterozygous N = 4 453 N = 51 347 0.04 0.04 Fraction Fraction 0 0 0.08 P337 LOH Ref→Alt 0.08 Canonical homozygous N = 2 057 N = 31 237 0.04 0.04 Fraction Fraction 0 0 T[T>A]T T[T>A]T T[T>C]T T[T>C]T T[C>T]T T[C>T]T T[T>A]A T[T>A]A A[T>A]T A[T>A]T T[T>G]T T[T>G]T T[T>C]A T[T>C]A A[T>C]T A[T>C]T C[T>A]T C[T>A]T T[T>A]C T[T>A]C T[C>T]A T[C>T]A A[T>A]A A[T>A]A A[C>T]T A[C>T]T T[C>A]T T[C>A]T T[T>C]C T[T>C]C T[T>G]A T[T>G]A A[T>G]T A[T>G]T C[T>C]T C[T>C]T T[T>A]G A[T>C]A T[T>A]G A[T>C]A G[T>A]T G[T>A]T T[C>T]C T[C>T]C A[T>A]C A[T>A]C C[T>A]A C[T>A]A C[C>T]T C[C>T]T A[C>T]A A[C>T]A T[C>A]A T[C>A]A A[C>A]T A[C>A]T C[T>G]T C[T>G]T T[T>G]C T[T>G]C G[T>C]T T[T>C]G G[T>C]T T[T>C]G A[T>G]A A[T>G]A C[T>C]A C[T>C]A A[T>C]C A[T>C]C T[C>T]G T[C>T]G C[T>A]C C[T>A]C G[T>A]A G[T>A]A G[C>T]T G[C>T]T A[T>A]G A[T>A]G C[C>T]A C[C>T]A T[C>G]T A[C>T]C T[C>G]T A[C>T]C T[C>A]C T[C>A]C C[C>A]T C[C>A]T A[C>A]A A[C>A]A T[T>G]G T[T>G]G G[T>G]T G[T>G]T A[T>G]C A[T>G]C C[T>G]A C[T>G]A C[T>C]C C[T>C]C G[T>C]A G[T>C]A A[T>C]G A[T>C]G C[T>A]G C[T>A]G G[T>A]C G[T>A]C C[C>T]C C[C>T]C G[C>T]A G[C>T]A A[C>T]G A[C>T]G T[C>G]A T[C>G]A T[C>A]G T[C>A]G A[C>G]T A[C>G]T G[C>A]T G[C>A]T A[C>A]C A[C>A]C C[C>A]A C[C>A]A C[T>G]C C[T>G]C G[T>G]A G[T>G]A G[T>C]C G[T>C]C A[T>G]G A[T>G]G C[T>C]G C[T>C]G G[T>A]G G[T>A]G G[C>T]C G[C>T]C C[C>T]G C[C>T]G T[C>G]C T[C>G]C C[C>G]T C[C>G]T A[C>G]A A[C>G]A G[C>A]A G[C>A]A C[C>A]C C[C>A]C A[C>A]G A[C>A]G C[T>G]G C[T>G]G G[T>G]C G[T>G]C G[T>C]G G[T>C]G G[C>T]G G[C>T]G T[C>G]G T[C>G]G G[C>G]T G[C>G]T A[C>G]C A[C>G]C C[C>G]A C[C>G]A G[C>A]C G[C>A]C C[C>A]G C[C>A]G G[T>G]G G[T>G]G C[C>G]C C[C>G]C G[C>G]A G[C>G]A A[C>G]G A[C>G]G G[C>A]G G[C>A]G C[C>G]G C[C>G]G G[C>G]C G[C>G]C G[C>G]G G[C>G]G C>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G f Proportion of variants (%) Pearson’s correlation coefficient 0 50 100 0.2 0.4 0.6 0.8 1.0 Unknown P337 LOH Alt→Ref Signature U1 P337 heterozygous somatic Signature R3 P337 homozygous SNP Signature 5 Canonical homozygous SNP Signature 20 P337 LOH Ref→Alt Signature 12 P337 homozygous somatic Signature 1B Canonical heterozygous SNP Signature 1A P337 heterozygous SNP a FIGURE 4 Proportion of variants (%) 100 10 95 0 5 d 0.5-1% Not inExAC
MSH2 p.E489K P399-R WES codingvariants bioRxiv preprint AF inExAC P399-D certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder P401-R P401-D P438-R P438-D MSH2 p.E489K P809-R P809-D
>1% <0.5% P810-R
P810-D doi:
AF inExAC P337-R 22 AF inExAC
P337-D https://doi.org/10.1101/248690 VarScan2 Å
23 Gerrmline LOH Å Somatic 20 Å b 10 000
Number of variant1 000 s 100 10
St Jude Not inExAC Perth ; a P337 this versionpostedFebruary7,2018. Germline CC-BY-NC 4.0Internationallicense b Somatic RNA-seq codingvariants
St Jude <0.5% AF Perth P337 Germline Somatic
St Jude 0.5-1% AF Perth P337
e Germline Somatic POLK p.R298H .
St Jude The copyrightholderforthispreprint(whichwasnot Perth >1% AF P337 Germline Somatic c Proportion of variants (%) 100 95 10 0 5
SJINF017 SJINF059 SJINF016 20 POLK p.R298H SJINF004
30 SJINF013 Å
POLK p.R298H SJINF014 Å SJINF011
30 SJINF001 SJINF015 Å P287 P810 SJINF020 RNA-seq codingvariants P809 P438 SJINF063 SJINF012 SJINF006 SJINF019 SJINF062 SJINF009 SJINF053 P337 SJINF002 SJINF055 P706 P401 P399 SJINF064 SJINF003 SJINF005 SJINF022 P848 Germline Somatic bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
FIGURE 5 Blood progenitor ● MLL/KMT2A-Germline ● ● ● ● ● MLL/KMT2A-MLLT1 ● ● ● ● ● ●● ● P337_rep1 MLL/KMT2A-AFF1 ● ● ● ● ● ● ●● ● ● ●
1 2 MLL/KMT2A-MLLT3 1 2 ● P337_rep2 ● ● ● ● ●● ● ● ● MLL/KMT2A-MLLT4 ● ● ● ● ● MLL/KMT2A-MLLT10 ● ● ● 0 0 ● ● MLL/KMT2A-EPS15 MLL/KMT2A-USP2 MLL/KMT2A-CASC5 ● MLL/KMT2A-MLLT6 MLL/KMT2A-ELL ● Dimension 2 MLL/KMT2A-ENL
Leading logFC dim 2 MLL/KMT2A-SEPT9 ● MLL/KMT2A-ENAH LMPP MLL/KMT2A-GAS7 BCP Diagnostic infant ALL HSC Relapse infant ALL LCP −4 −3 −2 −1
−4 −3 −2 −1 Childhood ALL Thy Childhood AML Leucegene AML −4 −2 0 2 Dimension 1 Leading logFC dim 1 bioRxiv preprint doi: https://doi.org/10.1101/248690; this version posted February 7, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.
SUPPLEMENTARY FIGURE S1 a 2.5 7 Copy number 2.0 6 5 1.5 4 1.0 3 2 Depth ratio 0.5 1 0 0.0 Chromosome: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X b c 80 Canonical 7 P337 0.5 6 0.4 Canonical LOH 60 5 0.3 4
40 3 Density 0.2 2 0.1 Percentage (%) Percentage 20 1
to homozygous SNP) 0.0
log10 (shortest distance 0 0 2 4 6 0 Germline Somatic LOH Germline Somatic LOH 0 1 2 3 4 log10 (shortest distance to homozygous SNP) Copy number N = 19 245 Bandwidth = 0.1431