bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Mutational signatures in upper tract urothelial carcinoma define etiologically distinct

2 subtypes with prognostic relevance

3 Xuesong Li1†, Huan Lu2,3†, Bao Guan 1,2,4,5†, Zhengzheng Xu2,3, Yue Shi2, Juan Li2,3, Wenwen

4 Kong2,3, Jin Liu6, Dong Fang1,5, Libo Liu1,4,5, Qi Shen1,4,5, Yanqing Gong1,4,5, Shiming He1,4,5, Qun

5 He1,4,5, Liqun Zhou 1,4,5* and Weimin Ci 2,3,7*

6 Affiliations:

7 1Department of Urology, Peking University First Hospital, Beijing 100034, China.

8 2Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese

9 Academy of Sciences, Beijing 100101, China;

10 3University of Chinese Academy of Sciences, Beijing 100049, China.

11 4Institute of Urology, Peking University, Beijing 100034, China.

12 5National Urological Cancer Center, Beijing, 100034, China.

13 6Department of Urology, The Third Hospital of Hebei Medical University, Hebei 050051, China.

14 7Institute of Stem cell and Regeneration, Chinese Academy of Sciences, Beijing 100101, China.

15

16 *Correspondence to: [email protected]; and [email protected].

17 † These authors contributed equally to this work.

18

19 Keywords:

20 Upper tract urothelial carcinoma; Whole-genome sequencing; Mutational signature; Aristolochic

21 acid.

22

23

1

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

24 Abstract

25 Parts of East Asia have a very high upper tract urothelial carcinoma (UTUC) prevalence, and an

26 etiological link between the medicinal use of herbs containing aristolochic acid (AA) and UTUC

27 has been established. The mutational signature of AA, which is characterized by a particular

28 pattern of A:T to T:A transversions, can be detected by genome sequencing. Thus, integrating

29 mutational signatures analysis with clinicopathological data may be a crucial step toward

30 personalized treatment strategies for the disease. Therefore, we performed whole-genome

31 sequencing (WGS) of 90 UTUC patients from China. Mutational signature analysis via

32 nonnegative matrix factorization method defined three etiologically distinct subtypes with

33 prognostic relevance: (i) AA, the typical AA mutational signature characterized by signature 22,

34 had the highest tumor mutation burden, the best clinical outcomes; (ii) Age, an age-related group

35 featured by signatures 1 and 5, had the lowest weighted genome instability index score, the worst

36 clinical outcomes; and (iii) DSB, signature with deficiencies in DNA double strand break-repair

37 featured by signatures 3, the intermediate clinical outcomes. Additionally, the distinct AA subtype

38 was associated with AA exposure, the highest number of predicted neoantigens and heavier

39 lymphocytes infiltrating. Thus, it may be good candidate for immune checkpoint blockade therapy.

40 Notably, we showed AA-mutational signature was also identified in histologically “normal”

41 urothelial cells. Thus, non-invasive urine test based on the AA-mutational signature could take

42 advantage of this “field effect” to screen individuals at increased risk of recurrence due to exposure

43 to herbal remedies containing the AA. Collectively, the findings here may accelerate

44 the development of novel prognostic markers and personalized therapeutic approaches for UTUC

45 patients in China.

46

2

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

47 Introduction

48 The epidemiology and clinicopathological characteristics of UTUC is largely influenced by its

49 geographic distribution. Parts of East Asia such as Taiwan have a much higher UTUC prevalence,

50 accounting for more than 30% of urothelial cancers (UCs)(1-3) compared to only 5-10% of UCs

51 in Western populations(4). Chinese patients are more frequently female, of higher rate for kidney

52 failure(5). Conceivably, geographic differences in risk factors for UTUC, such as AA-containing

53 herb drugs consumption may account for some of these observations(1, 6-8). It has been shown

54 that AA-induced mutational signature was characterized by A:T to T:A transversions at the

55 sequence motif A[C∣T]AGG(8, 9). The unusual genome-wide AA signature, termed signature

56 22 in COSMIC, holds great potential as “molecular fingerprints” for AA exposure in multiple

57 cancer types(7, 9). Similarly, other endogenous or exogenous DNA damaging agents will also

58 create mutational signatures detectable by DNA sequencing(10). Thus, analyses of mutational

59 signatures, informative in other tumors(11, 12), have not been comprehensively reported in UTUC

60 especially Chinese patients. The most comprehensive genomic studies of UTUC in Western

61 populations used exome sequencing or targeted panel sequencing of cancer-associated genes(13-

62 15). However, deriving signatures from whole-exome data may be problematic(16). Herein, we

63 reported the first comprehensive genomic analysis of UTUC based on mutational signatures using

64 WGS in 90 Chinese UTUC patients. Mutational signature analysis defined three UTUC subtypes

65 with distinct etiologies and clinical outcomes. We further showed that geographically distinct AA

66 subtype with high tumor mutation burden and higher level of tumor-infiltrating lymphocytes held

67 great potential for immunotherapy. These results provided deeper insights into the biology of this

68 cancer.

69

3

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

70 Results

71 Copy number variations (CNVs) dominate the UTUC landscape

72 The flowchart of selecting patients was shown in fig. S1. Clinicopathologic characteristics of 90

73 UTUC patients were listed in table S1. The samples were sequenced by WGS to an average 30X

74 depth which allowed us to comprehensively catalog both single nucleotide variations (SNVs),

75 small insertions and deletions (indels) and copy number variations (CNVs). We identified a

76 median of 19,639 (interquartile range [IQR], 16,578 to 32,937) SNVs and a median of 2,197 (IQR,

77 1,615 to 2,650) indels. A median of 437 (IQR, 355 to 584) coding mutations was noted (fig. 1A).

78 The increased mutation load was consistent with the increased prevalence of exposure to the potent

79 mutagen AAs (fig. 1B). Next, we examined the candidate driver genes with MutSigCV (Q<0.05).

80 Only two genes, TP53 and FRG2C, were identified (fig. 1C). However, many genes listed in the

81 Cancer Gene Census as known driver genes were affected by non-silent mutations including genes

82 which frequently mutated in Western UTUC patients, such as KMT2D (18%) and ARID1A (14%)

83 (fig. 1C). Additionally, the overall genomic landscape confirmed the dominance of recurrent

84 CNVs compared with SNVs and indels in the cohort (fig. 1D).

85 Mutational signatures define three UTUC subtypes with distinct etiology

86 Next, we explored the dynamic interplay of risk factors and cellular processes using mutational

87 signature analysis. We identified 23 mutational signatures defined by COSMIC in our cohort,

88 which were merged by shared aetiologies into 9 signatures by MutationalPatterns(17) (Materials

89 and Methods). Hierarchical clustering based on the number of SNVs attributable to each signature

90 confirmed three major subtypes (fig. 2A): (1) AA, there is significantly higher number of signature

91 22(1, 8) mutations (a median of 1,871 ) compared to the other two subtypes (fig. 2B, Wilcoxon

92 rank test); (2) Age, there is significantly higher number of signatures 1 and 5, attributed to clocklike

4

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

93 mutational processes accumulated over cell divisions(18) (fig. 2C, Wilcoxon rank test); and (3)

94 DSB, there is significantly higher number of signature 3 mutations, associated with failure of DNA

95 double-strand break-repair by homologous recombination (fig. 2D, Wilcoxon rank test). We

96 further verified whether the subtypes defined by mutational signatures associated with their

97 attributed etiological and clinicopathologic features. Consistent with previous epidemiological

98 studies in Asian patients(6, 19-21), we found that the AA subtype was significantly associated

99 with AA-containing herb drugs intake, poor renal function, female sex, multifocality and lower T

100 stage (fig. 2E, Kruskal-Wallis test). Thus, the AA subtype defined by mutational signatures did

101 capture the etiological subgroup of UTUC patients who had AAs exposure. Furthermore,

102 consistent with the scenario that DNA damage repair deficiency may increase genomic instability,

103 the DSB subtype exhibited both higher level weighted genome integrity index (wGII) and higher

104 level microsatellite instability (MSI) compared to the Age subtype (fig. 2F and 2G). Additionally,

105 lymphovascular invasion, an independent predictor of clinical outcomes for UTUC(22), was

106 higher in the Age subtype compared to the DSB subtype. Although statistically significance was

107 not achieved which may be due to limited number of cases (table S2).

108 Subtypes defined by mutational signatures can predict clinical outcomes

109 Next, we explored whether clinical outcomes of the three subtypes differed significantly. Notably,

110 a Kaplan-Meier plot revealed that the subtypes can predict both cancer-specific survival (CSS)

111 (fig. 3A, log-rank, p=0.006) and metastasis-free survival (MFS) (fig. 3B, log-rank, p=0.007). The

112 AA and DSB subtypes exhibited favourable outcomes compared with the Age subtype. Since an

113 improved prognostic stratification could offer more tailored therapeutic decisions, we further

114 stratified the patients into subgroups, muscle invasion and non-muscle invasion subgroups.

115 Consistently, the AA and DSB subtypes also exhibited favourable outcomes in muscle-invasive

5

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

116 UTUC patients (log-rank, p=0.001 for CSS, log-rank, p=0.002 for MFS) (fig. 3C and 3D). In

117 multivariate Cox regression proportional analyses adjusted for the effects of standard

118 clinicopathologic variables, mutational subtypes were still an independent risk factor for CSS

119 (Hazard Ratio [HR]=4.05, 95% CI: 1.58~10.42, p=0.004) and MFS (HR=3.43, 95% CI: 1.46~8.06,

120 p=0.006) (table S3). We also used the recently described mutational signature deconvolution

121 method, Mutalisk, for comparison(23). This method was based on different statistical frameworks,

122 and therefore some differences were to be expected. Nevertheless, we observed the same key

123 signature patterns with similar-sized patient subgroups expressing the dominant signature types

124 (fig. S2A). Both Mutalisk and MutationalPatterns identified the same group of patients with AA

125 signature. MutationalPatterns identified 9 putative Age subtype of UTUC patients that Mutalisk

126 did not identify (T002, T005, T013, T040, T047, T048, T087, T091 and T094) (fig. S2B). Since

127 the Age subtypes by either MutationalPatterns and Mutalisk showed the worst CSS and MFS (fig.

128 3, fig. S2C and S2D), we would propose that MutationalPatterns was preferable to err on the side

129 of overcalling rather than undercalling the presence of age-related signature.

130 Integration of mutational signature subtypes with copy number alterations

131 One practical question was how this subtyping could be implemented clinically. Previous study

132 had showed that 10X coverage data can confidently retrieve dominant signatures(11). Therefore,

133 lower-coverage WGS could provide a cost-effective alternative for signature-based stratification

134 especially for the AA subtype (more than 35% mutations contributed by signature 22 in our cohort).

135 Although for the Age and DSB subtype which did not have the dominant signatures, we next

136 explored whether CNV features can distinguish these two subtypes. First, we examined the

137 landscape of CNVs in UTUC by ichorCNA(24) which can predict CNVs in tumor-only samples.

138 Recurrent CNVs in UTUC patients of BCM/MDACC cohort were also identified in our study(13),

6

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

139 such as gain in 1q, 8q and loss in 2q, 5q, 10q (fig. S3A). However, the defined mutational subtypes

140 presented similarities in terms of CNV profiles (fig. S4A and S4B). Of note, several focal CNVs

141 were significantly associated with the mutational subtypes, such as deletion of 10q26.2-26.3 was

142 more prevalent in the DSB subtype compared to the Age subtype, and amplification of 1q31.1-

143 1q32.1 was more prevalent in the AA subtype compared to the DSB subtype (fig. S4C). Moreover,

144 we performed unsupervised hierarchical clustering which separated UTUCs into three groups with

145 distinct CNV patterns. However, there was not significantly association between mutational

146 signature subtypes and CNV subtypes (fig. S5A). Meanwhile, the overall survival of the CNV

147 subtypes did not differ significantly (fig. S5B and S5C). Among all the recurrent CNV regions,

148 only amplification of 17q25.1 was significantly associated with better overall survival (fig. S5D).

149 Hence, the CNV profiles seem to capture a different type of clinicopathological information

150 compared with the mutational signature subtypes.

151 Field cancerization may contribute to malignant transformation especially for the AA 152 subtype 153 Clinical interest in the AA subtype is increasing especially in East Asia since AA-associated

154 UTUCs are prevalent(1, 25). Consistent with our previous epidemiological study(21), we found

155 an increased rate of multifocality and high bladder recurrence in the AA subtype (fig. 2B). Field

156 cancerization(26), which is the development of a field with genetically altered cells, has been

157 proposed to explain the development of multiple primary tumours and local recurrence. Therefore,

158 we further sequenced three AA subtype patients (table S4) including a multifocal patient (fig. 4A

159 and fig. S6). We did find that significant amount of SNVs and indels in the morphologically normal

160 urothelium specimens in all three patients. Strikingly, the AA mutational signature was

161 consistently identified in urothelium specimens and tumor tissues in all 3 patients (fig. 4B), which

162 indicated that AA exposure may contribute to the field cancerization. Probably damaging

7

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

163 mutations in genes that were documented by the COSMIC and the KEGG pathway in cancer were

164 shown in table S5. Moreover, in the multifocal patient, the urothelial tumour at renal pelvic from

165 2007 shared no genetic alterations with a renal pelvic tumour from 2015 or a bladder tumour from

166 2015. However, the two tumours from 2015 were genetically related (fig. 4C). Therefore, field

167 cancerization and intraluminal seeding could co-contribute to the multifocality and increased

168 bladder recurrence in the AA subtype patients.

169 Potential for immunotherapy in the AA subtype of UTUC patients

170 The AA subtype bears high mutation burdens thus may be good candidate for immune checkpoint

171 blockade therapy(27). To address this possibility, we predicted neoantigens binding to patient-

172 specific human leukocyte antigen (HLA) types. The AA subtype had the highest number of

173 predicted neoantigens (fig. 5A). Moreover, it has been reported that lymphocytes infiltration

174 especially CD3+ lymphocytes in tumour region is associated with improved survival in a range of

175 cancers including urothelial cancer(28-30). Previous study has shown that the number of tumor-

176 infiltrating lymphocyte independently correlates progress-free survival in NSCLC patients treated

177 with nivolumab immunotherapy(31). Pathological assessment of tumor-infiltrating lymphocytes

178 allows an easy evaluation of immune infiltration. Therefore, we further evaluated the extent of

179 tumor-infiltrating mononuclear cells (TIMCs) and CD3+ lymphocytes in 76 available samples

180 (table S6). We found the number of CD3+ lymphocytes was positively associated with the number

181 of stromal TIMCs (R2=0.72; p<0.001) (fig. 5B). The AA subtype had the highest number of both

182 stromal TIMCs (Wilcoxon rank test, p<0.001) and CD3+ lymphocytes (Wilcoxon rank, p<0.001)

183 (fig. 5C and 5D). Representative images from a patient of the AA subtype and the Age subtype

184 were shown in the fig. 5E and fig. 5F, respectively.

185 186 Discussion

8

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

187 Herein, we performed the first comprehensive genomic analysis of Chinese UTUC patients.

188 Three subtypes were defined by mutational signatures. We found that the AA subtype was

189 significantly associated with the usage of AA-containing herb drugs. However, a substantial

190 percentage of patients with the AA-mutational signature did not report usage of AA-containing

191 herb drugs. This finding may be due to the difficulty to track the dosage of AAs from various

192 herbal remedies. Some AA-containing herbal remedies have been officially prohibited in China

193 since 2003, but contains over 500 species, many of which have medicinal

194 properties(32). In the current study, we only took 70 AA-containing single products, such as

195 Guang Fangchi, Qing Mu Xiang, Ma Dou Ling, Tian Xian Teng, Xun Gu Feng, and Zhu Sha Lian

196 in Mandarin Chinese and mixed herbal formulas, such as, Long Dan Xie Gan, into account. And

197 the detail list of AA-containing herb drugs surveyed by current study was shown in table S7. Some

198 herb plants containing AA were not banned such as Xi Xin(33). Exposure to AAs and their

199 derivatives may still be widespread in China. Notably, we showed AA-mutational signature was

200 also identified in histologically “normal” urothelial cells. In a published study(34), similar “field

201 effect” was also identified in an Chinese UTUC patient. Thus, non-invasive urine test based on the

202 AA-mutational signature could take advantage of this “field effect” to screen individuals at

203 increased risk of recurrence due to exposing to herbal remedies containing the carcinogen AA.

204 One limitation of current study was lack of a matched normal for 85 of 90 UTUC patients due

205 to lack of funds. Therefore, we relied on the public mutation database and in-house genomic data

206 of 1000 healthy Chinese individuals for the filtering germline variances. But in recent years, these

207 catalogs of human variation have grown exponentially, questioning the necessity of sequencing a

208 matched normal for every tumor sample(35). Additionally, a matched normal will not be available

209 in many pathological conditions, such as cancer cell lines and liquid biopsy. Although we can not

9

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

210 make definite conclusions that the SNVs identified in current study were somatic mutations. We

211 may also overcall the Age subtypes without a matched normal since germline variants contributed

212 to signature 5 in COSMIC which combined with signature 1 as age-related signature in current

213 study. However, with this caveat, the mutational signature subtypes did capture distinct clinical

214 and genomic features, such as better overall survival, higher level of wGII and higher level of MSI

215 in DSB subtype compared to Age subtype of patients.

216 Additionally, our study supports a rationale for management of UTUC, most notably the AA

217 subtype. Patients with AA mutational signature may predict better checkpoint inhibitors response.

218 Strikingly, the DSB subtype was featured by signature 3 in COSMIC. In pancreatic cancer,

219 responders to platinum therapy usually exhibit signature 3 mutations(36). These results suggested

220 that platinum-based therapy may also benefit UTUC patients of DSB subtype. More recently,

221 protein components of the DNA damage repair (DDR) systems have been identified as promising

222 avenues for targeted cancer therapeutics(37). We also further assessed the mutation rates across

223 345 genes associated with DDR, as previously described in a pan-cancer analysis(37) (fig. S7).

224 There was a 1.6 fold enrichment (P=0.007, 95% CI:1.12-2.20) and a 1.4 fold enrichment (P=0.01,

225 95% CI:1.07-1.86) of DDR genes alterations in DSB and Age subtype, respectively. Further

226 mechanistic insights of the DSB and Age subtypes may provide the basis for therapeutic

227 opportunities within the DNA damage response. In summary, our study provides the most

228 comprehensive genomic profile of Chinese UTUC patients to date. The findings may accelerate

229 the development of novel prognostic markers and personalized therapeutic approaches for UTUC

230 patients especially those from geographical regions with exposure to AA-containing herb drugs.

231

232 Materials and Methods

10

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

233 Patients cohort

234 All the fresh samples in this study were obtained from Peking University First Hospital between

235 January 2016 to December 2017 (Grant No.2015[977]). These fresh samples were stored in liquid

236 nitrogen after surgery immediately. Formalin-fixed, paraffin-embedded (FFPE) samples were

237 provided by the Institute of Urology after pathologic diagnosis. The main endpoint events

238 consisted of overall survival (OS), cancer-specific survival (CSS), metastasis-free survival (MFS),

239 progression-free survival (PFS) and bladder-recurrence free survival (BRFS). All patients were

240 not treated with neoadjuvant chemotherapy. AA exposure assessment was performed according to

241 self-reported data on 70 AA and its derivatives-containing herb drugs (collectively, AA) intake(21,

242 38). These herbs were taken as single products (Guan Mu Tong (Aristolochia manshuriensis Kom),

243 Guang Fangchi (Aristolochia fangchi), Qing Mu Xiang (Radix Aristolochiae), Ma Dou Ling

244 (Fructus Aristolochiae), Tian Xian Teng (Caulis Aristolochiae), Xun Gu Feng (herba

245 Aristolochiae mollissimae), and Zhu Sha Lian (Aristolochia cinnabarina)) or were components of

246 mixed herbal formulas (eg, Guan Mu Tong in the Long Dan Xie Gan mixture). And the detail list

247 of AA-containing herb drugs surveyed by current study was shown in table S7. The accumulated

248 self-reported usage for the above drugs more than a year was termed as AA exposure patients.

249 Clinical and demographic information was obtained from a prospectively maintained institutional

250 database. The study was approved by the Ethics Committee of Peking University First Hospital.

251 Whole-genome sequencing

252 The sheared DNA was repaired and 3´ dA-tailed using the NEBNext Ultra II End Repair/dA-

253 Tailing Module unit and then ligated to paired-end (PE) adaptors using the NEBNext Ultra II

254 Ligation Module unit. After purification by AMPure XP beads, the DNA fragments were amplified

11

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

255 by PCR for 6-8 cycles. Finally, the libraries were sequenced with an Illumina X-ten instrument,

256 thus generating 2 x 150-bp PE reads.

257 Single nucleotide variations (SNVs) and indels calling

258 SNVs and indels were called using VarScan2(39)and Vardict(40) in the form of single sample

259 mode, then Rtg-tools(41) was used to remove variants called in a set of 1000 healthy, unrelated

260 Chinese individuals (unpublished data from CAS Precision Medicine Initiative) and got the

261 common variants called by the two software. Further filtering criteria were carried out according

262 to the reference(11, 42). Finally, Annovar(43) was applied to remove variants whose mutation

263 frequency was no less than 0.001 in 1000 Genomes project, latest Exome Aggregation Consortium

264 dataset, NHLBI-ESP project with 6500 exomes, latest Haplotype Reference Consortium database

265 and latest Kaviar database.

266 Mutational signature analysis

267 R package MutationalPatterns(17) was carried out to assign each sample’ signature. We discovered

268 23 COSMIC signatures which were merged by shared etiologies into 9 signatures in our cohort.

269 We named signature 22 which has been found in cancer samples with known exposures to

270 aristolochic acid(1, 8) as AA, DNA double strand break-repair deficiency featured by signature 3

271 as DSB. We merged signature 1 and 5 as Age(18), signature 2, 13 as APOBEC, combined

272 signatures including signature 6, 15, 20, 21, 26, 27 as MSI, combined signatures including

273 signature 8, 12, 16, 17, 18, 19, 23, 25, 28, 30 as unknown. Mutational signatures was extracted

274 from the mutation count matrix, with non-negative matrix factorization (NMF). Hierarchical

275 clustering was performed by the number of single nucleotide variants (SNVs) attributable to each

276 signature.

277 Copy number variations (CNVs) analysis

12

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

278 To determine copy number states, we used a recently published method(44). Briefly, the genome

279 was divided into bins of variable length with an equal amount of potential uniquely mapping reads.

280 We simulated approximately 37X hg19 reads mapped by BWA and divided the genome into

281 50,000 bins of the same mapping ability. Loess normalization was used to correct for GC bias.

282 Candidate driver gene analysis

283 Candidate driver genes were explored by MutSigCV(45) to analyse SNVs and indels. Chromatin

284 state, DNA replication time, expression level and the first 20 principal components of 169

285 chromatin marks(46) from the RoadMap Epigenomics Project(47) of the mutated gene were used

286 as covariates. We applied the cut off 0.05 for Q value. Cancer census genes from COSMIC(48)

287 release v87 were calculated for their mutation rates (mutations/gene length) and genes with more

288 than 10 samples were selected with high mutation rates.

289 The weighted genome integrity index score (wGII) analysis

290 We calculated the wGII score as reported(49). Briefly, the percentage of gained and lost genomic

291 material was calculated relative to the ploidy of the sample. The use of percentages eliminates the

292 bias induced by differing chromosome sizes. The wGII score of a sample was defined as the

293 average of this percentage value over the 22 autosomal chromosomes.

294 Microsatellite instability (MSI) analysis

295 mSINGS(50) was used to calculate the ratio of unstable loci in the genome of each sample. First,

296 the microsatellites at the reference genome were calculated by MSIsensor(51). Sixteen

297 microsatellite stable esophageal squamous cell carcinomas(52) were used to calculate the baseline

298 for the absence of MSI signature (signature 6, 15, 20, 21, 26, 27) as defined by COSMIC(48).

299 Neoantigen prediction

13

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

300 HLAscan(53) was used to genotype the HLA region with HLA-A, HLA-B and HLA-C taken into

301 consideration. NetMHC4.0(54) was used for predictions of peptide-MHC class I interaction, SNVs

302 annotated by Annovar(43) as nonsynonymous were used to do this analysis. An in-house script

303 was carried out to obtain possible 9–amino acid sequences covering the mutated amino acids

304 according to the manual. Rank in the output that was no more than two were considered as binding

305 in the light of BindLevel suggestion.

306 Assessing tumor-infiltrating lymphocytes

307 The tumor infiltrating mononuclear lymphocytes were measured according to a standardized

308 method from the International Immuno-Oncology Biomarkers Working Group(55). And the CD3

309 antibody (ab5690, 1:10000; Abcam) was used to evaluate the CD3+ lymphocytes in tumor section.

310 Acknowledgements: We thank Gengyan Xiong and Lei Zhang for scientific inputs about the

311 manuscript as well as Han Hao, Qin Tang and Xiaoteng Yu for assistance with clinicopathological

312 characterizations.

313

314 Funding: This work was supported by CAS Strategic Priority Research Program (XDA16010102

315 to W.C.), the National Key R&D Program of China (2018YFC2000100 to W.C.), CAS (QYZDB-

316 SSW-SMC039 to W.C.), the National Natural Science Foundation of China (7152146 to X.L.,

317 81672541 to W.C. and 81672546 to L.Z.), K.C. Wong Education Foundation to W.C., the Clinical

318 Features Research of Capital (Z151100004015173 to L.Z.), the Capital Health Research and

319 Development of Special (2016-1-4077 to L.Z.).

320 Disclosure: Weimin Ci certifies that there are no conflicts of interest.

321 Data availability

14

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

322 The raw sequence data reported in this paper have been deposited in the Genome Sequence

323 Archive(56) in BIG Data Center(57), Beijing Institute of Genomics (BIG), Chinese Academy of

324 Sciences (accession numbers HRA000029). That can be accessed at http://bigd.big.ac.cn/gsa-

325 human/s/HiObV4f3.

326 327 328 1. Chen CH, Dickman KG, Moriya M, Zavadil J, Sidorenko VS, Edwards KL, et al. Aristolochic acid- 329 associated urothelial cancer in Taiwan. Proc Natl Acad Sci U S A. 2012;109(21):8241-6. 330 2. Yang MH, Chen KK, Yen CC, Wang WS, Chang YH, Huang WJ, et al. Unusually high incidence of 331 upper urinary tract urothelial carcinoma in Taiwan. Urology. 2002;59(5):681-7. 332 3. Hsiao PJ, Hsieh PF, Chang CH, Wu HC, Yang CR, Huang CP. Higher risk of urothelial carcinoma in the 333 upper urinary tract than in the urinary bladder in hemodialysis patients. Ren Fail. 2016;38(5):663-70. 334 4. Roupret M, Babjuk M, Comperat E, Zigeuner R, Sylvester RJ, Burger M, et al. European Association of 335 Urology Guidelines on Upper Urinary Tract Urothelial Carcinoma: 2017 Update. European urology. 336 2018;73(1):111-22. 337 5. Singla N, Fang D, Su X, Bao Z, Cao Z, Jafri SM, et al. A Multi-Institutional Comparison of 338 Clinicopathological Characteristics and Oncologic Outcomes of Upper Tract Urothelial Carcinoma in China 339 and the United States. J Urol. 2017;197(5):1208-13. 340 6. Grollman AP. Aristolochic acid nephropathy: Harbinger of a global iatrogenic disease. Environmental 341 and molecular mutagenesis. 2013;54(1):1-7. 342 7. Poon SL, Pang ST, McPherson JR, Yu W, Huang KK, Guan P, et al. Genome-wide mutational 343 signatures of aristolochic acid and its application as a screening tool. Sci Transl Med. 2013;5(197):197ra01. 344 8. Hoang ML, Chen CH, Sidorenko VS, He J, Dickman KG, Yun BH, et al. Mutational signature of 345 aristolochic acid exposure as revealed by whole-exome sequencing. Sci Transl Med. 2013;5(197):197ra02. 346 9. Ng AWT, Poon SL, Huang MN, Lim JQ, Boot A, Yu W, et al. Aristolochic acids and their derivatives 347 are widely implicated in liver cancers in Taiwan and throughout Asia. Science translational medicine. 348 2017;9(412). 349 10. Helleday T, Eshtad S, Nik-Zainal S. Mechanisms underlying mutational signatures in human cancers. 350 Nature reviews Genetics. 2014;15(9):585-98. 351 11. Secrier M, Li X, de Silva N, Eldridge MD, Contino G, Bornschein J, et al. Mutational signatures in 352 esophageal adenocarcinoma define etiologically distinct subgroups with therapeutic relevance. Nat Genet. 353 2016;48(10):1131-41. 354 12. Schulze K, Imbeaud S, Letouze E, Alexandrov LB, Calderaro J, Rebouissou S, et al. Exome sequencing 355 of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets. Nat 356 Genet. 2015;47(5):505-11. 357 13. Moss TJ, Qi Y, Xi L, Peng B, Kim TB, Ezzedine NE, et al. Comprehensive Genomic Characterization of 358 Upper Tract Urothelial Carcinoma. European urology. 2017. 359 14. Audenet F, Isharwal S, Cha EK, Donoghue MTA, Drill EN, Ostrovnaya I, et al. Clonal Relatedness and 360 Mutational Differences between Upper Tract and Bladder Urothelial Carcinoma. Clinical cancer research : an 361 official journal of the American Association for Cancer Research. 2018. 362 15. Moss TJ, Qi Y, Xi L, Peng B, Kim TB, Ezzedine NE, et al. Comprehensive Genomic Characterization of 363 Upper Tract Urothelial Carcinoma. European urology. 2017;72(4):641-9. 364 16. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, et al. Signatures of

15

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

365 mutational processes in human cancer. Nature. 2013;500(7463):415-21. 366 17. Blokzijl F, Janssen R, van Boxtel R, Cuppen E. MutationalPatterns: comprehensive genome-wide 367 analysis of mutational processes. Genome Med. 2018;10(1):33. 368 18. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, et al. Clock-like mutational 369 processes in human somatic cells. Nat Genet. 2015;47(12):1402-7. 370 19. Grollman AP, Shibutani S, Moriya M, Miller F, Wu L, Moll U, et al. Aristolochic acid and the etiology of 371 endemic (Balkan) nephropathy. Proc Natl Acad Sci U S A. 2007;104(29):12129-34. 372 20. Yang HY, Wang JD, Lo TC, Chen PC. Increased risks of upper tract urothelial carcinoma in male and 373 female chinese herbalists. Journal of the Formosan Medical Association = Taiwan yi zhi. 2011;110(3):161-8. 374 21. Zhong W, Zhang L, Ma J, Shao S, Lin R, Li X, et al. Impact of aristolochic acid exposure on oncologic 375 outcomes of upper tract urothelial carcinoma after radical nephroureterectomy. OncoTargets and therapy. 376 2017;10:5775-82. 377 22. Kikuchi E, Margulis V, Karakiewicz PI, Roscigno M, Mikami S, Lotan Y, et al. Lymphovascular invasion 378 predicts clinical outcomes in patients with node-negative upper tract urothelial carcinoma. Journal of clinical 379 oncology : official journal of the American Society of Clinical Oncology. 2009;27(4):612-8. 380 23. Lee J, Lee AJ, Lee JK, Park J, Kwon Y, Park S, et al. Mutalisk: a web-based somatic MUTation AnaLyIS 381 toolKit for genomic, transcriptional and epigenomic signatures. Nucleic Acids Res. 2018;46(W1):W102-W8. 382 24. Adalsteinsson VA, Ha G, Freeman SS, Choudhury AD, Stover DG, Parsons HA, et al. Scalable whole- 383 exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat Commun. 384 2017;8(1):1324. 385 25. Colin P, Koenig P, Ouzzane A, Berthon N, Villers A, Biserte J, et al. Environmental factors involved in 386 of urothelial cell carcinomas of the upper urinary tract. BJU Int. 2009;104(10):1436-40. 387 26. Vanharanta S, Massague J. Field cancerization: something new under the sun. Cell. 388 2012;149(6):1179-81. 389 27. Yarchoan M, Hopkins A, Jaffee EM. Tumor Mutational Burden and Response Rate to PD-1 Inhibition. 390 The New England journal of medicine. 2017;377(25):2500-1. 391 28. Huh JW, Lee JH, Kim HR. Prognostic significance of tumor-infiltrating lymphocytes for patients with 392 colorectal cancer. Archives of surgery. 2012;147(4):366-72. 393 29. Thomas NE, Busam KJ, From L, Kricker A, Armstrong BK, Anton-Culver H, et al. Tumor-infiltrating 394 lymphocyte grade in primary melanomas is independently associated with melanoma-specific survival in the 395 population-based genes, environment and melanoma study. Journal of clinical oncology : official journal of 396 the American Society of Clinical Oncology. 2013;31(33):4252-9. 397 30. Lipponen PK, Eskelinen MJ, Jauhiainen K, Harju E, Terho R. Tumour infiltrating lymphocytes as an 398 independent prognostic factor in transitional cell . European journal of cancer. 399 1992;29A(1):69-75. 400 31. Gataa I ML, Auclin E, Le Moulec S, Alemany P, Kossai M, Massé J , CARAMELLA C, Remon Masip J, 401 Lahmar J, Ferrara R, Gazzah A, Soria JC, Planchard D, Besse B, Adam J. . 112P Pathological evaluation of 402 tumor infiltrating lymphocytes and the benefit of nivolumab in advanced non-small cell lung cancer (NSCLC). 403 Annals of Oncology. 2017;28(suppl_5). 404 32. KreLL D SJ. Aristolochia: the malignant truth. Lancet Oncology. 2013;14(1):25-6. 405 33. Hsieh SC, Lin IH, Tseng WL, Lee CH, Wang JD. Prescription profile of potentially aristolochic acid 406 containing Chinese herbal products: an analysis of National Health Insurance data in Taiwan between 1997 407 and 2003. Chinese medicine. 2008;3:13. 408 34. Du Y, Li R, Chen Z, Wang X, Xu T, Bai F. Mutagenic Factors and Complex Clonal Relationship of 409 Multifocal Urothelial Cell Carcinoma. European urology. 2016. 410 35. Kumar A, White TA, MacKenzie AP, Clegg N, Lee C, Dumpit RF, et al. Exome sequencing identifies a 411 spectrum of mutation frequencies in advanced and lethal prostate cancers. Proceedings of the National

16

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

412 Academy of Sciences of the United States of America. 2011;108(41):17087-92. 413 36. Greer JB, Whitcomb DC. Role of BRCA1 and BRCA2 mutations in pancreatic cancer. Gut. 414 2007;56(5):601-5. 415 37. Pearl LH, Schierz AC, Ward SE, Al-Lazikani B, Pearl FM. Therapeutic opportunities within the DNA 416 damage response. Nat Rev Cancer. 2015;15(3):166-80. 417 38. Li WH, Yang L, Su T, Song Y, Li XM. [Influence of taking aristolochic acid-containing Chinese drugs 418 on occurrence of urinary transitional cell cancer in uremic uremic patients undergoing dialysis]. Zhonghua yi 419 xue za zhi. 2005;85(35):2487-91. 420 39. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and 421 copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22(3):568-76. 422 40. Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, et al. VarDict: a novel and 423 versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res. 424 2016;44(11):e108. 425 41. Cleary JG, Braithwaite R, Gaastra K, Hilbush BS, Inglis S, Irvine SA, et al. 2015. 426 42. Li H. Toward better understanding of artifacts in variant calling from high-coverage samples. 427 Bioinformatics. 2014;30(20):2843-51. 428 43. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high- 429 throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. 430 44. Baslan T, Kendall J, Rodgers L, Cox H, Riggs M, Stepansky A, et al. Genome-wide copy number 431 analysis of single cells. Nat Protoc. 2012;7(6):1024-41. 432 45. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational 433 heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214-8. 434 46. Martincorena I, Raine KM, Gerstung M, Dawson KJ, Haase K, Van Loo P, et al. Universal Patterns of 435 Selection in Cancer and Somatic Tissues. Cell. 2017;171(5):1029-41.e21. 436 47. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 437 111 reference human epigenomes. Nature. 2015;518(7539):317-30. 438 48. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of 439 Somatic Mutations In Cancer. Nucleic acids research. 2018. 440 49. Endesfelder D, Burrell RA, Kanu N, McGranahan N, Howell M, Parker PJ, et al. Chromosomal 441 Instability Selects Gene Copy-Number Variants Encoding Core Regulators of Proliferation in ER+Breast 442 Cancer. Cancer Research. 2014;74(17):4853-63. 443 50. Salipante SJ, Scroggins SM, Hampel HL, Turner EH, Pritchard CC. Microsatellite instability detection 444 by next generation sequencing. Clin Chem. 2014;60(9):1192-9. 445 51. Niu B, Ye K, Zhang Q, Lu C, Xie M, McLellan MD, et al. MSIsensor: microsatellite instability detection 446 using paired tumor-normal sequence data. Bioinformatics. 2014;30(7):1015-6. 447 52. Song Y, Li L, Ou Y, Gao Z, Li E, Li X, et al. Identification of genomic alterations in oesophageal 448 squamous cell cancer. Nature. 2014;509(7498):91-5. 449 53. Ka S, Lee S, Hong J, Cho Y, Sung J, Kim HN, et al. HLAscan: genotyping of the HLA region using 450 next-generation sequencing data. BMC Bioinformatics. 2017;18(1):258. 451 54. Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: Improved Peptide- 452 MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol. 453 2017;199(9):3360-8. 454 55. Hendry S, Salgado R, Gevaert T, Russell PA, John T, Thapa B, et al. Assessing Tumor-Infiltrating 455 Lymphocytes in Solid Tumors: A Practical Review for Pathologists and Proposal for a Standardized Method 456 from the International Immuno-Oncology Biomarkers Working Group: Part 2: TILs in Melanoma, 457 Gastrointestinal Tract Carcinomas, Non-Small Cell Lung Carcinoma and Mesothelioma, Endometrial and 458 Ovarian Carcinomas, Squamous Cell Carcinoma of the Head and Neck, Genitourinary Carcinomas, and

17

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

459 Primary Brain Tumors. Advances in anatomic pathology. 2017;24(6):311-35. 460 56. Wang Y, Song F, Zhu J, Zhang S, Yang Y, Chen T, et al. GSA: Genome Sequence Archive. 461 Genomics, proteomics & bioinformatics. 2017;15(1):14-8. 462 57. Members BIGDC. Database Resources of the BIG Data Center in 2018. Nucleic Acids Res. 463 2018;46(D1):D14-D20. 464 465 466

467 468 Fig. 1. Oncoplot of genomic alterations in UTUC. (A) The number of coding mutations were 469 represented by barplots. (B) Selected patient clinical features were included. (C) Cancer consensus 470 frequently mutated genes and mutation types were indicated in the legend. (D) Selected loci with 471 high recurrence rate among patients were shown. Copy number gain/loss and 472 amplification/deletion were inferred as genes with GISTIC scores of 1 and 2, respectively. BLCA: 473 synchronous bladder cancer, bladder recurrence and bladder cancer history. AA: aristolochic acid. 474 CKD: chronic kidney disease. 475 476

18

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

477 478 Fig. 2. Genomic subtypes defined by mutational signatures. (A) Bar plot of the number of 479 SNVs attributable to 9 merged signatures in each of the 90 tumors, sorted by hierarchical clustering 480 (dendrogram at bottom), revealing AA-related (AA, yellow), Age-related (Age, orange), and DNA 481 damage repair deficient (DSB, green). Selected clinical features were represented in the bottom 482 tracks. (B) The box plot showed the number of signature 22 mutations in tumors within each 483 signature subtype. *P<0.05; **P<0.01; ***P< 0.001 (Wilcoxon rank test). (C) The box plot 484 showed the number of signatures 1&5 mutations in tumors within each signature subtype. *P<0.05; 485 **P<0.01; ***P< 0.001 (Wilcoxon rank test). (D) The box plot showed the number of signature 3 486 mutations in tumors within each signature subtype. *P<0.05; **P<0.01; ***P< 0.001 (Wilcoxon 487 rank test). (E) The bar graph showed the association between three subtypes and clinicopathologic 488 features. *P<0.05; **P<0.01; ***P< 0.001 (Kruskal-Wallis test). (F) Comparison of the weighted 489 Genome Instability Index (wGII) scores among three subtypes. (G) Microsatellite instability (MSI) 490 status was evaluated by fraction of unstable microsatellite loci identified by WGS data. P-values

19

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

491 were calculated by Wilcoxon rank test (*P<0.05, **P< 0.01, ***P< 0.001). 492

493 494 Fig. 3. Subtypes defined by mutational signatures predict clinical outcome. (A)-(D)Kaplan- 495 Meier survival curves showed the mutational signature subtypes can predict both CSS and MFS 496 for the whole cohort, as well as for muscle-invasive UTUC patients. CSS: cancer-specific survival. 497 MFS: metastasis-free survival. P-values were calculated by the log-rank test. n, the number of 498 cases. 499 500

20

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

501

502 503 Fig. 4. Field cancerization may contribute to malignant transformation especially for the AA 504 subtype. (A) Laser capture microdissection of urothelium from a representative patient. Magnified 505 insets (13x) of the black frame area showed ureteral urothelium before laser capture and the 506 remaining cells after laser capture. (B) Trinucleotide contexts for somatic mutations in biopsies 507 from three patients of the AA subtype. The height of each bar (the y-axis) represented the 508 proportion of all observed mutations that fall into a particular trinucleotide mutational class. AA:

21

bioRxiv preprint doi: https://doi.org/10.1101/735324; this version posted August 15, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

509 aristolochic acid; UU: ureter urothelium; BU: bladder urothelium; RC: renal cortex; BT: bladder 510 tumor; UT: urothelium tumor; and PT: pelvis tumour. (C) Phylogenetic relationships of the six 511 samples from the multifocal patient were deciphered using mrbayes_3.2.2. Spatial locations of 512 core biopsies of the patient were presented in the left panel. 513 514 515

516 517 Fig. 5. Predicted neoantigens and tumor-infiltrating lymphocytes in each mutational subtype. 518 (A) Neoantigen burden was significantly higher in the AA subtype. Statistical significance was 519 determined by Wilcoxon rank test (***P<0.001). (B) Positive correlation of the percentage of 520 stromal tumor-infiltrating mononuclear cells (TIMCs) and the number of CD3+ lymphocytes in 76 521 UTUC patients of our cohort (nAA=23; nAge=32; nDSB=21). The percentage of stromal TIMCs (C) 522 and CD3+ lymphocytes (D) were shown in each subtype of patients. Statistical significance was 523 determined by the Wilcoxon rank test (**P<0.01, ***P<0.001). HP represented high-power field. 524 Images of TIMCs and CD3+ lymphocytes of a representative patient from the AA subtype (E) and 525 the Age subtype (F) Triangle highlighted the TIMCs or CD3+ lymphocytes in stromal tumour 526 region. Arrow highlighted the TIMCs or CD3+ lymphocytes in intra-tumour region. 527 528

22