Author Manuscript Published OnlineFirst on October 30, 2019; DOI: 10.1158/1055-9965.EPI-18-1060 Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited.

Common and rare sequence variants influencing tumor biomarkers in blood

Sigurgeir Olafsson1, Kristjan F. Alexandersson1, Johann G.K. Gizurarson1, Katrin Hauksdottir1, Orvar Gunnarsson2, Karl Olafsson3, Julius Gudmundsson1, Simon N. Stacey1, Gardar Sveinbjornsson1, Jona Saemundsdottir1, Einar S. Bjornsson4,5, Sigurdur Olafsson4,6, Sigurdur Bjornsson4,7, Kjartan B. Orvar4,7, Arnor Vikingsson8,9, Arni J. Geirsson8,10,11, Sturla Arinbjarnarson12, Gyda Bjornsdottir1, Thorgeir E. Thorgeirsson1, Snaevar Sigurdsson1, Gisli H. Halldorsson1, Olafur T. Magnusson1, Gisli Masson1, Hilma Holm1, Ingileif Jonsdottir1,5, Olof Sigurdardottir13, Gudmundur I. Eyjolfsson11, Isleifur Olafsson14, Patrick Sulem1, Unnur Thorsteinsdottir1,5, Thorvaldur Jonsson5,15, Thorunn Rafnar1, Daniel F. Gudbjartsson1,16*, Kari Stefansson1,5,.

1 deCODE genetics/AMGEN, Reykjavik, Iceland
2 Department of Oncology, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
3 Department of Obstetrics and Gynecology, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
4 Department of Medicine, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
5 Faculty of Medicine, University of Iceland, Reykjavik, Iceland
6 Division of Gastroenterology and Hepatology, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
7 The Medical Center, Glaesibae, Reykjavik, Iceland
8 Department of Medicine, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
9 Thraut Fibromyalgia Clinic, Reykjavik, Iceland
10 Center for Rheumatology Research, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
11 Icelandic Medical Center (Laeknasetrid), Laboratory in Mjodd (RAM), Reykjavik, Iceland
12 The Laboratory of the Medical Clinic Glaesibae, Reykjavik, Iceland
13 Department of Clinical Biochemistry, Akureyri Hospital, Akureyri, Iceland
14 Department of Clinical Biochemistry, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
15 Department of Surgery, Landspitali, The National University Hospital of Iceland, Reykjavik, Iceland
16 School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland

*To whom correspondence should be addressed. [email protected]. Address: Sturlugata 8, 101 Reykjavik, Iceland. Tel: +354 570-1900


Downloaded from cebp.aacrjournals.org on October 1, 2021. © 2019 American Association for Cancer Research.

Running title: GWAS of tumor biomarkers

Keywords: Genome-wide association study, Cancer Antigens, Biomarkers, Alkaline phosphatase, Alpha-fetoprotein, Cancer Antigen 15.3, Cancer Antigen 19.9, Cancer Antigen 125, Carcinoembryonic antigen, Cancer registry

Funding

This study was funded by deCODE Genetics/Amgen and supported in part by the National Institute of Dental and Craniofacial Research of the National Institutes of Health, under award number R01DE022905, awarded to Dr. Kari Stefansson, [email protected].

Conflict of interest

The authors who are affiliated with deCODE are employees of deCODE genetics/Amgen Inc. The other authors declare no conflict of interest.

Nr of tables: 3

Nr of figures: 2

Words in abstract: 241

Word count (excluding abstract, acknowledgements, section headers and references): 3421

Abstract

Background: Alpha-fetoprotein, cancer antigens 15.3, 19.9 and 125, carcinoembryonic antigen and alkaline phosphatase are widely measured in attempts to detect cancer and to monitor treatment response. However, due to lack of sensitivity and specificity, their utility is debated. The serum levels of these markers are affected by a number of non-malignant factors, including genotype. Thus, it may be possible to improve both sensitivity and specificity by adjusting test results for genetic effects.

Methods: We performed genome-wide association studies of serum levels of alpha-fetoprotein (N = 22,686), carcinoembryonic antigen (N = 22,309), cancer antigens 15.3 (N = 7,107), 19.9 (N = 9,945) and 125 (N = 9,824), and alkaline phosphatase (N = 162,774). We also examined the correlations between levels of these biomarkers and the presence of cancer, using data from a nation-wide cancer registry.

Results: We report a total of 84 associations of 79 sequence variants with levels of the six biomarkers, explaining between 2.3% and 42.3% of the phenotypic variance. Among the 79 variants, 22 are cis (in or near the gene encoding the biomarker), 18 have minor allele frequency less than 1%, 31 are coding variants and 7 are associated with gene expression in whole blood. We also find multiple conditions associated with higher biomarker levels.

Conclusions: Our results provide insights into the genetic contribution to diversity in concentration of tumor biomarkers in blood.

Impact: Genetic correction of biomarker values could improve prediction algorithms and decision-making based on these biomarkers.

Introduction

Tumor biomarkers are substances or processes that can indicate the presence of cancer [1]. Several tumor biomarkers are in clinical use for monitoring therapy but all lack the sensitivity and specificity to be used for screening. However, recent advances in the detection of circulating tumor DNA suggest that multi-analyte blood tests that combine an assay of somatically mutated DNA (“liquid biopsy”) and protein and carbohydrate biomarkers in serum have the potential both to find early cancer and to help determine its site of origin [2].

In this work, we focused on six commonly measured biomarkers, namely alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), cancer antigens (CA) 15.3, 19.9 and 125, and alkaline phosphatase (ALP). Measured in serum, these biomarkers are frequently used to monitor status of disease, response to therapy and recurrence [1]. AFP is used as a biomarker of hepatocellular carcinoma (HCC), endodermal sinus tumor of the ovary and non-seminoma testicular germ cell tumors (TGCT) [3]. CEA has been used as a biomarker for colorectal cancer [4]. CA-15.3 and CA-125 are mainly used as biomarkers of cancers of the breast and ovary, respectively [5, 6], and CA-19.9 is used as a biomarker for pancreatic cancer [7]. We also include ALP in our analysis because its levels are commonly elevated in cancers of the liver and bone and when other cancers metastasize to these tissues [8]. However, the measurement of ALP in serum is one of the most commonly ordered blood tests, and we recognize that there are many reasons for ALP measurements other than suspicion or monitoring of neoplasms.

Despite widespread use of these biomarkers in clinical practice, their low sensitivity and specificity continue to cause controversy over their use [9-11]. As their levels are partially determined by genetic factors, one approach to improve their sensitivity and specificity would be to define “normal” values based on age, sex and genotype [2, 9]. We have previously reported how genetic correction for variants affecting levels of prostate specific antigen (PSA) results in a personalized PSA cutoff value, which is more informative than a general cutoff value when deciding whether to perform a prostate biopsy [12].

The main goal of this study is to perform a genome-wide association study (GWAS) of the levels of all six tumor biomarkers to identify sequence variants that affect baseline biomarker levels, regardless of cancer diagnosis. We also describe the associations of the six tumor biomarker levels with various cancer diagnoses, obtained from a nation-wide cancer registry, and for comparison, with four non-neoplastic diseases.

Materials and Methods

Cancer diagnoses, including the date of diagnosis, were extracted from the Icelandic cancer registry (ICR) (http://www.cancerregistry.is), which contains all diagnoses of solid cancers made in the country from January 1st 1955 to December 31st 2015 [13]. We also assessed four other diseases: inflammatory bowel disease (IBD), liver cirrhosis and pancreatitis, because these diseases are associated with inflammation in the gastrointestinal organs, and fibromyalgia, because patients present with diverse symptoms and often undergo measurements for tumor biomarkers as part of a lengthy diagnostic journey. Tumor biomarker measurements were made in Icelandic laboratories from 1990 to 2015 and linked with disease diagnoses on the basis of encrypted social security numbers.

This study was approved by the National Bioethics Committee of Iceland (reference numbers VSNb2006010014/03.12, 06-007-V3 and VSNb201501033/03.12). A further description of subject recruitment, phenotyping, genotyping and imputation is available in the supplementary methods.

Genome-wide association study

To identify genetic variants associated with baseline biomarker values, we performed a GWAS of all available data, including both cancer patients and individuals without cancer diagnosis. When multiple measurements of a biomarker were available for a subject, we used the earliest value recorded. We found this approach to be the most powerful, as exclusion of cancer patients resulted in a great loss of power, particularly for residual associations (secondary, tertiary variants, etc.). The first measurement was used because this meant that measurements taken months or years before cancer diagnosis were available for a subset of the cancer patients.
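The selection of the earliest recorded value per subject and biomarker can be illustrated with a minimal sketch. The record layout (subject identifier, biomarker name, date, value) and the function name `earliest_measurement` are hypothetical and chosen only for illustration; the example values are invented and are not study data.

```python
from datetime import date


def earliest_measurement(records):
    """Keep, for each (subject, biomarker) pair, the record with the earliest date.

    `records` is an iterable of (subject_id, biomarker, measured_on, value) tuples.
    Returns a dict mapping (subject_id, biomarker) to (measured_on, value).
    """
    first = {}
    for subject_id, biomarker, measured_on, value in records:
        key = (subject_id, biomarker)
        if key not in first or measured_on < first[key][0]:
            first[key] = (measured_on, value)
    return first


# Invented toy records: subject s1 has two CEA measurements.
records = [
    ("s1", "CEA", date(2004, 5, 1), 3.1),
    ("s1", "CEA", date(2001, 2, 10), 2.4),
    ("s2", "CEA", date(2010, 7, 3), 150.0),
    ("s1", "AFP", date(2003, 1, 1), 4.0),
]
baseline = earliest_measurement(records)
```

Here `baseline[("s1", "CEA")]` holds the 2001 measurement, which is the value that would enter the association analysis under this rule.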

As indicated in Table 1, the biomarkers all show extremely right-skewed distributions. The data contain a number of extreme outliers, but no trends were observed linking these with date of measurement, age of the subject or even cancer type (Supplementary figure 1).

We performed a rank-based inverse normal transformation, adjusting for age at measurement and time to death for deceased subjects, for each sex separately. Adjustment for time to death was performed as we have observed large changes close to the time of death for many quantitative measurements. In this case, a high biomarker value shortly before death might indicate a high tumor burden. We tested the association between biomarker value and genotype by a generalized form of linear regression. To assess significance of primary associations, we used different P value thresholds depending on the annotation class of the variant, as described in Sveinbjornsson et al [14]. We consider loss-of-function variants (frameshifts, stop codon gained/lost, initiator codon variants and splice acceptor/donor variants) significant at 3.6 × 10⁻⁷; in-frame insertions/deletions, missense and splice region variants at 7.4 × 10⁻⁸; synonymous variants, up/downstream variants, variants that resulted in a stop codon being retained, and variants in 3’/5’ untranslated regions at 5.3 × 10⁻⁹; intronic and intergenic variants within DNase hypersensitivity sites at 3.3 × 10⁻⁹; and intergenic and intronic variants outside DNase hypersensitivity sites at 1.1 × 10⁻⁹. We used the Variant Effect Predictor (VEP) release 80 [15] to annotate variants, considering only protein-coding transcripts from RefSeq release 67 [16].
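The transformation and the class-specific thresholds described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the rank offset (rank − 0.5)/n, the averaging of tied ranks, the class labels in `SIGNIFICANCE_THRESHOLDS` and the strict comparison in `is_significant` are all assumptions, and the adjustment for age, time to death and sex is omitted. Only the five threshold values are taken from the text.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def inverse_normal_transform(values):
    """Map values to standard normal quantiles of their ranks.

    The offset (rank - 0.5) / n is an assumption; other offsets are in common use.
    """
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Thresholds as stated in the text; the class labels are hypothetical names.
SIGNIFICANCE_THRESHOLDS = {
    "loss_of_function": 3.6e-7,
    "missense_inframe_splice_region": 7.4e-8,
    "synonymous_updownstream_utr": 5.3e-9,
    "noncoding_within_dhs": 3.3e-9,
    "noncoding_outside_dhs": 1.1e-9,
}


def is_significant(p_value, annotation_class):
    """Compare a P value with the threshold for the variant's annotation class."""
    return p_value < SIGNIFICANCE_THRESHOLDS[annotation_class]


# Invented right-skewed toy values with one extreme outlier and one tie.
raw = [2.0, 3.5, 1200.0, 3.5, 1.1]
transformed = inverse_normal_transform(raw)
```

Because the transformation depends only on ranks, the outlier of 1200.0 is mapped to the same quantile it would receive if it were only slightly larger than the next value, which is why extreme values do not dominate the regression.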

Loci harboring variants associated by these criteria underwent further analyses to check for the presence of other, independent variants affecting the trait. While any variant passed the significance threshold at a locus, we tested all variants flanking the primary signal by 1-13 Mb, depending on the strength of association and the recombination pattern at the locus, by sequentially adding the top variant from previous steps as a covariate in the regression. The use of wide windows was occasionally necessary to avoid spurious associations arising from very low, but not negligible, LD between an extremely significant variant and distant variants. Residual associations were generally close to the primary signal, as can be seen in Tables 2 and 3. We considered variants significant if they passed a simple Bonferroni correction for the number of variants in the respective region. This process was repeated, adding the top variant as a covariate for the next round, until no significant associations remained. The gene expression and co-localization analyses are described in the supplementary methods.
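The stepwise conditional procedure described above can be sketched as follows. This is a simplified illustration and not the study's implementation: it uses ordinary least squares rather than the generalized regression used in the study, approximates the t statistic with a normal distribution, and divides the assumed significance level `alpha` by the number of candidate variants in the region. All function and variable names are hypothetical. Conditioning is implemented by residualizing the phenotype and the remaining genotypes on each selected variant, which gives the same slope as including that variant as a covariate.

```python
from statistics import NormalDist, fmean


def center(values):
    """Subtract the mean, which absorbs the regression intercept."""
    mean = fmean(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(target, covariate):
    """Remove from target its least-squares projection onto covariate."""
    cc = dot(covariate, covariate)
    if cc < 1e-12:
        return list(target)
    slope = dot(target, covariate) / cc
    return [t - slope * c for t, c in zip(target, covariate)]


def association_p(y, g, n_conditioned):
    """Two-sided P value for the slope of y on g (normal approximation)."""
    gg = dot(g, g)
    if gg < 1e-12:  # variant fully explained by already conditioned variants
        return 1.0
    slope = dot(y, g) / gg
    rss = sum((a - slope * b) ** 2 for a, b in zip(y, g))
    dof = len(y) - 2 - n_conditioned
    se = (rss / dof / gg) ** 0.5
    if se == 0.0:
        return 0.0
    return 2.0 * (1.0 - NormalDist().cdf(abs(slope) / se))


def conditional_analysis(phenotype, genotypes, primary, alpha=0.05):
    """Starting from the primary variant, add independent signals one at a time."""
    y = center(phenotype)
    g = {name: center(v) for name, v in genotypes.items()}
    threshold = alpha / (len(g) - 1)  # Bonferroni over candidate variants
    selected = [primary]
    top_vec = g.pop(primary)
    while g:
        # Condition the phenotype and remaining variants on the latest top variant.
        y = residualize(y, top_vec)
        g = {name: residualize(v, top_vec) for name, v in g.items()}
        pvals = {name: association_p(y, v, len(selected)) for name, v in g.items()}
        top = min(pvals, key=pvals.get)
        if pvals[top] >= threshold:
            break
        selected.append(top)
        top_vec = g.pop(top)
    return selected


# Invented toy locus: v3 is a perfect proxy for v1, v2 has its own effect.
v1 = [0, 0, 1, 1, 2, 2, 0, 1]
v2 = [0, 1, 0, 1, 0, 1, 2, 2]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
trait = [2 * a + b + e for a, b, e in zip(v1, v2, noise)]
signals = conditional_analysis(trait, {"v1": v1, "v2": v2, "v3": list(v1)}, "v1")
```

In this toy locus the proxy `v3` carries no information once `v1` is conditioned on, so only `v1` and `v2` are retained as independent signals.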

Comparison of biomarkers across diseases

Many of the subjects have undergone several biomarker measurements, prior to, during and/or after being diagnosed with cancer, with most measurements being from a few months before the official diagnosis date or later. Other subjects have undergone measurements without being diagnosed with cancer. The latter subjects show a trend for being younger than the cancer patients. Supplementary figure 2 shows the age distribution of each subject group for each biomarker. Some subjects have had more than one type of cancer. In this case, the first diagnosis was used.

We used a two-sided Wilcoxon rank-sum test to compare the highest values of biomarkers recorded for individuals with specific diseases to the highest values of subjects without cancer diagnosis (or diagnosis of any of the other diseases, Figure 2). We considered P-values significant if they passed Bonferroni correction for 144 tests (6 biomarkers × 24 disease conditions).
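The comparison described above can be sketched with a rank-sum test and a Bonferroni-corrected threshold. This is an illustrative sketch, not the study's code: the overall significance level of 0.05 is an assumption (only the number of tests, 144, is stated in the text), the P value uses the normal approximation with a tie correction and no continuity correction, and the toy values are invented.

```python
from statistics import NormalDist

N_TESTS = 6 * 24  # 6 biomarkers x 24 disease conditions
BONFERRONI_ALPHA = 0.05 / N_TESTS  # 0.05 is an assumed overall level


def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank-sum P value (normal approximation, tie-corrected)."""
    pooled = sorted(list(cases) + list(controls))
    n1, n2 = len(cases), len(controls)
    n = n1 + n2
    rank_of = {}   # average rank of each distinct value
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        t = j - i + 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(rank_of[x] for x in cases)  # rank sum of the case group
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return 1.0
    z = (w - mean_w) / var_w ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Invented highest-per-subject values for a patient group and a comparison group.
patients = [500, 800, 1200, 950, 700, 1500, 640, 880, 990, 1100, 760, 1300]
comparison = [3, 5, 8, 2, 11, 4, 7, 9, 6, 10, 12, 1]
p_patients = rank_sum_test(patients, comparison)
p_identical = rank_sum_test([1, 2, 3], [1, 2, 3])
```

With these toy values every patient value exceeds every comparison value, so `p_patients` falls below `BONFERRONI_ALPHA`, whereas two identical groups give a P value of 1.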

The cohorts for each biomarker moderately overlap. Supplementary figure 3A shows the overlap among subjects measured for AFP, CA-15.3, CA-19.9, CA-125 and CEA. Almost all subjects also had ALP measured. Supplementary figure 3B shows the overlap among extreme outliers, defined as subjects having biomarker values >1000 for AFP, CA-15.3, CA-125 and CEA and >5000 for CA-19.9.

Results

Sequence variants associated with biomarker levels

We performed a GWAS on biomarker levels with sample sizes ranging from 7,107 to 162,774 (Table 1) and identified 84 associations between sequence variants and biomarker levels. We found 3 sequence variants associated with AFP levels, 6 with CA-15.3, 9 with CA-125, 4 with CA-19.9, 11 with CEA and 51 with ALP (Figure 1, Tables 2 and 3). To assess whether any of the identified variants associates with risk of developing cancer, we tested the association between all the variants and the diagnosis of 20 cancer types in the Icelandic material. With the exception of the association of rs760077 with gastric cancer described below, none of the variants associates with the risk of developing cancer in our cohorts (P > 1 × 10⁻⁶, Supplementary figure 6). We also assessed the effect of the variants on gene expression in whole blood in a sample of 2,528 Icelanders who were whole-genome RNA sequenced (Table 3). Lastly, we performed a co-localization analysis to link the identified variants with a range of biochemical traits for which summary statistics are available at deCODE Genetics (Supplementary Table 2). We summarize the results for each biomarker below.

Alpha-fetoprotein (AFP)

We found three sequence variants associated with AFP levels that collectively explain 2.3% of its variance. Two are independent missense variants in the SERPINA1 gene on chromosome 14 (rs28929474-T (Glu366Lys, allele Z) and rs17580-A (Glu288Val, allele S)) and one is a common intergenic variant at the same locus. SERPINA1 encodes the serine protease inhibitor α-1-antitrypsin and we found the minor alleles of both missense variants to be associated with decreased quantity of α-1-antitrypsin in blood in 6,452 Icelandic subjects for whom this measurement was available (β = −1.71 SD, P = 1.0 × 10⁻⁸³ and β = −0.61 SD, P = 2.0 × 10⁻³⁸ for rs28929474-T and rs17580-A, respectively). Reduced levels of this protein are known to cause emphysema, cirrhosis and hepatocellular carcinoma [17]. The same alleles that reduce α-1-antitrypsin levels also associate with reduced levels of AFP. Additionally, rs28929474-T associates with higher levels of three liver enzymes [18] (Supplementary Table 2).

It is of note that we find no association of variants within the AFP gene or its promoter with serum levels of AFP. We replicate neither of the two variants reported to associate with AFP levels in the Chinese population [19] (P = 0.84 for rs12506899 and P = 0.01 with effect in the opposite direction for rs2251844).

Cancer Antigen 15.3 (CA-15.3)

We found six sequence variants associated with CA-15.3 levels that collectively explain 42.3% of its variance. Five of these variants are within 1 Mb of the MUC1 gene on chromosome 1, which encodes CA-15.3, and one lies within an intron of the ABO gene on chromosome 9. A single missense variant in MTX1, rs760077-A, explains 33.5% of the trait variance. We have previously shown that rs760077-A is associated with a higher number of tandem repeats within exon 2 of MUC1, and with protection against gastric cancer [20]. It is likely that the increased number of epitopes on Muc1 leads to higher apparent CA-15.3 levels in the carriers. A second variant at the locus, rs41264915-G, is associated with decreased levels of MUC1 mRNA in blood (β = −0.33 SD, P = 1.5 × 10⁻¹⁶), a surprising finding given that the same allele is associated with elevated CA-15.3 levels.

Cancer Antigen 125 (CA-125)

We found nine sequence variants associated with CA-125 levels that collectively explain 10.5% of its variance. One variant is within an intron of the MUC16 gene on chromosome 19, which encodes CA-125. One of the variants lies upstream of the MSLN gene on chromosome 16, which encodes mesothelin, a cell surface molecule known to bind CA-125 on the mesothelial lining [21]. We further found two low frequency missense variants, rs9927389 and rs150425699 (Ala72Val and Arg557His, respectively), in MSLN, suggesting that the observed effect on CA-125 levels is mediated through structural changes in mesothelin.

Five variants are in or close to GAL3ST2 on chromosome 2. Of these, three are low frequency (MAF<2%) missense variants (Pro143Leu, Leu383Pro and Tyr153Cys), implicating the encoded galactose-3-O-sulfotransferase in regulation of CA-125 levels. It is presently not clear how this enzyme affects CA-125 levels. We note that the binding site of the antibody used for CA-125 detection has not been reported by the antibody’s producer and speculate that variants in GAL3ST2 may influence the glycosylation of Mucin16 and thus affect the outcome of CA-125 measurements but not necessarily the quantity of the protein in blood. Notably, these variants explain a larger fraction of the variance than does the cis-variant in MUC16 (Table 2).

Cancer Antigen 19.9 (CA-19.9)

We found four sequence variants associated with CA-19.9 levels that collectively explain 27.4% of its variance. Some are moderately correlated with variants found to associate with CA-19.9 levels in the Chinese population [19]. CA-19.9 levels are defined by antibody binding to the cell surface carbohydrate Sialyl-Lewis A, and all the associated variants implicate genes known to function in the production and secretion of this molecule. We found a stop-gained variant, rs601338, in FUT2 (secretor gene of the Lewis antigen pathway), a variant upstream of FUT6, rs708686, and a variant in an intron of FUT3 (Lewis gene), rs2608894, to associate with CA-19.9 levels. Additionally, a variant upstream of B3GNT3, rs34262244-A, which greatly affects the expression of the gene (β = −1.04, P = 1.3 × 10⁻²⁹⁶), associates with CA-19.9. B3GNT3 encodes UDP-GlcNAc-β-Gal β-1,3-N-acetylglucosaminyltransferase 3, which plays a role in the formation of the backbone of sulfo-sialyl Lewis X tetrasaccharide structures [22].

The signals characterized by rs601338 (stop-gained in FUT2) and rs708686 (upstream of FUT6) co-localize with biochemical traits in blood (Supplementary Table 2). Among these are associations of rs708686-T with levels of Galactoside 3(4)-L-fucosyltransferase (β = −0.84 SD, P = 2.5 × 10⁻²², an effect consistent with reduced levels of CA-19.9), and replications of previously reported associations of rs601338 with lipase levels [23].

Carcinoembryonic antigen (CEA)

Eleven sequence variants were associated with CEA levels in our study, explaining 11.9% of its variance. Again, some show moderate correlations with variants associated with CEA levels in the Chinese population [19]. Five of the variants are within 250 kb of the CEACAM5 gene on chromosome 19, which encodes CEA. The variants rs601338 and rs708686, in FUT2 and upstream of FUT6, respectively, that associate with CA-19.9 levels also associate with CEA, but the effect of rs708686 is in the opposite direction to that on CA-19.9.

We also observed an association with an intergenic variant, rs7041150, and two independent variants upstream and downstream of ABO. One of these, rs10901252-C, associates with increased expression of ABO in whole blood (β = 1.61 SD, P = 1.8 × 10⁻²¹⁰). The variants in ABO co-localize with several blood-related traits such as hematocrit, hemoglobin and granulocyte levels [24] and with cholesterol levels [25] (Supplementary Table 2).

Alkaline phosphatase (ALP)

We identified 51 variants associated with ALP levels, collectively explaining 6.9% of its variance. We confirm 13 of 14 associations reported in a recent study of ALP in European populations [26], which partially overlaps with our material. Among these are several associations with the ALPL gene on chromosome 1, driven by familial clusters carrying very rare coding variants (Table 2). The remaining variants are scattered throughout the genome, with variants in or near genes involved in glycoprotein biology (ABO, FUT2, ST3GAL4, GPLD1, ASGR1) and carbohydrate and lipid metabolism (JMJD1C, TREH, HNF1A, B4GALNT3, PCK1, DHRS9, GCKR) being prominent. Many of these variants also co-localize with other biochemical traits, such as triglyceride concentrations [25] (Supplementary Table 2).

Out of the 51 variants, 19 are coding and 3 associate with changes in gene expression (Table 4). We found associations between rs8736-T and decreased expression of TMC4 (β = −0.97 SD, P < 10⁻³⁰⁰), between rs4654748-T and decreased expression of NBPF3 (β = −0.94 SD, P = 8.6 × 10⁻³¹¹) and between rs174564-G and increased expression of FADS2 in whole blood (β = 0.23 SD, P = 7.9 × 10⁻²³).

Diseases affecting biomarker levels

Given the lack of specificity of the cancer biomarkers, it is of interest to explore which conditions are likely to affect the levels of a particular marker. To search for conditions associated with elevated levels of the selected biomarkers, we compared the highest values for all individuals in a patient group with those of subjects without cancer diagnosis and identified several diseases where patients showed elevations in biomarker levels (Figure 2, Supplementary Table 1). We also compared the first measurements done on each individual (Supplementary Figure 4) as well as the proportion of the highest values falling into bins (Supplementary Figure 5) defined using the reference values used at Landspitali, the National University Hospital in Iceland (Table 1).

In general, the biomarker levels agree with the indicated clinical use of the respective biomarker, but there are some interesting departures from the expected. High AFP levels are overwhelmingly found in HCC, with increased levels also seen in TGCT and cirrhosis, a risk factor for HCC. The highest CA-125 levels are strongly associated with ovarian/peritoneal and pancreatic cancers as expected, but a highly significant association is also found between CA-125 levels and cancers of unknown primary site, along with a more moderate increase in a number of other malignancies. In addition to pancreatic cancer, CA-19.9 levels are also high in cholangiocarcinoma (CCA), but the increase in other cancers is much smaller. CA-15.3 shows a moderate increase in several cancer types in addition to breast cancer, which does not stand out with respect to this biomarker. The levels of CEA and ALP are highest in cancers of the gastrointestinal tract, but both markers show increased levels in many cancer types, and ALP levels are increased in all the non-cancer phenotypes tested as well.

Discussion

Tumor antigens are often measured in individuals with unknown ailments in search of diagnostic clues, as well as being used for monitoring the progression of tumors and response to treatment. In our study, we sought to identify variants that influence tumor biomarker values regardless of cancer diagnosis. We did not remove cancer patients from the cohort before association testing. Their inclusion adds variance to the dataset and results in more conservative estimates of the variance explained by each variant. We repeated the GWAS of the cancer antigens, excluding patients with breast cancer for CA-15.3, peritoneal and ovarian cancer for CA-125, pancreatic cancer for CA-19.9, and pancreatic, colorectal and gastric cancer for CEA. We observed all the same primary associations but generally larger effect estimates and higher P-values, as a result of reduced sample size. This, together with the observation that only rs760077 is associated with cancer status in the Icelandic cohorts (Supplementary Figure 6), suggests that these variants generally do not act through an effect on cancer risk or aggressiveness.

We report 84 associations between sequence variants and tumor biomarker levels in the Icelandic population. While we are unaware of a study having been published on CA-15.3 and CA-125, variants affecting the other biomarkers have been reported [19, 27]. We confirmed most of those associations and discovered many additional variants, some of which are rare variants with large effects. In an attempt to shed light on their biology, we tested all the variants for association with expression of genes in their vicinity and for co-localizations with biochemical traits in blood. Blood is not the most relevant tissue for any of the biomarkers and we would not be able to detect tissue or cell type specific effects. Some of the variants may also exert their influence on genes farther away than the 250 kb cut-off.


Our comparisons across cancer types show that tumor antigens are strongly associated with diagnosis of several cancers while also highlighting the lack of specificity of these tests. As demonstrated in Figure 2, many cancers other than those for which the biomarkers are most commonly used showed significant elevation of biomarker values.

A limitation and a potential source of bias in our study is that subjects were not randomly selected for biomarker assaying; rather, measurements were done because of suspicion of a particular ailment, to monitor the progression of an existing disease or for some other clinical reason. In particular, the subjects labeled as ‘Without cancer diagnosis’ in Figure 2 are individuals seeking medical assistance, and their biomarker levels may not reflect those found in healthy individuals.

The utility of genetic correction in prediction models depends on the fraction of the trait’s variance explained by sequence variants. The relatively high variance explained by identified genetic factors, for some of the cancer antigens in particular, suggests that improvements could be achieved by the inclusion of these variants in such models. A single variant, rs760077, explains >33% of the variance in CA-15.3. This biomarker is generally considered to be of low specificity, but this high fraction of variance explained highlights the potential for improvement by correcting values for genotype. However, CA-125 is perhaps of most interest in this regard, as effective screening tools for ovarian cancer are currently lacking. Recent screening trials of ovarian cancer reported no mortality benefit from screening with CA-125 and trans-vaginal ultrasound [9, 28]. A retrospective study of these cohorts based on genotypically corrected CA-125 levels may show greater benefit of CA-125 screening. Furthermore, corrected biomarker values should be useful when combining liquid biopsies and more traditional biomarkers to detect early tumors [2].
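The >33% figure for rs760077 is consistent with the textbook expression for the variance explained by a single biallelic variant under an additive model, R² = 2f(1 − f)β², where f is the minor allele frequency and β the per-allele effect in standard deviations. The sketch below applies it to the Table 2 values for rs760077 (MAF 35.4%, effect 0.86 SD). It assumes Hardy-Weinberg proportions and a trait standardized to unit variance; whether the R² column in Table 2 was computed exactly this way is an assumption. The helper `genotype_adjusted_z` is hypothetical and only illustrates one way a standardized biomarker value could be corrected for genotype; it is not the correction procedure used in the study.

```python
def variance_explained(maf, beta_sd):
    """Fraction of trait variance explained by one biallelic variant.

    Assumes an additive model, Hardy-Weinberg proportions and a trait
    standardized to unit variance: R^2 = 2 * f * (1 - f) * beta^2.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    return 2.0 * maf * (1.0 - maf) * beta_sd ** 2


def genotype_adjusted_z(z, dosages, mafs, betas):
    """Hypothetical helper: remove the expected genetic shift from a z-score.

    For each variant the shift is beta * (dosage - 2 * maf), i.e. the effect
    of carrying more or fewer minor alleles than the population average.
    """
    shift = sum(b * (g - 2.0 * f) for g, f, b in zip(dosages, mafs, betas))
    return z - shift


# rs760077 and CA-15.3, values taken from Table 2.
r2_ca153 = variance_explained(0.354, 0.86)
print(round(r2_ca153, 2))  # 0.34

# Hypothetical individual: standardized CA-15.3 of 1.5, two minor alleles.
adjusted = genotype_adjusted_z(1.5, [2], [0.354], [0.86])
print(round(adjusted, 2))  # 0.39
```

In this toy case most of the apparent elevation is attributable to genotype, which is the kind of effect that correction is meant to remove.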


Acknowledgements

The authors would like to acknowledge the work of the staff of the genotyping and informatics facilities at deCODE Genetics and of the Icelandic Cancer Registry, without whom this study would not have been possible. This study was funded by deCODE Genetics/Amgen.


References

1. Ludwig JA, Weinstein JN. Biomarkers in cancer staging, prognosis and treatment selection. Nat Rev Cancer 2005;5(11):845-56.
2. Cohen JD, Li L, Wang Y, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 2018;359(6378):926-930.
3. Talerman A, Haije WG, Baggerman L. Serum alphafetoprotein (AFP) in patients with germ cell tumors of the gonads and extragonadal sites: correlation between endodermal sinus (yolk sac) tumor and raised serum AFP. Cancer 1980;46(2):380-5.
4. Thomas P, Toth CA, Saini KS, et al. The structure, metabolism and function of the carcinoembryonic antigen gene family. Biochim Biophys Acta 1990;1032(2-3):177-89.
5. Kufe DW. Mucins in cancer: function, prognosis and therapy. Nat Rev Cancer 2009;9(12):874-85.
6. Bast RC, Jr., Hennessy B, Mills GB. The biology of ovarian cancer: new opportunities for translation. Nat Rev Cancer 2009;9(6):415-28.
7. Goonetilleke KS, Siriwardena AK. Systematic review of carbohydrate antigen (CA 19-9) as a biochemical marker in the diagnosis of pancreatic cancer. Eur J Surg Oncol 2007;33(3):266-70.
8. Bacci G, Picci P, Ferrari S, et al. Prognostic significance of serum alkaline phosphatase measurements in patients with osteosarcoma treated with adjuvant or neoadjuvant chemotherapy. Cancer 1993;71(4):1224-30.
9. Jacobs IJ, Menon U, Ryan A, et al. Ovarian cancer screening and mortality in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS): a randomised controlled trial. Lancet 2016;387(10022):945-956.
10. Ballehaninna UK, Chamberlain RS. The clinical utility of serum CA 19-9 in the diagnosis, prognosis and management of pancreatic adenocarcinoma: An evidence based appraisal. J Gastrointest Oncol 2012;3(2):105-19.
11. Duffy MJ. Serum tumor markers in breast cancer: are they of clinical value? Clin Chem 2006;52(3):345-51.
12. Gudmundsson J, Besenbacher S, Sulem P, et al. Genetic correction of PSA values using sequence variants associated with PSA levels. Sci Transl Med 2010;2(62):62ra92.
13. Sigurdardottir LG, Jonasson JG, Stefansdottir S, et al. Data quality at the Icelandic Cancer Registry: comparability, validity, timeliness and completeness. Acta Oncol 2012;51(7):880-9.
14. Sveinbjornsson G, Albrechtsen A, Zink F, et al. Weighting sequence variants based on their annotation increases power of whole-genome association studies. Nat Genet 2016;48(3):314-7.
15. McLaren W, Pritchard B, Rios D, et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 2010;26(16):2069-70.
16. Pruitt KD, Tatusova T, Brown GR, et al. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2012;40(Database issue):D130-5.
17. Stoller JK, Aboussouan LS. A review of alpha1-antitrypsin deficiency. Am J Respir Crit Care Med 2012;185(3):246-59.
18. Prins BP, Kuchenbaecker KB, Bao Y, et al. Genome-wide analysis of health-related biomarkers in the UK Household Longitudinal Study reveals novel associations. Sci Rep 2017;7(1):11008.
19. He M, Wu C, Xu J, et al. A genome wide association study of genetic loci that influence tumour biomarkers cancer antigen 19-9, carcinoembryonic antigen and alpha fetoprotein and their associations with cancer risk. Gut 2014;63(1):143-51.
20. Helgason H, Rafnar T, Olafsdottir HS, et al. Loss-of-function variants in ATM confer risk of gastric cancer. Nat Genet 2015;47(8):906-10.
21. Rump A, Morikawa Y, Tanaka M, et al. Binding of ovarian cancer antigen CA125/MUC16 to mesothelin mediates cell adhesion. J Biol Chem 2004;279(10):9190-8.
22. Yeh JC. FM. UDP-GlcNAc: BetaGal Beta-1,3-N-Acetylglucosaminyltransferase 3 (B3GNT3). In: Taniguchi N. HK, Fukuda M., Narimatsu H., Yamaguchi Y., Angata T, (ed). Handbook of Glycosyltransferases and Related Genes. Tokyo: Springer; 2014, 295-302.
23. Weiss FU, Schurmann C, Guenther A, et al. Fucosyltransferase 2 (FUT2) non-secretor status and blood group B are associated with elevated serum lipase activity in asymptomatic subjects, and an increased risk for chronic pancreatitis: a genetic association study. Gut 2015;64(4):646-56.
24. Astle WJ, Elding H, Jiang T, et al. The Allelic Landscape of Blood Cell Trait Variation and Links to Common Complex Disease. Cell 2016;167(5):1415-1429.e19.
25. Teslovich TM, Musunuru K, Smith AV, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 2010;466(7307):707-13.
26. Chambers JC, Zhang W, Sehmi J, et al. Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat Genet 2011;43(11):1131-8.
27. Pinsky PF, Yu K, Kramer BS, et al. Extended mortality results for ovarian cancer screening in the PLCO trial with median 15 years follow-up. Gynecol Oncol 2016;143(2):270-275.


Tables

| Biomarker | Total subjects | N (% Female) | Measurement unit | Avg. age at first measurement (range) | Median first value (range) | Median largest value (range) | Reference value^a |
|---|---|---|---|---|---|---|---|
| AFP | 22,686 | 12,886 (56.8) | U/ml | 57 (0 - 100) | 3.0 (0.4 – 385,725) | 3.2 (0.4 – 385,725) | <5.8 |
| CA-15.3 | 7,107 | 6,304 (88.7) | U/ml | 62 (1 - 99) | 17.7 (1.0 – 17,340) | 20.8 (1.0 – 47,600) | <25 |
| CA-125 | 9,824 | 9,087 (92.5) | U/ml | 58 (1 - 103) | 16.6 (0.6 – 56,820) | 18.0 (0.6 – 63,062) | <35 |
| CA-19.9 | 9,945 | 5,708 (57.4) | U/ml | 66 (7 - 101) | 16.3 (0.0 – 1,769,950) | 18.9 (0.0 – 16,571,700) | <31 |
| CEA | 22,309 | 13,095 (58.7) | ng/ml | 63 (0 - 103) | 2.3 (0.0 – 63,962) | 2.7 (0.0 – 116,069) | <4.6 |
| ALP | 162,774 | 87,897 (54.0) | U/ml | 46 (0 - 114) | 119.0 (0.0 – 15,825) | 134.0 (5.0 – 21,668) | <105 |

Table 1: Tumor markers and subjects used in this study. AFP: Alpha-fetoprotein; CA: Cancer antigen; CEA: Carcinoembryonic antigen; ALP: Alkaline phosphatase.

^a The reference value is that generally considered based on the respective tests used in Landspitali, the National University Hospital in Iceland.


| rsID | Chromosome:Position^a | MAF [%] | Amin | Amaj | Gene | Annotation | P | PCorrected^b | Effect^c | R² |
|---|---|---|---|---|---|---|---|---|---|---|
| Alpha-fetoprotein | | | | | | | | | | |
| rs28929474 | chr14:94378610 | 0.8 | T | C | SERPINA1 | Glu366Lys | 1.2 × 10−47 | 8.3 × 10−42 | -0.85 (-0.96, -0.74) | 0.012 |
| Cancer antigen 15.3 | | | | | | | | | | |
| rs760077 | chr1:155208991 | 35.4 | A | T | MTX1 | Ser63Thr | < 1 × 10−300 | < 1 × 10−300 | 0.86 (0.81, 0.90) | 0.34 |
| NA | chr9:133264504 | 19.5 | G | GAAACTGCC | ABO | intron | 1.8 × 10−47 | 8.0 × 10−40 | -0.33 (-0.38, -0.29) | 0.035 |
| Cancer antigen 125 | | | | | | | | | | |
| rs62193080 | chr2:241800675 | 20.6 | G | C | GAL3ST2 | intron | 2.9 × 10−57 | 1.3 × 10−49 | -0.32 (-0.36, -0.28) | 0.033 |
| rs3764246 | chr16:760143 | 23.6 | G | A | MSLN | upstream | 3.2 × 10−15 | 3.0 × 10−8 | -0.15 (-0.18, -0.11) | 0.008 |
| rs73005873 | chr19:8896954 | 39.7 | A | G | MUC16 | intron | 1.8 × 10−17 | 8.2 × 10−10 | 0.14 (0.11, 0.17) | 0.009 |
| Cancer antigen 19.9 | | | | | | | | | | |
| rs708686 | chr19:5840608 | 23.1 | T | C | FUT6 | upstream | 1.9 × 10−179 | 1.8 × 10−172 | -0.52 (-0.56, -0.49) | 0.097 |
| rs601338 | chr19:48703417 | 39.3 | G | A | FUT2 | Trp154Ter | 1.3 × 10−291 | 1.8 × 10−286 | -0.57 (-0.61, -0.54) | 0.16 |
| rs34262244 | chr19:17795098 | 28.5 | A | G | B3GNT3 | upstream | 3.0 × 10−13 | 2.8 × 10−6 | -0.13 (-0.17, -0.10) | 0.0070 |
| Carcinoembryonic antigen | | | | | | | | | | |
| rs7041150 | chr9:106732343 | 37.8 | A | C | — | intergenic | 1.8 × 10−12 | 2.7 × 10−5 | -0.08 (-0.10, -0.06) | 0.0030 |
| rs635634 | chr9:133279427 | 13.0 | T | C | ABO | upstream | 9.4 × 10−16 | 8.8 × 10−9 | -0.13 (-0.16, -0.10) | 0.0039 |
| rs708686 | chr19:5840608 | 23.1 | T | C | FUT6 | upstream | 4.6 × 10−16 | 4.3 × 10−9 | 0.10 (0.08, 0.13) | 0.0039 |
| rs9621 | chr19:41727239 | 5.6 | A | G | CEACAM5 | Gly678Arg | 5.2 × 10−201 | 3.5 × 10−195 | 0.71 (0.67, 0.76) | 0.054 |
| rs601338 | chr19:48703417 | 39.3 | G | A | FUT2 | Trp154Ter | 5.1 × 10−130 | 6.9 × 10−125 | -0.27 (-0.29, -0.25) | 0.035 |
| Alkaline phosphatase | | | | | | | | | | |
| rs149344982 | chr1:21563267 | 1.5 | A | G | ALPL | Arg75His | 1.4 × 10−157 | 9.5 × 10−146 | -0.46 (-0.49, -0.43) | 0.0061 |
| rs1862069 | chr2:169077231 | 49.6 | A | G | DHRS9 | upstream | 6.1 × 10−10 | 0.0057 | -0.03 | 0.00034 |
| rs1260326 | chr2:27508073 | 34.1 | T | C | GCKR | Leu446Pro | 4.6 × 10−11 | 3.1 × 10−5 | 0.03 | 0.00038 |
| rs573778305 | chr6:24429112 | 0.8 | C | CT | GPLD1 | Frameshift Val815 | 1.8 × 10−111 | 2.4 × 10−106 | -0.53 (-0.58, -0.48) | 0.0042 |
| rs62621812 | chr7:127375029 | 2.5 | A | G | ZNF800 | Pro103Ser | 6.7 × 10−9 | 0.0045 | -0.08 (-0.10, -0.05) | 0.00029 |
| rs6984305 | chr8:9320758 | 8.4 | A | T | — | intergenic | 1.7 × 10−16 | 2.5 × 10−9 | 0.06 (0.05, 0.08) | 0.00057 |
| rs4242592 | chr8:118956736 | 47.9 | T | G | TNFRSF11B | upstream | 2.5 × 10−10 | 0.0023 | -0.03 (-0.03, -0.02) | 0.00034 |
| rs28601761 | chr8:125487789 | 43.0 | G | C | — | intergenic | 4.0 × 10−15 | 1.8 × 10−7 | -0.03 (-0.04, -0.02) | 0.00053 |
| rs41282145 | chr9:101487225 | 4.3 | A | T | TMEM246 | upstream | 1.4 × 10−14 | 1.3 × 10−7 | 0.08 (0.06, 0.10) | 0.00050 |
| NA | chr9:133264504 | 19.5 | G | GAAACTGCC | ABO | intron | < 1 × 10−300 | < 1 × 10−300 | -0.20 (-0.21, -0.19) | 0.013 |
| rs1935 | chr10:63168063 | 47.6 | C | G | JMJD1C | Glu2353Asp | 1.7 × 10−29 | 1.2 × 10−23 | -0.05 (-0.06, -0.04) | 0.0011 |
| rs10790256 | chr11:118663373 | 22.2 | T | C | TREH | synonymous | 1.4 × 10−9 | 0.013 | -0.03 (-0.04, -0.02) | 0.00031 |
| rs10893507 | chr11:126416693 | 48.1 | A | C | ST3GAL4 | downstream | 9.2 × 10−18 | 8.6 × 10−11 | 0.04 (0.03, 0.04) | 0.00065 |
| rs7955258 | chr12:461781 | 44.5 | A | G | B4GALNT3 | intron | 1.3 × 10−14 | 5.9 × 10−7 | -0.03 (-0.04, -0.02) | 0.00051 |
| rs10849087 | chr12:4540899 | 27.1 | T | C | C12orf4 | upstream | 1.1 × 10−9 | 0.0010 | -0.03 (-0.04, -0.02) | 0.00031 |
| rs2393791 | chr12:120986153 | 35.4 | C | T | HNF1A | intron | 3.4 × 10−13 | 5.0 × 10−6 | 0.03 (0.02, 0.04) | 0.00044 |
| rs9533095 | chr13:42394913 | 43.5 | G | T | — | intergenic | 7.6 × 10−10 | 0.011 | -0.03 (-0.03, -0.02) | 0.00033 |
| rs28929474 | chr14:94378610 | 0.8 | T | C | SERPINA1 | Glu366Lys | 3.4 × 10−17 | 2.3 × 10−11 | 0.20 (0.15, 0.24) | 0.00062 |
| rs2297066 | chr14:103100498 | 21.9 | G | C | EXOC3L4 | Asp93Glu | 7.7 × 10−9 | 0.0052 | 0.03 (0.02, 0.04) | 0.00029 |
| rs71391445 | chr16:72171122 | 18.1 | G | GA | PMFBP1 | intron | 4.6 × 10−13 | 6.9 × 10−6 | 0.04 (0.03, 0.05) | 0.00045 |
| rs186021206 | chr17:7166093 | 0.4 | A | G | — | intergenic | 7.3 × 10−89 | 3.3 × 10−81 | 0.63 (0.57, 0.69) | 0.0034 |
| rs5112 | chr19:44927023 | 48.6 | G | C | — | intergenic | 1.7 × 10−16 | 7.7 × 10−9 | -0.03 (-0.04, -0.03) | 0.00058 |
| rs8736 | chr19:54173495 | 41.6 | T | C | TMC4 | upstream | 8.1 × 10−16 | 7.6 × 10−9 | 0.03 (0.03, 0.04) | 0.00056 |
| rs2500430 | chr20:25298327 | 49.2 | G | A | ABHD12, PYGB | downstream | 6.95 × 10−10 | 0.0065 | -0.03 (-0.03, -0.02) | 0.00034 |
| rs41302559 | chr20:57565383 | 0.9 | A | G | PCK1 | Arg483Gln | 2.90 × 10−8 | 0.020 | -0.12 (-0.17, -0.08) | 0.00026 |

^a The position is given in hg38.

^b The P-value after a weighted Bonferroni adjustment where a different threshold was used for each functional class (see methods).

^c Effect of the minor allele (Amin) is reported as standard deviations of rank-based inverse normally transformed data.

Table 2: Primary variants within loci associating with tumor biomarker levels.
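Footnote c of Table 2 reports effects in standard deviations of rank-based inverse normally transformed data. A minimal sketch of such a transformation follows. The rank offset of 0.5 and the use of average ranks for ties are assumptions made for illustration, since the exact variant of the transformation used in the study is not stated here; the function name is likewise hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard normal quantiles based on their ranks.

    Each value gets a 1-based rank (ties share their average rank), the rank
    is converted to a probability (rank - offset) / n, and the probability is
    passed through the inverse standard normal CDF.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((rank - offset) / n) for rank in ranks]


# Illustrative values only; the extreme third value mimics the long right
# tails seen in the biomarker ranges of Table 1.
raw_values = [3.0, 17.7, 385725.0]
transformed = rank_inverse_normal(raw_values)
print([round(z, 3) for z in transformed])
```

Because only ranks enter the transformation, an extreme measurement has no more leverage than any other largest value, which is one common reason for using it with heavily skewed traits.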


| rsID | Chromosome:Position^a | MAF [%] | Amin | Amaj | Gene | Annotation | P† | Effect | PCorrect^b | Effect^c | R² | Covariate(s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alpha-fetoprotein | | | | | | | | | | | | |
| rs17580 | chr14:94380925 | 3.3 | A | T | SERPINA1 | Glu288Val | 8.3 × 10−38 | -0.38 (-0.43, -0.32) | 2.3 × 10−33 | -0.38 (-0.43, -0.32) | 0.0091 | rs28929474 |
| rs2402446 | chr14:94399800 | 47.6 | T | A | — | intergenic | 7.5 × 10−9 | 0.06 (0.04, 0.08) | 2.1 × 10−5 | 0.06 (0.04, 0.08) | 0.0020 | rs28929474, rs17580 |
| Cancer antigen 15.3 | | | | | | | | | | | | |
| rs41264915 | chr1:155197995 | 14.2 | G | A | THBS3 | intron | 2.5 × 10−63 | 0.38 (0.34, 0.43) | 5.8 × 10−59 | 0.38 (0.34, 0.43) | 0.035 | rs760077 |
| rs72704117 | chr1:155205298 | 3.0 | T | C | THBS3 | Arg102Gln | 1.4 × 10−14 | -0.38 (-0.48, -0.28) | 3.2 × 10−10 | -0.38 (-0.48, -0.28) | 0.0083 | rs760077, rs41264915 |
| rs564968560 | chr1:155614403 | 2.0 | A | G | MSTO1 | 3' UTR | 1.4 × 10−11 | -0.37 (-0.48, -0.27) | 3.2 × 10−11 | -0.37 (-0.48, -0.27) | 0.0055 | rs760077, rs41264915, rs72704117 |
| rs822493 | chr1:155855602 | 3.3 | T | C | SYT11 | upstream | 2.5 × 10−8 | 0.25 (0.16, 0.33) | 5.8 × 10−4 | 0.25 (0.16, 0.33) | 0.0038 | rs760077, rs41264915, rs72704117, rs564968560 |
| Cancer antigen 125 | | | | | | | | | | | | |
| rs141828605 | chr2:241803397 | 0.9 | T | C | GAL3ST2 | Pro143Leu | 9.6 × 10−29 | 0.91 (0.75, 1.07) | 2.7 × 10−24 | 0.91 (0.75, 1.07) | 0.015 | rs62193080 |
| rs150107870 | chr2:241804117 | 1.9 | C | T | GAL3ST2 | Leu383Pro | 2.2 × 10−23 | 0.61 (0.49, 0.73) | 6.2 × 10−19 | 0.61 (0.49, 0.73) | 0.014 | rs62193080, rs141828605 |
| rs139344622 | chr2:241803427 | 0.9 | G | A | GAL3ST2 | Tyr153Cys | 8.4 × 10−16 | 0.68 (0.51, 0.84) | 2.4 × 10−11 | 0.68 (0.51, 0.84) | 0.008 | rs62193080, rs141828605, rs150107870 |
| rs5839764 | chr2:241764203 | 38.6 | G | C | D2HGDH | intron | 2.2 × 10−11 | -0.12 (-0.16, -0.09) | 6.1 × 10−7 | -0.12 (-0.16, -0.09) | 0.007 | rs62193080, rs141828605, rs150107870, rs139344622 |
| rs9927389 | chr16:764058 | 1.5 | T | C | MSLN | Ala72Val | 2.6 × 10−13 | -0.46 (-0.58, -0.34) | 6.7 × 10−9 | -0.46 (-0.58, -0.34) | 0.006 | rs3764246 |
| rs150425699 | chr16:768452 | 0.4 | A | G | MSLN | Arg557His | 4.3 × 10−9 | -0.74 (-0.98, -0.50) | 7.0 × 10−5 | -0.74 (-0.98, -0.50) | 0.005 | rs3764246, rs9927389 |
| Cancer antigen 19.9 | | | | | | | | | | | | |
| rs2608894 | chr19:5847989 | 17.1 | T | C | FUT3 | intron | 3.7 × 10−10 | -0.21 (-0.27, -0.14) | 1.4 × 10−5 | -0.21 (-0.27, -0.14) | 0.012 | rs708686 |
| Carcinoembryonic antigen | | | | | | | | | | | | |
| rs10901252 | chr9:133252613 | 6.5 | C | G | ABO | downstream | 9.4 × 10−8 | 0.12 (0.07, 0.16) | 3.0 × 10−3 | 0.12 (0.07, 0.16) | 0.0017 | rs635634 |
| rs59654817 | chr19:41709489 | 33.4 | A | G | CEACAM5 | intron | 3.8 × 10−20 | 0.10 (0.08, 0.13) | 6.3 × 10−15 | 0.10 (0.08, 0.13) | 0.0049 | rs9621 |
| rs12985771 | chr19:41725256 | 36.3 | C | A | CEACAM5 | intron | 1.8 × 10−18 | 0.10 (0.08, 0.13) | 3.0 × 10−13 | 0.10 (0.08, 0.13) | 0.0051 | rs9621, rs59654817 |
| rs770162662 | chr19:41755473 | 0.3 | T | G | CEACAM6 | upstream | 4.5 × 10−10 | 0.63 (0.43, 0.83) | 7.6 × 10−5 | 0.63 (0.43, 0.83) | 0.0026 | rs9621, rs59654817, rs12985771 |
| rs7247317 | chr19:41712945 | 49.0 | T | G | CEACAM5 | intron | 6.9 × 10−9 | 0.10 (0.06, 0.13) | 1.2 × 10−3 | 0.10 (0.06, 0.13) | 0.0046 | rs9621, rs59654817, rs12985771, rs770162662 |
| rs757625335 | chr19:48505357 | 0.3 | A | G | LMTK3 | intron | 7.4 × 10−8 | 0.52 (0.33, 0.71) | 4.2 × 10−3 | 0.52 (0.33, 0.71) | 0.0014 | rs601338 |
| Alkaline phosphatase | | | | | | | | | | | | |
| rs138587317 | chr1:21563248 | 0.12 | A | G | ALPL | Glu69Lys | 4.3 × 10−110 | -1.12 (-1.22, -1.03) | 4.2 × 10−105 | -1.12 (-1.22, -1.03) | 0.0043 | rs149344982 |
| rs121918007 | chr1:21564139 | 0.06 | A | !A | ALPL | Glu114Lys | 2.8 × 10−51 | -1.27 (-1.43, -1.10) | 2.8 × 10−46 | -1.27 (-1.43, -1.10) | 0.0019 | rs149344982, rs138587317 |
| rs773257111 | chr1:21563143 | 0.01 | A | G | ALPL | Ala34Thr | 6.9 × 10−27 | -2.05 (-2.42, -1.67) | 5.9 × 10−22 | -2.05 (-2.42, -1.67) | 0.0011 | rs149344982, rs138587317, rs121918007 |
| rs121918019 | chr1:21564094 | 0.009 | A | G | ALPL | Ala99Thr | 3.6 × 10−12 | -1.54 (-1.98, -1.11) | 3.1 × 10−7 | -1.54 (-1.98, -1.11) | 0.00043 | rs149344982, rs138587317, rs121918007, rs773257111 |
| rs4654748 | chr1:21459575 | 43.4 | T | C | NBPF3 | intron | 1.5 × 10−118 | -0.10 (-0.10, -0.09) | 1.3 × 10−113 | -0.10 (-0.10, -0.09) | 0.0045 | rs149344982, rs138587317, rs121918007, rs773257111, rs121918019 |
| rs1780329 | chr1:21576457 | 16.5 | A | G | ALPL | intron | 5.9 × 10−52 | -0.08 (-0.09, -0.07) | 5.0 × 10−47 | -0.08 (-0.09, -0.07) | 0.0019 | rs149344982, rs138587317, rs121918007, rs773257111, rs121918019, rs4654748 |
| rs11463187 | chr1:21540322 | 21.2 | TG | T | ALPL | intron | 1.4 × 10−35 | -0.07 (-0.08, -0.05) | 1.2 × 10−30 | -0.07 (-0.08, -0.05) | 0.0014 | rs149344982, rs138587317, rs121918007, rs773257111, rs121918019, rs4654748, rs1780329 |
| rs115257434 | chr1:21570934 | 2.3 | A | G | ALPL | intron | 1.9 × 10−20 | -0.14 (-0.16, -0.11) | 1.6 × 10−15 | -0.14 (-0.16, -0.11) | 0.00083 | rs149344982, rs138587317, rs121918007, rs773257111, rs121918019, rs4654748, rs1780329, rs11463187 |
| rs1697405 | chr1:21577713 | 40.5 | !C | C | ALPL | 3' UTR | 2.7 × 10−19 | -0.042 (-0.05, -0.03) | 1.6 × 10−15 | -0.042 (-0.05, -0.03) | 0.00085 | rs149344982, rs138587317, rs121918007, rs773257111, rs121918019, rs4654748, rs1780329, rs11463187, rs115257434 |
| rs1318236 | chr1:21625531 | 43.9 | C | T | RAP1GAP | intron | 6.8 × 10−18 | 0.037 (0.03, 0.05) | 2.3 × 10−14 | 0.037 (0.03, 0.05) | 0.00067 | rs149344982, rs138587317, rs121918007, rs773257111, rs121918019, rs4654748, rs1780329, rs11463187, rs115257434, rs1697405 |
| rs17300770 | chr6:24462792 | 11.6 | C | G | GPLD1 | Asp275Glu | 3.1 × 10−43 | -0.09 (-0.10, -0.08) | 1.5 × 10−37 | -0.09 (-0.10, -0.08) | 0.0016 | rs573778305 |
| rs9467148 | chr6:24435774 | 27.0 | A | G | GPLD1 | intron | 1.8 × 10−29 | 0.05 (0.04, 0.06) | 8.4 × 10−24 | 0.05 (0.04, 0.06) | 0.0011 | rs573778305, rs17300770 |
| rs146221974 | chr6:24473633 | 0.1 | A | G | GPLD1 | Ser159Leu | 9.4 × 10−17 | -0.51 (-0.63, -0.39) | 4.4 × 10−11 | -0.51 (-0.63, -0.39) | 0.00058 | rs573778305, rs17300770, rs9467148 |
| rs116287860 | chr6:24456679 | 8.5 | C | A | GPLD1 | intron | 9.6 × 10−14 | 0.06 (0.04, 0.08) | 4.5 × 10−8 | 0.06 (0.04, 0.08) | 0.00049 | rs573778305, rs17300770, rs9467148, rs146221974 |
| rs183821586 | chr6:24473963 | 0.1 | G | A | GPLD1 | intron | 3.7 × 10−11 | -0.42 (-0.54, -0.30) | 1.7 × 10−5 | -0.42 (-0.54, -0.30) | 0.00040 | rs573778305, rs17300770, rs9467148, rs146221974, rs116287860 |
| rs6993155 | chr8:125496809 | 4.4 | G | A | — | intergenic | 1.4 × 10−8 | -0.06 (-0.08, -0.04) | 7.9 × 10−4 | -0.06 (-0.08, -0.04) | 0.00028 | rs28601761 |
| rs2183745 | chr9:101456893 | 27.8 | T | A | — | intergenic | 1.4 × 10−10 | 0.03 (0.02, 0.04) | 3.8 × 10−6 | 0.03 (0.02, 0.04) | 0.00036 | rs41282145 |
| rs56392308 | chr9:133255669 | 6.5 | C | CG | ABO | Frameshift Pro354 | 1.3 × 10−21 | 0.10 (0.08, 0.12) | 4.2 × 10−18 | 0.10 (0.08, 0.12) | 0.0011 | chr9:133264504 |
| rs527478501 | chr11:62072649 | 0.6 | G | A | — | intergenic | 3.8 × 10−8 | 0.14 (0.09, 0.19) | 1.1 × 10−3 | 0.14 (0.09, 0.19) | 0.00026 | rs174564 |
| rs17145892 | chr11:62432797 | 18.8 | T | A | AHNAK | downstream | 2.7 × 10−7 | -0.03 (-0.04, -0.02) | 8.0 × 10−3 | -0.03 (-0.04, -0.02) | 0.00022 | rs174564, rs527478501 |
| rs78689694 | chr11:126364925 | 10.5 | C | G | ST3GAL4 | intron | 1.9 × 10−12 | -0.05 (-0.06, -0.03) | 5.5 × 10−8 | -0.05 (-0.06, -0.03) | 0.00043 | rs10893507 |
| rs200173452 | chr12:552099 | 0.4 | T | C | B4GALNT3 | Arg382Cys | 3.1 × 10−8 | 0.18 (0.11, 0.24) | 9.3 × 10−4 | 0.18 (0.11, 0.24) | 0.00027 | rs7955258 |
| rs77303550 | chr16:72045758 | 18.2 | T | C | — | intergenic | 2.4 × 10−7 | 0.03 (0.02, 0.04) | 6.9 × 10−3 | 0.03 (0.02, 0.04) | 0.00023 | rs71391445 |
| NA | chr17:7156651 | 17.5 | !AAGAGAAAGAG | AAGAGAAAGAG | — | intergenic | 4.0 × 10−8 | 0.03 (0.02, 0.04) | 1.5 × 10−3 | 0.03 (0.02, 0.04) | 0.00026 | rs186021206, rs55714927 |
| rs55714927 | chr17:7176997 | 21.0 | T | C | ASGR1 | synonymous | 2.6 × 10−48 | 0.07 (0.06, 0.09) | 9.5 × 10−44 | 0.07 (0.06, 0.09) | 0.0019 | rs186021206 |

^a The position is given in hg38.

^b The corrected P-value is the P-value after Bonferroni correction for the number of variants in the locus.

^c Effect of the minor allele (Amin) is reported as standard deviations of rank-based inverse normally transformed data.

Table 3: Residual associations within loci associating with tumor biomarker levels.


| rsID | Chromosome:position^a | EA^b | Biomarker(s) | Gene | β [SD] | P | Covariate^c |
|---|---|---|---|---|---|---|---|
| rs760077 | chr1:155208991 | A | CA-15.3 | THBS3 | 0.64 | 3.4 × 10−114 | — |
| rs41264915 | chr1:155197995 | G | CA-15.3 | MUC1 | -0.33 | 1.5 × 10−16 | — |
| rs34262244 | chr19:17795098 | A | CA-19.9 | B3GNT3 | -1.04 | 1.3 × 10−296 | — |
| rs10901252 | chr9:133252613 | C | CEA | ABO | 1.61 | 1.8 × 10−210 | — |
| rs4654748 | chr1:21459575 | C | ALP | NBPF3 | -0.94 | 8.6 × 10−311 | — |
| rs174564 | chr11:61820833 | G | ALP | FADS2 | 0.23 | 7.9 × 10−23 | rs7943728 |
| rs8736 | chr19:54173495 | T | ALP | TMC4 | -0.97 | 1.0 × 10−300 | — |

Table 4: Variants associated with biomarker levels and with gene expression in whole blood. EA: Effect allele; ALP: Alkaline phosphatase; CA: Cancer Antigen; CEA: Carcinoembryonic antigen.

^a The position is given for hg38.

^b The effect allele is the minor allele.

^c We condition the expression on the covariate when there is more than one variant in the region affecting the expression of the gene and the variant affecting biomarker value is not the most significant variant.


Figure legends

Figure 1: A Manhattan plot

Figure 1 contains a Manhattan plot for A) Alpha fetoprotein; B) Cancer antigen 15.3; C) Cancer antigen 125; D) Cancer antigen 19.9; E) Carcinoembryonic antigen; F) Alkaline phosphatase.

Figure 2: Biomarker elevation across cancer types.

Figure 2 shows the median of the largest value ever recorded for each individual across cancers and diseases for a) Alpha fetoprotein; b) Cancer antigen 15.3; c) Cancer antigen 125; d) Cancer antigen 19.9; e) Carcinoembryonic antigen; f) Alkaline phosphatase. In each plot, cancers are categorized by International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) codes. The ICD-10 code C.80 represents cancers of unknown primary site. These cancers have high metastatic potential as they have spread before the primary tumor has grown large enough to be detected. The red line indicates the median for individuals not diagnosed with cancer (or any of the other diseases listed) at the end of 2015. An asterisk indicates diseases that differ significantly from this group by a two-sided Wilcoxon rank-sum test after Bonferroni correction for 144 (6 × 24) tests. IBD: Inflammatory bowel disease.
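The testing scheme described in this legend can be sketched as follows: a two-sided Wilcoxon rank-sum test per disease group, compared against a Bonferroni-adjusted threshold for 144 tests. This is an illustrative sketch only. It uses the normal approximation without tie or continuity correction, which is a simplification, and the family-wise significance level of 0.05 and all measurement values are assumptions rather than details taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank-sum P-value via the normal approximation.

    Simplification: no tie or continuity correction is applied.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 6 biomarkers x 24 disease groups, as stated in the legend; alpha is assumed.
threshold = bonferroni_threshold(0.05, 6 * 24)

# Hypothetical largest-ever values for a reference group and a disease group.
reference_group = [1.0, 2.0, 3.0]
disease_group = [4.0, 5.0, 6.0]
p_value = rank_sum_p_value(reference_group, disease_group)
print(p_value < threshold)
```

With only three observations per group, even complete separation does not reach the corrected threshold, which illustrates how strict the correction for 144 tests is.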


Common and rare sequence variants influencing tumor biomarkers in blood

Sigurgeir Olafsson, Kristjan F. Alexandersson, Johann G.K Gizurarson, et al.

Cancer Epidemiol Biomarkers Prev Published OnlineFirst October 30, 2019.

Updated version: Access the most recent version of this article at doi:10.1158/1055-9965.EPI-18-1060

Supplementary Material: Access the most recent supplemental material at http://cebp.aacrjournals.org/content/suppl/2019/10/30/1055-9965.EPI-18-1060.DC1
