LEVERAGING DEMOGRAPHIC DIFFERENCES IN INCIDENCE FOR DISCOVERY AND VALIDATION OF RISK VARIANTS IN GLIOMA

by

QUINN T. OSTROM

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Department of Population and Quantitative Health Sciences

CASE WESTERN RESERVE UNIVERSITY

January, 2018

CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Quinn T. Ostrom

candidate for the degree of Ph.D.

Committee Chair William Bush, Ph.D.

Committee Member Jill Barnholtz-Sloan, Ph.D.

Committee Member Frederick Schumacher, Ph.D.

Committee Member Farren Briggs, Ph.D.

Date of Defense November 16, 2017

*We also certify that written approval has been obtained for any proprietary material contained therein.


Contents

List of Tables
List of Figures
Acknowledgements
Abstract

Chapter 1 – Introduction
Background
Literature review
1.2.1 Glioma classification
1.2.2 Demographic variation in glioma incidence
1.2.3 Studies of environmental and behavioral risk factors in glioma
1.2.4 Genome-wide association studies
1.2.5 Inherited risk of glioma
Pathway approaches to germline SNP data
Guiding discovery by focusing on disease disparities
1.4.1 Sex-specific analyses in genome-wide association studies
1.4.2 Age-at-onset in genome-wide association studies
Study population
1.5.1 Study cohorts
1.5.2 Genotyping, quality control and imputation
Specific Aims

Chapter 2 – Identifying sex-specific risk loci for glioma
Abstract
Background
Methods
2.3.1 Study population
2.3.2 Genotyping, quality control and imputation
2.3.3 Statistical methods
Results
2.4.1 Previously discovered glioma risk regions
2.4.2 Genome-wide scan of nominally significant regions
2.4.3 Agnostic scan of sex loci
2.4.4 Combined analysis of germline variants and somatic characterization
2.4.5 Sex-stratified genotypic risk scores
Discussion
Conclusions

Chapter 3 – Sex-specific and pathway modeling of inherited glioma risk
Abstract
Background
Methods
Results
3.4.1 Sex-specific gene scores
3.4.2 Gene scores conditioned on previously identified glioma risk loci
3.4.3 Gene scores for
3.4.4 Sex-specific pathway scores
Discussion
Conclusions

Chapter 4 – Identifying risk loci associated with variation in age at diagnosis in glioblastoma
Abstract
Background
Methods
4.3.1 Study population
4.3.2 Genotyping and imputation
4.3.3 Statistical methods
Results
4.4.1 Previously identified glioma risk regions
4.4.2 Sex- and age stratified results
4.4.3 Combined analysis of germline variants and somatic characterization
Discussion
Conclusions

Chapter 5 – Conclusions and future directions
Conclusion
Future directions
5.2.1 Phenotypic variability, and incorporating molecular classification of glioma into genetic association studies
5.2.2 Validation of sex chromosome associations and new approaches to analyzing the sex
5.2.3 Exploring the association between melanoma and glioma
5.2.4 Familial approaches to glioma genetics
5.2.5 Genetic association studies in non-European and admixed populations
5.2.6 Sex-specific variation in genetic risk for cancer

Bibliography


List of Tables

Table 1-1. Inherited syndromes associated with gliomas
Table 1-2. Previously identified glioma risk loci and histology-specific odds ratios (OR) and 95% confidence intervals (95% CI) (Melin, et al.)
Table 2-1. Population characteristics by study and sex
Table 2-2. Previously identified glioma risk loci and histology-specific odds ratios (OR) and 95% confidence intervals (95% CI), overall and stratified by sex
Table 2-3. Sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs11979158, rs55705857 and rs9841110 overall and by histology groupings
Table 2-4. Case-only odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis for rs11979158, rs55705857 and rs9841110 overall and by histology groupings
Table 2-5. Case-only odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs11979158, rs55705857 and rs9841110 overall and by histology groupings
Table 2-6. Sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis for rs11979158, rs55705857 and rs9841110 by specific non-GBM histologies
Table 2-7. Sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs11979158, rs55705857 and rs9841110 by specific non-GBM histologies
Table 2-8. Characteristics of individuals in The Cancer Genome Atlas, by study and sex
Table 2-9. Linkage disequilibrium measures, sex-stratified odds ratios, and 95% confidence intervals (95% CI), and p-values from meta-analysis for marker SNPs selected within the Cancer Genome Atlas genotyping data
Table 2-10. Risk allele frequencies (RAF) Case-only odds ratios, 95% confidence intervals (95% CI), and p-values for marker SNPs from four study meta-analysis and the Cancer Genome Atlas genotyping data
Table 2-11. Odds ratios and 95% confidence intervals for unweighted score in all glioma (URSa) by sex
Table 2-12. Odds ratios and 95% confidence intervals for unweighted score in GBM (URS-GBMa) by sex
Table 2-13. Odds ratios and 95% confidence intervals for unweighted score in all non-GBM (URS-NGBMa) by sex
Table 2-14. Minor allele frequencies (MAF), for meta-analysis and individual studies for rs11979158, rs55705857 and rs9841110 overall and by histology groupings
Table 3-1. Demographic characteristics of included GWAS studies
Table 3-2. Gene scores in males for prioritized genes by algorithm and histology
Table 3-3. Gene scores in females for prioritized genes by algorithm and histology
Table 3-4. Conditional gene scores in males for prioritized genes by algorithm and histology
Table 3-5. Conditional gene scores in females for prioritized genes by algorithm and histology
Table 3-6. Gene scores for prioritized X chromosome genes by histology
Table 3-7. Significant pathways (p<0.001 in any testing group) by sex and histology
Table 4-1. Population characteristics by study
Table 4-2. Previously identified glioma risk loci and histology-specific odds ratios (OR) and 95% confidence intervals (95% CI) stratified by age
Table 4-3. Case-only betas, 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for selected previous GWAS hits, overall, and in persons 54+ at time of diagnosis only
Table 4-4. Case-only betas, 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for selected previous GWAS hits
Table 4-5. Age-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis for rs723527, rs11979158, and rs55705857 for cases diagnosed in 2000 and later
Table 4-6. Age-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 for cases diagnosed in 2000 and later
Table 4-7. Minor allele frequencies (MAF), for meta-analysis and individual studies by case-control status and age group for rs723527, rs11979158, and rs55705857
Table 4-8. Age- and sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 in persons age 18-53
Table 4-9. Age- and sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 in persons age 54-63
Table 4-10. Age- and sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 in persons age 64+
Table 4-11. Minor allele frequencies (MAF), for meta-analysis and individual studies by case-control status, age group, and sex for rs75061358, rs723527, rs11979158, and rs55705857
Table 4-12. Linkage disequilibrium measures, minor allele frequency (MAF) odds ratios, and 95% confidence intervals (95% CI), and p-values from prior eight-study meta-analysis for previously-identified glioma risk SNPs or marker SNPs selected within The Cancer Genome Atlas (TCGA) genotyping data
Table 4-13. Case-only betas, 95% confidence intervals (95% CI), and p-values for previously-identified glioma risk SNPs or marker SNPs within The Cancer Genome Atlas (TCGA) genotyping data
Table 4-14. Age-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857


List of Figures

Figure 1-1. Distribution of tumors overall, and glioma by histologic type in the US (CBTRUS 2010-2014)
Figure 1-2. Average Annual Age-Adjusted Incidence by Histologic group and Age at diagnosis (CBTRUS 2010-2014)
Figure 1-3. Average Annual Incidence by Sex and Age at diagnosis (CBTRUS 2010-2014)
Figure 1-4. Manhattan plot of -log(p) values for all glioma (~8,200 cases and ~14,400 controls)
Figure 1-5. Manhattan plot of -log(p) values for glioblastoma (~4,600 cases and ~14,400 controls)
Figure 1-6. Manhattan plot of -log(p) values for non-glioblastoma glioma (~3,100 cases and ~14,400 controls)
Figure 2-1. Average Annual Incidence of all glioma, glioblastoma and lower grade glioma by sex and age at diagnosis (CBTRUS 2010-2014)
Figure 2-2. Study Schematic for analyses of autosomal SNPs
Figure 2-3. Study Schematic for analyses of sex chromosome SNPs
Figure 2-4. Sex-specific odds ratios overall and by histology grouping, 95% CI and p-values for selected previous GWAS hits and 3p21.31 (rs9841110) for all glioma, GBM, and non-GBM
Figure 2-5. Manhattan plot of -log(p) values for all glioma in A) males and B) females
Figure 2-6. Manhattan plot of -log(p) values for GBM in A) males and B) females
Figure 2-7. Manhattan plot of -log(p) values for non-GBM in A) males and B) females
Figure 2-8. P-values of SNPs between 48.8mb and 50mb on chromosome 3 in males for A) all glioma, B) GBM, and C) non-GBM, and in females for D) all glioma, E) GBM, and F) non-GBM
Figure 2-9. Proportion of samples with IDH1/2 mutation in the TCGA GBM and LGG datasets by sex, overall and stratified by study
Figure 2-10. Proportion of samples by glioma subtype (based on IDH1/2 mutation, 1p19q, and TERT mutation [43]) in the TCGA GBM and LGG datasets by sex, overall and stratified by study
Figure 2-11. Density of histology-specific unweighted risk score by sex and case/control status for A) URS in all glioma, B) URS in GBM, C) URS in non-GBM, D) URS-GBM in GBM, only and E) URS-NGBM in non-GBM only
Figure 2-12. Odds ratios and 95% confidence intervals for unweighted risk (URS) score in A) all glioma, B) GBM-specific URS (URS-G) in GBM, and C) and non-GBM-specific URS (URS-NGBM) for in non-GBM
Figure 2-13. Sex-specific odds ratios and 95% CI from meta-analysis and by study for rs11979158 (7p11.2) for all glioma, GBM, and non-GBM
Figure 2-14. Sex-specific odds ratios and 95% CI from meta-analysis and by study for rs55705857 (8q24.21) for all glioma, GBM, and non-GBM
Figure 2-15. Sex-specific odds ratios and 95% CI from meta-analysis and by study for rs9841110 (3p21.31) for all glioma, GBM, and non-GBM
Figure 3-1. Study schematic for a) generation of discovery and validation summary statistic sets, b) generation of discovery gene-based tests and prioritization c) validation of gene-based tests, d) generation of discovery pathway-based tests and prioritization e) validation of pathway-based tests
Figure 3-2. Gene scores for prioritized genes by algorithm, histology, and sex for a) BPESC1 (3q23), B) SLC6A18 (5p15.33), C) TERT (5p15.33), D) EGFR (7p11.2), E) CDKN2B (9p21.3), F) DNAH2 (17p13.1), G) STMN3 (20q13.33), H) RTEL1-TNFRSF6B (20q13.33)
Figure 3-3. Conditional gene scores for prioritized genes by algorithm, histology, and sex for A) SLC6A18 (5p15.33), B) TERT (5p15.33), C) EGFR (7p11.2), D) CDKN2B (9p21.3), E) DNAH2 (17p13.1), F) RTEL1-TNFRSF6B (20q13.33)
Figure 3-4. Biocarta telomere pathway for all glioma in a) males, and b) females, and for GBM in c) males and d) females
Figure 3-5. Bladder cancer (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females
Figure 3-6. Glioma (KEGG) for all glioma in a) males, and b) females, and for glioblastoma in c) males and d) females
Figure 3-7. Melanoma (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females
Figure 3-8. Non-small cell lung cancer (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females
Figure 3-9. Pancreatic cancer (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females
Figure 3-10. Overlap of genes containing SNPs with nominally significant glioma associations by identified KEGG pathways
Figure 4-1. Average a) Annual Incidence of GBM by age at diagnosis (CBTRUS 2010-2014), and b) Relative survival after diagnosis with GBM by age of diagnosis (SEER 2000-2014)
Figure 4-2. Study schematic for age-stratified case-control analyses
Figure 4-3. Study schematic for case-only analyses
Figure 4-4. Manhattan plot of -log(p) values for GBM in A) ages 18-53, B) ages 54-63, and C) ages 64+
Figure 4-5. Age-specific odds ratios, 95% CI and p-values for selected previous GWAS hits
Figure 4-6. Manhattan plot of -log(p) values for case-only analysis of GBM in A) fixed effects and B) random effects
Figure 4-7. Case only estimates for selected SNPs in a) all ages, b) cases ages 54+ only
Figure 4-8. Age- and sex-specific odds ratios, 95% CI and p-values for selected previous GWAS hits
Figure 4-9. Proportion of samples in the TCGA GBM dataset by age by A) IDH mutation status, B) glioma subtypes (based on IDH, 1p19q, and TERT), and C) methylation groups identified by TCGA pan-glioma working group
Figure 4-10. Age-specific odds ratios and 95% CI from meta-analysis and by study for rs723527
Figure 4-11. Age-specific odds ratios and 95% CI from meta-analysis and by study for rs11979158
Figure 4-12. Age-specific odds ratios and 95% CI from meta-analysis and by study for rs55705857
Figure 5-1. Incidence of malignant brain tumors by global region (CBTRUS, CI5-X)
Figure 5-2. Incidence of glioma by race, ethnicity and histology in the US (CBTRUS)
Figure 5-3. Incidence rate ratios for common cancers (SEER 2009-2013)


Acknowledgements

First, I would like to thank my advisor, Jill Barnholtz-Sloan. Learning and working with her has been extremely rewarding, both professionally and personally, and I am incredibly grateful to her for her mentorship and support. While working with her, she has allowed me the opportunity to be involved in a broad range of multi-disciplinary projects that have taught me to think broadly about scientific questions. She has taught me, by her example, what a good scientist and collaborator should be.

I would like to thank Farren Briggs, Frederick Schumacher, and William Bush for being on my dissertation committee, and for their thoughtful advice and insightful critique on this work. I am also grateful to Nathan Morris, for his statistical expertise and comments on my analytic plans.

I am grateful to Carol Kruchko, of the Central Brain Tumor Registry of the United States, for helping me learn the potential clinical and public health impact of brain tumor epidemiology. The dedication to and passion for brain tumor research shown by both Ms. Kruchko and Dr. Barnholtz-Sloan have been incredibly inspiring as I have chosen to make it a focus of my career.

I am also thankful for the friendship and support of all of the current and past members of the Barnholtz-Sloan lab, as well as their feedback on this work, including Yanwen Chen, Haley Gittleman, Lindsay Stetson, Peter Liao, Yi Fritz, and Jordonna Fulop. I am also grateful to William Huang and Warren Coleman for their assistance in setting up and trouble-shooting analytic pipelines. I would particularly like to thank Yingli Wolinsky and Karen Devine, for their guidance and support throughout my time as part of this team.

I am grateful to the Glioma International Case Control Study Consortium (Christopher I. Amos, Jill S. Barnholtz-Sloan, Jonine L. Bernstein, Melissa L. Bondy, Elizabeth B. Claus, Richard S. Houlston, Dora Il'yasova, Robert B. Jenkins, Christoffer Johansen, Daniel Lachance, Rose Lai, Ching C. Lau, Beatrice S. Melin, Ryan T. Merrell, Sara H. Olson, Siegal Sadetzki, Joellen Schildkraut, and Sanjay Shete), the San Francisco Adult Glioma Study working group (Margaret Wrensch, John Wiencke, Terri Rice, Lucie McCoy, Helen Hansen, Mitch Berger, Paige Bracci, Susan Chang, Jennifer Clarke, Annette Molinaro, Arie Perry, Melike Pezmecki, Michael Prados, Ivan Smirnov, Tarik Tihan, Kyle Walsh, Joseph Wiemels, Shichun Zheng), and the GliomaScan working group (Ulrika Andersson, Preetha Rajaraman, Stephen J. Chanock, Martha S. Linet, Zhaoming Wang, Meredith Yeager, Laura E. Beane Freeman, Stella Koutros, Demetrius Albanes, Kala Visvanathan, Victoria L. Stevens, Roger Henriksson, Dominique S. Michaud, Maria Feychting, Anders Ahlbom, Graham G. Giles, Roger Milne, Roberta McKean-Cowdin, Loic Le Marchand, Meir Stampfer, Avima M. Ruder, Tania Carreon, Goran Hallmans, Anne Zeleniuch-Jacquotte, J. Michael Gaziano, Howard D. Sesso, Mark P. Purdue, Emily White, Ulrike Peters, Howard D. Sesso, Julie Buring) for graciously allowing me to use their data, and providing feedback on my analysis plans for this project as they developed. I would especially like to thank Melissa Bondy, Georgina Armstrong, Beatrice Melin, Margaret Wrensch, Terri Rice, Christopher Amos, Jeanette Eckel-Passow, Ben Kinnersley, Richard Houlston, Robert Jenkins, Joshua Rubin, Justin Lathia, and Michael Berens for their feedback on aspects of this project. I am also grateful to all the patients and individuals for their participation in the studies incorporated in this project, and would also like to thank the clinicians and other hospital staff, cancer registries and study staff who contributed to the blood sample and data collection.

Finally, I would like to dedicate this dissertation to my parents, Charles and Kriss Ostrom, for a lifetime of support and encouragement, and to my husband Corey Close, whose patience and affection have been invaluable.


Leveraging Demographic Differences in Incidence for Discovery and Validation of Risk Variants in Glioma

Abstract

By

QUINN T. OSTROM

Glioma is the most commonly occurring malignant brain tumor in the United States, with highest incidence in males and persons over 60. There are no well validated risk factors for these tumors that explain a large proportion of cases. To date, genome-wide association studies have identified 25 validated risk loci which explain 30% of overall heritable risk. The goal of this dissertation is to utilize demographic differences in incidence to increase power for detection of variants with varying effects by age and sex.

Sex-stratified logistic regression models were used to generate betas, standard errors, and p-values. A significant association was detected in males only at a SNP in 7p11.2 (EGFR) previously associated with glioma risk. A previously identified intergenic SNP in 8q24.21 had an effect size in females that was approximately 2-fold that in males. Additionally, I identified a new large region on 3p21.31 with significant association in females only. Gene-specific p-values were generated by combining single SNP p-values for all SNPs within gene boundaries using three different scoring algorithms. Genes identified by at least 2 of 3 algorithms were prioritized. Pathway scores were generated by combining gene-specific p-values using Pascal. Significant associations were found in genes containing SNPs previously associated with glioma (EGFR and TERT), which remained nominally significant after conditioning on known SNPs, suggesting these regions may contain additional sources of genetic risk. There were significant associations in the telomere maintenance pathway in both sexes. Age-stratified logistic regression models were used to generate betas, standard errors, and p-values. A previously identified SNP in 8q24.21 was found to have a significant effect in the youngest age group, with no detectable effect in older age groups. An examination of glioblastoma patients within The Cancer Genome Atlas found higher prevalence of ‘LGG’-like characteristics within this younger age group, suggesting that frequency of ‘secondary’ glioblastoma is higher in younger persons. Though the identified differences in effect of risk variants do not fully explain the observed incidence difference in glioma by sex and age, they provide some evidence that there may be sex or age variation in processes of gliomagenesis.


Chapter 1 – Introduction

Background

Glioma is the most commonly occurring malignant brain tumor in the United States (US), with an average annual age-adjusted incidence of 6.0 per 100,000 from 2009-2013 [1]. Though these tumors are rare, they cause significant morbidity and mortality. Glioblastoma (GBM), the most commonly occurring type of glioma, has a 5-year survival rate of approximately 5% [1]. There are no well validated risk factors for these tumors that explain a large proportion of cases, and the vast majority of cases are sporadic [2, 3]. Estimates of the heritability of these tumors are ~25% (with previously discovered risk loci accounting for approximately 30% of heritable risk), suggesting that there are both undiscovered behavioral or environmental risk factors and undiscovered sources of genetic risk [4, 5]. Incidence of glioma varies substantially by age, sex, and race/ethnicity, and these tumors are most common in white non-Hispanic men in older adulthood [1]. The male preponderance in incidence increases with age [1]. This variation in glioma incidence by sex and age is currently unexplained by known risk factors.

Previously conducted glioma genome-wide association studies (GWAS) have identified 25 validated risk loci in European ancestry populations. Most of these studies have used samples in which the majority of cases are male and tumors are predominantly higher grade [4, 6-10]. To date, glioma GWAS have not specifically investigated potential genetic sources of risk variation by sex or age. Conducting research in populations that vary in glioma incidence also provides an opportunity to discover variants that may vary in direction or magnitude of effect between populations and may not rise to the level of genome-wide statistical significance when groups are combined in a single analysis. Discovery of variants that differ between demographic groups in the population can contribute to the development of population-specific risk calculations and of overall risk models.
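The loss of power that can occur when heterogeneous effects are analyzed together can be illustrated with a small numeric sketch. The per-stratum log odds ratios and standard errors below are hypothetical values chosen for illustration, not estimates from this study, and fixed-effect inverse-variance pooling is used only as a simple stand-in for a combined analysis.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def pooled_fixed_effect(betas, ses):
    """Inverse-variance weighted (fixed-effect) pooled log-OR and its SE."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se


def two_sided_p(beta, se):
    """Two-sided p-value for a Wald test of beta = 0."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical strata: the variant has an effect in one sex only.
beta_f, se_f = math.log(1.30), 0.045  # females: OR = 1.30
beta_m, se_m = math.log(1.00), 0.040  # males: OR = 1.00 (no effect)

beta_all, se_all = pooled_fixed_effect([beta_f, beta_m], [se_f, se_m])

p_f = two_sided_p(beta_f, se_f)
p_all = two_sided_p(beta_all, se_all)

print(f"females only: OR={math.exp(beta_f):.2f}, p={p_f:.2e}")
print(f"pooled:       OR={math.exp(beta_all):.2f}, p={p_all:.2e}")
```

Under these assumed inputs the female-specific estimate passes the genome-wide threshold while the pooled estimate is diluted toward the null and does not, which is the situation that motivates stratified analysis.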

Literature review

1.2.1 Glioma classification

Gliomas represent 31% of all brain and central nervous system (CNS) tumors diagnosed in the US, and 81% of malignant brain and CNS tumors [1]. These tumors originate from glial cells, and gliomas are classified histologically primarily by the predominant cell type of origin: astrocytomas (derived from astrocytes), oligodendrogliomas (derived from oligodendrocytes), and oligoastrocytomas (or ‘mixed’ gliomas). Glial cells are the most numerous cell type in the brain, where they provide support for the signaling functions of neurons [11]. Astrocytes are the most common type of glial cell, and their primary function is to maintain the chemical environment for neuronal signaling. The primary function of oligodendrocytes is to lay myelin around axons. Within specific histologic groups, tumors are assigned a World Health Organization (WHO) grade (I-IV, with WHO grade IV having the worst prognosis) based on cellular characteristics that are associated with clinical outcomes [12].

The most common type of glioma in adults is GBM, a WHO grade IV astrocytoma, which accounts for 56.1% of all gliomas diagnosed in the US (Figure 1-1) [1]. GBM has the poorest outcomes of all gliomas, with median survival after diagnosis estimated to be ~12 months [13]. Lower-grade gliomas (LGG, or non-GBM glioma) are WHO grade II or III astrocytomas, oligodendrogliomas, or oligoastrocytomas [1]. Non-GBM astrocytomas with WHO grade I-III represent approximately 20% of all gliomas, while oligodendroglial tumors represent ~8% of gliomas diagnosed in the US.

Figure 1-1 Distribution of brain tumors overall, and glioma by histologic type in the US (CBTRUS 2010-2014)

1.2.2 Demographic variation in glioma incidence

Incidence of glioma varies significantly by age in the US, and the pattern of incidence by age differs significantly by subtype (Figure 1-2) [1]. Lower grade gliomas (including non-glioblastoma astrocytomas and oligodendroglial tumors, WHO grade II and III) are the most common subtype in persons less than 40 years old, and incidence of GBM increases with age to a peak at 75-79 years old [1].


Figure 1-2. Average Annual Age-Adjusted Incidence by Histologic group and Age at diagnosis (CBTRUS 2010-2014)

Incidence of glioma varies significantly by sex in the US, and most glioma histologies occur with significantly higher incidence in males. Males and females have approximately the same incidence of glioma in childhood, but males begin to have higher incidence than females during young adulthood, and this difference increases with increasing age (Figure 1-3) [1].

Figure 1-3. Average Annual Incidence by Sex and Age at diagnosis (CBTRUS 2010-2014)

1.2.3 Studies of environmental and behavioral risk factors in glioma

Many risk factors have been examined as potential contributors to glioma risk, but few have been validated. The strongest evidence exists for increased risk after ionizing radiation exposure to the head, and decreased risk with a history of allergies or atopic disease [3]. The potential influence of other factors has been examined extensively but consistent associations have not been found.

1.2.3.1 Allergies

Studies of large and diverse groups of cases and controls have consistently shown that a history of atopic conditions (including asthma, hay fever, eczema, and food allergies) is associated with reduced glioma risk [14-21]. A recent meta-analysis estimated that allergies reduce glioma risk by nearly 40% [22], but specific associations by allergy type and duration have been inconsistent. One analysis found that glioma risk decreases with increased total reported allergy types (e.g., seasonal, animal, medication, and food), younger age at first allergy diagnosis, and increased time since allergy diagnosis [16]. Other studies have shown that the decrease in glioma risk was strengthened by current or recent diagnosis of allergy, which suggests that recall bias may be contributing to some of the observed effect [19, 21].

Histology-specific analyses have suggested that the protective effect conferred by allergy may vary by histology and by specific allergy type. In one pooled analysis of seven case-control studies, odds of oligodendroglioma and anaplastic oligodendroglioma were significantly reduced among individuals with a history of asthma, alone or in combination with allergies, but not among those with a history of allergies alone [23]. The underlying biologic mechanism through which allergy protects against development of glioma is not known. One hypothesis is that those with allergic conditions may be in a heightened state of immune surveillance, and thus be more likely to discourage abnormal cell growth [16, 22, 24, 25]. It has also been suggested that IgE antibodies for some allergens may also react to brain tumor antigens [26, 27], or that those with allergies may be more effective at responding to potential environmental carcinogens throughout the life span [26, 28]. There is some evidence that known glioma risk single nucleotide polymorphisms (SNPs), including rs498872 (PHLDB1), rs4977756 (CDKN2B), rs6010620 (RTEL1), and rs4809324 (RTEL1), modify the association between allergy and glioma [29, 30], which may provide further evidence towards determining the biological mechanism of this association.

1.2.3.2 Ionizing Radiation

Moderate-to-high radiation exposure is the strongest and most consistently documented environmental risk factor for brain tumors [31]. This association has been observed in atomic bomb survivor studies [32, 33], therapeutic radiation cohorts (both for treatment of prior cancer and benign conditions) [34-44], and occupational and environmental studies [45]. The association between ionizing radiation and non-malignant tumors, particularly meningioma, is much stronger than that with glioma. In studies of the Israeli tinea capitis cohort (mean dose: 1.5 Gy, mean follow up: 40 years), analyses have found a ~2-fold increase in risk for gliomas for exposed children as compared to unexposed siblings, with a linear dose-response association [38]. Data from the atomic bomb survivors replicated glioma-specific risks consistent with a linear dose-response at moderate doses [32]. Two studies found that cancer survivors who had received high-dose radiation treatments for a non-brain primary cancer had increased odds of glioma [46, 47], while the results of other analyses have been inconsistent [48-51].


1.2.3.3 Endogenous and Exogenous Sex Hormone Exposure

Previous analyses have examined the impact of exogenous and endogenous sex hormone exposure as a potential environmental risk factor for glioma [52]. The lower incidence of glioma in females has led some to hypothesize that increased lifetime estrogen exposure may act as a protective factor against developing these tumors. Lifetime hormone exposure can be difficult to accurately measure, and analyses have focused on surrogates for endogenous exposure (e.g. age at menarche, parity, and age at menopause) and use of supplemental estrogen or progesterone (e.g. hormone replacement therapy [HRT], oral contraceptives).

Two meta-analyses of these studies found a positive association between age at menarche and glioma, with a relative risk of ~1.3 [53, 54]. Studies of parity and glioma risk have been mixed, with some finding decreases in glioma risk for those who have had children as opposed to childless persons [55-59], and others finding some increase in risk with childbearing [60, 61]; few of these results reached statistical significance. A meta-analysis of these studies found a negative but non-significant association between parity and glioma [53]. Several studies have evaluated the potential relationship between age at menopause and risk of developing a glioma, but have largely found null associations [55-61]. Two meta-analyses of these studies found a null association between age at menopause and glioma [53, 54].

Analyses attempting to determine the association between glioma and exogenous sex hormone exposure have focused primarily on exposure to HRT and hormonal contraceptives. The results of these studies have been mixed, with some studies finding


reduced risk of glioma with use of HRT [56, 60, 62] and others finding a null association or increased risk [58, 59, 61, 63, 64]. A meta-analysis of six prior case-control studies

found decreased odds of glioma with ever use of HRT (OR=0.68, 95% CI: 0.58-0.81)

[54]. One recent meta-analysis found significantly increased risk of glioma for users of

estrogen-only HRT (RR=1.23, 95% CI=1.06-1.42), but not estrogen-progestin HRT

(RR=0.92, 95% CI: 0.78-1.08) [65].

Studies examining the association between use of hormonal contraception and glioma generally found a small inverse association or a null association [56, 58-63, 66]. A meta-analysis of six prior case-control studies found decreased odds of glioma with ever use of

oral hormonal contraceptives as compared to non-users (OR=0.71, 95% CI: 0.60-0.83)

[54]. A recent study utilizing the Danish cancer registry and the Danish national

prescription registry found a significant association between use of hormonal

contraception and glioma, with the highest increase associated with over five years of

contraceptive use (OR=1.9, 95% CI: 1.2-2.9) [67].

Johansen and colleagues recently reviewed the state of the literature in glioma risk factor research and found substantial variability between the results derived from case-control studies and those derived from cohort study designs [68]. Comparing the evidence presented by case-control studies of exogenous hormone exposure (which rely on retrospective self-report) with that from cohort studies in which prospective self-report or prescribing records were available, they concluded that studies based on self-report produced lower risk estimates, whereas analyses using prescription data reported null associations. The inability to replicate case-control findings in cohort designs suggests that these


case-control studies may be affected by bias. The potential protective role of estrogen on glioma risk remains an interesting hypothesis that has yet to be proven or disproven.

Further exploration of the effect of estrogen on gliomagenesis in model organisms may provide mechanistic information that could be used in developing future studies examining this association in humans.

1.2.4 Genome-wide association studies

Genome-wide approaches to genetic studies have been in use since the early 1980s when

technologies were developed that could characterize polymorphisms across the genome.

Originally these methods used restriction fragment length polymorphisms [69, 70], but

later utilized microsatellite markers [71]. Due to limitations of the technologies used to characterize these markers, the markers were often spaced unevenly across the genome, and only a small number (<1000) could be characterized at a time. These techniques

allowed for the characterization of only a small portion of human genetic variability, and

were largely used for family-based linkage studies. Though linkage analyses have been

successfully used to identify risk loci of large effect for many diseases, they lack power

to identify variants of low effect size, and may not identify risk associated with common

variants that may be present in unaffected family members [72].

In the late 1990s, Risch and Merikangas proposed the use of association study designs for

genetic studies of complex disease in order to increase power, by comparing allele frequencies between cases and controls [73, 74]. These comparisons were possible in both familial and non-familial study designs. Accompanying this was the emerging hypothesis that genetic risk for common disease is likely composed of many common (minor allele frequency [MAF]>5%) alleles (the common disease, common variant hypothesis) [75].

Though the number of known SNPs was still small, technologies were also being developed for genotyping thousands of SNPs in parallel [76]. The completion of phase I of the HapMap project produced a publicly available database cataloging over

1,000,000 polymorphic SNPs and mapping patterns of linkage disequilibrium across the genome [77]. This dataset allowed for informed selection of ‘tagged’ SNPs (that could reliably predict haplotypes that would contain a causal variant) for association studies.

This resulted in the publication of thousands of GWAS including those by Ozaki et al.

[78], Klein et al. [79], and the Wellcome Trust Case Control Consortium [80]. Modern genotyping arrays for GWAS generally contain 500,000-5,000,000 SNPs, which is likely enough to tag all haplotypes within the human population, but does not include all known polymorphic SNPs. Projects such as HapMap, the 1000 Genomes Project [81], the UK10K [82, 83], and the Haplotype Reference Consortium [84] have provided rich reference datasets that can be used to impute GWAS datasets and may increase the probability of identification of a true causal variant via association study [85].
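The case-control allele frequency comparison that underlies the association designs described above can be illustrated with a minimal sketch. The function name and the allele counts below are hypothetical and chosen only for illustration; the calculation is the standard allelic odds ratio with a Woolf (log-based) confidence interval and a 1-degree-of-freedom Pearson chi-square test, and it is not drawn from any of the cited studies.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Compare allele counts between cases and controls at one SNP.

    Returns the allelic odds ratio, its 95% confidence interval
    (Woolf/log method), the 1-df Pearson chi-square statistic, and
    the corresponding p-value. All four counts must be positive.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    # Pearson chi-square for a 2x2 table: N(ad - bc)^2 / (row and column totals).
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    chi_sq = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )

    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return odds_ratio, (lower, upper), chi_sq, p_value


# Hypothetical allele counts (2 alleles per person) for 1,000 cases and 1,000 controls.
or_, ci, chi_sq, p = allelic_association(600, 1400, 500, 1500)
print(f"OR={or_:.2f}, 95% CI: {ci[0]:.2f}-{ci[1]:.2f}, chi-square={chi_sq:.2f}, p={p:.2e}")
```

In a GWAS this comparison is repeated for every genotyped or imputed SNP, usually within a regression framework that allows adjustment for covariates, which is where the multiple testing burden discussed below arises.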

GWAS methods have been highly successful in identifying risk variants. As of August

2017, the GWAS catalog reports 3,057 publications reporting 33,544 SNP associations.

These studies have been able to include larger and larger numbers of cases and controls as genotyping costs have rapidly decreased, and researchers have been able to leverage previously collected datasets through meta-analysis. Increasing array size, together with the use of imputed datasets, leads to an ever-increasing multiple testing burden. The recent GICC meta-analysis used a total of ~8,200 glioma cases and ~14,400 controls, the largest set of glioma cases to date, and identified 13 additional loci of relatively low effect size [4].

This suggests that further use of GWAS methods in glioma would require significant

increases in sample size in order to be appropriately powered to identify additional loci of

small effect. Funding for increased collection of large numbers of cases and controls is

not likely to be available; especially considering the time it takes to accrue many more

cases in a rare disease. New methods can leverage datasets collected and summary

statistics generated by GWAS, and are an efficient way to continue to search for sources

of inherited genetic risk in complex disease.

GWAS is often considered to be ‘hypothesis-free’, in that no a priori knowledge is used to select SNPs prior to analysis [86]. These study designs are based on multiple

assumptions, particularly that risk for common diseases lies in genetic variants that are common in the population and that the genetic effect of these variants is additive. Most

GWAS are designed within the context of case-control studies, and are therefore also subject to the limitations and assumptions of these study designs. Genome-wide scans

can be a useful approach for generating hypotheses about the genetic architecture of a

disease. Use of a priori knowledge to develop research questions using genome-wide

data can allow for study designs that limit multiple testing burden, and thus increase

power, by eliminating tests for loci that have not been prioritized [87, 88]. Another way

to increase precision and power is through increasing the precision and specificity of

phenotypes used for these analyses, or by targeting specific high-risk populations.
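The effect of prioritization on the multiple testing burden can be made concrete with a short calculation. This is a sketch under stated assumptions: a Bonferroni correction is used as the simplest family-wise correction, and the numbers of tests (one million for an agnostic scan, 500 for a prioritized set) are hypothetical round figures rather than values taken from any study discussed here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical agnostic scan: roughly one million independent tests.
genome_wide = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical prioritized analysis: 500 loci selected using a priori knowledge.
prioritized = bonferroni_threshold(0.05, 500)

print(f"agnostic scan threshold:   {genome_wide:.1e}")
print(f"prioritized set threshold: {prioritized:.1e}")
print(f"threshold relaxed by a factor of {prioritized / genome_wide:.0f}")
```

A less stringent threshold means that a variant of a given effect size can be detected with fewer cases, which is the sense in which limiting the number of tests increases power.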

Applying genome-wide association study methods to specific populations with varying incidence of disease can increase power for detection of variants that may vary in effect size between demographic groups. Conducting research in these subpopulations that vary in incidence also provides an opportunity to discover variants whose effect size varies within the combined population and that, as a result, do not rise to the level of genome-wide statistical significance. Focusing on population segments with varying incidence of glioma and glioma subtypes provides opportunities to increase power for detection, as well as to elucidate risk loci that may have varying effect sizes between groups.

1.2.5 Inherited risk of glioma

1.2.5.1 Mendelian cancer syndromes, and familial aggregation of glioma

The vast majority of cases of glioma occur in individuals with no family history of glioma, but approximately 5% of gliomas are familial [2]. An even smaller proportion of gliomas are due to Mendelian disorders or inherited syndromes, approximately 1-2% of adult and

4% of pediatric cases [45]. A summary of the inherited syndromes most commonly associated with glioma is provided in Table 1-1.

Table 1-1. Inherited syndromes associated with gliomas

Disorder/Syndrome (OMIM ID) | Gene (Location) | Mode of Inheritance | Phenotypic features | Associated gliomas
Neurofibromatosis 1 (162200) | NF1 (17q11.2) | Dominant | Neurofibromas, Schwannomas, café-au-lait macules | Astrocytoma, optic nerve glioma
Neurofibromatosis 2 (101000) | NF2 (22q12.2) | Dominant | Acoustic neuromas, meningiomas, neurofibromas, eye lesions | Ependymoma
Tuberous sclerosis (191100, 613254) | TSC1, TSC2 (9q34.14, 16p13.3) | Dominant | Development of multi-system non-malignant tumors | Giant cell astrocytoma
Lynch syndrome (120435) | MSH2, MLH1, MSH6, PMS2 | Dominant | Predisposition to gastrointestinal, endometrial and other cancers | GBM, other gliomas
Li-Fraumeni syndrome (151623) | TP53 (17p13.1) | Dominant | Predisposition to numerous cancers, especially breast, brain, and soft-tissue sarcoma | GBM, other gliomas
Melanoma-neural system tumor syndrome (155755) | p16/CDKN2A (9p21.3) | Dominant | Predisposition to malignant melanoma and malignant brain tumors | Glioma
Ollier disease | IDH1/IDH2 (2q33.3/15q26.1) | Acquired post-zygotic mosaicism, Dominant with reduced penetrance | Development of intraosseous benign cartilaginous tumors, cancer predisposition | Glioma

Some epidemiological studies have identified increased risk of developing a brain tumor

for first-degree relatives (parents, children, and full siblings) of incident cases, but due to

the rarity of glioma and further rarity of familial glioma the sample sizes for these studies

are generally small [2, 89-92]. A study of first-degree relatives (FDR) of persons with

astrocytoma in Sweden found increased cases of all brain tumors (Standardized incidence

ratio [SIR]= 2.12), and a larger increased risk of astrocytoma (SIR= 3.18) [93]. Within

the San Francisco Adult Glioma Study (SFAGS), investigators estimated that glioma cases had

increased odds of having a validated family history of glioma (Odds ratio [OR]=2.3) as

compared to controls [2]. A study of the Swedish cancer registry estimated the risk of

glioma among FDR and spouses to estimate the genetic component of familial

aggregation as compared to shared environment, and found that having a FDR with a

brain tumor led to an increase in risk (SIR for GBM = 2.30, and SIR for LGG = 3.84) with no increased risk in spouses. SIRs were higher for younger age at onset than for older age at onset in persons with an FDR with LGG, but did not vary by age at onset in GBM.

This suggests that there is a genetic component to familial aggregation, though this

method does not account for shared environment in childhood. A large pooled analysis of

glioma probands from three sites showed SIRs for FDR to be approximately 2, with

increased SIRs for tumors diagnosed at younger ages [94]. Some analyses have also shown elevated risk of other non-brain cancers, including melanoma and sarcoma [92,

94].
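The standardized incidence ratios quoted above are ratios of the observed number of cases in relatives to the number expected from population rates. The sketch below shows that calculation together with an approximate confidence interval; the use of Byar's approximation, the function name, and the observed and expected counts are assumptions made for illustration and are not taken from the cited registry studies.

```python
import math


def standardized_incidence_ratio(observed, expected, z=1.96):
    """Return (SIR, lower, upper) using Byar's approximation to the Poisson interval.

    observed: number of cases observed among relatives (non-negative integer)
    expected: number of cases expected from age- and sex-specific population rates
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    sir = observed / expected

    if observed > 0:
        lower = (
            observed
            * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
            / expected
        )
    else:
        lower = 0.0

    shifted = observed + 1
    upper = (
        shifted
        * (1 - 1 / (9 * shifted) + z / (3 * math.sqrt(shifted))) ** 3
        / expected
    )
    return sir, lower, upper


# Hypothetical example: 23 gliomas observed among first-degree relatives
# where 10.0 would be expected from population incidence rates.
sir, lower, upper = standardized_incidence_ratio(23, 10.0)
print(f"SIR={sir:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 indicates more cases among relatives than population rates would predict, although, as noted above, this alone cannot separate inherited risk from shared childhood environment.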

Linkage studies have been conducted within affected ‘glioma families’ (where more than one person has been diagnosed with a glioma) but have had little success identifying high-penetrance glioma risk variants [95-98]. Two early linkage studies [97, 98] of

familial glioma found a single linkage peak at 15q23-q26.3 [97], but were limited by small sample size. Analysis of families collected through the GLIOGENE consortium found a significant linkage peak at 17q12-21.32, which could not be replicated in a validation set, as well as other peaks that did not reach a LOD score of 3 [95]. This region was further refined using a variable age-at-onset model to identify two separate peaks within the previously identified linkage region [96]. The initial GLIOGENE linkage analysis also found linkage peaks that did not reach a LOD score of 3 at 6p22.3,

12p13.33-12.1, 17q22-23.2, and 18q23 [95]. These regions do not overlap with those identified through genetic association studies in non-familial cases. When genotype frequencies of SNPs within these regions were compared between persons with a familial history of glioma and unaffected controls, there was a moderately significant association for 12 SNPs [99]. Additional analyses of these glioma families have found alterations in

MutS homolog 2 (MSH2) [100], and protection of telomeres protein 1 (POT1)

[101] that appear to be associated with glioma in a small number of families, but the large majority of mutations within glioma families appear to be private.

1.2.5.2 Candidate gene and genome-wide association studies in sporadic glioma

To date, there have been six GWAS conducted in glioma [7-10] and several analyses focused on candidate regions [6, 102, 103]. The first GWAS in glioma were conducted concurrently at the MD Anderson Cancer Center [8] (MDA-GWAS) and University of

California, San Francisco [9] (SFAGS-GWAS), using glioma cases recruited as part of long-running prospective glioma studies at these institutions. MDA-GWAS (which also included a set of cases and controls ascertained in the UK as part of INTERPHONE [104]


and the UK birth cohort [UK-GWAS]) identified five risk loci for glioma: 5p15.33

(rs2736100, TERT), 8q24.21 (rs4295627, CCDC26), 9p21.3 (rs4977756, CDKN2A-

CDKN2B), 20q13.33 (rs6010620, RTEL1) and 11q23.3 (rs498872, PHLDB1) [8].

SFAGS-GWAS identified 3 risk loci for glioma: 9p21 (rs1412829, CDKN2B), and two

independent SNPs in 20q13.3, (rs6010620 and rs4809324, RTEL1) [9]. A combined

French (FRE-GWAS) and German (GER-GWAS) group conducted a 3rd glioma GWAS, which identified two independent associations at 7p11.2 (rs11979158 and rs2252586,

EGFR) [10]. A 4th glioma GWAS was conducted by researchers at the National Cancer

Institute (GliomaScan) which contained cases and controls ascertained from 14 cohort studies, 3 case-control studies, and 1 case-only study [7]. This analysis replicated previous findings at 20q13.33 (rs6010620 and rs4809324, RTEL1), 5p15.33 (rs2736100,

TERT), 9p21.3 (rs4977756 and rs1412829, CDKN2BAS), 7p11.2 (rs11979158 and rs2252586, EGFR), 8q24.21 (rs4295627, CCDC26) and 11q23.3 (rs498872, PHLDB1),

though not all of these were statistically significant at the genome-wide level.

With the development of genotype imputation techniques, several groups attempted to better characterize the regions that had been discovered with previous GWAS. A UCSF group imputed 30 regions containing nominally significant SNPs in the previous reported

SFAGS-GWAS [9] data, and found a new variant at 3q26.2 (rs1920116, TERC) [102]. A meta-analysis of MDA-GWAS, UK-GWAS, FRE-GWAS, and GER-GWAS identified five new risk loci: one for GBM at 12q23.33 (rs3851634, POLR3B), and four for non-GBM at 10q25.2 (rs11196067, VTI1A), 11q23.2 (rs648044, ZBTB16), 12q21.2 (rs12230172, intergenic) and 15q24.2 (rs1801591, ETFA) [105]. An additional analysis


utilizing these four datasets estimated the heritability of glioma overall to be 25%, with estimates of 26% for GBM and 25% for non-GBM glioma [5].

Though gliomas are known to be heterogeneous, most analyses have been conducted on all glioma types pooled or with classification based on histologically assigned type and grade. Due to their rarity, glioma case cohorts are often ascertained at multiple centers over multiple years, and results from molecular tests may not be available for all cases. A combined group from UCSF and the Mayo Clinic explored the 8q24.21 locus and found seven independent low-frequency SNPs that were associated with glioma risk, with the most significant association at rs55705857 (CCDC26) [103]. They further examined these risk loci by tumor molecular characteristics, and found that this variant was most strongly associated with oligodendroglial tumors with mutant IDH1/IDH2

(OR=5.1, and OR=4.8, respectively). There was no significant association for IDH1/2 wild type tumors. Another analysis focused on the 8q24.21 locus in an independent set also confirmed that rs55705857 (CCDC26) was the strongest association within this region [6].

Previous glioma GWAS have been limited by methodological problems. Glioma is a rare disease, and it is difficult to recruit a significant number of cases from a single center.

Many of these studies included controls that were recruited from a second population, or were secondary analyses using controls that had been genotyped for other studies. The

Glioma International Case-Control (GICC) Study [106] was established to recruit a relatively large number of cases and controls from multiple sites using a consistent protocol. This analysis represented the first glioma GWAS with sufficient power to detect


variants of lower effect size in LGG. This new set was combined via meta-analysis with

SFAGS-GWAS, MDA-GWAS, GliomaScan, UK-GWAS, FRE-GWAS and GER-GWAS for a total of ~8,200 glioma cases and ~14,400 controls (Table 1-2, Figure 1-4, Figure

1-5, Figure 1-6) [4]. Overall, this analysis validated seven loci previously identified by other GWAS (5p15.33, 7p11.2, 8q24.21, 9p21.3, 11q23.3, 17p13.1, and 20q13.33). This analysis also identified five new loci for GBM (1p31.3, 11q14.1, 16p13.3, 16q12.1, and 22q13.1) (Table 1-2, Figure 1-5). The largest number of new loci was identified for LGG, where eight new loci were identified (1q32.1, 1q44, 2q33.3, 3p14.1, 10q24.33, 11q21, 14q12, and 16p13.3) (Table 1-2, Figure 1-6). With the addition of these loci, it was estimated that currently known glioma risk loci explain 27% of familial risk for GBM, and 34% for non-GBM glioma.
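Combining per-study results via meta-analysis, as described above, is commonly done by weighting each study's log odds ratio by the inverse of its variance. The following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis for a single SNP. It is offered as an assumption about the general technique, not as the specific procedure used in the cited meta-analysis, and the per-study estimates are hypothetical.

```python
import math


def fixed_effect_meta(estimates, z=1.96):
    """Pool odds ratios with fixed-effect inverse-variance weighting.

    estimates: iterable of (odds_ratio, ci_lower, ci_upper), one tuple per study,
               where the interval is a 95% confidence interval.
    Returns (pooled_or, pooled_lower, pooled_upper).
    """
    weight_total = 0.0
    weighted_beta_total = 0.0
    for odds_ratio, ci_lower, ci_upper in estimates:
        beta = math.log(odds_ratio)
        # Recover the standard error of log(OR) from the width of the interval.
        se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
        weight = 1 / se**2
        weight_total += weight
        weighted_beta_total += weight * beta

    pooled_beta = weighted_beta_total / weight_total
    pooled_se = math.sqrt(1 / weight_total)
    return (
        math.exp(pooled_beta),
        math.exp(pooled_beta - z * pooled_se),
        math.exp(pooled_beta + z * pooled_se),
    )


# Hypothetical per-study estimates for one SNP (not real study results).
studies = [(1.25, 1.05, 1.49), (1.18, 1.02, 1.37), (1.31, 0.98, 1.75)]
pooled, lower, upper = fixed_effect_meta(studies)
print(f"pooled OR={pooled:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

The pooled interval is narrower than any single study's interval, which illustrates why meta-analysis can detect variants of lower effect size than any contributing study could detect alone.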

Table 1-2 Previously identified glioma risk loci and histology-specific odds ratios (OR) and 95% confidence intervals (95% CI) (Melin, et al.)

SNP RSID (Locus) | Associated Gene | RAF | All glioma P | All glioma OR (95%CI) | GBM P | GBM OR (95%CI) | Non-GBM glioma P | Non-GBM glioma OR (95%CI)
rs12752552 (1p31.3) | RAVER2 | 0.870 | 4.07x10^-9 | 1.18 (1.11-1.24) | 2.04x10^-9 | 1.22 (1.15-1.31) | 4.78x10^-3 | 1.11 (1.03-1.18)
rs4252707 (1q32.1) | MDM4 | 0.220 | 2.97x10^-7 | 1.12 (1.07-1.17) | 0.0150 | 1.07 (1.01-1.13) | 3.34x10^-9 | 1.19 (1.12-1.26)
rs12076373 (1q44) | AKT3 | 0.837 | 4.97x10^-4 | 1.09 (1.04-1.15) | 0.8460 | 0.99 (0.94-1.06) | 2.63x10^-10 | 1.23 (1.16-1.32)
rs7572263 (2q33.3) | C2orf80 | 0.756 | 2.58x10^-6 | 1.11 (1.06-1.15) | 0.0190 | 1.06 (1.01-1.12) | 2.18x10^-10 | 1.20 (1.13-1.26)
rs11706832 (3p14.1) | LRIG1 | 0.456 | 1.06x10^-5 | 1.08 (1.05-1.12) | 0.1580 | 1.03 (0.99-1.08) | 7.66x10^-9 | 1.15 (1.09-1.20)
rs2736100 (5p15.33) | TERT | 0.499 | 2.34x10^-45 | 1.29 (1.25-1.34) | 1.20x10^-53 | 1.41 (1.35-1.47) | 7.26x10^-10 | 1.16 (1.10-1.21)
rs10069690 (5p15.33) | TERT | 0.276 | 2.71x10^-66 | 1.45 (1.39-1.51) | 8.33x10^-74 | 1.61 (1.53-1.69) | 1.14x10^-16 | 1.27 (1.20-1.34)
rs2252586 (7p11.2) | EGFR | 0.281 | 1.38x10^-13 | 1.16 (1.11-1.20) | 7.89x10^-15 | 1.20 (1.15-1.26) | 1.89x10^-4 | 1.10 (1.05-1.16)
rs11979158 (7p11.2) | EGFR | 0.831 | 1.21x10^-17 | 1.24 (1.18-1.30) | 1.94x10^-19 | 1.31 (1.24-1.39) | 7.73x10^-6 | 1.16 (1.08-1.23)
rs55705857 (8q24.21) | CCDC26 | 0.057 | 9.53x10^-79 | 1.99 (1.85-2.13) | 9.45x10^-7 | 1.27 (1.16-1.40) | 7.28x10^-149 | 3.39 (3.09-3.71)
rs4977756 (9p21.3) | CDKN2B-AS1 | 0.400 | 1.46x10^-41 | 1.28 (1.23-1.32) | 4.24x10^-40 | 1.34 (1.29-1.40) | 2.28x10^-14 | 1.20 (1.15-1.26)
rs11598018 (10q24.33) | OBFC1 | 0.462 | 3.07x10^-7 | 1.10 (1.06-1.14) | 0.0103 | 1.06 (1.01-1.11) | 3.39x10^-8 | 1.14 (1.09-1.20)
rs11196067 (10q25.2) | VTI1A | 0.579 | 3.79x10^-5 | 1.08 (1.04-1.12) | 0.1820 | 1.03 (0.99-1.08) | 3.53x10^-9 | 1.15 (1.10-1.21)
rs11599775 (10q25.2) | VTI1A | 0.620 | 4.34x10^-5 | 1.08 (1.04-1.12) | 0.2990 | 1.02 (0.98-1.07) | 3.44x10^-9 | 1.16 (1.10-1.22)
rs11233250 (11q14.1) | Intergenic | 0.868 | 5.40x10^-6 | 1.14 (1.08-1.21) | 9.95x10^-10 | 1.24 (1.16-1.33) | 0.5920 | 0.98 (0.91-1.05)
rs7107785 (11q21) | MAML2 | 0.479 | 2.96x10^-4 | 1.07 (1.03-1.11) | 0.8440 | 1.00 (0.95-1.04) | 3.87x10^-10 | 1.16 (1.11-1.21)
rs648044 (11q23.2) | PHLDB1 | 0.390 | 3.55x10^-3 | 1.06 (1.02-1.10) | 0.3630 | 0.98 (0.93-1.03) | 4.66x10^-12 | 1.19 (1.13-1.25)
rs498872 (11q23.3) | PHLDB1 | 0.307 | 4.09x10^-11 | 1.14 (1.10-1.18) | 0.7150 | 1.01 (0.96-1.06) | 8.46x10^-33 | 1.35 (1.28-1.41)
rs1275600 (12q21.2) | Intergenic | 0.595 | 2.67x10^-4 | 1.07 (1.03-1.11) | 0.8620 | 1.00 (0.96-1.05) | 3.72x10^-9 | 1.16 (1.10-1.21)
rs10131032 (14q12) | AKAP6 | 0.916 | 2.33x10^-6 | 1.17 (1.09-1.24) | 0.2470 | 1.05 (0.97-1.13) | 5.07x10^-11 | 1.33 (1.22-1.44)
rs1801591 (15q24.2) | ETFA | 0.088 | 3.56x10^-7 | 1.17 (1.10-1.24) | 0.3430 | 1.04 (0.96-1.12) | 6.36x10^-13 | 1.33 (1.23-1.44)
rs2562152 (16p13.3) | RHBDF1 | 0.850 | 1.18x10^-3 | 1.09 (1.04-1.15) | 1.93x10^-8 | 1.21 (1.13-1.29) | 0.9480 | 1.00 (0.93-1.07)
rs3751667 (16p13.3) | RHBDF1 | 0.208 | 8.75x10^-10 | 1.14 (1.09-1.19) | 5.95x10^-6 | 1.13 (1.07-1.19) | 2.61x10^-9 | 1.18 (1.12-1.25)
rs10852606 (16q12.1) | HEATR3 | 0.713 | 3.66x10^-11 | 1.14 (1.10-1.19) | 1.29x10^-11 | 1.18 (1.13-1.24) | 2.42x10^-3 | 1.08 (1.03-1.14)
rs78378222 (17p13.1) | TP53 | 0.013 | 8.64x10^-38 | 2.53 (2.19-2.91) | 4.82x10^-29 | 2.63 (2.22-3.11) | 4.70x10^-27 | 2.73 (2.27-3.28)
rs6010620 (20q13.33) | RTEL1 | 0.794 | 2.81x10^-40 | 1.34 (1.29-1.40) | 5.49x10^-44 | 1.46 (1.38-1.54) | 2.60x10^-9 | 1.19 (1.12-1.25)
rs2235573 (22q13.1) | SLC16A8 | 0.507 | 8.64x10^-7 | 1.09 (1.06-1.13) | 1.76x10^-10 | 1.15 (1.10-1.20) | 0.3250 | 1.02 (0.97-1.07)

95%CI: 95% Confidence interval, SNP: Single nucleotide polymorphism, RAF: Risk allele frequency

Figure 1-4. Manhattan plot of -log(p) values for all glioma (~8,200 cases and ~14,400 controls)


Figure 1-5 Manhattan plot of -log(p) values for glioblastoma (~4,600 cases and ~14,400 controls)

Figure 1-6 Manhattan plot of -log(p) values for non-glioblastoma glioma (~ 3,100 cases and ~14,400 controls)

Pathway approaches to germline SNP data

Each individual GWAS results in regression estimates for hundreds of thousands of

SNPs, only several hundred of which may be prioritized for further investigation. While

this process is appropriate for identifying individual loci that contribute to the

development of disease, there is likely additional information about disease risk within

results that do not meet thresholds for statistical significance. Multi-SNP methods, such


as gene or pathway-based approaches, can leverage this data for additional discovery in a

manner that complements single-SNP approaches [107]. There are multiple methods for

combining summary statistics to generate multi-marker tests. The most straight-forward

of these are methods that combine all SNPs within a defined gene together and generate a

gene score. These scores can also then be further collapsed into biological pathways and

a pathway-score calculated based on the included gene-scores.

There is no consensus on the best approach for calculating gene- and pathway-based scores using summary statistics generated by GWAS, and numerous methods have been developed [108-116]. These approaches are widely used in analyses of other types of biological data, most notably gene expression data, and some of these have been

modified for use with GWAS data [117]. The algorithms used and assumptions made by

each of these approaches vary, and as a result the appropriate program to use may vary.

For the purposes of the analyses contained within this dissertation, three approaches were

used for generating gene scores: Pascal, BimBam, and GATES. These were chosen based

on computational efficiency, as well as the ability of these programs to share reference

datasets.

Pascal (Pathway scoring algorithm) [109] calculates gene scores using maximum-of-chi-

squares (MOCS) and sum-of-chi-squares (SOCS) correcting for LD structure (based on a

reference set). The SOCS method implemented in Pascal is a variation of the VEGAS

(Versatile gene-based association study) [112, 113] scoring algorithm, as follows:


Equation 1-1

$$T_{sum} = \sum_{i=1}^{n} z_i^2$$

where n is the number of SNPs in a gene, and $z_i$ is the z-score for the i-th SNP. Under the null hypothesis, the z-scores of the n SNPs will be distributed as multivariate normal. $T_{sum}$ is then distributed according to the weighted sum of $\chi^2_1$ distributed random variables, as follows:

Equation 1-2

$$T_{sum} \sim \sum_{i} \lambda_i \chi^2_1$$

where $\lambda_i$ is the i-th eigenvalue of Σ, the matrix of pairwise correlations (LD) between the SNPs in the gene. Unlike the other algorithms discussed in this chapter,

Pascal does not assume genes within a pathway are independent. For genes that are in LD on a chromosome, these are considered to be a ‘fusion gene’ and have only one gene score calculated. The pathway score is then calculated using both independent and fusion genes. A parameter-free enrichment strategy is used to calculate pathway scores using either a chi-squared method (gene score p-values are ranked and transformed to a uniform distribution, these values are then transformed by a chi-square quantile function, and summed) or an empirical sampling method (gene scores are transformed with a chi-square quantile function and summed, then Monte Carlo estimates of the p-values are obtained by sampling random sets of the same size).
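The sum-of-chi-squares gene score described above can be sketched in a few lines. This is an illustrative approximation and not the Pascal implementation: the null distribution of the statistic is estimated here by simple Monte Carlo sampling of correlated z-scores (which has the same distribution as the eigenvalue-weighted sum of chi-square variables), and the function names, z-scores, and LD matrix are hypothetical. The sketch also assumes the LD matrix is positive definite, so SNPs in perfect LD would need to be pruned first.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def socs_pvalue(z_scores, ld_matrix, n_sim=20000, seed=1):
    """Return (T_sum, empirical p-value) for one gene.

    z_scores:  per-SNP association z-scores within the gene
    ld_matrix: pairwise SNP correlation matrix from a reference panel
    """
    rng = random.Random(seed)
    n = len(z_scores)
    t_obs = sum(z * z for z in z_scores)
    factor = cholesky(ld_matrix)

    exceed = 0
    for _ in range(n_sim):
        # Draw independent normals, then induce the LD correlation structure.
        indep = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t_null = 0.0
        for i in range(n):
            z_i = sum(factor[i][k] * indep[k] for k in range(i + 1))
            t_null += z_i * z_i
        if t_null >= t_obs:
            exceed += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return t_obs, (exceed + 1) / (n_sim + 1)


# Hypothetical gene with three SNPs in moderate LD.
z = [2.1, 1.8, 2.5]
ld = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]]
t_sum, p_gene = socs_pvalue(z, ld)
print(f"T_sum={t_sum:.2f}, empirical gene p-value={p_gene:.4f}")
```

Ignoring LD and comparing the statistic to a plain chi-square with n degrees of freedom would overstate significance when SNPs are positively correlated, which is why the correlation structure enters the null distribution.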

BimBam [110] (as implemented in FAST using summary statistics [118]) is a Bayesian regression approach. This method calculates an average Bayes Factor for all K possible models within a gene, where K is the number of SNPs, as follows:

Equation 1-3

$$\text{Bayes Factor} = N^{0.5}\,|\Omega|^{0.5}\,\sigma_a^{-1}\left(\frac{Y'Y - B'\Omega^{-1}B}{Y'Y - N\bar{Y}^{2}}\right)^{-N/2}$$

where the number of individuals is N, Y is the phenotype column vector, $\bar{Y}$ is the scalar phenotype mean; the matrix τ is diagonal with terms $(0, \sigma_a^{-2})$, where $\sigma_a$ is an adjustable parameter representing the typical additive variance for a SNP; $\Omega = (\tau + X'X)^{-1}$, with X the design matrix; and $B = \Omega X'Y$ is a 2-component column vector of regression coefficients. The model then uses a Laplace method to estimate posterior distributions of the model’s parameters, and distribution modes are obtained using the Fletcher-Reeves conjugate gradient algorithm.

P-values are then generated by permutation.

GATES (Gene-based association test that uses extended Simes procedure) [111]

(implemented in FAST [118]) is an extension of the Simes procedure [119] that combines

SNP-based p-values, using the p-value correlation matrix to estimate the number of

independent SNPs within the gene. The test statistic is estimated as follows, where $m_e$ is the effective number of independent tests within a gene, $p_{(j)}$ is the j-th smallest SNP p-value, and $m_{e(j)}$ is the effective number of independent p-values within the top j SNPs:

Equation 1-4

$$\min_{j}\left(\frac{m_e\, p_{(j)}}{m_{e(j)}}\right)$$

Under the null hypothesis, the resulting gene-based p-value approximately follows a uniform (0, 1) distribution, and serves as the representative p-value for the gene.
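The extended Simes combination above can be sketched as follows. This is a simplified illustration rather than the GATES or FAST implementation: the estimation of the effective number of independent tests from the p-value correlation matrix is left to a caller-supplied function (here the hypothetical `effective_tests` argument), and the two callbacks in the example are limiting cases chosen only to show the behavior of the formula.

```python
def gates_pvalue(p_values, effective_tests):
    """Combine SNP p-values within a gene using an extended Simes procedure.

    p_values:        list of per-SNP p-values for the gene
    effective_tests: hypothetical callback taking a list of SNP indices and
                     returning the effective number of independent tests
                     among those SNPs (at least 1)
    """
    m = len(p_values)
    # SNP indices ordered from most to least significant.
    order = sorted(range(m), key=lambda i: p_values[i])
    m_e = effective_tests(order)

    candidates = []
    for j in range(1, m + 1):
        top_j = order[:j]
        p_j = p_values[order[j - 1]]  # j-th smallest p-value
        candidates.append(m_e * p_j / effective_tests(top_j))
    return min(candidates)


p = [0.01, 0.04, 0.03]

# Limiting case 1: all SNPs independent, so the effective count equals the
# number of SNPs and the formula reduces to the ordinary Simes test.
independent = gates_pvalue(p, len)

# Limiting case 2: all SNPs in perfect LD, so there is effectively one test
# and the gene p-value equals the smallest SNP p-value.
fully_correlated = gates_pvalue(p, lambda indices: 1)

print(independent, fully_correlated)
```

Real genes fall between these two extremes, which is why the effective number of tests must be estimated from the correlation among SNPs in a reference dataset.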


Guiding discovery by focusing on disease disparities

Applying genome-wide association study methods to specific populations with varying

incidence of disease can increase power for detection of variants that vary in effect size

between demographic groups. Conducting research in these populations that vary in

incidence also provides an opportunity to discover variants that may vary in effect size

within populations and not rise to the level of genome-wide statistical significance as a result. Focusing on population segments with varying incidence of glioma and glioma subtypes provides opportunities to increase power for detection, as well as elucidate risk loci that may have varying effect sizes between groups.

Expanding genomics research to include results that are specifically relevant to women and non-European populations is critical to equity in personalized health care [120-123].

Most GWAS of complex disease have not investigated sex-specific effects, though it is widely known that there are differences in incidence and outcomes by sex in most complex diseases [124, 125]. There are also significant differences in incidence and outcomes of most complex diseases by ancestry, but persons of non-European ancestry have been largely systematically excluded from GWAS [121, 123]. Previous glioma

GWAS have not assessed sex-specific effects, or scanned for variants on the sex chromosomes. Variant effect on age at onset has been examined for a small number of previous GWAS hits, but these scans have targeted specific age groups that have varying incidence of specific histologic types.


1.4.1 Sex-specific analyses in genome-wide association studies

Historically, sex differences have often been underreported in scientific research.

Biomedical research has recently come under criticism for disproportionately researching

non-sex-specific conditions in males, which has resulted in limited study of sex-specific

factors potentially implicated in disease processes [126]. Recent studies have

demonstrated in multiple species that sex differences can be the result of not only sex-

hormone levels, but also genetic variation [127]. Nearly all diseases have sex specific

differences in incidence and prevalence. These differences can result from sex-specific variation in exposure to environmental hazards, the biological result of sex hormone exposures, immunological differences between males and females, or sex-specific genetic variation [127, 128]. While there is no sex difference in the autosomal DNA sequence between males and females, there is evidence that sex-specific regulatory variation contributes to biological sex difference [129-131] and that sex-bias in gene expression may vary by tissue type [132, 133].

Although significant sex differences in incidence exist in many cancers and other complex diseases, these differences are not often the focus of genetic association studies [125, 127, 134, 135]. Though the sex chromosomes are critical to many sexually dimorphic traits, these chromosomes are often excluded from GWAS [136]. While previous generations of genotyping arrays have often had poor coverage on the sex chromosomes, modern arrays now include thousands of SNPs on the X chromosome. Analysis of the sex chromosomes requires different methods than those for the autosomes, as these data cannot be imputed using the same methods or analyzed using the same models. As a result there are significantly fewer associations identified by all previous GWAS for the

X chromosome as compared to autosomal chromosomes that contain less genetic material

[137].

Sex-specific analyses have the potential to reveal genetic sources of sexual dimorphism

in risk, as well as to increase power in the case of sex-specific loci [124, 125]. There are three primary approaches to incorporating sex into GWAS: sex-stratified analyses, case-only analyses (using sex as the response variable), and incorporation of sex as an interaction term (e.g. a gene by environment [GxE] approach). Use of GxE models often requires that the effect of sex on the association reach a stringent p-value to correct for multiple testing, and as a result these methods may miss sex effects that are not very large [135]. Sex-stratified analyses decrease sample size and therefore power, but these analyses have successfully identified sex-specific associations that may not have been identified using interaction models [138, 139].
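Once sex-stratified estimates are available, one common way to ask whether an effect differs between males and females is a heterogeneity z-test on the two log odds ratios. The sketch below illustrates that comparison under the assumption that the male and female samples are independent; the test is a general technique rather than a method specified in the text above, and the function name and example estimates are hypothetical.

```python
import math


def sex_heterogeneity(or_male, ci_male, or_female, ci_female, z_crit=1.96):
    """Test whether male- and female-specific odds ratios differ.

    or_male, or_female: sex-stratified odds ratios for the same SNP
    ci_male, ci_female: (lower, upper) 95% confidence intervals
    Returns (z, two_sided_p). Assumes the two strata are independent samples.
    """

    def log_or_and_se(odds_ratio, ci):
        beta = math.log(odds_ratio)
        se = (math.log(ci[1]) - math.log(ci[0])) / (2 * z_crit)
        return beta, se

    beta_m, se_m = log_or_and_se(or_male, ci_male)
    beta_f, se_f = log_or_and_se(or_female, ci_female)

    z = (beta_m - beta_f) / math.sqrt(se_m**2 + se_f**2)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical stratified estimates: an effect present in males only.
z, p = sex_heterogeneity(1.50, (1.30, 1.73), 1.00, (0.87, 1.15))
print(f"z={z:.2f}, heterogeneity p={p:.2e}")
```

Because this comparison is made per SNP, its p-values are subject to the same multiple testing considerations as the interaction models discussed above.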

1.4.2 Age-at-onset in genome-wide association studies

Cancer is a disease caused by a dysregulated and unstable genome, with damage that is often (at least in part) accumulated over the lifespan. Cancer, along with many other chronic complex diseases, can to some extent be thought of as a disease of aging [140, 141]. A previous meta-analysis of GWAS in age-associated conditions found shared risk loci for many of these conditions located on 1p31.3 (five diseases), 5p15.33 (five diseases) and 9p21.3 (ten diseases), all of which have been previously associated with glioma [142].

Analyses in other complex diseases have found that the effect size and direction of effect of variants detected via GWAS can vary based on age [143]. In general, the odds ratios associated with individual germline genetic variants are higher in early onset traits,


including childhood cancers [144, 145]. GWAS have also been able to identify specific

genetic variants associated with early age at onset in prostate and breast cancer [146,

147].

Previous GWAS in glioma have demonstrated that the effect of risk variants varies significantly between GBM and lower grade gliomas [4, 7-9, 105, 148, 149]. Previous

analyses have adjusted models for the effect of age at diagnosis, but have not examined

the data for age-specific effects of risk loci. Variants in telomere-associated genes in

particular (especially TERT) have strong age-associated effects [102, 148]. Other identified glioma loci are in telomere-related genes (10q24.33, 20q13.33), and a previous analysis has associated variants in these genes with glioma risk, as well as with age at diagnosis for glioma [148, 150].

Study population

1.5.1 Study cohorts

This study was approved locally by the institutional review board (IRB) at University

Hospitals Cleveland Medical Center and by each participating study site's IRB. In this

study, data were combined from four prior glioma GWAS: Glioma International Case-

Control Study (GICC), San Francisco Adult Glioma Study GWAS (SFAGS-GWAS),

MD Anderson Glioma GWAS (MDA-GWAS), and National Cancer Institute’s

GliomaScan [4, 7-9, 106]. The SFAGS-GWAS includes controls from the Illumina

iControls dataset, and MDA-GWAS includes controls from Cancer Genetic Markers of

Susceptibility (CGEMS) breast and prostate studies [151-153]. Details of data collection

and classification are available in previous publications [4, 7-9, 106]. Details of sample

size, inclusion and exclusion criteria for each aim are included in subsequent chapters

(Chapter 2, Section 2.3.1 and Chapter 4, Section 4.3.1).

1.5.2 Genotyping, quality control and imputation

GICC cases and controls were genotyped on the Illumina Oncoarray [154], which

included 37,000 beadchips customized to include previously-identified glioma-specific candidate SNPs. SFAGS-GWAS cases and some controls were genotyped on Illumina’s

HumanCNV370-Duo BeadChip, and the remaining controls were genotyped on the

Illumina HumanHap300 and HumanHap550. MDA-GWAS cases were genotyped on the

Illumina HumanHap610 and controls were genotyped using the Illumina HumanHap550

(CGEMS breast [151, 153]) or HumanHap300 (CGEMS prostate [152]). GliomaScan

cases were genotyped on the Illumina 660W, while controls were selected from cohort

studies and were genotyped on Illumina 370D, 550K, 610Q, or 660W (See Rajaraman et

al. for specific details of genotyping [7]). Details of DNA collection and processing are available in previous publications [4, 7-9].

All datasets have previously undergone standard GWAS QC procedures and have been filtered based on call rate (individual and per SNP), sex mismatch between reported and genotyped sex, violation of Hardy-Weinberg equilibrium, and MAF<0.01. GICC recruited individuals from MD Anderson Cancer Center and University of California, San

Francisco (UCSF) (recruiting sites for MDA-GWAS and SFAGS-GWAS, respectively), so all datasets were merged and cryptic relatedness was assessed to remove related individuals as well as duplicates. All datasets were then imputed using SHAPEIT [155] and IMPUTE [156] using a merged reference panel consisting of data from the 1,000


genomes project (phase 3) [81] and the UK10K [82, 83]. Imputation was conducted

separately for each study. For SFAGS-GWAS and MDA-GWAS, where cases and controls

were genotyped on different platforms, the data were pruned to a common set of SNPs

between cases and controls before imputation. Genotypes were aligned to the positive

strand in both imputation and genotyping. For X chromosome imputation, males and

females were imputed separately.

Global European ancestry was estimated using FastPop [157] developed by the

GAMEON consortium, which utilizes a principal components-based method that uses the

three HapMap continental ancestry populations (AFR, EUR, and EAS) as references.

All individuals with a call rate (CR) <99% were excluded, as well as all individuals who were of non-European ancestry (<80% estimated European ancestry using the FastPop [157] procedure developed by the GAMEON consortium). For each apparent first-degree relative pair (identified using estimated identity by descent [IBD]≥0.5), one individual was removed: the control was removed from a case-control pair; otherwise, the individual with the lower call rate was excluded. SNPs with a call rate <95% were excluded as were

those with a minor allele frequency (MAF)<0.01, or displaying significant deviation from

Hardy-Weinberg equilibrium (HWE) (p<1x10^-5). Additional details of quality control procedures have been previously described in Melin, et al. [4]. All datasets were imputed separately using SHAPEIT and IMPUTE using a merged reference panel consisting of data from the 1,000 genomes project and the UK10K [81-83, 155, 156].
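The sample-level and SNP-level exclusion criteria above can be sketched in Python as follows. This is a minimal illustration and not the pipeline used for these datasets: the function names are hypothetical, the inputs are assumed to be precomputed summary statistics, and the 1-df chi-square test of HWE is an assumption (the text does not state which HWE test was applied).

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value from a 1-df chi-square goodness-of-fit test of HWE.

    Assumption: the HWE test actually used is not stated in the text; the
    chi-square test is shown only for illustration.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(call_rate, n_aa, n_ab, n_bb):
    """Keep SNPs with call rate >=95%, MAF >=0.01 and HWE p >=1x10^-5."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    maf = min(freq_a, 1.0 - freq_a)
    return call_rate >= 0.95 and maf >= 0.01 and hwe_chi2_p(n_aa, n_ab, n_bb) >= 1e-5


def sample_passes_qc(call_rate, european_ancestry):
    """Keep individuals with call rate >=99% and >=80% estimated European ancestry."""
    return call_rate >= 0.99 and european_ancestry >= 0.80


# Hypothetical inputs, not thesis data.
print(snp_passes_qc(0.98, 250, 500, 250))
print(sample_passes_qc(0.995, 0.92))
```

Relatedness filtering is omitted from the sketch because it depends on pairwise IBD estimates and on case-control status.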


Specific Aims

Problem statement: Glioma is the most commonly occurring malignant brain tumor in

the US, with an average annual age-adjusted incidence of 6.0 per 100,000 from 2009-

2013, though incidence varies significantly by sex, race, and age [1]. Though these

tumors are rare, they cause relatively significant morbidity and mortality [1, 158]. There

are no well validated risk factors for these tumors that explain a large proportion of cases,

and the vast majority of cases are sporadic [2, 3]. To date, glioma GWAS have found 25

validated risk loci in European ancestry populations and have not specifically

investigated the potential genetic sources of risk variation by sex or age [4, 7-9, 106].

Estimates of the heritability of these tumors are ~25% (with previously discovered risk loci for these tumors accounting for approximately 30% of heritable risk), suggesting that there are both undiscovered behavioral and/or environmental risk factors, and undiscovered sources of genetic risk [4, 5]. The overall goal of this project is to utilize these age- and sex-specific incidence differences to increase power for detection of variants that may have varying effects by age and sex. The overall hypothesis is that population variation in risk and incidence is to some extent the result of genetic variation between persons diagnosed with glioma by age and sex. This project leverages four

previously conducted glioma GWAS datasets: GICC, MDA-GWAS, SFAGS-GWAS,

and GliomaScan for a combined sample of 8,037 European ancestry glioma cases and

10,686 European ancestry controls.

The primary aims of this project are as follows:

Aim 1a: To identify sex-specific risk loci for glioma (Chapter 2).


Previous analyses have examined the impact of estrogen exposure as a potential environmental risk factor for these tumors, but have found mixed results with low effect sizes [53, 54, 65, 159]. This suggests that there may be differences in male and female genetic architecture of the glioma subtypes more common in adulthood, including GBM and grade II-III lower grade gliomas. The hypothesis for this aim is that glioma etiology varies by sex, and the effect sizes of some risk variants vary between males and females.

Aim 1b: To identify genes and pathways associated with glioma, and identify genes and pathways that may differ by sex (Chapter 3).

No single variant has been found that explains a large proportion of glioma cases, and genetic risk is likely polygenic. It is likely that the risk loci with the largest effect sizes and highest statistical significance have already been identified, so further increases in sample size are an inefficient route to discovery and are unlikely to identify new single loci with a significant effect on glioma risk. Single-SNP tests may not be appropriate for additional loci discovery. Each individual GWAS results in regression estimates for hundreds of thousands of single nucleotide polymorphisms (SNPs), only several hundred of which may be prioritized for further investigation. Multi-SNP methods, such as gene or pathway-based approaches, can allow for additional discovery in a manner that complements single-SNP approaches, while substantially reducing the multiple testing burden associated with GWAS. A recent sex-stratified analysis has identified glioma risk loci that differ by sex, and further analyses using gene- and pathway-based approaches may further elucidate sex differences in genetic risk for glioma. The hypothesis for this aim is that genetic architecture of glioma risk varies by sex, and multi-marker tests can both increase power to identify sources of genetic risk


that have lower effect sizes and generate hypotheses about biological mechanisms for this

varying architecture.

Aim 2: To identify variants associated with age at diagnosis in GBM (Chapter 4).

Patterns of overall glioma incidence, as well as histology specific incidences, vary by

age. Many previously discovered genetic risk loci have histology-specific associations [4,

160, 161], but the effect of these risk loci on age at diagnosis has not been systematically

explored. An age-specific analysis could potentially increase power to detect new

variants that may be associated with younger age at diagnosis, and more accurately

quantify the age-specific effect sizes of previously discovered variants [144]. This

analysis aims to explore the effect of genetic risk variants on age at diagnosis in GBM,

the most common type of glioma. The hypothesis for this aim is that individuals

diagnosed with GBM at younger-than-median ages have a greater burden of genetic risk as compared to those that are diagnosed at older ages.

Conducting research in populations that vary in incidence also provides an opportunity to discover variants that may vary in effect size within populations and not rise to the level of genome-wide statistical significance as a result. Focusing on population segments with varying incidence of glioma and glioma subtypes could potentially increase power for detection, as well as elucidate risk loci that may have varying effect sizes between groups. Discovery of variants that differ between demographic groups in the population can contribute to development of population-specific risk calculations, and contribute to the development of overall risk models.


Chapter 2 – Identifying sex-specific risk loci for glioma

Abstract

Incidence of glioma is approximately 50% higher in males. Previous analyses have examined exposures related to sex hormones in women as potential protective factors for these tumors, with inconsistent results. Previous glioma GWAS have not stratified by sex. Sex-specific genetic effects were assessed in autosomal SNPs and sex chromosome variants for all glioma, GBM, and non-GBM patients using data from four previous glioma GWAS. Datasets were analyzed using sex-stratified logistic regression models and combined using meta-analysis. There were 4,831 male cases, 5,216 male controls, 3,206 female cases and 5,470 female controls. A significant association was detected at rs11979158 (7p11.2) in males only. Association at rs55705857 (8q24.21) was stronger in females than in males. A large region on 3p21.31 was identified as having significant association in females only. Sex-specific differences in genetic risk were identified; however, these differences in effect of risk variants do not fully explain the observed incidence difference in glioma by sex.

Background

Glioma is the most common type of primary malignant brain tumor in the US, with an average annual age-adjusted incidence rate of 6.0 per 100,000 population [1]. Population-based studies consistently demonstrate that incidence of gliomas varies significantly by sex. Most glioma histologies occur with a 30-50% higher incidence in males, and this male preponderance of glial tumors increases with age in adult glioma (Figure 2-1) [1].


Several studies have attempted to estimate the influence of lifetime estrogen and progestogen exposure on glioma risk in women [53, 65]. Results of these analyses have been mixed, and it is not possible to conclusively determine the impact of hormone exposure on glioma risk. Male predominance in incidence occurs broadly across multiple cancer types and is also evident in cancers that occur in pre-pubertal children and in post-menopausal adults [158, 162]. Together these observations suggest that other mechanisms in addition to acute sex hormone actions must be identified to account for the magnitude of sex difference in glioma incidence.

Figure 2-1 Average Annual Incidence of all glioma, glioblastoma and lower grade glioma by sex and age at diagnosis (CBTRUS 2010-2014)

Though sex differences exist in glioma incidence, sex differences have not been interrogated in previous glioma GWAS. Sex-specific analyses have the potential to reveal genetic sources of sexual dimorphism in risk, as well as to increase power for detection of loci where effect size or direction may vary by sex [124, 125, 135, 138, 139]. The aim of this analysis is to investigate potential sex-specific sources of genetic risk for glioma that may contribute to observed sex-specific incidence differences.


Methods

2.3.1 Study population.

In this study, data were combined from four prior glioma GWAS: GICC, SFAGS-GWAS,

MDA-GWAS, and GliomaScan (Figure 2-2). Details of data collection and classification are available in Chapter 1, Section 1.5.1.

Figure 2-2. Study Schematic for analyses of autosomal SNPs

2.3.2 Genotyping, quality control and imputation

Details of DNA collection and processing for the four GWAS datasets are available in

Chapter 1, Section 1.5.1. TCGA cases were genotyped on the Affymetrix Genomewide

6.0 array using DNA extracted from whole blood (see previous manuscript for details of

DNA processing [163, 164]), and underwent standard GWAS QC, and duplicate and related individuals within datasets have been excluded [4]. Ancestry outliers were

identified in TCGA using principal components analysis in PLINK 1.9 [165]. Resulting

files were imputed using Eagle 2 and Minimac3 as implemented on the Michigan

Imputation Server (https://imputationserver.sph.umich.edu) using the Haplotype

Reference Consortium Version r1.1 2016 as a reference panel [84, 166, 167]. Somatic

characterization of TCGA cases was obtained from the final dataset used for the TCGA pan-glioma analysis [168], and classification schemes were adopted from Eckel-Passow, et al. [169] and Ceccarelli, et al. [168].

2.3.3 Statistical methods

2.3.3.1 Sex-stratified scan of the autosomal chromosomes

Data were analyzed using sex-stratified logistic regression models in SNPTEST for all

SNPs on autosomal chromosomes within 500kb of previously identified risk loci, and/or those found to be nominally significant (p<5x10^-4) in a previous meta-analysis (Figure

2-2) [4, 170]. Sex-specific betas (βM and βF), standard errors (SEM and SEF), and p-values

(pM and pF) were generated using sex-stratified logistic regression models that were

adjusted for the number of principal components that significantly differed between cases

and controls within each study. Estimate of difference (βD), and standard error of

difference (SED) were estimated separately for each dataset, using the sex-specific effect

estimates (Equation 2-1) and standard errors (Equation 2-2) from the sex-stratified

models as follows:

Equation 2-1

\[ \beta_D = \beta_M - \beta_F \]

Equation 2-2

\[ SE_D = \sqrt{SE_M^2 + SE_F^2} \]

The difference between the groups was then tested using a z test [171, 172]. Sex-stratified results and difference estimates from the four studies were separately combined via

inverse-variance weighted fixed effects meta-analysis in META [173]. See Figure 2-2

for schematic of autosomal analysis methods. Effects were considered statistically

significant at p<2.8x10^-6 (Bonferroni correction for 18,000 tests).
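The sex-difference test and the pooling step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SNPTEST or META implementation: the function names are hypothetical, the z test is assumed to be two-sided, and the per-study values are invented for the example.

```python
import math


def sex_difference_test(beta_m, se_m, beta_f, se_f):
    """Difference of sex-specific log odds ratios (Equations 2-1 and 2-2)
    followed by a two-sided z test."""
    beta_d = beta_m - beta_f
    se_d = math.sqrt(se_m ** 2 + se_f ** 2)
    z = beta_d / se_d
    p_d = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta_d, se_d, p_d


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects pooled estimate, SE and p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    return pooled, pooled_se, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-study log odds ratios and standard errors (not thesis data).
studies = [
    {"beta_m": 0.30, "se_m": 0.08, "beta_f": 0.10, "se_f": 0.09},
    {"beta_m": 0.25, "se_m": 0.12, "beta_f": 0.15, "se_f": 0.14},
    {"beta_m": 0.20, "se_m": 0.10, "beta_f": 0.02, "se_f": 0.11},
]

# Differences are computed within each study, then pooled across studies.
diffs = [sex_difference_test(**s) for s in studies]
pooled_d, pooled_se_d, pooled_p_d = fixed_effects_meta(
    [d[0] for d in diffs], [d[1] for d in diffs]
)
print(f"pooled beta_D={pooled_d:.3f}, SE_D={pooled_se_d:.3f}, p_D={pooled_p_d:.3g}")
```

The same pooling function can be applied to the male-specific and female-specific estimates to obtain the sex-stratified meta-analysis results.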

2.3.3.2 Case-only analysis for significant SNPs associated with sex.

In order to identify individual SNPs that may vary by sex among cases, a case-only agnostic scan was conducted of variants that were nominally significant in the previously conducted overall meta-analysis (p<5x10^-4). Logistic regression assuming an additive genetic model was used to estimate beta, standard error, and p-values [170], with sex as a binary outcome (males coded as 0 and females coded as 1). Analyses were performed separately for each dataset and then combined via inverse-variance weighted meta-analysis.
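A case-only test of this kind can be sketched as a single-SNP logistic regression of sex on allele dosage. The code below is a minimal pure-Python illustration fitted by Newton-Raphson; it is not the SNPTEST implementation, the function names are hypothetical, covariates are omitted, and the example dosages are invented.

```python
import math


def _score_and_information(b0, b1, xs, ys):
    """Score vector and information matrix entries for logit(P) = b0 + b1 * x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def case_only_logistic(dosages, is_female, n_iter=25):
    """Additive-model case-only test: sex (0 = male, 1 = female) regressed on dosage.

    Returns the per-allele log odds ratio, its standard error and a two-sided
    Wald p-value.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, dosages, is_female)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson update using the inverse of the 2x2 information matrix.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_information(b0, b1, dosages, is_female)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))
    return b1, se_b1, p_value


# Hypothetical cases only: dosages for 55 males followed by 50 females.
dosages = [0] * 30 + [1] * 20 + [2] * 5 + [0] * 20 + [1] * 20 + [2] * 10
is_female = [0] * 55 + [1] * 50
beta, se, p = case_only_logistic(dosages, is_female)
print(f"OR={math.exp(beta):.2f}, SE={se:.3f}, p={p:.3g}")
```

With this coding, an odds ratio above 1 indicates that the counted allele is more frequent among female cases than among male cases.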

2.3.3.3 Perform agnostic scan of the sex chromosome for glioma risk variants.

X and Y chromosome data were available from the GICC set only. Males and females were separately imputed for the X chromosome using the previously described merged reference panel. X chromosomes were analyzed using a logistic regression model in the SNPTEST module ‘newml’ assuming complete inactivation of one allele in females, with males treated as homozygous females (Figure 2-3). The ‘newml’ method controls for genotype uncertainty using multiple Newton-Raphson iterations to assess parameters for missing data likelihood.
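The genotype coding implied by this model can be illustrated with a short sketch. The function below is hypothetical and only shows the coding scheme (it is not SNPTEST code): females are coded 0, 1 or 2 by allele count, and hemizygous males are coded 0 or 2 so that they are treated as homozygous females.

```python
def x_dosage(alt_allele_count, sex):
    """Code an X-chromosome genotype on the 0-2 scale under complete X inactivation.

    sex is "F" or "M". Females carry 0, 1 or 2 copies of the counted allele;
    males carry 0 or 1 copy and are coded as 0 or 2.
    """
    if sex == "F":
        if alt_allele_count not in (0, 1, 2):
            raise ValueError("female allele count must be 0, 1 or 2")
        return float(alt_allele_count)
    if sex == "M":
        if alt_allele_count not in (0, 1):
            raise ValueError("male allele count must be 0 or 1")
        return 2.0 * alt_allele_count
    raise ValueError("sex must be 'F' or 'M'")


# Hypothetical genotypes: (allele count, sex).
for count, sex in [(0, "F"), (1, "F"), (2, "F"), (0, "M"), (1, "M")]:
    print(sex, count, x_dosage(count, sex))
```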


Figure 2-3. Study Schematic for analyses of sex chromosome SNPs

2.3.3.4 Analysis of TCGA germline and somatic data

Only newly diagnosed cases from TCGA GBM and LGG with no neo-adjuvant treatment

or prior cancer were used. Demographic characteristics, molecular classification and

somatic alterations data were obtained from Ceccarelli, et al. [168]. Chi-square tests were used to compare the frequency of somatic alterations between age groups. SNPs found to be nominally significant (p<5x10^-4) in a previous eight-study meta-analysis [4], with imputation quality ≥ 0.7 were identified within the TCGA genotype data and D’ and r^2

values in CEU were used to select proxy SNPs [174]. Using these SNPs, a case-only

analysis using sex as a binary phenotype was conducted using logistic regression in

SNPTEST assuming an additive model to estimate beta, standard error, and p-values

[170]. Results were considered significant at p<0.003 (Bonferroni correction for 15 tests).
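The Bonferroni thresholds quoted in this chapter follow from dividing a family-wise error rate by the number of tests. The sketch below assumes a family-wise alpha of 0.05, which is not stated explicitly in the text but reproduces the reported thresholds; the function name is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha=0.05 is an assumed family-wise rate."""
    return alpha / n_tests


# 15 tests in the TCGA case-only analysis.
print(f"{bonferroni_threshold(15):.4f}")      # 0.0033
# 6,000 tests in each of three histologies (18,000 tests) in the autosomal scan.
print(f"{bonferroni_threshold(18000):.2e}")   # 2.78e-06
```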

2.3.3.5 Calculation of unweighted genetic risk scores

In order to estimate the cumulative effects of significant variants by sex, histology-

specific unweighted risk scores were calculated using the SNPs found to be significantly


associated with each outcome. Data from all four studies were merged, and any imputed genotypes with INFO≥0.8 were converted to hard calls. An overall unweighted risk score (URS) was generated using the sum of risk alleles at rs12752552, rs9841110, rs10069690, rs11979158, rs55705857, rs634537, rs12803321, rs3751667, rs78378222, and rs2297440. As risk alleles are known to have histology-specific associations [4], histology-specific scores were generated for GBM and non-GBM using only the SNPs found to have a significant association with each histology. The GBM-specific URS (URS-G) was calculated by summing the number of risk alleles at rs9841110, rs10069690, rs11979158, rs634537, rs78378222, and rs2297440. The non-GBM-specific URS (URS-N) was calculated by summing the number of risk alleles at rs10069690, rs55705857, rs634537, rs12803321, rs78378222, and rs2297440. Each score was calculated by summing all risk alleles for each individual. Differences in median scores were tested between groups using Wilcoxon rank sum tests. Scores were compared against the median score for each set (URS: 10 alleles, URS-G: six alleles, URS-N: four alleles). Odds ratios and 95% confidence intervals were estimated for each level of the score using sex-stratified logistic regression adjusted for age at diagnosis (for controls where only an age range was available, the mean value of the range was used), where each score was compared to the median score within the entire population as described in Shete et al. [8].
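The score construction described above amounts to summing hard-called risk allele counts over a fixed SNP list. The following sketch uses the SNP lists given in the text; the function name, the dictionary layout and the three example individuals are hypothetical.

```python
from statistics import median

URS_SNPS = ["rs12752552", "rs9841110", "rs10069690", "rs11979158", "rs55705857",
            "rs634537", "rs12803321", "rs3751667", "rs78378222", "rs2297440"]
URS_G_SNPS = ["rs9841110", "rs10069690", "rs11979158", "rs634537",
              "rs78378222", "rs2297440"]
URS_N_SNPS = ["rs10069690", "rs55705857", "rs634537", "rs12803321",
              "rs78378222", "rs2297440"]


def unweighted_risk_score(risk_allele_counts, snps):
    """Sum hard-called risk allele counts (0, 1 or 2 per SNP) over a SNP list."""
    return sum(risk_allele_counts[snp] for snp in snps)


# Hypothetical individuals carrying 0, 1 or 2 risk alleles at every SNP.
people = {
    "id1": {snp: 0 for snp in URS_SNPS},
    "id2": {snp: 1 for snp in URS_SNPS},
    "id3": {snp: 2 for snp in URS_SNPS},
}

scores = {pid: unweighted_risk_score(g, URS_SNPS) for pid, g in people.items()}
print(scores, "median:", median(scores.values()))
```

Because every allele receives the same weight, the score depends only on the number of risk alleles carried and not on the effect size of each SNP.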

2.3.3.6 Calculation of trait variance explained by SNPs with sex-specific effects

In order to determine whether the identified SNPs with sex-specific effects more accurately estimate the odds of glioma than sex alone, logistic regression models were used to estimate


odds of all glioma, GBM, and non-GBM glioma based on sex using the GICC data only.

Proportion of variance in odds of glioma explained by sex-specific SNPs was calculated using R^2 estimated using the log likelihood of the null model (sex, age at diagnosis, and the first two principal components only) and the full model (including identified SNPs, rs9841110, rs11979158, rs55705857) [175], calculated as follows:

Equation 2-3

\[ R^2 = 1 - \frac{\log(L_{full})}{\log(L_{null})} \]

Proportion of variance explained was also calculated separately by sex for each histology

(null model adjusted for age at diagnosis, and the first two principal components only).
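Equation 2-3 can be evaluated directly from the fitted log likelihoods of the two models. The sketch below assumes that both log likelihoods come from models fitted to the same individuals; the function name and the example values are hypothetical.

```python
def likelihood_r2(loglik_full, loglik_null):
    """Likelihood-based R^2: 1 - log(L_full) / log(L_null).

    Both arguments are log likelihoods (negative numbers for logistic models).
    """
    return 1.0 - loglik_full / loglik_null


# Hypothetical log likelihoods, not thesis results.
print(round(likelihood_r2(-980.0, -1000.0), 3))  # 0.02
```

A value near zero means that adding the SNPs to the null model improves the log likelihood very little.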

Results

There were 4,831 male cases, 5,176 male controls, 3,206 female cases, and 5,410 female

controls (Table 2-1). A slightly larger proportion of male cases were GBM (58.7% of

male cases vs 52.5% of female cases). Controls were slightly older than cases. GBM

cases had a higher mean age than non-GBM cases, which is consistent with known

incidence patterns of these tumors. Male and female cases within histology groups had

similar age at diagnosis. The proportion of non-GBM cases varied by study due to

differing recruitment patterns and study objectives (see original publications for details of

recruitment patterns and inclusion criteria [4, 7-9, 106]).


Table 2-1. Population characteristics by study and sex

Study | Male cases | Male controls | Female cases | Female controls

N
Total | 4,831 | 5,176 | 3,206 | 5,410
GICC^a | 2,733 | 1,868 | 1,831 | 1,397
SFAGS-GWAS^b | 440 | 749 | 237 | 1,611
MDA-GWAS^c | 714 | 1,094 | 429 | 1,142
GliomaScan^d | 944 | 1,465 | 709 | 1,260

Mean Age (SD)
Total | 52.5 (14.5) | 58.2 (15.2) | 51.8 (14.9) | 54.7 (14.5)
GICC | 52.5 (14.3) | 56.1 (13.4) | 51.3 (14.6) | 53.4 (14.3)
SFAGS-GWAS | 53.8 (13.0) | 50.6 (14.8) | 53.5 (14.0) | 49.3 (13.2)
MDA-GWAS | 47.1 (13.0) | Modal age group: 60-69^e | 47.7 (13.9) | Modal age group: 65-69^f
GliomaScan | 56.0 (15.5) | 69.3 (12.7) | 55.1 (15.7) | 64.0 (15.4)

GBM (% of total)^g
Total | 2,835 (58.7%) | -- | 1,682 (52.5%) | --
GICC | 1,575 (57.6%) | -- | 885 (48.3%) | --
SFAGS-GWAS | 333 (75.7%) | -- | 178 (75.1%) | --
MDA-GWAS | 397 (55.6%) | -- | 246 (57.3%) | --
GliomaScan | 530 (56.1%) | -- | 373 (52.6%) | --

GBM - Mean Age (SD)
Total | 57.3 (12.0) | -- | 57.8 (12.1) | --
GICC | 57.7 (11.4) | -- | 57.8 (11.6) | --
SFAGS-GWAS | 56.4 (11.5) | -- | 56.2 (12.3) | --
MDA-GWAS | 52.0 (11.7) | -- | 53.7 (11.3) | --
GliomaScan | 60.4 (13.0) | -- | 61.4 (12.5) | --

Non-GBM (% of total)^g
Total | 1,716 (35.5%) | -- | 1,320 (41.2%) | --
GICC | 1,036 (37.9%) | -- | 862 (47.1%) | --
SFAGS-GWAS | 107 (24.3%) | -- | 59 (24.9%) | --
MDA-GWAS | 317 (44.4%) | -- | 183 (42.7%) | --
GliomaScan | 256 (27.1%) | -- | 216 (30.5%) | --

Non-GBM - Mean Age (SD)
Total | 44.3 (14.4) | -- | 43.9 (14.3) | --
GICC | 44.7 (14.6) | -- | 44.6 (14.2) | --
SFAGS-GWAS | 45.7 (14.2) | -- | 45.4 (15.8) | --
MDA-GWAS | 41.0 (11.9) | -- | 39.6 (12.9) | --
GliomaScan | 46.3 (15.5) | -- | 44.4 (15.2) | --

Astrocytoma (Non-GBM) (WHO grade II-III) (% of Non-GBM)^g
Total | 787 (45.9%) | -- | 585 (44.3%) | --
GICC | 494 (47.7%) | -- | 400 (46.4%) | --
MDA-GWAS | 155 (48.9%) | -- | 88 (48.1%) | --
GliomaScan | 138 (53.9%) | -- | 97 (44.9%) | --

Astrocytoma (Non-GBM) (WHO grade II-III) - Mean Age (SD)
Total | 44.4 (14.8) | -- | 44.0 (15.0) | --
GICC | 44.6 (15.1) | -- | 43.7 (14.6) | --
MDA-GWAS | 41.0 (12.1) | -- | 40.0 (14.4) | --
GliomaScan | 47.4 (15.5) | -- | 48.6 (16.4) | --

Oligodendroglioma (WHO grade II-III) (% of Non-GBM)^g
Total | 483 (28.1%) | -- | 399 (30.2%) | --
GICC | 310 (29.9%) | -- | 266 (30.9%) | --
MDA-GWAS | 95 (30.0%) | -- | 54 (29.5%) | --
GliomaScan | 78 (30.5%) | -- | 79 (36.6%) | --

Oligodendroglioma (WHO grade II-III) - Mean Age (SD)
Total | 44.6 (13.0) | -- | 44.4 (12.7) | --
GICC | 45.4 (12.9) | -- | 45.9 (12.6) | --
MDA-GWAS | 40.1 (11.3) | -- | 39.0 (10.4) | --
GliomaScan | 47.1 (13.7) | -- | 43.1 (13.2) | --

a. Data from Glioma International Case-Control Study (GICC; Melin, et al. [4]).
b. Data from San Francisco Adult Glioma Study GWAS (SFAGS-GWAS; Wrensch, et al. [9]).
c. Data from MD Anderson Cancer Center GWAS (MDA-GWAS; Shete, et al. [8]).
d. Data from the National Cancer Institute’s GliomaScan (GliomaScan; Rajaraman, et al. [7]).
e. Data from CGEMS prostate study (Yeager et al. [152]). Continuous age is not available; age distribution is as follows: 50-59: 12.3%, 60-69: 56.7%, 70-79: 30.7%, 80-89: 0.3%.
f. Data from CGEMS breast study (Hunter et al. [153]). Continuous age is not available; age distribution is as follows: 0-54: 4.3%, 55-59: 15.0%, 60-64: 23.6%, 65-69: 27.5%, 70-74: 19.0%, 75-99: 10.7%.
g. Histology information not available for all cases and frequencies may not add to 100%.


2.4.1 Previously discovered glioma risk regions

There were 5,934 SNPs within 500kb of 25 previously discovered glioma risk loci with IMPUTE2 information score (INFO)>0.7 and MAF>0.01 that were previously found to have at least a nominal (p<5x10^-4) association with glioma [4], and results were considered significant at the p<2.8x10^-6 level (adjusted for 6,000 tests in each of three histologies [18,000 tests], see Figure 2-2 for schematic of study design). Among the 25 previously validated glioma risk loci, nine loci contained ten SNPs with pM<2.8x10^-6 and/or pF<2.8x10^-6 in any histology: 1p31.3 (RAVER2), 5p15.33 (TERT), 7p11.2 (EGFR, two independent loci), 8q24.21 (intergenic region near MYC), 9p21.3 (CDKN2B-AS1), 11q23.3 (PHLDB1), 16p13.3 (RHBDF1), 17p13.1 (TP53), and 20q13.33 (RTEL1) (Table 2-2). ORM and ORF were similar in the majority of these loci.

For one of two independent loci at 7p11.2 (rs11979158), there was a significant association only in males for all glioma (ORM=1.33 [95%CI=1.23-1.44], pM=4.87x10^-12) and GBM (ORM=1.40 [95%CI=1.28-1.54], pM=1.26x10^-12), but the sex differences did not meet the significance threshold (overall pD=0.0055, and GBM pD=0.1184) (Figure 2-4, Table 2-2, see Table 2-1 for sample sizes).

Figure 2-4 Sex-specific odds ratios overall and by histology grouping, 95% CI and p-values for selected previous GWAS hits and 3p21.31 (rs9841110) for all glioma, GBM, and non-GBM


Table 2-2. Previously identified glioma risk loci and histology-specific odds ratios (OR) and 95% confidence intervals (95% CI), overall and stratified by sex

Histology | Overall P | Overall OR (95% CI) | PM | ORM (95% CI) | PF | ORF (95% CI) | PD

rs12752552 (1p31.3)
All glioma | 2.30x10^-9 | 1.23 (1.15-1.31) | 1.40x10^-6 | 1.25 (1.14-1.37) | 3.22x10^-4 | 1.21 (1.09-1.34) | 0.7280
GBM | 2.85x10^-8 | 1.25 (1.16-1.35) | 3.27x10^-6 | 1.28 (1.15-1.42) | 8.41x10^-4 | 1.24 (1.09-1.41) | 0.7535
Non-GBM | 3.55x10^-4 | 1.18 (1.08-1.30) | 0.0235 | 1.15 (1.02-1.30) | 0.0036 | 1.23 (1.07-1.42) | 0.4252

rs9841110 (3p21.31)
All glioma | 1.20x10^-4 | 1.10 (1.05-1.15) | 0.5885 | 1.02 (0.96-1.08) | 5.55x10^-8 | 1.22 (1.14-1.32) | 1.77x10^-4
GBM | 6.16x10^-5 | 1.12 (1.06-1.18) | 0.3429 | 1.04 (0.96-1.11) | 1.44x10^-7 | 1.27 (1.16-1.38) | 6.04x10^-4
Non-GBM | 0.3196 | 1.03 (0.97-1.10) | 0.4816 | 0.97 (0.89-1.06) | 0.0160 | 1.13 (1.02-1.24) | 0.0186

rs10069690 (5p15.33)
All glioma | 5.38x10^-51 | 1.48 (1.41-1.56) | 7.58x10^-31 | 1.49 (1.39-1.60) | 4.88x10^-20 | 1.45 (1.34-1.57) | 0.5688
GBM | 1.40x10^-57 | 1.63 (1.54-1.74) | 3.38x10^-35 | 1.64 (1.52-1.78) | 6.29x10^-22 | 1.60 (1.45-1.76) | 0.7049
Non-GBM | 4.53x10^-12 | 1.29 (1.20-1.39) | 1.20x10^-6 | 1.27 (1.15-1.40) | 1.67x10^-6 | 1.31 (1.17-1.46) | 0.7036

rs75061358 (7p11.2)
All glioma | 1.24x10^-19 | 1.44 (1.33-1.55) | 6.93x10^-12 | 1.43 (1.29-1.59) | 1.71x10^-9 | 1.46 (1.29-1.66) | 0.8114
GBM | 7.75x10^-26 | 1.64 (1.50-1.80) | 2.66x10^-16 | 1.65 (1.46-1.86) | 1.16x10^-11 | 1.68 (1.45-1.96) | 0.8211
Non-GBM | 1.34x10^-4 | 1.25 (1.11-1.40) | 0.0079 | 1.23 (1.06-1.43) | 0.0129 | 1.25 (1.05-1.49) | 0.9246

rs11979158 (7p11.2)
All glioma | 6.57x10^-12 | 1.23 (1.16-1.31) | 4.87x10^-12 | 1.33 (1.23-1.44) | 0.0187 | 1.12 (1.02-1.22) | 0.0055
GBM | 4.59x10^-15 | 1.33 (1.24-1.42) | 1.26x10^-12 | 1.40 (1.28-1.54) | 1.33x10^-4 | 1.24 (1.11-1.39) | 0.1184
Non-GBM | 0.0022 | 1.14 (1.05-1.23) | 2.74x10^-5 | 1.27 (1.13-1.41) | 0.9014 | 0.99 (0.88-1.12) | 0.0034

rs55705857 (8q24.21)
All glioma | 2.50x10^-48 | 1.90 (1.75-2.07) | 1.09x10^-14 | 1.56 (1.40-1.75) | 1.22x10^-39 | 2.45 (2.14-2.80) | 3.46x10^-7
GBM | 6.66x10^-7 | 1.33 (1.19-1.49) | 0.0344 | 1.17 (1.01-1.34) | 4.16x10^-7 | 1.61 (1.34-1.94) | 0.0066
Non-GBM | 3.34x10^-96 | 3.45 (3.07-3.87) | 8.13x10^-36 | 2.66 (2.28-3.10) | 1.85x10^-65 | 4.71 (3.94-5.63) | 8.44x10^-7

rs634537 (9p21.3)
All glioma | 6.14x10^-34 | 1.31 (1.26-1.37) | 2.37x10^-21 | 1.33 (1.25-1.41) | 6.38x10^-14 | 1.30 (1.21-1.39) | 0.6496
GBM | 2.25x10^-35 | 1.39 (1.32-1.46) | 1.00x10^-20 | 1.38 (1.29-1.48) | 1.92x10^-16 | 1.41 (1.30-1.53) | 0.6544
Non-GBM | 1.94x10^-10 | 1.22 (1.15-1.30) | 2.63x10^-8 | 1.26 (1.16-1.37) | 4.88x10^-4 | 1.18 (1.08-1.30) | 0.3131

rs12803321 (11q23.3)
All glioma | 1.23x10^-6 | 1.13 (1.07-1.18) | 3.96x10^-4 | 1.12 (1.05-1.19) | 8.49x10^-6 | 1.18 (1.10-1.26) | 0.2680
GBM | 0.9142 | 1.00 (0.94-1.05) | 0.4497 | 0.97 (0.91-1.04) | 0.6463 | 1.02 (0.94-1.11) | 0.3667
Non-GBM | 3.29x10^-26 | 1.42 (1.33-1.51) | 1.82x10^-14 | 1.41 (1.29-1.53) | 8.88x10^-13 | 1.43 (1.30-1.57) | 0.7207

rs3751667 (16p13.3)
All glioma | 2.33x10^-6 | 1.13 (1.08-1.19) | 2.98x10^-6 | 1.18 (1.10-1.26) | 0.0297 | 1.09 (1.01-1.19) | 0.1779
GBM | 2.53x10^-4 | 1.12 (1.05-1.19) | 2.22x10^-4 | 1.16 (1.07-1.26) | 0.1130 | 1.08 (0.98-1.19) | 0.2729
Non-GBM | 6.69x10^-7 | 1.20 (1.12-1.29) | 2.62x10^-6 | 1.26 (1.14-1.38) | 0.0060 | 1.17 (1.05-1.31) | 0.3241

rs78378222 (17p13.1)
All glioma | 1.91x10^-29 | 2.47 (2.11-2.89) | 3.36x10^-17 | 2.41 (1.97-2.96) | 1.75x10^-12 | 2.43 (1.90-3.12) | 0.8483
GBM | 4.68x10^-23 | 2.70 (2.22-3.29) | 1.27x10^-14 | 2.65 (2.07-3.40) | 2.28x10^-9 | 2.67 (1.93-3.68) | 0.8731
Non-GBM | 3.44x10^-18 | 2.82 (2.23-3.56) | 1.10x10^-10 | 2.79 (2.04-3.80) | 4.40x10^-8 | 2.70 (1.89-3.85) | 0.9385

rs2297440 (20q13.33)
All glioma | 7.10x10^-33 | 1.39 (1.31-1.46) | 4.09x10^-21 | 1.42 (1.32-1.52) | 1.34x10^-13 | 1.37 (1.26-1.49) | 0.5299
GBM | 1.39x10^-33 | 1.48 (1.39-1.57) | 1.22x10^-19 | 1.47 (1.35-1.59) | 1.15x10^-16 | 1.53 (1.39-1.70) | 0.5159
Non-GBM | 8.46x10^-9 | 1.24 (1.15-1.34) | 2.92x10^-7 | 1.29 (1.17-1.43) | 0.0040 | 1.18 (1.05-1.32) | 0.1916

Table 2-3 Sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta- analysis and individual studies for rs11979158, rs55705857 and rs9841110 overall and by histology groupings. Males Sex Difference Histolo ORM ORF RSID gy Study PM (95% CI) Phet PF (95% CI) Phet PD Phet rs9841110 All Meta- 0.5885 1.02 0.0728 5.55x10-8 1.22 0.2751 1.77x10-4 0.6269 (3p21.31) glioma analysis (0.96-1.08) (1.14-1.32) GICC 0.3089 0.95 0.0123 1.15 0.0052 (0.87-1.05) (1.03-1.28) SFAGS- 0.016 1.25 0.0059 1.35 0.2857 GWAS (1.04-1.49) (1.09-1.67) MDA-GWAS 0.5538 1.05 0.0621 1.18 0.1586 (0.90-1.22) (0.99-1.40) GliomaScan 0.788 1.02 6.14x10-5 1.34 0.0024 (0.90-1.16) (1.16-1.55) GBM Meta- 0.3429 1.04 0.1308 1.44x10-7 1.27 0.2649 6.04x10-4 0.6330 analysis (0.96-1.11) (1.16-1.38) GICC 0.5104 0.97 0.0027 1.22 0.003 (0.87-1.07) (1.07-1.39) 57

SFAGS- 0.021 1.26 0.0025 1.45 0.184 GWAS (1.04-1.53) (1.14-1.85) MDA-GWAS 0.5797 1.05 0.3271 1.11 0.3565 (0.88-1.27) (0.90-1.37) GliomaScan 0.4944 1.06 3.89x10-4 1.38 0.0128 (0.90-1.23) (1.16-1.65) Non- Meta- 0.4816 0.97 0.3776 0.016 1.13 0.5611 0.0186 0.7033 GBM analysis (0.89-1.06) (1.02-1.24) GICC 0.1834 0.92 0.342 1.07 0.0547 (0.82-1.04) (0.94-1.21) SFAGS- 0.2235 1.21 0.619 1.10 0.3596 GWAS (0.89-1.65) (0.75-1.64) MDA-GWAS 0.7644 1.03 0.0567 1.26 0.1014 (0.85-1.26) (0.99-1.60) GliomaScan 0.6195 0.95 0.0795 1.22 0.0523 (0.77-1.17) (0.98-1.52) rs1197915 All Meta- 4.87x10-12 1.33 0.1316 0.0187 1.12 0.772 0.0055 0.2849 8 (7p11.2) glioma analysis (1.23-1.44) (1.02-1.22) GICC 2.94x10-10 1.47 0.1528 1.11 0.0013 (1.30-1.66) (0.96-1.27) SFAGS- 0.0256 1.32 0.2579 1.17 0.2631 GWAS (1.03-1.68) (0.89-1.54) MDA-GWAS 0.019 1.25 0.8186 1.02 0.0812 (1.04-1.51) (0.83-1.26) GliomaScan 0.0757 1.16 0.0705 1.18 0.4543 (0.98-1.37) (0.99-1.41) GBM Meta- 1.26x10-12 1.40 0.4644 1.33x10-4 1.24 0.9652 0.1184 0.6473 analysis (1.28-1.54) (1.11-1.39) GICC 1.14x10-9 1.52 0.0254 1.22 0.0229 (1.33-1.73) (1.02-1.44) SFAGS- 0.0349 1.33 0.0784 1.32 0.484 GWAS (1.02-1.73) (0.97-1.79) MDA-GWAS 0.0371 1.27 0.1382 1.22 0.4067 (1.01-1.59) (0.94-1.59) GliomaScan 0.0078 1.32 0.0322 1.28 0.4175 (1.08-1.61) (1.02-1.59) Non- Meta- 2.74x10-5 1.27 0.3829 0.9014 0.99 0.1587 0.0034 0.1072 GBM analysis (1.13-1.41) (0.88-1.12) GICC 3.48x10-5 1.37 0.8557 0.98 0.0018 (1.18-1.60) (0.83-1.16) SFAGS- 0.3373 1.22 0.5226 0.85 0.1358 GWAS (0.82-1.82) (0.52-1.40) MDA-GWAS 0.1229 1.21 0.1696 0.82 0.0205 (0.95-1.54) (0.61-1.09) GliomaScan 0.6499 1.06 0.0901 1.27 0.1776 (0.82-1.38) (0.96-1.68) rs5570585 All Meta- 1.09x10-14 1.56 0.0426 1.22x10-39 2.45 0.6289 0 0.0662 7 (8q24.21) glioma analysis (1.40-1.75) (2.14-2.80) GICC 1.19x10-10 1.68 1.63x10-19 2.30 0.005 (1.43-1.97) (1.92-2.76) SFAGS- 0.4872 1.14 4.16x10-6 3.18 6.44x10-4 GWAS (0.78-1.67) (1.94-5.21) MDA-GWAS 2.67x10-6 1.90 9.89x10-7 2.46 0.1315 (1.46-2.49) (1.72-3.53) 
GliomaScan 0.0481 1.28 3.25x10-12 2.60 5.59x10-5 (1.00-1.62) (1.99-3.40) GBM Meta- 0.0344 1.17 0.8709 4.16x10-7 1.61 0.0728 0.0066 0.0823 analysis (1.01-1.34) (1.34-1.94) GICC 0.051 1.22 0.0612 1.29 0.3702 (1.00-1.49) (0.99-1.69) SFAGS- 0.6649 1.10 0.0014 2.56 0.0094 GWAS (0.73-1.65) (1.44-4.57) MDA-GWAS 0.3057 1.21 0.063 1.56 0.197 (0.84-1.72) (0.98-2.48) GliomaScan 0.7125 1.06 8.59x10-5 2.07 0.0027 (0.78-1.43) (1.44-2.97)

Non- Meta- 8.13x10-36 2.66 0.026 1.85x10-65 4.71 0.0514 3.46x10-7 0.0662 GBM analysis (2.28-3.10) (3.94-5.63) GICC 6.85x10-25 2.81 1.42x10-39 4.09 0.0051 (2.31-3.42) (3.32-5.05) SFAGS- 0.455 1.31 1.44x10-6 12.68 1.89x10-4 GWAS (0.65-2.66) (4.51-35.64) MDA-GWAS 3.17x10-12 3.50 2.55x10-11 6.38 0.0346 (2.46-4.97) (3.70-11.00) GliomaScan 0.0015 1.90 1.82x10-14 6.17 7.71x10-5 (1.28-2.82) (3.87-9.83)

The previously identified SNP at 8q24.21 (rs55705857) was the most significant SNP in both males and females. The odds ratio for rs55705857 in all glioma was significantly higher in females (ORF=2.45 [95%CI=2.14-2.80], pF=1.22x10^-39) as compared to males (ORM=1.56 [95%CI=1.40-1.75], pM=1.09x10^-14), with pD=3.46x10^-7. In non-GBM only, ORF (ORF=4.71 [95%CI=3.94-5.63], pF=1.85x10^-65) was also elevated as compared to ORM (ORM=2.66 [95%CI=2.28-3.10], pM=8.13x10^-36), with pD=8.44x10^-7 (Table 2-2, Figure 2-4). This association was further explored in a case-only analysis, where there was a significant difference between males and females overall (p=0.0012) and in non-GBM (p=0.0084) (Table 2-4; see Table 2-1 for sample sizes).
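The comparison of sex-specific odds ratios can be illustrated with a short calculation. The sketch below is not taken from the analysis itself: it assumes a simple two-sided z-test on the difference of log odds ratios, with standard errors back-calculated from the rounded 95% confidence intervals quoted above. The procedure actually used to obtain pD is the one described in the statistical methods (Section 2.3.3) and may differ. The function names are illustrative only.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate the standard error of log(OR) from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def sex_difference_test(or_f, ci_f, or_m, ci_m):
    """Assumed z-test for log(OR_F) - log(OR_M) = 0; returns (z, two-sided p)."""
    diff = math.log(or_f) - math.log(or_m)
    se = math.sqrt(se_from_ci(*ci_f) ** 2 + se_from_ci(*ci_m) ** 2)
    z = diff / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))

# rs55705857 in all glioma, values as quoted in the text
z_stat, p_d = sex_difference_test(2.45, (2.14, 2.80), 1.56, (1.40, 1.75))
print(f"z = {z_stat:.2f}, approximate pD = {p_d:.2e}")
```

With these rounded inputs the approximate p-value is on the order of 10^-7, the same order of magnitude as the reported pD for all glioma; exact agreement is not expected because the inputs are rounded.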

Table 2-4 Case-only odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis for rs11979158, rs55705857 and rs9841110 overall and by histology groupings.

RSID (Locus)          Histology   P        OR (95% CI)        Phet
rs9841110 (3p21.31)   All Glioma  0.0520   0.93 (0.87-1.00)   0.2801
                      GBM         0.0428   0.91 (0.82-1.00)   0.3218
                      Non-GBM     0.5709   0.97 (0.86-1.08)   0.7404
rs11979158 (7p11.2)   All Glioma  0.0071   1.13 (1.03-1.24)   0.8834
                      GBM         0.2392   1.08 (0.95-1.22)   0.8130
                      Non-GBM     0.0115   1.20 (1.04-1.39)   0.2339
rs55705857 (8q24.21)  All Glioma  0.0012   1.20 (1.07-1.33)   0.0523
                      GBM         0.6513   1.04 (0.88-1.23)   0.1497
                      Non-GBM     0.0084   1.23 (1.05-1.43)   0.0968

Table 2-5 Case-only odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs11979158, rs55705857 and rs9841110 overall and by histology groupings.

RSID (Locus), Histology
  Study            P        OR (95% CI)        Phet
rs9841110 (3p21.31), All Glioma
  Meta-analysis    0.0520   0.93 (0.87-1.00)   0.2801
  GICC             0.2862   0.95 (0.87-1.04)
  SFAGS-GWAS       0.8498   0.98 (0.76-1.25)
  MDA-GWAS         0.8190   1.02 (0.84-1.24)
  GliomaScan       0.0115   0.82 (0.71-0.96)
rs9841110 (3p21.31), GBM
  Meta-analysis    0.0428   0.91 (0.82-1.00)   0.3218
  GICC             0.1032   0.90 (0.79-1.02)
  SFAGS-GWAS       0.5560   0.92 (0.69-1.22)
  MDA-GWAS         0.4452   1.11 (0.85-1.43)
  GliomaScan       0.0453   0.81 (0.65-1.00)
rs9841110 (3p21.31), Non-GBM
  Meta-analysis    0.5709   0.97 (0.86-1.08)   0.7404
  GICC             0.9061   0.99 (0.86-1.14)
  SFAGS-GWAS       0.5600   1.17 (0.69-1.97)
  MDA-GWAS         0.5710   0.92 (0.68-1.23)
  GliomaScan       0.3430   0.87 (0.66-1.16)
rs11979158 (7p11.2), All Glioma
  Meta-analysis    0.0071   1.13 (1.03-1.24)   0.8834
  GICC             0.0924   1.11 (0.98-1.25)
  SFAGS-GWAS       0.4149   1.15 (0.82-1.62)
  MDA-GWAS         0.0820   1.24 (0.97-1.57)
  GliomaScan       0.2396   1.12 (0.93-1.36)
rs11979158 (7p11.2), GBM
  Meta-analysis    0.2392   1.08 (0.95-1.22)   0.8130
  GICC             0.5852   1.05 (0.88-1.24)
  SFAGS-GWAS       0.9093   1.02 (0.69-1.52)
  MDA-GWAS         0.8085   1.04 (0.75-1.45)
  GliomaScan       0.1607   1.22 (0.93-1.60)
rs11979158 (7p11.2), Non-GBM
  Meta-analysis    0.0115   1.20 (1.04-1.39)   0.2339
  GICC             0.0764   1.18 (0.98-1.41)
  SFAGS-GWAS       0.1465   1.67 (0.84-3.33)
  MDA-GWAS         0.0214   1.5 (1.06-2.13)
  GliomaScan       0.7295   0.94 (0.65-1.35)
rs55705857 (8q24.21), All Glioma
  Meta-analysis    0.0012   1.20 (1.07-1.33)   0.0523
  GICC             0.0293   1.17 (1.02-1.34)
  SFAGS-GWAS       0.0522   1.56 (1.00-2.44)
  MDA-GWAS         0.4549   0.89 (0.66-1.21)
  GliomaScan       0.0024   1.47 (1.15-1.88)
rs55705857 (8q24.21), GBM
  Meta-analysis    0.6513   1.04 (0.88-1.23)   0.1497
  GICC             0.3163   0.89 (0.71-1.12)
  SFAGS-GWAS       0.2701   1.35 (0.79-2.32)
  MDA-GWAS         0.9869   1.00 (0.63-1.58)
  GliomaScan       0.0691   1.40 (0.97-2.02)
rs55705857 (8q24.21), Non-GBM
  Meta-analysis    0.0084   1.23 (1.05-1.43)   0.0968
  GICC             0.0236   1.24 (1.03-1.49)
  SFAGS-GWAS       0.0573   2.23 (0.98-5.11)
  MDA-GWAS         0.3932   0.83 (0.55-1.27)
  GliomaScan       0.0511   1.53 (1.00-2.34)

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS
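To make the relationship between the study-level rows and the meta-analysis row of Table 2-5 concrete, the following sketch pools the four case-only estimates for rs55705857 in all glioma. It assumes a fixed-effect, inverse-variance-weighted model with standard errors recovered from the rounded confidence intervals; the meta-analysis software and model actually used are specified in the Methods, not here, so this is an approximation rather than a reproduction.

```python
import math

# Case-only estimates for rs55705857 in all glioma, copied from Table 2-5:
# study -> (OR, lower 95% limit, upper 95% limit)
studies = {
    "GICC": (1.17, 1.02, 1.34),
    "SFAGS-GWAS": (1.56, 1.00, 2.44),
    "MDA-GWAS": (0.89, 0.66, 1.21),
    "GliomaScan": (1.47, 1.15, 1.88),
}

def fixed_effect_meta(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling on the log-OR scale (assumed model)."""
    betas, weights = [], []
    for odds_ratio, lower, upper in estimates.values():
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        betas.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    # Cochran's Q statistic for between-study heterogeneity
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    ci = (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled))
    return math.exp(pooled), ci, q

pooled_or, pooled_ci, cochran_q = fixed_effect_meta(studies)
print(f"pooled OR = {pooled_or:.2f} ({pooled_ci[0]:.2f}-{pooled_ci[1]:.2f}), Q = {cochran_q:.2f}")
```

The pooled odds ratio from this approximation rounds to the 1.20 shown in the meta-analysis row; the confidence limits can differ in the second decimal place because the inputs are rounded.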

Table 2-6 Sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis for rs11979158, rs55705857 and rs9841110 by specific non-GBM histologies.

                                                                 Males                                       Females
RSID (Locus)          Histology                                  PM           ORM (95% CI)       Phet    PF           ORF (95% CI)         Phet    PD
rs9841110 (3p21.31)   Astrocytoma (Non-GBM) (WHO grade II-III)   0.5304       1.04 (0.92-1.17)   0.751   0.0407       1.15 (1.01-1.32)     0.549   0.2409
                      Oligodendroglioma (WHO grade II-III)       0.4190       0.94 (0.81-1.09)   0.694   0.0973       1.14 (0.98-1.34)     0.360   0.0649
rs11979158 (7p11.2)   Astrocytoma (Non-GBM) (WHO grade II-III)   0.0023       0.79 (0.68-0.92)   0.056   0.9363       0.99 (0.83-1.18)     0.418   0.0500
                      Oligodendroglioma (WHO grade II-III)       0.0221       0.81 (0.68-0.97)   0.865   0.6561       1.05 (0.86-1.28)     0.262   0.0471
rs55705857 (8q24.21)  Astrocytoma (Non-GBM) (WHO grade II-III)   1.19x10^-21  2.87 (2.31-3.56)   0.073   2.15x10^-28  4.64 (3.53-6.09)     0.237   0.0065
                      Oligodendroglioma (WHO grade II-III)       5.37x10^-34  5.47 (4.16-7.19)   0.103   3.68x10^-58  12.15 (8.96-16.48)   0.027   6.60x10^-5


Table 2-7 Sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs11979158, rs55705857 and rs9841110 by specific non-GBM histologies. Empty cells are shown as "--".

RSID (Locus), Histology
                   Males                                       Females
  Study            PM           ORM (95% CI)        Phet    PF           ORF (95% CI)          Phet    PD
rs9841110 (3p21.31), Astrocytoma (Non-GBM) (WHO grade II-III)
  Meta-analysis    0.5304       1.04 (0.92-1.17)    0.751   0.0407       1.15 (1.01-1.32)      0.549   0.2409
  GICC             0.8819       1.01 (0.87-1.18)    --      0.3113       1.09 (0.92-1.29)      --      0.2594
  MDA-GWAS         0.3388       1.14 (0.87-1.48)    --      0.2454       1.21 (0.88-1.68)      --      0.3799
  GliomaScan       0.8658       1.02 (0.78-1.34)    --      0.0826       1.32 (0.96-1.80)      --      0.1134
rs9841110 (3p21.31), Oligodendroglioma (WHO grade II-III)
  Meta-analysis    0.4190       0.94 (0.81-1.09)    0.694   0.0973       1.14 (0.98-1.34)      0.360   0.0649
  GICC             0.2508       0.90 (0.74-1.08)    --      0.4020       1.09 (0.89-1.33)      --      0.0812
  MDA-GWAS         0.8013       1.04 (0.75-1.45)    --      0.0500       1.51 (1.00-2.27)      --      0.0851
  GliomaScan       0.9752       0.99 (0.70-1.41)    --      0.6171       1.09 (0.77-1.55)      --      0.3544
rs11979158 (7p11.2), Astrocytoma (Non-GBM) (WHO grade II-III)
  Meta-analysis    0.0023       0.79 (0.68-0.92)    0.056   0.9363       0.99 (0.83-1.18)      0.418   0.0500
  GICC             3.90x10^-4   0.70 (0.58-0.85)    --      0.9938       1.00 (0.80-1.25)      --      0.0093
  MDA-GWAS         0.1640       0.80 (0.58-1.10)    --      0.3965       1.19 (0.80-1.78)      --      0.0630
  GliomaScan       0.4710       1.13 (0.81-1.59)    --      0.3097       0.82 (0.55-1.21)      --      0.1075
rs11979158 (7p11.2), Oligodendroglioma (WHO grade II-III)
  Meta-analysis    0.0221       0.81 (0.68-0.97)    0.865   0.6561       1.05 (0.86-1.28)      0.262   0.0471
  GICC             0.0475       0.79 (0.63-1.00)    --      0.6237       1.07 (0.83-1.37)      --      0.0450
  MDA-GWAS         0.5593       0.89 (0.60-1.32)    --      0.1941       1.40 (0.84-2.33)      --      0.0830
  GliomaScan       0.2618       0.78 (0.50-1.21)    --      0.3297       0.81 (0.53-1.24)      --      0.4496
rs55705857 (8q24.21), Astrocytoma (Non-GBM) (WHO grade II-III)
  Meta-analysis    1.19x10^-21  2.87 (2.31-3.56)    0.073   2.15x10^-28  4.64 (3.53-6.09)      0.237   0.0065
  GICC             1.72x10^-18  3.34 (2.55-4.36)    --      2.75x10^-17  4.01 (2.91-5.53)      --      0.1947
  MDA-GWAS         6.98x10^-5   2.77 (1.68-4.58)    --      3.76x10^-7   7.38 (3.41-15.96)     --      0.0185
  GliomaScan       0.0509       1.68 (1.00-2.83)    --      1.42x10^-7   6.19 (3.14-12.22)     --      0.0014
rs55705857 (8q24.21), Oligodendroglioma (WHO grade II-III)
  Meta-analysis    5.37x10^-34  5.47 (4.16-7.19)    0.103   3.68x10^-58  12.15 (8.96-16.48)    0.027   6.60x10^-5
  GICC             2.94x10^-23  5.49 (3.92-7.68)    --      3.10x10^-40  10.24 (7.26-14.43)    --      0.0054
  MDA-GWAS         1.88x10^-11  8.41 (4.52-15.66)   --      3.97x10^-6   11.57 (4.09-32.74)    --      0.3031
  GliomaScan       0.0035       2.96 (1.43-6.15)    --      1.53x10^-16  36.02 (15.38-84.37)   --      6.35x10^-6

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

Previous studies have found a strong association between rs55705857 and oligodendroglial tumors (particularly tumors with isocitrate dehydrogenase 1/2 (IDH1/2) mutation and loss of 1p and 19q), so this association was further explored in the non-GBM (lower grade glioma [LGG]) histology groups (Table 2-6; see Table 2-1 for sample sizes). For World Health Organization (WHO) grade II-III astrocytoma, the effect was stronger in females (ORF=4.64 [95%CI=3.53-6.09], pF=2.15x10^-28) as compared to males (ORM=2.87 [95%CI=2.31-3.56], pM=1.19x10^-21), with pD=0.0065. For WHO grade II-III oligodendrogliomas, the effect was stronger than that observed in WHO grade II-III astrocytomas, and the effect size was again larger in females (ORF=12.15 [95%CI=8.96-16.48], pF=3.68x10^-58) as compared to males (ORM=5.47 [95%CI=4.16-7.19], pM=5.37x10^-34), with pD=6.60x10^-5. Oligoastrocytic tumors were not included in sub-analyses because recent research suggests that these tumors are not an entity that is molecularly distinct from oligodendrogliomas or astrocytomas [176].

2.4.2 Genome-wide scan of nominally significant regions

In a previous eight-study meta-analysis, ~12,000 SNPs (INFO>0.7, MAF>0.01) were identified as having a nominally significant (p<5x10^-4) association with all glioma, GBM, or non-GBM [4]. A sex-stratified genome-wide scan was conducted within this set of SNPs, and results were considered significant at pD<1.4x10^-6 (adjusted for 12,000 tests in each of three histologies [36,000 tests]; see Figure 2-2 for a schematic of the study design). Similar genome-wide peaks were observed in males and females (Figure 2-5, Figure 2-6, Figure 2-7).


Figure 2-5. Manhattan plot of -log(p) values for all glioma in A) males and B) females

Figure 2-6 Manhattan plot of -log(p) values for GBM in A) males and B) females

Figure 2-7 Manhattan plot of -log(p) values for non-GBM in A) males and B) females


Figure 2-8 P-values of SNPs between 48.8mb and 50mb on chromosome 3 in males for A) all glioma, B) GBM, and C) non-GBM, and in females for D) all glioma, E) GBM, and F) non-GBM

One large region within 3p21.31 (49400kb-49600kb, ~200kb) was identified as being significantly associated with glioma and GBM in females only (Figure 2-8). There were 243 SNPs with nominally significant associations within this region in the previous eight-study meta-analysis (p<5x10^-4), and 32 of these had nominally significant sex-specific associations (pF<5x10^-6 or pM<5x10^-6) in all glioma or GBM. The strongest association in females within this region was at rs9841110, in both all glioma (ORF=1.22 [95%CI=1.14-1.32], pF=5.55x10^-8, with pD=1.77x10^-4) and GBM only (ORF=1.27 [95%CI=1.16-1.38], pF=3.86x10^-7, with pD=6.04x10^-4), while there were no significant associations detected in males (Figure 2-4). No SNPs in this region were significantly associated with non-GBM. In a case-only analysis, a marginally significant difference was detected between males and females overall (p=0.0520) and in GBM (p=0.0428) (Table 2-4).


2.4.3 Agnostic scan of sex chromosome loci

SNPs on the sex chromosomes were analyzed in GICC only. There were 245,746 SNPs with INFO≥0.7 and MAF≥0.01 on the X chromosome after quality control and imputation, and results were considered significant at p<2x10^-7 (corrected for 250,000 tests; see Figure 2-3 for a schematic of the study design). No SNPs met this significance threshold. After quality control procedures were complete, there were 300 SNPs remaining on the Y chromosome. No significant signals were detected on the Y chromosome.

2.4.4 Combined analysis of germline variants and somatic characterization

Because molecular classification data were not included in the GICC, MDA-GWAS, SFAGS-GWAS, and GliomaScan datasets, glioma data obtained from TCGA datasets (GBM and LGG) were used to explore potential confounding due to molecular subtype variation within histologies. After quality control procedures, there were 758 individuals from the TCGA dataset available for analysis with available germline genotyping, molecular characterization, sex and age data (Table 2-8). Overall, a slightly higher proportion of females (53.2%) than males (47.2%) had IDH1/2 mutant glioma, but this difference was not statistically significant (p=0.1104) (Figure 2-9).
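The comparison of IDH1/2 mutation frequency by sex can be approximated directly from the counts reported in Table 2-8 (males: 205 mutant, 229 wild type; females: 164 mutant, 144 wild type). The sketch below assumes a Pearson chi-square test without continuity correction; the specific test used to obtain the reported p-value is not stated in this section, so the result is expected to be close to, but not necessarily identical to, the reported p=0.1104.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: males, females; columns: IDH mutant, IDH wild type (counts from Table 2-8)
chi2_stat, p_idh = chi_square_2x2(205, 229, 164, 144)
print(f"chi-square = {chi2_stat:.2f}, p = {p_idh:.3f}")
```

A non-significant result of this size is consistent with the conclusion that overall IDH1/2 mutation frequency does not differ materially by sex in this dataset.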

Table 2-8 Characteristics of individuals in The Cancer Genome Atlas, by study and sex

Characteristic                Overall          Males            Females
N                             786              458              328
  GBM (% of total)            356 (45.3%)      221 (48.3%)      135 (41.2%)
  Non-GBM (% of total)        430 (54.7%)      237 (51.7%)      193 (58.8%)
Mean Age (SD)                 50.74 (15.55)    51.21 (15.59)    50.11 (15.49)
  GBM - Mean Age (SD)         59.83 (12.89)    60.16 (12.33)    59.29 (13.79)
  Non-GBM - Mean Age (SD)     43.23 (13.40)    42.85 (13.54)    43.69 (13.24)
IDH wild type (% of total)    373 (50.3%)      229 (52.8%)      144 (46.8%)
  GBM (% of total)            297 (94.3%)      187 (94.0%)      110 (94.8%)
  Non-GBM (% of total)        76 (17.8%)       42 (17.9%)       34 (17.7%)
IDH mutant (% of total)       369 (49.7%)      205 (47.2%)      164 (53.2%)
  GBM (% of total)            18 (5.7%)        12 (6.0%)        6 (5.2%)
  Non-GBM (% of total)        351 (82.2%)      193 (82.1%)      158 (82.3%)

Figure 2-9 Proportion of samples with IDH1/2 mutation in the TCGA GBM and LGG datasets by sex, overall and stratified by study

When tumors were stratified by histological type, approximately equal proportions of males and females had IDH1/2 mutations present in their tumors (Figure 2-9, Table 2-8; GBM: 6.0% in males and 5.2% in females; LGG: 82.1% in males and 82.3% in females). There were also no significant differences by sex in IDH/TERT/1p19q subtype (Figure 2-10, overall p=0.2859).

Figure 2-10 Proportion of samples by glioma subtype (based on IDH1/2 mutation, 1p19q, and TERT mutation [43]) in the TCGA GBM and LGG datasets by sex, overall and stratified by study

SNPs found to be nominally significant (p<5x10^-4) in a previous eight-study meta-analysis, with imputation quality (r^2) ≥0.7, were identified within the TCGA germline genotype data, and D' and r^2 values in CEU were used to select proxy SNPs (Table 2-9) [174].

Table 2-9. Linkage disequilibrium measures, sex-stratified odds ratios, 95% confidence intervals (95% CI), and p-values from meta-analysis for marker SNPs selected within the Cancer Genome Atlas genotyping data

RSID (Locus) [Risk Allele]; Marker SNP RSID [LD (CEU^a)]; Risk Allele
               Results from sex-stratified four-study meta-analysis
  Histology    PM           ORM (95% CI)       PF           ORF (95% CI)       pD
rs9841110 (3p21.31) [C/G]; rs9814873 [D'=1; r^2=1 (C=A, G=G)]; A/G
  All glioma   0.6552       1.01 (0.95-1.08)   1.45x10^-7   1.22 (1.13-1.31)   0.0002
  GBM          0.4139       1.03 (0.96-1.11)   1.98x10^-7   1.26 (1.16-1.38)   0.0005
  Non-GBM      0.4526       0.97 (0.89-1.06)   0.0279       1.12 (1.01-1.23)   0.026
rs11979158 (7p11.2) [A/G]; rs7785013 [D'=1; r^2=1 (A=G, G=A)]; G/A
  All glioma   1.87x10^-12  1.34 (1.23-1.45)   0.0168       1.12 (1.02-1.23)   0.005
  GBM          4.62x10^-13  1.41 (1.29-1.55)   1.11x10^-4   1.25 (1.12-1.39)   0.107
  Non-GBM      2.18x10^-5   1.27 (1.14-1.42)   0.9088       0.99 (0.88-1.12)   0.003
rs55705857 (8q24.21) [A/G]; rs4636162 [D'=1; r^2=0.104 (A=G, G=A)]; G/A
  All glioma   0.0158       1.08 (1.01-1.15)   0.0018       1.12 (1.04-1.20)   0.427
  GBM          0.6525       1.02 (0.95-1.09)   0.9398       1.00 (0.92-1.09)   0.835
  Non-GBM      1.37x10^-5   1.21 (1.11-1.32)   6.07x10^-8   1.31 (1.19-1.45)   0.213

a. As estimated using LD link (https://analysistools.nci.nih.gov/LDlink/) [174]

A case-only analysis was conducted using sex as a binary phenotype for proxy SNPs in the TCGA dataset. For the proxy SNP in 3p21.31, there was a nominally significant signal in the four-study case-only meta-analysis in GBM (Table 2-10). There was no significant association in the TCGA set, but risk allele frequency (RAF) was elevated in females as compared to males in the GBM set, as well as in all IDH1/2 wild type gliomas (Table 2-10). RAF in LGG and in IDH1/2 mutant glioma was similar among males and females. For the proxy SNP at 7p11.2, there was a nominally significant signal in the case-only meta-analysis but no significant association in TCGA, although RAF was elevated in males as compared to females in the GBM set, as well as in all IDH1/2 wild type gliomas (Table 2-10). There was no significant signal detected in the overall case-only meta-analysis for the proxy SNP at 8q24.21, or within the TCGA set. Among both LGG and IDH1/2 mutant gliomas, RAF was elevated in females as compared to males.


Table 2-10 Risk allele frequencies (RAF), case-only odds ratios, 95% confidence intervals (95% CI), and p-values for marker SNPs from four-study meta-analysis and the Cancer Genome Atlas genotyping data. RAF(M) and RAF(F) are risk allele frequencies in male and female cases; P and OR (95% CI) are from the case-only analysis (males:females).

Marker SNP (Locus)
                    Four-study meta-analysis                    The Cancer Genome Atlas
  Histology         RAF(M)  RAF(F)  P        OR (95% CI)        INFO  RAF(M)  RAF(F)  p        OR (95% CI)
rs9814873 (3p21.31)
  All glioma        0.692   0.707   0.0577   1.07 (1.00-1.15)   1.00  0.701   0.716   0.5321   0.93 (0.75-1.16)
  GBM               0.694   0.716   0.0371   1.11 (1.01-1.22)   1.00  0.697   0.742   0.2003   0.80 (0.58-1.12)
  LGG (non-GBM)     0.686   0.691   0.6446   1.03 (0.92-1.15)   1.00  0.705   0.697   0.8039   1.04 (0.77-1.40)
  IDH1/2 wild type  --      --      --       --                 1.00  0.704   0.731   0.4343   0.88 (0.64-1.21)
  IDH1/2 mutant     --      --      --       --                 1.00  0.705   0.692   0.7023   1.06 (0.77-1.47)
rs7785013 (7p11.2)
  All glioma        0.864   0.847   0.0058   0.88 (0.80-0.96)   0.99  0.855   0.850   0.7813   1.04 (0.78-1.39)
  GBM               0.872   0.861   0.2141   0.92 (0.81-1.05)   0.99  0.854   0.840   0.6073   1.12 (0.73-1.72)
  LGG (non-GBM)     0.855   0.832   0.0109   0.83 (0.72-0.96)   0.99  0.856   0.857   0.9585   0.99 (0.67-1.47)
  IDH1/2 wild type  --      --      --       --                 0.99  0.864   0.837   0.3132   1.24 (0.82-1.87)
  IDH1/2 mutant     --      --      --       --                 0.99  0.846   0.875   0.2447   0.77 (0.50-1.19)
rs4636162 (8q24.21)
  All glioma        0.358   0.365   0.5161   1.02 (0.96-1.09)   0.93  0.392   0.424   0.2113   0.88 (0.71-1.08)
  GBM               0.343   0.338   0.6001   0.98 (0.89-1.07)   0.93  0.374   0.404   0.4456   0.89 (0.66-1.20)
  LGG               0.383   0.401   0.1594   1.08 (0.97-1.20)   0.94  0.410   0.438   0.3891   0.88 (0.67-1.17)
  IDH1/2 wild type  --      --      --       --                 0.92  0.371   0.373   0.9480   0.99 (0.73-1.34)
  IDH1/2 mutant     --      --      --       --                 0.94  0.419   0.460   0.2613   0.84 (0.63-1.14)
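As an informal check on how the case-only odds ratios in Table 2-10 relate to the sex-specific risk allele frequencies, the sketch below computes a crude allelic odds ratio from two RAF values. This is an illustration under stated assumptions, not the regression model used in the analysis: it ignores covariates and study stratification, and the female-to-male direction of the ratio is inferred from the reported meta-analysis values rather than stated in the text. The function name is hypothetical.

```python
def allelic_odds_ratio(raf_group_a, raf_group_b):
    """Crude allelic odds ratio comparing risk-allele odds in group A versus group B."""
    odds_a = raf_group_a / (1 - raf_group_a)
    odds_b = raf_group_b / (1 - raf_group_b)
    return odds_a / odds_b

# Four-study meta-analysis RAF values from Table 2-10 (female cases vs male cases)
or_7p11 = allelic_odds_ratio(0.847, 0.864)      # rs7785013, all glioma
or_3p21_gbm = allelic_odds_ratio(0.716, 0.694)  # rs9814873, GBM

print(f"rs7785013, all glioma: {or_7p11:.2f}")
print(f"rs9814873, GBM: {or_3p21_gbm:.2f}")
```

Both crude values fall within about 0.01 of the corresponding meta-analysis odds ratios in the table, which suggests that the reported case-only estimates are driven mainly by the difference in RAF between male and female cases.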

2.4.5 Sex-stratified genotypic risk scores

In order to estimate the cumulative effects of significant variants by sex, unweighted risk scores (URS) were calculated by summing all risk alleles for each individual using the ten SNPs (rs12752552, rs9841110, rs10069690, rs11979158, rs55705857, rs634537, rs12803321, rs3751667, rs78378222, and rs2297440) found to be significantly associated with glioma in this analysis. GBM-specific (URS-GBM) and non-GBM-specific (URS-NGBM) scores were calculated using only the sets of six SNPs within this set that were significantly associated with these histologies (URS-GBM: rs9841110, rs10069690, rs11979158, rs634537, rs78378222, and rs2297440; URS-NGBM: rs10069690, rs55705857, rs634537, rs12803321, rs78378222, and rs2297440). See Section 2.3.3.5 for additional information on score calculation. Median URS, URS-GBM, and URS-NGBM were significantly different (p<0.0001) between cases and controls in both males and females in all histology groups (Figure 2-11).
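The score construction described above amounts to a sum of risk-allele counts over a fixed SNP list. The following is a minimal sketch assuming each genotype is already coded as the number of risk alleles carried (0, 1, or 2) and that no genotypes are missing; the example individual is hypothetical, and details such as the handling of imputed dosages or missing data are covered in Section 2.3.3.5 rather than here.

```python
# SNP sets as listed in the text
URS_SNPS = ["rs12752552", "rs9841110", "rs10069690", "rs11979158", "rs55705857",
            "rs634537", "rs12803321", "rs3751667", "rs78378222", "rs2297440"]
URS_GBM_SNPS = ["rs9841110", "rs10069690", "rs11979158",
                "rs634537", "rs78378222", "rs2297440"]
URS_NGBM_SNPS = ["rs10069690", "rs55705857", "rs634537",
                 "rs12803321", "rs78378222", "rs2297440"]

def unweighted_risk_score(risk_allele_counts, snps):
    """Sum risk-allele counts (0, 1, or 2 per SNP) over the given SNP list."""
    return sum(risk_allele_counts[snp] for snp in snps)

# Hypothetical individual: risk-allele count per SNP, in URS_SNPS order
example = dict(zip(URS_SNPS, [1, 2, 0, 2, 0, 1, 1, 2, 0, 1]))

urs = unweighted_risk_score(example, URS_SNPS)
urs_gbm = unweighted_risk_score(example, URS_GBM_SNPS)
urs_ngbm = unweighted_risk_score(example, URS_NGBM_SNPS)
print(urs, urs_gbm, urs_ngbm)
```

Because every SNP contributes equally, the overall score ranges from 0 to 20 and each histology-specific score from 0 to 12, regardless of the very different effect sizes of the individual variants.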

Figure 2-11 Density of histology-specific unweighted risk score by sex and case/control status for A) URS in all glioma, B) URS in GBM, C) URS in non-GBM, D) URS-GBM in GBM only, and E) URS-NGBM in non-GBM only

There was no significant difference in median risk scores between male and female cases for any histology group. Glioma risk increased with increasing number of risk alleles in both males and females for the ten SNPs included in the overall URS, as well as for the six SNPs in the URS-GBM and the six SNPs in the URS-NGBM (Figure 2-12, Table 2-11, Table 2-12, Table 2-13). Risk was higher in females (OR=3.97 [95%CI=2.42-6.80]) as compared to males (OR=1.74 [95%CI=1.21-2.53]) in all glioma for individuals with 13 to 16 risk alleles (Table 2-11), though the difference between these estimates was not statistically significant. Risk was also higher among females (OR=2.69 [95%CI=1.98-3.66]) as compared to males (OR=1.79 [95%CI=1.38-2.32]) in GBM for individuals with 8 to 11 risk alleles (Table 2-12), as well as in non-GBM for individuals with 6 to 11 risk alleles (females: OR=2.83 [95%CI=2.12-3.78], males: OR=1.70 [95%CI=1.31-2.19]) (Table 2-13), though the differences between these estimates were not statistically significant. These estimates may underestimate actual risk due to varying effect sizes and allele frequencies between risk variants.

Figure 2-12 Odds ratios and 95% confidence intervals for unweighted risk score (URS) in A) all glioma, B) GBM-specific URS (URS-GBM) in GBM, and C) non-GBM-specific URS (URS-NGBM) in non-GBM

Table 2-11 Odds ratios and 95% confidence intervals for unweighted score in all glioma (URSa) by sex. In the Total row, the OR is per allele and P is the p-value for trend.

Risk     Males                                                           Females
alleles  Cases (%)      Controls (%)    OR (95% CI)        P             Cases (%)      Controls (%)    OR (95% CI)        P
0-5      46 (1.2%)      161 (4.3%)      0.24 (0.17-0.34)   1.96x10^-15   30 (1.2%)      175 (4.7%)      0.19 (0.13-0.28)   1.52x10^-15
6        140 (3.7%)     286 (7.6%)      0.38 (0.30-0.48)   6.20x10^-16   105 (4.1%)     307 (8.3%)      0.38 (0.30-0.49)   1.10x10^-13
7        411 (10.9%)    575 (15.3%)     0.56 (0.48-0.67)   3.33x10^-11   249 (9.7%)     597 (16.1%)     0.46 (0.38-0.56)   1.09x10^-15
8        639 (17.0%)    815 (21.6%)     0.61 (0.52-0.71)   2.04x10^-10   457 (17.8%)    818 (22.1%)     0.63 (0.54-0.74)   3.45x10^-8
9        830 (22.1%)    826 (21.9%)     0.79 (0.68-0.91)   1.60x10^-3    557 (21.7%)    79 (21.4%)      0.79 (0.67-0.92)   3.33x10^-3
10       768 (20.4%)    621 (16.5%)     Ref                --            556 (21.7%)    621 (16.8%)     Ref                --
11       543 (14.4%)    615 (16.3%)     1.33 (1.11-1.59)   1.82x10^-3    351 (13.7%)    292 (7.9%)      1.35 (1.11-1.65)   2.35x10^-3
12       247 (6.6%)     121 (3.2%)      1.71 (1.34-2.18)   1.62x10^-5    189 (7.4%)     82 (2.2%)       2.53 (1.91-3.39)   1.91x10^-10
13-16    110 (2.9%)     47 (1.2%)       1.74 (1.21-2.53)   3.06x10^-3    70 (2.7%)      20 (0.5%)       3.97 (2.42-6.80)   1.42x10^-7
Total    3,761          3,767           1.26 (1.23-1.30)   8.74x10^-63   2564           3707            1.34 (1.30-1.38)   3.17x10^-74

a. Model includes age and sum of risk alleles for rs12752552, rs9841110, rs10069690, rs11979158, rs55705857, rs634537, rs12803321, rs3751667, rs78378222, and rs2297440. Reference is 10 alleles.
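The category-specific odds ratios in Table 2-11 compare each risk-allele category with the reference category of 10 alleles. As a rough illustration, the sketch below computes a crude (unadjusted) odds ratio with a Woolf-type confidence interval from the cell counts for the 13-16 allele category. It is only an approximation of the reported values, since the tabulated estimates come from a model that also includes age; the function name is illustrative.

```python
import math

def crude_odds_ratio(cases, controls, ref_cases, ref_controls, z=1.96):
    """Unadjusted OR versus a reference category, with a Woolf (log-based) 95% CI."""
    odds_ratio = (cases * ref_controls) / (controls * ref_cases)
    se_log_or = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# 13-16 risk alleles versus the 10-allele reference category (counts from Table 2-11)
male_or, male_lo, male_hi = crude_odds_ratio(110, 47, 768, 621)
female_or, female_lo, female_hi = crude_odds_ratio(70, 20, 556, 621)

print(f"males:   {male_or:.2f} ({male_lo:.2f}-{male_hi:.2f})")
print(f"females: {female_or:.2f} ({female_lo:.2f}-{female_hi:.2f})")
```

The crude estimates are close to, but not the same as, the age-adjusted odds ratios in the table, and they show the same pattern of a larger point estimate in females than in males.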

Table 2-12 Odds ratios and 95% confidence intervals for unweighted score in GBM (URS-GBMa) by sex. In the Total row, the OR is per allele and P is the p-value for trend.

Risk     Males                                                           Females
alleles  Cases (%)      Controls (%)    OR (95% CI)        P             Cases (%)      Controls (%)    OR (95% CI)        P
0-2      28 (1.2%)      181 (4.4%)      0.21 (0.14-0.31)   1.10x10^-13   20 (1.4%)      181 (4.4%)      0.25 (0.15-0.39)   1.02x10^-8
3        128 (5.3%)     439 (10.6%)     0.39 (0.31-0.48)   2.84x10^-17   77 (5.3%)      485 (11.7%)     0.35 (0.27-0.46)   2.77x10^-14
4        347 (14.4%)    945 (22.7%)     0.49 (0.42-0.58)   2.23x10^-18   217 (15.0%)    954 (23.0%)     0.51 (0.42-0.61)   2.21x10^-12
5        631 (26.2%)    1,156 (27.8%)   0.73 (0.64-0.84)   1.11x10^-5    356 (24.7%)    1,181 (28.5%)   0.67 (0.57-0.80)   3.88x10^-6
6        684 (28.4%)    920 (22.1%)     Ref                --            393 (27.3%)    877 (21.2%)     Ref                --
7        430 (17.9%)    396 (9.5%)      1.46 (1.24-1.73)   9.58x10^-6    272 (18.9%)    374 (9.0%)      1.62 (1.33-1.97)   1.54x10^-6
8-11     159 (6.6%)     120 (2.9%)      1.79 (1.38-2.32)   9.47x10^-6    107 (7.4%)     88 (2.1%)       2.69 (1.98-3.66)   2.32x10^-10
Total    2,407          4,157           1.40 (1.35-1.46)   2.05x10^-66   1442           4140            1.47 (1.40-1.54)   3.76x10^-60

a. Model includes age and sum of risk alleles for rs9841110, rs10069690, rs11979158, rs634537, rs78378222, and rs2297440. Reference is 6 alleles.


Table 2-13 Odds ratios and 95% confidence intervals for unweighted score in all non-GBM (URS-NGBMa) by sex. In the Total row, the OR is per allele and P is the p-value for trend.

Risk     Males                                                           Females
alleles  Cases (%)      Controls (%)    OR (95% CI)        P             Cases (%)      Controls (%)    OR (95% CI)        P
0-1      59 (4.2%)      341 (8.8%)      0.41 (0.29-0.56)   5.49x10^-8    34 (3.1%)      346 (9.1%)      0.30 (0.20-0.44)   1.09x10^-9
2        130 (9.2%)     715 (18.5%)     0.40 (0.32-0.51)   1.06x10^-13   129 (11.6%)    756 (19.9%)     0.55 (0.43-0.70)   9.97x10^-7
3        340 (24.1%)    1,089 (28.2%)   0.76 (0.64-0.92)   4.06x10^-3    263 (23.7%)    1,145 (30.2%)   0.72 (0.59-0.88)   1.25x10^-3
4        419 (29.7%)    979 (25.4%)     Ref                --            298 (26.8%)    921 (24.3%)     Ref                --
5        288 (20.4%)    523 (13.5%)     1.28 (1.05-1.57)   1.65x10^-2    251 (22.6%)    477 (12.6%)     1.71 (1.38-2.12)   1.08x10^-6
6-11     175 (12.4%)    213 (5.5%)      1.70 (1.31-2.19)   5.19x10^-5    137 (12.3%)    146 (3.9%)      2.83 (2.12-3.78)   1.41x10^-12
Total    1,411          3,860           1.36 (1.29-1.42)   1.16x10^-32   1,112          3,791           1.49 (1.41-1.58)   2.59x10^-45

a. Model includes age and sum of risk alleles for rs10069690, rs55705857, rs634537, rs12803321, rs78378222, and rs2297440. Reference is 4 alleles.

Discussion

This represents the first analysis of inherited risk variants in sporadic glioma focused

specifically on sex differences, and the first agnostic unbiased scan for glioma risk

variants on the X and Y sex chromosomes. Like many other types of cancer, there is a

male preponderance of glioma. This incidence difference is not currently explained by

known environmental or genetic risk factors.

One SNP at the 7p11.2 locus (rs11979158) showed a significant association in males only, in both all glioma and GBM (Table 2-2). Effects were similar in all studies included in the analysis (Table 2-3, Figure 2-13). This variant is within one of two previously identified independent glioma risk loci located near epidermal growth factor receptor (EGFR) and is most strongly associated with risk for GBM [4, 177]. Though EGFR is implicated in many cancer types and is a target for many anti-cancer therapies, this risk locus has not been previously associated with any other cancer type. Estrogen has been demonstrated to interact with EGFR as well as other growth factors [178]. Previous studies have not been definitive about the role of endogenous estrogen exposure in glioma risk, so it is not possible to determine the biological plausibility of this association [178]. Alternatively, cell-intrinsic, hormone-independent sex differences in EGF effects have been observed in a murine model of gliomagenesis in which EGF treatment was transforming for male but not female astrocytes that had been rendered null for neurofibromin and p53 function [179]. While this specific SNP was not genotyped on the germline genotyping array used for TCGA, a SNP in strong LD with rs11979158 (rs7785013, D'=1, r^2=1 in CEU [174]) was evaluated. The association in the case-only analysis in TCGA was not statistically significant in any histology group, but a trend in sex-specific RAF similar to that observed in the overall meta-analysis was observed in both the overall GBM group and the IDH1/2 wild type group.

Figure 2-13 Sex-specific odds ratios and 95% CI from meta-analysis and by study for rs11979158 (7p11.2) for all glioma, GBM, and non-GBM

The association at 8q24.21 (rs55705857) is the strongest that has been identified by glioma GWAS to date [4], with an odds ratio of 1.99 (95%CI=1.85-2.13, p=9.53x10^-79) in glioma overall, and an odds ratio of 3.39 (95%CI=3.09-3.71, p=7.28x10^-149) in non-GBM. rs55705857 is located in an intergenic region near coiled-coil domain containing 26 (CCDC26, a long non-coding RNA). This analysis found that this association is stronger in females than males in all glioma and non-GBM, where female odds ratio estimates are ~2x those of males (Table 2-2). ORs were higher in women than men in all studies included in the analysis, but the magnitude of the ORs varied between studies (Figure 2-14). Furthermore, the MAF for rs55705857 in the SFAGS-GWAS differed from the other three studies (see Table 2-14 for MAF by study). Consequently, a sensitivity analysis was conducted to assess the effect of study heterogeneity on this estimate in non-GBM using only the GICC, MDA-GWAS, and GliomaScan datasets. The results of this sensitivity analysis did not differ substantially from those of the main analysis (main analysis pD=1.20x10^-6 and sensitivity analysis pD=1.49x10^-5).

Figure 2-14 Sex-specific odds ratios and 95% CI from meta-analysis and by study for rs55705857 (8q24.21) for all glioma, GBM, and non-GBM

Table 2-14 Minor allele frequencies (MAF) for meta-analysis and individual studies for rs11979158, rs55705857 and rs9841110 overall and by histology groupings.

RSID (Locus), Histology
                   Males                                         Females
  Study            Cases N  Cases RAF  Controls N  Controls RAF  Cases N  Cases RAF  Controls N  Controls RAF
rs9841110 (3p21.31), All glioma
  Meta-analysis    4,831    0.307      5,176       0.313         3,206    0.291      5,417       0.337
  GICC             2,733    0.311      1,868       0.312         1,831    0.299      1,397       0.337
  SFAGS-GWAS       440      0.282      749         0.333         237      0.270      1,618       0.343
  MDA-GWAS         714      0.293      1,094       0.302         429      0.296      1,142       0.335
  GliomaScan       944      0.318      1,465       0.311         709      0.275      1,260       0.330
rs9841110 (3p21.31), GBM
  Meta-analysis    2,835    0.304      5,176       0.313         1,682    0.283      5,417       0.337
  GICC             1,575    0.310      1,868       0.312         885      0.287      1,397       0.337
  SFAGS-GWAS       333      0.279      749         0.333         178      0.253      1,618       0.343
  MDA-GWAS         397      0.290      1,094       0.302         246      0.307      1,142       0.335
  GliomaScan       530      0.313      1,465       0.311         373      0.269      1,260       0.330
rs9841110 (3p21.31), Non-GBM
  Meta-analysis    1,716    0.312      5,176       0.313         1,320    0.306      5,417       0.337
  GICC             1,036    0.317      1,868       0.312         862      0.314      1,397       0.337
  SFAGS-GWAS       107      0.290      749         0.333         59       0.322      1,618       0.343
  MDA-GWAS         317      0.297      1,094       0.302         183      0.281      1,142       0.335
  GliomaScan       256      0.325      1,465       0.311         216      0.294      1,260       0.330
rs9841110 (3p21.31), Astrocytoma (Non-GBM) (WHO grade II-III)
  Meta-analysis    787      0.296      4,427       0.309         585      0.300      3,799       0.334
  GICC             494      0.299      1,868       0.312         400      0.307      1,397       0.337
  MDA-GWAS         155      0.274      1,094       0.302         88       0.288      1,142       0.335
  GliomaScan       138      0.312      1,465       0.311         97       0.279      1,260       0.330
rs9841110 (3p21.31), Oligodendroglioma (WHO grade II-III)
  Meta-analysis    483      0.314      4,427       0.309         399      0.303      3,799       0.334
  GICC             310      0.321      1,868       0.312         266      0.312      1,397       0.337
  MDA-GWAS         95       0.289      1,094       0.302         54       0.241      1,142       0.335
  GliomaScan       78       0.314      1,465       0.311         79       0.316      1,260       0.330
rs11979158 (7p11.2), All glioma
  Meta-analysis    4,831    0.136      5,176       0.170         3,206    0.153      5,417       0.163
  GICC             2,733    0.135      1,868       0.186         1,831    0.149      1,397       0.164
  SFAGS-GWAS       440      0.125      749         0.154         237      0.141      1,618       0.151
  MDA-GWAS         714      0.137      1,094       0.166         429      0.164      1,142       0.166
  GliomaScan       944      0.146      1,465       0.163         709      0.160      1,260       0.173
rs11979158 (7p11.2), GBM
  Meta-analysis    2,835    0.128      5,176       0.170         1,682    0.139      5,417       0.163
  GICC             1,575    0.129      1,868       0.186         885      0.135      1,397       0.164
  SFAGS-GWAS       333      0.122      749         0.154         178      0.126      1,618       0.151
  MDA-GWAS         397      0.134      1,094       0.166         246      0.140      1,142       0.166
  GliomaScan       530      0.128      1,465       0.163         373      0.151      1,260       0.173
rs11979158 (7p11.2), Non-GBM
  Meta-analysis    1,716    0.145      5,176       0.170         1,320    0.167      5,417       0.163
  GICC             1,036    0.142      1,868       0.186         862      0.164      1,397       0.164
  SFAGS-GWAS       107      0.136      749         0.154         59       0.186      1,618       0.151
  MDA-GWAS         317      0.140      1,094       0.166         183      0.197      1,142       0.166
  GliomaScan       256      0.162      1,465       0.163         216      0.150      1,260       0.173
rs11979158 (7p11.2), Astrocytoma (Non-GBM) (WHO grade II-III)
  Meta-analysis    787      0.134      4,427       0.066         585      0.136      3,799       0.057
  GICC             494      0.142      1,868       0.059         400      0.133      1,397       0.051
  MDA-GWAS         155      0.131      1,094       0.070         88       0.141      1,142       0.059
  GliomaScan       138      0.107      1,465       0.073         97       0.148      1,260       0.061
rs11979158 (7p11.2), Oligodendroglioma (WHO grade II-III)
  Meta-analysis    483      0.168      4,427       0.066         399      0.207      3,799       0.057
  GICC             310      0.168      1,868       0.059         266      0.224      1,397       0.051
  MDA-GWAS         95       0.196      1,094       0.070         54       0.146      1,142       0.059
  GliomaScan       78       0.132      1,465       0.073         79       0.194      1,260       0.061
rs55705857 (8q24.21), All glioma
  Meta-analysis    4,831    0.096      5,176       0.066         3,206    0.112      5,417       0.055
  GICC             2,733    0.097      1,868       0.059         1,831    0.113      1,397       0.051
  SFAGS-GWAS       440      0.070      749         0.063         237      0.096      1,618       0.050
  MDA-GWAS         714      0.113      1,094       0.070         429      0.106      1,142       0.059
  GliomaScan       944      0.092      1,465       0.073         709      0.120      1,260       0.061
rs55705857 (8q24.21), GBM
  Meta-analysis    2,835    0.072      5,176       0.066         1,682    0.075      5,417       0.055
  GICC             1,575    0.069      1,868       0.059         885      0.062      1,397       0.051
  SFAGS-GWAS       333      0.068      749         0.063         178      0.085      1,618       0.050
  MDA-GWAS         397      0.080      1,094       0.070         246      0.081      1,142       0.059
  GliomaScan       530      0.078      1,465       0.073         373      0.100      1,260       0.061
rs55705857 (8q24.21), Non-GBM
  Meta-analysis    1,716    0.135      5,176       0.066         1,320    0.159      5,417       0.055
  GICC             1,036    0.140      1,868       0.059         862      0.168      1,397       0.051
  SFAGS-GWAS       107      0.073      749         0.063         59       0.130      1,618       0.050
  MDA-GWAS         317      0.154      1,094       0.070         183      0.139      1,142       0.059
  GliomaScan       256      0.116      1,465       0.073         216      0.152      1,260       0.061
rs55705857 (8q24.21), Astrocytoma (Non-GBM) (WHO grade II-III)
  Meta-analysis    787      0.134      4,427       0.066         585      0.136      3,799       0.057
  GICC             494      0.142      1,868       0.059         400      0.133      1,397       0.051
  MDA-GWAS         155      0.131      1,094       0.070         88       0.141      1,142       0.059
  GliomaScan       138      0.107      1,465       0.073         97       0.148      1,260       0.061
rs55705857 (8q24.21), Oligodendroglioma (WHO grade II-III)
  Meta-analysis    483      0.168      4,427       0.066         399      0.207      3,799       0.057
  GICC             310      0.168      1,868       0.059         266      0.224      1,397       0.051
  MDA-GWAS         95       0.196      1,094       0.070         54       0.146      1,142       0.059
  GliomaScan       78       0.132      1,465       0.073         79       0.194      1,260       0.061

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

A histology-specific analysis found similar sex differences in ORs for rs55705857 for both non-GBM astrocytoma and oligodendroglioma (Table 2-6). Previous analyses have shown that this variant is strongly associated with IDH1/2 mutant tumors, particularly those that have 1p/19q deletions [160, 161]. Data on IDH1/2 mutation and 1p/19q codeletion were not available for this dataset, and as a result it was not possible to further explore molecular subtype associations. To assess potential differences in frequency of IDH1/2 mutant GBM, the frequency of these mutations was compared by sex within the TCGA GBM dataset [168]. Approximately the same proportion of males as females had IDH1/2 mutations (8.1% vs 7.5%, respectively), so it does not seem likely that females are more likely than males to present with IDH1/2 mutant GBM.

I identified a large region in 3p21.31 that was associated with all glioma and GBM in females only (Table 2-2). Effects were similar in all studies included in the analysis (see Table 2-14 for MAF by study, and Figure 2-15). The strongest association in this region was rs9841110, which is an intronic variant located upstream of dystroglycan 1 (DAG1) within an enhancer region. The identified risk allele at rs9841110 (C) is associated with significantly increased expression of glutathione peroxidase 1 (GPX1, p=1.3x10-7 in cerebellum and p=3.1x10-7 in frontal cortex), macrophage stimulating 1 receptor (MST1R [RON], p=4.3x10-5 in cerebellar hemisphere) and ring finger protein 123 (RNF123 [KPC1]), and significantly decreased expression of macrophage stimulating 1 (MST1, p=4.8x10-6 in hypothalamus and p=1.5x10-5 in cerebellum) and RNA binding motif protein 6 (RBM6, p=2.7x10-6 in cerebellum) in normal brain tissue [180]. GBM samples have elevated expression of GPX1 (fold change 2.79) and decreased expression of MST1R (fold change 0.44) as compared to normal tissue [163, 181], and increased expression of GPX1 and MST1R has been associated with poor prognosis in multiple cancer types [182, 183].


Though this region has not previously been associated with glioma, previous GWAS have detected associations at 3p21.31 for a large variety of traits, including several autoimmune diseases as well as increased age at menarche [184-187]. Three variants previously associated with increased age at menarche (rs7647973-A: D’=1.0 and r2=0.1441; rs6762477-G: D’=0.66 and r2=0.1659; rs7617480-A: D’=1.0 and r2=0.1332) are in linkage disequilibrium with the identified risk allele at rs9841110 (C) in CEU [174, 185]. If lifetime estrogen exposure modifies glioma risk, it is plausible that variants which increase age at menarche, and which may therefore decrease total lifetime estrogen exposure, are also related to glioma risk in females. Due to the complexity of measuring lifetime estrogen exposure (which is affected by age at menarche, age at menopause, parity, breast feeding patterns, and estrogen replacement therapy post-menopause) it is difficult to determine the ‘true’ effect that this exposure might have on glioma risk.

Figure 2-15 Sex-specific odds ratios and 95% CI from meta-analysis and by study for rs9841110 (3p21.31) for all glioma, GBM, and non-GBM

As compared to a model containing age at diagnosis and sex alone, the three SNPs (rs55705857, rs9841110 and rs11979158) identified as having sex-specific effects explain an additional 1.4% of trait variance within the GICC set. The variance explained by these SNPs varies by histology (0.7% in GBM, and 3.3% in non-GBM). The variance explained by the addition of these three SNPs was higher in females for all glioma (1.3% in males and 2.2% in females) and non-GBM glioma (2.3% in males and 5.3% in females), and slightly higher in males for GBM (0.9% in males and 0.7% in females).

In order to compare the cumulative effects of glioma risk variants by sex, unweighted risk scores (URS) were generated by summing all risk alleles using the ten SNPs found to be significantly associated with glioma in this analysis. GBM-specific (URS-GBM) and non-GBM-specific (URS-NGBM) URS were calculated using sets of six SNPs in this set that were significantly associated with these histologies. Individuals with lower numbers of risk alleles had significantly lower risk of glioma, and those with higher numbers of risk alleles had increased risk of glioma, with statistically significant trends in each histology group. Males and females with low risk scores had similar odds of glioma, while females had increased odds in the upper strata of scores as compared to males. Development of risk scores that weight alleles by effect size, and that use sex-specific estimates for variants for which effect size varies by sex (such as 7p11.2 and 8q24.21), may lead to better predictive values for risk scores.
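The unweighted risk score described above amounts to counting risk alleles across a fixed SNP set. The following is a minimal sketch of that counting step only; the SNP names (snp_a, snp_b, snp_c), their risk alleles, and the example genotype are hypothetical placeholders, not the ten SNPs or the data used in this analysis.

```python
def unweighted_risk_score(genotype, risk_alleles):
    """Count risk alleles carried by one individual across a SNP set.

    genotype: dict mapping SNP name -> tuple of the two observed alleles
    risk_alleles: dict mapping SNP name -> risk allele
    """
    score = 0
    for snp, risk in risk_alleles.items():
        # Each SNP contributes 0, 1 or 2 to the score
        score += sum(allele == risk for allele in genotype[snp])
    return score


# Hypothetical SNP set and individual (placeholders, not study data)
risk_alleles = {"snp_a": "C", "snp_b": "A", "snp_c": "T"}
person = {"snp_a": ("C", "G"), "snp_b": ("A", "A"), "snp_c": ("C", "C")}

score = unweighted_risk_score(person, risk_alleles)
print(score)
```

A weighted score, as suggested in the text, would multiply each allele count by an effect size (for example a sex-specific log odds ratio) before summing.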

This is the first sex-specific analysis of germline risk variants for glioma. It leverages multiple existing glioma GWAS datasets and identifies three loci with sex-specific effects. While often not included in GWAS, sex-stratified analyses can reveal genetic sources of sexual dimorphism in risk [124, 125]. Sex variation in genetic susceptibility to disease is likely not due to sex differences in DNA sequence, but is likely to be related to sex-specific regulatory functions [129-131]. These analyses may not only contribute to understanding of sources of sex difference in incidence, but may also suggest mechanisms and pathways that vary by sex in contributions to gliomagenesis.

In addition to genetic sources of difference, there are likely several additional factors acting in combination which contribute to sex differences in glioma incidence. Sex differences in disease can also be linked to in utero development, during which time gene expression and risk phenotypes are patterned through the action of X alleles that escape inactivation and genes on the non-pseudo-autosomal component of the Y chromosome, as well as the epigenetic effects of in utero testosterone [188]. A previous analysis estimating heritability of brain and CNS tumors by sex using twins attempted to estimate sex-specific relative risks, but these analyses were limited by a small sample size [189].

Further investigation of the inheritance patterns of familial glioma by sex may also provide additional information about sex differences in this disease.

There are several limitations to this analysis. Individuals included in these datasets were recruited during different time periods from numerous institutions, with no central review of pathology. Molecular tumor markers were unavailable for all datasets, and as a result classifications are based on the treating pathologist using the prevailing histologic criteria at time of diagnosis. The variant at 8q24.21 has been shown to have significant association with particular molecular subtypes, and without molecular data it is not possible to determine whether the observed result is an artifact of varying molecular features by sex. Oligodendroglioma as a histology is highly enriched for IDH1/2 mutant and 1p/19q co-deleted tumors (117/174, or ~67%, within the TCGA glioma dataset [168]), and it is therefore likely that the analysis using only tumors classified as oligodendroglioma captured most of this molecular subtype. Males and females within histology groups have different frequencies of IDH1/2 mutation [168], which may confound the estimates for 8q24.21.

Many potential environmental risk factors (e.g. occupational exposures) and risk behaviors are also strongly correlated with sex. Sex differences in disease could additionally be linked to varying conditions in utero, where X-linked and other genes with sex-differentiated expression could play a role in neural development.

Conclusions

Sex and other demographic differences in cancer susceptibility can provide important clues to etiology, and these differences can be leveraged for discovery in genetic association studies. This analysis identified potential sex-specific effects in two previously identified glioma risk loci (7p11.2 and 8q24.21), and one newly identified autosomal locus (3p21.31). Odds ratios for the highest strata of an unweighted risk score calculated by summing total risk alleles were higher in females as compared to males in all three histology groups. These significant differences in effect size may be a result of differing biological function of these variants by sex due to biological sex differences, or interaction between these variants and unidentified risk factors that vary in prevalence or effect by sex.


Chapter 3 – Sex-specific gene and pathway modeling of inherited glioma risk

Abstract

Genome-wide association studies (GWAS) have identified 25 risk variants for glioma, and these explain ~27% and ~34% of heritability for GBM and non-GBM, respectively. Genetic risk for these tumors is likely polygenic; gene- and pathway-based analyses may increase power to detect new sources of genetic risk. Most glioma histologies occur with significantly higher incidence in males. A sex-stratified analysis has identified glioma risk variants that differ by sex, and further analyses using gene- and pathway-based approaches may further elucidate risk variation by sex. Results from the Glioma International Case-Control (GICC) Study were used as a testing set, and results from three prior glioma GWAS were combined via inverse-variance weighted fixed effects meta-analysis and used as a validation set for any significant genes and pathways. Using summary statistics for autosomal markers found to be nominally significant (p<0.01) in a previous eight-study meta-analysis and X chromosome markers with nominally significant single SNP association (p<0.01), three algorithms (Pascal, BimBam, and GATES) were used to generate gene-scores using all SNPs within 50kb of each gene, and Pascal was used to generate pathway scores. 25 genes within five regions reached the set significance threshold (p<3.3x10-6) in at least two of three algorithms in all glioma or GBM in males, and 19 genes within six regions reached the set significance threshold for females (p<3.3x10-6) in at least two of three algorithms in all glioma or GBM. EGFR and RTEL1-TNFRSF6B were significantly associated with all glioma and GBM in males only, and these associations remained nominally significant after conditioning on known risk loci. There was also a female-specific association in TERT which remained nominally significant after conditioning on previous GWAS hits. There were nominal associations with the Telomeres, Telomerase, Cellular Aging, and Immortality pathway in both males and females in all glioma and GBM. These results suggest that there may be biologically relevant significant differences by sex in genetic risk for glioma. Further gene- and pathway-based analyses may further elucidate the biological processes through which this risk is conferred.

Background

Glioma is the most common type of primary malignant brain tumor in the US, with an average annual age-adjusted incidence rate of 6.0/100,000 [1]. Each individual GWAS results in regression estimates for hundreds of thousands of single nucleotide polymorphisms (SNPs), only several hundred of which may be prioritized for further investigation. While this process is appropriate for identifying individual loci that contribute to the development of disease, there is likely additional information about disease risk within results that do not meet thresholds for statistical significance. Single-SNP tests may not be appropriate for discovery of additional loci given the known biological complexity of gliomas. Multi-SNP methods, such as gene- or pathway-based approaches, can allow for additional discovery in a manner that complements single-SNP approaches, while substantially reducing the multiple testing burden associated with GWAS [107]. A recent sex-stratified analysis has identified glioma risk loci that differ by sex (Chapter 2), and further analyses using gene- and pathway-based approaches may further elucidate sex differences in genetic risk for glioma.

Methods

Using summary statistics for autosomal markers found to be nominally significant (p<0.01) in a previous eight-study meta-analysis [4] (Figure 3-1a) and X chromosome markers with nominally significant single SNP association (p<0.01), three algorithms, Pascal [109], BimBam [110], and GATES [111], were used to generate gene-scores. Gene-based effects were assessed using all SNPs within 50kb of each gene (using 5’ and 3’ UTR) as defined using the UCSC hg19 assembly. Results from the GICC were used as a testing set (Figure 3-1a), and results from three prior glioma GWAS (SFAGS-GWAS, MDA-GWAS, and GliomaScan) were combined via inverse-variance weighted fixed effects meta-analysis in META [173] and used as a validation set for any significant genes and pathways (Figure 3-1). See Table 3-1 for an overview of characteristics for individuals included in these datasets.
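To make the validation-set construction concrete, the following is a minimal sketch of a standard inverse-variance weighted fixed-effects estimate for a single SNP. It illustrates the general technique only and is not the META software itself; the function name and the per-study betas and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    betas: per-study log odds ratios
    ses: per-study standard errors
    Returns the pooled beta, its standard error, the z statistic and a
    two-sided normal p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]           # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided p-value
    return beta, se, z, p


# Hypothetical estimates from three studies (not values from this analysis)
betas = [0.20, 0.10, 0.30]
ses = [0.10, 0.20, 0.10]
beta, se, z, p = fixed_effect_meta(betas, ses)
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest and most precise datasets.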

Summary statistics were generated using sex-stratified logistic regression models in SNPTEST [170] (see Chapter 2, Section 2.3.3.1 for additional information on these analyses). Autosomal chromosomes were analyzed using sex-stratified logistic regression models to estimate sex-specific betas (βM and βF), standard errors (SEM and SEF), and p-values (pM and pF). X chromosome data were available from the GICC set only, and were analyzed using a logistic regression model in the SNPTEST module ‘newml’, assuming complete inactivation of one allele in females and treating males as homozygous females (see Chapter 2, Section 2.3.3.3 for additional information on these analyses).
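The X chromosome coding described above can be illustrated with a short sketch. It shows only the genotype coding implied by the stated assumption (complete inactivation of one allele in females, males treated as homozygous females); it is not a description of how SNPTEST implements this internally, and the function name is hypothetical.

```python
def x_dosage(sex, n_alt):
    """Code an X chromosome genotype on a 0-2 dosage scale.

    Under the assumed complete inactivation of one allele in females,
    a hemizygous male carrier is coded like a homozygous female (2).
    """
    if sex == "F":
        if n_alt not in (0, 1, 2):
            raise ValueError("females carry 0, 1 or 2 copies of an allele")
        return n_alt
    if sex == "M":
        if n_alt not in (0, 1):
            raise ValueError("males carry 0 or 1 copy of an allele")
        return 2 * n_alt
    raise ValueError("sex must be 'M' or 'F'")


# Heterozygous female, male carrier, male non-carrier
dosages = [x_dosage("F", 1), x_dosage("M", 1), x_dosage("M", 0)]
print(dosages)
```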


Linkage disequilibrium information was based on structure within the European samples from the 1000 Genomes Project phase 3 dataset. All analyses were performed separately for males and females to identify genes and pathways with germline variation between cases and controls. Genes were prioritized if they were identified by at least two of the three selected algorithms (Figure 3-1b, Figure 3-1c). Analyses were conducted for glioma overall, and for GBM only, by sex within each dataset.
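The prioritization rule (a gene is retained when at least two of the three algorithms reach the discovery threshold) can be sketched as follows. The threshold of 3.3x10-6 is the discovery threshold reported in the Results, and the two example genes use p-values as reported in Table 3-2 and Table 3-3; the function names are hypothetical, and a BimBam value reported as ≤1.00x10-6 is entered as its upper bound.

```python
THRESHOLD = 3.3e-6  # discovery significance threshold reported in the Results


def n_significant(p_values, threshold=THRESHOLD):
    """Number of algorithms whose gene score falls below the threshold."""
    return sum(p < threshold for p in p_values.values())


def is_prioritized(p_values, min_algorithms=2):
    """Apply the 'at least two of three algorithms' rule."""
    return n_significant(p_values) >= min_algorithms


# DNAH2, all glioma, males (discovery set)
dnah2_males = {"Pascal": 7.21e-4, "BimBam": 1.00e-6, "GATES": 4.01e-7}
# EGFR, all glioma, females (discovery set)
egfr_females = {"Pascal": 5.85e-4, "BimBam": 0.0023, "GATES": 0.0085}

print(n_significant(dnah2_males), is_prioritized(dnah2_males))
print(n_significant(egfr_females), is_prioritized(egfr_females))
```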

Figure 3-1. Study schematic for a) generation of discovery and validation summary statistic sets, b) generation of discovery gene-based tests and prioritization c) validation of gene-based tests, d) generation of discovery pathway-based tests and prioritization e) validation of pathway-based tests


Table 3-1 Demographic characteristics of included GWAS studies

| Characteristic | Study | Males: Cases | Males: Controls | Females: Cases | Females: Controls |
|---|---|---|---|---|---|
| N | Total | 4,831 | 5,216 | 3,206 | 5,470 |
| | GICC (a) | 2,733 | 1,868 | 1,831 | 1,397 |
| | SFAGS-GWAS (b) | 440 | 749 | 237 | 1,611 |
| | MDA-GWAS (c) | 714 | 1,094 | 429 | 1,142 |
| | GliomaScan (d) | 944 | 1,465 | 709 | 1,260 |
| Mean Age (SD) | Total | 52.5 (14.5) | 58.2 (15.2) | 51.8 (14.9) | 54.7 (14.5) |
| | GICC | 52.5 (14.3) | 56.1 (13.4) | 51.3 (14.6) | 53.4 (14.3) |
| | SFAGS-GWAS | 53.8 (13.0) | 50.6 (14.8) | 53.5 (14.0) | 49.3 (13.2) |
| | MDA-GWAS | 47.1 (13.0) | Modal age group: 60-69 (e) | 47.7 (13.9) | Modal age group: 65-69 (f) |
| | GliomaScan | 56.0 (15.5) | 69.3 (12.7) | 55.1 (15.7) | 64.0 (15.4) |
| GBM (% of total) | Total | 2,835 (58.7%) | -- | 1,682 (52.5%) | -- |
| | GICC | 1,575 (57.6%) | -- | 885 (48.3%) | -- |
| | SFAGS-GWAS | 333 (75.7%) | -- | 178 (75.1%) | -- |
| | MDA-GWAS | 397 (55.6%) | -- | 246 (57.3%) | -- |
| | GliomaScan | 530 (56.1%) | -- | 373 (52.6%) | -- |
| GBM - Mean Age (SD) | Total | 57.3 (12.0) | -- | 57.8 (12.1) | -- |
| | GICC | 57.7 (11.4) | -- | 57.8 (11.6) | -- |
| | SFAGS-GWAS | 56.4 (11.5) | -- | 56.2 (12.3) | -- |
| | MDA-GWAS | 52.0 (11.7) | -- | 53.7 (11.3) | -- |
| | GliomaScan | 60.4 (13.0) | -- | 61.4 (12.5) | -- |

a. Data from Glioma International Case-Control Study (GICC; Melin, et al. [4]).
b. Data from San Francisco Adult Glioma Study GWAS (SFAGS-GWAS; Wrensch, et al. [9]).
c. Data from MD Anderson Cancer Center GWAS (MDA-GWAS; Shete, et al. [8]).
d. Data from the National Cancer Institute’s GliomaScan (GliomaScan; Rajaraman, et al. [7]).
e. Data from CGEMS prostate study (Yeager et al. [152]). Continuous age is not available; age distribution is as follows: 50-59: 12.3%, 60-69: 56.7%, 70-79: 30.7%, 80-89: 0.3%.
f. Data from CGEMS breast study (Hunter et al. [153]). Continuous age is not available; age distribution is as follows: 0-54: 4.3%, 55-59: 15.0%, 60-64: 23.6%, 65-69: 27.5%, 70-74: 19.0%, 75-99: 10.7%.

Gene scores were calculated using Pascal [109], BimBam [110] and GATES [111] (both BimBam and GATES were as implemented in FAST [118]). For all methods implemented within FAST, SNPs were excluded if they were in complete LD (r2=1) with another SNP in the gene, which limited the number of SNPs evaluated within each gene.

Pathway scores were generated using Pascal [109], using gene and fusion-gene scores generated by the Pascal algorithm (Figure 3-1d, Figure 3-1e). The pathway score was then calculated using both independent and fusion genes. Results from each gene and pathway algorithm were compared within each sex as well as between sexes. Pathway information was obtained from KEGG, Reactome, and BioCarta, as made available in MSigDB [190-194].


For genes within regions that contain SNPs previously identified as significant by GWAS, conditional analyses were run for all SNPs within those regions using SNPTEST and adjusted gene-scores were calculated. All figures were generated using R 3.3.2, ggplot2, graphite, network, intergraph, ggnetwork, igraph, and gridExtra [195-200].

Results

159,706 SNPs from the testing set and 163,115 SNPs from the validation set were included in gene-based analyses. Gene scores were generated for ~16,000 genes and were considered significant at p<3.3x10-6 (based on a Bonferroni correction for 15,000 tests). P-values in the validation set were considered significant at p<0.001 (based on a Bonferroni correction for 50 tests [25 total genes tested in each sex]).
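The two thresholds above follow from dividing a family-wise error rate by the number of tests. The sketch below reproduces that arithmetic; the family-wise alpha of 0.05 is an assumption, chosen because it is consistent with the reported thresholds, and the function name is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction.

    alpha=0.05 is an assumed family-wise error rate, consistent with the
    thresholds reported in the text.
    """
    return alpha / n_tests


discovery = bonferroni_threshold(15000)   # gene scores, discovery set
validation = bonferroni_threshold(50)     # 25 genes tested in each sex

print(f"discovery:  {discovery:.2e}")
print(f"validation: {validation:.3f}")
```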

3.4.1 Sex-specific gene scores

Among males, 25 genes within five regions had scores that reached the set significance threshold (p<3.3x10-6) in at least two of three evaluated algorithms in all glioma or GBM (see Figure 3-2 and Table 3-2 for the strongest associations within each region where genes met the set significance threshold). Among females, 19 genes within six regions had scores that reached the set significance threshold (p<3.3x10-6) in at least two of three evaluated algorithms in all glioma or GBM (see Figure 3-2 and Table 3-3 for the strongest associations within each of the six regions where genes met the set significance threshold). Solute carrier family 6 member 18 (SLC6A18), telomerase reverse transcriptase (TERT), cyclin dependent kinase inhibitor 2B (CDKN2B), and stathmin 3 (STMN3) reached the set significance threshold in both males and females in GBM, while SLC6A18, TERT, and STMN3 reached the set significance threshold in both sexes in all glioma. All shared associations replicated.

Figure 3-2 Gene scores for prioritized genes by algorithm, histology, and sex for A) BPESC1 (3q23), B) SLC6A18 (5p15.33), C) TERT (5p15.33), D) EGFR (7p11.2), E) CDKN2B (9p21.3), F) DNAH2 (17p13.1), G) STMN3 (20q13.33), H) RTEL1-TNFRSF6B (20q13.33)

Table 3-2 Gene scores in males for prioritized genes by algorithm and histology

| Gene (location) | Histology | Discovery: Pascal | Discovery: BimBam | Discovery: GATES | Discovery: Algorithms p<3.3x10-6 | Validation: Pascal | Validation: BimBam | Validation: GATES | Validation: Algorithms p<0.001 |
|---|---|---|---|---|---|---|---|---|---|
| BPESC1 (3q23) | All glioma | 0.1628 | 0.3621 | 1.0000 | 0/3 | 0.8506 | 1.0000 | 1.0000 | 0/3 |
| | GBM | 0.2889 | 0.7328 | 1.0000 | 0/3 | 0.3963 | 0.3103 | 0.3744 | 0/3 |
| SLC6A18 (5p15.33) | All glioma | 4.08x10-10 | ≤1.00x10-6 | 3.26x10-13 | 3/3 | 1.78x10-11 | ≤1.00x10-6 | 3.81x10-15 | 3/3 |
| | GBM | 1.00x10-12 | ≤1.00x10-6 | 1.20x10-17 | 3/3 | 1.34x10-11 | ≤1.00x10-6 | 4.88x10-15 | 3/3 |
| TERT (5p15.33) | All glioma | 6.92x10-8 | ≤1.00x10-6 | 4.99x10-13 | 3/3 | 4.21x10-9 | ≤1.00x10-6 | 6.00x10-15 | 3/3 |
| | GBM | 1.69x10-10 | ≤1.00x10-6 | 1.84x10-17 | 3/3 | 7.20x10-10 | ≤1.00x10-6 | 7.67x10-15 | 3/3 |
| EGFR (7p11.2) | All glioma | 1.00x10-12 | ≤1.00x10-6 | 2.60x10-9 | 3/3 | 6.63x10-6 | 4.99x10-4 | 6.15x10-4 | 3/3 |
| | GBM | 1.00x10-12 | ≤1.00x10-6 | 7.75x10-9 | 3/3 | 4.84x10-7 | 1.45x10-4 | 2.04x10-4 | 3/3 |
| CDKN2B (9p21.3) | All glioma | 1.00x10-12 | ≤1.00x10-6 | 3.65x10-13 | 3/3 | 1.04x10-8 | ≤1.00x10-6 | 2.41x10-8 | 3/3 |
| | GBM | 1.00x10-12 | ≤1.00x10-6 | 4.54x10-13 | 3/3 | 4.81x10-9 | ≤1.00x10-6 | 2.87x10-8 | 3/3 |
| DNAH2 (17p13.1) | All glioma | 7.21x10-4 | ≤1.00x10-6 | 4.01x10-7 | 2/3 | 7.64x10-6 | ≤1.00x10-6 | 4.11x10-8 | 3/3 |
| | GBM | 0.0023 | 3.76x10-5 | 7.94x10-6 | 0/3 | 9.60x10-5 | 2.00x10-6 | 7.88x10-7 | 3/3 |
| STMN3 (20q13.33) | All glioma | 8.96x10-10 | ≤1.00x10-6 | 4.37x10-8 | 3/3 | 9.90x10-12 | ≤1.00x10-6 | 2.03x10-11 | 3/3 |
| | GBM | 3.96x10-11 | ≤1.00x10-6 | 2.94x10-9 | 3/3 | 1.13x10-9 | ≤1.00x10-6 | 9.53x10-9 | 3/3 |
| RTEL1-TNFRSF6B (20q13.33) | All glioma | 6.42x10-10 | ≤1.00x10-6 | 4.71x10-8 | 3/3 | 1.00x10-12 | ≤1.00x10-6 | 2.18x10-11 | 3/3 |
| | GBM | 7.59x10-11 | ≤1.00x10-6 | 3.17x10-9 | 3/3 | 9.19x10-11 | ≤1.00x10-6 | 1.02x10-8 | 3/3 |

Abbreviations: BPESC1: blepharophimosis, epicanthus inversus and ptosis, candidate 1 (non-protein coding); SLC6A18: solute carrier family 6 member 18; TERT: telomerase reverse transcriptase; EGFR: epidermal growth factor receptor; CDKN2B-AS1: CDKN2B antisense RNA 1; CDKN2B: cyclin dependent kinase inhibitor 2B; DNAH2: dynein axonemal heavy chain 2; STMN3: stathmin 3; RTEL1-TNFRSF6B: RTEL1-TNFRSF6B readthrough (NMD candidate).


Table 3-3 Gene scores in females for prioritized genes by algorithm and histology

| Gene (location) | Histology | Discovery: Pascal | Discovery: BimBam | Discovery: GATES | Discovery: Algorithms p<3.3x10-6 | Validation: Pascal | Validation: BimBam | Validation: GATES | Validation: Algorithms p<0.001 |
|---|---|---|---|---|---|---|---|---|---|
| BPESC1 (3q23) | All glioma | 9.42x10-6 | 1.08x10-5 | 2.36x10-5 | 0/3 | 0.4891 | 1.0000 | 1.0000 | 0/3 |
| | GBM | 7.69x10-7 | 2.00x10-6 | 1.73x10-6 | 3/3 | 0.8286 | 1.0000 | 1.0000 | 0/3 |
| SLC6A18 (5p15.33) | All glioma | 5.57x10-9 | ≤1.00x10-6 | 1.91x10-11 | 3/3 | 1.29x10-8 | ≤1.00x10-6 | 7.43x10-10 | 3/3 |
| | GBM | 1.45x10-10 | ≤1.00x10-6 | 1.20x10-14 | 3/3 | 3.27x10-7 | ≤1.00x10-6 | 2.68x10-9 | 3/3 |
| TERT (5p15.33) | All glioma | 1.69x10-7 | ≤1.00x10-6 | 2.93x10-11 | 3/3 | 4.33x10-9 | ≤1.00x10-6 | 1.17x10-9 | 3/3 |
| | GBM | 1.41x10-8 | ≤1.00x10-6 | 1.83x10-14 | 3/3 | 2.97x10-8 | ≤1.00x10-6 | 4.21x10-9 | 3/3 |
| EGFR (7p11.2) | All glioma | 5.85x10-4 | 0.0023 | 0.0085 | 0/3 | 0.0031 | 0.0112 | 0.0331 | 0/3 |
| | GBM | 1.38x10-5 | 1.44x10-4 | 1.35x10-4 | 0/3 | 2.56x10-5 | 0.0021 | 0.0065 | 1/3 |
| CDKN2B (9p21.3) | All glioma | 7.42x10-7 | 4.00x10-6 | 4.98x10-6 | 1/3 | 7.51x10-8 | ≤1.00x10-6 | 5.35x10-7 | 3/3 |
| | GBM | 5.53x10-9 | ≤1.00x10-6 | 1.63x10-8 | 3/3 | 8.99x10-8 | ≤1.00x10-6 | 5.93x10-7 | 3/3 |
| DNAH2 (17p13.1) | All glioma | 2.41x10-4 | 3.07x10-4 | 3.08x10-4 | 0/3 | 0.0012 | 7.75x10-6 | 1.96x10-6 | 2/3 |
| | GBM | 0.0036 | 0.0032 | 0.0067 | 0/3 | 0.0037 | 1.41x10-4 | 3.31x10-5 | 2/3 |
| STMN3 (20q13.33) | All glioma | 2.12x10-8 | 2.00x10-6 | 4.17x10-6 | 2/3 | 6.21x10-6 | 2.00x10-6 | 2.69x10-6 | 3/3 |
| | GBM | 2.39x10-8 | 3.00x10-6 | 7.25x10-6 | 2/3 | 2.42x10-7 | ≤1.00x10-6 | 1.30x10-8 | 3/3 |
| RTEL1-TNFRSF6B (20q13.33) | All glioma | 9.64x10-8 | 1.02x10-5 | 1.14x10-5 | 1/3 | 6.04x10-7 | 2.00x10-6 | 2.89x10-6 | 3/3 |
| | GBM | 4.52x10-8 | 4.00x10-6 | 7.82x10-6 | 1/3 | 1.11x10-8 | ≤1.00x10-6 | 1.40x10-8 | 3/3 |

Abbreviations: BPESC1: blepharophimosis, epicanthus inversus and ptosis, candidate 1 (non-protein coding); SLC6A18: solute carrier family 6 member 18; TERT: telomerase reverse transcriptase; EGFR: epidermal growth factor receptor; CDKN2B-AS1: CDKN2B antisense RNA 1; CDKN2B: cyclin dependent kinase inhibitor 2B; DNAH2: dynein axonemal heavy chain 2; STMN3: stathmin 3; RTEL1-TNFRSF6B: RTEL1-TNFRSF6B readthrough (NMD candidate).

Epidermal growth factor receptor (EGFR), dynein axonemal heavy chain 2 (DNAH2), and several genes surrounding regulator of telomere elongation 1 (RTEL1) on chromosome 20 (including the RTEL1-TNFRSF6B readthrough [RTEL1-TNFRSF6B]) reached the significance threshold in males only (Figure 3-2). In all glioma, CDKN2B reached the set significance threshold in males only. All genes replicated in males. Blepharophimosis, epicanthus inversus and ptosis, candidate 1 (non-protein coding) (BPESC1) reached the significance threshold in all glioma in females only (Figure 3-2), but this association did not replicate.

3.4.2 Gene scores conditioned on previously identified glioma risk loci

The association in EGFR was nominally significant in males after conditioning on three SNPs previously identified by GWAS within this gene (rs75061358, rs723527, and rs11979158), including one (rs11979158) that has previously been identified as having a sex-specific effect (see Chapter 2; Figure 3-3, Table 3-4). Associations in STMN3 and RTEL1-TNFRSF6B were also nominally significant after conditioning in both males and females (Figure 3-3, Table 3-4, Table 3-5). The association at TERT was nominally significant for females in GBM only after conditioning on the previously identified SNP (Figure 3-3, Table 3-5).

Figure 3-3 Conditional gene scores for prioritized genes by algorithm, histology, and sex for A) SLC6A18 (5p15.33), B) TERT (5p15.33), C) EGFR (7p11.2), D) CDKN2B (9p21.3), E) DNAH2 (17p13.1), F) RTEL1-TNFRSF6B (20q13.33)

Table 3-4 Conditional gene scores in males for prioritized genes by algorithm and histology

| Gene | Histology | Discovery: Pascal | Discovery: BimBam | Discovery: GATES | Validation: Pascal | Validation: BimBam | Validation: GATES |
|---|---|---|---|---|---|---|---|
| SLC6A18 (a) | All glioma | 0.0947 | 0.0806 | 0.1093 | 0.2174 | 0.0123 | 0.0163 |
| | GBM | 0.1178 | 0.0169 | 0.0213 | 0.1178 | 0.0120 | 0.0156 |
| TERT (a) | All glioma | 0.2761 | 0.0726 | 0.1538 | 0.6647 | 0.0236 | 0.0262 |
| | GBM | 0.4487 | 0.0233 | 0.0326 | 0.4487 | 0.0146 | 0.0252 |
| EGFR (b) | All glioma | 0.0081 | 9.45x10-4 | 0.0016 | 0.0559 | 0.0013 | 0.0017 |
| | GBM | 0.0015 | 0.0004 | 0.0016 | 0.0295 | 0.0133 | 0.0865 |
| CDKN2B (c) | All glioma | 0.3343 | 0.2419 | 0.2336 | 0.4163 | 0.3065 | 0.3226 |
| | GBM | 0.2599 | 0.0124 | 0.0751 | 0.2599 | 0.1452 | 0.2118 |
| DNAH2 (d) | All glioma | 0.0666 | 0.0565 | 0.3101 | 0.7448 | 0.9758 | 1.0000 |
| | GBM | 0.5835 | 0.5161 | 1.0000 | 0.5835 | 0.6694 | 0.4353 |
| STMN3 (e) | All glioma | 0.0033 | 0.0070 | 0.0093 | 0.0303 | 8.25x10-4 | 0.0011 |
| | GBM | 0.0053 | 0.0060 | 0.0133 | 0.0063 | 0.0071 | 0.0080 |
| RTEL1-TNFRSF6B (e) | All glioma | 0.0069 | 0.0041 | 0.0101 | 0.0174 | 0.0015 | 0.0012 |
| | GBM | 0.0082 | 0.0067 | 0.0143 | 0.0101 | 0.0087 | 0.0086 |

Abbreviations: BPESC1: blepharophimosis, epicanthus inversus and ptosis, candidate 1 (non-protein coding); SLC6A18: solute carrier family 6 member 18; TERT: telomerase reverse transcriptase; EGFR: epidermal growth factor receptor; CDKN2B-AS1: CDKN2B antisense RNA 1; CDKN2B: cyclin dependent kinase inhibitor 2B; DNAH2: dynein axonemal heavy chain 2; STMN3: stathmin 3; RTEL1-TNFRSF6B: RTEL1-TNFRSF6B readthrough (NMD candidate).
a. conditioned on rs10069690 (TERT); b. conditioned on rs75061358, rs723527, and rs11979158 (EGFR); c. conditioned on rs634537; d. conditioned on rs78378222 (TP53); e. conditioned on rs2297440 (RTEL1)

Table 3-5 Conditional gene scores in females for prioritized genes by algorithm and histology

| Gene | Histology | Discovery: Pascal | Discovery: BimBam | Discovery: GATES | Validation: Pascal | Validation: BimBam | Validation: GATES |
|---|---|---|---|---|---|---|---|
| SLC6A18 (a) | All glioma | 0.0130 | 0.0021 | 0.0012 | 0.0119 | 0.0140 | 0.0196 |
| | GBM | 0.0361 | 0.0002 | 0.0004 | 0.0361 | 0.0032 | 0.0041 |
| TERT (a) | All glioma | 0.0412 | 0.0017 | 0.0018 | 0.0053 | 0.0051 | 0.0189 |
| | GBM | 0.0065 | 7.16x10-4 | 5.86x10-4 | 0.0065 | 0.0046 | 0.0067 |
| EGFR (b) | All glioma | 0.0896 | 0.0565 | 0.1152 | 0.0397 | 0.0645 | 0.3248 |
| | GBM | 0.0723 | 0.0484 | 0.1274 | 0.0123 | 0.0806 | 0.6290 |
| CDKN2B (c) | All glioma | 0.5988 | 1.0000 | 1.0000 | 0.0061 | 0.0089 | 0.0249 |
| | GBM | 0.0010 | 0.8468 | 1.0000 | 0.0010 | 0.0010 | 0.0033 |
| DNAH2 (d) | All glioma | 0.3671 | 0.5645 | 0.6547 | 0.5484 | 0.6613 | 1.0000 |
| | GBM | 0.5758 | 0.2903 | 0.2693 | 0.5758 | 0.7016 | 1.0000 |
| STMN3 (e) | All glioma | 8.91x10-4 | 0.0042 | 0.0128 | 0.0290 | 0.0078 | 0.0278 |
| | GBM | 0.0022 | 0.0032 | 0.0194 | 0.0805 | 0.0132 | 0.0368 |
| RTEL1-TNFRSF6B (e) | All glioma | 0.0019 | 0.0023 | 0.0094 | 0.0415 | 0.0135 | 0.0300 |
| | GBM | 0.0036 | 0.0044 | 0.0195 | 0.0777 | 0.0122 | 0.0397 |

Abbreviations: BPESC1: blepharophimosis, epicanthus inversus and ptosis, candidate 1 (non-protein coding); SLC6A18: solute carrier family 6 member 18; TERT: telomerase reverse transcriptase; EGFR: epidermal growth factor receptor; CDKN2B-AS1: CDKN2B antisense RNA 1; CDKN2B: cyclin dependent kinase inhibitor 2B; DNAH2: dynein axonemal heavy chain 2; STMN3: stathmin 3; RTEL1-TNFRSF6B: RTEL1-TNFRSF6B readthrough (NMD candidate).
a. conditioned on rs10069690 (TERT); b. conditioned on rs75061358, rs723527, and rs11979158 (EGFR); c. conditioned on rs634537; d. conditioned on rs78378222 (TP53); e. conditioned on rs2297440 (RTEL1)

3.4.3 Gene scores for X chromosome genes

There were 202,886 X chromosome SNPs with MAF≥0.01 and INFO score ≥0.7 in the GICC dataset. Gene scores were calculated for 56 X chromosome genes with at least five SNPs, and associations were considered significant at p<8.3x10-4 (Bonferroni correction for 60 tests). There were 12 genes within four chromosomal regions that reached the significance threshold in at least two of three algorithms (results from the strongest association in each region are shown in Table 3-6). Shroom family member 2 (SHROOM2) (Xp22.2) and armadillo repeat containing, X-linked 2 (ARMCX2) (Xq22.1) were significantly associated with both all glioma and GBM, while dystrophin (DMD) (Xp21.2-p21.1) was significantly associated with all glioma only, and zinc finger protein 185 with LIM domain (ZNF185) (Xq28) was significantly associated with GBM only.


Table 3-6 Gene scores for prioritized X chromosome genes by histology

| Gene (location) | Histology | Pascal: SNPs (a) | Pascal: P | BimBam: SNPs (a) | BimBam: Tests (b) | BimBam: P | GATES: SNPs (a) | GATES: Tests (b) | GATES: P | Algorithms p<8.3x10-4 |
|---|---|---|---|---|---|---|---|---|---|---|
| SHROOM2 (Xp22.2) | All glioma | 7 | 1.20x10-4 | 6 | 5.09 | 7.68x10-4 | 6 | 5.09 | 0.0020 | 2/3 |
| | GBM | 9 | 1.45x10-5 | 8 | 7.06 | 5.02x10-4 | 8 | 7.06 | 0.0012 | 2/3 |
| DMD (Xp21.2-p21.1) | All glioma | 88 | 3.22x10-5 | 79 | 59.92 | 3.13x10-4 | 79 | 59.92 | 0.0026 | 2/3 |
| | GBM | 39 | 6.53x10-6 | 37 | 31.23 | 0.0047 | 37 | 31.23 | 0.0097 | 1/3 |
| ARMCX2 (Xq22.1) | All glioma | 49 | 1.07x10-4 | 44 | 33.78 | 1.89x10-4 | 44 | 33.78 | 4.79x10-4 | 3/3 |
| | GBM | 63 | 5.82x10-5 | 58 | 45.41 | 2.13x10-4 | 58 | 45.41 | 0.0011 | 2/3 |
| ZNF185 (Xq28) | All glioma | 40 | 0.0018 | 33 | 24.26 | 0.0026 | 33 | 24.26 | 0.0061 | 0/3 |
| | GBM | 49 | 6.19x10-5 | 42 | 33.52 | 3.04x10-4 | 42 | 33.52 | 9.22x10-4 | 2/3 |

Abbreviations: SHROOM2: shroom family member 2; DMD: dystrophin; ARMCX6: armadillo repeat containing, X-linked 6; ARMCX2: armadillo repeat containing, X-linked 2; ZNF185: zinc finger protein 185 with LIM domain
a. Nominally significant (p<0.01) SNPs used in calculating gene score
b. Number of independent SNPs after filtering for linkage disequilibrium

3.4.4 Sex-specific pathway scores

There were 1,077 pathways in the combined KEGG, BioCarta, and Reactome sets, and associations were considered significant in the discovery set at p<5x10-5 (Bonferroni correction for 1,000 tests), and significant in the validation set at p<0.0088 (Bonferroni correction for six tests). No pathways reached the set significance threshold, but there were several nominally significant associations. The Telomeres, Telomerase, Cellular Aging, and Immortality pathway reached nominal significance in both males and females in all glioma and GBM (Table 3-7, Figure 3-4). When the gene-scores for the genes contained within this pathway were examined, the association with this pathway was driven primarily by strong associations in EGFR, TERT, and TP53 (Figure 3-4).

Table 3-7 Significant pathways (p<0.001 in any testing group) by sex and histology

| Pathway (Database) | Histology | Males: Discovery | Males: Validation | Females: Discovery | Females: Validation |
|---|---|---|---|---|---|
| Telomeres, Telomerase, Cellular Aging, and Immortality (BioCarta) | All glioma | 5.32x10-5 | 6.50x10-4 | 2.61x10-4 | 0.0018 |
| | GBM | 5.90x10-5 | 8.60x10-4 | 8.30x10-4 | 0.0041 |
| Bladder cancer (KEGG) | All glioma | 9.00x10-5 | 0.0013 | 0.0306 | 0.0038 |
| | GBM | 1.27x10-4 | 5.50x10-4 | 0.0045 | 0.0030 |
| Glioma (KEGG) | All glioma | 5.80x10-4 | 0.0057 | 0.0361 | 5.00x10-4 |
| | GBM | 0.0011 | 0.0061 | 0.0048 | 0.0018 |
| Melanoma (KEGG) | All glioma | 7.60x10-4 | 0.0032 | 0.0219 | 1.70x10-4 |
| | GBM | 8.70x10-4 | 0.0020 | 0.0013 | 7.60x10-4 |
| Non-small cell lung cancer (KEGG) | All glioma | 7.30x10-4 | 0.0171 | 0.0290 | 7.40x10-4 |
| | GBM | 0.0013 | 0.0455 | 0.0011 | 0.0019 |
| Pancreatic cancer (KEGG) | All glioma | 2.65x10-4 | 0.0132 | 0.0021 | 0.0016 |
| | GBM | 0.0021 | 0.1124 | 2.01x10-4 | 9.10x10-4 |


Figure 3-4 Biocarta telomere pathway for all glioma in a) males, and b) females, and for GBM in c) males and d) females.

Nominally significant associations were identified in five cancer-specific KEGG pathways: bladder cancer, glioma, melanoma, non-small cell lung cancer, and pancreatic cancer (Figure 3-5, Figure 3-6, Figure 3-7, Figure 3-8, Figure 3-9). There is significant overlap between these gene sets (Figure 3-10), and when the gene scores used to build each pathway were examined, all the associations appear to be driven largely by strong associations in EGFR and CDKN2A, which are members of all KEGG cancer pathways found to be nominally associated with glioma in this analysis.

Figure 3-5 Bladder cancer (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females.


Figure 3-6 Glioma (KEGG) for all glioma in a) males, and b) females, and for glioblastoma in c) males and d) females.


Figure 3-7 Melanoma (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females.


Figure 3-8 Non-small cell lung cancer (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females.


Figure 3-9 Pancreatic cancer (KEGG) for all glioma in a) males, and b) females, and for GBM in c) males and d) females.


Figure 3-10 Overlap of genes containing SNPs with nominally significant glioma associations by identified KEGG pathways

Discussion

Multi-marker tests, such as gene- or pathway-based tests, allow investigators to leverage previously existing summary statistics and increase power when the strength of single-SNP associations may be low. This analysis aimed to explore additional potential sources of genetic risk that may contribute to sex differences in genetic risk for glioma. All autosomal genes identified by and validated within this analysis were proximate to previously identified GWAS hits. After conditioning on these previously identified SNPs, regions including TERT, EGFR and RTEL1 remained nominally significant, while associations at the other identified genes were no longer significant. While GWAS has identified one locus near TERT, two independent loci near EGFR, and one locus near RTEL1 that are highly significantly associated with glioma risk, the results of this conditional analysis suggest that there are remaining sources of genetic risk for glioma within these regions.

Four regions on the X chromosome (Xp22.2, Xp21.2-p21.1, Xq22.1, and Xq28) contained genes that reached the significance threshold in at least two of three algorithms (Table 3-6). The summary statistics used for this analysis were generated using a combined male and female set (see Chapter 2, Section 2.3.3.3 for additional information on these analyses), and further analysis using sex-stratified models is necessary in order to understand the potential effect of these genes on glioma risk. None of these four regions has been previously associated with glioma. SNPs surrounding SHROOM2 (Xp22.2) were previously associated with prostate and colon cancer [201-203]. There are no known associations between inherited variants in the other three regions and increased risk for cancer, though all have been shown to be dysregulated in some cancer cells. DMD encodes dystrophin, which is an essential component of muscle tissue. Inherited or de novo mutations in DMD are well known to cause a spectrum of muscle diseases called dystrophinopathies (including Duchenne muscular dystrophy and Becker muscular dystrophy) [204]. Deletions in this gene have been found in mesenchymal and stromal tumors, and downregulation of this gene has been associated with progression and metastasis in these tumors [205, 206]. ARMCX2 (Xq22.1) is a member of the armadillo family of proteins, several of which have been implicated in tumorigenesis [207]. ARMCX2 has been shown to be differentially expressed in cancer cell lines as compared to normal cell lines, though expression in glioma cell lines does not differ from normal [208]. The protein encoded by this gene has been shown to be decreased in lung cancer and expression of ZNF185 is negatively correlated with progression in prostate cancer where it is silenced by methylation [209, 210]. Without a validation set, it is not possible to know if these are true associations or the result of type I error. Further exploration of these genes is necessary to determine their true relationship with glioma risk.

The Telomeres, Telomerase, Cellular Aging, and Immortality pathway reached nominal significance in both males and females in all glioma and in GBM (Table 3-7). This pathway contains EGFR, TERT, and TP53, all of which contain variants that have been previously associated with increased odds of developing glioma. Variants associated with telomere maintenance have been associated with glioma, as well as many other complex diseases [211-213]. An analysis that used DNA extracted from normal blood to generate a weighted genetic score based on eight SNPs associated with leukocyte telomere length (LTL) (ACYP2, TERC, NAF1, TERT, OBFC1, CTC1, ZNF208, and RTEL1) found that LTL was estimated to be ~5% longer in glioma cases versus controls [150]. The significance of the telomere maintenance pathway may explain the remaining significant association in the regions surrounding TERT, EGFR and RTEL1, as any variants affecting telomere length could contribute to glioma risk.

The numerous KEGG cancer pathways found to be significant in this analysis are likely due to the strength of association in genes (CDKN2A, EGFR) that are members of many pathways. While these associations are driven by these specific genes, they may also be evidence of shared sources of genetic risk, or of shared processes of carcinogenesis, between these cancers and glioma. The KEGG glioma and melanoma pathways were both significantly associated with all glioma in males, and both appear to be strongly driven by associations in CDKN2A (Figure 3-6). Previous analyses suggested an association between genetic risk for glioma and melanoma, in terms of syndromic cancer (most notably Melanoma-neural system tumor syndrome, caused by inherited variants in CDKN2A [3]), familial glioma, and sporadic disease. An analysis of the NCI’s SEER system found that persons with a previous diagnosis of melanoma have incidence of glioma that is 1.42 times that of the general population [214]. Family-based studies have found that relatives of glioma patients have higher than expected incidence of melanoma, approximately 2-4 times that of the general population [94, 215]. GWAS for melanoma to date have identified at least 21 genetic risk loci [216, 217], including SNPs in the regions surrounding CDKN2A and TERT that have been previously associated with glioma [4]. These SNPs do not account for a large proportion of risk in either cancer type, but there is some evidence that telomere length and pathways of telomere maintenance may contribute to risk in both diseases [218].

While multi-marker tests (including gene and pathway tests) can increase power to detect associations as compared to single-SNP tests, different methods will perform differently and may be better suited to particular types of genetic architecture. Results for methods that use LD information, including all algorithms evaluated in this analysis, may also be significantly altered by the reference population used to estimate LD. All of the included methods attempt to adjust for potential score inflation due to LD, using the 1000 Genomes Project EUR super population as a reference set. FAST does this by pruning the data of SNPs that are in complete linkage (r^2=1), while Pascal does this by generating ‘fusion’ gene scores for genes that are in linkage with each other. These ‘fusion’ genes are then utilized in pathway analyses to avoid inflation due to the physical proximity of genes, and to decrease p-value inflation [109]. Due to variations in the adjustment for linkage disequilibrium used in the two programs, the number of SNPs included for each gene varied slightly. Both methods require that each variant in the summary statistics be present in the LD reference file, and as a result these methods are not able to incorporate variants that do not have a standard ID. FAST additionally limits the dataset by requiring that all markers be biallelic SNPs, and does not accept indels.

Neither Pascal (which is an implementation of the VEGAS scoring system) nor the GATES method within FAST relies on permutations for estimating p-values. The VEGAS algorithm as implemented within FAST [118] and VEGAS2 [112] both rely on Monte Carlo simulations to estimate p-values. Permutation-based tests are significantly more computationally intensive, especially when gene scores are being calculated across the entire genome. BimBam uses permutations to calculate exact p-values, and as a result is more computationally intensive. The number of permutations determines the boundaries of an exact p-value (ranging from 1 to 1/n, where n is the number of permutations), so resolving smaller p-values requires more permutations. Pascal allows for either a sum of chi-square, as used in this analysis, or a maximum chi-square calculation of the test statistics. All of these methods require consideration of the assumptions being made about the genetic architecture of the disease and population of interest.
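The simulation-based gene scores discussed above can be illustrated with a minimal sketch. It is not the code of Pascal, VEGAS, FAST, or BimBam; it only assumes the general idea of comparing an observed sum (or maximum) of per-SNP chi-square statistics against statistics simulated under the null with the same LD structure. The function names `cholesky` and `gene_pvalue`, the toy z-scores, and the LD matrix are all hypothetical.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular factor L with L * L^T = matrix.

    Requires a positive definite matrix, so SNPs in complete LD
    (r^2 = 1) would need to be pruned before calling this.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def gene_pvalue(z_scores, ld, n_sim=10000, use_max=False, seed=1):
    """Empirical p-value for a sum (or max) of chi-square gene statistic."""
    combine = max if use_max else sum
    observed = combine(z * z for z in z_scores)
    lower = cholesky(ld)
    rng = random.Random(seed)
    n = len(z_scores)
    exceed = 0
    for _ in range(n_sim):
        indep = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent draws so they share the LD structure.
        null_z = [sum(lower[i][k] * indep[k] for k in range(i + 1))
                  for i in range(n)]
        if combine(v * v for v in null_z) >= observed:
            exceed += 1
    # The smallest attainable value is 1 / (n_sim + 1).
    return (exceed + 1) / (n_sim + 1)

# Hypothetical gene: one strongly associated SNP among ten independent SNPs.
toy_ld = [[1.0 if i == j else 0.0 for j in range(10)] for i in range(10)]
toy_z = [5.0] + [0.0] * 9
print("sum statistic:", gene_pvalue(toy_z, toy_ld, n_sim=2000))
print("max statistic:", gene_pvalue(toy_z, toy_ld, n_sim=2000, use_max=True))
```

In this toy gene the single strong SNP is partly masked by the nine null SNPs under the summed statistic, while the maximum statistic is limited only by the number of simulations, which mirrors the resolution limit of simulation-based p-values described above.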

There is a well-known bias in GWAS towards large genes [219], and this bias may influence the results of this analysis. Large genes may be enriched for tag SNPs selected on arrays, and will be further enriched through imputation. All of the algorithms used for this analysis can be affected by gene size. Large genes with many SNPs of minimal significance and few SNPs of large effect may ‘dilute’ the gene score in methods based on summed scores, such as Pascal and VEGAS. All algorithms prune SNPs in an attempt to obtain a set of independent SNPs, but this may still bias results towards large genes if the gene contains multiple haplotype blocks. This analysis used a relatively large window surrounding the defined genes (+/- 50kb), which may further bias analyses towards large genes.

This represents the first genome-wide sex-specific gene-based analysis for germline risk variants in glioma. There are several limitations to this analysis. All glioma cases from the four included GWAS datasets were recruited at time of first diagnosis, and the assigned diagnoses represent the primary tumor type. There may also be variation in the histologies contained within each set by sex. The proportion of each dataset that is composed of GBM as compared to lower grade gliomas varies by both study and sex (Table 3-1). Fewer than 50% of female glioma cases in the testing set are GBM, whereas over 50% of female cases are GBM in the validation sets. Glioma is a heterogeneous disease, and due to all of these factors, it is likely that heterogeneity exists between the utilized datasets.

Conclusions

Multi-marker tests, such as gene- or pathway-based tests, allow investigators to leverage previously existing summary statistics and increase power when the strength of single-SNP associations may be low. This analysis aimed to explore additional potential sources of genetic risk that may contribute to sex differences in genetic risk for glioma. There was a nominally significant association with germline variants in RTEL1 in both males and females after conditioning on previously identified SNPs. There was also a significant association with germline variants in the telomere maintenance pathway in both males and females, building on previous evidence of the relationship between inherited variants related to increased telomere length and increased risk for glioma. There was also a male-specific association in EGFR, and a female-specific association in TERT, which remained nominally significant after conditioning on previous GWAS hits. The results of this analysis confirm previously known information about inherited glioma risk, and provide potential mechanistic explanations for how these variants may affect the process of gliomagenesis.


Chapter 4 – Identifying risk loci associated with variation in age at diagnosis in glioblastoma

Abstract

Glioblastoma (GBM) is the most commonly occurring type of malignant brain tumor in the US. Incidence of these tumors increases with increasing age, peaking at ~70 years old. These tumors occur less frequently in younger adults, and younger age at diagnosis is significantly associated with improved outcomes. The relationship between glioma risk SNPs and age at diagnosis has previously been explored for candidate SNPs, but prior GBM genome-wide association studies (GWAS) have not stratified by age. I assessed potential age-specific genetic effects in autosomal SNPs for GBM patients using data from four previous glioma GWAS. Datasets were analyzed using age-stratified logistic regression models and combined using meta-analysis. There were 4,512 total GBM cases and 10,582 controls across the four studies. A significant association was detected at two previously identified SNPs in 7p11.2 (rs723527 and rs11979158) in persons >53 years old only. There was a significant association at the previously identified lower grade glioma (LGG) risk locus at 8q24.21 (rs55705857) in persons 18-53 years old only. An examination of GBM patients within The Cancer Genome Atlas found the prevalence of ‘LGG’-like characteristics to be higher in the tumors of patients within this younger age group, suggesting that the frequency of ‘secondary’ GBM that has progressed from previously undiagnosed lower grade glioma is higher in this group as compared to older groups. Age-specific differences in cancer susceptibility can provide important clues to etiology, and these differences can be leveraged for discovery in genetic association studies. While age is known to be a strong prognostic factor in GBM, the results of this analysis suggest that younger age is associated with phenotype and risk of GBM, and this should be taken into consideration when true molecular classification of GBM is not available.

Background

GBM represents the majority of gliomas diagnosed in adults, constituting 61.9% of all gliomas diagnosed in persons 18 and older [220]. While these tumors are most common in older adulthood, with peak incidence at ~70 years of age (Figure 4-1a), they also occur in younger adults. Persons diagnosed with GBM at younger ages have significantly better survival outcomes than those who are diagnosed at older ages, with 5-year survival of 19.0% in persons 20-44 as compared to 1.8% in persons 65+ (Figure 4-1b) [220, 221].

Figure 4-1 Average a) Annual Incidence of GBM by age at diagnosis (CBTRUS 2010-2014), and b) Relative survival after diagnosis with GBM by age of diagnosis (SEER 2000-2014)

Many previously discovered genetic risk loci have histology-specific associations [4, 160, 161], but the effect of these risk loci on age at diagnosis has not been systematically explored. An age-specific analysis could potentially increase power to detect new variants that may be associated with younger age at diagnosis, and more accurately quantify the age-specific effect sizes of previously discovered variants [144]. This analysis aims to explore the effect of genetic risk variants on age at diagnosis in GBM, the most common type of glioma.

Methods

4.3.1 Study population.

In this study, data on GBM cases were combined from four prior glioma GWAS: GICC, SFAGS-GWAS, MDA-GWAS, and GliomaScan. Details of data collection and classification are available in Chapter 1, Section 1.5.1. After quality control was completed, these datasets combined contained 4,512 GBM cases (see Table 4-1 for additional study characteristics). Germline genotyping and tumor molecular classification data were obtained for GBM cases included in The Cancer Genome Atlas (TCGA) [163].

Table 4-1 Population characteristics by study

                    GWAS Datasets                                                                The Cancer Genome Atlas
Characteristic      Cases            Controls                                                    Cases
Total               4,512            10,582                                                      356
  GICC^a            2,456            3,264                                                       --
  SFAGS-GWAS^b      511              2,367                                                       --
  MDA-GWAS^c        643              2,228                                                       --
  GliomaScan^d      902              2,723                                                       --
Male (%)            2,832 (62.8%)    5,167 (48.8%)                                               221 (62.1%)
  GICC^a            1,573 (64.0%)    1,868 (57.2%)                                               --
  SFAGS-GWAS^b      333 (65.2%)      749 (31.6%)                                                 --
  MDA-GWAS^c        397 (61.7%)      1,086 (48.7%)                                               --
  GliomaScan^d      529 (58.6%)      1,464 (53.8%)                                               --
Mean Age (SD)       57.5 (12.0)      56.4 (15.4)                                                 59.8 (12.9)
  GICC^a            57.8 (11.4)      54.9 (13.8)                                                 --
  SFAGS-GWAS^b      56.3 (11.8)      49.7 (13.8)                                                 --
  MDA-GWAS^c        52.7 (11.6)      Modal age group: 60-69^e (males); and 65-69^f (females)     --
  GliomaScan^d      60.8 (12.8)      64.0 (15.4)                                                 --

a. Data from Glioma International Case-Control Study (GICC; Melin, et al. [4]).
b. Data from San Francisco Adult Glioma Study GWAS (SFAGS-GWAS; Wrensch, et al. [9]).
c. Data from MD Anderson Cancer Center GWAS (MDA-GWAS; Shete, et al. [8]).
d. Data from the National Cancer Institute’s GliomaScan (GliomaScan; Rajaraman, et al. [7]).
e. Data from CGEMS prostate study (Yeager et al. [152]). Continuous age is not available; age distribution is as follows: 50-59: 12.3%, 60-69: 56.7%, 70-79: 30.7%, 80-89: 0.3%.
f. Data from CGEMS breast study (Hunter et al. [153]). Continuous age is not available; age distribution is as follows: 0-54: 4.3%, 55-59: 15.0%, 60-64: 23.6%, 65-69: 27.5%, 70-74: 19.0%, 75-99: 10.7%.

4.3.2 Genotyping and imputation.

Details of data collection and classification for the four GWAS datasets are available in Chapter 1, Section 1.5.2. TCGA cases were genotyped on the Affymetrix Genomewide 6.0 array using DNA extracted from whole blood (see previous manuscripts for details of DNA processing [163, 164]) and underwent standard GWAS QC; duplicate and related individuals within datasets were excluded [4]. Ancestry outliers were identified in TCGA using principal components analysis in PLINK 1.9 [165]. Resulting files were imputed using Eagle 2 and Minimac3 as implemented on the Michigan Imputation Server (https://imputationserver.sph.umich.edu) using the Haplotype Reference Consortium Version r1.1 2016 as a reference panel [84, 166, 167]. Somatic characterization of TCGA cases was obtained from the final dataset used for the TCGA pan-glioma analysis [168], and classification schemes were adopted from Eckel-Passow, et al. [169] and Ceccarelli, et al. [168].

4.3.3 Statistical methods

4.3.3.1 Age-stratified scan of the autosomal chromosomes

GBM cases were divided into three age strata (18-53, 54-63, and 64+) for case-control analyses (see Figure 4-2 for overview of study design). These groups roughly correspond to the tertiles of the age distribution in all GBM cases combined from the four datasets. All controls within each dataset were used for comparison with each age stratum, and controls were not stratified by age. I analyzed the data using age-stratified logistic regression models in SNPTEST [170] for all SNPs on autosomal chromosomes, generating age-specific betas, standard errors, and p-values adjusted for sex and the number of principal components that significantly differed between cases and controls within each study as determined in prior meta-analyses [4]. These age strata were also separated by sex, and age- and sex-specific analyses were done for loci previously identified as having an effect that varied by sex (Chapter 2). Results from the GICC, SFAGS-GWAS, MDA-GWAS and GliomaScan studies were combined via meta-analysis using an inverse-variance weighted fixed effects method in META [173]. Results from the case-control analysis were considered statistically significant at the p<5x10^-8 level (using a Bonferroni correction for 1,000,000 tests [222]).

Figure 4-2 Study schematic for age-stratified case-control analyses


Figure 4-3 Study schematic for case-only analyses

In order to identify individual SNPs that may affect age at diagnosis, a case-only agnostic scan was conducted of variants that were nominally significant in a previous meta-analysis (see Melin, et al. [4]; p<5x10^-4 and p_heterogeneity>5x10^-4). Using these SNPs, a case-only analysis using age at diagnosis as a continuous phenotype was conducted using linear regression in SNPTEST assuming an additive model to estimate betas, standard errors, and p-values [170] (see Figure 4-3 for overview of study design). All models were adjusted for sex, and both GICC and GliomaScan were adjusted for principal components due to genomic inflation (GICC: λ_unadjusted=1.02, λ_adjusted=1.01; SFAGS-GWAS: λ_unadjusted=1.01, λ_adjusted=1.01; MDA-GWAS: λ_unadjusted=0.99, λ_adjusted=0.99; GliomaScan: λ_unadjusted=1.04, λ_adjusted=1.01). Results from the GICC, SFAGS-GWAS, MDA-GWAS and GliomaScan studies were combined via meta-analysis using both inverse-variance weighted fixed effects and random effects methods in META [173]. Results were considered statistically significant at the p<1.6x10^-6 level (using a Bonferroni correction for 30,000 tests). All figures were generated using R 3.3.2, qqman, and ggplot [195, 196, 223, 224].
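The additive case-only model described above can be sketched as an ordinary least-squares regression of age at diagnosis on risk-allele dosage. This is a simplified illustration rather than the SNPTEST implementation: it assumes a single SNP, omits the sex and principal component covariates, and uses synthetic toy data. The function name `additive_linear_fit` is hypothetical.

```python
import math

def additive_linear_fit(dosages, ages):
    """Least-squares slope of age at diagnosis on risk-allele dosage (0 to 2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(ages) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, ages))
    beta = sxy / sxx                      # years per additional risk allele
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, ages))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return beta, se

# Synthetic toy data (not from any study): dosage and age at diagnosis.
toy_dosages = [0, 0, 0, 1, 1, 1, 2, 2]
toy_ages = [60, 62, 58, 57, 59, 55, 54, 56]
beta, se = additive_linear_fit(toy_dosages, toy_ages)
print(f"beta = {beta:.2f} years per allele, SE = {se:.2f}")
```

A negative beta in this coding corresponds to a younger age at diagnosis for each additional copy of the risk allele.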

4.3.3.2 Analysis of TCGA germline and somatic data

TCGA GBM cases [163] were divided into three age strata (18-53, 54-63, and 64+) for analysis. Only newly diagnosed cases with no neo-adjuvant treatment or prior cancer were used. Demographic characteristics, molecular classification and somatic alteration data were obtained from Ceccarelli, et al. [168]. Chi-square tests were used to compare the frequency of somatic alterations between age groups. SNPs found to be nominally significant (p<5x10^-4) in a previous eight-study meta-analysis [4] with imputation quality ≥0.7 were identified within the TCGA genotype data, and D’ and r^2 values in CEU were used to select proxy SNPs [174]. Using these SNPs, a case-only analysis using age at diagnosis as a continuous phenotype was conducted using linear regression in SNPTEST assuming an additive model to estimate betas, standard errors, and p-values [170]. Results were considered significant at p<0.003 (Bonferroni correction for 16 tests).
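The significance thresholds quoted in this section follow the usual Bonferroni form, in which a family-wise alpha is divided by the number of tests. The short sketch below assumes alpha = 0.05, which the text does not state explicitly; the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests

# Numbers of tests as stated in the Methods; alpha = 0.05 is an assumption.
for label, n_tests in [
    ("age-stratified case-control scan", 1_000_000),
    ("case-only scan", 30_000),
    ("TCGA case-only analysis", 16),
]:
    print(f"{label}: p < {bonferroni_threshold(n_tests):.3g}")
```

Under this assumption the computed values agree with the stated thresholds up to rounding or truncation.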

Results

There were 4,512 total GBM cases and 10,582 controls in the four included GWAS datasets (Table 4-1). Overall, 62.8% of GBM cases in the four GWAS datasets were male, with a mean age at diagnosis of 57.5. Controls had similar mean age, but a slight majority of controls were female.

4.4.1 Previously identified glioma risk regions

In a previous eight-study meta-analysis, ~12,000 SNPs (INFO>0.7, MAF>0.01) were identified as having a nominally significant (p<5x10^-4) association with glioma [4]. Results from the case-control analysis were considered statistically significant at the p<5x10^-8 level (using a Bonferroni correction for 1,000,000 tests, which has been used as the threshold for previous case-control studies in glioma) (Figure 4-2). SNPs in 5/25 previously identified risk loci reached genome-wide significance in at least one age stratum, including three previously identified SNPs in 7p11.2. SNPs in 5p15.33, 9p21.3, 17p13.1, and 20q13.33 were of similar significance and effect size in all three age groups (Table 4-2, Figure 4-4). No new risk loci were identified in an age-stratified genome-wide scan.

Table 4-2 Previously identified glioma risk loci and histology-specific odds ratios (OR) and 95% confidence intervals (95% CI) stratified by age.

                        Risk     Ages 18-53                       Ages 54-63                       Ages 64+
SNP (Locus)             Allele   P            OR (95% CI)         P            OR (95% CI)         P            OR (95% CI)
rs10069690 (5p15.33)    C/T      3.79x10^-18  1.51 (1.38-1.66)    4.05x10^-33  1.78 (1.62-1.95)    4.59x10^-31  1.72 (1.57-1.89)
rs75061358 (7p11.2)     T/G      6.78x10^-12  1.67 (1.44-1.94)    2.44x10^-12  1.70 (1.47-1.97)    3.67x10^-16  1.85 (1.60-2.15)
rs723527 (7p11.2)       A/G      1.37x10^-5   1.19 (1.10-1.29)    1.50x10^-9   1.28 (1.18-1.39)    2.14x10^-11  1.32 (1.21-1.43)
rs11979158 (7p11.2)     A/G      2.47x10^-4   1.22 (1.10-1.36)    6.13x10^-8   1.35 (1.21-1.50)    2.18x10^-10  1.42 (1.27-1.58)
rs55705857 (8q24.21)    A/G      9.30x10^-11  1.76 (1.49-2.10)    0.4225       1.08 (0.90-1.28)    0.0280       1.21 (1.02-1.44)
rs634537 (9p21.3)       T/G      9.50x10^-17  1.40 (1.29-1.51)    3.72x10^-19  1.44 (1.33-1.56)    2.62x10^-15  1.38 (1.27-1.49)
rs498872 (11q23.3)      A/G      0.0428       1.09 (1.00-1.19)    0.9587       1.00 (0.92-1.09)    0.0284       0.91 (0.84-0.99)
rs12803321 (11q23.3)    G/C      0.0062       1.12 (1.03-1.22)    0.1786       0.94 (0.87-1.03)    0.0795       0.93 (0.85-1.01)
rs78378222 (17p13.1)    T/G      8.65x10^-16  3.80 (2.74-5.25)    2.87x10^-8   2.54 (1.83-3.53)    4.13x10^-13  3.26 (2.37-4.48)
rs2297440 (20q13.33)    T/C      8.21x10^-13  1.41 (1.28-1.55)    7.35x10^-19  1.55 (1.40-1.70)    1.49x10^-14  1.45 (1.32-1.60)

Among the three identified SNPs in 7p11.2, rs75061358 was of similar significance and effect size in all age groups. Two other SNPs (rs723527 and rs11979158) were nominally significant in the age 18-53 stratum, and reached genome-wide significance in the age 54-63 and age 64+ strata (Table 4-2, Figure 4-4, Figure 4-5). The risk locus at 8q24.21 (rs55705857) reached genome-wide significance in the 18-53 year old group (p=9.30x10^-11), while the effect of this SNP was null in the other age groups (Table 4-2, Figure 4-4, Figure 4-5). The odds ratio for this SNP in the 18-53 year age group was 1.76 (95% CI: 1.49-2.10).
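As a rough consistency check on an estimate reported in this form, the log odds ratio and its standard error can be recovered from the odds ratio and confidence interval and converted to a Wald p-value. The sketch below assumes a symmetric 95% interval on the log scale and a normal approximation; because it starts from rounded values, it can only approximate the reported p-value. The function name `wald_from_or` is hypothetical.

```python
import math

def wald_from_or(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Approximate beta, SE, z, and two-sided p from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # A 95% CI on the log scale spans 2 * z_crit standard errors.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return beta, se, z, p

# rs55705857 in the 18-53 stratum, as reported in Table 4-2.
beta, se, z, p = wald_from_or(1.76, 1.49, 2.10)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The recovered p-value is of the same order of magnitude as the reported value, with the remaining difference attributable to rounding of the odds ratio and interval.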

Figure 4-4 Manhattan plot of -log(p) values for GBM in A) ages 18-53, B) ages 54-63, and C) ages 64+


Figure 4-5 Age-specific odds ratios, 95% CI and p-values for selected previous GWAS hits

A case-only analysis was conducted using the same 12,000 nominally significant SNPs with age at diagnosis as the outcome variable, and associations were considered significant at p<4.17x10^-6 (adjusted for 12,000 tests) (Figure 4-3). There were two peaks that reached statistical significance in the fixed effects meta-analysis, at 8q24.21 and 11q23.3 (Figure 4-6a). When a random effects meta-analysis was done, only the peak at 8q24.21 reached the significance threshold (Figure 4-6b).

The previously identified SNP at 8q24.21 was significantly associated with younger age at diagnosis, with an approximate decrease in age at diagnosis of 3 years for each risk allele (p_random=3.70x10^-6, p_fixed=1.51x10^-11) (Table 4-3, Figure 4-7). When this association was evaluated in cases ages 54 and older only, the association was null. There was variability in the detected effect of rs55705857 by study (Table 4-4), though tests for heterogeneity were not significant. In order to assess the selection bias that may occur due to confounding between age at diagnosis and histologic classification, a sensitivity analysis was performed using only individuals diagnosed after 2000 (Table 4-5). Among individuals 18-53 diagnosed after 2000 only, there was still a genome-wide significant signal for the SNP at 8q24.21, rs55705857 (p=5.47x10^-10, OR=1.84 [95% CI=1.52-2.23]).
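The fixed and random effects results for rs55705857 can be approximately reproduced from the per-study case-only estimates in Table 4-4. The sketch below is not the META program: it assumes standard errors can be recovered from the rounded 95% confidence intervals, and it assumes a DerSimonian-Laird estimator for the random effects model, which the text does not specify. All function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.959964):
    """Recover a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)

def two_sided_p(z):
    """Two-sided p-value from a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def pool(betas, variances):
    """Inverse-variance weighted estimate, its standard error, and p-value."""
    weights = [1 / v for v in variances]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return beta, se, two_sided_p(beta / se)

def meta_analysis(betas, ses):
    """Fixed effects and (assumed) DerSimonian-Laird random effects results."""
    variances = [s * s for s in ses]
    fixed = pool(betas, variances)
    weights = [1 / v for v in variances]
    # Cochran's Q and the between-study variance estimate (tau^2).
    q = sum(w * (b - fixed[0]) ** 2 for w, b in zip(weights, betas))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(betas) - 1)) / c)
    random_effects = pool(betas, [v + tau2 for v in variances])
    return fixed, random_effects, q

# Per-study case-only estimates for rs55705857 (all ages), from Table 4-4:
# (beta, lower 95% CI, upper 95% CI) in years per risk allele.
studies = {
    "GICC": (-2.66, -3.92, -1.40),
    "SFAGS-GWAS": (-5.54, -8.51, -2.56),
    "GliomaScan": (-1.57, -4.31, 1.18),
    "MDA-GWAS": (-4.30, -6.21, -2.39),
}
betas = [b for b, _, _ in studies.values()]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies.values()]
fixed, random_effects, q = meta_analysis(betas, ses)
print("fixed:  beta=%.2f se=%.2f p=%.2e" % fixed)
print("random: beta=%.2f se=%.2f p=%.2e" % random_effects)
```

With these rounded inputs the fixed effects estimate is about -3.21 years per risk allele, matching Table 4-3, and both p-values fall in the same order of magnitude as the reported values. The larger random effects p-value reflects the between-study variability noted above.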

Figure 4-6 Manhattan plot of -log(p) values for case-only analysis of GBM in A) fixed effects and B) random effects

Figure 4-7 Case only estimates for selected SNPs in a) all ages, b) cases ages 54+ only


Table 4-3 Case-only betas, 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for selected previous GWAS hits, overall, and in persons 54+ at time of diagnosis only

                        All Ages                                                  Ages 54+ only
                                              Meta-analysis                                             Meta-analysis
SNP (Locus)             Beta (95% CI)         pFixed       pRandom     Phet          Beta (95% CI)         pFixed   pRandom  Phet
rs10069690 (5p15.33)    0.76 (0.24,1.29)      0.0043       0.0053      0.3754        0.13 (-0.23,0.50)     0.4818   0.4814   0.0328
rs723527 (7p11.2)       -0.48 (-0.98,0.01)    0.0564       0.0848      0.2490        -0.02 (-0.37,0.33)    0.9158   0.8817   0.3719
rs11979158 (7p11.2)     -1.08 (-1.76,-0.39)   0.0021       0.0021      0.6140        -0.05 (-0.55,0.45)    0.8507   0.8507   0.7327
rs55705857 (8q24.21)    -3.21 (-4.14,-2.28)   1.51x10^-11  3.70x10^-6  0.1267        0.21 (-0.50,0.92)     0.5659   0.6331   0.0469
rs634537 (9p21.3)       0.17 (-0.30,0.64)     0.4835       0.8009      0.0602        -0.23 (-0.56,0.11)    0.1826   0.3170   0.1067
rs498872 (11q23.3)      1.44 (0.93,1.95)      2.91x10^-8   0.0034      0.0138        0.52 (0.15,0.88)      0.0056   0.0225   0.1739
rs12803321 (11q23.3)    1.49 (0.99,1.99)      6.00x10^-9   0.0148      5.15x10^-5    0.35 (-0.01,0.70)     0.0554   0.0554   0.4408
rs78378222 (17p13.1)    -0.61 (-2.04,0.81)    0.3987       0.4353      0.0977        0.35 (-0.67,1.36)     0.5020   0.5233   0.3621
rs2297440 (20q13.33)    0.79 (0.16,1.43)      0.0144       0.0512      0.0607        -0.19 (-0.65,0.26)    0.4031   0.4031   0.6680

Table 4-4 Case-only betas, 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for selected previous GWAS hits

                                       All Ages                                                    Ages 54+ only
                                                                          Meta-analysis                                            Meta-analysis
SNP (Locus)            Study           Beta (95% CI)          P            pFixed       pRandom      Beta (95% CI)         P        pFixed   pRandom
rs10069690 (5p15.33)   Meta-analysis   0.76 (0.24,1.29)       --           0.0043       0.0053       0.13 (-0.23,0.50)     --       0.4818   0.4814
                       GICC            0.60 (-0.07,1.27)      0.0812                                 0.06 (-0.40,0.51)     0.8102
                       SFAGS-GWAS      1.92 (0.24,3.61)       0.0255                                 1.32 (0.08,2.57)      0.0375
                       GliomaScan      1.33 (-0.17,2.84)      0.0828                                 -0.75 (-1.69,0.18)    0.1144
                       MDA-GWAS        0.31 (-0.93,1.55)      0.6263                                 0.85 (-0.23,1.94)     0.1240
rs723527 (7p11.2)      Meta-analysis   -0.48 (-0.98,0.01)     --           0.0564       0.0848       -0.02 (-0.37,0.33)    --       0.9158   0.8817
                       GICC            -0.14 (-0.81,0.52)     0.6704                                 0.14 (-0.31,0.59)     0.5434
                       SFAGS-GWAS      -0.17 (-1.67,1.34)     0.8276                                 -0.32 (-1.51,0.86)    0.5916
                       GliomaScan      -0.70 (-2.02,0.62)     0.2977                                 -0.63 (-1.46,0.21)    0.1409
                       MDA-GWAS        -1.44 (-2.55,-0.33)    0.0111                                 0.29 (-0.71,1.29)     0.5736
rs11979158 (7p11.2)    Meta-analysis   -1.08 (-1.76,-0.39)    --           0.0021       0.0021       -0.05 (-0.55,0.45)    --       0.8507   0.8507
                       GICC            -0.83 (-1.76,0.10)     0.0809                                 0.10 (-0.55,0.74)     0.7628
                       SFAGS-GWAS      -0.84 (-3.03,1.35)     0.4519                                 -0.84 (-2.52,0.85)    0.3305
                       GliomaScan      -0.80 (-2.65,1.04)     0.3936                                 -0.30 (-1.48,0.89)    0.6252
                       MDA-GWAS        -1.97 (-3.44,-0.50)    0.0087                                 0.15 (-1.27,1.58)     0.8344
rs55705857 (8q24.21)   Meta-analysis   -3.21 (-4.14,-2.28)    --           1.51x10^-11  3.70x10^-6   0.21 (-0.50,0.92)     --       0.5659   0.6331
                       GICC            -2.66 (-3.92,-1.40)    3.41x10^-5                             -0.09 (-1.00,0.82)    0.8419
                       SFAGS-GWAS      -5.54 (-8.51,-2.56)    2.64x10^-4                             -1.96 (-4.67,0.76)    0.1572
                       GliomaScan      -1.57 (-4.31,1.18)     0.2640                                 0.45 (-1.09,2.00)     0.5672
                       MDA-GWAS        -4.30 (-6.21,-2.39)    1.01x10^-5                             2.56 (0.50,4.62)      0.0150
rs634537 (9p21.3)      Meta-analysis   0.17 (-0.30,0.64)      --           0.4835       0.8009       -0.23 (-0.56,0.11)    --       0.1826   0.3170
                       GICC            0.25 (-0.38,0.88)      0.4309                                 -0.16 (-0.58,0.27)    0.4736
                       SFAGS-GWAS      0.72 (-0.75,2.19)      0.3365                                 -0.03 (-1.17,1.12)    0.9644
                       GliomaScan      -1.37 (-2.64,-0.11)    0.0329                                 0.19 (-0.61,0.98)     0.6463
                       MDA-GWAS        0.72 (-0.33,1.77)      0.1804                                 -1.28 (-2.21,-0.34)   0.0075
rs498872 (11q23.3)     Meta-analysis   1.44 (0.93,1.95)       --           2.91x10^-8   0.0034       0.52 (0.15,0.88)      --       0.0056   0.0225
                       GICC            0.97 (0.29,1.66)       0.0055                                 0.30 (-0.17,0.77)     0.2043
                       SFAGS-GWAS      1.45 (-0.06,2.96)      0.0595                                 1.51 (0.35,2.68)      0.0110
                       GliomaScan      0.87 (-0.51,2.25)      0.2147                                 0.98 (0.07,1.89)      0.0350
                       MDA-GWAS        3.11 (1.97,4.25)       8.89x10^-8                             0.19 (-0.83,1.21)     0.7106
rs12803321 (11q23.3)   Meta-analysis   1.49 (0.99,1.99)       --           6.00x10^-9   0.0148       0.35 (-0.01,0.70)     --       0.0554   0.0554
                       GICC            0.61 (-0.06,1.29)      0.0754                                 0.15 (-0.30,0.61)     0.5089
                       SFAGS-GWAS      1.37 (-0.12,2.87)      0.0716                                 1.15 (-0.02,2.32)     0.0548
                       GliomaScan      1.85 (0.53,3.17)       0.0061                                 0.55 (-0.30,1.39)     0.2025
                       MDA-GWAS        3.80 (2.66,4.94)       7.42x10^-11                            0.39 (-0.59,1.38)     0.4320
rs78378222 (17p13.1)   Meta-analysis   -0.61 (-2.04,0.81)     --           0.3987       0.4353       0.35 (-0.67,1.36)     --       0.5020   0.5233
                       GICC            -0.11 (-1.99,1.77)     0.9084                                 0.18 (-1.11,1.47)     0.7844
                       SFAGS-GWAS      -5.63 (-10.37,-0.90)   0.0197                                 -1.73 (-6.16,2.69)    0.4420
                       GliomaScan      1.88 (-2.10,5.85)      0.3556                                 2.04 (-0.31,4.39)     0.0883
                       MDA-GWAS        -1.34 (-4.45,1.77)     0.3987                                 -0.38 (-3.08,2.31)    0.7814
rs2297440 (20q13.33)   Meta-analysis   0.79 (0.16,1.43)       --           0.0144       0.0512       -0.19 (-0.65,0.26)    --       0.4031   0.4031
                       GICC            0.05 (-0.80,0.90)      0.9105                                 -0.11 (-0.68,0.47)    0.7200
                       SFAGS-GWAS      1.11 (-0.97,3.19)      0.2958                                 -1.01 (-2.71,0.69)    0.2442
                       GliomaScan      1.58 (-0.06,3.21)      0.0583                                 0.10 (-1.00,1.20)     0.8534
                       MDA-GWAS        2.15 (0.72,3.58)       0.0032                                 -0.54 (-1.77,0.69)    0.3888

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS


Table 4-5 Age-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis for rs723527, rs11979158, and rs55705857 for cases diagnosed in 2000 and later

                       18-53                                      54-63                                      64+
SNP (Locus)            P            OR (95% CI)        Phet       P            OR (95% CI)        Phet       P            OR (95% CI)        Phet
rs723527 (7p11.2)      3.54x10^-5   1.21 (1.11-1.33)   0.1637     6.96x10^-8   1.28 (1.17-1.40)   0.2731     3.68x10^-9   1.31 (1.20-1.43)   0.0221
rs11979158 (7p11.2)    3.24x10^-4   1.25 (1.11-1.41)   0.7780     1.08x10^-7   1.39 (1.23-1.57)   0.7724     8.09x10^-9   1.43 (1.26-1.61)   0.5305
rs55705857 (8q24.21)   5.47x10^-10  1.84 (1.52-2.23)   0.2703     0.7143       1.04 (0.85-1.27)   0.7178     0.1916       1.14 (0.94-1.39)   0.2472

Table 4-6 Age-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 for cases diagnosed in 2000 and later

                                       18-53                                      54-63                                      64+
SNP (Locus)            Study           P            OR (95% CI)        Phet       P            OR (95% CI)        Phet       P            OR (95% CI)        Phet
rs723527 (7p11.2)      Meta-analysis   3.54x10^-5   1.21 (1.11-1.33)   0.1637     6.96x10^-8   1.28 (1.17-1.40)   0.2731     3.68x10^-9   1.31 (1.20-1.43)   0.0221
                       GICC            2.66x10^-5   1.28 (1.14-1.43)              3.72x10^-7   1.33 (1.19-1.49)              2.12x10^-7   1.34 (1.20-1.49)
                       SFAGS-GWAS      0.0406       1.28 (1.01-1.61)              0.0718       1.26 (0.98-1.62)              0.3152       1.15 (0.87-1.53)
                       GliomaScan      0.5962       0.90 (0.61-1.33)              0.7554       0.95 (0.69-1.31)              7.36x10^-5   1.59 (1.26-1.99)
                       MDA-GWAS        0.7588       1.04 (0.83-1.29)              0.0514       1.29 (1.00-1.67)              0.3839       0.86 (0.62-1.21)
rs11979158 (7p11.2)    Meta-analysis   3.24x10^-4   1.25 (1.11-1.41)   0.7780     1.08x10^-7   1.39 (1.23-1.57)   0.7724     8.09x10^-9   1.43 (1.26-1.61)   0.5305
                       GICC            0.0038       1.25 (1.08-1.46)              5.56x10^-6   1.42 (1.22-1.64)              7.99x10^-7   1.45 (1.25-1.68)
                       SFAGS-GWAS      0.0466       1.37 (1.00-1.88)              0.0384       1.42 (1.02-1.99)              0.3867       1.18 (0.81-1.72)
                       GliomaScan      0.9552       1.01 (0.62-1.66)              0.5762       1.12 (0.75-1.69)              0.0019       1.62 (1.19-2.19)
                       MDA-GWAS        0.1977       1.21 (0.91-1.61)              0.0426       1.41 (1.01-1.97)              0.3922       1.21 (0.78-1.88)
rs55705857 (8q24.21)   Meta-analysis   5.47x10^-10  1.84 (1.52-2.23)   0.2703     0.7143       1.04 (0.85-1.27)   0.7178     0.1916       1.14 (0.94-1.39)   0.2472
                       GICC            6.93x10^-7   1.82 (1.44-2.31)              0.6121       1.06 (0.84-1.36)              0.4955       1.09 (0.86-1.38)
                       SFAGS-GWAS      4.84x10^-4   2.61 (1.52-4.47)              0.4708       1.24 (0.69-2.21)              0.8318       0.93 (0.48-1.82)
                       GliomaScan      0.9631       1.02 (0.47-2.19)              0.9968       1.00 (0.50-2.00)              0.5867       1.14 (0.71-1.81)
                       MDA-GWAS        0.0179       1.83 (1.11-3.01)              0.4011       0.79 (0.45-1.38)              0.0248       2.37 (1.12-5.06)

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

Table 4-7 Minor allele frequencies (MAF) for meta-analysis and individual studies by case-control status and age group for rs723527, rs11979158, and rs55705857.

| SNP (Locus) | Study | Controls (All Ages) N | Controls (All Ages) MAF | Cases 18-53 N | Cases 18-53 MAF | Cases 54-63 N | Cases 54-63 MAF | Cases 64+ N | Cases 64+ MAF |
|---|---|---|---|---|---|---|---|---|---|
| rs723527 (7p11.2) | Meta-analysis | 10,582 | 0.410 | 1,533 | 0.366 | 1,476 | 0.351 | 1,503 | 0.347 |
| | GICC | 3,264 | 0.427 | 795 | 0.355 | 823 | 0.348 | 838 | 0.351 |
| | SFAGS-GWAS | 2,367 | 0.398 | 200 | 0.345 | 163 | 0.334 | 148 | 0.358 |
| | GliomaScan | 2,228 | 0.405 | 229 | 0.389 | 262 | 0.376 | 411 | 0.325 |
| | MDA-GWAS | 2,723 | 0.403 | 309 | 0.390 | 228 | 0.346 | 106 | 0.382 |
| rs11979158 (7p11.2) | Meta-analysis | 10,582 | 0.166 | 1,533 | 0.141 | 1,476 | 0.131 | 1,503 | 0.124 |
| | GICC | 3,264 | 0.176 | 795 | 0.140 | 823 | 0.127 | 838 | 0.126 |
| | SFAGS-GWAS | 2,367 | 0.152 | 200 | 0.120 | 163 | 0.126 | 148 | 0.125 |
| | GliomaScan | 2,228 | 0.166 | 229 | 0.155 | 262 | 0.149 | 411 | 0.120 |
| | MDA-GWAS | 2,723 | 0.168 | 309 | 0.147 | 228 | 0.129 | 106 | 0.118 |
| rs55705857 (8q24.21) | Meta-analysis | 10,582 | 0.060 | 1,533 | 0.088 | 1,476 | 0.063 | 1,503 | 0.069 |
| | GICC | 3,264 | 0.056 | 795 | 0.085 | 823 | 0.056 | 838 | 0.058 |
| | SFAGS-GWAS | 2,367 | 0.054 | 200 | 0.099 | 163 | 0.067 | 148 | 0.049 |
| | GliomaScan | 2,228 | 0.065 | 229 | 0.087 | 262 | 0.084 | 411 | 0.090 |
| | MDA-GWAS | 2,723 | 0.067 | 309 | 0.090 | 228 | 0.060 | 106 | 0.097 |

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MAF: Minor allele frequency; MDA-GWAS: MD Anderson GWAS

There was a nominal association in the case-only analysis between age at diagnosis and

the previously identified SNP at 5p15.33 (rs10069690) and GBM (pfixed =0.0033, prandom

=0.0053) with an increase in age of diagnosis of 0.76 years for each risk allele (Table

4-3, Figure 4-7). When the case-only analysis was limited to only the two older age

groups (individuals >53 years old), there was no significant association between this SNP

and age at diagnosis (pfixed=0.4818, prandom=0.4814). In the fixed effects meta-analysis,

two previously identified SNPs in 11q23.3 were significantly associated with older age at

diagnosis (Table 4-3, Figure 4-7). For both SNPs (rs12803321 and rs498872), each risk

allele was associated with an increase of approximately 1.5 years in age at diagnosis.

There was some heterogeneity by study for both of the evaluated SNPs at 11q23.3

(rs12803321 phet=5.5x10^-5; rs498872 phet=0.0138), and in the random effects meta-analysis the associations at both of these SNPs no longer met the threshold for

significance. In the subset of cases 54 and older, rs498872 remained nominally

significantly associated with age at diagnosis (pfixed=0.0056, prandom=0.0225), with an

estimated 0.52 year increase in age of diagnosis for each risk allele (Table 4-3, Figure

4-7). There was no association between rs12803321 and age at diagnosis in the older subset. There was a nominal association between rs11979158 and younger age at diagnosis (pfixed=0.0021, prandom=0.0021), but this association was null in the subset of cases ages 54 and older (Table 4-3, Figure 4-7).
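
The relationship between the fixed-effect p-value, the random-effects p-value, and the heterogeneity test reported above can be illustrated with a minimal sketch. It assumes inverse-variance weighting for the fixed-effect estimate and the DerSimonian-Laird estimator for the between-study variance; the choice of estimator is an assumption made for illustration, and the per-study betas and standard errors are hypothetical values, not data from this analysis.

```python
import math

def fixed_effect(betas, ses):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def cochran_q(betas, ses):
    """Cochran's Q statistic for between-study heterogeneity."""
    pooled, _ = fixed_effect(betas, ses)
    return sum((b - pooled) ** 2 / se ** 2 for b, se in zip(betas, ses))

def random_effects(betas, ses):
    """DerSimonian-Laird pooled estimate, its SE, and tau^2 (assumed estimator)."""
    weights = [1.0 / se ** 2 for se in ses]
    q = cochran_q(betas, ses)
    df = len(betas) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(w * b for w, b in zip(re_weights, betas)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2

def two_sided_p(beta, se):
    """Two-sided p-value from a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

# Hypothetical per-study effects (years of age per risk allele) and SEs.
betas = [2.5, 0.2, 1.9, 0.4]
ses = [0.5, 0.5, 0.6, 0.6]
fe_beta, fe_se = fixed_effect(betas, ses)
re_beta, re_se, tau2 = random_effects(betas, ses)
print(two_sided_p(fe_beta, fe_se), two_sided_p(re_beta, re_se))
```

When the studies disagree, the estimated between-study variance widens the random-effects standard error, which is why an association can pass a significance threshold under the fixed-effect model and fail it under the random-effects model.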

4.4.2 Sex- and age-stratified results

A previous analysis of these datasets (see Chapter 2) identified a significant difference in effect size of rs11979158 between males and females, where the association between this SNP and GBM reached genome-wide significance in males only. There was no association between rs11979158 and GBM in females in the youngest age stratum, and only

nominally significant associations in the two older strata (Figure 4-8). The previous

analysis also found a significant difference in effect size of rs55705857 between males

and females, where the effect was approximately twice as high in females as in males. Sex-stratified models were run within each age stratum. In the 18-53 age group, the association for rs55705857 reached genome-wide significance in females only (pF=9.19x10^-9, ORF=2.02 [95% CI: 1.59-2.57]), whereas the effect in males was only nominally significant (pM=1.05x10^-4, ORM=1.46 [95% CI: 1.21-1.77]) (Figure 4-8). There was variability of detected effect in

rs55705857 by study, though tests for heterogeneity were not significant (Table 4-8,

Table 4-9, Table 4-10).

Figure 4-8 Age- and sex-specific odds ratios, 95% CI and p-values for selected previous GWAS hits

Table 4-8 Age- and sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 in persons age 18-53

| RSID (Locus) | Study | PM | ORM (95% CI) | Phet (Males) | PF | ORF (95% CI) | Phet (Females) |
|---|---|---|---|---|---|---|---|
| rs723527 (7p11.2) | Meta-analysis | 3.39x10^-4 | 1.21 (1.09-1.34) | 0.5807 | 0.0144 | 1.18 (1.03-1.35) | 0.5648 |
| | GICC | 0.0010 | 1.26 (1.09-1.46) | | 0.0091 | 1.20 (0.99-1.45) | |
| | SFAGS-GWAS | 0.1104 | 1.15 (0.87-1.53) | | 0.3133 | 1.03 (0.73-1.47) | |
| | GliomaScan | 0.2866 | 1.07 (0.82-1.38) | | 0.8418 | 1.06 (0.76-1.47) | |
| | MDA-GWAS | 0.5648 | 1.28 (1.02-1.60) | | 0.6941 | 1.29 (0.97-1.71) | |
| rs11979158 (7p11.2) | Meta-analysis | 1.00x10^-4 | 1.33 (1.15-1.53) | 0.4723 | 0.3802 | 1.08 (0.91-1.29) | 0.9795 |
| | GICC | 0.0019 | 1.73 (1.41-2.11) | | 0.5255 | 1.01 (0.78-1.32) | |
| | SFAGS-GWAS | 0.0153 | 1.22 (0.78-1.90) | | 0.9649 | 1.06 (0.67-1.66) | |
| | GliomaScan | 0.2744 | 1.16 (0.81-1.65) | | 0.7934 | 1.14 (0.76-1.73) | |
| | MDA-GWAS | 0.3464 | 1.37 (1.02-1.86) | | 0.4863 | 1.09 (0.74-1.59) | |
| rs55705857 (8q24.21) | Meta-analysis | 1.05x10^-4 | 1.46 (1.21-1.77) | 0.4785 | 9.19x10^-9 | 2.02 (1.59-2.57) | 0.5645 |
| | GICC | 3.29x10^-4 | 1.53 (1.17-2.00) | | 6.15x10^-4 | 2.93 (2.05-4.18) | |
| | SFAGS-GWAS | 0.1047 | 1.04 (0.62-1.75) | | 1.83x10^-4 | 1.91 (1.09-3.35) | |
| | GliomaScan | 0.8738 | 1.36 (0.82-2.26) | | 0.0305 | 1.82 (1.01-3.27) | |
| | MDA-GWAS | 0.1605 | 1.63 (1.06-2.51) | | 0.0313 | 1.86 (1.08-3.22) | |

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

Table 4-9 Age- and sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 in persons age 54-63.

| RSID (Locus) | Study | PM | ORM (95% CI) | Phet (Males) | PF | ORF (95% CI) | Phet (Females) |
|---|---|---|---|---|---|---|---|
| rs723527 (7p11.2) | Meta-analysis | 4.79x10^-9 | 1.38 (1.24-1.53) | 0.9679 | 0.0208 | 1.17 (1.02-1.33) | 0.3472 |
| | GICC | 1.15x10^-5 | 1.31 (1.13-1.52) | | 0.0085 | 1.28 (1.06-1.53) | |
| | SFAGS-GWAS | 0.0777 | 1.34 (0.99-1.81) | | 0.2588 | 0.95 (0.62-1.44) | |
| | GliomaScan | 0.0244 | 1.44 (1.11-1.86) | | 0.7071 | 1.10 (0.82-1.46) | |
| | MDA-GWAS | 0.0077 | 1.39 (1.06-1.81) | | 0.5743 | 1.28 (0.92-1.76) | |
| rs11979158 (7p11.2) | Meta-analysis | 6.88x10^-7 | 1.47 (1.26-1.70) | 0.5027 | 0.0191 | 1.24 (1.04-1.49) | 0.7531 |
| | GICC | 1.14x10^-5 | 1.22 (0.99-1.50) | | 0.0874 | 1.63 (1.26-2.11) | |
| | SFAGS-GWAS | 0.3516 | 1.24 (0.82-1.88) | | 0.1262 | 1.25 (0.67-2.35) | |
| | GliomaScan | 0.2321 | 1.58 (1.11-2.25) | | 0.2501 | 1.07 (0.73-1.57) | |
| | MDA-GWAS | 0.0196 | 1.59 (1.08-2.33) | | 0.7429 | 1.25 (0.82-1.91) | |
| rs55705857 (8q24.21) | Meta-analysis | 0.8618 | 1.02 (0.82-1.27) | 0.8737 | 0.1776 | 1.21 (0.92-1.60) | 0.3763 |
| | GICC | 0.6675 | 0.92 (0.69-1.24) | | 0.7845 | 1.83 (1.23-2.75) | |
| | SFAGS-GWAS | 0.8117 | 1.10 (0.57-2.15) | | 0.1276 | 1.53 (0.70-3.33) | |
| | GliomaScan | 0.7053 | 0.85 (0.51-1.41) | | 0.1293 | 0.83 (0.48-1.43) | |
| | MDA-GWAS | 0.5575 | 1.07 (0.61-1.87) | | 0.6376 | 1.06 (0.48-2.33) | |

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

Table 4-10 Age- and sex-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857 in persons age 64+

| RSID (Locus) | Study | PM | ORM (95% CI) | Phet (Males) | PF | ORF (95% CI) | Phet (Females) |
|---|---|---|---|---|---|---|---|
| rs723527 (7p11.2) | Meta-analysis | 1.19x10^-6 | 1.30 (1.17-1.45) | 0.0551 | 5.34x10^-6 | 1.36 (1.19-1.55) | 0.9404 |
| | GICC | 4.47x10^-5 | 0.98 (0.85-1.13) | | 0.0015 | 1.50 (1.25-1.80) | |
| | SFAGS-GWAS | 0.8983 | 1.51 (1.11-2.06) | | 0.0636 | 1.37 (0.89-2.11) | |
| | GliomaScan | 1.42x10^-4 | 0.97 (0.79-1.20) | | 0.0103 | 1.24 (0.97-1.58) | |
| | MDA-GWAS | 0.8770 | 1.35 (0.92-1.97) | | 0.3203 | 1.34 (0.88-2.06) | |
| rs11979158 (7p11.2) | Meta-analysis | 7.71x10^-8 | 1.52 (1.30-1.76) | 0.3902 | 0.0010 | 1.37 (1.14-1.65) | 0.7245 |
| | GICC | 1.33x10^-6 | 1.13 (0.93-1.39) | | 0.0872 | 1.53 (1.18-1.99) | |
| | SFAGS-GWAS | 0.5762 | 1.54 (0.99-2.40) | | 0.1839 | 1.43 (0.76-2.67) | |
| | GliomaScan | 0.0056 | 1.22 (0.90-1.65) | | 0.0369 | 1.84 (1.32-2.58) | |
| | MDA-GWAS | 0.4735 | 1.65 (0.97-2.82) | | 0.0803 | 1.26 (0.63-2.50) | |
| rs55705857 (8q24.21) | Meta-analysis | 0.5682 | 1.06 (0.86-1.32) | 0.5971 | 0.0010 | 1.54 (1.19-1.99) | 0.0884 |
| | GICC | 0.5718 | 0.67 (0.50-0.89) | | 0.7017 | 1.02 (0.68-1.53) | |
| | SFAGS-GWAS | 0.3421 | 1.04 (0.45-2.36) | | 0.9716 | 2.06 (0.73-5.83) | |
| | GliomaScan | 0.8640 | 1.41 (0.94-2.09) | | 3.62x10^-4 | 2.25 (1.51-3.35) | |
| | MDA-GWAS | 0.3280 | 1.09 (0.55-2.15) | | 0.0356 | 1.08 (0.51-2.31) | |

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

Table 4-11 Minor allele frequencies (MAF) for meta-analysis and individual studies by case-control status, age group, and sex for rs75061358, rs723527, rs11979158, and rs55705857.

| RSID (Locus) | Study | Controls (All Ages) Males N | Controls (All Ages) Males MAF | Controls (All Ages) Females N | Controls (All Ages) Females MAF | Cases 18-53 Males N | Cases 18-53 Males MAF | Cases 18-53 Females N | Cases 18-53 Females MAF | Cases 54-63 Males N | Cases 54-63 Males MAF | Cases 54-63 Females N | Cases 54-63 Females MAF | Cases 64+ Males N | Cases 64+ Males MAF | Cases 64+ Females N | Cases 64+ Females MAF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs723527 (7p11.2) | Meta-analysis | 5,167 | 0.410 | 5,415 | 0.410 | 984 | 0.363 | 549 | 0.372 | 917 | 0.337 | 559 | 0.375 | 931 | 0.350 | 572 | 0.341 |
| | GICC | 1,868 | 0.427 | 1,396 | 0.427 | 517 | 0.356 | 278 | 0.354 | 515 | 0.342 | 308 | 0.359 | 541 | 0.352 | 297 | 0.348 |
| | SFAGS-GWAS | 749 | 0.391 | 1,618 | 0.402 | 127 | 0.335 | 73 | 0.363 | 111 | 0.324 | 52 | 0.356 | 95 | 0.384 | 53 | 0.311 |
| | MDA-GWAS | 1,464 | 0.403 | 1,259 | 0.402 | 143 | 0.381 | 86 | 0.401 | 149 | 0.346 | 113 | 0.416 | 237 | 0.319 | 174 | 0.333 |
| | GliomaScan | 1,086 | 0.402 | 1,142 | 0.408 | 197 | 0.386 | 112 | 0.397 | 142 | 0.320 | 86 | 0.390 | 58 | 0.405 | 48 | 0.354 |
| rs11979158 (7p11.2) | Meta-analysis | 5,167 | 0.171 | 5,415 | 0.163 | 984 | 0.135 | 549 | 0.152 | 917 | 0.127 | 559 | 0.138 | 931 | 0.122 | 572 | 0.127 |
| | GICC | 1,868 | 0.186 | 1,396 | 0.164 | 517 | 0.138 | 278 | 0.144 | 515 | 0.125 | 308 | 0.130 | 541 | 0.123 | 297 | 0.133 |
| | SFAGS-GWAS | 749 | 0.154 | 1,618 | 0.151 | 127 | 0.098 | 73 | 0.158 | 111 | 0.135 | 52 | 0.106 | 95 | 0.137 | 53 | 0.104 |
| | GliomaScan | 1,464 | 0.163 | 1,259 | 0.173 | 143 | 0.140 | 86 | 0.180 | 149 | 0.141 | 113 | 0.159 | 237 | 0.112 | 174 | 0.132 |
| | MDA-GWAS | 1,086 | 0.166 | 1,142 | 0.166 | 197 | 0.147 | 112 | 0.147 | 142 | 0.113 | 86 | 0.157 | 58 | 0.138 | 48 | 0.094 |
| rs55705857 (8q24.21) | Meta-analysis | 5,167 | 0.066 | 5,415 | 0.055 | 984 | 0.086 | 549 | 0.092 | 917 | 0.064 | 559 | 0.061 | 931 | 0.066 | 572 | 0.074 |
| | GICC | 1,868 | 0.059 | 1,396 | 0.051 | 517 | 0.085 | 278 | 0.085 | 515 | 0.059 | 308 | 0.050 | 541 | 0.062 | 297 | 0.052 |
| | SFAGS-GWAS | 749 | 0.063 | 1,618 | 0.050 | 127 | 0.090 | 73 | 0.114 | 111 | 0.061 | 52 | 0.081 | 95 | 0.049 | 53 | 0.049 |
| | GliomaScan | 1,464 | 0.073 | 1,259 | 0.061 | 143 | 0.080 | 86 | 0.098 | 149 | 0.082 | 113 | 0.086 | 237 | 0.075 | 174 | 0.110 |
| | MDA-GWAS | 1,086 | 0.071 | 1,142 | 0.059 | 197 | 0.089 | 112 | 0.093 | 142 | 0.064 | 86 | 0.053 | 58 | 0.092 | 48 | 0.103 |

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

4.4.3 Combined analysis of germline variants and somatic characterization

Due to the lack of molecular classification data included in the GICC, MDA-GWAS,

SFAGS-GWAS and GliomaScan datasets, GBM data obtained from the TCGA GBM dataset were used to explore the potential confounding due to molecular subtype variation within histologies. After quality control procedures, 356 individuals from the TCGA dataset had germline genotyping, tumor molecular characterization, and age at diagnosis information available for analysis (Table 4-1). Age at diagnosis differed significantly by molecular subgroup. Median age at diagnosis was lowest in IDH1/2 mutant GBM samples (38.5 years) and highest in IDH1/2 wildtype samples (62.0 years). Within the youngest age group (individuals 18-53), 15/100 individuals had IDH1/2 mutant tumors

(15%), as compared to 2/94 in those 54-63 (2.1%), and 1/121 in those 64+ (0.8%)

(Figure 4-9a). No samples included in the TCGA dataset had deletions of 1p/19q in any age group. Frequency of TERT mutation increased with increasing age, with the highest frequency of this feature (94%) occurring in the oldest age group (Figure 4-9b). Among methylation subtypes, G-CIMP occurred most frequently in the youngest group (14.4%), as compared to 3.3% in those 54-63 and <1% in those 64+ (Figure 4-9c).

Overall, GBM samples in the youngest age group appeared to be more “LGG-like” as compared to those in the two older groups [164, 168].
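
The age strata and the IDH1/2 mutant proportions quoted above can be reproduced with a short sketch. The function name and the handling of ages under 18 are illustrative assumptions; the counts are those given in the text.

```python
def age_group(age):
    """Assign an age at diagnosis to one of the three strata used in this chapter."""
    if age < 18:
        raise ValueError("the strata described here start at age 18")
    if age <= 53:
        return "18-53"
    if age <= 63:
        return "54-63"
    return "64+"

# (IDH1/2 mutant count, group size) per age stratum, as reported in the text.
idh_mutant = {"18-53": (15, 100), "54-63": (2, 94), "64+": (1, 121)}

def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

proportions = {group: percent(k, n) for group, (k, n) in idh_mutant.items()}
print(proportions)  # {'18-53': 15.0, '54-63': 2.1, '64+': 0.8}
```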

Figure 4-9 Proportion of samples in the TCGA GBM dataset by age by A) IDH mutation status, B) glioma subtypes (based on IDH, 1p/19q, and TERT), and C) methylation groups identified by TCGA pan-glioma working group

SNPs found to be nominally significant (p<5x10^-4) in a previous eight-study meta-analysis, with imputation quality ≥ 0.7, were identified within the TCGA genotype data, and D' and r2 values in CEU were used to select proxy SNPs (Table 4-12) [174]. No proxy SNPs meeting the imputation quality and previous significance thresholds could be identified for 11q23.3.
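
For readers less familiar with the two linkage disequilibrium measures used to select proxies, the sketch below computes D' and r2 from allele and haplotype frequencies using the standard definitions. The frequencies in the example are hypothetical and are not taken from the CEU reference data; they are chosen to show how a proxy can have D'=1 but a low r2 when the two alleles differ greatly in frequency.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (D', r2) for two biallelic loci.

    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2).
    p_ab: frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical case: a rare allele (5%) that is always found on haplotypes
# carrying a common allele (60%).
d_prime, r2 = ld_measures(p_a=0.05, p_b=0.60, p_ab=0.05)
print(f"D'={d_prime:.3f}, r2={r2:.3f}")
```

A marker of this kind tags the risk allele poorly, which is consistent with the caution applied later to the marker SNP in weak LD with rs55705857.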

Table 4-12 Linkage disequilibrium measures, minor allele frequency (MAF), odds ratios (OR), 95% confidence intervals (95% CI), and p-values from prior eight-study meta-analysis for previously identified glioma risk SNPs or marker SNPs selected within The Cancer Genome Atlas (TCGA) genotyping data

| Previously identified SNP (Locus) | Marker SNP | Risk Allele | LD (CEU) a | MAF (CEU) | OR (95% CI), eight-study meta-analysis | P, eight-study meta-analysis |
|---|---|---|---|---|---|---|
| rs10069690 (5p15.33) | rs2736100 | C/A | D'=0.854; r2=0.276 | 0.51 | 1.42 (1.36,1.48) | 3.20x10^-55 |
| rs75061358 (7p11.2) | rs11766623 | T/C | D'=1; r2=0.161 | 0.27 | 1.21 (1.15,1.27) | 1.03x10^-14 |
| rs723527 (7p11.2) | rs10244020 | T/C | D'=0.934; r2=0.742 | 0.49 | 1.24 (1.19,1.30) | 3.69x10^-22 |
| rs11979158 (7p11.2) | rs7785013 | G/A | D'=1; r2=1 | 0.79 | 1.33 (1.25,1.41) | 1.72x10^-20 |
| rs55705857 (8q24.21) | rs4733720 | C/A | D'=1; r2=0.018 | 0.70 | 1.09 (1.04,1.14) | 3.35x10^-4 |
| rs634537 (9p21.3) | rs634537 | T/G | -- | 0.44 | 1.38 (1.32,1.44) | 1.01x10^-45 |
| rs78378222 (17p13.1) | rs1641528 | A/G | D'=1; r2=0.037 | 0.88 | 1.15 (1.08,1.23) | 1.97x10^-5 |
| rs2297440 (20q13.33) | rs2738758 | C/G | D'=0.915; r2=0.837 | 0.77 | 1.41 (1.34,1.49) | 1.20x10^-37 |

a. As estimated using LD link [174]
b. See Melin, et al. [4]

There were no signals identified associated with age within any of the previously

identified loci that reached a genome-wide significance threshold (Table 4-13).

Nominally significant associations were detected between risk alleles and age at diagnosis in IDH1/2 wildtype gliomas at 5p15.33 (p=0.0688), with each risk allele associated with a 1.92-year increase in age at diagnosis. There was also a nominally significant association between one SNP in 7p11.2 (rs7785013) and age at diagnosis within all GBM (p=0.0200) and within IDH1/2 wildtype GBM only (p=0.0834), where each risk allele was associated with a 2.5-year decrease in age at diagnosis. There was no significant association identified for the marker SNP for rs55705857, though this marker SNP was in weak LD with the previously identified risk SNP and was only nominally significant in the previous meta-analysis (Table 4-12).
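
The per-allele changes in age at diagnosis reported above are betas from a case-only analysis. A minimal sketch of such a model is given below, assuming a simple unadjusted linear regression of age at diagnosis on risk-allele count; the model actually fitted may have included covariates, and the genotypes and ages here are hypothetical.

```python
import math

def case_only_slope(dosages, ages):
    """OLS slope of age at diagnosis on risk-allele count among cases only.

    Returns (slope, standard error, two-sided normal-approximation p-value).
    """
    n = len(ages)
    mean_x = sum(dosages) / n
    mean_y = sum(ages) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, ages))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, ages))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(slope / se) / math.sqrt(2.0)) if se > 0 else 0.0
    return slope, se, p

# Hypothetical cases: risk-allele counts (0, 1, 2) and ages at diagnosis.
dosages = [0, 1, 2, 0, 1, 2]
ages = [60, 58, 55, 62, 57, 56]
slope, se, p = case_only_slope(dosages, ages)
print(f"beta={slope:.2f} years per risk allele, SE={se:.2f}, p={p:.3f}")
```

A negative beta corresponds to younger age at diagnosis per additional risk allele, and a positive beta to older age.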

Table 4-13 Case-only betas, 95% confidence intervals (95% CI), and p-values for previously identified glioma risk SNPs or marker SNPs within The Cancer Genome Atlas (TCGA) genotyping data

| Previously identified SNP (Locus) | Marker SNP | INFO | MAF 18-53 | MAF 54-63 | MAF 64+ | All GBM Beta (95% CI) | All GBM p | IDH1/2 Wild Type GBM Beta (95% CI) | IDH1/2 Wild Type GBM p |
|---|---|---|---|---|---|---|---|---|---|
| rs10069690 (5p15.33) | rs2736100 | 0.90 | 0.59 | 0.56 | 0.56 | 0.35 (-1.71, 2.41) | 0.7370 | 1.92 (-0.15, 3.99) | 0.0688 |
| rs75061358 (7p11.2) | rs11766623 | 0.86 | 0.29 | 0.29 | 0.27 | 1.24 (-1.20, 3.67) | 0.3187 | 1.02 (-1.40, 3.43) | 0.4086 |
| rs723527 (7p11.2) | rs10244020 | 0.75 | 0.70 | 0.69 | 0.70 | 0.35 (-2.11, 2.81) | 0.7801 | 1.05 (-1.39, 3.49) | 0.3989 |
| rs11979158 (7p11.2) | rs7785013 | 0.99 | 0.81 | 0.85 | 0.88 | -3.20 (-5.89, -0.50) | 0.0200 | -2.47 (-5.27, 0.33) | 0.0834 |
| rs55705857 (8q24.21) | rs4733720 | 0.92 | 0.66 | 0.74 | 0.68 | -0.04 (-2.26, 2.17) | 0.9692 | -0.43 (-2.65, 1.78) | 0.7012 |
| rs634537 (9p21.3) | -- | 0.83 | 0.39 | 0.41 | 0.42 | 0.85 (-1.42, 3.12) | 0.4631 | 1.52 (-0.82, 3.86) | 0.2035 |
| rs78378222 (17p13.1) | rs1641528 | 0.99 | 0.86 | 0.88 | 0.87 | -0.15 (-3.10, 2.79) | 0.9192 | -0.97 (-3.95, 2.01) | 0.5236 |
| rs2297440 (20q13.33) | rs2738758 | 0.91 | 0.84 | 0.82 | 0.80 | -0.65 (-3.17, 1.88) | 0.6143 | -0.45 (-2.93, 2.03) | 0.7197 |

Discussion

This represents the first analysis of inherited risk variants in sporadic GBM focused

specifically on differences in age at diagnosis. Like many other types of chronic disease,

incidence of GBM increases with increasing age, with highest incidence in persons 65-70

years old (Figure 4-1) [1]. This study demonstrated that there are age-related differences

in frequency of known heritable genetic risk variants in glioma, largely driven by

differences in those less than 54 years of age.

Two SNPs at the 7p11.2 locus (rs11979158 and rs723527) showed variation in

association by age (Figure 4-5), with similar effects in all included studies (Figure 4-10,

Figure 4-11, Table 4-14). These variants were within one of two previously identified

independent glioma risk loci located near epidermal growth factor receptor (EGFR),

which is most strongly associated with risk for GBM [4, 177]. Though EGFR is

implicated in many cancer types and is a target for many anti-cancer therapies, this risk

locus has not been previously associated with any other cancer type. Glioma risk SNPs within EGFR have been previously associated with increased EGFR copy number within their tumors [225], an alteration that has been commonly reported in GBM [226].

Variation in an observed association with a SNP at the 8q24.21 locus (rs55705857) by age was present for those 18-53 years of age (Figure 4-12). This SNP has been previously associated with non-GBM glioma, in particular low grade tumors that have

IDH1/2 mutation and 1p/19q co-deletion, and a significant association between this SNP and GBM has not been previously reported.

Table 4-14 Age-specific odds ratios (OR), 95% confidence intervals (95% CI), and p-values from meta-analysis and individual studies for rs723527, rs11979158, and rs55705857.

| SNP (Locus) | Study | P (18-53) | OR (95% CI) (18-53) | Phet (18-53) | P (54-63) | OR (95% CI) (54-63) | Phet (54-63) | P (64+) | OR (95% CI) (64+) | Phet (64+) |
|---|---|---|---|---|---|---|---|---|---|---|
| rs723527 (7p11.2) | Meta-analysis | 1.37x10^-5 | 1.19 (1.10-1.29) | 0.2992 | 1.50x10^-9 | 1.28 (1.18-1.39) | 0.6193 | 2.14x10^-11 | 1.32 (1.21-1.43) | 0.2263 |
| | GICC | 2.66x10^-5 | 1.28 (1.14-1.43) | | 3.72x10^-7 | 1.33 (1.19-1.49) | | 2.12x10^-7 | 1.34 (1.20-1.49) | |
| | SFAGS-GWAS | 0.0601 | 1.23 (0.99-1.53) | | 0.0360 | 1.29 (1.02-1.64) | | 0.3053 | 1.14 (0.89-1.46) | |
| | GliomaScan | 0.3372 | 1.10 (0.90-1.35) | | 0.1460 | 1.15 (0.95-1.39) | | 4.84x10^-6 | 1.43 (1.23-1.67) | |
| | MDA-GWAS | 0.4872 | 1.06 (0.89-1.27) | | 0.0146 | 1.28 (1.05-1.57) | | 0.5770 | 1.08 (0.82-1.44) | |
| rs11979158 (7p11.2) | Meta-analysis | 2.47x10^-4 | 1.22 (1.10-1.36) | 0.8084 | 6.13x10^-8 | 1.35 (1.21-1.50) | 0.8007 | 2.18x10^-10 | 1.42 (1.27-1.58) | 0.8696 |
| | GICC | 0.0038 | 1.25 (1.08-1.46) | | 5.56x10^-6 | 1.42 (1.22-1.64) | | 7.99x10^-7 | 1.45 (1.25-1.68) | |
| | SFAGS-GWAS | 0.0573 | 1.33 (0.99-1.77) | | 0.0944 | 1.31 (0.95-1.80) | | 0.2050 | 1.24 (0.89-1.74) | |
| | GliomaScan | 0.3124 | 1.14 (0.88-1.48) | | 0.0972 | 1.23 (0.96-1.58) | | 5.17x10^-4 | 1.43 (1.17-1.76) | |
| | MDA-GWAS | 0.2420 | 1.15 (0.91-1.44) | | 0.0419 | 1.31 (1.01-1.70) | | 0.0837 | 1.39 (0.96-2.01) | |
| rs55705857 (8q24.21) | Meta-analysis | 9.30x10^-11 | 1.76 (1.49-2.10) | 0.3040 | 0.4225 | 1.08 (0.90-1.28) | 0.5380 | 0.0280 | 1.21 (1.02-1.44) | 0.0811 |
| | GICC | 6.93x10^-7 | 1.82 (1.44-2.31) | | 0.6121 | 1.06 (0.84-1.36) | | 0.4955 | 1.09 (0.86-1.38) | |
| | SFAGS-GWAS | 2.88x10^-4 | 2.50 (1.52-4.09) | | 0.5241 | 1.19 (0.69-2.05) | | 0.4330 | 0.79 (0.45-1.41) | |
| | GliomaScan | 0.1666 | 1.35 (0.88-2.07) | | 0.2083 | 1.31 (0.86-1.98) | | 0.0138 | 1.48 (1.08-2.02) | |
| | MDA-GWAS | 0.0151 | 1.63 (1.10-2.41) | | 0.4518 | 0.85 (0.56-1.30) | | 0.0356 | 1.99 (1.05-3.80) | |

Abbreviations: GICC: Glioma International Case-Control Study; SFAGS-GWAS: San Francisco Adult Glioma Study GWAS; MDA-GWAS: MD Anderson GWAS

Figure 4-10 Age-specific odds ratios and 95% CI from meta-analysis and by study for rs723527

Figure 4-11 Age-specific odds ratios and 95% CI from meta-analysis and by study for rs11979158

Figure 4-12 Age-specific odds ratios and 95% CI from meta-analysis and by study for rs55705857

Previous studies have reported that alterations in telomerase-related genes (e.g. TERT and RTEL1), which have been consistently associated with glioma, are also associated with increased age at diagnosis [148]. This analysis found that variants in TERT

(rs10069690) and RTEL1 (rs2297440) reached genome-wide significance among all three age cohorts, with no substantial difference in estimated effect by age (Table 4-5). In the

case-only analysis, there was a nominally significant association between rs10069690

(pRandom=0.0053) and increased age at diagnosis, with an increase of 0.76 years of age per risk allele (Figure 4-7, Table 4-3). There was no significant association between

rs2297440 (pRandom=0.0512) and increased age at diagnosis. The previous analysis also reported an

association between risk alleles in PHLDB1 and decreased age at diagnosis [148], but no

SNPs in PHLDB1 reached genome-wide significance in any cohort in this analysis. There was a nominally significant association between these SNPs and age at diagnosis in a case-only analysis, but when analysis was limited to those 54 years of age and older, the effect of these SNPs on age at diagnosis was null (Figure 4-7, Table 4-3), suggesting that phenotype differences between the younger and older cohort may be driving the observed allele frequency differences rather than a true effect on age at diagnosis.

Both SNPs in 7p11.2 and 8q24.21 that had age-specific associations in this analysis have been previously documented to have histology-specific associations. Risk SNPs in 7p11.2 are most strongly associated with GBM, while the risk SNP in 8q24.21 has previously been shown to be specifically associated with lower grade, IDH1/2 mutant gliomas.

While IDH1/2 mutation is most common in lower grade gliomas (where ~80% of tumors have this alteration), only a minority (~5%) of GBM have this alteration. Within GBM,

IDH1/2 mutation is generally considered to be a marker of secondary GBM, or GBM that has arisen from a previously undiagnosed lower grade glioma [227]. These mutations in

IDH1/2 occur very early in the process of gliomagenesis, and as a result IDH1/2 mutant GBM represent an entity that is distinct from GBM without IDH1/2 mutation. It is likely that these ‘primary’ IDH1/2 mutant GBM are tumors that progressed quickly from lower grade tumors without being previously detected [227].

Individuals with secondary GBM have been consistently reported to be younger as compared to those with primary GBM [228-232]. A population-based study found that

73% of secondary GBM had mutations in IDH1, as compared to 3.7% of primary GBM

[228]. Among individuals with tumors that appear histologically to be GBM, IDH1/2 mutation has been known to occur more commonly in younger individuals as compared to older individuals [233-235]. A recent study performed DNA sequencing on 100 individuals ages 18-45, and 98 individuals 70 years of age or older with GBM and found that IDH1 mutations occurred nearly 10x as frequently among younger persons with

GBM (26% vs 3.1%, respectively) [235]. Another analysis from two Austrian centers found IDH1 mutations occurring at a frequency of 39.2% of 56 evaluated cases [234].

Among the GBM patient set included in The Cancer Genome Atlas (TCGA), IDH1/2 mutation occurred in 15% of cases in individuals 18-53, as compared to 2.1% and 0.8% in individuals 54-63 and 64+, respectively (Figure 4-9).

All GBM cases from the four included GWAS datasets were recruited at time of first diagnosis, and the assigned diagnoses represent the primary tumor type. Diagnostic technologies and histologic classification criteria for GBM have changed over the time period during which cases were recruited for the four included studies. GliomaScan recruitment occurred over the longest period of time (1974-2011), followed by MDA-GWAS (1990-2008), SFAGS-GWAS (1997-2006), and GICC (2010-2013). Revisions to the WHO classification for brain tumors occurred in 1993, 2000, and 2007, and as a result the three earlier studies may include cases that were diagnosed under multiple classification schemas. While classification using IDH1/2 mutation and 1p/19q codeletion was not codified until the release of the 2016 WHO classification scheme, these markers were gradually adopted for use in glioma diagnosis prior to the release of this scheme. As a result, both changing classification criteria over time and the gradual adoption of molecular markers could have resulted in inclusion of some earlier cases that may not have been classified as GBM if diagnosed during a later period.

The TCGA dataset was used to explore the potential confounding due to molecular

subtype variation within histologies, but these analyses have limited power due to sample

size and low allele frequencies of some identified variants. While molecular classification

information was available for TCGA cases, genotyping data was generated using a

different platform than the GWAS datasets used for this study, and as a result information

quality at the identified loci was poor. Within the GBM histology group in TCGA,

proportions of molecular markers by age varied significantly. The largest difference was

in frequency of IDH1/2 mutation, which was present in 15% of individuals 18-53, as compared to

2.1% and 0.8% in the two older age groups, respectively (p=0.0005) (Figure 4-9).

Conclusions

Age-specific differences in cancer susceptibility can provide important clues to etiology,

and these differences can be leveraged for discovery in genetic association studies. This

analysis identified potential age-specific effects in two previously identified glioma risk loci (7p11.2 and 8q24.21). The association of a SNP known to confer risk for IDH1/2 mutant glioma with GBM within individuals 18-53 suggests that a substantial portion of younger individuals included in GBM research may present initially with ‘secondary GBM.’ The higher prevalence of IDH1/2 mutant GBM within this younger age group is also evident within TCGA GBMs. While age is known to be a strong prognostic factor in GBM, the

results of this analysis suggest that younger age is associated with phenotype and risk of

GBM, and this should be taken into consideration when true molecular classification of

GBM is not available.

Chapter 5 – Conclusions and future directions

Conclusion

Glioma is the most commonly occurring malignant brain tumor in the US, with an

average annual age-adjusted incidence of 6.0 per 100,000 from 2009-2013. However,

incidence rates stratified by sex, race, and age differ significantly. Though these tumors

are rare, they cause significant morbidity and mortality. There are no well-validated risk

factors for these tumors that explain a large proportion of cases, and the vast majority of

cases occur in individuals with no family history of glioma. Previously conducted glioma

GWAS have included predominantly European ancestry males and higher grade gliomas.

Glioma is significantly more common in males, and persons of older age. To date, glioma

GWAS have found 25 validated risk loci in European ancestry populations and have not

specifically investigated the potential genetic sources of risk variation by sex or age. The

overall goal of this dissertation was to utilize these demographic differences in incidence

to increase power for detection of variants that may have varying effects by age and sex.

The overall hypothesis is that population variation in risk and incidence is to some extent

the result of genetic variation between persons diagnosed with glioma by age and sex.

In the first part of this dissertation (Chapter 2), potential sex-specific genetic effects were assessed in autosomal SNPs and sex chromosome variants for 1) all glioma, 2)

GBM and 3) non-GBM patients using data from four previous glioma GWAS in

European-ancestry populations (the GICC, MDA-GWAS, SFAGS-GWAS, and

GliomaScan). Sex-stratified logistic regression models were used to generate sex-specific betas, standard errors, and p-values. A z-test was used to generate a p-value for the

difference between male and female estimates. Data from each study were analyzed separately and combined using inverse variance weighted meta-analysis. A significant

association was detected at rs11979158 (7p11.2, EGFR), a SNP previously associated

with glioma risk, in males only. A previously identified intergenic SNP at 8q24.21

(rs55705857) had an effect size in females that was approximately 2x that of males.

Additionally, a new large region on 3p21.31 was identified with significant association in

females only. The variance explained by the addition of three SNPs identified as having

sex-specific effects (rs55705857, rs9841110 and rs11979158) was higher in females for all glioma (1.3% in males and 2.2% in females) and non-GBM glioma (2.3% in males and 5.3% in females), and slightly higher in males for GBM (0.9% in males and

0.7% in females). Though the identified differences in effect of risk variants do not fully explain the observed incidence difference in glioma by sex, they provide evidence that gliomagenesis may differ by sex.
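
The two statistical steps summarized above, inverse variance weighted meta-analysis and the z-test for a difference between male and female estimates, can be sketched as follows. The function names are illustrative, the per-study values are hypothetical, and the z-test assumes the male and female estimates are independent.

```python
import math

def ivw_meta(betas, ses):
    """Inverse variance weighted pooled beta and standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

def sex_difference_test(beta_m, se_m, beta_f, se_f):
    """z statistic and two-sided p-value for beta_m - beta_f (independent samples)."""
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical sex-specific log odds ratios and SEs from four studies.
male_beta, male_se = ivw_meta([0.30, 0.25, 0.40, 0.35], [0.10, 0.15, 0.20, 0.18])
female_beta, female_se = ivw_meta([0.70, 0.60, 0.55, 0.80], [0.12, 0.20, 0.22, 0.25])
z, p = sex_difference_test(male_beta, male_se, female_beta, female_se)
print(f"z={z:.2f}, p={p:.4f}")
```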

In the second part of this dissertation (Chapter 3), sex-specific single-SNP summary statistics generated in Chapter 2 were used to generate gene- and biological pathway-specific scores for all glioma and GBM. Using the GICC data as a testing set, and a

meta-analysis of SFAGS-GWAS, MDA-GWAS, and GliomaScan as a validation set,

gene-specific scores were generated using three different scoring algorithms (Pascal,

BimBam, and GATES), and a consensus (identified as significant by 2/3 algorithms) was

used to prioritize genes for validation. Pathway scores were generated using Pascal.

Significant associations were found in genes containing SNPs previously associated with

glioma risk, including EGFR, TERT, RTEL1 and TP53. EGFR was significantly

associated with all glioma and GBM in males only. There were significant associations

between glioma and germline variants in pathways of telomere maintenance, as well as the KEGG glioma pathway. Further interrogation of genes within these pathways may identify additional sources of genetic risk for this disease in general, as well as sources of sex-specific risk.
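
The consensus rule described above, in which a gene is prioritized when at least two of the three scoring algorithms identify it as significant, can be sketched as a simple vote. The gene labels, p-values, and significance threshold below are hypothetical placeholders; this does not call or reproduce Pascal, BimBam, or GATES themselves.

```python
def consensus_genes(pvalues_by_method, alpha, min_methods=2):
    """Return genes significant (p < alpha) in at least min_methods methods."""
    counts = {}
    for gene_pvalues in pvalues_by_method.values():
        for gene, p in gene_pvalues.items():
            if p < alpha:
                counts[gene] = counts.get(gene, 0) + 1
    return sorted(gene for gene, n in counts.items() if n >= min_methods)

# Hypothetical gene-level p-values keyed by scoring method.
scores = {
    "Pascal": {"GENE_A": 1e-7, "GENE_B": 0.20, "GENE_C": 1e-6},
    "BimBam": {"GENE_A": 5e-7, "GENE_B": 1e-6, "GENE_C": 0.03},
    "GATES": {"GENE_A": 2e-6, "GENE_B": 0.40, "GENE_C": 4e-6},
}
prioritized = consensus_genes(scores, alpha=1e-5)
print(prioritized)  # ['GENE_A', 'GENE_C']
```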

In the final part of this dissertation (Chapter 4), potential age-specific genetic effects in autosomal SNPs were assessed using data from four previous glioma GWAS in

European-ancestry populations (the GICC, MDA-GWAS, SFAGS-GWAS, and GliomaScan). Cases were stratified into three age groups (18-53, 54-63, 64+) that were approximately the tertiles of the age distribution among cases from all four studies combined. Age-stratified logistic regression models were used to generate age stratum-specific betas, standard errors, and p-values. A previously identified intergenic SNP at

8q24.21 (rs55705857) was found to have a genome-wide significant effect in the youngest age group, while there was no detectable effect in either older age group. This

SNP is strongly associated with IDH1/2 mutant lower grade gliomas, and has not been previously associated with GBM. Some SNPs within the previously identified risk regions within EGFR did not reach genome-wide significance in the youngest groups, while their effect size in the older age groups was consistent with what has previously been reported. Further analyses conducted within the GBM data from The Cancer

Genome Atlas suggested that a larger portion of younger persons with GBM have somatic features associated with lower grade glioma, as compared to older persons, in particular a higher prevalence of IDH1/2 mutant GBM. This suggests that a substantial portion of younger individuals included in GBM research may present initially with

‘secondary GBM.’ While age is well known to be a strong prognostic factor in GBM, this

suggests that younger age is associated with a ‘secondary GBM’ phenotype presenting as primary disease, and this should be taken into consideration when true molecular classification of GBM is not available.

Sex, age and other demographic differences in cancer susceptibility can provide important clues to etiology, and these differences can be leveraged for discovery in genetic association studies. These significant differences in effect size by sex may be a result of differing biological function of these variants by sex due to biological sex differences in gene expression, or interaction between these variants and unidentified risk factors that vary in prevalence or effect by sex. The higher prevalence of an “LGG-like”

GBM phenotype within younger persons suggests that younger age is associated with phenotype and risk of GBM, and this should be taken into consideration when true molecular classification of GBM is not available. Conducting age- and sex-stratified analyses within GWAS can provide population-specific information about genetic susceptibility, refine phenotype definitions, and generate new hypotheses about the biological mechanisms through which germline variants affect risk of disease.

Future directions

5.2.1 Phenotypic variability, and incorporating molecular classification of glioma into genetic association studies

Accurate classification of disease and exposure status is critical in case-control studies, with misclassification of these factors causing poorly estimated or incorrect associations, and significant decreases in power [236-240]. Standard procedures for data quality control in GWAS aim to decrease genotype error as much as possible, but phenotyping error may be more difficult to evaluate. Studies of phenotype measurement in longitudinal traits have demonstrated that differences in measurement method can significantly affect the SNPs found to be significant by GWAS [241], and this has also been demonstrated in heterogeneous binary traits where classification may be subjective [242, 243]. This unmeasured phenotyping error likely contributes to both ‘missing heritability’ and difficulties in replicating associations.

Historically, glioma GWAS have categorized phenotype based on histologic criteria only, with few exceptions [103]. The significant inter-rater variability in diagnosis of glioma has been well documented [244, 245], though the extent of misclassification has been shown to be more significant among lower grade glioma as compared to GBM [246]. The advancement of high-throughput technologies has allowed for full molecular classification of gliomas, which has resulted in re-organization of glioma histologic classification, as well as the creation and elimination of histologies [163, 164, 168, 169, 176]. The incorporation of these molecular markers into histologic classification criteria has been shown to significantly improve diagnostic accuracy and consistency [247].

While many of these molecular markers have been used consistently at large academic medical centers for years, they did not become components of the WHO classification criteria until the 2016 revision [248]. All GWAS datasets included in these analyses were classified based on the diagnosis contained in each individual’s medical record, all of which were assigned based on the prevailing criteria at the time, and largely lack information on molecular classification markers. The lack of central pathology review and molecular classification data means that there is likely significant heterogeneity within the assigned histologic groups. These results suggest that the cohort of younger persons with GBM may be more likely to have phenotypic misclassification, with as much as 15-20% of individuals <54 years old at time of diagnosis presenting with a secondary GBM. The inclusion of these misclassified individuals in analyses attempting to find genetic risk factors for primary GBM likely decreases power to detect ‘true’ genetic variants associated with GBM.
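The way case misclassification pulls an effect estimate toward the null can be illustrated with a minimal sketch. It assumes, purely for illustration, an allelic odds ratio model in which misclassified cases carry the risk allele at the same frequency as controls. The function name and all input values are hypothetical and are not estimates from the datasets analyzed here.

```python
def attenuated_odds_ratio(true_or, control_freq, misclassified_fraction):
    """Allelic odds ratio observed when a fraction of the case group
    does not actually have the target phenotype.

    Assumption (illustrative only): misclassified cases carry the risk
    allele at the control frequency.
    """
    control_odds = control_freq / (1.0 - control_freq)
    # Risk allele frequency among correctly classified cases,
    # implied by the true allelic odds ratio.
    case_odds = true_or * control_odds
    true_case_freq = case_odds / (1.0 + case_odds)
    # The observed case group is a mixture of true and misclassified cases.
    observed_case_freq = (
        (1.0 - misclassified_fraction) * true_case_freq
        + misclassified_fraction * control_freq
    )
    observed_odds = observed_case_freq / (1.0 - observed_case_freq)
    return observed_odds / control_odds


if __name__ == "__main__":
    # Hypothetical variant: true odds ratio 1.5, control allele frequency 0.30.
    for fraction in (0.0, 0.15, 0.20):
        print(fraction, round(attenuated_odds_ratio(1.5, 0.30, fraction), 3))
```

Under these assumptions the observed odds ratio moves toward 1 as the misclassified fraction grows, which is one way to see why power to detect variants specific to primary GBM would decrease.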

Molecular characterization is increasingly a regular part of cancer diagnosis, and the subtypes identified provide an opportunity to more accurately classify diagnoses for genetic analyses. Many cancer subtypes have distinct lineages of acquisition of malignant behavior, even when they arise within the same tissue site. In glioma, IDH1/2 mutation is a precipitating event in gliomagenesis, and delineates between two distinct glioma phenotypes. In order to improve risk prediction models for glioma, it is essential to fully understand associations between germline variants and somatic characteristics of tumors. Where access to clinical tumor specimens exists, molecular classification data should be generated for individuals already genotyped within GWAS datasets. Replicating previously discovered findings within these subsets of individuals with molecular classification is an important first step in beginning to refine these associations.

5.2.2 Validation of sex chromosome associations and new approaches to analyzing the sex chromosomes

Sex chromosome analyses have historically been largely excluded from GWAS [249]. There are significantly fewer associations identified by GWAS for the X chromosome as compared to autosomal chromosomes that contain less genetic material [137], and no associations on the Y chromosome [250]. Incorporation of these chromosomes has become easier with the development of imputation pipelines and statistical methods that are appropriate for sex chromosomes, and with modern genotyping arrays that now include thousands of sex chromosome SNPs.

Analyses of X chromosome data have several remaining limitations. Unlike autosomal analyses, re-use of older datasets for variant discovery via meta-analysis may not be possible due to low coverage of these variants on older arrays. As a result, power for detecting associations will be lower than in autosomal analyses that are more easily able to leverage previously existing data. Analysis of the X chromosome often requires making assumptions about the underlying genetic model, including patterns of X inactivation and dominance. The development of new methodologies, such as those that utilize machine learning approaches [251] or that do not assume a shared genetic model between males and females [252], allows for analyses of X chromosome data that do not require making assumptions about the genetic model. Applying these methods to existing data may increase power to detect associations without relying on an accurate description of the underlying genetic model or on assumptions about patterns of X inactivation.
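The dependence on an assumed genetic model can be made concrete with a small sketch of how X chromosome genotypes outside the pseudoautosomal regions might be coded for an additive regression model. The function and argument names are hypothetical. The two schemes shown (hemizygous males coded 0/2 when X inactivation is assumed, or 0/1 when the locus is assumed to escape inactivation) are commonly used conventions, and are not necessarily the coding used in the analyses reported here.

```python
def code_x_dosage(risk_allele_count, sex, assume_inactivation=True):
    """Return an additive dosage for a non-pseudoautosomal X chromosome SNP.

    Females carry 0, 1 or 2 copies of the risk allele; males are hemizygous
    and carry 0 or 1. Under an X-inactivation assumption, a hemizygous male
    carrier is treated as equivalent to a homozygous female (dosage 2).
    If the locus is assumed to escape inactivation, males are coded 0/1.
    """
    if sex == "female":
        if risk_allele_count not in (0, 1, 2):
            raise ValueError("female allele count must be 0, 1 or 2")
        return float(risk_allele_count)
    if sex == "male":
        if risk_allele_count not in (0, 1):
            raise ValueError("male allele count must be 0 or 1")
        scale = 2 if assume_inactivation else 1
        return float(risk_allele_count * scale)
    raise ValueError("sex must be 'male' or 'female'")


if __name__ == "__main__":
    print(code_x_dosage(1, "male"))                             # inactivation assumed
    print(code_x_dosage(1, "male", assume_inactivation=False))  # escape assumed
```

Because the same male genotype receives a different dosage under each assumption, effect estimates from a combined male and female analysis depend on which model is chosen. This is the dependence that model-free approaches aim to avoid.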

While the Y chromosome has been utilized by researchers for investigating population genetics [250, 253, 254], GWAS has not been successful at identifying variants on this chromosome that are associated with the development of disease. Though technology and coverage have improved on the X chromosome, genotyping arrays often have poor coverage on the Y chromosome. The Y chromosome is not constrained in length in the same manner as the other chromosomes, and as a result is prone to significant structural variation and accumulation of short tandem repeats that are not easily genotyped through array-based methods. These characteristics mean that the best way to characterize variation on the Y chromosome is through approaches that account for structural variation, such as sequencing or haplotype approaches [255].

Though there are no known Y chromosome polymorphisms associated with disease, there is evidence that such associations may exist. Previous research has demonstrated that more than half of Y chromosome genes are expressed in non-gonadal tissue types [256], which demonstrates that variation in these genes could have widespread effects across the body. The Y chromosome is known to vary within disease states, and complete loss of the Y chromosome is associated with older age and smoking. Loss of this chromosome has been linked with multiple diseases and cancer types, including renal cell, head and neck, and bladder cancers [257-261]. Due to the direct patrilineal transmission of the Y, each individual can be classified into a distinct haplogroup [262-264]. Y chromosome haplogroups have been associated with coronary artery disease and with immune function [265]. With the development of more affordable sequencing technologies, and analytic approaches to analyzing variation on the Y chromosome, it may be possible to identify disease associations on this chromosome.

Generating additional sex chromosome genotyping data on currently existing glioma GWAS datasets is the most efficient way to approach these questions. Where remaining normal DNA samples exist, targeted or whole genome sequencing data can be generated to validate the X chromosome associations discovered within this analysis. Sequencing also affords the possibility to assess the relationship between Y haplogroup and glioma risk, which may further explain the male preponderance in this disease as well as the ancestry-specific incidence variation in glioma.

5.2.3 Exploring the association between melanoma and glioma

Incidence of melanoma and glioma are both highest in populations with primarily European ancestry, including white non-Hispanics in the US (SEER). Melanoma and glioma also share an inverse association with allergy [266].

Previous analyses suggested shared genetic risk for glioma and melanoma, in terms of syndromic cancer (most notably melanoma-neural system tumor syndrome, caused by inherited variants in CDKN2A [3]), familial glioma, and sporadic disease. Family based studies have found that relatives of glioma patients have higher than expected incidence of melanoma (approximately 2-4 times that of the general population) [94, 215]. A previous analysis of the NCI’s SEER system from 1992 to 2009 found that persons with a previous diagnosis of melanoma have incidence of glioma that is 1.42 times that of the general population, with excess risk among females as compared to males (42% vs 29%) [214]. This SEER-based study found no difference in excess risk based on whether radiation treatment was received for melanoma, suggesting that the increased glioma risk is not due to ionizing radiation exposure. GWAS for melanoma to date have identified SNPs in the regions surrounding CDKN2A and TERT that have also been previously associated with glioma [4, 216, 217]. Variants in POT1 have been associated with both glioma and melanoma [267-269], and there is evidence that telomere length and pathways of telomere maintenance may contribute to risk in both diseases [213, 218, 270].


Previous analyses exploring the relationship between glioma and melanoma have largely utilized registry and family-based study designs. Data collected by the GICC includes detailed family history of cancer information for all genotyped cases and controls. This dataset represents one of the largest glioma datasets that includes both genotype and family history data, and could be utilized for further exploration of potential genetic sources of shared risk between these two diseases within families. Large genotype datasets now exist for glioma and melanoma [216, 271-277] that facilitate more accurate exploration of shared genetic risk between these diseases. Several groups have conducted cross-cancer analyses using GWAS datasets from multiple cancer types, and have found some sources of shared risk across cancers [278-281]. These analyses have not included glioma or melanoma to date, but these data will be included in planned analyses by these groups. The results of cross-cancer analyses such as these have the potential to further characterize the shared genetic risk for these diseases.

5.2.4 Familial approaches to glioma genetics

Linkage-based familial studies in glioma have not been successful at identifying and validating variants of high penetrance [95-98]. The GLIOGENE consortium was formed to collect these high-risk glioma families, and many of these families now have array-based genotype and sequencing data available for analysis [282]. These approaches have largely focused on attempting to identify rare variants of high penetrance, but it is likely that common variants of low penetrance may also play a role in the development of familial disease. New approaches to these data may allow for validation of known variants in the context of family disease, as well as further identification of variants that may be specifically associated with familial disease. Case-only analyses using family history of disease as the outcome variable, and case-control analyses in specific subsets of cases, have identified variants specifically associated with familial cases of cancer in other cancer types, including esophageal and breast cancer [283, 284]. Identification of common variants associated with familial disease may also provide further evidence about the mechanistic relationship between identified rare private mutations within glioma families.

5.2.5 Genetic association studies in non-European and admixed populations

Glioma incidence is highest in countries with primarily European ancestry populations, including the US, Canada, Australia, and Northern Europe (Figure 5-1) [285-287].

Incidence of glioma varies significantly by race in the US (Figure 5-2), and lifetime risk of developing a malignant brain tumor is nearly twice as high in white (141 per 100,000 persons) as compared to Black individuals (79 per 100,000) [288]. Incidence rates for all glioma subtypes vary significantly by self-reported race and ethnicity, and are highest in white non-Hispanics (Figure 5-2) [289]. Demonstrated differences in incidence and analysis of somatic alterations in tumors suggest that pathways of gliomagenesis may vary by genetic ancestry [290].


Figure 5-1. Incidence of malignant brain tumors by global region (CBTRUS, CI5-X)

Figure 5-2. Incidence of glioma by race, ethnicity and histology in the US (CBTRUS)

The vast majority of genetic association studies in glioma to date have been conducted in primarily European-ancestry individuals [7-9, 105, 148, 149]. Previous analyses have attempted to compare allele frequencies of previous GWAS hits within reference data sets by ancestry group in order to account for differences in incidence, but these have failed to identify new risk variants in non-European continental ancestry populations [291]. Several candidate SNP studies have been conducted in East Asian populations, which have found novel association loci in XRCC1/3 [292], ZGPAT, SLC2A4RG, and ZBTB46 [293], as well as validated associations in EGFR and RTEL1 that were previously discovered in European ancestry populations [294]. Heterozygosity at previously identified risk SNPs varies between continental ancestry groups at many of these loci, particularly at 3p14.1, 10q25.2, 16p13.3, and 20q13.33. Demonstrated differences in incidence and analysis of somatic alterations in tumors [290, 295] suggest that frequency of risk variants, initiating mutations, and pathways of gliomagenesis may vary by ancestry.

Due to the rarity of glioma overall and the decreased incidence of glioma in African Americans as compared to persons of European ancestry, there have been limited analyses done to identify genetic variants associated with glioma risk in non-European populations. In the US, patterns of glioma incidence make it difficult to obtain a large enough sample size of non-European populations for analyses that are appropriately powered to identify associations that reach genome-wide significance. African American, Asian, and Hispanic glioma cases and matched controls were recruited as part of the GICC and GliomaSE [296]. The combined sample of non-European ancestry cases within these two studies represents the largest aggregation of non-European ancestry glioma cases currently available. It is likely that re-assessing genotype probabilities using a four-continent model that takes into account American ancestry may discover additional individuals with a substantial proportion of non-European ancestry. Characterization of non-European individuals diagnosed with glioma may provide additional information that suggests explanations for significant differences in incidence by self-identified race and ethnicity for these tumors in the absence of an environmental risk factor that explains a substantial portion of risk.

5.2.6 Sex-specific variation in genetic risk for cancer

Sex-specific analyses such as those presented in this dissertation have the potential to reveal genetic sources of sexual dimorphism in risk, as well as to increase power in the case of sex-specific loci [124, 125, 138, 139]. Most cancer types show significant sex differences in incidence that are similar in scale to what is observed in malignant brain tumors (Figure 5-3) [297]. In general, cancer is more common in males as compared to females. The male preponderance of malignant brain and other CNS tumors (IRR=1.39) is of a similar magnitude as that observed in lung cancer (IRR=1.40), colon cancer (IRR=1.32), and non-Hodgkin’s lymphoma (IRR=1.45) [298].
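For readers less familiar with the measure, the following minimal sketch shows how a male-to-female incidence rate ratio and an approximate confidence interval can be computed from case counts and person-time. It uses crude rates and the usual log-scale standard error, whereas published registry IRRs are generally based on age-adjusted rates. The function name and the counts in the example are hypothetical and are not taken from SEER or CBTRUS.

```python
import math


def incidence_rate_ratio(cases_a, person_years_a, cases_b, person_years_b, z=1.96):
    """Crude incidence rate ratio (group A vs group B) with an approximate
    confidence interval computed on the log scale."""
    rate_a = cases_a / person_years_a
    rate_b = cases_b / person_years_b
    irr = rate_a / rate_b
    # Standard error of log(IRR), treating case counts as Poisson.
    se_log_irr = math.sqrt(1.0 / cases_a + 1.0 / cases_b)
    lower = math.exp(math.log(irr) - z * se_log_irr)
    upper = math.exp(math.log(irr) + z * se_log_irr)
    return irr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 139 male and 100 female cases,
    # each observed over 1,000,000 person-years.
    irr, lower, upper = incidence_rate_ratio(139, 1_000_000, 100, 1_000_000)
    print(f"IRR = {irr:.2f} ({lower:.2f}, {upper:.2f})")
```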

There are likely several factors acting in combination that contribute to sex differences in cancer incidence, including differences by sex in the prevalence of risk behaviors, cultural differences in exposure to environmental risks, and genetic variation by sex.

Significant sex differences in incidence exist in many cancers and other complex diseases, but sex differences are rarely reported in genetic epidemiology studies [125]. Sex-specific effects in genetic risk for complex disease have been reported in prior studies, including melanoma [138, 139]. Many cancer GWAS do not report sex-stratified estimates, and often exclude the sex chromosomes from analyses [136, 137].


Figure 5-3. Incidence rate ratios for common cancers (SEER 2009-2013)

The analyses reported in this dissertation identified multiple risk loci where strength of association and risk allele frequency varied by sex. Most cancer types have large GWAS datasets, many of which are publicly available. Several groups have conducted cross-cancer analyses using these GWAS datasets from multiple cancer types, and have found some sources of shared risk across cancers [278-281]. Though many of the cancers included in these analyses are sex-specific (e.g. prostate, ovarian, and endometrial cancer), or have shared heritability that may vary by sex (e.g. glioma and melanoma), these previous studies have not incorporated sex. An assessment of sex-specific germline risk for cancer across cancer types where there are significant sex differences in incidence not explained by currently known risk factors could further illuminate the mechanisms of sex variation in the malignant process.
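One simple way such an assessment could compare sex-stratified results from existing GWAS summary statistics is a test of the difference between the male and female effect estimates. The sketch below is illustrative only: it assumes log odds ratios estimated in non-overlapping male and female samples, so that the two estimates are independent, and the function name and input values are hypothetical rather than results from this dissertation.

```python
import math


def sex_heterogeneity_test(beta_male, se_male, beta_female, se_female):
    """Z-test for a difference between male and female log odds ratios.

    Assumes the two estimates come from independent (non-overlapping)
    samples, so the variance of the difference is the sum of the variances.
    Returns the z statistic and a two-sided p-value.
    """
    difference = beta_male - beta_female
    se_difference = math.sqrt(se_male ** 2 + se_female ** 2)
    z = difference / se_difference
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical sex-stratified estimates for a single SNP.
    z, p = sex_heterogeneity_test(beta_male=0.30, se_male=0.05,
                                  beta_female=0.10, se_female=0.05)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Applied across cancer types, a statistic of this kind would flag loci where the male and female estimates differ by more than expected from sampling error, though any genome-wide application would still need correction for multiple testing.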


Bibliography

1. Ostrom, Q.T., et al., CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2009–2013. Neuro-oncology, 2016. 18: p. v1–v75. 2. Wrensch, M., et al., Familial and personal medical history of cancer and nervous system conditions among adults with glioma and controls. Am J Epidemiol, 1997. 145(7): p. 581-93. 3. Ostrom, Q.T., et al., The epidemiology of glioma in adults: a “state of the science” review. Neuro-oncology, 2014. 16(7): p. 896-913. 4. Melin, B.S., et al., Genome-wide association study of glioma subtypes identifies specific differences in genetic susceptibility to glioblastoma and non-glioblastoma tumors. Nat Genet, 2017. 49(5): p. 789-794. 5. Kinnersley, B., et al., Quantifying the heritability of glioma using genome-wide complex trait analysis. Sci Rep, 2015. 5: p. 17267. 6. Enciso-Mora, V., et al., Deciphering the 8q24.21 association for glioma. Human molecular genetics, 2013. 22(11): p. 2293-302. 7. Rajaraman, P., et al., Genome-wide association study of glioma and meta- analysis. Human Genetics 2012. 131(12): p. 1877-88. 8. Shete, S., et al., Genome-wide association study identifies five susceptibility loci for glioma. Nature Genetics, 2009. 41(8): p. 899-904. 9. Wrensch, M., et al., Variants in the CDKN2B and RTEL1 regions are associated with high-grade glioma susceptibility. Nature Genetics, 2009. 41(8): p. 905-8. 10. Sanson, M., et al., Chromosome 7p11.2 (EGFR) variation influences glioma risk. Human molecular genetics, 2011. 20(14): p. 2897-904. 11. D, P., A. GJ, and F. D, Neuroglial cells, in Neuroscience, 2nd edition. 2001, Sunderland: MA. 12. Louis, D.N., et al., The 2007 WHO classification of tumours of the central nervous system. Acta Neuropathol, 2007. 114(2): p. 97-109. 13. Stupp, R., et al., Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. New England journal of medicine, 2005. 352(10): p. 987-96. 14. 
Berg-Beckhoff, G., et al., History of allergic disease and epilepsy and risk of glioma and meningioma (INTERPHONE study group, Germany). European Journal Of Epidemiology, 2009. 24(8): p. 433-440. 15. Il'yasova, D., et al., Association between glioma and history of allergies, asthma, and eczema: a case-control study with three groups of controls. Cancer Epidemiology, Biomarkers and Prevention, 2009. 18(4): p. 1232-1238. 16. McCarthy, B.J., et al., Assessment of type of allergy and antihistamine use in the development of glioma. Cancer Epidemiology, Biomarkers and Prevention, 2011. 20(2): p. 370-378. 17. Scheurer, M.E., et al., Long-term anti-inflammatory and antihistamine medication use and adult glioma risk. Cancer Epidemiology, Biomarkers and Prevention, 2008. 17(5): p. 1277-1281.


18. Schoemaker, M.J., et al., History of allergies and risk of glioma in adults. International Journal Of Cancer, 2006. 119(9): p. 2165-2172. 19. Turner, M.C., et al., Allergy and brain tumors in the INTERPHONE study: pooled results from Australia, Canada, France, Israel, and New Zealand. Cancer Causes and Control, 2013. 24(5): p. 949-960. 20. Wiemels, J.L., et al., IgE, allergy, and risk of glioma: update from the San Francisco Bay Area Adult Glioma Study in the temozolomide era. International Journal Of Cancer, 2009. 125(3): p. 680-687. 21. Wigertz, A., et al., Allergic conditions and brain tumor risk. American Journal Of Epidemiology, 2007. 166(8): p. 941-950. 22. Linos, E., et al., Atopy and risk of brain tumors: a meta-analysis. Journal of the National Cancer Institute, 2007. 99(20): p. 1544-1550. 23. McCarthy, B.J., et al., Risk factors for oligodendroglial tumors: a pooled international study. Neuro-Oncology, 2011. 13(2): p. 242-250. 24. Wiemels, J.L., et al., Reduced immunoglobulin E and allergy among adults with glioma compared with controls. Cancer Res, 2004. 64(22): p. 8468-73. 25. Wiemels, J.L., et al., History of allergies among adults with glioma and controls. Int J Cancer, 2002. 98(4): p. 609-15. 26. Schwartzbaum, J., et al., Association between prediagnostic IgE levels and risk of glioma. J Natl Cancer Inst, 2012. 104(16): p. 1251-9. 27. Jensen-Jarolim, E., et al., AllergoOncology: the role of IgE-mediated allergy in cancer. Allergy, 2008. 63(10): p. 1255-66. 28. Sherman, P.W., E. Holland, and J.S. Sherman, Allergies: their role in cancer prevention. Q Rev Biol, 2008. 83(4): p. 339-62. 29. Lachance, D.H., et al., Associations of high-grade glioma with glioma risk alleles and histories of allergy and smoking. American Journal Of Epidemiology, 2011. 174(5): p. 574-581. 30. Schoemaker, M.J., et al., Interaction between 5 genetic variants and allergy in glioma risk. American Journal Of Epidemiology, 2010. 171(11): p. 1165-1173. 31. 
Braganza, M.Z., et al., Ionizing radiation and the risk of brain and central nervous system tumors: a systematic review. Neuro Oncol, 2012. 14(11): p. 1316-24. 32. Preston, D.L., et al., Solid cancer incidence in atomic bomb survivors: 1958-1998. Radiation research, 2007. 168(1): p. 1-64. 33. Preston, D.L., et al., Tumors of the nervous system and pituitary gland associated with atomic bomb radiation exposure. J Natl Cancer Inst, 2002. 94(20): p. 1555-63. 34. Ron, E., et al., Tumors of the brain and nervous system after radiotherapy in childhood. N Engl J Med, 1988. 319(16): p. 1033-9. 35. Karlsson, P., et al., Intracranial tumors after exposure to ionizing radiation during infancy: a pooled analysis of two Swedish cohorts of 28,008 infants with skin hemangioma. Radiat Res, 1998. 150(3): p. 357-64. 36. Karlsson, P., et al., Intracranial tumors after radium treatment for skin hemangioma during infancy--a cohort and case-control study. Radiat Res, 1997. 148(2): p. 161-7.

37. Lindberg, S., et al., Cancer incidence after radiotherapy for skin haemangioma during infancy. Acta Oncol, 1995. 34(6): p. 735-40. 38. Sadetzki, S., et al., Long-term follow-up for brain tumor development after childhood exposure to ionizing radiation for tinea capitis. Radiation research, 2005. 163(4): p. 424-32. 39. Shore, R.E., et al., Tumors and other diseases following childhood x-ray treatment for ringworm of the scalp (Tinea capitis). Health Phys, 2003. 85(4): p. 404-8. 40. Yeh, H., et al., Cancer incidence after childhood nasopharyngeal radium irradiation: a follow-up study in Washington County, Maryland. Am J Epidemiol, 2001. 153(8): p. 749-56. 41. Neglia, J.P., et al., New primary neoplasms of the central nervous system in survivors of childhood cancer: a report from the Childhood Cancer Survivor Study. J Natl Cancer Inst, 2006. 98(21): p. 1528-37. 42. Little, M.P., et al., Risks of brain tumour following treatment for cancer in childhood: modification by genetic factors, radiotherapy and chemotherapy. Int J Cancer, 1998. 78(3): p. 269-75. 43. Taylor, A.J., et al., Population-based risks of CNS tumors in survivors of childhood cancer: the British Childhood Cancer Survivor Study. J Clin Oncol, 2010. 28(36): p. 5287-93. 44. Walter, A.W., et al., Secondary brain tumors in children treated for acute lymphoblastic leukemia at St Jude Children's Research Hospital. J Clin Oncol, 1998. 16(12): p. 3761-7. 45. Bondy, M.L., et al., Brain tumor epidemiology: consensus from the Brain Tumor Epidemiology Consortium. Cancer, 2008. 113(7 Suppl): p. 1953-68. 46. Neglia, J.P., et al., New primary neoplasms of the central nervous system in survivors of childhood cancer: a report from the Childhood Cancer Survivor Study. Journal of the National Cancer Institute, 2006. 98(21): p. 1528-37. 47. Taylor, A.J., et al., Population-based risks of CNS tumors in survivors of childhood cancer: the British Childhood Cancer Survivor Study. Journal of clinical oncology, 2010. 
28(36): p. 5287-93. 48. Preston-Martin, S., W. Mack, and B.E. Henderson, Risk factors for gliomas and meningiomas in males in Los Angeles County. Cancer research, 1989. 49(21): p. 6137-43. 49. Davis, F., et al., Medical diagnostic radiation exposures and risk of gliomas. Radiation research, 2011. 175(6): p. 790-6. 50. Pearce, M.S., et al., Radiation exposure from CT scans in childhood and subsequent risk of leukaemia and brain tumours: a retrospective cohort study. Lancet, 2012. 380(9840): p. 499-505. 51. Mathews, J.D., et al., Cancer risk in 680,000 people exposed to computed tomography scans in childhood or adolescence: data linkage study of 11 million Australians. BMJ, 2013. 346: p. f2360. 52. Cowppli-Bony, A., et al., Brain tumors and hormonal factors: review of the epidemiological literature. Cancer Causes Control, 2011. 22(5): p. 697-714.


53. Zong, H., et al., Reproductive factors in relation to risk of brain tumors in women: an updated meta-analysis of 27 independent studies. Tumour Biol, 2014. 35(11): p. 11579-86. 54. Qi, Z.Y., et al., Exogenous and endogenous hormones in relation to glioma in women: a meta-analysis of 11 case-control studies. PLoS One, 2013. 8(7): p. e68695. 55. Lambe, M., P. Coogan, and J. Baron, Reproductive factors and the risk of brain tumors: a population-based study in Sweden. Int J Cancer, 1997. 72(3): p. 389-93. 56. Hatch, E.E., et al., Reproductive and hormonal factors and risk of brain tumors in adult females. Int J Cancer, 2005. 114(5): p. 797-805. 57. Wigertz, A., et al., Reproductive factors and risk of meningioma and glioma. Cancer Epidemiol Biomarkers Prev, 2008. 17(10): p. 2663-70. 58. Silvera, S.A., A.B. Miller, and T.E. Rohan, Hormonal and reproductive factors and risk of glioma: a prospective cohort study. Int J Cancer, 2006. 118(5): p. 1321-4. 59. Kabat, G.C., et al., Reproductive factors and exogenous hormone use and risk of adult glioma in women in the NIH-AARP Diet and Health Study. Int J Cancer, 2011. 128(4): p. 944-50. 60. Huang, K., et al., Reproductive factors and risk of glioma in women. Cancer Epidemiol Biomarkers Prev, 2004. 13(10): p. 1583-8. 61. Michaud, D.S., et al., Reproductive factors and exogenous hormone use in relation to risk of glioma and meningioma in a large European cohort study. Cancer Epidemiol Biomarkers Prev, 2010. 19(10): p. 2562-9. 62. Felini, M.J., et al., Reproductive factors and hormone use and risk of adult gliomas. Cancer Causes Control, 2009. 20(1): p. 87-96. 63. Wigertz, A., et al., Risk of brain tumors associated with exposure to exogenous female sex hormones. Am J Epidemiol, 2006. 164(7): p. 629-36. 64. Benson, V.S., et al., Hormone replacement therapy and incidence of central nervous system tumours in the Million Women Study. Int J Cancer, 2010. 127(7): p. 1692-8. 65. 
Benson, V.S., et al., Menopausal hormone therapy and central nervous system tumor risk: large UK prospective study and meta-analysis. Int J Cancer, 2015. 136(10): p. 2369-77. 66. Benson, V.S., et al., Lifestyle factors and primary glioma and meningioma tumours in the Million Women Study cohort. Br J Cancer, 2008. 99(1): p. 185-90. 67. Andersen, L., et al., Hormonal contraceptive use and risk of glioma among younger women: a nationwide case-control study. Br J Clin Pharmacol, 2015. 79(4): p. 677-84. 68. Johansen, C., et al., Study designs may influence results: the problems with questionnaire-based case-control studies on the epidemiology of glioma. Br J Cancer, 2017. 116(7): p. 841-848. 69. Botstein, D., et al., Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet, 1980. 32(3): p. 314-31.


70. Lander, E.S. and D. Botstein, Mapping complex genetic traits in humans: new methods using a complete RFLP linkage map. Cold Spring Harb Symp Quant Biol, 1986. 51 Pt 1: p. 49-62. 71. Weissenbach, J., et al., A second-generation linkage map of the . Nature, 1992. 359(6398): p. 794-801. 72. Kruglyak, L., The road to genome-wide association studies. Nat Rev Genet, 2008. 9(4): p. 314-8. 73. Risch, N. and K. Merikangas, The future of genetic studies of complex human diseases. Science, 1996. 273(5281): p. 1516-7. 74. Khoury, M.J. and Q. Yang, The future of genetic studies of complex human diseases: an epidemiologic perspective. Epidemiology, 1998. 9(3): p. 350-4. 75. Collins, F.S., M.S. Guyer, and A. Charkravarti, Variations on a theme: cataloging human DNA sequence variation. Science, 1997. 278(5343): p. 1580-1. 76. Wang, D.G., et al., Large-scale identification, mapping, and genotyping of single- nucleotide polymorphisms in the human genome. Science, 1998. 280(5366): p. 1077-82. 77. International HapMap, C., A haplotype map of the human genome. Nature, 2005. 437(7063): p. 1299-320. 78. Ozaki, K., et al., Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat Genet, 2002. 32(4): p. 650-4. 79. Klein, R.J., et al., Complement factor H polymorphism in age-related macular degeneration. Science, 2005. 308(5720): p. 385-9. 80. Wellcome Trust Case Control, C., Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007. 447(7145): p. 661-78. 81. Genomes Project, C., et al., A global reference for human genetic variation. Nature, 2015. 526(7571): p. 68-74. 82. Golding, J., et al., ALSPAC--the Avon Longitudinal Study of Parents and Children. I. Study methodology. Paediatr Perinat Epidemiol, 2001. 15(1): p. 74- 87. 83. Moayyeri, A., et al., The UK Adult Twin Registry (TwinsUK Resource). Twin Res Hum Genet, 2013. 16(1): p. 144-9. 84. 
McCarthy, S., et al., A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet, 2016. 48(10): p. 1279-83. 85. Marchini, J. and B. Howie, Genotype imputation for genome-wide association studies. Nat Rev Genet, 2010. 11(7): p. 499-511. 86. Marjoram, P., A. Zubair, and S.V. Nuzhdin, Post-GWAS: where next? More samples, more SNPs or more biology? Heredity (Edinb), 2014. 112(1): p. 79-88. 87. Bush, W.S., S.M. Dudek, and M.D. Ritchie, Biofilter: a knowledge-integration system for the multi-locus analysis of genome-wide association studies. Pac Symp Biocomput, 2009: p. 368-79. 88. Cantor, R.M., K. Lange, and J.S. Sinsheimer, Prioritizing GWAS results: A review of statistical methods and recommendations for their application. Am J Hum Genet, 2010. 86(1): p. 6-22.

89. Malmer, B., R. Henriksson, and H. Grönberg, Familial brain tumours—genetics or environment? A nationwide cohort study of cancer risk in spouses and first-degree relatives of brain tumour patients. International Journal of Cancer, 2003. 106(2): p. 260-263.
90. Malmer, B., et al., Familial aggregation of astrocytoma in northern Sweden: An epidemiological cohort study. International Journal of Cancer, 1999. 81(3): p. 366-370.
91. Hill, D.A., et al., Cancer in first-degree relatives and risk of glioma in adults. Cancer Epidemiol Biomarkers Prev, 2003. 12(12): p. 1443-8.
92. Scheurer, M.E., et al., Aggregation of cancer in first-degree relatives of patients with glioma. Cancer Epidemiol Biomarkers Prev, 2007. 16(11): p. 2491-5.
93. Malmer, B., et al., Familial aggregation of astrocytoma in northern Sweden: an epidemiological cohort study. Int J Cancer, 1999. 81(3): p. 366-70.
94. Scheurer, M.E., et al., Familial aggregation of glioma: a pooled analysis. Am J Epidemiol, 2010. 172(10): p. 1099-107.
95. Shete, S., et al., Genome-wide high-density SNP linkage search for glioma susceptibility loci: results from the Gliogene Consortium. Cancer research, 2011. 71(24): p. 7568-75.
96. Sun, X., et al., A variable age of onset segregation model for linkage analysis, with correction for ascertainment, applied to glioma. Cancer Epidemiology, Biomarkers and Prevention, 2012. 21(12): p. 2242-51.
97. Paunu, N., et al., A novel low-penetrance locus for familial glioma at 15q23-q26.3. Cancer Res, 2002. 62(13): p. 3798-802.
98. Malmer, B., et al., Homozygosity mapping of familial glioma in Northern Sweden. Acta Oncol, 2005. 44(2): p. 114-9.
99. Liu, Y., et al., Insight in glioma susceptibility through an analysis of 6p22.3, 12p13.33-12.1, 17q22-23.2 and 18q23 SNP genotypes in familial and non-familial glioma. Hum Genet, 2012. 131(9): p. 1507-17.
100. Andersson, U., et al., Germline rearrangements in families with strong family history of glioma and malignant melanoma, colon, and breast cancer. Neuro Oncol, 2014. 16(10): p. 1333-40.
101. Bainbridge, M.N., et al., Germline Mutations in Shelterin Complex Genes Are Associated With Familial Glioma. Journal of the National Cancer Institute, 2015. 107(1).
102. Walsh, K.M., et al., Variants near TERT and TERC influencing telomere length are associated with high-grade glioma risk. Nature Genetics, 2014.
103. Jenkins, R.B., et al., A low-frequency variant at 8q24.21 is strongly associated with risk of oligodendroglial tumors and astrocytomas with IDH1 or IDH2 mutation. Nature Genetics, 2012. 44(10): p. 1122-5.
104. Cardis, E., et al., The INTERPHONE study: design, epidemiological methods, and description of the study population. Eur J Epidemiol, 2007. 22(9): p. 647-64.
105. Kinnersley, B., et al., Genome-wide association study identifies multiple susceptibility loci for glioma. Nat Commun, 2015. 6: p. 8559.


106. Amirian, E.S., et al., The Glioma International Case-Control Study: A Report From the Genetic Epidemiology of Glioma International Consortium. Am J Epidemiol, 2016. 183(2): p. 85-91.
107. Wang, K., M. Li, and M. Bucan, Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet, 2007. 81(6): p. 1278-83.
108. Lee, S., et al., Pathway-based approach using hierarchical components of collapsed rare variants. Bioinformatics, 2016. 32(17): p. i586-i594.
109. Lamparter, D., et al., Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. PLoS Comput Biol, 2016. 12(1): p. e1004714.
110. Servin, B. and M. Stephens, Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet, 2007. 3(7): p. e114.
111. Li, M.X., et al., GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet, 2011. 88(3): p. 283-93.
112. Mishra, A. and S. Macgregor, VEGAS2: Software for More Flexible Gene-Based Testing. Twin Res Hum Genet, 2015. 18(1): p. 86-91.
113. Mishra, A. and S. MacGregor, A Novel Approach for Pathway Analysis of GWAS Data Highlights Role of BMP Signaling and Muscle Cell Differentiation in Colorectal Cancer Susceptibility. Twin Res Hum Genet, 2017. 20(1): p. 1-9.
114. Kwak, I.Y. and W. Pan, Gene- and pathway-based association tests for multiple traits with GWAS summary statistics. Bioinformatics, 2017. 33(1): p. 64-71.
115. Huang, H., et al., Gene-based tests of association. PLoS Genet, 2011. 7(7): p. e1002177.
116. D'Addabbo, A., et al., RS-SNP: a random-set method for genome-wide association studies. BMC Genomics, 2011. 12: p. 166.
117. Weng, L., et al., SNP-based pathway enrichment analysis for genome-wide association studies. BMC Bioinformatics, 2011. 12: p. 99.
118. Chanda, P., et al., Fast association tests for genes with FAST. PLoS One, 2013. 8(7): p. e68585.
119. Simes, J., An Improved Bonferroni Procedure for Multiple Tests of Significance. Vol. 73. 1986. 751-754.
120. Petrovski, S. and D.B. Goldstein, Unequal representation of genetic variation across ancestry groups creates healthcare inequality in the application of precision medicine. Genome Biol, 2016. 17(1): p. 157.
121. Need, A.C. and D.B. Goldstein, Next generation disparities in human genomics: concerns and remedies. Trends Genet, 2009. 25(11): p. 489-94.
122. Ramos, E. and C. Rotimi, The A's, G's, C's, and T's of health disparities. BMC Med Genomics, 2009. 2: p. 29.
123. Bustamante, C.D., E.G. Burchard, and F.M. De la Vega, Genomics for the world. Nature, 2011. 475(7355): p. 163-5.
124. Liu, L.Y., et al., Sex differences in disease risk from reported genome-wide association study findings. Hum Genet, 2012. 131(3): p. 353-64.
125. Dorak, M.T. and E. Karpuzoglu, Gender differences in cancer susceptibility: an inadequately addressed issue. Front Genet, 2012. 3: p. 268.


126. Arnold, A.P. and A.J. Lusis, Understanding the sexome: measuring and reporting sex differences in gene systems. Endocrinology, 2012. 153(6): p. 2551-5.
127. Ober, C., D.A. Loisel, and Y. Gilad, Sex-specific genetic architecture of human disease. Nat Rev Genet, 2008. 9(12): p. 911-22.
128. Uekert, S.J., et al., Sex-related differences in immune development and the expression of atopy in early childhood. J Allergy Clin Immunol, 2006. 118(6): p. 1375-81.
129. Reinius, B., et al., An evolutionarily conserved sexual signature in the primate brain. PLoS Genet, 2008. 4(6): p. e1000100.
130. Rinn, J.L. and M. Snyder, Sexual dimorphism in mammalian gene expression. Trends Genet, 2005. 21(5): p. 298-305.
131. Ellegren, H. and J. Parsch, The evolution of sex-biased genes and sex-biased gene expression. Nat Rev Genet, 2007. 8(9): p. 689-98.
132. Yang, X., et al., Tissue-specific expression and regulation of sexually dimorphic genes in mice. Genome Res, 2006. 16(8): p. 995-1004.
133. Gershoni, M. and S. Pietrokovski, The landscape of sex-differential transcriptome and its consequent selection in human adults. BMC Biol, 2017. 15(1): p. 7.
134. Ngo, S.T., F.J. Steyn, and P.A. McCombe, Gender differences in autoimmune disease. Front Neuroendocrinol, 2014. 35(3): p. 347-69.
135. Powers, M.S., et al., From sexless to sexy: Why it is time for human genetics to consider and report analyses of sex. Biol Sex Differ, 2017. 8: p. 15.
136. Hickey, P.F. and M. Bahlo, X chromosome association testing in genome wide association studies. Genet Epidemiol, 2011. 35(7): p. 664-70.
137. Wise, A.L., L. Gyi, and T.A. Manolio, eXclusion: toward integrating the X chromosome in genome-wide association analyses. Am J Hum Genet, 2013. 92(5): p. 643-7.
138. Grassmann, F., et al., A Candidate Gene Association Study Identifies DAPL1 as a Female-Specific Susceptibility Locus for Age-Related Macular Degeneration (AMD). Neuromolecular Med, 2015. 17(2): p. 111-20.
139. Kocarnik, J.M., et al., Pleiotropic and sex-specific effects of cancer GWAS SNPs on melanoma risk in the population architecture using genomics and epidemiology (PAGE) study. PLoS One, 2015. 10(3): p. e0120491.
140. Finkel, T., M. Serrano, and M.A. Blasco, The common biology of cancer and ageing. Nature, 2007. 448(7155): p. 767-74.
141. Johnson, S.C., et al., Genetic evidence for common pathways in human age-related diseases. Aging Cell, 2015. 14(5): p. 809-17.
142. Jeck, W.R., A.P. Siebold, and N.E. Sharpless, Review: a meta-analysis of GWAS and age-associated diseases. Aging Cell, 2012. 11(5): p. 727-31.
143. Rukh, G., et al., Inverse relationship between a genetic risk score of 31 BMI loci and weight change before and after reaching middle age. Int J Obes (Lond), 2016. 40(2): p. 252-9.
144. Raynor, L.A., N. Pankratz, and L.G. Spector, An analysis of measures of effect size by age of onset in cancer genomewide association studies. Genes Chromosomes Cancer, 2013. 52(9): p. 855-9.


145. Agopian, A.J., L.M. Eastcott, and L.E. Mitchell, Age of onset and effect size in genome-wide association studies. Birth Defects Res A Clin Mol Teratol, 2012. 94(11): p. 908-11.
146. Ahsan, H., et al., A genome-wide association study of early-onset breast cancer identifies PFKM as a novel breast cancer gene and supports a common genetic spectrum for breast cancer at any age. Cancer Epidemiol Biomarkers Prev, 2014. 23(4): p. 658-69.
147. Al Olama, A.A., et al., A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for prostate cancer. Nat Genet, 2014. 46(10): p. 1103-9.
148. Walsh, K.M., et al., Genetic variants in telomerase-related genes are associated with an older age at diagnosis in glioma patients: evidence for distinct pathways of gliomagenesis. Neuro-oncology, 2013. 15(8): p. 1041-7.
149. Enciso-Mora, V., et al., Low penetrance susceptibility to glioma is caused by the TP53 variant rs78378222. British journal of cancer, 2013. 108(10): p. 2178-85.
150. Walsh, K.M., et al., Longer genotypically-estimated leukocyte telomere length is associated with increased adult glioma risk. Oncotarget, 2015. 6(40): p. 42468-77.
151. Ahmed, S., et al., Newly discovered breast cancer susceptibility loci on 3p24 and 17q23.2. Nat Genet, 2009. 41(5): p. 585-90.
152. Yeager, M., et al., Identification of a new prostate cancer susceptibility locus on chromosome 8q24. Nat Genet, 2009. 41(10): p. 1055-7.
153. Hunter, D.J., et al., A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet, 2007. 39(7): p. 870-4.
154. Amos, C.I., et al., The OncoArray Consortium: a Network for Understanding the Genetic Architecture of Common Cancers. Cancer Epidemiol Biomarkers Prev, 2016.
155. Delaneau, O., J. Marchini, and J.F. Zagury, A linear complexity phasing method for thousands of genomes. Nat Methods, 2012. 9(2): p. 179-81.
156. Howie, B.N., P. Donnelly, and J. Marchini, A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 2009. 5(6): p. e1000529.
157. Li, Y., et al., FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data. BMC Bioinformatics, 2016. 17: p. 122.
158. Siegel, R.L., K.D. Miller, and A. Jemal, Cancer Statistics, 2017. CA Cancer J Clin, 2017. 67(1): p. 7-30.
159. Krishnamachari, B., et al., A pooled multisite analysis of the effects of female reproductive hormones on glioma risk. Cancer Causes Control, 2014. 25(8): p. 1007-13.
160. Enciso-Mora, V., et al., Deciphering the 8q24.21 association for glioma. Hum Mol Genet, 2013. 22(11): p. 2293-302.
161. Jenkins, R.B., et al., A low-frequency variant at 8q24.21 is strongly associated with risk of oligodendroglial tumors and astrocytomas with IDH1 or IDH2 mutation. Nat Genet, 2012. 44(10): p. 1122-5.

162. Howlader N, N.A., Krapcho M, Miller D, Bishop K, Kosary CL, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds), SEER Cancer Statistics Review, 1975-2014, based on November 2016 SEER data submission. 2017, National Cancer Institute: Bethesda, MD.
163. Brennan, C.W., et al., The somatic genomic landscape of glioblastoma. Cell, 2013. 155(2): p. 462-77.
164. The Cancer Genome Atlas Research Network, et al., Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas. N Engl J Med, 2015. 372(26): p. 2481-98.
165. Purcell, S. and C. Chang. PLINK 1.9. Available from: https://www.cog-genomics.org/plink2.
166. Das, S., et al., Next-generation genotype imputation service and methods. Nat Genet, 2016. 48(10): p. 1284-7.
167. Loh, P.R., et al., Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet, 2016. 48(11): p. 1443-1448.
168. Ceccarelli, M., et al., Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell, 2016. 164(3): p. 550-63.
169. Eckel-Passow, J.E., et al., Glioma Groups Based on 1p/19q, IDH, and TERT Promoter Mutations in Tumors. N Engl J Med, 2015. 372(26): p. 2499-508.
170. Marchini, J., et al., A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet, 2007. 39(7): p. 906-13.
171. Paternoster, R., et al., Using the correct statistical test for the equality of regression coefficients. Criminology, 1998. 36(4): p. 859-866.
172. Mittelstrass, K., et al., Discovery of Sexual Dimorphisms in Metabolic and Genetic Biomarkers. PLOS Genetics, 2011. 7(8): p. e1002215.
173. Liu, J.Z., et al., Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat Genet, 2010. 42(5): p. 436-40.
174. Machiela, M.J. and S.J. Chanock, LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics, 2015. 31(21): p. 3555-7.
175. Mittlbock, M. and M. Schemper, Explained variation for logistic regression. Stat Med, 1996. 15(19): p. 1987-97.
176. Sahm, F., et al., Farewell to oligoastrocytoma: in situ molecular genetics favor classification as either oligodendroglioma or astrocytoma. Acta Neuropathol, 2014. 128(4): p. 551-9.
177. Sanson, M., et al., Chromosome 7p11.2 (EGFR) variation influences glioma risk. Hum Mol Genet, 2011. 20(14): p. 2897-904.
178. Filardo, E.J., Epidermal growth factor receptor (EGFR) transactivation by estrogen via the G-protein-coupled receptor, GPR30: a novel signaling pathway with potential significance for breast cancer. J Steroid Biochem Mol Biol, 2002. 80(2): p. 231-8.
179. Sun, T., et al., Sexually dimorphic RB inactivation underlies mesenchymal glioblastoma prevalence in males. J Clin Invest, 2014. 124(9): p. 4123-33.


180. Consortium, G.T., The Genotype-Tissue Expression (GTEx) project. Nat Genet, 2013. 45(6): p. 580-5.
181. Broad Institute TCGA Genome Data Analysis Center, Analysis-ready standardized TCGA data from Broad GDAC Firehose 2016_01_28 run. Broad Institute of MIT and Harvard. 2016.
182. Hayashi, Y., et al., The Complete Loss of Receptors MET and RON Is a Poor Prognostic Factor in Patients with Extrahepatic Cholangiocarcinoma. Anticancer Res, 2016. 36(12): p. 6585-6592.
183. Lee, J.R., et al., Overexpression of glutathione peroxidase 1 predicts poor prognosis in oral squamous cell carcinoma. J Cancer Res Clin Oncol, 2017. 143(11): p. 2257-2265.
184. Elks, C.E., et al., Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies. Nat Genet, 2010. 42(12): p. 1077-85.
185. Perry, J.R., et al., Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature, 2014. 514(7520): p. 92-7.
186. Pokrajac-Bulian, A., M. Toncic, and P. Anic, Assessing the factor structure of the Body Uneasiness Test (BUT) in an overweight and obese Croatian non-clinical sample. Eat Weight Disord, 2015. 20(2): p. 215-22.
187. Raelson, J.V., et al., Genome-wide association study for Crohn's disease in the Quebec Founder Population identifies multiple validated disease loci. Proc Natl Acad Sci U S A, 2007. 104(37): p. 14747-52.
188. Sun, T., et al., An integrative view on sex differences in brain tumors. Cell Mol Life Sci, 2015. 72(17): p. 3323-42.
189. Lichtenstein, P., et al., Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med, 2000. 343(2): p. 78-85.
190. Liberzon, A., et al., Molecular signatures database (MSigDB) 3.0. Bioinformatics, 2011. 27(12): p. 1739-40.
191. Subramanian, A., et al., Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 2005. 102(43): p. 15545-50.
192. Kanehisa, M. and S. Goto, KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res, 2000. 28(1): p. 27-30.
193. D'Eustachio, P., Reactome knowledgebase of human biological pathways and processes. Methods Mol Biol, 2011. 694: p. 49-61.
194. Nishimura, D., BioCarta. Biotech Software & Internet Report, 2001. 2(3): p. 117-120.
195. R Core Team. R: A language and environment for statistical computing. 2017; Available from: http://www.R-project.org/.
196. Wickham, H. ggplot2: elegant graphics for data analysis. 2009; Available from: http://had.co.nz/ggplot2/book.
197. Csardi G, N.T., The igraph software package for complex network research. InterJournal, 2006. Complex Systems: p. 1695.


198. Briatte, F. ggnetwork: Geometries to Plot Networks with 'ggplot2'. R package version 0.5.1. 2016; Available from: https://CRAN.R-project.org/package=ggnetwork.
199. Sales, G., E. Calura, and C. Romualdi. graphite: GRAPH Interaction from pathway Topological Environment. R package version 1.16.0. 2015; Available from: http://www.bioconductor.org/packages/release/bioc/html/graphite.html.
200. Auguie, B. gridExtra: Miscellaneous Functions for "Grid" Graphics. R package version 2.3. 2017; Available from: https://CRAN.R-project.org/package=gridExtra.
201. Eeles, R.A., et al., Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat Genet, 2013. 45(4): p. 385-91, 391e1-2.
202. Dunlop, M.G., et al., Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat Genet, 2012. 44(7): p. 770-6.
203. Closa, A., et al., Identification of candidate susceptibility genes for colorectal cancer through eQTL analysis. Carcinogenesis, 2014. 35(9): p. 2039-46.
204. Darras, B.T., D.T. Miller, and D.K. Urion, Dystrophinopathies, in GeneReviews(R), M.P. Adam, et al., Editors. 1993: Seattle (WA).
205. Pantaleo, M.A., et al., Dystrophin deregulation is associated with tumor progression in KIT/PDGFRA mutant gastrointestinal stromal tumors. Clin Sarcoma Res, 2014. 4: p. 9.
206. Wang, Y., et al., Dystrophin is a tumor suppressor in human cancers with myogenic programs. Nat Genet, 2014. 46(6): p. 601-6.
207. Hatzfeld, M., The armadillo family of structural proteins. Int Rev Cytol, 1999. 186: p. 179-224.
208. Kurochkin, I.V., et al., ALEX1, a novel human armadillo repeat protein that is expressed differentially in normal tissues and carcinomas. Biochem Biophys Res Commun, 2001. 280(1): p. 340-7.
209. Wang, J., H.H. Huang, and F.B. Liu, ZNF185 inhibits growth and invasion of lung adenocarcinoma cells through inhibition of the akt/gsk3beta pathway. J Biol Regul Homeost Agents, 2016. 30(3): p. 683-691.
210. Vanaja, D.K., et al., Transcriptional silencing of zinc finger protein 185 identified by expression profiling is associated with prostate cancer progression. Cancer Res, 2003. 63(14): p. 3877-82.
211. Codd, V., et al., Identification of seven loci affecting mean telomere length and their association with disease. Nat Genet, 2013. 45(4): p. 422-7, 427e1-2.
212. Walsh, K.M., et al., Variants near TERT and TERC influencing telomere length are associated with high-grade glioma risk. Nat Genet, 2014. 46(7): p. 731-5.
213. Walsh, K.M., et al., Telomere maintenance and the etiology of adult glioma. Neuro Oncol, 2015. 17(11): p. 1445-52.
214. Scarbrough, P.M., et al., Exploring the association between melanoma and glioma risks. Ann Epidemiol, 2014. 24(6): p. 469-74.
215. Paunu, N., et al., Cancer incidence in families with multiple glioma patients. Int J Cancer, 2002. 97(6): p. 819-22.


216. Ransohoff, K.J., et al., Two-stage genome-wide association study identifies a novel susceptibility locus associated with melanoma. Oncotarget, 2017. 8(11): p. 17586-17592.
217. Kocarnik, J.M., et al., Replication of associations between GWAS SNPs and melanoma risk in the Population Architecture Using Genomics and Epidemiology (PAGE) Study. J Invest Dermatol, 2014. 134(7): p. 2049-2052.
218. Endicott, A.A., J.W. Taylor, and K.M. Walsh, Telomere length connects melanoma and glioma predispositions. Aging (Albany NY), 2016. 8(3): p. 423-4.
219. Mirina, A., et al., Gene size matters. PLoS One, 2012. 7(11): p. e49093.
220. Ostrom, Q.T., et al., CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2010-2014. Neuro-oncology, 2017. 19(S5): p. v1-v88.
221. Surveillance Epidemiology and End Results (SEER) Program, SEER*Stat Database: Incidence - SEER 18 Regs Custom Data (with additional treatment fields), Nov 2016 Sub (2000-2014) - Linked To County Attributes - Total U.S., 1969-2015 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2017, based on the November 2016 submission. 2017.
222. Pe'er, I., et al., Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol, 2008. 32(4): p. 381-5.
223. Karssen, L.C., C.M. van Duijn, and Y.S. Aulchenko, The GenABEL Project for statistical genomics. F1000Res, 2016. 5: p. 914.
224. Turner, S.D., qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. bioRxiv DOI: 10.1101/005165. 2014.
225. Ghasimi, S., et al., Genetic risk variants in the CDKN2A/B, RTEL1 and EGFR genes are associated with somatic biomarkers in glioma. J Neurooncol, 2016. 127(3): p. 483-92.
226. Watanabe, K., et al., Overexpression of the EGF receptor and p53 mutations are mutually exclusive in the evolution of primary and secondary glioblastomas. Brain Pathol, 1996. 6(3): p. 217-23; discussion 23-4.
227. Ohgaki, H. and P. Kleihues, The definition of primary and secondary glioblastoma. Clin Cancer Res, 2013. 19(4): p. 764-72.
228. Nobusawa, S., et al., IDH1 mutations as molecular signature and predictive factor of secondary glioblastomas. Clin Cancer Res, 2009. 15(19): p. 6002-7.
229. Balss, J., et al., Analysis of the IDH1 codon 132 mutation in brain tumors. Acta Neuropathol, 2008. 116(6): p. 597-602.
230. Yan, H., et al., IDH1 and IDH2 mutations in gliomas. New England Journal of Medicine, 2009. 360(8): p. 765-73.
231. Toedt, G., et al., Molecular signatures classify astrocytic gliomas by IDH1 mutation status. Int J Cancer, 2011. 128(5): p. 1095-103.
232. Ohgaki, H. and P. Kleihues, Genetic profile of astrocytic and oligodendroglial gliomas. Brain Tumor Pathol, 2011. 28(3): p. 177-83.


233. Wiestler, B., et al., Malignant astrocytomas of elderly patients lack favorable molecular markers: an analysis of the NOA-08 study collective. Neuro Oncol, 2013. 15(8): p. 1017-26.
234. Leibetseder, A., et al., Outcome and molecular characteristics of adolescent and young adult patients with newly diagnosed primary glioblastoma: a study of the Society of Austrian Neurooncology (SANO). Neuro Oncol, 2013. 15(1): p. 112-21.
235. Ferguson, S.D., et al., GBM-associated mutations and altered protein expression are more common in young patients. Oncotarget, 2016. 7(43): p. 69466-69478.
236. van der Sluis, S., et al., Phenotypic complexity, measurement bias, and poor phenotypic resolution contribute to the missing heritability problem in genetic association studies. PLoS One, 2010. 5(11): p. e13929.
237. Gordon, D. and S.J. Finch, Factors affecting statistical power in the detection of genetic association. J Clin Invest, 2005. 115(6): p. 1408-18.
238. Zheng, G. and X. Tian, The impact of diagnostic error on testing genetic association in case-control studies. Stat Med, 2005. 24(6): p. 869-82.
239. Edwards, B.J., et al., Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genetics, 2005. 6: p. 18-18.
240. Karami, S., et al., Telomere structure and maintenance gene variants and risk of five cancer types. Int J Cancer, 2016. 139(12): p. 2655-2670.
241. Barendse, W., The effect of measurement error of phenotypes on genome wide association studies. BMC Genomics, 2011. 12: p. 232.
242. Manchia, M., et al., The Impact of Phenotypic and Genetic Heterogeneity on Results of Genome Wide Association Studies of Complex Diseases. PLoS ONE, 2013. 8(10): p. e76295.
243. Smith, S., et al., Genome wide association studies in presence of misclassified binary responses. BMC Genetics, 2013. 14: p. 124-124.
244. van den Bent, M.J., Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician's perspective. Acta Neuropathol, 2010. 120(3): p. 297-304.
245. Aldape, K., et al., Discrepancies in diagnoses of neuroepithelial neoplasms: the San Francisco Bay Area Adult Glioma Study. Cancer, 2000. 88(10): p. 2342-2349.
246. Scott, C.B., et al., Central pathology review in clinical trials for patients with malignant glioma. A Report of Radiation Therapy Oncology Group 83-02. Cancer, 1995. 76(2): p. 307-13.
247. Kim, B.Y., et al., Diagnostic discrepancies in malignant astrocytoma due to limited small pathological tumor sample can be overcome by IDH1 testing. J Neurooncol, 2014. 118(2): p. 405-12.
248. Louis, D.N., et al., The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary. Acta Neuropathol, 2016. 131(6): p. 803-20.
249. Konig, I.R., et al., How to include chromosome X in your genome-wide association study. Genet Epidemiol, 2014. 38(2): p. 97-103.


250. Xue, Y. and C. Tyler-Smith, Past successes and future opportunities for the genetics of the human Y chromosome. Hum Genet, 2017. 136(5): p. 481-483.
251. Winham, S.J., G.D. Jenkins, and J.M. Biernacka, Modeling X Chromosome Data Using Random Forests: Conquering Sex Bias. Genet Epidemiol, 2016. 40(2): p. 123-32.
252. Chen, Z., et al., Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies. Stat Methods Med Res, 2017. 26(2): p. 567-582.
253. Calafell, F. and M.H.D. Larmuseau, The Y chromosome as the most popular marker in genetic genealogy benefits interdisciplinary research. Hum Genet, 2017. 136(5): p. 559-573.
254. Batini, C. and M.A. Jobling, Detecting past male-mediated expansions using the Y chromosome. Hum Genet, 2017. 136(5): p. 547-557.
255. Jobling, M.A. and C. Tyler-Smith, Human Y-chromosome variation in the genome-sequencing era. Nat Rev Genet, 2017. 18(8): p. 485-497.
256. Skaletsky, H., et al., The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 2003. 423(6942): p. 825-37.
257. Duijf, P.H., N. Schultz, and R. Benezra, Cancer cells preferentially lose small chromosomes. Int J Cancer, 2013. 132(10): p. 2316-26.
258. Zhang, L.J., et al., Molecular genetic evidence of Y chromosome loss in male patients with hematological disorders. Chin Med J (Engl), 2007. 120(22): p. 2002-5.
259. Veiga, L.C., et al., Loss of Y-chromosome does not correlate with age at onset of head and neck carcinoma: a case-control study. Braz J Med Biol Res, 2012. 45(2): p. 172-8.
260. Forsberg, L.A., Loss of chromosome Y (LOY) in blood cells is associated with increased risk for disease and mortality in aging men. Hum Genet, 2017. 136(5): p. 657-663.
261. Dumanski, J.P., et al., Mosaic Loss of Chromosome Y in Blood Is Associated with Alzheimer Disease. Am J Hum Genet, 2016. 98(6): p. 1208-1219.
262. Consortium, Y.C., A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res, 2002. 12(2): p. 339-48.
263. Karafet, T.M., et al., New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res, 2008. 18(5): p. 830-8.
264. Jobling, M.A. and C. Tyler-Smith, The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet, 2003. 4(8): p. 598-612.
265. Maan, A.A., et al., The Y chromosome: a blueprint for men's health? Eur J Hum Genet, 2017. 25(11): p. 1181-1188.
266. Schafer, I., et al., Association of atopy and tentative diagnosis of skin cancer - results from occupational skin cancer screenings. J Eur Acad Dermatol Venereol, 2017.
267. Calvete, O., et al., The wide spectrum of POT1 gene variants correlates with multiple cancer types. Eur J Hum Genet, 2017. 25(11): p. 1278-1281.


268. Shi, J., et al., Rare missense variants in POT1 predispose to familial cutaneous malignant melanoma. Nat Genet, 2014. 46(5): p. 482-6.
269. Bainbridge, M.N., et al., Germline mutations in shelterin complex genes are associated with familial glioma. J Natl Cancer Inst, 2015. 107(1): p. 384.
270. Telomeres Mendelian Randomization, C., et al., Association Between Telomere Length and Risk of Cancer and Non-Neoplastic Diseases: A Mendelian Randomization Study. JAMA Oncol, 2017. 3(5): p. 636-651.
271. Barrett, J.H., et al., Genome-wide association study identifies three new melanoma susceptibility loci. Nat Genet, 2011. 43(11): p. 1108-13.
272. Antonopoulou, K., et al., Updated field synopsis and systematic meta-analyses of genetic association studies in cutaneous melanoma: the MelGene database. J Invest Dermatol, 2015. 135(4): p. 1074-1079.
273. Law, M.H., et al., Genome-wide meta-analysis identifies five new susceptibility loci for cutaneous malignant melanoma. Nat Genet, 2015. 47(9): p. 987-995.
274. Bishop, D.T., et al., Genome-wide association study identifies three loci associated with melanoma risk. Nat Genet, 2009. 41(8): p. 920-5.
275. Iles, M.M., et al., A variant in FTO shows association with melanoma risk not due to BMI. Nat Genet, 2013. 45(4): p. 428-32, 432e1.
276. Song, F., et al., Identification of a melanoma susceptibility locus and somatic mutation in TET2. Carcinogenesis, 2014. 35(9): p. 2097-101.
277. Brown, K.M., et al., Common sequence variants on 20q11.22 confer melanoma susceptibility. Nat Genet, 2008. 40(7): p. 838-40.
278. Scarbrough, P.M., et al., A Cross-Cancer Genetic Association Analysis of the DNA Repair and DNA Damage Signaling Pathways for Lung, Ovary, Prostate, Breast, and Colorectal Cancer. Cancer Epidemiol Biomarkers Prev, 2016. 25(1): p. 193-200.
279. Kar, S.P., et al., Genome-Wide Meta-Analyses of Breast, Ovarian, and Prostate Cancer Association Studies Identify Multiple New Susceptibility Loci Shared by at Least Two Cancer Types. Cancer Discovery, 2016. 6(9): p. 1052-1067.
280. Fehringer, G., et al., Cross-Cancer Genome-Wide Analysis of Lung, Ovary, Breast, Prostate, and Colorectal Cancer Reveals Novel Pleiotropic Associations. Cancer Res, 2016. 76(17): p. 5103-14.
281. Hung, R.J., et al., Cross Cancer Genomic Investigation of Inflammation Pathway for Five Common Cancers: Lung, Ovary, Prostate, Breast, and Colorectal Cancer. J Natl Cancer Inst, 2015. 107(11).
282. Malmer, B., et al., GLIOGENE an International Consortium to Understand Familial Glioma. Cancer Epidemiol Biomarkers Prev, 2007. 16(9): p. 1730-4.
283. Song, X., et al., GWAS follow-up study of esophageal squamous cell carcinoma identifies potential genetic loci associated with family history of upper gastrointestinal cancer. Sci Rep, 2017. 7(1): p. 4642.
284. Fernandes, G.C., et al., Association of polymorphisms with a family history of cancer and the presence of germline mutations in the BRCA1/BRCA2 genes. Hered Cancer Clin Pract, 2016. 14: p. 2.
285. Leece, R., et al., Global Incidence of Malignant Brain and Other Central Nervous System Tumors by Histology, 2003-2007. Neuro Oncol, 2017.

286. Forman D, et al., Cancer Incidence in Five Continents, Vol. X (electronic version). 2013 International Agency for Research on Cancer: Lyon, France.
287. Ostrom, Q.T., et al., CBTRUS Statistical Report: Primary Brain and Other Central Nervous System Tumors Diagnosed in the United States in 2010-2014. Neuro Oncol, 2017. 19.
288. Surveillance Epidemiology and End Results (SEER) Program, DevCan database: "SEER 18 Incidence and Mortality, 2000-2011, with Kaposi Sarcoma and Mesothelioma". National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, released August 2014, based on the November 2013 submission. Underlying mortality data provided by NCHS (www.cdc.gov/nchs).
289. Ostrom, Q.T., et al., CBTRUS Statistical Report: Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2008-2012. Neuro Oncol, 2015. 17 Suppl 4: p. iv1-iv62.
290. Chen, P., et al., Ethnicity delineates different genetic pathways in malignant glioma. Cancer Res, 2001. 61(10): p. 3949-54.
291. Jacobs, D.I., et al., Leveraging ethnic group incidence variation to investigate genetic susceptibility to glioma: a novel candidate SNP approach. Front Genet, 2012. 3: p. 203.
292. Liu, H.B., et al., Comprehensive study on associations between nine SNPs and glioma risk. Asian Pac J Cancer Prev, 2012. 13(10): p. 4905-8.
293. Song, X., et al., Fine mapping analysis of a region of 20q13.33 identified five independent susceptibility loci for glioma in a Chinese Han population. Carcinogenesis, 2012. 33(5): p. 1065-71.
294. Li, G., et al., RTEL1 tagging SNPs and haplotypes were associated with glioma development. Diagn Pathol, 2013. 8: p. 83.
295. Wiencke, J.K., et al., Molecular features of adult glioma associated with patient race/ethnicity, age, and a polymorphism in O6-methylguanine-DNA-methyltransferase. Cancer Epidemiol Biomarkers Prev, 2005. 14(7): p. 1774-83.
296. Egan, K.M., et al., Cancer susceptibility variants and the risk of adult glioma in a US case-control study. J Neurooncol, 2011. 104(2): p. 535-42.
297. Centers for Disease Control and Prevention National Center for Health Statistics. United States Cancer Statistics: 1999 - 2013 Incidence, WONDER Online Database. United States Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute; 2016. 2016; Available from: http://wonder.cdc.gov/ucd-icd10.html.
298. Surveillance Epidemiology and End Results (SEER) Program, SEER*Stat Database: Incidence - SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2015 Sub (1973-2013 varying) - Linked To County Attributes - Total U.S., 1969-2014 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, released April 2016, based on the November 2015 submission. 2016.
