The Identification of Colorectal Cancer Susceptibility Using a Cross-Species, Systems Genetics Approach

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Madelyn Margaret Gerber

Graduate Program in Biomedical Sciences

The Ohio State University

2015

Dissertation Committee:

Amanda Ewart Toland, PhD, Advisor

Joanna Groden, PhD

Jeffrey Parvin, MD, PhD

Wolfgang Sadee, Dr.rer.nat.

Copyright by

Madelyn Margaret Gerber

2015

Abstract

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the third leading cause of cancer-related death in the United States. As much as 35% of risk for developing this disease is due to genetic risk factors. Genome-wide association studies (GWAS) have identified ~40 independent common risk variants that contribute to genetic predisposition. However, these variants fail to explain the majority of the genetic component of risk. Identification of additional susceptibility alleles for CRC is critical for developing genetic screening tools capable of identifying individuals at heightened genetic risk. This dissertation describes three projects aimed at uncovering CRC susceptibility genes using a cross-species systems genetics approach.

The first project (Chapter 2) involves assessing single nucleotide polymorphisms (SNPs) at loci that were linked to CRC risk by GWAS for allele-specific somatic copy number gains or losses in human colorectal tumor genomes. We hypothesized that GWAS-identified SNPs exhibit allele-specific copy number changes (termed "allele-specific imbalance," or ASI) in the tumor genome of patient specimens in much the same fashion as oncogenes are amplified and tumor suppressors are lost. We tested this hypothesis using quantitative genotyping to detect relative gains or losses of GWAS-identified SNP alleles in a cohort of human paired colorectal tumor/normal DNA samples. Testing of 17 SNPs revealed statistically significant allele-specific copy number changes at one SNP, rs6983267 at 8q24, which suggests that ASI occurs at CRC risk loci but perhaps at low frequency.

The second project discussed in this document (Chapter 3) describes a large ASI screen of SNPs tagging for candidate genes at the human orthologs of three murine CRC susceptibility quantitative trait loci (QTL). We prioritized genes for inclusion in this study based on RNA-seq data generated from the normal colons of the CRC-resistant and CRC-susceptible mouse parental strains that were used to linkage-map these QTLs. Because no damaging coding SNPs were identified by RNA-seq, we focused largely on genes showing differential expression between the strains and/or genes with a documented role in cancer-relevant pathways or processes. Two SNPs in SNX10 emerged from our screen and warrant further investigation.

The third project (Chapter 4) reports our preliminary functional investigations of the hypoxia-inducible factor Epas1, which maps to the Scc4 susceptibility QTL. This gene is differentially expressed in the colons of the CRC-resistant and CRC-susceptible mouse strains used to map Scc4. Furthermore, five tagging SNPs within EPAS1 showed suggestive evidence of ASI in human colorectal tumor/normal DNA pairs, although none remained statistically significant after correction for multiple comparisons. These data prompted us to investigate EPAS1/Epas1 for functional effects in human and mouse colon cell lines. We tested this gene for effects on β-catenin/TCF-mediated transcriptional activity under conditions of normoxia and hypoxia in SW480 and HCT116 human colon adenocarcinoma cell lines. Under hypoxic culture conditions, we observed activation of our TOPflash β-catenin/TCF reporter construct when Epas1 expression was enforced, suggesting a plausible role for EPAS1 in modifying CRC susceptibility. In summary, our data validate our cross-species approach as an innovative strategy for uncovering novel candidate CRC susceptibility genes.

Dedication

This document is dedicated to my parents, John and Rebecca Gerber, for their loving support and guidance. My amazing mom and dad have taught me the value of hard work, optimism, and fearlessness in the face of adversity. They are truly my superheroes without capes.

Acknowledgments

I would like to acknowledge my advisor, Dr. Amanda Toland, for the knowledge and training she has imparted over the four years I was fortunate to spend in her laboratory. It was an honor and a pleasure to be mentored by such an outstanding geneticist and role model for women in science.

I give sincere thanks to my committee members, Drs. Groden, Parvin, and Sadee, for their wisdom and guidance throughout my graduate education. Each member of my dissertation committee has provided invaluable insights that have been tremendously beneficial to my research and to my development as a scientist.

I would like to extend many heartfelt thanks to my friends, classmates, and co-workers for the support and expertise they have provided over the course of my graduate education. Each of the following individuals has contributed their friendship, support, wisdom, and scholarly advice: Jessica Fleming, Jessica Gillespie, Bill Hankey, Andrew Kerns, Morgan Schrock, Mary Severin, Mia Tazi, and Linan Wang.

My dear friend, Mia "Precious" Tazi, encouraged me on a constant basis and was the best dissertation writing buddy anyone could ask for. I truly could not have written this document were it not for her positive energy and the comic relief she so wonderfully provided. I will forever cherish the memories of working on our dissertations together in the wee hours of the morning and our writing breaks spent laughing together over episodes of Parks and Recreation.

Finally, I would like to acknowledge my dear family for giving me the strength and determination to pursue a PhD. My parents, John and Rebecca, and siblings Matthew, Blake, and Eleanor have been a source of tremendous inspiration to me. My aunt, Mary Wiley, has been a role model and second mother to me since an early age. My grandmother, Margaret Wiley, possesses such a curious mind and a keen interest in science. Her curiosity and thirst for knowledge inspire the same in me. Lastly, I wish to acknowledge my late grandfather, Donald Wiley, Jr. Grandpa Wiley was my biggest cheerleader and is the bravest and wisest man I have ever known. All my love and thanks go out to those who have helped me achieve this dream.

Vita

August 6, 1988 ...... Born, St. Paul, MN

2006 ...... Mahtomedi Senior High School

2010 ...... B.A. Biology and Psychology, University of Minnesota, Morris

2010 to present ...... Graduate Research Associate, Department of Molecular Virology, Immunology & Medical Genetics, The Ohio State University

Publications

Gerber MM, Hampel H, Schulz NP, Fernandez S, Wei L, Zhou X-P, de la Chapelle A, Toland AE. Evaluation of allele-specific somatic changes in genome-wide association study susceptibility alleles in human colorectal cancers. PLOS ONE 2012; 7(5): e37672.

Siekmann TE, Gerber MM, Toland AE. Variants in an Hdac9 intronic enhancer impact Twist1 expression. Submitted to International Journal of Cancer, In Revision, 2015.

Gerber MM, Hampel H, Zhou XP, Deveci M, Catalyurek U, Schulz NP, Suhy A, de la Chapelle A, Ewart Toland A. Allele-specific imbalance mapping at human orthologs of mouse susceptibility to colon cancer (Scc) loci. Submitted to International Journal of Cancer, In Revision, 2015.

Fields of Study

Major Field: Biomedical Sciences (Human Genetics and Cancer Biology Areas of Emphasis)

Table of Contents

Abstract ...... ii

Dedication ...... v

Acknowledgments...... vi

Vita ...... viii

List of Tables ...... xvi

List of Figures ...... xviii

Chapter 1: Introduction ...... 1

1.1 Biology of the Colon and Rectum ...... 1

1.2 Colorectal Cancer: Cancer of the Colon or Rectum ...... 2

1.3 The Genetics behind Colorectal Cancer ...... 4

1.4 Strategies for Identifying Colorectal Cancer Susceptibility Genes ...... 6

1.5 Mouse Models for the Study of Colon Carcinogenesis...... 9

1.6 A Cross-Species, Systems Genetics Approach to Uncover Candidate Genes ...... 13

Chapter 2: Analysis of Allele-Specific Imbalance at Single Nucleotide Polymorphisms Identified by Genome-Wide Association Studies of Colorectal Cancer ...... 19

2.1 Introduction ...... 19

2.2 Objectives and Hypothesis ...... 21

2.3 Materials and Methods ...... 22

2.3.1 Human Samples ...... 22

2.3.2 DNA Extraction ...... 23

2.3.3 Quantitative Genotyping...... 24

2.3.4 Analysis of Allele-Specific Imbalance ...... 26

2.3.5 Validation Studies...... 28

2.3.6 Compilation of Allele-Specific Imbalance Data from Multiple Studies ...... 28

2.3.7 Correlation Analysis of Allelic Imbalances and Age, Sex, and Tumor Stage ...... 29

2.4 Results ...... 29

2.4.1 Discovery Set Genotyping ...... 29

2.4.2 Validation Set Genotyping ...... 30

2.4.3 Combined Genotyping Results from Discovery and Validation Sample Sets ...... 31

2.4.4 Compilation of Allelic Imbalance Data from Multiple Studies ...... 31

2.4.5 Correlation Analysis of Allelic Imbalances and Age, Sex, and Tumor Stage ...... 32

2.5 Discussion ...... 33

2.6 Conclusions ...... 38

Chapter 3: Allele-Specific Imbalance Mapping of Human SCC4, SCC5, and SCC13 in Colorectal Tumors ...... 62

3.1 Introduction ...... 62

3.2 Objectives and Hypothesis ...... 64

3.3 Materials and Methods ...... 64

3.3.1 Mouse Samples ...... 64

3.3.2 RNA-Seq and Analysis of Next-Generation Sequencing Data ...... 65

3.3.3 Human Samples ...... 67

3.3.4 Choice of Tagging SNPs for study ...... 68

3.3.5 Sequenom® Quantitative Genotyping and R-ratio calculations ...... 68

3.3.6 Analysis of Allele-Specific Imbalance ...... 69

3.3.7 Validation Studies...... 69

3.4 Results ...... 70

3.4.1 RNA-Seq Analysis of CRC-Sensitive and CRC-Resistant Mouse Strains ...... 70

3.4.2 Array Comparative Genomic Hybridization Analysis of Human Colorectal Tumors ...... 71

3.4.3 Sequenom® Allele-Specific Imbalance Mapping ...... 72

3.5 Discussion ...... 74

3.6 Conclusions ...... 78

Chapter 4: Functional Characterization of Hypoxia Inducible Factor 2α (Epas1) Using Colon Epithelial Cell Lines ...... 82

4.1 Introduction ...... 82

4.1.1 The Role of Hypoxia in Cancer ...... 82

4.1.2 The Function of EPAS1 (HIF2α) in Cancer ...... 83

4.1.3 Evidence from Our Studies Supporting Epas1 as a Candidate Susceptibility Gene for Scc4 ...... 85

4.2 Objectives and Hypotheses ...... 85

4.3 Materials and Methods ...... 86

4.3.1 Real Time Quantitative PCR Analysis of Mouse Colon Specimens for Epas1, Csnk1a1, and Ptprj Expression ...... 86

4.3.2 Sub-cloning Epas1 into the pEGFP-N1 Expression Vector ...... 87

4.3.3 siRNA Reagents ...... 88

4.3.4 TOPflash and FOPflash Luciferase Reporter Vectors ...... 89

4.3.5 Cell Lines and Culture Conditions ...... 89

4.3.6 Transfection Methods ...... 90

4.3.7 TOPflash/FOPflash Luciferase Assays ...... 91

4.3.8 SDS-PAGE and Western Blots...... 93

4.3.9 Antibodies for Western Blotting ...... 95

4.3.10 Real Time Quantitative PCR Analysis of β-catenin/TCF Transcriptional Targets ...... 95

4.3.11 Real Time Quantitative PCR Analysis of Csnk1a1 and Ptprj Expression in Epas1-Expressing Cells ...... 96

4.3.12 MTT Cell Viability Assay in Epas1-Expressing Fred Cells...... 97

4.4 Results ...... 98

4.4.1 Epas1 and Csnk1a1 are Differentially Expressed in Balb/cHeA and STS/A Colon ...... 98

4.4.2 Epas1 Enhances β-catenin/TCF-Driven Transcription of the TOPflash Luciferase Reporter Plasmid Under Hypoxic Conditions ...... 99

4.4.3 Epas1-GFP Expression does not Stimulate c-MYC Expression in Normoxia ...... 101

4.4.4 Epas1 Expression is Associated with Decreased Csnk1a1 and Ptprj Transcript Levels in an Immortalized Mouse Colon Epithelial Cell Line ...... 102

4.4.5 Epas1-GFP does not Alter Proliferation or Viability in Immortalized Mouse Colon Epithelial Cells Grown Under Normoxic Conditions ...... 102

4.5 Discussion ...... 103

4.5.1 Colon Transcript Levels of Epas1, Csnk1a1, and Ptprj are Lower in the CRC-Sensitive Strain STS/A Compared to the CRC-Resistant Strain Balb/cHeA ...... 104

4.5.2 Epas1 Promotes β-catenin/TCF Transcriptional Activity in Human Colon Adenocarcinoma Cells Under Hypoxic Conditions ...... 105

4.5.3 Epas1 does not Affect Expression of c-MYC in Normoxia, but may Regulate Levels of Csnk1a1 (Scc5) and Ptprj (Scc1) ...... 108

4.5.4 Enforced Epas1-GFP Expression in Immortalized Mouse Colon Epithelial Cells does not Induce Cell Proliferation in Normoxia ...... 109

4.6 Conclusions ...... 110

Chapter 5: Conclusions ...... 124

References ...... 128

Appendix A: Chapter 2 Supplemental Table ...... 140

Appendix B: Chapter 3 Supplemental Tables ...... 142

List of Tables

Table 1. CRC risk-associated GWAS SNPs assessed for allele-specific imbalance in our samples ...... 40

Table 2. Analysis of allele-specific imbalance in discovery sample set ...... 42

Table 3. Analysis of allele-specific imbalance in discovery, validation, and combined sample sets ...... 44

Table 4. Combined analysis with published allele-specific imbalance studies...... 46

Table 5. Chromosomal losses and gains by aCGH at human chromosome arms housing SCC4 (2p), SCC5 (5q), and SCC13 (4q, 7p) ...... 79

Table 6. SNPs showing evidence of ASI after replication ...... 81

Table 7. Real time qPCR primer/probe assays purchased from Integrated DNA Technologies (Coralville, IA, USA) ...... 112

Table 8. Primers for Sequenom® MassARRAY quantitative genotyping of GWAS-identified SNPs ...... 141

Table 9. Primers for Sequenom® MassARRAY quantitative SNP genotyping ...... 188

Table 10. Genes showing differential expression between Balb/cHeA and STS/A by RNA-Seq ...... 197

Table 11. Coding SNPs identified in Balb/cHeA and STS/A by RNA-seq ...... 200

Table 12. Allele-specific imbalance data for 525 SNPs in the discovery sample set .... 224

Table 13. Allele-specific imbalance data for 103 SNPs in the validation sample set .... 227

Table 14. Allele-specific imbalance data for 103 SNPs in the discovery sample set and validation sample set combined ...... 230

List of Figures

Figure 1. Gross anatomy of the colon and rectum ...... 15

Figure 2. A cross-section through the colon ...... 16

Figure 3. The subtypes of colorectal cancer ...... 17

Figure 4. Interacting Susceptibility to colon cancer (Scc) loci Scc1, Scc4, Scc5, and Scc13 [48,50] ...... 18

Figure 5. Generation of standard curves for detecting allelic imbalance ...... 48

Figure 6. Box plots of the distribution of R-ratios for the GWAS SNPs ...... 56

Figure 7. Protein domain structures of the hypoxia-inducible factor subunits HIF1α, HIF1β (ARNT), and HIF2α (EPAS1) ...... 113

Figure 8. Schematic detailing the regulation of HIFα subunits under normoxia and hypoxia ...... 114

Figure 9. Highlighted genes and cellular processes that are transcriptionally regulated by the HIF family ...... 115

Figure 10. Real time qPCR analysis of Epas1 (A), Csnk1a1 (B), and Ptprj (C) in the normal healthy colons of Balb/cHeA and STS/A female mice (n = 3 mice per strain) .. 116

Figure 11. Western blot of GFP-tagged Epas1 protein (Epas1-GFP) from transfected SW480 cells ...... 117

Figure 12. TOPflash/FOPflash reporter assay reveals activation of β-catenin/TCF-driven transcription in SW480 (A) and HCT116 (B) cells cultured under hypoxic (1% O2) conditions ...... 118

Figure 13. Treatment with siEPAS1 dose-dependently reduces endogenous levels of EPAS1 in HCT116 cells at 24 hours post-transfection under normoxic (~21% O2) conditions ...... 119

Figure 14. Normoxia (~21% O2) does not elicit EPAS1-dependent TOPflash/FOPflash reporter activity in HCT116 cells at 24 hours post-transfection with 5nM siControl or 5nM siEPAS1 ...... 120

Figure 15. The enforced expression of Epas1-GFP in SW480 cells does not alter transcription of the canonical Wnt target gene c-MYC under normoxic culture conditions 24 or 48 hours post-transfection ...... 121

Figure 16. Epas1-GFP expression in Fred cells under normoxic conditions is associated with reduced expression of Csnk1a1 (A) and Ptprj (B) ...... 122

Figure 17. Enforced expression of Epas1-GFP in immortalized C57Bl/6 colon epithelial cells (Fred) does not affect cell viability or proliferation under normoxic culture conditions as measured by MTT assay, 24 to 96 hours post-transfection ...... 123

Chapter 1: Introduction

1.1 Biology of the Colon and Rectum

The colon and rectum are structures of the large intestine, which is the terminal portion of the gastrointestinal tract (Figure 1). The colon is a long tube that is separated into four portions: the ascending colon, the transverse colon, the descending colon, and the sigmoid colon. The ascending colon connects to the cecum in the lower right quadrant of the abdomen and ascends on the right side. At the inferior surface of the liver, the ascending colon turns left to form the right colic flexure. The colon continues across the abdomen as the transverse colon. The transverse colon curves beneath the spleen at the left colic flexure and continues downward as the descending colon. The sigmoid colon projects toward the midline of the abdomen, follows an S-shaped curve and terminates at the rectum. The rectum constitutes the last 20 cm of the gastrointestinal tract and converges into the anal canal.1

A complex arrangement of tissues aids in the passage of digested food through the lumen of the colon. Moving outward from the lumen of the colon, the colonic wall consists of the mucosa, submucosa, muscularis, and serosa (Figure 2). The mucosa is composed of simple columnar epithelium, lamina propria, and muscularis mucosae. The submucosa consists of connective tissue and houses lymphatic and blood vessels. An external layer of longitudinal smooth muscle and an internal layer of circular smooth muscle together compose the muscularis layer of the large intestine and facilitate the peristaltic mechanism required for transit of food matter. Finally, the serosa is the peritoneal covering of the external surface of the large intestine.1

The columnar epithelial cells of the mucosa layer perform the absorptive function of the colon. Columnar epithelial cells line the lumen of the colon and rectum and facilitate the absorption of water, ions, and vitamins from fecal matter as it transits through the digestive tract.1 The intestinal epithelium constitutes the largest mucosal surface in the human body, with ~400 square meters of surface area organized into crypts and villi.2 Pluripotent intestinal epithelial stem cells reside in the base of crypts, where they direct the proliferation and differentiation processes that facilitate renewal of the mucosal surface. Most cells in the intestinal lumen border are absorptive enterocytes, but secretory cell types such as Paneth cells, goblet cells, and enteroendocrine cells also play critical roles in the lubrication of the intestine, as well as contributing to the barrier and immune function of this organ.1

1.2 Colorectal Cancer: Cancer of the Colon or Rectum

Colorectal cancer (CRC) is the third most commonly diagnosed cancer in the United States. The American Cancer Society estimates that nearly 137,000 Americans will be diagnosed with CRC in 2014 and that over 50,000 will die from this disease.3 Incidence and mortality rates have diminished in recent years, largely due to prevention and early detection through screening. According to the U.S. Preventive Services Task Force, screening for CRC using fecal occult blood testing, sigmoidoscopy, or colonoscopy is recommended beginning at age 50 years and continuing until age 75 years.4 However, only 59% of Americans age 50 and older report following the recommended screening procedures for individuals in this age group.3

Early detection is particularly useful for this disease, as the development of colorectal adenocarcinomas occurs over the course of one to two decades and early precancerous lesions such as polyps are visible by screening. As such, the early surgical removal of precancerous polyps can have remarkable benefit to the patient. Even after a polyp has progressed to cancer, survival rates can be quite promising for early-stage disease. When CRCs are detected at a localized stage, meaning the cancer has not penetrated through the colon or rectal wall into nearby tissues, survival rates are 90%. However, localized CRCs rarely present with symptoms, so only 40% are detected at such an early stage. Once the cancer has disseminated to regional lymph nodes or metastasized to distant organs, survival declines to 70% and 13%, respectively.3

Typically, colorectal malignancies begin as adenomas (neoplasias of glandular cells) in the epithelial crypt and progress to penetrate the muscularis mucosae and the lymphatic and blood vessel-rich submucosa. At this point, the adenoma may become a malignant adenocarcinoma. Once the tumor has penetrated through the wall of the colon, metastasis becomes a concern for the patient. The two most common sites for CRC metastasis are the liver and the lungs.5

In general, for localized and regional CRCs, surgical resection of the affected segment of the colon or rectum is the standard treatment. Patients with regional CRCs that have infiltrated lymph nodes or invaded adjacent tissues will often be provided adjuvant or radiation therapy. For those suffering from metastatic CRCs, surgery may be performed but is not recommended for all patients. Metastatic CRCs are often treated with chemotherapy, radiation therapy, or a combination of the two with the intent to alleviate symptoms and prolong life.3

1.3 The Genetics behind Colorectal Cancer

Colorectal cancers are typically grouped according to three patterns observed in human populations: sporadic, familial, and inherited (Figure 3). Nearly 70% of CRCs are sporadic cases, meaning there is no obvious inherited predisposition. These cases tend to occur in individuals over the age of 50 who have other risk factors.6 Familial cases, in which CRCs develop too frequently in families to be considered sporadic but which are not inherited in a pattern consistent with an inherited syndrome, comprise up to 25% of all cases of CRC.6 Inherited syndromes with single gene defects constitute a minority of cases (~2-6%).7 Familial adenomatous polyposis (FAP) is a dominant Mendelian syndrome in which patients inherit a mutated copy of the adenomatous polyposis coli gene (APC).7 Hereditary nonpolyposis colorectal cancer (HNPCC, or Lynch syndrome) is caused by inheritance of defective DNA mismatch repair genes (MLH1, MSH2, PMS2 and MSH6).7 Other inherited syndromes that predispose individuals to benign or malignant colon tumors include Peutz-Jeghers syndrome, juvenile polyposis, and MYH-associated polyposis.7

While the gene defects involved in inherited CRC syndromes are fairly well characterized, researchers lack a comprehensive understanding of the elements that contribute to sporadic CRC risk in human populations. Colorectal cancer is a complex disease with many known risk factors. Researchers conducting twin studies have explored the heritable component of risk. According to such studies, the genetic contribution to risk is estimated at between 12 and 35%.8,9 This suggests that as much as a third of the risk for developing sporadic CRC is due to an individual's genetic constitution. Previously, it was believed that the remainder of disease risk was explained by such factors as environmental exposures and behavioral influences like diet and exercise, but a new study suggests that random mutations occurring during normal stem cell divisions in intestinal crypts underlie much of the etiology for CRC.10

According to the common disease-common variant hypothesis, which postulates that common diseases (occurring in 1-5% of the human population) are caused by the combined inheritance of multiple common variants, the heritability of a complex disease like sporadic CRC could be attributed to inheritance of common low-penetrance alleles.11 To date, the identification of low-penetrance CRC susceptibility variants has largely been accomplished by population-based case-control genome-wide association studies (GWAS). In these studies, frequencies of single nucleotide polymorphisms (SNPs) mapping throughout the entire genome are compared in large numbers of cases (affected individuals) and controls (unaffected individuals) to detect SNP alleles that associate with disease risk. GWAS has so far been used successfully to identify over 40 independent low-penetrance variants that contribute to CRC risk. However, the proportion of genetic risk that these 40 variants explain is modest and incomplete.12 It has been proposed that the remainder of heritable risk comes from rare variants, copy number variants, gene-gene interactions, gene-environment interactions and parent-of-origin effects.12,13 Overestimates of heritability are also a possible explanation for the missing heritability.13 This "missing heritability" problem justifies the need for novel strategies to uncover heritable risk factors.

1.4 Strategies for Identifying Colorectal Cancer Susceptibility Genes

Many experimental strategies have been utilized in efforts to uncover susceptibility genes for complex traits. For our studies, we focused on the following methods: linkage analysis, association studies, array comparative genomic hybridization, and allele-specific imbalance mapping. Each of these strategies has advantages and disadvantages; thus, an integrated approach may present the best alternative to strictly adhering to one strategy.

Linkage studies involve genotyping families at polymorphic markers and assessing linkage regions for co-segregation with the affected phenotype.14,15 While this strategy is useful for detecting highly penetrant single-gene defects in Mendelian disorders, it is not ideal for detecting low-penetrance alleles contributing to complex polygenic traits like colorectal cancer.15 Another limitation is the requirement for parents, siblings, and other extended family members. For diseases like sporadic CRC, which predominantly occur in elderly patients for whom parents and/or siblings may be unavailable for testing, this represents a large hurdle to acquiring sufficient families to draw meaningful conclusions.

Association studies provide a powerful approach to identify low-penetrance alleles that cannot be detected using linkage analysis.15,16 These studies involve direct testing of genetic polymorphisms in large cohorts of cases and controls.16 In genome-wide association studies (GWAS), polymorphisms tagging for linkage regions spanning the entire genome are genotyped in the cases and controls. The frequency of each allele is compared in the cases and controls, and alleles that occur at significantly higher frequencies in cases versus controls are said to "associate" with disease risk. Due to the vast numbers of polymorphisms tested, this experimental approach requires thousands of cases and controls to achieve genome-wide statistical power. In a meta-analysis of multiple colon cancer GWA studies from the United Kingdom, researchers calculated the cumulative risk associated with 10 robustly replicated SNPs. They estimated that approximately 6% of the full-sibling relative risk, corresponding to ~1.26% of the phenotypic variance in the liability scale, could be explained by these 10 common, independent low-penetrance CRC risk alleles.17 A more recent study which considered the contributions of 31 CRC risk loci but employed an alternative method of risk modeling generated a meta-analysis heritability estimate for these GWAS regions of just 1.14%.18 Collectively, GWAS efforts thus far have explained only ~1-2% of risk for CRC, leaving much of the genetic risk unaccounted for at this time.17,18
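
The allele-frequency comparison at the heart of an association study can be sketched for a single SNP as a chi-square test on a 2x2 table of allele counts. The sketch below is illustrative only: the function name and the allele counts are hypothetical, and real GWAS pipelines add quality control, covariate adjustment, and a stringent genome-wide significance threshold (commonly p < 5 × 10^-8) that are not shown here.

```python
import math

def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases/controls, columns are counts of allele A and allele B.
    Returns (chi-square statistic, p-value, odds ratio for allele A).
    """
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_sums = [sum(row) for row in table]
    col_sums = [case_a + control_a, case_b + control_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_a * control_b) / (case_b * control_a)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts: 2,000 cases and 2,000 controls (4,000 alleles each).
chi2, p, odds = allelic_chi_square(case_a=2200, case_b=1800,
                                   control_a=2000, control_b=2000)
print(f"chi2={chi2:.2f}, p={p:.2e}, OR={odds:.2f}")
```

In this made-up example the A allele is modestly enriched in cases, which mirrors the small per-allele effects typical of low-penetrance variants and illustrates why large sample sizes are needed once many SNPs are tested.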

Another tool that has expanded our knowledge of genes contributing to cancerous growth is array comparative genomic hybridization (array-CGH, or aCGH). Array-CGH studies have been utilized to assess somatic copy number alterations in human colorectal tumors. Such studies have led to the identification of oncogenes and tumor suppressor genes that play a role in the development and progression of this cancer.19,20 This method involves hybridizing fragmented DNA from patient tumor samples and a normal reference DNA to thousands of probes on a microarray. The tumor DNA and reference DNA are differentially labeled with red and green fluorophores, respectively. The intensity of the two fluorescent labels is measured and compared to determine the presence of copy number changes at a given locus. Probes that emit a red signal coincide with regions that show genomic gains in the tumor DNA, whereas probes that emit a green signal coincide with regions showing genomic losses. Yellow signals indicate no change in the ratio of tumor to normal DNA. The relative intensities of the green and red signals allow for quantification of genomic gains or losses.21
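
The red/green comparison described above is commonly summarized as a log2 ratio of tumor to reference signal for each probe. The following is a minimal sketch under stated assumptions: the probe names, intensities, and the ±0.3 calling threshold are hypothetical and are not values taken from the studies cited here.

```python
import math

def call_copy_number(tumor_intensity, reference_intensity, threshold=0.3):
    """Classify one probe from its tumor (red) and reference (green) signals.

    The log2 ratio is positive when the tumor signal dominates (gain),
    negative when the reference signal dominates (loss), and near zero
    when the two are balanced (the "yellow" case).
    The 0.3 cutoff is an illustrative assumption, not a value from this study.
    """
    log2_ratio = math.log2(tumor_intensity / reference_intensity)
    if log2_ratio > threshold:
        return log2_ratio, "gain"
    if log2_ratio < -threshold:
        return log2_ratio, "loss"
    return log2_ratio, "no change"

# Hypothetical probe intensities (tumor, reference) in arbitrary units.
probes = {
    "probe_1": (1500.0, 1000.0),
    "probe_2": (500.0, 1000.0),
    "probe_3": (1020.0, 1000.0),
}
calls = {name: call_copy_number(t, r) for name, (t, r) in probes.items()}
for name, (ratio, status) in calls.items():
    print(f"{name}: log2 ratio = {ratio:+.2f} ({status})")
```

Because the ratio pools signal from both parental alleles at each probe, a calculation of this kind cannot say which allele was gained or lost.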

While aCGH can detect copy number alterations with high sensitivity, it lacks the ability to discriminate the two alleles at a given locus. Thus, this method fails to detect allele-specific copy number alterations that generate allelic imbalance. Allelic imbalance occurs when there is complete loss of one allele of a pair or copy number gain of one allele relative to the other.22,23 Allelic imbalance can be detected by comparing the proportion of one allele to the other in DNA collected from an individual who is constitutively heterozygous at that locus.23 In a proportion of loci showing allelic imbalance, one specific allele may exhibit preferential copy number gain or loss compared to the other allele. Allele-specific imbalance (ASI), the preferential somatic gain or loss of one allele of a pair in neoplastic tissues, is a quantitative phenomenon that provides another means to assess regions of the genome for the presence of a susceptibility or resistance variant.24 In support of its utility, ASI has been observed among 40% of mouse skin cancer susceptibility loci and at human variants associated with colon cancer risk by GWAS.25-28 For example, ASI has been reproducibly observed at the SNP rs6983267 at 8q24, a GWAS SNP that has been functionally investigated for its putative role as a regulatory SNP in an enhancer element that may influence expression of MYC or a long noncoding RNA named CARLo-5.26,27,29-32 Technologies such as next-generation sequencing and quantitative genotyping permit identification of relative gains or losses of alleles in tumor DNA samples compared to germline DNA from the same individual.33,34
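
The distinction between allelic imbalance in one tumor and allele-specific imbalance across a cohort can be sketched as follows. In each heterozygous individual, the tumor allele ratio is normalized to the matched normal allele ratio; the direction of imbalance is then tallied across individuals and tested against a 50:50 expectation. The ratio definition, the 1.5-fold cutoff, the sign test, and all sample values are assumptions made for illustration and may differ from the calculations used in Chapters 2 and 3.

```python
from math import comb

def imbalance_ratio(tumor_a, tumor_b, normal_a, normal_b):
    """Tumor A:B allele signal ratio normalized to the matched normal A:B ratio."""
    return (tumor_a / tumor_b) / (normal_a / normal_b)

def classify(ratio, fold_cutoff=1.5):
    """Return the relatively gained/retained allele, or None if balanced.

    The 1.5-fold cutoff is an illustrative assumption.
    """
    if ratio >= fold_cutoff:
        return "A"
    if ratio <= 1.0 / fold_cutoff:
        return "B"
    return None

def sign_test_p(n_a, n_b):
    """Two-sided exact binomial test of n_a vs. n_b against a 50:50 expectation."""
    n = n_a + n_b
    if n == 0:
        return 1.0
    k = max(n_a, n_b)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical allele signals (tumor A, tumor B, normal A, normal B),
# one tuple per constitutively heterozygous patient.
samples = [
    (1.8, 1.0, 1.0, 1.0),
    (2.0, 1.0, 1.1, 1.0),
    (1.0, 1.0, 1.0, 1.0),
    (1.7, 1.0, 1.0, 1.0),
    (1.0, 1.6, 1.0, 1.0),
    (1.9, 1.0, 1.0, 1.0),
]
calls = [classify(imbalance_ratio(*s)) for s in samples]
favor_a, favor_b = calls.count("A"), calls.count("B")
p_value = sign_test_p(favor_a, favor_b)
print(f"imbalance favoring A: {favor_a}, favoring B: {favor_b}, "
      f"sign-test p = {p_value:.3f}")
```

With so few informative tumors the test has little power, which is one reason ASI screens rely on sizeable cohorts of paired tumor/normal samples.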

Family-based linkage studies, GWAS, aCGH analysis, and ASI mapping each have contributed to our understanding of genes and variants that contribute to CRC risk. However, it is unlikely that any of these methods individually can identify all the variants underlying disease risk. Furthermore, none have been able to satisfactorily identify gene-gene interactions, which likely contribute to the missing heritability for CRC.

1.5 Mouse Models for the Study of Colon Carcinogenesis

Mouse models have immense value in elucidating mechanisms of colon carcinogenesis and characterizing genes that contribute to susceptibility. Mice are particularly useful in laboratory research for their low maintenance cost, short gestation time, ease of breeding and genetic manipulation, and the high degree of genetic homogeneity available in inbred strains.35 Collectively, these features render the mouse an ideal organism for studying genetic susceptibility to complex disease.

Several different mouse strains have been genetically engineered to develop spontaneous intestinal tumors. Researchers have developed various transgenic mouse models with inactivated Apc.35,36 Depending on the deletion engineered in the Apc gene, mice will spontaneously develop fewer than 10 to greater than 300 adenomas throughout the intestinal tract.35 The most recognized Apc mutant mouse model is the ApcMin/+ mouse (Multiple intestinal neoplasia), which carries a nonsense mutation at residue 850 that generates a truncated protein product.37 The ApcMin/+ mouse develops an average of 30 tumors, most of which grow in the small intestine.35 By crossing ApcMin/+ mice with other inbred mouse strains and examining genomic regions that modified the Min mutation by decreasing adenoma formation and increasing longevity, researchers discovered a handful of modifiers, termed Modifier of Min, or Mom loci.38-43

In addition to inactivation of Apc, genetic mouse models have been produced with other tumor suppressors or oncogenes knocked-out or knocked-in. For example, mice with a stable, activated form of β-catenin developed multifocal dysplastic lesions akin to those found in patients with FAP. Lynch syndrome has also been modeled using transgenic mice harboring mutations in mismatch repair genes. However, these mice show variable susceptibility phenotypes and only marginally recapitulate the microsatellite instability observed in Lynch syndrome patients. Other mouse models with mutations in the transforming growth factor beta (Tgfb) pathway and Kras have also been explored for their effects on colon tumor formation.35

As an alternative to generating transgenic knock-in or knock-out mouse models, researchers have also capitalized on the availability of chemical carcinogens as a means to induce intestinal tumors in rodent models. Just as human populations differ in genetic predisposition to CRC, mouse strains of diverse genetic backgrounds exhibit differential sensitivity to carcinogen-induced intestinal cancer.44 For example, the strain STS/A is highly susceptible to dimethylhydrazine- or azoxymethane-induced intestinal adenomas (developing an average of 8-18 tumors per mouse), while Balb/cHeA is resistant (0 tumors per mouse).44 By utilizing cancer-resistant and cancer-susceptible mouse strains in linkage mapping experiments, researchers uncovered a number of quantitative trait loci (QTLs) for colon cancer susceptibility (termed Susceptibility to colon cancer, or Scc, loci). Studies using recombinant congenic strains derived from Balb/cHeA and STS/A parental strains led to the identification of 15 Scc loci.45-50 Other mouse crosses were used to identify the Colon cancer susceptibility loci Ccs1 and Ccs2.51,52 Another research group performed interspecific backcrosses of the tumor-susceptible A/J strain and the tumor-resistant Spretus/EiJ strain to replicate existing Scc loci and to identify 6 novel loci, Scc16-21.53 In total, 23 unique Scc and Ccs QTLs have been mapped across 13 mouse chromosomes by crossing mouse strains susceptible and resistant to chemically induced colon cancer.53 The relatively small modifier effects of these QTLs illustrate the difficulties of identifying allelic combinations contributing to cancer susceptibility in humans. Importantly, however, the genes that underlie susceptibility at these murine QTLs may likewise play a role in human CRC susceptibility.

Linkage studies in mouse models also facilitate the investigation of genetic interactions. Numerous two-way interactions between susceptibility QTLs have been observed in the mouse, suggesting that networks of interacting, low-penetrance alleles modify CRC susceptibility.48-50,53 The identification of genetic interactions is more straightforward in mice due to the high degree of homogeneity among inbred mouse strains and the long stretches of linkage disequilibrium in the mouse crosses used for linkage.54

One colon cancer susceptibility locus, Scc1, has been mapped in the mouse and refined to the candidate gene Ptprj (Figure 4).55 In humans, PTPRJ functions as a tumor suppressor by negatively regulating growth-promoting tyrosine kinases. Single nucleotide polymorphisms in PTPRJ have been associated with susceptibility to CRC in one study.56 Two Scc loci, Scc5 and Scc13, show synergistic interactions with the Ptprj locus to enhance risk (Figure 4).50 The Scc5 locus also demonstrates a reciprocal interaction with Scc4, wherein the risk associated with the allele at Scc5 is dependent on the allele present at Scc4 (Figure 4).48 The genes responsible for risk at these loci are unknown but are intriguing candidates for study in the context of human colon cancer.

Importantly for their utility in studying human cancer genetics, mouse models of chemically induced CRC follow a similar multistage disease progression and develop tumors with many of the same mutations observed in human sporadic tumors.44 This similarity suggests that the genes controlling cancer susceptibility in mice will be relevant to humans.57 Furthermore, important biological sequences like gene coding regions and regulatory elements are conserved in evolution, and sequence information from mice can therefore be used as a filter for identifying genes that are strong candidates for a role in human tumorigenesis. Thus, cross-species analysis of cancer risk provides a powerful approach for uncovering novel susceptibility genes.


1.6 A Cross-Species, Systems Genetics Approach to Uncover Candidate Genes

Susceptibility to complex diseases like CRC is multi-factorial and dependent on gene-gene and gene-environment interactions. There is an increasing appreciation for the need to develop methods that consider the complex etiology of such diseases. Experimental approaches that combine genetic data with molecular and physiological phenotype assessments will expand our knowledge of the genetic contribution to cancer susceptibility. A review by Quigley and Balmain highlights the unique advantages of augmenting linkage and association studies with other data sources, such as gene expression data, analysis of allele-specific somatic changes, and phenotypic assays.58 To uncover novel susceptibility alleles, Quigley and Balmain propose a switch to employing a "systems genetics" approach, as opposed to depending on the diminishing returns of performing larger and larger association studies. They suggest that this integrated approach will better capture the contributions of gene-gene interactions and gene-environment interactions.

While systems genetics can be performed in human and mouse separately, a cross-species strategy is an innovative approach to unearthing the missing heritability for CRC.54 Experimental mouse models of cancer have improved our understanding of how genetic alterations influence tumor development. They have also led to the discovery of cancer susceptibility genes (like Ptprj at the Scc1 locus) whose orthologs contribute to human risk.59 Although specific polymorphisms identified in a mouse model of cancer susceptibility may not be conserved in human populations, it is likely that the genes that contribute to mouse cancer susceptibility serve similar roles in human disease.58 The projects described in this dissertation involve the integration of linkage data from highly controlled mouse colon cancer studies with array comparative genomic hybridization and allele-specific imbalance mapping in human colorectal tumors. This innovative approach has led to the discovery of novel genes and pathways that contribute to colon cancer susceptibility.


Figure 1. Gross anatomy of the colon and rectum. Image borrowed from http://www.cedars-sinai.edu/Patients/Programs-and-Services/Colorectal-Cancer-Center/Services-and-Treatments/Sigmoid-Colectomy.aspx.


Figure 2. A cross-section through the colon. Moving outward from the lumen, the colon wall is composed of the mucosa, the submucosa, the muscularis, and the serosa. Image borrowed from https://gi.jhsps.org/.


[Pie chart. Segment labels as extracted: Sporadic Colorectal Cancer; Cases with Familial Risk; Hereditary Nonpolyposis Colorectal Cancer (Lynch Syndrome); Familial Adenomatous Polyposis (FAP); Hamartomatous Polyposis Syndromes. Segment values as extracted: <1%, ~2-3%, <0.10%, ~66%, 10-30%.]

Figure 3. The subtypes of colorectal cancer. This pie chart describes the contributions of each subtype of colorectal cancer (sporadic, familial, and inherited syndromes) to total incidence of this disease. Image adapted from Burt, RW. Colon cancer screening. Gastroenterology 2000; 119:837-853.


Figure 4. Interacting Susceptibility to colon cancer (Scc) loci Scc1, Scc4, Scc5, and Scc13.48,50 This cartoon depicts mouse cancer susceptibility loci (demarcated by brackets) that statistically enhance risk when inherited in combination. Green arrows indicate synergistic interactions among loci. The red arrow indicates a reciprocal interaction between Scc4 and Scc5. The numbers below the arrows indicate interaction p-values.


Chapter 2: Analysis of Allele-Specific Imbalance at Single Nucleotide Polymorphisms Identified by Genome-Wide Association Studies of Colorectal Cancer

2.1 Introduction

In the 1970s, geneticist Alfred Knudson made pioneering observations of children with inherited and sporadic retinoblastoma. He noticed that children with the inherited form of the disease tended to develop the cancer at an earlier age than did children with de novo disease. Furthermore, their tumors tended to arise bilaterally (in both eyes) and in greater multiplicity. Collectively, these analyses led Knudson to hypothesize that two hits to a tumor suppressor gene (now called RB1) are required to cause retinoblastoma. This two-hit hypothesis, often referred to as the Knudson hypothesis, paved the way for the identification of the first tumor suppressor genes.60

Many early efforts were directed at uncovering other tumor suppressor genes following the principles of Knudson's two-hit hypothesis. While genes do exist that adhere to the two-hit hypothesis and show genomic loss coupled with inactivating mutation, it is now apparent that mechanisms other than biallelic mutation can also work to disrupt these tumor suppressors. These include gene silencing by promoter methylation, haploinsufficiency of the gene product, or inheritance of a weakly functional or completely functional allele of the gene that predisposes to tumor development.61


While tumor suppressor genes are generally lost, silenced, or inactivated by mutations, the opposite holds true for their oncogenic counterparts. Oncogenes promote tumor initiation or progression. Canonically, they encode proteins that control cell proliferation or apoptosis. Activation of oncogenes can occur through mutation, through structural alterations to chromosomes that generate gene fusions or place the oncogene under the transcriptional control of a hyperactive promoter or enhancer element, or through amplification within the genome. An activating mutation or activated copy of an oncogene is frequently selected for gain or amplification in tumors.62

Germline variation in tumor suppressor genes and oncogenes can influence patterns of genomic aberrations in tumors by conferring selective advantages to cells that amplify tumor-promoting alleles or lose tumor-suppressing alleles.55,63-65 If the common disease-common variant hypothesis holds true and genetic risk for cancer is due to the combined inheritance of many common alleles affecting tumor susceptibility or resistance, one may predict that selection pressures during the development of a tumor will produce detectable patterns of allelic loss or gain. Indeed, previous studies in mouse models show evidence that a subset of susceptibility loci for skin and colon cancer demonstrate strain-specific gains or losses consistent with these loci housing tumor-suppressing alleles or tumor-promoting alleles.25,55 Furthermore, allele-specific gains of a specific single nucleotide polymorphism (SNP) in the oncogene AURKA, rs2273535, have been observed in multiple studies of colorectal tumors.64,65 Preferential allelic gains or losses in multiple regions of the genome have been identified in genome-wide screens of individuals with multiple independent primary tumors and in genomic studies of glioblastoma samples via the comparison of germline and somatic genotypes.63,66 These observations support a contribution of genetic background to the pattern of genetic changes occurring in tumors. In light of these findings, the exploration of allele-specific gains or losses in tumors represents a promising strategy for uncovering common alleles that modify cancer risk and for elucidating the relationship between germline variation and the development of sporadic cancer.61

2.2 Objectives and Hypothesis

Numerous genome-wide association studies (GWAS) have uncovered alleles associated with risk for CRC.67-74 The SNP rs6983267 mapping to 8q24 has been associated with both colorectal and prostate cancer risk at a genome-wide significance level.67,75,76 Allele-specific copy number analyses showed that the G allele (the putative risk allele) of rs6983267 shows preferential allelic gains in both colon tumors and myeloid leukemia.26,77 To our knowledge, no other SNPs from published GWAS literature have definitively and reproducibly shown allele-specific imbalance in colorectal tumors. In the study described here, we performed quantitative genotyping of 17 statistically significant variants from published GWAS (including rs6983267) in paired normal and colorectal tumor DNA. The goal of this study was to investigate these SNPs for relative somatic gains of the susceptibility allele or loss of the resistance allele using allelic imbalance analyses. We hypothesized that the allele present at a GWAS risk locus may influence the pattern of genomic alterations occurring in human colorectal tumors, with risk alleles showing changes in relative copy number in tumor DNA compared to matched normal germline DNA.

2.3 Materials and Methods

2.3.1 Human Samples

The Ohio State University (OSU) Institutional Review Board approved this study. All study participants provided written informed consent for use of their tissues in research.

For the discovery sample set, paired normal and tumor blocks were obtained through The Ohio State University Human Tissue Research Network and the Midwest Cooperative Human Tissue Network. Tumors that exhibited microsatellite stability and/or stained positively for the Lynch syndrome proteins MSH2, MLH1, PMS2, and MSH6 by immunohistochemistry (IHC) were prioritized for inclusion in the study. When microsatellite or IHC data were unavailable, tumors that showed characteristics suggestive of Lynch syndrome such as right-sided location, poor differentiation, and a high percentage of mucin were excluded. After selection, confirmation of diagnosis, and DNA extraction, 194 sample pairs were available for study.

A validation set of 296 paired non-tumor/tumor DNA samples was obtained from two existing study collections. Samples from 196 individuals were acquired from a population-based study cohort of incident colon cancer diagnosed in the metropolitan Columbus area.78,79 Blood DNA was available for all cases. An additional 100 fresh frozen paired normal and tumor tissue samples were obtained through the Cooperative Human Tissue Network at The Ohio State University Medical Center. Specimens were snap frozen shortly after surgery and received anonymously along with a full pathology report. The 296 cases were all microsatellite stable; the set of 196 samples was stable by microsatellite instability testing, and the 100 fresh frozen tumors all showed intact mismatch repair proteins by IHC staining.

2.3.2 DNA Extraction

The discovery sample set was derived from blocks of archived formalin-fixed paraffin-embedded (FFPE) tissue containing both tumor and adjacent normal tissue. Hematoxylin and eosin stains were prepared from these tissue blocks. A pathologist analyzed the stained slides to confirm diagnosis and to mark off regions of tumor cells within the tissue blocks. Tissue cores of 1.6 mm diameter were prepared from marked regions consisting of 70% or more tumor cells for collection of tumor DNA, or from regions with normal histology for isolation of normal (non-tumor) DNA.

Genomic DNA was extracted from tissue cores by first removing paraffin via xylene and ethanol washes. Deparaffinized tissues were digested overnight at 55°C in lysis buffer with 3 μg/μl proteinase K. Protein contaminants were removed by separating aqueous and organic phases using phenol:chloroform:isoamyl alcohol 25:24:1 reagent. The nucleic acid-rich aqueous layer was treated with 7.5 M ammonium acetate, ice-cold 100% ethanol, and glycogen to precipitate DNA. Pelleted DNA was washed with 75% ethanol, air dried, and resuspended in nuclease-free water. The DNA isolates were quantified and assessed for purity with a Nanodrop-1000 spectrophotometer.


DNA collection from samples in the validation sample set was performed as follows. Validation set DNAs from the 100 paired normal/tumor samples from the Cooperative Human Tissue Network were isolated from the fresh frozen tissue by the same extraction protocol used for the discovery set samples, as were the tumor DNAs from the Columbus metropolitan area study (excluding the paraffin removal by xylene and ethanol treatment). The normal DNAs from the individuals of the Columbus metropolitan area study were isolated from blood samples in the OSU Human Genetics Sample Bank by standard protocols. Normal DNAs from the three sources (FFPE archived tissue, fresh frozen tissue, and blood) exhibited similar frequencies of heterozygosity and similar A260/A280 ratios, suggesting comparable quality of DNA isolates across sample sources.

2.3.3 Quantitative Genotyping

To test our hypothesis that CRC susceptibility loci would show allele-specific somatic events in tumors, we searched the recent literature to identify variants showing evidence of CRC risk from GWA studies.67,68,71-73,80-85 Seventeen SNPs (rs6691170, rs6687758, rs10936599, rs16892766, rs6983267, rs7014346, rs719725, rs3802842, rs11169552, rs7136702, rs4444235, rs4779584, rs9929218, rs4939827, rs10411210, rs961253, rs4925386) meeting or approaching genome-wide significance (p-value < 10⁻⁷) for colorectal cancer risk in published GWA studies were chosen for analysis of allele-specific imbalance in the discovery set of tumor/normal DNA pairs and are documented in Table 1. Other inclusion criteria for these SNPs included identification in Caucasian populations and a sufficiently high documented minor allele frequency (MAF > 20%) for identification of enough heterozygotes for statistical power. The SNP rs16892766 was the only exception to this criterion, as it has a documented MAF of 7%. The SNP rs4925386 was eliminated after genotyping in the original sample set due to a failure rate greater than 15%.

Multiplexed primers for PCR amplification and allele-specific single base extension reactions were designed using the Sequenom® MassARRAY Assay Design 3.1 software (Appendix A, Table 8). Mass spectrometry-based genotyping of 20 ng of tumor and paired normal DNA was performed using Sequenom® MassARRAY iPLEX Gold (Sequenom Inc., San Diego, CA, USA) according to the manufacturer's protocol. Briefly, multiplexed polymerase chain reaction (PCR) was used to amplify ~100 bp amplicons containing each interrogated SNP. The amplification primers include a 10-mer tag (5'-ACGTTGGATG-3') on the 5' end (Appendix A, Table 8). For each DNA sample, the multiplexed PCR amplification step generated a pool of amplicons, each containing a different SNP to be genotyped. Next, the amplified samples were treated with shrimp alkaline phosphatase (SAP) to neutralize residual unincorporated deoxynucleotides (dNTPs) from the initial amplification step. In the third step of the protocol, SAP-treated products were prepared for a multiplexed, single-base extension PCR. In this step, products underwent a PCR with mass-labeled terminating dideoxynucleotides (ddNTPs) and extension primers designed to hybridize immediately adjacent to the SNP of interest (Appendix A, Table 8). Thus, the terminating ddNTP was added at the SNP position. Finally, resin and water were added to each well of the plate and the iPLEX Gold reaction products were nanodispensed onto a SpectroCHIP array and processed using the MassARRAY mass spectrometer. Spectra were analyzed using TyperAnalyzer software and data were exported to an Excel file. Each 384-well Sequenom® plate included four negative template controls (dH2O), two samples tested in duplicate, and four positive control DNAs.

We assessed the validity of Sequenom® MassARRAY quantitative genotyping as a sensitive means of detecting allelic imbalance by generating log-transformed N-ratios (N-ratio = normal allele 1 peak area/normal allele 2 peak area) for DNA mixtures of known homozygote DNA samples representing 0, 20, 40, 50, 60, 80, and 100% allelic contributions. We did not have appropriate homozygous DNAs for three of the SNPs, so these were not evaluated. The majority of slopes and R-values for these mixtures were very close to those of standard curves for "perfect data," suggesting a high degree of sensitivity for our method of detecting allelic deviations from 50% (Figure 5).
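The mixing experiment can be sketched numerically. The following is a minimal illustration, not the analysis actually run for Figure 5: it assumes a base-10 logarithm, assumes that "perfect data" means peak areas exactly proportional to the mixed allele fractions, and leaves out the 0% and 100% mixtures because their ratio is zero or undefined. The function names and the observed peak areas are hypothetical.

```python
import math

# Allele 1 fractions of the intermediate mixtures (0% and 100% are omitted
# because the ratio allele 1 / allele 2 is zero or undefined there).
MIX_FRACTIONS = [0.20, 0.40, 0.50, 0.60, 0.80]


def expected_log_n_ratio(fraction_allele1):
    """Log10 N-ratio expected if peak areas track the mixed proportions exactly."""
    return math.log10(fraction_allele1 / (1.0 - fraction_allele1))


def observed_log_n_ratio(area_allele1, area_allele2):
    """Log10 of the measured peak-area ratio (allele 1 / allele 2)."""
    return math.log10(area_allele1 / area_allele2)


def slope_and_r(x, y):
    """Least-squares slope and Pearson r of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical peak areas (allele 1, allele 2) for the five mixtures.
hypothetical_areas = [(21, 79), (41, 59), (50, 50), (58, 42), (81, 19)]

expected = [expected_log_n_ratio(f) for f in MIX_FRACTIONS]
observed = [observed_log_n_ratio(a1, a2) for a1, a2 in hypothetical_areas]
slope, r = slope_and_r(expected, observed)
print(f"slope = {slope:.3f}, r = {r:.3f}")
```

Under these assumptions, a slope and an r value both near 1 would indicate that measured peak-area ratios follow the known allelic proportions closely.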

2.3.4 Analysis of Allele-Specific Imbalance

The Sequenom® TyperAnalyzer software quantifies the area under each of the allele peaks and, from the ratio of the two alleles, assigns either a heterozygous or homozygous call to the sample at that SNP. For all SNPs tested, we scored allelic imbalance by calculating the R-ratio for each DNA pair. We defined the R-ratio as the ratio of the two allele peak areas in the normal DNA divided by the ratio of the two allele peak areas in the paired tumor DNA (R-ratio = Normal(allele 1/allele 2) / Tumor(allele 1/allele 2)). Samples were scored as having imbalance, defined as the relative loss of either the first or second allele in the tumor sample, if the R-ratio was greater than 1.5 or less than 0.67, respectively. The R-ratio thresholds used to determine imbalance have been described previously in the literature.63,86-91
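The scoring rule can be expressed compactly in code. This is a minimal sketch of the R-ratio definition and the 1.5/0.67 thresholds given above; the function names, the labels, and the peak areas in the example are hypothetical, and returning None for a zero peak area is an assumption made here so that an undefined ratio is never scored.

```python
LOSS_ALLELE_1 = "relative loss of allele 1"
LOSS_ALLELE_2 = "relative loss of allele 2"
NO_IMBALANCE = "no imbalance"


def r_ratio(normal_a1, normal_a2, tumor_a1, tumor_a2):
    """R-ratio = Normal(allele 1 / allele 2) / Tumor(allele 1 / allele 2).

    Returns None if any peak area is zero or negative, because the ratio
    cannot be calculated (an assumption of this sketch).
    """
    if min(normal_a1, normal_a2, tumor_a1, tumor_a2) <= 0:
        return None
    return (normal_a1 / normal_a2) / (tumor_a1 / tumor_a2)


def classify(ratio, upper=1.5, lower=0.67):
    """Score a tumor/normal pair from its R-ratio using the stated thresholds."""
    if ratio is None:
        return None
    if ratio > upper:
        # Allele 1 is under-represented in the tumor relative to normal.
        return LOSS_ALLELE_1
    if ratio < lower:
        # Allele 2 is under-represented in the tumor relative to normal.
        return LOSS_ALLELE_2
    return NO_IMBALANCE


# Hypothetical peak areas: balanced normal DNA, tumor with half the allele 1 signal.
example = r_ratio(normal_a1=100, normal_a2=100, tumor_a1=50, tumor_a2=100)
print(example, classify(example))
```

Because the two thresholds are approximate reciprocals (1/1.5 ≈ 0.67), the rule treats relative loss of either allele symmetrically.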

A chi-squared test (df = 1) was used to assess the observed imbalances for statistically significant deviation from the expected 50:50 distribution of allelic imbalances, thus indicating preferential, or allele-specific, imbalance at that SNP locus. In cases in which a tumor was heterozygous for a SNP by genotyping but the paired normal sample failed to genotype, an average of the two normal alleles for heterozygous normal samples at that SNP was used in place of the failed normal sample to calculate an R-ratio. SNPs with p-value < 0.10 were considered suggestive of preferential allelic imbalance and were therefore subjected to testing in the validation sample set to rule out false positives. Bonferroni correction was used to adjust for the number of statistical tests (n = 16).
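As an illustration of this test, the sketch below applies a one-degree-of-freedom chi-squared goodness-of-fit test against a 50:50 split, followed by a Bonferroni adjustment. It uses the pooled rs6983267 counts reported later in this chapter (33 tumors with relative loss of the T allele and 15 with relative loss of the G allele) and the three-test correction applied to the pooled analysis. The function names are hypothetical, and the absence of a continuity correction is an assumption, since the text does not state whether one was applied.

```python
import math


def chi_squared_50_50(loss_allele1, loss_allele2):
    """Chi-squared statistic and p-value (df = 1) against an equal split."""
    total = loss_allele1 + loss_allele2
    expected = total / 2.0
    chi2 = ((loss_allele1 - expected) ** 2
            + (loss_allele2 - expected) ** 2) / expected
    # For df = 1, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


chi2, p = chi_squared_50_50(33, 15)
p_adj = bonferroni(p, 3)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, adjusted p = {p_adj:.4f}")
```

Rounded to two decimal places, the unadjusted and adjusted p-values from this sketch agree with the values of 0.01 and 0.03 reported for the pooled analysis.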

In addition to qualitative determination of imbalance, we generated box plots of the distribution of R-ratios for each SNP for samples showing relative loss of allele 1, samples showing relative loss of allele 2, and samples showing no allelic imbalance (Figure 6). Samples were excluded from the plots if they had an R-ratio of greater than 10 or if an R-ratio could not be calculated because one of the two alleles in the tumor sample had an allele peak area value of 0.


2.3.5 Validation Studies

Following statistical analysis of allele-specific imbalance in the discovery sample set, three variants with p-values < 0.1 (rs16892766, rs6983267 and rs7136702) were genotyped by Sequenom® MassARRAY iPLEX Gold in a replication sample set of 296 paired normal/tumor DNAs. The same quantitative genotyping protocol and statistical analyses used for the discovery sample set were employed with the validation sample set. Bonferroni correction was used to adjust for the number of statistical tests (n = 3).

2.3.6 Compilation of Allele-Specific Imbalance Data from Multiple Studies

Allele-specific imbalance analyses have previously been performed on seven of the GWAS SNPs tested in the present study.26,27 These studies employed manual measurement of sequencing chromatogram peaks for tumor and normal DNAs to calculate R-ratios. Both published studies utilized R-ratio cutoff values of <0.60 and >1.67 for allelic imbalance analysis. For both previously published studies, tumor DNA was isolated from fresh frozen colon tumors, and blood was used as the source of normal DNA.26,27 In order to test the seven variants that overlapped with our study, we combined the data from the published studies with our allele-specific imbalance results for rs6983267, rs961253, rs3802842, rs10411210, rs4444235, rs4779584, and rs9929218. We combined our numbers of relative allelic losses with the numbers from the published studies and performed a chi-squared test with Bonferroni correction (n = 7) to determine the statistical significance of the combined imbalances.


2.3.7 Correlation Analysis of Allelic Imbalances and Age, Sex, and Tumor Stage

For each SNP successfully assessed for allelic imbalance, we investigated the association between the presence of allelic imbalance and the patient's age at diagnosis, sex, and tumor stage. A chi-squared test was used to detect association between allelic imbalance and sex. Fisher's exact test was used to detect association between allelic imbalance and tumor stage. For tumor stage, we classified tumors as TNM stage I–IV according to available tumor size, nodal spread, and metastasis information. A two-sample t-test was used to compare the average age of patients whose tumors showed allelic imbalance to that of patients whose tumors maintained heterozygosity. Correlations with corrected p-values < 0.05 were deemed statistically significant.

2.4 Results

2.4.1 Discovery Set Genotyping

To determine if any of the 17 CRC-associated SNPs show evidence of allele-specific imbalance, we genotyped them in 194 normal/tumor DNA pairs. All but one SNP, rs4925386, were successfully genotyped in greater than 85% of samples in the discovery set. Due to a high rate of genotyping failures (24% of samples), rs4925386 was excluded from further analysis. The number of heterozygous normal DNAs identified for each SNP (for which the paired tumor DNA was also successfully genotyped) ranged from 27 to 84 of the 194 samples (14–43% heterozygosity; Table 2). The frequency of overall relative allele loss (for both risk and non-risk alleles combined) ranged from 2% to 44%. While none of the SNPs reached statistical significance for allele-specific imbalance at α = 0.05, three SNPs (rs16892766, rs6983267, rs7136702) showed a trend for allele-specific imbalance (p-values < 0.10) prior to Bonferroni correction for multiple comparisons (n = 16). The SNP rs6983267 showed higher frequencies of relative loss of the non-risk T allele compared to the risk G allele. Interestingly, rs16892766 and rs7136702 both demonstrated higher frequencies of relative loss of the risk allele compared to the non-risk allele in the discovery set tumors. The variants rs16892766, rs6983267 and rs7136702 were prioritized for validation in a second set of samples.

2.4.2 Validation Set Genotyping

The SNPs rs16892766, rs6983267 and rs7136702, which all showed evidence of allele-specific imbalance in the original discovery set, were further tested in the validation sample set of 296 normal/tumor DNA pairs. As with the discovery set, these three SNPs were successfully genotyped in greater than 85% of the validation samples.

Twenty-two percent of the heterozygotes in the validation set demonstrated relative loss of one rs6983267 allele, which was a lower frequency of overall relative allele loss than that observed in the original discovery set (30%; Table 3). Similarly, a lower frequency of heterozygous samples in the validation set showed relative loss of an allele of rs7136702 (11% compared to 23% in the discovery set; Table 3) and of rs16892766 (16% compared to 26% in the discovery set; Table 3). Despite the lower frequency of allelic loss, rs6983267 showed a tendency towards statistically significant preferential allelic imbalance (p-value = 0.06), favoring relative loss of the non-risk T allele and relative retention of the risk G allele in the validation sample set. However, neither rs7136702 nor rs16892766 demonstrated a statistically significant tendency towards preferential allelic imbalance in the validation sample set (p-values = 0.59 and 1.00, respectively).

2.4.3 Combined Genotyping Results from Discovery and Validation Sample Sets

When genotyping data from the discovery set and validation set were combined, 48 of 192 heterozygous samples (25%) showed relative loss of an allele of rs6983267 (Table 3). For the SNP rs7136702, 31 of 208 combined heterozygotes showed relative loss of either allele (15%). When genotypes from the discovery set and validation set were combined for rs16892766, 13 of 65 heterozygotes (20%) showed allelic loss. By pooled analysis, rs6983267 showed strong statistical evidence of preferential allelic imbalance (p-value = 0.01). After Bonferroni correction for multiple comparisons testing (n = 3), rs6983267 maintained a statistically significant adjusted p-value of 0.03. In contrast, both rs16892766 and rs7136702 failed to show any tendency towards significant allele-specific imbalance by combined analysis (unadjusted p-values = 0.17 and 0.37, respectively).

2.4.4 Compilation of Allelic Imbalance Data from Multiple Studies

Because others have published allele-specific imbalance data on seven variants from our study, we decided to perform combined analysis of the present study and the previously published studies to increase the power of identifying SNPs demonstrating allele-specific imbalance.26,27 When the imbalances observed in our samples at the SNPs rs6983267, rs961253, rs3802842, rs10411210, rs4444235, rs4779584, and rs9929218 were combined with those published previously, we observed a highly significant relative loss of the non-risk T allele of rs6983267 (p-value = 2.94 × 10⁻⁵).26,27 After Bonferroni correction (n = 7), the preferential relative loss of the T allele of rs6983267 maintained a highly significant p-value of 2.06 × 10⁻⁴. None of the other variants showed statistically significant evidence of preferential allelic imbalance (Table 4).

2.4.5 Correlation Analysis of Allelic Imbalances and Age, Sex, and Tumor Stage

To test whether samples showing allelic imbalance for the GWAS SNPs had different clinical characteristics compared to samples not showing imbalance, we performed a correlation analysis of imbalance with age, sex and tumor stage using data from our discovery sample set. The presence of allelic imbalance was significantly associated with tumor stage for rs719725 (unadjusted p-value = 0.0098), and significantly associated with younger age for rs7014346 (unadjusted p-value = 0.033). However, after adjusting for multiple comparisons (n = 16), there was no significant association between the presence of allelic imbalance and age, sex, or tumor stage (adjusted p-values > 0.05) for any of the tested SNPs.


2.5 Discussion

In this study, we investigated 16 SNPs previously associated with CRC risk for allele-specific imbalance using the Sequenom® MassARRAY iPLEX Gold genotyping platform. While none of the 16 tested SNPs showed statistically significant evidence (p-value < 0.05) of preferential allelic imbalance in our discovery sample set, the SNP rs6983267 demonstrated a tendency towards statistically significant relative loss of the non-risk T allele and relative retention of the risk G allele in both the original discovery set and the validation sample set (p-values = 0.07 and 0.06, respectively; Tables 3 and 4). This is consistent with previously published reports.26,27,77 Because 8q24 shows frequent gains in CRCs, we propose that the risk G allele of rs6983267 shows preferential allelic gain in tumors compared to the non-risk T allele. Interestingly, despite being in high linkage disequilibrium with rs6983267 at 8q24 (D′ = 0.99)67,71, rs7014346 did not show evidence of preferential allelic imbalance (p-value = 0.53) in the discovery sample set. In the largest previous study to assess allelic imbalance for rs6983267, 466 heterozygous tumors from Finnish CRC patients were successfully evaluated using manual measurement of Sanger sequencing chromatograms and 101 of these heterozygous samples (22%) showed allelic imbalance.26 Among these 101 samples, there were significantly (p-value = 0.0007) more tumors showing relative loss of the T allele (66% of tumors) versus relative loss of the G allele (34% of tumors).

From our discovery and validation sets combined, we evaluated tumors from 192 individuals heterozygous for the rs6983267 variant (Table 3), and 48 (25%) of these heterozygotes showed allelic imbalance. We observed a nearly identical percentage of tumors showing relative loss of the T allele (33 of 48; 69%) versus the G allele (15 of 48; 31%) to that observed by Tuupanen et al.26 Our observation of ASI at rs6983267 was significant even after adjusting for multiple comparisons testing (p-value = 0.03; Table 3). Thus, our data support the observation of preferential allelic imbalance for rs6983267 and validate our experimental method. Furthermore, when we combined our data with that of Tuupanen et al., we observed a highly significant relative loss of the T allele and relative gain of the G allele that withstood multiple comparisons testing (p-value = 2.06×10−4; Table 4).26
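As a worked check of the figures above, the chi-squared statistic (df = 1) for a 50:50 expectation can be written out from the counts in Tables 3 and 4. This is a sketch of the arithmetic only; the Bonferroni factors (3 and 7) are those given in the table footnotes.

```latex
% Combined discovery + validation sets: 33 tumors lost T, 15 lost G (expected 24 each)
\chi^2 = \frac{(33-24)^2}{24} + \frac{(15-24)^2}{24} = 6.75,
\qquad p \approx 0.009, \qquad p_{\mathrm{adj}} = 3p \approx 0.03

% Pooled with Tuupanen et al.: 100 tumors lost T, 49 lost G (expected 74.5 each)
\chi^2 = \frac{(100-74.5)^2}{74.5} + \frac{(49-74.5)^2}{74.5} \approx 17.46,
\qquad p \approx 2.94\times10^{-5}, \qquad p_{\mathrm{adj}} = 7p \approx 2.06\times10^{-4}
```

Both results agree with the reported values after rounding.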

The rs6983267 SNP is an important example of how variants showing allele-specific somatic copy number changes can be characterized for their functional significance in colorectal cancer, and therefore warrants further discussion. In an effort to elucidate the role rs6983267 plays in mediating CRC risk, numerous groups have investigated the functional effects of this intergenic SNP, which lies ~14.5kb upstream from POU5F1B, ~180kb upstream of the long noncoding RNA CARLo-5, and ~335kb upstream from MYC. Because of its relatively close proximity to MYC, an oncogene that is frequently amplified or upregulated in colorectal tumors, researchers have been particularly interested in the effect of rs6983267 on MYC amplification or expression.

Cicek et al. performed genotype-phenotype analyses and concluded that rs6983267 does not modulate CRC risk by effects on c-MYC amplification or by chromosomal fragility.92

Experiments performed in cell culture systems by Tuupanen et al. supported the potential regulatory role of rs6983267, as these researchers demonstrated that this SNP may exert long-range regulatory effects on MYC via differential binding affinity to the Wnt-regulated TCF4.31 Further work is warranted to better elucidate the effect of rs6983267 on MYC expression, and ultimately to assess the potential use of rs6983267 as a biomarker for CRC risk.

Importantly, our finding that the risk G allele of rs6983267 may be selectively retained or gained in colorectal tumors is consistent with previous ASI studies and functional analyses of this SNP. Additionally, these data confirm that allele-specific imbalance does occur for CRC susceptibility loci, albeit at a low frequency. In a recent study, Niittymäki et al. investigated seven low-penetrance CRC susceptibility loci for somatic allelic imbalance.27 The loci-tagging SNPs rs4779584, rs3802842, rs4444235, rs9929218, rs10411210, and rs961253 that were genotyped in our study were among the seven variants tested for allele specific imbalance in the study by Niittymäki and colleagues.27

While none of these SNPs showed evidence of preferential allelic imbalance in the combined analysis with our data, one of these SNPs (rs961253) demonstrated similar allelic imbalance trends as those observed in our discovery sample set, with rs961253 showing more frequent relative loss of the A allele in both studies (Table 4). Rates of heterozygosity and imbalance were very similar between the two studies with the exception of our study showing a higher degree of allelic imbalance for rs4779584. A combined analysis of our data and the data from Niittymäki et al. for the six variants in common did not reveal any SNPs with evidence of allele-specific imbalance.27 A caveat to combining data from the present study with that from published data sets is that the percentage of tumor cells in the samples as well as genotyping methods and R-value cutoffs for determining allelic imbalance differ across studies. Nonetheless, our study reproduces the finding that these six loci-tagging SNPs show no evidence for preferential allelic imbalance in predominantly Caucasian study populations. However, it is important to note that statistical power may be limited in studies which interrogate fewer than 100 heterozygotes for ASI, such as ours and that of Niittymäki et al.27 As such, these studies may have been underpowered to detect allele-specific imbalance and larger cohorts of samples may help reveal whether the role of these variants resembles that of a tumor suppressor or an oncogene.

Although only one of the SNPs tested in the present study showed strong evidence of preferential allelic imbalance, the other SNPs may play a role in germline predisposition for CRC independent of somatic events in the tumor. It has been proposed that these SNPs influence the development of neoplasms but do not affect subsequent somatic neoplastic progression and therefore do not confer a selective advantage when lost or gained once the tumor has already been initiated.27 The functional SNPs at the GWAS-identified loci may influence neoplastic development by modifying gene expression, methylation, or splicing patterns in such a way that selection at the DNA level is not required during tumorigenesis. These SNPs could also act in non-tumor cells, such as stromal or immune cells, to modify cancer risk independently of the cancer cells themselves. Alternatively, somatic copy number changes at risk alleles may occur later in the progression of disease and therefore be a later event in clonal expansion. If this is the case, the fraction of cancer cells containing the genomic alteration may be lower than we have the ability to detect. Once the mechanism by which these variants act to confer risk is better understood, we may be able to deduce which variants are more likely to show selection in tumors.

Inherent limitations in our study design could further mask existing preferential allelic selection. First, it is possible that normal cells were isolated with tumor cells in the tumor tissue cores from which DNA was extracted for analysis. Despite initial selection of regions of the tumor containing 70% or greater tumor cells, some normal DNA contamination of the tumor DNA sample could bias the sample towards showing no imbalance. However, our histological examination of the tissue samples should minimize the possibility of normal DNA contamination. Similarly, our histologically normal samples from FFPE colon tissue may not be truly normal and may contain somatic mutations similar to those in the tumor, which could result in a general "undercalling" of tumors with imbalance. Whenever possible, the normal colon tissue was collected from sites distant from the tumor. Second, we employed conservative data inclusion practices by discounting aggressive genotype calls made by the Sequenom® MassARRAY iPLEX software and by applying R-ratio cutoffs of >1.5 and <0.67 for determination of allelic imbalance. Our rigorous requirements for inclusion of data may limit detection of borderline significant allelic imbalance, particularly in tumor samples containing non-tumor cells. Furthermore, if tumors are heterogeneous for allelic loss, we may not detect imbalances in those samples. Third, our discovery sample set was limited to 194 normal/tumor DNA pairs and may have lacked statistical power for detection of preferential allelic selection in loci showing lower levels of heterozygosity or less frequent genomic aberration. Based on mouse data showing that about 40% of susceptibility loci demonstrate preferential allelic imbalance, we did not expect all SNPs identified through GWA studies to show preferential allelic selection in tumors.25 However, our results are surprising in that only one SNP, rs6983267, showed a trend towards somatic selection in the colon tumors. These results may indicate differences between species, differences between colon and skin tumors, or may be the result of the discussed study limitations.
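The first limitation can be made concrete with an idealized calculation. The sketch below assumes that peak area is proportional to allele copy number, that the normal sample has an allele ratio of exactly 1, and that a fraction f of cells in the tumor core are tumor cells, of which a fraction c have completely lost one copy of allele 1; the symbols f and c are introduced here for illustration only.

```latex
% Allele 1 copies per cell in the tumor core: (1 - fc); allele 2 copies: 1
R = \frac{\text{Normal}(a_1/a_2)}{\text{Tumor}(a_1/a_2)}
  = \frac{1}{(1 - fc)/1} = \frac{1}{1 - fc}

% The imbalance is called only when R > 1.5:
\frac{1}{1 - fc} > 1.5 \;\Longleftrightarrow\; fc > \tfrac{1}{3}

% With f = 0.7 and a fully clonal loss (c = 1): R = 1/0.3 \approx 3.3
% With f = 0.7, a subclonal loss is detected only if c > 1/(3 \times 0.7) \approx 0.48
```

Under these simplifying assumptions, a loss confined to fewer than roughly half of the tumor cells in a core with 70% tumor content would fall below the 1.5 cutoff and be scored as no imbalance.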

2.6 Conclusions

In conclusion, our results suggest that the majority of variants identified as colon cancer susceptibility alleles through GWAS do not exhibit somatic allele-specific imbalance in colon tumors. However, our data confirm previously published results showing allele-specific imbalance for rs6983267. These results indicate that somatic allele-specific imbalance of cancer susceptibility alleles may not be a common phenomenon in colon cancer, but that for a small percentage of loci (1 of 16, or 6%, observed in the present study) somatic selection of specific alleles may be driving tumorigenesis.


Table 1. CRC risk-associated GWAS SNPs assessed for allele-specific imbalance in our samples

*Position by UCSC Genome Browser (Human Feb. 2009 GRCh37/hg19 Assembly). †MAF = Minor Allele Frequency, as listed on NCBI dbSNP. §Odds Ratio from first listed GWAS reference (95% confidence interval). NA = not available. ‡P-value from first listed GWAS reference. ∞SNP positions in modest linkage disequilibrium (D′ = 0.76).73 #SNP positions in modest linkage disequilibrium (D′ = 0.71).73 ΔSNP positions in high linkage disequilibrium (D′ = 0.99).67


GWAS SNP | Position* | Gene(s)/Locus | Genotype | dbSNP MAF† | OR (95% CI)§ | P-value‡ | Reference(s)
--- | --- | --- | --- | --- | --- | --- | ---
rs10411210 | chr19:33,532,300 | RHPN2 | CT | T=0.21 | 0.87 (0.83–0.91) | 4.6×10−9 | 68
rs10936599 | chr3:169,492,101 | MYNN | CT | T=0.30 | 0.93 (0.91–0.96) | 3.39×10−8 | 73
rs11169552 | chr12:51,155,663∞ | ATF1, DIP2B | CT | T=0.24 | 0.92 (0.90–0.95) | 1.89×10−10 | 73
rs16892766 | chr8:117,630,683 | EIF3H | AC | C=0.07 | 1.43 (1.13–1.82) | 3.3×10−18 | 72,85
rs3802842 | chr11:111,171,709 | C11orf93 | AC | C=0.31 | 1.11 (1.08–1.15) | 5.8×10−10 | 71,85
rs4444235 | chr14:54,410,919 | BMP4 | TC | C=0.44 | 1.11 (1.08–1.15) | 8.1×10−10 | 68,85
rs4779584 | chr15:32,994,756 | SCG5, GREM1 | CT | T=0.46 | 1.70 (1.35–2.14) | 4.7×10−7 | 72,85
rs4925386 | chr20:60,921,044 | LAMA5 | CT | T=0.41 | 0.93 (0.91–0.95) | 1.89×10−10 | 73
rs4939827 | chr18:46,453,463 | SMAD7 | CT | T=0.38 | 0.85 (0.81–0.89) | 1.0×10−12 | 72,80,71,85
rs6687758 | chr1:222,164,948# | Intergenic | AG | G=0.22 | 1.09 (1.06–1.12) | 2.27×10−9 | 73
rs6691170 | chr1:222,045,446# | Intergenic | GT | T=0.26 | 1.06 (1.03–1.09) | 9.55×10−10 | 73
rs6983267 | chr8:128,413,305Δ | Intergenic | GT | T=0.44 | 1.21 (1.15–1.27) | 1.27×10−14 | 67,81,72,84,85
rs7014346 | chr8:128,424,792Δ | Intergenic | GA | A=0.33 | 1.19 (1.15–1.23) | 8.6×10−26 | 71
rs7136702 | chr12:50,880,216∞ | LARP4, DIP2B | CT | T=0.46 | 1.06 (1.04–1.08) | 4.02×10−8 | 73
rs719725 | chr9:6,365,683 | Intergenic | AC | C=0.33 | 1.13 (NA) | 4.98×10−5 | 82,81,83
rs961253 | chr20:6,404,281 | BMP2 | CA | A=0.29 | 1.12 (1.08–1.16) | 2.0×10−10 | 68,85
rs9929218 | chr16:68,820,946 | CDH1 | GA | A=0.25 | 0.91 (0.89–0.94) | 1.2×10−8 | 68,85


Table 2. Analysis of allele-specific imbalance in discovery sample set

*Risk Allele Lost refers to relative loss of the risk allele compared to the non-risk allele. Number in parentheses indicates percentage of total heterozygous samples showing relative loss of risk allele. †Non-risk Allele Lost refers to relative loss of the non-risk allele compared to the risk allele. Number in parentheses indicates percentage of total heterozygous samples showing relative loss of non-risk allele. §Total number of tumors with imbalance/total heterozygous samples (% of heterozygotes showing imbalance). ‡Chi-squared statistical test, df = 1. Unadjusted for multiple comparisons.


SNP | Risk Allele | Non-risk Allele | Risk Allele Lost* | Non-risk Allele Lost† | Total Imbalance§ | Unadjusted P-value‡
--- | --- | --- | --- | --- | --- | ---
rs16892766 | C | A | 6 (22%) | 1 (4%) | 7/27 (26%) | 0.06
rs6983267 | G | T | 6 (9%) | 14 (21%) | 20/67 (30%) | 0.07
rs7136702 | T | C | 12 (16%) | 5 (7%) | 17/75 (23%) | 0.09
rs10936599 | C | T | 6 (12%) | 2 (4%) | 8/49 (16%) | 0.16
rs3802842 | C | A | 9 (12%) | 4 (5%) | 13/75 (17%) | 0.17
rs961253 | A | C | 11 (15%) | 6 (8%) | 17/71 (24%) | 0.23
rs6687758 | G | A | 8 (15%) | 4 (7%) | 12/55 (22%) | 0.25
rs4779584 | T | C | 9 (17%) | 14 (26%) | 23/53 (43%) | 0.3
rs4939827 | T | C | 19 (24%) | 15 (19%) | 34/78 (44%) | 0.49
rs7014346 | A | G | 6 (7%) | 4 (5%) | 10/82 (12%) | 0.53
rs9929218 | G | A | 2 (3%) | 1 (2%) | 3/62 (5%) | 0.56
rs10411210 | C | T | 4 (11%) | 3 (9%) | 7/35 (20%) | 0.71
rs4444235 | C | T | 8 (10%) | 7 (9%) | 15/81 (19%) | 0.8
rs719725 | A | C | 8 (12%) | 9 (14%) | 17/65 (26%) | 0.81
rs6691170 | T | G | 5 (6%) | 5 (6%) | 10/79 (13%) | 1
rs11169552 | C | T | 1 (1%) | 1 (1%) | 2/84 (2%) | 1


Table 3. Analysis of allele-specific imbalance in discovery, validation, and combined sample sets

*Risk Allele Lost refers to relative loss of risk allele compared to non-risk allele. Number in parentheses indicates percentage of total heterozygous samples showing relative loss of risk allele. †Non-risk Allele Lost refers to relative loss of non-risk allele compared to risk allele. Number in parentheses indicates percentage of total heterozygous samples showing relative loss of non-risk allele. §Total number of tumors with imbalance/total heterozygous samples (% of heterozygotes showing imbalance). ‡Chi-squared statistical test, df = 1. ∞Bonferroni correction for 16 multiple comparisons (original) or 3 multiple comparisons (validation, combined).


SNP | Risk Allele | Non-risk Allele | Sample Set | Risk Allele Lost* | Non-risk Allele Lost† | Total Imbalance§ | P-value‡ | Adjusted P-value∞
--- | --- | --- | --- | --- | --- | --- | --- | ---
rs7136702 | T | C | Discovery | 12 (16%) | 5 (7%) | 17/75 (23%) | 0.09 | 1
rs7136702 | T | C | Validation | 6 (5%) | 8 (6%) | 14/133 (11%) | 0.59 | 1
rs7136702 | T | C | Combined | 18 (9%) | 13 (6%) | 31/208 (15%) | 0.37 | 1
rs16892766 | C | A | Discovery | 6 (22%) | 1 (4%) | 7/27 (26%) | 0.06 | 0.96
rs16892766 | C | A | Validation | 3 (8%) | 3 (8%) | 6/38 (16%) | 1 | 1
rs16892766 | C | A | Combined | 9 (14%) | 4 (6%) | 13/65 (20%) | 0.17 | 0.51
rs6983267 | G | T | Discovery | 6 (9%) | 14 (21%) | 20/67 (30%) | 0.07 | 1
rs6983267 | G | T | Validation | 9 (7%) | 19 (15%) | 28/125 (22%) | 0.06 | 0.18
rs6983267 | G | T | Combined | 15 (8%) | 33 (17%) | 48/192 (25%) | 0.01 | 0.03


Table 4. Combined analysis with published allele-specific imbalance studies

*Allelic imbalance data from Tuupanen et al.26 †Allelic imbalance data from Niittymäki et al.27 §Percentage indicates proportion of heterozygotes with allelic imbalance. ‡Chi-squared statistical test, df = 1. Unadjusted for multiple comparisons. ∞Bonferroni correction for 7 multiple comparisons.


SNP | Risk Allele | Non-risk Allele | Risk Allele Lost/Total Hets§, Tuupanen et al.* or Niittymäki et al.† | Risk Allele Lost/Total Hets§, Gerber et al. | Total Risk Allele Lost | Non-risk Allele Lost/Total Hets§, Tuupanen et al.* or Niittymäki et al.† | Non-risk Allele Lost/Total Hets§, Gerber et al. | Total Non-risk Allele Lost | P-value‡ | Adjusted P-value∞
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
rs6983267 | G | T | 34/466 (7%)* | 15/192 (8%) | 49 | 67/466 (14%)* | 33/192 (17%) | 100 | 2.94×10−5 | 2.06×10−4
rs961253 | A | C | 16/88 (18%)† | 11/71 (15%) | 27 | 11/88 (13%)† | 6/71 (8%) | 17 | 0.13 | 0.92
rs3802842 | C | A | 5/89 (6%)† | 9/75 (12%) | 14 | 4/89 (4%)† | 4/75 (5%) | 8 | 0.2 | 1
rs10411210 | C | T | 5/174 (3%)† | 4/35 (11%) | 9 | 10/174 (6%)† | 3/35 (9%) | 13 | 0.39 | 1
rs4444235 | C | T | 10/90 (11%)† | 8/81 (10%) | 18 | 7/90 (8%)† | 7/81 (9%) | 14 | 0.48 | 1
rs4779584 | T | C | 9/87 (10%)† | 9/53 (17%) | 18 | 8/87 (9%)† | 14/53 (26%) | 22 | 0.53 | 1
rs9929218 | G | A | 4/90 (4%)† | 2/62 (3%) | 6 | 6/90 (7%)† | 1/62 (2%) | 7 | 0.78 | 1


Figure 5. Generation of standard curves for detecting allelic imbalance

We assessed the validity of Sequenom® MassARRAY quantitative genotyping as a sensitive means of detecting allelic imbalance by generating log-transformed N-ratios (N-ratio = normal allele 1 peak area/normal allele 2 peak area) for DNA mixtures of known homozygote DNA samples representing 0, 20, 40, 50, 60, 80, and 100% allelic contributions. We did not have appropriate homozygous DNAs for rs11169552, rs4925386, and rs6691170, so these were not evaluated. The majority of slopes and R-values for these were very close to standard curves for "perfect data" (R2 = 1), suggesting a high degree of sensitivity for our method of detecting allelic deviations from 50%.
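A standard curve of this kind can be summarized by an ordinary least-squares line and its R2. The following is a minimal sketch, not the procedure used to produce the figure; the function name `fit_line` and the example ln(N-ratio) values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical standard-curve data: % of one allele in the DNA mixture and the
# measured ln(N-ratio). These numbers are illustrative, not from the study.
percent_allele = [20, 40, 50, 60, 80]
ln_n_ratio = [1.02, 0.36, 0.01, -0.33, -0.98]

slope, intercept, r_squared = fit_line(percent_allele, ln_n_ratio)
print(f"y = {slope:.4f}x + {intercept:.4f}, R2 = {r_squared:.4f}")
```

An R2 close to 1 indicates that the measured peak-area ratios track the known allelic proportions closely across the mixture series.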

[Figure 5 panels: one scatter plot per SNP of ln(N-ratio) (y-axis) against the percentage of the indicated allele in the DNA mixture (x-axis), each with a fitted regression line. The fitted lines and R2 values shown in the panels are listed below.]

SNP | Fitted line | R2 | x-axis
--- | --- | --- | ---
rs10411210 | y = -0.0536x + 4.389 | 0.9508 | % T allele
rs10936599 | y = -0.0391x + 1.9347 | 0.9825 | % T allele
rs16892766 | y = -0.0466x + 1.0667 | 0.9859 | % A allele
rs3802842 | y = -0.0261x + 1.8949 | 0.9978 | % A allele
rs4444235 | y = -0.0272x + 1.9052 | 0.9999 | % T allele
rs4779584 | y = -0.0419x + 1.7211 | 0.9977 | % T allele
rs4939827 | y = -0.0518x + 1.4993 | 0.9815 | % T allele
rs6687758 | y = -0.0419x + 0.4404 | 0.9753 | % A allele
rs6983267 | y = -0.0322x + 1.4757 | 0.979 | % T allele
rs7014346 | y = -0.0323x + 1.2967 | 0.9929 | % G allele
rs7136702 | y = -0.0391x + 1.4637 | 0.9954 | % T allele
rs719725 | y = -0.0444x + 2.1488 | 0.7811 | % A allele
rs961253 | y = -0.0475x + 2.3465 | 0.956 | % A allele

Figure 6. Box plots of the distribution of R-ratios for the GWAS SNPs

We generated box plots of the distribution of R-ratios for each SNP for samples showing relative loss of allele 1, samples showing relative loss of allele 2, and samples showing no allelic imbalance. Samples were excluded from the plots if they had an R-ratio of greater than 10 or if an R-ratio could not be calculated because one of the two alleles in the tumor sample had an allele peak area value of 0.

[Figure 6 panels: one box plot per SNP of the corrected R-ratio (y-axis) for three sample groups (x-axis): loss of the first listed allele, loss of the second listed allele, and no imbalance. Panels and their allele groups: rs10411210 (C, T); rs10936599 (C, T); rs11169552 (C, T); rs16892766 (A, C); rs3802842 (A, C); rs4444235 (C, T); rs4779584 (C, T); rs4939827 (C, T); rs6687758 (A, G); rs6691170 (G, T); rs6983267 (G, T); rs7014346 (A, G); rs7136702 (C, T); rs719725 (A, C); rs961253 (A, C); rs9929218 (A, G).]

Chapter 3: Allele-Specific Imbalance Mapping of Human SCC4, SCC5, and SCC13 in Colorectal Tumors

3.1 Introduction

Genetic and epigenetic alterations drive the transformation of normal colon epithelium into adenomas and ultimately malignant adenocarcinomas.93 Phenomena such as microsatellite instability (MSI), CpG island methylator phenotype (CIMP), and chromosomal instability (CIN) lead to genomic alterations and thus promote tumorigenesis.94 MSI leads to an accelerated mutation rate due to defective DNA mismatch repair and occurs in approximately 12% of CRCs.95 CIMP occurs in the presence of aberrant DNA hypermethylation, which can lead to epigenetic silencing of gene promoters containing CpG islands.96 CIN is marked by an accelerated rate of gains or losses of whole or partial chromosomes, leading to aneuploidy. Estimates suggest that as many as 80 to 85 percent of sporadic colorectal cancers demonstrate CIN.94 In CRC, CIN is typically coupled with mutational activation of proto-oncogenes like KRAS, inactivation of tumor suppressor genes like APC and TP53, and loss of heterozygosity at 18q.94 By altering the ploidy of cells and changing gene dosage, CIN can lead to activation of oncogenes and inactivation of suppressor genes.

Array comparative genomic hybridization (aCGH) studies can be used to identify refined regions of somatic copy number alterations in human colorectal tumors. Such studies have led to the identification of oncogenes and tumor suppressor genes that play a role in the development and progression of this cancer.19,20 One important limitation of this approach is the inability to detect allele-specific copy number alterations. Allelic imbalance arises when there is complete loss of one allele or copy number gain of one allele relative to the other by a mechanism such as CIN.22 This phenomenon can be detected by comparing the proportion of one allele to the other in cells from an individual who is constitutively heterozygous at that locus. In some loci showing allelic imbalance, one allele may exhibit preferential copy number gain or loss compared to the other allele. This preferential, or allele-specific, imbalance (ASI) may indicate the presence of a susceptibility allele or risk allele at that genomic locus. ASI has been observed among 40% of mouse skin cancer susceptibility loci and at human variants associated with colon cancer risk by genome-wide association studies (GWAS).25-28 Technologies such as next-generation sequencing and quantitative genotyping permit identification of relative gains or losses of alleles in tumor DNA samples compared to germline DNA from the same individual.

Mouse models of chemically induced CRC follow a similar multistage disease progression and develop tumors with many of the same mutations observed in human CIN tumors.44 This similarity suggests that the genes controlling cancer susceptibility in mice will be relevant to humans.57 Linkage studies in mouse models have uncovered numerous susceptibility loci for colon cancer. One of these loci, Scc1, maps to mouse chromosome 2. The candidate susceptibility gene Ptprj was discovered after fine mapping Scc1 by genotyping additional polymorphic markers.55 In humans, PTPRJ functions as a tumor suppressor by negatively regulating growth-promoting receptor tyrosine kinases. Single nucleotide polymorphisms (SNPs) in PTPRJ have been associated with susceptibility to CRC in one study.56 Two additional murine Scc loci, Scc5 and Scc13, show synergistic interactions with the Ptprj locus to enhance risk.50 The Scc5 locus also demonstrates a reciprocal interaction with Scc4, wherein the risk associated with the allele at Scc5 is dependent on the allele present at Scc4.48 Importantly, the genes responsible for risk at these loci are unknown but are intriguing candidates for study in the context of human colon cancer.

3.2 Objectives and Hypothesis

In the present study, we aimed to identify candidate CRC susceptibility genes and variants within the interacting Scc4, Scc5, and Scc13 loci and their human orthologs. We employed a cross-species approach in which we integrated RNA-seq colon transcriptome data from the parental mouse lines used in the linkage studies with existing data from aCGH analysis and ASI mapping of human colon tumors.54 Here, we describe the results of using this approach to identify potential candidate genes for CRC susceptibility.

3.3 Materials and Methods

3.3.1 Mouse Samples

All studies were approved by The Ohio State University Institutional Animal Care and Use Committee. Snap-frozen large intestines from 4-5 week old age-matched female Balb/cHeA and female STS/A mice were obtained from the Netherlands Cancer Institute. Colon specimens were homogenized in 1 mL Ribozol (Amresco, Solon, OH, USA) using medium power for 15-second pulses, followed by incubation on ice. Total RNA was isolated according to the Ribozol manufacturer's protocol. The quantity and quality of the RNA isolates were assessed by NanoDrop-1000 spectrophotometer and by Agilent Bioanalyzer. The RNA samples used for library preparation had an RNA integrity number greater than 8 to ensure high-quality RNA-seq reads.

3.3.2 RNA-Seq and Analysis of Next-Generation Sequencing Data

The Illumina TruSeq RNA sample preparation kit was used to generate mRNA libraries from 2 µg of total RNA isolated from the Balb/cHeA and STS/A colon specimens according to the manufacturer's recommended guidelines. Briefly, total RNA was selected for poly-A mRNA using poly-T oligo-attached magnetic beads. Poly-A-selected mRNA was next fragmented and primed for cDNA synthesis, reverse transcribed into first strand cDNA using SuperScript II reverse transcriptase (Life Technologies, Carlsbad, CA, USA), and subsequently synthesized into double-stranded cDNA using DNA polymerase I and RNase H supplied in the kit. Double-stranded cDNA fragments were subjected to an end repair process, the addition of a single 'A' base, and ligation of adapters. Finally, double-stranded cDNA was size selected, multiplexed and sequenced on an Illumina Genome Analyzer IIX.

Partek Genomic Suite (Partek, Inc., St. Louis, MO, USA) was used for performing differential expression analysis, alternative-splicing analysis and SNP calling of the RNA-seq libraries. Each short sequence read was mapped to the corresponding position of the mouse genome (mm9 assembly) and reads not meeting the quality threshold were discarded. When performing differential expression analysis on the transcript level, the log likelihood ratio for each transcript was calculated among the samples using the number of reads that mapped to the transcripts according to the recommended RNA-seq analysis procedure by Partek (http://www.partek.com/Tutorials/microarray/User_Guides/RNASEQ.pdf). Next, p-values for each transcript were calculated via a chi-squared test. When performing the quantification, a contingency table with two rows representing the samples and as many columns as the number of isoforms for each gene was created. Each entry in the table was estimated using an expectation/maximization algorithm. Then p-values were calculated by performing chi-squared statistics on the contingency table using the log likelihood ratio.
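The contingency-table step can be sketched as a generic log-likelihood ratio (G) test on a two-row table of isoform read counts. This is an illustration under stated assumptions, not Partek's implementation: the expectation/maximization step that assigns reads to isoforms is not reproduced, the function names `chi2_sf` and `g_test` are hypothetical, and the example counts are invented.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    z = x / 2.0
    # Closed-form starting points: Q(1/2, z) = erfc(sqrt(z)); Q(1, z) = exp(-z).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def g_test(table):
    """G-test of independence for a table with one row per sample and one
    column per isoform. Returns (G statistic, degrees of freedom, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with zero reads contribute nothing
                expected = row_totals[i] * col_totals[j] / total
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    df = (len(table) - 1) * (len(col_totals) - 1)
    return g, df, chi2_sf(g, df)

# Hypothetical read counts for three isoforms of one gene in two samples.
counts = [[120, 30, 50],
          [60, 90, 50]]
g_stat, dof, p_value = g_test(counts)
print(f"G = {g_stat:.2f}, df = {dof}, p = {p_value:.3g}")
```

With two samples and k isoforms, the test has k − 1 degrees of freedom, so a gene with a single annotated isoform cannot be tested for differential isoform usage in this way.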

On some occasions in the RNA-seq data, a base might not match the reference at the given position due to read errors, alignment errors or the presence of SNPs. Read errors were eliminated using the quality scores obtained by the sequencer. However, special care was taken when SNPs were identified and these were differentiated from potential alignment errors per Partek Genomic Suite recommendations (http://www.partek.com/Tutorials/microarray/User_Guides/NGS_Genotype_Likelihoods.pdf). For a possible SNP position, the likelihood of each genotype was calculated using the frequency of the bases. Finally, the log-odds ratios were calculated and reported for the genotype with the maximum likelihood ratio.
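The idea of scoring genotypes from base frequencies can be illustrated with a simple binomial model. This sketch is an assumption made for illustration and is not Partek's documented algorithm: the per-base error rate, the three-genotype model, and the function names are all hypothetical.

```python
import math

def genotype_log_likelihoods(ref_count, alt_count, error_rate=0.01):
    """log10 likelihood of the observed base counts under each diploid genotype.

    Assumes each read shows the alternate base with probability error_rate
    (ref/ref), 0.5 (ref/alt), or 1 - error_rate (alt/alt). The binomial
    coefficient is omitted because it is identical for all three genotypes.
    """
    p_alt = {"ref/ref": error_rate, "ref/alt": 0.5, "alt/alt": 1.0 - error_rate}
    return {
        genotype: ref_count * math.log10(1.0 - p) + alt_count * math.log10(p)
        for genotype, p in p_alt.items()
    }

def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the most likely genotype and its log10 odds over the runner-up."""
    scores = genotype_log_likelihoods(ref_count, alt_count, error_rate)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, scores[best] - scores[runner_up]

# Hypothetical position covered by 20 reads, 9 of which show the alternate base.
genotype, log_odds = call_genotype(ref_count=11, alt_count=9)
print(genotype, round(log_odds, 2))
```

A large log-odds value means the best genotype is much better supported than the next candidate, which is the sense in which a confident call can be separated from a likely alignment or read error.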


3.3.3 Human Samples

The Institutional Review Board at The Ohio State University approved these studies. Study participants provided written informed consent for use of their tissues in research. Samples and their preparation have been previously described.28 Samples, all from Ohio CRC cases, consisted of a discovery set of 194 colon tumor/normal DNA pairs and a validation set of 296 colon tumor/normal DNA pairs. A board-certified pathologist confirmed all colon cancer diagnoses. To enrich for tumors showing CIN, samples were excluded if they exhibited features of Lynch syndrome or microsatellite instability. Specifically, samples were excluded if they lacked expression of any of the mismatch repair proteins, demonstrated microsatellite instability, were from the proximal (right) side of the colon, or were mucinous.

Sources of DNA for this study included formalin-fixed paraffin-embedded (FFPE) tissue blocks, flash-frozen colon tissue, and blood samples. DNA isolation from FFPE tissues was performed by extraction of tissue from paraffin by xylene and ethanol washes, digestion by proteinase K treatment in lysis buffer, and purification by phenol/chloroform extraction and ethanol precipitation. Flash-frozen colon tissue was processed in a similar fashion as the FFPE tissues, although the xylene and ethanol washes were omitted from the DNA isolation protocol. Blood-derived DNA samples were acquired from a human sample bank where they were isolated by standard protocols. All DNA samples were assessed with a NanoDrop-1000 spectrophotometer for quantity and quality. Between 10 and 20 ng of DNA were used for each genotyping reaction.


3.3.4 Choice of Tagging SNPs for study

SNPs tagging for linkage disequilibrium blocks at an R-squared cutoff of 0.80 within candidate genes were selected for inclusion in this study. The International HapMap Project "Annotate Tag SNP Picker" tool was used to select tagging SNPs. Haplotype tagging SNPs with a minor allele frequency of at least 20% in Caucasian individuals were prioritized for inclusion in the study, although in some cases a lower MAF was tolerated. In total, 525 SNPs were assessed in the discovery sample set of human normal/tumor DNA pairs.

3.3.5 Sequenom® Quantitative Genotyping and R-ratio calculations

Multiplexed primers for PCR amplification and allele-specific single-base extension reactions were designed using the Sequenom® MassARRAY Assay Design 3.1 software (Appendix B, Table 9). Mass spectrometry-based genotyping of paired tumor and normal DNA was performed using Sequenom® MassARRAY iPlex Gold (Sequenom Inc., San Diego, CA, USA) according to the manufacturer's protocol and as described previously and in Chapter 2 of this document.28 Each 384-well Sequenom® plate included four negative template controls (dH2O), two samples tested in duplicate, and four positive control DNAs.

As described previously, for all SNPs tested we scored preferential allelic imbalance by calculating the R-ratio for each normal/tumor DNA pair.28,34,63 The R-ratio represents the ratio of the two allele peak areas measured by the Sequenom® MassARRAY iPLEX software in the normal heterozygous DNA divided by the ratio of the two allele peak areas in the paired tumor DNA (R-ratio = Normal(allele 1/allele 2) / Tumor(allele 1/allele 2)). For pairs in which the tumor was heterozygous for a SNP but the normal DNA from that individual failed to genotype, an average of the two normal alleles from all heterozygous normal samples at that SNP was used in place of the failed normal sample to calculate an R-ratio. Samples with an R-ratio greater than 1.5 were deemed to have relative loss of the first allele (allele 1), while samples with an R-ratio less than 0.67 were classified as showing relative loss of the second allele (allele 2).
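The R-ratio definition and cutoffs above translate directly into a small calculation. This is a minimal sketch; the function names and the example peak areas are hypothetical, and returning `None` when any peak area is zero is an assumption made here to mirror the exclusion of samples for which no R-ratio could be calculated.

```python
def r_ratio(normal_a1, normal_a2, tumor_a1, tumor_a2):
    """R-ratio = Normal(allele 1/allele 2) / Tumor(allele 1/allele 2).

    Arguments are allele peak areas. Returns None when any peak area is
    zero or negative, because the ratio is then undefined.
    """
    if min(normal_a1, normal_a2, tumor_a1, tumor_a2) <= 0:
        return None
    return (normal_a1 / normal_a2) / (tumor_a1 / tumor_a2)

def classify_imbalance(r, upper=1.5, lower=0.67):
    """Apply the cutoffs from the text to an R-ratio."""
    if r is None:
        return "not scored"
    if r > upper:
        return "relative loss of allele 1"
    if r < lower:
        return "relative loss of allele 2"
    return "no imbalance"

# Hypothetical peak areas: the tumor shows half the allele 1 signal of the normal.
r = r_ratio(normal_a1=100.0, normal_a2=100.0, tumor_a1=50.0, tumor_a2=100.0)
print(r, classify_imbalance(r))
```

Because the two cutoffs are approximately reciprocal (1/1.5 ≈ 0.67), the classification is nearly symmetric with respect to which allele is labeled allele 1.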

3.3.6 Analysis of Allele-Specific Imbalance

The number of tumor samples from heterozygous individuals that showed relative loss of allele 1 ("allele 1 imbalance") was compared to the number of tumor samples showing relative loss of allele 2 ("allele 2 imbalance"). A chi-squared test (df = 1) was used to assess the observed imbalances for statistically significant deviation from the expected 50:50 distribution of random allelic imbalances. SNPs with p-value < 0.10 were considered suggestive of preferential allelic imbalance and were therefore subjected to testing in the validation sample set to rule out false positives.
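The test described above is a one-degree-of-freedom goodness-of-fit test against a 50:50 split. The sketch below uses only the Python standard library; the function name `asi_chi2` is hypothetical, and the example counts are the discovery-set values for rs6983267 from Table 2 of Chapter 2.

```python
import math

def asi_chi2(loss_allele1, loss_allele2):
    """Chi-squared goodness-of-fit test (df = 1) against a 50:50 expectation.

    Returns (chi-squared statistic, p-value). For one degree of freedom the
    upper-tail probability equals erfc(sqrt(chi2 / 2)).
    """
    n = loss_allele1 + loss_allele2
    if n == 0:
        raise ValueError("no samples with allelic imbalance")
    expected = n / 2.0
    chi2 = ((loss_allele1 - expected) ** 2 + (loss_allele2 - expected) ** 2) / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# rs6983267, discovery set: 6 tumors lost the risk allele, 14 lost the non-risk allele.
chi2, p = asi_chi2(6, 14)
suggestive = p < 0.10  # screening threshold stated in the text
print(f"chi2 = {chi2:.2f}, p = {p:.2f}, suggestive = {suggestive}")
```

Only samples that show imbalance enter the test, so the statistic reflects the direction of imbalance rather than its overall frequency among heterozygotes.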

3.3.7 Validation Studies

Following statistical analysis of allele-specific imbalance in the discovery sample set, 103 variants with p-values < 0.10 were genotyped by Sequenom® MassARRAY iPlex Gold in a replication sample set of 296 paired normal/tumor DNA isolates. The same quantitative genotyping protocol and statistical analyses used for the discovery sample set were employed with the validation sample set. Allele imbalance counts were combined for the discovery and validation sets and chi-squared analysis was conducted on the sum. Bonferroni correction was used to adjust for the number of statistical tests (n = 103).
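The combined analysis and Bonferroni adjustment can be sketched as follows, assuming the adjustment is a simple multiplication of the nominal p-value by the number of tests, capped at 1. The helper names are illustrative; the counts are the rs1919935 values reported in Table 6.

```python
import math

N_TESTS = 103  # number of SNPs carried into validation, per the text


def asi_p_value(n_allele1, n_allele2):
    """Chi-squared (df = 1) p-value for deviation from a 50:50 split."""
    expected = (n_allele1 + n_allele2) / 2.0
    chi2 = ((n_allele1 - expected) ** 2 + (n_allele2 - expected) ** 2) / expected
    return math.erfc(math.sqrt(chi2 / 2.0))


def bonferroni(p, n_tests=N_TESTS):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# (allele 1 imbalance, allele 2 imbalance) counts for rs1919935 from Table 6.
discovery = (18, 6)
validation = (22, 4)

# Counts are summed across sets before testing, as described above.
combined = (discovery[0] + validation[0], discovery[1] + validation[1])
p_combined = asi_p_value(*combined)
p_adjusted = bonferroni(p_combined)
print(f"combined counts = {combined}")
print(f"nominal p = {p_combined:.5f}, adjusted p = {p_adjusted:.5f}")
```

Under these assumptions the adjusted value agrees with the Bonferroni-corrected p-value listed for this SNP in Table 6 to the reported precision.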

3.4 Results

3.4.1 RNA-Seq Analysis of CRC-Sensitive and CRC-Resistant Mouse Strains

The inbred mouse strains Balb/cHeA and STS/A differ in susceptibility to colon cancer when treated with the carcinogens 1,2-dimethyl-hydrazine (DMH) or azoxymethane (AOM).44,48 Balb/cHeA mice, the resistant strain, are reported to develop an average of 0.8-1.3 tumors per mouse while STS/A mice, the susceptible strain, develop an average of 8-18.4 tumors per mouse following DMH or AOM treatment, respectively.45,48,97,98 Linkage analysis performed using recombinant inbred strains of

Balb/cHeA and STS/A mice led to the identification of numerous quantitative trait loci

(termed Susceptibility to colon cancer, or Scc, loci) that are linked to colon tumor formation.46,48,49,55 As a screening method to identify candidate CRC susceptibility genes within our loci of interest (Scc4, Scc5 and Scc13), we performed RNA-seq on normal colon tissue from one female mouse per strain.

To identify SNP, expression, and splicing pattern differences between the strains, the RNA-seq data for STS/A and Balb/cHeA were analyzed for the genes mapping within the three Scc loci of interest. Among the 119 genes within Scc4, 74 transcripts representing 55 genes showed different expression levels. At Scc5, 64 transcripts from 54 of the 137 annotated genes showed differential expression. Of the 185 genes at Scc13, 40 transcripts from 32 different genes showed differential expression. After eliminating genes that showed less than a 1.5-fold difference in expression and those that had very low expression (< 20 total reads in both strains), 95 genes expressed in the colon exhibited differential expression between the strains (Appendix B, Table 10).
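The two filtering criteria can be expressed as a short sketch. It assumes that "< 20 total reads in both strains" refers to the summed read count across the two strains; the text could also be read as requiring fewer than 20 reads in each strain separately. The gene labels and read counts are hypothetical.

```python
def passes_filter(reads_a, reads_b, min_fold=1.5, min_total_reads=20):
    """Return True if a gene survives both filters described in the text.

    Assumption: the low-expression filter is applied to the summed read
    count across both strains.
    """
    if reads_a + reads_b < min_total_reads:
        return False  # very low expression
    low, high = sorted((reads_a, reads_b))
    if low == 0:
        return True  # detected in only one strain; fold change is unbounded
    return high / low >= min_fold


# Hypothetical read counts as (Balb/cHeA, STS/A); not data from the study.
genes = {
    "geneA": (100, 40),  # 2.5-fold difference, well expressed
    "geneB": (100, 80),  # 1.25-fold difference
    "geneC": (12, 4),    # 3-fold difference but only 16 reads in total
}
kept = [name for name, (a, b) in genes.items() if passes_filter(a, b)]
print(kept)
```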

In addition to expression differences, numerous coding SNPs were identified between Balb/cHeA and STS/A (Appendix B, Table 11). Non-synonymous variants were assessed for predicted disruption to protein structure and function using the in silico tools

SIFT and PolyPhen-2. None of the amino-acid changing SNPs that differed between

Balb/cHeA and STS/A were predicted to be damaging to protein structure or function.

These findings do not rule out the possibility that the amino acid substitutions could have a moderate effect on protein function that could lead to changes in cancer susceptibility.

3.4.2 Array Comparative Genomic Hybridization Analysis of Human Colorectal Tumors

To determine to what extent the human orthologous loci to Scc4, Scc5, and Scc13 show copy number gains or losses in tumors (suggesting that these loci could be informative in ASI studies), we evaluated copy number data for the orthologous loci to the mouse Scc regions from published human aCGH studies as well as from our own aCGH studies.99-102 As the exact coordinates of the aCGH data were not available for most of the published datasets, we looked at either whole chromosome arms or chromosome bands depending on the detail of the publicly available data. We observed a

range of copy number aberrations depending on the locus and the study (Table 5). Mouse Scc13 corresponds to syntenic segments on human 4q25 and 7p14. Among our cohort of samples, 7 of 67 tumors (10.4%) showed loss of at least 50% of the bacterial artificial chromosomes (BACs) mapping to the 4q25 band. In other published aCGH datasets, losses occurred at frequencies varying from 0% to 35%. Conversely, at the 7p14 locus, BACs exhibited gains in 16 of 67 tumors in our set (24%) and in 25% to 45% of tumors in other studies. In our samples, the BAC at 7p14 showing the highest frequency of genomic gains (55%, or 37 of 67 tumors), CTB-111H21, encompasses a genomic segment containing the gene SCRN1. As expected, 5q, where the APC tumor suppressor gene maps, shows frequent loss in CRCs across studies (up to 40%). Of particular note for the SCC5 locus, the BAC CTD-2202A14, which does not contain APC, is lost in 46% of our 67 tumors and contains several candidate genes from this study, including the PTPRJ substrate PDGFRB. SCC4 maps to human 2p25, which shows gains in 3-19% of tumors and losses in less than 10% of tumors.

3.4.3 Sequenom® Allele-Specific Imbalance Mapping

As we observed evidence of genomic copy number aberrations in greater than

15% of CRCs for most of the Scc equivalent regions, we next determined whether any of the genes showing sequence or expression differences between Balb/cHeA and STS/A exhibited ASI in human colon tumors. We chose genes for Sequenom® MassARRAY quantitative SNP genotyping in human tumors based largely on the mouse RNA-seq data, but we also included genes and/or SNPs from these loci that were reported in the literature to be associated with any type of cancer or with colon biology, or that were suggested to be associated with CRC risk in previous genome-wide association studies. From these criteria, we identified 81 genes and intergenic regions of interest. We performed quantitative genotyping of 525 haplotype-tagging SNPs in our discovery set of 194 normal/colon tumor DNA pairs. These corresponded to 103 SNPs mapping to 18 genes at SCC4, 278 SNPs mapping to 34 genes at SCC5, and 144 SNPs from 29 genes at SCC13. Among these 525 SNPs, 74 showed evidence of ASI with a p-value < 0.01, including 19 SNPs at SCC4, 28 at SCC5, and 27 at SCC13 (Appendix B, Table 12). As this was our discovery set, we set a generous cutoff of p-value < 0.10 for selection for subsequent validation studies, which resulted in 103 SNPs for further study (Appendix A, Tables 12-14).

For our validation set we evaluated the 103 SNPs in 296 normal/tumor pairs

(Appendix B, Table 13). When the validation data were combined with data from the discovery set, quantitative genotyping yielded 31 SNPs showing statistical significance with nominal p-values < 0.05 (Appendix B, Table 14). Two SNPs in the gene SNX10 at

SCC13 showed significant evidence of ASI after Bonferroni correction for multiple comparisons (n = 103) (Table 6). Genes with SNPs showing suggestive evidence of ASI

(adjusted p-values of < 0.2) include GRAMD3 and CEP120 at SCC5, EPAS1 at SCC4, and LANCL2 and SCRN1 at SCC13 (Appendix B, Table 14).


3.5 Discussion

Here we show that variants at loci orthologous to mouse Scc loci exhibit evidence of allele-specific imbalance in human colon tumors (Table 6). We identified two variants in the SNX10 gene and additional variants in GRAMD3, CEP120, LANCL2, SCRN1 and

EPAS1 that are candidates for colon cancer susceptibility based on data from the mice used in the original linkage analyses as well as our ASI studies (Table 6; Appendix B,

Table 14). None of these genes have been previously implicated in CRC risk by association studies, suggesting that our method may have the potential to uncover novel susceptibility genes.

The gene SNX10 is a member of the sorting nexin family. This family plays a role in endocytosis, endosome sorting, and endosome signaling.103,104 Mutations in SNX10 have been described in individuals diagnosed with osteopetrosis, but to date this gene has not been implicated in any cancers.105 Sorting nexins have been known to regulate the trafficking and signaling of such molecules as EGFR and PDGFR, both of which are substrates for PTPRJ.103 The PX domain of the sorting nexin protein binds to phosphatidylinositol-3-phosphate (PtdIns3P), which facilitates SNX protein localization to the membrane.103,106 It is possible that SNX10 binds to PtdIns3P and contributes to the endosomal trafficking of EGFR in colon epithelial cells, though this hypothesis has not been tested.

At the SCC5 locus, the genes CEP120 and GRAMD3 showed trends towards allele-specific imbalance. Centrosomal protein 120kDa (CEP120) has been shown to associate with CPAP in the formation and elongation of centrioles.107 To date, no known

role in cancer has been identified. GRAM domain-containing protein 3 (GRAMD3) has been associated by GWAS with cognitive impairment in the absence of dementia; however, no function for GRAMD3 has been described.108 Other GRAM domain-containing proteins are involved in membrane-associated processes such as intracellular protein binding or lipid binding and signaling and have documented roles in phosphatidylinositol 3,5-bisphosphate (PtdIns(3,5)P2) substrate recognition and

PtdIns3P/PtdIns5P dependent oligomerization.109 One possible connection between

GRAMD3 at SCC5 and SNX10 at SCC13 is through shared binding partners in the phosphatidylinositide family.

An intriguing candidate at the SCC4 locus is the EPAS1 gene, which is also known as HIF2α. This transcription factor regulates genes involved in angiogenesis, metabolism, and other oxygen-sensitive processes.110 One variant in EPAS1 emerged in a genome-wide association study of renal cell carcinoma, while a different variant in this gene showed an interaction with the 8q24 locus in a prostate cancer association study.111,112 At this time, however, no SNPs in EPAS1 have been associated with CRC.

While the tagging SNPs in EPAS1 that we tested for ASI did not achieve a significant p-value after multiple comparisons testing, it remains possible that variants in this gene contribute to CRC risk by some unknown interaction with variants in the SCC5 locus.

Our RNA-seq analysis revealed few coding variations between Balb/cHeA and

STS/A (Appendix B, Table 11). Most of the differences between the strains were in mRNA expression (Appendix B, Table 10), suggesting that variants outside of the coding regions may play a critical role in the regulation of gene expression and contribute to


CRC sensitivity by modifying gene expression. This would be consistent with the finding that many GWAS-identified risk loci reside in intergenic regions proposed to be in linkage with regulatory elements that control gene expression in cis or in trans. One limitation of the use of RNA-seq data to identify candidate sequence variants is the inability to sequence promoter or intergenic regions that may differ between the strains and potentially modulate gene expression. However, if the variants driving the linkages are acting in cis to regulate gene expression, the gene expression differences can act as a surrogate for the regulatory variants. As we used mRNA for our RNA-seq study, we were not able to capture any differences in microRNAs or other non-coding RNAs that map to these regions. An important consideration for statistical inference from the RNA-seq data is the small number of colon samples assessed (n = 1 per strain).

Our aCGH data along with that described in the literature support the exploration of the human orthologous regions of Scc4, Scc5, and Scc13 in CRC susceptibility (Table

5). We observed similar patterns of chromosome arm gains and losses as a number of other aCGH analyses of CRC tumors.99-102 We demonstrated gains in 19% of tumors at

2p (SCC4) and in 24% of tumors at 7p (SCC13). Losses were observed in 10% of tumors at 4q (SCC13) and 31% of tumors at 5q (SCC5). In addition to gains or losses that spanned multiple adjacent BAC clones or whole arms, we also identified a handful of single BAC clones showing frequencies of gains or losses greater than or approaching

35% (data not shown). Genes such as SCRN1, PDGFRB, and PRDM5 map within these focal regions of aberration, but no SNPs within these genes showed statistically significant evidence of ASI in our study.


There are limitations to this study to consider when interpreting the results. First, from work performed by our laboratory and others, we know that not all CRC susceptibility alleles demonstrate allele-specific gains or losses in tumors.27,28 Thus, we may be missing interesting candidate genes or variants for future studies by prioritizing those that contain variants with ASI. Second, we primarily focused on testing tagging SNPs that map near or within genes. As many GWAS-identified SNPs for cancer are intergenic and are thought to alter regulatory elements of genes important in CRC development, we may be missing critical SNPs/regulatory elements important for cancer susceptibility.113 Finally, the identification of "causal" variants being selected for during tumorigenesis (i.e., drivers of ASI) depends in part on the allele frequency and on the frequency at which a locus shows copy number gains and losses in tumors. Variants that have lower rates of heterozygosity and/or map in regions with less frequent gains and losses may require larger sample sizes than those used in this study to detect allele-specific somatic changes. Given the relatively low frequency of genomic copy number alterations at 2p observed by aCGH (gains in less than 19% of CRCs), our study may have been underpowered to detect variants showing ASI at SCC4. If a less stringent method of adjustment for multiple comparisons were used, a number of additional variants from these loci could be considered as candidates.

To date, none of our candidate SNPs (or genes) has been identified in genome-wide association studies for CRC risk. This is not entirely unexpected, as we anticipate that the risk conferred by these SNPs will depend upon their interacting loci and will not necessarily meet the stringent p-value thresholds for risk when considered independently.


Interestingly, some of the murine Scc and colon cancer susceptibility (Ccs) loci have been mapped within 3.3 cM of loci identified through GWAS of human CRC risk.114 A study by the EPICOLON consortium identified a SNP, rs954353, in the human orthologous locus to Scc15 that showed evidence of risk in their Phase I study.115 Finally, work by us and others suggests that variants and haplotypes in PTPRJ, the candidate gene for SCC1, may be important for colon and breast cancer risk.56,59,116 As these variants were not replicated in larger studies, it will be important to look at them in the context of candidate interacting alleles at the human orthologs of Scc5 and Scc13.

3.6 Conclusions

In summary, we used a cross-species approach to identify potential candidate genes from mouse Scc loci. We identified multiple candidate genes showing mRNA expression differences between the strains of mice and identified two variants in the gene

SNX10 showing statistically significant evidence of ASI in human tumors. Future studies will be necessary to determine (1) if these genes and the others showing suggestive evidence of ASI are important in the differences in colon cancer susceptibility between the strains, and (2) if these variants are involved in gene-gene interactions for human

CRC susceptibility. These variants may have utility as biomarkers for CRC risk and could highlight novel gene targets or pathways for therapeutic interventions.


Arm   Gains/Losses   Our Cohort (n = 67)   Nakao et al., 2004* (n = 125)   Jones et al., 2005* (n = 30)   Lassman et al., 2007* (n = 22)   Dyrso et al., 2011* (n = 40)
2p    Gains          13/67 (19%)           10%                             3%                             15%                              <10%
2p    Losses         0/67 (0%)             <3%                             3%                             5%                               <10%
5q    Gains          1/67 (1.5%)           <5%                             0%                             8%                               <5%
5q    Losses         21/67 (31%)           25%                             40%                            <5%                              20%
4q    Gains          2/67 (3%)             <5%                             0%                             5%                               <5%
4q    Losses         7/67 (10%)            20%                             35%                            0%                               30%
7p    Gains          16/67 (24%)           35%                             45%                            25%                              43%
7p    Losses         0/67 (0%)             <5%                             0%                             0%                               0%

Table 5. Chromosomal losses and gains by aCGH at human chromosome arms housing SCC4 (2p), SCC5 (5q), and SCC13 (4q, 7p)
*Percentages are approximations.


Table 6. SNPs showing evidence of ASI after replication

* 'Allele 1 Imbalance' refers to the number of tumors from heterozygotes which exhibit relative loss of allele 1 compared to allele 2. Data are presented as a fraction of all heterozygotes at that SNP and as a percentage of the heterozygote samples.
† 'Allele 2 Imbalance' refers to the number of tumors from heterozygotes which exhibit relative loss of allele 2 compared to allele 1. Data are presented as a fraction of all heterozygotes at that SNP and as a percentage of the heterozygote samples.
§ Total number of tumors with imbalance of either allele/total heterozygous samples at the designated SNP (% of heterozygotes showing imbalance)
‡ Chi-squared statistical test, df = 1
∞ Bonferroni corrected p-value (n = 103)


SNP ID      Tagged Gene (Locus)   Alleles   Sample Set   Allele 1 Imbalance*   Allele 2 Imbalance†   Total Imbalance§   P-value‡   Adjusted P-value∞
rs1919935   SNX10 (SCC13)         CT        Discovery    18/67 (27%)           6/67 (9%)             24/67 (36%)        0.01431
rs1919935   SNX10 (SCC13)         CT        Validation   22/109 (20%)          4/109 (4%)            26/109 (24%)       0.00042
rs1919935   SNX10 (SCC13)         CT        Combined     40/176 (23%)          10/176 (6%)           50/176 (28%)       0.00002    0.00228
rs2699814   SNX10 (SCC13)         TA        Discovery    5/89 (6%)             19/89 (21%)           24/89 (27%)        0.00427
rs2699814   SNX10 (SCC13)         TA        Validation   6/136 (4%)            16/136 (12%)          22/136 (16%)       0.03301
rs2699814   SNX10 (SCC13)         TA        Combined     11/225 (5%)           35/225 (16%)          46/225 (20%)       0.00040    0.04143
rs6958331   LANCL2 (SCC13)        CT        Discovery    17/63 (27%)           7/63 (11%)            24/63 (38%)        0.04123
rs6958331   LANCL2 (SCC13)        CT        Validation   14/93 (15%)           2/93 (2%)             16/93 (17%)        0.00270
rs6958331   LANCL2 (SCC13)        CT        Combined     31/156 (20%)          9/156 (6%)            40/156 (26%)       0.00050    0.05193
rs6891155   CEP120 (SCC5)         GA        Discovery    26/98 (27%)           11/98 (11%)           37/98 (38%)        0.01366
rs6891155   CEP120 (SCC5)         GA        Validation   26/153 (17%)          12/153 (8%)           38/153 (25%)       0.02314
rs6891155   CEP120 (SCC5)         GA        Combined     52/251 (21%)          23/251 (9%)           75/251 (30%)       0.00081    0.08365
rs4835907   GRAMD3 (SCC5)         TA        Discovery    8/89 (9%)             21/89 (24%)           29/89 (33%)        0.01578
rs4835907   GRAMD3 (SCC5)         TA        Validation   10/118 (8%)           23/118 (19%)          33/118 (28%)       0.02364
rs4835907   GRAMD3 (SCC5)         TA        Combined     18/207 (9%)           44/207 (21%)          62/207 (30%)       0.00096    0.09888


Chapter 4: Functional Characterization of Hypoxia Inducible Factor 2α (Epas1) Using Colon Epithelial Cell Lines

4.1 Introduction

4.1.1 The Role of Hypoxia in Cancer

As solid tumors grow larger than 1 millimeter in diameter, the tumor cells generally experience hypoxia, or low oxygen levels, as a result of outgrowing the blood supply.117 Hypoxia is generally defined by an oxygen tension [pO2] < 7 mmHg, or as less than 2% oxygen.118,119 The hypoxic tumor microenvironment triggers a transcriptional program within the tumor cells that leads to upregulation of angiogenic factors such as vascular endothelial growth factor (VEGF) and platelet-derived growth factor (PDGF), genes whose protein products are involved in a switch to glycolytic metabolism, and numerous other genes that modulate cancer cell proliferation and metastatic potential.120 The hypoxia inducible factor (HIF) family is a critical component of this transcriptional program. In healthy cells, these transcription factors function to promote cellular adaptation to hypoxic stress. The main players in this family are the oxygen-regulated HIF1α and HIF2α and their oxygen-independent dimerization partner HIF1β (Figure 7).121 Under normoxic conditions (normal oxygen tension), HIF1α and HIF2α are subject to proteasomal degradation when hydroxylated on at least one of two conserved proline residues within the oxygen-dependent degradation domain. The hydroxylation mark is recognized by the von Hippel-Lindau protein (pVHL), which targets HIFα for 26S proteasomal degradation. Under low oxygen, HIF1α and HIF2α escape proteasomal degradation and accumulate in the nucleus of the cell.122 In the nucleus, HIF1α or HIF2α heterodimerizes with HIF1β and binds to hypoxia response elements (HREs) with a consensus binding sequence of 5'-RCGTG-3' in the promoter regions of target genes.121 The HIFα/β heterodimer activates transcription of target genes involved in a host of processes that facilitate hypoxic adaptation (Figures 8 and 9).
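To make the HRE consensus concrete, the sketch below scans a DNA string for 5'-RCGTG-3', where R is the IUPAC code for a purine (A or G). The function name and the example sequence are hypothetical, and only the given strand is searched; a fuller analysis would also scan the reverse complement.

```python
import re

# R (purine) expands to A or G. The lookahead lets overlapping sites,
# such as ACGTGCGTG, be reported individually.
HRE_PATTERN = re.compile(r"(?=([AG]CGTG))")


def find_hres(sequence):
    """Return (0-based position, matched motif) for each RCGTG site."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in HRE_PATTERN.finditer(sequence)]


# Hypothetical promoter fragment (not a real gene sequence).
promoter = "ttACGTGccGCGTGaa"
for position, motif in find_hres(promoter):
    print(position, motif)
```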

4.1.2 The Function of EPAS1 (HIF2α) in Cancer

HIF2α, also called endothelial PAS-domain containing protein 1 (EPAS1), has a complex connection to cancer pathways and processes. This gene has been most extensively studied in kidney cancer. In clear cell renal cell carcinoma (ccRCC), biallelic inactivation of the von Hippel-Lindau (VHL) tumor suppressor leads to accumulation of stabilized HIF1α and HIF2α protein within the nucleus.123 Aberrant accumulation and transcriptional activity of HIF2α causes increased expression of canonically HRE-regulated genes such as VEGF, transforming growth factor α (TGFα), and lysyl oxidase

(LOX).110 Additionally, EPAS1 promotes cell cycle progression of ccRCC cell lines by stimulating c-MYC occupancy of promoters via stabilization of MYC/MAX complexes.110 EPAS1 has also been shown to enhance Wnt/β-catenin-mediated cell proliferation by functioning as a co-activator of β-catenin/TCF.124 Interestingly, EPAS1 contributes to the negative regulation of PTPRJ (the human ortholog of the Scc1 susceptibility gene) in pVHL-deficient ccRCCs.125


Variants in EPAS1 have been linked to ccRCC and other cancer types, supporting its putative role in tumor susceptibility. Of note, the intronic SNP rs7579899 in EPAS1 was found to associate significantly with ccRCC risk in a GWAS including almost 6,000 cases and over 13,000 controls (p-value = 2 × 10^-19; odds ratio = 1.15; 95% confidence interval, 1.10-1.21).112 A different SNP, rs4953347, in the first intron of EPAS1 was found to exhibit a significant pair-wise SNP-SNP interaction with the 8q24 SNP rs6983267 in a multistage GWAS for prostate cancer.111 Ciampa et al. hypothesized that

EPAS1, which is known to regulate POU5F1 (Oct4) transcription, may also regulate

POU5F1B at the 8q24 locus, and that this biological connection may underlie the observed interaction between rs4953347 and rs6983267.111 They also postulated that this biological interaction could contribute to colon cancer, which shares many features with prostate cancer: notably, EPAS1 is overexpressed, mRNA transcripts of POU5F1B have been detected, and embryonic pathways are implicated in the pathogenesis.

While not normally expressed in healthy colon, EPAS1 is detected in colorectal tumors.126 Many articles describe the expression of EPAS1 in CRC and its correlation with prognosis.127-130 The hypoxic microenvironment of growing colon tumors supports stabilization of EPAS1. Xue et al. generated a mouse model with intestinal epithelium-specific disruption of Vhl on the ApcMin/+ background, leading to constitutive HIF signaling in a mouse predisposed to intestinal tumorigenesis.131 Constitutive HIF activity led to an increase in tumor multiplicity and advanced the progression from adenoma to carcinoma, in concordance with observed increases in cell proliferation by bromodeoxyuridine staining. When these animals were crossed with Hif2α-disrupted mice, the effect on tumorigenesis was ameliorated. Together, these data suggest a plausible role for EPAS1 in human CRC progression.

4.1.3 Evidence from Our Studies Supporting Epas1 as a Candidate Susceptibility Gene for Scc4

The gene Epas1 first emerged as a candidate susceptibility gene at the Scc4 locus from our RNA-seq analysis of Balb/cHeA (CRC-resistant) and STS/A (CRC-sensitive) normal colon tissue. This gene showed differential expression in Balb/cHeA and STS/A mouse colon specimens (Appendix B, Table 10). Furthermore, five tagging SNPs in

EPAS1 showed trends of allele-specific imbalance (ASI) in our cohort of human colorectal tumors (Appendix B, Table 14). Additionally, literature supporting a physical interaction with β-catenin/TCF and c-MYC/MAX, protein complexes that are aberrantly activated in CRC, lends strength to the candidacy of Epas1 for the Scc4 risk gene.110,124,132 Given these lines of evidence, as well as published connections to other cancer types, we were motivated to investigate Epas1/EPAS1 as a potential candidate gene for colon tumorigenesis.

4.2 Objectives and Hypotheses

We aimed to characterize the impact of Epas1 expression on β-catenin/TCF-driven transcription in normal and cancerous colon cell lines cultured under normoxic and hypoxic conditions. We tested the hypothesis that Epas1 expression in these cell lines would increase β-catenin/TCF transcriptional activity and upregulate mRNA levels of β-catenin/TCF-regulated genes, with robust effects occurring under hypoxic, Epas1-stabilizing conditions. We also looked at possible mechanisms of interaction between the candidate susceptibility genes Epas1 at Scc4, Csnk1a1 at Scc5, and Ptprj at Scc1 by investigating Epas1 for potential transcriptional control over its putative interacting partners. Lastly, we tested the hypothesis that Epas1 could function as an oncogene when expressed in normal mouse colon epithelial cells, rendering them more "cancer-like" in their proliferative rate.

4.3 Materials and Methods

4.3.1 Real Time Quantitative PCR Analysis of Mouse Colon Specimens for Epas1,

Csnk1a1, and Ptprj Expression

All studies were approved by The Ohio State University Institutional Animal Care and Use Committee. Snap-frozen large intestines from 4-5 week old age-matched female

Balb/cHeA and female STS/A mice were obtained from the Netherlands Cancer Institute.

We collected total RNA from colon specimens of 3 Balb/cHeA female mice and 3 STS/A female mice by homogenizing whole colon tissue in 1mL Ribozol RNA extraction reagent (Amresco LLC, Solon, OH, USA) using medium speed for 15-second pulses followed by incubation on ice. After complete homogenization of the tissue, the Ribozol manufacturer's protocol was followed to purify RNA from the tissue. The quantity and quality of the RNA isolates were assessed on a NanoDrop-1000 spectrophotometer.


One microgram of RNA was used to generate cDNA using iScript cDNA synthesis kit (Bio-Rad, Hercules, CA, USA). The cDNA and RNA (no reverse transcription control) samples were tested in technical triplicates and duplicates, respectively, for Epas1, Csnk1a1, and Ptprj expression using real time probes purchased from Integrated DNA Technologies (Coralville, IA, USA). Hprt was used as a housekeeping gene for normalization. Real time probe information for these genes is listed in Table 7. Real time PCR plates were prepared using TaqMan Fast Advanced

Master Mix (Life Technologies, Carlsbad, CA, USA) and run on an Applied Biosystems

7900 thermocycler. The 2^−ΔΔCT method was used to calculate normalized relative expression values. Student's t-test (www.graphpad.com) was used to compare Balb/cHeA and STS/A samples for statistically significant differential expression of target genes.
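The 2^−ΔΔCT calculation can be sketched as follows. The Ct values are hypothetical, the function name is illustrative, and the choice of which strain serves as the calibrator is an assumption made here for the example; Hprt is the reference gene, as stated above.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Relative expression by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values. The first two
    describe the sample of interest, the last two the calibrator sample.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calib_target_ct) - mean(calib_reference_ct)
    delta_delta_ct = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct triplicates (not data from the study).
fold = relative_expression(
    target_ct=[24.0, 24.1, 23.9],         # target gene, sample of interest
    reference_ct=[22.0, 22.0, 22.0],      # Hprt, sample of interest
    calib_target_ct=[25.0, 25.1, 24.9],   # target gene, calibrator
    calib_reference_ct=[22.0, 22.0, 22.0],  # Hprt, calibrator
)
print(f"relative expression = {fold:.2f}")
```

A ΔΔCT of −1, as in this example, corresponds to a two-fold higher normalized expression in the sample than in the calibrator.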

4.3.2 Sub-cloning Epas1 into the pEGFP-N1 Expression Vector

We purchased a Mus musculus Epas1 cDNA clone from the Harvard PlasmID database (GenBank Accession: NM_010137.3; Plasmid ID: MmCD00317485). The coding DNA sequence (CDS) was supplied in the pCMV-SPORT6 vector and encodes a protein that is 874 amino acids in length. We used Advantage HD Polymerase Mix (Clontech Laboratories, Inc., Mountain View, CA, USA) to amplify the CDS from pCMV-SPORT6 using two-step PCR. Primers for PCR were designed for compatibility with the In-Fusion HD cloning kit, which utilizes a recombination-based cloning method.

Amplified PCR product was run through a 1% electrophoresis gel prepared with

UltraPure low melting point agarose (Life Technologies, Carlsbad, CA, USA). The

product was recovered by gel extraction using the Nucleospin Gel and PCR Clean-up kit (Clontech Laboratories, Inc., Mountain View, CA, USA). We used the In-Fusion HD Cloning Kit (Clontech Laboratories, Inc., Mountain View, CA, USA) to subclone the CDS into the pEGFP-N1 plasmid. To prepare the vector for In-Fusion, the pEGFP-N1 plasmid was first linearized using the blunt-cutting restriction enzyme SmaI (New England BioLabs, Inc., Ipswich, MA, USA). Linear product was separated by gel electrophoresis on an UltraPure low melt agarose gel and gel extracted with Nucleospin Gel and PCR Clean-up kit reagents, per the manufacturer's protocol. After performing the In-Fusion reaction with 100ng of amplified cDNA and 100ng of linearized pEGFP-N1, we transformed Clontech Stellar competent cells with 25% of the reaction volume (2.5μL) according to the provided protocol. Transformed cells were plated on LB agar plates containing 50µg/mL kanamycin for antibiotic selection. Positive clones were screened using colony PCR, followed by miniprep isolation of plasmids using the GeneJET Plasmid Miniprep kit (Thermo Scientific, Waltham, MA, USA) and validation of the insert by Sanger sequencing of 450ng of plasmid with a sequencing primer flanking the SmaI restriction site.

4.3.3 siRNA Reagents

Small interfering RNA (siRNA) molecules were purchased from Qiagen (Venlo,

Limburg, Netherlands) and Life Technologies (Carlsbad, CA, USA). AllStars Negative

Control siRNA (5 nmol, Cat. No. 1027280) and FlexiTube siRNA Hs_EPAS1_5 (5 nmol,

Cat. No. SI02663038) were obtained as lyophilized products from Qiagen. We resuspended these siRNA reagents in 250µL RNase-free water to obtain a 20µM siRNA solution, aliquoted this solution into small volumes, and stored aliquots at -20°C. The siRNAs were used for knockdown at 5nM final concentration. These siRNAs will be referred to as siControl (AllStars Negative Control siRNA) and siEPAS1 (FlexiTube siRNA Hs_EPAS1_5) in text and figures.
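The stock and working concentrations above follow from simple unit arithmetic, sketched here. The function names are illustrative, and the 1,000µL transfection volume is a hypothetical value chosen for the example, as the per-well volume is not given in this section.

```python
def stock_concentration_uM(amount_nmol, volume_uL):
    """Concentration after resuspension: nmol/uL equals mM, so multiply by 1000 for uM."""
    return amount_nmol / volume_uL * 1000.0


def stock_volume_uL(final_nM, final_volume_uL, stock_uM):
    """Stock volume needed for a target final concentration (C1 * V1 = C2 * V2)."""
    return final_nM * final_volume_uL / (stock_uM * 1000.0)


stock = stock_concentration_uM(amount_nmol=5.0, volume_uL=250.0)
# Hypothetical 1,000 uL final volume; adjust to the actual well volume.
needed = stock_volume_uL(final_nM=5.0, final_volume_uL=1000.0, stock_uM=stock)
print(f"stock = {stock:.0f} uM, volume for 5 nM in 1,000 uL = {needed:.2f} uL")
```

Because the required stock volume is so small at these concentrations, an intermediate dilution of the 20µM stock may be more practical to pipette; that step is a suggestion rather than part of the described protocol.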

4.3.4 TOPflash and FOPflash Luciferase Reporter Vectors

M50 Super 8x TOPflash and M50 Super 8x FOPflash luciferase reporter vectors were a kind gift from Dr. Randall Moon (University of Washington) and are available through Addgene (Addgene plasmids #12456 and #12457).133 The TOPflash vector contains 7 repeated β-catenin/TCF binding sites (5'-AGATCAAAGGgggta-3') upstream of a minimal promoter, which drives expression of a firefly luciferase reporter gene. These binding sites are mutated in the FOPflash reporter vector (5'-AGGCCAAAGGgggta-3') to function as a negative control.

4.3.5 Cell Lines and Culture Conditions

For the described experiments, SW480 and HCT116 were used as representative human colorectal adenocarcinoma cell culture models. The SW480 cell line is derived from the primary sporadic colon adenocarcinoma of a 50-year-old male Caucasian.

SW480 cells are microsatellite stable but exhibit chromosomal instability. HCT116 is derived from a male colon tumor with microsatellite instability (MSI) and is a widely used archetype for sporadic MSI colon cancer. A third cell line, called "Fred", is an immortalized TP53-null C57Bl/6 colon epithelial cell line. The laboratory of Dr. Joanna


Groden at The Ohio State University kindly provided these three cell lines. All cells were cultured at 37°C with 5% CO2 and routinely passaged in Greiner CellStar 75cm² tissue culture flasks. SW480 and Fred cells were cultured in Dulbecco's Modification of Eagle's Medium with 4.5 g/L glucose, L-glutamine, and sodium pyruvate (Corning, Manassas, VA, USA), supplemented with 10% (v/v) fetal bovine serum (HyClone, Logan, UT, USA) and 1% (v/v) penicillin-streptomycin solution (supplied at a stock concentration of 10,000U/mL penicillin, 10,000µg/mL streptomycin in 0.85% NaCl; Thermo Scientific, Waltham, MA, USA). HCT116 cells were grown in McCoy's 5A Medium, which was also supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin. The three cell lines were routinely sub-cultured using trypsin-EDTA (0.05% trypsin, 0.53mM 1xEDTA; Corning, Manassas, VA, USA) and maintained at low passage numbers (< 50) for all experiments. For normoxia studies, cells were maintained in a standard incubator (37°C, 5% CO2) and exposed to atmospheric levels of oxygen (~20-21% O2). For hypoxia studies, cells were maintained in an oxygen-controlled incubator (37°C, 5% CO2) that utilized nitrogen gas to flush out oxygen to 1% (hypoxic) levels.

4.3.6 Transfection Methods

All pEGFP-N1 and pEGFP-Epas1 transfections in SW480, HCT116, and Fred cells were performed using Lipofectamine 2000 transfection reagent (Life Technologies,

Carlsbad, CA, USA) when cells were 60-80% confluent. For siRNA transfection of

HCT116 cells, 6µL of HiPerfect reagent (Qiagen, Venlo, Limburg, Netherlands) was

used to transfect siControl or siEPAS1 to a final concentration of 5nM in a ~60% confluent 12-well culture plate. For both overexpression and knockdown, triplicate wells of cells were transfected in 1x Opti-MEM (Life Technologies, Carlsbad, CA, USA) for 4 to 6 hours, after which complete DMEM (with FBS and antibiotics) was added back for the remainder of the experiment. Unless otherwise noted, cells were harvested at 24 hours post-transfection for analysis. For transfections performed in 12-well culture dish format,

1μg total plasmid DNA and 3μL Lipofectamine 2000 were used per well. Amounts and volumes were scaled as needed for alternative culture dish formats. Transfections were performed according to the manufacturers’ recommendations for HiPerfect and Lipofectamine 2000.
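The scaling of transfection amounts to other dish formats can be illustrated with a short calculation. The sketch below assumes that amounts scale in proportion to growth surface area; the surface areas are nominal values assumed for illustration (they vary by plate manufacturer), and the function name `scale_transfection` is hypothetical. Only the 12-well reference amounts (1μg DNA, 3μL Lipofectamine 2000) come from the text.

```python
# Nominal growth areas in cm^2. These are assumed, typical values and
# should be checked against the plate manufacturer's data sheet.
GROWTH_AREA_CM2 = {"12-well": 3.8, "6-well": 9.5, "6-cm dish": 21.0}

REFERENCE_FORMAT = "12-well"
REFERENCE_DNA_UG = 1.0      # from the text: 1 ug plasmid DNA per well
REFERENCE_REAGENT_UL = 3.0  # from the text: 3 uL Lipofectamine 2000 per well


def scale_transfection(target_format):
    """Scale the 12-well amounts by the ratio of growth surface areas."""
    factor = GROWTH_AREA_CM2[target_format] / GROWTH_AREA_CM2[REFERENCE_FORMAT]
    return {
        "dna_ug": REFERENCE_DNA_UG * factor,
        "reagent_ul": REFERENCE_REAGENT_UL * factor,
    }


six_cm = scale_transfection("6-cm dish")
print(f"6-cm dish: {six_cm['dna_ug']:.3f} ug DNA, {six_cm['reagent_ul']:.1f} uL reagent")
```

With these assumed areas, a 6-cm dish works out to roughly 5.5μg of DNA, in line with the 5.525μg reported for the MTT assay. The 14μL of Lipofectamine 2000 reported there is lower than strict proportional scaling would give, so the reagent volume from this sketch should be read as a starting point rather than as the volume actually used.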

4.3.7 TOPflash/FOPflash Luciferase Assays

TOPflash/FOPflash reporter assays were conducted in SW480 and HCT116 cell lines, as these colon adenocarcinoma cells exhibit activated Wnt/β-catenin signaling.

These cells were plated in 12-well culture dishes roughly 16 to 24 hours prior to transfection. Cells at ~60-80% confluency were transfected in triplicate with 150ng of

LacZ internal control plasmid (encoding β-galactosidase), 500ng of TOPflash or

FOPflash reporter plasmid, and 350ng of pEGFP-N1 empty vector (EV) or pEGFP-Epas1 expression vector. Of note, for siRNA knockdown experiments, the cells were transfected with 5nM siControl or 5nM siEPAS1 24 hours prior to performing this transfection.

Protein lysates were harvested 24 hours post-transfection using 200µL of Mammalian

Protein Extraction Reagent (Thermo Scientific, Waltham, MA, USA) per well. Extracts

were incubated at room temperature for 5 to 10 minutes and then centrifuged at 16,000xg for 10 minutes at 4°C to pellet cell debris. The supernatant (containing protein) was transferred to a fresh 1.5mL tube; 30µL was assessed for luciferase activity and 30µL for β-galactosidase activity.

For the luciferase readings, 30µL of lysate was transferred to a 96-well luciferase assay plate and measured for luminescence on a Veritas microplate luminometer

(Promega, Madison, WI, USA). Synthetic firefly D-luciferin potassium salt (Biosynth,

Itasca, IL, USA) was used as the substrate for the luciferase assay at 200µM concentration. Magnesium and ATP cofactors were supplied in reagent mixtures. We measured β-galactosidase activity in a standard clear-bottom 96-well plate by adding

100µL of Z-buffer containing 2-nitrophenyl β-D-galactopyranoside (ONPG, Sigma-Aldrich Co., St. Louis, MO, USA) to each well containing 30µL protein lysate, incubating the plate at 37°C until yellow color developed, and then reading the absorbance on a plate reader at 420nm. Luciferase readings were normalized to β-galactosidase readings and averaged across triplicates. Ratios of TOPflash to FOPflash relative luminescence units (RLU) were calculated and graphed using Microsoft Excel. Analysis of variance (ANOVA) followed by Tukey’s post hoc test was used to assess experimental conditions for statistical significance. An alpha of 0.05 was used as the cutoff for statistical significance.
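The normalization described above can be sketched as follows. This is a minimal illustration, not the spreadsheet actually used: the function names are hypothetical and the well readings are invented values chosen only to make the arithmetic easy to follow. Each well's luciferase reading is divided by its β-galactosidase absorbance, replicates are averaged, and the TOPflash mean is divided by the FOPflash mean.

```python
from statistics import mean


def normalized_activity(luciferase_rlu, beta_gal_a420):
    """Divide each well's luciferase RLU by its beta-gal A420, then average the wells."""
    if len(luciferase_rlu) != len(beta_gal_a420):
        raise ValueError("each luciferase reading needs a matching beta-gal reading")
    return mean(rlu / a420 for rlu, a420 in zip(luciferase_rlu, beta_gal_a420))


def top_fop_ratio(top_wells, fop_wells):
    """Ratio of normalized TOPflash activity to normalized FOPflash activity."""
    return normalized_activity(*top_wells) / normalized_activity(*fop_wells)


# Hypothetical triplicate readings: (luciferase RLU, beta-gal absorbance at 420nm)
top_wells = ([12000.0, 11000.0, 13000.0], [0.5, 0.5, 0.5])
fop_wells = ([1500.0, 1400.0, 1600.0], [0.5, 0.5, 0.5])

ratio = top_fop_ratio(top_wells, fop_wells)
print(f"TOPflash/FOPflash ratio: {ratio:.2f}")
```

The ANOVA and Tukey post hoc comparison are not reproduced here; they would be applied to the per-condition ratios from independent wells.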

4.3.8 SDS-PAGE and Western Blots

Protein lysates were collected from cells transfected with pEGFP-N1 EV or pEGFP-Epas1 or from cells transfected with AllStars Negative Control siRNA

(siControl) or FlexiTube EPAS1 siRNA (siEPAS1). Before performing the protein extraction, culture media was aspirated and cells were rinsed twice with 1x Phosphate

Buffered Saline (PBS) to remove contaminants. Cells were then scraped from the well in

PBS using single-use disposable cell scrapers and transferred to 1.5mL tubes. Cell suspensions were centrifuged at 800xg for 5 minutes at room temperature to pellet cells.

The PBS supernatant was discarded and the pellet was resuspended and gently vortexed in RIPA lysis buffer supplemented with 1x protease inhibitors and 1x phosphatase inhibitors (Pierce Protease and Phosphatase Inhibitor Mini Tablets, Thermo Scientific,

Waltham, MA, USA) to extract cellular proteins. The protein lysate was then incubated on ice for 30 minutes and centrifuged at 16,000xg for 15 minutes at 4°C. The supernatant was transferred to a fresh 1.5mL tube and stored at -80°C.

Proteins were quantified using a Qubit Protein Assay Kit (Life Technologies,

Carlsbad, CA, USA). One to two microliters of protein lysate were used for quantification. The manufacturer’s protocol was followed to determine protein concentration. Following quantification, 20µg of protein was combined with 4x protein loading buffer and water, boiled at 100°C for 5 minutes, and either used immediately for

Western blotting or stored at -80°C until use.

Twenty micrograms of dyed and denatured protein lysate were loaded into

0.75mm, 10% SDS-PAGE gels alongside 7µL of RPN800E full range rainbow

recombinant protein molecular weight marker (GE Healthcare Life Sciences, Pittsburgh, PA, USA). Gels were run in a Bio-Rad SDS-PAGE apparatus at 100 volts through the stacking gel, after which the voltage was increased to 120 volts through the separating gel. Transfers to pre-wet Amersham Hybond ECL nitrocellulose blotting membrane (GE Healthcare Life

Sciences, Pittsburgh, PA, USA) were performed with ice-cold 1x transfer buffer containing 20% methanol and were run at 90 volts for 70 minutes. Membranes were dried between Whatman filter paper at 37°C for 15 minutes, and then blocked in 1x Tris

Buffered Saline with 0.1% Tween-20 (1xTBST) supplemented with 5% (w/v) nonfat dry milk (plus 1% bovine serum albumin for detection of endogenous EPAS1/Epas1). After blocking for 1 hour at room temperature on a rocking platform, membranes were incubated with primary antibodies overnight on a rocker in the 4°C cold room. The following day, blots were rinsed

5 times for 5 minutes with 1xTBST at medium intensity on the rocking platform.

Membranes were then incubated with secondary antibody for 2 hours at room temperature with rocking at low intensity. Membranes were rinsed again 5 times for 5 minutes with 1xTBST at medium intensity. Blots were treated with SuperSignal West Pico Chemiluminescent substrates

(Thermo Scientific, Waltham, MA, USA), which were added to the membranes at equal volumes for 5 minutes at room temperature in a dim environment. Blots were developed using HyBlot CL autoradiography film (Denville Scientific, Inc., Metuchen, NJ, USA) and an automatic X-ray film processor. Exposure times generally ranged from 1 second to 10 minutes depending on the protein target. Densitometry was performed using ImageJ and target protein was normalized to GAPDH loading control.
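The GAPDH normalization of densitometry values can be illustrated with a short calculation. The sketch below is one common way to express knockdown relative to a control lane and is an assumption about the arithmetic, not a description of the exact ImageJ workflow used; the function names and band intensities are hypothetical.

```python
def normalized_band(target_intensity, gapdh_intensity):
    """Express a target band relative to the GAPDH band from the same lane."""
    return target_intensity / gapdh_intensity


def percent_knockdown(treated, control):
    """Percent reduction of the normalized target signal relative to the control lane.

    Each argument is a (target_intensity, gapdh_intensity) tuple.
    """
    treated_norm = normalized_band(*treated)
    control_norm = normalized_band(*control)
    return (1.0 - treated_norm / control_norm) * 100.0


# Hypothetical ImageJ band intensities (arbitrary units), not measured data
si_control_lane = (2000.0, 4000.0)  # target band, GAPDH band
si_target_lane = (700.0, 4000.0)

knockdown = percent_knockdown(si_target_lane, si_control_lane)
print(f"Knockdown relative to control: {knockdown:.0f}%")
```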

4.3.9 Antibodies for Western Blotting

Anti-GFP mouse monoclonal IgG antibody (sc-9996) was purchased from Santa

Cruz Biotechnology, Inc. (Dallas, TX, USA) and used at a 1:500 dilution in 1xTBST to detect Epas1-GFP fusion protein from pEGFP-Epas1-transfected cells. Rabbit polyclonal

HIF2α/EPAS1 antibody (Cat. No. NB100-122) was purchased from Novus Biologicals

(Littleton, CO, USA) and used at a 1:1000 dilution (in 1xTBST, 5% nonfat dry milk, 1% bovine serum albumin) to detect endogenous EPAS1 in cells treated with siControl or siEPAS1. The loading control protein Gapdh/GAPDH was probed with an anti-GAPDH mouse monoclonal IgG antibody (sc-32233, Santa Cruz Biotechnology, Inc., Dallas, TX,

USA) at a 1:500 dilution in 1xTBST. Peroxidase-conjugated AffiniPure goat anti-mouse

IgG antibody was purchased from Jackson ImmunoResearch Laboratories, Inc. (West

Grove, PA, USA) and used at a 1:5000 dilution in 1xTBST to detect GFP and GAPDH.

To detect the anti-EPAS1 primary rabbit IgG, affinity-purified horseradish peroxidase-linked goat anti-rabbit IgG was used as the secondary antibody (Cell Signaling

Technology, Inc., Danvers, MA, USA) at a 1:1000 dilution.

4.3.10 Real Time Quantitative PCR Analysis of β-catenin/TCF Transcriptional Targets

Human SW480 colon adenocarcinoma cells transfected with pEGFP-N1 EV or pEGFP-Epas1 expression vector were harvested for RNA at 24 hours post-transfection.

Total RNA was isolated from cells using 1mL Ribozol (Amresco LLC, Solon, OH, USA) following the manufacturer’s protocol. The iScript cDNA synthesis kit (Bio-Rad, Hercules, CA, USA) was used to prepare cDNA from 1µg of extracted RNA. Bill Hankey from Dr. Joanna Groden’s laboratory generously provided primers for Sybr Green real time quantitative PCR (qPCR) analysis of c-MYC and 18S rRNA. The 18S rRNA was used as the housekeeping gene for normalization. Primers were diluted to

10μM stock concentrations in 1x Tris-EDTA buffer and used at 0.5μM final concentration in the qPCR reactions, along with 5μL of 2x Fast Sybr Green Master Mix

(Life Technologies, Carlsbad, CA, USA), 0.2μL of the 20μL cDNA product (~10ng cDNA), and water up to 10μL total reaction volume. Sybr Green qPCR was run using 40 cycles of thermocycler conditions specified by the Fast Sybr Green Master Mix. The

2^(−ΔΔCT) method was used to calculate normalized relative expression values. Student’s t-test (www.graphpad.com) was used to compare pEGFP-N1 EV and pEGFP-Epas1 samples for statistically significant differential expression of c-MYC.
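The reaction composition given above can be checked with simple arithmetic. The sketch below assumes that the 1µg of RNA input is converted to an equal mass of cDNA in the 20μL reverse transcription reaction, which is a simplification; the function names are hypothetical.

```python
def cdna_mass_ng(rna_input_ug, rt_volume_ul, aliquot_ul):
    """Approximate cDNA mass in an aliquot, assuming 1:1 conversion of RNA to cDNA by mass."""
    concentration_ng_per_ul = rna_input_ug * 1000.0 / rt_volume_ul
    return concentration_ng_per_ul * aliquot_ul


def primer_volume_ul(stock_um, final_um, reaction_volume_ul):
    """Volume of primer stock needed to reach the final concentration (C1*V1 = C2*V2)."""
    return final_um * reaction_volume_ul / stock_um


# Values taken from the text: 1 ug RNA in a 20 uL cDNA reaction, 0.2 uL used per well
template_ng = cdna_mass_ng(rna_input_ug=1.0, rt_volume_ul=20.0, aliquot_ul=0.2)

# 10 uM primer stock diluted to 0.5 uM in a 10 uL reaction
primer_ul = primer_volume_ul(stock_um=10.0, final_um=0.5, reaction_volume_ul=10.0)

print(f"Template per well: ~{template_ng:.0f} ng cDNA")
print(f"Primer stock per well: {primer_ul:.2f} uL of each primer")
```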

4.3.11 Real Time Quantitative PCR Analysis of Csnk1a1 and Ptprj Expression in Epas1-Expressing Cells

To assess Csnk1a1 and Ptprj mRNA levels in Fred cells transfected with pEGFP-N1 EV or pEGFP-Epas1 expression vector, total RNA was collected from cells and 1µg was reverse transcribed into cDNA using the iScript cDNA synthesis kit (Bio-Rad,

Hercules, CA, USA). Approximately 20ng cDNA (0.4µL of 20µL cDNA product) was run per well on a real time qPCR plate using probes purchased from Integrated DNA

Technologies (Coralville, IA, USA) and TaqMan Fast Advanced Master Mix (Life

Technologies, Carlsbad, CA, USA). The qPCR primer/probe sets used for detection of

Csnk1a1, Ptprj, and the housekeeping gene Hprt are described in Table 7. The 2^(−ΔΔCT) method was used to calculate Hprt-normalized relative expression values. Student’s t-test

(www.graphpad.com) was used to compare pEGFP-N1 EV and pEGFP-Epas1 samples for statistically significant differential expression of target genes.
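For readers unfamiliar with the 2^(−ΔΔCT) calculation used in both qPCR analyses, a minimal sketch follows. The function name is hypothetical and the CT values are invented for illustration; they are not data from this study. The target gene CT is first normalized to the housekeeping gene within each condition (ΔCT), the control ΔCT is subtracted from the test ΔCT (ΔΔCT), and the result is converted to a fold change.

```python
from statistics import mean


def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of the target gene in test vs. control samples by 2^(-ddCT)."""
    delta_ct_test = mean(ct_target_test) - mean(ct_ref_test)
    delta_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate CT values (not measured data)
ev_target = [24.0, 24.1, 23.9]     # target gene, empty vector control
ev_reference = [12.0, 12.1, 11.9]  # housekeeping gene, empty vector control
epas1_target = [25.0, 25.1, 24.9]  # target gene, pEGFP-Epas1
epas1_reference = [12.0, 12.1, 11.9]

fold_change = relative_expression(epas1_target, epas1_reference, ev_target, ev_reference)
print(f"Relative expression vs. empty vector: {fold_change:.2f}")
```

In this invented example the target crosses threshold one cycle later in the test condition with an unchanged reference gene, which corresponds to a two-fold reduction.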

4.3.12 MTT Cell Viability Assay in Epas1-Expressing Fred Cells

An MTT cell viability assay was performed in Fred cells. The MTT Cell

Proliferation Kit was purchased from Roche Life Science (Basel, Switzerland) and used according to the manufacturer’s protocol. Briefly, Fred cells were transfected in 6-cm culture dishes with 5.525µg pEGFP-N1 EV or pEGFP-Epas1 expression vector, or mock transfected using 14µL Lipofectamine 2000. After 24 hours, the cells were collected from the 6-cm culture dishes and replated at 3000 cells/well in 100µL full-serum medium. The cells were replated in quadruplicate in 4 separate 96-well plates for assessment at 24, 48,

72 and 96 hours post-transfection. Cells were also replated into 6-well culture dishes for daily collection of RNA and protein. At each 24 hour interval, cells were treated with

10µL MTT reagent and incubated at 37°C. Four hours later cells were solubilized using

100µL of solubilization reagent and returned to the incubator. The following day, plates were read for absorbance in a plate reader at 550nm and 690nm wavelengths. This was repeated for each time point, and the average optical density (550nm minus 690nm) and standard deviation for the quadruplicates were calculated at each 24 hour interval. The data are presented as a line graph generated using Microsoft Excel. ANOVA was used to determine statistically significant differences in proliferative rate (as determined by the slope of the line) across the three conditions. An alpha of 0.05 was used as the threshold for statistical significance.
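The MTT data reduction described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the absorbance readings are invented, and the slope is estimated by ordinary least squares, which is one reasonable reading of "the slope of the line". The ANOVA across conditions is not reproduced.

```python
from statistics import mean, stdev


def corrected_od(a550, a690):
    """Subtract the 690nm reference reading from the 550nm reading, well by well."""
    return [signal - reference for signal, reference in zip(a550, a690)]


def summarize(a550, a690):
    """Return (mean, standard deviation) of the corrected OD across replicate wells."""
    od = corrected_od(a550, a690)
    return mean(od), stdev(od)


def growth_slope(hours, mean_od):
    """Ordinary least-squares slope of mean OD versus time (OD units per hour)."""
    h_bar, od_bar = mean(hours), mean(mean_od)
    numerator = sum((h - h_bar) * (od - od_bar) for h, od in zip(hours, mean_od))
    denominator = sum((h - h_bar) ** 2 for h in hours)
    return numerator / denominator


# Hypothetical quadruplicate readings for one condition: (A550 wells, A690 wells)
hours = [24, 48, 72, 96]
readings = {
    24: ([0.24, 0.26, 0.25, 0.25], [0.05, 0.05, 0.05, 0.05]),
    48: ([0.44, 0.46, 0.45, 0.45], [0.05, 0.05, 0.05, 0.05]),
    72: ([0.64, 0.66, 0.65, 0.65], [0.05, 0.05, 0.05, 0.05]),
    96: ([0.84, 0.86, 0.85, 0.85], [0.05, 0.05, 0.05, 0.05]),
}

means = [summarize(*readings[h])[0] for h in hours]
slope = growth_slope(hours, means)
print(f"Proliferation slope: {slope:.4f} OD units per hour")
```

Computing one slope per condition (mock, pEGFP-N1 EV, pEGFP-Epas1) from replicate experiments would yield the values that are then compared statistically.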

4.4 Results

4.4.1 Epas1 and Csnk1a1 are Differentially Expressed in Balb/cHeA and STS/A Colon

From our RNA-seq and ASI studies described in Chapters 2 and 3, Epas1 and

Csnk1a1 and their human homologues emerged as interesting candidate genes for Scc4 and Scc5, respectively. The functional annotation of these genes described in the literature also supports a plausible role for Epas1 and Csnk1a1 in colon carcinogenesis.

Because our RNA-seq data did not identify any protein-damaging polymorphic variants between the CRC-susceptible and CRC-resistant parental mouse strains used to map Scc4 and Scc5, we were interested in the possibility that gene expression differences between these strains underlie their differential sensitivity to chemically induced intestinal carcinogenesis. We also wanted to assess the expression of Ptprj, the susceptibility gene at Scc1 which interacts with Scc5, in these strains.

We first evaluated to what extent these genes are expressed in CRC-resistant (Balb/cHeA) and CRC-susceptible (STS/A) mouse colons. We tested transcript levels of the genes Epas1, Csnk1a1, and Ptprj in the normal, healthy colons of 3 female mice per strain to determine if the baseline levels of these genes differ based on the genetic background of the mouse. Given the putative role of Epas1 as a coactivator of β-catenin/TCF-driven transcription and as a stabilizer of c-MYC/MAX transactivation, we predicted that the susceptible STS/A colons would express higher levels of Epas1, thereby rendering them more susceptible to pro-proliferative β-catenin- and c-Myc-mediated transcriptional programs. Conversely, we hypothesized that STS/A animals would have lower levels of Csnk1a1 in the colon, as Csnk1a1 functions as a negative regulator of β-catenin by targeting it for degradation. With regard to the tumor suppressor gene Ptprj, which maps to the Scc1 locus, we anticipated that STS/A animals would demonstrate reduced expression in the colon compared to Balb/cHeA mice.

To test our hypotheses, we collected total RNA from homogenized colon tissue and conducted reverse transcription followed by qPCR. Whole colon tissues from

Balb/cHeA and STS/A female mice differ significantly in their mRNA levels of Epas1

(Figure 10A), with Balb/cHeA colons expressing over three-fold higher Epas1 (p-value =

0.0213). Balb/cHeA colons also had significantly higher levels of total Csnk1a1 (p =

0.0093; Figure 10B). No significant differences in Ptprj levels were detected (Figure

10C). Importantly, the primer/probe sets that capture Csnk1a1 and Ptprj were selected to measure total expression of all transcript variants. Thus, we cannot compare transcript levels of specific alternative splice forms across strains.

4.4.2 Epas1 Enhances β-catenin/TCF-Driven Transcription of the TOPflash Luciferase Reporter Plasmid Under Hypoxic Conditions

We next tested the hypothesis that Epas1 functions to enhance β-catenin/TCF-driven transcription in colon epithelial cells in the same manner as was observed in renal cell carcinoma.124 To perform this experiment, we subcloned the Mus musculus Epas1 open reading frame into the pEGFP-N1 expression vector to express an Epas1-EGFP fusion protein in transfected cells. Attempts at subcloning a Homo sapiens EPAS1 open reading frame into pEGFP-N1 were unsuccessful, but because the mouse and human proteins share 88% homology at the amino acid level, we proceeded to use Epas1 in both mouse and human cell culture experiments. Expression of Epas1-EGFP fusion protein was verified at 24 hours post-transfection by Western blot (Figure 11). The pEGFP-Epas1 expression construct was co-transfected with a LacZ internal control plasmid and either the

TOPflash or FOPflash luciferase reporter into SW480 or HCT116 cells and incubated for

24 hours in either normoxia (atmospheric O2, or ~20-21%) or hypoxia (1% O2).

Using our TOPflash/FOPflash reporter system, we observed a modest activation of β-catenin/TCF-mediated transcription in pEGFP-Epas1 transfected SW480 cells compared to pEGFP-N1 EV under normoxic conditions (Figure 12A). No significant effect of enforced Epas1 expression in HCT116 cells in normoxia was observed (Figure

12B). In both cell lines, hypoxia induced a significant increase in reporter activity in

Epas1-GFP-expressing cells compared to pEGFP-N1 EV-expressing cells (p < 0.05).

Reporter activity in pEGFP-Epas1-transfected cells was also significantly increased in hypoxic compared to normoxic conditions in both cell lines (p < 0.05; Figure 12A and B). Hypoxic culture conditions unexpectedly reduced reporter activity of pEGFP-N1 EV-transfected HCT116 cells. This observation is contrary to the prediction that hypoxia would stabilize endogenous EPAS1 and lead to upregulated activity. Collectively, these data suggest that enforced expression of Epas1 in human colon adenocarcinoma cell lines has little to modest effect under normoxic conditions, but robustly stimulates β-catenin/TCF activity when cells are under hypoxic stress.

Our observations of the effects of Epas1 overexpression on TOPflash/FOPflash reporter activity led us to question the effects of knocking down endogenous EPAS1 in

HCT116 and SW480 cells. To establish the specificity and efficiency of knockdown by our control and EPAS1-targeting siRNA molecules, we first performed a dose-response experiment with siControl and siEPAS1 in HCT116 cells to examine their effects on endogenous EPAS1 protein levels. For this assay, HCT116 cells were transfected with siControl or siEPAS1 to a final concentration of 1nM, 5nM, or 10nM. Protein levels of EPAS1 were markedly reduced when cells were administered siEPAS1 at final concentrations of 5nM (~65% knockdown relative to 5nM siControl) and 10nM (~78% knockdown relative to 10nM siControl), while EPAS1 levels were unaffected by the negative control siRNA at all tested doses (Figure 13A and B). When HCT116 cells were transfected with 5nM siControl or siEPAS1 and then transfected the following day with the luciferase reporters for 24 hours, we observed no effect of reduced endogenous EPAS1 on β-catenin/TCF activity in normoxia (Figure 14). Knockdown studies in SW480 cells, and in HCT116 cells under hypoxia, have yet to be performed.

4.4.3 Epas1-GFP Expression does not Stimulate c-MYC Expression in Normoxia

To determine if our Epas1-GFP fusion protein could stimulate the expression of the canonical Wnt/β-catenin target gene c-MYC in SW480 cells, as suggested by our artificial TOPflash/FOPflash reporter system, we introduced pEGFP-N1 EV or pEGFP-Epas1 into cells and then assessed c-MYC transcript levels at 24 and 48 hours post-transfection using Sybr Green qPCR. In this preliminary study, we did not see any upregulation of c-MYC as a result of Epas1-GFP expression at either time point tested (Student’s t-test p-value > 0.05).

4.4.4 Epas1 Expression is Associated with Decreased Csnk1a1 and Ptprj Transcript Levels in an Immortalized Mouse Colon Epithelial Cell Line

Next we inquired if Epas1 could potentially interact with the candidate genes at

Scc5 (Csnk1a1) and Scc1 (Ptprj) by contributing to their transcriptional regulation. We transfected our pEGFP-Epas1 expression construct into Fred cells and collected RNA at

48 hours post-transfection. Using qPCR, we assessed total transcript levels of all Csnk1a1 and Ptprj isoforms. We observed modestly decreased levels of Csnk1a1 in Fred cells expressing murine Epas1-GFP fusion protein at 48 hours (Figure 16A). Transcript levels of Ptprj were reduced by over 40% in Epas1-transfected cells after 48 hours (Figure

16B). Because Epas1 is traditionally a positive regulator of transcription, an unknown intermediate may indirectly link Epas1 to the regulation of Csnk1a1 and Ptprj gene expression.

4.4.5 Epas1-GFP does not Alter Proliferation or Viability in Immortalized Mouse Colon Epithelial Cells Grown Under Normoxic Conditions

Given that mRNA expression of the tumor suppressor Ptprj was reduced in Fred cells expressing Epas1-GFP, and considering that Ptprj has an established role in antagonizing receptor tyrosine kinase-mediated cell proliferation and survival, we hypothesized that Fred cells with enforced Epas1 expression would show increased cell

proliferation and viability. To test this, we performed an MTT proliferation assay comparing mock, pEGFP-N1 EV, and pEGFP-Epas1 transfection conditions over the course of 24 to 96 hours under normoxic cell culture conditions. Analysis of the slopes of the growth curves for mock, pEGFP-N1 EV, and pEGFP-Epas1 revealed no statistically significant effect of Epas1-GFP on proliferation and viability under conditions of normal

O2 levels (Figure 17, ANOVA p-value > 0.05).

4.5 Discussion

Hypoxia inducible factors are established regulators of numerous cellular processes, including angiogenesis, cell proliferation, cell survival, and glycolytic metabolism.119 As such, they are intriguing genes to study for their role in the development and progression of cancer. Hif2α, or Epas1, emerged as a top candidate susceptibility gene for the Scc4 locus that interacts reciprocally with Scc5 to confer risk in a mouse model of chemically induced colon cancer.50 In humans, polymorphisms in

EPAS1 have been associated with risk for ccRCC and prostate cancer.111,112 While no

SNPs in EPAS1 have been implicated in CRC susceptibility thus far, it is possible that a variant in EPAS1 requires an interacting partner to modify risk—perhaps in a manner similar to that observed between Scc4 and Scc5 in the mouse.50 In our studies, we have aimed to characterize the functional role of this gene in normal and cancerous colon cell lines and to test for a potential biological relationship between Epas1 and Csnk1a1.

4.5.1 Colon Transcript Levels of Epas1, Csnk1a1, and Ptprj are Lower in the CRC-Sensitive Strain STS/A Compared to the CRC-Resistant Strain Balb/cHeA

In previous RNA-seq studies (described in Chapter 3), we compared the transcriptomes of one Balb/cHeA (CRC-resistant) and one STS/A (CRC-susceptible) normal mouse colon. None of the coding SNPs identified in our RNA-seq experiment that map to Scc4, Scc5, or Scc13 were predicted to damage protein structure or function by the in silico tools SIFT and PolyPhen-2. This suggests that regulatory

SNPs that affect gene expression in cis could underlie the susceptibility at the Scc loci we interrogated. We evaluated transcript levels of Epas1 (Scc4 candidate), Csnk1a1 (Scc5 candidate), and Ptprj (Scc1 candidate) in the normal colons of 3 Balb/cHeA female mice and 3 STS/A female mice to determine if basal differences in the expression of these genes exist between the strains and could potentially contribute to their differential sensitivity to chemically induced CRC. We observed lower expression of all 3 genes in the colons of the CRC-susceptible STS/A mice, although only the differences in Epas1 and Csnk1a1 reached statistical significance (Figure 10).

The Scc1 gene, Ptprj, has previously been found to contain a handful of amino acid-changing polymorphisms that differ between Balb/cHeA and STS/A.55 No studies have yet characterized the effects of these polymorphisms on Ptprj protein structure, localization, enzymatic activity, or other properties. Here, we describe expression differences in this gene, which suggest that transcript levels of Ptprj (in addition to, or in concert with, coding variants) could contribute to the susceptibility linked to the Scc1 locus. Lower levels of Ptprj are observed in the more cancer-sensitive strain, STS/A, in accordance with the tumor suppressor properties of this protein tyrosine phosphatase (Figure 10C).

Casein kinase 1α1 (CSNK1A1) is the component of the β-catenin destruction complex responsible for phosphorylating Serine residue 45 of β-catenin, thereby priming it for subsequent phosphorylation by GSK3β and targeting it for proteasomal degradation.134 We found Csnk1a1 to be expressed at lower levels in STS/A colon

(Figure 10B). This may contribute to CRC sensitivity if low Csnk1a1 abundance limits the formation of β-catenin destruction complexes, and therefore facilitates the accumulation of unphosphorylated β-catenin within the cell.

Contrary to its hypothesized role as an oncogene-like stimulator of β-catenin and c-MYC transcriptional programs, Epas1 is transcriptionally more abundant in the colon of the CRC-resistant Balb/cHeA strain (Figure 10A). However, Epas1 maps to a locus that participates in a reciprocal interaction to modify risk wherein the allele present at one interacting locus takes on CRC-resistant or CRC-susceptible properties depending on what allele is present at its interacting partner.50 Given this complex relationship, it is difficult to predict how levels of Epas1 influence CRC risk when the biological nature of its statistical interaction is unknown.

4.5.2 Epas1 Promotes β-catenin/TCF Transcriptional Activity in Human Colon Adenocarcinoma Cells Under Hypoxic Conditions

Previous studies implicating EPAS1 as a coactivator of β-catenin/TCF-mediated transcription in human ccRCC cell lines led us to hypothesize that it may contribute to

the aberrant activation of the Wnt/β-catenin pathway frequently observed in CRC.124 We performed TOPflash/FOPflash luciferase reporter assays to measure β-catenin/TCF transcriptional activity in SW480 and HCT116 cells transfected with either pEGFP-N1

EV or pEGFP-Epas1. Overexpression of Epas1 in SW480 cells led to a slight increase in reporter activity under normoxia, and this effect was amplified in hypoxic conditions

(Figure 12A). Because the Epas1 cDNA that was subcloned into pEGFP-N1 contains an intact oxygen-dependent degradation domain, the Epas1-GFP fusion protein is likely subject to the same rapid turnover under high oxygen as its endogenous counterpart.

Thus, activation effects of Epas1 are likely blunted under normoxia. Under hypoxic conditions, however, Epas1 escapes hydroxylation and traffics to the nucleus to participate in transcriptional regulation.135 While not yet tested here, it is likely that

Epas1 levels are increased during hypoxia in SW480 and that more protein accumulates within the nucleus to function as a coactivator of β-catenin/TCF.

In HCT116 cells, enforced expression of Epas1 in normoxia was not sufficient to stimulate β-catenin/TCF activity, but effectively upregulated transcription of the luciferase reporter gene under conditions of hypoxic stress (Figure 12B). Silencing of endogenous EPAS1 (via siRNA) in HCT116 cells in normoxia was not sufficient to decrease TOPflash/FOPflash reporter activity, but this was not tested under conditions of hypoxia (Figure 14). Interestingly, hypoxia reduced β-catenin/TCF activity in pEGFP-N1

EV-transfected cells (Figure 12B). This could be the result of endogenous HIF1α exerting negative regulation on the β-catenin/TCF complex when stabilized by low oxygen, although this remains speculative. An important consideration for experiments performed in this cell line is that HCT116 cells express HIF1α at much higher levels than HIF2α

(EPAS1), and HIF1α and HIF2α can have opposing effects on certain transcriptional programs (e.g. c-MYC-driven transcription).110,132 Though not tested here, it is possible that HIF1α exerts opposing effects on β-catenin/TCF activity that outweigh the effects of

HIF2α under normoxia. We have not compared endogenous HIF1α and Epas1-GFP for protein abundance or binding affinity for β-catenin, so it remains unclear how the balance of activity between HIF1α and HIF2α might influence our assessment of β-catenin/TCF activity in HCT116.

Ongoing investigations are focused on (1) repeating these TOPflash/FOPflash reporter assays in both HCT116 and SW480 cells cultured in normoxia and hypoxia, (2) confirming stabilization of Epas1-GFP by Western blotting protein lysates from hypoxia-treated cells, and (3) performing our siRNA-knockdown experiments in hypoxia. If future work validates the preliminary findings shown here, the next step would be to perform co-immunoprecipitation experiments to test the hypothesis that EPAS1 physically interacts with β-catenin to stabilize the β-catenin/TCF complex and to facilitate targeting of this complex to the TCF/LEF binding sequences. Alternatively, it is possible that EPAS1 stimulates β-catenin/TCF activity by regulating expression of β-catenin, as has been observed in pancreatic intraepithelial neoplasia.136 To test this, we can overexpress or knock down EPAS1 in our cell lines and assess β-catenin expression at the transcript and protein level. Collectively, these studies will shed light on the role of Epas1/EPAS1 in modulating Wnt/β-catenin signaling in CRC and help evaluate its potential as a candidate susceptibility gene at Scc4/SCC4.

4.5.3 Epas1 does not Affect Expression of c-MYC in Normoxia, but may Regulate Levels of Csnk1a1 (Scc5) and Ptprj (Scc1)

We did not observe a regulatory effect of Epas1-GFP expression on the oncogene and canonical Wnt/β-catenin target c-MYC in SW480 cells cultured in normoxia (Figure

15). SW480 are positive for c-MYC amplification and thus may already be saturated for expression, potentially obscuring any further activation of its promoter by EPAS1.137,138

Studies conducted under hypoxia and utilizing both overexpression and knockdown systems will be needed to fully explore the hypothesis that EPAS1 stimulates β-catenin/TCF-driven transcription of genes such as c-MYC. Other β-catenin/TCF target genes whose expression will be investigated under these experimental conditions include AXIN2, BMP4, CCND1, MET, TCF1, and VEGF. These experiments represent an important means of validating the data from the TOPflash/FOPflash reporter, which is a well-established and widely used means of measuring β-catenin/TCF activity but is nonetheless an artificial system.

In the immortalized C57Bl/6 colon epithelial cell line, we observed downregulation of both Csnk1a1 and Ptprj when Epas1-GFP was expressed in normoxia

(Figure 16). This suggests a possible “interactome” in which the Scc4 candidate Epas1 functions as the central hub modifying transcription of the Scc candidate genes Csnk1a1 and Ptprj. Of particular note is a publication suggesting that EPAS1 exerts repressive transcriptional regulation of PTPRJ in ccRCC cells.125 Although EPAS1 is largely characterized as a transcriptional activator, it is possible that in certain contexts it represses the transcription of genes like PTPRJ, perhaps by differentially associating with repressors or by recruiting repressive chromatin remodelers to the hypoxia response element(s) (HRE) within the target gene promoter. Alternatively, it is possible that EPAS1 down-modulates gene transcription by activating expression of a repressor of PTPRJ, thereby requiring an intermediary to exert repressive effects. Given that hypoxic stress generates rapid responses in cells, however, one could hypothesize that the hypoxia inducible factors have dual roles as activators and repressors so that they may more efficiently modulate expression of genes involved in cellular adaptation to limited oxygen. Indeed, there are a few documented examples of genes that are directly repressed by HIF1α at HRE sites within the target’s promoter, raising the possibility that the highly homologous HIF2α (EPAS1) may also repress gene transcription of certain targets.139-141

4.5.4 Enforced Epas1-GFP Expression in Immortalized Mouse Colon Epithelial Cells does not Induce Cell Proliferation in Normoxia

Epas1/EPAS1 stimulates cell cycle progression and proliferation of renal carcinoma cells, NIH3T3 cells, HEK293 cells, and embryonic endothelial cells and promotes transformation of murine embryonic fibroblasts.110 Our MTT cell proliferation assay in the Fred line of immortalized mouse colon epithelial cells did not reveal any growth-promoting effects of enforced Epas1 expression when cells were exposed to atmospheric oxygen levels (Figure 17). Hypoxia incubation of cells for this assay has not yet been performed, but may uncover effects of Epas1 on cell proliferation or viability. If no proliferation/viability differences are observed under hypoxic conditions, another possibility is that growth-promoting effects of Epas1 are dependent on an activated β-catenin or c-Myc axis, which may be absent in our immortalized colon epithelial line.

4.6 Conclusions

Our experimental results reveal differential expression of Epas1, Csnk1a1, and Ptprj in the normal colon of CRC-resistant Balb/cHeA and CRC-susceptible STS/A mice at the transcript level. Preliminary studies in an immortalized C57Bl/6 mouse colon epithelial line suggest that the hypoxia inducible factor Epas1 (mapping within the colon cancer susceptibility locus Scc4) may be involved in the negative regulation of Csnk1a1 and Ptprj, two candidate CRC susceptibility genes mapping to Scc5 and Scc1, respectively. Variants that control murine Epas1 expression, splicing, stability, or protein structure remain to be uncovered, as our RNA-seq analysis failed to unearth any damaging coding SNPs that differ between the CRC-resistant Balb/cHeA and CRC-susceptible STS/A parental strains. However, our preliminary functional studies of Epas1 in colon epithelial cell lines provide a foundation supporting further investigation of this gene as a candidate CRC susceptibility gene at Scc4.

Notably, our experimental results support published observations that EPAS1 acts as a positive regulator of β-catenin/TCF-driven transcription. To our knowledge, we are the first to assess this relationship in human colon adenocarcinoma cell lines. In addition to its documented role as a coactivator of β-catenin/TCF activity and a positive regulator of β-catenin transcription in cancer cells, EPAS1 has been linked to the stabilization of c-MYC/MAX complexes. By interacting with MAX, EPAS1 stabilizes the formation of c-MYC/MAX dimers and thereby promotes c-MYC transcriptional programs in WT8 cells (786-O RCC cells expressing pVHL).110,132 We have not yet pursued the potential role of EPAS1 in modulating c-MYC activity in colon cancer lines, many of which have genomic amplifications of c-MYC and/or upregulated expression of c-MYC.138 Real time qPCR analysis of the c-MYC targets Cyclin D1 (CCND1), p21 (CDKN1A), and p27 (CDKN1B) in colon cancer cell lines with enforced or silenced expression of EPAS1 would shed light on contributions of EPAS1 to c-MYC-mediated transcription.

By stimulating two transcriptional programs that are frequently hyper-activated in human CRC, Wnt/β-catenin and c-MYC, the transcription factor EPAS1 may function as an oncogene to promote cell proliferation, survival, migration, and other tumorigenic processes. The convergence of this gene on these critical cancer pathways warrants further investigation in both mouse and human cell culture systems, and ultimately in transgenic mouse models of spontaneous and chemically induced colon cancer.

IDT Assay ID        Gene Query  NCBI Gene Symbol  Species       Ref Seq #  Transcripts Hit          Detects All Variants?  Location    Fluorophore  Quenchers          Primer:Probe
Mm.PT.39a.22214828  Hprt1       Hprt              Mus musculus  NM_013556  NM_013556                Yes                    Exon 6-7    FAM          ZEN/Iowa Black FQ  2:2
Mm.PT.58.17527058   Csnk1a1     Csnk1a1           Mus musculus  NM_146087  NM_146087                Yes                    Exon 7-8    FAM          ZEN/Iowa Black FQ  2:2
Mm.PT.53a.10504364  Ptprj       Ptprj             Mus musculus  NM_008982  NM_001135657, NM_008982  Yes                    Exon 21-22  FAM          ZEN/Iowa Black FQ  2:2
Mm.PT.51.16652583   Epas1       Epas1             Mus musculus  NM_010137  NM_010137                Yes                    Exon 4-5    FAM          ZEN/Iowa Black FQ  2:2

Table 7. Real time qPCR primer/probe assays purchased from Integrated DNA Technologies (Coralville, IA, USA)

Figure 7. Protein domain structures of the hypoxia-inducible factor subunits HIF1α, HIF1β (ARNT), and HIF2α (EPAS1). The following domains are shown: basic helix-loop-helix domain (bHLH), Per-Arnt-Sim homology domain (PAS), O2-dependent degradation domain (ODDD), and NH2- and COOH-terminal transactivation domains (TAD-N and TAD-C). Heterodimers are formed by HIF1α/HIF1β or HIF2α/HIF1β. HIF1β lacks the ODDD and is constitutively expressed under both normoxic and hypoxic conditions.

Image borrowed from: Prabhakar NR, Semenza GL. Adaptive and maladaptive cardiorespiratory responses to continuous and intermittent hypoxia mediated by hypoxia-inducible factors 1 and 2. Physiol Rev 2012;92(3):967-1003.

Figure 8. Schematic detailing the regulation of HIFα subunits under normoxia and hypoxia. Under normoxic conditions (top), HIFα protein levels are regulated by hydroxylation via prolyl hydroxylase domain (PHD) proteins and ubiquitination via von Hippel-Lindau (VHL) protein, targeting the protein for degradation.

During hypoxia (bottom), HIFα escapes hydroxylation, leading to protein accumulation, nuclear translocation, and formation and binding of the transcriptional complex to target genes. Ub, ubiquitin; CBP, CREB binding protein; HRE, hypoxia response element.

Image borrowed from: Shimoda LA, Laurie SS. HIF and pulmonary vascular responses to hypoxia. J Appl Physiol 2014;116(7):867-74.

Figure 9. Highlighted genes and cellular processes that are transcriptionally regulated by the HIF family. Genes encoding proteins involved in numerous aspects of tumor initiation, growth and metastasis are transcriptionally activated by either HIF1α or HIF2α. The HIFs thus exert control over diverse processes, such as inflammatory cell recruitment, proliferation, survival, metabolism and mitochondrial function, extracellular matrix remodeling, motility, angiogenesis, and pH regulation.

Image borrowed from: Bertout JA, Patel SA, Simon MC. The impact of O2 availability on human cancer. Nat Rev Cancer 2008;8(12):967-75.

[Figure 10 plots: (A) Epas1 Expression in the Colon (p = 0.0093); (B) Csnk1a1 Expression in the Colon (p = 0.0213); (C) Ptprj Expression in the Colon (p = 0.1098). x-axis: Mouse Strain (Balb/cHeA, STS/A); y-axis: Average % of Hprt Expression.]

Figure 10. Real time qPCR analysis of Epas1 (A), Csnk1a1 (B), and Ptprj (C) in the normal healthy colons of Balb/cHeA and STS/A female mice (n = 3 mice per strain)

The average expression of the target gene is presented as a percentage of Hprt expression. P-values were calculated using Student's t-test. Real time primer/probe assays are detailed in Table 7.
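The calculation behind a panel of this kind can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in this work: it assumes that percent-of-reference expression is derived from threshold cycle (Ct) values with 100% amplification efficiency, and that the t-test is the two-sided, equal-variance form. The function names and the Ct values are hypothetical.

```python
import math
from statistics import mean, variance


def percent_of_reference(ct_target, ct_reference):
    """Target expression as a percentage of the reference gene.

    Assumes 100% PCR efficiency (a doubling per cycle); this is an
    illustrative assumption, not a detail taken from the text.
    """
    return 100.0 * 2.0 ** (ct_reference - ct_target)


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def students_t_test(a, b, steps=20000):
    """Two-sided, equal-variance Student's t-test; returns (t, df, p).

    steps must be even because the p-value is obtained with Simpson's rule.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    # Integrate the density from 0 to |t|, then subtract from both tails.
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    p = max(0.0, 1.0 - 2.0 * area)
    return t, df, p


# Hypothetical (target Ct, Hprt Ct) pairs for three mice per strain; not data from this study.
strain_a = [percent_of_reference(t, r) for t, r in [(24.1, 23.5), (24.3, 23.6), (24.0, 23.4)]]
strain_b = [percent_of_reference(t, r) for t, r in [(24.9, 23.5), (25.2, 23.6), (25.0, 23.5)]]
t_stat, dof, p_value = students_t_test(strain_a, strain_b)
print(f"t = {t_stat:.3f}, df = {dof}, p = {p_value:.4f}")
```

With n = 3 mice per strain the test has only 4 degrees of freedom, so modest differences such as the one in panel C may not reach significance.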

[Figure 11 image: Western blot showing Epas1-GFP (~123kDa) and GAPDH (~37kDa).]

Figure 11. Western blot of GFP-tagged Epas1 protein (Epas1-GFP) from transfected SW480 cells. A GFP primary antibody (1:500 in 1xTBST) was used to detect the Epas1-GFP fusion protein in SW480 cells 24 hours post-transfection with pEGFP-Epas1. The pEGFP-N1 EV construct expresses a ~27kDa GFP protein (not pictured). GAPDH is shown in the lower half of the figure to reflect protein loading.

Figure 12. TOPflash/FOPflash reporter assay reveals activation of β-catenin/TCF-driven transcription in SW480 (A) and HCT116 (B) cells cultured under hypoxic (1% O2) conditions. SW480 cells show modest but significant activation of the reporter by Epas1 under normoxic conditions (~21% O2), but this was not observed in HCT116 cells. Data are presented as the average of three replicates, normalized by β-galactosidase activity and shown as the TOPflash to FOPflash ratio of Relative Luminescence Units (TOP:FOP RLU). ANOVA followed by Tukey post hoc tests was used to determine statistical significance. The asterisk (*) indicates a comparison that met the threshold of significance (p-value < 0.05).
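The TOP:FOP normalization described in this legend can be sketched as below. This is an illustration under stated assumptions rather than the exact procedure used here: each luciferase reading is assumed to be divided by the β-galactosidase reading from the same well, and the ratio is assumed to be taken between the replicate means. The function name and readings are hypothetical.

```python
from statistics import mean


def normalized_top_fop(top_luc, top_bgal, fop_luc, fop_bgal):
    """Ratio of mean beta-gal-normalized TOPflash signal to mean
    beta-gal-normalized FOPflash signal.

    Each list holds one reading per replicate well; luciferase and
    beta-galactosidase readings are paired by position.
    """
    top = mean([luc / bgal for luc, bgal in zip(top_luc, top_bgal)])
    fop = mean([luc / bgal for luc, bgal in zip(fop_luc, fop_bgal)])
    return top / fop


# Hypothetical readings for three replicate wells; not data from this study.
ratio = normalized_top_fop(
    top_luc=[5200.0, 4800.0, 5100.0],
    top_bgal=[1.10, 0.95, 1.05],
    fop_luc=[310.0, 290.0, 305.0],
    fop_bgal=[1.00, 0.98, 1.02],
)
print(f"TOP:FOP RLU = {ratio:.1f}")
```

Dividing by FOPflash, which carries mutated TCF binding sites, removes signal that is not specific to β-catenin/TCF activity.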

[Figure 13 image: (A) Western blot showing EPAS1 (~118kDa) and GAPDH (37kDa). (B) Bar chart; x-axis: siRNA Dose (1nM, 5nM, 10nM); y-axis: Intensity Relative to GAPDH; series: siControl, siEPAS1.]

Figure 13. Treatment with siEPAS1 dose-dependently reduces endogenous levels of EPAS1 in HCT116 cells at 24 hours post-transfection under normoxic (~21% O2) conditions

HCT116 cells were transfected with 1, 5, or 10nM siControl or siEPAS1. Protein was collected from cells at 24 hours post-transfection and 20μg was run on a 10% SDS-PAGE gel and blotted for endogenous EPAS1 (A, top panel) and GAPDH (A, bottom panel). Quantification using ImageJ (B) reveals a dose-dependent decrease in EPAS1 expression in siEPAS1-treated cells, with no evident change observed in siControl-treated cells. Data are presented as EPAS1 band intensity normalized to GAPDH. Approximately 65% knockdown was achieved by 5nM siEPAS1 relative to 5nM siControl and about 78% knockdown was attained with 10nM siEPAS1 relative to 10nM siControl.
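The knockdown percentages quoted above follow from GAPDH-normalized band intensities. A minimal sketch of that arithmetic is shown here; the function name and the intensity values are hypothetical and are chosen only to illustrate the calculation.

```python
def percent_knockdown(target_kd, loading_kd, target_ctrl, loading_ctrl):
    """Percent reduction of a loading-normalized band in knockdown vs. control.

    target_*  : band intensity of the protein of interest (e.g. EPAS1)
    loading_* : band intensity of the loading control (e.g. GAPDH)
    """
    kd = target_kd / loading_kd
    ctrl = target_ctrl / loading_ctrl
    return 100.0 * (1.0 - kd / ctrl)


# Hypothetical densitometry values; not measurements from this study.
print(f"{percent_knockdown(2100.0, 10000.0, 6000.0, 10000.0):.0f}% knockdown")
```

Each siEPAS1 dose is compared with the siControl lane at the same dose, so any nonspecific effect of the transfection itself is factored out.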

[Figure 14 plot: HCT116 TOP/FOP Reporter Activity. x-axis: siControl, siEPAS1; y-axis: TOP:FOP RLU.]

Figure 14. Normoxia (~21% O2) does not elicit EPAS1-dependent TOPflash/FOPflash reporter activity in HCT116 cells at 24 hours post-transfection with 5nM siControl or 5nM siEPAS1. Data are depicted as β-galactosidase-normalized ratios of TOPflash to FOPflash reporter activity, averaged across three replicates. The difference was not significant at α = 0.05.

[Figure 15 plot: SW480 c-MYC mRNA Expression. x-axis: 24hr, 48hr; y-axis: Average % of 18S rRNA Expression; series: EV, Epas1.]

Figure 15. The enforced expression of Epas1-GFP in SW480 cells does not alter transcription of the canonical Wnt target gene c-MYC under normoxic culture conditions 24 or 48 hours post-transfection. The mRNA expression of c-MYC was detected by SYBR Green qPCR and normalized to 18S rRNA. Data are depicted as the average percent expression of triplicates with respect to 18S rRNA.

[Figure 16 plots: (A) Csnk1a1 mRNA Expression in Normoxic Fred Cells (p < 0.0001); (B) Ptprj mRNA Expression in Normoxic Fred Cells (p = 0.0097). x-axis: pEGFP-N1 EV, pEGFP-Epas1; y-axis: Average % of Hprt Expression.]

Figure 16. Epas1-GFP expression in Fred cells under normoxic conditions is associated with reduced expression of Csnk1a1 (A) and Ptprj (B)

RNA was collected from triplicate wells of Fred cells 48 hours post-transfection with pEGFP-N1 EV or pEGFP-Epas1 and 1μg was used as template for cDNA synthesis. Real time qPCR was performed using primer/probe assays described in Table 7. Relative percent expression was calculated by normalizing to Hprt. Student's t-test was used to test for significant differences in target gene expression. P-values are in the upper right hand corner of each panel.

[Figure 17 plot: MTT Proliferation Assay in Fred Cells. x-axis: Hours Post-Transfection (24, 48, 72, 96); y-axis: Average OD (550nm - 690nm); series: Mock, pEGFP-N1 EV, pEGFP-Epas1.]

Figure 17. Enforced expression of Epas1-GFP in immortalized C57Bl/6 colon epithelial cells (Fred) does not affect cell viability or proliferation under normoxic culture conditions as measured by MTT assay, 24 to 96 hours post-transfection. Growth curves were generated using the Roche Cell Proliferation Kit to assess mitochondrial metabolism of Fred cells that were transfected with pEGFP-N1 EV or pEGFP-Epas1 expression constructs. Mock-transfected cells were exposed to the transfection reagent and MTT protocol but not treated with any plasmid DNA. Data are presented as the average optical density (OD) of 4 replicates. No statistically significant differences in growth rates were observed by ANOVA at α = 0.05.
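The plotted quantity, average OD (550nm - 690nm), can be sketched as a reference-wavelength subtraction followed by averaging over replicate wells. The snippet below assumes the subtraction is done per well before averaging; the function name and the readings are hypothetical.

```python
from statistics import mean


def mean_corrected_od(od_550, od_690):
    """Average of (OD at 550nm minus OD at 690nm) across replicate wells.

    The 690nm reading serves as a reference wavelength; the two lists
    are paired by well.
    """
    return mean([signal - reference for signal, reference in zip(od_550, od_690)])


# Hypothetical plate-reader values for 4 replicate wells; not data from this study.
print(round(mean_corrected_od([1.21, 1.18, 1.25, 1.20], [0.08, 0.07, 0.09, 0.08]), 3))
```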

Chapter 5: Conclusions

Colorectal cancer (CRC) is responsible for over 50,000 deaths annually in the United States.3 Estimates suggest that between 12 and 35% of risk for developing this disease is due to heritable, or genetic, risk factors.8,9 At this time, approximately 40 independent susceptibility loci have been discovered through large population-based genome-wide association studies (GWAS) of CRC patients and healthy controls.12 Cumulatively, these susceptibility loci explain just a fraction of the genetic risk for developing this deadly disease. Current strategies such as GWAS have been unsuccessful at uncovering the missing heritability for CRC. Innovative approaches are needed to fill in this important gap in knowledge before we can develop accurate genetic screening tools to predict individuals at heightened risk who would benefit from more intensive clinical surveillance. The three projects described in this dissertation were aimed at uncovering novel CRC susceptibility genes using a cross-species systems genetics approach that integrates transcriptome data from mouse colon cancer linkage regions, human allele-specific imbalance analysis, and in vitro functional studies of candidate susceptibility genes.

As described in Chapter 2, the first project we conducted involved assessing single nucleotide polymorphisms (SNPs) at loci that were linked to CRC risk by GWAS for allele-specific somatic copy number gains or losses in human colorectal tumor DNA samples. We hypothesized that GWAS-identified SNPs demonstrate allele-specific copy number changes (termed 'allele-specific imbalance,' or ASI) in the tumor genome of patient specimens. Such allele-specific somatic copy number alterations can demarcate genomic regions that function to promote or repress tumorigenesis, as gains of tumor-promoting alleles or losses of tumor-suppressing alleles can provide a selective growth advantage to tumor cells. We tested this hypothesis using quantitative genotyping to detect relative gains or losses of GWAS-identified SNP alleles in a cohort of human paired colorectal tumor and normal DNA samples. Testing of 17 SNPs revealed statistically significant allele-specific copy number changes at one SNP, rs6983267 at 8q24, which suggests that ASI occurs at CRC risk loci but perhaps at low frequency. Importantly, our ASI findings replicate those observed at rs6983267 by Tuupanen et al.26
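The idea of a relative allelic gain or loss in a paired tumor/normal sample can be illustrated with a simple ratio of ratios. This is a generic sketch, not the scoring method of Chapter 2: the signal values stand in for whatever allele-specific measurement the genotyping platform reports, and the function name and numbers are hypothetical.

```python
def asi_ratio(tumor_a, tumor_b, normal_a, normal_b):
    """Allele A:B signal ratio in tumor divided by the same ratio in matched normal.

    For a heterozygous individual, a value near 1 indicates no imbalance,
    a value above 1 a relative gain of allele A (or loss of allele B) in
    the tumor, and a value below 1 the reverse.
    """
    return (tumor_a / tumor_b) / (normal_a / normal_b)


# Hypothetical allele signals for one heterozygous tumor/normal pair.
print(asi_ratio(tumor_a=60.0, tumor_b=40.0, normal_a=50.0, normal_b=50.0))
```

Normalizing to the matched normal sample corrects for assay bias between the two alleles, which would otherwise be mistaken for imbalance.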

Chapter 3 discusses our second project, in which we conducted a large ASI screen of SNPs tagging candidate susceptibility genes at the human orthologs of three murine CRC susceptibility quantitative trait loci (QTL). The Susceptibility to colon cancer (Scc) QTLs we were specifically interested in are Scc4, Scc5, and Scc13, as these QTLs show interactions with an established tumor suppressor gene, Ptprj at Scc1, which modulate CRC risk.50,55 We prioritized genes for inclusion in this study based on RNA-seq transcriptome data generated from the normal colons of the CRC-resistant and CRC-susceptible mouse parental strains that were used to linkage-map these QTLs. Because no protein-damaging coding SNPs were identified in our RNA-seq study according to SIFT and PolyPhen-v2 analyses, we focused largely on genes showing differential expression between the strains and/or genes with a documented role in cancer-relevant pathways or processes. A total of 525 haplotype-tagging SNPs in genes mapping to the human orthologs of Scc4, Scc5, and Scc13 were tested for ASI in a discovery sample set of matched colon tumor/normal DNA samples. Of these, 103 SNPs meeting a p-value threshold of <0.10 were tested in a validation sample set. After corrections for multiple comparisons, two SNPs in the gene SNX10 emerged from our screen as reproducibly exhibiting ASI in human colorectal tumor DNA samples. SNX10 encodes a member of the sorting nexin family of proteins, which are involved in protein sorting and endosome trafficking.103,104 To date, SNX10 has not been implicated in any cancer types, but our data suggest it warrants further investigation.
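The two-stage design summarized above (a permissive discovery cutoff followed by a corrected validation test) can be sketched as follows. The discovery cutoff of p < 0.10 comes from the text; the Bonferroni correction in the validation stage is an assumption made only for illustration, since the correction method is not restated in this chapter. The function name, SNP labels, and p-values are hypothetical.

```python
def two_stage_screen(discovery_p, validation_p, discovery_cutoff=0.10, alpha=0.05):
    """Return (SNPs carried into validation, SNPs significant after correction).

    discovery_p, validation_p: dicts mapping SNP label -> p-value.
    Validation uses a Bonferroni threshold (alpha / number of SNPs tested),
    which is an illustrative assumption, not the documented method.
    """
    carried = [snp for snp, p in discovery_p.items() if p < discovery_cutoff]
    tested = [snp for snp in carried if snp in validation_p]
    if not tested:
        return carried, []
    threshold = alpha / len(tested)
    hits = [snp for snp in tested if validation_p[snp] < threshold]
    return carried, hits


# Hypothetical SNP labels and p-values; not results from this study.
discovery = {"snp_a": 0.01, "snp_b": 0.09, "snp_c": 0.50}
validation = {"snp_a": 0.001, "snp_b": 0.04}
print(two_stage_screen(discovery, validation))
```

With 103 SNPs carried into validation, a Bonferroni threshold at α = 0.05 would be 0.05/103, or roughly 0.0005; the correction actually applied in Chapter 3 may differ.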

Finally, in Chapter 4 we report our preliminary functional investigations of the hypoxia-inducible factor Epas1, whose gene maps to the Scc4 susceptibility QTL. By real time quantitative polymerase chain reaction (qPCR), we determined that this gene is differentially expressed in the colons of the CRC-resistant and CRC-susceptible mouse strains used to map Scc4. Furthermore, our human studies described in Chapter 3 revealed that five tagging SNPs within EPAS1 show suggestive evidence of ASI in human colorectal tumor/normal DNA pairs; however, these SNPs were not statistically significant after correction for multiple comparisons. These data prompted us to investigate EPAS1/Epas1 for functional effects in human and mouse colon cell lines. We tested this gene for effects on β-catenin/TCF-mediated transcriptional activity under conditions of normoxia and hypoxia in SW480 and HCT116 human colon adenocarcinoma cell lines using the TOPflash/FOPflash luciferase reporter system. Under hypoxic culture conditions, we observed activation of our TOPflash β-catenin/TCF reporter construct when Epas1 expression was enforced, suggesting a plausible role for EPAS1 in modifying CRC susceptibility by converging on the Wnt/β-catenin signaling pathway, a pathway that is aberrantly activated in the majority of CRCs.

Future studies will be aimed at verifying a role for EPAS1/Epas1 in the stimulation of β-catenin/TCF-driven transcription by (1) performing additional overexpression/knockdown experiments in SW480 and HCT116 cells cultured in normoxia and hypoxia, (2) testing canonical β-catenin/TCF- and HIF-target genes for transcriptional regulation in cells with enforced or silenced EPAS1/Epas1 expression using qPCR and Western blot, and (3) assessing effects on cell proliferation and cell cycle progression when EPAS1/Epas1 is overexpressed or knocked down in hypoxia. We will also perform co-immunoprecipitation experiments to test for physical interaction of EPAS1/Epas1 and β-catenin protein in human colon adenocarcinoma and mouse immortalized colon epithelial cell lines.

In summary, our research illustrates the advantages of using a cross-species approach that integrates data from human association studies, mouse linkage mapping, ASI analysis of human colorectal tumor genomes, and functional characterization in mouse and human cell lines for the purpose of uncovering novel susceptibility genes and variants that contribute to the genetic risk for this complex disease. We have identified genes such as SNX10 and EPAS1 as intriguing candidate CRC susceptibility genes. Further investigation of these genes and their polymorphic variants has the exciting potential to reveal new risk alleles, highlight new therapeutic targets, and enhance our understanding of CRC biology.

References

1. Tortora GJ, Derrickson BH. Principles of Anatomy and Physiology. 2006.

2. Peterson LW, Artis D. Intestinal epithelial cells: regulators of barrier function and immune homeostasis. Nature reviews Immunology 2014;14:141-53.

3. American Cancer Society. Colorectal Cancer Facts & Figures 2014-2016. 2014.

4. U.S. Preventive Services Task Force. Screening for Colorectal Cancer: U.S. Preventive Services Task Force Recommendation Statement. Annals of internal medicine 2008;149:627-37.

5. Urosevic J, Garcia-Albeniz X, Planet E, et al. Colon cancer cells colonize the lung from established liver metastases through p38 MAPK signalling and PTHLH. Nature cell biology 2014;16:685-94.

6. Calvert PM, Frucht H. The genetics of colorectal cancer. Annals of internal medicine 2002;137:603-12.

7. Kemp Z, Thirlwell C, Sieber O, Silver A, Tomlinson I. An update on the genetics of colorectal cancer. Human molecular genetics 2004;13 Spec No 2:R177-85.

8. Lichtenstein P, Holm NV, Verkasalo PK, et al. Environmental and heritable factors in the causation of cancer--analyses of cohorts of twins from Sweden, Denmark, and Finland. The New England journal of medicine 2000;343:78-85.

9. Czene K, Lichtenstein P, Hemminki K. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish Family-Cancer Database. International journal of cancer 2002;99:260-6.

10. Tomasetti C, Vogelstein B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 2015;347:78-81.

11. Reich DE, Lander ES. On the allelic spectrum of human disease. Trends in genetics 2001;17:502-10.

12. Zhang K, Civan J, Mukherjee S, Patel F, Yang H. Genetic variations in colorectal cancer risk and clinical outcome. World journal of gastroenterology 2014;20:4167-77.

13. Maher B. Personal genomes: The case of the missing heritability. Nature 2008;456:18-21.

14. Dawn Teare M, Barrett JH. Genetic linkage studies. Lancet 2005;366:1036-44.

15. Baron M. The search for complex disease genes: fault by linkage or fault by association? Molecular psychiatry 2001;6:143-9.

16. Easton DF, Eeles RA. Genome-wide association studies in cancer. Human molecular genetics 2008;17:R109-15.

17. Tenesa A, Dunlop MG. New insights into the aetiology of colorectal cancer from genome-wide association studies. Nature reviews Genetics 2009;10:353-8.

18. Jiao S, Peters U, Berndt S, et al. Estimating the heritability of colorectal cancer. Human molecular genetics 2014;23:3898-905.

19. Martin ES, Tonon G, Sinha R, et al. Common and distinct genomic events in sporadic colorectal cancer and diverse cancer types. Cancer research 2007;67:10736-43.

20. Xie T, d'Ario G, Lamb JR, et al. A comprehensive characterization of genome-wide copy number aberrations in colorectal cancer reveals novel oncogenes and patterns of alterations. PloS one 2012;7:e42001.

21. Ahn JW, Coldwell M, Bint S, Mackie Ogilvie C. Array Comparative Genomic Hybridization (Array CGH) for Detection of Genomic Copy Number Variants. Journal of visualized experiments 2015.

22. Mei R, Galipeau PC, Prass C, et al. Genome-wide detection of allelic imbalance using human SNPs and high-density DNA arrays. Genome research 2000;10:1126-37.

23. Skotheim RI, Diep CB, Kraggerud SM, Jakobsen KS, Lothe RA. Evaluation of loss of heterozygosity/allelic imbalance scoring in tumor DNA. Cancer genetics and cytogenetics 2001;127:64-70.

24. Staaf J, Jonsson G, Ringner M, Baldetorp B, Borg A. Landscape of somatic allelic imbalances and copy number alterations in HER2-amplified breast cancer. Breast cancer research 2011;13:R129.

25. Nagase H, Mao JH, Balmain A. Allele-specific Hras mutations and genetic alterations at tumor susceptibility loci in skin carcinomas from interspecific hybrid mice. Cancer research 2003;63:4849-53.

26. Tuupanen S, Niittymaki I, Nousiainen K, et al. Allelic imbalance at rs6983267 suggests selection of the risk allele in somatic colorectal tumor evolution. Cancer research 2008;68:14-7.

27. Niittymaki I, Tuupanen S, Li Y, et al. Systematic search for enhancer elements and somatic allelic imbalance at seven low-penetrance colorectal cancer predisposition loci. BMC medical genetics 2011;12:23.

28. Gerber MM, Hampel H, Schulz NP, et al. Evaluation of allele-specific somatic changes of genome-wide association study susceptibility alleles in human colorectal cancers. PloS one 2012;7:e37672.

29. Pomerantz MM, Ahmadiyeh N, Jia L, et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nature genetics 2009;41:882-4.

30. Prokunina-Olsson L, Hall JL. No effect of cancer-associated SNP rs6983267 in the 8q24 region on co-expression of MYC and TCF7L2 in normal colon tissue. Molecular cancer 2009;8:96.

31. Tuupanen S, Turunen M, Lehtonen R, et al. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nature genetics 2009;41:885-90.

32. Kim T, Cui R, Jeon YJ, et al. Long-range interaction and correlation between MYC enhancer and oncogenic long noncoding RNA CARLo-5. Proceedings of the National Academy of Sciences of the United States of America 2014;111:4173-8.

33. Chen H, Bell JM, Zavala NA, Ji HP, Zhang NR. Allele-specific copy number profiling by next-generation DNA sequencing. Nucleic acids research 2015;43:e23.

34. Fleming JL, Dworkin AM, Allain DC, et al. Allele-specific imbalance mapping identifies HDAC9 as a candidate gene for cutaneous squamous cell carcinoma. International journal of cancer 2014;134:244-8.

35. Nandan MO, Yang VW. Genetic and Chemical Models of Colorectal Cancer in Mice. Current colorectal cancer reports 2010;6:51-9.

36. Kwong LN, Dove WF. APC and its modifiers in colon cancer. Advances in experimental medicine and biology 2009;656:85-106.

37. Su LK, Kinzler KW, Vogelstein B, et al. Multiple intestinal neoplasia caused by a mutation in the murine homolog of the APC gene. Science 1992;256:668-70.

38. Dietrich WF, Lander ES, Smith JS, et al. Genetic identification of Mom-1, a major modifier locus affecting Min-induced intestinal neoplasia in the mouse. Cell 1993;75:631-9.

39. Silverman KA, Koratkar R, Siracusa LD, Buchberg AM. Identification of the modifier of Min 2 (Mom2) locus, a new mutation that influences Apc-induced intestinal neoplasia. Genome research 2002;12:88-97.

40. Kwong LN, Shedlovsky A, Biehl BS, Clipson L, Pasch CA, Dove WF. Identification of Mom7, a novel modifier of Apc(Min/+) on mouse chromosome 18. Genetics 2007;176:1237-44.

41. Crist RC, Roth JJ, Lisanti MP, Siracusa LD, Buchberg AM. Identification of Mom12 and Mom13, two novel modifier loci of Apc (Min) -mediated intestinal tumorigenesis. Cell Cycle 2011;10:1092-9.

42. Nnadi SC, Watson R, Innocent J, Gonye GE, Buchberg AM, Siracusa LD. Identification of five novel modifier loci of Apc(Min) harbored in the BXH14 recombinant inbred strain. Carcinogenesis 2012;33:1589-97.

43. Oikarinen SI, Cleveland AG, Cork KM, et al. Genetic mapping of Mom5, a novel modifier of Apc(Min)-induced intestinal tumorigenesis. Carcinogenesis 2009;30:1591-6.

44. Rosenberg DW, Giardina C, Tanaka T. Mouse models for the study of colon carcinogenesis. Carcinogenesis 2009;30:183-96.

45. Moen CJ, van der Valk MA, Snoek M, et al. The recombinant congenic strains--a novel genetic tool applied to the study of colon tumor development in the mouse. Mammalian genome 1991;1:217-27.

46. Moen CJ, Groot PC, Hart AA, Snoek M, Demant P. Fine mapping of colon tumor susceptibility (Scc) genes in the mouse, different from the genes known to be somatically mutated in colon cancer. Proceedings of the National Academy of Sciences of the United States of America 1996;93:1082-6.

47. Moen CJ, Snoek M, Hart AA, Demant P. Scc-1, a novel colon cancer susceptibility gene in the mouse: linkage to CD44 (Ly-24, Pgp-1) on chromosome 2. Oncogene 1992;7:563-6.

48. van Wezel T, Stassen AP, Moen CJ, Hart AA, van der Valk MA, Demant P. Gene interaction and single gene effects in colon tumour susceptibility in mice. Nature genetics 1996;14:468-70.

49. van Wezel T, Ruivenkamp CA, Stassen AP, Moen CJ, Demant P. Four new colon cancer susceptibility loci, Scc6 to Scc9 in the mouse. Cancer research 1999;59:4216-8.

50. Ruivenkamp CA, Csikos T, Klous AM, van Wezel T, Demant P. Five new mouse susceptibility to colon cancer loci, Scc11-Scc15. Oncogene 2003;22:7258-60.

51. Jacoby RF, Hohman C, Marshall DJ, et al. Genetic analysis of colon cancer susceptibility in mice. Genomics 1994;22:381-7.

52. Angel JM, Popova N, Lanko N, Turusov VS, DiGiovanni J. A locus that influences susceptibility to 1,2-dimethylhydrazine–induced colon tumors maps to the distal end of mouse chromosome 3. Molecular carcinogenesis 2000;27:47-54.

53. Eversley CD, Yuying X, Pearsall RS, Threadgill DW. Mapping six new susceptibility to colon cancer (Scc) loci using a mouse interspecific backcross. G3 (Bethesda) 2012;2:1577-84.

54. Ewart-Toland A, Balmain A. The genetics of cancer susceptibility: from mouse to man. Toxicologic pathology 2004;32 Suppl 1:26-30.

55. Ruivenkamp CA, van Wezel T, Zanon C, et al. Ptprj is a candidate for the mouse colon-cancer susceptibility locus Scc1 and is frequently deleted in human cancers. Nature genetics 2002;31:295-300.

56. Toland AE, Rozek LS, Presswala S, Rennert G, Gruber SB. PTPRJ haplotypes and colorectal cancer risk. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology 2008;17:2782-5.

57. Balmain A, Nagase H. Cancer resistance genes in mice: models for the study of tumour modifiers. Trends in genetics 1998;14:139-44.

58. Quigley D, Balmain A. Systems genetics analysis of cancer susceptibility: from mouse models to humans. Nature reviews Genetics 2009;10:651-7.

59. Mita Y, Yasuda Y, Sakai A, et al. Missense polymorphisms of PTPRJ and PTPN13 genes affect susceptibility to a variety of human cancers. Journal of cancer research and clinical oncology 2010;136:249-59.

60. Chial H. Tumor suppressor (TS) genes and the two-hit hypothesis. Nature Education 2008;1:177.

61. Balmain A, Gray J, Ponder B. The genetics and genomics of cancer. Nature genetics 2003;33 Suppl:238-44.

62. Croce CM. Oncogenes and cancer. The New England journal of medicine 2008;358:502-11.

63. Dworkin AM, Ridd K, Bautista D, et al. Germline variation controls the architecture of somatic alterations in tumors. PLoS genetics 2010;6:e1001136.

64. Ewart-Toland A, Briassouli P, de Koning JP, et al. Identification of Stk6/STK15 as a candidate low-penetrance tumor-susceptibility gene in mouse and human. Nature genetics 2003;34:403-12.

65. Hienonen T, Salovaara R, Mecklin JP, Jarvinen H, Karhu A, Aaltonen LA. Preferential amplification of AURKA 91A (Ile31) in familial colorectal cancers. International journal of cancer 2006;118:505-8.

66. LaFramboise T, Dewal N, Wilkins K, Pe'er I, Freedman ML. Allelic selection of amplicons in glioblastoma revealed by combining somatic and germline analysis. PLoS genetics 2010;6:e1001086.

67. Tomlinson I, Webb E, Carvajal-Carmona L, et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nature genetics 2007;39:984-8.

68. Houlston RS, Webb E, Broderick P, et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nature genetics 2008;40:1426-35.

69. Jaeger E, Webb E, Howarth K, et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nature genetics 2008;40:26-8.

70. Pittman AM, Webb E, Carvajal-Carmona L, et al. Refinement of the basis and impact of common 11q23.1 variation to the risk of developing colorectal cancer. Human molecular genetics 2008;17:3720-7.

71. Tenesa A, Farrington SM, Prendergast JGD, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nature genetics 2008;40:631-7.

72. Tomlinson IP, Webb E, Carvajal-Carmona L, et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nature genetics 2008;40:623-30.

73. Houlston RS, Cheadle J, Dobbins SE, et al. Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nature genetics 2010;42:973-7.

74. Xiong F, Wu C, Bi X, et al. Risk of genome-wide association study-identified genetic variants for colorectal cancer in a Chinese population. Cancer epidemiology, biomarkers & prevention 2010;19:1855-61.

75. Haiman CA, Le Marchand L, Yamamato J, et al. A common genetic risk factor for colorectal and prostate cancer. Nature genetics 2007;39:954-6.

76. Yeager M, Orr N, Hayes RB, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nature genetics 2007;39:645-9.

77. Micale L, Augello B, Daniele G, et al. Amplification of the G allele at SNP rs6983267 in 8q24 amplicons in myeloid malignancies as cause of the lack of MYC overexpression? Blood Cells, Molecules, and Diseases 2011;47:259-61.

78. Yearsley M, Hampel H, Lehman A, Nakagawa H, de la Chapelle A, Frankel WL. Histologic features distinguish microsatellite-high from microsatellite-low and microsatellite-stable colorectal carcinomas, but do not differentiate germline mutations from methylation of the MLH1 promoter. Human pathology 2006;37:831-8.

79. Hampel H, Frankel WL, Martin E, et al. Screening for the Lynch syndrome (hereditary nonpolyposis colorectal cancer). The New England journal of medicine 2005;352:1851-60.

80. Broderick P, Carvajal-Carmona L, Pittman AM, et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nature genetics 2007;39:1315-7.

81. Poynter JN, Figueiredo JC, Conti DV, et al. Variants on 9p24 and 8q24 are associated with risk of colorectal cancer: results from the Colon Cancer Family Registry. Cancer research 2007;67:11128-32.

82. Zanke BW, Greenwood CM, Rangrej J, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nature genetics 2007;39:989-94.

83. Kocarnik JD, Hutter CM, Slattery ML, et al. Characterization of 9p24 risk locus and colorectal adenoma and cancer: gene-environment interaction and meta-analysis. Cancer epidemiology, biomarkers & prevention 2010;19:3131-9.

84. Cui R, Okada Y, Jang SG, et al. Common variant in 6q26-q27 is associated with distal colon cancer in an Asian population. Gut 2011;60:799-805.

85. Peters U, Hutter CM, Hsu L, et al. Meta-analysis of new genome-wide association studies of colorectal cancer risk. Human genetics 2012;131:217-34.

86. Wang C, Horiuchi A, Imai T, et al. Expression of BRCA1 protein in benign, borderline, and malignant epithelial ovarian neoplasms and its relationship to methylation and allelic loss of the BRCA1 gene. The Journal of pathology 2004;202:215-23.

87. Weber F, Shen L, Fukino K, et al. Total-Genome Analysis of BRCA1/2-Related Invasive Carcinomas of the Breast Identifies Tumor Stroma as Potential Landscaper for Neoplastic Initiation. American journal of human genetics 2006;78:961-72.

88. Werness BA, Parvatiyar P, Ramus SJ, et al. Ovarian Carcinoma In Situ With Germline BRCA1 Mutation and Loss of Heterozygosity at BRCA1 and TP53. Journal of the National Cancer Institute 2000;92:1088-91.

89. Marsh DJ, Zheng Z, Zedenius J, et al. Differential loss of heterozygosity in the region of the Cowden locus within 10q22-23 in follicular thyroid adenomas and carcinomas. Cancer research 1997;57:500-3.

90. Nelson HH, Wilkojmen M, Marsit CJ, Kelsey KT. TP53 mutation, allelism and survival in non-small cell lung cancer. Carcinogenesis 2005;26:1770-3.

91. Dacic S, Ionescu DN, Finkelstein S, Yousem SA. Patterns of allelic loss of synchronous adenocarcinomas of the lung. The American journal of surgical pathology 2005;29:897-902.

92. Cicek MS, Slager SL, Achenbach SJ, et al. Functional and clinical significance of variants localized to 8q24 in colon cancer. Cancer epidemiology, biomarkers & prevention 2009;18:2492-500.

93. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell 1990;61:759-67.

94. Pino MS, Chung DC. The chromosomal instability pathway in colon cancer. Gastroenterology 2010;138:2059-72.

95. Hemminki A, Mecklin JP, Jarvinen H, Aaltonen LA, Joensuu H. Microsatellite instability is a favorable prognostic indicator in patients with colorectal cancer receiving chemotherapy. Gastroenterology 2000;119:921-8.

96. Issa JP. Colon cancer: it's CIN or CIMP. Clinical cancer research 2008;14:5939-40.

97. Suzuki R, Kohno H, Sugie S, Nakagama H, Tanaka T. Strain differences in the susceptibility to azoxymethane and dextran sodium sulfate-induced colon carcinogenesis in mice. Carcinogenesis 2006;27:162-9.

98. Liu P, Lu Y, Liu H, et al. Genome-wide association and fine mapping of genetic loci predisposing to colon carcinogenesis in mice. Molecular cancer research 2012;10:66-74.

99. Jones AM, Douglas EJ, Halford SE, et al. Array-CGH analysis of microsatellite-stable, near-diploid bowel cancers and comparison with other types of colorectal carcinoma. Oncogene 2005;24:118-29.

100. Dyrso T, Li J, Wang K, et al. Identification of chromosome aberrations in sporadic microsatellite stable and unstable colorectal cancers using array comparative genomic hybridization. Cancer genetics 2011;204:84-95.

101. Nakao K, Mehta KR, Fridlyand J, et al. High-resolution analysis of DNA copy number alterations in colorectal cancer by array-based comparative genomic hybridization. Carcinogenesis 2004;25:1345-57.

102. Lassmann S, Weis R, Makowiec F, et al. Array CGH identifies distinct DNA copy number profiles of oncogenes and tumor suppressor genes in chromosomal- and microsatellite-unstable sporadic colorectal carcinomas. J Mol Med (Berl) 2007;85:293-304.

103. Worby CA, Dixon JE. Sorting out the cellular functions of sorting nexins. Nature reviews Molecular cell biology 2002;3:919-31.

104. Cullen PJ. Endosomal sorting and signalling: an emerging role for sorting nexins. Nature reviews Molecular cell biology 2008;9:574-82.

105. Pangrazio A, Fasth A, Sbardellati A, et al. SNX10 mutations define a subgroup of human autosomal recessive osteopetrosis with variable clinical severity. Journal of bone and mineral research 2013;28:1041-9.

106. Ellson CD, Andrews S, Stephens LR, Hawkins PT. The PX domain: a new phosphoinositide-binding module. Journal of cell science 2002;115:1099-105.

107. Comartin D, Gupta GD, Fussner E, et al. CEP120 and SPICE1 cooperate with CPAP in centriole elongation. Current biology 2013;23:1360-6.

108. Dube JB, Johansen CT, Robinson JF, Lindsay J, Hachinski V, Hegele RA. Genetic determinants of "cognitive impairment, no dementia". Journal of Alzheimer's disease 2013;33:831-40.

109. Tsujita K, Itoh T, Ijuin T, et al. Myotubularin regulates the function of the late endosome through the gram domain-phosphatidylinositol 3,5-bisphosphate interaction. The Journal of biological chemistry 2004;279:13817-24.

110. Gordan JD, Bertout JA, Hu CJ, Diehl JA, Simon MC. HIF-2alpha promotes hypoxic cell proliferation by enhancing c-myc transcriptional activity. Cancer cell 2007;11:335-47.

111. Ciampa J, Yeager M, Amundadottir L, et al. Large-scale exploration of gene-gene interactions in prostate cancer using a multistage genome-wide association study. Cancer research 2011;71:3287-95.

112. Purdue MP, Johansson M, Zelenika D, et al. Genome-wide association study of renal cell carcinoma identifies two susceptibility loci on 2p21 and 11q13.3. Nature genetics 2011;43:60-5.

113. Edwards SL, Beesley J, French JD, Dunning AM. Beyond GWASs: illuminating the dark road from association to function. American journal of human genetics 2013;93:779-97.

114. Quan L, Stassen AP, Ruivenkamp CA, et al. Most lung and colon cancer susceptibility genes are pair-wise linked in mice, humans and rats. PloS one 2011;6:e14727.

115. Castellvi-Bel S, Ruiz-Ponte C, Fernandez-Rozadilla C, et al. Seeking genetic susceptibility variants for colorectal cancer: the EPICOLON consortium experience. Mutagenesis 2012;27:153-9.

116. Lesueur F, Pharoah PD, Laing S, et al. Allelic association of the human homologue of the mouse modifier Ptprj with breast cancer. Human molecular genetics 2005;14:2349-56.

117. Vaupel P. The role of hypoxia-induced factors in tumor progression. The oncologist 2004;9 Suppl 5:10-7.

118. van der Sanden B, Dhobb M, Berger F, Wion D. Optimizing stem cell culture. Journal of cellular biochemistry 2010;111:801-7.

119. Bertout JA, Patel SA, Simon MC. The impact of O2 availability on human cancer. Nature reviews Cancer 2008;8:967-75.

120. Rankin EB, Giaccia AJ. The role of hypoxia-inducible factors in tumorigenesis. Cell death and differentiation 2008;15:678-85.

121. Keith B, Johnson RS, Simon MC. HIF1alpha and HIF2alpha: sibling rivalry in hypoxic tumour growth and progression. Nature reviews Cancer 2012;12:9-22.

122. Greer SN, Metcalf JL, Wang Y, Ohh M. The updated biology of hypoxia-inducible factor. The EMBO journal 2012;31:2448-60.

123. Sufan RI, Jewett MA, Ohh M. The role of von Hippel-Lindau tumor suppressor protein and hypoxia in renal clear cell carcinoma. American journal of physiology Renal physiology 2004;287:F1-6.

124. Choi H, Chun YS, Kim TY, Park JW. HIF-2alpha enhances beta-catenin/TCF-driven transcription by interacting with beta-catenin. Cancer research 2010;70:10101-11.

125. Casagrande S, Ruf M, Rechsteiner M, et al. The protein tyrosine phosphatase receptor type J is regulated by the pVHL-HIF axis in clear cell renal cell carcinoma. The Journal of pathology 2013;229:525-34.

126. Talks KL, Turley H, Gatter KC, et al. The expression and distribution of the hypoxia-inducible factors HIF-1alpha and HIF-2alpha in normal human tissues, cancers, and tumor-associated macrophages. The American journal of pathology 2000;157:411-21.

127. Rasheed S, Harris AL, Tekkis PP, et al. Hypoxia-inducible factor-1alpha and -2alpha are expressed in most rectal cancers but only hypoxia-inducible factor-1alpha is associated with prognosis. British journal of cancer 2009;100:1666-73.

128. Yoshimura H, Dhar DK, Kohno H, et al. Prognostic impact of hypoxia-inducible factors 1alpha and 2alpha in colorectal cancer patients: correlation with tumor angiogenesis and cyclooxygenase-2 expression. Clinical cancer research 2004;10:8554-60.

129. Imamura T, Kikuchi H, Herraiz MT, et al. HIF-1alpha and HIF-2alpha have divergent roles in colon cancer. International journal of cancer 2009;124:763-71.

130. Rawluszko-Wieczorek AA, Horbacka K, Krokowicz P, Misztal M, Jagodzinski PP. Prognostic potential of DNA methylation and transcript levels of HIF1A and EPAS1 in colorectal cancer. Molecular cancer research 2014;12:1112-27.

131. Xue X, Taylor M, Anderson E, et al. Hypoxia-inducible factor-2alpha activation promotes colorectal cancer progression by dysregulating iron homeostasis. Cancer research 2012;72:2285-93.

132. Gordan JD, Thompson CB, Simon MC. HIF and c-Myc: sibling rivals for control of cancer cell metabolism and proliferation. Cancer cell 2007;12:108-13.

133. Veeman MT, Slusarski DC, Kaykas A, Louie SH, Moon RT. Zebrafish prickle, a modulator of noncanonical Wnt/Fz signaling, regulates gastrulation movements. Current biology 2003;13:680-5.

134. Stamos JL, Weis WI. The beta-catenin destruction complex. Cold Spring Harbor perspectives in biology 2013;5:a007898.

135. Bruick RK. Oxygen sensing in the hypoxic response pathway: regulation of the hypoxia-inducible transcription factor. Genes & development 2003;17:2614-23.

136. Criscimanna A, Duan LJ, Rhodes JA, et al. PanIN-specific regulation of Wnt signaling by HIF2alpha during early pancreatic tumorigenesis. Cancer research 2013;73:4781-90.

137. Camps J, Armengol G, del Rey J, et al. Genome-wide differences between microsatellite stable and unstable colorectal tumors. Carcinogenesis 2006;27:419-28.

138. Augenlicht LH, Wadler S, Corner G, et al. Low-level c-myc amplification in human colonic carcinoma cell lines and tumors: a frequent, p53-independent mutation associated with improved outcome in a randomized multi-institutional trial. Cancer research 1997;57:1769-75.

139. Mazure NM, Chauvet C, Bois-Joyeux B, Bernard MA, Nacer-Cherif H, Danan JL. Repression of alpha-fetoprotein gene expression under hypoxic conditions in human hepatoma cells: characterization of a negative hypoxia response element that mediates opposite effects of hypoxia inducible factor-1 and c-Myc. Cancer research 2002;62:1158-65.

140. Peyssonnaux C, Zinkernagel AS, Schuepbach RA, et al. Regulation of iron homeostasis by the hypoxia-inducible transcription factors (HIFs). The Journal of clinical investigation 2007;117:1926-32.

141. Chen KF, Lai YY, Sun HS, Tsai SJ. Transcriptional repression of human cad gene by hypoxia inducible factor-1alpha. Nucleic acids research 2005;33:5190-8.

Appendix A: Chapter 2 Supplemental Table

Table 8. Primers for Sequenom® MassARRAY quantitative genotyping of GWAS-identified SNPs

SNP  Multiplex  Primer Type  5'-3' Primer Sequence
rs10411210  Discovery  Forward  ACGTTGGATGAGAGCGGAGCTTGGCAAAAT
rs10411210  Discovery  Reverse  ACGTTGGATGTGACTCAACAAGCACCAACG
rs10411210  Discovery  Extension  GGCAAAATGCCCTATTTGTTTT
rs10411210  Validation  Forward  ACGTTGGATGTGTCAGAGGAAACCCTGAAG
rs10411210  Validation  Reverse  ACGTTGGATGGAGCGGAGCTTGGCAAAATG
rs10411210  Validation  Extension  CACCAACGGTTTCCC
rs10936599  Discovery  Forward  ACGTTGGATGCCATTCTGATATCAAAATGC
rs10936599  Discovery  Reverse  ACGTTGGATGTTCCCGCTGTTTGTTCAGTC
rs10936599  Discovery  Extension  TGCAGTATTCGCACCA
rs10936599  Validation  Forward  ACGTTGGATGCAAGGGTAAAATTCCATTCTG
rs10936599  Validation  Reverse  ACGTTGGATGTTCCCGCTGTTTGTTCAGTC
rs10936599  Validation  Extension  GCAGTATTCGCACCA
rs11169552  Discovery  Forward  ACGTTGGATGTAGCTCCTCTTCCTTAGTGC
rs11169552  Discovery  Reverse  ACGTTGGATGATTACAGGCGCATGCCACCA
rs11169552  Discovery  Extension  gagtcGGTCTAAATCAAATGTTACGCTC
rs16892766  Discovery  Forward  ACGTTGGATGGGGTGACATAAGGCATAACC
rs16892766  Discovery  Reverse  ACGTTGGATGCTACTTAGGGACTCAGAACG
rs16892766  Discovery  Extension  GGCATAACCTTTAACAGC
rs16892766  Validation  Forward  ACGTTGGATGCTACTTAGGGACTCAGAACG
rs16892766  Validation  Reverse  ACGTTGGATGGGGTGACATAAGGCATAACC
rs16892766  Validation  Extension  ttccGACGCAAACAGTTTCAAGACTATT
rs3802842  Discovery  Forward  ACGTTGGATGCACAGATGCTATCCTGGAAG
rs3802842  Discovery  Reverse  ACGTTGGATGACAGCCCTTGCAGACCCATA
rs3802842  Discovery  Extension  GAGGTGAATTTCTGGGA
rs3802842  Validation  Forward  ACGTTGGATGGGCCCCTAAAATGAGGTGAA
rs3802842  Validation  Reverse  ACGTTGGATGCATCGTTTTGTTAGGAAGAC
rs3802842  Validation  Extension  GAGGTGAATTTCTGGGA
rs4444235  Discovery  Forward  ACGTTGGATGGATTTGGACATGATGCCCAC
rs4444235  Discovery  Reverse  ACGTTGGATGCCTCAAGCAATTTTAAGCTG
rs4444235  Discovery  Extension  ccaccTGATGCCCACAGCCCTGATACTA
rs4779584  Discovery  Forward  ACGTTGGATGTGTGCCAGTAGAACTTGTTG
rs4779584  Discovery  Reverse  ACGTTGGATGCAAATATTTTTGTTTCCTGTG
rs4779584  Discovery  Extension  ACTTGTTGATAAGCCATTCTTC
rs4925386  Discovery  Forward  ACGTTGGATGACAATCGGAGATGCAGGCAG
rs4925386  Discovery  Reverse  ACGTTGGATGACCACTGCCTGCTGGACTC
rs4925386  Discovery  Extension  GATGCAGGCAGGATAACA

rs4939827  Discovery  Forward  ACGTTGGATGCTAATCCACCATGCTCACAG
rs4939827  Discovery  Reverse  ACGTTGGATGTCTGTGTTTCCTGAGGAGTC
rs4939827  Discovery  Extension  cccatCCTCATCCAAAAGAGGAAA
rs6687758  Discovery  Forward  ACGTTGGATGAAGGTGGAGAAAGAATGTGC
rs6687758  Discovery  Reverse  ACGTTGGATGGTTTGAGAAGCACTTGGAGG
rs6687758  Discovery  Extension  TGCATCTCTAGATTCCATATT
rs6691170  Discovery  Forward  ACGTTGGATGGGAGGAAGAACTACCATTGC
rs6691170  Discovery  Reverse  ACGTTGGATGTCTTCAGTGACTCTTTTCAG
rs6691170  Discovery  Extension  agacCTACCATTGCTAGTAAAGATCCT
rs6983267  Discovery  Forward  ACGTTGGATGGTCAATAGCACATAAAAATTC
rs6983267  Discovery  Reverse  ACGTTGGATGACCCTCATCGTCCTTTGAGC
rs6983267  Discovery  Extension  taacAATTCTTTGTACTTTTCTCAGTG
rs6983267  Validation  Forward  ACGTTGGATGACCCTCATCGTCCTTTGAGC
rs6983267  Validation  Reverse  ACGTTGGATGGTCAATAGCACATAAAAATTC
rs6983267  Validation  Extension  AGCTCAGCAGATGAAAG
rs7014346  Discovery  Forward  ACGTTGGATGCAGTTCAAGATGGCTTCTGG
rs7014346  Discovery  Reverse  ACGTTGGATGTTCAGGCCATTGCCCTTTTC
rs7014346  Discovery  Extension  gggaCTTCTGGAGTGCTACC
rs7136702  Discovery  Forward  ACGTTGGATGGCAAATGTGCTCTGCAGAAG
rs7136702  Discovery  Reverse  ACGTTGGATGTCTTTTGCAATTTCCCCCCC
rs7136702  Discovery  Extension  aGCTCTGCAGAAGAGACTG
rs7136702  Validation  Forward  ACGTTGGATGTCTTTTGCAATTTCCCCCCC
rs7136702  Validation  Reverse  ACGTTGGATGGCAAATGTGCTCTGCAGAAG
rs7136702  Validation  Extension  gGTGGCTTGTATTTTATTCTCTCA
rs719725  Discovery  Forward  ACGTTGGATGCTCTTAGTGAAGTTTGACAG
rs719725  Discovery  Reverse  ACGTTGGATGGATGATTATGTTCAATGAG
rs719725  Discovery  Extension  cgatAACTAATGTTTATTGATGCTATC
rs961253  Discovery  Forward  ACGTTGGATGCCTTGATGCTCAGCAACTTC
rs961253  Discovery  Reverse  ACGTTGGATGAGTGCATACCAAGTATTGAG
rs961253  Discovery  Extension  CAGCAACTTCAATTAATCTTTCTGAAT
rs9929218  Discovery  Forward  ACGTTGGATGGAATGATTCAAAGGTTCTG
rs9929218  Discovery  Reverse  ACGTTGGATGTGGCTGACACCAAAATGTGC
rs9929218  Discovery  Extension  gtttgAAGGTTCTGAATTCCACAAC

Appendix B: Chapter 3 Supplemental Tables

Table 9. Primers for Sequenom® MassARRAY quantitative SNP genotyping

SNP ID  Primer Type  Primer Sequence (5' to 3')
rs10044860  Extension  ggTGCAAGGATGTATTTACTAGGAT
rs10044860  Forward  ACGTTGGATGTAAAGCTCCAGTCTGCAAGG
rs10044860  Reverse  ACGTTGGATGGGCTAATAGAGTAGTGTTGC
rs10050399  Extension  gCCAATCAAGTTCCCCTTC
rs10050399  Forward  ACGTTGGATGCACCCAAACAATTGCTCTAC
rs10050399  Reverse  ACGTTGGATGGGTAACATTTTTCCAGGGTA
rs10054158  Extension  TCCTGGGTAAGTTGGACTA
rs10054158  Forward  ACGTTGGATGATAAACCACAGGCCTGGTGC
rs10054158  Reverse  ACGTTGGATGCATTTCCCCACATTTGAGAG
rs10056807  Extension  CCAAACACTGAAACAGATA
rs10056807  Forward  ACGTTGGATGGGGCTATTTTAGGAGTCCTT
rs10056807  Reverse  ACGTTGGATGTGTGAGACAGTAATCAGCC
rs10058728  Extension  AAGACCTAGGAAATTTTGG
rs10058728  Forward  ACGTTGGATGTGGATTCTCAACCATTTGTG
rs10058728  Reverse  ACGTTGGATGCTACAGCCAATGCATCAATC
rs10065900  Extension  ccctcTAACACCAACTTGAGCCAG
rs10065900  Forward  ACGTTGGATGACCTAATAATAATGCTGCG
rs10065900  Reverse  ACGTTGGATGTGTAAAGAATAACACCAAC
rs10066188  Extension  ccgAGCAGCAGGGTCACAGC
rs10066188  Forward  ACGTTGGATGCCAGGAGATTGAGACTAGAG
rs10066188  Reverse  ACGTTGGATGAGATTTTGAAAGAAGGAGG
rs10071667  Extension  aaACATACGTAAGGAAAAAGAAG
rs10071667  Forward  ACGTTGGATGGATTTAACCAAAACACCCTC
rs10071667  Reverse  ACGTTGGATGCTCTTATTTTTCGCCATGGG
rs10072056  Extension  agtTTCCCTCCGAGCCCA
rs10072056  Forward  ACGTTGGATGAAGTCCTGAAGCCTCCTCTG
rs10072056  Reverse  ACGTTGGATGAAGTGCTCAACTGAGCCAAG
rs10076327  Extension  cgaAAAAGAGAGAAGACAAAGAATA
rs10076327  Forward  ACGTTGGATGGATGCTGACGTAAGCATGAG

rs10076327  Reverse  ACGTTGGATGTAACATTCCCAGAGCCTCGC
rs10077431  Extension  gtcTTCCTTCCTCCCATTTTA
rs10077431  Forward  ACGTTGGATGTCTCCTCCTCACCTCTATTC
rs10077431  Reverse  ACGTTGGATGCCCTTCCAATCTCAGAAGAC
rs10080096  Extension  gtATTAGGCCTCTGGCT
rs10080096  Forward  ACGTTGGATGAGTCCTTTCCATGATTAGGC
rs10080096  Reverse  ACGTTGGATGGTCTAGGCACAGTGAACAAG
rs1008776  Extension  TGGTTTTGCATATTTCCTTT
rs1008776  Forward  ACGTTGGATGTTGGAATTGTACCTTATGG
rs1008776  Reverse  ACGTTGGATGGAAGAAATTTCTATTCCTGTG
rs1010101  Extension  TGGCGCATTTCTGTCTTCAA
rs1010101  Forward  ACGTTGGATGGTTTCAGCTTGGCGCATTTC
rs1010101  Reverse  ACGTTGGATGGAAGTATGGACCCTCTTTGG
rs1012543  Extension  cccgGACAACAAAGTCACCGTTACC
rs1012543  Forward  ACGTTGGATGCCATTGACAACAAAGTCACC
rs1012543  Reverse  ACGTTGGATGCCTTTATGAATGTAGGGGAC
rs10178633  Extension  ccaaCGAGGAACTGAACTTTATACA
rs10178633  Forward  ACGTTGGATGATGTGGCTAGAGAAACCGAG
rs10178633  Reverse  ACGTTGGATGAGCCAAGTGTGGATATTTAC
rs10182633  Extension  gacTTATGAGTTGTTCAAAGGTAG
rs10182633  Forward  ACGTTGGATGCCTGTCTTTAATTATGAGTTG
rs10182633  Reverse  ACGTTGGATGCTCACGGTATTTCATGAGAA
rs10185866  Extension  cCCCTTTAAGTGAGTTTCCT
rs10185866  Forward  ACGTTGGATGACTGCAGTTATAGTCCTCCC
rs10185866  Reverse  ACGTTGGATGGCTGCTTAGTATCACTTTCC
rs10190161  Extension  AAAACAGGATCTTTGGC
rs10190161  Forward  ACGTTGGATGGGTGATTCAAATAATCAGG
rs10190161  Reverse  ACGTTGGATGCAATGAGAATACATCTCAGC
rs10199201  Extension  aggcGGGGATTGAGATAACAGATA
rs10199201  Forward  ACGTTGGATGATGGTGCTTACCCTGCAATG
rs10199201  Reverse  ACGTTGGATGAATATGTGGCAGGCTAGAGG
rs10213948  Extension  gaTGCTTTGAGACAAAGGT
rs10213948  Forward  ACGTTGGATGGACCCCAAATCCACTTTTTAC
rs10213948  Reverse  ACGTTGGATGGAGGTTTACAGGAGCACCAG

rs10230195  Extension  ggacCTTCAGCCAGCTGGCGGAAC
rs10230195  Forward  ACGTTGGATGTTTGCAGGACCTCACTTCAG
rs10230195  Reverse  ACGTTGGATGCTCAGAGGCCTTTTTGGTTC
rs10233387  Extension  caCCACTAGGATCGGCC
rs10233387  Forward  ACGTTGGATGTGGGTTAGACTGGGGAAGC
rs10233387  Reverse  ACGTTGGATGGCTTGGCTGTCATGCCTAAG
rs10263430  Extension  gctgAGGAAATGCTTGTCAGC
rs10263430  Forward  ACGTTGGATGGGACTTCTTAGGAAATGCTT
rs10263430  Reverse  ACGTTGGATGAGCAATACACTGACAGTTC
rs1027478  Extension  gATCATAAAGATTGGTTAATCTTTAG
rs1027478  Forward  ACGTTGGATGAAGGTATCATAAAGATTGG
rs1027478  Reverse  ACGTTGGATGCGGAAAAATACCAATGGAGG
rs10281500  Extension  gGTGTTCTCCCCGAAA
rs10281500  Forward  ACGTTGGATGTCCAAGTGCTTTACCAGAGG
rs10281500  Reverse  ACGTTGGATGTAGGGACAGACACAGGACAG
rs1038072  Extension  ctttCACACTGTAGGCCCCCA
rs1038072  Forward  ACGTTGGATGGTGACTTAACAGTCCCTGGC
rs1038072  Reverse  ACGTTGGATGCGAACAGCATTTGTTCATC
rs1042713  Extension  ttGTCCGGCGCATGGCTTC
rs1042713  Forward  ACGTTGGATGACCCACACCTCGTCCCTTT
rs1042713  Reverse  ACGTTGGATGGAACGGCAGCGCCTTCTTG
rs1042714  Extension  ACACCTCGTCCCTTT
rs1042714  Forward  ACGTTGGATGACATGACGATGCCCATGCC
rs1042714  Reverse  ACGTTGGATGAGCGCCTTCTTGCTGGCAC
rs1042719  Extension  ggggtAGGTCTTCTTTGAAGGCCTATGG
rs1042719  Forward  ACGTTGGATGGCGCAGGTCTTCTTTGAAGG
rs1042719  Reverse  ACGTTGGATGTCCACGTGATATCCACTCTG
rs10455038  Extension  ggggtGCTCGACCCGGAGAGCCCT
rs10455038  Forward  ACGTTGGATGTACTCCTTAGGGTTTCCCTG
rs10455038  Reverse  ACGTTGGATGAAGGCTGCCTCTAGCTAAAC
rs10478909  Extension  gagAAATCCCAGGAAAAAATTTCAG
rs10478909  Forward  ACGTTGGATGGAAGTCACTACAAAAACAAGG
rs10478909  Reverse  ACGTTGGATGGGGTATGAAGAGCAAACCTG
rs1049394  Extension  aacaaAAGCATGCTCAGCCCCCA

rs1049394  Forward  ACGTTGGATGCTTTCTGCACAAGCATGCTC
rs1049394  Reverse  ACGTTGGATGAGTAGCTTCCTAGGAAACCC
rs10515641  Extension  ACAATTACATCCATCCTTCC
rs10515641  Forward  ACGTTGGATGTACCGCCGAAGGATTATGAG
rs10515641  Reverse  ACGTTGGATGTGAGCCCTCTAGAGAGGCAG
rs1051644  Extension  gttAACTTAATAACTGTGTACTGTT
rs1051644  Forward  ACGTTGGATGGCAAGATGTGAATTATTGAC
rs1051644  Reverse  ACGTTGGATGGTCAAAAAATTTGAGGAACC
rs10516930  Extension  ggTGAGTAGAATGAGTGTTTTGA
rs10516930  Forward  ACGTTGGATGGTGTCTTGACGTTATGTTTG
rs10516930  Reverse  ACGTTGGATGTAAGACTCTCTCCAAAACTC
rs10519347  Extension  ccacgACTGCTTTCTTCCTCTTATCC
rs10519347  Forward  ACGTTGGATGATATCCATTTGTGCACTCGC
rs10519347  Reverse  ACGTTGGATGCTTCAGAAAGGCTGGCAATA
rs10519348  Extension  gagggGATTGAAGGTAGAGGTTGA
rs10519348  Forward  ACGTTGGATGGGGAAGCAAGCACAAAAGA
rs10519348  Reverse  ACGTTGGATGCTATATTATTTTGGTCCTG
rs10519694  Extension  cagTCACATCAATAAGTAAATGAATG
rs10519694  Forward  ACGTTGGATGATGCCACATCACTCCACTTG
rs10519694  Reverse  ACGTTGGATGATATTTTTCTCCTCCCAGCC
rs1056021  Extension  cttgGCAGCACTGCAGATGACAATCTAC
rs1056021  Forward  ACGTTGGATGTCTTTGCAGCACTGCAGATG
rs1056021  Reverse  ACGTTGGATGTGGACTGATATTTTGCTGTG
rs1077724  Extension  ggagGGTTTGGGAAAGAGCGTTCCTC
rs1077724  Forward  ACGTTGGATGGTCTGGTTTGGGAAAGAGCG
rs1077724  Reverse  ACGTTGGATGTGAGCACCCACACTCTGAGG
rs10875551  Extension  ggGCCTGATCTTTATGATGATGG
rs10875551  Forward  ACGTTGGATGGGTACAACAGTGCCTGATCT
rs10875551  Reverse  ACGTTGGATGTCTGGAATCCAGTGCAGAAC
rs10875552  Extension  cGGATGAGGATAGAAAGAAGG
rs10875552  Forward  ACGTTGGATGTCAACCTGGGATGAGGATAG
rs10875552  Reverse  ACGTTGGATGTGGAACGCCTTCTCTCTCCT
rs11097234  Extension  cctcCACTAGTAGATAAAAAGTATCACT
rs11097234  Forward  ACGTTGGATGCCTGGTCATGACTCTGACAC

rs11097234  Reverse  ACGTTGGATGGACTCTGAGTACTGCATGTG
rs11097413  Extension  CCCATTGTGGGGAAAA
rs11097413  Forward  ACGTTGGATGGAAGCGAAGCAGCTGATGTG
rs11097413  Reverse  ACGTTGGATGTGGACCACTGGGGATATCTC
rs11124954  Extension  CCTCATCCCTTCCAGCC
rs11124954  Forward  ACGTTGGATGATTACCCAACTCCTCATCCC
rs11124954  Reverse  ACGTTGGATGAGAGTAGGGCAAACCTACAG
rs11125126  Extension  ggtagATCTTAACTGAAACCTGGAG
rs11125126  Forward  ACGTTGGATGCCTCTGCCCTAAATCTTAAC
rs11125126  Reverse  ACGTTGGATGTGCCTTTTCCTAGACCACAG
rs11241891  Extension  TACTGCACAAGTTGGGC
rs11241891  Forward  ACGTTGGATGCCTGGTTTTTAACTTACTGC
rs11241891  Reverse  ACGTTGGATGTCAGGTGTATGATTATACCC
rs11241892  Extension  ccgTGTTTTATTCTCTGCCGT
rs11241892  Forward  ACGTTGGATGCCTCTGTTTTATTTTGTTTGG
rs11241892  Reverse  ACGTTGGATGGCCACAAAAGGTGATGGGAA
rs1124995  Extension  GGATGAATCATGGAAATGGAGAAGGA
rs1124995  Forward  ACGTTGGATGCTCGCAGGATGAATCATGGA
rs1124995  Reverse  ACGTTGGATGTTTTCCTCCTGGAGGCTTAG
rs1132528  Extension  tcGCAACTGTTCACCAACC
rs1132528  Forward  ACGTTGGATGAGGGAAGACGTTCCCATAAC
rs1132528  Reverse  ACGTTGGATGCTTCTGATTTATCTAGGAAC
rs11691326  Extension  agtTGGGGTAAAGAAAGTCA
rs11691326  Forward  ACGTTGGATGGTCAATTTTTAAAAGTGGGG
rs11691326  Reverse  ACGTTGGATGGTTATGCTTGTGATCTTGGAG
rs11723050  Extension  cCCCGAGGTTTATTAAAGTCC
rs11723050  Forward  ACGTTGGATGTATTGAGGCTACAGTGCCAG
rs11723050  Reverse  ACGTTGGATGTCAGGATGTTTTTCATGGGC
rs11729301  Extension  ccccATTAGGGCCCTGAGCCTTTC
rs11729301  Forward  ACGTTGGATGCCTTCTTACCCAATTAGGGC
rs11729301  Reverse  ACGTTGGATGCTTGCCCTCTTTCTTTACGG
rs11738528  Extension  cttacCTGTTTGGTCTCCATCC
rs11738528  Forward  ACGTTGGATGCCTGTGTAAGGGGTTCCATA
rs11738528  Reverse  ACGTTGGATGGTGATCTGCAGGTTATTTGG

rs11739347  Extension  agGCCATTAATGTTTTAATATTCTCT
rs11739347  Forward  ACGTTGGATGCCACATCCAGATAGACATGC
rs11739347  Reverse  ACGTTGGATGCAGCAGTATAAAAGAGATCA
rs11740247  Extension  CCATCCGAAAAGGACTA
rs11740247  Forward  ACGTTGGATGGTTCACAGGAAAGGCAACAC
rs11740247  Reverse  ACGTTGGATGCCAATCTGGAGTCTCTGAAG
rs11740298  Extension  TTTGAGTGGGTGCGG
rs11740298  Forward  ACGTTGGATGGGAAACGGCATCTTCATTTG
rs11740298  Reverse  ACGTTGGATGAGGCCTGTGGTTCCAAAATG
rs11746929  Extension  aCCCCGGGTCCTTCTTGC
rs11746929  Forward  ACGTTGGATGATTCCTATTCCCTCACGTGG
rs11746929  Reverse  ACGTTGGATGAGGAGGCATCCTCCTCCAAA
rs11748334  Extension  gGGACTCACTGCCAAGAATGG
rs11748334  Forward  ACGTTGGATGAAATACAAGGACTCACTGCC
rs11748334  Reverse  ACGTTGGATGGGTTAACTTCATGGTAGGTC
rs11749913  Extension  acccATAGGATCAAGCCCTGGCACC
rs11749913  Forward  ACGTTGGATGCATGGCCTATAGGATCAAGC
rs11749913  Reverse  ACGTTGGATGTGGAGATGGGAGTAGAAAGG
rs11750093  Extension  ggctgCATGGAGGTCTATGAGTTG
rs11750093  Forward  ACGTTGGATGAGATGGGCTTGGCTACATGG
rs11750093  Reverse  ACGTTGGATGGCTCCAGTCTACAGTCTGTC
rs11764332  Extension  GATGGAGAAGCAGGAG
rs11764332  Forward  ACGTTGGATGGGGAAGTTGAATGGATAGGC
rs11764332  Reverse  ACGTTGGATGTGGTTTCATTCTCAGGCAAG
rs11764362  Extension  tctAGTCGGTCGTCATCA
rs11764362  Forward  ACGTTGGATGTCAGGTGTAGGAAAGAGTCG
rs11764362  Reverse  ACGTTGGATGAGGCAGACATCAACTTACCG
rs11951727  Extension  ccccCAAAGCAAAGGAGCATAATCT
rs11951727  Forward  ACGTTGGATGGAGCTCAAAGCAAAGGAGCA
rs11951727  Reverse  ACGTTGGATGGAAGACTTGGTTGAAAATC
rs11955699  Extension  GAGGACCAAGGAAGTTATT
rs11955699  Forward  ACGTTGGATGTTCCATAAAAACGCATCTC
rs11955699  Reverse  ACGTTGGATGGATGACCTTTGTGTGTGTCC
rs11956139  Extension  TTCCCCCTTCTTATGCC

rs11956139  Forward  ACGTTGGATGGTTTCGATATGCTTTTCCCC
rs11956139  Reverse  ACGTTGGATGGAGGTGCCTTCTGTTATTCC
rs11960528  Extension  GACAAAACATTGAAGCAGC
rs11960528  Forward  ACGTTGGATGGGTAAACATGAATAGCATGA
rs11960528  Reverse  ACGTTGGATGGGGAGCAGTCTGAATTAGTC
rs11975645  Extension  agccgGAGACTGCCTTTTCATG
rs11975645  Forward  ACGTTGGATGGTTTCCTGACATACAACTGC
rs11975645  Reverse  ACGTTGGATGTTCCCATCCAGGAAAAGGTC
rs11977829  Extension  aCAAGATAATGCATATGGTAAAAC
rs11977829  Forward  ACGTTGGATGGTGTTCCTCATGGAGTTTTTC
rs11977829  Reverse  ACGTTGGATGAACTACCCAGCCATGGTGTT
rs12186425  Extension  CACATTTCTTTCCCAGAC
rs12186425  Forward  ACGTTGGATGTACAAGAGCACCCACATTTC
rs12186425  Reverse  ACGTTGGATGGTGGAGGAATGAATAGATGC
rs12505410  Extension  CCAAATACAACCTTTGGAAGA
rs12505410  Forward  ACGTTGGATGCCATTCATTCTAAGATTTTCC
rs12505410  Reverse  ACGTTGGATGCATGATTCAAACTTGGCCTC
rs12505502  Extension  tataTAGGTGGATCCCAGG
rs12505502  Forward  ACGTTGGATGATTTTGCGTTCTGTGTAAGG
rs12505502  Reverse  ACGTTGGATGGCTTGTTTTGCCAGACAGAG
rs12508842  Extension  TGCCATCTCCTGCCA
rs12508842  Forward  ACGTTGGATGTTCCACTTGGTAAAGGTGGG
rs12508842  Reverse  ACGTTGGATGGTTCTGTCTGATTCAGAGCC
rs12513847  Extension  cccctTGCTCTAGTCCCCAGCAGCC
rs12513847  Forward  ACGTTGGATGCTTCTTTGTTGCTGGGTCAC
rs12513847  Reverse  ACGTTGGATGGGGAAGAGAATGATGCTGTC
rs12516622  Extension  cccgcTGTTACAGCTATTGCCA
rs12516622  Forward  ACGTTGGATGGTCAAAAGGTATGATAGAA
rs12516622  Reverse  ACGTTGGATGGCATCACAAAATGTCTACCG
rs12516924  Extension  AGGAGGTATATTAACGCCA
rs12516924  Forward  ACGTTGGATGATGCAAAATGGCTCTGGGTC
rs12516924  Reverse  ACGTTGGATGTCTCTATCCCATGCTCTTCC
rs12517185  Extension  ctcctAGTCTCCTCGTCCTCCT
rs12517185  Forward  ACGTTGGATGACACAGAATACCTGCTCCAG

rs12517185  Reverse  ACGTTGGATGCTGTCCATTGCTATTTTTCC
rs12519315  Extension  GGACATTCGTATTAACATAAAAG
rs12519315  Forward  ACGTTGGATGCAGGCAGAAGTAAATAATAC
rs12519315  Reverse  ACGTTGGATGCATTTATTATTACTAATCTTG
rs12522122  Extension  AGAAGTAATTTTATGCTAGTTCAATTT
rs12522122  Forward  ACGTTGGATGCCTGGCCCCAAAATTGATTG
rs12522122  Reverse  ACGTTGGATGAATGCATAAAATACAAGGC
rs12614710  Extension  GAGACACGCTGGCCACC
rs12614710  Forward  ACGTTGGATGAAGCAGCAAAGTGAGACACG
rs12614710  Reverse  ACGTTGGATGGGCTGGTGTGCCAGCTGAG
rs12618410  Extension  AGGTTATGAAGATATTTTATTCTCCAA
rs12618410  Forward  ACGTTGGATGGTGGAGTCCTTTGTCCTTAC
rs12618410  Reverse  ACGTTGGATGAAATTTTACTAAATTTTATG
rs12643954  Extension  gtAGGGTATTTATGACATTTAAGGA
rs12643954  Forward  ACGTTGGATGATAGCCAGGGTATTTATGAC
rs12643954  Reverse  ACGTTGGATGCTAGGGCATATGAGCACAAC
rs12652257  Extension  gggcGTGCTATGTCTACAAGGAG
rs12652257  Forward  ACGTTGGATGTTGCATCCTAGTGCTATGTC
rs12652257  Reverse  ACGTTGGATGAGAAGTGACATCTAATCCTG
rs12654269  Extension  gaccTCTAATCCTGATTTTCTTGTC
rs12654269  Forward  ACGTTGGATGAGAAGTGACATCTAATCCTG
rs12654269  Reverse  ACGTTGGATGTTGCATCCTAGTGCTATGTC
rs12668538  Extension  gaCCACTCAATAACAAGCAAAG
rs12668538  Forward  ACGTTGGATGAAACACAAATTACCCCACTC
rs12668538  Reverse  ACGTTGGATGATCTTCCTTGGAGAAATTGC
rs12712997  Extension  ccTTAAAAATCGCCATTATTTTGT
rs12712997  Forward  ACGTTGGATGGCAGACTTTTAAAAATCGCC
rs12712997  Reverse  ACGTTGGATGGAAATATATGTATGTGTGTG
rs12712998  Extension  GTCTTGAACTCCCAATCT
rs12712998  Forward  ACGTTGGATGTCTCCATGTTGGTCAGGCAG
rs12712998  Reverse  ACGTTGGATGAGCACTTTTGGAGACTGAGG
rs12717846  Extension  ggcaATGAACTAGCAGCATCAG
rs12717846  Forward  ACGTTGGATGTGCTACTCATGTTGTGGTCC
rs12717846  Reverse  ACGTTGGATGGTGAAGGGCTGCGGTATTTA

rs12719304  Extension  ACGAACATCCCAAGCAC
rs12719304  Forward  ACGTTGGATGCTCAAATGGTAATGAGCCAC
rs12719304  Reverse  ACGTTGGATGGTAGTTTGAAAGTCTCTAGC
rs13120400  Extension  gataAGGGAAACCTTCACCT
rs13120400  Forward  ACGTTGGATGGTTGGAATCAGGTGATCAGG
rs13120400  Reverse  ACGTTGGATGCAGCACATACTCCCTTTCAG
rs13145977  Extension  TTTGGGTAAACAGTCTACATAA
rs13145977  Forward  ACGTTGGATGACTATTTGGGTAAACAGTC
rs13145977  Reverse  ACGTTGGATGGAATCTCTTGTAAGTTCCCC
rs13156718  Extension  tACCAGGCTCAGATCCCTAGGAG
rs13156718  Forward  ACGTTGGATGCAGGTCTTATTTGACCAGGC
rs13156718  Reverse  ACGTTGGATGACATTGATTCACTGGAGCAC
rs13159076  Extension  AGGTATAGCCCGCCCTCCATTTCC
rs13159076  Forward  ACGTTGGATGTACTAGGCTCATGCAGAGAC
rs13159076  Reverse  ACGTTGGATGTTTGATCTATGGTTGGTAG
rs13162396  Extension  acGTCACTCTCACAAGGAAA
rs13162396  Forward  ACGTTGGATGGGCAGAAGCACAATATGCAG
rs13162396  Reverse  ACGTTGGATGGCTGCCTGTACTTCAGAAAA
rs13162633  Extension  ttgtcATAGAGGCTTCAGTCAC
rs13162633  Forward  ACGTTGGATGGGCTAGTCATAGTATGAGAG
rs13162633  Reverse  ACGTTGGATGAAAGATGAGTTCGCTGAGGG
rs13167635  Extension  ccggGTGCATAGATACTCATGACT
rs13167635  Forward  ACGTTGGATGTTGGAATTGACCCCTTGGTG
rs13167635  Reverse  ACGTTGGATGAATTATGCATCCCACATGGC
rs13168872  Extension  cctTCTTTATCTGTATCCTTTGC
rs13168872  Forward  ACGTTGGATGTATGCCTTGTCCTACGCATC
rs13168872  Reverse  ACGTTGGATGGCTAGAATCGCTCACAGAAC
rs13174179  Extension  cccGAGCCTCCATTTGTACG
rs13174179  Forward  ACGTTGGATGCAGGCGGGGAGCCTCCATT
rs13174179  Reverse  ACGTTGGATGAGAACCTTGAAGGAATTGGG
rs13177306  Extension  TCACCCAGATACATTCAT
rs13177306  Forward  ACGTTGGATGGGAAGAACTCAGATCTTCAC
rs13177306  Reverse  ACGTTGGATGGGAATTTATAGACAAATGTG
rs13177918  Extension  tctccTGCCACTCTTTCCTTCA

rs13177918  Forward  ACGTTGGATGTCTGACTGCACCTCATGCCA
rs13177918  Reverse  ACGTTGGATGAACTGCAGGGAAAGGTCTTG
rs13177951  Extension  gtcgTCCAGTGAAATAAGGACA
rs13177951  Forward  ACGTTGGATGTGATAAGTCCGGAGTCCTTC
rs13177951  Reverse  ACGTTGGATGAAGACCTTTCCCTGCAGTTG
rs13311608  Extension  TGGCTATTCTTCAAAACGTTTA
rs13311608  Forward  ACGTTGGATGGGGAAACAGGTTGGCTATTC
rs13311608  Reverse  ACGTTGGATGTAGGAGTTGGAATTGCTGCG
rs13356198  Extension  ccGCAGGGATCCACTAGAT
rs13356198  Forward  ACGTTGGATGCAGGTAGCATCTATAGCAGG
rs13356198  Reverse  ACGTTGGATGTCATCTTGTTCCATCCTGCC
rs13358977  Extension  CTCTAGTTCTCCCTCATGG
rs13358977  Forward  ACGTTGGATGCTTGGGTTTTGAGCTGAATC
rs13358977  Reverse  ACGTTGGATGAAAAAGGGGTGGGGGAGAAC
rs13359059  Extension  GACCTCTGTCCTCAGATA
rs13359059  Forward  ACGTTGGATGAATAGAAGGTTGGACCTCTG
rs13359059  Reverse  ACGTTGGATGTGACGCTCTTAATTTTGTGG
rs13359847  Extension  CCGCACCTGGTATATGCTA
rs13359847  Forward  ACGTTGGATGAGCATTTAGAACAGTGCCCG
rs13359847  Reverse  ACGTTGGATGGAAATGAAAATGGAATGGC
rs13387221  Extension  cgGCTACTCTCAAAAGATAAAGTA
rs13387221  Forward  ACGTTGGATGGAGGAACGACAGAAAATCAAG
rs13387221  Reverse  ACGTTGGATGCTTAGGGAATTTAGGAATC
rs13404066  Extension  tCCCAGCCAAGAATATGATTTTTAA
rs13404066  Forward  ACGTTGGATGAAAGTGCTGGGATTACAGGC
rs13404066  Reverse  ACGTTGGATGGAGCGAGATCCTGTCTCAAA
rs13434570  Extension  TGCCAAGATAAAAACGGATTG
rs13434570  Forward  ACGTTGGATGGCCAGGAAAGTTACTCTAAC
rs13434570  Reverse  ACGTTGGATGCCCCTTTCCCAATTAGATCA
rs1348431  Extension  AACTCAGCAGTCTAGGAGCA
rs1348431  Forward  ACGTTGGATGGCATATTTTAAAACTCAGCAG
rs1348431  Reverse  ACGTTGGATGTGGCAATGACTCAAGAGAGG
rs1351608  Extension  TGCAACCTCATTTATCTGACAAC
rs1351608  Forward  ACGTTGGATGGAAATAGGGCACCTCCAATG

rs1351608  Reverse  ACGTTGGATGTTTACTGCCAAAGACACCCG
rs1363333  Extension  caagTCATTCTCAAATGCAGATTTAA
rs1363333  Forward  ACGTTGGATGAACAGCGCTACTGTCACCTG
rs1363333  Reverse  ACGTTGGATGAAAGCCCTTGTCATTGAAGA
rs1366455  Extension  cTGGGCTGAAATATATACTTGTCAC
rs1366455  Forward  ACGTTGGATGCTTGCACATGGGCCTGTTTC
rs1366455  Reverse  ACGTTGGATGTGTGGAGCAAAGAGTGTCTG
rs1373651  Extension  tcTCCAGTCTTTCGCTTTA
rs1373651  Forward  ACGTTGGATGCCTTGCCAATATTCCCAAAC
rs1373651  Reverse  ACGTTGGATGCTTATCTTGTGGAGAAGGTG
rs1406754  Extension  GGTCTCCTCCTCCCCCA
rs1406754  Forward  ACGTTGGATGAATTCACATGGGTCTCCTCC
rs1406754  Reverse  ACGTTGGATGTTTGTAGCTGAGAGTATGGG
rs1406755  Extension  agGTACTTGAAAGTAATAAAAGTACA
rs1406755  Forward  ACGTTGGATGGGCCTAGTGGGAGTTAAAAT
rs1406755  Reverse  ACGTTGGATGGGAGGAGACCCATGTGAATT
rs1450478  Extension  ccgaTCTGTTAGACAGCAAGG
rs1450478  Forward  ACGTTGGATGTCACGGGTTCAGTTTAAGGC
rs1450478  Reverse  ACGTTGGATGTGTATCTCACTGACACTTTG
rs1465327  Extension  ccATGTGATGTGCCAAAGA
rs1465327  Forward  ACGTTGGATGAGCAGATTCAGTAGTATGTG
rs1465327  Reverse  ACGTTGGATGTGAGAATTGCTGTGACTCCG
rs1465693  Extension  CCCCCAAACCCTTATTC
rs1465693  Forward  ACGTTGGATGTGTCCCCATAAATGGAAGGC
rs1465693  Reverse  ACGTTGGATGTCTTACAAGCAGTTGAGGCG
rs1468402  Extension  GAATGCTAGGAAAGGCTATA
rs1468402  Forward  ACGTTGGATGCAGCAATACTTGAATGCTAGG
rs1468402  Reverse  ACGTTGGATGGAGTGATTTCACTGTGAAAGC
rs1470696  Extension  ggGTCATCCTTCAGGGCA
rs1470696  Forward  ACGTTGGATGCCACTGACAACATGATCTCC
rs1470696  Reverse  ACGTTGGATGTGAACATTCTCGCCAACTCC
rs1496276  Extension  cttaCTGCCTCAGAACTTTCT
rs1496276  Forward  ACGTTGGATGGAAGAAACCATTGGGCTTCC
rs1496276  Reverse  ACGTTGGATGGGCTTCCTAAGGATGAGTAG

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs1504175   Extension    cGTTTTAAATACTACATCTCAGTTAA
rs1504175   Forward      ACGTTGGATGACCTATTCTGGGACAGGTG
rs1504175   Reverse      ACGTTGGATGTGAATGTCAGTATCCCTGTC
rs1513721   Extension    ATGTCAGTGGCCTTTCCT
rs1513721   Forward      ACGTTGGATGAATTAAATATGTCAGTGGC
rs1513721   Reverse      ACGTTGGATGCTTGATTTATTCCCCACATAC
rs1513807   Extension    TGCATCCCTGTTGCA
rs1513807   Forward      ACGTTGGATGGACATCATTACATTGCATCCC
rs1513807   Reverse      ACGTTGGATGTATTCCAGCTACAGTATTCC
rs1529625   Extension    atGAGAAAATTTCTTTCCTACTCATTT
rs1529625   Forward      ACGTTGGATGGGGAGAAAAAATTATTGGAG
rs1529625   Reverse      ACGTTGGATGGAAATCTGACATAGACTTCC
rs1534696   Extension    gGCAAGAACTGCTAATTTATTTC
rs1534696   Forward      ACGTTGGATGTCAAGGACCTCTTGCCCAAC
rs1534696   Reverse      ACGTTGGATGTTCATGTGGGGAATGCACTG
rs154001    Extension    GAGGTGTTCCCTGGC
rs154001    Forward      ACGTTGGATGTCTGCTTGCTGCATTTCCCT
rs154001    Reverse      ACGTTGGATGGTTGACACAGCGTCCATTTG
rs1549919   Extension    TTCCCTTTCTTCTGGTCATAAAT
rs1549919   Forward      ACGTTGGATGCACGTGTCCTCATCCATTTC
rs1549919   Reverse      ACGTTGGATGGAGCTATATTTGGAAGCCGC
rs1562855   Extension    ttcctCCAATCATTGTGGGATACA
rs1562855   Forward      ACGTTGGATGACTTTTTGCCCAGCCTACAG
rs1562855   Reverse      ACGTTGGATGCACATTTAAGGCATTCATAAC
rs1564481   Extension    gTGAAGCTGCTCATTGCC
rs1564481   Forward      ACGTTGGATGCATTGGAAATGAAGCTGCTC
rs1564481   Reverse      ACGTTGGATGGATTGCAAACACTAGGTGAC
rs1588523   Extension    TGTGCTAAATCTACTCTGCCTA
rs1588523   Forward      ACGTTGGATGCATTCAGAAAATCCTAGGGC
rs1588523   Reverse      ACGTTGGATGCAGGCTTGGTTATTCCATCG
rs1603909   Extension    ggTGTTGAGGGACTTTTCTT
rs1603909   Forward      ACGTTGGATGCCTTTATTATGTTGAGGGAC
rs1603909   Reverse      ACGTTGGATGCAAAATCCAACATCCTTTC
rs162953    Extension    GTGAGGGGCACCGTG

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs162953    Forward      ACGTTGGATGTTTCTCGCTGTTGGTGGAAC
rs162953    Reverse      ACGTTGGATGGGAGCCCTGCCTTTTTCTTT
rs163186    Extension    GGTCATTGGAGCTTGTG
rs163186    Forward      ACGTTGGATGTCAGTCACCATTGTCTCCCG
rs163186    Reverse      ACGTTGGATGGCAGACTTAATTGCTCAACC
rs167349    Extension    TAATGCCCCTCCTTTTG
rs167349    Forward      ACGTTGGATGCCAATCATTCATAATGCCCC
rs167349    Reverse      ACGTTGGATGTTGAAGTTCTCCAACATTCC
rs16881927  Extension    GGCATGATTCCAGGGCA
rs16881927  Forward      ACGTTGGATGAGAGGTGTAGGGCATGATTC
rs16881927  Reverse      ACGTTGGATGTACAGGCAGGCATTGAGCAC
rs1695508   Extension    CGGGATACCGGACCT
rs1695508   Forward      ACGTTGGATGAGTGGCACTTTGCTTCTCCC
rs1695508   Reverse      ACGTTGGATGCTGTTTAACATGGAAGTAGC
rs17031599  Extension    cGTGTAAAAAAACTATTTGCTTTAAT
rs17031599  Forward      ACGTTGGATGCCTATGCCTTTACACACAT
rs17031599  Reverse      ACGTTGGATGCCAAAATGATATAGACTGAAC
rs17034950  Extension    tgccACCTGTGAACATATCCAC
rs17034950  Forward      ACGTTGGATGCTGTTAGGGTGTGAAACCTG
rs17034950  Reverse      ACGTTGGATGGAGAAAAGGAAATGAACAC
rs17135364  Extension    cgAGCTCATGCAGGACTCT
rs17135364  Forward      ACGTTGGATGTTGACCTCACAGGGTAGCTC
rs17135364  Reverse      ACGTTGGATGGTTGAGTAAGCTTTTGCCTG
rs17149137  Extension    GGGAACTGAGTTACAGG
rs17149137  Forward      ACGTTGGATGGGTTTACATTTGAGCAGCAC
rs17149137  Reverse      ACGTTGGATGCAGTGGCAGTGACTCACCTA
rs17149910  Extension    ACCTGTTCTTCTTGTTTCA
rs17149910  Forward      ACGTTGGATGTCTGTTCTTTCTCACACCCG
rs17149910  Reverse      ACGTTGGATGACTGAATCTGAGAGCATCGC
rs171527    Extension    CAAGAGTTAGTGGGAGAA
rs171527    Forward      ACGTTGGATGCTACATTAGCTTGACAAGAG
rs171527    Reverse      ACGTTGGATGGTTGTTGAAAGTGGCTACTG
rs17153429  Extension    acCCTATCTTCCTAAGGGATT
rs17153429  Forward      ACGTTGGATGACTTCCCTTTCCTATCTTCC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs17153429  Reverse      ACGTTGGATGTGAACTACAGATGTCATG
rs17170180  Extension    ttTGTTACATGTAAATTAAATCAACA
rs17170180  Forward      ACGTTGGATGGGAATGTGTTACATGTAAA
rs17170180  Reverse      ACGTTGGATGCAGAAATGTTTTAAAAAAGG
rs1732001   Extension    CTCTATTACACGCTCTTCT
rs1732001   Forward      ACGTTGGATGAAGGAATCTCTATTACACGC
rs1732001   Reverse      ACGTTGGATGGGAGTGTGTAAAAGAGCACC
rs17323768  Extension    GCAGTAAGGAGTTTGGA
rs17323768  Forward      ACGTTGGATGGGCCCAAATCATGAGGTTTT
rs17323768  Reverse      ACGTTGGATGTTCCACTACACCTCAAGTGC
rs17326222  Extension    gggtgCTCTCTGAGCATTGTTTTAAG
rs17326222  Forward      ACGTTGGATGGAGGAAACTTCTCTCTGAGC
rs17326222  Reverse      ACGTTGGATGCACAACAAAATAGCACTAC
rs17326236  Extension    cATATTTTTGTTCTAATTAGGCTAG
rs17326236  Forward      ACGTTGGATGCTTGGACCCTTCCTTCTTAT
rs17326236  Reverse      ACGTTGGATGATTATCATTATGATTAGGTG
rs17598783  Extension    ctctcTACCTAATCCAGCAGACATTAC
rs17598783  Forward      ACGTTGGATGTCCAGGTCCCTAGTTTTGAG
rs17598783  Reverse      ACGTTGGATGCTCTTCCTGGTAGTACATAG
rs17653932  Extension    cgcttGACGAGAGAGGCAGGTTGG
rs17653932  Forward      ACGTTGGATGATCTCGCAGTATACCGAGGC
rs17653932  Reverse      ACGTTGGATGATGAAACAGTGCCTGGACTC
rs17656204  Extension    gaTGAGGCTACCAAAATGA
rs17656204  Forward      ACGTTGGATGAAGCTGGGCACTGAGGCTAC
rs17656204  Reverse      ACGTTGGATGTGTAATCGCCTCTGGGTGTG
rs17692371  Extension    GGCCATCCAGAGTCT
rs17692371  Forward      ACGTTGGATGTAAATTGTATAGGGCCATCC
rs17692371  Reverse      ACGTTGGATGCCACTTAAGTTAGCCAAAGG
rs17710564  Extension    TTTCTTTGAGGTTGTATCATTC
rs17710564  Forward      ACGTTGGATGCTTTTCTTTCTTTGAGGTTG
rs17710564  Reverse      ACGTTGGATGTGCCTGGGCTGCAAACATAC
rs17731538  Extension    ggggtGTGCAGTTGCTTATCACT
rs17731538  Forward      ACGTTGGATGCAGGGAGTCTTGGGTTTAAG
rs17731538  Reverse      ACGTTGGATGAGAGGCCAATCCTATGATCC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs17754040  Extension    ggtgGCGGGAATTCCGACC
rs17754040  Forward      ACGTTGGATGCTTGATTTAGCTGGGCTGCG
rs17754040  Reverse      ACGTTGGATGACTTGGAGAGATCAGGCTTC
rs17797713  Extension    ccaaCTACATTTTGCACAATATCTATG
rs17797713  Forward      ACGTTGGATGAGCTGTGCAACAGCACCTAC
rs17797713  Reverse      ACGTTGGATGGAGCCTTCTGTTTGCACTAC
rs17819022  Extension    GGTGCAAAAAGAGTGACTGATT
rs17819022  Forward      ACGTTGGATGATGTGGGTGCAAAAAGAGTG
rs17819022  Reverse      ACGTTGGATGCATATGATTCCTAGTCATCC
rs1833661   Extension    ggggaCTACATCCAAGAAACTGCTCA
rs1833661   Forward      ACGTTGGATGCACCCATCTTTCCCTACATC
rs1833661   Reverse      ACGTTGGATGCACAAAAGTCAGGTTGTGGG
rs1838207   Extension    cccaaAAATTCAGGCCTGCGTT
rs1838207   Forward      ACGTTGGATGGAAAGTAAAACAAATTCAGGC
rs1838207   Reverse      ACGTTGGATGCTAGGGCTTCATAAGCCCTG
rs185411    Extension    tGGCTGCATCAACATCT
rs185411    Forward      ACGTTGGATGTGAACTATTCCTTGAGGCTG
rs185411    Reverse      ACGTTGGATGTACAGTGACTCCCTCCAATC
rs1862175   Extension    ccccgTGGCTCTGAAATCTCAGCGACC
rs1862175   Forward      ACGTTGGATGGCTATGGCAATAAATTGGCTC
rs1862175   Reverse      ACGTTGGATGGCATGATGAGGGTGACAAAC
rs1864066   Extension    ccTGTTTGCATGCTCTTTTA
rs1864066   Forward      ACGTTGGATGCTACAGCATGAATTGCCTTG
rs1864066   Reverse      ACGTTGGATGCTTAGCAGAGTGGGAAAAGC
rs1864828   Extension    tcctgTTTTAGGTACATTTGATCACA
rs1864828   Forward      ACGTTGGATGCCTATCCAAAGTTTTTAGGT
rs1864828   Reverse      ACGTTGGATGAGTTACTTGACTCATATGGC
rs1864972   Extension    agcTCCCAATCCCTACCTTAA
rs1864972   Forward      ACGTTGGATGCCTGACGATCCCAATCCCTA
rs1864972   Reverse      ACGTTGGATGATGCTAACTCCTTTGGCCTG
rs186749    Extension    ccccGTCTGTGCTTTGGTCA
rs186749    Forward      ACGTTGGATGCTCAAGTGCCCTAAAAGTGC
rs186749    Reverse      ACGTTGGATGACATCCCTGAACCCTTGTG
rs1867785   Extension    CAGAGCTGGCTGAGGTC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs1867785   Forward      ACGTTGGATGGCTATTTTCGAATGGCAGCG
rs1867785   Reverse      ACGTTGGATGGGCACCAAGGCACCAAAATT
rs1868087   Extension    cccgtTCTTCAGCAGATTTGTGAC
rs1868087   Forward      ACGTTGGATGCAGAAGGGTCTTCAGCAGAT
rs1868087   Reverse      ACGTTGGATGAGCTGAGAGTGAAAACTGGG
rs1868715   Extension    ATGCCATTTCCTGACAGA
rs1868715   Forward      ACGTTGGATGTCTGAGCATGCTTTGAAGGG
rs1868715   Reverse      ACGTTGGATGCTTGACCCTTCCTCTCTCAC
rs1919935   Extension    TTTTTGCTGTACATTTTCAAA
rs1919935   Forward      ACGTTGGATGCTTATGTGAAACATGATAG
rs1919935   Reverse      ACGTTGGATGGGTGCTCTTTACTAAATAGG
rs1947582   Extension    TTCACATGAGTTCTAGAGTTTC
rs1947582   Forward      ACGTTGGATGCTCTGCTCTCTTCCTATTTC
rs1947582   Reverse      ACGTTGGATGGGAGCAAAAAGTGACTGTAA
rs1966484   Extension    aggAATTCCGCCACCCTCCCA
rs1966484   Forward      ACGTTGGATGAGTGTGTTGGGAAATTCCGC
rs1966484   Reverse      ACGTTGGATGACAAGGCAGCCACTTAACTC
rs1972530   Extension    GGCTGTCAACCTAAGAG
rs1972530   Forward      ACGTTGGATGAAAAAAAAGGCTGTCAACC
rs1972530   Reverse      ACGTTGGATGGGGAATGTATGTTTGGGAGC
rs1992846   Extension    aaagcTGTGTTAAGTTGGACCC
rs1992846   Forward      ACGTTGGATGTCTGAGAACTGTGGCCTACC
rs1992846   Reverse      ACGTTGGATGAGGCTTAGGCTGAACAGAGG
rs1997377   Extension    agTGTGGAATTTGTCCCCC
rs1997377   Forward      ACGTTGGATGGTCATGTGCTTAATGATCAAG
rs1997377   Reverse      ACGTTGGATGATGAGAAAGGGCCCAGAGAC
rs2010994   Extension    gGGCTTGCAGAGCTCAACCT
rs2010994   Forward      ACGTTGGATGATCCGGGTTCAGGCTTGCAG
rs2010994   Reverse      ACGTTGGATGTACTTGGCTAAGGCTTGGAG
rs2012547   Extension    AGCCCTCTTGGAATCCC
rs2012547   Forward      ACGTTGGATGGAAAGAAGGCTGAAGACCAC
rs2012547   Reverse      ACGTTGGATGACATGCCCCCAGGAACTCAT
rs2015018   Extension    AAATCCTCCCACTGATG
rs2015018   Forward      ACGTTGGATGGACCCAAAGGTAACAAACGG

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs2015018   Reverse      ACGTTGGATGGCCATATCCATAGTCTCTCC
rs2034327   Extension    ttcAGACCTATTTTCAGTTCGT
rs2034327   Forward      ACGTTGGATGGGATATAGTACAGGCACCTC
rs2034327   Reverse      ACGTTGGATGTTTAAAATAGTTTCACATGC
rs2035116   Extension    cccACAGAAAGTTAAAGATTGCC
rs2035116   Forward      ACGTTGGATGGAAGCTTGTGAAACAGAAAG
rs2035116   Reverse      ACGTTGGATGTGTGTCGTAGAAAGGTAGGC
rs2035472   Extension    cTAAGGAAAAGGGGAGAAAG
rs2035472   Forward      ACGTTGGATGCAACAAACTACCGCCTTGAG
rs2035472   Reverse      ACGTTGGATGACATCATTAGTCCCACGAGC
rs2043188   Extension    ttTCAGGAGCTCCAGAAAG
rs2043188   Forward      ACGTTGGATGAGGAATTTTCTGTCAGGAGC
rs2043188   Reverse      ACGTTGGATGGCAAACGAAGGCTGTTTTGA
rs2043912   Extension    ccCCTTCAACAGAATGTAAAGGTTAC
rs2043912   Forward      ACGTTGGATGCGCCTTCAACAGAATGTAAA
rs2043912   Reverse      ACGTTGGATGTGAGTGTGGGAGATGTTAGC
rs2044241   Extension    ATGTGAGAGAGTCTTTGAT
rs2044241   Forward      ACGTTGGATGGGTTCTGTCTAAAGTATGGC
rs2044241   Reverse      ACGTTGGATGAAACACATGCACCTGTGACC
rs2047681   Extension    gcaatACCAGCCTGGCCAACACTG
rs2047681   Forward      ACGTTGGATGAAGGTAGGCAGATCCCTTGA
rs2047681   Reverse      ACGTTGGATGATTACAGGCACACGCCACCA
rs2053875   Extension    acgAGACCACACAAATACAGATT
rs2053875   Forward      ACGTTGGATGGGGCCTAGACCACACAAATA
rs2053875   Reverse      ACGTTGGATGTCCAGGCAAACTCCACTGAT
rs2067087   Extension    gaCCTAGAACCCTACTCTGTCT
rs2067087   Forward      ACGTTGGATGTAAAACCGCCTTCAAGTCCC
rs2067087   Reverse      ACGTTGGATGCCCTCCAAGTGGCATTGTAA
rs2072569   Extension    AGTTTTCATTTAATTAAGAAGGCTA
rs2072569   Forward      ACGTTGGATGGCTGCTGCGTGAGTTTTCAT
rs2072569   Reverse      ACGTTGGATGGTAAACTAGCTAAAGGAAG
rs2077482   Extension    gggaaAAGTTAACAGTTTGGGAAAAA
rs2077482   Forward      ACGTTGGATGCATCAGAAGTTAACAGTTTGG
rs2077482   Reverse      ACGTTGGATGCAAGAGAAATTTTAACCCAG

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs2080896   Extension    ccTTTTCATGACAACACAAAAAA
rs2080896   Forward      ACGTTGGATGGGTTGATCTGTTTTCATGAC
rs2080896   Reverse      ACGTTGGATGCGGCCCCTGTAATCAGTATT
rs2112487   Extension    gagGAGGGCAAAGACCATGTAAAA
rs2112487   Forward      ACGTTGGATGCATAAGTGAACTCCATGAGG
rs2112487   Reverse      ACGTTGGATGTAATAACTCATCTGATGCC
rs2120349   Extension    aGGGTCTGAAAACTGGTTTAGGAAAA
rs2120349   Forward      ACGTTGGATGGGGTCTGAAAACTGGTTTA
rs2120349   Reverse      ACGTTGGATGGGAGATGTTGGTCAAAGGAG
rs2122715   Extension    gggagACCATTGTGAACCTTAGGC
rs2122715   Forward      ACGTTGGATGAGTCGATACCATTGTGAACC
rs2122715   Reverse      ACGTTGGATGAGAACAAGATAGTAAAAGG
rs2161257   Extension    AGCCCAGACTCCCCTCCT
rs2161257   Forward      ACGTTGGATGTGTGCAGAGCCCAGACTCC
rs2161257   Reverse      ACGTTGGATGAGGATCCATGCTGCCGAAG
rs216137    Extension    ggCTCTGCTCATAAAGCACG
rs216137    Forward      ACGTTGGATGGAGACAACAGTTGTCTCTGC
rs216137    Reverse      ACGTTGGATGCCCCAAAATTCTGAGGAAGC
rs216150    Extension    ACTGTATGCCAACATGAG
rs216150    Forward      ACGTTGGATGTAGTGGGTGCCTACTGTATG
rs216150    Reverse      ACGTTGGATGTCGCACTAGACAGTAAACGC
rs2165929   Extension    ccCCAGTGCCTTTTCCCACTA
rs2165929   Forward      ACGTTGGATGAAGGCCACCAGTGCCTTTTC
rs2165929   Reverse      ACGTTGGATGCAGTATGAAATAAATACTATG
rs2190242   Extension    CCACTGGCATTGGCACA
rs2190242   Forward      ACGTTGGATGACCCATCTCCAGCATTCTAC
rs2190242   Reverse      ACGTTGGATGTGGTGGATCCACACAAAGAG
rs221171    Extension    cccacCAGAAAAGACCCTATGTTG
rs221171    Forward      ACGTTGGATGCTACAGTTTCAGAAAAGACCC
rs221171    Reverse      ACGTTGGATGGGGCAGAACCAAGTCTTTTG
rs2229562   Extension    agaACCCAGCATCTCCTGGCCTGGCCTG
rs2229562   Forward      ACGTTGGATGTGCCAGCACCCAGCATCTC
rs2229562   Reverse      ACGTTGGATGAGCAGAAAGCTTCCAGAAGG
rs2231142   Extension    attACGGTGAGAGAAAACTTA

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs2231142   Forward      ACGTTGGATGTGATGTTGTGATGGGCACTC
rs2231142   Reverse      ACGTTGGATGGTCATAGTTGTTGCAAGCCG
rs2231148   Extension    GAGTTTCAATTAGAGATAAAAACTTA
rs2231148   Forward      ACGTTGGATGGGGAAGAGTTTCAATTAGAG
rs2231148   Reverse      ACGTTGGATGGGAGAGATCTGATTATGTAAC
rs2237084   Extension    cccaCTGAGATTCTGTATATCCAACAC
rs2237084   Forward      ACGTTGGATGAGGTGCGACCTGAGATTCTG
rs2237084   Reverse      ACGTTGGATGCCAACTCAGTGTAGGTCTGG
rs2240781   Extension    ggtTGCTGGTGGCAGGGC
rs2240781   Forward      ACGTTGGATGGAGTGGAACCCTGGAGCTG
rs2240781   Reverse      ACGTTGGATGAACTAAAGCTGAAGGGCAGG
rs2253519   Extension    TGAAAATAATGTAAGTTCAAACTG
rs2253519   Forward      ACGTTGGATGCGAGTTGGCTGTTAAGATTG
rs2253519   Reverse      ACGTTGGATGCAACACCCTACCCAAGAAAT
rs2254161   Extension    TGAGGCAGTCATTTGA
rs2254161   Forward      ACGTTGGATGGAACACCATTTGTGAGGCAG
rs2254161   Reverse      ACGTTGGATGTCCCTTCAGCTGCCTGAAAG
rs2254958   Extension    ccacACAATCTTCATCGAGGCAT
rs2254958   Forward      ACGTTGGATGACACTTGGTAGAACCACAGG
rs2254958   Reverse      ACGTTGGATGGCCAGGGTCTCCTGATTTTT
rs2270414   Extension    AAAACTAGCCCCCCA
rs2270414   Forward      ACGTTGGATGCGTGTTGTGACCCATTTAAC
rs2270414   Reverse      ACGTTGGATGCTTCCGAATTTTGCACTCAG
rs2272226   Extension    ggagAAAAGCAGGGACTATTTCTTA
rs2272226   Forward      ACGTTGGATGGTTCCTAGAAAAAAGCAGGG
rs2272226   Reverse      ACGTTGGATGAGCTGGACTTTACTTCTAGG
rs2272448   Extension    gCTCCTTAAAGATTTATGAGTTTATT
rs2272448   Forward      ACGTTGGATGATGACGGTAACACCATGCTC
rs2272448   Reverse      ACGTTGGATGGAACATATAAACCCCAGAGAG
rs2278753   Extension    ggATGCTTGGGGGTGGGAG
rs2278753   Forward      ACGTTGGATGGACATGAGCAAAGCCAAACC
rs2278753   Reverse      ACGTTGGATGGTGACCCTCGCTGACCTCA
rs2282809   Extension    ACCATCTGTAGTCCCTCAA
rs2282809   Forward      ACGTTGGATGTTGTCTTGTTTATCCCTGCC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs2282809   Reverse      ACGTTGGATGGTGAGGATGGGACTGATTAG
rs2284218   Extension    gaGATGTCCTGGGAGGGAG
rs2284218   Forward      ACGTTGGATGATCTCTGAGAGCCCCCAGAA
rs2284218   Reverse      ACGTTGGATGGACCTTCTCAATCTAGATAG
rs2287350   Extension    GTTCCACAGACTAAACCA
rs2287350   Forward      ACGTTGGATGTACTCTCTTGGGTGTTCCAC
rs2287350   Reverse      ACGTTGGATGCATGGAACCCATTTACCATA
rs2304060   Extension    ttgtAAAGCCTGTGGATCAAG
rs2304060   Forward      ACGTTGGATGGTCTATCCCAGTAGGAAAGC
rs2304060   Reverse      ACGTTGGATGATTTTGGTAGCCTCAGTGCC
rs2304061   Extension    cccaCGCTCCAGCGCCCTCACTGCG
rs2304061   Forward      ACGTTGGATGCTTGCACAACAGCACACTCC
rs2304061   Reverse      ACGTTGGATGTGAGTTTCTAAACCCCACCG
rs2306619   Extension    ccacTGCAGAGAAATGAAATAAAAAG
rs2306619   Forward      ACGTTGGATGTTTCCAGGGCGATCCATAAC
rs2306619   Reverse      ACGTTGGATGGACCCCAAGTCCTAGAAAAC
rs2331014   Extension    TTCTTTTTTTGAGATGGAGTC
rs2331014   Forward      ACGTTGGATGGTATGTCTGCGTCCTGTTTC
rs2331014   Reverse      ACGTTGGATGGAGCTGAGATCATGCCACTG
rs2346175   Extension    cctgtTGCCTGGTGCTCTATCAA
rs2346175   Forward      ACGTTGGATGTGCTGTGCTTTTTGTGCCTG
rs2346175   Reverse      ACGTTGGATGGTTAGGAAGAGTATATTGAAC
rs2346176   Extension    TCGCTCTTCTTCCCCGC
rs2346176   Forward      ACGTTGGATGTGAGAAGGCTCGCTCTTCTT
rs2346176   Reverse      ACGTTGGATGATCTGAAGTCAGGCACCGT
rs2346419   Extension    AGCTTCATCTTCACTGAG
rs2346419   Forward      ACGTTGGATGCACTAAGGCCCCAACTTATT
rs2346419   Reverse      ACGTTGGATGGTTTGGTCAAAATGTGAAGAG
rs2373000   Extension    gcctgGAAGGAGTTGTCTATTCAC
rs2373000   Forward      ACGTTGGATGAGCAAAACCCCTTGAAGGAG
rs2373000   Reverse      ACGTTGGATGCCAACTGGGCTAAGTTTAAG
rs2390054   Extension    ggaTGAAAACTTCTGTGCAAAG
rs2390054   Forward      ACGTTGGATGGGAGAAGATGCTTTGTATCTG
rs2390054   Reverse      ACGTTGGATGTTTTTCTCCCTGTGGATTAC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs2391276   Extension    tggCCTGCTTTTTTTCCTTGAACACTG
rs2391276   Forward      ACGTTGGATGCCTGGAGGTCCTGCTTTTTT
rs2391276   Reverse      ACGTTGGATGGAGCAGTCAGAGGACCCAT
rs245051    Extension    tcCACATAACACAATCTAAAACCA
rs245051    Forward      ACGTTGGATGTGCATGTTTGTGTATGTGTG
rs245051    Reverse      ACGTTGGATGACTTTTTCTACTGACCCTC
rs245055    Extension    gaTTTTCTCCTTGTCCCCA
rs245055    Forward      ACGTTGGATGCGCCCATTCTAGTTAACCTG
rs245055    Reverse      ACGTTGGATGGCATATCACATTAGTTGTAG
rs245079    Extension    gAAATAACGAAACAAAACAAAAAA
rs245079    Forward      ACGTTGGATGCGCAGAAAAGTTTAAATAACG
rs245079    Reverse      ACGTTGGATGCGCAGCTCTTGGATGTTATG
rs245080    Extension    gccgTCGATAGGATCATTACTAGACC
rs245080    Forward      ACGTTGGATGAGAGATGATGAATGCCTTCG
rs245080    Reverse      ACGTTGGATGCTGTCTTTTTGAGTTCATCG
rs246390    Extension    tCACTTCACTCCTCTGC
rs246390    Forward      ACGTTGGATGAGCATTCACACCTCAAGCTG
rs246390    Reverse      ACGTTGGATGTGTTGAAGAGGCCAACCATC
rs246392    Extension    cGTGCCTGTCTGCTCT
rs246392    Forward      ACGTTGGATGAAACTGAGGCCCTGAAGAAG
rs246392    Reverse      ACGTTGGATGACCCAGGACTCTAACTCTTC
rs251468    Extension    aacccAACACACTGTTTGTACGGCCC
rs251468    Forward      ACGTTGGATGACAGACCTGGGACCAAACAC
rs251468    Reverse      ACGTTGGATGGGTGAGCAGGCACGTTGAG
rs252161    Extension    GGTTTTGTTGTTGCTGTTT
rs252161    Forward      ACGTTGGATGGTATGTTCCCTACCCTCTTC
rs252161    Reverse      ACGTTGGATGGTCTGGAAATCCCTGGCATC
rs253349    Extension    TGTGCTTTGATTTTATAAAGAGAC
rs253349    Forward      ACGTTGGATGGATTACAGGCATGTGCTTTG
rs253349    Reverse      ACGTTGGATGTGTGAATGTATTACCTACCC
rs2537742   Extension    tccggGTGAGCCACTGTGCCCAG
rs2537742   Forward      ACGTTGGATGCCAGAGCAGCTGGAAGTACA
rs2537742   Reverse      ACGTTGGATGGTGGGACATGTTAATGATAC
rs254209    Extension    gaGGCTCAATGTTGGGC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs254209    Forward      ACGTTGGATGTACATGTGTTCTCCCTCTCC
rs254209    Reverse      ACGTTGGATGAACACCTCAGTTGCACATGG
rs2545341   Extension    tcaACAGGCAGCTCAGAACC
rs2545341   Forward      ACGTTGGATGATTGGTACCTGCAGCAACAC
rs2545341   Reverse      ACGTTGGATGAGGTTAGAGGGCCTCATGG
rs2546343   Extension    cacctCCTCCCAAACGCCACGGGAC
rs2546343   Forward      ACGTTGGATGACTTCCTTCGACCCAGAGAC
rs2546343   Reverse      ACGTTGGATGTCTACCGGGCTGCAGGCTT
rs2569075   Extension    ccagCTGAAGCCACTAAACCTGCC
rs2569075   Forward      ACGTTGGATGAATGCCACTGGGATCTGAAG
rs2569075   Reverse      ACGTTGGATGTGAAATGCCAGGAAAGAGCC
rs256938    Extension    tagggGAAAAGCAGGAAACAGTTATA
rs256938    Forward      ACGTTGGATGCACATGACAGAATGCCCAAG
rs256938    Reverse      ACGTTGGATGGCTTTGACAAAGTTATAATC
rs256940    Extension    gggGACCAAAACCAGTAAGGAT
rs256940    Forward      ACGTTGGATGCTTTCTCAGTTTAGAAGACC
rs256940    Reverse      ACGTTGGATGGCCAGTTAAATCAAGTTTG
rs2572324   Extension    AGTGACAGGTTTGTTTTAA
rs2572324   Forward      ACGTTGGATGGGCTCTGGTGTTTTGAGTTC
rs2572324   Reverse      ACGTTGGATGGGTCATTTTATTGTCCTCAG
rs2602117   Extension    TCCTTTTCGTACCCTG
rs2602117   Forward      ACGTTGGATGCCCTTATTTTCTCCTTTTCG
rs2602117   Reverse      ACGTTGGATGGGAGCACTGAGTCAAGTAAC
rs2617280   Extension    TACTTAGACTGAGAGGTCA
rs2617280   Forward      ACGTTGGATGTTAGACTACTTAGACTGAG
rs2617280   Reverse      ACGTTGGATGAGTTGACCTTTCATGGTGAG
rs2622604   Extension    TCTCTTCCTTACTACTCTGG
rs2622604   Forward      ACGTTGGATGAAAAACATCTCTTCCTTAC
rs2622604   Reverse      ACGTTGGATGGGATCTTACAGCTCTCTTATC
rs2622621   Extension    ccTGACTCTCTACTGATAAATCTT
rs2622621   Forward      ACGTTGGATGGTTTTATGACTCTCTACTG
rs2622621   Reverse      ACGTTGGATGACTAGTTCAAGCCAAAGTGC
rs2622624   Extension    cGCAGGGATTTGGGTATA
rs2622624   Forward      ACGTTGGATGCTTGAGTATGCAGGGATTTG

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs2622624   Reverse      ACGTTGGATGCTTCTTTTCCATCTACAGGG
rs2622626   Extension    gggaaTGACCATACAGCCCTTT
rs2622626   Forward      ACGTTGGATGTGGCTTCTCACTGACCATAC
rs2622626   Reverse      ACGTTGGATGTCTGGAGAACGCGGCCTAAT
rs26531     Extension    acGACCTAGTGCAAGTCATTCAAC
rs26531     Forward      ACGTTGGATGAGCCGCTATCGATAGAAGAC
rs26531     Reverse      ACGTTGGATGCTCTTCCAGACACTTGACAC
rs2698720   Extension    TTATCCACTTCTACCCTATC
rs2698720   Forward      ACGTTGGATGCCCCAGGTTTGACCATTATC
rs2698720   Reverse      ACGTTGGATGGTTTCATCTCACTTAAGGGC
rs2699811   Extension    ttTACAAATCACCAGAAATCCT
rs2699811   Forward      ACGTTGGATGCCCTGCAGATCATACAAATC
rs2699811   Reverse      ACGTTGGATGCTCCCTCTCACATACAGTAG
rs2699814   Extension    gggttAAGGATCGCTGCATGGTTTTT
rs2699814   Forward      ACGTTGGATGTCTATTGAATGAAGGATCGC
rs2699814   Reverse      ACGTTGGATGACTGGGTGGTTGTGTCCTAC
rs2725248   Extension    aacttGAATCCAGCCTAAGCAACAG
rs2725248   Forward      ACGTTGGATGCAGTGAATCGTGATTGTGCC
rs2725248   Reverse      ACGTTGGATGGCTTTTCTTTCTTTTTGACAG
rs2725252   Extension    gaatgTATAGCTCCAGACTTTATGT
rs2725252   Forward      ACGTTGGATGTCAGGAACAGGAGACATTTC
rs2725252   Reverse      ACGTTGGATGGCTGATTGAAGGTGCATAAC
rs2731647   Extension    ccAGGGAGATTAAATTGAGGGAG
rs2731647   Forward      ACGTTGGATGGACAGGTATTGTCCTGTTAG
rs2731647   Reverse      ACGTTGGATGAAGAATAAGCAAAAACAAG
rs2737089   Extension    CCGATCTCACATAATTAGATTTAG
rs2737089   Forward      ACGTTGGATGTTAGCAGGATCCGATCTCAC
rs2737089   Reverse      ACGTTGGATGCTGAGTGTCACTGCATATAC
rs27567     Extension    tcCAAGTTTGAATGTTCTTTCTC
rs27567     Forward      ACGTTGGATGAGTGCAGGCAAGAAGTAAGG
rs27567     Reverse      ACGTTGGATGAGTTCACAGGTTCATCTTAG
rs2775      Extension    ggagGTTTTTGGAGAAGACAAAGAA
rs2775      Forward      ACGTTGGATGCAGCTGTTTTTGGAGAAGAC
rs2775      Reverse      ACGTTGGATGGATTAGTCACTGTCACAGTC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs28282     Extension    caTTCGACACGCAGCTCTC
rs28282     Forward      ACGTTGGATGGAATGAATCCTTCGACACGC
rs28282     Reverse      ACGTTGGATGAGAGGGACTAAGTGACCAAC
rs2893245   Extension    ggTAAGGAAAAGAACAGGAACTCT
rs2893245   Forward      ACGTTGGATGCCAATAAGGAAAAGAACAGG
rs2893245   Reverse      ACGTTGGATGCCCGGCACCACTATTTTCTT
rs2930807   Extension    TTAAAGCATTAGATTATGCCA
rs2930807   Forward      ACGTTGGATGAATCCCTTTAAAGCATTAG
rs2930807   Reverse      ACGTTGGATGAAGATCAGCTGCAATGACTC
rs2941664   Extension    CAACCTGGCTAACATGG
rs2941664   Forward      ACGTTGGATGGAGTTTGAAACCAACCTGGC
rs2941664   Reverse      ACGTTGGATGCCACGCCCAGCTAATTCTTT
rs2941666   Extension    cccatAACAAACCATTAGCAGTCACTC
rs2941666   Forward      ACGTTGGATGGACAAAGAACAAACCATTAGC
rs2941666   Reverse      ACGTTGGATGCACATCTATTTCATGAAGC
rs2956540   Extension    ccccgATTTTTCCTCCATTTGCTAACC
rs2956540   Forward      ACGTTGGATGCTTATTTTTCCTCCATTTG
rs2956540   Reverse      ACGTTGGATGCCTGTGAAACCATTCCTTCC
rs2963765   Extension    tGGATTGCTAAAGAGTGGAATAA
rs2963765   Forward      ACGTTGGATGCTCCTATGGGATTGCTAAAG
rs2963765   Reverse      ACGTTGGATGACACTAATAGGCTGGCAGAG
rs2973586   Extension    GCAGTTTGAGCCTGTTA
rs2973586   Forward      ACGTTGGATGAATGCATGCTTGATAAGCAG
rs2973586   Reverse      ACGTTGGATGCCCACATAAAAACCTTGTCC
rs2973603   Extension    aaatgAAATGGCATTCTTCTGTAA
rs2973603   Forward      ACGTTGGATGCCACATTCTGGAGGGAAATC
rs2973603   Reverse      ACGTTGGATGACCAAACAAACGGGGAATGG
rs30035     Extension    taatCAATTCATTACTCACCTACAA
rs30035     Forward      ACGTTGGATGTTGTCTCGGGTTTGTCTCTC
rs30035     Reverse      ACGTTGGATGTCATAGCTGCTGGGATCCAT
rs3017896   Extension    aaggTTCAAGGCAATGAAATAAATTCT
rs3017896   Forward      ACGTTGGATGTTATGTGGCTCCCAGGGTTG
rs3017896   Reverse      ACGTTGGATGTGGGTATGGTCATCTAGCAC
rs3017908   Extension    TGAGCTATTTCTACCAATAGTC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs3017908   Forward      ACGTTGGATGTGCTTCCTGAGACTGTTGAG
rs3017908   Reverse      ACGTTGGATGATAGGTAGAGCCTGAAAGAG
rs304386    Extension    ATTGGAGAGAATCGAGCC
rs304386    Forward      ACGTTGGATGCTTCCTTCCCTTCTGGAAC
rs304386    Reverse      ACGTTGGATGGAGAATACTACGAGCTGTG
rs304388    Extension    gaaagGACAGGAGCCATTAAAAA
rs304388    Forward      ACGTTGGATGCCCAGCTGTTTTGTTCTGAC
rs304388    Reverse      ACGTTGGATGGACGTGTTTTCGTAAGACCC
rs3087822   Extension    AAACACTGTTCTGGTTTTCA
rs3087822   Forward      ACGTTGGATGATGGACAAACACTGTTCTGG
rs3087822   Reverse      ACGTTGGATGACACTGAATTTTAGAAAAAC
rs3088359   Extension    ggCTGACAGATGACCTGGA
rs3088359   Forward      ACGTTGGATGGTGGCAAGCCTGACAGATGA
rs3088359   Reverse      ACGTTGGATGCCACCAATACCCTGCTTATG
rs3095902   Extension    ccccCCTGTCGCACAGGTGCC
rs3095902   Forward      ACGTTGGATGGGATAGTGCCTGTCGCACA
rs3095902   Reverse      ACGTTGGATGCATGGGAGGTCCGAATCTG
rs3097714   Extension    gTGATTAAAGTTCTTTATCTCACT
rs3097714   Forward      ACGTTGGATGGTGTGCTTTTTTCACTCTG
rs3097714   Reverse      ACGTTGGATGCAGTGCGGGAACTGTTTATG
rs3109433   Extension    TTTGCTACTGTGCCCAA
rs3109433   Forward      ACGTTGGATGTGAAGAGCTTAGTTTGCTAC
rs3109433   Reverse      ACGTTGGATGCTAACTACTGTTTACAGAG
rs3114018   Extension    gcTCAAATAACTTATTGCTTGAATTT
rs3114018   Forward      ACGTTGGATGGAGTATCAAATAACTTATTGC
rs3114018   Reverse      ACGTTGGATGGGAAACCTCACAAAAGTGTC
rs32206     Extension    gCCACTTATATTATTTCTTCCACTT
rs32206     Forward      ACGTTGGATGTTCTTGGCTCACCATTGGTC
rs32206     Reverse      ACGTTGGATGATCACCAAACTACATACAC
rs32216     Extension    gggAATGGGTAGCCAAGTG
rs32216     Forward      ACGTTGGATGCCAGTAGCTAATGTAATGGG
rs32216     Reverse      ACGTTGGATGCATCAGCTCAGCAAGTACTC
rs32233     Extension    aacTACAGTGATTGGAATTTTGTGAATA
rs32233     Forward      ACGTTGGATGACTCATTCCTTTTGTATGCC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs32233     Reverse      ACGTTGGATGCAGAAAAGTAAACTAATAGTG
rs32574     Extension    GTCAGACAGCTTGAGTTTGAA
rs32574     Forward      ACGTTGGATGCTTGTGAGTCAGACAGCTTG
rs32574     Reverse      ACGTTGGATGGCTTCTGTAACTGGCCCAAG
rs32974     Extension    ctcTCTCCAGCCATAGACC
rs32974     Forward      ACGTTGGATGAGCTGCCACTTTGGGTTAAG
rs32974     Reverse      ACGTTGGATGGGAAAAAGCACAATCCATTC
rs330787    Extension    GCAATAAAGAGAGTCGC
rs330787    Forward      ACGTTGGATGCAGACCACCATAATAAAGCG
rs330787    Reverse      ACGTTGGATGGACTACAGTATAATGTAAACC
rs331074    Extension    acccGCTCTTCTTTATACATCTGG
rs331074    Forward      ACGTTGGATGTCAATAGGAATGAAACCAGC
rs331074    Reverse      ACGTTGGATGCCAACCAGAAAAAGCATGGG
rs331086    Extension    CAAACTCAACATACACAGAC
rs331086    Forward      ACGTTGGATGGCCAGGCACGTATACTTTAA
rs331086    Reverse      ACGTTGGATGATAAGGTATACGTGGGGTGG
rs331097    Extension    ccttCCATTTTGCATAACAAGCAGAT
rs331097    Forward      ACGTTGGATGGGATGGCCATTTTGCATAAC
rs331097    Reverse      ACGTTGGATGGTATACCTCCAGAGATAACC
rs33410     Extension    GGAGAGTAGAGCGGC
rs33410     Forward      ACGTTGGATGATTCTCAGAGGTCCTACTGG
rs33410     Reverse      ACGTTGGATGTCTTTCTGCCCTGATGTTCC
rs33412     Extension    tAAGAATGTTATACAACAGCCT
rs33412     Forward      ACGTTGGATGCAAACTGGAGCCCTTTAACC
rs33412     Reverse      ACGTTGGATGCCAGCATTAGTATCCCATGA
rs335140    Extension    tccaTCAGCCCCCAGAGCCTAG
rs335140    Forward      ACGTTGGATGCCTTGATACTGCTTTCTCCC
rs335140    Reverse      ACGTTGGATGTGCACTTGAAAAGTACGTAG
rs335145    Extension    ATACTGCAAAACATCTTGGAA
rs335145    Forward      ACGTTGGATGCACTATCTCTTTGATACTGC
rs335145    Reverse      ACGTTGGATGACATTTACTGGAACCCTGTG
rs335170    Extension    AAGTGGAGGAAGATGTTAATACA
rs335170    Forward      ACGTTGGATGAAAAGTGGAGGAAGATGTT
rs335170    Reverse      ACGTTGGATGCATGCTACATATGAGAAAG

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs335180    Extension    AGATGGAATTCAAGTTGGGA
rs335180    Forward      ACGTTGGATGTTCAGGATTAGTGTATGTGG
rs335180    Reverse      ACGTTGGATGATTCCTCCTCTGGGCTTTTG
rs335182    Extension    ttAGTCAGATCTGCAAGTCAAAT
rs335182    Forward      ACGTTGGATGGTGTGAGTCAGATCTGCAAG
rs335182    Reverse      ACGTTGGATGTGAGATGAGGCATCCACAAC
rs336030    Extension    gaggAATGAAACAATAAGCACAAAG
rs336030    Extension    tggaATTTGTCAGACACTGTCCTGGGTA
rs336030    Forward      ACGTTGGATGTGTCAAGAATCGGACCATGC
rs336030    Forward      ACGTTGGATGGCATTTGTCAGACACTGTCC
rs336030    Reverse      ACGTTGGATGTGTCAAGAATCGGACCATGC
rs336030    Reverse      ACGTTGGATGGCATTTGTCAGACACTGTCC
rs337128    Extension    gggcGGAAATACAGACTTCGGTAT
rs337128    Forward      ACGTTGGATGCCCCAAGATTTCATTAGTTC
rs337128    Reverse      ACGTTGGATGGGCTTCTCTGACTTTTATGC
rs34161     Extension    ccaCCTCACACTCACGAAAAT
rs34161     Forward      ACGTTGGATGGAAAGATCCAAGAAGGTCCC
rs34161     Reverse      ACGTTGGATGTAGAATGTGCATGGAGTCCC
rs34166     Extension    CTTGAGCACAGCTTTTAG
rs34166     Forward      ACGTTGGATGTCACTGGATAGCAGATGGAG
rs34166     Reverse      ACGTTGGATGCAGTACAATCAGTCTGCAC
rs34167     Extension    gggaaCATCATAATGTGGATGGG
rs34167     Forward      ACGTTGGATGAACCCAGATTTTCTCCTACC
rs34167     Reverse      ACGTTGGATGCTATTGTATAAAGTAAAGAG
rs34173     Extension    cctTGGCAAGTGGCTCTCCTTTA
rs34173     Forward      ACGTTGGATGATGCAACCTAGTGGCAAGTG
rs34173     Reverse      ACGTTGGATGCATCCCACCTTTCATACCAG
rs34179     Extension    TGGATGAAGAGTTGCTT
rs34179     Forward      ACGTTGGATGTGACTGAATTGCTGCAATC
rs34179     Reverse      ACGTTGGATGGAAACCATTTTCTTTGCTG
rs34181     Extension    TTTTACAGCACAATCCCACCCAGAGC
rs34181     Forward      ACGTTGGATGGCTGTTTTACAGCACAATCC
rs34181     Reverse      ACGTTGGATGCCTGAGACCGCAATGTAATG
rs343173    Extension    CTCAGCTCCCCCGCT

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs343173    Forward      ACGTTGGATGTGCTGACACACCAGCTGCCT
rs343173    Reverse      ACGTTGGATGAACCCCCCCACACTCATTG
rs343195    Extension    AAAGGGCTACATACTATATGA
rs343195    Forward      ACGTTGGATGAGACAATCTGAAAGGGCTAC
rs343195    Reverse      ACGTTGGATGTAAGTGGACCTGCACAGTTC
rs343201    Extension    AATTCAAGCCACGTTG
rs343201    Forward      ACGTTGGATGGGGTAAGTAATTCAAGCC
rs343201    Reverse      ACGTTGGATGGCAGAACAAACTCTATACAC
rs343212    Extension    taatGCATTTACAGTCTCTTCCA
rs343212    Forward      ACGTTGGATGCTTTGACCTAAAAATGAAGCC
rs343212    Reverse      ACGTTGGATGGTAGATTGTAGGTGACATGG
rs34850     Extension    gggagCTTTCTAGAGACTAGACCTTA
rs34850     Forward      ACGTTGGATGATCAGTCCCAAGTTGATCTC
rs34850     Reverse      ACGTTGGATGCTCTCTTCTTTCTTTTCAGC
rs353237    Extension    cccctGCCTGTTTTGTGCTGA
rs353237    Forward      ACGTTGGATGCTTAGCTTAGTGCTCTGCTG
rs353237    Reverse      ACGTTGGATGCTTTTCCTCCTGAGTTGCTG
rs356168    Extension    TGTGCCCTGATTGTTC
rs356168    Forward      ACGTTGGATGTACCCTTGCCAGTGACCTTG
rs356168    Reverse      ACGTTGGATGGCAGTAACTGCAAACGCTTC
rs3731838   Extension    ggcggTGGTTGGGTTCACTTGC
rs3731838   Forward      ACGTTGGATGGGGCTTAGATGACGACATTG
rs3731838   Reverse      ACGTTGGATGCATGTCTAATGACCAGCTCG
rs3749690   Extension    CGTTTGTAGTAGAGAGCA
rs3749690   Forward      ACGTTGGATGGATCCTGGAAAGTAGAAGCG
rs3749690   Reverse      ACGTTGGATGTTGTGTACCGAGAGCTTGTG
rs3750117   Extension    TTATTGCAACTAAAGGAAAAATA
rs3750117   Forward      ACGTTGGATGCCTAGTTATTGCAACTAAAGG
rs3750117   Reverse      ACGTTGGATGCCACCATATAAGGGTACTTC
rs3768730   Extension    cACTTGGAATGTAGGCATCTTTATT
rs3768730   Forward      ACGTTGGATGACAGAACTTGGAATGTAGGC
rs3768730   Reverse      ACGTTGGATGGCATTTAGGGAACATAGGAC
rs3770752   Extension    GTCTTATTCTCAGAACTCAATAGATA
rs3770752   Forward      ACGTTGGATGGGGTCTTATTCTCAGAACTC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs3770752   Reverse      ACGTTGGATGAGCTCGCTCTTTCTTTCCAG
rs3770754   Extension    gtcgAGTTGGTAAGTATCATGAGAT
rs3770754   Forward      ACGTTGGATGGGAGCCTTACCATCTAGTTG
rs3770754   Reverse      ACGTTGGATGCCTCGCTTATTAGCTGTGAC
rs3770755   Extension    gTGAATATGTAAAAGCAGATCAACT
rs3770755   Forward      ACGTTGGATGGATGTTTCTCTTTCAGTGGC
rs3770755   Reverse      ACGTTGGATGTCATGGCCTCTGAATGCCAC
rs3770764   Extension    ggggaACACACAAAAATAGAAGTATGCT
rs3770764   Forward      ACGTTGGATGTGCCCTCCTTTAAAAACACG
rs3770764   Reverse      ACGTTGGATGAGAGGAAGTTGTTAAGGGAG
rs3779250   Extension    GTGTGGGCTTTTCTGTGG
rs3779250   Forward      ACGTTGGATGCTCTTGGTGTGTGGGCTTTT
rs3779250   Reverse      ACGTTGGATGTCTGTATGGACTCAGGGAGC
rs3815995   Extension    GGAACACTGCGTTAATGTCA
rs3815995   Forward      ACGTTGGATGAAAAAAGAAGGAACACTGCG
rs3815995   Reverse      ACGTTGGATGTTTGGGCCACAATTACGAAC
rs3822606   Extension    CATGCCTGTGATTTTCC
rs3822606   Forward      ACGTTGGATGGCTTGATTCATGCACTTCTG
rs3822606   Reverse      ACGTTGGATGACAAGTACTGTCCTTCAACC
rs38410     Extension    gGAAGGAACTCAGTTTATTGA
rs38410     Forward      ACGTTGGATGCCACAGCTTCTTATGCATAG
rs38410     Reverse      ACGTTGGATGGATTTACCTTGAACTCCATGC
rs3844295   Extension    AGTGTGAAAACCGACTAATACAA
rs3844295   Forward      ACGTTGGATGTCAGCAGTGTGAAAACCGAC
rs3844295   Reverse      ACGTTGGATGGAGGTGACTAGTTAGTTCAG
rs3857434   Extension    ggatgCGTTTAAAAAGGACTCACT
rs3857434   Forward      ACGTTGGATGGGATTTAGGGAAGGAATTCG
rs3857434   Reverse      ACGTTGGATGGATCCTACCCAACTGTCTTA
rs39099     Extension    caggTTGTTCAGAGCACCACTGAATC
rs39099     Forward      ACGTTGGATGCCTAGTTTGTTCAGAGCACC
rs39099     Reverse      ACGTTGGATGATTGGACAGAACTGGGTCAC
rs3923559   Extension    TGGTTTTAGTGATAACAATCTAAT
rs3923559   Forward      ACGTTGGATGAAGAACCGCTGCCAAGAATC
rs3923559   Reverse      ACGTTGGATGCGTGTATGTTGAAAGGTTCC

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs3924917   Extension    tCCCACATACAACTACAAAGA
rs3924917   Forward      ACGTTGGATGGTAAGTGGCCCACATACAAC
rs3924917   Reverse      ACGTTGGATGGAACACTTATCCTTGAACCC
rs4075350   Extension    GCGGCAGAATCAAACTACAGAAT
rs4075350   Forward      ACGTTGGATGTTCTGGCGGCAGAATCAAAC
rs4075350   Reverse      ACGTTGGATGTTTCCATCGCACTGACCCTC
rs4078254   Extension    gtgGCTTAAGCCAGGCAGAA
rs4078254   Forward      ACGTTGGATGGTTGGTACAGCTTAAGCCAG
rs4078254   Reverse      ACGTTGGATGCTGGGTTCAGTTTAGTTGGG
rs4148149   Extension    gggaaGTTTGGGAACAGCAACTC
rs4148149   Forward      ACGTTGGATGTTGTCGTTGTGCTTAGGAGG
rs4148149   Reverse      ACGTTGGATGTCCATGACGAACTTTCTCTC
rs4240406   Extension    cccctCTCAGCCTTGGTTCCTT
rs4240406   Forward      ACGTTGGATGTTACTGCTGTCACTAGCTCC
rs4240406   Reverse      ACGTTGGATGTGGCAGGAACAGAGTCGCAG
rs42427     Extension    gttccAATTCACCTGACTGTGC
rs42427     Forward      ACGTTGGATGTCTGCCTTCTGTAGGAATGG
rs42427     Reverse      ACGTTGGATGTGAGTTAGCTGCTGGAGAAG
rs4270      Extension    AGAGCACTGCTGAAAA
rs4270      Forward      ACGTTGGATGCCCAAGTAAGATACTCCTTC
rs4270      Reverse      ACGTTGGATGACTTGAGAACAGGGATCTGG
rs4285214   Extension    TGAGTTGAAGTATACAAACATTTCT
rs4285214   Forward      ACGTTGGATGCAGGAAACTGAATTGCTGAG
rs4285214   Reverse      ACGTTGGATGGGAAGCAGTTAGGCAATATG
rs4384764   Extension    tttggCTGCGGGGTGGTTCACA
rs4384764   Forward      ACGTTGGATGCCATCTGGGATTTTTCTGCG
rs4384764   Reverse      ACGTTGGATGCCAGAAAGCCATCAGATAGG
rs4389685   Extension    aaaCTTGCCAATACCAATCTCA
rs4389685   Forward      ACGTTGGATGTTATTGTCCTTGCCAATACC
rs4389685   Reverse      ACGTTGGATGTGTGTTGGTTTAGCCATGTG
rs4395533   Extension    gggGCCATATTTTGATAGTGGTTAA
rs4395533   Forward      ACGTTGGATGTTCCTATGCCAACAAATCAC
rs4395533   Reverse      ACGTTGGATGGGGTGACGGAAGAATAATC
rs4401454   Extension    ACAGTATTATGTAGTTTAAAATACTTAT

Table 9 continued
SNP ID      Primer Type  Primer Sequence (5' to 3')
rs4401454   Forward      ACGTTGGATGCATTACCATATCTTTATGAG
rs4401454   Reverse      ACGTTGGATGTCCCAGGTCTATGCAGAGTG
rs4443454   Extension    CATGAGAAGGAAAAATAAATATATGAA
rs4443454   Forward      ACGTTGGATGAGCAGTAACTAGTAAATGCG
rs4443454   Reverse      ACGTTGGATGTCCCAAGTGAATATGCTCTC
rs4558984   Extension    tgTCGAAACAACAAATGTCTAG
rs4558984   Forward      ACGTTGGATGGCCAACTCGTTCGAAACAAC
rs4558984   Reverse      ACGTTGGATGCTTCACACTGGAAGCTTCTC
rs4572999   Extension    ccgacAGAATAACTGAGATGGGCA
rs4572999   Forward      ACGTTGGATGGCAAGTCCAAAGTTGGAAAG
rs4572999   Reverse      ACGTTGGATGCTTCCCCTAAGAGTTGTACC
rs4579258   Extension    ggCCAGTAAACACTAAAAATTTAAGT
rs4579258   Forward      ACGTTGGATGGGAGAAACTCCAGTAAACAC
rs4579258   Reverse      ACGTTGGATGATGGTTTATAAAGTTCTTTG
rs4583877   Extension    cctccGGGCATTCAGGACATTCTTA
rs4583877   Forward      ACGTTGGATGGCTGCTGTATTCCTCAGCAC
rs4583877   Reverse      ACGTTGGATGCCCAGCCTCGATGGTTAAAT
rs4621556   Extension    aggaCTGCCATTTGGGACACA
rs4621556   Forward      ACGTTGGATGGTTGGAAAGAATTTCCTTAGC
rs4621556   Reverse      ACGTTGGATGCAAACAAATGACTATACAG
rs4626335   Extension    CCCAGTCTTATCTGACATTA
rs4626335   Forward      ACGTTGGATGTGAGCGTGCCCAGTCTTATC
rs4626335   Reverse      ACGTTGGATGTTTTCTACAGTGGCTATCCC
rs4670264   Extension    aacatTAGCATTATGTTGTGTGGAA
rs4670264   Forward      ACGTTGGATGACATGGATGATCCCAGTAGC
rs4670264   Reverse      ACGTTGGATGCAGTACATATTCTTTGGTGC
rs4670265   Extension    AACTGTCTCTTTGGCTT
rs4670265   Forward      ACGTTGGATGATTTTGCACCCTAGCCCTTG
rs4670265   Reverse      ACGTTGGATGAATGCAGGAGTGGTGATGAC
rs467069    Extension    ccaaCCAGAAGGATAAGAGTCA
rs467069    Forward      ACGTTGGATGGTACTTGACTGTTGGAACCC
rs467069    Reverse      ACGTTGGATGCTCCCTCTTACTGGGTATTG
rs4693375   Extension    CACTTATTGAGGGCCTAC
rs4693375   Forward      ACGTTGGATGATAATGCCCACTCATCATGC

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs4693375 Reverse ACGTTGGATGTCTCCATTGTTCAGGGCAAC rs4705073 Extension CCCTAATCCAGTATGAACTAATTTTA rs4705073 Forward ACGTTGGATGATTTAGGGCCCACCCTAATC rs4705073 Reverse ACGTTGGATGGTGACCTCATTTGGAAACAG rs4705086 Extension tCTGCAGAACAGCACCAGC rs4705086 Forward ACGTTGGATGCATGGTTGCCTGGTATAGGG rs4705086 Reverse ACGTTGGATGTAGACCTAGGCCAACCCAGA rs4705344 Extension GGTTCACTGATGCTTGA rs4705344 Forward ACGTTGGATGCTTCCCAAGGCTAAGGATAG rs4705344 Reverse ACGTTGGATGCATCTACCAAGTGTCAGACC rs4705369 Extension ggaacTGCTAGTTATAAATCAGTGC rs4705369 Forward ACGTTGGATGTCAGAGCAGGTGATATGCTA rs4705369 Reverse ACGTTGGATGACCTCACCCAGGCTGACTTG rs4705376 Extension GTGAGCAGCTGGGTACT rs4705376 Forward ACGTTGGATGGCTCCATGGAATGTGAGCAG rs4705376 Reverse ACGTTGGATGTATATCCATGTAGCAGCATC rs4705378 Extension GTTCTCTCCAGATCTCC rs4705378 Forward ACGTTGGATGTTTCTGTCACCCTGGGTGAG rs4705378 Reverse ACGTTGGATGCCCTTAATTACAAGGGGTTC rs4705382 Extension GTAGCTTCCTCACACCCA rs4705382 Forward ACGTTGGATGTCTCTGGGTTGCTGGGCCTT rs4705382 Reverse ACGTTGGATGGTCTGTCTCAGAGCATGAAG rs4705415 Extension tgGGCCCAGGGGACAGC rs4705415 Forward ACGTTGGATGTTACTCAGCCCAGAGCATAC rs4705415 Reverse ACGTTGGATGTGCTCGATGGGACCTCTCCT rs4705758 Extension ccctgCCTGGCTCTTACCCTTT rs4705758 Forward ACGTTGGATGACTACAGCTACTGAGTTGGG rs4705758 Reverse ACGTTGGATGCAGATGCCACAAACCCAGG rs4705796 Extension AAAACCCCTAAAAAACCAAAA rs4705796 Forward ACGTTGGATGAGGCTGATGTATCTGAGAAC rs4705796 Reverse ACGTTGGATGCTCCAATTGGGCATCCTTAC rs4722558 Extension tttttGGGTCTAGGGAAGAATC rs4722558 Forward ACGTTGGATGGTGTGGCCTTGAATGTGTTC rs4722558 Reverse ACGTTGGATGGTTCCAACAACTTGCATGAG continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs4722593 Extension gCTCTCCTCTTCTAACCCTTC rs4722593 Forward ACGTTGGATGTCTACCTGGATCTGACTCCG rs4722593 Reverse ACGTTGGATGGCATGAGTTCCCGAGGCATA rs4722968 Extension GCAGGGATATTTGCAGC rs4722968 Forward ACGTTGGATGATCAGTCTCACATGGCAGGG rs4722968 Reverse ACGTTGGATGCACTTCCCAAAGTCCTTAGC rs4723226 Extension GGACATGCTAACTTAGGAG rs4723226 Forward ACGTTGGATGAGTGGTGTCAGAACAACTGG rs4723226 Reverse ACGTTGGATGAAGAGCTGCAATGTGCAGTC rs4833229 Extension aggACAGTGCCTGTGAATTAA rs4833229 Forward ACGTTGGATGAGGAATTTACAGTGCCTGTG rs4833229 Reverse ACGTTGGATGGAGGTTAGGCTCAGATATAG rs4835907 Extension gttCCACAGGCTAAGGACAG rs4835907 Forward ACGTTGGATGTGCTGCCTTCTAAGAATGCC rs4835907 Reverse ACGTTGGATGCCTCAAAGTATCTACAGTCC rs4835913 Extension AGTGTGCCTTTCCCCCT rs4835913 Forward ACGTTGGATGTCCTTGGAAGAAGAGTGTGC rs4835913 Reverse ACGTTGGATGCTATTGTTGTAAAGTAACC rs4835948 Extension GGTACTTAGAATTTTTAGATGTG rs4835948 Forward ACGTTGGATGTGTCCCTATTTTGTGCACTG rs4835948 Reverse ACGTTGGATGCACTTGTGTCAAGACGTATC rs4835981 Extension CCTTGCAAGTATACGTAGAAC rs4835981 Forward ACGTTGGATGGCCTTCCTTGCAAGTATACG rs4835981 Reverse ACGTTGGATGGACTACTAAAATGCTTTGC rs4836108 Extension AAAATGCATTGGCTCAG rs4836108 Forward ACGTTGGATGGGAACTTAGAAAAAAAATGC rs4836108 Reverse ACGTTGGATGGCATAGACTGCATATTAAATC rs4836110 Extension CTGAAAATGCTTTAAAATCCT rs4836110 Forward ACGTTGGATGAAGGCAAAAATAACTACTG rs4836110 Reverse ACGTTGGATGTCAGTGTCAAACGTGAAGTC rs4836261 Extension CATTCTCTGTCCTAGAAGA rs4836261 Forward ACGTTGGATGTTTCCTTTCTCACCCGTCAC rs4836261 Reverse ACGTTGGATGGGATGGACTGTCACTGGTTT rs4836272 Extension tctACATTATGTATTGACCCTCT continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs4836272 Forward ACGTTGGATGCACTGCTGTCTCCTTGAAAG rs4836272 Reverse ACGTTGGATGTCCTGCTCCTGGACAGAAAC rs4836276 Extension gGGGATAGGCTATATAAAAAATAATT rs4836276 Forward ACGTTGGATGGATTGGGATAGGCTATATAAAAA rs4836276 Reverse ACGTTGGATGCACACTGCTACACAGCCTAA rs4836277 Extension cCTCCTATGCAAAGACAAACTC rs4836277 Forward ACGTTGGATGTGAAGTCAGGCCCCTACTAC rs4836277 Reverse ACGTTGGATGCTTCTAAGAGTTTTATAGGTC rs4836487 Extension TCTATGTCTTTGCTGTGG rs4836487 Forward ACGTTGGATGGCACCAAGGTTGAGTCTATG rs4836487 Reverse ACGTTGGATGACTGTAAAGATACATGCCCC rs492346 Extension ttttgAGCTTATTAACATTCTGGCTC rs492346 Forward ACGTTGGATGGGAATTAGTATTGCCACTTG rs492346 Reverse ACGTTGGATGTGTCAAGTCAGAATACTGGG rs4948023 Extension gAGGTGTCTGGTCGTAA rs4948023 Forward ACGTTGGATGCAGTACCAGTAATGCCAAGG rs4948023 Reverse ACGTTGGATGTCAGGGTCCAGAACATGTAG rs4948031 Extension CACAGTAGAGCAACTGTAG rs4948031 Forward ACGTTGGATGTGTTCGGTAGCACAGTAGAG rs4948031 Reverse ACGTTGGATGTTCCTGTGTTGGGAACATTC rs4952682 Extension AAGTGATCTTATTTTAATTGTTTGC rs4952682 Forward ACGTTGGATGCCTTCAAGTATTCAAAAGTG rs4952682 Reverse ACGTTGGATGTTTGCTAACTTCAACCAAG rs4952818 Extension tAGCTAGGCTTCTGGTTG rs4952818 Forward ACGTTGGATGATGCTCACACTAGAGCTAGG rs4952818 Reverse ACGTTGGATGTTCTGGCACATGAGAAGCTG rs4952896 Extension CCAAGGTGTGTTCATATAATTAC rs4952896 Forward ACGTTGGATGCTCTAGTCCAAGGTGTGTTC rs4952896 Reverse ACGTTGGATGGCACCCAGTTTTTAATCATC rs4953012 Extension tTGGAGAACATTATTTTGGAAGATA rs4953012 Forward ACGTTGGATGGATTCTGCTTCATGGAGAAC rs4953012 Reverse ACGTTGGATGCAATAGAAGCATTAATCGC rs4953013 Extension ACCCAACCAGCAAAGAC rs4953013 Forward ACGTTGGATGACTTTGCACTGTACCCAACC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs4953013 Reverse ACGTTGGATGACTCACTTGCATGGGTGCAG rs4953340 Extension ccctcCTTAATGCCATTGAATTGCA rs4953340 Forward ACGTTGGATGGTAGCCCAACAATATGAACG rs4953340 Reverse ACGTTGGATGCATAACAAAAAATGTTTACC rs4953342 Extension tcctCCTGCCTTTCTTTCTTCT rs4953342 Forward ACGTTGGATGGAAAACGAATTGGGCTGGAG rs4953342 Reverse ACGTTGGATGAAAAGCCTGTGCCCAAGCTG rs4953344 Extension cAGAGAACTGTTTTGTCCTA rs4953344 Forward ACGTTGGATGAGATAGTTCCTTTTGGGGAG rs4953344 Reverse ACGTTGGATGTGTATGTTCTTGCTCCCATC rs4953345 Extension tCCCTAATAGCCTCAGTGT rs4953345 Forward ACGTTGGATGAGGGTGTCCTTCATTCATGG rs4953345 Reverse ACGTTGGATGTCATGTCATCCTTCGCCTGC rs4953352 Extension TGCGGAGGTTAATTAGTC rs4953352 Forward ACGTTGGATGTCTCAGCTTGCGGAGGTTAA rs4953352 Reverse ACGTTGGATGTCTGATGTCTGCCACTGCC rs4953353 Extension cCCAGATAACCCAGCTCCTTA rs4953353 Forward ACGTTGGATGCATCTGGGGCTCCAGATAAC rs4953353 Reverse ACGTTGGATGACAGCTTTTCCCAGCTGGGC rs4953360 Extension GGTAGGCCTGGCTTGAGTA rs4953360 Forward ACGTTGGATGAGGAAATGCAGAAGACTGGG rs4953360 Reverse ACGTTGGATGTGCTGCTCAGCAAGCCCCA rs4953527 Extension cccgATTCCTGCAGATTTTTTCTTT rs4953527 Forward ACGTTGGATGGGATTATAGGCATGAGCCAC rs4953527 Reverse ACGTTGGATGGCATACTGTTAAAGTGCTA rs498360 Extension GACTTGAACTATGTCTCTTGG rs498360 Forward ACGTTGGATGTGGACGAACCAGACTTGAAC rs498360 Reverse ACGTTGGATGCTTCCCAAATTTCCCTAGTG rs509237 Extension GGAGTCAAGAAACAAACC rs509237 Forward ACGTTGGATGGGCTTGGATGAAGAAGCAGA rs509237 Reverse ACGTTGGATGAGATCATAGGAGGGCATGAG rs632678 Extension TTTTGGCTGCCTCCT rs632678 Forward ACGTTGGATGGTTTTCTGCCCTTTTGGCTG rs632678 Reverse ACGTTGGATGGAACTCCCATGACTTCCATC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs637634 Extension agtgTCTGGAAAACAATTCAAAGG rs637634 Forward ACGTTGGATGTGTTGGAAGCCAGGCCAAAC rs637634 Reverse ACGTTGGATGGAAGGATAAACACACTAGGC rs639680 Extension TCCTGGAGAAGTGTTGTG rs639680 Forward ACGTTGGATGTATCTGCACAGGAGTGCAAC rs639680 Reverse ACGTTGGATGGCCTGATTTTGTTCTTTGCG rs6419610 Extension TTTATTGACTATGCTTTTTAACC rs6419610 Forward ACGTTGGATGTAAATTTTTATTGACTATGC rs6419610 Reverse ACGTTGGATGCTACAATCTTAAACCACTC rs6461902 Extension aaaggATGCAGTCTAGAAAGTTCAGA rs6461902 Forward ACGTTGGATGCAGAAGAGGCAAGAGAATAC rs6461902 Reverse ACGTTGGATGAAAAAATGGCCCACTCCCTG rs6462161 Extension CTTCAACCTTTGCCTTT rs6462161 Forward ACGTTGGATGATCACCTGGTCTCTAGGCTG rs6462161 Reverse ACGTTGGATGTGACCACTGCACTGCACTTC rs6462430 Extension cccccCTACTGGGATTAGGACTTCTA rs6462430 Forward ACGTTGGATGAGTCCAGCTACTGGGATTAG rs6462430 Reverse ACGTTGGATGCCAATCTACTAATCCTACTTC rs6532079 Extension AGAGTTCTGTTTAACCAGTAT rs6532079 Forward ACGTTGGATGGGTTACTCCTCCTAGAGTTC rs6532079 Reverse ACGTTGGATGCGGGCATGTAAAATCTGATG rs6544066 Extension ATAGTTTATCAAAAGGTACTGTG rs6544066 Forward ACGTTGGATGGCTCCATAGTTTATCAAAAGG rs6544066 Reverse ACGTTGGATGCTCACTGGTACACGTATAAT rs6544589 Extension CAGTGCTGACGGGCT rs6544589 Forward ACGTTGGATGTTTACCAGTGGGCGACTACC rs6544589 Reverse ACGTTGGATGGTGGTATGAAACTGCCAAGG rs6544724 Extension cccgcCCAACTCAAATTTCAGGCAGAA rs6544724 Forward ACGTTGGATGGTAAGAGAAGACCAACTCAA rs6544724 Reverse ACGTTGGATGTGCCTAGAATATTCAGAAC rs6544889 Extension agaggAATGATGTAAAAGAAAACCCA rs6544889 Forward ACGTTGGATGGGCTTCTGGAGAAATGATG rs6544889 Reverse ACGTTGGATGGCTAGAAAGTAGGTTAGAGG rs6594665 Extension acccgTGAGCTAGAATCTGGGC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs6594665 Forward ACGTTGGATGTGTCCTTGGGAGCTTCTTGA rs6594665 Reverse ACGTTGGATGCCTGTATCTGTTACCACTGC rs6595441 Extension ggTCATGAACATATGAATTGAGGTATA rs6595441 Forward ACGTTGGATGCCTTTAAGCAGGAATTGGGC rs6595441 Reverse ACGTTGGATGCCTGGGTTATTTTTGTGTGAG rs6595711 Extension aagcaCAGCTCTGCTCTGCATG rs6595711 Forward ACGTTGGATGTTGCTGGCTCCCAGGTCTGT rs6595711 Reverse ACGTTGGATGTCCCAGCCTCCAGGTGATCT rs6650969 Extension gagcGATCAAGGACCTGGAGCTCAG rs6650969 Forward ACGTTGGATGCTCTGCAAGAAGAACCTCAG rs6650969 Reverse ACGTTGGATGAATGCAGACCCTGCCATTTG rs665444 Extension TGTGCCATTTTTTATTAGGTAAGAATTT rs665444 Forward ACGTTGGATGAGCCTATCAAGATATGTGCC rs665444 Reverse ACGTTGGATGGAAATGCAAAGATGACTTAC rs6721860 Extension accATATGCACCTAAACAAATAATG rs6721860 Forward ACGTTGGATGAACTCACAGCAATATGCACC rs6721860 Reverse ACGTTGGATGCAACTAAAAGTAAGCAGATT rs6728843 Extension aGGAGAAGGAGGAATTGC rs6728843 Forward ACGTTGGATGGCAAGAATAGTAAACCAGGG rs6728843 Reverse ACGTTGGATGGCCAATTCTTTGTTTGCCAC rs6736282 Extension gacCGCTGACAATATCCCTATA rs6736282 Forward ACGTTGGATGTCCTGTTAGAAAACGCTGAC rs6736282 Reverse ACGTTGGATGGTGAGGACTGCTTTATATAG rs6741740 Extension agcCCTAGTAGAAAACTGGCAAA rs6741740 Forward ACGTTGGATGGTATGACCTAGTAGAAAAC rs6741740 Reverse ACGTTGGATGTGGCCATTTGGATTTCTTC rs6751481 Extension ggttACATATTTGATACGTCTCTGA rs6751481 Forward ACGTTGGATGCTGACCACATATTTGATACG rs6751481 Reverse ACGTTGGATGGGGCAAAGGAAAGAAAGGG rs6753406 Extension ctagATTCGTGTAACCACCACTAGAT rs6753406 Forward ACGTTGGATGCCTACAGTTTGTTACACAGC rs6753406 Reverse ACGTTGGATGGGAAGCTGTTGAGATACTGG rs6756455 Extension aaacgAGATCTACTTCCAAACAGATT rs6756455 Forward ACGTTGGATGCACCTTTAGGCCAATCATTAG continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs6756455 Reverse ACGTTGGATGACTTCAGAGGAAAGATAAGC rs6757976 Extension ggaggCCTGGGATCCAAAAGCAGTGG rs6757976 Forward ACGTTGGATGTGCAACCTGGGATCCAAAAG rs6757976 Reverse ACGTTGGATGACAGAGCAAGACTCCGTCTC rs6758592 Extension accGGCATTTTAGCATTTTTGTAA rs6758592 Forward ACGTTGGATGGTACTTCTTGGCATTTTAGC rs6758592 Reverse ACGTTGGATGTCAAACAAAGACCAGATGTG rs6827372 Extension AGGGTAGAACGGTGG rs6827372 Forward ACGTTGGATGTGAACACACAGAAGCAGAGG rs6827372 Reverse ACGTTGGATGATCTTCCCATTCCTCCCTTC rs6857766 Extension gggtaCCTGAGATGATCTATGTGAAG rs6857766 Forward ACGTTGGATGGATGGGATTTCCCTGAGATG rs6857766 Reverse ACGTTGGATGAGCTAGTATCTGCTGAGTGC rs6858978 Extension AGTAAACAATGTATGGTCAC rs6858978 Forward ACGTTGGATGGGAGAATTAATTATCCCAGAG rs6858978 Reverse ACGTTGGATGTACCCTAGTTCTGGCTATCC rs6859190 Extension ccAGATCCAGCTATCAGGGTC rs6859190 Forward ACGTTGGATGGCTGCTGAGATCCAGCTATC rs6859190 Reverse ACGTTGGATGAAGGAAGATGGAGTGGAGGG rs6860957 Extension AGGGAGAAAAAGATGCT rs6860957 Forward ACGTTGGATGTGCAGATGCTCTGCTGAAGG rs6860957 Reverse ACGTTGGATGGAATGAGCATTTTTCCCTG rs6862252 Extension TTGTCCGTTTTGCAGTTG rs6862252 Forward ACGTTGGATGCCGGGCACTTTTCAGTTTAG rs6862252 Reverse ACGTTGGATGTAGGAGATAAAGAAAAGGGC rs6862261 Extension gCATGATGAAACTCCATGTAT rs6862261 Forward ACGTTGGATGGAGTTCTAGATCAGCCTGAG rs6862261 Reverse ACGTTGGATGTGCCTGGCTAATTTTTGCAC rs6862890 Extension tcCACATTCATTTCCTGCC rs6862890 Forward ACGTTGGATGCTGTTGGTTCATGGGTGTTC rs6862890 Reverse ACGTTGGATGCCCTCGTACTCTCTTAAATC rs6863373 Extension gtgtATTGCTCTGTGTTCTACT rs6863373 Forward ACGTTGGATGTCTGGCAAGGTAGATTGCTC rs6863373 Reverse ACGTTGGATGTCCTGCAGGAGAAAATGAGC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs6865809 Extension aATAGTAGCTTGTCACTCAT rs6865809 Forward ACGTTGGATGTTGGTTATAGTAGCTTGTC rs6865809 Reverse ACGTTGGATGTTTCTCCCCTACACGACTTG rs6866298 Extension TGCCTCAGTAGTTCCAT rs6866298 Forward ACGTTGGATGCAAGTTTCCTTGTGCCTCAG rs6866298 Reverse ACGTTGGATGAGATTACAGGCATGAGCCAC rs6866739 Extension GCATTACAAAGGACCATC rs6866739 Forward ACGTTGGATGAAATAAGGCATTACAAAGG rs6866739 Reverse ACGTTGGATGCCATTGTGTTAATGTGTAG rs6867136 Extension tTTGCTGATCTTCATGTTC rs6867136 Forward ACGTTGGATGTGACCTGTGGGATGCTTTAG rs6867136 Reverse ACGTTGGATGACAGCGTTCCAGTGCTAATG rs6869827 Extension ccccaTGATATCAACTCCTTTATAAGC rs6869827 Forward ACGTTGGATGCAGGAGATGTTTCTTAATTCG rs6869827 Reverse ACGTTGGATGCATCTATGCTAGAAATATGG rs6870146 Extension cccctCCCAGAATTGGAAAGGTTT rs6870146 Forward ACGTTGGATGGAAATGGAAATTCCCCCTGC rs6870146 Reverse ACGTTGGATGAGCCATCCTGAAAGGACTTG rs6871443 Extension aagggCAGAAAACTCTGGGAGA rs6871443 Forward ACGTTGGATGTCCAGATTGGAGAAGGTCAG rs6871443 Reverse ACGTTGGATGATACACCAAGTATAAGTTC rs6876829 Extension cagtcTTTACTATCAGCCATTTACAG rs6876829 Forward ACGTTGGATGTTGGCCTGCAAAGCCTAAAG rs6876829 Reverse ACGTTGGATGTTCAGTAGAGCAAGGATTGG rs6879589 Extension cacATGTTCCTCCTCTGCCAC rs6879589 Forward ACGTTGGATGGTCTCTGATGTCAGTCAAAC rs6879589 Reverse ACGTTGGATGTTGGGTAATTTACTCTGGGC rs6880051 Extension gATTCACTTTCTCTGGAATTGAA rs6880051 Forward ACGTTGGATGGTTTATTCACTTTCTCTGG rs6880051 Reverse ACGTTGGATGACCAGAAAACCATAAGACCG rs6881379 Extension ctcccGTGTCCTAGTTACAAAATGTC rs6881379 Forward ACGTTGGATGGTTCTACCAAGCAAGTGTCC rs6881379 Reverse ACGTTGGATGTTATGATGCTCTTTGTTAC rs6884924 Extension tgccgTCTTTACATGTCAGCAAAAT continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs6884924 Forward ACGTTGGATGGAATAAATTCTTTACATGTC rs6884924 Reverse ACGTTGGATGCTCTGCTCTCATGGAACTTG rs6886362 Extension AGGAGAGGGTAGATGCT rs6886362 Forward ACGTTGGATGAAAGCAGAGAGGAGAGGGTA rs6886362 Reverse ACGTTGGATGACTCACTTGTGGATTTCAGG rs6890872 Extension GAGGTTTGAGCAAATGATCC rs6890872 Forward ACGTTGGATGACCACAACATTCTTTTTGGG rs6890872 Reverse ACGTTGGATGCTGTTTCAGTAGCTTATGAGG rs6891155 Extension TGTTGTAGACTTGGAAGATA rs6891155 Forward ACGTTGGATGGGTATCTACCATGTTGTAGAC rs6891155 Reverse ACGTTGGATGAGTGCTATATTTTGGAAGGC rs6892794 Extension gagGCAAGAAAGTGTGAAGTAC rs6892794 Forward ACGTTGGATGGAGAAAGCAAGAAAGTGTG rs6892794 Reverse ACGTTGGATGGCCTGCTTTTGATAGACTTC rs6893016 Extension aTCTTTACCCTAAATATCAGACATA rs6893016 Forward ACGTTGGATGCTACTCAGAACATGCCAACC rs6893016 Reverse ACGTTGGATGGCGTGGCACACCTTTCATTC rs6893389 Extension GTGGAACATTAGATGTCTGTCC rs6893389 Forward ACGTTGGATGGTTCGCCTGTGGAACATTAG rs6893389 Reverse ACGTTGGATGGGAGCTCACCTTCACAAATC rs6894219 Extension accctAGGTATATGGCTCTCCT rs6894219 Forward ACGTTGGATGCAGTACTGTTCCTTCTTTCC rs6894219 Reverse ACGTTGGATGGCTGCTTCTCATAGCATAAG rs6895698 Extension agttTCCTAGTAATGATCTCATGC rs6895698 Forward ACGTTGGATGCTTATGTGACACACATTGTCC rs6895698 Reverse ACGTTGGATGAGCAGGCTACAATTCCTTTC rs6897488 Extension ggTGAGAGTGAATTCAGGTT rs6897488 Forward ACGTTGGATGGGAATAGTCATAAGGTTGGC rs6897488 Reverse ACGTTGGATGCTTCAAGTAACTCCAGGTTG rs6947342 Extension ttagTTGTTCAGAACAAGATGTCTTGTC rs6947342 Forward ACGTTGGATGGATTTGTTCAGAACAAGATG rs6947342 Reverse ACGTTGGATGTCCCTATGCTGAACATTTAG rs6949451 Extension TCCCCTCCCTCCCATAC rs6949451 Forward ACGTTGGATGTCACTACACTCCTACCTCTC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs6949451 Reverse ACGTTGGATGAGGAGCAAGGACTGCCCTA rs6958331 Extension tgGTCTCCTTTTTTTCCTAAGGAAAATA rs6958331 Forward ACGTTGGATGAGAAATGCTTATTGTCTCC rs6958331 Reverse ACGTTGGATGTGGCAATCTGAGGATATAGC rs6962537 Extension gggaAAAGTTAACATGGCAAGAT rs6962537 Forward ACGTTGGATGAAGGCTGACAGATAATTAC rs6962537 Reverse ACGTTGGATGTATGCGAAGTCCTTTCCCAC rs6964512 Extension aacagGTGAACCCATTTTATGGACATA rs6964512 Forward ACGTTGGATGGCACTTTACACATGTGAACC rs6964512 Reverse ACGTTGGATGCAAAGTTACGTACATTCCAG rs6970537 Extension AGCACTTTTCCTCCCTC rs6970537 Forward ACGTTGGATGTCGGCTTCCAAGCACTTTTC rs6970537 Reverse ACGTTGGATGTAGTAGCTGGGAAGAATGGC rs6973742 Extension CCACCATGGTTTGAGTTTATATCAG rs6973742 Forward ACGTTGGATGGAACTTCCTAATGGACGGTG rs6973742 Reverse ACGTTGGATGATGATGAATGAGCAGCAGGG rs6979746 Extension cccGCAAACTATAGAAAAATCAACACCA rs6979746 Forward ACGTTGGATGCCTAGAGTCACACTACTTGC rs6979746 Reverse ACGTTGGATGTTTTTCACTGTGTACCTGGC rs710834 Extension CGCCATTACCTCCTC rs710834 Forward ACGTTGGATGCACCGCCTTCCGCCATTAC rs710834 Reverse ACGTTGGATGACATGGCCGACTCGGAAAAC rs728989 Extension TGCATGTGCAGATGT rs728989 Forward ACGTTGGATGTGAGTAGGTTCAAGAGACAC rs728989 Reverse ACGTTGGATGCAGAAGCAGGTACCATGAGA rs7357309 Extension gaggCCAGAATTAGGGGAATCC rs7357309 Forward ACGTTGGATGCTTCTGCCTTCCAGAATTAG rs7357309 Reverse ACGTTGGATGGCAGCAAAATACCACAAAC rs740751 Extension aaaaCCCCCTCTGAGCTACCT rs740751 Forward ACGTTGGATGAAGGGCCTCAGCATCTTTTC rs740751 Reverse ACGTTGGATGTCTGGGTGGGAGAGAGATG rs7425925 Extension GTTGGAAGGATTGAGAATGG rs7425925 Forward ACGTTGGATGCTACATGTGGAAGTTGGAAG rs7425925 Reverse ACGTTGGATGATACCATTCACTGATGCCTC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs752009 Extension TGACATAGAGAATGACAGGT rs752009 Forward ACGTTGGATGAAGAAGGCAAGAACCTTGAC rs752009 Reverse ACGTTGGATGGCCTCAACTTCAATGTCCTG rs752730 Extension CACTTAAATGAGCGAGCTAAATA rs752730 Forward ACGTTGGATGGAAACCCACTTAAATGAGCG rs752730 Reverse ACGTTGGATGAGAGACCACCGGTCAAGTTC rs7568481 Extension cccGGTAATTGTTTTTGGCATAAT rs7568481 Forward ACGTTGGATGTGACGTACCTGAGGTAATTG rs7568481 Reverse ACGTTGGATGAGTAGGCTAATCTATAACC rs7570659 Extension ggggTGTTACTGGCATCTAGAACG rs7570659 Forward ACGTTGGATGCAGTTGTTGTTACTGGCATC rs7570659 Reverse ACGTTGGATGCCTCTGATGGATCTGAGTAG rs7583088 Extension cAAAAAATCCCAACCCAG rs7583088 Forward ACGTTGGATGATGACAGATGCCAGGGAAAG rs7583088 Reverse ACGTTGGATGACTTCCTGTGGATGTCTTGG rs7584547 Extension CCTGATACTATTTATTCTTTAATGAT rs7584547 Forward ACGTTGGATGAGCTTTGCTGCTTAATCCTG rs7584547 Reverse ACGTTGGATGCAACACTGTCTGGCACTCAA rs7594731 Extension agACCAGGACATATGGTTACC rs7594731 Forward ACGTTGGATGCATCAGATGAATACCAGGAC rs7594731 Reverse ACGTTGGATGCTACAGAAATCCAATCTAGC rs7657769 Extension cGCAGATGTATGTCTCTCTG rs7657769 Forward ACGTTGGATGGGACTTCTTAGGCTAGTTGG rs7657769 Reverse ACGTTGGATGGCAGACCTATAAAAAACAG rs7661193 Extension ggggaTCAATGAGTGACTGTTTTGGCTC rs7661193 Forward ACGTTGGATGCCTAACTCAATGAGTGACTG rs7661193 Reverse ACGTTGGATGAGGAACTATGGATCATCTGG rs7664025 Extension ctctgGGTGATGAGCCTGGCATA rs7664025 Forward ACGTTGGATGTTTTCCAGGCATGCTTGGTG rs7664025 Reverse ACGTTGGATGTTGTGTGGATGACTGCCTAC rs767657 Extension ggAGAACTCAAGGACTGATGA rs767657 Forward ACGTTGGATGTTTAGAAGCACAATATCAG rs767657 Reverse ACGTTGGATGTTCCCGTCTCTTGGCTTTTC rs7681440 Extension aaACGTTCACATTCACCTAC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs7681440 Forward ACGTTGGATGTCTGGAAAAGCAAACAGTCG rs7681440 Reverse ACGTTGGATGAAAGGTGGACTGAGTCAGAG rs7681598 Extension taaTACCTTAAATGCAGACAAAA rs7681598 Forward ACGTTGGATGACTACTACCTTAAATGCAG rs7681598 Reverse ACGTTGGATGCTGTCAGGCCATAGAGATCG rs7708070 Extension atCCGGGGCAGAGGGAAGAGAGC rs7708070 Forward ACGTTGGATGAAGTGACCTGGAGATGAGAG rs7708070 Reverse ACGTTGGATGAAGCTGGCCTCGGAGCCC rs7717604 Extension CACCCTCAGGACTTTGTGTG rs7717604 Forward ACGTTGGATGTGTTGGTGCAACACCCTCAG rs7717604 Reverse ACGTTGGATGTCTATCAGTGATGAAGCTGG rs7718029 Extension gtttCTAACAAGGATAGCCCA rs7718029 Forward ACGTTGGATGGAAGGGTTATTTTCCCTAAG rs7718029 Reverse ACGTTGGATGTCCTGTGTTATATGGAAGTC rs7719910 Extension ggTGAAGTCCAAGTTTCTGGT rs7719910 Forward ACGTTGGATGTGGACCCAGATTCACTGAAG rs7719910 Reverse ACGTTGGATGTCCTCCTCTGACATGCCAAG rs7721529 Extension CTGTTCTTATTCGTAAATTAATTTCAAC rs7721529 Forward ACGTTGGATGGTGTGTATCTGTTCTTATTCG rs7721529 Reverse ACGTTGGATGCCCTTTGAACTAGTTTCTGG rs7721622 Extension ttccGAGTCCAGGCTCTTGTGC rs7721622 Forward ACGTTGGATGTCAATGCACTATTCGGTGGC rs7721622 Reverse ACGTTGGATGTTGGGTTGAGTAGTTGGGCT rs7724258 Extension AGAGCAGTACCTAAGGA rs7724258 Forward ACGTTGGATGACAAATTGGGCTAGTGGCAG rs7724258 Reverse ACGTTGGATGCAGTAATGAACTAGGACCAA rs7724734 Extension aggGGAAGTCCAGAGAAAACAA rs7724734 Forward ACGTTGGATGCTTGTTCAGAGGAAGTCCAG rs7724734 Reverse ACGTTGGATGTGGGCTCATACTTTTTTCCG rs7727195 Extension ggTTGTGTGGGCCACTAG rs7727195 Forward ACGTTGGATGAAGGTCCATCCACCTTGTGT rs7727195 Reverse ACGTTGGATGAAGACTGGGCATTCTAGGTG rs7730782 Extension ccttGTACCTGTTTCTGTTCCT rs7730782 Forward ACGTTGGATGCCTTTCCTGGTACCTGTTTC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs7730782 Reverse ACGTTGGATGTCTTCAGGAGTCAGCTAAGG rs7731056 Extension gTGGGGATGCAGATGGAC rs7731056 Forward ACGTTGGATGAGACTCTGCAGCAAGGCTCT rs7731056 Reverse ACGTTGGATGTCCAGATGGGAATTGGAAAG rs7731890 Extension cgCACCCCTCGAAAGGGAC rs7731890 Forward ACGTTGGATGGGGAAATGGGTAGAGGACAC rs7731890 Reverse ACGTTGGATGGAAGGTTGGCATGTTCTGAG rs7732198 Extension cccccACTCTCTGAGGCAGGAACTA rs7732198 Forward ACGTTGGATGCCACAAATACTCTCTGAGGC rs7732198 Reverse ACGTTGGATGTTCTCTGTGCCTTCATTCCC rs7736823 Extension gagtTCCTGTGAAGATGAAAGCAG rs7736823 Forward ACGTTGGATGACAGGGAAGAAGGTCCTGTG rs7736823 Reverse ACGTTGGATGTCTGATAGCTCTTGGTGCTC rs7737127 Extension aagGGAGAAAAATAAACAGTGTGT rs7737127 Forward ACGTTGGATGGAGGAGTTTTTAAAAATGGAG rs7737127 Reverse ACGTTGGATGCAGTCATAGAGCTGATTTGG rs7737512 Extension GTATGCCTTTTGGTAGGA rs7737512 Forward ACGTTGGATGCATAATCTGATGGAAGCAGG rs7737512 Reverse ACGTTGGATGGAGAAACAGTTTTGGAAGTC rs774256 Extension CTACAGCTGTTACTCCAAATTTA rs774256 Forward ACGTTGGATGGGCTGATCTGGTATAACTAC rs774256 Reverse ACGTTGGATGCATCATCATTTTTCTCTGC rs7784224 Extension taTGCTACCATTCTTTTCTCT rs7784224 Forward ACGTTGGATGTGCTGTTTTACTCTGCTACC rs7784224 Reverse ACGTTGGATGCTGGCTTGTTAAAATCCGTG rs7794120 Extension acaaCTTCATAAGCATGCCTGAACAACT rs7794120 Forward ACGTTGGATGGTACTTGAACACTTTACACC rs7794120 Reverse ACGTTGGATGCATTTTCTTCTCCTGGCATA rs7799841 Extension CGATTCCCCTGCTTCT rs7799841 Forward ACGTTGGATGTGAGATGGTCTCCACGATTC rs7799841 Reverse ACGTTGGATGTAAGATCTGGCCACCAACTC rs7801318 Extension cctTGCCATGTCCTGTCTTG rs7801318 Forward ACGTTGGATGAGCAAGCTGAGACACTTCTG rs7801318 Reverse ACGTTGGATGATACACAAGCTGAGGTGTGG continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs7811609 Extension ggggACTGCCCTTTGAAACG rs7811609 Forward ACGTTGGATGAAAGAAAGCAGAAACTGCCC rs7811609 Reverse ACGTTGGATGAGGAGGTGGTCGAATTCAAG rs7933 Extension gggcgTCCCGCGGGAGCCGCGGCCCAAG rs7933 Forward ACGTTGGATGTTTGGCACGCGCGATGCGTT rs7933 Reverse ACGTTGGATGCCCGAGAAGCTGAGGCTGT rs8026 Extension AAGGAGGTAGTATTAACACA rs8026 Forward ACGTTGGATGGGGTATAGTGATGTACAAGG rs8026 Reverse ACGTTGGATGACCATTACAGGGCTAAAAGG rs815965 Extension ggggACAGCAAGGAAACAAAAACAAAAC rs815965 Forward ACGTTGGATGCTAGATAAGAAACAGCAAGG rs815965 Reverse ACGTTGGATGACACATTCCTCTTCACTGCC rs815991 Extension ccgcTCTGAGCTACTTGTTTCTTC rs815991 Forward ACGTTGGATGGAAACTTTTATCTGAGCTAC rs815991 Reverse ACGTTGGATGACCTCTCCCTCACCCCAAG rs843556 Extension ccTAGCATTTCTAGAGTCCAAGAATAA rs843556 Forward ACGTTGGATGGCAGATTATGTCTAGTCCCC rs843556 Reverse ACGTTGGATGCAGATTATAGGGCAAGATGG rs843570 Extension gggaAGAACCTGACCCTGGG rs843570 Forward ACGTTGGATGTTCTTTCTGCTCATCCACCC rs843570 Reverse ACGTTGGATGACTGTCAGGGTCATGACTAC rs843576 Extension ccccCTATTTACAAAGGATCTGACAGCA rs843576 Forward ACGTTGGATGCCAAGCTCTATTTACAAAGG rs843576 Reverse ACGTTGGATGCATGGAGACTTTTGCTGATG rs874869 Extension CTGCAGTAACAAATCCC rs874869 Forward ACGTTGGATGGCAGTATCAGTCAGATAGGC rs874869 Reverse ACGTTGGATGGTTGTTTTAAGCTAGACTGG rs887343 Extension GCCAGGTTCAGCAGCTCT rs887343 Forward ACGTTGGATGCAAGCGACTTTTGAGACCAG rs887343 Reverse ACGTTGGATGCTTCCATACAAATCACCTCG rs919751 Extension TCCAGCCGAGTACCC rs919751 Forward ACGTTGGATGATCTTTCTGACTTGCCCAGC rs919751 Reverse ACGTTGGATGGAAGTAGTAGAGCAGGTGAC rs929377 Extension GGTCTTTGAGGGCACCTGA continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs929377 Forward ACGTTGGATGAGAGGTTAGTTCTAGCTTCC rs929377 Reverse ACGTTGGATGATCCTAGTCCACTCCTCATC rs9307048 Extension tttccCAGCCTAAGCTATCCC rs9307048 Forward ACGTTGGATGTAGGCAGTAAGCATGTGTGG rs9307048 Reverse ACGTTGGATGTGCAGCCTTACAATCTACCC rs9309056 Extension AAGATCAGATTTATCTGAAAGG rs9309056 Forward ACGTTGGATGTCCTTTCCTACTGTTCCCAC rs9309056 Reverse ACGTTGGATGCATATTTATAGATGCATAAGC rs9324628 Extension cctTCTTTCTTTTCTTAAGCAAAATC rs9324628 Forward ACGTTGGATGGAAGAACCAGGAATCTGCTC rs9324628 Reverse ACGTTGGATGTGAAAGGAGTGAAGTTAGTG rs9325134 Extension ggggcGCAGAAGCCAATTTCCATGTT rs9325134 Forward ACGTTGGATGCCTAACCTTAACCTTCACAG rs9325134 Reverse ACGTTGGATGCCAGGTGACATAAGCTTTCC rs9327244 Extension gCCCCAACTATTTCATACCTGTTTT rs9327244 Forward ACGTTGGATGTTCCCCAACTATTTCATACC rs9327244 Reverse ACGTTGGATGGACAATAGTCAGCTCATTGC rs9327500 Extension gtatgTGGCCCTTTTGCACTTA rs9327500 Forward ACGTTGGATGCAGACAATGTTATTTGGCCC rs9327500 Reverse ACGTTGGATGCATGAAAAATATGCAGCATGG rs960106 Extension CAGGACACACAAATCGTCATA rs960106 Forward ACGTTGGATGAGTATAAAACAGGACACAC rs960106 Reverse ACGTTGGATGGAAAATACATTAGGGACAGGC rs9639668 Extension GTGTCATGCATACTCATTCA rs9639668 Forward ACGTTGGATGGTTGGCATTCAACCAGTGTC rs9639668 Reverse ACGTTGGATGAATTTGGAGAGCACTGTTCG rs966087 Extension CTTATGAACCTTGGTATCATTA rs966087 Forward ACGTTGGATGCTTCCCTCTTATGAACCTTG rs966087 Reverse ACGTTGGATGAAACTTGAGGGTGAGAGAAG rs968355 Extension AAAGGCTCCCATCACCTA rs968355 Forward ACGTTGGATGGCTGAAGGGACAAACAAAAG rs968355 Reverse ACGTTGGATGGTGCTTGTGAATTGCTTCTC rs9687567 Extension ccctCAGTTCCCTAAGCAATTT rs9687567 Forward ACGTTGGATGTGTTTAACCAAGCAGTTCCC continued

Table 9 continued SNP ID Primer Type Primer Sequence (5' to 3') rs9687567 Reverse ACGTTGGATGGTAATTCCTACCTAGTCTGG rs969876 Extension cactcTTTCAGCAGAAAAACACAC rs969876 Forward ACGTTGGATGGTATCTAAAACTATTTCAGC rs969876 Reverse ACGTTGGATGGGATTCTCAATGTAAGAGAC rs975537 Extension GGATAAAAGTAGGTGGGTAA rs975537 Forward ACGTTGGATGGGACGGATGGATAAAAGTAG rs975537 Reverse ACGTTGGATGCTATCCACTTACCTACCAAC rs9791151 Extension ctagGCATATATGCCCCTTTTTTT rs9791151 Forward ACGTTGGATGTTCTTATGCATATATGCCCC rs9791151 Reverse ACGTTGGATGTGCACCTACTCTCTTGTATG rs986610 Extension cCCCTAATTTGATTATCACACA rs986610 Forward ACGTTGGATGGTTTCAGGTAATGGATATG rs986610 Reverse ACGTTGGATGGGGATCCAGAATTATATTTCC rs9912 Extension aCCTCCACGTGCTTCCTT rs9912 Forward ACGTTGGATGATGCCCAGCAAGGGAACCTC rs9912 Reverse ACGTTGGATGGGGAAGGTCTTCAGTTGAGC rs9973653 Extension AGTTCTTGAGCAGTGGTG rs9973653 Forward ACGTTGGATGGTACAGAGTTTCCATTTGAG rs9973653 Reverse ACGTTGGATGAGTACGTTCATATTGTTGGG

Table 9. Primers for Sequenom MassARRAY quantitative SNP genotyping

Locus, Chrom., Start, Stop, Strand, Transcript, Gene, ChiSq (DiffExpr), p-value (DiffExpr), Transcript Length, STS/A Colon (Reads), Balb/cHeA Colon (Reads), STS/A Colon (RPKM), Balb/cHeA Colon (RPKM)

Scc13 6 53989926 54251807 + NM_001163640 Chn2 5.5 0.0192638 3150 520.9 332.8 6.893 5.854

Scc13 6 54222816 54251807 + NM_023543 Chn2 194.6 3.22E-44 2653 31.8 204.4 0.500 4.269

Scc13 6 54277006 54280195 + NM_030024 Prr15 185.1 3.78E-42 1381 269.0 539.0 8.120 21.624

Scc13 6 54402877 54453763 + NM_001167860 Wipf3 21.6 3.39E-06 4204 181.6 73.6 1.801 0.970

Scc13 6 54545105 54595817 + NM_001164361 Plekha8 24.6 7.04E-07 6722 1409.7 1284.9 8.742 10.590

Scc13 6 54553141 54595817 + NM_001001335 Plekha8 54.7 1.44E-13 6627 1409.3 1402.2 8.865 11.722

Scc13 6 54873936 54921656 - NM_001171007 Nod1 11.2 0.000805246 4071 314.0 172.7 3.215 2.350

Scc13 6 54873936 54922607 - NM_172729 Nod1 6.1 0.013184 4322 263.0 154.3 2.537 1.978

Scc13 6 54935089 54942862 - NM_026637 Ggct 13.4 0.000251941 1362 72.0 24.0 2.204 0.976

Scc13 6 54987995 55029499 + NM_180678 Gars 218.4 0 2390 3666.0 3878.0 63.943 89.897

Scc13 6 55153377 55270217 + NM_001142781 Fam188b 5.2 0.0223535 4270 73.7 35.1 0.720 0.455

Scc13 6 55286293 55298550 + NM_007472 Aqp1 1006.4 0 2760 4215.0 1218.0 63.663 24.450

Scc13 6 55401974 55451450 + NM_001025372 Adcyap1r1 56.9 4.68E-14 6159 326.2 112.4 2.208 1.011

Scc13 6 55787012 55928593 + NM_001081665 Ccdc129 5.6 0.0178581 3284 5.0 0.0 0.063 0.000

Scc13 6 56019798 56312387 - NM_001159953 Pde1c 9.7 0.00187047 3907 73.6 28.7 0.785 0.407

Scc13 6 56050434 56319629 - NM_001159957 Pde1c 16.2 5.69E-05 8299 51.6 12.2 0.259 0.082

Scc13 6 56050434 56312387 - NM_001159952 Pde1c 6.3 0.0123195 8826 17.7 3.8 0.084 0.024

Scc13 6 56050434 56312805 - NM_001159960 Pde1c 6.2 0.012539 8727 19.8 4.7 0.095 0.030

Scc13 6 56050434 56312387 - NM_001025568 Pde1c 5.9 0.015045 8853 16.7 3.6 0.079 0.022

Scc13 6 56651057 56654694 - NM_025520 Lsm5 16.2 5.57E-05 501 25.0 49.0 2.080 5.419

Scc13 6 56664899 56711906 + NM_030235 Avl9 25.4 4.66E-07 6691 2437.0 2130.0 15.183 17.637

Scc13 6 56727519 56747808 - NM_145958 Kbtbd2 4.6 0.0324806 3868 1043.0 866.0 11.241 12.404

Table 10. Genes showing differential expression between Balb/cHeA and STS/A by RNA-seq

Scc13 6 56782053 56829355 + NM_012056 Fkbp9 52.2 5.01E-13 3011 2313.0 1363.0 32.023 25.080

Scc13 6 56832395 56873927 - NM_026004 Nt5c3 227.6 0 1675 1176.0 1576.0 29.268 52.129

Scc13 6 57456496 57485421 - NM_175523 Ppm1k 54.1 1.95E-13 5560 394.0 487.0 2.954 4.853

Scc13 6 57530986 57615131 + NM_025992 Herc6 597.9 0 5279 290.0 963.0 2.290 10.107

Scc13 6 57652449 57689444 + NM_133737 Lancl2 38.7 4.82E-10 3657 419.0 478.0 4.776 7.242

Scc13 6 58546666 58642446 + NM_011920 Abcg2 175.4 4.97E-40 2491 3374.0 3496.0 56.464 77.756

Scc13 6 58783694 58870391 + NM_028705 Herc3 9.3 0.00229648 4729 756.0 669.0 6.664 7.838

Scc13 6 58855227 58857121 - NM_021432 Nap1l5 10.6 0.00112931 1894 262.0 141.0 5.767 4.125

Scc13 6 58883530 58974497 - NM_153574 Fam13a 284.9 0 4293 1861.0 681.0 18.071 8.789

Scc13 6 59158864 59162028 + NM_001081145 Tigd2 41.7 1.04E-10 3007 431.0 496.0 5.975 9.139

Scc13 6 59302455 59376285 - NM_183183 Gprin3 18.6 1.64E-05 3100 152.0 61.0 2.044 1.090

Scc13 6 60681567 60778991 - NM_001042451 Snca 52.8 3.77E-13 1278 50.5 0.4 1.648 0.017

Scc13 6 60681567 60779850 - NM_009221 Snca 47.0 7.12E-12 1190 5.5 44.6 0.192 2.077

Scc13 6 61130319 62332858 + NM_001164316 Fam190a 7.2 0.0074199 5602 90.1 100.1 0.671 0.990

Scc13 6 64679140 64681230 + NM_007500 Atoh1 368.6 0 2090 2703.0 1039.0 53.914 27.543

Scc13 6 65067287 65094725 - NM_019455 Hpgds 7.2 0.00733388 3298 275.0 261.0 3.476 4.385

Scc13 6 65331288 65408145 + NM_175524 C130060K24Rik 6.9 0.00847606 2953 1.0 7.0 0.014 0.131

Scc13 6 65540392 65584035 + NM_001001495 Tnip3 153.2 3.49E-35 3769 89.0 269.0 0.984 3.954

Scc4 17 78907403 79026750 + NM_001197028 Vit 5.8 0.015809 2654 30.9 41.2 0.486 0.860

Scc4 17 79152246 79234722 - NM_001081179 Heatr5b 53.0 3.29E-13 6437 1098.0 571.0 7.111 4.915

Scc4 17 79234856 79247649 + NM_181649 Ccdc75 26.1 3.23E-07 4207 617.0 621.0 6.114 8.178

Scc4 17 79249845 79251885 - NR_003363 Gm6548 80.3 3.25E-19 2040 168.8 295.3 3.449 8.020

Scc4 17 79251890 79281913 - NM_011163 Eif2ak2 296.5 0 4326 1046.2 1553.7 10.082 19.899

Scc4 17 79315839 79320389 + NM_001177402 1110001A16Rik 25.5 4.47E-07 1345 187.5 231.2 5.810 9.525

Scc4 17 79336475 79347393 + NM_028611 2410091C18Rik 24.7 6.72E-07 2303 529.0 277.0 9.576 6.664

Scc4 17 79348745 79420157 - NM_001171004 Prkd3 61.0 5.57E-15 5887 500.8 201.1 3.546 1.893

Scc4 17 79348745 79420157 - NM_029239 Prkd3 35.0 3.36E-09 5884 440.1 203.1 3.118 1.912

Scc4 17 79348745 79407846 - NM_001171005 Prkd3 33.7 6.52E-09 5083 570.1 283.8 4.676 3.093

Scc4 17 79451246 79489584 + NM_027455 Qpct 86.2 1.66E-20 1924 684.0 830.0 14.820 23.901

Scc4 17 79733365 79754432 - NM_026514 Cdc42ep3 7.2 0.00721438 1967 686.0 600.0 14.539 16.900

Scc4 17 79905389 79909827 - NR_033462 Gm10494 9.8 0.00177401 2582 120.0 55.0 1.937 1.180

Scc4 17 80014240 80081493 + NM_201361 Fam82a1 27.1 1.89E-07 1802 168.0 60.0 3.886 1.845

Scc4 17 80106293 80114382 - NM_009994 Cyp1b1 11.0 0.000933169 5128 16.0 32.0 0.130 0.346

Scc4 17 80247732 80295369 - NM_019717 Atl2 170.4 6.16E-39 3521 2073.5 2314.8 24.550 36.424

Scc4 17 80247732 80295464 - NM_178050 Atl2 5.4 0.0201486 3371 549.5 353.2 6.795 5.804

Scc4 17 80428827 80461675 - NM_144802 Hnrpll 55.3 1.01E-13 3116 1484.0 1469.0 19.854 26.119

Scc4 17 80526811 80584373 + NM_176963 Galm 39.2 3.79E-10 2245 1309.0 741.0 24.307 18.287

Scc4 17 80599420 80606646 - NM_146083 Srsf7 112.5 2.73E-26 2288 681.2 878.4 12.412 21.270

Scc4 17 80599420 80606646 - NM_001195485 Srsf7 43.0 5.36E-11 2279 155.2 229.1 2.839 5.569

Scc4 17 80599420 80606646 - NR_036615 Srsf7 26.6 2.55E-07 2739 1853.5 1150.8 28.210 23.279

Scc4 17 80599420 80606646 - NM_001195486 Srsf7 22.3 2.31E-06 2255 186.4 223.8 3.445 5.498

Scc4 17 80599420 80606646 - NM_001195487 Srsf7 14.0 0.000181456 2243 39.7 62.9 0.737 1.555

Scc4 17 80623829 80627838 + NM_026053 Gemin6 42.2 8.08E-11 1086 85.0 151.0 3.263 7.703

Scc4 17 80637644 80689817 - NM_001163759 Dhx57 36.8 1.32E-09 5077 398.4 175.9 3.271 1.919

Scc4 17 80706747 80788030 + NM_001145452 Arhgef33 8.6 0.00327631 3261 7.7 0.0 0.099 0.000

Scc4 17 80771197 80772883 - NR_028385 Gm10190 8.2 0.00422796 1686 7.3 0.0 0.180 0.000

Scc4 17 81343972 81401157 + NM_026516 Tmem178 12.0 0.000521676 1696 16.0 1.0 0.393 0.033

Scc4 17 81772445 82137718 - NM_001112798 Slc8a1 144.5 2.78E-33 18563 2497.5 1249.1 5.609 3.728

Scc4 17 81772445 82137718 - NM_011406 Slc8a1 73.6 9.59E-18 18599 2406.5 1358.0 5.394 4.045

Scc4 17 83614623 83624410 + NM_134117 Pkdcc 73.9 8.14E-18 2454 1449.0 745.0 24.615 16.820

Scc4 17 83750271 83879700 + NM_001114362 Eml4 119.5 7.98E-28 5236 577.1 181.6 4.595 1.921

Scc4 17 83750271 83879700 + NM_199466 Eml4 7.2 0.0073641 5107 456.3 280.6 3.725 3.044

Scc4 17 83901257 83913674 - NM_009187 Cox7a2l 595.5 0 1080 711.6 1542.5 27.467 79.131

Scc4 17 83901257 83913674 - NM_001159529 Cox7a2l 160.3 9.74E-37 1149 273.4 30.5 9.920 1.470

Scc4 17 84105503 84204763 + NM_001171052 Mta3 206.1 0 2673 65.1 275.2 1.016 5.704

Scc4 17 84105503 84192218 + NM_001171054 Mta3 39.1 3.94E-10 1727 148.0 40.0 3.573 1.282

Scc4 17 84105503 84214246 + NM_054082 Mta3 26.7 2.43E-07 2115 359.5 169.1 7.085 4.429

Scc4 17 84230694 84246131 - NM_025325 Haao 40.2 2.30E-10 1224 9.0 47.0 0.307 2.127

Scc4 17 84583269 84587288 - NM_001001806 Zfp36l2 103.4 2.73E-24 3532 3754.0 2152.0 44.307 33.757

Scc4 17 84589396 84865549 - NM_183021 Thada 111.6 4.34E-26 7896 420.0 113.0 2.217 0.793

Scc4 17 84911235 85021483 + NM_177606 Plekhh2 59.0 1.57E-14 6886 997.0 496.0 6.036 3.991

Scc4 17 85057574 85082264 - NM_031884 Abcg5 81.5 1.72E-19 2263 59.0 160.0 1.087 3.917

Scc4 17 85104587 85190127 - NM_028233 Lrpprc 25.7 4.09E-07 4621 2340.0 1490.0 21.110 17.864

Scc4 17 85316522 85356764 - NR_027929 1110020A21Rik 8.2 0.00415274 591 0.0 5.0 0.001 0.469

Scc4 17 85356081 85423333 + NM_001159497 Ppm1b 15.9 6.65E-05 1592 36.3 6.2 0.951 0.216

Scc4 17 85357341 85416469 + NM_011151 Ppm1b 128.1 1.08E-29 3262 643.5 868.5 8.224 14.751

Scc4 17 85373392 85423333 + NM_001159498 Ppm1b 16.5 4.74E-05 1563 105.0 133.9 2.800 4.747

Scc4 17 85427687 85463582 + NM_009205 Slc3a1 863.2 0 2305 3328.5 910.8 60.198 21.893

Scc4 17 85462817 85489615 - NM_001163624 Prepl 147.9 4.98E-34 3162 431.1 94.7 5.684 1.660

Scc4 17 85462817 85489608 - NM_145984 Prepl 86.8 1.19E-20 3394 384.2 114.9 4.719 1.875

Scc4 17 85462817 85489608 - NM_001163623 Prepl 69.6 7.25E-17 3294 295.6 86.2 3.742 1.450

Scc4 17 85462817 85489608 - NM_001163622 Prepl 18.1 2.15E-05 3830 657.4 627.4 7.155 9.075

Scc4 17 86083608 86087595 - NM_011380 Six2 11.2 0.00080919 2108 10.0 0.0 0.198 0.000

Scc4 17 86567125 87057260 + NM_011104 Prkce 5.1 0.0244313 6254 1048.0 707.0 6.986 6.263

Scc4 17 87153204 87232751 + NM_010137 Epas1 448.1 0 5352 4633.0 2016.0 36.087 20.870

Scc4 17 87362451 87399410 + NM_145491 Rhoq 25.2 5.15E-07 4122 2184.4 1384.6 22.091 18.610

Scc4 17 87396599 87424742 - NM_008838 Pigf 252.0 0 980 430.6 811.4 18.318 45.871

Scc4 17 87424901 87435149 + NM_019936 Cript 27.4 1.70E-07 1171 999.0 953.0 35.564 45.089

Scc4 17 87507019 87536926 + NM_019654 Socs5 29.6 5.19E-08 4394 877.0 487.0 8.320 6.141

Scc4 17 87653783 87665288 - NM_139295 Mcfd2 48.5 3.31E-12 2050 994.9 1021.3 20.232 27.601

Scc4 17 87653783 87665288 - NM_176808 Mcfd2 36.3 1.70E-09 2275 172.1 235.7 3.154 5.741

Scc4 17 87674226 87682155 - NR_015506 4833418N02Rik 44.3 2.78E-11 1530 95.0 15.0 2.588 0.543

Scc4 17 87682226 87781111 + NM_028639 Ttc7 81.8 1.49E-19 4615 1921.0 1024.0 17.352 12.293

Scc4 17 87832741 87846276 - NM_007589 Calm2 1377.9 0 1216 5795.0 8194.0 198.665 373.335

Scc4 17 88035319 88050468 + NM_008532 Epcam 1766.7 0 2061 26920.0 28921.0 544.500 777.449

Scc4 17 88374390 88390233 + NM_010830 Msh6 29.1 6.96E-08 4263 910.4 510.2 8.902 6.631

Scc4 17 88390199 88464626 - NM_001081034 Fbxo11 6.4 0.0111252 4013 2276.6 1854.8 23.650 25.607

Scc4 17 88840052 88889874 + NM_180974 Foxn2 38.0 6.97E-10 5084 801.0 418.0 6.568 4.555

Scc4 17 89025895 89045065 + NM_029858 Ston1 227.3 0 2983 610.0 124.0 8.525 2.303

Scc4 17 89140889 89191317 - NM_013582 Lhcgr 5.6 0.0178581 2553 5.0 0.0 0.082 0.000

Scc4 17 90432984 91492143 - NM_020252 Nrxn1 29.4 5.96E-08 9040 91.2 21.2 0.421 0.130

Scc4 17 90432984 91492143 - NM_177284 Nrxn1 8.8 0.00305907 9004 60.8 22.8 0.282 0.141

Scc5 18 52625347 52650393 + NM_026040 Srfbp1 15.5 8.47E-05 1463 183.0 83.0 5.214 3.143

Scc5 18 52676892 52689363 - NM_010728 Lox 343.8 0 3604 936.0 193.0 10.827 2.967

Scc5 18 52927465 53075586 + NM_026408 Sncaip 5.0 0.0254735 3553 56.7 25.4 0.666 0.396

Scc5 18 52927747 53075586 + NM_001199153 Sncaip 5.7 0.0171877 3529 64.5 28.9 0.762 0.454

Scc5 18 53336019 53380515 + NM_026386 Snx2 129.7 4.67E-30 2013 1658.0 1835.0 34.335 50.504

Scc5 18 53405316 53550480 + NM_029394 Snx24 13.5 0.000232412 1882 351.0 349.0 7.775 10.274

Scc5 18 53565995 53577662 - NM_008908 Ppic 150.6 1.30E-34 1286 995.0 366.0 32.254 15.768

Scc5 18 53841377 53904202 - NM_178686 Cep120 11.7 0.000631461 4554 754.0 680.0 6.902 8.273

Scc5 18 55047699 55149835 - NM_175751 Zfp608 20.9 4.87E-06 6112 540.0 293.0 3.683 2.656

Scc5 18 56591786 56663447 + NM_026240 Gramd3 164.4 1.25E-37 2605 2416.0 2611.0 38.663 55.531

Scc5 18 56685390 56721953 - NM_138600 Aldh7a1 12.7 0.000363563 2911 307.2 308.3 4.399 5.867

Scc5 18 56722223 56747367 + NM_001162989 Phax 16.8 4.11E-05 1783 322.1 334.0 7.532 10.379

Scc5 18 56732525 56747367 + NM_019996 Phax 37.4 9.45E-10 1893 202.2 268.0 4.453 7.845

Scc5 18 56867467 56913080 + NM_010721 Lmnb1 65.5 5.90E-16 2844 1381.0 1412.0 20.243 27.507

Scc5 18 56921370 57085203 - NM_177115 March3 30.6 3.21E-08 1754 63.0 111.0 1.497 3.506

Scc5 18 57095914 57100173 + NR_033455 Gm10536 17.4 2.95E-05 4259 1.0 14.0 0.010 0.182

Scc5 18 57514387 57552374 + NM_028447 Prrc1 133.0 9.03E-31 4801 2257.0 1124.0 19.598 12.971

Scc5 18 57628140 57637789 + NM_001134697 Ctxn3 133.8 6.08E-31 1472 623.0 192.0 17.643 7.227

Scc5 18 57705668 57712050 - NR_027944 4930511M06Rik 21.3 3.99E-06 2651 0.7 15.7 0.012 0.328

Scc5 18 57705668 57712052 - NR_015494 4930511M06Rik 7.2 0.00718695 2806 0.3 5.3 0.004 0.105

Scc5 18 58038332 58106476 + NM_009194 Slc12a2 1880.4 0 6520 14451.0 5660.0 92.396 48.096

Scc5 18 58168277 58369581 - NM_010181 Fbn2 686.1 0 10480 1301.0 173.0 5.175 0.915

Scc5 18 58715894 58772524 + NM_001081072 Slc27a6 7.1 0.00773185 2598 70.0 30.0 1.123 0.640

Scc5 18 58819136 58839225 + NM_025478 Isoc1 23.8 1.08E-06 2563 2290.0 1465.0 37.247 31.668

Scc5 18 58996418 59213333 + NM_175506 Adamts19 13.8 0.000202804 4666 315.0 167.0 2.814 1.983

Scc5 18 59222035 59236615 + NR_033765 A730017C20Rik 23.0 1.66E-06 1922 28.1 1.3 0.609 0.038

Scc5 18 59222035 59236615 + NR_033764 A730017C20Rik 19.0 1.28E-05 1941 23.3 1.1 0.500 0.031

Scc5 18 59222035 59236615 + NM_001167925 A730017C20Rik 12.7 0.000367596 1861 21.7 2.4 0.487 0.073

Scc5 18 59334994 59570991 + NM_001081328 Chsy3 24.6 6.97E-07 3882 65.0 13.0 0.698 0.186

Scc5 18 60371731 60407475 + NM_001033767 Gm4951 134.9 3.44E-31 2648 8.0 109.0 0.126 2.281

Scc5 18 60427955 60432922 - NM_001034859 Gm4841 29.0 7.07E-08 2838 6.0 33.0 0.088 0.644

Scc5 18 60453034 60462671 + NM_001101475 F830016B08Rik 5.5 0.0187379 3297 1.0 6.0 0.013 0.101

Scc5 18 60535683 60552284 + NM_001146275 Iigp1 209.6 0 3123 101.4 337.1 1.353 5.981

Scc5 18 60535683 60552284 + NM_021792 Iigp1 867.9 0 2940 219.6 1061.9 3.114 20.011

Scc5 18 60633845 60661638 - NM_134133 2010002N04Rik 54.7 1.38E-13 1986 325.0 421.0 6.822 11.745

Scc5 18 60685875 60718417 + NM_026302 Dctn4 29.4 5.91E-08 3714 2104.0 1881.0 23.616 28.060

Scc5 18 60753644 60783960 - NM_177340 Synpo 73.3 1.11E-17 5012 1844.8 996.1 15.344 11.011

Scc5 18 60759860 60769759 - NM_001109975 Synpo 27.6 1.47E-07 4479 110.2 30.9 1.026 0.382

Scc5 18 60845630 60873044 - NM_008306 Ndst1 199.1 2.80E-45 6072 1919.0 816.0 13.175 7.446

Scc5 18 60934250 60938201 + NM_020600 Rps14 168.0 2.03E-38 597 2228.0 2450.0 155.576 227.368

Scc5 18 60963503 60972307 + NM_001042605 Cd74 484.3 0 1415 1288.1 2084.4 37.950 81.614

Scc5 18 60963503 60972307 + NM_010545 Cd74 5507.5 0 1223 3532.0 10159.0 120.391 460.215

Scc5 18 60973410 61008619 - NM_001198984 Tcof1 92.2 8.03E-22 4629 366.6 102.4 3.301 1.226

Scc5 18 61071894 61077423 + NM_001038499 Arsi 12.7 0.000370668 1722 86.0 32.0 2.082 1.030

Scc5 18 61085286 61147807 + NM_177407 Camk2a 32.8 1.02E-08 4956 122.7 32.8 1.032 0.367

Scc5 18 61123208 61147807 + NM_009792 Camk2a 34.6 4.12E-09 4268 83.3 15.2 0.814 0.197

Scc5 18 61155034 61173854 - NM_201353 Slc6a7 654.1 0 3343 852.0 50.0 10.624 0.829

Scc5 18 61178516 61195854 - NM_009880 Cdx1 70.1 5.54E-17 1750 5389.0 3378.0 128.372 106.944

Scc5 18 61204804 61244722 + NM_008809 Pdgfrb 221.6 0 5410 1084.0 343.6 8.353 3.519

Scc5 18 61204804 61244722 + NM_001146268 Pdgfrb 167.9 2.11E-38 5413 941.0 320.4 7.247 3.280

Scc5 18 61265226 61290794 + NM_001037859 Csf1r 13.6 0.000226657 3875 1564.0 1015.0 16.825 14.512

Scc5 18 61290931 61336704 - NM_178277 Hmgxb3 133.1 8.50E-31 5198 515.2 141.3 4.132 1.507

Scc5 18 61356508 61371251 - NM_007885 Slc26a2 11.3 0.000783779 3863 7496.0 5311.0 80.892 76.171

Scc5 18 61380153 61449373 + NM_146086 Pde6a 41.8 9.94E-11 4198 22.0 70.0 0.218 0.924

Scc5 18 61457790 61560086 - NM_133249 Ppargc1b 41.2 1.38E-10 3656 620.0 300.0 7.069 4.546

Scc5 18 61653448 61696191 - NM_177828 Arhgef37 32.1 1.49E-08 3234 113.0 29.0 1.457 0.497

Scc5 18 61715236 61747954 + NM_146087 Csnk1a1 48.3 3.75E-12 2013 5686.0 4900.0 117.751 134.862

Scc5 18 61807479 61807549 - NR_029557 Mir145 12.9 0.0003356 70 61.0 19.0 36.327 15.038

Scc5 18 61847589 61852192 + NM_019508 Il17b 9.0 0.00273545 692 8.0 0.0 0.482 0.000

Scc5 18 61856491 61867290 - NM_172832 Pcyox1l 8.7 0.00315252 2017 296.0 168.0 6.118 4.615

Scc5 18 61889916 61946317 - NM_178928 Afap1l1 84.6 3.69E-20 3403 483.0 166.0 5.917 2.703

Scc5 18 62112729 62175374 + NM_172628 Sh3tc2 22.8 1.77E-06 4540 90.0 25.0 0.826 0.305

Scc5 18 62483858 62627457 + NM_008313 Htr4 10.6 0.0011606 4657 115.0 131.0 1.029 1.558

Scc5 18 62708565 62821043 + NM_177829 Spink10 16.9 3.92E-05 1135 0.0 10.0 0.000 0.488

Table 10. Genes showing differential expression between Balb/cHeA and STS/A by RNA-Seq
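
The Reads and RPKM columns of Table 10 are related through transcript length and sequencing depth. The sketch below uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The library size is not reported in the table; the value of 24 million mapped reads is a hypothetical figure back-calculated from the tabulated STS/A values and is used here only for illustration. The function name `rpkm` is likewise illustrative.

```python
def rpkm(reads: float, transcript_length_bp: int, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return reads / (kilobases * millions)


# Tnip3 (NM_001001495), STS/A colon, from Table 10: 89.0 reads, 3769 bp.
# ASSUMED_STS_LIBRARY_SIZE is hypothetical (not stated in the table).
ASSUMED_STS_LIBRARY_SIZE = 24_000_000

tnip3_sts_rpkm = rpkm(89.0, 3769, ASSUMED_STS_LIBRARY_SIZE)
print(round(tnip3_sts_rpkm, 3))
```

Under that assumed depth, the result agrees with the tabulated STS/A RPKM for this transcript to three decimal places. A different library size would be needed for the Balb/cHeA columns.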


Locus; Gene; Position (UCSC Genome Browser NCBI37/mm9); Reference Base; STS/A Genotype Call; Balb/cHeA Genotype Call; STS/A: A calls; STS/A: C calls; STS/A: G calls; STS/A: T calls; Balb/cHeA: A calls; Balb/cHeA: C calls; Balb/cHeA: G calls; Balb/cHeA: T calls

Scc13 Znrf2 chr6:54841244 C CC TT 0 26 0 0 0 2 0 16

Scc4 Cdc42ep3 chr17:79734797 C TT CC 0 0 0 34 0 8 0 0

Scc4 Eml4 chr17:83878059 C TT CC 0 0 0 50 0 15 0 0

Scc4 Eml4 chr17:83879540 C TT CC 0 0 0 8 0 35 0 0

Scc4 Cox7a2l chr17:83901597 T CC TT 0 11 0 0 0 0 1 176

Scc4 Cox7a2l chr17:83901598 C GG CC 0 0 15 0 0 188 0 0

Scc4 Cox7a2l chr17:83901599 C GG CC 0 0 19 0 0 195 0 0

Scc4 Lrpprc chr17:85125918 G AA GG 35 0 0 0 0 0 68 0

Scc4 Slc3a1/Prepl chr17:85463161 A GG AA 0 0 135 0 47 0 0 0

Scc4 Slc3a1/Prepl chr17:85463413 G AA GG 63 0 0 0 0 0 24 0

Scc4 Slc3a1/Prepl chr17:85463454 A GG AA 0 1 69 1 25 0 0 0

Scc4 Prkce chr17:87056480 T CC TT 0 8 0 0 0 0 0 55

Scc4 Epas1 chr17:87208980 C CC TT 0 39 0 0 0 1 1 31

Scc4 Rhoq chr17:87394037 G GG AA 0 0 22 0 18 0 0 0

Scc4 Rhoq/Pigf chr17:87397393 C TT CC 0 0 0 20 0 19 0 0

Scc4 Rhoq/Pigf chr17:87397583 C CC TT 0 29 0 0 0 0 0 38

Scc4 Rhoq/Pigf chr17:87398928 A CC AA 0 62 0 0 66 0 0 0

Scc4 Rhoq/Pigf chr17:87399025 T AA TT 48 0 0 0 0 0 0 50

Scc4 Rhoq/Pigf chr17:87399109 C TT CC 0 0 0 31 0 26 0 1

Scc4 Rhoq/Pigf chr17:87399192 C TT CC 0 0 0 29 0 62 0 0

Scc4 Pigf chr17:87421262 C AA CC 24 0 0 0 0 129 1 2

Scc4 Pigf chr17:87423194 A AA GG 16 0 0 0 0 0 71 0

Scc4 Cript chr17:87424966 C CC GG 0 23 0 0 0 0 27 0

Scc4 Cript chr17:87434387 A AA GG 59 0 0 0 0 0 105 0

Scc4 Cript chr17:87434730 G GG AA 0 0 33 0 66 1 0 0

Scc4 Cript chr17:87434939 A AA GG 36 0 0 0 0 0 45 1

Scc4 Mcfd2 chr17:87654272 A GG AA 0 0 53 0 117 0 1 0

Scc4 Mcfd2 chr17:87654360 T AA TT 30 0 0 0 0 0 0 161

Scc4 Mcfd2 chr17:87654471 C AA CC 30 0 0 0 0 82 0 0

Scc4 Mcfd2 chr17:87654691 G GG TT 0 0 36 0 0 1 0 48

Scc4 Mcfd2 chr17:87654856 A GG AA 0 0 34 0 44 0 0 0

Scc4 Mcfd2 chr17:87654879 C TT CC 0 0 1 24 0 40 2 0

Scc4 Mcfd2 chr17:87654999 G AA GG 26 0 0 0 0 0 25 0

Scc4 Mcfd2 chr17:87655011 A GG AA 2 0 22 0 24 1 0 0

Scc4 Mcfd2 chr17:87655042 C TT CC 0 0 0 13 0 30 0 0

Scc4 Mcfd2 chr17:87655138 C GG CC 0 0 36 0 0 5 0 0

Scc4 Mcfd2 chr17:87655164 C GG CC 0 0 28 0 0 19 0 0

Scc4 Mcfd2 chr17:87655311 G GG AA 0 0 34 0 41 0 0 0

Scc4 Mcfd2 chr17:87655338 A GG AA 0 0 32 0 39 1 1 0

Scc4 Mcfd2 chr17:87656582 G AA GG 27 0 0 0 0 0 18 0

Scc4 Ttc7 chr17:87780454 A GG AA 0 0 44 0 25 0 0 0

Scc4 Calm2 chr17:87834471 G GG AA 0 0 283 1 690 2 6 2

Scc4 Msh2 chr17:88122775 C CC TT 0 18 0 0 0 0 0 32

Scc4 Klraq1 chr17:88968541 G GG AA 0 0 23 0 26 0 0 0

Scc5 Snx2 chr18:53357557 T TT CC 0 0 0 54 0 11 0 0

Scc5 Prrc1 chr18:57541368 T CC TT 0 28 0 0 0 0 0 23

Scc5 Prrc1 chr18:57551901 G GG AA 0 0 19 0 14 0 0 0

Scc5 Prrc1 chr18:57552178 A AA GG 26 0 1 0 0 0 34 0

Scc5 Isoc1 chr18:58832975 C TT CC 0 0 0 37 0 15 0 0

Scc5 Isoc1 chr18:58837906 A AA CC 28 0 0 0 1 64 0 0

Scc5 Isoc1 chr18:58837910 C AA CC 25 0 0 0 0 64 0 1

Scc5 Isoc1 chr18:58837914 C AA CC 21 0 0 0 0 65 0 0

Scc5 Isoc1 chr18:58838224 A GG AA 0 0 75 0 70 2 0 0

Scc5 Isoc1 chr18:58838332 T AA TT 65 0 0 0 1 0 0 91

Scc5 Isoc1 chr18:58838764 C TT CC 0 0 0 45 0 38 2 0

Scc5 Isoc1 chr18:58838782 C TT CC 0 0 0 63 1 24 0 1

Scc5 Isoc1 chr18:58838918 A GG AA 0 0 34 0 53 0 0 0

Scc5 Isoc1 chr18:58838950 T CC TT 0 51 0 0 0 0 0 59

Scc5 Isoc1 chr18:58839065 G AA GG 52 0 0 0 1 0 62 0

Scc5 Isoc1 chr18:58839075 G AA GG 60 2 1 0 0 0 59 0

Scc5 Isoc1 chr18:58839464 A TT AA 0 0 0 67 77 0 0 0

Scc5 unannotated chr18:60844836 T TT CC 0 0 0 22 0 33 0 0

Scc5 unannotated chr18:60844970 G GG AA 0 0 17 0 38 1 0 0

Scc5 Ndst1 chr18:60867693 A AA GG 16 0 0 0 0 0 11 0

Scc5 Rps14 chr18:60934317 G GG CC 0 1 10 0 1 40 0 0

Scc5 Cd74 chr18:60963579 G GG AA 0 2 63 0 244 1 1 0

Scc5 Cd74 chr18:60963580 C CC TT 0 65 0 0 1 1 2 245

Scc5 Cd74 chr18:60967592 C CC TT 0 137 0 0 3 0 1 602

Scc5 Cd74 chr18:60970056 C CC AA 0 246 0 0 706 4 2 3

Scc5 Tcof1 chr18:60973528 G GG TT 0 0 5 0 0 0 1 19

Scc5 unannotated chr18:61353658 G GG AA 10 0 121 0 206 1 5 0

Scc5 unannotated chr18:61353660 G GG AA 1 0 144 0 216 0 4 0

Scc5 unannotated chr18:61353676 G GG AA 0 0 200 0 347 0 1 1

Scc5 unannotated chr18:61353678 G GG AA 0 0 263 0 374 2 1 0

Scc5 unannotated chr18:61353754 A AA CC 349 0 0 0 1 511 1 0

Scc5 unannotated chr18:61354412 G GG AA 0 0 175 0 163 1 1 1

Scc5 unannotated chr18:61354438 C TT CC 0 0 0 202 2 123 0 0

Scc5 Slc26a2 chr18:61356626 C CC AA 0 112 0 0 177 0 0 1

Scc5 unannotated chr18:61452732 G GG AA 0 0 30 0 58 0 0 0

Scc5 unannotated chr18:61452755 A AA TT 29 0 0 0 0 0 0 62

Scc5 unannotated chr18:61453745 C CC TT 0 9 0 0 0 0 0 16

Scc5 unannotated chr18:61453823 T TT CC 0 0 0 22 0 18 0 0

Scc5 Csnk1a1 chr18:61746989 A GG AA 0 0 19 0 18 0 0 0

Scc5 Csnk1a1 chr18:61747129 G GG AA 0 0 147 0 186 0 0 1

Scc5 Csnk1a1 chr18:61747150 A AA GG 174 1 0 0 0 2 219 0

Scc5 Ak143260 chr18:61806777 T CT TT 0 7 0 53 0 0 0 40

Scc5 Grpel2 chr18:61872325 T TT CC 0 0 0 30 0 19 0 0

Scc5 Grpel2 chr18:61873302 C CC TT 0 20 0 0 0 0 1 26

Scc5 Grpel2 chr18:61873350 G GG AA 1 0 15 0 30 0 1 0

Scc5 Grpel2 chr18:61873425 T TT CC 0 0 0 24 0 22 0 0

Scc5 Grpel2 chr18:61873450 C CC TT 0 27 0 0 0 0 0 15

Scc5 Grpel2 chr18:61875280 C CC TT 0 18 0 0 0 0 0 35

Scc5 Grpel2 chr18:61875624 T TT CC 0 0 0 37 0 30 0 0

Scc5 Fbxo38 chr18:62663931 G GG AA 0 0 38 0 40 0 1 0

Scc5 Fbxo38 chr18:62664416 T TT CC 0 0 0 30 0 41 0 0

Scc5 Fbxo38 chr18:62666389 A AA CC 15 0 0 0 0 37 0 1

Scc5 Fbxo38 chr18:62675086 C CC GG 0 34 0 0 0 0 16 0

Scc5 Fbxo38 chr18:62675144 G GG AA 0 0 30 0 9 0 0 0

Table 11. Coding SNPs identified in Balb/cHeA and STS/A by RNA-seq
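
The Chi-Squared P-value column of Table 12 can be reproduced from the two imbalance counts in each row. The sketch below assumes a one-degree-of-freedom chi-squared goodness-of-fit test against an equal (1:1) split between Allele 1 and Allele 2 imbalance. That assumption is inferred from the tabulated numbers, not taken from a stated method, and the function name `allelic_imbalance_p_value` is illustrative.

```python
import math


def allelic_imbalance_p_value(allele1_count: int, allele2_count: int) -> float:
    """P-value of a 1-df chi-squared test against an expected 1:1 split."""
    total = allele1_count + allele2_count
    if total == 0:
        raise ValueError("at least one imbalance event is required")
    expected = total / 2
    chi_squared = (
        (allele1_count - expected) ** 2 + (allele2_count - expected) ** 2
    ) / expected
    # For one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_squared / 2))


# rs11955699 (MCC) in Table 12: 25 Allele 1 vs 6 Allele 2 imbalance events.
print(allelic_imbalance_p_value(25, 6))
```

For rs11955699 this agrees with the tabulated 0.000643698 to the precision shown. Rows with very small totals (for example 3 vs 0) still yield a p-value under this calculation, although the chi-squared approximation is weak at such counts.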


SNP ID; Tagged Gene; SNP Type; Candidate Locus; Chrom.; GRCh37.p13 Chrom. Position; dbSNP MAF (Accessed 2012); Genotype; Allele 1; Allele 2; Allele 1 Imbalance; Allele 2 Imbalance; Chi-Squared P-value

rs11955699 MCC Intronic SCC5 5 112498949 G=0.440 GT G T 25 6 0.000643698

rs2537742 FBXO11 Intronic SCC4 2 48039044 C=0.163 GA G A 13 1 0.001340641

rs2699814 SNX10 Intronic SCC13 7 26406800 T=0.474 TA T A 5 19 0.004266725

rs39099 CHN2 Intronic SCC13 7 29293095 A=0.271 GA G A 19 5 0.004266725

rs2112487 PRDM6 Intronic SCC5 5 122506237 T=0.169 CT C T 19 5 0.004266725

rs874869 FBXO11 Intronic SCC4 2 48094337 G=0.431 CG C G 18 5 0.00671439

rs34850 CDO1 Intronic SCC5 5 115142745 C=0.472 GC G C 14 3 0.007632882

rs1868087 EPAS1 Intronic SCC4 2 46579719 G=0.212 AG A G 0 7 0.008150972

rs38410 GGCT Intronic SCC13 7 30540399 G=0.361 GT G T 8 22 0.010587138

rs2034327 EPAS1 Intronic SCC4 2 46549040 G=0.455 CG C G 11 2 0.01255492

rs12516924 NDST1 Intronic SCC5 5 149901834 G=0.233 GA G A 11 2 0.01255492

rs6891155 CEP120 Intronic SCC5 5 122684131 G=0.442 GA G A 26 11 0.013663714

rs1919935 SNX10 Intronic SCC13 7 26399235 C=0.242 CT C T 18 6 0.01430588

rs17653932 Intergenic Intergenic SCC5 5 148831133 T=0.275 TC T C 18 6 0.01430588

rs3770755 QPCT Intronic SCC4 2 37574134 G=0.298 GC G C 8 21 0.015776757

rs3923559 EPCAM (TACSTD1) Intronic SCC4 2 47606774 T=0.344 CT C T 21 8 0.015776757

rs4835907 GRAMD3 Intronic SCC5 5 125825322 A=0.226 TA T A 8 21 0.015776757

rs1947582 CSNK1A1 Intronic SCC5 5 148892095 C=0.293 AG A G 8 1 0.019630659

rs256938 FEM1C Intronic SCC5 5 114868337 C=0.377 CA C A 12 3 0.020136753

rs356168 SNCA Intronic SCC13 4 90674431 A=0.405 GA G A 17 6 0.021810122

rs4395533 Intergenic Intergenic SCC13 4 121481950 A=0.496 GA G A 26 12 0.023140934

rs1010101 CSF1R Intronic SCC5 5 149489110 C=0.415 CT C T 23 10 0.023635105

Table 12. Allele-specific imbalance data for 525 SNPs in the discovery sample set

rs10199201 EPAS1 Intronic SCC4 2 46577162 C=0.172 CT C T 5 0 0.02534732

rs2973586 LMNB1 Intronic SCC5 5 126130101 C=0.369 GA G A 18 7 0.027806897

rs3770752 QPCT Intronic SCC4 2 37576136 C=0.220 CT C T 21 9 0.028459739

rs1008776 LOC101927421 Intronic SCC5 5 124480600 C=0.426 CT C T 21 9 0.028459739

rs6949451 HOXA10 Intronic SCC13 7 27215041 T=0.415 CT C T 13 4 0.029049025

rs3087822 CRIPT 3'UTR SCC4 2 46852033 G=0.431 GA G A 4 13 0.029049025

rs1513807 FAM13A Intronic SCC13 4 89961290 T=0.499 GT G T 19 8 0.034264013

rs4835948 FBN2 Intronic SCC5 5 127870190 T=0.494 CT C T 19 8 0.034264013

rs4722968 PLEKHA8 Intronic SCC13 7 30109718 G=0.196 CG C G 2 9 0.034808484

rs7568481 LRPPRC Intronic SCC4 2 44145374 G=0.335 GT G T 9 2 0.034808484

rs966087 LMNB1 Intronic SCC5 5 126170910 G=0.319 GT G T 9 2 0.034808484

rs2284218 CRHR2 Intronic SCC13 7 30714333 T=0.432 TC T C 14 5 0.038947459

rs7811609 KBTBD2 Intronic SCC13 7 32930597 T=0.265 TC T C 14 5 0.038947459

rs10182633 FBXO11 Intronic SCC4 2 48091408 T=0.258 TA T A 14 5 0.038947459

rs4705073 Intergenic Intergenic SCC5 5 148832326 C=0.428 TC T C 5 14 0.038947459

rs353237 Intergenic Intergenic SCC5 5 148831850 C=0.223 CT C T 9 20 0.041087229

rs1534696 SNX10 Intronic SCC13 7 26397239 C=0.348 GT G T 7 17 0.041226838

rs6958331 LANCL2 Intronic SCC13 7 55440523 C=0.138 CT C T 17 7 0.041226838

rs32216 FBN2 Intronic SCC5 5 127640076 C=0.242 TC T C 7 17 0.041226838

rs4705086 PPARGC1B Intronic SCC5 5 149121767 C=0.498 TC T C 7 17 0.041226838

rs4401454 BMPR1B Intronic SCC13 4 95741658 G=0.188 GA G A 12 24 0.04550027

rs7801318 CHN2 Intronic SCC13 7 29384157 T=0.341 CT C T 6 15 0.049534623

rs16881927 FKBP9 Intronic SCC13 7 33042691 C=0.200 CT C T 10 3 0.052203647

rs2043912 MTA3 Intronic SCC4 2 42935481 G=0.334 AG A G 3 10 0.052203647

rs7719910 CSF1R Intronic SCC5 5 149445987 T=0.145 GT G T 3 10 0.052203647

rs2390054 PRDM5 Intronic SCC13 4 121783018 G=0.454 GT G T 22 11 0.055511105

rs2956540 LOX Intronic SCC5 5 121409180 C=0.351 GC G C 8 2 0.05777958

rs7737512 MCC Intronic SCC5 5 112618167 G=0.132 GA G A 8 2 0.05777958

rs4953344 EPAS1 Intronic SCC4 2 46552458 C=0.085 CT C T 6 1 0.058781731

rs1862175 Intergenic Intergenic SCC5 5 121532861 C=0.424 AG A G 1 6 0.058781731

rs2622604 ABCG2 Intronic SCC13 4 89078924 T=0.171 CT C T 13 5 0.059346448

rs4953012 DYNC2LI1 Intronic SCC4 2 44006464 C=0.204 CA C A 13 5 0.059346448

rs9973653 EPAS1 Intronic SCC4 2 46548109 T=0.490 GT G T 5 13 0.059346448

rs221171 Intergenic Intergenic SCC13 7 29006139 G=0.455 GC G C 11 4 0.070701156

rs9327244 Intergenic Intergenic SCC5 5 121597391 G=0.292 GC G C 11 4 0.070701156

rs4078254 MCC Intronic SCC5 5 112456067 A=0.444 GT G T 8 17 0.07186065

rs34166 C7orf16 Intronic SCC13 7 31730776 T=0.467 TA T A 6 14 0.073638283

rs2775 ALDH7A1 3'UTR SCC5 5 125880589 C=0.392 AG A G 6 14 0.073638283

rs4572999 ZNF608 Intronic SCC5 5 124024209 G=0.239 CG C G 14 6 0.073638283

rs17326236 PLEKHA8 Intronic SCC13 7 30091123 A=0.136 AG A G 3 9 0.08326454

rs960106 FBXO11 Intronic SCC4 2 48078838 C=0.347 CT C T 3 0 0.08326454

rs4270 GGCT 3'UTR SCC13 7 30536611 C=0.286 CT C T 7 15 0.088081528

rs336030 FAM82A1 Intronic SCC4 2 38266996 G=0.327 CA C A 7 15 0.088081528

rs1513721 LOC101927052 Intronic SCC13 4 122411527 G=0.318 CA C A 12 5 0.089555093

rs6964512 Intergenic Intergenic SCC13 7 26026149 G=0.452 GT G T 5 12 0.089555093

rs4953013 DYNC2LI1 Intronic SCC4 2 44007075 G=0.402 GA G A 12 5 0.089555093

rs171527 FBN2 Intronic SCC5 5 127662706 T=0.144 TC T C 12 5 0.089555093

rs1049394 SCRN1 3'UTR SCC13 7 29960804 T=0.309 GA G A 7 2 0.09558073

rs2893245 SNX10 Intronic SCC13 7 26352073 G=0.409 AG A G 7 2 0.09558073

rs843576 PRDM5 Intronic SCC13 4 121645319 G=0.179 AG A G 7 2 0.09558073

rs1864828 HNRPLL Intronic SCC4 2 38824781 A=0.134 AT A T 7 2 0.09558073

rs2254161 Intergenic Intergenic SCC5 5 148823662 G=0.356 GT G T 7 2 0.09558073

rs10516930 GRID2 Intronic SCC13 4 94557686 T=0.343 TC T C 16 8 0.102470455

rs2346175 EPAS1 Intronic SCC4 2 46571310 T=0.267 TC T C 8 16 0.102470455

rs13359059 GRAMD3 Intronic SCC5 5 125799373 C=0.450 CT C T 5 1 0.102470455

rs3857434 MCC Intronic SCC5 5 112365804 A=0.432 AG A G 16 8 0.102470455

rs4579258 ZNF608 Intronic SCC5 5 124037371 G=0.419 AG A G 5 1 0.102470455

rs710834 HERC3 Intronic SCC13 4 89618837 T=0.499 CT C T 20 11 0.105997573

rs4705415 PDGFRB Intronic SCC5 5 149517235 A=0.400 AG A G 11 20 0.105997573

rs10080096 YTHDC2 Intronic SCC5 5 112879405 G=0.310 GT G T 6 13 0.108293684

rs11241891 GRAMD3 Intronic SCC5 5 125768020 T=0.268 TC T C 6 13 0.108293684

rs4836261 GRAMD3 Intronic SCC5 5 125786305 T=0.369 GT G T 6 13 0.108293684

rs1564481 ABCG2 Intronic SCC13 4 89061265 T=0.296 TC T C 4 10 0.108809459

rs2346419 PIGF/CRIPT near 5' of gene/intron SCC4 2 46845069 C=0.276 CG C G 10 4 0.108809459

rs7570659 MTA3 Intronic SCC4 2 42873590 G=0.363 GC G C 4 10 0.108809459

rs3779250 CRHR2 Intronic SCC13 7 30694260 T=0.454 GA G A 9 17 0.116664507

rs919751 PDGFRB Intronic SCC5 5 149505489 C=0.257 CT C T 9 17 0.116664507

rs17754040 PRR15 5'UTR SCC13 7 29603850 C=0.220 CT C T 14 7 0.126630498

rs3114018 ABCG2 Intronic SCC13 4 89064581 A=0.498 CA C A 14 7 0.126630498

rs6897488 GRAMD3 Intronic SCC5 5 125769679 C=0.410 CT C T 22 13 0.128190217

rs12508842 FAM190A Intronic SCC13 4 92017500 C=0.475 CT C T 10 18 0.130570066

rs6827372 PRDM5 Intronic SCC13 4 121714045 C=0.414 CT C T 18 10 0.130570066

rs6544066 PRKD3 Intronic SCC4 2 37543371 G=0.406 GA G A 18 10 0.130570066

rs11241892 GRAMD3 Intronic SCC5 5 125773331 A=0.338 CA C A 10 18 0.130570066

rs4148149 ABCG2 Intronic SCC13 4 89062285 G=0.384 CA C A 8 3 0.131668066

rs6728843 FBXO11 Intronic SCC4 2 48103511 C=0.116 TC T C 3 8 0.131668066

rs330787 FBXO11 Intronic SCC4 2 48041377 A=0.357 TC T C 11 5 0.133614457

rs4953342 EPAS1 Intronic SCC4 2 46552047 G=0.339 AG A G 5 11 0.133614457

rs10072056 PDGFRB Intronic SCC5 5 149505306 A=0.259 GA G A 11 19 0.144127088

rs7681440 SNCA Intronic SCC13 4 90756550 C=0.328 CG C G 15 8 0.144399846

rs4953353 EPAS1 Intronic SCC4 2 46567276 T=0.385 GT G T 8 15 0.144399846

rs6859190 MCC Intronic SCC5 5 112526686 T=0.411 TC T C 15 8 0.144399846

rs1077724 PDGFRB Intronic SCC5 5 149504115 T=0.278 AT A T 24 15 0.149541421

rs2930807 HERC3 Intronic SCC13 4 89552006 A=0.412 GA G A 6 12 0.157299265

rs34181 C7orf16 Intronic SCC13 7 31743948 T=0.484 GT G T 12 6 0.157299265

rs9307048 ABCG2 Intronic SCC13 4 89149570 T=0.159 TC T C 2 6 0.157299265

rs986610 SNCA Intronic SCC13 4 90751655 C=0.142 CT C T 12 6 0.157299265

rs3768730 EPAS1 Intronic SCC4 2 46592524 G=0.285 GT G T 12 20 0.157299265

rs10058728 CSNK1A1 Intronic SCC5 5 148904092 A=0.255 TA T A 20 12 0.157299265

rs10071667 GRAMD3 Intronic SCC5 5 125785238 A=0.464 GA G A 6 2 0.157299265

rs1042713 ADRB2 Coding/Exonic/Missense SCC5 5 148206440 A=0.474 GA G A 12 20 0.157299265

rs1833661 PPARGC1B Intronic SCC5 5 149167192 C=0.488 TC T C 2 6 0.157299265

rs337128 PRDM6 Intronic SCC5 5 122441482 C=0.452 CA C A 12 20 0.157299265

rs6884924 SNCAIP Intronic SCC5 5 121755377 A=0.217 CA C A 6 12 0.157299265

rs843556 PRDM5 Intronic SCC13 4 121636220 A=0.249 TA T A 16 9 0.161513387

rs2272448 MTA3 Intronic SCC4 2 42871144 G=0.352 CG C G 16 9 0.161513387

rs2122715 PIGF/CRIPT near 5' of gene/intron SCC4 2 46845849 G=0.218 TC T C 4 9 0.16551794

rs251468 PPARGC1B Intronic SCC5 5 149194485 T=0.312 AG A G 4 9 0.16551794

rs1732001 ELMO1 Intronic SCC13 7 37475424 T=0.361 CT C T 21 13 0.170067024

rs665444 PRDM6 Intronic SCC5 5 122476329 T=0.309 GT G T 21 13 0.170067024

rs186749 PRDM6 Intronic SCC5 5 122454305 T=0.450 CT C T 10 17 0.177931811

rs33410 RBM22 Intronic SCC5 5 150072693 T=0.161 TC T C 10 17 0.177931811

rs6880051 ZNF608 Intronic SCC5 5 123974325 A=0.446 CA C A 10 17 0.177931811

rs34179 C7orf16 Intronic SCC13 7 31741035 T=0.390 CT C T 13 7 0.179712587

rs6862252 ZNF608 Coding/Exonic/Missense SCC5 5 123983915 G=0.369 GT G T 13 7 0.179712587

rs17819022 CRIPT Intronic SCC4 2 46850571 A=0.176 GA G A 10 5 0.196705719

rs2346176 EPAS1 Intronic SCC4 2 46593299 T=0.356 TC T C 5 10 0.196705719

rs4953345 EPAS1 Intronic SCC4 2 46552601 T=0.293 AT A T 10 5 0.196705719

rs2546343 PRDM6 Intronic SCC5 5 122434666 C=0.469 TC T C 5 10 0.196705719

rs7724258 MCC Intronic SCC5 5 112434312 T=0.372 TC T C 10 5 0.196705719

rs1470696 PRDM5 Intronic SCC13 4 121842567 A=0.404 AG A G 8 14 0.200825214

rs6756455 LRPPRC Intronic SCC4 2 44218637 T=0.473 GT G T 8 14 0.200825214

rs1465693 CSF1R Intronic SCC5 5 149481322 A=0.417 TA T A 8 14 0.200825214

rs2304060 PDGFRB Intronic SCC5 5 149501751 A=0.411 CA C A 14 8 0.200825214

rs6867136 TCOF1 Intronic SCC5 5 149744226 A=0.432 AG A G 14 8 0.200825214

rs6879589 MCC Intronic SCC5 5 112394380 G=0.333 GT G T 14 8 0.200825214

rs6886362 ZNF608 Intronic SCC5 5 124047163 G=0.277 GT G T 14 8 0.200825214

rs11977829 PLEKHA8 Intronic SCC13 7 30113706 G=0.254 AG A G 7 3 0.205903321

rs4952682 DYNC2LI1 Intronic SCC4 2 44036491 G=0.211 GA G A 7 3 0.205903321

rs9309056 Intergenic Intergenic SCC4 2 40783406 C=0.2066 CT C T 7 3 0.205903321

rs10213948 GRAMD3 Intronic SCC5 5 125826191 T=0.198 TC T C 3 7 0.205903321

rs343201 PRDM5 Intronic SCC13 4 121797035 G=0.459 GA G A 19 12 0.20866791

rs2287350 EIF2AK2 Intronic SCC4 2 37368452 G=0.263 GA G A 19 12 0.20866791

rs2731647 Intergenic Intergenic SCC5 5 121420426 A=0.483 CT C T 19 12 0.20866791

rs728989 SMARCAD1 Intronic SCC13 4 95164481 G=0.307 CG C G 15 9 0.220671492

rs7664025 FAM190A Intronic SCC13 4 91695919 T=0.340 CT C T 15 9 0.220671492

rs10185866 EPCAM (TACSTD1) Intronic SCC4 2 47599283 T=0.350 TC T C 9 15 0.220671492

rs1864066 Intergenic Intergenic SCC5 5 112997865 T=0.347 GA G A 9 15 0.220671492

rs1972530 CSS3 (CHSY3) Intronic SCC5 5 129469521 T=0.108 TC T C 15 9 0.220671492

rs843570 PRDM5 Intronic SCC13 4 121643418 C=0.486 CT C T 20 13 0.223017612

rs10455038 PPIC Intronic SCC5 5 122368076 T=0.436 TC T C 11 6 0.22525306

rs1838207 MCC Intronic SCC5 5 112621104 G=0.452 CG C G 11 6 0.22525306

rs3095902 NDST1 Intronic SCC5 5 149926998 C=0.450 CT C T 11 6 0.22525306

rs335180 PRDM6 Intronic SCC5 5 122453933 A=0.311 GA G A 10 16 0.239316668

rs1465327 SCRN1 Intronic SCC13 7 30002608 T=0.270 CT C T 8 4 0.248213251

rs6758592 EPAS1 Intronic SCC4 2 46558541 C=0.462 TC T C 4 8 0.248213251

rs10478909 CSS3 (CHSY3) Intronic SCC5 5 129445811 C=0.297 TC T C 4 8 0.248213251

rs12654269 MCC Intronic SCC5 5 112501783 G=0.201 AG A G 4 8 0.248213251

rs639680 MCC Intronic SCC5 5 112562163 C=0.181 GC G C 4 8 0.248213251

rs1366455 FBN2 Intronic SCC5 5 127600418 C=0.174 TC T C 12 7 0.251349301

rs6869827 CSS3 (CHSY3) Intronic SCC5 5 129323187 A=0.481 AG A G 12 7 0.251349301

rs13434570 SMARCAD1 Intronic SCC13 4 95168019 T=0.417 CT C T 17 11 0.256839426

rs1496276 PRDM5 Intronic SCC13 4 121623516 G=0.461 GA G A 11 17 0.256839426

rs17170180 NT5C3 Intronic SCC13 7 33065102 C=0.372 TC T C 5 2 0.256839426

rs335170 PRDM6 Intronic SCC5 5 122482243 C=0.356 CA C A 16 23 0.262331878

rs4705378 PPARGC1B Intronic SCC5 5 149164663 G=0.275 GA G A 23 16 0.262331878

rs2035116 FAM190A Intronic SCC13 4 91917416 C=0.275 CT C T 12 18 0.273321894

rs2053875 PRDM5 Intronic SCC13 4 121796416 T=0.381 TA T A 12 18 0.273321894

rs6461902 Intergenic Intergenic SCC13 7 26029812 A=0.460 GA G A 18 12 0.273321894

rs2044241 MCC Intronic SCC5 5 112523526 C=0.298 CA C A 13 8 0.275233755

rs30035 SNX24 Intronic SCC5 5 122278545 T=0.153 TC T C 8 13 0.275233755

rs2699811 SNX10 Intronic SCC13 7 26403658 C=0.451 TC T C 9 5 0.285049646

rs7794120 LANCL2 Intronic SCC13 7 55482148 A=0.353 GA G A 9 5 0.285049646

rs11956139 MCC Intronic SCC5 5 112624552 G=0.388 AG A G 9 5 0.285049646

rs4705376 PPARGC1B Intronic SCC5 5 149157675 G=0.329 GC G C 9 5 0.285049646

rs6862890 SNCAIP Intronic SCC5 5 121793874 C=0.201 CA C A 9 5 0.285049646

rs2231148 ABCG2 Intronic SCC13 4 89028478 A=0.221 TA T A 19 13 0.28884457

rs4389685 ZNF608 Intronic SCC5 5 124050437 G=0.388 AG A G 19 13 0.28884457

rs17153429 SNX10 Intronic SCC13 7 26340970 A=0.249 GA G A 9 14 0.297146799

rs2254958 EIF2AK2 5'UTR/near 5' of gene SCC4 2 37376291 T=0.433 CT C T 9 14 0.297146799

rs13162633 GRAMD3 Intronic SCC5 5 125784492 C=0.258 CT C T 9 14 0.297146799

rs4443454 CSS3 (CHSY3) Intronic SCC5 5 129294786 T=0.282 TA T A 14 9 0.297146799

rs13145977 ANXA5 Intronic SCC13 4 122616166 C=0.184 CT C T 3 1 0.317310813

rs1603909 CPVL Intronic SCC13 7 29099292 C=0.392 TC T C 1 0 0.317310813

rs4722558 Intergenic Intergenic SCC13 7 26019767 T=0.220 TC T C 3 1 0.317310813

rs6947342 PKD1L1 Intronic SCC13 7 47896089 T=0.124 TC T C 0 1 0.317310813

rs9912 SCRN1 3'UTR SCC13 7 29960211 A=0.175 GT G T 15 10 0.317310813

rs1056021 PRKD3 3'UTR SCC4 2 37480249 C=0.325 TC T C 0 1 0.317310813

rs11125126 CALM2 Intronic SCC4 2 47395379 G=0.346 AG A G 1 3 0.317310813

rs4952896 FBXO11 Intronic SCC4 2 48116126 G=0.176 GA G A 6 10 0.317310813

rs4953352 EPAS1 Intronic SCC4 2 46565522 T=0.250 CT C T 10 6 0.317310813

rs4953527 MTA3 Intronic SCC4 2 42870125 A=0.325 CA C A 1 0 0.317310813

rs10519348 MCC Intronic SCC5 5 112601878 A=0.097 GA G A 3 6 0.317310813

rs12186425 GRAMD3 Intronic SCC5 5 125774155 A=0.193 GA G A 10 6 0.317310813

rs12513847 ALDH7A1 Intronic SCC5 5 125890559 C=0.450 TC T C 6 10 0.317310813

rs13156718 MCC Intronic SCC5 5 112452529 A=0.475 AG A G 6 3 0.317310813

rs17135364 MCC Intronic SCC5 5 112502152 G=0.217 GA G A 10 6 0.317310813

rs6890872 MCC Intronic SCC5 5 112398161 G=0.384 AG A G 15 10 0.317310813

rs7727195 CSF1R Intronic SCC5 5 149487941 A=0.254 GA G A 6 3 0.317310813

rs9327500 Intergenic Intergenic SCC5 5 128245804 G=0.176 AG A G 6 3 0.317310813

rs154001 FBN2 Coding/Exonic/Missense SCC5 5 127685135 C=0.270 TC T C 11 16 0.335923821

rs6863373 NDST1 Intronic SCC5 5 149909931 T=0.311 CT C T 11 16 0.335923821

rs7718029 ZNF608 Intronic SCC5 5 124074013 A=0.283 AG A G 16 11 0.335923821

rs12505502 FAM190A Intronic SCC13 4 91568023 G=0.377 GT G T 11 7 0.345778592

rs12712997 FBXO11 Intronic SCC4 2 48085061 C=0.403 CA C A 7 11 0.345778592

rs7594731 DYNC2LI1 Intronic SCC4 2 44027302 A=0.200 GA G A 7 11 0.345778592

rs12516622 GRAMD3 Intronic SCC5 5 125795563 G=0.171 GT G T 11 7 0.345778592

rs4836277 ALDH7A1 Intronic SCC5 5 125901441 G=0.385 GT G T 17 12 0.353160358

rs2572324 SNCA Intronic SCC13 4 90678798 C=0.155 TC T C 4 7 0.365712299

rs752730 KBTBD2 Coding/Exonic/Synonymous SCC13 7 32909182 T=0.197 CT C T 7 4 0.365712299

rs11124954 LRPPRC Intronic SCC4 2 44207974 G=0.455 GC G C 4 7 0.365712299

rs13387221 LRPPRC Intronic SCC4 2 44135314 A=0.292 AG A G 7 4 0.365712299

rs3088359 EPAS1 Intronic SCC4 2 46602251 A=0.4835 TC T C 4 7 0.365712299

rs13177951 RPS14 Intronic SCC5 5 149825939 T=0.207 TC T C 7 4 0.365712299

rs304388 SNCAIP Intronic SCC5 5 121755954 A=0.250 CT C T 18 13 0.369171419

rs492346 MCC Intronic SCC5 5 112617337 A=0.359 GA G A 18 13 0.369171419

rs7721529 CSNK1A1 Intronic SCC5 5 148879318 A=0.264 AT A T 13 18 0.369171419

rs2602117 HERC3 3'UTR SCC13 4 89629066 T=0.482 GA G A 12 8 0.371093372

rs2278753 EPAS1 Intronic SCC4 2 46583593 G=0.467 CT C T 8 12 0.371093372

rs10875551 PPARGC1B Intronic SCC5 5 149173134 A=0.307 GA G A 8 12 0.371093372

rs17656204 PDGFRB Intronic SCC5 5 149501803 T=0.177 CT C T 12 8 0.371093372

rs331074 FBN2 Intronic SCC5 5 127722735 T=0.358 CT C T 8 12 0.371093372

rs331097 FBN2 Intronic SCC5 5 127706046 G=0.230 AG A G 8 12 0.371093372

rs9791151 CSNK1A1 Intronic SCC5 5 148877798 G=0.273 GT G T 12 8 0.371093372

rs163186 PRDM6 Intronic SCC5 5 122514588 C=0.257 CT C T 19 14 0.384088251

rs2698720 SNX10 Intronic SCC13 7 26395693 T=0.270 AT A T 9 13 0.393768636

rs10044860 Intergenic Intergenic SCC5 5 149731071 A=0.422 AG A G 13 9 0.393768636

rs3749690 SPINK7 Intronic SCC5 5 147693826 T=0.374 CT C T 9 13 0.393768636

rs12643954 GRID2 Intronic SCC13 4 93398759 T=0.331 TC T C 8 5 0.405380557

rs10178633 EPAS1 Intronic SCC4 2 46597827 G=0.350 AG A G 5 8 0.405380557

rs11750093 MCC Intronic SCC5 5 112408177 C=0.468 TC T C 5 8 0.405380557

rs13159076 MCC Intronic SCC5 5 112611314 G=0.431 AG A G 8 5 0.405380557

rs13356198 CSS3 (CHSY3) Intronic SCC5 5 129257092 G=0.147 AG A G 5 8 0.405380557

rs245055 SLC26A2 Intronic SCC5 5 149341279 G=0.246 TC T C 5 8 0.405380557

rs740751 PDGFRB Intronic SCC5 5 149515888 A=0.231 AG A G 8 5 0.405380557

rs1992846 EPAS1 Intronic SCC4 2 46597581 T=0.460 TC T C 4 2 0.414216179

rs1363333 MCC Intronic SCC5 5 112364945 G=0.493 TC T C 14 10 0.414216179

rs2304061 PDGFRB Intronic SCC5 5 149503670 A=0.145 GA G A 14 10 0.414216179

rs2737089 SNCAIP Intronic SCC5 5 121783796 C=0.368 CG C G 10 14 0.414216179

rs2973603 LMNB1 Intronic SCC5 5 126152745 C=0.284 GT G T 14 10 0.414216179

rs335145 PRDM6 Intronic SCC5 5 122444585 G=0.325 AG A G 14 10 0.414216179

rs6871443 ZNF608 Intronic SCC5 5 124018279 A=0.4835 AG A G 10 14 0.414216179

rs10050399 GRAMD3 Intronic SCC5 5 125810777 C=0.244 CT C T 15 11 0.432767581

rs1549919 CSF1R Intronic SCC5 5 149453450 A=0.235 GA G A 15 11 0.432767581

rs335140 PRDM6 Intronic SCC5 5 122489095 A=0.472 GA G A 15 11 0.432767581

rs7724734 ZNF608 Intronic SCC5 5 124008549 A=0.315 CA C A 11 15 0.432767581

rs968355 LMNB1 Intronic SCC5 5 126140350 G=0.412 CT C T 15 11 0.432767581

rs10263430 SCRN1 Intronic SCC13 7 30015446 G=0.266 GA G A 9 6 0.438578033

rs12668538 SNX10 Intronic SCC13 7 26372025 C=0.239 CA C A 9 6 0.438578033

rs1450478 PRDM5 Intronic SCC13 4 121837791 T=0.239 TC T C 9 6 0.438578033

rs2253519 HNRPLL Intronic SCC4 2 38809162 T=0.241 CT C T 9 6 0.438578033

rs4240406 ALDH7A1 Intronic SCC5 5 125882825 T=0.496 TC T C 9 6 0.438578033

rs6881379 ZNF608 Intronic SCC5 5 123993505 T=0.328 AT A T 24 19 0.445765913

rs1588523 PRDM5 Intronic SCC13 4 121766616 C=0.491 CT C T 16 12 0.449691803

rs13162396 GRAMD3 Intronic SCC5 5 125791140 G=0.386 GA G A 12 16 0.449691803

rs4836487 CSS3 (CHSY3) Intronic SCC5 5 129351499 G=0.347 GA G A 16 12 0.449691803

rs6858978 CSS3 (CHSY3) Intronic SCC5 5 129439574 A=0.396 GA G A 12 16 0.449691803

rs17149910 PRDM6 Intronic SCC5 5 122448244 G=0.263 AG A G 13 17 0.465208821

rs1864972 PDGFRB Intronic SCC5 5 149509270 A=0.222 GA G A 13 17 0.465208821

rs467069 PRDM6 Intronic SCC5 5 122477874 A=0.242 TA T A 17 13 0.465208821

rs7721622 FBN2 Intronic SCC5 5 127820533 G=0.423 AG A G 13 17 0.465208821

rs11764332 CPVL Intronic SCC13 7 29123198 C=0.338 CT C T 10 7 0.466854274

rs7784224 VOPP1 Intronic SCC13 7 55601850 A=0.189 GA G A 7 10 0.466854274

rs1997377 GALM Intronic SCC4 2 38951666 T=0.362 TC T C 10 7 0.466854274

rs752009 MTA3 Intronic SCC4 2 42803431 G=0.272 GA G A 7 10 0.466854274

rs10077431 YTHDC2 Intronic SCC5 5 112927686 A=0.138 CA C A 7 10 0.466854274

rs2120349 MCC Intronic SCC5 5 112532935 T=0.469 TC T C 7 10 0.466854274

rs253349 SYNPO Intronic SCC5 5 150025994 T=0.138 AG A G 7 10 0.466854274

rs7731890 MCC Intronic SCC5 5 112386217 G=0.489 AG A G 10 7 0.466854274

rs1124995 PPARGC1B Intronic SCC5 5 149117004 T=0.220 TC T C 5 3 0.479500124

rs4836276 ALDH7A1 Intronic SCC5 5 125896918 A=0.359 CA C A 14 18 0.479500124

rs7731056 ZNF608 Intronic SCC5 5 123997048 A=0.313 AG A G 14 18 0.479500124

rs1868715 PRDM5 Intronic SCC13 4 121747970 A=0.383 GA G A 11 8 0.491297125

rs4722593 SNX10 Intronic SCC13 7 26370190 A=0.309 GA G A 11 8 0.491297125

rs11740298 CSF1R Intronic SCC5 5 149492455 T=0.176 TC T C 11 8 0.491297125

rs11960528 CSS3 (CHSY3) Intronic SCC5 5 129360432 A=0.105 AG A G 11 8 0.491297125

rs17692371 CSS3 (CHSY3) Intronic SCC5 5 129444097 A=0.304 AG A G 11 8 0.491297125

rs185411 SLC27A6 Intronic SCC5 5 128313024 A=0.230 AG A G 8 11 0.491297125

rs331086 FBN2 Intronic SCC5 5 127759778 T=0.321 CT C T 8 11 0.491297125

rs4705758 MCC Intronic SCC5 5 112391033 A=0.483 CA C A 15 19 0.492716678

rs2012547 PPARGC1B Intronic SCC5 5 149138228 A=0.426 GA G A 16 20 0.504985076

rs1406754 SNX10 Intronic SCC13 7 26396198 T=0.275 GT G T 9 12 0.512690761

rs343212 PRDM5 Intronic SCC13 4 121790033 C=0.42 CT C T 9 12 0.512690761

rs10054158 MCC Intronic SCC5 5 112386154 G=0.473 CG C G 9 12 0.512690761

rs11749913 CSF1R Intronic SCC5 5 149483427 C=0.346 GC G C 12 9 0.512690761

rs12717846 PPARGC1B Intronic SCC5 5 149182447 T=0.270 GT G T 12 9 0.512690761

rs13168872 CSS3 (CHSY3) Intronic SCC5 5 129277660 C=0.357 TC T C 9 12 0.512690761

rs3822606 CAMK2A Intronic SCC5 5 149641155 A=0.266 TC T C 12 9 0.512690761

rs1373651 BMPR1B Intronic SCC13 4 95868525 T=0.460 CT C T 17 21 0.516412269

rs11975645 PLEKHA8 Intronic SCC13 7 30114617 C=0.129 TC T C 6 4 0.527089257

rs17326222 PLEKHA8 Intronic SCC13 7 30090418 T=0.139 GT G T 4 6 0.527089257

rs1867785 EPAS1 Intronic SCC4 2 46534338 G=0.401 AG A G 4 6 0.527089257

rs4953360 EPAS1 Intronic SCC4 2 46589179 A=0.493 AG A G 6 4 0.527089257

rs246390 PDGFRB Intronic SCC5 5 149496321 A=0.489 AG A G 4 6 0.527089257

rs32574 PPARGC1B Intronic SCC5 5 149218886 A=0.316 GT G T 6 4 0.527089257

rs32974 SEMA6A Intronic SCC5 5 115831873 T=0.121 GA G A 6 4 0.527089257

rs6876829 ZNF608 Intronic SCC5 5 124049388 G=0.1668 AG A G 6 4 0.527089257

rs343195 PRDM5 Intronic SCC13 4 121800466 A=0.283 GA G A 10 13 0.531614577

rs4833229 ANXA5 Intronic SCC13 4 122600664 A=0.447 AG A G 13 10 0.531614577

rs2941666 LMNB1 Intronic SCC5 5 126144324 G=0.297 CA C A 13 10 0.531614577

rs33412 SYNPO Intronic SCC5 5 150033470 T=0.338 AG A G 13 10 0.531614577

rs7737127 GRAMD3 Intronic SCC5 5 125764139 T=0.203 GT G T 13 10 0.531614577

rs2622621 ABCG2 Intronic SCC13 4 89030920 G=0.345 GC G C 14 11 0.548506236

rs11746929 PPARGC1B Intronic SCC5 5 149133874 A=0.281 AG A G 14 11 0.548506236

rs216150 CSF1R Intronic SCC5 5 149447628 A=0.355 GA G A 14 11 0.548506236

rs4836272 ALDH7A1 Intronic SCC5 5 125883341 T=0.381 TC T C 14 11 0.548506236

rs6866739 GRAMD3 Intronic SCC5 5 125776486 A=0.188 AG A G 11 14 0.548506236

rs3017908 HERC3 Intronic SCC13 4 89621666 A=0.417 GA G A 12 15 0.563702862

rs3844295 HERC3 Intronic SCC13 4 89527665 A=0.417 GA G A 7 5 0.563702862

rs7357309 FKBP9 Intronic SCC13 7 33030058 A=0.133 AG A G 2 1 0.563702862

rs815965 VOPP1 Intronic SCC13 7 55551832 C=0.352 TC T C 1 2 0.563702862

rs2373000 QPCT Intronic SCC4 2 37592628 C=0.419 TC T C 2 1 0.563702862

rs3097714 GALM Intronic SCC4 2 38960491 G=0.177 TC T C 5 7 0.563702862

rs6544589 MTA3 Intronic SCC4 2 42935706 A=0.311 AT A T 5 7 0.563702862

rs6544889 EPAS1 Intronic SCC4 2 46577299 G=0.299 AG A G 7 5 0.563702862

rs7584547 HNRPLL Intronic SCC4 2 38791920 G=0.258 GC G C 7 5 0.563702862

rs10056807 ZNF608 Intronic SCC5 5 123986010 C=0.122 GC G C 7 5 0.563702862

rs167349 SNCAIP Intronic SCC5 5 121752508 T=0.464 GA G A 12 15 0.563702862

rs17149137 SNCAIP Intronic SCC5 5 121737457 A=0.303 GA G A 12 15 0.563702862

rs26531 CDO1 Intronic SCC5 5 115150525 C=0.489 CA C A 12 15 0.563702862

rs6894219 MCC Intronic SCC5 5 112383275 A=0.483 AG A G 12 15 0.563702862

rs12614710 EPAS1 Intronic SCC4 2 46565091 T=0.228 GT G T 16 13 0.577468665

rs2270414 EIF2AK2 Intronic SCC4 2 37375932 C=0.485 TC T C 16 13 0.577468665

rs2237084 CSF1R Intronic SCC5 5 149465491 C=0.230 CA C A 16 13 0.577468665

rs7732198 MCC Intronic SCC5 5 112379659 C=0.367 CT C T 16 13 0.577468665

rs6462161 PRR15 Intronic SCC13 7 29605549 G=0.372 GC G C 14 17 0.590013889

rs6462430 KBTBD2 Intronic SCC13 7 32921831 G=0.491 GA G A 17 14 0.590013889

rs12618410 PRKD3 Intronic SCC4 2 37521850 C=0.370 GC G C 17 14 0.590013889

rs3770754 QPCT Intronic SCC4 2 37575381 G=0.282 GC G C 17 14 0.590013889

rs13174179 PPARGC1B Intronic SCC5 5 149150671 A=0.369 GA G A 14 17 0.590013889

rs6862261 GRAMD3 Intronic SCC5 5 125828248 G=0.440 AG A G 14 17 0.590013889

rs1562855 GEMIN6 Intronic SCC4 2 39008468 C=0.365 TC T C 8 6 0.5929801

rs2043188 GALM Intronic SCC4 2 38928981 C=0.174 TC T C 8 6 0.5929801

rs11951727 ZNF608 Intronic SCC5 5 124017618 T=0.266 CT C T 18 15 0.601508136

rs34173 C7orf16 Intronic SCC13 7 31735263 G=0.191 GC G C 7 9 0.617075078

rs7657769 PRDM5 Intronic SCC13 4 121633580 A=0.290 CA C A 9 7 0.617075078

rs10066188 PPIC Intronic SCC5 5 122364030 T=0.311 TC T C 9 7 0.617075078

rs12517185 MCC Intronic SCC5 5 112444660 G=0.408 GC G C 9 7 0.617075078

rs6594665 MCC Intronic SCC5 5 112388225 T=0.417 TC T C 7 9 0.617075078

rs10230195 VOPP1 Intronic SCC13 7 55636629 G=0.268 AG A G 10 8 0.637351889

rs11097234 SNCA Intronic SCC13 4 90713330 G=0.293 CG C G 10 8 0.637351889

rs12505410 ABCG2 Intronic SCC13 4 89030841 G=0.300 GT G T 10 8 0.637351889

rs1695508 Intergenic Intergenic SCC13 7 28966556 A=0.331 CT C T 10 8 0.637351889

rs10190161 LRPPRC Intronic SCC4 2 44187829 A=0.454 CA C A 10 8 0.637351889

rs12652257 MCC Intronic SCC5 5 112501801 C=0.377 TC T C 8 10 0.637351889

rs2282809 CDX1 Intronic SCC5 5 149551812 C=0.301 CG C G 8 10 0.637351889

rs32233 CSS3 (CHSY3) Intronic SCC5 5 129244591 T=0.489 CT C T 10 8 0.637351889

rs6595441 CEP120 Intronic SCC5 5 122724446 G=0.222 AG A G 8 10 0.637351889

rs6893016 MCC Intronic SCC5 5 112386284 A=0.479 AG A G 8 10 0.637351889

rs3017896 HERC3 Intronic SCC13 4 89533004 C=0.129 CG C G 11 9 0.654720846

rs7681598 PRDM5 Intronic SCC13 4 121790202 A=0.151 AG A G 9 11 0.654720846

rs12712998 FBXO11 Intronic SCC4 2 48114209 A=0.161 GA G A 3 2 0.654720846

rs4670264 GALM Intronic SCC4 2 38913884 A=0.422 AG A G 3 2 0.654720846

rs6721860 HNRPLL Intronic SCC4 2 38818221 T=0.281 CT C T 2 3 0.654720846

rs6753406 MTA3 Intronic SCC4 2 42902700 A=0.359 AG A G 2 3 0.654720846

rs1012543 PPARGC1B Intronic SCC5 5 149176945 C=0.241 TC T C 11 9 0.654720846

rs1348431 MCC Intronic SCC5 5 112521347 T=0.237 CT C T 11 9 0.654720846

rs2963765 Intergenic Intergenic SCC5 5 114693074 G=0.388 TC T C 11 9 0.654720846

rs304386 SNCAIP Intronic SCC5 5 121756884 T=0.273 GA G A 11 9 0.654720846

rs4583877 PDGFRB Intronic SCC5 5 149531801 C=0.481 TC T C 11 9 0.654720846

rs637634 MCC Intronic SCC5 5 112579347 C=0.475 CT C T 11 9 0.654720846

rs9324628 PPARGC1B Intronic SCC5 5 149187134 A=0.267 AG A G 9 11 0.654720846

rs2622626 ABCG2 Intronic SCC13 4 89066715 C=0.494 GT G T 10 12 0.669815358

rs4723226 FKBP9 Intronic SCC13 7 33003687 A=0.173 AT A T 12 10 0.669815358

rs1504175 FSHR Intronic SCC4 2 49277823 G=0.492 GA G A 12 10 0.669815358

rs1966484 PPARGC1B Intronic SCC5 5 149157113 T=0.281 CT C T 12 10 0.669815358

rs216137 CSF1R Intronic SCC5 5 149436545 A=0.418 GA G A 10 12 0.669815358

rs4948023 LANCL2 Intronic SCC13 7 55454267 A=0.406 AG A G 13 11 0.683091398

rs7799841 CHN2 Intronic SCC13 7 29446736 A=0.377 GA G A 13 11 0.683091398

rs7933 ZFP36L2 Coding/Exonic/Synonymous SCC4 2 43452183 G=0.436 GA G A 11 13 0.683091398

rs11740247 PPARGC1B Intronic SCC5 5 149133734 A=0.230 GA G A 11 13 0.683091398

rs32206 FBN2 Intronic SCC5 5 127617336 T=0.150 CT C T 11 13 0.683091398

rs4075350 MCC Intronic SCC5 5 112373718 A=0.491 TC T C 13 11 0.683091398

rs4558984 GRAMD3 Intronic SCC5 5 125818909 C=0.483 TC T C 13 11 0.683091398

rs7717604 ZNF608 Intronic SCC5 5 123989617 T=0.309 CT C T 13 11 0.683091398

rs4953340 EPAS1 Intronic SCC4 2 46548064 C=0.436 GC G C 14 12 0.694886602

rs1042719 ADRB2 Coding/Exonic/Synonymous SCC5 5 148207447 C=0.352 CG C G 14 12 0.694886602

rs2015018 CSS3 (CHSY3) Coding/Exonic/Missense SCC5 5 129521126 T=0.412 TC T C 14 12 0.694886602

rs245080 SLC26A2 Intronic SCC5 5 149354805 C=0.25 CT C T 14 12 0.694886602

rs4705344 Intergenic Intergenic SCC5 5 148822118 G=0.434 GA G A 12 14 0.694886602

rs4705796 MCC Intronic SCC5 5 112485614 G=0.369 AG A G 14 12 0.694886602

rs887343 CDX1 Intronic SCC5 5 149548823 A=0.325 GA G A 14 12 0.694886602

rs815991 LANCL2 Intronic SCC13 7 55475080 G=0.476 AG A G 4 3 0.705456986

rs6419610 LRPPRC Intronic SCC4 2 44129774 G=0.479 GA G A 13 15 0.705456986

rs6544724 LRPPRC Intronic SCC4 2 44211546 G=0.418 AG A G 4 3 0.705456986

rs7425925 Intergenic Intergenic SCC4 2 40890416 T=0.3661 TC T C 4 3 0.705456986

rs2161257 PPARGC1B Intronic SCC5 5 149189997 A=0.468 GA G A 15 13 0.705456986

rs245079 SLC26A2 Intronic SCC5 5 149354704 G=0.400 CA C A 15 13 0.705456986

rs6595711 GRAMD3 Intronic SCC5 5 125800836 T=0.326 TC T C 3 4 0.705456986

rs13177306 ZNF608 Intronic SCC5 5 124012861 C=0.261 CG C G 16 14 0.715000655

rs9325134 ABLIM3 Intronic SCC5 5 148551165 G=0.466 GA G A 16 14 0.715000655

rs2272226 LANCL2 Intronic SCC13 7 55467923 C=0.309 GC G C 15 17 0.723673611

rs1132528 YTHDC2 Coding/Exonic/Missense SCC5 5 112929013 T=0.427 TA T A 17 15 0.723673611

rs2569075 CSF1R Intronic SCC5 5 149458829 T=0.330 CT C T 15 17 0.723673611

rs343173 PRDM5 Intronic SCC13 4 121761018 C=0.187 GC G C 4 5 0.738882681

rs1027478 CALM2 Intronic SCC4 2 47390026 T=0.476 AG A G 5 4 0.738882681

rs10519347 MCC Intronic SCC5 5 112593347 T=0.281 TC T C 5 4 0.738882681

rs17710564 CSNK1A1 Intronic SCC5 5 148916211 C=0.133 GC G C 5 4 0.738882681

rs28282 PPARGC1B Intronic SCC5 5 149202206 G=0.430 GC G C 4 5 0.738882681

rs13120400 ABCG2 Intronic SCC13 4 89033527 C=0.139 TC T C 6 5 0.763024601

rs3109433 FKBP9 Intronic SCC13 7 33011535 T=0.259 CT C T 5 6 0.763024601

rs7661193 PRDM5 Intronic SCC13 4 121669224 C=0.098 CA C A 6 5 0.763024601

rs17323768 FBXO11 Intronic SCC4 2 48039223 G=0.203 GT G T 5 6 0.763024601

rs6736282 LRPPRC Intronic SCC4 2 44204717 C=0.296 CT C T 6 5 0.763024601

rs17797713 PPARGC1B Intronic SCC5 5 149115155 C=0.136 TC T C 6 5 0.763024601

rs2231142 ABCG2 Coding/Exonic/Missense SCC13 4 89052323 T=0.1389 AC A C 6 7 0.781511295

rs975537 CRHR2 Intronic SCC13 7 30697357 T=0.223 TA T A 6 7 0.781511295

rs2072569 Intergenic Intergenic SCC4 2 40783227 T=0.153 AT A T 7 6 0.781511295

rs1038072 PPARGC1B Intronic SCC5 5 149123132 G=0.365 TC T C 6 7 0.781511295

rs10515641 SYNPO Intronic SCC5 5 150001187 A=0.110 GA G A 7 6 0.781511295

rs254209 SEMA6A Intronic SCC5 5 115894579 A=0.318 AG A G 6 7 0.781511295

rs335182 PRDM6 Intronic SCC5 5 122450824 A=0.120 AG A G 6 7 0.781511295

rs2190242 CRHR2 Intronic SCC13 7 30709475 C=0.298 CA C A 7 8 0.796253415

rs6857766 ANXA5 Intronic SCC13 4 122611285 A=0.227 GA G A 8 7 0.796253415

rs11691326 LRPPRC Intronic SCC4 2 44196195 G=0.305 AG A G 8 7 0.796253415

rs4670265 GALM Intronic SCC4 2 38932777 T=0.397 GT G T 7 8 0.796253415

rs7583088 EPAS1 Intronic SCC4 2 46603165 G=0.281 GA G A 7 8 0.796253415

rs11748334 SNCAIP Intronic SCC5 5 121732448 C=0.109 TC T C 7 8 0.796253415

rs17598783 LMNB1 Intronic SCC5 5 126135382 G=0.301 GA G A 7 8 0.796253415

rs256940 FEM1C Intronic SCC5 5 114866191 G=0.293 AG A G 8 7 0.796253415

rs11764362 VOPP1 Intronic SCC13 7 55603625 C=0.345 TC T C 8 9 0.808365156

rs2391276 SNX10 Intronic SCC13 7 26385004 A=0.461 CA C A 9 8 0.808365156

rs34161 C7orf16 Intronic SCC13 7 31727749 G=0.438 AG A G 9 8 0.808365156

rs6970537 HOXA10 Intronic SCC13 7 27217359 A=0.388 AG A G 8 9 0.808365156

rs929377 CRHR2 Intronic SCC13 7 30704159 T=0.486 AT A T 9 8 0.808365156

rs4952818 EPAS1 Intronic SCC4 2 46537056 T=0.281 TC T C 8 9 0.808365156

rs13358977 PPARGC1B Intronic SCC5 5 149142339 T=0.281 CT C T 8 9 0.808365156

rs6893389 MCC Intronic SCC5 5 112393483 A=0.278 GA G A 8 9 0.808365156

rs7730782 CSS3 (CHSY3) Intronic SCC5 5 129450559 A=0.381 AG A G 9 8 0.808365156

rs34167 C7orf16 Intronic SCC13 7 31731366 G=0.290 AG A G 10 9 0.818545808

rs6532079 FAM13A Intronic SCC13 4 89686573 T=0.283 TA T A 10 9 0.818545808

rs9639668 KBTBD2 Intronic SCC13 7 32912635 A=0.262 AG A G 10 9 0.818545808

rs10065900 SNCAIP Intronic SCC5 5 121707541 G=0.156 GT G T 10 9 0.818545808

rs10519694 LOX Intronic SCC5 5 121407219 T=0.150 CT C T 10 9 0.818545808

rs4621556 ZNF608 Intronic SCC5 5 124033113 G=0.365 GA G A 10 9 0.818545808

rs2067087 HOXA13/HOTTIP near 5' of gene/nc-RNA SCC13 7 27241660 G=0.319 GC G C 10 11 0.827259347

rs6751481 GALM Intronic SCC4 2 38897810 C=0.484 CT C T 11 10 0.827259347

rs11738528 MCC Intronic SCC5 5 112536689 T=0.498 TA T A 11 10 0.827259347

rs27567 MCC Intronic SCC5 5 112741798 T=0.424 AG A G 11 10 0.827259347

rs4705369 PPARGC1B Intronic SCC5 5 149136958 A=0.311 AG A G 11 10 0.827259347

rs10233387 HOXA13/HOTTIP Intronic (nc-RNA) SCC13 7 27243106 A=0.360 AG A G 11 12 0.834827329

rs13311608 FKBP9 Intronic SCC13 7 32999601 G=0.362 GA G A 11 12 0.834827329

rs2725252 ABCG2 Intronic SCC13 4 89061910 A=0.434 GT G T 12 11 0.834827329

rs6973742 Intergenic Intergenic SCC13 7 26151190 A=0.352 GA G A 12 11 0.834827329

rs12519315 YTHDC2 Intronic SCC5 5 112924802 T=0.440 TA T A 12 11 0.834827329

rs12719304 SNCAIP Intronic SCC5 5 121734393 C=0.183 GC G C 11 12 0.834827329

rs2545341 NDST1 Intronic SCC5 5 149914401 T=0.316 TC T C 11 12 0.834827329

rs509237 MCC Intronic SCC5 5 112550247 A=0.414 GA G A 11 12 0.834827329

rs6860957 GRAMD3 Intronic SCC5 5 125801394 T=0.275 TC T C 11 12 0.834827329

rs774256 Intergenic Intergenic SCC13 7 27003454 C=0.270 GC G C 13 12 0.841480581

rs3731838 GALM 3'UTR SCC4 2 38960927 G=0.372 GA G A 13 12 0.841480581

rs10076327 PPIC Intronic SCC5 5 122367347 A=0.147 AG A G 13 12 0.841480581

rs12522122 CSNK1A1 Intronic SCC5 5 148910772 A=0.274 GA G A 13 12 0.841480581

rs252161 RBM22 Intronic SCC5 5 150078956 G=0.112 GA G A 13 12 0.841480581

rs4626335 ALDH7A1 Intronic SCC5 5 125897411 C=0.467 CT C T 13 12 0.841480581

rs6865809 YTHDC2 Intronic SCC5 5 112911703 A=0.339 AT A T 13 12 0.841480581

rs3750117 NT5C3 Coding/Exonic/Synonymous SCC13 7 33060946 A=0.319 GA G A 14 13 0.84738966

rs2010994 PPARGC1B Intronic SCC5 5 149191652 A=0.202 GA G A 13 14 0.84738966

rs42427 APC Coding/Exonic/Synonymous SCC5 5 112176325 G=0.336 GA G A 13 14 0.84738966

rs4835981 PRDM6 Intronic SCC5 5 122508257 T=0.364 CT C T 13 14 0.84738966

rs8026 SMARCAD1 3'UTR SCC13 4 95212185 G=0.454 GA G A 14 15 0.852683684

rs2080896 MCC Intronic SCC5 5 112367785 A=0.293 TC T C 14 15 0.852683684

rs2617280 SNCAIP Intronic SCC5 5 121778880 A=0.342 GT G T 15 14 0.852683684

rs6650969 PPARGC1B Intronic SCC5 5 149119979 G=0.490 GT G T 15 14 0.852683684

rs9687567 GRAMD3 Intronic SCC5 5 125800646 C=0.401 TC T C 14 15 0.852683684

rs1406755 SNX10 Intronic SCC13 7 26396245 G=0.467 GA G A 16 15 0.857462442

rs162953 PRDM6 Intronic SCC5 5 122499648 A=0.446 CA C A 15 16 0.857462442

rs7708070 ZNF608 Coding/Exonic/Synonymous SCC5 5 123984763 C=0.236 TC T C 16 15 0.857462442

rs11097413 HPGDS Intronic SCC13 4 95249286 T=0.411 TC T C 16 17 0.861804433

rs4285214 ZNF608 Intronic SCC5 5 124024281 T=0.444 GT G T 17 16 0.861804433

rs1351608 Intergenic Intergenic SCC5 5 113180637 C=0.339 GA G A 18 17 0.865772375

rs11729301 PRDM5 Intronic SCC13 4 121617285 C=0.231 CA C A 19 18 0.869417061

rs245051 SLC26A2 Intronic SCC5 5 149345975 T=0.460 CT C T 19 18 0.869417061

rs767657 HPGDS Intronic SCC13 4 95227901 T=0.448 GT G T 22 23 0.881497452

rs10281500 VOPP1 Intronic SCC13 7 55542890 G=0.272 GC G C 9 9 1

rs11723050 PRDM5 Intronic SCC13 4 121814232 T=0.431 CT C T 14 14 1

rs1468402 GARS Intronic SCC13 7 30668882 A=0.456 GA G A 13 13 1

rs17731538 ABCG2 Intronic SCC13 4 89055379 A=0.120 GA G A 5 5 1

rs2622624 ABCG2 Intronic SCC13 4 89069406 C=0.446 GA G A 8 8 1

rs2725248 ABCG2 Intronic SCC13 4 89068007 C=0.194 GT G T 6 6 1

rs6962537 ELMO1 Intronic SCC13 7 37480974 T=0.369 CT C T 11 11 1

rs17031599 DYNC2LI1 Intronic SCC4 2 44031175 A=0.397 AG A G 6 6 1

rs17034950 EPAS1 Intronic SCC4 2 46538794 A=0.326 AG A G 6 6 1

rs3770764 PRKD3 Intronic SCC4 2 37489877 T=0.354 TC T C 2 2 1

rs4384764 QPCT Intronic SCC4 2 37590284 A=0.410 GA G A 9 9 1

rs6741740 LRPPRC Intronic SCC4 2 44181434 A=0.201 AG A G 8 8 1

rs969876 HNRPLL Intronic SCC4 2 38801247 T=0.416 AT A T 12 12 1

rs1042714 ADRB2 Coding/Exonic/Missense SCC5 5 148206473 G=0.234 GC G C 17 17 1

rs1051644 LMNB1 3'UTR SCC5 5 126172195 T=0.459 CT C T 11 11 1

rs10875552 PPARGC1B Intronic SCC5 5 149189489 G=0.222 AG A G 7 7 1

rs11739347 MCC Intronic SCC5 5 112516435 C=0.379 TC T C 5 5 1

rs13167635 CSS3 (CHSY3) Intronic SCC5 5 129469043 C=0.313 CA C A 11 11 1

rs13177918 RPS14 Intronic SCC5 5 149825870 T=0.180 CT C T 7 7 1

rs13359847 SNCAIP Intronic SCC5 5 121669859 A=0.254 GA G A 3 3 1

rs2035472 ALDH7A1 Intronic SCC5 5 125900721 G=0.302 CG C G 15 15 1

rs2077482 CSNK1A1 Intronic SCC5 5 148889672 T=0.315 AT A T 9 9 1

rs2165929 MCC Intronic SCC5 5 112485155 G=0.202 GA G A 7 7 1

rs2229562 PDGFRB 3'UTR SCC5 5 149495253 G=0.311 AG A G 1 1 1

rs2240781 PDGFRB Intronic SCC5 5 149516480 T=0.459 TC T C 4 4 1

rs2306619 ALDH7A1 Intronic SCC5 5 125887856 G=0.262 AG A G 3 3 1

rs246392 PDGFRB Intronic SCC5 5 149497672 T=0.172 TC T C 15 15 1

rs4705382 PPARGC1B Intronic SCC5 5 149181366 T=0.450 TC T C 8 8 1

rs4835913 ALDH7A1 Intronic SCC5 5 125895509 G=0.436 GA G A 10 10 1

rs4836108 ZNF608 Intronic SCC5 5 124010261 G=0.449 GA G A 13 13 1

rs4836110 ZNF608 Intronic SCC5 5 124040779 G=0.226 GC G C 13 13 1

rs6870146 CSF1R Intronic SCC5 5 149461912 A=0.138 CA C A 12 12 1

rs6892794 PPARGC1B Intronic SCC5 5 149124131 G=0.350 AG A G 9 9 1

rs6895698 PPARGC1B Intronic SCC5 5 149140262 A=0.215 AG A G 7 7 1

rs7736823 MCC Intronic SCC5 5 112424337 G=0.332 AG A G 7 7 1

rs2331014 Intergenic Intergenic SCC13 7 55286337 C=0.191 CT C T 0 0 N.A.

rs4693375 SMARCAD1 Intronic SCC13 4 95133241 C=0.465 CA C A 0 0 N.A.

rs4948031 VOPP1 Intronic SCC13 7 55610012 A=0.409 AC A C 0 0 N.A.

rs6979746 PKD1L1 Intronic SCC13 7 47831535 C=0.206 TC T C 0 0 N.A.

rs13404066 MTA3 Intronic SCC4 2 42860732 G=0.389 AG A G 0 0 N.A.

rs1529625 GALM Intronic SCC4 2 38945701 C=0.379 GT G T 0 0 N.A.

rs2047681 FBXO11 Intronic SCC4 2 48056231 G=0.103 GA G A 0 0 N.A.

rs3815995 DYNC2LI1 Intronic SCC4 2 44010620 A=0.394 CT C T 0 0 N.A.

rs3924917 EPCAM (TACSTD1) Intronic SCC4 2 47612878 G=0.437 AG A G 0 0 N.A.

rs6757976 MTA3 Intronic SCC4 2 42856208 G=0.240 AG A G 0 0 N.A.

rs2941664 LMNB1 Intronic SCC5 5 126150428 T=0.390 AT A T 0 0 N.A.

rs498360 MCC Intronic SCC5 5 112584331 A=0.275 AG A G 0 0 N.A.

rs632678 PRDM6 Intronic SCC5 5 122447774 C=0.496 CT C T 0 0 N.A.

rs6866298 CSF1R Intronic SCC5 5 149439588 T=0.215 CT C T 0 0 N.A.

Table 12. Allele-specific imbalance data for 525 SNPs in the discovery sample set
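
The Chi-Squared P-value column appears consistent with a one-degree-of-freedom chi-squared goodness-of-fit test of the two imbalance counts against an expected 1:1 ratio. The sketch below assumes that test; it is an inference from the tabulated values, not a statement of the exact procedure used. The function name `imbalance_p_value` is hypothetical and introduced only for illustration.

```python
import math


def imbalance_p_value(allele1_count, allele2_count):
    """Chi-squared p-value (1 d.o.f.) for two counts against a 1:1 expectation.

    Assumption: the table's p-values come from a goodness-of-fit test in which
    each allele is expected to account for half of the imbalance events.
    Returns None when no imbalance events were observed (listed as N.A.).
    """
    total = allele1_count + allele2_count
    if total == 0:
        return None
    expected = total / 2.0
    chi_sq = ((allele1_count - expected) ** 2
              + (allele2_count - expected) ** 2) / expected
    # With one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_sq / 2.0))


# Example: 8 imbalance events favoring allele 1 and 14 favoring allele 2.
print(imbalance_p_value(8, 14))
```

Under this assumption, counts of 8 and 14 give a p-value of about 0.2008, in line with the rows reporting those counts, and equal counts give a p-value of 1.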


Table 13. Allele-specific imbalance data for 103 SNPs in the validation sample set

Columns: SNP ID; Tagged Gene; SNP Type; Candidate Locus; Chrom.; GRCh37.p13 Chromosome Position; dbSNP MAF (accessed 2012); Genotype; Allele 1; Allele 2; Allele 1 Imbalance; Allele 2 Imbalance; Chi-Squared P-value

rs1919935 SNX10 Intronic SCC13 7 26399235 C=0.254 CT C T 22 4 0.000415418

rs6958331 LANCL2 Intronic SCC13 7 55440523 C=0.138 CT C T 14 2 0.002699796

rs1049394 SCRN1 3'UTR SCC13 7 29960804 T=0.309 GA G A 14 3 0.007632882

rs1513721 LOC101927052 Intronic SCC13 4 122411527 G=0.318 CA C A 4 15 0.011616892

rs34166 C7orf16 Intronic SCC13 7 31730776 T=0.467 TA T A 2 11 0.01255492

rs2699814 SNX10 Intronic SCC13 7 26406800 T=0.474 TA T A 6 16 0.033006262

rs1534696 SNX10 Intronic SCC13 7 26397239 C=0.348 GT G T 2 8 0.05777958

rs6949451 HOXA10 Intronic SCC13 7 27215041 T=0.415 CT C T 17 8 0.07186065

rs4722968 PLEKHA8 Intronic SCC13 7 30109718 G=0.196 GC G C 7 2 0.09558073

rs38410 GGCT Intronic SCC13 7 30540399 G=0.361 GT G T 17 9 0.116664507

rs843576 PRDM5 Intronic SCC13 4 121645319 G=0.179 AG A G 9 4 0.16551794

rs2390054 PRDM5 Intronic SCC13 4 121783018 G=0.454 GT G T 17 10 0.177931811

rs221171 Intergenic Intergenic SCC13 7 29006139 G=0.455 GC G C 3 7 0.205903321

rs2284218 CRHR2 Intronic SCC13 7 30714333 T=0.432 CT C T 20 13 0.223017612

rs39099 CHN2 Intronic SCC13 7 29293095 A=0.271 GA G A 14 9 0.297146799

rs2622604 ABCG2 Intronic SCC13 4 89078924 T=0.171 CT C T 10 15 0.317310813

rs2930807 HERC3 Intronic SCC13 4 89552006 A=0.412 GA G A 7 11 0.345778592

rs6964512 Intergenic Intergenic SCC13 7 26026149 G=0.452 GT G T 10 7 0.466854274

rs343201 PRDM5 Intronic SCC13 4 121797035 G=0.459 GA G A 27 22 0.475050526

rs17153429 SNX10 Intronic SCC13 7 26340970 A=0.249 GA G A 8 11 0.491297125

rs2893245 SNX10 Intronic SCC13 7 26352073 G=0.409 GA G A 12 9 0.512690761

rs3844295 HERC3 Intronic SCC13 4 89527665 A=0.417 GA G A 6 4 0.527089257

rs356168 SNCA Intronic SCC13 4 90674431 A=0.405 GA G A 12 15 0.563702862

rs728989 SMARCAD1 Intronic SCC13 4 95164481 G=0.307 CG C G 13 11 0.683091398

rs4401454 BMPR1B Intronic SCC13 4 95741658 G=0.188 GA G A 13 15 0.705456986

rs1513807 FAM13A Intronic SCC13 4 89961290 T=0.499 GT G T 15 17 0.723673611

rs16881927 FKBP9 Intronic SCC13 7 33042691 C=0.200 CT C T 5 4 0.738882681

rs34179 C7orf16 Intronic SCC13 7 31741035 T=0.390 CT C T 11 12 0.834827329

rs10516930 GRID2 Intronic SCC13 4 94557686 T=0.343 TC T C 13 14 0.84738966

rs4395533 Intergenic Intergenic SCC13 4 121481950 A=0.496 GA G A 15 14 0.852683684

rs7801318 CHN2 Intronic SCC13 7 29384157 T=0.341 TC T C 2 2 1

rs7811609 KBTBD2 Intronic SCC13 7 32930597 T=0.265 TC T C 15 15 1

rs17326236 PLEKHA8 Intronic SCC13 7 30091123 A=0.136 AG A G 0 0 N.A.

rs3770755 QPCT Intronic SCC4 2 37574134 G=0.298 GC G C 15 5 0.02534732

rs9973653 EPAS1 Intronic SCC4 2 46548109 T=0.490 GT G T 2 8 0.05777958

Table 13 continued

Columns: SNP ID; Tagged Gene; SNP Type; Candidate Locus; Chrom.; GRCh37.p13 Chromosome Position; dbSNP MAF (accessed 2012); Genotype; Allele 1; Allele 2; Allele 1 Imbalance; Allele 2 Imbalance; Chi-Squared P-value

rs10199201 EPAS1 Intronic SCC4 2 46577162 C=0.172 CT C T 6 1 0.058781731
rs336030 FAM82A1 Intronic SCC4 2 38266996 G=0.327 CA C A 11 4 0.070701156
rs1868087 EPAS1 Intronic SCC4 2 46579719 G=0.212 AG A G 0 3 0.08326454
rs4953344 EPAS1 Intronic SCC4 2 46552458 C=0.085 CT C T 9 3 0.08326454
rs2034327 EPAS1 Intronic SCC4 2 46549040 G=0.455 CG C G 18 10 0.130570066
rs330787 FBXO11 Intronic SCC4 2 48041377 A=0.357 TC T C 3 7 0.205903321
rs4953353 EPAS1 Intronic SCC4 2 46567276 T=0.385 GT G T 11 6 0.22525306
rs3770752 QPCT Intronic SCC4 2 37576136 C=0.220 CT C T 8 4 0.248213251
rs10182633 FBXO11 Intronic SCC4 2 48091408 T=0.258 TA T A 12 7 0.251349301
rs874869 FBXO11 Intronic SCC4 2 48094337 G=0.431 CG C G 9 5 0.285049646
rs2043912 MTA3 Intronic SCC4 2 42935481 G=0.334 AG A G 1 0 0.317310813
rs3768730 EPAS1 Intronic SCC4 2 46592524 G=0.285 GT G T 13 9 0.393768636
rs2537742 FBXO11 Intronic SCC4 2 48039044 C=0.163 AG A G 4 2 0.414216179
rs1864828 HNRPLL Intronic SCC4 2 38824781 A=0.134 TA T A 6 9 0.438578033
rs7568481 LRPPRC Intronic SCC4 2 44145374 G=0.335 GT G T 15 12 0.563702862
rs4953012 DYNC2LI1 Intronic SCC4 2 44006464 C=0.204 CA C A 7 9 0.617075078
rs4953013 DYNC2LI1 Intronic SCC4 2 44007075 G=0.402 AG A G 4 3 0.705456986
rs960106 FBXO11 Intronic SCC4 2 48078838 C=0.347 TC T C 5 4 0.738882681
rs3923559 EPCAM (TACSTD1) Intronic SCC4 2 47606774 T=0.344 CT C T 20 22 0.757620724
rs3087822 CRIPT 3'UTR SCC4 2 46852033 G=0.431 GA G A 7 8 0.796253415
rs2346175 EPAS1 Intronic SCC4 2 46571310 T=0.267 TC T C 14 13 0.84738966
rs12516924 NDST1 Intronic SCC5 5 149901834 G=0.233 GA G A 12 26 0.023140934
rs6891155 CEP120 Intronic SCC5 5 122684131 G=0.442 GA G A 26 12 0.023140934
rs4835907 GRAMD3 Intronic SCC5 5 125825322 A=0.226 TA T A 10 23 0.023635105
rs10080096 YTHDC2 Intronic SCC5 5 112879405 G=0.310 GT G T 5 15 0.02534732
rs4705086 PPARGC1B Intronic SCC5 5 149121767 C=0.498 TC T C 5 15 0.02534732
rs353237 Intergenic Intergenic SCC5 5 148831850 C=0.223 CT C T 29 15 0.034808484
rs4705415 PDGFRB Intronic SCC5 5 149517235 A=0.400 GA G A 16 30 0.038999957
rs171527 FBN2 Intronic SCC5 5 127662706 T=0.144 TC T C 17 7 0.041226838
rs1862175 Intergenic Intergenic SCC5 5 121532861 C=0.424 AG A G 6 14 0.073638283
rs2973586 LMNB1 Intronic SCC5 5 126130101 C=0.369 GA G A 14 25 0.078169103
rs2956540 LOX Intronic SCC5 5 121409180 C=0.351 GC G C 9 3 0.08326454
rs4836261 GRAMD3 Intronic SCC5 5 125786305 T=0.369 GT G T 19 10 0.094670744
rs11241892 GRAMD3 Intronic SCC5 5 125773331 A=0.338 CA C A 14 24 0.104757513

continued


Table 13 continued

Columns: SNP ID; Tagged Gene; SNP Type; Candidate Locus; Chrom.; GRCh37.p13 Chromosome Position; dbSNP MAF (accessed 2012); Genotype; Allele 1; Allele 2; Allele 1 Imbalance; Allele 2 Imbalance; Chi-Squared P-value

rs34850 CDO1 Intronic SCC5 5 115142745 C=0.472 CG C G 12 21 0.117185131
rs10213948 GRAMD3 Intronic SCC5 5 125826191 T=0.198 TC T C 5 11 0.133614457
rs4078254 MCC Intronic SCC5 5 112456067 A=0.444 GT G T 24 15 0.149541421
rs2304060 PDGFRB Intronic SCC5 5 149501751 A=0.411 CA C A 21 13 0.170067024
rs1008776 LOC101927421 Intronic SCC5 5 124480600 C=0.426 CT C T 22 32 0.173568239
rs639680 MCC Intronic SCC5 5 112562163 C=0.181 GC G C 7 13 0.179712587
rs1947582 CSNK1A1 Intronic SCC5 5 148892095 C=0.293 GA G A 11 17 0.256839426
rs2963765 Intergenic Intergenic SCC5 5 114693074 G=0.388 TC T C 16 23 0.262331878
rs4705073 Intergenic Intergenic SCC5 5 148832326 C=0.428 CT C T 18 25 0.285750874
rs3095902 NDST1 Intronic SCC5 5 149926998 C=0.450 CT C T 19 13 0.28884457
rs7737512 MCC Intronic SCC5 5 112618167 G=0.132 GA G A 9 14 0.297146799
rs2775 ALDH7A1 3'UTR SCC5 5 125880589 C=0.392 AG A G 20 14 0.303483916
rs6897488 GRAMD3 Intronic SCC5 5 125769679 C=0.410 CT C T 29 22 0.32698936
rs966087 LMNB1 Intronic SCC5 5 126170910 G=0.319 GT G T 22 28 0.39614391
rs1077724 PDGFRB Intronic SCC5 5 149504115 T=0.278 AT A T 15 20 0.398024721
rs4572999 ZNF608 Intronic SCC5 5 124024209 G=0.239 GC G C 15 20 0.398024721
rs2254161 Intergenic Intergenic SCC5 5 148823662 G=0.356 GT G T 24 30 0.414216179
rs11955699 MCC Intronic SCC5 5 112498949 G=0.440 GT G T 12 16 0.449691803
rs17653932 Intergenic Intergenic SCC5 5 148831133 T=0.275 TC T C 17 21 0.516412269
rs32216 FBN2 Intronic SCC5 5 127640076 C=0.242 TC T C 18 22 0.527089257
rs4240406 ALDH7A1 Intronic SCC5 5 125882825 T=0.496 TC T C 2 1 0.563702862
rs13359059 GRAMD3 Intronic SCC5 5 125799373 C=0.450 CT C T 17 14 0.590013889
rs9327244 Intergenic Intergenic SCC5 5 121597391 G=0.292 GC G C 14 12 0.694886602
rs6886362 ZNF608 Intronic SCC5 5 124047163 G=0.277 GT G T 20 18 0.745602789
rs4835948 FBN2 Intronic SCC5 5 127870190 T=0.494 CT C T 23 21 0.763024601
rs7719910 CSF1R Intronic SCC5 5 149445987 T=0.145 GT G T 6 7 0.781511295
rs1010101 CSF1R Intronic SCC5 5 149489110 C=0.415 TC T C 14 13 0.84738966
rs11241891 GRAMD3 Intronic SCC5 5 125768020 T=0.268 CT C T 16 15 0.857462442
rs4579258 ZNF608 Intronic SCC5 5 124037371 G=0.419 AG A G 17 16 0.861804433
rs256938 FEM1C Intronic SCC5 5 114868337 C=0.377 CA C A 21 20 0.875896058
rs2112487 PRDM6 Intronic SCC5 5 122506237 T=0.169 CT C T 14 14 1
rs6862252 ZNF608 Coding/Exonic/Missense SCC5 5 123983915 G=0.369 GT G T 14 14 1
rs919751 PDGFRB Intronic SCC5 5 149505489 C=0.257 TC T C 13 13 1
rs3857434 MCC Intronic SCC5 5 112365804 A=0.432 AG A G 0 0 N.A.

Table 13. Allele-specific imbalance data for 103 SNPs in the validation sample set
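The chi-squared P-values tabulated above appear consistent with a one-degree-of-freedom goodness-of-fit test of the two imbalance counts against an equal (1:1) expectation. For example, counts of 22 and 4 give a statistic of (22 − 4)²/26 ≈ 12.46 and P ≈ 0.000415. The following Python sketch reproduces that arithmetic under this assumption; the function name `asi_chi_squared_p` is hypothetical and is not taken from the original analysis.

```python
import math


def asi_chi_squared_p(allele1_count, allele2_count):
    """P-value for a 1:1 goodness-of-fit chi-squared test (1 degree of freedom).

    Assumption: the null hypothesis is that imbalance is equally likely to
    favor either allele. Returns None when no samples show imbalance,
    which corresponds to the "N.A." entries in the tables.
    """
    total = allele1_count + allele2_count
    if total == 0:
        return None
    # With an expected count of total/2 for each allele, the statistic
    # sum((observed - expected)^2 / expected) simplifies to (a - b)^2 / (a + b).
    chi2 = (allele1_count - allele2_count) ** 2 / total
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Count pairs taken from rows of Table 13.
    for counts in [(22, 4), (1, 0), (14, 14), (0, 0)]:
        print(counts, asi_chi_squared_p(*counts))
```

Small differences in the last decimal places relative to the tabulated values are to be expected, since the software used to produce the tables is not stated here.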


Columns: SNP ID; Tagged Gene; SNP Type; Candidate Locus; Chrom.; GRCh37.p13 Chrom. Position; dbSNP MAF (accessed 2012); Genotype; Allele 1; Allele 2; Allele 1 Imbalance; Allele 2 Imbalance; Chi-Squared P-value; Bonferroni-Corrected P-value (n=103)

rs1919935 SNX10 Intronic SCC13 7 26399235 C=0.242 CT C T 40 10 2.20905E-05 0.002275321
rs2699814 SNX10 Intronic SCC13 7 26406800 T=0.474 TA T A 11 35 0.000402244 0.041431164
rs6958331 LANCL2 Intronic SCC13 7 55440523 C=0.138 CT C T 31 9 0.000504218 0.051934478
rs6891155 CEP120 Intronic SCC5 5 122684131 G=0.442 GA G A 52 23 0.000812117 0.083648067
rs4835907 GRAMD3 Intronic SCC5 5 125825322 A=0.226 TA T A 18 44 0.000959969 0.098876851
rs1868087 EPAS1 Intronic SCC4 2 46579719 G=0.212 AG A G 0 10 0.001565402 0.161236441
rs1049394 SCRN1 3'UTR SCC13 7 29960804 T=0.309 GA G A 21 5 0.001701872 0.175292836
rs4705086 PPARGC1B Intronic SCC5 5 149121767 C=0.498 TC T C 12 32 0.002568832 0.264589653
rs34166 C7orf16 Intronic SCC13 7 31730776 T=0.467 TA T A 8 25 0.003083186 0.317568176
rs10199201 EPAS1 Intronic SCC4 2 46577162 C=0.172 CT C T 11 1 0.003892417 0.40091898
rs34850 CDO1 Intronic SCC5 5 115142745 C=0.472 GC G C 35 15 0.004677735 0.48180673
rs874869 FBXO11 Intronic SCC4 2 48094337 G=0.431 CG C G 27 10 0.00519346 0.534926389
rs6949451 HOXA10 Intronic SCC13 7 27215041 T=0.415 CT C T 30 12 0.005478554 0.564291026
rs39099 CHN2 Intronic SCC13 7 29293095 A=0.271 GA G A 33 14 0.005580985 0.57484141
rs1534696 SNX10 Intronic SCC13 7 26397239 C=0.348 GT G T 9 25 0.006069973 0.625207208
rs10080096 YTHDC2 Intronic SCC5 5 112879405 G=0.310 GT G T 11 28 0.006485308 0.667986739
rs4722968 PLEKHA8 Intronic SCC13 7 30109718 G=0.196 CG C G 4 16 0.007290358 0.750906908
rs2034327 EPAS1 Intronic SCC4 2 46549040 G=0.455 CG C G 29 12 0.007931924 0.816988156
rs171527 FBN2 Intronic SCC5 5 127662706 T=0.144 TC T C 29 12 0.007931924 0.816988156
rs9973653 EPAS1 Intronic SCC4 2 46548109 T=0.490 GT G T 7 21 0.008150972 0.839550109
rs2956540 LOX Intronic SCC5 5 121409180 C=0.351 GC G C 17 5 0.010515247 1
rs4953344 EPAS1 Intronic SCC4 2 46552458 C=0.085 CT C T 15 4 0.011616892 1
rs1862175 Intergenic Intergenic SCC5 5 121532861 C=0.424 AG A G 7 20 0.012354586 1
rs3770752 QPCT Intronic SCC4 2 37576136 C=0.220 CT C T 29 13 0.013554668 1
rs2390054 PRDM5 Intronic SCC13 4 121783018 G=0.454 GT G T 39 21 0.020136753 1
rs10182633 FBXO11 Intronic SCC4 2 48091408 T=0.258 TA T A 26 12 0.023140934 1
rs2537742 FBXO11 Intronic SCC4 2 48039044 C=0.163 GA G A 15 5 0.02534732 1
rs11241892 GRAMD3 Intronic SCC5 5 125773331 A=0.338 CA C A 24 42 0.0267157 1
rs1947582 CSNK1A1 Intronic SCC5 5 148892095 C=0.293 AG A G 25 12 0.032582803 1
rs843576 PRDM5 Intronic SCC13 4 121645319 G=0.179 AG A G 16 6 0.033006262 1
rs10213948 GRAMD3 Intronic SCC5 5 125826191 T=0.198 TC T C 8 18 0.049860213 1
rs11955699 MCC Intronic SCC5 5 112498949 G=0.440 GT G T 37 22 0.050839319 1
rs2112487 PRDM6 Intronic SCC5 5 122506237 T=0.169 CT C T 33 19 0.052203647 1
rs2304060 PDGFRB Intronic SCC5 5 149501751 A=0.411 CA C A 35 21 0.06136884 1
rs4395533 Intergenic Intergenic SCC13 4 121481950 A=0.496 GA G A 41 26 0.066870777 1

Table 14. Allele-specific imbalance data for 103 SNPs in the discovery sample set and validation sample set combined

continued

Table 14 continued

Columns: SNP ID; Tagged Gene; SNP Type; Candidate Locus; Chrom.; GRCh37.p13 Chrom. Position; dbSNP MAF (accessed 2012); Genotype; Allele 1; Allele 2; Allele 1 Imbalance; Allele 2 Imbalance; Chi-Squared P-value; Bonferroni-Corrected P-value (n=103)

rs7801318 CHN2 Intronic SCC13 7 29384157 T=0.341 CT C T 8 17 0.07186065 1
rs3087822 CRIPT 3'UTR SCC4 2 46852033 G=0.431 GA G A 11 21 0.077099888 1
rs639680 MCC Intronic SCC5 5 112562163 C=0.181 GC G C 11 21 0.077099888 1
rs4572999 ZNF608 Intronic SCC5 5 124024209 G=0.239 CG C G 34 21 0.07961582 1
rs4401454 BMPR1B Intronic SCC13 4 95741658 G=0.188 GA G A 25 39 0.080118333 1
rs32216 FBN2 Intronic SCC5 5 127640076 C=0.242 TC T C 25 39 0.080118333 1
rs17326236 PLEKHA8 Intronic SCC13 7 30091123 A=0.136 AG A G 3 9 0.08326454 1
rs6897488 GRAMD3 Intronic SCC5 5 125769679 C=0.410 CT C T 51 35 0.084469058 1
rs16881927 FKBP9 Intronic SCC13 7 33042691 C=0.200 CT C T 15 7 0.088081528 1
rs2930807 HERC3 Intronic SCC13 4 89552006 A=0.412 GA G A 13 23 0.09558073 1
rs1864828 HNRPLL Intronic SCC4 2 38824781 A=0.134 AT A T 16 8 0.102470455 1
rs3857434 MCC Intronic SCC5 5 112365804 A=0.432 AG A G 16 8 0.102470455 1
rs7568481 LRPPRC Intronic SCC4 2 44145374 G=0.335 GT G T 24 14 0.104757513 1
rs2043912 MTA3 Intronic SCC4 2 42935481 G=0.334 AG A G 4 10 0.108809459 1
rs3095902 NDST1 Intronic SCC5 5 149926998 C=0.450 CT C T 30 19 0.116083175 1
rs7719910 CSF1R Intronic SCC5 5 149445987 T=0.145 GT G T 9 17 0.116664507 1
rs1010101 CSF1R Intronic SCC5 5 149489110 C=0.415 CT C T 36 24 0.121335281 1
rs4835948 FBN2 Intronic SCC5 5 127870190 T=0.494 CT C T 42 29 0.122875422 1
rs9327244 Intergenic Intergenic SCC5 5 121597391 G=0.292 GC G C 25 16 0.159853739 1
rs343201 PRDM5 Intronic SCC13 4 121797035 G=0.459 GA G A 46 34 0.179712587 1
rs256938 FEM1C Intronic SCC5 5 114868337 C=0.377 CA C A 33 23 0.181449306 1
rs3923559 EPCAM (TACSTD1) Intronic SCC4 2 47606774 T=0.344 CT C T 41 30 0.191736415 1
rs7811609 KBTBD2 Intronic SCC13 7 32930597 T=0.265 TC T C 29 20 0.198542878 1
rs17153429 SNX10 Intronic SCC13 7 26340970 A=0.249 GA G A 17 25 0.217044022 1
rs4953013 DYNC2LI1 Intronic SCC4 2 44007075 G=0.402 GA G A 15 9 0.220671492 1
rs1513807 FAM13A Intronic SCC13 4 89961290 T=0.499 GT G T 34 25 0.241317579 1
rs728989 SMARCAD1 Intronic SCC13 4 95164481 G=0.307 CG C G 28 20 0.248213251 1
rs13359059 GRAMD3 Intronic SCC5 5 125799373 C=0.450 CT C T 22 15 0.249817624 1
rs356168 SNCA Intronic SCC13 4 90674431 A=0.405 GA G A 29 21 0.25789921 1
rs11241891 GRAMD3 Intronic SCC5 5 125768020 T=0.268 TC T C 21 29 0.25789921 1
rs919751 PDGFRB Intronic SCC5 5 149505489 C=0.257 CT C T 22 30 0.267257669 1
rs6886362 ZNF608 Intronic SCC5 5 124047163 G=0.277 GT G T 34 26 0.30169982 1
rs4953012 DYNC2LI1 Intronic SCC4 2 44006464 C=0.204 CA C A 20 14 0.303483916 1
rs17653932 Intergenic Intergenic SCC5 5 148831133 T=0.275 TC T C 35 27 0.30962914 1

continued

Table 14 continued

Columns: SNP ID; Tagged Gene; SNP Type; Candidate Locus; Chrom.; GRCh37.p13 Chrom. Position; dbSNP MAF (accessed 2012); Genotype; Allele 1; Allele 2; Allele 1 Imbalance; Allele 2 Imbalance; Chi-Squared P-value; Bonferroni-Corrected P-value (n=103)

rs10516930 GRID2 Intronic SCC13 4 94557686 T=0.343 TC T C 29 22 0.32698936 1
rs2346175 EPAS1 Intronic SCC4 2 46571310 T=0.267 TC T C 22 29 0.32698936 1
rs4240406 ALDH7A1 Intronic SCC5 5 125882825 T=0.496 TC T C 11 7 0.345778592 1
rs6862252 ZNF608 Coding/Exonic/Missense SCC5 5 123983915 G=0.369 GT G T 27 21 0.386476232 1
rs3844295 HERC3 Intronic SCC13 4 89527665 A=0.417 GA G A 13 9 0.393768636 1
rs38410 GGCT Intronic SCC13 7 30540399 G=0.361 GT G T 25 31 0.422678075 1
rs4579258 ZNF608 Intronic SCC5 5 124037371 G=0.419 AG A G 22 17 0.423339642 1
rs34179 C7orf16 Intronic SCC13 7 31741035 T=0.390 CT C T 24 19 0.445765913 1
rs12516924 NDST1 Intronic SCC5 5 149901834 G=0.233 GA G A 23 28 0.483839853 1
rs6964512 Intergenic Intergenic SCC13 7 26026149 G=0.452 GT G T 15 19 0.492716678 1
rs1513721 LOC101927052 Intronic SCC13 4 122411527 G=0.318 CA C A 16 20 0.504985076 1
rs2963765 Intergenic Intergenic SCC5 5 114693074 G=0.388 TC T C 27 32 0.515082279 1
rs221171 Intergenic Intergenic SCC13 7 29006139 G=0.455 GC G C 14 11 0.548506236 1
rs960106 FBXO11 Intronic SCC4 2 48078838 C=0.347 CT C T 7 5 0.563702862 1
rs4705415 PDGFRB Intronic SCC5 5 149517235 A=0.400 AG A G 41 36 0.568811401 1
rs3768730 EPAS1 Intronic SCC4 2 46592524 G=0.285 GT G T 25 29 0.586213683 1
rs1077724 PDGFRB Intronic SCC5 5 149504115 T=0.278 AT A T 39 35 0.641938221 1
rs2622604 ABCG2 Intronic SCC13 4 89078924 T=0.171 CT C T 23 20 0.647314785 1
rs3770755 QPCT Intronic SCC4 2 37574134 G=0.298 GC G C 23 26 0.668235142 1
rs330787 FBXO11 Intronic SCC4 2 48041377 A=0.357 TC T C 14 12 0.694886602 1
rs2893245 SNX10 Intronic SCC13 7 26352073 G=0.409 AG A G 16 14 0.715000655 1
rs353237 Intergenic Intergenic SCC5 5 148831850 C=0.223 CT C T 38 35 0.725495743 1
rs4953353 EPAS1 Intronic SCC4 2 46567276 T=0.385 GT G T 19 21 0.751829634 1
rs4836261 GRAMD3 Intronic SCC5 5 125786305 T=0.369 GT G T 25 23 0.772829993 1
rs2284218 CRHR2 Intronic SCC13 7 30714333 T=0.432 TC T C 27 25 0.781511295 1
rs2775 ALDH7A1 3'UTR SCC5 5 125880589 C=0.392 AG A G 26 28 0.785494747 1
rs4705073 Intergenic Intergenic SCC5 5 148832326 C=0.428 TC T C 30 32 0.799495362 1
rs1008776 LOC101927421 Intronic SCC5 5 124480600 C=0.426 CT C T 43 41 0.827259347 1
rs7737512 MCC Intronic SCC5 5 112618167 G=0.132 GA G A 17 16 0.861804433 1
rs336030 FAM82A1 Intronic SCC4 2 38266996 G=0.327 CA C A 18 19 0.869417061 1
rs966087 LMNB1 Intronic SCC5 5 126170910 G=0.319 GT G T 31 30 0.898119787 1
rs2254161 Intergenic Intergenic SCC5 5 148823662 G=0.356 GT G T 31 32 0.899741299 1
rs2973586 LMNB1 Intronic SCC5 5 126130101 C=0.369 GA G A 32 32 1 1
rs4078254 MCC Intronic SCC5 5 112456067 A=0.444 GT G T 32 32 1 1

Table 14. Allele-specific imbalance data for 103 SNPs in the discovery sample set and validation sample set combined
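The Bonferroni-corrected column of Table 14 appears consistent with multiplying each chi-squared P-value by the number of SNPs tested (n=103) and capping the result at 1. For example, 2.20905E-05 × 103 ≈ 0.002275. A minimal Python sketch of that adjustment follows; the function name `bonferroni_adjust` is hypothetical and is not taken from the original analysis.

```python
def bonferroni_adjust(p_value, n_tests=103):
    """Bonferroni-adjusted P-value: p * n_tests, capped at 1.

    Assumption: n_tests is the number of SNPs tested, matching the
    "(n=103)" note in the Table 14 column header.
    """
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Uncorrected P-values copied from three rows of Table 14.
    reported = {
        "rs1919935": 2.20905e-05,
        "rs9973653": 0.008150972,
        "rs2956540": 0.010515247,
    }
    for snp_id, p_value in reported.items():
        print(snp_id, bonferroni_adjust(p_value))
```

Any uncorrected P-value above 1/103 (about 0.0097) is capped at 1 under this adjustment, which matches the point in the table at which the corrected column switches to 1.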
