
Washington University in St. Louis Washington University Open Scholarship

All Theses and Dissertations (ETDs)

5-24-2010

Genetic Analysis of the PI3K/AKT/mTOR Signaling Pathway

Janna Hutz

Washington University in St. Louis


Recommended Citation

Hutz, Janna, "Genetic Analysis of the PI3K/AKT/mTOR Signaling Pathway" (2010). All Theses and Dissertations (ETDs). 887. https://openscholarship.wustl.edu/etd/887


WASHINGTON UNIVERSITY IN ST. LOUIS

Division of Biology and Biomedical Sciences

Quantitative Human and Statistical Genetics

Dissertation Examination Committee:
Co‐chair: Howard McLeod
Co‐chair: Michael Province
Ingrid Borecki
Anne Bowcock
Raphael Kopan
John Rice

GENETIC ANALYSIS OF THE PI3K/AKT/MTOR SIGNALING PATHWAY

by

Janna Elizabeth Hutz

A dissertation presented to the Graduate School of Arts and Sciences of Washington University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

August 2010

Saint Louis, Missouri

copyright by

Janna Elizabeth Hutz

2010

Abstract

Cancer is a leading cause of human death, and it is fundamentally attributable to dysfunctional . The PI3K/AKT/mTOR pathway is an important pro‐growth intracellular signaling cascade that is often inappropriately activated in a wide array of cancers. Efforts to develop anticancer drugs have therefore focused, in part, on identifying PI3K/AKT/mTOR pathway inhibitors. However, patient response to some such inhibitors is mixed, with some patients experiencing a paradoxical activation of the pathway following treatment. It is therefore necessary to better understand the nature of the PI3K/AKT/mTOR pathway and how it varies in different individuals. The work presented here used cell lines from families to measure the activity of three PI3K/AKT/mTOR pathway members (AKT1, p70S6K and 4E‐BP1) in a variety of contexts, including under baseline conditions and in response to treatment with different PI3K/AKT/mTOR pathway inhibitors.

Traditional genetic analyses were used to identify pathway activation phenotypes that were influenced by genetic variation, and genomic regions harboring variation were identified. A new tool for ranking candidate genes was developed and used to select promising genes within these regions for follow‐up. Genotyping and association tests of SNPs in these genes identified four variants that were associated with two baseline PI3K/AKT/mTOR pathway activation phenotypes. These represent the first studies to find genetic variants that influence post‐translational protein modifications. In addition, the identified SNPs may shed light on normal pathway function as well as new mechanisms for pathway inhibition.


Acknowledgments

I would like to acknowledge the sources that have financially supported this work. I was primarily supported by the Mr. and Mrs. Spencer T. Olin Fellowship for Women in Graduate Study, which is co‐sponsored by the Monticello College Foundation and Washington University in St. Louis. I also received a National Defense Science and Engineering Graduate Fellowship. The Division of Biology and Biomedical Sciences (DBBS) provided me with a DBBS Scholar award that paid for various materials, as well as a travel grant for my attendance at a conference. Finally, the lab of my mentor, Howard McLeod, provided financial support to supplement my Olin Fellowship during the last two years of my graduate study.

This dissertation is dedicated to my family and friends who have supported me during my graduate studies. I would like to thank my parents, Peggy Majher and John Hutz, for their support and encouragement and my sister, Erika, for her inspiration. I am greatly indebted to my husband, Xiaohan Hu, for his support and presence. I would also like to thank my labmates: Aldi, Derek, Jim, Rosa and Jeanne (from the DSG), as well as Ryan, Laura, Janelle, Tammy, Lorri, Mervi, Aaron, Todd, Michael, Cindy and Peggy (from the McLeod lab/UNC). My non‐biologist friends, the IA and B‐13 ladies, helped distract me from the lab, and I thank them for that. Finally, I would like to thank my mentors, Howard and Mike, as well as my committee in its entirety for helping improve my research and my science skills. I hope to make my family, friends, co‐workers and mentors proud with this research and the research I will conduct in my future career.


Table of Contents

Acknowledgements
List of Figures
List of Tables
List of Selected Abbreviations

Chapter 1: An introduction to cancer, cell signaling and pharmacogenomics
  1.1 Abstract
  1.2 Cancer in humans
    1.2.1 Common cancers
    1.2.2 Environmental risk factors
    1.2.3 Genetic risk factors
      1.2.3.1 Cell cycle regulation pathway
      1.2.3.2 DNA damage response pathway
      1.2.3.3 Cell death pathways
      1.2.3.4 Growth signaling pathways
  1.3 The PI3K/AKT/mTOR signaling pathway
    1.3.1 Growth factor receptors
    1.3.2 PI3Ks and PTEN
    1.3.3 AKT
    1.3.4 mTOR
    1.3.5 Downstream effectors of mTOR
  1.4 Drugs targeting the PI3K/AKT/mTOR pathway
    1.4.1 Drugs targeting growth factor receptors
    1.4.2 Inhibitors of PI3Ks
    1.4.3 Inhibitors of AKT and its regulatory proteins
    1.4.4 Inhibitors of mTOR
  1.5 Pharmacogenomics of cancer treatment
    1.5.1 Successes in cancer pharmacogenomics
    1.5.2 Pharmacogenomics and the PI3K/AKT/mTOR pathway
    1.5.3 The rapamycin problem
  1.6 Current strategies for genetic analysis
    1.6.1 Genome‐wide strategies for genetic analysis
      1.6.1.1 Linkage analysis
      1.6.1.2 Genome‐wide association studies
      1.6.1.3 Direct whole‐genome sequencing
    1.6.2 Candidate gene studies

Chapter 2: Identification of loci that influence heritable signaling phenotypes
  2.1 Abstract
  2.2 Introduction
  2.3 Materials and Methods
    2.3.1 Cell culture
    2.3.2 Western blotting
    2.3.3 ELISAs
    2.3.4 Data cleaning, heritability and linkage analysis
  2.4 Results
    2.4.1 Selection of experimental method for measuring signaling proteins
    2.4.2 Quality control analysis of signaling phenotypes
    2.4.3 Variation and heritability of signaling phenotypes in lymphoblastoid cell lines
    2.4.4 Linkage analysis of heritable phenotypes
  2.5 Discussion

Chapter 3: CANDID: A flexible method for prioritizing candidate genes for complex human traits
  3.1 Abstract
  3.2 Introduction
  3.3 Methods
    3.3.1 Overview of scoring
    3.3.2 Scoring criteria
      3.3.2.1 Publication data and scoring
      3.3.2.2 Protein domain data and scoring
      3.3.2.3 Conservation data and scoring
      3.3.2.4 Expression data and scoring
      3.3.2.5 Protein‐protein interaction data and scoring
      3.3.2.6 Linkage data and scoring
      3.3.2.7 Association data and scoring
      3.3.2.8 Custom data and scoring
    3.3.3 Final score determination and ranking
    3.3.4 ROC curve calculation and analysis
    3.3.5 Analysis of known genes responsible for traits in OMIM
    3.3.6 Analysis of recently characterized causal genes for complex traits
    3.3.7 Databases and computing
  3.4 Results
    3.4.1 Genes associated with complex traits in OMIM
    3.4.2 Recently published causal genes for complex traits
  3.5 Discussion

Chapter 4: Selection and testing of variants for association with PI3K/AKT/mTOR signaling phenotypes
  4.1 Abstract
  4.2 Introduction
  4.3 Materials and Methods
    4.3.1 Candidate gene selection
    4.3.2 SNP selection and genotyping
    4.3.3 Cell culture, cDNA preparation and sequencing
    4.3.4 CEPH association testing
    4.3.5 SCOTROC sample and clinical trait association testing
  4.4 Results
  4.5 Discussion

Chapter 5: A search for loci influencing the PI3K/AKT/mTOR pathway signaling response to inhibitors of pathway members
  5.1 Abstract
  5.2 Introduction
  5.3 Materials and Methods
    5.3.1 Cell culture, protein preparation and ELISA assays
    5.3.2 Data cleaning, basic statistics and phenotype definitions
    5.3.3 Data transformation and heritability analyses
    5.3.4 Linkage analysis
    5.3.5 Candidate gene selection
  5.4 Results
    5.4.1 Variation in PI3K/AKT/mTOR drug response phenotypes
    5.4.2 Relationships between phenotypes
    5.4.3 Heritabilities of twelve drug response phenotypes
    5.4.4 Linkage analysis of heritable phenotypes
    5.4.5 Candidate gene identification and assessment
  5.5 Discussion

Chapter 6: Conclusions and future directions
  6.1 Conclusions
  6.2 Future directions
  6.3 Final reflections on current and future research

References


List of Figures

Figure 1.1: Estimated statistics on new cancer cases and cancer deaths in the United States for 2009
Figure 1.2: Schematic diagram of the PI3K/AKT/mTOR signaling pathway
Figure 1.3: Illustration of selected drugs that target the PI3K/mTOR/AKT pathway

Figure 2.1: Development of Western blot assay for signaling proteins
Figure 2.2: Quality control measurements for ELISAs
Figure 2.3: Variation in AKT1, p70S6K and 4E‐BP1 phenotypes
Figure 2.4: Genomewide linkage analysis results for AKT1 and p70S6K ratio phenotypes
Figure 2.5: Linkage to chromosome 3 of correlated AKT1 and p70S6K ratio phenotypes

Figure 3.1: Depiction of CANDID’s information flow
Figure 3.2: Performance of 5 criteria in ranking OMIM genes
Figure 3.3: Success in predicting OMIM genes using 5 main criteria
Figure 3.4: Performance of 5 criteria in ranking recently identified causal genes
Figure 3.5: Success in predicting recently published genes using 5 data‐independent criteria

Figure 4.1: Locations of candidate genes relative to linkage peaks
Figure 4.2: Linkage disequilibrium surrounding statistically significant SNPs

Figure 5.1: Variation in twelve drug response phenotypes in unrelated individuals
Figure 5.2: AKT1 response to rapamycin in 120 cell lines
Figure 5.3: Linkage scan of AKT1 response to rapamycin (“RAratio” phenotype)


List of Tables

Table 1.1: Selected cell cycle genes with known associations with cancer
Table 1.2: Selected DNA damage response genes with known associations with cancer
Table 1.3: Selected cell death genes with known associations with cancer
Table 1.4: Selected growth and genes with known associations with cancer
Table 1.5: Cancer‐associated variants and other alterations in genes in the PI3K/AKT/mTOR pathway

Table 2.1: Statistically significant covariates of AKT1, p70S6K and 4E‐BP1 phenotypes
Table 2.2: Heritabilities of AKT1, p70S6K and 4E‐BP1 phenotypes

Table 3.1: Information used for OMIM trait analysis
Table 3.2: Genes with recently identified complex human trait associations
Table 3.3: Information used for analysis of recently characterized genes
Table 3.4: Training genes, loci and results from Endeavour analysis

Table 4.1: SNP genotyping assay details
Table 4.2: Top ten identified candidate genes for the AKT1 and p70S6K phosphorylated:total protein ratio phenotypes
Table 4.3: All CEPH association test results for SNPs on chromosomes 14 and 3

Table 5.1: Correlations between “ratio” and “phospho ratio” phenotypes
Table 5.2: Spearman correlations between drug response phenotypes and growth rate
Table 5.3: Spearman correlations between baseline phenotypes and growth rate
Table 5.4: Correlations between baseline phosphorylated:total protein ratios and drug response “ratio” phenotypes
Table 5.5: Heritabilities of twelve drug response phenotypes
Table 5.6: Definitions, distributions and heritabilities of alternate AKT1‐rapamycin phenotypes
Table 5.7: Maximum LOD scores of AKT1‐rapamycin phenotypes
Table 5.8: Candidate genes under linkage peak for “RAratio” phenotype


List of Selected Abbreviations

4E‐BP1  4E binding protein 1
53BP1  p53 binding protein 1
ACGR  Approach for Candidate Gene Retrieval
ALT  alternative lengthening of telomeres
AP  Apurinic/apyrimidinic
Apaf‐1  apoptotic peptidase activating factor 1
APC  adenomatous polyposis coli
ASK1  apoptosis signal regulating kinase 1
ATM  ataxia telangiectasia mutated
ATP  adenosine triphosphate
ATR  ataxia telangiectasia and Rad3 related
ATRIP  ATR interacting protein
AUC  area under the curve
BAD  BCL2‐associated agonist of cell death
BAK  BCL2‐antagonist/killer
BARD1  BRCA1 associated RING domain 1
BAX  BCL2‐associated X protein
BCL2  B‐cell CLL/lymphoma 2
BER  base excision repair
BID  BH3 interacting domain death agonist
BIND  Biomolecular Interaction Network Database
BioGRID  Biological General Repository for Interaction Datasets
BLM  Bloom syndrome, RecQ helicase‐like
BRCA1  breast and ovarian cancer susceptibility protein 1
BRCA2  breast and ovarian cancer susceptibility protein 2
BSA  bovine serum albumin
BUB1B  budding uninhibited by benzimidazoles 1 homolog beta
BUB3  budding uninhibited by benzimidazoles 3 homolog
CA‐125  cancer antigen 125
CANDID  candidate gene identification
CASP8  caspase 8
CBP  CREB binding protein
CCND1  cyclin D1
CDC2  cell division control protein 2 homolog
CDC20  cell division cycle 20 homolog
CDC25B  cell division cycle 25 homolog B
CDC25C  cell division cycle 25 homolog C
CDD  Conserved Domain Database
CDK  cyclin‐dependent kinase
CDKN2A  cyclin‐dependent kinase inhibitor 2A
cDNA  complementary DNA


CEPH  Centre d’Etude du Polymorphisme Humain
CGI  combining gene expression and protein‐protein interaction data
CIA  candidate identification algorithm
cM  centiMorgan
COG  Clusters of Orthologous Groups
COX2  cyclooxygenase 2
CRP  C‐reactive protein
CSA  Cockayne syndrome group A protein
CSB  Cockayne syndrome group B protein
CV  coefficient of variation
CYP2D6  cytochrome P450, family 2, subfamily D, polypeptide 6
DAG  diacylglycerol
DBBS  Division of Biology and Biomedical Sciences
DGP  Disease Gene Prediction
DMSO  dimethyl sulfoxide
DNA  deoxyribonucleic acid
DNA‐PKCS  DNA‐dependent protein kinase catalytic subunit
DRS  discounted rating system
DSG  Division of Statistical Genomics
DSM‐IV  Diagnostic and Statistical Manual of Mental Disorders, fourth edition
DSS1  deleted in split hand/split foot protein 1
ECOG  Eastern Cooperative Oncology Group
EDTA  ethylenediaminetetraacetic acid
EGF(R)  epidermal growth factor (receptor)
EGTA  ethylene glycol tetraacetic acid
eIF3  eukaryotic initiation factor 3
eIF4E  eukaryotic translation initiation factor 4E
ELISA  enzyme‐linked immunosorbent assay
eNOS  endothelial nitric oxide synthase
EP300  E1A binding protein p300
eQTL  expression quantitative trait locus/loci
ERCC1  excision repair cross‐complementing rodent repair deficiency, complementation group 1
ERK(1/2)  extracellular signal‐regulated kinase (1/2)
EWSR1  Ewing sarcoma breakpoint region 1
FADD  FAS‐associated death domain protein
FAP  familial adenomatous polyposis
FASLG  FAS ligand
FGF(R)  fibroblast growth factor (receptor)
FIGO  Federation Internationale de Gynecologie et d’Obstetrique
FKBP12  FK506 binding protein 1A, 12kDa


FLN  functional linkage network
FRAP1  FK506 binding protein 12‐rapamycin associated protein 1
G0  growth phase 0 (resting phase)
G1  growth phase 1
G2  growth phase 2
G2D  Genes to Diseases
GAPDH  glyceraldehyde‐3‐phosphate dehydrogenase
GFR  growth factor receptor
GFSST  gene function similarity search tool
GIST  gastrointestinal stromal tumor
GLUT1/4  glucose transporter type 1/4
GNF  Genomics Institute of the Novartis Research Foundation
GO  Gene Ontology
GPCR  G protein‐coupled receptor
GRAP2  GRB2‐related adaptor protein 2
GRB2  growth factor receptor‐bound protein 2
GRM7  glutamate receptor, metabotropic 7
GSK3β  glycogen synthase kinase 3 beta
GTP  guanosine triphosphate
H2AX/γH2AX  H2A histone family, member X/γH2A histone family, member X
HIF  hypoxia inducible factor
HPRD  Human Protein Reference Database
HR  homologous recombination
HSP90  heat shock protein 90
HSP90AA1  heat shock protein 90 alpha (cytosolic), class A member 1
IGF1(R)  insulin‐like growth factor 1 (receptor)
IRS‐1  insulin receptor substrate 1
KAT2B  K(lysine) acetyltransferase 2B
KEGG  Kyoto Encyclopedia of Genes and Genomes
KRAS  Kirsten rat sarcoma viral oncogene homolog
LCL(s)  lymphoblastoid cell line(s)
LD  linkage disequilibrium
LOD  logarithm of odds
M  mitosis phase
MAD1L1  MAD1 mitotic arrest deficient‐like 1
MAD2L1  MAD2 mitotic arrest deficient‐like 1
MAD2L2  MAD2 mitotic arrest deficient‐like 2
MAPK  mitogen‐activated protein kinase
MDC1  mediator of DNA‐damage checkpoint 1
MEK(1/2)  MAPK/ERK kinase (1/2)
MeSH  medical subject headings
MKRN2  makorin ring finger protein 2
MLH  MutL homolog


mLST8  mammalian lethal with SEC13 protein 8
MOPS  3‐(N‐morpholino)propanesulfonic acid
MRE11  meiotic recombination 11 homolog
mRNA  messenger ribonucleic acid
MS  mass spectrometry
MSH  MutS homolog
mTOR  mechanistic target of rapamycin
mTORC1/2  mTOR complex 1/2
MW  molecular weight
MYH/MUTYH  mutY homolog
MYT1  myelin transcription factor 1
NBS1  Nijmegen breakage syndrome 1
NCBI  National Center for Biotechnology Information
NDRG1  N‐myc downstream regulated 1
NER  nucleotide excision repair
NF‐κB  nuclear factor kappa‐light‐chain‐enhancer of activated B cells
NHEJ  non‐homologous end joining
NK  natural killer
OD  optical density
OMIM  Online Mendelian Inheritance in Man
p16(INK4)  p16, inhibits CDK4
p70S6K  70 kDa ribosomal protein S6 kinase 1
p90RSK  ribosomal protein S6 kinase, 90kDa
pAKT  phosphorylated AKT
PALB2  partner and localizer of BRCA2
PARP  poly (ADP‐ribose) polymerase
PBS  phosphate buffered saline
PCR  polymerase chain reaction
PDCD4  programmed cell death 4
PDGF(R)  platelet‐derived growth factor (receptor)
PDK1  3‐phosphoinositide dependent protein kinase‐1
PH  pleckstrin homology
PHLPP1/2  PH domain and leucine rich repeat protein phosphatase 1/2
PI3K  phosphatidylinositol 3 kinase
PIK3C  PI3K catalytic subunit
PIK3R  PI3K regulatory subunit
PIP2  phosphatidylinositol‐4,5‐bisphosphate
PIP3  phosphatidylinositol‐3,4,5‐trisphosphate
PKB  protein kinase B
PKC  protein kinase C
PMS  postmeiotic segregation increased 1
PMSF  phenylmethanesulfonylfluoride
POCUS  prioritization of candidate genes using statistics


POT1  protection of telomeres 1
PP2A  protein phosphatase 2A
PRAS40  40 kDa proline‐rich AKT substrate
PTEN  phosphatase and tensin homolog
PUMA  p53 up‐regulated modulator of apoptosis
PVDF  polyvinylidene fluoride
QTL  quantitative trait locus/loci
RB/RB1  retinoblastoma/retinoblastoma 1
RFLP  restriction fragment length polymorphism
Rheb  Ras homolog enriched in brain
RIPA  radioimmunoprecipitation assay
RNA  ribonucleic acid
RNF8  ring finger protein 8
RNF168  ring finger protein 168
ROC  receiver operating characteristic
RPA  replication protein A1
RPMI  Roswell Park Memorial Institute medium
RTK  receptor tyrosine kinase
S  DNA synthesis phase
S6rp  S6 ribosomal protein
SCOTROC  Scottish Randomised Trial in Ovarian Cancer
SDS  sodium dodecyl sulfate
SGK1  serum/glucocorticoid regulated kinase 1
SH2  Src homology 2
SHIP  SH2 containing inositol phosphatase
SMART  Simple Modular Architecture Research Tool
SNP  simple nucleotide polymorphism
SOLAR  Sequential Oligogenic Linkage Analysis Routines
SOS  son of sevenless
TBS  Tris‐buffered saline
TBST  Tris‐buffered saline plus Tween 20
TIN2  TRF1‐interacting nuclear protein 2
TNF  tumor necrosis factor
TOM  Transcriptomics of OMIM
TOR1/2  target of rapamycin 1/2
TPP1  TIN2 interacting protein 1
TRAIL  TNF‐related apoptosis‐inducing ligand
TRF1  telomeric repeat binding factor 1
TRF2  telomeric repeat binding factor 2
Tris  tris(hydroxymethyl)aminomethane
TSC1/2  tuberous sclerosis 1/2
UGT1A1  UDP glucuronosyltransferase 1 family, polypeptide A1
UNC  University of North Carolina


VEGF(R)  vascular endothelial growth factor (receptor)
VHL  von Hippel‐Lindau
WDR20  WD repeat domain 20
WRN  Werner syndrome, RecQ helicase‐like
XPA‐XPG  xeroderma pigmentosum, complementation groups A‐G
XRCC4  X‐ray repair complementing defective repair in Chinese hamster cells 4
XRCC6  X‐ray repair complementing defective repair in Chinese hamster cells 6
YWHAH  tyrosine 3‐monooxygenase/tryptophan 5‐monooxygenase activation protein, eta polypeptide


Chapter 1

An introduction to cancer, cell signaling and pharmacogenomics


1.1 Abstract

Cancer is a leading cause of death in humans, and the search for cures has long been a top priority of scientific research. Work in the fields of molecular and cell biology has revealed many of the cellular changes that precipitate the formation of tumors, and they are widely varied. Central to many of these changes are alterations of signaling pathways used by cells to sense and respond to their environments. In particular, signaling pathways that normally tightly regulate cell growth are often deregulated in cancer. The molecular components of one of these pathways, the PI3K/AKT/mTOR pathway, have been very well characterized in the last ten to twenty years. With increasing understanding of the pathway have come intense efforts to create drugs that can target and regulate these molecules in cancerous cells. Several drugs that target various pathway components have been approved, and more are in development, but these therapies are not perfect and do not work in a large percentage of patients. Until better therapies are developed, pharmacogenomics represents an opportunity to better understand which patients will benefit from the drugs that are currently available. Pharmacogenomics has already produced some recommendations for modifying treatments that target this pathway depending on the specific mutations in a patient’s tumor, but there is still a long way to go. The use of conventional genetic techniques like linkage analysis and biological materials like cell line collections represents an opportunity to improve pharmacogenomic recommendations for current cancer patients.


1.2 Cancer in humans

Worldwide, cancer is a leading cause of human death. Data from the World Health Organization estimated that in 2002, cancer caused approximately 12.49 percent of all deaths, making it the third most common killer after cardiovascular disease and infectious or parasitic disease [World Health Organization 2004]. In developed countries where lethal infectious and parasitic diseases are relatively rare, cancer is responsible for an even larger percentage of deaths. In the United States, cancer is the second most common cause of death after cardiovascular disease, causing an estimated 559,888 deaths in 2006, or 23.1 percent of the total [Heron, et al. 2009].

1.2.1 Common cancers

Cancer is fundamentally a disease of inappropriate cellular regulation. As such, nearly any part of the body that contains cells is prone to developing cancer. However, cancers occur more frequently in certain parts of the body. The American Cancer Society estimates that in males, approximately 25 percent of new cancer cases in 2009 were cancers of the prostate [American Cancer Society 2009]. In females, some 27 percent of new cases were breast cancers. Though these two cancers represented the most common forms of newly diagnosed cancer, they were not responsible for the greatest number of cancer‐related deaths during the same year. Cancer of the lung and bronchus caused 30 percent of cancer deaths in men and 26 percent of cancer deaths in women [American Cancer Society 2009]. Other relatively common forms of cancer include colorectal cancer, cancer of the urinary tract, lymphoma, melanoma, renal cancer, thyroid cancer, ovarian cancer, liver cancer, leukemia and oral cancer. With few exceptions, mortality from these cancers is roughly proportional to their frequency in the United States population (Figure 1.1) [American Cancer Society 2009].

It should be noted that current information on cancer diagnoses and mortality rates does not reflect only the biological basis for cancer. In fact, several societal factors may also influence the data. A longitudinal view of cancer statistics shows clear changes in the rates of diagnosis and cancer‐related death for several cancers. Specifically, lung cancer‐related deaths started to rise in the 1950s but have recently declined, as have mortality rates for other cancers [American Cancer Society 2009]. Factors such as large scale changes in screening, prevention, diagnostics and treatment are likely reflected in these longitudinal changes. As an example, the fact that breast cancer is currently the most commonly diagnosed cancer in women in the United States could be attributed to increased societal awareness of breast cancer, leading to more screening and more diagnoses. Likewise, the introduction of a variety of effective treatments of breast cancer has reduced the mortality rate. While it is important to note that societal factors play a role in determining an individual’s likelihood to be diagnosed with or die from a particular cancer, it is also important to consider that individual’s environmental and genetic risks.


1.2.2 Environmental risk factors

Biological influences that are not inherited genetically are referred to as environmental risk factors. Perhaps the best‐known example of an environmental risk factor for cancer is the link between smoking and lung cancer. Cigarettes contain several known carcinogens – compounds that are known to promote the development of cancer. These carcinogens include chemicals like polycyclic aromatic hydrocarbons, aldehydes and benzene, and they are present both in the particulate and gas phases [American Cancer Society 2009; Lodovici and Bigagli 2009]. As a result, both smokers and people exposed to secondhand smoke demonstrate an increased likelihood of developing cancer [American Cancer Society 2009; Lodovici and Bigagli 2009]. Though lung cancer is the cancer most strongly associated with smoking due to the direct exposure of the lungs to the carcinogen‐containing smoke, other cancers have also been linked with smoking. These include but are not limited to cancers of the nose, oral cavity and esophagus as well as cancers of organs that are not directly exposed to smoke, such as the liver, kidney and colon [American Cancer Society 2009; Lodovici and Bigagli 2009].

The mechanism by which smoking and carcinogens increase the risk of cancer in these organs is similar to the mechanisms by which exposure to many other chemicals increases the risk of other cancers. Namely, the carcinogenic chemical compounds in cigarette smoke form adducts with DNA. Once bound in this manner, the DNA can no longer be replicated faithfully, and when cells divide, they may acquire mutations at locations in the genome affected by the bound carcinogens. DNA adducts have been found more frequently in smokers compared with non‐smokers and in cancer patients compared with healthy individuals [Asami, et al. 1997; Lodovici and Bigagli 2009; Tang, et al. 2001; Tang, et al. 1995].

Another known environmental risk factor for cancer is increased age. In the United States, approximately 77 percent of cancers are diagnosed in individuals who are age 55 or older [American Cancer Society 2009]. The mechanism of this influence may also be related to DNA replication and repair. As individuals age, their cells undergo repeated rounds of cell division. With each round, rare errors in DNA replication occur. Cells contain proofreading mechanisms to correct replication errors, but these mechanisms are not completely accurate. Eventually, when a single cell acquires enough mutations, it will become sufficiently deregulated and will grow and divide unchecked. Even though each round of cell division increases the chance a cell will turn cancerous, a process also exists to check this transformation. The telomeres that protect the ends of each chromosome also decay in length during each round of cell division. When they are sufficiently shortened in a cell, the cell should normally undergo senescence (the inability to divide further) or apoptosis (programmed cell death). These processes should also occur if a cell has acquired considerable structural damage to the DNA. However, if the cell has previously acquired mutations that disable these and other cellular checkpoints, the cell may still override its normal functioning to become cancerous. The next section discusses these pathways further in the context of inherited mutations that affect them.


Being overweight or obese is also likely to increase one’s risk of developing most cancers. A meta‐analysis of 89 studies of 20 diseases found that being overweight or obese increased the risk of developing colorectal, kidney, breast, ovarian, endometrial and pancreatic cancer [Guh, et al. 2009]. The only cancers that were not associated with being overweight or obese in this study were esophageal cancer and prostate cancer. Multiple other studies confirm the link between increased weight and colorectal, breast, kidney, endometrial and ovarian cancers [American Cancer Society 2009; Moghaddam, et al. 2007; Schouten, et al. 2008]. The American Cancer Society estimates that this one environmental influence is linked to 14‐20 percent of all cancer‐related mortality [American Cancer Society 2009]. A hypothesized mechanism for this is that increased insulin in overweight individuals causes higher levels of insulin‐like growth factor 1 (IGF‐1) [Fair, et al. 2007; Singh and Rubin 1993]. IGF‐1 is known to activate pro‐growth and anti‐apoptotic pathways in cells, potentially creating a pre‐cancerous state.

One last major category of environmental risk factors concerns diet. Excessive consumption of alcohol is linked to a variety of cancers, including breast cancer, colorectal cancer, oral cancer and esophageal cancer [American Cancer Society 2009; Poschl and Seitz 2004; Purohit, et al. 2005]. One possible mechanism by which heavy alcohol consumption may cause cancer involves a byproduct made during the normal digestion of alcohol. Alcohol dehydrogenase converts ethanol to acetaldehyde, a known carcinogen and inhibitor of the DNA repair machinery [Poschl and Seitz 2004; Purohit, et al. 2005]. Diets high in cooked red meat have also been linked with certain cancers, including colorectal, prostate, breast and pancreatic cancer [Cross and Sinha 2004; Santarelli, et al. 2008]. There are several suggested mechanisms for this relationship, some of which include the possibility that carcinogens might be created when meat is cooked at high temperatures or that excess iron in meat promotes cellular proliferation [Cross and Sinha 2004; Santarelli, et al. 2008]. In contrast, diets high in fruit, vegetables, fish, whole grains and unsaturated fats appear to protect against the development of cancer [La Vecchia 2009; Ma and Chapman 2009]. Unfortunately, it is unknown whether these foods have intrinsic anti‐cancer properties or whether diets which include them tend to be beneficial because they usually exclude potentially harmful foods such as excessive red meat, saturated fats and dairy.

1.2.3 Genetic risk factors

While environmental exposures can produce DNA damage and cancer‐ causing mutations, mutations can also be inherited genetically. Earlier studies searching for cancer‐causing mutations focused on families in which related individuals had similar types of cancer. An example of this is familial adenomatous polyposis (FAP). Individuals with FAP develop hundreds or thousands of polyps in the colon, and some of those polyps often develop into colorectal cancer if left untreated [Gardner 1951]. The mutation causing FAP was thought to be dominant and highly penetrant, affecting many members of families that possessed the mutation [Gardner 1951; Naylor and Gardner 1977]. Linkage studies using families

affected by FAP yielded results pointing to chromosome 5q21‐22 [Bodmer, et al.

1987; Leppert, et al. 1987; Nakamura, et al. 1988]. Causative mutations were later identified in the APC gene in this locus [Groden, et al. 1991; Nishisho, et al. 1991].

In contrast to these early studies, more recent studies have searched for mutations that are less highly penetrant but potentially more common in the population. Breast cancer is the most commonly diagnosed cancer in women, and though some cases appear to be more familial in nature (as in patients carrying mutations in BRCA1 or BRCA2), others appear to be more sporadic. These sporadic cases may develop as a result of a combination of environmental influences and inherited variants that are more common in the population but less penetrant than the mutations in BRCA1 and BRCA2. Examples of this type of variant were found in the CHEK2 and BARD1 genes. Variants in both genes were present in unaffected individuals at a frequency of about one to two percent but were overrepresented in individuals with breast cancer [Karppinen, et al. 2004; Vahteristo, et al. 2001].

Work is ongoing to identify more rare, highly penetrant (i.e. familial) mutations as well as more common, less penetrant variants that may increase cancer susceptibility. Once mutations are identified, it is important to understand the mechanism by which they predispose a cell to becoming cancerous. One way to look at this question is in the context of the biochemical pathways in the cell that are most commonly affected by cancer. Information on many of these pathways is already well known, and understanding how specific mutations disrupt them is key to understanding and potentially treating cancer.

1.2.3.1 Cell cycle regulation pathway

The cell cycle is composed of five stages – G0 (resting phase), G1 (gap phase

1), S (DNA synthesis), G2 (gap phase 2) and M (mitosis) [Alberts 2002; Deep and

Agarwal 2008; Schwartz and Shah 2005]. Two classes of molecules control the cell’s progress through these stages: cyclins and cyclin‐dependent kinases (CDKs)

[Alberts 2002; Molinari 2000]. Three very important points exist in the cell division process: the G1/S checkpoint, the G2/M checkpoint and the spindle checkpoint. The

G1/S checkpoint, also called the restriction checkpoint, is where it is determined whether the cell will remain quiescent or attempt to divide. Major regulators of this checkpoint include p16(INK4), CDK4, cyclin D, p27(KIP1), p21, RB, E2F, cyclin E and

CDK2 [Alberts 2002; Molinari 2000]. The G2/M checkpoint determines whether the cell can enter mitosis, and its key players include cyclin B, CDC2, , MYT1,

CDC25B and CDC25C [Alberts 2002; Molinari 2000]. Lastly, the spindle checkpoint controls accurate cell division at the metaphase stage of mitosis. The major regulators of this checkpoint include CDC2, cyclin B, the anaphase promoting complex (APC), BUB3, MAD1L1, BUB1B, MAD2L1, MAD2L2 and CDC20 [Alberts

2002; Molinari 2000]. Another group of proteins responsible for sensing and repairing DNA damage interacts with the above‐mentioned proteins during several checkpoints, and this will be discussed in the next section.

One of the key hallmarks of cancerous cells is that they are able to divide excessively [Hanahan and Weinberg 2000]. Two potential causes for this are escape

from apoptosis and increased sensitivity to growth factor signaling, both of which are discussed in upcoming sections [Hanahan and Weinberg 2000]. However, another cause is inherited or acquired mutation in genes encoding some of the key cell signaling proteins mentioned above. For example, mutations in both the CDK4 gene and the CDKN2A gene, which encodes the p16(INK4) protein, have been associated with familial forms of melanoma [Gruis, et al. 1995; Miller and Mihm

2006; Soufir, et al. 1998; Zuo, et al. 1996]. A polymorphism in CCND1, which encodes cyclin D1, was found to increase susceptibility to colorectal cancer in

Caucasians [Kong, et al. 2001; Le Marchand, et al. 2003]. Perhaps the best‐known example of cell cycle gene mutations leading to cancer is the RB1 gene, which encodes the RB protein. This gene was originally mapped in families with retinoblastoma, a rare childhood cancer of the eye [Canning and Dryja 1989;

Sparkes, et al. 1983]. These and other links between cancers and cell cycle genes are summarized in Table 1.1.

1.2.3.2 DNA damage response pathway

As discussed above, the process of cell division contains several checkpoints where it is decided whether the cell will continue to divide, become quiescent, become senescent or undergo apoptosis. One of the key criteria that must be met for the cell cycle to proceed is that the cell’s DNA be intact [Alberts 2002; Strachan and Read 2004]. As such, proteins involved in sensing and repairing DNA damage

and interfacing with the cell cycle regulatory proteins are important in cell division, and by extension, in cancer.

There are several distinct mechanisms for DNA repair, and each mechanism is used for a specific type of damage. For single‐stranded lesions, the cell can use direct repair (mediated by O6‐methylguanine‐DNA methyltransferase), base excision repair (BER, mediated by different DNA glycosylases, AP endonuclease and phosphodiesterase), nucleotide excision repair (NER, mediated by the XPA‐XPG proteins, CSA, CSB, ERCC1, RPA and RAD23) or mismatch repair (mediated by MSH proteins, MLH proteins and PMS proteins) [Alberts 2002; Charames and Bapat

2003; Strachan and Read 2004]. Several of these processes (including base excision repair, nucleotide excision repair and mismatch repair) involve the removal of a stretch of DNA affected by the damage, and as a result, DNA polymerases and ligases are also required to fill in the resulting gap [Charames and Bapat 2003; Strachan and Read 2004]. In response to double‐stranded breaks, the cell can utilize either non‐homologous end‐joining (NHEJ) or homologous recombination (HR) to attempt to repair the damage. Proteins involved in these processes include NBS1, XRCC4, MRE11, RAD50, p200, p400, Ku70, Ku80, DNA‐PKcs, Artemis, DNA ligase IV, X family DNA polymerases, XLF, RAD51, WRN, BLM and GEN1 [Hartlerode and Scully 2009; Strachan and Read 2004].

Some of the proteins involved with homologous recombination also play a role in another form of DNA repair ‐ telomere maintenance. Since telomere length cannot be fully maintained during conventional DNA replication, alternate

mechanisms exist to make sure the ends of the chromosomes remain protected. The best‐known mechanism involves telomerase, an RNA‐protein hybrid enzyme that lengthens telomeres when activated [Alberts 2002; Strachan and Read 2004]. Another method is the alternative lengthening of telomeres (ALT) mechanism, which uses homologous recombination to copy the repetitive telomere sequence across chromosomes to extend the overall length [Bollmann 2007; Cesare and Reddel

2008]. When these two mechanisms are inactive, telomeres shorten with each cell division. Their eventual loss also means the loss of the telosome/shelterin complex that binds the telomeres and acts as a protective cap on the chromosome ends [Palm and de Lange 2008; Xin, et al. 2008]. Proteins in this complex include TRF1, TRF2,

POT1, RAP1, TIN2 and TPP1 [Xin, et al. 2008]. When this cap is gone, the cell perceives chromosome ends as double‐stranded breaks, and as a result, the cell either attempts to repair the damage by fusing chromosome ends together via non‐homologous end‐joining or undergoes apoptosis [Palm and de Lange 2008; Xin, et al.

2008]. The eventual death of cells in this manner is considered normal, but cancerous cells may prolong their lives by activating telomerase or the ALT pathway to prevent the eventual uncapping of the chromosome ends [Bollmann 2007;

Hanahan and Weinberg 2000].

If DNA damage occurs and the cell is unsuccessful at repairing it, several key proteins are responsible for signaling the damage so that the cell will not divide.

Some of the first molecules to recognize and bind to sites of DNA damage include the

9‐1‐1 complex (composed of Rad9, Hus1 and Rad1) and Rad17 [Meek 2009; Niida

and Nakanishi 2006; Parrilla‐Castellar, et al. 2004]. This then facilitates the recruitment of ATM and ATR, two kinases that play critical signaling roles in the

DNA damage response [Niida and Nakanishi 2006; Parrilla‐Castellar, et al. 2004].

ATM interacts with other key proteins, including NBS1, MRE11 and RAD50, while

ATR forms a complex with ATRIP [Meek 2009; Niida and Nakanishi 2006]. ATR then phosphorylates CHK1, which interfaces with the cell cycle machinery and leads to cell cycle arrest [Meek 2009; Niida and Nakanishi 2006]. In cases of double stranded breaks, the ATM complex phosphorylates H2AX, which then becomes

γH2AX and binds to the damaged DNA, marking it as a site in need of repair

[Hartlerode and Scully 2009; Niida and Nakanishi 2006]. Other facilitator proteins then bind the γH2AX‐marked double stranded breaks, including MDC1, 53BP1,

BRCA1, BARD1, RNF8, RNF168, PALB2, BRCA2, DSS1 and ATM [Hartlerode and

Scully 2009]. Both ATM and ATR are responsible for targeting the MDM2 and

MDM4 proteins for destruction, which frees up p53, a protein that has been nicknamed “the guardian of the genome” [Meek 2009; Niida and Nakanishi 2006;

Strachan and Read 2004]. p53 is also phosphorylated and activated by two other substrates of ATM and ATR: CHK1 and CHK2 [Meek 2009]. When p53 is activated, transcription of its target genes, including p21, PUMA and PIG3, is increased, leading to cell cycle arrest or apoptosis [Meek 2009; Niida and Nakanishi 2006].

The DNA damage response is highly complex, requiring dozens of molecules to interact correctly in order to sense, repair and report damage that could be detrimental to the cell. Failure to sense or repair mutations could lead to the

acquisition of mutations in oncogenes or tumor suppressors. Additionally, failure to halt cell division when this damage occurs would result in these mutations being passed on to cells newly created through proliferation. For these reasons, many of the genes encoding proteins involved in the DNA damage response have been considered as candidate genes for various cancers, and many cancer‐causing inherited mutations have been identified. As a specific example, mutations in BRCA1 and BRCA2 have long been known to be associated with familial forms of breast and ovarian cancer [Easton, et al. 1993; Wooster, et al. 1995]. Deleterious mutations in these genes often result in truncated forms of their encoded proteins [Casey 1997].

A severely truncated form of the BRCA1 protein would not be able to form any of the three complexes through which it mediates critical processes like DNA repair, cell cycle control and genomic stabilization [Wang, et al. 2009]. Another example of a

DNA damage response protein implicated in cancer is MYH, encoded by the MUTYH gene. MYH is a DNA glycosylase that removes adenines from A/C and A/G mismatches [McGoldrick, et al. 1995]. When the MUTYH gene is mutated, there is an increased likelihood that mutations will be acquired during DNA replication, and this can lead to cancer if the mutations occur in a tumor suppressor or oncogene [Al‐

Tassan, et al. 2002]. A multitude of other cancer‐causing mutations in DNA repair genes exist, and many of them can be found in Table 1.2.

1.2.3.3 Cell death pathways

Following DNA damage, errors in the cell cycle or unfavorable extracellular conditions, the normal result is for cells to die. There are three major methods by which cell death can occur, and the method that is used depends on a variety of factors, including the extent of any physical damage and the availability of nutrients.

These three methods are apoptosis, autophagy and necrosis [Alberts 2002; Hotchkiss, et al. 2009]. Of the three, apoptosis is the best understood. Apoptosis is initiated when a cell experiences DNA damage or other stressors, such as gamma radiation [Alberts 2002; Hotchkiss, et al. 2009]. Cells of the immune system, namely cytotoxic T cells and natural killer (NK) cells, also induce cell death through apoptosis [Hotchkiss, et al. 2009; Kufe, et al. 2003]. In contrast, autophagy is mainly initiated due to certain conditions in the extracellular environment, such as nutrient deprivation. Autophagy occurs under these circumstances because it allows the cells to obtain additional nutrients (therefore avoiding death) by recycling unnecessary cellular components [Hotchkiss, et al. 2009]. Lastly, necrosis mainly occurs in response to rapid injury or nutrient deprivation, where the cell does not have sufficient time to undergo apoptosis or autophagy [Alberts 2002; Hotchkiss, et al. 2009]. All of these cellular situations are relevant during tumorigenesis, and it is therefore important to understand these processes and how inherited variation in the relevant genes impacts the development of cancer.

Undergoing apoptosis is a cell’s last line of defense against many of the circumstances mentioned in the previous two sections that would otherwise

promote tumorigenesis. Depending on the circumstances that have put the cell in jeopardy, it can use one of two major pathways to undergo apoptosis – the mitochondria‐mediated intrinsic pathway and the death‐receptor mediated extrinsic pathway [Hotchkiss, et al. 2009]. The intrinsic pathway is generally invoked by DNA damage, radiation or free radicals [Hotchkiss, et al. 2009]. As mentioned in the section on DNA damage, p53 can be activated in these circumstances, leading to upregulation or activation of pro‐apoptotic proteins like

PUMA. PUMA and two other pro‐apoptotic proteins, BIM and BID, inhibit anti‐apoptotic proteins like BCL2 and BCL‐XL [Hotchkiss, et al. 2009]. Following inhibition of these anti‐apoptotic proteins, additional pro‐apoptotic proteins BAX and BAK are freed, which then increase permeability of the mitochondrial membrane and cause the release of cytochrome c [Chowdhury, et al. 2008;

Hotchkiss, et al. 2009]. This results in the formation of the apoptosome, a large complex including Apaf‐1 and activated caspase‐9 [Alberts 2002; Chowdhury, et al.

2008; Hotchkiss, et al. 2009]. Caspase‐9 cleaves and activates caspase‐3 and caspase‐7, which initiate the death cascade [Alberts 2002; Chowdhury, et al. 2008;

Hotchkiss, et al. 2009]. In the death cascade, nuclear lamins are cleaved and DNAse is freed in order to fragment the cell’s DNA [Alberts 2002; Hotchkiss, et al. 2009].

The death cascade is also activated via caspase‐3 in the extrinsic pathway. This occurs when death ligands like Fas ligand, TNF and TRAIL bind to their corresponding receptors, which then recruit FADD, an adaptor protein. FADD then recruits and

activates caspase‐8, which activates caspase‐3, leading to the death cascade [Alberts

2002; Hotchkiss, et al. 2009].

Autophagy and necrosis are discussed less often than apoptosis, but they are still important methods for cell death and are therefore relevant to cancer. There are three related forms of autophagy, and all involve the digestion of cellular components in lysosomes [Hotchkiss, et al. 2009]. In macroautophagy, the process is regulated by mTOR, a signaling protein that is responsive to changes in nutrient availability [Corcelle, et al. 2009]. Members of the Atg protein family are then involved with initiating and elongating a membrane, which, when complete, is called an autophagosome [Corcelle, et al. 2009]. The cargo enveloped by the autophagosome is digested when the autophagosome docks and fuses with a lysosome [Corcelle, et al. 2009; Hotchkiss, et al. 2009]. This recycles the cargo to meet the cell’s nutrient needs. In contrast with autophagy, in which lysosomal destruction of cellular components is controlled, necrosis involves an uncontrolled digestion of components by lysosomal enzymes that have been released into the cytosol [Hotchkiss, et al. 2009]. Necrosis usually occurs in situations where the cell has not had sufficient time to initiate autophagy or apoptosis. These include sudden heat or cold shock, acute hypoxia, and severe nutrient deprivation [Hotchkiss, et al.

2009]. The process is mediated by PARP, which is inactive in apoptosis but active in necrosis. Its activity depletes intracellular ATP stores [Hotchkiss, et al. 2009]. During necrosis, calcium ions also flow into the cell, activating calcium‐dependent proteases [Hotchkiss, et al. 2009]. In the last stages of

necrosis, the plasma membrane ruptures and the intracellular contents are released

[Alberts 2002; Hotchkiss, et al. 2009].

Understanding these three major pathways of cell death – apoptosis, autophagy and necrosis – is important to the understanding of cancer. Many genes involved in these processes have been implicated in cancer, although most catalogued variants in these genes have been identified as having been acquired somatically during tumorigenesis, not inherited. This may be due to the critical importance of processes like apoptosis – disruptive de novo mutations affecting apoptosis could result in embryonic lethality because of the critical role apoptosis plays in processes like limb patterning. Nevertheless, some inherited cancer‐related variations have been identified in cell death pathway members, and some of these are summarized in Table 1.3. For example, variants in the Fas ligand and receptor, involved in the extrinsic apoptosis pathway, have been implicated in lung cancer in

Chinese individuals [Zhang, et al. 2005]. In this study, a promoter variant in FAS caused decreased Fas expression and an increased risk of lung cancer, while another promoter variant in FASLG caused increased Fas ligand expression and increased lung cancer risk [Zhang, et al. 2005]. Two interesting cancer‐related variants have been identified in the gene encoding caspase‐8, an important mediator of the intrinsic and extrinsic apoptosis pathways. One of these variants appears to protect against breast cancer in individuals of European descent while the other appears to protect against lung cancer in Han Chinese individuals [Cox, et al. 2007; Sun, et al.

2007]. No cancer‐predisposing variants in CASP8 have been identified at the time of

this writing. As research progresses, there will certainly be more links identified between the three major cell death processes, genetics and cancer.

1.2.3.4 Growth signaling pathways

Cancer is a disease of uncontrolled cell growth, so it is important to understand the mechanisms by which tumor cells induce, maintain and sustain their growth. As mentioned in the previous section, cells that grow quickly and deplete their nutrient supplies may end up undergoing autophagy or another form of cell death as a result. Sustaining a nutrient‐rich environment is therefore critical to the growth of tumors, and this is mainly achieved through angiogenesis [Hanahan and

Weinberg 2000]. Angiogenesis can be initiated under conditions of hypoxia, which can occur when tumor cells are quickly dividing and depleting local oxygen supplies

[Alberts 2002; Kufe, et al. 2003]. In these conditions, the normal degradation of the

HIF‐1α protein is inhibited, and as levels of HIF‐1α increase, HIF‐1α dimerizes with

HIF‐1β and translocates to the nucleus, where it binds co‐activators CBP and p300 and activates transcription of pro‐angiogenic genes like VEGF‐A [Semenza 2009].

The VEGF protein family includes several proteins that assist in angiogenesis under different circumstances. When VEGF‐A is secreted from cells, it binds to VEGF receptors in endothelial cells, which initiates signaling cascades that result in the recruitment of new blood vessels to the cells in need of oxygen and nutrients

[Alberts 2002; Kufe, et al. 2003].

Some of the same signaling cascades that are activated in endothelial cells by pro‐angiogenic factors are also activated in tumor cells. In general, these are pathways that are involved in cell growth and division. The pathways are usually stimulated when growth factors in the extracellular environment bind to transmembrane receptor tyrosine kinases (RTKs) like FGFR, EGFR and IGFR [Alberts 2002; Kufe, et al. 2003]. Ligand binding to these receptors can activate several key intracellular signaling cascades. An example of one such cascade is the RAF/MEK/ERK signaling pathway. ERK is a mitogen‐activated protein kinase (MAPK), so this pathway is sometimes called the MAPK pathway. However, other MAPK proteins exist in the cell, so the more specific RAF/MEK/ERK name will be used here [Garrington and

Johnson 1999; Sebolt‐Leopold and Herrera 2004]. Pro‐growth actions of this pathway include promoting cell cycle progression by controlling cyclin D1 levels and acting against apoptosis by inactivating BAD [Sebolt‐Leopold and Herrera

2004]. The activation of this pathway starts when ligands bind to RTKs, causing the recruitment of adapter proteins GRB2 and SOS to the RTK and the subsequent recruitment and activation of Ras. Ras then binds RAF‐1 and promotes its activation. RAF‐1 then proceeds to phosphorylate and activate the two MEK isoforms (MEK1 and MEK2), which phosphorylate and activate ERK1 and ERK2.

ERK1/2 can then translocate to the nucleus and assist in controlling expression of other growth‐related genes. Several other signaling pathways contribute to cell growth, including other MAPK pathways, the NF‐κB pathway, the Notch signaling pathway and WNT/β‐catenin signaling. However, in addition to the RAF/MEK/ERK

pathway, there is one other pathway that is most commonly associated with cell growth – the phosphatidylinositol 3‐kinase (PI3K)/AKT/mechanistic target of rapamycin (mTOR) pathway, which will be discussed in greater detail in the next section.

As can be predicted, inherited variation in genes involved in cell growth and angiogenesis can be linked to cancer. Some of these are summarized in Table 1.4.

The VHL gene was first mapped as a tumor suppressor mutated in Von Hippel‐Lindau syndrome, a disease that is frequently associated with the development of hemangioblastomas, renal cell carcinomas and pheochromocytomas. These tumors tend to overproduce VEGF and are highly vascularized [Liang, et al. 2007]. Based on this evidence, experiments were performed to investigate a link between VHL and

VEGF, and it was shown that normal VHL inhibits production of VEGF under normoxic conditions but that mutant VHL allows constitutive VEGF production

[Iliopoulos, et al. 1996]. Since VHL also functions to stabilize and activate p53, its loss or mutation is very likely to promote tumorigenesis [Roe, et al. 2006]. Other cancer‐related angiogenesis genes include the co‐activators of HIF‐1α, p300 and

CBP, which are mutated in patients who have Rubinstein‐Taybi syndrome and are more likely to develop a variety of cancers [Miller and Rubinstein 1995; Petrij, et al.

1995; Roelfsema, et al. 2005]. Genes in pro‐growth pathways are also occasionally found to be associated with predisposition to cancer. The PDGFRA gene, encoding

PDGFRα, is mutated in some familial cases of gastrointestinal stromal tumors

(GISTs), and somatic mutations in the gene have also been identified in sporadic

cases of GISTs [Chompret, et al. 2004; Heinrich, et al. 2003]. More information on inherited mutations in pro‐growth pathway members will be found in the next section.

1.3 The PI3K/AKT/mTOR signaling pathway

The PI3K/AKT/mTOR signaling pathway is one of the main pro‐growth signaling pathways. When stimulated, it generally promotes cell growth, differentiation and cell division. As a brief summary of the mechanism of the pathway, phosphoinositide 3‐kinase (PI3K) is initially stimulated by the binding of growth factors to transmembrane receptors. PI3K then phosphorylates phosphatidylinositol‐4,5‐bisphosphate (PIP2) to create PIP3. This lipid binds to other signaling molecules like PDK1 and AKT and recruits them to the plasma membrane, where PDK1 phosphorylates and activates AKT. Mechanistic target of rapamycin (mTOR) complex 2 also phosphorylates and activates AKT, which then goes on to phosphorylate certain mTOR complex 1 inhibitors, resulting in the activation of mTOR complex 1. An activated mTOR complex 1 can then phosphorylate downstream effectors like p70 S6 kinase and 4E‐BP1, ultimately causing increases in protein translation and the promotion of cell division. A more detailed overview of the pathway is shown in Figure 1.2. Given the pathway’s importance in cell growth and division, it is unsurprising that some germline and somatic variants in pathway members promote the development of cancer. The

genes, proteins and mutations that are important to understanding this pathway will be discussed in this section.
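
The order of events summarized above can also be written down as a small signed graph, which makes it easier to see why AKT activity ultimately promotes mTOR complex 1 even though AKT acts on inhibitors. The following Python sketch is only an illustration of the text, not a model used in this work; the node labels (for example "mTORC1_inhibitors") and the function name net_effects are assumptions introduced here, and a +1 edge simply means that the signal is passed on (by activation or phosphorylation) while a -1 edge means inhibition.

```python
from collections import deque

# Signed edges taken from the pathway summary in the text.
# +1 = the upstream member passes the signal on (activation/phosphorylation),
# -1 = the upstream member inhibits the downstream member.
# Node labels are illustrative names, not official gene symbols.
EDGES = {
    "growth_factor_receptor": [("PI3K", +1)],
    "PI3K": [("PIP3", +1)],                 # PI3K phosphorylates PIP2 to create PIP3
    "PIP3": [("PDK1", +1), ("AKT", +1)],    # PIP3 recruits both to the membrane
    "PDK1": [("AKT", +1)],
    "mTORC2": [("AKT", +1)],
    "AKT": [("mTORC1_inhibitors", -1)],
    "mTORC1_inhibitors": [("mTORC1", -1)],
    "mTORC1": [("p70S6K", +1), ("4E-BP1", +1)],
}


def net_effects(source):
    """Breadth-first walk; returns {node: net sign} for every member downstream of source."""
    effects = {}
    queue = deque([(source, +1)])
    while queue:
        node, sign = queue.popleft()
        for target, edge_sign in EDGES.get(node, []):
            if target not in effects:
                effects[target] = sign * edge_sign
                queue.append((target, sign * edge_sign))
    return effects


if __name__ == "__main__":
    for node, sign in net_effects("PI3K").items():
        print(f"{node}: {'promoted' if sign > 0 else 'suppressed'}")
```

In this sketch the two consecutive inhibitory edges cancel, so stimulation of PI3K is reported as suppressing the mTOR complex 1 inhibitors and promoting mTOR complex 1 and its effectors, in agreement with the description above.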

1.3.1 Growth factor receptors

Since the PI3K/AKT/mTOR pathway is known to be a growth‐promoting signaling pathway, it follows logically that the initial stimuli to the pathway come from growth factor receptors. Two classes of receptors have the ability to signal through PI3K/AKT/mTOR: receptor tyrosine kinases (RTKs) and G protein‐coupled receptors (GPCRs). Each of these classes is associated with a particular class of

PI3K, which will be discussed further in the next section; RTKs are associated with class IA PI3Ks, and GPCRs are associated with class IB PI3Ks [Hawkins, et al. 2006;

Katso, et al. 2001; Liu, et al. 2009]. Though there exists a wide array of GPCRs that can be activated by many different ligands, this class of receptors is generally considered less relevant to the overall study of the PI3K/AKT/mTOR pathway, mainly because the class IB PI3Ks through which it signals are only expressed in a few tissue types [Katso, et al. 2001; Liu, et al. 2009]. Nevertheless, some information is still known about GPCR‐PI3K signaling. For example, GPCRs that bind complement component 5a in neutrophils are known to activate PI3K signaling, as do GPCRs that bind chemokines [Hawkins, et al. 2006; Koyasu 2003].

GPCRs can activate PI3K/AKT/mTOR signaling directly, through Gβγ subunits, or indirectly, by transactivating RTKs [New and Wong 2007]. When ligands bind to their corresponding GPCRs, Gβγ subunits are freed to interact with

other signaling proteins [Bonacci, et al. 2006]. One such molecule is the well‐characterized class IB PI3K, a dimer composed of the catalytic subunit p110γ and the regulatory subunit p101. p101 binds with Gβγ, resulting in the activation of p110γ and the initiation of PI3K signaling [Liu, et al. 2009]. GPCR stimulation can also result in the activation of metalloproteases, which are able to cleave growth factor‐like precursors. These precursors can bind to RTKs like EGFR and HER2 and subsequently activate the PI3K/AKT/mTOR cascade through class IA PI3Ks

[Gschwind, et al. 2003; Schafer, et al. 2004].

RTKs are transmembrane proteins that, upon ligand binding, phosphorylate tyrosine residues located either on the protein itself or on an adapter protein. These residues are located in activation motifs of the form Y‐X‐X‐M [Hawkins, et al. 2006].

When those specific tyrosines are phosphorylated, they are then able to recruit SH2 domain‐containing molecules [Hawkins, et al. 2006]. Such molecules include the regulatory subunits of class IA PI3Ks. This recruitment of PI3Ks to the membrane by RTKs initiates PI3K/AKT/mTOR signaling. Another method by which RTKs might activate PI3K involves the protein Ras. GTP‐bound Ras is produced as a result of the activation of several different RTKs, and it appears to be able to bind and possibly activate the p110 catalytic subunit of class IA PI3Ks [Hawkins, et al. 2006].

A handful of mutations, amplifications and other changes in growth factor receptor genes have been associated with cancer. As mentioned in section 1.2.3.4, there are a few known cancer‐related inherited variants in PDGFRA and PDGFB, the

PDGF receptor and ligand, respectively. However, these variants are very rare in the

population. In general, it is much more common for growth factor receptor genes to acquire somatic mutations or to undergo amplification as part of the tumorigenesis process. EGFR is one example of a growth factor receptor gene that is mutated and amplified in a wide array of cancers. Lung cancers in particular demonstrate a high frequency of EGFR mutation and amplification [Forbes, et al. 2009; Hirsch, et al.

2008]. EGFR amplification is also found in breast cancer, though amplification of another EGFR family member, ERBB2, is more commonly associated with breast cancer. ERBB2 encodes a protein also known as HER2, and it is amplified in approximately 18‐20 percent of all breast cancers [Penault‐Llorca, et al. 2009].

Amplifications and activating mutations in these and other growth factor receptors can lead to increased signaling downstream of the receptors – either through the

PI3K/AKT/mTOR pathway or other growth‐related pathways like the

RAF/MEK/ERK pathway. This may produce a stronger impetus for the cell to grow and avoid apoptosis, and as a result, the cell may turn cancerous. Further details on mutations and amplifications in growth factor receptors can be found in Table 1.5.

1.3.2 PI3Ks and PTEN

PI3Ks and their counterpart, PTEN, are just downstream of the growth factor receptors in this signaling cascade. There are three classes of PI3K, distinguished from each other by structural and functional characteristics. The most well‐characterized class is class I, which is further subdivided into class IA and class IB

PI3Ks [Hawkins, et al. 2006]. Both of these classes are heterodimeric molecules, and

the major function of both classes is to convert phosphatidylinositol‐4,5‐bisphosphate (PIP2) into PIP3 by phosphorylating PIP2 [Hawkins, et al. 2006]. The main differences between the classes are their means of activation and the makeup of their heterodimers. As discussed in the last section, class IA PI3Ks are activated by RTKs, while class IB PI3Ks are activated by GPCRs. This difference is due to the different combinations of regulatory subunits that make up the heterodimeric molecule. Class IA PI3Ks contain one of five regulatory subunits (p85α, p55α, p50α, p85β or p55γ) and one of three possible catalytic subunits (p110α, p110β or p110δ)

[Hawkins, et al. 2006; Yuan and Cantley 2008]. The regulatory subunits are encoded by three genes: PIK3R1 (encodes all of the alpha subunits), PIK3R2 and PIK3R3, while the catalytic subunits are encoded by PIK3CA, PIK3CB and PIK3CD [Hawkins, et al. 2006]. In contrast, the class IB PI3Ks are much simpler; there is only one catalytic subunit, p110γ (encoded by PIK3CG and only expressed in a limited subset of tissues) and two possible regulatory subunits, p101 and p84 (encoded by PIK3R5 and PIK3R6 respectively) [Hawkins, et al. 2006; Liu, et al. 2009].
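
The subunit and gene relationships described above can be collected into a small lookup table. The Python sketch below is illustrative only; the names REGULATORY, CATALYTIC, heterodimers and genes_for are hypothetical, and the enumeration assumes that any regulatory subunit of a class can pair with any catalytic subunit of the same class, which the text implies but does not state explicitly.

```python
from itertools import product

# Subunit -> encoding gene, as listed in the text for the class I PI3Ks.
REGULATORY = {
    "IA": {"p85α": "PIK3R1", "p55α": "PIK3R1", "p50α": "PIK3R1",
           "p85β": "PIK3R2", "p55γ": "PIK3R3"},
    "IB": {"p101": "PIK3R5", "p84": "PIK3R6"},
}
CATALYTIC = {
    "IA": {"p110α": "PIK3CA", "p110β": "PIK3CB", "p110δ": "PIK3CD"},
    "IB": {"p110γ": "PIK3CG"},
}


def heterodimers(pi3k_class):
    """List every (regulatory, catalytic) pairing within one class.

    Assumption: all within-class pairings are possible.
    """
    return list(product(REGULATORY[pi3k_class], CATALYTIC[pi3k_class]))


def genes_for(regulatory, catalytic):
    """Return the (regulatory gene, catalytic gene) pair encoding a heterodimer."""
    for pi3k_class in REGULATORY:
        if regulatory in REGULATORY[pi3k_class] and catalytic in CATALYTIC[pi3k_class]:
            return REGULATORY[pi3k_class][regulatory], CATALYTIC[pi3k_class][catalytic]
    raise KeyError(f"{regulatory}/{catalytic} is not a within-class pairing")


if __name__ == "__main__":
    for cls in ("IA", "IB"):
        print(cls, len(heterodimers(cls)), "possible heterodimers")
```

Under that pairing assumption, five regulatory and three catalytic subunits give fifteen possible class IA heterodimers, while class IB allows only two.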

While the catalytic subunits of the class IA and class IB PI3Ks are somewhat structurally similar, there are major differences in the structures of the regulatory subunits. All of the class IA regulatory subunits possess SH2 domains, enabling them to interact with phosphorylated tyrosine residues associated with RTKs, while none of the class IB subunits have these [Hawkins, et al. 2006]. The dissimilarity in the regulatory subunit structures and the similarity in the catalytic subunit

structures explain how these classes are activated in different ways, yet still perform the same enzymatic function.

Both types of class I PI3Ks differ greatly in structure and function from the class II and class III PI3Ks, which are associated with cellular processes like endocytosis and vesicle trafficking, not with [Hawkins, et al.

2006; Hennessy, et al. 2005]. Due to their differing functions, the class II and III

PI3Ks are not thought to play significant roles in cancer and will not be discussed in depth here. Any further mention of PI3Ks will refer only to class I PI3Ks unless otherwise specified.

Cancer‐related mutations and amplifications have been identified in two genes that encode PI3K components. One of these genes is PIK3R1, the gene that encodes the p85α, p55α and p50α regulatory subunits of class IA PI3Ks. Approximately 10 percent of central nervous system cancers (mostly glioblastomas) have somatic mutations in PIK3R1 [Forbes, et al. 2009]. Somatic mutations have also been identified in PIK3CA in a strikingly wide array of cancers, and in some cancer types, these mutations are quite common. For example, more than 20 percent of breast, urinary tract and endometrial cancers contain somatic mutations of PIK3CA [Forbes, et al. 2009]. PIK3CA is also amplified in more than 20 percent of gastric, head and neck and lung cancers [Liu, et al. 2009]. These mutations and amplifications generally lead to increased PI3K/AKT/mTOR signaling, resulting in a pro‐growth signal to the cell.

PTEN plays an important role in opposing the function of PI3Ks. PTEN performs the reverse of the reaction performed by PI3Ks; that is, it dephosphorylates PIP3 to create PIP2. Another lipid phosphatase, SHIP, is capable of performing the same function, but PTEN is much better characterized [Katso, et al. 2001]. A lipid‐binding domain normally localizes PTEN to the plasma membrane, where it carries out its phosphatase activities and counteracts the PI3K/AKT/mTOR signaling pathway [Carracedo and Pandolfi 2008]. As would be expected, mice that are lacking two functional copies of Pten have higher PIP3 levels, and their cells are more proliferative, often leading to tumors [Katso, et al. 2001].

Studies of human cancers have affirmed the connection that was found in mouse studies between lack of functional Pten and cancer. PTEN was originally identified as a positional candidate gene in a region of the genome that underwent frequent loss of heterozygosity (LOH) in several types of cancer [Li, et al. 1997]. In fact, it may be mutated in approximately 40 percent of endometrial cancers and 20 percent of glioblastomas [Forbes, et al. 2009]. Loss of heterozygosity at the PTEN locus is also quite common in melanomas, glioblastomas and prostate cancer. It is important to note that somatic mutations are not the only alterations in PTEN known to be associated with cancer. There are also several rare inherited mutations in PTEN that result in syndromes involving frequent tumor growth. One of these is

Cowden syndrome, which is characterized by a highly increased predisposition to breast, thyroid and endometrial cancers [Blumenthal and Dennis 2008]. Lhermitte‐

Duclos disease, Bannayan‐Riley‐Ruvalcaba syndrome and form

the larger “umbrella” group of PTEN hamartoma tumor syndromes [Blumenthal and Dennis 2008]. Loss of PTEN activity, whether by inherited or acquired mutation or loss of heterozygosity, means the loss of the major negative regulator of PI3K/AKT/mTOR signaling. As a result of this, signaling through this pathway and the cell growth it promotes go unchecked, promoting tumorigenesis.

1.3.3 AKT

AKT, also known as protein kinase B (PKB), lies downstream of PI3K in the

PI3K/AKT/mTOR pathway. In humans, there are three AKT proteins (AKT1, AKT2 and AKT3) that are part of the larger AGC kinase family [Bellacosa, et al. 2005].

Knockout mouse models have indicated that the functions of these isoforms are somewhat distinct. Specifically, AKT1 is thought to be primarily involved in growth and cell survival, AKT2 appears to be mainly involved in glucose homeostasis, and

AKT3 may be important for brain development [Gonzalez and McGraw 2009]. Due to these functional differences, cancer research usually focuses on AKT1, though there is some evidence that AKT2 and AKT3 are also relevant to the disease.

Structurally, all three AKT proteins have a pleckstrin homology (PH) domain near the N‐terminus and a kinase domain and a hydrophobic regulatory motif near the C‐terminus [Bellacosa, et al. 2005]. The PH domain is one of the main ways through which PI3K activates AKT. This domain binds to PIP3, so when more PIP3 is produced in the plasma membrane as a result of PI3K activation, AKT translocates to the membrane [Bellacosa, et al. 2005]. Upon recruitment to the membrane, the

conformation of AKT changes, revealing two residues on which it can be phosphorylated: T308 and S473 (residue numbers correspond to AKT1) [Hennessy, et al. 2005]. Another PH domain‐containing protein, PDK1, is also recruited to the membrane following PI3K activation. PDK1 is then able to phosphorylate AKT on T308, which is in the activation loop of the kinase domain [Bellacosa, et al. 2005; Hennessy, et al. 2005]. This phosphorylation event is required in order for AKT’s kinase domain to become active [Hennessy, et al. 2005]. However, the activity of AKT can be greatly increased if it is phosphorylated on S473, which is in the hydrophobic domain [Hennessy, et al. 2005]. Recently, it was discovered that mTOR complex 2 (mTORC2) is the kinase that performs this phosphorylation event, fully activating AKT [Franke 2008; Liu, et al. 2009]. Additional proteins are responsible for dephosphorylating AKT. These include the PH domain‐containing phosphatases PHLPP1 and PHLPP2, which dephosphorylate AKT at S473, and PP2A, which dephosphorylates the T308 residue [Qiao, et al. 2008]. A balance between the activity of PDK1/mTORC2 and PHLPP1/PHLPP2/PP2A controls AKT’s kinase activity.
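
As an illustration only, the phosphorylation logic described above can be summarized as a small qualitative model. The Python sketch below is not drawn from any of the cited studies; the names AktState and akt_activity and the three activity labels are hypothetical conveniences, and the model ignores kinetics, isoform differences and any regulator not named in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AktState:
    """Qualitative phosphorylation state of AKT1 (hypothetical toy model)."""
    t308_phosphorylated: bool  # added by PDK1, removed by PP2A
    s473_phosphorylated: bool  # added by mTORC2, removed by PHLPP1/PHLPP2


def akt_activity(state: AktState) -> str:
    """Return an illustrative activity label for a phosphorylation state.

    Assumptions: T308 phosphorylation is required for any kinase activity,
    and S473 phosphorylation raises activity from partial to full.
    """
    if not state.t308_phosphorylated:
        return "inactive"
    if state.s473_phosphorylated:
        return "fully active"
    return "partially active"


if __name__ == "__main__":
    # Enumerate all four combinations of the two phosphorylation sites.
    for t308 in (False, True):
        for s473 in (False, True):
            state = AktState(t308, s473)
            print(state, "->", akt_activity(state))
```

In this simplified view, the kinases PDK1 and mTORC2 and the phosphatases PP2A and PHLPP1/PHLPP2 act only by switching the two flags, which is one way to picture the balance described above.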

AKT is known to phosphorylate dozens of substrates containing the consensus sequence R‐X‐R‐X‐X‐S/T. Some of these substrates are cell cycle regulators like p21, p27 and GSK3β; phosphorylation of GSK3β by AKT inhibits it, which prevents cyclin D1 degradation [Bellacosa, et al. 2005]. AKT also directly regulates proteins involved in apoptosis and survival signaling, including BAD, MDM2, procaspase‐9, PED (an inhibitor of caspase 3) and ASK1 [Bellacosa, et al. 2005; Franke 2008]. The result of increased

AKT activity is to regulate all of these proteins in such a way as to promote cell survival and avoid apoptosis. Another important group of apoptosis proteins is the family of forkhead transcription factors. AKT phosphorylation of these proteins prevents their translocation to the nucleus and subsequent transcription of pro‐ apoptotic genes [Bellacosa, et al. 2005]. Similarly, AKT indirectly promotes the nuclear localization of NF‐κB, which assists in the transcription of pro‐survival genes [Bellacosa, et al. 2005]. Several other AKT substrates are involved in processes that are relevant to tumorigenesis. These include eNOS (involved in angiogenesis), the catalytic component of telomerase (involved in maintaining DNA integrity), GLUT1 and GLUT4 (both involved in ) [Bellacosa, et al. 2005].

AKT also phosphorylates several substrates that are involved in cell growth.

These include PRAS40 and TSC2, a GTPase‐activating protein (GAP) that normally forms a complex with another protein, TSC1 [Bellacosa, et al. 2005; Guertin and Sabatini 2007; Manning and Cantley 2007]. Upon TSC2 phosphorylation, the GAP activity of the TSC1/2 complex is reduced, allowing Rheb, a Ras‐like protein, to remain bound to GTP [Bellacosa, et al. 2005; Guertin and Sabatini 2007; Manning and Cantley 2007]. The GTP‐bound form of Rheb activates mTOR complex 1 (mTORC1), an important cell growth signaling complex that will be discussed further in the next section. Additionally, when PRAS40 is phosphorylated, it relieves its normal inhibition of mTORC1, resulting in further mTORC1 activation [Bellacosa, et al. 2005; Guertin and Sabatini 2007; Manning and Cantley 2007]. Eventually, several signaling proteins downstream of

mTOR exert feedback inhibition on AKT, resulting in tight control of this pathway [Guertin and Sabatini 2007].

AKT is downstream of growth factor receptors and upstream of substrates involved in a wide array of pro‐growth, pro‐division signaling cascades, so it is no surprise that AKT plays a critical role in tumorigenesis. However, its key role in pro‐ growth signaling does not necessarily translate to increased mutation or amplification of the AKT genes in cancer. Somatic AKT1 mutations have been identified in a few types of cancer, but they are somewhat rare. Thyroid cancer is the only cancer type in which AKT1 is mutated in more than 5 percent of cases

[Forbes, et al. 2009]. A recently identified somatic coding change in the PH domain was originally thought to occur somewhat frequently in breast cancer (6 percent), but subsequent analyses of other patient populations have shown that its true prevalence is only about 3 percent [Bleeker, et al. 2008; Carpten, et al. 2007; Liu, et al. 2009]. Common amplifications have been detected in the AKT2 gene, but not in

AKT1 or AKT3. They are present in about 14 percent of ovarian cancers and appear to be even more common in pancreatic and head and neck cancers, though the number of cases of those two cancer types in which AKT2 has been studied is still relatively small [Liu, et al. 2009]. More information on alterations in AKT genes can be found in Table 1.5.

In spite of the low mutation rate of the AKT genes, when AKT activity levels are measured, they are routinely found to be extremely high in many cancer types.

Between 80 percent and 100 percent of thyroid cancers, for example, show

increased AKT activity, as do approximately 50 percent of prostate cancers and 80 percent of gastric carcinomas [Altomare and Testa 2005]. This activity is likely due to changes in the activity of upstream regulators, such as PI3K, PTEN, growth factor receptors and the kinases and phosphatases that regulate AKT phosphorylation.

Since these factors succeed in elevating AKT activity so substantially, one explanation for the rarity of amplifications and mutations is that they are just not necessary for tumorigenesis.

1.3.4 mTOR

mTOR is a 289 kDa pro‐growth signaling protein that is distantly related to

PI3K, but it differs from PI3K in that it is exclusively a protein kinase as opposed to a lipid kinase. It was originally discovered as a homolog to a yeast protein called TOR, an acronym for “target of rapamycin” [Corradetti and Guan 2006]. While yeast have two related proteins, TOR1 and TOR2, mammalian cells have one protein (mTOR) that exists in one of two complexes – mTOR complex 1 (mTORC1) or mTOR complex

2 (mTORC2) [Corradetti and Guan 2006]. mTORC1 consists of mTOR, raptor, mLST8 and PRAS40, while mTORC2 contains mTOR, rictor, mLST8, protor and mSIN1 [Alessi, et al. 2009; Guertin and Sabatini 2007].

As described in the previous section on AKT, mTORC1 becomes activated following AKT activation, while mTORC2 is one of the kinases involved in the initial

AKT activation. The activation of mTORC1 is mediated by the inhibition of the AKT substrate PRAS40 and the activation of Rheb‐GTP, which is induced following the

inhibition of the TSC1/2 complex by AKT [Guertin and Sabatini 2007]. When active, mTORC1 phosphorylates several substrates. The best characterized of these are p70S6K and 4E‐BP1, and they will be described in the next section. In addition to

PI3K/AKT signaling, mTORC1 is also regulated by signaling proteins in other pathways. For example, in hypoxic or nutrient‐deprived circumstances, mTORC1 is inhibited, preventing the cell from growing [Corradetti and Guan 2006; Guertin and Sabatini 2007].

Little was known about mTORC2 before its recent discovery as the kinase responsible for phosphorylating AKT at S473 [Guertin and Sabatini 2007]. It is now known that mTORC2 subunits are phosphorylated in response to growth factor stimulation, but the kinases responsible for these events have not been identified

[Guertin and Sabatini 2007]. However, several substrates of mTORC2 have been identified to date, all of which are members of the AGC kinase family. These include the aforementioned AKT (at S473), protein kinase C alpha (PKCα, at S657) and serum glucocorticoid‐induced protein kinase 1 (SGK1, at S422) [Alessi, et al. 2009].

These substrates are broadly involved in growth signaling and cellular , areas that are important in cancer [Alessi, et al. 2009].

To date, no common mutations or amplifications have been identified in mTOR or the components of either mTORC1 or mTORC2. However, as mentioned in the previous section, some mutations have been observed in AKT, which is an mTORC2 substrate. Also, there are rare inherited mutations in the genes encoding two of the inhibitors of mTORC1 activity – TSC1 and TSC2. These genes were

identified as positional candidates for tuberous sclerosis, a rare, highly‐penetrant disease in which patients develop multiple hamartomas in diverse tissues [Kumar, et al. 1995; van Slegtenhorst, et al. 1997]. Mutations in these genes result in the

TSC1/2 complex’s loss of its ability to inhibit mTORC1, leading to increased downstream signaling activity and cell growth. Though hamartomas are benign, in some cases, patients with mutations in TSC1 or TSC2 develop cancerous tumors

[Shields, et al. 2005].

1.3.5 Downstream effectors of mTOR

mTORC1 and mTORC2 are each known to phosphorylate several downstream substrates. mTORC1 is best known for phosphorylating two proteins involved with the regulation of protein translation – p70

(p70S6K) and 4E‐BP1. Upon phosphorylation of the C‐terminal hydrophobic motif of p70S6K by mTORC1, an additional site is phosphorylated by PDK1, the same enzyme that phosphorylates AKT at T308 [Alessi, et al. 2009; Garcia‐Echeverria and

Sellers 2008]. This activates p70S6K and also causes it to dissociate from the eukaryotic initiation factor 3 (eIF3) translation initiation complex [Corradetti and

Guan 2006; Guertin and Sabatini 2007]. The activated p70S6K then phosphorylates ribosomal protein S6, which acts to increase translation of some mRNAs [Corradetti and Guan 2006]. p70S6K also causes an increase in the rate of protein translation by phosphorylating PDCD4 and causing it to be targeted for degradation [Guertin and Sabatini 2007]. PDCD4 normally interferes with translation by preventing

secondary structures in mRNAs from being unwound, so its degradation leads to a net increase in translation [Guertin and Sabatini 2007]. As previously mentioned, the PI3K/AKT/mTOR pathway undergoes some degree of feedback inhibition, and this is thought to be mediated by p70S6K phosphorylation and inhibition of growth factor receptor‐associated molecules like IRS‐1, which results in the attenuation of the initial stimulus to the PI3K/AKT/mTOR cascade [Guertin and Sabatini 2007].

mTORC1 phosphorylation of 4E‐BP1 releases it from the same eIF3 complex that binds p70S6K. It also inhibits the normal function of 4E‐BP1, which is to inhibit eIF4E, a cap‐dependent initiation factor. As a result, following mTORC1 activation, the inhibition of eIF4E is released, and translation of mRNAs with 7‐methylguanine caps is increased [Corradetti and Guan 2006]. These mTORC1‐dependent responses may be responsible for translating up to 15 percent of the proteins in a given cell, including several proteins that are key regulators of cell growth and division like cyclin D1 and c‐Myc [Corradetti and Guan 2006].
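
The regulatory relationships described in this section and in section 1.3.3 can be collected into a signed graph, in which each edge is marked as activating (+1) or inhibitory (-1) and the net effect along a route is the product of its edge signs. The Python sketch below is a deliberately coarse illustration under that assumption: the names EDGES and path_sign are hypothetical, the edge from IRS‐1 to PI3K is a simplification standing in for growth factor receptor signaling, and no quantitative behavior is implied.

```python
from math import prod

# Signed edges: +1 means the source activates or produces the target,
# -1 means the source inhibits or removes it. Hypothetical simplification.
EDGES = {
    ("PI3K", "PIP3"): +1,
    ("PTEN", "PIP3"): -1,
    ("PIP3", "AKT"): +1,
    ("AKT", "TSC1/2"): -1,
    ("TSC1/2", "Rheb"): -1,
    ("Rheb", "mTORC1"): +1,
    ("AKT", "PRAS40"): -1,
    ("PRAS40", "mTORC1"): -1,
    ("mTORC1", "p70S6K"): +1,
    ("mTORC1", "4E-BP1"): -1,
    ("4E-BP1", "eIF4E"): -1,
    ("p70S6K", "IRS-1"): -1,
    ("IRS-1", "PI3K"): +1,  # assumption: stands in for receptor-level input
}


def path_sign(path):
    """Multiply edge signs along a route; raises KeyError for unknown edges."""
    return prod(EDGES[(src, dst)] for src, dst in zip(path, path[1:]))


if __name__ == "__main__":
    # Two inhibitory steps in a row give a net activating effect.
    print(path_sign(["AKT", "TSC1/2", "Rheb", "mTORC1"]))
    # The loop through p70S6K and IRS-1 has a net negative sign.
    print(path_sign(["PI3K", "PIP3", "AKT", "TSC1/2", "Rheb",
                     "mTORC1", "p70S6K", "IRS-1", "PI3K"]))
```

Under these assumptions, both routes from AKT to mTORC1 are net activating because each contains two inhibitory steps, while the loop through p70S6K and IRS‐1 is net inhibitory, consistent with the feedback inhibition described above.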

mTORC2 is best known for phosphorylating AKT at the S473 residue (see section 1.3.3 on AKT), but it also phosphorylates two related kinases, PKCα and

SGK1. SGK1 is one isoform of the SGK family, and while it had been known that it was phosphorylated in response to growth factor signaling, it was only recently identified as a substrate of mTORC2 [Alessi, et al. 2009]. The molecules downstream of SGK1 are not fully characterized, but there is some information available about several of the substrates. SGK1 phosphorylates a ubiquitin E3 ligase called Nedd4‐2, leading to the ultimate effect of increased sodium transport across

the plasma membrane [Alessi, et al. 2009]. SGK1 is also thought to phosphorylate n‐ myc downstream regulated 1 (NDRG1) and some forkhead transcription factors

[Alessi, et al. 2009]. Unfortunately, some cross‐reactivity has been observed between reagents detecting SGK1 and p70S6K, so a complete and accurate understanding of SGK1 has likely not yet been achieved [Alessi, et al. 2009].

PKCα is a much more extensively characterized protein, though its role as a direct substrate of mTORC2 is somewhat unclear. mTORC2 appears to regulate the stability and phosphorylation of PKCα, but until recently, it was not known whether this was mediated by direct phosphorylation by mTORC2 or by an mTORC2 substrate [Guertin and Sabatini 2007]. However, studies have now shown that mTORC2 phosphorylates PKCα directly at S657 in the turn motif, resulting in the stabilization of the protein [Alessi, et al. 2009]. Like p70S6K, PKCα can also be phosphorylated and activated by PDK1 [Garcia‐Echeverria and Sellers 2008].

Additionally, PKCα is activated by diacylglycerol (DAG) and calcium ions, which function as second messengers in a number of other diverse signaling pathways

[Reyland 2009]. It has been implicated in several diseases, including cardiovascular disease and cancer [Konopatskaya and Poole 2009].

The genes encoding p70S6K, 4E‐BP1, SGK1 and PKCα are not known to be commonly amplified or mutated in cancers [Forbes, et al. 2009]. While this may reflect insufficient data collected on these genes, there is another compelling explanation. Upstream regulators like growth factor receptors, PI3Ks and PTEN are very commonly altered in cancer, and the activity levels of the molecules farther

downstream are usually increased as a result. While tumors may acquire pro‐ growth mutations in some of the genes encoding mTOR substrates, the effects of these mutations would likely not increase the activity of the PI3K/AKT/mTOR pathway if any of the common alterations to upstream regulators have already occurred. Also, while there are dozens of signaling molecules located downstream of PI3Ks and growth factor receptors, the effectors of mTOR (with the exception of

AKT) have relatively few substrates. In the absence of increased activity of upstream regulators, activating mutations in one of these effectors would probably have a pro‐growth effect that is very small or nonexistent. As a result, the cell in which the mutation occurred might not necessarily outcompete its neighbors to form the bulk of a tumor, and the mutation would likely not be detected if a sample of tumor or normal cells were sequenced using most current methods.

1.4 Drugs targeting the PI3K/AKT/mTOR pathway

The importance of the PI3K/AKT/mTOR pathway in growth signaling and its activation in a number of different types of cancer has made it an attractive target for anti‐cancer therapies. The particular molecules targeted by drug development efforts have mainly been the upstream signaling proteins in the pathway – growth factor receptors, PI3Ks, AKT and mTOR. In general, blocking upstream proteins is thought to be a more efficient way of shutting down the entire pathway than blocking intermediate signaling molecules. However, alterations in downstream pathway members may abrogate any effects of drugs targeting upstream pathway

members. Current efforts to develop drugs targeting the PI3K/AKT/mTOR pathway have focused on finding potent, specific and safe inhibitors of pathway members and then identifying which patients would benefit from those inhibitors, either singly or in combination with other therapies. Many drugs have been identified and proven to work in rodent models, and some of those have continued on to clinical trials. A handful of these have been approved for the treatment of cancer. A simplified diagram of the PI3K/AKT/mTOR pathway and some of the drugs targeting its members is shown in Figure 1.3.

1.4.1 Drugs targeting growth factor receptors

As the initiators of growth signaling cascades, growth factor receptors appear to be an excellent target for the treatment of tumors with excessive growth signaling. They are transmembrane proteins, meaning that drug delivery efforts can avoid the problem of transport across the cell membrane by targeting only the extracellular portion of the molecule. Additionally, several cancer types demonstrate common amplifications of these genes (Table 1.5). Proteins that are commonly overexpressed present fewer challenges to target pharmaceutically than those that may be deleted or mutated (often at different sites in different patients).

One successful way of targeting overexpressed growth factor receptors is with a monoclonal antibody (mAb) that binds the receptor, preventing its natural ligand from binding to activate its downstream activity. Two such mAbs are cetuximab (Erbitux) and panitumumab (Vectibix). These drugs target the EGFR

receptor, which is commonly amplified in several cancer types (Table 1.5) [Cheng, et al. 2005; Liu, et al. 2009]. Both drugs are approved for the treatment of metastatic colorectal cancer, and cetuximab is additionally approved for treating squamous cell head and neck cancer [Martinelli, et al. 2009]. Another growth factor receptor that is commonly amplified is HER2, which is overexpressed in about 20 percent of breast cancers (Table 1.5). Trastuzumab (Herceptin) is a mAb that binds to the extracellular portion of the HER2 protein and prevents it from forming dimers with

EGFR, HER2 or ERBB3 (HER3) [Yan, et al. 2008]. Dimerization is required for signaling through this receptor, so by blocking that process, trastuzumab is able to successfully inhibit HER2 signaling [Yan, et al. 2008]. Because of this success, trastuzumab has been approved for the treatment of HER2‐overexpressing breast cancer [Liu, et al. 2009].

An alternate method of targeting growth factor receptors is by using small molecule inhibitors. Two of these are erlotinib (Tarceva) and gefitinib (Iressa), both of which bind to the ATP‐binding domain of EGFR in order to inhibit it [Cheng, et al.

2005; Hennessy, et al. 2005; Liu, et al. 2009]. These drugs have been approved for the treatment of non‐small cell lung cancer. Lapatinib (Tykerb) is another small molecule inhibitor of EGFR, but in addition to that role, it also inhibits HER2. It is similar to erlotinib and gefitinib in its mechanism of action; it binds the ATP‐binding pocket of EGFR and HER2 to prevent the receptors from undergoing autophosphorylation [Nelson and Dolder 2006]. Because of its action against HER2,

lapatinib is also approved for treatment of HER2‐overexpressing breast cancer [Liu, et al. 2009].

Though research on drugs targeting growth factor receptors has already produced several clinically useful products, research in this area is still ongoing.

Several additional mAbs are in clinical development, including ertumaxomab (anti‐

HER2) and nimotuzumab (anti‐EGFR) [Liu, et al. 2009]. Another drug, neratinib, is a small‐molecule inhibitor of EGFR and HER2 that is currently in trials for treatment of breast cancer [Ocana and Amir 2009]. It seems likely that research in the coming years will be able to build off of the wealth of information about currently approved drugs to create new and improved drugs for these targets as well as for other growth receptors.

1.4.2 Inhibitors of PI3Ks

Attempts to create inhibitors of PI3Ks have been underway for over a decade without resulting in a single clinically approved drug. However, recent advances have produced a number of promising compounds, many of which are now in clinical trials. Some of these are based on two compounds that are strong inhibitors of PI3Ks in vitro but have chemical properties that render them unfavorable for administration as a drug: wortmannin and LY294002 [Garcia‐Echeverria and Sellers 2008; Liu, et al. 2009]. Wortmannin is a potent, irreversible inhibitor that interferes with the ability of PI3Ks to bind phosphate. It is dissimilar from

LY294002, which is a reversible inhibitor that competes with ATP for binding to

PI3K [Garcia‐Echeverria and Sellers 2008; Liu, et al. 2009]. Both wortmannin and

LY294002 inhibit all PI3K isoforms, and at higher concentrations, wortmannin can also inhibit related kinases like mTOR [Cheng, et al. 2005]. LY294002 also has off‐ target effects on processes like calcium signaling [Ihle and Powis 2009]. Two modified versions of these drugs are in clinical trials for solid tumors – PX‐866, a wortmannin derivative, and SF1126, a derivative of LY294002. SF1126 is also an inhibitor of mTOR [Liu, et al. 2009].

Attempts are also underway to produce a clinically useful PI3K inhibitor that is unrelated to wortmannin or LY294002. XL‐147 is a drug currently in early stage clinical trials for the treatment of solid tumors, and it is a specific inhibitor of class I

PI3Ks but not class II or III PI3Ks [Garcia‐Echeverria and Sellers 2008; Ihle and

Powis 2009; Liu, et al. 2009]. CAL‐101 is even more specific than XL147, and it is currently in phase I trials for hematologic malignancies [Ihle and Powis 2009; Liu, et al. 2009]. It only inhibits one isoform of PI3K ‐ the class IA isoform containing the catalytic subunit p110δ [Ihle and Powis 2009; Liu, et al. 2009]. Another compound,

PI‐103, was identified as a dual inhibitor of mTOR and of PI3Ks, and it is most potent towards PI3Ks containing the p110α subunit [Liu, et al. 2009]. It showed promising results in mouse models of human tumor xenografts but did not progress to clinical trials due to a poor pharmacologic profile [Garcia‐Echeverria and Sellers 2008; Ihle and Powis 2009]. However, one of its derivatives, GDC‐0941, has already been entered into early stage clinical trials for advanced solid tumors and non‐Hodgkin’s lymphoma [Liu, et al. 2009; Raynaud, et al. 2009]. As the different PI3K isoforms

become better understood and as more specific inhibitors are developed, drug development efforts focusing on these molecules will continue to expand.

1.4.3 Inhibitors of AKT and its regulatory proteins

Inhibition of AKT can occur through several methods, since several different molecules regulate AKT’s activity. For example, inhibiting PI3Ks is generally expected to decrease AKT activity, since that would decrease the levels of PIP3, which AKT needs to translocate to the plasma membrane. Inhibiting PDK1 or the mTORC2 complex is also expected to result in decreased AKT activity, since those molecules are responsible for phosphorylating and activating AKT. Lastly, there is another druggable protein that influences AKT. This protein is HSP90, a heat shock protein that assists dozens of client proteins with proper protein folding. AKT is one such client protein, so inhibition of HSP90 is expected to inhibit AKT activity as well.

Though there are several methods for indirectly inhibiting AKT, there do exist some compounds that have shown success in directly inhibiting AKT. None of these compounds are approved for treatment of cancer, but several are in clinical trials. Miltefosine and perifosine are two ether lipids that have the ability to disrupt

AKT’s recruitment to the plasma membrane by binding AKT’s PH domain

[Hennessy, et al. 2005; Liu, et al. 2009]. One of these, perifosine, is the most clinically advanced AKT inhibitor; it is in phase II trials for multiple types of cancers

[Liu, et al. 2009]. Another drug in trials for solid tumors and other cancers is MK‐

2206. It is a derivative of AKTi‐1/2, an allosteric AKT inhibitor that is specific for

AKT1 and AKT2. Despite being an AKTi‐1/2 derivative, MK‐2206 has activity toward all three isoforms of AKT. GSK690693, like MK‐2206, is also a pan‐AKT inhibitor. It targets the ATP‐binding domain of AKT and also has activity toward other members of the AGC kinase family [Garcia‐Echeverria and Sellers 2008; Liu, et al. 2009]. GSK690693 has recently completed phase I trials. Finally, there are a handful of drugs in preclinical stages, such as PX‐316. This drug acts similarly to perifosine in that it targets AKT’s PH domain to prevent membrane translocation

[Hennessy, et al. 2005; Liu, et al. 2009].

As previously mentioned, a number of molecules that target proteins other than AKT nevertheless have the ability to indirectly reduce AKT activity. Celecoxib is a drug that was originally developed as a COX‐2 inhibitor and approved for the treatment of arthritis. However, it was discovered that celecoxib also inhibits PDK1, the kinase responsible for phosphorylating AKT at T308 [Garcia‐Echeverria and

Sellers 2008]. This has led to the limited use of celecoxib to prevent the formation of polyps in individuals who have familial adenomatous polyposis [William, et al.

2009]. Another PDK1 inhibitor, UCN‐01 (also called 7‐hydroxystaurosporine), has been through several stages of clinical trials for various cancers. In addition to inhibiting PDK1, UCN‐01 also inhibits several other kinases, such as protein kinase C

[Kummar, et al. 2009]. Geldanamycin is a natural compound that was found to inhibit HSP90’s chaperone activity, resulting in reduced folding of client proteins

AKT and PDK1 [Garcia‐Echeverria and Sellers 2008]. The pharmacologic profile of geldanamycin was determined to be unfavorable, but a modified form, 17‐allylamino‐17‐demethoxygeldanamycin, is well tolerated and is in phase II trials

[Garcia‐Echeverria and Sellers 2008]. NVP‐AUY922 is a very potent small molecule inhibitor of HSP90, and it has recently entered phase I trials [Garcia‐Echeverria and Sellers 2008].

1.4.4 Inhibitors of mTOR

mTOR has long been known as a druggable target, since it was originally characterized as the target of rapamycin, a drug produced by the soil bacterium Streptomyces hygroscopicus and first purified in 1972 [Garber 2009; Rini 2008].

Rapamycin () was first used clinically as an antifungal and later as an immunosuppressant, but its anti‐growth properties were not investigated for cancer until later [Garber 2009; Liu, et al. 2009]. Several related compounds, sometimes referred to as “rapalogs” have been synthesized, tested and approved for clinical use. These include and , which have both been approved for the treatment of [Garber 2009; Liu, et al. 2009]. An additional rapalog, (formerly known as AP23573), is in phase III clinical trials for the treatment of metastatic soft tissue and bone sarcomas [Yuan, et al. 2009]. What all of the rapalogs have in common is that they all target mTORC1 and do not affect the activity of mTORC2 except at much higher doses [Feldman, et al. 2009; Foster and Toschi 2009; Garber 2009; Rini 2008]. Also, they may not even completely inhibit mTORC1 [Guertin and Sabatini 2009].

To improve upon the rapalogs, a new generation of mTOR inhibitors has been developed that can inhibit both mTORC1 and mTORC2. Two mTORC1/2 inhibitors, OSI‐027 and AZD8055, have initiated clinical trials for the treatment of solid tumors and lymphomas [Liu, et al. 2009]. These two compounds are ATP‐competitive inhibitors, and they reduce cell growth better than the rapalogs [Liu, et al. 2009]. Three pre‐clinical ATP‐competitive inhibitors of both mTOR complexes are Torin1 and the two TORKInibs, PP30 and PP242 [Feldman, et al. 2009; Guertin and Sabatini 2009]. These inhibitors are much more potent than rapamycin, but this potency may be due to increasingly effective inhibition of mTORC1 rather than inhibition of mTORC2 [Guertin and Sabatini 2009; Liu, et al. 2009].

Several additional compounds under clinical investigation are combination inhibitors of mTORC1, mTORC2 and PI3Ks. Two of these compounds are BEZ235 and BGT226, both developed by Novartis. BEZ235 works by competitively binding the ATP‐binding domain of mTOR and PI3Ks, and it is in phase II trials [Garcia‐Echeverria and Sellers 2008; Liu, et al. 2009]. SF1126, the previously mentioned derivative of the PI3K inhibitor LY294002, is also a combination PI3K/mTOR inhibitor [Liu, et al. 2009]. XL765 is another of these inhibitors, but it has not advanced as far and is only in phase I of clinical trials [Guertin and Sabatini 2009; Liu, et al. 2009]. The combination mTORC1/2 and PI3K/mTOR inhibitors have opened up new avenues for mTOR inhibition and have contributed to a greater understanding of how mTOR operates. Future attempts to make mTOR inhibitors

may improve success and potentially result in clinically useful treatments by building on these discoveries.
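
The target profiles described in this section can be tabulated for quick lookup. The Python sketch below is a minimal illustration, not a curated resource: the names MTOR_INHIBITOR_TARGETS and inhibitors_of are hypothetical, the combination PI3K/mTOR inhibitors are recorded as hitting both mTOR complexes following the framing of the preceding paragraph, and the effect of rapalogs on mTORC2 at much higher doses is deliberately left out.

```python
# Targets as described in the surrounding text (hypothetical simplification).
MTOR_INHIBITOR_TARGETS = {
    "rapamycin": {"mTORC1"},
    "OSI-027": {"mTORC1", "mTORC2"},
    "AZD8055": {"mTORC1", "mTORC2"},
    "Torin1": {"mTORC1", "mTORC2"},
    "PP30": {"mTORC1", "mTORC2"},
    "PP242": {"mTORC1", "mTORC2"},
    "BEZ235": {"mTORC1", "mTORC2", "PI3K"},
    "BGT226": {"mTORC1", "mTORC2", "PI3K"},
    "SF1126": {"mTORC1", "mTORC2", "PI3K"},
    "XL765": {"mTORC1", "mTORC2", "PI3K"},
}


def inhibitors_of(*targets):
    """Return the sorted names of compounds that hit every requested target."""
    wanted = set(targets)
    return sorted(
        name
        for name, hits in MTOR_INHIBITOR_TARGETS.items()
        if wanted <= hits  # subset test: all requested targets are covered
    )


if __name__ == "__main__":
    print(inhibitors_of("PI3K"))
    print(inhibitors_of("mTORC1", "mTORC2"))
```

A lookup of this kind makes the distinction between the three inhibitor classes explicit: only the combination inhibitors are returned when PI3K is requested, and rapamycin drops out as soon as mTORC2 is included.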

1.5 Pharmacogenomics of cancer treatment

Early enthusiasm for cancer research was directed toward the hope of someday finding “a cure for cancer”. In the last several decades, however, it has become clear that cancer is not a single disease but a collection of many dozens of types of cellular aberrations. For example, within the main category of breast cancer, there are five subtypes that can be determined based on gene expression and immunohistochemical profiling [Tang, et al. 2009]. As described in previous sections, one of these subtypes is characterized by the overexpression of the HER2 receptor. Consequently, patients with this subtype usually receive a HER2 inhibitor like trastuzumab, while patients with other subtypes require different treatment plans.

Additional factors complicate the administration of current cancer therapies. One such factor is the patient’s sensitivity to the drug regimen. Chemotherapy has a reputation for being harsh on the patient’s body, and different patients may exhibit different side effects, some of which are dangerous and could require the patient to discontinue therapy. In light of the high cost of cancer treatment, the intensity of many treatment regimens and the potentially fatal consequences of treatment failure, it is critically important to make every attempt to ensure that patients receive the right therapy for both their body and their cancer.

1.5.1 Successes in cancer pharmacogenomics

Pharmacogenomics is the study of how genomic variation impacts the effects of medicine, and it has great potential for helping cancer patients optimize their treatment regimens on an individual level. Though the field is still relatively young, some notable successes have already been achieved. The above‐mentioned recommendation of trastuzumab treatment for HER2‐overexpressing breast cancers is just one example of how measuring genetic variation in the tumor can guide treatment decisions.

Growing evidence suggests that patients with another breast cancer subtype may also benefit from pharmacogenomic testing. Patients who have estrogen receptor‐positive breast cancer often receive tamoxifen, a selective estrogen receptor modulator, as part of their treatment regimen [Osborne 1998]. Tamoxifen is converted into its active metabolite, endoxifen, by the liver enzyme CYP2D6 [Desta, et al. 2004; Wu, et al. 2009]. However, some individuals have inherited polymorphisms in the CYP2D6 gene that impair their ability to produce endoxifen from tamoxifen [Ingelman‐Sundberg 2005; Jin, et al. 2005]. These patients may need a tamoxifen dose that is higher but still well‐tolerated in order to improve their chances of treatment success.

Pharmacogenomics has also succeeded in identifying inherited polymorphisms that are associated with drug toxicity. Irinotecan, a common therapeutic agent for the treatment of colorectal cancer, can cause severe diarrhea and neutropenia in some patients [Sanoff and McLeod 2008]. It was determined that many of the patients who experience severe neutropenia have polymorphisms in the UGT1A1 gene, which encodes a protein involved in the elimination of irinotecan’s active metabolite from the body [Sanoff and McLeod 2008]. As a result of this finding, it is now recommended that physicians start patients with this polymorphism on a lower dose of the medication [Hoskins, et al. 2007; Sanoff and McLeod 2008].

1.5.2 Pharmacogenomics and the PI3K/AKT/mTOR pathway

To date, most of the applications of pharmacogenomics to the PI3K/AKT/mTOR pathway have involved determining tumor genotypes in order to assist with choosing the best treatment option. The directional nature of the pathway means that therapies that inhibit upstream pathway members might be useless if mutations render downstream pathway members insensitive to upstream control. As shown in Figure 1.3, there are many approved drugs that can target the PI3K/AKT/mTOR pathway by inhibiting growth factor receptors (GFRs). In patients whose tumors overexpress GFRs, these drugs are an excellent option for inhibiting tumor growth because such tumors have often become addicted to PI3K/AKT/mTOR signaling [Engelman 2009]. However, as shown in Table 1.5, tumors frequently acquire mutations that activate PI3Ks and inactivate PTEN, which can lead to resistance to these drugs. Despite the acquired resistance to GFR inhibitors, the tumors may still be dependent on increased PI3K/AKT/mTOR signaling, meaning that treatment with an inhibitor that targets molecules farther downstream may be effective [Engelman 2009; Hynes and Dey 2009].
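
The resistance logic described above can be illustrated with a toy model. The sketch below is not part of the original analysis: it assumes a simple linear chain (GFR, PI3K, AKT, mTORC1), represents an activating PI3K mutation or loss of PTEN as constitutive PI3K activity, and ignores feedback loops and crosstalk. The function and node names are hypothetical.

```python
# Toy Boolean model of a linear signaling chain (an illustrative assumption,
# not a quantitative description of the real pathway).
PATHWAY = ["GFR", "PI3K", "AKT", "mTORC1"]


def pathway_output(inhibited=frozenset(), constitutive=frozenset()):
    """Return True if the last pathway member is still active.

    inhibited:    nodes blocked by a drug.
    constitutive: nodes that signal without upstream input, e.g. PI3K
                  after an activating mutation or loss of PTEN.
    """
    upstream_active = True  # assume growth factor is present
    for node in PATHWAY:
        drives_itself = node in constitutive
        # A node signals only if it is not drugged and has some source of activation.
        upstream_active = node not in inhibited and (upstream_active or drives_itself)
    return upstream_active


# Wild-type tumor: a GFR inhibitor shuts the pathway down.
print(pathway_output(inhibited={"GFR"}))                         # False
# PI3K-activated tumor: the same GFR inhibitor no longer helps.
print(pathway_output(inhibited={"GFR"}, constitutive={"PI3K"}))  # True
# Targeting a member downstream of the alteration restores control.
print(pathway_output(inhibited={"AKT"}, constitutive={"PI3K"}))  # False
```

Under these simplifying assumptions, an inhibitor is effective only when it acts at or below the most downstream constitutively active member, which is the intuition behind genotype‐guided choice of pathway inhibitors.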

Though drugs that target PI3Ks and AKT are not currently clinically approved, patients may still be able to benefit from pharmacogenomic efforts to optimize treatment. For example, physicians may refrain from prescribing a GFR inhibitor to a patient whose tumor is expected to be resistant to it. Patients with tumors that are PTEN‐deficient or that have activating mutations in PI3Ks are known to respond much more poorly to trastuzumab, so these patients may elect an alternate therapy instead, such as lapatinib [Hynes and Dey 2009; Liu, et al. 2009].

Another option for patients with known mutations is combination therapy. KRAS encodes a signaling protein that is important in the RAF/MEK/ERK signaling cascade, and it is often mutated in cancers [Engelman 2009]. Some studies suggest that cancers with KRAS mutations might benefit from a combination of a drug targeting PI3K/AKT/mTOR signaling and one targeting the RAF/MEK/ERK pathway, but this will not be better understood until more inhibitors are clinically available [Engelman 2009]. Other studies suggest that combinations of multiple GFR inhibitors or drugs with specificities for multiple signaling molecules will be needed to effectively treat most cancers [Stommel, et al. 2007]. The identification of mutations that can predict drug response has started to change the way cancer therapies are prescribed, and it will continue to change as more pharmacogenomic knowledge is amassed.

1.5.3 The rapamycin problem

Aside from the growth factor receptor inhibitors, the only other PI3K/AKT/mTOR‐targeting drugs approved for the treatment of cancer are the analogs of rapamycin (“rapalogs”). Unfortunately, while they do have some efficacy, it has been disappointingly lower than was originally expected [Foster and Toschi 2009; Rini 2008]. In some clinical trials, rapamycin or its analogs achieved a response or stable disease in some patients, but many other patients showed a paradoxical increase in tumor growth or in markers of increased growth signaling [Cloughesy, et al. 2008; Tabernero, et al. 2008]. One such marker is phosphorylated AKT (pAKT), which was upregulated in approximately half of the treated tumors [Cloughesy, et al. 2008; Tabernero, et al. 2008]. Explanations for the increased levels of pAKT range from rapamycin‐induced transactivation of EGFR to the loss of mTORC1‐mediated negative feedback [Chaturvedi, et al. 2009; Garber 2009]. The lack of negative feedback may result in the upregulation of insulin receptor substrate 1 (IRS‐1), which promotes signaling through the insulin‐like growth factor 1 (IGF1) growth factor receptor [Foster and Toschi 2009; O'Reilly, et al. 2006].

Dual mTOR‐PI3K inhibitors may address this problem, since inhibition of PI3K should result in inhibition of AKT as well, but those drugs are not yet clinically approved. Since rapalogs will therefore continue to be used for the treatment of several types of cancer, it is important to determine whether pharmacogenomics can help predict whether patients will respond or have increased tumor progression in response to rapalog treatment. To date, no somatic mutations have been implicated in rapamycin resistance. However, no efforts have been made to use conventional genetic analysis methods to investigate inherited variation that may contribute to rapamycin resistance.

1.6 Current strategies for genetic analysis

Over the last few decades, strategies for genetic analysis have evolved just as quickly as the molecular technology used to produce genetic data. The draft human genome sequence was completed in 2001, but mapping of genetic traits had been ongoing long before that point [Lander, et al. 2001; Venter, et al. 2001]. As of 1995, 500 traits had been mapped to a region of the genome, and 60 had been mapped to specific cloned genes [Lander and Kruglyak 1995]. These traits were mostly rare Mendelian diseases, but new approaches in recent years have led to the identification of variants responsible for variation in a wide array of quantitative human traits and common diseases [Altshuler, et al. 2008]. With the wealth of biological knowledge and technical advances available today, modern investigators can choose between many approaches for identifying genetic influences on their traits of interest. These approaches can be broadly divided into two categories: genome‐wide and candidate gene strategies.

1.6.1 Genome‐wide strategies for genetic analysis

In the early days of human genetics, when little was known about genes, candidate gene studies were impossible. The first types of genetic studies were necessarily genome‐wide studies. There are several main varieties of genome‐wide studies, and they can utilize several different study designs and analysis methods. They can be used to study dichotomous or quantitative traits, and they can assess these traits in population‐based cohorts or in family‐based models. Some of the most popular genome‐wide analysis methods are linkage analysis, genome‐wide association studies and direct whole‐genome sequencing.

1.6.1.1 Linkage analysis

Linkage analysis studies were being performed in fruit flies as early as 1913, but the lack of known polymorphic sites in humans precluded the use of linkage in human studies until the 1970s [Altshuler, et al. 2008]. In lieu of DNA markers, protein‐based markers could be used to identify linkage (the tendency for traits to be co‐inherited) between easily measured traits such as blood type and other traits [Prescott, et al. 2008]. During the 1980s, a linkage map of the human genome was finally created using restriction fragment length polymorphisms (RFLPs) originally identified in multigenerational families from Utah [Altshuler, et al. 2008; Prescott, et al. 2008]. The proof of the feasibility of this approach came with the mapping of Huntington’s disease to a locus on chromosome 4 in 1983 [Altshuler, et al. 2008; Prescott, et al. 2008]. Since then, thousands of informative polymorphic markers have been identified for use in linkage analyses.
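
As an illustration of how evidence for co‐inheritance is commonly quantified, the sketch below computes a two‐point LOD score: the base‐10 logarithm of the likelihood of the observed meioses at a recombination fraction θ, relative to the likelihood under free recombination (θ = 0.5). It assumes fully informative, phase‐known meioses, and the function names and counts are illustrative rather than taken from the studies cited above.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta versus theta = 0.5."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    nonrecombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Return (theta_hat, LOD at theta_hat), with theta_hat capped at 0.5."""
    if meioses <= 0:
        raise ValueError("meioses must be positive")
    theta_hat = min(recombinants / meioses, 0.5)
    if theta_hat == 0.0:
        # No recombinants: the likelihood is maximized as theta approaches 0.
        return 0.0, meioses * math.log10(2.0)
    return theta_hat, two_point_lod(recombinants, meioses, theta_hat)


theta_hat, lod = max_lod(recombinants=2, meioses=20)
print(f"theta_hat = {theta_hat:.2f}, LOD = {lod:.2f}")  # theta_hat = 0.10, LOD = 3.20
```

A LOD score of 3 corresponds to odds of 1000:1 in favor of linkage, so in this toy example two recombinants among 20 informative meioses would already provide substantial evidence that the marker and the trait locus are linked.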

Linkage analyses have traditionally been conducted on dichotomous traits that run in families, but they have also been expanded to study quantitative traits. One early finding of linkage studies was that familial breast and ovarian cancer mapped to chromosome 17q21 [Hall, et al. 1990]. Identification of this locus simplified the later efforts that isolated the BRCA1 gene as the one mutated in this disease [Miki, et al. 1994]. The development of the SOLAR linkage analysis program expanded linkage analysis to be able to deal with quantitative traits measured in large pedigrees [Almasy and Blangero 1998]. As a result, samples like the large, multi‐generational Utah families that were collected to make the original human linkage map could be used for further studies. Utah‐based investigators have re‐contacted these families and measured and mapped a number of additional traits, including dichotomous traits like the ability to taste phenylthiocarbamide [Drayna, et al. 2003; Prescott, et al. 2008]. In addition, lymphoblastoid cell lines had been created from these Utah families and maintained by the non‐profit Centre d’Etude du Polymorphisme Humain (CEPH) for the purposes of having an unlimited supply of genomic DNA [Prescott, et al. 2008]. These cell lines are freely available to researchers and have been used to study a number of quantitative traits, including gene expression and response to ionizing radiation [Jen and Cheung 2003; Morley, et al. 2004].

In recent years, the CEPH lymphoblastoid cell lines (LCLs) have proven to be a useful system for searching for variants involved in pharmacogenomics. These cell lines are genetically well characterized, having been genotyped for over 14,000 microsatellite markers [Shukla and Dolan 2005]. They can also be exposed to a variety of chemotherapeutic drugs at a variety of doses, an intervention that would be impossible with a healthy human population. So far, CEPH LCLs have been used to map loci for cellular response to several chemotherapy drugs, including 5‐fluorouracil, docetaxel, etoposide, daunorubicin and others [Bleibel, et al. 2009; Duan, et al. 2007; Watters, et al. 2004; Zhang and Dolan 2009]. The phenotype that is usually used in these studies is apoptosis, often measured using the dye AlamarBlue [Duan, et al. 2007; Watters, et al. 2004].

While apoptosis is a phenotype relevant to cancer treatment, which attempts to kill tumor cells, it is a broad measure that reflects a variety of cellular processes, only some of which may relate to the action of the drug. Endophenotypes are measurements, usually quantitative, that are often mechanistically intermediate between the genetic causes of a trait and the measured endpoint of that trait [Chamberlain and Menzies 2009]. In recent years, they have become popular measurements for traits like psychiatric diseases, where the mechanisms by which genetic alterations produce a trait are thought to be complex. For example, alcohol dependence is a complex trait that is dichotomized by applying diagnostic criteria from the DSM‐IV or other manuals. In a large family‐based study of alcohol dependence, it was found that the quantitative variable "maximum number of drinks consumed in a 24‐hour period" correlated well with diagnosed alcohol dependence [Saccone, et al. 2000]. Since this measurement was available for more individuals than the official alcohol dependence diagnosis, its use as a surrogate phenotype increased the statistical power of the linkage study [Saccone, et al. 2000]. For drugs with known mechanisms of action, using measurements of their mechanistic targets as endophenotypes or surrogate phenotypes may make linkage studies of pharmacogenomic phenotypes more accurate and powerful.

1.6.1.2 Genome‐wide association studies

As genotyping technologies became less expensive and more high‐throughput, genome‐wide association studies replaced linkage studies as the most popular method for genetic analysis. These studies do not require the use of large family‐based cohorts, which can be difficult to recruit. Instead, they make use of population‐based groups. For dichotomous traits, these would be one group of individuals with the trait in question (cases) and one group without it (controls). Hundreds of thousands of common sequence variants are analyzed in both groups, and if a variant is statistically significantly overrepresented in one of the groups, it is considered to be associated with the trait [Altshuler, et al. 2008; Donnelly 2008]. Some statistics are available that make it possible to perform association tests in family‐based cohorts, but association studies are usually performed on case‐control cohorts when available [Borecki and Province 2008].
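
To make the case‐control comparison concrete, the sketch below applies a Pearson chi‐square test to a 2×2 table of allele counts for a single variant. It is a minimal illustration, not the procedure of any cited study: the allele counts are invented, and the Bonferroni correction for 500,000 tests is an assumed convention for handling the large number of variants tested.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column needs at least one allele")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-variant significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: 100 cases and 100 controls, two alleles each.
chi2, p = allelic_chi_square(case_alt=120, case_ref=80,
                             control_alt=90, control_ref=110)
threshold = bonferroni_threshold(alpha=0.05, n_tests=500_000)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, genome-wide significant: {p < threshold}")
```

With these invented counts the variant would look significant in isolation (a chi‐square statistic of about 9, p of roughly 0.003) but falls far short of the corrected threshold of 1 × 10⁻⁷, which illustrates why such studies tend to need very large cohorts.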

In the last few years, genome‐wide association studies have succeeded in finding hundreds of common variants that influence quantitative traits and complex disease. These include 30 loci associated with Crohn’s disease, 40 loci associated with height and 20 loci associated with type 2 diabetes [Altshuler, et al. 2008; Donnelly 2008]. Some of these variants have been found in genes or intergenic regions that had never been previously implicated in the disease in question [Altshuler, et al. 2008]. As a result, light is being shed on the molecular pathogenesis of dozens of diseases.

Genome‐wide association studies have several drawbacks, one of which is that they are unsuitable for the study of certain traits. One such group of traits involves the response of individuals to chemotherapeutic agents. Genome‐wide association studies often require hundreds or thousands of participants, and it is not logistically feasible to recruit this many cancer patients to pharmacogenomic clinical trials [Altshuler, et al. 2008; Donnelly 2008]. As mentioned in the last section, it is ethically impossible to recruit healthy volunteers to receive these drugs. Also, there are currently no collections of LCLs from unrelated individuals that are large enough to carry out appropriately powered studies in vitro. Until those collections are created, linkage studies using family‐based LCL collections remain the best model for searching for pharmacogenetically important variants. An additional drawback of genome‐wide association studies is that they are often limited to measuring only variation in single nucleotide polymorphisms (SNPs). Increasing evidence suggests that copy number and other structural variation is far more widespread than originally thought and that it may cause a variety of functional consequences [Iafrate, et al. 2004; Korbel, et al. 2007; Stranger, et al. 2007]. A last drawback is that the gold standard for accepting a SNP‐trait association is a replication study in a separate population. This requires the recruitment of another large set of cases and controls, and often the results of these “replication” studies do not, in fact, replicate the original findings [Clarke, et al. 2007]. For these reasons, new methods of analysis are being pursued for association studies.

1.6.1.3 Direct whole‐genome sequencing

Whole‐genome sequencing technologies have been rapidly declining in cost and increasing in efficiency and accuracy. At some point, they will likely become cheaper than current methods for assessing variation, such as arrays that measure hundreds of thousands or millions of SNPs. When this happens, investigators will be confronted with the problem of how best to utilize the onslaught of genome‐scale data for genetic analysis. Of course, variants identified in individuals can still be used for linkage and association studies. Beyond this, the best methods for analyzing these data have yet to be determined.

One application of whole‐genome sequencing that has been successful so far is the sequencing of tumors. Several genome sequences from cancer samples have been published and compared to sequence from the patients’ normal tissue. The first of these was from an acute myeloid leukemia [Ley, et al. 2008]. By comparing the genomes, the mutations acquired by the tumor can be identified, and the genes containing those mutations may therefore shed light on the process of tumorigenesis [Ley, et al. 2008; Pleasance, et al. 2009]. Analyzing only germline DNA sequence to predict predisposition to cancer, however, appears to be much less straightforward. One company, Myriad Genetics, has performed complete sequencing of the BRCA1 and BRCA2 genes in thousands of individuals as part of its diagnostic testing service [Easton, et al. 2007a]. Though there are a handful of variants that are known to predispose individuals to breast cancer, assessing the functional significance of others has proven to be much more difficult [Easton, et al. 2007a]. Improving in silico methods of predicting variants’ functional effects or developing high‐throughput biological assays to determine these effects will be important in analyzing the wealth of sequence data to come.
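
The tumor‐versus‐normal comparison described above amounts, in its simplest form, to a set difference between two lists of variant calls. The sketch below shows that idea only; real pipelines must also handle sequencing error, coverage and tumor purity, and the variant coordinates, gene labels and annotation lookup used here are invented for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor, normal):
    """Variants seen in the tumor genome but not in matched normal tissue."""
    return set(tumor) - set(normal)


def genes_hit(variants, gene_of):
    """Map variants to genes using a (hypothetical) annotation lookup."""
    return sorted({gene_of[v] for v in variants if v in gene_of})


# Toy calls; coordinates and gene labels are invented for illustration.
germline = Variant("chr1", 1_000, "A", "G")
acquired = Variant("chr2", 5_000, "C", "T")
normal_calls = [germline]
tumor_calls = [germline, acquired]
annotation = {germline: "GENE_A", acquired: "GENE_B"}

somatic = somatic_candidates(tumor_calls, normal_calls)
print(genes_hit(somatic, annotation))  # ['GENE_B']
```

The inherited variant is present in both samples and drops out, leaving only the acquired change as a candidate for involvement in tumorigenesis.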

1.6.2 Candidate gene studies

Though genome‐wide studies are considered to be the only unbiased method of genetic analysis, candidate gene studies are often a cheaper, faster, more focused alternative. Whereas there are only a few main methods for doing genome‐wide studies, there are countless ways to structure a candidate gene study. Such studies usually incorporate some positional data (such as a known locus from linkage mapping results), biological data (such as information on the pathobiology of the disease) or both. Evidence from previously published literature may also figure into the selection of the genes to be studied.

Attempts have been made in the last decade to make candidate gene selection more structured and less prone to the bias of the investigator. As a result, a number of algorithms have been developed to formalize the selection process. Much like the ways investigators tend to choose candidate genes, these algorithms differ widely in the way they run and the information sources they use. Some make use of data from previous studies, while others require knowledge of the molecular basis of the disease in lieu of data [Adie, et al. 2006; Aerts, et al. 2006; Ma, et al. 2007; Wang, et al. 2007]. Depending on the needs of the investigator, the use of prior data may be helpful if it is available, but if not, it may hinder his or her attempts to identify candidate genes. Also, some of the information sources used by previous algorithms are incomplete, such as the GO database [Ashburner, et al. 2000; Dolan, et al. 2005]. Improving these algorithms to make them less biased and more flexible may improve investigators’ success in performing candidate gene association studies.
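
As a rough illustration of what formalizing candidate gene selection can look like, the sketch below ranks genes by combining positional evidence (overlap with a linkage interval) and biological evidence (membership in a pathway of interest). It does not reproduce any of the published algorithms cited above; the scoring scheme, weights, gene names and coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int


def overlaps(gene, chrom, start, end):
    """True if the gene lies at least partly inside the given interval."""
    return gene.chrom == chrom and gene.start <= end and gene.end >= start


def rank_candidates(genes, locus, pathway_symbols, weights=(1.0, 1.0)):
    """Score genes by positional and biological evidence, highest first."""
    w_position, w_biology = weights
    chrom, start, end = locus
    scored = []
    for gene in genes:
        score = 0.0
        if overlaps(gene, chrom, start, end):
            score += w_position   # positional evidence, e.g. a linkage peak
        if gene.symbol in pathway_symbols:
            score += w_biology    # biological evidence, e.g. pathway membership
        scored.append((gene.symbol, score))
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Invented genes and a hypothetical linkage interval on chromosome 3.
genes = [
    Gene("GENE_C", "chr7", 100, 200),
    Gene("GENE_B", "chr3", 1_500, 1_800),
    Gene("GENE_A", "chr3", 1_200, 1_400),
]
ranking = rank_candidates(genes, locus=("chr3", 1_000, 2_000),
                          pathway_symbols={"GENE_A"})
print(ranking)  # [('GENE_A', 2.0), ('GENE_B', 1.0), ('GENE_C', 0.0)]
```

Making the evidence sources and weights explicit is one way to reduce investigator bias, although the ranking is still only as good as the completeness of the underlying annotations.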

Figure 1.2: Schematic diagram of the PI3K/AKT/mTOR signaling pathway. Solid single arrows indicate activation by an unspecified mechanism, while solid double‐headed arrows indicate activation by phosphorylation. Hollow double‐headed arrows indicate inhibition by dephosphorylation, and inhibition by another mechanism is represented by perpendicular lines. Proteins that generally stimulate the pathway are depicted in blue, while proteins that inhibit the pathway are shown in red.

Figure 1.3: Illustration of selected drugs that target the PI3K/AKT/mTOR pathway. Pathway members are shown in magenta or dark blue. Activation is indicated by double‐headed arrows (phosphorylation) or single‐headed arrows (activation by another mechanism), and inhibition is indicated by perpendicular lines. Drugs with light purple backgrounds are pre‐clinical, drugs with light blue backgrounds are in clinical trials, and drugs with green backgrounds have been approved for the treatment or prevention of at least one type of cancer.

Table 1.1: Selected cell cycle genes with known associations with cancer

| Protein name | Gene name | Checkpoint | Cancer | Reference |
|---|---|---|---|---|
| p16(INK4) | CDKN2A | G1/S | melanoma | [Gruis, et al. 1995; Miller and Mihm 2006] |
| CDK4 | CDK4 | G1/S | melanoma | [Soufir, et al. 1998; Zuo, et al. 1996] |
| cyclin D | CCND1 | G1/S | colorectal | [Kong, et al. 2001; Le Marchand, et al. 2003] |
| p21 | CDKN1A | G1/S | lung | [Sjalander, et al. 1996] |
| RB | RB1 | G1/S | retinoblastoma | [Taylor, et al. 2007] |
| BUB1B | BUB1B | spindle | colorectal, rhabdomyosarcoma | [Cahill, et al. 1998; Hanks, et al. 2004] |

Table 1.2: Selected DNA damage response genes with known associations with cancer

| Protein name | Gene name | Process | Cancer | References |
|---|---|---|---|---|
| MYH | MUTYH | Base excision repair | colorectal | [Jones, et al. 2002] |
| XPA | XPA | Nucleotide excision repair | skin cancers | [Soufir, et al. 1998; Tanaka, et al. 1990; Zuo, et al. 1996] |
| ERCC3/XPB | ERCC3 | Nucleotide excision repair | skin cancers | [Oh, et al. 2006] |
| XPC | XPC | Nucleotide excision repair | skin cancers | [Gozukara, et al. 2001] |
| ERCC2/XPD | ERCC2 | Nucleotide excision repair | skin cancers | [Broughton, et al. 2001] |
| DDB2/XPE | DDB2 | Nucleotide excision repair | skin cancers | [Itoh, et al. 1999] |
| ERCC4/XPF | ERCC4 | Nucleotide excision repair | skin cancers | [Sijbers, et al. 1996; Yamamura, et al. 1989] |
| ERCC5/XPG | ERCC5 | Nucleotide excision repair | skin cancers | [Emmert, et al. 2001] |
| POLH/XPV | POLH | Nucleotide excision repair | skin cancers | [Itoh, et al. 2000] |
| MSH2 | MSH2 | Mismatch repair | colorectal, others | [Fishel, et al. 1993] |
| MSH6 | MSH6 | Mismatch repair | colorectal, others | [Miyaki, et al. 1997; Ostergaard, et al. 2005] |
| MLH1 | MLH1 | Mismatch repair | colorectal, others | [Hamilton, et al. 1995; Papadopoulos, et al. 1994] |
| MLH3 | MLH3 | Mismatch repair | colorectal, others | [Liu, et al. 2003] |
| PMS1 | PMS1 | Mismatch repair | colorectal | [Nicolaides, et al. 1994] |
| PMS2 | PMS2 | Mismatch repair | colorectal, others | [Hamilton, et al. 1995; Parsons, et al. 1995] |
| DNA ligase IV | LIG4 | Non‐homologous end‐joining | leukemia | [Ben‐Omran, et al. 2005] |
| RAD51 | RAD51 | Homologous recombination | breast | [Wang, et al. 2001] |
| WRN | WRN | Homologous recombination | multiple | [Goto, et al. 1996] |
| BLM | BLM | Homologous recombination | multiple | [Ellis, et al. 1995] |
| ATM | ATM | Damage recognition and signaling | leukemia, lymphoma, breast | [Boultwood 2001] |
| CHK2 | CHEK2 | Damage recognition and signaling | multiple | [Antoni, et al. 2007] |
| BRCA1 | BRCA1 | Damage recognition and signaling | breast, ovarian | [Easton, et al. 1993] |
| BARD1 | BARD1 | Damage recognition and signaling | breast | [Karppinen, et al. 2004] |
| PALB2 | PALB2 | Damage recognition and signaling | breast, pancreatic, others | [Erkko, et al. 2007; Jones, et al. 2009] |
| BRCA2 | BRCA2 | Damage recognition and signaling | breast, ovarian, others | [Howlett, et al. 2002; Wooster, et al. 1995] |
| MDM2 | MDM2 | Damage recognition and signaling | multiple | [Bond, et al. 2004] |
| p53 | TP53 | Damage recognition and signaling | multiple | [Meek 2009] |
| p21 | CDKN1A | Damage recognition and signaling | lung | [Sjalander, et al. 1996] |
| NBS | NBS1 | Damage recognition and signaling | multiple | [Dzikiewicz‐Krawczyk 2008] |

Table 1.3: Selected cell death genes with known associations with cancer

| Protein | Gene | Process | Cancer | References |
|---|---|---|---|---|
| Fas‐L | FASLG | apoptosis | lung | [Jones, et al. 2002; Zhang, et al. 2005] |
| Fas | FAS | apoptosis | lung, lymphoma | [Straus, et al. 2001; Zhang, et al. 2005] |
| TRAILR2 | TNFRSF10B | apoptosis | skin | [Pai, et al. 1998] |
| Caspase‐8 | CASP8 | apoptosis | breast, lung | [Cox, et al. 2007; Sun, et al. 2007] |
| XIAP | XIAP | apoptosis | lymphoma | [Rigaud, et al. 2006] |
| NRAS | NRAS | apoptosis | leukemia, lymphoma | [Oliveira, et al. 2007] |

Table 1.4: Selected growth and angiogenesis genes with known associations with cancer

| Protein | Gene | Process | Cancer | References |
|---|---|---|---|---|
| VHL | VHL | angiogenesis | multiple | [Latif, et al. 1993] |
| PHD2 | EGL9 | angiogenesis | paraganglioma | [Ladroue, et al. 2008] |
| CBP | CREBBP | angiogenesis | multiple | [Miller and Rubinstein 1995; Petrij, et al. 1995] |
| p300 | EP300 | angiogenesis | multiple | [Miller and Rubinstein 1995; Roelfsema, et al. 2005] |
| VEGFR2 | KDR | angiogenesis | hemangioma | [Jinnin, et al. 2008] |
| EPHB2 | EPHB2 | angiogenesis | prostate | [Kittles, et al. 2006] |
| PDGFRα | PDGFRA | growth | gastrointestinal | [Chompret, et al. 2004] |
| PDGFB | PDGFB | growth | meningioma | [Smidt, et al. 1990] |
| PRKAR1A | PRKAR1A | growth | multiple | [Kirschner, et al. 2000] |

Table 1.5: Cancer‐associated variants and other alterations in genes in the PI3K/AKT/mTOR pathway

| Protein | Gene | Type of alteration | Cancer (percent of cases) | References |
|---|---|---|---|---|
| EGFR | EGFR | somatic | lung (27%) | [Forbes, et al. 2009]* |
| EGFR | EGFR | somatic | adrenal (9%) | [Forbes, et al. 2009]* |
| EGFR | EGFR | somatic | thyroid (6%) | [Forbes, et al. 2009]* |
| EGFR | EGFR | somatic | central nervous system (6%) | [Forbes, et al. 2009]* |
| EGFR | EGFR | somatic | prostate (6%) | [Forbes, et al. 2009]* |
| EGFR | EGFR | amplification | non‐small cell lung (59%) | [Hirsch, et al. 2008] |
| EGFR | EGFR | amplification | breast (5%) | [Bhargava, et al. 2005] |
| PDGFRα | PDGFRA | somatic | small intestine (55%) | [Forbes, et al. 2009]* |
| PDGFRα | PDGFRA | somatic | stomach (20%) | [Forbes, et al. 2009]* |
| PDGFRα | PDGFRA | somatic | soft tissue (12%) | [Forbes, et al. 2009]* |
| PDGFRα | PDGFRA | inherited | gastrointestinal (rare) | [Chompret, et al. 2004] |
| PDGFB | PDGFB | inherited | meningioma (rare) | [Smidt, et al. 1990] |
| HER2 | ERBB2 | amplification | breast (18‐20%) | [Penault‐Llorca, et al. 2009] |
| HER2 | ERBB2 | amplification | lung (5‐10%) | [Herbst, et al. 2008] |
| p85α/p55α/p50α | PIK3R1 | somatic | central nervous system (10%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | breast (26%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | urinary tract (23%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | endometrial (22%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | large intestine (14%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | cervical (10%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | ovarian (9%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | skin (9%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | stomach (8%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | upper digestive tract (8%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | pancreatic (7%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | central nervous system (6%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | liver (6%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | esophageal (5%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | somatic | thyroid (5%) | [Forbes, et al. 2009]* |
| p110α | PIK3CA | amplification | gastric (36%) | [Liu, et al. 2009] |
| p110α | PIK3CA | amplification | head and neck (32%) | [Liu, et al. 2009] |
| p110α | PIK3CA | amplification | lung (23%) | [Liu, et al. 2009] |
| p110α | PIK3CA | amplification | ovarian (12%) | [Liu, et al. 2009] |
| p110α | PIK3CA | amplification | breast (9%) | [Liu, et al. 2009] |
| p110α | PIK3CA | amplification | thyroid (9%) | [Liu, et al. 2009] |
| p110α | PIK3CA | amplification | glioblastoma (6%) | [Liu, et al. 2009] |
| p110α | PIK3CA | amplification | esophageal (6%) | [Liu, et al. 2009] |
| PTEN | PTEN | somatic | endometrial (42%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | central nervous system (20%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | prostate (19%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | skin (16%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | large intestine (13%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | lung (9%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | urinary tract (8%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | ovarian (8%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | hematopoietic/lymphoid (8%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | bone (7%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | cervical (7%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | stomach (6%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | soft tissue (6%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | breast (6%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | liver (5%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | somatic | thyroid (5%) | [Forbes, et al. 2009]* |
| PTEN | PTEN | inherited | multiple tissues (rare) | [Blumenthal and Dennis 2008] |
| PTEN | PTEN | loss of heterozygosity | melanoma (37%) | [Liu, et al. 2009] |
| PTEN | PTEN | loss of heterozygosity | prostate (30%) | [Liu, et al. 2009] |
| PTEN | PTEN | loss of heterozygosity | glioblastoma (28%) | [Liu, et al. 2009] |
| PTEN | PTEN | loss of heterozygosity | breast (25%) | [Liu, et al. 2009] |
| PTEN | PTEN | loss of heterozygosity | gastric (25%) | [Liu, et al. 2009] |
| AKT1 | AKT1 | somatic | thyroid (5%) | [Forbes, et al. 2009]* |
| AKT2 | AKT2 | amplification | ovarian (14%) | [Liu, et al. 2009] |
| TSC1 | TSC1 | inherited | multiple tissues (rare) | [van Slegtenhorst, et al. 1997] |
| TSC2 | TSC2 | inherited | multiple tissues (rare) | [Kumar, et al. 1995] |

*Data taken from the COSMIC website (http://www.sanger.ac.uk/genetics/CGP/cosmic) for tumor types with at least 50 samples and a mutational prevalence of greater than 5 percent.

Chapter 2

Identification of loci that influence heritable signaling phenotypes

2.1 Abstract

While there exists a wealth of information about genetic influences on gene expression levels, little is known about how inherited variation might influence the expression and post‐translational modifications of proteins, especially those involved in intracellular signaling. The PI3K/AKT/mTOR signaling pathway is one such pathway that has been implicated in a number of diseases, including a variety of cancers. To assess whether the activation of this pathway is influenced by genetic factors, phosphorylated and total levels of three key proteins in the pathway (AKT1, p70S6K, 4E‐BP1) were measured by ELISA in lymphoblastoid cell lines from 14 CEPH families. Interestingly, the phenotypes with the highest heritability were the ratios of phosphorylated to total protein for two of the pathway members: AKT1 (h²=0.18) and p70S6K (h²=0.23). Linkage analysis identified several peaks of interest for these phenotypes, including a peak for the AKT1 phenotype on chromosome 14 and peaks for both phenotypes that overlapped on chromosome 3. The overlapping peaks were the result of a genetic correlation between the two ratios that produced a maximum LOD score of 2.91. These findings represent the first quantitative trait loci identified for direct measurements of signaling protein activation and may be useful in understanding the pathogenesis of the many diseases associated with this pathway.


2.2 Introduction

Recent advances in human genetics have included large genomewide association studies in which variants have been associated with a particular clinical outcome, such as the development of a disease like type 2 diabetes [Saxena, et al. 2007; Scott, et al. 2007; Sladek, et al. 2007]. However, when there is a detailed understanding of the biological basis of a particular disease, a potentially more tractable approach is to search for variants that influence the molecular‐level alterations that cause the disease [Jansen and Nap 2001; Schadt, et al. 2003]. To this end, studies have identified variants associated with known disease‐related traits such as blood glucose levels for metabolic syndrome or bone mineral density for osteoporosis [Cho, et al. 2009; Dupuis, et al. ; Richards, et al. 2008; Saxena, et al. ; Xiong, et al. 2009].

At a more fine‐grained level, several groups have published analyses of genetic variation associated with mRNA expression levels of most human genes, often using cell line models [Cheung, et al. 2005; Monks, et al. 2004; Morley, et al. 2004]. These results have been used to identify expression quantitative trait loci (eQTLs) and potential causal variants for other molecular phenotypes, including drug response [Huang, et al. 2008a; Huang, et al. 2008b]. At the heart of eQTL studies is the idea that a gene’s mRNA expression level is related to the expression level of its encoded protein, which drives the phenotype in question. In spite of the common lack of correlation between mRNA and protein expression levels, loci have been identified for comparatively few human protein levels [Calafell, et al. ; He, et al. ; Maier, et al. 2009; Sun, et al.]. Even fewer studies have investigated genetic influences on the post‐translational modifications that can be so critical in determining a protein’s activity.

The paucity of loci associated with levels of protein expression or post‐translational modifications can be attributed partially to the lack of methods for accurate, inexpensive, high‐throughput measurement of proteins. While thousands of nucleic acid measurements can be done quickly and relatively cheaply using microarray platforms, a similar technology does not exist for protein studies. Traditional antibody‐based protein detection methods are highly dependent on the source and lot of the antibodies used, and multiplexing of different protein assays is rare. Mass spectrometry has become an increasingly popular method for analyzing most of the proteome, but it has several drawbacks [Bonilla, et al. 2008; Foss, et al. 2007; Gstaiger and Aebersold 2009; Witze, et al. 2007]. These include high cost and lower sensitivity for the detection of proteins that are expressed or post‐translationally modified at low levels [Bonilla, et al. 2008; Witze, et al. 2007]. Proteins involved in cell signaling are particularly influenced by post‐translational alterations like phosphorylation, so understanding genetic influences on their activity requires the use of a platform that can reliably measure these alterations.

One important cell signaling pathway is the phosphoinositide 3‐kinase/AKT/mechanistic target of rapamycin (PI3K/AKT/mTOR) pathway. It is well‐characterized and has been linked to the pathology of a number of diseases, including cancer. The stimulation of this pathway begins with the activation of receptor tyrosine kinases, which generally results in the recruitment and activation of PI3K family members [Carracedo and Pandolfi 2008; Hawkins, et al. 2006]. Upon activation, PI3K phosphorylates phosphatidylinositol‐4,5‐bisphosphate (PIP2) to convert it to PIP3, a process opposed by PTEN [Carracedo and Pandolfi 2008; Hawkins, et al. 2006]. Increased PIP3 levels cause AKT to be recruited to the plasma membrane, where it is phosphorylated and activated at Thr308 by PDK1 and at Ser473 by mTOR complex 2 (mTORC2) [Franke 2008; Hennessy, et al. 2005; Liu, et al. 2009]. AKT then phosphorylates a number of targets, resulting in the activation of mTOR complex 1 (mTORC1), which in turn phosphorylates additional proteins, including p70 S6 kinase (p70S6K) and 4E‐BP1 [Alessi, et al. 2009; Corradetti and Guan 2006; Garcia‐Echeverria and Sellers 2008]. The net result of the activation of this pathway is an increase in translation of some proteins, including key growth regulatory proteins like cyclin D1 and c‐Myc [Corradetti and Guan 2006; Guertin and Sabatini 2007].

Like other important signaling pathways, the PI3K/AKT/mTOR pathway is often activated in tumors by somatic mutations. In addition to these, inherited variants in the PI3K/AKT/mTOR pathway have also been implicated in several types of cancer. Rare inherited variants in pathway genes such as PTEN and two genes that encode mTOR regulatory proteins (TSC1 and TSC2) have long been known to cause cancer syndromes [Blumenthal and Dennis 2008; Kumar, et al. 1995; van Slegtenhorst, et al. 1997]. Patients with these mutations develop benign growths called hamartomas and are also at a greatly increased risk for a variety of malignant cancers. In addition to its role in the onset of cancer, the PI3K/AKT/mTOR pathway may also be involved in mediating the response to cancer treatment; one study has shown associations between SNPs in several pathway genes and several esophageal cancer outcomes, including recurrence risk, survival and treatment response [Hildebrandt, et al. 2009].

Given the importance of the PI3K/AKT/mTOR pathway in such a broad array of cancer types, a better understanding of variation in its activation is essential. The research presented here utilized a commonly used lymphoblastoid cell line collection in combination with standard human genetics techniques to conduct the first search for variants that are associated with direct measures of PI3K/AKT/mTOR activity. This may also represent the first attempt at identifying QTLs for proteins with specific post‐translational modifications. Here, the natures of these signaling phenotypes are explored, and QTLs associated with several PI3K/AKT/mTOR signaling phenotypes are presented.

2.3 Materials and Methods

2.3.1 Cell culture

A total of 122 lymphoblastoid cell lines from 14 families in the CEPH (Centre d’Etude du Polymorphisme Humain) collection were seeded at a density of 250,000 cells/ml and a total volume of 20 ml in duplicate flasks. Growth media consisted of RPMI (Invitrogen) plus 10% fetal bovine serum (Sigma). After 48 hours, the duplicate flasks were pooled and cells were washed twice in 1 milliliter of ice‐cold PBS. During the last PBS wash, 100 microliters of cell suspension were removed, pelleted and frozen at ‐80°C for later RNA extraction. The remainder of the cell suspension was pelleted and immediately lysed using the protein extraction protocol best suited for the technique chosen for downstream protein analysis.

2.3.2 Western blotting

Lysis buffer consisted of RIPA buffer (150 mM NaCl, 1% Triton X‐100, 0.5% sodium deoxycholate, 0.1% SDS, 50 mM Tris‐HCl, pH 8.0) supplemented with protease and phosphatase inhibitors (Pierce). Cells were resuspended in 200 µL of lysis buffer and then shaken at 4°C for 30 minutes. Cell suspensions were then centrifuged at 4°C at 20,000 x g for 15 minutes. Supernatants were transferred to new tubes and stored at ‐80°C until needed. Protein concentrations were measured using the Bio‐Rad DC Protein Assay and samples were diluted in lysis buffer to a final concentration of 2 µg/µL.

For each lane, 12 µL of cell lysate (24 µg of total protein) was supplemented with 4X NuPAGE LDS Sample Buffer and 10X NuPAGE Reducing Agent (Invitrogen). Deionized water was added to reach a final volume of 20 µL, and the sample was heated at 70°C for 10 minutes. Samples were loaded into NuPAGE Novex 10% Bis‐Tris gels (Invitrogen) and run in NuPAGE MOPS SDS running buffer supplemented with NuPAGE Antioxidant (Invitrogen). Proteins were then transferred to methanol‐activated PVDF using a semi‐dry transfer apparatus and NuPAGE Transfer Buffer supplemented with NuPAGE Antioxidant (Invitrogen). Membranes were washed twice in TBST and blocked with 3% BSA for one hour at room temperature.

Primary antibodies were diluted in TBST and incubated with the membrane overnight at 4°C. Primary antibodies for the total and phosphorylated forms of the following proteins were used (all antibodies from Cell Signaling Technology): p90RSK, p70S6K, AKT, MEK1/2, ERK1/2, S6 ribosomal protein (S6rp) and glyceraldehyde 3‐phosphate dehydrogenase (GAPDH). Following primary antibody incubation, membranes were washed twice in TBST. Quantum dot‐conjugated secondary antibodies (Qdot 605 goat anti‐rabbit IgG conjugate, Qdot 565 goat anti‐mouse IgG conjugate, Invitrogen) were then diluted 1:1000 in SEA BLOCK blocking reagent (Pierce) and incubated with the membrane for one hour at room temperature. Membranes were then washed three times in TBST and twice in TBS before being imaged using a GelDoc system (Bio‐Rad). QuantityOne software (Bio‐Rad) was used to detect bands and quantitate their densities.

2.3.3 ELISAs

Protein was extracted from cell pellets using lysis buffer (100 mM Tris, 100 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton‐X, 10% glycerol, 0.1% SDS, 0.5% sodium deoxycholate) supplemented with PMSF (Sigma) and protease and phosphatase inhibitors (Pierce). Cell suspensions were incubated on ice for 30 minutes with periodic vortexing before being centrifuged at 13,000 x g at 4°C for 10 minutes. Supernatants were removed and then applied to Vivaspin 500 concentrating columns (Sartorius) and spun at 15,000 x g at 4°C for one hour. Concentrated lysates were stored at ‐80°C, and protein concentrations were determined using the Bio‐Rad DC Protein Assay.

ELISAs were performed using kits for measuring phospho‐AKT1 [pS473], total AKT1, phospho‐p70S6K [pT389], total p70S6K, phospho‐4E‐BP1 [pT46] and total 4E‐BP1 (Invitrogen/Biosource). Equal amounts of protein from each of the 122 protein samples were assayed in triplicate, and recombinant protein standards and pooled lysate standards were present on each 96‐well plate for quality control purposes. Protocols provided for each kit were followed exactly. Final absorbance at 450 nm was read using a Beckman Coulter DTX880 plate reader.

2.3.4 Data cleaning, heritability and linkage analysis

For each ELISA, it was confirmed that measured OD readings fell within the linear range of the standard curve. Triplicate OD readings were averaged for both the phosphorylated and total protein ELISAs, and ratios of phosphorylated to total protein were calculated, resulting in a total of nine phenotypes (total, phosphorylated and phosphorylated:total ratio for AKT1, p70S6K and 4E‐BP1). Standard deviations were calculated for ratios based on the formula for the variance of an unbiased ratio estimator presented by van Kempen and van Vliet [van Kempen and van Vliet 2000].
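As an illustration of the ratio calculation described above, the following Python sketch averages replicate OD readings and propagates their uncertainty to the phosphorylated:total ratio. It uses the common first-order (delta-method) approximation for the variance of a ratio and assumes that the two ELISAs are independent. It is not claimed to reproduce the exact estimator of van Kempen and van Vliet, and the function name and example readings are hypothetical.

```python
from statistics import mean, variance


def ratio_with_sd(phospho_ods, total_ods):
    """Return (ratio, sd) for mean(phospho) / mean(total).

    Assumption of this sketch: the two sets of replicate readings are
    independent, so no covariance term enters the approximation.
    """
    m_p, m_t = mean(phospho_ods), mean(total_ods)
    # Variance of each mean = sample variance / number of replicates.
    var_mp = variance(phospho_ods) / len(phospho_ods)
    var_mt = variance(total_ods) / len(total_ods)
    ratio = m_p / m_t
    # First-order delta method: relative variances of the two means add.
    var_ratio = ratio ** 2 * (var_mp / m_p ** 2 + var_mt / m_t ** 2)
    return ratio, var_ratio ** 0.5


# Hypothetical triplicate OD(450) readings for one cell line.
ratio, sd = ratio_with_sd([0.9, 1.0, 1.1], [1.9, 2.0, 2.1])
```

Because the relative variances add, a noisy total-protein measurement inflates the uncertainty of the ratio as much as a noisy phosphorylated-protein measurement does.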

Genetic analyses were conducted using the SOLAR software package. To adjust the data to meet SOLAR’s requirements, each of the nine phenotypes was assessed for normality, transformed as needed with an inverse or logarithmic transformation and multiplied by a constant. To assess heritability for these phenotypes, polygenic models were created in SOLAR. A handful of covariates were screened for each phenotype, including age, sex, age × sex, age², age² × sex, ELISA plate number, ELISA plate column position and protein concentration of the lysate. The “polygenic -screen” command was used to screen for statistically significant covariates and to estimate the additive genetic heritability.
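The transformation step can be sketched as follows. This Python example is only an assumption about how such a choice might be automated: it compares the sample skewness of the untransformed, log-transformed and inverse-transformed values, keeps the least skewed version and multiplies it by a constant. The selection rule, function names and scale factor are hypothetical and were not part of the SOLAR workflow described above.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness; values must not all be identical."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def pick_transform(values, scale=1.0):
    """Return (name, transformed values) for the least skewed candidate.

    All values must be positive so that the log and inverse are defined.
    """
    candidates = {
        "none": list(values),
        "log": [math.log(v) for v in values],
        "inverse": [1.0 / v for v in values],
    }
    name = min(candidates, key=lambda k: abs(skewness(candidates[k])))
    # Rescale by a constant, as was done before the SOLAR analysis.
    return name, [scale * v for v in candidates[name]]


# Hypothetical right-skewed phenotype values.
name, scaled = pick_transform([1, 2, 4, 8, 16, 32], scale=10.0)
```

Skewness is only one aspect of normality; a formal test or a check of kurtosis could be substituted without changing the structure of the sketch.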

Multipoint linkage analysis was conducted using SOLAR for traits with heritability measures greater than 15 percent. Genotypes for 126 individuals in 14 CEPH families were obtained from version 10.0 of the CEPH genotype database, which includes both microsatellite markers and SNPs. Genetic locations of these markers were determined using data from the Marshfield map. The total number of genotypes available for these markers was 95,278 at 7,403 loci. Thirty markers with Mendelian inconsistencies or other errors were identified by SOLAR and removed. Before running the linkage analysis, the “lodadj -calc” command was used to create a distribution of LOD scores under the null hypothesis of no linkage. Bivariate linkage analysis of the AKT1 and p70S6K ratio phenotypes was also conducted in SOLAR. For this analysis, covariates that were statistically significant for each of the individual traits were applied to the respective traits, and testing was performed using the “polygenic -testrhog” command.

2.4 Results


2.4.1 Selection of experimental method for measuring signaling proteins

Both Western blots and ELISAs were tested for their ability to accurately and reliably measure levels of total and phosphorylated signaling proteins. In contrast to ELISAs, which can often be performed in a 96‐well format, a typical Western blot often measures far fewer samples. In order to develop a more high‐throughput Western blotting approach, an attempt was made to profile multiple signaling proteins for each sample. In theory, the number of multiplexed proteins analyzed by Western blot is limited only by the sizes of the target proteins; bands for distinct proteins should not overlap, and the smallest and largest proteins should be able to be visualized on the same blot. Working with these limitations, antibodies for total levels of six signaling proteins from the PI3K/AKT/mTOR and RAF/MEK/ERK pathways were multiplexed along with an antibody for GAPDH as a control (Figure 2.1A). The bands were automatically quantitated using measurements of fluorescence intensity. After normalizing to GAPDH band intensity, intensity levels for five of the six proteins (p90RSK, p70S6K, ERK1/2, AKT and S6rp) showed low inter‐gel variation (Figure 2.1B). This indicated high precision for this method, but a further experiment suggested low accuracy. Specifically, an experiment to assess the linearity of the Western blot approach failed to show a linear relationship between the loaded amount of total protein and the resulting GAPDH band intensity (Figure 2.1C). Further investigation of this finding revealed that band intensity was highly dependent on the placement of the blot within the imaging apparatus; a control band from a commercially‐produced protein ladder showed very different intensities on three different blots depending on how the blots were arranged relative to the light source (Figure 2.1D). It is possible that these difficulties could have been overcome with a more advanced imaging apparatus. However, parallel experiments attempting to develop a multiplexed Western blot assay for measuring the phosphorylated forms of these signaling proteins also failed due to insufficient sensitivity (data not shown).

ELISAs proved to be more reliable than Western blots in accurately and precisely measuring total and phosphorylated signaling proteins. Though phosphorylated levels of some signaling proteins (ERK1/2, STAT1, STAT3) were undetectable in test lysates using commercial ELISA kits, robust results were obtained for both phosphorylated and total levels of three members of the PI3K/AKT/mTOR signaling pathway: AKT1 [pS473], p70S6K [pT389] and 4E‐BP1 [pT46]. In contrast to the Western blot experiments, the use of recombinant protein standards for the ELISAs demonstrated a clear linear relationship between protein concentration and the quantitative assay readout (data not shown). Additionally, pilot experiments demonstrated that ELISA detection of phosphorylated protein was robust to variations in sample handling, including differing numbers of freeze‐thaw cycles (Figure 2.2A). Experiments performed on different days using ELISA kits from different lots yielded slightly different results, though similar patterns were seen across samples in each experiment (Figure 2.2B). Finally, similar results were seen within cell lines that had been split several ways before being cultured and harvested to produce separate protein samples, while variation between cell lines persisted (Figure 2.2C). These results strongly suggested that ELISAs would be the best way to profile signaling proteins in the full panel of 122 lymphoblastoid cell lines chosen for genetic analyses.

2.4.2 Quality control analysis of signaling phenotypes

To limit any spurious effects of experimental variation, the ELISA experiments for each target protein were performed on a single day, using kits from identical lots. When comparing the pooled lysates used as controls on each 96‐well ELISA plate, it was found that all assays had low to moderate coefficients of variation (CVs), which ranged from 7.6 percent (phospho‐AKT1) to 18.7 percent (total 4E‐BP1). Interestingly, the CVs for the raw quantitative readout of absorbance at 450 nm, or OD(450), were generally lower than the CVs calculated using the measurement of protein concentration derived using the kit‐supplied recombinant protein standards. Because of this, OD(450) measurements were used for subsequent analyses of the total and phosphorylated forms of each protein. As an additional quantitative phenotype, the ratio of the OD(450) for the phosphorylated form of each protein to the OD(450) of the total protein was calculated. This resulted in three phenotypes (total, phosphorylated, phosphorylated to total ratio) for each of the three signaling proteins (AKT1, p70S6K, 4E‐BP1), for a total of nine phenotypes.
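The coefficient of variation used for this plate-to-plate comparison is the standard deviation of the pooled-lysate control divided by its mean. A minimal Python sketch is given below; the readings are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical OD(450) readings of one pooled-lysate control,
# one reading per 96-well plate.
pooled_control_od = [1.02, 0.95, 1.10, 0.98, 1.05, 0.90]
cv_percent = coefficient_of_variation(pooled_control_od)
```

Applying the same function to the concentration values derived from the standard curve would allow the two readouts to be compared directly, as was done for the OD(450) and concentration CVs above.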

After each of the nine phenotypes had been transformed to achieve normality, each was assessed individually for statistically significant association with any covariates. Tested covariates included variables describing the individuals (such as sex and age) and experimental variables (such as initial protein concentration). Statistically significant covariates are shown in Table 2.1 and were adjusted for before continuing with further analyses.

2.4.3 Variation and heritability of signaling phenotypes in lymphoblastoid cell lines

It was first confirmed for each of the three measured proteins (AKT1, p70S6K and 4E‐BP1) that the covariate‐adjusted phenotypes were variable across the 122 cell lines used in this study. These measurements varied to different extents. For example, when considering only the total protein measurements in unrelated individuals, p70S6K showed the lowest fold difference between the highest and lowest measurements (1.52 ± 0.10), while 4E‐BP1 had the highest (2.43 ± 0.13). The fold differences for measurements of phosphorylated proteins appeared to be slightly more variable (1.99 ± 0.02, 1.35 ± 0.13 and 4.80 ± 0.07 for AKT1, p70S6K and 4E‐BP1, respectively), as did the phosphorylated:total ratios (2.91 ± 0.03, 1.46 ± 0.07 and 2.92 ± 0.07 for AKT1, p70S6K and 4E‐BP1, respectively). Plots showing covariate‐adjusted measurements for all nine phenotypes in unrelated individuals can be found in Figure 2.3.
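The fold difference reported here is the ratio of the highest to the lowest measurement for a phenotype. A minimal Python sketch, using hypothetical covariate-adjusted values, is shown below. How the reported ± values were derived is not stated in the text, so no error term is computed.

```python
def fold_difference(values):
    """Return highest / lowest for a list of positive measurements."""
    lowest, highest = min(values), max(values)
    if lowest <= 0:
        raise ValueError("fold difference requires positive measurements")
    return highest / lowest


# Hypothetical covariate-adjusted OD(450) values for unrelated individuals,
# sorted from highest to lowest as in Figure 2.3.
adjusted = sorted([0.62, 0.48, 0.55, 0.41, 0.59], reverse=True)
fold = fold_difference(adjusted)
```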

Given that variation existed, it was possible to assess the phenotypes’ heritability in order to determine the percentage of the total variation attributable to genetic effects. Heritabilities of the normally distributed, covariate‐adjusted phenotypes were generally very low (<3 percent), with two exceptions (Table 2.2). The phosphorylated:total ratios for two proteins, AKT1 and p70S6K, displayed moderate heritability (h² = 17.99 percent and 23.13 percent, respectively). It is important to note that the p70S6K ratio had been adjusted for five covariates, including two variables describing on which ELISA plate the samples were measured and three variables related to the individual’s age at the time the pedigrees were initially obtained (age, age × sex and age² × sex, Table 2.1). Since individual ages were not available for many of the CEPH pedigrees, only 62 of 122 individuals were available to use for further analysis of the p70S6K ratio phenotype.

2.4.4 Linkage analysis of heritable phenotypes

Linkage analysis was conducted for both the AKT1 and p70S6K ratio traits. While neither trait obtained a genome‐wide statistically significant LOD score, there was suggestive linkage for p70S6K on chromosome 3, with a maximum LOD score of 2.37 (Figure 2.4A). This peak was located at the 36 cM position and had a one‐LOD confidence interval that spanned the region from 21 cM to 46 cM. There appeared to be a second peak on chromosome 9, but the presence of multiple convergence errors in this region casts some doubt on its reliability. For the AKT1 ratio phenotype, the maximum LOD score was 1.70 at the 112 cM mark on chromosome 14 (Figure 2.4B). The one‐LOD confidence interval for this linkage peak extended from 87 cM to the q‐terminus of the chromosome (approximately 134 cM).
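A one-LOD confidence interval of this kind can be read from a LOD curve by moving outward from the peak until the score falls more than one LOD unit below the maximum. The Python sketch below illustrates the idea on a hypothetical curve. It works on the scanned grid points only, without interpolation, and is not the procedure implemented in SOLAR.

```python
def one_lod_interval(positions_cm, lod_scores, drop=1.0):
    """Return (peak_cm, left_cm, right_cm) for the highest peak.

    The interval grows outward from the peak while neighbouring grid
    points keep LOD >= peak LOD - drop. If the curve never falls below
    that threshold, the interval runs to the end of the chromosome.
    """
    peak_idx = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak_idx] - drop
    left = peak_idx
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1
    return positions_cm[peak_idx], positions_cm[left], positions_cm[right]


# Hypothetical LOD curve sampled every 10 cM along one chromosome.
positions = [0, 10, 20, 30, 40, 50, 60]
lods = [0.2, 0.9, 1.5, 2.3, 1.6, 1.0, 0.4]
peak, lower, upper = one_lod_interval(positions, lods)
```

When the curve stays within one LOD unit of the maximum up to the last scanned position, the upper bound is simply the end of the chromosome, as for the AKT1 peak on chromosome 14.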

Interestingly, there was also a smaller peak for the AKT1 ratio on chromosome 3 (LOD = 1.38 at 57 cM) close to the major peak for the p70S6K ratio at 36 cM. Since AKT1 is known to act upstream of p70S6K in the PI3K/AKT/mTOR pathway, it was possible that the two peaks were the result of a situation in which genetic variation in this region influenced one protein’s phosphorylated:total protein ratio, which then influenced the other protein’s ratio. A bivariate analysis revealed that the phenotypes were genetically correlated (RhoG = 0.980, P = 0.009). A linkage scan for the correlated traits revealed a peak on chromosome 3 with a maximum LOD score of 2.91, indicating that genes in this region influence both traits (Figure 2.5). Incorporating each of the ratio traits as a covariate for the other trait resulted in retention of the peak for the AKT1 ratio but not for the p70S6K ratio (data not shown). This implies that the causal variants in this region mainly influence the AKT1 ratio, which then, in turn, influences the p70S6K ratio.

2.5 Discussion

Variation in the activation of the PI3K/AKT/mTOR pathway has been linked to a variety of human cancers. To assess this variation, an ELISA method was established for measuring the amounts of total and phosphorylated AKT1, p70S6K and 4E‐BP1 in lymphoblastoid cell lines from the CEPH collection. These measurements, as well as the ratios of phosphorylated to total AKT1, p70S6K and 4E‐BP1, were variable. Though little of the variation measured for most of the phenotypes was due to genetic influences, two phenotypes did show moderate heritability: the phosphorylated to total ratios of AKT1 (17.99 percent) and p70S6K (23.13 percent). The phosphorylated to total ratio of 4E‐BP1 was not heritable (h² = 0). Linkage analysis of the AKT1 ratio phenotype produced a maximum LOD score of 1.70 on chromosome 14 and a secondary peak on chromosome 3. The p70S6K ratio phenotype mapped to the same region of chromosome 3 as the AKT1 phenotype, and the two traits were found to be genetically correlated, with a maximum LOD score of 2.91 for the correlated traits.

The degree of genetic influence on the phosphorylation of these three proteins appears to be similar to that reported for gene expression and protein phenotypes. In cell lines from the CEPH collection, approximately 30 percent of differentially expressed genes appear to have statistically significant heritability, and heritability is high (greater than 60 percent) in only 10 percent of these genes [Huang, et al. 2007; Monks, et al. 2004; Schadt, et al. 2003]. In contrast to the wealth of gene expression data available, information on protein expression is sparser. A handful of proteins have been measured in blood samples, including C‐reactive protein (CRP), serum insulin and fibrinogen, and heritabilities for these traits range from 18 to 60 percent [Pilia, et al. 2006; Yuan, et al. 2008]. Heritability data for protein phenotypes in human cell lines are even rarer. Only one study was identified in which the heritability of a specific protein (chromogranin B) was determined in cell lines, and the results were highly variable and dependent on which cleavage product was measured (h² = 0.378‐0.910) [Greenwood, et al. 2004].

Given that relatively few gene and protein expression phenotypes are heritable, the finding presented here of low heritability for the total levels of AKT1, p70S6K and 4E‐BP1 is not surprising. Additionally, the two heritability levels observed for the ratios of phosphorylated to total AKT1 and p70S6K (17.99 percent and 23.13 percent, respectively) fall within the range of h² values observed for other gene and protein expression phenotypes. It is also interesting to note that both of the heritable phenotypes identified in this study are ratios. It is possible that by taking a ratio of two measurements, some experimental sample‐to‐sample noise is eliminated, reducing the overall variation and increasing the heritability. In fact, a study of metabolic phenotypes identified stronger genetic associations when considering ratios of metabolites rather than single metabolite measurements [Gieger, et al. 2008].

Though heritability measurements are not generally correlated with maximum LOD scores [Huang, et al. 2007], the only two phenotypes in this study that displayed linkage peaks of interest were the two with moderate heritabilities. The maximum LOD score for the AKT1 ratio phenotype was only 1.70, falling short of the 1.80 threshold for suggestive linkage. If the signal represents true linkage, it may be that its lower LOD score is reflective of the small sample size used in these experiments; the use of additional CEPH pedigrees might have increased the LOD score of this peak. The secondary peak for the AKT1 phenotype on chromosome 3 also initially seemed to be too low to accept statistically. However, when it was considered in tandem with the p70S6K phenotype that mapped to the same area, the traits were found to share a genetic component that produced a much higher LOD score of 2.91 in the same area.


Given what is known about the architecture of the PI3K/AKT/mTOR signaling pathway, this result is highly plausible biologically. A variant in this region that influences the ratio of phosphorylated to total AKT1 would likely also produce a change in the ratio of phosphorylated to total p70S6K, since AKT1 activation results in the mTOR‐mediated phosphorylation and activation of p70S6K. The disappearance of the chromosome 3 linkage peak for p70S6K when the AKT1 ratio is included as a covariate is consistent with this directionality. This represents evidence that this cell line genetic platform can confirm existing biology as well as be used to uncover new information.

The work presented here represents a novel and successful genetic method for exploring inter‐individual variation in cell signaling. With high‐throughput platforms for measuring protein levels and post‐translational modifications on the horizon, this method will be a simple, straightforward way in which to identify biologically significant variants that may be relevant to any field in which cell signaling is important, including cancer research.


Figure 2.1: Development of Western blot assay for signaling proteins. (A) Antibodies for six signaling proteins in the PI3K/AKT/mTOR and RAF/MEK/ERK signaling pathways were successfully multiplexed along with an antibody for the GAPDH control. Several antibodies (p70S6K, AKT, MEK1/2 and ERK1/2) produce two adjacent bands of different molecular weights. (B) Inter‐gel variation is shown for a single protein sample. The sample was run on four independent gels (represented by the blue, red, green and purple columns), and blots were probed for five signaling proteins and GAPDH. Intensities for each signaling protein in a gel lane were normalized to the intensity of the GAPDH band in that lane. (C) Linearity of the GAPDH control is assessed for three different protein samples, represented by the blue diamonds, red squares and green triangles. There is no consistent linear relationship between protein loaded and the area under the intensity peak. (D) Effects of blot position on the intensity of a high molecular weight (MW) band in a protein ladder are assessed. Three replicate blots (represented by the blue, red and green columns) were arranged in three different configurations on the GelDoc imager. The intensity of the ladder band varies widely depending on the arrangement of the blots.


Figure 2.2: Quality control measurements for ELISAs. Data shown are specific to the phospho‐AKT1 [S473] ELISA, though levels of variation for the other five ELISAs are comparable. (A) Aliquots of the same protein sample were thawed on ice and refrozen the specified number of times, with no appreciable differences in the final ELISA measurements. Error bars indicate the standard deviation across replicate assay wells. (B) Five samples were used in three ELISA assays (red, blue and green columns) which were performed on different days using kits from different lots. There is some variation across different experiments, though the measurements of the samples relative to each other appear to be consistent. Error bars indicate standard deviation across triplicate wells for each sample. (C) Each cell line was split into five separate flasks, producing five separate protein samples. Error bars indicate standard deviation across these five samples, which is relatively small for all cell lines.


[Figure 2.3 image not reproduced. Panels, by row: AKT1 (total), AKT1 (phospho Ser473), AKT1 ratio; p70S6K (total), p70S6K (phospho Thr389), p70S6K ratio; 4E‐BP1 (total), 4E‐BP1 (phospho Thr46), 4E‐BP1 ratio. Vertical axes: ELISA OD(450) for the total and phospho panels; ratio of phospho:total ELISA OD(450) for the ratio panels.]

Figure 2.3: Variation in AKT1, p70S6K and 4E‐BP1 phenotypes. ELISA measurements of three different proteins in cell lines from 26 unrelated individuals are shown here. The proteins are AKT1 (top row), p70S6K (middle row) and 4E‐BP1 (bottom row). The measurements are of total amounts of the protein (left column), of the amounts of the protein phosphorylated at the site indicated (middle column) and of the ratio of phosphorylated to total protein (right column). Measurements were adjusted for statistically significant covariates before graphing and are sorted from highest to lowest within each graph. Error bars represent standard error of the mean (s.e.m.).

[Figure 2.4 image not reproduced. Panels A and B each plot LOD score (vertical axis) against cM position along chromosomes 1 to 22 (horizontal axis).]

Figure 2.4: Genomewide linkage analysis results for AKT1 and p70S6K ratio phenotypes. Genomewide linkage analysis results are shown for the ratio of phosphorylated to total p70S6K (A) and AKT1 (B). Chromosomes 1‐22 are depicted from left to right, with alternating colors indicating different chromosomes; odd‐numbered chromosomes are shown in black and even‐numbered chromosomes in blue. The maximum LOD score for the p70S6K ratio trait is 2.37 and is found on chromosome 3, while the maximum LOD score for the AKT1 ratio is 1.70 on chromosome 14.

[Figure 2.5 image not reproduced. The plot shows LOD score (vertical axis, 0.0 to 3.0) against cM position on chromosome 3 (horizontal axis, 0 to 70).]

Figure 2.5: Linkage to chromosome 3 of correlated AKT1 and p70S6K ratio phenotypes. The AKT1 and p70S6K ratio traits were found to be genetically correlated, and their shared genetic component was used for genomewide linkage analysis. A section of chromosome 3 containing the maximum LOD score of 2.91 is shown here.


Table 2.1: Statistically significant covariates of AKT1, p70S6K and 4E‐BP1 phenotypes

Protein   Phenotype              Statistically significant covariates
AKT       total                  plate 5, protein concentration
          phosphorylated         plate 4, plate 6
          phosphorylated:total   protein concentration
p70S6K    total                  plate 2, plate 6, protein concentration
          phosphorylated         plate 2, plate 3, plate 6, protein concentration
          phosphorylated:total   age, age × sex, age² × sex, plate 2, plate 3
4E‐BP1    total                  plate 2, plate 4, plate 5, plate 6, column 3
          phosphorylated         plate 2, plate 3, plate 6, protein concentration
          phosphorylated:total   plate 3, plate 4, plate 5, plate 6, column 3, protein concentration

Table 2.2: Heritabilities of AKT1, p70S6K and 4E‐BP1 phenotypes

Protein   Phenotype              Heritability (%)
AKT       total                  2.64
          phosphorylated         2.04
          phosphorylated:total   17.99
p70S6K    total                  2.81
          phosphorylated         0
          phosphorylated:total   23.13
4E‐BP1    total                  2.10
          phosphorylated         0
          phosphorylated:total   0


Chapter 3

CANDID: A flexible method for prioritizing candidate genes for complex human traits

Portions of the work in this chapter were published in [Hutz, et al. 2008].


3.1 Abstract

Genomewide studies and localized candidate gene approaches have become everyday study designs for identifying polymorphisms in genes that influence complex human traits. Yet, in general, the number of statistically significant findings and the need to focus on smaller regions require a prioritization of genes for further study. Some candidate gene identification algorithms have been proposed in recent years to attempt to streamline this prioritization, but many suffer from limitations imposed by the source data or are difficult to use and understand. CANDID is a prioritization algorithm designed to produce impartial, accurate rankings of candidate genes that influence complex human traits. CANDID can use information from publications, protein domain descriptions, cross‐species conservation measures, gene expression profiles and protein‐protein interactions in its analysis. Additionally, users may supplement these data sources with results from linkage, association and other studies. CANDID was tested on well‐known complex trait genes using data from the Online Mendelian Inheritance in Man (OMIM) database. CANDID was also evaluated in a modeled gene discovery environment, where it ranked genes whose trait associations were published after CANDID’s databases were compiled. In all settings, CANDID exhibited high sensitivity and specificity, indicating an improvement upon previously published algorithms. Its accuracy and ease of use make CANDID a highly useful tool in study design and analysis for complex human traits.


3.2 Introduction

Investigators initiating a study of a complex human trait generally choose either a candidate gene or genomewide approach. In a candidate gene approach, genes are selected based on their putative functions and possible relevance to the trait. This approach is necessarily biased towards well‐characterized genes, but if the right candidate gene(s) are selected, it may be the quickest and least expensive study design. Genomewide studies are more impartial, but the resulting statistical analyses are complicated by the large number of statistically significant or suggestive results. In many cases, investigators use a combination of genomewide and candidate gene approaches, using genomewide technologies to identify a subset of genes of interest and then conducting further analysis on the most promising candidates in this subset [Calvo, et al. 2006; Mootha, et al. 2003; Niculescu, et al. 2000; Schadt, et al. 2005]. As a result, both candidate gene and genomewide approaches often require a candidate gene selection and/or prioritization step. This step is highly susceptible to bias from a number of sources, including the extent and area of the individual scientist’s knowledge and the degree to which potential candidate genes have been characterized in the scientific literature and public databases.

Some candidate identification algorithms (CIAs) have been created to assist in automating candidate gene selection or prioritization. These algorithms make use of a wide array of publicly available and user‐generated data, and as such, they are very diverse in their designs. While some of these algorithms may be very successful under certain circumstances as discussed below, broader use of these algorithms is generally limited by shortcomings in two main areas: data sources and overall usability. Many CIAs require specific data inputs, such as previously obtained biological data, which may not be available. For example, Prioritizer, TOM, SUSPECTS, GeneWanderer and POCUS require the user to supply at least one locus of interest [Adie, et al. 2006; Franke, et al. 2006; Kohler, et al. 2008; Perez‐Iratxeta, et al. 2005; Rossi, et al. 2006; Turner, et al. 2003]. GeneSeeker requests that the user supply a locus of interest; if there is none, the algorithm must rely solely on publicly available expression data [van Driel, et al. 2003]. Another CIA, GeneRank, is based on Google’s PageRank algorithm and requires the user to supply expression data, again necessitating previous biological studies [Morrison, et al. 2005]. Several other recently reported CIAs (Endeavour, CGI, ToppGene and DRS) do not require biological data but instead require a training set of at least several known disease genes, which are then used to create a profile that is compared to potential candidates [Aerts, et al. 2006; Chen, et al. 2007; Li and Patra ; Ma, et al. 2007]. Selection of the training set relies on previous biological characterization of the trait, meaning that only a subset of well‐characterized traits can be studied with these CIAs.

The choice of datasets used to evaluate candidates can also limit the usefulness of CIAs. For example, Gene Ontology (GO) functional classifications of genes are heavily relied upon in many CIAs, including Endeavour, Prioritizer, TOM, SUSPECTS, POCUS, G2D, GeneRank, GFSST, DRS, ACGR and GeneDistiller [Adie, et al. 2006; Aerts, et al. 2006; Franke, et al. 2006; Li and Patra ; Morrison, et al. 2005; Perez‐Iratxeta, et al. 2005; Rossi, et al. 2006; Seelow, et al. 2008; Turner, et al. 2003; Yilmaz, et al. 2009; Zhang, et al. 2006]. GO classification terms include a unique identification number and a brief description of the corresponding biological function using a controlled vocabulary, making them ideally suited to annotating large datasets [Ashburner, et al. 2000]. However, only 60 percent of all human genes have associated GO terms, and these terms may be inconsistent due to differences in curators’ judgment [Ashburner, et al. 2000; Dolan, et al. 2005]. Additionally, genes that already have GO annotations may still have additional unidentified functions. Similar limitations occur with the use of manually‐annotated pathway classifications from the Kyoto Encyclopedia of Genes and Genomes (KEGG), implemented in Endeavour, Prioritizer and Gentrepid [Aerts, et al. 2006; Franke, et al. 2006; George, et al. 2006]. Depending on the CIA in question, these issues reduce or eliminate the probability that poorly characterized genes will be viable candidates.

The usability limitations of current CIAs are generally related to either the algorithm output or the ease of use. Some CIAs evaluate a certain number of genes and then supply the user with an unordered subset of these genes [George, et al. 2006; Rossi, et al. 2006; Turner, et al. 2003; van Driel, et al. 2003; Zhang, et al. 2006]. This subset can be prohibitively large, depending on the nature of the intended follow‐up experiments. In this situation, the user would be forced to rely upon his or her own knowledge to pick candidates out of the subset, negating the purpose of using a CIA. A better output method provides a ranked list of all considered genes, allowing the user to interpret the results as he or she sees fit. This method is employed by many of the CIAs, including Prioritizer, SUSPECTS, G2D, GeneRank, Endeavour and CGI [Adie, et al. 2006; Aerts, et al. 2006; Franke, et al. 2006; Ma, et al. 2007; Morrison, et al. 2005; Perez‐Iratxeta, et al. 2005]. The last weakness of some CIAs is low accessibility. An ideal CIA would be easy for all potential users to understand and easy to access, preferably through a web‐based interface that does not require software installation [Troyanskaya 2005]. Of the previously mentioned CIAs, only eight (Endeavour, G2D, TOM, GeneSeeker, SUSPECTS, ToppGene, GeneWanderer and GeneDistiller) are available as purely web‐based applications that are easily accessible to both biologists and statisticians.

CANDID is a highly effective genomewide candidate identification algorithm designed with complex human trait genetics in mind. A variety of data sources are used by CANDID, and most of these were selected to reduce bias against poorly characterized genes. Users also have the option of supplementing CANDID’s analyses by providing their own data sources, including results from linkage and association studies. Additionally, users can optionally use their own knowledge to fine‐tune their analyses by specifying which data sources CANDID uses and the weights CANDID gives to each source. Finally, CANDID has a flexible web‐based interface that is easily accessible to both biologists and statisticians.

3.3 Methods


3.3.1 Overview of scoring

CANDID scores genes by up to eight criteria: publications, protein domains, cross‐species conservation, gene expression profile, protein‐protein interactions, linkage analysis results, association analysis results and custom data (Figure 3.1). Each gene receives criterion‐specific scores, which are normalized and weighted by the user‐defined criterion weight and then summed to form the gene’s final score. Genes are ranked by final score and presented as a list to the user along with detailed scoring information. While CANDID evaluates all human genes by default, users also have the option of limiting their analyses to only protein‐coding genes.

3.3.2 Scoring criteria

3.3.2.1 Publication data and scoring

CANDID’s publications scoring method differs from those of previously described CIAs. G2D, for example, links genes to traits through medical subject headings (MeSH) and GO terms [Perez‐Iratxeta, et al. 2007; Perez‐Iratxeta, et al. 2005]. CANDID’s publications database consists of direct links between PubMed IDs and EntrezGene IDs, where a gene‐publication link is made when a publication describes evidence of the sequence or function of the human gene or its murine ortholog. The publications associated with gene X will be identified as the set G(X). The CANDID user provides a set of keywords associated with the trait of interest that would be appropriate for a typical literature search. As a general rule, keywords should be chosen based on whether publications about that keyword would be of interest to someone researching the trait. For example, if someone researching schizophrenia were interested in knowing about genes with known functions in the brain, he or she should include “brain” as a keyword. As a result, keywords may range from very trait‐specific (“schizophrenia”) to very broad (“brain”). Traits may also differ greatly in the number of applicable keywords. It is to be expected that some traits will only have a few relevant keywords, and in these cases, adding more keywords that are only vaguely related to the trait would likely introduce more noise into the final rankings. Also, synonyms should be included when appropriate to encompass as many relevant publications as possible. For example, searching for “Alzheimer” and “Alzheimer’s” returns different results, so both variations should be used when applicable. The resulting set of keywords is used in a “text word” PubMed search to identify all matching publications (M). The publication score, P, for gene X is a value between 0 and 1 equal to

$$P(X) = \frac{|M \cap G(X)|}{|G(X)|}$$

The rationale for this method is to reward genes that are commonly linked with publications describing the trait of interest independent of the degree to which these genes are characterized in the literature. After all genes are scored for this criterion, scores are normalized by dividing all publications scores by the highest overall publications score, eliminating a relative bias against traits with few matching publications.
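The publication score and its normalization can be sketched with set operations. This is a minimal illustration and not CANDID's actual implementation: the function name, gene labels and PubMed IDs are hypothetical, and assigning a score of 0 to genes with no linked publications is an assumption.

```python
def publication_scores(matching_pubs, gene_pubs):
    """Return normalized publication scores per gene.

    matching_pubs: set of PubMed IDs returned by the keyword search (M).
    gene_pubs: dict mapping gene ID -> set of PubMed IDs linked to it (G(X)).
    """
    raw = {}
    for gene, pubs in gene_pubs.items():
        # P(X) = |M ∩ G(X)| / |G(X)|; assumption: no publications -> 0.
        raw[gene] = len(matching_pubs & pubs) / len(pubs) if pubs else 0.0
    top = max(raw.values(), default=0.0)
    # Normalize by the highest score so that the best gene receives 1.
    return {g: (s / top if top > 0 else 0.0) for g, s in raw.items()}


# Hypothetical PubMed IDs and gene labels, for illustration only.
M = {101, 102, 103}
G = {"geneA": {101, 102, 900, 901}, "geneB": {103}, "geneC": {950}}
pub_scores = publication_scores(M, G)
```

In this toy example, geneB has a single publication that matches the search and therefore outranks geneA, which has more matching publications but a lower matching fraction.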


3.3.2.2 Protein domain data and scoring

Protein domain information is obtained from the National Center for Biotechnology Information (NCBI) Conserved Domain Database (CDD), a curated database that integrates information from other databases, such as Pfam, SMART and COG. The CDD is similar to InterPro, another protein domain resource used by CIAs such as SUSPECTS [Adie, et al. 2006]. Domain entries may possess a description of the domain and links to genes whose protein products contain these domains. As with the publications criterion, the CANDID user provides a set of keywords relating to the trait of interest, which are used to search the protein domain descriptions. Any gene whose protein product(s) contain at least one of the matching domains receives a score of 1. All other genes receive a score of 0. This method serves to reward genes for their putative functions, and since domain prediction is generally sequence‐based, it will have the potential to reward nearly all protein‐coding genes for which domains have been identified, regardless of how extensively they have been characterized in the scientific literature.
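The binary domain score can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify how keywords are matched against domain descriptions, so case‐insensitive substring matching is assumed here, and the domain IDs, descriptions and gene labels are hypothetical.

```python
def protein_domain_scores(keywords, domain_descriptions, gene_domains):
    """Score 1 for genes with at least one keyword-matching domain, else 0.

    keywords: trait keywords supplied by the user.
    domain_descriptions: dict mapping domain ID -> free-text description.
    gene_domains: dict mapping gene ID -> set of domain IDs in its product(s).
    """
    lowered = [k.lower() for k in keywords]
    # Assumption: a domain matches if any keyword occurs in its description.
    matching = {
        domain
        for domain, text in domain_descriptions.items()
        if any(k in text.lower() for k in lowered)
    }
    return {
        gene: int(bool(domains & matching))
        for gene, domains in gene_domains.items()
    }


# Hypothetical domain IDs and descriptions, for illustration only.
descriptions = {
    "dom1": "Catalytic domain of a serine/threonine kinase",
    "dom2": "Zinc finger, DNA binding",
}
genes = {"geneA": {"dom1", "dom2"}, "geneB": {"dom2"}, "geneC": set()}
domain_result = protein_domain_scores(["kinase"], descriptions, genes)
```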

3.3.2.3 Conservation data and scoring

CANDID’s method for assigning conservation scores is similar to that of other CIAs, including DGP [Lopez‐Bigas and Ouzounis 2004]. NCBI’s HomoloGene database provides the information for CANDID’s conservation analysis. HomoloGene analyzes genes from 18 completely sequenced organisms and detects homologs using amino acid and DNA sequences. For each gene, one of nine labels is supplied that describes the phylogenetic similarity between humans and the organism harboring the most distant homolog. “Eutheria,” for example, is a common descriptor that refers to genes present only in eutherian mammals, such as humans and mice. Human‐specific genes receive a score of 0, while genes with the label “Eukaryota” receive the maximum score of 1. Scores for the other seven labels are distributed evenly, ranging from 0.125 to 0.875 in increments of 0.125; genes with the label “Eutheria” receive a score of 0.375. A high score for this criterion may be relevant for phenotypes that are known to involve conserved cellular processes, such as cancer, while it may be irrelevant for phenotypes such as hair color that only affect a small subset of species. Scores for this criterion are normalized by dividing all genes’ conservation scores by the highest conservation score.
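The evenly spaced conservation scores can be reproduced from a label's rank among the nine labels. In this sketch the function name and the rank encoding are assumptions: rank 0 stands for human‐specific genes and rank 8 for “Eukaryota”, and the rank of 3 for “Eutheria” is inferred from its stated score of 0.375.

```python
N_LABELS = 9  # nine phylogenetic labels, ordered from least to most distant


def conservation_score(label_rank):
    """Map a label rank (0 = human-specific, 8 = "Eukaryota") to a score."""
    if not 0 <= label_rank < N_LABELS:
        raise ValueError("label_rank must be between 0 and 8")
    # Evenly spaced scores: 0, 0.125, 0.25, ..., 0.875, 1.
    return label_rank / (N_LABELS - 1)


eutheria_score = conservation_score(3)  # rank inferred from 0.375 / 0.125
```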

3.3.2.4 Expression data and scoring

The Genomics Institute of the Novartis Research Foundation (GNF) Gene Atlas comprises expression levels of 17,761 human genes in at least 79 human tissues [Su, et al. 2004]. CANDID stands in contrast to other CIAs, such as TOM, that compare expression patterns across genes [Rossi, et al. 2006]. Instead, CANDID compares each gene’s expression levels across 79 to 84 measured tissues in the Gene Atlas. Each gene receives a score of 1 for the tissue(s) in which the gene is most highly expressed, and the gene’s scores for the remaining tissues correspond to the ratio of the expression level in that tissue to the gene’s maximum expression level. With this strategy, genes that are specifically expressed in a certain tissue receive a high score for that tissue and very low scores for all other tissues, while “housekeeping” genes with roughly equal expression levels in all tissues receive roughly the same scores across all tissues. This method serves to highlight genes in the tissues where they may be thought to play the most important roles. In order to use this criterion, CANDID users must enter at least one tissue code between 1 and 84 corresponding to the tissue(s) of interest. A gene’s expression score corresponds to the sum of its individual tissue scores for all of the user‐specified tissues. Scores for this criterion are normalized by dividing all genes’ expression scores by the highest expression score.
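The expression score described above can be sketched as a per‐gene ratio to the maximum expression level, summed over the selected tissues and then normalized. The function name, gene labels, tissue codes and expression values are hypothetical, and treating an unmeasured tissue as contributing 0 is an assumption.

```python
def expression_scores(expression, tissues_of_interest):
    """Return normalized expression scores per gene.

    expression: dict mapping gene ID -> dict of tissue code -> level.
    tissues_of_interest: iterable of user-specified tissue codes.
    """
    raw = {}
    for gene, levels in expression.items():
        peak = max(levels.values(), default=0.0)
        if peak <= 0:
            raw[gene] = 0.0
            continue
        # Tissue score = level / gene's maximum level (1 in the peak tissue).
        # Assumption: tissues not measured for this gene contribute 0.
        raw[gene] = sum(levels.get(t, 0.0) / peak for t in tissues_of_interest)
    top = max(raw.values(), default=0.0)
    return {g: (s / top if top > 0 else 0.0) for g, s in raw.items()}


# Hypothetical expression levels for three tissue codes.
expression = {
    "tissue_specific": {1: 1000.0, 2: 10.0, 3: 10.0},
    "housekeeping": {1: 200.0, 2: 200.0, 3: 200.0},
}
in_peak_tissue = expression_scores(expression, [1])
off_peak_tissue = expression_scores(expression, [2])
```

When tissue 1 is selected, both genes receive the top score; when tissue 2 is selected, the tissue‐specific gene scores far below the housekeeping gene.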

3.3.2.5 Protein‐protein interaction data and scoring

Protein‐protein interactions can provide valuable information about a gene product’s function, even in the absence of other information. As such, they have been widely used in recent CIAs, including Prioritizer, Gentrepid and the method published by Oti, et al. [Franke, et al. 2006; George, et al. 2006; Oti, et al. 2006]. In the NCBI Gene database, published protein‐protein interactions for each gene are compiled from sources such as BIND, HPRD and BioGRID. To determine a gene’s interactions score, CANDID sums the publications and protein domains scores for all of the gene’s interacting partners. Using this method, it is possible for a gene of unknown function to have scores of zero in all categories, but still be highly rated due to an interaction with a key, high‐scoring protein, perhaps identified through a high‐throughput screen. These scores are not normalized, and as such, scores for this criterion range from 0 to an undefined upper limit.
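A minimal sketch of the interactions score follows. The gene labels and score values are hypothetical, and whether the summed publications scores are raw or normalized is not stated in the text, so normalized values are assumed here.

```python
def interaction_scores(partners, publication_scores, domain_scores):
    """Sum the publications and protein domains scores of each gene's partners.

    partners: dict mapping gene ID -> set of interacting gene IDs.
    publication_scores, domain_scores: dicts mapping gene ID -> score.
    """
    scores = {}
    for gene, interactors in partners.items():
        # Partners without a recorded score contribute 0 (assumption).
        scores[gene] = sum(
            publication_scores.get(p, 0.0) + domain_scores.get(p, 0.0)
            for p in interactors
        )
    return scores


# Hypothetical interaction network and per-gene scores.
partners = {
    "unknown_gene": {"key_gene"},
    "key_gene": {"unknown_gene", "other_gene"},
    "other_gene": {"key_gene"},
}
publication = {"key_gene": 1.0, "other_gene": 0.25}
domain = {"key_gene": 1, "other_gene": 0}
ppi = interaction_scores(partners, publication, domain)
```

Here the gene of unknown function has no publication or domain score of its own but still receives an interactions score of 2 through its high‐scoring partner.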

3.3.2.6 Linkage data and scoring

CANDID was specifically designed to accommodate one or more sets of existing linkage data. CANDID users with linkage information may wish to limit their CANDID gene analyses to one or more loci or to prioritize genes based on the gene’s approximate LOD score. CANDID readily accepts custom linkage files as well as linkage output files from GENEHUNTER, MERLIN and SOLAR [Abecasis, et al. 2002; Almasy and Blangero 1998; Kruglyak, et al. 1996]. Multiple linkage files may be submitted to MetaMaker, an associated tool on the CANDID website, to create a meta linkage file by summing individual LOD curves. All marker positions are in centiMorgans (cM) and correspond to the Marshfield genetic map. positions can be converted to approximate Marshfield cM positions using MapConverter, another tool on the CANDID website.

CANDID assigns a linkage score to each gene based on the approximate LOD score at the gene’s location. Approximate cM positions for each gene have been interpolated by examining the physical distance between the midpoint of the gene and the two closest Marshfield markers. The linkage score for a gene is interpolated between the two closest markers with assigned LOD scores and then normalized by dividing by the highest individual gene’s linkage score. The maximum linkage score will be 1, but some genes may have negative linkage scores if they are associated with negative LOD scores. This method therefore filters out these genes, even in the presence of moderate to high scores from other criteria.
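The interpolation and normalization steps can be sketched as follows. The text does not state the form of interpolation or the treatment of genes outside the marker range, so linear interpolation and clamping to the nearest marker are assumptions, as are the function names and the example values.

```python
from bisect import bisect_right


def lod_at(position_cm, markers):
    """Linearly interpolate a LOD score at a cM position.

    markers: list of (cM position, LOD) tuples sorted by position.
    """
    positions = [p for p, _ in markers]
    i = bisect_right(positions, position_cm)
    if i == 0:                 # left of the first marker: clamp (assumption)
        return markers[0][1]
    if i == len(markers):      # at or right of the last marker: clamp
        return markers[-1][1]
    (x0, y0), (x1, y1) = markers[i - 1], markers[i]
    return y0 + (y1 - y0) * (position_cm - x0) / (x1 - x0)


def linkage_scores(gene_positions, markers):
    """Normalize interpolated LOD scores by the highest gene-level LOD."""
    lods = {g: lod_at(p, markers) for g, p in gene_positions.items()}
    top = max(lods.values(), default=0.0)
    # Assumption: if no gene has a positive LOD, all linkage scores are 0.
    return {g: (lod / top if top > 0 else 0.0) for g, lod in lods.items()}


# Hypothetical markers (cM, LOD) and gene positions (cM).
markers = [(0.0, 0.0), (10.0, 2.0), (20.0, -1.0)]
gene_cm = {"g1": 5.0, "g2": 10.0, "g3": 18.0}
link = linkage_scores(gene_cm, markers)
```

In this example g2 sits at the LOD peak and scores 1, g1 scores 0.5, and g3 falls in a region of negative LOD and receives a negative linkage score.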

When a CANDID user is interested in analyzing one or more loci in the absence of linkage data, a dummy linkage file may be created to focus the analysis on specific intervals. This is done using four dummy markers for each interval. Two represent the boundaries of the interval, and both of these markers are assigned the same arbitrary non‐zero LOD score. Additionally, two markers are created just outside the interval and assigned a LOD score of 0. In this way, only the genes in the interval will receive a non‐zero linkage score from CANDID, allowing them to be separately analyzed if the user has chosen the option to limit analysis only to genes with LOD scores greater than 0.
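The four dummy markers for one interval can be generated as below. This is only an illustration of the marker layout: the function name, the default LOD of 1.0 and the 0.01 cM margin are arbitrary assumptions, and the output is a list of tuples rather than any particular linkage file format.

```python
def dummy_interval_markers(start_cm, end_cm, lod=1.0, margin=0.01):
    """Return four (cM, LOD) dummy markers that isolate one interval."""
    if end_cm <= start_cm:
        raise ValueError("end_cm must be greater than start_cm")
    if lod == 0:
        raise ValueError("the boundary LOD must be non-zero")
    return [
        (start_cm - margin, 0.0),  # just outside the left boundary
        (start_cm, lod),           # left boundary of the interval
        (end_cm, lod),             # right boundary of the interval
        (end_cm + margin, 0.0),    # just outside the right boundary
    ]


interval = dummy_interval_markers(50.0, 60.0)
```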

3.3.2.7 Association data and scoring

CANDID can accept and use SNP‐based association data from the user. The user supplies a file where each line contains a SNP ID from dbSNP and a p‐value for that SNP. CANDID assigns each of these SNPs to their associated genes according to NCBI’s dbSNP database. The best p‐value for each gene is subtracted from 1 to yield each gene’s association score. Users will note that genes with more tested SNPs are more likely to have a good p‐value, and therefore a good association score, due to multiple tests. It should also be noted that a gene with one SNP and a highly insignificant p‐value will always receive a higher association score than a gene with no associated SNPs. To eliminate these problems, the user has the option of setting a p‐value threshold, where p‐values greater than the threshold are not considered. Using this threshold greatly speeds up the association analysis and also results in the same range of scores, since association scores are normalized to fall between 0 and 1 regardless of whether the threshold is used.
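The association score and the optional threshold can be sketched as follows. The function name, SNP labels, gene labels and p‐values are hypothetical, and the SNP‐to‐gene mapping is passed in directly rather than looked up in dbSNP. Genes absent from the result have no qualifying SNPs.

```python
def association_scores(snp_pvalues, snp_to_genes, threshold=None):
    """Return 1 minus the best (smallest) p-value for each gene.

    snp_pvalues: dict mapping SNP ID -> p-value.
    snp_to_genes: dict mapping SNP ID -> list of associated gene IDs.
    threshold: if given, p-values greater than this are ignored.
    """
    best = {}
    for snp, p in snp_pvalues.items():
        if threshold is not None and p > threshold:
            continue  # SNP does not pass the user-defined threshold
        for gene in snp_to_genes.get(snp, ()):
            best[gene] = min(p, best.get(gene, 1.0))
    return {gene: 1.0 - p for gene, p in best.items()}


# Hypothetical SNP labels, gene assignments and p-values.
pvalues = {"snp1": 0.001, "snp2": 0.9, "snp3": 0.8}
mapping = {"snp1": ["geneA"], "snp2": ["geneA"], "snp3": ["geneB"]}
unfiltered = association_scores(pvalues, mapping)
filtered = association_scores(pvalues, mapping, threshold=0.05)
```

Without a threshold, geneB receives a positive score from a single non‐significant SNP; with a threshold of 0.05, only geneA is scored.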

3.3.2.8 Custom data and scoring

In addition to linkage and association data, CANDID can also accept and utilize other data. Users may provide a two‐column custom file, where each line contains an EntrezGene ID and a numeric score, with higher numbers corresponding to greater interest in the gene. These data will be weighted and used like the other CANDID criteria.

3.3.3 Final score determination and ranking

The CANDID user must specify a series of weights corresponding to the eight criteria described above. Criteria with non‐zero weights are evaluated, and their scores are multiplied by the user‐defined criterion‐specific weights. Finally, a gene’s weighted scores are summed across all criteria to produce the final score. Genes are ranked by final score, and this information, along with normalized raw scores and basic information on each gene, is provided in a comma‐separated output file available for download from the server. This file can easily be opened in Microsoft Excel or any number of statistical software packages. The user is also provided with a summary file stating the parameters used for the analysis as well as a literature file with further information on the publications scores.
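The weighting, summation and ranking steps can be sketched as a weighted sum over criteria. The function name, gene labels, scores and weights below are hypothetical, and breaking ties alphabetically by gene ID is an assumption made only to keep the output deterministic.

```python
def rank_genes(criterion_scores, weights):
    """Rank genes by the weighted sum of their criterion-specific scores.

    criterion_scores: dict mapping criterion -> dict of gene ID -> score.
    weights: dict mapping criterion -> user-defined weight.
    """
    genes = set()
    for scores in criterion_scores.values():
        genes.update(scores)
    final = {}
    for gene in genes:
        # Only criteria with non-zero weights are evaluated.
        final[gene] = sum(
            w * criterion_scores.get(criterion, {}).get(gene, 0.0)
            for criterion, w in weights.items()
            if w != 0
        )
    # Highest final score first; ties broken by gene ID (assumption).
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical normalized scores and weights.
scores = {
    "publications": {"g1": 1.0, "g2": 0.5},
    "expression": {"g1": 0.1, "g2": 1.0, "g3": 1.0},
    "interactions": {"g3": 2.0},
}
weights = {"publications": 1.0, "expression": 0.2,
           "interactions": 0.1, "conservation": 0.0}
ranking = rank_genes(scores, weights)
```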

3.3.4 ROC curve calculation and analysis

For all CANDID tests, ranks of causal genes were extracted from CANDID output files and used to create receiver operating characteristic (ROC) curves. For each data point in the curve, the specificity corresponds to the percentage of genes that ranked lower than the causal gene(s), while the sensitivity corresponds to the percentage of the causal genes that achieved at least that level of specificity. For example, a specificity value of 0.91 and a sensitivity value of 0.65 indicate that 65 percent of the causal genes ranked at or above the 91st percentile in the CANDID output. The area under the curve (AUC) for each ROC is an indicator of the performance of the algorithm; an algorithm that ranked the causal gene(s) first every time would have an AUC of 1, while an algorithm that randomly ranked the causal gene(s) would have an AUC of 0.5.
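A rank‐based ROC calculation of this kind can be sketched as follows. The function names are hypothetical, and the specificity is computed as the fraction of the other genes that rank below the causal gene, an assumption chosen so that a first‐place rank gives exactly 1. Under this step‐function construction, the area under the curve equals the mean specificity of the causal genes.

```python
def specificity(rank, n_genes):
    """Fraction of the other genes ranked below a gene at this rank (1 = best)."""
    return (n_genes - rank) / (n_genes - 1)


def roc_points(causal_ranks, n_genes):
    """Return (specificity, sensitivity) pairs, best-ranked causal gene first."""
    specs = sorted((specificity(r, n_genes) for r in causal_ranks), reverse=True)
    n = len(specs)
    # Sensitivity: share of causal genes reaching at least this specificity.
    return [(s, (k + 1) / n) for k, s in enumerate(specs)]


def auc(causal_ranks, n_genes):
    """Area under the step ROC curve, equal to the mean specificity."""
    specs = [specificity(r, n_genes) for r in causal_ranks]
    return sum(specs) / len(specs)


# Hypothetical ranks out of 101 scored genes.
points = roc_points([1, 51], 101)
perfect = auc([1, 1, 1], 101)
mixed = auc([1, 101], 101)
```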

3.3.5 Analysis of known genes responsible for traits in OMIM

In order to gauge the effectiveness of CANDID and its various criteria in ranking candidate genes, 29 complex traits were selected from the Online Mendelian Inheritance in Man (OMIM) database. First, a set of 154 traits and associated identified causal genes on chromosome 1 was selected, and from that set, 29 traits were chosen that are generally common in the population or known to be influenced by multiple genes. These traits include various cancers, Alzheimer’s disease, and others. Phenotypic keywords were chosen for each trait by referencing the trait descriptions in OMIM. Additionally, when the trait seemed to affect certain tissues, those tissue codes were identified in order to use the expression criterion. A full list of the 29 traits, their keywords, tissue codes, causal genes and OMIM identifiers is available as Table 3.1. Keywords and tissue codes were selected a priori and were not changed at any point in the analysis.

For each trait, the identified keywords and tissue codes were used in a genomewide CANDID analysis using version 1 of the CANDID databases. Five criteria were scored in these analyses: publications, protein domains, conservation, expression and protein‐protein interactions. To further assess CANDID’s specificity, the 29 traits and their corresponding gene(s) were shuffled 100 times. Additionally, each of the 29 traits was paired with a randomly selected gene 100 times. The AUCs from the analyses using the true causal genes were compared to the AUCs resulting from the shuffled and random gene analyses using the Wilcoxon signed‐rank test.

To maximize the specificity and sensitivity of CANDID when using more than one criterion, 11 values were tested for each criterion weight. These values ranged from 0 to 1 by increments of 0.1 and yielded 161,051 possible configurations of weights for the five criteria. Some of these configurations were redundant; a configuration with weights of (0, 0, 0, 0.3, 0.6) would yield identical results to a configuration with weights of (0, 0, 0, 0.1, 0.2), since these weights serve only to rank genes relative to each other. The AUCs of all weight configurations were determined, and the highest AUC was used to select the optimal configuration. This was performed in both a genomewide setting and a locus‐limited setting, where a dummy linkage file was used to restrict CANDID’s analysis to only chromosome 1.
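The size of the weight grid and the redundancy among proportional configurations can be checked with a short enumeration. Weights are represented as integer tenths (0 to 10) to avoid floating‐point comparisons, and reducing each configuration by its greatest common divisor is an illustrative way to identify proportional configurations, not a step described in the analysis.

```python
from functools import reduce
from itertools import product
from math import gcd

# Integer tenths: 0..10 stand for weights 0.0..1.0 in steps of 0.1.
STEPS = range(11)
N_CRITERIA = 5

configs = list(product(STEPS, repeat=N_CRITERIA))  # 11 ** 5 configurations


def canonical(config):
    """Reduce a configuration to its smallest proportional integer form."""
    divisor = reduce(gcd, config)
    # The all-zero configuration has divisor 0 and is left unchanged.
    return tuple(w // divisor for w in config) if divisor else config


# Proportional configurations rank genes identically, so they collapse
# to the same canonical form.
distinct = {canonical(c) for c in configs}
same_ranking = canonical((0, 0, 0, 3, 6)) == canonical((0, 0, 0, 1, 2))
```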

To test the resulting optimal weight configuration, 15 additional complex or polygenic traits were selected from the subset of traits in OMIM that are linked to chromosome 2 (Table 3.1). As with the chromosome 1 traits, keywords and tissue codes were selected for these traits, and genomewide and chromosome‐specific CANDID analyses were performed using the optimal weight configuration obtained from the chromosome 1 analyses. Ranks of causal genes on chromosome 2 were determined, and AUCs were calculated for both the genomewide and chromosome‐specific analyses.

3.3.6 Analysis of recently characterized causal genes for complex traits

CANDID was designed for gene discovery purposes, and often, genes found to contain causal variants have no prior published association with the trait of interest. To model a gene discovery effort, gene‐trait associations published after November 2006 were identified. The first version of the CANDID databases was compiled prior to the publication of these works, so the evidence from these publications that links the genes to their corresponding traits was not factored into CANDID’s analysis. These publications identified causal variants for complex human traits by using various methods, including studies of promising candidate genes as well as genomewide studies. In several situations, a small cluster of genes was identified, but the causal gene could not be determined due to linkage disequilibrium; in these cases, all genes in the cluster were used in this analysis. A full list of all 56 genes and their associated traits is included in Table 3.2. Keywords describing the ascertained traits were carefully selected from the publications, and tissues were selected for the expression criterion if it was deemed appropriate to do so (Table 3.3). Keywords and tissue codes were selected a priori and were not changed at any point in the analysis.

Analysis of these traits was conducted similarly to the OMIM analyses. Each of the five main criteria was evaluated separately and in combination with 161,051 different weight configurations. For each of the criterion‐specific analyses, the 56 gene‐trait pairs were also shuffled 100 times, and each of the traits was also paired with a randomly selected gene 100 times. The analyses of various weight configurations were conducted in a genomewide setting as well as a locus‐limited setting, where dummy linkage files limited the analysis to the 200 genes closest to the true causal gene. The other main difference between the recently identified causal gene analysis and the OMIM analysis is that while genes associated with the same trait were analyzed together in the OMIM analyses, each of these 56 gene‐trait pairs was analyzed separately. The optimal weight configuration was determined by selecting the configuration that produced the highest AUC.

To test this resulting set of optimal weights, an additional 19 causal genes and their corresponding traits were selected. These gene‐trait associations were also published after the compilation of the first version of the CANDID databases. Once again, relevant keywords were selected for each trait, and tissue codes were selected if applicable (Table 3.3). This information was used to run both genomewide and locus‐limited CANDID analyses using the optimal weight configuration determined from the analysis of the original 56 gene‐trait pairs. Ranks of the causal genes were determined, and AUCs were calculated. To compare CANDID to Endeavour, training sets of genes were determined from OMIM for each trait, and loci including the recently identified gene and 199 surrounding genes were selected and used as input for Endeavour. A full list of this information is in Table 3.4.

3.3.7 Databases and computing

CANDID databases are periodically built using Perl scripts implementing the DB_File module (version 1.814). Users can choose which version of the CANDID databases they would like to use. To date, several versions have been constructed. Version 1 was compiled from data downloaded in November 2006, while the data used to construct version 2 were downloaded in June 2007. New database versions will soon be created automatically each month. Version 1 and version 2 contain a similar number of genes, with 38,697 in version 1 and 38,530 in version 2. In the version 2 databases, 85 percent of protein‐coding genes have information for at least one of the five data‐independent criteria. Except where specified otherwise, CANDID is described using the current version 2 dataset. On average, a CANDID query returns results in less than three minutes, though run time can increase if there are many literature matches or extensive supplied association data.

3.4 Results

3.4.1 Genes associated with complex traits in OMIM

Twenty‐nine common or multigenic traits were selected from OMIM as described in the Methods section and used to analyze the effectiveness of each individual criterion. Receiver operating characteristic (ROC) curves were calculated to determine each criterion’s specificity and sensitivity. These values were generally high and, in most cases, statistically significantly higher than the values determined when the gene‐trait pairs were randomized (Figure 3.2).

Different weight configurations were assessed to select the configuration that maximized CANDID’s sensitivity and specificity. One set of redundant configurations proved to be optimal for ranking genes associated with the OMIM traits: the configurations where all weights equaled zero except for the publications weight. For the genomewide analysis, this configuration yielded an AUC of 0.923, while a configuration with equal weights yielded an AUC of 0.895 (Figure 3.3A). The AUCs were improved when a dummy linkage file was used to limit the analysis to evaluate only genes on chromosome 1. In this case, the AUC for the publications‐only analysis was 0.983, compared to an AUC of 0.927 when all criteria were weighted equally (Figure 3.3B). Using the optimal weight conditions, both shuffling the genes and their traits and pairing the traits with randomly selected genes resulted in significantly lower AUCs: approximately 0.81 when genes were shuffled (P = 8.4 × 10⁻⁹) and 0.52 when genes were randomly selected. Genomewide and chromosome‐specific analyses of an additional 15 traits linked to chromosome 2 were performed using the publications‐only weight configuration, and results were similar to those from the chromosome 1 traits, with AUCs of 0.969 (genomewide) and 0.971 (chromosome 2 only) (Figure 3.3).

3.4.2 Recently published causal genes for complex traits

Next, CANDID was tested in a gene discovery situation, where genes did not necessarily have any prior publications linking them to the trait. As with the OMIM genes, the effectiveness of each individual criterion in ranking these causal genes was evaluated. Unsurprisingly, when attempting to rank these recently characterized genes, the specificity and sensitivity were slightly lower than when attempting to rank the well‐characterized OMIM genes (Figure 3.4). However, in most cases, these values were significantly higher than values determined when the gene‐trait pairs were randomized.

The optimal weight configuration for ranking recently characterized genes was determined to be one in which the publications scores are weighted by 1, the expression scores are weighted by 0.2, and the protein‐protein interactions scores are weighted by 0.1. The protein domains and conservation scores had weights equal to 0 and were not used in the optimal weight configuration. For the genomewide CANDID analyses, use of the optimal weights improved the AUC to 0.926 compared with 0.878 when all weights were equal (Figure 3.5A). The optimal weight configuration for the locus‐limited analysis was the same as for the genomewide analysis, yielding an AUC of 0.906 compared with an AUC of 0.843 when all weights were equal (Figure 3.5B). Randomly shuffling the genes and traits resulted in significantly lower AUCs (0.818 for the genomewide analysis, P = 7.2 × 10⁻¹³, and 0.755 for the locus‐specific analysis, P = 2.4 × 10⁻¹²), as did randomly selecting genes (AUCs of 0.502 and 0.487, respectively).
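To make the role of the weights concrete, the sketch below combines per‐criterion scores into a final score using the gene discovery weights reported above (publications 1, expression 0.2, protein‐protein interactions 0.1, protein domains 0, conservation 0). It is an illustration only: min‐max scaling is assumed for the normalization step, the function names and the dictionary layout are hypothetical, and the raw scores for the three placeholder genes are invented.

```python
GENE_DISCOVERY_WEIGHTS = {
    "publications": 1.0,
    "expression": 0.2,
    "interactions": 0.1,
    "domains": 0.0,
    "conservation": 0.0,
}


def min_max_normalize(per_gene):
    """Scale one criterion's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(per_gene.values()), max(per_gene.values())
    if hi == lo:
        return {gene: 0.0 for gene in per_gene}
    return {gene: (s - lo) / (hi - lo) for gene, s in per_gene.items()}


def rank_genes(raw_scores, weights):
    """Return (genes ordered best to worst, final score for each gene)."""
    final = {}
    for criterion, per_gene in raw_scores.items():
        weight = weights.get(criterion, 0.0)
        for gene, score in min_max_normalize(per_gene).items():
            final[gene] = final.get(gene, 0.0) + weight * score
    ordered = sorted(final, key=lambda gene: (-final[gene], gene))
    return ordered, final


# Invented raw scores for three placeholder genes.
raw_scores = {
    "publications": {"GENE_A": 12, "GENE_B": 0, "GENE_C": 3},
    "expression": {"GENE_A": 0.1, "GENE_B": 0.9, "GENE_C": 0.5},
    "interactions": {"GENE_A": 2, "GENE_B": 40, "GENE_C": 10},
}
ranking, final_scores = rank_genes(raw_scores, GENE_DISCOVERY_WEIGHTS)
```

With these weights, the gene with the strongest publication score ranks first even though another gene leads on both expression and interactions, which mirrors the heavy dependence on publications noted in the text.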

CANDID’s optimal weight configuration depends heavily on publications. Since the publications criterion in turn depends on the keyword selection, one or two keywords were dropped for each trait in the genomewide analysis, and the resulting AUC was not significantly different (0.912 compared to 0.926 with all keywords; data not shown). It is also possible that novel causal genes of unknown function could be missed using CANDID’s optimal weight configuration. To test this, the optimal weight configuration using all data sources other than publications was determined. This configuration weights protein domains by 0.1, conservation by 0.1, gene expression by 0.3 and protein‐protein interactions by 0.9. Though not quite as accurate as the optimal weight configuration, it still has a very high AUC (0.889 for the genomewide analysis and 0.850 for the locus‐limited analysis; Figure 3.5). When the original optimal configuration of weights was used to repeat the CANDID analyses after CANDID’s publications database was updated to version 2, including the publications describing the 56 gene‐trait pairs, the specificity and sensitivity increased, yielding AUCs of 0.952 and 0.941 for the genomewide and locus‐specific analyses, respectively (Figure 3.5). Finally, the optimal weight configuration was also quite successful in ranking the 19 causal genes that were not included as part of the original set of 56 gene‐trait pairs (genomewide AUC = 0.882, locus‐limited AUC = 0.855). When these gene‐trait pairs were analyzed with Endeavour in a locus‐limited setting, the AUC was lower (0.818, Figure 3.5B).

3.5 Discussion

CANDID is a flexible and easy‐to‐use method for identifying and prioritizing candidate genes for complex human traits. CANDID uses a variety of sources, including publications, protein domains, cross‐species conservation levels, gene expression levels and protein‐protein interactions, to assess and rank all human genes. These five criteria were chosen to reduce or eliminate dependence on the user’s personal knowledge. Though users may optionally contribute their own linkage, association and other experimental data to CANDID’s prioritization process, prior data are not required. Additionally, CANDID does not require the definition of training or test sets, a requirement that may limit the types of traits available for analysis. Compared to other candidate identification algorithms (CIAs), CANDID may be the most versatile.

CANDID’s effectiveness was rigorously tested using two types of datasets: genes linked to common or complex traits described in OMIM and genes recently linked to complex traits. The resulting AUCs were quite high compared to AUCs for other gene prioritization programs. Prioritizer, developed by Franke, et al., demonstrated an AUC of 0.90 when the method was used to characterize gene networks, but when used to prioritize known OMIM genes at a given locus, the AUCs appear to be between 0.65 and 0.70 [Franke, et al. 2006]. Endeavour produced an AUC of 0.866 for known OMIM genes [Aerts, et al. 2006]. Other algorithms, such as GeneWanderer and an unnamed functional linkage network (FLN) method, also performed well when identifying known disease genes, achieving AUCs of approximately 0.98. CANDID’s AUCs were lower than 0.98 in the OMIM gene analysis (genomewide weight‐optimized AUC = 0.923, genomewide chromosome 2 analysis AUC = 0.969). However, CANDID outperformed the other algorithms in a simulated gene discovery analysis (genomewide weight‐optimized AUC = 0.926, genomewide new gene analysis AUC = 0.882, compared to gene discovery AUCs of 0.805 and 0.85 for Endeavour and the FLN method, respectively) [Aerts, et al. 2006; Linghu, et al. 2009]. This indicates that in a true gene discovery setting, CANDID should rank causal genes higher than the other algorithms.

In another comparison, Endeavour was used to rank a handful of genes that had been recently linked to complex traits. These analyses were limited to a 200‐gene locus, and the publication data used was “rolled back” to one year before the critical publications were published. The average rank of the causal genes was 40 in Endeavour’s analysis; in CANDID’s very similar simulated gene discovery analysis, the average causal gene rank was 20 (Tables 3.2, 3.4) [Aerts, et al. 2006]. In a “head‐to‐head” comparison of CANDID and Endeavour using 19 gene‐trait pairs, Endeavour also had a lower AUC (0.818), despite the fact that it could access databases that were updated after the gene‐trait associations were published, while CANDID’s databases were compiled prior to this.

Some limitations do exist to CANDID’s performance. Chief among these is a problem inherent in all of CANDID’s data sources: the genome is incompletely and unevenly described. As annotation increases, so may the perception of public data sources as complete or infallible. It is worth pointing out that even databases such as dbSNP, which contain millions of entries, fail to provide even, complete coverage. Furthermore, the gaps in coverage may be nonrandom, depending on the database used. For example, users supplying custom data from a genomewide association study may note that genes involved in immune response and sensory perception are located in regions of weak linkage disequilibrium and, as such, may be underrepresented by SNPs on the current generation of SNP chips [Li, et al. 2008; Smith, et al. 2005]. Similarly, the average CANDID user may be concerned with the uneven coverage of the scientific literature, upon which CANDID relies heavily. The high relative weight attributed to the publications criterion may raise questions about CANDID’s ability to rank genes that have yet to be characterized in the scientific literature. The accuracy of a genomewide CANDID analysis excluding the publications criterion was high (AUC = 0.889), though not as high as when publications were included (AUC = 0.926). Additionally, the optimal weight configuration changed significantly when publications were excluded, the main difference being the inclusion of the protein domains and conservation criteria. Users who wish to focus strongly on poorly characterized genes may therefore choose to use the publications‐excluded optimal weight configuration. All users should note that increasing the number of data sources used in a CANDID analysis might help to reduce bias against genes that are underrepresented in any one particular data source. Additionally, careful selection of the criteria to be used and an understanding of each criterion’s flaws are important.

The high reliance on certain criteria may raise questions of the effectiveness of other criteria in identifying true causal genes. For example, the optimal weight configuration for gene discovery contained weights of 0 for the protein domains and conservation criteria. These criteria performed significantly better than would be expected of a random prioritization algorithm (Figures 3.2, 3.4), but they failed to distinguish between the causal gene(s) specific to a given trait and causal genes for other traits. It is interesting to note that, even when using optimal weight configurations, the AUCs resulting from shuffling gene‐trait associations were high. Though generally significantly different from the AUCs resulting from the correct gene‐trait pairs, these AUCs were higher than expected. This may indicate that CANDID tends to prioritize genes causal for any trait over genes not causal for any traits. One CIA, PROSPECTR, aims to do just this by utilizing sequence features [Adie, et al. 2005]. However, CANDID does not evaluate genes directly at the sequence level. It is possible that causal genes are instead characterized by a number of other factors measured by CANDID, such as a large number of protein‐protein interactions, the presence of identified functional domains, extensive cross‐species conservation or tissue‐specific expression patterns. As more gene‐trait associations are published, it may be possible to examine this profile in greater detail.

One of CANDID’s strengths is its ability to conduct genomewide analyses, whereas other CIAs must focus on a specific locus or set of genes. CANDID appears to be equally effective in genomewide and locus‐specific settings. In the OMIM gene set, CANDID performed slightly better when limited to the 3,504 genes on chromosome 1, while in the gene discovery setting, CANDID performed slightly worse when limited to a 200‐gene locus. Causal genes will therefore be ranked in roughly the same percentile, regardless of how many genes are analyzed. As a practical matter, users who wish to follow up on a smaller number of genes should consequently limit their analyses to one or more loci, if feasible.

Though CANDID has been described here as a method for prioritizing candidate genes for complex human traits, it can easily be used to investigate other traits, including Mendelian human traits. Murine versions of the CANDID databases are in preparation for users who wish to investigate mouse traits. In addition, a batch mode version of CANDID is available upon request for individuals who wish to adapt the CANDID algorithm and databases to high‐throughput gene identification and other methods.

The potential applications of CANDID are numerous. A ranked list of candidate genes can be created to select genes for a variety of initial or follow‐up studies. If a promising locus is identified, CANDID could be used to prioritize genes within that region for resequencing or SNP‐based genotyping. Additionally, results from genomewide CANDID analyses could be used to identify a large set of genes to use in the construction of custom microarrays. Ranked output of candidates could also prove useful in analyzing the large amount of data generated by genomewide association studies; instead of analyzing all SNPs at once, requiring a correction for a large number of tests, smaller batches of SNPs could be analyzed sequentially, based on CANDID’s ranking of their corresponding genes. When used wisely, CANDID has great potential to reduce bias in candidate gene‐directed studies and to streamline genomewide studies.
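The batch‐wise idea mentioned above could be organized along the following lines. This is only a sketch under stated assumptions: the SNP identifiers, gene names, gene ranks and batch size are all hypothetical, and the per‐batch threshold shown is a plain Bonferroni correction within each batch. Whether such a sequential scheme adequately controls the overall error rate of a study is a separate statistical question that the sketch does not address.

```python
def rank_ordered_batches(snp_to_gene, gene_rank, batch_size):
    """Order SNPs by the rank of their gene, then split into batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(
        snp_to_gene,
        key=lambda snp: (gene_rank[snp_to_gene[snp]], snp),
    )
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for n_tests tests."""
    return alpha / n_tests


# Hypothetical gene ranks and SNP-to-gene assignments.
gene_rank = {"GENE_A": 4, "GENE_B": 20, "GENE_C": 67}
snp_to_gene = {
    "snp_a": "GENE_C",
    "snp_b": "GENE_A",
    "snp_c": "GENE_B",
    "snp_d": "GENE_A",
    "snp_e": "GENE_C",
}
batches = rank_ordered_batches(snp_to_gene, gene_rank, batch_size=2)
thresholds = [bonferroni_threshold(0.05, len(batch)) for batch in batches]
```

SNPs in the highest‐ranked genes fall into the first batch, so they are tested first and against a less stringent threshold than a single correction across all SNPs would impose.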


Figure 3.1: Depiction of CANDID’s information flow. Genes are evaluated by up to 8 criteria (Publications, Protein domains, Conservation, Expression, Interactions, Linkage, Association and Custom). Scores from each criterion are normalized, weighted by the value specified by the user for that criterion and combined to form final scores. CANDID output consists of a list of all evaluated genes, ranked by final score, as well as other associated information.


Figure 3.2: Performance of 5 criteria in ranking OMIM genes. Receiver operating characteristic (ROC) curves describing the specificity and sensitivity of CANDID in identifying the causal genes for each OMIM trait are shown in blue. Orange and pink ROC curves are representative curves from simulations in which traits were paired with causal genes from other traits and random genes, respectively. Areas under the curves (AUC) are displayed. The five CANDID criteria that can be used without any prior data include (A) publications, (B) protein domains, (C) conservation, (D) expression and (E) protein‐protein interactions.


Figure 3.3: Success in predicting OMIM genes using 5 main criteria. Receiver operating characteristic (ROC) curves were generated for the genomewide (A) and chromosome‐specific (B) analyses of traits linked to chromosomes 1 and 2. Analyses of traits linked to chromosome 1 using the optimal weight combination (blue) and equal weights (orange) are shown. Analysis of traits linked to chromosome 2 using the optimal weight combination is shown in pink.


Figure 3.4: Performance of 5 criteria in ranking recently identified causal genes. ROC curves describing the specificity and sensitivity of CANDID in identifying the causal genes are shown in blue. Orange and pink ROC curves are representative curves from simulations in which traits were paired with other causal genes and random genes, respectively. The five CANDID criteria assessed are (A) publications, (B) protein domains, (C) conservation, (D) expression and (E) protein‐protein interactions.


Figure 3.5: Success in predicting recently published genes using 5 data‐independent criteria in (A) genomewide and (B) locus‐specific analyses. Receiver operating characteristic (ROC) curves are shown for the optimal weight configuration (blue), equal weights configuration (orange) and optimal weight configuration excluding publications (pink). The purple curves represent the ROC curves produced when optimal weights were used with updated databases from June 2007. Green ROC curves indicate the results from analyses with the optimal weight configuration on the 19 gene‐trait pairs not used in the original analysis. The brown ROC curve in (B) represents results from an Endeavour analysis using the 19 gene‐trait pairs.


Table 3.1: Information used for OMIM trait analysis

OMIM ID | Keywords | Tissue code | Target chromosome EntrezGene IDs | Other EntrezGene IDs

Chromosome 1 traits
#104300 | Alzheimer, Alzheimer's, dementia, neurodegenerative, neurofibrillary tangles, neurofibrillary, senility, plaques, beta‐amyloid | 29 | 5664 | 351, 348, 5633, 2, 4035, 7018, 3077, 4846, 7422, 20, 7124
#114500 | colorectal cancer, cancer, colon, tumor | 1 | 4595 | 314, 4436, 4292, 5378, 5395, 2956, 7048, 27030, 8313, 11200, 3845, 5290, 673, 1499
#126200 | multiple sclerosis, autoimmune, immunologic, immune system, inflammation, central nervous system, neuron, axon, dendrite, myelin, demyelination | 7, 8, 29, 40 | 5788 | 934, 3123, 3119
#137760 | optic disc, optic nerve, eye, vision, glaucoma, intraocular pressure, nerve fiber, visual field, intraocular, aqueous humor | ‐ | 4653 | 10133, 134430
#144250 | hyperlipidemia, familial hyperlipidemia, familial combined hyperlipidemia, cholesterol, triglycerides, VLDL, LDL, lipoprotein | 45 | 7391 | ‐
#145500 | hypertension, blood pressure, heart, vascular, essential hypertension | 44, 53, 46 | 183, 1889, 481, 8490, 6401, 7133 | 185, 2784, 118, 1577, 4846
#147050 | atopic hypersensitivity, atopy, asthma, hay fever, eczema, IgE responsiveness, IGER, allergy, IgE metabolism, immunoglobulin E, allergen | ‐ | 6403 | 51131, 2206
#152430 | , life span, lifespan, age, aging | ‐ | 51441 | 11216, 1071, 5444, 7099, 4536, 348
#155255 | medulloblastoma, desmoplastic medulloblastoma, brain tumor | 27, 41 | 8643 | 324, 1499, 51684, 675
#168000 | paraganglioma, paragangliomas, glomus body, glomus body tumors, paraganglia, phenochromocytoma, chemodectoma, chromaffin, nonchromaffin | ‐ | 6391, 6390 | 6392
#168600 | Parkinson, Parkinson's, substantia nigra, dopaminergic neuron, dopaminergic, dopamine, parkinsonism, dementia, Lewy body, Lewy bodies, neuron | 29 | 11315 | 6622, 7345, 120892, 27429, 9627, 5071
#171300 | pheochromocytoma, catecholamine, adrenal medulla, adrenal tumor, adrenal cancer | 42 | 6390 | 7428, 5979, 6392, 2668
#176807 | prostate cancer, prostate, cancer, tumor | 47 | 6041, 2048 | 4601, 3732, 60528, 5728, 8379, 1316, 463, 4481, 11200, 999
#180300 | inflammation, autoimmune, inflammatory, synovial, rheumatoid arthritis, arthritis | ‐ | 23569, 26191 | 4795, 6583, 861, 4261
#181500 | schizophrenia, brain, neuron, neurotransmitter, personality | 24 | 27185, 5999 | 9685, 5625
#188550 | thyroid cancer, thyroid carcinoma, cancer, tumor | 49 | 4914, 7170, 7175, 51592 | 5979, 8031, 5573, 10342, 114568, 5108, 9950, 8805
#266600 | inflammation, inflammatory bowel, bowel, Crohn, intestine, ulcerative colitis, ileum, colon, colitis, autoimmune, autoimmunity | ‐ | 149233 | 64127, 5243, 6583, 9231, 6584
#268000 | retinitis pigmentosa, retina, night blindness, blindness, bone corpuscle, fundus, retinal pigment, retina, eye, vision | ‐ | 23418, 9129, 24, 64218 | 5158, 6010, 4575, 6103, 6101, 6100, 26121, 5961, 7287, 762, 10594, 3614, 375298, 25794, 768206
#600101 | deafness, deaf, hearing, hearing loss, sensorineural, nonsyndromic, cochlea | ‐ | 2707, 9132 | ‐
#601634 | neural tube, neural tube defects, spina bifida, myelomeningocele, anencephaly, folate, folic acid, cobalamin, homocysteine, birth defect | 41 | 4524, 4548 | 4552, 4522
#601665 | obesity, body mass index, BMI, lipid storage, adipose, adiposity, overweight, obese, energy metabolism, feeding, satiety | 56, 45, 35 | 3953 | 3953, 5122, 5443, 4160, 154, 155, 51738, 348, 7040, 5167, 9607, 51141
#605074 | papillary renal cell carcinoma, papillary RCC, papillary renal tumor, renal carcinoma, renal tumor, papillary | 46 | 5546 | 7030, 4233
#605462 | basal cell carcinoma, multiple basal cell carcinoma, BCC, basal cell nevus syndrome | 69 | 8643 | 5727
#608266 | parathyroid carcinoma, primary hyperparathyroidism, parathyroid cancer, parathyroid tumor, elevated PTH | ‐ | 79577 | ‐
#608556 | Legionnaire disease, Legionnaire's, susceptibility to Legionnaire, pneumonia | 43 | 7100 | ‐
#609006 | deafness, deaf, hearing, hearing loss, neurosensory deafness, neurosensory, vestibular areflexia, areflexia | ‐ | 83715 | ‐
#609532 | hepatatis C susceptibility, hepatitis C infection, hepatitis C, HCV, HCV infection | ‐ | 5788 | 3123, 3119, 3126, 3804
#610141 | cardiac repolarization, cardiac polarization, qt interval, heart, cardiac, ion channel | 44, 53, 72 | 9722 | ‐
#610698 | macular degeneration, macula, retina, fovea, eye | ‐ | 3075, 83872, 24 | 10516, 2074, 84839, 5654, 4567, 629, 717

Chromosome 2 traits
#114480 | breast cancer, cancer, tumor, cell cycle, mitosis, apoptosis, stomach, breast, mammary | ‐ | 580, 841, 1545 | 672, 675, 8068, 60500, 5002, 7157, 83990, 9821, 367, 5888, 11200, 8493, 5290, 207, 79728, 7040, 3161, 5728, 6794, 472, 8202, 7764, 6387, 6366, 2826, 2099, 2064, 3119, 3123, 1312, 1543, 3265, 1588, 1586, 23405, 27161, 27257, 9530, 56892, 2263, 4214, 4046, 27324
#115200 | dilated cardiomyopathy, cardiomyopathy, cardiac dilation, systolic function, systole, myocardium, heart, cardiac muscle, cardiac | 53, 44 | 1674, 7273 | 4000, 4607, 11155, 2218, 7414, 5664, 5663, 7112, 4625, 70, 5350, 10060, 8557, 8048, 6444, 2070, 6331, 7139
#125853 | diabetes, type 2 diabetes, NIDDM, insulin, insulin resistance, obesity | 45, 75, 56 | 11132, 4760, 2820, 3667 | 8835, 6517, 6514, 9479, 3767, 6927, 6928, 208, 56729, 84105, 5506, 5770, 5167, 2053, 3990, 6934, 5468, 3119, 3125, 3126, 3117, 9370, 5078, 7021, 6833, 4536, 4540, 7466
#137215 | gastric cancer, stomach cancer, cancer, tumor, cell cycle, mitosis, apoptosis, stomach, gastric carcinoma | ‐ | 4436, 3553 | 999, 4292, 5728, 6794, 7157, 3659, 9, 3119, 3117, 7124, 1571
#141749 | fetal hemoglobin, HbF, F cells, hemoglobin | 23, 21 | 53335 | 3047, 3048, 3043, 3045, 3040
#165720 | osteoarthritis, Heberden nodes, arthritis, Heberden, joint pain, knee, degenerative arthritis, degenerative joint disease, inflammation, synovial fluid, cartilage | ‐ | 2487, 388, 3552 | 54829, 23143, 8200, 1280, 2099, 2100, 367, 51314
%209850 | autism, autistic, brain, cognition, cognitive, behavior, mental, neuron, psychiatric, neuropsychiatric, neurotransmitter, pervasive developmental disorder | 41, 29 | 8604 | 54413, 57502, 4204, 6532, 3690, 4233, 2020
#211980 | lung cancer, lung carcinoma, cancer, tumor, cell cycle, mitosis, apoptosis, adenocarcinoma, alveolar cell carcinoma, small cell lung cancer, non‐small cell lung cancer, nonsmall cell lung cancer, NSCLC, smoking, tobacco | 43 | 841 | 5002, 7157, 3845, 673, 1956, 1548, 4353, 2944, 3162, 355, 356
#600669 | epilepsy, seizure | 29 | 785 | 4200, 79751, 773, 4988, 2563
#600807 | asthma, bronchial, cough, wheezing, dyspnea, atopy, atopic dermatitis, airway hyperresponsiveness, bronchial hyperresponsiveness, serum IgE, eosinophil, eosinophils | 65, 43 | 3176, 3790 | 5729, 387129, 11213, 94103, 154, 1728, 4353, 847, 4589, 3596, 3593, 117156, 3135, 7941, 6356, 7356, 6778, 51131, 3566, 3596, 80332, 11251
%601694 | levels, leptin | 56 | 5443 | ‐
#609423 | HIV susceptibility, HIV, AIDS, acquired immunodeficiency syndrome, human immunodeficiency virus, CD4‐positive, CD4+ | 7 | 3577 | 1234, 1231, 6352, 6387, 6349, 1524, 30835, 9560, 3566, 3106, 3107, 3813, 3804, 3458, 6347, 6356, 30834
#609755 | celiac disease, celiac, inflammation, inflammatory, autoimmune, autoimmunity, intestine, bowel, gluten | ‐ | 1493 | 4650, 84162, 132612, 3558, 59067, 3117, 3119, 3125
#609830 | abdominal fat distribution, body fat distribution, fat distribution, abdominal fat, body fat, adipose, subcutaneous fat, fat cells, obesity, waist, abdominal obesity | 56 | 5443 | 5468, 5770
#611162 | malaria, mosquito, parasite, plasmodium, falciparum, malariae, vivax, ovale, anemia, erythrocyte, erythrocytes, red blood cells, red blood cell | 23 | 2995 | 3043, 3383, 948, 1378, 2293, 2994, 7124, 4843, 114609, 2213, 2532, 2539, 3123, 3119, 3106

Table 3.2: Genes with recently identified complex human trait associations

Reference | Gene | Trait | Rank (genome) | Rank (locus)

Original set of 56 gene‐trait associations
[Mullighan, et al. 2007] | PAX5 | acute lymphoblastic leukemia | 2302 | 16
[Riemenschneider, et al. 2006] | PLAU | Alzheimer’s disease | 115 | 1
[Rogaeva, et al. 2007] | SORL1 | Alzheimer’s disease | 148 | 2
[Liu, et al. 2007] | HNT | Alzheimer’s disease (late‐onset) | 1094 | 12
[Liu, et al. 2007] | OPCML | Alzheimer’s disease (late‐onset) | 486 | 7
[Durand, et al. 2007] | SHANK3 | disorders | 3345 | 26
[Frayling, et al. 2007] | FTO | body mass index | 6935 | 38
[Seal, et al. 2006] | BRIP1 | breast cancer | 1912 | 21
[Cox, et al. 2007] | CASP8 | breast cancer | 67 | 1
[Easton, et al. 2007b] | FGFR2 | breast cancer | 5404 | 39
[Easton, et al. 2007b] | LSP1 | breast cancer | 7108 | 55
[Easton, et al. 2007b] | MAP3K1 | breast cancer | 359 | 5
[Erkko, et al. 2007] | PALB2 | breast cancer | 11251 | 77
[Easton, et al. 2007b] | TOX3 | breast cancer | 10741 | 65
[Sun, et al. 2007] | CASP8 | cancer (multiple types) | 67 | 1
[Hata, et al. 2007] | AGTRL1 | cerebral infarction | 3045 | 15
[Kubo, et al. 2007] | PRKCH | cerebral infarction | 4309 | 27
[Pare, et al. 2007] | EDN1 | coronary artery disease | 1816 | 2
[Rioux, et al. 2007] | ATG16L1 | Crohn’s disease | 5657 | 45
[Rioux, et al. 2007] | FAM92B | Crohn’s disease | 14848 | 119
[Rioux, et al. 2007] | NCF4 | Crohn’s disease | 3566 | 31
[Rioux, et al. 2007] | PHOX2B | Crohn’s disease | 3349 | 10
[Shaw‐Smith, et al. 2006] | MAPT | developmental delay, learning disability | 176 | 1
[Duffy, et al. 2007] | OCA2 | eye color | 621 | 8
[Chan, et al. 2006] | MSH2 | hereditary nonpolyposis colorectal cancer | 141 | 2
[Gao, et al. 2007] | CHD7 | idiopathic scoliosis | 3936 | 26
[Dempfle, et al. 2006] | VDR | idiopathic short stature | 361 | 2
[Duerr, et al. 2006] | IL23R | inflammatory bowel disease | 946 | 4
[Lucae, et al. 2006] | P2RX7 | major depressive disorder | 5105 | 31
[Dieterich, et al. 2007] | AURKC | male infertility | 399 | 3
[Papassotiropoulos, et al. 2006] | WWC1 | memory performance | 1630 | 16
[Tarpey, et al. 2006] | AP1S2 | mental retardation | 4944 | 27
[Bierut, et al. 2007] | DLG4 | nicotine dependence | 4 | 1
[Bierut, et al. 2007] | GABARAP | nicotine dependence | 1559 | 10
[Miyamoto, et al. 2007] | GDF5 | osteoarthritis | 20 | 1
[Tosh, et al. 2006] | MICA | paucibacillary leprosy | 358 | 9
[Tosh, et al. 2006] | MICB | paucibacillary leprosy | 784 | 32
[Bogdanova, et al. 2007] | ANXA5 | pregnancy loss | 744 | 3
[Cargill, et al. 2007] | IL23R | psoriasis | 1767 | 4
[Chen, et al. 2006] | ACSL6 | schizophrenia | 2361 | 17
[Chen, et al. 2006] | CDC42SE2 | schizophrenia | 3879 | 29
[Choudhury, et al. 2007] | FXYD6 | schizophrenia | 4077 | 39
[Chen, et al. 2006] | RAPGEF6 | schizophrenia | 1691 | 7
[Graham, et al. 2006] | CD28 | systemic erythematosus lupus | 253 | 5
[Graham, et al. 2006] | CTLA4 | systemic erythematosus lupus | 233 | 5
[Graham, et al. 2006] | ICOS | systemic erythematosus lupus | 379 | 9
[Romeo, et al. 2007] | ANGPTL4 | triglyceride levels | 366 | 4
[Zeggini, et al. 2007] | CDKAL1 | type 2 diabetes | 6107 | 32
[Zeggini, et al. 2007] | CDKN2A | type 2 diabetes | 2755 | 9
[Zeggini, et al. 2007] | CDKN2B | type 2 diabetes | 16521 | 75
[Zeggini, et al. 2007] | FTO | type 2 diabetes | 7615 | 45
[Zeggini, et al. 2007] | HHEX | type 2 diabetes | 4338 | 20
[Zeggini, et al. 2007] | IDE | type 2 diabetes | 129 | 2
[Zeggini, et al. 2007] | IGF2BP2 | type 2 diabetes | 1403 | 12
[Zeggini, et al. 2007] | SLC30A8 | type 2 diabetes | 2829 | 18
[Dewan, et al. 2006] | HTRA1 | wet age‐related macular degeneration | 1645 | 20
Average (± s.e.m.) | | | 3000±483 | 20±3

Additional 19 gene‐trait associations
[Broderick, et al. 2007] | SMAD7 | colorectal cancer | 954 | 11
[Shen, et al. 2007] | LRP8 | coronary artery disease | 10108 | 42
[Sulem, et al. 2007] | SLC24A4 | hair and eye color | 16356 | 62
[Sulem, et al. 2007] | KITLG | hair color | 387 | 2
[Willer, et al. 2008] | GALNT2 | HDL cholesterol levels | 9698 | 68
[Willer, et al. 2008] | MMAB | HDL cholesterol levels | 11712 | 95
[Willer, et al. 2008] | MVK | HDL cholesterol levels | 364 | 4
[Willer, et al. 2008] | CELSR2 | LDL cholesterol levels | 7940 | 22
[Willer, et al. 2008] | PSRC1 | LDL cholesterol levels | 14850 | 112
[Willer, et al. 2008] | SORT1 | LDL cholesterol levels | 435 | 6
[Mio, et al. 2007] | COL11A1 | lumbar disc herniation | 170 | 2
[Stefansson, et al. 2007] | BTBD9 | restless legs syndrome | 4803 | 44
[Plenge, et al. 2007] | C5 | rheumatoid arthritis | 231 | 5
[Plenge, et al. 2007] | TRAF1 | rheumatoid arthritis | 389 | 4
[Stokowski, et al. 2007] | SLC24A5 | skin pigmentation | 1389 | 9
[Stokowski, et al. 2007] | SLC45A2 | skin pigmentation | 133 | 2
[Willer, et al. 2008] | ANGPTL3 | triglyceride levels | 187 | 6
[Willer, et al. 2008] | MLXIPL | triglyceride levels | 273 | 2
[Willer, et al. 2008] | TRIB1 | triglyceride levels | 14856 | 116
Average (± s.e.m.) | | | 5012±1383 | 32±9
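The summary rows of Table 3.2 report each average rank with its standard error of the mean (s.e.m.). As a worked check, the short Python sketch below recomputes both quantities from the ranks listed for the additional 19 gene‐trait associations. It assumes that the s.e.m. is the sample standard deviation divided by the square root of the number of genes; the variable and function names are illustrative.

```python
import math
import statistics

# Locus and genomewide ranks of the additional 19 causal genes,
# in the order in which they are listed in Table 3.2.
additional_locus_ranks = [
    11, 42, 62, 2, 68, 95, 4, 22, 112, 6,
    2, 44, 5, 4, 9, 2, 6, 2, 116,
]
additional_genome_ranks = [
    954, 10108, 16356, 387, 9698, 11712, 364, 7940, 14850, 435,
    170, 4803, 231, 389, 1389, 133, 187, 273, 14856,
]


def mean_and_sem(values):
    """Return the mean and the standard error of the mean."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


locus_mean, locus_sem = mean_and_sem(additional_locus_ranks)
genome_mean, genome_sem = mean_and_sem(additional_genome_ranks)
print(f"locus-limited: {locus_mean:.0f}±{locus_sem:.0f}")
```

Under this assumption the locus‐limited ranks give a mean of about 32.3 and an s.e.m. of about 9.1, in agreement with the 32±9 shown in the table, and the genomewide ranks give a mean of about 5012.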

Table 3.3: Information used for analysis of recently characterized genes

PubMed ID | Keywords | Tissue codes | EntrezGene ID | Gene Symbol

Original set of 56 gene‐trait associations
17053108 | macular degeneration, eye, neovascular, retina, macular | ‐ | 5654 | HTRA1
17053149 | brain, memory, neuron, hippocampus, axon, dendrite | 24, 32, 35, 36 | 23286 | WWC1
17033622 | breast cancer, tumor, cancer, cell cycle, mitosis | ‐ | 83990 | BRIP1
16951683 | cancer, colorectal cancer, endometrial cancer, tumor, nonpolyposis, mitosis, cell cycle | 1 | 4436 | MSH2
16906163 | mental retardation, brain, developmental delay, hypotonia, muscle, craniofacial | 29, 41, 75 | 4137 | MAPT
17030554 | schizophrenia, neuron, axon, dendrite, brain | 24, 33 | 56990, 51735, 23305 | CDC42SE2, RAPGEF6, ACSL6
17000707 | lupus, systemic erythematosus lupus, autoimmune, immunity | ‐ | 940, 1493, 29851 | CD28, CTLA4, ICOS
16923796 | leprosy, autoimmune, immunity, leper | ‐ | 4276, 4277 | MICA, MICB
16905557 | stature, bone, long bones, bone length, height | ‐ | 7421 | VDR
16822851 | , neurotransmitter, neuron, axon, dendrite | 29 | 5027 | P2RX7
16825285 | Alzheimer's, Alzheimer, dementia, amyloid, beta‐amyloid, neurofibrillary | 29, 24, 30, 33 | 5328 | PLAU
17186471 | mental retardation, brain, hypotonia, muscle | 29, 41, 75 | 8905 | AP1S2
17068223 | inflammatory bowel, bowel, inflammation, Crohn's, gastrointestinal, ulcerative colitis, ileum, colon | ‐ | 149233 | IL23R
17173049 | autism, neuron, Asperger, brain, Asperger's | 29, 41 | 85358 | SHANK3
17220890 | Alzheimer's, Alzheimer, dementia, neurofibrillary, amyloid, beta‐amyloid | 29, 24, 30, 33 | 6653 | SORL1
17206144 | stroke, cerebral infarction, arteriosclerosis, atherosclerosis, arteries | ‐ | 5583 | PRKCH
17236130 | eye color, iris, eye | ‐ | 4948 | OCA2
17434869 | obesity, bmi, fat, obese, body mass | 56 | 79068 | FTO
17435756 | inflammatory bowel, bowel, inflammation, Crohn's, gastrointestinal, ulcerative colitis, ileum, colon | ‐ | 55054, 8929, 4689, 339145 | ATG16L1, PHOX2B, NCF4, FAM92B
17450141 | cancer, tumor, cell cycle, lung cancer, mitosis | ‐ | 841 | CASP8
17435757 | infertility, fertility, sperm, spermatozoa, testis, male factor | 60, 61, 62, 63 | 6795 | AURKC
17322881 | triglycerides, HDL, cholesterol, fatty acids | 56, 58, 45, 79 | 51129 | ANGPTL4
17384641 | osteoarthritis, bone, arthritis, cartilage | ‐ | 8200 | GDF5
17293864 | breast cancer, tumor, cancer, cell cycle, mitosis | ‐ | 841 | CASP8
17287723 | breast cancer, tumor, cancer, cell cycle, mitosis | ‐ | 79728 | PALB2
17529967 | breast cancer, tumor, cancer, cell cycle, mitosis | ‐ | 2263, 27324, 4214, 4046 | FGFR2, TOX3, MAP3K1, LSP1
17344859 | acute lymphoblastic leukemia, leukemia, cancer, blood cells, acute lymphocytic leukemia, mitosis, cell cycle | 7, 8, 9 | 5079 | PAX5
17463249 | diabetes, type 2, obesity, obese, insulin | 56, 58, 45, 79 | 79068, 54901, 3087, 3416, 1029, 1030, 10644, 169026 | FTO, CDKAL1, HHEX, IDE, CDKN2A, CDKN2B, IGF2BP2, SLC30A8
17158188 | nicotine, smoking, cigarette, addiction, neurotransmitter, axon, neuron, dendrite | 38 | 11337, 1742 | GABARAP, DLG4
17339269 | thrombosis, pregnancy, thrombophilia, pregnancy loss, pregnant, abortion, miscarriage | ‐ | 308 | ANXA5
17309882 | stroke, brain infarction, atherosclerosis, arteries, thromboembolism, blood vessel | ‐ | 187 | AGTRL1
17564960 | Alzheimer's, Alzheimer, dementia, amyloid, beta‐amyloid, neurofibrillary | 29, 24, 30, 33 | 4978, 50863 | OPCML, HNT
17436250 | spine, scoliosis, caudal, spinal, vertebral, vertebrae, vertebra | ‐ | 55636 | CHD7
17357072 | schizophrenia, neuron, axon, dendrite, brain | 24, 33 | 53826 | FXYD6
17357073 | HDL, cholesterol, coronary artery disease, LDL, artery, arteries | 45 | 1906 | EDN1
17236132 | psoriasis, T‐cell, inflammation, inflammatory, arthritis, skin | 69 | 149233 | IL23R

Additional 19 gene‐trait associations
17934461 | cancer, colorectal cancer, tumor, mitosis, cell cycle, colon | 1 | 4092 | SMAD7
17847002 | coronary artery disease, coronary artery, heart, coronary, heart attack | 44 | 7804 | LRP8
17952075 | eye color, hair color, pigmentation, iris, hair, pigment | ‐ | 123041 | SLC24A4
17952075 | hair color, pigmentation, hair, pigment | ‐ | 4254 | KITLG
17999364 | lumbar disc herniation, lumbar disc, intervertebral disc, vertebra, vertebrae, vertebral, herniation, hernia, lumbar spine, lumbar | ‐ | 1301 | COL11A1
17634447 | restless legs syndrome, restless legs, restlessness, paresthesia, paresthesias, dysesthesias, dysesthesia, twitch, tremor, muscle | 75 | 114781 | BTBD9
17804836 | rheumatoid arthritis, arthritis, inflammation, autoimmune, inflammatory, synovial | ‐ | 727, 7185 | C5, TRAF1
17999355 | skin color, skin, pigmentation, melanin | 69 | 283652, 51151 | SLC24A5, SLC45A2
18193043 | HDL, cholesterol, lipid, lipoprotein, triglyceride, triglycerides, fatty acid, fatty acids | 45, 56 | 2590, 4598, 326625 | GALNT2, MVK, MMAB
18193043 | LDL, cholesterol, lipid, lipoprotein, triglyceride, triglycerides, fatty acid, fatty acids | | 1952, 84722, 6272 | CELSR2, PSRC1, SORT1
18193043 | lipid, lipoprotein, triglyceride, triglycerides, fatty acid, fatty acids | | 10221, 51085, 27329 | TRIB1, MLXIPL, ANGPTL3

Table 3.4: Training genes, loci and results from Endeavour analysis

Gene | Trait | Training genes | Locus (chromosome:bp) | Rank
SMAD7 | colorectal cancer | TGFBR2, CTNNB1, PIK3CA, MUTYH, CHEK2, AXIN2, KRAS, APC, MSH2, MLH1, PMS1, PMS2, MSH6, MLH3, BRAF, AKT1, MCC | 18:14615650‐74315649 | 18
LRP8 | coronary artery disease | KALRN, MMP3, CD36, MEF2A, LRP6, ENPP1 | 1:41183805‐65866330 | 8
SLC24A4 | hair and eye color | OCA2, HERC2, TYR, MC1R, SLC45A2, SLC24A5, TYRP1 | 14:75858674‐114032348 | 7
KITLG | hair color | MC1R, OCA2, HERC2, SLC24A5, TYR | 12:66414728‐108498369 | 55
GALNT2 | HDL cholesterol levels | CETP, LMNA, PPARG, LCAT, ABCA1, USF1, APOA5, LDLR, APOB, PCSK9, APOA2, EPHX2, GHR, ITIH4, CYP27A1, SAR1B, KALRN, APOC2 | 1:210260360‐246484493 | 73
MMAB | | | 12:95477638‐120495685 | 52
MVK | | | 12:95495443‐120520305 | 2
CELSR2 | LDL cholesterol levels | LDLR, APOB, PCSK9, APOA2, EPHX2, GHR, ITIH4, USF1, NPC1, APOA5, CETP, RAI1, LRP6, CD36, SAR1B, APOC2, ABCA1 | 1:91594164‐126619900 | 9
PSRC1 | | | 1:91623701‐126627331 | 115
SORT1 | | | 1:91653715‐126742096 | 5
COL11A1 | lumbar disc herniation | COL9A2, COL9A3, CILP, ACAN | 1:87114611‐118346635 | 1
BTBD9 | restless legs syndrome | SPTLC1, ATXN3 | 6:32250713‐44671821 | 84
C5 | rheumatoid arthritis | NFKBIL1, PADI4, SLC22A4, RUNX1, CIITA, STAT4, HLA‐DRB1, PTPN22 | 9:112754437‐132852375 | 8
TRAF1 | | | 9:112704492‐132730868 | 138
SLC24A5 | skin pigmentation | OCA2, HERC2, TYR, KITLG, SLC24A4 | 15:32200461‐57222161 | 1
SLC45A2 | | | 5:980478‐67020592 | 1
ANGPTL3 | triglyceride levels | USF1, CETP, BSCL2, LCAT, APOA5, LIPI, RP1, ALMS1, ACADVL, ACADM, PCSK9, LPL | 1:44835746‐79844418 | 14
MLXIPL | | | 7:50645460‐93676806 | 64
TRIB1 | | | 8:103512046‐148519830 | 108

Chapter 4

Selection and testing of variants for association with PI3K/AKT/mTOR signaling phenotypes

4.1 Abstract

The PI3K/AKT/mTOR signaling pathway is a key pro‐growth signaling pathway in human cells, and results presented in Chapter 2 suggest that its activity is partially influenced by inter‐individual genetic variation. Specifically, the ratios of phosphorylated to total AKT1 and p70S6K were previously shown to be moderately heritable, and linkage analysis identified regions on chromosomes 3 and 14 as likely locations for causal functional variants. In work presented in this chapter, seven top candidate genes were identified on chromosomes 3 and 14 using CANDID, the candidate gene identification algorithm described in Chapter 3. Known variants in these genes were genotyped and tested using a mixed model for association with the AKT1 or p70S6K ratio traits. One SNP near the HSP90AA1 gene, rs1190584, was statistically significantly associated with the AKT1 ratio trait, and four SNPs on chromosome 3 were associated with the p70S6K ratio trait. Three of the four statistically significant SNPs on chromosome 3 were in or near the RAF1 gene, which encodes the well‐known c‐RAF signaling protein. This work represents the first statistically significant genetic association results for a phenotype defined by a protein modification. Additionally, the identification of these SNPs may contribute to a better understanding of the normal activation and clinical inhibition of the PI3K/AKT/mTOR pathway.

4.2 Introduction

As discussed in detail in Chapter 1, the PI3K/AKT/mTOR pathway plays an important role in normal cell growth and development, and the fundamentals of information flow through the pathway are well understood. It is also well known that this pathway is highly activated in some individuals – namely, individuals with cancer or inherited mutations in genes encoding PI3K/AKT/mTOR pathway members. Despite these facts, little research has been performed to examine whether there is more subtle inter‐individual variation in pathway activity. If this variation exists, it might be influenced genetically, and it might also be linked to the development of diseases like cancer.

The research described in Chapter 2 supports the hypothesis that inter‐individual genetic variation produces subtle differences in the activation levels of key signaling proteins in the PI3K/AKT/mTOR pathway. Using a collection of lymphoblastoid cell lines derived from large Caucasian families, variation was shown to exist in the levels of total and phosphorylated AKT1, p70S6K and 4E‐BP1 across different individuals as measured by ELISAs. Furthermore, the ratios of the phosphorylated to total forms of those three proteins were also variable. Using information about the family structures, it was determined that two of these ratio traits possessed moderate heritability (h² > 0.15), indicating that they were influenced to a moderate extent by genetic factors. These traits were the ratios of the phosphorylated to total forms of AKT1 and p70S6K (referred to in the remainder of this chapter as the AKT1 ratio and p70S6K ratio).

Linkage analysis indicated regions of the genome that may harbor genetic variants that influence these two traits. For the AKT1 ratio, a region on chromosome 14 contained the highest LOD score for the trait, indicating that the region was the most likely locus to contain variants influencing the phenotype. However, the LOD score (1.70) fell short of a threshold for determining statistical confidence (1.80). Both the AKT1 ratio and p70S6K ratio had linkage peaks in the same region of chromosome 3, and a multivariate analysis of the two traits revealed a shared genetic component, which also mapped to chromosome 3. The maximum LOD score of the shared component, 2.91, can be considered suggestive linkage. The peak at the 36 cM mark on chromosome 3 for this shared genetic component (one‐LOD confidence interval between 21 cM and 46 cM) and the chromosome 14 peak for the AKT1 ratio (one‐LOD confidence interval between 87 cM and 134 cM) were selected for follow‐up analyses.
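A one‐LOD confidence interval of the kind quoted above is conventionally read off a LOD curve as the span around the peak in which the score stays within one unit of the maximum. The following is a minimal sketch of that idea; the function name and the curve values are hypothetical and are not the Chapter 2 linkage output.

```python
def one_lod_interval(curve):
    """Return (left_cM, peak_cM, right_cM) for a LOD curve.

    `curve` is a list of (cM, LOD) tuples sorted by position. The interval is
    the contiguous run of points around the peak whose LOD is at least
    (maximum LOD - 1). The data passed in below are illustrative only.
    """
    peak_index = max(range(len(curve)), key=lambda i: curve[i][1])
    threshold = curve[peak_index][1] - 1.0

    # Walk outward from the peak while the score stays above the threshold.
    left = peak_index
    while left > 0 and curve[left - 1][1] >= threshold:
        left -= 1
    right = peak_index
    while right < len(curve) - 1 and curve[right + 1][1] >= threshold:
        right += 1
    return curve[left][0], curve[peak_index][0], curve[right][0]


# Hypothetical curve with a single peak; values chosen for illustration.
example_curve = [(10, 0.4), (21, 1.95), (30, 2.6), (36, 2.91), (46, 2.0), (55, 0.9)]
interval = one_lod_interval(example_curve)
```

Because the sketch only inspects the sampled points, the resolution of the interval is limited by the marker spacing of the curve supplied to it.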

Historically, a variety of approaches have been used to search for causal variants represented by linkage peaks. Some of these strategies include adding additional markers to refine the linkage signal (fine mapping), selecting candidate genes located underneath the linkage peak for follow‐up functional or sequencing studies and performing association tests on genotyped SNPs located beneath the peak [Brophy, et al. ; Ehret, et al. 2009; Schulze, et al. 2003; Stamm, et al. 2008]. The selection of SNPs for association tests can also be accomplished using several methods. One systematic approach involves the selection of SNPs that maximize coverage of regions of linkage disequilibrium (LD) across the entire area under the linkage peak [Ehret, et al. 2009]. Another SNP selection approach involves selecting candidate genes under the peak and then choosing SNPs associated with only those genes [Brophy, et al.]. Of course, both this approach and the approach that identifies candidate genes for functional follow‐up studies require a method for the identification of candidate genes. As discussed in Chapter 3, many such methods exist. However, the method described in Chapter 3, CANDID, is flexible, freely available and capable of outperforming many other algorithms for identifying genes involved with complex human traits.

To search the peaks on chromosomes 3 and 14 for variants associated with the AKT1 and p70S6K ratio phenotypes, a multi‐stage approach was taken. First, CANDID was used to select candidate genes under each peak. Then, SNPs for each of the top candidate genes were selected and genotyped. An association test was performed to identify whether any of the genotyped SNPs were associated with the AKT1 or p70S6K ratio traits. Any SNPs that were statistically significant or close to being statistically significant were then tested in a patient sample from a clinical trial of ovarian cancer treatments to see whether these SNPs, associated with in vitro PI3K/AKT/mTOR phenotypes, were also associated with clinical traits in the context of cancer. This approach was successful in identifying several statistically significant associations of SNPs with PI3K/AKT/mTOR signaling phenotypes, and these results are presented here.

4.3 Materials and Methods

4.3.1 Candidate gene selection

Candidate genes for association analysis were selected using CANDID, the candidate gene identification algorithm described in Chapter 3. Inputs to CANDID included results from previous linkage analyses and existing knowledge about the PI3K/AKT/mTOR signaling pathway. Weights for CANDID were set as 1 for literature, 0.2 for protein domains, 0.2 for conservation and 0.8 for protein‐protein interactions, and version 5 of the CANDID databases was used. Keywords for CANDID were AKT, signaling, phosphatase, kinase, mTOR, PI3K, phosphatidyl, phosphoinositide, p70S6K and AKT1. The analysis for the AKT1 signaling trait was restricted to genes that fell between 87 cM and the q‐terminus of chromosome 14 (roughly 134 cM), while the analysis for the combined AKT1/p70S6K signaling trait was limited to genes on chromosome 3 between 19.5 cM and 54 cM. The top‐ranked genes were selected for further analysis.

4.3.2 SNP selection and genotyping

For six identified candidate genes (HSP90AA1, KAT2B, RAB5A, RAF1, VHL and GRM7), a total of 86 tag SNPs were selected using an r² cutoff of 0.8. For the AKT1 candidate gene, a more in‐depth SNP selection process was undertaken in which 51 coding SNPs or common intronic or perigenic SNPs were selected from dbSNP. SNP genotyping was performed with the Sequenom platform using DNA extracted from cell lines from the 14 CEPH families. PCR and extension primers were designed using Sequenom AssayDesign software (primer sequences and assay designs are shown in Table 4.1). PCR reactions, shrimp alkaline phosphatase digestion and extension reactions were run according to Sequenom’s standard protocol with one exception: a linear adjustment was made to the PCR primer concentrations in an attempt to standardize mass spectrometer peak heights.

4.3.3 Cell culture, cDNA preparation and sequencing

RNA was extracted from 122 lymphoblastoid cell lines from 14 families in the CEPH (Centre d’Etude du Polymorphisme Humain) collection. Cell lines were seeded at a density of 250,000 cells/ml and a total volume of 20 ml in duplicate flasks. Growth media consisted of RPMI (Invitrogen) plus 10% fetal bovine serum (Sigma). After 48 hours, the duplicate flasks were pooled and cells were washed twice in 1 milliliter of ice‐cold PBS. During the final PBS wash, 100 microliters of the cell suspension was pelleted, frozen at ‐80°C and reserved for RNA extraction, while the remainder of the cell suspension was used for protein extraction for the experiments described in Chapter 2. RNA was extracted from thawed pellets using RNeasy Mini kits (QIAGEN) according to the recommended protocol for cultured cells. Concentrations of RNA were determined using a NanoDrop spectrophotometer. For each cell line, cDNA was synthesized using SuperScript II reverse transcriptase and random primers (Invitrogen). One microgram of total RNA was used in each 20‐microliter reaction, and the manufacturer’s protocol for the SuperScript II enzyme was followed exactly.

One of three alternately spliced AKT1 transcripts (ENST00000402615) was sequenced in 26 cDNA samples representing parents from the 14 CEPH nuclear families used in the signaling phenotyping experiments performed in Chapter 2. Five overlapping primer pairs were designed that covered 94 percent of the transcript (1for: 5’‐ACTGGGTGCTCCTCACTGAC‐3’; 1rev: 5’‐AGGTGCCATCATTCTTGAGG‐3’; 2for: 5’‐GCTATTGTGAAGGAGGGTTGG‐3’; 2rev: 5’‐TTGGCGTACTCCATGACAAA‐3’; 3for: 5’‐ACTCTTTCCAGACCCACGAC‐3’; 3rev: 5’‐CTTGGTCAGGTGGTGTGATG‐3’; 4for: 5’‐TCGGAGACTGACACCAGGTA‐3’; 4rev: 5’‐CCTCAGAGACACGGCCTTAG‐3’; 5for: 5’‐CTGGGCCTCAGTTCAACCT‐3’; 5rev: 5’‐ACCCCCACAGGGAGTCAG‐3’). RT‐PCR was performed using KOD polymerase (EMD Biosciences) according to the polymerase manufacturer’s protocol, and products were run on and then excised from a 1.5% agarose gel. The products were then purified using QIAGEN Gel Extraction kits, and the concentrations of the resulting solutions were determined using a NanoDrop spectrophotometer. Products were diluted as necessary and sequenced using an Applied Biosystems 3730xl Genetic Analyzer. Sequence electropherograms were interpreted using Sequencher software (Gene Codes Corporation).

4.3.4 CEPH association testing

Any SNP or individual that had less than an 85 percent success rate was removed from further analysis. Pedstats and MERLIN were used to identify and remove unlikely genotypes and genotypes that produced Mendelian errors. Invariant SNPs were also removed. The remaining SNPs were tested for Hardy‐Weinberg Equilibrium in the unrelated individuals. SNPs that passed this test were analyzed in Haploview, and additional SNPs were eliminated as having been tagged by other genotyped SNPs. Genotypes for the remaining SNPs were coded for analysis with the minor allele in the founders represented in an additive (genotypic) model, a dominant model and a recessive model. Association tests with the protein phenotypes from Chapter 2 were performed in SAS using PROC MIXED, with pedigree ID as a class and subject variable and compound symmetry as the covariance structure.
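The three genotype codings named above can be illustrated with a short sketch. The 0/1/2 numeric scheme shown is a common convention and is assumed here; the thesis does not state the exact values used in SAS, and the function name and example alleles are hypothetical.

```python
def code_genotype(genotype, minor_allele):
    """Return additive, dominant and recessive codings for one genotype.

    `genotype` is a two-character string such as "AG"; `minor_allele` is the
    allele that is rarer among the founders. The numeric values are an
    assumed convention, not taken from the original analysis.
    """
    count = sum(1 for allele in genotype if allele == minor_allele)
    return {
        "additive": count,             # 0, 1 or 2 copies of the minor allele
        "dominant": int(count >= 1),   # at least one copy of the minor allele
        "recessive": int(count == 2),  # homozygous for the minor allele
    }


# Example with a hypothetical A/G SNP whose minor allele is G.
codings = {g: code_genotype(g, "G") for g in ("AA", "AG", "GG")}
```

Under this convention the heterozygote is the only genotype that separates the dominant model from the recessive model, which is why a SNP can reach significance under one model but not the other.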

Codings of the statistically significant SNPs under the appropriate model (additive, dominant or recessive) were also incorporated as potential covariates for heritability analyses of the corresponding phenotypes. These analyses were conducted as described in Chapter 2, except that covariates previously found to be statistically significant for each phenotype were fixed in the polygenic model, and the SNPs in question were screened for their statistical significance as covariates and their effects on the total amount of variance explained by all statistically significant covariates. Significant SNPs were also visualized in Haploview using data from the HapMap CEU population (Version 2, Release 24).

4.3.5 SCOTROC sample and clinical trait association testing

The Scottish Randomised Trial in Ovarian Cancer (SCOTROC) consisted of 1077 ovarian cancer patients who were randomized to receive a chemotherapy regimen of either docetaxel and carboplatin or paclitaxel and carboplatin. Clinical information on the patients and their tumors was collected and included data on patient age, tumor FIGO stage, tumor grade, tumor histology, clinical response, CA‐125 response, survival time, progression‐free survival time and information on various toxicities. Follow‐ups were done for each patient at two‐month intervals, and 98 percent of patients were followed for at least one year [Vasey, et al. 2004].

DNA samples and clinical information were available for 890 patients from the SCOTROC trial. To test for association of SNPs identified in the CEPH sample with clinical traits, the group of 890 patients was split into two smaller groups: a group of 430 individuals to be used as a testing set and a group of 460 individuals to be used as a replication set in which to test any statistically significant SNPs identified in the testing set. The 430 individuals in the testing set were genotyped for 17 SNPs identified in the CEPH association tests using a single Sequenom assay (Table 4.1), and one SNP (rs2494749) was removed due to quality control filters. Genotypes were coded for each SNP under additive, dominant and recessive models.

Each SNP was tested for association under each model with five clinical traits from the SCOTROC dataset. Three of these traits were dichotomized to produce a case group and a control group as follows: FIGO stage (case: stage III or IV, control: stage Ic or II), tumor grade (case: poorly differentiated or undifferentiated, control: well‐differentiated or moderately differentiated) and clinical response (case: stable disease or progressive disease, control: complete response or partial response). In general, choices about how to dichotomize these traits were made such that the numbers of cases and controls were as close to equal as possible. These resulting binary traits were tested first for covariates among the other available clinical variables, but none were found. The three traits were then tested for association with all SNPs using Fisher’s exact test. Two additional clinical traits were also examined: survival time and progression‐free survival time. These traits were also assessed for covariates before testing them for association with the genotyped SNPs.

Statistically significant covariates identified for progression‐free survival included the bulk of residual disease after surgery, FIGO stage, ECOG performance status and status of the tumor as either a clear cell carcinoma or endometrioid carcinoma. All of these covariates were also statistically significant for overall survival time, as was the status of the tumor as a mucinous cystadenocarcinoma. SNPs were tested for association with these two survival‐related traits using a Cox proportional hazards model implemented in SAS using PROC PHREG. The aforementioned statistically significant covariates were included in the model, and genotype data for each SNP were added to see if they represented an additional statistically significant predictor. A Bonferroni‐corrected P‐value of 0.001 reflecting 48 tests for each clinical variable (16 SNPs and three genetic models) was used to determine statistical significance.
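The arithmetic behind the Bonferroni threshold can be sketched as follows. The family‐wise error rate of 0.05 is an assumption (it is the conventional choice and reproduces the stated threshold of 0.001 after rounding, but the text does not state it), and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


n_snps = 16    # SNPs remaining after quality control (from the text)
n_models = 3   # additive, dominant and recessive (from the text)
alpha = 0.05   # assumed family-wise error rate; not stated in the text

n_tests = n_snps * n_models
threshold = bonferroni_threshold(alpha, n_tests)
```

Note that the correction is applied within each clinical variable, so the five traits are not pooled into a single family of 240 tests in this sketch.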

4.4 Results

Previous experiments investigating the existence of genetic influences on the activation of the PI3K/AKT/mTOR signaling pathway indicated two chromosomal loci of potential interest. One of these loci was on chromosome 14 and was linked to the ratio of phosphorylated to total AKT1 in the CEPH sample, and the other was located on chromosome 3 and linked with both the AKT1 ratio phenotype as well as the ratio of phosphorylated to total p70S6K. To find out which genes at these loci might harbor causal variants for these traits, CANDID (described in Chapter 3) was used to identify candidate genes for follow‐up analysis. Keywords related to cell signaling and the PI3K/AKT/mTOR pathway were selected and entered into CANDID, which also utilized information on gene conservation, protein domains and protein‐protein interactions. All genes within the defined linkage regions were scored and ranked, and the top ten genes for each peak are shown in Table 4.2. The AKT1 gene represented an exceptionally strong candidate for the AKT1 ratio peak on chromosome 14, while the top‐ranked genes on chromosome 3 included RAF1, VHL and RAB5A, which are signaling proteins in pathways that interact with PI3K/AKT/mTOR signaling (Figure 4.1).

The top two genes under the chromosome 14 peak (AKT1 and HSP90AA1) and the top five genes under the chromosome 3 peak (RAF1, VHL, KAT2B, GRM7 and RAB5A) were selected to test whether variants in these genes are associated with either the AKT1 phosphorylated:total protein ratio or the p70S6K phosphorylated:total protein ratio. Since the AKT1 gene was such a strong candidate for the AKT1 ratio phenotype, a comprehensive SNP genotyping strategy was desired for this gene. In addition to searching dbSNP for potential causal variants, the cDNA of one known AKT1 transcript was sequenced in 26 parents from the CEPH families to search for any previously unknown coding SNPs. Despite successfully sequencing an average of 94 percent of the transcript in all individuals, no novel AKT1 SNPs were found using this method. As a result, the genotyping strategy for the AKT1 phenotype consisted of genotyping 51 known SNPs in or near AKT1 along with an additional 7 tag SNPs from the second‐highest scored candidate gene, HSP90AA1. Of these 58, 20 SNPs remained after removing invariant SNPs, SNPs that could be tagged through linkage disequilibrium by other genotyped SNPs and any SNPs that failed quality control filters. One of these, rs1190584, was statistically significantly associated with the AKT1 ratio phenotype (Table 4.3). P‐values for this SNP were 0.0036 under a dominant model and 0.0042 under an additive model. When the minor allele is coded under a dominant model and the resulting variable is screened as a covariate for the AKT1 ratio trait, its inclusion explains an additional 4.44 percent of the overall variance in the phenotype, meaning it may explain approximately 25 percent of the trait’s overall heritability (h² = 0.1799). Though this SNP lies approximately 62.5 kilobases upstream of HSP90AA1, within a different gene, WDR20, it is part of a block of linkage disequilibrium that extends into the 5’ end of the HSP90AA1 gene (Figure 4.2).

Of the 79 SNPs genotyped on chromosome 3, four failed quality control filters, leaving 75 SNPs for association testing. These SNPs were tested for association with the phosphorylated:total p70S6K ratio because its linkage evidence in this region was stronger than that of the phosphorylated:total AKT1 ratio. Out of 75 tested SNPs, four showed statistically significant association under at least one model (Table 4.3). Three of these (rs713178, rs5746223 and rs9855183) are distributed across a 25‐kilobase region that includes RAF1 and are in moderately high linkage disequilibrium with each other, though their pairwise r² values are low. Of the three, rs5746223 is the only SNP located in the RAF1 gene; rs713178 and rs9855183 are both located downstream of RAF1, within the MKRN2 gene (Figure 4.2). The fourth statistically significant SNP, rs12630300, is in the intron of the GRM7 gene and is separated from the RAF1 SNPs by approximately 6 megabases. No other SNPs approached statistically significant association with this phenotype.

When coded under an additive model and tested individually as covariates, the three RAF1 SNPs were statistically significant as covariates (P < 0.001) and increased the proportion of total variance explained by covariates by between 2.3 percent (rs713178) and 7.7 percent (rs5746223 and rs9855183). They are therefore responsible for explaining between 10 and 33 percent of the original heritability (h² = 0.2313) when analyzed individually. In contrast, the GRM7 SNP was statistically significant as a covariate (P < 0.001) but did not increase the proportion of total variance explained by covariates. Furthermore, when all four SNPs were tested simultaneously as potential covariates, only rs713178 was statistically significant and had roughly the same effect on the total variance as when it was analyzed apart from the other SNPs.
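The 10 to 33 percent range quoted above appears to follow from dividing the additional variance explained by each SNP by the trait's heritability. A minimal sketch of that calculation, using only the figures reported in this section (the function name is hypothetical):

```python
def share_of_heritability(variance_explained_pct, heritability):
    """Fraction of the heritable variance accounted for by a covariate.

    `variance_explained_pct` is the additional percent of total phenotypic
    variance explained by the SNP; `heritability` is h^2 on a 0-1 scale.
    """
    return (variance_explained_pct / 100.0) / heritability


p70s6k_h2 = 0.2313  # heritability of the p70S6K ratio (from the text)
low = share_of_heritability(2.3, p70s6k_h2)   # rs713178
high = share_of_heritability(7.7, p70s6k_h2)  # rs5746223 and rs9855183
```

The same function applied to the AKT1 ratio values reported earlier (4.44 percent of variance, heritability of 0.1799) gives roughly 0.25, in line with the approximately 25 percent stated for rs1190584.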

Since several SNPs showed associations with the levels of phosphorylation of proteins in the PI3K/AKT/mTOR pathway, which is commonly implicated in cancer, possible associations between these SNPs and cancer clinical phenotypes were investigated using samples from the SCOTROC ovarian cancer clinical trial. Five clinical traits thought to be potentially influenced by PI3K/AKT/mTOR signaling were selected: FIGO stage, tumor grade, clinical response, overall survival and progression‐free survival. Sixteen candidate SNPs that achieved or were close to achieving statistical significance in the CEPH cell line studies were genotyped and analyzed in approximately half of the 890 patients for which DNA was available. This allowed for a two‐stage design in which any SNPs that were statistically significant in this set of patients could be tested for replication in the other half of the SCOTROC samples. However, none of the SNPs were statistically significantly associated with any of the clinical phenotypes under any of the three genetic models (additive, dominant and recessive) that were tested in this patient set.

4.5 Discussion

In Chapter 2, two PI3K/AKT/mTOR signaling phenotypes (the ratios of phosphorylated to total AKT1 and p70S6K) were shown to be moderately heritable and linked to regions on chromosomes 14 and 3. Here, candidate genes present in each of those two loci were ranked using CANDID, a candidate gene identification algorithm described in Chapter 3. For the AKT1 ratio linkage peak on chromosome 14, the AKT1 and HSP90AA1 genes were the top two results. Five genes, including several that encode key signaling proteins, constituted the top five results for the linkage peak on chromosome 3 that represented the shared genetic component between the AKT1 and p70S6K ratio traits. SNPs were selected and genotyped for each of these 7 genes, and a mixed model was used to test the SNPs for association with either the AKT1 ratio (genes on chromosome 14) or the p70S6K ratio (genes on chromosome 3). One SNP near HSP90AA1, rs1190584, was statistically significantly associated with the AKT1 ratio phenotype (P = 0.0036), and four SNPs on chromosome 3 were statistically significantly associated with the p70S6K ratio phenotype (rs12630300, P = 4.10 × 10⁻⁴; rs5746223, P = 6.35 × 10⁻⁵; rs713178, P = 4.57 × 10⁻⁴; rs9855183, P = 6.35 × 10⁻⁵). These five SNPs, in combination with eleven others, were then tested for association with cancer‐related clinical traits in a patient sample. It was hypothesized that any SNPs that influence the baseline activation of this pathway might also influence its activity in cancer. Patients with SNPs that cause higher levels of PI3K/AKT/mTOR signaling might therefore exhibit poorer prognoses. However, no associations were found with the five clinical phenotypes.

These results confirm that a multi‐stage approach incorporating linkage, candidate gene identification and SNP association testing can successfully identify variants associated with quantitative traits. The candidate gene identification step was a particularly important part of this strategy, as it was responsible for drastically narrowing the scope of the linkage results to a scale that would be feasible for medium‐scale association testing. Each of the loci described in Chapter 2 contained hundreds of genes, but CANDID was able to successfully narrow down the lists by orders of magnitude. Interestingly, the association analysis for the genes on chromosome 3 produced several statistically significant SNPs in or near RAF1, which was the highest‐ranked gene in this interval according to CANDID. In contrast, on chromosome 14, the highest‐ranked gene, AKT1, did not harbor any genotyped SNPs that were statistically significantly associated with the AKT1 ratio. The sole statistically significant SNP for that phenotype, rs1190584, was found near the second highest‐ranked gene, HSP90AA1. If RAF1 and HSP90AA1 truly contain genetic variants that affect PI3K/AKT/mTOR signaling, it would mean that CANDID performed excellently in identifying the true causal genes for these two traits.

HSP90, the protein product of HSP90AA1, has long been known to affect signaling by regulating the stability of a variety of proteins, including AKT1 [Liao and Hung ; Sato, et al. 2000]. As such, it is a popular target for recent oncology drug development efforts [Gao, et al. ; Massey, et al.]. However, little is known about natural variation within the HSP90AA1 gene and its effects on the activity of HSP90.

Additional investigation of the HSP90AA1 SNP that was statistically significant in this study, rs1190584, is necessary in order to identify the mechanism by which the variant in that region affects the phosphorylation levels of AKT1. Since AKT1 must be localized at the plasma membrane in order to be phosphorylated at S473, the residue measured in the experiments in Chapter 2, possible mechanisms include various methods by which HSP90 might inhibit this localization. If a causal variant in or near HSP90AA1 works by altering the activity or overall levels of HSP90, it may influence patient response to new HSP90 inhibitors and should be considered for future pharmacogenomic studies.

Three of the four statistically significant SNPs on chromosome 3 reside in or near the RAF1 gene, while the fourth is linked to the GRM7 gene. Since the GRM7 SNP was the only SNP of the four that did not increase the explained total variance of the p70S6K ratio trait when included as a covariate, it may be less likely to be a true causal SNP. Additionally, it is not likely that the three statistically significant SNPs in RAF1 represent three distinct association signals. The SNPs are in linkage disequilibrium with each other; pairwise D′ values range from 0.74 to 1.0 among the SNPs, while pairwise r² values range from 0.034 to 0.549. It is more likely that there is a single causal SNP in the RAF1 gene region that is being detected through linkage disequilibrium with these three genotyped SNPs.
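The combination of high D′ and low r² reported above is typical of SNP pairs whose allele frequencies differ substantially. The following sketch computes both measures from haplotype and allele frequencies using their standard definitions; the frequencies in the example are hypothetical and are not the HapMap CEU values for these SNPs.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical pair: a rare allele A (5%) that always occurs with a common allele B (50%).
d, d_prime, r_squared = ld_measures(p_ab=0.05, p_a=0.05, p_b=0.50)
```

In this illustrative case D′ equals 1 because the rare allele is never separated from its partner, yet r² is only about 0.05 because one SNP predicts the other's genotype poorly.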

The RAF1 gene encodes RAF‐1 (also known as c‐RAF), a key member of the mitogen‐activated protein kinase (MAPK) signaling cascade. Though the PI3K/AKT/mTOR and MAPK pathways are usually thought of as distinct, cross‐talk between them has been documented in a variety of contexts [Kiyatkin, et al. 2006; Rommel, et al. 1999; Yu, et al. 2002a]. However, reports describing the mechanism for this cross‐talk are contradictory, leading to an unclear understanding of how it might work. If a functional variant in RAF1 can be proven to influence the ratio of phosphorylated to total AKT1, its method of influencing RAF1 expression levels or RAF1 protein structure could help elucidate the mechanism for PI3K‐MAPK cross‐talk. Research by other groups into expression quantitative trait loci (eQTLs) identified an association between a SNP in the RAF1 gene region, rs10440120, and the level of mRNA expression of BLK, a tyrosine kinase [Dixon, et al. 2007; Moffatt, et al. 2007]. The SNP is in high linkage disequilibrium with rs5746223 (D′ = 0.906, r² = 0.444), one of the SNPs statistically significantly associated with the p70S6K ratio phenotype. Though not much information is known about this kinase, it represents an intriguing possibility for a mechanism by which variation in this region influences cell signaling. As with HSP90, RAF1 represents a target for oncology drug discovery, and additional information gained about its mechanism could assist with the development or testing of new treatments.

The work presented here represents the conclusion to a search for variants associated with the baseline activity of the PI3K/AKT/mTOR signaling pathway, and it may be significant in several ways. First, the genes and variants identified here represent highly plausible leads for further investigations into understanding the normal functioning of this signaling pathway. Second, it is possible that these genes or variants will one day be useful biomarkers in determining the efficacy of PI3K/AKT/mTOR‐targeting drugs currently in development for the treatment of cancer. Finally, this is the first work to produce linkage or association results for inter‐individual differences in post‐translational modifications. As discussed in Chapter 1, previous studies have examined genetic influences on gene expression and even protein expression, but not on protein modifications like phosphorylation. This work can be taken as a valuable proof of concept for future genetic studies of cell signaling and other dynamic cellular processes, which may one day lead to a better understanding of human diseases like cancer.

[Figure 4.1, panels A and B: LOD score (y‐axis) plotted against cM position (x‐axis).]

Figure 4.1: Locations of candidate genes relative to linkage peaks. The LOD peak on chromosome 3 for the correlated ratios of phosphorylated to total AKT1 and p70S6K is shown in (A), and the LOD peak on chromosome 14 for the ratio of phosphorylated to total AKT1 is shown in (B). The locations of the selected candidate genes for each trait are depicted by vertical lines in the corresponding panel. The chromosome 3 candidate genes, from left to right, are GRM7 (23.89 cM), VHL (30.72 cM), RAF1 (36.89 cM), RAB5A (43.07 cM) and KAT2B (43.19 cM). The chromosome 14 genes are HSP90AA1 (131.52 cM) and AKT1 (134.3 cM). The cM values were determined using CANDID and were calculated for each gene by using the gene’s midpoint (in base pairs) to interpolate its genetic location using two adjacent Marshfield map markers with known physical locations.
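The interpolation described in the caption can be sketched as a simple linear calculation between two flanking markers. This is an illustration of the general idea only, not CANDID's own code; the function name and all coordinates below are hypothetical and do not correspond to actual Marshfield markers.

```python
def interpolate_cm(gene_start_bp, gene_end_bp, left_marker, right_marker):
    """Estimate a gene's genetic position by linear interpolation.

    Each marker is a (bp, cM) tuple, and the two markers are assumed to flank
    the gene's midpoint. All coordinates used below are hypothetical.
    """
    midpoint_bp = (gene_start_bp + gene_end_bp) / 2
    (left_bp, left_cm), (right_bp, right_cm) = left_marker, right_marker
    # Fraction of the physical distance between the markers covered by the midpoint.
    fraction = (midpoint_bp - left_bp) / (right_bp - left_bp)
    return left_cm + fraction * (right_cm - left_cm)


position_cm = interpolate_cm(
    gene_start_bp=12_600_000,
    gene_end_bp=12_700_000,
    left_marker=(12_000_000, 35.0),
    right_marker=(13_000_000, 37.0),
)
```

Linear interpolation assumes a constant recombination rate between the two markers, so the resulting cM values are approximations whose accuracy depends on marker density.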

[Figure 4.2, panels A (HSP90AA1) and B (RAF1 and MKRN2): Haploview linkage disequilibrium plots.]

Figure 4.2: Linkage disequilibrium surrounding statistically significant SNPs. Haploview plots showing the extent of linkage disequilibrium surrounding statistically significant SNPs in HSP90AA1 (A) and RAF1 (B) are shown here. The statistically significant SNP associated with HSP90AA1, rs1190584, is indicated by a yellow star (A), as are the three statistically significant SNPs associated with RAF1 (B, from left to right: rs713178, rs9855183, rs5746223). The physical locations of the SNPs in relation to their associated genes are also depicted.


Table 4.1: SNP genotyping assay details

Assay | SNP | Gene | PCR primer 1 (ACGTTGGATG‐) | PCR primer 2 (ACGTTGGATG‐) | Extension primer
1 | rs1130214 | AKT1 | AAGACAGGACCAGGATGCAG | TGGGGTTTCTCCCAGGAGG | GGGTTTCTCCCAGGAGGTTTTTG
1 | rs1130233 | AKT1 | CAGCTGTTCTTCCACCTGTC | ACACAATCTCAGCGCCATAG | TGGCCCGGTCCTCGGAGAACACACG
1 | rs1130295 | AKT1 | TGTCTCTGAGGACGTCATCG | TAAACCTTGCTCCTCTGTCC | GACTCCCCCATCCCTGGTCCCATCCCA
1 | rs1130443 | AKT1 | TACCGTGGAGAGATCATCTG | TTTGTGACAGGAAAGCCCTC | CCCTCCCCCTTCCCCTT
1 | rs11555432 | AKT1 | TCTCCTCCATGAGGATGAGC | ATGTACGAGATGATGTGCGG | ACCAGGACCATGAGAAGC
1 | rs11555434 | AKT1 | CGACACTGTGGCCTTGTTTC | AAGCGATGCTGCATGATCTC | TTGCGTCCTCGGAGCCCCC
1 | rs11848899 | AKT1 | GCAGTTGGTGGATAAGATGG | ACGCTAACTGAGCGGGAAGA | AGCGGGAAGACTGATGA
1 | rs12588965 | AKT1 | GTTTCTGTCGCTGGCCCTA | AGGACAGATGTGCCTGGGAT | GTGGCAGGCGCGGTACGGGAGC
1 | rs17846831 | AKT1 | ACAGCATTGCGTGTGCTCAG | TTTTGCGGCACACCTGAGTA | TGCGGCACACCTGAGTACCTGGC
1 | rs2230508 | AKT1 | GACTACCTGCACTCGGAGAA | TCAGTGCCGCCAGGCCCCCA | ACCTTGAGGTCCCGGTACAC
1 | rs2494731 | AKT1 | TGCAGACAGGACTCTCAATG | TCACGTGTGCACATCACCTT | TCTCACCTTATAGTCACCCTT
1 | rs34558581 | AKT1 | CAATGCCACATTGCGCATAG | CCTCAGCCCTCAGAACAATC | CCTAGCCCTCAGAACAATCCGATTCA
1 | rs34670300 | AKT1 | CACCCTCATCTCCACCCTG | ACGCCATGAAGATCCTCAAG | GATCCTCAAGAAGGAAGTCAT
1 | rs35416681 | AKT1 | TGCCACCAGGTTGAACTGAG | ATGTGTCCCCTCACTCTGTC | CCCGGCTGTCTGTCACCAGCTA
1 | rs36214921 | AKT1 | ACTTTTCGTCCGTGGGAAAC | TGCCCCGGGGTTGGAGAAA | GGGTTGGAGAAAGACTCGC
1 | rs3803304 | AKT1 | TATCAGTGTAGTCTGGGAGG | TACACCTCCATCCCCTCATC | GCATCCCCAGGCTGCACCTGCCCCT
1 | rs3840005 | AKT1 | ACAGGGAGTCAGGGAGGGC | AGATGATCTCTCCACGGTAG | CACTTGACCTTTTCGAC
2 | rs10138227 | AKT1 | CCAGGAGGTTTTTGGGCTTG | AGGAAGACAGGACCAGGATG | GTGCAGGCCACTGGCGCAAA
2 | rs1130526 | AKT1 | ATAATATGGAACCTTCCCTC | CACTCTTCCACCCAGCAAAG | TTGAAAAGCAACTTTTATTGAA
2 | rs11555430 | AKT1 | TGGCTTCTCTCAAATGCACC | ACGATAGCTTGGAGGGATGG | TATAGCTTGGAGGGATGGAGAGGCG
2 | rs11555433’ | AKT1 | TTGAGTACCTGAAGCTGCTG | GATCTTCATGGCGTAGTAGC | CTGTGGCCTTCTCCTTC
2 | rs11555435 | AKT1 | TGGCCGAGTAGGAGAACTG | TCTACACCCACAGATGACAG | CCACAGATGACAGCATGGAGTGT
2 | rs12881616 | AKT1 | ACAGCATTGCGTGTGCTCAG | TTTTGCGGCACACCTGAGTA | TGAGTACCTGGCCCCCG
2 | rs17846825 | AKT1 | CAGGCACAGGCAGAAGTGG | GCTGAAGAGATGGAGGTGTC | GTCCCTGGCCAAGCCCAAGCACCG
2 | rs2494746 | AKT1 | AGCTTCTGGCTCTGCTTCC | AACAAGCAGGGGACAGCACA | GGGATGGAGAAGGCAGGATG
2 | rs2494748 | AKT1 | ACAGGCACTCACAGACCCT | ACAAACGAGGTTAGTACCCG | TGCCTGGGGAGGGAGAGAC
2 | rs28730786’ | AKT1 | AGCCTGAGAGGAGCGCGTGA | CCAACCCTCCTTCACAATAG | CACGTCGCTCATGGTGCCC
2 | rs41307094 | AKT1 | AGAGTGAGGGGACACATGG | CCGATTCACGTAGGGAAATG | CATGTTAAGGACTTCTGCAGCTATG
2 | rs4375597’ | AKT1 | TATCAGGCTGCCCTGCTACT | CAGGACCTGGTTCCTGTAAG | ATTCCCTGGCTCTTCCAGCCTCA
2 | rs55707948 | AKT1 | TCCCCCTCAGATGATCTCTC | GCGACAGCGGAAAGGTTAAG | CGTCGAAAAGGTCAAGT
2 | rs55874341’ | AKT1 | CACATCATCTCGTACATGAC | TGCAGGTGCTGGAGGACAA | GGCAGGTGCTGGAGGACAATGACTA
2 | rs58565216 | AKT1 | CCTCCCTGACTCCCTGTGG | GTAAAAAAACGCCGTGGTGC | CTGGCCAGGAGGCGTGGA
3 | rs11555429 | AKT1 | CCTCCTGGCCAGACGCTG | GGGAAGGTTCCATATTATAT | CGTTGAATGTTGTAAAAAAACGCC
3 | rs11555431 | AKT1 | TGTAGCCTGTAGCTGGGATG | GGCCAAGTCCTTGCTTTCAG | GGCTGCTCAAGAAGGAC
3 | rs17846826 | AKT1 | TTCTCGGGTGCATTTGAGAG | GCTGCCCACAGCACAAAAAC | AAACGTCTTTCCATCTGGGCTC
3 | rs2230506 | AKT1 | CTGGAAAGAGTACTTCAGGG | CCACACACTCACCGAGAAC | CACTCACCGAGAACCGCGTC
3 | rs2494735 | AKT1 | CTTCCATGTGGAGACTCCTG | ATCCCCGTGTCCCTCCTAAG | AAGCGCTGGGGCTGCCCAAG
3 | rs2494738 | AKT1 | AGGACAGATGTGCCTGGGAT | TTTCTGTCGCTGGCCCTAAG | GCCCTAAGAAACAGCTCC
3 | rs2498793 | AKT1 | ACTGGCTGGTTTAGGGCTG | TGGAGTACCATGGACAGGAG | CCAGAGGTCTGGGCCAGTCTGC
3 | rs2498796 | AKT1 | AGCTCCCCTTGCATACCAC | AGAAACTGAGGCTTGGAGAG | GGAAGAGATGGGGCTTCC
3 | rs34865723 | AKT1 | TGCCACCAGGTTGAACTGAG | ATGTGTCCCCTCACTCTGTC | CTCTGTCAGCCAGCCGC
3 | rs3803305 | AKT1 | ATGGCCACCCCCACAGGGA | CGGTAGCACTTGACCTTTTC | AGTTCCGCTGTCGCCCCAGGCCCT
3 | rs56217512 | AKT1 | TGAGAGGAGCGCGTGAGCGT | GGTACTAACCTCGTTTGTGC | TCCAACCCTCCTTCACAATAGCCAC
3 | rs56289559 | AKT1 | TGGTCCTGGTTGTAGAAGGG | TGGAGGACAATGACTACGGC | TGGTCATGTACGAGATGATGTG
4 | rs11555436 | AKT1 | ATGATGGCACCTTCATTGGC | CACAGAGAAGTTGTTGAGGG | ACGTTGGTCCACATCCT
4 | rs17102385 | AKT1 | GAATGTTGTAAAAAAACGCC | TCCACGCCTCCTGGCCAGA | GGAATCCTGGCCAGACGCTGC
4 | rs1804268 | AKT1 | CTCTGTCCCACTGGGTAAAC | TAAGGCCGTGTCTCTGAGGA | TCTGAGGACGTCATCGGAG
4 | rs2494732 | AKT1 | TTTCAGGGCTGCTCAAGAAG | GATGAGGGGATGGAGGTGT | GATGGAGGTGTAGCCTG
4 | rs2494749’ | AKT1 | ACAGGCACTCACAGACCCT | ACAAACGAGGTTAGTACCCG | TTTGCCTGGGGAGGGAGAGA
4 | rs34409589 | AKT1 | TTCTTGAGGAGGAAGTAGCG | TAGAGTGTGCGTGGCTCTCA | CCCGCACGTCTGTAGGG
4 | rs61761249 | AKT1 | TCCCTACGTGAATCGGATTG | GGAAGAAAACTATCCTGCGG | CTCCGGGTGTGGCCTCAGCC
5 | rs1154355 | GRM7 | ATTGGAAATGGGATAATGGG | CTGGACAACACAGCAAGACC | GTACCACAGCAAGACCTTGTTTCCC
5 | rs12630300 | GRM7 | GGGCCACTTTTCAATTTGAG | GAAAGGTAAGGTAGTGCAAC | TGTAGTGCAACTCTTCCATA
5 | rs17046224 | GRM7 | GCTTAAGCGTATCTATTTAC | TAAGGGTTTGTGTGCTCAAG | AGGGGTTGTGTGCTCAAGTTTCTA
5 | rs17046239 | GRM7 | CACATGATCTTTGGGACAAC | TAAAGACTTCTGGGCTCCTG | GGCCCCTGAGGCTTTTCATATAAAC
5 | rs191443 | GRM7 | GAAGAAAGAGGCTCTCTAGG | ATAGCCTCCTACAAAGGGTG | CTCCTACAAAGGGTGCAATTA
5 | rs340657 | GRM7 | ATAGGTGTCACTGCCATTTG | TCCCACTCACCATCTGTTAG | TGTGGAGGCAGAGATTGTTATA
5 | rs3846161 | GRM7 | ATGGCCATCACATGAGAACG | ACAGAGCTGGATGGAGAAAG | GGATGGAGAAAGAATTCG
5 | rs6443074 | GRM7 | CCCTTGCTACATTCTGACTC | ATCCACCTGAATCAGATGCC | CATTCAATTCCAAGTCTATAGTG
5 | rs7644436 | GRM7 | CCACAATGAATGACCTTGAC | GGCAGATGTGTTTTTCCCTC | AATGAGGTTCAGAGAGGCTA
5 | rs7651971 | GRM7 | GGTAGGTGTTTAGCCCTACT | CATTTGCAGAGCCCTTGTTG | CCCTCAGAGCATTTAGT
5 | rs983534 | GRM7 | GTTAAGAGGTGGGTTCTTGC | AGGCTGAGGCAAGAGAATTG | AGGCCAGAAATTCGAGA
5 | rs1124376 | KAT2B | ATCTTGGACAGCACAGTTGG | AGGAATGAAGAGAGGGAGAC | AGAGGGAGACAACATGGGGT
5 | rs1610186 | KAT2B | GGGAAGTCTCAGTCATTATG | ACTTCCTCCACTGAGTCTTG | GGGATCCCAGTACAAAGTA
5 | rs2929401 | KAT2B | CACCTGGCACATTAAATAAC | ACAAGCCTCAGATTTCGCAG | GGATCCTCAGATTTCGCAGCTGTAAT
5 | rs11128927 | RAB5A | TGCGAGTGGCTCAGATCCGA | ATAACAATGGCGCTGGAGAG | GGGGAAGGGGAGAAAGGC
5 | rs4241540 | RAB5A | AGCAGGAGACACCCACTTAG | CACTGATGTCCTTTGTATGC | AAAGGGTGGACCACTTGGTTGTCAGT
5 | rs6790199 | RAB5A | TGAGGACCACTGTTCTAAGC | GATGATGGAACTGTACATGA | CCCGTACATGAAAATAACATTCCTCTT
5 | rs6796538 | VHL | TCACTCTCGGCACCTCCTTG | AGAAAGGGGCTCCCACAGT | GGGTTGCAGCCGCATGCTGGAC
6 | rs339807 | GRM7 | GGATATGCCCCTCTTACTTG | AGCCAAGACTTGATCACAGG | GGGTGGATCAGTTCTCCTGAAACCAT
6 | rs393046 | GRM7 | TTTCAGTCTGGAGTGCAAGG | CTGTACATGTATCCCAGCAC | TACAACGATCTTCTCCC
6 | rs4108607 | GRM7 | CATAAATCCCTTTTCTGCCAC | GTACTGCAATAATGAAAGC | ACTGCAATAATGAAAGCAAATCC
6 | rs9870018 | GRM7 | ACCACACACCCAGAAATATG | TGGTCTTCATAATACCTCCC | GCTTTTCCTACATTTGCAC
6 | rs9872244 | GRM7 | GGGAAAAGCAGAGAGAAATG | TGAAACCTCTTAGGTTTGGG | GGGAAATTTCAATAGCAGA
6 | rs9882058 | GRM7 | GATCAATCTGATTAGGAGGAC | TGTTGGATACCAAAGACCAC | AACAATTTTAGTCCTTCAGAATCTA
6 | rs12639078 | KAT2B | GGTAAAGGGTGGATGAACAG | AAAGAATTGCGCACTGTGCC | GCGAACTGTGCCCATTACCTAGACTCT
6 | rs1915919 | KAT2B | ATTCTGTACATAGCTCGCCC | AGGAGTTGATCATGAAGGGC | AGACGAATGAGAACAGGACAC
6 | rs2623074 | KAT2B | TTTTCAAGGTGCTTGAAAC | GCTTTCTTCTGAATTCAGGC | CCCCTTTCCAAAGCAGCACAT
6 | rs2929402 | KAT2B | TAATCAGTCCCCATGCGTTG | TTCTGGAACCTCCTGCATAC | TTCCTACCACACAAACCTCGCAAAA
6 | rs2929408 | KAT2B | GAGTGGGTAGCAAGAAATGG | ACAGGATTTGGCAAAGTGAC | GGAGTTGGCAAAGTGACAGCCCAG
6 | rs13085694 | RAB5A | AGTAACCATCAGCTGCCTTC | GTCCTTCCTTTGCCCAAAGC | TCCTTCTCAAAAATGCAACTCTTATCT
6 | rs2929344 | RAB5A | GGGTCAACATCTCTGGTATC | GCCAGTGTAAAGGTGATTGC | GGTGATTGCAAGATTCCA
6 | rs2929346 | RAB5A | TCTTCCACCTCTCTCACTAC | ACAGTCCCTTGACTCAACTC | TATGCTTTGACCCAAAAAGA
6 | rs7613136 | RAB5A | AAGGTTTCCATGAAGATGGG | AGCCTGGGCAACAGAGCAA | CAACAGAGCAAGACTCC
6 | rs9810613 | RAB5A | TACATGGCTGGACATATGTG | CAAAGTAATTGTACTGTGA | AGTCATTGTACTGTGAACACCC
6 | rs3773341 | RAF1 | GCCAAACAGTTTGAAAACCC | AGTGGACGGAATAAAGTCTC | CCAGCAAACATTCTCCTAGCTCC
7 | rs1154354 | GRM7 | AATCCCATTATCCCATTTCC | TTGCTTGCATTGTGCCTGTG | CTGTACCCCCACAAACAAAAA
7 | rs339805 | GRM7 | TTTTTCCCCACTATTCAAGG | AGACTTCATTCCCAGTCTCC | TCCCAGTCTCCCCGACAAA
7 | rs340660 | GRM7 | CAGCCTGTCTCTGTCTGAAA | CCCCTGGAATATTTTTCAGC | TTCAGCAAACACAGTCC
7 | rs3749380 | GRM7 | GATCACTGTTGATCTGGTCC | CCTGCGGCGACATCAAGAG | CGGCGACATCAAGAGGGAAAA
7 | rs9818072 | GRM7 | AGGAAAGATAACCTTTGCCC | CCTTCTGAATATCTTTGCCC | GTATCTTTGCCCTCATCTTTATC
7 | rs9876241 | GRM7 | GAAGATGCTATGAAATTCAC | TAGCTAGAAGTGGGATTTGC | GTGGGATTTGCTTTCATTATTGC
7 | rs9882865 | GRM7 | ACATGTTTATGGGCTGGCTC | CCAAAACAGGCTTGGCTTAG | TTGGCTTAGAAGACACA
7 | rs2948083 | KAT2B | CCCTTCTGCAGAAGGATTTC | GCCTCGCAGAGAATGAAAAC | CCTGGGGAATTCCTCTAGATAC
7 | rs2948089 | KAT2B | TAAGGCACTCCCTTTACTCC | TAGCATGAGGCAACAGTCAC | AGGAGGCAACAGTCACCATATCTTA
7 | rs6806287 | KAT2B | TCGTACCCAATCAAGTCCAG | GCAGGAATCTTAACCTGGAG | TCTTAACCTGGAGAGTGACA
7 | rs4858660 | RAB5A | GCCCAGGTTATTTAACCCAG | ACTGCTAGATCATGTCCACC | CTTTAAGCATTTCCCAAAGTATAT
7 | rs713178 | RAF1 | TAGACATGGCTCAGAGTCGG | GGTTAAACTGCAGTGGAGTC | GGATTCAGTGGAGTCAGAGTCCGT
7 | rs9855183 | RAF1 | GGATGGGGTAAGTGCTTTTG | ACACTCCTGGGAAGGAGAAC | GGCAGGCAGCAGGGGCAC
7 | rs265318 | VHL | TAGGGATGTAGGCATTTCTG | AGATTCATGCAAACATCCCC | CATCCATGCAAACATCCCCTCAGTT
8 | rs1240966 | GRM7 | ATTTGCTTGCATTGTGCCTG | AATCCCATTATCCCATTTCC | CTTGAGATATTTTTGTTTGTGGG
8 | rs2948097 | KAT2B | CTAAGGGTCCAGCTTCATTC | GCCAAGACCATTCAATAGAG | GCTGGAAAAACTAGGTATCTG
8 | rs13072891 | RAB5A | ATGTTGGCTGCCTATTATTC | CAAGAGCAAAAATCTGCCTC | TCCTGTGGAGAAATATGTT
9 | rs12487836 | GRM7 | CTAAATGTGAGTGCCAGATTC | TTGCAGTGTGCCATCCATTG | GTCCATTTGACAGATCTATTCAGTA
9 | rs17046322 | GRM7 | TATTTTTTTCCTGGGCCCTG | GGGCTAATGCCCACATCAAG | GCCCACATCAAGTTAATTG
9 | rs340655 | GRM7 | TTTGGTGAGTTTGTGGGAAC | CAAAAGCAGTCTGAATACAGG | TTGACTAGTAATGACCAAGA
9 | rs414907’ | GRM7 | CTCTTAGCCTTGCTGTTATC | TTAGTGACATTACTCAGTGG | CCTCAGTGGAATGAAACTTTTT
9 | rs421802 | GRM7 | CCACAAATGTAAATACAAAG | GCTTCAGAAGTGTGAAATTAC | TGCAGAAGTGTGAAATTACAAACATG
9 | rs7651591 | GRM7 | GGTGGACTTCCACAAGTTTC | GCCACAATGATCGATTAGTC | CTGTGTGTGTGTGTGTATTAG
9 | rs2929404 | KAT2B | TAGCTAGCTTTGTGGTCTGC | AAGTCTGCCATGAACCAAGG | CGGGAAACACACTTAACTATGAGTA
9 | rs6765791 | KAT2B | AAGGACATAGCACTCACTGG | TGGCATAAGAAGTCACCCAC | GTCAGGTGGAAAGAGCAA
9 | rs9874923 | KAT2B | GCCCATCAACTAGTTAACGC | GTTGTAGCATATGTCAGCAC | CATATGTCAGCACTTCTTTAT
9 | rs2127956 | RAB5A | CGTGAGTGTAGTTAGGGAAG | TCCTGGGTGCTGAGTAATAC | GGTTATTACTCTTCCTCACATTAATTG
9 | rs6778866 | RAB5A | AAGCCTAGAGATGCAGTACC | GTTAGCCCAGAGATTAGCAG | GGAGATTAGCAGGGAGCAG
9 | rs9858341 | RAB5A | GTGTTTTAAATTGAAAAAGC | GCAGAGTCATGAGTTGAACTT | TCGCCAACATATATTCCATCTAG
9 | rs2442809’ | RAF1 | CGACAGAACGAGACTCTATC | TGCTAGGGATTACAGGTGTG | CCACCACACCTGGCCTA
9 | rs2596831 | RAF1 | GATGGCCCAGTGAACTAATG | CCCCAGTCCCAAAAACAAAC | GTGGCATTCTGAGCAAA
9 | rs5746223 | RAF1 | AACCCATGTCATCTGTTTCC | GCCAGTGCTATAACTAAAATG | AGACAAAAATTGGAAGTATGGTTA
9 | rs7956 | RAF1 | GCTGATGGCAAGTCTTGTAG | TTGGAACAAGCTAGTGCTGG | AGCTAGTGCTGGTTCACAAGGTT
9 | rs388600 | VHL | GTCAGTGGTTTGCAACCCTG | GTAGGGCTAGGCATTTATAC | TTAAAAGCACCCCAGATGTTTCTA
10 | rs163540 | GRM7 | GGCTGCAAGTAACAGAAAAC | CCCCAATGCTTCCTCTTTTG | CTGCTCATTGCATAGGT
10 | rs9838115 | GRM7 | ACACTGTGGAATCAGCTAGG | GAAGTCTGGTACTGGTTTGC | GGTTTTGCCTGCTATACGGAGGGG
10 | rs4858754 | KAT2B | ACTCTCTGAACACTAGCTGG | TAACCTGTGGGATGTGATGC | TTGGGATGTGATGCTATCTCCA
10 | rs10510497 | RAB5A | TTTCCTGCTTCACCCATCAC | GAATGAGAGAAGAGGAAAAG | AGAAACTTTATGTGGTGGTCAG
10 | rs11128930 | RAB5A | CAGTGCACTCATAACAACTC | GATTAAGAGGATGAAGGAGG | ATGAAGGAGGTAAACTAGAA
10 | rs12488378 | RAB5A | GCAATCCGATCTCAAAGCAG | GGACAGACCACTATGGTAAG | ATAGCTTTGACTCAGGTTACT
10 | rs13081007 | RAB5A | GCTTCTGAAGGTTCTCTCTG | TCACCTTTACACTGGCCTTC | AGCCCACTGCAGTCTCTC
10 | rs2454436 | RAF1 | TTAACGGCAGGGATACCTTC | GGCTGTACCTTCTTGGTTTG | CTTTATGATGTGTGCATGA
10 | rs3729931 | RAF1 | AGGGAAAGAAAACAGCTGAG | TAAGCAGCTAGAGGGTTAGG | ACTTGTTATTATCTGTTGTTCATT
10 | rs166538 | VHL | AAACCGTATCAGGGCTGTTC | CTCAAAATCCTCCCTACTGC | CCCGAGTTGCTCCACGCCAG
11 | rs11555433 | AKT1 | TTGAGTACCTGAAGCTGCTG | GATCTTCATGGCGTAGTAGC | TAGCGGCCTGTGGCCTTCTCCTTC
11 | rs2494749 | AKT1 | ACAGGCACTCACAGACCCT | ACAAACGAGGTTAGTACCCG | GGGCCTGGGGAGGGAGAGA
11 | rs28730786 | AKT1 | CCAACCCTCCTTCACAATAG | AGCCTGAGAGGAGCGCGTGA | GTGAGCGTCGCGGGAGCCT
11 | rs4375597 | AKT1 | TATCAGGCTGCCCTGCTACT | CAGGACCTGGTTCCTGTAAG | CCTGGCTCTTCCAGCCTCA
11 | rs55874341 | AKT1 | CACATCATCTCGTACATGAC | TGCAGGTGCTGGAGGACAA | CAGGTGCTGGAGGACAATGACTA
11 | rs414907 | GRM7 | CTCTTAGCCTTGCTGTTATC | TTAGTGACATTACTCAGTGG | TTACTCAGTGGAATGAAACTTTTT
11 | rs11621560 | HSP90AA1 | GTTGTTGAGGAAGGAAGAGG | AACCCCAAGTTCTTCCACTC | TTCCACTCCTGCTTCAC
11 | rs1190583 | HSP90AA1 | GCAAAACTACATCTCCTCAC | ACATCTTTTCACAATCGAGG | TTCACAATCGAGGAAGTTTAGA
11 | rs2298877 | HSP90AA1 | GCTGCTTGGAGGTATTAAAG | CTTAAAGCCAGCTATTAGGG | CCAGCTATTAGGGTAATACTC
11 | rs7160651 | HSP90AA1 | GGAAGTCTGCACTGAAACTC | CTATGTGGCCAATTGACCAG | TCAGGCCACCTTGTCAC
11 | rs8005080 | HSP90AA1 | CAGGAATACCCCAACCAAGG | TATCTCCTTCCCAGAGCTTC | CCTTCCCAGAGCTTCTCCGAAC
11 | rs8005905 | HSP90AA1 | GTACCAAGAAAAGGCCCAAG | TCCAGAGACAGAGTAGAGTG | GGTGGATCCAGACACCA
12 | rs17846829 | AKT1 | TTAACCTTTCCGCTGTCGCC | GCAGCGGCAGCGTCTGGC | AGCGTCTGGCCAGGAGG
12 | rs2498791 | AKT1 | CTGAGCCTGCTCTGCTCTC | TCCCCTCCCCCAGCCACAG | GCAGCGGGCAGGCAGGGGC
12 | rs1190584 | HSP90AA1 | TCCTCGATTGTGAAAAGATG | TTATATACTCCGAGCACTT | CTCCGAGCACTTCAGAAT
12 | rs2442809 | RAF1 | CGACAGAACGAGACTCTATC | TGCTAGGGATTACAGGTGTG | CCACCACACCTGGCCTA
S | rs12588965 | AKT1 | AGGACAGATGTGCCTGGGAT | GTTTCTGTCGCTGGCCCTA | AGCTGTCGCTGGCCCTAAGAAAC
S | rs2494749 | AKT1 | ACAGGCACTCACAGACCCT | ACAAACGAGGTTAGTACCCG | GCCTGGGGAGGGAGAGA
S | rs41307094 | AKT1 | AGAGTGAGGGGACACATGG | CCGATTCACGTAGGGAAATG | ATTAAGGACTTCTGCAGCTATG
S | rs1240966 | GRM7 | ATTTGCTTGCATTGTGCCTG | AATCCCATTATCCCATTTCC | TCTTGAGATATTTTTGTTTGTGGG
S | rs12630300 | GRM7 | GAAAGGTAAGGTAGTGCAAC | GGGCCACTTTTCAATTTGAG | AGTACATATTGACGTATGAGC
S | rs7651591 | GRM7 | GGTGGACTTCCACAAGTTTC | GCCACAATGATCGATTAGTC | GTGTGTGTGTGTGTATTAG
S | rs1190584 | HSP90AA1 | TCCTCGATTGTGAAAAGATG | TTATATACTCCGAGCACTT | ACTCCGAGCACTTCAGAAT
S | rs1190587* | HSP90AA1 | CCTGAAACACTCAGTAGATG | CCTTTCACTCTGATTATCAC | CCTCTGATTATCACTGTCAAGAC
S | rs7160651 | HSP90AA1 | GGAAGTCTGCACTGAAACTC | CTATGTGGCCAATTGACCAG | GGTCAGGCCACCTTGTCAC
S | rs8005905 | HSP90AA1 | GTACCAAGAAAAGGCCCAAG | TCCAGAGACAGAGTAGAGTG | GGGTGGATCCAGACACCA
S | rs2623074 | KAT2B | TTTTCAAGGTGCTTGAAAC | GCTTTCTTCTGAATTCAGGC | TTTCCAAAGCAGCACAT
S | rs2948083 | KAT2B | GCCTCGCAGAGAATGAAAAC | CCCTTCTGCAGAAGGATTTC | GAAGGATTTCCTGGGCA
S | rs9874923 | KAT2B | GCCCATCAACTAGTTAACGC | GTTGTAGCATATGTCAGCAC | CATATGTCAGCACTTCTTTAT
S | rs9858341 | RAB5A | GTGTTTTAAATTGAAAAAGC | GCAGAGTCATGAGTTGAACTT | CGCCAACATATATTCCATCTAG
S | rs5746223 | RAF1 | GCCAGTGCTATAACTAAAATG | AACCCATGTCATCTGTTTCC | TTCCATATCTTTTAAAGATAAACATA
S | rs713178 | RAF1 | TAGACATGGCTCAGAGTCGG | GGTTAAACTGCAGTGGAGTC | AGGCAGTGGAGTCAGAGTCCGT
S | rs9855183 | RAF1 | GGATGGGGTAAGTGCTTTTG | ACACTCCTGGGAAGGAGAAC | GGAGCCAGGCAGGCAGCAGGGGCAC

’ Indicates that this SNP failed quality control filters and was re‐typed using a different primer set.
* This SNP is a tag SNP in perfect LD with rs1190583 in HSP90AA1.

Table 4.2: Top ten identified candidate genes for the AKT1 and p70S6K phosphorylated:total protein ratio phenotypes.

Rank | Chromosome 14 genes (CANDID score) | Chromosome 3 genes (CANDID score)
1 | AKT1 (55.5) | RAF1 (69.4)
2 | HSP90AA1 (54.2) | VHL (20.6)
3 | CALM1 (33.6) | KAT2B (16.7)
4 | TRAF3 (17.2) | GRM7 (11.2)
5 | EVL (12.2) | RAB5A (11.4)
6 | TSHR (9.9) | THRB (9.7)
7 | MTA1 (9.3) | IRAK2 (8.4)
8 | BEGAIN (9.1) | ATP2B2 (7.7)
9 | CCNK (8.7) | SH3BP5 (7.5)
10 | YY1 (7.9) | PPARG (7.2)


Table 4.3: All CEPH association test results for SNPs on chromosomes 14 and 3

Phenotype | SNP | Gene | Additive | Dominant | Recessive
AKT1 ratio | rs10138227 | AKT1 | 0.266 | 0.192 | 0.789
AKT1 ratio | rs1130214 | AKT1 | 0.147 | 0.135 | 0.696
AKT1 ratio | rs1130233 | AKT1 | 0.845 | 0.421 | 0.125
AKT1 ratio | rs11848899 | AKT1 | 0.777 | 0.777 | ‐
AKT1 ratio | rs12588965 | AKT1 | 0.042 | 0.042 | ‐
AKT1 ratio | rs2494732 | AKT1 | 0.204 | 0.635 | 0.106
AKT1 ratio | rs2494738 | AKT1 | 0.423 | 0.149 | 0.200
AKT1 ratio | rs2494746 | AKT1 | 0.832 | 0.997 | 0.586
AKT1 ratio | rs2494748 | AKT1 | 0.193 | 0.163 | 0.951
AKT1 ratio | rs2494749 | AKT1 | 0.954 | 0.543 | 0.155
AKT1 ratio | rs2498796 | AKT1 | 0.659 | 0.785 | 0.069
AKT1 ratio | rs3803304 | AKT1 | 0.425 | 0.552 | 0.310
AKT1 ratio | rs41307094 | AKT1 | 0.251 | 0.251 | ‐
AKT1 ratio | rs11621560 | HSP90AA1 | 0.898 | 0.867 | 0.963
AKT1 ratio | rs1190583 | HSP90AA1 | 0.211 | 0.211 | ‐
AKT1 ratio | rs1190584 | HSP90AA1 | 0.0042* | 0.0036** | 0.564
AKT1 ratio | rs2298877 | HSP90AA1 | 0.216 | 0.171 | 0.826
AKT1 ratio | rs7160651 | HSP90AA1 | 0.518 | 0.518 | ‐
AKT1 ratio | rs8005080 | HSP90AA1 | 0.557 | 0.557 | ‐
AKT1 ratio | rs8005905 | HSP90AA1 | 0.136 | 0.136 | ‐
p70S6K ratio | rs1154354 | GRM7 | 0.831 | 0.872 | 0.465
p70S6K ratio | rs1154355 | GRM7 | 0.922 | 0.688 | 0.408
p70S6K ratio | rs1240966 | GRM7 | 0.506 | 0.506 | ‐
p70S6K ratio | rs12487836 | GRM7 | 0.408 | 0.466 | 0.585
p70S6K ratio | rs12630300 | GRM7 | 4.10 × 10^-4* | 4.10 × 10^-4* | ‐
p70S6K ratio | rs163540 | GRM7 | 0.574 | 0.574 | ‐
p70S6K ratio | rs17046224 | GRM7 | 0.681 | 0.681 | ‐
p70S6K ratio | rs17046239 | GRM7 | ‐ | ‐ | ‐
p70S6K ratio | rs17046322 | GRM7 | ‐ | ‐ | ‐
p70S6K ratio | rs191443 | GRM7 | 0.022 | 0.046 | 0.076
p70S6K ratio | rs339805 | GRM7 | 0.025 | 0.040 | 0.211
p70S6K ratio | rs339807 | GRM7 | 0.538 | 0.382 | 0.442
p70S6K ratio | rs340655 | GRM7 | 0.003 | 0.012 | 0.124
p70S6K ratio | rs340657 | GRM7 | 0.010 | 0.037 | 0.105
p70S6K ratio | rs340660 | GRM7 | 0.152 | 0.555 | 0.030
p70S6K ratio | rs3749380 | GRM7 | 0.134 | 0.004 | 0.598
p70S6K ratio | rs3846161 | GRM7 | 0.650 | 0.970 | 0.307
p70S6K ratio | rs393046 | GRM7 | 0.005 | 0.113 | 0.005
p70S6K ratio | rs4108607 | GRM7 | 0.085 | 0.150 | 0.441
p70S6K ratio | rs421802 | GRM7 | 0.053 | 0.089 | 0.211
p70S6K ratio | rs6443074 | GRM7 | 0.944 | 0.944 | ‐
p70S6K ratio | rs7644436 | GRM7 | 0.060 | 0.059 | 0.408
p70S6K ratio | rs7651591 | GRM7 | 0.876 | 0.660 | 0.078
p70S6K ratio | rs7651971 | GRM7 | 0.131 | 0.017 | 0.776
p70S6K ratio | rs9818072 | GRM7 | 0.796 | 0.796 | ‐
p70S6K ratio | rs9838115 | GRM7 | 0.401 | 0.401 | ‐
p70S6K ratio | rs9870018 | GRM7 | 0.629 | 0.486 | 0.957
p70S6K ratio | rs9872244 | GRM7 | 0.877 | 0.980 | 0.645
p70S6K ratio | rs9876241 | GRM7 | 0.151 | 0.145 | 0.822
p70S6K ratio | rs9882058 | GRM7 | 0.127 | 0.156 | 0.445
p70S6K ratio | rs9882865 | GRM7 | 0.074 | 0.093 | 0.401
p70S6K ratio | rs1124376 | KAT2B | 0.785 | 0.785 | ‐
p70S6K ratio | rs12639078 | KAT2B | 0.135 | 0.019 | 0.508
p70S6K ratio | rs1610186 | KAT2B | 0.584 | 0.902 | 0.358
p70S6K ratio | rs1915919 | KAT2B | 0.297 | 0.622 | 0.172
p70S6K ratio | rs2623074 | KAT2B | 0.140 | 0.221 | 0.216
p70S6K ratio | rs2929401 | KAT2B | 0.974 | 0.323 | 0.284
p70S6K ratio | rs2929402 | KAT2B | 0.471 | 0.674 | 0.475
p70S6K ratio | rs2929404 | KAT2B | 0.538 | 0.207 | 0.199
p70S6K ratio | rs2929408 | KAT2B | 0.208 | 0.320 | 0.216
p70S6K ratio | rs2948083 | KAT2B | 0.903 | 0.896 | 0.528
p70S6K ratio | rs2948089 | KAT2B | 0.780 | 0.780 | ‐
p70S6K ratio | rs2948097 | KAT2B | 0.477 | 0.229 | 0.633
p70S6K ratio | rs4858754 | KAT2B | 0.570 | 0.183 | 0.728
p70S6K ratio | rs6765791 | KAT2B | 0.289 | 0.449 | 0.294
p70S6K ratio | rs6806287 | KAT2B | 0.488 | 0.488 | ‐
p70S6K ratio | rs9874923 | KAT2B | 0.963 | 0.966 | 0.728
p70S6K ratio | rs10510497 | RAB5A | 1.000 | 0.837 | 0.501
p70S6K ratio | rs11128927 | RAB5A | 0.992 | 0.584 | 0.388
p70S6K ratio | rs11128930 | RAB5A | 0.826 | 0.492 | 0.426
p70S6K ratio | rs12488378 | RAB5A | 0.862 | 0.769 | 0.952
p70S6K ratio | rs13072891 | RAB5A | 0.960 | 0.501 | 0.546
p70S6K ratio | rs13081007 | RAB5A | 0.184 | 0.345 | 0.138
p70S6K ratio | rs13085694 | RAB5A | 0.784 | 0.428 | 0.499
p70S6K ratio | rs2127956 | RAB5A | 0.913 | 0.604 | 0.330
p70S6K ratio | rs2929344 | RAB5A | 0.813 | 0.422 | 0.291
p70S6K ratio | rs2929346 | RAB5A | 0.814 | 0.240 | 0.122
p70S6K ratio | rs4241540 | RAB5A | 0.838 | 0.729 | 0.229
p70S6K ratio | rs4858660 | RAB5A | 0.915 | 0.930 | 0.807
p70S6K ratio | rs6778866 | RAB5A | 0.779 | 0.779 | ‐
p70S6K ratio | rs6790199 | RAB5A | 0.560 | 0.560 | ‐
p70S6K ratio | rs7613136 | RAB5A | 0.849 | 0.975 | 0.774
p70S6K ratio | rs9810613 | RAB5A | 0.848 | 0.994 | 0.502
p70S6K ratio | rs9858341 | RAB5A | 0.922 | 0.525 | 0.102
p70S6K ratio | rs2454436 | RAF1 | 0.463 | 0.621 | 0.529
p70S6K ratio | rs2596831 | RAF1 | 0.163 | 0.224 | 0.319
p70S6K ratio | rs3729931 | RAF1 | 0.391 | 0.832 | 0.113
p70S6K ratio | rs3773341 | RAF1 | 0.178 | 0.048 | 0.699
p70S6K ratio | rs5746223 | RAF1 | 6.35 × 10^-5** | 6.35 × 10^-5** | ‐
p70S6K ratio | rs713178 | RAF1 | 4.57 × 10^-4* | 5.65 × 10^-4* | 0.356
p70S6K ratio | rs7956 | RAF1 | 0.384 | 0.106 | 0.199
p70S6K ratio | rs9855183 | RAF1 | 6.35 × 10^-5** | 6.35 × 10^-5** | ‐
p70S6K ratio | rs166538 | VHL | 0.449 | 0.859 | 0.345
p70S6K ratio | rs265318 | VHL | 0.067 | 0.077 | 0.534
p70S6K ratio | rs388600 | VHL | 0.121 | 0.120 | 0.427

* Statistically significant at Bonferroni‐corrected threshold
** Statistically significant using false discovery rate of 5 percent
+ Invariant in individuals analyzed for this phenotype


Chapter 5

A search for loci influencing the PI3K/AKT/mTOR pathway signaling response to inhibitors of pathway members


5.1 Abstract

The PI3K/AKT/mTOR signaling pathway is an attractive target for the development of anticancer agents, and some compounds, including analogs of rapamycin (also known as “rapalogs”), are already being used clinically. However, inter‐individual response to these drugs is mixed, with as many as half of individuals demonstrating an unexpected increase in PI3K/AKT/mTOR activity following rapalog treatment. The studies presented in this chapter utilized a collection of cell lines that was used successfully in Chapters 2 and 4 to identify genetic variants that influence baseline cell signaling. These cell lines were treated with LY294002 (a PI3K inhibitor) or rapamycin, and three signaling proteins (AKT1, p70S6K and 4E‐BP1) were measured in two different ways to produce a total of twelve drug response phenotypes. Though these phenotypes were variable and some appeared to confirm previous clinical findings, their heritabilities were very low. The most heritable phenotypes, alternate definitions of the AKT1 response to rapamycin, displayed heritabilities between 13 percent and 31 percent, but they produced no statistically significant or suggestive linkage peaks. Though these experiments provided valuable information about the PI3K/AKT/mTOR signaling pathway, they ultimately failed to identify genetic variants that could be tested for potential clinical use in predicting patient response to PI3K/AKT/mTOR pathway inhibitors.


5.2 Introduction

The PI3K/AKT/mTOR pathway is a well‐characterized pro‐growth signaling pathway. It is commonly upregulated in cancer, and as such, it is a high priority target for the development of anticancer drugs [Garcia‐Echeverria and Sellers 2008; Liu, et al. 2009]. Currently, two types of drugs are clinically available that target this pathway for the treatment of cancer: growth factor receptor inhibitors and analogs of the mTOR inhibitor rapamycin [Garcia‐Echeverria and Sellers 2008; Liu, et al. 2009]. Growth factor receptor inhibitors target the pathway at its origin, by inhibiting the transmembrane receptors that initiate the signaling cascade [Cheng, et al. 2005; Liu, et al. 2009]. However, they can be ineffective if any downstream members of the pathway have developed mutations that remove their dependence on growth factor activation [Hynes and Dey 2009; Hynes and Lane 2005]. Efforts have therefore recently focused on developing treatments that target one or more intermediate members of the PI3K/AKT/mTOR pathway.

Rapamycin and its analogs (often referred to as “rapalogs”) represent a group of drugs that target mTOR complex 1 (mTORC1), one such intermediate pathway member. Rapamycin is a naturally occurring compound that was originally isolated from Streptomyces hygroscopicus, a bacterium found in the soil of Easter Island (Rapa Nui). It was first used clinically as an antifungal agent and then as an immunosuppressant before being investigated for its antiproliferative properties [Albert, et al. ; Liu, et al. 2009]. It exerts its influence by binding both the FKBP12 and mTOR proteins [Albert, et al. ; Choi, et al. 1996; Liang, et al. 1999]. This inhibits mTORC1‐dependent phosphorylation of proteins downstream of mTOR, such as p70S6K and 4E‐BP1 [Foster and Toschi 2009; Liu, et al. 2009]. The analogs of rapamycin (rapalogs) work similarly and include drugs such as everolimus and temsirolimus, both of which are approved for the treatment of some cancer types [Albert, et al. ; Liu, et al. 2009]. Other rapalogs, such as deforolimus, are also in clinical trials for cancer [Albert, et al. ; Liu, et al. 2009].

PI3K is another intermediate pathway member that has been the focus of recent drug discovery efforts. Early attempts to find PI3K inhibitors identified naturally occurring compounds including and wortmannin [Cheng, et al. 2005; Garcia‐Echeverria and Sellers 2008; Ihle and Powis 2009]. These structures were modified to create additional PI3K inhibitors, one of which was LY294002. LY294002 was successful at inhibiting growth of tumor xenografts as well as cultured cell lines, and it has become a popular tool for laboratory studies of the PI3K/AKT/mTOR pathway [Cheng, et al. 2005; Garcia‐Echeverria and Sellers 2008; Ihle and Powis 2009]. However, it is not used clinically, as it displays a poor pharmacological profile [Garcia‐Echeverria and Sellers 2008; Garlich, et al. 2008]. Additionally, it appears that LY294002 is not a specific inhibitor of PI3K; rather, it may inhibit other kinases within the PI3K/AKT/mTOR pathway as well as kinases in different pathways [Garcia‐Echeverria and Sellers 2008; Ihle and Powis 2009]. Regardless of these shortcomings, LY294002 continues to be used extensively in laboratory studies, and its structure has provided a starting point for the creation of better clinical candidate drugs, including SF‐1126 [Altomare, et al. 2004; Garlich, et al. 2008; Hartmann, et al. 2009].

One problem encountered in clinical trials of PI3K/AKT/mTOR‐targeting drugs has been resistance of some tumors to these drugs. As mentioned above, resistance of tumors to drugs targeting growth factor receptors has been attributed to mutations identified in downstream pathway members. Resistance also occurs in many individuals who have been treated with rapalogs, and the mechanism for this is less well understood [Albert, et al.]. For example, in approximately 50 percent of patients treated with rapalogs, an increase in the phosphorylation of AKT1 was observed [Cloughesy, et al. 2008; Tabernero, et al. 2008]. Possible mechanisms for this resistance include rapalog‐induced activation of the RAF/ERK/MEK signaling pathway and of mTORC2, which lies upstream of AKT1 [Albert, et al. ; Carracedo, et al. 2008]. While resistance to growth factor receptor inhibitors can be predicted somewhat by screening tumors for activated downstream proteins, there is currently no way to predict which individuals will respond to treatment with rapalogs and which will suffer from increased pathway activation following treatment. Similar issues may well exist for other PI3K/AKT/mTOR inhibitors that are still in preclinical development or in clinical trials. One possible explanation for inter‐individual variation in response to any drug is the different genetic makeup of individuals in a population. As discussed in Chapter 1, pharmacogenomics has been successful in identifying individuals who will respond best to several different anticancer treatments, and it may do the same for PI3K/AKT/mTOR‐targeting drugs.

Chapters 2 and 4 have shown that genetic influences on the baseline activation levels of several PI3K/AKT/mTOR pathway members do exist in a lymphoblastoid cell line model. Specifically, the baseline ratios of phosphorylated to total AKT1 and p70S6K were found to be moderately heritable. The AKT1 ratio trait mapped to a region of chromosome 14 containing AKT1 and HSP90AA1, and a variant in HSP90AA1 was statistically significantly associated with the phenotype. Both the AKT1 and p70S6K ratio traits mapped to chromosome 3, and several variants under that linkage peak were found to be statistically significantly associated with the p70S6K ratio phenotype. These included three SNPs in RAF1, which encodes the c‐Raf signaling protein, and one SNP in GRM7, a glutamate receptor.

Since inherited genetic variation was shown to be involved in mediating baseline PI3K/AKT/mTOR activity, the effects of genetic variation on response to inhibition of this pathway were investigated here. Cell lines from related individuals in the CEPH collection were grown and treated with rapamycin, LY294002 or a vehicle control, and protein was harvested following drug treatment. ELISAs were used to obtain quantitative measurements of total and phosphorylated AKT1, p70S6K and 4E‐BP1 under each treatment condition. Combinations of these measurements produced twelve initial phenotypes, which were then assessed using heritability and linkage analysis. The results of these analyses are presented here, and their implications for the clinical use of PI3K/AKT/mTOR inhibitors are discussed.

5.3 Materials and Methods

5.3.1 Cell culture, protein preparation and ELISA assays

A total of 121 lymphoblastoid cell lines from 14 families in the CEPH (Centre d’Etude du Polymorphisme Humain) collection were used in this study. Cells were seeded at a density of 250,000 cells/ml and a total volume of 20 ml in duplicate vented T‐25 flasks. Growth media consisted of RPMI (Invitrogen) plus 10% fetal bovine serum (Sigma). After 47 hours, LY294002 (10 µM, Cayman Chemical), rapamycin (1 µM, LKT Laboratories) or vehicle (0.02 percent DMSO, Sigma) was added. At this timepoint, cells in a separate flask were counted to measure the growth rate. Treated cells were incubated for an additional hour, and then protein lysates were harvested as described previously (see Chapter 2, section 2.3.3). Concentrated lysates were then diluted with lysis buffer to a concentration of 20 µg/µl and refrozen in stock plates at ‐80°C.

ELISAs were performed using kits for measuring phospho‐AKT1 [pS473], total AKT1, phospho‐p70S6K [pT389], total p70S6K, phospho‐4E‐BP1 [pT46] and total 4E‐BP1 (Invitrogen/Biosource). Equal amounts of protein from each of the protein samples were assayed in triplicate, and recombinant protein standards and pooled lysate standards were present on each 96‐well plate for quality control purposes. Protocols provided for each kit were followed exactly. Final absorbance at 450 nm was read using a Beckman Coulter DTX880 plate reader.

5.3.2 Data cleaning, basic statistics and phenotype definitions

To calculate the growth rate for each cell line, the concentration of cells/ml in the flask set aside for growth rate measurement was divided by the initial concentration of cells (250,000 cells/ml). For the protein phenotypes determined by ELISA, triplicate OD(450) values were averaged for each sample as well as the pooled standards. Inter‐plate differences were adjusted for by using the pooled protein standards, which were present in eight replicates across the 14 96‐well plates required for each ELISA. The average of all replicates across all plates was determined, and so was the average of the eight replicates within each plate. The difference between the experiment‐wide average and each plate’s average was added to all 96 samples within that plate. In most cases, across‐plate differences were minor and corresponded with differences of several minutes between reading times for each plate. Ratios of phosphorylated to total protein were calculated by simply dividing the two measurements.
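The growth rate calculation and the additive inter‐plate adjustment described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the code used for the original analysis: the function names, plate labels and OD values are all hypothetical.

```python
from statistics import mean

INITIAL_CONCENTRATION = 250_000  # cells/ml at seeding, as stated in the text


def growth_rate(cells_per_ml):
    """Concentration in the growth-rate flask divided by the seeding concentration."""
    return cells_per_ml / INITIAL_CONCENTRATION


def average_replicates(od_values):
    """Average the replicate OD(450) readings for one sample."""
    return mean(od_values)


def adjust_plates(samples_by_plate, pooled_by_plate):
    """Add (experiment-wide pooled mean - plate pooled mean) to every sample on a plate."""
    all_pooled = [od for ods in pooled_by_plate.values() for od in ods]
    grand_mean = mean(all_pooled)
    adjusted = {}
    for plate, samples in samples_by_plate.items():
        offset = grand_mean - mean(pooled_by_plate[plate])
        adjusted[plate] = {name: od + offset for name, od in samples.items()}
    return adjusted


# Hypothetical toy data: two plates with two pooled-standard replicates each
# (the study used eight replicates per plate across 14 plates).
pooled = {"plate1": [1.0, 1.2], "plate2": [0.8, 1.0]}
samples = {
    "plate1": {"line_A": average_replicates([0.49, 0.50, 0.51])},
    "plate2": {"line_B": average_replicates([0.50, 0.50, 0.50])},
}
adjusted = adjust_plates(samples, pooled)
```

In this toy case the plate that read high on its pooled standards is shifted down and the plate that read low is shifted up, so that both plates share the same pooled‐standard average after adjustment.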

Drug response phenotypes were defined in two ways: (1) by dividing the phosphorylated:total protein ratio for the drug‐treated samples by the ratio for the vehicle‐treated samples (referred to here as the “ratio” method) and (2) by dividing the phosphorylated protein measurement for the drug‐treated samples by the phosphorylated protein measurement for the vehicle‐treated samples (referred to here as the “phospho ratio” method). This made for a total of twelve phenotypes: phosphorylation of AKT1, p70S6K and 4E‐BP1, after treatment with either LY294002 or rapamycin, defined using either the phosphorylated:total protein ratios or just the measurements of phosphorylated protein. Standard deviations were calculated for ratios based on the formula for the variance of an unbiased ratio estimator presented by van Kempen and van Vliet [van Kempen and van Vliet 2000].
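The two phenotype definitions, and the way they combine with three proteins and two drugs to give twelve phenotypes, can be illustrated with a short sketch. The Python below is illustrative only; the function name, dictionary keys and measurement values are hypothetical and are not taken from the original analysis.

```python
from itertools import product

PROTEINS = ("AKT1", "p70S6K", "4E-BP1")
DRUGS = ("LY294002", "rapamycin")
METHODS = ("ratio", "phospho ratio")


def drug_response(drug, vehicle, method):
    """Compute one drug response phenotype from adjusted ELISA measurements.

    `drug` and `vehicle` are dicts with 'phospho' and 'total' entries.
    """
    if method == "ratio":
        # (phospho/total under drug) divided by (phospho/total under vehicle)
        return (drug["phospho"] / drug["total"]) / (vehicle["phospho"] / vehicle["total"])
    if method == "phospho ratio":
        # phospho under drug divided by phospho under vehicle
        return drug["phospho"] / vehicle["phospho"]
    raise ValueError(f"unknown method: {method}")


# Three proteins x two drugs x two definitions = twelve phenotypes.
phenotype_labels = list(product(PROTEINS, DRUGS, METHODS))

# Hypothetical measurements for one cell line and one protein.
vehicle = {"phospho": 0.8, "total": 1.6}
treated = {"phospho": 0.4, "total": 2.0}
ratio_response = drug_response(treated, vehicle, "ratio")
phospho_response = drug_response(treated, vehicle, "phospho ratio")
```

The two definitions diverge whenever the total protein measurement differs between the drug‐treated and vehicle‐treated samples, as it does in this hypothetical example.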

Correlation calculations were performed using SAS version 9.1.3 (SAS Institute). Unless otherwise specified, all correlations mentioned in the text are Pearson correlations calculated using PROC CORR. When mentioned, Spearman correlations were calculated using the “Spearman” option of PROC CORR.
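For readers without access to SAS, the distinction between the two correlation measures can be shown with a small pure‐Python sketch. This is not the PROC CORR implementation; it is a textbook version, with Spearman computed as the Pearson correlation of average ranks, and the input values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear relationship.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 4.0, 9.0, 16.0]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Because the hypothetical relationship is monotonic but not linear, the rank‐based measure is at its maximum while the Pearson correlation falls below it.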

5.3.3 Data transformation and heritability analyses

Genetic analyses were conducted using the SOLAR software package. To adjust the data to meet SOLAR’s requirements, each of the twelve phenotypes was assessed for normality and transformed with a square root or logarithmic transformation if necessary. If phenotypes still did not achieve normality, outliers were winsorized to the most extreme value that would still allow the distribution to be normally distributed. Phenotypes were also multiplied by a constant to achieve a larger standard deviation if SOLAR required it. To assess heritability for these phenotypes, polygenic models were created in SOLAR. Several covariates were screened for each phenotype, including age, sex, age × sex, age², age² × sex, growth rate of the cells (defined by the cell count at the time of protein harvest divided by the initial cell concentration) and passage number. The “polygenic -screen” command was used to screen for statistically significant covariates and to estimate the additive genetic heritability.
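The phenotype preparation steps (transformation, winsorization and rescaling) can be sketched in Python as shown below. This is a simplified illustration under stated assumptions: the winsorization bounds are passed in by hand, whereas in the analysis they were chosen as the most extreme values compatible with normality, and the normality assessment itself is not reproduced. All function names and values are hypothetical.

```python
import math


def transform(values, kind):
    """Apply a square root or natural log transformation, or none."""
    if kind == "sqrt":
        return [math.sqrt(v) for v in values]
    if kind == "log":
        return [math.log(v) for v in values]
    if kind == "none":
        return list(values)
    raise ValueError(f"unknown transformation: {kind}")


def winsorize(values, lower, upper):
    """Pull values outside [lower, upper] in to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def rescale(values, constant):
    """Multiply a phenotype by a constant to enlarge its standard deviation."""
    return [v * constant for v in values]


# Hypothetical phenotype with one high and one low outlier.
raw = [0.5, 1.0, 1.2, 9.0]
clipped = winsorize(raw, lower=0.6, upper=2.0)
scaled = rescale(clipped, 10)
```

Winsorizing rather than deleting outliers keeps every cell line in the pedigree analysis, which matters when family sizes are small.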

5.3.4 Linkage analysis

Multipoint linkage analysis was conducted using SOLAR for traits with heritability measures greater than 10 percent. Genotypes for 126 CEPH individuals in the 14 families studied were obtained from version 10.0 of the CEPH genotype database, which includes both microsatellite markers and SNPs. Genetic locations of these markers were determined using data from the Marshfield map. The total number of genotypes available for these markers was 95,278 at 7,403 loci. Thirty markers with Mendelian inconsistencies or other errors were identified by SOLAR and removed. Before running the linkage analysis, the “lodadj -calc” command was used to create a distribution of LOD scores under the null hypothesis of no linkage.

5.3.5 Candidate gene selection

For the “RAratio” phenotype, CANDID was used to identify candidate genes within the interval containing the maximum LOD score identified by linkage analysis. This region extended from 26 cM to 60 cM on chromosome 22 and contained 363 genes. Keywords used for CANDID were AKT, signaling, phosphatase, kinase, mTOR, PI3K, phosphatidyl, phosphoinositide, p70S6K, AKT1, rapamycin, sirolimus, temsirolimus, rapalog, PTEN, PDK1, PP2A, FKBP‐12, FRAP1, FKBP12, mTORC1 and mTORC2. Weights were 1 for literature, 0.15 for domains, 0.1 for conservation, 0.001 for linkage and 0.8 for interactions. The linkage file assigned the same linkage score to all genes within the previously mentioned region, and analysis was limited to only these genes. Analysis was not limited to only protein‐coding genes. Version 5 of the CANDID databases was used for this analysis.

5.4 Results

5.4.1 Variation in PI3K/AKT/mTOR drug response phenotypes

ELISAs were used to measure the phosphorylated and total levels of three proteins (AKT1, p70S6K and 4E‐BP1) in cell lines that had been subjected to a number of conditions. Three of these conditions were treatment with DMSO (as a vehicle control), LY294002 (a PI3K inhibitor) and rapamycin (an mTORC1 inhibitor).

From the quantitative ELISA results, a total of twelve phenotypic measurements were calculated for each cell line. These consisted of measurements for each of the three proteins, for treatment with each of the two drugs, determined in two separate ways (as described in section 5.3.2). One of the original 121 cultured cell lines failed to produce enough protein to be used for the ELISAs, leaving a total of 120 cell lines for which these 12 phenotypes were measured.
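The two calculation methods can be illustrated with a short Python sketch that follows the definitions given in the caption of Figure 5.1. The function names and the ELISA readings are hypothetical and serve only to show the arithmetic.

```python
def ratio_phenotype(phos_drug, total_drug, phos_vehicle, total_vehicle):
    """'ratio' method: (phospho/total) in the drug-treated sample divided by
    (phospho/total) in the vehicle-treated sample."""
    return (phos_drug / total_drug) / (phos_vehicle / total_vehicle)

def phospho_ratio_phenotype(phos_drug, phos_vehicle):
    """'phospho ratio' method: phosphorylated protein in the drug-treated sample
    divided by phosphorylated protein in the vehicle-treated sample."""
    return phos_drug / phos_vehicle

# Hypothetical readings for one protein in one cell line (arbitrary units).
ratio_value = ratio_phenotype(phos_drug=30.0, total_drug=100.0,
                              phos_vehicle=60.0, total_vehicle=120.0)
phospho_value = phospho_ratio_phenotype(phos_drug=30.0, phos_vehicle=60.0)
```

With these made‐up readings the two methods give different values (0.6 and 0.5) because the total protein level also differs between the drug‐treated and vehicle‐treated samples.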

The variation of these 12 phenotypes in a subset of 24 unrelated individuals is shown in Figure 5.1. The average variation for all of the phenotypes, as determined for each phenotype by calculating the fold change between the highest and lowest values in this subset, is 2.834. The least variable phenotype, with a fold change of 1.818, is the p70S6K response to LY294002 treatment as determined by the “ratio” method. The most variable phenotype, the AKT1 response to LY294002 as determined by the “phospho ratio” method, had a fold change of 4.913. In general, the phenotypes determined by the “phospho ratio” method had a higher range of variation than did their respective “ratio” method phenotypes. In fact, the only phenotype that displayed less variation when calculated by the “phospho ratio” method was the AKT1 response to rapamycin treatment.
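The fold‐change summary described above amounts to the following calculation, shown here as a minimal Python sketch. The phenotype names and values are invented for the example and are not the measured data.

```python
def fold_change(values):
    """Fold change between the highest and lowest value of one phenotype."""
    return max(values) / min(values)

def mean_fold_change(phenotypes):
    """Average of the per-phenotype fold changes."""
    changes = [fold_change(v) for v in phenotypes.values()]
    return sum(changes) / len(changes)

# Two hypothetical phenotypes measured in a handful of individuals.
example = {
    "phenotype_a": [0.5, 1.0, 1.5],   # fold change 3.0
    "phenotype_b": [0.4, 0.6, 0.8],   # fold change 2.0
}
average = mean_fold_change(example)
```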

A perhaps more clinically relevant way of examining the inter‐individual variation in these phenotypes is to consider the percentage of individuals with values greater than one for each phenotype. A value greater than one for the “ratio” method indicates that the ratio of phosphorylated to total protein was higher in the drug‐treated sample for that cell line than it was in the vehicle‐treated sample. Similarly, a value greater than one for the “phospho ratio” method indicates that the amount of phosphorylated protein was higher for the drug‐treated sample than for the vehicle‐treated sample. As a PI3K inhibitor, LY294002 is generally expected to inhibit the phosphorylation of all three of these downstream proteins, while rapamycin would be expected to inhibit p70S6K and 4E‐BP1 while causing no change in the phosphorylation of AKT1.
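Counting the individuals whose phenotype exceeds one is a simple tally, sketched below in Python. The function name and the 24 example values are hypothetical.

```python
def percent_above_one(values):
    """Percentage of individuals whose drug:vehicle phenotype is greater than one."""
    above = sum(1 for v in values if v > 1.0)
    return 100.0 * above / len(values)

# 24 hypothetical individuals, three of whom show an increase after treatment.
values = [0.5] * 21 + [1.2] * 3
percent = percent_above_one(values)
```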

The choice of phenotype calculation method appeared to have little effect on the percentage of individuals with scores greater than one. For both calculation methods, the responses of AKT1 and p70S6K to LY294002 contained values greater than one for between 16 percent and 33 percent of unrelated individuals. The 4E‐BP1 inhibition in response to LY294002, however, was more robust; the percentages of individuals with values greater than one were 12.5 percent for the “phospho ratio” phenotype and 0 percent for the “ratio” phenotype. In fact, the maximum value for the “ratio” phenotype was 0.71, indicating that all of the individuals in this subset experienced a sizeable decrease in the ratio of phosphorylated to total 4E‐BP1 following LY294002 treatment.

The p70S6K and 4E‐BP1 responses to rapamycin contained similar percentages of individuals with values greater than one. Depending on the method for calculating the phenotype, between 29 percent and 38 percent of unrelated individuals displayed values greater than one for the responses of these two proteins to rapamycin. In line with its position upstream of rapamycin’s target, AKT1 experienced less inhibition of phosphorylation following rapamycin treatment. Between 46 percent and 58 percent of unrelated individuals possessed values greater than 1 for the AKT1 response to rapamycin treatment.

The findings for the AKT1 response to rapamycin were consistent with results from clinical trials of rapamycin and other rapamycin analogs in which approximately 50 percent of individuals exhibited an increase in PI3K/AKT/mTOR pathway activation. To explore this further, the “ratio” and “phospho ratio” AKT1‐rapamycin phenotypes were examined in the entire collection of 120 phenotyped cell lines (Figure 5.2). The results in the entire collection were similar to those seen in the unrelated individuals, with 47 percent of individuals displaying a value greater than one for the “ratio” phenotype (Figure 5.2A) and 56 percent of individuals displaying a value greater than one for the “phospho ratio” phenotype (Figure 5.2B). As shown by the error bars in Figure 5.2, many of these values are significantly greater than one. This substantial increase in phosphorylation was unexpected given that rapamycin treatment is generally not thought to alter the phosphorylation of AKT1 because it targets mTORC1, which is downstream of AKT1.

5.4.2 Relationships between phenotypes

Given the great deal of data amassed from these experiments and the experiments using the same cell lines described in Chapter 2, it was possible to investigate relationships between some of the phenotypes obtained. First, the correlations between the “ratio” and “phospho ratio” phenotypes were determined. For all three proteins and both drug conditions, the correlations between the phenotypes determined by these two methods were highly statistically significant (Table 5.1). However, the correlation coefficients were not necessarily high, ranging from 0.385 for the AKT1 response to LY294002 to 0.716 for the p70S6K response to rapamycin. These results indicated that the “ratio” and “phospho ratio” phenotypes were related to each other but could not be easily substituted for each other, so subsequent analyses generally used both methods for calculating the phenotypes.

Another question stemming from the design of these experiments was whether a cell line would respond similarly to LY294002 and rapamycin. To test whether this was the case, Spearman rank correlations were calculated between the LY294002 “ratio” phenotype and the rapamycin “ratio” phenotype for each protein. All three correlations were statistically significant, but the correlation between these two phenotypes was weakest for 4E‐BP1, with a correlation coefficient of 0.25236 and a P‐value of 0.0061. The correlations for AKT1 and p70S6K were both highly statistically significant, with P‐values less than 0.0001 and correlation coefficients of 0.53384 (AKT1) and 0.53075 (p70S6K). The positive and statistically significant correlations for all three proteins indicated that, in general, cell lines respond similarly to LY294002 and rapamycin.
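For readers unfamiliar with rank correlation, the following Python sketch shows how a Spearman coefficient can be computed: each variable is converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is taken. It is a stand‐in for illustration only; the analysis reported here used the “Spearman” option of PROC CORR in SAS, and the function names below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1   # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only the ordering of values matters, a monotonic but non‐linear relationship such as `spearman([1, 2, 3, 4], [1, 4, 9, 16])` still yields a coefficient of 1.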

In these experiments, growth rate information was collected for each cell line in parallel with the protein extraction process. These data made it possible to examine the Spearman correlations between the growth rate and all twelve drug response phenotypes (Table 5.2). Only two of the twelve correlations were statistically significant after Bonferroni correction – the p70S6K “phospho ratio” response to rapamycin and the 4E‐BP1 “ratio” response to LY294002. The correlation coefficient for the p70S6K “phospho ratio” response to rapamycin was 0.26487, indicating that faster‐growing cells would have higher levels of phosphorylated p70S6K following treatment with rapamycin compared to treatment with vehicle. In contrast, the correlation coefficient for the 4E‐BP1 “ratio” response to LY294002 was ‐0.27176, indicating that faster‐growing cells would have a lower ratio of phosphorylated to total 4E‐BP1 following treatment with LY294002 compared to treatment with vehicle. It is interesting that one of these statistically significant correlations is positive while the other is negative, since both proteins lie downstream of mTORC1 and would be predicted to behave similarly in response to both drugs.

In addition to examining the relationship between cell line growth rate and drug response, the growth rate data were also compared with the baseline protein data analyzed in Chapter 2. This dataset contained a total of nine phenotypes: measurements of total protein, phosphorylated protein and the ratio of phosphorylated to total protein for AKT1, p70S6K and 4E‐BP1. None of these nine phenotypes displayed a statistically significant Spearman correlation with the growth rate phenotype (Table 5.3). However, one phenotype approached statistical significance – the amount of phosphorylated 4E‐BP1 protein (P = 0.0091, Bonferroni‐corrected threshold for nine tests = 0.0056).
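The Bonferroni‐corrected thresholds quoted in this chapter are consistent with dividing a nominal significance level by the number of tests. The sketch below assumes a nominal level of 0.05, which is not stated explicitly in this section but reproduces the reported thresholds of 0.004 (twelve tests), 0.0056 (nine tests) and 0.0083 (six tests). The function names are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold; alpha = 0.05 is an assumed nominal level."""
    return alpha / n_tests

def passes(p_value, n_tests, alpha=0.05):
    """True if the P-value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(n_tests, alpha)

threshold_nine = bonferroni_threshold(9)     # about 0.0056
phospho_4ebp1 = passes(0.0091, n_tests=9)    # phosphorylated 4E-BP1 vs growth rate
```

Under this assumption the phosphorylated 4E‐BP1 result (P = 0.0091) falls short of the nine‐test threshold, matching the description of it as approaching but not reaching significance.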

Finally, the last set of correlation tests conducted was to investigate whether any correlations existed between the baseline protein phenotypes and the drug response phenotypes. Rather than assess the correlation of each of the nine baseline phenotypes against each of the twelve drug response phenotypes, only the baseline phosphorylated to total protein ratio for each protein was tested for possible correlations with the drug response “ratio” phenotypes for that protein for LY294002 and rapamycin. This created two correlation tests per protein, for a total of six tests. Only one of these six tests had a P‐value lower than the Bonferroni‐corrected threshold of 0.0083 – the correlation between the baseline phosphorylated to total AKT1 ratio and the AKT1 “ratio” response to LY294002 (Table 5.4). These two phenotypes had a correlation coefficient of ‐0.28340 and a P‐value of 0.0016, indicating that cell lines with a higher baseline ratio of phosphorylated to total AKT1 tend to have a lower ratio of phosphorylated to total AKT1 following LY294002 treatment compared to vehicle treatment.

5.4.3 Heritabilities of twelve drug response phenotypes

After confirming that variation existed in the 120 cell lines assayed for all twelve drug response phenotypes, the heritability for each phenotype was assessed. Heritabilities were extremely low for most phenotypes, with nine of the twelve phenotypes possessing a heritability measurement of 0 (Table 5.5). All four of the p70S6K phenotypes had heritabilities of 0, as did two of the AKT1 phenotypes and three of the 4E‐BP1 phenotypes. The three phenotypes with non‐zero heritabilities were the AKT1 “ratio” and “phospho ratio” responses to rapamycin and the 4E‐BP1 “ratio” response to rapamycin. The heritabilities were 13.93 percent for the AKT1 “ratio” response to rapamycin (nicknamed the “RAratio” phenotype), 13.51 percent for the AKT1 “phospho ratio” response to rapamycin (nicknamed the “RAphos” phenotype) and 5.90 percent for the 4E‐BP1 “ratio” response to rapamycin (nicknamed the “R4ratio” phenotype). The “RAratio” and “RAphos” heritabilities were in the low to moderate range, and the “R4ratio” heritability, while non‐zero, was still very low. None of the tested covariates were statistically significant for the “RAratio” phenotype, but cell line passage number and the interaction between age² and sex were statistically significant for the “RAphos” phenotype. Statistically significant covariates for the “R4ratio” phenotype were age, sex, age², the interaction between age² and sex and cell line passage number.

Since the heritability for the “R4ratio” phenotype was relatively low, only the AKT1 response to rapamycin phenotypes were used for further analyses. Because these two phenotypes had the highest heritabilities, alternate definitions of the AKT1 response to rapamycin were tested to see whether they would result in increased heritability. The first of these alternate phenotypes, the “simple” phenotype, was a binary phenotype in which individuals with values greater than zero for the log‐transformed “RAratio” phenotype (corresponding to an original “RAratio” value greater than one) were assigned a value of 1, and all other individuals were assigned a value of 0. To avoid potential misclassification of individuals whose log‐transformed “RAratio” values were very close to zero, the “non‐zero” phenotype assigned individuals with log‐transformed “RAratio” values greater than 0.1 a value of 1, and individuals with log‐transformed “RAratio” values less than ‐0.1 were assigned a value of 0. Individuals whose log‐transformed “RAratio” values fell between ‐0.1 and 0.1 were not assigned a value for the “non‐zero” phenotype. The third method was an extension of the “non‐zero” method, except that the cutoff on either side of 0 was equal to one standard deviation of the log‐transformed “RAratio” phenotype (or 0.27) rather than 0.1. For this “stdev” phenotype, individuals with log‐transformed “RAratio” values between ‐0.27 and 0.27 were not assigned “stdev” values. Finally, to separate the individuals with very high log‐transformed “RAratio” values from everyone else, the “pos stdev” method separated individuals with log‐transformed “RAratio” values greater than 0.27 (assigned a value of 1) from everyone else (assigned a value of 0). A summary of these four alternate phenotypes is shown in Table 5.6.
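The four alternate definitions can be written as small classification rules, sketched below in Python. The function names and example values are hypothetical, `None` is used here to mark individuals left unassigned, and the handling of values that fall exactly on a cutoff is an assumption because the text does not specify it.

```python
def simple(log_ratio):
    """'simple': 1 if the log-transformed RAratio is above zero, otherwise 0."""
    return 1 if log_ratio > 0 else 0

def banded(log_ratio, cutoff):
    """'non-zero' (cutoff 0.1) and 'stdev' (cutoff 0.27): values inside the
    band around zero are left unassigned (None)."""
    if log_ratio > cutoff:
        return 1
    if log_ratio < -cutoff:
        return 0
    return None

def pos_stdev(log_ratio, cutoff=0.27):
    """'pos stdev': 1 if above the positive cutoff, 0 for everyone else."""
    return 1 if log_ratio > cutoff else 0

# Hypothetical log-transformed RAratio values for five individuals.
log_ratios = [-0.5, -0.05, 0.05, 0.2, 0.4]
simple_calls = [simple(v) for v in log_ratios]
non_zero_calls = [banded(v, 0.1) for v in log_ratios]
stdev_calls = [banded(v, 0.27) for v in log_ratios]
pos_stdev_calls = [pos_stdev(v) for v in log_ratios]
```

The example shows how the stricter definitions trade sample size for cleaner group separation: the “stdev” rule leaves three of the five hypothetical individuals unassigned.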

The heritabilities for these four alternate AKT1 response to rapamycin phenotypes are also shown in Table 5.6. The heritabilities range from 16.21 percent (“simple” phenotype) to 30.09 percent (“pos stdev” phenotype), representing an increase over the previous heritabilities of 13.93 percent and 13.51 percent for the “RAratio” and “RAphos” phenotypes, respectively. None of the tested covariates were statistically significant for any of these four alternate phenotypes.

5.4.4 Linkage analysis of heritable phenotypes

Linkage analysis was conducted for the six AKT1 response to rapamycin phenotypes (“RAratio”, “RAphos”, “simple”, “non‐zero”, “stdev” and “pos stdev”). Despite the wide range of heritabilities, which extended up to 30.09 percent, all six phenotypes had similarly low maximum LOD scores (Table 5.7). A representative linkage analysis using the “RAratio” phenotype is shown in Figure 5.3. Out of the six phenotypes, the one with the highest LOD score was the “stdev” phenotype, with a maximum LOD score of 1.36, while the lowest maximum LOD score of 0.68 was associated with the “simple” phenotype. Additionally, there appeared to be no relationship between heritability of these phenotypes and maximum LOD scores. For example, the phenotype with the second‐highest heritability, “non‐zero” (h² = 26.85 percent), had the second‐lowest maximum LOD score (0.77). The maximum LOD score of the phenotype with the highest heritability, “pos stdev”, was 1.08, lower than the maximum LOD score for the phenotype with the lowest heritability (“RAphos”, maximum LOD score of 1.10). None of these maximum LOD scores came close to the threshold for suggestive linkage of 1.8.
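To put these values in perspective, it may help to recall the general definition of a LOD score as the base‐10 logarithm of a likelihood ratio. The notation below is introduced here for illustration and is not taken from the source:

\[
\mathrm{LOD} = \log_{10}\frac{L(\text{data}\mid\text{linkage at the tested position})}{L(\text{data}\mid\text{no linkage})}
\]

On this scale, the highest score observed here, 1.36, corresponds to a likelihood ratio of roughly $10^{1.36} \approx 23$, whereas the suggestive threshold of 1.8 corresponds to a ratio of roughly $10^{1.8} \approx 63$.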

In addition to the low magnitude of the maximum LOD scores, their chromosomal locations also suggested that these peaks were not truly biologically significant. The maximum LOD scores for all six phenotypes were located on six different chromosomes. If they had overlapped, or if a peak had been seen on the same chromosome for multiple phenotypes, it might have suggested that a locus near the peak had a small, but reliably detectable effect on the AKT1 response to rapamycin. However, this was not the case.

5.4.5 Candidate gene identification and assessment

In Chapter 2, linkage analysis for the ratio of phosphorylated to total AKT1 produced a maximum LOD score of 1.7, short of the threshold for suggestive linkage. However, subsequent analyses of this phenotype were performed because it was found that the AKT1 gene, an extremely attractive candidate gene, was located underneath that peak. In case a similar situation was present here, one of the AKT1 response to rapamycin phenotypes (“RAratio”) was selected for follow‐up analyses to see whether any strong candidate genes were located under the peak containing its maximum LOD score. This peak was located at the 41 cM position on chromosome 22. Of the 363 genes located in the interval surrounding that peak, the top 5 candidates were EP300, YWHAH, XRCC6, EWSR1 and GRAP2 (scores shown in Table 5.8).

None of the top 5 genes appear to be important in the PI3K/AKT/mTOR signaling pathway. The top gene, EP300, encodes a histone acetyltransferase called p300 that interacts with CREB to regulate transcription through chromatin remodeling [Condorelli and Giordano 1997]. The second gene, YWHAH, encodes a 14‐3‐3 scaffolding protein called 14‐3‐3eta [Yu, et al. 2002b]. It may have scored highly due to a protein‐protein interaction with mTOR‐interacting proteins TSC1 and TSC2 that was listed on the NCBI page for this gene. However, the cited publication for this interaction appears to refer to a different 14‐3‐3 protein, 14‐3‐3zeta, making it unlikely that this gene is truly relevant to PI3K/AKT/mTOR signaling [Nellist, et al. 2002]. The third gene, XRCC6, encodes a well‐characterized protein involved in DNA repair [Smider and Chu 1997]. The fourth gene, EWSR1, is often disrupted in Ewing’s sarcoma patients [Ban, et al. 2006]. Finally, the last gene, GRAP2, encodes an adaptor protein involved in T‐cell signaling [Burack, et al. 2002]. The scores of these last three genes as potential candidate genes reflected their high conservation, involvement with other kinases and interactions with a large number of proteins (none of which, however, are in the canonical PI3K/AKT/mTOR pathway). Based on the lack of a strong candidate gene, the linkage results were not pursued further, bringing the analysis of these experiments to a close.

5.5 Discussion

The work described in this chapter resulted from an attempt to apply traditional genetic analysis methods to the problem of differential responses to drugs targeting the PI3K/AKT/mTOR pathway. Using 120 cell lines, the responses of three proteins (AKT1, p70S6K and 4E‐BP1) to treatment with LY294002 (a PI3K inhibitor) and rapamycin (an mTORC1 inhibitor) were measured and defined using two different methods (the “ratio” method and the “phospho ratio” method). The resulting twelve phenotypes showed variation across the cell lines, and several relationships were observed between the drug response phenotypes and other phenotypes, including growth rate and baseline protein phosphorylation measurements. However, none of the twelve phenotypes were highly heritable.

The AKT1 response to rapamycin treatment showed moderate heritability when defined using a variety of different methods, but none of the tested definitions produced a statistically suggestive or significant linkage peak. Since the linkage peaks identified were neither statistically nor biologically significant, no further follow‐up analyses were conducted. Despite the failure to identify variants capable of predicting the response of PI3K/AKT/mTOR pathway members to inhibition by LY294002 or rapamycin, valuable information about the pathway and its behavior was still obtained from these experiments.

The results shown here represent the first large‐scale investigation into inter‐individual differences in molecular response to LY294002 and rapamycin. Variation was seen across all individuals for the twelve phenotypes investigated. The percentages of individuals who experienced increased versus decreased phosphorylation following drug treatment confirmed previous research on the PI3K/AKT/mTOR pathway. As expected, most individuals experienced decreases in the phosphorylation of AKT1, p70S6K and 4E‐BP1 in response to LY294002 and of p70S6K and 4E‐BP1 in response to rapamycin. All of these molecular targets are predicted to be inhibited by these drugs, and these results confirm those expectations. Of particular interest were the phenotypes measuring the AKT1 response to rapamycin. Rapamycin is generally not expected to influence AKT1 phosphorylation at all, and the fact that approximately half of individuals experienced an increase in AKT1 phosphorylation may at first glance appear to reflect these expectations. However, some of these individuals experienced a strong increase in AKT1 phosphorylation, which is unexpected based on the current dogma regarding rapamycin’s mechanism of action. These in vitro results confirm previous data from clinical trials in which as many as half of individuals experienced increased AKT1 phosphorylation following treatment with rapalogs.

In an attempt to better understand the nature of these phenotypes, correlations between them were explored, and several statistically significant relationships were found. First, the “ratio” and “phospho ratio” phenotypes for each molecular drug response phenotype were significantly correlated. This is somewhat to be expected, given that the two terms used to define the “phospho ratio” phenotype were both used in the definition of the “ratio” phenotype. However, the correlation coefficients for these comparisons were never greater than 0.8. This likely reflects the inter‐individual variation seen in the two other terms used to define the “ratio” phenotype – the total amounts of each protein in the drug‐treated and vehicle‐treated samples. These measurements vary across all individuals, and they are not highly correlated with the corresponding phosphorylated protein measurements for the same sample. As a result, the “ratio” phenotype can differ substantially from the “phospho ratio” phenotype.

Tests for correlations between drug response phenotypes and growth rate (measured in untreated cells) yielded two statistically significant results amidst many non‐significant results. These findings were a significant positive correlation between growth rate and p70S6K response to rapamycin and a significant negative correlation between growth rate and 4E‐BP1 response to LY294002. Both of these proteins lie downstream of mTORC1 and are predicted to respond similarly to both drugs. Therefore, there is not an immediately obvious explanation as to why their results should seem to conflict. To be sure, LY294002 and rapamycin are very different drugs, so there could be inherent differences in how they are processed by cells growing at different rates. However, one would still expect that if growth rate were correlated with the p70S6K response to rapamycin, it would also be correlated with 4E‐BP1 response to rapamycin and vice versa for the LY294002 results. In fact, though the 4E‐BP1 response to LY294002 is statistically significantly negatively correlated with growth rate, the p70S6K response to LY294002 is not. It has a trend toward statistical significance (P = 0.0513), but it appears to be trending toward a positive correlation rather than a negative one.

Furthermore, though the activation of the PI3K/AKT/mTOR pathway is often described as leading to increased cell growth, none of the baseline protein activation phenotypes were statistically significantly correlated with the measured growth rate. The baseline protein phenotypes were measured in the cell lines in a different experiment than the one in which the growth rate was measured, and small differences between cell line passage numbers and incubation conditions could be partially to blame for this lack of correlation. The conflicting nature of these results could be due to a number of factors, including errors in the measurement of the protein phenotypes and insufficient or inaccurate current knowledge of the PI3K/AKT/mTOR pathway, and they should therefore be interpreted with caution until further studies replicate or refute them.

Though baseline protein phosphorylation levels were not found to be correlated with growth rate, one baseline measurement was statistically significantly correlated with a protein drug response phenotype. There was a statistically significant negative correlation (r = ‐0.28) between the baseline ratio of phosphorylated to total AKT1 and the response of the phosphorylated to total AKT1 ratio to LY294002 treatment. This suggests that individuals with a higher baseline activation of AKT1 (as measured by the phosphorylated:total protein level) would experience a more robust inhibition of AKT1 following treatment with LY294002.

These results may be explained by differences in the activity levels of PI3K in different individuals. If LY294002 reduces PI3K to similar activity levels in all individuals, then those with higher initial PI3K activity (reflected by increased AKT1 activation) would experience a proportionally higher decrease in PI3K activity following treatment with an inhibitor like LY294002. However, it is not known whether LY294002 inhibits PI3K similarly in different individuals, so further experiments would be required to confirm this.
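This explanation can be made concrete with a deliberately simplified model, offered only as an illustration and not as a result of this study. Suppose that LY294002 brings AKT1 activation in every cell line down to a common level $c$, and let $b_i$ denote the vehicle‐treated activation of cell line $i$, taken here as a stand‐in for its baseline activation. Both symbols are introduced for this example. The “ratio” phenotype for that cell line is then

\[
R_i = \frac{c}{b_i}, \qquad \frac{\partial R_i}{\partial b_i} = -\frac{c}{b_i^{2}} < 0 .
\]

For instance, with $c = 0.2$, a cell line with $b_i = 0.4$ has $R_i = 0.5$, while one with $b_i = 0.8$ has $R_i = 0.25$. Under this assumption a higher baseline necessarily gives a lower drug:vehicle ratio, which is the direction of the observed negative correlation.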

Nine of the twelve phenotypes investigated had a heritability of zero, and the heritabilities for the other three original phenotypes were all below 14 percent. Since heritability is defined as the percentage of variation due to genetic influences, the across‐the‐board low heritabilities could reflect one of two things – either (1) that the non‐genetic variation in these experiments far outweighed the influence of genetics or (2) that there is little or no genetic influence on these phenotypes. Many attempts were made to limit non‐genetic variation, including simultaneous handling of the cell lines, consistency in reagent lot numbers across cell lines and statistical screening and adjusting for experimental covariates like passage number, sex and age. The weak evidence for genetic influence that was found, in terms of the small linkage peaks and lack of strong candidate genes, did not lend credence toward the theory that common genetic variation played a role in influencing these phenotypes. The second explanation – that these traits are not influenced much, if at all, by genetics – may therefore be more likely. Random variation in sampling, measurement error and environmental variation could have provided the basis for most of the phenotypic variation seen here.
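The two explanations can be restated in terms of the usual variance decomposition behind an additive polygenic model. The symbols below are introduced here for illustration, and covariate effects are ignored for simplicity:

\[
h^{2} = \frac{\sigma_A^{2}}{\sigma_P^{2}}, \qquad \sigma_P^{2} = \sigma_A^{2} + \sigma_E^{2},
\]

where $\sigma_A^{2}$ is the additive genetic variance, $\sigma_E^{2}$ collects all non‐genetic variance (environment, sampling and measurement error) and $\sigma_P^{2}$ is the total phenotypic variance. Explanation (1) corresponds to $\sigma_E^{2}$ being very large relative to $\sigma_A^{2}$, and explanation (2) to $\sigma_A^{2}$ being close to zero. Both drive $h^{2}$ toward zero, which is why the heritability estimate alone cannot distinguish between them and additional evidence, such as the linkage results, has to be considered.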

This study was conducted with the hope that this cell line model could confirm some of the variation in patient response to PI3K/AKT/mTOR pathway inhibitors seen clinically and that specific genetic variants could be identified to assist with predicting patient response. The inter‐individual variation in the AKT1 response to rapamycin did parallel what has been observed clinically, and there appeared to be a low to moderate influence of genetics on this trait, as evidenced by the calculated heritabilities. However, no statistically significant or suggestive linkage peaks were identified. This meant that no strong candidate genes could be selected for follow‐up genotyping and analysis of individual variants. It is possible that many variants affect these phenotypes and are distributed across the genome such that they did not produce large linkage peaks in this study. However, until future studies using other methods suggest otherwise, these results indicate that it may not be possible to predict patient response to drugs like the rapalogs by genotyping individuals for inherited genetic variants.

[Figure 5.1 image: twelve panels in four rows and three columns. Panel titles, by row: LY AKT1 ratios, LY p70S6K ratios, LY 4E-BP1 ratios; Rapa AKT1 ratios, Rapa p70S6K ratios, Rapa 4E-BP1 ratios; LY AKT1 phospho ratios, LY p70S6K phospho ratios, LY 4E-BP1 phospho ratios; Rapa AKT1 phospho ratios, Rapa p70S6K phospho ratios, Rapa 4E-BP1 phospho ratios. Y‐axis in every panel: ratio of drug:vehicle readings.]

Figure 5.1: Variation in twelve drug response phenotypes in unrelated individuals. Drug response was measured two different ways, for two different drugs, in three different proteins, for a total of twelve phenotypes. Proteins measured include AKT1 (left column), p70S6K (middle column) and 4E‐BP1 (right column). Phenotypes were defined either (1) by dividing the protein activation (ratio of phosphorylated to total protein) in the drug‐treated samples by the protein activation in the vehicle‐treated samples or (2) by dividing the amount of phosphorylated protein in the drug‐treated sample by the amount of phosphorylated protein in the vehicle‐treated sample. Phenotypes calculated using the first method are shown in the top two rows, while the bottom two rows contain phenotypes calculated using the second method. Response to the PI3K inhibitor LY294002 is shown in the first and third rows, and response to rapamycin is shown in the second and fourth rows. Error bars represent standard deviations.

[Figure 5.2 image: panel A, Rapa AKT1 ratios; panel B, Rapa AKT1 phospho ratios. Y‐axis in both panels: ratio of drug:vehicle readings.]

Figure 5.2: AKT1 response to rapamycin in 120 cell lines. Panel (A) shows the rapamycin response phenotype as calculated by dividing the AKT1 activation (ratio of phosphorylated to total AKT1) in the rapamycin‐treated samples by the AKT1 activation in the vehicle‐treated samples, while panel (B) shows the phenotype as calculated by dividing the amount of phosphorylated AKT1 in the rapamycin‐treated sample by the amount of phosphorylated AKT1 in the vehicle‐treated sample. Error bars represent standard deviations, and the red lines intersect the y‐axis at 1.

[Figure 5.3 image: LOD score (y‐axis, 0.0 to 1.2) plotted against cM position along chromosomes 1 to 22 (x‐axis).]

Figure 5.3: Linkage scan of AKT1 response to rapamycin (“RAratio” phenotype). This phenotype was calculated by dividing the AKT1 activation (ratio of phosphorylated to total AKT1) in the rapamycin‐treated samples by the AKT1 activation in the vehicle‐treated samples. The maximum LOD score is 1.05 on chromosome 22.

Table 5.1: Correlations between “ratio” and “phospho ratio” phenotypes

Protein   Drug        Phenotype correlation
AKT1      LY294002    0.38541*
AKT1      rapamycin   0.45835*
p70S6K    LY294002    0.57107*
p70S6K    rapamycin   0.71579*
4E‐BP1    LY294002    0.48323*
4E‐BP1    rapamycin   0.53439*
*Associated P‐value < 0.0001

Table 5.2: Spearman correlations between drug response phenotypes and growth rate

Protein Drug Phenotype measurement Correlation P­value AKT1 LY294002 “ratio” ‐0.02401 0.7946 rapamycin “ratio” ‐0.08447 0.3590 LY294002 “phospho ratio” 0.08831 0.3375 rapamycin “phospho ratio” 0.13725 0.1350 p70S6K LY294002 “ratio” 0.17834 0.0513 rapamycin “ratio” 0.19923 0.0291 LY294002 “phospho ratio” 0.13491 0.1418 rapamycin “phospho ratio” 0.26487 0.0035* 4E‐BP1 LY294002 “ratio” ‐0.27176 0.0028* rapamycin “ratio” 0.01661 0.8589 LY294002 “phospho ratio” ‐0.09138 0.3229 rapamycin “phospho ratio” 0.08274 0.3751 *P‐value less than Bonferroni‐corrected threshold of 0.004 (twelve tests)


Table 5.3: Spearman correlations between baseline phosphorylation phenotypes and growth rate

Protein   Phenotype                      Correlation   P‐value
AKT1      total protein                   0.04342      0.6378
AKT1      phosphorylated protein         ‐0.11431      0.2138
AKT1      phosphorylated:total protein   ‐0.14852      0.1055
p70S6K    total protein                   0.20316      0.0260
p70S6K    phosphorylated protein          0.03477      0.7062
p70S6K    phosphorylated:total protein   ‐0.12936      0.1591
4E‐BP1    total protein                   0.18326      0.0451
4E‐BP1    phosphorylated protein          0.23706      0.0091
4E‐BP1    phosphorylated:total protein    0.13760      0.1339

Table 5.4: Correlations between baseline phosphorylated:total protein ratios and drug response “ratio” phenotypes

Protein   Drug “ratio” phenotype   Correlation   P‐value
AKT1      LY294002                 ‐0.28340      0.0016*
AKT1      rapamycin                 0.11467      0.2124
p70S6K    LY294002                  0.06179      0.5008
p70S6K    rapamycin                 0.00227      0.9803
4E‐BP1    LY294002                 ‐0.01541      0.8679
4E‐BP1    rapamycin                ‐0.01025      0.9126

*P‐value less than Bonferroni‐corrected threshold of 0.0083 (six tests)
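The Bonferroni‐corrected thresholds quoted under Tables 5.2 and 5.4 are consistent with dividing a family‐wise significance level by the number of tests. The sketch below assumes a family‐wise level of 0.05, which is not stated in the table footnotes; the function names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def flag_significant(p_values, alpha=0.05):
    """Return one boolean per P-value: True if below the corrected threshold.

    alpha=0.05 is an assumed family-wise level, not taken from the tables.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# P-values from Table 5.2 (twelve tests) and Table 5.4 (six tests), in row order.
table_5_2 = [0.7946, 0.3590, 0.3375, 0.1350, 0.0513, 0.0291,
             0.1418, 0.0035, 0.0028, 0.8589, 0.3229, 0.3751]
table_5_4 = [0.0016, 0.2124, 0.5008, 0.9803, 0.8679, 0.9126]

flags_5_2 = flag_significant(table_5_2)
flags_5_4 = flag_significant(table_5_4)
print(round(bonferroni_threshold(0.05, 12), 4), flags_5_2)
print(round(bonferroni_threshold(0.05, 6), 4), flags_5_4)
```

Under that assumption, the rows flagged by this sketch are the same rows marked with an asterisk in the two tables.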


Table 5.5: Heritabilities of twelve drug response phenotypes

Protein   Drug        Normalization measurement   Phenotype   Heritability (%)
AKT1      LY294002    “ratio”                     “LAratio”    0
AKT1      LY294002    “phospho ratio”             “LAphos”     0
AKT1      rapamycin   “ratio”                     “RAratio”   13.93
AKT1      rapamycin   “phospho ratio”             “RAphos”    13.51
p70S6K    LY294002    “ratio”                     “LPratio”    0
p70S6K    LY294002    “phospho ratio”             “LPphos”     0
p70S6K    rapamycin   “ratio”                     “RPratio”    0
p70S6K    rapamycin   “phospho ratio”             “RPphos”     0
4E‐BP1    LY294002    “ratio”                     “L4ratio”    0
4E‐BP1    LY294002    “phospho ratio”             “L4phos”     0
4E‐BP1    rapamycin   “ratio”                     “R4ratio”    5.90
4E‐BP1    rapamycin   “phospho ratio”             “R4phos”     0

Normalization measurement: measurement used for treatment‐to‐vehicle normalization.

Table 5.6: Definitions, distributions and heritabilities of alternate AKT1‐rapamycin phenotypes

Phenotype name   Group definitions (number per group)                         Heritability (%)
“simple”         log(“RAratio”) > 0 (56)                                      16.21
                 log(“RAratio”) < 0 (64)
“non‐zero”       log(“RAratio”) > 0.1 (41)                                    26.85
                 log(“RAratio”) < ‐0.1 (47)
“stdev”          log(“RAratio”) > 1 standard deviation above mean (19)        16.36
                 log(“RAratio”) < 1 standard deviation below mean (23)
“pos stdev”      log(“RAratio”) > 1 standard deviation above mean (19)        30.09
                 log(“RAratio”) < 1 standard deviation above mean (101)
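The group definitions in Table 5.6 can be illustrated with a short sketch that dichotomizes log‐transformed “RAratio” values under each scheme. Several details here are assumptions, since the table does not state them: the logarithm is taken as base 10, the standard deviation is the sample standard deviation, and values falling between the two cut‐offs of a scheme are excluded. The function name and example values are hypothetical.

```python
import math
import statistics


def alternate_phenotypes(ra_ratios):
    """Map each scheme name to a list of group labels, one per cell line.

    Labels: 1 = upper group, 0 = lower group, None = excluded by the scheme.
    """
    logs = [math.log10(r) for r in ra_ratios]  # assumption: base-10 log
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)  # assumption: sample standard deviation

    def split(upper_cut, lower_cut):
        labels = []
        for x in logs:
            if x > upper_cut:
                labels.append(1)
            elif x < lower_cut:
                labels.append(0)
            else:
                labels.append(None)  # between the cut-offs: not in either group
        return labels

    return {
        "simple": split(0.0, 0.0),
        "non-zero": split(0.1, -0.1),
        "stdev": split(mean + sd, mean - sd),
        # "pos stdev" keeps every cell line: high responders versus the rest.
        "pos stdev": [1 if x > mean + sd else 0 for x in logs],
    }


# Hypothetical RAratio values for six cell lines.
groups = alternate_phenotypes([0.5, 0.9, 1.0, 1.1, 2.0, 4.0])
for name, labels in groups.items():
    print(name, labels)
```

Only the “simple” and “pos stdev” schemes in the table account for all 120 cell lines. The “non‐zero” and “stdev” group sizes sum to 88 and 42, which suggests that intermediate cell lines were left out of those analyses.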

Table 5.7: Maximum LOD scores of AKT1‐rapamycin phenotypes

Phenotype     Maximum LOD score   Chromosome   Position
“RAratio”     1.05                22           41 cM
“RAphos”      1.10                20           29 cM
“simple”      0.68                15           10 cM
“non‐zero”    0.77                7            75 cM
“stdev”       1.36                21           29 cM
“pos stdev”   1.08                17           106 cM

Table 5.8: Candidate genes under linkage peak for “RAratio” phenotype

Rank   Gene (score)
1      EP300 (35.0)
2      YWHAH (27.4)
3      XRCC6 (25.6)
4      EWSR1 (25.3)
5      GRAP2 (17.9)


Chapter 6

Conclusions and future directions


6.1 Conclusions

Cancer is a leading cause of human death, and a main goal of scientific research is to better understand and treat this array of diseases. It is clear that cancer, like many diseases, is defined by aberrant cell signaling. In Chapter 1, evidence from the scientific literature was presented to show that some signaling pathways, including the PI3K/AKT/mTOR signaling pathway, are commonly deregulated in a wide array of tumor types. The PI3K/AKT/mTOR pathway is a pro‐growth signaling pathway that is normally activated following the binding of growth factors to transmembrane receptors. Though the key players in this pathway and their functions have been well characterized, previous research had not yielded much information about inter‐individual differences in signaling through this pathway. Additionally, this pathway represents a popular target for cancer therapies, some of which are already known to affect individuals differently. For these reasons, a better understanding of inter‐individual differences in PI3K/AKT/mTOR signaling in a variety of contexts was needed. The work presented here investigated these differences and sought to determine whether they could be attributed to inherited genetic variation. Traditional genetic analysis methods were used, and a new tool was also developed and successfully employed in these studies.

This new tool, named CANDID, was developed as an algorithm for the ranking of candidate genes for complex human traits. As part of the CANDID creation process, databases are routinely compiled from freely available information from several sources, including PubMed, NCBI Entrez Gene, NCBI dbSNP and a large public gene expression dataset. CANDID is available online, and users can select which data sources to use and how much of their own knowledge or data, if any, they wish to supply. Once the user selects and weights these parameters, CANDID quickly ranks potential candidates and provides the user with the ranked output list of genes and detailed information on their scores. CANDID performed well in a rigorous test for identifying novel associations. It also outperformed other candidate gene algorithms available at the time it was developed, as judged either by comparison with previously published performance statistics or by a direct head‐to‐head comparison. CANDID was also used successfully to investigate the cell signaling phenotypes discussed in Chapters 2, 4 and 5.

Chapters 2 and 4 focused on the genetic analysis of baseline activation of the PI3K/AKT/mTOR pathway. Phosphorylated and total amounts of three proteins (AKT1, p70S6K and 4E‐BP1) were measured in over 120 cell lines from 14 families. Variation was seen across all nine of the measured and derived phenotypes. Two derived phenotypes (the ratios of phosphorylated to total AKT1 and p70S6K) displayed moderate heritability, indicating that inherited genetic variation might play some part in influencing these ratios. Traditional genetic linkage analysis identified a suggestive linkage peak for the p70S6K ratio phenotype on chromosome 3. This peak overlapped with a smaller linkage peak for the AKT1 ratio phenotype, and it was found that the phenotypes were genetically correlated. The linkage peak for the combined genetic component for both traits had a maximum LOD score of 2.91 on chromosome 3. There was also a smaller, non‐significant linkage peak on chromosome 14 that represented the maximum LOD score of the AKT1 ratio phenotype. CANDID identified the top candidate genes in the linkage intervals on chromosomes 3 and 14, and SNPs in these genes were genotyped and tested for association with the appropriate phenotype. One SNP in the HSP90AA1 gene on chromosome 14 was statistically significantly associated with the AKT1 phenotype, and four SNPs on chromosome 3 were statistically significantly associated with the p70S6K phenotype. These included three SNPs in the RAF1 gene and one SNP in the GRM7 gene. SNPs statistically significant for the phenotypes were tested for association with several cancer‐related traits measured in patients enrolled in an ovarian cancer clinical trial. However, no statistically significant associations were found.

Chapter 5 described attempts to identify similar associations between inherited genetic variants and inter‐individual variation in response to treatment with PI3K/AKT/mTOR pathway inhibitors. Using the same cell line collection, variation was seen in the responses of AKT1, p70S6K and 4E‐BP1 to treatment with LY294002 (a PI3K inhibitor) or rapamycin (an mTOR complex 1 inhibitor). Unfortunately, though these phenotypes were variable, little or none of the variation appeared to be due to genetic factors. The AKT1 responses to rapamycin were the most heritable phenotypes, but they did not yield any suggestive linkage peaks. Additionally, CANDID did not find any compelling candidate genes under one of the highest linkage peaks seen. These results seem to suggest that any possible genetic influence on these phenotypes is small or due to effects in multiple genes.

Overall, there are several main highlights of the work presented here. While the work of others has clearly established that molecular‐level traits like gene expression and protein expression can be influenced by genetics, this is the first evidence that inherited genetic variation influences a post‐translational phenotype like protein phosphorylation. Additionally, CANDID proved to be a successful and flexible method for identifying candidate genes, as evidenced by its high performance in a simulated gene discovery situation and its success in identifying candidate genes for the baseline phosphorylation traits. Several of the genes it identified for the protein phosphorylation phenotypes contained statistically significant SNPs and have well documented ties to the PI3K/AKT/mTOR pathway. Finally, though no such variants were found to influence the response of AKT1, p70S6K or 4E‐BP1 to LY294002 and rapamycin, these drug response studies were able to confirm a clinical finding and, by doing so, strengthen the case for using in vitro models to assess inter‐individual variation.

6.2 Future directions

The work presented here has resulted in the development of a useful tool and has revealed additional information about the PI3K/AKT/mTOR pathway, but it has also prompted additional questions and directions. One obvious question is whether inherited genetic variation influences the baseline activation of members of the PI3K/AKT/mTOR pathway other than AKT1 and p70S6K. In these studies, only AKT1, p70S6K and 4E‐BP1 were measured because these were the only pathway members for which robust, reliable ELISA kits were commercially available. Other key pathway members include PI3K, mTOR, PDK1 (which phosphorylates and activates AKT1 on Thr308) and other members of mTOR complexes 1 and 2. Currently, directly measuring the activity of these proteins can be technically difficult for a variety of reasons. For example, while it is known that PI3Ks are activated by interacting with growth factor receptors at the plasma membrane, the exact mechanism by which their activity increases is not clear, so this activation cannot currently be measured directly [Hawkins, et al. 2006]. Additionally, accurately assessing PI3K activity by measuring PIP2 and PIP3 requires the use of technically demanding assays involving 32P and may not be feasible for studies involving many cell lines [Hawkins, et al. 2006]. In fact, PI3K activity is usually assessed by measuring the activity of molecules known to be downstream of PI3K, such as AKT1 [Hawkins, et al. 2006].

Though pathway members like PI3K may require the development of newer methods more suitable to high‐throughput biology, it is still possible to measure the activity of some proteins by measuring their phosphorylation states. mTOR, PRAS40 and TSC2 are all phosphorylated in ways that are thought to regulate the activity of the PI3K/AKT/mTOR pathway [Guertin and Sabatini 2007; Manning and Cantley 2007]. Phosphorylation of these proteins can be measured by a number of antibody‐based methods, including ELISAs. In fact, a kit for the detection of phosphorylated and total PRAS40 is now commercially available from Invitrogen. Investigating the variation and heritability of measurements of these proteins could provide a more complete picture of how this pathway functions in different individuals.

Just as future work could extend the same principles applied here to other members of the PI3K/AKT/mTOR pathway, it could also apply to other key signaling pathways. The work presented in Chapter 4 showed that polymorphisms in the RAF1 gene were associated with the ratio of phosphorylated to total p70S6K. RAF1 encodes c‐Raf, a key player in the RAF/MEK/ERK signaling pathway. Activation of this pathway promotes , cell cycle progression and survival, and as such, it is also important in cancer [Chang and Karin 2001]. Whereas the PI3K/AKT/mTOR pathway includes several key proteins whose activities are not directly mediated by phosphorylation, the RAF/MEK/ERK pathway is a simpler kinase regulatory cascade [Chang and Karin 2001; Garrington and Johnson 1999]. The activities of its members can therefore be assessed in a relatively straightforward manner using commercially available ELISAs or other methods for obtaining quantitative measures of phosphorylated and total protein. Similar principles could be applied to members of the NF‐κB signaling pathway, the Notch signaling pathway and the WNT/β‐catenin signaling pathway. In this way, a more complete picture of inter‐individual variation in signaling could be achieved.

Though ELISAs performed well in the experiments described here, other technologies for measuring protein quantitatively are emerging. Perhaps the most promising of these are the mass spectrometry‐based methods. Mass spectrometry (MS) has long been used for chemical studies, and it is an attractive technique for proteomics because of three intrinsic properties: it is quantitative, it can distinguish molecules with only slight differences in mass (such as molecules with or without small post‐translational modifications), and it can identify molecules based on their spectral fingerprints. Until recently, the suitability of mass spectrometry‐based methods for proteome‐level science was limited by a number of factors. These included a dynamic range that was too narrow to accommodate the large variation seen in protein expression within a cell, the requirement to use isotope labeling to obtain accurate measurements and insufficient sensitivity to detect the relatively rare proteins with post‐translational modifications [Bonilla, et al. 2008; Gstaiger and Aebersold 2009; Witze, et al. 2007]. However, recent developments have overcome these obstacles in a variety of ways [Bonilla, et al. 2008; Gstaiger and Aebersold 2009; Witze, et al. 2007]. As a result, the technology has advanced sufficiently to be used in several studies assessing genetic influences on MS‐measured protein phenotypes [Foss, et al. 2007; Gstaiger and Aebersold 2009]. Though these studies have so far used small numbers of samples (usually comparing pools of samples with two different genomes), as costs are reduced, MS may be used for studies on the scale of the work presented here. This could increase the speed with which these studies are completed as well as the portion of the proteome that could be measured.

Just as mass spectrometry improvements are advancing protein research, developments in sequencing technologies and computational biology are opening up new possibilities for genetics research. Since the publication of the work described in Chapter 3, additional candidate gene identification algorithms have been published. Several of these have performed well in initial tests, though they appear to be neither as flexible as CANDID nor as powerful in detecting novel genes for complex traits [Kohler, et al. 2008; Linghu, et al. 2009]. However, they employ more complex mathematical methods than those used in CANDID, and when used with the right data sources, these methods might outperform CANDID [Kohler, et al. 2008; Linghu, et al. 2009]. The data sources themselves are also improving as more data are collected on the genes in the human genome. While the number of genes with gene ontology (GO) annotations has not changed appreciably in the last five years, specific current entries for each gene may reflect a more accurate and complete picture of gene function compared to the terms associated with that gene in the past. Likewise, the body of published scientific literature continues to grow and provide useful information about human genes. CANDID and other future algorithms could make use of these improved data sources as well as more sophisticated mathematical methods in order to achieve higher accuracy for candidate gene predictions.

As methods for identifying candidate genes improve, so will methods for following up on them. Next‐generation sequencing has the potential to identify polymorphisms or mutations in the entire genome, but statistical methods for assessing the significance of these variants are not well defined [Easton, et al. 2007a; Ley, et al. 2008]. Ranking of candidate genes could provide a hierarchy by which to prioritize sequenced genes for statistical analysis, thus limiting the number of statistical tests performed. This method could answer the questions of how to deal with genome‐scale sequence data and how to best follow up on candidate genes.

Variants that have been statistically significantly associated with disease‐related phenotypes should be replicated and translated into clinical practice, when applicable. The significantly associated variants on chromosomes 3 and 14 described in Chapter 4 may have clinical potential, but they require further replication and validation. First, the association between these variants and the ratios of phosphorylated to total AKT1 and p70S6K should be replicated in another set of lymphoblastoid cell lines. This could be an independent sample of cell lines from the CEPH collection or a sample from a different collection. Lymphoblastoid cell lines are available for many of the individuals genotyped in the HapMap project, and some private collections of cell lines collected during clinical trials also exist. If the variants maintain their associations in another set of cell lines, they could be tested for association with susceptibility to diseases characterized by deregulation of PI3K/AKT/mTOR signaling, such as different types of cancer. If statistically significant associations are identified in those studies, clinical testing of these variants could be performed in order to predict risk for those diseases.

Functional studies could also be performed on the variants identified here. HSP90AA1 and RAF1 both encode proteins known to influence signaling pathways, but it is unknown how the statistically significant SNPs in these genes might affect their respective protein products. Quantitative measurements of gene and protein expression might reveal effects of these SNPs on those phenotypes. Likewise, immunohistochemistry might suggest possible alterations in protein localization influenced by these variants. Additionally, if the presence of these variants influences post‐translational modifications of these proteins, that could potentially be determined by Western blotting. These represent just some of the potential methods by which the functional consequences of these SNPs could be identified. Of course, it may be that the SNPs described in Chapter 4 are not the true functional variants and that they are simply tagging those variants by residing in linkage disequilibrium with them. If a potential mechanism is identified by one of the above molecular approaches, it could be used to dissect the gene region containing the statistically significant SNP(s) by systematically assaying all known variation in the region. Identification of the true functional variant(s) could shed light on how these proteins work differently in different individuals, and this information could lead to the development of new drugs for the treatment of cancer and other diseases.

6.3 Final reflections on current and future research

The results of the studies described here suggest several conclusions about current and future research in this area. One theme of the experiments described in Chapters 2, 4 and 5 is that it is difficult to measure proteins reliably and accurately, whether or not they have post‐translational modifications. The Western blot method described in Chapter 2 failed to meet quality control measures for the detection of signaling proteins, and the ELISA method that was eventually selected had flaws as well. These included inherent limits on the throughput of the technique as well as variation that was seen across different assay plates and reagent lot numbers. The ELISA method used here also required substantial amounts of protein for each sample, which in turn required substantial cell culture efforts. As a result, cell culture and protein purification procedures had to be conducted in batches, which may have caused additional variability.

These weaknesses of protein detection methods contrast with the ease and accuracy with which mRNA and DNA can be assayed using microarray technology. If all studies of cellular phenotypes were required to meet the quality standards set by microarray platforms, the experiments described here would fail to do so. However, though the data quality may not be as high, these experiments provide additional information that cannot be measured by platforms like microarrays. Specifically, mRNA expression levels rarely correlate with protein levels, so ELISAs have the advantage of detecting a potentially more interesting or relevant phenotype. The results shown here provide proof that protein‐based phenotypes can be studied in a cell culture model and suggest that trade‐offs be considered between experimental variability and phenotypic relevance.

Another potential weakness of the experimental system used here is the use of lymphoblastoid cell lines from the CEPH collection to model the protein phenotypes. These cell lines were collected from donors more than 20 years ago, transformed using Epstein‐Barr virus and stored and maintained by several different groups. It is unknown how many times most of these cell lines were passaged, so it is possible that differences in passage number have contributed to the inter‐individual differences seen in PI3K/AKT/mTOR pathway activation. Additionally, differences in the genomic integration sites of the Epstein‐Barr virus could skew the phenotypic measurements and genetic analysis results. Finally, as with all cell culture models, it is uncertain to what extent the phenotypes in these cell lines reflect the true biology of a human being. However, despite the flaws inherent in this cell line collection, it still represents one of the largest and best characterized collections of cell lines from families. This makes it an excellent genetic resource and one that has been used successfully by many other groups.

While the experiments described here had many weaknesses in their ability to investigate genetic influences on PI3K/AKT/mTOR signaling, they still managed to produce some statistically significant and biologically interesting results. In the broader picture, they represent valuable proof‐of‐concept studies of genetic influences on cell signaling. Additionally, with better experimental methods for protein detection on the horizon, these studies suggest a blueprint for future work at the intersection of genetics and proteomics.


References


Gene Ontology Annotation (UniProtKB‐GOA). European Bioinformatics Institute.
Abecasis GR, Cherny SS, Cookson WO, Cardon LR. 2002. Merlin‐‐rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30(1):97‐101.
Adie EA, Adams RR, Evans KL, Porteous DJ, Pickard BS. 2005. Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics 6:55.
Adie EA, Adams RR, Evans KL, Porteous DJ, Pickard BS. 2006. SUSPECTS: enabling fast and effective prioritization of positional candidates. Bioinformatics 22(6):773‐4.
Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, De Smet F, Tranchevent LC, De Moor B, Marynen P, Hassan B and others. 2006. Gene prioritization through genomic data fusion. Nat Biotechnol 24(5):537‐44.
Al‐Tassan N, Chmiel NH, Maynard J, Fleming N, Livingston AL, Williams GT, Hodges AK, Davies DR, David SS, Sampson JR and others. 2002. Inherited variants of MYH associated with somatic G:C‐‐>T:A mutations in colorectal tumors. Nat Genet 30(2):227‐32.
Albert S, Serova M, Dreyer C, Sablin MP, Faivre S, Raymond E. New inhibitors of the mammalian target of rapamycin signaling pathway for cancer. Expert Opin Investig Drugs.
Alberts B. 2002. Molecular biology of the cell. New York: Garland Science.
Alessi DR, Pearce LR, Garcia‐Martinez JM. 2009. New insights into mTOR signaling: mTORC2 and beyond. Sci Signal 2(67):pe27.
Almasy L, Blangero J. 1998. Multipoint quantitative‐trait linkage analysis in general pedigrees. Am J Hum Genet 62(5):1198‐211.
Altomare DA, Testa JR. 2005. Perturbations of the AKT signaling pathway in human cancer. Oncogene 24(50):7455‐64.
Altomare DA, Wang HQ, Skele KL, De Rienzo A, Klein‐Szanto AJ, Godwin AK, Testa JR. 2004. AKT and mTOR phosphorylation is frequently detected in ovarian cancer and can be targeted to disrupt ovarian tumor cell growth. Oncogene 23(34):5853‐7.
Altshuler D, Daly MJ, Lander ES. 2008. Genetic mapping in human disease. Science 322(5903):881‐8.
American Cancer Society. Cancer Facts & Figures 2009. Atlanta: American Cancer Society; 2009.
Antoni L, Sodha N, Collins I, Garrett MD. 2007. CHK2 kinase: cancer susceptibility and cancer therapy ‐ two sides of the same coin? Nat Rev Cancer 7(12):925‐36.
Asami S, Manabe H, Miyake J, Tsurudome Y, Hirano T, Yamaguchi R, Itoh H, Kasai H. 1997. Cigarette smoking induces an increase in oxidative DNA damage, 8‐hydroxydeoxyguanosine, in a central site of the human lung. 18(9):1763‐6.


Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT and others. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25‐9.
Ban J, Siligan C, Kreppel M, Aryee D, Kovar H. 2006. EWS‐FLI1 in Ewing's sarcoma: real targets and collateral damage. Adv Exp Med Biol 587:41‐52.
Bellacosa A, Kumar CC, Di Cristofano A, Testa JR. 2005. Activation of AKT kinases in cancer: implications for therapeutic targeting. Adv Cancer Res 94:29‐86.
Ben‐Omran TI, Cerosaletti K, Concannon P, Weitzman S, Nezarati MM. 2005. A patient with mutations in DNA Ligase IV: clinical features and overlap with Nijmegen breakage syndrome. Am J Med Genet A 137A(3):283‐7.
Bhargava R, Gerald WL, Li AR, Pan Q, Lal P, Ladanyi M, Chen B. 2005. EGFR gene amplification in breast cancer: correlation with epidermal growth factor receptor mRNA and protein expression and HER‐2 status and absence of EGFR‐activating mutations. Mod Pathol 18(8):1027‐33.
Bierut LJ, Madden PA, Breslau N, Johnson EO, Hatsukami D, Pomerleau OF, Swan GE, Rutter J, Bertelsen S, Fox L and others. 2007. Novel genes identified in a high‐density genome wide association study for nicotine dependence. Hum Mol Genet 16(1):24‐35.
Bleeker FE, Felicioni L, Buttitta F, Lamba S, Cardone L, Rodolfo M, Scarpa A, Leenstra S, Frattini M, Barbareschi M and others. 2008. AKT1(E17K) in human solid tumours. Oncogene.
Bleibel WK, Duan S, Huang RS, Kistner EO, Shukla SJ, Wu X, Badner JA, Dolan ME. 2009. Identification of genomic regions contributing to etoposide‐induced cytotoxicity. Hum Genet 125(2):173‐80.
Blumenthal GM, Dennis PA. 2008. PTEN hamartoma tumor syndromes. Eur J Hum Genet 16(11):1289‐300.
Bodmer WF, Bailey CJ, Bodmer J, Bussey HJ, Ellis A, Gorman P, Lucibello FC, Murday VA, Rider SH, Scambler P and others. 1987. Localization of the gene for familial adenomatous polyposis on . Nature 328(6131):614‐6.
Bogdanova N, Horst J, Chlystun M, Croucher PJ, Nebel A, Bohring A, Todorova A, Schreiber S, Gerke V, Krawczak M and others. 2007. A common haplotype of the annexin A5 (ANXA5) gene promoter is associated with recurrent pregnancy loss. Hum Mol Genet 16(5):573‐8.
Bollmann FM. 2007. Targeting ALT: the role of alternative lengthening of telomeres in pathogenesis and prevention of cancer. Cancer Treat Rev 33(8):704‐9.
Bonacci TM, Mathews JL, Yuan C, Lehmann DM, Malik S, Wu D, Font JL, Bidlack JM, Smrcka AV. 2006. Differential targeting of Gbetagamma‐subunit signaling with small molecules. Science 312(5772):443‐6.
Bond GL, Hu W, Bond EE, Robins H, Lutzker SG, Arva NC, Bargonetti J, Bartel F, Taubert H, Wuerl P and others. 2004. A single nucleotide polymorphism in the MDM2 promoter attenuates the p53 tumor suppressor pathway and accelerates tumor formation in humans. Cell 119(5):591‐602.


Bonilla L, Means G, Lee K, Patterson S. 2008. The evolution of tools for protein phosphorylation site analysis: from discovery to clinical application. Biotechniques 44(5):671‐9.
Borecki IB, Province MA. 2008. Linkage and association: basic concepts. Adv Genet 60:51‐74.
Boultwood J. 2001. Ataxia telangiectasia gene mutations in leukaemia and lymphoma. J Clin Pathol 54(7):512‐6.
Broderick P, Carvajal‐Carmona L, Pittman AM, Webb E, Howarth K, Rowan A, Lubbe S, Spain S, Sullivan K, Fielding S and others. 2007. A genome‐wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet 39(11):1315‐7.
Brophy K, Ryan AW, Turner G, Trimble V, Patel KD, O'Morain C, Kennedy NP, Egan B, Close E, Lawlor G and others. Evaluation of 6 candidate genes on chromosome 11q23 for coeliac disease susceptibility: a case control study. BMC Med Genet 11:76.
Broughton BC, Berneburg M, Fawcett H, Taylor EM, Arlett CF, Nardo T, Stefanini M, Menefee E, Price VH, Queille S and others. 2001. Two individuals with features of both xeroderma pigmentosum and trichothiodystrophy highlight the complexity of the clinical outcomes of mutations in the XPD gene. Hum Mol Genet 10(22):2539‐47.
Burack WR, Cheng AM, Shaw AS. 2002. Scaffolds, adaptors and linkers of TCR signaling: theory and practice. Curr Opin Immunol 14(3):312‐6.
Cahill DP, Lengauer C, Yu J, Riggins GJ, Willson JK, Markowitz SD, Kinzler KW, Vogelstein B. 1998. Mutations of mitotic checkpoint genes in human cancers. Nature 392(6673):300‐3.
Calafell F, Almasy L, Sabater‐Lleal M, Buil A, Mordillo C, Ramirez‐Soriano A, Sikora M, Souto JC, Blangero J, Fontcuberta J and others. Sequence variation and genetic evolution at the human F12 locus: mapping quantitative trait nucleotides that influence FXII plasma levels. Hum Mol Genet 19(3):517‐25.
Calvo S, Jain M, Xie X, Sheth SA, Chang B, Goldberger OA, Spinazzola A, Zeviani M, Carr SA, Mootha VK. 2006. Systematic identification of human mitochondrial disease genes through integrative genomics. Nat Genet 38(5):576‐82.
Canning S, Dryja TP. 1989. Short, direct repeats at the breakpoints of deletions of the retinoblastoma gene. Proc Natl Acad Sci U S A 86(13):5044‐8.
Cargill M, Schrodi SJ, Chang M, Garcia VE, Brandon R, Callis KP, Matsunami N, Ardlie KG, Civello D, Catanese JJ and others. 2007. A large‐scale genetic association study confirms IL12B and leads to the identification of IL23R as psoriasis‐risk genes. Am J Hum Genet 80(2):273‐90.
Carpten JD, Faber AL, Horn C, Donoho GP, Briggs SL, Robbins CM, Hostetter G, Boguslawski S, Moses TY, Savage S and others. 2007. A transforming mutation in the pleckstrin homology domain of AKT1 in cancer. Nature 448(7152):439‐44.


Carracedo A, Ma L, Teruya‐Feldstein J, Rojo F, Salmena L, Alimonti A, Egia A, Sasaki AT, Thomas G, Kozma SC and others. 2008. Inhibition of mTORC1 leads to MAPK pathway activation through a PI3K‐dependent feedback loop in human cancer. J Clin Invest 118(9):3065‐74. Carracedo A, Pandolfi PP. 2008. The PTEN‐PI3K pathway: of feedbacks and cross‐ talks. Oncogene 27(41):5527‐41. Casey G. 1997. The BRCA1 and BRCA2 breast cancer genes. Curr Opin Oncol 9(1):88‐ 93. Cesare AJ, Reddel RR. 2008. Telomere uncapping and alternative lengthening of telomeres. Mech Dev 129(1‐2):99‐108. Chamberlain SR, Menzies L. 2009. Endophenotypes of obsessive‐compulsive disorder: rationale, evidence and future potential. Expert Rev Neurother 9(8):1133‐46. Chan TL, Yuen ST, Kong CK, Chan YW, Chan AS, Ng WF, Tsui WY, Lo MW, Tam WY, Li VS and others. 2006. Heritable germline epimutation of MSH2 in a family with hereditary nonpolyposis colorectal cancer. Nat Genet 38(10):1178‐83. Chang L, Karin M. 2001. Mammalian MAP kinase signalling cascades. Nature 410(6824):37‐40. Charames GS, Bapat B. 2003. Genomic instability and cancer. Curr Mol Med 3(7):589‐96. Chaturvedi D, Gao X, Cohen MS, Taunton J, Patel TB. 2009. Rapamycin induces transactivation of the EGFR and increases cell survival. Oncogene 28(9):1187‐96. Chen J, Xu H, Aronow BJ, Jegga AG. 2007. Improved human disease candidate gene prioritization using mouse phenotype. BMC Bioinformatics 8:392. Chen X, Wang X, Hossain S, O'Neill FA, Walsh D, Pless L, Chowdari KV, Nimgaonkar VL, Schwab SG, Wildenauer DB and others. 2006. Haplotypes spanning SPEC2, PDZ‐GEF2 and ACSL6 genes are associated with schizophrenia. Hum Mol Genet 15(22):3329‐42. Cheng JQ, Lindsley CW, Cheng GZ, Yang H, Nicosia SV. 2005. The Akt/PKB pathway: molecular target for cancer drug discovery. Oncogene 24(50):7482‐92. Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. 2005. Mapping determinants of human gene expression by regional and genome‐ wide association. 
Nature 437(7063):1365‐9. Cho YS, Go MJ, Kim YJ, Heo JY, Oh JH, Ban HJ, Yoon D, Lee MH, Kim DJ, Park M and others. 2009. A large‐scale genome‐wide association study of Asian populations uncovers genetic factors influencing eight quantitative traits. Nat Genet 41(5):527‐34. Choi J, Chen J, Schreiber SL, Clardy J. 1996. Structure of the FKBP12‐rapamycin complex interacting with the binding domain of human FRAP. Science 273(5272):239‐42. Chompret A, Kannengiesser C, Barrois M, Terrier P, Dahan P, Tursz T, Lenoir GM, Bressac‐De Paillerets B. 2004. PDGFRA germline mutation in a family with

multiple cases of gastrointestinal stromal tumor. Gastroenterology 126(1):318‐21. Choudhury K, McQuillin A, Puri V, Pimm J, Datta S, Thirumalai S, Krasucki R, Lawrence J, Bass NJ, Quested D and others. 2007. A genetic association study of chromosome 11q22‐24 in two different samples implicates the FXYD6 gene, encoding phosphohippolin, in susceptibility to schizophrenia. Am J Hum Genet 80(4):664‐72. Chowdhury I, Tharakan B, Bhat GK. 2008. ‐ an update. Comp Biochem Physiol B Biochem Mol Biol 151(1):10‐27. Clarke GM, Carter KW, Palmer LJ, Morris AP, Cardon LR. 2007. Fine mapping versus replication in whole‐genome association studies. Am J Hum Genet 81(5):995‐ 1005. Cloughesy TF, Yoshimoto K, Nghiemphu P, Brown K, Dang J, Zhu S, Hsueh T, Chen Y, Wang W, Youngkin D and others. 2008. Antitumor activity of rapamycin in a Phase I trial for patients with recurrent PTEN‐deficient glioblastoma. PLoS Med 5(1):e8. Condorelli G, Giordano A. 1997. Synergistic role of E1A‐binding proteins and tissue‐ specific transcription factors in differentiation. J Cell Biochem 67(4):423‐31. Corcelle EA, Puustinen P, Jaattela M. 2009. Apoptosis and autophagy: Targeting autophagy signalling in cancer cells ‐'trick or treats'? FEBS J 276(21):6084‐ 96. Corradetti MN, Guan KL. 2006. Upstream of the mammalian target of rapamycin: do all roads pass through mTOR? Oncogene 25(48):6347‐60. Cox A, Dunning AM, Garcia‐Closas M, Balasubramanian S, Reed MW, Pooley KA, Scollen S, Baynes C, Ponder BA, Chanock S and others. 2007. A common coding variant in CASP8 is associated with breast cancer risk. Nat Genet 39(3):352‐8. Cross AJ, Sinha R. 2004. Meat‐related mutagens/carcinogens in the etiology of colorectal cancer. Environ Mol Mutagen 44(1):44‐55. Deep G, Agarwal R. 2008. New combination therapies with cell‐cycle agents. Curr Opin Investig Drugs 9(6):591‐604. Dempfle A, Wudy SA, Saar K, Hagemann S, Friedel S, Scherag A, Berthold LD, Alzen G, Gortner L, Blum WF and others. 2006. 
Evidence for involvement of the vitamin D receptor gene in idiopathic short stature via a genome‐wide linkage study and subsequent association studies. Hum Mol Genet 15(18):2772‐83. Desta Z, Ward BA, Soukhova NV, Flockhart DA. 2004. Comprehensive evaluation of tamoxifen sequential biotransformation by the human cytochrome P450 system in vitro: prominent roles for CYP3A and CYP2D6. J Pharmacol Exp Ther 310(3):1062‐75. Dewan A, Liu M, Hartman S, Zhang SS, Liu DT, Zhao C, Tam PO, Chan WM, Lam DS, Snyder M and others. 2006. HTRA1 promoter polymorphism in wet age‐ related macular degeneration. Science 314(5801):989‐92.

Dieterich K, Soto Rifo R, Faure AK, Hennebicq S, Ben Amar B, Zahi M, Perrin J, Martinez D, Sele B, Jouk PS and others. 2007. Homozygous mutation of AURKC yields large‐headed polyploid spermatozoa and causes male infertility. Nat Genet 39(5):661‐5. Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, Taylor J, Burnett E, Gut I, Farrall M and others. 2007. A genome‐wide association study of global gene expression. Nat Genet 39(10):1202‐7. Dolan ME, Ni L, Camon E, Blake JA. 2005. A procedure for assessing GO annotation consistency. Bioinformatics 21 Suppl 1:i136‐43. Donnelly P. 2008. Progress and challenges in genome‐wide association studies in humans. Nature 456(7223):728‐31. Drayna D, Coon H, Kim UK, Elsner T, Cromer K, Otterud B, Baird L, Peiffer AP, Leppert M. 2003. Genetic analysis of a complex trait in the Utah Genetic Reference Project: a major locus for PTC taste ability on chromosome 7q and a secondary locus on chromosome 16p. Hum Genet 112(5‐6):567‐72. Duan S, Bleibel WK, Huang RS, Shukla SJ, Wu X, Badner JA, Dolan ME. 2007. Mapping genes that contribute to daunorubicin‐induced cytotoxicity. Cancer Res 67(11):5425‐33. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ, Steinhart AH, Abraham C, Regueiro M, Griffiths A and others. 2006. A genome‐wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314(5804):1461‐3. Duffy DL, Montgomery GW, Chen W, Zhao ZZ, Le L, James MR, Hayward NK, Martin NG, Sturm RA. 2007. A three‐single‐nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye‐color variation. Am J Hum Genet 80(2):241‐52. Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, Wheeler E, Glazer NL, Bouatia‐Naji N, Gloyn AL and others. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 42(2):105‐16. 
Durand CM, Betancur C, Boeckers TM, Bockmann J, Chaste P, Fauchereau F, Nygren G, Rastam M, Gillberg IC, Anckarsater H and others. 2007. Mutations in the gene encoding the synaptic scaffolding protein SHANK3 are associated with autism spectrum disorders. Nat Genet 39(1):25‐7. Dzikiewicz‐Krawczyk A. 2008. The importance of making ends meet: mutations in genes and altered expression of proteins of the MRN complex and cancer. Mutat Res 659(3):262‐73. Easton DF, Bishop DT, Ford D, Crockford GP. 1993. Genetic linkage analysis in familial breast and ovarian cancer: results from 214 families. The Breast Cancer Linkage Consortium. Am J Hum Genet 52(4):678‐701. Easton DF, Deffenbaugh AM, Pruss D, Frye C, Wenstrup RJ, Allen‐Brady K, Tavtigian SV, Monteiro AN, Iversen ES, Couch FJ and others. 2007a. A systematic genetic assessment of 1,433 sequence variants of unknown clinical

significance in the BRCA1 and BRCA2 breast cancer‐predisposition genes. Am J Hum Genet 81(5):873‐83. Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R and others. 2007b. Genome‐wide association study identifies novel breast cancer susceptibility loci. Nature 447(7148):1087‐93. Ehret GB, O'Connor AA, Weder A, Cooper RS, Chakravarti A. 2009. Follow‐up of a major linkage peak on chromosome 1 reveals suggestive QTLs associated with essential hypertension: GenNet study. Eur J Hum Genet 17(12):1650‐7. Ellis NA, Groden J, Ye TZ, Straughen J, Lennon DJ, Ciocci S, Proytcheva M, German J. 1995. The Bloom's syndrome gene product is homologous to RecQ . Cell 83(4):655‐66. Emmert S, Schneider TD, Khan SG, Kraemer KH. 2001. The human XPG gene: gene architecture, alternative splicing and single nucleotide polymorphisms. Nucleic Acids Res 29(7):1443‐52. Engelman JA. 2009. Targeting PI3K signalling in cancer: opportunities, challenges and limitations. Nat Rev Cancer 9(8):550‐62. Erkko H, Xia B, Nikkila J, Schleutker J, Syrjakoski K, Mannermaa A, Kallioniemi A, Pylkas K, Karppinen SM, Rapakko K and others. 2007. A recurrent mutation in PALB2 in Finnish cancer families. Nature 446(7133):316‐9. Esteva FJ, Sahin AA, Smith TL, Yang Y, Pusztai L, Nahta R, Buchholz TA, Buzdar AU, Hortobagyi GN, Bacus SS. 2004. Prognostic significance of phosphorylated P38 mitogen‐activated protein kinase and HER‐2 expression in lymph node‐ positive breast carcinoma. Cancer 100(3):499‐506. Fair AM, Dai Q, Shu XO, Matthews CE, Yu H, Jin F, Gao YT, Zheng W. 2007. Energy balance, biomarkers, and breast cancer risk. Cancer Detect Prev 31(3):214‐9. Feldman ME, Apsel B, Uotila A, Loewith R, Knight ZA, Ruggero D, Shokat KM. 2009. Active‐site inhibitors of mTOR target rapamycin‐resistant outputs of mTORC1 and mTORC2. PLoS Biol 7(2):e38. Fishel R, Lescoe MK, Rao MR, Copeland NG, Jenkins NA, Garber J, Kane M, Kolodner R. 1993. 
The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell 75(5):1027‐38. Forbes SA, Tang G, Bindal N, Bamford S, Dawson E, Cole C, Kok CY, Jia M, Ewing R, Menzies A and others. 2009. COSMIC (the Catalogue of Somatic Mutations in Cancer): a resource to investigate acquired mutations in human cancer. Nucleic Acids Res. Foss EJ, Radulovic D, Shaffer SA, Ruderfer DM, Bedalov A, Goodlett DR, Kruglyak L. 2007. Genetic basis of proteome variation in yeast. Nat Genet 39(11):1369‐ 75. Foster DA, Toschi A. 2009. Targeting mTOR with rapamycin: one dose does not fit all. Cell Cycle 8(7):1026‐9.

Franke L, Bakel H, Fokkens L, de Jong ED, Egmont‐Petersen M, Wijmenga C. 2006. Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet 78(6):1011‐25. Franke TF. 2008. PI3K/Akt: getting it right matters. Oncogene 27(50):6473‐88. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW and others. 2007. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316(5826):889‐94. Gao X, Gordon D, Zhang D, Browne R, Helms C, Gillum J, Weber S, Devroy S, Swaney S, Dobbs M and others. 2007. CHD7 gene polymorphisms are associated with susceptibility to idiopathic scoliosis. Am J Hum Genet 80(5):957‐65. Gao Z, Garcia‐Echeverria C, Jensen MR. Hsp90 inhibitors: clinical development and future opportunities in oncology therapy. Curr Opin Drug Discov Devel 13(2):193‐202. Garber K. 2009. Targeting mTOR: something old, something new. J Natl Cancer Inst 101(5):288‐90. Garcia‐Echeverria C, Sellers WR. 2008. Drug discovery approaches targeting the PI3K/Akt pathway in cancer. Oncogene 27(41):5511‐26. Gardner EJ. 1951. A genetic and clinical study of intestinal polyposis, a predisposing factor for carcinoma of the colon and rectum. Am J Hum Genet 3(2):167‐76. Garlich JR, De P, Dey N, Su JD, Peng X, Miller A, Murali R, Lu Y, Mills GB, Kundra V and others. 2008. A vascular targeted pan phosphoinositide 3‐kinase inhibitor prodrug, SF1126, with antitumor and antiangiogenic activity. Cancer Res 68(1):206‐15. Garrington TP, Johnson GL. 1999. Organization and regulation of mitogen‐activated protein kinase signaling pathways. Curr Opin Cell Biol 11(2):211‐8. George RA, Liu JY, Feng LL, Bryson‐Richardson RJ, Fatkin D, Wouters MA. 2006. Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res 34(19):e130. 
Gieger C, Geistlinger L, Altmaier E, Hrabe de Angelis M, Kronenberg F, Meitinger T, Mewes HW, Wichmann HE, Weinberger KM, Adamski J and others. 2008. Genetics meets metabolomics: a genome‐wide association study of metabolite profiles in human serum. PLoS Genet 4(11):e1000282. Gonzalez E, McGraw TE. 2009. The Akt kinases: isoform specificity in metabolism and cancer. Cell Cycle 8(16):2502‐8. Goto M, Miller RW, Ishikawa Y, Sugano H. 1996. Excess of rare cancers in Werner syndrome (adult progeria). Cancer Epidemiol Biomarkers Prev 5(4):239‐46. Gozukara EM, Khan SG, Metin A, Emmert S, Busch DB, Shahlavi T, Coleman DM, Miller M, Chinsomboon N, Stefanini M and others. 2001. A stop codon in xeroderma pigmentosum group C families in Turkey and Italy: molecular genetic evidence for a common ancestor. J Invest Dermatol 117(2):197‐204.

Graham DS, Wong AK, McHugh NJ, Whittaker JC, Vyse TJ. 2006. Evidence for unique association signals in SLE at the CD28‐CTLA4‐ICOS locus in a family‐based study. Hum Mol Genet 15(21):3195‐205. Greenwood TA, Cadman PE, Stridsberg M, Nguyen S, Taupenot L, Schork NJ, O'Connor DT. 2004. Genome‐wide linkage analysis of chromogranin B expression in the CEPH pedigrees: implications for exocytotic sympathochromaffin secretion in humans. Physiol Genomics 18(1):119‐27. Groden J, Thliveris A, Samowitz W, Carlson M, Gelbert L, Albertsen H, Joslyn G, Stevens J, Spirio L, Robertson M and others. 1991. Identification and characterization of the familial adenomatous polyposis coli gene. Cell 66(3):589‐600. Gruis NA, van der Velden PA, Sandkuijl LA, Prins DE, Weaver‐Feldhaus J, Kamb A, Bergman W, Frants RR. 1995. Homozygotes for CDKN2 (p16) germline mutation in Dutch familial melanoma kindreds. Nat Genet 10(3):351‐3. Gschwind A, Hart S, Fischer OM, Ullrich A. 2003. TACE cleavage of proamphiregulin regulates GPCR‐induced proliferation and motility of cancer cells. EMBO J 22(10):2411‐21. Gstaiger M, Aebersold R. 2009. Applying mass spectrometry‐based proteomics to genetics, genomics and network biology. Nat Rev Genet 10(9):617‐27. Guertin DA, Sabatini DM. 2007. Defining the role of mTOR in cancer. Cancer Cell 12(1):9‐22. Guertin DA, Sabatini DM. 2009. The pharmacology of mTOR inhibition. Sci Signal 2(67):pe24. Guh DP, Zhang W, Bansback N, Amarsi Z, Birmingham CL, Anis AH. 2009. The incidence of co‐morbidities related to obesity and overweight: a systematic review and meta‐analysis. BMC Public Health 9:88. Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King MC. 1990. Linkage of early‐onset familial breast cancer to chromosome 17q21. Science 250(4988):1684‐9. Hamilton SR, Liu B, Parsons RE, Papadopoulos N, Jen J, Powell SM, Krush AJ, Berk T, Cohen Z, Tetu B and others. 1995. The molecular basis of Turcot's syndrome. N Engl J Med 332(13):839‐47. Hanahan D, Weinberg RA. 2000. 
The hallmarks of cancer. Cell 100(1):57‐70. Hanks S, Coleman K, Reid S, Plaja A, Firth H, Fitzpatrick D, Kidd A, Mehes K, Nash R, Robin N and others. 2004. Constitutional aneuploidy and cancer predisposition caused by biallelic mutations in BUB1B. Nat Genet 36(11):1159‐61. Hartlerode AJ, Scully R. 2009. Mechanisms of double‐strand break repair in somatic mammalian cells. Biochem J 423(2):157‐68. Hartmann W, Kuchler J, Koch A, Friedrichs N, Waha A, Endl E, Czerwitzki J, Metzger D, Steiner S, Wurst P and others. 2009. Activation of phosphatidylinositol‐3'‐ kinase/AKT signaling is essential in hepatoblastoma survival. Clin Cancer Res 15(14):4538‐45.

Hata J, Matsuda K, Ninomiya T, Yonemoto K, Matsushita T, Ohnishi Y, Saito S, Kitazono T, Ibayashi S, Iida M and others. 2007. Functional SNP in an Sp1‐ of AGTRL1 gene is associated with susceptibility to brain infarction. Hum Mol Genet 16(6):630‐9. Hawkins PT, Anderson KE, Davidson K, Stephens LR. 2006. Signalling through Class I PI3Ks in mammalian cells. Biochem Soc Trans 34(Pt 5):647‐62. He M, Cornelis MC, Kraft P, van Dam RM, Sun Q, Laurie CC, Mirel DB, Chasman DI, Ridker PM, Hunter DJ and others. Genome‐wide association study identifies variants at the IL18‐BCO2 locus associated with interleukin‐18 levels. Arterioscler Thromb Vasc Biol 30(4):885‐90. Heinrich MC, Corless CL, Duensing A, McGreevey L, Chen CJ, Joseph N, Singer S, Griffith DJ, Haley A, Town A and others. 2003. PDGFRA activating mutations in gastrointestinal stromal tumors. Science 299(5607):708‐10. Hennessy BT, Smith DL, Ram PT, Lu Y, Mills GB. 2005. Exploiting the PI3K/AKT pathway for cancer drug discovery. Nat Rev Drug Discov 4(12):988‐1004. Herbst RS, Heymach JV, Lippman SM. 2008. Lung cancer. N Engl J Med 359(13):1367‐80. Heron MP, Hoyert DL, Murphy SL, Xu JQ, Kochanek KD, Tejada‐Vera B. 2009. Deaths: Final Data for 2006. National vital statistics reports 57(14). Hildebrandt MA, Yang H, Hung MC, Izzo JG, Huang M, Lin J, Ajani JA, Wu X. 2009. Genetic variations in the PI3K/PTEN/AKT/mTOR pathway are associated with clinical outcomes in esophageal cancer patients treated with chemoradiotherapy. J Clin Oncol 27(6):857‐71. Hirsch FR, Herbst RS, Olsen C, Chansky K, Crowley J, Kelly K, Franklin WA, Bunn PA, Jr., Varella‐Garcia M, Gandara DR. 2008. Increased EGFR gene copy number detected by fluorescent in situ hybridization predicts outcome in non‐small‐ cell lung cancer patients treated with cetuximab and chemotherapy. J Clin Oncol 26(20):3351‐7. Hoskins JM, Goldberg RM, Qu P, Ibrahim JG, McLeod HL. 2007. UGT1A1*28 genotype and irinotecan‐induced neutropenia: dose matters. 
J Natl Cancer Inst 99(17):1290‐5. Hotchkiss RS, Strasser A, McDunn JE, Swanson PE. 2009. Cell death. N Engl J Med 361(16):1570‐83. Howlett NG, Taniguchi T, Olson S, Cox B, Waisfisz Q, De Die‐Smulders C, Persky N, Grompe M, Joenje H, Pals G and others. 2002. Biallelic inactivation of BRCA2 in Fanconi anemia. Science 297(5581):606‐9. Huang RS, Duan S, Kistner EO, Bleibel WK, Delaney SM, Fackenthal DL, Das S, Dolan ME. 2008a. Genetic variants contributing to daunorubicin‐induced cytotoxicity. Cancer Res 68(9):3161‐8. Huang RS, Duan S, Kistner EO, Hartford CM, Dolan ME. 2008b. Genetic variants associated with carboplatin‐induced cytotoxicity in cell lines derived from Africans. Mol Cancer Ther 7(9):3038‐46.

Huang S, Ballard D, Zhao H. 2007. The role of heritability in mapping expression quantitative trait loci. BMC Proc 1 Suppl 1:S86. Hutz JE, Kraja AT, McLeod HL, Province MA. 2008. CANDID: a flexible method for prioritizing candidate genes for complex human traits. Genet Epidemiol 32(8):779‐90. Hynes NE, Dey JH. 2009. PI3K inhibition overcomes trastuzumab resistance: blockade of ErbB2/ErbB3 is not always enough. Cancer Cell 15(5):353‐5. Hynes NE, Lane HA. 2005. ERBB receptors and cancer: the complexity of targeted inhibitors. Nat Rev Cancer 5(5):341‐54. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. 2004. Detection of large‐scale variation in the human genome. Nat Genet 36(9):949‐51. Ihle NT, Powis G. 2009. Take your PIK: phosphatidylinositol 3‐kinase inhibitors race through the clinic and toward cancer therapy. Mol Cancer Ther 8(1):1‐9. Iliopoulos O, Levy AP, Jiang C, Kaelin WG, Jr., Goldberg MA. 1996. Negative regulation of hypoxia‐inducible genes by the von Hippel‐Lindau protein. Proc Natl Acad Sci U S A 93(20):10595‐9. Ingelman‐Sundberg M. 2005. Genetic polymorphisms of cytochrome P450 2D6 (CYP2D6): clinical consequences, evolutionary aspects and functional diversity. Pharmacogenomics J 5(1):6‐13. Itoh T, Linn S, Kamide R, Tokushige H, Katori N, Hosaka Y, Yamaizumi M. 2000. Xeroderma pigmentosum variant heterozygotes show reduced levels of recovery of replicative DNA synthesis in the presence of after ultraviolet irradiation. J Invest Dermatol 115(6):981‐5. Itoh T, Mori T, Ohkubo H, Yamaizumi M. 1999. A newly identified patient with clinical xeroderma pigmentosum phenotype has a non‐sense mutation in the DDB2 gene and incomplete repair in (6‐4) photoproducts. J Invest Dermatol 113(2):251‐7. Jansen RC, Nap JP. 2001. Genetical genomics: the added value from segregation. Trends Genet 17(7):388‐91. Jen KY, Cheung VG. 2003. Transcriptional response of lymphoblastoid cells to ionizing radiation. Genome Res 13(9):2092‐100. 
Jin Y, Desta Z, Stearns V, Ward B, Ho H, Lee KH, Skaar T, Storniolo AM, Li L, Araba A and others. 2005. CYP2D6 genotype, use, and tamoxifen metabolism during adjuvant breast cancer treatment. J Natl Cancer Inst 97(1):30‐9. Jinnin M, Medici D, Park L, Limaye N, Liu Y, Boscolo E, Bischoff J, Vikkula M, Boye E, Olsen BR. 2008. Suppressed NFAT‐dependent VEGFR1 expression and constitutive VEGFR2 signaling in infantile hemangioma. Nat Med 14(11):1236‐46. Jones S, Emmerson P, Maynard J, Best JM, Jordan S, Williams GT, Sampson JR, Cheadle JP. 2002. Biallelic germline mutations in MYH predispose to multiple

colorectal adenoma and somatic G:C‐‐>T:A mutations. Hum Mol Genet 11(23):2961‐7. Jones S, Hruban RH, Kamiyama M, Borges M, Zhang X, Parsons DW, Lin JC, Palmisano E, Brune K, Jaffee EM and others. 2009. Exomic sequencing identifies PALB2 as a pancreatic cancer susceptibility gene. Science 324(5924):217. Karppinen SM, Heikkinen K, Rapakko K, Winqvist R. 2004. Mutation screening of the BARD1 gene: evidence for involvement of the Cys557Ser allele in hereditary susceptibility to breast cancer. J Med Genet 41(9):e114. Katso R, Okkenhaug K, Ahmadi K, White S, Timms J, Waterfield MD. 2001. Cellular function of phosphoinositide 3‐kinases: implications for development, homeostasis, and cancer. Annu Rev Cell Dev Biol 17:615‐75. Kirschner LS, Carney JA, Pack SD, Taymans SE, Giatzakis C, Cho YS, Cho‐Chung YS, Stratakis CA. 2000. Mutations of the gene encoding the type I‐alpha regulatory subunit in patients with the Carney complex. Nat Genet 26(1):89‐92. Kittles RA, Baffoe‐Bonnie AB, Moses TY, Robbins CM, Ahaghotu C, Huusko P, Pettaway C, Vijayakumar S, Bennett J, Hoke G and others. 2006. A common nonsense mutation in EphB2 is associated with prostate cancer risk in African American men with a positive family history. J Med Genet 43(6):507‐ 11. Kiyatkin A, Aksamitiene E, Markevich NI, Borisov NM, Hoek JB, Kholodenko BN. 2006. Scaffolding protein Grb2‐associated binder 1 sustains epidermal growth factor‐induced mitogenic and survival signaling by multiple loops. J Biol Chem 281(29):19925‐38. Kohler S, Bauer S, Horn D, Robinson PN. 2008. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet 82(4):949‐58. Kong S, Wei Q, Amos CI, Lynch PM, Levin B, Zong J, Frazier ML. 2001. Cyclin D1 polymorphism and increased risk of colorectal cancer at young age. J Natl Cancer Inst 93(14):1106‐8. Konopatskaya O, Poole AW. 2009. Protein kinase Calpha: disease regulator and therapeutic target. Trends Pharmacol Sci. 
Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L and others. 2007. Paired‐end mapping reveals extensive structural variation in the human genome. Science 318(5849):420‐6. Koyasu S. 2003. The role of PI3K in immune cells. Nat Immunol 4(4):313‐9. Kruglyak L, Daly MJ, Reeve‐Daly MP, Lander ES. 1996. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58(6):1347‐63. Kubo M, Hata J, Ninomiya T, Matsuda K, Yonemoto K, Nakano T, Matsushita T, Yamazaki K, Ohnishi Y, Saito S and others. 2007. A nonsynonymous SNP in PRKCH (protein kinase C eta) increases the risk of cerebral infarction. Nat Genet 39(2):212‐7.

Kufe DW, Holland JF, Frei E, American Cancer Society. 2003. Cancer medicine 6. Hamilton, Ont.: BC Decker. Kumar A, Wolpert C, Kandt RS, Segal J, Pufky J, Roses AD, Pericak‐Vance MA, Gilbert JR. 1995. A de novo frame‐shift mutation in the tuberin gene. Hum Mol Genet 4(8):1471‐2. Kummar S, Gutierrez ME, Gardner ER, Figg WD, Melillo G, Dancey J, Sausville EA, Conley BA, Murgo AJ, Doroshow JH. 2009. A phase I trial of UCN‐01 and prednisone in patients with refractory solid tumors and lymphomas. Cancer Chemother Pharmacol. La Vecchia C. 2009. Association between Mediterranean dietary patterns and cancer risk. Nutr Rev 67 Suppl 1:S126‐9. Ladroue C, Carcenac R, Leporrier M, Gad S, Le Hello C, Galateau‐Salle F, Feunteun J, Pouyssegur J, Richard S, Gardie B. 2008. PHD2 mutation and congenital erythrocytosis with paraganglioma. N Engl J Med 359(25):2685‐92. Lander E, Kruglyak L. 1995. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11(3):241‐7. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W and others. 2001. Initial sequencing and analysis of the human genome. Nature 409(6822):860‐921. Latif F, Tory K, Gnarra J, Yao M, Duh FM, Orcutt ML, Stackhouse T, Kuzmin I, Modi W, Geil L and others. 1993. Identification of the von Hippel‐Lindau disease tumor suppressor gene. Science 260(5112):1317‐20. Le Marchand L, Seifried A, Lum‐Jones A, Donlon T, Wilkens LR. 2003. Association of the cyclin D1 A870G polymorphism with advanced colorectal cancer. JAMA 290(21):2843‐8. Leppert M, Dobbs M, Scambler P, O'Connell P, Nakamura Y, Stauffer D, Woodward S, Burt R, Hughes J, Gardner E and others. 1987. The gene for familial polyposis coli maps to the long arm of chromosome 5. Science 238(4832):1411‐3. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford‐Shore BH, McGrath S, Hickenbotham M and others. 2008. 
DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456(7218):66‐72. Li J, Yen C, Liaw D, Podsypanina K, Bose S, Wang SI, Puc J, Miliaresis C, Rodgers L, McCombie R and others. 1997. PTEN, a putative protein tyrosine phosphatase gene mutated in human brain, breast, and prostate cancer. Science 275(5308):1943‐7. Li M, Li C, Guan W. 2008. Evaluation of coverage variation of SNP chips for genome‐ wide association studies. Eur J Hum Genet. Li Y, Patra JC. Integration of multiple data sources to prioritize candidate genes using discounted rating system. BMC Bioinformatics 11 Suppl 1:S20. Liang J, Choi J, Clardy J. 1999. Refined structure of the FKBP12‐rapamycin‐FRB ternary complex at 2.2 A resolution. Acta Crystallogr D Biol Crystallogr 55(Pt 4):736‐44.

Liang X, Shen D, Huang Y, Yin C, Bojanowski CM, Zhuang Z, Chan CC. 2007. Molecular pathology and CXCR4 expression in surgically excised retinal hemangioblastomas associated with von Hippel‐Lindau disease. Ophthalmology 114(1):147‐56. Liao Y, Hung MC. Physiological regulation of Akt activity and stability. Am J Transl Res 2(1):19‐42. Linghu B, Snitkin ES, Hu Z, Xia Y, Delisi C. 2009. Genome‐wide prioritization of disease genes and identification of disease‐disease associations from an integrated human functional linkage network. Genome Biol 10(9):R91. Liu F, Arias‐Vasquez A, Sleegers K, Aulchenko YS, Kayser M, Sanchez‐Juan P, Feng BJ, Bertoli‐Avella AM, van Swieten J, Axenovich TI and others. 2007. A genomewide screen for late‐onset Alzheimer disease in a genetically isolated dutch population. Am J Hum Genet 81(1):17‐31. Liu HX, Zhou XL, Liu T, Werelius B, Lindmark G, Dahl N, Lindblom A. 2003. The role of hMLH3 in familial colorectal cancer. Cancer Res 63(8):1894‐9. Liu P, Cheng H, Roberts TM, Zhao JJ. 2009. Targeting the phosphoinositide 3‐kinase pathway in cancer. Nat Rev Drug Discov 8(8):627‐44. Lodovici M, Bigagli E. 2009. Biomarkers of induced active and passive smoking damage. Int J Environ Res Public Health 6(3):874‐88. Lopez‐Bigas N, Ouzounis CA. 2004. Genome‐wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res 32(10):3108‐14. Lucae S, Salyakina D, Barden N, Harvey M, Gagne B, Labbe M, Binder EB, Uhr M, Paez‐Pereda M, Sillaber I and others. 2006. P2RX7, a gene coding for a purinergic ligand‐gated ion channel, is associated with major depressive disorder. Hum Mol Genet 15(16):2438‐45. Ma RW, Chapman K. 2009. A systematic review of the effect of diet in prostate cancer prevention and treatment. J Hum Nutr Diet 22(3):187‐99; quiz 200‐2. Ma X, Lee H, Wang L, Sun F. 2007. CGI: a new approach for prioritizing genes by combining gene expression and protein‐protein interaction data. Bioinformatics 23(2):215‐21. 
Maier T, Guell M, Serrano L. 2009. Correlation of mRNA and protein in complex biological samples. FEBS Lett 583(24):3966‐73. Manning BD, Cantley LC. 2007. AKT/PKB signaling: navigating downstream. Cell 129(7):1261‐74. Martinelli E, De Palma R, Orditura M, De Vita F, Ciardiello F. 2009. Anti‐epidermal growth factor receptor monoclonal antibodies in cancer therapy. Clin Exp Immunol 158(1):1‐9. Massey AJ, Schoepfer J, Brough PA, Brueggen J, Chene P, Drysdale MJ, Pfaar U, Radimerski T, Ruetz S, Schweitzer A and others. Preclinical antitumor activity of the orally available heat shock protein 90 inhibitor NVP‐BEP800. Mol Cancer Ther 9(4):906‐19.

McGoldrick JP, Yeh YC, Solomon M, Essigmann JM, Lu AL. 1995. Characterization of a mammalian homolog of the Escherichia coli MutY mismatch repair protein. Mol Cell Biol 15(2):989‐96. Meek DW. 2009. Tumour suppression by p53: a role for the DNA damage response? Nat Rev Cancer 9(10):714‐23. Miki Y, Swensen J, Shattuck‐Eidens D, Futreal PA, Harshman K, Tavtigian S, Liu Q, Cochran C, Bennett LM, Ding W and others. 1994. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 266(5182):66‐ 71. Miller AJ, Mihm MC, Jr. 2006. Melanoma. N Engl J Med 355(1):51‐65. Miller RW, Rubinstein JH. 1995. Tumors in Rubinstein‐Taybi syndrome. Am J Med Genet 56(1):112‐5. Mio F, Chiba K, Hirose Y, Kawaguchi Y, Mikami Y, Oya T, Mori M, Kamata M, Matsumoto M, Ozaki K and others. 2007. A functional polymorphism in COL11A1, which encodes the alpha 1 chain of type XI collagen, is associated with susceptibility to lumbar disc herniation. Am J Hum Genet 81(6):1271‐7. Miyaki M, Konishi M, Tanaka K, Kikuchi‐Yanoshita R, Muraoka M, Yasuno M, Igari T, Koike M, Chiba M, Mori T. 1997. Germline mutation of MSH6 as the cause of hereditary nonpolyposis colorectal cancer. Nat Genet 17(3):271‐2. Miyamoto Y, Mabuchi A, Shi D, Kubo T, Takatori Y, Saito S, Fujioka M, Sudo A, Uchida A, Yamamoto S and others. 2007. A functional polymorphism in the 5' UTR of GDF5 is associated with susceptibility to osteoarthritis. Nat Genet 39(4):529‐ 33. Moffatt MF, Kabesch M, Liang L, Dixon AL, Strachan D, Heath S, Depner M, von Berg A, Bufe A, Rietschel E and others. 2007. Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature 448(7152):470‐3. Moghaddam AA, Woodward M, Huxley R. 2007. Obesity and risk of colorectal cancer: a meta‐analysis of 31 studies with 70,000 events. Cancer Epidemiol Biomarkers Prev 16(12):2533‐47. Molinari M. 2000. Cell cycle checkpoints and their inactivation in human cancer. Cell Prolif 33(5):261‐74. 
Monks SA, Leonardson A, Zhu H, Cundiff P, Pietrusiak P, Edwards S, Phillips JW, Sachs A, Schadt EE. 2004. Genetic inheritance of gene expression in human cell lines. Am J Hum Genet 75(6):1094‐105. Mootha VK, Lepage P, Miller K, Bunkenborg J, Reich M, Hjerrild M, Delmonte T, Villeneuve A, Sladek R, Xu F and others. 2003. Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc Natl Acad Sci U S A 100(2):605‐10. Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG. 2004. Genetic analysis of genome‐wide variation in human gene expression. Nature 430(7001):743‐7.

Morrison JL, Breitling R, Higham DJ, Gilbert DR. 2005. GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics 6:233. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan‐Smith E, Dalton JD, Girtman K, Mathew S, Ma J, Pounds SB and others. 2007. Genome‐wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446(7137):758‐ 64. Nakamura Y, Lathrop M, Leppert M, Dobbs M, Wasmuth J, Wolff E, Carlson M, Fujimoto E, Krapcho K, Sears T and others. 1988. Localization of the genetic defect in familial adenomatous polyposis within a small region of chromosome 5. Am J Hum Genet 43(5):638‐44. Naylor EW, Gardner EJ. 1977. Penetrance and expressivity of the gene responsible for the Gardner syndrome. Clin Genet 11(6):381‐93. Nellist M, Goedbloed MA, de Winter C, Verhaaf B, Jankie A, Reuser AJ, van den Ouweland AM, van der Sluijs P, Halley DJ. 2002. Identification and characterization of the interaction between tuberin and 14‐3‐3zeta. J Biol Chem 277(42):39417‐24. Nelson MH, Dolder CR. 2006. Lapatinib: a novel dual tyrosine kinase inhibitor with activity in solid tumors. Ann Pharmacother 40(2):261‐9. New DC, Wong YH. 2007. Molecular mechanisms mediating the G protein‐coupled receptor regulation of cell cycle progression. J Mol Signal 2:2. Nicolaides NC, Papadopoulos N, Liu B, Wei YF, Carter KC, Ruben SM, Rosen CA, Haseltine WA, Fleischmann RD, Fraser CM and others. 1994. Mutations of two PMS homologues in hereditary nonpolyposis colon cancer. Nature 371(6492):75‐80. Niculescu AB, 3rd, Segal DS, Kuczenski R, Barrett T, Hauger RL, Kelsoe JR. 2000. Identifying a series of candidate genes for mania and psychosis: a convergent functional genomics approach. Physiol Genomics 4(1):83‐91. Niida H, Nakanishi M. 2006. DNA damage checkpoints in mammals. Mutagenesis 21(1):3‐9. Nishisho I, Nakamura Y, Miyoshi Y, Miki Y, Ando H, Horii A, Koyama K, Utsunomiya J, Baba S, Hedge P. 1991. 
Mutations of chromosome 5q21 genes in FAP and colorectal cancer patients. Science 253(5020):665‐9. O'Reilly KE, Rojo F, She QB, Solit D, Mills GB, Smith D, Lane H, Hofmann F, Hicklin DJ, Ludwig DL and others. 2006. mTOR inhibition induces upstream signaling and activates Akt. Cancer Res 66(3):1500‐8. Ocana A, Amir E. 2009. Irreversible pan‐ErbB tyrosine kinase inhibitors and breast cancer: current status and future directions. Cancer Treat Rev 35(8):685‐91. Oh KS, Khan SG, Jaspers NG, Raams A, Ueda T, Lehmann A, Friedmann PS, Emmert S, Gratchev A, Lachlan K and others. 2006. Phenotypic heterogeneity in the XPB DNA helicase gene (ERCC3): xeroderma pigmentosum without and with Cockayne syndrome. Hum Mutat 27(11):1092‐103.

Oliveira JB, Bidere N, Niemela JE, Zheng L, Sakai K, Nix CP, Danner RL, Barb J, Munson PJ, Puck JM and others. 2007. NRAS mutation causes a human autoimmune lymphoproliferative syndrome. Proc Natl Acad Sci U S A 104(21):8953‐8. Osborne CK. 1998. Tamoxifen in the treatment of breast cancer. N Engl J Med 339(22):1609‐18. Ostergaard JR, Sunde L, Okkels H. 2005. Neurofibromatosis von Recklinghausen type I phenotype and early onset of cancers in siblings compound heterozygous for mutations in MSH6. Am J Med Genet A 139A(2):96‐105; discussion 96. Oti M, Snel B, Huynen MA, Brunner HG. 2006. Predicting disease genes using protein‐protein interactions. J Med Genet 43(8):691‐8. Pai SI, Wu GS, Ozoren N, Wu L, Jen J, Sidransky D, El‐Deiry WS. 1998. Rare loss‐of‐function mutation of a death receptor gene in head and neck cancer. Cancer Res 58(16):3513‐8. Palm W, de Lange T. 2008. How shelterin protects mammalian telomeres. Annu Rev Genet 42:301‐34. Papadopoulos N, Nicolaides NC, Wei YF, Ruben SM, Carter KC, Rosen CA, Haseltine WA, Fleischmann RD, Fraser CM, Adams MD and others. 1994. Mutation of a mutL homolog in hereditary colon cancer. Science 263(5153):1625‐9. Papassotiropoulos A, Stephan DA, Huentelman MJ, Hoerndli FJ, Craig DW, Pearson JV, Huynh KD, Brunner F, Corneveaux J, Osborne D and others. 2006. Common Kibra alleles are associated with human memory performance. Science 314(5798):475‐8. Pare G, Serre D, Brisson D, Anand SS, Montpetit A, Tremblay G, Engert JC, Hudson TJ, Gaudet D. 2007. Genetic analysis of 103 candidate genes for coronary artery disease and associated phenotypes in a founder population reveals a new association between ‐1 and high‐density lipoprotein cholesterol. Am J Hum Genet 80(4):673‐82. Parrilla‐Castellar ER, Arlander SJ, Karnitz L. 2004. Dial 9‐1‐1 for DNA damage: the Rad9‐Hus1‐Rad1 (9‐1‐1) clamp complex. DNA Repair (Amst) 3(8‐9):1009‐14. Parsons R, Li GM, Longley M, Modrich P, Liu B, Berk T, Hamilton SR, Kinzler KW, Vogelstein B. 1995.
Mismatch repair deficiency in phenotypically normal human cells. Science 268(5211):738‐40. Penault‐Llorca F, Bilous M, Dowsett M, Hanna W, Osamura RY, Ruschoff J, van de Vijver M. 2009. Emerging technologies for assessing HER2 amplification. Am J Clin Pathol 132(4):539‐48. Perez‐Iratxeta C, Bork P, Andrade‐Navarro MA. 2007. Update of the G2D tool for prioritization of gene candidates to inherited diseases. Nucleic Acids Res. Perez‐Iratxeta C, Wjst M, Bork P, Andrade MA. 2005. G2D: a tool for mining genes associated with disease. BMC Genet 6:45. Petrij F, Giles RH, Dauwerse HG, Saris JJ, Hennekam RC, Masuno M, Tommerup N, van Ommen GJ, Goodman RH, Peters DJ and others. 1995. Rubinstein‐Taybi

syndrome caused by mutations in the transcriptional co‐activator CBP. Nature 376(6538):348‐51. Pilia G, Chen WM, Scuteri A, Orru M, Albai G, Dei M, Lai S, Usala G, Lai M, Loi P and others. 2006. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet 2(8):e132. Pleasance ED, Stephens PJ, O'Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C and others. 2009. A small‐cell lung cancer genome with complex signatures of tobacco exposure. Nature. Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, Liew A, Khalili H, Chandrasekaran A, Davies LR and others. 2007. TRAF1‐C5 as a risk locus for rheumatoid arthritis‐‐a genomewide study. N Engl J Med 357(12):1199‐209. Poschl G, Seitz HK. 2004. Alcohol and cancer. Alcohol Alcohol 39(3):155‐65. Prescott SM, Lalouel JM, Leppert M. 2008. From linkage maps to quantitative trait loci: the history and science of the Utah genetic reference project. Annu Rev Genomics Hum Genet 9:347‐58. Purohit V, Khalsa J, Serrano J. 2005. Mechanisms of alcohol‐associated cancers: introduction and summary of the symposium. Alcohol 35(3):155‐60. Qiao M, Sheng S, Pardee AB. 2008. Metastasis and AKT activation. Cell Cycle 7(19):2991‐6. Raynaud FI, Eccles SA, Patel S, Alix S, Box G, Chuckowree I, Folkes A, Gowan S, De Haven Brandon A, Di Stefano F and others. 2009. Biological properties of potent inhibitors of class I phosphatidylinositide 3‐kinases: from PI‐103 through PI‐540, PI‐620 to the oral agent GDC‐0941. Mol Cancer Ther 8(7):1725‐38. Reyland ME. 2009. Protein kinase C isoforms: Multi‐functional regulators of cell life and death. Front Biosci 14:2386‐99. Richards JB, Rivadeneira F, Inouye M, Pastinen TM, Soranzo N, Wilson SG, Andrew T, Falchi M, Gwilliam R, Ahmadi KR and others. 2008. Bone mineral density, osteoporosis, and osteoporotic fractures: a genome‐wide association study. Lancet 371(9623):1505‐12. 
Riemenschneider M, Konta L, Friedrich P, Schwarz S, Taddei K, Neff F, Padovani A, Kolsch H, Laws SM, Klopp N and others. 2006. A functional polymorphism within plasminogen activator urokinase (PLAU) is associated with Alzheimer's disease. Hum Mol Genet 15(16):2446‐56. Rigaud S, Fondaneche MC, Lambert N, Pasquier B, Mateo V, Soulas P, Galicier L, Le Deist F, Rieux‐Laucat F, Revy P and others. 2006. XIAP deficiency in humans causes an X‐linked lymphoproliferative syndrome. Nature 444(7115):110‐4. Rini BI. 2008. Temsirolimus, an inhibitor of mammalian target of rapamycin. Clin Cancer Res 14(5):1286‐90. Rioux JD, Xavier RJ, Taylor KD, Silverberg MS, Goyette P, Huett A, Green T, Kuballa P, Barmada MM, Datta LW and others. 2007. Genome‐wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat Genet 39(5):596‐604.

Roe JS, Kim H, Lee SM, Kim ST, Cho EJ, Youn HD. 2006. p53 stabilization and transactivation by a von Hippel‐Lindau protein. Mol Cell 22(3):395‐405. Roelfsema JH, White SJ, Ariyurek Y, Bartholdi D, Niedrist D, Papadia F, Bacino CA, den Dunnen JT, van Ommen GJ, Breuning MH and others. 2005. Genetic heterogeneity in Rubinstein‐Taybi syndrome: mutations in both the CBP and EP300 genes cause disease. Am J Hum Genet 76(4):572‐80. Rogaeva E, Meng Y, Lee JH, Gu Y, Kawarai T, Zou F, Katayama T, Baldwin CT, Cheng R, Hasegawa H and others. 2007. The neuronal sortilin‐related receptor SORL1 is genetically associated with Alzheimer disease. Nat Genet 39(2):168‐77. Romeo S, Pennacchio LA, Fu Y, Boerwinkle E, Tybjaerg‐Hansen A, Hobbs HH, Cohen JC. 2007. Population‐based resequencing of ANGPTL4 uncovers variations that reduce triglycerides and increase HDL. Nat Genet 39(4):513‐6. Rommel C, Clarke BA, Zimmermann S, Nunez L, Rossman R, Reid K, Moelling K, Yancopoulos GD, Glass DJ. 1999. Differentiation stage‐specific inhibition of the Raf‐MEK‐ERK pathway by Akt. Science 286(5445):1738‐41. Rossi S, Masotti D, Nardini C, Bonora E, Romeo G, Macii E, Benini L, Volinia S. 2006. TOM: a web‐based integrated approach for identification of candidate disease genes. Nucleic Acids Res 34(Web Server issue):W285‐92. Saccone NL, Kwon JM, Corbett J, Goate A, Rochberg N, Edenberg HJ, Foroud T, Li TK, Begleiter H, Reich T and others. 2000. A genome screen of maximum number of drinks as an alcoholism phenotype. Am J Med Genet 96(5):632‐7. Sanoff HK, McLeod HL. 2008. Predictive Factors for Response and Toxicity in Chemotherapy: Pharmacogenomics. Semin Colon Rectal Surg 19(4):226‐230. Santarelli RL, Pierre F, Corpet DE. 2008. Processed meat and colorectal cancer: a review of epidemiologic and experimental evidence. Nutr Cancer 60(2):131‐44. Sato S, Fujita N, Tsuruo T. 2000. Modulation of Akt kinase activity by binding to Hsp90. Proc Natl Acad Sci U S A 97(20):10832‐7.
Saxena R, Hivert MF, Langenberg C, Tanaka T, Pankow JS, Vollenweider P, Lyssenko V, Bouatia‐Naji N, Dupuis J, Jackson AU and others. Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge. Nat Genet 42(2):142‐8. Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H, Roix JJ, Kathiresan S, Hirschhorn JN, Daly MJ and others. 2007. Genome‐wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316(5829):1331‐6. Schadt EE, Lamb J, Yang X, Zhu J, Edwards S, Guhathakurta D, Sieberts SK, Monks S, Reitman M, Zhang C and others. 2005. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet 37(7):710‐7.

Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G and others. 2003. Genetics of gene expression surveyed in maize, mouse and man. Nature 422(6929):297‐302. Schafer B, Gschwind A, Ullrich A. 2004. Multiple G‐protein‐coupled receptor signals converge on the epidermal growth factor receptor to promote migration and invasion. Oncogene 23(4):991‐9. Schouten LJ, Rivera C, Hunter DJ, Spiegelman D, Adami HO, Arslan A, Beeson WL, van den Brandt PA, Buring JE, Folsom AR and others. 2008. Height, body mass index, and ovarian cancer: a pooled analysis of 12 cohort studies. Cancer Epidemiol Biomarkers Prev 17(4):902‐12. Schulze TG, Chen YS, Badner JA, McInnis MG, DePaulo JR, Jr., McMahon FJ. 2003. Additional, physically ordered markers increase linkage signal for bipolar disorder on chromosome 18q22. Biol Psychiatry 53(3):239‐43. Schwartz GK, Shah MA. 2005. Targeting the cell cycle: a new approach to cancer therapy. J Clin Oncol 23(36):9408‐21. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU and others. 2007. A genome‐wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316(5829):1341‐5. Seal S, Thompson D, Renwick A, Elliott A, Kelly P, Barfoot R, Chagtai T, Jayatilake H, Ahmed M, Spanova K and others. 2006. Truncating mutations in the Fanconi anemia J gene BRIP1 are low‐penetrance breast cancer susceptibility alleles. Nat Genet 38(11):1239‐41. Sebolt‐Leopold JS, Herrera R. 2004. Targeting the mitogen‐activated protein kinase cascade to treat cancer. Nat Rev Cancer 4(12):937‐47. Seelow D, Schwarz JM, Schuelke M. 2008. GeneDistiller‐‐distilling candidate genes from linkage intervals. PLoS One 3(12):e3874. Semenza GL. 2009. Defining the role of hypoxia‐inducible factor 1 in cancer biology and therapeutics. Oncogene. 
Shaw‐Smith C, Pittman AM, Willatt L, Martin H, Rickman L, Gribble S, Curley R, Cumming S, Dunn C, Kalaitzopoulos D and others. 2006. Microdeletion encompassing MAPT at chromosome 17q21.3 is associated with developmental delay and learning disability. Nat Genet 38(9):1032‐7. Shen GQ, Li L, Girelli D, Seidelmann SB, Rao S, Fan C, Park JE, Xi Q, Li J, Hu Y and others. 2007. An LRP8 variant is associated with familial and premature coronary artery disease and myocardial infarction. Am J Hum Genet 81(4):780‐91. Shields JA, Eagle RC, Jr., Shields CL, Marr BP. 2005. Aggressive retinal astrocytomas in 4 patients with tuberous sclerosis complex. Arch Ophthalmol 123(6):856‐63. Shukla SJ, Dolan ME. 2005. Use of CEPH and non‐CEPH lymphoblast cell lines in pharmacogenetic studies. Pharmacogenomics 6(3):303‐10.

Sijbers AM, de Laat WL, Ariza RR, Biggerstaff M, Wei YF, Moggs JG, Carter KC, Shell BK, Evans E, de Jong MC and others. 1996. Xeroderma pigmentosum group F caused by a defect in a structure‐specific DNA repair endonuclease. Cell 86(5):811‐22. Singh P, Rubin N. 1993. Insulinlike growth factors and binding proteins in colon cancer. Gastroenterology 105(4):1218‐37. Sjalander A, Birgander R, Rannug A, Alexandrie AK, Tornling G, Beckman G. 1996. Association between the p21 codon 31 A1 (arg) allele and lung cancer. Hum Hered 46(4):221‐5. Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, Boutin P, Vincent D, Belisle A, Hadjadj S and others. 2007. A genome‐wide association study identifies novel risk loci for type 2 diabetes. Nature 445(7130):881‐5. Smider V, Chu G. 1997. The end‐joining reaction in V(D)J recombination. Semin Immunol 9(3):189‐97. Smidt M, Kirsch I, Ratner L. 1990. Deletion of Alu sequences in the fifth c‐sis intron in individuals with meningiomas. J Clin Invest 86(4):1151‐7. Smith AV, Thomas DJ, Munro HM, Abecasis GR. 2005. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res 15(11):1519‐34. Soufir N, Avril MF, Chompret A, Demenais F, Bombled J, Spatz A, Stoppa‐Lyonnet D, Benard J, Bressac‐de Paillerets B. 1998. Prevalence of p16 and CDK4 germline mutations in 48 melanoma‐prone families in France. The French Familial Melanoma Study Group. Hum Mol Genet 7(2):209‐16. Sparkes RS, Murphree AL, Lingua RW, Sparkes MC, Field LL, Funderburk SJ, Benedict WF. 1983. Gene for hereditary retinoblastoma assigned to human chromosome 13 by linkage to esterase D. Science 219(4587):971‐3. Stamm DS, Siegel DG, Mehltretter L, Connelly JJ, Trott A, Ellis N, Zismann V, Stephan DA, George TM, Vekemans M and others. 2008. Refinement of 2q and 7p loci in a large multiplex NTD family. Birth Defects Res A Clin Mol Teratol 82(6):441‐52. 
Stefansson H, Rye DB, Hicks A, Petursson H, Ingason A, Thorgeirsson TE, Palsson S, Sigmundsson T, Sigurdsson AP, Eiriksdottir I and others. 2007. A genetic risk factor for periodic limb movements in sleep. N Engl J Med 357(7):639‐47. Stokowski RP, Pant PV, Dadd T, Fereday A, Hinds DA, Jarman C, Filsell W, Ginger RS, Green MR, van der Ouderaa FJ and others. 2007. A genomewide association study of skin pigmentation in a South Asian population. Am J Hum Genet 81(6):1119‐32. Stommel JM, Kimmelman AC, Ying H, Nabioullin R, Ponugoti AH, Wiedemeyer R, Stegh AH, Bradner JE, Ligon KL, Brennan C and others. 2007. Coactivation of receptor tyrosine kinases affects the response of tumor cells to targeted therapies. Science 318(5848):287‐90. Strachan T, Read AP. 2004. Human molecular genetics. New York: Garland Press. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C and others. 2007. Relative impact of nucleotide and

copy number variation on gene expression phenotypes. Science 315(5813):848‐53. Straus SE, Jaffe ES, Puck JM, Dale JK, Elkon KB, Rosen‐Wolff A, Peters AM, Sneller MC, Hallahan CW, Wang J and others. 2001. The development of lymphomas in families with autoimmune lymphoproliferative syndrome with germline Fas mutations and defective lymphocyte apoptosis. Blood 98(1):194‐200. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G and others. 2004. A gene atlas of the mouse and human protein‐encoding transcriptomes. Proc Natl Acad Sci U S A 101(16):6062‐7. Sulem P, Gudbjartsson DF, Stacey SN, Helgason A, Rafnar T, Magnusson KP, Manolescu A, Karason A, Palsson A, Thorleifsson G and others. 2007. Genetic determinants of hair, eye and skin pigmentation in Europeans. Nat Genet 39(12):1443‐52. Sun Q, Cornelis MC, Kraft P, Qi L, van Dam RM, Girman CJ, Laurie CC, Mirel DB, Gong H, Sheu CC and others. Genome‐wide association study identifies polymorphisms in LEPR as determinants of plasma soluble levels. Hum Mol Genet. Sun T, Gao Y, Tan W, Ma S, Shi Y, Yao J, Guo Y, Yang M, Zhang X, Zhang Q and others. 2007. A six‐nucleotide insertion‐deletion polymorphism in the CASP8 promoter is associated with susceptibility to multiple cancers. Nat Genet 39(5):605‐13. Tabernero J, Rojo F, Calvo E, Burris H, Judson I, Hazell K, Martinelli E, Ramon y Cajal S, Jones S, Vidal L and others. 2008. Dose‐ and schedule‐dependent inhibition of the mammalian target of rapamycin pathway with everolimus: a phase I tumor pharmacodynamic study in patients with advanced solid tumors. J Clin Oncol 26(10):1603‐10. Tanaka K, Miura N, Satokata I, Miyamoto I, Yoshida MC, Satoh Y, Kondo S, Yasui A, Okayama H, Okada Y. 1990. Analysis of a human DNA excision repair gene involved in group A xeroderma pigmentosum and containing a zinc‐finger domain. Nature 348(6296):73‐6. Tang D, Phillips DH, Stampfer M, Mooney LA, Hsu Y, Cho S, Tsai WY, Ma J, Cole KJ, She MN and others. 
2001. Association between carcinogen‐DNA adducts in white blood cells and lung cancer risk in the physicians health study. Cancer Res 61(18):6708‐12. Tang D, Santella RM, Blackwood AM, Young TL, Mayer J, Jaretzki A, Grantham S, Tsai WY, Perera FP. 1995. A molecular epidemiological case‐control study of lung cancer. Cancer Epidemiol Biomarkers Prev 4(4):341‐6. Tang P, Skinner KA, Hicks DG. 2009. Molecular classification of breast carcinomas by immunohistochemical analysis: are we ready? Diagn Mol Pathol 18(3):125‐32. Tarpey PS, Stevens C, Teague J, Edkins S, O'Meara S, Avis T, Barthorpe S, Buck G, Butler A, Cole J and others. 2006. Mutations in the gene encoding the Sigma 2 subunit of the adaptor protein 1 complex, AP1S2, cause X‐linked mental retardation. Am J Hum Genet 79(6):1119‐24. Taylor M, Dehainault C, Desjardins L, Doz F, Levy C, Sastre X, Couturier J, Stoppa‐Lyonnet D, Houdayer C, Gauthier‐Villars M. 2007. Genotype‐phenotype correlations in hereditary familial retinoblastoma. Hum Mutat 28(3):284‐93. Tosh K, Ravikumar M, Bell JT, Meisner S, Hill AV, Pitchappan R. 2006. Variation in MICA and MICB genes and enhanced susceptibility to paucibacillary leprosy in South India. Hum Mol Genet 15(19):2880‐7. Troyanskaya OG. 2005. Putting the 'bio' into bioinformatics. Genome Biol 6(10):351. Turner FS, Clutterbuck DR, Semple CA. 2003. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol 4(11):R75. Vahteristo P, Tamminen A, Karvinen P, Eerola H, Eklund C, Aaltonen LA, Blomqvist C, Aittomaki K, Nevanlinna H. 2001. p53, CHK2, and CHK1 genes in Finnish families with Li‐Fraumeni syndrome: further evidence of CHK2 in inherited cancer predisposition. Cancer Res 61(15):5718‐22. van Driel MA, Cuelenaere K, Kemmeren PP, Leunissen JA, Brunner HG. 2003. A new web‐based data mining tool for the identification of candidate genes for human genetic disorders. Eur J Hum Genet 11(1):57‐63. van Kempen GM, van Vliet LJ. 2000. Mean and variance of ratio estimators used in fluorescence ratio imaging. Cytometry 39(4):300‐5. van Slegtenhorst M, de Hoogt R, Hermans C, Nellist M, Janssen B, Verhoef S, Lindhout D, van den Ouweland A, Halley D, Young J and others. 1997. Identification of the tuberous sclerosis gene TSC1 on chromosome 9q34. Science 277(5327):805‐8. Vasey PA, Jayson GC, Gordon A, Gabra H, Coleman R, Atkinson R, Parkin D, Paul J, Hay A, Kaye SB. 2004. Phase III randomized trial of docetaxel‐carboplatin versus paclitaxel‐carboplatin as first‐line chemotherapy for ovarian carcinoma. J Natl Cancer Inst 96(22):1682‐91. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA and others.
2001. The sequence of the human genome. Science 291(5507):1304‐51. Wang B, Hurov K, Hofmann K, Elledge SJ. 2009. NBA1, a new player in the Brca1 A complex, is required for DNA damage resistance and checkpoint control. Genes Dev 23(6):729‐39. Wang K, Li M, Bucan M. 2007. Pathway‐Based Approaches for Analysis of Genomewide Association Studies. Am J Hum Genet 81(6). Wang WW, Spurdle AB, Kolachana P, Bove B, Modan B, Ebbers SM, Suthers G, Tucker MA, Kaufman DJ, Doody MM and others. 2001. A single nucleotide polymorphism in the 5' untranslated region of RAD51 and risk of cancer among BRCA1/2 mutation carriers. Cancer Epidemiol Biomarkers Prev 10(9):955‐60.

Watters JW, Kraja A, Meucci MA, Province MA, McLeod HL. 2004. Genome‐wide discovery of loci influencing chemotherapy cytotoxicity. Proc Natl Acad Sci U S A 101(32):11809‐14. Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM and others. 2008. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet. William WN, Jr., Heymach JV, Kim ES, Lippman SM. 2009. Molecular targets for cancer chemoprevention. Nat Rev Drug Discov 8(3):213‐25. Witze ES, Old WM, Resing KA, Ahn NG. 2007. Mapping protein post‐translational modifications with mass spectrometry. Nat Methods 4(10):798‐806. Wooster R, Bignell G, Lancaster J, Swift S, Seal S, Mangion J, Collins N, Gregory S, Gumbs C, Micklem G. 1995. Identification of the breast cancer susceptibility gene BRCA2. Nature 378(6559):789‐92. World Health Organization. The World health report : 2004 : changing history. 2004. Wu X, Hawse JR, Subramaniam M, Goetz MP, Ingle JN, Spelsberg TC. 2009. The tamoxifen metabolite, endoxifen, is a potent antiestrogen that targets estrogen receptor alpha for degradation in breast cancer cells. Cancer Res 69(5):1722‐7. Xin H, Liu D, Songyang Z. 2008. The telosome/shelterin complex and its functions. Genome Biol 9(9):232. Xiong DH, Liu XG, Guo YF, Tan LJ, Wang L, Sha BY, Tang ZH, Pan F, Yang TL, Chen XD and others. 2009. Genome‐wide association and follow‐up replication studies identified ADAMTS18 and TGFBR3 as bone mass candidate genes in different ethnic groups. Am J Hum Genet 84(3):388‐98. Yamamura K, Ichihashi M, Hiramoto T, Ogoshi M, Nishioka K, Fujiwara Y. 1989. Clinical and photobiological characteristics of xeroderma pigmentosum complementation group F: a review of cases from Japan. Br J Dermatol 121(4):471‐80. Yan L, Hsu K, Beckman RA. 2008. Antibody‐based therapy for solid tumors. Cancer J 14(3):178‐83. Yilmaz S, Jonveaux P, Bicep C, Pierron L, Smail‐Tabbone M, Devignes MD. 2009. 
Gene‐disease relationship discovery based on model‐driven data integration and database view definition. Bioinformatics 25(2):230‐6. Yu CF, Liu ZX, Cantley LG. 2002a. ERK negatively regulates the epidermal growth factor‐mediated interaction of Gab1 and the phosphatidylinositol 3‐kinase. J Biol Chem 277(22):19382‐8. Yu T, Robb VA, Singh V, Gutmann DH, Newsham IF. 2002b. The 4.1/ezrin/radixin/moesin domain of the DAL‐1/Protein 4.1B tumour suppressor interacts with 14‐3‐3 proteins. Biochem J 365(Pt 3):783‐9. Yuan R, Kay A, Berg WJ, Lebwohl D. 2009. Targeting tumorigenesis: development and use of mTOR inhibitors in cancer therapy. J Hematol Oncol 2:45.

Yuan TL, Cantley LC. 2008. PI3K pathway alterations in cancer: variations on a theme. Oncogene 27(41):5497‐510. Yuan X, Waterworth D, Perry JR, Lim N, Song K, Chambers JC, Zhang W, Vollenweider P, Stirnadel H, Johnson T and others. 2008. Population‐based genome‐wide association studies reveal six loci influencing plasma levels of liver enzymes. Am J Hum Genet 83(4):520‐8. Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM and others. 2007. Replication of genome‐wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316(5829):1336‐41. Zhang P, Zhang J, Sheng H, Russo JJ, Osborne B, Buetow K. 2006. Gene functional similarity search tool (GFSST). BMC Bioinformatics 7:135. Zhang W, Dolan ME. 2009. Use of cell lines in the investigation of pharmacogenetic loci. Curr Pharm Des 15(32):3782‐95. Zhang X, Miao X, Sun T, Tan W, Qu S, Xiong P, Zhou Y, Lin D. 2005. Functional polymorphisms in cell death pathway genes FAS and FASL contribute to risk of lung cancer. J Med Genet 42(6):479‐84. Zuo L, Weger J, Yang Q, Goldstein AM, Tucker MA, Walker GJ, Hayward N, Dracopoli NC. 1996. Germline mutations in the p16INK4a binding domain of CDK4 in familial melanoma. Nat Genet 12(1):97‐9.
