(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(19) World Intellectual Property Organization, International Bureau
(43) International Publication Date: 1 September 2011 (01.09.2011)
(10) International Publication Number: WO 2011/104730 A1

(51) International Patent Classification: C12Q 1/68 (2006.01)
(21) International Application Number: PCT/IS2011/050004
(22) International Filing Date: 24 February 2011 (24.02.2011)
(25) Filing Language: English
(26) Publication Language: English
(30) Priority Data: 24 February 2010 (24.02.2010) IS
(71) Applicant (for all designated States except US): DECODE GENETICS EHF [IS/IS]; Sturlugata 8, Reykjavik, IS-101 Reykjavik (IS).
(72) Inventors; and
(75) Inventors/Applicants (for US only): GUDBJARTSSON, Daniel [IS/IS]; Sogavegur 38, IS-108 Reykjavik (IS). RAFNAR, Thorunn [IS/IS]; Kvistaland 24, IS-108 Reykjavik (IS). THORGEIRSSON, Thorgeir [IS/IS]; Vesturgata 5a, IS-101 Reykjavik (IS).
(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
Declarations under Rule 4.17:
- of inventorship (Rule 4.17(iv))
Published:
- with international search report (Art. 21(3))
- with sequence listing part of description (Rule 5.2(a))


(54) Title: GENETIC VARIANTS PREDICTIVE OF LUNG CANCER RISK

(57) Abstract: The present invention discloses certain genetic variants that are susceptibility variants for lung cancer. The invention relates to risk assessment and diagnostic methods using the variants. The invention further relates to kits for use in risk assessment of lung cancer.

GENETIC VARIANTS PREDICTIVE OF LUNG CANCER RISK

BACKGROUND OF THE INVENTION

Genetic risk is conferred by subtle differences in the genome among individuals in a population. Variations in the human genome are most frequently due to single nucleotide polymorphisms (SNPs), although other variations are also important. SNPs are located on average every 1000 base pairs in the human genome. Accordingly, a typical human gene containing 250,000 base pairs may contain 250 different SNPs. Only a minor number of SNPs are located in exons and alter the amino acid sequence of the protein encoded by the gene. Most SNPs may have little or no effect on gene function, while others may alter transcription, splicing, translation, or stability of the mRNA encoded by the gene. Additional genetic polymorphisms in the human genome are caused by insertions, deletions, translocations or inversions of either short or long stretches of DNA. Genetic polymorphisms conferring disease risk may directly alter the amino acid sequence of proteins, may increase the amount of protein produced from the gene, or may decrease the amount of protein produced by the gene.

As genetic polymorphisms conferring risk of common diseases are uncovered, genetic testing for such risk factors is becoming increasingly important for clinical medicine. Examples are apolipoprotein E testing to identify genetic carriers of the apoE4 polymorphism in dementia patients for the differential diagnosis of Alzheimer's disease, and Factor V Leiden testing for predisposition to deep venous thrombosis. More importantly, in the treatment of cancer, diagnosis of genetic variants in tumor cells is used for the selection of the most appropriate treatment regime for the individual patient. In breast cancer, genetic variation in estrogen receptor expression or heregulin type 2 (Her2) receptor tyrosine kinase expression determines whether anti-estrogenic drugs (tamoxifen) or anti-Her2 antibody (Herceptin) will be incorporated into the treatment plan. In chronic myeloid leukemia (CML), diagnosis of the Philadelphia genetic translocation fusing the genes encoding the Bcr and Abl receptor tyrosine kinases indicates that Gleevec (STI571), a specific inhibitor of the Bcr-Abl kinase, should be used for treatment of the cancer. For CML patients with such a genetic alteration, inhibition of the Bcr-Abl kinase leads to rapid elimination of the tumor cells and remission from leukemia. Furthermore, genetic testing services are now available, providing individuals with information about their disease risk based on the discovery that certain SNPs have been associated with risk of many of the common diseases.

Lung cancer causes more deaths from cancer worldwide than any other form of cancer (Goodman, G.E., Thorax 57:994-999 (2002)). In the United States, lung cancer is the primary cause of cancer death among both men and women. In 2007, the death rate from lung cancer was an estimated 160,390 deaths, exceeding the combined total for breast, prostate and colon cancer (American Cancer Society, www.cancer.org). Lung cancer is also the leading cause of cancer death in all European countries and is rapidly increasing in developing countries. While environmental factors, such as lifestyle factors (e.g., smoking) and dietary factors, play an important role in lung cancer, genetic factors also contribute to the disease. For example, a family of enzymes responsible for carcinogen activation, degradation and subsequent DNA repair has been implicated in susceptibility to lung cancer. In addition, an increased risk to familial members outside of the nuclear family has been shown by deCODE geneticists by analysing all lung cancer cases diagnosed in Iceland over 48 years. This increased risk could not be entirely accounted for by smoking, indicating that genetic variants may predispose certain individuals to lung cancer (Jonsson et al., JAMA 292(24):2977-83 (2004); Amundadottir et al., PLoS Med. 1(3):e65 (2004)).

The five-year survival rate among all lung cancer patients, regardless of the stage of disease at diagnosis, is only 13%. This contrasts with a five-year survival rate of 46% among cases detected while the disease is still localized. However, only 16% of lung cancers are discovered before the disease has spread. Early detection is difficult as clinical symptoms are often not observed until the disease has reached an advanced stage. Currently, diagnosis is aided by the use of chest x-rays, analysis of the type of cells contained in sputum and fiberoptic examination of the bronchial passages. Treatment regimens are determined by the type and stage of the cancer, and include surgery, radiation therapy and/or chemotherapy. In spite of considerable research into therapies for this and other cancers, lung cancer remains difficult to diagnose and treat effectively. Accordingly, there is a great need in the art for improved methods for detecting and treating such cancers.

Smoking of tobacco products, and in particular cigarettes, is the largest known risk factor for lung cancer, with a global attributable proportion estimated to be approximately 90% in men and 80% in women. Although the risk of lung cancer associated with tobacco smoking is strongly related to duration of smoking, and declines with increasing time from cessation, the estimated lifetime risk of lung cancer among former smokers remains high, ranging from approximately 6% in smokers who give up at the age of 50, to 10% for smokers who give up at age 60, compared to 15% for lifelong smokers and less than 1% in never-smokers (Peto et al. 2000 BMJ 321, 323-32; Brennan et al. 2006 Am J Epidemiol 164, 1233-1241). In populations where the large majority of smokers have quit smoking, such as men in the US and UK, the majority of lung cancer cases now occurs among ex-smokers (Doll et al. 1994 BMJ 309, 901-911; Zhu et al. 2001 Cancer Res 61, 7825-7829). This emphasizes the importance of developing alternative prevention measures for lung cancer, including the identification of high-risk subgroups.

Notably, only about 15% of lifelong smokers will develop lung cancer by the age of 75, and approximately 5 to 10% of lifetime smokers will develop another tobacco-related cancer (Kjaerheim et al. 1998 Cancer Causes Control 9, 99-108). A possible explanation for these large differences in risk for individuals with similar levels of tobacco exposure could be that genetic factors play a determining role in lung cancer susceptibility (Spitz et al. 2005 J Clin Oncol 23, 267-275). Identifying genes that influence the risk of lung cancer could be important for several aspects of management of the disease. Segregation analyses predict that the majority of genetic risk for lung cancer is most likely to be polygenic in nature, with multiple risk alleles that confer low to moderate risk and which may interact with each other and with environmental risk factors. Many studies have investigated lung cancer susceptibility based on the presence of low-penetrance, high-frequency single nucleotide polymorphisms in candidate genes belonging to specific metabolic pathways. Genetic polymorphisms of xenobiotic metabolism, DNA repair, cell-cycle control, immunity, addiction and nutritional status have been described as promising candidates but have in many cases proven difficult to confirm (Hung et al. 2005 J Natl Cancer Inst 97, 567-576; Hung et al. 2006 Cancer Res 66, 8280-8286; Landi et al. 2006 Carcinogenesis, in press; Brennan et al. 2005 Lancet 366, 1558-60; Hung et al. 2007 Carcinogenesis 28, 1334-40; Campa et al. 2007 Cancer Causes Control 18, 449-455; Gemignani et al. 2007 Carcinogenesis 28(6), 1287-93; Hall et al. 2007 Carcinogenesis 28, 665-671; Campa et al. 2005 Cancer Epidemiol Biomarkers Prev 14, 2457-2458; Campa et al. 2005 Cancer Epidemiol Biomarkers Prev 14, 538-539; Hashibe et al. 2006 Cancer Epidemiol Biomarkers Prev 15, 696-703).

For cancers that show a familial risk of around two-fold, such as lung cancer (Jonsson et al. 2004 JAMA 292, 2977-2983; Li and Hemminki 2005 Lung Cancer 47, 301-307; Goldgar et al. 1994 J Natl Cancer Inst 86, 1600-1608), the majority of cases will arise from approximately 10%-15% of the population at greatest risk (Pharoah et al. 2002 Nat Genet 31, 33-36). The identification of common genetic variants that affect the risk of lung cancer may enable the identification of individuals who are at a very high risk because of their increased genetic susceptibility or, in the case of genes related to nicotine metabolism, because of their inability to quit smoking. Such findings could potentially lead to chemoprevention programs for high-risk individuals, and are especially of importance given the high residual risk that remains among ex-smokers, among whom the majority of lung cancers in the US and Europe now occur. Common variants on chromosome 15 that confer risk of lung cancer have been described (Thorgeirsson, T.E., et al. Nature 452:638-42 (2008); Hung, R.J., et al. Nature 452:633-7 (2008); Amos, C.I., et al. Nat Genet 40:616-22 (2008)). The present invention relates to further genetic variants predictive of lung cancer risk in humans.

SUMMARY OF THE INVENTION

The present invention is based on the finding that certain genetic regions contain variants that are correlated with risk of developing lung cancer in humans. Markers in the genomic regions 8p11 (e.g., rs6474412), 7p14 (e.g., rs215614) and 19q13 (e.g., rs4105144) have been found to be indicative of lung cancer risk.

In a first aspect, the invention provides a method of determining a susceptibility to lung cancer, the method comprising (a) obtaining sequence data about a human individual identifying at least one allele of at least one polymorphic marker, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans, and (b) determining a susceptibility to lung cancer from the sequence data, wherein the at least one polymorphic marker is a marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith.

Another aspect relates to a method of determining a susceptibility to lung cancer, the method comprising analyzing nucleic acid sequence data from a human individual for at least one polymorphic marker, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans, wherein the marker is selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, and determining a susceptibility to lung cancer from the nucleic acid sequence data.

Identification of risk may be based on a determination of the presence or absence of certain alleles indicative of lung cancer risk. Thus, a second aspect of the invention relates to a method of assessing a susceptibility to lung cancer in a human individual, comprising (i) obtaining sequence data about the individual for at least one polymorphic marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans; and (ii) identifying the presence or absence of at least one allele in the at least one polymorphic marker that correlates with increased occurrence of lung cancer in humans. In a preferred embodiment, determination of the presence of the at least one allele identifies the individual as having elevated susceptibility to lung cancer, and determination of the absence of the at least one allele identifies the individual as not having the elevated susceptibility.

Further provided is a method of identification of a marker for use in assessing susceptibility to lung cancer in human individuals, the method comprising (a) identifying at least one polymorphic marker in linkage disequilibrium with rs215614, rs6474412 or rs4105144; (b) obtaining sequence information about the at least one polymorphic marker in a group of individuals diagnosed with lung cancer; and (c) obtaining sequence information about the at least one polymorphic marker in a group of control individuals; wherein determination of a significant difference in frequency of at least one allele in the at least one polymorphism in individuals diagnosed with lung cancer as compared with the frequency of the at least one allele in the control group is indicative of the at least one polymorphism being useful for assessing susceptibility to lung cancer. In a preferred embodiment, an increase in frequency of the at least one allele in the at least one polymorphism in individuals diagnosed with lung cancer, as compared with the frequency of the at least one allele in the control group, is indicative of the at least one polymorphism being useful for assessing increased susceptibility to lung cancer, and a decrease in frequency of the at least one allele in the at least one polymorphism in individuals diagnosed with lung cancer, as compared with the frequency of the at least one allele in the control group, is indicative of the at least one polymorphism being useful for assessing decreased susceptibility to, or protection against, lung cancer.

The invention also relates to methods of prognosis and response to therapy. One such aspect provides a method of predicting prognosis of an individual diagnosed with lung cancer, the method comprising obtaining sequence data about a human individual identifying at least one allele of at least one polymorphic marker selected from the group consisting of rs215614, rs6474412 and rs4105144, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans, and predicting prognosis of lung cancer from the sequence data. Another aspect provides a method of assessing probability of response of a human individual to a therapeutic agent for preventing, treating and/or ameliorating symptoms associated with lung cancer, comprising obtaining sequence data about a human individual identifying at least one allele of at least one polymorphic marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different probabilities of response to the therapeutic agent in humans, and determining the probability of a positive response to the therapeutic agent from the sequence data.

Further provided is a kit for assessing susceptibility to lung cancer, the kit comprising reagents for selectively detecting at least one allele of at least one polymorphic marker in the genome of the individual, wherein the polymorphic marker is selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, and a collection of data comprising correlation data between the at least one polymorphism and susceptibility to lung cancer.

Also provided is use of an oligonucleotide probe in the manufacture of a diagnostic reagent for diagnosing and/or assessing a susceptibility to lung cancer, wherein the probe is capable of hybridizing to a segment of a nucleic acid whose nucleotide sequence is given by any one of SEQ ID NO:1-737, and wherein the segment is 15-400 nucleotides in length.

Computer-implemented aspects are also provided. One such aspect relates to a computer-readable medium having computer executable instructions for determining susceptibility to lung cancer, the computer readable medium comprising data indicative of at least one polymorphic marker, and a routine stored on the computer readable medium and adapted to be executed by a processor to determine risk of developing lung cancer for the at least one polymorphic marker, wherein the at least one polymorphic marker is selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith.

Another computer-implemented aspect relates to an apparatus for determining a genetic indicator for lung cancer in a human individual, comprising (a) a processor, and (b) a computer readable memory having computer executable instructions adapted to be executed on the processor to analyze marker information for at least one human individual with respect to at least one polymorphic marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, and generate an output based on the marker information, wherein the output comprises a measure of susceptibility of the at least one marker or haplotype as a genetic indicator of the condition for the human individual.

It should be understood that all combinations of features described herein are contemplated, even if the combination of features is not specifically found in the same sentence or paragraph herein. This includes in particular the use of all markers disclosed herein, alone or in combination, for analysis individually or in haplotypes, in all aspects of the invention as described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention.

FIG 1 shows the genomic regions on 15q25 (A), 19q13 (B), 8p11 (C) and 7p14 (D) associated with smoking quantity (CPD) and lung cancer. Shown are the -log10 association P values of SNPs in the region with CPD from the ENGAGE meta-analysis (circles), the in silico replication studies (plus signs), and joint analysis of ENGAGE, TAG, and OX-GSK GWA data (crosses), the SNP build 36 coordinates, the genes in the region and their exons, and recombination rates in centimorgans (cM) per megabase (Mb) (histogram).

FIG 2 provides a diagram illustrating a computer-implemented system utilizing risk variants as described herein.

DETAILED DESCRIPTION

Definitions

Unless otherwise indicated, nucleic acid sequences are written left to right in a 5' to 3' orientation. Numeric ranges recited within the specification are inclusive of the numbers defining the range and include each integer or any non-integer fraction within the defined range. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by the ordinary person skilled in the art to which the invention pertains.

The following terms shall, in the present context, have the meaning as indicated:

A "polymorphic marker", sometimes referred to as a "marker", as described herein, refers to a genomic polymorphic site. Each polymorphic marker has at least two sequence variations characteristic of particular alleles at the polymorphic site. Thus, genetic association to a polymorphic marker implies that there is association to at least one specific allele of that particular polymorphic marker. The marker can comprise any allele of any variant type found in the genome, including SNPs, mini- or microsatellites, translocations and copy number variations (insertions, deletions, duplications). Polymorphic markers can be of any measurable frequency in the population. For mapping of disease genes, polymorphic markers with population frequency higher than 5-10% are in general most useful. However, polymorphic markers may also have lower population frequencies, such as 1-5% frequency, or even lower frequency, in particular copy number variations (CNVs). The term shall, in the present context, be taken to include polymorphic markers with any population frequency.

An "allele" refers to the nucleotide sequence of a given locus (position) on a chromosome. A polymorphic marker allele thus refers to the composition (i.e., sequence) of the marker on a chromosome. Genomic DNA from an individual contains two alleles (e.g., allele-specific sequences) for any given polymorphic marker, representative of each copy of the marker on each chromosome. Sequence codes for nucleotides used herein are: A = 1, C = 2, G = 3, T = 4. For microsatellite alleles, the CEPH sample (Centre d'Etudes du Polymorphisme Humain, genomics repository, CEPH sample 1347-02) is used as a reference; the shorter allele of each microsatellite in this sample is set as 0 and all other alleles in other samples are numbered in relation to this reference. Thus, e.g., allele 1 is 1 bp longer than the shorter allele in the CEPH sample, allele 2 is 2 bp longer than the shorter allele in the CEPH sample, allele 3 is 3 bp longer than the shorter allele in the CEPH sample, etc., and allele -1 is 1 bp shorter than the shorter allele in the CEPH sample, allele -2 is 2 bp shorter than the shorter allele in the CEPH sample, etc.
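The allele coding conventions defined above can be illustrated with a minimal Python sketch. The function names and the base-pair lengths used when calling them are hypothetical and chosen only for illustration; only the code table (A = 1, C = 2, G = 3, T = 4) and the rule of numbering microsatellite alleles relative to the shorter CEPH reference allele come from the definition.

```python
# Numeric nucleotide codes as stated in the definition: A = 1, C = 2, G = 3, T = 4.
NUCLEOTIDE_TO_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}
CODE_TO_NUCLEOTIDE = {code: base for base, code in NUCLEOTIDE_TO_CODE.items()}


def encode_snp_allele(base: str) -> int:
    """Return the numeric code for a nucleotide (case-insensitive)."""
    return NUCLEOTIDE_TO_CODE[base.upper()]


def microsatellite_allele(length_bp: int, ceph_shorter_allele_bp: int) -> int:
    """Number a microsatellite allele relative to the shorter CEPH reference allele.

    The reference allele itself is 0; an allele 2 bp longer is 2,
    and an allele 1 bp shorter is -1.
    """
    return length_bp - ceph_shorter_allele_bp
```

For instance, with a hypothetical reference allele of 150 bp, an observed allele of 152 bp would be numbered 2 and an observed allele of 149 bp would be numbered -1.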

Nucleotide ambiguity codes, as described herein, are as proposed by IUPAC-IUB. These codes are compatible with the codes used by the EMBL, GenBank, and PIR databases.

A nucleotide position at which more than one sequence is possible in a population (either a natural population or a synthetic population, e.g., a library of synthetic molecules) is referred to herein as a "polymorphic site".

A "Single Nucleotide Polymorphism" or "SNP" is a DNA sequence variation occurring when a single nucleotide at a specific location in the genome differs between members of a species or between paired chromosomes in an individual. Most SNP polymorphisms have two alleles. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e., both chromosomal copies of the individual have the same nucleotide at the SNP location), or the individual is heterozygous (i.e., the two sister chromosomes of the individual contain different nucleotides). The SNP nomenclature as reported herein refers to the official Reference SNP (rs) ID identification tag as assigned to each unique SNP by the National Center for Biotechnology Information (NCBI).

A "variant", as described herein, refers to a segment of DNA that differs from the reference DNA. A "marker" or a "polymorphic marker", as defined herein, is a variant. Alleles that differ from the reference are referred to as "variant" alleles.

A "microsatellite" is a polymorphic marker that has multiple small repeats of bases that are 2-8 nucleotides in length (such as CA repeats) at a particular site, in which the number of repeat lengths varies in the general population. An "indel" is a common form of polymorphism comprising a small insertion or deletion that is typically only a few nucleotides long.

A "haplotype," as described herein, refers to a segment of genomic DNA that is characterized by a specific combination of alleles arranged along the segment. For diploid organisms such as humans, a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus along the segment. In a certain embodiment, the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles. Haplotypes are described herein in the context of the marker name and the allele of the marker in that haplotype, e.g., "4 rs6474412" refers to the 4 allele of marker rs6474412 being in the haplotype, and is equivalent to "rs6474412 allele 4". Furthermore, allelic codes in haplotypes are as for individual markers, i.e., 1 = A, 2 = C, 3 = G and 4 = T.
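The haplotype notation described above lends itself to a short parsing sketch. The function name and regular expression below are assumptions made for illustration; the sketch only handles markers named with an rs identifier and the numeric allele codes 1 to 4 given in the definition.

```python
import re
from typing import Tuple

# Allelic codes as defined in the text: 1 = A, 2 = C, 3 = G, 4 = T.
CODE_TO_NUCLEOTIDE = {1: "A", 2: "C", 3: "G", 4: "T"}

# Matches "<allele code> <marker name>", e.g. "4 rs6474412".
_ALLELE_MARKER = re.compile(r"^\s*([1-4])\s+(rs\d+)\s*$")


def parse_allele_marker(label: str) -> Tuple[str, str]:
    """Split a label such as "4 rs6474412" into (marker name, nucleotide)."""
    match = _ALLELE_MARKER.match(label)
    if match is None:
        raise ValueError(f"not in '<code> <rs marker>' form: {label!r}")
    code, marker = int(match.group(1)), match.group(2)
    return marker, CODE_TO_NUCLEOTIDE[code]
```

Under these assumptions, `parse_allele_marker("4 rs6474412")` yields the marker name `rs6474412` paired with the nucleotide `T`. Markers named by position rather than by rs identifier would need a broader pattern.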

The term "susceptibility", as described herein, refers to the proneness of an individual towards the development of a certain state (e.g., a certain trait, phenotype or disease, e.g., lung cancer), or towards being less able to resist a particular state than the average individual. The term encompasses both increased susceptibility and decreased susceptibility. Thus, particular alleles at polymorphic markers of the invention as described herein (or haplotypes comprising such markers) are characteristic of increased susceptibility (i.e., increased risk) of lung cancer, as characterized by a relative risk (RR) or odds ratio (OR) of greater than one for the particular allele or haplotype. Other particular alleles at the markers described herein are characteristic of decreased susceptibility (i.e., decreased risk) of lung cancer, as characterized by a relative risk of less than one.
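As a minimal sketch of how an odds ratio relates to the notion of increased or decreased susceptibility, the following Python functions compute an allelic odds ratio from a two-by-two table of allele counts and label the result. The function names and any counts passed to them are hypothetical; no significance testing is performed, and this is not presented as the analysis used to obtain the results described herein.

```python
def allelic_odds_ratio(case_allele: int, case_other: int,
                       control_allele: int, control_other: int) -> float:
    """Odds ratio for an allele from counts of chromosomes in cases and controls."""
    if case_other == 0 or control_allele == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (case_allele * control_other) / (case_other * control_allele)


def classify_susceptibility(odds_ratio: float) -> str:
    """Map an odds ratio to the direction of susceptibility it characterizes."""
    if odds_ratio > 1:
        return "increased"
    if odds_ratio < 1:
        return "decreased"
    return "no difference"
```

With hypothetical counts of 60 and 40 chromosomes in cases and 50 and 50 in controls, the odds ratio is (60 × 50) / (40 × 50) = 1.5, which the second function labels as increased susceptibility.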

The term "and/or" shall in the present context be understood to indicate that either or both of the items connected by it are involved. In other words, the term herein shall be taken to mean "one or the other or both".

The term "look-up table", as described herein, is a table that correlates one form of data to another form, or one or more forms of data to a predicted outcome to which the data is relevant, such as phenotype or trait. For example, a look-up table can comprise a correlation between allelic data for at least one polymorphic marker and a particular trait or phenotype, such as a particular disease diagnosis, that an individual who comprises the particular allelic data is likely to display, or is more likely to display than individuals who do not comprise the particular allelic data. Look-up tables can be multidimensional, i.e., they can contain information about multiple alleles for single markers simultaneously, or they can contain information about multiple markers, and they may also comprise other factors, such as particulars about disease diagnoses, racial information, biomarkers, biochemical measurements, therapeutic methods or drugs, etc.
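One simple way to picture such a look-up table is as a mapping keyed by marker and allele. The sketch below is an assumption about representation only: the table layout, the function name, the alleles listed and the numeric values are placeholders invented for illustration and are not risk estimates from this document.

```python
from typing import Dict, Optional, Tuple

# (marker, allele) -> relative risk.
# All alleles and numbers here are placeholders, not values from the text.
RiskTable = Dict[Tuple[str, str], float]

EXAMPLE_TABLE: RiskTable = {
    ("rs6474412", "T"): 1.10,
    ("rs6474412", "C"): 1.00,
    ("rs215614", "G"): 1.05,
}


def look_up_risk(table: RiskTable, marker: str, allele: str) -> Optional[float]:
    """Return the tabulated value for a marker allele, or None if it is not listed."""
    return table.get((marker, allele))
```

Keying on a tuple keeps several alleles of one marker and several markers in the same table; further dimensions, such as diagnosis or biomarker data, could be added as extra key components.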

A "computer-readable medium" is an information storage medium that can be accessed by a computer using a commercially available or custom-made interface. Exemplary computer-readable media include memory (e.g., RAM, ROM, flash memory, etc.), optical storage media (e.g., CD-ROM), magnetic storage media (e.g., computer hard drives, floppy disks, etc.), punch cards, or other commercially available media. Information may be transferred between a system of interest and a medium, between computers, or between computers and the computer-readable medium for storage or access of stored information. Such transmission can be electrical, or by other available methods, such as IR links, wireless connections, etc.

A "nucleic acid sample", as described herein, refers to a sample obtained from an individual that contains nucleic acid (DNA or RNA). In certain embodiments, i.e., the detection of specific polymorphic markers and/or haplotypes, the nucleic acid sample comprises genomic DNA. Such a nucleic acid sample can be obtained from any source that contains genomic DNA, including a blood sample, sample of amniotic fluid, sample of cerebrospinal fluid, or tissue sample from skin, muscle, buccal or conjunctival mucosa, placenta, gastrointestinal tract or other organs.

The term "lung cancer therapeutic agent" refers to an agent that can be used to ameliorate or prevent symptoms associated with lung cancer.

The term "lung cancer-associated nucleic acid", as described herein, refers to a nucleic acid that has been found to be associated to lung cancer. This includes, but is not limited to, the markers and haplotypes described herein and markers and haplotypes in strong linkage disequilibrium (LD) therewith.

The term "antisense agent" or "antisense oligonucleotide" refers, as described herein, to molecules, or compositions comprising molecules, which include a sequence of purine and pyrimidine heterocyclic bases, supported by a backbone, which are effective to hydrogen bond to corresponding contiguous bases in a target nucleic acid sequence. The backbone is composed of subunit backbone moieties supporting the purine and pyrimidine heterocyclic bases at positions which allow such hydrogen bonding. These backbone moieties are cyclic moieties of 5 to 7 atoms in size, linked together by phosphorus-containing linkage units of one to three atoms in length. In certain preferred embodiments, the antisense agent comprises an oligonucleotide molecule.

The term "LD block C07", as described herein, refers to the Linkage Disequilibrium (LD) block on Chromosome 7 between markers rs55661693 and rs215749, corresponding to position 32,198,199 - 32,424,097 of NCBI (National Center for Biotechnology Information) Build 36.

The term "LD block C08", as described herein, refers to the Linkage Disequilibrium (LD) block on Chromosome 8 between markers s.42329845 (SEQ ID NO:431) and s.43167001 (SEQ ID NO:616), corresponding to position 42,329,845 - 43,167,001 of NCBI Build 36.

The term "LD block C19", as described herein, refers to the Linkage Disequilibrium (LD) block on Chromosome 19 between markers s.45831417 (SEQ ID NO:617) and rs10416968, corresponding to position 45,831,417 - 46,099,477 of NCBI Build 36.
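The three LD block definitions above amount to coordinate intervals on NCBI Build 36, which can be sketched in Python as follows. The dictionary layout and function name are assumptions for illustration; the coordinates are those stated in the definitions. Note that falling inside a block's interval is only a positional test and does not by itself establish linkage disequilibrium with any particular marker.

```python
from typing import Optional

# Block name -> (chromosome, start, end), NCBI Build 36 coordinates, inclusive.
LD_BLOCKS = {
    "LD block C07": ("7", 32_198_199, 32_424_097),
    "LD block C08": ("8", 42_329_845, 43_167_001),
    "LD block C19": ("19", 45_831_417, 46_099_477),
}


def ld_block_for(chromosome: str, position: int) -> Optional[str]:
    """Return the name of the LD block containing a Build 36 position, if any."""
    for name, (chrom, start, end) in LD_BLOCKS.items():
        if chromosome == chrom and start <= position <= end:
            return name
    return None
```

For example, position 42,329,845 on chromosome 8 is the first position of LD block C08 and is reported as such, while a position outside all three intervals returns `None`.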

Variants associated with risk of lung cancer in humans

The present inventors have for the first time shown that certain genetic variants are associated with risk of lung cancer in humans. Certain polymorphic markers on chromosome 8p11, 7p14 and 19q13 have been found to associate with risk of lung cancer. Particular alleles at markers in these genomic regions (e.g., rs215614 on chromosome 7p14, rs6474412 on chromosome 8p11 and rs4105144 on chromosome 19q13) are found more frequently in individuals with lung cancer than in the general population. These markers are therefore predictive of risk of lung cancer, i.e., individuals carrying the particular alleles (at-risk alleles) are at increased risk of developing lung cancer compared with the general population. Markers that are in linkage disequilibrium with these markers are also predictive of risk of lung cancer, as described in more detail herein.

Methods of determining susceptibility to lung cancer

Accordingly, the present invention provides methods of determining a susceptibility to lung cancer in a human individual. A first aspect relates to a method of determining a susceptibility to lung cancer, the method comprising obtaining sequence data about a human individual identifying at least one allele of at least one polymorphic marker, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans, and determining a susceptibility to lung cancer from the sequence data, wherein the at least one polymorphic marker is a marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith.

In certain embodiments, the sequence data is nucleic acid sequence data. Nucleic acid sequence data identifying particular alleles of polymorphic markers is sometimes also referred to as genotype data. Nucleic acid sequence data can be obtained, for example, by analyzing the sequence of the at least one polymorphic marker in a biological sample from the individual. Alternatively, nucleic acid sequence data can be obtained from a genotype dataset from the human individual by analyzing the sequence of the at least one polymorphic marker in the dataset. Such analysis in certain embodiments comprises determining the presence or absence of a particular allele of specific polymorphic markers. Identification of particular alleles in general terms should be taken to mean that determination of the presence or absence of the allele(s) is made. Usually, determination of both allelic copies in the genome of an individual is performed, by determining the occurrence of all possible alleles of the particular polymorphism in a particular individual (for SNPs, each of the two possible nucleotides at the allelic site). It is also possible to determine whether only particular alleles are present or not. For example, in certain embodiments, determination of the presence or absence of certain alleles that have been shown to associate with risk of kidney cancer is made, but not necessarily other alleles of the particular marker, and a determination of susceptibility is made based on such determination. In certain embodiments, sequence data about at least two polymorphic markers is obtained.
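The presence-or-absence determination described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of the invention: the function names and the example genotype strings are hypothetical, and only the risk allele G of rs215614 is taken from the text (the Table 1 caption).

```python
from collections import Counter


def allele_counts(genotype: str) -> Counter:
    """Count each nucleotide in a diploid SNP genotype such as 'AG'."""
    calls = genotype.upper()
    if len(calls) != 2 or any(base not in "ACGT" for base in calls):
        raise ValueError(f"not a diploid SNP genotype: {genotype!r}")
    return Counter(calls)


def carries_allele(genotype: str, allele: str) -> bool:
    """True if at least one of the two allelic copies equals `allele`."""
    return allele_counts(genotype)[allele.upper()] > 0


# Risk allele G of rs215614 comes from the Table 1 caption; the genotype
# strings passed to these functions are hypothetical examples.
RISK_ALLELE = {"rs215614": "G"}


def risk_allele_copies(marker: str, genotype: str) -> int:
    """Number of copies (0, 1 or 2) of the marker's risk allele."""
    return allele_counts(genotype)[RISK_ALLELE[marker]]
```

Counting copies rather than returning a bare yes/no keeps both allelic copies visible, which matches the statement that determination of both copies is usually performed.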

Alternatively, the allele that is detected can be the allele of the complementary strand of DNA, such that the nucleic acid sequence data includes the identification of at least one allele which is complementary to any of the alleles of the polymorphic markers referenced above. For example, the allele that is detected may be the complementary C allele of the at-risk G allele of rs1058396, the complementary G allele of the at-risk C allele of rs11877062, the complementary C allele of the at-risk G allele of rs2298720, or the complementary T allele to the at-risk A allele of rs2298720.
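The strand relationship described above follows the usual Watson-Crick pairing (A with T, C with G). A minimal Python sketch, with hypothetical function names, of mapping an at-risk allele to the allele that would be read on the complementary strand:

```python
# Watson-Crick complement table for single-nucleotide alleles.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_allele(allele: str) -> str:
    """Return the allele observed on the complementary DNA strand."""
    allele = allele.upper()
    if len(allele) != 1 or allele not in "ACGT":
        raise ValueError(f"not a single nucleotide: {allele!r}")
    return allele.translate(_COMPLEMENT)


def matches_on_either_strand(observed: str, at_risk: str) -> bool:
    """True if `observed` is the at-risk allele or its complement."""
    observed = observed.upper()
    return observed == at_risk.upper() or observed == complementary_allele(at_risk)
```

Note that for markers whose two alleles are themselves complementary (A/T or C/G), the strand cannot be inferred from the allele alone, so a check like `matches_on_either_strand` is only meaningful when the strand of the assay is known or the marker is not of that type.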

In certain embodiments, the nucleic acid sequence data is obtained from a biological sample containing nucleic acid from the human individual. The nucleic acid sequence data may suitably be obtained using a method that comprises at least one procedure selected from (i) amplification of nucleic acid from the biological sample; (ii) hybridization assay using a nucleic acid probe and nucleic acid from the biological sample; (iii) hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of the biological sample; and (iv) sequencing, in particular high-throughput sequencing. The nucleic acid sequence data may also be obtained from a preexisting record. For example, the preexisting record may comprise a genotype dataset for at least one polymorphic marker. In certain embodiments, the determining comprises comparing the sequence data to a database containing correlation data between the at least one polymorphic marker and susceptibility to lung cancer.

It is contemplated that in certain embodiments of the invention, it may be convenient to prepare a report of results of risk assessment. Thus, certain embodiments of the methods of the invention comprise a further step of preparing a report containing results from the determination, wherein said report is written in a computer readable medium, printed on paper, or displayed on a visual display. In certain embodiments, it may be convenient to report results of susceptibility to at least one entity selected from the group consisting of the individual, a guardian of the individual, a genetic service provider, a physician, a medical organization, and a medical insurer.

In certain embodiments, markers on chromosome 8p11 that are predictive of lung cancer risk are markers associated with a gene selected from the group consisting of CHRNB3 and CHRNA6.

In certain embodiments, markers on chromosome 7p14 that are predictive of lung cancer risk are markers associated with a gene selected from the group consisting of PDE1C (phosphodiesterase 1C), LSM5 and AVL9 (KIAA0241).

In certain embodiments, markers on chromosome 19q13 that are predictive of lung cancer risk are markers associated with a gene selected from the group consisting of CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, and RAB4B.

In certain embodiments, the marker associated with risk of lung cancer is a marker located within LD block C07, LD block C08, or LD block C19, as defined herein.

In certain embodiments, markers in linkage disequilibrium with rs215614 are selected from the group consisting of the markers rs55661693, rs2392052, s.32208500, rs1860222, rs1017085, rs10951323, rs10951324, rs2240676, rs6462343, rs7779445, rs7796264, rs16875791, rs1860224, s.32222361, rs12672267, rs719585, rs6945244, rs719586, rs12531292, rs12533732, rs11771370, rs13241693, rs13228936, rs16875793, rs17161043, rs17161045, rs17426873, rs12669911, s.32229303, rs17161049, s.32229594, rs12701192, rs13225493, rs1860225, s.32230966, rs10233045, rs10233473, rs11762455, rs11769301, s.32231924, s.32232040, rs10951325, s.32232190, s.32232206, rs10237329, rs7791872, s.32233149, s.32233418, s.32233449, rs4141108, s.32233694, rs12531396, rs7803347, rs12537174, rs17161066, rs13246764, s.32237653, s.32237781, s.32237796, rs17161068, rs10269368, s.32238062, s.32238131, s.32238187, s.32238385, s.32238637, s.32238720, s.32238770, rs1014242, s.32238891, s.32238954, rs7786576, rs7786797, rs7806224, s.32239995, s.32240628, s.32240965, s.32241373, rs10216007, s.32241650, rs13221037, s.32242123, s.32242180, s.32242305, s.32243452, rs10215287, s.32243761, s.32243957, s.32244134, s.32244142, s.32244149, s.32244315, s.32244333, rs6977493, s.32246045, rs9639646, rs12701200, rs73306623, rs10259431, rs9639648, rs10263751, rs10263673, rs9638875, s.32249522, s.32249615, s.32250320, rs12538475, rs12538504, rs12539063, rs13238880, rs13242197, rs11772510, s.32252191, rs12701202, s.32252891, rs10447633, rs7804687, s.32254102, rs12701203, rs12701204, rs12701205, s.32254884, s.32254907, s.32256403, rs10241729, s.32256486, rs17161076, s.32257219, rs11975968, rs6955339, s.32257405, rs6955990, s.32257970, rs12701206, s.32258116, rs6960114, rs10236197, s.32258299, rs11773343, rs10951326, rs10951327, rs7798739, rs13221985, rs929456, rs6977000, rs6977468, s.32261763, rs17161087, rs17161090, rs12701209, rs6959931, s.32263762, rs13224417, rs58894937, rs6975208, rs11762194, s.32266791, s.32266901, s.32267023, s.32267024, rs4723139, rs6947159, s.32268168, rs6948856, rs975122, rs7806397, rs12537591, s.32271190, rs7796692, rs7780515, rs7780674, rs7801559, s.32273407, s.32273416, rs4368879, s.32274498, rs4370439, rs17161127, rs7806417, s.32276650, rs1450869, rs1450870, rs7780377, rs6947060, s.32279209, rs10951328, s.32280165, rs6977490, s.32280356, s.32280614, s.32280995, rs7778162, rs7778443, s.32281443, rs10226228, s.32286305, rs2159237, rs1476765, s.32286984, rs11770877, s.32288268, s.32288404, s.32288470, s.32288482, s.32288491, s.32288538, s.32288631, rs9771228, rs12540204, rs12540232, s.32289625, rs17161134, rs215596, s.32293784, rs11768207, s.32294392, s.32295805, s.32296223, rs215599, rs10271037, s.32298436, rs215600, rs215601, rs215603, s.32303110, rs215605, rs1349399, s.32303968, s.32304331, s.32304468, rs215607, rs215608, rs215610, rs12531858, rs59238577, rs215611, rs7780009, rs7780609, s.32309135, rs6952052, rs6952609, s.32310279, rs12538119, rs7779181, rs6967626, rs12536117, s.32312803, s.32313220, rs7779130, rs7778788, rs7779180, rs215614, s.32314071, s.32314173, s.32314645, s.32314726, s.32316109, s.32316473, s.32316498, s.32316632, s.32318702, s.32318703, s.32318774, s.32319085, s.32319226, rs10951330, s.32320457, rs6462351, rs11981007, rs6462352, rs10951331, rs11981809, rs6955946, s.32323398, s.32323503, rs17161177, rs215622, s.32324242, s.32324582, rs215623, rs215624, rs215625, rs6943670, s.32325802, s.32325803, rs13235908, rs12531102, s.32326579, s.32326621, rs215629, rs1653876, rs17161184, s.32328426, s.32328795, rs215630, rs6462353, rs1376281, rs215631, rs6462354, s.32333151, s.32333776, s.32333955, rs1115318, rs215632, s.32335243, rs215634, rs6955346, rs215635, rs11520787, rs10264177, rs11514764, rs215636, rs215637, rs215638, rs215639, rs6977196, s.32340738, s.32340852, s.32343121, rs10238006, s.32344007, rs10447642, s.32344169, s.32344734, s.32345369, rs215669, rs215670, rs717757, s.32347343, s.32347366, s.32347375, s.32347376, s.32347462, rs186229, s.32348187, rs215672, s.32348546, s.32348867, rs1653884, rs215674, s.32349420, rs215675, rs215676, rs387575, rs215677, s.32350287, s.32350891, rs170011, rs215678, rs215679, rs215680, rs215681, rs1653887, rs1653888, rs1668386, rs1653889, rs1668387, rs1668388, rs1668389, s.32353491, rs1668390, rs690247, rs690250, rs183347, rs177362, rs177363, rs1653890, rs215682, rs215683, rs215684, rs215685, rs177364, s.32355034, rs215686, rs215687, rs215690, rs1376284, s.32355855, s.32355892, rs35554640, s.32356066, rs1376286, s.32356229, rs1376287, rs1013772, rs1013771, s.32357132, s.32357194, s.32357656, s.32358191, rs1668393, rs13227922, rs215692, rs6979697, s.32362045, rs412876, rs7777166, rs7808851, rs215694, rs4723146, rs215695, rs215696, rs215697, rs215698, rs4723147, rs1653891, s.32364994, s.32365034, s.32365036, rs1668394, rs177365, rs215699, rs215700, rs1653892, rs215702, rs10486507, s.32380353, rs2099306, rs170016, rs10236370, s.32413299, rs1123893, rs384711, rs430356, rs730725, rs435584, s.32414636, rs172558, rs7457071 and rs215749, which are the markers set forth in Table 1.

In certain embodiments, markers in linkage disequilibrium with rs6474412 are selected from the group consisting of the markers s.42329845, s.42601955, s.42618302, rs7013926, rs12156092, rs1868860, rs1530850, rs6989472, s.42626122, rs11785591, rs1947295, rs1376442, rs4737060, s.42631425, s.42632599, rs34842664, rs1868859, rs1868858, rs10958724, rs28441235, rs7006469, rs10097384, rs4305884, s.42639491, s.42639835, rs6990603, rs34456987, rs10958725, rs36057318, s.42644785, rs7837296, s.42646284, s.42646543, rs5005909, rs1979140, rs13277840, s.42651452, s.42652114, rs11783507, rs7816726, rs10958726, rs7842601, s.42656893, rs4295650, s.42659223, rs6474411, s.42661513, s.42661651, rs13273442, s.42664453, s.42664514, s.42664708, s.42664984, rs1451239, rs1451240, s.42666045, rs4736835, s.42666333, s.42666490, rs6987704, rs1530847, s.42668648, rs1955185, rs13277254, rs13280301, rs13277524, rs6474412, rs6474413, rs7004381, rs6985052, rs4950, rs1530848, rs9643891, rs13280604, rs6997909, rs6474414, s.42679689, rs6474415, rs4951, rs13263434, s.42692248, rs11783289, rs4236926, rs13261190, s.42698182, rs16891561, rs7459838, rs55828312, rs7017612, rs6984031, s.42718611, s.42721825, rs7822100, rs7825907, rs6982753, rs9298628, rs9298629, rs7824155, s.42726264, rs7824614, s.42726938, rs2304297, rs7845663, rs7812298, rs7004108, s.42728158, s.42728590, s.42729587, s.42729589, s.42731443, s.42731535, s.42731657, s.42731840, s.42732056, rs10110332, s.42732673, rs892413, rs10087172, rs4398905, rs2196128, rs2196129, rs2217732, rs1072003, s.42741317, s.42744231, rs10109040, rs16891620, s.42745575, s.42745581, rs4737069, rs2117225, rs2164024, rs7828365, rs10107450, s.42750441, s.42752754, s.42753787, s.42756840, s.42757418, rs10092346, s.42759608, s.42760352, s.42761317, s.42761892, s.42761909, s.42761965, rs1960346, s.42763404, s.42763808, s.42764045, s.42765003, rs4567031, s.42765421, s.42769020, rs4737071, rs6474420, s.42770829, s.42771871, rs11986893, s.42772405, s.42779341, s.42780448, rs6985527, s.42782778, s.42782938, s.42783806, s.42784397, s.42786368, s.42787290, s.42787506, s.42789990, s.42806166, rs10087388, s.42837205, s.42844765, s.42859016, s.42886979, s.42936582, rs10106661, rs34727690, s.43009750, s.43018519, s.43113569, s.43150703, s.43166986 and s.43167001, which are the markers set forth in Table 2.

In certain embodiments, markers in linkage disequilibrium with rs4105144 are selected from the group consisting of the markers s.45831417, s.45859717, rs11083565, rs2561537, rs2604885, rs7260405, rs2607420, rs2369302, rs2254343, rs1457141, rs2604874, s.45950500, s.45950502, rs2607415, rs2607414, rs2279011, rs2249835, rs2607424, rs2604869, rs12973666, rs2604893, rs2644898, rs7252227, rs7937, rs2644916, rs3733828, rs4803372, s.46014685, s.46014686, rs4803373, s.46019153, s.46019626, s.46020036, s.46024011, s.46024197, s.46024672, rs11670760, rs12459249, rs7251418, rs7251418, rs7251570, rs4343391, rs4343391, rs7245507, s.46037829, rs11083571, rs2302989, rs2316213, rs8192725, rs7250713, rs1137115, rs4105144, rs10404667, rs4001921, rs8102683, rs8102683, rs8105704, rs1496402, rs1496402, rs12610432, s.46062016, rs12461383, rs12461383, rs4570984, s.46067487, s.46067573, s.46067911, rs11882981, rs7247469, rs7251315, rs6508951, s.46075055, rs3869579, s.46075829, s.46075942, s.46077574, rs12973598, s.46077976, s.46078049, s.46078122, s.46078260, s.46078326, s.46078327, s.46078334, s.46078367, s.46078384, s.46078387, s.46078424, s.46079081, s.46079140, s.46080547, s.46080617, s.46082249, s.46083888, rs3875159, rs28503746, rs3909341, rs4105142, rs4105141, rs5007415, s.46085777, rs10411264, rs67421541, s.46086876, s.46087595, rs11083582, rs3909342, rs4803397, rs4803397, s.46088755, rs3852870, s.46089339, rs8103444, rs3865457, rs4803398, rs6508953, rs10419393, rs7343061, rs4803400, rs7254188 and rs10416968, which are the markers set forth in Table 3.

Surrogate markers in linkage disequilibrium with particular key markers can in general be selected based on any particular numerical values of the linkage disequilibrium measures D' and r2, as described further herein. For example, markers that are in linkage disequilibrium with rs215614 are exemplified by the markers listed in Table 1 herein, but the skilled person will appreciate that other markers in linkage disequilibrium with these markers may also be used in the diagnostic applications described herein. Likewise, exemplary surrogate markers in linkage disequilibrium with rs6474412 are listed in Table 2 herein, and exemplary surrogate markers in linkage disequilibrium with rs4105144 are listed in Table 3 herein. Further, as also described in more detail herein, the skilled person will appreciate that since linkage disequilibrium is a continuous measure, certain values of the LD measures D' and r2 may be suitably chosen to define markers that are useful as surrogate markers in LD with the markers described herein. Numeric values of D' and r2 may thus in certain embodiments be used to define marker subsets that fulfill certain numerical cutoff values of D' and/or r2. In one embodiment, markers in linkage disequilibrium with a particular anchor marker (e.g., rs215614, rs6474412 and/or rs4105144) are in LD with the anchor marker characterized by numerical values of D' greater than 0.8 and/or numerical values of r2 of greater than 0.2. In one embodiment, markers in linkage disequilibrium with a particular anchor marker are in LD with the anchor marker characterized by numerical values of r2 greater than 0.2. The markers provided in Tables 1 to 3 provide exemplary markers that fulfill this criterion.

In other embodiments, markers in linkage disequilibrium with a particular anchor marker are in LD with the anchor marker characterized by numerical values of r2 of greater than 0.3, greater than 0.4, greater than 0.5, greater than 0.6, greater than 0.7, greater than 0.8, greater than 0.9, and greater than 0.95. In certain embodiments, such markers are selected from the markers listed in Tables 1 to 3, using the information on LD measures provided in the tables. In other words, other suitable numerical values of r2 and/or D' may be used to select markers that are in LD with the anchor marker. The stronger the LD, the more similar the association signal and/or the predictive risk by the surrogate marker will be to that of the anchor marker. Markers with values of r2 = 1 to the anchor marker are perfect surrogates of the anchor marker and will provide identical association and risk prediction data. In one preferred embodiment, suitable surrogate markers are those markers that have values of r2 to an anchor marker of greater than 0.8.
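Selecting surrogate markers by a numerical cutoff, as described above, amounts to a simple filter over the LD values in the tables. The following Python sketch is illustrative only: the type and function names are hypothetical, the four rows are copied from Table 1 (anchor marker rs215614), and applying both cutoffs together when both are given is an assumption, since the text allows "and/or".

```python
from typing import List, NamedTuple, Optional


class Surrogate(NamedTuple):
    marker: str
    d_prime: float
    r2: float


# Four rows taken from Table 1 (LD with anchor marker rs215614).
TABLE1_SAMPLE = [
    Surrogate("rs55661693", 0.88, 0.23),
    Surrogate("rs7779445", 1.0, 0.306594),
    Surrogate("rs17161045", 0.96, 0.82),
    Surrogate("rs215605", 1.0, 1.0),
]


def select_surrogates(
    rows: List[Surrogate],
    min_r2: float = 0.2,
    min_d_prime: Optional[float] = None,
) -> List[str]:
    """Keep markers whose r2 (and, if given, D') is strictly greater than the cutoff."""
    selected = []
    for row in rows:
        if row.r2 <= min_r2:
            continue
        if min_d_prime is not None and row.d_prime <= min_d_prime:
            continue
        selected.append(row.marker)
    return selected
```

With these rows, `select_surrogates(TABLE1_SAMPLE, min_r2=0.8)` keeps rs17161045 and rs215605, which corresponds to the preferred embodiment of r2 greater than 0.8.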

Further, as described in more detail in the following, LD may be determined in samples from any particular population. In one embodiment, LD is determined in Caucasian samples. In another embodiment, LD is determined in European samples. In other embodiments, LD is determined in African American samples, in Asian samples, or the LD may be suitably determined in samples of any other population.
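For a given population sample, D' and r2 between two biallelic markers can be computed from haplotype counts. The sketch below uses the standard textbook definitions of these measures; it is an assumption that these are the definitions applied for the tables herein, and the function name and the example counts are hypothetical. It also assumes phased haplotype counts are available.

```python
from typing import Tuple


def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float]:
    """Return (D', r2) for two biallelic markers from phased haplotype counts.

    A/a are the alleles of the first marker, B/b those of the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency.
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one marker is monomorphic in this sample")
    r2 = d * d / denom
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d_prime, r2
```

For hypothetical counts `ld_measures(40, 0, 10, 50)`, D' is 1 while r2 is about 0.67: one of the four haplotypes is absent, but the allele frequencies of the two markers differ. Because the counts depend on the sample, the same marker pair can yield different values in different populations.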

Table 1. Surrogate markers of anchor marker rs215614 (SEQ ID NO:252) on Chromosome 7p14. Markers were selected using data from the Caucasian HapMap dataset or the publicly available 1000 Genomes project (http://www.1000genomes.org). Markers that have not been assigned rs names are identified by their position in NCBI Build 36 of the human genome assembly. Shown are predicted risk alleles for the surrogate markers, i.e. alleles that are correlated with the risk allele of the anchor marker, rs215614 allele G. Linkage disequilibrium measures D' and r2, and corresponding p-value, are also shown. The last column refers to the sequence listing number, identifying the particular SNP.

Marker  Risk Allele  Pos in NCBI Build 36  D'  R2  P-value  Seq ID NO:
rs55661693  T  32198199  0.88  0.23  0.00062  1
rs2392052  G  32199038  0.88  0.23  0.00062  2
s.32208500  A  32208500  0.87  0.2  0.00081  3
rs1860222  C  32208505  0.87  0.2  0.00081  4
rs1017085  C  32213984  0.69  0.29  0.00012  5
rs10951323  G  32214715  0.680907  0.201028  0.00007894  6
rs10951324  G  32215509  0.76  0.24  0.00016  7
rs2240676  A  32215741  0.76  0.24  0.00016  8
rs6462343  G  32216583  0.76  0.24  0.00016  9
rs7779445  T  32216842  1  0.306594  1.57E-12  10
rs7796264  A  32216898  0.76  0.24  0.00016  11
rs16875791  G  32222133  0.782372  0.431935  1.27E-09  12
rs1860224  A  32222167  0.803317  0.427077  1.18E-11  13
s.32222361  A  32222361  1  0.25  5.80E-09  14
rs12672267  G  32225259  1  0.255268  1.06E-10  15
rs719585  C  32225610  1  0.22366  8.58E-10  16
rs6945244  T  32225798  0.921077  0.684761  5.74E-20  17
rs719586  C  32226042  1  0.23  2.00E-08  18
rs12531292  C  32226358  1  0.23  2.00E-08  19
rs12533732  G  32226529  1  0.23  2.00E-08  20
rs11771370  T  32226690  0.92  0.4  7.50E-07  21
rs13241693  T  32227618  1  0.23  2.00E-08  22
rs13228936  C  32227634  1  0.2  2.20E-07  23
rs16875793  A  32227755  1  0.23  2.00E-08  24
rs17161043  A  32227806  1  0.330959  1.85E-11  25
rs17161045  C  32227983  0.96  0.82  6.90E-17  26
rs17426873  G  32228073  1  0.2  2.20E-07  27
rs12669911  A  32228902  0.887848  0.762515  6.27E-22  28
s.32229303  C  32229303  1  0.23  2.00E-08  29
rs17161049  G  32229427  1  0.21  6.80E-08  30
s.32229594  T  32229594  0.96  0.82  6.90E-17  31
rs12701192  G  32229886  1  0.23  2.00E-08  32
rs13225493  T  32230126  1  0.2  2.20E-07  33
rs1860225  G  32230236  0.96  0.82  6.90E-17  34
s.32230966  C  32230966  1  0.25  5.80E-09  35
rs10233045  A  32231017  0.923436  0.795023  3.78E-23  36
rs10233473  T  32231532  0.934931  0.458387  7.21E-12  37
rs11762455  G  32231657  1  0.23  2.00E-08  38
rs11769301  G  32231725  1  0.23  2.00E-08  39
s.32231924  T  32231924  1  0.2  2.20E-07  40
s.32232040  G  32232040  1  0.23  2.00E-08  41
rs10951325  C  32232070  1  0.2  1.20E-08  42
s.32232190  G  32232190  1  0.2  2.20E-07  43
s.32232206  G  32232206  1  0.2  2.20E-07  44
rs10237329  C  32232250  0.923436  0.795023  3.78E-23  45
rs7791872  A  32233048  1  0.2  2.20E-07  46
s.32233149  T  32233149  1  0.25  5.80E-09  47
s.32233418  A  32233418  1  0.23  2.00E-08  48
s.32233449  G  32233449  1  0.2  2.20E-07  49
rs4141108  T  32233484  0.934918  0.458374  1.46E-11  50
s.32233694  G  32233694  1  0.23  2.00E-08  51
rs12531396  T  32235732  1  0.23  2.00E-08  52
rs7803347  C  32235840  1  0.23  2.00E-08  53
rs12537174  G  32236562  1  0.2  2.20E-07  54
rs17161066  A  32237383  1  0.2  2.20E-07  55
rs13246764  C  32237506  1  0.2  2.20E-07  56
s.32237653  A  32237653  1  0.2  2.20E-07  57
s.32237781  T  32237781  1  0.2  2.20E-07  58
s.32237796  A  32237796  1  0.2  2.20E-07  59
rs17161068  G  32237849  1  0.2  2.20E-07  60
rs10269368  G  32238039  0.943973  0.495664  4.45E-14  61
s.32238062  C  32238062  1  0.2  2.20E-07  62
s.32238131  A  32238131  1  0.2  2.20E-07  63
s.32238187  T  32238187  0.96  0.82  6.90E-17  64
s.32238385  A  32238385  1  0.2  2.20E-07  65
s.32238637  G  32238637  0.93  0.45  1.40E-07  66
s.32238720  G  32238720  1  0.2  2.20E-07  67
s.32238770  G  32238770  1  0.2  2.20E-07  68
rs1014242  C  32238830  0.916847  0.68787  7.32E-19  69
s.32238891  C  32238891  1  0.2  2.20E-07  70
s.32238954  A  32238954  0.96  0.82  6.90E-17  71
rs7786576  C  32239239  0.874374  0.639319  2.09E-16  72
rs7786797  C  32239480  1  0.2  2.20E-07  73
rs7806224  C  32239632  0.95707  0.735702  1.37E-19  74
s.32239995  A  32239995  1  0.2  2.20E-07  75
s.32240628  C  32240628  1  0.2  2.20E-07  76
s.32240965  T  32240965  1  0.2  2.20E-07  77
s.32241373  A  32241373  1  0.2  2.20E-07  78
rs10216007  T  32241504  0.9  0.26  2.90E-05  79
s.32241650  A  32241650  1  0.2  2.20E-07  80
rs13221037  G  32241842  1  0.2  2.20E-07  81
s.32242123  C  32242123  1  0.2  2.20E-07  82
s.32242180  C  32242180  1  0.29  4.50E-10  83
s.32242305  C  32242305  1  0.2  2.20E-07  84
s.32243452  G  32243452  1  0.29  4.50E-10  85
rs10215287  C  32243475  1  0.2  2.20E-07  86
s.32243761  C  32243761  1  0.29  4.50E-10  87
s.32243957  G  32243957  1  0.23  1.20E-09  88
s.32244134  T  32244134  0.9  0.26  2.90E-05  89
s.32244142  G  32244142  0.96  0.86  2.90E-18  90
s.32244149  C  32244149  1  0.2  1.20E-08  91
s.32244315  T  32244315  1  0.2  2.20E-07  92
s.32244333  T  32244333  1  0.2  2.20E-07  93
rs6977493  C  32245491  1  0.2  2.20E-07  94
s.32246045  G  32246045  1  0.2  2.20E-07  95
rs9639646  G  32246768  0.931593  0.387448  2.61E-10  96
rs12701200  C  32247097  1  0.2  2.20E-07  97
rs73306623  A  32247340  0.93  0.45  1.40E-07  98
rs10259431  C  32247922  0.912032  0.669887  2.89E-17  99
rs9639648  A  32248667  0.929026  0.360581  5.74E-10  100
rs10263751  G  32248982  1  0.2  2.20E-07  101
rs10263673  T  32249044  0.927154  0.367918  1.38E-09  102
rs9638875  A  32249439  0.92333  0.794842  7.59E-23  103
s.32249522  T  32249522  1  0.2  2.20E-07  104
s.32249615  G  32249615  1  0.2  2.20E-07  105
s.32250320  G  32250320  1  0.2  2.20E-07  106
rs12538475  A  32250709  1  0.2  2.20E-07  107
rs12538504  A  32250763  1  0.2  2.20E-07  108
rs12539063  T  32251008  1  0.2  2.20E-07  109
rs13238880  A  32251424  1  0.2  2.20E-07  110
rs13242197  A  32252028  1  0.2  2.20E-07  111
rs11772510  A  32252162  0.96  0.82  6.90E-17  112
s.32252191  A  32252191  1  0.25  5.80E-09  113
rs12701202  A  32252269  1  0.2  2.20E-07  114
s.32252891  C  32252891  1  0.29  4.50E-10  115
rs10447633  A  32253083  1  0.342466  6.62E-12  116
rs7804687  A  32253844  1  0.2  2.20E-07  117
s.32254102  C  32254102  1  0.2  2.20E-07  118
rs12701203  A  32254285  1  0.2  2.20E-07  119
rs12701204  A  32254300  1  0.2  2.20E-07  120
rs12701205  C  32254406  1  0.2  2.20E-07  121
s.32254884  G  32254884  1  0.2  2.20E-07  122
s.32254907  T  32254907  1  0.2  2.20E-07  123
s.32256403  C  32256403  1  0.2  2.20E-07  124
rs10241729  G  32256404  0.91  0.34  1.70E-06  125
s.32256486  G  32256486  1  0.29  4.50E-10  126
rs17161076  G  32257090  0.849569  0.286191  5.75E-08  127
s.32257219  C  32257219  0.91  0.34  1.70E-06  128
rs11975968  G  32257255  1  0.342466  6.62E-12  129
rs6955339  C  32257367  0.96  0.82  6.90E-17  130
s.32257405  C  32257405  1  0.25  5.80E-09  131
rs6955990  G  32257826  1  0.2  2.20E-07  132
s.32257970  A  32257970  1  0.27  1.60E-09  133
rs12701206  C  32257988  0.96  0.82  6.90E-17  134
s.32258116  C  32258116  0.96  0.82  6.90E-17  135
rs6960114  G  32258124  0.96  0.82  6.90E-17  136
rs10236197  C  32258286  0.919969  0.791593  5.47E-22  137
s.32258299  A  32258299  0.96  0.82  6.90E-17  138
rs11773343  T  32258841  0.950079  0.581454  6.27E-16  139
rs10951326  C  32259097  1  0.342466  6.62E-12  140
rs10951327  T  32259283  1  0.57  1.20E-18  141
rs7798739  A  32259486  0.919363  0.782219  2.01E-20  142
rs13221985  A  32259510  0.950816  0.589346  1.16E-16  143
rs929456  G  32260169  0.923436  0.795023  3.78E-23  144
rs6977000  T  32261546  1  0.2  2.20E-07  145
rs6977468  A  32261700  1  0.2  2.20E-07  146
s.32261763  A  32261763  1  0.29  4.50E-10  147
rs17161087  C  32262294  1  0.342466  6.62E-12  148
rs17161090  G  32262316  1  0.2  2.20E-07  149
rs12701209  A  32262583  1  0.2  2.20E-07  150
rs6959931  G  32263757  0.96  0.82  6.90E-17  151
s.32263762  T  32263762  0.96  0.82  6.90E-17  152
rs13224417  A  32265118  0.952005  0.60945  4.22E-17  153
rs58894937  T  32265423  1  0.2  2.20E-07  154
rs6975208  A  32266228  1  0.57  1.20E-18  155
rs11762194  A  32266694  0.950348  0.581788  3.12E-16  156
s.32266791  C  32266791  1  0.25  5.80E-09  157
s.32266901  A  32266901  1  0.27  1.60E-09  158
s.32267023  C  32267023  0.83  0.62  4.50E-11  159
s.32267024  C  32267024  0.96  0.82  6.90E-17  160
rs4723139  G  32267242  0.96  0.82  6.90E-17  161
rs6947159  T  32267968  1  0.2  2.20E-07  162
s.32268168  A  32268168  1  0.25  5.80E-09  163
rs6948856  A  32268872  0.902096  0.666863  1.51E-15  164
rs975122  A  32269319  0.947836  0.581536  3.64E-15  165
rs7806397  T  32269864  0.919255  0.777792  1.43E-20  166
rs12537591  C  32270244  1  0.2  2.20E-07  167
s.32271190  A  32271190  1  0.25  5.80E-09  168
rs7796692  G  32271390  0.94922  0.591575  5.40E-16  169
rs7780515  T  32271799  0.92333  0.794842  7.59E-23  170
rs7780674  T  32271916  1  0.236678  4.22E-10  171
rs7801559  G  32271936  0.96  0.82  6.90E-17  172
s.32273407  A  32273407  0.96  0.86  3.30E-18  173
s.32273416  G  32273416  0.96  0.86  3.30E-18  174
rs4368879  C  32274450  0.920996  0.817498  5.51E-23  175
s.32274498  C  32274498  1  0.9  3.20E-30  176
rs4370439  C  32274626  0.952005  0.60945  4.22E-17  177
rs17161127  C  32275242  1  0.314815  2.77E-10  178
rs7806417  C  32276626  1  0.6  2.00E-19  179
s.32276650  G  32276650  1  0.23  1.20E-09  180
rs1450869  G  32278197  0.924212  0.824382  6.97E-24  181
rs1450870  T  32278251  0.890378  0.792774  7.66E-23  182
rs7780377  A  32278384  1  0.330959  1.85E-11  183
rs6947060  C  32278671  1  0.6  2.00E-19  184
s.32279209  G  32279209  1  0.27  1.60E-09  185
rs10951328  C  32279366  1  0.6  2.00E-19  186
s.32280165  G  32280165  0.96  0.86  3.30E-18  187
rs6977490  A  32280351  1  0.6  2.00E-19  188
s.32280356  A  32280356  1  0.6  2.00E-19  189
s.32280614  G  32280614  1  0.6  2.00E-19  190
s.32280995  T  32280995  1  0.6  2.00E-19  191
rs7778162  C  32281009  0.952407  0.616547  1.57E-17  192
rs7778443  T  32281215  0.892731  0.795031  3.03E-23  193
s.32281443  G  32281443  1  0.6  2.00E-19  194
rs10226228  G  32282138  0.925327  0.82756  1.76E-24  195
s.32286305  A  32286305  0.96  0.86  3.30E-18  196
rs2159237  A  32286594  1  0.6  2.00E-19  197
rs1476765  G  32286983  0.924185  0.826363  5.46E-24  198
s.32286984  T  32286984  0.96  0.86  3.30E-18  199
rs11770877  C  32286987  0.96  0.86  3.30E-18  200
s.32288268  A  32288268  0.96  0.86  3.30E-18  201
s.32288404  G  32288404  1  0.22  2.70E-09  202
s.32288470  G  32288470  1  0.22  2.70E-09  203
s.32288482  A  32288482  1  0.22  2.70E-09  204
s.32288491  C  32288491  1  0.93  1.60E-31  205
s.32288538  A  32288538  1  0.22  2.70E-09  206
s.32288631  A  32288631  1  0.22  2.70E-09  207
rs9771228  C  32289021  0.921447  0.763313  6.37E-22  208
rs12540204  C  32289359  0.96  0.86  3.30E-18  209
rs12540232  C  32289486  0.926118  0.85712  9.89E-25  210
s.32289625  C  32289625  1  0.27  1.60E-09  211
rs17161134  T  32290104  1  0.363322  1.54E-12  212
rs215596  A  32292898  0.892731  0.795031  3.03E-23  213
s.32293784  T  32293784  1  0.27  1.60E-09  214
rs11768207  C  32293832  0.952407  0.616547  1.57E-17  215
s.32294392  C  32294392  1  0.28  4.70E-11  216
s.32295805  T  32295805  1  0.27  1.60E-09  217
s.32296223  G  32296223  0.96  0.86  3.30E-18  218
rs215599  C  32296654  0.887325  0.756523  4.05E-21  219
rs10271037  T  32296861  0.952407  0.616547  1.57E-17  220
s.32298436  C  32298436  1  0.2  2.20E-07  221
rs215600  G  32300167  0.961738  0.859477  8.13E-26  222
rs215601  A  32300446  0.925402  0.827693  8.77E-25  223
rs215603  C  32301405  1  0.93  1.60E-31  224
s.32303110  T  32303110  1  0.2  2.20E-07  225
rs215605  G  32303490  1  1  2.10E-36  226
rs1349399  A  32303864  1  0.363322  1.54E-12  227
s.32303968  C  32303968  1  1  3.90E-35  228
s.32304331  T  32304331  1  0.2  2.20E-07  229
s.32304468  C  32304468  1  0.31  1.20E-10  230
rs215607  G  32304862  1  0.407286  1.17E-13  231
rs215608  C  32305081  1  0.25  5.80E-09  232
rs215610  G  32306119  1  0.416667  2.74E-14  233
rs12531858  A  32306624  1  0.409956  5.83E-14  234
rs59238577  A  32306870  1  0.27  1.60E-09  235
rs215611  C  32307963  1  0.963729  1.49E-33  236
rs7780009  A  32308068  1  0.399358  1.62E-13  237
rs7780609  A  32308441  1  0.224793  3.98E-08  238
s.32309135  C  32309135  1  0.2  2.20E-07  239
rs6952052  G  32309418  1  0.2  2.20E-07  240
rs6952609  G  32309860  1  0.388199  2.71E-13  241
s.32310279  T  32310279  1  0.25  5.80E-09  242
rs12538119  C  32311169  1  0.2443  6.45E-09  243
rs7779181  C  32311808  1  0.416667  2.74E-14  244
rs6967626  C  32312312  1  0.45  5.30E-15  245
rs12536117  T  32312397  1  0.2  2.20E-07  246
s.32312803  T  32312803  1  0.27  1.60E-09  247
s.32313220  C  32313220  1  0.2  2.20E-07  248
rs7779130  T  32313483  1  0.384615  5.57E-13  249
rs7778788  C  32313499  1  0.416667  2.74E-14  250
rs7779180  G  32313727  1  0.428571  1.56E-14  251
rs215614  G  32313860  1  1  0  252
s.32314071  C  32314071  1  0.2  2.20E-07  253
s.32314173  G  32314173  1  0.25  5.80E-09  254
s.32314645  T  32314645  1  0.25  5.80E-09  255
s.32314726  C  32314726  1  0.27  1.60E-09  256
s.32316109  T  32316109  1  1  3.90E-35  257
s.32316473  G  32316473  1  0.34  5.60E-13  258
s.32316498  G  32316498  1  1  3.90E-35  259
s.32316632  G  32316632  1  1  3.90E-35  260
s.32318702  C  32318702  1  0.25  5.80E-09  261
s.32318703  A  32318703  1  0.25  5.80E-09  262
s.32318774  C  32318774  1  0.25  5.80E-09  263
s.32319085  A  32319085  1  0.25  5.80E-09  264
s.32319226  T  32319226  1  0.25  5.80E-09  265
rs10951330  A  32320302  1  0.2  2.20E-07  266
s.32320457  G  32320457  1  0.25  5.80E-09  267
rs6462351  C  32320751  1  0.310296  7.74E-11  268
rs11981007  C  32320832  1  0.205368  1.51E-07  269
rs6462352  T  32320870  1  0.363322  1.54E-12  270
rs10951331  G  32320899  1  0.418182  4.33E-14  271
rs11981809  T  32321312  1  0.2  2.20E-07  272
rs6955946  A  32322564  1  0.363322  1.54E-12  273
s.32323398  G  32323398  1  0.2  2.20E-07  274
s.32323503  T  32323503  1  0.2  2.20E-07  275
rs17161177  A  32323543  1  0.322034  2.77E-11  276
rs215622  C  32324184  1  0.892857  5.65E-30  277
s.32324242  T  32324242  1  0.2  2.20E-07  278
s.32324582  T  32324582  1  0.31  1.20E-10  279
rs215623  A  32324690  1  0.346507  2.84E-11  280
rs215624  G  32324698  1  0.363322  1.54E-12  281
rs215625  G  32324838  1  0.864979  1.75E-29  282
rs6943670  C  32325702  1  0.2  2.20E-07  283
s.32325802  T  32325802  1  0.52  4.10E-17  284
s.32325803  G  32325803  1  0.89  1.10E-29  285
rs13235908  T  32326014  1  0.2  2.20E-07  286
rs12531102  C  32326553  1  0.2  2.20E-07  287
s.32326579  A  32326579  1  0.2  2.20E-07  288
s.32326621  C  32326621  1  1  3.90E-35  289
rs215629  G  32326989  1  0.862069  4.74E-29  290
rs1653876  T  32327144  1  0.652174  1.42E-21  291
rs17161184  C  32327864  1  0.322034  2.77E-11  292
s.32328426  G  32328426  1  0.2  2.20E-07  293
s.32328795  A  32328795  1  0.2  2.20E-07  294
rs215630  G  32330037  1  0.363322  1.54E-12  295
rs6462353  C  32330668  1  0.363322  1.54E-12  296
rs1376281  T  32330805  1  0.341985  1.81E-11  297
rs215631  C  32331499  1  0.89  1.10E-29  298
rs6462354  G  32333008  1  0.600465  7.80E-20  299
s.32333151  A  32333151  1  0.55  7.40E-18  300
s.32333776  A  32333776  1  0.27  1.60E-09  301
s.32333955  T  32333955  1  0.29  2.00E-11  302
rs1115318  A  32334175  1  0.604886  1.77E-19  303
rs215632  A  32335049  1  0.963293  4.28E-33  304
s.32335243  C  32335243  1  0.86  2.40E-28  305
rs215634  A  32335673  0.964393  0.929302  9.57E-30  306
rs6955346  C  32336078  1  0.864979  1.75E-29  307
rs215635  C  32336745  1  0.964519  8.27E-34  308
rs11520787  T  32337284  1  0.27  1.60E-09  309
rs10264177  G  32337387  0.911909  0.736185  1.31E-18  310
rs11514764  T  32337477  1  0.259022  5.56E-09  311
rs215636  C  32338444  1  0.6451  3.92E-21  312
rs215637  G  32339923  1  0.55  7.40E-18  313
rs215638  G  32340066  0.96  0.93  4.90E-21  314
rs215639  C  32340164  1  0.852753  3.35E-27  315
rs6977196  C  32340403  1  0.63334  1.52E-20  316
s.32340738  C  32340738  0.94  0.58  1.20E-11  317
s.32340852  A  32340852  1  0.27  1.60E-09  318
s.32343121  T  32343121  1  0.57  1.20E-18  319
rs10238006  A  32343478  1  0.625  1.09E-20  320
s.32344007  G  32344007  0.9  0.57  6.60E-11  321
rs10447642  T  32344090  0.949465  0.573797  5.70E-16  322
s.32344169  G  32344169  1  0.33  3.20E-11  323
s.32344734  C  32344734  1  0.57  1.20E-18  324
s.32345369  G  32345369  0.9  0.32  6.90E-06  325
rs215669  G  32345504  0.964379  0.928341  1.50E-29  326
rs215670  G  32345743  1  0.863471  3.31E-29  327
rs717757  C  32346675  1  0.414279  5.37E-14  328
s.32347343  C  32347343  1  0.21  5.80E-09  329
s.32347366  A  32347366  1  0.25  5.80E-09  330
s.32347375  T  32347375  1  0.25  5.80E-09  331
s.32347376  G  32347376  1  0.57  1.20E-18  332
s.32347462  A  32347462  0.9  0.32  6.90E-06  333
rs186229  C  32348082  0.962689  0.893533  1.31E-27  334
s.32348187  T  32348187  0.94  0.58  1.20E-11  335
rs215672  T  32348235  0.932344  0.372398  3.26E-11  336
s.32348546  A  32348546  1  0.2  1.20E-08  337
s.32348867  A  32348867  1  0.33  3.20E-11  338
rs1653884  T  32348916  0.933025  0.383656  2.38E-11  339
rs215674  A  32349397  0.933111  0.380095  2.61E-11  340
s.32349420  A  32349420  0.96  0.89  2.30E-19  341
rs215675  G  32349522  0.932344  0.372398  3.26E-11  342
rs215676  A  32349676  0.87004  0.341326  4.37E-10  343
rs387575  G  32349712  0.87004  0.341326  4.37E-10  344
rs215677  C  32350076  0.932344  0.372398  3.26E-11  345
s.32350287  T  32350287  0.83  0.31  1.90E-05  346
s.32350891  A  32350891  1  0.21  6.80E-08  347
rs170011  C  32351269  0.931455  0.371953  4.05E-11  348
rs215678  T  32351518  0.81  0.27  6.80E-05  349
rs215679  A  32351884  0.9  0.32  6.90E-06  350
rs215680  A  32352036  0.9  0.32  6.90E-06  351
rs215681  C  32352243  0.930845  0.361693  8.72E-11  352
rs1653887  G  32352768  0.932344  0.372398  3.26E-11  353
rs1653888  G  32353017  0.932344  0.372398  3.26E-11  354
rs1668386  C  32353115  0.929915  0.380428  5.03E-10  355
rs1653889  G  32353178  0.960334  0.849991  9.46E-25  356
rs1668387  C  32353293  0.933894  0.384101  1.92E-11  357
rs1668388  G  32353349  0.81  0.27  6.80E-05  358
rs1668389  G  32353440  0.961345  0.884803  8.80E-26  359
s.32353491  G  32353491  0.9  0.32  6.90E-06  360
rs1668390  C  32353707  0.931482  0.371708  6.54E-11  361
rs690247  C  32353796  0.932344  0.372398  3.26E-11  362
rs690250  A  32353847  0.932344  0.372398  3.26E-11  363
rs183347  G  32354018  0.932344  0.372398  3.26E-11  364
rs177362  C  32354268  0.933894  0.384101  1.92E-11  365
rs177363  G  32354318  0.81  0.27  6.80E-05  366
rs1653890  A  32354480  0.929951  0.391891  4.64E-11  367
rs215682  C  32354577  0.96  0.8  4.30E-16  368
rs215683  C  32354591  1  0.33  3.20E-11  369
rs215684  G  32354704  1  0.65  2.70E-22  370
rs215685  C  32354928  0.9  0.32  6.90E-06  371
rs177364  G  32355007  0.9  0.32  6.90E-06  372
s.32355034  T  32355034  1  0.33  3.20E-11  373
rs215686  G  32355190  1  0.33  3.20E-11  374
rs215687  C  32355242  0.930845  0.361693  8.72E-11  375
rs215690  C  32355344  0.853488  0.295775  2.85E-08  376
rs1376284  G  32355622  0.932344  0.372398  3.26E-11  377
s.32355855  T  32355855  0.83  0.47  1.10E-07  378
s.32355892  G  32355892  0.73  0.23  0.00041  379
rs35554640  C  32356058  0.81  0.27  6.80E-05  380
s.32356066  A  32356066  0.81  0.27  6.80E-05  381
rs1376286  G  32356079  0.933025  0.383656  2.38E-11  382
s.32356229  G  32356229  0.92  0.86  7.00E-18  383
rs1376287  A  32356324  0.932344  0.372398  3.26E-11  384
rs1013772  T  32356476  0.938582  0.426198  1.15E-12  385
rs1013771  G  32356613  0.938644  0.429861  1.64E-12  386
s.32357132  C  32357132  1  0.31  3.50E-12  387
s.32357 194 T 32357194 0.92 0.33 6.40E-07 388 s.32357656 A 32357656 0.92 0.86 7.00E- 18 389 s.3235819 1 T 3235819 1 1 0.28 4.70E- 11 390 rsl668393 c 32360136 0.960244 0.849828 3.82E-24 391 rsl3227922 A 3236 1067 0.932344 0.372398 3.26E- 11 392 rs215692 T 3236 1383 0.962689 0.893533 1.31 E-27 393 rs6979697 T 3236 1995 0.933025 0.383656 2.38E- 11 394 s.32362045 G 32362045 0.81 0.27 6.80E-05 395 rs412876 G 32362185 0.962158 0.893002 6.46E-27 396 rs7777 166 T 32362822 0.956881 0.702553 5.73E-2 1 397 rs780885 1 c 32362945 0.936165 0.409231 7.64E- 12 398 rs215694 T 3236355 1 0.959834 0.850463 8.11E-24 399 rs4723 146 c 32363973 1 0.2941 18 1.62E- 10 400 rs215695 c 32364433 0.947652 0.569003 1.92E- 16 401 rs215696 G 32364553 1 0.564202 1.44E- 18 402 rs215697 C 32364566 1 0.555556 1.22E- 18 403 rs215698 C 32364619 0.947652 0.569003 1.92E- 16 404 rs4723 147 A 32364681 1 0.503952 4.63E- 16 405 rsl65389 1 C 32364947 0.94 0.55 6.10E- 10 406 s.32364994 A 32364994 1 0.31 3.50E- 12 407 s.32365034 C 32365034 0.83 0.47 1.10E-07 408 s.32365036 G 32365036 1 0.25 2.50E- 10 409 rsl668394 A 32365210 0.9471 19 0.561362 5.05E- 16 410 rsl77365 G 32365359 0.88 0.51 1.10E-08 4 11 rs215699 C 32365499 0.94581 1 0.543414 1.23E- 15 412 rs215700 C 3236569 1 0.9159 15 0.203676 3.51E-07 413 Pos in Marker Risk Allele NCBI Build D' R2 P-value Seq I D NO: 36 rsl653892 T 32365994 0.88 0 .51 1.10E-08 414 rs215702 G 32366183 0.957584 0 .736531 5 .29E-22 415 rsl0486507 T 32366358 0.89732 0 .22 1997 8.91 E-07 416 s.32380353 A 32380353 0.65 0.35 2.30E-05 417 rs2099306 G 32397104 0.678859 0 .275093 2.31 E-07 418 rsl700 16 G 32412842 0.768362 0 .575708 2.93E- 13 419 rsl0236370 C 32412984 0.695048 0.383604 4.90E- 10 420 s.32413299 C 32413299 0.94 0 .58 1.90E- 10 421 rsl l23893 A 32413455 0.9161 13 0 .212125 7 .00E-07 422 rs3847 11 T 32413608 0.710852 0.364773 2.45E-09 423 rs430356 A 3241375 1 0.7206 13 0.325 8.34E-08 424 rs730725 T 32413787 0.526692 0 .229073 5.31 E-06 425 rs435584 A 32414209 0.72 17 13 0.33 
1452 1.72E-08 426 s.32414636 C 32414636 0.55 0 .29 0 .00024 427 rsl72558 T 32414646 0.662696 0 .209565 0 .0000 1892 428 rs745707 1 G 32416597 0.69 0.36 1.20E-05 429 rs215749 G 32424097 0.79 0 .24 0 .00096 430

Table 2. Surrogate markers of anchor marker rs6474412 (SEQ ID NO:497) on Chromosome 8p11. Markers were selected using data from the Caucasian HapMap dataset or the publicly available 1000 Genomes project (http://www.1000genomes.org). Markers that have not been assigned rs names are identified by their position in NCBI Build 36 of the human genome assembly. Shown are predicted risk alleles for the surrogate markers, i.e. alleles that are correlated with the risk allele of the anchor marker, rs6474412 allele T. Linkage disequilibrium measures D' and R2, and the corresponding p-value, are also shown. The last column refers to the sequence listing number, identifying the particular SNP.

Marker  Risk Allele  Pos in NCBI Build 36  D'  R2  P-value  Seq ID NO:
s.42329845  A  42329845  0.74  0.22  0.0008  431
s.42601955  A  42601955  0.76  0.25  0.00055  432
s.42618302  C  42618302  1  0.26  3.70E-07  433
rs7013926  G  42620874  1  0.254665  1.92E-06  434
rs12156092  T  42622389  0.68  0.46  1.40E-07  435
rs1868860  G  42622950  1  0.51  3.30E-16  436
rs1530850  C  42624110  0.756221  0.516684  1.30E-12  437
rs6989472  C  42624623  0.768726  0.585241  6.87E-14  438
s.42626122  C  42626122  1  0.36  1.20E-09  439
rs11785591  A  42626897  1  0.418502  4.09E-10  440
rs1947295  A  42628094  0.94  0.8  1.80E-14  441
rs1376442  G  42628754  1  0.770883  2.54E-18  442
rs4737060  T  42629500  0.95  0.86  6.10E-16  443
s.42631425  A  42631425  1  0.29  6.60E-11  444
s.42632599  T  42632599  1  0.47  2.50E-12  445
rs34842664  T  42632832  0.95  0.81  3.40E-15  446
rs1868859  G  42634958  0.88  0.34  2.50E-05  447
rs1868858  C  42635887  1  0.29  5.80E-08  448
rs10958724  C  42636284  0.865042  0.61527  3.10E-14  449
rs28441235  T  42636892  1  0.95  1.40E-26  450
rs7006469  G  42637051  0.84  0.23  0.0008  451
rs10097384  G  42637768  0.84  0.23  0.0008  452
rs4305884  G  42637880  0.84  0.23  0.0008  453
s.42639491  T  42639491  1  1  1.90E-28  454
s.42639835  A  42639835  1  0.59  2.70E-15  455
rs6990603  G  42642196  1  0.882641  2.11E-21  456
rs34456987  A  42642486  1  1  1.90E-28  457
rs10958725  G  42643741  1  1  5.10E-26  458
rs36057318  G  42644441  1  0.81  3.00E-21  459
s.42644785  G  42644785  1  0.44  2.10E-11  460
rs7837296  C  42646051  1  0.882641  2.11E-21  461
s.42646284  G  42646284  1  0.29  5.80E-08  462
s.42646543  A  42646543  1  0.59  2.70E-15  463
rs5005909  A  42647824  1  0.876701  1.25E-20  464
rs1979140  C  42649993  1  1  5.10E-26  465
rs13277840  G  42650156  1  0.542384  1.55E-12  466
s.42651452  G  42651452  1  0.4  1.70E-10  467
s.42652114  C  42652114  1  0.81  3.00E-21  468
rs11783507  A  42653552  1  0.95  1.40E-26  469
rs7816726  G  42654594  1  1  1.90E-28  470
rs10958726  T  42655066  1  1  5.10E-26  471
rs7842601  T  42656212  1  1  5.10E-26  472
s.42656893  G  42656893  1  0.59  2.70E-15  473
rs4295650  A  42656968  1  1  1.90E-28  474
s.42659223  T  42659223  1  0.76  1.10E-21  475
rs6474411  G  42660603  1  1  1.90E-28  476
s.42661513  C  42661513  1  0.91  2.70E-25  477
s.42661651  G  42661651  1  0.26  3.70E-07  478
rs13273442  G  42663174  1  1  5.10E-26  479
s.42664453  C  42664453  1  0.64  4.00E-19  480
s.42664514  T  42664514  1  1  1.90E-28  481
s.42664708  T  42664708  1  0.44  2.10E-11  482
s.42664984  A  42664984  1  1  1.90E-28  483
rs1451239  A  42665699  1  0.94382  4.58E-24  484
rs1451240  G  42665868  1  1  5.10E-26  485
s.42666045  A  42666045  1  1  1.90E-28  486
rs4736835  C  42666190  1  1  1.90E-28  487
s.42666333  T  42666333  1  0.83  2.70E-23  488
s.42666490  C  42666490  1  0.76  6.50E-20  489
rs6987704  C  42666780  1  0.76  6.50E-20  490
rs1530847  T  42667396  1  0.826087  8.63E-20  491
s.42668648  C  42668648  1  1  1.90E-28  492
rs1955185  T  42668804  1  1  5.10E-26  493
rs13277254  A  42669139  1  1  1.90E-28  494
rs13280301  G  42669174  1  0.59  2.70E-15  495
rs13277524  T  42669214  1  1  1.90E-28  496
rs6474412  T  42669655  1  1  0  497
rs6474413  T  42670221  1  1  1.90E-28  498
rs7004381  G  42670318  1  1  5.10E-26  499
rs6985052  T  42670476  1  0.79  1.90E-22  500
rs4950  A  42671790  1  1  5.10E-26  501
rs1530848  T  42672065  1  1  5.10E-26  502
rs9643891  T  42675754  1  1  1.90E-28  503
rs13280604  A  42678743  1  1  5.10E-26  504
rs6997909  G  42679406  1  1  5.10E-26  505
rs6474414  C  42679493  1  1  5.10E-26  506
s.42679689  C  42679689  1  0.26  3.70E-07  507
rs6474415  A  42682095  1  1  5.10E-26  508
rs4951  T  42682714  1  0.95  1.40E-26  509
rs13263434  G  42692210  1  0.59  2.70E-15  510
s.42692248  T  42692248  1  0.47  2.50E-12  511
rs11783289  T  42693753  0.934844  0.762283  6.90E-17  512
rs4236926  G  42697216  0.939315  0.873078  6.70E-18  513
rs13261190  A  42697466  1  0.594716  1.08E-13  514
s.42698182  A  42698182  1  0.91  2.70E-25  515
rs16891561  C  42698896  0.942255  0.885692  1.77E-21  516
rs7459838  A  42703436  1  0.87  3.10E-24  517
rs55828312  A  42708759  0.95  0.85  5.00E-16  518
rs7017612  A  42718402  0.87891  0.733511  1.40E-17  519
rs6984031  T  42718609  0.94  0.74  1.90E-13  520
s.42718611  T  42718611  1  0.81  3.00E-21  521
s.42721825  A  42721825  0.89  0.38  1.70E-06  522
rs7822100*  C  42722508  0.94  0.7  1.00E-12  523
rs7825907  G  42723438  0.78  0.44  3.50E-07  524
rs6982753  A  42723948  0.814692  0.5656  7.35E-14  525
rs9298628  C  42725148  0.814815  0.567575  6.24E-14  526
rs9298629  G  42725343  0.814815  0.567575  6.24E-14  527
rs7824155  A  42726008  0.814815  0.567575  6.24E-14  528
s.42726264  T  42726264  0.84  0.67  3.30E-11  529
rs7824614  A  42726280  0.84  0.67  3.30E-11  530
s.42726938  T  42726938  0.92  0.54  4.60E-09  531
rs2304297  G  42727356  0.935932  0.641678  1.76E-16  532
rs7845663  G  42727720  0.757761  0.517408  1.30E-12  533
rs7812298  C  42727736  0.814567  0.563595  8.68E-14  534
rs7004108  A  42727867  0.84  0.67  3.30E-11  535
s.42728158  C  42728158  1  0.22  2.30E-06  536
s.42728590  C  42728590  0.7  0.25  0.00021  537
s.42729587  T  42729587  0.88  0.34  6.00E-06  538
s.42729589  A  42729589  0.88  0.34  6.00E-06  539
s.42731443  G  42731443  0.82  0.57  2.20E-09  540
s.42731535  C  42731535  0.77  0.4  1.50E-06  541
s.42731657  A  42731657  0.79  0.29  4.40E-05  542
s.42731840  C  42731840  0.7  0.25  0.00021  543
s.42732056  C  42732056  0.82  0.57  2.20E-09  544
rs10110332  C  42732343  0.84  0.67  3.30E-11  545
s.42732673  G  42732673  0.82  0.57  2.20E-09  546
rs892413  C  42733535  0.756221  0.516684  1.30E-12  547
rs10087172  T  42736025  0.756221  0.516684  1.30E-12  548
rs4398905  C  42737067  0.642179  0.314258  7.98E-08  549
rs2196128  T  42737443  0.750908  0.459129  1.48E-11  550
rs2196129  G  42737563  0.642179  0.314258  7.98E-08  551
rs2217732  A  42737603  0.814815  0.567575  6.24E-14  552
rs1072003  C  42739158  0.756221  0.516684  1.30E-12  553
s.42741317  T  42741317  0.82  0.57  2.20E-09  554
s.42744231  C  42744231  1  0.33  8.70E-09  555
rs10109040  C  42744470  0.756221  0.516684  1.30E-12  556
rs16891620  C  42744820  0.642179  0.314258  7.98E-08  557
s.42745575  A  42745575  0.82  0.57  2.20E-09  558
s.42745581  C  42745581  0.82  0.57  2.20E-09  559
rs4737069  A  42745717  0.82  0.57  2.20E-09  560
rs2117225  G  42746270  0.82  0.57  2.20E-09  561
rs2164024  A  42746361  0.82  0.57  2.20E-09  562
rs7828365  C  42748471  0.642179  0.314258  7.98E-08  563
rs10107450  C  42749052  0.873551  0.58822  8.33E-15  564
s.42750441  G  42750441  0.74  0.54  5.10E-09  565
s.42752754  C  42752754  0.76  0.36  5.80E-06  566
s.42753787  A  42753787  1  0.22  2.30E-06  567
s.42756840  A  42756840  0.73  0.41  8.80E-07  568
s.42757418  A  42757418  0.73  0.41  8.80E-07  569
rs10092346  T  42758450  0.73  0.41  8.80E-07  570
s.42759608  T  42759608  0.67  0.22  0.00059  571
s.42760352  G  42760352  1  0.33  8.70E-09  572
s.42761317  A  42761317  0.65  0.25  7.60E-05  573
s.42761892  A  42761892  0.65  0.38  1.60E-06  574
s.42761909  T  42761909  0.7  0.25  0.00021  575
s.42761965  T  42761965  0.67  0.22  0.00059  576
rs1960346  C  42762202  0.65  0.38  1.60E-06  577
s.42763404  T  42763404  0.78  0.44  3.50E-07  578
s.42763808  C  42763808  0.69  0.41  8.60E-07  579
s.42764045  A  42764045  0.7  0.25  0.00021  580
s.42765003  C  42765003  0.69  0.41  8.60E-07  581
rs4567031  G  42765212  0.78  0.44  3.50E-07  582
s.42765421  C  42765421  0.7  0.25  0.00021  583
s.42769020  G  42769020  0.7  0.25  0.00021  584
rs4737071  C  42769593  0.812793  0.536568  2.33E-13  585
rs6474420  A  42770089  0.69  0.41  8.60E-07  586
s.42770829  C  42770829  0.69  0.41  8.60E-07  587
s.42771871  G  42771871  0.69  0.41  8.60E-07  588
rs11986893  A  42772016  0.812793  0.536568  2.33E-13  589
s.42772405  C  42772405  0.7  0.25  0.00021  590
s.42779341  T  42779341  0.77  0.26  0.00014  591
s.42780448  C  42780448  0.77  0.26  0.00014  592
rs6985527  G  42781004  0.710785  0.499137  5.51E-12  593
s.42782778  G  42782778  0.62  0.21  0.00073  594
s.42782938  C  42782938  0.54  0.28  0.00011  595
s.42783806  A  42783806  0.77  0.26  0.00014  596
s.42784397  C  42784397  0.67  0.37  7.20E-06  597
s.42786368  C  42786368  0.79  0.29  4.40E-05  598
s.42787290  T  42787290  0.79  0.29  4.40E-05  599
s.42787506  A  42787506  0.62  0.21  0.00073  600
s.42789990  C  42789990  1  0.22  2.30E-06  601
s.42806166  C  42806166  1  0.26  3.70E-07  602
rs10087388  T  42824082  0.853593  0.339812  1.91E-09  603
s.42837205  C  42837205  1  0.26  3.70E-07  604
s.42844765  G  42844765  1  0.22  2.30E-06  605
s.42859016  C  42859016  1  0.26  3.70E-07  606
s.42886979  A  42886979  0.47  0.2  0.0014  607
s.42936582  G  42936582  0.75  0.22  0.0004  608
rs10106661  G  42972871  0.588753  0.291428  4.66E-07  609
rs34727690**  A  42979501  1  0.26  3.70E-07  610
s.43009750  C  43009750  0.84  0.23  0.00039  611
s.43018519  A  43018519  0.86  0.27  0.00013  612
s.43113569  A  43113569  1  0.22  2.30E-06  613
s.43150703  T  43150703  0.86  0.27  0.00013  614
s.43166986  G  43166986  1  0.26  3.70E-07  615
s.43167001  T  43167001  1  0.26  3.70E-07  616

* rs7822100 is a mixed SNP with possible alleles -/C/T/TTC, where - means deletion.
** rs34727690 is an indel marker with possible alleles -/CTATAT.

Table 3. Surrogate markers of anchor marker rs4105144 (SEQ ID NO:668) on Chromosome 19q13. Markers were selected using data from the Caucasian HapMap dataset or the publicly available 1000 Genomes project (http://www.1000genomes.org). Markers that have not been assigned rs names are identified by their position in NCBI Build 36 of the human genome assembly. Shown are predicted risk alleles for the surrogate markers, i.e. alleles that are correlated with the risk allele of the anchor marker, rs4105144 allele C. Linkage disequilibrium measures D' and R2, and the corresponding p-value, are also shown. The last column refers to the sequence listing number, identifying the particular SNP.

Marker  Risk Allele  Pos in NCBI Build 36  D'  R2  P-value  Seq ID NO:
s.45831417  T  45831417  0.89  0.26  7.50E-05  617
s.45859717  G  45859717  1  0.24  5.20E-09  618
rs11083565  T  45866989  0.519214  0.255528  4.06E-06  619
rs2561537  C  45885420  0.923106  0.252897  5.54E-08  620
rs2604885  A  45898595  0.688941  0.250967  9.13E-07  621
rs7260405  G  45921359  0.59  0.21  0.0021  622
rs2607420  T  45936727  0.568423  0.27559  7.12E-07  623
rs2369302  G  45943020  0.674108  0.264619  1.01E-07  624
rs2254343  C  45947340  0.645593  0.305039  2.48E-08  625
rs1457141  G  45949016  0.553248  0.240493  1.15E-06  626
rs2604874  A  45949656  0.645593  0.305039  2.48E-08  627
s.45950500  A  45950500  0.78  0.22  0.0017  628
s.45950502  G  45950502  0.78  0.22  0.0017  629
rs2607415  C  45954527  0.525153  0.252423  2.95E-07  630
rs2607414  G  45958969  0.525153  0.252423  2.95E-07  631
rs2279011  A  45961128  0.750071  0.230092  1.88E-07  632
rs2249835  T  45961606  0.525153  0.252423  2.95E-07  633
rs2607424  A  45966987  0.525153  0.252423  2.95E-07  634
rs2604869  G  45975533  0.507237  0.240842  9.91E-07  635
rs12973666  C  45981237  0.808936  0.27006  2.57E-08  636
rs2604893  C  45985383  0.603341  0.323045  8.20E-09  637
rs2644898  A  45990855  0.573022  0.300496  2.05E-08  638
rs7252227  T  45992955  0.7488  0.306518  4.16E-08  639
rs7937  T  45994546  0.822109  0.317805  1.20E-09  640
rs2644916  G  46001051  0.578519  0.320014  7.71E-09  641
rs3733828  C  46002449  0.596605  0.275581  8.70E-08  642
rs4803372  T  46013212  0.65  0.21  0.00028  643
s.46014685  A  46014685  0.8  0.33  2.70E-06  644
s.46014686  A  46014686  0.8  0.33  2.70E-06  645
rs4803373  G  46018266  0.59  0.29  3.10E-05  646
s.46019153  A  46019153  1  0.2  2.50E-08  647
s.46019626  C  46019626  0.55  0.25  0.00026  648
s.46020036  T  46020036  0.55  0.25  0.00026  649
s.46024011  T  46024011  0.9  0.28  1.20E-05  650
s.46024197  A  46024197  0.74  0.29  2.30E-05  651
s.46024672  G  46024672  0.58  0.21  0.00067  652
rs11670760  C  46028635  0.6  0.24  0.00022  653
rs12459249  C  46031736  1  0.25  3.50E-10  654
rs7251418  G  46033429  0.905271  0.786167  2.18E-20  655
rs7251418  G  46033429  1  0.25  3.50E-10  656
rs7251570  G  46033590  0.858289  0.615742  8.30E-16  657
rs4343391  C  46036208  1  0.871875  1.06E-25  658
rs4343391  C  46036208  1  0.25  3.50E-10  659
rs7245507  G  46037064  1  0.25  3.50E-10  660
s.46037829  G  46037829  1  0.25  3.50E-10  661
rs11083571  T  46037939  1  0.25  3.50E-10  662
rs2302989  A  46043322  0.686783  0.33864  8.65E-08  663
rs2316213  G  46043445  0.568588  0.208213  0.00002672  664
rs8192725  G  46046552  1  0.22  4.70E-09  665
rs7250713  C  46047035  0.68  0.22  0.0014  666
rs1137115  C  46048121  1  0.23  2.00E-09  667
rs4105144  C  46050464  1  1  0  668
rs10404667  C  46054738  1  0.23  2.00E-09  669
rs4001921  T  46055141  1  0.23  2.00E-09  670
rs8102683  C  46055605  1  0.865314  1.67E-24  671
rs8102683  C  46055605  1  0.23  2.00E-09  672
rs8105704  C  46055738  1  0.23  2.00E-09  673
rs1496402  A  46057974  1  0.955563  1.77E-28  674
rs1496402  A  46057974  0.86  0.2  0.0011  675
rs12610432  C  46059609  0.79  0.21  0.00065  676
s.46062016  C  46062016  0.79  0.21  0.00065  677
rs12461383  G  46062178  1  0.541149  7.86E-18  678
rs12461383  G  46062178  0.79  0.21  0.00065  679
rs4570984  C  46066942  0.617212  0.330101  5.73E-07  680
s.46067487  G  46067487  1  0.32  3.30E-12  681
s.46067573  C  46067573  0.61  0.2  0.00075  682
s.46067911  G  46067911  0.79  0.21  0.00065  683
rs11882981  G  46069846  0.61  0.2  0.00075  684
rs7247469  C  46071161  0.61  0.2  0.00075  685
rs7251315  A  46071683  0.93  0.38  8.70E-08  686
rs6508951  C  46071748  0.54  0.29  0.00011  687
s.46075055  C  46075055  0.58  0.22  0.00036  688
rs3869579  A  46075639  0.78  0.26  2.30E-05  689
s.46075829  T  46075829  0.78  0.26  2.30E-05  690
s.46075942  T  46075942  0.78  0.26  2.30E-05  691
s.46077574  T  46077574  0.77  0.25  3.00E-05  692
rs12973598  G  46077674  0.78  0.26  2.30E-05  693
s.46077976  C  46077976  0.71  0.23  7.30E-05  694
s.46078049  G  46078049  0.78  0.26  2.30E-05  695
s.46078122  C  46078122  0.71  0.23  7.30E-05  696
s.46078260  C  46078260  0.77  0.25  3.00E-05  697
s.46078326  A  46078326  0.71  0.23  7.30E-05  698
s.46078327  G  46078327  0.71  0.23  7.30E-05  699
s.46078334  C  46078334  0.71  0.23  7.30E-05  700
s.46078367  A  46078367  0.71  0.23  7.30E-05  701
s.46078384  G  46078384  0.71  0.23  7.30E-05  702
s.46078387  C  46078387  0.78  0.26  2.30E-05  703
s.46078424  T  46078424  0.78  0.26  2.30E-05  704
s.46079081  A  46079081  0.78  0.26  2.30E-05  705
s.46079140  A  46079140  0.78  0.26  2.30E-05  706
s.46080547  C  46080547  0.78  0.26  2.30E-05  707
s.46080617  G  46080617  0.78  0.26  2.30E-05  708
s.46082249  C  46082249  0.67  0.36  2.50E-06  709
s.46083888  T  46083888  0.78  0.26  2.30E-05  710
rs3875159  C  46083994  0.77  0.25  3.00E-05  711
rs28503746  G  46084975  0.78  0.26  2.30E-05  712
rs3909341  C  46085166  0.56  0.2  0.00044  713
rs4105142  T  46085410  0.71  0.23  7.30E-05  714
rs4105141  A  46085440  0.56  0.2  0.00044  715
rs5007415  A  46085600  0.56  0.2  0.00044  716
s.46085777  A  46085777  0.78  0.26  2.30E-05  717
rs10411264  T  46086176  0.56  0.2  0.00044  718
rs67421541  T  46086260  0.78  0.26  2.30E-05  719
s.46086876  A  46086876  0.78  0.26  2.30E-05  720
s.46087595  A  46087595  0.56  0.2  0.00044  721
rs11083582  C  46087691  0.78  0.26  2.30E-05  722
rs3909342  A  46088530  0.78  0.26  2.30E-05  723
rs4803397  A  46088705  0.661873  0.351138  1.68E-08  724
rs4803397  A  46088705  0.71  0.23  7.30E-05  725
s.46088755  G  46088755  0.56  0.2  0.00044  726
rs3852870  C  46089210  0.78  0.26  2.30E-05  727
s.46089339  G  46089339  0.56  0.2  0.00044  728
rs8103444  A  46089501  0.661873  0.351138  1.68E-08  729
rs3865457  T  46090809  0.56  0.2  0.00044  730
rs4803398  C  46093790  0.81  0.24  0.00011  731
rs6508953  G  46094419  0.81  0.24  0.00011  732
rs10419393  G  46096036  0.56  0.2  0.00044  733
rs7343061  A  46097188  0.78  0.26  2.30E-05  734
rs4803400  G  46097802  0.78  0.26  2.30E-05  735
rs7254188  A  46099183  0.56  0.2  0.00044  736
rs10416968  A  46099477  0.68  0.2  0.0004  737

The sequence data that is obtained may, in certain embodiments, be amino acid sequence data. Polymorphic markers can result in alterations in the amino acid sequence of the encoded polypeptide or protein. In certain embodiments, the analysis of amino acid sequence data comprises determining the presence or absence of an amino acid substitution in the amino acid encoded by the at least one polymorphic marker. Sequence data can in certain embodiments be obtained by analyzing the amino acid sequence encoded by the at least one polymorphic marker in a biological sample obtained from the individual. In certain embodiments, the at least one polymorphic marker that is assessed is an amino acid substitution in a polypeptide encoded by a gene selected from the group consisting of the human CHRNB3, CHRNA6, PDE1C, LSM5, AVL9 (KIAA0241), CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, or RAB4B genes. In other words, the marker may be an amino acid substitution in a polypeptide encoded by any one of those genes.

In certain embodiments of the invention, determination of the presence of particular marker alleles or particular haplotypes is predictive of an increased susceptibility to lung cancer in humans. In certain embodiments, determination of the presence of a marker allele selected from the group consisting of the T allele of rs6474412, the G allele of rs215614, the T allele of rs7937, the C allele of rs4105144 and the G allele of rs7260329 is indicative of increased risk of lung cancer in the individual. These marker alleles confer increased risk of lung cancer with a relative risk or odds ratio greater than unity, and are sometimes also referred to as at-risk alleles or at-risk variants. In certain other embodiments, risk alleles as presented in the surrogate marker Tables 1 to 3 are at-risk alleles indicative of increased risk of lung cancer. Individuals who are homozygous for at-risk alleles are at particularly high risk of developing lung cancer, since their genome includes two copies of the at-risk variant.

Measures of susceptibility or risk include measures such as relative risk (RR), odds ratio (OR), and absolute risk (AR), as described in more detail herein.
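As an illustration of how these measures relate to one another, the following minimal sketch computes RR and OR from counts of carriers and non-carriers. The function names and all counts are hypothetical and chosen only for the example; they are not data from this application.

```python
def relative_risk(carrier_cases, carrier_total, noncarrier_cases, noncarrier_total):
    """RR: absolute risk in carriers divided by absolute risk in non-carriers."""
    risk_carriers = carrier_cases / carrier_total            # absolute risk (AR) among carriers
    risk_noncarriers = noncarrier_cases / noncarrier_total   # absolute risk (AR) among non-carriers
    return risk_carriers / risk_noncarriers


def odds_ratio(carrier_cases, carrier_controls, noncarrier_cases, noncarrier_controls):
    """OR: odds of disease in carriers divided by odds of disease in non-carriers."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


# Hypothetical counts: 30 of 100 carriers and 20 of 100 non-carriers are affected.
rr = relative_risk(30, 100, 20, 100)
odds = odds_ratio(30, 70, 20, 80)
print(f"RR = {rr:.3f}, OR = {odds:.3f}")
```

Both values exceed unity in this made-up example, which is the situation the text describes for an at-risk allele.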

In certain embodiments, increased susceptibility refers to a risk with values of RR or OR of at least 1.05, at least 1.06, at least 1.07, at least 1.08, at least 1.09, at least 1.10, at least 1.11, at least 1.12, at least 1.13, at least 1.14 or at least 1.15. Other numerical non-integer values greater than unity are also possible to characterize the risk, and such numerical values are also contemplated. Certain embodiments relate to individuals who are homozygous for a particular marker, i.e. individuals who carry two copies of the same allele in their genome. One preferred embodiment relates to individuals who are homozygous carriers of an allele selected from the group consisting of the T allele of rs6474412, the G allele of rs215614, the T allele of rs7937, the C allele of rs4105144 and the G allele of rs7260329, or a marker allele in linkage disequilibrium therewith.

In certain other embodiments, determination of the presence of particular marker alleles or particular haplotypes is predictive of a decreased susceptibility to lung cancer in humans. For SNP markers with two alleles, the alternate allele to an at-risk allele will be in decreased frequency in patients compared with controls. Thus, in the determination of SNPs, determination of the presence of the non-risk (alternate) allele is indicative of a decreased susceptibility to lung cancer. Individuals who are homozygous for the alternate (protective) allele are at particularly decreased susceptibility or risk.

To identify further markers that are useful for assessing susceptibility to lung cancer, it may be useful to compare the frequency of marker alleles in individuals with lung cancer to that in control individuals. The control individuals may be a random sample from the general population, i.e. a population cohort. The control individuals may also be a sample from individuals that are disease-free, e.g. individuals who have been confirmed not to have lung cancer, or individuals who have not been diagnosed with lung cancer. In one embodiment, an increase in frequency of at least one allele in at least one polymorphism in individuals diagnosed with lung cancer, as compared with the frequency of the at least one allele in the control group, is indicative of the at least one allele being useful for assessing increased susceptibility to lung cancer. In another embodiment, a decrease in frequency of at least one allele in at least one polymorphism in individuals diagnosed with lung cancer, as compared with the frequency of the at least one allele in the control sample, is indicative of the at least one allele being useful for assessing decreased susceptibility to, or protection against, lung cancer.
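The case-control comparison described above can be sketched as an allelic association test. The example below is an assumption-laden illustration, not the method used in this application: it applies a plain Pearson chi-square test with one degree of freedom to a 2x2 table of allele counts, and all counts and function names are hypothetical.

```python
import math


def allele_frequency(allele_count, chromosome_count):
    """Frequency of an allele among the chromosomes counted in a group."""
    return allele_count / chromosome_count


def allelic_chi_square(case_allele, case_other, control_allele, control_other):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts.

    Assumes every row and column total is non-zero.
    Returns (chi-square statistic, two-sided p-value).
    """
    n = case_allele + case_other + control_allele + control_other
    cases = case_allele + case_other
    controls = control_allele + control_other
    with_allele = case_allele + control_allele
    without_allele = case_other + control_other
    cross = case_allele * control_other - case_other * control_allele
    chi2 = n * cross ** 2 / (cases * controls * with_allele * without_allele)
    # For one degree of freedom the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts on 100 case chromosomes and 100 control chromosomes.
freq_cases = allele_frequency(60, 100)
freq_controls = allele_frequency(40, 100)
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(f"cases {freq_cases:.2f}, controls {freq_controls:.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

An allele that is more frequent in cases than in controls, with a small p-value, would be a candidate for assessing increased susceptibility; the reverse pattern would point to a protective allele.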

In general, sequence data can be obtained by analyzing a sample from an individual, or by analyzing information about specific markers in a database or other data collection, for example a genotype database or a sequence database. The sample is in certain embodiments a nucleic acid sample, or a sample that contains nucleic acid material. Analyzing a sample from an individual may in certain embodiments include the steps of isolating genomic nucleic acid from the sample, amplifying a segment of the genomic nucleic acid that contains at least one polymorphic marker, and determining sequence information about the at least one polymorphic marker. Amplification is preferably performed by Polymerase Chain Reaction (PCR) techniques. In certain embodiments, sequence data can be obtained through nucleic acid sequence information or amino acid sequence information from a preexisting record. Such a preexisting record can be any documentation, database or other form of data storage containing such information. Determination of a susceptibility or risk of a particular individual in general comprises comparison of the genotype information (sequence information) of the individual to a record or database providing a correlation between particular polymorphic marker(s) and susceptibility to disease, e.g. lung cancer. Thus, in specific embodiments, determining a susceptibility comprises comparing the sequence data to a database containing correlation data between the at least one polymorphic marker and susceptibility to lung cancer. In certain embodiments, the database comprises at least one measure of susceptibility to lung cancer for the at least one polymorphic marker. In certain embodiments, the database comprises a look-up table comprising at least one measure of susceptibility to lung cancer for the at least one polymorphic marker. The measure of susceptibility may be in the form of relative risk (RR), absolute risk (AR), or percentage (%). Other convenient measures for describing genetic susceptibility of individuals are also possible, and within the scope of the invention.
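A look-up table of the kind described above might be sketched as follows. The table layout, the function name and every risk value are hypothetical placeholders for illustration only; the marker names come from the text, but no risk estimates from this application are reproduced here.

```python
# Hypothetical look-up table: marker -> number of risk-allele copies -> relative risk
# compared with non-carriers. All numeric values are illustrative placeholders.
SUSCEPTIBILITY_TABLE = {
    "rs6474412": {0: 1.00, 1: 1.10, 2: 1.21},
    "rs215614": {0: 1.00, 1: 1.08, 2: 1.17},
    "rs4105144": {0: 1.00, 1: 1.12, 2: 1.25},
}


def lookup_susceptibility(marker, risk_allele_copies):
    """Return the stored measure of susceptibility for a marker and carrier state.

    Raises KeyError if the marker or the copy number is not in the table.
    """
    return SUSCEPTIBILITY_TABLE[marker][risk_allele_copies]


print(lookup_susceptibility("rs6474412", 2))
```

In practice the stored measure could equally be an absolute risk or a percentage, as the text notes; only the table contents would change.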

Certain embodiments of the invention relate to markers associated with particular genes, e.g. the human CHRNB3, CHRNA6, PDE1C, LSM5, AVL9 (KIAA0241), CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, or RAB4B genes. Markers associated with one or more of these genes are in certain embodiments useful susceptibility markers of lung cancer. Markers that are associated with any one of these genes are in certain embodiments markers that are in linkage disequilibrium (LD) with at least one genetic marker within the genes. In certain embodiments, the markers are located within the genomic segments LD block C07, LD block C08 or LD block C19, as defined herein. In certain embodiments, markers associated with a particular gene are selected from the markers within the gene, i.e. within the genomic region that contains exons, introns and promoter sequences of the gene.

Certain embodiments of the invention relate to markers located within the LD block C07, LD block C08 or LD block C19 as defined herein. It is however also contemplated that surrogate markers may be located outside the physical boundaries of these LD blocks as defined by their genomic locations. This is because recombination events may have led to certain risk surrogates having been "separated" from the main cluster of surrogates, although these surrogates are detecting the same variant. Thus, certain embodiments of the invention are contemplated to also encompass surrogate markers in linkage disequilibrium with rs215614, rs6474412 or rs4105144 that are located outside the physical boundaries of LD block C07, LD block C08 or LD block C19 as defined.
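Linkage disequilibrium between an anchor marker and a surrogate is reported in Tables 1 to 3 as D' and R2. The sketch below shows the standard textbook definitions of these two measures for a pair of biallelic markers; the function name and the input frequencies are hypothetical, and the code is not claimed to be the procedure used to produce the tables.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and allele B at marker 2
    p_a, p_b: frequencies of allele A and allele B (assumed strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies: allele A (20%) is only ever seen together with allele B (50%).
d_prime, r_squared = ld_measures(p_ab=0.2, p_a=0.2, p_b=0.5)
print(f"D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
```

With these made-up inputs D' is 1 while the squared correlation is only 0.25, which illustrates why a surrogate can show complete D' with the anchor yet a modest R2 when the allele frequencies differ.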

In certain embodiments of the invention, more than one polymorphic marker is analyzed to determine lung cancer risk. In certain embodiments, at least two polymorphic markers are analyzed. Thus, in certain embodiments, nucleic acid data about at least two polymorphic markers is obtained.

In certain embodiments, a further step of analyzing at least one haplotype comprising two or more polymorphic markers is included. Any convenient method for haplotype analysis known to the skilled person, such as those described in more detail herein, may be employed in such embodiments. One aspect of the invention relates to a method for determining a susceptibility to lung cancer in a human individual, comprising determining the presence or absence of at least one allele of at least one polymorphic marker in a nucleic acid sample obtained from the individual, or in a genotype dataset from the individual, wherein the at least one polymorphic marker is selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, and wherein determination of the presence of the at least one allele is indicative of a susceptibility to lung cancer. Determination of the presence of an allele that correlates with lung cancer is indicative of an increased susceptibility to lung cancer. Individuals who are homozygous for such alleles are particularly susceptible to lung cancer. On the other hand, individuals who do not carry such at-risk alleles are at a decreased susceptibility of developing lung cancer, as compared with a randomly selected individual from the general population. For SNPs, such individuals will be homozygous for the alternate (protective) allele of the polymorphism.
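Determining the presence or absence of an at-risk allele in a genotype dataset can be sketched as below. The risk alleles for the three anchor markers are taken from the text; the genotype encoding as a two-letter string, the function name and the example genotypes are assumptions made for illustration.

```python
# Anchor markers and their at-risk alleles, as stated in the description.
RISK_ALLELES = {"rs6474412": "T", "rs215614": "G", "rs4105144": "C"}

CARRIER_STATES = ("non-carrier", "heterozygous carrier", "homozygous carrier")


def carrier_status(marker, genotype):
    """Classify a diploid genotype (assumed two-letter string, e.g. "TC") for one marker."""
    alleles = genotype.upper()
    if len(alleles) != 2:
        raise ValueError("expected a two-allele genotype such as 'TC'")
    copies = alleles.count(RISK_ALLELES[marker])
    return CARRIER_STATES[copies]


# Hypothetical genotype dataset for one individual; the non-risk allele letters are illustrative.
individual = {"rs6474412": "TT", "rs215614": "GA", "rs4105144": "TT"}
for marker, genotype in individual.items():
    print(marker, genotype, carrier_status(marker, genotype))
```

A homozygous carrier corresponds to the particularly susceptible group described above, and a non-carrier at a SNP is homozygous for the alternate allele.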

Determination of susceptibility is in some embodiments reported by a comparison with non-carriers of the at-risk allele(s) of polymorphic markers. In certain embodiments, susceptibility is reported based on a comparison with the general population, e.g. compared with a random selection of individuals from the population.

In certain embodiments, polymorphic markers are detected by sequencing technologies. Obtaining sequence information about an individual identifies particular nucleotides in the context of a nucleic acid sequence. For SNPs, sequence information about a single unique sequence site is sufficient to identify alleles at that particular SNP. For markers comprising more than one nucleotide, sequence information about the genomic region of the individual that contains the polymorphic site identifies the alleles of the individual for the particular site. The sequence information can be obtained from a sample from the individual. In certain embodiments, the sample is a nucleic acid sample. In certain other embodiments, the sample is a protein sample.

Various methods for obtaining nucleic acid sequence are known to the skilled person, and all such methods are useful for practicing the invention. Sanger sequencing is a well-known method for generating nucleic acid sequence information. Recent methods for obtaining large amounts of sequence data have been developed, and such methods are also contemplated to be useful for obtaining sequence information. These include pyrosequencing technology (Ronaghi, M. et al. Anal Biochem 267:65-71 (1999); Ronaghi, et al. Biotechniques 25:876-878 (1998)), e.g. 454 pyrosequencing (Nyren, P., et al. Anal Biochem 208:171-175 (1993)), Illumina/Solexa sequencing technology (http://www.illumina.com; see also Strausberg, RL, et al. Drug Disc Today 13:569-577 (2008)), and Supported Oligonucleotide Ligation and Detection Platform (SOLiD) technology (Applied Biosystems, http://www.appliedbiosystems.com; Strausberg, RL, et al. Drug Disc Today 13:569-577 (2008)).

Assessment for markers and haplotypes

The genomic sequence within populations is not identical when individuals are compared. Rather, the genome exhibits sequence variability between individuals at many locations in the genome. Such variations in sequence are commonly referred to as polymorphisms, and there are many such sites within each genome. For example, the human genome exhibits sequence variations which occur on average every 500 base pairs. The most common sequence variant consists of base variations at a single base position in the genome, and such sequence variants, or polymorphisms, are commonly called Single Nucleotide Polymorphisms ("SNPs"). These SNPs are believed to have occurred in a single mutational event, and therefore there are usually two alleles possible at each SNP site: the original allele and the alternate (mutated) allele. Due to natural genetic drift and possibly also selective pressure, the original mutation has resulted in a polymorphism characterized by a particular frequency of its alleles in any given population. Many other types of sequence variants are found in the human genome, including mini- and microsatellites, and insertions, deletions and inversions (also called copy number variations (CNVs)). A polymorphic microsatellite has multiple small repeats of bases (such as CA repeats, TG on the complementary strand) at a particular site in which the number of repeats varies in the general population. In general terms, each version of the sequence with respect to the polymorphic site represents a specific allele of the polymorphic site. These sequence variants can all be referred to as polymorphisms, occurring at specific polymorphic sites characteristic of the sequence variant in question. In general, polymorphisms can comprise any number of specific alleles within the population, although each human individual has two alleles at each polymorphic site: one maternal and one paternal allele. Thus in one embodiment of the invention, the polymorphism is characterized by the presence of two or more alleles in the population. In another embodiment, the polymorphism is characterized by the presence of three or more alleles in the population. In other embodiments, the polymorphism is characterized by four or more alleles, five or more alleles, six or more alleles, seven or more alleles, nine or more alleles, or ten or more alleles. All such polymorphisms can be utilized in the methods and kits of the present invention, and are thus within the scope of the invention.

Due to their abundance, SNPs account for a majority of sequence variation in the human genome. Over 6 million human SNPs have been validated to date (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi). However, CNVs are receiving increased attention. These large-scale polymorphisms (typically 1 kb or larger) account for polymorphic variation affecting a substantial proportion of the assembled human genome; known CNVs cover over 15% of the human genome sequence (Estivill, X., Armengol, L., PLoS Genetics 3:1787-99 (2007); http://projects.tcag.ca/variation/). Most of these polymorphisms are however very rare, and on average affect only a fraction of the genomic sequence of each individual. CNVs are known to affect gene expression, phenotypic variation and adaptation by disrupting gene dosage, and are also known to cause disease (microdeletion and microduplication disorders) and confer risk of common complex diseases, including HIV-1 infection and glomerulonephritis (Redon, R., et al. Nature 23:444-454 (2006)). It is thus possible that either previously described or unknown CNVs represent causative variants in linkage disequilibrium with the disease-associated markers described herein. Methods for detecting CNVs include comparative genomic hybridization (CGH) and genotyping, including use of genotyping arrays, as described by Carter (Nature Genetics 39:S16-S21 (2007)). The Database of Genomic Variants (http://projects.tcag.ca/variation/) contains updated information about the location, type and size of described CNVs. The database currently contains data for over 21,000 CNVs.

In some instances, reference is made to different alleles at a polymorphic site without choosing a reference allele. Alternatively, a reference sequence can be referred to for a particular polymorphic site. The reference allele is sometimes referred to as the "wild-type" allele and it usually is chosen as either the first sequenced allele or as the allele from a "non-affected" individual (e.g., an individual that does not display a trait or disease phenotype).

Alleles for SNP markers as referred to herein refer to the bases A, C, G or T as they occur at the polymorphic site. The allele codes for SNPs used herein are as follows: 1=A, 2=C, 3=G, 4=T. Since human DNA is double-stranded, the person skilled in the art will realise that by assaying or reading the opposite DNA strand, the complementary allele can in each case be measured. Thus, for a polymorphic site (polymorphic marker) characterized by an A/G polymorphism, the methodology employed to detect the marker may be designed to specifically detect the presence of one or both of the two bases possible, i.e. A and G. Alternatively, by designing an assay that detects the complementary strand on the DNA template, the presence of the complementary bases T and C can be measured. Quantitatively (for example, in terms of risk estimates), identical results would be obtained from measurement of either DNA strand (+ strand or - strand).
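
The allele coding and the strand equivalence described above can be illustrated with a short Python sketch. The function names are hypothetical and chosen only for this illustration; the code table itself (1=A, 2=C, 3=G, 4=T) is the one given in the text.

```python
# Allele codes as stated in the text: 1=A, 2=C, 3=G, 4=T.
ALLELE_CODES = {1: "A", 2: "C", 3: "G", 4: "T"}

# Watson-Crick base pairing, used to translate alleles between strands.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def decode_allele(code):
    """Translate a numeric allele code into its base (hypothetical helper)."""
    return ALLELE_CODES[code]


def opposite_strand(alleles):
    """Return the alleles as they would be read on the opposite DNA strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


# An A/G polymorphism read on the other strand appears as T/C.
print(decode_allele(3))             # G
print(opposite_strand(("A", "G")))  # ('T', 'C')
```

Because the mapping between strands is one-to-one, any risk estimate attached to allele A on one strand applies unchanged to allele T on the other.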

Typically, a reference sequence is referred to for a particular sequence. Alleles that differ from the reference are sometimes referred to as "variant" alleles. A variant sequence, as used herein, refers to a sequence that differs from the reference sequence but is otherwise substantially similar. Alleles at the polymorphic genetic markers described herein are variants. Variants can include changes that affect a polypeptide. Sequence differences, when compared to a reference nucleotide sequence, can include the insertion or deletion of a single nucleotide, or of more than one nucleotide, resulting in a frame shift; the change of at least one nucleotide, resulting in a change in the encoded amino acid; the change of at least one nucleotide, resulting in the generation of a premature stop codon; the deletion of several nucleotides, resulting in a deletion of one or more amino acids encoded by the nucleotides; the insertion of one or several nucleotides, such as by unequal recombination or gene conversion, resulting in an interruption of the coding sequence of a reading frame; duplication of all or a part of a sequence; transposition; or a rearrangement of a nucleotide sequence. Such sequence changes can alter the polypeptide encoded by the nucleic acid. For example, if the change in the nucleic acid sequence causes a frame shift, the frame shift can result in a change in the encoded amino acids, and/or can result in the generation of a premature stop codon, causing generation of a truncated polypeptide. Alternatively, a polymorphism can be a synonymous change in one or more nucleotides (i.e., a change that does not result in a change in the amino acid sequence). Such a polymorphism can, for example, alter splice sites, affect the stability or transport of mRNA, or otherwise affect the transcription or translation of an encoded polypeptide. It can also alter DNA stability so as to increase the possibility that structural changes, such as amplifications or deletions, occur at the somatic level.

A haplotype refers to a single-stranded segment of DNA that is characterized by a specific combination of alleles arranged along the segment. For diploid organisms such as humans, a haplotype comprises one member of the pair of alleles for each polymorphic marker or locus. In a certain embodiment, the haplotype can comprise two or more alleles, three or more alleles, four or more alleles, or five or more alleles, each allele corresponding to a specific polymorphic marker along the segment. Haplotypes can comprise a combination of various polymorphic markers, e.g., SNPs and microsatellites, having particular alleles at the polymorphic sites. The haplotypes thus comprise a combination of alleles at various genetic markers.

Detecting specific polymorphic markers and/or haplotypes can be accomplished by methods known in the art for detecting sequences at polymorphic sites. For example, standard techniques for genotyping for the presence of SNPs and/or microsatellite markers can be used, such as fluorescence-based techniques (e.g., Chen, X. et al., Genome Res. 9(5):492-98 (1999); Kutyavin et al., Nucleic Acid Res. 34:e128 (2006)), utilizing PCR, LCR, Nested PCR and other techniques for nucleic acid amplification. Specific commercial methodologies available for SNP genotyping include, but are not limited to, TaqMan genotyping assays and SNPlex platforms (Applied Biosystems), gel electrophoresis (Applied Biosystems), mass spectrometry (e.g., MassARRAY system from Sequenom), minisequencing methods, real-time PCR, Bio-Plex system (BioRad), CEQ and SNPstream systems (Beckman), array hybridization technology (e.g., Affymetrix GeneChip; Perlegen), BeadArray Technologies (e.g., Illumina GoldenGate and Infinium assays), array tag technology (e.g., Parallele), and endonuclease-based fluorescence hybridization technology (Invader; Third Wave). Some of the available array platforms, including Affymetrix SNP Array 6.0 and Illumina CNV370-Duo and 1M BeadChips, include SNPs that tag certain CNVs. This allows detection of CNVs via surrogate SNPs included in these platforms. Thus, by use of these or other methods available to the person skilled in the art, one or more alleles at polymorphic markers, including microsatellites, SNPs or other types of polymorphic markers, can be identified.

In certain embodiments, polymorphic markers are detected by sequencing technologies. Obtaining sequence information about an individual identifies particular nucleotides in the context of a sequence. For SNPs, sequence information about a single unique sequence site is sufficient to identify alleles at that particular SNP. For markers comprising more than one nucleotide, sequence information about the nucleotides of the individual that contain the polymorphic site identifies the alleles of the individual for the particular site. The sequence information can be obtained from a sample from the individual. In certain embodiments, the sample is a nucleic acid sample. In certain other embodiments, the sample is a protein sample. Various methods for obtaining nucleic acid sequence are known to the skilled person, and all such methods are useful for practicing the invention. Sanger sequencing is a well-known method for generating nucleic acid sequence information. Recent methods for obtaining large amounts of sequence data have been developed, and such methods are also contemplated to be useful for obtaining sequence information. These include pyrosequencing technology (Ronaghi, M. et al. Anal Biochem 267:65-71 (1999); Ronaghi, et al. Biotechniques 25:876-878 (1998)), e.g. 454 pyrosequencing (Nyren, P., et al. Anal Biochem 208:171-175 (1993)), Illumina/Solexa sequencing technology (http://www.illumina.com; see also Strausberg, RL, et al. Drug Disc Today 13:569-577 (2008)), and Supported Oligonucleotide Ligation and Detection Platform (SOLiD) technology (Applied Biosystems, http://www.appliedbiosystems.com; Strausberg, RL, et al. Drug Disc Today 13:569-577 (2008)).

It is possible to impute or predict genotypes for un-genotyped relatives of genotyped individuals. For every un-genotyped case, it is possible to calculate the probability of the genotypes of its relatives given its four possible phased genotypes. In practice it may be preferable to include only the genotypes of the case's parents, children, siblings, half-siblings (and the half-sibling's parents), grand-parents, grand-children (and the grand-children's parents) and spouses. It will be assumed that the individuals in the small sub-pedigrees created around each case are not related through any path not included in the pedigree. It is also assumed that alleles that are not transmitted to the case have the same frequency, namely the population allele frequency. Let us consider a SNP marker with the alleles A and G. The probability of the genotypes of the case's relatives can then be computed by:

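$$\Pr(\text{genotypes of relatives};\, \Theta) = \sum_{h \in \{AA,\, AG,\, GA,\, GG\}} \Pr(h;\, \Theta)\, \Pr(\text{genotypes of relatives} \mid h),$$

where Θ denotes the A allele's frequency in the cases and the sum runs over the four possible phased genotypes h of the case. Assuming the genotypes of each set of relatives are independent, this allows us to write down a likelihood function for Θ:

$$L(\Theta) = \prod_{i} \Pr(\text{genotypes of relatives of case } i;\, \Theta). \qquad (*)$$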

This assumption of independence is usually not correct. Accounting for the dependence between individuals is a difficult and potentially prohibitively expensive computational task. The likelihood function in (*) may be thought of as a pseudolikelihood approximation of the full likelihood function for Θ which properly accounts for all dependencies. In general, the genotyped cases and controls in a case-control association study are not independent, and applying the case-control method to related cases and controls is an analogous approximation. The method of genomic control (Devlin, B. et al., Nat Genet 36, 1129-30; author reply 1131 (2004)) has proven to be successful at adjusting case-control test statistics for relatedness. We therefore apply the method of genomic control to account for the dependence between the terms in our pseudolikelihood and produce a valid test statistic.
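
As a minimal sketch of how the pseudolikelihood (*) could be evaluated, the Python code below takes, for each un-genotyped case, the four conditional probabilities Pr(genotypes of relatives | h) as given inputs and maximises over Θ on a grid. Two things are assumptions made only for this illustration and are not stated in the text: the two alleles of a case are treated as independent draws, so that Pr(AA; Θ) = Θ², Pr(AG; Θ) = Pr(GA; Θ) = Θ(1 - Θ) and Pr(GG; Θ) = (1 - Θ)²; and a simple grid search stands in for whatever optimiser is actually used. All function names are hypothetical, and the genomic control adjustment is not shown.

```python
import math

PHASED = ("AA", "AG", "GA", "GG")


def phased_genotype_probs(theta):
    """Pr(h; theta) for the four phased genotypes of a case.

    Assumption for this sketch: the two alleles of a case are
    independent draws with A-allele frequency theta.
    """
    return {
        "AA": theta * theta,
        "AG": theta * (1.0 - theta),
        "GA": (1.0 - theta) * theta,
        "GG": (1.0 - theta) ** 2,
    }


def log_pseudolikelihood(theta, cases):
    """Log of (*): sum over cases of log sum_h Pr(h; theta) Pr(relatives | h).

    `cases` is a list of dicts mapping each phased genotype h to
    Pr(genotypes of relatives | h), computed elsewhere from the pedigree.
    """
    prior = phased_genotype_probs(theta)
    total = 0.0
    for cond in cases:
        total += math.log(sum(prior[h] * cond[h] for h in PHASED))
    return total


def maximise_theta(cases, grid=999):
    """Grid search for the theta in (0, 1) that maximises the pseudolikelihood."""
    thetas = [(i + 1) / (grid + 1) for i in range(grid)]
    return max(thetas, key=lambda t: log_pseudolikelihood(t, cases))
```

The per-case conditional probabilities are the expensive part in practice, since they depend on the pedigree around each case; the sketch deliberately leaves them as inputs.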

Fisher's information can be used to estimate the effective sample size of the part of the pseudolikelihood due to un-genotyped cases. Breaking the total Fisher information, $I$, into the part due to genotyped cases, $I_g$, and the part due to un-genotyped cases, $I_u$, so that $I = I_g + I_u$, and denoting the number of genotyped cases by $N$, the effective sample size due to the un-genotyped cases is estimated by $(I_u / I_g)\, N$.
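
The estimate above is a simple ratio, shown here as a short Python sketch. The function name and the numbers in the usage line are hypothetical and serve only to illustrate the arithmetic.

```python
def effective_sample_size_ungenotyped(info_genotyped, info_ungenotyped, n_genotyped):
    """Effective number of cases contributed by un-genotyped cases: (I_u / I_g) * N."""
    if info_genotyped <= 0:
        raise ValueError("Fisher information from genotyped cases must be positive")
    return info_ungenotyped / info_genotyped * n_genotyped


# Hypothetical values: the un-genotyped part carries a quarter of the
# information carried by 1000 genotyped cases.
print(effective_sample_size_ungenotyped(200.0, 50.0, 1000))  # 250.0
```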

In the present context, an individual who is at an increased susceptibility (i.e., increased risk) for a disease is an individual in whom at least one specific allele at one or more polymorphic markers or haplotypes conferring increased susceptibility (increased risk) for the disease is identified (i.e., at-risk marker alleles or haplotypes). The at-risk marker or haplotype is one that confers an increased risk (increased susceptibility) of the disease. In one embodiment, significance associated with a marker or haplotype is measured by a relative risk (RR). In another embodiment, significance associated with a marker or haplotype is measured by an odds ratio (OR). In a further embodiment, the significance is measured by a percentage. In one embodiment, a significant increased risk is measured as a risk (relative risk and/or odds ratio) of at least 1.05, including but not limited to: at least 1.06, at least 1.07, at least 1.08, at least 1.09, at least 1.10, at least 1.11, at least 1.12, at least 1.13, at least 1.14, at least 1.15, at least 1.16, at least 1.17, at least 1.18, at least 1.19, at least 1.20, at least 1.25, at least 1.30, at least 1.35, at least 1.40, at least 1.45 and at least 1.50. In a particular embodiment, a risk (relative risk and/or odds ratio) of at least 1.05 is significant. In another particular embodiment, a risk of at least 1.09 is significant. In yet another embodiment, a risk of at least 1.10 is significant. In a further embodiment, a relative risk of at least 1.12 is significant. In another further embodiment, a risk of at least 1.15 is significant. However, other cutoffs are also contemplated, e.g., at least 1.20, 1.25, 1.35, and so on, and such cutoffs are also within scope of the present invention. In other embodiments, a significant increase in risk is at least about 5%, including but not limited to about 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, and 20%. In one particular embodiment, a significant increase in risk is at least 9%. In other embodiments, a significant increase in risk is at least 10%, at least 12%, and at least 15%. Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the invention are however also contemplated, and those are also within scope of the present invention. In certain embodiments, a significant increase in risk is characterized by a p-value, such as a p-value of less than 0.05, less than 0.01, less than 0.001, less than 0.0001, less than 0.00001, less than 0.000001, less than 0.0000001, less than 0.00000001, or less than 0.000000001.
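
To make the risk measures concrete, the following Python sketch computes an allelic odds ratio from a two-by-two table of allele counts and compares it with one of the cutoffs listed above. The counts are invented for illustration and are not data from this application; the function names are hypothetical.

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Allelic odds ratio from counts of risk and other alleles in cases and controls."""
    return (case_risk * control_other) / (case_other * control_risk)


def exceeds_cutoff(risk, cutoff=1.05):
    """True if the estimated risk reaches the chosen significance cutoff."""
    return risk >= cutoff


# Hypothetical allele counts: 300 risk / 700 other alleles in cases,
# 250 risk / 750 other alleles in controls.
or_estimate = odds_ratio(300, 700, 250, 750)
print(round(or_estimate, 3))                      # 1.286
print(exceeds_cutoff(or_estimate, cutoff=1.15))   # True
print(f"{(or_estimate - 1) * 100:.1f}% increase in odds")
```

An odds ratio only approximates a relative risk when the disease is rare in the studied population, which is why the text lists the two measures as separate embodiments.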

An at-risk polymorphic marker or haplotype as described herein is one where at least one allele of at least one marker or haplotype is more frequently present in an individual at risk for the disease (e.g., lung cancer) (affected), or diagnosed with the disease, compared to the frequency of its presence in a comparison group (control), such that the presence of the marker or haplotype is indicative of susceptibility to the disease. The control group may in one embodiment be a population sample, i.e. a random sample from the general population. In another embodiment, the control group is represented by a group of individuals who are disease-free, e.g. those that have not been diagnosed with lung cancer. In another embodiment, the disease-free control group is characterized by the absence of one or more disease-specific risk factors. Such risk factors are in one embodiment at least one environmental risk factor, such as smoking. In certain embodiments, the control group comprises individuals who have never smoked and have never been diagnosed with lung cancer. In certain other embodiments, the control group comprises individuals who do have a history of smoking but have not been diagnosed with lung cancer.

An example of a simple test for correlation would be a Fisher exact test on a two-by-two table. Given a cohort of chromosomes, the two-by-two table is constructed from the number of chromosomes that include both of the markers or haplotypes, one of the markers or haplotypes but not the other, and neither of the markers or haplotypes. Other statistical tests of association known to the skilled person are also contemplated and are also within scope of the invention.
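
A Fisher exact test on such a table can be sketched in plain Python as follows. The cell labels (both markers, first only, second only, neither) follow the description above; the two-sided convention used here, summing the probabilities of all tables that are no more likely than the observed one, is one common choice and is an assumption of this sketch. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(both, first_only, second_only, neither):
    """Two-sided Fisher exact p-value for a two-by-two table of chromosome counts.

    Rows: first marker present / absent; columns: second marker present / absent.
    """
    row1 = both + first_only
    row2 = second_only + neither
    col1 = both + second_only
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(both)
    # Sum every table at least as extreme (no more probable) than the observed one.
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: 3 chromosomes carry both markers, 1 only the first,
# 1 only the second, and 3 carry neither.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

For large cohorts the binomial coefficients become very large integers; Python handles these exactly, but a dedicated statistics library would normally be preferred in practice.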

In certain embodiments of the invention, an individual who is at a decreased susceptibility (i.e., at a decreased risk) for a disease is an individual in whom at least one specific allele at one or more polymorphic markers or haplotypes conferring decreased susceptibility for the disease is identified. The marker alleles and/or haplotypes conferring decreased risk are also said to be protective. In one aspect, the protective marker or haplotype is one that confers a significant decreased risk (or susceptibility) of the disease or trait. In one embodiment, significant decreased risk is measured as a relative risk (or odds ratio) of less than 0.95, including but not limited to less than 0.90, less than 0.85, less than 0.80, and less than 0.75. In another embodiment, the decrease in risk (or susceptibility) is at least 5%, including but not limited to at least 6%, at least 7%, at least 8%, at least 9%, at least 10%, at least 11%, at least 12%, at least 13%, at least 14%, at least 15%, at least 20%, at least 25%, and at least 30%. In one particular embodiment, a significant decrease in risk is at least about 10%. In another embodiment, a significant decrease in risk is at least about 12%. In another embodiment, the decrease in risk is at least about 15%. Other cutoffs or ranges as deemed suitable by the person skilled in the art to characterize the invention are however also contemplated, and those are also within scope of the present invention.

A genetic variant associated with a disease can be used alone to predict the risk of the disease for a given genotype. For a biallelic marker, such as a SNP, there are 3 possible genotypes: homozygote for the at-risk variant, heterozygote, and non-carrier of the at-risk variant. Risk associated with variants at multiple loci can be used to estimate overall risk. For multiple SNP variants, there are k possible genotypes, $k = 3^{n} \times 2^{p}$, where n is the number of autosomal loci and p the number of gonosomal (sex chromosomal) loci. Overall risk assessment calculations for a plurality of risk variants usually assume that the relative risks of different genetic variants multiply, i.e. the overall risk (e.g., RR or OR) associated with a particular genotype combination is the product of the risk values for the genotype at each locus. If the risk presented is the relative risk for a person, or a specific genotype for a person, compared to a reference population with matched gender and ethnicity, then the combined risk is the product of the locus-specific risk values and also corresponds to an overall risk estimate compared with the population. If the risk for a person is based on a comparison to non-carriers of the at-risk allele, then the combined risk corresponds to an estimate that compares the person with a given combination of genotypes at all loci to a group of individuals who do not carry risk variants at any of those loci. The group of non-carriers of any at-risk variant has the lowest estimated risk and has a combined risk compared with itself (i.e., non-carriers) of 1.0, but has an overall risk, compared with the population, of less than 1.0. It should be noted that the group of non-carriers can potentially be very small, especially for a large number of loci, and in that case, its relevance is correspondingly small.
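
Under the multiplicative assumption described above, the combined risk is simply the product of the per-locus risk values. The Python sketch below uses invented per-locus values purely for illustration; the function name is hypothetical.

```python
from math import prod


def combined_risk(per_locus_risks):
    """Combined risk under the multiplicative model: the product of per-locus risks.

    All values must be expressed against the same reference (either the
    general population or non-carriers), as discussed in the text.
    """
    return prod(per_locus_risks)


# Hypothetical genotype-specific risks at three loci, relative to the population.
print(round(combined_risk([1.10, 1.20, 1.00]), 2))  # 1.32
```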

The multiplicative model is a parsimonious model that usually fits the data of complex traits reasonably well. Deviations from the multiplicative model have rarely been described in the context of common variants for common diseases, and if reported are usually only suggestive, since very large sample sizes are usually required to be able to demonstrate statistical interactions between loci.

By way of an example, let us consider the case where a total of eight variants have been associated with a disease. One such example is provided by prostate cancer (Gudmundsson, J., et al., Nat Genet 39:631-7 (2007); Gudmundsson, J., et al., Nat Genet 39:977-83 (2007); Yeager, M., et al., Nat Genet 39:645-49 (2007); Amundadottir, L., et al., Nat Genet 38:652-8 (2006); Haiman, C.A., et al., Nat Genet 39:638-44 (2007)). Seven of these loci are on autosomes, and the remaining locus is on chromosome X. The total number of theoretical genotypic combinations is then $3^{7} \times 2^{1} = 4374$. Some of those genotypic classes are very rare, but are still possible, and should be considered for overall risk assessment.
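
The count in this example can be checked with a few lines of Python. The function name is hypothetical; the formula is the one stated in the text, with three genotypes per autosomal locus and two per sex-chromosomal locus.

```python
def genotype_combinations(n_autosomal, n_sex_chromosomal):
    """Number of theoretical genotype combinations: 3^n * 2^p."""
    return 3 ** n_autosomal * 2 ** n_sex_chromosomal


# Seven autosomal loci and one locus on chromosome X, as in the example above.
print(genotype_combinations(7, 1))  # 4374
```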

It is likely that the multiplicative model applied in the case of multiple genetic variants will also be valid in conjunction with non-genetic risk variants, assuming that the genetic variant does not clearly correlate with the "environmental" factor. In other words, genetic and non-genetic at-risk variants can be assessed under the multiplicative model to estimate combined risk, assuming that the non-genetic and genetic risk factors do not interact.

Using the same quantitative approach, the combined or overall risk associated with any plurality of variants associated with lung cancer may be assessed. For example, the combined risk of any plurality of the variants described herein (e.g., rs6474412, rs215614 and rs4105144, and their surrogates) may be assessed. Further, other markers described to be associated with risk of lung cancer may be assessed in combination with any one of the markers described herein, e.g. markers in the CHRNA5/CHRNA3/CHRNB4 gene cluster on chromosome 15 (e.g., rs1051730, or its surrogates).

Linkage Disequilibrium

The natural phenomenon of recombination, which occurs on average once for each chromosomal pair during each meiotic event, represents one way in which nature provides variations in sequence (and biological function by consequence). It has been discovered that recombination does not occur randomly in the genome; rather, there are large variations in the frequency of recombination rates, resulting in small regions of high recombination frequency (also called recombination hotspots) and larger regions of low recombination frequency, which are commonly referred to as Linkage Disequilibrium (LD) blocks (Myers, S. et al., Biochem Soc Trans 34:526-530 (2006); Jeffreys, A.J., et al., Nature Genet 29:217-222 (2001); May, C.A., et al., Nature Genet 31:272-275 (2002)).

Linkage Disequilibrium (LD) refers to a non-random assortment of two genetic elements. For example, if a particular genetic element (e.g., an allele of a polymorphic marker, or a haplotype) occurs in a population at a frequency of 0.50 (50%) and another element occurs at a frequency of 0.50 (50%), then the predicted occurrence of a person's having both elements is 0.25 (25%), assuming a random distribution of the elements. However, if it is discovered that the two elements occur together at a frequency higher than 0.25, then the elements are said to be in linkage disequilibrium, since they tend to be inherited together at a higher rate than what their independent frequencies of occurrence (e.g., allele or haplotype frequencies) would predict. Roughly speaking, LD is generally correlated with the frequency of recombination events between the two elements. Allele or haplotype frequencies can be determined in a population by genotyping individuals in a population and determining the frequency of the occurrence of each allele or haplotype in the population. For populations of diploids, e.g., human populations, individuals will typically have two alleles or allelic combinations for each genetic element (e.g., a marker, haplotype or gene).

Many different measures have been proposed for assessing the strength of linkage disequilibrium (LD; reviewed in Devlin, B. & Risch, N., Genomics 29:311-22 (1995)). Most capture the strength of association between pairs of biallelic sites. Two important pairwise measures of LD are r² (sometimes denoted Δ²) and |D'| (Lewontin, R., Genetics 49:49-67 (1964); Hill, W.G. & Robertson, A. Theor. Appl. Genet. 22:226-231 (1968)). Both measures range from 0 (no disequilibrium) to 1 ('complete' disequilibrium), but their interpretation is slightly different. |D'| is defined in such a way that it is equal to 1 if just two or three of the possible haplotypes for two markers are present, and it is < 1 if all four possible haplotypes are present. Therefore, a value of |D'| that is < 1 indicates that historical recombination may have occurred between two sites (recurrent mutation can also cause |D'| to be < 1, but for single nucleotide polymorphisms (SNPs) this is usually regarded as being less likely than recombination). The measure r² represents the statistical correlation between two sites, and takes the value of 1 if only two haplotypes are present.
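
The two pairwise measures can be computed from haplotype and allele frequencies using their standard definitions, D = p(AB) - p(A)p(B), |D'| = |D| / D_max and r² = D² / (p(A)(1 - p(A))p(B)(1 - p(B))). The Python sketch below is illustrative only; the function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and allele B at site 2
    p_a, p_b: frequencies of allele A at site 1 and allele B at site 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Two elements at frequency 0.5 found together at exactly 0.25: no LD.
print(ld_measures(0.25, 0.5, 0.5))   # (0.0, 0.0, 0.0)
# Only two haplotypes present: complete LD by both measures.
print(ld_measures(0.5, 0.5, 0.5))    # (0.25, 1.0, 1.0)
# Three haplotypes present: |D'| is 1 while r^2 is about 0.33.
d, d_prime, r2 = ld_measures(0.25, 0.5, 0.25)
print(d_prime, round(r2, 2))         # 1.0 0.33
```

The third case shows the difference in interpretation noted above: |D'| reaches 1 whenever at least one of the four haplotypes is absent, whereas r² reaches 1 only when two haplotypes remain.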

The r² measure is arguably the most relevant measure for association mapping, because there is a simple inverse relationship between r² and the sample size required to detect association between susceptibility loci and SNPs. These measures are defined for pairs of sites, but for some applications a determination of how strong LD is across an entire region that contains many polymorphic sites might be desirable (e.g., testing whether the strength of LD differs significantly among loci or across populations, or whether there is more or less LD in a region than predicted under a particular model). Roughly speaking, r measures how much recombination would be required under a particular population model to generate the LD that is seen in the data. This type of method can potentially also provide a statistically rigorous approach to the problem of determining whether LD data provide evidence for the presence of recombination hotspots. For the methods described herein, a significant r² value between markers indicative of the markers being in linkage disequilibrium can be at least 0.1, such as at least 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, or at least 0.99. In one preferred embodiment, the significant r² value can be at least 0.2. In another embodiment, the significant r² value is at least 0.3. In another embodiment, the significant r² value is at least 0.4. Other r² values are also contemplated, and are also within the scope of the invention, including but not limited to at least 0.5, 0.6, 0.7, 0.8 and 0.9. The values of r² given in the surrogate Tables 1 to 3 may be used to select markers fulfilling any suitable criteria of r² values.
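
The inverse relationship between r² and sample size mentioned above is commonly expressed as follows: if N samples suffice to detect an association when the susceptibility variant itself is typed, roughly N / r² samples are needed to reach comparable power when a surrogate marker is typed instead. The Python sketch below illustrates this rule of thumb; the function name and numbers are hypothetical, and the rule is an approximation rather than a statement taken from this application.

```python
def required_sample_size(n_direct, r2):
    """Approximate sample size needed at a surrogate marker with the given r^2.

    n_direct is the sample size that would suffice if the susceptibility
    variant itself were genotyped.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return n_direct / r2


# A surrogate with r^2 = 0.5 needs about twice as many samples.
print(required_sample_size(1000, 0.5))  # 2000.0
```
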

Linkage disequilibrium can be determined in a single human population, as defined herein, or it can be determined in a collection of samples comprising individuals from more than one human population. In one embodiment of the invention, LD is determined in a sample from one or more of the HapMap populations (Caucasian, African (Yoruba), Japanese, Chinese), as defined (http://www.hapmap.org). In one such embodiment, LD is determined in the CEU population of the HapMap samples (Utah residents with ancestry from northern and western Europe). In another embodiment, LD is determined in the YRI population of the HapMap samples (Yoruba in Ibadan, Nigeria). In another embodiment, LD is determined in the CHB population of the HapMap samples (Han Chinese from Beijing, China). In another embodiment, LD is determined in the JPT population of the HapMap samples (Japanese from Tokyo, Japan). In yet another embodiment, LD is determined in samples from the Icelandic population.

If all polymorphisms in the genome were independent at the population level (i.e., no LD), then every single one of them would need to be investigated in association studies, to assess all the different polymorphic states. However, due to linkage disequilibrium between polymorphisms, tightly linked polymorphisms are strongly correlated, which reduces the number of polymorphisms that need to be investigated in an association study to observe a significant association. Another consequence of LD is that many polymorphisms may give an association signal due to the fact that these polymorphisms are strongly correlated.

Genomic LD maps have been generated across the genome, and such LD maps have been proposed to serve as a framework for mapping disease genes (Risch, N. & Merikangas, K., Science 273:1516-1517 (1996); Maniatis, N., et al., Proc Natl Acad Sci USA 99:2228-2233 (2002); Reich, D.E. et al., Nature 411:199-204 (2001)).

It is now established that many portions of the human genome can be broken into series of discrete haplotype blocks containing a few common haplotypes; for these blocks, linkage disequilibrium data provides little evidence indicating recombination (see, e.g., Wall, J.D. and Pritchard, J.K., Nature Reviews Genetics 4:587-597 (2003); Daly, M. et al., Nature Genet. 29:229-232 (2001); Gabriel, S.B. et al., Science 296:2225-2229 (2002); Patil, N. et al., Science 294:1719-1723 (2001); Dawson, E. et al., Nature 418:544-548 (2002); Phillips, M.S. et al., Nature Genet. 33:382-387 (2003)).

There are two main methods for defining these haplotype blocks: blocks can be defined as regions of DNA that have limited haplotype diversity (see, e.g., Daly, M. et al., Nature Genet. 29:229-232 (2001); Patil, N. et al., Science 294:1719-1723 (2001); Dawson, E. et al., Nature 418:544-548 (2002); Zhang, K. et al., Proc. Natl. Acad. Sci. USA 99:7335-7339 (2002)), or as regions between transition zones having extensive historical recombination, identified using linkage disequilibrium (see, e.g., Gabriel, S.B. et al., Science 296:2225-2229 (2002); Phillips, M.S. et al., Nature Genet. 33:382-387 (2003); Wang, N. et al., Am. J. Hum. Genet. 71:1221-1234 (2002); Stumpf, M.P., and Goldstein, D.B., Curr. Biol. 13:1-8 (2003)). More recently, a fine-scale map of recombination rates and corresponding hotspots across the human genome has been generated (Myers, S., et al., Science 310:321-324 (2005); Myers, S. et al., Biochem Soc Trans 34:526-530 (2006)). The map reveals the enormous variation in recombination across the genome, with recombination rates as high as 10-60 cM/Mb in hotspots, while closer to 0 in intervening regions, which thus represent regions of limited haplotype diversity and high LD. The map can therefore be used to define haplotype blocks/LD blocks as regions flanked by recombination hotspots. As used herein, the terms "haplotype block" or "LD block" include blocks defined by any of the above described characteristics, or other alternative methods used by the person skilled in the art to define such regions.

Haplotype blocks (LD blocks) can be used to map associations between phenotype and haplotype status, using single markers or haplotypes comprising a plurality of markers. The main haplotypes can be identified in each haplotype block, and then a set of "tagging" SNPs or markers (the smallest set of SNPs or markers needed to distinguish among the haplotypes) can then be identified. These tagging SNPs or markers can then be used in assessment of samples from groups of individuals, in order to identify association between phenotype and haplotype. For example, markers shown herein to be associated with lung cancer are such tagging markers.

It has thus become apparent that for any given observed association to a polymorphic marker in the genome, additional markers in the genome also show association. This is a natural consequence of the uneven distribution of LD across the genome, as observed by the large variation in recombination rates. The markers used to detect association thus in a sense represent "tags" for a genomic region (i.e., a haplotype block or LD block) that is associating with a given disease or trait, and as such are useful for use in the methods and kits of the present invention. One or more causative (functional) variants or mutations may reside within the region found to be associating to the disease or trait. The functional variant may be another SNP, a tandem repeat polymorphism (such as a minisatellite or a microsatellite), a transposable element, or a copy number variation, such as an inversion, deletion or insertion. Such variants in LD with the variants described herein may confer a higher relative risk (RR) or odds ratio (OR) than observed for the tagging markers used to detect the association. The present invention thus refers to the markers used for detecting association to the disease, as described herein, as well as markers in linkage disequilibrium with the markers. Thus, in certain embodiments of the invention, markers that are in LD with the markers originally used to detect an association may be used as surrogate markers. The surrogate markers have in one embodiment relative risk (RR) and/or odds ratio (OR) values smaller than originally detected. In other embodiments, the surrogate markers have RR or OR values greater than those initially determined for the markers initially found to be associating with the disease.
An example of such an embodiment would be a rare, or relatively rare (such as < 10% allelic population frequency) variant in LD with a more common variant (> 10% population frequency) initially found to be associating with the disease. Identifying and using such surrogate markers for detecting the association can be performed by routine methods well known to the person skilled in the art, and are therefore within the scope of the present invention.

Determination of haplotype frequency

The frequencies of haplotypes in patient and control groups can be estimated using an expectation-maximization algorithm (Dempster A. et al., J. R. Stat. Soc. B, 39:1-38 (1977)). An implementation of this algorithm that can handle missing genotypes and uncertainty with the phase can be used. Under the null hypothesis, the patients and the controls are assumed to have identical frequencies. Using a likelihood approach, an alternative hypothesis is tested, where a candidate at-risk haplotype, which can include the markers described herein, is allowed to have a higher frequency in patients than controls, while the ratios of the frequencies of other haplotypes are assumed to be the same in both groups. Likelihoods are maximized separately under both hypotheses and a corresponding 1-df likelihood ratio statistic is used to evaluate the statistical significance.
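The expectation-maximization step can be illustrated with a deliberately reduced sketch for two biallelic SNPs, where the only phase-ambiguous genotype is the double heterozygote. This is not the implementation referred to above: it assumes complete genotypes (no missing data) and Hardy-Weinberg proportions, and the function name and genotype coding are hypothetical.

```python
def em_haplotype_freqs(genotypes, max_iter=200, tol=1e-10):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 from unphased genotypes.

    genotypes: non-empty list of (g1, g2); each value is the count (0, 1 or 2)
    of allele "1" at the first and second SNP.
    """
    freqs = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = dict.fromkeys(freqs, 0.0)
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between 00/11 and 01/10
                cis = freqs["00"] * freqs["11"]
                trans = freqs["01"] * freqs["10"]
                share = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["00"] += share
                counts["11"] += share
                counts["01"] += 1.0 - share
                counts["10"] += 1.0 - share
            else:
                # Phase is unambiguous: read the two haplotypes off directly
                first = ("1" if g1 == 2 else "0") + ("1" if g2 == 2 else "0")
                second = ("1" if g1 >= 1 else "0") + ("1" if g2 >= 1 else "0")
                counts[first] += 1.0
                counts[second] += 1.0
        # M-step: new frequencies are the expected haplotype counts per chromosome
        new_freqs = {h: c / n_chrom for h, c in counts.items()}
        converged = max(abs(new_freqs[h] - freqs[h]) for h in freqs) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs
```

Running the routine separately on patients and on controls yields the group-specific frequency estimates that enter the likelihood comparison described above.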

To look for at-risk and protective markers and haplotypes within a susceptibility region, for example within an LD block, association of all possible combinations of genotyped markers within the region is studied. The combined patient and control groups can be randomly divided into two sets, equal in size to the original group of patients and controls. The marker and haplotype analysis is then repeated and the most significant p-value registered is determined. This randomization scheme can be repeated, for example, over 100 times to construct an empirical distribution of p-values. In a preferred embodiment, a p-value of <0.05 is indicative of a significant marker and/or haplotype association.
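The randomization scheme can be sketched as follows, under simplifying assumptions: each chromosome is represented as a list of 0/1 alleles (one entry per marker), only single-marker allelic tests are run rather than all marker combinations, and the per-marker p-value is a 1-df chi-square test without continuity correction. All names are hypothetical.

```python
import math
import random


def allelic_p_value(case_alleles, control_alleles):
    """1-df chi-square p-value for the 2x2 table of allele counts."""
    a, b = sum(case_alleles), len(case_alleles) - sum(case_alleles)
    c, d = sum(control_alleles), len(control_alleles) - sum(control_alleles)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic marker or empty group: no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square (1 df) tail probability


def min_p(group_a, group_b):
    """Most significant single-marker p-value between two groups."""
    n_markers = len(group_a[0])
    return min(
        allelic_p_value([x[m] for x in group_a], [x[m] for x in group_b])
        for m in range(n_markers)
    )


def empirical_p_value(cases, controls, n_perm=1000, seed=0):
    """Return (observed min p, empirical p adjusted for the markers tested)."""
    observed = min_p(cases, controls)
    rng = random.Random(seed)
    pooled = cases + controls
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random split into sets of the original sizes
        if min_p(pooled[:len(cases)], pooled[len(cases):]) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The empirical p-value is the fraction of randomized data sets whose best p-value is at least as extreme as the observed one, with one added to numerator and denominator so that it is never exactly zero.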

Haplotype Analysis

One general approach to haplotype analysis involves using likelihood-based inference applied to NEsted MOdels (Gretarsdottir S., et al., Nat. Genet. 35:131-38 (2003)). The method is implemented in the program NEMO, which allows for many polymorphic markers, SNPs and microsatellites. The method and software are specifically designed for case-control studies where the purpose is to identify haplotype groups that confer different risks. It is also a tool for studying LD structures. In NEMO, maximum likelihood estimates, likelihood ratios and p-values are calculated directly, with the aid of the EM algorithm, for the observed data treating it as a missing-data problem.

Even though likelihood ratio tests based on likelihoods computed directly for the observed data, which have captured the information loss due to uncertainty in phase and missing genotypes, can be relied on to give valid p-values, it would still be of interest to know how much information had been lost due to the information being incomplete. The information measure for haplotype analysis is described in Nicolae and Kong (Technical Report 537, Department of Statistics, University of Chicago; Biometrics, 60(2):368-75 (2004)) as a natural extension of information measures defined for linkage analysis, and is implemented in NEMO.

Association analysis

For single marker association to a disease, the Fisher exact test can be used to calculate two-sided p-values for each individual allele. Correcting for relatedness among patients can be done by extending a variance adjustment procedure previously described (Risch, N. & Teng, J. Genome Res., 8:1273-1288 (1998)) for sibships so that it can be applied to general familial relationships. The method of genomic controls (Devlin, B. & Roeder, K. Biometrics 55:997 (1999)) can also be used to adjust for the relatedness of the individuals and possible stratification.
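A two-sided Fisher exact test on a 2x2 table of allele counts (allele versus all other alleles, in cases versus controls) can be sketched with the standard library alone. The two-sided convention used here, summing the probabilities of all tables no more likely than the observed one, is one common choice and is an assumption; the relatedness and genomic-control adjustments are not included.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    For example, a/b are counts of the allele and of other alleles in cases,
    and c/d are the same counts in controls.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)
```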

For both single-marker and haplotype analyses, relative risk (RR) and the population attributable risk (PAR) can be calculated assuming a multiplicative model (haplotype relative risk model) (Terwilliger, J.D. & Ott, J., Hum. Hered. 42:337-46 (1992) and Falk, C.T. & Rubinstein, P., Ann. Hum. Genet. 51 (Pt 3):227-33 (1987)), i.e., that the risks of the two alleles/haplotypes a person carries multiply. For example, if RR is the risk of A relative to a, then the risk of a person homozygote AA will be RR times that of a heterozygote Aa and RR² times that of a homozygote aa. The multiplicative model has a nice property that simplifies analysis and computations: haplotypes are independent, i.e., in Hardy-Weinberg equilibrium, within the affected population as well as within the control population. As a consequence, haplotype counts of the affecteds and controls each have multinomial distributions, but with different haplotype frequencies under the alternative hypothesis. Specifically, for two haplotypes, h_i and h_j, risk(h_i)/risk(h_j) = (f_i/p_i)/(f_j/p_j), where f and p denote, respectively, frequencies in the affected population and in the control population. While there is some power loss if the true model is not multiplicative, the loss tends to be mild except for extreme cases. Most importantly, p-values are always valid since they are computed with respect to the null hypothesis.
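The two relationships stated for the multiplicative model can be written out directly. The sketch below is illustrative only; the function names are hypothetical, and the frequencies passed in would be estimates from the affected and control groups.

```python
def haplotype_relative_risk(f_i, p_i, f_j, p_j):
    """risk(h_i)/risk(h_j) = (f_i/p_i)/(f_j/p_j).

    f_*: haplotype frequency among affecteds; p_*: frequency among controls.
    """
    return (f_i / p_i) / (f_j / p_j)


def genotype_risks(rr):
    """Genotype risks relative to homozygote aa when A has allelic risk rr."""
    return {"aa": 1.0, "Aa": rr, "AA": rr * rr}
```

For instance, a haplotype at 30% in affecteds and 20% in controls, compared with the complementary haplotype at 70% and 80%, gives (0.3/0.2)/(0.7/0.8), or about 1.71.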

An association signal detected in one association study may be replicated in a second cohort, ideally from a different population (e.g., a different region of the same country, or a different country) of the same or different ethnicity. The advantage of replication studies is that the number of tests performed in the replication study is usually quite small, and hence a less stringent statistical measure needs to be applied. For example, for a genome-wide search for susceptibility variants for a particular disease or trait using 300,000 SNPs, a correction for the 300,000 tests performed (one for each SNP) can be performed. Since many SNPs on the arrays typically used are correlated (i.e., in LD), they are not independent. Thus, the correction is conservative. Nevertheless, applying this correction factor requires an observed P-value of less than 0.05/300,000 = 1.7 x 10⁻⁷ for the signal to be considered significant applying this conservative test on results from a single study cohort. Obviously, signals found in a genome-wide association study with P-values less than this conservative threshold (i.e., more significant) are a measure of a true genetic effect, and replication in additional cohorts is not necessary from a statistical point of view. Importantly, however, signals with P-values that are greater than this threshold may also be due to a true genetic effect. The sample size in the first study may not have been sufficiently large to provide an observed P-value that meets the conservative threshold for genome-wide significance, or the first study may not have reached genome-wide significance due to inherent fluctuations due to sampling.
Since the correction factor depends on the number of statistical tests performed, if one signal (one SNP) from an initial study is replicated in a second case-control cohort, the appropriate statistical test for significance is that for a single statistical test, i.e., P-value less than 0.05. Replication studies in one or even several additional case-control cohorts have the added advantage of providing assessment of the association signal in additional populations, thus simultaneously confirming the initial finding and providing an assessment of the overall significance of the genetic variant(s) being tested in human populations in general.
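The two significance thresholds discussed above follow from dividing the nominal level by the number of tests. A minimal sketch, with a hypothetical function name:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """P-value threshold after correcting a nominal level for n_tests tests."""
    return alpha / n_tests


genome_wide = bonferroni_threshold(300_000)  # 300,000 SNPs in the initial scan
replication = bonferroni_threshold(1)        # a single SNP taken to replication
```

Here `genome_wide` is about 1.7 x 10⁻⁷ and `replication` is 0.05, matching the figures in the text.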

The results from several case-control cohorts can also be combined to provide an overall assessment of the underlying effect. The methodology commonly used to combine results from multiple genetic association studies is the Mantel-Haenszel model (Mantel and Haenszel, J Natl Cancer Inst 22:719-48 (1959)). The model is designed to deal with the situation where association results from different populations, with each possibly having a different population frequency of the genetic variant, are combined. The model combines the results assuming that the effect of the variant on the risk of the disease, as measured by the OR or RR, is the same in all populations, while the frequency of the variant may differ between the populations. Combining the results from several populations has the added advantage that the overall power to detect a real underlying association signal is increased, due to the increased statistical power provided by the combined cohorts. Furthermore, any deficiencies in individual studies, for example due to unequal matching of cases and controls or population stratification, will tend to balance out when results from multiple cohorts are combined, again providing a better estimate of the true underlying genetic effect.
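The Mantel-Haenszel pooled odds ratio can be sketched as below. Each cohort contributes one 2x2 table (a, b, c, d), taken here to mean risk-allele and other-allele counts in cases (a, b) and in controls (c, d); this layout and the function name are assumptions made for illustration.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio over cohorts.

    tables: iterable of (a, b, c, d) counts, one tuple per cohort.
    OR_MH = sum(a*d/n) / sum(b*c/n), with n the total count in each table.
    """
    tables = list(tables)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den
```

When two cohorts share the same odds ratio but differ in allele frequency, for example (20, 10, 10, 10) and (40, 10, 20, 10), each with OR = 2, the pooled estimate is also 2.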

Risk assessment and Diagnostics

Within any given population, there is an absolute risk of developing a disease or trait, defined as the chance of a person developing the specific disease or trait over a specified time-period. For example, a woman's lifetime absolute risk of breast cancer is one in nine. That is to say, one woman in every nine will develop breast cancer at some point in their lives. Risk is typically measured by looking at very large numbers of people, rather than at a particular individual. Risk is often presented in terms of Absolute Risk (AR) and Relative Risk (RR). Relative Risk is used to compare risks associating with two variants or the risks of two different groups of people. For example, it can be used to compare a group of people with a certain genotype with another group having a different genotype. For a disease, a relative risk of 2 means that one group has twice the chance of developing a disease as the other group. The risk presented is usually the relative risk for a person, or a specific genotype of a person, compared to the population with matched gender and ethnicity. Risks of two individuals of the same gender and ethnicity could be compared in a simple manner. For example, if, compared to the population, the first individual has relative risk 1.5 and the second has relative risk 0.5, then the risk of the first individual compared to the second individual is 1.5/0.5 = 3.

Risk Calculations

The creation of a model to calculate the overall genetic risk involves two steps: i) conversion of odds-ratios for a single genetic variant into relative risk and ii) combination of risk from multiple variants in different genetic loci into a single relative risk value.

Deriving risk from odds-ratios

Most gene discovery studies for complex diseases that have been published to date in authoritative journals have employed a case-control design because of their retrospective setup. These studies sample and genotype a selected set of cases (people who have the specified disease condition) and control individuals. The interest is in genetic variants (alleles) whose frequencies in cases and controls differ significantly.

The results are typically reported in odds ratios, that is the ratio between the fraction (probability) with the risk variant (carriers) versus the non-risk variant (non-carriers) in the groups of affected versus the controls, i.e. expressed in terms of probabilities conditional on the affection status:

OR = (Pr(c|A)/Pr(nc|A)) / (Pr(c|C)/Pr(nc|C))
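The odds ratio above can be estimated from the four carrier counts of a case-control study, since the group sizes cancel within each ratio. A minimal sketch with hypothetical argument names:

```python
def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """OR = (Pr(c|A)/Pr(nc|A)) / (Pr(c|C)/Pr(nc|C)), estimated from counts.

    A denotes affected individuals (cases) and C denotes controls.
    """
    odds_in_cases = case_carriers / case_noncarriers
    odds_in_controls = control_carriers / control_noncarriers
    return odds_in_cases / odds_in_controls
```

With 60 carriers and 40 non-carriers among cases, and 40 carriers and 60 non-carriers among controls, the estimate is (60/40)/(40/60) = 2.25.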

Sometimes it is however the absolute risk for the disease that we are interested in, i.e. the fraction of those individuals carrying the risk variant who get the disease or in other words the probability of getting the disease. This number cannot be directly measured in case-control studies, in part, because the ratio of cases versus controls is typically not the same as that in the general population. However, under certain assumptions, we can estimate the risk from the odds ratio.

It is well known that under the rare disease assumption, the relative risk of a disease can be approximated by the odds ratio. This assumption may however not hold for many common diseases. Still, it turns out that the risk of one genotype variant relative to another can be estimated from the odds ratio expressed above. The calculation is particularly simple under the assumption of random population controls where the controls are random samples from the same population as the cases, including affected people rather than being strictly unaffected individuals. To increase sample size and power, many of the large genome-wide association and replication studies use controls that were neither age-matched with the cases, nor were they carefully scrutinized to ensure that they did not have the disease at the time of the study. Hence, while not exactly, they often approximate a random sample from the general population. It is noted that this assumption is rarely expected to be satisfied exactly, but the risk estimates are usually robust to moderate deviations from this assumption.

Calculations show that for the dominant and the recessive models, where we have a risk variant carrier, "c", and a non-carrier, "nc", the odds ratio of individuals is the same as the risk ratio between these variants:

OR = Pr(A|c)/Pr(A|nc) = r

And likewise for the multiplicative model, where the risk is the product of the risk associated with the two allele copies, the allelic odds ratio equals the risk factor:

OR = Pr(A|aa)/Pr(A|ab) = Pr(A|ab)/Pr(A|bb) = r

Here "a" denotes the risk allele and "b" the non-risk allele. The factor "r" is therefore the relative risk between the allele types.

For many of the studies published in the last few years, reporting common variants associated with complex diseases, the multiplicative model has been found to summarize the effect adequately and most often provide a fit to the data superior to alternative models such as the dominant and recessive models.

The risk relative to the average population risk

It is most convenient to represent the risk of a genetic variant relative to the average population since it makes it easier to communicate the lifetime risk for developing the disease compared with the baseline population risk. For example, in the multiplicative model we can calculate the relative population risk for variant "aa" as:

RR(aa) = Pr(A|aa)/Pr(A) = (Pr(A|aa)/Pr(A|bb))/(Pr(A)/Pr(A|bb)) = r²/(Pr(aa) r² + Pr(ab) r + Pr(bb)) = r²/(p² r² + 2pq r + q²) = r²/R

Here "p" and "q" are the allele frequencies of "a" and "b" respectively. Likewise, we get that RR(ab) = r/R and RR(bb) = 1/R. The allele frequency estimates may be obtained from the publications that report the odds-ratios and from the HapMap database. Note that in the case where we do not know the genotypes of an individual, the relative genetic risk for that test or marker is simply equal to one.

As an example, for lung cancer risk, allele T of marker rs6474412 has an allelic OR of 1.12 and a frequency (p) around 0.8 in white populations. The genotype relative risks compared to genotype CC are estimated based on the multiplicative model.

For TT it is 1.12 x 1.12 = 1.25; for CT it is simply the OR 1.12, and for CC it is 1.0 by definition.

The frequency of allele C is q = 1 - p = 1 - 0.8 = 0.2. Population frequency of each of the three possible genotypes at this marker is:

Pr(TT) = p² = 0.64, Pr(CT) = 2pq = 0.32, and Pr(CC) = q² = 0.04

The average population risk relative to genotype CC (which is defined to have a risk of one) is:

R = 0.64 x 1.25 + 0.32 x 1.12 + 0.04 x 1 = 1.20

Therefore, the risk relative to the general population (RR) for individuals who have one of the following genotypes at this marker is:

RR(TT) = 1.25/1.20 = 1.04, RR(CT) = 1.12/1.20 = 0.93, RR(CC) = 1/1.20 = 0.83. Of course, using non-carriers of the T allele of rs6474412 as reference, the risk will be considerably greater. The at-risk allele T is common in the population, which means that a large proportion of the population is at-risk. Therefore, the risk compared with the general population is smaller than the risk compared with non-carriers of the at-risk T allele.
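The worked example for rs6474412 can be reproduced with a short sketch of the multiplicative-model calculation. The function name and the genotype labels "aa", "ab", "bb" (risk-allele homozygote, heterozygote, non-carrier) are illustrative choices; the inputs r = 1.12 and p = 0.8 are the values quoted above.

```python
def population_relative_risks(r, p):
    """Genotype risks relative to the population average, multiplicative model.

    r: allelic odds ratio (risk factor) of risk allele "a"
    p: population frequency of risk allele "a"
    Returns (risks, R), where R is the average population risk relative to "bb".
    """
    q = 1.0 - p
    genotype_freq = {"aa": p * p, "ab": 2 * p * q, "bb": q * q}  # Hardy-Weinberg
    risk_vs_bb = {"aa": r * r, "ab": r, "bb": 1.0}
    R = sum(genotype_freq[g] * risk_vs_bb[g] for g in genotype_freq)
    return {g: risk_vs_bb[g] / R for g in risk_vs_bb}, R


risks, R = population_relative_risks(r=1.12, p=0.8)
# "aa" is TT, "ab" is CT and "bb" is CC for this marker.
```

Rounded to two decimals this gives R = 1.20 and relative risks of 1.04 (TT), 0.93 (CT) and 0.83 (CC). The frequency-weighted average of the three relative risks is one by construction.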

Combining the risk from multiple markers

When genotypes of many SNP variants are used to estimate the risk for an individual, a multiplicative model for risk can generally be assumed. This means that the combined genetic risk relative to the population is calculated as the product of the corresponding estimates for individual markers, e.g. for two markers g1 and g2:

RR(g1,g2) = RR(g1)RR(g2)

The underlying assumption is that the risk factors occur and behave independently, i.e. that the joint conditional probabilities can be represented as products:

Pr(A|g1,g2) = Pr(A|g1)Pr(A|g2)/Pr(A) and Pr(g1,g2) = Pr(g1)Pr(g2)

Obvious violations to this assumption are markers that are closely spaced on the genome, i.e. in linkage disequilibrium, such that the concurrence of two or more risk alleles is correlated. In such cases, we can use so called haplotype modeling where the odds-ratios are defined for all allele combinations of the correlated SNPs.

As in most situations where a statistical model is utilized, the model applied is not expected to be exactly true since it is not based on an underlying bio-physical model. However, the multiplicative model has so far been found to fit the data adequately, i.e. no significant deviations are detected for many common diseases for which many risk variants have been discovered.

As an example, consider an individual who has the following genotypes at 4 hypothetical markers associated with a particular disease along with the risk relative to the population at each marker:

Combined, the overall risk relative to the population for this individual is: 1.03 x 1.30 x 0.88 x 1.54 = 1.81.
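The combination step is a plain product of the per-marker relative risks. A minimal sketch, using the four values quoted above and assuming the markers are independent (not in linkage disequilibrium with each other); the function name is hypothetical.

```python
from math import prod


def combined_relative_risk(marker_risks):
    """Product of per-marker risks, each relative to the population."""
    return prod(marker_risks)


overall = combined_relative_risk([1.03, 1.30, 0.88, 1.54])
```

Rounded to two decimals, `overall` is 1.81. A marker with unknown genotype contributes a factor of one and so leaves the product unchanged.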

Adjusted life-time risk

The lifetime risk of an individual is derived by multiplying the overall genetic risk relative to the population with the average life-time risk of the disease in the general population of the same ethnicity and gender and in the region of the individual's geographical origin. As there are usually several epidemiologic studies to choose from when defining the general population risk, we will pick studies that are well-powered for the disease definition that has been used for the genetic variants. For example, if the overall genetic risk relative to the population is 1.8 for a disease for an individual, and if the average life-time risk of the disease in the demographic group of the individual is 20%, then the adjusted lifetime risk for the individual is 20% x 1.8 = 36%.

Note that since the average RR for a population is one, this multiplication model provides the same average adjusted life-time risk of the disease. Furthermore, since the actual life-time risk cannot exceed 100%, there must be an upper limit to the genetic RR.
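The adjustment is a single multiplication. In the sketch below the result is capped at 100%; the cap is an assumption added here to respect the upper limit noted above and is not a procedure specified in the text. The function name is hypothetical.

```python
def adjusted_lifetime_risk(genetic_rr, population_lifetime_risk):
    """Lifetime risk as a fraction between 0 and 1.

    genetic_rr: overall genetic risk relative to the population
    population_lifetime_risk: average lifetime risk in the matched population
    """
    # Capping at 1.0 is an illustrative assumption, not taken from the source.
    return min(1.0, genetic_rr * population_lifetime_risk)
```

For a genetic relative risk of 1.8 and a population lifetime risk of 20%, the function returns 0.36.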

Risk assessment for lung cancer

As described herein, certain polymorphic markers are found to be useful for risk assessment of lung cancer. Risk assessment can involve the use of such markers for determining a susceptibility to lung cancer. Tagging markers in linkage disequilibrium with at-risk variants (or protective variants) can be used as surrogates for these markers. Such surrogate markers can be located within a particular haplotype block or LD block. Such surrogate markers can also sometimes be located outside the physical boundaries of such a haplotype block or LD block, either in close vicinity of the LD block/haplotype block, but possibly also located in a more distant genomic location.

Long-distance LD can for example arise if particular genomic regions (e.g., genes) are in a functional relationship. For example, if two genes encode proteins that play a role in a shared metabolic pathway, then particular variants in one gene may have a direct impact on observed variants for the other gene. Let us consider the case where a variant in one gene leads to increased expression of the gene product. To counteract this effect and preserve overall flux of the particular pathway, this variant may have led to selection of one (or more) variants at a second gene that confers decreased expression levels of that gene. These two genes may be located in different genomic locations, possibly on different chromosomes, but variants within the genes are in apparent LD, not because of their shared physical location within a region of high LD, but rather due to evolutionary forces. Such LD is also contemplated and within scope of the present invention. The skilled person will appreciate that many other scenarios of functional gene-gene interaction are possible, and the particular example discussed here represents only one such possible scenario.

Markers with values of r² equal to 1 are perfect surrogates for the at-risk variants (anchor variants), i.e. genotypes for one marker perfectly predict genotypes for the other. Markers with smaller values of r² than 1 can also be surrogates for the at-risk variant, or alternatively represent variants with relative risk values as high as or possibly even higher than the at-risk variant. In certain preferred embodiments, markers with values of r² to the at-risk anchor variant are useful surrogate markers. The at-risk variant identified may not be the functional variant itself, but is in this instance in linkage disequilibrium with the true functional variant. The functional variant may be a SNP, but may also for example be a tandem repeat, such as a minisatellite or a microsatellite, a transposable element (e.g., an Alu element), or a structural alteration, such as a deletion, insertion or inversion (sometimes also called copy number variations, or CNVs). The present invention encompasses the assessment of such surrogate markers for the markers as disclosed herein. Such markers are annotated, mapped and listed in public databases, as well known to the skilled person, or can alternatively be readily identified by sequencing the region or a part of the region identified by the markers of the present invention in a group of individuals, and identifying polymorphisms in the resulting group of sequences. As a consequence, the person skilled in the art can readily and without undue experimentation identify and genotype surrogate markers in linkage disequilibrium with the markers and/or haplotypes as described herein.

The present invention can in certain embodiments be practiced by assessing a sample comprising genomic DNA from an individual for the presence of variants described herein to be associated with lung cancer. Such assessment typically includes steps that detect the presence or absence of at least one allele of at least one polymorphic marker, using methods well known to the skilled person and further described herein, and based on the outcome of such assessment, determine whether the individual from whom the sample is derived is at increased or decreased risk (i.e., increased or decreased susceptibility) of lung cancer. Detecting particular alleles of polymorphic markers can in certain embodiments be done by obtaining nucleic acid sequence data about a particular human individual that identifies at least one allele of at least one polymorphic marker. Different alleles of the at least one marker are associated with different susceptibility to the disease in humans. Obtaining nucleic acid sequence data can comprise nucleic acid sequence at a single nucleotide position, which is sufficient to identify alleles at SNPs. The nucleic acid sequence data can also comprise sequence at any other number of nucleotide positions, in particular for genetic markers that comprise multiple nucleotide positions, and can be anywhere from two to hundreds of thousands, possibly even millions, of nucleotides (in particular, in the case of copy number variations (CNVs)).

In certain embodiments, the invention can be practiced utilizing a dataset comprising information about the genotype status of at least one polymorphic marker. In other words, a dataset containing information about such genetic status, for example in the form of genotype counts at a certain polymorphic marker, or a plurality of markers (e.g., an indication of the presence or absence of certain at-risk alleles), or actual genotypes for one or more markers (for example in the form of sequence information), can be queried for the presence or absence of certain at-risk alleles at certain polymorphic markers shown by the present inventors to be associated with lung cancer. A positive result for a variant (e.g., marker allele) associated with lung cancer indicates that the individual from whom the dataset is derived is at increased susceptibility (increased risk) of lung cancer.

In certain embodiments of the invention, a polymorphic marker is correlated to lung cancer by referencing genotype data for the polymorphic marker to a database, such as a look-up table, that comprises correlation data between at least one allele of the polymorphism and lung cancer. In some embodiments, the table comprises a correlation for one polymorphism. In other embodiments, the table comprises a correlation for a plurality of polymorphisms. In both scenarios, by referencing to a look-up table that gives an indication of a correlation between a marker and lung cancer, a risk or susceptibility of lung cancer can be identified in the individual from whom the sample is derived. In some embodiments, the correlation is reported as a statistical measure. The statistical measure may be reported as a risk measure, such as a relative risk (RR), an absolute risk (AR) or an odds ratio (OR).
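A look-up table of this kind can be sketched as a mapping from (marker, allele) to an allelic odds ratio. The table layout and function name are hypothetical; the single entry uses the OR of 1.12 quoted elsewhere in this description for allele T of rs6474412, and the multiplicative model is assumed for combining the two alleles of a genotype.

```python
# Hypothetical look-up table: (marker, allele) -> allelic odds ratio.
RISK_TABLE = {("rs6474412", "T"): 1.12}


def genotype_risk(marker, genotype, table=RISK_TABLE):
    """Risk of a genotype relative to non-carriers, multiplicative model.

    genotype: the two alleles carried, e.g. "CT". Alleles absent from the
    table are treated as non-risk alleles with a factor of one.
    """
    risk = 1.0
    for allele in genotype:
        risk *= table.get((marker, allele), 1.0)
    return risk
```

For example, `genotype_risk("rs6474412", "CT")` returns 1.12, and a marker that is not in the table returns 1.0 for any genotype.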

Risk markers may be useful for risk assessment and diagnostic purposes, either alone or in combination. Results of disease risk assessment can also be combined with data for other genetic markers or risk factors for the disease, to establish overall risk. Thus, even in cases where the increase in risk by individual markers is relatively modest, e.g. on the order of 10-30%, the association may have significant implications when combined with other risk markers. Thus, relatively common variants may have significant contribution to the overall risk (Population Attributable Risk is high), or a combination of markers can be used to define groups of individuals who, based on the combined risk of the markers, are at significant combined risk of developing the disease. Thus, by assaying for multiple genetic markers associated with lung cancer risk, a significant risk may be captured using the combination of variants, even though each variant may, on its own, capture a relatively small proportion of the overall genetic risk.

As a consequence, in certain embodiments of the invention, a plurality of variants are used for overall risk assessment. These variants are in one embodiment selected from the variants as disclosed herein. Other embodiments include the use of the variants of the present invention in combination with other variants known to be useful for diagnosing a susceptibility to lung cancer. In such embodiments, the genotype status of a plurality of markers (or haplotypes) is determined in an individual, and the status of the individual is compared with the population frequency of the associated variants, or the frequency of the variants in clinically healthy subjects, such as age-matched and sex-matched subjects. Methods known in the art, such as multivariate analyses or joint risk analyses, such as those described herein, or other methods known to the skilled person, may subsequently be used to determine the overall risk conferred based on the genotype status at the multiple loci. Assessment of risk based on such analysis may subsequently be used in the methods, uses and kits of the invention, as described herein.

Study population

In a general sense, the methods and kits described herein can be utilized from samples containing nucleic acid material (DNA or RNA) or protein material from any source and from any individual, or from genotype or sequence data derived from such samples. In preferred embodiments, the individual is a human individual. The individual can be an adult, child, or fetus. The nucleic acid or protein source may be any sample comprising nucleic acid or protein material, including biological samples, or a sample comprising nucleic acid or protein material derived therefrom. The present invention also provides for assessing markers in individuals who are members of a target population. Such a target population is in one embodiment a population or group of individuals at particular risk, for example smokers.

The invention provides for embodiments that include individuals with age of onset or age at diagnosis of lung cancer in certain age subgroups, such as those over the age of 40, over age of 45, or over age of 50, 55, 60, 65, 70, 75, 80, or 85. Other embodiments of the invention pertain to other age groups, such as individuals aged less than 85, such as less than age 80, less than age 75, or less than age 70, 65, 60, 55, 50, 45, 40, 35, or age 30. Other embodiments relate to individuals with age at onset or age at diagnosis of lung cancer in any of the age ranges described in the above. It is also contemplated that a range of ages may be relevant in certain embodiments, such as age at onset at more than age 45 but less than age 60. Other age ranges are however also contemplated, including all age ranges bracketed by the age values listed in the above. The invention furthermore relates to individuals of either gender, males or females.

The Icelandic population is a Caucasian population of Northern European ancestry. A large number of studies reporting results of genetic linkage and association in the Icelandic population have been published in the last few years. Many of those studies show replication of variants, originally identified in the Icelandic population as being associated with a particular disease, in other populations (Sulem, P., et al. Nat Genet May 17 2009 (Epub ahead of print); Rafnar, T., et al. Nat Genet 41:221-7 (2009); Gretarsdottir, S., et al. Ann Neurol 64:402-9 (2008); Stacey, S.N., et al. Nat Genet 40:1313-18 (2008); Gudbjartsson, D.F., et al. Nat Genet 40:886-91 (2008); Styrkarsdottir, U., et al. N Engl J Med 358:2355-65 (2008); Thorgeirsson, T., et al. Nature 452:638-42 (2008); Gudmundsson, J., et al. Nat Genet. 40:281-3 (2008); Stacey, S.N., et al., Nat Genet. 39:865-69 (2007); Helgadottir, A., et al., Science 316:1491-93 (2007); Steinthorsdottir, V., et al., Nat Genet. 39:770-75 (2007); Gudmundsson, J., et al., Nat Genet. 39:631-37 (2007); Frayling, TM, Nature Reviews Genet 8:657-662 (2007); Amundadottir, L.T., et al., Nat Genet. 38:652-58 (2006); Grant, S.F., et al., Nat Genet. 38:320-23 (2006)). Thus, genetic findings in the Icelandic population have in general been replicated in other populations, including populations from Africa and Asia.

It is thus believed that the markers described herein to be associated with risk of lung cancer will show similar association in other human populations. Particular embodiments comprising individual human populations are thus also contemplated and within the scope of the invention. Such embodiments relate to human subjects that are from one or more human populations including, but not limited to, Caucasian populations, European populations, American populations, Eurasian populations, Asian populations, Central/South Asian populations, East Asian populations, Middle Eastern populations, African populations, Hispanic populations, and Oceanian populations. European populations include, but are not limited to, Swedish, Norwegian, Finnish, Russian, Danish, Icelandic, Irish, Kelt, English, Scottish, Dutch, Belgian, French, German, Spanish, Portuguese, Italian, Polish, Bulgarian, Slavic, Serbian, Bosnian, Czech, Greek and Turkish populations.

The racial contribution in individual subjects may also be determined by genetic analysis. Genetic analysis of ancestry may be carried out using unlinked microsatellite markers such as those set out in Smith et al. (Am J Hum Genet 74, 1001-13 (2004)).

In certain embodiments, the invention relates to markers and/or haplotypes identified in specific populations, as described in the above. The person skilled in the art will appreciate that measures of linkage disequilibrium (LD) may give different results when applied to different populations. This is due to the different population histories of different human populations as well as differential selective pressures that may have led to differences in LD in specific genomic regions. It is also well known to the person skilled in the art that certain markers, e.g. SNP markers, have different population frequencies in different populations, or are polymorphic in one population but not in another. The person skilled in the art will however apply the methods available and as described herein to practice the present invention in any given human population. This may include assessment of polymorphic markers in particular LD regions, so as to identify those markers that give the strongest association within the specific population. Thus, the at-risk variants of the present invention may reside on different haplotype backgrounds and in different frequencies in various human populations. However, utilizing methods known in the art and the markers of the present invention, the invention can be practiced in any given human population.

Utility of Genetic Testing

The person skilled in the art will appreciate and understand that the variants described herein in general do not, by themselves, provide an absolute identification of individuals who will develop lung cancer. The variants described herein do however indicate increased and/or decreased likelihood that individuals carrying the at-risk variants disclosed herein will develop lung cancer. This information is however extremely valuable in itself, as outlined in more detail in the following, as it can be used, for example, to initiate preventive measures at an early stage, perform regular physical exams to monitor the development, progress and/or appearance of symptoms, or to schedule exams at a regular interval to identify lung cancer in its early stages, so as to be able to apply treatment at an early stage, which is often critical for successful lung cancer therapy.

The knowledge about a genetic variant that confers a risk of developing lung cancer also offers the opportunity to apply a genetic test to distinguish between individuals with increased risk of developing lung cancer (i.e. carriers of at-risk variants) and those with decreased risk of developing lung cancer (i.e. carriers of protective variants, and/or non-carriers of at-risk variants). The core value of genetic testing is the possibility of being able to identify a predisposition to disease at an early stage of disease, or before appearance of disease, so as to allow the clinician to apply the most appropriate treatment and/or preventive measure.

Individuals with a family history of lung cancer and carriers of at-risk variants may also benefit from genetic testing, since the knowledge of the presence of genetic risk factors may provide incentive for implementing a healthier lifestyle, by avoiding or minimizing known environmental risk factors for lung cancer. For example, an individual who is a current smoker and is identified as a carrier of one or more at-risk variants of lung cancer may, due to his/her increased risk of developing the disease, choose to quit smoking.

Integration of Genetic Risk Models into Clinical Management of Lung Cancer

Management of lung cancer currently relies on a combination of primary prevention (most importantly abstinence from smoking), early diagnosis and appropriate treatment. There are clear clinical imperatives for integrating genetic testing into several aspects of these management areas. Identification of cancer susceptibility genes may also reveal key molecular pathways that may be manipulated (e.g., using small or large molecular weight drugs) and may lead to more effective treatments.

Primary prevention

Primary prevention options currently focus on avoiding exposure to tobacco smoke or other environmental toxins that have been associated with the development of lung cancer.

Early Diagnosis

Patients who are identified as being at high risk for lung cancer may be referred to have chest X-rays or sputum cytology examination. In addition, a spiral CT scan is a newly-developed procedure for lung cancer screening. Numerous lung cancer screening trials are currently taking place but presently, the U.S. Preventive Services Task Force (USPSTF) concludes that evidence is insufficient to recommend for or against screening asymptomatic persons for lung cancer with either low dose computerized tomography (LDCT), chest x-ray, sputum cytology, or a combination of these tests.

Many of the screening protocols being evaluated involve some form of radiation or invasive procedure such as bronchoscopy. These protocols carry certain risks and may prove hard to implement due to the considerable costs involved. In light of the fact that only about 15% of lifetime smokers develop lung cancer, it is clear that the great majority of individuals at risk would be needlessly subjected to repeated screening tests with the associated costs and negative side-effects. The identification of genetic biomarkers that affect the risk of developing lung cancer could be used to help identify individuals who should be offered extreme help in risk reduction programs such as smoking termination. In the case of failure to stop smoking, or in the case of ex-smokers, such genetic biomarkers could help in defining the subpopulation of individuals that would benefit the most from screening.

Less than 10% of lung cancer cases arise in individuals that have never smoked. Genetic biomarkers that predict the risk of lung cancer would be particularly useful in this group. The genetic component of this form of the disease is likely to be even stronger than in tobacco-related lung cancer. If genetic variants that affect the risk of non-smoking lung cancer were known, it might be possible to identify individuals at high risk for this disease and subject them to regular screening tests.

Diagnostic and screening methods

The polymorphic markers shown herein to be associated with risk of lung cancer are useful in diagnostic methods. Although methods of diagnosing lung cancer are known, genetic risk markers such as those described herein provide added value to such diagnostic methods. Thus, by obtaining sequence data about particular markers, e.g., nucleic acid sequence data identifying at least one allele of at least one polymorphic marker, a diagnostic measure of lung cancer risk is obtained that may be utilized in various diagnostic methods as described herein. The present invention pertains in some embodiments to methods of clinical applications of diagnosis, e.g., diagnosis performed by a medical professional. In other embodiments, the invention pertains to methods of diagnosis or methods of determination of a susceptibility performed by a layman. The layman can be the customer of a genotyping service. The layman may also be a genotype service provider, who performs genotype analysis on a DNA sample from an individual, in order to provide service related to genetic risk factors for particular traits or diseases, based on the genotype status of the individual (i.e., the customer). Recent technological advances in genotyping technologies, including high-throughput genotyping of SNP markers, such as Molecular Inversion Probe array technology (e.g., Affymetrix GeneChip), and BeadArray Technologies (e.g., Illumina GoldenGate and Infinium assays) have made it possible for individuals to have their own genome assessed for up to one million SNPs simultaneously, at relatively little cost. The resulting genotype information, which can be made available to the individual, can be compared to information about disease or trait risk associated with various SNPs, including information from public literature and scientific publications.
The diagnostic application of disease-associated alleles as described herein can thus, for example, be performed by the individual, through analysis of his/her genotype data, by a health professional based on results of a clinical test, or by a third party, including the genotype service provider. The third party may also be a service provider who interprets genotype information from the customer to provide service related to specific genetic risk factors, including the genetic markers described herein. In other words, the diagnosis or determination of a susceptibility of genetic risk can be made by health professionals, genetic counselors, third parties providing genotyping service, third parties providing risk assessment service or by the layman (e.g., the individual), based on information about the genotype status of an individual and knowledge about the risk conferred by particular genetic risk factors (e.g., particular SNPs). In the present context, the terms "diagnosing", "diagnose a susceptibility" and "determine a susceptibility" are meant to refer to any available diagnostic method, including those mentioned above.

In certain embodiments, a sample containing genomic DNA from an individual is collected. Such a sample can for example be a buccal swab, a saliva sample, a blood sample, or other suitable samples containing genomic DNA, as described further herein. The genomic DNA is then analyzed using any common technique available to the skilled person, such as high-throughput array technologies. Results from such genotyping are stored in a convenient data storage unit, such as a data carrier, including computer databases, data storage disks, or by other convenient data storage means. In certain embodiments, the computer database is an object database, a relational database or a post-relational database. The genotype data is subsequently analyzed for the presence of certain variants known to be susceptibility variants for a particular human condition, such as the genetic variants described herein. Genotype data can be retrieved from the data storage unit using any convenient data query method. Calculating risk conferred by a particular genotype for the individual can be based on comparing the genotype of the individual to previously determined risk (expressed as a relative risk (RR) or an odds ratio (OR), for example) for the genotype, for example for a heterozygous carrier of an at-risk variant for a particular disease or trait. The calculated risk for the individual can be the relative risk for a person, or for a specific genotype of a person, compared to the average population with matched gender and ethnicity. The average population risk can be expressed as a weighted average of the risks of different genotypes, using results from a reference population, and the appropriate calculations to calculate the risk of a genotype group relative to the population can then be performed.
Alternatively, the risk for an individual is based on a comparison of particular genotypes, for example heterozygous carriers of an at-risk allele of a marker compared with non-carriers of the at-risk allele. Using the population average may in certain embodiments be more convenient, since it provides a measure which is easy to interpret for the user, i.e. a measure that gives the risk for the individual, based on his/her genotype, compared with the average in the population. The calculated risk estimate can be made available to the customer via a website, preferably a secure website.
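The weighted-average calculation described above may be sketched as follows. This is a minimal illustration under stated assumptions that are not taken from this document: genotype frequencies in the reference population are assumed to follow Hardy-Weinberg proportions, and the risk is assumed to multiply per copy of the at-risk allele. The function name and parameters are hypothetical.

```python
def risk_relative_to_population(risk_allele_count: int,
                                allele_freq: float,
                                rr_per_allele: float) -> float:
    """Risk of a genotype (0, 1 or 2 copies of the at-risk allele)
    relative to the population average.

    Assumptions (illustrative only): Hardy-Weinberg genotype frequencies
    and a per-allele multiplicative risk.
    """
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("risk_allele_count must be 0, 1 or 2")
    f, q = allele_freq, 1.0 - allele_freq
    genotype_freqs = [q * q, 2.0 * f * q, f * f]          # non-carrier, het, hom
    genotype_risks = [1.0, rr_per_allele, rr_per_allele ** 2]  # vs. non-carrier
    # Population average risk: frequency-weighted average of genotype risks.
    average_risk = sum(p * r for p, r in zip(genotype_freqs, genotype_risks))
    return genotype_risks[risk_allele_count] / average_risk
```

By construction, the frequency-weighted average of the three returned values equals 1, which is what makes the measure easy to interpret against the population average.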

In certain embodiments, a service provider will include in the provided service all of the steps of isolating genomic DNA from a sample provided by the customer, performing genotyping of the isolated DNA, calculating genetic risk based on the genotype data, and reporting the risk to the customer. In some other embodiments, the service provider will include in the service the interpretation of genotype data for the individual, i.e., risk estimates for particular genetic variants based on the genotype data for the individual. In some other embodiments, the service provider may include service that includes genotyping service and interpretation of the genotype data, starting from a sample of isolated DNA from the individual (the customer).

Overall risk for multiple risk variants can be calculated using standard methodology. For example, assuming a multiplicative model, i.e. assuming that the risks of individual risk variants multiply to establish the overall effect, allows for a straightforward calculation of the overall risk for multiple markers.
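Under the multiplicative model, the overall risk is simply the product of the per-marker risks. A minimal sketch follows; the function name and the example values are hypothetical and serve only to illustrate the arithmetic.

```python
import math
from typing import Iterable


def combined_risk(per_marker_risks: Iterable[float]) -> float:
    """Overall risk under a multiplicative model: the product of the
    risks conferred by the genotype at each marker."""
    return math.prod(per_marker_risks)


# Hypothetical per-marker risks for one individual's genotypes.
overall = combined_risk([1.2, 1.1, 1.3])  # 1.2 * 1.1 * 1.3
```

With no markers supplied the product is 1, i.e. no change relative to the reference.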

In one embodiment, determination of a susceptibility to lung cancer can be accomplished using hybridization methods (see Current Protocols in Molecular Biology, Ausubel, F. et al., eds., John Wiley & Sons, including all supplements). The presence of a specific marker allele can be indicated by sequence-specific hybridization of a nucleic acid probe specific for the particular allele. The presence of more than one specific marker allele or a specific haplotype can be indicated by using several sequence-specific nucleic acid probes, each being specific for a particular allele. A sequence-specific probe can be directed to hybridize to genomic DNA, RNA, or cDNA. A "nucleic acid probe", as used herein, can be a DNA probe or an RNA probe that hybridizes to a complementary sequence. One of skill in the art would know how to design such a probe so that sequence specific hybridization will occur only if a particular allele is present in a genomic sequence from a test sample.

To determine a susceptibility to lung cancer, a hybridization sample can be formed by contacting the test sample, such as a genomic DNA sample, with at least one nucleic acid probe. A non-limiting example of a probe for detecting mRNA or genomic DNA is a labeled nucleic acid probe that is capable of hybridizing to mRNA or genomic DNA sequences described herein. The nucleic acid probe can be, for example, a full-length nucleic acid molecule, or a portion thereof, such as an oligonucleotide of at least 15, 30, 50, 100, 250 or 500 nucleotides in length that is sufficient to specifically hybridize under stringent conditions to appropriate mRNA or genomic DNA. In certain embodiments, the oligonucleotide is from about 15 to about 100 nucleotides in length. In certain other embodiments, the oligonucleotide is from about 20 to about 50 nucleotides in length. The nucleic acid probe can comprise all or a portion of the nucleotide sequence of LD block C07, LD block C08 or LD block C19, as described herein, optionally comprising at least one marker described herein, or the probe can be the complementary sequence of such a sequence. The nucleic acid probe can also comprise all or a portion of the nucleotide sequence of a gene selected from the group consisting of CHRNB3, CHRNA6, PDE1C, LSM5, AVL9 (KIAA0241), CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, or RAB4B, or the probe can be the complementary sequence of such a sequence. Other suitable probes for use in the diagnostic assays of the invention are described herein. Hybridization can be performed by methods well known to the person skilled in the art (see, e.g., Current Protocols in Molecular Biology, Ausubel, F. et al., eds., John Wiley & Sons, including all supplements). In one embodiment, hybridization refers to specific hybridization, i.e., hybridization with no mismatches (exact hybridization). In one embodiment, the hybridization conditions for specific hybridization are high stringency.

Specific hybridization, if present, is detected using standard methods. If specific hybridization occurs between the nucleic acid probe and the nucleic acid in the test sample, then the sample contains the allele that is complementary to the nucleotide that is present in the nucleic acid probe. The process can be repeated for any markers of the present invention, or markers that make up a haplotype of the present invention, or multiple probes can be used concurrently to detect more than one marker allele at a time.
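The logic of exact (no-mismatch) hybridization can be illustrated computationally: a probe pairs with a target strand only where the target contains the probe's reverse complement. The sketch below is a simplified in silico model, not a description of a laboratory protocol; the sequences and function names are hypothetical.

```python
# Watson-Crick complement mapping for DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hybridizes_exactly(probe: str, target_strand: str) -> bool:
    """True if the probe can pair with the target strand with no
    mismatches, i.e. the target contains the probe's reverse complement."""
    return reverse_complement(probe) in target_strand.upper()


# Hypothetical sequences: the probe is specific for the 'G' allele.
probe = "GATCCG"
carrier = "TTACGGATCCA"      # contains CGGATC
non_carrier = "TTACGAATCCA"  # G replaced by A at the polymorphic site
```

Here `hybridizes_exactly(probe, carrier)` is true and `hybridizes_exactly(probe, non_carrier)` is false, mirroring the allele-specific detection described above.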

In one embodiment of the invention, a test sample containing genomic DNA obtained from the subject is collected and the polymerase chain reaction (PCR) is used to amplify a fragment comprising one or more markers of the present invention. As described herein, identification of a particular marker allele or haplotype can be accomplished using a variety of methods (e.g., sequence analysis, analysis by restriction digestion, specific hybridization, single stranded conformation polymorphism assays (SSCP), electrophoretic analysis, etc.). In one embodiment, a method utilizing a detection oligonucleotide probe comprising a fluorescent moiety or group at its 3' terminus and a quencher at its 5' terminus, and an enhancer oligonucleotide, is employed, as described by Kutyavin et al. (Nucleic Acid Res. 34:e128 (2006)). In another embodiment, diagnosis is accomplished by expression analysis, for example by using quantitative PCR (kinetic thermal cycling). This technique can, for example, utilize commercially available technologies, such as TaqMan® (Applied Biosystems, Foster City, CA). The technique can assess the presence of an alteration in the expression or composition of a polypeptide or splicing variant(s). Further, the expression of the variant(s) can be quantified as physically or functionally different.

In another embodiment of the methods of the invention, analysis by restriction digestion can be used to detect a particular allele if the allele results in the creation or elimination of a restriction site relative to a reference sequence. Restriction fragment length polymorphism (RFLP) analysis can be conducted, e.g., as described in Current Protocols in Molecular Biology, supra. The digestion pattern of the relevant DNA fragment indicates the presence or absence of the particular allele in the sample.
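How an allele that eliminates a restriction site changes the digestion pattern can be illustrated with a short in silico digest. The sketch assumes a linear fragment and uses the EcoRI recognition site GAATTC (cut after the first G) purely as a familiar example; the amplicon sequences and function names are hypothetical.

```python
def site_positions(sequence: str, site: str) -> list[int]:
    """Start indices of every occurrence of the recognition site."""
    sequence, site = sequence.upper(), site.upper()
    return [i for i in range(len(sequence) - len(site) + 1)
            if sequence[i:i + len(site)] == site]


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after complete digestion of a linear fragment.

    cut_offset is the distance from the start of the site to the cut
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in site_positions(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical amplicons: the variant allele (A>G) destroys the site.
reference = "AAAAGAATTCTTTT"
variant = "AAAAGAGTTCTTTT"
ref_pattern = fragment_lengths(reference, "GAATTC", 1)
var_pattern = fragment_lengths(variant, "GAATTC", 1)
```

The reference amplicon is cut into two fragments while the variant amplicon remains intact, so the two alleles give distinguishable digestion patterns.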

Sequence analysis can also be used to detect specific alleles or haplotypes. Therefore, in one embodiment, determination of the presence or absence of particular marker alleles or haplotypes comprises sequence analysis of a test sample of DNA or RNA obtained from a subject or individual. PCR or other appropriate methods can be used to amplify a portion of a nucleic acid that contains a polymorphic marker or haplotype, and the presence of specific alleles can then be detected directly by sequencing the polymorphic site (or multiple polymorphic sites in a haplotype) of the genomic DNA in the sample. The direct sequence analysis can be of the nucleic acid of a biological sample obtained from the human individual for which a susceptibility is being determined. The biological sample can be any sample containing nucleic acid (e.g., genomic DNA) obtained from the human individual. In a specific aspect of the invention, obtaining nucleic acid sequence data comprises obtaining nucleic acid sequence information from a preexisting record, e.g., a preexisting medical record comprising genotype information of the human individual. For example, direct sequence analysis of the allele of the polymorphic marker can be accomplished by mining a pre-existing genotype dataset for the sequence of the allele of the polymorphic marker.

In another embodiment, arrays of oligonucleotide probes that are complementary to target nucleic acid sequence segments from a subject can be used to identify particular alleles at polymorphic sites. For example, an oligonucleotide array can be used. Oligonucleotide arrays typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a substrate in different known locations. These arrays can generally be produced using mechanical synthesis methods or light directed synthesis methods that incorporate a combination of photolithographic methods and solid phase oligonucleotide synthesis methods, or by other methods known to the person skilled in the art (see, e.g., Bier, F.F., et al. Adv Biochem Eng Biotechnol 109:433-53 (2008); Hoheisel, J.D., Nat Rev Genet 7:200-10 (2006); Fan, J.B., et al. Methods Enzymol 410:57-73 (2006); Ragoussis, J. & Elvidge, G., Expert Rev Mol Diagn 6:145-52 (2006); Mockler, T.C., et al. Genomics 85:1-15 (2005), and references cited therein, the entire teachings of each of which are incorporated by reference herein). Many additional descriptions of the preparation and use of oligonucleotide arrays for detection of polymorphisms can be found, for example, in US 6,858,394, US 6,429,027, US 5,445,934, US 5,700,637, US 5,744,305, US 5,945,334, US 6,054,270, US 6,300,063, US 6,733,977, US 7,364,858, EP 619 321, and EP 373 203, the entire teachings of which are incorporated by reference herein.

Other methods of nucleic acid analysis that are available to those skilled in the art can be used to detect a particular allele at a polymorphic site. Representative methods include, for example, direct manual sequencing (Church and Gilbert, Proc. Natl. Acad. Sci. USA, 81:1991-1995 (1988); Sanger, F., et al., Proc. Natl. Acad. Sci. USA, 74:5463-5467 (1977); Beavis, et al., U.S. Patent No. 5,288,644); automated fluorescent sequencing; single-stranded conformation polymorphism assays (SSCP); clamped denaturing gel electrophoresis (CDGE); denaturing gradient gel electrophoresis (DGGE) (Sheffield, V., et al., Proc. Natl. Acad. Sci. USA, 86:232-236 (1989)); mobility shift analysis (Orita, M., et al., Proc. Natl. Acad. Sci. USA, 86:2766-2770 (1989)); restriction enzyme analysis (Flavell, R., et al., Cell, 15:25-41 (1978); Geever, R., et al., Proc. Natl. Acad. Sci. USA, 78:5081-5085 (1981)); heteroduplex analysis; chemical mismatch cleavage (CMC) (Cotton, R., et al., Proc. Natl. Acad. Sci. USA, 85:4397-4401 (1985)); RNase protection assays (Myers, R., et al., Science, 230:1242-1246 (1985)); use of polypeptides that recognize nucleotide mismatches, such as E. coli mutS protein; and allele-specific PCR.

Indirect analyses

Alternatively, the nucleic acid sequence data may be obtained through indirect analysis of the nucleic acid sequence of the allele of the polymorphic marker. For example, the allele could be one which leads to the expression of a variant protein comprising an altered amino acid sequence, as compared to the non-variant (e.g., wild-type) protein, due to one or more amino acid substitutions, deletions, or insertions, or truncation (due to, e.g., splice variation). Other possible effects include alterations in relative amounts of alternative splice forms of mRNA, effects on RNA stability, effects on transport from the nucleus to cytoplasm, and effects on the efficiency and accuracy of translation.

A variety of methods are known in the art and can be used for detecting protein expression levels, including enzyme linked immunosorbent assays (ELISA), Western blots, immunoprecipitations and immunofluorescence. For example, a test sample from a subject can be assessed for the presence of an alteration in the expression and/or an alteration in polypeptide composition. Both quantitative and qualitative alterations can be present. An "alteration" in the polypeptide expression or composition, as used herein, refers to an alteration in expression or composition in a test sample, as compared to the expression or composition of the polypeptide in a control sample. A control sample is suitably a sample that corresponds to the test sample (e.g., is from the same type of cells), and is from a subject who is not affected by, and/or who does not have a susceptibility to, lung cancer. In one embodiment, the control sample is from a subject who does not possess an at-risk marker allele for lung cancer, as described herein. Various means of examining expression or composition of a polypeptide encoded by a nucleic acid are known to the person skilled in the art and can be used, including spectroscopy, colorimetry, electrophoresis, isoelectric focusing, and immunoassays (e.g., David et al., U.S. Pat. No. 4,376,110) such as immunoblotting (see, e.g., Current Protocols in Molecular Biology, particularly chapter 10, supra).

For example, in one embodiment, an antibody (e.g., an antibody with a detectable label) can be used. Antibodies can be polyclonal or monoclonal. An intact antibody, or a fragment thereof (e.g., Fv, Fab, Fab', F(ab')2) can be used. The term "labeled", with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a labeled secondary antibody (e.g., a fluorescently-labeled secondary antibody) and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently-labeled streptavidin.

A level or amount of the polypeptide in the test sample that is higher or lower than the level or amount of the polypeptide in the control sample, such that the difference is statistically significant, is indicative of an alteration in the expression of the polypeptide, and is diagnostic for a particular allele or haplotype associated with the difference in expression. Alternatively, the composition of the polypeptide in a test sample is compared with the composition of the polypeptide in a control sample. In another embodiment, both the level or amount and the composition of the polypeptide can be assessed in the test sample and in the control sample.

Further, risk assessment of lung cancer may be made by detecting at least one marker of the present invention in combination with an additional protein-based, RNA-based or DNA-based assay.

Kits

Kits useful in the methods of the invention comprise components useful in any of the methods described herein, including for example, primers for nucleic acid amplification, hybridization probes, restriction enzymes (e.g., for RFLP analysis), allele-specific oligonucleotides, antibodies that bind to an altered polypeptide encoded by a nucleic acid of the invention as described herein (e.g., a genomic segment comprising at least one polymorphic marker and/or haplotype of the present invention) or to a non-altered (native) polypeptide encoded by a nucleic acid of the invention as described herein, means for amplification of nucleic acids associated with lung cancer, means for analyzing the nucleic acid sequence of a nucleic acid associated with lung cancer, means for analyzing the amino acid sequence of a polypeptide encoded by a nucleic acid associated with lung cancer, etc. The kits can for example include necessary buffers, nucleic acid primers for amplifying nucleic acids of the invention (e.g., a nucleic acid segment comprising one or more of the polymorphic markers as described herein), and reagents for allele-specific detection of the fragments amplified using such primers and necessary enzymes (e.g., DNA polymerase). Additionally, kits can provide reagents for assays to be used in combination with the methods of the present invention, e.g., reagents for use with other diagnostic assays for lung cancer.

In one embodiment, the invention pertains to a kit for assaying a sample from a subject to detect a susceptibility to lung cancer in a subject, wherein the kit comprises reagents necessary for selectively detecting at least one allele of at least one polymorphism of the present invention in the genome of the individual. In a particular embodiment, the reagents comprise at least one contiguous oligonucleotide that hybridizes to a fragment of the genome of the individual comprising at least one polymorphism of the present invention. In another embodiment, the reagents comprise at least one pair of oligonucleotides that hybridize to opposite strands of a genomic segment obtained from a subject, wherein each oligonucleotide primer pair is designed to selectively amplify a fragment of the genome of the individual that includes at least one polymorphism associated with lung cancer risk. In one such embodiment, the polymorphism is selected from the group consisting of the polymorphisms rs6474412, rs215614 and rs4105144, and polymorphic markers in linkage disequilibrium therewith. In one embodiment, the polymorphism is selected from the group consisting of the markers listed in Table 1, Table 2 and/or Table 3 herein. In yet another embodiment the fragment is at least 20 base pairs in size. Such oligonucleotides or nucleic acids (e.g., oligonucleotide primers) can be designed using portions of the nucleic acid sequence flanking polymorphisms (e.g., SNPs or microsatellites) that are associated with risk of lung cancer. In another embodiment, the kit comprises one or more labeled nucleic acids capable of allele-specific detection of one or more specific polymorphic markers or haplotypes, and reagents for detection of the label. Suitable labels include, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, and an epitope label.

In particular embodiments, the polymorphic marker or haplotype to be detected by the reagents of the kit comprises one or more markers, two or more markers, three or more markers, four or more markers or five or more markers selected from the group consisting of the markers set forth in any one of Table 1, Table 2 and Table 3. In another embodiment, the marker or haplotype to be detected comprises at least one marker from the group of markers in strong linkage disequilibrium, as defined by values of r² greater than 0.2, to a marker selected from the group consisting of rs6474412, rs215614 and rs4105144. In another embodiment, the marker or haplotype to be detected is selected from the group consisting of rs6474412, rs215614 and rs4105144.
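As an illustration of the r² criterion used above, the following is a minimal sketch of how r² between two biallelic markers might be computed from haplotype and allele frequencies, using the standard definition D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The function names and the example frequencies are hypothetical and are not taken from the present description; only the 0.2 threshold comes from the text.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and allele B (marker 2)
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def in_ld(p_ab: float, p_a: float, p_b: float, threshold: float = 0.2) -> bool:
    """True if r^2 exceeds the threshold (0.2 in the embodiment described above)."""
    return ld_r_squared(p_ab, p_a, p_b) > threshold


if __name__ == "__main__":
    # Hypothetical frequencies chosen only for illustration.
    print(ld_r_squared(0.40, 0.50, 0.50))
    print(in_ld(0.40, 0.50, 0.50))
```

In practice the haplotype frequency would itself be estimated from genotype data; that estimation step is outside the scope of this sketch.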

In a preferred embodiment, the DNA template containing the SNP polymorphism is amplified by Polymerase Chain Reaction (PCR) prior to detection, and primers for such amplification are included in the reagent kit. In such an embodiment, the amplified DNA serves as the template for the detection probe and the enhancer probe.

In one embodiment, the DNA template is amplified by means of Whole Genome Amplification (WGA) methods, prior to assessment for the presence of specific polymorphic markers as described herein. Standard methods well known to the skilled person for performing WGA may be utilized, and are within the scope of the invention. In one such embodiment, reagents for performing WGA are included in the reagent kit.

In a further aspect of the present invention, a pharmaceutical pack (kit) is provided, the pack comprising a therapeutic agent and a set of instructions for administration of the therapeutic agent to humans diagnostically tested for one or more variants of the present invention, as disclosed herein. The therapeutic agent can be a small molecule drug, an antibody, a peptide, an antisense or RNAi molecule, or other therapeutic molecules. In one embodiment, an individual identified as a carrier of at least one variant of the present invention is instructed to take a prescribed dose of the therapeutic agent. In one such embodiment, an individual identified as a homozygous carrier of at least one variant of the present invention is instructed to take a prescribed dose of the therapeutic agent. In another embodiment, an individual identified as a non-carrier of at least one variant of the present invention is instructed to take a prescribed dose of the therapeutic agent.

In certain embodiments, the kit further comprises a set of instructions for using the reagents comprising the kit. In certain embodiments, the kit further comprises a collection of data comprising correlation data between the polymorphic markers assessed by the kit and susceptibility to lung cancer.

Therapeutic agents

The variants (markers and/or haplotypes) disclosed herein to confer increased risk of lung cancer can be useful for the identification of novel therapeutic targets for lung cancer. For example, genes containing, or in linkage disequilibrium with, one or more of these variants, or their products (e.g., CHRNB3, CHRNA6, PDE1C, LSM5, AVL9 (KIAA0241), CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, or RAB4B), as well as genes or their products that are directly or indirectly regulated by or interact with such variant genes or their products, can be targeted for the development of novel therapeutic agents for lung cancer. Therapeutic agents may comprise one or more of, for example, small non-protein and non-nucleic acid molecules, proteins, peptides, protein fragments, nucleic acids (DNA, RNA), PNA (peptide nucleic acids), or their derivatives or mimetics which can modulate the function and/or levels of the target genes or their gene products.

The nucleic acids and/or variants described herein, or nucleic acids comprising their complementary sequence, may be used as antisense constructs to control gene expression in cells, tissues or organs. The methodology associated with antisense techniques is well known to the skilled artisan, and is for example described and reviewed in Antisense Drug Technology: Principles, Strategies, and Applications, Crooke, ed., Marcel Dekker Inc., New York (2001). In general, antisense agents (antisense oligonucleotides) are comprised of single-stranded oligonucleotides (RNA or DNA) that are capable of binding to a complementary nucleotide segment. By binding the appropriate target sequence, an RNA-RNA, DNA-DNA or RNA-DNA duplex is formed. The antisense oligonucleotides are complementary to the sense or coding strand of a gene. It is also possible to form a triple helix, where the antisense oligonucleotide binds to duplex DNA.
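The complementarity relationship described above can be sketched in a few lines: an antisense DNA oligonucleotide against a region of the sense (coding) strand is the reverse complement of that region. The function names and the toy sequence below are hypothetical and serve only to illustrate the base-pairing logic; they are not sequences of the invention.

```python
# Watson-Crick complement table for uppercase DNA.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    if not set(seq) <= set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def antisense_oligo(sense: str, start: int, length: int) -> str:
    """Antisense oligo (written 5'->3') against sense[start:start+length]."""
    if length < 1 or start < 0 or start + length > len(sense):
        raise ValueError("target window lies outside the sense sequence")
    return reverse_complement(sense[start:start + length])


if __name__ == "__main__":
    toy_sense = "ATGGCCTTAGCA"  # made-up sequence, for illustration only
    print(antisense_oligo(toy_sense, 0, 5))  # GCCAT
```

An RNA antisense agent would follow the same logic with U in place of T; this sketch handles DNA only to keep the example short.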

Several classes of antisense oligonucleotide are known to those skilled in the art, including cleavers and blockers. The former bind to target RNA sites and activate intracellular nucleases (e.g., RNase H or RNase L) that cleave the target RNA. Blockers bind to target RNA and inhibit protein translation by steric hindrance of the ribosomes. Examples of blockers include nucleic acids, morpholino compounds, locked nucleic acids and methylphosphonates (Thompson, Drug Discovery Today, 7:912-917 (2002)). Antisense oligonucleotides are useful directly as therapeutic agents, and are also useful for determining and validating gene function, for example by gene knock-out or gene knock-down experiments. Antisense technology is further described in Lavery et al., Curr. Opin. Drug Discov. Devel. 6:561-569 (2003), Stephens et al., Curr. Opin. Mol. Ther. 5:118-122 (2003), Kurreck, Eur. J. Biochem. 270:1628-44 (2003), Dias et al., Mol. Cancer Ther. 1:347-55 (2002), Chen, Methods Mol. Med. 75:621-636 (2003), Wang et al., Curr. Cancer Drug Targets 1:177-96 (2001), and Bennett, Antisense Nucleic Acid Drug Dev. 12:215-24 (2002).

In certain embodiments, the antisense agent is an oligonucleotide that is capable of binding to a particular nucleotide segment. Antisense nucleotides can be from 5-500 nucleotides in length, including 5-200 nucleotides, 5-100 nucleotides, 10-50 nucleotides, and 10-30 nucleotides. In certain preferred embodiments, the antisense nucleotide is from 14-50 nucleotides in length, including 14-40 nucleotides and 14-30 nucleotides. All integer lengths from 5-500 are specifically contemplated for the present invention, as are all subranges of lengths. In certain such embodiments, the antisense nucleotide is capable of binding to a nucleotide segment of a gene selected from the group consisting of CHRNB3, CHRNA6, PDE1C, LSM5, AVL9 (KIAA0241), CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, and RAB4B.

The variants described herein can also be used for the selection and design of antisense reagents that are specific for particular variants. Using information about the variants described herein, antisense oligonucleotides or other antisense molecules that specifically target mRNA molecules that contain one or more variants of the invention can be designed. In this manner, expression of mRNA molecules that contain one or more variants of the present invention (i.e., certain marker alleles and/or haplotypes) can be inhibited or blocked. In one embodiment, the antisense molecules are designed to specifically bind a particular allelic form (i.e., one or several variants (alleles and/or haplotypes)) of the target nucleic acid, thereby inhibiting translation of a product originating from this specific allele or haplotype, but do not bind other or alternate variants at the specific polymorphic sites of the target nucleic acid molecule. As antisense molecules can be used to inactivate mRNA so as to inhibit gene expression, and thus protein expression, the molecules can be used for disease treatment. The methodology can involve cleavage by means of ribozymes containing nucleotide sequences complementary to one or more regions in the mRNA that attenuate the ability of the mRNA to be translated. Such mRNA regions include, for example, protein-coding regions, in particular protein-coding regions corresponding to catalytic activity, substrate and/or ligand binding sites, or other functional domains of a protein.

The phenomenon of RNA interference (RNAi) has been actively studied for the last decade, since its original discovery in C. elegans (Fire et al., Nature 391:806-11 (1998)), and in recent years its potential use in treatment of human disease has been actively pursued (reviewed in Kim & Rossi, Nature Rev. Genet. 8:173-184 (2007)). RNA interference (RNAi), also called gene silencing, is based on using double-stranded RNA molecules (dsRNA) to turn off specific genes. In the cell, cytoplasmic double-stranded RNA molecules (dsRNA) are processed by cellular complexes into small interfering RNA (siRNA). The siRNA guide the targeting of a protein-RNA complex to specific sites on a target mRNA, leading to cleavage of the mRNA (Thompson, Drug Discovery Today, 7:912-917 (2002)). The siRNA molecules are typically about 20, 21, 22 or 23 nucleotides in length. Thus, one aspect of the invention relates to isolated nucleic acid molecules, and the use of those molecules for RNA interference, i.e., as small interfering RNA molecules (siRNA). In one embodiment, the isolated nucleic acid molecules are 18-26 nucleotides in length, preferably 19-25 nucleotides in length, more preferably 20-24 nucleotides in length, and more preferably 21, 22 or 23 nucleotides in length.

Another pathway for RNAi-mediated gene silencing originates in endogenously encoded primary microRNA (pri-miRNA) transcripts, which are processed in the cell to generate precursor miRNA (pre-miRNA). These miRNA molecules are exported from the nucleus to the cytoplasm, where they undergo processing to generate mature miRNA molecules (miRNA), which direct translational inhibition by recognizing target sites in the 3' untranslated regions of mRNAs, and subsequent mRNA degradation by processing P-bodies (reviewed in Kim & Rossi, Nature Rev. Genet. 8:173-184 (2007)).

Clinical applications of RNAi include the incorporation of synthetic siRNA duplexes, which preferably are approximately 20-23 nucleotides in size, and preferably have 3' overlaps of 2 nucleotides. Knockdown of gene expression is established by sequence-specific design for the target mRNA. Several commercial sites for optimal design and synthesis of such molecules are known to those skilled in the art.

Other applications provide longer siRNA molecules (typically 25-30 nucleotides in length, preferably about 27 nucleotides), as well as small hairpin RNAs (shRNAs; typically about 29 nucleotides in length). The latter are naturally expressed, as described in Amarzguioui et al. (FEBS Lett. 579:5974-81 (2005)). Chemically synthetic siRNAs and shRNAs are substrates for in vivo processing, and in some cases provide more potent gene-silencing than shorter designs (Kim et al., Nature Biotechnol. 23:222-226 (2005); Siolas et al., Nature Biotechnol. 23:227-231 (2005)). In general siRNAs provide for transient silencing of gene expression, because their intracellular concentration is diluted by subsequent cell divisions. By contrast, expressed shRNAs mediate long-term, stable knockdown of target transcripts, for as long as transcription of the shRNA takes place (Marques et al., Nature Biotechnol. 23:559-565 (2006); Brummelkamp et al., Science 296:550-553 (2002)).

Since RNAi molecules, including siRNA, miRNA and shRNA, act in a sequence-dependent manner, the variants presented herein can be used to design RNAi reagents that recognize specific nucleic acid molecules comprising specific alleles and/or haplotypes (e.g., the alleles and/or haplotypes of the present invention), while not recognizing nucleic acid molecules comprising other alleles or haplotypes. These RNAi reagents can thus recognize and destroy the target nucleic acid molecules. As with antisense reagents, RNAi reagents can be useful as therapeutic agents (i.e., for turning off disease-associated genes or disease-associated gene variants), but may also be useful for characterizing and validating gene function (e.g., by gene knock-out or gene knock-down experiments).
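One way to picture allele-specific design is to enumerate every candidate target window of a given length that spans the polymorphic site, with the allele of interest substituted in. The sketch below does only that enumeration; the function name, the toy sequence and the default window length of 21 nucleotides (one of the typical siRNA lengths mentioned in this description) are illustrative assumptions, and no scoring of candidate efficacy or specificity is attempted.

```python
from typing import List


def allele_specific_windows(seq: str, snp_index: int, allele: str,
                            length: int = 21) -> List[str]:
    """List all windows of `length` that cover position `snp_index`,
    taken from `seq` with `allele` substituted at that position."""
    if not 0 <= snp_index < len(seq):
        raise ValueError("snp_index is outside the sequence")
    if len(allele) != 1:
        raise ValueError("allele must be a single nucleotide")
    variant = seq[:snp_index] + allele + seq[snp_index + 1:]
    # Earliest and latest window starts that still include the SNP position.
    first = max(0, snp_index - length + 1)
    last = min(snp_index, len(variant) - length)
    return [variant[s:s + length] for s in range(first, last + 1)]


if __name__ == "__main__":
    # Toy example with a short window so the output is easy to read.
    for window in allele_specific_windows("AAAAACAAAAA", 5, "G", length=3):
        print(window)
```

Each returned window contains the chosen allele, so a reagent complementary to it would mismatch the alternate allele at one position; whether a single mismatch gives adequate discrimination is an empirical question this sketch does not address.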

Delivery of RNAi may be performed by a range of methodologies known to those skilled in the art. Methods utilizing non-viral delivery include cholesterol, stable nucleic acid-lipid particle (SNALP), heavy-chain antibody fragment (Fab), aptamers and nanoparticles. Viral delivery methods include use of lentivirus, adenovirus and adeno-associated virus. The siRNA molecules are in some embodiments chemically modified to increase their stability. This can include modifications at the 2' position of the ribose, including 2'-O-methylpurines and 2'-fluoropyrimidines, which provide resistance to RNase activity. Other chemical modifications are possible and known to those skilled in the art.

The following references provide a further summary of RNAi, and possibilities for targeting specific genes using RNAi: Kim & Rossi, Nat. Rev. Genet. 8:173-184 (2007), Chen & Rajewsky, Nat. Rev. Genet. 8:93-103 (2007), Reynolds et al., Nat. Biotechnol. 22:326-330 (2004), Chi et al., Proc. Natl. Acad. Sci. USA 100:6343-6346 (2003), Vickers et al., J. Biol. Chem. 278:7108-7118 (2003), Agami, Curr. Opin. Chem. Biol. 6:829-834 (2002), Lavery et al., Curr. Opin. Drug Discov. Devel. 6:561-569 (2003), Shi, Trends Genet. 19:9-12 (2003), Shuey et al., Drug Discov. Today 7:1040-46 (2002), McManus et al., Nat. Rev. Genet. 3:737-747 (2002), Xia et al., Nat. Biotechnol. 20:1006-10 (2002), Plasterk et al., Curr. Opin. Genet. Dev. 10:562-7 (2000), Bosher et al., Nat. Cell Biol. 2:E31-6 (2000), and Hunter, Curr. Biol. 9:R440-442 (1999).

A genetic defect leading to increased predisposition or risk for development of a disease, such as lung cancer, or a defect causing the disease, may be corrected permanently by administering to a subject carrying the defect a nucleic acid fragment that incorporates a repair sequence that supplies the normal/wild-type nucleotide(s) at the site of the genetic defect. Such site-specific repair sequence may encompass an RNA/DNA oligonucleotide that operates to promote endogenous repair of a subject's genomic DNA. The administration of the repair sequence may be performed by an appropriate vehicle, such as a complex with polyethylenimine, encapsulated in anionic liposomes, a viral vector such as an adenovirus vector, or other pharmaceutical compositions suitable for promoting intracellular uptake of the administered nucleic acid. The genetic defect may then be overcome, since the chimeric oligonucleotides induce the incorporation of the normal sequence into the genome of the subject, leading to expression of the normal/wild-type gene product. The replacement is propagated, thus rendering a permanent repair and alleviation of the symptoms associated with the disease or condition.

Methods of assessing probability of response to therapeutic agents, methods of monitoring progress of treatment and methods of treatment

As is known in the art, individuals can have differential responses to a particular therapy (e.g., a therapeutic agent or therapeutic method). Pharmacogenomics addresses the issue of how genetic variations (e.g., the variants (markers and/or haplotypes) of the present invention) affect drug response, due to altered drug disposition and/or abnormal or altered action of the drug. Thus, the basis of the differential response may be genetically determined in part. Clinical outcomes due to genetic variations affecting drug response may result in toxicity of the drug in certain individuals (e.g., carriers or non-carriers of the genetic variants of the present invention), or therapeutic failure of the drug. Therefore, the variants of the present invention may determine the manner in which a therapeutic agent and/or method acts on the body, or the way in which the body metabolizes the therapeutic agent. Accordingly, in one embodiment, the presence of a particular allele at a polymorphic site or haplotype is indicative of a different response, e.g., a different response rate, to a particular treatment modality. This means that a patient diagnosed with lung cancer, and carrying a certain variant of the present invention (e.g., the at-risk alleles of the invention) would respond better to, or worse to, a specific therapeutic, drug and/or other therapy used to treat the disease. Therefore, the presence or absence of the variant could aid in deciding what treatment should be used for the patient. For example, for a newly diagnosed patient, the presence of a variant may be assessed (e.g., through testing DNA derived from a blood sample, as described herein). If the patient is positive for the variant, then the physician recommends one particular therapy, while if the patient is negative for the variant, then a different course of therapy may be recommended (which may include recommending that no immediate therapy, other than serial monitoring for progression of the disease, be performed). Thus, the patient's carrier status could be used to help determine whether a particular treatment modality should be administered. The value lies within the possibilities of being able to diagnose disease, or susceptibility to disease, at an early stage, to select the most appropriate treatment, and provide information to the clinician about prognosis/aggressiveness of the disease in order to be able to apply the most appropriate treatment.

The treatment for lung cancer can in certain embodiments be selected from surgical treatment (surgical removal of tumor), radiation therapy and chemotherapy. It is contemplated that the markers described herein to be associated with lung cancer can be used to predict the efficacy of any of these particular treatment modules. In certain embodiments, the markers of the invention, as described herein, may be used to determine an appropriate combination of therapy, which can include any one, two or three of these treatment modules. In certain embodiments, the radiation therapy is brachytherapy. The agent useful for chemotherapy may be any chemical agent commonly used, or in development, as a chemotherapy agent, including, but not limited to, cisplatin, carboplatin, gemcitabine (4-amino-1-[3,3-difluoro-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl]-1H-pyrimidin-2-one), paclitaxel ((2α,4α,5β,7β,10β,13α)-4,10-bis(acetyloxy)-13-{[(2R,3S)-3-(benzoylamino)-2-hydroxy-3-phenylpropanoyl]oxy}-1,7-dihydroxy-9-oxo-5,20-epoxytax-11-en-2-yl benzoate), docetaxel ((2R,3S)-N-carboxy-3-phenylisoserine, N-tert-butyl ester, 13-ester with 5,20-epoxy-1,2,4,7,10,13-hexahydroxytax-11-en-9-one 4-acetate 2-benzoate), etoposide (4'-demethyl-epipodophyllotoxin 9-[4,6-O-(R)-ethylidene-beta-D-glucopyranoside], 4'-(dihydrogen phosphate)), and vinorelbine (4-(acetyloxy)-6,7-didehydro-15-((2R,6R,8S)-4-ethyl-1,3,6,7,8,9-hexahydro-8-(methoxycarbonyl)-2,6-methano-2H-azecino(4,3-b)indol-8-yl)-3-hydroxy-16-methoxy-1-methyl, methyl ester, (2beta,3beta,4beta,5alpha,12R,19alpha)-aspidospermidine-3-carboxylic acid). Chemotherapy agents may be used alone or in combination. In one embodiment, the agent targets an epidermal growth factor receptor. In certain such embodiments, the agent is gefitinib (Iressa; N-(3-chloro-4-fluoro-phenyl)-7-methoxy-6-(3-morpholin-4-ylpropoxy)quinazolin-4-amine) or erlotinib (Tarceva; N-(3-ethynylphenyl)-6,7-bis(2-methoxyethoxy)quinazolin-4-amine). In certain other embodiments, the agent is an angiogenesis inhibitor. Such inhibitors can for example be antibodies that inhibit the vascular endothelial growth factor, such as Bevacizumab (Avastin).

The present invention also relates to methods of monitoring progress or effectiveness of a treatment for lung cancer. This can be done by assessing for the absence or presence of at least one variant associated with lung cancer, as disclosed herein, or by monitoring expression of genes that are associated with the variants (markers and haplotypes) of the present invention.

Another aspect of the invention relates to methods of selecting individuals suitable for a particular treatment modality, based on their likelihood of developing particular complications or side effects of the particular treatment. It is well known that most therapeutic agents can lead to certain unwanted complications or side effects. Likewise, certain therapeutic procedures or operations may have complications associated with them. Complications or side effects of these particular treatments or associated with specific therapeutic agents may have a genetic component. It is therefore contemplated that selection of the appropriate treatment or therapeutic agent can in part be performed by determining the genotype of an individual, and using the genotype status of the individual to decide on a suitable therapeutic procedure or on a suitable therapeutic agent to treat the particular disease. It is therefore contemplated that the polymorphic markers of the invention can be used in this manner. In particular, the polymorphic markers of the invention can be used to determine whether administration of a particular therapeutic agent or treatment modality or method is suitable for the individual, based on estimating the likelihood that the individual will benefit from the administration of the particular therapeutic agent or treatment modality or method. Indiscriminate use of such therapeutic agents or treatment modalities may lead to unnecessary adverse complications.

In view of the foregoing, the invention provides a method of assessing an individual for probability of response to a therapeutic agent for preventing, treating, and/or ameliorating symptoms associated with lung cancer. In one embodiment, the method comprises: determining the identity of at least one allele of at least one polymorphic marker in a sample, e.g., a nucleic acid sample, obtained from the individual, wherein the at least one polymorphic marker is selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, wherein the identity of the at least one allele of the at least one marker is indicative of a probability of a positive response to the therapeutic agent.

In a further aspect, the markers of the present invention can be used to increase power and effectiveness of clinical trials. Thus, individuals who are carriers of at least one at-risk variant of the present invention may be more likely to respond favorably to a particular treatment. In one embodiment, individuals who carry at-risk variants for the gene(s), or gene product(s), that a particular treatment (e.g., a small molecule drug) is targeting are more likely to be responders to the treatment. In certain embodiments, the treatment is targeting a gene selected from the group consisting of CHRNB3, CHRNA6, PDE1C, LSM5, AVL9 (KIAA0241), CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, and RAB4B. In another embodiment, individuals who carry at-risk variants associated with a gene whose expression and/or function is altered by the at-risk variant are more likely to be responders to a treatment modality targeting that gene, its expression or its gene product. This application can improve the safety of clinical trials, but can also enhance the chance that a clinical trial will demonstrate statistically significant efficacy, which may be limited to a certain sub-group of the population. Thus, one possible outcome of such a trial is that carriers of certain genetic variants, e.g., the markers and haplotypes of the present invention, are statistically significantly likely to show positive response to the therapeutic agent, i.e., experience alleviation of symptoms associated with lung cancer when taking the therapeutic agent or drug as prescribed.

Computer-implemented aspects

As understood by those of ordinary skill in the art, the methods and information described herein may be implemented, in all or in part, as computer executable instructions on known computer readable media. For example, the methods described herein may be implemented in hardware. Alternatively, the method may be implemented in software stored in, for example, one or more memories or other computer readable medium and implemented on one or more processors. As is known, the processors may be associated with one or more controllers, calculation units and/or other units of a computer system, or implemented in firmware as desired. If implemented in software, the routines may be stored in any computer readable memory such as in RAM, ROM, flash memory, a magnetic disk, a laser disk, or other storage medium, as is also known. Likewise, this software may be delivered to a computing device via any known delivery method including, for example, over a communication channel such as a telephone line, the Internet, a wireless connection, etc., or via a transportable medium, such as a computer readable disk, flash drive, etc.

More generally, and as understood by those of ordinary skill in the art, the various steps described above may be implemented as various blocks, operations, tools, modules and techniques which, in turn, may be implemented in hardware, firmware, software, or any combination of hardware, firmware, and/or software. When implemented in hardware, some or all of the blocks, operations, techniques, etc. may be implemented in, for example, a custom integrated circuit (IC), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic array (PLA), etc.

When implemented in software, the software may be stored in any known computer readable medium such as on a magnetic disk, an optical disk, or other storage medium, in a RAM or ROM or flash memory of a computer, processor, hard disk drive, optical disk drive, tape drive, etc. Likewise, the software may be delivered to a user or a computing system via any known delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism.

Figure 2 illustrates an example of a suitable computing system environment 100 on which a system for the steps of the claimed method and apparatus may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the method or apparatus of the claims. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.

The steps of the claimed method and system are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the methods or system of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The steps of the claimed method and system may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In both integrated and distributed computing environments, program modules may be located in both local and remote computer storage media including memory storage devices.

With reference to Figure 2, an exemplary system for implementing the steps of the claimed method and system includes a general purpose computing device in the form of a computer 110. Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120. The system bus 121 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.

Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.

The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation, Figure 2 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.

The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, Figure 2 illustrates a hard disk drive 141 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.

The drives and their associated computer storage media discussed above and illustrated in Figure 2 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110. In Figure 2, for example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.

The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in Figure 2. The logical connections depicted in Figure 2 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, Figure 2 illustrates remote application programs 185 as residing on memory device 181. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

Although the foregoing text sets forth a detailed description of numerous different embodiments of the invention, it should be understood that the scope of the invention is defined by the words of the claims set forth at the end of this patent. The detailed description is to be construed as exemplary only and does not describe every possible embodiment of the invention because describing every possible embodiment would be impractical, if not impossible. Numerous alternative embodiments could be implemented, using either current technology or technology developed after the filing date of this patent, which would still fall within the scope of the claims defining the invention.

While the risk evaluation system and method, and other elements, have been described as preferably being implemented in software, they may be implemented in hardware, firmware, etc., and may be implemented by any other processor. Thus, the elements described herein may be implemented in a standard multi-purpose CPU or on specifically designed hardware or firmware such as an application-specific integrated circuit (ASIC) or other hard-wired device as desired, including, but not limited to, the computer 110 of Figure 2. When implemented in software, the software routine may be stored in any computer readable memory such as on a magnetic disk, a laser disk, or other storage medium, in a RAM or ROM of a computer or processor, in any database, etc. Likewise, this software may be delivered to a user or a diagnostic system via any known or desired delivery method including, for example, on a computer readable disk or other transportable computer storage mechanism or over a communication channel such as a telephone line, the internet, wireless communication, etc. (which are viewed as being the same as or interchangeable with providing such software via a transportable storage medium).

Thus, many modifications and variations may be made in the techniques and structures described and illustrated herein without departing from the spirit and scope of the present invention. Thus, it should be understood that the methods and apparatus described herein are illustrative only and are not limiting upon the scope of the invention.

Accordingly, the invention relates to computer-implemented applications using the polymorphic markers and haplotypes described herein, and genotype and/or disease-association data derived therefrom. Such applications can be useful for storing, manipulating or otherwise analyzing genotype data that is useful in the methods of the invention. One example pertains to storing genotype information derived from an individual on readable media, so as to be able to provide the genotype information to a third party (e.g., the individual, a guardian of the individual, a health care provider or genetic analysis service provider), or for deriving information from the genotype data, e.g., by comparing the genotype data to information about genetic risk factors contributing to increased susceptibility to the disease, and reporting results based on such comparison.

In certain embodiments, computer-readable media suitably comprise capabilities of storing (i) identifier information for at least one polymorphic marker or a haplotype, as described herein; (ii) an indicator of the identity (e.g., presence or absence) of at least one allele of said at least one marker, or a haplotype, in individuals with the disease; and (iii) an indicator of the risk associated with the marker allele or haplotype.
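By way of illustration only, the three stored items (i) to (iii) could be held in a record such as the following minimal sketch. The class name, field names and the helper function are hypothetical and are not taken from the description; the marker identifiers and risk values used in any call are placeholders, not data from this application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MarkerRiskRecord:
    """Hypothetical record holding items (i) to (iii) for one marker allele."""
    marker_id: str        # (i) identifier of the polymorphic marker or haplotype
    allele: str           # allele (or haplotype) the record refers to
    allele_present: bool  # (ii) indicator of presence or absence of the allele
    risk_measure: str     # (iii) kind of risk indicator, e.g. "OR" or "RR"
    risk_value: float     # (iii) value of the risk indicator


def at_risk_records(records: List[MarkerRiskRecord]) -> List[MarkerRiskRecord]:
    """Return records whose allele is present and whose risk value exceeds 1."""
    return [r for r in records if r.allele_present and r.risk_value > 1.0]
```

A frozen dataclass is used here so that a stored record cannot be altered after it is created; any other serializable structure would serve equally well.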

The markers and haplotypes described herein to be associated with increased susceptibility (increased risk) of lung cancer are in certain embodiments useful for interpretation and/or analysis of genotype data. Thus in certain embodiments, determination of the presence of an at-risk allele for lung cancer, as shown herein, or determination of the presence of an allele at a polymorphic marker in LD with any such risk allele, indicates that the individual from whom the genotype data originates is at increased risk of lung cancer. In one such embodiment, genotype data is generated for at least one polymorphic marker shown herein to be associated with lung cancer, or a marker in linkage disequilibrium therewith. The genotype data is subsequently made available to a third party, such as the individual from whom the data originates, his/her guardian or representative, a physician or health care worker, genetic counsellor, or insurance agent, for example via a user interface accessible over the internet, together with an interpretation of the genotype data, e.g., in the form of a risk measure (such as an absolute risk (AR), risk ratio (RR) or odds ratio (OR)) for the disease. In another embodiment, at-risk markers identified in a genotype dataset derived from an individual are assessed and results from the assessment of the risk conferred by the presence of such at-risk variants in the dataset are made available to the third party, for example via a secure web interface, or by other communication means. The results of such risk assessment can be reported in numeric form (e.g., by risk values, such as absolute risk, relative risk, and/or an odds ratio, or by a percentage increase in risk compared with a reference), by graphical means, or by other means suitable to illustrate the risk to the individual from whom the genotype data is derived.
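The numeric risk measures mentioned above can be illustrated with the standard textbook definitions for a 2x2 table of carriers and non-carriers. The following is a minimal sketch under that assumption; the function names and the counts used in any call are hypothetical and do not represent results of the invention.

```python
def odds_ratio(carrier_cases: int, carrier_controls: int,
               noncarrier_cases: int, noncarrier_controls: int) -> float:
    """Odds ratio (OR): odds of disease in carriers over odds in non-carriers."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


def risk_ratio(carrier_cases: int, carrier_total: int,
               noncarrier_cases: int, noncarrier_total: int) -> float:
    """Risk ratio (RR): disease risk in carriers over risk in non-carriers."""
    return (carrier_cases / carrier_total) / (noncarrier_cases / noncarrier_total)


def percent_increase_in_risk(rr: float) -> float:
    """Percentage increase in risk relative to a reference with RR = 1."""
    return (rr - 1.0) * 100.0
```

For example, with hypothetical counts of 20 affected among 100 carriers and 10 affected among 100 non-carriers, the risk ratio is 2.0, which corresponds to a 100% increase in risk compared with the reference.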

Nucleic acids and polypeptides

The nucleic acids and polypeptides described herein can be used in methods and kits of the present invention. An "isolated" nucleic acid molecule, as used herein, is one that is separated from nucleic acids that normally flank the gene or nucleotide sequence (as in genomic sequences) and/or has been completely or partially purified from other transcribed sequences (e.g., as in an RNA library). For example, an isolated nucleic acid of the invention can be substantially isolated with respect to the complex cellular milieu in which it naturally occurs, or culture medium when produced by recombinant techniques, or chemical precursors or other chemicals when chemically synthesized. In some instances, the isolated material will form part of a composition (for example, a crude extract containing other substances), buffer system or reagent mix. In other circumstances, the material can be purified to essential homogeneity, for example as determined by polyacrylamide gel electrophoresis (PAGE) or column chromatography (e.g., HPLC). An isolated nucleic acid molecule of the invention can comprise at least about 50%, at least about 80% or at least about 90% (on a molar basis) of all macromolecular species present. With regard to genomic DNA, the term "isolated" also can refer to nucleic acid molecules that are separated from the chromosome with which the genomic DNA is naturally associated. For example, the isolated nucleic acid molecule can contain less than about 250 kb, 200 kb, 150 kb, 100 kb, 75 kb, 50 kb, 25 kb, 10 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of the nucleotides that flank the nucleic acid molecule in the genomic DNA of the cell from which the nucleic acid molecule is derived.

The nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered isolated. Thus, recombinant DNA contained in a vector is included in the definition of "isolated" as used herein. Also, isolated nucleic acid molecules include recombinant DNA molecules in heterologous host cells or heterologous organisms, as well as partially or substantially purified DNA molecules in solution. "Isolated" nucleic acid molecules also encompass in vivo and in vitro RNA transcripts of the DNA molecules of the present invention. An isolated nucleic acid molecule or nucleotide sequence can include a nucleic acid molecule or nucleotide sequence that is synthesized chemically or by recombinant means. Such isolated nucleotide sequences are useful, for example, in the manufacture of the encoded polypeptide, as probes for isolating homologous sequences (e.g., from other mammalian species), for gene mapping (e.g., by in situ hybridization with chromosomes), or for detecting expression of the gene in tissue (e.g., human tissue), such as by Northern blot analysis or other hybridization techniques.

The invention also pertains to nucleic acid molecules that hybridize under high stringency hybridization conditions, such as for selective hybridization, to a nucleotide sequence described herein (e.g., nucleic acid molecules that specifically hybridize to a nucleotide sequence containing a polymorphic site associated with a marker or haplotype described herein). Such nucleic acid molecules can be detected and/or isolated by allele- or sequence-specific hybridization (e.g., under high stringency conditions). Stringency conditions and methods for nucleic acid hybridizations are well known to the skilled person (see, e.g., Current Protocols in Molecular Biology, Ausubel, F. et al., John Wiley & Sons, (1998), and Kraus, M. and Aaronson, S., Methods Enzymol., 200:546-556 (1991), the entire teachings of which are incorporated by reference herein).

The percent identity of two nucleotide or amino acid sequences can be determined by aligning the sequences for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first sequence). The nucleotides or amino acids at corresponding positions are then compared, and the percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions/total # of positions × 100). In certain embodiments, the length of a sequence aligned for comparison purposes is at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or at least 95%, of the length of the reference sequence. The actual comparison of the two sequences can be accomplished by well-known methods, for example, using a mathematical algorithm. A non-limiting example of such a mathematical algorithm is described in Karlin, S. and Altschul, S., Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993). Such an algorithm is incorporated into the NBLAST and XBLAST programs (version 2.0), as described in Altschul, S. et al., Nucleic Acids Res., 25:3389-3402 (1997). When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., NBLAST) can be used. See the website on the world wide web at ncbi.nlm.nih.gov. In one embodiment, parameters for sequence comparison can be set at score=100, wordlength=12, or can be varied (e.g., W=5 or W=20). Another example of an algorithm is BLAT (Kent, W.J. Genome Res. 12:656-64 (2002)).

Other examples include the algorithm of Myers and Miller, CABIOS (1989), ADVANCE and ADAM as described in Torellis, A. and Robotti, C., Comput. Appl. Biosci. 10:3-5 (1994); and FASTA described in Pearson, W. and Lipman, D., Proc. Natl. Acad. Sci. USA, 85:2444-48 (1988).

In another embodiment, the percent identity between two amino acid sequences can be determined using the GAP program in the GCG software package (Accelrys, Cambridge, UK).
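The percent identity formula given above (% identity = # of identical positions/total # of positions × 100) can be sketched as follows for two sequences that have already been aligned. This is a minimal illustration, not the BLAST, FASTA or GAP algorithm; the function name and the convention that every alignment column, including gap columns, counts toward the total are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: every alignment column counts as a position, and a column
    containing a gap character is never counted as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    return identical / len(aligned_a) * 100.0
```

For the hypothetical alignment of "ACGT-A" against "ACCTGA", four of the six columns are identical, giving roughly 66.7% identity.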

The present invention also provides isolated nucleic acid molecules that contain a fragment or portion that hybridizes under highly stringent conditions to a nucleic acid that comprises, or consists of, the nucleotide sequence of LD block C07, LD block C08 or LD block C19, or a nucleotide sequence comprising, or consisting of, the complement of the nucleotide sequence of LD block C07, LD block C08 or LD block C19, wherein the nucleotide sequence optionally comprises at least one polymorphic marker described herein.

The invention also provides isolated nucleic acid molecules that contain a fragment or portion that hybridizes under highly stringent conditions to a nucleic acid that comprises, or consists of, the nucleotide sequence of a gene selected from the group consisting of CHRNB3, CHRNA6, PDE1C, LSM5, AVL9 (KIAA0241), CYP2A6, CYP2A7, CYP2B7P1, CYP2A13, CYP2B6, or RAB4B, or a nucleotide sequence comprising, or consisting of, the complement of the nucleotide sequence of the gene, wherein the nucleotide sequence optionally comprises at least one polymorphic marker described herein.

The nucleic acid fragments are at least about 15, at least about 18, 20, 23 or 25 nucleotides, and can be 30, 40, 50, 100, 200, 500, 1000, 10,000 or more nucleotides in length. In a specific embodiment, the nucleic acid fragments are 15-400 nucleotides in length.

The invention further provides isolated nucleic acid molecules that contain a fragment or portion that hybridizes under highly stringent conditions to a nucleic acid comprising a nucleotide sequence of any of SEQ ID NOs: 1-737, each of which sequences comprise one of the polymorphic markers associated with lung cancer, as described herein. Such nucleic acid molecules, e.g., oligonucleotide probes, can be used in the manufacture of a diagnostic reagent for diagnosing and/or assessing susceptibility to lung cancer.

The nucleic acid fragments of the invention are used as probes or primers in assays such as those described herein. "Probes" or "primers" are oligonucleotides that hybridize in a base-specific manner to a complementary strand of a nucleic acid molecule. In addition to DNA and RNA, such probes and primers include polypeptide nucleic acids (PNA), as described in Nielsen, P. et al., Science 254:1497-1500 (1991). A probe or primer comprises a region of nucleotide sequence that hybridizes to at least about 15, typically about 20-25, and in certain embodiments about 40, 50 or 75, consecutive nucleotides of a nucleic acid molecule. In one embodiment, the probe or primer comprises at least one allele of at least one polymorphic marker or at least one haplotype described herein, or the complement thereof. In particular embodiments, a probe or primer can comprise 100 or fewer nucleotides; for example, in certain embodiments from 6 to 50 nucleotides, or, for example, from 12 to 30 nucleotides. In other embodiments, the probe or primer is at least 70% identical, at least 80% identical, at least 85% identical, at least 90% identical, or at least 95% identical, to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence. In another embodiment, the probe or primer is capable of selectively hybridizing to the contiguous nucleotide sequence or to the complement of the contiguous nucleotide sequence. Often, the probe or primer further comprises a label, e.g., a radioisotope, a fluorescent label, an enzyme label, an enzyme co-factor label, a magnetic label, a spin label, or an epitope label.
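Base-specific hybridization of a probe to a complementary strand can be illustrated computationally by taking the reverse complement of the probe and looking for it in a target sequence. The following is a simplified sketch for uppercase DNA only; it models perfect Watson-Crick matching and ignores stringency conditions, mismatches and labels. The function names and sequences are hypothetical.

```python
# Watson-Crick pairing for DNA bases (uppercase only in this sketch).
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def hybridizes_perfectly(probe: str, target_strand: str) -> bool:
    """True if the probe is fully complementary to some stretch of the target.

    Both sequences are written 5' to 3'; the probe pairs with the target
    where the target contains the probe's reverse complement.
    """
    return reverse_complement(probe) in target_strand
```

For instance, a hypothetical probe "GCAT" pairs with a target strand containing "ATGC".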

The nucleic acid molecules of the invention, such as those described above, can be identified and isolated using standard molecular biology techniques well known to the skilled person. The amplified DNA can be labeled (e.g., radiolabeled, fluorescently labeled) and used as a probe for screening a cDNA library derived from human cells. The cDNA can be derived from mRNA and contained in a suitable vector. Corresponding clones can be isolated, DNA obtained following in vivo excision, and the cloned insert can be sequenced in either or both orientations by art-recognized methods to identify the correct reading frame encoding a polypeptide of the appropriate molecular weight. Using these or similar methods, the polypeptide and the DNA encoding the polypeptide can be isolated, sequenced and further characterized.

Antibodies

The invention also provides antibodies which bind to an epitope comprising either a variant amino acid sequence (e.g., comprising an amino acid substitution) encoded by a variant allele or the reference amino acid sequence encoded by the corresponding non-variant or wild-type allele. For example, if a variant allele encodes an amino acid sequence comprising the epitope CYSTWFEH, wherein the T is an amino acid substitution from the native or wild-type A, the antibody of the invention specifically binds to either the epitope CYSTWFEH or CYSAWFEH. The term "antibody" as used herein refers to immunoglobulin molecules and immunologically active portions of immunoglobulin molecules, i.e., molecules that contain antigen-binding sites that specifically bind an antigen. A molecule that specifically binds to an epitope is a molecule that binds to that epitope, but does not substantially bind other epitopes in a sample, e.g., a biological sample, which naturally contains the epitope. Examples of immunologically active portions of immunoglobulin molecules include F(ab) and F(ab')2 fragments which can be generated by treating the antibody with an enzyme such as pepsin. The antibody can be polyclonal or monoclonal. The term "monoclonal antibody" or "monoclonal antibody composition", as used herein, refers to a population of antibody molecules that contain only one species of an antigen binding site capable of immunoreacting with a particular epitope of a polypeptide of the invention. A monoclonal antibody composition thus typically displays a single binding affinity for a particular polypeptide of the invention with which it immunoreacts.

Polyclonal antibodies can be prepared as described above by immunizing a suitable subject with a desired immunogen, e.g., polypeptide of the invention or a fragment thereof. The antibody titer in the immunized subject can be monitored over time by standard techniques, such as with an enzyme linked immunosorbent assay (ELISA) using immobilized polypeptide. If desired, the antibody molecules directed against the polypeptide can be isolated from the mammal (e.g., from the blood) and further purified by well-known techniques, such as protein A chromatography to obtain the IgG fraction. At an appropriate time after immunization, e.g., when the antibody titers are highest, antibody-producing cells can be obtained from the subject and used to prepare monoclonal antibodies by standard techniques, such as the hybridoma technique originally described by Kohler and Milstein, Nature 256:495-497 (1975), the human B cell hybridoma technique (Kozbor et al., Immunol. Today 4:72 (1983)), the EBV-hybridoma technique (Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., 1985, pp. 77-96) or trioma techniques. The technology for producing hybridomas is well known (see generally Current Protocols in Immunology (1994) Coligan et al., (eds.) John Wiley & Sons, Inc., New York, NY). Briefly, an immortal cell line (typically a myeloma) is fused to lymphocytes (typically splenocytes) from a mammal immunized with an immunogen as described above, and the culture supernatants of the resulting hybridoma cells are screened to identify a hybridoma producing a monoclonal antibody that binds a polypeptide of the invention.

Any of the many well known protocols used for fusing lymphocytes and immortalized cell lines can be applied for the purpose of generating a monoclonal antibody to a polypeptide of the invention (see, e.g., Current Protocols in Immunology, supra; Galfre et al., Nature 266:550-52 (1977); R.H. Kenneth, in Monoclonal Antibodies: A New Dimension In Biological Analyses, Plenum Publishing Corp., New York, New York (1980); and Lerner, Yale J. Biol. Med. 54:387-402 (1981)). Moreover, the ordinarily skilled worker will appreciate that there are many variations of such methods that also would be useful.

Alternative to preparing monoclonal antibody-secreting hybridomas, a monoclonal antibody to a polypeptide of the invention can be identified and isolated by screening a recombinant combinatorial immunoglobulin library (e.g., an antibody phage display library) with the polypeptide to thereby isolate immunoglobulin library members that bind the polypeptide. Kits for generating and screening phage display libraries are commercially available (e.g., the Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the Stratagene SurfZAP™ Phage Display Kit, Catalog No. 240612). Additionally, examples of methods and reagents particularly amenable for use in generating and screening antibody display library can be found in, for example, U.S. Patent No. 5,223,409; PCT Publication No. WO 92/18619; PCT Publication No. WO 91/17271; PCT Publication No. WO 92/20791; PCT Publication No. WO 92/15679; PCT Publication No. WO 93/01288; PCT Publication No. WO 92/01047; PCT Publication No. WO 92/09690; PCT Publication No. WO 90/02809; Fuchs et al., Bio/Technology 9:1370-1372 (1991); Hay et al., Hum. Antibod. Hybridomas 3:81-85 (1992); Huse et al., Science 246:1275-1281 (1989); and Griffiths et al., EMBO J. 12:725-734 (1993).

Additionally, recombinant antibodies, such as chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, which can be made using standard recombinant DNA techniques, are within the scope of the invention. Such chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art.

In general, antibodies of the invention (e.g., a monoclonal antibody) can be used to isolate a polypeptide of the invention by standard techniques, such as affinity chromatography or immunoprecipitation. A polypeptide-specific antibody can facilitate the purification of natural polypeptide from cells and of recombinantly produced polypeptide expressed in host cells. Moreover, an antibody specific for a polypeptide of the invention can be used to detect the polypeptide (e.g., in a cellular lysate, cell supernatant, or tissue sample) in order to evaluate the abundance and pattern of expression of the polypeptide. Antibodies can be used diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen. The antibody can be coupled to a detectable substance to facilitate its detection. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, beta-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include 125I, 131I, 35S or 3H.

Antibodies may also be useful in pharmacogenomic analysis. In such embodiments, antibodies against variant proteins encoded by nucleic acids according to the invention, such as variant proteins that are encoded by nucleic acids that contain at least one polymorphic marker of the invention, can be used to identify individuals that require modified treatment modalities.

Antibodies can furthermore be useful for assessing expression of variant proteins in disease states, such as in active stages of lung cancer, or in an individual with a predisposition to a disease related to the function of the protein, in particular lung cancer. Antibodies specific for a variant protein of the present invention that is encoded by a nucleic acid that comprises at least one polymorphic marker as described herein can be used to screen for the presence of the variant protein, for example to screen for a predisposition to lung cancer as indicated by the presence of the variant protein.

Antibodies can be used in other methods. Thus, antibodies are useful as diagnostic tools for evaluating proteins, such as variant proteins of the invention, in conjunction with analysis by electrophoretic mobility, isoelectric point, tryptic or other protease digest, or for use in other physical assays known to those skilled in the art. Antibodies may also be used in tissue typing. In one such embodiment, a specific variant protein has been correlated with expression in a specific tissue type, and antibodies specific for the variant protein can then be used to identify the specific tissue type.

Subcellular localization of proteins, including variant proteins, can also be determined using antibodies, and can be applied to assess aberrant subcellular localization of the protein in cells in various tissues. Such use can be applied in genetic testing, but also in monitoring a particular treatment modality. In the case where treatment is aimed at correcting the expression level or presence of the variant protein or aberrant tissue distribution or developmental expression of the variant protein, antibodies specific for the variant protein or fragments thereof can be used to monitor therapeutic efficacy.

The present invention further relates to kits for using antibodies in the methods described herein. This includes, but is not limited to, kits for detecting the presence of a variant protein in a test sample. One preferred embodiment comprises antibodies such as a labelled or labelable antibody and a compound or agent for detecting variant proteins in a biological sample, means for determining the amount or the presence and/or absence of variant protein in the sample, and means for comparing the amount of variant protein in the sample with a standard, as well as instructions for use of the kit.

The present invention will now be exemplified by the following non-limiting examples.

EXAMPLE 1

To search for common variants associating with smoking behavior we performed meta-analyses of GWA studies, mainly using samples of European ancestry from the ENGAGE consortium (www.euengage.org), focusing on two smoking phenotypes: CPD and smoking initiation. The smoking initiation analysis was performed with a total of 30,431 ever-smokers and 16,050 never-smokers, using data from 12 GWA studies, Corogene, deCODE, EGPUT, ERF, NFBC, KORA, NTR-NESDA, Rotterdam, SUSOD, TwinUK and WTCCC-CAD. For CPD we combined data from the same studies and the NL-BLC with a total of 31,266 subjects. Information on the meta-analysis studies for CPD and smoking initiation is provided in Table 4, and in the Methods. After genomic control correction of each component study, we combined association data for ~2,500,000 imputed and genotyped autosomal SNPs with a fixed-effects additive meta-analysis using the inverse-variance method for CPD and smoking initiation. QQ-plots for CPD, excluding markers in the 15q25 region, displayed only modest inflation of the χ²-test statistic (λGC = 1.02). In addition to the 15q25 locus previously described, SNPs at two loci, 19q13 and 7p14, reached genome-wide significance (GWS) for CPD (P < 5 × 10⁻⁸) in the meta-analysis data. The QQ-plot for smoking initiation displayed weak inflation of the χ²-test statistic (λGC = 1.03) and no locus reached GWS.
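The combination step described above (genomic control correction of each study followed by fixed-effects inverse-variance pooling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, and the inputs are assumed to be per-study effect estimates and standard errors for a single SNP.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control_lambda(z_scores):
    """Estimate the genomic control inflation factor from a study's z-scores.

    The factor is floored at 1.0 so that a study is never made more significant.
    """
    chi2_values = [z * z for z in z_scores]
    return max(1.0, median(chi2_values) / CHI2_1DF_MEDIAN)

def gc_correct_se(se, lam):
    """Inflate a standard error so the chi-square statistic shrinks by lam."""
    return se * math.sqrt(lam)

def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis of one SNP across studies.

    Returns the pooled effect, its standard error, the z-score and a
    two-sided P value based on the normal approximation.
    """
    weights = [1.0 / (se * se) for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

In this sketch each study's standard errors would first be passed through `gc_correct_se` with that study's own inflation factor, and the corrected values would then be pooled with `inverse_variance_meta`.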

We selected 15 regions for smoking initiation totaling 277 SNPs (Table 5) and 14 regions totaling 443 SNPs for CPD (Table 6), for in silico replication in samples from the Tobacco and Genetics (TAG) and the Oxford-GlaxoSmithKline (Ox-GSK) consortia. Sample descriptions and their study results are described elsewhere (see Ox-GSK and TAG Consortium papers). In the case of smoking initiation, the regions were identified by sequential selection among SNPs with P<0.001 that had been ordered by increasing p-value, skipping over markers in already selected regions, as determined by visual inspection of the LD displayed in the UCSC Genome Browser. For CPD we selected 14 regions in the same manner, and included a region on chromosome 8p11 based on the large number of SNPs exhibiting suggestive associations with CPD, strong candidacy of region genes (encoding nicotinic acetylcholine receptor subunits α6 and β3 (CHRNA6 and CHRNB3)), and prior suggestive evidence for association between SNPs within this region and ND 3,4.
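The sequential selection described above can be sketched in code. Note the substitution: the study delimited regions by visual inspection of LD in the UCSC Genome Browser, whereas this sketch assumes a fixed distance window around each lead SNP as a stand-in. The function name, the dictionary keys and the 500 kb window are all hypothetical choices made for illustration.

```python
def select_regions(snps, p_threshold=1e-3, window_bp=500_000, max_regions=15):
    """Greedily pick lead SNPs in order of increasing P value.

    `snps` is a list of dicts with keys "id", "chrom", "pos" and "p".
    A SNP is skipped when it lies within `window_bp` of an already selected
    lead SNP on the same chromosome (a stand-in for LD-based regions).
    """
    candidates = sorted(
        (s for s in snps if s["p"] < p_threshold), key=lambda s: s["p"]
    )
    leads = []
    for snp in candidates:
        if len(leads) >= max_regions:
            break
        in_selected_region = any(
            snp["chrom"] == lead["chrom"]
            and abs(snp["pos"] - lead["pos"]) <= window_bp
            for lead in leads
        )
        if not in_selected_region:
            leads.append(snp)
    return leads
```

A fixed window is easy to reproduce but can merge or split true LD blocks, which is one reason manual inspection may give different regions.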

In addition to the 15q25 locus, three novel loci, 7p14, 8p11 and 19q13, were GWS for CPD after combining the results from the ENGAGE meta-analysis set with those of TAG and Ox-GSK (Table 7, Figure 1, and Table 8). No GWS associations for the selected smoking initiation regions were observed in the combined analysis of the meta-analysis and the in silico data (Table 9), although many markers gave association signals close to GWS. For further confirmation of the CPD association signals at the 7p14, 8p11 and 19q13 loci, selected markers from these regions were genotyped in additional samples (n=9,040) from Iceland, Australia, Denmark, Germany and Spain (Table 7). Markers at the 8p11 and 19q13 loci had effects in the same directions, but not the marker on 7p14 (Table 7). After combining these data with the ENGAGE results and the in silico replication, the 8p11 and 19q13 loci remained GWS but not the 7p14 locus (Table 7).

Rs6474412 on chromosome 8p11 is located about 2.1 kb from the 5' end of the β3 nicotinic acetylcholine receptor subunit gene (CHRNB3), and belongs to a group of highly correlated SNPs that includes two SNPs in exons of CHRNB3: a synonymous SNP (rs4593), and the only known non-synonymous SNP in CHRNB3 (rs4952) 3. Although the CHRNB3 gene is implicated by the location of the associating SNPs, these markers could be tagging variation elsewhere within the LD block that also contains the α6 nicotinic acetylcholine receptor subunit gene (CHRNA6) (Figure 1).

Nine different nicotinic cholinergic receptor subunits (α2-α7, β2-β4) are expressed in the human brain, and they combine with each other in diverse patterns to form various types of functional pentameric receptors. The different receptor subtypes are distinguished by subunit composition and sensitivity to nicotine 12. Involvement of CHRNA6 and CHRNB3 receptor subunits in nicotine-induced dopamine release is indicated in rodent studies 13. Neither CHRNA6 nor CHRNB3 is expressed in lung tissue 14.

The CPD associated markers on chromosome 19q13 are located in a region harboring CYP2A6, which encodes CYP2A6, an enzyme that plays a major role in the oxidation of nicotine in human liver microsomes, as well as several other genes belonging to the CYP gene family (Figure 1). A number of sequence variants in or near CYP2A6 that reduce CYP2A6's enzymatic activity have been identified 15. For some of these variants, effects on smoking behavior have been suggested 15. In the present study, the most significant association in the region was observed with rs4105144. This SNP is in LD with CYP2A6*2 (rs1801272) (r²=0.13 and D'=1.0 in the HapMap CEU samples) and the CYP2A6*2 reduced function allele is only found on the background of rs4105144-C, which associates with reduced smoking quantity. Although the effect of rs4105144 (0.41±0.06) is smaller than that of rs1801272 (0.68±0.18) (Table 7), its association is more significant (lower P value) because of its higher minor allele frequency. This suggests that rs4105144-C may be tagging many reduced function variants. The second most significant association in the region was with rs7937, in the untranslated 3' end of the RAB4B gene, which is in LD with rs4105144 (r²=0.32, D'=0.82 in the HapMap CEU samples). The third most significant association in the region was with rs7260329, which is almost independent of rs4105144 (r²=0.0064, D'=0.091 in the HapMap CEU samples). Rs7260329 is an intronic SNP in CYP2B6, whose product converts nicotine to cotinine with about 10% of the catalytic activity of the CYP2A6 enzyme, and also metabolizes several drugs of abuse, and bupropion, an atypical antidepressant also used as a smoking cessation aid 15. The CYP2B6 levels in the human brain are higher than those of CYP2A6 and are altered in smokers and alcoholics 15,16.
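The combination of a low r² with D'=1.0 reported above is what one expects when a rare allele occurs only on the background of a more common allele. The following sketch computes both LD measures from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and chosen only for illustration; they are not the HapMap CEU values.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first marker
    p_b:  frequency of allele B at the second marker
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical case: a rare allele (frequency 0.03) that is found only on
# haplotypes carrying a more common allele (frequency 0.20).
d_prime, r2 = ld_stats(p_ab=0.03, p_a=0.03, p_b=0.20)
```

With these assumed frequencies D' is at its maximum while r² stays low, because r² also depends on how well the two allele frequencies match.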

We next assessed the SNPs from the novel regions associating with CPD for association with Nicotine Dependence (ND), defined as a score of four or higher on the Fagerstrom Test for Nicotine Dependence (FTND), or endorsement of at least three of the seven Diagnostic and Statistical Manual of Mental Disorders 4th edition (DSM-IV) criteria 7 (see also Methods). Allele frequencies for 1,979 Icelandic (deCODE) and 835 Dutch (NTR-NESDA) ND cases were compared to 36,202 Icelandic and 611 Dutch population controls. SNPs on chromosome 8p11 and chromosome 7p14 associated nominally with ND, but none of the SNPs on chromosome 19q13 (Table 10).

We directly genotyped selected markers from the 7p14, 8p11 and 19q13 regions for association with LC (2,019 cases and 40,509 controls) in samples of European ancestry. The LC data were also combined with summary-level data from the publicly available GWA dataset on lung cancer (2,518 cases and 1,921 controls) from the International Agency for Research on Cancer (IARC) (Table 11). Nominally significant associations with LC were observed for rs6474412-T on 8p11 (OR = 1.12, 95% CI: 1.05-1.20, P=0.00060), rs215614-G on 7p14 (OR = 1.07, 95% CI: 1.02-1.13, P=0.011), and rs7260329-G and rs4105144-C on 19q13 (OR = 1.06, 95% CI: 1.00-1.12, P=0.041 and OR = 1.09, 95% CI: 1.00-1.18, P=0.040, respectively) (Table 11). As for the effect on CPD (Table 7), the effects of these variants on LC are substantially weaker than that of the 15q25 variants (OR = 1.31, P = 1.5 × 10^…) 7 (Table 11), warranting further analysis in additional sample sets. The potential effect of rs7260329-G and rs4105144-C on LC is interesting in light of the fact that the CYP2A6 gene product activates procarcinogenic nitrosamines 15.
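The relationship between an odds ratio, its 95% confidence interval and the P value can be checked approximately with a normal approximation on the log-odds scale. The sketch below is an illustration under that assumption and is not the method used to produce Table 11; the function name is hypothetical. Because the reported OR and CI are rounded to two decimals, the recovered P value only approximates the reported one.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate the z-score and two-sided P value from an OR and its 95% CI.

    The standard error of log(OR) is recovered from the width of the
    confidence interval on the log scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# rs6474412-T on 8p11, using the rounded values quoted in the text.
z, p = p_from_or_ci(1.12, 1.05, 1.20)
```

For this marker the approximation gives a P value of the same order of magnitude as the reported P=0.00060.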

Methods

Written informed consent was obtained from all subjects in the populations from 11 countries (Australia, Austria, Denmark, Estonia, Finland, Germany, Iceland, the Netherlands, New Zealand, Spain, and United Kingdom). Inclusion in the study required the availability of genotypes from either GWA studies or follow-up genotyping of selected SNPs in additional subjects. All subjects are of European descent. The sample sizes are listed in Table 4 for each of the samples used in the study. For the ENGAGE meta-analysis of CPD, data for 31,266 smokers were utilized, and for the meta-analysis of SI the numbers of cases and controls were 30,431 and 16,050, respectively. A brief description of each sample follows (the NLBLC sample that participated in the CPD meta-analysis is described with the lung cancer and peripheral arterial disease samples).

Description of individual ENGAGE CPD and smoking initiation samples

COROGENE: Corogene controls are selected as population controls for CAD cases from the National Finrisk 1997, 2002, and 2007 surveys (more information at http://www.ktl.fi/portal/english/research people programs/health_promotion_and_chronic_disease_prevention/units/chronic_disease_epidemiology_unit/the_national_finrisk_study). The number of individuals that smoked regularly was 554, and 610 had never smoked. The number of cigarettes smoked per day was available for 353 individuals with a mean of 16.3 (9.6). The mean age in the Corogene controls was 57.5 (11.0) and 56% of the individuals were males.

EGPUT: The Estonian cohort is from the population-based biobank of the Estonian Genome Project of University of Tartu (EGPUT). The project is conducted according to the Estonian Gene Research Act and all participants have signed the broad informed consent (www.geenivaramu.ee, ref. 23). Cohort size is 37,000, from 18 years of age and up, which closely reflects the age distribution in the Estonian population: 33% male, 67% female, 83% Estonians, 14% Russians, 3% other. Subjects are recruited by general practitioners (GPs) and hospital physicians, and were randomly selected from individuals visiting GP offices or hospitals 24. A Computer Assisted Personal Interview (CAPI) is filled in during 1-2 hours at the doctor's office, including personal data (place of birth, place(s) of living, nationality etc.), genealogical data (family history, four generations), educational and occupational history, and lifestyle data (physical activity, dietary habits, smoking, alcohol consumption, women's health, quality of life); anthropometric and physiological measurements are also taken. For the current study GWAS was performed on 1,019 subjects selected randomly from all over the country. The smoking quantity was determined from the following questions: "If you have ever smoked how old were you when you started to smoke regularly?"; "How often and how much have you smoked in last 12 months?"; "How many years have you smoked?"; "If you have changed your smoking habits then how?"; "How long have you smoked so?" and "How many hours per day do you spend in a smoking area?". Smoking quantity was available for 506 individuals (325 current smokers and 181 former smokers). The cohort mean age was 42.7 (SD 14.9) years and included 327 (64.6%) males and 179 (35.4%) females.

deCODE: The Icelandic cigarette smoking data were described in detail previously 7, and additional subjects were characterized in the same way. Altogether we included data for 15,310 smokers and 6,077 never smokers who have been genotyped on a chip containing the Illumina 317K set of SNPs in one of several GWA studies conducted by deCODE Genetics. Of these, 10,995 smokers were included in our previous study of CPD 7. These studies were approved by the Data Protection Commission of Iceland and the National Bioethics Committee of Iceland. Personal identifiers associated with phenotypic information and blood samples were encrypted using a third-party encryption system as previously described 25. In addition, 4,859 Icelandic subjects were genotyped for replication using a single-track assay (Nanogen - Centaurus).

ERF: This is a family-based cohort study that is embedded in the Genetic Research in Isolated Populations (GRIP) program in the South West of the Netherlands 26. The aim of this program was to identify genetic risk factors in the development of complex disorders. For the ERF study, 22 families that had at least five children baptized in the community church between 1850 and 1900 were identified with the help of genealogical records. All living descendants of these couples and their spouses were invited to take part in the study (N~4,700). Data collection started in June 2002 and was finished in February 2005. A total of 2,923 subjects successfully completed the questionnaire. Females constituted 55% of this sample and the average age was 50 years.

Genmets/FTC: The Finnish Twin Cohort includes nationwide samples of twins followed up longitudinally, and forms a part of the GenomEUtwin project, in which female monozygotic pairs were genotyped. DNA samples from one member of each monozygotic twin pair were used for genotyping 27. The Finnish twins were unselected with respect to disease status, and had participated in several waves of data collection in which smoking behaviors have been asked about as a part of larger surveys of health, health habits and other health-related factors. Details of the data collection are available elsewhere 28,29. The female twins came from the older Finnish Twin Cohort (questionnaire assessments in 1975, 1981 and 1990) and from the FinnTwin16 sample (surveys as young adults were used for smoking assessments). Because of the low number of subjects in the FTC cohort we pooled the data with the Health 2000 dataset. Health 2000 is a large Finnish cross-sectional health examination survey. It includes a total of 8,028 subjects aged 30 or over and is a nationally representative sample of the adult Finnish population. Here, we studied a subcohort of 2,124 individuals, GenMetS, selected for a GWA study on metabolic syndrome. Cases were selected according to the IDF Worldwide Definition of the Metabolic Syndrome (http://www.idf.org/home/index.cfm?node=1429). Controls were selected for not carrying the trait. For the cigarettes per day measure for the twins, the mean of the two measures was used if both had the information. We had smoking information for 1,996 individuals, of which 488 had smoked regularly and 1,508 had never smoked. The continuous smoking information as cigarettes per day (CPD) was available for 502 individuals with a mean of 15.4 (SD = 9.5). The mean age in the pooled dataset was 51.4 (11.7) and 51.3% of the dataset were females. For a subset of the GenMetS subjects (N=485) serum cotinine levels were available. The cotinine concentration (ng/ml) was determined from the serum using liquid-phase radioimmunoassay methodology (Nicotine Metabolite DOUBLE ANTIBODY kit, Diagnostic Products Corporation, Los Angeles, USA).

KORA: All participants from the KORA study are of white European ancestry. Briefly, the KORA S4 and KORA F3 epidemiological cohorts represent independent samples of unrelated subjects from the general population from the Augsburg Area (Southern Germany). KORA F3 was a follow-up examination in 2004/05 of KORA S3 individuals recruited in 1994-1995, whereas individuals in KORA S4 were recruited in 1999-2001. From the KORA F3 survey (full cohort n = 3,006), 1,644 individuals between 35 and 79 years were selected for genotyping on Affymetrix 500K 30. From the KORA S4 survey (full cohort n = 4,261), 1,814 individuals between 25 and 74 years were selected for genotyping on Affymetrix 1000K.

NFBC (Northern Finnish Birth Cohort of 1966): Mothers expected to give birth in the two northern provinces of Oulu and Lapland in 1966 were enrolled in NFBC1966 (n = 12,058 live births) 34. DNA was extracted from the blood samples provided at the 31-year clinical examination (n = 5,654). Of the genotyped individuals 3,299 had smoked regularly and 1,896 had never smoked. The continuous smoking measured as cigarettes per day (CPD) was available for 2,233 individuals having a mean CPD of 12.4 (7.9). The sex distribution in the NFBC1966 was 47.8% males and 52.2% females.

NTR-NESDA: The sample comes from two large-scale longitudinal studies: the Netherlands Study of Depression and Anxiety (NESDA) 2 and the Netherlands Twin Registry (NTR) 27. The NESDA and NTR studies were approved by the Central Ethics Committee of the VU University Medical Center Amsterdam. The GWA sample consisted of 1,777 participants from the NTR and 1,763 participants from NESDA 31. The mean age of the participants was 43.8 years (SD 13.4) and 65.7% of the sample was female. For participants of the NTR, longitudinal survey data from 7 waves of data collection (1991-2004) were used to determine smoking behavior. For participants from NESDA, data on smoking behavior were collected during a clinical interview between 2004 and 2007 2. The total sample consisted of 1,207 never smokers and 2,236 ever smokers.

Rotterdam: The Rotterdam Study was planned and designed in the early 1990s as a longitudinal study investigating the incidence and progression of diseases in the elderly. From 1991 to 1995 all inhabitants of Ommoord, a district of Rotterdam in the Netherlands, who were 55 years or older, were invited to participate in this study. Of 10,275 eligible individuals, 7,983 agreed to participate (78%). In 1999, 3,011 participants (out of 4,472 invitees) who had become 55 years of age or moved into the study district since the start of the study were added to the cohort 32. The Rotterdam Study has been approved by the institutional review board (Medical Ethics Committee) of the Erasmus Medical Center and by the review board of the Netherlands Ministry of Health, Welfare and Sports. All participants provided written informed consent. The current analysis included 6,234 participants for whom genotyping was successful and information on smoking behavior was available. Of these, 3,610 participants reported to smoke or have smoked in the past, while 2,624 participants were never smokers. The mean age was 67.9 years (SD 8.81) and 60% were female.

SORBS: All subjects are part of a sample from an extensively phenotyped isolated population from Eastern Germany, the Sorbs. The Sorbs are of Slavonic origin, and have lived in ethnic isolation among the Germanic majority during the past 1,100 years. Today, the Sorbian speaking, Catholic minority comprises approximately 15,000 full-blooded Sorbs resident in about 10 villages in rural Upper Lusatia (Oberlausitz), Eastern Saxony. Smoking habits were assessed in a standardized interview. Subjects were asked "Do you smoke or have you ever smoked? If yes, how many cigarettes per day do/did you smoke on average (on most days) and for how many years?" At present, more than 1,000 Sorbian individuals are enrolled in the study. 913 subjects (321 smokers and 592 never-smokers) were available for the present study. The smokers (208 males, 113 females) had a mean age of 42.77 (±18.2) years, and the never-smokers (162 males, 430 females) had a mean age of 47.97 (±18.75) years.

TWINS UK: The cohort (www.twinsuk.ac.uk) is an adult twin British registry shown to be representative of singleton populations and the United Kingdom population 33. A total of 924 females with smoking phenotype were included in the analysis. The mean age of the TwinsUK cohort was 53.73 (22-80). Ethics approval was obtained from the Guy's and St. Thomas' Hospital Ethics Committee. Written informed consent was obtained from every participant to the study. The study design and genotyping methodology are described in detail elsewhere 34.

WTCCC-CAD: Detailed descriptions of the Wellcome Trust Case Control Consortium Study data have already been provided elsewhere, and the CAD cases are European Caucasians who had a validated history of either myocardial infarction (MI) or coronary revascularisation (coronary artery bypass surgery or percutaneous coronary angioplasty) before their 66th birthday. They were recruited from April 1998 to November 2003 on a national basis 35.

Samples genotyped for individual markers

AUS: The Australian sample took part in the single SNP assay replication. Data obtained from 3,264 Australian subjects (49% women), 18-88 years of age (mean: 45; SD: 11 years) were used as one of the replication samples. Subjects were participants in either the Australian Nicotine Addiction Genetics (NAG) or a community-based (BigSib) family study. Families chosen for both studies were identified from two cohorts of the Australian Twin Panel, which included spouses of the older of these two cohorts. The NAG families were identified through heavy cigarette smoking index cases, and the BigSib families were comprised of families ascertained through the Australian Twin Panel selected for five or more offspring sharing both biological parents. The ancestry of the Australian samples is predominantly Anglo-Celtic or northern European (>90%). The same assessment protocol was used for both the NAG and BigSib studies 36. Clinical data were collected using a computer-assisted telephone diagnostic interview (CATI), an adaptation of the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA) 37,38 for telephone administration. The tobacco section of the CATI was derived from the Composite International Diagnostic Interview (CIDI) 3 and incorporated standard FTND, DSM-IIIR, and DSM-IV assessments of nicotine dependence. It also included a detailed history of cigarette and other tobacco use, including quantity and frequency of use for current, most recent, and heaviest period of use. The measure examined for the purposes of this study was the number of cigarettes smoked per day during the heaviest period of use. All data-collection procedures were approved by institutional review boards at Washington University in the United States, and the Queensland Institute of Medical Research in Australia.

DLCST: The Danish Lung Cancer Screening Trial (DLCST) 40 participated in the single-SNP assay replications for CPD. DLCST is a randomised 5-year trial comparing the effect of annual screening with low dose CT on the mortality of lung cancer, with no genetic screening in the control arm. Lung function tests are performed annually and information on smoking exposure is recorded in all participants. Individuals volunteered for the study in response to advertisements in local and regional free newspapers and weeklies. Participants were current or former smokers of both sexes, aged 50-70 years at inclusion, with a smoking history of more than 20 pack years. Participants had to be able to climb 2 flights of stairs (around 36 steps) without pausing. FEV1 was at least 30% of predicted normal. Ineligible were those applicants with body weight above 130 kg or previous treatment for lung cancer, breast cancer, malignant melanoma or hypernephroma. Individuals with a history of any other cancer within 5 years or tuberculosis within 2 years, or any serious illness that would shorten life expectancy to less than 10 years, were also excluded.

GER: Unrelated community-based volunteers of German descent (i.e., both parents German) were randomly selected from the general population of Munich, Germany, and contacted by mail. To exclude subjects with central neurological diseases and psychotic disorders or subjects who had first-degree relatives with psychotic disorders, several screenings were conducted before the volunteers were enrolled in the study. First, subjects who responded were initially screened by phone for the absence of neuropsychiatric disorders. Second, detailed medical and psychiatric histories were assessed for both themselves and their first-degree relatives by using a semi-structured interview. Third, if no exclusion criteria were fulfilled, they were invited to a comprehensive interview including the Structured Clinical Interview for DSM-IV (SCID I and SCID II) to validate the absence of any lifetime psychotic disorder. Additionally, the Family History Assessment Module was conducted to exclude psychotic disorders among their first-degree relatives. Furthermore, a neurological examination was conducted to exclude subjects with current CNS impairment. In the case that the volunteers were older than 60 years, the Mini Mental Status Test was performed to exclude subjects with possible cognitive impairment.

Lung Cancer and Peripheral Arterial Disease

The case-control samples utilized for testing for association with the smoking-related diseases LC and PAD were included in our prior study 7, with the addition of LC samples from Denver and the Danish PAD sample. The NLBLC sample was also included in the GWA study of CPD. All samples included in the present study are described below.

NLBLC (The Nijmegen Lung and Bladder Cancer study): This sample set is comprised of samples previously described in studies of lung and bladder cancer. The Dutch series consists of 3 groups: population controls, patients with urinary bladder cancer, and patients with lung cancer. Population controls: the 1,832 population controls (46% males) were recruited within a project entitled "Nijmegen Biomedical Study" (NBS). The details of this study were reported previously 40. Briefly, this is a population-based survey conducted by the Department of Epidemiology and Biostatistics and the Department of Clinical Chemistry of the Radboud University Nijmegen Medical Centre (RUNMC), in which 9,371 individuals participated from a total of 22,500 age- and sex-stratified, randomly selected inhabitants of Nijmegen. Control individuals from the NBS were invited to participate in a study on gene-environment interactions in multifactorial diseases, such as cancer. The 1,832 controls are a subsample of all the participants in the NBS, frequency-age-matched to a series of breast cancer and a series of prostate cancer patients. All the 1,832 participants are of self-reported European descent and were fully informed about the goals and the procedures of the study. The study protocols of the NBS were approved by the Institutional Review Board of the RUNMC and all study subjects signed a written informed consent form. The Dutch bladder cancer population has been described in a previous publication 41. Briefly, patients were recruited for the Nijmegen Bladder Cancer Study (NBCS) (see http://dceg.cancer.gov/icbc/membership.html). The NBCS identified patients through the population-based regional cancer registry held by the Comprehensive Cancer Centre East, Nijmegen.
Patients diagnosed between 1995 and 2006 under the age of 75 years were selected and their vital status and current addresses updated through the hospital information systems of the 7 community hospitals and one university hospital (RUNMC) that are covered by the cancer registry. All patients still alive on August 1, 2007 were invited to the study by the Comprehensive Cancer Center on behalf of the patients' treating physicians. In case of consent, patients were sent a lifestyle questionnaire to fill out and blood samples were collected by Thrombosis Service centers which hold offices in all the communities in the region. 1,651 patients were invited to participate. Of all the invitees, 1,082 gave informed consent (66%): 992 filled out the questionnaire (60%) and 1,016 (62%) provided a blood sample. The number of participating patients was increased with a non-overlapping series of 376 bladder cancer patients who were recruited previously for a study on gene-environment interactions in three hospitals (RUNMC, Canisius Wilhelmina Hospital, Nijmegen, and Streekziekenhuis Midden-Twente, Hengelo, the Netherlands). Ultimately, completed questionnaires that included questions on smoking and blood samples were available for 1,276 and 1,392 patients, respectively. All the patients that were selected for the analyses (N=1,277) were of self-reported European descent. The median age at diagnosis was 62 (range 25-93) years and 82% of the participants were males. The study protocols of the NBCS were approved by the Institutional Review Board of the RUNMC and all study subjects gave written informed consent. The series of patients with lung cancer has been described before 42. Briefly, patients with lung cancer were identified through the population-based cancer registry of the Comprehensive Cancer Center IKO, Nijmegen, the Netherlands.
Patients who were diagnosed in one of three hospitals (Radboud University Nijmegen Medical Center and Canisius Wilhelmina Hospital in Nijmegen and Rijnstate Hospital in Arnhem) and who were alive at April 15th, 2008 were recruited for a study on gene-environment interactions in lung cancer. 458 patients gave informed consent and donated a blood sample. This case series was increased with 94 patients to a total of 552 by linking three other studies to the population-based cancer registry in order to identify new occurrences of lung cancer among the participants of these other studies. Information on histology, stage of disease, and age at diagnosis was obtained through the cancer registry.

Lung cancer (Iceland). Recruitment began in the year 1998 with a nationwide list from the Icelandic Cancer Registry (ICR). About 1,265 LC patients were alive during the period of recruitment, and 665 participated in the project. Information in the ICR includes year and age at diagnosis, year of death, SNOMED (Systematized Nomenclature of Medicine) code and ICD-10 (International Statistical Classification of Diseases and Related Health Problems, 10th revision) classification. Histological and cytological verification was available for 647 cases; the remaining 18 cases were diagnosed clinically.

Lung cancer (Spain). Patients were recruited at the Oncology Department of Zaragoza Hospital. Clinical information including age at onset and histology was collected from medical records. All lung cancer cases and 865 of the 1,507 control individuals answered a lifestyle questionnaire, including questions on smoking status (never, former, current) and the amount of smoking. Study protocols were approved by the Institutional Review Board of Zaragoza University Hospital.

Lung cancer (Denver). DNA samples from blood samples and clinical data were provided by the University of Colorado Cancer Center under COMIRB protocol 08-0380. Blood samples were collected from 1,217 patients enrolled in any of 20 clinical research trials carried out under Colorado SPORE protocols between 1993 and 2008. Of these 1,217 patients, 246 were lung cancer cases and 971 had never had lung cancer at the time of sample shipment. Lung cancer cases were identified either from data matches with the Colorado Central Cancer Registry or by having malignant lung tissue collected via enrollment in a surgical protocol.

PAD sample (Austria). Patients and controls were recruited through the Linz Peripheral Arterial Disease (LIPAD) study during 2000 to 2002, at the Department of Surgery, St John of God Hospital. All patients with chronic atherosclerotic occlusive disease of the lower extremities with typical symptoms, e.g. claudication or leg pain on exertion, rest pain, or minor or major tissue loss, were included on the basis of the final clinical diagnosis established by attending vascular surgeons. The diagnosis was verified by interview, physical examination, noninvasive techniques, and angiography 33. All control subjects were patients at the St John of God Hospital and fulfilled the following criteria: no clinical indication of PAD by history and physical examination, and systolic brachial blood pressure equal to or less than the blood pressure in each of the right and left anterior tibial and posterior tibial arteries (that is, ankle brachial index ≥ 1.0) 33. Smoking status was assessed as described in ref. 34.

PAD sample (Denmark). The sample consists of 507 patients who were consecutively included during November 1999 to January 2004. All patients had PAD. The diagnosis was established from typical findings in clinical investigation (intermittent claudication, rest pain, ulcer or gangrene, and ankle-brachial index < 0.9). The samples were taken at baseline in a randomized, double-blind trial of roxithromycin versus placebo 43. All patients were enrolled at the Vascular Surgery Department, Viborg Hospital, Denmark. Exclusion criteria were allergy to macrolides and liver insufficiency.

PAD (Iceland). Patients have been recruited over the past eleven years, as part of a genetic study at deCODE, from a registry of individuals diagnosed with PAD at the major hospital in Reykjavik, the Landspitali University Hospital, during the years 1983-2006. Diagnosis was confirmed by vascular imaging or segmental pressure measurements.

PAD (New Zealand). Patients were recruited from the Otago-Southland region, and PAD was confirmed by an ankle brachial index of less than 0.7, pulse volume recordings and angiography/ultrasound imaging. The control group consisted of elderly individuals with no history of vascular disease from the same geographical region. Controls were asymptomatic for PAD and had ankle brachial indexes of more than 1. An abdominal ultrasound scan excluded concurrent abdominal aortic aneurysm from both the PAD and control groups, and Anglo-European ancestry was required for inclusion.

Genome-Wide Genotyping

Samples had been genotyped on various platforms. Most of the ENGAGE projects utilized the Illumina platform, either HumanHap300-HH370 (DCGN, NLBLC), HumanHap550 (Rotterdam study, ERF, TwinUK), or the 610-quad (Corogene, GenMets/FTC, and NFBC), but others used Affymetrix 500k (KORA, Sorbs, WTCCC-CAD) and Perlegen 600k (NTR-NESDA). SNP imputation was based on the Phase II CEU HapMap samples, and was done mostly using IMPUTE, but some studies used MACH4 (Corogene, GenMets/FTC, and NFBC), yielding a total of approximately 2.5 million SNPs. SNPs were excluded if they (a) had a yield lower than 95%, (b) had a minor allele frequency of less than 1% in the population, or (c) showed significant deviation from Hardy-Weinberg equilibrium in the controls (P < 0.001). Any samples with a call rate below 98% were excluded from the analysis.
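The exclusion criteria above can be illustrated with a minimal sketch. The record type `SnpQc`, its field names, and the two helper functions are hypothetical and introduced only for illustration; the thresholds are the ones stated in the text, and the Hardy-Weinberg P value is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality metrics (hypothetical record layout)."""
    snp_id: str
    yield_rate: float      # fraction of samples with a genotype call
    maf: float             # minor allele frequency in the population
    hwe_p_controls: float  # Hardy-Weinberg test P value in controls


def passes_snp_qc(snp, min_yield=0.95, min_maf=0.01, hwe_p_threshold=0.001):
    """Return True if the SNP survives all three exclusion criteria."""
    if snp.yield_rate < min_yield:            # (a) yield lower than 95%
        return False
    if snp.maf < min_maf:                     # (b) MAF less than 1%
        return False
    if snp.hwe_p_controls < hwe_p_threshold:  # (c) HWE deviation, P < 0.001
        return False
    return True


def passes_sample_qc(call_rate, min_call_rate=0.98):
    """Return True if the sample call rate is not below 98%."""
    return call_rate >= min_call_rate


if __name__ == "__main__":
    demo = SnpQc("rs_demo", yield_rate=0.99, maf=0.20, hwe_p_controls=0.5)
    print(passes_snp_qc(demo), passes_sample_qc(0.975))
```

Values exactly at a threshold are kept here, since the text excludes only SNPs and samples falling below the stated limits.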

Single SNP Genotyping

Single SNP genotyping for all samples was carried out at deCODE genetics in Reykjavik, Iceland, applying the same platform to all populations studied. All single SNP genotyping was carried out using the Centaurus (Nanogen) platform 47. The quality of each Centaurus SNP assay was evaluated by genotyping each assay on the CEU samples and comparing the results with the HapMap data 44. All assays had a mismatch rate < 0.5%. Additionally, all markers were re-genotyped on more than 10% of samples typed with the Illumina platform, resulting in an observed mismatch in less than 0.5% of samples.

Association Analysis

For the quantitative trait association analysis, i.e. smoking quantity measured in cigarettes per day (CPD), a classical linear regression, using the genotype as an additive covariate (or expected allele count for imputed SNPs) and CPD categories as a response, was fit to test for association. An additive model for SNP effects was assumed in all instances. The smoking categories are: 1-10 CPD, 11-20 CPD, 21-30 CPD, and 31 CPD and over, and association tests for quantitative traits were performed adjusting for sex and year of birth 2. We converted the result to CPD by dividing the categorical effect size by 10. The association analysis was performed by most of the ENGAGE studies using SNPTEST 45, but MACH 46 (KORA), ProbABEL (ERF, Rotterdam, KORA), and GenABEL (TwinUK) were also used.
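As an illustration of the categorical response and the additive genotype coding, the following is a minimal sketch and not the implementation used by any of the named packages. The numeric category codes 0 to 3 are an assumption (the text gives only the CPD ranges), the function names are hypothetical, and the adjustment for sex and year of birth is omitted for brevity, so the slope shown is an unadjusted simple regression.

```python
def cpd_category(cpd):
    """Map cigarettes per day to one of the four categories in the text.

    The codes 0..3 are an assumed numeric coding of the ranges
    1-10, 11-20, 21-30, and 31 and over.
    """
    if cpd < 1:
        raise ValueError("CPD categories are defined for smokers (CPD >= 1)")
    if cpd <= 10:
        return 0
    if cpd <= 20:
        return 1
    if cpd <= 30:
        return 2
    return 3


def additive_slope(dosages, responses):
    """Least-squares slope of response on allele dosage (no covariates).

    Dosages are allele counts 0, 1, 2 for genotyped SNPs, or expected
    allele counts (any value in [0, 2]) for imputed SNPs.
    """
    n = len(dosages)
    if n != len(responses) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")
    mean_x = sum(dosages) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("dosage has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, responses))
    return sxy / sxx


if __name__ == "__main__":
    cpd_values = [5, 15, 25, 40, 10, 20]
    dosages = [0, 1, 1.6, 2, 0, 1]  # 1.6 stands in for an imputed dosage
    categories = [cpd_category(c) for c in cpd_values]
    print(categories, additive_slope(dosages, categories))
```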

For case-control association analysis, e.g. when comparing PAD, LC and nicotine dependence cases to population controls, we utilized a standard likelihood ratio statistic, implemented in the NEMO software 44, to calculate two-sided P values for each individual allele, assuming a multiplicative model for risk, i.e. that the risks of the two alleles a person carries multiply 48. Combined significance levels were calculated by weighting z-scores by the inverse of the square root of each study's effective sample size.
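The multiplicative model and the weighted combination of z-scores can be sketched as follows. This is an illustrative sketch only and does not reproduce the NEMO software. Both function names are hypothetical, and the weights are passed in by the caller, who would derive them from each study's effective sample size as described above.

```python
import math


def genotype_relative_risks(allelic_rr):
    """Relative risk by number of risk alleles under a multiplicative model.

    Each copy of the risk allele multiplies risk by the same factor, so
    carriers of two copies have the squared allelic relative risk.
    """
    return {0: 1.0, 1: allelic_rr, 2: allelic_rr ** 2}


def combine_z_scores(z_scores, weights):
    """Weighted combination of per-study z-scores.

    Returns sum(w_i * z_i) / sqrt(sum(w_i ** 2)), which is again a
    standard normal variable under the null hypothesis.
    """
    if len(z_scores) != len(weights) or not z_scores:
        raise ValueError("need one weight per z-score")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


if __name__ == "__main__":
    print(genotype_relative_risks(1.3))
    print(combine_z_scores([2.0, 2.0], [1.0, 1.0]))
```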

Heterogeneity is tested by comparing the null hypothesis of the effect being the same in all populations to the alternative hypothesis of each population having a different effect, using a likelihood ratio test. I² lies between 0% and 100% and describes the proportion of total variation in study estimates that is due to heterogeneity 49.
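A minimal sketch of how I² may be obtained from per-study effect estimates and standard errors is given below. It assumes the widely used definition I² = max(0, (Q − df)/Q) × 100%, with Q being Cochran's heterogeneity statistic under inverse-variance weighting. The text specifies a likelihood ratio test, so the use of Q here is an assumption made for illustration, and the function names are hypothetical.

```python
def cochran_q(effects, standard_errors):
    """Cochran's Q: weighted squared deviations from the pooled effect."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))


def i_squared(effects, standard_errors):
    """I² in percent, truncated at 0 when Q is below its degrees of freedom."""
    k = len(effects)
    if k != len(standard_errors) or k < 2:
        raise ValueError("need at least two studies with matching inputs")
    q = cochran_q(effects, standard_errors)
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0


if __name__ == "__main__":
    # Hypothetical per-study effects (in CPD) and standard errors
    print(i_squared([0.30, 0.25, 0.40], [0.08, 0.07, 0.10]))
```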

Correction for Relatedness of the Subjects and Genomic Control

We estimated an inflation factor for each genome-wide association scan by calculating the average of the chi-square statistics, which is a method of genomic control 50, to adjust for both relatedness and potential population stratification. The inflation factors for CPD and smoking initiation were estimated within each study by the ratio of the median of the χ²-test statistic and its expected value (0.675²), or as 1 if this ratio was less than 1, and all the results presented from association with these traits were adjusted based on these inflation factors.
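The median-based inflation factor described above can be sketched as follows. The function names are hypothetical. Dividing each test statistic by the inflation factor is the usual genomic control adjustment and is assumed here, since the text states only that results were adjusted based on these factors.

```python
from statistics import median

# Expected median of a 1-df chi-square statistic, as given in the text
EXPECTED_MEDIAN = 0.675 ** 2


def inflation_factor(chi2_stats):
    """Ratio of the observed median to its expected value, floored at 1."""
    return max(1.0, median(chi2_stats) / EXPECTED_MEDIAN)


def adjust_for_inflation(chi2_stats):
    """Divide every statistic by the inflation factor (assumed adjustment)."""
    lam = inflation_factor(chi2_stats)
    return [c / lam for c in chi2_stats]


if __name__ == "__main__":
    # Hypothetical chi-square statistics from one association scan
    scan = [0.2, 0.6, 1.1, 2.4, 5.0]
    print(inflation_factor(scan))
    print(adjust_for_inflation(scan))
```

When the observed median is at or below its expectation, the factor is 1 and the statistics are returned unchanged.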

In-silico Replication Studies

The TAG and OX-GSK consortia provided results for the selected SNPs, using the same methods (i.e. categorical CPD corrected for age and sex) as described above, and provided results from each of the participating populations. Data from samples also present in the ENGAGE analysis were excluded from the in-silico replication stage, and data derived from samples participating in both the TAG and the OX-GSK consortia were entered only once into the analysis.

Table 4. Sample sizes for the ENGAGE GWAS used for the meta-analyses of Smoking Initiation and CPD.

Smoking Initiation
Study         Case      Control
Corogene      579       323
deCODE        15,310    6,077
EGPUT         506       513
ERF           907       378
KORA3         792       831
KORA4         955       800
NFBC          3,219     1,852
NTR-NESDA     2,236     1,207
Rotterdam     3,610     2,624
SUSOD         321       592
TwinUK        537       387
WTCCC-CAD     1,459     466
Total         30,431    16,050

CPD
Study         N
Corogene      265
deCODE        15,310
EGP           531
ERF           511
KORA3         183
KORA4         274
NFBC          2,167
NL-BLC        2,971
NTR-NESDA     2,071
Rotterdam     4,759
SUSOD         321
TwinUK        668
WTCCC-CAD     1,235
Total         31,266

Table 5. Genomic regions selected for follow-up (Smoking Initiation)

Chromosome   Start          Size (kb)   Markers   SNP          P-value (ENGAGE)
Chr 1        231,463,228    30          11        rs12122968   7.2 × 10^-7
Chr 5        124,100,202    12          4         rs7705693    1.1 × 10^-6
Chr 7        134,321,159    26          6         rs4329203    1.8 × 10^-6
Chr 15       79,160,087     9           2         rs868954     7.1 × 10^-6
Chr 7        117,282,903    167         55        rs10487380   8.6 × 10^-6
Chr 16       5,549,163      7           9         rs9888773    8.8 × 10^-6
Chr 7        38,397,140     4           5         rs12701627   9.2 × 10^-6
Chr 3        141,016,599    13          4         rs10935356   9.8 × 10^-6
Chr 2        45,018,893     32          11        rs163503     1.0 × 10^-5
Chr 5        166,920,252    68          47        rs2336894    1.1 × 10^-5
Chr 7        150,546,004    36          3         rs1122979    1.6 × 10^-5
Chr 2        145,884,678    130         3         rs16824949   2.3 × 10^-5
Chr 3        65,775,272     13          25        rs868633     3.4 × 10^-5
Chr 1        43,629,135     392         58        rs2251802    5.0 × 10^-5
Chr 11       112,335,772    82          15        rs11214441   6.2 × 10^-5

Table 6. Genomic regions selected for follow-up (CPD).

Size P-value Start (kb Markers SNP Genes (ENGAGE)

C r 15 76498858 461 193 rsl051730 2. 1 10 IREB2, LOCI 23688, PSMA4,CHRNA5, CHRNA3, CHRNB4,ADAMTS7, MORF4L1

Chr 19 459609 16 268 49 rs8102683 1.6 -10 NUMBL, ADCK4, ITPC, C190RF54, SNRPA, RAB4B, EGLN2, CYP2A6, CYP2B7P1, CYP2B6, CYP2A13, CYP2F1

Chr 7 32222133 221 103 rs2 15596 6.3 -10- PDE1C

Chr 16 81544689 554 47 rs4783307 1.3 -10 CDH13

Chr 1 51937813 9 16 45 rsl935289 4 .1 -10 OSBPL9, NRD1, RAB3B, TXNDC12, BTF3L4, ZFYVE9, CC2D1B, ORCIL, PRPF38A, ZCCHC11

Chr 22 25725385 211 13 rsl l090466 2. 1 -10

Chr 9 27830794 103 55 rsl0968202 4 .6 10 LING02

Chr 3 128566720 109 8 rs732548 9.9 -10 (BC015846)

Chr 2 2361 18985 17 13 rsl2470301 1. 1 -10 CENTG2

Chr 6 41658150 56 3 rsl2207736 4 .2 -10 FOXP4, MDFI

Chr 11 43574122 222 33 rs7 114842 4 .0 10 HSD1 7B12

Chr 5 16 1262342 92 23 rs6887149 5.7 -10 GABRA1, GABRG2

Chr 16 6461007 353 12 rs8047986 8.3 0 A2BP1

Chr 2 33032150 142 18 rsl0490451 1. 1 -10 LTBP1

Chr 8 42643741 84 21 rs10958726 1.3 10 CHRNA6, CHRNB3

Table 7. Association of markers in 4 chromosomal regions with CPD. Results are given for the ENGAGE analysis (ENGAGE), the in-silico replication obtained by combining results from TAG and OX-GSK (in-silico), and the results of single-SNP assay replications in samples from Iceland, Australia, Denmark, Germany, and Spain (ISL-AUST-DEN-GER-SPA). Samples that were both in ENGAGE and either TAG or OX-GSK were removed before obtaining the combined in-silico results. Shown are the number of smokers (N), the effect allele (A1) and the other allele (A2), the allele frequencies (Freq), the chromosome number and position, the estimated allelic effects on CPD and their standard errors in CPD (Effect and SE), the P value for the test of association (P), the P value for the test for heterogeneity in effect size, and an estimate of the proportion of total variation in study estimates that is due to heterogeneity (I²)

ENGAGE TAG and OX-GSK ISL-AUS-DEN-GER (meta-analysis) (In silico replication) SPA Combined Allele ( N=31,266) (N=45,691) (direct genot) (N=85,997) (N=9,040) SNP Al A2 Freq C r Position Effect±SE Effect±SE Effect±SE Effect±SE I rsl051730 0.339 15q25 76681394 0.84±0.07 2. 1-10 33 0.78±0.06 5.6-10 76,972 0.80±0.05 2.4-10- 9 0.035 32 rs6474412 T C 0.784 8pll 42669655 0.31±0.08 0.00017 0.30±0.07 2.6- 10 5 0.19±0.18 0.30 84,956 0.29±0.05 1.4-10-' 0.24 13 rsl3280604 A G 0.784 8pll 42678743 0.31±0.08 0.00012 0.30±0.07 2.7- 10 5 76,670 0.31±0.05 1.3-10-' 0.24 14 rs215614 G A 0 ..356 7pl4 32313860 0 ..38±0..07 2 ..4-10-8 0 ..17±0..06 0 ..0036 -0. 15±0. 16 0.35 86,259 0 ..22±0..04 2 ..1-10 7 0 .018 34 8 rs215605 G T 0 ,.357 7pl4 32303490 0 .,39±0..07 1 ..7-10- 0 .,17±0,.06 0 .,0035 77,012 0 .,26±0..04 5 ..4-10- 0 ,.12 22 rs7937 T c 0 ,.560 19ql3 45994546 0 .,34±0..07 2 ..2 10 7 0 ..19±0..06 0 .,0011 0.19±0.14 0.17 86,319 0 .,24±0..04 2 ..4-10 0 ,.45 1 rsl801272 A T 0 ,.961 19ql3 46046373 1 .,08±0,.27 7 ..0-10 5 0 .,41±0,.24 0 .,084 66,380 0 .,68±0..18 0 ..00011 0 ,.50 0 rs4105144 C T 0 ..704 19ql3 46050464 0 .59± 0..10 1 ..2-10 9 0 ..31±0..08 5 ..8-10 5 0.27±0.15 0.069 83,317 0 .39± 0..06 2 ..2-10 12 0 .51 0 rs7260329 G A 0 ,.687 19ql3 46213478 0 .,43±0..07 1 ..l-lO 0 .,06±0,.06 0 .36 0.08±0.16 0.65 86,092 0 .,20±0..04 5 ..5-10 0 ,.12 21 10 Table 8 . CPD. Association of markers within the regions selected by ENGAGE. Results are given for the ENGAGE discovery sample, the in-silico replication studies using data from the TAG and OX/GSK consortia (see accompanying papers). Shown are the number of smokers (N), the effect allele (Al) and the other allele (A2), the allele frequencies (Freq), the chromosome number and position, the estimated allelic effects on CPD and their standard errors in CPD

(Effect and SE), the P value for the test of association (P), the P value for the test for heterogeneity in effect size (Phet), and an estimate of the proportion of total variation in study estimates that is due to heterogeneity (I²)

ENGAGE I n silico Combined 2 SNP Al A2 Chr Position Effect ±SE P Effect ±SE P N Effect ±SE P Phet I rsl2130751 G A 1 51937813 0 .,33±0..08 l .le-05 0 .,05±0..07 0 ..46 76,592 0 .,17±0..05 0.00074 0.40 4 rsl0888734 A G 1 52038830 0 .,31±0..06 l .le-06 0 .,08±0..06 0 . 15 77,009 0 .,18±0..04 2e-05 0.32 9 rs7541944 A C 1 52042644 0 .,24±0..07 0.00054 0 .,11±0..06 0 ..084 76,961 0 .,16±0..05 0.00037 0.45 1 rs755 1758 G T 1 52046666 0 .,31±0..06 1.6e-06 0 .,09±0..06 0 . 11 77,007 0 .,18±0..04 1.3e-05 0.27 12 rs2982846 G T 1 52052238 0 .,30±0..07 1.7e-05 0 .,08±0..06 0 .22 76,971 0 .,17±0..05 0.00019 0.49 0 rs2747525 C A 1 52056606 0 .,32±0..07 2.3e-06 0 . 12±0..06 0 ..047 76,976 0 .,20±0..04 4.8e-06 0.60 0 rsl l205896 G T 1 52063572 0 .,31±0..06 9.8e-07 0 .,08±0..06 0 . 16 76,980 0 .,18±0..04 1.7e-05 0.70 0 rs2077725 A G 1 52066 158 0 .,28±0..07 1.9e-05 0 . 12±0..06 0 ..045 76,981 0 .,19±0..04 1.6e-05 0.62 0 rsl l205897 T C 1 52070739 0 .,31±0..06 7.7e-07 0 ..07±0..06 0 .2 77,001 0 .,18±0..04 2.5e-05 0.59 0 rs6702037 G A 1 52083344 0 .,31±0..06 le-06 0 .,10±0..06 0 ..082 77,022 0 .,19±0..04 6.3e-06 0.59 0 rs4422953 T C 1 52083470 0 .,29±0..07 8.7e-06 0 .,13±0..07 0 ..083 57,620 0 .,22±0..05 8.7e-06 0.48 0 rs6676789 c T 1 52089373 0 .,31±0..06 1.2e-06 0 .,10±0..06 0 ..079 77,008 0 .,19±0..04 6.4e-06 0.61 0 rs883058 T A 1 52089819 0 .,34±0..07 l .le-06 0 .,10±0..06 0 . 11 76,792 0 .,20±0..05 l .le-05 0.47 0 rs7526552 c T 1 52093 147 0 .,32±0..06 4.4e-07 0 ..07±0..06 0 .2 77,014 0 .,18±0..04 1.9e-05 0.63 0 rsl l205899 c T 1 52095249 0 .,29±0..07 2.3e-05 0 .,08±0..06 0 .2 1 76,973 0 .,17±0..05 0.00021 0.54 0 rs669 1091 c G 1 52096360 0 .,31±0..06 l .le-06 0 ..07±0..06 0 .22 77,01 1 0 .,17±0..04 4.6e-05 0.41 3 rs6588415 A G 1 52106635 0 .,31±0..07 3.4e-06 0 .,11±0..06 0 ..071 76,983 0 .,19±0..04 l .le-05 0.60 0 rsl538881 C T 1 52109627 0 .,31±0..06 8.3e-07 0 .,10±0..06 0 ..089 77,005 0 .,19±0..04 6.3e-06 0.54 0 rsl890946 C T 1 521 15015 0 .,31±0..06 1.4e-06 0 . 
11±0..06 0 ..07 76,993 0 .,19±0..04 6e-06 0.58 0 rs6663305 A G 1 521 15885 0 .,30±0..07 8.8e-06 0 . 12±0..06 0 ..04 76,959 0 .,20±0..04 8.4e-06 0.76 0 rs736756 C A 1 521 17002 0 .,30±0..06 4.4e-06 0 ..09±0..06 0 . 12 76,892 0 .,18±0..04 2.9e-05 0.72 0 rsl0888738 C T 1 521 19626 0 .,26±0..06 4.5e-05 0 ..09±0..06 0 . 11 76,987 0 .,17±0..04 0.0001 1 0.96 0 rsl l205902 T G 1 52121357 0 ..29±0..07 1.3e-05 0 ..09±0..06 0 . 11 76,995 0 .,18±0..04 5e-05 0.97 0 rsl954260 T C 1 52163620 0 ..29±0..06 5e-06 0 . 12±0..06 0 ..035 77,004 0 .,20±0..04 4.7e-06 0.98 0 rsl l20591 1 G A 1 52168900 0 ..32±0..07 1.8e-06 0 .,15±0..06 0 ..014 76,995 0 .,22±0..04 6.3e-07 0.93 0 rsl2566236 T G 1 52169531 0 ..32±0..07 1.6e-06 0 . 14±0..06 0 ..016 77,003 0 .,22±0..04 7.1e-07 0.94 0 rsl0888740 G A 1 52172685 0 .,31±0..06 1.9e-06 0 .,12±0..06 0 ..036 77,008 0 .,20±0..04 2.6e-06 0.97 0 rs2795002 T C 1 52174617 0 .,25±0..07 0.00039 0 .,10±0..06 0 . 1 76,819 0 . 17±0..05 0.00039 0.89 0 rs66981 10 c T 1 52176729 0 .,33±0..07 5.1e-07 0 .,15±0..06 0 ..01 1 77,008 0 .,23±0..04 2.1e-07 0.95 0 rsl935288 T c 1 52178276 0 .,25±0..07 0.00037 0 .,10±0..06 0 . 1 76,819 0 . 17±0..05 0.00039 0.88 0 rsl935289 c T 1 52179460 0 ..41±0..07 4 .1e-08 0 ..09±0..07 0 . 16 74,287 0 ..22±0..05 4.6e-06 0.30 10 rs2809943 c T 1 52188728 0 .,25±0..07 0.00042 0 . 11±0..06 0 ..092 76,827 0 . 17±0..05 0.00036 0.92 0 rs2809944 T c 1 52190 126 0 .,25±0..07 0.00035 0 .,10±0..06 0 . 12 76,821 0 .,16±0..05 0.00045 0.87 0 rs7541308 T G 1 52190709 0 32±0 06 8 4e-07 0.12±0.06 0 039 77 009 0 21±0 04 1 7e-06 0 97 0 rs6686975 A G 1 52195722 0 32±0 07 2 8e-06 0.12±0.06 0 06 76 852 0 20±0 05 6 7e-06 0 96 0 rsl0888743 T C 1 52195755 0 39±0 08 3 5e-07 0.12±0.07 0 081 76 321 0 23±0 05 4 2e-06 0 33 9 rs2809948 G A 1 52196682 0 26±0 07 0 00037 0.12±0.07 0 082 76 677 0 18±0 05 0 00025 0 76 0 rsl0888744 G T 1 52204400 0 34±0 09 0 00013 0.10±0.08 0 22 76 230 0 20±0 06 0 00073 0 24 14 rsl7107020 A G 1 52299 174 0 58±0 18 0 00 16 0.04±0. 
14 0 77 76 470 0 23±0 11 0 036 0 40 4 rs4394585 G A 1 52485622 0 28±0 07 4 9e-05 0.08±0.06 0 19 76 659 0 16±0 05 0 00028 0 60 0 rs7513934 G A 1 52590776 0 21±0 06 0 00082 0.12±0.06 0 035 76 646 0 16±0 04 0 00015 0 22 15 rsl2044739 C T 1 52601327 0 47±0 15 0 00 17 0.07±0. 11 0 52 76 711 0 21±0 09 0 018 0 18 18 rs9633423 A G 1 52608727 0 33±0 07 1 5e-06 0.10±0.06 0 12 76 702 0 19±0 05 1 7e-05 0 30 10 rs835036 C T 1 52769828 0 30±0 07 4 5e-06 0.11±0.06 0 078 76 518 0 19±0 04 1 4e-05 0 29 11 rs667 1552 A G 1 52854173 0 58±0 18 0 00 1 0.11±0. 12 0 33 76 969 0 25±0 10 0 0091 0 25 13 rsl70 12387 A G 2 33032 150 1 11±0 26 2 8e-05 -0.29±0.20 0 14 72 857 0 17±0 14 0 23 0 065 29 rsl70 12390 C T 2 33032718 1 10±0 27 6 4e-05 -0.39±0.20 0 048 72 421 0 08±0 13 0 54 0 050 32 rsl70 12393 G A 2 33032785 1 10±0 27 4 7e-05 -0.37±0.20 0 063 72 619 0 11±0 14 0 43 0 045 32 rs4952325 A G 2 33038890 0 96±0 25 0 00012 -0.33±0. 19 0 085 75 055 0 12±0 13 0 37 0 098 25 rs4952326 C T 2 33041 196 0 91±0 23 6 4e-05 -0.31±0. 19 0 11 75 055 0 17±0 14 0 2 0 093 25 rsl70 12409 A G 2 33042874 0 78±0 23 0 00063 -0.27±0. 19 0 15 75 341 0 12±0 13 0 35 0 28 12 rsl l l24300 G T 2 33061280 0 85±0 24 0 00047 -0. 19±0.20 0 33 73 131 0 19±0 14 0 18 0 12 23 rsl2468402 G A 2 33065980 1 11±0 27 3 5e-05 -0. 19±0.21 0 38 72 577 0 27±0 16 0 078 0 082 27 rsl70 12426 C T 2 33070 188 0 93±0 24 8 4e-05 -0. 17±0.21 0 4 1 74 746 0 28±0 15 0 06 0 12 23 rsl542343 T c 2 33076523 1 10±0 26 3e-05 -0.08±0.20 0 69 74 472 0 31±0 15 0 037 0 042 32 rsl7569615 T G 2 33091430 0 91±0 23 0 0001 1 -0. 16±0.20 0 43 76 686 0 27±0 15 0 064 0 062 29 rs3769554 A G 2 33108941 1 06±0 26 6 6e-05 -0. 15±0.20 0 46 74 284 0 27±0 15 0 08 0 056 30 rsl l l24301 T A 2 33109675 0 71±0 22 0 00 11 -0. 
16±0.20 0 44 76 716 0 22±0 14 0 12 0 073 27 rs6727912 T C 2 33142223 1 07±0 25 1 3e-05 -0.26±0.20 0 18 74 764 0 22±0 14 0 12 0 074 28 rs2123915 A C 2 33150083 1 09±0 27 3 8e-05 -0.29±0.21 0 17 72 649 0 20±0 15 0 18 0 11 24 rsl0490451 A G 2 33165965 1 09±0 25 1 le-05 -0.26±0. 18 0 16 72 571 0 19±0 14 0 17 0 031 35 rsl l l24303 A G 2 33170204 0 94±0 26 0 00039 -0.27±0.20 0 17 72 866 0 13±0 14 0 35 0 21 16 rsl902030 G A 2 33174095 0 91±0 24 0 00017 -0. 17±0. 18 0 33 74 485 0 17±0 13 0 19 0 10 25 rsl l687781 A G 2 2361 18985 0 47±0 13 0 00026 -0.07±0.09 0 46 74 272 0 11±0 07 0 13 0 31 10 rs6718421 C T 2 2361 19282 0 47±0 13 0 00025 -0.07±0.09 0 48 74 536 0 11±0 07 0 12 0 35 7 rsl3421279 C G 2 236120581 0 47±0 13 0 00025 -0. 11±0.09 0 24 74 268 0 08±0 07 0 24 0 29 11 rsl3021 173 T C 2 236121417 0 43±0 10 1 7e-05 -0.04±0.08 0 62 76 965 0 13±0 06 0 029 0 11 23 rs7586914 A G 2 236122252 0 43±0 10 1 5e-05 -0.03±0.08 0 7 76 959 0 14±0 06 0 02 0 043 31 rsl2475206 C T 2 236122627 0 45±0 10 9 4e-06 -0.01±0.08 0 89 76 937 0 16±0 06 0 0092 0 034 32 rsl2470301 G c 2 236122942 0 49±0 10 1 le-06 -0.05±0.08 0 51 74 682 0 15±0 06 0 015 0 020 36 rs758 1943 T G 2 236126122 0 47±0 13 0 00024 -0.06±0. 10 0 51 74 528 0 12±0 07 0 11 0 39 5 rsl30 18365 T G 2 236126752 0 46±0 13 0 00055 -0.07±0. 10 0 49 76 383 0 11±0 08 0 14 0 40 4 rsl3431390 A G 2 236128971 0 60±0 14 1 4e-05 -0.03±0. 10 0 76 74 430 0 18±0 08 0 022 0 36 6 rsl865947 G A 2 236130793 0 47±0 13 0 00027 -0.05±0. 10 0 63 74 524 0 13±0 08 0 082 0 44 2 rs6414040 A C 2 236135306 0 33±0 09 0 00019 0.01±0.08 0 92 76 797 0 14±0 06 0 014 0 63 0 rsl2692173 A C 2 236135601 0 47±0 13 0 00033 -0.05±0. 
10 0 58 74 236 0 13±0 08 0 097 0 37 6 rsl2634857 A G 3 128566720 0 30±0 09 0.00063 -0.05±0.07 0 46 76 984 0 08±0 05 0 13 0 22 15 rs212 1851 T C 3 128641324 0 35±0 07 1.6e-06 -0.07±0.06 0 29 76 808 0 10±0 05 0 028 0 045 31 rsl3065243 T C 3 128642649 0 37±0 08 le-06 -0.08±0.06 0 18 76 517 0 09±0 05 0 046 0 014 38 rsl2486396 G A 3 128643205 0 36±0 07 l .le-06 -0.09±0.06 0 18 76 8 11 0 09±0 05 0 047 0 020 36 rs732548 A G 3 128644442 0 37±0 07 9.9e-07 -0.09±0.06 0 18 76 819 0 09±0 04 0 047 0 016 37 rs732549 T G 3 128644479 0 36±0 07 1.4e-06 -0.09±0.06 0 16 76 788 0 09±0 04 0 057 0 017 37 rs737646 c A 3 128645161 0 36±0 08 2.8e-06 -0.09±0.07 0 18 76 762 0 09±0 05 0 066 0 023 35 rs2594220 T G 3 1286761 17 0 36±0 10 0.00021 -0.04±0.08 0 66 76 702 0 12±0 06 0 048 0 34 8 rs6859788 A T 5 161262342 0 28±0 07 0.00017 -0.02±0.07 0 82 76 717 0 10±0 05 0 027 0 44 2 rsl870230 C T 5 161270378 0 29±0 08 0.00012 -0.02±0.07 0 8 1 76 717 0 11±0 05 0 027 0 45 1 rsl0068984 A G 5 161274708 0 28±0 08 0.00023 -0.02±0.07 0 8 1 76 717 0 10±0 05 0 034 0 43 3 rs7723554 T C 5 161276527 0 28±0 07 0.00014 -0.02±0.07 0 8 76 717 0 10±0 05 0 026 0 43 2 rsl3186615 T C 5 161278335 0 28±0 07 0.00024 -0.02±0.07 0 8 76 717 0 10±0 05 0 033 0 42 3 rsl457703 A G 5 161280570 0 28±0 08 0.0002 -0.02±0.07 0 79 76 717 0 10±0 05 0 033 0 43 3 rs7730737 G A 5 161282399 0 28±0 07 0.00016 -0.01±0.07 0 83 76 717 0 10±0 05 0 027 0 43 2 rs6893787 C T 5 161282903 0 29±0 08 0.00016 -0.02±0.07 0 78 76 717 0 10±0 05 0 032 0 44 2 rsl902795 T c 5 161286763 0 31±0 08 0.00019 -0.02±0.07 0 76 76 579 0 11±0 05 0 031 0 20 16 rs6556564 G A 5 1612903 16 0 30±0 08 0.00024 -0.02±0.07 0 75 76 580 0 11±0 05 0 033 0 19 17 rsl350374 C T 5 161291288 0 30±0 08 8.7e-05 -0.03±0.07 0 69 76 697 0 10±0 05 0 032 0 4 1 4 rsl350375 C G 5 161291466 0 30±0 08 7.7e-05 -0.05±0.07 0 44 76 697 0 09±0 05 0 055 0 35 7 rs9313903 C T 5 161291777 0 31±0 08 0.00012 -0.03±0.07 0 72 76 579 0 12±0 05 0 024 0 19 17 rsl585196 G A 5 161293077 0 32±0 08 4.9e-05 
-0.02±0.07 0 76 76 655 0 12±0 05 0 019 0 51 0 rsl585198 G A 5 161294336 0 30±0 08 7.1e-05 -0.02±0.07 0 75 76 689 0 11±0 05 0 023 0 42 3 rsl902796 A G 5 161301520 0 30±0 07 5.4e-05 -0.02±0.07 0 8 76 689 0 11±0 05 0 019 0 36 7 rsl457705 A G 5 161307052 0 33±0 07 1.4e-05 -0.02±0.07 0 8 76 689 0 12±0 05 0 Oi l 0 36 7 rsl0050729 T C 5 161309251 0 32±0 07 2e-05 -0.02±0.07 0 79 76 689 0 12±0 05 0 012 0 37 6 rs6866875 c T 5 161330637 0 31±0 07 4.3e-05 -0.02±0.07 0 71 76 686 0 11±0 05 0 022 0 37 6 rs21 12596 A G 5 161348056 0 33±0 08 9.9e-06 -0.03±0.07 0 65 76 653 0 11±0 05 0 016 0 23 15 rs6887149 A G 5 161353253 0 34±0 08 5.7e-06 -0.04±0.07 0 59 76 637 0 11±0 05 0 015 0 23 15 rs3886595 C G 5 161353951 0 31±0 08 2.8e-05 -0.07±0.07 0 29 76 610 0 08±0 05 0 068 0 22 15 rsl862328 A G 5 161354432 0 32±0 07 2.1e-05 -0.05±0.07 0 49 76 610 0 10±0 05 0 031 0 29 11 rsl475365 A C 6 41658 150 0 30±0 09 0.00052 0.10±0.07 0 17 76 573 0 17±0 05 0 0013 0 30 10 rsl2207736 G T 6 41681966 0 41±0 09 4.2e-06 0.09±0.08 0 27 76 863 0 22±0 06 0 00015 0 045 30 rs2495229 T c 6 41713808 0 33±0 09 0.00013 0.08±0.08 0 29 76 909 0 19±0 06 0 001 1 0 28 11 rsl6875791 G c 7 32222 133 0 25±0 07 0.00038 0.02±0.06 0 75 77 003 0 12±0 05 0 012 0 0056 42 rsl860224 A G 7 32222 167 0 26±0 07 0.00022 0.01±0.06 0 89 77 004 0 11±0 05 0 013 0 0075 40 rsl2672267 G A 7 32225259 0 29±0 08 0.00015 0.02±0.07 0 8 1 76 799 0 13±0 05 0 0091 0 15 20 rs719585 C G 7 32225610 0 35±0 08 2e-05 0.05±0.07 0 52 76 713 0 17±0 05 0 0013 0 16 19 rs6945244 T C 7 32225798 0 27±0 06 2.2e-05 0.11±0.06 0 059 76 918 0 18±0 04 2 4e-05 0 079 26 rsl266991 1 A C 7 32228902 0 35±0 07 1.4e-06 0.13±0.06 0 032 74 808 0 22±0 05 2 7e-06 0 090 25 rsl0233045 A G 7 32231017 0 33±0 07 4.le-06 0.15±0.06 0 016 76 978 0 22±0 05 1 6e-06 0 063 28 rsl0233473 T C 7 32231532 0 34±0 08 1.8e-05 0.05±0.07 0 51 74 826 0 16±0 05 0 0013 0 0048 43 rsl0237329 c T 7 32232250 0 33±0 07 3.6e-06 0.15±0.06 0 015 76 978 0 22±0 05 1 3e-06 0 070 27 rs4141 108 T c 7 32233484 0 35±0 
08 1.4e-05 0.05±0.07 0 5 76 994 0 17±0 05 0 001 0 0067 4 1 rsl0269368 G A 7 32238039 0 34±0 08 1.2e-05 0.04±0.07 0 58 76,984 0 16±0 05 0 0014 0 012 38 rsl014242 C T 7 32238830 0 36±0 08 3.6e-06 0 19±0 07 0 0039 76 808 0 26±0 05 2.4e-07 0 21 16 rs7786576 C A 7 32239239 0 38±0 08 l .le-06 0 17±0 07 0 0097 74 630 0 25±0 05 4.2e-07 0 29 11 rs7806224 C T 7 32239632 0 36±0 08 3.4e-06 0 19±0 07 0 0039 76 808 0 26±0 05 2.3e-07 0 22 15 rs9639646 G A 7 32246768 0 40±0 09 8.2e-06 0 06±0 08 0 43 76 819 0 20±0 06 0.00057 0 19 17 rsl0259431 C T 7 32247922 0 38±0 08 1.8e-06 0 19±0 07 0 0038 74 640 0 27±0 05 1.5e-07 0 17 19 rs9639648 A G 7 32248667 0 39±0 09 9.4e-06 0 07±0 08 0 37 76 815 0 20±0 06 0.00045 0 096 24 rsl0263673 T C 7 32249044 0 41±0 09 5.1e-06 0 07±0 08 0 38 74 648 0 20±0 06 0.00037 0 082 26 rs9638875 A T 7 32249439 0 33±0 07 3e-06 0 16±0 06 0 012 76 977 0 23±0 05 7.8e-07 0 080 26 rsl0241729 G A 7 32256404 0 32±0 10 0.00075 0 14±0 09 0 12 76 839 0 21±0 06 0.00075 0 16 19 rsl0236197 C T 7 32258286 0 35±0 07 1.5e-06 0 17±0 06 0 0085 74 809 0 24±0 05 3.4e-07 0 070 27 rsl l773343 T c 7 32258841 0 29±0 08 0.00062 0 18±0 07 0 0 1 74 814 0 22±0 05 3.5e-05 0 15 20 rs7798739 A T 7 32259486 0 35±0 07 1.4e-06 0 16±0 06 0 0098 74 809 0 24±0 05 4.1e-07 0 068 28 rsl3221985 A c 7 32259510 0 29±0 08 0.00062 0 18±0 07 0 0095 74 809 0 22±0 05 3.2e-05 0 15 20 rs929456 G T 7 32260 169 0 34±0 07 1.3e-06 0 15±0 06 0 017 76 962 0 23±0 05 7.4e-07 0 084 26 rsl3224417 A G 7 32265 118 0 29±0 08 0.00062 0 18±0 07 0 0081 74 813 0 22±0 05 2.6e-05 0 16 19 rsl l762194 A G 7 32266694 0 29±0 08 0.00056 0 19±0 07 0 0073 74 818 0 23±0 05 2.2e-05 0 13 22 rs6948856 A G 7 32268872 0 29±0 08 0.00071 0 19±0 07 0 007 74 773 0 23±0 05 2.4e-05 0 16 19 rs975 122 A T 7 32269319 0 29±0 08 0.00056 0 19±0 07 0 0054 74 817 0 23±0 05 1.5e-05 0 17 19 rs7806397 T c 7 32269864 0 35±0 07 7.5e-07 0 16±0 06 0 Oi l 76 979 0 24±0 05 2.6e-07 0 084 26 rs7796692 G A 7 32271390 0 29±0 08 0.00052 0 19±0 07 0 0076 76 983 0 23±0 05 
2.le-05 0 21 15 rs7780515 T C 7 32271799 0 35±0 07 9.2e-07 0 16±0 06 0 Oi l 76 970 0 24±0 05 3.6e-07 0 092 25 rs4368879 c T 7 32274450 0 36±0 07 3.2e-07 0 16±0 06 0 012 76 972 0 24±0 05 1.7e-07 0 054 29 rs4370439 c T 7 32274626 0 29±0 08 0.00039 0 18±0 07 0 0091 76 975 0 22±0 05 2.le-05 0 15 20 rsl450869 G T 7 32278 197 0 36±0 07 4.7e-07 0 16±0 06 0 0096 76 968 0 24±0 05 1.7e-07 0 063 28 rsl450870 T c 7 32278251 0 37±0 07 2.2e-07 0 13±0 06 0 032 76 960 0 23±0 05 7.1e-07 0 10 24 rs7778162 c T 7 32281009 0 30±0 08 0.00025 0 18±0 07 0 0084 76 975 0 23±0 05 1.4e-05 0 11 23 rs7778443 T c 7 32281215 0 38±0 07 1.3e-07 0 14±0 06 0 027 76 964 0 23±0 05 3.9e-07 0 11 24 rsl0226228 G A 7 32282 138 0 36±0 07 5e-07 0 16±0 06 0 Oi l 76 971 0 24±0 05 2.4e-07 0 059 28 rsl476765 G T 7 32286983 0 37±0 07 1.8e-07 0 16±0 06 0 Oi l 76 972 0 25±0 05 1.3e-07 0 050 30 rs977 1228 C T 7 32289021 0 38±0 07 1.8e-07 0 18±0 06 0 0046 76 967 0 26±0 05 3.5e-08 0 043 31 rsl2540232 C T 7 32289486 0 36±0 07 4.2e-07 0 16±0 06 0 Oi l 76 972 0 24±0 05 2.2e-07 0 064 28 rs215596 A G 7 32292898 0 41±0 07 6.3e-09 0 13±0 06 0 027 76 973 0 25±0 05 7.9e-08 0 052 30 rsl l768207 C G 7 32293832 0 32±0 08 0.0001 1 0 14±0 07 0 033 76 996 0 21±0 05 5e-05 0 053 29 rs215599 C T 7 32296654 0 40±0 07 2.3e-08 0 13±0 06 0 032 76 953 0 24±0 05 2.2e-07 0 077 26 rsl0271037 T G 7 32296861 0 33±0 08 9.2e-05 0 16±0 07 0 014 77 Oi l 0 22±0 05 1.5e-05 0 074 27 rs215600 G A 7 32300 167 0 41±0 07 1.6e-08 0 15±0 06 0 017 76 987 0 25±0 05 7e-08 0 069 27 rs215601 A C 7 32300446 0 40±0 07 1.9e-08 0 13±0 06 0 04 76 976 0 23±0 05 3.1e-07 0 11 23 rs215605 G T 7 32303490 0 39±0 07 1.7e-08 0 17±0 06 0 0035 77 012 0 26±0 04 5.4e-09 0 12 22 rs215607 G A 7 32304862 0 39±0 09 8.6e-06 0 24±0 09 0 0082 57 610 0 31±0 06 5.6e-07 0 62 0 rs215610 G A 7 32306 119 0 36±0 08 1.8e-05 0 24±0 07 0 00074 77 021 0 28±0 05 l .le-07 0 72 0 rsl2531858 A C 7 32306624 0 34±0 08 2e-05 0 23±0 07 0 00081 77 020 0 28±0 05 1.2e-07 0 77 0 rs21561 1 C G 7 32307963 0 
40±0 07 6.4e-09 0 05±0 06 0 42 76 998 0 19±0 04 1.9e-05 0 0084 40 rs7780009 A G 7 32308068 0 37±0 08 le-05 0 24±0 07 0 00068 77 022 0 29±0 05 6.3e-08 0 72 0 rsl l771526 G A 7 32309 143 0 37±0 10 0.00033 0 12±0 10 0 2 76 933 0 23±0 07 0.00089 0 32 9 rs6952609 G A 7 32309860 0 32±0 08 9.2e-05 0 25±0 07 0 00063 76,948 0 28±0 05 2.9e-07 0 65 0 rs7779181 C T 7 3231 1808 0 35±0 08 2.8e-05 0 24±0 07 0 00048 77 025 0 28±0 05 9.5e-08 0 71 0 rs7779130 T G 7 32313483 0 35±0 08 2.5e-05 0 24±0 07 0 00051 77 028 0 28±0 05 9.1e-08 0 71 0 rs7778788 c A 7 32313499 0 35±0 08 3.le-05 0 24±0 07 0 00059 77 025 0 28±0 05 1.3e-07 0 65 0 rs7779180 G A 7 32313727 0 36±0 08 1.2e-05 0 24±0 07 0 00045 77 026 0 29±0 05 4.6e-08 0 66 0 rs215614 G A 7 32313860 0 38±0 07 2.4e-08 0 17±0 06 0 0036 77 008 0 26±0 04 7.1e-09 0 10 24 rsl0951331 G A 7 32320899 0 35±0 08 2.9e-05 0 24±0 07 0 00064 77 018 0 28±0 05 1.4e-07 0 70 0 rs215622 C T 7 32324184 0 38±0 07 9.5e-08 0 15±0 06 0 013 77 015 0 24±0 05 1.2e-07 0 11 23 rs215625 G A 7 32324838 0 38±0 07 1.3e-07 0 16±0 06 0 0097 76 997 0 25±0 05 le-07 0 077 26 rs215629 G C 7 32326989 0 38±0 07 1.6e-07 0 18±0 06 0 0031 76 998 0 26±0 05 2.4e-08 0 14 21 rsl653876 T C 7 32327 144 0 33±0 08 8 le-05 0 15±0 07 0 025 77 021 0 22±0 05 3e-05 0 13 21 rs6462354 G A 7 32333008 0 33±0 09 0.0001 0 19±0 07 0 0063 77 000 0 24±0 05 5.8e-06 0 26 12 rsl l l5318 A G 7 32334175 0 34±0 09 6.3e-05 0 18±0 07 0 0094 77 002 0 24±0 05 6.6e-06 0 29 10 rs215632 A G 7 32335049 0 37±0 07 2e-07 0 18±0 06 0 0025 76 972 0 26±0 05 1.9e-08 0 13 21 rs215634 A G 7 32335673 0 37±0 07 7.8e-08 0 16±0 06 0 0075 76 966 0 25±0 05 4.6e-08 0 16 19 rs6955346 C T 7 32336078 0 37±0 07 5e-07 0 17±0 06 0 0062 74 823 0 25±0 05 1.4e-07 0 13 21 rs215635 C T 7 32336745 0 36±0 07 1 8e-07 0 18±0 06 0 003 76 972 0 25±0 05 2e-08 0 11 23 rsl0264177 G A 7 32337387 0 42±0 08 3 2e-08 0 18±0 07 0 0076 76 780 0 27±0 05 2.4e-08 0 070 27 rs215636 C G 7 32338444 0 31±0 08 0 00026 0 16±0 07 0 019 76 975 0 21±0 05 4.3e-05 0 21 
15
rs215639 C T 7 32340164 0.36±0.07 8.9e-07 0.17±0.06 0.0075 74,795 0.24±0.05 2.6e-07 0.13 22
rs6977196 C T 7 32340403 0.31±0.08 0.0002 0.17±0.07 0.013 76,976 0.23±0.05 2e-05 0.44 2
rs10238006 A C 7 32343478 0.32±0.09 0.00021 0.17±0.07 0.017 76,905 0.22±0.05 3.1e-05 0.36 6
rs10447642 T C 7 32344090 0.34±0.09 0.00015 0.17±0.07 0.022 76,785 0.23±0.06 3.7e-05 0.21 16
rs215669 G A 7 32345504 0.34±0.07 8.5e-07 0.15±0.06 0.017 76,949 0.23±0.05 6.5e-07 0.18 18
rs215670 G A 7 32345743 0.36±0.07 7.5e-07 0.16±0.06 0.011 76,884 0.24±0.05 3.7e-07 0.13 22
rs186229 C A 7 32348082 0.34±0.07 8.3e-07 0.13±0.06 0.038 76,968 0.21±0.05 2.2e-06 0.23 14
rs1653889 G A 7 32353178 0.27±0.08 0.00039 0.11±0.06 0.093 76,943 0.17±0.05 0.00042 0.45 1
rs1668389 G T 7 32353440 0.28±0.08 0.00015 0.11±0.06 0.097 76,943 0.18±0.05 0.00023 0.42 3
rs1668393 C T 7 32360136 0.26±0.07 0.00054 0.10±0.06 0.11 76,943 0.16±0.05 0.00066 0.35 7
rs215692 T C 7 32361383 0.32±0.07 3.5e-06 0.12±0.06 0.043 76,982 0.20±0.04 6.5e-06 0.12 22
rs412876 G T 7 32362185 0.34±0.07 1.3e-06 0.13±0.06 0.034 76,982 0.21±0.04 3e-06 0.15 20
rs215694 T G 7 32363551 0.33±0.07 2.3e-06 0.12±0.06 0.053 76,964 0.20±0.05 8.2e-06 0.13 21
rs215695 C T 7 32364433 0.32±0.07 1.6e-05 0.09±0.06 0.14 76,700 0.18±0.05 0.00014 0.13 22
rs215696 G A 7 32364553 0.29±0.08 0.00018 0.11±0.07 0.1 76,690 0.18±0.05 0.00032 0.18 18
rs215697 C T 7 32364566 0.28±0.08 0.00018 0.12±0.07 0.077 76,704 0.18±0.05 0.00019 0.20 16
rs215698 C T 7 32364619 0.30±0.07 5.3e-05 0.09±0.06 0.15 76,702 0.17±0.05 0.00031 0.16 19
rs4723147 A G 7 32364681 0.33±0.08 4.7e-05 0.13±0.07 0.062 76,988 0.20±0.05 6.9e-05 0.17 18
rs1668394 A C 7 32365210 0.32±0.08 2.4e-05 0.10±0.06 0.13 76,692 0.18±0.05 0.00016 0.15 20
rs215699 C G 7 32365499 0.32±0.07 1.6e-05 0.01±0.06 0.85 76,697 0.13±0.05 0.0048 0.035 33
rs215700 C T 7 32365691 0.27±0.07 0.00015 0.01±0.07 0.84 76,819 0.12±0.05 0.0086 0.067 28
rs215702 G A 7 32366183 0.37±0.07 3.6e-07 0.14±0.06 0.021 76,691 0.23±0.05 6.4e-07 0.091 25
rs10486507 T G 7 32366358 0.43±0.09 3.8e-06 0.11±0.08 0.16 76,690 0.23±0.06 8.6e-05 0.20 16
rs215717 A G 7 32382301 0.56±0.12 5.1e-06 0.06±0.11 0.6 73,978 0.25±0.08 0.0011 0.22 16
rs170016 G A 7 32412842 0.34±0.08 1.5e-05 0.15±0.07 0.019 73,946 0.23±0.05 6.4e-06 0.30 11
rs10236370 C G 7 32412984 0.32±0.08 3.7e-05 0.02±0.06 0.75 76,379 0.14±0.05 0.0048 0.067 28
rs1123893 T C 7 32413455 0.37±0.09 7.1e-05 0.06±0.07 0.39 73,867 0.17±0.06 0.0021 0.21 16
rs896165 C A 7 32439685 0.23±0.07 0.00075 -0.00±0.05 0.95 77,012 0.09±0.04 0.036 0.073 27
rs716500 A G 7 32443454 0.24±0.07 0.0006 0.02±0.06 0.68 77,013 0.11±0.04 0.013 0.15 20
rs10958725 G T 8 42643741 0.28±0.08 0.00063 0.29±0.07 3.7e-05 76,670 0.29±0.05 8.5e-08 0.22 15
rs7837296 C A 8 42646051 0.31±0.09 0.00054 0.29±0.08 0.00019 76,629 0.30±0.06 3.8e-07 0.23 15
rs5005909 A G 8 42647824 0.30±0.09 0.00065 0.29±0.08 0.00019 76,629 0.29±0.06 4.4e-07 0.25 13
rs1979140 C T 8 42649993 0.31±0.08 0.00017 0.30±0.07 3.4e-05 76,668 0.30±0.05 2.3e-08 0.26 12
rs10958726 T G 8 42655066 0.31±0.08 0.00013 0.30±0.07 3.3e-05 76,668 0.30±0.05 1.7e-08 0.25 13
rs7842601 T C 8 42656212 0.30±0.08 0.00018 0.29±0.07 3.8e-05 76,668 0.30±0.05 2.7e-08 0.24 14
rs13273442 G A 8 42663174 0.29±0.08 0.00032 0.29±0.07 3.6e-05 76,669 0.29±0.05 4.3e-08 0.26 13
rs1451239 A G 8 42665699 0.30±0.08 0.00021 0.30±0.07 4e-05 76,643 0.30±0.05 3.2e-08 0.26 13
rs1451240 G A 8 42665868 0.29±0.08 0.00032 0.30±0.07 2.6e-05 76,669 0.30±0.05 3.2e-08 0.27 12
rs1955185 T C 8 42668804 0.31±0.08 0.00024 0.12±0.09 0.19 57,327 0.22±0.06 0.00036 0.43 2
rs6474412 T C 8 42669655 0.31±0.08 0.00017 0.30±0.07 2.6e-05 76,670 0.30±0.05 1.7e-08 0.25 13
rs7004381 G A 8 42670318 0.29±0.08 0.00031 0.30±0.07 2.6e-05 76,670 0.30±0.05 3e-08 0.21 16
rs4950 A G 8 42671790 0.31±0.08 0.00017 0.10±0.09 0.31 57,327 0.21±0.06 0.0006 0.43 3
rs1530848 T G 8 42672065 0.28±0.08 0.00049 0.30±0.07 2.6e-05 76,670 0.29±0.05 4.8e-08 0.23 15
rs13280604 A G 8 42678743 0.31±0.08 0.00012 0.30±0.07 2.7e-05 76,670 0.31±0.05 1.3e-08 0.24 14
rs6997909 G A 8 42679406 0.30±0.08 0.00022 0.30±0.07 2.7e-05 76,670 0.30±0.05 2.3e-08 0.25 13
rs6474414 C A 8 42679493 0.29±0.08 0.00037 0.30±0.07 2.8e-05 76,666 0.29±0.05 3.9e-08 0.23 14
rs6474415 A G 8 42682095 0.30±0.08 0.00024 0.30±0.07 3.2e-05 76,670 0.30±0.05 2.9e-08 0.25 13
rs16891561 C T 8 42698896 0.30±0.09 0.00062 0.31±0.08 4.8e-05 76,610 0.30±0.06 1.1e-07 0.27 12
rs7017612 A C 8 42718402 0.34±0.09 0.00018 0.27±0.08 0.00041 74,551 0.29±0.06 3.8e-07 0.0051 44
rs2304297 G C 8 42727356 0.28±0.08 0.00075 0.15±0.07 0.024 76,681 0.20±0.05 0.00011 0.13 22
rs12376406 G T 9 27830794 0.28±0.07 0.00017 0.08±0.06 0.2 76,940 0.16±0.05 0.00075 0.36 6
rs12376417 C T 9 27830808 0.28±0.07 0.00015 0.08±0.06 0.19 76,940 0.16±0.05 0.00067 0.38 5
rs7853855 G A 9 27836594 0.28±0.07 0.0001 0.08±0.06 0.18 76,945 0.16±0.05 0.00049 0.36 6
rs947521 C T 9 27846707 0.28±0.07 0.00014 0.06±0.06 0.31 76,819 0.15±0.05 0.0015 0.24 14
rs10511818 T C 9 27847771 0.29±0.07 4.6e-05 0.06±0.06 0.32 76,819 0.15±0.05 0.00083 0.25 13
rs10968151 T C 9 27847857 0.28±0.07 8e-05 0.06±0.06 0.32 76,819 0.15±0.05 0.0011 0.25 13
rs4581139 T C 9 27850390 0.29±0.07 6.7e-05 0.06±0.06 0.32 76,818 0.15±0.05 0.001 0.25 13
rs10812677 T C 9 27852281 0.28±0.07 0.00012 0.06±0.06 0.32 76,819 0.15±0.05 0.0015 0.24 14
rs10812678 G A 9 27852655 0.29±0.07 6e-05 0.07±0.06 0.28 76,823 0.15±0.05 0.00079 0.29 11
rs1004158 G A 9 27855600 0.23±0.07 0.00054 0.09±0.06 0.12 76,826 0.15±0.04 0.0006 0.61 0
rs10968157 A C 9 27856582 0.28±0.07 0.00013 0.09±0.06 0.17 76,584 0.16±0.05 0.00049 0.37 6
rs10812680 C T 9 27864522 0.28±0.07 0.00019 0.08±0.06 0.18 76,816 0.16±0.05 0.00076 0.19 17
rs2039997 G A 9 27865763 0.28±0.07 0.00013 0.09±0.06 0.16 76,862 0.16±0.05 0.00049 0.26 12
rs2778429 C A 9 27868045 0.28±0.07 9.3e-05 0.09±0.06 0.16 76,884 0.16±0.05 0.00039 0.25 13
rs10812681 C G 9 27869095 0.24±0.07 0.00022 0.04±0.06 0.48 76,891 0.12±0.04 0.0061 0.14 20
rs7469938 C T 9 27884033 0.29±0.07 6.8e-05 0.09±0.06 0.16 76,910 0.17±0.05 0.00033 0.24 14
rs1930021 T C 9 27885371 0.30±0.07 5.1e-05 0.08±0.06 0.18 76,924 0.16±0.05 0.00036 0.26 13
rs10968177 T C 9 27887124 0.23±0.07 0.00059 -0.03±0.06 0.6 76,912 0.08±0.04 0.07 0.010 39
rs12378440 G A 9 27892874 0.31±0.07 2.2e-05 0.09±0.06 0.16 76,861 0.18±0.05 0.00017 0.17 19
rs2383743 A G 9 27894341 0.30±0.07 3.5e-05 0.08±0.06 0.18 76,900 0.17±0.05 0.00027 0.28 11
rs1930025 G A 9 27896255 0.24±0.07 0.00027 -0.03±0.06 0.61 76,902 0.08±0.04 0.05 0.012 38
rs10812686 A C 9 27899600 0.31±0.07 2.9e-05 0.08±0.06 0.18 76,986 0.17±0.05 0.00025 0.26 13
rs10812687 A G 9 27902332 0.29±0.07 6.2e-05 0.08±0.06 0.2 76,981 0.16±0.05 0.00044 0.23 14
rs10491823 T A 9 27904937 0.31±0.07 1.9e-05 0.08±0.06 0.21 76,980 0.17±0.05 0.00027 0.23 14
rs10491824 G A 9 27905195 0.32±0.07 1.2e-05 0.08±0.06 0.21 76,979 0.17±0.05 0.00021 0.24 14
rs10757695 T C 9 27905464 0.25±0.07 0.00013 -0.03±0.06 0.55 76,974 0.09±0.04 0.045 0.011 39
rs10812690 A G 9 27905481 0.32±0.07 1.6e-05 0.08±0.06 0.22 76,986 0.17±0.05 0.00024 0.24 14
rs13286150 G A 9 27911436 0.28±0.07 4.7e-05 -0.04±0.06 0.54 76,978 0.09±0.04 0.035 0.011 39
rs10757697 A G 9 27915198 0.24±0.07 0.00028 -0.02±0.06 0.73 77,016 0.09±0.04 0.04 0.013 38
rs1930037 G A 9 27916612 0.24±0.07 0.00028 -0.02±0.06 0.71 77,016 0.09±0.04 0.042 0.012 38
rs1930038 C T 9 27916755 0.25±0.07 0.00021 -0.02±0.06 0.67 77,011 0.09±0.04 0.041 0.011 39
rs10812695 G T 9 27917133 0.35±0.07 9.4e-07 0.03±0.06 0.62 77,015 0.15±0.04 0.00054 0.12 22
rs1953037 G C 9 27917768 0.35±0.07 9.5e-07 0.02±0.06 0.7 76,961 0.15±0.04 0.00073 0.14 21
rs1953038 G A 9 27918087 0.35±0.07 1e-06 0.03±0.06 0.67 76,985 0.15±0.04 0.00065 0.16 19
rs1953039 C A 9 27918101 0.35±0.07 8.9e-07 0.02±0.06 0.68 76,985 0.15±0.04 0.00067 0.17 18
rs10812697 A C 9 27918282 0.26±0.07 0.00012 -0.03±0.06 0.66 76,990 0.09±0.04 0.033 0.011 38
rs10968200 A G 9 27918472 0.25±0.07 0.00021 -0.03±0.06 0.66 76,990 0.09±0.04 0.041 0.017 36
rs10491825 T C 9 27920953 0.33±0.07 3.6e-06 0.03±0.06 0.64 76,781 0.15±0.04 0.0011 0.25 13
rs10968202 G A 9 27921179 0.36±0.07 4.6e-07 0.02±0.06 0.71 76,981 0.16±0.05 0.00058 0.17 18
rs10812698 A G 9 27922763 0.25±0.07 0.00028 -0.03±0.06 0.64 76,988 0.08±0.04 0.049 0.020 36
rs10968206 G A 9 27924303 0.34±0.07 1.8e-06 0.03±0.06 0.68 76,985 0.15±0.05 0.00088 0.17 18
rs10968210 A G 9 27924992 0.24±0.07 0.00042 -0.04±0.06 0.55 76,951 0.08±0.04 0.074 0.029 33
rs10812699 G C 9 27925096 0.25±0.07 0.0002 -0.01±0.06 0.8 76,947 0.09±0.04 0.047 0.028 34
rs10968212 G A 9 27925252 0.27±0.07 0.00013 -0.04±0.06 0.53 74,778 0.08±0.04 0.057 0.021 36
rs2383748 C T 9 27926360 0.32±0.07 1.6e-05 0.06±0.06 0.33 76,962 0.16±0.05 0.00056 0.26 13
rs10968213 A C 9 27926769 0.31±0.07 3.3e-05 0.06±0.06 0.33 76,966 0.16±0.05 0.00083 0.24 14
rs10812700 G C 9 27926991 0.32±0.07 1.7e-05 0.08±0.06 0.21 76,958 0.17±0.05 0.00024 0.32 9
rs2804 A T 9 27927750 0.32±0.07 2e-05 0.06±0.06 0.34 76,970 0.16±0.05 0.00065 0.25 13
rs12375721 T C 9 27927803 0.31±0.07 3.4e-05 0.06±0.06 0.34 76,970 0.16±0.05 0.00087 0.23 14
rs12375740 A G 9 27927887 0.32±0.08 2.6e-05 0.06±0.06 0.36 76,917 0.16±0.05 0.00082 0.25 13
rs2383749 C T 9 27927912 0.31±0.07 3.2e-05 0.06±0.06 0.34 76,970 0.16±0.05 0.00086 0.24 14
rs1930041 A G 9 27928615 0.31±0.07 2.9e-05 0.06±0.06 0.36 76,970 0.16±0.05 0.00089 0.22 15
rs10812702 C T 9 27929856 0.30±0.07 4.7e-05 0.05±0.06 0.4 76,955 0.15±0.05 0.0014 0.24 14
rs10968215 T C 9 27930160 0.33±0.08 1.7e-05 0.05±0.06 0.42 74,779 0.16±0.05 0.001 0.15 20
rs1930047 C T 9 27933323 0.31±0.08 3.2e-05 0.05±0.06 0.4 76,892 0.15±0.05 0.0012 0.30 10
rs11037491 T G 11 43574122 0.30±0.09 0.00062 -0.04±0.08 0.58 77,016 0.09±0.05 0.088 0.81 0
rs12800492 A G 11 43575637 0.30±0.09 0.00071 -0.04±0.08 0.58 77,013 0.09±0.05 0.087 0.75 0
rs11037498 G T 11 43584409 0.32±0.09 0.00046 -0.04±0.08 0.55 77,007 0.09±0.06 0.091 0.75 0
rs7936371 G A 11 43589915 0.31±0.08 7.3e-05 -0.06±0.07 0.36 77,000 0.09±0.05 0.07 0.56 0
rs2902373 T C 11 43594139 0.31±0.08 7.6e-05 -0.06±0.07 0.36 77,000 0.09±0.05 0.076 0.53 0
rs11037530 C G 11 43613252 0.29±0.08 0.00017 -0.03±0.07 0.67 77,016 0.10±0.05 0.041 0.71 0
rs11037532 C T 11 43614142 0.28±0.08 0.00027 -0.05±0.07 0.46 77,015 0.08±0.05 0.084 0.49 0
rs9783372 G A 11 43616872 0.30±0.08 0.00013 -0.06±0.07 0.38 77,019 0.08±0.05 0.084 0.63 0
rs11037536 G A 11 43621394 0.27±0.08 0.00038 -0.05±0.07 0.44 77,017 0.08±0.05 0.1 0.58 0
rs11037539 C A 11 43630795 0.29±0.08 0.00026 -0.06±0.07 0.35 77,020 0.08±0.05 0.12 0.77 0
rs7129385 A G 11 43641361 0.29±0.08 0.00018 -0.06±0.07 0.4 77,000 0.08±0.05 0.086 0.52 0
rs11037545 A G 11 43644987 0.30±0.08 0.0002 -0.06±0.07 0.35 77,022 0.08±0.05 0.11 0.77 0
rs7105746 G A 11 43648160 0.29±0.08 0.00025 -0.07±0.07 0.31 77,024 0.07±0.05 0.12 0.70 0
rs9804429 A T 11 43706164 0.34±0.08 4.1e-05 0.01±0.07 0.94 76,543 0.14±0.05 0.0081 0.48 0
rs9971430 T C 11 43712356 0.33±0.08 4.5e-05 -0.00±0.06 0.99 76,892 0.13±0.05 0.01 0.44 2
rs10838159 G A 11 43718786 0.33±0.08 4.7e-05 -0.00±0.06 0.99 76,918 0.13±0.05 0.01 0.41 3
rs10742688 G A 11 43724830 0.26±0.07 0.00034 -0.04±0.06 0.49 76,974 0.08±0.05 0.075 0.41 3
rs7115970 T C 11 43725871 0.26±0.07 0.00036 -0.04±0.06 0.5 76,974 0.08±0.05 0.077 0.42 3
rs10768976 G T 11 43730389 0.27±0.07 0.00033 -0.04±0.06 0.5 76,672 0.08±0.05 0.077 0.38 5
rs4643069 G A 11 43732310 0.26±0.07 0.00035 -0.04±0.06 0.51 76,672 0.08±0.05 0.077 0.40 4
rs7110115 A G 11 43736769 0.30±0.08 0.00019 0.00±0.06 0.99 74,920 0.12±0.05 0.019 0.43 3
rs7110437 A C 11 43737129 0.26±0.07 0.00044 -0.04±0.06 0.53 76,672 0.08±0.05 0.078 0.38 6
rs12273608 A G 11 43741212 0.35±0.08 1.8e-05 -0.00±0.07 0.96 77,014 0.14±0.05 0.0079 0.21 16
rs11037609 T C 11 43743055 0.34±0.08 2.8e-05 -0.00±0.07 0.98 77,014 0.13±0.05 0.0088 0.24 14
rs6485460 T G 11 43757265 0.34±0.08 3.3e-05 -0.01±0.07 0.92 76,715 0.13±0.05 0.012 0.23 14
rs10400325 A G 11 43763578 0.32±0.08 7.8e-05 -0.02±0.07 0.75 76,688 0.11±0.05 0.025 0.35 7
rs10400390 G T 11 43763968 0.33±0.08 4.7e-05 -0.01±0.07 0.89 76,715 0.12±0.05 0.015 0.25 13
rs4755744 A C 11 43774306 0.30±0.08 7.7e-05 -0.01±0.07 0.94 57,326 0.15±0.05 0.0057 0.52 0
rs10838172 A C 11 43776784 0.32±0.08 4.9e-05 -0.01±0.07 0.89 76,701 0.12±0.05 0.015 0.26 12
rs11037654 A G 11 43785335 0.33±0.08 3e-05 -0.01±0.07 0.89 77,025 0.12±0.05 0.013 0.25 13
rs10400343 A G 11 43789081 0.30±0.08 8.8e-05 -0.01±0.07 0.94 57,326 0.15±0.05 0.0061 0.55 0
rs2037296 A G 11 43792647 0.33±0.08 4.4e-05 -0.01±0.07 0.85 76,714 0.12±0.05 0.017 0.23 15
rs7114842 C A 11 43796007 0.34±0.07 4e-06 -0.03±0.06 0.7 76,986 0.12±0.05 0.0086 0.18 17
rs2869030 T G 15 76498858 0.57±0.09 1.3e-10 0.59±0.08 5.6e-15 76,775 0.59±0.06 4.6e-24 0.29 10
rs4887053 C A 15 76499754 0.59±0.08 1.7e-13 0.66±0.07 1.2e-20 77,005 0.63±0.05 1.5e-32 0.052 29
rs2869032 T C 15 76501616 0.61±0.08 2.9e-13 0.66±0.07 1.2e-20 77,004 0.64±0.05 2.6e-32 0.055 29
rs2869045 C T 15 76505954 0.62±0.08 1.1e-13 0.66±0.07 1.3e-20 77,005 0.65±0.05 1.1e-32 0.066 28
rs2568498 A T 15 76508987 0.20±0.06 0.0015 0.17±0.06 0.0046 76,824 0.18±0.04 2.5e-05 0.30 10
rs1394371 T C 15 76511524 0.61±0.07 2.3e-16 0.60±0.06 3.6e-21 76,969 0.61±0.05 6.9e-36 0.0096 39
rs2568500 C T 15 76513983 0.18±0.07 0.0053 0.29±0.08 0.00014 57,609 0.23±0.05 3.6e-06 0.60 0
rs17483548 A G 15 76517368 0.71±0.07 2e-24 0.66±0.06 2.2e-27 77,016 0.68±0.05 6.3e-50 0.061 28
rs17405217 T C 15 76518204 0.71±0.07 8.1e-25 0.66±0.06 1.9e-27 77,017 0.68±0.05 2.5e-50 0.056 29
rs924840 A T 15 76518863 0.58±0.08 1.1e-13 0.62±0.07 4.9e-19 76,995 0.60±0.05 4e-31 0.092 25
rs2938671 G A 15 76519809 0.59±0.08 8.3e-14 0.62±0.07 3.1e-19 76,996 0.61±0.05 1.9e-31 0.095 25
rs17483721 C T 15 76520786 0.71±0.07 7.4e-25 0.67±0.06 8.2e-28 77,022 0.69±0.05 9.1e-51 0.063 28
rs1847529 A C 15 76522125 0.21±0.06 0.0012 0.18±0.06 0.0017 76,829 0.19±0.04 7e-06 0.27 12
rs8041628 C G 15 76522410 0.21±0.06 0.0013 0.15±0.06 0.011 76,824 0.17±0.04 6e-05 0.15 20
rs2568488 A T 15 76523648 0.61±0.08 1.6e-13 0.62±0.07 3.1e-19 77,024 0.62±0.05 3.7e-31 0.093 25
rs2656052 C A 15 76527987 0.72±0.07 6.6e-25 0.67±0.06 2.8e-28 77,023 0.69±0.05 2.7e-51 0.067 28
rs2568494 A G 15 76528019 0.71±0.07 5.7e-25 0.67±0.06 2.4e-28 77,023 0.69±0.05 2e-51 0.053 29
rs7181486 C T 15 76528673 0.72±0.07 5.3e-25 0.67±0.06 2.2e-28 77,019 0.69±0.05 1.7e-51 0.059 28
rs2656073 G T 15 76529331 0.61±0.08 6.4e-14 0.62±0.07 4.7e-19 77,020 0.62±0.05 2.3e-31 0.089 25
rs17483929 A G 15 76529431 0.72±0.07 4.2e-25 0.66±0.06 8.9e-28 77,016 0.69±0.05 6e-51 0.086 25
rs10519198 C A 15 76529809 0.21±0.06 0.0012 0.19±0.06 0.00083 76,825 0.20±0.04 3.4e-06 0.29 10
rs2958719 A G 15 76530084 0.62±0.09 7.7e-13 0.60±0.07 4.2e-16 76,937 0.61±0.06 2.5e-27 0.20 17
rs12909921 A G 15 76530315 0.21±0.06 0.0012 0.19±0.06 0.00099 76,829 0.20±0.04 4.2e-06 0.31 9
rs12910090 A C 15 76530355 0.21±0.06 0.0014 0.19±0.06 0.00094 76,829 0.20±0.04 4.3e-06 0.32 9
rs2656071 A T 15 76532398 0.59±0.08 9.6e-14 0.62±0.07 5e-19 77,023 0.61±0.05 3.5e-31 0.090 25
rs2656069 T C 15 76532762 0.59±0.08 7.4e-14 0.61±0.07 2.1e-18 77,024 0.60±0.05 1.1e-30 0.047 30
rs2656065 A G 15 76537604 0.72±0.07 3.2e-25 0.68±0.06 2.7e-29 77,023 0.70±0.05 1.2e-52 0.090 25
rs2568483 A G 15 76539398 0.58±0.08 1.8e-13 0.62±0.07 2.8e-19 77,024 0.61±0.05 3.7e-31 0.076 26
rs11639224 A G 15 76540426 0.21±0.06 0.001 0.20±0.06 0.00066 76,829 0.20±0.04 2.4e-06 0.34 8
rs1964678 G A 15 76541055 0.48±0.07 5.6e-13 0.56±0.06 5e-21 77,027 0.52±0.04 2.5e-32 0.29 11
rs2009746 G A 15 76541157 0.72±0.07 3.4e-25 0.70±0.06 6.1e-30 77,010 0.71±0.05 2.7e-53 0.096 24
rs2938674 C A 15 76544968 0.59±0.08 1.1e-13 0.62±0.07 4.2e-19 77,025 0.61±0.05 3.3e-31 0.091 25
rs17484235 G C 15 76548469 0.72±0.07 6.3e-25 0.54±0.06 7.1e-19 77,009 0.61±0.05 5.2e-41 3.0e-08 65
rs4299116 A T 15 76553249 0.49±0.07 3.2e-13 0.56±0.06 5.5e-21 77,027 0.53±0.04 1.4e-32 0.29 10
rs1504550 G A 15 76553305 0.73±0.07 2e-25 0.70±0.06 2.1e-30 77,008 0.71±0.05 5.1e-54 0.11 23
rs12910910 T C 15 76554905 0.48±0.07 4e-13 0.56±0.06 5.8e-21 77,027 0.53±0.04 2e-32 0.32 9
rs8043227 G C 15 76555926 0.48±0.07 5.5e-13 0.21±0.06 0.00039 77,011 0.32±0.04 1.6e-13 1.6e-11 71
rs11072766 C T 15 76558601 0.61±0.08 3e-14 0.62±0.07 3.3e-19 77,025 0.62±0.05 7.4e-32 0.11 23
rs17484524 G A 15 76559731 0.73±0.07 1.9e-25 0.71±0.06 8.6e-31 77,008 0.72±0.05 2e-54 0.12 22
rs8042238 T C 15 76561326 0.49±0.07 2.4e-13 0.56±0.06 5.8e-21 77,027 0.53±0.04 1.1e-32 0.28 11
rs8042260 G A 15 76561429 0.48±0.07 3.2e-13 0.56±0.06 5.8e-21 77,027 0.53±0.04 1.6e-32 0.31 10
rs12903295 G A 15 76566027 0.49±0.07 1e-12 0.56±0.06 6.3e-21 77,027 0.53±0.04 5e-32 0.29 10
rs12904234 T C 15 76566439 0.48±0.07 8.7e-13 0.56±0.06 6.5e-21 77,027 0.52±0.04 4.7e-32 0.33 8
rs965604 A G 15 76576278 0.48±0.07 3.8e-13 0.55±0.06 7.1e-21 77,027 0.52±0.04 2.3e-32 0.30 10
rs13180 T C 15 76576543 0.48±0.07 4.4e-13 0.55±0.06 1.1e-20 77,025 0.52±0.04 4.2e-32 0.18 18
rs1062980 T C 15 76579582 0.49±0.07 2e-13 0.56±0.06 5.3e-21 77,027 0.53±0.04 8.7e-33 0.24 14
rs4362358 T C 15 76583159 0.49±0.07 3.6e-13 0.58±0.06 4.5e-22 77,027 0.54±0.04 1.5e-33 0.31 9
rs5019044 T A 15 76583337 0.61±0.08 1.4e-13 0.65±0.07 9.4e-21 77,025 0.64±0.05 9.4e-33 0.076 27
rs9788682 G A 15 76589641 0.69±0.08 1e-17 0.75±0.07 1.3e-25 76,959 0.73±0.05 1.2e-41 0.036 32
rs9788721 C T 15 76589924 0.82±0.07 8.4e-33 0.77±0.06 3.1e-37 77,022 0.79±0.05 5e-68 0.18 18
rs7164594 C T 15 76590112 0.68±0.08 2.7e-16 0.76±0.07 7.6e-27 76,978 0.73±0.05 1.8e-41 0.012 38
rs8034191 C T 15 76593078 0.82±0.07 7.9e-33 0.77±0.06 4.3e-37 76,967 0.79±0.05 6.8e-68 0.18 17
rs12591557 A G 15 76598787 0.22±0.07 0.001 0.30±0.08 8.6e-05 57,614 0.25±0.05 3.9e-07 0.45 1
rs10519203 G A 15 76601101 0.82±0.07 8.7e-33 0.76±0.06 1.6e-36 77,010 0.78±0.05 3.1e-67 0.15 20
rs7163730 A G 15 76601736 0.69±0.08 6.1e-18 0.75±0.07 1.1e-26 77,010 0.73±0.05 6.1e-43 0.033 33
rs8031948 T G 15 76603112 0.82±0.07 9.5e-33 0.77±0.06 2.4e-37 77,000 0.79±0.05 4.6e-68 0.17 18
rs4461039 A T 15 76604502 0.70±0.08 1.5e-18 0.76±0.07 4.4e-27 77,018 0.73±0.05 5.8e-44 0.026 34
rs3885951 G A 15 76612972 0.54±0.12 1.3e-05 0.71±0.10 4.5e-12 76,684 0.64±0.08 4.2e-16 0.00026 52
rs931794 G A 15 76613235 0.83±0.07 5.2e-33 0.78±0.06 7.5e-38 76,993 0.80±0.05 7.3e-69 0.17 18
rs2036534 T C 15 76614003 0.69±0.08 2.6e-18 0.74±0.07 3.3e-26 77,007 0.72±0.05 7.2e-43 0.0040 43
rs3813570 T C 15 76619887 0.66±0.08 1.6e-16 0.75±0.07 1.1e-26 76,955 0.71±0.05 1.7e-41 0.0068 41
rs4243083 G C 15 76620885 0.23±0.07 0.00062 -0.05±0.08 0.51 57,614 0.10±0.05 0.04 0.0027 48
rs4887063 T C 15 76626770 0.24±0.06 0.00018 0.22±0.06 0.00019 76,791 0.23±0.04 1.3e-07 0.32 9
rs1979907 C T 15 76629294 0.29±0.07 8.7e-06 0.23±0.06 8.8e-05 76,970 0.26±0.04 4.4e-09 0.45 1
rs1979905 C A 15 76629429 0.29±0.07 9.7e-06 0.24±0.06 8.5e-05 76,932 0.26±0.04 4.4e-09 0.38 5
rs4887064 C G 15 76629902 0.29±0.07 1.2e-05 0.16±0.06 0.0062 76,970 0.22±0.04 8.2e-07 0.13 21
rs12907966 C T 15 76630106 0.29±0.07 1.2e-05 0.23±0.06 8.6e-05 76,971 0.26±0.04 5.4e-09 0.49 0
rs880395 G A 15 76631411 0.28±0.07 1.4e-05 0.23±0.06 9e-05 76,970 0.26±0.04 6.5e-09 0.49 0
rs905740 C T 15 76631441 0.28±0.07 1.4e-05 0.23±0.06 8.7e-05 76,970 0.26±0.04 6.4e-09 0.49 0
rs7164030 A G 15 76631716 0.28±0.07 1.6e-05 0.23±0.06 8.9e-05 76,970 0.25±0.04 7.2e-09 0.48 0
rs4275821 T C 15 76636596 0.27±0.07 4.7e-05 0.32±0.06 2.8e-07 76,929 0.30±0.05 5.9e-11 0.85 0
rs7173512 T C 15 76636969 0.28±0.07 4e-05 0.32±0.06 2.7e-07 76,929 0.30±0.05 4.8e-11 0.85 0
rs2036527 A G 15 76638670 0.83±0.07 2.3e-31 0.80±0.06 4.6e-38 76,912 0.81±0.05 1.6e-67 0.091 25
rs684513 C G 15 76645455 0.67±0.08 2.9e-15 0.63±0.07 9.6e-19 76,896 0.65±0.05 2.6e-32 4.6e-08 65
rs667282 T C 15 76650527 0.70±0.08 5.4e-18 0.74±0.07 3.4e-26 76,763 0.72±0.05 1.6e-42 0.015 37
rs17486278 C A 15 76654537 0.83±0.07 4.6e-31 0.79±0.06 1.4e-37 76,939 0.80±0.05 1e-66 0.062 28
rs569207 C T 15 76660174 0.72±0.08 4.1e-18 0.61±0.09 6.9e-12 57,573 0.67±0.06 3.7e-28 0.026 37
rs637137 T A 15 76661031 0.71±0.08 2e-18 0.73±0.07 5.8e-26 76,764 0.72±0.05 1e-42 0.018 36
rs7180002 T A 15 76661048 0.83±0.07 6.9e-31 0.79±0.06 1.2e-37 76,943 0.80±0.05 1.4e-66 0.066 28
rs951266 A G 15 76665596 0.83±0.07 1.9e-30 0.78±0.06 3.7e-36 74,879 0.80±0.05 1.4e-64 0.064 29
rs16969968 A G 15 76669980 0.82±0.07 5.4e-32 0.77±0.06 7.6e-38 76,996 0.79±0.05 7e-68 0.089 25
rs518425 A G 15 76670868 0.65±0.07 4.4e-18 0.59±0.06 3e-20 76,790 0.62±0.05 1.5e-36 0.49 0
rs578776 G A 15 76675455 0.64±0.08 2e-17 0.62±0.06 6.7e-22 76,791 0.63±0.05 1.3e-37 0.49 0
rs12910984 A G 15 76678682 0.71±0.08 1.7e-18 0.72±0.07 6.4e-26 76,995 0.72±0.05 9.6e-43 0.0052 42
rs1051730 A G 15 76681394 0.84±0.07 2.1e-33 0.78±0.06 5.6e-38 76,972 0.80±0.05 2.4e-69 0.035 32
rs3743078 G C 15 76681814 0.72±0.08 2.8e-19 0.33±0.07 1.7e-06 76,995 0.48±0.05 6.8e-21 3.7e-18 78
rs1317286 G A 15 76683184 0.81±0.07 4.8e-32 0.77±0.06 7.5e-38 77,002 0.79±0.05 6.2e-68 0.091 25
rs938682 A G 15 76683602 0.72±0.08 1.9e-19 0.72±0.07 6.4e-26 77,006 0.72±0.05 1.1e-43 0.0081 40
rs12914385 T C 15 76685778 0.78±0.07 2.4e-31 0.77±0.06 2.1e-39 76,697 0.78±0.04 6.7e-69 0.061 29
rs11637630 A G 15 76686774 0.72±0.08 2.9e-19 0.74±0.07 6.2e-27 76,993 0.73±0.05 1.6e-44 0.016 37
rs7177514 C G 15 76694461 0.70±0.08 1.4e-17 0.63±0.07 3.7e-20 77,005 0.66±0.05 6.6e-36 2.4e-06 60
rs6495308 T C 15 76694711 0.71±0.08 1.2e-18 0.74±0.07 6.1e-27 77,005 0.73±0.05 6.3e-44 0.016 37
rs12443170 G A 15 76694791 0.63±0.11 6.2e-09 0.78±0.10 3.4e-16 76,566 0.72±0.07 1.7e-23 0.064 28
rs8042059 A C 15 76694914 0.72±0.08 1.7e-19 0.73±0.07 1.7e-26 77,004 0.73±0.05 2.6e-44 0.014 37
rs8042374 A G 15 76695087 0.71±0.08 8.5e-19 0.72±0.07 1.3e-25 77,005 0.71±0.05 9.5e-43 0.0072 40
rs4887069 A G 15 76696125 0.72±0.09 3.8e-17 0.71±0.07 4.6e-24 76,962 0.72±0.05 1.6e-39 0.020 36
rs3743075 C T 15 76696507 0.25±0.07 0.00028 0.27±0.06 6.1e-06 76,657 0.26±0.05 6.7e-09 0.83 0
rs8040868 C T 15 76698236 0.77±0.07 1.9e-29 0.74±0.08 2.5e-22 57,303 0.76±0.05 5.8e-50 0.085 29
rs6495309 C T 15 76702300 0.71±0.08 1.6e-18 0.71±0.07 5.2e-24 76,977 0.71±0.05 7.4e-41 0.014 38
rs17487223 T C 15 76711042 0.78±0.07 9.2e-29 0.75±0.06 3.1e-34 76,928 0.76±0.05 3.9e-61 0.27 12
rs12440014 C G 15 76713781 0.72±0.08 8.4e-19 0.64±0.07 8.8e-20 76,583 0.67±0.05 1.1e-36 3.3e-06 60
rs12441088 T G 15 76715319 0.62±0.09 1e-12 0.77±0.07 2.1e-26 74,467 0.71±0.06 2.6e-37 0.0066 41
rs11636605 G A 15 76715933 0.59±0.09 2.7e-10 0.67±0.07 1.6e-19 74,454 0.64±0.06 3.1e-28 0.032 33
rs12441998 A G 15 76716427 0.59±0.09 2.4e-10 0.66±0.07 1e-19 74,704 0.64±0.06 1.8e-28 0.020 36
rs11072768 G T 15 76716533 0.59±0.09 2.7e-10 0.66±0.07 1.9e-19 74,730 0.64±0.06 3.5e-28 0.038 32
rs1316971 G A 15 76717565 0.59±0.09 1.8e-10 0.65±0.07 4.3e-19 76,883 0.63±0.06 5.5e-28 0.046 30
rs9920506 G A 15 76718112 0.58±0.11 1.4e-07 0.63±0.09 3e-13 74,393 0.61±0.07 2.4e-19 0.079 27
rs11634351 A G 15 76731773 0.64±0.07 1.2e-20 0.65±0.06 5e-27 76,728 0.64±0.05 5.5e-46 0.19 17
rs8023822 G C 15 76732095 0.29±0.09 0.00078 0.19±0.07 0.0073 76,917 0.23±0.06 2.8e-05 5.4e-06 59
rs12594247 C T 15 76733688 0.43±0.09 6e-06 0.61±0.08 2.8e-14 76,519 0.54±0.06 1.9e-18 0.061 29
rs1021070 C G 15 76733918 0.41±0.07 2e-09 0.20±0.06 0.0013 76,626 0.29±0.05 2.6e-10 0.046 31
rs7181405 A G 15 76735207 0.41±0.07 2.9e-09 0.32±0.06 1.9e-07 76,633 0.36±0.05 6e-15 0.78 0
rs11638830 C G 15 76735374 0.61±0.07 2e-19 0.26±0.06 1e-05 76,806 0.40±0.04 5e-20 1.9e-15 76
rs4887074 C G 15 76739165 0.32±0.08 5.7e-05 0.43±0.07 5.4e-10 76,971 0.38±0.05 1.9e-13 0.025 34
rs11072774 C T 15 76739752 0.50±0.10 1.4e-07 0.59±0.08 3.3e-14 74,816 0.56±0.06 3e-20 0.034 33
rs17487514 T C 15 76740840 0.59±0.08 7.6e-13 0.65±0.07 1.5e-19 76,505 0.63±0.05 8.4e-31 0.13 21
rs12899135 G A 15 76741434 0.60±0.07 1.7e-19 0.59±0.06 2.9e-24 76,807 0.60±0.04 4.7e-42 0.21 16
rs12148319 A G 15 76743247 0.43±0.10 2.8e-05 0.56±0.09 1.5e-10 76,675 0.51±0.07 2.6e-14 0.060 29
rs12910237 C T 15 76743393 0.40±0.07 8.4e-09 0.30±0.06 1.1e-06 76,656 0.34±0.05 1e-13 0.71 0
rs1996371 C T 15 76743861 0.61±0.07 4.6e-20 0.59±0.06 2.9e-24 76,808 0.60±0.04 1.4e-42 0.22 15
rs12594550 C G 15 76746092 0.44±0.10 4.2e-06 0.46±0.09 8.8e-08 76,985 0.45±0.06 1.7e-12 0.0067 41
rs6495314 C A 15 76747584 0.59±0.07 1.1e-18 0.59±0.06 2.1e-24 76,809 0.59±0.04 2.1e-41 0.20 16
rs922691 A G 15 76751049 0.37±0.07 7.5e-08 0.28±0.06 7.2e-06 76,630 0.32±0.05 5.2e-12 0.56 0
rs12905641 C T 15 76751417 0.36±0.07 2.6e-07 0.31±0.06 6e-07 76,680 0.33±0.05 9.8e-13 0.88 0
rs11072784 C T 15 76753113 0.25±0.09 0.0065 0.30±0.08 9.3e-05 76,946 0.28±0.06 2e-06 0.24 13
rs11639372 T C 15 76753710 0.61±0.07 1.8e-19 0.59±0.06 8.1e-24 76,808 0.59±0.04 1.4e-41 0.26 12
rs12902602 G A 15 76754456 0.61±0.07 1.1e-19 0.59±0.06 8.9e-24 76,808 0.60±0.04 1e-41 0.22 15
rs1021071 C G 15 76755234 0.62±0.07 6.9e-20 0.25±0.06 2.5e-05 76,808 0.40±0.04 1.4e-19 8.7e-15 76
rs11072785 T C 15 76755284 0.60±0.07 3.6e-19 0.59±0.06 1.1e-23 76,808 0.59±0.04 4e-41 0.26 12
rs11857532 G T 15 76755323 0.54±0.07 4.3e-16 0.49±0.06 5.1e-17 76,931 0.51±0.04 2.5e-31 0.40 4
rs4886580 G T 15 76756440 0.61±0.07 1.1e-19 0.59±0.06 1.3e-23 76,808 0.60±0.04 1.4e-41 0.26 12
rs16970006 T C 15 76757314 0.64±0.15 1.1e-05 0.56±0.12 1.7e-06 74,571 0.59±0.09 9e-11 0.017 37
rs8032552 T C 15 76758191 0.31±0.08 7.8e-05 0.41±0.07 2.2e-09 77,005 0.37±0.05 9.7e-13 0.24 13
rs11072787 T C 15 76760032 0.32±0.08 3.2e-05 0.41±0.07 1e-09 77,002 0.38±0.05 1.9e-13 0.20 16
rs8043123 C T 15 76760448 0.30±0.08 0.00011 0.40±0.07 3.6e-09 76,995 0.36±0.05 2.3e-12 0.25 13
rs8038920 G A 15 76761600 0.38±0.07 7.2e-08 0.30±0.06 7.3e-07 76,699 0.33±0.05 4.3e-13 0.90 0
rs4887077 T C 15 76765419 0.61±0.07 2.2e-19 0.57±0.06 1.5e-22 76,829 0.58±0.04 3.8e-40 0.27 12
rs11638372 T C 15 76770614 0.60±0.07 5.3e-19 0.57±0.06 1.9e-22 76,827 0.58±0.04 1.1e-39 0.31 10
rs922692 A C 15 76771269 0.60±0.07 2.7e-19 0.57±0.06 8.9e-23 76,813 0.58±0.04 2.6e-40 0.29 11
rs12910627 C G 15 76781988 0.60±0.07 2.3e-19 0.24±0.06 4.9e-05 76,808 0.39±0.04 7.7e-19 5.4e-14 75
rs11072791 A C 15 76784131 0.61±0.07 1.4e-19 0.58±0.06 8.6e-23 76,805 0.59±0.04 1.3e-40 0.31 10
rs11633519 G A 15 76786607 0.35±0.08 5.3e-06 0.36±0.07 9.7e-08 76,992 0.36±0.05 2.4e-12 0.57 0
rs12899940 T C 15 76788754 0.38±0.08 1.1e-06 0.36±0.07 4.9e-08 77,015 0.37±0.05 2.6e-13 0.54 0
rs11634628 A G 15 76792634 0.38±0.08 6.6e-07 0.36±0.07 4.5e-08 77,018 0.37±0.05 1.6e-13 0.56 0
rs11072793 A G 15 76793497 0.39±0.08 3.1e-07 0.35±0.07 7.5e-08 76,825 0.37±0.05 1.3e-13 0.44 2
rs11072794 C T 15 76793637 0.39±0.08 5e-07 0.36±0.07 4.6e-08 77,015 0.37±0.05 1.2e-13 0.61 0
rs11638490 T C 15 76795005 0.61±0.07 1.1e-19 0.58±0.06 9.4e-23 76,805 0.59±0.04 1.1e-40 0.37 6
rs4887078 T C 15 76798128 0.41±0.08 2.2e-07 0.36±0.07 7e-08 76,824 0.38±0.05 9.5e-14 0.58 0
rs11629637 T C 15 76806079 0.61±0.07 9.6e-20 0.58±0.06 1.4e-22 76,994 0.59±0.04 1.4e-40 0.37 6
rs899997 T G 15 76806633 0.39±0.08 2.2e-06 0.35±0.07 3.1e-07 76,742 0.36±0.05 3.5e-12 0.57 0
rs3813565 T G 15 76806665 0.64±0.07 7.1e-20 0.61±0.06 3e-23 76,800 0.63±0.05 2.2e-41 0.20 16
rs4887082 C T 15 76812122 0.59±0.07 8e-18 0.58±0.06 1.6e-21 76,869 0.58±0.05 1.1e-37 0.46 1
rs1383634 C T 15 76816451 0.39±0.08 8.1e-07 0.35±0.07 3.4e-07 76,734 0.37±0.05 1.5e-12 0.56 0
rs2219939 A G 15 76816778 0.42±0.08 3.4e-07 0.36±0.07 2.1e-07 75,903 0.38±0.05 4.6e-13 0.61 0
rs4887091 C T 15 76830635 0.39±0.08 2.6e-06 0.34±0.07 6.7e-07 76,910 0.36±0.05 9.4e-12 0.73 0
rs7182567 G A 15 76832109 0.39±0.08 1.2e-06 0.35±0.07 5.6e-07 76,911 0.36±0.05 3.5e-12 0.71 0
rs12286 A G 15 76838814 0.58±0.07 5.3e-17 0.58±0.06 6.3e-21 76,863 0.58±0.05 2.9e-36 0.48 0
rs1809420 C T 15 76843824 0.56±0.07 5.2e-14 0.55±0.06 4.3e-19 72,292 0.55±0.05 1.8e-31 0.31 10
rs7174367 G A 15 76851722 0.57±0.07 1.2e-16 0.55±0.06 1.5e-19 76,866 0.56±0.05 1.7e-34 0.44 1
rs7171916 G C 15 76855006 0.54±0.07 3e-14 0.38±0.06 2e-09 76,717 0.44±0.05 2.4e-21 1.4e-05 57
rs1994016 T C 15 76867289 0.53±0.07 3.4e-14 0.40±0.06 6.2e-11 76,924 0.45±0.05 5.5e-23 0.21 15
rs1994017 C T 15 76867361 0.33±0.08 8.8e-06 0.26±0.07 0.00012 77,004 0.29±0.05 6.7e-09 0.90 0
rs12905740 C T 15 76869419 0.32±0.07 1.7e-05 0.26±0.07 0.00013 76,703 0.28±0.05 1.2e-08 0.90 0
rs2277545 C T 15 76870646 0.50±0.07 7.7e-14 0.41±0.06 2.8e-12 76,965 0.45±0.04 3.1e-24 0.16 19
rs1564499 C T 15 76871863 0.33±0.08 9.5e-06 0.25±0.07 0.0002 76,709 0.28±0.05 1.3e-08 0.92 0
rs12903203 C T 15 76871988 0.51±0.07 2.9e-14 0.41±0.06 3.5e-12 76,965 0.45±0.04 1.7e-24 0.15 20
rs2904228 G A 15 76873154 0.34±0.08 7.9e-06 0.26±0.07 0.00019 76,667 0.29±0.05 1e-08 0.88 0
rs3743057 C T 15 76876062 0.32±0.07 1.6e-05 0.23±0.07 0.00037 76,717 0.27±0.05 3.9e-08 0.88 0
rs3825807 G A 15 76876166 0.50±0.07 6.2e-14 0.41±0.06 4.2e-12 76,959 0.45±0.04 3.7e-24 0.16 19
rs7177699 C T 15 76876789 0.50±0.07 7.6e-14 0.42±0.06 7.9e-12 75,326 0.45±0.04 7.8e-24 0.14 21
rs8038189 C G 15 76886081 0.38±0.08 5.8e-07 0.18±0.07 0.0083 76,667 0.26±0.05 1.8e-07 0.34 7
rs922693 A G 15 76886593 0.40±0.08 1.8e-07 0.26±0.07 0.00011 76,662 0.32±0.05 2.9e-10 0.77 0
rs11634042 T C 15 76892405 0.54±0.07 3.2e-14 0.37±0.08 3e-06 57,483 0.46±0.05 3.3e-18 0.048 33
rs1383636 A G 15 76893275 0.41±0.08 8.6e-08 0.26±0.07 0.00013 76,902 0.33±0.05 1.9e-10 0.79 0
rs4380028 T C 15 76898148 0.31±0.07 4.6e-06 0.29±0.06 6e-07 76,651 0.30±0.04 1.4e-11 0.020 36
rs6495335 G T 15 76904188 0.33±0.07 1.9e-06 0.29±0.06 5.6e-07 76,645 0.31±0.04 5.8e-12 0.027 34
rs7178051 T C 15 76905351 0.35±0.07 5.2e-07 0.30±0.06 6.1e-07 76,895 0.32±0.05 2.1e-12 0.043 31
rs7176187 T C 15 76908428 0.32±0.07 2.3e-06 0.31±0.06 1.6e-07 76,701 0.31±0.04 1.9e-12 0.029 34
rs11852830 A T 15 76912484 0.36±0.06 1.7e-08 0.36±0.06 6.1e-10 76,977 0.36±0.04 5.9e-17 0.28 11
rs6495337 C G 15 76912744 0.30±0.06 1.4e-06 0.16±0.06 0.005 76,999 -0.06±0.04 0.11 1.0e-08 66
rs8032771 A G 15 76913114 0.37±0.06 1.6e-08 0.36±0.06 6.2e-10 76,976 0.36±0.04 5.8e-17 0.30 10
rs4539564 G A 15 76915554 0.35±0.06 5.5e-08 0.36±0.06 6.5e-10 76,973 0.35±0.04 1.9e-16 0.34 8
rs8035039 A G 15 76916878 0.34±0.06 1.1e-07 0.36±0.06 6.3e-10 76,974 0.35±0.04 3.7e-16 0.34 8
rs11072810 T C 15 76919261 0.37±0.07 1.1e-08 0.35±0.06 3.8e-09 77,003 0.36±0.04 2.7e-16 0.65 0
rs11072811 A C 15 76919385 0.38±0.07 5.1e-09 0.34±0.06 7.1e-09 76,998 0.36±0.04 2.5e-16 0.58 0
rs7403393 C G 15 76922857 0.38±0.07 1.2e-08 0.27±0.06 7.6e-06 76,977 0.32±0.04 1.4e-12 0.035 32
rs7173743 C T 15 76928839 0.39±0.07 3e-09 0.34±0.06 1.1e-08 76,437 0.36±0.04 2.3e-16 0.53 0
rs7164529 G A 15 76932853 0.38±0.07 8.4e-09 0.37±0.06 1.1e-09 76,973 0.37±0.04 5.1e-17 0.56 0
rs5029904 G C 15 76939477 0.40±0.07 1.2e-09 0.25±0.06 2.2e-05 76,930 0.31±0.04 1e-12 0.045 30
rs12595538 A T 15 76941508 0.40±0.07 2.5e-09 0.38±0.06 1.1e-09 76,885 0.39±0.05 1.6e-17 0.50 0
rs8029659 A G 15 76954658 0.37±0.07 3.2e-07 0.29±0.07 1.7e-05 76,722 0.32±0.05 3.9e-11 0.46 1
rs17243470 T G 15 76959821 0.36±0.07 1.2e-06 0.30±0.07 1e-05 76,724 0.32±0.05 7.2e-11 0.52 0
rs17832351 A G 15 76960060 0.37±0.07 2.9e-07 0.30±0.07 1e-05 76,724 0.33±0.05 2e-11 0.53 0
rs8047986 C T 16 6461007 0.25±0.06 8.3e-05 0.13±0.06 0.026 76,946 0.18±0.04 2.1e-05 0.25 13
rs809704 A T 16 6461769 0.26±0.06 6.6e-05 0.14±0.06 0.021 76,946 0.19±0.04 1.3e-05 0.25 13
rs802698 C T 16 6463483 0.25±0.06 0.00011 0.14±0.06 0.018 76,695 0.19±0.04 1.5e-05 0.15 20
rs1640968 A C 16 6463672 0.26±0.06 6.2e-05 0.14±0.06 0.019 76,694 0.19±0.04 1.2e-05 0.22 15
rs42347 T A 16 6465592 0.25±0.06 0.00013 0.13±0.06 0.027 76,692 0.18±0.04 3e-05 0.34 7
rs7187508 A C 16 6473550 0.29±0.07 2e-05 0.08±0.08 0.33 57,294 0.19±0.05 0.00015 0.18 20
rs813914 T C 16 6474370 0.20±0.06 0.0015 0.08±0.06 0.18 76,710 0.13±0.04 0.0021 0.61 0
rs3095508 C A 16 6490401 0.23±0.07 0.00052 0.10±0.06 0.11 76,717 0.15±0.04 0.0005 0.42 3
rs811919 G T 16 6492268 0.23±0.07 0.00043 0.09±0.06 0.14 76,717 0.15±0.04 0.00063 0.38 5
rs11645855 G A 16 6798924 0.51±0.13 0.00012 -0.12±0.11 0.27 76,620 0.12±0.08 0.12 0.17 19
rs11648889 A G 16 6804585 0.63±0.16 6.2e-05 -0.13±0.13 0.32 76,408 0.15±0.09 0.11 0.063 29
rs4786123 G C 16 6813922 0.24±0.08 0.0027 -0.08±0.07 0.22 76,517 0.05±0.05 0.37 0.0018 47
rs741591 A G 16 81544689 0.35±0.08 8.1e-06 -0.09±0.06 0.13 74,571 0.06±0.04 0.15 0.083 26
rs9928039 G C 16 81555241 0.26±0.06 5.4e-05 0.01±0.06 0.91 77,001 0.12±0.04 0.0051 0.0037 43
rs12929479 A G 16 81555354 0.22±0.07 0.00082 0.07±0.06 0.23 76,949 0.14±0.04 0.002 0.12 22
rs8064211 G A 16 81558599 0.27±0.06 2.9e-05 0.05±0.06 0.43 77,025 0.14±0.04 0.00092 0.010 39
rs8062451 A G 16 81558637 0.27±0.06 2.4e-05 0.04±0.06 0.45 77,019 0.14±0.04 0.00088 0.012 38
rs8046196 T G 16 81558849 0.24±0.06 0.00024 0.10±0.06 0.1 76,951 0.16±0.04 0.00026 0.029 33
rs10492868 G C 16 81559356 0.30±0.06 4.5e-06 0.02±0.06 0.74 76,939 0.14±0.04 0.0014 0.0026 44
rs8059783 T G 16 81570343 0.30±0.06 3.4e-06 0.03±0.06 0.59 76,997 0.14±0.04 0.0006 0.0017 46
rs4782742 C A 16 81573151 0.31±0.06 1.8e-06 0.04±0.06 0.53 77,005 0.15±0.04 0.00033 0.0035 43
rs8063602 C A 16 81573922 0.32±0.06 8.2e-07 0.04±0.06 0.53 77,001 0.16±0.04 0.00023 0.0040 43
rs7404645 A G 16 81578029 0.28±0.07 0.00022 0.02±0.07 0.81 76,620 0.12±0.05 0.01 0.0054 42
rs8061888 A G 16 81578930 0.26±0.08 0.00067 -0.07±0.06 0.28 77,007 0.06±0.05 0.19 0.020 36
rs11150530 T A 16 81583815 0.29±0.08 0.00015 -0.01±0.07 0.91 76,547 0.12±0.05 0.018 0.0018 46
rs6565099 T C 16 81596483 0.30±0.06 3.1e-06 0.00±0.05 1 76,684 0.13±0.04 0.0024 0.0077 41
rs7206011 A T 16 81596720 0.28±0.06 9.2e-06 -0.04±0.06 0.5 76,929 0.10±0.04 0.017 0.047 30
rs7203988 C A 16 81600439 0.28±0.06 7.8e-06 0.00±0.06 0.94 76,992 0.12±0.04 0.0029 0.0061 41
rs6565100 T C 16 81600819 0.22±0.06 0.00037 0.01±0.06 0.91 76,976 0.10±0.04 0.016 0.026 34
rs12925746 G A 16 81602166 0.48±0.14 0.00046 0.00±0.11 0.96 74,537 0.17±0.08 0.033 0.32 9
rs5000155 T C 16 81615158 0.36±0.10 0.00063 -0.02±0.08 0.77 76,511 0.12±0.06 0.064 0.22 15
rs8057717 A C 16 81617414 0.44±0.10 6.1e-06 -0.07±0.08 0.36 76,406 0.12±0.06 0.045 0.049 31
rs12918209 C T 16 81617613 0.34±0.08 1.9e-05 -0.04±0.07 0.53 76,449 0.11±0.05 0.025 0.0021 46
rs9888896 C T 16 81622904 0.32±0.08 5.6e-05 -0.02±0.07 0.79 76,482 0.12±0.05 0.019 0.0037 44
rs6565105 A G 16 81623165 0.22±0.07 0.0012 -0.00±0.06 0.97 76,977 0.09±0.04 0.043 0.13 21
rs12934188 G A 16 81624733 0.39±0.10 6.4e-05 -0.07±0.08 0.41 76,685 0.10±0.06 0.072 0.37 6
rs12934355 G A 16 81624826 0.41±0.10 2.3e-05 -0.06±0.08 0.43 76,984 0.12±0.06 0.049 0.37 6
rs4366697 C G 16 81625484 0.37±0.08 1.6e-06 -0.06±0.07 0.37 76,370 0.12±0.05 0.014 0.0011 48
rs4329910 A G 16 81627132 0.33±0.09 0.00024 -0.07±0.08 0.37 77,010 0.09±0.05 0.12 0.32 9
rs4473177 T C 16 81627210 0.34±0.08 3.4e-05 -0.03±0.07 0.64 76,474 0.12±0.05 0.022 0.0026 45
rs10220997 T C 16 81627755 0.36±0.09 9.9e-05 -0.07±0.08 0.35 76,990 0.09±0.06 0.092 0.38 5
rs11150533 T G 16 81628647 0.37±0.09 8.4e-05 -0.08±0.08 0.33 76,988 0.09±0.06 0.099 0.37 6
rs4387591 G C 16 81629923 0.27±0.07 0.00014 -0.02±0.07 0.75 76,640 0.12±0.05 0.011 0.034 32
rs4290460 C A 16 81629963 0.36±0.08 2.6e-05 -0.03±0.07 0.65 76,368 0.12±0.05 0.021 0.0033 44
rs4783307 G T 16 81634135 0.43±0.08 1.3e-08 -0.06±0.06 0.38 76,823 0.14±0.05 0.0035 0.0084 40
rs4523912 G A 16 81634243 0.37±0.09 8.7e-05 -0.08±0.08 0.33 74,783 0.10±0.06 0.082 0.058 29
rs4294808 G A 16 81636082 0.44±0.08 2.5e-08 -0.06±0.07 0.35 76,710 0.14±0.05 0.0049 0.0096 39
rs17685517 A G 16 82064531 0.33±0.09 0.00014 -0.06±0.07 0.38 77,018 0.09±0.05 0.088 0.33 8
rs17685702 T C 16 82067235 0.28±0.08 0.00029 -0.05±0.07 0.46 77,011 0.08±0.05 0.088 0.25 13
rs2200793 G A 16 82069004 0.34±0.08 5.4e-05 -0.06±0.07 0.42 77,007 0.10±0.05 0.055 0.42 3
rs16960622 G A 16 82072047 0.32±0.09 0.00017 -0.05±0.07 0.46 77,000 0.09±0.05 0.075 0.42 3
rs17758985 T C 16 82075197 0.33±0.08 0.00012 -0.05±0.07 0.46 77,018 0.10±0.05 0.066 0.27 12
rs17686362 C T 16 82075242 0.35±0.09 3.1e-05 -0.05±0.07 0.48 77,020 0.11±0.05 0.039 0.29 11
rs923419 A G 16 82075911 0.33±0.09 9.9e-05 -0.05±0.07 0.48 77,017 0.10±0.05 0.06 0.28 11
rs12930939 T C 16 82095449 0.32±0.09 0.00014 -0.06±0.07 0.44 77,012 0.09±0.05 0.078 0.16 19
rs1387379 T C 16 82097095 0.30±0.09 0.00039 -0.04±0.07 0.59 76,971 0.09±0.05 0.077 0.20 17
rs4238689 A G 16 82097206 0.32±0.08 0.00016 -0.04±0.07 0.54 76,941 0.10±0.05 0.063 0.41 3
rs4782539 C G 16 82097934 0.29±0.08 0.00053 -0.01±0.07 0.91 76,964 0.11±0.05 0.039 0.40 4
rs7189644 T C 16 82098347 0.31±0.08 0.00028 -0.03±0.07 0.65 76,954 0.10±0.05 0.058 0.45 1
rs2607420 A G 19 45936727 0.32±0.09 0.00043 0.09±0.07 0.23 30,986 0.17±0.06 0.002 0.14 30
rs2305797 C T 19 45960916 0.26±0.07 0.00025 0.08±0.06 0.24 76,855 0.15±0.05 0.0011 0.037 32
rs2279011 T G 19 45961128 0.29±0.07 1.1e-05 0.14±0.06 0.019 76,954 0.20±0.04 3.9e-06 0.51 0
rs12973666 C G 19 45981237 0.22±0.06 0.0004 0.16±0.06 0.0061 76,997 0.15±0.04 0.00045 0.020 36
rs12151282 C T 19 45990259 0.29±0.07 2.2e-05 0.09±0.06 0.13 76,771 0.18±0.05 9.8e-05 0.11 24
rs7252227 T G 19 45992955 0.28±0.07 2.2e-05 0.16±0.06 0.006 77,011 0.21±0.04 1.3e-06 0.65 0
rs7937 T C 19 45994546 0.34±0.07 2.2e-07 0.19±0.06 0.0011 77,007 0.25±0.04 5.3e-09 0.42 3
rs2644916 C T 19 46001051 0.29±0.08 0.00028 0.19±0.07 0.004 76,997 0.23±0.05 6.4e-06 0.63 0
rs7251418 G A 19 46033429 0.52±0.11 7.7e-07 0.19±0.08 0.02 28,522 0.31±0.06 1.4e-06 0.87 0
rs7251570 G A 19 46033590 0.55±0.10 1.1e-08 0.25±0.08 0.0017 28,434 0.36±0.06 2.3e-09 0.59 0
rs4343391 C G 19 46036208 0.52±0.11 7.1e-07 0.22±0.09 0.01 28,546 0.33±0.07 4.7e-07 0.93 0
rs1801272 A T 19 46046373 1.08±0.27 7e-05 0.41±0.24 0.084 28,296 0.68±0.18 0.00011 0.50 0
rs4105144 C T 19 46050464 0.59±0.10 1.2e-09 0.31±0.08 5.8e-05 28,508 0.41±0.06 5.9e-12 0.87 0
rs8102683 C T 19 46055605 0.62±0.10 1.6e-09 0.26±0.08 0.0021 28,535 0.40±0.06 8.4e-10 0.97 0
rs1496402 A T 19 46057974 0.58±0.10 1.9e-09 0.31±0.08 5.9e-05 28,524 0.41±0.06 9.1e-12 0.86 0
rs12461383 G C 19 46062178 0.55±0.09 3.6e-09 0.25±0.07 0.00066 28,206 0.36±0.06 3.1e-10 0.74 0
rs3852872 T C 19 46107983 0.34±0.08 1.4e-05 0.15±0.07 0.027 77,022 0.22±0.05 8.4e-06 0.43 3
rs3852873 G T 19 46108100 0.46±0.09 2.7e-07 0.14±0.07 0.053 74,678 0.26±0.06 3.7e-06 0.30 10
rs4090553 C T 19 46118597 0.00±0.01 0.61 0.15±0.07 0.027 77,019 0.01±0.01 0.42 0.0088 40
rs12459565 C G 19 46119379 0.29±0.07 5.4e-05 0.14±0.07 0.032 77,019 0.21±0.05 1.9e-05 0.39 5
rs3844443 C T 19 46123775 0.34±0.08 1.1e-05 0.12±0.07 0.057 77,020 0.21±0.05 2.2e-05 0.40 4
rs3843043 T C 19 46125771 0.33±0.08 2.3e-05 0.15±0.07 0.024 77,024 0.22±0.05 9.4e-06 0.48 0
rs3844444 G A 19 46125887 0.34±0.08 2e-05 0.15±0.07 0.029 74,856 0.22±0.05 1.3e-05 0.41 3
rs1820025 C G 19 46128386 0.35±0.08 1.1e-05 0.14±0.07 0.043 77,024 0.22±0.05 1.6e-05 0.38 5
rs8110485 G A 19 46140045 0.33±0.09 0.00012 0.12±0.07 0.094 74,157 0.20±0.05 0.00021 0.0020 46
rs12611133 T C 19 46140444 0.32±0.08 2.8e-05 0.10±0.07 0.14 74,496 0.19±0.05 0.00013 0.0058 42
rs4124633 T C 19 46141443 0.33±0.08 6.9e-05 0.05±0.07 0.5 74,331 0.16±0.05 0.0023 0.47 0
rs3745220 T C 19 46142449 0.35±0.08 2.4e-05 0.06±0.07 0.41 74,312 0.18±0.05 0.00088 0.43 2
rs4239510 T C 19 46145339 0.31±0.07 2.3e-05 0.09±0.06 0.15 74,711 0.18±0.05 0.00014 0.015 37
rs4803408 C T 19 46148705 0.92±0.25 0.00018 0.18±0.
23 0 43 28 243 0 50±0 16 0 0023 0 99 0 rs3889806 T c 19 46151081 0 33±0 07 6.5e-06 0.10±0.06 0 13 74 739 0 19±0 05 5 6e-05 0 0098 39 rsl0417579 T c 19 46156970 0 31±0 07 2.7e-05 0.09±0.06 0 16 74 699 0 17±0 05 0 00021 0 018 36 rsl2459237 T c 19 46158771 1 71±0 49 0.00054 -0.66±0.68 0 33 26 857 0 82±0 38 0 032 0 11 43 rsl l671 108 A c 19 46173813 0 29±0 08 0.00015 0.07±0.07 0 32 76 613 0 16±0 05 0 0013 0 86 0 rs725 1950 C T 19 46174582 0 33±0 07 1.5e-06 0.07±0.06 0 24 77 024 0 18±0 04 5 7e-05 0 16 19 rsl808002 C T 19 46177034 0 38±0 07 7.5e-08 0.07±0.06 0 28 77 016 0 19±0 05 2 3e-05 0 12 23 rsl808682 G A 19 46181288 0 29±0 08 0.0002 0.07±0.07 0 33 76 606 0 16±0 05 0 0019 0 83 0 rsl0418990 G C 19 46182244 0 32±0 07 2.5e-06 0.07±0.06 0 23 74 810 0 17±0 04 9 5e-05 0 051 30 rs8109525 A G 19 46183758 0 34±0 07 4.8e-07 0.05±0.06 0 44 76,996 0 17±0 04 0 0001 1 0 21 16 rs2099361 C A 19 46190 188 0 .30±0..07 1.4e-05 0.07±0.06 0 .25 76,951 0 .16±0..04 0 .00025 0 ..035 32 rs6508963 T C 19 46190573 0 .37±0..07 1.2e-07 0.04±0.06 0 .52 77,006 0 .17±0..04 0 .0001 1 0 .18 18 rs2014141 A G 19 46191829 0 .33±0..07 2.1e-06 0.03±0.06 0 .56 74,838 0 .15±0..04 0 .00063 0 ..033 33 rs8100458 T C 19 46192053 0 .37±0..07 8.4e-08 0.04±0.06 0 .56 77,002 0 .17±0..04 0 .0001 0 .18 18 rs6508964 G A 19 46194442 0 .32±0..07 4.4e-06 0.03±0.06 0 ..61 74,830 0 .14±0..04 0 .001 1 0 ..050 30 rs4803417 C A 19 46199860 0 .30±0..07 4.4e-06 0.02±0.06 0 .78 76,976 0 .14±0..04 0 .0015 0 ..033 32 rs81 13196 A C 19 46206 192 0 .30±0..07 2e-05 0.06±0.06 0 .31 74,667 0 .16±0..05 0 .00053 0 ..012 38 rs81 13200 A G 19 46206203 0 .30±0..07 2e-05 0.06±0.06 0 .32 74,667 0 .16±0..05 0 .00055 0 ..013 38 rs2279345 T C 19 46207542 0 .30±0..07 2.1e-05 0.06±0.06 0 .33 74,667 0 .16±0..05 0 .00062 0 ..012 39 rsl l671243 A C 19 4621 1555 0 .29±0..07 5.1e-05 0.07±0.06 0 .29 74,665 0 .15±0..05 0 .00074 0 ..0052 42 rs7260329 G A 19 46213478 0 .43±0..07 l .le-09 0.06±0.06 0 .36 76,898 0 .2 1±0..05 3 .4e-06 0 .12 22 rs3786551 C G 
19 46213586 0 .41±0..08 6.4e-08 0.10±0.06 0 . 13 74,731 0 .22±0..05 4 .6e-06 0 ..31 10 rs3786552 C G 19 46213597 0 .40±0..07 5.8e-08 0.10±0.06 0 . 13 76,898 0 .22±0..05 3 .8e-06 0 ..36 6 rs707265 A G 19 46215927 0 .30±0..07 le-05 0.05±0.06 0 ..4 76,812 0 .16±0..04 0 .00045 0 ..0079 40 rsl552222 T A 19 46217744 0 .36±0.10 0.00043 0.11±0.09 0 .22 74, 167 0 .2 1±0..07 0 .0015 0 ..89 0 rs21 13103 G A 19 46220507 0 .34±0.10 0.00045 0.12±0.08 0 . 15 74,528 0 .2 1±0..06 0 .00091 0 ..79 0 rs7257703 A G 19 46228462 0 .32±0..08 3.9e-05 0.10±0.07 0 . 15 74,303 0 .19±0..05 0 .0002 0 ..013 38 rs9608562 C T 22 25725385 0 ..42±0..11 7.9e-05 -0.08±0. 10 0 .39 76,572 0 .13±0..07 0 .055 0 ..49 0 rsl2628550 C A 22 25727683 0 ..49±0..12 1.8e-05 0.03±0. 10 0 .77 76,733 0 .23±0..08 0 .0027 0 ..93 0 rs2516082 T C 22 25728512 0 ..41±0..08 6e-07 0.02±0.07 0 ..83 77,017 0 .18±0..05 0 .0007 0 ..73 0 rs9608564 c T 22 25728910 0 .40±0..08 7.4e-07 0.01±0.07 0 ..84 77,017 0 .18±0..05 0 .00079 0 ..72 0 rsl 1090466 G A 22 25729321 0 ..42±0..08 2.1e-07 0.01±0.07 0 ..87 77,017 0 .18±0..05 0 .00053 0 ..72 0 rsl2628017 T C 22 25729405 0 ..39±0..08 l .le-06 0.01±0.07 0 ..89 77,017 0 .17±0..05 0 .001 1 0 ..63 0 rs9613336 c T 22 25730316 0 ..39±0..08 9e-07 0.01±0.07 0 ..93 77,017 0 .17±0..05 0 .001 1 0 ..69 0 rs96 13337 c T 22 25730447 0 .38±0..08 1.9e-06 0.00±0.07 0 ..97 77,017 0 .16±0..05 0 .0019 0 ..67 0 rs9608565 c T 22 25733729 0 .40±0..08 9.7e-07 0.01±0.07 0 ..87 77,017 0 .17±0..05 0 .001 1 0 ..68 0 rs576 1920 A G 22 25767550 0 ..32±0..09 0.0003 -0.07±0.07 0 .33 76,627 0 .08±0..05 0 .14 0 ..42 3 rs576 1921 C T 22 25768386 0 .30±0..09 0.00073 -0.06±0.07 0 .37 76,700 0 .07±0..05 0 .17 0 ..50 0 rsl2160816 T c 22 25770602 0 ..29±0..09 0.00083 -0.02±0.07 0 .74 76,699 0 .09±0..05 0 .074 0 ..75 0 rs572784 c T 22 25936778 0 .23±0..06 0.00033 -0.01±0.06 0 ..86 76,954 0 .09±0..04 0 .026 0 ..33 8 Table 9 . Smoking Initiation. Association of markers within the regions selected by ENGAGE. 
Results are given for the ENGAGE discovery sample and for the in-silico replication studies using data from the TAG and OX/GSK consortia (see accompanying papers). Shown are the number of smokers and never-smokers (N), the effect allele and the other allele, the allele frequencies (Freq), the chromosome number and position, the effect size and standard error (Effect and SE), the P value for the test of association (P), the P value for the test for heterogeneity in effect size (Phet), and an estimate of the proportion of total variation in study estimates that is due to heterogeneity (I²).
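As an illustration of how a combined effect and the heterogeneity estimate I² relate to per-study effects and standard errors, the following is a minimal sketch. It assumes a fixed-effect inverse-variance model with Cochran's Q; the source does not state which model was used, and the function name `fixed_effect_meta` is hypothetical. Phet would be the upper-tail chi-square probability of Q with k - 1 degrees of freedom and is not computed here.

```python
import math


def fixed_effect_meta(effects, ses):
    """Combine per-study effects by inverse-variance weighting.

    Assumption: fixed-effect model; not confirmed as the method behind the tables.
    Returns (combined effect, combined SE, two-sided P, Cochran's Q, I^2 in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    combined = sum(w * b for w, b in zip(weights, effects)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)

    # Two-sided P value for the combined effect (normal approximation).
    z = combined / combined_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # Cochran's Q measures the spread of study estimates around the combined one.
    q = sum(w * (b - combined) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return combined, combined_se, p_value, q, i_squared


# Illustrative input: two estimates of the form Effect ± SE.
combined, combined_se, p_value, q, i_squared = fixed_effect_meta(
    [0.54, 0.22], [0.13, 0.11]
)
print(round(combined, 2), round(combined_se, 2))
```

The two inputs are the ENGAGE and in-silico summaries of a single table row, treated here as two studies. Because the tabulated Phet and I² are described as variation across study estimates, pooling only these two summaries is not expected to reproduce the tabulated heterogeneity values.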

ENGAGE I n silico Combined 2 SNP Al A2 Chr Position Effect ±SE P EffectiSE P EffectiSE P Phet I rs839756 G C 1 43629135 0 .,52±0. 14 0 ,,00024 -0.02±0. 16 0 ,.88 0 ,,26±0,,10 0 ,,01 1 0 ,.75 0 rsl762343 T A 1 43631524 0 .,49±0. 14 0 ,,00035 -0.03±0. 11 0 ,.76 0 ,,10±0,,08 0 ,,24 0 ,.55 0 rs839758 A G 1 43634745 0 .,54±0. 13 5 ,,2e-05 0.22±0. 11 0 ,.04 0 ,,34±0,,08 4 ,,6e-05 0 ,.81 0 rs710249 G C 1 43641822 0 .,57±0. 14 7 ,,4e-05 0.22±0. 11 0 ,.043 0 ,,20±0,,09 0 ,,024 0 ,.25 13 rs839768 A G 1 43644210 0 .,59±0. 15 5 ,,5e-05 0.25±0. 11 0 ,.019 0 ,,35±0,,08 3 ,,6e-05 0 ,.73 0 rs839771 T G 1 43646056 0 .,54±0. 14 0 ,,0001 1 0.28±0. 11 0 ,.0095 0 ,,37±0,,08 1 ,,2e-05 0 ,.76 0 rs839772 G A 1 43646070 0 .,56±0. 14 8 ,,4e-05 0.27±0. 11 0 ,.012 0 ,,37±0,,08 1 ,,6e-05 0 ,.75 0 rs2842177 G C 1 43657244 0 .,56±0. 14 8 ,.le-05 0.25±0. 11 0 ,.019 0 ,,23±0,,09 0 ,,01 1 0 ,.26 12 rs2782642 A G 1 43658908 0 .,59±0. 15 4 ,,9e-05 0.27±0. 11 0 ,.01 1 0 ,,37±0,,08 1 ,,3e-05 0 ,.75 0 rs2782643 C T 1 43659081 0 .,60±0. 15 4 ,,le-05 0.28±0. 11 0 ,.01 0 ,,37±0,,09 1 ,,le-05 0 ,.75 0 rs2782644 G A 1 43659733 0 .,55±0. 14 0 ,,0001 1 0.27±0. 11 0 ,.01 1 0 ,,37±0,,08 1 ,,6e-05 0 ,.77 0 rs2782645 G A 1 43661 189 0 .,56±0. 14 8 ,,5e-05 0.27±0. 11 0 ,.01 1 0 ,,37±0,,08 1 ,,3e-05 0 ,.76 0 rs2782646 G A 1 43664776 0 .,56±0. 14 8 ,,4e-05 0.28±0. 11 0 ,.01 0 ,,37±0,,08 1 ,,3e-05 0 ,.76 0 rs2842179 C T 1 43672214 0 .,59±0. 15 4 ,,9e-05 0.27±0. 11 0 ,.013 0 ,,37±0,,08 1 ,,7e-05 0 ,.73 0 rs2039531 A c 1 43673598 0 .,59±0. 15 5 ,,9e-05 0.27±0. 11 0 ,.013 0 ,,36±0,,08 1 ,,9e-05 0 ,.74 0 rs2782647 T G 1 43674045 0 .,56±0. 14 7 ,,4e-05 0.28±0. 11 0 ,.01 0 ,,37±0,,08 1 ,,2e-05 0 ,.74 0 rs2842180 c T 1 43674503 0 .,58±0. 15 6 ,,2e-05 0.26±0. 11 0 ,.016 0 ,,36±0,,09 2 ,,5e-05 0 ,.80 0 rs2782648 G T 1 43676176 0 .,64±0. 15 3 ,,4e-05 0.23±0. 14 0 ,.11 0 ,,40±0,,10 9 ,,4e-05 0 ,.57 0 rs2842182 A G 1 43676282 0 .,25±0. 10 0 ,,012 -0.01±0. 11 0 ,.92 0 ,,13±0,,07 0 ,,081 0 ,.66 0 rs2842184 A G 1 43676903 0 .,64±0. 
15 3e-05 0.23±0. 14 0 ,.11 0 ,,40±0,,10 8 ,,8e-05 0 ,.56 0 rs2782649 T C 1 43678406 0 .,56±0. 14 7 ,,3e-05 0.27±0. 11 0 ,.01 1 0 ,,37±0,,08 1 ,,2e-05 0 ,.74 0 rs2027130 A G 1 43679483 0 ..59±0. 15 5 ,,6e-05 0.27±0. 11 0 ,.013 0 ,,36±0,,08 1 ,,9e-05 0 ,.74 0 rs2782650 A G 1 43685177 0 ..59±0. 15 4 ,,8e-05 0.27±0. 11 0 ,.014 0 ,,36±0,,08 1 ,,8e-05 0 ,.73 0 rs225 1804 T C 1 43689996 0 .,55±0. 14 0 ,,00013 0.25±0. 11 0 ,.022 0 ,,34±0,,08 4 ,,8e-05 0 ,.69 0 rs225 1802 G A 1 43690224 0 .,55±0. 14 9 ,,7e-05 0.23±0. 14 0 ,.11 0 ,,38±0,,10 0 ,,00012 0 ,.50 0 rs2782651 C G 1 43693273 0 .,6 1±0. 15 7 ,,7e-05 0.03±0. 11 0 ,.78 0 ,,08±0,,10 0 ,,42 0 ,.54 0 rs2842195 G T 1 43693532 0 ..62±0. 15 2 ,,4e-05 0.27±0. 11 0 ,.012 0 ,,36±0,,08 1 ,,4e-05 0 ,.72 0 rsl334973 T G 1 43693971 0 ..07±0.,05 0 ,,16 0.26±0. 11 0 ,.014 0 ,,11±0,,05 0 ,,017 0 ,.20 16 rs2782657 G C 1 43702575 0 ..57±0. 15 0 ,,0002 O. l liO. l l 0 ,.35 0 ,,12±0,,10 0 ,,24 0 ,.18 17 rs7413861 C A 1 4371 1388 0 .,55±0. 15 0 ,,0002 0.32±0. 11 0 ,.0038 0 ,,40±0,,09 6 ,,3e-06 0 ,.73 0 rsl l 210860 G A 1 437551 14 0 .,60±0. 15 7 ,,5e-05 0.36±0. 11 0 ,.00093 0 ,,44±0,,09 7 ,,3e-07 0 ,.60 0 rs21521 13 C T 1 43756156 0 .,65±0. 15 2 ,,6e-05 0.36±0. 11 0 ,.001 1 0 ,,45±0,,09 4 ,,3e-07 0 ,.50 0 rsl l577403 G A 1 43762360 0 .,66±0. 16 4 ,,4e-05 0.39±0. 12 0 ,.00084 0 ,,47±0,,09 4 ,,2e-07 0 ,.56 0 rs2782640 T C 1 43781620 0 .,64±0. 15 2 ,,5e-05 0.34±0. 11 0 ,.0016 0 ,,40±0,,08 1 ,,8e-06 0 ,.50 0 rs2782641 A G 1 43785942 0 ..62±0. 15 3 ,,2e-05 0.32±0. 11 0 ,.003 0 ,,36±0,,08 8 ,,7e-06 0 ,.63 0 rs2842188 C T 1 43786867 0 ..62±0. 15 2 ,,6e-05 0.36±0. 11 0 ,.00087 0 ,,43±0,,09 4 ,,2e-07 0 ,.43 2 rs2819333 T A 1 43787160 0 .60±0.15 4.5e-05 0.32±0. 11 0 .0034 0 .29±0. 09 0.00077 0 .069 27 rs2819334 T C 1 43787322 0 .58±0.15 6.8e-05 0.36±0. 11 0 .00086 0 .44±0 09 5.2e-07 0 .41 3 rs2842187 c T 1 43787536 0 .60±0.15 7. 1e-05 0.36±0. 11 0 .00083 0 .42±0. 09 9e-07 0 .47 0 rs2842185 T c 1 43792318 0 .57±0.15 7.8e-05 0.33±0. 
10 0 .0013 0 .41±0 08 l .le-06 0 .34 7 rsl l 210869 A G 1 43798627 0 .65±0.15 1.3e-05 0.36±0. 11 0 .00092 0 .44±0 09 2.8e-07 0 .39 4 rsl887402 G A 1 43808672 0 .55±0.15 0.00015 0.35±0. 11 0 .00097 0 .42±0. 09 l .le-06 0 .33 8 rs379 1136 C T 1 43822534 0 .59±0.15 7.9e-05 0.35±0. 11 0 .0012 0 .41±0 09 1.3e-06 0 .34 7 rs605709 C T 1 43831054 0 .64±0.15 2.4e-05 0.33±0. 11 0 .0021 0 .42±0. 09 1.2e-06 0 .32 9 rsl7371903 A G 1 43843278 0 .62±0.15 4.5e-05 0.33±0. 11 0 .002 0 .40±0 09 2.2e-06 0 .27 11 rs660899 G T 1 43889593 0 .57±0.15 0.00022 0.37±0. 11 0 .001 0 .43±0. 09 1.6e-06 0 .60 0 rs489319 C T 1 43904381 0 .60±0.15 0.0001 0.33±0. 11 0 .0034 0 .41±0 09 4e-06 0 .31 9 rs618678 C T 1 43905886 0 .6 1±0. 15 9.4e-05 0.33±0. 11 0 .0032 0 .42±0. 09 3.4e-06 0 .32 9 rsl0789442 A c 1 43912662 0 .6 1±0. 15 7.5e-05 0.31±0. 11 0 .0062 0 .41±0 09 6.6e-06 0 .29 10 rs9787076 A c 1 43913736 0 .6 1±0. 15 6.8e-05 0.33±0. 11 0 .0032 0 .42±0. 09 2.7e-06 0 .36 6 rs379 1034 A G 1 43917717 0 .60±0.15 9.2e-05 0.34±0. 11 0 .0021 0 .43±0. 09 2e-06 0 .43 2 rs4660257 T C 1 43920755 0 .65±0.16 4.4e-05 0.35±0. 11 0 .0019 0 .45±0 09 l .le-06 0 .33 8 rsl7401357 G C 1 43926206 0 .65±0.16 4e-05 0.12±0. 11 0 .27 0 .15±0 10 0.12 0 .010 38 rs379 1035 G C 1 43927066 0 .66±0.16 2.7e-05 0.12±0. 11 0 .27 0 .16±0 10 0.11 0 .0072 39 rs2270972 C G 1 43930716 0 .66±0.15 2. 1e-05 0.02±0. 12 0 .84 0 .24±0. 09 0.0088 0 .025 33 rsl2410155 A C 1 43961052 0 .67±0.16 1.9e-05 0.34±0. 11 0 .0027 0 .45±0 09 9.9e-07 0 .30 10 rs379 1040 A G 1 43975320 0 .67±0.16 2.3e-05 0.45±0. 15 0 .003 0 .55±0 11 4.5e-07 0 .58 0 rsl2354267 T C 1 44020859 0 .70±0.17 2.8e-05 0.32±0. 13 0 .013 0 .41±0 10 1.8e-05 0 .32 8 rs2884216 G C 1 231463228 0 .75±0.15 5.4e-07 0.25±0. 11 0 .026 0 .19±0 10 0.047 0 .059 27 rsl2122968 G T 1 231463989 0 .83±0.15 3.9e-08 0.26±0. 11 0 .023 0 .45±0 09 5e-07 0 .56 0 rsl033325 C T 1 231465336 0 .80±0.15 7.8e-08 0.28±0. 11 0 .012 0 .45±0 09 3. 1e-07 0 .57 0 rsl033322 A G 1 231467047 0 .77±0.15 1.7e-07 0.28±0. 
11 0 .01 1 0 .44±0 09 3.6e-07 0 .54 0 rsl09 10122 C T 1 231469297 0 .73±0.15 1.8e-06 0.27±0. 11 0 .018 0 .41±0 09 3.3e-06 0 .52 0 rsl l58741 1 C T 1 231476321 0 .77±0.15 1.6e-07 0.28±0. 11 0 .01 1 0 .45±0 09 3.3e-07 0 .61 0 rs6683734 G A 1 231481990 0 ..67±0. 16 1.9e-05 0.21±0. 12 0 .064 0 .37±0. 09 4.3e-05 0 .58 0 rs4649294 T C 1 231483051 0 .60±0.15 7.8e-05 0.31±0. 11 0 .0054 0 .41±0 09 3.7e-06 0 .79 0 rsl2044078 T C 1 231483565 0 .62±0.15 3.9e-05 0.31±0. 11 0 .0055 0 .42±0. 09 2.3e-06 0 .78 0 rsl0737196 G T 1 231486940 0 .68±0.15 4e-06 0.29±0. 11 0 .01 0 .43±0. 09 1.8e-06 0 .69 0 rsl294327 G T 1 231493299 0 .60±0.15 6.6e-05 0.20±0. 12 0 .089 0 .34±0. 09 0.00019 0 .57 0 rsl56 1227 T c 2 45018893 0 .13±0. 07 0.051 -0. 16±0. 11 0 .14 0 .04±0 05 0.42 0 .010 38 rs83995 T G 2 45021074 0 .38±0.11 0.0003 -0. 16±0. 11 0 .13 0 .10±0 07 0.16 0 .0097 38 rs338070 G C 2 45028793 0 ..41±0. 11 0.00025 0.06±0. 10 0 .57 0 .05±0 09 0.54 0 .0087 39 rsl73076 A G 2 45029089 0 ..59±0. 13 3.5e-06 -0. 15±0. 11 0 .17 0 .12±0. 07 0.11 0 .01 1 37 rsl63516 G T 2 45037503 0 ..42±0. 11 0.00027 -0. 13±0. 11 0 .25 0 .10±0 07 0.17 0 .032 32 rs4952728 T c 2 45037927 0 ..52±0. 13 3.3e-05 -0.09±0. 11 0 .4 0 .15±0 08 0.053 0 .067 27 rsl63513 G A 2 45038728 0 .28±0.09 0.0023 -0. 14±0. 11 0 .2 0 .10±0 07 0.15 0 .026 33 rsl983312 A G 2 45040631 0 .49±0.12 6.6e-05 -0.09±0. 11 0 .4 0 .12±0. 07 0.11 0 .075 26 rs340514 T C 2 45042515 0 ..49±0. 12 7.9e-05 -0. 13±0. 11 0 .23 0 .12±0. 08 0.11 0 .029 32 rsl63507 A G 2 45047793 0 ..41±0. 11 0.00037 -0.23±0. 11 0 .042 0 .07±0 07 0.33 0 .020 34 rsl63503 G A 2 45050643 0 .55±0.15 0.00033 -0. 10±0. 11 0 .38 0 .12±0. 09 0.17 0 .12 22 rsl6824949 T G 2 145884678 0 .60±0.13 6.1e-06 0.21±0. 11 0 .054 0 .35±0. 08 1.9e-05 0 .11 22 rsl533427 G A 2 145887503 0 .65±0.15 2e-05 0.18±0. 11 0 .1 0.32±0.09 0 0002 0 .19 16 rsl0192394 T C 2 1460 14477 0 .65±0.15 2.3e-05 0.17±0. 12 0 .14 0.32±0.09 0 00032 0 .16 19 rsl473550 T C 3 65775272 0 .75±0.16 2e-06 0.08±0. 
12 0 .52 0.05±0.04 0 27 0 .0093 38 rsl473551 G A 3 65775400 0 .47±0.13 0 00032 0.08±0. 12 0 .52 0.23±0.08 0 0056 0 .092 24 rs883570 T A 3 65776543 0 .75±0.16 1 .8e-06 0.08±0. 12 0 .49 0.02±0.04 0 55 0 .0067 40 rs868633 c T 3 65776574 0 .04±0.04 0 38 0.09±0. 12 0 .48 0.04±0.04 0 28 0 .0062 40 rsl473531 c T 3 65776979 0 .6 1±0. 14 9 8e-06 0.09±0. 12 0 .44 0.28±0.09 0 001 0 .040 30 rsl473530 T G 3 65777027 0 .74±0.16 2 6e-06 0.09±0. 12 0 .46 0.04±0.04 0 27 0 .0095 38 rs6796986 c T 3 65777595 0 .05±0.04 0 21 0.10±0. 12 0 .43 0.06±0.04 0 15 0 .01 1 37 rsl3059631 G c 3 65777697 0 .56±0.12 7 .le-06 0.20±0. 12 0 .09 0.08±0.04 0 069 0 .013 36 rsl2634960 C T 3 65777960 0 .59±0.14 2 3e-05 0.10±0. 12 0 .38 0.27±0.08 0 0013 0 .059 28 rsl2635056 G A 3 65778128 0 .54±0.12 9 6e-06 0.10±0. 12 0 .42 0.06±0.04 0 14 0 .01 1 37 rsl l08716 C A 3 65779300 0 .43±0.11 7 8e-05 0.10±0. 12 0 .4 0.07±0.04 0 12 0 .012 37 rsl l08718 G A 3 65779623 0 .67±0.14 1 .8e-06 0.12±0. 11 0 .28 0.30±0.08 0 00022 0 .069 26 rs2888253 C A 3 65780675 0 .64±0.13 2 le-06 0.13±0. 11 0 .25 0.32±0.08 0 00011 0 .037 31 rsl2485806 T C 3 65781 126 0 .57±0.13 1 .8e-05 O. l liO. l l 0 .31 0.28±0.08 0 00042 0 .042 30 rs716244 A T 3 65781513 0 .10±0. 06 0 089 0.09±0. 12 0 .47 0.06±0.05 0 26 0 .01 1 37 rsl495456 C T 3 65781939 0 .11±0. 06 0 086 0.07±0. 12 0 .55 0.10±0.06 0 075 0 .019 35 rsl495457 T c 3 65782138 0 .35±0.10 0 00073 0.09±0. 12 0 .44 0.22±0.07 0 0033 0 .061 27 rs7624282 T c 3 65782553 0 .35±0.12 0 0021 0.09±0. 12 0 .43 0.21±0.08 0 0086 0 .064 27 rs2036069 A G 3 65782762 0 .58±0.13 1 .6e-05 0.13±0. 12 0 .27 0.24±0.08 0 002 0 .085 25 rs4688248 C A 3 65784256 0 .40±0.11 0 00026 0.12±0. 11 0 .3 0.25±0.07 0 00077 0 .044 29 rsl874320 C A 3 65784882 0 .55±0.13 2 .le-05 0.09±0. 12 0 .45 0.09±0.05 0 092 0 .020 34 rs2372151 T A 3 65785178 0 .20±0.08 0 0067 O. l liO. l l 0 .32 0.14±0.06 0 023 0 .016 35 rs 149 5448 T G 3 65785954 0 .53±0.13 2 5e-05 0.14±0. 
11 0 .22 0.29±0.08 0 0002 0 .043 30 rsl7372566 T A 3 65787443 0 .36±0.10 0 00032 O. l liO. l l 0 .34 0.19±0.07 0 0088 0 .017 35 rsl2490463 A C 3 65788413 0 .58±0.13 8e-06 0.08±0. 11 0 .49 0.24±0.08 0 0025 0 .13 21 rsl0935353 G A 3 1410 16599 0 .56±0.14 4 4e-05 -0. 13±0. 10 0 .22 0.10±0.07 0 17 0 .093 24 rsl0935354 G A 3 141026326 0 .53±0.14 0 00011 -0. 10±0. 10 0 .33 0.11±0.08 0 15 0 .14 20 rsl0935356 C T 3 141026782 0 ..59±0. 14 2 4e-05 -0. 10±0. 11 0 .35 0.13±0.08 0 094 0 .079 25 rs6777464 C T 3 141029906 0 .56±0.14 4 8e-05 -0. 12±0. 10 0 .24 0.10±0.07 0 17 0 .10 23 rs642 1891 G c 5 124100202 0 .58±0.15 0 00018 0.05±0. 11 0 .63 0.05±0. 11 0 66 0 .043 30 rs7718029 G A 5 124101912 0 .56±0.15 0 0002 0.07±0. 11 0 .53 0.25±0.09 0 0036 0 .17 18 rs48361 14 C G 5 1241 10481 0 .76±0.16 3 7e-06 -0.05±0. 15 0 .74 -0.05±0.09 0 55 0 .026 33 rs7705693 C T 5 1241 12264 0 ..72±0. 15 1 .8e-06 0.04±0.20 0 .86 0.45±0. 12 0 00012 0 .045 32 rs883322 G T 5 166920252 0 .48±0.14 0 00046 0.18±0. 11 0 .12 0.28±0.09 0 00094 0 .086 24 rs888976 A G 5 166920483 0 .5 1±0. 13 0 00012 0.16±0. 11 0 .15 0.29±0.08 0 00053 0 .056 28 rs888975 T C 5 166921084 0 ..54±0. 17 0 0016 0.02±0. 12 0 .86 0.22±0. 10 0 038 0 .0068 40 rsl862347 T C 5 166921 163 0 .62±0.15 3 4e-05 0.16±0. 11 0 .14 0.31±0.09 0 00036 0 .056 28 rs888974 G T 5 166921545 0 ..64±0. 14 8 4e-06 0.17±0. 11 0 .13 0.31±0.08 0 00022 0 .082 25 rsl0071347 A G 5 166921820 0 .66±0.15 1 . 3e-05 0.16±0. 11 0 .14 0.31±0.09 0 00038 0 .057 28 rs2336894 T G 5 166923173 0 .6 1±0. 14 9 6e-06 0.17±0. 11 0 .13 0.33±0.08 0 00012 0 .052 28 rs2080976 T C 5 166923617 0 ..62±0. 15 3 7e-05 0.17±0. 11 0 .13 0.31±0.09 0 00037 0 .060 27 rs2098651 c G 5 166923848 0 ..04±0. 03 0 26 0.16±0. 11 0 .14 0.02±0.04 0 65 0 .0076 39 rsl3186288 T C 5 166924274 0 .65±0.16 3 4e-05 0.17±0. 11 0 .14 0.31±0.09 0 00042 0 .057 28 rs4869056 G A 5 166924656 0 .04±0.05 0 35 0 18±0. 11 0 1 0 07±0 04 0.12 0 .017 35 rs4869058 T A 5 166924825 0 .58±0.15 7 .le-05 0 17±0. 
11 0 13 0 24±0 09 0.0055 0 .021 34 rsl l747772 c T 5 166925286 0 .56±0.14 0 0001 0 16±0. 11 0 15 0 28±0 09 0.00096 0 .067 26 rsl l7381 10 T G 5 166925338 0 .56±0.14 0 00011 0 17±0. 11 0 13 0 30±0 09 0.00051 0 .036 31 rs986391 G A 5 166926550 0 .66±0.16 2 3e-05 0 13±0. 11 0 25 0 29±0 09 0.00099 0 .022 34 rsl2188010 A T 5 166929044 0 .50±0.13 0 00015 0 18±0. 11 0 099 0 24±0 08 0.0029 0 .032 31 rs4267883 C T 5 166929300 0 .58±0.14 6 9e-05 0 18±0. 11 0 099 0 30±0 09 0.00043 0 .061 27 rs4324704 G c 5 166929514 0 .64±0.16 3 3e-05 0 04±0.12 0 76 0 22±0 09 0.01 0 .012 37 rsl2188278 A G 5 166930800 0 .49±0.14 0 00053 0 14±0. 11 0 21 0 26±0 09 0.0027 0 .0076 39 rsl0039321 T C 5 166930963 0 .50±0.16 0 0014 0 14±0. 11 0 21 0 26±0 09 0.0046 0 .0056 40 rsl0042499 A G 5 166931 157 0 .36±0.13 0 0065 0 14±0. 11 0 21 0 22±0 08 0.0093 0 .0059 40 rsl3153563 T C 5 166932406 0 .27±0.11 0 012 0 12±0. 11 0 28 0 19±0 08 0.012 0 .0024 43 rsl3160227 G A 5 166935946 0 .4 1±0.15 0 0066 0 12±0. 11 0 29 0 21±0 09 0.015 0 .0021 44 rsl0475853 G T 5 166936147 0 .14±0. 08 0 092 0 14±0. 11 0 22 0 13±0 07 0.042 0 .0031 42 rsl024993 C T 5 166938587 0 .49±0.15 0 0011 0 11±0. 11 0 31 0 23±0 09 0.0092 0 .0038 42 rsl024994 T c 5 166939254 0 .48±0.15 0 00092 0 13±0. 12 0 26 0 26±0 09 0.0039 0 .0067 39 rs9313385 A G 5 166939971 0 .57±0.15 0 00015 0 14±0. 11 0 23 0 27±0 09 0.002 0 .0063 40 rs278016 A G 5 166951589 0 .56±0.16 0 00035 0 14±0. 11 0 22 0 26±0 09 0.0034 0 .0073 39 rsl l750548 G A 5 166958525 0 .50±0.15 0 00077 0 15±0. 11 0 16 0 27±0 09 0.0022 0 .0057 40 rs4869061 C T 5 166962903 0 .55±0.16 0 00043 0 12±0. 11 0 28 0 24±0 09 0.0053 0 .0034 42 rsl l738133 C T 5 166964671 0 .53±0.16 0 00093 0 12±0. 11 0 26 0 24±0 09 0.0074 0 .0028 43 rs4869062 T c 5 166965607 0 .47±0.15 0 0016 0 12±0. 11 0 27 0 24±0 09 0.0066 0 .0029 42 rs4868804 A G 5 166965813 0 .57±0.15 0 0001 0 15±0. 12 0 2 0 29±0 09 0.0012 0 .0049 4 1 rs898 171 G A 5 166966356 0 .46±0.14 0 0013 0 15±0. 
12 0 2 0 26±0 09 0.003 0 .0048 4 1 rs73271 1 A C 5 166966645 0 .59±0.16 0 00016 0 10±0. 11 0 37 0 25±0 09 0.005 0 .0026 43 rsl0475856 A G 5 166968159 0 .58±0.15 0 00014 0 15±0. 12 0 2 0 29±0 09 0.0015 0 .0049 4 1 rs981898 A T 5 166968792 0 ..57±0. 16 0 0004 0 15±0. 12 0 2 0 29±0 09 0.002 0 .0060 40 rs6893866 A c 5 166970303 0 .58±0.16 0 00038 0 15±0. 12 0 21 0 28±0 09 0.0031 0 .0042 42 rsl l l34465 G A 5 166970512 0 .5 1±0. 16 0 0016 0 11±0. 12 0 35 0 24±0 09 0.0098 0 .0042 43 rsl 1134466 T C 5 166972188 0 .6 1±0. 17 0 00025 0 12±0. 12 0 33 0 27±0 10 0.0041 0 .0046 4 1 rsl l738927 A G 5 166972378 0 ..59±0. 16 0 00025 0 11±0. 12 0 37 0 26±0 09 0.0057 0 .0033 42 rsl l743417 G A 5 166975676 0 .48±0.16 0 0021 0 14±0. 12 0 25 0 25±0 09 0.0064 0 .0019 44 rsl459066 C T 5 166977244 0 .6 1±0. 17 0 00023 0 15±0. 13 0 22 0 29±0 10 0.0025 0 .0057 40 rs2336897 C T 5 166982854 0 .6 1±0. 17 0 00023 0 16±0. 13 0 2 0 30±0 10 0.0021 0 .0083 39 rs962065 T c 5 166983151 0 .56±0.16 0 00058 0 16±0. 13 0 2 0 31±0 10 0.002 0 .0087 39 rsl966924 c A 5 166985787 0 .57±0.16 0 0005 0 16±0. 13 0 21 0 31±0 10 0.0021 0 .0099 38 rs2336898 A G 5 166988514 0 ..62±0. 17 0 0002 0 12±0. 13 0 34 0 29±0 10 0.004 0 .0074 39 rsl2701627 G A 7 38397140 0 ..02±0. 02 0 13 0 01±0.08 0 93 0 02±0 02 0.13 0 .0030 42 rs2072507 T C 7 3840051 1 0 ..2 1±0. 07 0 0032 0 02±0.10 0 85 0 15±0 06 0.01 1 0 .16 19 rs720667 T C 7 38400790 0 ..24±0. 08 0 0017 0 03±0.11 0 8 1 0 16±0 06 0.0073 0 .18 17 rs720668 c T 7 38400798 0 ..64±0. 
14 4e-06 0 06±0.11 0 6 1 0 27±0 08 0.0015 0 .074 26 rs715413 T c 7 38401048 0 .08±0.04 0.057 0 02±0.10 0 8 1 0 08±0 04 0.062 0 .10 23 rs69761 11 A c 7 117282903 0 .20±0.08 0.0089 0 42±0.12 0 00053 0 27±0 07 3.8e-05 0 .0040 4 1 rs7776980 T c 7 117283555 0 .53±0.14 0.00015 0 37±0.12 0 0012 0 43±0 09 le-06 0 .13 21 rsl0252771 G T 7 117284126 0 .58±0.14 5.8e-05 0 38±0.12 0 001 0 44±0 09 6.8e-07 0 .12 21 rsl0259910 G T 7 117291218 0 .57±0.14 6.5e-05 0 37±0.11 0 .001 1 0 43±0 09 8 3e-07 0 12 22 rsl0244364 C T 7 1173 16877 0 .57±0.14 7. 1e-05 0 38±0.11 0 .00067 0 44±0 09 4.4e-07 0 12 22 rs727 164 A G 7 1173 19388 0 .5 1±0. 14 0.00036 0 38±0.11 0 .00087 0 42±0 09 2e-06 0 076 27 rs6952555 C T 7 117322514 0 .58±0.14 6e-05 0 37±0.12 0 .0012 0 45±0 09 6 3e-07 0 10 23 rs7807019 G A 7 117330299 0 .14±0. 08 0.069 0 29±0.11 0 .0075 0 19±0 06 0 0021 0 046 29 rsl7488728 T C 7 117331416 0 .26±0.10 0.0088 0 26±0.11 0 .014 0 26±0 07 0 00032 0 12 21 rsl0266994 c T 7 117333073 0 .64±0.14 7e-06 0 26±0.10 0 .014 0 26±0 07 0 00033 0 090 24 rsl4771 14 T c 7 117334918 0 .26±0.10 0.0085 0 26±0.10 0 .014 0 26±0 07 0 00033 0 091 24 rs7789130 G T 7 117335533 0 .64±0.14 6.7e-06 0 26±0.10 0 .014 0 26±0 07 0 00032 0 092 24 rsl2706159 C A 7 117335743 0 .26±0.10 0.0083 0 25±0.10 0 .015 0 26±0 07 0 00033 0 096 23 rs2193257 G A 7 117337480 0 .25±0.10 0.01 1 0 25±0.10 0 .016 0 25±0 07 0 00049 0 11 22 rs6466630 G A 7 117339057 0 .26±0.10 0.0089 0 25±0.10 0 .015 0 25±0 07 0 00037 0 088 24 rs6964051 C T 7 117341868 0 .64±0.14 6.8e-06 0 28±0.11 0 .01 0 27±0 07 0 00024 0 10 23 rsl2706160 T c 7 117342153 0 .26±0.10 0.009 0 28±0.11 0 .0097 0 27±0 07 0 00024 0 10 23 rsl548460 T c 7 117354727 0 .13±0. 
07 0.087 0 27±0.11 0 .01 1 0 18±0 06 0 0038 0 042 30 rs6466636 T A 7 117356633 0 .26±0.10 0.0089 0 27±0.11 0 .01 1 0 25±0 07 0 00053 0 087 24 rsl0272923 c T 7 117359540 0 .64±0.14 8.1e-06 0 28±0.11 0 .01 1 0 27±0 07 0 00028 0 10 23 rs6950622 T c 7 117360641 0 .26±0.10 0.0087 0 28±0.11 0 .01 0 27±0 07 0 00025 0 12 22 rsl l978052 A c 7 117363804 0 .65±0.14 7e-06 0 28±0.11 0 .0099 0 27±0 07 0 00025 0 11 22 rs697 1964 T c 7 117364617 0 .26±0.10 0.0096 0 29±0.11 0 .0086 0 27±0 07 0 00022 0 11 22 rs2158050 G c 7 117366098 0 .26±0.10 0.009 0 20±0.11 0 .08 0 24±0 07 0 0011 0 068 26 rsl024432 T G 7 117366565 0 .26±0.10 0.0089 0 38±0.14 0 .0072 0 30±0 08 0 00021 0 16 22 rsl024433 c T 7 117366759 0 .65±0.14 5.6e-06 0 33±0.12 0 .0061 0 29±0 08 0 00015 0 40 4 rsl024434 A c 7 117367289 0 .65±0.14 5.2e-06 0 39±0.14 0 .0063 0 31±0 08 0 00016 0 15 23 rs6950716 T A 7 117367969 0 .26±0.10 0.0084 0 30±0.11 0 .0073 0 26±0 07 0 00041 0 095 24 rsl0276758 A G 7 117369189 0 .58±0.14 4.8e-05 0 38±0.12 0 .0012 0 45±0 09 5 8e-07 0 15 19 rs7782815 G A 7 117373015 0 .60±0.14 1.8e-05 0 41±0.12 0 .00049 0 48±0 09 7 2e-08 0 23 14 rsl0487380 G A 7 117373527 0 ..64±0. 15 1.2e-05 0 38±0.12 0 .0012 0 48±0 09 1 8e-07 0 16 18 rs6959314 G A 7 117374593 0 .26±0.10 0.0096 0 30±0.11 0 .0079 0 28±0 07 0 00021 0 12 21 rsl3221302 G A 7 117375933 0 .26±0.10 0.0097 0 30±0.11 0 .0082 0 27±0 07 0 00022 0 13 21 rs6966339 A G 7 117376444 0 ..62±0. 14 1.3e-05 0 29±0.11 0 .0088 0 27±0 07 0 00028 0 12 21 rs6943120 C G 7 117378624 0 .6 1±0. 14 1.6e-05 0 22±0.11 0 .045 0 12±0 08 0 13 0 018 35 rsl989880 G C 7 117379136 0 .25±0.10 0.01 1 0 18±0. 11 0 .11 0 23±0 07 0 0019 0 071 26 rs6969783 A T 7 117380544 0 ..64±0. 14 5.9e-06 0 28±0.11 0 .012 0 20±0 07 0 0078 0 035 31 rsl l56954 A G 7 117382074 0 ..54±0. 14 0.00015 0 39±0.12 0 .0012 0 44±0 09 1 6e-06 0 30 10 rs916784 G A 7 117386750 0 .44±0.14 0.0015 0 14±0. 11 0 .19 0 26±0 08 0 0014 0 19 16 rs780 1876 G A 7 117389362 0 ..39±0. 14 0.0045 0 14±0. 
11 0 .19 0 25±0 08 0 0026 0 22 14 rsl013278 C G 7 117391056 0 ..44±0. 13 0.001 1 0 13±0. 10 0 .22 0 08±0 09 0 42 0 086 24 rsl468183 A G 7 117392950 0 .45±0.14 0.001 1 0 13±0. 11 0 .22 0 24±0 08 0 0036 0 24 13 rs7784849 G A 7 117398486 0 ..41±0. 14 0.0034 0 14±0. 11 0 .19 0 25±0 08 0 0022 0 24 13 rsl0280709 T C 7 117400345 0 .40±0.14 0.0049 0 14±0. 11 0 .21 0 25±0 08 0 0032 0 22 14 rsl0237233 c A 7 117402159 0 ..41±0. 14 0.0034 0 14±0. 11 0 .21 0 25±0 08 0 0025 0 21 15 rsl0226992 c T 7 117402553 0 ..47±0. 14 0.00059 0 13±0. 11 0 .21 0 25±0 08 0 0023 0 21 15 rs7793280 T c 7 117405155 0 ..39±0. 14 0.006 0 14±0. 11 0 .2 0 24±0 08 0 0036 0 21 15 rsl21 11806 c G 7 117407087 0 .46±0.14 0.00077 0 13±0. 11 0 .23 0 08±0 09 0 4 1 0 059 27 rs970 185 A T 7 117415419 0 .45±0.14 0.00085 0 13±0. 11 0 .22 0 18±0 08 0 026 0 .12 21 rs989996 T c 7 117429488 0 .58±0.14 3.6e-05 0 24±0.11 0 .022 0 37±0 08 8 9e-06 0 .56 0 rsl3438629 A T 7 117435455 0 .59±0.14 3e-05 0 26±0.11 0 .013 0 25±0 08 0 0033 0 .17 18 rsl0249457 C A 7 117439807 0 .58±0.15 0.0001 0 27±0.11 0 .01 1 0 38±0 08 9 .le-06 0 .55 0 rs739619 C G 7 117443502 0 .60±0.14 1.4e-05 0 17±0. 11 0 .1 0 12±0 09 0 21 0 .063 27 rsl02401 10 G C 7 117450234 0 .59±0.15 7.6e-05 0 22±0.11 0 .033 0 34±0 08 5 3e-05 0 .41 3 rsl0255829 C T 7 117450335 0 .60±0.14 2.2e-05 0 26±0.11 0 .013 0 37±0 08 8 le-06 0 .61 0 rsl7168159 C T 7 134321 159 0 .82±0.2 1 0.0001 0 16±0. 14 0 .25 0 36±0 12 0 0021 0 .044 30 rsl l983164 T c 7 134328121 0 .78±0.20 0.00014 0 17±0. 14 0 .25 0 36±0 11 0 0018 0 .12 22 rsl l973318 T c 7 134331846 0 .87±0.22 6.4e-05 0 15±0. 14 0 .3 0 35±0 12 0 0023 0 .059 28 rs4565407 A G 7 134346538 0 .86±0.19 7.8e-06 0 14±0. 14 0 .31 0 39±0 11 0 00059 0 .074 26 rs4329203 A C 7 134347280 0 .84±0.18 2.5e-06 0 14±0. 14 0 .3 0 41±0 11 0 00022 0 .052 28 rs4415249 C A 7 134347420 0 .87±0.22 6e-05 0 14±0. 14 0 .34 0 35±0 12 0 0027 0 .043 30 rsl l22979 A G 7 150546004 0 .11±0. 
05  0.022  -0.04±0.24  0.87  0.10±0.05  0.032  0.076  25
rs7812088  A  G  7  150550762  0.43±0.14  0.0016  -0.04±0.21  0.83  0.22±0.10  0.027  0.083  25
rs7781265  A  G  7  150581873  0.04±0.02  0.077  -0.06±0.20  0.76  0.03±0.02  0.086  0.087  24
rs10891481  G  A  11  112335772  0.30±0.10  0.003  0.29±0.11  0.007  0.30±0.07  6.3e-05  0.18  17
rs7937151  G  T  11  112340234  0.59±0.14  2.5e-05  0.31±0.11  0.0043  0.33±0.08  2.7e-05  0.19  16
rs2155281  A  G  11  112343548  0.54±0.13  4e-05  0.29±0.11  0.0069  0.28±0.07  9.3e-05  0.17  17
rs720023  T  A  11  112344077  0.27±0.10  0.0046  0.29±0.11  0.0068  0.21±0.07  0.0029  0.057  28
rs7948789  G  A  11  112344742  0.28±0.10  0.0045  0.30±0.11  0.0063  0.29±0.07  8.1e-05  0.15  19
rs7126748  C  T  11  112348186  0.61±0.14  1.7e-05  0.30±0.11  0.0059  0.28±0.07  0.00013  0.13  21
rs7110863  G  A  11  112348348  0.25±0.09  0.0076  0.32±0.11  0.0031  0.28±0.07  7.1e-05  0.12  21
rs11214441  A  T  11  112351923  0.60±0.13  7.7e-06  0.29±0.11  0.0068  0.32±0.08  9.2e-05  0.090  24
rs2186707  A  T  11  112355853  0.56±0.13  2.3e-05  0.31±0.11  0.0052  0.31±0.08  0.00012  0.11  22
rs2155290  G  C  11  112356278  0.49±0.13  0.00011  0.15±0.11  0.17  0.21±0.08  0.0082  0.023  33
rs2298527  G  C  11  112357171  0.58±0.13  1.7e-05  0.16±0.11  0.14  0.19±0.07  0.0076  0.049  29
rs1940727  T  G  11  112357798  0.49±0.13  9.3e-05  0.31±0.11  0.005  0.38±0.08  3.7e-06  0.27  11
rs1940724  G  A  11  112358156  0.48±0.12  0.0001  0.30±0.11  0.0054  0.38±0.08  4.1e-06  0.29  10
rs7121047  T  A  11  112366103  0.33±0.11  0.002  0.31±0.11  0.005  0.32±0.08  3.2e-05  0.17  17
rs10891487  A  G  11  112374264  0.56±0.13  3e-05  0.30±0.11  0.006  0.37±0.08  7.4e-06  0.24  13
rs4589334  C  G  11  112384666  0.62±0.14  8.4e-06  0.20±0.11  0.059  0.14±0.09  0.13  0.010  37
rs7113596  A  C  11  112388971  0.44±0.12  0.00022  0.31±0.11  0.0051  0.34±0.08  1.5e-05  0.22  14
rs10732853  G  C  11  112392620  0.42±0.12  0.00067  0.14±0.11  0.17  0.27±0.08  0.00063  0.084  25
rs999851  G  A  11  112395056  0.49±0.13  0.00025  0.31±0.11  0.0048  0.37±0.08  8.3e-06  0.27  11
rs1940733  A  G  11  112397484  0.61±0.14  1.7e-05  0.31±0.11  0.0049  0.38±0.08  5.2e-06  0.26  12
rs7926312  G  A  11  112399066  0.55±0.14  4.8e-05  0.31±0.11  0.0047  0.39±0.08  2.8e-06  0.26  12
rs1940734  C  G  11  112400238  0.61±0.14  1.3e-05  0.20±0.11  0.054  0.13±0.09  0.16  0.010  38
rs10750022  T  G  11  112401523  0.52±0.14  0.00011  0.31±0.11  0.005  0.38±0.08  5.1e-06  0.29  10
rs1892983  C  T  11  112404424  0.62±0.14  1.3e-05  0.30±0.11  0.0065  0.38±0.08  5e-06  0.28  11
rs11214469  A  T  11  112405553  0.42±0.12  0.00037  0.30±0.11  0.0053  0.24±0.08  0.0023  0.047  29
rs7113099  A  T  11  112409545  0.61±0.14  2e-05  0.30±0.11  0.006  0.28±0.08  0.0008  0.066  27
rs1940712  C  G  11  112411428  0.50±0.13  0.0001  0.19±0.10  0.063  0.11±0.09  0.22  0.013  36
rs7948327  A  C  11  112415287  0.61±0.14  2e-05  0.29±0.11  0.007  0.37±0.08  8.5e-06  0.26  12
rs7938812  G  T  11  112416214  0.61±0.14  1.4e-05  0.31±0.11  0.0036  0.39±0.08  2.7e-06  0.31  9
rs7942723  G  T  11  112417049  0.61±0.14  1.8e-05  0.29±0.11  0.0068  0.37±0.08  7.7e-06  0.26  12
rs3802847  C  T  11  112417513  0.14±0.06  0.035  0.29±0.11  0.0075  0.17±0.06  0.0018  0.054  28
rs3802848  C  A  11  112417597  0.51±0.14  0.00017  0.29±0.11  0.0068  0.37±0.08  1.1e-05  0.26  12
rs3802850  C  A  11  112417728  0.52±0.14  0.00011  0.29±0.11  0.007  0.37±0.08  8.1e-06  0.27  11
rs2186874  C  T  11  112417934  0.48±0.12  8.8e-05  0.29±0.11  0.0071  0.35±0.08  1.2e-05  0.21  15
rs2663907  A  C  15  79160087  0.67±0.17  6e-05  -0.10±0.15  0.51  0.22±0.11  0.039  0.0013  46
rs868954  G  A  15  79169043  0.71±0.16  1.4e-05  -0.00±0.01  0.97  0.29±0.10  0.0049  0.0015  46
rs10852660  T  C  16  5549163  0.57±0.14  6.1e-05  0.04±0.12  0.75  0.23±0.09  0.0078  0.011  37
rs9888783  G  C  16  5549316  0.63±0.15  2.4e-05  0.06±0.13  0.66  0.04±0.14  0.76  0.0078  39
rs9888773  C  G  16  5549451  0.63±0.15  2.5e-05  0.14±0.12  0.24  0.30±0.09  0.00064  0.030  32
rs9888774  A  G  16  5549464  0.62±0.15  3.1e-05  0.02±0.13  0.89  0.22±0.09  0.011  0.013  37
rs2880356  T  C  16  5549672  0.61±0.15  4.4e-05  0.02±0.13  0.9  0.22±0.09  0.013  0.014  36
rs17790267  C  T  16  5550469  0.60±0.15  5.3e-05  0.02±0.13  0.91  0.21±0.09  0.015  0.019  35
rs1969139  T  C  16  5551122  0.59±0.15  6.5e-05  0.02±0.13  0.88  0.21±0.09  0.015  0.023  34
rs1948951  A  G  16  5553998  0.60±0.15  3.8e-05  0.05±0.12  0.68  0.24±0.09  0.0062  0.014  36
rs7186722  A  G  16  5556055  0.45±0.13  0.00041  -0.02±0.10  0.84  0.17±0.08  0.035  0.12  21

Table 10. Association with Nicotine Dependence. Shown are the number of cases and controls (N), the frequencies of the effect allele (see Table 7) in cases and controls, the odds ratio and 95% confidence intervals (OR and 95%CI), the P value for the test of association (P). The results for Lung Cancer are shown in Table 11.

Population  N cases  N controls  Freq cases  Freq controls  OR (95% CI)  P

rs1051730-A, Chr 15
Iceland  1976  36147  0.384  0.343  1.20 (1.12, 1.28)  1.0 × 10^-7
NTR-NESDA  835  611  0.286  0.34  1.28 (1.09, 1.50)  0.0026
Combined  -  -  -  -  1.21 (1.14, 1.29)  1.3 × 10^-9

rs6474412-T, Chr 8
Iceland  1979  36202  0.785  0.771  1.09 (1.01, 1.18)  0.032
NTR-NESDA  835  611  0.833  0.846  1.10 (0.89, 1.36)  0.38
Combined  -  -  -  -  1.09 (1.01, 1.17)  0.020

rs13280604-A, Chr 8
Iceland  1979  36202  0.785  0.771  1.09 (1.01, 1.18)  0.032
NTR-NESDA  835  611  0.833  0.846  1.10 (0.89, 1.36)  0.39
Combined  -  -  -  -  1.09 (1.01, 1.17)  0.021

rs215614-G, Chr 7
Iceland  1979  36202  0.37  0.355  1.07 (1.00, 1.14)  0.050
NTR-NESDA  835  611  0.342  0.341  0.99 (0.86, 1.15)  0.95
Combined  -  -  -  -  1.06 (0.99, 1.12)  0.080

rs215605-G, Chr 7
Iceland  1979  36163  0.372  0.357  1.07 (1.00, 1.14)  0.055
NTR-NESDA  835  611  0.346  0.345  1.00 (0.84, 1.17)  0.96
Combined  -  -  -  -  1.06 (0.99, 1.12)  0.078

rs7937-T, Chr 19
Iceland  1975  36121  0.548  0.549  1.00 (0.93, 1.06)  0.92
NTR-NESDA  835  611  0.529  0.531  1.01 (0.83, 1.23)  0.93
Combined  -  -  -  -  1.00 (0.94, 1.06)  0.95

rs1801272-A, Chr 19
Iceland  1979  36202  0.96  0.964  0.90 (0.69, 1.18)  0.45
NTR-NESDA  835  611  0.987  0.981  0.67 (0.30, 1.46)  0.31
Combined  -  -  -  -  0.88 (0.68, 1.13)  0.30

rs4105144-C, Chr 19
Iceland  1979  36202  0.697  0.706  0.96 (0.87, 1.06)  0.40
NTR-NESDA  835  611  0.681  0.665  0.93 (0.74, 1.16)  0.51
Combined  -  -  -  -  0.96 (0.88, 1.04)  0.30

rs7260329-G, Chr 19
Iceland  1969  35982  0.678  0.669  1.04 (0.97, 1.11)  0.25
NTR-NESDA  835  611  0.67  0.697  1.14 (0.92, 1.40)  0.23
Combined  -  -  -  -  1.05 (0.98, 1.12)  0.14

Table 11. Association of SNPs in 4 chromosomal regions with Lung Cancer in four populations. Shown are the number of cases and controls (N), the frequencies of the effect allele (see Table 1) in cases and controls, the odds ratio and 95% confidence intervals (OR and 95%CI), the P value for the test of association (P).

Population  N case  N control  Freq case  Freq control  OR (95% CI)  P

rs6474412-T, chromosome 8p11
Iceland  839  36,606  0.784  0.770  1.08 (0.96, 1.22)  0.19
Denver  192  856  0.805  0.790  1.09 (0.83, 1.44)  0.53
Spain  351  1,195  0.819  0.764  1.40 (1.13, 1.72)  0.0019
Netherlands  515  769  0.828  0.809  1.13 (0.92, 1.39)  0.23
IARC^a  2,506  1,914  0.763  0.778  1.10 (0.99, 1.22)  0.072
Combined  4,403  41,340  -  -  1.12 (1.05, 1.20)  0.00060

rs215614-G, chromosome 7p14
Iceland  839  36,606  0.366  0.355  1.05 (0.95, 1.16)  0.37
Denver  195  864  0.403  0.376  1.12 (0.89, 1.40)  0.33
Spain  450  1,281  0.370  0.335  1.17 (1.00, 1.37)  0.055
Netherlands  502  1,709  0.366  0.367  0.99 (0.86, 1.15)  0.92
IARC^a  2,513  1,917  0.344  0.365  1.09 (1.00, 1.19)  0.057
Combined  4,499  42,377  -  -  1.07 (1.02, 1.13)  0.011

rs7937-T, chromosome 19q13
Iceland  836  36,552  0.555  0.549  1.03 (0.93, 1.13)  0.60
Denver  193  864  0.567  0.595  0.89 (0.71, 1.12)  0.32
Spain  453  1,330  0.532  0.512  1.08 (0.93, 1.26)  0.31
Netherlands  528  1,629  0.552  0.548  1.02 (0.89, 1.17)  0.80
IARC  2,518  1,921  0.559  0.580  1.09 (1.00, 1.18)  0.048
Combined  4,528  42,296  -  -  1.05 (0.99, 1.10)  0.080

rs4105144-C, chromosome 19q13
Iceland  839  36,606  0.713  0.705  1.04 (0.90, 1.20)  0.61
Denver  193  848  0.725  0.688  1.20 (0.94, 1.53)  0.14
Spain  437  1,288  0.669  0.620  1.24 (1.06, 1.46)  0.0085
Netherlands  513  1,665  0.638  0.640  0.99 (0.86, 1.15)  0.93
Combined  1,982  40,407  -  -  1.09 (1.00, 1.18)  0.040

rs7260329-G, chromosome 19q13
Iceland  831  36,454  0.688  0.669  1.09 (0.98, 1.21)  0.11
Denver  189  808  0.728  0.694  1.18 (0.92, 1.51)  0.20
Spain  457  1,305  0.702  0.674  1.14 (0.97, 1.35)  0.11
Netherlands  519  1,660  0.701  0.678  1.12 (0.96, 1.30)  0.15
IARC  2,481  1,899  0.662  0.670  1.02 (0.95, 1.10)  0.61
Combined  4,477  42,126  -  -  1.06 (1.00, 1.12)  0.041
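The "Combined" rows of Table 11 pool the per-population estimates. This section does not state how the pooling was done, so the following is only a minimal sketch that assumes a fixed-effect inverse-variance combination on the log odds ratio scale, with standard errors recovered from the reported 95% confidence intervals. The function name `pooled_odds_ratio` is hypothetical. Cochran's Q and I² are included as one common way to summarize heterogeneity between populations.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def pooled_odds_ratio(studies):
    """Fixed-effect inverse-variance pooling of (OR, ci_low, ci_high) tuples.

    Assumption of this sketch: each confidence interval is a symmetric
    95% interval on the log(OR) scale.
    """
    weights, log_ors = [], []
    for odds_ratio, low, high in studies:
        se = (math.log(high) - math.log(low)) / (2 * Z95)  # SE of log(OR)
        weights.append(1.0 / se ** 2)
        log_ors.append(math.log(odds_ratio))
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Cochran's Q and the I^2 statistic (percentage) for heterogeneity.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (
        math.exp(pooled),
        math.exp(pooled - Z95 * se_pooled),
        math.exp(pooled + Z95 * se_pooled),
        q,
        i2,
    )


# Per-population rows for rs6474412-T as listed in Table 11.
RS6474412_T = [
    (1.08, 0.96, 1.22),  # Iceland
    (1.09, 0.83, 1.44),  # Denver
    (1.40, 1.13, 1.72),  # Spain
    (1.13, 0.92, 1.39),  # Netherlands
    (1.10, 0.99, 1.22),  # IARC
]

or_comb, ci_low, ci_high, q_stat, i2_pct = pooled_odds_ratio(RS6474412_T)
print(f"Combined OR {or_comb:.2f} ({ci_low:.2f}, {ci_high:.2f})")
```

Under these assumptions the pooled estimate for rs6474412-T rounds to the values in the Combined row of Table 11; agreement for other markers is not guaranteed, because the published intervals are rounded to two decimals.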

^a For IARC, results for rs6474412 and rs215614 were not available and here we report results for rs6474414 and rs215605, respectively, both of which are perfect surrogates in the HapMap CEU samples (r² = 1).

CLAIMS

1. A method of determining a susceptibility to lung cancer, the method comprising:

obtaining sequence data about a human individual identifying at least one allele of at least one polymorphic marker, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans, and

determining a susceptibility to lung cancer from the sequence data,

wherein the at least one polymorphic marker is a marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith.

2. The method of claim 1, wherein the sequence data is nucleic acid sequence data obtained from a biological sample containing nucleic acid from the human individual.

3. The method of claim 2, wherein the nucleic acid sequence data is obtained using a method that comprises at least one procedure selected from:

(i) amplification of nucleic acid from the biological sample;

(ii) hybridization assay using a nucleic acid probe and nucleic acid from the biological sample;

(iii) hybridization assay using a nucleic acid probe and nucleic acid obtained by amplification of the biological sample, and

(iv) high-throughput sequencing.

4. The method of claim 1, wherein the sequence data is obtained from a preexisting record.

5. The method of claim 4, wherein the preexisting record comprises a genotype dataset.

6. The method of any one of the preceding claims, wherein the determining comprises determining the presence or absence of at least one at-risk allele for lung cancer of the polymorphic marker.
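By way of illustration only, determining the presence or absence of an at-risk allele from a genotype dataset can be sketched as a lookup. The sketch assumes the at-risk alleles are the effect alleles reported for the three anchor markers in Table 11 and that genotypes are given on the same strand; the names `AT_RISK_ALLELES` and `at_risk_allele_counts` and the example genotypes are hypothetical.

```python
# Effect alleles reported in Table 11, taken here as the at-risk alleles
# (an assumption of this sketch).
AT_RISK_ALLELES = {"rs6474412": "T", "rs215614": "G", "rs4105144": "C"}


def at_risk_allele_counts(genotypes):
    """Map each typed anchor marker to its number of at-risk copies (0, 1 or 2).

    `genotypes` maps marker IDs to two-letter genotype strings such as "TC".
    Markers absent from the dataset are skipped rather than guessed.
    """
    counts = {}
    for marker, risk_allele in AT_RISK_ALLELES.items():
        genotype = genotypes.get(marker)
        if genotype is None:
            continue
        if len(genotype) != 2:
            raise ValueError(f"expected a diploid genotype for {marker}")
        counts[marker] = genotype.upper().count(risk_allele)
    return counts


def carries_at_risk_allele(genotypes):
    """True if at least one at-risk allele is present at any typed marker."""
    return any(n > 0 for n in at_risk_allele_counts(genotypes).values())


# Hypothetical individual; the non-risk alleles shown are placeholders.
example = {"rs6474412": "TC", "rs215614": "AA", "rs4105144": "CC"}
counts = at_risk_allele_counts(example)
print(counts, carries_at_risk_allele(example))
```

A real pipeline would also need to resolve strand orientation and handle missing calls, which this sketch leaves out.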

7. The method of any one of the preceding claims, wherein the determining comprises comparing the sequence data to a database containing correlation data between the at least one polymorphic marker and susceptibility to lung cancer.

8. The method of any one of the preceding claims, wherein the at least one polymorphic marker in linkage disequilibrium with rs6474412 is selected from the group consisting of the markers listed in Table 2.

9. The method of any one of the claims 1 to 3, wherein the at least one polymorphic marker in linkage disequilibrium with rs215614 is selected from the group consisting of the markers listed in Table 1.

10. The method of any one of the claims 1 to 3, wherein the at least one polymorphic marker in linkage disequilibrium with rs4105144 is selected from the group consisting of the markers listed in Table 3.

11. The method of claim 2, wherein obtaining nucleic acid sequence data comprises obtaining a biological sample from the human individual and transforming the sample to determine sequence of the at least one polymorphic marker.

12. The method of claim 11, wherein transforming the sample comprises amplifying a nucleic acid segment that comprises the at least one polymorphic marker and determining the sequence of the polymorphic marker.

13. The method of claim 11 or claim 12, wherein determining sequence of the at least one polymorphic marker comprises determining the presence or absence of at least one allele of the at least one polymorphic marker.

14. The method of claim 1, wherein the sequence data is amino acid sequence data obtained from a biological sample comprising polypeptide from the individual.

15. The method of claim 14, comprising determining the presence or absence of an amino acid substitution in a polypeptide from the individual.

16. The method of any one of the preceding claims, wherein determination of a susceptibility comprises comparing the sequence data to a database containing correlation data between the at least one polymorphic marker and susceptibility to lung cancer.

17. The method of any one of the preceding claims, wherein the at least one allele or amino acid substitution is associated with an increased susceptibility of lung cancer in humans.

18. A method of assessing a susceptibility to lung cancer in a human individual, comprising

i. obtaining sequence data about the individual for at least one polymorphic marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans;

ii. identifying the presence or absence of at least one allele in the at least one polymorphic marker that correlates with increased occurrence of lung cancer in humans;

wherein determination of the presence of the at least one allele identifies the individual as having elevated susceptibility to lung cancer, and

wherein determination of the absence of the at least one allele identifies the individual as not having the elevated susceptibility.

19. The method of claim 17 or claim 18, wherein the presence of the at least one allele or amino acid substitution is indicative of increased susceptibility with a relative risk of at least 1.05, at least 1.06, at least 1.07, at least 1.08, at least 1.09, at least 1.10, at least 1.11, at least 1.12, at least 1.13, at least 1.14 or at least 1.15.

20. The method of claim 19, wherein the at least one allele is selected from the group consisting of the T allele of rs6474412, the G allele of rs215614, the T allele of rs7937, the C allele of rs4105144 and the G allele of rs7260329.
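The relative risk thresholds recited above can be related to the combined odds ratios of Table 11. The following is a hedged sketch only: it assumes a multiplicative model across alleles and markers and treats each combined odds ratio as an approximate per-allele relative risk, neither of which is stated in this section. The names `PER_ALLELE_OR`, `relative_risk` and `thresholds_met` are hypothetical.

```python
# Combined odds ratios from Table 11, used as per-allele relative risks
# (an approximation assumed by this sketch).
PER_ALLELE_OR = {"rs6474412": 1.12, "rs215614": 1.07, "rs4105144": 1.09}

# The thresholds listed in claim 19: 1.05, 1.06, ..., 1.15.
CLAIM_19_THRESHOLDS = [round(1.05 + 0.01 * i, 2) for i in range(11)]


def relative_risk(risk_allele_counts):
    """Risk relative to a non-carrier under an assumed multiplicative model."""
    rr = 1.0
    for marker, copies in risk_allele_counts.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"allele count for {marker} must be 0, 1 or 2")
        rr *= PER_ALLELE_OR[marker] ** copies
    return rr


def thresholds_met(rr):
    """Return the claim 19 thresholds that the relative risk reaches."""
    return [t for t in CLAIM_19_THRESHOLDS if rr >= t]


# Hypothetical carrier: one copy of rs6474412-T and two copies of rs4105144-C.
rr_example = relative_risk({"rs6474412": 1, "rs215614": 0, "rs4105144": 2})
print(f"relative risk {rr_example:.3f}, thresholds met: {thresholds_met(rr_example)}")
```

Other risk models (for example, genotype-specific risks estimated directly from data) would give different numbers; the multiplicative form is chosen here only because it needs nothing beyond the tabulated odds ratios.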

21. A method of identification of a marker for use in assessing susceptibility to lung cancer in human individuals, the method comprising

a. identifying at least one polymorphic marker in linkage disequilibrium with rs6474412, rs215614, or rs4105144;

b. obtaining sequence information about the at least one polymorphic marker in a group of individuals diagnosed with lung cancer; and

c. obtaining sequence information about the at least one polymorphic marker in a group of control individuals;

wherein determination of a significant difference in frequency of at least one allele in the at least one polymorphism in individuals diagnosed with lung cancer as compared with the frequency of the at least one allele in the control group is indicative of the at least one polymorphism being useful for assessing susceptibility to lung cancer.

22. The method of Claim 21, wherein an increase in frequency of the at least one allele in the at least one polymorphism in individuals diagnosed with lung cancer, as compared with the frequency of the at least one allele in the control group, is indicative of the at least one polymorphism being useful for assessing increased susceptibility to lung cancer, and wherein a decrease in frequency of the at least one allele in the at least one polymorphism in individuals diagnosed with lung cancer, as compared with the frequency of the at least one allele in the control group, is indicative of the at least one polymorphism being useful for assessing decreased susceptibility to, or protection against, lung cancer.
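The frequency comparison recited in claims 21 and 22 can be illustrated with a simple allelic test. This is a sketch under stated assumptions, not the test used to produce the tables: it reconstructs allele counts from sample sizes and allele frequencies, computes an allelic odds ratio, and applies a Pearson chi-square test with one degree of freedom, without any correction for relatedness or population stratification. The function name `allele_association` is hypothetical.

```python
import math


def allele_association(n_cases, n_controls, freq_cases, freq_controls):
    """Allelic odds ratio and 1-df chi-square test from summary data.

    Allele counts are reconstructed as 2 * N * frequency (two alleles per
    individual), so they need not be whole numbers.
    """
    a = 2 * n_cases * freq_cases              # effect allele, cases
    b = 2 * n_cases * (1 - freq_cases)        # other allele, cases
    c = 2 * n_controls * freq_controls        # effect allele, controls
    d = 2 * n_controls * (1 - freq_controls)  # other allele, controls
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Spain, rs6474412-T, as listed in Table 11.
or_spain, chi2_spain, p_spain = allele_association(351, 1195, 0.819, 0.764)
direction = "increased" if or_spain > 1 else "decreased"
print(f"OR {or_spain:.2f}, chi2 {chi2_spain:.2f}, P {p_spain:.4f} ({direction} frequency in cases)")
```

For this row the allelic odds ratio agrees with the tabulated value after rounding; the P value is of the same order as the tabulated one but is not expected to match exactly, since the analysis behind the table may differ from this simplified test.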

23. A method of predicting prognosis of an individual diagnosed with lung cancer, the method comprising

obtaining sequence data about a human individual identifying at least one allele of at least one polymorphic marker selected from the group consisting of rs215614, rs6474412 and rs4105144, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different susceptibilities to lung cancer in humans, and

predicting prognosis of lung cancer from the sequence data.

24. A method of assessing probability of response of a human individual to a therapeutic agent for preventing, treating and/or ameliorating symptoms associated with lung cancer, comprising:

obtaining sequence data about a human individual identifying at least one allele of at least one polymorphic marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, wherein different alleles of the at least one polymorphic marker are associated with different probabilities of response to the therapeutic agent in humans, and

determining the probability of a positive response to the therapeutic agent from the sequence data.

25. A kit for assessing susceptibility to lung cancer, the kit comprising:

reagents for selectively detecting at least one allele of at least one polymorphic marker in the genome of the individual, wherein the polymorphic marker is selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith.

26. The kit of claim 25, further comprising a collection of data comprising correlation data between the at least one polymorphism and susceptibility to lung cancer.

27. The kit of claim 26, wherein the collection of data is on a computer-readable medium.

28. The kit of any one of the claims 25 to 27, wherein the kit comprises reagents for detecting no more than 100 alleles in the genome of the individual.

29. The kit of claim 28, wherein the kit comprises reagents for detecting no more than 20 alleles in the genome of the individual.

30. The kit of any one of the claims 25 to 29, wherein the reagents comprise at least one oligonucleotide probe for selectively detecting at least one allele of the at least one polymorphic marker.

31. The kit of claim 30, wherein the at least one oligonucleotide probe is from 15 to 50 nucleotides in length.

32. Use of an oligonucleotide probe in the manufacture of a diagnostic reagent for diagnosing and/or assessing a susceptibility to lung cancer, wherein the probe is capable of hybridizing to a segment of a nucleic acid whose nucleotide sequence is given by any one of SEQ ID NO:1-737, and wherein the segment is 15-400 nucleotides in length.

33. The use of claim 32, wherein the segment of the nucleic acid to which the probe is capable of hybridizing comprises a polymorphic site.

34. A computer-readable medium having computer executable instructions for determining susceptibility to lung cancer, the computer readable medium comprising:

sequence data identifying at least one allele of at least one polymorphic marker in the individual;

a routine stored on the computer readable medium and adapted to be executed by a processor to determine risk of developing lung cancer for the at least one polymorphic marker;

wherein the at least one polymorphic marker is selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith.

35. The computer-readable medium of claim 34, wherein the medium contains data indicative of at least two polymorphic markers.

36. The computer-readable medium of claim 34 or claim 35, wherein the data indicative of the at least one polymorphic marker comprises sequence data identifying at least one allele of the at least one polymorphic marker.

37. An apparatus for determining a genetic indicator for lung cancer in a human individual, comprising:

a processor;

a computer readable memory having computer executable instructions adapted to be executed on the processor to analyze marker information for at least one human individual with respect to at least one polymorphic marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith, and generate an output based on the marker information, wherein the output comprises a measure of susceptibility of the at least one marker or haplotype as a genetic indicator of the condition for the human individual.

38. The apparatus of claim 37, wherein the marker information comprises sequence data identifying at least one allele of the at least one marker in the genome of the individual.

39. The apparatus of claim 38, wherein the sequence data comprises a genotype dataset.

40. The apparatus according to Claim 37, wherein the computer readable memory further comprises data indicative of the risk of developing lung cancer associated with at least one allele of at least one polymorphic marker, and wherein a risk measure for the human individual is based on a comparison of the marker information for the human individual to the risk of the condition associated with the at least one allele of the at least one polymorphic marker.

41. The method, kit, use, medium or apparatus according to any of the preceding claims,

wherein linkage disequilibrium between markers is characterized by particular numerical values of the linkage disequilibrium measures r² and/or |D'|.

42. The method, kit, use, medium or apparatus according to any of the preceding claims,

wherein linkage disequilibrium between markers is characterized by values of r² of at least 0.2.
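For orientation, the two linkage disequilibrium measures named in claims 41 and 42 can be computed from haplotype and allele frequencies using their standard definitions for two biallelic markers. The sketch below is illustrative only; the function name `linkage_disequilibrium` and the example frequencies are hypothetical and are not taken from the tables.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (r2, abs_d_prime) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at the first marker
          and allele B at the second marker
    p_a:  frequency of allele A; p_b: frequency of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    abs_d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, abs_d_prime


# Hypothetical frequencies for illustration.
r2_partial, dprime_partial = linkage_disequilibrium(0.30, 0.40, 0.50)
r2_perfect, dprime_perfect = linkage_disequilibrium(0.40, 0.40, 0.40)

print(f"partial LD: r2={r2_partial:.3f}, |D'|={dprime_partial:.2f}, "
      f"r2 at least 0.2: {r2_partial >= 0.2}")
print(f"perfect LD: r2={r2_perfect:.3f}, |D'|={dprime_perfect:.2f}")
```

The first example shows that a pair of markers can have a moderate |D'| and still fall below an r² cut-off of 0.2, which is why the two measures are recited separately.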

INTERNATIONAL SEARCH REPORT

International application No. PCT/IS2011/050004

A. CLASSIFICATION OF SUBJECT MATTER

C12Q 1/68 (2006.01)

According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)

C12Q

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

DK, NO, SE, FI: Classes as above

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)

EPODOC, WPI, ENGLISH, GERMAN AND FRENCH FULL-TEXT, CAPLUS, MEDLINE, BIOSIS, EMBASE, SCISEARCH, BIOTECHABS

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category* | Citation of document, with indication, where appropriate, of the relevant passages | Relevant to claim No.

P,X | THORGEIRSSON, T.E. et al.: "Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior", Nature Genetics, e-pub 25 April 2010, Vol. 42, No. 5, pages 448-453; rs6474412, rs215614 and rs4105144 | 1-8, 11-20, 23, 25-42

A | WO 2009101639 A1 (DECODE GENETICS EHF.) 20 August 2009; rs1051730

EP 2071039 A1 (COMMISSARIAT A L'ENERGIE ATOMIQUE) 7 June 2009; rs1051730

AMOS, C.I. et al.: "Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1", Nature Genetics, e-pub 2 April 2008, Vol. 40, No. 5, pages 616-622; rs1051730

Further documents are listed in the continuation of Box C. See patent family annex.

* Special categories of cited documents:
"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier application or patent but published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art
"&" document member of the same patent family

Date of the actual completion of the international search Date of mailing of the international search report 25/05/2011 17/05/2011

Authorized officer M 8 1, DK-2630 Taastrup, Denmark Tine Haarmark Nielsen

Facsimile No. +45 43 50 80 08 Telephone No. + 4 3 5 0 8 0 5 7

Form PCT/ISA/210 (second sheet) (July 2009)

Box No. I Nucleotide and/or amino acid sequence(s) (Continuation of item 1.c of the first sheet)

With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the international search was carried out on the basis of a sequence listing filed or furnished:

a. (means) □ on paper in electronic form

b. (time) □ in the international application as filed together with the international application in electronic form □ subsequently to this Authority for the purposes of search

In addition, in the case that more than one version or copy of a sequence listing has been filed or furnished, the required statements that the information in the subsequent or additional copies is identical to that in the application as filed or does not go beyond the application as filed, as appropriate, were furnished.

3. Additional comments:


Box No. II Observations where certain claims were found unsearchable (Continuation of item 2 of first sheet)

This international search report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

Claims Nos.: 1-3, 6-8, 11-13, 16-20, 23 and 41-42 (all partly) because they relate to subject matter not required to be searched by this Authority, namely: Claims 1-3, 6-8, 11-13, 16-20 and 41-42 (all partly) relate to a subject matter considered by this Authority to be covered by the provision of Rule 39.1(iv)/67.1(iv) PCT. The methods comprise diagnostic methods of the human or animal body due to the phrasing "obtaining" in the claims. Continued on Extra Sheet.

Claims Nos.: 1-7, 11-19, 23, 25-31 and 34-42 (all partly) because they relate to parts of the international application that do not comply with the prescribed requirements to such an extent that no meaningful international search can be carried out, specifically: The term "markers in linkage disequilibrium therewith" in the claims leads to a lack of clarity because it is unclear which markers might be encompassed in the claims, leading to an obscure scope of the claims comprising said term. Continued on Extra Sheet.

□ Claims Nos.: because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box No. III Observations where unity of invention is lacking (Continuation of item 3 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows: The common concept that would link inventions 1-5 can be seen as genetic markers in linkage disequilibrium with each other and with lung cancer. However, this concept is known in the art; see for example WO 2009101639, which describes genetic markers that are linked to lung cancer. In lack of any structural or functional feature linking the different markers, each marker and markers in linkage disequilibrium therewith provides different solutions to the problem. Furthermore, inventions 1-3 solve different problems compared to inventions 4-5. Thus, the application does not meet the requirements of unity of invention as defined in Rules 13.1 and 13.2 PCT. Consequently, in view of the above, the search is restricted to invention 1. Continued on Extra Sheet.

1. I I As all required additional search fees were timely paid by the applicant, this intemational search report covers all searchable claims.

As all searchable claims could be searched without effort justifying additional fees, this Authority did not invite payment of additional fees.

3. □ As only some of the required additional search fees were timely paid by the applicant, this international search report covers only those claims for which fees were paid, specifically claims Nos.:

No required additional search fees were timely paid by the applicant. Consequently, this international search report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 1-8, 11-20, 23, 25-42 (all partly)

Remark on Protest □ The additional search fees were accompanied by the applicant's protest and, where applicable, the payment of a protest fee. □ The additional search fees were accompanied by the applicant's protest but the applicable protest fee was not paid within the time limit specified in the invitation. □ No protest accompanied the payment of additional search fees. Form PCT/ISA/210 (continuation of first sheet (2)) (July 2009) INTERNATIONAL SEARCH REPORT International application No. PCT/IS2011/050004

Continuation of Box II, 1.: The search of claims 1-3, 6-8, 11-13, 16-20, 23 and 41-42 has been carried out for these claims insofar as they relate to in vitro diagnostic methods. Likewise, an opinion on novelty, inventive step and industrial applicability will be given taking the in vitro diagnostic method into account.

Continuation of Box II, 2.: Therefore, claims 1-7, 11-19, 23, 25-31 and 34-42 have been searched partially, i.e. marker rs6474412 and markers in linkage disequilibrium therewith as set forth in table 2 on pages 23-26.

Continuation of Box III: Invention 1 (claims 1-8, 11-20, 23, 25-42, all partly): A method for determining a susceptibility to lung cancer comprising obtaining sequence data identifying at least one allele of at least one polymorphic marker selected from rs6474412 and markers in linkage disequilibrium therewith; A method of assessing a susceptibility to lung cancer in a human individual comprising obtaining sequence data and identifying the presence or absence of at least one allele of at least one polymorphic marker selected from rs6474412 and markers in linkage disequilibrium therewith; A method of predicting prognosis of an individual diagnosed with lung cancer comprising obtaining sequence data and identifying at least one allele of at least one polymorphic marker selected from rs6474412 and markers in linkage disequilibrium therewith; A kit for assessing susceptibility to lung cancer comprising reagents for selectively detecting at least one allele of at least one polymorphic marker selected from rs6474412 and markers in linkage disequilibrium therewith; Use of an oligonucleotide probe in the manufacture of a diagnostic reagent for diagnosing and/or assessing a susceptibility to lung cancer in a human, wherein the probe is capable of hybridizing to a segment of a nucleotide sequence given by SEQ ID NO:1-737 related to rs6474412 and markers in linkage disequilibrium therewith; A computer-readable medium comprising data indicative of at least one polymorphic marker selected from rs6474412 and markers in linkage disequilibrium therewith; An apparatus for determining a genetic indicator for lung cancer comprising a processor and a computer-readable memory adapted to analyze marker information for at least one polymorphic marker selected from rs6474412 and markers in linkage disequilibrium therewith.

Invention 2 (Claims 1-7, 9, 11-20, 23, 25-42, all partly): The same as for invention 1, wherein the polymorphic marker is rs215614 and markers in linkage disequilibrium therewith.

Invention 3 (Claims 1-7, 10-20, 23, 25-42, all partly): The same as for invention 1, wherein the polymorphic marker is rs4105144 and markers in linkage disequilibrium therewith.

Invention 4 (Claims 21-22, 41-42, all partly): A method of identification of a marker for use in assessing susceptibility to lung cancer in human individuals, comprising identifying at least one polymorphic marker in linkage disequilibrium with rs6474412, rs215614 and rs4105144.

Invention 5 (Claims 24, 41-42, all partly): A method of assessing probability of response of a human individual to a therapeutic agent for preventing, treating and/or ameliorating symptoms associated with lung cancer comprising obtaining sequence data and identifying at least one allele of at least one polymorphic marker selected from the group consisting of rs6474412, rs215614 and rs4105144, and markers in linkage disequilibrium therewith.

Form PCT/ISA/210 (extra sheet) (July 2009) INTERNATIONAL SEARCH REPORT International application No. PCT/IS2011/050004

C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT

Category* Citation of document, with indication, where appropriate, of the relevant passages Relevant to claim No.

HUNG, R.J. et al.: "A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25", Nature, April 2008, Vol. 452, No. 7187, pages 633-637 rs1051730

A SACCONE, S.F. et al.: "Cholinergic nicotinic receptor genes implicated in a nicotine dependence association study targeting 348 candidate genes with 3713 SNPs", Human Molecular Genetics, 2007, Vol. 16, No. 1, pages 36-49 rs6474413

A HOFT, N.R. et al.: "Genetic Association of the CHRNA6 and CHRNB3 Genes with Tobacco Dependence in a National Representative Sample", Neuropsychopharmacology, 2009, Vol. 34, No. 3, pages 698-706

ETTER, J.-F. et al.: "Association of genes coding for the α-4, α-5, β-2 and β-3 subunits of nicotinic receptors with cigarette smoking and nicotine dependence", Addictive Behaviors, 2009, Vol. 34, pages 772-775

WO 2006123955 A2 (SYNERGENZ BIOSCIENCE LIMITED) 23 November 2006

A THUN, M. J. et al.: "Lung Cancer Occurrence in Never-Smokers: An Analysis of 13 Cohorts and 22 Cancer Registry Studies", PLoS Medicine, September 2008, Vol. 5, No. 9, e185, pages 1357-1371

Form PCT/ISA/210 (continuation of second sheet) (July 2009) INTERNATIONAL SEARCH REPORT Information on patent family members International application No. PCT/IS2011/050004

Patent documents cited in search report (publication date): Patent family member(s) (publication date)

WO2009101639 A1 (20090820): EP2247755 A1 (20101110); CA2714521 A1 (20090820); AU2009213689 A1 (20090820)

EP2071039 A1 (20090617): US2011039344 A1 (20110217); WO2009074673 A1 (20090618)

WO2006123955 A2 (20061123): ZA2007 109 0 A (20081126); KR20080011292 A (20080201); MX2007014220 A (20090216); US2007099202 A1 (20070503); RU2007147214 A (20090627); JP2008545390T T (20081218); EP1888777 A2 (20080220); CN101180408 A (20080514); CA2608161 A1 (20061123); BRPI0610065 A2 (20100525); AU2006248201 A1 (20061123)

Form PCT/ISA/210 (patent family annex) (July 2009)