Genetic Association Study

Genetic association studies: design, analysis and interpretation

Cathryn M. Lewis

Date received (in revised form): 5th April 2002

Cathryn Lewis is senior lecturer in Genetic Epidemiology and Statistics at Guy’s, King’s and St. Thomas’ School of Medicine. Her research interests are the localisation of genes for complex disorders through linkage and association.

Cathryn Lewis, Division of Medical and Molecular Genetics, 8th Floor, Guy’s Tower, Guy’s Hospital, London SE1 9RT, UK
Tel: +44 (0) 20 7955 8761
Fax: +44 (0) 20 7955 4644

© HENRY STEWART PUBLICATIONS 1467-5463. BRIEFINGS IN BIOINFORMATICS. VOL 3. NO 2. 146–153. JUNE 2002

Abstract
This paper provides a review of the design and analysis of genetic association studies. In case control studies, the different contingency tables and their relationships to the underlying genetic model are defined. Population stratification is discussed, with suggested methods to identify and correct for the effect. The transmission disequilibrium test is provided as an alternative family-based test, which is robust to population stratification. The relative benefits of each analysis are summarised.

Keywords: case control study, TDT, population stratification

INTRODUCTION
The Human Genome Project has generated a wealth of data that will determine the genetic contribution to common human disorders. Genetic studies have already proved highly successful in cloning genes for simple Mendelian diseases, such as cystic fibrosis, Huntington’s disease and many rare syndromes, but progress has been slower in complex diseases. This class of diseases covers a broad spectrum of human health, including inflammatory bowel disease, asthma and heart disease, where several genes are likely to control disease risk, and gene–gene or gene–environment interactions may be important. Identifying the genetic contributions to complex diseases will lead to advances in diagnosis and therapy (especially pharmacogenomics) and will have far-reaching implications for public health.

The major tools from the Human Genome Project for identifying disease susceptibility loci are the single nucleotide polymorphisms (SNPs). These single base-pair changes are common across the genome, and over 1.4 million such polymorphisms have been detected.[1] SNPs occur ubiquitously across the genome, in coding, non-coding and untranslated regions. Such variants are strong candidates for disease susceptibility mutations, and gene localisation studies screen large numbers of SNPs to test the co-occurrence of SNP alleles and disease. These genetic association studies are performed to determine whether a genetic variant is associated with disease: an individual carrying one or two copies of a high-risk variant is at increased risk of developing the disease.

This review paper will consider two different study designs for association studies: the case control study and the transmission disequilibrium test (TDT), focusing on study design, statistical analysis methods and interpretation of results. Web sites for downloading analysis software are given.

CASE CONTROL STUDIES
Case control studies compare the frequency of SNP alleles in two well-defined groups of individuals: cases, who have been diagnosed with the disease under study, and controls, who are either known to be unaffected or have been randomly selected from the population. (Both choices of controls form a valid study.) An increased frequency of a SNP allele or genotype in cases compared with controls indicates that presence of the SNP allele may increase risk of disease. The major problem in case control studies is ensuring a good match between the genetic background of cases and controls, so that any genetic difference between them is related to the disease under study and not to biased sampling. Clearly, cases and controls should be from similar ethnic groups. More subtle genetic differences can be guarded against by collecting controls from the same geographical area as cases, or by collecting information such as the birth place of grandparents to check that cases and controls have a similar distribution.

Case control studies are a widely used and powerful study design.

Table 1: Contingency tables for case control analyses, by genetic model. Test 1 is a baseline analysis, and any further analysis should be driven by prior hypothesis. a, b, c, d, e, f are genotype counts observed in cases and controls.

(a) Full genotype table for a general genetic model

              AA       AB       BB
  Cases       a        b        c
  Controls    d        e        f

(b) Dominant model: allele B increases risk

              AA       AB+BB
  Cases       a        b+c
  Controls    d        e+f

(c) Recessive model: two copies of allele B required for increased risk

              AA+AB    BB
  Cases       a+b      c
  Controls    d+e      f

(d) Multiplicative model: $r$-fold increased risk for AB, $r^2$ increased risk for BB. Analysed by allele, not by genotype

              A        B
  Cases       2a+b     b+2c
  Controls    2d+e     e+2f

(e) Additive model: $r$-fold increased risk for AB, $2r$ increased risk for BB. Genotypes analysed by Armitage’s test for trend

              AA       AB       BB
  Cases       a        b        c
  Controls    d        e        f

Analysis methods
For a single SNP with alleles A and B tested in a case control study, the data generated consist of six counts of the numbers of genotypes (AA, AB and BB) in cases and controls (Table 1(a)). We assume a total of $n_{case}$ cases and $n_{cont}$ controls have been tested, and the total number of AA genotypes observed is $n_{AA}$, etc. This 2 × 3 contingency table can be analysed directly using an observed-expected test statistic, which, under the null hypothesis of no association, approximately follows a chi-squared distribution on two degrees of freedom (df). Contingency tables can be analysed using any standard statistical package (Stata, SAS, SPSS, Splus, etc.) or by using Excel.

The chi-square statistic tests for departure from the expected values across cells in the table. Thus the observed value for the AA genotype in cases ($O_1 = a$) is compared with its expected value given the total number of cases and the total number of AA genotypes, so $E_1 = n_{AA} n_{case} / n$, where $n = n_{case} + n_{cont}$ is the total number of individuals. The full test statistic is

$$X^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}$$

where the summation is over all six cells in the table, $O_i$ are the observed values a, b, c, d, e, f in each cell, and $E_i$ are the corresponding expected values.

Notice that this test statistic compares the observed number of AA genotypes in cases with that expected assuming both cases and controls have the same frequency of AA genotypes. The analysis does not take into account any ordering across the genotypes AA, AB and BB.

The test statistic’s approximation to the chi-square distribution is asymptotic, implying that the analysis becomes more accurate with larger data sets. A small count in any cell can violate the distributional assumptions, and an expected value of at least five observations in each cell is regarded as a minimum.

Contingency table analysis methods allow for different genetic models.

The data may also be analysed assuming a prespecified genetic model. For example, under the hypothesis that carrying allele B increases risk of disease (dominant model), the AB and BB genotypes are pooled, giving a 2 × 2 table (Table 1(b)). This is particularly relevant when allele B is rare, with few BB observations in cases and controls. Alternatively, under a recessive model for allele B, cells AA and AB would be pooled (Table 1(c)).

Allele frequency methods assume a multiplicative genetic model.

Analysing by alleles provides an alternative perspective for case control data. This breaks genotypes down into alleles, comparing the total number of A and B alleles in cases and controls regardless of the genotypes from which these alleles are constructed (Table 1(d)). This analysis is counter-intuitive, since alleles do not act independently, but it provides the most powerful method of testing under a multiplicative genetic model, where risk of developing a disease increases by a factor $r$ for each B allele carried: risk $r$ for genotype AB and $r^2$ for genotype BB. If a multiplicative genetic model is appropriate, both case and control genotypes will be in Hardy–Weinberg equilibrium,[2] and this can be tested for (see below).

A fourth possible genetic model is additive, with an increased disease risk of $r$ for AB genotypes and $2r$ for BB genotypes (Table 1(e)). This model shows a clear trend of an increased number of AB and BB genotypes, with the risk for AB genotypes approximately half that for BB genotypes.

[…] BB individuals, but not in the specific relationship of a multiplicative or additive model.

Although this paper focuses on biallelic SNPs, association studies may also be performed using multi-allele systems, such as microsatellite markers. Indeed, several disease genes have been identified through association studies with microsatellite markers in regions delineated by linkage studies.[4,5] The analysis methods remain similar, but problems arise when rare alleles lead to sparse contingency tables that cannot be analysed by chi-square statistics. Sham and Curtis[6] provide a solution to this in their program CLUMP, which analyses case control data from microsatellite markers using Monte Carlo simulation methods.

Testing for Hardy–Weinberg equilibrium
Control genotypes should be in Hardy–Weinberg equilibrium, provided the population from which they are selected is randomly mating and large. Suppose the population frequency of allele A is $p$ and that of allele B is $q = 1 - p$; then the genotypes AA, AB and BB should have frequencies $p^2$, $2pq$ and $q^2$. This may be tested in controls, comparing observed control genotype counts against those expected under Hardy–Weinberg […]
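The contingency-table analyses described above (Table 1(a) to (e) and the Hardy–Weinberg check in controls) can be sketched in a few lines of standard-library Python. This is a minimal, hypothetical sketch, not code from the paper: the function names are illustrative, the genotype counts are invented, and the p-values use closed-form chi-square tail probabilities that hold only for 1 and 2 df. The Armitage trend statistic is written in one common form, $N[N\sum_j t_j r_j - R\sum_j t_j n_j]^2 / (R S [N\sum_j t_j^2 n_j - (\sum_j t_j n_j)^2])$, with scores $t_j$ = 0, 1, 2 counting B alleles. Because the text is cut off before the Hardy–Weinberg method is specified, the sketch assumes a 1-df Pearson goodness-of-fit test; exact tests are often used instead when counts are small.

```python
import math

# Hypothetical helpers illustrating Table 1; names and counts are assumptions.

def pearson_chi2(table):
    """Pearson X^2 = sum over cells of (O - E)^2 / E, with
    E = row total * column total / grand total.
    Returns (X^2, degrees of freedom, smallest expected count)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x2, min_exp = 0.0, float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            min_exp = min(min_exp, exp)
            x2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, min_exp


def chi2_pvalue(x2, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x2 / 2))
    if df == 2:
        return math.exp(-x2 / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


def table1(a, b, c, d, e, f):
    """Tables 1(a)-(d) from genotype counts AA, AB, BB in cases (a, b, c)
    and controls (d, e, f)."""
    return {
        "(a) genotype": [[a, b, c], [d, e, f]],
        "(b) dominant": [[a, b + c], [d, e + f]],
        "(c) recessive": [[a + b, c], [d + e, f]],
        "(d) allelic": [[2 * a + b, b + 2 * c], [2 * d + e, e + 2 * f]],
    }


def armitage_trend(a, b, c, d, e, f, scores=(0, 1, 2)):
    """Armitage trend chi-square (1 df) for Table 1(e); default scores
    count B alleles, which matches the additive model."""
    r = (a, b, c)                      # cases per genotype
    n_j = (a + d, b + e, c + f)        # all subjects per genotype
    R, N = sum(r), sum(n_j)
    S = N - R                          # number of controls
    t_r = sum(t * x for t, x in zip(scores, r))
    t_n = sum(t * x for t, x in zip(scores, n_j))
    t2_n = sum(t * t * x for t, x in zip(scores, n_j))
    num = N * (N * t_r - R * t_n) ** 2
    den = R * S * (N * t2_n - t_n ** 2)
    return num / den


def hwe_chi2(n_aa, n_ab, n_bb):
    """Assumed 1-df goodness-of-fit X^2 of genotype counts to p^2 : 2pq : q^2."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)    # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - ex) ** 2 / ex for o, ex in zip((n_aa, n_ab, n_bb), expected))


if __name__ == "__main__":
    counts = (40, 45, 15, 55, 38, 7)   # invented a, b, c, d, e, f
    for name, tab in table1(*counts).items():
        x2, df, min_exp = pearson_chi2(tab)
        flag = "" if min_exp >= 5 else "  (expected count < 5)"
        print(f"{name}: X2={x2:.2f}, df={df}, p={chi2_pvalue(x2, df):.3g}{flag}")
    t = armitage_trend(*counts)
    print(f"(e) trend: X2={t:.2f}, df=1, p={chi2_pvalue(t, 1):.3g}")
    h = hwe_chi2(*counts[3:])
    print(f"HWE in controls: X2={h:.2f}, df=1, p={chi2_pvalue(h, 1):.3g}")
```

The trend function also shows how the pooled tables relate to it: with scores (0, 1, 1) the trend statistic reduces algebraically to the Pearson $X^2$ of the dominant 2 × 2 table (Table 1(b)), so the scores act as a way of encoding the assumed genetic model.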
