The Candidate Gen e App ro a c h

Jennifer M. Kwon, M.D., and Alison M. Goate, D.Phil.

Alcoholism has a significant genetic basis, and identifying genes that confer a susceptibility to alcoholism will aid clinicians in preventing and effectively treating the disease. One commonly used technique to identify genetic risk factors for complex disorders such as alcoholism is the candidate gene approach, which directly tests the effects of genetic variants of a potentially contributing gene in an association study. These studies, which may include members of an affected family or unrelated cases and controls, can be performed relatively quickly and inexpensively and may allow identification of genes with small effects. However, the candidate gene approach is limited by how much is known of the biology of the disease being investigated. As researchers identify potential candidate genes using animal studies or linking them to DNA regions implicated through other analyses, the candidate gene approach will continue to be commonly used. KEY WO R D S : genetic theory of AODU (AOD [alcohol or other drug] use, abuse, and dependence); ; genetic polymorphism; nucleotides; apolipoproteins; ; alcohol dehydrogenases; aldehyde dehydrogenases; Alzheimer’s disease

am i l y , twin, and adoption studies families, investigators can identify genetic Wh e r eas the linkage mapping ha v e indicated that alcoholism regions associated or “in linkage” with ap p r oach is an unbiased search of the Fhas a strong genetic component the disease by observing that affected en t i r e without any prec o n c e p - (R eich et al. 1999). Although res e a rc h e r s family members share certain marke r tions about the role of a certain gene, ar e still investigating its exact nature, variants (i.e., alleles) located in those the candidate gene approach allows se v eral genes of var ying effect that con- regions more frequently than would be res e a r chers to investigate the validity of fer a susceptibility to alcoholism (i.e., expected by chance. These regions can an “educated guess” about the genetic susceptibility genes) likely play a rol e . then be isolated, or cloned, for furth e r basis of a disorde r . This approach invol ve s Identification of these genes will aid analysis and characterization of the assessing the association between a par- res e a r chers and clinicians in preve n t i n g responsible genes. Linkage mapping ticular allele (or set of alleles) of a gene and effectively treating this disorde r . techniques have already resulted in the that may be invol v ed in the disease (i . e . , The search for alcoholism suscepti- identification of several potential DNA a candidate gene) and the disease it s e l f . bility genes centers on two major tech- regions that may contain susceptibility In other words, this type of association niques, linkage mapping and the can- genes for alcoholism (Reich et al. 19 9 9 ) . didate gene approach. Linkage mapping, (F or a rev i e w of the analogous approa c h also called positional cloning, is the pro- in mice, see the article in this issue on JEN N I F E R M. KWO N , M.D., is an assistant cess of systematically scanning the entire qu a n t i t a t i v e trait locus [QTL] analysis pr ofessor of neurol o g y and ALI S O N M. DNA contents (i.e., the ) of by Grisel, pp. 169–174.) The primary GOATE , D.PHI L ., is a professor of various members of families affected by ad v antage of linkage mapping is that in psych i a t r y at Washington Uni ve r s i t y the disorder using regularly spaced, in v estigators need no prior knowledge of School of Medicine, St. Louis, Mis s o u r i . highly variable (i.e., polymorphic) DNA the physiology or biology underlying the Both authors are res e a r chers in the segments whose exact position is known dis o r der being studied, which is importa n t Co l l a b o ra t i v e Study of the Genetics of (i.e., genetic markers). Using those for complex disorders, such as alcoholism. Alc o h o l i s m .

164 Alcohol Research & Health TheWhat Candidate Is Moderate Gene Drinking?Approach

study tries to answer the question, se v eral variants, or alleles, allowing for functions that might influence the trait “I s one allele of a candidate gene more its use in the candidate gene approach. of interes t . 1 (F or more information on fr equently seen in subjects with the dis- In alcoholism, as in other addictive the relationship betwee n mutations in ease than in subjects without the dis- di s o r ders, the pathways through which the DNA and variations in prot e i n ease?” The major difficulty with this brain chemicals (i.e., neurot r a n s m i t t e r s ) function, see the sidebar, p. 167.) In the ap p r oach is that in order to choose a and other signaling molecules act may case of ALDH, several wel l - k n ow n potential candidate gene, res e a rc h e r s polymorphisms result in the substitution must already have an understanding of of certain protein building blocks (i.e., the mechanisms underlying the disease Candidate gene studies amino acids) and thus can lead to pro- (i.e., disease pathophysiology). In con- teins with biologically rel e v ant changes trast with linkage mapping studies, how- ar e better suited for in function. In many cases, howeve r , eve r , studies of candidate genes do not detecting genes res e a r chers may know a gene’s DNA req u i r e large families with both affected sequence but may not have any infor- and unaffected members, but can be un d e r lying common mation about functional variation in pe r formed with unrelated cases and the gene. co n t r ol subjects or with small families and more complex Detecting genetic variants is a labo- (e.g., a prob a n d and parents). Furt h e r - diseases where the risk rious process that often invol ve s mo r e, candidate gene studies are better sequencing—that is, determining the suited for detecting genes underlying associated with any sequence of DNA building blocks (i.e., common and more complex diseases nucleotides)—for the entire gene in wh e r e the risk associated with any given gi v en candidate gene both affected and unaffected individu- candidate gene is rel a t i v ely small (Collins is rel a t i v ely small. als to look for consistent differen c e s . 2 et al. 1997; Risch and Mer i k a n g a s Al t e r n a t i ve l y , res e a r chers can employ 1996). This article describes the methods sc re e n i n g pro c e d u r es during which they used in candidate gene studies, including also play a role in the development and isolate small gene sections from many associated methodological and technical maintenance of addictive behaviors. individuals and compare their mobility considerations, and rev i e ws examples of For example, res e a r chers using the frui t in a gelatinous material. Dif f e r ences in this approach that are both related and fly Dr osophila melanogaster as an animal mobility in these analyses may indicate un r elated to alcoholism. Because many model of alcohol sensitivity found that nucleotide variations (Malhotra and of the best known examples of this a fly mutant called ch e a p d a t e ex h i b i t e d Goldman 1999).3 To confirm that a ap p r oach have been conducted in enhanced vulnerability to alcohol (Moo r e potential nucleotide variation exists and humans, these studies have been high- et al. 1998). This mutant carries a spe- to determine its exact location in the lighted. Howeve r , the overall approa c h cific allele of a gene important in cellu- genome, investigators then must con- is similar in other model organisms, lar signaling pathways; co n s e q u e n t l y , duct additional studies, typically on the and the differences will be rev i e wed at genes invol v ed in this and other signaling di r ect sequencing of the DNA section the end of this article. pa t h w a y s would be reasonable choices in question. This information also for candidate genes influencing alcohol al l o ws res e a r chers to determine Strategies Used sensitivity and possibly alcoholism. (For whether the nucleotide variation is in the Candidate mo r e information on alcohol-rel a t e d likely to have functional significance, Gene Approach studies in Dro s o p h i l a , see the ar ticle in either because it actually results in this issue by Heberlein, pp. 185–188.) amino acid changes in the resulting pro- The selection of pa r ticular genes for tein or because it occurs in DNA reg i o n s Selecting a Candidate Gene fu r ther analysis as candidate genes could co n t r olling the gene’s activity. Fin a l l y , to be facilitated if some of the po t e n t i a l l y be useful for candidate gene studies, the The first critical step in conducting im p o r tant genes wer e located in DNA variant should occur with sufficient fre- candidate gene studies is the choice of a regions that could be linked to alco- suitable candidate gene that may plausi- holism in genome screens. 1Both the genes and the proteins they encode frequently are abbreviated with the same letters; however, the bly play a rel e v ant role in the process or names of the genes are usually typed in italics and the disease under investigation. For example, Choosing a DNA Polymorphism names of the proteins in regular letters. when studying alcoholism, genes encod- 2A typical gene can span 10,000 to 100,000 or more in g enzymes that act in various pathways Once investigators have selected a can- nucleotides of the human genome, of which approximately of alcohol metabolism, such as alcohol didate gene, they must decide which 2 to 5 percent (i.e., a few thousand nucleotides) consist of de h yd r ogenase (ADH) and aldehyde polymorphism would be most useful the coding sequence and the rest, intronic sequence. de h yd r ogenase (ALDH), are logical for testing in an association study. To 3Two such techniques are single strand conformation choices. Both enzymes are encoded by this end, they must identify existing ge n e analysis (SSCP) and denaturing high performance liquid chromatography (DHPLC). For a discussion of other mo r e than one gene (i.e., by gene fami- variants and determine which of those candidate gene variant selection techniques, see Malhotra lies), and each of these genes exists in variants result in proteins with altered and Goldman 1999; Collins et al. 1997.

Vol. 24, No. 3, 2000 165 quency to allow detection of differen c e s gene in a sample of randomly chosen tein E (ApoE), whose gene (AP O E ) is be t w een individuals with and without subjects with the disease (i.e., cases) located on chromosome 19. ApoE was the trait under inves t i g a t i o n . and without the disease (i.e., control s ) . implicated in the development of AD Not all genes, howeve r , have an eas- Such subject groups are rel a t i v ely easy by findings that it binds tightly to b- ily identifiable common functional to obtain, giving the candidate gene amyloid in the fluid surrounding the variant that can be exploited in associa- ap p r oach an important advantage over brain and spinal cord (i.e., the cere- tion studies, and in many cases res e a rc h e r s the linkage mapping approach, which br ospinal fluid) (Strittmatter et al. 1993). ha v e identified only changes in individ- req u i r es the analysis of families with Furt h e r m o r e, prior linkage data had ual nucleotides (i.e., single nucleotide multiple affected members. Add i t i o n a l indicated that a gene for late-onset AD polymorphisms [SNPs]) that have no ad v antages of the case-control study was located on chromosome 19 (Per i c a k - kn o wn functional significance. Neve r - design over linkage-based methods Vance et al. 1991), in a region that theless, SNPs can be potentially useful include the following (Malhotra and included the AP O E gene. Based on in narrowing a linkage region. In addi- Goldman 1999): these findings, res e a r chers conducted tion, they may show a statistically sig- an association study comparing the fre- nificant association with a disease sus- •Res e a r chers can more easily obtain quency of three AP O E alleles called E2, ceptibility gene if they are located within large numbers of cases and control E3, and E4 in 30 cases and 91 unrel a t e d or near that gene by virtue of linkage su b j e c t s . co n t r ols (Strittmatter et al. 1993). The disequilibrium (see the sidebar for a in v estigators found that whereas all alleles description of this phenomenon). • The effect of disease heterog e n e i t y oc c u r r ed in the controls, the AP O E * E 4 SN P s can be of particular benefit in (i.e., that a disease may have multi- allele was greatly over re p r esented in the studies of complex disorders for which ple genetic causes despite a similar AD cases, indicating that this allele is a many potential candidate genes exist. disease phenotype) is less prob l e m a t i c . major risk factor for the devel o p m e n t For example, linkage mapping studies of AD. This robust association betwee n ha v e suggested several genomic area s •Res e a r chers do not need to make AP O E * E 4 and AD has been confirmed that may contain susceptibility genes assumptions about the exact mode in many subsequent studies (for a for alcoholism. Each of these areas, how- of disease transmission before con- rev i e w, see St. Geo r g e - H yslop 2000). eve r , is so large that it may contain dozen s ducting their analyses. With respect to alcoholism, res e a rc h e r s or hundreds of genes depending on the ha v e used the candidate gene approa c h si z e and gene density of each reg i o n . 4 The major problem associated with to investigate the association betwee n Because it would be proh i b i t i v ely diffi- the case-control design is that it may ce r tain AD H and AL D H alleles and an cult to sequence all these genes, publicly result in spurious associations if the al t e r ed risk of alcoholism. Studies have av ailable SNP data are a great res o u rc e co n t r ols are not appropriately matched found that the enzyme encoded by an for candidate gene and association stud- to the cases with respect to ethnicity or AL D H allele called AL D H 2 * 2 de g r a d e s ies. For example, res e a r chers rec e n t l y other factors that influence an individ- ac e t a l d e h y de more slowly than normal, an a l y z ed several SNPs in the DNA ua l ’s genetic composition. resulting in the prolongation of certa i n region containing a candidate gene for unpleasant alcohol effects, such as facial Al z h e i m e r ’s disease (AD) and demon- flushing, racing of the heart (i.e., palpi- strated that two SNPs closely flanking Examples of the tations), and nausea. Not surprisingly, that gene indeed showed strong associa- Candidate Gene this allele appears to have a prot e c t i v e tion with AD (Mar tin et al. 2000). (For Approach in Humans effect against alcoholism—that is, peo- mo r e information on this candidate ple carrying the allele are less likely to gene for AD, see the section “Exa m p l e s A widely cited example of the useful- consume alcohol and to develop alco- of the Candidate Gene App r oach in ness of the candidate gene approa c h holism (for a rev i e w, see Reich et al. Humans.”) in vo l v es AD, the most common cause 1999). The frequency of the AL D H 2 * 2 of dementia in the elderly. AD typically allele is particularly high in some Asian Testing the Candidate Gene is a late-onset disorder (i.e., the earliest populations, and carriers of this allele symptoms occur after age 60) with consume less alcohol and are much less Once investigators have chosen a candi- a complex inheritance pattern. The dis- likely to develop alcoholism than are date gene and suitable po l y m o r p h i s m , ease often appears to occur sporadically, people without the allele. they commonly test the role of this ev en when there is an underlying genetic pr edisposition. One of the pathologic 4Genes are not equally spaced throughout the genome, and ha l l m a r ks of AD is the presence of The Candidate Gene some DNA regions may contain more genes than others. mi c r oscopic aggregates, or plaques, of Approach in Mouse Studies For example, although the human chromosomes 21 and 22 are of similar size (approximately 33.5 million nucleotides), a small protein-like molecule called b- their gene density differs substantially. Thus, there are on amyloid peptide. These b-a m y l o i d Qua n t i t a t i v e trait loci (QTLs) are DNA average 6 genes per million nucleotides on chromosome 21 (Hattori et al. 2000) but 16 genes per million nucleotides on plaques also contain several other pro- regions that may contain one or more chromosome 22 (Dunham et al. 1999). teins, including one called apolipopro- genes related to the development of a

166 Alcohol Research & Health TheWhat Candidate Is Moderate Gene Drinking?Approach

Genes and Mutations

DNA, the genetic material contained in each cell, to the development of a different phenotype, they encodes the information for all the proteins needed to represent polymorphisms that can be useful in candi- cr eate and maintain an organism. The information for date gene studies. Many mutations do not result in each protein is contained within one gene. Genes rep re s e n t amino acid changes, however, and therefore do not only a small portion of a cell’s entire DNA (i.e., the alter the resulting protein or its function. For example, genome), howeve r , and stretches of DNA both betwee n because of the redundant nature of the DNA code, and within genes are not conver ted into proteins. Som e of these “no n c o d i n g ” DNA stretches (e.g., prom o t e r s ) some mutations result in codons that still specify the regulate the activity of the genes and determine which same amino acid. Thus, if a mutation occurred in the gene is turned “on ” or “of f ” in a given cell at a given last nucleotide of the GCA triplet, all three possible time. This regulation is necessary, because not all ce l l s new triplets (GCC, GCG, and GCT) would still need to generate all proteins at all times, and exce s s i v e or en c o d e the amino acid alanine. Neve rt h e l e s s , these single untimely protein production can lead to disease. For nucleotide polymorphisms (SNPs) can be useful in example, only blood cells need to produce the prot e i n linkage studies, as described in the main article. hemoglobin, which carries oxygen from the lungs to the Furthermore, many mutations occur in noncoding tissues. Noncoding DNA stretches within genes are called DNA regions and therefore do not result in prote i n in t r ons. They are cut (i.e., spliced) out of an intermedi- variants that are associated with an altered phenotype or ar y molecule called messenger RNA during the conver - increased disease risk. Under two conditions, however, sion of the genetic information in the DNA into a pro- tein. This splicing process must be highly accurate in even mutations in noncoding regions might result in or der to ensure that the resulting protein is functional. an altered phenotype and therefore be useful in candi- DNA is a long, thread-like molecule whose building date gene studies. First, mutations that occur in regula- blocks—the nucleotides—consist of sugar molecules tory regions, such as promoters or intron splice sites, linked to organic bases. There are four such bases: could alter gene activity and, consequently, the pheno- adenine (A), guanine (G), cytosine (C), and thymine type determined by that gene. Second, noncoding (T). The order, or sequence, in which the nucleotides mutations that occur in an intron or outside a gene are arranged specifies the order in which the building could be associated with an altered phenotype if they blocks of the resulting proteins (i.e., the amino acids) are positioned close to (i.e., typically within 200,000 are combined. Because there are 20 amino acids but nucleotides) a functional mutation and are therefore only 4 different nucleotides, a triplet of three nucleotides (i.e., a codon) represents one specific amino acid. almost always inherited together with the functi o n a l However, the 4 nucleotides can be arranged into 64 mutation. This phenomenon is known as “linkage different triplets, far more than the 20 codons needed disequilibrium.” In all other cases, an observed associa- to represent each amino acid. As a result, the genetic tion between a noncoding mutation and a disease may code is redundant, which means that more than one codon be a consequence of population stratification—that is, can represent a particular amino acid. For example, general differences between cases and controls if both only the codon ATG rep r esents the amino acid methion- subject groups are drawn from different underlying in e ; however, four different codons (GCA, GCC, populations (e.g., ethnic groups or animal strains)— GCG, and GCT) represent the amino acid alanine. or a chance event. Both during the DNA duplication that occurs when cells divide and as the result of external factors (e.g., exposure to radiation or certain chemicals), changes in the nucleotide sequence (i.e., mutations) can occur. If —J ennifer M. Kwon and these changes result in altered proteins that contribute Alison M. Goa t e

Vol. 24, No. 3, 2000 167 ce r tain quantitative trait. Mapping of Conclusion MAL H O T R A , A.K., AN D GOL D M A N , D. Benefits and QTLs in animal models of alcohol- pitfalls encountered in psychiatric genetic associa- related phenotypes has identified mul- A combination of linkage mapping and tion studies. Biological Psychiatry 45:544–550, 1999. tiple genomic areas that potentially a candidate gene approach has been the MAR T I N , E.R.; GIL B E R T , J.R.; LAI , E.H.; E TA L. contain candidate genes for these phe- most successful method of identifying Analysis of association at single nucleotide polymor- notypes. (For more information on disease genes to date. The candidate gene phisms in the APOE region. Ge n o m i c s 63:7– 12, 2000. ap p r oach is useful for quickly determin- QTL mapping, see the article in this MOO R E , M.S.; DEZAZ Z O , J.; LUK , A.Y.; E TA L. in g the association of a genetic var i a n t issue by Grisel, pp. 169–174.) The Ethanol intoxication in Dr o s o p h i l a : Genetic and with a disorder and for identifying pharmacological evidence for regulation by the methods of identifying these candidate genes of modest effect. This approa c h cAMP signaling pathway. Ce l l 93:997–1007, 1998. genes and any potential functional var i - has certain advantages over traditional ants are essentially the same as those PER I C A K -V AN C E , M.A.; BEB O U T , J.L.; GAS K E L L , linkage mapping or positional cloning P.C.; E TA L. Linkage studies in familial Alzheimer used in humans. Once functional var i - ap p r oaches. The current methods for disease: Evidence for chromosome 19 linkage. ants are found, howeve r , any positive ev aluating risk associated with candidate American Journal of Human Genetics 48 : 1 0 3 4 – association between a variant and the genes complement traditional linkage 1050, 1991. trait of interest must be interpret e d ef f o r ts in identifying susceptibility genes REI C H , T.; HIN R I C H S , A.; CUL V E R H O U S E , R.; AN D with caution. For example, because of for alcoholism. As more SNPs are iden- BIE R U T , L. Genetic studies of alcoholism and sub- the way mouse strains are bred, mice tified throughout the genome, some stance dependence. American Journal of Human who have a trait (analogous to human of those SNPs also will be located Genetics 65:599–605, 1999. cases) and mice who do not have that within candidate genes, thereb y allowi n g RIS C H , N., AN D MER I K A N G A S , K. The future of trait (analogous to controls) may pos- res e a rc h e r s the use of the candidate gene genetic studies of complex human diseases. Sc i e n c e ■ sess different alleles at a particular gene ap p r oach on a genome-wide scale. 273:1516–1517, 1996. ev en if that gene is unrelated to the dis- ST. GEO R G E -H YS L O P , P.H. Molecular genetics of ease (or trait) under consideration. The References Alzheimer’s disease. Biological Psychiatry 47 : 1 8 3 – gene that actually confers the risk for 199, 2000. COL L I N S , F.S.; GUY E R , M.S.; AN D CHA R K R A V A R T I , the disease or trait under inves t i g a t i o n STR I T T M A T T E R , W.J.; SAU N D E R S , A.M.; SCH M E C H E L , A. Variations on a theme: Cataloging human DNA D.; E TA L. Apolipoprotein E: High avidity binding may be located near the gene showi n g sequence variation. Sc i e n c e 278:1580–1581, 1997. the allelic polymorphism, but may be to b-amyloid and increased frequency of type 4 allele DUN H A M I.; SHI M I Z U , N.; ROE , B.A.; E TA L. Th e in late-onset familial Alzheimer disease. Pr o c e e d i n g s difficult to identify positively using DNA sequence of human chromosome 22. Na t u r e of the National Academy of Sciences US A 90 : 1 9 7 7 – association methods alone. 402:489–495, 1999. 1981, 1993.

168 Alcohol Research & Health