Genetic epidemiology: an expanding scientific discipline 1

Diego F. Wyszynski 2

ABSTRACT Genetic epidemiology is a relatively new discipline that studies the interaction between genetic and environmental factors in the etiology of human diseases. Taking advantage of genetic markers provided by molecular biological research, complex computerized algorithms, and large databases, the field of genetic epidemiology has undergone significant development over the past 10 years. Using concrete examples from recent scientific literature, this article describes the objectives and methodology of genetic epidemiology.

1 This article has been published in Spanish in this journal (Vol. 3, No. 1, 1998, pp. 26–34) under the title “La epidemiología genética: disciplina científica en expansión.”
2 The Johns Hopkins University, School of Hygiene and Public Health, Baltimore, Maryland, USA. Mailing address: Department of Epidemiology, School of Hygiene and Public Health, The Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205, USA. Tel: (410) 955-7961; Fax: (410) 955-0863. E-mail: [email protected]

Rev Panam Salud Publica/Pan Am J Public Health 3(3), 1998 179

The notion that environmental factors interact with the genome in the production of diseases emerged around the middle of the 19th century, when certain individuals were observed to be more resistant than others to communicable diseases. Almost 100 years passed, however, before epidemiologists interested in genetics and geneticists interested in epidemiology were able to develop the first analytic methods to identify environmental and genetic factors involved in the pathologic process (1).

Although expressions such as “epidemiologic genetics” (2) and “clinical population-based genetics” (3) had already been coined, Morton and Chung (4) were forerunners in associating the term genetic epidemiology with the discipline that strives to control and prevent illness by identifying the role of genetic factors, in interaction with environmental factors, in the etiology of human disease (5).

Prevention can take place at the primary, secondary, and tertiary levels. Primary prevention refers to reducing the incidence of a disease in a population (6). The best known example of primary prevention is immunization to prevent certain infectious diseases. In the scope of genetic epidemiology, avoiding an environmental risk factor (maternal smoking, for example) that interacts with genetic susceptibility (genotype A2 of the TGFα TaqI marker in the fetus), thereby leading to a certain pathologic process (cleft palate), is an example of primary prevention (7). Secondary prevention refers to prevention of the clinical manifestations of a disease through early detection and effective intervention in the preclinical stage (6). Well-known examples of secondary prevention include early detection and intervention in cases of congenital hypothyroidism and phenylketonuria. Finally, tertiary prevention consists of minimizing the effects of a disease by reducing the complications and damage it causes. An example of tertiary prevention of a genetic disease is the use of prophylaxis with antibiotics and immunization for individuals with sickle cell disease to prevent certain bacterial infections that could endanger the life of the patient.

Genetic mutations are the basis of variation in the population (8). Like other clinically expressed or manifested traits (phenotypes), diseases involve genetic factors in three ways, which are not always mutually exclusive:

1. The mutation may be directly harmful to the individual. This category includes the many disorders transmitted in an autosomal dominant manner through a single gene, such as achondroplasia and Marfan syndrome.
2. The mutation may be harmful, but it may remain dormant for generations. For instance, certain metabolic disorders of the newborn, such as cystic fibrosis, appear only when an individual inherits two copies (alleles) of the mutated gene, that is, one from each parent.
3. The mutation may be harmful only when it interacts with other genetic or environmental factors (1). For example, individuals who have both mutated alleles for phenylketonuria or congenital hypothyroidism manifest these diseases only when they are exposed to elevated concentrations of phenylalanine or reduced concentrations of thyroid hormone, respectively.

The goals of genetic epidemiology contrast with those of “traditional” epidemiology and population genetics. “Traditional” epidemiology studies the relationship between the environment and the incidence of a given disease, although it recognizes the significance of the host and his or her genetic makeup. Population genetics, on the other hand, seeks to predict the influences of population structure and selection and mutation on bodily phenotypes and diseases. Finally, genetic epidemiology studies the way environmental risk factors interact with the genetic makeup of a given population.

METHODS OF GENETIC EPIDEMIOLOGY

Genetic epidemiology uses two types of research strategies: descriptive and analytic. The descriptive strategy, at the population as well as at the family level, is based on the study of time, location, and the individual. Some questions that exemplify this strategy are as follows: What is the prevalence at birth of achondroplasia among the population, and what is the mutation rate for this disease? What are the frequencies of blood groups and of histocompatibility antigens in different population groups? Do geographic differences exist in the prevalence of a given genetic factor? In contrast, analytic studies seek to identify the role of genetic factors in the natural history of diseases, both in populations and families. Analytic studies answer the “why?” and “how?” questions of genetic epidemiology.

Family recurrence studies

A fundamental aspect of genetic epidemiology is the study of aggregation (or recurrence) of certain diseases in given families. King et al. (9) proposed three questions to help identify the scope of studies of family recurrence:

1. Are there diseases that affect various members of the same family?
2. Is this familial aggregation associated with common environmental exposure, hereditary susceptibility, or cultural inheritance of risk factors?
3. If there is genetic susceptibility, how is it inherited?

The existence of familial aggregation can be determined by observing the prevalence of a given disease in family members of the index case (the index case is the affected individual who introduces the family into the study) and of controls (individuals who are not affected). Such an aggregation exists when relatives of affected individuals run a higher risk of suffering from the disease than relatives of individuals who are not affected. This method is efficient and inexpensive, but one of its limitations is that information about characteristics of family members and controls may give rise to bias. For example, if the researcher is aware that the disease is present in the participant’s family, he or she may overdiagnose it. Family members’ knowledge of characteristics of the disorder and their ability to recall them may also be greater if they have an affected relative. Table 1 demonstrates a simple method of calculating relative risk (RR) through the use of a 2 × 2 table, illustrated in the study of Mettlin et al. (10), who investigated familial history of breast cancer in 779 patients and 1 558 controls admitted to the Roswell Park Memorial Institute in Buffalo, New York. The RR of suffering from breast cancer associated with a positive family history was 1.62 (95% CI: 1.28 to 2.06) (see Table 1). When the analysis was broken down by age of the cases and the controls (<55 or ≥55 years of age), the RRs were 1.34 (95% CI: 0.94 to 1.92) and 1.88 (95% CI: 1.37 to 2.58), respectively (Table 2). This difference reveals a limitation in family-based case-control studies, especially when the illnesses that are studied appear at a later stage in life, because family members of young patients tend to be younger than those of the controls.

TABLE 1. Relative risk of suffering from breast cancer associated with a positive and a negative family history, based on a group of 779 breast cancer patients and 1 558 controls

Other relative affected a   Cases     Controls    Total cases           Total controls
Yes                         a         b           144                   191
No                          c         d           635                   1 367
Relative risk (95% CI)      ad/bc                 1.62 (1.28 to 2.06)   1.00

Source: Reference 10.
a Other relative affected refers to any first-degree relative (mother, daughter, sister) with breast cancer.

Other methods, such as cohort analysis, regressions, and generalized estimating equations, allow calculations to be broadened to include more complex situations. It is important to point out that a high family aggregation does not prove the existence of a genetic mechanism producing the disease, just as a low recurrence does not exclude the possibility that such a mechanism exists.

Although the comparison of family members of patients and of controls may be considered to be an “epidemiologic” technique, it is also possible to identify a familial aggregation by means of “statistical genetics.” In this case, the degree of aggregation of a
disease in a family is expressed as λR, which is defined as the quotient between the risk among relatives of the cases of having the disease and the prevalence of that disease in the overall population. This method requires λR to be calculated for each degree of relation. Table 3 shows the results of the study by Slater and Cowie (11), who analyzed data from the first published familial studies on schizophrenia. It can be observed that λR approaches 1 as the degree of relationship becomes more distant. It is important to point out that such an association is not sufficient in linking schizophrenia to a purely genetic origin.

TABLE 2. Relative risk of suffering from breast cancer associated with a positive and a negative family history, based on a group of 779 patients and 1 558 controls, by age

                            <55 years old                     ≥55 years old
Other relative affected a   Cases                 Controls    Cases                 Controls
Yes                         58                    90          86                    101
No                          300                   626         335                   741
Relative risk (95% CI)      1.34 (0.94 to 1.92)   1.0         1.88 (1.37 to 2.58)   1.0

Source: Reference 10.
a Other relative affected refers to any first-degree relative (mother, daughter, sister) with breast cancer.

TABLE 3. The first studies conducted on familial risk of suffering from schizophrenia

Years        Studies  Relation        Incidence c                   λR a
1928–1962    14       Parents         336/7 675 = 4.36%             5.45
                                      (adjusted value b = 14.12%)   17.65 b
1928–1962    12       Siblings        724/8 504 = 8.51%             10.6
1921–1962    5        Offspring       151/1 226 = 12.31%            15.4
1930–1941    4        Aunt/uncle      68/3 376 = 2.01%              2.5
1916–1946    3        Half siblings   10/311 = 3.22%                4.0
1926–1938    5        Niece/nephew    52/2 315 = 2.25%              2.8
1928–1938    4        Grandchildren   20/713 = 2.81%                3.5
1928–1941    4        Cousins         71/2 438 = 2.91%              3.6

Source: Reference 11.
a Values are calculated assuming a population prevalence of 0.8%.
b The adjustment is made because the patient rarely has children once schizophrenia has become clinically overt.
c Calculated by the following formula: (individuals with X relationship who develop the illness during the given time period) / (total individuals with X relationship during the given time period).

In the case of multifactorial hereditary diseases, two components can be distinguished in the covariance or correlation between blood relatives: that attributable to genetic differences and that produced by differences in environmental exposure. For discrete phenotypes (affected as opposed to not affected), the statistical model is based on the premise that there is a continuum of liability, with normal distribution, which determines the risk of suffering from the disease. According to this model, when the threshold is surpassed, the disease appears. Both the susceptibility and the threshold can be inherited, and mathematical properties of the normal distribution allow the parameter to be predicted. Analysis of the multifactorial model focuses on estimating the risk correlation among family members (12). This model does not distinguish genetic influences from environmental ones, and heritability can be given too much weight, especially when there are environmental factors that greatly influence the risk among family members. The multifactorial linear model also can be applied to phenotypes that are expressed as continuous variables, such as blood lipid or blood glucose concentrations, blood pressure, and hormone levels. Analyses of the variance components, or alternatively path analysis, are also useful methods for studying these phenotypes.

Once there is evidence of familial aggregation and genetic control of a disease, a third question emerges: How can the genetic marker involved be identified? To respond to this question, various methods have been developed over the past 20 years, thanks to the many new molecular biological techniques, as well as computers and complex statistical algorithms. The most common methodologies are described below, with examples from recently published studies.

Twin studies

Twin studies have typically been used to determine whether genetic factors play a role in the etiology of certain diseases. Such studies consist of comparing the difference in concordance between identical or monozygotic twins (MZ) and fraternal or dizygotic twins (DZ). MZ twins share 100% of their genetic material, whereas DZ twins share, on average, 50% of their genes. If sets of twins are being studied, and the MZ twins are found to be concordant (both have the same disease, for example) with greater frequency than the DZ twins, it is possible to conclude that genetic factors are at least partially involved in the etiology of that disease (13). It is important to note, however, that genetic differences may exist between MZ twins. They may differ, for example, in the series of T-cell antibodies and receptors, in the number of mitochondrial deoxyribonucleic acid (DNA) molecules, in somatic mutations in general, and in the inactivation pattern of the X chromosome in female twins (14). It is also well known that MZ twins may differ from DZ twins as a result of environmental factors.

One of two calculations is normally made in twin studies, based on the
method used to select the twins: (1) pair concordance rate, which describes the proportion of twin pairs where both siblings are affected; and (2) index case concordance rate, which is the proportion of affected individuals among the co-twins of those selected as index cases. Although the pair concordance rate is the simplest method of determining whether genes affect a specific phenotype, it does not measure the magnitude of such an effect. For that purpose, use of the index case concordance rate is preferable.

Twin studies are limited by several factors, in particular those associated with the way participants are selected for the studies. For example, it has been observed that studies that depend exclusively on volunteers have a greater proportion of MZ twins, female pairs, and participants who are concordant for the phenotype under study. Such differences may influence the concordance rate that is calculated, which is why several countries (Sweden is a prime example) have launched population-based twin registries. Another limitation, especially in behavior studies, is that MZ twins tend to share environmental factors more frequently than DZ twins.

Gene-environment interaction studies

The existence of interactions between genetic and environmental factors has been widely described in the last half century. Phenylketonuria is a classic example. This recessive metabolic disorder manifests itself only in individuals who are homozygous for the mutation and who have been exposed to phenylalanine (an amino acid present in milk and other food products). Xeroderma pigmentosum is another example; affected individuals increase their risk of developing skin cancer when they expose themselves to ultraviolet rays. Ottman (15) has reviewed other similar examples.

Because of advances in the Human Genome Project, the case-control method is often used to describe possible gene-environment interactions. As seen in Tables 4 and 5, maternal cigarette smoking in the first trimester of pregnancy interacts with the fetal genotype (A2 allele of the genetic marker known as the transforming growth factor-alpha [TGFα]) in the formation of nonsyndromic cleft palate (odds ratio obtained in the case-control study [ORcc] = 5.5 [95% CI: 2.1 to 14.6]). This finding was later confirmed by Shaw et al. (16) in a study of isolated cases of cleft lip and palate. Khoury and Flanders (17) recently described case-only studies as an alternative to case-control studies. In a case-only study, the 2 × 2 table is reconfigured as illustrated in Tables 6 and 7. The odds ratio calculated in the case-only study (ORco) is similar to the ORcc. Although both methods are statistically powerful and relatively simple to perform, it is difficult to interpret the results. The existence of gene-environment interaction is, in itself, a statistical association, which is not necessarily causal. However, it is important to emphasize that both methods are useful instruments in the analysis of gene-environment interaction, because they identify factors that could become significant in the prevention of the disorder being studied.

TABLE 4. Outline for gene-environment interaction analysis in the context of a case-control study

Environmental exposure a   Genetic susceptibility   Cases   Controls   Odds ratio
−                          −                        a       b          1.0
−                          +                        c       d          ORg = bc/ad
+                          −                        e       f          ORe = be/af
+                          +                        g       h          ORge = bg/ah

Source: Reference 17.
a −: absent; +: present; interaction under an additive model: ORge = ORg + ORe − 1; interaction under a multiplicative model: ORge = ORg × ORe.

TABLE 5. Interaction between fetal TGFα genotype and maternal smoking associated with cleft palate

Maternal smoking   TGFα (A2 allele)   Cases   Controls   Odds ratio (95% CI)
No                 No                 36      167        1.0
No                 Yes                7       34         1.0 (0.3 to 2.4)
Yes                No                 13      69         0.9 (0.4 to 1.8)
Yes                Yes                13      11         5.5 (2.1 to 14.6)

Source: Reference 7.

TABLE 6. Gene-environment interaction analysis in the context of case-only studies

            Susceptible genotype
Exposure    No      Yes
No          a       b
Yes         c       d

Source: Reference 17.

TABLE 7. Review of data by Hwang et al. (7) in the context of case-only studies

                    TGFα genotype (A2)
Maternal smoking    No      Yes
No                  36      7
Yes                 13      13 a

Source: Reference 17.
a ORco: ad/bc: (36 × 13) / (7 × 13) = 5.14 (95% CI: 1.68 to 15.71).

Complex segregation analysis

Complex segregation analysis is a useful technique for determining whether a specific phenotype (represented by a continuous or discrete variable) has a mendelian transmission pattern in a genealogical group (1). The algorithm used provides probability estimates for various genetic factors: for the mendelian models, these include transmission probabilities, gene frequencies, and penetrance parameters; for polygenic models, heritability, sample averages, and variances; and for what is known as the mixed model, both types of parameters (18). For example, Newman et al. (19) showed that the degree of family aggregation of breast cancer in 1 759 families was consistent with autosomal dominant inheritance as a result of the action of an uncommon allele (frequency 0.06%). This allele was implicated in 4% of all cases but in 20% of affected mother-daughter pairs, within the larger context of multifactorial causation. Other examples of phenotypes studied by this technique are asthma and atopy (20), obesity (21), plasma apolipoprotein (22, 23), dyslexia (24), and cleft lip and palate (25).3 The primary limitation of this method is its sensitivity to the process by which individuals are selected. If the selection is biased, which usually occurs when cases come from a clinical setting, the results tend to be spurious. Furthermore, segregation studies are long and costly.

3 Segregation calculations can be done on various computer programs accessible through the Internet (see reference 26).

The methods described above indicate the relative importance of genetic factors in a disease or phenotype, but they do not identify the specific causal factor. To identify the genes that might be involved in the origin of diseases, “positional cloning” techniques are used, including allelic association analysis and linkage analysis.

Allelic association studies

The primary goal in allelic association studies is to compare the frequency of different risk factors in a group of individuals affected by a given disease and in a control group (27). The risk factor assessment may include environmental exposure or genetic traits. Genetic traits may be either genetic products, such as proteins or enzymes, or genetic markers based on DNA sequences. Genetic markers known as restriction fragment length polymorphisms (RFLPs) are obtained by using restriction enzymes, which cut DNA at specific sites. In recent years, another type of genetic marker has been developed, the so-called microsatellites, which, in most cases, can offer more genetic information than traditional RFLPs (8).

Statistical analysis in an association study is simple and can be summarized in a 2 × 2 table. The challenge, as in most case-control studies, lies in selecting the controls. Allelic associations have yielded a better understanding and earlier diagnoses of certain autoimmune diseases. The allele HLA-B27, for example, is present in 90% of patients with ankylosing spondylitis, but it is found in only 9% of the general population (28). HLA alleles have also been associated with type I diabetes, rheumatoid arthritis, multiple sclerosis, celiac disease, and systemic lupus erythematosus (29). Associations have recently been identified between the angiotensin I-converting enzyme (ACE) and cardiovascular disease (30), between angiotensinogen and hypertension (31), between apolipoprotein E and Alzheimer’s disease (32), and between the insulin gene (INS) and type I diabetes (33).

The interpretation of a positive association should not be taken lightly. Associations can arise for three reasons, one of which is completely artificial (34):

1. The allele in question is actually the cause of the phenotype.
2. The allele does not cause the phenotype but is in linkage disequilibrium with the causal allele. Linkage disequilibrium takes place when the causal allele of the phenotype is physically close (or linked) to the allele being studied. This is often observed in young, typically isolated populations (the Finnish population is a good example of a stable group in which allelic association studies often produce positive results).
3. The population is mixed. In a mixed population, any phenotype common to an ethnic group would appear to be positively associated with any allele that is also common in that particular ethnic group. Lander and Schork (34) give an amusing example of an association resulting from a mixed population group:

“. . . suppose that a would-be geneticist set out to study the ‘trait’ of ability to eat with chopsticks in the San Francisco population by performing an association study with the HLA complex. The allele HLA-A1 would turn out to be positively associated with ability to use chopsticks, not because immunological determinants play any role in manual dexterity, but simply because the allele HLA-A1 is more common among Asians than Caucasians.”

For this reason, the study of relatively homogeneous populations allows such spurious associations to be avoided.

4 Readers interested in other methods can refer to the work by Thomson on the haplotype relative risk method (37).

Other analytic techniques developed in recent years do not seem to be affected by the makeup of the target population (35). One such technique is the transmission disequilibrium test (TDT) (36).4 A hypothetical example is a genetic marker with two alleles, M1
and M2, so that the possible combinations are M1M1, M1M2 (or M2M1), and M2M2 (Table 8). The case group is selected based on the presence of a given phenotype, and the genotype of these cases and their parents is determined. The frequency with which the M1 or M2 allele is transmitted to each affected individual is then assessed. Families may be triads (the affected individual and parents) or they may be more complex (various affected family members plus parents). The method is statistically sound, even in mixed population groups. The TDT examines the hypothesis that the marker and the phenotype are not genetically linked. The theory used is derived from the Neyman-Pearson method (38) and uses only the b and c observations (see Table 8) from heterozygous parents (M1M2). The formula (b − c)²/(b + c) reveals whether there is an equal number of M1 and M2 transmissions from heterozygous parents to their affected offspring. If linkage exists between the marker and the phenotype, in addition to allelic association, b and c will tend to be different. The test for statistical significance of the TDT is the χ² (McNemar asymptotic test) or Fisher’s exact test (36). A considerable difference indicates that the marker is linked to the phenotype locus. The TDT may be used with genetic markers with more than two alleles and may incorporate covariables (39, 40). It is important to point out that when the TDT is conducted on families with a recurrent phenotype (the so-called “multiplex families” in which more than one member is affected), a finding of allelic association is not valid. This is because the test assumes that observations are independent, but this is not the case when participants are related. The linkage test, however, is valid even under these conditions (35).

TABLE 8. Combinations of transmitted and nontransmitted marker alleles M1 and M2 among parents (2n) of affected cases (n)

                      Nontransmitted allele
Transmitted allele    M1       M2       Total
M1                    a        b        a + b
M2                    c        d        c + d
Total                 a + c    b + d    2n

Source: Reference 36.

Linkage analysis

Linkage analysis is a valuable resource used to identify genes that may have a causal association with the phenotype in question, because it allows one to assess whether the loci in a chromosome are transmitted together more often than expected during meiosis. Statistical tests of linkage estimate the recombination fraction (θ) between two loci. If θ = 0, then there is complete linkage, which implies that the alleles at the two loci are always transmitted together. A finding of positive linkage between the locus of a genetic marker (known) and the locus of the phenotype under study (unknown) allows the investigator to determine the chromosomal location of the locus that produces the latter. In this manner, causal genes of more than 60 mendelian disorders have been identified, and the list grows daily (41).

If θ = 0.5, then there is no linkage. In other words, each of the loci is transmitted independently of the other (as occurs with loci on different chromosomes). The θ value also serves as a measure of the physical distance between two loci; the greater the value of θ, the greater the distance from one locus to another. Linkage analysis may also be used to establish the sequence of loci on one chromosome if more than two loci are being investigated.

Linkage between two loci is not an actual occurrence but rather a hypothesis to be tested statistically. For this purpose, the maximum likelihood method is used (42). The likelihood of a hypothesis, called L(H), is proportional to the probability of the experimental observation under this hypothesis, Prob(O | H). In this case, the hypothesis is θ (linkage or no linkage); thus, maximum likelihood is expressed as L(O | θ). The relative likelihoods of the two hypotheses (linkage, θ < 0.5; or no linkage, θ = 0.5) are calculated based on the likelihood quotient LQ = L(O | θ < 0.5) / L(O | θ = 0.5). To produce a significance value, the LQ should be transformed logarithmically into an LOD score (logarithm of the odds of linkage, represented by Z). Algebraically, this is expressed as Z = log10(LQ).5

5 LOD scores can be calculated with programs available for personal computers and networks (43) or through the Internet (44).

LOD scores for the different values of θ are usually illustrated in a table (45). When θ = 0.5, Z is always 0, because the quotient divides two identical probabilities and log10(1) = 0. For recombination fractions less than 0.5, the referent LOD scores are 3.00 and −2.00. An LOD score ≥ 3.00 (P = 10⁻⁴) is evidence of linkage, whereas an LOD score < −2.00 rejects the linkage hypothesis. Recently, Lander and Kruglyak (46) suggested that linkage should be considered significant once an LOD score of ≥ 3.3 (P = 5 × 10⁻⁵) is reached.

Linkage analysis may be broadened to include more complex systems. For example, multipoint linkage analysis allows multiple genetic markers located in the same chromosome to be assessed simultaneously. As a result of growing identification of the genetic markers present in each chromosome, multipoint linkage analysis has become the technique of choice for the exact location of genes. Given that this technique implies that a large number of markers will be analyzed within the same chromosome, investigators normally apply it only after signs of linkage have been found in a specific chromosomal region.

Linkage analysis may also be conducted when a desired phenotype shows genetic heterogeneity or when it is a result of the interaction of two or
more genes. In the first case, more than one gene acts independently in producing the phenotype. For example, in some families, hereditary breast cancer is attributable to mutations of the BRCA1 gene; in others, it is due to mutations of the BRCA2 gene. Finally, in some families the cause lies in mutations of unidentified genes. Phenotypes produced, at least partially, by the synergistic interaction of two or more genes include those observed in multiple sclerosis (47) and in total serum immunoglobulin E levels (48).

Analysis of shared alleles

The linkage analysis method described here is extremely sensitive to errors in the hereditary transmission models used to explain the phenotype studied and in the variations of the population-based allele frequency values attributed to the families under study. Thus, analytic techniques requiring no models have been developed, based on comparing the alleles that are shared between family members. One such technique, the analysis of affected siblings, evaluates how often a specific copy of a chromosomal region is shared identically by descent (IBD), that is, by being passed down from a common ancestor. For example, two siblings may share none, one, or two IBD copies of any locus (with an expected distribution of 25%, 50%, and 25%, respectively, if random allelic segregation has occurred). The statistical test compares the average proportion of alleles shared IBD (π) with the expected average (50%). The results are given in P values, LOD scores, or Z scores (number of standard deviations by which π surpasses the expected 50%). For example, 100 sibpairs who share 61% of the alleles in a genome sector correspond to a P value of 0.001, an LOD score of 2.1, and a Z score of 3.1 (46). According to those authors, evidence of linkage is obtained with the sibpair method when the LOD score equals 3.6 or more (P ≤ 2.2 × 10⁻⁵).

This method, using pairs of affected siblings, has been used with positive results in locating chromosomal regions for several phenotypes, such as those for type I diabetes, essential hypertension, serum immunoglobulin E levels, and bone density in postmenopausal women (34). Although more robust than linkage analysis, the analysis of pairs of affected siblings is limited by the large number of siblings needed to provide sufficient data to perform the statistical calculations (on the order of hundreds or thousands of sibpairs).

FINAL COMMENTS

Academic activity and employment opportunities: what the future holds

Genetic epidemiology is a rapidly expanding discipline. Many academic institutions and government agencies, particularly in England, France, and the United States of America, offer academic and research programs in genetic epidemiology. Employment possibilities for genetic epidemiologists are excellent, especially in the more industrialized nations.

The International Human Genome Project has spurred great interest and controversy. Its primary goal is to obtain a complete map of the human genome by sequential analysis of DNA (49).6 The U.S. National Center for Biotechnology Information announced, in October 1996, that approximately 16 500 genes had been identified, which corresponds to about 20% of all human genes (50). The Project is scheduled for completion in the year 2005, when the sequence of the 3 billion nucleotides of human DNA will have been mapped. In this context, one task for specialists in genetic epidemiology is to educate the rest of the scientific community, and more importantly, the nonscientific community, regarding the implications and importance of the International Human Genome Project.

6 The home page for the U.S. National Research Center of the Human Genome can be accessed on the Internet at: http://www.nhgri.nih.gov; the Genome Database, one of the main databases for localized genes, can be found at: http://gdbwww.gdb.org; the comprehensive genetic map can be seen at: http://www.ncbi.nim.nih.gov/SCIENCE96/; and the catalogue of genes and congenital defects is located at: http://www3.ncbi.nim.nig.gov/omin/

Ethical, legal, and social concerns

Since its inception, the planners of the International Human Genome Project recognized that gene identification would have profound implications for individuals, families, and society. Many questions were raised, such as how the genetic information should be interpreted and used; who should have access to it; how individuals could be protected from potential harm; and what is the benefit of genetic research when little, or nothing, can be offered in terms of a cure or prevention.

Genes that cause, or at least partially cause, several diseases have already been identified. Although such diseases can be detected and diagnosed earlier and more accurately, the long-term goal of the International Human Genome Project is to improve their treatment, to prevent them, and to ultimately cure them. In the interim, when early detection is possible but knowledge is limited and treatment is not yet available, there is a period marked by critical ethical, legal, and social controversy.

Since 1989, the National Human Genome Research Institute of the United States has housed the Working Group on Ethical, Legal, and Social Implications of the Human Genome Project. As a multidisciplinary and interinstitutional group, it is interested in the following four domains (51, 52):

1. Privacy and fairness in the use and interpretation of genetic information. It seeks to assess the mechanisms for preventing the discrimination and stigmatization that result from the misuse (and misinterpretation) of this information.
2. Clinical integration of genetic technology. In this context, the effect of
Rev Panam Salud Publica/Pan Am J Public Health 3(3), 1998 185 the availability of in Genetic epidemiology in today’s can countries a common medium for medical practice will be examined, world the development of genetic epidemiol- together with the mechanisms for ogy has not yet been created. Initia- its assessment. The International Genetic Epidemi- tives in this respect have been inde- 3. Methodology of genetic research. It ology Society,7 which has more than pendent and sporadic. Although many is primarily concerned with deter- 400 members and is growing rapidly, Latin American countries are experi- mining how to inform potential vol- edits the monthly journal Genetic Epi- encing widespread social and political unteers of the risks and benefits of demiology and organizes international changes, academic and scientific insti- participating in a research study conferences annually. In Latin Ameri- tutions need to make room for and and how to obtain the correspond- give support to new disciplines, such ing consent. as genetic epidemiology, without ne- 4. Education for community members glecting those that already exist. Only and medical professionals on the in this way will Latin American coun- 7 Its Website can be found at the following Internet scope and importance of the Hu- address: http://darwin.mhmc.cwru.edu/IGES/ tries be able to join the circle of scien- man Genome Project. index.html tifically advanced nations.
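The sibpair figures quoted from reference 46 (100 sibpairs sharing 61% of alleles IBD, giving a Z score of about 3.1, a P value of about 0.001, and an LOD score of about 2.1) can be reproduced with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes a simple mean test in which each pair's IBD proportion is 0, 0.5, or 1 with probabilities 25%, 50%, and 25% (variance 1/8), a one-sided normal P value, and the conversion LOD = Z²/(2 ln 10). The function names are hypothetical.

```python
import math


def sibpair_mean_test(n_pairs, observed_share):
    """Return (z, p, lod) for an affected-sibpair mean IBD-sharing test.

    Assumptions (not stated in the article): normal approximation,
    one-sided P value, and LOD = Z^2 / (2 ln 10).
    """
    expected = 0.5
    # Under random segregation a pair shares 0, 1, or 2 alleles IBD with
    # probabilities 0.25, 0.50, 0.25, i.e. proportions 0, 0.5, 1.
    var_per_pair = 0.25 * (0.0 - expected) ** 2 + 0.25 * (1.0 - expected) ** 2
    std_error = math.sqrt(var_per_pair / n_pairs)
    z = (observed_share - expected) / std_error
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    lod = z ** 2 / (2 * math.log(10)) if z > 0 else 0.0
    return z, p, lod


def lod_to_p(lod):
    """Convert an LOD score to a one-sided P value under the same assumptions."""
    z = math.sqrt(2 * math.log(10) * lod)
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    z, p, lod = sibpair_mean_test(n_pairs=100, observed_share=0.61)
    print(f"Z = {z:.2f}, P = {p:.4f}, LOD = {lod:.2f}")
    print(f"P at LOD 3.6 = {lod_to_p(3.6):.1e}")
```

Under these assumptions the worked example lands close to the published values, and an LOD score of 3.6 corresponds to a P value on the order of 2 × 10⁻⁵, which is why the linkage threshold is read as P at or below that level.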

REFERENCES

1. Khoury MJ, Beaty TH, Cohen BH. Fundamentals of genetic epidemiology. New York: Oxford University Press; 1993.
2. Neel JV, Schull WJ. Human heredity. Chicago: University of Chicago Press; 1954:283–306.
3. Vogel F, Motulsky AG. Human genetics, problems and approaches. 2nd ed. Berlin: Springer-Verlag; 1986.
4. Introduction. In: Morton NE, Chung CS. Genetic epidemiology. New York: Academic Press; 1978:3–11.
5. Cohen BH. Chronic obstructive pulmonary disease: a challenge in genetic epidemiology. Am J Epidemiol 1980;112:274–288.
6. Introduction. In: Gordis L. Epidemiology. Philadelphia: WB Saunders; 1996:5–6.
7. Hwang SJ, Beaty TH, Panny SR, Street NA, Joseph JM, Gordon S, et al. Association study of transforming growth factor alpha (TGFα) TaqI polymorphism and oral clefts: indication of gene-environment interaction in a population-based sample of infants with birth defects. Am J Epidemiol 1995;141:629–636.
8. Mutation and instability of human DNA. In: Strachan T, Read AP. Human molecular genetics. New York: Wiley-Liss; 1996:259–261.
9. King MC, Lee GM, Spinner NB, Thomson G, Wrensch MR. Genetic epidemiology. Annu Rev Public Health 1984;5:1–52.
10. Mettlin C, Corghan I, Natarajan N, Lane W. The association of age and familial risk in a case-control study of breast cancer. Am J Epidemiol 1990;131:973–986.
11. Slater E, Cowie V. Genetics of mental disorders. Oxford: Oxford University Press; 1970.
12. Elston RC. Segregation analysis. Adv Hum Genet 1981;11:63–120.
13. Susser M. Separating heredity and environment. Am J Prev Med 1985;1:5–23.
14. Ollier WER, MacGregor A. Genetic epidemiology of rheumatoid disease. Br Med Bull 1995;51:267–285.
15. Ottman R. An epidemiologic approach to gene-environment interaction. Genet Epidemiol 1990;7:177–186.
16. Shaw GM, Wasserman CR, Lammer EJ, O’Malley CD, Murray JC, Basart AM, et al. Orofacial clefts, parental cigarette smoking, and transforming growth factor-alpha gene variants. Am J Hum Genet 1996;58:551–561.
17. Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol 1996;144:207–213.
18. Morton NE, MacLean CJ. Analysis of family resemblance. III. Complex segregation analysis of quantitative traits. Am J Hum Genet 1974;26:489–503.
19. Newman B, Austin MA, Lee M, King MC. Inheritance of human breast cancer: evidence for autosomal dominant transmission in high-risk families. Proc Natl Acad Sci U S A 1988;85:3044–3048.
20. Panhuysen CIM, Meyers DA, Postma DS, Levitt RC, Bleecker ER. The genetics of asthma and atopy. Allergy 1995;50:863–869.
21. Bouchard C. The genetics of obesity: from genetic epidemiology to molecular markers. Mol Med Today 1995:45–50.
22. Moll PP, Michels VV, Weidman WH, Kottke BA. Genetic determination of plasma apolipoprotein AI in a population-based sample. Am J Hum Genet 1989;44:124–139.
23. Prenger VL, Beaty TH, Kwiterovich PO. Genetic determination of high-density lipoprotein cholesterol and apolipoprotein A-I plasma levels in a family study of cardiac catheterization patients. Am J Hum Genet 1992;51:1047–1057.
24. Pennington BF, Gilger JW, Pauls D, Smith SA, Smith SD, DeFries JC. Evidence for major gene transmission of developmental dyslexia. J Am Med Assoc 1991;266:1527–1534.
25. Wyszynski DF, Beaty TH, Maestri NE. Genetics of non-syndromic oral clefts revisited. Cleft Palate Craniofac J 1996;33:406–417.
26. Statistical analysis for genetic epidemiology. Available: http://darwin.mhmc.cwru.edu/pub/sage.html. Accessed 25 September 1997.
27. Hwang S-J, Beaty TH, Liang K-Y, Coresh J, Khoury MJ. Minimum sample size estimation to detect gene-environment interaction in case-control designs. Am J Epidemiol 1994;140:1029–1037.
28. Ryder LP, Andersen E, Svejgaard A. HLA and disease registry: third report. Copenhagen: Munksgaard; 1979.
29. Braun WE. HLA and disease. Boca Raton, Florida: Chemical Rubber Company Press; 1979.
30. Tiret L, Rigat B, Visvikis S, Breda C, Corvol P, Cambien F, et al. Evidence, from combined segregation and linkage analysis, that a variant of the angiotensin I-converting enzyme (ACE) gene controls plasma ACE levels. Am J Hum Genet 1992;51:197–205.
31. Jeunemaitre X, Soubrier F, Kotelevtsev YV, Lifton RR, Williams CS, Charry A, et al. Molecular basis of human hypertension: role of angiotensinogen. Cell 1992;71:169–180.
32. Pericak-Vance MA, Haines JL. Genetic susceptibility to Alzheimer disease. Trends Genet 1995;11:504–508.
33. Bain SC, Prins JB, Hearne CM, Rodriguez NR, Rowe BR, Pritchard LE, et al. Insulin gene region-encoded susceptibility to type 1 diabetes is not restricted to HLA-DR4-positive individuals. Nature Genet 1992;2:212–215.
34. Lander ES, Schork NJ. Genetic dissection of complex traits. Science 1994;265:2037–2048.
35. Spielman RS, Ewens WJ. The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet 1996;59:983–989.
36. Spielman RS, McGinnis RE, Ewens WJ. Transmission disequilibrium test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506–516.
37. Thomson G. Mapping disease genes: family-based association studies. Am J Hum Genet 1995;57:487–498.
38. Kendall MG, Stuart A. Volume 2: Inference and relationship. In: The advanced theory of statistics. 4th ed. London: Griffin; 1979.
39. Duffy DL. Screening a 2 cM genetic map for allelic association: a simulated oligogenic trait. Genet Epidemiol 1995;12:595–600.
40. Bickeböller H, Clerget-Darpoux F. Statistical properties of the allelic and genotypic transmission/disequilibrium test for multiallelic markers. Genet Epidemiol 1995;12:865–870.
41. McKusick VA. History of medical genetics. In: Rimoin DL, Connor JM, Pyeritz RE, eds. Emery and Rimoin’s principles and practice of medical genetics. 3rd ed. New York: Churchill Livingstone; 1996.
42. Edwards AWF. Likelihood. Cambridge: Cambridge University Press; 1972.
43. Terwilliger J, Ott J. Handbook of human genetic linkage. Baltimore: Johns Hopkins University Press; 1994.
44. An alphabetic list of genetic analysis software. Available: http://linkage.rockefeller.edu/soft/list.html. Accessed 25 September 1997.
45. Morton NE. Sequential tests for the detection of linkage. Am J Hum Genet 1955;7:277–318.
46. Lander E, Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genet 1995;11:241–247.
47. Tienari PJ, Terwilliger JD, Ott J, Palo J, Peltonen L. Two-locus linkage analysis in multiple sclerosis (MS). Genomics 1994;19:320–325.
48. Xu J, Levitt RC, Panhuysen CIM, Postma DS, Taylor EW, Amelung PJ, et al. Evidence for two unlinked loci regulating total serum IgE levels. Am J Hum Genet 1995;57:425–430.
49. Engel LW. The human genome project: history, goals, and progress. Arch Pathol Lab Med 1993;117:459–465.
50. Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice K, et al. A gene map of the human genome. Science 1996;274:540–546.
51. Pembrey ME, Anionwu EN. Ethical aspects of genetic screening and diagnosis. In: Rimoin DL, Connor JM, Pyeritz RE, eds. Emery and Rimoin’s principles and practice of medical genetics. 3rd ed. New York: Churchill Livingstone; 1996.
52. Human Genome Project information. Ethical, legal, and social issues (ELSI). Available: http://www.ornl.gov/TechResources/Human_Genome/resource/elsi.html. Accessed 25 September 1997.

Manuscript received on 22 April 1996. Revised version accepted for publication on 5 February 1997.

SUMMARY

Genetic epidemiology: an expanding scientific discipline

Genetic epidemiology is a relatively recent discipline that studies the interaction between genetic and environmental factors in the origin of human diseases. Drawing on genetic markers developed through molecular biology, complex computer-based algorithms, and large databases, genetic epidemiology has developed notably over the last 10 years. This article describes the objectives of genetic epidemiology and its methodology, using concrete examples from the recent scientific literature.

Rev Panam Salud Publica/Pan Am J Public Health 3(3), 1998 187