Theory and Practice in Quantitative Genetics
Daniëlle Posthuma¹, A. Leo Beem¹, Eco J. C. de Geus¹, G. Caroline M. van Baal¹, Jacob B. von Hjelmborg², Ivan Iachine³, and Dorret I. Boomsma¹

¹ Department of Biological Psychology, Vrije Universiteit Amsterdam, The Netherlands
² Institute of Public Health, Epidemiology, University of Southern Denmark, Denmark
³ Department of Statistics, University of Southern Denmark, Denmark

Address for correspondence: Daniëlle Posthuma, Vrije Universiteit, Department of Biological Psychology, van der Boechorststraat 1, 1081 BT, Amsterdam, The Netherlands. Email: [email protected]

Twin Research Volume 6 Number 5 pp. 361–376. https://doi.org/10.1375/twin.6.5.361

With the rapid advances in molecular biology, the near completion of the human genome, the development of appropriate statistical genetic methods and the availability of the necessary computing power, the identification of quantitative trait loci has now become a realistic prospect for quantitative geneticists. We briefly describe the theoretical biometrical foundations underlying quantitative genetics. These theoretical underpinnings are translated into mathematical equations that allow the assessment of the contribution of observed (using DNA samples) and unobserved (using known genetic relationships) genetic variation to population variance in quantitative traits. Several statistical models for quantitative genetic analyses are described, such as models for the classical twin design, multivariate and longitudinal genetic analyses, extended twin analyses, and linkage and association analyses. For each, we show how the theoretical biometrical model can be translated into algebraic equations that may be used to generate scripts for statistical genetic software packages, such as Mx, Lisrel, SOLAR, or MERLIN. For the first of these programs, Mx, a web library of freely available scripts (available from http://www.psy.vu.nl/mxbib) has been developed that can be used to conduct all genetic analyses described in this paper.

“Genetic factors explain x% of the population variance in trait Y” is an oft-heard outcome of quantitative genetic studies. Usually this statement derives from (twin) family research that exploits known genetic relationships to estimate the contribution of unknown genes to the observed variance in the trait. It does not imply that any specific genes that influence the trait have been identified. Given the rapid advances made in molecular biology (Nature Genome Issue, February 15, 2001; Science Genome Issue, February 16, 2001), the near completion of the human genome and the development of sophisticated statistical genetic methods (e.g., Dolan et al., 1999a, 1999b; Fulker et al., 1999; Goring, 2000; Terwilliger & Zhao, 2000), the identification of specific genes, even for complex traits, has now become a realistic prospect for quantitative geneticists. To identify genes, family studies, specifically twin family studies, again appear to have great value, for they allow simultaneous modelling of observed and unobserved genetic variation. As a “proof of principle”, genomEUtwin will perform genome-wide genotyping in twins to target genes for the complex traits of stature, body mass index (BMI), coronary artery disease and migraine. To increase power, epidemiological and phenotypic data from eight participating twin registries will be simultaneously analysed.

In this paper the main theoretical foundations underlying quantitative genetic analyses that are used within the genomEUtwin project will be described. In addition, an algebraic translation from theoretical foundation to advanced structural equation models will be made that can be used in generating scripts for statistical genetic software packages.

Observed, Genetic, and Environmental Variation

The starting point for gene finding is the observation of population variation in a certain trait. This “observed”, or phenotypic, variation may be attributed to genetic and environmental causes. Genetic and environmental effects interact when the same variant of a gene differentially affects the phenotype in different environments.

About 1% of the total genome sequence is estimated to code for protein and an additional but still unknown percentage of the genome is involved in regulation of gene expression. Human individuals differ from one another by about one base pair per thousand. If these differences occur within coding or regulatory regions, phenotypic variation in a trait may result. The different effects of variants (“alleles”) of the same gene are the basis of the model that underlies quantitative genetic analysis.

Quantifying Genetic and Environmental Influences: The Quick and Dirty Approach

In human quantitative genetic studies, genetic and environmental sources of variance are separated using a design that includes subjects of different degrees of genetic and environmental relationship (Fisher, 1918; Mather & Jinks, 1982). A widely used design compares phenotypic resemblance of monozygotic (MZ) and dizygotic (DZ) twins. Since MZ twins reared together share part of their environment and 100% of their genes (but see Martin et al., 1997), any resemblance between them is attributed to these two sources of resemblance. The extent to which MZ twins do not resemble each other is ascribed to unique, non-shared environmental factors, which also include measurement error. Resemblance between DZ twins reared together is also ascribed to the sharing of the environment, and to the sharing of genes. DZ twins share on average 50% of their segregating genes, so any resemblance between them due to genetic influences will be lower than for MZ pairs. The extent to which DZ twins do not resemble each other is due to non-shared environmental factors and to non-shared genetic influences.

Genetic effects at a single locus can be partitioned into additive (i.e., the effect of one allele is added to the effect of another allele) or dominant (the deviation from purely additive effects) effects, or a combination. The total amount of genetic influence on a trait is the sum of the additive and dominance effects of alleles at multiple loci, plus variance due to the interaction of alleles at different loci (epistasis; Bateson, 1909). The expectation for the phenotypic resemblance between DZ twins due to genetic influences depends on the underlying (and usually unknown) mode of gene action. If all contributing alleles act additively and there is no interaction between them within or between loci, the correlation of genetic effects in DZ twins will be on average 0.50. However, if some alleles act in a dominant way the correlation of genetic dominance effects will be 0.25. The presence of dominant gene action thus reduces the expected phenotypic resemblance in DZ twins relative to MZ twins. Epistasis reduces this similarity even further, the extent depending on the number of loci involved and their relative effect on the phenotype (Mather & Jinks, 1982). Depending on the nature of the types of familial relationships within a dataset, additive genetic, dominant genetic, and shared and non-shared environmental influences on a trait can be estimated. For example, employing a design including MZ and DZ twins reared together allows decomposition of the phenotypic variance into components of additive genetic variance, non-shared [...]

[...] can be obtained by subtracting the MZ correlation from unit correlation ($e^2 = 1 - r_{MZ}$).

These intuitively simple rules are described in textbooks on quantitative genetics and can be understood without knowledge of the relative effects and location of the actual genes that influence a trait, or the genotypic effects on phenotypic means. These point estimates, however, depend on the accuracy of the MZ and DZ correlation estimates and the true causes of variation of a trait in the population. For small sample sizes (i.e., most of the time) they may be grossly misleading. Knowledge of the underlying biometrical model becomes crucial when one wants to move beyond these twin-based heritability estimates, for instance, to add information from multiple additional family members or to estimate simultaneously, from a number of different relationships, the magnitude of genetic variance in the population.

The Classical Biometrical Model: From Single Locus Effects on a Trait Mean to the Decomposition of Observed Variation in a Complex Trait

Although within a population many different alleles may exist for a gene (e.g., Lackner et al., 1991), for simplicity we describe the biometrical model assuming one gene with two possible alleles, allele A1 and allele A2. By convention, allele A1 has a frequency $p$, while allele A2 has frequency $q$, and $p + q = 1$. With two alleles there are three possible genotypes: A1A1, A1A2, and A2A2 with genotypic frequencies $p^2$, $2pq$, and $q^2$, respectively, under random mating. The genotypic effect on the phenotypic trait (i.e., the genotypic value) of genotype A1A1 is called “a” and the effect of genotype A2A2 “-a”. The midpoint of the phenotypes of the homozygotes A1A1 and A2A2 is by convention 0, so a is called the increaser effect, and -a the decreaser effect. The effect of genotype A1A2 is called “d”. If the mean genotypic