Polygenic Inheritance, GWAS, Polygenic Risk Scores, and the Search for Functional Variants PERSPECTIVE Daniel J
Total Page:16
File Type:pdf, Size:1020Kb
PERSPECTIVE Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants PERSPECTIVE Daniel J. M. Croucha and Walter F. Bodmerb,1 Edited by Mary-Claire King, University of Washington, Seattle, WA, and approved July 2, 2020 (received for review May 22, 2020) The reconciliation between Mendelian inheritance of discrete traits and the genetically based correlation between relatives for quantitative traits was Fisher’s infinitesimal model of a large number of genetic variants, each with very small effects, whose causal effects could not be individually identified. The de- velopment of genome-wide genetic association studies (GWAS) raised the hope that it would be possible to identify single polymorphic variants with identifiable functional effects on complex traits. It soon be- came clear that, with larger and larger GWAS on more and more complex traits, most of the significant associations had such small effects, that identifying their individual functional effects was essentially hope- less. Polygenic risk scores that provide an overall estimate of the genetic propensity to a trait at the individual level have been developed using GWAS data. These provide useful identification of groups of individuals with substantially increased risks, which can lead to recommendations of medical treatments or behavioral modifications to reduce risks. However, each such claim will require extensive investigation to justify its practical application. The challenge now is to use limited genetic association studies to find individually identifiable variants of significant functional effect that can help to understand the molecular basis of complex diseases and traits, and so lead to improved disease prevention and treatment. This can best be achieved by 1) the study of rare variants, often chosen by careful candidate assessment, and 2) the careful choice of phenotypes, often extremes of a quantitative variable, or traits with relatively high heritability. GWAS | association mapping | polygenic scores The key to Mendel’s successful demonstration of the This was entitled “The correlation between relatives discrete nature of inheritance, without which it would on the supposition of Mendelian inheritance.” He not be possible to maintain the genetic variability re- showed that the observed correlations could be quired by Darwinian evolution by natural selection, explained by a model in which a large number of dis- was his careful choice of clearly dichotomous pheno- crete inherited differences were each inherited types for study. Only in this way could he have ob- according to Mendel’s laws and where each had a served the simple ratios that defined his laws. That is small effect on the quantitative trait in question. The why we refer to clearly inherited, often extreme, dif- cumulative effect of such variants at many loci could ferences that are obviously familial, as “Mendelian.” be assumed to lead to a normal distribution, and it was The ultimate reconciliation between Mendelian in this paper that Fisher introduced the term variance discrete inheritance and the correlation between as we now know it and the concept of the analysis of relatives for continuously inherited traits, such as variance. This “infinitesimal model” is commonly con- height or weight, which had been clearly observed by sidered to be the founding principle of quantitative Francis Galton and his biometrician followers and genetics. It explains observed correlations between which implied inherited tendencies for such traits, was relatives based on Mendelian genetics but does not provided by R. A. Fisher (1) in his seminal 1918 paper. attempt to identify the causal effects of particular aJDRF/Wellcome Diabetes and Inflammation Laboratory, Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom; and bCancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, United Kingdom Author contributions: W.F.B. designed research; D.J.M.C. and W.F.B. performed research; D.J.M.C. and W.F.B. analyzed data; and D.J.M.C. and W.F.B. wrote the paper. The authors declare no competing interest. This article is a PNAS Direct Submission. Published under the PNAS license. 1To whom correspondence may be addressed. Email: [email protected]. This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2005634117/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.2005634117 PNAS Latest Articles | 1of10 Downloaded by guest on October 1, 2021 genetic variants on the quantitative trait being studied. Fisher defined genes with recognizably large effects on the chance of returned only once to the analysis of quantitative inheritance us- getting diabetes. ing the infinitesimal model, and this was in the context of plant breeding (2). Here, his aim was to show how statistical approaches HLA and Disease Associations based on analyzing the quantitative distribution of a trait could be While there were claims of associations between ABO types and useful even, as he put it, “when individual factors cannot certain diseases starting in the early 1950s, it was the really striking be recognized....” associations of HLA types with autoimmune diseases, notably At that time, Fisher had become very interested in the HLA-B*27 with ankylosing spondylitis (6), that changed the whole serological techniques then being developed by Charles Todd, field of studying multifactorial inheritance by looking for associa- which were the basis for the later recognition of red cell blood tions between phenotypes and specific, possibly causal, genetic groups. Fisher suggested in a letter to Todd, with remarkable variants. The first suggestion that linkage disequilibrium (LD) with foresight, that these techniques might be able to detect “the di- an unobserved causal variant could account for associations be- rect products of individual genes rather than (those that) have tween a genetic variant and a disease was made by Bodmer (7), secondary reactions.” In this correspondence with Todd, Fisher based on early data on Hodgkin’s disease and this idea was fur- later said that he believed that such “work is going to lead to a ther developed in the context of autoimmune diseases by greater advance, both theoretical and practical, in the problems of McDevitt and Bodmer (8). The role of HLA in the control of the human genetics than can be expected from any further work on immune response provided a clear rationale for testing these as- biometrical or genealogical lines.” Here, he was foreseeing the sociations. Early examples of the use of LD with common HLA possibility of identifying biochemically the products of genes polymorphic alleles for identifying causal variants in novel closely whose variants determine a given human trait, and so identifying linked genes are the discovery of the key role of an HLA-DQ allele the effects of a given variant on the trait at the molecular level. in susceptibility to type I diabetes through the initial disease as- The transition from the “work on biometrical or genealogical sociation with HLA-B*8,15 and then HLA-DR*3,4 (9,10),and lines” to the wish to identify the direct products of genes epito- the discovery of HFE gene variants as the main underlying cause mizes the prevailing conflict in studying the genetics of complex of hereditary hemochromatosis through its association with traits. This conflict is in the contrast between analyses that do not HLA-A*3 (11). aim to identify the contributions of individual genetic variants and those that aim to understand the underlying molecular basis of the Genome-Wide Association Studies trait variation through the identification of specific variants, in It was an obvious possibility to extend the HLA and disease as- genes with defined functions, that have clearly defined effects sociation studies to looking for the association between any on the trait. The concept of a gene in the context we are discus- marker and a disease on the assumption that this could lead, by sing, is any defined mapped DNA sequence in the genome that LD, to the discovery of the functionally relevant variant in the vi- can have some functional effect and so within which a variant cinity of the gene carrying the associated variant (12). Thus, the sequence may have some detectable differential effect on a given interpretation of the reason for an observed association between phenotype. It is this dichotomy of approaches between not iden- a given identified variant and a given phenotype is that either tifying the contribution of individual variants, and understanding there is a direct effect of the variant on the phenotype or that the their molecular function, which we analyze in this paper. association is due to the effect of one or more as-yet-not- identified variants all in strong LD with observed associated vari- Polygenic Inheritance and the Location of Polygenes ant, and collectively with a larger effect on the phenotype being The first use of polygenic in the context of quantitative inheritance studied than the originally associated variant. was by Mather (3) in 1941. He talked of polygenic characters and Many initial studies looking for such disease associations in polygenic variation and postulated that this type of inheritance, genes chosen as likely functionally relevant candidates proved following Fisher’s infinitesimal model, was controlled by a differ- difficult to repeat. This