Type 2 Diabetes: Genetic Data Sharing to Advance Complex Disease Research
REVIEWS

Jason Flannick1,2 and Jose C. Florez1–4

1Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA.
2Center for Human Genetic Research, Department of Medicine, Massachusetts General Hospital, 185 Cambridge Street, Boston, Massachusetts 02062, USA.
3Diabetes Research Center, Diabetes Unit, Department of Medicine, Massachusetts General Hospital, 185 Cambridge Street, Boston, Massachusetts 02062, USA.
4Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, Massachusetts 02115, USA.
Correspondence to J.C.F. [email protected]
doi:10.1038/nrg.2016.56
Published online 11 Jul 2016
NATURE REVIEWS | GENETICS ADVANCE ONLINE PUBLICATION ©2016 Macmillan Publishers Limited. All rights reserved.

Abstract | As with other complex diseases, unbiased association studies followed by physiological and experimental characterization have for years formed a paradigm for identifying genes or processes of relevance to type 2 diabetes mellitus (T2D). Recent large-scale common and rare variant genome-wide association studies (GWAS) suggest that substantially larger association studies are needed to identify most T2D loci in the population. To hasten clinical translation of genetic discoveries, new paradigms are also required to aid specialized investigation of nascent hypotheses. We argue for an integrated T2D knowledgebase, designed for a worldwide community to access aggregated large-scale genetic data sets, as one paradigm to catalyse convergence of these efforts.

Heritability
The proportion of phenotypic variance in a population owing to genetic differences, as opposed to environmental differences.

Genetic studies of human disease seek insight into disease processes or novel therapies from naturally occurring heritable variation in a population. Characterizing the degree to which disease-informative variants exist, identifying them and their effects, and understanding how they might guide preventive or therapeutic interventions are thus related but distinct goals of genetic association studies.

Type 2 diabetes mellitus (T2D) is a complex, heritable disease with a sibling relative risk of approximately 2 (REF. 1) and a heritability estimated at 30–70% (REF. 2). For decades, analyses of natural genetic variation have been used to understand T2D mechanisms or to improve prognoses for the approximately 415 million people who suffer from its effects; these include prolonged hyperglycaemia from insulin resistance and relative insulin deficiency3, numerous micro- and macrovascular complications4 and an increased risk of early death5.

Before common variant genome-wide association studies (GWAS), it was not known to what extent such ‘experiments of nature’ could lend insight into T2D. Early genetic mapping studies for T2D involved comparatively small-scale candidate gene and linkage studies. Although these approaches localized several disease genes (for example, PPARG (REF. 6), KCNJ11 (REF. 7) and TCF7L2 (REF. 8)), they were on the whole unsuccessful9. By contrast, GWAS yielded, for the first time, a substantial number of genomic loci reproducibly linked to T2D risk, suggesting previously unsuspected biological pathways10,11. And yet, the modest fraction of heritability attributable to GWAS associations, together with insufficient resolution to identify specific causal variants or effector transcripts, led to suggestions that the endeavour produced minimal insights12,13.

In the era that has followed these early GWAS, studies have become much larger and more diverse, technologies assay rarer variants across the entire exome or genome, and biological experiments have begun to translate reported associations to insights into causal variants and disease processes. Here, we review the history and recent progress towards mapping genes for T2D and consequent implications for disease genetic architecture. We then review studies to understand pathogenic mechanisms that may point to new therapeutic modalities. We argue that future research into each area will be most powerful and synergistic with a new model for open and interactive sharing of data and results between previously loosely coupled communities.

Mapping disease loci

Common variant association studies. Motivated by the increased power of association studies over prior family studies14, as well as the common disease common variant hypothesis (CDCV hypothesis)15, GWAS exploited inexpensive single-nucleotide polymorphism (SNP) microarrays and reference catalogues of population-level variation16 to analyse through linkage disequilibrium nearly all common variants in the genome. The first T2D GWAS, which studied a few thousand samples each17–21, collectively identified 10 genomic loci, and many loci were reported by multiple studies. These findings validated the GWAS design but suggested that common variants of large effects were mostly absent from the population: all associated variants affected risk by less than 40%, and most affected risk by closer to 15%.

In order to identify common variants of weaker effects, collaboration and data sharing were necessary to increase power (FIG. 1). The Diabetes Genetics Replication
and Meta-analysis (DIAGRAM) Consortium increased sample size above 10,000 (with approximately 4,500 T2D cases), identifying a further six loci22. Enabled by a willingness of researchers to collaborate rather than to compete, an availability of resources and tools to harmonize data sets23 and a means to share statistics while protecting study participants24, GWAS meta-analysis consortia quickly

in either overlapping samples or more diverse ethnic groups32–34. The similar goals, organization and methodology of these consortia, as well as of those investigating other metabolic traits, led to the joint design of a custom genotyping array35 that facilitated analysis of still larger sample sizes. A T2D association study used this array to analyse almost 150,000 individuals (approximately 34,800

[Figure 1: plot of T2D association studies. x-axis: Year (2001–2016); y-axis: Sample size (1,000s). Legend: ancestry of the sample set (African American, East Asian, European, Hispanic or Native American, South Asian); initial and replication sample size; study technology (linkage or candidate gene, GWAS or Metabochip, exome array, genome or exome sequencing).]

Figure 1 | The history of T2D GWAS. Over the years, type 2 diabetes mellitus (T2D) genome-wide association studies (GWAS), which typically consist of a two-stage discovery and replication study design, have increased in size and diversity. Plotted are circles representing T2D GWAS, as specified by the US National Human Genome Research Institute–European Bioinformatics Institute (NHGRI–EBI) GWAS catalogue163, as well as additional candidate gene or sequencing studies of note. The x-axis shows the year of publication, whereas the y-axis shows discovery sample size. The inner (darker) circles are also scaled in proportion to discovery sample size, whereas the outer (lighter) circles are scaled in proportion to total (discovery + replication) sample size. Circles are coloured according to ethnic composition of the sample set: African American (dark blue), East Asian (light blue), European (purple), Hispanic or Native American (yellow) or South Asian (green). PubMed identifiers (or first author name) for each study are shown at the base of the figure and connected to the corresponding circle with a dotted line. Identifiers are coloured according to the technology used in the study: linkage or candidate gene studies (red), GWAS or Metabochip (beige), exome array (orange) or sequencing (dark grey). Additionally, T2D loci are listed directly above the first reporting study. With some exceptions based on retrospective knowledge, a P value of 5 × 10⁻⁸ is used as a threshold for significance, and loci are named as originally reported.

Genome-wide association studies (GWAS).
An approach for genetic mapping that compares frequencies of variants across the genome between disease cases and matched controls. This is a paradigm for identifying genes or biological