Review Type 2 diabetes: new , new understanding

Inga Prokopenko1,2, Mark I. McCarthy1,2,3 and Cecilia M. Lindgren1,2

1 Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Old Road, Headington, Oxford, OX3 7LJ, UK 2 Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Headington, Oxford, OX3 7BN, UK 3 Oxford National Institute for Health Research (NIHR) Biomedical Research Centre, Churchill Hospital, Old Road, Headington, Oxford, OX3 7LJ, UK

Over the past two years, there has been a spectacular such as T2D, involved either hypothesis-free genome-wide change in the capacity to identify common genetic var- linkage mapping in families with multiply affected sub- iants that contribute to predisposition to complex multi- jects or association studies within ‘candidate’ genes using factorial phenotypes such as type 2 diabetes (T2D). The case-control samples or parent–offspring trios. The former principal advance has been the ability to undertake suffered from being seriously underpowered for sensible surveys of genome-wide association in large study susceptibility models, because linkage is best placed to samples. Through these and related efforts, 20 com- detect variants with high penetrance. As far as we can mon variants are now robustly implicated in T2D tell, common variants with high penetrance do not con- susceptibility. Current developments, for example in tribute substantially to risk of common forms of T2D and high-throughput resequencing, should help to provide few, if any, robust signals have emerged from such efforts a more comprehensive view of T2D susceptibility in the [9]. The latter candidate- association approach has near future. Although additional investigation is needed historically been compromised by difficulties associated to define the causal variants within these novel T2D- with choosing credible gene candidates. Selection was susceptibility regions, to understand disease mechan- typically based on some particular hypothesis about the isms and to effect clinical translation, these findings are biological mechanisms that are putatively involved in T2D already highlighting the predominant contribution of pathogenesis but, because the function of much of the defects in pancreatic b-cell function to the development genome is poorly characterized, it remains almost imposs- of T2D. ible to make fully informed decisions. In addition, all too often these candidate-gene studies were conducted in A leap forward in complex trait genetics sample sets that were far too small to offer confident Type 2 diabetes (T2D) is a common, chronic, complex detection of variants with the kinds of effect sizes that disorder of rapidly growing global importance. It accounts are now known to be realistic. Only rarely could the find- for >95% of diabetes worldwide and is characterized ings of one study be replicated in another [10–12]. With by concomitant defects in both insulin secretion (from hindsight, it is easy to appreciate why these approaches the b-cells in the pancreatic islets) and insulin action (in yielded so few examples of genuine diabetes-susceptibility fat, muscle, liver and elsewhere), the latter being typically variants. associated with obesity [1] (Figure 1). Although strong In fact, only two of the many candidate-gene associations evidence for familial clustering points to the contribution claimed for T2D have stood the test of time. The Pro12Ala of genetic mechanisms, recent rapid changes in diabetes variant in the peroxisome proliferator-activated receptor incidence (which has, in the United States, doubled in two gamma (PPARG) gene (encoding the target for the thiazo- decades) and prevalence (which is well above 10% in some lidinedione class of drugs used to treat T2D) [11] and the societies) indicate that environmental and lifestyle factors Glu23Lys variant in KCNJ11 (the potassium inwardly rec- are also of major relevance [1,2]. tifying channel, subfamily J, member 11, which encodes part Despite strenuous efforts over the past two decades to of the target for another class of diabetes drug, the sulpho- identify the genetic variants that contribute to individual nylureas) [12] are both common polymorphisms shown in differences in predisposition to T2D, the story until multiple studies to influence risk of T2D. Their effect sizes recently was characterized by slow progress and limited are only modest, each copy of the susceptibility allele success. The advent of genome-wide association (GWA) increasing risk of disease by 15–20%. Interestingly, rare analysis has transformed the potential for researchers to mutations in both KCNJ11 and PPARG are also known to be uncover variants influencing complex, common phenotypes causal for certain rare monogenic syndromes (neonatal – including T2D – and has resulted in the identification of a diabetes and lipodystrophies) characterized by severe meta- growing number of trait susceptibility loci [3–8]. bolic disturbance of b-cell function and insulin resistance, Until 2006, the main approaches used to track down respectively [13,14]. common variants influencing common dichotomous traits, The capacity to undertake efficient, accurate association analysis on a far larger scale than previously (culminating Corresponding author: McCarthy, M.I. ([email protected]). in GWA analysis) has transformed this picture. As we

0168-9525/$ – see front matter ß 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2008.09.004 Available online 25 October 2008 613 Review Trends in Genetics Vol.24 No.12

discuss some of the challenges that will need to be faced, if these findings are to be translated into advances in the clinical management of those with diabetes.

The first moves towards large-scale association mapping The earliest indication that the ‘hypothesis-free’ associ- ation approach to gene identification might succeed for T2D came from the discovery that variants within the transcription factor 7-like 2 (TCF7L2) gene had a substan- tial effect on T2D susceptibility [15]. TCF7L2 encodes a transcription factor that is active in the Wnt-signalling pathway and that had no ‘track-record’ as a candidate for T2D; indeed, this susceptibility effect was detected through a search for microsatellite associations across a large region of 10 that had been previously implicated in T2D susceptibility by linkage [16]. Sub- sequent fine-mapping efforts localized the likely causal variant(s) to an intron within TCF7L2 [15,17]. The fact that this signal was found within a region of apparent T2D linkage seems to have been serendipitous, because none of these variants within TCF7L2 are capable of explaining the linkage effect [15,17]. Across a swathe of replication studies [3–7,18], it has become clear that TCF7L2 variants have a substantially stronger effect on T2D risk than those in PPARG and KCNJ11, with a per-allele odds ratio of 1.4 Figure 1. Schema for the pathogenesis of T2D. T2D results, in most individuals, (Table 1; Figure 2). As a result, the 10% of Europeans that from concomitant defects in both insulin secretion and insulin action. It is likely that abnormalities in both b-cell mass and b-cell function contribute to the former, are homozygous for the risk allele have approximately whereas obesity is a major cause of deficient insulin action. All processes are likely twice the odds of developing T2D as those carrying no to involve contributions from both inherited and environmental effects. Examples copies [15,18]. The evidence implicating variants within of some of the genes and exposures implicated are shown in the yellow boxes. TCF7L2 in T2D susceptibility has naturally prompted describe in this review, such studies have led to confident efforts to understand the mechanisms involved. Current identification of many novel T2D-susceptibility signals, evidence indicates that alteration of TCF7L2 expression or thereby exposing some of the key mechanisms and path- function disrupts pancreatic islet function, possibly ways involved in the pathogenesis of T2D. Here, we also through dysregulation of proglucagon gene expression,

Table 1. T2D-susceptibility loci for which there is genome-wide significant evidence for associationa (nearest Year association Approach Probable mechanism Index variant Effect sizeb Risk-allele genes) ‘proven’ frequency PPARG 2000 Candidate Insulin action rs1801282 1.14 0.87 KCNJ11 2003 Candidate b-cell dysfunction rs5215 1.14 0.35 TCF7L2 2006 Large-scale association b-cell dysfunction rs7901695 1.37 0.31 FTO 2007 GWA Altered BMI rs8050136 1.17 0.40 HHEX/IDE 2007 GWA b-cell dysfunction rs1111875 1.15 0.65 SLC30A8 2007 GWA b-cell dysfunction rs13266634 1.15 0.69 CDKAL1 2007 GWA b-cell dysfunction rs10946398 1.14 0.32 CDKN2A/2B 2007 GWA b-cell dysfunction rs10811661 1.20 0.83 IGF2BP2 2007 GWA b-cell dysfunction rs4402960 1.14 0.32 HNF1B 2007 Large-scale association b-cell dysfunction rs4430796 1.10 0.47 WFS1 2007 Large-scale association Unknown rs10010131 1.12 0.60 JAZF1 2008 GWA b-cell dysfunction rs864745 1.10 0.50 CDC123/CAMK1D 2008 GWA Unknown rs12779790 1.11 0.18 TSPAN8/LGR5 2008 GWA Unknown rs7961581 1.09 0.27 THADA 2008 GWA Unknown rs7578597 1.15 0.90 ADAMTS9 2008 GWA Unknown rs4607103 1.09 0.76 NOTCH2 2008 GWA Unknown rs10923931 1.13 0.10 KCNQ1 2008 GWA b-cell dysfunction rs2237892 1.29 0.93 aAbbreviations: ADAMTS9, ADAM metallopeptidase with type 1 motif 9; CAMK1D, calcium/calmodulin-dependent protein kinase 1D; CDC123, cell division cycle 123 homologue (Saccharomyces cerevisiae); CDKAL1, CDK5 regulatory subunit-associated protein1-like1; CDKN2B, cyclin-dependent kinase inhibitor 2B; FTO, fat mass and obesity associated; HHEX, haematopoietically expressed homeobox; HNF1B, hepatocyte nuclear factor 1 homeobox B; IDE, insulin degrading ; IGF2BP2, insulin- like growth factor 2 mRNA binding protein 2; JAZF1, juxtaposed with another zinc finger gene 1; KCNJ11, potassium inwardly rectifying channel, subfamily J, member 11; KCNQ1, potassium voltage-gated channel, KQT-like subfamily, member 1; LGR5, leucine-rich repeat-containing G-protein coupled; NOTCH2, Notch homologue 2 (Drosophila); PPARG, peroxisome proliferator-activated receptor gamma; SLC30A8, solute carrier family 30 (zinc transporter), member 8; TCF7L2, transcription factor 7 like 2; THADA, thyroid adenoma associated; TSPAN8, tetraspanin 8; WFS1, Wolfram syndrome1. bEstimates of effect size (given as per-allele odds ratios, i.e. the increase in odds of diabetes per copy of the risk allele) and risk-allele frequencies are all reported for European- descent populations based on available data (Figure 2).

614 Review Trends in Genetics Vol.24 No.12

Figure 2. Effect sizes of known T2D-susceptibility loci. The T2D-susceptibility variants so far discovered have only modest individual effects. The Â-axis gives the per-allele odds ratio (estimated for European-descent samples) for each locus listed on the y-axis. Loci are sorted by descending order of per-allele effect size from TCF7L2 (1.37) to ADAMTS9 (1.09) (Table 1). Loci shown in blue are those identified by GWA approaches, whereas those found by candidate-gene approaches and by large-scale association analyses are shown in yellow and red, respectively. Odds ratios are estimated from data in Refs [3–7,24,25,43,45,44]. These figures are approximate estimates of the true effect sizes at each locus and might be either overestimates (owing to winner’s curse) or underestimates (because the causal variant in most cases is not known yet). leading to reduced insulin secretion and enhanced risk of The third crucial change was the dawning realization by T2D [19]. investigators that earlier estimates of locus effect sizes had been wildly optimistic and that success would require GWA scans identify novel loci influencing T2D analyses of far larger sample sizes than previously con- susceptibility sidered, an objective that has catalysed large-scale inter- The solution to previous disappointing efforts to map T2D- national collaboration [10]. susceptibility loci had been obvious for some time, but it is So far, results from a total of 12 GWA scans for T2D have only in the past two years that it has become possible, from been published (Table 2). Six of these represent ‘high- both technical and economic perspectives, to undertake density’ scans (i.e. at least 300 000 SNPs, offering gen- hypothesis-free GWA testing in samples of sufficient size to ome-wide coverage >65%), all in samples of Northern generate convincing results. The advent of the GWA European descent [3–8]. These studies included >19 000 approach was the result of a ‘perfect storm’ involving at individuals (7142 T2D patients and 11 996 controls) ran- least three components. The first was delivery of a catalo- ging in sample size from 997 to 6674 (Table 2). Two recent gue of patterns of -sequence variation studies in East Asian subjects were on a smaller scale through the efforts of the International HapMap Consor- (featuring between 82 000 and 207 000 typed SNPs in a few tium (http://www.hapmap.org) [20]. One of the important hundred cases only), but large-scale replication did result messages generated by the HapMap was that, in non- in at least one confirmed T2D signal (see later) [24,25]. The African-descent populations at least, extensive corre- other four studies featured a wider array of ethnic groups lations between neighbouring single nucleotide poly- (including Native American [26], Hispanic [27] and popu- morphisms (SNPs; i.e. linkage disequilibrium) could lations of European descent [28,29]) but were less exten- constrain the number of independent genetic tests sive with respect to both sample size and SNP density (all required to survey the genome, such that 80% of all used the Affymetrix 100K array) and will not be considered common variation could be sampled with as few as further here. 500 000 carefully chosen SNPs [21,22]. At the same time, As has been reviewed elsewhere [30], the first wave of new genotyping methods were developed that addressed studies led to identification and mutual replication of six the technical challenges required to perform massively entirely novel T2D-susceptibility loci and confirmed the parallel SNP-typing at high accuracy and low cost [23]. three previously established signals at PPARG, KCNJ11

615 Review Trends in Genetics Vol.24 No.12

Table 2. Overview of GWA scans for type 2 diabetesa,b Study Number of Number of Sample source Genotyping array T2D phenotype Refs cases controls Wellcome Trust Case Control 1924 2938 UK Affymetrix 500K Enrichment for family history of T2D, [7] Consortium AAO <65 years Diabetes Genetics Initiative 1464 1467 Finland, Sweden Affymetrix 500K Partial enrichment for family history [4] and lean T2D deCODE Genetics 1399 5275 Iceland Illumina 300K No specific enrichment for family [6] history, young AAO or BMI Finland–US Investigation of NIDDM 1161 1174 Finland Illumina 300K Partial enrichment for family history [5] Genetics (FUSION) Diabetes Gene Discovery Group 694 645 France Illumina 300K + Family history of T2D, AAO <45 years, [3] Illumina 100K BMI <30 kg/m2 DiaGen 500 497 East Finland, Illumina 300K Enrichment for family history, AAO [8] Germany, UK, <60 years Ashkenazi Pima 300 334 Pima Indians Affymetrix 100K AAO <25 years, enrichment for family [26] history Starr County, Texas 281 280 Mexican Americans Affymetrix 100K Controlled for admixture [27] BioBank Japan 194 1556 Japanese Custom set of Enrichment for T2D cases with [25] 268K SNPs retinopathy Japanese multi-disease 187 1504 Japanese JSNP Genome T2D cases [24] collaborative genome scan Scan 100K SNPs Old Order Amish 124 295 Amish Affymetrix 100K Enrichment for family history [28] Framingham Health Study 91 1087 Massachusetts Affymetrix 100K Incident T2D cases [29] aAbbreviations: AAO, age at onset. bProjects are listed in descending order of case sample size. and TCF7L2 [3–8]. Each of these six loci has subsequently date and, although this does not necessarily exclude a been replicated in studies involving individuals of both contribution to T2D predisposition, it indicates that the European [31] and Asian [32–35] origin. Apart from var- main effects attributable to these variants are small and/or iants within the fat mass and obesity associated (FTO) subject to substantial modification by genetic background gene [36], which exert their T2D effect through a primary or environmental exposures. Either way, it seems likely impact on body mass index (BMI), the remaining signals that exhorbitantly large sample sets will be required before (involving SNPs within or adjacent to hematopoietically such signals can attain the standard of proof now available expressed homeobox [HHEX]/insulin degrading enzyme for the loci described in Table 1. [IDE], CDK5 regulatory subunit associated protein 1-like 1[CDKAL1], insulin-like growth factor 2 mRNA binding Meta-analysis identifies six further T2D-susceptibility protein 2 [IGF2BP2], cyclin-dependent kinase inhibitors loci 2a and 2b [CDKN2A and CDKN2B] and solute carrier Efforts to find additional T2D-susceptibility loci have to family 30, member 8 [SLC30A8]) exert their primary effect contend with the modest effect sizes anticipated and the on insulin secretion [31,37,38]. Other reported signals, stringent significance thresholds required when many such as those in exostoses (multiple) 2 (EXT2) and hundreds of thousands of SNPs are tested in parallel LOC387761 [3] have not been widely substantiated on [39]. The obvious solution is to increase sample size and replication and are less likely to represent genuine signals. the most effective strategy for this involves combining Although it has become convention, largely as a matter of existing GWA data through meta-analysis. The Diabetes convenience, to label these susceptibility loci according to Genetics Replication And Meta-analysis (DIAGRAM) con- the most credible regional candidate (e.g. FTO), it is vital to sortium took such an approach [41], integrating data from remember that these assignments do not imply a con- three previously published GWA scans [4,5,7], thereby firmed mechanistic link. What these GWA studies deliver doubling the sample size compared to the largest of the are association signals, and extensive fine-mapping and individual studies (Table 2)to4500 cases and 5500 functional studies are required before the causal variants controls. The consortium also used novel imputation are identified and the molecular and cellular mechanisms approaches [42] to infer genotypes at additional SNPs that through which they function are precisely defined [39]. were not directly typed on the commercial arrays used for These GWA studies, as well as detecting new loci, the original GWA scans, thereby extending the analysis to provided the first ‘genome-wide’ perspective of the land- a total of 2.2 million SNPs across the genome. (Imputa- scape of T2D susceptibility and thereby enabled clearer tion approaches use information on haplotype structure ‘bench-marking’ of other claimed T2D-susceptibility effects available from a densely typed reference sample [such as for which the accumulated evidence from candidate-gene HapMap] to fill-in genotypes at that subset of untyped studies remained somewhat equivocal [40]. Examples in- SNPs for which confident assignment is possible.) In this clude variants in the genes encoding calpain-10 (CAPN10; study, 69 signals showing the strongest associations in the thought to be involved in b-cell function), insulin (INS;an GWA meta-analysis were genotyped in an initial replica- obvious candidate) and PC-1 (ENPP1; the product of which tion set of 22 426 individuals and the top 11 signals (repre- is known to modulate insulin-receptor function). None of senting ten independent loci) emerging from this second these genes has featured prominently in GWA analyses to analysis were then evaluated in 57 000 further subjects.

616 Review Trends in Genetics Vol.24 No.12

After integrating data from all study subjects, six signals equilibrium with a causal allele present in 4% of cases and reached combined levels of significance that are highly only 1% of controls. Such a variant would have an effect size unlikely to have occurred by chance (P 5 Â 10À8) [41] much larger than that of the ‘index’ variant through which (Table 1; Figure 2). Only one of these signals contained a the signal had been originally detected. gene for which there was reasonable biological candidacy: Ongoing efforts to expand existing meta-analyses of Notch homologue 2, Drosophila (NOTCH2) is known to be European-descent populations [41] and to extend the num- involved in pancreatic development. For the others, map- ber of SNPs taken forward for large-scale replication ping within or adjacent to ADAM metallopeptidase with should uncover additional T2D-susceptibility loci. At the thrombospondin type 1 motif 9 (ADAMTS9), calcium/cal- same time, GWA studies conducted in other major ethnic modulin-dependent protein kinase 1D (CAMK1D), juxta- groups should provide complementary views of the posed with another zinc finger gene 1 (JAZF1), tetraspanin susceptibility landscape of T2D. Indeed, the first GWA 8(TSPAN8)/leucine-rich repeat-containing G-protein scans for T2D reported in East Asian subjects [24,25] coupled (LGR5) and thyroid adenoma associated (THADA), (Table 2) have recently demonstrated how studies in the mechanisms involved remain unclear. non-European descent populations can reveal novel susceptibility loci. In these East Asian scans, variants in Further signals from large-scale candidate-pathway the potassium voltage-gated channel, KQT-like subfamily, studies member 1 gene (KCNQ1) have been shown to influence At the same time as these GWA efforts, which had added a diabetes risk. total of 12 loci to the list, two further loci emerged from Replication studies conducted in samples of East Asian large-scale candidate-pathway studies initiated in the pre- and South Asian origin indicate that the variants found in GWA era. The first of these focused on genes that were European-descent populations tend to exert similar effects thought to be likely to modulate b-cell function and high- on T2D susceptibility, although differences in allele fre- lighted variants in the Wolfram syndrome 1 (WFS1) gene quency can lead to differences in detectability and popu- [43]. Wolfram syndrome is a rare monogenic syndrome, lation-level effect [32–35]. For example, the minor allele which features optic atrophy, diabetes insipidus and deaf- frequency at T2D-associated TCF7L2 variants is far lower ness as well as diabetes mellitus, for which rare mutations in East Asians, thereby rendering the association with T2D in the WFS1 gene are causal [43]. Although the evidence much less visible [34]. The situation at KCNQ1 is reversed for association in the initial report fell just short of strict in that, although the effect size in European-descent significance thresholds at the genome-wide level (the best samples is not that different from that observed in East SNP generated a P-value of 1.4 Â 10À7), replication studies Asians, substantial differences in risk-allele frequency have confirmed this association [44]. The second study mean that the association was far easier to detect in examined variants in genes that were causally implicated studies of Japanese, Korean and Chinese individuals. (It in monogenic forms of diabetes and generated compelling is not yet clear whether these differences in allele fre- evidence that common variants in the gene encoding hep- quency are simply the result of chance genetic drift or atocyte nuclear factor 1-b (HNF1B, also known as TCF2,a constitute evidence for selection.) It is likely, therefore, transcription factor implicated in pancreatic islet devel- that GWA studies performed in non-European descent opment and function) were associated with T2D [45],an populations will reveal additional loci which, by virtue of effect independently confirmed when the same gene differences in allele frequency, haplotype structure, emerged from a GWA analysis for prostate-cancer- genetic background or relevant environmental exposure, susceptibility loci [46]. would be extremely hard to find in European populations, however large the study. Common variants with modest effect sizes For the T2D-susceptibility variants so far discovered Understanding function (Table 1), the risk-allele frequency in Europeans ranges Gene-discovery efforts are primarily motivated by the from approximately 10% (NOTCH2) to 90% (THADA, expectation that, by finding robust genotype–phenotype PPARG). This allele-frequency spectrum is, in part, the associations, we will generate novel insights into disease direct consequence of experimental designs (such as the mechanisms and new avenues for clinical translation. use of commodity genotyping arrays) that have focused on However, the task of moving from association signal to the detection of common variants: these have low power to complete mechanistic understanding will often be labor- detect associations caused by rare causal alleles [47]. Apart ious and early inferences can be misleading. We should from TCF7L2, per-allele effect sizes are, at best, modest expect that, on occasion, genes other than the most obvious (mostly in the 1.1–1.2 range), which explains the massive regional candidates will turn out to be responsible for the sample sizes required for robust inference [41]. susceptibility effect. However, for most of these loci, the causal variant(s) have Nevertheless, for some of the loci identified by GWA, a yet to be identified with any certainty. Although it is likely, compelling functional candidate is clear, the best example for reasons of power, that the causal variants in each of these being SLC30A8 [3]. This gene encodes an islet-specific zinc regions have risk-allele frequencies similar to those of the transporter, which is implicated in the maintenance of variants showing maximal association, this is not inevita- insulin granule function [48]. At this locus, one of the ble. For example, an association detected on genome-wide variants displaying the strongest association is a non- scanning based on a case-control allele frequency difference synonymous SNP (Arg325Trp), which might well of 35% versus 32% could, in principle, reflect linkage dis- represent the causal polymorphism.

617 Review Trends in Genetics Vol.24 No.12

At other loci, more than one excellent biological candi- have similar phenotypic consequences. Further resequen- date maps into the region of the association signal. On cing of the newly identified T2D-susceptibility loci is likely chromosome 10q, for example, the causal variants most to reveal additional instances of lower-frequency, higher- probably lie, on the basis of the patterns of flanking penetrance alleles and might contribute to more effective recombination hot-spots, within a 400-kb region that con- functional analyses of these regions. tains the coding regions of three genes, two of which (HHEX, which encodes a homeobox protein involved in Individual prediction pancreatic development and IDE, which is implicated in In addition to delivering novel biological insights into rodent models of diabetes) have strong biological claims for disease pathogenesis, there is growing interest in the a role in T2D pathogenesis [49–51]. potential for the growing numbers of susceptibility var- At other signals, the links between association and iants to open the door to the provision of individualized function remain less obvious, but more intriguing. The medical care contingent upon the results of personalized chromosome 9p21 T2D association signal maps to a genetic profiling [63]. 15-kb interval that lies 200 kb from the nearest At this stage, however, the prospects for individual protein-coding genes, CDKN2A and CDKN2B. These genes prediction seem limited. In a recent study, Lango and encode cyclin-dependent kinase inhibitors, primarily colleagues addressed the extent to which the known T2D- known for their role in the development of cancers [52]. susceptibility variants might, in combination, provide However, in rodent models at least, overexpression of the quantification of individual T2D risk [64].Itiscertainly CDKN2A homologue recapitulates the T2D phenotype possible to identify small numbers of individuals who, [53], raising the suspicion that remote regulatory effects because they have inherited extremely large or extremely on CDKN2A expression, possibly mediated through small numbers of susceptibility alleles, differ substan- ANRIL, a non-coding RNA which maps to the region of tially with regard to risk of T2D. With the present collec- maximal association, could be responsible [54]. If so, this tion of variants, this amounts to an approximately would be consistent with the observation that several of the fourfold difference in risk between those individuals in newly found T2D-susceptibility loci implicate genes the top and bottom 1% of the distribution of risk-allele involved in cell-cycle regulation, and that there might be counts. some overlap in the mechanisms influencing cancer and When viewed from a more population-based perspect- diabetes predisposition. ive, the situation is less compelling. The discriminative accuracy achieved with the current set of T2D risk variants Lessons about diabetes (0.60) compares unfavourably with the value of 0.78 Several other interesting features emerge from these new possible through use of age, BMI (a standardised measure findings. First, analyses of the associated variants in of obesity obtained by dividing weight in kg by the square of healthy, non-diabetic populations has demonstrated that, the height in metres) and gender alone [64]. for most, the primary effect on T2D susceptibility is For the time being then, these ‘traditional’ risk factors mediated through deleterious effects on insulin secretion, (e.g. age and BMI) provide far more accurate risk-strati- rather than insulin action [31,37,38]. This finding fication than do the common variants so far identified. It addresses a long-standing debate about the relative remains unclear whether providing individuals with infor- importance of inherited defects in insulin secretion as mation based on their personal genetic profiling would opposed to action in the pathogenesis of T2D and focuses actually translate into health benefits through, for attention on mechanisms involved in the regulation of islet example, improved motivation to make lifestyle changes b-cell mass and function. or altered therapeutic interventions. The second interesting observation relates to the fre- quency with which the genes and variants implicated in Next steps T2D-susceptibility are also associated with other traits. GWA studies have certainly kick-started complex trait For example, a different set of variants at the 9p21 region genetics but much work remains to be done before a full influences predisposition to coronary artery disease and to mechanistic picture emerges (Figure 3). In T2D, the scans pathological dilatation of the major blood vessels (‘arterial performed so far have concentrated on European popu- aneurysms’) [55–58]. Variants at HNF1B (TCF2) and lations and have only been designed to detect those var- JAZF1 have clear effects on susceptibility to prostate iants represented on, or tagged by, the sites on the cancer [46,59] and CDKAL1 has recently been revealed commodity genotyping arrays. Only a small proportion as a susceptibility locus for Crohn’s disease [60]. These (<10%) of the overall variance in T2D predisposition has examples of pleiotropy point towards previously unsus- been described. pected overlap in pathogenetic pathways. Clearly, identification of the causal variants responsible Finally, several of the loci implicated in common forms for the association signals uncovered will provide valuable of T2D are also known to harbour rare mutations that are clues to understanding disease predisposition. This is causal for rare, monogenic forms of diabetes. This is true of likely to require detailed re-sequencing of the associated PPARG [13] and KCNJ11 [14], and for HNF1B (TCF2) [61] regions followed by exhaustive fine-mapping in multiple and WFS1 [62]. Of course, having shown that one type of ethnic groups. Even then, linkage disequilibrium might variant in a given gene is capable of disturbing glucose make it impossible to limit the causal candidates to a single metabolism, it is much more likely that variants from other variant and functional studies will be required for defini- parts of the allele-frequency spectrum (if they exist) will tive assignment of causation.

618 Review Trends in Genetics Vol.24 No.12

Figure 3. Future directions for T2D research. The next wave of work to understand the underlying mechanisms and pathways of T2D physiology will focus on finding variants that remain undetected by GWA studies. In addition, analyses of the contribution of rare SNPs and structural variants will help identification of novel loci contributing to T2D risk. In parallel, defining the causal variant(s) in T2D loci will require large scale deep sequencing and fine mapping. A first layer of functional evaluation of the variants within associated loci through expression-quantitative trait locus (eQTL) studies will combine the genetic variation in associated loci with expression analysis data to define regulatory relationships. Studies designed to understand the functional effect of any causal variants in relevant cell systems and animal models will give insight to physiological consequence. These advances will underpin efforts to translate the findings through development of diagnostic tests, risk evaluation and identification of novel drug targets.

In parallel, there are efforts to extend the range of linkage and frequency too low to be captured by current genomic variation encompassed by genome-wide GWA reagents [41]. An improved inventory of such var- approaches. Structural (or copy number) variants (CNVs) iants is emerging from both local and global resequencing are prime candidates here, given growing evidence of their efforts (such as the 1000 Genomes Project; http:// overall contribution to human genomic diversity and of www.1000genomes.org) and the large sample sizes now their phenotypic consequences [65–69]. Armed with more available will enable properly powered targeted associ- complete inventories of the sites of common CNVs and new ation studies to be performed. If these efforts identify tools that will enable genome-wide CNV genotyping (a far low-frequency, intermediate-penetrance variants contri- more demanding task than CNV discovery), the coming buting to T2D predisposition, these are likely to be valu- months should see the first reports evaluating the part able tools for genomic profiling and individual prediction played by CNVs in T2D predisposition. [39]. New high-throughput resequencing approaches should Finally, we can expect to see advances in the under- also enable investigators to obtain a more comprehensive standing of genome function to be increasingly integrated view of susceptibility effects across the allele-frequency into genetic studies. One obvious example of this relates to spectrum. Traditional linkage approaches are well-suited the growing importance attributed to the regulatory role of to finding rare, highly penetrant variants and have been microRNAs (miRNAs) [68]. miRNAs are small, 22-base successful in identifying mutations causal for highly famil- pair non-coding RNAs that bind target mRNAs and modify ial forms of diabetes, such as maturity onset diabetes of the their expression by translational repression or target young. As has been described earlier, the GWA approach is degradation. Because each miRNA is thought to regulate a powerful tool for detecting common variant effects, which many target mRNAs, a picture is emerging of miRNAs as are, in the case of T2D at least, almost exclusively of low- ‘molecular switches’ that are able to activate (or deactivate) penetrance. Neither approach would do well at detecting broad programs of cellular response. variants with characteristics that lie between these Recently, several specific miRNAs have been found to extremes, with penetrance too modest to be detected by have important roles in regulation of glucose and lipid

619 Review Trends in Genetics Vol.24 No.12 metabolism. In humans, there is evidence that miRNA-1 13 Barroso, I. et al. (1999) Dominant negative mutations in human PPARg and miRNA-133 are instrumental in the normal prolifer- associated with severe insulin resistance, diabetes mellitus and hypertension. Nature 402, 880–883 ation and differentiation of skeletal muscle tissue [69] and, 14 Gloyn, A.L. et al. (2004) Activating mutations in the gene encoding the in murine models, miRNA-9 and miRNA-375 have been ATP-sensitive potassium-channel subunit Kir6.2 and permanent shown to regulate insulin secretion [70,71]. miRNA-124a2 neonatal diabetes. N. Engl. J. Med. 350, 1838–1849 has also been implicated in pancreatic b-cell development 15 Grant, S.F.A. et al. (2006) Variant of transcription factor 7-like 2 and function, probably through a binding of FOXA2 lead- (TCF7L2) gene confers risk of type 2 diabetes. Nat. Genet. 38, 320–323 16 Reynisdottir, I. et al. (2003) Localisation of a susceptibility gene for ing to a downstream effect on FOXA2 target genes in- type 2 diabetes to chromosome 5q34-q35.2. Am. J. Hum. Genet. 73, cluding PDX1, KCNJ11 and ABCC8 [72]. Other 323–335 examples of key miRNA regulators include miRNA-143 17 Helgason, A. et al. (2007) Refining the impact of TCF7L2 gene variants (adipocyte differentiation) [73] and miRNA-122 (hepatic on type 2 diabetes and adaptive evolution. Nat. Genet. 39, 218–225 lipid metabolism) [74]. Together, these (and other) studies 18 Groves, C.J. et al. (2006) Association analysis of 6736 UK subjects provides replication and confirms TCF7L2 as a type 2 diabetes provide a strong basis for regarding miRNAs as key func- susceptibility gene with a substantial effect on individual risk. tional candidates relevant to the pathogenesis of T2D, and Diabetes 55, 2640–2644 ongoing and future studies (for example, examining the 19 Lyssenko, V. et al. (2007) Mechanisms by which common variants in contribution of genome sequence variation in miRNA cod- the TCF7L2 gene increase risk of type 2 diabetes. J. Clin. Invest. 117, 2155–2163 ing and target sequences to T2D predisposition) should 20 International HapMap Consortium (2005) A haplotype map of the help to establish the extent to which this hypothesis is true. human genome. Nature 437, 1299–1320 21 Barrett, J.C. and Cardon, L.R. (2006) Evaluating coverage of genome- Conclusion wide association studies. Nat. Genet. 38, 659–662 The ability to undertake genome-wide surveys of the 22 Pe’er, I. et al. (2006) Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat. Genet. 38, 663–667 relationship between sequence variation and common 23 Fan, J.B. et al. (2006) Highly parallel genomic assays. Nat. Rev. Genet. human phenotypes has provided a powerful new tool for 7, 632–644 geneticists. It is clear that substantial advances in the 24 Yasuda, K. et al. (2008) Variants in KCNQ1 are associated with understanding and management of major diseases, in- susceptibility to type 2 diabetes mellitus. Nat. Genet. 40, 1092–1097 cluding T2D, will flow from wider application of these 25 Unoki, H. et al. (2008) SNPs in KCNQ1 are associated with susceptibility to type 2 diabetes in East Asian and European approaches and clinical translation of the findings. populations. Nat. Genet. 40, 1098–1102 26 Hanson, R.L. et al. (2007) A search for variants associated with young- Acknowledgements onset type 2 diabetes in American Indians in a 100K genotyping array. The preparation of this manuscript was supported by funding from the Diabetes 56, 3045–3052 European Commission (ENGAGE consortium: HEALTH-F4–2007– 27 Hayes, M.G. et al. (2007) Identification of type 2 diabetes genes in 201413); Medical Research Council (G0601261); Oxford NIHR Biomedical Mexican Americans through genome-wide association studies. Research Centre; Diabetes UK and the Wellcome Trust. C.M.L. is a Diabetes 56, 3033–3044 Nuffield Department of Medicine Scientific Leadership Fellow. 28 Rampersaud, E. et al. (2007) Identification of novel candidate genes for type 2 diabetes from a genome-wide association scan in the Old Order References Amish: evidence for replication from diabetes-related quantitative 1 Stumvoll, M. et al. (2005) Type 2 diabetes: principles of pathogenesis traits and from independent populations. Diabetes 56, 3053–3062 and therapy. Lancet 365, 1333–1346 29 Florez, J.C. et al. (2007) A 100K genome-wide association scan for 2 Zimmet, P. et al. (2001) Global and societal implications of the diabetes diabetes and related traits in the Framingham Heart Study: epidemic. Nature 414, 782–787 replication and integration with other genome-wide datasets. 3 Sladek, R. et al. (2007) A genome-wide association study identifies Diabetes 56, 3063–3074 novel risk loci for type 2 diabetes. Nature 445, 881–885 30 Frayling, T.M. (2007) Genome-wide association studies provide new 4 Saxena, R. et al. (2007) Genome-wide association analysis identifies loci insights into type 2 diabetes aetiology. Nat. Rev. Genet. 8, 657–662 for type 2 diabetes and triglyceride levels. Science 316, 1331–1336 31 Grarup, N. et al. (2007) Studies of association of variants near the 5 Scott, L.J. et al. (2007) A genome-wide association study of type 2 HHEX, CDKN2A/B, and IGF2BP2 genes with type 2 diabetes and diabetes in Finns detects multiple susceptibility variants. Science 316, impaired insulin release in 10,705 Danish subjects: validation and 1341–1345 extension of genome-wide association studies. Diabetes 56, 3105–3111 6 Steinthorsdottir, V. et al. (2007) A variant in CDKAL1 influences 32 Omori, S. et al. (2008) Association of CDKAL1, IGF2BP2, CDKN2A/B, insulin response and risk of type 2 diabetes. Nat. Genet. 39, 770–775 HHEX, SLC30A8, and KCNJ11 with susceptibility to type 2 diabetes in 7 Zeggini, E. et al. (2007) Replication of genome-wide association signals a Japanese population. Diabetes 57, 791–795 in UK samples reveals risk loci for type 2 diabetes. Science 316, 1336– 33 Wu, Y. et al. (2008) Common variants in CDKAL1, CDKN2A/B, 1341 IGF2BP2, SLC30A8 and HHEX/IDE genes are associated with type 8 Salonen, J.T. et al. (2007) Type 2 diabetes whole-genome association 2 diabetes and impaired fasting glucose in a Chinese Han population. study in four populations: the DiaGen consortium. Am. J. Hum. Genet. Diabetes 57, 2834–2842 81, 338–345 34 Ng, M.C. et al. (2008) Implication of genetic variants near TCF7L2, 9 McCarthy, M.I. (2003) Growing evidence for diabetes susceptibility SLC30A8, HHEX, CDKAL1, CDKN2A/B, IGF2BP2 and FTO in Type 2 genes from genome scan data. Curr. Diab. Rep. 3, 159–167 diabetes and obesity in 6719 Asians. Diabetes 57, 2226–2233 10 Hattersley, A.T. and McCarthy, M.I. (2005) A question of standards: 35 Sanghera, D.K. et al. (2008) Impact of nine common type 2 diabetes risk what makes a good genetic association study? Lancet 366, 1315–1323 polymorphisms in Asian Indian Sikhs: PPARG2 (Pro12Ala), IGF2BP2, 11 Altshuler, D. et al. (2000) The common PPARg Pro12Ala polymorphism TCF7L2 and FTO variants confer a significant risk. BMC Med. Genet. is associated with decreased risk of type 2 diabetes. Nat. Genet. 26, 9, 59 76–80 36 Frayling, T.M. et al. (2007) A common variant in the FTO gene is 12 Gloyn, A.L. et al. (2003) Large scale association studies of variants in associated with body mass index and predisposes to childhood and genes encoding the pancreatic b-cell K-ATP channel subunits Kir6.2 adult obesity. Science 316, 889–894 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant 37 Pascoe, L. et al. (2007) Common variants of the novel type 2 diabetes is associated with increased risk of type 2 diabetes. Diabetes 52, genes, CDKAL1 and HHEX/IDE, are associated with decreased 568–572 pancreatic b-cell function. Diabetes 56, 3101–3104

620 Review Trends in Genetics Vol.24 No.12

38 Grarup, N. et al. (2008) Association testing of novel type 2 diabetes risk- 55 McPherson, R. et al. (2007) A common allele on chromosome 9 alleles in the JAZF1, CDC123/CAMK1D, TSPAN8, THADA, associated with coronary heart disease. Science 316, 1488–1491 ADAMTS9, and NOTCH2 loci with insulin release, insulin 56 Helgadottir, A. et al. (2007) A common variant on chromosome 9p21 sensitivity and obesity in a population-based sample of 4,516 affects risk of myocardial infarction. Science 316, 1491–1493 glucose-tolerant middle-aged Danes. Diabetes 57, 2534–2540 57 Samani, N.J. et al. (2007) Genomewide association analysis of coronary 39 McCarthy, M.I. et al. (2008) Genome wide association studies for artery disease. N. Engl. J. Med. 357, 443–453 complex traits: consensus, uncertainty and challenges. Nat. Rev. 58 Helgadottir, A. et al. (2008) The same sequence variant on 9p21 Genet. 9, 356–369 associates with myocardial infarction, abdominal aortic aneurysm 40 Parikh, H. and Groop, L. (2004) Candidate genes for type 2 diabetes. and intracranial aneurysm. Nat. Genet. 40, 217–224 Rev. Endocr. Metab. Disord. 5, 151–176 59 Thomas, G. et al. (2008) Multiple loci identified in a genome-wide 41 Zeggini, E. et al. (2008) Meta-analysis of genome-wide association data association study of prostate cancer. Nat. Genet. 40, 310–315 and large-scale replication identifies additional susceptibility loci for 60 Barrett, J.C. et al. (2008) Genome-wide association defines more than type 2 diabetes. Nat. Genet. 40, 638–645 30 distinct susceptibility loci for Crohn’s disease. Nat Genet. 40, 955– 42 Marchini, J. et al. (2007) A new multipoint method for genome-wide 962 association studies by imputation of genotypes. Nat. Genet. 39, 906–913 61 Maestro, M.A. et al. (2007) Distinct roles of HNF1b, HNF1a and 43 Sandhu, M.S. et al. (2007) Common variants in WFS1 confer risk of HNF4a in regulating pancreas development, beta-cell function and type 2 diabetes. Nat. Genet. 39, 951–953 growth. Endocr. Dev. 12, 33–45 44 Franks, P.W. et al. (2008) Replication of the association between 62 Domenech, E. et al. (2006) Wolfram/DIDMOAD syndrome, a variants in WFS1 and risk of type 2 diabetes in European heterogenic and molecularly complex neurodegenerative disease. populations. Diabetologia 51, 458–463 Pediatr. Endocrinol. Rev. 3, 249–257 45 Winckler, W. et al. (2007) Evaluation of common variants in the six 63 Lyssenko, V. et al. (2005) Genetic prediction of future type 2 Diabetes. known maturity-onset diabetes of the young (MODY) genes for PLoS Med. 2, e345 association with type 2 diabetes. Diabetes 56, 685–693 64 Lango, H. et al. (2008) Assessing the combined impact of 18 common 46 Gudmundsson, J. et al. (2007) Two variants on chromosome 17 confer genetic variants of modest effect sizes on type 2 diabetes risk. Diabetes, prostate cancer risk, and the one in TCF2 protects against type 2 DOI: 10.2337/db08-0504 diabetes. Nat. Genet. 39, 977–983 65 McCarroll, S.A. and Altshuler, D.M. (2007) Copy-number variation and 47 Zeggini, E. et al. (2005) An evaluation of HapMap sample size and association studies of human disease. Nat. Genet. 39 (7 Suppl), S37– tagging SNP performance in large-scale empirical and simulated data S42 sets. Nat. Genet. 37, 1320–1322 66 Stranger, B.E. et al. (2007) Population genomics of human gene 48 Chimienti, F. et al. (2004) Identification and cloning of a b-cell-specific expression. Nat. Genet. 39, 1217–1224 zinc transporter, ZnT-8, localized into insulin secretory granules. 67 Stefansson, H. et al. (2008) Large recurrent microdeletions associated Diabetes 53, 2330–2337 with schizophrenia. Nature 455, 232–236 49 Fakhrai-Rad, H. et al. (2000) Insulin-degrading enzyme identified as a 68 Bartel, D.P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and candidate diabetes susceptibility gene in GK rats. Hum. Mol. Genet. 9, function. Cell 116, 281–297 2149–2158 69 Callis, T.E. et al. (2007) MicroRNAs in skeletal and cardiac muscle 50 Hwang, D.Y. et al. (2007) Significant change in insulin production, development. DNA Cell Biol. 26, 219–225 glucose tolerance and ER stress signaling in transgenic mice 70 Poy, M.N. et al. (2004) A pancreatic islet-specific microRNA regulates coexpressing insulin-siRNA and human IDE. Int. J. Mol. Med. 19, 65–73 insulin secretion. Nature 432, 226–230 51 Bort, R. et al. (2004) Hex homeobox gene-dependent tissue positioning 71 Plaisance, V. et al. (2006) MicroRNA-9 controls the expression of is required for organogenesis of the ventral pancreas. Development 131, Granuphilin/Slp4 and the secretory response of insulin-producing 797–806 cells. J. Biol. Chem. 281, 26932–26942 52 Finkel, T. et al. (2007) The common biology of cancer and ageing. 72 Yang, B. et al. (2007) The muscle-specific microRNA miR-1 regulates Nature 448, 767–776 cardiac arrhythmogenic potential by targeting GJA1 and KCNJ2. Nat. 53 Krishnamurthy, J. et al. (2006) p16INK4a induces an age-dependent Med. 13, 486–491 decline in islet regenerative potential. Nature 443, 453–457 73 Esau, C. et al. (2004) MicroRNA-143 regulates adipocyte 54 Broadbent, H.M. et al. (2008) Susceptibility to coronary artery disease differentiation. J. Biol. Chem. 279, 52361–52365 and diabetes is encoded by distinct, tightly linked, SNPs in the ANRIL 74 Esau, C. et al. (2006) miR-122 regulation of lipid metabolism revealed locus on chromosome 9p. Hum. Mol. Genet. 17, 806–814 by in vivo antisense targeting. Cell Metab. 3, 87–98

621