R EPORTS

16. L. Shen, K. L. Rock, Proc. Natl. Acad. Sci. U.S.A. 101, J. Immunol. 171, 5668 (2003). Supporting Online Material 3035 (2004). 22. N. P. Restifo et al., J. Immunol. 154, 4414 (1995). www.sciencemag.org/cgi/content/full/304/5675/1318/ 17. S. P. Schoenberger et al., J. Immunol. 161, 3808 (1998). 23. We thank B. Buschling, D. Tokarchick, and A. Schell DC1 18. M. L. Albert, B. Sauter, N. Bhardwaj, Nature 392,86 for technical assistance. We are grateful to M. Epler Materials and Methods b (1998). and S. Tevethia for their generous gift of D - Figs. S1 and S2 19. M. Bellone et al., J. Immunol. 159, 5391 (1997). NP366-374 tetramers. This work was supported in part References and Notes 20. J. W. Yewdell, C. C. Norbury, J. R. Bennink, Adv. by a Wellcome Prize Traveling Fellowship and U.S. Immunol. 73, 1 (1999). Public Health Service grants, and NIH grant AI- 21. A. Serna, M. C. Ramirez, A. Soukhanova, L. J. Sigal, 056094-01 to C.C.N. 3 February 2004; accepted 14 April 2004

with the dog genome, which was estimated Ultraconserved Elements in the using reads from the National Center for Biotechnology Information (NCBI) trace ar- chive (477/481 ϭ 99.2% of the elements aligning at an average of 99.2% identi- Gill Bejerano,1* Michael Pheasant,3 Igor Makunin,3 ty).Thus, it appears that nearly all of these Stuart Stephen,3 W. James Kent,1 John S. Mattick,3 ultraconserved elements may have been un- David Haussler2* der extreme negative selection in many spe- cies for more than 300 million years, and There are 481 segments longer than 200 base pairs (bp) that are absolutely some of them for at least 400 million years. conserved (100% identity with no insertions or deletions) between orthologous As expected, the ultraconserved elements regions of the human, rat, and mouse genomes.Nearly all of these segments exhibit almost no natural variation in the are also conserved in the chicken and dog genomes, with an average of 95 and human population. Only 6 out of 106,767 99% identity, respectively.Many are also significantly conserved in fish.These bases examined in the ultraconserved ele- ultraconserved elements of the human genome are most often located either ments (excluding the first and last 20 bases in overlapping exons in involved in RNA processing or in introns or nearby each element) are at validated single- genes involved in the regulation of transcription and development.Along with nucleotide polymorphisms (SNPs) in NCBI’s more than 5000 sequences of over 100 bp that are absolutely conserved among SNP database (dbSNP) (table S2a). For this the three sequenced mammals, these represent a class of genetic elements much DNA, we would have expected 119 whose functions and evolutionary origins are yet to be determined, but which validated sites, so validated SNPs are under- Ϫ42 are more highly conserved between these species than are and appear represented by 20-fold (P Ͻ 10 ). The 48 to be essential for the ontogeny of mammals and other vertebrates. unvalidated SNPs we found revealed many likely errors in the unvalidated portion of Although only about 1.2% of the human served with orthologous segments in rodents: dbSNP (table S2b). These same 106,767 genome appears to code for proteins (1–3), those showing 100% identity and with no bases exhibit very few differences with the it has been estimated that as much as 5% is insertions or deletions in their alignment with chimp genome as well, showing only 38 more conserved than would be expected the mouse and rat. Exclusive of ribosomal single base changes where the chimp base has from neutral evolution since the split with RNA (rRNA) regions, there are 481 such a Phred quality score at least 45, whereas the rodents, and hence may be under negative segments longer than 200 bp that we call expected number would be 716 (roughly a or “purifying” selection (4–6). Several ultraconserved elements (table S1). They are 19-fold reduction, P Ͻ 10Ϫ200; supporting studies have found specific noncoding seg- widely distributed in the genome (on all chro- text, section S2). This low level of variation ments in the human genome that appear to mosomes except 21 and Y) and within the human population and in comparison be under selection, using a threshold for are often found in clusters (Fig. 1). The prob- with the chimp suggests that these elements are conservation of 70 or 80% identity with the ability is less than 10Ϫ22 of finding even one currently changing at a rate that is roughly 20 mouse over more than 100 base pairs (bp) such element in 2.9 billion bases under a times slower than the average for the genome. (7–13). A study of these elements on hu- simple model of neutral evolution with inde- Only 4.3% of the bases are different in the man 21 found that those that pendent substitutions at each site, using the chicken, which is also consistent with a roughly were very highly conserved in multiple spe- slowest neutral substitution rate that is ob- 20-fold reduction from neutral substitution rates cies contained significant numbers of non- served for any 1-Mb region of the genome (supporting text, section S2). coding elements (13). Similar results were (supporting text, section S1). Nearly all of Of the 481 ultraconserved elements, 111 found when comparing the human, mouse, these elements also exhibit extremely high overlap the mRNA of a known human and rat genomes (14, 15) in a study of the levels of conservation with orthologous re- -coding [including the untrans- 1.8–megabase (Mb) CFTR region (16, 17) gions in the chicken genome [467 out of 481 lated regions (UTRs)], 256 show no evidence and in a functional study of the SIM2 (467/481) ϭ 97% of the elements aligning at of transcription from any matching expressed in a number of mammalian species (18). an average of 95.7% identity, 29 at 100% sequence tag (EST) or mRNA from any spe- We determined the longest segments of identity] and about two-thirds of them with cies, and for the remaining 114 the evidence the human genome that are maximally con- the fugu genome as well (324/481 ϭ 67.3% for transcription is inconclusive. We call of the elements aligning at an average of these partly exonic (or exonic for short), non- 1Department of Biomolecular Engineering, 2Howard 76.8% identity), despite the fact that only exonic, and possibly exonic ultraconserved Hughes Medical Institute, University of California about 4% of the human genome can be reli- elements, respectively. A hundred non- Santa Cruz, Santa Cruz, CA 95064, USA. 3ARC Special ably aligned to the chicken genome (at an exonic elements are located in introns of Research Centre for Functional and Applied Genom- average of 62.9% identity where an align- known genes and the rest are intergenic. ics, Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia. ment is found), and less than 1.8% of the The non-exonic elements, both intronic and human genome aligns to fugu (at an average intergenic, tend to congregate in clusters *To whom correspondence should be addressed. E- mail: [email protected] (G.B.); [email protected] of 60% identity). In addition, nearly all ex- near transcription factors and developmen- (D.H.). hibit extremely high levels of conservation tal genes, whereas the exonic and possibly

www.sciencemag.org SCIENCE VOL 304 28 MAY 2004 1321 R EPORTS exonic elements are more randomly distrib- enrichment for RNA binding and regulation attributes are enriched in type I genes as well uted along the chromosomes (Fig. 1). of splicing (P Ͻ 10Ϫ18 and 10Ϫ9, respective- but 16, 8, and 9 orders of magnitude less There are 93 known genes that overlap ly, against all GO annotated human genes) significantly, respectively. This suggests that with exonic ultraconserved elements; we call and are uniquely abundant in the RNA rec- exonic ultraconserved elements may be spe- these type I genes. The 225 genes that are ognition motif RRM (P Ͻ 10Ϫ17, against all cifically associated with RNA processing and near the non-exonic elements we call type II InterPro annotated human genes). In contrast, non-exonic elements with regulation of tran- genes (methods in supporting text, section the type II genes are devoid of enrichment for scription at the DNA level. S3). We looked for categories of biological RNA binding or splicing or the RRM (P ϭ Non-exonic ultraconserved elements are process and molecular function defined in the 0.39, 0.44, and 0.77, respectively). However, often found in “gene deserts” that extend (GO) database (19) that are type II genes are strongly enriched for regu- more than a megabase. In particular, of the significantly enriched in type I and II genes lation of transcription and DNA binding (P Ͻ non-exonic elements, there are 140 that are and also searched InterPro (20) for enrich- 10Ϫ19 and 10Ϫ14, respectively), as well as more than 10 kilobases (kb) away from any ment in particular structural domains (Fig. 2). DNA binding motifs, in particular the Ho- known gene, and 88 that are more than 100 The type I genes show significant functional meobox domain (P Ͻ 10Ϫ14). These three kb away. The set of 156 annotated genes that

chr1 chr7 chr14

POU3F1 LMO4 PBX1 AKT3/ZNF238 SP4/SP8 AUTS2 DLX6 NOVA1 VRK1 FLJ10597 PTBP2 HNRPU TRA2A FOXP2 FOXG1B FAF1 HOXA HIC/TFEC COCH castor-ortholog AUTL1 NPAS3 GARNL1/GARNL2 MIPOL1 PRPF39 chr2 chr8 chr15

BCL11A ZFHX1B ST18 ZFPM2 MEIS2 MAP2K5 NR4A2 BHLHB5 TBR1 FIGN chr9 chr16 HAT1/DLX1/DLX2 SP3/PTD004 HOXD chr3 DMRT1/DMRT3 TLE4 MNAB OAZ C9orf39 HNRPK PBX3 IRX3/IRX5/IRX6 ELAVL2 NFAT5 FLJ22611 ATBF1 chr17 SATB1 FOXP1 ZNF288 SOX14 ZIC4 SHOX2 EVI1 chr10 chr18 LZK1/LHX1/AATF HOXB SFRS1 chr4 DRG11 PAX2 EHZF TCF4 C10orf11 FLJ13188 TAF4B ZNF407 BUB3 BRUNOL4 LRBA EBF3 chr11 chr19 chr20 chr5 LMO1 PAX6 KIAA1474 SOX6

KPNB2 RANBP17 orthopedia-ortholog chr12 chr21 chr22 chrY MEF2C NR2F1 chr6 chr13 HOXC chrX

DDX3X TFAP2B/PKHD1 POU3F2 DACH POLA/ARX NLGN3 GRIA3/STAG2

Base Position 24000000 24500000 Chromosome Band Xp22.11 Xp21.3 PDK3 POLA PCYT1B ARX Ultra-conserved Elements

Fig. 1. Locations of the 481 ultraconserved elements on the 24 human mRNA entry if no Human Genome Organization (HUGO)–named chromosomes. Each partly exonic element is represented by a thin gene was available. Among the cluster representatives, there is a blue tick mark extending above the chromosome, each non-exonic distinct enrichment for non-exonic elements and for developmental element by a green tick mark extending below the chromosome, and genes, suggesting that many of these clusters may be part of distal each possibly exonic element by a black tick mark centered on the enhancers or “global control loci” analogous to those studied in chromosome. Purple boxes represent centromeres. By joining two association with HOXD (38)orDACH (21). One possible such cluster, elements into a cluster when they are separated by less than 675 kb, near the ARX gene, is shown in more detail in the inset at the bottom we obtained 89 local clusters of two or more elements, each of which of the figure. There known genes are shown in blue (tall boxes for is boxed and named. Names are taken from a prominent gene or gene coding exons, shorter boxes for UTRs, and hatched lines for introns), family co-located with the cluster or from a Drosophila ortholog or and ultraconserved elements are shown below them.

1322 28 MAY 2004 VOL 304 SCIENCE www.sciencemag.org R EPORTS flank intergenic ultraconserved elements is ily, contain the RNA recognition motif. The exonic ultraconserved elements also in- significantly enriched for developmental ultraconserved elements associated with al- clude the consecutive mutually exclusive genes (P Ͻ 10Ϫ6) and in particular for genes ternative splicing events often contain small “flop” and “flip” exons (uc.478/9) from the involved in early developmental tasks (P ϭ coding exons that are skipped in the mRNA glutamate receptor GRIA3 (26), which ex- 2.7 ϫ 10Ϫ5), suggesting that many of the in some tissues, but the elements extend well hibits RNA editing as well as alternate associated ultraconserved elements may be into the flanking intronic regions on one or splicing (27). The “flop” ultraconserved distal enhancers of these early developmental both sides of the exon. Such is the case for element extends into the ϳ600-bp intron 13 genes. Indeed, one of these elements (uc.351 one explicitly studied ultraconserved element of the gene. At the other end (adjacent to in table S1) is contained in an enhancer situ- (uc.33) in PTBP2, a polypyrimidine tract– the previous exon), intron 13 contains a ated about 225 kb upstream of DACH (ho- binding protein (23). PTBP2 contains a 312- much shorter highly conserved RNA hair- molog of the Drosophila dachshund gene, bp ultraconserved segment that is mostly in- pin structure that guides the essential and known to be involved in the development of tronic but includes a small (34-bp) exon that highly regulated editing of adenosine to brain, limbs, and sensory organs), which has is included in the mRNA only in brain tissue. inosine (27). Although the element contain- been shown to reproducibly drive expression The 203 bases at the 3Ј end of the element, ing the “flop” exon does not have detect- in the retina when cloned upstream of a including the 34-bp exon, are 100% con- able RNA secondary structure preferences, mouse heat shock protein 69 minimal pro- served in the chicken as well. the minimal energy of the secondary struc- moter coupled to ␤-galactosidase and inject- The PTBP2 element may form an RNA ture of the element containing the “flip” is ed into a mouse oocyte (21). Non-exonic structure in the pre-mRNA that participates in less than that of 34 out of 10,000 permuted ultraconserved elements that lie in introns are the regulation of splicing through interactions versions, indicating possible structure. also often associated with developmental with the spliceosome (23). We used the pro- Although the minimal region of 100% genes. These include the neuroretina-specific gram RNAfold (24) to further assess the po- conservation between human, mouse, and rat enhancer in the fourth intron of PAX6 tential of this and other ultraconserved ele- that was required to be included in the ultra- (uc.328), investigated in quail but shown to ments to form an RNA secondary structure, conserved set was 200 bp, many elements also be functionally conserved in mouse (22). comparing the energy of the best folded were considerably longer. The longest ele- Type I genes (harboring exonic elements) structure for both the positive and negative ments (779, 770, and 731 bp) all lie in the last include many genes encoding well-known strand element to that of 10,000 random per- three introns in the 3Ј portion of POLA, the RNA-binding proteins, such as HNRPK, mutations of the same sequence (table S4). DNA polymerase alpha catalytic subunit (EC HNRPH1, HNRPU, HNRPDL, HNRPM, No statistically significant structure was 2.7.7.7) on chromosome X, along with other SFRS1, SFRS3, SFRS6, SFRS7, SFRS10, found for the PTBP2 element, but the energy shorter ultraconserved elements (Fig. 1). A SFRS11, TRA2A, PCBP2, and PTBP2. All of of the fold for the 573-bp ultraconserved similar-sized conserved region, 711 bp the above are among the 59 type I genes self-regulated alternatively-spliced UTR ele- formed by concatenation of uc.468 and annotated by GO that exhibit clear mRNA/ ment (uc.189) in arginine/serine-rich splicing uc.469 (separated by a single base), lies in the EST evidence of alternative splicing overlap- factor SFRS3 (25) was lower than that of all ϳ7-kb intergenic region between the 3Ј end ping the ultraconserved element [out of 66 but one of the 10,000 randomized versions of of POLA and its downstream neighbor, the elements in all, from a total of 111 exonic this sequence, indicating that it may form an ARX homeobox gene. ARX is involved in elements (table S3)]. Many of the above, important RNA secondary structure (fig. S2). central nervous system development and is including the six members of the SFRS fam- In addition to alternative splicing, the associated with a host of X-linked Mendelian diseases, including epilepsy, mental retarda- tion, autism, and cerebral malformations Annotation Enrichment in Type I and Type II Genes (28). Because this group of elements lies at the 3Ј end of the 303-kb POLA gene, nearer RNA binding observed to the 3Ј end of ARX than to the rest of POLA RNA splicing expected (Fig. 1), it is possible that their function is not RNA rec. motif related to POLA but that they instead form a Transcription reg. cluster of enhancers of ARX. The longest of DNA binding these ultraconserved elements, 779 bp, is Homeobox actually adjacent to a 275-bp element, RNA binding which together form a 1046-bp region with RNA splicing only one change in rodents. As a calibra- RNA rec. motif tion, note that these POLA/ARX elements Transcription reg. are considerably longer than the ultracon- served portions of the human, mouse, and DNA binding Type II Genes I rat rRNA genes, which harbor six ultracon- Homeobox served segments, three each in the 18S and 0 10203040506070 28S rRNA genes, the longest of which is No. of Genes 563 bp (table S1). Fig. 2. Annotation enrichment in type I and type II genes. In the top half of the figure, the maroon In sharp contrast to rRNA and most hu- bars (“observed”) give the numbers of type I genes that are annotated in the GO database (19) with man coding regions, there were only 24 out of molecular function “RNA binding” or “DNA binding” or biological process “RNA splicing” or 481 cases (5%) where an ortholog of an “transcription regulation,” or are annotated in InterPro (20) as containing the domains “RNA ultraconserved element could be partially recognition motif” or “homeobox.” The blue bars (“expected”) give the number of genes that one traced back by sequence similarity search as would expect to obtain if the same number of genes (111 genes for type I) were chosen at random far as Ciona intestinalis, Drosoplila melano- among all genes annotated in the relevant database. The bottom half of the figure gives similar information for type II genes. It is apparent that type I genes are enriched for RNA-related gaster,orCaenorhabditis elegans (table S7). functions, whereas type II genes are not. Both types are enriched for DNA-related functions, but the All of these were among the 68 elements type II genes are more enriched. (14%) that overlapped coding exons from

www.sciencemag.org SCIENCE VOL 304 28 MAY 2004 1323 R EPORTS known genes. In 17 of these 24 “ancient” abut in C. elegans. Despite the inability to If experiments with less conserved ele- cases, there is clear mRNA or EST evidence trace most of the vertebrate ultraconserved ments in recent studies (13, 18) are any that the coding region overlapped by the el- elements to distant species, the possibility indication, many of these shorter elements ement is alternatively spliced in humans. that processes similar to those that produced are also functional. Compared to the ultra- These include alternatively spliced exons of ultraconserved elements in vertebrates also conserved elements, a greater percentage of genes EIF2C1, BCL11A, EVI1, ZFR, CLK4, exist in other classes of species remains open. these shorter conserved elements are signif- HNRPH1, and DDX5, as well as GRIA3.In In one tantalizing example, it has been ob- icantly different in birds but are highly none of the other cases could we find evi- served that the mating type gene MATa2 in conserved in mammals. This suggests that dence that any element that was intronic in yeast shows 100% conservation over 357 bp the process of the evolution of new ele- humans was coding in another species, al- in four yeast species (30). The mechanism of ments, followed by near-“freezing” of their though in some cases there was EST evidence this conservation is not known. DNA sequences, is probably still ongoing for a retained intron that presumably has a We found only 12 paralogous sets, each in vertebrates. Lineage-specific specializa- function other than protein coding. Moreover, consisting of two or three elements, among tions of these elements may reflect regula- indels of noncoding ultraconserved elements all 481 ultraconserved elements (table S5). tory changes that are important to the on- relative to their alignments with the chicken Each paralogous set is consistent with the togeny and physiology of the clade. and other species are often not in multiples of paralogy relationship between the enclosing The patterns of conservation exhibited in three, giving further evidence that these se- or nearby “host” genes. All paralogs (except, the ultraconserved elements must result from quences are noncoding (fig. S1, A and B). currently, uc.344 overlapping HOXC5) have the onset during chordate evolution of either The ultraconserved elements we found in highly conserved matches in the chicken, a highly elevated negative selection rate in introns seem to have been at one time rather providing more opportunities for evolution- these regions (about a 20 times smaller fast-evolving as compared to the known cod- ary analysis of these duplication events that chance of mutations becoming fixed in the ing exons in their genes. We tried to map predate the divergence from birds. In each of population), a highly reduced mutation rate selected introns containing ultraconserved in- the clusters, we found significant divergence (about 20 times fewer mutations), or some tronic elements to more distant species using between the paralogs, which must have oc- combination of these effects. The possibility protein/translated DNA matches to their en- curred in the early part of their evolution (fig. of strong negative selection is intriguing, be- closing exons. Often only a “core” conserved S3), because each individual instance in a cause selection to maintain protein coding, region was recognizable in fish, and this had paralogous set has changed very little in the protein-nucleic acid interactions, or RNA- very different flanking DNA, suggesting that past 300 million years in birds and mammals. RNA interactions does not result in near total additional parts of the ultraconserved region This, combined with the above analysis, sug- conservation over long stretches of bases un- were innovations after the common ancestor gests that the bulk of the ultraconserved ele- less multiple functions are overlaid on the with fish, as observed in the analysis of ments represent chordate innovations that same DNA, such as in regions of coding uc.108 near HOXD (29). In cases where we evolved fairly rapidly at first but then slowed exons that also bind splicing factors or in could trace beyond vertebrates, we always down considerably, becoming effectively regions of rRNA that must form RNA struc- found that the orthologous intron in the more “frozen” in birds and mammals. tures as well as bind proteins. If the exonic distant species was either very small with A more extensive analysis of paralogs, ultraconserved elements form pre-mRNA struc- apparently unrelated sequence, or was non- based on a recent global clustering of high- tures that are under selection to preserve inter- existent. For example, tracing the intron that ly conserved noncoding human DNA (31), action with the spliceosome or editing machin- contains the first (most 5Ј) ultraconserved reveals several further highly conserved in- ery (23, 27), then these interactions must be element in POLA (uc.460), we find that al- tronic and intergenic elements in function- extremely constraining over hundreds of bases though it is an approximately 50-kb intron in ally equivalent positions relative to paralo- of DNA, much like those of the anciently de- humans, its ortholog in Fugu rubripes is only gous genes. These were not classified as rived rRNAs, making them potentially quite ϳ7500 bp (which is still large relative to ultraconserved by our stringent criteria. In- novel objects for molecular study. The same most fugu introns), only about 335 bp in C. deed, if we merge alignment blocks of 200 holds true if the conservation in the non-exonic intestinalis, and does not exist (the flanking bases, each with at least 99% identical elements is associated with selection for molec- exons abut) in D. melanogaster and C. ele- columns, we obtain 1974 “highly con- ular interactions involved in the regulation of gans. The human element is not recognizably served” elements up to 1087 bp long in the transcription, which could be in cis over long similar to anything in the orthologous intron human. Four of the five longest elements genomic distances, or in trans, perhaps also of Ciona. Yet, like the other POLA ultracon- are the aforementioned POLA/ARX ele- involving RNA (29, 32, 33). served elements discussed above, this ele- ments, along with a 906-bp element (en- On the other hand, if reduced mutation ment is more than 99% identical between compassing uc.326/7) in an intron of ELP4, rates are the explanation, then the existence human and chicken. Similar results were adjacent to PAX6. If instead we demand at of regions of a few hundred bases with 20- found for the three longest POLA intronic least a 100-bp exact match between humans fold reduced mutation rates would itself be elements. Another similar case was a cluster and rodents, we get more than 5000 highly quite novel. Although neutral mutation rates of seven ultraconserved elements (uc.273 to conserved elements. Tens of thousands may vary depending on chromosomal loca- uc.279) with sizes ranging from 237 to 432 more are found at lower cutoffs; for exam- tion on a megabase scale (34–36), there is to bp, all contained in an ϳ165-kb intron of ple, there is a 57-bp exactly conserved our knowledge no evidence or precedent for PBX3, pre-B cell leukemia transcription fac- sequence overlapping an alternatively the existence of short “hypomutable” or “hy- tor 3, a member of the TALE/PBX homeobox spliced exon of the WT1 gene which is perrepaired” neutral regions. Finally, the an- family. This was one of the largest introns we invariant in mammals and in chickens and swer could also be a combination of negative found, and it contained one of the largest is largely conserved in fishes (fig. S1). The selection and better repair in these regions, collections of ultraconserved elements in a percentage of the conserved elements that owing to some vital role that these elements single intron. The orthologous intron in F. overlap with a known coding region steadi- play, such as self-regulating networks of rubripes is ϳ38 kb, in C. intestinalis it ap- ly rises from 14 to 34.7% as the length RNA processing control in the case of exonic pears to be ϳ1 kb, in Drosophila it is ϳ200 criteria defining these elements is reduced elements and self-regulatory networks of bp (ortholog exd), and the flanking exons from 200 to 50 bp (table S6). transcriptional control for non-exonic ele-

1324 28 MAY 2004 VOL 304 SCIENCE www.sciencemag.org R EPORTS ments. In any case, the questions remain: peat sites for 1-Mb windows; and S. Walton for help in SOM Text What kind of elements associated with these preparing the manuscript. G.B., W.J.K., and D.H. were Figs. S1 to S3 supported by National Human Genome Research Insti- Tables S1 to S7 processes would have arrived relatively early tute grant 1P41HG02371 and National Cancer Institute in chordate evolution and then become prac- contract 22XS013A, and D.H. was additionally support- References and Notes tically frozen in birds and mammals? And ed by the Howard Hughes Medical Institute. S.S., M.P., I.M., and J.S.M. were supported by the Australian Re- 19 March 2004; accepted 27 April 2004 what mechanisms would underlie this, allow- search Council and the Queensland State Government. Published online 6 May 2004; ing them to resist virtually all further change? Supporting Online Material 10.1126/science.1098119 Note added in proof: We recently became www.sciencemag.org/cgi/content/full/1098119/DC1 Include this information when citing this paper. aware of related observations made by Bof- felli et al.(37).

References and Notes 1. E. S. Lander et al., Nature 409, 860 (2001). A Family with Severe Insulin 2. J. C. Venter et al., Science 291, 1304 (2001). 3. Human Genome Sequencing Consortium, in preparation. 4. R. H. Waterston et al., Nature 420, 520 (2002). Resistance and Diabetes Due to a 5. K. M. Roskin, M. Diekhans, D. Haussler, in Proceedings of the 7th Annual International Conference on Re- search in Computational Molecular Biology (ACM, Mutation in AKT2 New York, NY, 2003), pp. 257–266. 6. F. Chiaromonte et al., Cold Spring Harbor Symp. Stella George,1* Justin J. Rochford,1* Christian Wolfrum,3 Quant. Biol. 68, 245 (2003). 1 1 1 7. R. C. Hardison, Trends Genet. 16, 369 (2000). Sarah L. Gray, Sven Schinner, Jenny C. Wilson, 8. G. G. Loots et al., Science 288, 136 (2000). Maria A. Soos,1 Peter R. Murgatroyd,1 Rachel M. Williams,2 9. L. A. Pennacchio, E. M. Rubin, Nature Rev. Genet. 2, 2 2 4 100 (2001). Carlo L. Acerini, David B. Dunger, David Barford, 10. K. A. Frazer et al., Genome Res. 11, 1651 (2001). A. Margot Umpleby,5 Nicholas J. Wareham,6 Huw Alban Davies,7 11. U. DeSilva et al., Genome Res. 12, 3 (2002). 8 3 1 12. E. T. Dermitzakis et al., Nature 420, 578 (2002). Alan J. Schafer, Markus Stoffel, Stephen O’Rahilly, † 13. E. T. Dermitzakis et al., Science 302, 1033 (2003). Ineˆs Barroso8,9 14. RatGenome Sequencing Consortium, Nature 428, 493 (2004). 15. G. M. Cooper et al., Genome Res. 14, 539 (2004). Inherited defects in signaling pathways downstream of the insulin receptor have 16. J. W. Thomas et al., Nature 424, 788 (2003). long been suggested to contribute to human type 2 diabetes mellitus.Here we 17. E. H. Margulies, M. Blanchette, D. Haussler, E. D. ␤ Green, Genome Res. 13, 2507 (2003). describe a mutation in the gene encoding the protein kinase AKT2/PKB in a 18. K. A. Frazer et al., Genome Res. 14, 367 (2004). family that shows autosomal dominant inheritance of severe insulin resistance 19. M. Ashburner et al., Nature Genet. 25, 25 (2000). and diabetes mellitus.Expression of the mutant kinase in cultured cells dis- 20. N. J. Mulder et al., Nucleic Acids Res. 31, 315 rupted insulin signaling to metabolic end points and inhibited the function of (2003). 21. M. A. Nobrega, I. Ovcharenko, V. Afzal, E. M. Rubin, coexpressed, wild-type AKT.These findings demonstrate the central importance Science 302, 413 (2003). of AKT signaling to insulin sensitivity in humans. 22. S. Plaza, C. Dozier, M. C. Langlois, S. Saule, Mol. Cell. Biol. 15, 892 (1995). Most forms of diabetes are likely to be poly- tified a missense mutation in the serine/ 23. L. Rahman, V. Bliskovski, F. J. Kaye, M. Zajac-Kaye, Genomics 83, 76 (2004). genic in origin, although a number of mono- threonine kinase gene AKT2 in one Caucasian 24. I. L. Hofacker, Nucleic Acids Res. 31, 3429 (2003). genic forms are being recognized (1, 2). Al- proband. AKT2 (also known as PKB␤)is 25. H. Jumaa, P. J. Nielsen, EMBO J. 16, 5077 (1997). though rare, these monogenic examples offer highly expressed in insulin-sensitive tissues 26. B. Sommer et al., Science 249, 1580 (1990). 27. P. J. Aruscavage, B. L. Bass, RNA 6, 257 (2000). insight into the function of the affected gene and is activated in response to growth factors 28. E. H. Sherr, Curr. Opin. Pediatr. 15, 567 (2003). in humans as well as offering important clues and related stimuli (3, 4), a process that 29. C. Sabarinadh, S. Subramanian, R. Mishra, Genome to understanding more common forms. requires its phosphorylation by the phospho- Biol. 4 (2003). We have been screening genomic DNA inositide-3 phosphate-dependent kinase ac- 30. M. Kellis, N. Patterson, M. Endrizzi, B. Birren, E. S. Lander, Nature 423, 241 (2003). from 104 unrelated subjects with severe in- tivities designated PDK1 and PDK2 (3). The 31. G. Bejerano, D. Haussler, M. Blanchette, Bioinformat- sulin resistance for mutations in genes that proband (Fig. 1D, iii/1) is a nonobese 34- ics (Suppl.), in press. are implicated in insulin signaling. We iden- year-old female who developed diabetes mel- 32. J. S. Mattick, M. J. Gagen, Mol. Biol. Evol. 18, 1611 (2001). litus at 30 years of age. The proband, her 33. E. T. Dermitzakis et al., Genome Res. 14, 852 1Department of Clinical Biochemistry, 2Department nonobese mother, her maternal grandmother, (2004). of Paediatrics, University of Cambridge, Adden- and a maternal uncle were all heterozygous 34. K. H. Wolfe, P. M. Sharp, W. H. Li, Nature 337, 283 brooke’s Hospital, Hills Road, Cambridge CB2 2QQ, (1989). for a G-to-A substitution predicted to result in UK. 3Laboratory of Metabolic Diseases, Rockefeller 35. R. C. Hardison et al., Genome Res. 13, 13 (2003). 4 an R-to-H substitution at amino acid 274 36. J. H. Chuang, H. Li, PLoS Biol. 2, E29 (2004). University, New York, NY 10021, USA. Section of (Fig. 1, A and B) (5). All were markedly 37. D. Boffelli, M. Nobrega, E. M. Rubin, Nature Rev. Structural Biology, Institute of Cancer Research, Ches- 5 hyperinsulinemic (table S1), and the mother Genet., in press. ter Beatty Laboratories, London SW3 6JB, UK. De- 38. F. Spitz, F. Gonzalez, D. Duboule, Cell 113, 405 partment of Diabetes, Endocrinology and Internal and maternal grandmother developed diabe- (2003). Medicine, Guy’s, King’s and St. Thomas’ School of tes mellitus in their late thirties. Three other 6 39. We thank the Genome Sequencing Consortia for the Medicine, London, UK. Medical Research Council Ep- first-degree relatives available for study were human, mouse, rat, dog, chicken, and other genome idemiology Unit, Strangeways Research Laboratory, sequences we used in this analysis; W. Miller, M. Worts Causeway, Cambridge CB1 8RN, UK. 7Darent all clinically normal, with normal fasting glu- Diekhans, A. Hinrichs, K. Rosenbloom, D. Thomas, and Valley Hospital, Darenth Wood Road, Dartford, Kent cose and insulin, and were homozygous for the members of the University of California Santa Cruz DA2 8DA, UK. 8Incyte, 3160 Porter Drive, Palo Alto, the wild-type AKT2 sequence (Fig. 1D and (UCSC) browser team for providing the genome align- 9 CA 94304, USA. Wellcome Trust Sanger Institute, table S1). This mutation was not found in the ments and other tracks of genome annotation available Wellcome TrustGenome Campus, Hinxton,Cam- on the UCSC genome browser; M. Blanchette, S. bridge CB10 1SA, UK. genomic DNA of 1500 Caucasian control Salama, T. Lowe, M. Ares, K. Pollard, and B. Cohen for subjects from the United Kingdom. helpful discussions; A. Siepel for the neutral substitution *These authors contributed equally to this work. 274 rate analysis involving chicken and chimp; K. Roskin for †To whom correspondence should be addressed. E- R forms part of an RD sequence motif the calculation of the percent identity in ancestral re- mail: [email protected] within the catalytic loop of the AKT2 kinase

www.sciencemag.org SCIENCE VOL 304 28 MAY 2004 1325