
news and views

Recounting a genetic story

Roger H. Reeves

The DNA sequence of chromosome 21, now published, provides indications that the total number of human genes has been overestimated, and is a valuable resource for research into Down syndrome.

The sequencing of whole genomes means that genetic information is increasingly considered in functional rather than structural terms. But for the moment, the principal structural element of genomes, the chromosome, remains the predominant unit by which to measure progress in the Human Genome Project. On page 311 of this issue (ref. 1), a multinational consortium reports the complete sequence of chromosome 21, the smallest human chromosome. The results indicate that previous estimates of the total number of human genes may need to be revised downwards. Meanwhile, the small number of genes, and a catalogue that identifies them, provide a boost for those endeavouring to define all of the primary molecular players in Down syndrome.

Affecting one in 700 live births, Down syndrome occurs when three copies of chromosome 21 are inherited instead of two (Fig. 1, top). The condition is the most common known genetic cause of mental retardation and the leading cause of congenital heart disease, and results in a wide variety of other developmental and health problems.

The consortium's report (ref. 1) describes several new technical achievements. The total length of the sequence reported is 33.55 million base pairs (or megabases, Mb). This covers 99.7% of the long arm of chromosome 21 (Fig. 1, bottom), and just exceeds the 33.46 Mb reported for the slightly larger long arm of chromosome 22 (ref. 2). The paper includes the longest continuous DNA sequence reported to date, extending 28.5 Mb. The entire chromosome sequence has only three gaps (totalling 100 kilobases), compared with the ten gaps (totalling about 1 Mb) for the long arm of chromosome 22.

Figure 1 Chromosome 21 in context. Top, triplication of chromosome 21 is the genetic defect underlying Down syndrome. Bottom, transmission electron micrograph of chromosome 21, showing the long and short arms. Photo credits: CNRI/SCIENCE PHOTO LIBRARY; RICHARD J. GREEN/SCIENCE PHOTO LIBRARY.

The new sequence also includes 281 kilobases from chromosome 21's short arm, the mapping and cloning of which posed a challenge because it contains several classes of highly repetitive sequences (ref. 3). The length of this short arm can vary greatly among individuals. So this sequence is the first example of a large genome region that can expand or contract on a scale of many megabases.

The sequencing of the long arm of chromosome 21 provides a somewhat arbitrary, but nonetheless worthwhile, basis for deriving conclusions about the general organization of the genome. The most striking difference from chromosome 22 is the reduced gene content of chromosome 21: 225 genes identified, compared with 545 on chromosome 22. The two consortia responsible for these sequences used somewhat different criteria to identify the genes within their respective sequences (Box 1). But the differences may well balance each other out, meaning that a comparison of gene numbers is valid.

Chromosome 21 was expected to be relatively gene-poor, but it seems that it is even more impoverished than anticipated. The long arm of chromosome 21 represents about 1% of the human genome, but was predicted to contain less than 1% of the total number of human genes. The Unigene project (ref. 4) suggested that chromosome 21 would contain only 80% of the number of genes that would be expected on the basis of its size. If the total number of human genes were 100,000, as predicted, chromosome 21 would still be expected to contain 800–1,000 genes. The 225 genes now identified (ref. 1) stand in stark contrast to this prediction.

Combining data from the long arms of the two completely sequenced chromosomes, the chromosome 21 consortium estimates that the human genome may contain as few as 40,000 genes. However, this is based on complete sequences for just 2% of the human genome, and could be low for a variety of reasons. For example, other human chromosomes may be more gene-rich. The major histocompatibility complex (MHC) region on chromosome 6, a region essential to the immune system, spans only 3.6 Mb, but contains 128 genes and 96 pseudogenes (ref. 5).

Another measure of gene richness is provided by the number of 'CpG islands' on the long arms of chromosomes 21 and 22. These islands are DNA sequences of a few hundred base pairs that have a high amount (more than 55%) of cytosine and guanine nucleotides. They are associated with about 60% of known human genes, and might be useful in gene identification. The two sequencing consortia (refs 1, 2) again applied different criteria to count CpG islands (Box 1), and these differences probably produce a total that is higher, by an unknown amount, for chromosome 22's long arm. Even so, chromosome 21 appears to be even poorer in CpG islands than in genes when compared with chromosome 22.

The chromosome 22 sequencing consortium suggested that its identification of 545 genes on the long arm was low, a conclusion based in part on the fact that 271 of the 553 identified CpG islands have not yet been associated with genes. In fact, nearly all of the 115 conservatively predicted CpG islands on chromosome 21's long arm are associated with genes. Analysis of both chromosomes using the same methods will help to determine the accuracy of identifying genes by counting CpG islands.

The chromosome 21 sequencing consortium also compared the chromosome 21 sequence with data in the available mouse genome database. No new conserved syntenies (regions where the same genes are 'linked' on chromosomes in different species) were identified. The previously known conserved syntenies are with mouse chromosomes 10, 16 and 17. The chromosome 21 consortium suggests that discrepancies in the gene order predicted by comparing the sequence to mouse gene-linkage maps may result from the differing resolution of these maps. In fact, the higher-resolution physical map (ref. 6) of mouse chromosome 10 shows that all 24 genes known to be shared between mouse chromosome 10 and human chromosome 21 occur in the same order. The high degree of conservation between human and mouse is important, because comparing the two sequences, as more of the mouse sequence becomes available, is likely to increase our ability to pick out genes and other significant features from the welter of sequence information.

The availability of the chromosome 21 sequence will have an immediate impact on the study of human single-gene disorders. For example, the genes responsible for five of those monogenic disorders that map to chromosome 21 (including two forms of deafness, Usher and Knobloch's syndromes) have not yet been identified. But having the complete sequence will obviate the labour-intensive step of identifying candidate genes. The genes responsible for these disorders are likely to be found rapidly.

But the greatest impact of the chromosome 21 gene catalogue will be in assessing the contributions of specific genes to traits seen in Down syndrome. The small number of genes on chromosome 21 is likely to be part of the reason why the presence of three copies of this chromosome, unlike so many chromosome defects, is not fatal at a very young age, or even before birth. Yet there are varying ideas about which genes are associated with particular features of Down syndrome, and the mechanisms by which an imbalance in the number of genes might produce the more than 80 physical and mental disorders that can be seen in this condition. Obtaining a comprehensive catalogue of the genes on chromosome 21 has been a goal of Down syndrome researchers for many years, and is realized in this landmark contribution.

Roger H. Reeves is in the Department of Physiology, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, Maryland 21205-2105, USA.
e-mail: [email protected]

Box 1 Finding genes in a DNA haystack

The two groups (refs 1, 2) involved in sequencing chromosomes 21 and 22 used a similar combination of methods to search the sequences for genes. But they set different criteria to arrive at the numbers of genes and CpG islands, regions of DNA with more than 55% of cytosine and guanine nucleotides, which often mark the 5′ ends of genes.

Sixty per cent of mammalian genes are reported to have a CpG island near their 5′ end. But the percentage of CpG islands associated with genes is unknown, and can only be determined by knowing the total number of genes. Chromosome 22's long arm (ref. 2) is reported to have 553 CpG islands. The corresponding number is not given for chromosome 21. Rather, the total of 115 CpG islands reported for chromosome 21 includes only the subset that are not associated with repetitive DNA elements. This lowers the CpG island total on chromosome 21 by an unknown amount.

The chromosome 21 consortium (ref. 1) used a conservative criterion for identifying genes amid DNA sequences. This criterion demanded that regions matching the short sequences representing the transcripts of genes should show evidence of having been spliced from multiple protein-coding gene portions (exons). This would lower the predicted number of genes relative to the calculation for chromosome 22 (ref. 2), which included matches of these expressed sequence tags (ESTs) to single putative exons. Genes with large 3′ untranslated regions or large introns (non-coding parts of genes), or those represented in EST databases only by untranslated sequences, are likely to be excluded by the chromosome 21 criterion.

However, the estimate of 225 genes on the long arm of chromosome 21 includes those that are identified only by computer algorithms that can predict exons from a variety of sequence features (in silico prediction). Those with a gene-like structure identified by two or more algorithmic methods were added into the chromosome 21 gene total. Gene-prediction programs can give high false-positive rates of exon prediction, even in combination, and could inflate the gene number.

For chromosome 22, genes predicted only by algorithmic methods do not contribute to the total of 545 genes. But the analysis of chromosome 22 included a projection that was corrected for false positives resulting from in silico predictions. If genes predicted only by algorithm were included, the chromosome 22 total might rise by about 100.

So the differences in the gene-identification strategies used by the two consortia will tend to cancel each other out. The degree to which this is true could be determined by re-analysis of each sequence using the other method. R. H. R.

1. The Chromosome 21 Mapping and Sequencing Consortium Nature 405, 311–319 (2000).
2. Dunham, I. et al. Nature 402, 489–495 (1999).
3. Wang, S. Y. et al. Genome Res. 9, 1059–1073 (1999).
4. Deloukas, P. et al. Science 282, 744–746 (1998).
5. The MHC Sequencing Consortium Nature 401, 921–923 (1999).
6. Wiltshire, T. et al. Genome Res. 9, 1214–1222 (1999).

NATURE | VOL 405 | 18 MAY 2000 | www.nature.com © 2000 Macmillan Magazines Ltd 283
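The gene-count arithmetic in the main text can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the rounded figures quoted in the article (about 1% of the genome, 80% of the expected gene density, 2% of the genome sequenced), and the simple proportional scaling is an assumption made here for illustration, not the consortium's actual estimation method. All function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumption: simple proportional scaling; this is NOT the consortium's method.

PREDICTED_TOTAL_GENES = 100_000   # earlier prediction quoted in the text
CHR21Q_GENOME_FRACTION = 0.01     # long arm of chromosome 21: about 1% of the genome
UNIGENE_DENSITY_FACTOR = 0.80     # Unigene: about 80% of the size-based expectation

GENES_CHR21 = 225                 # genes identified on chromosome 21
GENES_CHR22 = 545                 # genes identified on chromosome 22
SEQUENCED_FRACTION = 0.02         # both chromosomes together: about 2% of the genome


def expected_gene_range(total_genes, genome_fraction, density_factor):
    """Return (low, high) expected gene counts for a region of the genome.

    high: the region carries genes in proportion to its size.
    low:  the region is gene-poor by the given density factor.
    """
    high = total_genes * genome_fraction
    low = high * density_factor
    return round(low), round(high)


def naive_genome_estimate(observed_genes, sequenced_fraction):
    """Scale an observed gene count up to the whole genome by proportion."""
    return round(observed_genes / sequenced_fraction)


expected_low, expected_high = expected_gene_range(
    PREDICTED_TOTAL_GENES, CHR21Q_GENOME_FRACTION, UNIGENE_DENSITY_FACTOR
)
naive_total = naive_genome_estimate(GENES_CHR21 + GENES_CHR22, SEQUENCED_FRACTION)

print(f"Expected on chromosome 21: {expected_low} to {expected_high} genes")
print(f"Observed on chromosome 21: {GENES_CHR21} genes")
print(f"Naive whole-genome estimate: {naive_total} genes")
```

Under these assumptions the expected range reproduces the 800–1,000 genes quoted in the text, and scaling the 770 genes found on the two chromosomes gives a figure of the same order as the consortium's estimate of about 40,000. Because the input fractions are rounded, the result should be read as an order-of-magnitude check rather than an estimate in its own right.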
