Downloaded from

https://www.cambridge.org/core

British Journal of Nutrition (2004), 91, 1–2 DOI: 10.1079/BJN20031012 © The Author 2004

Editorial

Of and genomes – and dark matter

The draft sequence of the human genome was published in February 2001 with simultaneous papers in Nature and in Science (Lander et al. 2001; Venter et al. 2001). For the scientific community this achievement was of enormous importance, and its broader cultural significance was emphasised by the involvement of political leaders on both sides of the Atlantic in the accompanying publicity. The draft sequence was replaced 2 years later, in April 2003, by the ‘final’ version. This was done not by publication in a journal but through an announcement that the sequence information had been deposited in publicly accessible databases. The sequence of the human genome undoubtedly represents the summit of the genome sequencing efforts that began more than a decade ago.

We now have the full sequence of the genome of a rapidly growing number of organisms. These range from bacteria such as Escherichia coli and Salmonella enterica, the nematode worm, Caenorhabditis elegans, the fruit fly, Drosophila melanogaster, and plants such as rice (Oryza sativa). Priorities for the future include the sequencing of the honeybee (Apis mellifera), chicken (Gallus gallus), and chimpanzee (Pan troglodytes) genomes (see Couzin, 2003). Of particular importance is the deposition of the draft sequence of the genome of the mouse in May 2002 (see Marshall, 2002). The mouse is, of course, widely used as an experimental animal and as a model system for human disease, including in nutrition-related research. For example, genes critical for the regulation of energy balance, such as those encoding leptin and its receptor, have been identified through studies on mutant mice (Zhang et al. 1994; Chua et al. 1996; Lee et al. 1996).

More primitive organisms can also be valuable for investigating regulatory processes that impact on human function and disease, and this has been part of the underlying ethos of those studying C. elegans. The potential impact of such an approach is well illustrated by a recent report in which a genome-wide RNAi (RNA-mediated interference) analysis has identified a large number of genes implicated in normal fat storage (Ashrafi et al. 2003). The C. elegans genome is believed to encompass 16 757 genes, and 305 gene inactivations which reduce body fat and 112 that lead to increases in fat storage have been identified. A number of these C. elegans fat regulatory genes have mammalian homologues and some are already recognised as important in the control of body fat in man. But this approach has also led to the identification of genes encoding proteins not hitherto recognised as being involved in determining fat deposition, e.g. a lysosomal transporter and a peptide transporter (Ashrafi et al. 2003).

Although the number of genes in C. elegans is quoted as a very precise figure, the number of human genes continues to be a matter of uncertainty. A few years ago it was estimated that there were upwards of 100 000 protein-coding genes in the human genome. This number fell to as little as 32 000 when the draft sequence of the human genome was published (Lander et al. 2001), and the much lower than expected figure was one of the major surprises from the genome sequence. The difficulty in determining the precise number of human genes reflects the fact that exons – the protein-coding regions – make up only around 2 % of the total DNA of the genome. This means that some 98 % of the genome does not directly code for proteins; some, of course, reflects regulatory sequences.

There are distinct regions of the genome which appear not to contain any genes – the so-called dark matter regions. There is continuing debate as to whether the dark matter does really contain genes; however, if present, protein-coding sequences are not recognised by the various software programmes that predict gene counts. These programmes themselves come up with very different estimates for the number of human genes outwith the regions of dark matter. At one end of the spectrum, GENSCAN predicts as many as 45 000 genes, while Ensembl/Genewise gives a prediction of just 24 500 – with other programmes providing figures closer to the lower end of this range (see Pennisi, 2003).

Three years ago the gene sequencing community set up a betting pool – GeneSweep – with a prize for the estimate which was nearest to the final agreed value. Last summer (2003) the decision was made to award the prize to the estimate closest to 24 500 – the Ensembl/Genewise prediction. In practice, even this currently accepted best value could be an over-estimate of the number of protein-coding genes, since it may include as many as 3 000 pseudogenes (see Pennisi, 2003). This suggests that the real number of human genes is little more than 20 000. Such a figure seems remarkable, given that C. elegans – a simple worm – is considered to contain nearly 17 000 genes.

Clearly, establishing the true number of protein-coding genes in the human genome is much more than of abstract interest; it is a pre-requisite for comprehending the full complexity of human function at the molecular, cellular and physiological levels – as well as for the maturation of the rapidly burgeoning field of nutritional genomics (Trayhurn, 2003).

Paul Trayhurn
Editor-in-Chief
Liverpool Centre for Nutritional Genomics
Department of Medicine
University of Liverpool
Liverpool L69 3GA
UK
[email protected]

References

Ashrafi K, Chang FY, Watts JL, et al. (2003) Genome-wide RNAi analysis of Caenorhabditis elegans fat regulatory genes. Nature 421, 268–272.
Chua SC, Chung WK, Wupeng XS, et al. (1996) Phenotypes of mouse diabetes and rat fatty due to mutations in the ob (leptin) receptor. Science 271, 994–996.
Couzin J (2003) GENOMICS: Sequencers examine priorities. Science 301, 1176–1177.
Lander ES, Linton LM, Birren B, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Lee GH, Proenca R, Montez JM, et al. (1996) Abnormal splicing of the leptin receptor in diabetic mice. Nature 379, 632–635.
Marshall E (2002) Public group completes draft of the mouse. Science 296, 1005.
Pennisi E (2003) BIOINFORMATICS: Gene counters struggle to get the right answer. Science 301, 1040–1041.
Trayhurn P (2003) Nutritional genomics – ‘Nutrigenomics’. Br J Nutr 89, 1–2.
Venter JC, Adams MD, Myers EW, et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
Zhang YY, Proenca R, Maffei M, Barone M, Leopold L & Friedman JM (1994) Positional cloning of the mouse obese gene and its human homolog. Nature 372, 425–432.
