Life Will Never Be the Same Annual Genome Sequencing and Biology Meeting, Cold Spring Harbor Laboratory, USA
Yeast 2000; 17: 241–243. Meeting Review
Meeting date: May 2000

M. A. Strivens*
MRC Mammalian Genetics Unit and UK Mouse Genome Centre, Harwell, UK

*Correspondence to: M. A. Strivens, MRC Mammalian Genetics Unit and UK Mouse Genome Centre, Harwell, Oxfordshire OX11 0RD, UK. E-mail: [email protected]

Copyright © 2000 John Wiley & Sons, Ltd.

It seems appropriate that the Cold Spring Harbor Genome Sequencing and Biology Meeting, which witnessed the creation of the Human Genome Organization (HUGO) in 1988, should this year present three major advances in genomic science: the completion of the finished sequence of Drosophila melanogaster; the announcement that 85% of the genome of Homo sapiens is now in draft sequence; and the complete, finished sequence of a second human chromosome, chromosome 21. Other major sessions of the meeting focused on single nucleotide polymorphisms (SNPs), ethical, legal and social implications (ELSI), as well as comparative and functional genomics.

There is significant debate as to whether the technique of 'whole genome shotgun' sequencing¹ is applicable to the elucidation of larger genomes (e.g. human). Prior to the beginning of this year, this technique had only been demonstrated on small microbial genomes (such as Haemophilus influenzae, with a genome of approximately 1.8 Mb [1]) and there had been considerable scepticism as to whether the technique would work in human or fruit fly. There was, therefore, considerable interest and excitement in the presentation by the Berkeley Drosophila Genome Project (BDGP) and Celera Genomics (Rockville, USA) on the full Drosophila sequence.

¹ Where all genomic material is sequenced as small fragments and the resulting sequence fragments are reassembled by sophisticated software algorithms.

The crux of the whole-genome shotgun strategy is the assembly technique. Gene Myers (Celera Genomics) reported on how the 'double-barrelled' shotgun² approach had given a significant advantage to the computer algorithms employed in the assembly of the fly genome. The assembly system employs a bottom-up, nucleating strategy, initially assembling small islands of sequence of high confidence (diverting the assembly of repeat regions to later stages) and then searching for other sequence (including orientation data from clone end-sequencing) to join the islands together. In addition, he confirmed a less than 0.5% error rate in the automated assembly of repeat sequences that are a potential problem for this type of system.

² This is where sequence is generated from both ends of shotgun clones as well as a range of small insert libraries. These libraries have a narrowly defined size range, so end-sequence data provides important positional information for ordering and orientation of sequence fragments.

Gerry Rubin (BDGP) and Mark Adams (Celera Genomics) presented material from the analysis of the genomic sequence, showing that the total number of genes is approximately 13 600 (compared with approximately 15 000 seen in the smaller 100 Mb Caenorhabditis elegans genome). In addition, there was substantial variation of 0–30 genes per 50 kb (but without the clustering seen in C. elegans). There was also a general trend showing a marked decrease in gene density and G+C content, and an increase in transposons, in the 1 Mb portion adjacent to the centromeric heterochromatin. Gerry Rubin commented that it was remarkable that the fly had only just over twice the number of genes compared with yeast, leading to the proposition that the complexity of an organism's gene content is not directly proportional to the complexity of the organism itself.

This observation prompted spirited debate, centred on exactly how many genes would comprise the human genome and, indeed, how one would define what comprised a single gene. In response to this debate, Ewan Birney of the European Bioinformatics Institute (EBI, UK) has created a sweepstake, 'Gene Sweep' (http://www.ensembl.org/genesweep.html), allowing bets to be placed on what the eventual figure will be. It is characteristic of the current divergence in opinion that guesses currently range from 27 000 to 200 000. The sweepstake is open for another 2 years, with definitions and absolute gene number being decided at the Cold Spring Harbor meetings in 2002 and 2003, respectively.

In addition to the announcement of the completion of the Drosophila genome, substantial progress has been achieved in the production of the working draft of the human genome. Jane Rogers (Sanger Centre, UK) presented the accelerated progress of the draft sequencing, now representing 85% of the human genome, with anything up to 97% in the final phase of checking (http://www.sanger.ac.uk/HGP/stats.shtml).

Andre Rosenthal (Institut für Molekulare Biotechnologie, Germany) reported the work of an international consortium of labs [2] producing a finished sequence for human chromosome 21. This chromosome has almost 20 disease loci associated with it, in addition to the trisomy that gives rise to Down's syndrome. From the analysis of the finished sequence, the consortium identified 127 genes of known function and 98 putative genes (including 59 pseudogenes). Approximately 40% of this chromosome is composed of repetitive elements, with some apparently very gene-poor areas totalling approximately 10 Mb. One of the most startling of the gene-poor regions was a 7 Mb region on 21q, where only one gene, as yet, has been detected.

Using the number of genes identified from the completed sequence from human chromosomes 21 and 22, an extrapolation was done showing that (theoretically) the human genome could contain as few as 40 000 genes.

In addition to the intense experimental effort, several sessions focused on the bioinformatics techniques being designed to analyse the explosion of raw data from the sequencing and mapping centres.

One of the principal problems with the high output of the various genome projects concerns the degree and appropriateness of annotation assigned to a sequence. This is exacerbated by the constantly evolving draft sequence. Ewan Birney (EBI, UK) presented the Ensembl (http://www.ensembl.org/) system, which is capable of defining features on draft sequence (assigning them stable identifiers) and maintaining those features and identifiers as draft sequence progresses to its final finished form. This system is clearly a valuable tool in the early identification and exploitation of novel genes and regulatory elements. Comparative sequence analysis³ is also becoming a viable tool for the dissection of novel genomic regions where sequence is available in a number of species. Greg Elgar (Human Genome Project Resource Centre, UK) presented comparative analysis between the compressed genome of the puffer-fish, Fugu, and a number of loci in mouse and man, demonstrating the power of this technique in identifying new genes and conserved regions in these species. In addition, to aid this type of approach, tools are being developed to align and visualize multi-species sequence comparisons, e.g. the Vista tool (Kelly Frazer, Lawrence Berkeley National Laboratory, USA). This tool is capable of visualizing long-range alignments between several species and can be used to define statistical cut-offs for conserved elements.

³ The alignment of sequence from syntenic regions in order to identify evolutionarily conserved regions, such as conserved exons and regulatory regions.

The keynote speech was, appropriately, delivered by Francis Collins (National Human Genome Research Institute, USA), who has played a pivotal role in the organisation of the Human Genome Project and who clearly relished the prospect of delivering this address. He urged the assembled audience not to lose sight of the ultimate goals of the genome project; in the near term this means the generation of high-quality, annotated sequence that is made available to all. In the longer term, the requirement is to begin to use the data generated to pump-prime the next phase of genomic science, seeing that this, in turn, is translated into advances in basic medical science.

Given the potential impact of even the draft sequence on the whole of science and society, Dr Collins finished his address with 10 'exhortations' (see below) to the assembled scientists to 'attend to the broader social context' of the Human Genome Project. This involves both the education of school and college students and ensuring the understanding of the general public:

1. Cut out the lights, close the door, and get out of the lab. Spread your wisdom. Turn your bar napkins into genetic primers.
2. Volunteer to be a resource to local science teachers. Make yourself available to speak to elementary and secondary school students. Get involved in setting state and local science education standards. Alert local high school biology teachers to the NHGRI curriculum supplement on human genetic variation and HGP educational video documentary and CD ROM to be released this Fall.
3. Be an ambassador for science: speak to Rotary, Chamber of Commerce, church group or local bar association.
6. Establish links with the schools of business, law, public health and education at your institution.
7. Start a DNA Day on your campus, at your hospital, or within your community.
8. Engage your community in a discussion of genetics. Use mistakes and over-simplifications in the press to improve local genetic literacy through letters to the editor. Write op-eds.
9. Get to know the ELSI issues and offer your expertise to state and federal policy makers. Share your concerns about the importance of protecting the privacy and fair use of genetic information.
10. Share ideas. Let me, and others, know what works in your community.

Acknowledgements

My thanks to Francis Collins (NHGRI, USA) for providing