The Human Genome Project
Total Page:16
File Type:pdf, Size:1020Kb
ARTICLE The Human Genome Project BY STEPHEN SCHERER RÉSUMÉ Il existe désormais une séquence préliminaire de ABSTRACT We now have a draft sequence of the entire human l’ensemble du génome humain et de la plupart des gènes encodés. genome and most of the genes it encodes. Here we explain what Dans le présent article, nous expliquons ce qu’est le génome, ce the genome is, what it means to sequence it, outline how the que représente son séquençage et comment il a été réalisé, avant sequencing was done, and reflect on what this means for the way d’aborder une réflexion sur l’incidence de cette découverte sur we do science and, more fundamentally, how we understand notre façon de faire de la recherche scientifique et, sur un plan plus ourselves. A comprehensive description of the human genome is fondamental, sur notre connaissance de nous-mêmes. La descrip- the foundation of human biology and the essential prerequisite for tion complète du génome humain constitue le fondement de la an in-depth understanding of disease mechanisms. As such, biologie humaine et la condition essentielle à la connaissance information generated by the sequenced human genome will approfondie des mécanismes de la maladie. Comme telle, l’infor- represent a source book for biomedical science in the 21st century. mation découlant du séquençage du génome humain équivaudra à It will help scientists and clinicians to understand, diagnose and un guide de référence pour la science biomédicale XXIe siècle. Elle eventually treat many of the 5000 genetic diseases that afflict aidera les chercheurs et les cliniciens à comprendre, diagnostiquer humankind, including the multifactorial diseases in which genetic et finalement traiter nombre des 5 000 maladies génétiques qui predisposition and environmental cues play an important role. : KATY LEMAY affligent l’humanité, notamment les maladies plurifactorielles dans lesquelles la prédisposition génétique et les éléments environne- mentaux jouent un rôle important. (Traduction : www.isuma.net) ILLUSTRATIONS isuma 11 the human genome project enes are instructions that contrary to “garbage” which we, and dna was painstakingly slow and give organisms their charac- the cell, throw away). expensive. Sequencing the entire teristics. The instructions are human genome seemed impractical. Gstored in each cell of every living The Human Genome Project Others felt it made more sense to organism in a long string-like mole- and DNA sequencing focus research effort and money only cule called Deoxyribonucleic Acid “Sequencing” is the process of deter- on discovering particular genes (dna). dna molecules are subdivided mining the specific order and identity thought to be associated with dis- into finite structures called chromo- of the three billion base pairs in the eases, rather than trying to spell out somes (Figure 1). Each organism has a genome with the ultimate goal of iden- the entire genome. However, even for characteristic number of chromo- tifying all of the genes. “Mapping” is disease projects aimed at finding sin- somes. For humans the number is 46 the process of identifying discrete dna gle gene disorders (such as that for (23 pairs) and this complete set of segments of known position on a cystic fibrosis) the worldwide cost genetic information is called the chromosome which are then used for could often run in the tens of millions genome (Figure 2). sequencing (mapping is a crucial step dollars before success was achieved. for proper reconstruction of the The idea for the human genome pro- DNA, genes and genomes genome; it usually precedes sequenc- ject eventually won support, justified Human dna looks like a twisted lad- ing but is also necessary in post- as an investment in fundamental bio- der with three billion rungs. If sequencing). To give one a sense of the logical infrastructure upon which the unwound, your dna would stretch enormity of the task to decode the scientific community around the over five feet, but it is only 50 tril- human genome, a simple listing of all world would build exciting new lionths of an inch wide. The total of the atcg combinations would fill research that would lead to the diag- amount of dna in the 100 trillion or hundreds of phone books. And hav- nosis and treatment of disease. so cells in the human body laid end to ing the sequence is only a first step in The enormity of the challenge end would run to the sun and back finding the genes, understanding how inspired the creation of an interna- some 20 times. The three billion rungs they operate, and how particular tional, publicly funded research ini- are made up of chemical units, called genetic discrepancies are associated tiative called the Human Genome “base pairs,” of nucleotides — with diseases. Using this dna sequence Project (hgp), officially launched in adenines, thymines, cytosines and gua- information to diagnose, prevent or the early 1990s. The Human Genome nines, represented by the letters A, T, treat those genetic predispositions or Organization (hugo) brings together C and G. Particular combinations of causes of disease is an even more com- over 1000 scientists from 50 countries these dna base pairs (or genes) consti- plex task. worldwide, and has a mandate that tute coded instructions for the forma- When the idea of sequencing the includes promoting international col- tion and functioning of proteins, which human genome was first proposed laboration within the hgp. The earli- make up the body and govern its bio- about 15 years ago, it was highly con- est genome meetings were attended by logical functioning (examples of pro- troversial. For one thing, given the a rag-tag group of usually fewer than teins include insulin, collagen, diges- technology at the time, sequencing 100 biologists of diverse background tive enzymes, etc.). Ribonucleic acid, (rna) is a single stranded copy of dna that acts as an intermediate messenger FIGURE 1 molecule that allows dna sequence to be translated to protein. This central process, whereby dna transcribes to Gene rna, which in turn transcribes to pro- Nucleus Cell Chromosomes tein, underlies all of life. Some genes are made up of only a DNA few hundred base pairs, others run to a couple of million base pairs. It is now estimated that we have around 30,000 to 40,000 genes (much more work will be required to determine the precise Protein number), but those genes account for less than five percent of our dna, with the rest being filler. For lack of a better term, the non-genic dna is often dubbed “junk” since its role is not ful- ly known (in fact this might be a good Figure 1. The human genome consists of 23 pairs of chromosomes that are comprised of DNA. Approximately 5% of the DNA along the chromosome encodes for genes. Genes make term since just like the cell we often proteins. Proteins provide cellular structure and functional activity within the body. keep our “junk” just in case we need it 12 Autumn • Automne 2001 the human genome project FIGURE 2 It will likely take two to three years for the draft 1 23 45form ofthe dna sequence to reach 6 7 8 9 10 11 12 what is considered 181716151413 to be in a finished 19 20 21 22 X Y state. Figure 2. The 23 pairs of human chromosomes one from each pair being inherited from one parent. Note the interchange of material between chromosomes 7 and 13 which disrupts a gene leading to disease. and research interests, only a few of dyed nucleotides can attach them- whom had some type of vision of how selves to the dna fragments, but other to accomplish this distant goal (Fig- nucleotides will not adhere to the dyed ure 3). Since 1990, the strategies and nucleotides. Thus the chemical reac- measures of successes of the hgp have tion generates dna fragments of vari- most often been debated at an annual ous length that each terminate at flu- meeting held in Cold Spring Harbor ourescently dyed A, T, C or G Laboratories, New York (directed by nucleotides. The underlying sequence James Watson, co-discover of dna). for the range of the dna fragments The most accurate documentation of created in the chemical reaction is then the science and progress behind the determined by an automated sequenc- genome project can be found within ing machine. The synthesized dna the abstracts of this meeting. The ear- fragments are negatively charged. The ly consensus was that before large- sequencing machine sets up an electri- scale and cost-effective sequencing of cal field. The dna fragments move the entire genome could be complet- through a pourous gel toward the pos- ed, numerous technological develop- itive charge, with shorter fragments ments, associated with mapping and moving more quickly through the gel. sequencing processes as well as with The fluorescent, tagged bases at the the computational capacity to make end of the fragments can be detected sense of the resulting information, with the help of a laser, and the result- were essential. ing information stored on a comput- With developments in technology er. As a result, the order in which the over the last few years, the seqencing particular tagged nucleotides are read is now done through a process called reflects their order on the stretch of fluorescence-based dideoxy sequenc- dna that has been replicated in the ing (Figure 4a). Fragments of dna are chemical reaction. first cloned in bacteria. They are then Each reaction reveals the sequence put into a chemical reaction with free of about 500 letters (A,T,C,G) of dna nucleotides, some of which are tagged before the process runs its course. Once with fluorescent dyes. Nucleotides these relatively tiny sequences are attach themselves to the dna frag- obtained, their place in the overall ments in a particular order.