Paper No. : 16 Molecular Genetics Module : 29 Large scale analysis of genome: Human Genome Part I

Development Team

Principal Investigator: Prof. Neeta Sehgal Head, Department of Zoology, University of Delhi

Co-Principal Investigator: Prof. D.K. Singh Department of Zoology, University of Delhi

Paper Coordinator: Prof. Namita Agarwal Department of Zoology, University of Delhi

Content Writer: Dr. Nidhi Garg Deshbandhu College, University of Delhi

Content Reviewer: Dr. Surajit Sarkar Department of Genetics, South Campus, Delhi University


Molecular Genetics ZOOLOGY Large scale analysis of genome: Human Genome Part I

Description of Module

Subject Name Zoology

Paper Name Molecular Genetics Zool 016

Module Name/Title Large scale analysis of genome

Module Id M29: Human Genome: Part I

Keywords Genome, Gene, Sequencing, Genetic and Physical Maps

Contents 1. Learning Outcomes 2. Introduction 3. Human Genome Project (HGP) 4. History of Human Genome Sequencing 5. Budget of the Human Genome Project 6. Goals of the Human Genome Project 7. Summary


1. Learning Outcomes

After studying this module, you shall be able to

• Know what a genome is.
• Learn about the history of the Human Genome Project.
• Evaluate the importance of the Human Genome Project.
• Know the important goals of the HGP and how well they were achieved within the planned time frame.

2. Introduction

The genome is defined as the genetic material of an organism, which consists of DNA or, in RNA viruses, of RNA. The term genome was coined in 1920 by Professor Hans Winkler of the University of Hamburg, Germany. The DNA is organized in the form of chromosomes. In haploid organisms such as bacteria, archaea and viruses, and in organelles like mitochondria and chloroplasts, the genome typically consists of a single circular or linear chromosome. In a sexually reproducing diploid organism, a somatic cell carries two full sets of chromosomes. The gametes of a diploid organism contain half the number of chromosomes due to meiosis. Some organisms may be triploid, tetraploid, pentaploid etc. and therefore have multiple sets of chromosomes. The term genome thus refers not only to the DNA present in the nucleus, known as the "nuclear genome", but also to the DNA stored in mitochondria and chloroplasts, known as the "mitochondrial genome" and the "chloroplast genome" respectively.

Sequencing the genome of an organism refers to determining the order of the nitrogenous bases A, T, G and C in its genetic material. Thus, for a virus it may involve determining the base sequence of only a single chromosome, whereas for a bacterium it may involve sequencing both the chromosome and the plasmids, which together comprise its genome. For sexually reproducing organisms, genome sequencing means determining the sequences of a complete set of autosomes and one of each type of sex chromosome. For example, a human somatic cell carries 22 pairs of autosomes and 2 sex chromosomes; a complete genome sequence covering one copy of each autosome plus the X and Y chromosomes therefore comprises 24 separate chromosome sequences. It is also important to determine the sequence of the mitochondrial or chloroplast DNA to have complete information about the genome of a eukaryotic organism.

Genome projects are undertaken to sequence the genome of an organism. They are scientific research projects initiated by research groups the world over with the aim of sequencing the complete genome, annotating the protein-coding genes and decoding the essential features of a genome that either distinguish it from or relate it to other genomes. Both the length of the genome and the total number of genes differ extensively from one species to another.

The decision by research agencies to sequence a genome depends upon the importance of the organism. It might be a model organism, or it may have commercial importance (for example a crop plant, livestock, yeast or an enzyme-producing bacterium) or significant importance to human health. Emphasis is also given to sequencing the genomes of species that will help in determining molecular evolution or phylogeny. The genome sequence provides information regarding the order of every nitrogenous base, whereas a genome map is less detailed than a genome sequence but identifies the landmarks and helps in navigating around the genome. Historically, the common approach for sequencing eukaryotic genomes was to first map the genome, which gives information regarding the landmarks within it, instead of sequencing each chromosome in one go. Mapping the chromosome allows sequencing to be done bit by bit, as one already knows approximately where a particular DNA fragment is located on the chromosome. Currently, due to improvements in DNA sequencing technology, it is possible to sequence an entire genome more quickly and in one go using methods such as the shotgun approach. Sequencing of genomes has also become more affordable due to a steady reduction in the cost per base pair.
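The idea behind the shotgun approach mentioned above can be illustrated with a toy example. The sketch below is not the method used by the HGP or by any production assembler. It is a minimal greedy overlap merge, assuming error-free reads from a single strand and a short made-up sequence without long repeats. The names `overlap`, `greedy_assemble`, `genome` and `reads` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_k, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:  # nothing left to merge
            return contigs
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_k:])


# Toy data (an assumption for illustration): a 28-base sequence cut into
# overlapping 10-base reads, one starting every 3 bases.
genome = "ATGCGTACGTTAGCCGATAGCTAAGGCT"
reads = [genome[i:i + 10] for i in range(0, len(genome) - 9, 3)]
contigs = greedy_assemble(reads)
print(contigs)
```

On this repeat-free toy sequence the reads collapse back into a single contig identical to the starting sequence. Real genomes contain long repeats, sequencing errors and reads from both strands, which is why maps of landmarks were historically used to guide assembly.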

3. Human Genome Project

The HGP was a collaborative project between several countries that aimed to determine the sequence of the 3 billion base pairs comprising human DNA. It also involved identifying and mapping all the genes in the human genome. The HGP was proposed and largely funded by the US government and to date remains the world's largest collaborative biological project. Although planning of the project started in 1984, the work began in 1990 and the complete genome was announced in 2003. In 1998, Craig Venter founded Celera Genomics, a privately funded company that took up the sequencing project in parallel with the HGP. The publicly funded sequencing was carried out in the twenty institutes listed below.

The International Human Genome Sequencing Consortium included the following institutes:

1. The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.
2. The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.
3. Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo., U.S.
4. United States DOE Joint Genome Institute, Walnut Creek, Calif., U.S.
5. Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, Houston, Tex., U.S.
6. RIKEN Genomic Sciences Center, Yokohama, Japan
7. Genoscope and CNRS UMR-8030, Evry, France
8. GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass., U.S.
9. Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
10. Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
11. Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash., U.S.
12. Stanford Genome Technology Center, Stanford, Calif., U.S.
13. Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, Calif., U.S.
14. University of Washington Genome Center, Seattle, Wash., U.S.
15. Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
16. University of Texas Southwestern Medical Center at Dallas, Dallas, Tex., U.S.
17. University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, Norman, Okla., U.S.
18. Max Planck Institute for Molecular Genetics, Berlin, Germany
19. Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y., U.S.
20. GBF - German Research Centre for Biotechnology, Braunschweig, Germany

These international institutions played a vital role in the quick and effective completion of the HGP. In the United States, where the project was founded, the major contributors were:

1. The U.S. Department of Energy (DOE): It was the center for discussion of the HGP as early as 1984.
2. The National Institutes of Health (NIH): It first participated in the project in 1988 by creating the Office for Human Genome Research, which was upgraded in 1990 to the National Center for Human Genome Research and renamed the National Human Genome Research Institute (NHGRI) in 1997.

Funding for the HGP came not only from the US government, through the NIH and DOE, but also from the Wellcome Trust, a UK-based charitable organization, and from several other organizations the world over. UNESCO played a significant role in involving developing nations in the HGP.

4. History of Human Genome Sequencing

The HGP grew out of two important insights that emerged in the early 1980s. The first was that sequencing complete genomes would accelerate biomedical research, as it would allow researchers to approach problems in an all-inclusive and unbiased fashion. The second was that building the required infrastructure would need a communal effort, something that had not been attempted in biomedical research until then. Important projects that played a vital role in crystallizing these insights were:

1. Between 1977 and 1982, the complete genomes of the bacterial viruses ΦX174 and λ, the animal virus SV40 and the human mitochondrion were sequenced. These sequencing projects demonstrated the practicability of assembling small sequence fragments into complete genomes. The data generated showed the value of having the complete set of genes and other functional elements for further research and analysis.
2. In 1980, Botstein and colleagues launched a program to generate a human genetic map, which made it feasible to locate disease genes of unknown function on the basis of their inheritance patterns alone.
3. In the mid-1980s, Olson and Sulston launched programs to create physical maps of clones covering the yeast and worm genomes. This allowed genes and regions to be isolated on the basis of their chromosomal position.

The history of the HGP dates back to May 1985, when Robert Sinsheimer organized a workshop to discuss the sequencing of the human genome, but the NIH was not interested in his proposal. In March 1986, Charles DeLisi and David Smith from the DOE's Office of Health and Environmental Research (OHER) organized the Santa Fe Workshop. Two months later, a workshop was organized by Dr. James Watson at the Cold Spring Harbor Laboratory. A memo containing a broad plan of the HGP was sent by Charles DeLisi, the then Director of OHER, to Dr. Alvin Trivelpiece, who was the Assistant Secretary for Energy Research. Dr. Trivelpiece pursued the proposal and obtained consent for the project from the Deputy Secretary. The Santa Fe workshop had indeed been successful in motivating the federal agency to support the HGP, which ultimately led to the approval of funds that allowed the OHER to start the HGP in 1986. A total of $4 million was initially allocated to initiate the project.

The budget for the genome project was proposed by President Reagan in his 1987 budget to Congress and was ultimately approved by both Houses. Senator Peter Domenici, a friend of DeLisi, played a vital role in getting Congressional approval for the project, as he chaired the Senate Committee on Energy and Natural Resources as well as the Budget Committee. A line-item budget of $3 billion was approved by the Reagan Administration, and the project was expected to take 15 years beginning in 1990. In 1990, the DOE and NIH signed a Memorandum of Understanding (MoU) to coordinate their plans and initiate the genome project. In 1990, James Watson headed the NIH-funded genome program, while David Galas was initially made the Director of the Office of Biological and Environmental Research in the U.S. Department of Energy's Office of Science. In 1993, Francis Collins succeeded James Watson while Aristides Patrinos succeeded Galas. Collins thus became Director of the NIH National Center for Human Genome Research, which was later renamed the National Human Genome Research Institute.

In 1998, the American scientist Craig Venter founded a privately funded firm known as Celera Genomics. In the early 1990s he had been a research scientist at the NIH, associated with the HGP from its beginning. Celera was founded with a capital of $300 million and aimed to sequence the genome faster and at a cost much lower than $3 billion. Celera Genomics employed the technique of whole genome shotgun sequencing, which had been used for sequencing bacterial genomes of up to six million base pairs but had never been used for a genome containing three billion base pairs.

Celera Genomics had promised to publish its findings in accordance with the 1996 "Bermuda Statement" by releasing new data annually, whereas the HGP, being a publicly funded project, released its new data daily. Unlike the HGP, Celera Genomics permitted neither free redistribution nor scientific use of its data. The publicly funded HGP also released the first draft of the human genome earlier than Celera Genomics. In March 2000, the President of the United States announced that the human genome sequence could not be patented and that researchers would have free access to it. This announcement had a negative impact on Celera's shares on the Nasdaq stock exchange, and their price fell drastically. The biotechnology sector as a whole lost approximately $50 billion in market value within two days of the announcement.

As a result of international cooperation and developments in genome sequencing and bioinformatics, a 'working draft' of the genome was finished in 2000, a year ahead of the planned timeline. The announcement was made jointly on June 26, 2000 by the U.S. President Bill Clinton and the British Prime Minister Tony Blair. A rough draft of the genome was completed and released on July 7, 2000 by the UCSC Genome Bioinformatics Group at the University of California, Santa Cruz. On the first day of free and open access, about 500 GB of information was downloaded by the scientific community from the UCSC genome server. The research papers describing the methods and sequence analysis of the rough draft of the human genome were published in February 2001. The researchers of the HGP published their work in the journal Nature, while the scientists at Celera Genomics published theirs in Science. The drafts published by the two groups covered about 83% of the genome, which included 90% of the euchromatic regions, with about 150,000 gaps. At that time, the order and orientation of several DNA segments were not well established. Due to advances in sequencing techniques, the complete genome was announced on April 14, 2003, two years ahead of the timeline. The complete draft of the human genome was published in 2003. In May 2006, the sequence of the last chromosome was published in Nature, which marked the completion of the project. The timeline of the Human Genome Project is shown below (Figure 1).

For more information, watch the video at https://www.youtube.com/watch?v=slRyGLmt3qc

Figure 1: Timeline of the Human Genome Project 1984-2001. Source: Lander et al. 2001. Initial sequencing and analysis of the human genome. International Human Genome Sequencing Consortium. Nature, Vol. 409, pp. 860-921.

For more information regarding the key events of the Human Genome Project and the ongoing research log on to http://www.genome.gov/10001763

5. Budget of the Human Genome Project

The budget set for carrying out the Human Genome Project was $3 billion. This amount was initially to be spent in three stages over a 15-year period (1990-2005), but due to accelerated progress the funding was calculated from 1990 to 2003 (Table 1). The funding was to be spent on:

1. Conducting studies of human diseases.
2. Sequencing of model organisms.
3. Developing new technologies for biological and medical research.
4. Developing computational methods to analyze genomes.
5. Studying ethical, legal, and social issues (ELSI) related to genome sequencing.
6. Sequencing the human genome.

Table 1: The Funding of the U.S. Human Genome Project from 1988 to 2003 ($ Millions).

FY      DOE     NIH*    U.S. Total
1988    10.7    17.2    27.9
1989    18.5    28.2    46.7
1990    27.2    59.5    86.7
1991    47.4    87.4    134.8
1992    59.4    104.8   164.2
1993    63.0    106.1   169.1
1994    63.3    127.0   190.3
1995    68.7    153.8   222.5
1996    73.9    169.3   243.2
1997    77.9    188.9   266.8
1998    85.5    218.3   303.8
1999    89.9    225.7   315.6
2000    88.9    271.7   360.6
2001    86.4    308.4   394.8
2002    90.1    346.7   434.3
2003    64.2    372.8   437.0

Note: Funds involved in construction have not been included, as they comprise a minor portion of the budget.
Source: http://web.ornl.gov/sci/techresources/Human_Genome/project/budget.shtml

The funding agencies allotted 3% to 5% of their budgets for studying ethical, legal, and social issues related to the project.
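The figures in Table 1 can be totalled with a short script. The sketch below copies the values exactly as printed in the table. The variable names are hypothetical, and applying the 3% to 5% ELSI share to the combined DOE and NIH total is a simplifying assumption made only to give an order of magnitude, since the text states the share per agency budget.

```python
# (fiscal year, DOE, NIH, printed U.S. total) in $ millions, as given in Table 1
FUNDING = [
    (1988, 10.7, 17.2, 27.9),
    (1989, 18.5, 28.2, 46.7),
    (1990, 27.2, 59.5, 86.7),
    (1991, 47.4, 87.4, 134.8),
    (1992, 59.4, 104.8, 164.2),
    (1993, 63.0, 106.1, 169.1),
    (1994, 63.3, 127.0, 190.3),
    (1995, 68.7, 153.8, 222.5),
    (1996, 73.9, 169.3, 243.2),
    (1997, 77.9, 188.9, 266.8),
    (1998, 85.5, 218.3, 303.8),
    (1999, 89.9, 225.7, 315.6),
    (2000, 88.9, 271.7, 360.6),
    (2001, 86.4, 308.4, 394.8),
    (2002, 90.1, 346.7, 434.3),
    (2003, 64.2, 372.8, 437.0),
]

doe_total = sum(doe for _, doe, _, _ in FUNDING)
nih_total = sum(nih for _, _, nih, _ in FUNDING)
combined = doe_total + nih_total

# Fiscal years whose printed total differs from DOE + NIH by more than rounding
mismatched = [fy for fy, doe, nih, total in FUNDING if abs(doe + nih - total) > 0.05]

# Assumption: apply the 3% to 5% ELSI share to the combined total
elsi_low, elsi_high = 0.03 * combined, 0.05 * combined

print(f"DOE total:      {doe_total:8.1f}")
print(f"NIH total:      {nih_total:8.1f}")
print(f"DOE + NIH:      {combined:8.1f}")
print(f"ELSI (3-5%):    {elsi_low:.1f} to {elsi_high:.1f}")
print(f"Rows where DOE + NIH differs from the printed total: {mismatched}")
```

With the figures as printed, the DOE and NIH columns add up to about $3,800.8 million over fiscal years 1988 to 2003. Every row satisfies DOE + NIH = U.S. Total except fiscal year 2002, which is reproduced here as given in the source.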

6. Goals of the Human Genome Project

The goals for the three five-year plans were set jointly by the NIH and the DOE, as they were the two main organizations that received funding for the Human Genome Project (Table 2). The HGP was a collaborative worldwide research effort whose primary goal was to analyze the structure of human DNA and to determine the precise position of genes. In parallel, the genomes of certain model organisms were to be sequenced to obtain comparative information important for understanding the functioning of the human genome. The information generated by the HGP aids the advancement of biomedical science. In addition, knowledge of genes is of enormous utility in medicine, helping in understanding and treating several genetic diseases as well as multifactorial diseases in which genetic predisposition plays an important role.

The human genome project was initially planned for a span of 15 years from 1990 to 2005. This time period was divided into three five year plans. The first 5-year plan from 1990-1995, was revised in 1993 as there was accelerated progress in the genome sequencing. The second 5-year plan defined goals from 1993 to 1998. The development of the third plan occurred through several workshops conducted by the DOE and NIH.

First five-year (1990-1995) Goals of the Human Genome Project:

1. Mapping and Sequencing the Human Genome:
a) Genetic Mapping: To complete the human genetic map with markers spaced 2 to 5 centimorgans (cM) apart. To identify every marker by a sequence tagged site (STS).
b) Physical Mapping: To assemble STS maps of all human chromosomes with markers spaced at 100,000-bp intervals. To generate overlapping sets of cloned DNA with continuity over lengths of 2 Mb for large parts of the human genome.
c) DNA Sequencing: To improve the existing DNA sequencing methods and to develop newer sequencing techniques, so as to lower the cost of large-scale DNA sequencing to $0.50 per base pair. To sequence 10 Mb of human DNA in large uninterrupted stretches.
2. Gene Identification: To develop efficient methods for identifying genes and for placing known genes on physical maps.
3. Mapping and Sequencing the Genomes of Model Organisms: To generate a genetic map of the mouse genome on the basis of DNA markers. To start physical mapping of one or two chromosomes. To sequence approximately 20 Mb of DNA from different model organisms, with a focus on stretches that are 1 Mb long, in the course of developing and validating new and improved DNA sequencing technology.
4. Data Collection and Distribution: To develop software and databases effective enough to support large-scale mapping and sequencing projects. To create database tools that provide easy access to up-to-date physical, genetic and chromosome mapping data, as well as to sequence data that can be easily compared with other data sets. To develop algorithms and analytical tools for interpreting genomic data.
5. Ethical, Legal, and Social Considerations: To develop programs aimed at understanding the ethical, legal, and social implications of HGP data. To identify and define the major issues related to HGP data and to develop initial policy options for addressing them.
6. Research Training: To support the research training of pre- and postdoctoral fellows starting in fiscal year 1990, increasing the number of trainees until a total of 600 per year is reached by 1995. To examine the need for other types of research training in 1991.
7. Technology Development: To support automated instrumentation and innovative, high-risk technology development. To improve existing technology to meet the requirements of the HGP.
8. Technology Transfer: To improve working relationships with industry. To encourage and assist the transfer of technologies and of medically important information to the medical community.
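The mapping and cost targets in the goals above imply some simple arithmetic, sketched below. The genome size of 3 billion base pairs is the figure used in this module. The genetic length of about 3,000 cM is an assumption, chosen because it is the length implied by the marker counts quoted in Table 2. The function and variable names are hypothetical.

```python
GENOME_BP = 3_000_000_000     # about 3 billion base pairs, as stated in this module
GENETIC_LENGTH_CM = 3_000     # assumption: total genetic length of roughly 3,000 cM


def markers_needed(total_length: float, spacing: float) -> int:
    """Number of evenly spaced markers needed to cover total_length."""
    return round(total_length / spacing)


# Genetic map: markers every 2 to 5 cM
genetic_markers_min = markers_needed(GENETIC_LENGTH_CM, 5)
genetic_markers_max = markers_needed(GENETIC_LENGTH_CM, 2)

# Physical map: one STS every 100,000 bp
sts_markers = markers_needed(GENOME_BP, 100_000)

# Sequencing one copy of the genome at the target cost of $0.50 per base pair
cost_at_target_usd = GENOME_BP * 0.50

print(f"Genetic markers needed: {genetic_markers_min} to {genetic_markers_max}")
print(f"STS markers needed:     {sts_markers}")
print(f"Cost at $0.50 per bp:   ${cost_at_target_usd:,.0f}")
```

Under these assumptions the spacing targets correspond to 600 to 1,500 genetic markers and 30,000 STSs, the same figures listed as goals in Table 2, and sequencing 3 billion base pairs at $0.50 per base pair would cost $1.5 billion.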


Although the first 5-year plan was to run until September 1995, unexpected advances in genome research led to its goals being updated in 1993. Detailed human genetic maps were generated along with better physical maps of the genomes of both humans and model organisms. There were improvements in DNA sequencing and bioinformatics. Alongside, the major ethical, legal, and social issues (ELSI) associated with the increased availability of genetic information were identified. The genome project had begun to demonstrate its deep impact on biomedical research. The availability of comprehensive genetic maps allowed scientists to find genes associated with Menkes syndrome, Huntington's disease, myotonic dystrophy, fragile X syndrome, etc.

The second 5-year plan ran from 1993 to 1998 and was published in the journal Science, coauthored by Francis Collins and David Galas. The new 5-year plan extended the research goals of the first 5-year plan and added specific new goals in order to develop technology for identifying genes and for mapping. The main goal was to get the complete human DNA sequenced. The development of programs for distributing genome materials to the scientific community was also envisioned. Although there was an ongoing debate regarding the value of sequencing the whole genome, researchers realized that smaller-scale techniques were ineffective in providing complete information regarding the genes and their biological functions.

Second five-year (1993-1998) Goals of the Human Genome Project:

1. Genetic Mapping: To generate a full 2- to 5-cM map by 1995. To develop techniques for fast genotyping. To find easy-to-use markers along with new techniques for mapping.
2. Physical Mapping: To complete an STS map of the human genome at a resolution of 100 kb.
3. DNA Sequencing: To develop efficient approaches for sequencing megabase-scale regions of DNA and to build up sequencing capacity to a rate of 50 Mb per year. To develop high-throughput sequencing technology, with a focus on systems integration of all steps from template preparation to data interpretation.
4. Gene Identification: To develop efficient techniques to identify genes and to place known genes on physical maps or sequenced DNA.
5. Technology Development: To significantly increase the support for developing innovative technology and improving the present technology used for DNA sequencing.
6. Model Organisms: To complete an STS map of the mouse genome at a resolution of 300 kb. To sequence selected segments of mouse DNA alongside the corresponding human DNA. To complete the sequencing of E. coli and S. cerevisiae by 1998 or earlier. To bring the genome sequences of Caenorhabditis elegans and Drosophila melanogaster near to completion by 1998.
7. Informatics: To continue the creation, development, and operation of databases and tools for easy access to data, including effective tools and standards to facilitate data exchange and links among databases. To consolidate, distribute, and continue the development of effective software for large-scale genome projects. To continue developing the software required for comparing and interpreting genome data.
8. Ethical, Legal, and Social Implications (ELSI): To continue the identification of issues and the development of policy options for addressing them. To develop and distribute policy options concerning genetic testing services that are likely to come into widespread use. To foster greater acceptance of genetic variation in humans. To increase and broaden public and professional education that makes people sensitive to socio-cultural and psychological matters.
9. Training: To continue the training of scientists in interdisciplinary sciences related to genome research.
10. Technology Transfer: To encourage and enhance the transfer of technology both into and out of centers of genome research.
11. Outreach: To facilitate co-operation with those ready to create centers for the dissemination of genome data. To share information and materials within 6 months of their development, through submission of the data to public databases or repositories, or to both.

Third five year (1998-2003) Goals of the Human Genome Project:

1. Human DNA Sequence: To complete the sequencing of the human genome by 2003. To finish sequencing one-third of the human DNA and to achieve a minimum of 90% coverage of the genome in a working draft based on mapped clones by the end of 2001. To make the complete sequence available free of cost.
2. Sequencing Technology: To continue increasing throughput and reducing the cost of sequencing. To support research leading to the development of novel technologies that can significantly improve sequencing.
3. Human Genome Sequence Variation: To promote the development of technologies for rapid, large-scale identification of SNPs and other DNA sequence variants. To generate a SNP map containing a minimum of 100,000 markers. To create public resources of DNA samples and cell lines.
4. Functional Genomics Technology: To generate full-length cDNA clones and sequences representing the genes of humans and model organisms. To support research on techniques for studying the functions of non-protein-coding sequences and for the comprehensive study of gene expression.
5. Genome-wide Mutagenesis and Protein Analysis: To improve methods for genome-wide mutagenesis. To develop technology for conducting comprehensive protein analyses.
6. Comparative Genomics: To complete the genome sequence of C. elegans by 1998 and of Drosophila by 2002. To develop physical and genetic maps for Mus musculus and to complete its genome sequence by 2008. To identify additional valuable model organisms and to study their genomes.
7. Ethical, Legal, and Social Issues: To examine the concerns associated with the generation of the human DNA sequence and the study of genetic variation. To examine the issues raised by the incorporation of genetic technologies and information into health care and public health activities. To conduct research on the effect of racial, ethnic, and socioeconomic factors on the use, understanding, and interpretation of genetic information, genetic services and policy development.
8. Bioinformatics and Computational Biology: To further develop both the content and the usefulness of the existing databases. To promote the development of improved methods for data generation, capture and annotation, for comprehensive functional studies, and for the representation and analysis of sequence similarity and variation. To develop software that is robust, exportable and extensively shared.
9. Training and Manpower: To encourage the training of researchers skilled in the field of genomics and to help them establish academic careers. To increase the number of scholars with knowledge of both genetics and genomics and of ELSI.


The goals of the HGP and their respective completion dates are mentioned in the table below (Table 2).

Table 2: The goals of the Human Genome Project and their date of completion.

Area | Goal | Achieved | Date
Genetic Map | 2- to 5-cM resolution map (600 to 1,500 markers) | 1-cM resolution map (3,000 markers) | September 1994
Physical Map | 30,000 STSs | 52,000 STSs | October 1998
DNA Sequence | 95% of gene-containing part of human sequence finished to 99.99% accuracy | 98% of gene-containing part of human sequence finished to 99.99% accuracy | April 2003
Capacity and Cost of Finished Sequence | Sequence 500 Mb/year at <$0.25 per finished base | Sequence >1,400 Mb/year at <$0.09 per finished base | November 2002
Human Sequence Variation | 100,000 mapped human SNPs | 3.7 million mapped human SNPs | February 2003
Gene Identification | Full-length human cDNAs | 15,000 full-length human cDNAs | March 2003
Model Organisms | Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster | Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat | April 2003
Functional Analysis | Develop genomic-scale technologies | High-throughput oligonucleotide synthesis; DNA microarrays; eukaryotic, whole-genome knockouts (yeast); scale-up of two-hybrid system for protein-protein interaction | 1994; 1996; 1999; 2002

Source: Collins, F. S., Morgan, M., and Patrinos, A. 2003. The Human Genome Project: Lessons from Large-Scale Biology. Science, Vol. 300 no. 5617 pp. 286-290.

7. Summary

• The term genome was coined by Professor Hans Winkler and is defined as the genetic material of an organism.
• Sequencing of an organism's genome is the determination of the order of the nitrogenous bases A, T, G and C in its genetic material.
• Genome projects are scientific research projects taken up by research groups to determine the complete genome sequence of an organism, to annotate its protein-coding genes and to gain information about the other important features of the genome.
• The Human Genome Project (HGP) was a collaborative effort which was started in 1990. Its goal was to sequence all three billion base pairs in the human genome.
• The Human Genome Project aimed to map all human genes and to understand their structure and function. This was to be followed by the identification of genetic variants that increase the risk of common diseases such as cancer and diabetes, with a view to developing appropriate treatments.
• The human genome was sequenced in twenty universities and research centers located in the US, the UK, France, Japan, Germany, and China.
• The HGP was started in the United States and funded mainly by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH).
• A budget of $3 billion was set for carrying out the HGP, to be spent in 3 stages over a 15-year period from 1990 to 2005, but due to accelerated progress the project was completed in 2003.
• The goals of the HGP were to generate a genetic map with a resolution of 2 to 5 cM (600 to 1,500 markers), to generate a physical map with 30,000 STSs, to sequence 95% of the gene-containing part of the genome to 99.99% accuracy, to increase sequencing capacity while reducing sequencing cost, to map single nucleotide polymorphisms, to identify full-length cDNAs, and to sequence the genomes of model organisms such as E. coli, S. cerevisiae, C. elegans and D. melanogaster for subsequent use in comparative analysis.
