A few highlights from the HGP databanks: n The contains 3 billion chemical nucleotide bases biotech 101 (A, C, T, and G). Compare this to the laboratory mouse (2.6 billion), the fruitfly (137 million), yeast (12.1 million), and the bacterium The Project: E. coli (4.6 million).

n The average gene consists of 3,000 bases, but sizes vary greatly, looking back, looking ahead with the largest known human gene being dystrophin at 2.4 mil- ive years ago this spring, the “com- other and the environment. Furthermore, lion bases. Fpletion” of the Human Genome they didn’t have a clear understanding of n The functions are unknown for Project (HGP) was announced with much how genes keep us healthy or predispose more than 50 percent of discov- fanfare. This 13-year collaboration identi- us to disease. A representative genome ered genes. fied the sequence of the 3 billion chemi- had been sequenced, but how many dif- n The human genome sequence is cal bases of information that reside on 46 ferences would be found if peoples from almost (greater than 99 percent) chromosomes in nearly every cell in our around the world were compared? How the same in all people. body. The published DNA sequence was did the human sequence compare to n About 2 percent of the genome akin to an operations manual or book of those of other ? Sequencing encodes instructions for the syn- recipes, identifying the genetic instruc- the human genome raised more ques- thesis of . tions for how cells build, operate, main- tions than it answered. tain and reproduce, all while responding n More than 40 percent of the Flash forward to 2008 and while ques- predicted human proteins share to varying conditions from the surround- tions still abound, scientists are begin- similarity with fruitfly or worm ing environment. (See the listing, left, for ning to compile an impressive array of proteins. some of the HGP’s findings.) The finished results. In this issue of Biotech 101, we’ll sequence was hailed as ’s “moon- n Genes appear to be concen- examine the impact of the HGP, high- shot” – accomplished by the collabora- trated in random areas along the lighting three research areas. Just like genome, with vast expanses of tive efforts of over 2,000 scientists spread the HGP, all the information generated DNA between. across six countries. Coming in under from these resulting fields are freely budget and two years ahead of schedule, n Chromosome 1 (the largest hu- accessible by scientists and the public it was truly an amazing effort. man chromosome) has the most around the world – see the “Want more While the completion of the HGP may genes (2,968), and the Y chromo- information” section to link directly to have felt like the end of an era, in reality some has the fewest (231). the research. it was only the beginning. Scientists had very little knowledge of how cells utilized Want more information: the information found in each genetic Genetic Variation recipe to function and interact with each Any two are identical at the The Human : 99.9 percent level. While there are many www.genome.gov/10001772 more similarities than differences, slight variations in our DNA can have a major The International HapMap: impact on whether or not we develop a www.hapmap.org particular disease, how we respond to ENCODE: an infection and which drugs are most www.genome.gov/10005107 effective. In order to understand the ef- fects of this genetic variation, we first A quick guide to sequenced need to identify and catalog the similari- from the Genome ties and differences found across human News Network: populations. The International HapMap www.genomenewsnetwork.org/ resources/sequenced_genomes/ Project was created in 2003 to compare genome_guide_p1.shtml the genetic sequences of different in- dividuals. Internationally funded, the project is a collaboration between sci- 4 entists from Japan, Canada, the United hudsonALPHA institute for biotechnology to roosters. Since the mid 1990s, over 180 organisms have had their genomes sequenced. By compar- ing the human reference sequence with genomes of other organisms, researchers can identify areas of similarity and difference and better understand a gene’s structure and function. For example, researchers have found that two-thirds of hu- man genes known to be involved in cancer have counterparts in the fruit fly. Research using organisms such as flies, worms and mice of- fers a cost-effective way to examine the functions of these similar genes. Increased understanding of genes’ Kingdom, China, Nigeria and the United pilot set of data. Suprisingly, the organi- connections to diseases supports States. The HapMap describes where zation and function of the genome ap- opportunities for potential treat- in DNA the variants are found and how pears to be far more complicated than ments and pharmaceutical inter- they are distributed within and across most had suspected. The ENCODE ventions. world populations. The project does not data indicate that the majority of DNA Comparative also pro- connect the variation to a specific illness, in the human genome has some sort of vides a powerful tool for studying but rather provides the raw information functional role. This challenges the long- evolutionary changes among or- that researchers can use to link genetic standing view that the human genome ganisms, helping to identify genes variation to disease risk. Although only a consists of a relatively small set of func- that are conserved among few years old, the information gleaned tional elements (the genes) along with a (which may represent critical bio- by the HapMap is beginning to yield re- vast amount of so-called “junk” DNA that logical functions across the tree of sults. Working with the HapMap’s cata- is not biologically active. The new data life), as well as genes that give each log of variation, researchers have iden- indicate the genome contains very few its unique characteristics. tified genetic variation that contributes unused sequences and, in fact, is a com- to risk for a number of diseases, includ- plex, interwoven network. ing type 2 diabetes, Parkinson’s disease, One member of the multi-disciplin- The Future heart disorders, obesity, Crohn’s disease, ary ENCODE team is a familiar face at The completion of the Human prostate cancer and age-related macular HudsonAlpha, Dr. Richard Myers, the Genome Project has allowed re- degeneration. institute’s director. Dr. Myers’ lab is iden- searchers to initiate the shift from tifying sequences in the genome where sequence determination to func- proteins or enzymes bind and activate tional understanding. In many ways, Controls and Signals DNA sequences. He is also testing a new the real work has just begun on this ENCODE, the ENCyclopedia Of DNA approach to determine the methylation journey into the genome era. Given Elements, was launched in 2003 to de- status of specific DNA regions. Methyla- the breathtaking pace of discov- velop efficient ways of identifying and tion refers to a specific chemical modifica- ery from projects like HapMap and precisely locating all of the -cod- tion of DNA which can silence or reduce ENCODE and the wide array of ge- ing genes, non-protein-coding genes the activity of the affected DNA region. nomes from other organisms now and other functional components in the (Refer to the fall 2007 Biotech 101 article available for study, it will be excit- human genome. The goal of ENCODE on epigenetics for more details about ing to watch as the impact expands is to develop the parts list of biologically across biology, health and society. n functional elements and subsequently methylation and its role in gene activity.) determine the signals that activate or si- – Dr. Neil Lamb lence each of the parts, identifying how Comparative Genomics director of educational outreach HudsonAlpha Institute for Biotechnology such signals interact with each other and Comparative genomics examines and the environment. compares the genome sequences of dif- Last summer, the ENCODE team pub- ferent species - human, rat and a wide va- lished preliminary findings based on a riety of other creatures from roundworms 5 hudsonALPHA institute for biotechnology