<<

YALE JOURNAL OF BIOLOGY AND 80 (2007), pp.145-151. Copyright © 2007.

REVIEW

Personalized Genomic Medicine with a Patchwork, Partially Owned

Christopher E. Mason a,b* , Michael R. Seringhaus c, and Clara Sattler de Sousa e Brito b aProgram on Neurogenetics, Yale University , New Haven, Connecticut; bInformation Society Project, Yale Law School, New Haven, Connecticut; cYale Law School, New Haven, Connecticut

“His book was known as the Book of Sand, because neither the book nor the sand have any beginning or end.” — Jorge Luis Borges

The is a three billion-letter recipe for the genesis of a human being, directing development from a single-celled embryo to the trillions of adult cells. Since the sequencing of the human genome was announced in 2001, researchers have an increased ability to discern the genetic basis for diseases. This reference genome has opened the door to ge - nomic medicine, aimed at detecting and understanding all genetic variations of the human genome that contribute to the manifestation and progression of disease. The overarching vi - sion of genomic (or “personalized”) medicine is to custom-tailor each treatment for maximum effectiveness in an individual patient. Detecting the variation in a patient’s deoxyribonucleic acid (DNA†), ribonucleic acid (RNA), and structures is no longer an insurmountable hurdle. Today, the challenge for genomic medicine lies in contextualizing those myriad ge - netic variations in terms of their functional consequences for a person’s health and devel - opment throughout life and in terms of that patient’s susceptibility to disease and differential clinical responses to medication. Additionally, several recent developments have compli - cated our understanding of the nominal human genome and, thereby, altered the progres - sion of genomic medicine. In this brief review, we shall focus on these developments and examine how they are changing our understanding of our genome.

THE FIVE MODALITIES OF ence” genome has changed. Indeed, molec - GENETIC VARIATION ular characterization of karyotypes in the late 1990s already revealed substantial chro - Large-scale genomic variation mosomal abnormalities in phenotypically In the few years since the release of the normal people, a surprising degree of varia - draft human genome sequence, our under - tion that seemingly conferred no negative standing of the importance of this “refer - consequences [1]. After a decade of subse -

*To whom all correspondence should be addressed: Christopher E. Mason, Department of Genetics, Yale University Medical School, 300 Cedar Street, New Haven, CT 06511; E-mail: [email protected]. †Abbreviations: DNA, deoxyribonucleic acid; RNA, ribonucleic acid; CNVs, copy number variants; ncDNA, non-coding DNA; ENCODE, Encyclopedia of DNA Elements; SegDups, segmental duplications; TEs, transposable elements; LINEs, Long Interspersed Elements; SINEs, Short Interspersed Elements; LTRs, Long Terminal Repeats; NUMTs, nuclear DNA of mitochondrial origin; SNPS, single polymorphisms; AIMs, Ancestry-Informing Markers; GINA, Genetic Information and Nondiscrimination Act.

145 146 Mason: Personalized genomic medicine

Figure 1: Structural Divisions of the Human Genome. Most of the human genome con - sists of repetitive DNA sequences and transposable elements (LINES, SINEs, LTRs, and viruses). Very little of the genome is coding sequence (exons), but there is great room for flexibility and change with many of the gene’s long sequences ().

quent work in fine-mapping the variation lion variations with respect to the reference, present in the human genome, these variable totaling 12.3 million base pairs (MB) of se - regions of DNA (called structural variants) quence. This massive amount of divergent now are thought to cover as much as 20 per - sequence includes hundreds of thousands of cent of the length of the human genome. homozygous insertions and deletions (called Thus, depending on the variant present, some “indels”), previously thought to be very rare. tens of millions of can be missing or duplicated — even quadrupled — in any Rogue agents: autonomous, repeated, and mitochondrial elements one person [2]. We now know everyone has extra copies or missing copies of large parts The organization of the human genome of the genome, called copy number variants is more complex than initially believed (Fig - (CNVs), and that such large-scale variation ure 1) [4,5]. A quarter of the genome is ded - apparently leaves us no worse off. icated to introns, the transcribed but noncoding parts of . Another third of Rampant small-scale variation the genome consists of repeats of varying The first diploid sequence of a single kinds. Some repeated sections of the human human being has been published [3], further genome serve structural purposes (for in - expanding the amount of observed variation stance, those located at the ends and in the in a single human genome. Perhaps fittingly, middle of ), but most repeats the genome in question belongs to Craig are of unknown function and take the form Venter, a genome sequencing pioneer and of simple sequence repeats or larger seg - one of the leaders of the private human mental duplications (SegDups). Counting genome consortium that produced a draft se - toward the one-third of the genome that is quence in 2001. Venter’s genome shows the repeated, SegDups (defined as segments at initial estimate of similarity between the least 1,000bp long and appear at least twice of two randomly selected individ - throughout the genome) represent 5 percent uals was too high — each person is now of the human genome overall and 15 percent thought to be a mere 99.5 percent similar of all . Some of these seg - rather than the often-quoted 99.9 percent [3]. mental duplications hold extra copies of en - In total, Venter’s genome has some 4.1 mil - tire genes. Mason: Personalized genomic medicine 147

Though the majority of the human The human genome has also ex - genome is repetitive, not all of these repeats changed genetic information with the mito - are functionless or inactive. The most ubiq - chondrial genome [7]. To date, the human uitous repeated genetic elements in the genome has been colonized by nearly 300 genome are transposable elements (some - genes from mitochondria; these genes, times called TEs, or transposons), so called called nuclear DNA of mitochondrial origin because they can transpose, or jump, from (NUMTs), are expressed by the human one place to another in the genome. When genome for the use of the mitochondria. these elements jump, they sometimes create Further, 27 of these NUMTs do not appear new forms of genes that may prove useful to in the chimpanzee or other genomes, mean - humans; however, depending upon the pre - ing they have been incorporated into the cise site of insertion, such genomic shuffling human genome within the last 4 to 6 million instead may damage existing genes (for in - years (that is, since the divergence from our stance, the might land last common ancestor with chimpanzees). in the middle of a functional gene sequence, Most of these unique NUMTs (23 of 27) are disrupting it). Transposable elements exist in present within known or predicted human various forms, including Long Interspersed genes, indicating that the symbiosis be - Elements (LINEs), Short Interspersed Ele - tween these two genomes is very fast evolv - ments (SINEs), and the small 300bp Alu el - ing and very gene-centered. ement (considered a SINE). The is present in 75 percent of introns and ac - Most of the (nonrepetitive) genome is transcribed counts for 10 percent of the entire genome, whereas the 6,000 bp (6 kb) LINEs account It has been known since the 1960s that for 20 percent of the human genome and are a significant amount of non-coding DNA present in most human genes. Indeed, when (ncDNA) is transcribed and some of this adding the sequences from repetitive ele - leads to the production of functional ncR - ments into the size of the genes, introns ac - NAs (like ribosomal RNA). However, until count for 37 percent of the human genome, recently, it was not clear what fraction of the showing just how prevalent and successful genome was actively processed or how these transposable elements have been at in - much of this might contribute serting themselves throughout the human directly to biological function. To address genome over several million years. this and other questions, the ENCODE proj - Other transposable elements have in - ect (the ENCyclopedia of DNA Elements) vaded our genome more recently [6]. Since was launched in 2003 to identify all the our divergence from the common ancestor functional elements in the human genome. we shared with the chimpanzee, approxi - The pilot phase of the ENCODE project was mately 98,000 viruses have invaded our completed in 2007, and the published results genome and are now a part of our species’ show that an astonishing 93 percent of the genetic code, busily copying themselves and non-repetitive sequence studied was either then re-inserting back into the human transcriptionally active or otherwise func - genome. These old viruses, called endoge - tionally relevant [8,9]. This surpassed earlier nous retroviruses, total 8 percent of the estimates, which had suggested that between human genome (24 percent of all repeated 30 percent and 60 percent of the genome sequence). These viruses include Long Ter - was transcriptionally active [10,11,12]. minal Repeats (LTRs), which reverse tran - scribe themselves (from RNA  DNA, as : The mutable backbone of the genome the AIDS virus does), DNA transposons, and some viruses that lie dormant and can no The fifth and final modality of variation longer replicate themselves (totaling about for the human genome is that the epigenome 4 percent of the human genome, or 12 per - (“epi” is Latin for “above” or “upon”) has a cent of repeated sequence). large role to play in human genetic variation. 148 Mason: Personalized genomic medicine

Either with slight chemical modification to from other species, it now seems likely that the DNA (methylation) or various changes massively parallel functional redundancy is to the histones supporting the DNA (methy - the most likely explanation. Could critical lation, acetylation, phosphorylation, and genetic architectures — those absolutely in - ubiquitination), the activity of a gene can be tolerant of disruption — really be that altered by inhibiting or allowing various scarce, or are there other reasons why dis - transcription factors access to the targeted ruptions of these areas are not seen? gene sequence. Some diseases now are Tests of these two genomic hypotheses being defined not only by catalogued se - (rarity vs. redundancy as an explanation for quence mutations, but also by the “epimuta - our high tolerance for variation) will occur tions” that appear to accompany the disease. within our lifetimes, and these newfound These modifications to the epigenome can reservoirs of variation will be needed to gen - be transmitted to offspring in surprisingly erate an accurate phenotype-to-genotype simple ways, such as by a mother’s behavior map. Indeed, the next step for Craig Venter’s during child-rearing [13] or general errors genome is to probe its cellular function: during egg and sperm production [14]. Be - Levy and colleagues plan to sample tissues cause technologies and methodologies for from each part of Venter’s body and test the studying are still in the early expression of all exons (i.e. the “exome”) stages, it remains unknown how much these [16]. (Suddenly, volunteering your genome type of modifications contribute to human doesn’t sound like such a good idea.) Com - phenotypic variation. piling all this information on genotype (ge - netic sequence variants such as SNPs, CNVs, and mutations) and relating it to phe - THE LARGER PICTURE: WHAT IT notype (enzyme levels, gene expression, and MEANS FOR THE GENOME severity of a disease) is a mammoth under - Taken together, these five discoveries taking. This information will need to be con - complicate traditional notions of the gene textualized and understood before it can be and our understanding of the genome. While used to implement genomic medicine, and a clear translational distinction remains be - the current paucity of such genotype-to- tween coding and non-coding genes, we phenotype databases — as well as the large now know that a large amount of the amount of work needed to create them — genome is transcriptionally active and pro - likely will delay the application of such per - duces RNAs that do not fit into current func - sonalized care for some time. tional categories. As we look closer at the However, one thing that can be done structure of the human genome, we find it very well with currently available sequence resembles a patchwork manuscript: a variants is to trace an individual’s family his - palimpsest of genetic elements (new and tory and ancestral roots. As mentioned previ - old, foreign and endogenous) constantly ously, it is now estimated that any person is adapting to fast evolutionary selection pres - 99.5 percent identical, on average, to anyone sures, as opposed to an eternal, unchanging else at the genomic level, which translates to “Book of Life” for our species. Indeed, a difference of about one nucleotide per 500 when viewed this way, it seems almost bp. This means an average of six million sin - miraculous that this “Book of Sand” [15] gle nucleotide polymorphisms (SNPs) and functions at all. For the structurally dynamic other sequence variants should exist between human genome to remain operational, given any two randomly selected human individu - our complex neurophysiology and precise als. The hunt to characterize these SNPs and developmental plan, the system must con - their distribution in various populations tain either extensive functional redundancy spurred the creation of the haplotype map - or a scarcity of critical genetic architectures. ping project (HapMap). HapMap has thus far Given the work of the ENCODE Consor - shown that SNP variation is not random, but tium in humans and corroborating evidence instead can be traced to the likely migration Mason: Personalized genomic medicine 149

Figure 2: Number of Life-Based Patents. Genes, their regulatory sequences, and gene expression profiles account for the bulk of life-based patents, followed by protein-protein interactions. Haplotypes and SNPs, which will become critical for , have just begun to be patented. Three major biochemical pathways (inset) also have nearly 1,000 patents issued [23].

patterns of Homo sapiens (modern humans) lifetime (i.e., mutations since birth). They during the last 100,000 years. Certain SNPs then hope to tailor medications to the indi - and CNVs have been found only within cer - vidual in consideration of these genetic vari - tain populations — this is particularly notice - ations. Because many commonly used drugs able among isolated peoples who have not are not equally effective in all patients, it is interbred with other groups, like the African hoped that a more thorough understanding Yorubi. More commonly, genetic variants of the genetic underpinnings to a given dis - within a group of people are not binary (i.e., ease will enable more accurate diagnosis and simply present or absent), but rather show an treatment. For example, two commonly used increased or decreased frequency. Using only cancer drugs, Gleevec [17] and Herceptin 10 of these types of markers, called Ances - [18], have been shown to be substantially try-Informing Markers (AIMs), virtually more effective if certain genes are being ex - anyone can be genotyped into a stereotype. pressed or protein conformations are pres - ent. The promise of personalized genomic medicine has prompted the creation of the PHARMACOGENOMICS AND Personal at Harvard Uni - versity (as distinguished from the Human The five new data and genetic cate - Genome Project, which is concerned with gories detailed above have created pharma - the genetic component of our species as a cogenomics, a new discipline that draws whole), and volunteers are encouraged to upon both pharmacology and . Re - submit genetic samples for whole-genome searchers in this nascent area aim to under - sequencing, which is accompanied by a rig - stand a person’s risk for disease based upon orous examination and physical characteri - his or her ancestral genetic roots, plus the ac - zation (MRIs, blood tests, medical records). crued genetic variation within that person’s It also prompted action in the United States 150 Mason: Personalized genomic medicine

Senate by Senator Barack Obama, who in - Congress every year since 1995. It has yet troduced a bill to procure more federal fund - to pass both the House and the Senate. ing for such research [19]. Personalized genomic medicine holds great promise, but is not without its risks. Any Bio-patenting: An obstacle to partitioning of human genetic variation into personalized medicine? separate states (e.g., disease vs. normal) could Pharmacogenomics research does pres - be of great benefit to both medicine and sci - ent problems, however. Whenever any sig - ence. With such information, the population- nificant disease marker is found, it is specific effects of any ailment could be generally patented immediately, potentially modeled and appropriately treated, and those frustrating future research. Notable exam - whose DNA does not predispose them to such ples of this patent rush include the BRCA diseases can be given the happy news. Still, it gene (Myriad Genetics), Canavan Disease remains unclear how well these ongoing stud - gene (MCH), and Hepatitis C marker geno - ies will translate into actual, available treat - types (Roche). If researchers from another ments for patients. The difficulties for laboratory or organization wish to look for pharmacogenomics should not be understated, additional markers or mutations within these given the emerging complexities of the genome genes, they may be infringing these patents (massive structural variation, large-scale se - and could face legal action. Unless a specific quence variation, pervasive expression of policy of open licensing for a patent is cre - ncDNA, autonomous transposable elements, ated, downstream research on these genes and epigenetic changes). To fully understand may effectively be blocked. The number of the genotypic risks for a disease, particularly human genes patented today exceeds 20 per - one with links to multiple interacting loci, sci - cent of all known genes in the genome [20], entists will need to account and correct for all of and already many patents have been issued these sources of variation. How many of these on entire biochemical pathways, such as solutions will be found and how many obsta - NF- κB [21] (Figure 2). Some preliminary cles will appear remains to be seen — but as evidence suggests that such “patent thickets” we consider these questions, the dawn of per - may negatively impact further research and sonalized medicine is showing its first light. knowledge dissemination, which are, of REFERENCES course, critical to scientific progress [22]. 1. Bryndorf T, Kirchhoff M, Maahr J, Gerdes T, Karhu R, Kallioniemi A, et al.. Comparative genomic hybridization in clinical cytogenet - SUMMARY ics. Am J Hum Genet. 1995;57:1211-20. The technologies and discoveries dis - 2. Scherer S, Lee C, Birney E, Altshuler DM, cussed here open the door to new questions Eichler EE, Carter NP, et al. Challenges and standards in integrating surveys of structural in genomics. Along with incremental knowl - variation. Nat Genet. 2007;39:S7-S15. edge of genotype-to-phenotype matches 3. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, comes a new set of issues: scientific, legal, et al. The Diploid Genome Sequence of an and ethical. At a crime scene, for instance, Individual Human. PLoS Biology. 2007;5(10):e254. should sweeps of genomic DNA samples be 4. Venter JC, Adams MD, Myers EW, Li PW, Mural permitted, as we now demand of finger - RJ, Sutton GG, et al. The sequence of the prints? Will the ability to “read” the genetic human genome. Science. 2001;291:1304-51. probabilities in an individual’s genome pro - 5. International Human Genome Sequencing Con - sortium. Initial sequencing and analysis of the vide fodder for discrimination? Should it? human genome. Nature. 2001;409:860−921. Will such knowledge transform parenting 6. Bannert N, Kurth R. Retroelements and the into a series of checkboxes to craft the ideal human genome: new perspectives on an old child? Vague fear of such eventualities has relation. PNAS. 2004;101:14572-9. 7. Ricchetti M, Tekaia F, Dujon B. Continued been lingering for years, highlighted by the Colonization of the Human Genome by Mito - Genetic Information and Nondiscrimination chondrial DNA. PLoS Biology. Act (GINA), which has been introduced into 2007;2(9):e273. Mason: Personalized genomic medicine 151

8. Kapranov P, Willingham AT, Gingeras TR. 16. Levy, Samuel. Conversation with Christopher Genome-wide transcription and the implica - Mason. 2007 Oct 16. tions for genomic organization. Nat Rev 17. Hughes TP, Kaeda J, Branford S, Rudzki Z, Genet. 2007;8(6):413-23. Hochhaus A, et al. Frequency of major mo - 9. ENCODE Project Consortium. Identification lecular responses to imatinib or interferon and analysis of functional elements in 1% of alfa plus cytarabine in newly diagnosed the human genome by the ENCODE pilot chronic myeloid leukemia. N Engl J Med. project. Nature. 2007;447(7146):799-816. 2003;349(15):1423-32. 10. Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, et al. Global identification of human 18. Dowsett M, Bartlett J, Ellis IO, et al. Corre - transcribed sequences with genome tiling ar - lation between immunohistochemistry (Her - rays. Science. 2004;306(5705):2242-6. cepTest) and fluorescence in situ 11. Rinn JL, Euskirchen G, Bertone P, Martone hybridization (FISH) for HER-2 in 426 breast R, Luscombe NM, et al. The transcriptional carcinomas from 37 centres. J Pathol. activity of human 22. Genes 2003;199:418-23 Dev. 2003;17(4):529-40. 19. Obama, B. The Genomics and Personalized 12. Mason C, Gauhar Z, Stolc V, Halasz G, van Medicine Act. S. 3822. United States Senate: Batenburg MF, et al. A gene expression map for 109th Cong; 2006. the euchromatic genome of Drosophila 20. Jensen K, Murray F. Intellectual property. En - melanogaster. Science. 2004;306(5696):655-60. hanced: intellectual property landscape of the 13. Weaver IC, Champagne FA, Brown SE, human genome. Science. 2005;310(5746):239- Dymov S, Sharma S, Meaney MJ, Szyf M. 40. Reversal of maternal programming of stress 21. Garber K. Decision on NFkappaB patent responses in adult offspring through methyl could have broad implications for biotech. supplementation: altering epigenetic mark - Science. 2006;312(5775):827. ing later in life. J Neurosci. 22. Huang KG. Impact of intellectual property 2005;25(47):11045-54. rights on scientific knowledge diffusion, accu - 14. Hitchins MP, Wong JJ, Suthers G, Suter CM, mulation and utilization [dissertation]. Boston: Martin DI, Hawkins NJ, Ward RL. Inheri - tance of a cancer-associated MLH1 germ-line Massachusetts Institute of Technology; 2006. epimutation. N Engl J Med. 23. Merrill SA, Mazza A, editors. Reaping the 2007;356(7):697-705. Benefits of Genomic and Proteomic Re - 15. Soares LM, Valcárcel J. The expanding tran - search: Intellectual Property Rights, Innova - scriptome: the genome as the “Book of tion, and . Washington DC: Sand.” EMBO J. 2006;25:923–31. National Academies Press; 2006.