
MOLECULAR MEDICINE
Genomics to Personalized Healthcare

FOURTH EDITION

Ronald J Trent PhD, BSc(Med), MBBS (Sydney), DPhil (Oxon), FRACP, FRCPA, FFSc, FTSE Professor of Medical Molecular Genetics, Sydney , University of Sydney and Director, Department of Molecular & Clinical Genetics, Royal Prince Alfred Hospital, NSW 2050, Australia

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier
32 Jamestown Road, London NW1 7BY, UK
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA

First edition 1993 Second edition 1997 Third edition 2005 Fourth edition 2012

Copyright © 2012 Elsevier Inc. All rights reserved

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (44) (0) 1865 843830; fax (44) (0) 1865 853333; email: [email protected]. Alternatively, visit the Science and Technology Books website at www.elsevierdirect.com/rights for further information

Notice No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress

ISBN: 0-443-04635-2 (First ed) ISBN: 0-443-05366-9 (Second ed) ISBN: 978-0-12-699057-7 (Third ed) ISBN: 978-0-12-381451-7

For information on all Academic Press publications visit our website at www.elsevierdirect.com

Typeset by MPS Limited, Chennai, India www.adi-mps.com

Printed and bound in China

12 13 14 15 10 9 8 7 6 5 4 3 2 1

Acknowledgments and Dedications

I would like to thank members of the Molecular Genetics Laboratory at RPA Hospital. Their skills and dedication made molecular medicine a lot more interesting. Prof. John Buchanan in Auckland understood early on that Molecular Medicine was important for patient care and steered me towards the educational aspects. My mother Ninette and my sister Lynette have always been there when needed.

Mary Preap and Julia Haynes from Elsevier have been very supportive. I dedicate the 4th Edition to my family – Pit, Charlotte and Timothy. They have constantly provided support and understanding when I needed to do “home work” for this book. Also my Executive Assistant Carol Yeung, who has drawn the illustrations for all four editions and still remains enthusiastic.

Preface

There have been six major developments since the third edition of Molecular Medicine:

1. Growth of omics, particularly genomics;
2. The start of whole genome sequencing for patient care;
3. Broader acceptance of pharmacogenetics in selecting the right drug or its dose based on molecular typing of patient DNA;
4. A shift to somatic cell genetics, particularly solid tumors;
5. Expansion in the Direct-to-Consumer DNA testing market, and
6. Recognition of a roadblock to the effective translation of molecular medicine research including the need for better […] to understand the significance of DNA variants and the many changes in DNA, RNA or even chromosomes now detectable through omics strategies.

The title of this edition has subtly changed to include reference to personalized medicine, which, as explained in Chapter 1, is not new, with some taking it as another example of inappropriate hype. Nevertheless, it attracts attention and so is useful if it helps to push the translational components of molecular medicine and ensures the next generation of health professionals are suitably engaged. The first edition was subtitled: An introductory text for students. This was left out in subsequent editions on the assumption that the clinical applications of DNA-based medicine were being taught in the universities. However, new developments in omics are occurring rapidly, and there is some concern that their educational aspects are not being addressed in many of the modern curricula. Governments and major research funders are attempting to fast track the translational aspects of molecular medicine but this will not be enough without linking their initiatives to the education of tomorrow’s health practitioners.

This edition no longer has a Glossary or Methodology because this material can be found on the Internet. Nevertheless, Methodology remains important, since patients and families are interested and will go to the Internet, so the health professional may be asked technical questions. In the era of open yet personalized medicine, there is no reason why the health professional and the patient or family cannot sit down and work through the technical issues using the computer as a component of the consultation.

Ronald J Trent
Sydney, December 2011

CHAPTER 1

Genes to Personalized Medicine

OUTLINE

Introduction
Genome Anatomy
DNA
Protein-Coding Genes
Junk DNA
RNA
ncRNA
Chromosomes
Human Genome Project
Goals
The 10 Year Project
10 Years On
Genome Variation
1 000 Genome Project
Encyclopedia of DNA Elements (ENCODE) Project
Personalized Medicine
Education and Resources
Roadmap
References

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00001-3 © 2012 Elsevier Inc. All rights reserved.

INTRODUCTION

There are many definitions of molecular medicine. In this book the term predominantly describes the effect that knowledge of DNA (and increasingly RNA) is having on medical practice. Some other terms which overlap with molecular medicine include:

• Biotechnology – the application of DNA or RNA knowledge in research or industry.
• Genetic engineering or recombinant DNA (rDNA) technology – the manipulation of an organism’s DNA using DNA or RNA-based techniques.
• Molecular genetics – the discipline within genetics that deals with the structure and function of DNA and RNA.

The common thread in these names is the way in which an understanding of DNA and the ability to manipulate it in vitro or in vivo – and increasingly now to interrogate it in silico – has greatly expanded the options that are available in clinical practice, […], research and industry.

Single Mendelian disorders are relatively uncommon and are traditionally considered under genetics. Examples include cystic fibrosis, hemophilia, Huntington disease and genetic forms of […]. Complex genetic disorders are common and comprise important public health challenges both in the developed and the developing world. Included here are diabetes, heart disease and dementia. The emerging health issues related to aging and obesity also have a complex genetic component underlying their pathogenesis.

The understanding of complex genetic disorders requires a new level of sophistication now possible through omics, which describes an approach that characterizes all or many molecules within a cell, tissue or organism. The catalyst for omics has been the Human Genome Project, which has rewritten the way research is conducted, and has enabled impressive technological developments. While genomics (all or many genes) will be the predominant theme of this book, it is important to acknowledge that other omics, particularly transcriptomics (all or many RNA transcripts), metabolomics (all or many metabolites), proteomics (all or many proteins), epigenomics (the complete epigenetic profile) and phenomics (the composite of the phenotypes), contribute to molecular medicine. Thus genomic medicine overlaps with molecular medicine but has a narrower brief.

To store and analyze the large data sets generated by omics requires sophisticated computer power and software. This is bioinformatics (also called informatics, or computational biology). Related to bioinformatics is the concept of systems biology, which attempts to join the dots between the seemingly unrelated data that are emerging (Chapter 4).

The emergence of molecular medicine may broadly be considered over three time periods: (1) The discovery of DNA structure in 1953 followed by developments in recombinant DNA (rDNA) technologies; (2) The Human Genome Project 1990–2000, and (3) The launch of omics (Figure 1.1). Another way to track the milestones in molecular medicine is to consider the Nobel Prizes awarded for work in this area (Table 1.1). Key developments in molecular medicine are summarized in Table 1.2.

FIGURE 1.1 Three major developments in the evolution of molecular medicine. Various time periods are depicted with discoveries above and their implications below.

Period | Year | Discovery | Implication
DNA Discovery | 1953 | Double-stranded DNA | Beginning of molecular medicine
DNA Discovery | 1975 | DNA can be sequenced | "Book of Life" (human genome) can be read base by base
DNA Discovery | 1985 | DNA can be amplified with PCR | DNA diagnostics - unlimited potential
DNA Discovery | 1987 | Automated DNA sequencing becomes available | Critical development for Human Genome Project
Human Genome Project | 1990 | Human Genome Project starts, and first successful gene […] | Modern molecular medicine era
Human Genome Project | 1991 | Controversy - NIH patents anonymous DNA sequences | Commercialization increasingly prominent
Human Genome Project | 1995 | DNA sequence for first model organism (H. influenzae) published | Success with model organisms fuels enthusiasm for completing human genome
Human Genome Project | 1996 | NIH policy that human genome sequences are freely available | Two models: public (free) & commercial (user pay)
Omics | 2000 | First draft of human genome sequence publicly announced | Complete sequences for fruit fly and a plant are published
Omics | 2003 | Annotated final version of human genome sequence now available | Beginning of genomics era
Omics | 2007 | First diploid human genome published | Beginning of next generation DNA sequencing
Omics | 2010 | Alternative fuels & artificial bacterium | Synthetic biology on the march

Today, medical research and clinical practice underpinned by molecular medicine continue to provide novel insights into our understanding of disease pathogenesis. From these concepts, new […] to prevent or to treat important and common human disorders are starting to emerge. The consequences of the Human Genome Project (described later in this chapter) are many. One of the significant but less publicized outcomes has been the increasing trend to form large multi-centered international research collaborations that can ask very ambitious research questions.

GENOME ANATOMY

Most of what was considered the core component of the human genome actually occupies a relatively small portion of it. Only about 1–2% contains protein-coding genes. The function of the remaining 98% is now starting to be explored. This includes:

1. Intronic sequences;
2. Copy number variations;
3. Non-coding (nc) RNA genes;
4. Regulatory elements, and
5. Repetitive DNA.

For convenience the term gene will generally describe segments of DNA that code for proteins (these are also called structural genes), although this does not distinguish other genes, particularly the ncRNA genes described later.

DNA

Many discoveries led to the uncovering of the double-stranded structure of DNA, proposed by J Watson and F Crick in 1953, and more followed to build the foundations for molecular medicine (Table 1.2). DNA comprises two polynucleotide strands twisted around each other in the form of a double helix (Figure 1.2). In biological terms, the double-stranded DNA structure is essential for replication to ensure that each dividing cell receives an identical copy of the DNA.

The genetic code in DNA is represented by nucleotide triplets called codons (Table 1.3). Each individual amino acid is represented by a different triplet combination. Thus, the codons for a polypeptide such as glycine-serine-valine-alanine-alanine-tryptophan will read: GGT TCT GTT GCT GCT TGG. The positions indicating where a polypeptide starts and where it ends are also defined by triplet codons. For example, ATG is found at the start, and the end or stop codons are TAA or TAG or TGA. Single base changes in the DNA sequence occur regularly. These are called DNA variants, or point mutations when they lead to genetic disease. Deletions or insertions affecting the codons can produce a smaller truncated protein or a frameshift abnormality (Chapter 3).

The genetic code needs to be read from the sense strand. Hence, transcription to give the appropriate mRNA sequence is taken from the antisense strand so that the single-stranded mRNA will have the sense sequence (antisense RNA is discussed later in this chapter). More information on DNA structure, including its various A, B and Z forms, can be found in reference [1]. The unit for measurement of DNA is the base pair (bp). Thus 10³ bp = 1 Kb (kilobase); 10⁶ bp = 1 Mb (megabase); and 10⁹ bp = 1 Gb (gigabase).
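The relationship between the sense strand, the antisense template and the mRNA can be illustrated with a minimal Python sketch. It uses only the codons quoted in the text (Table 1.3 gives the full code); the function names `antisense`, `transcribe` and `translate` are hypothetical and chosen for illustration only.

```python
# Only the codons quoted in the text; Table 1.3 gives the full genetic code.
CODONS = {"GGT": "Gly", "TCT": "Ser", "GTT": "Val", "GCT": "Ala", "TGG": "Trp",
          "ATG": "Met", "TAA": "STOP", "TAG": "STOP", "TGA": "STOP"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def antisense(sense):
    """Antisense (template) strand, written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(sense))

def transcribe(template):
    """mRNA copied from the antisense template, written 5'->3'.
    It carries the sense sequence, with U in place of T."""
    return "".join(RNA_PAIR[base] for base in reversed(template))

def translate(sense):
    """Read the sense strand codon by codon until a stop codon."""
    peptide = []
    for i in range(0, len(sense) - 2, 3):
        residue = CODONS[sense[i:i + 3]]
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide

sense = "GGTTCTGTTGCTGCTTGG"          # GGT TCT GTT GCT GCT TGG
mrna = transcribe(antisense(sense))   # same letters as the sense strand, T -> U
peptide = translate(sense)            # Gly-Ser-Val-Ala-Ala-Trp
```

The sketch reads the sense strand directly for translation, which mirrors the statement that the code is read from the sense strand while the antisense strand serves as the transcription template.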


TABLE 1.1 Molecular medicine and Nobel Prize winners (1953–2011).ᵃ

Year | Recipients | Subject
1957ᵃ | A R Todd | Work on nucleotides and nucleotide co-enzymes
1958 | G W Beadle, E L Tatum and J Lederberg | Regulation and genes, and genetic recombination in bacteria
1959 | S Ochoa, A Kornberg | In vitro synthesis of nucleic acids
1962 | J D Watson, F H Crick, M H Wilkins | Structure of DNA
1965 | F Jacob, A Lwoff, J Monod | Genetic control of enzyme and virus synthesis
1968 | R W Holley, H B Khorana, M W Nirenberg | Interpretation of the genetic code
1975 | D Baltimore, H M Temin, R Dulbecco | Reverse transcriptase and oncogenic viruses
1978 | W Arber, D Nathans, H O Smith | Restriction endonucleases
1980ᵃ | P Berg and W Gilbert, F Sanger | Creation of first recombinant DNA molecule and DNA sequencing
1989 | J M Bishop, H E Varmus | Oncogenes
1989ᵃ | S Altman, T R Cech | RNA ribozymes
1993 | R J Roberts, P A Sharp | Gene splicing
1993ᵃ | K Mullis and M Smith | Polymerase chain reaction (PCR) and site directed mutagenesis
1995 | E B Lewis, C Nusslein-Volhard, E F Wieschaus | Genetic mechanisms in early embryonic development
2001 | L H Hartwell, T Hunt, P M Nurse | Key regulators of the cell cycle
2002 | S Brenner, J E Sulston, H R Horvitz | Genetic regulation of organ development and programmed cell death
2004 | R Axel, L B Buck | Discoveries of odorant receptors and the organization of the olfactory system
2006 | A Z Fire, C C Mello | Discovery of RNA interference
2006ᵃ | R D Kornberg | Studies on the molecular basis of eukaryotic transcription
2007 | M R Capecchi, M J Evans, O Smithies | Targeted gene insertion into ES cells to produce transgenic mice
2009 | E H Blackburn, C W Greider, J W Szostak | Telomeres and telomerases in chromosome protection

ᵃNobel Prize in Chemistry; all others Nobel Prize in Physiology or Medicine. The list starts at 1953 when the structure of DNA was described.

TABLE 1.2 The evolution of molecular medicine.

Discoveries and achievements

1869: A Swiss named F Miescher isolated an acidic material from cell nuclei which he called nuclein. From this came nucleic acid.

1940s–1950s: O Avery and colleagues showed that genetic information in the Pneumococcus was found within its DNA. E Chargaff demonstrated equal numbers of the nucleotide bases adenine and thymine as well as guanine and cytosine in DNA. This and the X-ray crystallographic work by R Franklin and M Wilkins enabled J Watson and F Crick to propose the double-stranded structure of DNA in 1953. Complementary strands that made up the DNA helix were then shown to separate during replication. DNA polymerase was discovered by A Kornberg in 1956. It enabled small segments of double-stranded DNA to be synthesized. Fifty years later his son R Kornberg was awarded the Nobel Prize in Chemistry for his work on the molecular basis of eukaryotic transcription.

1960s–1970s: Discoveries included: (1) Showing mRNA to be the link between the nucleus and the site of protein synthesis in the cytoplasm. (2) Identification of autonomously replicating, extra-chromosomal DNA elements called plasmids. These were shown to carry genes including those coding for antibiotic resistance in bacteria. (3) The genetic code for each amino acid was shown to be a nucleotide triplet (Table 1.3). In 1961 M Lyon proposed that one of the two X chromosomes in female mammals was normally inactivated. The process of X-inactivation enabled males and females to have equivalent DNA content despite differing numbers of X chromosomes. Restriction endonuclease enzymes were isolated from bacteria by H Smith, D Nathans, W Arber and colleagues. They digested DNA at specific sites determined by the underlying nucleotide base sequences, allowing DNA fragments of known sizes to be produced. In 1966 V McKusick published Mendelian Inheritance in Man, a catalog of genetic disorders in humans. This became a forerunner to the many databases or banks that would subsequently be created to store DNA, information or tissues.

1970s–1980s: The dogma that DNA → RNA → protein moved in only one direction was revised when H Temin and D Baltimore showed that reverse transcriptase, an enzyme found in the RNA retroviruses, allowed RNA to be copied back into DNA, i.e. RNA → DNA. This enzyme would later provide the researcher with a means to produce DNA copies (known as complementary or cDNA) from RNA templates. Reverse transcriptase also explained how some viruses could integrate genetic information into the host’s genome. DNA ligase was discovered and allowed DNA fragments to be joined. The first recombinant DNA molecules comprising segments that had been stitched together were produced by P Berg and colleagues. S Cohen and colleagues showed that DNA could be inserted into plasmids and then reintroduced back into bacteria. Replication of the bacteria containing the foreign DNA enabled unlimited amounts of a single fragment to be produced, i.e. DNA could be cloned. DNA sequencing methodologies were described by F Sanger and W Gilbert. Protein-coding genes were shown to be discontinuous with coding regions (exons) split by non-coding regions (introns). From this, splicing was described by R Roberts and P Sharp to explain how introns were removed in the process of transcription. The importance of genes from evolutionary conservation was demonstrated by E Lewis, C Nusslein-Volhard and E Wieschaus with their work on development in Drosophila. Variations in the length of DNA segments between individuals (called DNA polymorphisms) were described. Subsequently D Botstein showed that DNA polymorphisms allowed maps of the human genome to be developed. Using mitochondrial DNA polymorphisms and DNA sequence information, R Cann and colleagues proposed that Homo sapiens evolved from a common female ancestor in Africa. First use of DNA polymorphisms for forensic purposes reported by A Jeffreys.

1990s: Human Genome Project (HGP) starts with the publicly-funded initiative led by F Collins. The private sector later becomes involved through J C Venter. New technologies for gene mapping and DNA cloning developed. YACs (yeast artificial chromosomes) were the early choice but a better vector, BACs (bacterial artificial chromosomes), was developed. Bioinformatics starts to play a critical role in both the storage of data and its analysis because of the increasingly complex mapping and sequencing data that emerge. First model organism is sequenced. RNAi discovered by A Fire and C Mello.

2000s: The HGP is completed and the genomics era starts. This soon moves to the omics era as new analytic platforms emerge. An increasing number of genomes are sequenced and by 2005 the NG (next generation) DNA sequencing platforms and protocols emerge with the target being a whole human genome sequence costing $1 000. Metagenomics (sequencing uncultured organisms from various environmental samples) using the omics/shotgun approach starts to gain traction.

2007–2011: Analytic platforms continue to evolve rapidly with the suggestion that 3rd generation sequencing now emerging will reduce the cost of a whole genome sequence to around $100! Synthetic biology hits the headlines with the first synthetic microbe Mycoplasma mycoides JCVI-syn1.0 published. Major national and international consortia are formed to study cancer at the somatic cell DNA level. There is growing interest in the link between drug selection in cancer through DNA testing (companion diagnostics) as well as the importance of pharmacogenetics in ensuring the right drug for the right person at the right dose. The concept of Personalized Medicine takes hold.

DNA Replication

DNA replication involves the separation of the double-stranded DNA and then the duplication of each strand. The final product is two DNA copies, each of which has one original parental strand and a second complementary new strand. The first step in replication is to unwind the double-stranded DNA using DNA helicase. DNA polymerase then synthesizes the new strand in a 5′ to 3′ direction. However, since the two DNA strands are anti-parallel and replication proceeds in the 5′ to 3′ direction, one of the strands (the one orientated in the 3′ to 5′ direction) will be copied continuously in the direction of the replication fork (called the leading strand). The other strand will be pointing in the wrong direction at the replication fork (3′ to 5′) and so it is copied away from the replication fork (Figure 1.3). The latter strand (lagging strand) is not copied continuously but in fragments (Okazaki fragments). Gaps between these fragments are eventually filled in with DNA polymerase and stitched together with DNA ligase.

An RNA primer is needed before DNA polymerase can work. The primer dissociates as the DNA polymerase moves along the template strand, leaving a gap. In terms of chromosomal replication, this means that the end of the newly synthesized strand will be shorter, leading to a reduction in the repetitive elements in the telomere (the chromosomal ends). This will need to be repaired with telomerase or the telomere will shorten. Computer animation of DNA replication can be found in [2].
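The statement that each DNA copy holds one parental strand and one newly made complementary strand can be checked with a small Python sketch. This is a simplification under stated assumptions: it ignores helicase, primers, Okazaki fragments and telomere shortening, and the names `synthesize_complement` and `replicate` are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_complement(template):
    """New strand, written 5'->3', that base-pairs with the template.
    The template is read 3'->5' (reversed) so the new strand grows 5'->3'."""
    return "".join(PAIR[base] for base in reversed(template))

def replicate(duplex):
    """Semi-conservative replication of a duplex given as two strands,
    each written 5'->3'. Each daughter keeps one parental strand."""
    parental_a, parental_b = duplex
    daughter_1 = (parental_a, synthesize_complement(parental_a))
    daughter_2 = (synthesize_complement(parental_b), parental_b)
    return daughter_1, daughter_2

strand = "ATGGCTTGA"                       # illustrative sequence only
parent = (strand, synthesize_complement(strand))
daughter_1, daughter_2 = replicate(parent)
```

Both daughters come out identical to the parent duplex, which is the point of copying each separated strand against its complement.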

FIGURE 1.2 The structure of DNA. Top: A schematic drawing of the DNA double-helix. There are two complementary strands which run in opposite directions: sense strand 5′→3′ and antisense strand 3′→5′. Yellow box: An expanded view of a single strand showing the three basic components: (1) Four nucleotide bases (C – cytosine, A – adenine, not shown are T – thymine and G – guanine). The base adenine always pairs with thymine and cytosine with guanine (called Watson and Crick base pairing). Bases are of two types: purines (A, G) and pyrimidines (T, C). (2) Deoxyribose sugar with the position of its 5 carbons numbered 1′ to 5′. (3) The phosphodiester (P) linkage between the deoxyribose sugars. Blue box: An expanded view of the two strands held together by hydrogen bonds between the bases (two hydrogen bonds between A/T and three between G/C). The higher the GC content, the more stable the DNA is. Thus, the GC content is important when designing primers for a technique like PCR (Chapter 3). This was earlier thought to reflect the additional hydrogen bond between GC versus AT pairing but is now considered to be a stacking effect. The direction for transcription is 5′→3′.

Recombinant DNA (rDNA)

DNA has a number of properties that can be exploited in the laboratory. In terms of heritable genetic diseases, the DNA in all cells of an organism is identical in its sequence. Therefore, obtaining a tissue specimen for DNA studies is relatively simple, since a small amount of blood will suffice. Isolation of DNA is straightforward. Nuclei are first separated from cellular debris by enzymes and detergents. DNA is then separated from protein by chemical or physical means. Apart from blood, convenient sources of DNA used in routine genetic diagnosis include exfoliated cells from mouth washes or the […] swabs, or hair follicles.

The development of DNA probes followed from a 1960 observation that the two strands of the double helix could be separated and then re-annealed. DNA probes were able to identify specific regions in DNA through their


TABLE 1.3 The genetic code.

Second nucleotide is shown in the four middle columns (T, C, A, G).

First nucleotide [5′]ᵃ | T | C | A | G | Third nucleotide [3′]
T | Pheᵇ | Ser | Tyr | Cys | T
T | Phe | Ser | Tyr | Cys | C
T | Leu | Ser | STOP | STOP | A
T | Leu | Ser | STOP | Trp | G
C | Leu | Pro | His | Arg | T
C | Leu | Pro | His | Arg | C
C | Leu | Pro | Gln | Arg | A
C | Leu | Pro | Gln | Arg | G
A | Ile | Thr | Asn | Ser | T
A | Ile | Thr | Asn | Ser | C
A | Ile | Thr | Lys | Arg | A
A | Met | Thr | Lys | Arg | G
G | Val | Ala | Asp | Gly | T
G | Val | Ala | Asp | Gly | C
G | Val | Ala | Glu | Gly | A
G | Val | Ala | Glu | Gly | G

ᵃNucleotides code in sets of three (triplets) for individual amino acids. The triplets or codons are shown as they appear in DNA (T = thymine, C = cytosine, A = adenine and G = guanine). In mRNA, T is replaced by U (uracil). The code is degenerate, i.e. there can be more than one codon per amino acid. The genetic code is read from left to right, for example TTT = phe (phenylalanine); TCT = ser (serine); TAT = tyr (tyrosine).
ᵇAmino acid abbreviations are Cys = cysteine; Trp = tryptophan; Leu = leucine; Pro = proline; His = histidine; Gln = glutamine; Arg = arginine; Ile = isoleucine; Met = methionine; Thr = threonine; Asn = asparagine; Lys = lysine; Val = valine; Ala = alanine; Asp = aspartic acid; Glu = glutamic acid; Gly = glycine.
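The degeneracy noted in the footnote can be counted directly. The following Python sketch rebuilds Table 1.3 as a dictionary; the one-letter amino acid string (with * for STOP) is an assumed compact rendering of the table in T, C, A, G order, not notation used in this book.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Assumed one-letter rendering of Table 1.3, first/second/third nucleotide
# each cycling through T, C, A, G; "*" marks a STOP codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the third nucleotide fastest, matching the string order.
GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (or per STOP signal).
degeneracy = Counter(GENETIC_CODE.values())
```

In this tally leucine, serine and arginine each have six codons, while methionine and tryptophan have only one, which is what "more than one codon per amino acid" means in practice.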

FIGURE 1.3 DNA replication. For replication the two strands (green and red) are opened and they form a fork. Since DNA polymerase only works 5′ to 3′ it makes a leading strand with the green single-stranded DNA template and adds continuously in the direction of the fork. On the other hand, the second (red) strand is oriented in the opposite direction and so to move 5′ to 3′ the DNA polymerase must work in the opposite direction to give the lagging strand. In the lagging strand can be seen three Okazaki fragments with the most recently synthesized denoted by 5′ to 3′. Arrows point in the direction that the DNA polymerase is moving.


TABLE 1.4 Three types of DNA probes.

DNA probe | Description | Applications
Oligonucleotide | 10–20 bp synthetic single-stranded DNA segment. | Can be used to: (1) Bind to each end of a fragment that is to be amplified by PCR. (2) In DNA hybridization to detect fragments.
cDNA | Larger fragment up to Kb in size made from mRNA via reverse transcriptase PCR. These probes hybridize to DNA sequences from exons. | Used in gene mapping to detect a target sequence and so determine the restriction enzyme digestion pattern around that sequence. Also used in hybridization assays to detect […] or polymorphisms.
Genomic | This is also a large fragment (up to Kb in size) but can comprise exons, introns and non-coding sequences. | As for cDNA probes but because it also incorporates non-coding DNA sequences including repeat sequences it may not be as specific as the cDNA probes.
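All three probe types in Table 1.4 rely on complementary base pairing. As a minimal Python sketch, the text's example target 5′–GGTTACTACGT–3′ can be turned into its probe, and the GC content relevant to primer design can be computed; the function names `probe_for` and `gc_content` are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def probe_for(target_5to3):
    """Probe that base-pairs with the target, written 3'->5' so that
    each probe base sits directly under its partner in the target."""
    return "".join(PAIR[base] for base in target_5to3)

def gc_content(sequence):
    """Fraction of bases that are G or C."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

target = "GGTTACTACGT"             # 5' - GGTTACTACGT - 3'
probe_3to5 = probe_for(target)     # 3' - CCAATGATGCA - 5'
probe_5to3 = probe_3to5[::-1]      # the same probe in the usual 5'->3' notation
gc_fraction = gc_content(target)   # 5 of the 11 bases are G or C
```

Writing the probe 3′ to 5′ matches the way the pairing is usually drawn, while `probe_5to3` is the form in which an oligonucleotide would normally be specified.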

annealing (the technical term for this is hybridization) to complementary nucleotide sequences. Therefore, DNA probes comprise single-stranded fragments of DNA which bind to the complementary DNA sequences in another single-stranded DNA fragment. For example, if the single-stranded target has the sequence 5′ – GGTTACTACGT – 3′, the single-stranded DNA probe will be 3′ – CCAATGATGCA – 5′. The specificity of a probe resides in its nucleotide sequence. Since double-stranded DNA is held together by hydrogen bonds, it is relatively easy to make both DNA probe and target DNA single-stranded, e.g. heating breaks hydrogen bonds. On cooling, the complementary DNA strands will re-anneal into base-paired double strands. Re-annealing will occur between the following combinations: DNA probe + DNA probe; target DNA + target DNA; and DNA probe + target DNA. The first DNA probes were labeled with a radioactive marker such as ³²P, but now fluorescein is used, allowing detection by lasers. DNA probes are of three types: cDNA, genomic and oligonucleotide (Table 1.4).

The specificity of the hybridization reaction relies on the predictability of base pairing, i.e. the nucleotide base adenine (A) will always anneal to the base thymine (T) while guanine (G) will anneal to cytosine (C). Thus, because of nucleotide base pairing, a single-stranded DNA probe would hybridize in solution to a predetermined segment of single-stranded DNA. In 1975, solution hybridization gave way to hybridization on solid support membranes, when DNA digested with restriction endonucleases could be transferred to these membranes by Southern blotting, a method named after its discoverer E Southern. The ability of radiolabeled DNA probes to identify specific fragments, which are generated by digesting DNA with restriction endonucleases, enabled DNA maps to be constructed. This was the forerunner of DNA […] analysis, which is discussed further in Chapter 3.

The double-stranded structure of DNA is also used by the molecular biologist to make primers for DNA amplification by PCR (polymerase chain reaction). PCR forms the core technology for most DNA applications in molecular medicine. Although it is not necessary for health professionals to be fully conversant with rDNA technologies, the one important exception is PCR (Chapter 3).

The terminology probe and primer is confusing. A DNA probe refers to a fragment of DNA that is used in a hybridization reaction to detect its corresponding, i.e. complementary, fragment. In this way, a gene or DNA segment can be identified if the probe is labeled with a fluorescein dye. A DNA primer is also a segment of DNA that hybridizes to its complementary sequence, but in the context that it is used for DNA amplification by PCR.

Protein-Coding Genes

The anatomy of the protein-coding gene became better defined through many new discoveries made during the 1970s and 1980s (Table 1.2). Eukaryotic genes are usually discontinuous – i.e. they have coding regions called exons broken up by non-coding regions called introns or intervening sequences (IVS) (Figure 1.4). During the process of transcription, the entire genomic sequence is copied, and then the introns are removed by a process known as splicing to produce mRNA.

FIGURE 1.4 Anatomy of a protein-coding gene. A gene is a segment of DNA which contains genetic information. It comprises a number of components. The beginning (left hand or 5′ end) has regulatory sequences (*) and the tail (right hand or 3′ end) has a poly A tail (+) that helps to stabilize mRNA. The latter end has a sequence (AAUAAA in the RNA transcript) necessary for cleavage site determination and adding of the poly A tail. The gene itself is discontinuous with coding regions called exons (red) separated by non-coding regions called introns (blue). Introns are also known as intervening sequences (IVS). The border between introns and exons is demarcated by splicing signals. At one intron/exon boundary the splicing signal is a dinucleotide GT (called the donor junction). The intron/exon boundary on the other side of the intron is an AG dinucleotide (acceptor junction). In addition to the GT and AG that are constant at intron/exon boundaries there are additional nucleotide signals that help to define when a gene should splice. Some functions for introns include: (1) Adding to genomic complexity through alternative splicing, and (2) Housing regulatory regions. Precursor RNA formed initially during transcription copies the entire gene sequence (exons and introns). The introns are next spliced out leaving the mature messenger RNA (mRNA) with only the exons which have the protein’s code. The step from mRNA that will make the appropriate protein is called translation.

mRNA has one important advantage over DNA in rDNA technology – it contains only the essential genetic data found in exons without the additional information found in introns. This makes mRNA much smaller than its corresponding DNA. cDNA refers to complementary (or sometimes called copy) DNA. The usual progression from DNA to RNA to protein can be perturbed both in vitro and in vivo by the enzyme reverse transcriptase. It is now possible to take an mRNA template and produce from this a second strand which is the complement of the mRNA. The double-stranded structure formed from this is called cDNA. Unlike the starting or native DNA (genomic DNA), the cDNA does not have introns but contains only coding (exon) sequences (Figure 1.5).

Gene expression is controlled by the promoter regions located at the 5′ end of genes, as well as more distant regulatory sequences known as enhancers. Promoters work because they bind proteins known as transcription factors. Increasing the access of transcription factors to the promoters will activate genes, while hiding or mutating the promoter regions will down-regulate the gene’s function. A major influence on gene expression therefore occurs through folding of the chromatin (the complex of DNA and histone proteins in which the genetic material is packaged inside the cells of eukaryotes). The chromatin structure is dynamic and, in animal models,

changes can be inherited across generations which are independent of the DNA sequence (see epigenetics in Chapter 2). At the other end of the gene is the poly A tail which is added to the mRNA after cleavage of the precursor RNA.

[Figure 1.5 diagram labels: DNA; RNA polymerase; RNA; Protein; Reverse transcriptase; RNase; DNA polymerase; cDNA]

FIGURE 1.5 Making cDNA with RT-PCR. Double-stranded DNA (exons represented as red lines with introns as broken blue lines) is transcribed into RNA (red line with •). In the normal course of events, the RNA is then translated into protein. However, reverse transcriptase (RT) allows a copy (cDNA) of the RNA to be made (green line). Once this occurs the RNA component of the cDNA is removed with an enzyme such as RNase. A DNA polymerase enzyme will then allow the second DNA strand of the cDNA to be formed. From the initial DNA template, a synthetic double-stranded segment containing only exon(s) has now been made. The type of PCR approach described above is usually abbreviated to RT-PCR (reverse transcriptase-PCR).

Alternative Splicing

The gene pool is relatively small, at around 20 000 genes. Most genes have many exons, and their introns are often very large. For example, the BRCA1 gene (Chapter 7) has 24 exons and an mRNA (exons only) size of around 7 224 bp (7.2 Kb). Yet the genomic structure (5′ end, exons, introns and 3′ end) is considerably larger at around 81 189 bp (81.189 Kb). Larger size means a bigger protein, but it can also give the gene added flexibility in the type of protein(s) it produces. For example, there is evidence that large introns contain regulatory sequences and non-coding (nc) RNA species (discussed below) that will allow different proteins to be produced by the same gene. The gene anatomy shown in Figure 1.4, made up of three exons and two introns, is provided for illustrative purposes but is unusual as the structure of most genes is much more complex.

Another way in which protein complexity can be increased is through alternative splicing; i.e. a gene with five exons, depicted as 1i2i3i4i5 (number = exon; i = intron), can produce a protein encoded by this genetic information, i.e. 12345. On the other hand, alternative splicing allows exon skipping which produces different proteins: (1) Protein 1345 (exon 2 is left out), (2) Protein 145 (exons 2, 3 are missed), (3) Proteins 234 and so on. The proteins produced from alternative splicing share some common structure, but have significant differences. They are called isoforms.

Alternative splicing is thought to occur in most eukaryotic genes, with each having on average four different splicing options. This is one mechanism explaining how the protein repertoire could be increased without changing the numbers of genes. How important it is, and how it compares with the role played by small ncRNAs (see below) is not known. The example given in which a gene made up of five exons can produce different isoforms by exon skipping is likely to be conservative, since there is now increasing evidence that introns and non-gene segments may expand the options for alternative splicing because they contain cryptic splice signals.

As shown in Figure 1.4, there are signals at intron/exon boundaries, such as GT and AG dinucleotides, that indicate where splicing should occur. If these signals are changed (or new ones created) the cell can misinterpret the signals and splice incorrectly. If this occurs, a genetic disease or new protein is possible. In comparing splicing between eight different organisms it was shown that levels are higher in vertebrates than in invertebrates, and exon skipping is more likely where there are large introns [3].

Junk DNA

Around 45% of the human genome comprises repetitive DNA sequences that have no apparent function. Most of the remaining human DNA is non-coding and non-repetitive, and also appears to have no function. This was previously called junk DNA, but the term is a misnomer. Non-protein-coding DNA has now been shown to be non-random. It demonstrates inter-species homology and is transcriptionally very active, as will be discussed in the RNA section. It is also thought to function as a hot spot for recombination, which is possible since the repeat sequences have no apparent coding function. This means there would be less evolutionary pressure for conservation. A greater degree of mutational activity would be possible at these loci which would allow new genes to form.

Repetitive DNA

Repetitive DNA can be divided into two classes: the tandem repetitive sequences (known as satellite DNA) and the interspersed repeats. The term satellite is used to describe DNA sequences that comprise short head-to-tail tandem repeats incorporating specific motifs. These make up one third of DNA repeats and are exemplified by the macrosatellites, minisatellites and microsatellites. The latter is the most relevant to medicine. A summary of the satellite DNA repeats is given in Table 1.5, and they are illustrated in Figure 1.6.

The microsatellites are single locus VNTRs consisting of tandem, repeated, simple, nucleotide units of about 2–6 base pairs. The best described are the dinucleotide repeats involving bases such as adenine and cytosine (AC)n, where n (the number of repeats present) can vary from 10–60. Each STR identifies one unique segment of the genome. Microsatellites, because of their potential hypervariability, are more informative than the biallelic RFLP system, but less than the minisatellites. Nevertheless, the microsatellites can be assayed by PCR, and their value or informativeness is increased by measuring a number simultaneously and adding together the information obtained. More complex and so potentially more informative DNA polymorphisms were described in 1985. These are called minisatellites and are discussed in Chapter 9.

The interspersed repeats are thought to have entered eukaryotic genomes during evolution via viral RNA, and so are examples of retrotransposons (Table 1.5). They contribute to the variability in the genome via their sites of insertion leading to deletions being formed (and hence genetic disorders if gene function is perturbed) or producing hot spots for recombination or leading to copy number changes in a gene. The insertion of these elements into genes can also increase protein variability, as suggested by the finding of many SINES in human mRNAs.
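Returning to the five-exon gene (1i2i3i4i5) used in the Alternative Splicing section, the effect of exon skipping on the number of possible products can be counted directly. The Python sketch below is illustrative only. It assumes that any order-preserving, non-empty subset of exons could in principle be joined, which gives a combinatorial upper bound rather than a statement about what a real gene produces. The function name `exon_skipping_products` is hypothetical.

```python
from itertools import combinations


def exon_skipping_products(exons):
    """Return every non-empty, order-preserving subset of exons as a string.

    Assumption (not a biological claim): any exon may be skipped,
    but the remaining exons always keep their original order.
    """
    products = []
    for size in range(len(exons), 0, -1):
        # combinations() keeps the input order, so exon order is preserved
        for subset in combinations(exons, size):
            products.append("".join(str(exon) for exon in subset))
    return products


products = exon_skipping_products([1, 2, 3, 4, 5])
print(len(products))  # 31, i.e. 2**5 - 1
print(products[0])    # 12345 (no exon skipped)
print(all(p in products for p in ("1345", "145", "234")))  # True
```

Under this assumption five exons already allow 31 different exon combinations, which is consistent with the point that exon skipping can enlarge the protein repertoire without changing the number of genes.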


TABLE 1.5 Variations in DNA in the genome.

Variation | Description

Macrosatellite | Small units of DNA are repeated in tandem thousands of times. Hence called VNTR (variable number of tandem repeat). This large polymorphism is found mostly in centromeres and telomeres.

Minisatellite | Repeat units are larger than macrosatellites but there are fewer. Also an example of a VNTR. These are discussed again in Chapter 9.

Microsatellite | These involve small tandem repeats, e.g. 2–6 bp in size, hence they are called SSR (simple sequence repeat) or STR (short tandem repeat). Microsatellites are used in gene discovery by linkage analysis (Chapter 2), for identification purposes, e.g. paternity testing or forensic DNA testing (Chapter 9). They form the basis of unstable triplet repeats in some neurologic disorders (Chapter 2).

Single nucleotide polymorphism (SNP – pronounced SNIP) | These are single base changes with one nucleotide replaced by another. The Human Genome Project has greatly facilitated their discovery, and the numbers increase as more genomes are sequenced (Chapter 4). Single base changes were previously found by digesting DNA with restriction enzymes and so they were called RFLPs (restriction fragment length polymorphisms). Today, SNPs are detected by DNA sequencing or microarrays. A related term for a SNP is the SNV (single nucleotide variation).

Interspersed DNA repeats | LINES = long interspersed elements. Occupy about 15% of the human genome and have been inserted randomly into eukaryotes during evolution, i.e. they are examples of retrotransposons. Can function as polymorphisms depending on their presence or absence in the genome. SINES (short interspersed elements) are derived from LINES and comprise about 10% of the human genome. They are mostly made up of Alu repeats (Alu – named after the restriction enzyme AluI) and are about 300 bp in size [4].

Copy number variations (CNVs) | These are structural variants arising from deletions and duplications in the Kb to Mb range and so change the copy number for that genome region. On the basis of size CNVs contribute more than SNPs to variation in the genome. As well as functioning as polymorphisms they cause genetic disease by interfering with gene function or via dosage (gene copy number) effects. There are over 58 000 CNVs reported [5] and more are likely to be found.
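To make the microsatellite entry in Table 1.5 concrete, the following Python sketch scans a DNA string for tandem repeats with a unit of 2–6 bp. It is a minimal illustration, not a validated STR caller. The function name `find_microsatellites`, the default threshold of 10 repeats (taken from the lower end of the (AC)n range quoted in the text) and the sample sequence are all assumptions made for this example.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=10):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    The lazy quantifier makes the regex prefer the shortest repeat unit,
    so (AC)12 is reported as 12 copies of AC rather than 6 copies of ACAC.
    Simplification: a run of one base (e.g. AAAA...) is reported with a
    two-base unit such as AA, because units shorter than min_unit are
    not searched for.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


sample = "GGT" + "AC" * 12 + "TTG"   # hypothetical sequence with an (AC)12 repeat
print(find_microsatellites(sample))  # [(3, 'AC', 12)]
```

Two individuals who differ in the repeat count at the same locus would give different values in the third field, which is the property that makes these markers informative.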

Single Nucleotide Polymorphism (SNP)

In 1978, a human DNA polymorphism (RFLP) related to the β-globin gene was used to detect the genetic disorder sickle cell anemia (HbS). RFLPs were soon found throughout the human genome. They are now known as SNPs and have replaced the microsatellites as a research tool, because they can be multiplexed and automated, there are many of them and detection costs are falling rapidly. The finding that SNPs are inherited as blocks was an additional development that gave these polymorphisms extra flexibility in research (Chapter 2). As of late 2011 there were about 42 million human reference SNPs, which are identified by the prefix rs followed by a number – e.g. rs10768683 is a DNA polymorphism found in the β-globin gene (HBB) [6].

SNPs are also present in genes where they can alter the triplet codon so that a different amino acid is produced (called non-synonymous SNPs), or they have no effect on the amino acid (called synonymous SNPs). Previously, the latter was considered to be a neutral change with no effect on the gene’s output, but now it is known that some synonymous SNPs can influence gene function by creating cryptic splice sites.
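The distinction between synonymous and non-synonymous SNPs can be shown with a short Python sketch. It builds the standard genetic code and checks whether a single base change alters the encoded amino acid. The function name `classify_coding_snp` is hypothetical. The first example codon change (GAG to GTG, glutamic acid to valine) is the well-known sickle cell substitution in HBB; it is supplied here as background knowledge rather than taken from the text above, and it is not the rs10768683 polymorphism mentioned earlier.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single base change within one DNA codon.

    position is 0-based (0, 1 or 2). Returns the variant codon, the amino
    acid before and after the change, and the SNP class.
    """
    variant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[variant]
    kind = "synonymous" if before == after else "non-synonymous"
    return variant, before, after, kind


print(classify_coding_snp("GAG", 1, "T"))  # ('GTG', 'E', 'V', 'non-synonymous')
print(classify_coding_snp("GAG", 2, "A"))  # ('GAA', 'E', 'E', 'synonymous')
```

The sketch only inspects the amino acid. As the text notes, a synonymous change may still matter, for example by creating a cryptic splice site, and that effect is outside what this check can detect.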


[Figure 1.6 diagram labels: RFLP / SNP (single locus, T – C base change); VNTR minisatellites (single locus and multilocus); multiple STRs (microsatellites)]

FIGURE 1.6 Useful DNA polymorphisms in molecular medicine (see also Figure 3.2). DNA polymorphisms are arbitrarily defined as variations in a segment of DNA that are found in at least 1% of the population. This variation can be in fragment size or DNA sequence. Left box: an RFLP (restriction fragment length polymorphism) present at a single locus, and producing two polymorphic bands (large and small) of fixed size. The number of combinations generated by this bi-allelic polymorphism is limited to: large/large; small/small and large/small. The modern RFLP is now called a SNP (single nucleotide polymorphism) because the single base change in nucleotide sequence (T – C) is sought directly rather than detecting it through an alteration in a recognition site for a restriction enzyme. Center box: polymorphic bands obtained for a single locus VNTR (variable number tandem repeat) minisatellite. These polymorphisms are more informative because there is greater variability between the sizes present for each of the two bands and so there is more chance that individuals will have different profiles. Combining a number of different single locus VNTRs produces an even more characteristic set of markers per individual. Right box: microsatellites. Each is a separate locus producing a different profile like the VNTR single locus. However, PCR allows simultaneous typing of multiple microsatellites giving a DNA profile with sufficient power to distinguish samples or individuals. Although microsatellites have been preferred for research applications they are now being replaced by the DNA sequence-based single nucleotide polymorphisms (SNPs) except in forensic DNA typing.
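The limited informativeness of a bi-allelic marker described in the legend to Figure 1.6 follows from simple counting: a locus with n alleles gives n(n + 1)/2 possible unordered genotypes. The Python sketch below enumerates them. The function name `possible_genotypes` and the choice of ten alleles for the multi-allelic case are assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """Return all unordered pairs of alleles (one from each chromosome)."""
    return list(combinations_with_replacement(alleles, 2))


# Bi-allelic RFLP or SNP: only three combinations
print(possible_genotypes(["large", "small"]))
# [('large', 'large'), ('large', 'small'), ('small', 'small')]

# A hypothetical repeat locus with 10 different allele sizes
print(len(possible_genotypes(range(10))))  # 55
```

The jump from 3 to 55 possible genotypes at a single locus is the reason multi-allelic repeat markers are more likely to distinguish two individuals than a single bi-allelic marker.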

Single Nucleotide Variation (SNV)

As more genomes are sequenced and variations in single base changes are found, a new term has emerged – the SNV. It is difficult to find an official definition but generally a SNP refers to a benign change, i.e. single base changes that are commonly found within a population, whereas SNV is a more generic term for single base variations that are yet to be confirmed as being a SNP or a mutation (a disease causing change in the DNA).

Copy Number Variation (CNV)

CNVs are defined as DNA segments of 1 Kb or larger that are present in variable copy numbers in comparison to a reference genome. These components of the junk DNA are the focus of much attention at present, to define their roles in:

1. Generating human diversity;
2. Disease causation, and
3. Benign DNA polymorphisms.

Until recently, CNVs were difficult to identify unless appropriate quantitative PCR-type assays were undertaken. Moreover, CNVs can be large; hence the PCR approach would not detect them all. This is less of an issue now since the advent of omics-based technologies such as array comparative genomic hybridization (aCGH) and next generation (NG) DNA sequencing (Chapter 4).

Controversy remains on how common CNVs are, with estimates ranging from 5% to 30% of the junk DNA [7]. This range reflects the ability of different technologies to detect all the possible CNV fragment sizes. However, what is not disputed is that CNVs are hot spots for mutation, and so have possible roles in the evolution of the genome. Their function in the etiology of complex genetic diseases is actively being explored.

Unexpected findings are also emerging as whole genome sequencing proceeds and the data are analyzed. One of these is that many normal individuals have CNVs that appear to impair gene function, without apparent clinical consequences. Much remains to be learnt about what is normal and what is abnormal in the human genome. Nevertheless, there are tantalizing observations about CNV deletions or duplications present in disorders for which there is a presumed genetic component yet nothing substantive has been found to date. These include schizophrenia, autism, Alzheimer disease and intellectual impairment [7].

CNVs are also important in somatic cell genetics, for example, the HER2 (human epidermal growth factor receptor 2) gene and breast cancer. HER2 positive breast cancer, caused by over expression due to gene amplification, has a poorer prognosis but is more likely to be responsive to the drug Herceptin. Hence, prior to taking this treatment, the tumor is tested for HER2 gene amplification (Chapter 7).

CNVs in Table 1.5 are placed in a separate category to LINES and SINES but there is overlap with the interspersed repeats also contributing to the formation of CNVs. On the other hand, variants that are exclusively LINES or SINES are excluded by some as being CNVs. There is still more to know about the CNVs, including what is the best definition of this important component of junk DNA [7].

RNA

The main differences between RNA and DNA include:

1. The nucleotide base thymine is replaced with uracil;
2. The backbone consists of ribose rather than the 2-deoxyribose of DNA;
3. RNA is usually single stranded, with some types such as rRNA and tRNA forming three dimensional molecular configurations;
4. Except for some viruses, RNA does not contain the cell’s genetic material;
5. RNA is easily degraded compared to DNA, and so isolation techniques require the addition of chemicals to ensure that any RNase (also written RNAase) enzymes present are inactivated;
6. There are different types of RNA distinguished by their functions (Table 1.6), and
7. RNA shows tissue-specificity like protein. In contrast, constitutive (germline or heritable) DNA is identical in all cells.

Thus, the relevant mRNA can only be isolated from a tissue that is transcriptionally


active in terms of the target protein. This limited the use of mRNA until fairly recently. Now, through the use of PCR (Chapter 3) it has been shown that mRNA production in some cells, such as peripheral blood lymphocytes, can be leaky, i.e. there is transcription of mRNA species that are not directly relevant to the lymphocytes’ function. These ectopic or illegitimate mRNA species are found in minute amounts but the amplification potential of PCR can be utilized to isolate them.

TABLE 1.6 RNA functions.

Type of RNA | Functions

mRNA (messenger RNA) | The intermediary between DNA in the nucleus and protein production in the cytoplasm. It is involved in transcription and requires the enzyme RNA polymerase II to bind to the gene’s promoter and, when additional transcription factors attach, transcription starts. A copy of one of the DNA strands is made as pre mRNA which includes the genetic information for both introns and exons. The pre mRNA is then processed in three steps: (1) The pre mRNA is capped at the 5′ end. Capping is important for (2) Splicing out the intronic sequences from the pre mRNA. (3) A poly A tail is then added at the 3′ end. Processing as described above is critical for mRNA production. mRNA then carries the genetic information copied from the DNA in the form of three base codons which specify a particular amino acid. The mature mRNA moves to the ribosomes where translation can occur.

rRNA (ribosomal RNA) | This RNA is synthesized by RNA polymerase I and makes up the complex that allows the mRNA and tRNA to interact and so produce the polypeptide via translation. The mature protein forms after post-translational modifications are completed.

tRNA (transfer RNA) | This RNA is synthesized by RNA polymerase III and transfers an amino acid to a growing polypeptide chain during translation. It functions like an adaptor by recognizing the codons on the mRNA and then linking them to the appropriate amino acid.

ncRNA (non-coding RNA) | It is estimated that about 70% of non-coding DNA is transcriptionally active with a number of RNA genes producing small RNA species with many functions. In addition there are particular RNA activities such as RNAi (RNA interference) and ribozymes (enzymatic RNA species).

Function

The role of RNA in transcription (mRNA) and translation (rRNA, tRNA) is well known, and will not be discussed in detail here. It is summarized in Figure 1.4 and Table 1.6. Computer animations of both transcription and translation can be found in [2]. As noted earlier, the number of human protein-coding genes is smaller than expected, and comparable to or even smaller than is found in other animals or plants (Table 1.7). Yet the human proteome is considerably more complex, containing many 100 000s of proteins. Mechanisms to explain this discrepancy, including alternative splicing, CNVs and DNA polymorphisms such as SNPs, have been considered. Another way in which the proteome can be diversified is through ncRNA.

ncRNA

During the 1980s a new role for RNA was described, when a catalytic RNA species called a ribozyme was discovered. Since the small number of protein-coding genes was insufficient to explain complexity in the proteome, the focus shifted to junk DNA. From this came the discovery of ncRNA genes. It is important to note that junk DNA also includes introns previously thought to have no function beyond splicing. Introns, particularly large ones, are now considered to house various regulatory elements including ncRNAs.


TABLE 1.7 Haploid genome sizes and the number of protein-coding genes for different organisms [8].

Model | Size of genome (a) | Number of genes (a)

Human (Homo sapiens) | 3.2 Gb | 20 000
Human mitochondrial DNA | 16.6 Kb | 37
Non-human primate (chimpanzee) | 2.7 Gb | 19 000
Dog | 2.4 Gb | 19 300
Mouse | 2.6 Gb | 20 200
Zebrafish (Danio rerio) | 1.4 Gb | 24 000
Fly (Drosophila melanogaster) | 165 Mb | 13 600
Worm (Caenorhabditis elegans) | 100 Mb | 19 000
Flowering plant (Arabidopsis thaliana) | 119 Mb | 30 000
Rice (Oryza sativa) | 389 Mb | 37 500
Plasmodium falciparum | 22.8 Mb | 5 300
Yeast (Saccharomyces cerevisiae) | 12.1 Mb | 6 600
Escherichia coli K12 | 4.6 Mb | 4 200
Helicobacter pylori | 1.7 Mb | 1 590
Haemophilus influenzae | 1.8 Mb | 1 738
Human immunodeficiency virus | 9.1 Kb | 9
Epstein Barr virus | 172 Kb | 80

(a) Sizes of genomes and the estimated numbers of genes are very approximate and vary between references.
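One way to read Table 1.7 is to convert each row into a gene density (genes per Mb of genome). The Python sketch below does this for a subset of rows, using the approximate values from the table and taking 1 Gb as 1 000 Mb. The dictionary name `GENOMES` and the function name `genes_per_mb` are hypothetical, and the numbers inherit the uncertainty stated in the table footnote.

```python
# (genome size in Mb, approximate number of protein-coding genes), from Table 1.7
GENOMES = {
    "Human": (3200.0, 20000),
    "Mouse": (2600.0, 20200),
    "Rice": (389.0, 37500),
    "Worm (C. elegans)": (100.0, 19000),
    "Yeast": (12.1, 6600),
    "E. coli K12": (4.6, 4200),
}


def genes_per_mb(size_mb, gene_count):
    """Gene density: number of protein-coding genes per megabase of genome."""
    return gene_count / size_mb


density = {name: genes_per_mb(*row) for name, row in GENOMES.items()}

# Print from the least to the most gene-dense genome
for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:20s} {value:8.1f} genes/Mb")
```

On these figures the human genome has the lowest gene density of the six (6.25 genes per Mb), while E. coli has the highest, which illustrates why neither genome size nor gene number alone tracks the complexity of an organism.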

Figure 1.7 provides an informative comparison by J Mattick and colleagues [9]. This shows that the protein-coding DNA sequence in a wide range of organisms is a poor indicator of the complexity of an organism, with some notable outliers such as rice and a worm. In contrast, the ratio of non-coding to total genomic DNA gives a much better indication of complexity.

[Figure 1.7 panels: (A) y-axis: ratio of noncoding to total genomic DNA (0 to 1); (B) y-axis: CDS (Mb) (0 to 80); x-axis in both panels: sequenced species, prokaryotes through vertebrates]

FIGURE 1.7 Comparing the complexity of the haploid genome across many species. (A) The percent of ncDNA (the ratio between total bases of non-protein-coding DNA to the total bases of genomic DNA) per sequenced genome across prokaryotes and eukaryotes. (B) The amount in Mb (megabases) of protein-coding sequence (CDS) per genome for species ranked in (A). Colors: black – 4 largest prokaryote genomes and 2 well known bacterial species; gray – single celled organisms; light blue – organisms that are both single and multicellular depending on life cycle; blue – multicellular organisms; green – plants; purple – nematodes; orange – arthropods; yellow – chordates; red – vertebrates. Taken from Figure 1 in the article by Taft RJ et al. The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays 2007; 29: 288–299 [9]. Reproduced with permission from the publisher Wiley Periodicals Inc.

Establishing the functions of non-coding RNAs is still work in progress, but there is increasing evidence that they play key roles in many steps of gene expression, including transcription, post-transcriptional modifications and chromatin modeling. The latter is particularly relevant to epigenetics (Chapter 2). The integrity of ncRNAs has been linked to normal development, differentiation and cellular identity. Both genetic and epigenetic abnormalities in regions of the genome coding for ncRNAs lead to various disorders, as discussed below [10].

Housekeeping ncRNAs

These are constitutively expressed and are responsible for many day to day cellular activities. Included here are the tRNAs, rRNAs, snRNAs, snoRNAs and RNAs involved in activities such as telomere maintenance (Table 1.6). The reader will note a discrepancy in terminology with some confusion as to whether the tRNAs and rRNAs are ncRNAs. The confusion continues in the broader concept of miRNAs (micro RNAs) which include many of the above species. More recently, long ncRNAs have been described. For historical purposes tRNAs and rRNAs are considered as a separate group, although they could also be included in the ncRNAs. Definitions and functions for various ncRNA species are provided in Table 1.8.

Regulatory ncRNAs

Unlike the housekeeping RNAs, the regulatory ncRNAs demonstrate tissue-specific expression. There are no steps in the chromatin modeling, transcription or translation pathways that are not modulated or influenced by ncRNAs. miRNAs by their action are predominantly negative regulators of gene expression; i.e. they have tumor suppressor-like effects. Many miRNA genes are co-located in chromosomal regions that are implicated in a range of human cancers. Over-expression of miRNAs has also been reported in cancers, i.e. miRNAs seem to act as oncogenes (more discussion of miRNA and cancer can be found in Chapter 7) [14].

RNA Interference (RNAi)

Apart from the RNA catalytic activity demonstrated by ribozymes (Table 1.8), a final highlight in molecular medicine to end the 20th century was the discovery of yet another function of RNA. This was called RNA interference (RNAi) or RNA mediated gene silencing. It involves a double-stranded RNA species that can degrade mRNA. RNAi is mediated through:

1. siRNA (small interfering RNA) – small dsRNA species that degrade mRNA, and
2. miRNA (micro RNA) – small dsRNA species that interfere with translation by imperfect base pairing with mRNA as well as through mRNA cleavage.

Both siRNA and miRNA share common intermediaries, including Dicer and binding to Argonaute proteins to form RISC (RNA Induced Silencing Complex) (Figure 1.8). Small amounts of dsRNA have been shown to silence a vast excess of target mRNA.

RNAi is found in plants, some fungi, worms and animals including humans and is considered to function as:

1. A primitive immune system to protect against the intrusion of dsRNA-containing species, particularly viruses and transposons;
2. Transcriptional gene silencing, and
3. Post-transcriptional regulation of cellular genes via a variety of mechanisms including epigenetic effects.

Exogenously produced siRNA species have been tried in genetic therapies (Chapter 8). They have been added in vitro or in vivo to generate gene targeted knockdowns in research. This produces partial gene inhibition and avoids the tedious targeting steps required with knockout transgenic mice which give an all or nothing effect. The siRNA effect is only temporary and can be introduced at any stage in the life cycle. This has been particularly valuable in studying development, and the role of siRNA.

RNAi moved into commercial production when it was shown that chemically synthesized, small, 21–23 unit oligonucleotides could inhibit specific gene expression in mammalian cells. The novelty and applications of RNAi were quickly recognized with the awarding of a Nobel Prize to A Fire and C Mello in 2006, only eight years after their original publications.

More recently, RNAa (RNA activation) has been described in mammalian cells. Small dsRNA molecules have been shown to target gene promoter regions and so activate sequence-specific gene expression. The applications and utility of RNAa are yet to be determined.

Chromosomes

Chromosomes are thread like elements in the cell nucleus. Each chromosome contains a constriction called the centromere, which divides chromosomes into short (p for petite) and long

TABLE 1.8 A plethora of ncRNAs [10–12].

ncRNA | Definition and functions (a)

Ribozymes | Naturally-occurring catalytic RNA species that cleave RNA at specific sequences. Their specificity rests with the hybridizing (antisense) arms located on either side of the molecule’s catalytic domain. Clinical trials using ribozymes involve infections such as HIV or hepatitis C virus and aberrant gene expression in cancers. Constraints with ribozymes are their design which makes production difficult, and susceptibility to degradation by RNases.

microRNAs (miRNAs) | One of the small ncRNAs that is double-stranded (ds) and about 20–25 bp in size. It is encoded in the genome. miRNAs occur naturally in a variety of eukaryotes and are considered to play a key role in regulating the expression of 30% of the protein-coding genes. Partially anneal to mRNA and inhibit translation via a non-specific effect, i.e. a single miRNA may target many mRNAs. As well as inhibiting translation, miRNAs can facilitate degradation of mRNA. They are processed through Dicer (Figure 1.8) and expressed in a tissue specific manner. Abnormalities can lead to disease. In humans there are 1 000 miRNA species identified [13]. Nomenclature for miRNAs starts with miR followed by a dash and number, e.g. miR-15a and miR-16-1 are two miRNAs associated with chronic lymphatic leukemia [14]. This class of RNA is being investigated for possible diagnostic and therapeutic uses.

small interfering RNAs (siRNAs) | dsRNA about 21–25 bp; occur naturally in a variety of eukaryotes including plants, some infectious agents and animals. Considered to be important as a form of protection against foreign DNA such as viruses and transposons. Demonstrate 100% match to complementary mRNA unlike miRNA, and so siRNAs target specific genes. This property has proven useful in the research laboratory to knock out genes. Works by cleaving RNA. Like miRNAs, processed through Dicer. In humans there are hundreds of siRNA species identified.

small nuclear RNAs (snRNAs) | Associated with proteins to form nuclear ribonucleoproteins. Involved with maturation of mRNA and other cellular functions.

small nucleolar RNAs (snoRNAs) | Associated with proteins to form ribonucleoproteins. Involved with maturation of rRNA. sdRNAs (sno-derived RNAs) may have regulatory functions.

PIWI-interacting RNAs (piRNAs) | About 24–30 bp in size and Dicer independent. Well characterized like miRNA and siRNA. There are millions of piRNA species identified which seem to be uniquely expressed in the mammalian germline especially the testis. May have a role in spermatogenesis and unlike siRNA and miRNA can stabilize target mRNA.

Transcription initiation RNAs (tiRNAs) | Short transcripts located adjacent to transcription start sites.

Long ncRNAs (lncRNAs) | Non-coding transcripts longer than 200 nucleotides. Involved in regulation of protein-coding genes as well as epigenetic changes.

(a) Abbreviations: ss – single stranded; ds – double stranded. More about the therapeutic applications of ncRNAs is found in Chapter 8.
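Table 1.8 contrasts the 100% match of an siRNA to its target with the partial annealing of a miRNA. The Python sketch below turns that contrast into a small calculation: it scores how much of a guide strand is complementary to an equally long mRNA site. It is a toy model based only on the statements in the table. The function names, the 21-nucleotide target sequence and the all-or-nothing classification rule are assumptions made for illustration, not a model of how RISC selects targets.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the antisense strand (written 5' to 3') of an RNA sequence."""
    return "".join(PAIR[base] for base in reversed(rna))


def paired_fraction(guide, target_site):
    """Fraction of guide positions that base pair with the target site."""
    if len(guide) != len(target_site):
        raise ValueError("guide and target site must have the same length")
    expected = reverse_complement(target_site)
    matches = sum(g == e for g, e in zip(guide, expected))
    return matches / len(guide)


def classify_guide(guide, target_site):
    """Toy rule: a perfect match is siRNA-like, anything less is miRNA-like."""
    if paired_fraction(guide, target_site) == 1.0:
        return "siRNA-like"
    return "miRNA-like"


target = "AUGGCUAGCUAGGCUAAGCUA"            # hypothetical 21 nt mRNA site
perfect_guide = reverse_complement(target)  # fully complementary guide

# Introduce three central mismatches to mimic imperfect pairing
SWAP = {"A": "C", "C": "A", "G": "U", "U": "G"}
imperfect_guide = (
    perfect_guide[:9]
    + "".join(SWAP[base] for base in perfect_guide[9:12])
    + perfect_guide[12:]
)

print(classify_guide(perfect_guide, target))    # siRNA-like
print(classify_guide(imperfect_guide, target))  # miRNA-like
```

In this sketch the imperfect guide still pairs at 18 of 21 positions, which fits the description of a miRNA partially annealing to its target and so being able to bind many different mRNAs.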

(q) arms. The centromere can be in the center of the chromosome or at its ends, and comprises several million base pairs made up of a 171 bp repetitive sequence called α-satellite DNA. When chromosomes replicate in cell division, they form identical pairs (sister chromatids). The centromeres are essential for effective separation of the sister chromatids so that each cell gets one copy after division.
At the end of each chromosome is the telomere (Figure 1.9). This protein-DNA structure comprises long stretches of tandem TTAGGG repeats. The telomere is important for sealing the end of the chromosome and maintaining

MOLECULAR MEDICINE 20 1. Genes to Personalized Medicine

[Figure 1.8 diagram. Labels recovered from the figure: long dsRNA; hairpin miRNA; Dicer; siRNAs (dsRNA, ~22 nt); miRNAs (ssRNA, ~22 nt); Argonaute; RNAi; viral defence; gene regulation; post-transcriptional gene regulation.]

FIGURE 1.8 RNA interference (RNAi). siRNA: Long double-stranded (ds) RNA can come from a number of sources including dsRNA viruses infecting cells. When the cell recognizes dsRNA it uses an RNase enzyme called Dicer to digest it. This produces a number of small dsRNA species about 21–25 bp in size. These then interact with a protein complex called Argonaute which has endonuclease activity. One of the two strands is removed, leaving the siRNA’s antisense strand which can bind to the complementary sequence in mRNA, leading to the latter’s degradation. miRNA: A similar process follows with miRNAs (micro RNAs). These are small (about 20–25 bp) non-coding double-stranded RNA species that are derived from hairpin precursor RNAs (hairpins are formed by RNA folding on itself). The miRNAs do not have exact complementarity to mRNA species (in contrast to siRNAs) and so they do not cleave mRNA like siRNA, but appear to regulate gene activity via inhibition of translation through non-specific binding to the 3′ untranslated ends of genes. Some miRNAs with complete complementarity for mRNA will degrade it directly. Like siRNAs, the miRNAs are cleaved (at the hairpin loop) by Dicer and then attach to Argonaute proteins before exerting their effects on mRNA.

its stability and integrity. Without a telomere, each round of DNA replication would result in gaps at the end of the chromosome. Telomerase solves this by synthesizing a new telomere structure, thereby avoiding loss of genetic material [15]. Shortening of the telomere may lead to apoptosis (cell death), arrest of cell proliferation and aging (Chapter 7). Mutations in the telomere are associated with rare but serious disorders (Table 1.9).
Chromosomes contain both DNA and histone protein. This combination is called the chromatin. In the nucleus, the chromosomes are packed tightly, which allows a large amount of DNA to be located within a small space. Packing also plays a role in gene regulation, as will be discussed in Chapter 2. When stained, chromosomes demonstrate light and dark bands. The light bands identify euchromatin, which is loosely packed DNA that contains actively expressing genes. The dark bands are the heterochromatin, which is tightly packed DNA that is transcriptionally inactive. Heterochromatin is largely composed of repetitive DNA including the centromeres and telomeres.

Cytogenetics
The study of chromosomes is called cytogenetics. A karyotype describes an individual’s chromosomal constitution. It was only in 1956 that the human diploid chromosome number was shown to be 46, and during the 1970s, methods were developed to distinguish bands within individual chromosomes. Each of the 44


TABLE 1.9 Some diseases associated with telomere dysfunction [15].

Disease Clinical features

Dyskeratosis congenita: This is the most dramatic manifestation of telomere dysfunction, producing very short telomeres. Clinically there are dystrophic nails, patchy skin hyperpigmentation and oral leukoplakia. Bone marrow failure eventually develops, leading to fatal aplastic anemia. Other organ systems may be involved, with a particularly serious complication being pulmonary disease. One causative gene on the X chromosome is DKC1, explaining why most cases occur in males. Patients have an 11-fold greater risk of tumor developing.
Aplastic anemia: Most mutations are thought to reside in telomerase genes rather than a gene such as DKC1. About 10% of aplastic anemia cases have short telomeres.
Pulmonary fibrosis: Around 15% of patients with familial idiopathic pulmonary fibrosis have telomerase mutations. Many patients have telomere shortening without any detectable mutations in telomerase. Patients with pulmonary fibrosis can have liver cirrhosis, suggesting a common pathway.
Cancer: Apart from the risk of tumors with dyskeratosis congenita, association studies based on telomere genes suggest an increased risk for tumors involving skin, lung, bladder, prostate and cervix.
Degenerative disease: Aging and telomeres will be discussed in Chapter 7. Shorter telomeres in some studies have been found more often in those with poor prognosis heart disease.

human autosome chromosomes and the X or Y sex chromosomes can now be counted and characterized by banding techniques. The most common of these is G-banding, which involves trypsin treatment of chromosomes followed by staining with Giemsa. G-banding produces a pattern of light and dark staining bands for each chromosome (Figure 1.9). The banding patterns, the size of the chromosome and the position of the centromere enable the accurate identification of each individual chromosome.
The short and long arms of a chromosome are divided into regions which are marked by specific landmarks. Regions comprise one or more bands. Regions and bands are numbered from the centromere to the telomere along each arm. Each band will therefore have four descriptive components. For example, the cystic fibrosis locus on chromosome 7q31 defines a band involving chromosome 7, on the long arm at region 3 and band 1. Additional information is available from higher resolution banding techniques which enable sub-bands to be identified. In the case of the cystic fibrosis locus this becomes 7q31.3 where the .3 defines the sub-band (Figure 1.9).
In the early 1980s, the development of fluorescence in situ hybridization (FISH) allowed even greater resolution than was possible by chromosomal banding. This technique combines conventional cytogenetics with DNA probes, allowing single-stranded DNA to anneal to its complementary single-stranded sequence in the genome. In the case of FISH, the genome is not isolated DNA but chromosomes on a metaphase spread. Resting (interphase) chromosomes can also be studied with FISH. The potential to use a number of DNA probes, each labeled with a different fluorochrome, in the same procedure means that separate loci can be identified, comparisons can be made and relationships to the centromere and telomeres established. Probes can be purchased that assign different colors to the chromosomes, thereby identifying them more easily by their unique color, a technique known as chromosome painting. With FISH, genes can be localized on chromosomes and chromosomal rearrangements can be identified.
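The four descriptive components of a band designation such as 7q31, plus the optional sub-band in 7q31.3, can be made concrete with a small parser. The sketch below is illustrative only: the `Band` type and `parse_band` function are hypothetical names, and the pattern assumes the simple form described in the text (chromosome, arm, single-digit region, single-digit band, optional sub-band after a point) rather than the full cytogenetic nomenclature.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str          # "1" to "22", "X" or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    region: int              # numbered outward from the centromere
    band: int                # band within the region
    sub_band: Optional[str]  # digits after the point, if any


# Simplified pattern: chromosome, arm, one-digit region, one-digit band,
# then an optional ".sub-band".
_BAND_RE = re.compile(r"^(1[0-9]|2[0-2]|[1-9]|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(label: str) -> Band:
    """Split a designation such as '7q31.3' into its components."""
    match = _BAND_RE.match(label)
    if match is None:
        raise ValueError(f"not a recognized band designation: {label!r}")
    chromosome, arm, region, band, sub_band = match.groups()
    return Band(chromosome, arm, int(region), int(band), sub_band)


# The cystic fibrosis locus used as the example in the text.
print(parse_band("7q31"))
print(parse_band("7q31.3"))
```

The sub-band is kept as a string because higher resolution labels such as 11.21 carry more than one digit after the point.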


[Ideogram: CHROMOSOME 7. Short arm (p) bands from telomere to centromere: 22, 21, 15.3, 15.2, 15.1, 14, 13, 12, 11.2, 11.1. Long arm (q) bands from centromere to telomere: 11.1, 11.21, 11.22, 11.23, 21.1, 21.2, 21.3, 22, 31.1, 31.2, 31.3, 32, 33, 34, 35, 36.]

FIGURE 1.9 Banding patterns for human chromosome 7. The individual bands are designated by numbers. The short and long arms are shown by p and q respectively; the centromere by a green triangle and the telomeres by red triangles. An arrow marks position q31.3.

Chromosomal Abnormalities
Each somatic cell contains two sets of chromosomes inherited from the parents. Humans have 22 pairs of autosomes and two sex chromosomes, giving a total of 46 chromosomes. Chromosomal abnormalities include:
1. Numerical abnormalities: aneuploidies (monosomy, trisomy) and polyploidies (triploidy, tetraploidy);
2. Structural alterations, such as translocations, deletions, inversions or isochromosomes, and
3. Cell line mixtures including mosaicism and chimerism.
Although the great majority of these abnormalities are detectable by conventional cytogenetic approaches, some are not. These are an important application for FISH, which has proven useful in characterizing somatic cell chromosomal rearrangements in cancers and hematological malignancies such as leukemia (Chapter 7). FISH functions as a bridge between conventional cytogenetics and molecular DNA genetic testing, but it is labor intensive and expensive. Array Comparative Genomic Hybridization (aCGH) is a new DNA-based technique that is proving useful, particularly for investigating intellectual impairment. It is discussed in more detail in Chapter 4. Box 1.1 considers Down syndrome, an example of an important chromosomal disorder.

HUMAN GENOME PROJECT
The Human Genome Project was a scientific tour de force because of the technological challenges that had to be overcome, and the benefits both planned and unexpected that emerged (Box 1.2). It also demonstrated how scientists throughout the world could work together to bring about a dream that many considered to be impossible. An important private/public partnership model was developed, demonstrating how different views and skills could be harnessed for gene discovery. The consequences of the Human Genome Project will influence medical practice and the conduct of medical research for many years to come. Today, there are many ambitious multi-centered research studies underway that are modeled on the Human Genome Project (Table 1.10).
The term Human Genome Project is actually a misnomer, since the genomes of a set of model


BOX 1.1 DOWN SYNDROME.
An important genetic disorder on chromosome 21 is Down syndrome, which occurs in about 1 in 750 live births [16]. The phenotype includes:
1. Dysmorphic changes which can vary between patients;
2. Mental retardation;
3. Neurologic problems including neuropathology, hypotonia in newborns and infants, and Alzheimer disease;
4. Congenital heart disease;
5. Leukemia, and
6. Immunologic defects.
Although there has been an improvement in survival for children with Down syndrome, they still have a life expectancy that is shortened by 10–20 years, particularly for females. The association between Down syndrome and increased maternal age is well known, with the age effect beginning at around 30–35 years of age. The reason for the maternal age effects remains poorly understood. Cytogenetic abnormalities in Down syndrome leading to triplication of part or whole of chromosome 21 include:
1. Free trisomy ~95% (Figure 1.10),
2. Translocations ~5%, and

FIGURE 1.10 A human karyotype (47,XX) illustrating G-banding, female sex and Down syndrome (trisomy 21). The karyotype shows an additional chromosome 21 in a female. Note the light and dark bands on the chromosomes called G-banding. Karyotype provided by Dr Melody Caramins, Genetics Laboratory Services, Prince of Wales Hospital, Sydney, Australia.



3. About 2–4% of cases of free trisomy 21 also have mosaicism for a trisomy and normal cell lines.
Most trisomy cases involve an additional maternal chromosome 21 that has arisen by non-disjunction due to errors in meiosis I or II, and this effect is age dependent. Trisomies resulting from translocations or from mitotic errors are not age dependent and can involve the maternal or paternal chromosomes. For reviews on meiosis or mitosis see references [17,18]. Animal models have made it possible to develop various segmental trisomies for chromosome 21 (the homologous chromosome in mice is 16). These have shown that the region distal to the SOD1 gene (mutations in which cause some rare forms of motor neuron disease) is critical for developing the behavioral and learning abnormalities seen in this disorder [16]. However, despite having mouse models, as well as the existence of intensive studies of patients with Down syndrome, our understanding of its molecular basis remains poor. At present, the sequence data for the long arm of chromosome 21 are being re-annotated with particular interest in ncRNA genes.

organisms including mouse, fruit fly, various microorganisms, a worm, a plant and a fish were all included in the work. The model organism work or comparative genomics was considered necessary for a complete understanding of the human genome, since many of the same genes are found across organisms, allowing experimentation to facilitate our understanding of gene function.
Other components were also added including:
1. Consideration of ethical, legal and social issues (usually abbreviated to ELSI), particularly privacy, confidentiality, stigmatization or discrimination;
2. The importance of educating the public and professionals about the Human Genome Project, and
3. Gene discovery, which was not an early goal of the project but was soon added.

Goals
The Human Genome Project had a number of goals (Table 1.11). One involved the construction of comprehensive genetic and physical maps of the human genome (Chapter 2). These were tedious and time consuming to make, but genes could be found, and segments of DNA were sequenced. The distance between markers on a genetic map is defined as a centimorgan (cM) with 1 cM equal to ~1 Mb. An initial aim of the Human Genome Project was to produce a genetic map to cover the entire genome with DNA markers that were 1 cM apart. Each of the DNA markers generated would require a unique identifier, and for this, the concept of sequence tagged sites (STSs) was proposed. This meant that sequencing of DNA markers would be required. Each marker would then be identified by the part of its sequence that was unique. From genetic maps it was possible to construct physical maps, so that the distance between DNA markers could be determined in absolute terms such as kb or Mb. This was a mammoth task since it became necessary to characterize entire regions of the genome on the basis of overlapping DNA clones that would ultimately need to be sequenced. This strategy, which was followed by the publicly-funded Human Genome Project effort, contrasted with the approach that was subsequently adopted by the commercial


BOX 1.2 THE HUMAN GENOME PROJECT.

The US Department of Energy (DOE) was a leading proponent of the Human Genome Project (HGP) in 1987, because of a long standing research interest in the effects of nuclear weapons including DNA mutagenesis. DNA sequencing was critical to understanding changes in DNA. In the mid 1980s, DNA sequencing to detect mutations in DNA was technically difficult and only a few selected genes had been studied in this way. Most of the genes in the human genome had not been discovered, and the great majority of the 3 × 10^9 base pairs making up the human haploid genome did not contain gene sequences. This was then called junk DNA and was not a target for DNA sequencing. Therefore, vast tracts of DNA remained unexplored and the technology to sequence these areas was not available. No known facility was big enough to take on the mammoth task being proposed. Despite what appeared to be insurmountable obstacles, scientists generally felt that the HGP was feasible, and in 1988 the US Congress funded both the DOE and the NIH (National Institutes of Health) to explore the potential for a HGP. However, scientists were not unanimous in their enthusiasm, and there was considerable concern that the work involved was not research in its purest sense since it was not hypothesis driven, but data gathering. The costs involved were also a worry, particularly if funds for more traditional research activities were diverted to the HGP. Despite these concerns, the HGP was initiated in late 1990 with planned completion by 2005 and a $3 billion budget. Politically, the HGP promised both health and wealth outcomes. Health would come from medical benefits, and wealth would be gained from technological developments leading to economic growth and job creation. D Smith, then Director of the DOE’s Human Genome Program, described the HGP as developing an infrastructure for future research. In reply to the potential for shrinking research funds because moneys were going to the HGP, he made the prescient comment that following the HGP individual investigators would do things that they would never be able to do otherwise.

sector, as discussed below. For the above to work, new DNA sequencing technologies were required and they needed to be more efficient and cheaper. The work of constructing genetic and physical maps was undertaken by many different laboratories around the world.
Another goal was directed to bioinformatics. It was essential to develop computer-based resources, in order to store the vast amount of data generated, in the form of genome maps or DNA sequences. A considerable amount of software development would also be needed to allow the databases to be analyzed and sites of genes identified. Programs were also set in motion to give individuals a sound knowledge of genome research methodologies. The skills required were not only in molecular biology, but also included computer science, physics, chemistry, engineering and mathematics. To expand the pool of researchers and resources, funding and interactions with private industry were considered essential.

The 10 Year Project
Years 1 to 5 of the Human Genome Project (1991–1995) could be described as a time of enthusiasm and steady achievements.


TABLE 1.10 International research activities modeled on the Human Genome Project.

Initiative Progress

International HapMap Project (USA, UK, Canada, China, Japan and Nigeria). Catalogs genetic similarities/differences in humans and develops a public database of common human variants.
Progress: Any two humans are ~99.5% identical and the 0.5% difference may explain predisposition to disease. An important difference is the SNP. The goal of the HapMap Project is to characterize SNPs, particularly their inheritance in blocks (Chapter 2 and Figure 2.13). About 10^6 SNPs in the genome can be represented by ~500 000, thus facilitating whole genome association studies. The project does not identify disease related genes but their haplotypes [19].

Human Variome Project. The collecting and organizing of all genetic variation affecting human disease.
Progress: Attempts to address problems with databases (DNA mutations, phenotypes, DNA variants) that have emerged as the volume of information about genes and disease grows. Without international coordination and the setting of standards, the rapidly increasing volume of information will not be efficiently collected and curated [20].

International Cancer Genome Consortium. 13 countries participating in 2012.
Progress: Launched to coordinate large scale whole genome sequencing of 50 different cancers. The purpose is to provide a comprehensive overview of the cancers’ genomic, transcriptomic and epigenomic profiles [21]. Related activities include the US Cancer Genome Atlas Project and UK Cancer Genome Project (see also Chapter 7).

1 000 Genome Project.
Progress: This is a public catalog of genetic variation that is discussed in more detail in the text under Genome Variation [22].

ENCODE project (Encyclopedia of DNA Elements).
Progress: The aim is to map the genome’s functional elements and is discussed in more detail in the text under Genome Variation [23].

International Human Epigenome Consortium. Launched in 2010.
Progress: The most recent initiative with the aim of mapping 1 000 reference epigenomes including identifying, cataloging and interpreting genome-wide DNA methylation patterns of all human genes in all major tissues. This is a challenge because of the differing methylation patterns between tissues [24].
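The idea behind the HapMap entry in this table, that SNPs inherited together in blocks allow a subset of SNPs to stand in for the rest, can be sketched in a few lines of Python. This is a toy under stated assumptions and not the HapMap Project’s own method: the haplotypes are invented, the function name `choose_tag_snps` is hypothetical, and SNPs are grouped only when their allele patterns are identical (or exactly mirrored) across every haplotype, which is far stricter than the statistical measures used in practice.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def choose_tag_snps(haplotypes: Sequence[Sequence[int]]) -> Dict[int, List[int]]:
    """Group SNPs that vary together and pick one representative per group.

    Each haplotype is a list of 0/1 alleles, one entry per SNP position.
    Returns a mapping: tag SNP index -> indices of the SNPs it represents.
    """
    n_snps = len(haplotypes[0])
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for snp in range(n_snps):
        pattern = tuple(h[snp] for h in haplotypes)
        mirrored = tuple(1 - allele for allele in pattern)
        # A pattern and its mirror image carry the same information,
        # so both are filed under one canonical key.
        groups[min(pattern, mirrored)].append(snp)
    return {members[0]: members for members in groups.values()}


# Four invented haplotypes typed at four SNPs.
haplotypes = [
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

# SNPs 0, 1 and 2 always change together, so one of them can represent
# the block; SNP 3 varies independently and must be typed itself.
print(choose_tag_snps(haplotypes))
```

Here two tag SNPs capture the variation at four positions, which is the same kind of reduction, on a much smaller scale, that makes whole genome association studies more tractable.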

Researchers in many laboratories constructed maps of the genome, and then identified, by DNA sequencing, each base in the segment they were allotted. Although the USA’s Department of Energy was an early leading player, it was soon partnered by that country’s National Institutes of Health (NIH), the main funder of medical research with its vast network of scientists. The NIH subsequently became the leading public-sector contributor through its National Human Genome Research Institute led by F Collins. Another influential body was HUGO (The Human Genome Organisation). HUGO’s role was to coordinate international efforts, and facilitate education and rapid exchange of information.
The second five years of the project (1996–2000) were more turbulent. By 1998, the impressive developments in technology, particularly automation, meant the timing of specific goals needed to be moved forward. A new estimate for the complete sequencing of the human genome was 2003. The first success stories involved the sequencing of genomes of model organisms; in particular, the Haemophilus influenzae genome was reported as sequenced in 1995, a major achievement. This was soon followed by the sequence for Mycoplasma genitalium, and in 1996, the first eukaryotic genome to be sequenced was that of Saccharomyces cerevisiae. With these successes, the momentum for the human work increased, since it was now evident that


TABLE 1.11 Components of the Human Genome Project.

Goal Purpose

1 Map and sequence the approximately 3 billion bases in the human genome.
2 Map and sequence the genomes of model organisms including bacteria, yeast, plant, nematode, the fruit-fly, and mouse as an example of a mammal.
3 Identify all genes making up the human genome.
4 Develop software and databases to: (i) Support large scale collections of data, their storage, distribution and access. (ii) Develop tools for analyzing large data sets. This would lead to sophisticated bioinformatics capability.
5 Create training posts particularly in interdisciplinary sciences related to genome research and provide training courses.
6 Transfer technologies and exchange information with the private sector as industry needed to be involved in both technology development and training.
7 Develop flexible distribution systems so that data were quickly transferred to potential users and the community.
8 Address the ethical, legal, and social issues (ELSI) arising from the Human Genome Project and provide education to the public and health professionals.

genomes could be completely sequenced, and the resulting information was of scientific and medical significance.
Towards the end of the second five years, the influence and contribution of the commercial sector grew. This became a source of tension, as those who had worked for many years on the Human Genome Project held the strong view that genomic information and DNA sequencing results should be communicated freely, and without delay. This philosophy was at odds with the protection of intellectual property through patenting. A high profile example of commercialization came when Celera, sponsored by another commercial company Life Technologies (then called Applied Biosystems), took on the might of the NIH and the world, and its leader J Venter boasted publicly that it had the resources (around 300 of the most modern automated DNA sequencers, backed by a super-computer second only to what was found in the US military) to finish the first draft of the human sequence before the NIH or other countries, and at a much reduced cost. To do this, Celera adopted a different approach to sequencing whole genomes.
The Celera strategy was controversial and very different to what had been followed to this time. This involved a shot-gun approach, which bypassed the ordering of genetic or physical maps. Instead the entire human genome was blasted into small fragments. Each fragment was then individually sequenced, and then computer software matched the fragments based on overlapping sequences. In effect, a giant jigsaw puzzle of DNA sequences was created, and computer power was used to align the correct overlapping parts together. The company also had free access to many publicly available DNA sequence databases and these proved useful in the strategy that it adopted.

Completion
The challenge from Celera showed how the commercial world could make important contributions to molecular medicine. On the other hand, it highlighted that this would come at a cost: the availability of and access to future

databases or knowledge might not necessarily be free. In June 2000, US President Bill Clinton, flanked by F Collins (NIH) and J Venter (Celera), announced that the first draft of the human DNA sequence was now completed, with contributions from both the public and private sectors. Eight months later, the complete sequence of the first haploid human genome was published. Although the Human Genome Project had officially ended in mid 2000 (five years earlier than its anticipated completion), the sequence produced was only a draft, and considerable work remained to ensure that DNA sequencing errors and ambiguities were removed. In April 2003, 50 years after the structure of DNA had first been described, the NIH announced the completion of a high quality comprehensive sequence of the human genome.
Many interesting and unexpected facts emerged from the human genome sequence, including:
1. The number of protein-coding genes continued to be reduced, from the earlier calculated 100 000 to around 20 000. The latter number is comparable to what was observed for many of the model organisms;
2. About 1–2% of the human genome contained protein-coding genes, and the remainder, initially called junk DNA, was shown to be transcriptionally active;
3. The most common polymorphism in DNA was the single base change called a SNP, and
4. There was a lot of structural variation in the human genome caused by deletions and duplications of various segments.
These observations have become important drivers for research, which has been made possible by the development of faster, more accurate and cheaper analytical platforms.

10 Years On
A decade after the completion of the Human Genome Project, the journal Nature published a series of vignettes under the title The Human Genome at Ten [25]. Viewpoints on what happened and where the future lies were provided. The editorial notes that much was promised but, in terms of innovations in clinical care, little was delivered. In a 2010 interview, J Venter commented that the medical benefits emerging from the Human Genome Project were close to zero [26].
Despite the negative comments, the Human Genome Project has made and continues to make significant changes in the way patient care is delivered. These include:
1. Health professionals apart from geneticists knew little about molecular medicine when the first edition of this book was published in 1993. Today, the same individuals across many disciplines are using the language of genetics and utilizing DNA genetic testing for a range of purposes that impact on decision making;
2. In the clinics, individuals with a family history of genetic disease can obtain more certainty about risk and in many cases (50% for autosomal dominant conditions) be reassured that a serious risk has been excluded, by using a DNA test. In these cases, expensive and potentially harmful long term follow-up is no longer needed;
3. Couples who might not have considered having a family because of risks of genetic disorders in their children can now rethink their options using DNA testing at various times, from preimplantation genetic diagnosis to, in the not too distant future, non-invasive prenatal testing using fetal DNA in maternal blood;
4. Patients about to start treatment for HIV/AIDS or epilepsy can have a DNA genetic test to exclude use of drugs that are associated with life-threatening complications in certain individuals, and
5. Children with rare genetic disorders that are devastating to the families involved


have the option of DNA testing to find a diagnosis. Even if this does not necessarily lead to treatment options, it provides some form of closure because a cause is known. This is particularly important in intellectual disability and syndromal disorders.
Genetics is a low profile clinical discipline, because it does not put pressure on the Emergency Department or require in-patient beds, and often involves relatively uncommon disorders. In other words, there is less external pressure to implement practices based on genetic medicine. Progress has not been helped by the way the media and some individuals have hyped genetic discoveries with exaggerated claims that have not eventuated, leading decision makers to become wary of long term visions based on molecular medicine.
Another reason for the disappointing number of clinical outcomes from the Human Genome Project has been identified by research funding bodies that are now emphasizing the importance of translational research to ensure genomic (and other) new knowledge can be more easily transferred to the clinic. One development from the Human Genome Project that may allow more effective translation to occur is the new philosophy engendered through omics. Included here is the potential for whole genome DNA sequencing in clinical care, particularly in relation to personalized medicine, which will be considered later in this chapter and in Chapter 4.

Omics
The work that has followed from the Human Genome Project has been described in various ways, including post-genome or functional genomics. However, these are not helpful terms because they do not describe the rapid evolution of the omics philosophy with its focus on measuring everything within a cell, tissue or organism (Table 1.12). Today, the term genetics is directed to single genes and their associated disorders, while genomics covers a greatly expanded picture. The entire complement of genetic material is included, giving the potential to understand how complex genetic disorders arise, and the effects and mechanisms of gene-gene interactions and gene-environment interactions (Figure 1.11).
The earlier concept of molecular medicine as the study of DNA → DNA → RNA → protein in a single gene or genetic abnormality has evolved into the study of many or all genes (genomics), many or all RNA species (transcriptomics) and many or all of the proteins (proteomics) in a particular cell. A new term, the phenome, has also crept into the molecular medicine vocabulary. This follows the “all” theme, and refers to the total phenotypic characteristics of an organism, reflecting the interaction of the complete genome with the environment. Another of the omics becoming more relevant to molecular medicine is epigenomics (from epigenetics) (Chapters 2, 4).
At the completion of the Human Genome Project, the DNA sequences from a small number of model organisms, the human genome and numerous human genes were deposited in databases. This has grown rapidly and today there are nearly 4 000 whole genome sequences [25]. The real challenge remains the task of working out where the protein-coding genes are in these various sequences, their functions and the roles of the SNVs and CNVs. In other words, the DNA sequences need to be annotated, and for this better bioinformatics is needed (see Chapter 3 for further discussion of DNA annotation).
But this is not enough. Hence, research in the post-genome era has also been called functional genomics. Included in this are transcriptomics (the contribution of RNA species), proteomics (the technology and strategies required to determine the function of proteins), and finally systems biology to try and pull it all together. How this will be accomplished remains to be determined, but novel approaches will be needed. The role of bioinformatics will be critical. It is likely that the

MOLECULAR MEDICINE 30 1. Genes to Personalized Medicine

TABLE 1.12 Some examples of omes and omics – a growing list (see footnote a).

Omics: Definitions

Genomics: All the genes in a cell, tissue or organism.
Transcriptomics: All RNA transcripts in a cell, tissue or organism.
Proteomics: The total proteins expressed by a cell, tissue, biological fluid or organism. Related terms are peptidomics (all peptides) and post-translational modifications such as glycosylation (glycoproteomics) and phosphorylation (phosphoproteomics). The post-translational modifications have also been called the PTMomics!
Metabolomics: All small molecule metabolites within a cell, tissue, fluid or organism (see also metabonomics).
Metabonomics: Subtle differences distinguish this term from metabolomics, including the measurement of the metabolic response to various stimuli (metabonomics), while metabolomics follows the omics trend in defining all metabolites. There seems to be consensus that the two are comparable, and metabolomics is now the more popular term.
Epigenomics: All the epigenetic marks in a given cell, tissue or organism.
Pharmacogenomics: The use of genome-wide strategies to identify the inherited basis for differences between individuals in their responses to drugs.
Glycomics: All carbohydrates within a cell, tissue or organism.
Lipidomics: Comprehensive identification and quantification of all lipid molecular species in a cell, tissue or organism.
Metagenomics: Genomic analysis of microbial communities in different environments without alteration by culture. Derived from meta analysis – combining all data and genomics.
Toxicogenomics: Response of the genome to exposure to toxins, e.g. the use of microarrays in toxicology.
Kinome: The set of protein kinases in the genome.
Cocainomics: Genome and proteome profiles of brain regions in addiction.
Venomics: Omics approach to study venoms, e.g. identifying all proteins (proteomics) by mass spectrometry and then using these to screen cDNA libraries to identify all peptide species in a venom.
Fluxomics: Cell, tissue or organism based measurements of dynamic changes over time.
Biolome: Whole set of biological entities including DNA, RNA, protein, metabolites and so on.
Bibliome: All the published literature and related information.
Methylome: Methylation of gene promoters at CpG sites usually silences gene expression and the genome contains around 28 million such sites. Study of all the methylation patterns in the genome (methylome) can be expected to produce quite diverse patterns.

Footnote a: Omes comes from the Greek meaning all or whole. Ome is used to refer to the totality of a particular object while omics refers to the field of study, e.g. genome and genomics. Initially the concept of omics was an important new direction in molecular medicine reflecting the new sophisticated platforms that could analyze multiple substances simultaneously. It is now getting a little trite, with omics being applied to some unusual terms and concepts. Nevertheless, it has generated interest and enthusiasm so it continues to have some merit.


[Figure 1.11 diagram: a left box of drivers (genes/environment: inheritance and environment, columns 1 to 4), a central box of global operations (the ome: genome, epigenome, transcriptome, proteome, interactome + metabolome) and a right box of outputs (phenome: well being and disease, columns A to D).]

FIGURE 1.11 Omics. The ome occupies a central coordinating role for gene and environment interactions (left box) and the final product (phenome) in the right box. In the context of this diagram: (1) Genes refer to protein-coding genes as well as other DNA or RNA elements that influence the ome; (2) It is assumed that diseases comprise a mix of genetic and environmental components with the mix varying as depicted in columns 1 to 4 in the left box; (3) The ome outputs (right box) would normally be well being rather than disease but again there is a mix depicted by A to D, and (4) The processes are dynamic with feedback loops going in both directions.

traditional wet-laboratory approach to molecular research will give way to predominantly in silico (computer) based strategies for future gene discovery and functional analysis.

GENOME VARIATION

The 98% of the human genome previously called junk DNA has now been shown to be very dynamic, containing hot spots for recombination and mutation that drive evolutionary change, and a vast network of ncRNA species active in terms of gene regulation and the formation of alternative gene products. How is this hotbed of activity being systematically studied to provide further insight into what makes us human? Following on from the Human Genome Project there are now a number of large consortia working towards a better understanding of genomic heterogeneity.

1 000 Genome Project

This is an international endeavor launched in 2008, with the purpose of sequencing 1 000 normal individuals from different ethnic groups to produce a comprehensive catalog of human genetic variation. So successful is this project that by 2011 the number sequenced was around 2 500 from 14 different populations. The availability of the most modern DNA sequencing platforms (NG – next generation DNA sequencing, which is discussed in more detail in Chapter 4) has allowed this project to be developed. One early and interesting observation in relation to CNVs was the number of genomic rearrangements (deletions, tandem duplications and insertions of mobile elements) that appear to have no clinical consequences. With NG DNA sequencing protocols, it is usual to sequence with a 30 times (30×) coverage to be sure that most areas are covered,

and the results are highly accurate. However, this level of coverage was considered excessive for the 1 000 Genome Project, in terms of time and expense, and a 4× coverage was used. This reduction will be offset by: (1) Combining results from many individuals, and (2) Exome sequencing which allows greater coverage of all exons (Chapter 4). It is expected that variants that occur at rates as low as 1% of the population will be identified. These variants will range from single base SNVs to large CNVs [22].

Encyclopedia of DNA Elements (ENCODE) Project

The ENCODE project started in 2003 and was distinctive because it moved from looking at structural variation to attempting to identify all the functional elements in the human genome. The pilot phase investigated methods that could be used to define functional elements in a defined segment (1%) of the human genome. Now the whole genome is being targeted as well as those from some model organisms (modENCODE) such as Drosophila melanogaster and Caenorhabditis elegans. By utilizing both comparative and functional analyses in the model organisms it is expected that the functional components of the human genome will be identified. Ways in which raw data (in this case DNA sequence and transcriptional data) can be analyzed by bioinformatic approaches are illustrated in reference [27].

PERSONALIZED MEDICINE

The concept of personalized medicine implies a new approach; but this is how medicine has always been practiced. Data (clinical, laboratory, imaging and so on) are gathered and information that can be used for decision making is provided in the context of a particular patient. However, molecular medicine adds an additional dimension to what can be learnt about an individual through DNA-based testing. This can be:

1. Predictive long before conventional clinical markers are measurable, and
2. Useful in the context of identifying risk for family members.

So while personalized medicine is not new, it is significantly enhanced by the addition of DNA-based information.

The US President’s Council of Advisors on Science and Technology reported in 2008 that interest in personalized medicine stems from its potential for:

1. Improved patient care;
2. Disease prevention;
3. Reduction in health costs, and
4. Stimulating new drug development [28].

The Council defined personalized medicine as tailoring medical treatment to the individual characteristics of each patient. However, treatment would better be considered in a broader context as it is not restricted to conventional therapeutics, hence medical management might have been a more inclusive term (Figure 1.12).

L Hood has proposed P4 medicine; i.e. medical practice based on a Predictive, Personalized, Preventive and Participatory approach. The philosophy here is to move away from being reactive to a more proactive strategy, particularly in preventing disease development or progression. More effective progress will result from ensuring that there is ownership (participation) by the members of the community [29]. The community is likely to respond positively to a concept such as P4 medicine.

Governments are starting to show interest in molecular (DNA) medicine, as illustrated by a 2009 report from the UK’s House of Lords, which made 54 recommendations on how to progress genomic medicine. While this report is directed to the type of health service provided in the UK, it has recommendations that are relevant to any health service interested in


planning for the future (Table 1.13). However, as with all reports, the preparation is the easy step. Dissemination and then getting appropriate actions remain the key challenges.

[Figure 1.12 diagram: Personalized Medicine divides into Predictive Medicine (using a DNA test to PREDICT a disease will develop in future) and Pharmacogenetics/Pharmacogenomics (using a DNA test to SELECT drugs or predict TOXICITY/EFFICACY).]

FIGURE 1.12 Personalized medicine. The potential to use DNA testing to select the right drug and/or the right dose for an individual thereby avoiding the risks for side effects or optimizing therapeutic effects is the key feature of personalized medicine. However, it should not be forgotten that a unique property of DNA testing is the ability to predict risk or disease development well into the future. This knowledge can be used for earlier therapeutic interventions or life style choices including options for preventive measures (Chapter 3).

Education and Resources

A continuing theme in Molecular Medicine is the importance of educating health professionals. A challenge in education is terminology, as has just been illustrated with terms such as personalized medicine, P4 medicine, genomic medicine and molecular medicine. Educating the community may now be less of an issue, since the Internet provides access to information that previously was the exclusive domain of health professionals. Unlike other forms of healthcare delivery, genetic practice is not confined to individuals, but involves family members who share germline DNA and so share risks if mutations in genes are present. The family connection is often a catalyst to find out more about particular genetic disorders via the Internet. Similarly, the expanding direct-to-consumer DNA testing market (Chapter 5) provides a source of glossy information leading to growing, and at times unrealistic, expectations that important health outcomes will result.

The translation of genetic discoveries into clinical practice has been slow, and to some extent the health professionals need to take some responsibility for this. The rapidly changing landscape in genetics (let alone the even faster changes in genomics or omics) is a challenge to continuing education even for those working in this field. This will be illustrated in Chapter 3 under pharmacogenetics, which shows that about 10% of drugs approved by the FDA now contain information about genetic DNA tests, yet very few of these are being taken up by health professionals.

Driving Change in Clinical Practice

How does one change behavior, so that the implications of molecular medicine for healthcare are implemented effectively and in a timely way? Education is critical, and must include undergraduates as well as postgraduates. Equally important is the availability of computer-based tools that facilitate use of this information in clinical practice. Another and much less satisfactory driver for change is medico-legal pressure. A sobering, and even gloomy, 2011 review of personalized medicine from the legal perspective describes a volatile environment based on a mix of increasing public awareness and expectations, rapid changes in technology, uncertainties about the benefits of personalized medicine as well as what technologies are ready for clinical use, and gaps in the knowledge of health professionals. Although physicians are traditionally the focus


TABLE 1.13 Recommendations from the 2009 UK Report on Genomic Medicine [30].

Theme Recommendations (numbers with summaries)

Framework for translational research in the UK: 1, 2. Translational research in genomic medicine; 3, 4. Reducing burden for conducting clinical trials; 2, 5. Promoting collaborative translational research; 6. Research to demonstrate the clinical utility and validity of genomic tests; 7. Evaluation of the clinical utility and validity of genomic tests for use within the NHS; 8. Evaluation and regulation of genetic and genomic tests developed outside of the NHS; 9, 10. Incentives to develop stratified uses of medicines; 11. Intellectual property rights; 12. Co-development and evaluation of stratified uses of medicines and genetic tests; 13. Encouraging innovation.
Implementation and service delivery through the NHS: 14. Overview; 15. Integration of genetics into mainstream practice; 16. Provision of genetic services in the NHS; 17. Commissioning of genetic services; 18. Commissioning across the NHS; 19, 20. Uptake of pharmacogenetic tests in the NHS; 21. Provision of laboratory services.
Computational use of medical and genomic data (medical informatics and bioinformatics): 22. Emergence and growth of bioinformatics; 23. Linking informatics with electronic health records; 24. Developing expertise in bioinformatics; 25. Immediate informatics needs of NHS Regional Centers and laboratories.
Public engagement and ethical, social and legal issues: 26–28. Public engagement; 29. Data sharing; 30–33. Data Protection Act 1998; 34. Genetic discrimination; 35–37. Life insurance; 38, 39. Direct-to-Consumer Tests.
Training, education and workforce planning: 40. Medical students; 41–46. Physicians in primary and secondary care; 47. Genetic education for nurses; 48–50. Genetic counseling; 51. National leadership and the role of the National Genetics Education and Development Centre; 52–54. Workforce planning.

NHS = National Health Service.

of medico-legal challenges, other health professionals are engaged in personalized medicine and will be at risk [31].

This 2011 review identifies the important issues in personalized medicine that must be addressed to ensure that health professionals rather than the courts set the standards of care. These include:

1. Failure to recognize genetic risk including relatively rare conditions or associations. Linked to this is the importance of appropriate and timely referrals;
2. Loss of chance, which means a health professional’s lack of knowledge or advice has reduced a patient’s opportunity for mitigating a wide range of genetic-related consequences;
3. Appropriate informed consent in the context of a shift to a patient-centered focus for consent, and
4. Failure to warn.

Traditionally the health professional’s duty of care is to a patient. This becomes blurred with genetics, since family members share genes and hence risks. The predictive nature of personalized medicine has already been highlighted as a key and unique feature in terms of healthcare potential. It also leads to complex issues around privacy, duty of care and confidentiality [31] (Chapter 10).

Roadmap

A 2010 perspective by the leaders of the NIH (F Collins) and the FDA (M Hamburg) is titled


The Path to Personalized Medicine [32]. In this summary, the two organizations make a public commitment to developing personalized medicine. A number of strategies are identified involving research and the appropriate regulatory environment.

The perspective notes the importance of developing therapies for rare and neglected diseases and the value of tissue banks (Chapter 10). Translational science is repeatedly highlighted, so that findings in basic research can be transferred more rapidly into clinical practice. Observations are made about pharmacogenetic tests that could be used to guide therapy (Chapter 3).

One of the recommendations is for a US national genetic testing registry to provide information about genetic (DNA) tests to health professionals and the public. A comparable register is already available in the UK through the UK Genetic Testing Network. Surprisingly, little was mentioned about the education of health professionals, particularly those now at university or at early stages in their careers. This must surely be a priority.

The unknown in all the discussion about personalized medicine is the whole genome sequencing options that have excited many in molecular medicine in the past year or so as new platforms churn out faster and cheaper whole genomes. Will there be a new paradigm of clinical care that moves away from a focus on individual genes and goes directly to personal whole genomes? This is discussed in Chapters 4 and 10. At this stage all that is certain is the costs for whole genome DNA sequencing will continue to fall, until they are $1 000 or cheaper. Other issues requiring resolution include the type of eHealth infrastructure needed and whether the health professionals are ready. In this mix the direct-to-consumer market is likely to flourish, as consumers seek more information about their health outside the traditional and often bureaucratic healthcare structure (Chapter 5).

As a follow-up to the 2003 article A vision for the future of genomics research, the US National Human Genome Research Institute published its next (2011) vision of genomic medicine from DNA base pairs to the bedside [33]. Five domains of genomic research are identified, and for each of these, key activities are noted (Box 1.3).

The remaining chapters provide overviews on how molecular medicine can contribute

BOX 1.3 CHARTING A COURSE FOR GENOMIC MEDICINE FROM BASE PAIRS TO BEDSIDE [33].

Domain 1: Understanding the structures of genomes. Little is said about this first domain because it comprises the work of the Human Genome Project. Nevertheless, as already noted, the conclusion of the Project produced more questions than answers and further work is needed with a particular focus on the epigenome and RNA.

Domain 2: Understanding the biology of genomes. The major issues here include databases and databanks of information and tissue resources that are accumulating as part of the omics push. Although DNA sequencing capabilities have been extended to whole genomes, there still remain regions that are difficult to sequence and these are the next target. Sequencing alone will not give the complete answer and assays to determine function remain a limitation. Understanding gene-interaction networks and the role of non-coding DNA and RNA are important priorities.


Domain 3: Understanding the biology of disease. Ambitious questions about genes, the environment and epigenetic factors in disease causation need to be addressed, including both germline and somatic DNA changes. Genetic variation must be understood. Better tools for annotating genetic variants and relating these to the relevant phenotypes are needed. Research ethics and oversight including the type of consent suited to genomics remain challenges.

Domain 4: Advancing the science of medicine. Genetic DNA testing will streamline diagnostics and allow new classifications of cancer based on DNA changes. Pharmacogenetics and pharmacogenomics will reinvigorate the drug delivery pipeline, reduce the number of drug side effects and improve efficacy of drug treatments. The value added effects of genomic science include better evidence, more transparency leading to an informed public and greater access to all. Prevention rather than treating established disease assumes a higher profile.

Domain 5: Improving the effectiveness of healthcare. The key issues with the last domain are the importance of the electronic health record to handle genomics data, the demonstration of effectiveness and education of the broader community to ensure there is maximum engagement. Novel methods for healthcare delivery will be needed.

to delivering personalized healthcare across a broad spectrum from basic research ↔ clinical research ↔ the clinic or bedside ↔ individuals and families ↔ community. Discussion includes how governments and professional organizations might respond in terms of policy and regulations as well as the ethical, legal and social issues that must also include how developments in molecular medicine are accessible to all in the global community.

References

[1] Ussery DW. DNA structure: A-, B- and Z-DNA Helix Families. In: Encyclopedia of Life Sciences. Chichester: John Wiley & Sons, Ltd.; 2002.
[2] Animation of DNA replication from Howard Hughes Medical Institute BioInteractive site. www.hhmi.org/biointeractive//animations.html
[3] Kim E, Magen AL, Ast G. Different levels of alternative splicing among eukaryotes. Nucleic Acids Research 2007;35:125–31.
[4] Ohshima K, Okada N. SINEs and LINEs: symbionts of eukaryotic genomes with a common tail. Cytogenetics and Genome Research 2005;110:475–90.
[5] Database of genomic variants. http://projects.tcag.ca/variation/
[6] SNP database. www.ncbi.nlm.nih.gov/SNP/
[7] Zhang F, Gu W, Hurles ME, Lupski JR. Copy number variation in human health, disease and evolution. Annual Review of Genomics and Human Genetics 2009;10:451–81.
[8] Feero WG, Guttmacher AE, Collins FS. Genomic medicine – an updated primer. New England Journal of Medicine 2010;362:2001–11. [A larger number of comparative genome sizes may also be found in http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/G/GenomeSizes.html which is part of an online biology textbook written by John W Kimball. A comprehensive list is found in Wikipedia http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes].
[9] Taft RJ, Pheasant M, Mattick JS. The relationship between non-protein-coding DNA and eukaryotic complexity. BioEssays 2007;29:288–99.
[10] Szymanski M, Barciszewski J. Noncoding RNAs in biology and disease. In: Encyclopedia of Life Sciences. Chichester: John Wiley & Sons, Ltd.; 2009.
[11] Ghildiyal M, Zamore PD. Small silencing RNAs: an expanding universe. Nature Reviews Genetics 2009;10:94–108.
[12] Taft RJ, Pang KC, Mercer TR, Dinger M, Mattick JS. Non-coding RNAs: regulators of disease. Journal of Pathology 2010;220:126–39.
[13] The microRNA database. www.mirbase.org/
[14] Croce CM. Causes and consequences of microRNA dysregulation in cancer. Nature Reviews Genetics 2009;10:704–14.
[15] Calado RT, Young NS. Telomere diseases. New England Journal of Medicine 2009;361:2353–65.
[16] Villar AJ, Epstein CJ. Down syndrome. In: Encyclopedia of Life Sciences. Chichester: John Wiley & Sons, Ltd.; 2005.
[17] Tease C, Hulten MA. Meiosis. In: Encyclopedia of Life Sciences. Chichester: John Wiley & Sons, Ltd.; 2006.
[18] Nigg EA. Mitosis. In: Encyclopedia of Life Sciences. Chichester: John Wiley & Sons, Ltd.; 2001.
[19] International HapMap Project. http://hapmap.ncbi.nlm.nih.gov/index.html.en
[20] International Human Variome Project. www.humanvariomeproject.org/
[21] International Cancer Genome Consortium. www.icgc.org/
[22] 1 000 Genome Project. www.1000genomes.org/page.php?page=home
[23] ENCODE project. www.genome.gov/10005107
[24] Human Epigenome Project. www.epigenome.org/index.php
[25] The human genome at ten. Nature 2010;464:649–50.
[26] Der Spiegel interview with Craig Venter. www.spiegel.de/international/world/0,1518,709174,00.html
[27] Alexander RP, Fang G, Rozowsky J, Snyder M, Gerstein MB. Annotating non-coding regions of the genome. Nature Reviews Genetics 2010;11:559–71.
[28] US President’s Council of Advisors on Science and Technology report 2008 on personalized medicine. www.whitehouse.gov/files/documents/ostp/PCAST/pcast_report_v2.pdf
[29] Tian Q, Price ND, Hood L. Systems cancer medicine: towards realization of predictive, preventive, personalized and participatory (P4) medicine. Journal of Internal Medicine 2012;271:111–21.
[30] UK’s House of Lords report on genomic medicine. www.publications.parliament.uk/pa/ld200809/ldselect/ldsctech/107/107i.pdf
[31] Marchant GE, Campos-Outcalt DE, Lindor RA. Physician liability: the next big thing for personalized medicine? Personalized Medicine 2011;8:457–67.
[32] Hamburg MA, Collins FS. The path to personalized medicine. New England Journal of Medicine 2010;363:301–4.
[33] Green ED, Guyer MS, National Human Genome Research Institute. Charting a course for genomic medicine from base pairs to bedside. Nature 2011;470:204–13.

Note: All web-based references accessed on 7 Feb 2012.
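
Worked example: sequencing coverage. The coverage figures quoted in this chapter for NG DNA sequencing (about 30-fold for a typical genome, about 4-fold per individual in the 1 000 Genome Project) can be put into rough numbers. The Python sketch below is an illustration only and is not taken from the cited references. It assumes that reads fall uniformly at random along the genome, so that the depth at any base follows a Poisson distribution whose mean is the coverage. The function names and the threshold of 10 reads are hypothetical choices made for this example.

```python
import math


def fraction_uncovered(mean_coverage: float) -> float:
    """Expected fraction of bases with zero reads (Poisson assumption)."""
    return math.exp(-mean_coverage)


def prob_depth_at_least(mean_coverage: float, min_reads: int) -> float:
    """Probability that a base is covered by at least min_reads reads."""
    # Sum the Poisson probabilities of seeing 0 .. min_reads - 1 reads.
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


def pooled_coverage(per_individual_coverage: float, n_individuals: int) -> float:
    """Total depth at a site when reads from many individuals are combined."""
    return per_individual_coverage * n_individuals


if __name__ == "__main__":
    for coverage in (4, 30):
        print(
            coverage,
            fraction_uncovered(coverage),
            prob_depth_at_least(coverage, 10),
        )
    # Hypothetical pool of 100 individuals, each sequenced at 4-fold coverage.
    print(pooled_coverage(4, 100))
```

Under these simplifying assumptions about 2% of bases receive no reads at 4-fold coverage, whereas at 30-fold coverage almost every base is read at least ten times. Pooling is simple multiplication, which illustrates how combining results from many individuals can offset low coverage per person. Real sequencing data depart from the Poisson model (for example in regions that are difficult to sequence), so the numbers are indicative only.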

CHAPTER 2 Genes, Environment and Inheritance

OUTLINE

Introduction
Mendelian Genetic Inheritance
  Gene Discovery
  Autosomal Dominant Disorders
  Autosomal Recessive Disorders
  X-Linked Disorders
Other Forms of Genetic Inheritance
  Gene-gene Interactions
  Uniparental Disomy
  Mosaicism, Chimerism
  Mitochondrial Inheritance
  Chromosomal Disorders
Complex Genetic Inheritance
  Common Health Issues
  Gene Discovery
Epigenetic Inheritance
  Epigenetic Modifications
  Clinical Relevance
  Imprinting
Somatic Cell Genetics
  Single Gene Somatic Disorders
  Complex Somatic Disorders
References

INTRODUCTION

Genetic diseases can be classified by their mode of inheritance, into autosomal dominant, autosomal recessive and X-linked. A molecular classification is different, and would include:

• Mutations in single genes
• Complex genetic abnormalities
• Gene-gene (G × G) interactions
• Gene-environment (G × E) effects
• Epigenetic changes
• Non-heritable DNA mutations in somatic cells.

The term genetics when applied to a condition implies that it is inherited, i.e. the underlying mutation is found in germ cells. This is also known as germline or constitutive genetics. However, genetics has become more complex, with the knowledge that DNA changes are found in a number of non-inherited (sporadic) disorders. These acquired mutations in DNA, which affect only the somatic (non-germ) cells, are found in a range of both solid and hematologic cancers. Since DNA is involved, these have a genetic component in their pathogenesis, although the abnormality is not heritable.

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00002-5 © 2012 Elsevier Inc. All rights reserved.

An area of contention particularly for normal traits and the more common medical disorders is the relative contributions of genetic effects and the environment to the final result (or phenotype – defined below). This chapter starts with predominantly genetic-based disorders, followed by examples representing genetic/environmental interactions and concludes with the environment being the major contributor to the clinical picture.

MENDELIAN GENETIC INHERITANCE

Mendelian Inheritance in Man is a compendium of human genes and genetic disorders that has evolved into an encyclopedia of gene loci. The first edition was published in 1966 with a total of 1 487 entries. In late 2011, it had 20 699 entries. In 1987, the compendium became available online as OMIM (Online Mendelian Inheritance in Man) [1]. Current entries in OMIM relate to autosomal disorders in 94% of cases, X-linked in 6%, Y-linked disorders in 0.3% and mitochondrial DNA related disorders in 0.3%. The majority of entries involve single genes – which are the topic of the next section. They have as their main feature a highly penetrant genetic effect responsible for the underlying phenotype. Because of this marked genetic effect it is possible to follow the disease through a family by drawing a pedigree (or family tree).

Terminology used in this chapter includes the following:

Different forms of a gene at a locus are called alleles.
The haplotype refers to a set of closely linked DNA markers at one locus which are inherited as a unit.
The genotype is the genetic (DNA) make-up of an organism. In the present context, genotype would also refer to the genetic constitution of alleles at a specific locus, i.e. the two haplotypes.
The phenotype reflects the recognizable characteristics determined by the genotype and its interaction with the environment.
A phenocopy is an environmentally induced phenotype that can resemble one associated with a genetic disorder.
An individual is homozygous if both alleles at a locus are identical and heterozygous if the alleles are different.
Autosomal inheritance involves traits that are encoded by the 22 pairs of human autosomes.
X-linked inheritance refers to genes located on the X chromosome.
The products of both normal (wild-type) alleles at a particular locus need to be non-functional in a recessive disorder, e.g. hemochromatosis.
On the other hand, a dominant disorder results if only one of the two wild-type alleles is mutated, e.g. Huntington disease.

The study of twins has proven useful in estimating the relative contributions of genetic versus environmental effects in normal traits or diseases. Monozygotic (MZ) or identical twins develop following the division of a single fertilized ovum. Therefore, each twin starts with the same DNA content, although they may not be exactly 100% identical genetically because of post-zygotic changes in the DNA including epigenetic effects that can influence gene expression. In contrast, dizygotic (DZ) twins result from the fertilization of two ova by different sperm. Thus, on average, DZ twins share half of their nuclear genes – which is comparable to non-twin siblings. Generally, the environment shared by DZ and MZ twins is similar. Therefore, twins are a popular model for assessing the relative contributions of genes and environment in disease. This approach can be illustrated in dementia research. Concordance (both twins are affected

or unaffected) has shown that about 59% of MZ twins will both develop late onset Alzheimer disease. In contrast, the same risk for DZ twins is around 24% for different sex and 32% for same sex. These types of studies suggest that the genetic contribution to late onset Alzheimer disease is about 50% [2].

Gene Discovery

Until the mid-1980s, the approaches to understanding genetic disease relied on the identification, and then characterization, of an abnormal protein. This was taken a step further with molecular medicine, when it became possible to use a protein to clone the relevant gene. More information could then be obtained about the underlying genetic disorder from the cloned gene. This was called functional cloning and is how the factor VIII (FVIII) or hemophilia A gene was found. However, the identification of an abnormal protein was not always easy or indeed possible. For example, the genetic disorder Huntington disease was first described in 1872, and over 100 years later no abnormal protein had been found.

In the late 1980s, an alternative approach to the study of genetic disease became possible. This bypassed the protein and enabled direct isolation of genes on the basis of their chromosomal location. Gene(s) at this locus were next identified. The causative gene was found by showing it had mutations that co-segregated with the disease. This strategy was called reverse genetics. Subsequently, the name was changed to the more appropriate one of positional cloning. The first success stories involving positional cloning for human genes came in 1986 with the isolation of the gene for chronic granulomatous disease by S Orkin and colleagues, and in 1987, with the Duchenne muscular dystrophy gene isolated by L Kunkel and colleagues. Successes were slow to follow at first but, by the mid-1990s, it became difficult to keep up with the number of new genes being discovered.

A variation of positional cloning enabled genes to be identified on the basis that they were candidates for genetic disorders rather than their positions on a chromosome. In other words, prior knowledge of the gene’s function suggested it was worthwhile looking further at this candidate as there was a strong likelihood that it would be involved in the genetic disorder. This provided a short-cut to gene discovery but required information about likely genes that might be involved. The steps in positional cloning are described in more detail below.

Chromosomal Location

The first step in positional cloning is to find a likely locus or chromosome involved. This can come from case reports or observations in which chromosomal rearrangements have been shown to occur in association with the clinical picture. More commonly, it is necessary to undertake a linkage analysis. This requires:

1. A large family in which there are a number of known affected individuals and/or confirmed normal individuals;
2. DNA polymorphisms, usually the microsatellites (Chapters 1, 3, 9), that are used to study family members and attempt to link a phenotype with a DNA polymorphism, and
3. Analytic software that calculates the probability that the co-inheritance of a DNA polymorphism and a clinical or laboratory phenotype is a chance event or is due to a causative gene being present close to the polymorphism and so being inherited along with it.

The probability measured in linkage analysis is usually defined as a LOD score (Logarithm (base 10) of ODds). The LOD score calculates the likelihood of obtaining the results if the two loci are linked, compared to the likelihood of observing the same data purely by chance. Positive LOD scores favor the presence of linkage, whereas negative LOD scores indicate that linkage is less likely. By convention, a LOD

MOLECULAR MEDICINE 42 2. Genes, Environment and Inheritance score of 3 (1 000 to 1 odds) is highly sugges- be reduced even further so it can be intensively tive of linkage. Once linkage is established, the studied (Table 2.1). locus can be determined, since only DNA poly- morphisms from known loci are used. The next Gene Confirmation step is to study the locus in more detail by fine To find the correct gene in the region of inter- mapping around this region to detect potential est, DNA sequence data are entered into the causative genes. Linkage analysis is most use- various DNA and protein databases. Software ful in identifying rare alleles that have strong programs enable searches to be made that effects, i.e. Mendelian-type single gene disor- compare sequences in the databases with the ders [3,4]. recently discovered gene. Three outcomes of this search are possible: Genetic and Physical Maps These maps allow the chromosomal locus 1. A perfect match – this is bad luck because to be mapped and narrowed in distance to the the gene has already been found! point where individual genes can be identi- 2. No match – the gene is novel but there is no fied. The genetic map is made by looking at clue as to what it might do and considerable DNA polymorphisms within affected families. work is needed to determine its function, and The closer the polymorphism is to the gene, 3. Some homology (i.e. similarity) is found to the fewer will be the recombinations (breaking another entry in the database. This is the best and rejoining of the DNA) that are observed. result since the gene is still novel, and a clue Eventually, a polymorphism associated with to its function can come from the gene in the the gene itself will produce no recombination database with which it shares some DNA events. In contrast, a physical map is based on sequence. An example of how positional actual measurements, e.g. 
Kb (kilobase) or Mb cloning has allowed an important gene to be (megabase), and allows the region of interest to found is the work on cystic fibrosis (Box 2.1).
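As a rough illustration of the LOD score described under Chromosomal Location, the sketch below computes a two-point LOD score for the simplest case: phase-known meioses in which each transmission can be scored as recombinant or non-recombinant. The function name `lod_score` and the example counts are hypothetical. Real linkage software also deals with unknown phase, missing genotypes and incomplete penetrance, none of which is modelled here.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score, assuming phase-known, fully informative meioses.

    Compares the likelihood of the observed data at recombination
    fraction `theta` with the likelihood under no linkage (theta = 0.5).
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    non_recombinants = meioses - recombinants
    # log10 likelihood of the data if the loci are linked at theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # log10 likelihood of the same data if the loci are unlinked
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical family data: 1 recombinant observed in 20 meioses.
print(f"LOD at theta = 0.05: {lod_score(1, 20, 0.05):.2f}")
```

In practice the score is evaluated over a range of theta values and the maximum is reported; a value at or above the conventional threshold of 3 would then be read as described in the text.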

TABLE 2.1 Genetic and physical maps help to assign genes, DNA fragments or polymorphisms to particular locations on chromosomes [4].

Genetic map | Physical map

Provides chromosomal assignment of a gene and its relative position to other genetic markers or genes. | Adds to data obtained from genetic maps and provides more accurate locations of genes and other genetic markers.

Calculated by family studies in humans and crossover studies with laboratory animals such as mice. The concept of synteny is helpful, i.e. synteny describes the co-location of two genes on the same chromosome. In some cases the co-location is sufficiently close that the genes are inherited together. | Estimated experimentally by molecular and cellular techniques but not family studies, genetic crosses or polymorphisms.

Linkage analysis is the method used to determine genetic map distance between genes or DNA markers. It is calculated by assessing the frequency of recombination between two polymorphic loci on a chromosome. The statistical way to report linkage is via a LOD score (Chapter 3). | Cytogenetics especially FISH, concordance cloning between two genes, PCR based techniques provide evidence of physical distances. Ultimately, the most accurate measure of physical distance will come from whole genome sequencing (Chapter 4).

Unit is a morgan. 0.01 morgan (M) = 1 centimorgan (cM) and is represented by approximately 1% recombination. | Unit is the Kb or Mb etc (Chapter 1). 1 cM is approximately 1 Mb of DNA.
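The units in Table 2.1 can be related with simple arithmetic. The sketch below turns a recombination count into an approximate genetic distance, using the table's rule that 1% recombination is about 1 cM, and then into a rough physical distance using the table's approximation of 1 cM to 1 Mb. The function names and the counts are hypothetical. Both conversions are rules of thumb only: over larger distances multiple crossovers make the observed recombination fraction underestimate map distance, and the ratio of cM to Mb varies across the genome.

```python
def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Observed fraction of meioses in which two loci recombined."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses > 0")
    return recombinants / meioses


def approx_centimorgans(theta: float) -> float:
    """Rule of thumb from Table 2.1: 1% recombination is about 1 cM."""
    return theta * 100.0


def approx_megabases(centimorgans: float, mb_per_cm: float = 1.0) -> float:
    """Rule of thumb from Table 2.1: 1 cM is about 1 Mb (assumed average)."""
    return centimorgans * mb_per_cm


# Hypothetical data: 3 recombinants among 100 informative meioses.
theta = recombination_fraction(3, 100)
distance_cm = approx_centimorgans(theta)
print(f"theta = {theta:.2f}, about {distance_cm:.0f} cM, "
      f"roughly {approx_megabases(distance_cm):.0f} Mb")
```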


BOX 2.1 POSITIONAL CLONING IN CYSTIC FIBROSIS [1].

Initial attempts at chromosome localization in cystic fibrosis were unsuccessful. This delayed isolation of the gene since a trial and error approach was needed to determine which DNA polymorphic markers would co-segregate with the disease. In 1985, linkage of cystic fibrosis to DNA markers on chromosome 7q31 was demonstrated. Subsequently, the distance was narrowed and likely candidate genes within this region identified. Clues which suggested which was the cystic fibrosis gene included:

1. Conservation of DNA sequence across a number of species, i.e. the gene carried out an important function, and
2. mRNA was present in tissues connected with cystic fibrosis, i.e. lung, pancreas, intestine, liver and sweat glands.

The CFTR gene was found in 1989. Its genomic structure extended over 250 Kb of DNA. The mRNA transcript was 6.5 Kb in size. The protein encoded by CFTR had similarity to a family of membrane-associated, ATP-dependent, transporter proteins involved in the active transport of substances across membranes. It was subsequently proven that CFTR codes for a chloride ion channel. This completed the cystic fibrosis story as clinically it was considered for many years that the disorder was caused by an abnormality in salt (sodium chloride).

Candidate Genes and in silico Positional Cloning

An alternative way to establish a location is to identify candidate genes because they are likely to be involved in a particular disease. For example, in familial hypertrophic cardiomyopathy, a disease of heart muscle, it is reasonable to speculate that muscle genes will be important, particularly if they are expressed in the heart. Therefore, a plausible candidate gene for this disorder would be the cardiac β-myosin heavy chain gene located on chromosome 14q12. If it could be established that a DNA polymorphic marker associated with this candidate gene co-segregated with affected individuals in a familial hypertrophic cardiomyopathy family, then there is good evidence that the disease locus is on chromosome 14q. In the example described the candidate gene is also likely to be the disease-causing gene. The candidate gene approach has become increasingly attractive as DNA sequencing becomes faster and cheaper (Chapter 4). DNA databases can also be searched by computer (in silico) to identify what genes are there. These are then studied to look for mutations in affected individuals. The in silico step avoids the tedious and very time consuming construction of physical and genetic maps.

Autosomal Dominant Disorders

The characteristic feature in a pedigree of autosomal dominant inheritance is a vertical mode of transmission. This occurs because the disorder can appear in every generation of the pedigree. Both males and females are affected and offspring are at 50% risk (Figure 2.1). There are a number of additional features that need to be considered when dealing with autosomal dominant disorders. They are important for understanding this type of inheritance and for counseling (Table 2.2).


FIGURE 2.1 Pedigrees depicting Mendelian inheritance. (a) Autosomal dominant, (b) Autosomal recessive and (c) X-linked inheritance. Affected individuals are indicated by filled and carriers by half-filled circles or squares. Carriers for X-linked disorders have a dot.

TABLE 2.2 Some features of autosomal dominant disorders.

Sporadic cases occur and become increasingly more common as the mutation interferes with fertility. For example, mutations in unrelated families with X-linked Duchenne muscular dystrophy are usually of independent origins because affected individuals are unlikely to survive to a reproductive age. In contrast, Huntington disease does not have a direct effect on reproduction. Thus, sporadic cases of Huntington disease are rare.

Penetrance describes the clinical expression of a mutant gene in terms of its presence or absence at a stated age, i.e. an individual carrying a mutant gene may not express the clinical phenotype and so the condition is described as being non-penetrant. Penetrance can be determined from: (1) Family studies if it is possible to identify the number of obligatory heterozygotes for a mutant allele. Thus, if seven out of 10 show the clinical phenotype, the disorder is 70% penetrant, i.e. there is 70% probability that an individual carrying a mutant gene at a certain age will display the clinical phenotype. (2) The number of individuals with a DNA mutation who manifest the disease at a particular age. Apart from spontaneous mutations and death before onset of symptoms, penetrance is an additional explanation for affected offspring having an apparently normal parent.

Expressivity and pleiotropy. Expressivity refers to severity. There are genes that can produce apparently unrelated effects on the phenotype or act through involvement of multiple organ systems. This is called pleiotropy. Such genes often show variable expressivity. An example of this is Marfan syndrome which has autosomal dominant inheritance and involves connective tissues in the skeletal system, the eye or the heart. Individuals with Marfan syndrome have any combination of manifestations that can also be present in different degrees of severity. Variability can occur within families in which it is presumed the same mutant allele is present. The basis for expressivity is not known but may represent gene/environment or gene/gene interactions. Somatic instability may be another explanation.
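The penetrance arithmetic in Table 2.2 can be sketched in a few lines. The example reuses the table's figures (seven of 10 obligatory heterozygotes showing the phenotype). The function names are hypothetical, and the second function rests on an added assumption: the affected parent is heterozygous, so each child has a one in two chance of inheriting the mutant allele.

```python
from fractions import Fraction


def penetrance(affected: int, obligate_heterozygotes: int) -> Fraction:
    """Proportion of obligatory heterozygotes showing the clinical phenotype."""
    if obligate_heterozygotes <= 0 or not 0 <= affected <= obligate_heterozygotes:
        raise ValueError("need 0 <= affected <= obligate_heterozygotes, total > 0")
    return Fraction(affected, obligate_heterozygotes)


def offspring_clinical_risk(p: Fraction) -> Fraction:
    """Chance a child of a heterozygous parent shows the phenotype.

    Assumption for illustration: autosomal dominant inheritance, so the
    chance of inheriting the allele is 1/2, scaled by the penetrance.
    """
    return Fraction(1, 2) * p


p = penetrance(7, 10)
print(f"Penetrance: {float(p):.0%}")
print(f"Risk that a child is clinically affected: {float(offspring_clinical_risk(p)):.0%}")
```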

Model

Huntington disease is a neurodegenerative disorder with autosomal dominant inheritance. The offspring of affected individuals have a 50% risk of inheriting the disease, which can present in various ways including a progressive movement disorder (typically chorea), psychological disturbance and dementia. Disease onset is usually between 35 and 45 years of age, and there is complete penetrance by the age of 80.

Although Huntington disease was described in 1872 by Dr George Huntington, a family physician (primary care physician), the next major advance did not occur until 1983. Before this, Huntington disease could not be definitively diagnosed early in its course. Those at risk had to wait until their mid adult life to see if they had inherited the abnormal gene, by which time reproduction and other life decisions had been made. Positional cloning for Huntington disease proved to be particularly difficult because there was neither a cytogenetic location nor a candidate gene identified. A trial and error approach was attempted, to find DNA polymorphisms linked to the Huntington disease phenotype. The success of this strategy would not have been possible without the large pedigrees that were identified in Venezuela. In 1983, a DNA marker located on chromosome 4p16.3 was found to co-segregate with Huntington disease. This showed that the disease gene was located on chromosome 4. From 1983, different genetic and physical mapping strategies were used to find the relevant gene. These succeeded in 1993 when a gene called IT15 (IT – interesting transcript 15) was isolated. The official gene name is now HTT. The expressed protein is called huntingtin.

The molecular defect in Huntington disease involves a novel mechanism, shown in 1991 to result from expansions of triplet nucleotide repeats. The first example of this was the fragile X syndrome (triplet repeat is CGG), followed by myotonic dystrophy (CTG triplet repeat) and then spinal and bulbar muscular atrophy (CAG triplet repeat). In Huntington disease, it was shown that there was a DNA triplet involving (CAG)n in the first exon. The normal number of repeats is 6–26 (Figure 2.2, Table 2.3). Expansions over 39 repeats are associated with the development of disease. Statistically it was also shown that the larger the number of repeats, the earlier the onset of the disorder.

[Figure 2.2 diagram: each repeat is drawn on a 5′ to 3′ gene schematic. Fragile X syndrome, (CGG) 10–50; Huntington disease, (CAG) 6–26; Myotonic dystrophy, (CTG) 5–30.]

FIGURE 2.2 DNA triplet repeats and neurological diseases [5]. The fragile X syndrome (CGG) repeat is in the 5′ flanking region of the gene. Normally there are about 10–50 repeats. Expansion beyond 200 repeats is associated with methylation (silencing) of the FMR1 gene, i.e. a loss of function. For Huntington disease, the (CAG)n triplet repeat (normal number of repeats ranges from 6–26) is located within the gene’s first exon. The repeat is a CAG which codes for glutamine. Therefore, adding more polyglutamines to this protein (called huntingtin) will interfere with its structure or function. Studies in humans and mouse models suggest that huntingtin has its deleterious effect through a gain of function. For myotonic dystrophy, the (CTG) repeat is located in the 3′ flanking region, and normally there are about 5–34 repeats. Mildly affected patients have 50–80 repeats while severely affected individuals have 2 000 or more repeats. How expansion in the number of repeats located at the 3′ non-coding region affects function of the myotonic dystrophy gene (DMPK) is not known. Repeat numbers between the normal values and those required to interfere with gene function represent premutations.

Another observation related to instability in the repeat numbers uncovered the possibility that repeats could expand or contract slightly when transmitted through sperm or ovum respectively. These observations explained the occasional presentation of Huntington disease in children or young adults (in which case, the CAG repeat is very high) and why cases of juvenile Huntington disease invariably inherited the mutant gene from their fathers. The concept of anticipation – i.e. the earlier onset and more severe phenotype as the mutant gene is passed through succeeding generations – could now be understood at the molecular level (Figure 2.3). There are a number of disorders in which triplet repeat expansion is the basis for increasing severity in subsequent generations. The sex of the transmitting parent can influence the instability of the triplet repeats. For example,


TABLE 2.3 Neurologic diseases caused by expansions of triplet (and other) repeats [5].

Disorder(a) (gene) | Repeat (n = abnormal) | Mode of inheritance(a) | Anticipation present

HD (HTT) | CAG (≥40) | AD | Yes
HD1 (JPH3) | CTG (41) | AD | Not sure
SCA1 (ATXN1) | CAG (39–91) | AD | Yes
SCA2 (ATXN2) | CAG 32 | AD | Yes
SCA3 (ATXN3) | CAG 52–86 | AD | Yes
SCA6 (CACNA1A) | CAG 20–33 | AD | No
SCA7 (ATXN7) | CAG 36 | AD | Yes, most unstable CAG
SCA8 (ATXN8OS) | CTG 80–150 with repeat in second gene | Complex as two repeats involved | Yes
SCA10 (ATXN10) | ATTCT repeat | AD | Yes, high repeat numbers
SCA12 (PPP2R2B) | CAG 51 | AD | No
SCA17 (TBP) | CAA/CAG repeat | AD | Not sure, complex because of linked repeat
FMR1 related disorders (FMR1) | CGG (usually >200) | XL | Yes, also has abnormal methylation FMR1 gene
DRPLA (ATN1) | CAG (48–93) | AD | Yes
DM1 (DMPK) | CTG (>34) | AD | Yes
DM2 (CNBP) | CCTG (75) | AD | No
SBMA (AR) | CAG (35) | XL | Not sure, mild if occurs
FA (FXN) | GAA (66) but interpretation is difficult | AR | No

(a) HD – Huntington disease; SCA – spinocerebellar ataxia; FMR – fragile X mental retardation; DM – myotonic dystrophy; DRPLA – dentatorubral and pallidoluysian atrophy; SBMA – spinal and bulbar muscular atrophy (Kennedy syndrome); FA – Friedreich ataxia; AD – autosomal dominant; XL – X-linked; AR – autosomal recessive.

instability in the Huntington disease CAG repeat is increased if the transmitting parent is male, but it is the female parent who presents this risk in myotonic dystrophy.

Translation into Clinical Practice

Once the triplet repeat in Huntington disease could be sized, DNA testing became possible in two circumstances: (1) To confirm a clinical diagnosis, and (2) To predict the likely development of this disorder in family members who were at risk (Box 2.2). In this scenario an expanded repeat of 40 or more had a 100% probability of Huntington disease, and a repeat of 26 or fewer excluded this disorder. However, intermediate-sized repeat expansions of 27–35 or 36–39 required more careful consideration (Table 2.4). Intermediate repeats are now considered to be premutations which, when expanded, will lead to Huntington disease in future generations. The concept of a premutation has helped to explain cases of apparently sporadic


[Figure 2.3 diagram, three successive stages: Minimal (cataract, onset >50, minimal muscle disease); Classic (myotonia, onset >20, muscle disease); Congenital (hypotonia, onset birth).]

FIGURE 2.3 Anticipation. Myotonic dystrophy is an autosomal dominant, multi-system disorder which is the most common form of adult muscular dystrophy. A feature is variable expressivity including a very severe congenital form. Molecular characterization has now explained the phenomenon of anticipation seen in myotonic dystrophy. The diagram illustrates the increasing severity and earlier onset of symptoms expected in anticipation. A corresponding expansion in the myotonic dystrophy (CTG)n triplet as it is passed through the female germline would parallel the clinical changes.

BOX 2.2 PREDICTIVE MEDICINE.

An advantage of DNA over conventional pathology tests is the ability to make predictions, since mutations in DNA can be detected before signs or symptoms develop. DNA predictive testing is described in a number of ways including presymptomatic DNA testing or susceptibility DNA testing (Table 3.7). For convenience, the term predictive DNA testing will be used to include all three types. From 1983, using DNA polymorphisms linked to the Huntington disease locus, it became possible to undertake predictive testing within the confines of a family unit, i.e. a linkage study (Figure 3.9). Individuals with a family history of Huntington disease now had an opportunity to alter their a priori risks by DNA studies. Once the Huntington disease gene was found, DNA predictive testing utilized direct mutation detection, and family studies were no longer needed. DNA testing for the gene mutation also became a new option to assist physicians in the differential diagnosis of a neurological disorder, e.g. gait disturbances or dementia. This type of DNA testing (called diagnostic testing) is different to a predictive test because the patient has established signs or symptoms of the disorder (Chapter 3). Important issues were to emerge from the Huntington disease predictive testing programs including:

1. Comprehensive clinical, counseling and support facilities were necessary in a predictive testing program and these had major resource implications, and
2. In some instances DNA tests placed further stress on individuals and/or their families because they were able to show who would get Huntington disease and who would be spared.

The potential ethical, legal and social issues (ELSI) resulting from DNA testing are discussed in Chapter 10.
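The repeat-size bands used when Huntington disease testing is reported (Table 2.4) amount to a simple lookup, sketched below. The function name is hypothetical and the wording of each band is abbreviated from the table. The outer boundaries (26 or fewer, 40 or more) are an assumption inferred from the text's statements that the normal range is 6–26 and that expansions over 39 repeats are associated with disease. This is an illustration of the table's logic, not a reporting tool.

```python
def interpret_htt_cag(repeats: int) -> str:
    """Map an HTT (CAG)n repeat count to the bands given in Table 2.4."""
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats <= 26:
        return "Normal"
    if repeats <= 35:
        return "Normal, but risk that offspring will develop Huntington disease"
    if repeats <= 39:
        return ("Associated with Huntington disease, possibly with reduced "
                "severity; some individuals might not develop the disease")
    return "Huntington disease"


# Hypothetical repeat counts, one from each band.
for n in (20, 30, 38, 45):
    print(n, "->", interpret_htt_cag(n))
```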


TABLE 2.4 Interpretation of (CAG)n repeat numbers in Huntington disease [5,6].

Number (n) | Interpretation of phenotype

≤26 | Normal
27–35 | Normal but there is risk that offspring will develop Huntington disease
36–39 | This is associated with the Huntington disease phenotype but there is the potential for reduced severity. Some with these repeat numbers might not develop Huntington disease. There is the chance that offspring will develop Huntington disease.
≥40 | Huntington disease

Huntington disease, in which there was no family history. In these circumstances, parents who were able to be tested invariably demonstrated that one of them, usually the father, had a triplet repeat size in the intermediate range.

Autosomal Recessive Disorders

The appearance of an autosomal recessive disorder in a pedigree gives rise to a horizontal rather than a vertical pattern. This occurs because affected individuals tend to be limited to a single sibship and the disease is not usually found in multiple generations (Figure 2.1). Males and females are both affected. Consanguinity can be present in some families. The usual mating pattern that leads to an autosomal recessive disorder involves two heterozygous individuals who are clinically normal. From this union, there is a one in four (25%) chance that each offspring will be homozygous-normal, and the same chance that each will be homozygous-affected, for that trait or mutation. There is a two in four (50%) chance that offspring will themselves be carriers (heterozygotes) for the trait or mutation. The same risks apply to each pregnancy.

The inheritance patterns described may not be apparent, particularly in communities where the numbers of offspring are few. In these instances, the genetic trait or mutation can appear to be sporadic in occurrence. Therefore, the finding of a negative family history in the autosomal recessive disorders should not be ignored, since the genetic defect can still be transmitted to the next generation, particularly if the mutant gene occurs at a high frequency in a population – e.g. cystic fibrosis is usually only found in Europeans, with about 1 in 25 being carriers.

Model

Iron overload can be acquired or genetic (Table 2.5). Genetic hemochromatosis is an autosomal recessive disorder of iron metabolism. In some populations, carrier frequency can be as high as 1 in 8 with the highest incidences found in populations with a Celtic background (Ireland, Wales and other regions in the world where there has been migration from Ireland). Clinical features range from non-specific symptoms such as lethargy or arthralgia to more florid but less common presentations including diabetes mellitus, liver disease, and generalized pigmentation. Life threatening complications are cardiomyopathy and hepatocellular carcinoma. Early diagnosis and a relatively simple treatment via venesection can prevent disease progression and tissue damage [7].

The term hemochromatose was first used by F Recklinghausen in 1886, and in 1935 J Sheldon suggested that hemochromatosis was a genetic disorder. The next important discovery occurred in 1996, when the gene was isolated by positional cloning. The common genetic form of hemochromatosis is caused by mutations in the gene HFE which codes for a protein that has some features of the HLA class I molecules. Hence the HFE gene was originally named HLA-H, but this was soon changed when it became apparent that the gene was not part of the HLA complex. The HFE gene codes for a protein that binds β2 microglobulin (like other MHC class I molecules) and interacts with the transferrin receptor 1.
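The offspring risks quoted for autosomal recessive inheritance can be checked by enumerating the four equally likely allele combinations from two heterozygous parents. The sketch below does this and then combines the result with a population carrier frequency, such as the 1 in 25 quoted for cystic fibrosis. The function names and the allele symbols ("A" normal, "a" mutant) are hypothetical, and the second calculation assumes random mating between unrelated partners with no family history.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Probability of each offspring genotype (a Punnett square by enumeration).

    Each parent is a two-letter genotype; every pairing of one allele from
    each parent is taken to be equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def affected_birth_risk(carrier_frequency: Fraction) -> Fraction:
    """Chance a random couple has an affected child (assumes random mating)."""
    both_carriers = carrier_frequency ** 2
    return both_carriers * offspring_genotypes("Aa", "Aa")["aa"]


for genotype, probability in offspring_genotypes("Aa", "Aa").items():
    print(genotype, probability)
print("Affected births at a 1 in 25 carrier frequency:",
      affected_birth_risk(Fraction(1, 25)))
```

The same function can be called with other parental genotypes, for example a carrier and a non-carrier, to show that no affected offspring are then expected.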


TABLE 2.5 Causes of iron overload apart from the common HFE (type 1) hemochromatosis [7].

Type | Classification | Comments

Genetic | Autosomal recessive and autosomal dominant forms | Type 2. Juvenile form; earlier onset iron overload; autosomal recessive inheritance; mutations in two genes hemojuvelin (HJV) or hepcidin antimicrobial peptide (HAMP). Severe, rare disorder.
 | | Type 3. Due to mutations in the transferrin receptor 2 gene (TFR2). Autosomal recessive but very rare with a phenotype similar to mutations in HFE.
 | | Type 4. Due to mutations in ferroportin (SLC40A1) gene; autosomal dominant. Most common form after HFE hemochromatosis.
 | | Other genetic causes for iron overload are extremely rare and include aceruloplasminemia, transferrinemia, neonatal hemochromatosis and H-ferritinemia.
Acquired | Hematologic disease | Thalassemias, sideroblastic anemias, chronic hemolytic anemias.
 | Dietary, parenteral | Included here is iatrogenic caused by long term blood transfusions.
 | Chronic liver disease | Alcohol, fatty liver disease, porphyria cutanea tarda.
 | Miscellaneous | African iron overload, other rare conditions.

Three common mutations are found in hereditary hemochromatosis:

1. p.Cys282Tyr (also described as Cys282Tyr or C282Y) which means at amino acid position 282 a cysteine is replaced with a tyrosine;
2. p.His63Asp (H63D or His63Asp), i.e. histidine is replaced by aspartic acid at position 63, and
3. p.Ser65Cys (S65C or Ser65Cys), i.e. serine is replaced by cysteine at position 65.

The p.Cys282Tyr defect alters the ability of the HFE protein to bind to β2 microglobulin – which is essential for its subsequent interaction with the transferrin receptor 1. This leads to an increase in cellular iron accumulation; however, this is not the complete picture as HFE also plays a role in intestinal iron absorption and interacts with the iron regulator hepcidin. The other two mutations in HFE have no effect on transferrin receptor 1 and their modes of action remain poorly understood. Apart from being homozygous for the p.Cys282Tyr mutation, the only other confirmed genetic risk is a double heterozygote for H63D/C282Y, although this combination shows considerably less iron overload.

The p.Cys282Tyr mutation is thought to have arisen spontaneously a limited number of times and then spread throughout the world. It has been suggested that the migratory patterns of the Vikings would explain the distribution of p.Cys282Tyr in northern Europe. The common carrier frequency for this mutation implies the possibility of some type of evolutionarily selective advantage. For hereditary hemochromatosis, it has been proposed that women who were carriers had a reproductive advantage since they would be less likely to be iron deficient (a common problem in women, particularly if there is malnourishment). This does not explain the carrier frequency in males although an evolutionary advantage would come from having some resistance to iron deficiency which could strengthen their immunity to infections. Other forms of hereditary hemochromatosis must exist, particularly in southern Europeans, where p.Cys282Tyr is less common. These are broadly called non-HFE hemochromatosis and relevant genes are now starting to be identified (Table 2.5).


Predisease disorder. An explanation for this is that loss of As demonstrated earlier, it is possible with a blood through menstruation is protective for DNA test to look for a mutation and so predict women, and so women prior to the menopause whether an individual will develop a genetic have a much lower risk of having the disease. disorder in the future. The only variable is how Therefore, the distinction between hereditary certain is the prediction. The Huntington dis- hemochromatosis and clinical hemochromato- ease model has shown predictive testing that sis is important. One study has shown about is very accurate, and able to determine many 28% of males homozygous for p.Cys282Tyr years in advance whether Huntington disease develop iron overload related disease, while for would develop. In these circumstances, the women the number is much lower at 1.2% [8]. DNA test converts a patient who is at risk into On the other hand, an individual with clinical an individual who has a predisease – i.e. a defi- hemochromatosis and the appropriate ethnic nite risk, because there is a mutation despite background is likely to be homozygous for this the patient being well and asymptomatic. mutation and further confirmatory tests such as However, risk is a difficult concept for both a liver biopsy might not be required. patients and health professionals; for exam- ple with breast cancer caused by mutations in DNA Screening BRCA1 and BRCA2 genes it is around 36–85% Screening for hemochromatosis can be (Chapters 3, 7). phenotypic (using biochemical markers such The term predisease may seem inappropriate as ferritin or transferrin) or genotypic (using to some. However, its usefulness lies in its abil- genetic DNA testing). The choice of approach ity to identify a problem at a very early stage in remains problematic. 
It is difficult to screen its development, with the expectation that pre- biochemically, and when the ferritin is raised, ventive interventions or earlier treatment will some damage may already have been caused. be more effective in delaying or avoiding long A raised ferritin level does not distinguish term consequences. The concept of predisease genetic from non-genetic causes. Nevertheless, is more subtly addressed by subdividing pre- the DNA test is more expensive, and progres- dictive tests into three groups: predictive, pre- sion from hereditary hemochromatosis to clini- symptomatic and predispositional (Table 3.7). cal hemochromatosis is unpredictable. In a However, these are confusing terms. It also multi-ethnic community, the p.Cys282Tyr test follows that if the result of the DNA test is will be less helpful. Because of these uncertain- not actionable (or has no clinical utility) there ties, there are no universal screening programs seems little point in doing it (discussed further underway, but the debate will continue since in Chapter 3). there is an effective and cheap treatment option available. Gene-Environment (GxE) Interactions An individual with hereditary hemochroma- X-Linked Disorders tosis has the genetic predisposition but there are environmental factors, and perhaps other X-linked disorders result from mutations genetic contributors, that will determine if there in genes on the X chromosome. Males are will be progression to clinical hemochromato- hemizygous because they only have one X chro- sis. An important environmental factor is sex – mosome and so will express fully an X-linked the male to female ratio for hemochromatosis disorder. On the other hand, females, who have is as high as 3:1 even though it is an autosomal two X chromosomes, will be carriers of the

MOLECULAR MEDICINE 2. Genes, Environment and Inheritance 51

defect in the majority of cases, and so they are usually asymptomatic. Although females have two X chromosomes to the male’s one, products from this chromosome are quantitatively similar in both sexes because one of the two X chromosomes in females is inactivated.

Lyonization (named after Mary Lyon) describes the random X inactivation of an X chromosome which occurs during embryonic development. Because of the early onset and randomness of the process, female carriers of X-linked disorders can demonstrate variable amounts of the gene product; namely a protein that will depend on the proportion of normal to mutant X chromosomes that remain functional. Most of the X chromosome is inactivated, although there are some segments that escape this process because there are comparable genes on the Y chromosome (Figure 2.4). The exact sequence of events in humans is not well understood, although it is considered that epigenetic changes are involved (discussed under Epigenetics). The initiation of X chromosome inactivation comes from a specific site on the X chromosome (the X-inactivation center) from where is produced the X (inactive) specific transcript by the XIST gene. Removal of this site prevents inactivation from occurring. Skewing of X inactivation can occur by chance, and in this rare event, a carrier female for an X-linked disorder will become a symptomatic carrier if the normal X is predominantly inactivated.

FIGURE 2.4 Human Y chromosome. Yp – short arm; Yq – long arm. PAR – pseudoautosomal region. The Y chromosome is small, gene poor with a lot of repetitive DNA. It also has two small pseudoautosomal regions at the ends (PAR1, PAR2). These recombine with genes mostly on the short arm of the X chromosome during meiosis. X inactivation involving the pseudoautosomal genes does not occur on the X chromosome because gene dose in males and females will be the same (unlike the majority of genes on the X chromosome). Most of the Y chromosome does not recombine and consists largely of repetitive DNA in the form of heterochromatin. [Diagram labels: pseudoautosomal region (PAR 1, PAR 2), Yp, centromere, male specific euchromatin, Yq, heterochromatin.]

The shape of a pedigree illustrating X-linked inheritance is shown in Figure 2.1. It has an oblique character through involvement of uncles and nephews related to the female consultand. The usual mating pattern involves a heterozygous female carrier and a normal male. Each son has a 50% risk of being affected through inheritance of the mutant maternal allele. Similarly, each daughter has a 50% chance of inheriting the mutant gene from her mother but will remain unaffected since she has her father’s normal X chromosome. Male to male transmission is not seen but may appear to occur if the trait is sufficiently common that by chance the mother also carries the mutant gene. An example of this would be glucose-6-phosphate dehydrogenase deficiency, with approximately 10–20% of African Americans being carriers or hemizygous for this defect. Just like autosomal dominant conditions, the frequency of spontaneous mutations in the X-linked disorders needs to be considered, particularly when counseling females who are potential carriers.

Model

Coagulation factors involved in hemostasis function as a cascade; i.e. the first activates

a second which then activates a third, and so on. In mammals, five proteases (Factor VII or FVII, Factor IX or FIX, Factor X or FX, protein C and prothrombin) interact with five co-factors (tissue factor, FV, FVIII, protein S and thrombomodulin) to generate fibrin. Deficiencies in these proteins lead to bleeding. Abnormalities in two of the above factors (FIX and FVIII) are well recognized, because hemophilia results (Table 2.6). FVIII and FIX (hemophilia A and B respectively) circulate as inactive precursors that become activated by a hemostatic challenge. FIX’s serine protease activity has an absolute requirement for FVIII. Activation of these two products in the presence of calcium and phospholipid forms the tenase complex which activates FX and sets off the final steps of coagulation leading to the deposition of fibrin. Because of the interacting effects of FVIII and FIX it is not surprising that the clinical features of hemophilia A and hemophilia B are identical. The FVIII and FIX genes are found on the X chromosome. Hence only males get hemophilia while females are carriers, unless they have inherited a hemophilia mutation from both their father and mother. Rare examples of symptomatic female hemophilia carriers are also described. The underlying mechanism is considered to be non-random X inactivation, although this may be an over-simplification.

Because there are well described functional domains in the FVIII and FIX proteins, DNA mutations will have variable effects, and so present differing severities depending on the domains involved. These include impaired secretion of the co-factor, interference with binding of FVIII to FIX or von Willebrand factor, and a range of missense changes interfering with cleavage to produce the active co-factor. More discussion on FVIII and FIX and treatment of hemophilia, including gene therapy, is found in Chapter 8.

An interesting model is hemophilia B Leyden which involves a set of mutations in a region of about 40 Kb located in the FIX gene’s 5’ region. In this example, the hemophilia B is a severe disorder during childhood but improves spontaneously after puberty! This unexpected observation can be explained through an understanding of the […] which shows a change in transcription factors binding at the 5’ promoter site around puberty. Earlier hypotheses considered that this involved androgens, but now it is thought more likely to be a growth hormone effect, explaining an age-related control of gene expression that is independent of sex [1].

TABLE 2.6 Clinical, laboratory and molecular features of hemophilia [1,5].

Property: Features

Prevalence: All ethnic groups. 1 in 10 000 males (FVIII deficiency), 1 in 20 000 males (FIX deficiency).

Defect: Clotting co-factor VIII or IX produced in the liver.

Clinical: Prolonged bleeding spontaneously or after minor trauma into joints, muscles, subcutaneous tissues and organs. About half have a severe disorder (FVIII or FIX levels <1%); others are moderately severe (FVIII or FIX levels 1–5%) or mild (FVIII or FIX levels 6–30%).

Genetics: X-linked; female carriers have 50% chance of transmitting to male offspring; only about 10% of obligatory female carriers are detectable because of bleeding problems or abnormal coagulation assays.

Gene: FVIII gene is large with 26 exons over 186 Kb of genomic DNA. FIX gene is smaller with 8 exons over 34 Kb.

DNA testing: Except for the intron 22 inversion in the FVIII gene most other abnormalities involve point mutations with a small percentage around 5% having deletions.

Chromosome location: Distal to Xq28 (FVIII); Xq27 for FIX.
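The severity bands listed in Table 2.6 can be written as a small lookup. This is a minimal sketch only: the function name is hypothetical, the severe band is assumed to mean a level below 1%, and how to treat values falling between the printed bands (for example 5.5%) is an assumption made here, not something the table specifies.

```python
def classify_hemophilia_severity(factor_level_percent: float) -> str:
    """Map an FVIII or FIX level (% of normal) to the bands quoted in Table 2.6.

    Assumptions (not from the source): values between 5% and 6% are
    grouped with the mild band, and anything above 30% is reported as
    falling outside the tabulated ranges.
    """
    if factor_level_percent < 0:
        raise ValueError("factor level cannot be negative")
    if factor_level_percent < 1:
        return "severe"
    if factor_level_percent <= 5:
        return "moderately severe"
    if factor_level_percent <= 30:
        return "mild"
    return "outside the ranges in Table 2.6"


if __name__ == "__main__":
    for level in (0.5, 3, 20, 60):
        print(level, classify_hemophilia_severity(level))
```

The function only restates the table; it says nothing about carrier females, whose levels vary for the reasons discussed under X inactivation.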


Hemophilia illustrates the range of DNA mutations seen in genetic disorders. These include single base changes, deletions, insertions and rearrangements. The latter is one of the most interesting of the mutations and has also been called the flip tip. This recombination occurs predominantly in males because the single X chromosome predisposes to an intrachromosomal recombination event (Figure 2.5).

FIGURE 2.5 Formation of the flip tip recombination mutation in hemophilia A [1,5]. (a) The region of the X chromosome distal to band q28 contains the FVIII gene. (b) Only relevant exons (1, 22, 23, 26) in this gene are shown as blue bands. The red bar indicates the location within intron 22 of an inverted DNA repeat. DNA homologous to this repeat and located more telomeric is also displayed (red bars). The ↑ indicates the direction that the factor VIII gene is transcribed. (c) This shows an intrachromosomal crossing over event between the two homologous regions (broken lines). The additional (green) band in intron 22 is a second intronic gene. (d) The final result from the crossover is a factor VIII gene that has been flipped around (inverted) and is now in two sections – exons 1 to 22 and one repetitive segment is transcribed in a telomeric direction; two repetitive segments and exons 23–26 are transcribed towards the centromere. This gross structural rearrangement has a major effect on FVIII production and is found in about 50% of severe cases. The flip tip mutation is detectable by PCR.

Carrier Testing in X-linked Disorders

Carrier detection is usually undertaken to determine if a female is a carrier, and so at risk of having an affected male offspring. Like hemochromatosis, two approaches are possible:

1. Phenotypic assays: Protein levels for FVIII and FIX demonstrate a wide normal range in blood. Because of random X inactivation, the levels of FVIII and FIX can vary considerably in females who are carriers of hemophilia. This scatter makes an accurate assessment of carrier status difficult, if the woman tested demonstrates a normal or borderline result for the coagulant protein. The level may reduce the individual’s a priori risk but does not provide definitive proof of her carrier status. In addition to X inactivation, there are physiological fluctuations seen in the coagulation factors, due to influences such as pregnancy or taking the oral contraceptive when baseline levels can increase. Finally, there is the problem of assessing whether an affected relative represents an example of a spontaneous mutation rather than the transmission of a hemophilia defect within a family when there is only one affected male, and

2. Genotypic assays: Testing for DNA mutations has advantages over protein assays because DNA is easy to obtain compared with an abnormal protein. Unlike protein, DNA is not affected by physiological fluctuations. The problem with DNA testing is that in addition to the flip tip mutation in factor VIII, there are numerous other genetic defects that cause hemophilia and this often requires sequencing of the whole gene and then interpreting the significance of DNA variants that are found (Chapter 3).
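The remark that a normal factor level may reduce a woman's a priori risk without proving her status can be illustrated with a simple Bayesian update. This is a sketch under stated assumptions: the function name is hypothetical, and the two likelihoods used in the example (how often carriers and non-carriers show a normal level) are made-up illustrative numbers, not values from the text.

```python
def posterior_carrier_risk(prior: float,
                           p_result_if_carrier: float,
                           p_result_if_noncarrier: float) -> float:
    """Update a prior carrier risk with one test result using Bayes' rule.

    prior                  -- a priori probability of being a carrier
    p_result_if_carrier    -- probability of the observed result in carriers
    p_result_if_noncarrier -- probability of the same result in non-carriers
    """
    for p in (prior, p_result_if_carrier, p_result_if_noncarrier):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    joint_carrier = prior * p_result_if_carrier
    joint_noncarrier = (1.0 - prior) * p_result_if_noncarrier
    total = joint_carrier + joint_noncarrier
    if total == 0.0:
        raise ValueError("the observed result is impossible under both hypotheses")
    return joint_carrier / total


if __name__ == "__main__":
    # Hypothetical example: prior risk 50%; a normal factor level is assumed
    # to occur in 30% of carriers and 95% of non-carriers.
    print(round(posterior_carrier_risk(0.5, 0.30, 0.95), 2))
```

With these illustrative inputs the risk falls from 0.5 to 0.24, which is lower than the prior but still far from exclusion.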


OTHER FORMS OF GENETIC INHERITANCE

The previous section described Mendelian-type disorders which have as their main feature mutations in single genes that have a significant effect on the phenotype. Thus a pedigree can be drawn and the disease followed through the family. However, as we learn more about molecular medicine it is apparent that even these straightforward disorders are actually more complex, with many factors, both genetic and non-genetic, influencing the final phenotype (Figure 2.6).

Gene-gene Interactions

The potential for influencing the phenotype through gene-gene (G x G) interactions can be illustrated by the hemoglobinopathies. These are inherited disorders of globin and include:

1. Thalassemia syndromes, e.g. α thalassemia, β thalassemia. The underlying biochemical abnormality is an imbalance in the globin proteins that are produced, and

2. Variant hemoglobins, e.g. sickle cell hemoglobin (HbS). Here globin proteins are structurally abnormal.

The hemoglobinopathies are usually inherited as autosomal recessive disorders and it is estimated that about 7% of the world population are carriers.

Hemoglobin, the pigment in red blood cells, comprises iron and a protein called globin. Four polypeptide chains make up globin including two α globin chains and two β globin chains. Following cloning of the human α and β globin

[Figure 2.6 diagram: Phenotype, with contributions labelled Single Gene, Many Genes, Modifying genes, Epigenetic, Environment and Life-Styles.]

FIGURE 2.6 Contributions to the phenotype. Even in Mendelian traits the phenotype can be subtly influenced by a number of factors, both genetic and non-genetic. The latter include the environment, e.g. exposure to irradiation, food additives and life style effects such as alcohol intake and smoking.

genes in the late 1970s, it was shown that the globins represented a gene family with a cluster on chromosome 16 (α globin genes) and a second on chromosome 11 (β globin genes) (Figure 2.7). Another feature of the globin genes is their developmental regulation – embryonic, fetal and adult genes have been identified within each cluster. During development, there is a change in the hemoglobin profile with the complete switch from fetal (HbF) to adult (HbA) globins occurring about six months after birth [9,10].

The word thalassemia comes from the Greek θαλασσα which means the sea. It was coined in 1936 when it was erroneously thought that thalassemia was a disease found only in countries bordering the Mediterranean sea. Today, at-risk populations are known to come from many other parts of the world, including southern China, South East Asia, India, the Middle East, Africa and other regions. The high frequency of thalassemia carriers is due to the protection from malaria provided by these disorders. The mechanism is a survival advantage for red blood cells carrying the thalassemia trait as these cells provide a poor environment for the growth of malarial parasites. A similar selective advantage against malaria is found with variant hemoglobins such as HbS and HbE caused by missense mutations in the β globin gene. Recently, it has been shown that while the thalassemias and the two variant hemoglobins described protect against malaria, their co-inheritance, particularly HbS and α thalassemia, cancel out their individual protective effects [11].

[Figure 2.7 diagram: β globin gene complex on chromosome 11 (11p15), genes in order ε, Gγ, Aγ, ψβ1, δ, β; α globin gene complex on chromosome 16 (16p13.3), genes in order ζ2, ψζ1, ψα2, ψα1, α2, α1, θ1.]

FIGURE 2.7 The globin gene clusters on chromosomes 11 and 16 [9,10]. Functional genes are shown as filled boxes and non-functioning ones (called pseudogenes) as open boxes. On the short arm of chromosome 11 at band position 15 is found the β globin gene complex. There is one gene which is active during embryonic life (ε); two which are fetal specific (Gγ, Aγ), and two are expressed in adult life (δ, β). The switch from fetal (HbF) to adult (HbA) globins is completed by about 6 months after birth. The α globin complex is on the short arm of chromosome 16 at band 13.3. There are more genes in this complex but many are non-functional. The embryonic/fetal gene is ζ2 and the two adult genes are α2 and α1. The evolution of the globin clusters from a common ancestral gene is seen by the similarity in structure and sequence which the above genes share even though they are on different chromosomes. The dotted line in the α globin complex marks the position of a DNA polymorphism. Red – embryonic or fetal genes; Blue – adult genes.


Molecular Pathology

The biochemical defect in the thalassemias is an imbalance in the number of α and β globin chains, with the normal α/β ratio being 1. If this ratio moves up or down, the red blood cell precursors are prematurely destroyed in the bone marrow. Failure to produce α globin gives rise to α thalassemia, which is fatal in its most severe form. Failure to produce any β globin (β thalassemia) is usually associated with a life-long, blood transfusion dependent anemia. Carriers of either thalassemia defect are clinically asymptomatic although their blood counts range from normal to mildly abnormal. Despite very elegant biochemical studies it was not possible to understand the variation in clinical or laboratory phenotypes until the globin genes were cloned and characterized, at which point it became apparent that G x G interactions involving α, β, and γ (fetal) globin genes explained many of the phenotypes.

The molecular classification of α thalassemia includes α⁺ and α⁰ classes, based on how many α globin genes are deleted. αα/αα is the normal complement of α globin genes (two on each chromosome). A loss of one, i.e. -α/αα, is heterozygous α⁺ thalassemia. A loss of both genes in the one chromosome, i.e. --/αα, is heterozygous α⁰ thalassemia. Various combinations of α⁺ and α⁰ can occur. Because there are only two β globin genes (one on each chromosome), the permutations are fewer. However, β globin gene mutations are divided into β⁺ and β⁰ on the basis of whether there is some (+) or nil (0) β globin production. Hence, the phenotype can be variable, just like α thalassemia.

It is known that individuals who produce an excess of fetal Hb (HbF) for whatever reason will have milder forms of β thalassemia and HbS disease. Although the molecular basis for high HbF production is becoming better understood, it has been difficult to induce HbF production artificially. The severe forms of β thalassemia become apparent once the switch from HbF to HbA (adult hemoglobin) is complete at about six months of age, as only then do the β globin gene mutations exert their effects. Therefore, a long sought after but elusive goal has been to manipulate the globin genes to prevent fetal to adult switching, or reverse it once it has occurred. If this were possible, the β thalassemias and HbS disorders would no longer be clinical problems.

Evidence from DNA linkage analysis suggests that other gene loci not on chromosome 11 (where the β globin gene complex is found) are also involved in the regulation of γ globin gene expression [12]. Further work is required to define the multiple molecular mechanisms allowing HbF to remain high which could then be used to cure β thalassemia and HbS disease.

Phenotypes

These are not easily predicted in the hemoglobinopathies. Hence, it is essential to draw a pedigree and study family members, since it is possible that more than one type of thalassemia has been inherited and the various G x G interactions will not be detected if only one individual is studied. Counseling at-risk couples who are planning a family is difficult because it will not always be possible to predict the phenotype of future offspring. G x G interactions and the effect of the environment (G x G x E) can influence the clinical outcome, particularly in HbS.

Modifying Genes

Other genetic factors that can affect the phenotype are the presence of modifying genes. They can be illustrated by reference to familial hypertrophic cardiomyopathy, which is an autosomal dominant disorder that involves the muscle sarcomere. Families and individuals with this disorder can have a variable phenotype, including risk of sudden cardiac death, even though they have the same gene mutation. Environmental factors could possibly explain these differences, but there is increasing evidence that modifying genes are important.
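The deletion-based notation for α thalassemia described under Molecular Pathology lends itself to a small parser. The sketch below is illustrative only: the function and dictionary names are hypothetical, a hyphen is assumed to stand for each deleted α globin gene, and only the three chromosome arrangements named in the text (αα, -α and --) are recognized.

```python
# Class of each chromosome (haplotype) in the notation used in the text.
# "-" marks a deleted alpha globin gene (an assumed plain-text rendering).
HAPLOTYPE_CLASSES = {
    "αα": "normal",
    "-α": "α+",   # one gene deleted on this chromosome
    "--": "α0",   # both genes deleted on this chromosome
}


def describe_alpha_genotype(genotype: str) -> tuple[int, tuple[str, str]]:
    """Return (number of intact alpha genes, class of each chromosome)."""
    parts = genotype.split("/")
    if len(parts) != 2:
        raise ValueError("expected two haplotypes separated by '/'")
    for haplotype in parts:
        if haplotype not in HAPLOTYPE_CLASSES:
            raise ValueError(f"unrecognized haplotype: {haplotype!r}")
    intact_genes = sum(haplotype.count("α") for haplotype in parts)
    classes = (HAPLOTYPE_CLASSES[parts[0]], HAPLOTYPE_CLASSES[parts[1]])
    return intact_genes, classes


if __name__ == "__main__":
    for g in ("αα/αα", "-α/αα", "--/αα", "--/-α"):
        print(g, describe_alpha_genotype(g))
```

The output reports gene counts and chromosome classes only; it deliberately does not predict a clinical phenotype, which the text notes is hard to do without family studies.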


An example of this type of gene is ACE (angiotensin I converting enzyme). This has two forms due to a 287 bp Alu repeat in intron 16. Some individuals have this repeat in both their gene copies (genotype II where I = insertion), others have this repeat missing in both gene copies (genotype DD where D = deletion), and the remainder are a mix of I and D (genotype ID). It has also been shown that the plasma level of the ACE protein in a DD subject is higher than in a II subject, with the ID individual being somewhere in between. When the distribution of I and D polymorphisms in ACE is compared in mildly affected familial hypertrophic cardiomyopathy patients and those with severe left ventricular hypertrophy, it is seen that there are more individuals with D in the latter group. Hence, the D allele is considered to be associated with a poorer outlook in terms of hypertrophy. Other modifying genes have been implicated in influencing severity in familial hypertrophic cardiomyopathy including angiotensin II receptor 1, endothelin 1 and tumor necrosis factor α. At this stage, more work is needed to confirm these findings, many of which remain speculative.

Modifying genes are thought to represent the QTLs described below under Complex Genetic Inheritance. When fully understood and characterized, they may allow a more complete understanding of pathogenesis, and from this a more accurate prognosis. These genes can also be targets for new therapies. Here the aim will not be to cure the disease, but alter the effects of modifying genes to improve clinical well being. Although each plays only a small part in the phenotype, there are likely to be a number of these genes and so their cumulative effects will be important. They are difficult to identify and characterize at present because they represent a more complex mode of genetic inheritance.

The controversial nature of modifying genes and their effects on a phenotype will come up again when the APOE4 gene is described in relation to complex genetic inheritance and Alzheimer disease later in this chapter, and the p.Val129Met missense change in Prion disease in Chapter 6.

Uniparental Disomy

Uniparental disomy occurs when two copies of a chromosome or part of a chromosome are inherited from the one parent and nothing comes from the other parent. There are two types of uniparental disomy: (1) Heterodisomy: the two chromosomes are different copies of the same chromosome due to a meiosis I error, and (2) Isodisomy: both chromosomes from the one parent are identical copies due to a meiosis II error or post-zygotic duplication of a chromosome. There are three explanations for uniparental disomy (Figure 2.8). Cytogenetic analysis will not detect uniparental disomy because the chromosomal numbers are the same. It requires molecular analysis to show that the two chromosomes originated from the same parent.

The chromosomal (and gene) content is not changed in uniparental disomy and so there are usually no clinical consequences. However, disease will result if the chromosomes or segments inherited contain imprinted genes – see the discussion on imprinting under Epigenetics below. Uniparental isodisomy can also lead to genetic disease if the two identical chromosomes carry the same recessive mutation. This is illustrated by the very unusual examples of cystic fibrosis occurring in children of mothers who are known carriers but the fathers are normal. Having excluded non-paternity it was shown that the affected children had inherited two copies of the mutant chromosome from their carrier mothers, i.e. isodisomy. It is noteworthy in this circumstance that the cystic fibrosis phenotype was also associated with developmental abnormalities,

including moderate to severe intrauterine and postnatal growth retardation. Thus, it is possible that paternally-derived gene(s) located on chromosome 7 are required for normal development.

FIGURE 2.8 Mechanisms for uniparental disomy (UPD). Gametes are depicted as circles, zygotes as triangles. A chromosome is shown as a bar – in the gametes it is present as one copy (monosomy – the normal situation); two copies (disomy) and no copies (nullisomy). (a) One gamete has two copies of a chromosome and the other no copies. This situation can arise following non-disjunction. Fertilization between these two gametes would produce the normal diploid number but both chromosomes have come from the one parent, i.e. either iso or heterodisomy. (b) Fertilization in this case is between a disomic gamete and a normal monosomic one. The zygote is trisomic and is unlikely to survive unless one of the three chromosomes is lost. By chance (33% of the time) the one lost will have come from the normal gamete, i.e. the zygote is again diploid but both chromosomes originate from the same parent. (c) A third scenario involves fertilization between a normal gamete and a nullisomic one. One way for the zygote to survive involves duplication of the single chromosome. Now uniparental isodisomy will result. The mechanism in (b) is considered the most likely since trisomy has been reported in chorionic villus samples but the newborn has a disorder such as the Prader-Willi syndrome which has resulted from uniparental disomy. The initial trisomic situation is corrected which allows the fetus to survive but at the cost of disomy.

Mosaicism, chimerism

Mosaicism refers to the presence in an individual (or a tissue) of two or more cell lines that differ in genotype or chromosomal constitution but have been derived from a single zygote. Mosaicism is the result of a mitotic mutation that occurs during embryonic, fetal or extrauterine development. Mosaic cellular populations can arise from mutations in nuclear DNA or mtDNA in post-zygotic cells, epigenetic alterations in DNA and numeric or structural abnormalities in chromosomes. All these alterations can proceed from normal to abnormal and even vice versa. The time at which the defect arises will determine the number and types of cells (somatic and/or germ cells) that are affected. Mosaicism is likely to be found in all large multicellular organisms to some degree.

Mosaicism can be studied using DNA techniques that allow an accurate genotypic assessment of multiple tissues. In this way the identity of individual cells can be established. Clinically, mosaicism may have anything from minimal to dramatic effects on a phenotype. An understanding and awareness of mosaicism is important, because it sometimes explains unexpected clinical or laboratory findings, including deterioration or improvement in the clinical phenotype or unusual modes of inheritance [13]. Although DNA testing uses blood as the traditional source of DNA, it is important to test other tissues if mosaicism is suspected.

Chromosomal Mosaicism

X inactivation in females is not considered by some to be an example of chromosomal mosaicism, because although the paternal or maternally-derived X chromosome are randomly inactivated in all tissues, the net effect is no change in gene output. Others describe X inactivation as an example of mosaicism because there are mixtures of chromosomal types present in the subject. Both Turner syndrome (45,X) and Down syndrome (trisomy for

chromosome 21) have had chromosomal mosaicism demonstrated by cytogenetic analysis of cultured lymphocytes. The higher the percentage of normal cells present, the more likely it is that the disease will show a milder phenotype. A conceptus with Turner syndrome probably survives to term only when there is a coexistent normal cell line also present. Thus, chromosomal mosaicism explains why an aneuploid fetus can survive to term if a normal cell line is present in the placenta. The common autosomal trisomies involving chromosomes 13, 18 and 21 are sometimes found as somatic mosaics. In nearly all cases, the zygote is initially completely trisomic but the loss of one of the trisomic chromosomes produces a normal cell line which persists in the embryo (Figure 2.8).

There are a number of explanations for chromosomal mosaicism observed during prenatal diagnosis:

1. Maternal contamination of sampled tissue;
2. Laboratory artifact;
3. Confined placental mosaicism, and
4. True fetal mosaicism.

Chromosomal mosaicism usually results from non-disjunction occurring in an early embryonic mitotic division leading to the persistence of more than one cell line. With early fetal sampling made possible by chorionic villus sampling, it has become apparent that chromosomal mosaicism affecting the placenta occurs more frequently than previously considered (around 1–2% of samples). Chromosomal mosaicism confined to the placenta can produce false diagnostic results particularly in karyotypes obtained from chorionic villus sampling. Retarded intrauterine growth in a fetus with a normal karyotype may result from aneuploidy confined to the placenta.

Somatic Cell Mosaicism

Mitotic errors at the DNA copying stage can give rise to mutations in human genes. The clinical effect of somatic mosaicism depends on when the mutation arose and in what cell types. Somatic mutations that occur as early events in development will give rise to a more generalized disease phenotype. On the other hand, a late onset will be manifest by localized or segmental disease, because fewer cell lines are affected. Clues to the presence of mosaicism may come from the finding in sporadic genetic disorders of marked tissue dysplasia which is patchy in distribution. Alternatively, there may be mild phenotypic manifestations in a person with an apparent spontaneous single gene mutation, or a mild phenotype in an individual with severely affected offspring or parents. Heritable genetic disorders that have also shown somatic cell mosaicism include: Lesch Nyhan syndrome, Marfan syndrome, Neurofibromatosis 1 and 2, Friedreich ataxia and Duchenne muscular dystrophy.

Germline Mosaicism

Germline mosaicism is one explanation of why parents, who are apparently normal on genetic testing, can have more than one affected offspring with an X-linked or dominant genetic disorder, e.g. X-linked: Duchenne muscular dystrophy, hemophilia A or B; and autosomal dominant: osteogenesis imperfecta, tuberous sclerosis, achondroplasia, neurofibromatosis type 1. Therefore, a suspicion of germ cell mosaicism means that recurrence of a genetic disorder needs to be considered when individuals are counseled.

In the genome there are hot spots for mutation that explain why some genetic disorders arise spontaneously and/or result from germline mosaicism. For example, both achondroplasia (mutations in the FGFR3 gene) and neurofibromatosis 1 (NF1 gene) are associated with low rates of new mutations in the germline compared to osteogenesis imperfecta (COL1A1 gene) which has higher rates and so greater risk for recurrences [13].

Unlike ova, sperm are easily accessible, and so more is known about germline mosaicism


[Figure 2.9 panels: (a) DNA patterns from BLOOD and ABNORMAL TISSUE; (b) DNA patterns from BLOOD and SPERM. Bands are labelled N and M.]

FIGURE 2.9 Pedigrees and DNA test patterns demonstrating somatic and germline mosaicism. N = normal; M = mutant. Blue = normal; Red = affected/abnormal DNA marker. (a) DNA testing in the peripheral blood lymphocyte shows all individuals have only the normal DNA marker. However, biopsy of abnormal tissue such as skin in the affected person shows that the DNA pattern is different, and a mutant band is also present, i.e. somatic mosaicism. (b) Illustrates two affected individuals with an autosomal dominant disorder, but phenotypically normal parents. DNA markers in the peripheral blood confirm that the two offspring have the genetic disorder. Examination of sperm DNA from males shows that the father of the two affected individuals has germline mosaicism, because some of the sperm have the DNA mutation. The proportion of affected sperm could be estimated by comparing the intensities for the normal and mutant DNA bands.
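The band-intensity comparison mentioned in the Figure 2.9 caption can be sketched as a simple proportion. This is a rough illustration with hypothetical names and numbers: it assumes that band intensity scales linearly with the amount of each DNA form, and that for a dominant disorder the theoretical recurrence risk is approximately the fraction of sperm carrying the mutation. Neither assumption is stated in the source.

```python
def mutant_sperm_fraction(normal_intensity: float, mutant_intensity: float) -> float:
    """Estimate the proportion of sperm carrying the mutation from two band intensities.

    Assumes (hypothetically) that intensity is proportional to DNA amount.
    """
    if normal_intensity < 0 or mutant_intensity < 0:
        raise ValueError("intensities cannot be negative")
    total = normal_intensity + mutant_intensity
    if total == 0:
        raise ValueError("no signal in either band")
    return mutant_intensity / total


if __name__ == "__main__":
    # Hypothetical densitometry readings in arbitrary units.
    fraction = mutant_sperm_fraction(normal_intensity=90.0, mutant_intensity=10.0)
    print(f"Estimated mutant sperm fraction: {fraction:.2f}")
```

In practice the estimate would only be as good as the linearity of the assay, so it is best read as an order-of-magnitude guide to a theoretical risk.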

in sperm. Using PCR, normal DNA patterns obtained from somatic cells, such as peripheral blood, are compared with sperm DNA patterns. The latter should show both normal and mutant DNA forms if there is germline mosaicism. From the frequency of the mutant form, a theoretical recurrence risk can be estimated (Figure 2.9).

Chimerism

Unlike mosaicism, this refers to the presence in an organism of two or more cell lines that are derived from different zygotes. During embryogenesis, cells from two distinct embryos can mix. This could occur, for example, in DZ twins following intrauterine transfusion of cells from one to the other. It would be expected that this type of chimerism would lead to immunological tolerance following a graft from one twin to the other. A more common example of chimerism is an allogeneic organ transplant.

Mitochondrial Inheritance

The nucleus is not the only organelle in eukaryote cells that contains DNA. Mitochondria have their own genetic material in the form of a 16.6 Kb double-stranded circular DNA molecule. Mitochondrial DNA (mtDNA) is characterized by a high mutation rate (10–20 times that of nuclear DNA), few non-coding (intron) sequences, a slightly different genetic code, and maternal inheritance because the sperm head contains very little mtDNA. Mitochondria are essential for eukaryotic cells because they play a key role in many metabolic activities, particularly energy production via the generation of ATP during oxidative phosphorylation.

Since oxidative phosphorylation is controlled by nuclear DNA (~71 genes) and mtDNA (13 genes), defects can lead to confusing inheritance patterns and phenotypes. Similarly,

most of the mitochondrial proteins are actually encoded in the nuclear DNA. mtDNA codes for rRNA, tRNA species required for mitochondrial protein biosynthesis and 13 polypeptides that form part of oxidative phosphorylation complexes I to V [14]. Defects in oxidative phosphorylation affect a number of cellular processes including:

1. ATP generation;
2. Apoptosis;
3. Production of reactive oxygen species, and
4. Cellular oxidation and reduction.

Model It is only since 1988 that some genetic dis- orders, particularly those affecting organs with high energy requirements such as the brain, skeletal and heart muscles, have been FIGURE 2.10 Pedigree depicting mtDNA inheritance. proven to result from mutations in mtDNA. This is only a small pedigree and so could represent three Although it was suspected that mitochondria possible modes of inheritance: Autosomal dominant, were involved on the basis of maternal inherit- mtDNA or an imprinted gene in the male. The last two options are possible because disease transmission is only ance, biochemical abnormalities and abnormal apparent through the female line. Which of the three is morphology on microscopy, definitive proof correct will depend on the clinical features of the disease. required DNA characterization. Ultimately a DNA test will provide definitive evidence if a Features which suggest a mtDNA disease are: mutation in mtDNA is found. (1) Maternal inheritance, i.e. both males and females can be affected but the disorder be affected differentially on the basis of their is only transmitted by females (Figure 2.10); energy requirements. In addition, tissues with a (2) Energy production is preferentially high mutant to normal mtDNA ratio are more impaired so likely diseases are encephalopa- likely to be affected. thies, myopathies and cardiomyopathies; The types of mutations in mtDNA range from (3) Variable expression in affected individuals. deletions and duplications to single base changes. This is explained on the basis that each mito- It is interesting that the more severe mutations chondrion contains 2–10 DNA molecules, and demonstrate heteroplasmy, since they would in each cell there can be 1 000–10 000 mtDNA otherwise be lethal. Because of their effect on copies. mtDNA molecules in each cell are usu- reproductive fitness, these mutations are very ally identical (called homoplasmy). 
However, heterogeneous suggesting independent ori- if there are mutated mtDNA species, different gins. On the other hand, the milder point muta- proportions of the wild-type to mutant mtDNA tions can be found in all cells, i.e. homoplasmy. can be found in each cell and tissue. This is Examples of some genetic disorders that arise called heteroplasmy – the finding of a mixture from mtDNA defects (as well as nuclear DNA of mutant and wild-type mitochondrial DNA defects that mimic the mitochondrial phenotype) species in the same cell, and (4) Tissues will are given in Table 2.7.
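The idea of heteroplasmy and its tissue-specific consequences can be made concrete with a small numerical sketch. The copy counts below are invented for illustration, and the 60% threshold is purely an assumption (the text gives no threshold, and real thresholds are likely to differ between mutations and tissues).

```python
# Hypothetical illustration of heteroplasmy; all numbers are invented.
ASSUMED_THRESHOLD = 0.6  # assumed mutant fraction above which a tissue is affected


def mutant_fraction(mutant_copies, wild_type_copies):
    """Return the proportion of mtDNA copies that carry the mutation."""
    total = mutant_copies + wild_type_copies
    if total == 0:
        raise ValueError("no mtDNA copies counted")
    return mutant_copies / total


def classify(fraction):
    """Label a cell or tissue as homoplasmic or heteroplasmic."""
    if fraction == 0.0:
        return "homoplasmic (wild-type)"
    if fraction == 1.0:
        return "homoplasmic (mutant)"
    return "heteroplasmic"


def tissues_at_risk(tissue_counts, threshold=ASSUMED_THRESHOLD):
    """List tissues whose mutant fraction reaches the assumed threshold."""
    return sorted(
        name
        for name, (mutant, wild_type) in tissue_counts.items()
        if mutant_fraction(mutant, wild_type) >= threshold
    )


# (mutant copies, wild-type copies) per tissue, hypothetical values
tissues = {
    "brain": (4500, 1500),
    "skeletal muscle": (3000, 3000),
    "blood": (500, 4500),
}

for name, (mutant, wild_type) in tissues.items():
    fraction = mutant_fraction(mutant, wild_type)
    print(f"{name}: {fraction:.0%} mutant, {classify(fraction)}")
print("At or above assumed threshold:", tissues_at_risk(tissues))
```

In this toy example the same individual carries the mutation in every tissue, yet only the tissue with the highest mutant load crosses the assumed threshold, which is one way to picture variable expression.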


TABLE 2.7 Some examples of mtDNA genetic disorders [14].

Mutations: Mutations in mitochondrial protein coding genes or in nuclear genes for complex I or II
Disease: Leigh syndrome
Clinical phenotype: Severe progressive encephalopathy in children (milder in adults). Severity related to the percentage of mutant mtDNA species, i.e. heteroplasmy.
DNA mutation(s): Similar phenotype whether caused by mtDNA or nuclear DNA mutation – common feature involves energy metabolism.

Mutations: Mutations in mt-tRNA genes or in nuclear genes for complex I or II
Disease: Cardiomyopathy (usually hypertrophic type)
Clinical phenotype: Children with this type of hypertrophic cardiomyopathy have a poorer prognosis.
DNA mutation(s): Few cases documented have various changes.

Mutations: Mutations in mitochondrial protein coding genes
Disease: Leber hereditary optic neuropathy
Clinical phenotype: Causes blindness, predominantly in young males with reduced penetrance as most carriers never become blind.
DNA mutation(s): About three missense mutations found in most cases and often associated with homoplasmy, i.e. these are mild mutations.

Mutations: Mutations in mt-tRNA genes
Disease: Myoclonus epilepsy and ragged red fibres syndrome (MERRF)
Clinical phenotype: Myoclonus epilepsy, mental retardation, ataxia, tremor, muscle atrophy.
DNA mutation(s): About 80% have the m.8344A>G missense change in the tRNALys gene.

Mutations: Mutations in mt-tRNA genes
Disease: Myopathy, encephalopathy, lactic acidosis, stroke-like episodes (MELAS)
Clinical phenotype: [...], episodic vomiting and repeated cerebral episodes causing hemiparesis, hemianopia or cortical blindness.
DNA mutation(s): Mostly missense changes with the most common being m.3243A>G in tRNALeu.

Mutations: mtDNA rearrangements
Disease: Kearns-Sayre syndrome
Clinical phenotype: Ophthalmoplegia, ptosis, retinal degeneration, ataxia, heart block.
DNA mutation(s): Deletions/duplications in mtDNA. Usually heteroplasmic and include at least one tRNA gene.

Chromosomal Disorders

Abnormalities include:
1. Numeric (loss or gain called aneuploidy). This leads to an incorrect number of chromosomes. Three copies of a particular chromosome is called trisomy, e.g. Trisomy 21 or Down syndrome. Other important trisomies include trisomy 13, 16 and 18. One copy of a chromosome would be a monosomy, e.g. Turner syndrome 45,X which is caused by loss of one of the two X chromosomes. Monosomy in the autosomes is usually fatal. The addition of one or more complete haploid sets is called polyploidy. Triploidy and other polyploid sets do not survive beyond the pregnancy. Numeric alterations are detected by conventional cytogenetics, FISH or aCGH (Chapter 4).
2. Structural. These abnormalities can be balanced or unbalanced rearrangements. Balanced means that chromosomal rearrangements are present but there is no gain or loss of genetic material. In most cases these are harmless because the same numbers of genes remain. In contrast, an unbalanced translocation leads to missing or extra genes and this usually produces a severe disorder [15]. Chromosomal translocations occur when there is transfer of genetic information from one chromosome to another. An example of a reciprocal translocation is the Ph chromosome involving an exchange between chromosomes 9 and 22 which ends up producing two new derivative chromosomes (Figure 7.10). Other less common structural chromosomal changes include deletions, insertions and inversions. An example of a chromosomal rearrangement producing the genetic disorder hemophilia A was discussed earlier (Figure 2.5).
3. Cell line mixtures. See above for mosaicism and chimerism.

Contiguous Gene Syndromes

These are complex genetic disorders that result from microscopic or submicroscopic deletions of contiguous genes. Other chromosomal structural changes can also occur. An example is Williams syndrome, which involves a large deletion around 1.6 Mb on chromosome 7q11.23. The consequences of this include aortic stenosis, intellectual impairment, elfin facies and transient hypercalcemia in infants. Detection of large deletions is performed by cytogenetics, while FISH/aCGH is preferred for smaller deletions. DNA tests can also be designed to detect deleted genes or DNA segments.

In Williams syndrome the key gene implicated in the commonly found aortic valve defect as well as other connective tissue abnormalities is the elastin gene (ELN). Mutations in this gene alone will not produce the complete phenotype, and other nearby genes or regulatory elements must also be disrupted by the deletion. More than 20 genes are located within the deleted segment but the key ones are ELN, LIMK1 and GTF2I. In most cases, parents of an affected child are themselves normal and so the risk of subsequent pregnancies being affected is low. However, if a parent demonstrates a deletion in the Williams syndrome critical region, there is a 50% risk of other children being affected [5].

COMPLEX GENETIC INHERITANCE

Common Health Issues

Complex genetic inheritance is exemplified by commonly occurring diseases in which there are both genetic and environmental components – i.e. G x E effects as well as possible G x G x E interactions. Some examples include:
• Diabetes
• Dementia and mental illness
• Obesity
• Cancer
• Heart disease and hypertension
• Intellectual impairment
• Congenital malformations

The amount of data on genes likely to be implicated in the complex genetic disorders is growing rapidly but there remain missing heritability factors that are yet to be understood or identified. Drawing a pedigree in the complex genetic disorders can confirm that multiple family members are affected, but provides little information about inheritance patterns. An example is diabetes mellitus particularly type 2 (Table 2.8).

TABLE 2.8 Type 1 and type 2 diabetes mellitus [16].

Type 1 diabetes mellitus (juvenile diabetes or insulin dependent diabetes mellitus – IDDM)
• Pancreas makes little or no insulin. Therefore, insulin needed for treatment.
• Arises from autoimmune destruction of the pancreatic β islet cells. Usually presents before the age of 30 years. Most often in childhood or teens. Concordance rates in IDDM twins are 8% (DZ twins) and 60% (MZ twins) which suggest a significant genetic component.
• About 5–15% of diabetes cases are type 1, i.e. approximately 1 million are affected in the USA. Most countries are reporting a doubling of the incidence over the past 20 years.
• Risk factors: Strong genetic predisposition (HLA locus and class II genes) and the environment.

Type 2 diabetes mellitus (adult onset diabetes or non insulin dependent diabetes – NIDDM)
• Insulin made is less effective. Drugs are used initially to allow the insulin to be used, or more insulin is produced. Weight loss and exercise help.
• Disease of late onset with a significant genetic component although little is known about the genetic contributions which contrasts to our understanding of a rare form of diabetes called MODY (maturity onset diabetes of the young)a. The previously held view that this is an adult form of diabetes no longer applies as younger people are increasingly being affected.
• About 21 million affected in USA and numbers rising as obesity increases. Less common in countries that do not have western-style diets and obesity. 90% of diabetes cases are due to type 2.
• Risk factors: (1) aged 45 years or older; (2) Overweight; (3) Gestational diabetes during pregnancy, and (4) Family history of diabetes. There are at least 40 genetic loci now implicated.

a MODY is sometimes described as a rare variant of type 2 diabetes occurring in 1%–5% of diabetes in young people. Others consider it is neither type 1 nor type 2 diabetes. MODY demonstrates autosomal dominant inheritance and onset before 25 years of age. A number of genes have been implicated in at least four subtypes of MODY. DNA testing to define MODY is important since young patients with this disorder must be distinguished from type 1 diabetes because they do not usually require treatment with insulin. Due to its mode of inheritance, children of an affected individual are at 50% risk of inheriting diabetes.

A hypothesis for complex genetic inheritance is based on the interaction of environmental triggers with the cumulative effects of many genes each of which makes a relatively small contribution. Hence the concept of QTLs (quantitative trait loci) has evolved which can include genes or SNPs that have regulatory function. In type 2 diabetes mellitus there are now over 40 loci or QTLs implicated in pathogenesis but even when the potential effects of these loci are added, they are still insufficient to explain the phenotype. Interestingly, some of the QTLs are also implicated in other diseases suggesting common pathways may be important for a range of diseases.

With the availability of comprehensive genomic analysis through Next Generation DNA sequencing (Chapter 4) there is now a move away from the commonly occurring QTLs (because these are the ones more likely to be detected by fairly crude approaches such as association studies – described in more detail below) to looking for rare SNPs that might also function as QTLs but have more powerful effects. Somewhere in the mix is the environment but how it works is unknown. Epigenetic and even parent-of-origin effects provide additional modifying factors in this class of genetic disorders.

The term polygenic can have a number of meanings including genetic effects resulting from the interaction of multiple genes. A trait in the population such as intelligence is frequently used to illustrate polygenic inheritance. However, the environment (non-genetic effects) plays an important role in the development of intelligence, and this is not acknowledged by the term polygenic. Therefore, these types of traits are more appropriately called complex. Polygenic is a term best reserved for genetic diseases that result from mutations in a number of genes as illustrated earlier by the thalassemia syndromes. Figure 2.11 illustrates the etiological complexity associated with complex traits.

Study of the single gene Mendelian disorders has provided significant insight into their pathogenesis. However, many of these conditions are relatively uncommon, and the important health problems of today are considered to be the complex genetic disorders. Earlier in the development of genetics, there was a period of doom and gloom in positional cloning for discovery of single genes in Mendelian disorders because successes were slow in coming. The same has been observed with regard to complex genetic disorders, although since 2007 the corner may have been turned, and larger numbers of interesting associations have been reported.


[Figure 2.11: allele penetrance (low to high) plotted against allele frequency (rare to common), with etiological complexity running from low to high.]

FIGURE 2.11 A hypothesis to explain complex genetic diseases. The single gene Mendelian disorders are caused by mutations in a protein-coding gene and the effects of these mutations are seen clinically in terms of a disease phenotype. These represent high penetrance alleles (red circles) that are usually rare in occurrence. At the other end of the spectrum are the common but complex genetic disorders caused by genetic and environmental interactions. Because commonly occurring DNA markers (SNPs – green squares) are used to look for genes in these disorders it is not surprising that many SNP-based associations are found but very few, if any, provide definitive evidence or mechanisms for disease. This follows because the effects of the SNP markers are minimal since they are low penetrance alleles. A third but unproven set of DNA markers to explain the missing heritability in the complex genetic disorders are intermediate frequency alleles (blue triangles) with penetrance effects somewhere between the Mendelian disorders and the SNPs. To date the blue triangles cannot be detected because of the way association studies are carried out, but if present, they will be found with whole genome sequencing strategies (Chapter 4).

It is worth expending considerable effort in understanding the molecular basis of complex genetic disorders because:
1. They are relatively common health issues;
2. The ability to detect those who are genetically predisposed will allow the development of more targeted preventive programs;
3. New therapeutic targets or strategies are needed, and
4. As the genetic component to the complex genetic disorders is understood, the environmental contributions become easier to identify, making it possible to design better preventive strategies.

Gene Discovery

The approach for gene discovery in the complex genetic disorders generally involves association (case control) studies. These compare DNA profiles from a cohort of known affected patients with a comparable control population. Any detected genetic differences are then tested to confirm whether they relate to the underlying phenotype (Figure 2.12).

[Figure 2.12: 200 subjects with disease X and 400 normal controls are typed for 50 DNA SNPs. 47/50 SNPs show the same distribution. SNPs A, B and C are found in 30%, 10% and 2% of subjects versus 10%, 1% and 16% of controls. Final step: confirm function.]

FIGURE 2.12 An association or case control study. A large number of subjects with a disease or a particular phenotype are recruited. In the example given n = 200. Many more might be needed with complex diseases such as diabetes because the phenotype is difficult to confirm as there are many types of diabetes and the environment plays a key role. DNA polymorphisms, usually SNPs, are then taken and the 200 subjects tested to compare the profiles for a number of SNPs (50 in this example) in those with disease X versus those without disease X (normal). Usually twice as many controls are used and it is essential that the right phenotype is made. This would be difficult in diabetes unless dealing with a type that always presents in childhood. In the example illustrated, 47 of the 50 SNPs tested are distributed evenly across the two populations as shown by statistical comparisons. Three SNPs (A, B, C) are interesting because there appears to be a difference in their distribution between the two groups tested. These differences would need to be confirmed (perhaps in a larger cohort or another population) and the SNP studied to explain functionally why it might be important. Alternatively, the SNP is used to look for a gene that is in linkage disequilibrium (co-inherited) with it.

Components of an Association Study

1. Large numbers of subjects are required – in the hundreds, or thousands for conditions such as diabetes. These numbers are necessary since phenotypes are more difficult to define as they can be affected by the environment or, as illustrated by diabetes, there are different forms of the same condition. Since the gene effect is relatively small it is assumed that many genes are involved;
2. DNA polymorphisms (usually SNPs) are used to compare their distributions in patient and control groups [17];
3. Short-cuts are possible if candidate genes can be identified. Now the number of SNPs is reduced because only those closely located to the candidate gene are required, and
4. Sophisticated bioinformatics is needed to compare genetic data between the two tested cohorts.

There are many problems with association studies which lead to false positive and false negative results being reported. They result from:
1. Studying inadequate numbers;
2. Failing to select an appropriate matched control population to avoid stratification errors, i.e. differences between cases and controls due to ancestry;
3. Having inadequate discrimination from the SNPs selected;
4. High costs if large numbers of subjects and SNPs are tested;
5. Bioinformatic analytic tools are inadequate, and
6. Proving that genetic changes detected have functional significance is even more difficult than for the traditional Mendelian traits, because the genetic effect is small and the environment has an impact on the phenotype.

Some of the limitations described are now being addressed through the increasing availability of SNPs and automation. This means large scale, whole genome-based, association studies are becoming feasible, leading to the concept of GWAS (genome wide association studies).

Genome Wide Association Study (GWAS)

Developments that have enabled the GWAS strategy to replace traditional association studies include:
1. One outcome of the Human Genome Project was the initiative for developing a haplotype map – HAPMAP – of the human genome. This showed that throughout the genome there are haplotype blocks, or segments of the genome that are inherited together (Figure 2.13). This understanding enabled a more rational and cost-effective strategy for selecting SNPs;
2. Commercially prepared microarrays allow hundreds to thousands of SNPs to be measured in an automated fashion, and, as competition increases, the costs have started to fall dramatically (Chapter 4);
3. Early successes, such as the first GWAS in 2005, provided a link between macular degeneration and a genetic marker [18]. From around 2007, the number of common medical disorders for which GWAS provided additional information started to increase;
4. International consortia were formed ensuring the sample sizes for association studies could be significantly increased;
5. It became evident that complex genetic disorders were more likely to arise from defects in gene expression than from mutations in amino acids, as found for single gene Mendelian disorders. Hence, it was


[Figure 2.13: 17 SNP markers along a segment of DNA, grouped into haplotype Blocks A to D.]

FIGURE 2.13 Haplotype blocks. Depicted schematically are 17 SNP markers spread across a segment of DNA. Until haplotype blocks were discovered, it would have been usual to select at random a number of the 17 markers across this segment to ensure comprehensive coverage of this site. Now it is apparent that chunks of DNA are inherited in blocks. For example, SNP markers 1,2,3,4 are inherited together with SNP markers 5,6 in another inheritance block and so on. This knowledge has simplified analysis because for each of the four haplotype blocks described here (A to D) it would only be necessary to select one of the SNP markers to represent the others. For example in the case of Block A, you would look to see which of the 1,2,3 or 4 SNP markers was most variable, i.e. polymorphic, and use that marker to test for all four. Therefore, four SNPs would become representative of this entire segment.
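The marker selection described in Figure 2.13 can be sketched in a few lines of Python. This is a minimal illustration rather than a published tag-SNP algorithm: the allele frequencies are invented, only the two blocks whose membership is stated in the caption are used, and expected heterozygosity, 2p(1 − p), is an assumed measure of how polymorphic a marker is.

```python
# Hypothetical sketch: choose one representative (tag) SNP per haplotype block.

def heterozygosity(allele_freq):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2 * allele_freq * (1 - allele_freq)


def pick_tag_snps(blocks, allele_freqs):
    """For each block, return the SNP with the highest heterozygosity."""
    return {
        block: max(snps, key=lambda snp: heterozygosity(allele_freqs[snp]))
        for block, snps in blocks.items()
    }


# Block membership as given in the caption (markers 1-4 and markers 5-6).
blocks = {"A": [1, 2, 3, 4], "B": [5, 6]}

# Invented allele frequencies for the six markers.
allele_freqs = {1: 0.05, 2: 0.30, 3: 0.45, 4: 0.10, 5: 0.20, 6: 0.50}

tag_snps = pick_tag_snps(blocks, allele_freqs)
print(tag_snps)
```

With these invented frequencies, six markers are reduced to two, one per block, which is the saving in genotyping effort that haplotype blocks make possible.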

necessary to look beyond exons or genes to the considerably larger non-coding portions of the genome, and
6. As whole genome sequencing studies progressed, they demonstrated surprising variation between individual genomes, particularly with copy number variations (CNVs) which became another type of polymorphic marker (Chapter 1).

Hundreds of GWAS are now underway or completed across many human disorders. As the number of SNPs goes up (there are around 40 million reference SNPs in the databases) the options for selecting more informative SNPs increase, and it is possible to include CNV-type polymorphisms in these studies. The formation of international consortia for particular diseases has enabled an exponential increase in the number of subjects studied, while data on control populations continue to expand, particularly through the use of databases such as the 1000 Genomes Project (Chapter 1). This has led to important new findings, both in terms of potential novel genes and pathways for pathogenesis. Meta-analyses have also identified data that had been missed in single studies [18].

While impressive results are emerging from GWAS, it is important to highlight the inadequacies of this strategy:
1. SNPs used can only detect small effects with Odds Ratios (OR) < 5 (Chapter 3) [19];
2. SNPs implicated are likely to be surrogate markers for control regions or other parts of the genome affecting gene expression – i.e. a GWAS is only an indirect approach;
3. Because of the way in which GWAS are undertaken only known SNPs are tested. This is different to what might be possible by whole genome sequencing which gives an unbiased representation of a region being tested;
4. GWAS does not measure G x G or G x E interactions, and
5. Most studies have involved Caucasians. Other ethnic groups need to be included, as demonstrated by a GWAS of the Japanese population, which identified an additional gene implicated in type 2 diabetes [18].

GWAS approaches continue to generate important novel findings, such as additional breast cancer susceptibility loci [20]. Nevertheless,


BOX 2.3 DE NOVO MUTATIONS AS A CAUSE OF COMPLEX GENETIC DISEASE [21].

Although the genetic component in schizophrenia is significant at about 80%, its molecular basis remained elusive, even after many years of work. All that had been found were a large number of genetic loci, implicated through linkage or association studies. Using a whole exome sequencing approach (Chapter 4) 14 individuals with schizophrenia and their parents, i.e. 14 trio samples, were tested. This study showed that eight of the 14 patients had developed 15 spontaneous de novo mutations in their exons. Four mutations involved premature stop codons giving a truncated protein, and the remainder were missense changes (one amino acid substituted for another). It was calculated that the expected de novo mutation rate was around half this number, and so the changes were significant and explained some cases of schizophrenia. The report notes that a similar pattern for de novo mutations was also observed in a separate study looking at seven mental retardation trios.

Note of caution: Although potentially an important study it is a little surprising that some of the missense variants detected were readily labeled as mutations. Computer-based (in silico) assessments used gave marginal results at best. More will be said in Chapter 3 about the difficulty in classifying DNA variants, particularly as more sequencing is carried out.
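One simple way to picture why 15 observed de novo mutations might be judged significant when about half that number is expected is a Poisson tail probability. The sketch below is only an illustration under that assumption: the expected value of 7.5 is read from "around half this number", and the Poisson model is not claimed to be the analysis used in the study [21].

```python
import math


def poisson_tail(observed, expected):
    """Probability of seeing at least `observed` events when `expected`
    events are anticipated, assuming a Poisson distribution."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


observed_mutations = 15     # de novo mutations reported across the 14 trios
assumed_expected = 7.5      # assumption: "around half this number"

p_value = poisson_tail(observed_mutations, assumed_expected)
print(f"P(at least {observed_mutations} | expected {assumed_expected}) = {p_value:.4f}")
```

A small tail probability under this toy model would be consistent with the box's statement that the excess is unlikely to be due to chance, although it says nothing about whether each individual variant is truly pathogenic.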

it has been predicted that GWAS will be replaced eventually by whole genome sequencing. It is also possible that there are relatively rare alleles in the genome that will not be detected by GWAS, but which are important in complex genetic inheritance because their effects are stronger. A combination of GWAS and whole genome sequencing may provide a better understanding of these challenging genetic disorders.

Recently, a whole exome sequencing approach in schizophrenia has shown another possible mechanism for complex genetic disease through the development of spontaneous de novo mutations in somatic cells (Box 2.3). A role for epigenetics and imprinting in complex genetic disease is discussed below.

Model

As people live longer the number with dementia increases. In some countries the population aged over 65 has doubled in the past 70 years while the population aged over 85 has quadrupled. Young people are also suffering from dementia with the end result in all cases a progressive cognitive dysfunction [2]. About 50–70% of dementia cases are due to Alzheimer disease, of which less than 5% have an autosomal dominant genetic form. The remainder is sporadic and called late onset Alzheimer disease to distinguish it from the early onset genetic form.

The majority of cases of early onset Alzheimer disease have mutations in three genes:
1. Amyloid precursor protein gene (APP);
2. Presenilin 1 gene (PSEN1), and
3. Presenilin 2 gene (PSEN2).

These mutations interfere with the processing of the amyloid precursor protein coded by the APP gene on chromosome 21. Amyloid precursor protein is normally cleaved and it is hypothesized that one cleavage product called Aβ42 (it has 42 amino acids) is produced in excess because of these gene mutations. Since Aβ42 is highly amyloidogenic, it represents the primary toxic agent in Alzheimer disease and produces a characteristic pathologic picture. Copy number mutations of APP also lead to an increase in Aβ42 which might explain the dementia associated with trisomy 21 (Down syndrome).

The basis of the more common late onset, sporadic, form of Alzheimer disease is unknown. As indicated earlier, twin studies have shown a strong genetic component but apart from this it behaves like a complex genetic disorder with both genetic and environmental factors thought to play a role in pathogenesis. Environmental triggers for Alzheimer disease have not been definitively demonstrated, although some metals, toxins and viruses have been implicated.

The genetic components of late onset Alzheimer disease remain unknown, and mutations in the above three genes are not found. There is a controversial association between one gene and the risk of developing Alzheimer disease. This gene is APOE (apolipoprotein E), which has three variants ε2, ε3, ε4. Having one copy of the APOE ε4 allele increases the lifetime risk for Alzheimer disease three fold, and this goes up eight fold in those who are homozygous [2]. The ε4 allele effect predominantly leads to an earlier age of onset and may work via Aβ42 peptides. Its effect seems to be stronger in populations such as Europeans and Japanese. Nevertheless, it is important to note that there are many individuals who are ε4 positive but do not develop Alzheimer disease, and many with late onset Alzheimer disease who are not ε4 positive, hence routine DNA testing for APOE subtypes is not recommended. DNA testing to look for mutations in the three Alzheimer disease genes should be restricted to the appropriate circumstances, e.g. early onset cases, or cases with a positive family history.

EPIGENETIC INHERITANCE

Accepted views are regularly challenged in molecular medicine as new data or observations emerge. Examples include the focus on DNA changes (mutations) as the cause of disease, and the hypothesis that complex genetic disease results from G x G or G x E or G x G x E interactions. However, the following observations are not easily explicable by these mechanisms:
1. Identical (MZ) twins have essentially the same DNA content and share much of their environment yet they can develop different genetic diseases;
2. A few autosomal genes are expressed from only one parent (see imprinting below);
3. Many plants and animals have the same gene content as do humans yet the latter's phenome is considerably more complex (Table 1.7), and
4. All cells in an individual have an identical DNA profile, yet the expression of genes is tightly regulated depending on the needs of the tissue.

These discrepancies might be explained through epigenetics, i.e. mitotically heritable alterations in the pattern of gene expression mediated by mechanisms other than changes in the primary DNA sequence of a gene. The word epigenetics has the Greek prefix epi which means on top of. This implies that while the genome's DNA codes for the building blocks controlling the cell, including regulatory elements influencing gene expression, there is an additional layer to be considered. One analogy proposes that the DNA code is the cell's hardware, and epigenetics the software allowing each cell to have its own unique epigenetic pattern.

Features of epigenetics include:
1. Stable patterns propagated across multiple cell divisions (mitosis);
2. Epigenetic modifications (re-programming) during meiosis at two periods in development (gametogenesis and early embryogenesis discussed in Chapter 7);
3. Control at the transcriptional level via chemical modification of DNA or changes in chromatin or post-transcriptional regulation via ncRNAs, and
4. Dynamic processes that can be influenced by the stage of development, the environment, tissue type and stochastic events [22,23].

Thus, as well as G x G, G x E, G x G x E interactions, one can now add epigenetic (Ep) effects viz. G x Ep, E x Ep, G x E x Ep and other permutations to try and explain phenotypic variability in health and disease.

Epigenetic Modifications

Four epigenetic mechanisms can influence gene expression [22].

Methylation

This is the best characterized. The methylated form of the base cytosine is sometimes called the 5th nucleotide base, i.e. A, T, G, C and methyl C. The DNA methyltransferase (DNMT) enzymes that are found in many species add a methyl group to some cytosines at the C5 position in DNA. In mammals, DNA methylation is mostly present in CpG dinucleotides that are usually methylated in somatic cells. Methylation is stably maintained during cell division at CpG islands. When found in association with genes, CpG islands are generally located within the 5' region where promoters are situated. Genes that are transcriptionally active will be hypomethylated at the CpG islands while inactive genes are methylated. The main enzyme responsible is DNA methyltransferase 1 (DNMT1). There is a direct G x E connection here because DNMT1 uses methionine as the source of the methyl groups and this requires folic acid. Gene repression through methylation occurs because the binding of transcription factors is inhibited and chromatin co-repressors are recruited.

Histone Modification

A second epigenetic pathway involves post-translational covalent modification of four histone proteins (H2A, H2B, H3, H4) around which DNA wraps itself to form a nucleosome. The ability of genes to transcribe can be changed by modifying the N-terminal tails of the histones. Modification can occur through methylation, acetylation, phosphorylation and ubiquitination. The changes so produced have wide ranging effects including transcription, DNA repair, DNA replication, alternative splicing and chromosomal condensation.

Many different types of histone modifications have been reported. Each can contribute to the fine tuning of gene expression by acting directly on chromatin structure making it more or less accessible to transcriptional activity. Additional protein complexes can also be recruited to activate or repress chromatin structure. This potential variability in the epigenome (as well as the methylation profile) may explain differences observed in MZ twins (Box 2.4).

Nucleosome Positioning

Nucleosomes can function as barriers to transcription by blocking activators and transcription factors accessing DNA. One way to do this is through the positioning of nucleosomes relative to the transcription start sites (TSS). A shift as little as 30 bp between the nucleosome and the TSS can affect gene expression with loss of a nucleosome directly upstream of the TSS leading to gene activation, while occlusion or interference with the TSS results in gene repression. The type of histone present will significantly impact on nucleosome positioning. DNA methylation can also alter nucleosome remodeling as can large macromolecular complexes.

ncRNA

The fourth way in which epigenetic changes occur involves small non-coding RNAs (Chapter 1). This level of gene regulation occurs


BOX 2.4 TWIN STUDIES [24].

Studies comparing concordance/discordance rates for diseases between MZ (monozygotic) and DZ (dizygotic) twins are based on the premise that there are three major contributors to disease development: genetic, genetic and environment, and environment alone. However, some discrepancies have appeared in twin studies when diseases demonstrated a strong heritability factor but some MZ twins were discordant for the disease. One example was schizophrenia, which has reported heritability estimates around 80% but the concordance rate in MZ twins is in the range of 41–65%. What is going on?

As explained earlier, epigenetic changes can occur during various stages in development, in different tissues, or even as stochastic events, while the DNA sequence remains the same. Although MZ twins are derived from splitting of the same embryo, the timing of this split (an early one produces a dichorionic MZ twin while a late split gives a monochorionic MZ twin) means the progression of the epigenetic program can differ in MZ twins, i.e. dichorionic MZ twins are more likely to have a different epigenetic program than monochorionic MZ twins, since the latter have split later during development and so there is greater likelihood that the erasure and resetting of the epigenetic marks has been completed. The latter set of twins will have more comparable or even similar epigenotypes.

Female MZ twins would have another basis for difference, which is related to X chromosome inactivation (an epigenetic effect). Normally, X chromosome inactivation is a random event, but skewed X inactivation has been reported for a number of X-linked disorders, including fragile X syndrome, Duchenne muscular dystrophy, color blindness and hemophilia. This might explain the observations that, in terms of social, behavioral and cognitive measures, male MZ twins have demonstrated higher concordance rates than female MZ twins for the same traits.

Genomic imprinting, which is established through epigenetic mechanisms during gametogenesis, would also explain differences in MZ twins. For example, the observation of discordance in MZ twin pairs with the imprinted disorder Beckwith-Wiedemann syndrome was explicable because only the affected twin had an imprinting defect at KCNQ1OT1 (the gene that is not expressed from the maternal allele) leading to abnormal biallelic expression of this gene.
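The concordance rates quoted in Box 2.4 can be made concrete with a small calculation. The sketch below assumes the pairwise definition of concordance (pairs with both twins affected, divided by pairs with at least one twin affected); the box does not say which definition the cited studies used, and the twin counts are hypothetical numbers chosen only to illustrate the arithmetic.

```python
from typing import Iterable, Tuple


def pairwise_concordance(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Return the pairwise concordance rate for a set of twin pairs.

    Each pair is (twin_a_affected, twin_b_affected). Pairs in which neither
    twin is affected carry no information and are ignored.
    """
    concordant = 0  # both twins affected
    discordant = 0  # exactly one twin affected
    for twin_a, twin_b in pairs:
        if twin_a and twin_b:
            concordant += 1
        elif twin_a or twin_b:
            discordant += 1
    informative = concordant + discordant
    if informative == 0:
        raise ValueError("no affected twins in the sample")
    return concordant / informative


# Hypothetical counts (not taken from any study), for illustration only.
mz_pairs = [(True, True)] * 5 + [(True, False)] * 5 + [(False, False)] * 10
dz_pairs = [(True, True)] * 1 + [(True, False)] * 9 + [(False, False)] * 10

mz_rate = pairwise_concordance(mz_pairs)
dz_rate = pairwise_concordance(dz_pairs)
print(f"MZ concordance: {mz_rate:.2f}, DZ concordance: {dz_rate:.2f}")
```

With these made-up counts the MZ rate is higher than the DZ rate, as expected for a heritable condition, yet still well below 100%. That gap is the discordance the box attributes in part to epigenetic differences.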

post-transcriptionally. X chromosome inactivation represents the combined effects of methylation, histone modification and RNA-mediated gene silencing.

Clinical Relevance

Epigenetics (and the broader context of epigenomics) is an additional layer over the nucleotide sequence that filters out certain patterns of gene expression, particularly those involving:

1. Development;
2. Genomic imprinting;
3. Gene dosage, and
4. Genome stability.

It follows that defects in the epigenetic processes will lead to abnormalities in a range of circumstances, particularly development, aging and cancer, as well as some genetic diseases. Genes that are important for methylation or chromatin remodeling can be mutated, and failures in these epigenetic mechanisms lead


to severe genetic disorders (Table 2.9). Aging is a complex mix of environment and genetic components, with the clearest example of a genetic effect being a very rare condition called progeria (or Hutchinson-Gilford syndrome) caused by mutations in LMNA (Chapter 7 and Table 7.3). Another genetic component to aging was mentioned earlier under mtDNA. Now it is possible to add epigenetic effects that might allow a link with the environment to be established.

An interesting but recently questioned observation in MZ twins is that their global and locus-specific epigenetic profiles, including DNA methylation, histone H4 acetylation and histone H3 acetylation, change with age in a number of tissues. This epigenetic drift might be an explanation for discordance in late onset diseases in MZ twins. In addition, it may highlight the fact that errors in epigenetic pathways, which do not have repair mechanisms like DNA, play a role in the normal aging process. A specific example is the increasing methylation of the promoter regions of estrogen receptors as individuals get older [26]. This is found in the smooth muscle of the circulatory system as well as in atherosclerotic plaques occluding blood vessels. The assumption is that increased methylation plays a role in the aging/damage of the blood vessels and, if proven, this provides a potential biomarker as well as a target for novel therapies.

Drugs can be used to modify epigenetics via changing DNA methylation or histone acetylation. One example is the known demethylating drug 5-aza-2’-deoxycytidine, which has been around for many years but had limited utility, as it was associated with serious complications such as the development of leukemia. Now it is once more being used in the treatment of lung cancer. Its ability to inhibit DNA methylation in this tumor is producing promising results and, since this can happen at significantly lower doses than were used previously, the risks are less. The results of further trials are awaited.

TABLE 2.9 Some human genetic disorders caused by epigenetic abnormalities [25].

Defect in epigenetics and genes involved | Disorder; OMIM number in { } [1] | Description
Methylation defect due to various point mutations in the methyl CpG binding protein 2 gene (MECP2). | Rett syndrome {312750} | X-linked, mostly affecting females, leading to a severe progressive neurodevelopmental disorder.
Due to various mutations in the ATRX gene, which produces a protein associated with DNA methyltransferases and chromatin remodeling. | Alpha thalassemia mental retardation (ATRX) syndrome {301040} | Severe mental retardation, facial dysmorphology, skeletal abnormalities and alpha thalassemia.
Methylation defect in most cases due to mutations in the DNA methyltransferase 3B (DNMT3B) gene. | Immunodeficiency centromeric instability and facial anomalies syndrome {242860} | Severe immunodeficiency, chromosome instability and facial anomalies.
Mutations in the ribosomal S6 kinase 2 gene (RSK2) lead to changes in chromatin structure. | Coffin-Lowry syndrome {303600} | Growth retardation, facial, hand and skeletal abnormalities and mental retardation.
Chromatin defect due to mutations in the CBP gene. CREB binding protein affects histone acetyl transferase and histone methyltransferase. | Rubinstein-Taybi syndrome {180849} | Facial dysmorphology, short stature, skeletal abnormalities and mental retardation.


Transgenerational and Environmental Effects

Since epigenetic changes undergo two major reprogramming events (Chapter 7), it is rare for these changes to be passed from one generation to the next. Nevertheless, there is some evidence that it might happen in the mouse and other model organisms [27]. This transgenerational effect is thought to represent incomplete erasure of the epigenetic marks as they pass through the female germline. In one example it has been shown that changes to the mother’s diet, such as giving alcohol during pregnancy, can alter the color of her litter’s coats and this can be passed on to the next generation. While evidence for a transgenerational epigenetic effect has been reported in animals and plants, it is still not clearly defined in humans. Some examples have been cited, including the effect of famine during gestation on the methylation state of an imprinted gene IGF2, but it is difficult to confirm that this is epigenetic. Resolving this issue would help to understand the role of social determinants in disease, particularly in the disadvantaged or neglected members of the community.

Similarly, an unequivocal demonstration that the environment can alter gene expression is much sought after, because it would lead to further insight into G x E interactions. A link between epigenetics and the environment is well established in plants, as some must be exposed to long periods of cold before they are able to flower fully. This is called vernalization and is due to epigenetic changes that are responsive to environmental temperature. These inhibit flowering in long cold winters, and then the flowering genes start to be expressed in response to the warmer spring/summer temperatures. The epigenetic basis for this change in gene expression is thought to be histone modifications [28]. In honey bees it has been shown that feeding larvae different diets will determine whether they develop into workers or queen bees, yet their genomes are the same. The use of siRNA to inhibit DNA methyltransferase 3 had a similar effect, leading to the preferential development of queen bees [29].

In humans, the evidence for the mediation of environmental effects through epigenetics is not as strong. The use of periconceptional folic acid is now an accepted preventive measure to reduce the incidence of neural tube defects in pregnancy. The molecular basis of this effect may be epigenetic, since it is known that methyl groups are essential for cranial neural tube closure. Inhibiting methyl transfer, or having a low folate intake, would have a net effect of reducing DNA methylation. Other dietary epigenetic effects, including high fat and alcohol intake, and exposure to cigarette smoke and air pollutants, have been described although more evidence is needed [29].

Important questions remain about epigenetics and these will continue to drive further research in this area:

1. Which epigenetic changes represent direct cause and effect or are secondary to altered gene expression?
2. How does the environment affect the expression of genes via epigenetic changes?
3. More needs to be known about the inheritance of epigenetic effects, and
4. How can the translation of epigenetic research findings into clinical practice be improved? Few biomarkers have been identified but this is not surprising in view of the tissue and spatial complexity of epigenetic changes.

Epigenome

If the Human Genome Project was a challenge, the epigenome is considerably more complex since it:

1. Can vary between tissues and even between cells within the same tissue;
2. Is influenced by the environment;


3. Can change as a person ages, and
4. Differs between species, which makes comparative studies less useful.

Despite this, the trend in epigenetics, like other omics (Chapter 4), is to move to broader whole-genome approaches and so to epigenomics. Projects underway include mapping the whole methylome, the acetylation states in histones, and also various tissue-specific analyses. Not surprisingly an International Human Epigenome Consortium has been launched (Table 1.10).

One approach to the identification of methylated DNA sites uses the chemical bisulphite to mark these sites prior to DNA sequencing. Cytosine residues in single-stranded DNA are converted to uracil after treatment with bisulphite whereas 5-methylcytosine residues remain unchanged. Therefore, after sequencing, the presence of a cytosine will indicate where methylated cytosines were present. The unmethylated cytosines on sequencing will appear as thymine. New technologies, particularly Next Generation DNA sequencing, will allow very accurate analyses of many methylomes (Chapter 4). Presently, the focus of much epigenomic work has been directed towards methylation because this is measurable, but much remains to be learnt about acetylation and the other less well-characterized changes that alter chromatin conformation.

Imprinting

Gynogenetic embryos (both copies of each gene have a female origin) and androgenetic embryos (both copies of each gene have a male origin) do not develop into viable offspring. In mammals, successful development needs genetic material from both the male and female. In humans the inheritance of imprinted genes by uniparental disomy described below shows that abnormalities can occur if there is inappropriate dosage of certain parental alleles.

Genomic imprinting is an example of epigenetic inheritance, with the difference in gene function determined by its origin in the male or female parental germ cells. This implies that during a critical time in development, some genetic information can be marked provisionally so that its two alleles undergo differential expression. As with all epigenetic marks there occurs during development a step-wise erasure, re-establishment and then maintenance of the methylation and/or chromatin configurations. Imprinted genes undergo the same epigenetic changes during development of the gametes but they are protected from erasure in the developing embryo.

Approximately 64 imprinted genes have been identified in humans, and these genes lie within clustered regions of the genome with the two largest being chromosomes 15q11 and 11p15. Another 112 genes are predicted to be imprinted [30]. Genomic imprinting evolved in mammals around 200 million years ago before the split into marsupials and eutherians, hence imprinting is associated with live births. While imprinted genes occupy only a small subset of the mammalian genome, they are considered to be essential for normal development.

An imprinted locus will be inherited along Mendelian lines, but this may not be apparent until it can be seen that the expression pattern is dependent on a parent of origin effect (Figure 2.14). Imprinting plays a fundamental role in normal development during embryonic and postnatal life (Chapter 7). It is also involved in brain function and behavior. In cancer, the imprinting pattern in tumors can be disturbed. Since imprinting means that one of two alleles is normally inactive (imprinted), it follows that a mutation in the remaining allele can lead to genetic disease because neither allele is now expressed. However, if a mutation affects the imprinted allele, there will be no clinical consequence because the imprinted allele is not expressed. In the latter case, a mutated imprinted allele causes no immediate problem,


but it may do so in subsequent generations if the imprint is re-set because it has now been transmitted by a parent of the other sex.

FIGURE 2.14 Pedigrees depicting imprinting (parent of origin effects). An imprinted locus is inherited as a Mendelian trait but the expression of the two alleles will depend on the parent of origin. (a) The paternal allele is inactive (imprinted). There will be no expression of the mutant allele when transmitted by the father. For the mutant gene to cause disease it must pass through the maternal line. (b) The maternal allele is imprinted and the disease phenotype only becomes apparent after paternal transmission of the mutant allele. In both cases there are carriers (indicated with a dot in the circle or square) who have normal phenotypes but can transmit the trait depending on their sex. There are equal numbers of affected and unaffected males and females in each generation.

Models

Three rare syndromes are associated with the imprinted clusters on chromosomes 11p15 and 15q11. The former is linked with the Beckwith-Wiedemann syndrome (BWS) while the latter cluster is linked with the Prader-Willi syndrome (PWS) and the Angelman syndrome (AS). The clinical and molecular features of these disorders are summarized in Table 2.10. Their etiologies remained unknown until cytogenetic and then molecular analysis identified atypical modes of genetic inheritance consistent with imprinting. For BWS, an early clue was the finding of uniparental disomy with both chromosome 11 homologs coming from the father, the parent whose allele expresses the growth-potentiating IGF2 gene [1].

Distinct but adjacent segments of chromosome 15q11 are considered critical for normal development. Loss of the paternal segment of this chromosome region affects a number of paternally expressed genes, including SNRPN, NDN and MAGEL2 as well as a cluster of paternally expressed small nucleolar RNAs, and leads to PWS. Which of these is the actual causative gene(s) is not known. In contrast, loss of the maternal segment (containing the UBE3A gene which is expressed from the maternal allele in certain parts of the brain) produces AS, i.e. the two syndromes exhibit oppositely imprinted chromosomal segments but are controlled by two adjacent imprinting control regions. Just like BWS, imprinting defects can occur from deletions (of the expressing allele), uniparental disomy (involving the imprinted allele, whereas a growth disorder like BWS involves uniparental disomy of the expressing allele) or abnormalities affecting the imprinting control regions (Figure 2.15).

Imprinting is best detected, and its implications have become better understood, through molecular diagnosis. This enables accurate assessment of the parental origin for chromosomal abnormalities such as deletions, aneuploidies or uniparental disomies. For DNA diagnosis in PWS and AS, the initial DNA test determines the methylation status of the imprinting region. This is a highly reliable test, but it does not define the underlying


defect; further DNA analysis is required to do this. Counseling issues in both PWS and AS are complex but important, because parents will want to know the risks of recurrence. Generally, in PWS the risk is low if the primary defect is a de novo deletion or uniparental disomy. In AS counseling is more difficult because of the greater range of genetic perturbations involved, including cases where the underlying defect is not known. Risks are more likely to be low if there is a de novo deletion or uniparental disomy.

TABLE 2.10 Clinical, cytogenetic and DNA features of the Beckwith-Wiedemann, Prader-Willi and Angelman syndromes [1,5].

Disorder | Clinical and laboratory features
Beckwith-Wiedemann syndrome, OMIM {130650}. Occurs in about 1 in 15 000 births. | Clinical: Pediatric overgrowth disorder associated with predisposition to tumor development. Clinical features are variable. Abnormal growth may manifest in hemihypertrophy and/or macroglossia, enlarged organs and malformations. Most common tumors are Wilms and hepatoblastoma. Laboratory: Diagnosis is mostly based on clinical findings. Rare to have a cytogenetic abnormality detectable. DNA testing used to detect: (1) methylation abnormalities; (2) paternal uniparental disomy (10–20% cases) or (3) mutations in the CDKN1C gene (10–40% cases). Risk of recurrence depends on underlying molecular defect.
Prader-Willi syndrome, OMIM {176270}. Occurs in about 1 in 16 000 to 1 in 25 000 births. | Clinical: Diminished fetal activity, obesity, hypotonia, mental retardation, short stature, small hands and feet, hypogonadotropic hypogonadism. Laboratory: Paternal interstitial deletions in 70–80%. The remainder caused by maternal uniparental disomy. Rarely an imprinting center defect is found. Recurrences unlikely in those with deletions or uniparental disomy. DNA testing for methylation defects detects 99% of cases.
Angelman syndrome, OMIM {105830}. Occurs in about 1 in 10 000 to 1 in 40 000 births. | Clinical: Mental retardation, movement or balance disorder, characteristic abnormal behaviors, severe limitations in speech and language. Laboratory: ~70% cases due to a de novo maternal deletion of 15q11.2-q13 critical region. ~2% cases due to paternal uniparental disomy. ~2% cases due to defects in imprinting control region or point mutations in the UBE3A gene. 10% will have no genetic abnormality detectable. Recurrences unlikely in those with deletions or uniparental disomy. DNA testing for methylation defects or mutations in UBE3A detects ~90% cases.

Imprinting and Complex Genetic Disorders

As indicated earlier, the molecular basis for complex genetic disorders is yet to be adequately defined. Interest is now turning to the moderately rare but higher penetrant SNPs or genes, which cannot be found by GWAS type approaches but may be detected through whole genome sequencing (Figure 2.11, Chapter 4). However, there is more to consider, with a recent observation suggesting that superimposing parent of origin effects (i.e. imprinting) on the results of GWAS might lead to better discrimination in terms of risk. In one study, imprinting effects were identifiable because the genealogy of the study population (Icelandic) could be accurately characterized. This showed that some SNPs associated with diseases such as breast cancer, basal cell cancer and type 2 diabetes demonstrated parent of origin effects [33]. This interesting finding opens up an additional dimension for thinking about G x E interactions in complex genetic disorders.

SOMATIC CELL GENETICS

The discussion in this section focuses on how genes and the environment can lead to disease, with the latter now known to play a significant role. Although called genetic (the DNA sequence is mutated), the changes only involve somatic cells and so cannot be transmitted to offspring. This means that the ethical, legal and social implications (ELSI) are significantly different


because family members do not share the DNA changes; hence there are no risk implications for the family. Table 3.7 shows a classification of the various DNA tests available and how somatic cell DNA testing compares to the others in terms of ELSI. There are also some similarities between the two types of genetic disorders, since both can be grouped into single gene disorders or more complex gene-environment interactions.

Single Gene Somatic Disorders

Mutations in somatic cells involving single genes can cause genetic disorders. An example is the very rare Proteus overgrowth syndrome which has been shown by extensive DNA sequencing to be caused by somatic mosaicism involving mutations in the AKT1 gene [1]. Without the mosaicism, mutations in the same gene are predicted to be lethal.

[Figure 2.15 diagram. Chromosome 11p15 gene box: IGF2, INS, KCNQ1OT1, H19, KCNQ1, CDKN1C, TSSC5, TSSC3. Chromosome 15q11 gene box: SNRPN?, MKRN3, MAGEL2, NDN, UBE3A, ATP10C. Maternal and paternal alleles are drawn for IGF2 and H19 (11p15) and for SNRPN and UBE3A (15q11).]

FIGURE 2.15 Epigenetic control of imprinted regions in two large chromosomal clusters [31,32]. Imprinted (inactive) genes are depicted as red boxes while the active genes are green boxes. The blue symbols represent ICR (imprinting control regions), with the one on chromosome 15 more complex since it is controlling the two downstream genes depicted. The better understood locus is the ICR in the chromosome 11p15 region. Here the maternal allele normally expresses the H19 gene while the paternal allele expresses the IGF2 gene. The genes of the nearby KCNQ1 gene cluster, also differentially imprinted, are listed in the box with the paternally expressing ones uppermost. This cluster has its own ICR. The ICR between IGF2 and H19 is methylated in the paternal line and so a transcription factor (CTCF) does not bind, allowing a downstream enhancer to activate transcription at IGF2. In contrast, the maternal ICR is not methylated, so CTCF binds and this inhibits the IGF2 promoter from being activated by the downstream enhancer. The two key genes in the 15q11 chromosomal region are UBE3A and SNRPN, and other imprinted genes are listed in the box (paternally expressing ones uppermost). The ICR at this locus is more complex and is located next to the SNRPN gene. In the brain, the maternal allele expresses the UBE3A gene. A question mark is next to the SNRPN gene because, unlike the UBE3A gene, it is not conclusively shown to be the definitive causative gene for PWS.
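The CTCF-dependent switch described in the legend to Figure 2.15 can be written out as a few lines of logic. The sketch below is a deliberately simplified model, not a description of the full locus: the names `AlleleExpression`, `allele_expression` and `igf2_dosage` are hypothetical, and tying H19 expression directly to an unmethylated ICR is an assumption made so that the maternal allele expresses H19 as the legend states.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AlleleExpression:
    ctcf_bound: bool
    igf2_expressed: bool
    h19_expressed: bool


def allele_expression(icr_methylated: bool) -> AlleleExpression:
    """Model one chromosome 11p15 allele from the state of its IGF2/H19 ICR."""
    # Per the figure legend: a methylated ICR prevents CTCF from binding.
    ctcf_bound = not icr_methylated
    # Without CTCF the downstream enhancer can activate IGF2;
    # with CTCF bound the IGF2 promoter is shielded from the enhancer.
    igf2_expressed = not ctcf_bound
    # Assumption for this sketch: H19 is expressed from the unmethylated allele.
    h19_expressed = not icr_methylated
    return AlleleExpression(ctcf_bound, igf2_expressed, h19_expressed)


def igf2_dosage(icr_methylation_per_allele: Iterable[bool]) -> int:
    """Count how many alleles express IGF2."""
    return sum(allele_expression(m).igf2_expressed for m in icr_methylation_per_allele)


# (paternal allele, maternal allele): the paternal ICR is normally methylated.
normal = (True, False)
# Paternal uniparental disomy: both homologs carry the paternal (methylated) ICR.
paternal_upd = (True, True)

print("IGF2-expressing alleles, normal:", igf2_dosage(normal))
print("IGF2-expressing alleles, paternal UPD:", igf2_dosage(paternal_upd))
```

Under these assumptions, paternal uniparental disomy doubles the number of IGF2-expressing alleles, which is in keeping with the overgrowth seen in BWS and the growth-potentiating role of IGF2 noted in the text.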


Another unusual example is paroxysmal nocturnal hemoglobinuria (PNH) which manifests with the triad of hemolytic anemia, venous thrombosis and bone marrow failure leading to cytopenia. PNH is acquired and never inherited; therefore it must arise spontaneously during embryogenesis but does not involve the germ cells. As expected, the defect is found in a hematopoietic stem cell which demonstrates mosaicism. The latter is essential because, on the basis of transgenic animal studies, the mutations in the gene PIG-A (phosphatidyl inositol glycan class A) which cause PNH are incompatible with life if all cells are involved [34]. The PIG-A gene is required for the production of an anchor protein that links other proteins to the cell surface membrane.

Hematologists have long considered PNH to be a complex and challenging disease to understand and treat. Even though the genetic defect has been found, the complete story remains elusive. For example, males and females are equally affected, but the PIG-A gene is located on the X chromosome. This is possible if the mutations arise after the completion of X inactivation, so both males and females have one functional X chromosome and are effectively hemizygous for PIG-A function. Another observation is that mutations in this gene have been described in normal individuals and PNH only occurs following expansion of the PNH clone. These are explained by linking the development of PNH to other gene or environmental factors, particularly bone marrow aplasia and a selective advantage of the PNH clone leading to its preferential survival. How the latter occurs remains unclear although it is proposed that there is cell mediated autoimmune damage to the non-PNH stem cells [34].

Complex Somatic Disorders

Today, there is an increasing interest in the somatic cell genetic changes found in commonly occurring sporadic cancers. Cataloging DNA changes is furthering our understanding of pathogenesis and, in terms of personalized medicine, therapeutic options. With this in mind, the International Cancer Genome Consortium was formed in 2010 to sequence the entire genome of 50 different tumors (Table 1.10, Chapter 7).

Somatic cell DNA defects are likely to arise from environmental insults that initiate and then progress the development of a tumor. Detecting mutations in DNA from somatic cells is used in clinical practice to:

1. Confirm a diagnosis; e.g. the Philadelphia chromosomal translocation in chronic myeloid leukemia;
2. Provide clinical information about prognosis; e.g. MammaPrint™ in breast cancer discussed in Chapter 3, and
3. Personalize treatment options, particularly in chemotherapy, which uses expensive and potentially toxic drugs.

The pharmacogenetic and oncologic implications of somatic DNA mutations in cancer are discussed further in Chapters 3 and 7.

References

[1] Online Mendelian Inheritance in Man (OMIM). www.ncbi.nlm.nih.gov/omim
[2] Nussbaum RL. Genetics and genomics of dementia. In: Ginsburg GS, Willard HF, editors. Essentials of Genomic and Personalized Medicine. San Diego: Elsevier; 2010. p. 687–99.
[3] Stein CM. Identifying genes underlying human inherited disease. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2010.
[4] Tramontano A. Bioinformatics. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[5] NIH GeneTests. www.ncbi.nlm.nih.gov/sites/GeneTests/?db=GeneTests
[6] ACMG/ASHG Statement. Laboratory guidelines for Huntington disease genetic testing. American Journal of Human Genetics 1998;62:1243–7.
[7] Janssen MCH, Swinkels DW. Hereditary haemochromatosis. Best Practice & Research in Clinical Gastroenterology 2009;23:171–83.


[8] Allen KJ, Gurrin LC, Constantine CC, et al. Iron-overload-related disease in HFE hereditary hemochromatosis. New England Journal of Medicine 2008;358:221–30.
[9] Galanello R, Origa R. Beta-thalassemia. Orphanet Journal of Rare Diseases 2010;5:11.
[10] Harteveld CL, Higgs DR. α thalassaemia. Orphanet Journal of Rare Diseases 2010;5:13.
[11] Williams TN, Mwangi TW, Wambua S, et al. Negative epistasis between the malaria-protective effects of α thalassemia and the sickle cell trait. Nature Genetics 2005;37:1253–7.
[12] Menzel S, Garner C, Gut I, et al. A QTL influencing F cell production maps to a gene encoding a zinc-finger protein on chromosome 2p15. Nature Genetics 2007;39:1197–9.
[13] Hall JG. Mosaicism. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2005.
[14] Hofmann S, Bauer MF. Mitochondrial disorders. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2005.
[15] Tuna M, Amos CI. Uniparental disomy in cancer – a new tool in molecular cancer. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2010.
[16] Overview of diabetes from the US National Diabetes Information Clearinghouse. http://diabetes.niddk.nih.gov/dm/pubs/overview/
[17] Information on SNPs. www.ornl.gov/sci/techresources/Human_Genome/faq/snps.shtml
[18] Chee-Seng K, Yun LE, Yudi P, Kee-Seng C. Genome wide association studies: the success, failure and future. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[19] Hardy J, Singleton A. Genomewide association studies and human diseases. New England Journal of Medicine 2009;360:1759–68.
[20] Turnbull C, Ahmed S, Morrison J, et al. Genome-wide association study identifies five new breast cancer susceptibility loci. Nature Genetics 2010;42:504–7.
[21] Girard SL, Gauthier J, Noreau A, et al. Increased exonic de novo mutation rate in individuals with schizophrenia. Nature Genetics 2011;43:860–4.
[22] Portela A, Esteller M. Epigenetic modifications and human disease. Nature Biotechnology 2010;28:1057–68.
[23] Relton CL, Davey Smith G. Epigenetic epidemiology of common complex disease: prospects for prediction, prevention and treatment. PLoS Medicine 2010;7:e1000356.
[24] Nipa Haque F, Gottesman II, Wong AHC. Not really identical: epigenetic differences in monozygotic twins and implications for twin studies in psychiatry. American Journal of Medical Genetics Part C (Seminars in Medical Genetics) 2009;151C:136–41.
[25] De Sario A. Clinical and molecular overview of inherited disorders resulting from epigenetic dysregulation. European Journal of Medical Genetics 2009;52:363–72.
[26] Gilbert SF. Ageing and cancer as diseases of epigenesis. Journal of Biosciences 2009;34:601–4.
[27] Daxinger L, Whitelaw E. Transgenerational epigenetic inheritance: more questions than answers. Genome Research 2010;20:1623–8.
[28] Finnegan EJ, Helliwell C, Sheldon C, Peacock WJ, Dennis ES. Vernalization. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2010.
[29] Kim K-C, Friso S, Choi S-W. DNA methylation, an epigenetic mechanism connecting folate to healthy embryonic development and aging. Journal of Nutritional Biochemistry 2009;20:917–26.
[30] Lists of imprinted genes by species. www.geneimprint.com/site/genes-by-species
[31] Sha K. A mechanistic view of genomic imprinting. Annual Review of Genomics and Human Genetics 2008;9:197–216.
[32] Ager EI, Pask AJ, Gehring HM, Shaw G, Renfree MB. Evolution of the CDKN1C-KCNQ1 imprinted domain. BMC Evolutionary Biology 2008;8:163.
[33] Kong A, Steinthorsdottir V, Masson G, et al. Parental origin of sequence variants associated with complex diseases. Nature 2009;462:868–74.
[34] Luzzatto L. Paroxysmal nocturnal hemoglobinuria: an acquired X-linked genetic disease with somatic-cell mosaicism. Current Opinion in Genetics and Development 2006;16:317–22.

Note: All web-based references accessed on 13 Feb 2012.

CHAPTER 3 DNA Genetic Testing

OUTLINE

Introduction
DNA Variants
    DNA Polymorphisms
    Mutations
    Nomenclature
Detecting DNA Variants
    Polymerase Chain Reaction
    Direct Mutation Detection
    Indirect Mutation Detection
Calculating Risk
    Mendelian Disorders
    Complex Genetic Disorders
DNA Genetic Tests
    New Tests
    Classes of Tests
    Pharmacogenetics and Pharmacogenomics
Evaluation
    ACCE
Challenges
    Genetic Counseling
    Medical Management
References

INTRODUCTION

DNA genetic testing (or DNA test, the terms are used interchangeably) describes a laboratory assay that identifies a genotype or sets of genotypes for a disease in a population and for a particular purpose [1]. Another name is molecular genetic testing, but this could be confused with cytogenetic tests such as FISH (fluorescence in situ hybridization) or aCGH (array comparative genomic hybridization) discussed in Chapter 4.

Apart from the technical aspects of DNA genetic testing, a recurring theme throughout this chapter will be the different clinical contexts for testing and how these affect the significance of the test and its delivery. Unlike the traditional pathology test, such as a hemoglobin measurement, the DNA test is more complex because it:

1. Does not necessarily have a normal value;
2. Can be used for multiple purposes;
3. May require additional support or counseling linked to testing, and
4. Relies on technology that is continually evolving (Figure 3.1).

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00003-7 © 2012 Elsevier Inc. All rights reserved.

[Figure 3.1 chart. Vertical axis: Complexity (arbitrary scale, 0 to 80). Horizontal axis: 1970s, 2000s, Genomics. Series: Technology, Interpretation, Support.]

FIGURE 3.1 Evolution of DNA genetic tests. In the 1970s the major feature of the DNA test was its technical complexity, with turnaround times measured in weeks. Known mutations were sought and results were easy to interpret. The available clinical infrastructure, i.e. counseling, family and support groups, was rudimentary. In the 2000s the technology simplified, with many tests available in kit form. Once DNA sequencing started to become the technique of choice, the significance of the results became a limitation as DNA variants of unknown significance (VUS) were found. The level and type of support increased. The next stage involves genomic-based DNA/RNA tests. The technology is far more complex so fewer laboratories have the expertise. What the results mean is still being evaluated in research studies. The support needed has become more of a challenge as the depth of information (useful and extraneous) increases. The complexity scale is arbitrary, based on personal impressions.

An important distinction in the DNA genetic tests already mentioned in Chapter 2 is the subdivision of genetics into:

1. Germline, or germ cell, or constitutive DNA tests: the patient as well as family members are implicated, and
2. Somatic cell or acquired DNA tests: the result impacts on the patient alone and so the ethical, legal and social issues (ELSI) are essentially no different to the more traditional pathology tests.

Applications of somatic cell DNA testing are mostly found in cancer therapies (Chapter 7). The discussion to follow is predominantly about germline DNA tests.

DNA VARIANTS

DNA Polymorphisms

DNA genetic sequences vary considerably between individuals. These changes are collectively called DNA variants. Most DNA variants have little apparent functional significance, in which case they are known as DNA polymorphisms. By convention, a polymorphism is a difference in DNA sequence that occurs in at least 1% of the population. Since only about 1–2% of the human genome contains sequences for protein-coding genes, the great majority of polymorphisms will not directly affect gene activity, although a polymorphism falling within a regulatory region in the genome might have functional implications. Some DNA variants can alter an amino acid in the protein. This might still be classed as a neutral variant or polymorphism if the change in amino acid does not interfere with a protein’s function. Some variants do not change an amino acid but might still impact on gene function through changes in splicing. If the amino acid change does have an effect on protein or gene function it is called a mutation. In a number of cases it is difficult to decide whether a variant is pathogenic or not. These are called variants of unknown significance (VUS).

DNA polymorphisms are used for many purposes in molecular medicine, from forensic DNA typing (Chapter 9) to DNA linkage analysis (see below), a technique that allows diseases to be traced through families. There are a number of different DNA polymorphisms (RFLP, VNTR, SSR, STR, SNP) (Figure 3.2). In clinical medicine, the two relevant ones are:

1. Microsatellites (also called simple sequence repeats (SSRs); in forensic science they are called short tandem repeats (STRs)), and
2. Single nucleotide polymorphisms (SNPs).

MOLECULAR MEDICINE 3. DNA Genetic Testing 83

[Figure 3.2 schematic: four panels labelled RFLP (restriction sites E, E* and E), VNTR (restriction sites E and E), SSR, and SNP (example sequence A G C T T C G C A G A).]

FIGURE 3.2 Four types of DNA polymorphisms (see also Figure 1.6). DNA polymorphisms are produced by changes in the nucleotide sequence or length. These result from: (i) Variations in the fragment length pattern produced after digesting DNA with restriction enzymes, (ii) Variations in the size of a DNA fragment after PCR amplification, and (iii) Variations in the DNA sequence itself. DNA polymorphisms include: (1) RFLP – restriction fragment length polymorphism; (2) VNTR – variable number of tandem repeats; (3) SSR – simple sequence repeats or STR – simple tandem repeat, i.e. microsatellites, and (4) SNP – single nucleotide polymorphism.

RFLP: A segment of DNA is digested with a restriction enzyme E. This segment can be identified in Southern blot analysis by using a DNA probe that will hybridize to the marked segment, or PCR can be used to amplify this specific region. RFLPs result from point mutations affecting a single restriction enzyme recognition site (E*) which will either be absent or present. If absent, enzyme E will digest the DNA at the two outside E sites; if present, enzyme E will digest DNA at E* as well as the two outer E sites. The position of the probe means only fragments generated from E* and the E site on the right will be detected. Therefore, RFLPs are biallelic, i.e. they give two fragment sizes (large or small) depending on whether the polymorphic restriction fragment site (*) is absent or present respectively.

VNTR: The multiallelic VNTR has the potential to be more polymorphic (and so more informative) since the changes in the E-specific restriction fragment are brought about by the insertion of a variable number of repeat units at the polymorphic site (hatched area). Thus, more polymorphic DNA fragments are generated, e.g. the four different sized fragments illustrated. Because of their greater intrinsic variability, VNTRs are more informative since there is a greater chance that heterozygous patterns will be detected at any one locus. Examples of VNTRs detected by restriction enzyme digests are the minisatellites.

SSR: In contrast to RFLPs or VNTRs that can be identified by Southern analysis and PCR, SSRs (or STRs) are much smaller in size and so are detectable only by PCR. They are the microsatellites and are polymorphic because the repeats involve simple sequences, usually 2–4 bases long, such as (CA)n. Amplification of DNA containing an SSR will produce fragments of variable size.

SNPs: These single base changes are similar to RFLPs but SNPs are more useful because they do not have to change the restriction enzyme digestion pattern to be detected by DNA sequencing. SNPs can be biallelic or have more than two alleles. Because these are found frequently throughout the genome (four are depicted here in relationship to the marked region, which could represent a gene) they have the potential to discriminate alleles more effectively.

As noted in Chapter 1, SNPs are also known as SNVs (single nucleotide variations).

SSR-type polymorphisms involve 2–4 base repeats such as (AC)n or (GAA)n, where n can be any number. Until recently, microsatellites were the workhorse of molecular medicine, but now the focus has shifted to SNPs because of their utility in studying complex genetic disorders (Chapter 2) [2].
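As an illustration of why microsatellites produce PCR fragments of different sizes, the following minimal sketch counts the repeat number n of an (AC)n tract in two alleles. The sequences, the flanking bases and the function name are invented for the example; real genotyping is done by sizing PCR products rather than by string matching.

```python
import re

def longest_repeat_run(sequence, unit):
    """Return the largest n for which (unit)n occurs uninterrupted in sequence."""
    pattern = re.compile(f"(?:{re.escape(unit)})+", re.IGNORECASE)
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Two hypothetical alleles that differ only in the number of AC repeats.
allele_a = "GGTT" + "AC" * 12 + "TTGA"
allele_b = "GGTT" + "AC" * 17 + "TTGA"

n_a = longest_repeat_run(allele_a, "AC")
n_b = longest_repeat_run(allele_b, "AC")

# The amplified fragments differ in length by (n_b - n_a) repeat units.
size_difference_bp = (n_b - n_a) * len("AC")
print(n_a, n_b, size_difference_bp)
```

In this toy case the two alleles would give PCR products differing by 10 bp, which is the kind of size difference resolved by electrophoresis.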


TABLE 3.1 Frequencies for different types of DNA mutations [3].(a)

Columns: #1 = all genes, #2 = cystic fibrosis, #3 = α thalassemia genes.

                                  Total number of          Percentage reported
                                     mutations              for the same
Type of mutation                    #1     #2    #3       #1     #2     #3
Missense(b) and nonsense        60 489    866   194     56.0   58.0   58.0
Splicing                        10 210    198    12      9.4   13.0    3.6
Regulatory                       1 909      9    10      1.8   0.60    3.0
Small deletions                 17 040    202    29     16.0   13.5    8.6
Small insertions                 7 034     79     5      6.5    5.3    1.5
Small insertions/deletions(c)    1 601     26     2      1.5    1.7    0.6
Gross deletions                  6 938     70    71      6.4    4.7   21.0
Gross insertions                 1 454     15     6      1.3    1.0    1.8
Complex rearrangements           1 035     27     7      1.0    1.8    2.1
Repeat variations                  336      8     0      0.3    0.5      0
Total                          108 046  1 500   336        –      –      –

(a) Apart from showing the types of mutations and their distribution, the table also illustrates that some genes have their own profile of mutations, e.g. the large number of gross deletions in α thalassemia.
(b) Missense mutations are classed as non-synonymous mutations because they change one amino acid to another. However, silent or synonymous variants that result from changes in the nucleotide sequence without a corresponding change in the amino acid (because the genetic code is degenerate – Table 1.3) should not be discounted as being pathogenic because they can affect gene function through changes in splicing.
(c) Usually abbreviated to indels.
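The percentage columns of Table 3.1 follow from the counts. The short sketch below recomputes them for column #1 (all genes); the dictionary and function names are chosen for this example only, and the recomputed values may differ slightly from the printed ones, which appear to have been rounded.

```python
# Counts transcribed from Table 3.1, column #1 (all genes).
all_genes = {
    "Missense and nonsense": 60489,
    "Splicing": 10210,
    "Regulatory": 1909,
    "Small deletions": 17040,
    "Small insertions": 7034,
    "Small insertions/deletions": 1601,
    "Gross deletions": 6938,
    "Gross insertions": 1454,
    "Complex rearrangements": 1035,
    "Repeat variations": 336,
}

def percentages(counts):
    """Return each category's share of the total, as a percentage to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

shares = percentages(all_genes)
for name, share in shares.items():
    print(f"{name:<28}{share:>6.1f}")
```

Running the same function on the cystic fibrosis or α thalassemia counts shows how a single gene can have its own mutation profile, as noted in footnote (a).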

Mutations

DNA genetic tests demonstrate how discoveries in molecular medicine have impacted on clinical care. These tests look for mutations in DNA using a variety of techniques. The word heterogeneity will frequently appear when describing DNA mutations since, with very few exceptions, the number and types that can alter a gene's function are extensive. They range from single base changes to complex chromosomal rearrangements. For example, there are around 1 500 mutations that produce cystic fibrosis. Browsing various mutation databases provides a good overview of the types of abnormalities found, with single base changes (missense/nonsense) the most common followed by deletions (Table 3.1). Single base changes that cause one amino acid to be substituted by another predominate, and are known as missense changes. Not surprisingly, heterogeneity at the genotype (DNA) level is expressed as heterogeneity in the phenotype (clinical picture). Even within families, both subtle and not so subtle differences may be seen between affected individuals. Abnormalities in the epigenetic pathway are described in Table 2.9, and these are called epimutations.

As DNA sequencing is becoming the preferred approach for mutation detection, an increasing number of DNA variants are being found. In these circumstances it is often difficult to be sure whether a DNA change is a mutation or a DNA polymorphism. To distinguish the two, the variant can be investigated by in vitro, in vivo or in silico strategies (Box 3.1). In clinical practice it is usual to rely on an in silico assessment because this approach is fast, and is realistically the only option in a busy diagnostic laboratory with a high throughput, where a quick turnaround time is required. In the counseling process, it is therefore essential to ensure that individuals and families are aware how the DNA variant was analyzed for pathogenicity, and health professionals ordering DNA genetic tests must understand their limitations. There is further discussion on interpreting DNA test results in the section "Medical Management".

BOX 3.1 IN VITRO, IN VIVO AND IN SILICO ANALYSES.

When a new DNA variant is found, the important next question is: does it alter gene function? This is not easy to decide and in the 1980s it was usual to insert the gene with its variant into a plasmid in an expression vector which was then added to a cell line. The effect of the DNA variant on gene function was then compared to a normal (wild-type) gene in this in vitro assay. While these types of assays were not very physiological or even at times reproducible, they gave some indication whether the DNA variant altered gene expression. Other in vitro approaches utilized reverse-transcriptase (RT) PCR allowing alternative transcripts to be identified. These were useful if DNA variants were thought to alter splicing. However, they only showed that alternative transcripts were detectable, which might not be representative of what actually was happening in vivo.

For getting a more relevant physiological phenotype, the gold standard became the generation of transgenic animals, particularly the mouse. By inserting a gene with the DNA variant it was possible in some cases to get a clear view of what the variant did in vivo. However, making transgenic animals is time consuming and expensive, and as the discovery of DNA variants increased exponentially a more efficient approach was needed. This involved software (in silico analysis) to model what the DNA variant might do to the structure of DNA, RNA or the protein, and to predict through comparisons with the same genes in model organisms the conservation of DNA sequence at the site of the variant. The more conserved the region of DNA or an amino acid, the more likely was a change to be pathogenic.

Today, the in silico approach allows many variants to be processed quickly but this analysis alone does not provide definitive proof that a variant is pathogenic. Increasingly, DNA genetic test reports are now adding that a finding is a variant of unknown significance (VUS). This provides a strong message to the clinician that there is uncertainty about clinical significance. It will also require follow-up by the laboratory at some future date to reassess in the light of any new information that might have emerged. This follow-up will place an increasing burden on the laboratory, particularly as the number of VUS reports increases with Next Generation DNA sequencing approaches. One recent publication suggests that for the BRCA1 and BRCA2 genes the percentage of VUS results is around 10–20% [4]. More on in silico analysis is found in Chapter 4 under bioinformatics.

Nomenclature

The naming of genes has been standardized through the work of the Human Genome Organisation's (HUGO) Human Gene Nomenclature Committee, which has now approved over 28 000 human gene symbols and names. These are listed on its website [5]. An internationally consistent approach to the naming of genes ensures each has a unique identifier. This is critical for research and clinical service delivery. Unfortunately, there are historical names that continue to cause confusion, but this is inevitable in a rapidly changing field. As an example, the name for the gene implicated in Huntington disease was IT15 (interesting transcript 15) for many years, but its official name is HTT. Genes for humans are usually written in upper case, while those for animals use lower case. The names of genes are italicized; e.g. CFTR is the cystic fibrosis transmembrane conductance regulator gene, mutations in which produce cystic fibrosis. In a mouse the same gene would be written as cftr. Symbols for proteins are not italicized.

The second challenge for nomenclature has been the naming of DNA variants. Because of the heterogeneity of DNA mutations, there has been considerable confusion regarding how best to describe changes in DNA, RNA and proteins. Since the late 1990s, considerable effort has gone into putting order into this increasingly complex and confusing field. This is now the work of the Human Genome Variation Society or HGVS [6]. The key rule is that variants should be described at the most basic level – the DNA. Variant names must also relate to a reference sequence, which can either be genomic DNA or coding DNA. Examples of nomenclature are given in Table 3.2.

TABLE 3.2 Nomenclature for DNA mutations.

Mutation and disease: Premature stop codon causing erythropoietic protoporphyria
Gene and HGVS nomenclature(a): FECH gene: p.Gly321GlyfsX15; c.963delG; g.32385delG; NC_000018.8
Comments: Indicates that in the gene's genomic DNA sequence at nucleotide 32 385 (or the coding sequence at nucleotide 963) there is a single base deletion of G (guanine). This produces a frame shift (fs) at codon 321 with a new amino acid still remaining as a glycine (Gly). However, because of the frameshift there is a premature stop codon (X) at the 15th codon relative to the G deletion.

Mutation and disease: Sickle cell disease (HbS) (missense change)
Gene and HGVS nomenclature(a): HBB gene: p.Glu7Val; c.20A>T; g.70614A>T; NM_000518.4
Comments: For historical reasons HbS (glutamic acid is replaced by valine in codon (CD) 6) is well entrenched and so both descriptions are used in official reports. The HGVS nomenclature describes CD6 as CD7 because by convention the A of the ATG (start) codon is nucleotide number 1. In the case of HBB this means the codon numbers based on the old terminology will all increase by 1.

Mutation and disease: Cystic fibrosis mutation ∆F508 (small deletion)
Gene and HGVS nomenclature(a): CFTR gene: p.Phe508del; c.1522_1524 del TTT; g.84631_84633del3; NM_000492.3
Comments: The older terminology uses ∆ = deletion and F = phenylalanine at the 508 position involving the cystic fibrosis gene CFTR. p.Phe508del gives similar information.

Mutation and disease: Hemochromatosis mutation Cys282Tyr (missense change)
Gene and HGVS nomenclature(a): HFE gene: p.Cys282Tyr; c.845G>A; g.26201120G>A; NM_000410.3
Comments: This is the usual mutation found in Northern Europeans with genetic hemochromatosis and the HFE gene. It is also written as C282Y (C = cysteine; Y = tyrosine).

(a) g. = genomic sequence; c. = coding sequence; p. = protein sequence; NC_ = the NCBI's Reference Sequence (RefSeq) with C referring to complete genomic molecules including genomes, chromosomes etc. More commonly NM or NG are used, which refer to mRNA transcripts or genomic sequence respectively. In the FECH example here 18.8 is the accession number (18) and the version number (8).
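To show how the DNA-level names in Table 3.2 are built up, the sketch below splits a few of them into their parts. It is a deliberately simplified illustration written for this section, not the HGVS specification: it handles only single base substitutions and simple deletions at the c. or g. level, and the function name and returned fields are assumptions made for the example.

```python
import re

# Simplified patterns: <level>.<position><ref>><alt>  and  <level>.<start>[_<end>]del[bases]
SUBSTITUTION = re.compile(r"^(?P<level>[cg])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")
DELETION = re.compile(r"^(?P<level>[cg])\.(?P<start>\d+)(?:_(?P<end>\d+))?del(?P<bases>[ACGT]*)$")

def parse_simple_hgvs(name):
    """Parse a simple c./g. substitution or deletion; return None for anything else."""
    name = name.replace(" ", "")  # tolerate spacing such as "c.1522_1524 del TTT"
    m = SUBSTITUTION.match(name)
    if m:
        return {"level": m["level"], "type": "substitution",
                "position": int(m["pos"]), "ref": m["ref"], "alt": m["alt"]}
    m = DELETION.match(name)
    if m:
        start = int(m["start"])
        end = int(m["end"] or m["start"])  # a single base deletion has no end position
        return {"level": m["level"], "type": "deletion",
                "start": start, "end": end, "bases": m["bases"]}
    return None

for example in ["c.845G>A", "c.963delG", "c.1522_1524 del TTT", "p.Cys282Tyr"]:
    print(example, "->", parse_simple_hgvs(example))
```

Protein-level names such as p.Cys282Tyr are intentionally left unparsed here, which mirrors the rule that a variant is described first at the DNA level.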


Mutations can also be considered in terms of their effect on DNA structure. For example, a C to T substitution is called a transition because a pyrimidine base changes to another pyrimidine, i.e. C ↔ T (transitions also involve purine to purine changes, i.e. A ↔ G). In contrast a transversion involves changes in purines to pyrimidines or vice versa.

This degree of detail must seem esoteric to the practicing clinician. Nevertheless, it is worth noting that patients (and families) often know a lot about their genetic disorder and regularly access the Internet to learn about new developments. Hence, a health professional who does not understand what a mutation means is disadvantaged very early on in the consultation if questions are asked about the implications of the family's DNA mutation, or commonly recurring mutations such as Cys282Tyr (HGVS nomenclature p.Cys282Tyr) are discussed.

While producing order in chaotic terminology is to be applauded, it is also important to reiterate that molecular medicine is personalized healthcare. Patients and families must be involved and should not be considered passive but interested participants. This is further considered in Chapter 5. Thus, terminology such as HbS illustrated in Table 3.2 might be more meaningful than the official HGVS name (p.Glu7Val).

DETECTING DNA VARIANTS

Polymerase Chain Reaction

In 1985, work by the Cetus Corporation in California made it possible to target segments of DNA with oligonucleotide primers and then amplify them with the polymerase chain reaction (PCR). The extraordinary contributions made by PCR in medicine, industry, forensics and research were recognized by the award of a Nobel Prize to K Mullis in 1993. Subsequently, the development of automated PCR gave it enormous potential for mutation analysis in genetic disorders. The identification of DNA components also allowed new ways to detect the agents causing infectious diseases. A patent was obtained to cover the use of PCR, illustrating the growing importance of commercialization in rDNA technology (Chapter 10).

Clinical health professionals rely on their laboratory colleagues for the technical aspects of DNA genetic testing, but need to understand the utility as well as the limitations of PCR, not least to be able to explain them to those undergoing the test.

PCR is an in vitro technique for the amplification of target DNA. It utilizes a DNA extension enzyme (DNA polymerase) which adds nucleotide bases in a 5′ to 3′ direction to a single-stranded template (Figure 3.3). There are three basic steps in PCR:

1. Denaturation of double-stranded DNA into its single-stranded form;
2. Annealing of oligonucleotide primers to both ends of a target sequence. The oligonucleotide primers are a type of DNA probe – i.e. they are constructed so that they are complementary to target DNA, but unlike DNA probes, primers are more likely to be used in a technique such as PCR than for detecting DNA mutations. This complementarity, which extends over a distance of about 20 bases, is sufficient to ensure specificity – i.e. it will not bind to other regions of the genome if the right conditions for PCR are used. Oligonucleotide primers are available commercially, and
3. Addition of the four nucleotide bases and a DNA polymerase. Taq polymerase is used since it is relatively heat resistant, allowing the denaturation step to be incorporated into the overall cycle without interfering with the polymerase activity. The introduction of Taq polymerase meant PCR could become fully automated and enclosed. The latter is an important consideration for avoiding contamination. Provided there is knowledge of the DNA sequence, setting up a PCR-based test is relatively straightforward.
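The role of primer complementarity in the annealing step can be sketched in a few lines. This is a toy model under stated assumptions: the template and primers are invented, the primers are much shorter than the roughly 20 bases used in practice, and annealing is reduced to an exact string match, which ignores temperature, mismatches and the rest of the genome.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def amplicon(template, forward_primer, reverse_primer):
    """Return the top-strand segment bounded by the two primers, or None if either fails to bind.

    The forward primer has the same sequence as the top strand; the reverse primer
    is complementary to it, so its reverse complement is searched for downstream.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]

# Hypothetical top strand: flanking bases around a short (CAG)5 target.
template = "TT" + "GACCGTA" + "CAG" * 5 + "GGATCCA" + "AT"
forward = "GACCGTA"
reverse = reverse_complement("GGATCCA")

product = amplicon(template, forward, reverse)
print(product, len(product))

# A single base change in the primer binding site prevents amplification in this model.
variant_template = template.replace("GACCGTA", "GACCGTT")
print(amplicon(variant_template, forward, reverse))
```

The second call returns None because the forward primer no longer matches its binding site, a crude picture of how a sequence change under a primer can stop one allele from amplifying.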


[Figure 3.3 schematic: panels 1 to 5; panel 1 is labelled 3.3 × 10^9 bp, panel 2 has parts (a) and (b), and new strands are marked * in panel 5.]

FIGURE 3.3 Polymerase chain reaction (PCR). PCR allows amplification of a targeted DNA sequence by using a DNA thermostable extension enzyme (polymerase) to make new copies of the sequence. Oligonucleotide primers give PCR its specificity. (1) DNA. (2a) Double-stranded DNA is shown as blue and yellow bars. Here a region of interest (say 600 bp in size) from the genome is depicted, and in (b) the PCR primers are designed to flank the ends of this region of interest. The primers (→ ←) are single-stranded DNA sequences complementary to the ends of the targeted sequence. (3) Double-stranded DNA becomes single-stranded after heating to about 94°C. (4) The DNA is allowed to cool to about 55°C which allows the primers to stick to the single-stranded DNA at either end. (5) Taq DNA polymerase (a thermostable DNA polymerase) and a mixture of the four nucleotide bases are added and the temperature elevated to about 72°C which allows the Taq polymerase to work. The combination of primers, nucleotide bases and the polymerase will lead to a copying of the single-stranded segment from the primer. The new copied fragments of DNA are indicated *. The final product is double-stranded DNA which comes from the region defined by the primers. At this stage of the PCR, an initial DNA template has been duplicated. Steps 3–5 are repeated to produce (in theory if the process is 100% efficient) 2^n times the amount of template DNA (where n = number of cycles), e.g. 20 cycles should amplify the original segment about 1 × 10^6 times [7].

FIGURE 3.4 Visualizing DNA in a multiplex PCR gel. This photograph of a gel shows DNA bands amplified by PCR and then separated into fragments by electrophoresis. Track 1: DNA size marker, Tracks 2–8: different DNA samples. The band patterns are complex because this is a multiplex PCR looking for various types of deletions in the α globin complex (and so producing α thalassemia). To distinguish the patterns, the gel is immersed in a DNA staining dye such as GelRed™. The excess stain is washed off and the presence of DNA is detected by using ultraviolet light.

One PCR cycle comprises steps 1–3 described above. After such a cycle, each of the single-stranded DNA target segments has become double-stranded through the polymerase's activities. This is then repeated, and each time a new target segment of DNA is synthesized. Theoretically, the number of templates produced equals 2^n, so after 20 cycles of amplification there should be somewhere near 1 × 10^6 templates. In theory, up to 1 billion copies of the target sequence can be produced by PCR. Amplified DNA products are separated by size with electrophoresis and then visualized by staining of the DNA (Figure 3.4).

One feature of PCR is its exquisite sensitivity, so that DNA from just one single cell can be amplified. This ability of PCR to amplify small numbers of target molecules has been used in detecting illegitimate transcription. As described in Chapter 1, mRNA is tissue-specific, except for some leakiness in cells such as the lymphocyte. Thus, mRNA that is specific


TABLE 3.3 Different types of PCR [7].

Type: Purpose

Multiplex PCR: Multiple primer combinations are mixed allowing simultaneous amplifications to occur and so many DNA mutations are tested (Figure 3.4).

Gap PCR: Allows deletions to be detected because primers on either side of the deletion breakpoint only give a PCR product if brought into closer proximity because of a deletion (Figure 3.5).

Nested PCR: The use of two sets of primers, the second of which lies within the first set of primers, thereby increasing the sensitivity and specificity of PCR.

In situ PCR: Allows mRNA to be identified in tissue sections including formalin fixed paraffin blocks.

RT-PCR: Reverse transcriptase PCR is used to amplify RNA.

Long PCR: Allows large segments of DNA to be amplified. Conventional PCR products are usually relatively small fragments measuring in the hundreds of base pairs to around 4 Kb (Kb – kilobase or 1 000 base pairs). With long PCR, amplified DNA up to 40 Kb in size is possible because the specially developed polymerases used are able to proofread the PCR product and correct errors that occur.

Q-PCR: Quantitation of DNA (or mRNA) by conventional PCR is imprecise because amplification reactions have variable efficiency due to product concentration, limiting substrates in the reaction mixture and PCR inhibitors. Now the availability of real time PCR allows the amplification to be monitored as it progresses. In one method, a dye is released with each amplification cycle allowing real time monitoring so quantitation occurs in the exponential phase. A graph plotting dye versus number of PCR cycles is drawn, and the quantitation is based on the number of PCR cycles required to reach a designated cycle threshold (Ct). Ct values are inversely related to the logarithm of the amount of starting template (the more template, the fewer cycles needed) and so reflect mRNA expression levels or DNA copy number.

emPCR: Emulsion PCR allows single-stranded (ss) DNA molecules to be bound individually to beads. PCR is progressed in a water-in-oil emulsion allowing the isolation of single DNA molecules in aqueous microreactors. The result is that each bead will have amplified on it millions of copies of a particular ss DNA fragment. This PCR is used in some Next Generation DNA sequencing methodologies (Chapter 4).
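The arithmetic behind PCR amplification and real time quantitation can be sketched as follows. Both functions are idealized illustrations written for this section: they assume a constant amplification efficiency per cycle (1.0 meaning perfect doubling), and the Ct comparison assumes that a sample with more starting template crosses the threshold in fewer cycles. The function and parameter names are not taken from any instrument software.

```python
def theoretical_copies(cycles, efficiency=1.0, start_copies=1):
    """Copies after a number of cycles; efficiency 1.0 gives the ideal 2**cycles."""
    return start_copies * (1 + efficiency) ** cycles

def fold_difference(ct_a, ct_b, efficiency=1.0):
    """Starting template in sample A relative to sample B, from their threshold cycles."""
    return (1 + efficiency) ** (ct_b - ct_a)

print(theoretical_copies(20))           # ideal yield after 20 cycles, about 1 x 10^6
print(theoretical_copies(30))           # ideal yield after 30 cycles, about 1 x 10^9
print(theoretical_copies(30, 0.9))      # a less efficient reaction yields far fewer copies
print(fold_difference(22.0, 25.0))      # A reaches threshold 3 cycles earlier than B
```

With perfect doubling, a sample that reaches the threshold three cycles earlier started with about eight times as much template, which is why small Ct differences correspond to large differences in starting amount.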

for muscle tissue in disorders like Duchenne muscular dystrophy and the hereditary cardiomyopathies can be characterized by amplifying mRNA from lymphocytes. PCR is rapid and automated, so that 30 cycles can be completed in one or two hours. There are many different applications for PCR (Table 3.3).

Errors with PCR

Like any laboratory technique, PCR can produce the wrong result. Because of its exquisite sensitivity, contamination by another DNA source is always a potential problem. Contaminating DNA can come from other samples or the operator, but most commonly arises from amplified products from previous tests. For genetic disorder detection, contamination is avoidable if the laboratory maintains a high standard, but is more problematic with infectious disease or forensic DNA testing because of the smaller numbers of targets used for amplification (Chapters 6, 9).

The sequence fidelity of amplified products is an additional consideration when assessing the usefulness of DNA amplification, since in vitro DNA synthesis is an error-prone process. The error rate associated with Taq DNA polymerase activity is very low. Furthermore, misincorporation of bases tends to terminate the DNA synthesis; hence products containing


PCR-induced errors will be significantly under-represented in the final result. Today, commercially-produced Taq polymerases have much lower reading errors, and there are a number on the market with different properties depending on the type of test undertaken. For example, there are Taq polymerases that are less liable to incorporate errors. Because of their greater fidelity these are preferred for diagnostic PCRs. Specially developed Taq polymerases are required for long PCR (Table 3.3).

[Figure 3.5 schematic: four panels numbered 1 to 4; a single base change is marked X in panel 3.]

FIGURE 3.5 Identifying a range of DNA mutations by PCR. (1) A normal stretch of DNA sequence. PCR can be used to amplify any segment of this DNA provided the sequence is known to allow appropriate primers to be designed. (2) A deletion in DNA is shown. PCR will detect this if primers are designed on either side of the deletion (shown by ↓). If the deletion is small both normal and deleted fragments will be detected with the same primers. If the deletion is very large, primers depicted might only detect the deleted fragment. (3) X indicates a single base change in the DNA. The primers shown by the ↓ on either side of the X will allow that region to be amplified by PCR. The change can then be detected using DNA sequencing or digesting with a restriction enzyme. (4) The ….. represents a DNA rearrangement including an amplification of a region. DNA primers designed on either side of this rearrangement will detect it. In contrast, deletions or rearrangements that do not have known breakpoints will not be detected by PCR because primers cannot be designed on either side of the breakpoints.

[Figure 3.6 sequence, with primers Hu4 and HD344 marked near the start and HDC2 and Hu3 marked near the end:]

5′ - ccgccatggcgaccctggaaaagctgatgaaggccttcgagtccctcaagtccttc
CAGCAGCAGCAGCAGCAGCAGCAGCAGCAGC
AGCAGCAGCAGCAGCAGcaacagccgccaCCGCCGCCG
CCGCCGCCGcctcctcagcttcctcagccgccgccg - 3′

FIGURE 3.6 Allele drop-out. Depicted is the DNA sequence for the beginning of the Huntington disease (HD) gene HTT. The sequence is read from left to right starting with ccg… in the top line. Large upper case letters identify the (CAG)n triplet associated with the development of HD, while at the end of line 3 can be seen the adjacent (CCG)n repeat which is non-pathogenic. To determine whether a patient has HD, the size of the (CAG)n needs to be measured. For this, two PCR primers are designed that flank the repeats (HD344 and HDC2). These primers can give a false result if there is a polymorphism along the primer binding site, i.e. one allele may not be amplified. Let us assume that the non-amplifying allele has 41 repeats. The remaining (second) allele will amplify because it does not have the polymorphism to interfere with primer binding. Let us assume that the second allele has 20 repeats. Because there is only one allele, the result could be falsely interpreted as being homozygous 20 and 20 (usually written 20/20). To avoid this error, a second set of primers (Hu4 and Hu3) is designed for a confirmatory PCR. In this case, the PCR measures both the (CAG)n and the (CCG)n but this does not matter because it will show quickly that the patient's DNA could not possibly be homozygous 20/20. The second set of primers is designed so that they do not overlap the first, although it can be seen that this is not ideal here since there is a four base pair overlap in Hu3 and HDC2.

False negative results with PCR are an important source of error. These arise in two ways. Firstly, low DNA purity can give problems, particularly if the source of DNA is suboptimal, and it is possible that the DNA will contain contaminants which interfere with PCR. This can lead to allele drop-out, where one of the two alleles does not amplify efficiently, if at all. An error results because the PCR products are misinterpreted as representing two alleles when it is only one that is actually present. Allele drop-out can also be caused by the presence of a DNA polymorphism in the DNA primer binding site, particularly at the 3′ end. This can interfere with primer binding leading to failure to amplify one allele (Figure 3.6). The second source of error is that deletion in one allele may not be detected by PCR

unless the laboratory conditions and primers are designed with the deletion in mind. In this case, the remaining normal allele is amplified and the result will appear to be normal.

The health professional ordering a PCR-based test should always remember that other errors (particularly clerical ones) can occur. The potential for error in genetic DNA testing is an important issue since these tests may have no accompanying clinical information to guide the health professional. This is a real concern in DNA predictive tests, since an incorrect result may not be discovered for many years, and by then a number of regrettable clinical, personal and family decisions might have been taken. Hence, it is a wise practice that all clinically important DNA tests, particularly the predictive ones, should be repeated, or tested in duplicate blood samples to reduce the potential for avoidable errors.

Direct Mutation Detection

DNA mutation analysis assumed a higher profile in the early 1990s, following the discovery of large and complex genes, and the realization that DNA diagnosis could provide useful information for the clinical management of patients with genetic disorders. Apart from the heterogeneity associated with DNA mutations, it was also found that some mutations recur – i.e. they are present in many unrelated families – while others are family or individual specific. The latter are called private mutations. Some mutations localize to certain hot spots in a gene while others are more randomly distributed. At this time, detecting DNA mutations by sequencing was not a practical option because sequencing was neither cheap nor rapid, and so it was not possible to look for all mutations. Therefore, early DNA mutation testing protocols focused on the identification of common and recurring mutations. The less common or family-specific ones were not sought unless the laboratory had a particular interest in a disorder. Hence a number of indirect methods to look for DNA mutations became popular in the 1990s. These will be mentioned briefly below for historical reasons since they are now rarely used.

Single Base Changes in DNA

The gold standard in terms of mutation analysis has always been DNA sequencing, because it allows a mutation to be defined (Figure 3.7). Automated DNA sequencing became available for routine DNA testing in the 1990s but was expensive. Today, DNA sequencing is both cheap and accessible, and has become the preferred approach to testing large genes that usually comprise multiple exons with sizeable introns. Two sequencing strategies are used with large genes:

1. Only the exons and the exon-intron boundaries in genomic DNA are studied. This detects most missense changes and a number of the splicing defects, and
2. cDNA rather than genomic DNA is sequenced by taking mRNA from peripheral blood cells (illegitimate transcription). Sequencing cDNA means the exons are studied, although alternative transcripts resulting from splice site mutations may also be detected.

To study small genes or to look for commonly recurring mutations, non-sequencing approaches are also possible. A list of some mutation-detection strategies is given in Table 3.4.

Although DNA sequencing is the gold standard, it is important to realize that errors can still occur because mutations are missed or sequencing patterns are misinterpreted. This has led to the development of software that helps interpret DNA sequencing traces. Various computer programs are discussed in more detail in Chapter 4. Another problem with DNA sequencing is the detection of deletions in DNA. As shown in Table 3.1 these are important causes of gene dysfunction and may be



the predominant mutation in some disorders. Not realizing that a deletion might be present reduces the effectiveness of DNA sequencing. It can lead to a false negative result because DNA is effectively hemizygous at the site of the deletion and so the remaining normal allele is sequenced.

FIGURE 3.7 Direct DNA testing for missense changes by sequencing. Shown is DNA sequence from a segment of the factor VIII (hemophilia A) gene. Top is the mother and below is her hemophiliac son. Four colors (black, red, green, blue) represent the four nucleotide bases guanine, thymine, adenine and cytosine. The DNA nucleotide base of interest in this case is shown by ↓. A CAA (blue – green – green) codon (normal) in the woman is replaced by a TAA (red – green – green), i.e. a stop codon in the son. Since the mother does not have this change she is not an obligatory carrier although what cannot be excluded is germinal mosaicism, i.e. a mix of normal and mutant genes in the ova.

DNA Deletions

As mentioned earlier, PCR can detect deletions in the DNA by gap PCR. However, for this to occur it is necessary to know the characteristics of the deletion so that appropriate primers can be designed (Figure 3.5). To look for known as well as unknown deletions a new technique called MLPA® (Multiplex ligation-dependent probe amplification) was developed. This technique incorporates DNA binding, ligation and lastly PCR to enable quantitation, i.e. deletions or additional copies of a gene can be identified. Up to 50 different genomic DNA or RNA sequences can be studied, and changes as small as one nucleotide are claimed to be detectable (Figure 3.8).

Indirect Mutation Detection

As DNA sequencing continues to fall in price it is replacing the earlier methodologies that were used to detect DNA mutations indirectly. These are DNA scanning and linkage analysis.


TABLE 3.4 Strategies for identifying mutations in DNA [7].

Type of approach: Direct sizing of a PCR product
Description: A deletion of two or more bases (or insertions) can be detected by sizing a PCR fragment.
Applications: The ΔF508 (p.Phe508del) deletion involving 3 bp is seen on electrophoresis by measuring the size of the PCR fragment generated.

Type of approach: RFLP (restriction fragment length polymorphism)
Description: DNA is digested with restriction enzymes and the presence of a single base change can be detected.
Applications: Restriction enzymes are less frequently used to detect DNA changes but remain useful approaches.

Type of approach: ASO (allele specific oligonucleotide)
Description: A single-stranded labeled probe is used to hybridize against single-stranded target DNA looking usually for single nucleotide changes.
Applications: ASOs are used to identify a wide range of DNA mutations as well as polymorphisms.

Type of approach: OLA (oligonucleotide ligation assay)
Description: Two oligonucleotide probes are designed to hybridize adjacent to each other on the target sequence. Once adjacent, the two probes can be joined by DNA ligase.
Applications: Useful for a range of mutations including insertions, deletions and single base changes. Can be multiplexed.

Type of approach: ARMS (amplification refractory mutation system)
Description: Oligonucleotide primers are designed to amplify preferentially one of the two alleles.
Applications: Useful for a range of mutations and can be used in multiplex PCR.

FIGURE 3.8 Direct testing for DNA deletions. An MLPA trace for the β globin gene region is simulated (see also Figure 2.7). The green bars represent control oligonucleotide probes that bind to non-globin regions in the genome. These demonstrate what is normal in terms of binding of probes and PCR. The blue bars are probes binding to the region of interest and here it is the β globin genes and related control region (LCR). The intensity of amplified products, which reflects how much probe was bound, is depicted on the Y axis. This MLPA profile shows an extensive deletion (half intensity indicates a heterozygote, i.e. normal and deleted alleles) affecting some of the δ globin gene and the entire β globin gene, with the 3′ endpoint of the deletion defined between the probe with half and the probe with full intensity (far right). As well as deletions, the MLPA test can detect duplications. Figure drawn by Dr Anthony Cheong, Department of Molecular & Clinical Genetics, Royal Prince Alfred Hospital, Sydney, Australia.
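The half-intensity reading described for Figure 3.8 can be expressed as a simple ratio calculation. The Python sketch below is illustrative only: the probe names, the signal values and the 0.7 and 1.3 cut-offs are assumptions made for this example and are not taken from the figure or from any MLPA kit, and a diagnostic laboratory would validate its own thresholds. Each probe signal is first scaled to the control probes in the same sample, and the patient is then compared with a normal reference sample.

```python
# Assumed cut-offs for illustration only; a heterozygous deletion is expected
# to give a ratio near 0.5 and a normal two-copy region a ratio near 1.0.
DELETION_MAX = 0.7
DUPLICATION_MIN = 1.3


def normalised_ratios(patient, reference, control_probes):
    """Return patient/reference signal ratios for the non-control probes.

    Signals in each sample are divided by that sample's mean control-probe
    signal, so differences in overall PCR yield cancel out.
    """
    patient_scale = sum(patient[p] for p in control_probes) / len(control_probes)
    reference_scale = sum(reference[p] for p in control_probes) / len(control_probes)
    return {
        probe: (patient[probe] / patient_scale) / (reference[probe] / reference_scale)
        for probe in patient
        if probe not in control_probes
    }


def call_copy_number(ratio):
    """Translate a normalised ratio into a provisional copy number call."""
    if ratio < DELETION_MAX:
        return "reduced (possible heterozygous deletion)"
    if ratio > DUPLICATION_MIN:
        return "increased (possible duplication)"
    return "normal"


# Invented signal intensities (arbitrary units); probe names are hypothetical.
controls = ["control_1", "control_2"]
reference = {"control_1": 1000, "control_2": 1200,
             "delta_globin": 1000, "beta_globin": 1000, "downstream": 1000}
patient = {"control_1": 1000, "control_2": 1200,
           "delta_globin": 480, "beta_globin": 510, "downstream": 990}

ratios = normalised_ratios(patient, reference, controls)
calls = {probe: call_copy_number(r) for probe, r in ratios.items()}
```

With these invented values the two globin probes fall close to half intensity and the downstream probe stays close to 1, which mirrors the pattern described in the caption. Any such call would still need confirmation by an independent method.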


DNA Scanning

To identify uncommon mutations or to study large and complex genes, various DNA techniques were devised to scan a segment of DNA. Changes present were detected via alterations in mobility or chemical reactivity of the target compared to normal DNA. However, the changes uncovered were not necessarily pathogenic. Scanning techniques were used to reduce the amount of sequencing required. One technique was dHPLC (denaturing High Performance Liquid Chromatography) which detected DNA segments with altered nucleotide sequences because they changed the mobility of the DNA fragment. Once such a fragment was detected, the presence of a mutation was confirmed by DNA sequencing. Other scanning methods included SSCP (single stranded conformation polymorphism), DGGE (denaturing gradient gel electrophoresis) and CCM (chemical cleavage of mismatch). Today, whenever possible, the clinical DNA testing laboratory has replaced scanning with DNA sequencing.

Linkage Analysis

Linkage analysis is useful in research strategies such as positional cloning (Chapter 2) but is rarely used for DNA diagnosis because it is an indirect approach to mutation detection. It works by finding co-segregation between DNA polymorphisms and the disease phenotype in members of a family [8]. For linkage analysis it is necessary to have a family under study containing at least one known affected individual, or one confirmed normal member. It is also necessary to have DNA polymorphisms that are located physically close to the gene causing the disease. Once these two prerequisites are available, the inheritance of the different polymorphisms through the family can be followed and individual markers can be linked to the genetic disorder or the normal phenotype (Figure 3.9). Other members of the family or a fetus in utero can then be assessed with the same DNA polymorphisms to predict normal or abnormal phenotypes.

FIGURE 3.9 DNA linkage study. Understanding how DNA polymorphisms are used to follow a disease within a family (called linkage analysis) is a difficult concept. Essentially, a polymorphism is used as a surrogate marker for a chromosomal location or gene. In the case of the β globin gene depicted here, each individual has two genes and so two polymorphic markers should be detectable. To undertake linkage analysis the first step involves identifying family-specific DNA polymorphic markers that will distinguish the two β globin genes. The polymorphisms are not mutations but simply DNA sequence changes or fragment sizes that allow the two genes to be distinguished. Once the polymorphisms are identified, they are traced in a family and compared to the clinical phenotypes. In the pedigree given the two parents are β thalassemia carriers. Their carrier status is easily determined by blood counts and special tests for thalassemia. They have a female child who has homozygous β thalassemia (β thalassemia major) (→), and they also have a normal male. The thalassemia status for a third (female) child is (?). The mother is also pregnant and the fetus (indicated by a triangle) has an unknown thalassemia status. Let us assume that the underlying β globin gene mutations cannot be identified in this family. Therefore, linkage analysis is the next approach to use. The polymorphisms which distinguish the two β globin genes in this family are defined by the letters a and b. Both the parents are carriers and have the a/b polymorphic markers. This information alone is not enough for diagnosis. The key individual for this is the homozygous-affected child who is b/b. This shows that the polymorphic marker b identifies the mutant β thalassemia gene in this family. Therefore, it can be assumed that the marker a defines the normal gene. This is confirmed by showing the normal child is a/a. The child with the unknown status is a/b and so she must be a carrier (which could have been more appropriately determined through a blood count than a DNA test). The fetus can have three combinations and these will predict the genetic status, i.e. a/a (= normal), b/b (= homozygous-affected) and a/b (= carrier).
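The reasoning in Figure 3.9 can be written out as a short Python sketch. It is a simplified illustration and not a diagnostic algorithm: the function names are hypothetical, and the code assumes a recessive disorder, a single marker whose same allele tracks the mutant gene on both parental chromosomes, correct paternity and no recombination between the marker and the gene.

```python
from itertools import product

# Number of mutant-linked markers inherited -> predicted status (recessive disorder).
STATUS = {0: "normal", 1: "carrier", 2: "homozygous-affected"}


def mutant_marker(affected_genotype):
    """Identify the marker allele that travels with the mutant gene.

    The homozygous-affected child carries two mutant genes, so under the
    assumptions above both of that child's marker alleles must be the same.
    """
    first, second = affected_genotype
    if first != second:
        raise ValueError("marker is not informative in this family")
    return first


def predict_status(genotype, marker):
    """Predict status from how many copies of the mutant-linked marker are present."""
    return STATUS[list(genotype).count(marker)]


def fetal_outcomes(mother, father, marker):
    """List every marker combination a fetus could inherit and its predicted status."""
    outcomes = {}
    for maternal, paternal in product(mother, father):
        label = "/".join(sorted((maternal, paternal)))
        outcomes[label] = predict_status((maternal, paternal), marker)
    return outcomes


# Genotypes from the pedigree in Figure 3.9.
marker = mutant_marker(("b", "b"))                      # affected daughter is b/b
normal_son = predict_status(("a", "a"), marker)
third_child = predict_status(("a", "b"), marker)
fetus = fetal_outcomes(("a", "b"), ("a", "b"), marker)
```

Applied to this family the sketch reproduces the caption: marker b tracks the mutant gene, the a/a son is predicted normal, the a/b daughter is predicted to be a carrier, and the fetus has the three possible outcomes a/a, a/b and b/b.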


It may be difficult to get families with phenotypes that are unequivocal, and so a linkage study involves a lot of work. It will not always be possible to undertake such studies, because key family members might be unavailable. DNA polymorphisms can also be uninformative if they do not allow disease and normal phenotypes to be distinguished. Linkage studies have a number of intrinsic problems including: (1) Non-paternity, which will give a false connection between a DNA polymorphism and the disease gene being studied, and (2) Recombination of DNA segments – which is a function of the distance between a polymorphic marker and the gene of interest. Although oversimplified, a physical distance of 1 Mb in DNA is roughly equivalent to a genetic distance of 1 cM (cM = centimorgan). 1 cM indicates a 1% recombination potential – i.e. in 100 meioses there will be one recombination event between the DNA polymorphism and the target DNA of interest. The use of intragenic polymorphisms such as SNPs located within the introns or exons of genes, or microsatellites found within introns or polymorphisms located in the immediate 5′ or 3′ region of genes reduces the risk of recombination.

Another trick when using DNA polymorphisms is to group a number across a segment into a haplotype. In other words, a single DNA polymorphism may not be informative, but when it is used in conjunction with other polymorphisms, its value increases. As well as increasing the informativeness of polymorphisms, haplotypes help to identify recombination events (Figure 3.10).

FIGURE 3.10 Detecting recombination using flanking DNA markers in the adult polycystic kidney disease locus (PKD1). (1) The three polymorphic markers and their alleles for the PKD1 locus are: a or b; c or d; e or f. The open box (□) is the normal gene and its associated polymorphisms are a,c,e; the filled box (■) is the mutant gene and its associated polymorphisms are b,d,f. (2) The pedigree illustrates the segregation patterns for the above three polymorphisms. I-1 (female) has PKD1. Two of her children (II-1, II-2) are clinically affected, and so they allow the mutant-specific haplotype to be identified as bdf/ since this is what the three have in common. The one male offspring (II-3) has not inherited the maternal bdf/ haplotype which is consistent with his normal phenotype at age 50 years. The remaining female sibling (II-4) is a problem. Her adf/adf genotype does not fit. Non-paternity is unlikely since it is the maternal haplotype that is the problem. This is an example of recombination that has occurred somewhere between the a/b and the c/d loci (shown in panel 3). The mutant-specific haplotype has now become adf/ rather than bdf/. Therefore, II-4 has actually inherited the PKD1 mutation which would have been missed if only one set of polymorphisms (a/b) had been used in this linkage study, i.e. the recombination event would not have been detected and II-4 incorrectly diagnosed as normal. Haplotypes shown in the pedigree: I-1 ace/bdf, I-2 ace/adf; II-1 ace/bdf, II-2 adf/bdf, II-3 ace/ace, II-4 adf/adf.

CALCULATING RISK

DNA genetic testing is undertaken to determine risk. Knowing these risks, a patient and family can make informed decisions on interventions that will prevent disease or prolong well being. Risk estimation is well established in Mendelian (single) gene disorders, but more problematic in complex genetic disorders.


A number of parameters are used in determining risk and these will impact on the accuracy of the calculation. They are:

1. Type of inheritance, including the possibility of a de novo mutation or mosaicism;
2. Rare but unusual forms of inheritance such as imprinting;
3. The penetrance of the disorder and related variables such as age and sex;
4. Laboratory data, for example serum cholesterol, which assist in defining a phenotype;
5. Family details such as history of disease, reproductive history, consanguinity particularly for recessive conditions, availability of DNA genetic test results for other family members and non-paternity;
6. The degree of definition of the phenotype of the disease itself and the existence of other independent risk factors, particularly environmental ones, and
7. Population data including ethnicity.

Mendelian Disorders

Calculating risks in Mendelian disorders will involve a number of scenarios including a demonstrated disease (diagnostic DNA genetic test), an asymptomatic person who has a higher than background risk because of a family history (predictive DNA genetic test) or random risks sought within a population screening study.

FIGURE 3.11 The difficult concept of risk in DNA genetic testing. DNA genetic testing can produce a spectrum of results from completely informative to less informative or even no information at all. (1) Finding that a child with a clinical phenotype suggesting cystic fibrosis (CFTR gene) is homozygous for the p.Phe508del mutation provides 100% confirmation of the diagnosis. Detecting homozygosity for this mutation in an asymptomatic newborn through screening also indicates that the individual will develop cystic fibrosis. (2) Calculating risks with BRCA1 and BRCA2 DNA testing is more problematic and influenced by the type of mutation found. Some mutations represent founder effects and are common in certain ethnic populations. The significance of these mutations is better understood and an approximate risk can be calculated although it is never absolute as the genetic and environmental factors in breast cancer are more complex compared to cystic fibrosis (see Chapter 7). Mutations that occur only within families are called private and their significance (risk) can be more difficult to determine. (3) Unlike the above two disorders that are caused by many mutations, HbS (sickle cell hemoglobin) is more straightforward as one mutation in a single gene is causative. Therefore, it is relatively easy to identify heterozygotes or homozygotes for HbS (a biochemical test will also do this). Affected homozygotes have a serious genetic disorder, however, some will have a milder phenotype because of other genetic factors such as co-existing α thalassemia, a raised HbF (both these will reduce the level of HbS in the blood) or environmental interactions. (4) Apart from mutations and their risk for disease, there are multiple DNA markers (often polymorphisms) that have been associated with increased or reduced population risks. An example is APOE4 DNA testing to determine risk for dementia (a complex genetic disorder). There have been numerous studies suggesting a link between the E4 variant and dementia but these are relatively low risks and are population based (Chapter 6, Box 6.1). What the results mean to any one individual is still uncertain so this type of DNA test is not recommended because it has little clinical utility.

Risks will vary from unknown to low to variable or even certainty (Figure 3.11). The methodology used is called Bayesian analysis and relies on a number of considerations including:

1. Prior probability: the likelihood of inheriting the disease-causing allele (being a carrier) versus not inheriting a disease-causing allele (not being a carrier) before any variables are considered;
2. Conditional probability: probability influenced by available data on the likelihood of being a carrier or not being a carrier;
3. Joint probability which is the product of (1) and (2), and


4. Posterior probability which is a normalized calculation so that the two options of carrier and non-carrier together come to unity.

Two clinical cases follow, in which the risks have been calculated by Bayesian analysis [9].

• A female is the daughter of an obligatory carrier for hemophilia A, so her starting prior probability of being a carrier is 1 in 2, or 50%. This woman has had three normal sons. This information is helpful in terms of conditional probabilities, because if she were a carrier each of the three would have had a 1 in 2 chance of having hemophilia, whereas if she were not a carrier all her sons would be normal. Together these facts when multiplied give the joint probability. From this a normalization is made and then the posterior probability is determined (Table 3.5 has the actual calculations). In this case, the woman started with a carrier risk of 1 in 2 but ended with 1 in 9.
• A male of Irish background has a partner who is a known heterozygote for the p.Phe508del cystic fibrosis mutation. They are planning to start a family and want to know the risk of having a child affected by cystic fibrosis. The male starts with a prior probability of 1 in 20, which represents the carrier frequency in his ethnic background. However, he has had cystic fibrosis DNA genetic testing and this has excluded a number of the common mutations, so that a carrier would have had only a 1 in 10 chance of returning this negative result (a conditional probability). Taking all this information into account, his posterior probability shows his risk has been reduced from 1 in 20 to 1 in 191 (Table 3.5 has the actual calculations). Since his partner is a known carrier their combined risk of having a child with cystic fibrosis is 1 in 191 (male) × 1 (partner) × 1 in 4 (recessive condition) or 1 in 764.

Complex Genetic Disorders

A number of assumptions are made when calculating risk for complex genetic disorders. A key one is that these disorders arise from multiple low penetrance but cumulative gene effects, and also have environmental contributions (Chapter 2). Risks for the complex disorders differ from the Mendelian ones discussed above, because susceptibility markers or genes are used, which are not the sole cause of the disease but contribute in an undefined way to its development.

A technique such as linkage analysis described above does not work for gene discovery in the complex genetic disorders because phenotypes and modes of transmission are difficult to define. Therefore, a new strategy was devised based on case-control comparisons. They are called association studies and have evolved into genome wide association studies (GWAS) (Chapter 2). The results of these studies provide a statistical probability that a gene or DNA polymorphism and a clinical phenotype are linked, and have proven to be effective in identifying many genes or loci implicated in complex genetic diseases. Research studies utilizing the case control (association) approach calculate risks that are expressed as odds ratio, relative risk or absolute risk (Table 3.5).

However, data from association or GWAS research studies are now being used by direct-to-consumer DNA testing companies (Chapter 5) to predict risks for individuals. This is a poor assumption as the complex genetic disorders are likely to involve many genes as well as gene-environment (G x E) interactions and epigenetic effects. These effect(s) might be captured within large population studies but would be missed at the individual level.

Although much will be said about the importance of the molecular medicine team and including the family physician (primary care physician) in the management of genetic cases, calculating risks is not easy and often requires referral to a specialist or genetic counselor. Not surprisingly there is concern about direct-to-consumer DNA testing, which requires the patient or consumer ordering the test to know


TABLE 3.5 Calculating risks for Mendelian and complex genetic disorders.

Measure: Bayesian (hemophilia case) [9]
Explanations:
  Prior probability: being carrier 1 in 2; not being carrier 1 in 2
  Conditional probability: being carrier 1 in 2 (#1 son), 1 in 2 (#2 son), 1 in 2 (#3 son); not being carrier 1, 1, 1
  Joint probability: being carrier 1 in 16; not being carrier 1 in 2
  Posterior probability: being carrier 1 in 9; not being carrier 8 in 9
Calculation: The joint probability of being a carrier in this case is 1 in 16 and 1 in 2 of not being a carrier. These numbers are normalized by dividing each joint probability by the sum of the two joint probabilities. This gives the posterior probability. Normalization for being a carrier is 1/16 divided by (1/16 + 1/2). Normalization for not being a carrier is 1/2 divided by (1/16 + 1/2).

Measure: Bayesian (cystic fibrosis case) [9]
Explanations:
  Prior probability: being carrier 1 in 20; not being carrier 19 in 20
  Conditional probability: being carrier 1 in 10; not being carrier 1
  Joint probability: being carrier 1 in 200; not being carrier 190 in 200
  Posterior probability: being carrier 1 in 191; not being carrier 190 in 191
Calculation: Normalization for being a carrier is 1/200 divided by (1/200 + 190/200). Normalization for not being a carrier is 190/200 divided by (1/200 + 190/200).

Measure: Odds ratio (OR) [10]
Explanations: The odds of disease developing when the risk allele is present in the case group divided by the odds of disease developing when the risk allele is absent in the control group. Used in association case control studies. The closer OR is to 1 the smaller is the difference between the two groups. For rare events (which would be likely with the complex genetic disorders), the OR approaches the RR (relative risk). Generally compared to RR, the OR makes the effect appear larger.
Calculation: Formula: OR = (a/c)/(b/d) or (ad)/(cb) (see footnote a). Worked example: APOε4 allele in Alzheimer disease (AD) found in 0.47 AD patients but only 0.15 controls (see footnote b). Gives OR of 5.02 [11].

Measure: Relative risk (RR) [10]
Explanations: RR compares risks in two different groups by measuring the likelihood of disease when the risk allele is present in one group compared to the likelihood of disease when the risk allele is absent in the second group. This could also be the absolute risk in one group compared to the absolute risk in the second group.
Calculation: Formula: RR = (a/(a + b))/(c/(c + d)). Worked example: Above data on APOε4 gene and AD gives RR of 1.97.

Measure: Absolute risk (AR) [10]
Explanations: The probability that something will happen to an individual during a specified time period. The absolute risk is also called the adjusted life time risk.
Calculation: For example, saying that a N-W European male has a 1 in 400 (0.0025 or 0.25%) risk of developing clinical hemochromatosis.



Measure: Combining risks from multiple DNA markers
Explanations: If data for multiple risk alleles in a genetic disorder are available, then it is possible to multiply them to get an overall risk provided the markers are independent of each other, for example, linkage disequilibrium is excluded. This is similar to what is done to calculate likelihoods of matches in the forensic case (Chapter 9).
Calculation: Marker 1 has a RR of 1.24, marker 2 has a RR of 1.32, marker 3 a RR of 0.60 and marker 4 a RR of 0.82. The overall relative risk here would be 1.24 × 1.32 × 0.60 × 0.82 = 0.81.

Footnote a. Template:
  Risk allele present: Case = a; Control = b
  Risk allele absent: Case = c; Control = d
Footnote b. The APOε4 allele has been intensively studied in relation to its risk factor for developing Alzheimer disease. A quick calculation of OR, RR and other parameters can be made [10] using a frequency of 0.47 in patients with Alzheimer disease from the United Kingdom and 0.15 in controls [11].
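The arithmetic in Table 3.5 can be checked with a few lines of Python. This is a minimal sketch under stated assumptions rather than a clinical tool: the function names are hypothetical, the conditional probabilities under the non-carrier hypothesis are taken as 1 (as in both worked cases), and the odds ratio and relative risk use the a, b, c, d template from the table footnote with the allele frequencies treated as cell proportions.

```python
from fractions import Fraction
from math import prod


def posterior_carrier_risk(prior, conditionals_if_carrier):
    """Bayesian posterior probability of being a carrier.

    joint(carrier)     = prior x product of the conditional probabilities
    joint(non-carrier) = (1 - prior) x 1
    posterior          = joint(carrier) / (joint(carrier) + joint(non-carrier))
    """
    joint_carrier = prior * prod(conditionals_if_carrier, start=Fraction(1))
    joint_non_carrier = 1 - prior
    return joint_carrier / (joint_carrier + joint_non_carrier)


def odds_ratio(a, b, c, d):
    """OR = (a/c)/(b/d), using the case/control template in Table 3.5."""
    return (a / c) / (b / d)


def relative_risk(a, b, c, d):
    """RR = (a/(a + b))/(c/(c + d)), using the same template."""
    return (a / (a + b)) / (c / (c + d))


# Hemophilia A case: prior 1 in 2, three unaffected sons (1 in 2 each).
hemophilia = posterior_carrier_risk(Fraction(1, 2), [Fraction(1, 2)] * 3)

# Cystic fibrosis case: prior 1 in 20, negative mutation screen (1 in 10).
cystic_fibrosis = posterior_carrier_risk(Fraction(1, 20), [Fraction(1, 10)])
affected_child = cystic_fibrosis * 1 * Fraction(1, 4)   # partner is a known carrier

# APOE e4 worked example: allele present in 0.47 of cases and 0.15 of controls.
a, b = 0.47, 0.15
c, d = 1 - a, 1 - b
apoe_or = odds_ratio(a, b, c, d)
apoe_rr = relative_risk(a, b, c, d)

# Combining independent markers by multiplying their relative risks.
combined_rr = prod([1.24, 1.32, 0.60, 0.82])
```

Using `Fraction` keeps the Bayesian results as exact ratios, so the two cases come out as 1/9 and 1/191, and the couple's risk as 1/764. The odds ratio and relative risk evaluate to approximately 5.0 and 1.97, in line with the values quoted in the table, and the combined relative risk to approximately 0.81.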

what test to choose and how to interpret the result (Chapter 5).

A final consideration concerns the DNA genetic test itself. How reliable is it, and what are the possibilities of error? If the laboratory is appropriately accredited and quality procedures are in place, the test's analytic validity (Table 3.6) should not be a major concern. Any known limitations of the test should be discussed as part of the counseling process. Importantly, PCR is no different to other in vitro tests and can lead to error. Generally, this is not well appreciated by health professionals who, because they do not fully understand PCR, may not question its accuracy.

DNA GENETIC TESTS

New Tests

Genetic DNA tests have been developed by a range of organizations:

1. Industry has already demonstrated it is an important contributor to gene discovery, including the hemochromatosis HFE gene and the breast cancer BRCA1 and BRCA2 genes;
2. Publicly funded research laboratories. Arguably these may have fewer resources than industry but they are more likely to take on rare genetic disorders that are less commercially viable, and
3. Private or public DNA diagnostic service laboratories. These have the practical skills but can lack the resources or time because clinical priorities will always take precedence (Figure 3.12).

How can one get the best of all worlds? One approach would be to partner the infrastructure and innovative work of industry and research laboratories with the skills and experience in quality issues and validation processes found in service laboratories. This will not be easy because of conflicting goals in different environments. However, government can play a role here by providing the incentives for linkages to be developed.

Once a DNA genetic test is developed, what happens next? A common outcome, particularly with high profile DNA tests, is a rash of media publicity with the researcher awkwardly trying to balance the exciting potential of the discovery with the fact that its clinical significance is still to be determined. Unlike the drug discovery pipeline, the research phase for new DNA genetic tests is considerably less well-defined. In relatively rare genetic disorders, it is unlikely that


evidence for clinical utility from randomized controlled trials will be obtainable because of the low frequency of the condition.

The transition from a research to a clinically useful DNA genetic test can be problematic. One view is that if the investigator is willing to stand up in court to defend the test then it is probably ready for the clinic. If so, formal research approval by the institutional ethics committee is no longer relevant as the test will be judged by standards set by regulatory authorities backed by legislative requirements particularly in terms of the test's safety; i.e. it does what it is supposed to do. Less apparent in

TABLE 3.6 Characteristics of a DNA test required for validation and evaluation.

Parameter: Sensitivity
Explanation: Proportion of individuals with a disorder having a positive/abnormal (DNA) test. Good screening tests have high sensitivity since the aim is to detect as many as possible with the disorder. Calculation = TP/(TP + FN) (see footnote a).

Parameter: Specificity
Explanation: Proportion of individuals without a disorder having a negative/normal (DNA) test. Confirmatory tests have a high specificity because the aim is to avoid false diagnoses. Calculation = TN/(TN + FP) (see footnote a).

Parameter: Positive predictive value (PPV)
Explanation: Likelihood of someone with a positive/abnormal DNA test having that disorder, i.e. in a group with a positive test how many will have the disorder? Unlike sensitivity and specificity, the prevalence of a disorder in the population will influence PPV and NPV. Commonly occurring disorders have a higher PPV and a lower NPV. Calculation = TP/(TP + FP) (see footnote a).

Parameter: Negative predictive value (NPV)
Explanation: Likelihood of someone with a negative/normal (DNA) test being normal, i.e. in a group with a negative test how many will not have the disorder? The rarer the disease being tested, the lower the PPV and the higher will be the NPV. Calculation = TN/(TN + FN) (see footnote a).

Parameter: Analytic validity
Explanation: Tests the laboratory component such as how accurately it measures the genotype. For an ideal DNA test the sensitivity and specificity would be 100%. Quality control and quality assurance are other components of this measure, as are the samples and processes used to obtain the DNA.

Parameter: Clinical validity
Explanation: Ability of the DNA genetic test to detect or predict the presence or absence of the phenotype or disease being tested. Complex issue as it overlaps laboratory, clinical and population measures. Could take into consideration evidence of research findings and measures of test performance [1].

Parameter: Clinical utility
Explanation: Ability of the DNA genetic test to influence management or lead to clinical improvements or outcomes. Might also include negative aspects such as the potential harm of the test if undertaken inappropriately. This is the most difficult of the four parameters described (analytic validity, clinical validity, clinical utility and ELSI). Another way to consider this parameter is: (1) Purpose of test, and (2) Feasibility of test delivery [1].

Parameter: ELSI
Explanation: ELSI issues will differ depending on the type of test used. They include: (1) Potential for stigmatization or discrimination; (2) Privacy and confidentiality issues for individuals and family members; (3) The type of consent needed, and (4) Implications of intellectual property on test availability.

Footnote a. Calculating sensitivity; specificity; PPV, NPV [12]. Y axis is the Test result; X axis is Phenotype being measured. TP – true positive; FP – false positive; FN – false negative; TN – true negative.
  Test result +: Phenotype + = TP; Phenotype − = FP
  Test result −: Phenotype + = FN; Phenotype − = TN
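The four calculations in Table 3.6, and the dependence of PPV and NPV on prevalence, can be illustrated with a short Python sketch. The function names, the population size of 100,000 and the 99% sensitivity and specificity are assumptions chosen for the example and do not describe any particular DNA test.

```python
def validation_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the 2 x 2 counts in Table 3.6."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected TP, FP, FN and TN when a test is applied to a whole population."""
    affected = population * prevalence
    unaffected = population - affected
    tp = affected * sensitivity
    fn = affected - tp
    tn = unaffected * specificity
    fp = unaffected - tn
    return tp, fp, fn, tn


# Same hypothetical test (99% sensitive, 99% specific) in two settings.
common = validation_metrics(*expected_counts(100_000, 0.10, 0.99, 0.99))
rare = validation_metrics(*expected_counts(100_000, 0.001, 0.99, 0.99))
```

In this example the PPV is about 0.92 when one person in ten has the disorder but falls to about 0.09 when one person in a thousand has it, while the NPV rises slightly. Sensitivity and specificity are unchanged, which is the point made in the table about prevalence.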


[Figure 3.12 diagram labels: Industry; Research Laboratory; Service Laboratory; New DNA Tests; Research Evaluation; Clinic]

FIGURE 3.12 DNA genetic test pipeline. New DNA tests emerge from industry, research laboratories and in some cases DNA testing (service) laboratories. How intensively they are evaluated through formal research protocols is dependent on the laboratory's experience and other factors including public pressure because of the perceived benefits to health.

the legislative requirements might be the importance of the test's clinical usefulness, which is discussed later in this chapter. If the DNA genetic test is not defensible in a court of law, then it is still research and should be undertaken with the appropriate research oversight.

Classes of Tests

DNA genetic tests are not particularly well understood by many health professionals. It is also concerning that some new graduates, who will be the practicing health professionals of tomorrow, are not very familiar with these tests. This reflects the very rapid changes that have developed since the structure of DNA was described just over 50 years ago. The confusing terminology and classification of DNA genetic tests do not help. This is illustrated in Figure 3.13 and Table 3.7. An example of this confusion would be the distinctions that are drawn between a DNA predictive test, a DNA presymptomatic test and a DNA predispositional test. The precise classifications and descriptors are scientifically correct and appropriate but are they really necessary, and do they add to the mystique and potential confusion surrounding the tests? For simplicity in this chapter the aforementioned three tests will be called predictive because in effect they are all predicting a risk – a concept that is more meaningful to a wider range of health professionals and the community. Another predictive test is also called pharmacogenetics and will be discussed below.

The next consideration is the fairly unique feature of DNA genetic tests that allows the same test to be used for multiple clinical purposes, as illustrated by the HFE genetic DNA test in hemochromatosis (Figure 3.14). The clinical context in which a DNA genetic test is conducted is important, because on this will depend the type of consent needed as well as the clinical expertise, genetic counseling and family support that may be required as a component of the test. These are relevant considerations, and are shown in the HFE model which uses the same test for two main purposes: diagnostic – i.e. confirming a clinical suspicion that an individual has hemochromatosis; or predictive – i.e. testing at-risk family members for the relevant HFE mutation. Consent, support and genetic counseling for the diagnostic test are not that different if the individual's serum ferritin level was exceptionally high and a liver biopsy had confirmed the cause of this was hemochromatosis. In this scenario, the DNA genetic test replaces the more risky liver biopsy. However, a liver biopsy would not be used to test asymptomatic family members including

MOLECULAR MEDICINE 102 3. DNA Genetic Testing

those with normal serum ferritin levels, whereas a DNA genetic test could be used to predict with variable certainty who was at risk. The variable certainty in HFE reflects other contributors to the final phenotype, including sex, age, and environmental factors such as alcohol intake. In the predictive scenario, consent, support and counseling become considerably more important.

FIGURE 3.13 The multi-purpose DNA genetic test. As shown in Table 3.7, there are many ways to describe DNA tests. In this diagram, the function of the DNA test becomes the descriptor and under each function (relationship testing, diagnosis, screening, identifying traits, prevention and research) different types of DNA tests are to be found. It is important to highlight DNA testing in the research environment because this is how many new tests are developed. Nevertheless, there needs to be a constant reminder that these tests are yet to be fully evaluated for their clinical usefulness.
Diagram labels, arranged around a central "Multi-purpose DNA Genetic Test":
Preventing disease/complications: e.g. predictive, prenatal, pharmacogenetic & somatic cell DNA test
Screening for carriers: e.g. testing for carriers of X-linked or autosomal recessive conditions
Relationship testing: e.g. paternity, workforce, ancestry
Detecting disease: e.g. diagnostic, newborn screening, prenatal and somatic cell DNA test
Identifying traits: e.g. behavioral, phenotypic and life style tests
Research: All types of DNA tests will emerge from research studies

As shown in Figure 3.14, the same HFE DNA genetic test could be used for other purposes including prenatal testing for a late onset adult genetic disorder which is treatable, or screening populations if earlier detection means more effective interventions to reduce complications such as cirrhosis or hepatocellular carcinoma. Both scenarios are feasible but complex, with the former involving ELSI considerations, and the latter requiring more objective assessment of the test’s clinical utility, particularly as the penetrance is low in those who are homozygous p.Cys282Tyr. The same DNA genetic test could then be used within the context of a research study.

Based on the comments made above, Figure 3.13 attempts to get away from the traditional naming of DNA genetic tests in terms of every possible permutation, to a more compact view


TABLE 3.7 A classification for different DNA genetic tests (see also Figure 3.13).

Name: What the test does

Diagnostic: Comparable to a traditional laboratory test since it confirms a clinical diagnosis, e.g. HFE DNA testing in hemochromatosis.

Predictive (a): Tests an asymptomatic individual at risk for a genetic disorder and predicts the risk of developing it. If the individual with a mutation has an increased risk but not everyone with a mutation develops the disorder it is called predictive, e.g. BRCA1, BRCA2 DNA tests for breast cancer have penetrance of 36–85%.

Presymptomatic (a): Tests an asymptomatic individual at risk for a genetic disorder and predicts the risk of developing it. If an at-risk individual with a mutation in the underlying gene is almost certain to develop the disease in his/her lifetime, the test is called presymptomatic to distinguish it from predictive, e.g. HTT DNA test for Huntington disease has a penetrance of 100%.

Predispositional or Susceptibility (a): Tests asymptomatic individuals in a population to predict the risk of developing a genetic disorder which is usually complex. Therefore, the risk (absolute or relative) is low, e.g. finding a 2% increase over the general population, e.g. T2D, APOE4.

Pharmacogenetic or pharmacogenomic: From single or multiple genetic markers can predict likely response to or toxicity from therapeutic drugs, e.g. TPMT, MammaPrint®. Many definitions for pharmacogenomics but not clear why these are all needed. Classification into single gene (pharmacogenetics) or multi gene (pharmacogenomics) is consistent with omics. In practice this type of DNA test is no different from predictive/predispositional or susceptibility testing.

Screening: Carrier testing usually for autosomal recessive conditions. Tests an asymptomatic population for carriers and so identifies risk to offspring, e.g. cystic fibrosis in terms of reproductive decisions. An accepted public health based screening program involves newborns.

Cascade screening or testing: Screening family members of patients diagnosed with a genetic disorder. Useful for its clinical descriptor value but does not tell you much about the actual DNA test.

Prenatal: DNA testing of the fetus for genetic disorders including adult onset ones or sex selection. In a separate category because of ELSI but actually a diagnostic or predictive/presymptomatic test.

Pre-implantation genetic diagnosis: A form of prenatal testing but used in an IVF approach thereby avoiding the necessity for termination of pregnancy.

Life style DNA tests: Based on SNP association studies so will involve small risks many of which have not been confirmed. Doubtful that these tests produce much relevant information because still an emerging area. Examples include nutrigenetics and dermatogenetics.

Trait testing: A predispositional/susceptibility type test that does not deal directly with medical disorders but traits. Examples include sexing, eye color, athletic ability and behavioral traits.

Relatedness testing: Examples here would be DNA paternity/maternity testing; forensic DNA testing; ancestry or kinship testing; identification in the workplace.

Somatic cell testing: Unlike all other categories in this table, somatic cell testing does not involve DNA changes in germ cells and so there are no implications for family members.

Research: All types of DNA tests will emerge from the research laboratory. Therefore, it is important but often difficult to decide when a research DNA test is now appropriate for clinical decision making and so moves into one of the above categories.

(a) Predictive, presymptomatic and predispositional (susceptibility) distinctions might be useful for genetic specialists but are confusing for others including the public. This leads to unnecessary complexity since the tests are all predictive with the major difference being the level of penetrance.

based on six broad outcomes. This binning of the DNA genetic test is artificial but perhaps more helpful to those who need to understand better the breadth and potential of the tests.

As noted in Table 3.7, DNA genetic tests can be conducted for purposes other than direct medical care. These can be called recreational tests and include genealogy or life style issues such as dermatogenetics and physical or behavioral traits (Chapter 5). Three other circumstances in which DNA genetic testing is undertaken will be described in the following chapters. These are: (1) Workplace (Chapter 6); (2) Paternity testing (Chapter 9), and (3) Insurance (Chapter 10).

FIGURE 3.14 Different applications for the same hemochromatosis DNA genetic test. Detecting the p.Cys282Tyr mutation in the HFE gene (the cause of genetic hemochromatosis) can be used for multiple purposes.
Diagram labels: Hemochromatosis DNA Genetic Test, linked to Diagnostic, Screening, Prenatal, Research and Predictive uses.

Pharmacogenetics and Pharmacogenomics

A drug’s efficacy and its potential for side effects are influenced by many parameters including:
1. ADME (absorption, distribution, metabolism and excretion) which in turn relies on the individual’s well being, or presence of disease particularly in relation to liver, kidney, heart and lung function. External factors such as age, sex, weight, body fat, smoking, alcohol intake and nutrition are also important;
2. Drug-drug interactions, and
3. Variability due to genetic makeup involving germline and somatic changes in DNA.

The existence of a genetic contribution to drug metabolism has been known since it became evident that drug levels in the blood or urine were changeable and heritable. However, it was not until the molecular era that this could be clearly attributed to various drug metabolizing genes, and the basis of these effects identified. An example of how much more we now know about drug effects is provided by considering the cytochrome P450 enzymes and gene family (Box 3.2).

The combination of traditional pharmacologic knowledge and genetics became pharmacogenetics. There are many different definitions for pharmacogenetics and the related pharmacogenomics. Those used in this text are:

Pharmacogenetics: The effect that the genotype has on an individual’s drug response, and generally deals with a single or small number of gene effects.

To be consistent with the concept of omics introduced in the next chapter, pharmacogenomics involves the use of genome-wide strategies, including microarrays or DNA variant profiles to identify the inherited basis for differences between individuals in their responses to drugs.

As the costs of healthcare soar, the emphasis on prevention increases – and what better way to save health dollars than to reduce


BOX 3.2 CYTOCHROME P450 ENZYMES.

Inherited variations in our ability to metabolize drugs are common. An important class of enzymes involved in drug metabolism is the cytochrome P450 enzymes (CYPs – acronym for cytochrome P). These are the major phase I drug metabolizing enzymes and so involve oxidation, reduction and hydrolysis. CYPs that share at least 40% DNA sequence homology are grouped within families denoted by an Arabic number. A letter after this denotes a subfamily, and members within subfamilies are numbered sequentially, for example, the gene CYP3A4. Although humans have a large number of P450 enzymes, important drug metabolizing activity is found within families 1, 2 and 3 and within these families there are six major enzymes: CYP1A2, CYP2C9, CYP2C19, CYP2D6, CYP2E1 and CYP3A4. Although the main focus so far has been on drug metabolism, CYPs are also essential for metabolizing some toxins. For example, enhanced activity of the CYP2D6 gene has been associated with a number of cancers (bladder, liver, pharynx, stomach and cigarette induced lung cancer). The explanation is that increased metabolism of environmental toxins by CYP2D6 leads to the accumulation of carcinogenic intermediates.

CYP2D6 also illustrates the broad effects that a gene can have on drug metabolism ranging from: (1) An inability to metabolize the drug at one end of the spectrum to, (2) Ultra fast metabolism of the drug. A large number of drugs (close to 20% of those commonly prescribed) are metabolized by CYP2D6. About 5–10% of Caucasians have a deficiency in their metabolizing potential, and so drug effects are exaggerated. This deficiency is inherited as an autosomal recessive trait. Those with two mutations (one mutant allele from each parent) are particularly at risk. Once the CYP genes were cloned it was possible to show that mutations were heterogeneous including single base changes and deletions. Over 700 DNA variants have now been described although many of these have not been assessed for functional significance [13]. This heterogeneity in the number of mutations would make it difficult to undertake routine DNA screening for CYP2D6 with conventional technology, although exome or whole genome sequencing (Chapter 4) would be an ideal approach to screen for variants in this and other genes. Another interesting mutation with CYP2D6 is the presence of multiple gene copies with up to 12 being described, i.e. patients with this abnormality would be super-metabolizers, and so drug doses would need to be increased to achieve a therapeutic effect. This variant seems to be particularly common in East Africans.
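The naming rule described in Box 3.2 (family number, then subfamily letter, then member number) can be illustrated with a short sketch. The function name `parse_cyp` and the regular expression are illustrative assumptions; the pattern only covers the simple form used by the six enzymes named in the box and is not a complete parser for all CYP gene symbols.

```python
import re
from collections import defaultdict

# Assumed pattern for the simple naming form in Box 3.2: CYP + family + subfamily + member.
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z])(\d+)$")

def parse_cyp(name):
    """Split a CYP gene name into family number, subfamily letter and member number."""
    match = CYP_NAME.match(name)
    if match is None:
        raise ValueError(f"not a CYP name of the expected form: {name!r}")
    family, subfamily, member = match.groups()
    return {"family": int(family), "subfamily": subfamily, "member": int(member)}

# The six major drug metabolizing enzymes listed in Box 3.2, grouped by family.
MAJOR_CYPS = ["CYP1A2", "CYP2C9", "CYP2C19", "CYP2D6", "CYP2E1", "CYP3A4"]
by_family = defaultdict(list)
for gene in MAJOR_CYPS:
    by_family[parse_cyp(gene)["family"]].append(gene)

print(parse_cyp("CYP3A4"))
print(dict(by_family))
```

Grouping by the parsed family number shows that four of the six major enzymes fall in family 2, which matches the list given in the box.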

the frequency of adverse drug reactions? The importance of this was illustrated in a UK study involving 18 820 patients admitted to two hospitals over a six month period. It showed that:
1. 1 225 of these admissions were related to adverse drug reactions;
2. Overall mortality was 0.15%;
3. 4% of the hospitals’ bed capacity was used to treat these patients, with an estimated cost of £466 million, and
4. Most side effects were considered avoidable, or possibly avoidable, with the common problem drugs being warfarin, low dose aspirin, diuretics and non-steroidal anti-inflammatory drugs [14].
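As a quick check on the scale of the UK study figures quoted above, the share of admissions related to adverse drug reactions can be worked out directly from the two counts in the text. The function name `adr_admission_share` is an illustrative assumption; only the numbers 1 225 and 18 820 come from the source.

```python
def adr_admission_share(adr_admissions: int, total_admissions: int) -> float:
    """Return the fraction of admissions that were related to adverse drug reactions."""
    if total_admissions <= 0:
        raise ValueError("total_admissions must be positive")
    return adr_admissions / total_admissions

# Counts quoted in the text: 1 225 ADR-related admissions out of 18 820 patients.
share = adr_admission_share(1_225, 18_820)
print(f"ADR-related admissions: {share:.1%}")
```

On these counts roughly one admission in fifteen was related to an adverse drug reaction.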


Clinical Practice

In mid 2003, the USA’s Food and Drug Administration (FDA) considered whether it should recommend that DNA testing for TPMT mutations (a gene involved in the metabolism of the cytotoxic thiopurine drugs 6-mercaptopurine and 6-thioguanine) become mandatory before these drugs are used. An advisory committee recommended against this, although it suggested more information was given about risks with TPMT genetic variants. The FDA went along with this because of:
1. The high cost of the DNA test;
2. The difficulty some physicians might have in interpreting the results;
3. Possible delay in starting treatment;
4. The potential that testing might reduce drug doses and so lead to suboptimal treatment, which could have serious consequences in a potentially fatal disorder such as leukemia, and
5. A final justification was the ease with which the drug’s toxic effects could be monitored by serial blood counts.

Of the above, (4) would seem to be persuasive, but the others less so. The costs of the DNA test would seem insignificant compared to complications such as neutropenia or thrombocytopenia, if these prolong the hospital stay. It would also seem reasonable to do the test after treatment was started, so that, as a minimum, those falling outside the normal distribution in terms of gene activity could be monitored more regularly. The FDA’s decision was not particularly helpful because it identified a risk and a possible way to avoid it, but left the final decision to the health professional. Since the latter do not generally know much about pharmacogenetics, it is not entirely surprising that very few took up this option.

Today, about 10% of drugs approved by the FDA contain pharmacogenetic information although it is not mandatory for DNA testing [15]. Some examples of available pharmacogenetic and pharmacogenomic tests are listed in Tables 3.8 and 3.9. Other regulators have not taken the FDA approach and do not require information about pharmacogenetic DNA testing. Not surprisingly, there is reluctance by most health professionals to add pharmacogenetic DNA testing to the patient work-up. This procrastination is unlikely to be justifiable in the longer term, particularly as the concept of personalized medicine continues to be advocated, leading to greater expectations by members of the community. Will the regulators need to take more positive action for practice to change? The alternative is the medico-legal driver which will inevitably lead to a lot of unnecessary testing.

Despite the lack of progress, there have been some success stories:
• Abacavir is used for treating HIV infection. It was shown in 2008 that patients with the HLA B*5701 genotype were at risk of developing a potentially life threatening drug-related allergic reaction known as the Stevens-Johnson syndrome [19]. Although this risk is uncommon (occurring in about 5–8% of patients), it is a serious adverse event that can be reduced in frequency to about 3.4% by DNA testing for the risk genotype and avoiding Abacavir in these patients. The pharmacogenetic approach has now become a routine part of HIV management (Table 3.8).
• Clopidogrel is an important anti-platelet drug used to prevent clotting after coronary stenting. It is a prodrug and for activation must be oxidized by cytochromes such as CYP2C19. Within the community, variability in clinical response to the drug is considerable. Poor metabolizers (CYP2C19*2 or CYP2C19*3 genotypes, found in about 2% of whites, 4% of blacks and 14% of Chinese according to the manufacturers) will have less active drug and are at risk of the stent being thrombosed leading to myocardial infarction. The FDA recommends that those who are poor metabolizers


TABLE 3.8 Some examples of pharmacogenetic germline DNA testing [16].

The TPMT gene is involved in the metabolism of thiopurine drugs used to treat leukemia, rheumatoid arthritis, inflammatory bowel disease and prevention of graft rejection. The TPMT enzyme has variable activity with high, intermediate and low metabolizing potential. Low metabolizers (~1 in 300 individuals) are more prone to complications, predominantly neutropenia, if given standard doses. High metabolizers are more likely to reject organ transplants because the effective dose is reduced. DNA tests can distinguish the low and high metabolizers allowing drug dosage to be adjusted. The molecular basis for differential gene function involves missense changes with the important variants being TPMT*3A (Caucasians) and TPMT*3C (South East Asians or Africans).

CYP2D6 and CYP2C19 are genes involved in metabolizing a range of drugs for treating depression and psychosis. Different genotypes have a significant impact on drug metabolism, e.g. 7–10% of Caucasians are poor CYP2D6 metabolizers and the percentage of rapid CYP2D6 metabolizers varies considerably in different ethnic groups.

The UGT1A1 gene is involved in metabolism of irinotecan used to treat metastatic colorectal cancer. 20–35% of patients treated with this drug experience severe diarrhea and neutropenia with about 5% mortality. The UGT1A1*28 variant is associated with higher risk of complications.

VKORC1 and CYP2C9 are two genes that explain up to 40% variance in clinical response to warfarin used as an anticoagulant for many clinical indications. Other factors that influence the warfarin effect are age, sex, smoking, liver disease and concomitant medications. A narrow therapeutic index and high variability in drug response make warfarin a good candidate for pharmacogenetics. Poor metabolizing variants associated with VKORC1 and CYP2C9 can lead to bleeding complications particularly in the first few months after starting treatment.

CYP2D6 plays an important role in the metabolism of tamoxifen, a drug used to prevent recurrence of breast cancer after treatment. Tamoxifen is a prodrug and must be metabolized to its active product endoxifen. There is considerable inter-population and individual variability in how this drug is activated, ranging from poor, intermediate and extensive to ultra-rapid. About a third of women treated with tamoxifen relapse and there is variation in the frequency of side effects. Some trials have shown a relationship between genetic metabolizing status and clinical response while others have not.

Statins are front line drugs to treat dyslipidemias. They function by reducing serum cholesterol and stabilizing atherosclerotic plaques. An important and common side effect of these drugs is myopathy which can be asymptomatic, showing up only as a raised creatine kinase, or it can be associated with life threatening rhabdomyolysis. Because of its role in liver transportation, variations in SLCO1B1 function detected by SNPs are thought to predispose to myopathy. Drug-drug interactions are also important contributors to myopathy, particularly those that inhibit CYP3A4 function.

Stevens-Johnson syndrome and toxic epidermal necrolysis are serious and even fatal skin reactions to a number of drugs including carbamazepine (epilepsy) and abacavir (HIV). It has been shown that patients with certain HLA types (carbamazepine: HLA-B*1502 in patients of Asian ancestry while HLA-B*3101 is the risk allele in Europeans, and abacavir: HLA-B*5701) are more likely to develop this complication of therapy. Therefore, these drugs should not be used in patients with these HLA types unless the benefits clearly outweigh the risk.

consider alternative products. Rarely, individuals metabolize this drug more rapidly (CYP2C19*17) and so are at risk of bleeding. A final consideration is drug-drug interactions and here it is important to note that a number of drugs can interfere with CYP2C19 and so reduce the effectiveness of clopidogrel. Until 2011 this appeared to be the complete story, but then data started to emerge suggesting there were other genes involved in metabolism and these would also need to be considered. This illustrates the complexity of pharmacogenetics, where drug-drug and drug-gene interactions occur, but also reinforces the potential impact of genes on efficacy as well as adverse events.

Somatic cell DNA testing has already been mentioned in this chapter (Table 3.7) and will be discussed further in Chapter 7. It represents a growth area in DNA testing and will allow a more personalized medicine approach in drug


TABLE 3.9 Somatic cell-based pharmacogenomic DNA testing [17,18] and two pharmacogenetic tests.

Products: Tests similar to MammaPrint® (Figure 3.15) include Oncotype DX™ a 21 gene panel and Theros a 2 gene panel ratio signature coupled with the molecular grade index.
Comments: The pharmacogenomic-based strategies involve measurement of multiple genes using microarrays and are sometimes called gene expression signatures.

Products: HER2 gene amplification: About 30% of women with metastatic breast cancer have overexpression of the HER2 protein (human epidermal growth factor 2 receptor). These cells are less likely to respond to conventional therapies and so a novel treatment using a monoclonal antibody targeted to the HER2 receptor was developed. Overexpression of HER2 is generally caused by amplification of the HER2 gene.
Comments: Trastuzumab is a humanized monoclonal antibody against the HER2 receptor. This drug is expensive and has significant side effects. Therefore it is best used in personalized therapy, i.e. only in those patients who are most likely to respond. Assessing the copy number of the HER2 receptor is possible by conventional immunohistochemistry staining or a molecular cytogenetics test such as FISH.

Products: KRAS gene: Cetuximab is another monoclonal antibody used in cancer therapy. It is specific for EGFR (epidermal growth factor receptor) and is effective in colorectal cancer unresponsive to chemotherapy.
Comments: Like Trastuzumab, Cetuximab works best in a particular cancer subtype, i.e. where the KRAS gene in colorectal cancer is the wild type. Activating mutations usually found in exon 2 of KRAS (about 42% of tumors) are now sought before therapy with this expensive drug is started.

selection for cancer treatment based on knowledge of DNA mutations in the patient’s own cancer tissue. This approach also provides a boost for new drug discovery, discussed in more detail in the next section. Somatic cell DNA testing was first applied clinically in leukemic disorders, because changes in DNA made it easier to confirm or even make a clinical diagnosis. Subsequently, somatic cell DNA testing enabled the progress of disease and its therapy to be followed through detection of minimal residual disease (Chapter 7).

Drug Development

Genomic technologies increasingly play a role in many aspects of the drug discovery pipeline, including target identification, target validation, lead identification, in vitro biomarker discovery and animal safety testing [20]. Once developed, drugs need to be evaluated for safety and efficacy through clinical trials. Information about the potential for adverse drug reactions is obtained at two stages in a drug’s development. The first is the pre-marketing randomized clinical trials (RCTs) mandated by regulatory authorities. The second is the post-marketing experiences – i.e. the results of using the drug on patients for therapeutic purposes.

Pre-marketing testing is stringently controlled and expensive because it involves RCTs. Many products fail at this stage if the regulatory authorities are not convinced of the product’s efficacy, or there are concerns about toxicity. Increasing costs to manufacturers and falling healthcare budgets are having a negative impact on the drug discovery pipeline and fewer new drugs are being produced. In this environment, a more cost effective way to conduct an RCT is needed. One pharmacogenetic/pharmacogenomic approach is to stratify the subjects by DNA testing and select for trial only those likely to respond. Another approach is to exclude from the trial those who are likely to have serious adverse events as determined by their pharmacogenetic profiles.

Drugs that have failed the regulatory process are being re-evaluated using retrospective pharmacogenetic/pharmacogenomic stratification to see if benefits can be improved or the frequency of side effects reduced. This is a potentially important gain for the pharmaceutical companies, although as described in Chapter 10, it will produce a cohort of individuals who might not be eligible to have these new drugs particularly if they are subsidized by government. Hence, despite its potential value, some pharmaceutical companies are not taking the stratification approach because they would prefer to develop a drug that has wider applications and then, if necessary, re-evaluate using stratification.

Pre-marketing cannot answer all questions about toxicity, or provide all permutations and combinations of genes, drug-drug interactions, environment and ethnic-specific effects that will influence the efficacy and toxicity profile of a new drug. Post-marketing monitoring does not involve formal randomized studies, but generally relies on observations. These will not easily detect rare or unusual effects. Ultimately, costly and long term cohort studies are needed to fully evaluate the efficacy and side effects associated with drugs. Not surprisingly, it can be expected that new drugs will continue to produce unexpected side effects unless other strategies, for example, pharmacogenomics, can be added to the regulatory and marketing steps. Another evaluative approach that has recently assumed greater prominence is comparative effectiveness research [21] (Chapter 4).

FIGURE 3.15 Pharmacogenomic analysis to guide breast cancer prevention treatment modeled on a commercial product MammaPrint® [18]. A DNA microarray can be developed using knowledge of genes likely to be involved in breast cancer. Initially this microarray would include many redundant genes. In one example the initial gene expression profile involved around 25 000 genes. The key genes providing information about prognosis in terms of metastasis-free 5 year survival were then identified by comparing two groups: women with breast cancer who had relapsed within 5 years and women who had not relapsed. From the 25 000 genes a more defined expression profile was developed which in the case of MammaPrint® has 70 genes. Data generated with this microarray using banked biopsy tissue from breast cancer were then validated and the product was approved by the FDA. This genomic profile can be used in conjunction with the more traditional prognostic parameters to guide decision making in terms of what adjuvant treatments might be useful to prevent relapse. In some cases of low risk breast cancer, it might even be reasonable to have no adjuvant treatment because the prognosis for some in this subgroup is excellent. Post-marketing ongoing evaluation of this product is still underway. More on this test is found in Box 4.3.
Diagram labels: Breast Cancer Patients; 2 500 genes; 70 genes; Rx #1 High Risk; Rx #2 Low Risk.
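The gene-reduction step described in the Figure 3.15 caption (narrowing a large expression profile to a small informative set by comparing women who relapsed with women who did not) can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the method used for MammaPrint®: the scoring rule (difference in group means divided by the summed standard deviations), the function names and the toy expression values are all hypothetical.

```python
from statistics import mean, stdev

def separation_score(relapsed, not_relapsed):
    """Assumed score: absolute difference in group means over the summed standard deviations."""
    diff = abs(mean(relapsed) - mean(not_relapsed))
    spread = stdev(relapsed) + stdev(not_relapsed)
    if spread > 0:
        return diff / spread
    return float("inf") if diff > 0 else 0.0

def select_genes(expression, k):
    """Return the k gene names whose expression best separates the two groups."""
    scores = {gene: separation_score(r, n) for gene, (r, n) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: gene -> (values in relapsed group, values in non-relapsed group).
toy_expression = {
    "GENE_A": ([9.0, 8.5, 9.5], [2.0, 2.5, 1.5]),  # large, consistent difference
    "GENE_B": ([5.0, 5.2, 4.8], [5.1, 4.9, 5.0]),  # no real difference
    "GENE_C": ([3.0, 3.5, 2.5], [6.0, 6.5, 5.5]),  # moderate, consistent difference
    "GENE_D": ([7.0, 1.0, 4.0], [4.0, 6.0, 2.0]),  # noisy, same mean in both groups
}

print(select_genes(toy_expression, 2))
```

In a real study the ranking would be followed by validation on independent samples, as the caption notes for the banked biopsy tissue.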


BOX 3.3 THREE EXAMPLES ILLUSTRATING THE CLINICAL USEFULNESS OF A DNA TEST.

(1) Huntington disease (HD): Almost all who have Huntington disease will have a mutation involving an expansion of a (CAG)n triplet in the HTT gene, i.e. the test is very sensitive. The rare exceptions have clinical features of the disease but normal (CAG)n repeats. These are phenocopies due to other HD loci, e.g. HDL2 involves an expansion of triplet repeats in the junctophilin 3 gene. No one with triplet repeats ≤26 in HTT will develop HD (i.e. the test is highly specific). An individual with triplet repeats ≥40 will develop HD and the penetrance is 100%. Complexities arise with intermediate level of repeats (between 27–39) (Table 2.4). So the HD DNA test itself is very useful but this information does not lead to any known treatment at present, although important life decisions are better informed.

(2) Breast cancer: Testing for BRCA1 and BRCA2 DNA mutations is a different story since the penetrance can change from 36% to 85% depending on ethnic background and the type of mutation found (Chapter 7). In addition, because of the size of the gene and the number of genes likely to be found in the genetic forms of breast cancer, there is no guarantee that a mutation will be detectable. Even if mutations are not found in known breast cancer causing genes, the individual remains at risk of sporadic forms of breast cancer. However, in contrast to HD, there are therapeutic options in breast cancer albeit fairly radical ones like prophylactic bilateral mastectomy.

(3) Hemochromatosis: The DNA test to diagnose genetic hemochromatosis is very useful because it is associated with a relatively simple and effective treatment involving regular venesection. Its value in a community screening program is more problematic because: (i) The DNA test provides most information when individuals of north-western European ethnic background are tested, and a homozygous p.Cys282Tyr mutation is detected; (ii) Penetrance is low and variable since there are environmental and other genetic factors that influence progression and severity, and (iii) The costs associated with education and provision of genetic counseling and support are considerable particularly if dealing with communities with multi-ethnic backgrounds.
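The repeat ranges given for the HTT test in Box 3.3 can be written as a small lookup. This is a sketch for illustration only and not a clinical tool: it assumes the cut-offs are 26 repeats or fewer (will not develop HD), 27–39 (intermediate) and 40 or more (will develop HD), reading the two outer bounds from the intermediate range stated in the box; the function name and category labels are hypothetical.

```python
def classify_htt_repeats(cag_repeats: int) -> str:
    """Map an HTT (CAG)n repeat count to the category described in Box 3.3."""
    if cag_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cag_repeats <= 26:
        return "will not develop HD"
    if cag_repeats <= 39:
        return "intermediate (interpretation is complex)"
    return "will develop HD (penetrance 100%)"

# Example repeat counts spanning the three ranges.
for repeats in (20, 26, 27, 39, 40, 45):
    print(repeats, "->", classify_htt_repeats(repeats))
```

The two boundary pairs (26/27 and 39/40) are the cases worth checking, since they are where the box's ranges meet.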

EVALUATION

The ideal DNA diagnostic test should be sensitive and specific. To predict the likelihood of disease on the basis of the test result, the parameters of positive predictive value and negative predictive value become important. Although well established in practice, the parameters in Table 3.6 reflect population-based values and so can be less meaningful for the individual. As well as the above measures to assess the usefulness of laboratory tests, another important variable is the penetrance of the underlying genetic disorder (Table 2.2).

ACCE

The ideal DNA test scenario would include:
1. A highly sensitive and specific test;
2. A genetic disease with high penetrance, and
3. A disease that is treatable (Box 3.3).

Because DNA tests are more complex than the conventional pathology tests, different


approaches have been developed for evaluation. One of these is called ACCE (A – analytic validity; C – clinical validity; C – clinical utility and E – ethical, legal, social issues) (Figure 3.16, Table 3.6). Much has been written on the value of the ACCE framework and how it can be improved as well as other approaches to evaluating genetic DNA tests [1,22].

FIGURE 3.16 The four components to the ACCE approach for evaluating DNA tests. These are: A – analytic validity; C – clinical validity; C – clinical utility and E – ethical, legal, social implications [1,22].
Diagram labels:
Analytic Validity (laboratory genotype): Sensitivity, Specificity, QC, QA
Clinical Validity (clinical phenotype): Sensitivity, Specificity, NPV, PPV, Penetrance
Clinical Utility (risk versus benefit): Clinical trials, Economic evaluation, Interventions, QA
ELSI (broader issues): Acceptability, Discrimination, Privacy

There are two important components in evaluating a DNA genetic test. The first is the test itself including how well it measures what it is supposed to measure – broadly captured by analytic validity. This measures laboratory performance and is usually well addressed by regulatory agencies whose main focus is the safety of medical products. Continuing assessment of safety is ensured through the various quality control steps included in each assay. Overlying this are quality assurance programs that guarantee that the final reports are correct and allow laboratories to benchmark their performances against others. The OECD has made an important contribution here through its 2007 report on quality assurance guidelines in DNA genetic testing [23].

The second component of the evaluation is the clinical value or effectiveness of the test (captured by clinical validity and clinical utility); in other words, does the test make any difference to the clinical management of the patient (and their family), or does the test alter the outcomes of the disease? Included in the test’s clinical value would be ELSI. As there are increasing demands on health dollars, there is often a health economic analysis as part of the assessment. A useful model that illustrates how DNA genetic tests can be evaluated is given by epilepsy. In a 2010 report, it was shown that although there are over 20 genes now described as being involved in epilepsy, it was possible to use an ACCE systematic approach to evaluate and identify the diagnostic or predictive DNA tests that had some, a lot or little clinical utility [24].

Despite considerable work and discussion about ACCE and other approaches to evaluating DNA genetic tests, there remains a lack of consensus on what is the most appropriate way forward and, in some jurisdictions, there is little commitment to formal evaluation of DNA genetic testing, particularly its clinical utility. Therefore, it is timely to be reminded that if this is difficult, the challenges for the emerging genomic based DNA/RNA tests are even greater.

CHALLENGES

Genetic Counseling

Genetic counseling provides patients and families with information which allows them to make informed decisions. Who gives the genetic counseling? This is relevant as a new


cohort of health professionals called genetic counselors has emerged. We can distinguish two types of genetic counseling, given by:

1. A range of medical practitioners including specialists and family physicians, and
2. Professional genetic counselors who are skilled to deal with complex problems or family based studies that require expertise and resources not readily available in clinical practice.

Individuals seeking advice because of a family history of genetic disease or a couple in the prenatal diagnosis situation are given appropriate information that will allow informed decisions to be made. The rapid advances in molecular genetics can assist this process but can also complicate it if the information is incomplete, or is not linked to relevant therapeutic options. For example, a question frequently asked is how severely affected will be an offspring with a particular genetic disorder? In β thalassemia, the molecular basis for the milder, non-transfusion-dependent form called thalassemia intermedia is understood in some cases. However, other factors are also involved, making population-based data less relevant to the individual case. Has knowledge of the molecular pathology of thalassemia, arguably the most intensively studied of all Mendelian genetic defects, reached the stage that the co-inheritance of β thalassemia with other genetic changes (e.g. an increase in fetal hemoglobin or coexisting α thalassemia) will enable a confident prediction of severity to be made? Unfortunately, the answer is no.

To avoid the implication of genetic determinism, genetic information, including the results of DNA genetic tests, is considered comparable to other types of medical information. However, DNA testing is not so straightforward, since the same test can be used in different clinical scenarios and the associated risk/benefit considerations can vary considerably. An example already noted is the germline DNA test versus the somatic cell DNA test which will have significant differences in ELSI (Figure 3.17).

[Figure 3.17 panels: DNA test types shown left to right as somatic cell, germline diagnostic, germline screening and germline predictive; implications of the DNA test shown as increasing risks for ELSI and for other family members.]

FIGURE 3.17 ELSI and DNA genetic tests. A DNA test can have different implications for individuals and family members. The most straightforward are the somatic cell DNA tests because these affect only the patient and have no significance for family members. In contrast, all germline DNA tests will have some implications for family members because genes are shared. Within the latter group are different layers of complexity based on the reason for testing. Germline diagnostic DNA tests have the least risks because they are only confirming that the patient has a disease. In contrast, the germline predictive DNA tests are of potential concern because they are dealing with clinically normal individuals. In between are the screening tests with ELSI that will vary depending on the underlying disease, populations being screened, preparation and support for those being screened and so on.

Medical Management

The management of patients with inherited genetic disorders can be different to the traditional model of clinical care, because risks


are shared between family members and so ill health or even well being is not limited to one individual. One model where all genetic disorders are dealt with by genetic specialists has worked well to date, but is not sustainable in the longer term particularly with the move from Mendelian genetics to complex genetic disorders. Therefore, a team approach utilizing the expertise and, perhaps just as importantly, the resources of a number of health professionals is the way forward (Figure 3.18). The molecular medicine team is multidisciplinary and must also include a close link with research to ensure both an effective translational pipeline as well as responsible use of research DNA tests in the context of clinical care.

[Figure 3.18 panels: the individual/family at the center, surrounded by clinician-researcher, primary care MD, specialist physician, laboratory scientist, support network and counselor.]

FIGURE 3.18 Molecular medicine team. The focus for molecular medicine is the individual and family. Depending on the type of DNA test, various other members of the molecular medicine team are needed. Ultimately, the primary care (family) physician will be the key professional involved in the long term care and so must be fully engaged in the process. For the molecular medicine team to be effective will require an electronic health record (EHR) for data storage or a link to where data are stored. It would be expected that DNA tests might be generated by different members of the team so the EHR is important to allow all authorized team members to access data and results and avoid unnecessary testing. Although the patient and family are the focus they need to assume responsibility for their genetic information.

There are two certainties about molecular medicine. The first is that new research discoveries will continue and they will require skilled clinician-researchers to work with others to translate their findings into clinical practice. The second is in relation to DNA genetic testing. With the move from genetics to genomics, as well as the use of modern DNA sequencing approaches (Chapter 4) it is inevitable that there will be an increasing number of VUS. These include DNA missense changes as well as structural variations. In terms of DNA genetic testing there are some emerging challenges:

1. What is the laboratory’s responsibility for ongoing reviews of VUS results to see if new information has emerged to allow them to be re-classified into more definitive categories of pathogenic or non-pathogenic variants?
2. How can the DNA test report be readable and also provide sufficient information to show the evidence underpinning some of the conclusions or results?

An overview of the laboratory’s responsibility for follow-up, titled The Coming Explosion in Genetic Testing – Is There a Duty to Recontact? identifies 10 salient points that should inform further discussion, particularly as whole genome and exome sequencing (discussed in Chapter 4) markedly expands the pool of VUS [25]. Some points that have not already been made include:

1. Current VUS will become interpretable in the future;
2. Multiple healthcare providers might be responsible for test results included in the medical record;
3. Patients will have different expectations for being recontacted, and


4. The health implications for relatives and the physician’s duty of care are yet to be fully explored.

One surprising conclusion notes that despite considerable debate, there is no clear understanding of whether there is an obligation to recontact patients with updated DNA test results. However, this comment is based on the traditional model of clinical care where the clinician is the gatekeeper. As will be discussed in Chapter 5, the model may not be relevant in a rapidly changing area like molecular medicine where more responsibility needs to be taken by the patient. An alternative approach to long term follow-up is already underway through the US-based Knome® company and involves the individual holding the primary DNA sequencing data and periodically returning to the company for an update on what new information is available.

Differing viewpoints are held on the contents of the DNA test result. Either (1) the report should be brief and simple so that it is understandable by a non-specialist physician, or (2) the report should include all the data, with the expectation that there may be changes present that could be important at some future date when more is known about DNA variants and how they might impact on the phenotype. Option (1) is a little unrealistic unless the disorder is a Mendelian one with fairly straightforward mutations that provide unequivocal evidence for or against disease. Huntington disease DNA testing might be an example of this situation, although even here there is a range of DNA triplet repeats that are difficult to interpret (Table 2.4). Option (2) relies on long term follow-up but should be the format to aim for because there is considerable uncertainty with many of the changes detected, particularly with the move into omics that is the subject of the next chapter.

One problem with option (2) is how to describe the significance of results so they are clinically meaningful. Attempts are underway to classify variants according to risk, particularly in the genetic cancer disorders where considerable epidemiologic and molecular data have been collected. One model has five variant classes:

Class 1 – not pathogenic (<0.001 probability of being pathogenic);
Class 2 – likely not pathogenic (0.001–0.049);
Class 3 – uncertain (0.05–0.949);
Class 4 – likely pathogenic (0.95–0.99), and
Class 5 – definitely pathogenic (>0.99) [4,26].

This is an ambitious attempt at describing and quantifying risk so it is easier to understand for the physician who ordered the test as well as the patient. However, there is still a long way to go particularly with the non-cancer disorders where data required to assess risk are not so readily available. The work has been taken up as one of the activities of the Human Variome Project [27].

References

[1] Burke W, Zimmern R. Moving beyond ACCE: an expanded framework for genetic test evaluation. 2007. www.phgfoundation.org/policydb/11756/
[2] Facts on SNPs: www.ornl.gov/sci/techresources/Human_Genome/faq/snps.shtml
[3] The Human Gene Mutation Database at the Institute of Medical Genetics in Cardiff (www.hgmd.cf.ac.uk/ac/index.php). A professional version with a larger number of mutations is also available but this requires a paid subscription (see https://portal.biobase-international.com/hgmd/)
[4] Calo V, Bruno L, La Paglia L, et al. The clinical significance of unknown sequence variants in BRCA genes. Cancers 2010;2:1644–60.
[5] HUGO gene nomenclature committee. www.genenames.org/
[6] Human Genome Variation Society. www.hgvs.org/mutnomen/recs.html
[7] Metzker ML, Caskey CT. Polymerase chain reaction (PCR). In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[8] Borecki IB. Linkage and association studies. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2005.
[9] Sharpe NF, Carter RF. In: Genetic testing: care, consent and liability. New Jersey: Wiley-Liss; 2006.
[10] EpiMax table calculator – epidemiology and lab statistics from study counts. www.healthstrategy.com/epiperl/epiperl.htm
[11] AlzGene Forum providing many data and facts in AD. www.alzgene.org/
[12] www.wikihow.com/Calculate-Sensitivity,-Specificity,-Positive-Predictive-Value,-and-Negative-Predictive-Value
[13] Home page of the P450 (CYP) allele nomenclature committee and database of mutations in these genes. www.cypalleles.ki.se/
[14] Pirmohamed M, James S, Meakin S, et al. Adverse drug reactions as cause of admission to hospital: prospective analysis of 18 820 patients. British Medical Journal 2004;329:15–19.
[15] FDA’s list of drugs with pharmacogenetic tests available. www.fda.gov/Drugs/ScienceResearch/ResearchAreas/Pharmacogenetics/ucm083378.htm
[16] Wang L, McLeod HL, Weinshilboum RM. Genomics and drug response. New England Journal of Medicine 2011;364:1144–53.
[17] Sotiriou C, Pusztai L. Gene expression signatures in breast cancer. New England Journal of Medicine 2009;360:790–800.
[18] HTA: Impact of Gene Expression Profiling Tests on Breast Cancer Outcomes. www.ahrq.gov/downloads/pub/evidence/pdf/brcancergene/brcangene.pdf
[19] Mallal S, Phillips E, Carosi G, et al. HLA-B*5701 Screening for Hypersensitivity to Abacavir. New England Journal of Medicine 2008;358:568–79.
[20] Semizarov D, Blomme E. In: Genomics in drug discovery and development. New Jersey: John Wiley & Sons Inc.; 2009.
[21] Khoury MJ, Rich EC, Randhawa G, Teutsch SM, Niederhuber J. Comparative effectiveness research and genomic medicine: An evolving partnership for 21st century medicine. Genetics in Medicine 2009;11:707–11.
[22] Addressing challenges in genetic test evaluation: evaluation frameworks and assessment of analytic validity 2011. US Agency for Healthcare Research and Quality. www.ncbi.nlm.nih.gov/books/NBK56750/
[23] OECD guidelines for quality assurance in molecular genetic testing 2007. www.oecd.org/dataoecd/43/6/38839788.pdf
[24] Ottman R, Hirose S, Jain S, et al. Genetic testing in the epilepsies – report of the ILAE Genetics Commission. Epilepsia 2010;51:655–70.
[25] Pyeritz RE. The coming explosion in genetic testing – is there a duty to recontact? New England Journal of Medicine 2011;365:1367–9.
[26] Plon SE, Eccles DM, Easton D, et al. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results. Human Mutation 2008;29:1282–91.
[27] Human Variome Project: www.humanvariomeproject.org/

Note: All web-based references accessed on 15 Feb 2012.
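As a supplementary illustration of the five-class variant scheme described in this chapter [4,26], the probability bands can be written as a small lookup. This is a minimal sketch rather than part of the cited recommendations: the function name `classify_variant` is hypothetical, and the treatment of values that fall exactly on a boundary, or in the small gaps between the published bands (for example between 0.049 and 0.05), is an assumption made here so that every probability maps to one class.

```python
def classify_variant(p_pathogenic: float) -> int:
    """Map a probability of pathogenicity to a class from 1 to 5.

    Bands follow the five-class model quoted in the chapter. Treating each
    band as a half-open interval is an assumption of this sketch.
    """
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if p_pathogenic > 0.99:
        return 5  # definitely pathogenic
    if p_pathogenic >= 0.95:
        return 4  # likely pathogenic
    if p_pathogenic >= 0.05:
        return 3  # uncertain
    if p_pathogenic >= 0.001:
        return 2  # likely not pathogenic
    return 1      # not pathogenic


if __name__ == "__main__":
    for p in (0.0005, 0.03, 0.5, 0.97, 0.999):
        print(p, "-> Class", classify_variant(p))
```

Written this way, Class 3 covers by far the widest band of probabilities, which is consistent with the chapter's concern about a growing pool of variants of uncertain significance.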

CHAPTER 4

Omics

OUTLINE

Introduction
DNA Sequencing
    Technology
    Bioinformatics Support
    Research Applications
    Clinical Applications
DNA Microarrays
    Technology
    Gene Expression
    SNP Microarray
    Array-Based Comparative Genomic Hybridization (aCGH)
    Bioinformatics
    Research Applications
    Clinical Applications
Other Omics
    Proteomics
    Metabolomics
    Phenomics
    Metagenomics
Systems Biology
    Clinical Applications
Overview
References

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00004-9 © 2012 Elsevier Inc. All rights reserved.

INTRODUCTION

The Human Genome Project has generated more questions than answers, but there is little doubt that it has led to many new technological developments. The ability to study all or most genes, mRNA transcripts, proteins and a range of cellular products was considered as the emergence of omics in Chapter 1. This chapter will expand on omics, and the way it is driving research discoveries and clinical care through molecular medicine.

DNA SEQUENCING

Technology

Arguably one of the most significant, recent, technological developments has been DNA sequencing, which is now faster, cheaper and easier to do [1,2]. A chronology of the development of DNA sequencing is given in Table 4.1. It shows a slow start before 1977, followed by a period of reliance on Sanger sequencing, which achieved its full potential once it became fully automated. A new round of innovations followed the completion of the Human Genome Project in 2000, leading to the development of next generation (NG) DNA sequencing.

Although Sanger enzyme-based DNA sequencing was initially less popular than the Maxam and Gilbert chemical method, it soon became the preferred technique because it needed fewer toxic materials and, once the DNA cloning step was no longer required, it became faster and easier to use. Improvements over the original chain termination method and the availability of capillary electrophoresis (Box 4.1) ensured that it is now a routine part of clinical diagnostic work. However, it remains expensive, and considerable work is needed to annotate the DNA variants found (Chapter 3).

The next major development was an increase in throughput that was made possible by dramatically increasing the number of sequences generated. Initially called massively parallel sequencing, this is now known as NG DNA sequencing (Table 4.2). The megabytes (Mb) of DNA sequences that were generated by the

TABLE 4.1 Landmarks in the development of DNA sequencing.

Date          Event
1953          Structure of DNA shown to be a double-stranded helix.
1972          Recombinant DNA technologies allow DNA to be cloned.
1977          Sequencing methods developed by A Maxam and W Gilbert using chemical degradation and F Sanger using enzymatic synthesis. Sanger and Gilbert awarded Nobel Prize for this achievement.
Late 1980s    First semi-automated sequencing platforms developed commercially. Sequencing lengths generated measured in kilobases (Kb).
1995          First complete bacterial sequence described for H. influenzae by J Venter and colleagues. Now sequencing options expand from Kb to Mb (megabase).
2000          First sequence of human haploid genome announced. Takes until 2003 for annotated more accurate version to be published. Cost around $3 billion.
2004          The US National Human Genome Research Institute funds work to reduce the cost of a whole genome sequence to $1 000 in 10 years.
2004–2005     The move from megabase (Mb) to gigabase (Gb) of sequencing length comes with the realization that Sanger sequencing has reached its limitation and new approaches based on massively parallel sequencing start to emerge.
2007          Complete human diploid genome sequences publicly announced for J Watson and J Venter with the former costing about $2 million.
2009          First human genome sequence using single molecule sequencing technique.
2010          Single molecule sequencing (third generation) has the potential to increase sequence data generated from Tb (terabyte) to Pb (petabyte). Advantages include: faster, occurs in real time, longer read lengths and easier detection of heterozygous changes. Claims are made that a whole genome sequence will cost $100 and take 15 minutes to do within 5 years.
2010          First publication of a human metagenome – an additional layer of complexity for bioinformatics.


Sanger method expanded up to the gigabyte (Gb) and terabyte (Tb) range. In NG DNA sequencing, individual sequencing fragments are very small (around 100 bp) so it became necessary to have multiple coverage, with a de novo whole genome sequence typically requiring 30× coverage to ensure that most (but not necessarily all) regions of DNA were adequately represented. For re-sequencing or targeted sequencing applications, a less dense coverage was acceptable. Another popular strategy was exome sequencing, which was considerably easier and faster to do as only known exons (including their exon-intron boundaries) were included in the clonal/amplification/sequencing steps (Figure 4.1).

The four components to NG DNA sequencing are shown in Figure 4.2. NG DNA sequencing utilizes some conventional methodologies such as sequencing by synthesis (DNA polymerase) or ligation (DNA ligase) and these remain expensive. A significant time and cost limitation is the DNA or RNA preparation steps that require cloning or PCR. However, more novel approaches are being developed including sequencing in real time. One that is particularly interesting is single molecule sequencing because it allows the initial cloning or amplification step in NG DNA sequencing to be bypassed thereby saving time and money. Presently, single molecule (also called third generation) DNA sequencing is in the roll-out phase and there are only a limited number of research publications available that describe its utility and applications (Table 4.2).

Bioinformatics Support

Compared to Sanger sequencing, the data output from NG DNA sequencing is a significant bioinformatics challenge with support required in: (1) Production bioinformatics to process raw sequence data generated including quality assurance steps to remove suboptimal

BOX 4.1 DEVELOPMENTS IN DNA ELECTROPHORESIS.

After PCR the next step is usually electrophoresis to assess PCR products or separate them into fragments. DNA is negatively charged and so migrates towards the positive electrode (the anode). Sizing of separated DNA is undertaken by comparing against standard markers. Mobility shifts can now be identified and measured. The separation of DNA is possible with slab gels made from agarose or polyacrylamide (for smaller fragments). Slab gels have inherent problems when precise fragment sizing is needed for clinical diagnosis or forensic cases:

1. Variable texture that can influence electrical conductance leading to inconsistency in fragment size calling, and
2. Automation is difficult.

For clinical and forensic DNA testing slab gels have been replaced by capillary gels. These are commercially produced and involve a very fine capillary packed with gel. Capillary gel electrophoresis has revolutionized DNA electrophoresis because it is fast, reproducible, automated and quality assurance measures can be implemented. Sizing is undertaken by computer software which takes away another source of human error.
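The sizing step described in Box 4.1, comparing an unknown fragment against standard markers, can be sketched in a few lines. This is an illustrative sketch only, under the common working assumption that the logarithm of fragment size varies roughly linearly with migration between neighbouring markers. The function name `estimate_size_bp` and the ladder values are hypothetical, and commercial sizing software may well use a different algorithm.

```python
import math
from bisect import bisect_left


def estimate_size_bp(migration, ladder):
    """Estimate fragment size by interpolating log10(size) between markers.

    `ladder` is a list of (migration, size_bp) pairs for the standard
    markers; `migration` is the measurement for the unknown fragment in the
    same units (distance or time). Extrapolation beyond the outermost
    markers is refused rather than guessed.
    """
    points = sorted(ladder)
    positions = [m for m, _ in points]
    if not positions[0] <= migration <= positions[-1]:
        raise ValueError("migration lies outside the range of the markers")
    i = bisect_left(positions, migration)
    if positions[i] == migration:
        return float(points[i][1])  # fragment co-migrates with a marker
    (x0, s0), (x1, s1) = points[i - 1], points[i]
    fraction = (migration - x0) / (x1 - x0)
    log_size = math.log10(s0) + fraction * (math.log10(s1) - math.log10(s0))
    return 10 ** log_size


if __name__ == "__main__":
    # Hypothetical ladder: larger fragments migrate less far.
    markers = [(10.0, 1000), (14.0, 500), (20.0, 100)]
    print(round(estimate_size_bp(12.0, markers)))
```

Refusing to extrapolate is a deliberate choice in this sketch: a fragment outside the marker range cannot be sized reliably, which is the kind of quality assurance check that automated capillary systems make possible.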


TABLE 4.2 Comparisons between DNA sequencing methods [1,2].

First generation (Sanger) DNA sequencing: In 1977 chemicals and radioactivity were used for sequencing but these were soon replaced by enzymatic methods. Until PCR became available DNA needed to be cloned to generate multiple copies of single fragments. Sequencing reagents were then incorporated into the PCR itself in what was known as dye termination (Sanger) sequencing. For Sanger sequencing about 100 DNA fragments yielding ~1 Kb of read length are sequenced in parallel. The introduction of capillary electrophoresis allowed greater automation, better QC and more accurate sizing. Multiple samples (96) could be analyzed simultaneously. This type of sequencing remains the gold standard in relation to error and reproducibility.

Second generation (massively parallel or Next Generation) DNA sequencing: From 2005, Mb to Gb of DNA sequence could be generated through massively parallel sequencing of millions of short (50–150 bp) fragments. Although this terminology best describes the new development, it was soon overtaken by the preferred Next Generation (NG) DNA sequencing. For this development, it was necessary to fragment DNA and then prepare pure cloned DNA using various types of PCR including emulsion or bridge PCR. The actual sequencing utilized the traditional Sanger synthesis approach or other methods such as ligation. This was the first major step towards the $1 000 whole genome sequence. It is still expensive compared to Sanger sequencing although the dollar per base cost is very cheap. A downside of NG DNA sequencing is the relatively small fragments sequenced although each year the read length increases. The size limitation is overcome by the depth of sequencing (for example 30×) through the generation of massive (parallel) amounts of overlapping sequences that are then able to be placed in the appropriate part of the jigsaw puzzle through bioinformatics. One ongoing concern is that small fragments might give a distorted view of the genome and so NG DNA sequencing remains under evaluation for clinical diagnostic work.

Third generation (single molecule) DNA sequencing: This started around 2007 and is work in progress. Advantages include bypassing the initial library/cloning/PCR DNA preparation step and going directly to sequencing of single molecules. This became possible as miniaturization allowed a single DNA molecule to be sequenced in real time. Read lengths are predicted to be longer and output is said to be 1 000 times NG DNA sequencing. The expected commercial competition will ensure hardware costs continue to fall. The use of single-stranded DNA without the requirement for cloning or PCR is attractive from a clinical diagnostic perspective because it avoids the inherent errors that occur with amplified DNA. Informatics implications are more complex as Tb to Pb of data are generated. Computer storage capacity and analytic software will remain significant limitations.
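The depth of sequencing mentioned in Table 4.2 can be made concrete with a short calculation. The sketch below assumes the usual definition of mean coverage (number of reads multiplied by read length, divided by genome size), takes 30-fold depth and 100 bp reads as the worked values, and uses a round haploid genome size of 3.2 billion bases that is an assumption and not a figure from the text. The function names are hypothetical. The Poisson estimate is an idealization in which reads land uniformly at random, which real libraries do not do.

```python
import math

HAPLOID_GENOME_BP = 3_200_000_000  # assumed round figure, not from the text


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average number of reads covering each base."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def prob_depth_below(k, coverage):
    """Poisson probability that a given base is covered by fewer than k reads."""
    return sum(
        math.exp(-coverage) * coverage ** i / math.factorial(i)
        for i in range(k)
    )


if __name__ == "__main__":
    n = reads_needed(30, 100, HAPLOID_GENOME_BP)
    print(f"reads needed for 30-fold depth with 100 bp reads: {n:,}")
    print(f"idealized chance a base has fewer than 10 reads: "
          f"{prob_depth_below(10, 30):.2e}")
```

Under these assumptions close to a billion short reads are needed for one genome, which helps explain why the bioinformatics, rather than the sequencing chemistry, becomes the limiting step. The idealized model also overstates how evenly the genome is covered, consistent with the caution that most, but not necessarily all, regions are adequately represented.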

[Figure 4.1 panels: germline DNA or somatic DNA; research, clinical diagnostic or direct-to-consumer; Sanger sequencing, targeted NG sequencing, exome NG sequencing and whole genome sequencing.]

FIGURE 4.1 The evolution of DNA sequencing. Traditional Sanger sequencing is moving towards Next Generation (NG or massively parallel) DNA sequencing with a whole genome sequence the ultimate goal. In the meantime, there are intermediate applications proving popular until the costs or the bioinformatics infrastructure for whole genome sequencing are addressed. They include: (1) Targeted sequencing (or re-sequencing) which allows the study of many genes in the one sequencing run. An example would be to study all known breast cancer related genes (around 20 genes in 2011) rather than the limited BRCA1 and BRCA2 genes to progress further down the path of personalized medicine. (2) Exome sequencing (all exons in the human genome) which has enabled the discovery of new genes for Mendelian disorders. The different DNA sequencing options should also be considered in the context of germline DNA versus somatic cell DNA, and how they were provided (research, diagnostic or direct-to-consumer).


sequence. Data are then ready for the customer who will have specific requirements for analysis, and (2) Analytic bioinformatics which is the next step in the process and dependent on the research aims.

[Figure 4.2 panels: (1) DNA/RNA, libraries, PCR; (2) generating DNA sequence; (3) production of sequence data; (4) analysis of sequence data.]

FIGURE 4.2 Four components to NG DNA sequencing. (1) DNA (or RNA) preparation steps, preparation of libraries and fragment amplification by PCR. These are time consuming and costly steps likely to be replaced by more direct access to DNA through single molecule sequencing in third generation platforms. (2) The DNA sequencing methodologies usually involve a stepwise chemical synthesis step. This represents a target for cheaper costs and improved efficiency in third generation platforms. Two bioinformatics steps follow: (3) Data processing and (4) Data analysis. These remain potential road blocks to the $1 000 genome having clinical utility because the cost will not be in the actual sequencing but the bioinformatics. As already noted, third generation platforms will reduce the sequencing costs but will complicate the bioinformatics because of the larger data sets (Tb to Pb) compared to Gb to Tb with NG DNA sequencing.

Approximately half the costs of NG sequencing are in the bioinformatics component, i.e. software costs and skilled scientists’ time. In most cases it is the bioinformatics that is limiting, as software available for conventional DNA sequencing, with its focus on long read lengths, does not work well with the shorter read lengths and large data sets generated with NG DNA sequencing. Thus, new software and algorithms are being developed.

NG DNA sequencing is evolving rapidly into third generation platforms, making a $1 000 (or cheaper) whole genome sequence possible in the not too distant future (Box 4.2). As the data generated in sequencing expand to Pb (petabyte), resources may become rationalized, with fewer but larger centralized laboratories performing the actual sequencing and analytic bioinformatics being conducted in-house. If this happens, more bioinformatics capacity will be required. Cloud computing may solve some of these issues, particularly storage, but there will be concerns around privacy and security, as the legal oversight will be dependent on where the computing facility is located.

Research Applications

One concern of the Human Genome Project was the bypassing of hypothesis-driven research, with block-buster type projects relying on a mass of data to produce something useful. NG DNA sequencing will further promote this approach. Nevertheless, impressive research findings have already emerged and it is increasingly difficult to criticize a strategy that may be the only way to answer difficult questions. NG DNA sequencing in medical research has been used for:

• Cataloging and understanding diversity in humans, animals and other organisms.
• Revisiting the pathogenesis of complex diseases.
• Replacing or improving GWAS.
• Providing an alternative approach to transcriptomics.
• Drug development through identification of novel targets.

Although discussion and interest tends to focus on whole genome sequencing, a related strategy that is often preferred is whole exome sequencing, because it is cheaper, easier to do and has smaller bioinformatics requirements. This approach captures only a small proportion of the genome (the protein-coding genes), which


BOX 4.2 ARCHON GENOMICS X PRIZE.

As well as the goal of a $1 000 whole genome sequence, another incentive for progress was announced in 2006. This was the Archon Genomics X Prize, worth $10 million, to be given to the team that could sequence:

1. 100 human diploid genomes;
2. In 10 days;
3. For $10 000 per genome;
4. With 1 error in every 10⁵ bases sequenced, and
5. The sequence must accurately cover at least 98% of each genome [3].

The X Prize Foundation is described as an educational non-profit organization, whose goal is to create radical breakthroughs for the benefit of humanity. The reasons given for selecting whole genome sequencing were to include in mutation detection a more comprehensive profile of an individual’s DNA mutations including those that might be missed because they are in regulatory regions or repetitive sequences, and to catalog mutations occurring exclusively in somatic cells. Getting a more comprehensive profile of an individual’s genomic makeup was expected to assist the pursuit of personalized medicine for pharmacogenetics and preventive medicine by screening for mutations before disease was established. Ultimately the personalized medicine approach would mean lower health costs.

As of late 2011, the Archon Genomics X Prize had not been won and the cost for sequencing a whole genome was considerably less than $10 000. Therefore, a new initiative was announced, revitalizing the Prize and making it more focused and relevant – with the $10 million reward remaining. Now the purpose was to sequence 100 human genomes from centenarians and so was dubbed as 100 over 100. Since centenarians represent a rare and extreme human model for studying aging, it is hoped that the whole genome sequencing approach might shed further insight into the genetic basis of aging, as well as providing an incentive for improved technology. It is interesting to compare the standards expected in 2006 with those in 2011 which required:

1. A whole, medical-grade, genome sequence;
2. 100 human haploid genomes;
3. Completed within 30 days (the longer time frame was considered necessary after consultation with industry);
4. Total cost of $1 000 per genome;
5. Accuracy of 1 error per 10⁶ bases, and
6. 98% completeness including identification of insertions, deletions and rearrangements.

The competition was scheduled to run over a month from 3 January to 3 February 2013.
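The two sets of prize criteria in Box 4.2 are easier to compare when the accuracy and cost targets are turned into absolute numbers. The short sketch below does that arithmetic. The haploid genome size of 3.2 billion bases is an assumed round figure that does not come from the text, and the function names are hypothetical.

```python
HAPLOID_GENOME_BASES = 3_200_000_000  # assumed round figure, not from the text


def expected_errors(bases_sequenced, bases_per_error):
    """Expected number of errors at a rate of one per `bases_per_error` bases."""
    return bases_sequenced / bases_per_error


def total_cost(n_genomes, cost_per_genome):
    """Sequencing budget for the whole set of genomes."""
    return n_genomes * cost_per_genome


# Targets as listed in Box 4.2 for the 2006 and 2011 versions of the prize.
CRITERIA = {
    "2006": {"bases_per_error": 10**5, "cost_per_genome": 10_000, "days": 10},
    "2011": {"bases_per_error": 10**6, "cost_per_genome": 1_000, "days": 30},
}

if __name__ == "__main__":
    for year, target in CRITERIA.items():
        errors = expected_errors(HAPLOID_GENOME_BASES, target["bases_per_error"])
        budget = total_cost(100, target["cost_per_genome"])
        print(f"{year}: about {errors:,.0f} tolerated errors per haploid genome, "
              f"${budget:,} for 100 genomes in {target['days']} days")
```

On these assumptions the revised criteria allow ten times fewer errors per genome at one tenth of the per-genome cost, while allowing three times as long to complete the work.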

is a limitation since it will miss regulatory sequences and copy number variations (CNVs). Whatever approach is used, the sensitivity and specificity of NG DNA sequencing are still being defined, particularly for clinical applications. This is made more difficult as new platforms continue to emerge on a regular basis. Quality assurance issues with sequencing are important in research and more so in clinical testing.

Clinical Applications

The many research applications of NG DNA sequencing had placed little pressure on industry to consider how this technology might be used in the clinic. However, in 2010 new platforms emerged, designed for the clinical diagnostic laboratory. Some applications for patient care include:

• Somatic cell cancer DNA testing.
• Targeted gene DNA testing.
• Diagnosis of difficult cases.
• Clinical screening of asymptomatic individuals.
• Reproductive screening.

Discussions of the role of NG DNA sequencing in clinical care will generate many different views. Some clinicians are emphatic that the technology should not be the driver, and that it is still too early to move this broad, catch-all sequencing strategy from research into the clinic. Others express the view that NG DNA sequencing has the potential to revolutionize the way medicine is practiced, particularly in terms of personalizing decision making. Two key points in the debate are technological/quality issues, and the way this type of sequencing is delivered in clinical care. Unsurprisingly, the direct-to-consumer market has taken on NG DNA sequencing and is moving forward with attractive offers underpinned by broad disclaimers, encouraging individuals to purchase their whole genome sequences (Chapter 5).

Overall, the technological aspects of NG DNA sequencing for clinical care are less of an issue, although concerns around quality remain to be addressed. There is a general view that results are not given to the patient until they are validated against the gold standard of Sanger sequencing or confirmed using a different NG DNA sequencing platform. Outstanding technological issues will be addressed as the analytic platforms evolve. Bioinformatic tools are important because they allow results to be filtered, so even if a whole genome or exome sequence is obtained, it is possible to remove or hide data, genes or segments of the genome that are irrelevant to the clinical problem under consideration. This becomes a form of targeted DNA sequencing and helps reduce the number of unwanted incidental findings that will invariably emerge with NG DNA sequencing.

In contrast to the above, there is less consensus on how NG DNA sequencing will be delivered as a clinical service. The first two of the five clinical applications described above are moving forward. Somatic cell DNA testing for cancer is being developed through work like that of the International Cancer Genome Consortium (Chapter 7). Although guided by various research protocols, the results obtained are being used, often on an ad hoc basis, for decisions on patient care. The second application, involving targeted sequencing, is also progressing. For this, a number of genes relevant to a patient's clinical disorder can be studied simultaneously, rather than being sequenced separately. This could potentially be affordable (and so improve access), have a reduced turnaround time and give a better overview of the health problem.

An example of targeted NG DNA sequencing is an individual with hypercholesterolemia, who could have the LDLR and other genes involved in lipid metabolism sequenced to confirm the diagnosis of familial hypercholesterolemia as the underlying cause of elevated cholesterol levels. The DNA mutation can then be used for testing asymptomatic family members (predictive DNA testing). Sequencing applications can be taken further if the patient is treated with a cholesterol lowering agent such as a statin, since it becomes possible to check for the presence of genes with pharmacogenetic relevance (Table 3.8). This comprehensive DNA-based care could be undertaken by cloning or amplifying by PCR the target genes and then NG DNA sequencing. Alternatively, a whole genome or exome strategy can be followed, filtering out what is not needed.

NG DNA sequencing for diagnosing a difficult clinical problem is acceptable, particularly if there is a significant health risk and
conventional approaches have failed to find the cause. Some examples of this approach are starting to emerge, which illustrate how this technology can be life saving (see Overview below). What is certain is that as more sequencing is done, more variants of unknown significance (VUS, Chapter 3) will be found, and these will place an increasing burden on the laboratory and the clinician. The patient and family may be given a list of DNA changes that are yet to be classified in terms of an illness, and, more problematically, DNA mutations that are associated with known diseases. Thus, germline NG DNA sequencing to screen healthy individuals is potentially a concern because of the likelihood that variants with pathogenic potential will be found incidentally. Some of the earliest whole genome sequences of high profile individuals, including the Nobel Laureate James Watson and the genomics researcher J Craig Venter, have already demonstrated that each individual can have over 100 of these changes, including some that are purported to be lethal, with no apparent effects on health. Which sequence changes have real consequences, and which are artifacts of the technology, remain to be determined.

Case Study

Some insight into how personalized medicine will be developed through whole genome sequencing is starting to emerge. An example of this is a clinical risk assessment based on a 40 year old male's family history and his whole genome sequence [4]. Genomic risk factors were estimated from:

1. Variants in genes causing Mendelian genetic disorders;
2. Novel mutations detected during the study;
3. Variants implicated in genes influencing drug metabolism, i.e. pharmacogenetic tests, and
4. SNPs associated with complex genetic disorders.

This study highlights a new paradigm for medical care based on comprehensive but as yet incomplete knowledge. In particular, the use of SNPs in complex genetic disease is contentious, as many findings come from association studies and so the clinical utility for an individual's health is difficult to assess. It is sobering but not surprising to note that one variant found in this patient was reported to cause late onset hypertrophic cardiomyopathy, and was subsequently shown to be a benign polymorphism. This emphasizes the need for: (1) Mutation databases and their careful and methodical curation to ensure that data entered are correct, and (2) A more rigorous approach when analyzing variants if only in silico data are used.

The way in which the results were given to the patient is also noteworthy. Included were three tables with a long list of genes and variants associated with disease but with different degrees of significance, using headings such as unknown importance or potentially important. The results were depicted in a complex conditional dependency diagram, highlighting risks of various diseases that had at least a 10% post-test risk probability. A finding that should be followed with further evaluation to assess clinical utility was the comment that "... 63 clinically relevant previously described pharmacogenetic variants ... in genes that are important for drug response ..." [4].

New Clinical Paradigm?

Will NG DNA sequencing change the way DNA genetic testing is undertaken in the clinic? Presently DNA sequencing using the traditional Sanger approach to detect mutations in the BRCA1 and BRCA2 genes costs over $2 000. Yet, the goal for NG DNA sequencing is to sequence the whole genome for around $1 000 (and exome sequencing already costs less than this). An obvious goal would be a once-in-a-lifetime whole genome sequence with the data stored in an electronic health record. Appropriate filtering then allows relevant genes to be interrogated each time they might provide useful clinical information, e.g. prior to taking medication, or in testing for health issues like diabetes,
heart disease and, in an aging community, dementia. The same DNA sequence might also help at the time of death as a component of the traditional postmortem (Chapter 9). Since a whole genome sequence need only be done once to look for germline mutations, it would be very cost effective compared to the current piecemeal approach that relies on sequencing single genes. The economic benefits will not be missed by those holding the health dollars.

Some issues that will influence how effectively NG DNA sequencing progresses into the clinic include:

1. The accuracy of NG DNA sequencing compared to the gold standard Sanger sequencing. A UK study, describing exome sequencing to detect mutations in TP53, BRCA1 and BRCA2 genes in breast cancer, suggests that overall reagent costs and analysis times were reduced, and the sensitivity and specificity of Sanger sequencing could be achieved by obtaining 50× coverage with NG DNA sequencing [5];
2. How to ensure secure storage of the large data sets generated and the protection of privacy? Fortunately, various professional organizations have already started to deal with the relevant ELSI (Chapter 10). Perhaps only a temporary solution is needed if costs fall below $1 000 (and it has been suggested that they could fall as low as $100), since storage and privacy issues might be addressed by repeating the whole genome sequence each time it is needed;
3. The best way to evaluate the clinical utility of this approach. It has been proposed that only NG DNA sequencing that leads to actionable clinical decisions should be undertaken. This is sensible although it will depend on how actionable is defined, and
4. Educational and workforce issues need to be addressed, in particular the training of scientists and clinicians in the interpretation of DNA variants.

DNA MICROARRAYS

This section deals with the transcriptome and ways in which it may be studied. DNA microarrays (DNA chips) are 2D grids containing ordered high density arrangements of nucleic acid spots. Each spot (from ~10^2 up to 10^6 spots in any one array) represents a DNA probe that is attached to an inert surface such as a glass slide or a silicon wafer. Target DNA or cDNA can be hybridized to the probes. Microarrays allow a snapshot to be taken of gene or cellular activity in the cell. They also provide a composite picture of multiple DNA markers such as SNPs or CNVs. This information can be compared between controls (normal cells, tissue or study cohorts) and patients, to identify significant differences. High throughput screening of gene expression can reveal molecular signatures of what is occurring at the cellular level. This knowledge can be exploited in the clinic for diagnostic purposes, or in research to understand disease initiation and progression.

Technology

Microarray probes are either double-stranded (ds) DNA or oligonucleotides. The probes can be printed using similar technology to ink jet printers. dsDNA probes are larger than oligonucleotide ones and so have higher sensitivity, although the specificity may be lower. Since oligonucleotide probes are smaller, they allow a larger number of spots per microarray. Printed microarrays can be developed in-house for particular purposes, and the array density is typically around 10 000 to 30 000 [6]. Commercially available in situ synthesized microarrays allow a much higher density of spots (around a million) because the oligonucleotide probes are synthesized directly onto the surface of the microarray. An example of this is the Affymetrix Genome Wide Human SNP Array 6.0 that has 1.8 × 10^6 spots (genetic
markers) for detecting SNPs and CNVs. The costs of commercial microarrays are falling, but the trade-off is that they cannot always be individualized for particular experiments. Importantly, only known genes, variants or mRNA species are detectable.

Target DNA that is hybridized to the microarray can be labelled with fluorescein which allows multiple colors to be detected with a laser. Microarrays can be studied in different cells or tissues, and comparisons in terms of gene expression are made. An accepted cut off for gene expression in microarrays is greater than two-fold (this means an up-regulated gene) or less than 0.5 fold (that is, a down-regulated gene). It should be noted that expression microarrays are only screens. They identify likely changes in the transcriptome. Results need to be confirmed by more specific measures, such as real time Q-PCR (Table 3.3).

New bioinformatics tools were required to address the needs of microarrays. These included the design of probes for the hybridization conditions required, and the analysis of complex data sets. Analysis includes the comparison of the various hybridization signals to ensure quality and consistency between experiments, as well as to assess inter-laboratory variability. The ability to assess the intensity of the signal generated is basic to determining whether a gene is up or down regulated. Additional flexibility became possible when multiple colors were used in labeling genes. There are different types of gene microarrays allowing measurement of: (1) Gene expression; (2) DNA marker profiles, and (3) Detection of CNVs.

Gene Expression

The expression microarray that allows the transcriptome (all the RNA species in a given cell) to be studied and compared with the transcriptome in another cell has proven to be successful in both research and clinical service. In this type of analysis it is possible to measure any number of mRNA species. For example, what is the difference at the genomic level between a cell line that is growing normally and the same cell line that has become cancerous? A way in which to make this comparison is to hybridize the mRNAs from the two different cell lines against a microarray which has genes of relevance to carcinogenesis (Figure 4.3). Differences in expression might help explain the biology of tumors, or detect tumor-specific targets for better diagnostics or new drug development. Commercially produced microarrays are now available covering a wide range of genes (TP53, CYP450) or genetic pathways (apoptosis) or organisms (E. coli gene array).

More objective predictors to guide treatment and prognosis would be invaluable for managing many diseases, particularly cancers. An example of what might be possible is the clinically-based microarray test MammaPrint® approved by the FDA for breast cancer diagnosis (Box 4.3). The MammaPrint® test also highlights a number of problems:

1. Costs must be reasonable to allow greater access for patients;
2. Work practices must change to ensure availability of fresh tissues (to isolate mRNA) rather than the traditional formalin preserved material or blocks;
3. Clinical utility needs to be evaluated, and
4. Regulators must decide what is the appropriate oversight mechanism for this type of test.

In clinical medicine, microarrays might be useful for:

• Diagnostic confirmation and disease classification.
• Personalized treatment selection through analysis of the individual's germline DNA and somatic cell DNA in tumor tissue.
• Better prognostic indicators derived from tumor DNA.
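The two-fold and 0.5 fold cut offs quoted for expression microarrays can be illustrated with a few lines of code. This is a minimal sketch, assuming background-corrected signal intensities are already available for the tumor and normal channels; the gene names and intensity values are hypothetical, and real analyses add normalization and statistical testing that are not shown here.

```python
def fold_change(tumor_signal: float, normal_signal: float) -> float:
    """Ratio of tumor to normal signal intensity for one probe."""
    if normal_signal <= 0:
        raise ValueError("normal_signal must be positive")
    return tumor_signal / normal_signal


def classify(fc: float, up: float = 2.0, down: float = 0.5) -> str:
    """Apply the cut offs: above 2-fold is up-regulated, below 0.5 fold is down-regulated."""
    if fc > up:
        return "up-regulated"
    if fc < down:
        return "down-regulated"
    return "unchanged"


# Hypothetical (tumor, normal) intensities for three illustrative probes.
signals = {
    "GENE_A": (1800.0, 400.0),
    "GENE_B": (150.0, 900.0),
    "GENE_C": (510.0, 500.0),
}

calls = {gene: classify(fold_change(t, n)) for gene, (t, n) in signals.items()}
for gene, call in calls.items():
    print(gene, call)
```

Genes flagged in this way are only candidates; as noted above, they would still need confirmation by a more specific method such as real time Q-PCR.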

FIGURE 4.3 Comparing gene expression in normal versus cancer tissue with a DNA microarray. A microarray can identify important genes in a cancer tissue. Total mRNA from both normal and cancer tissue is made into cDNA. The normal tissue cDNAs are labeled with a green dye (Cy3) and the cancer tissue cDNAs with Cy5 (red color). The cDNAs are mixed in equal proportions, and hybridized to the microarray which has spotted onto it DNA probes for genes with relevance to cancer. Following hybridization, the excess cDNAs are washed off, and the microarray plate is scanned with a laser to detect four possible color changes: (1) Red – cancer tissue genes; (2) Green – normal tissue genes; (3) Yellow – genes from both cancer and normal tissue are expressing because red + green = yellow, and (4) Black – no marked genes are expressing. Using appropriate software and the results from control DNA samples, it is possible to identify the intensity of each red and green color to estimate the level of the gene being expressed as well as the global gene expression profiles. [Steps shown in the figure: normal mRNA and tumor mRNA; RT-PCR; cDNA-Cy3 and cDNA-Cy5; mix equal portions; hybridize to microarray; scan with laser; analyze with bioinformatics tools.]

BOX 4.3 PERSONALIZING TREATMENT THROUGH MICROARRAYS.

An example of how microarray-based tests might impact clinical decision-making is illustrated in research findings first published in 2002. This work was initiated because breast cancer patients with the same disease staging have different outcomes and survival rates. Conventional prognostic indicators rely on lymph node status, histological grade and immunophenotyping of the tumor. Treatment options for early stage breast cancer after the tumor is removed vary from doing nothing to adjuvant chemotherapy or anti-estrogen agents such as Tamoxifen, both of which have significant side effects. It is difficult for patients to decide what to do, particularly when it is known that a large number of women will not relapse. In developing a microarray for breast cancer, the researchers at the Netherlands Cancer Institute in Amsterdam first took mRNA from 78 primary breast tumors obtained from women under 55 years old, who were lymph node negative. Of these, 34 patients subsequently developed metastases within five years, and 44 remained disease free after five years. mRNA from the tumors was initially hybridized against 25 000 human genes. It was shown that prognostic information was captured predominantly by 70 genes whose biological function spanned many potential pathways in breast cancer development, including cell cycle, DNA replication, growth, proliferation, transformation and apoptosis. The 70 genes were spotted onto another microarray and make up the MammaPrint® test. The RNA profile was considered a more powerful predictor of outcome than standard measures and has been approved by the FDA (more on this in Chapter 7). Clinical trials are now underway to determine the test's clinical utility. One study is MINDACT, which started in 2007 and closed in mid 2011 when it had recruited over 6 000 patients. Validation data are eagerly awaited for the claim that the tumor's microarray profile can predict early stage breast cancer patients who will do well without chemotherapy [7]. The test requires fresh tumor tissue from which to extract mRNA. This does not fit into the traditional work flow which utilizes paraffin embedded DNA, so significant clinical benefits will need to be demonstrated before changes in practice result. The same company that produced the above breast cancer genomic screen is working up a similar one for colon cancer. This is ColoPrint® and involves an 18 gene signature. It is targeted to stage II cancers, where following resection of the tumor there is uncertainty about the value of adjuvant chemotherapy (a position similar to early stage breast cancer) as many patients are cured by surgery alone.

SNP Microarray

As discussed previously (Chapter 2), genome wide association studies (GWAS) have significantly advanced the potential to detect genetic markers or genes implicated in complex genetic disorders. This was possible because:

1. Larger cohorts were tested;
2. The genome could be divided into haplotype blocks thereby needing fewer SNPs, and
3. Multiplexing SNPs became easier and cheaper with microarrays.

The Affymetrix SNP array was mentioned earlier. It was developed to enable applicability across many populations. It contains coverage redundancy to optimize the detection rate, as it is difficult to ensure uniform hybridization conditions across all SNP probes. Alternative products are bead arrays such as Illumina's BeadChips. These can be customized for a particular need or bought off the shelf; for example, there is a panel that contains SNPs from 400 genes implicated in cancer. The Illumina company has also introduced flexibility in its analytic platforms, allowing both NG DNA sequencing and SNP genotyping to be undertaken with the same instrument. Apart from SNP detection, the commercial arrays enable CNVs to be detected.

An obvious clinical application for microarrays is mutation detection, as this would allow known mutations to be printed on a chip. A number have been produced, such as the Roche AmpliChip® CYP450 for drugs metabolized by CYP2D6 and CYP2C19. Despite its attractiveness, the microarray approach to DNA genetic testing has not been popular, perhaps because of the costs of chips, and because methods based on hybridization are not ideal for the close to 100% detection rate needed in clinical work compared to a lesser requirement in research. Another important consideration is that the underlying mutations for most genetic diseases are very heterogeneous, with ones specific to families often predominating. These private mutations would not be detected through microarrays that include only known mutations. For detecting novel mutations DNA sequencing is needed. Therefore, it is likely in the longer term that increasingly lower costs for DNA sequencing will mean many of the microarray-based applications are replaced by NG DNA sequencing.

Array-Based Comparative Genomic Hybridization (aCGH)

In earlier editions of Molecular Medicine, there was discussion about a new development in cytogenetics called FISH (Fluorescence In Situ Hybridization). FISH utilized DNA probes that
hybridized to metaphase or interphase nuclei and allowed chromosomal location as well as gene copy number to be detected. Cytogenetic-based techniques were able to detect chromosomal abnormalities at the 5–10 Mb level of resolution but dividing cells were necessary. FISH could detect deletions and duplications not previously seen with cytogenetics at a resolution around 2 Mb for metaphase FISH and even better for interphase FISH. However, FISH was technically demanding, required special equipment and was limited to chromosomal regions detected by the DNA probes. FISH is still useful but it is likely to be replaced by aCGH (also called chromosomal microarray or molecular karyotyping).

aCGH uses DNA rather than chromosomal preparations. aCGH probes (oligonucleotides or cloned segments of DNA) are tiled on microscope slides and hybridized against patient and control DNA (Figure 4.4). aCGH kits are commercially available and provide various levels of cover across the genome depending on the number of probes used. aCGH is attractive for clinical practice because of:

1. Ease of use;
2. Higher detection rate;
3. Faster turnaround time, and
4. Automation.

aCGH is useful when investigating possible chromosomal imbalances or CNVs leading to birth defects and developmental disorders, including intellectual impairment. It is considered by some to be the first tier diagnostic test in these circumstances [8]. This approach is proving popular in prenatal testing and mutation detection when CNV is the underlying abnormality. Nevertheless, some problems with aCGH need resolution including:

1. The significance of some CNVs detected, which is comparable to DNA variants of unknown significance. Centralized databases including the scientific literature help here, as does the study of parents to determine if changes found are de novo or inherited. In the USA and Europe there are emerging clinical and laboratory practice guidelines to address this issue [8];
2. Quality assurance. This is being resolved as home-brew kits are replaced by commercial ones, and
3. Evaluation for clinical utility.

A critical step in the development of aCGH is evaluation. Challenges ahead are illustrated in a 2009 health technology report on aCGH used for patients with developmental delay/mental retardation or autism spectrum disorder [9]. Two quotes from this report are noteworthy:

"The results of neither conventional cytogenetic evaluation nor aCGH evaluation have been systematically studied for impact on patient outcomes other than diagnostic yield, which is an intermediate outcome. Impact of testing on the kinds of outcomes that matter to the patient and family has been directly addressed in very few studies. Thus, it is not possible to draw evidence-based conclusions regarding the clinical utility of aCGH genetic evaluation. The same may also be said of conventional cytogenetic evaluation."

"Expert consensus and clinical guidelines state that genetic information is of value because it establishes a causal explanation that is helpful to families. It is suggested that such genetic information avoids additional consultations and various types of diagnostic tests, assists with early and improved access to community services that may ameliorate or improve behavioral and cognitive outcomes, provides estimates of recurrence rates to better guide reproductive decision-making, and enables an understanding of prognosis and future needs. However, little evidence supports these outcomes."

Although only DNA microarrays have been described, there are microarrays for proteins, carbohydrates and other potential biomarkers. As well as 2D microarrays, it is possible to have 3D suspension arrays.

FIGURE 4.4 Array-based Comparative Genomic Hybridization (aCGH). An example of a duplication and deletion on chromosome 16p. The patient's DNA is labeled with green fluorescent dye and the normal control DNA has a red dye. The two DNA samples are allowed to hybridize onto slides coated with DNA probes, usually oligonucleotides. Probes can represent regions in the genome known to have CNVs causing disease, or there can be probes scattered across the whole genome. Different aCGHs are available depending on what is needed. Where there is no quantitative difference between the patient and the control, both green and red colors will appear around the baseline (center of figure; 0 along the top axis, which runs from -4 to +4). Where there are duplications/deletions in the patient's DNA the green/red will predominate. Top: The green intensity is about +0.5 while red is -0.5, i.e. there is an excess of green which indicates a duplication at the site of these probes. Bottom: A relative deficiency of green, which is around -1 (patient) and +1 for control (red) DNA, means a deletion at this locus. aCGH provided by Dr Melody Caramins, South East Area Laboratory Services, Prince of Wales Hospital, Sydney, Australia.

BIOINFORMATICS

Bioinformatics describes the application of computational tools and analysis to capture,
store and interpret biological data. It intersects a number of disciplines, including biology, medicine, computer science, information technology and mathematics. There are many related terms used interchangeably with bioinformatics, including informatics, computational biology, medical informatics, eHealth and health information technology. In this chapter bioinformatics will be used as a broad descriptor. A new term has emerged, in silico (computer based) analysis, which complements the more traditional in vivo and in vitro approaches to study gene function.

In modern biological research, bioinformatics is essential for managing and analyzing data. The computer also increasingly impacts on medical practice, through the availability of sophisticated databases, accessible to patients, the community and health professionals over the Internet. Computers can potentially assist in clinical decision making. The importance of bioinformatics has closely paralleled the growth of molecular medicine and the recent evolution of omics. As the omics analytical platforms have become more automated, the role and input of the laboratory scientist or pathologist is diminishing, and the role of the bioinformatician is growing, as well as becoming a limitation to progress. For the full translation of molecular medicine discoveries into clinical healthcare delivery it will be necessary to build a sophisticated bioinformatics infrastructure, while at the same time ensuring that health professionals and the community are sufficiently educated to utilize these resources.

Research Applications

Two key catalysts for major developments in bioinformatics were the Internet's arrival [10], and the Human Genome Project (Chapter 1). The importance of bioinformatics in molecular medicine became apparent in the 1980s, when DNA sequencing data began to accumulate. These data had to be stored, and the traditional paper methods were inadequate for the amount generated. The solution was to deposit the sequences electronically into various databases such as GenBank and EMBL. Information about proteins was placed in databases including PIR (Protein Information Resource) and PDB (Protein Data Bank) (Table 4.3).

As well as expanding the storage capacity through better computer hardware, new software programs were required to analyze the data. Since protein-coding genes occupy only a small proportion (1–2%) of the total genome, and are discontinuous with exons interspersed within introns, an initial focus for bioinformatics was predicting the location of protein-coding regions in the genome [11]. Another was the analysis of DNA sequence from newly discovered genes, to predict their function. For this, the DNA sequence was compared with other sequences to look for homology (similarity). Software programs, such as FASTA (abbreviation for Fast – all), allowed comparisons with other sequences in the databases. Finding some homology to another gene would help in trying to understand function. Finding no homology made it more problematic for the researcher to predict function.

As the Human Genome Project progressed, an increasing number of model organisms and plants were sequenced and compared through bioinformatic in silico approaches. More sophisticated software had to be developed to cope with the increasing complexity in data analysis. A program called BLASTN (Basic Local Alignment Search Tool Nucleic acid) provided more rapid and better information about DNA sequences and gene characterization.

Further challenges have emerged, as studies of gene expression generated data from potentially thousands of genes using microarrays. The earlier requirement for bioinformatics to provide understanding of relatively straightforward one-dimensional objects such as a DNA sequence has changed significantly, to cope with information related to networks and the relationship between genes (systems biology).
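The idea of comparing a new DNA sequence against database sequences to look for homology can be illustrated with a small local alignment routine. The sketch below uses the Smith-Waterman dynamic programming method, which is not how FASTA or BLASTN work internally (both rely on faster heuristics); it is offered only to show what a similarity score between two sequences means. The scoring values, sequence names and sequences are all hypothetical.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical query and miniature "database" of sequences.
query = "GATTACA"
database = {
    "seq1": "CCGATTACATT",  # contains the query exactly
    "seq2": "TTTTTTTT",     # shares only a short run of T
    "seq3": "GATTTACA",     # similar to the query, with one extra base
}

scores = {name: local_alignment_score(query, seq) for name, seq in database.items()}
best_hit = max(scores, key=scores.get)
print(scores, "best hit:", best_hit)
```

A high score relative to the query length suggests homology worth following up, while a uniformly low score corresponds to the "no homology" situation described above, in which function is harder to predict.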

MOLECULAR MEDICINE 132 4. Omics

TABLE 4.3 some useful clinical laboratory or research bioinformatics sites. Note: All web-based references accessed on 16 Feb 2012.

| Name | URL and Comments |
|---|---|
| NCBI (National Center for Biotechnology Information) | www.ncbi.nlm.nih.gov Repository for many bioinformatics tools and databases including GenBank; RefSeq; Entrez; BLAST; FASTA; dbSNP; dbGaP; PubMed; OMIM; peptidome; DCODE. |
| EMBL nucleotide sequence databases | www.ebi.ac.uk/embl/ Europe’s primary DNA, RNA nucleotide sequence resource. Data are exchanged on a daily basis with two other similar databases (see GenBank, DDBJ). |
| DDBJ – DNA databank of Japan | www.ddbj.nig.ac.jp/ DDBJ is a member of the International Nucleotide Sequence Databases developed and maintained collaboratively between DDBJ, EMBL and GenBank for over 18 years. These three databases are synchronized and so contain the same data but differ in the way the data are displayed. |
| Ensembl | www.ensembl.org/index.html Joint UK, European Bioinformatics Institute (EBI) initiative. This database has many complete and up to date annotated entries on selected eukaryotic genomes. |
| UCSC Genome Bioinformatics | http://genome.ucsc.edu/ A commonly used genome browser. |
| UniProt | www.uniprot.org/ A curated protein sequence database providing a high level of annotation (e.g. description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases. |
| Protein Data Bank (PDB) | www.pdb.org/pdb/ Contains information about experimentally determined structures of proteins, nucleic acids and complex assemblies. |
| PIR – Protein Information Resource | http://pir.georgetown.edu/ Integrated protein informatics resource for genomic, proteomic and systems biology research. |
| International Society for Computational Biology | www.iscb.org/ Involved in policy, giving members access to publications and meetings, and functions as a portal for information on training, education and employment. |
| International HapMap Project | http://hapmap.ncbi.nlm.nih.gov/ A resource to find genes associated with human disease and pharmacogenetics. |
| Database of genomic variants | http://projects.tcag.ca/variation/ A curated catalog of structural variation in the human genome. |
| Vega (Vertebrate Genome Annotation) | http://vega.sanger.ac.uk/ A central repository for high quality annotations of vertebrate finished genome sequence. |
| 1 000 Genomes | www.1000genomes.org/ A comprehensive catalog of human genetic variation. |
| Rfam | www.sanger.ac.uk/resources/databases/rfam.html Information about RNA families. |
| Pfam | www.sanger.ac.uk/resources/databases/pfam.html Information on classifying proteins. |
| miRBase | www.mirbase.org/ Searchable database of published miRNA sequences and annotation. |
| KEGG (Kyoto Encyclopedia of Genes and Genomes Databases) | www.genome.jp/kegg/ Contains descriptions of cellular pathways, e.g. metabolic pathways and disease related pathways. |
| COSMIC | www.sanger.ac.uk/genetics/CGP/cosmic/ A catalog of somatic mutations in cancer. |

(Continued)


TABLE 4.3 (Continued)

| Name | URL and Comments |
|---|---|
| Human Microbiome Project | http://commonfund.nih.gov/hmp/ NIH sponsored program to characterize the microbiota in different sites of the human body in both health and disease. |
| Zebra fish model organism database | http://zfin.org/cgi-bin/webdriver?MIval=aa-ZDB_home.apg Provides access to a variety of resources for those working with this model animal. |
| FlyBase | http://flybase.org/ A database of Drosophila genes and genomes. |
| Caenorhabditis Genome | www.sanger.ac.uk/Projects/C_elegans/ A database of Caenorhabditis genome sequencing projects. |
| GeneCards | www.genecards.org/ Searchable, integrated database of human genes that provides concise genomic, transcriptomic, genetic, proteomic, functional and disease related information on all known and predicted human genes. |
| Cytochrome P450 home page | www.cypalleles.ki.se Useful site to observe the considerable heterogeneity with DNA changes in the P450 genes. |
| International Cancer Genome Consortium | www.icgc.org/ International project to map 50 different tumors that have clinical and societal importance across the globe. |
| Human Genome Variation Society | www.hgvs.org/mutnomen/ Determines the official nomenclature for describing DNA variants and mutations. |
| Mutation Surveyor | www.softgenetics.com/mutationSurveyor.html Allows changes in a DNA sequence to be detected by comparing to a reference sequence. |
| Alamut | http://www.interactive-biosoftware.com/software/alamut/overview Provides useful algorithms to interrogate DNA sequence changes and highlights relevant literature as well as recommended HGVS nomenclature. |
| Human Gene Mutation Database | www.hgmd.cf.ac.uk/ac/index.php International database containing genetic mutations across a wide range of diseases. This is available free, with a professional version containing a larger number of entries also accessible but for a subscription fee. |
| Human Variome Project | www.humanvariomeproject.org/ Goal of this international project is to capture and catalog all human genetic variations which are country specific or gene/disease specific. |

A quantum leap in bioinformatics computing power as well as analytic software became necessary with the emergence of NG DNA sequencing.

Hardware Developments

A number of adaptations have been made to meet the hardware (computer power) challenge of bioinformatics. One was the development of computer grids by linking computer and database resources across widely distributed scientific communities. With this type of computer power, homology searches that used to take days to weeks to complete could be finished in seconds to hours. Supercomputers are computers with the fastest calculation speeds, and are at the frontline of processing capacity. Some supercomputers can reach high speeds because they have been designed for one purpose. Presently, a major limitation to increasing computational speed is secondary heating. This remains a challenge for the computing industry. Cloud computing describes Internet-based computing using shared resources and software. Access is available on demand and payment is made to cover the capital expenditure (hardware, software) and services.


Clinical Applications

In a rapidly moving field such as genomics, information needs to be regularly updated. The bulk of data now being generated means the Internet is the only route for accessing databases and linking relevant information to publications in journals that provide the health practitioner (and often patients and families) with up to date and comprehensive information. In terms of genetic disorders, one of the most extensive and useful databases is OMIM – Online Mendelian Inheritance in Man, which is regularly updated. For each clinical condition described, it provides links with relevant publications as well as the related DNA or protein data, and comes in a historically formatted summary. This and other useful databases are listed in Tables 4.3 and 4.4.

In Silico Analysis of DNA Variants

An example of how bioinformatics and molecular medicine have impacted on the delivery of clinical genetic services is the use of sophisticated software to interrogate DNA sequence data. There are three key resources available to laboratory health professionals to assess the clinical significance of DNA variants:

1. DNA mutation databases and the scientific literature;
2. In silico approaches utilizing software, and
3. In vitro or in vivo experimentation.

As already noted in Chapter 3, the increasing volume of DNA sequencing data that are now being generated makes it impractical to undertake the third option, and so in silico analysis, coupled with information derived from DNA mutation databases and the literature, becomes the default approach. DNA mutation databases have also proved to be key resources for depositing new DNA mutations, as these are no longer accepted for publication in journals. An example of such a database is the Human Gene Mutation Database (Table 4.3).

Although DNA mutation databases are important resources, they are also a trap for the inexperienced because variants in databases are not necessarily true mutations and each has to be judged carefully based on the evidence provided. Particularly difficult to evaluate are variants involving intronic changes that are potential splicing mutations. Missense changes can be interrogated using a number of well-established software algorithms that consider conservation, homology and the potential for altering protein structure or conformation. Nevertheless, even these can be difficult to confirm as true mutations. Ultimately, this uncertainty has to be transmitted in the genetic counseling. Effective interactions between the laboratory health professional and the clinical health professional are essential to ensure results of DNA tests are fully understood by the patient (and their family members). As will be highlighted in Chapter 5, the direct-to-consumer model for DNA testing bypasses this link. Examples of some software programs used to gauge the clinical significance of DNA variants are given in Box 4.4 and an overview of what steps are necessary to evaluate the significance of a DNA variant is found in [12].

eHealth

There are many components of eHealth, including electronic health records (EHR), decision support systems, eConsulting and telemedicine [13,14]. Computers now comprise an integral component of most clinical practices. Electronically recorded patient information provides the start for computer-generated prescriptions that have many benefits, including links to software that highlights risks from drugs or drug-drug combinations. Just as bioinformatics is playing an increasingly important role in the research applications of molecular medicine, so will eHealth initiatives set the pace for the translation of molecular medicine into clinical practice.


BOX 4.4 IN SILICO SOFTWARE AND DNA SEQUENCING.

Using various software programs, DNA variants can be interrogated in silico to assist in their detection and interpretation. One example is Mutation Surveyor®, which can identify where variants are present in Sanger sequencing. The claimed detection sensitivity is 5% of the primary peak, and an accuracy of 99% (when used to analyze both the forward and reverse sequencing strands). The software compares the patient’s DNA sequence with a reference one, then identifies changes, producing various characteristics including quality scores. The latter is important because no software (and the same would apply to the naked eye) is infallible, and poor quality sequences and/or artifacts such as dye blobs can lead to errors. For this reason laboratory staff will always visually confirm any changes reported. For DNA sequencing, the quality score is called Phred, and is based on parameters taken from the DNA sequence peak shape and resolution. Because it is a logarithmic scale, a Phred quality score of 10 implies the base call accuracy to be around 90% while a score of 20 means 99% accuracy.

Once a variant is identified, it is assessed for function, which can be aided by Alamut, a decision support program. This software takes the variant and compares it to other databases including Ensembl, UCSC Genome Bioinformatics, Swiss-Prot, dbSNP and PubMed. In terms of missense changes, the software considers conservation of the nucleotide and amino acid across many species during evolution; physicochemical differences between the wild type amino acid and the mutated one; and whether the change occurs in a protein domain. On the basis of this, it makes a prediction about likely pathogenicity of the variant. Although this type of software has helped in the interpretation of DNA variants, it is still ultimately the responsibility of the laboratory scientist or pathologist to make the final call on the DNA variant’s significance. This is not always an easy task, and increasingly the laboratory DNA sequencing component progresses rapidly while the assessment of the result becomes the limitation in turnaround time. It is sobering to note that a formal health technology assessment of Alamut came up with positive recommendations about its clinical utility, but noted that mistakes will occur if the primary information in the databases interrogated by this software is not correct, or is written in a confusing format. Links to the above software programs may be found in Table 4.3.
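The logarithmic Phred scale mentioned in Box 4.4 can be made concrete with a short calculation. The sketch below assumes the standard definition Q = -10 log10(P), where P is the probability that a base call is wrong; the function names are illustrative and are not taken from any sequencing package.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Base call accuracy implied by a Phred score q."""
    return 1.0 - phred_to_error_probability(q)


def error_probability_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Each step of 10 on the Phred scale is a tenfold drop in error rate.
    for q in (10, 20, 30):
        print(f"Phred {q}: base call accuracy {phred_to_accuracy(q):.1%}")
```

Scores of 10 and 20 reproduce the 90% and 99% accuracies quoted in the box, and a score of 30 corresponds to one expected error in 1 000 base calls.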

Software programs facilitate drawing pedigrees and obtaining family history (Table 4.4). More relevant for taking genomic discoveries into the clinic will be the availability of computer generated algorithms for decision making. The provision of in silico tools to make clinical practice easier is just as important as formal educational activities. For example, the National Cancer Institute in the USA has developed an Internet-based program which allows the physician or counselor to input clinical information relevant to breast cancer risk including family history and previous breast pathology. This information is then returned to the health professional in the form of a relative risk. The program also has succinct information about various options available for the at-risk patient (Table 4.4). Some current and future applications of eHealth are described in [13,14].

Professional genetic counseling services are faced with increasing demands and more complex clinical scenarios. This trend will continue as new genes and genetic risks are defined in the complex genetic disorders (Chapter 2). The


TABLE 4.4 Some clinically-relevant resources available online. Note: All web-based references accessed on 16 Feb 2012.

| Name | URL and Comments |
|---|---|
| Online Mendelian Inheritance in Man (OMIM) | www.ncbi.nlm.nih.gov/omim A must for any clinician dealing with genetic diseases. Reputable and regularly updated. Links to DNA and protein information and databases. |
| National Cancer Institute’s (NCI) Information Service | www.cancer.gov/aboutnci/cis/page1/print?page=&keyword Evidence based summaries providing genetic basis for various cancers. |
| NCI’s Breast cancer risk assessment tool | www.cancer.gov/bcrisktool/ Interactive tool for health professionals to measure a woman’s risk of invasive breast cancer. |
| Canadian Diabetes Association Website | www.diabetes.ca Wide ranging information for patients and health professionals dealing with diabetes. |
| National Center for Biotechnology Information (NCBI) | www.ncbi.nlm.nih.gov/About/primer A science primer providing useful summaries of many topics in genomics. NCBI also hosts PubMed and OMIM. |
| NCBI’s GeneClinics | www.ncbi.nlm.nih.gov/sites/GeneTests/?db=GeneTests Provides information for diagnosis, management and counseling for genetic disorders. |
| Pharmacogenomics Knowledge Base (PharmGKB) | www.pharmgkb.org/ Comprehensive database of information about pharmacogenetics/pharmacogenomics including a list of drugs with genetic information available. |
| Gene Therapy Clinical Trial Site | www.wiley.co.uk/genmed/clinical/ Lists gene therapy studies undertaken worldwide. |
| Internet genetic counseling service | www.informeddna.com/ Advertises through the Internet for genetic counseling to be delivered by telephone. |
| DECIPHER | http://decipher.sanger.ac.uk/ A database of phenotypes associated with genetic disorders caused by chromosomal abnormalities. This is increasingly a challenge as techniques such as aCGH identify many new submicroscopic changes. |
| BioInform – Genomeweb | www.genomeweb.com/newsletter/bioinform/ Started as a bioinformatics news service but now deals with broader issues. |
| Medline Plus® | www.nlm.nih.gov/medlineplus/ency/article/001657.htm Health information from the US National Library of Medicine and NIH. |
| CSHL Dolan DNA Learning Center | www.dnalc.org/resources/animations/ Series of around 30 animations on many molecular medicine topics. Many other educational resources also available. |
| Family health program | https://familyhistory.hhs.gov Family history program. |

community is also more knowledgeable about genes and genetics as a result of the many media reports or access to the Internet. This means the level of detail requested by patients and families will challenge health professionals. The same standard for counseling services must be provided to those living in rural or remote regions. In this environment, traditional one-to-one, face-to-face counseling may not be feasible. One way to address these expectations is through computer-based education and telehealth initiatives.

As will be discussed in Chapter 5, the Internet is used to deliver DNA tests directly to consumers. Now direct-to-consumer counseling services are being advertised through the Internet or are available by telephone (Table 4.4). Significant concerns have been expressed about the bypassing of health professionals in the marketing of DNA tests. Nevertheless, there

are also lessons to be learnt, in particular how more effective use can be made of the electronic media in delivering clinical services. While the Internet is essential for educating patients, families and the community at large, the risk of cyberchondriasis is increased as information previously found only in specialized medical journals or books is now readily available for all.

Another new paradigm is the online doctor-patient consultation or eConsultation. Apart from privacy and confidentiality issues related to Internet traffic, this approach has many advantages for the patient and the physician when it comes to simple problems including repeat prescriptions or communicating the results of tests. However, there are medico-legal issues to be overcome since electronic communication can make it more difficult to assess how well a patient has understood the information provided, or the physician may not have a complete picture of the clinical problem from an email.

As access to the Internet increases, there will be more pressure for eConsultations to become a part of clinical practice. In response, professional bodies such as the American Medical Association and the American Medical Informatics Association have developed guidelines on how electronic communication should be used. Recently, a review compared the use of emails between physicians and patients in 2008 versus 2005. It found that overall there has been little change and, perhaps surprisingly, there seemed to be less interest in taking up this method of communication by physicians in 2008 [15]. Of concern was an apparent decrease in adherence to best practice guidelines. These trends would seem inconsistent with the rapid developments that are occurring in personalized medicine.

OTHER OMICS

Although the focus of Molecular Medicine is predominantly the genome, transcriptome and the epigenome, the contributions from other omics provide a more complete picture. As shown in Table 1.12, the list of omics has expanded dramatically. For the purpose of Molecular Medicine, some of the more conventional omics are described below, although it is exciting to think about the prospects for new approaches such as venomics (Box 4.5) or the concept of cocainomics. The emergence of omics has made it essential to understand how genes interact in complex biological models. This is now possible through systems biology.

Proteomics

Proteomics is the analysis of the total proteins (proteome) expressed by a cell, tissue, biological fluid or organism. Important distinctions between genomics and proteomics include:

1. Proteomic biomarkers are present in biological fluids like plasma, serum, urine as well as in cells and tissues;
2. The proteome is not static, but constantly changing in response to both endogenous and exogenous stimuli;
3. The proteome will differ in different cells and tissues;
4. Added complexity results from protein conformation and post-translational modifications, and
5. There is no technique comparable to PCR that allows minute amounts of a protein to be amplified for ease of assay.

The above points show that the proteome more closely resembles the transcriptome than the genome.

The surprising observation that the human genome has far fewer genes than originally anticipated (from 100 000 at the beginning of the Human Genome Project to the contemporary view of around 20 000) remains to be explained. Earlier it was believed that the most direct way to understand our complex proteome (millions of proteins versus tens of thousands of genes) was to characterize genes, and from this


BOX 4.5 VENOMICS.

New paradigms for drug discovery are needed, and one approach is the identification of novel peptides. What better place to look than the diverse venoms found in many invertebrates and vertebrates? Apart from snakes and some spiders, venoms have been ignored or have proven to be too difficult to study because of the minute amounts present. It is thought that there are about 41 000 species of spiders, which could provide over 12 million biologically active peptides. Currently, only about 600 peptides have been described [16]. The potential of omics-based technologies to study minute quantities of venom provides new opportunities. It would be possible to combine both proteomic and genomic strategies to identify many more targets from small polyamines found in some spiders or the complex and large proteins found in other venoms. The familiar approach where data (DNA or peptide sequences) are compared against various databases to help in identification will be less helpful in venomics, because many of the peptides in venoms are unique. In these circumstances, the entire sequence has to be obtained de novo and then the challenge would be to determine the various conformations, including disulphide linkages, that are important for functionality. Interesting times are ahead and no doubt more opportunities will arise for bioinformatics-based modeling to assist in determining function.

understand the proteins. Methods to discover and sequence genes made this achievable. This idea now needs to be re-assessed, because the protein-coding DNA (about 1–2% of the genome) does not explain sufficient variability or even the human phenome and there must be something else occurring at the level of the genome/transcriptome/epigenome to account for the comparable number of genes across both vertebrates and invertebrates (Table 1.7). Hence, effort is increasingly being directed back to the study of proteins. Proteomics has also been revitalized by important technological developments, particularly the evolution of 2-dimensional protein gel electrophoresis into the higher resolution liquid chromatography and mass spectrometry.

Technology

Although the term proteomics was coined in the mid 1990s, a limitation to its development was the difficulty in sequencing a protein. This became even more apparent when DNA sequencing methods improved as the Human Genome Project progressed. Today, advances in mass spectrometry (MS) combined with liquid chromatography (LC) have underpinned important developments in proteomics, metabolomics and lipidomics [17]. Generally two methods are used to identify proteins: (1) Proteins in a complex mix are digested into peptides, separated by chromatography and analyzed, or (2) Protein mixtures are first separated and then analyzed without any prior digestion. In both cases the analysis is undertaken with mass spectrometry.

In mass spectrometry, the mass-to-charge ratio (m/z) of gas phase ions is measured. From this a mass spectrum is developed to identify a substance. Typically in a strategy called shotgun proteomics, a protein (or even a number of proteins) is digested into peptides and separation undertaken by passage through a liquid chromatography (LC) column, before the product is introduced into the mass spectrometer (hence


LC-MS). The peptides are next ionized and vaporized. Ionization can occur by techniques such as electrospray (ES) or via matrix-assisted laser desorption/ionization (MALDI). Ionized peptides in a high vacuum system are then exposed to a laser beam. The laser blasts off the ionized peptides and they fly down a vacuum tube towards an oppositely charged electrode. There are various ways to measure the m/z, with a popular one being TOF (time of flight), hence MALDI-TOF. It is also possible to refine the analysis further through Tandem MS (MS/MS). This serial analysis allows some of the peptides from the first mass scan to be rescanned. Mass spectrometers now enable the mass of peptides (or metabolites) to be determined rapidly and accurately. The result is a spectrum based on the various m/z ratios generated, with the height of each peak in that spectrum approximating the abundance of that particle.

Bioinformatics-based algorithms then take the MS data, and allow the peptides to be identified through comparisons with known peptides in the databases. Once high throughput methods became available to characterize proteins accurately, it was necessary to develop databases comparable to the ones used to store DNA data. Despite these developments, the proteomic databases remain inferior to the genomic ones because they are limited by substrate access, since proteins need to be isolated from relevant tissues (in contrast to germline DNA, which is identical in all tissues).

Bioinformatic analysis of amino acid sequences and protein function prediction follows along the lines described above for DNA, although it is more complex [11]. The amino acid sequence of the protein determines its ultimate conformation and so its biological function. However, the protein’s final shape can be influenced by other variables, particularly the physicochemical environment in which the amino acids or protein exist and the structural and functional contexts for the amino acids or protein. This means that predicting protein shape from its linear amino acid sequence is presently not possible in silico. Protein shape can be looked at in terms of known protein structures that have previously been determined through X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy using a resource such as the PDB database (Table 4.3). Software programs including FASTA and BLASTP are used to perform the calculations. In trying to predict protein function, use can be made of evolutionary relationships to proteins whose structure has already been determined.

Applications

Biomarker discovery: A biomarker is a biological measure such as a compound (usually a protein) that can be used to improve diagnosis or detect risk, follow disease progress or the effects of a treatment. Considerable effort has gone into biomarker discovery in diseases such as Alzheimer disease or Parkinson disease. Although these two neurodegenerative disorders have distinct phenotypes, they have overlapping features [18]. Apart from attempting to find biomarkers that are based on medical imaging, a lot of work has gone into examination of body fluids, particularly cerebrospinal fluid, to identify protein and other biomarkers.

This field is still evolving and shares some similarities with gene association studies, as biomarkers can be identified but determining their functional significance is the challenge and limitation. Like cancer, the progression of neurodegenerative disorders is complicated by coexisting secondary changes, such as inflammation, cell death and perhaps regeneration. Unlike genomics, protein biomarkers in a variety of tissues or fluids will give different results. The changes found are dynamic and easily influenced by environmental factors, so it is not surprising that proteomic profiles are often not reproducible between studies. Nevertheless, the potential of MS-based strategies to identify and quantify biomarkers will add to the vast quantities of data being generated.
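The peptide identification step described under Technology (digest the protein, measure m/z, compare with known peptides in a database) can be illustrated with a small sketch. It assumes trypsin as the digesting enzyme (cleavage after lysine or arginine unless the next residue is proline) and uses standard monoisotopic residue masses. The two protein sequences, the function names and the 0.01 tolerance are hypothetical choices made for illustration only; real search software also handles missed cleavages, modifications and MS/MS fragment spectra.

```python
import re

# Monoisotopic masses (Da) of amino acid residues, i.e. after loss of water.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # mass of each charge-carrying proton

# Hypothetical toy "database"; these are not real protein entries.
TOY_DATABASE = {
    "toy_protein_1": "MKWVTFISLLR",
    "toy_protein_2": "GASPKLLER",
}


def tryptic_digest(protein):
    """Split a sequence after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide) + WATER


def peptide_mz(peptide, charge=1):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def match_peaks(observed_mz, database, charge=1, tolerance=0.01):
    """For each observed m/z, list the (protein, peptide) pairs that fit."""
    candidates = [
        (name, peptide, peptide_mz(peptide, charge))
        for name, sequence in database.items()
        for peptide in tryptic_digest(sequence)
    ]
    return [
        [(name, pep) for name, pep, mz in candidates if abs(mz - peak) <= tolerance]
        for peak in observed_mz
    ]


if __name__ == "__main__":
    # Simulate one observed peak from a known peptide, then identify it.
    simulated_peak = peptide_mz("WVTFISLLR")
    print(match_peaks([simulated_peak], TOY_DATABASE))
```

Because leucine and isoleucine have identical masses, a mass match alone cannot distinguish them, which is one reason why a match against a database is treated as a candidate identification rather than a proof.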


Protein microarrays: These generally rely on the capture of peptides or proteins using antibody immunoassays. Commercial kits are now available and provide functional analysis in areas such as inflammation, signal transduction, phosphorylation and so on [19]. Claims are made that combinations of protein biomarkers can be used to distinguish cancer from other conditions, and it is inevitable that a contentious screening marker such as PSA (prostate specific antigen) will be replaced by biomarkers with greater specificity and sensitivity.

Drug development: Proteomics is an important entry into drug discovery and development as ultimately it is the protein that is the effector in disease. Applications for proteomics in drug discovery include:

1. Interrogating databases as these have many peptides and proteins that will help to identify novel targets or model different structures as well as protein-protein interactions and post-translational modifications;
2. Utilizing biomarkers to assist in all stages in drug development including the monitoring of efficacy and toxicity, and
3. Producing cheaper or novel drugs.

For example, knowledge of protein structure can be used to make synthetic (cheaper) products exemplified by the antimalarial drug artemisinin or novel therapeutics (Box 4.6).

Interactome

Related to the proteome is the interactome, which describes all the protein-to-protein interactions within a cell, tissue, fluid or organism. It is usually expressed as a directed graph and is an attempt at a systems biology approach (see below). This can be illustrated with the mature red blood cell which does not have a nucleus, and so has a relatively simple proteome and interactome because there is little mRNA. Apart from carrying oxygen, the red blood cell has to cross narrow capillaries by changing its shape, and must also cope with hypertonic conditions. The earliest investigations of its proteome took place in 2002 using 2D electrophoresis and MALDI-TOF, and identified 102 proteins. Today, the numbers have dramatically increased to around 1 989 proteins involving 15 major red blood cell pathways and 50 major networks. The interactome identified has confirmed and demonstrated the key functions of the red blood cell and how they are maintained including:

1. Surviving oxidative stress because of the constant exposure to high oxygen levels;
2. Requiring the cytoskeleton to unfold, and
3. Apoptosis pathways important for the red blood cell’s aging process [20].

Metabolomics

Metabolomics refers to the total number of small molecular mass organic compounds found in or produced by cells, tissues, fluids or an organism. Polymerized structures such as proteins and nucleic acids are excluded. Molecules that make up the metabolome are called metabolites [21]. The closely related term metabonomics is included under this definition (see Table 1.12). The human endogenous metabolome is estimated to contain a few thousand species.

Investigating the metabolome utilizes similar approaches to those described for proteomics, although it is complicated by significant dynamic changes. For example, measuring the metabolome requires consideration of environmental factors such as drugs, dietary compounds and even pollutants [21]. This potential for background noise is an additional challenge for experimental design and bioinformatic analysis.

Mass spectrometry has previously been described as a core technology for proteomics and metabolomics. However, for the latter any one single approach is usually insufficient. Another technology used to measure the metabolome is NMR spectroscopy (NMR – nuclear


BOX 4.6 DRUGS DEVELOPED THROUGH MOLECULAR TECHNOLOGIES.

Artemisinin exemplifies how an expensive natural product can be synthesized more cheaply. It is isolated from the plant Artemisia annua, and in combination with other antimalarials it is used to treat multi-drug resistant malaria. However, it is expensive to isolate and there are uncertainties associated with growing this plant. These constraints make it unattainable in the developing countries where it is most needed. A synthetic precursor product was made in 2006 using a rDNA approach (Chapter 8) but this was not sufficiently active and needed changes to its structure. Now with funding from the Bill and Melinda Gates Foundation and involvement of the biopharmaceutical company Sanofi-Aventis, researchers from the University of California are attempting to make a synthetic product that will cost around $1 per dose. It will be reliably produced and not subject to weather and other conditions that impact on the native plant that is the current source of this product.

The next two examples involve targeted therapies, where drug use is limited to patients who satisfy specific requirement(s) based on protein or DNA tests from tumor tissue.

Imatinib (Gleevec®) is a small molecule specifically developed to inhibit tyrosine kinase (TK). It was originally produced in response to the bcr/abl translocation in chronic myeloid leukemia, which has a fusion gene with unregulated TK activity. Imatinib binds close to the ATP binding site specific to the bcr-abl product and so inhibits TK activity. More recently this drug has been approved for use in gastrointestinal stromal tumors because these are associated with activating mutations in the KIT gene (a receptor TK). The successful introduction of imatinib has led to a number of other TK inhibitors being developed including gefitinib, nilotinib and dasatinib. Although they all work through the same effect on ATP inhibition, the second generation products differ in their targeted kinases. In some cases, the newer products are now preferred as a front line treatment. TK inhibitors have been shown to be effective in a number of cancers, and they are now being trialed in non-malignant diseases, including pulmonary hypertension, rheumatoid arthritis and other conditions.

Trastuzumab (Herceptin®) is a humanized monoclonal antibody against the human epidermal growth factor receptor type 2 (HER2). Following discovery of the HER2 gene and its related protein, it was shown that this biomarker (amplification of the gene or its protein product) could identify a subgroup of breast cancer patients with a poor prognosis. Hence a targeted therapy was developed for patients with metastatic breast cancer who were unresponsive to conventional therapies. It is associated with significant side effects and so is preferentially used in patients who are most likely to respond – i.e. those with HER2 over-expressing breast cancer.

magnetic ). NMR detects nuclear spin which is found in atoms with an odd mass number, e.g. ¹H, ³¹P [21]. Nuclear spin is detectable in atoms that contain odd numbers of protons and/or neutrons in the nucleus. Using NMR spectroscopy, metabolites can be identified by the chemical shift in resonance frequencies. Like MS, the shift in peak identifies the product, while the height of the peak gives an indication of quantity. Generally this approach has poor sensitivity.


Another technique used in metabolomics is gas chromatography linked to mass spectrometry (GC-MS). Here the sample (containing volatile, non-polar metabolites) is vaporized and passed through a chromatograph in the gas phase, before being analyzed by MS. More recently, the LC-MS approach described earlier has become the preferred approach for investigating the metabolome.

The metabolome is dependent on the genome, the transcriptome and the proteome, as well as the environment, hence it provides additional information that might be useful for biomarker development or understanding physiologic and disease pathways. Examples of how metabolites are being studied to explain drug toxicity (hepatic and renal) as well as identifying biomarkers in a range of human disorders are given in [21].

Phenomics

The phenome is the entire set of phenotypes in a cell, tissue, organ, organism or species. It is derived by systematic measurement of phenotypic contributors, including qualitative and quantitative traits, allowing it to be defined on a much broader whole-body scale. As the accuracy of genomic based measurements improves, more attention is being paid to the phenotype, which remains the critical variable in any genetics or genomics study. Confounding factors in genetic studies include pleiotropy, penetrance, epistasis, allelic and locus heterogeneity. These effects should be considered in designing research protocols but cannot be avoided. In contrast, errors in the phenotype occurring because of phenocopies can be avoided, or their effects can be lessened by more careful assessment of the phenotype [22]. An example would be the genetic disorder thalassemia and acquired iron deficiency. Both have similar phenotypes in terms of the hematologic profile but are usually distinguishable with care.

Human Models

The concept of deep-phenotyping is used to explain ways in which the human phenome might be generated [22]. For this, it is necessary to document more comprehensive clinical and investigative parameters with preference for the generation of quantitative data. A heat map can be generated to allow statistical assessment of what might be overlapping syndromes (Figure 4.5). The human genome with its 3 billion bases represented by four possible combinations is relatively straightforward compared

FIGURE 4.5 A heat map to define a human phenome. The heat map is generated by placing a phenotype class or disease along one axis (X in this case) and phenotypic characteristics on the Y axis. In this example A to E represent 5 phenotypically similar disorders while the numbers 1 to 8 are characteristics derived from the phenotypes in these disorders. A two color heat map is shown with red = ↑ intensity/prevalence of the characteristic compared to a reference range; blue = ↓ intensity/prevalence of the characteristic; white = absent characteristic. A pink or light blue color would suggest a less conclusive phenotype. Based on the patterns shown, it would appear that disorders A and C are similar; A and D share some similarities while A and E and to a lesser extent A and B are different. This more rigorous assessment of phenotype would help in selecting subjects for a case control association study or define better the underlying disorders. See [22] for examples using one and two color heat maps.
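The comparison of disorders described in the Figure 4.5 legend can be sketched numerically. The short Python example below is an illustration only: the scores are invented (they are not read from the figure), the coding of red as +1, blue as -1, white as 0 and paler colors as ±0.5 is an assumption, and cosine similarity is just one of several measures that could be used to compare columns of a heat map. The names `PHENOME`, `cosine_similarity` and `rank_against` are hypothetical.

```python
from math import sqrt

# Hypothetical scores for characteristics 1 to 8 in disorders A to E.
# Assumed coding: +1 = increased (red), -1 = decreased (blue), 0 = absent (white),
# +/-0.5 = paler color, i.e. a less conclusive phenotype.
PHENOME = {
    "A": [1, 1, -1, 0, 1, -1, 0, 1],
    "B": [-1, 0, 1, 1, -1, 0.5, 1, 0],
    "C": [1, 1, -1, 0, 0.5, -1, 0, 1],
    "D": [1, 0, -1, 1, 0, 0, -1, 1],
    "E": [-1, -1, 1, 0, -1, 1, 1, -1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two score vectors (+1 same pattern, -1 opposite).

    Assumes neither vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_against(reference, phenome):
    """Rank every other disorder by similarity to the reference, most similar first."""
    ref = phenome[reference]
    scores = [
        (name, cosine_similarity(ref, row))
        for name, row in phenome.items()
        if name != reference
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_against("A", PHENOME):
        print(f"A vs {name}: {score:+.2f}")
```

With these invented scores the ranking against disorder A is C, D, B, E, which mirrors the qualitative reading given in the legend. Real phenotype data would need a justified scoring scheme and a formal statistical assessment.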

to the human phenome, which apart from its potential complexity, will contain components yet to be defined. A Human Phenome Project akin to the Human Genome Project would be significantly more complex because of the intrinsic difficulty in determining both qualitatively and quantitatively what components should be included. In the meantime there have been many initiatives to catalog human phenotypes and phenomes including the publication of personal genomes from members of the public as well as celebrities. Considerable progress has been made in understanding the phenome through animal studies.

Animal Models

Unlike humans, animals can be manipulated experimentally and bred under specific conditions. Some animal models of disease arise spontaneously, but a more useful approach is to produce experimentally the phenotype required. This allows the natural history of a disorder to be followed over many generations and various interventions to be tried.

Traditional animal models: For many years, inbred strains of animals, particularly the laboratory mouse, have been important tools for studying a wide range of human disorders. Inbred mice are produced by repeated sister-brother matings over about 20 generations. The end result is a syngeneic mouse which will be identical (i.e. homozygous) at every genetic locus, and to other mice of the same strain. Another type of inbred mouse is the congenic one. Although derived from one strain, selective breeding allows this animal to have genetic material from a second strain at a single locus. Naturally-derived animal models provide considerable information, but they have limitations, for instance the mutation may not be representative of that found in the human disorder. Importantly, there are many diseases for which a suitable animal model does not exist.

Transgenic mouse: Recombinant DNA (rDNA) methods provide a way to create new animal models or manipulate existing ones to test the function of genes (Box 4.7). The rDNA approaches can be divided into two strategies: reverse or genotype driven animal models, and forward or phenotype driven models. The reverse strategy is essentially the transgenic animal, i.e. manipulating a specific gene in a mouse will provide information about a disease. The gene driven strategies require a priori knowledge of likely gene function. In contrast, the forward strategy makes no prior assumptions and focuses on the disease (phenotype) and from this, knowledge of the underlying genomic changes can be gained. An example of the forward approach is the ENU mouse.

ENU mouse: ENU (N-ethyl-N-nitrosourea) is a potent germline mutagen that is used to generate single nucleotide mutations in DNA. Using this chemical, it is possible to create random mutations in mouse DNA, and then observe the resulting phenotypes. Those which resemble human diseases are studied to identify the relevant gene. From this, the human homolog can be isolated. Difficulties with this model include a preference for ENU-induced mutations to occur at A-T base pairs, so mutations at G-C base pairs are under-represented. Because there is no prior information, detecting the various phenotypic changes, particularly subtle ones, is challenging [24].

Zebrafish: Danio rerio is an attractive model organism because of its small size, short life cycle, and ease of growth. It is easier to work with in terms of gene identification since its genome is half the size of the human or mouse. It is a particularly good model when studying development because the embryos are transparent, and develop outside the mother’s body, so they can be studied in real time. In the zebrafish, antisense approaches to gene manipulation have been used successfully to knock out genes, and then observe the effects on the phenotype (Chapter 8). Zebrafish can be used to evaluate drug toxicity by direct release of the drug in the fish tank and observation of


BOX 4.7 TRANSGENIC MOUSE MODELS.

Transgenic mice have become an invaluable resource for understanding human disease. Three types are available:

1. The conventional transgenic mouse is produced by a microinjection of DNA into the pronucleus of a fertilized oocyte, which is then inserted into a pseudopregnant foster mother. In this model, the injected transgene is randomly inserted into the genome. Despite this it can still function and its expression will produce a new phenotype. Foreign DNA that has become integrated into the germline of what is now a chimeric mouse enables the gene to be transmitted to progeny. Appropriate matings will produce homozygotes containing the transgene.
2. Embryonic stem cells also allow a gene to be targeted to its appropriate locus, and replace its normal wild-type counterpart by homologous recombination; i.e. integration into the genome is no longer random. Gene function can be inhibited (knock-out mouse) or the effect of a specific gene or gene mutation can be observed (knock-in mouse) (Figure 4.6) (See Chapter 8 for discussion of homologous recombination).
3. The two types of transgenics so far described represent an all-or-nothing effect, and there is widespread expression of the transgene in many tissues. Therefore, it is difficult to investigate subtle phenotypic changes or distinguish primary from secondary effects. The uncontrolled expression of the transgene


FIGURE 4.6 Embryonic stem (ES) cells for in vivo expression of recombinant DNA. This method produces transgenic mice which are used to test the function of genes in vivo. (1) ES cells are transfected with foreign DNA. ES cells will take up DNA into different random sites in the mouse genome. In a very rare instance, the integration will have occurred into the correct site in the genome by homologous recombination. (2) Colonies of ES cells are grown. (3) DNA is isolated from pools of colonies. (4) The colony which has DNA integrated into the correct position in the genome by homologous recombination can be identified by PCR (marked in red here). (5) ES cells with the homologous recombined DNA are injected into mouse blastocysts. (6) Using different colored mice as sources of ES cells (e.g. white mouse) and blastocysts (e.g. black mouse) will enable chimeric (white and black) mice to be distinguished. If the transgene has also integrated into the germline it will be possible to obtain a homozygous animal by breeding [23] (Chapter 8 has further discussion on ES cells).
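Steps (3) and (4) of the legend, in which DNA from pools of colonies is tested before the correctly targeted colony is identified, can be illustrated with a small counting exercise. The sketch below is hypothetical throughout: the colony numbers, the pool size and the function name `screen_by_pools` are invented, and it assumes a PCR that never gives false positives or false negatives. It is not a description of any particular laboratory protocol.

```python
def screen_by_pools(colonies, pool_size):
    """Two-stage screen: one PCR per pool, then one PCR per colony in each positive pool.

    `colonies` is a list of booleans (True = homologous recombinant).
    Returns (indices of positive colonies, total number of PCR reactions).
    """
    reactions = 0
    positives = []
    for start in range(0, len(colonies), pool_size):
        pool = colonies[start:start + pool_size]
        reactions += 1                      # PCR on the pooled DNA
        if any(pool):                       # pooled PCR is positive
            for offset, is_targeted in enumerate(pool):
                reactions += 1              # PCR on each colony of that pool
                if is_targeted:
                    positives.append(start + offset)
    return positives, reactions


# Hypothetical experiment: 960 colonies, a single correctly targeted one.
colonies = [False] * 960
colonies[437] = True

found, pcr_count = screen_by_pools(colonies, pool_size=10)
print(f"positive colonies: {found}; PCR reactions: {pcr_count} (versus {len(colonies)} unpooled)")
```

Under these assumptions, pooling reduces the number of reactions because most pools are negative and can be discarded after a single test. The saving shrinks as targeted colonies become more frequent.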


BOX 4.7 (cont’d)

during embryonic development could also be lethal if the gene is not normally expressed at this time. To improve on these limitations, it is now possible to make a conditional knock-out mouse, which means that the inserted gene can be switched on or off conditional to a specific stimulus. One approach to make a conditional transgenic mouse utilizes what is called the Cre-lox system (Figure 4.7). A summary of gene targeting, homologous recombination and the Cre-lox system is found in the citation for the 2007 Nobel Prize in Physiology or Medicine awarded to M Capecchi, M Evans and O Smithies for their work in homologous recombination and transgenic mice [23].

[Figure 4.7 diagram: (1) floxed transgenic × (2) liver Cre transgenic; (3) liver specific mutant created, responsive to an external stimulant]

FIGURE 4.7 Cre-lox system to generate a conditional transgenic mouse. Cre (causes recombination) recombinase enables recombinations to be made where there are recombinase recognition sites called loxP (locus of recombination). (1) The floxed transgenic (flanked by lox) is produced by the usual embryonic stem cell homologous recombination approach but in this case the gene of interest is constructed so that it is flanked by loxP sites. Mice with this transgene are bred to homozygosity, but have no phenotypic changes because the Cre recombinase is needed. (2) To introduce the Cre recombinase requires breeding to a Cre expressing transgenic mouse. This transgenic has Cre under the control of a promoter which can be tissue or time specific. For example, using the cardiac myosin promoter will mean the gene will express only in cardiac tissue. By introducing into the promoter an element requiring a drug such as tetracycline it becomes possible to turn on the Cre gene only when there is exposure to tetracycline. (3) Offspring of the Cre/Floxed mating on exposure to tetracycline will allow targeted recombination to occur and so inhibit gene function (i.e. a knock-out). Because this is tissue or time specific it allows some control of the transgenic gene expression and avoids the potential for lethality [23].
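The conditions described in the legend can be written out as a small truth-table model, which may help in seeing why neither parent shows a phenotype while the offspring does. This is a deliberately simplified sketch: the class `Mouse` and the functions `cross`, `cre_expressed` and `gene_knocked_out` are hypothetical names, inheritance is reduced to "the offspring carries both transgenes", and recombination is assumed to be complete whenever Cre is expressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mouse:
    floxed: bool                          # gene of interest is flanked by loxP sites
    cre_tissue: Optional[str] = None      # tissue in which the Cre promoter is active
    cre_needs_tetracycline: bool = False  # promoter carries a tetracycline-dependent element


def cross(floxed_parent: Mouse, cre_parent: Mouse) -> Mouse:
    """Simplifying assumption: the offspring carries both transgenes."""
    return Mouse(
        floxed=floxed_parent.floxed,
        cre_tissue=cre_parent.cre_tissue,
        cre_needs_tetracycline=cre_parent.cre_needs_tetracycline,
    )


def cre_expressed(mouse: Mouse, tissue: str, tetracycline: bool) -> bool:
    """Cre is made only in the promoter's tissue and, if required, only with the drug."""
    if mouse.cre_tissue != tissue:
        return False
    return tetracycline or not mouse.cre_needs_tetracycline


def gene_knocked_out(mouse: Mouse, tissue: str, tetracycline: bool) -> bool:
    """Knock-out needs both the floxed gene and Cre recombinase in the same tissue."""
    return mouse.floxed and cre_expressed(mouse, tissue, tetracycline)


floxed_parent = Mouse(floxed=True)
cre_parent = Mouse(floxed=False, cre_tissue="heart", cre_needs_tetracycline=True)
offspring = cross(floxed_parent, cre_parent)

for tissue in ("heart", "liver"):
    for drug in (False, True):
        print(tissue, "tetracycline" if drug else "no drug",
              gene_knocked_out(offspring, tissue, drug))
```

In this model the gene is lost only in the offspring, only in heart tissue and only after tetracycline exposure, which is the tissue and time control the legend describes.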

toxic effects in embryos or adult fish. For drug discovery, mutant zebrafish can be exposed to various compounds and disease-suppressing effects sought as markers for novel drugs. Mutants in zebrafish produced by ENU have also proven useful models for human disorders (Table 4.3) [25].

Metagenomics

The human microbiota refers to the community of microbes that lives in symbiosis with its host. The set of genes encoded by the microbiota is called the microbiome. Humans have four major microbiomes: gut, skin, oral cavity and reproductive tract. Metagenomics refers to the sequencing of uncultured microorganisms in various environmental niches to provide a snapshot of the microbial populations, thereby allowing their biodiversity to be studied.

The nonpathogenic human gut bacterial flora has been described as the third major genome of mammals after nuclear and mitochondrial DNA, with the difference being that it can change. The human gastrointestinal tract has a diverse bacterial flora in terms of both number and species. It is the site for important mutually beneficial interactions including digestion and immunity. Numbers quoted for the gut flora are impressive: 500 different species, diversity greater than what is found in the skin, oral cavity or reproductive tract and a cumulative microbiome genome that is 100 times larger than the mammalian nuclear genome [26]. Since many of the gut flora cannot be cultured, the only option for identifying new species and cataloging those present is NG DNA sequencing.

Although the gut microbiome is important for normal health, it is also implicated in inflammatory bowel diseases such as Crohn disease and ulcerative colitis, and in irritable bowel syndrome. Differences in the microbiomes for these conditions could indicate a breakdown in the tolerance normally existing between microbes and the gut mucosa leading to inflammation [26]. Animal studies have also shown that the gut microbiome might contribute to obesity, thereby broadening the concept that obesity is a product of nutritional and genetic factors (Chapter 6). The efficacy of complementary medicines, such as the taking of probiotics to enhance the beneficial bacteria in the gut, can now be better assessed by NG DNA sequencing approaches.

Human Microbiome Project

The goals of the NIH sponsored Human Microbiome Project read like a mini Human Genome Mapping Project:

1. Determine if individuals share a common human microbiome;
2. Understand if changes in the human microbiome can be correlated with human health;
3. Develop new technologies and bioinformatics tools, and
4. Address ELSI raised by human microbiome research (Table 4.3).

The Human Microbiome Project utilizes two strategies developed through metagenomics. In the first, DNA present in a particular environment is isolated and degenerate PCR primers are used to amplify all 16S or 18S ribosomal RNA (rRNA) species representing prokaryotes and eukaryotes respectively. Since these RNA species contain highly conserved regions an overview of what is present can be obtained. Alternatively, DNA or RNA is prepared from the pool of microorganisms, subcloned, amplified, and then NG DNA sequencing is used to give an overview of what is present. Both approaches rely on final identification through in silico comparisons with protein, DNA and RNA sequences already in the databases.

The challenges for bioinformatics in metagenomics are significant [27]. Sequencing a single organism was only achieved in 1995 (Table 4.1) but today it is relatively easy to provide a

complete picture of any organism’s genomic structure with the assistance of bioinformatics. On the other hand, metagenomic approaches are considerably more difficult because there will be a mixture of sequences representing many organisms, and the sequences themselves will be relatively small because they have been generated by NG DNA sequencing. Thus, there is a growing demand for better software, and skills to process and then analyze the data from metagenomics studies.

More recently the viral metagenome (viriome) has been studied in different environments. This work is technically more challenging because there is no reference point equivalent to the ubiquitous 16S rRNA genes found in prokaryotes. Nevertheless, different viriomes are being characterized to identify the pathogens present. Interesting results are already emerging, with over 50% of the DNA or RNA sequences unknown. Ultimately, it is expected that new insights into virus-host interactions will become possible. For example, knowledge of viral ecology could be used for monitoring emerging infections or assessing water quality [28].

SYSTEMS BIOLOGY

Systems biology is the computational reconstruction of biological systems [29]. It is based on an interdisciplinary approach that involves holistic rather than reductionist strategies to understand complex interactions in biological systems. In this way quantitative models can be developed to predict function and behavior in a system. In biology the drivers for systems biology include omics-based data sets that have been integrated through advanced computer science and computational analyses (Figure 4.8). The ultimate output would be the production of a virtual cell.

Genomics and proteomics data sets are found in the literature and in many databases notably EMBL, GenBank, DDBJ and Ensembl (nucleotides); UniProtKB-Swiss-Prot, Protein Data Bank (proteins) (Table 4.3), while Medline and PubMed offer computerized access to the scientific literature [29]. Having mined these resources, the data need to be analyzed for function by homology searching (DNA and protein) or identifying particular domains in the case of proteins. Predicting protein structure is more difficult as most remain unknown. Inference may only be possible. Each of the data sets (for example genome, transcriptome, proteome, metabolome and phenome) provides information and allows the construction of networks, but none gives the complete picture on its own. Merging all the information together and developing integrated models requires additional bioinformatic input. This is needed to assemble the data sets into some form of network that is consistent with the model under study, and then converting the network into a computational model that can be tested in silico against specified biological parameters. Ultimately, it will be necessary to validate the model through in vivo studies. Successful applications of systems biology require multidisciplinary contributions, particularly biology, mathematics, engineering and physics.

It has been suggested that there are two approaches in systems biology: (1) Top down, by computer modeling and simulation, and (2) Bottom up, integrating all clinical, laboratory and imaging data. The latter would have particular relevance to the clinic.

Clinical Applications

In medical practice, an approach comparable to systems biology is already followed, since clinical, family, laboratory and imaging data sets are all considered in decision making. However, this is ad hoc, not validated and is derived informally. From being theoretical constructs, research-based systems biology strategies are now able to be simulated in silico, becoming more robust and reproducible as


evidence is accumulated. Today, there is growing interest in developing a more systematic approach that is underpinned by bioinformatics in concepts such as systems pharmacology and systems pathology. The former seeks to develop a whole-organism understanding of drug action. To do so requires a thorough understanding of the drug’s potential effects generated by input from clinical markers, animal models, and the effects of the drug on cells, tissues and organs. Interacting networks can then be modeled in silico and all data are used to understand better the effects of drugs on an individual including drug-drug interactions. Clinical trials will then be required to test any relevant observations. In some circumstances, it may not be possible to generate numbers for a statistically significant clinical trial (for example drug-drug interactions) and in silico modeling may be the only option. Other advantages to a systems pharmacology approach would be the generation of decision-making software tools for the clinician, and the identification of potential new targets for drug development.

FIGURE 4.8 A representation of systems biology. Left of arrow: Symbols represent individual data or data sets generated through omics. However, isolated data sets per se do not identify the complex interactions that might be occurring. Information may only be meaningful if it can be linked together. Right of arrow: Systems biology utilizes computer-based algorithms to join related data sets in terms of metabolic pathways or function. This produces a better understanding of the 3-dimensional picture in the cell or tissue.

Systems pathology follows along similar lines and provides a more global approach to managing complex systems such as cancer. Examples where this would be helpful are: (1) PSA


(prostate specific antigen) screening for detecting early prostate cancer, and in predicting the outcome of treatment, and (2) early stage cancer when the primary tumor is removed but the patient is left with a dilemma in terms of what adjuvant therapies (if any) are needed to reduce the risk of relapse.

Traditional surgical, biochemical, molecular, imaging and pathological markers for predicting outcomes are still limited in their utility. Systems pathology implies that a more global assessment of markers and their interactions will allow various biological networks or dynamics to be found. Following validation, bioinformatics-based algorithms can be developed to identify treatment options personalized to the tumor or the patient.

Some successes are emerging:

1. In hereditary ataxias, seemingly unrelated findings derived from known abnormal proteins secondary to gene mutations, complex protein-protein interaction networks and related pathways have been connected, showing that these neurological disorders are likely to result from RNA splicing defects that promote the death of Purkinje cells.
2. Parkinson disease involves at least six genes in pathogenesis with many different pathways. There was no unifying hypothesis of how these interacted to cause brain damage, until a more global picture based on genomic and proteomic data identified mitochondrial pathways as being important [30].

OVERVIEW

A number of concepts have been described in Chapters 1 to 4, each having fine distinctions in terminology, such as molecular medicine, genomic medicine and personalized medicine. An attempt at connecting them is made in Figure 4.9. Whatever the distinctions, a common thread linking them is technology, which remains an important driver for new discoveries. In this environment, a robust mechanism to evaluate clinical utility or effectiveness is essential.

Traditionally, new drugs or diagnostic tests are assessed within a population. Evidence-based medicine (EBM) approaches, such as randomized clinical trials (RCT), allow the evaluation of product safety. However, most RCTs (and the same applies to GWAS) measure efficacy as an outcome, i.e. does something work or not. RCTs are generally conducted under ideal conditions, so the strict requirements set by regulators can be met. As we learn more about human variation, particularly at the DNA level, and differences in susceptibility to disease, it is evident that population stratification within RCTs might provide more reliable data. The ultimate in stratification is represented by the individual in his or her own environment which is likely to be less than ideal. The RCT is difficult in this respect and newer approaches are needed particularly for molecular medicine which will invariably involve gene plus environment (G x E) effects.

Comparative Effectiveness Research (CER) is an additional evaluative approach. It was given a boost in the USA with a new Act in 2009 providing $1.1 billion to fund its implementation. CER involves a direct comparison of existing health interventions (DNA genetic tests or genetic therapies in the present context), and the examination of outcomes in a real life environment with effectiveness as the end point, i.e. does an intervention do what it claims to do in ordinary circumstances [31]. Gathering data for CER can be via traditional RCTs and systematic reviews, as well as other means. An important medical intervention is the NG DNA whole genome or exome sequence. But does it have clinical utility? Case reports in the rare genetic disorders would suggest that NG DNA sequencing is clinically effective (Box 4.8). However, these disorders are rare and the numbers are not there for an RCT. A CER approach might be better to make an assessment.


[Figure 4.9 diagram labels. Molecular (DNA/RNA/omics) medicine: population-based medicine; population stratification; personalized medicine. Evidence: RCT; CER. Results: collective result; individual result. Drivers: technology and $; WGS. Outcomes: DNA diagnostics (genetic and genomic tests); pharmacogenetics for Rx; pharmacogenomics and drug delivery; new drugs; DNA based population stratification. Success: business case (government; health organisations); uptake by health professionals; engagement of community.]

FIGURE 4.9 Relationship between molecular, genomic and personalized medicines. Molecular medicine describes the use of DNA (RNA) based knowledge to inform clinical practice although the impact of other omics must be considered. Genomic medicine is a recent term for what is essentially the same activity although the name implies a more restricted focus to DNA. Populations are traditionally used to assess new therapies or models of care. Underpinning this is evidence-based medicine via randomized clinical trials (RCTs). Outcomes produce a one-size-fits-all view which is a very different philosophy to personalized medicine. The latter is reached via population stratification and the evidence comes from the traditional RCT as well as other methodologies such as comparative effectiveness research (CER). Drivers for molecular medicine are technology and industry with the immediate goal being whole genome sequencing (WGS). Outcomes include a range of DNA genetic and genomic tests and a renewed drug development pipeline through pharmacogenomics. Success will depend on the appropriate business model that is attractive to those who hold the health dollars; interest and understanding by health professionals and an educated and engaged community.


BOX 4.8 THE EFFECTIVENESS OF NG DNA SEQUENCING IN MANAGING RARE DISEASES.

Two recent success stories demonstrate how whole genome sequencing or exome sequencing can provide invaluable input into the diagnosis and treatment of rare genetic disorders.

The first involves a severely affected male child aged 15 months with an acute colitis resembling Crohn disease. Known causes were sought using conventional investigations including DNA sequencing of potential candidate genes. All failed to give an answer, until a whole exome sequencing strategy was used. It identified a hemizygous missense change in the gene XIAP (X-linked inhibitor of apoptosis). This gene plays a key role in the pro-inflammatory pathway and represents a novel mechanism for developing Crohn disease. On the basis of this, an allogeneic hematopoietic cell transplant was performed and the child’s gastrointestinal disease resolved [32].

The second case involved two non-identical twins aged 14 years. They had been diagnosed when aged 5 as having DRD (dopamine responsive dystonia) and were treated with L-dopa. However, their condition deteriorated and whole genome sequencing was undertaken. This showed two mutations (a missense change and a premature stop codon) in the SPR gene which had previously been associated with DRD. As a result of this observation, the L-dopa treatment was supplemented with 5-hydroxytryptophan which bypassed the SPR gene defect. This led to clinical improvement in both twins [33].

There is a place for both RCTs and CER in molecular medicine but flexibility is needed so that either or both may be appropriate depending on the potential utility of a discovery.

References

[1] Next steps in the sequence: the implications of whole genome sequencing for health in the UK. PHG Foundation 2011. www.phgfoundation.org/reports/10364/
[2] Mardis ER. A decade’s perspective on DNA sequencing technology. Nature 2011;470:198–203.
[3] X Archon prize. http://genomics.xprize.org/
[4] Ashley EA, Butte AJ, Wheeler MT, et al. Clinical assessment incorporating a personal genome. Lancet 2010;375:1525–35.
[5] Morgan JE, Carr IM, Sheridan E, et al. Genetic diagnosis of familial breast cancer using clonal sequencing. Human Mutation 2010;31:484–91.
[6] Miller MB, Tang Y-W. Basic concepts of microarrays and potential applications in clinical microbiology. Clinical Microbiology Reviews 2009;22:611–33.
[7] Cardoso F, Van’t Veer L, Rutgers E, et al. Clinical application of the 70-gene profile: The MINDACT trial. Journal of Clinical 2008;26:729–35.
[8] Miller DT, Adam MP, Aradhya S, et al. Consensus statement: Chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. American Journal of Human Genetics 2010;86:749–64.
[9] Formal HTA on aCGH for the genetic evaluation of patients with developmental delay/mental retardation or autism spectrum disorder. http://www.bcbs.com/blueresources/tec/vols/23/acgh-genetic-evaluation.html
[10] Origin of the Internet from the Internet Society. http://www.isoc.org/internet/history/brief.shtml
[11] Tramontano A. Bioinformatics. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[12] Calo V, Bruno L, La Paglia L, et al. The clinical significance of unknown sequence variants in BRCA genes. Cancers 2010;2:1644–60.
[13] Blumenthal D, Glaser JP. Information technology comes to medicine. New England Journal of Medicine 2007;356:2527–34.
[14] Ullman-Cullere MH, Mathew JP. Emerging landscape of genomics in the electronic health record for personalized medicine. Human Mutation 2011;32:512–6.
[15] Menachemi N, Prickett CT, Brooks RG. The use of physician-patient email: a follow-up examination of adoption and best-practice adherence 2005–2008. Journal of Medical Internet Research 2011;13:e23.
[16] Escoubas P, King GF. Venomics as a drug discovery platform. Expert Reviews of Proteomics 2009;6:221–4.
[17] Griffiths WJ, Wang Y. Mass spectrometry: from proteomics to metabolomics and lipidomics. Chemical Society Reviews 2009;38:1882–96.
[18] Shi M, Caudle WM, Zhang J. Biomarker discovery in neurodegenerative diseases: a proteomic approach. Neurobiology of Disease 2009;35:157–64.
[19] Yu X, Schneiderhan-Marra N, Joos TO. Protein microarrays for personalized medicine. 2010;56:376–87.
[20] D’Allesandro A, Righetti PG, Zolla L. The red blood cell proteome and interactome: an update. Journal of Proteome Research 2010;9:144–63.
[21] Roux A, Lison D, Junot C, Heilier J-F. Applications of liquid chromatography coupled to mass spectrometry-based metabolomics in clinical chemistry and toxicology: A review. Clinical Biochemistry 2011;44:119–35.
[22] Lanktree MB, Hassell RG, Lahiry P, Hegele RA. Phenomics: expanding the role of clinical evaluation in genomic studies. Journal of Investigative Medicine 2010;58:700–6.
[23] 2007 Citation for the Nobel Prize in Physiology or Medicine. http://nobelprize.org/nobel_prizes/medicine/laureates/2007/advanced.html
[24] Acevedo-Arozena A, Wells S, Potter P, et al. ENU mutagenesis, a way forward to understand gene function. Annual Reviews Genomics and Human Genetics 2008;9:49–69.
[25] Lieschke GJ, Currie PD. Animal models of human disease: zebrafish swim into view. Nature Reviews Genetics 2007;8:353–67.
[26] Carroll IM, Threadgill DW, Threadgill DS. The gastrointestinal microbiome: a malleable, third genome of mammals. Mammalian Genome 2009;20:395–403.
[27] Wooley JC, Godzik A, Friedberg I. A primer on metagenomics. PLOS Computational Biology 2010;6:e1000677.
[28] Rosario K, Breitbart M. Exploring the viral world through metagenomics. Current Opinion in Virology 2011;1:1–9.
[29] Kersey P, Apweiler R. Linking publication, gene and protein data. Nature Cell Biology 2006;8:1183–9.
[30] Villoslada P, Steinman L, Baranzini SE. Systems biology and its application to the understanding of neurological diseases. Annals of 2009;65:124–39.
[31] Khoury MJ, Rich EC, Randhawa G, Teutsch SM, Niederhuber J. Comparative effectiveness research and genomic medicine: An evolving partnership for 21st century medicine. Genetics in Medicine 2009;11:707–11.
[32] Worthey EA, Mayer AN, Syverson GD, et al. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genetics in Medicine 2011;13:255–62.
[33] Bainbridge MN, Wiszniewski W, Murdock DR, et al. Whole genome sequencing for optimized patient management. Science Translational Medicine 2011;3:87re3.

Note: All web-based references accessed on 16 Feb 2012.

CHAPTER 5

Delivering Genetics and Genomics Direct-to-Consumer

OUTLINE

Introduction
Definitions and Marketplace
Types of DTC DNA Tests
Pros and Cons
Benefits
Concerns
Practice
Ways Forward
Regulation
Self-Regulation
Professional Standards
Education and Research
Direct-to-Consumer Advertising
Future
References

INTRODUCTION

Clinical training involves many interactions with patients. In contrast, laboratory medicine training, if given at all prior to specialization, conforms to an outdated physician-patient relationship with the latter expected to play a passive role (Figure 5.1). This becomes particularly noticeable in genetics or molecular medicine, as patients often know a lot about their health issues from the media, Internet or their relatives.

The traditional physician-patient-laboratory paradigm continues to evolve in molecular medicine with the physician-laboratory link becoming closer and more interactive. It is now time to rethink the patient’s role. The drivers for change are:

1. Technological advances;
2. The emerging importance of the Internet;
3. Expansion of the non-medically trained workforce;
4. Relatively poor understanding of molecular medicine by some health professionals, while the lay person is becoming more aware of new developments, and
5. Expansion in direct-to-consumer (DTC) DNA testing.

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00005-0 © 2012 Elsevier Inc. All rights reserved.

[Figure 5.1 panels. Top: Patient, Physician, Laboratory. Middle: Laboratory, Consumer. Bottom: Laboratory, Consumer, Health Professional.]

FIGURE 5.1 Different relationships between the patient, physician and laboratory. Top: The traditional approach to laboratory medicine places the physician in the center with the patient to physician and physician to laboratory interactions separated. Middle: In contrast, DTC DNA testing empowers the consumer to take responsibility which is appropriate and to be encouraged. However, it fails in how it does this because two key elements are missing: (1) There may be no mechanism for professional advice or support. (2) The consumer is vulnerable when the service is delivered from an offshore facility. Bottom: This continues the DTC theme but includes an ad hoc health professional (not necessarily a physician) who can be accessed by the consumer. Is this suitable in a rapidly changing field if competence cannot be gauged and long term follow-up is not possible? It would be difficult to seek justice for wrong advice if the health professional is located offshore.

Definitions and Marketplace

The term direct-to-consumer (DTC) DNA testing will be used here to refer to DNA genetic tests that a laboratory sells directly to a consumer. Apart from company employees (some of whom may be health professionals) there are no physicians involved. DTC DNA tests may also include those ordered by non-medical health professionals such as a pharmacist. In future, DTC DNA testing kits will be purchased and used by the consumer at home in a similar way to the accepted self-measurement of blood sugar for monitoring diabetes. Presently, this is not an option because the technology is limiting.

This chapter is relatively short but has not been included in Chapter 3 as part of the DNA genetic testing theme, because it is a new paradigm for medical service delivery. At present, there remains some confusion as to what is actually being sold in DTC DNA testing. This can only be determined by careful review of the claims and disclaimers on each company’s website. Is it the equivalent of a medical DNA test, from which health-related decisions can be made by the patient and family, or is it a product that is not meant to be used for medical purposes but simply provides information? The message on many websites is ambiguous. The overall theme promotes health-related choices and healthier living, whereas the disclaimers that follow state that the product represents information alone and should not be used for health-related decision making. An illustrative example modified from the website of one DTC DNA testing company states:

Our Tests: Discover your genetic predisposition to disease and take steps to maximize your health.

However, at the bottom of this page is a Disclaimer:

This service is not a test designed to diagnose, treat or prevent a disease or medical condition and is not intended to be medical advice. This service has not been approved by the (regulatory body) for diagnostic use.

In terms of consumer protection provided by regulatory bodies and legislation, it is essential to distinguish the DTC DNA tests sold to consumers located in the same country (legal protections are enforceable) versus those sold through the Internet by offshore based laboratories (where there is little if any legal protection for practical purposes) (Box 5.1).

At the beginning of 2000 the DTC DNA testing market was small, but expected to grow, as noted in the United Kingdom’s Human Genetics Commission 2003 report Genes Direct [1]. In the same year, the Australian report Essentially


BOX 5.1 REGULATORY ISSUES IN DTC DNA TESTING.

Each jurisdiction enacts legislation to protect its community by ensuring that DNA tests are safe (analytic validity) and benefit the patient (clinical utility and clinical validity). However, as demonstrated by the Department of Public Health in California in 2008, this is not easy. In this particular case, the regulator sent letters to a number of laboratories requiring them to cease and desist performing genetic tests without appropriate accreditation or ordering of tests by physicians. It is noteworthy that this order only applied to the testing of California residents, and presumably consumers from other states or outside the USA could continue to be tested. The FDA also attempted to review its regulation for genomic type tests and issued draft guidance on IVDMIAs (in vitro diagnostic multivariate index assays). These types of tests are exemplified by microarrays where multiple genes are tested and the results are analyzed to give a composite risk factor for the patient. However, a few years after the draft was released, the FDA withdrew the IVDMIA draft guidance in order to develop a broader, more comprehensive document for all laboratory developed tests. IVDMIA-type tests are complex and expensive and so not usually provided DTC. Even if an effective national regulatory regime were able to be identified for DTC DNA testing, the global marketplace in which these tests are provided will make regulation a major if not insurmountable challenge. In 2010, changed circumstances made the FDA revisit the issue of regulation in DTC DNA testing, and this is discussed in the text.

Yours: the protection of human genetic information identified DTC DNA testing as a potential problem, and made recommendations on how to proceed, in the context that only a handful of laboratories were actively involved but more would follow [2]. Both were correct because in less than a decade the scene had changed significantly, with around 20 DTC DNA testing laboratories operating in the USA by 2011 [3]. More noteworthy than the number of laboratories is the range of DNA tests offered. Some reasons for the rapid growth in DTC DNA testing are suggested in Table 5.1.

Types of DTC DNA Tests

There are a number of DNA genetic testing products for sale:

• Genetic disorder or trait – Confirming a clinical diagnosis or predicting susceptibility to disease or traits.
• Pseudo-medical testing – Dermatogenetics (DNA information for cosmetic purposes) or nutrigenetics (DNA information to improve well being through dietary strategies).
• Recreational genetics – Tracing distant ancestors or kin, assessing ability such as sport.
• Relationship testing – Applications include paternity or immigration testing. Microbiological testing can also use pattern testing and comparisons.

The UK’s Human Genetics Commission has attempted to give a comprehensive classification of genetic DNA tests into 11 distinct classes [4]. Table 3.7 goes further, employing 14 classes. However, these classifications are complex since there is overlap between them. While it might be helpful for experts to deal with subtle differences between DNA tests, it can be confusing to others, and so the four broad groups described above are preferred in this chapter.


TABLE 5.1 Growth in the DTC DNA testing industry over a decade.

Issues contributing to the development of DTC DNA testing

The DNA testing industry is a new entrepreneurial and potentially lucrative commercial application of molecular medicine. It is likely to continue growing as genomics moves to the more common public health problems such as diabetes.

The media helps with regular reporting of new genetic discoveries and public interest is maintained.

The community is increasingly using the Internet to purchase a range of goods and the ease with which this occurs is appealing. Not surprisingly, the option to order a test and collect the DNA samples required at home is also preferred.

There can be additional financial incentives if DTC DNA testing companies sell products linked to the test such as skin creams or dietary supplements. These products are advertised as being personalized based on the customer’s DNA profile.

As the costs of healthcare increase and the population ages, the emphasis on prevention becomes a key message from government. This resonates nicely with DTC DNA genetic testing with its promise of predicting risk factors before development of disease.

The conventional paradigm illustrated in Figure 5.1 that the patient reports to the medical practitioner who deals exclusively with the laboratory doing the test is outdated in the era of the Internet with its options for blogging, chatting and live interactions.

Although paternalistic attitudes from health professionals are no longer acceptable, the individual demand for greater independence and autonomy may not fit in well with the traditional patient-physician relationship in laboratory medicine.

The philosophy of DTC DNA testing companies that individuals must be more involved and responsible for their health particularly when it comes to the personal issues of genetic information and genetic health is correct and consistent with the many social interactions possible through the Internet.

The DTC process is superficially more secure because the individual orders the test and he or she gets the result. This would be particularly the case if disclosure of a DNA test result might impact negatively on employment, health or life insurance.

The medical impact of DTC DNA testing must also consider what is known about a disorder and what effective interventions can result from this knowledge. This is exemplified by two extremes in diabetes. There exists a rare but important subgroup called MODY (maturity onset diabetes of the young) which has Mendelian genetic inheritance and so is caused by mutations in a single gene. In contrast, type 2 diabetes (T2D) is a global health problem resulting from genetic and environmental (G x E) interactions (Table 2.8). In type 2 diabetes, there is an expanding list of genes or DNA SNPs (over 40) that may contribute to this disorder, but knowledge of their effects (or what to do with the genetic information) has not yet progressed beyond research hypotheses (Figure 5.2). As discussed in Chapter 3 and exemplified further in this chapter, the health significance of a mutation in a Mendelian disorder is very different to finding genetic associations in the more complex genetic disorders. These distinctions are not highlighted by DTC DNA testing companies.

PROS AND CONS

There are numerous commentaries on the risks and benefits of DTC DNA testing. These will inform but also confuse if they compare apples and oranges. For example, the public is given a glimpse of DNA testing in forensic investigations in popular TV shows. The DNA test looks clean and quick with no problems, and invariably promotes a good ending as


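[Figure 5.2 diagram: DNA genetic tests and disease, with clinical and laboratory implications, comparing mutations in MODY with association in T2D across four questions.]

Question to ask: Is the test useful?
  Mutations in MODY: Yes, although MODY is rare
  Association in T2D: No, because test is population based

Question to ask: What does the result mean?
  Mutations in MODY: Treatment will change
  Association in T2D: The result is not clear for the individual

Question to ask: Is counseling/support needed?
  Mutations in MODY: Yes, because Mendelian disorder
  Association in T2D: Uncertain if not clear what results mean

Question to ask: What are implications for family members?
  Mutations in MODY: Yes, others might have the same mutation
  Association in T2D: Uncertain if not clear what results mean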

FIGURE 5.2 Clinical and laboratory implications for a disorder such as diabetes and DNA genetic testing. For any DNA test, there are four questions the physician must ask. In DTC DNA testing it is the consumer who does this. For MODY, the questions are more straightforward and answerable since mutations in the glucokinase gene provide a diagnosis. The DNA test is also important because it alters treatment. For type 2 diabetes (T2D) current DNA-based knowledge will have little if any clinical utility or even clinical validity because it is thought T2D is caused by interactions between at least 40 genetic markers and the environment. Each of the presumed T2D genes is considered to have a small but cumulative effect on disease development.

the culprit is apprehended. The same type of DNA profiling is also used for paternity testing or kinship testing but each of these scenarios has different consequences. These differences can be exacerbated further by the way the test is provided; whether through the traditional laboratory route or via DTC DNA testing. In other words, a DNA test can be used for multiple purposes ranging from medical to non-medical applications. Hence, it is not the test itself that is the key issue in the debate about DTC DNA testing but more importantly the reason it was undertaken, how it is provided, and what use that knowledge is in terms of healthcare.

Benefits

Few would argue that purchasing a DTC DNA test via the Internet is easy. Traditional laboratory services should take note and learn. In countries where the potential for discrimination in employment, health or life insurance has not been addressed, the DTC approach could be seen as a way of bypassing the family (primary care) physician, who may be obliged to release medical information about the patient including DNA test results. There are many personal issues and freedoms expected by members of the modern community that are better addressed through direct-to-consumer DNA testing (Table 5.1). Nevertheless, perceived benefits need

to be balanced with risks – could this approach adversely affect the physical or psychological well being of an individual and his or her family?

Concerns

DTC DNA testing for two of the four classes listed earlier (pseudo-medical and recreational genetics) is not the focus of this chapter. These types of DNA genetic tests can be fun applications and should not lead to harm unless an extreme diet or intervention results. At worst, the consumer will lose money. Therefore, caveat emptor is the overarching principle in these tests, and truth in labeling together with a better educated community is the way forward. Consumer protection laws can always be tightened if necessary (although as mentioned earlier this is only relevant to DTC DNA testing companies operating within the country or jurisdiction). Nevertheless, a sobering example of how DTC DNA testing misled consumers is provided by two USA Government Accountability Office reports which followed audits of companies providing this service through the Internet (Box 5.2). Paternity and immigration (relationship) testing can be purchased but whether the courts will accept these results determines their commercial viability.

For the remaining category, involving medically relevant tests, there are eight issues that need to be addressed.

BOX 5.2 2006, 2010 AUDITS BY THE US GOVERNMENT ACCOUNTABILITY OFFICE (GAO).

In providing testimony before a US Senate committee, the GAO reported on a study it had undertaken to monitor four DTC DNA testing companies which provided nutrigenetic testing [5]. The companies claimed that by testing DNA it was possible to identify nutritional or life-style changes that had health implications. To test this, the GAO took DNA from two unrelated women and one male and used this DNA to make up 14 fictitious individuals with different ages, weights and life styles. The 14 samples were sent to the companies and the results obtained were reviewed by the GAO which found that:

1. Despite the companies issuing numerous disclaimers that their DNA tests were not intended to diagnose disease, their reports identified all 14 individuals as being at risk of contracting a range of medical conditions, including osteoporosis, cancer, type 2 diabetes, hypertension and others;
2. Two of the four companies also supplied expensive dietary products that were purported to be selected as being beneficial on the basis of the DNA profiles. However, the GAO found that the products simply contained multivitamin combinations that could be purchased much more cheaply;
3. Despite the DNA samples coming from only three individuals, the data generated were inconsistent and even different for the same DNA sample, and
4. One laboratory was not appropriately accredited.

In 2010, the GAO revisited the DTC DNA testing industry by auditing the performance of four laboratories. Its report showed that little if anything had improved and now inappropriate behavior had moved from pseudo-medical DTC DNA tests to include medically relevant ones [6].


Selecting the Right DNA Test

It can be difficult to know which DNA test will be clinically useful. This will depend to some extent on clinical circumstances, family history and the health professional’s understanding of what is available and what might be informative. For example, the finding of a homozygous p.Phe508del mutation in the CFTR gene will indicate that a sick child with the relevant signs and symptoms has cystic fibrosis. On the other hand, the same test in an adult with abdominal discomfort and no other relevant past medical history would be a waste of time and money because it will not provide any useful information about the clinical problem; i.e. the cystic fibrosis DNA test only has clinical utility in the appropriate clinical context. This DNA test might also be harmful in the latter example if a negative result is misinterpreted as excluding pancreatitis (a manifestation of cystic fibrosis).

Interpreting the Results

Understanding what the DNA test result means can be reasonably straightforward in some circumstances, or a challenge for both the health professional and the patient in others. Three examples of increasing complexity are given below:

• Huntington disease is an autosomal dominant genetic disorder with 100% penetrance. The DNA test for this disorder measures the expansion in CAG triplet repeats in the HTT gene (Table 2.4). A result of 40 or more repeats means that Huntington disease will develop, and 26 or fewer repeats is normal. However, there is an intermediate zone between these numbers that involves uncertainty. Working out what this result means requires skill, and if necessary, help from professional colleagues.
• Interpreting the result of DNA testing in breast cancer by looking for mutations in the BRCA1 or BRCA2 gene is considerably more problematic. There are many other genes involved in breast cancer, hence the penetrance is between 36% and 85% (Chapter 7), and there are G x E interactions. Furthermore, mutations in the two breast genes are often single base changes that produce different amino acids, the significance of which can be uncertain; i.e. this is an instance of variants of unknown significance or VUS (Chapter 3).
• The most challenging tests to interpret are those looking for susceptibility in complex genetic disorders, such as type 2 diabetes, illustrated in Figure 5.2. This is because the concept of risk is more difficult to grasp as it is small (x2 for example) and combines with other risks, both genetic and environmental. Since risks used in susceptibility tests are based on large population studies, there are problems assuming the same risk applies to an individual, even without taking into consideration stratification issues related to ethnicities and populations.

For the experienced clinician and other health professionals involved in molecular medicine, assessing risk and explaining this to a patient and his/her family are not easy tasks, and one has to wonder how a member of the community who is not trained in risk calculations fully understands the implications (Figure 5.3). Add to this the finding that even laboratories providing the same test can come up with different risks depending on what DNA markers were used (Box 5.3).

Laboratory Standards

Is the DNA testing laboratory accredited and does it have the skills to perform a DNA genetic test? This would be easier to determine if dealing with a local laboratory rather than one operating overseas through the Internet. Concerns about an unregulated market led the OECD to release standards on quality assurance [8]. The requirements are demanding but


[Figure 5.3 flow diagram. Inputs: genome wide association study (OR), case control study (RR), and epidemiological study (average population risk). Outputs: absolute risk, adjusted life time risk.]

FIGURE 5.3 Odds ratios (OR) or relative risks (RR) obtained from case control or association studies. The OR or RR (explained in more detail in Table 3.5) obtained are multiplied by the average population risks known from epidemiological studies to produce an absolute or adjusted life time risk and this is usually given to the customer. As illustrated in reference [7], a relative risk of 1.5, defined by an individual having a particular SNP marker, when multiplied by a known population risk of 10%, would give the individual an overall absolute risk of 15%. Of course, the important question is how meaningful this number is to the individual (taking into consideration his or her environmental exposures and other possible risk factors including ethnicity). Even if the risk is real, another unknown is what impact the result will have in terms of life style changes or interventions to improve health.
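
The arithmetic described in the caption can be sketched in a few lines of Python. This is a minimal illustration only: the function name absolute_risk is hypothetical, and the sketch assumes, as the caption does, that a relative risk (or an odds ratio treated as an approximation of one) is simply multiplied by the average population risk. The cap at 100% is an added safeguard, not something stated in the text.

```python
def absolute_risk(relative_risk: float, population_risk: float) -> float:
    """Multiply a relative risk by an average population risk.

    Hypothetical helper for illustration. population_risk is a fraction
    between 0 and 1, so a 10% population risk is written as 0.10.
    """
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")
    if not 0.0 <= population_risk <= 1.0:
        raise ValueError("population_risk must be between 0 and 1")
    # Cap at 1.0 because a probability cannot exceed 100%.
    return min(relative_risk * population_risk, 1.0)


# Worked example from the caption: relative risk 1.5, population risk 10%.
risk = absolute_risk(1.5, 0.10)
print(f"Absolute risk: {risk:.0%}")
```

The calculation is trivial, which is the point made in the caption: the number is easy to produce, but it takes no account of environmental exposures, ethnicity or other risk factors for the individual.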

BOX 5.3 THE CHOICE OF DNA MARKERS IN COMPLEX DISEASE CAN INFLUENCE RESULTS OF DNA TESTING.

A 2009 article in Nature compared two DTC companies and made recommendations on how they might perform better [7]. The study showed the companies performed very well in terms of the actual DNA test – i.e. the same answer was obtained in 99.7% of cases. So the safety or quality of the DNA test itself (analytic validity) was excellent, although it should be noted that the two companies were leaders in this field. In contrast, the ways in which the companies interpreted the risks indicated by these tests were a concern. This is illustrated in Table 5.2. There were a number of significant inconsistencies between the two companies in terms of the degree of risk of contracting the same serious diseases that was reported to customers. The report noted that the main reason for this discrepancy was the selection of SNP markers used in risk estimation. Companies chose SNPs based on population research studies and then applied these results to individual cases. The SNPs might be the same or even different for the same disease. Another issue affecting risk determination was the type of population used to calculate the average population risk. In this particular example, one company considered population risks in terms of males versus females while another company used age as the discriminator. In some respects this is déjà vu, because in the early days of forensic DNA testing, individual companies used their own patented sets of DNA markers and this led to confusion and inaccuracies because of the ways results were interpreted (see Chapter 9). The forensic problem was addressed when DNA testing markers were rationalized and came in commercially prepared kits. Now results across different DNA testing laboratories could be compared and appropriate QC started.
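
The gap between analytic agreement (99.7%) and agreement on interpretation can be made concrete with the counts in Table 5.2. The sketch below is illustrative rather than part of the cited study: the function names are hypothetical, the counts are transcribed from Table 5.2, and a dash in the table is assumed to mean zero.

```python
# Counts transcribed from Table 5.2 as (consistent, different) per disorder.
# Assumption: a dash in the table is read as zero.
TABLE_5_2 = {
    "Breast, colon or prostate cancer": (7, 3),
    "Autoimmune disease (SLE, RA)": (7, 3),
    "Celiac disease": (5, 0),
    "Crohn disease": (2, 3),
    "Type 2 diabetes": (2, 3),
    "Multiple sclerosis": (4, 0),
    "Restless leg syndrome": (2, 3),
}


def concordance(consistent: int, different: int) -> float:
    """Fraction of results on which the two companies agreed."""
    total = consistent + different
    if total == 0:
        raise ValueError("no results to compare")
    return consistent / total


def overall_concordance(table: dict) -> float:
    """Pool all disorders and return the overall fraction in agreement."""
    consistent = sum(same for same, _ in table.values())
    different = sum(diff for _, diff in table.values())
    return concordance(consistent, different)


for disorder, (same, diff) in TABLE_5_2.items():
    print(f"{disorder}: {concordance(same, diff):.0%} agreement")
print(f"Overall: {overall_concordance(TABLE_5_2):.0%} agreement")
```

Pooling the rows in this way is a simplification, since the disorders differ in how many markers contribute and in how risk was reported, but it shows how far interpretive agreement falls below the agreement on the genotypes themselves.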


TABLE 5.2 Results of DTC DNA testing for some important medical disorders (a) [7].

Disorder                            Consistent result (b)   Different result
Breast, colon or prostate cancer    7                       3
Autoimmune disease (SLE, RA)        7                       3
Celiac disease                      5                       –
Crohn disease                       2                       3
Type 2 diabetes                     2                       3
Multiple sclerosis                  4                       –
Restless leg syndrome               2                       3

(a) Results were provided as increased, reduced or same population risks.
(b) This means the number of patients given the same result by the two DNA testing companies. The third column shows how many patients were given different results for the same disorder.

appropriate. How these are implemented will depend on the regulatory agencies in each country. This can be confusing to the health professional let alone the consumer.

Research Versus Validated DNA Tests

One company places the 100 DNA tests for health issues and traits that it sells into two categories:

1. Established research reports (previously called Clinical reports) – conditions or traits for which there are multiple research studies published; i.e. associations that are regarded as reliable, and
2. Preliminary research reports – research that has not yet been confirmed by the scientific community.

In addition there is the comment that the company’s list grows every month as new research is published [9].

In mainstream medical practice, it would be difficult to justify the use of DNA genetic tests for clinical care solely on the basis of research reports, unless there was appropriate oversight by a research ethics committee (Chapter 3). This cautious approach does not presently apply to the DTC industry; although as indicated earlier the disclaimer that the DTC DNA test should not be used for clinical decision making seems to get around the issue. Nevertheless, regulatory agencies will need to consider the apparent differences between mainstream DNA genetic testing and that provided DTC to bring the latter more into line with standards that ensure the purpose of the DNA test and what it is providing in terms of healthcare are more transparent.

There is a view held by some in the DTC testing industry that all members of the community are entitled to genetic information even if it is incomplete or preliminary. Others would say that it is unethical to provide information that is wrong, incomplete, or could lead to harm [10]. In Chapter 3, Figure 3.12, the DNA test development pipeline identified steps to take before the test is allowed into the clinic. There is the risk that some of these are bypassed in DTC testing (Figure 5.4).

Privacy and Confidentiality

Superficially, a DTC DNA test undertaken through an overseas-based laboratory is attractive because it gives the customer the power to limit access to the result by others, even family members. However, this needs to be balanced by the reality that the individual’s DNA sample is now held by a third party, and there is little that can be done to retrieve or limit use of that DNA for other purposes. Some DTC DNA testing companies provide the consumer with the option to give consent for research studies, but what happens if the company decides to use the DNA sample for other purposes, sells the material to a third party, or ceases to trade?

Worried Well

The Internet is a growing and unlimited source of information. Some will be accurate


[Figure 5.4 diagram. Labels: More rapid translation; Industry; Evidence-based practice; Research; New DNA Tests; Evaluation; Service Laboratory; Clinic; Research; DTC Laboratory.]

FIGURE 5.4 The pipeline for developing a DNA test can be significantly truncated for DTC DNA tests. Research funders expect discoveries to be translated more quickly into clinical practice. However, this should occur once there is the evidence that harm is prevented or appropriate therapy can be started. Health dollars should only be spent on proven practices. DTC DNA testing is attractive because it pushes the translational aspect. Indeed it demonstrates potential new ways in which this might occur, e.g. by networking customers and so increasing the pool of research participants. The benefit needs to be weighed against the risk of moving into what is the equivalent of a clinically-based test before sufficient evidence can be accumulated to demonstrate clinical utility as a minimum. This is particularly important when dealing with co-dependent technologies (companion diagnostics) (Chapter 7).

and useful while other material has the potential to cause harm. There is little that can be done about this, apart from ensuring that the community is sufficiently educated and sophisticated, so that data from the Internet are viewed with a healthy degree of skepticism. In terms of healthcare, it is important that the Internet does not produce a cohort of worried well within the community.

DTC DNA testing has the potential to add to the worried well through unnecessary testing or inaccurate DNA test results initially affecting the individual and then flowing on to family members. The family physician will be the person who has to deal with this problem and so needs to be aware of DTC DNA testing, including what can or cannot be provided in terms of health issues [11]. The worried well also have societal implications because they will utilize health resources that could be better directed to other needs.

Public Trust

One can be certain that opinions on DTC DNA testing will be passionate and contrasting. Equally certain is that the inappropriate use of DNA testing will diminish public trust, which may then have a negative impact on conventional DNA genetic testing and genetic research. This loss of trust will follow if claims consistently fail to deliver on promises. Further discussion on ELSI and DTC DNA testing is found in [12] and Chapter 10.

Genetic Counseling

There is general agreement that genetic counseling of some form is needed in relation to DNA genetic tests. Does this have to be both pre- and post-test counseling? How intensive must it be, and who gives it – is the physician the right person, or is it necessary to have professional genetic counselors? Another question might be does the counseling need to be face-to-face and one-to-one in view of the sensitivity of some issues that are discussed? The answers to these questions depend on the context of the DNA genetic test, and to some extent the resources available.

MOLECULAR MEDICINE 5. Delivering Genetics and Genomics Direct-to-Consumer 163

The DTC DNA testing industry initially ignored the counseling issue, but now some companies are responding to this concern by linking their laboratory services with Internet or telephone-based counseling, which is provided for an additional charge. This appears to address the question of counseling, if appropriately qualified staff are available to answer questions from consumers [13]. Nevertheless, the effectiveness of such telephone or Internet-based counseling needs to be assessed, particularly when delivered to customers from a distance, and when family physicians have been bypassed and so are unlikely to be helpful. The customer may not have access to legal protection from incorrect or inappropriate advice.

Psychiatry Practice

An overview of the impact of genetics on psychiatry practice is given in reference [14]. Included are various patient and family attitudes to DNA diagnostic or predictive testing in serious mental illnesses, such as bipolar disorder, schizophrenia and depression. Not surprisingly, individuals were more interested in DNA genetic testing where the results were more indicative of disease than results that were lower in probability – i.e. in this cohort of patients and families, information per se was not considered as useful as a DNA test result that was clinically meaningful. This report concluded that there is currently little evidence of clinical utility or validity in DNA genetic tests for mental illness. Yet DNA tests for bipolar disorder and schizophrenia appear fairly regularly in the tests offered DTC.

WAYS FORWARD

Because of the difficulties regulating the Internet, a mix of approaches is considered the most appropriate way to deal with DTC DNA testing. This is consistent with the recommendations of the UK’s Human Genetics Commission (discussed below) and the view of the American Society for Human Genetics, that a one size fits all approach will not work for DTC genetic testing.

Regulation

Governments will be slow to introduce new legislation until problems are clearly defined, a legislative solution can be seen and the community response is loud. This is particularly relevant to molecular medicine, where it can be expected that changes will continue to emerge and new laws may soon become outdated, or even cause further problems through unforeseen circumstances. The importance of not inhibiting innovation and new developments is an additional consideration.

The DTC industry has grown with minimal interference from government, apart from a few temporary setbacks with the regulatory authorities in New York and California and a warning from the FDA in 2006 following a report from the US Government Accountability Office (Box 5.2). However, the landscape changed in 2010 with some well publicized events including:

• Poor laboratory practice, when a DNA plate was reversed and the results for numerous customers of one DTC DNA testing service were incorrect. The error and subsequent publicity also demonstrated that this high profile company did not actually do the laboratory work but subcontracted it to another laboratory.
• The publication of a study showing two DTC DNA testing services were able to get the laboratory component of the DNA test correct, but made a number of errors in the interpretation of the results for some clinically important diseases, including cancer (Table 5.2).
• An announcement that DTC DNA testing kits providing information on medical disorders would become available through a US-based drugstore chain.

• A second and very critical 2010 report by the US Government Accountability Office, describing unethical and even illegal behavior detected during an audit over 12 months (Box 5.2).

The US Congress and the FDA had to respond to the above. A summary of the regulatory versus non-regulatory (usually advisory) options also featured as an opinion piece in Nature, although no new insights into how to deal with cross-border issues were proposed [15]. In early 2011, the FDA appeared to be moving towards more stringent regulatory requirements for DTC DNA testing, making it necessary for medical tests to be ordered by physicians who would also be involved in their interpretation. Laboratories would need appropriate certification when carrying out medical tests. This new direction by the FDA has not been popular with industry or some scientific organizations, with concern expressed about stifling individual rights, business and innovation. How far the FDA will take this matter remains to be seen.

In mid 2010, the Australian equivalent of the FDA (TGA – Therapeutic Goods Administration) introduced new regulations for IVDs (in vitro diagnostic devices) with genetic tests being placed in class 3 in a risk classification of 1 to 4 with 4 being the highest public health risk group. It was also stated that self-testing for serious medical conditions would not be allowed. Self-testing would include products purchased in a store or online where there was no medical professional involved. This would seem to exclude DTC DNA testing for serious medical disorders – at least those performed in Australian-based laboratories. However, it appears that disclaimers and confusion about what is actually being sold might circumvent this policy.

Self-Regulation

Self-regulation is often preferred over a legislative solution, particularly in a changing landscape. It might be the only relevant approach for services offered via the Internet from offshore locations. To progress self-regulation, the UK’s Human Genetics Commission developed Principles to guide the behavior of DTC laboratories (Table 5.3). Their purpose was to address the gap between regulations across jurisdictions and promote consistency and high standards in this market. Whether these will be effective in terms of self-regulation is difficult to predict. The Principles are predominantly directed to laboratory activities. They might also help regulators to draft new or strengthen existing legislation to protect their own communities. It will be interesting to see what level of compliance occurs with offshore Internet-based providers.

An editorial in the Lancet welcomed the Principles, but suggested that without oversight by a regulatory body they could not be enforced and so are unlikely to change practices [16]. Some expectations in the Principles were also a little unrealistic, such as the requirement for confirming the identity of the person providing the biological sample. This is certainly a good idea but not generally expected from mainstream laboratories, and it is difficult to see how this might be verified in the case of a sample sent from a distant location.

Professional Standards

DTC services can circumvent the requirement for a health professional to order the test in a number of ways. One example involved the director of a high profile company suggesting that since he was a medical practitioner he could veto any requests, so in effect he was ordering the tests. Another approach was to provide prospective consumers with a list of company-affiliated physicians who would arrange the referral. Family physicians are not needed in this circumstance. However, nothing has changed in terms of whether this is a real DNA test or not, and in addition there is the


TABLE 5.3 Common framework of principles for DTC genetic testing services [4].

Principle | Components

Marketing/advertising | Transparent evidence used in test selection; truth in advertising; adherence to regulatory requirements.
Regulatory information | Evidence should be provided for association based tests including what has been published.
Information for consumers | Issues that need to be addressed with reports, risk calculations, duration of sample storage, will samples be used for other purposes, counseling implications, complaints mechanisms and so on.
Counseling and support | Provision of information about pre- and post-test counseling by suitably accredited health professionals.
Consent | Consent including confirmation person ordering the test is the one providing the biological sample. Testing minors/individuals with diminished capacity particularly in high impact DNA genetic tests.
Data protection | Appropriate protection to ensure privacy and confidentiality.
Sample handling | ELSI and professional standards apply to the use, storage, transfer and disposal of biological samples. Nature and duration of storage should be identified and what would happen if the company were to cease trading.
Laboratory processes | OECD recommendations for quality assurance should be followed.
Interpreting test results | Depending on the category of DNA tests, qualified professionals should be involved in the interpretation of the results. Risk assessment type tests should be based on scientifically sound algorithms.
Provision of results | Considerations: how tests results are issued, to whom and their impacts on the customer and his/her family. Some form of evaluation is required to gauge how customers understand information and test results provided.
Continuing support | Information provided with test result: how customers can access further professional input including any subsequent questions they might have.
Complaints | Satisfactory complaints process for dissatisfied customers.

problem that a health professional who should know better is ordering what might be a dubious test. At least if something goes wrong the health professional, provided he is not located offshore, might need to take some responsibility.

Medical colleges and other professional organizations have not consistently taken up the challenge of DTC DNA tests. In particular, development of policy statements and professional education is necessary to ensure all members are familiar with this type of testing.

Two standards in relation to duty of disclosure are now seen with DNA genetic testing. The medical standard requires all relevant information to be provided, so an informed choice can be made by the patient. Arguably an extreme example of this is Rogers v Whitaker, in which the High Court of Australia ruled that patients should be fully informed of the risks involved in a procedure, even if they are very low (in this case the risk for the particular serious complication was about 1 in 14 000). In this environment a physician cannot be paternalistic or less than accurate in reporting risks, as the consequences of failure would be dealt with by the courts. In contrast, the business standard with DTC DNA

testing is vague, with disclaimers and careful selection of wording or claims in advertising material.

Education and Research

Consumers need information about DTC genetic testing in a variety of formats and from trusted sources to help them appreciate the benefits versus the risks, and so informed choices can be made before using these services. Healthcare professionals, particularly family physicians, must also understand this approach to DNA testing so they can more effectively engage with their patients.

There are challenges in moving forward. The rhetoric now needs to be supplemented with research to determine if there are risks with DTC DNA testing. Included would be finding better ways to communicate risk to both health professionals and patients, and determining what consumers understand in terms of DTC DNA testing [10]. It would be helpful to know how many consumers use DTC DNA testing, how results are viewed or interpreted and what psychological or other impacts, if any, do these types of tests have on customers and their family members. Ultimately, the most important question is: does information obtained from DTC DNA testing make any difference to the behavior of those tested, particularly in terms of preventive medicine?

One study attempted to answer some of these questions, and it demonstrated the problems with this type of research. It followed about 2 000 consumers who purchased a commercial genome-wide risk assessment scan. From a relatively short follow-up, around five months post-testing, it appeared there were no untoward effects, either physical or psychological, from the testing. However, the study design did not attempt to assess the clinical validity or utility of the testing offered, and 44% of consumers dropped out of the study, so the final group that was evaluated was to some extent selected and perhaps not representative of the broader cohort. Since the follow-up was relatively short, the impact that DTC DNA testing had on possible medical interventions could not be assessed [17].

Direct-to-Consumer Advertising

A related and growing issue is direct-to-consumer advertising (DTCA), which has already provoked controversies in cosmetic surgery, and more recently in regenerative therapies. It uses sophisticated media advertising which resonates with members of the community who want to take greater responsibility for decision making. This is particularly relevant to personalized medicine. The development of the Internet and the various social interactions possible in today’s media provide the environment for more aggressive DTCA, often within a background of information that is not directly health related. The noise generated hides what is actually deliverable.

The DTCA issue is well summarized in terms of cancer, an emotive and important issue for many in the community [18]. Like DTC DNA testing, there are pros and cons that individually make sense, although it is more difficult to move from broad aspirational goals to how these will make a difference to healthcare. The cancer study quoted makes the interesting point that while all members of the community are equally exposed to DTCA, only those who have access to the Internet can obtain further information that will help in decision making or understanding the implications. Those disadvantaged or less affluent will miss out.

As indicated earlier in this chapter, there are around 20 DTC DNA testing companies although 12 months earlier the number had peaked at around 30. The change comes about because more companies are now requiring physician-referrals although they advertise directly to the consumer [3]. This is a welcome trend and places additional emphasis on having an educated medical workforce.


Future

The next phase in DTC DNA testing is whole genome sequencing (Chapter 4) and this has started with $999 being quoted for a whole exome sequence – i.e. for sequencing all the exons in all 20 000 genes. This is cheap if one compares the figure with BRCA1, BRCA2 testing which costs at least $2 000 depending on where it is done.

One DTC company has also given a preview of how DNA test results might be provided in the future. It has moved from offering DTC whole genome sequencing to DTC interpretation of whole genome sequencing. Customers will bring their whole genome sequence that has been obtained elsewhere, and the company will interpret it and issue a report. Periodically the customer returns to check on new information that will have emerged and the report is updated and re-issued [19]. From a business perspective this is attractive since it ensures a regular source of income. It also helps to address a problem that will face laboratories as large data sets are generated from omics approaches. With these will come many VUS. How are these managed in terms of updating patients and referring on to clinicians new information that may be relevant to the VUS? We might learn something from the DTC model!

References

[1] More Genes Direct – 2007 publication developed from an earlier document titled Genes Direct. www.hgc.gov.uk/client/document.asp?DocId=139&CAtegoryId=10
[2] Essentially Yours: the protection of human genetic information in Australia. www.austlii.edu.au/au/other/alrc/publications/reports/96/
[3] Genetics & Public Policy Center providing a list of DTC DNA testing companies. www.dnapolicy.org/images/reportpdfs/NewMethodsForDTCTable_updated_Jan2012.pdf
[4] UK’s Human Genetics Commission – A common framework of principles for direct to consumer genetic testing services. www.hgc.gov.uk/client/Content.asp?ContentId=816
[5] United States Government Accountability Office 2006 report: nutrigenetic testing: tests purchased from four web sites mislead consumers. www.gao.gov/new.items/d06977t.pdf
[6] United States Government Accountability Office 2010 report: direct-to-consumer genetic tests. www.gao.gov/new.items/d10847t.pdf
[7] Ng PC, Murray SS, Levy S, Venter JC. An agenda for personalized medicine. Nature 2009;461:724–6.
[8] 2007 OECD guidelines for quality assurance in molecular genetic testing. www.oecd.org/dataoecd/43/6/38839788.pdf
[9] US DTC DNA testing company 23andMe. https://www.23andme.com/health/all/
[10] Caulfield T, Ries NM, Ray PN, Shuman C, Wilson B. Direct-to-consumer genetic testing: good, bad or benign? Clinical Genetics 2010;77:101–5.
[11] Edelman E, Eng C. A practical guide to interpretation and clinical application of personal genomic screening. British Medical Journal 2009;339:1136–40.
[12] Hogarth S, Javitt G, Melzer D. The current landscape for direct-to-consumer genetic testing; legal, ethical, and policy issues. Annual Review of Genomics and Human Genetics 2008;9:161–82.
[13] Informed medical decisions – a direct-to-consumer genetic counseling service on the Internet. www.informeddna.com/
[14] Mitchell PB, Meiser B, Wilde A, et al. Predictive and diagnostic genetic testing in psychiatry. Psychiatry Clinics of North America 2010;33:225–43.
[15] Beaudet AL, Javitt G. Which way for genetic-test regulation? Nature 2010;466:816–8.
[16] New guidelines for genetic tests are welcome but insufficient. Lancet 2010;376:488.
[17] Bloss CS, Schork NJ, Topol EJ. Effect of Direct-to-Consumer Genomewide profiling to assess disease risk. New England Journal of Medicine 2011;364:524–34.
[18] Kontos EZ, Viswanath K. Cancer-related direct-to-consumer advertising: a critical review. Nature Reviews Cancer 2011;11:142–50.
[19] The Human Genome Interpretation Company. http://www.knome.com/

Note: All web-based references accessed on 20 Feb 2012.

MOLECULAR MEDICINE CHAPTER 6 Public Health, Communicable Diseases and Global Health

OUTLINE

Public Health
  Introduction
  Preventive Medicine
  Population Screening
  Changing Behavior
  Workplace
Communicable Diseases
  Detection
  Pathogenesis
  Emerging and Re-Emerging Infections
Global Health
  Non-Communicable Diseases
  Obesity
  Nutrigenetics and Nutrigenomics
  Bioeconomy
References

PUBLIC HEALTH

Introduction

Public health is a community-based strategy to improve health and well being, and to prevent disease through research, policy, education and appropriate practice. It is very different to personalized medicine which focuses on the individual. Common to both is the potential for DNA-based information to enhance clinical care. Fundamental to the practice of public health is epidemiology – the study of the causes, distribution, control and prevention of diseases within a population. Until fairly recently epidemiologists relied on traditional approaches and measures of population health. Now an additional dimension is available through molecular (DNA) testing.

The potential to study the interactions between genes and environments is a powerful instrument for those whose research and clinical focus is a population. Tools used by geneticists, such as genome wide association studies (GWAS) and DNA banks are now accepted as legitimate methodologies for research undertaken by public health professionals and epidemiologists. A flavor of what is possible in

public health genomics can be found in the Centers for Disease Control and Prevention website [1].

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00006-2 © 2012 Elsevier Inc. All rights reserved.

Preventive Medicine

The concept of prevention is a gold standard in public health, moving the focus from treating an established disease to maintaining well being, and avoiding disease or delaying its onset. As well as research and education, prevention requires appropriate interventions. In the prevention of a disease such as cervical cancer, screening is a core activity, but there is another preventive focus which is directed at finding risk factors, for example elevated blood cholesterol. The social determinants of disease, particularly in developing countries, and the underprivileged, under-resourced or minority groups are also being emphasized and factored into prevention strategies.

There are multiple preventive approaches (Table 6.1), and prevention is not all-or-nothing as interventions are possible during different phases of pre-disease and disease development (Figure 6.1). This expanding view of prevention is considered by some to be a weakening of the concept, and even the terminology is confusing with public health and population health used by some to mean the same thing (the pragmatic approach adopted in Molecular Medicine) while others distinguish the two [2]. In this complex mix, the application of molecular (DNA) knowledge provides additional options for prevention strategies; from the earliest possible detection of disease development to novel therapies.

TABLE 6.1 Preventive measures can be implemented at different stages of disease [2].

Prevention | Description | Examples
Primary | Promoting health prior to development of disease or injuries | Immunization, health promotion campaigns including anti-smoking and healthy diet choices.
Secondary | Detecting disease in its early (asymptomatic) stages | Screening, case finding, early detection.
Tertiary | Reversing, arresting or delaying progression of disease | Preventing complications of chronic diseases such as diabetes including rehabilitation.
Quaternary | Avoiding consequences related to overmedication, overdiagnosis or incidental findings, e.g. imaging. | Availability of medical based information from the Internet leading to an increase in the worried well. Direct-to-consumer DNA genetic testing (Chapter 5).

Population Screening

There are different ways DNA genetic testing can be incorporated into population screening programs once established criteria balancing risk versus benefit are adequately addressed (Table 6.2). Screening programs developed through a public health perspective have as the primary focus the health of the community, while screening programs coming from a genetics viewpoint consider the individual’s rights to be paramount. Hence, the philosophy behind the consent process can be different. This is illustrated by the newborn screening program. From a public health perspective, screening newborns to prevent a serious disorder such as congenital hypothyroidism, with its associated severe intellectual impairment, can lead to important clinical outcomes. The screening test itself poses no risk to the newborn (compared to, for example, some vaccinations) and the benefits are significant. Therefore, what type of consent process is needed? The options vary from no consent (if newborn screening is mandated by law) to an opt-out consent process to a fully informed written consent (more

on consent in Chapter 10). Some argue that informed consent is essential because the parent(s) must be engaged to ensure they will provide regular supplementation for congenital hypothyroidism in an affected child. Others would say that the health implications of an affected child for parents and society make screening the highest priority. Less seems to be written about the child and who protects his or her right to a healthy life. Different arguments based on a public health versus a genetics approach are possible [4].

The types of screening programs available through DNA genetic testing are listed in Table 6.3. The utility of DNA-based strategies, particularly the potential for PCR to test many samples quickly and cheaply, has meant that widespread screening of a population becomes a practical consideration. As more genes are sequenced, the number of mutations identifiable by PCR will increase. Omics-based technologies will continue to expand the options for DNA testing.

FIGURE 6.1 Progression of disease with burden (Y axis) plotted against time (X axis). Three different disorders (green, red and yellow) develop over four time periods with gradual increase in burden of disease over this time. Ideally, detection of disease onset should occur at the earliest phase (1) rather than waiting until the burden is such that treatment becomes very difficult (phase 4). Some disorders lead to significant burden of disease (green) while others move along at a relatively lower level of activity (yellow). Preventive steps should be available for all four phases with the goal being to push back from 4 to 3 to 2 and ideally 1 to optimize treatment outcomes. DNA testing (screening) plays an important part in this push-back as it has the potential to detect genetic mutations that predispose to disease (germline DNA) or early signs of disease perhaps in somatic cell DNA in tumors (phases 2 or 3). While the focus is on DNA one should not forget that developments in other omics (particularly proteomics, metabolomics or epigenomics) can identify changes that assist earlier diagnosis.

TABLE 6.2 Some criteria that a disease should meet prior to it being considered suitable for population screening [3].

Category | WHO principles of early disease detection

Condition
• It should be an important health problem
• There should be a recognizable latent or early symptomatic stage
• The natural history should be adequately understood

Test
• There should be a suitable test (see [3] and Table 3.6 for criteria on DNA tests including sensitivity, specificity and so on).
• The test should be acceptable to the population

Treatment
• There should be an accepted treatment for patients with the disease

Screening program
• There should be an agreed policy on whom to treat
• Facilities for diagnosis and treatment should be available
• The cost of case-findings should be economically balanced in relation to possible expenditure on medical care as a whole
• Case-findings should be a continuing process

Two early examples of selective screening programs targeted to at-risk populations illustrate some advantages and disadvantages of this type of testing. Tay Sachs disease is a fatal neurodegenerative disorder of childhood. It is inherited as an autosomal recessive trait. Since the early 1970s, individuals at risk of having


children with Tay Sachs have had the opportunity for genetic screening and counseling. As a result, the incidence of Tay Sachs disease has been reduced without the societal problems that developed following the implementation of population screening for sickle cell disease.

TABLE 6.3 Some examples of DNA-based screening strategies.

Screen | Explanation
Family screening | Family members at increased risk for a genetic disorder can be screened and this information used for early interventions or for decisions in reproduction. This approach requires the family-specific DNA mutation to be known. For example, cystic fibrosis screening in parents and siblings can be implemented if a newborn child is shown to be heterozygous for the p.Phe508del mutation. Family screening has also been called cascade testing because once it is established which side of the family has the particular risk allele, screening via family tracing can be undertaken.
Population screening | (1) Pre-disease DNA testing – a contemporary DNA population screening dilemma is hemochromatosis. (2) Carrier testing for single gene Mendelian disorders such as Tay Sachs disease in Ashkenazi Jewish populations.
Newborn screening | An accepted approach to test for reversible or treatable genetic or congenital disorders in newborns without any specific risk. Does this by analyzing a drop of blood collected by heel prick.
Workplace screening | The options available are discussed in the text.

Sickle cell disease is an autosomal recessive disorder with around 100 million carriers worldwide, and 2 million in the USA, most of whom are African Americans. There can be considerable morbidity and mortality associated with those who are homozygous affected, although the ultimate outcome is not entirely genetic in origin, as environmental factors are important (more on this in Chapter 2). The US-based sickle cell screening program, which was also started in the early 1970s, targeted the at-risk African American population. The initial version of this program produced more harm than good. Results led to a lowering of self-esteem, overprotection by parents and discrimination. The discrimination came from employers, insurance companies, health insurers and potential spouses. Why did the two screening programs produce different outcomes?

One reason for successful Tay Sachs screening was the nature of the target group, which comprised individuals of Jewish origin who had better educational opportunities and social infrastructure. Another contrast between the two programs was the close community consultation undertaken prior to testing for Tay Sachs. Because of the problems associated with sickle cell screening, changes were made, including the removal of legal compulsion to be screened and improved counseling and education facilities. These enabled more successful testing to be pursued. Experiences with these programs illustrate the necessity for counseling and public education to explain the significance of mass screening results as key ethical considerations in design. Today, there are other population DNA screening dilemmas including: (1) Cystic fibrosis population screening, and (2) Sickle cell trait screening in sport.

Cystic Fibrosis Population Screening

Over 1 500 mutations in the CFTR gene produce cystic fibrosis, although p.Phe508del is the most common found in northern Europeans. Others are much less frequent. So how useful is a test that will not detect all those who are affected? For example, if only the p.Phe508del mutation is sought, false negative results in couples from a population with a frequency for this mutation of 70% will be 0.51 (1 − (0.7 × 0.7)) – i.e. approximately half the couples will not be identified by this approach. Detection of the less

MOLECULAR MEDICINE 6. Public Health, Communicable Diseases and Global Health 173

common mutations (some of which are only present at a 1–2% frequency in the population) would add to the workload, but would not substantially increase the information gained by the screening program. Additional problems that would need to be resolved before embarking on widespread cystic fibrosis screening include:

1. Uncertainty about disease severity for some mutations. Thus, counseling in a number of instances will be difficult and incomplete, and
2. Potential for racial profiling as cystic fibrosis is rarely found in some populations, for example Asian, and so detection rates in these will be minimal. Some would argue that one should not exclude or include particular ethnic groups in screening programs because this places undue emphasis on ethnicity and predisposition to genetic diseases. Others would say that disease and ethnic predisposition is a reality and, in the context of personalized healthcare, needs to be considered.

Debate continues about the value of cystic fibrosis mass population screening in contrast to testing individuals or at-risk families (selective screening). Even if laboratory facilities were available, major genetic counseling and public education efforts would be required to ensure that those tested fully understood the implications of the results. The financial resources needed to carry out a mass screening program would be enormous. In view of this, and the inability to detect all mutations with present technology, recommendations vary. In the USA the recommendation is for limited screening – perhaps of pregnant women, or selective screening of groups or families who are at higher risk than the general population [5]. Other countries do not recommend screening of pregnant women. A 2010 European consensus statement on carrier screening provides an in depth overview as well as a framework for what might be possible in member states [6].

Sickle Cell Screening in Sport
Population screening for sickle cell disease is in place in some newborn screening programs, particularly those involving at-risk populations. Sickle cell disease has potentially fatal consequences, but its effects can be ameliorated or avoided by early medical intervention including the use of antibiotics. DNA testing for the sickle cell trait is used in at-risk couples or populations, since offspring of an at-risk couple have a 1 in 4 chance of inheriting sickle cell disease.

As will be discussed below under Workplace, DNA testing can be used to screen selected populations to detect individuals who are at risk of a work-related illness. In this context, work can include sport. Since hypoxia is one precipitant for an acute attack in sickle cell disease, one might see justification in screening players involved in a sport likely to lead to hypoxia. What to do with this information could be problematic, but the issue is already facing some sporting bodies, as exemplified by the case of a 19 year old university student who died as a result of a rare complication of sickle cell trait, and the subsequent court action. In this case, the organization responsible for student sports at this level determined that sickle cell trait screening would become mandatory despite the trait, in contrast to the disease, rarely leading to serious medical complications [7].

Screening for a trait is another example of the public health versus the genetic approach, with the latter considering sickle cell trait to be a good trait since it has evolved with time to protect against malaria. Therefore, care is taken to avoid discrimination against or stigmatization of carriers. In contrast, the public health (or more likely in this case the medico-legal) perspective views the trait as a risk factor that needs to be screened for, to identify those who might need appropriate interventions or, more problematic, exclusion from a sport. It will be interesting to see how this controversial screening program for an autosomal recessive trait unfolds.
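The cystic fibrosis arithmetic quoted earlier in this section (a 70% mutation frequency giving a couple false negative rate of 0.51) can be reproduced with a short sketch. It assumes, as the figures in the text imply, that a couple is identified only when the mutation is detected in both partners and that detection in each partner is independent. The function name and its parameter are illustrative and are not taken from any screening program.

```python
def couple_miss_rate(detection_rate: float) -> float:
    """Fraction of carrier couples NOT identified by a screening panel.

    Assumption (illustrative model): each partner's mutation is detected
    independently with probability `detection_rate`, and a couple is only
    flagged when both partners test positive.
    """
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    both_detected = detection_rate * detection_rate
    return 1.0 - both_detected


# Compare the 70% figure from the text with two hypothetical wider panels.
for rate in (0.70, 0.80, 0.90):
    print(f"detection {rate:.0%}: couples missed {couple_miss_rate(rate):.2f}")
```

Under the same assumptions, even a panel detecting 90% of mutations would still leave about 19% of carrier couples unidentified, which illustrates why adding rare mutations yields diminishing returns.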


Newborn Screening
Taking blood from the newborn's heel to test for treatable and/or preventable medical disorders has been in place since the early 1960s. Initially this was undertaken with biochemical testing and then DNA analysis was added. Next, tandem mass spectrometry (Chapter 4) became possible, allowing metabolomic-type approaches to screening for amino acids, organic acids and fatty acid metabolism to be included [8].

Today, there is little dispute that screening newborns for treatable disorders such as phenylketonuria and congenital hypothyroidism is an important public health initiative. Less clear is the value of newborn screening for a variety of other conditions, including the hemoglobinopathies, galactosemia, maple syrup urine disease, homocystinuria, biotinidase deficiency, congenital adrenal hyperplasia and cystic fibrosis [5]. The options for screening have been further expanded by tandem mass spectrometry, with its potential to detect many metabolites both normal and abnormal [8]. The former is an important consideration, since false positive results from screening will place additional pressure on the health system as well as increasing the worried well (Table 6.1). The debate about informed consent, presumed consent or even legal compulsion in public health measures such as newborn screening will continue for some time.

Changing Behavior
The applications of molecular medicine in public health practice have introduced new options for preventive programs and interventions. However, changes will only occur if health professionals (starting with medical students) understand the implications and basis for molecular medicine and incorporate this knowledge into their work.

Will DNA based knowledge lead to better health choices by members of the community? Data on this are only now starting to be gathered. One review found evidence that DNA genetic testing for rare genetic variants, such as those in the BRCA1 and BRCA2 genes in breast cancer, does lead to changes, such as follow-up mammograms [9]. Less clear was whether this knowledge influenced the behavior of other at-risk family members. The health literacy of the population remains a critical factor in whether behaviors change. If so, statistics emerging from the same review are worrying; more than a third of US adults have limited health literacy and only about 12% have sufficient health literacy skills to understand this type of information [9] (see Chapter 10 for more discussion of education).

Familial Hypercholesterolemia
It is worthwhile concluding this section with a scenario illustrating "from the laboratory to the bedside", although in today's philosophy of avoiding hospitalization and expensive medical interventions we should be saying "from the laboratory to the community". The example is familial hypercholesterolemia (FH), an autosomal dominant Mendelian disorder which is reasonably common in many populations, affecting about 1 in 500 people in a country like the UK. Familial hypercholesterolemia is clinically important, as 50% of affected men will develop coronary artery disease by the age of 50, and 30% of women will do so by the age of 60 [10]. Heart UK also estimates that of the 120 000 predicted to be affected in the UK, only 15 000 have been identified [11]. Can public health measures utilizing DNA testing help to bridge this gap? Presently the standard criteria of family history, clinical examination and serum cholesterol measurement are insufficient, particularly if familial hypercholesterolemia needs to be detected earlier to optimize the effect of anti-cholesterol drug therapy.

Our molecular understanding of familial hypercholesterolemia started in 1972, when M. Brown and J. Goldstein used biochemical and cell culture approaches to study this disorder. Subsequently they showed that cholesterol

metabolism was controlled by the receptor for LDL (low density lipoprotein) and that abnormalities in it would lead to familial hypercholesterolemia. They were awarded the Nobel Prize in Physiology or Medicine in 1985 for their work. Once the LDLR gene for this disorder was isolated, DNA tests for a variety of purposes (diagnosis, prediction and screening) could be developed.

The addition of DNA testing in the management of familial hypercholesterolemia now improves the diagnostic accuracy, and the same test can be used to identify at-risk family members. However, this comes at a cost. DNA testing is not simple, as the LDLR gene is large and mutations are often family-specific. Therefore, DNA sequencing is needed and any changes found are not necessarily pathogenic in nature, but can be variants of unknown significance (Chapter 3). Mutations in other genes can also produce a similar clinical picture (phenotype). These include APOB, ARH and PCSK9 which interfere with the cholesterol pathway. Finally, environmental factors such as diet, smoking and hormones also impact on the cholesterol level. Thus, the costs and considerable work involved would need to be balanced against the clinical benefits of earlier diagnosis for individuals, families and the broader community.

Failure to make a diagnosis of familial hypercholesterolemia might have been less of an issue before cholesterol-lowering drugs such as the statins became available. Today, treating an individual with elevated cholesterol is very effective, and it is generally believed that intervening early avoids cardiovascular and related complications of familial hypercholesterolemia. In 2008 NICE (the UK's National Institute for Health and Clinical Excellence) published guidelines for a new approach to the treatment and diagnosis of familial hypercholesterolemia, which included personalized medicine through DNA testing of individuals and at-risk family members detected by cascade testing. In the Netherlands there is an ongoing, community-based, familial hypercholesterolemia screening service run by specialized nurses. It has produced some impressive detection rates which are expected to reduce morbidity and mortality in the longer term. The NICE guidelines allow a similar approach in other countries. It will be important to evaluate the clinical effectiveness of this preventive measure utilizing DNA testing.

Workplace
DNA testing in the workplace could be undertaken for:

1. Detecting predisposition to disease or injury because of genetic susceptibility;
2. Detecting exposure to toxins;
3. Litigation, and
4. Identity checks [12].

Detecting Predisposition to Disease or Injury
This is the most contentious of the four applications, since it implies that DNA genetic testing can predict who will develop an illness or an injury in a particular work environment. One example of the approach is beryllium exposure, which occurs in industries such as defense, aerospace, nuclear power, electronics and dental prostheses. Even if a worker is not directly dealing with beryllium, secondary exposure can occur via airborne particles. Family members exposed to dust carried on clothing or footwear may also be at risk. Individuals sensitized to beryllium are at risk of developing acute or chronic disorders of the skin and lung, with the most serious consequences being carcinoma of the lung or chronic granulomatous lung disease (chronic beryllium disease).

Research has shown that genetic variants of the HLA-DPB1 gene, particularly HLA-DPB1 E69, are found more often in exposed workers who go on to develop a cell-mediated, type IV,

delayed hypersensitivity reaction, leading to chronic beryllium lung disease. Mortality associated with this complication is around 36–62% [13]. However, it is important to note that the HLA genotype per se is insufficient to lead to disease and within the environment there are modifying factors such as the type of job; e.g. machining is more risky.

Will testing for HLA-DPB1 variants predict which workers are likely to develop beryllium related disease? Despite the odds of lung disease associated with the glutamic acid 69 variant being high (84% of workers with chronic beryllium disease versus 36% in exposed workers without this disorder), the DNA test would not be particularly helpful, because the prevalence of HLA-DPB1 Glu69 in the normal population is high (40%) while the prevalence of disease among beryllium workers is relatively low (5%) so the positive predictive value of 11.7% is not high enough to make DNA testing a worthwhile screen [13].

Other examples highlighting ethical and legal dilemmas include the APOE4 DNA marker and predisposition to dementia following head injury in boxing (Box 6.1). Another genetic link between sport and illness is autosomal dominant familial hypertrophic cardiomyopathy, which is caused by mutations in muscle sarcomere genes. This disorder may initially present as sudden cardiac death following strenuous physical activity. Although the molecular DNA defects underlying this disorder are known, their number and complexity make it impractical to screen professional sportsmen and women, unless there are reasons such as a family history, unexplained syncopal attacks, or cardiac findings during clinical examination. Generally, an individual with

BOX 6.1 GENES AND SPORT.

The APOE4 gene variant described earlier (Chapter 2) is associated with a greater risk of developing Alzheimer disease, and the risk appears to be further increased in boxers – presumably as a consequence of chronic brain trauma. In a recently reported study, 50% of individuals with chronic traumatic encephalopathy were shown to carry at least one APOE4 allele (one was homozygous for this marker) compared to the general population carrier rate of 15% [14]. Although considering only a small sample size, a 2006 report suggested that the APOE4 variant was also associated with poorer cognitive and behavioral outcomes following moderate and severe traumatic brain injury [14]. Should an individual who has the APOE4 marker (particularly someone who is homozygous for this marker) avoid boxing? Would an employer or trainer be at risk of litigation for not advising a boxer to have their APOE4 status determined? Should someone with this genetic marker be excluded from boxing? Hypothetical questions such as these continue to be asked, but there are no clear answers. If genetic testing is used for screening for susceptibility to work related conditions it should show:

1. Strong evidence for linking the working environment and the disorder;
2. The disorder has serious implications for the health or safety of employees;
3. The test has the appropriate sensitivity, specificity and other parameters, and
4. Privacy and the potential for inappropriate discrimination are addressed.

this type of inherited cardiomyopathy is warned against playing competitive sports as strenuous activity is associated with sudden cardiac death. Those with the disorder can have their heart rates monitored electronically, or have defibrillators implanted to instantly revert ventricular arrhythmias that arise.

Detecting Exposure to Toxins
There are many potential toxins in the workplace. Genetic monitoring has been used in circumstances involving radiation and genotoxic chemical exposures. Detecting damage to DNA is important but difficult, especially at low exposures where health effects may not become apparent until well into the future. As was shown after the Chernobyl nuclear power reactor accident in 1986, chromosomal damage in workers exposed to significant γ radiation in the clean-up operation was an important indicator of damage. However, age and smoking habits were confounding factors for genetic damage, and the costs of FISH assays for detecting chromosomal abnormalities were too high for large scale population studies [12].

A new approach to detecting DNA damage might be possible with Next Generation (NG) DNA sequencing, which is interesting since detecting radiation-induced DNA damage was one of the early reasons for initiating the Human Genome Project (Box 1.2). The potential for quantitating cellular and tissue damage is illustrated by its use to study the genomes of patients with lung cancer caused by cigarette smoking. Tobacco smoke contains more than 60 carcinogens, and damage results from chemical modification of purines by mutagens, inability of the DNA repair mechanisms to correct this damage and incorrect nucleotide incorporation opposite the distorted base during DNA replication [15]. NG DNA sequencing allows the DNA signatures of tissue damage and DNA repair to be cataloged. It may show sufficient specificity to permit monitoring of the environment (by screening workers) or detect when damage has been caused and by what particular toxin (screening workers with illness).

Litigation
Quantifying the evidence of exposure is a significant hurdle in a tort action (called toxic tort if the wrongful act involves exposure to a toxic substance). It is not easy for a plaintiff to prove that exposure to a toxic substance has occurred and that the toxic substance was the cause of illness or injury. Conversely, a defendant in a toxic tort may have difficulty disproving a claim because of doubtful or minimal evidence. However, exposure to xenobiotics (compounds that are foreign to the body) will provoke changes in gene expression in any biological system. This is the rationale behind the use of transcriptomics to identify or characterize changes that result from exposure to toxins. There is potential for toxicogenomics to provide new and more definitive evidence of exposure to a toxic substance by looking for particular cellular responses before and after exposure to it.

Identity
Workplace DNA testing to establish identity is used in the military and the police. The purpose is to have on record a reference DNA profile for identifying, if necessary, body parts (war, fighting or terrorism) or to assess crime scene contamination. These aims are not controversial but concerns include:

1. Security of the DNA sample, and who has access to it;
2. Will the DNA sample, or more likely the DNA profile, be included in the databases which are used to search for criminal activities?
3. How long are the DNA samples/profiles kept – i.e. are they destroyed once the individual is no longer in the military/police, and
4. Is this a voluntary or compulsory part of the employment agreement?
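Returning to the predisposition testing discussed at the start of this Workplace section, the positive predictive value argument for HLA-DPB1 Glu69 in beryllium workers can be checked with Bayes' rule. The sketch below is illustrative only. It uses the figures quoted in the text (marker present in 84% of workers with chronic beryllium disease and in 36% of exposed workers without it, disease prevalence 5%), and the function name is hypothetical. The exact inputs behind the 11.7% reported in [13] are not given here, so the result should be read as close to that figure and not as a reproduction of it.

```python
def positive_predictive_value(marker_given_disease: float,
                              marker_given_no_disease: float,
                              prevalence: float) -> float:
    """P(disease | marker present), computed with Bayes' rule.

    marker_given_disease    -- fraction of affected workers carrying the marker
    marker_given_no_disease -- fraction of unaffected workers carrying the marker
    prevalence              -- fraction of exposed workers who develop disease
    """
    true_pos = marker_given_disease * prevalence
    false_pos = marker_given_no_disease * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("marker is never present; PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Figures quoted in the text for the Glu69 variant and beryllium disease.
ppv = positive_predictive_value(0.84, 0.36, 0.05)
print(f"PPV = {ppv:.1%}")
```

With these inputs the sketch gives a value of roughly 11%, so about nine in ten marker-positive workers would not be expected to develop disease. This is the quantitative reason the test is a poor screen despite the strong association.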


COMMUNICABLE DISEASES

There are many applications of molecular medicine in the communicable diseases caused by bacteria, viruses, fungi, parasites and in a rare example by an abnormal protein. As well as the known infectious agents, there are the newly emerging (or re-emerging) infections and an increasing number of immunocompromised patients. To this mix the development of therapy-resistant organisms and bioterror can be added. In such a changing environment, no single therapeutic or preventive approach will be sufficient. What is certain is the ongoing requirement for rapid and accurate detection of infectious agents, which is best undertaken by molecular-based diagnostics. In infectious diseases, these are usually known as NAT (nucleic acid testing) because they involve both DNA and RNA.

Detection
Previous editions of Molecular Medicine gave an in depth overview of how knowledge of DNA could be used to improve the detection of infectious agents for patient care. This detail is no longer necessary because DNA testing is now used routinely in clinical management and public health strategies. The various diagnostic tests, from the traditional phenotypic tests to DNA-based genotypic tests, are summarized in Table 6.4.

As already noted, the utility of DNA sequencing, particularly for viral infections (because their genomes are relatively small), has expanded rapidly and now contributes key data for investigating new outbreaks. Just as occurred in genetics, an omics approach will become increasingly preferred – already the concept of infectomics is being touted. More

TABLE 6.4 Two approaches to laboratory testing in microbiology are the traditional phenotypic tests or the new genotypic DNA or RNA tests (NATs).

Phenotype-based tests

Traditional diagnostic approaches include:
• Microscopy – staining, appearance
• Culture and growth characterization
• Biochemical testing
• Immunological profiling (antisera or antibodies).

Tried and trusted approaches that are often relatively cheap and technically easier than genotypic methods. However, can be slow and so not always useful during epidemics, emergencies or new infections.

Phenotypic variation can occur during pathogens' life cycles making it difficult to interpret results at times.

Host immune responses can be delayed or may remain persistent even after resolution of infection. Cross-reacting antibodies from natural infection or vaccination can produce false positive results.

Genotype-based tests

Strategies for analyzing pathogen nucleic acid tests:
• Nucleic acid hybridization
• Plasmid identification
• Chromosomal DNA banding patterns
• PCR amplification techniques
• Microarray based assays
• DNA Sequencing. Can provide clues for identifying new pathogens.

Like the trend in genetic disorders, DNA sequencing is assuming greater utility for detecting infectious agents [16]. Unlike genetic DNA testing, contamination is a major source of error because there is considerably less template DNA.

Variation less of an issue but finding DNA or RNA does not necessarily confirm an organism to be pathogenic. For example, the detection of CMV DNA by PCR in a patient's serum could mean active disease or latent infection.

Best for detecting difficult to culture organisms or there is a mix of pathogens. DNA testing has greater sensitivity and also allows virulence factors and drug resistance to be detected. Q-PCR helpful in monitoring treatment with viruses such as HCV and HIV.

sophisticated bioinformatics is being developed to deal with metagenomics (Chapter 4) and this will ensure that new software will allow the sequence information from complex mixes of organisms (even those in clinical specimens) to be analyzed and separated into distinct organisms.

Evaluating a NAT is based on traditional measures:

1. Sensitivity;
2. Specificity;
3. Positive predictive value (PPV), and
4. Negative predictive value (NPV) (Table 3.6).

Tests with high PPVs are needed for infections where a false positive will have significant clinical or psychological consequences, for example, tests for sexually transmitted infections. Tests with high NPVs are required when it is essential that positives are not missed, for example blood screening.

Taxonomy and Comparative Genomics
The first microorganism to be sequenced was H. influenzae in 1995. Since then, there have been large numbers of microbial and viral sequences deposited in databases, including both pathogens and non-pathogens. Completed, whole-genome sequences exist for around 3 000 bacteria, 41 eukaryotes (19 of these being fungi) and 2 675 viruses. In addition, 40 000 and 300 000 partial sequences for influenza and HIV-1, respectively, have been completed [16]. The numbers of sequenced microorganisms will continue to grow exponentially and metagenomic approaches will allow the detection of many novel organisms (Chapter 4). The larger databases available for study will ensure sophisticated comparative genomics can be undertaken for research and clinical applications. DNA-based information is adding a new dimension to taxonomic classification, as described below for viruses.

As multiple de novo sequences of the same organism are obtained, it has become apparent that there is a pangenome. This means that different strains of an organism have:

1. The same core genes;
2. A number of genes that are variable and used for adaptation to particular environments, and
3. A set of genes with no known function (Figure 6.2).

FIGURE 6.2 The pangenome comprises all genomes in a group of organisms [16,17]. The pangenome is divided into: (1) Core genes – essential for basic function; (2) Variable genes – these reflect the environment that the organism needs to deal with, and (3) Unknown genes – found on DNA sequencing but function is unknown. The relative sizes are not drawn to scale but are meant to show a smaller core, with large numbers of genes with unknown function.

The pangenome varies between organisms, for example, all genes for B. anthracis appear to be present in only four strains. In contrast, for E. coli, it is likely that the pangenome will require hundreds of these bacteria to be sequenced. Apart from providing further insights into the structure and function of organisms, knowledge of the pangenome is likely to be more informative than any individual genome when considering new virulent forms or the development of drug resistance. To study and understand the pangenome requires an omics approach. It is also apparent that while microbial genomes are small compared to eukaryotes (Table 1.7) they are relatively rich

in protein-coding genes (humans 1–2%, microbes 90%) [16,17].

Unlike cells, which all have DNA as their genetic material, viruses are considerably more diverse in what they use. This is reflected in a molecular classification that defines seven different viral classes on the basis of their genetic material and replication strategies. ds – double stranded; ss – single stranded; (+) – positive-sense or plus strand; (−) – negative-sense or complementary strand:

• dsDNA – example is adenovirus
• ssDNA – adeno-associated virus
• dsRNA – rotavirus
• ss(+) RNA – poliovirus
• ss(−) RNA – rabies virus
• ss(+) RNA plus reverse transcriptase – retrovirus
• DNA plus reverse transcriptase – HBV.

Viruses are the smallest organisms, and have genome sizes measured in kilobases. The International Committee on Taxonomy of Viruses (ICTV) develops an agreed taxonomy and nomenclature. It maintains an official index and publishes this information. In its 2009 release, the ICTV recognized six orders of viruses with another group yet to be placed into an order. There were 87 families, 19 subfamilies, 348 genera and 2 285 species confirming further the heterogeneity found in viruses. The building of an accurate taxonomic classification has many advantages, including new insights into the biology of the viruses and their evolutionary relatedness which provide important clues when dealing with new infections [18] (Box 6.2).

Applications of DNA sequencing in virology include:

1. Identifying the function of viral proteins to allow a better understanding of how viruses evade host immune responses or promote their own migration and spread;
2. Defining regulatory controls or proteins that might become targets for new anti-virals;
3. Developing rapid diagnostics and detecting the identity of new viral outbreaks, and
4. Understanding evolution and hence relatedness for molecular epidemiologic strategies investigating outbreaks of old and new viruses, and monitoring drug resistance [18].

Nosocomial Infections and Drug Resistance
Nosocomial, or hospital acquired, infections are usually associated with medical devices such as catheters, or surgical procedures. Apart from wound and urinary tract infections they lead to life-threatening pneumonia and septicemia. Some statistics on these types of infections include:

1. They were the sixth leading cause of death in the USA in 2002 with approximately 99 000 deaths;
2. Estimated cost to the US Healthcare budget is over $5 billion annually;
3. Approximately one third are preventable, and
4. Gram negative bacteria are involved in more than 30% of infections [20].

The convergence of gram negative bacteria that are increasingly antibiotic resistant and a reduction in drug development programs has produced a gloomy scenario for hospital acquired infections.

Causes for antibiotic resistance are many including:

1. Unnecessary or inappropriate use of antibiotics in humans;
2. Availability of antibiotics over the counter;
3. Use in the food industry including meat, agriculture, aquaculture;
4. Poor patient compliance in taking prescribed drugs;
5. Transmission by farm or pet animals treated with antibiotics, and
6. Inadequate infection control measures in hospital and clinical care (Table 6.5).

New drugs are not being developed as quickly as they are needed because of high production


BOX 6.2 INTEGRATED DNA AND RNA IN THE GENOME.

The genomes of vertebrates contain many copies of retroviral sequences acquired during evolution. These could function to protect the host from viral infection, and possibly as a source or natural reservoir for the virus to persist and transmit. However, it is now apparent that it is not only retroviruses, for which integration is the necessary first step before replication can occur, that can integrate a copy of their RNA into the host's somatic and germline genome. The genome of some bees has been shown to contain sequences from a positive (+) strand RNA Dicistroviridae that infects insects. These bees are resistant to infection by the virus. Following this observation, a comparative bioinformatics study of genomic sequences from 48 vertebrate species using sequence data from non-retroviruses containing single-stranded (ss) RNA genomes was undertaken. Surprisingly, it was shown that about half the vertebrates had integrated non-retrovirus sequences into their genomes. The next unexpected finding was that these integrations came mostly from two groups of RNA viruses from the negative (−) strand RNA Mononegavirales order. These were either Ebola and Marburg viruses (Filoviridae family), associated with lethal hemorrhagic fevers, or Bornavirus (Bornaviridae family), associated with neurological and psychiatric disorders which can be fatal. The range of vertebrates that had the integrations suggested these events had occurred over 40 million years ago. Therefore, the conservation of sequences coding for virus-like proteins is thought to have some selective advantage, possibly increasing the host's resistance to infection. Conversely, continued integration and persistence might provide viruses with a natural reservoir for future infections. An example would be bats, which are now thought to be natural reservoirs for the Ebola and Marburg viruses. Sequences from these viruses are detectable in some bats with some having open reading frames [19].

costs, the time required for clinical trials, regulatory demands and a concern that products will become obsolete once resistance develops. Apart from rapid diagnosis of the causative microorganism, improved detection of antibiotic resistance strains is also needed. These requirements can be met by a NAT approach, although this is only the first step of a more comprehensive internationally coordinated plan to address the issues of antibiotic resistance.

The urgency of this matter is well illustrated by tuberculosis (TB), where global control of this increasingly problematic public health challenge requires better and faster diagnosis of the primary infection as well as early detection of drug resistance. The traditional phenotypic culture methods to diagnose TB are slow. Similarly, the first generation of molecular DNA diagnostic tests is complex, requiring sophisticated laboratory expertise and resources [26]. Multidrug resistant TB (defined as infections that are resistant to at least isoniazid and rifampicin) is emerging globally, particularly in India and China. Cases of extensively drug-resistant TB now exist, meaning that TB is also resistant to a number of the second line anti-TB drugs. Failure to detect resistant cases of TB is the rule rather than the exception, particularly where laboratory resources are limited. This means that new, DNA-based, detection kits, especially those that can be multiplexed and automated are eagerly awaited.
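The resistance categories just described can be written as a simple rule. The sketch below is a simplification for illustration. MDR follows the definition given in the text (resistance to at least isoniazid and rifampicin), while XDR is approximated as MDR plus resistance to any drug in a second-line list. That list, the category labels and the function name are assumptions made here; formal surveillance definitions of XDR are more specific than this.

```python
# Illustrative second-line agents only (an assumption for this sketch,
# not an official or exhaustive list).
SECOND_LINE_DRUGS = frozenset(
    {"amikacin", "kanamycin", "capreomycin", "levofloxacin", "moxifloxacin"}
)
# The two first-line drugs that define MDR TB in the text.
MDR_DEFINING_DRUGS = frozenset({"isoniazid", "rifampicin"})


def classify_tb_resistance(resistant_to):
    """Return 'not MDR', 'MDR' or 'XDR (simplified)' for an iterable of drug names."""
    drugs = {name.strip().lower() for name in resistant_to}
    if "rifampin" in drugs:  # rifampin is another name for rifampicin
        drugs.add("rifampicin")
    if not MDR_DEFINING_DRUGS <= drugs:
        return "not MDR"
    if drugs & SECOND_LINE_DRUGS:
        return "XDR (simplified)"
    return "MDR"


print(classify_tb_resistance(["isoniazid", "rifampicin"]))
print(classify_tb_resistance(["isoniazid", "rifampicin", "kanamycin"]))
```

A NAT that reports only rifampicin resistance cannot, on its own, place an isolate in any of these categories, since isoniazid status is also needed for the MDR definition used here.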


TABLE 6.5 Some therapy resistant multidrug resistant organisms (a) [21–25].

Example | Type of resistance

Methicillin resistant S. aureus (MRSA) | Well known nosocomial infection and difficult to treat. Different MRSA strains are reported in relation to community-acquired infections usually in association with relatively minor skin or soft tissue infections. However, community-acquired MRSA can now lead to life threatening infections. Generally MRSA infections are spread through direct person-to-person contact hence many are preventable by hand washing. Both traditional culture and DNA testing can be used to detect MRSA. Which is preferred will depend on costs and laboratory staff skills.

Vancomycin resistant enterococci (VRE) | Unlike the higher profile infections caused by S. aureus, enterococci are less well known as nosocomial infections but can cause fatal diseases particularly if associated with vancomycin resistance (VRE). Individuals at risk are: (1) Hospitalized for a prolonged time; (2) Immunosuppressed, and (3) Post surgery or have devices such as urinary or intravenous catheters. Prior treatment with vancomycin and other antibiotics is an important predisposition. Resistance can be intrinsic or acquired. Resistance genes include vanA, vanB, vanD, vanE and vanG with the first two commonly associated with VRE.

Gram negative bacilli | There is now antibiotic resistance emerging in the gram negative bacteria. Initially this appeared as plasmid encoded β lactamases producing resistance to penicillin. It then expanded into ESBL (extended spectrum β lactamases) producing resistance to penicillins, cephalosporins (1st to 3rd generations) and monobactams but not cephamycins or carbapenems. Today, there is added concern about the next trend involving NDM-1 (New Delhi metallo-β-lactamase 1) because the carbapenem resistance gene (blaNDM-1) has been detected by PCR. In the latter example, patients have acquired resistant E. coli or K. pneumoniae species in the Indian subcontinent and brought these back to the UK.

Tuberculosis (TB) | Multidrug resistance (MDR) TB involves the first line drugs particularly isoniazid and rifampicin (rifampin). Extensively drug resistant (XDR) TB is the next step with additional resistance including second line drugs.

Malaria | The single drug approach used initially to treat or prevent malaria has now given way to combination therapy including artemisinin (Box 4.6) as resistance emerges across the world. The molecular basis for drug resistance is complex involving many genes including pfCRT, pfMDR which is an ortholog of the P-glycoproteins found in mammals in association with multidrug resistance in cancer (Chapter 7), and mutations in the DHFR gene that produce resistance to antifolate drugs.

Influenza | Two mainstays for drug treatment during the influenza A (H1N1/09) virus pandemic that started in 2009 were the viral neuraminidase inhibitors oseltamivir (Tamiflu®) and zanamivir (Relenza®). A single amino acid mutation in H1N1 soon appeared (His274Tyr) and this produced a virus that was resistant to oseltamivir although the US Centers for Disease Control and Prevention indicated that almost all viruses remained susceptible. Examples of resistance to zanamivir were not reported.

(a) Generally these organisms are called MDROs (multiple drug resistant organisms) although some like MRSA or VRE are specifically referring to one type of antibiotic/drug resistance.

A fully automated NAT method to detect both TB and rifampicin resistance was reported in 2010. This uses uncultured sputum and can be completed in less than 2 hours with impressive sensitivities and specificities even in patients with TB and HIV, where smear-negative disease is more common. Since it is fully automated, it does not require sophisticated hands-on expertise. Although the NAT only detects resistance to rifampicin, it shows the way ahead, particularly if omics-based diagnostics including microarrays are developed [26].

Public Health Testing – Blood Transfusion Services

Viruses such as HIV, HBV and HCV assume added notoriety when they are implicated in transfusion-derived infections involving blood and plasma-derived products. Previously, blood transfusion services based their donor and blood screening programs on detecting antibodies or antigens in the donor or blood supply. However, this has proved to be inadequate, and an important addition to the screening protocols is the use of PCR to identify viral DNA or RNA. The advantages of a NAT include higher sensitivity and greater reliability during the window period, which is the time between a blood donor becoming infectious and donor screening tests becoming positive, i.e. seroconversion has occurred. The use of NAT, better serologic-based assays and more effective regulatory controls have made contemporary blood products considerably safer. Ultimately, transfusion services must balance safety against access to blood and its products. What is screened for will depend on the types of infections found within a geographic region as well as affordability of the screening tests.

NAT-based assays for screening blood donations can be used to screen pools of donations (for example, 16–24 donations simultaneously) or individual ones. The former is more rapid and cheaper, but rare instances of HIV, HBV or HCV can be missed. The testing of individual donations is the method of choice but until recently was too expensive. Today, as new analytic platforms allow rapid and automated multiplexing NATs to be used, the screening of individual donations becomes more cost effective.

Blood transfusion services test blood and donors for a range of infectious agents depending on national requirements. The WHO recommends mandatory screening for HIV-1, HIV-2, HBV, HCV and syphilis, while the requirement for HTLV-I, HTLV-II (HTLV – human T-lymphotropic virus) and malaria is decided on a regional basis [27]. Other infectious agents that can be screened for include West Nile virus, dengue and emerging infections. Screening can also be undertaken in selected cases, for example, CMV free-blood for immunosuppressed patients, the fetus or the neonate. The risk of prion diseases is considered below.

Ease of international travel means a potential donor could become infected elsewhere. This contingency is covered by donor questionnaires that allow self-exclusion (particularly for infections that are not routinely sought). For example, to prevent transmission of prion diseases through blood, some transfusion services have excluded donors who have lived in the UK over certain time periods (see below). Other reasons for deferral include fever with headaches the week before donation (a risk of West Nile and other viruses) or travel to certain regions (a risk of malaria).

Awaiting Better Diagnostics – Prion Diseases

A rare but important form of communicable dementia is found in the prion diseases (also called transmissible spongiform encephalopathy, or TSE). These diseases affect humans and a number of animals used for meat including cattle, deer, sheep and goats. The term prion comes from protein and infectious and was coined by S. Prusiner, who was awarded the Nobel Prize in Physiology or Medicine in 1997 for his work on prions. The important components of prion disease include the PRNP gene and its cellular product PrPc (prion protein cellular), which can become the infectious protein product PrPSc (prion protein scrapie). The normal PrPc is a cell surface glycoprotein found in a wide range of animals, having a function that as yet remains unknown. PrPc needs to change its conformation to its isoform PrPSc to be infectious. No nucleic acid is involved in this process, highlighting the novel way in which prion disease arises and is propagated [28]. The disease leads to widespread neurodegeneration with cognitive and motor impairment. It is fatal and there is no treatment (Box 6.3). Work continues to develop an early diagnostic marker for this disease. This is a priority for screening blood and its products.
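Returning to the choice between pooled and individual donation NAT screening described above for blood transfusion services, the cost side of that trade-off can be illustrated with a small calculation. The sketch below is not taken from the text: it assumes a simple two-stage scheme in which every donation in a positive pool is retested individually, a perfectly accurate assay, and a hypothetical infection prevalence of 1 in 1000 donations. It deliberately ignores the loss of sensitivity through dilution that allows rare infections to be missed in pools.

```python
def expected_tests_per_donation(pool_size: int, prevalence: float) -> float:
    """Expected number of NAT runs per donation under two-stage pooling.

    Assumptions (illustrative only, not from the source text):
    - a pool is tested once; if positive, each donation is retested singly
    - the assay is perfectly sensitive and specific
    - donations are infected independently with probability `prevalence`
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if pool_size == 1:
        # Individual donation testing: exactly one test per donation.
        return 1.0
    # Probability that a pool contains at least one infected donation.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    # One pooled test shared by the pool, plus one retest per donation
    # whenever the pool is positive.
    return 1.0 / pool_size + p_pool_positive


if __name__ == "__main__":
    hypothetical_prevalence = 0.001  # assumed value for illustration
    for size in (1, 16, 24):
        tests = expected_tests_per_donation(size, hypothetical_prevalence)
        print(f"pool size {size:>2}: {tests:.3f} tests per donation")
```

Under these assumptions pooling needs far fewer assay runs per donation when infections are rare, which is consistent with pooled screening being described as more rapid and cheaper; the model says nothing about the missed infections that make individual testing the method of choice.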


BOX 6.3 PRION DISEASE.

Prion disease may be sporadic, inherited, iatrogenic or transmissible from animal to human via infected meat and now human to human via blood products. The dementia that results includes sporadic, iatrogenic, inherited and variant Creutzfeldt-Jakob disease (CJD) in humans, bovine spongiform encephalopathy or BSE in cattle (related to the 1986 epidemic of mad cow disease in the United Kingdom), and scrapie in sheep and goats. In 1996, the emergence of variant CJD (vCJD) in humans is thought to have arisen from transmission across the species of the BSE agent. vCJD is characterized by an early age of onset (Figure 6.3). Mutations in the PRNP gene account for the inherited forms of CJD. However, in the vast majority of sporadic cases, there are no detectable DNA mutations, and the change from PrPc to the abnormal PrPSc is thought to occur because of somatic mutations or other, as yet unknown genetic or environmental factors. Risk factors for developing vCJD include young age, residence in the United Kingdom especially between 1985 and 1990, and intriguingly, homozygosity for a codon 129 polymorphism in the PRNP gene. At this position there is either a methionine or a

[Figure 6.3 diagram, Human Prion Disease. Labels: sporadic CJD (commonest, 85%, worldwide, cause unknown); inherited CJD (10–20%, autosomal dominant, PRNP mutations); iatrogenic CJD (1%; contaminated instruments, transplants, pituitary extracts); variant CJD (most recent form; presumed animal to human prions, BSE contaminated meat, blood and blood products, younger patient cf. sporadic CJD); two rare diseases (Gerstmann-Straussler-Scheinker disease, fatal familial insomnia); kuru (cannibalism). Central elements: PRNP, Met/Val codon 129, PrPc, PrPSc.]

FIGURE 6.3 Human prion diseases [28]. There are different types of Creutzfeldt-Jakob diseases (CJD), and two other prion-related diseases. The most recently described is vCJD which is thought to have occurred as a result of direct animal to human spread through contaminated beef products. Now there is evidence for human to human spread via transplants or blood products.


BOX 6.3 (cont’d)

valine. In normal individuals, the combinations of methionine/methionine, methionine/valine and valine/valine are present. However, in patients with vCJD, the homozygous methionine is always found, suggesting that this may lead to genetic predisposition. Most patients developing iatrogenic CJD after receiving pituitary extracts for growth hormone are also homozygous for methionine. If this is correct, some have hypothesized that a second wave of vCJD will occur in the future involving those who are methionine/valine heterozygotes or homozygotes for the valine allele, because a longer incubation period is needed to develop prion disease without the additional genetic risk factor. Other less well characterized polymorphisms in this gene have been detected and may represent additional genetic modifiers [28]. Prion disease remains a challenge for the future, particularly to explain how the infectious forms occur without any apparent conventional infectious agents being involved. Better diagnostics and some form of therapy are needed for this rare but fatal infection.
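The three codon 129 combinations described in Box 6.3 can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, and the only biology it relies on beyond the text is the standard genetic code, in which ATG encodes methionine and GTG is one of the codons for valine. It is not a description of how any diagnostic laboratory reports PRNP results.

```python
# Standard genetic code entries relevant to the codon 129 polymorphism.
CODON_TO_RESIDUE = {"ATG": "Met", "GTG": "Val"}


def codon129_genotype(allele_1: str, allele_2: str) -> str:
    """Return 'Met/Met', 'Met/Val' or 'Val/Val' for two codon 129 alleles."""
    residues = []
    for codon in (allele_1, allele_2):
        key = codon.strip().upper()
        if key not in CODON_TO_RESIDUE:
            raise ValueError(f"unexpected codon at position 129: {codon!r}")
        residues.append(CODON_TO_RESIDUE[key])
    # Sorting gives a stable label regardless of allele order.
    return "/".join(sorted(residues))


def is_met_homozygote(allele_1: str, allele_2: str) -> bool:
    """True for the genotype the text reports in patients with vCJD."""
    return codon129_genotype(allele_1, allele_2) == "Met/Met"


if __name__ == "__main__":
    for pair in (("ATG", "ATG"), ("ATG", "GTG"), ("GTG", "GTG")):
        print(pair, codon129_genotype(*pair), is_met_homozygote(*pair))
```

Only the methionine homozygote is flagged here because that is the genotype the box links to vCJD; the heterozygous and valine homozygous genotypes are those hypothesized to be involved in a possible later wave.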

Pathogenesis

The pathogenesis of many infections has been determined from studies utilizing light/electron microscopy, cell culture or immunoassays. To these can now be added nucleic acid (DNA, RNA) based methodologies. Advantages provided by nucleic acid techniques include the ability to detect latent (non-replicating) viruses, and to localize their genomes to nuclear or cytoplasmic regions within cells. Tissue integrity remains preserved during in situ nucleic acid hybridization and so histological evaluation can also be undertaken. NAT can be manipulated to enable a broad spectrum of serotypes to be detectable. This is particularly valuable in emerging infections where the underlying serotypes are unknown. Today, a very powerful application of NAT is the ability to sequence whole genomes, and so identify a pathogen or what it is likely to be. From its genomic sequence it becomes possible to:

1. Predict its role in disease pathogenesis;
2. Find regions in the genome suitable for rapid diagnostics via NAT, and
3. Consider how new treatment options including vaccines can be developed.

Virulence Factors

Microorganisms have developed a range of virulence factors to allow them to invade a host (Figure 6.4). The best known are toxins, which are broadly divided into:

1. Exotoxins – usually proteins secreted by both gram positive and gram negative bacteria. They can be deadly, for example, tetanus exotoxin and diphtheria exotoxin, and
2. Endotoxins – usually heat stable lipopolysaccharides found in the gram negative bacterial cell wall.

Nevertheless, killing the host is not beneficial to the invading organism and in some circumstances it is essential that the host does not die. This is exemplified by H. pylori, which has sophisticated virulence factors including VacA and CagA allowing it to invade and cause damage to the host. However, the same organism has also evolved to ensure its continued survival by modulating its cell killing capacity because


the CagA protein while cytotoxic per se counters some of the effects of the VacA toxin [29].

FIGURE 6.4 Virulence mechanisms used by bacteria. Four mechanisms can be used by bacteria to invade a host. Which predominates will vary for each microorganism. (1) Adhesins allow bacteria to attach to host cells. This is the first step in the infective process. Some bacteria have appendages such as pili and flagella to facilitate attachment; (2) Many toxins are produced and have been well characterized both biochemically and molecularly; (3) Bacteria ultimately need to secrete their products into the host cell through specific secretory systems. A number have been described and are needle-like to allow the passage of toxins from the bacteria into the host, and (4) Implied in the concept of a pangenome is a complex bacterial genome to orchestrate the various changes needed to infect a host and produce the appropriate effects. The regulatory environment for this will need some common pathways and specific ones when comparing different bacterial species.

Toxins have many different actions, and using broad spectrum antimicrobials to inactivate them might not always suffice (Table 6.6). Nevertheless, the potential for this approach to treating or preventing infection is illustrated by B. anthracis, the bacterium causing anthrax. It achieved added notoriety because of an attempt at bioterror using postal letters in 2001 (Box 9.5). The attenuated anthrax bacterium (Pasteur strain) used for immunization lacks its toxin, confirming the latter's importance in disease causation. Animal studies also suggest that antibodies that inhibit the anthrax toxin from binding to host receptors might provide protection, at least in emergencies [30]. A better understanding of how toxins work and function as targets for new drugs is coming from molecular studies.

The traditional targets for conventional antimicrobials (usually antibiotics) include components of the bacteria that are essential for survival, such as the cell wall, the cell cycle, DNA replication and protein synthesis. This approach kills (bactericidal) or inhibits growth (bacteriostatic) of most bacteria, but invariably allows some residual subpopulations with natural immunity to be positively selected for, and hence the development of antimicrobial resistance will follow. Therefore, focus has now shifted to developing the next generation of antimicrobials, which target virulence factors. This would overcome the pathogenicity of the organisms without necessarily killing them and so avoids setting up an environment for resistant strains to emerge [30].

Host Resistance

Microorganisms have developed sophisticated ways in which to invade a host, but hosts have evolved many protective mechanisms (Figure 6.5). The host's response in terms of genetic modifications is particularly relevant to molecular medicine. In humans, evidence for a genetic component influencing the outcome of an infectious disease comes from the following observations: (1) Not all exposed to HIV-1 get infected, and those who do progress to AIDS show different responses, and (2) Some ethnic groups are more resistant or susceptible to infections, e.g. resistance to malaria in some Black Africans.

HIV-AIDS: The main HIV co-receptor involved in the infection process is CCR5. Naturally occurring mutations in this receptor – such as a 32 base pair deletion present in up to 20% of European populations (about 1–2% are homozygous) – allow these individuals to be highly resistant (homozygotes) or partially resistant (heterozygotes) to HIV-1 infection


TABLE 6.6 Some bacterial toxins in the gastrointestinal tract [30–32].

Bacterium: Toxins and their effects

Clostridium botulinum: Associated with foodborne illness^a. Produces seven antigenically distinct neurotoxins that are important to detect. Conventional diagnostic assays are used although they are slow and difficult. A number of NATs have been developed and are being evaluated.

Clostridium perfringens: Associated with foodborne illness. Is a ubiquitous organism in nature. Produces two β toxins detectable by traditional assays or PCR NATs.

Escherichia coli: The enterohemorrhagic E. coli (EHEC) remain an important cause of foodborne illness, with serotype O157:H7 and other EHECs being serious public health problems. Shiga 1 and Shiga 2 are the two main toxins and are so named because of similarities with the Shigella dysenteriae toxin. Molecularly the Shiga toxins have two subunits: A (active unit) and B (receptor binding unit). This toxin structure is similar to what is seen with the anthrax toxin, although the latter has three subunits (1 for binding, 1 called the lethal factor and 1 called the edema factor). Rapid and sensitive methods to detect EHEC and its toxins for clinical purposes, including source and spread, are possible with NAT. In mid 2011, an outbreak of EHEC in Europe caused deaths and involved serotype O104:H4. Its source was shown to be infected sprouted seeds. Using NG-DNA sequencing platforms, the genome for this pathogen was completed within a week. It showed the E. coli to be a hybrid strain and identified a number of antibiotic resistance genes. These findings might explain the pathogen's virulence and could also be used to design rapid NAT diagnostics.

Vibrio cholerae: There are 10 pathogenic vibrio bacteria associated with foodborne illness (particularly seafood), with cholera being the best known. PCR NATs have proven valuable in detecting the underlying vibrio as well as relevant toxins.

Clostridium difficile: A major cause of diarrhea in hospital patients and those in long term care. Serious infection is worsened by prior use of antibiotics that change the normal microbiota and allow proliferation of toxin producing C. difficile. A hypervirulent strain of this organism is spreading and is defined by NAT PCR as ribotype 027, which is thought to have arisen by mutations in the toxin regulator gene leading to overproduction of toxins A and B. Its spread may be underestimated because NAT typing is not used in all countries.

Listeria monocytogenes: An important pathogen in the food industry with major outbreaks already reported in several countries. Virulence genes are located within a 9 kb cluster and are involved in ensuring cell to cell spread. They include a hemolysin gene (hlyA), with its product LLO essential for pathogenicity, and three other genes. Detection methods include conventional agar plating but NATs provide greater flexibility, particularly if large numbers of food products need to be screened.

Helicobacter pylori: Spiral organisms causing gastroduodenal disease including gastritis, peptic ulcer, gastric cancer and lymphoma. The importance of showing a link between these diseases and H. pylori was recognized with the award of a Nobel Prize for Physiology or Medicine to B Marshall and R Warren in 2005. Spreads from person to person and can produce a chronic life long infection unless treated. Non invasive but leads to chronic inflammation with cancer as possible sequelae. The two toxins are CagA and VacA and there are two types of H. pylori – 1 and 2. Each has the vacA gene but only type 1 has the cagA gene and so is the more pathogenic. Even though type 2 has the vacA gene it does not seem to be expressed. There are many approaches to diagnosis including distinguishing types 1 and 2. NAT methods work well with gastric biopsies.

^a Foodborne illness remains an important public health issue with major health and economic consequences. The US Centers for Disease Control and Prevention (CDC) estimates that each year 1 in 6 Americans (48 million people) get a foodborne illness and around 3 000 die.


and disease progression [33]. Studies are now underway with anti-HIV drugs that target the CCR5 receptor and a bone marrow transplant approach is described in Chapter 8.

FIGURE 6.5 Host mechanisms to protect against invasion by microorganisms. Various protective mechanisms allow the host to escape or modulate invasion by a microorganism. (1) Microbiota in the host (microbiota – normal microbial flora; metagenome (Chapter 4) – the genetic (DNA/RNA) material isolated from an uncultured microbial environment); (2) Physical barriers such as skin or mucosa, pH, temperature and secretions; (3) Chemical barriers particularly the immune response, and (4) Genetic adaptations which evolve over a long period of time but provide an effective mechanism to protect against certain pathogens.

Malaria: The two most common forms of malaria (P. falciparum and P. vivax) produce severe anemia. P. falciparum is also associated with cerebral malaria, respiratory and metabolic complications. This spectrum is partly explained by P. falciparum being able to invade a large proportion of red blood cells, whereas P. vivax can only invade the reticulocytes. Another explanation is the mode of entry of these parasites into red blood cells; P. falciparum has a number of routes of invasion, whereas P. vivax can only enter red blood cells that carry the Duffy blood group. This parasite is not seen in West Africa because the populations there are Duffy negative.

Host genetic factors that provide some protection from malaria have been identified. These include single gene effects seen in the hemoglobinopathies such as the sickle mutation (HbS), HbE, α thalassemias and β thalassemias. The hemoglobinopathy protective effect results from abnormal red blood cells that quickly lyse when invaded by parasites and so the parasites die. In the case of the sickle mutation this occurs because of the sickling effect, while with HbE and thalassemias it reflects the small and poorly hemoglobinized red blood cells. There are many different hemoglobinopathies, but usually one type predominates in a given population; for example, Black Africans will have HbS, South East Asians HbE and Mediterranean populations will have different thalassemias. Each protects against malaria but co-inheritance can cancel out this effect. Thus, HbS co-inherited with α thalassemia removes the malaria protection because it makes the red blood cell abnormality less severe [34].

Genetic factors may also enhance the risk of infection. These are more subtle as they are thought to involve multiple genetic effects, i.e. QTLs (quantitative trait loci) that are difficult to detect. They have been sought by association (case control) studies and now by GWAS (genome wide association studies) (Chapters 2, 3). These studies have identified genetic loci predisposing to N. meningitidis meningitis, tuberculosis, HCV, leprosy and HBV. In the case of HBV it is the HLA locus that seems to be the key factor in predisposition, and it is perhaps not coincidental that non-response following vaccination with HBV vaccine is more likely to occur in those with certain HLA types such as DRB1*03 and DRB1*07 [33].

Influenza

The three RNA influenza viruses (A, B, C) are distinguished by their internal group-specific ribonucleoprotein. Only influenza A and B are medically significant, since epidemics or pandemics have not occurred with influenza


C. Influenza A has the potential to produce pandemics because it infects other species apart from humans, including birds, pigs and horses. Influenza B only infects humans and so its antigenic structure does not become sufficiently different to cause pandemics. In contrast, viruses such as measles undergo minimal antigenic variation with one infection giving life-long immunity.

FIGURE 6.6 Structure of the Influenza virus. This RNA virus has two key surface glycoproteins: (1) Hemagglutinin (HA or H) – facilitates the entry of virus into host cells through attachment to sialic acid receptors, and (2) Neuraminidase (NA or N) – involved in the release of progeny virions from infected cells. The HA is the major determinant against which are directed neutralizing antibodies, and so also the target for influenza vaccines. In contrast, the NA is an important target for antiviral agents. [Diagram labels: lipid membrane, hemagglutinin, neuraminidase, RNA, M protein.]

The subtyping of the influenza A virus is based on its outer viral proteins, which include two important and distinct antigenic glycoproteins: Hemagglutinin (H – composed of 16 different types) and neuraminidase (N – nine different types) (Figure 6.6). Although the envelope antigens are capable of producing many different combinations (as seen in water birds), a smaller number are found in humans. To date only a few have been implicated in human to human spread (H1N1, H2N2, H3N2, H1N2, H5N1, H9N2 and H7N7) with highly pathogenic avian influenza subtypes found only in H5 and H7 subtypes (Figure 6.7) [35].

As the influenza A virus passes through its hosts, the most important of which in terms of global spread are the water birds, it undergoes genetic changes. In the past 100 years there have been four influenza pandemics:

1. 1918 H1N1;
2. 1957 H2N2;
3. 1968 H3N2, and
4. 2009 H1N1.

A fifth outbreak (H5N1) has not been declared a pandemic but remains a concern.

Avian influenza (avian flu, bird flu, H5N1, 1997 and re-emergence in 2003): This remains a worldwide threat to health, with some regarding a H5N1 pandemic as being potentially more devastating than the 1918 Spanish flu outbreak. In 1997, the first cases of human infection from exposure to sick birds or their droppings were reported in Hong Kong, indicating that this virus subtype had jumped the species barrier. Eighteen patients were admitted to hospital and six died. Fortunately, the timely culling of over a million chickens controlled this particular outbreak. Today, H5N1 still causes outbreaks in chickens, and sporadic human infections continue to be reported, with a mortality of over 50%. In contrast to H1N1 swine flu and SARS (Severe Acute Respiratory Syndrome) that have been spread from human to human and through travel, the H5N1 bird flu remains relatively contained because spread is predominantly through chickens or other birds.

The common human influenza virus (H3N2) is highly contagious but rarely lethal. Avian flu in chickens (H5N1) is a particularly virulent type that can kill rapidly and causes widespread organ damage. Fortunately, it is not easily transmitted from birds to humans, and more importantly, human to human spread is poor. However, swapping genetic material, should an individual be co-infected with both, might produce a hybrid H5 (avian flu) N2 (human flu) virus with devastating effects. DNA sequencing of the viral genome from various outbreaks has shown that the virus continues to

mutate. This has implications for pathogenicity, as well as antiviral drug resistance, and having the right vaccine ready if needed. In this unpredictable environment, the value of rapid NAT diagnostics is crucial to detect early cases and for public health surveillance. The genes of the virus that caused the 1918 pandemic have been studied to better understand what makes an influenza virus virulent and capable of producing a pandemic [35].

FIGURE 6.7 Major animal–human and human–human influenza outbreaks. Since the 1918 pandemic, a number of important outbreaks have been recorded (subtypes and dates are given as well as hosts involved). A worrying trend is the increasing numbers of new subtypes in humans, as well as an expanding animal involvement since 1997, in particular the domestic chicken. [Diagram labels: subtypes H1N1, H2N2, H3N2, H7N7, H1N2, H9N2 and H5N1, with outbreak dates between 1918 and 2009.]

Spanish influenza (H1N1, 1918): The virus from this pandemic, which killed about 40 million people, had not been isolated. Without a virus little research was possible, until the viral RNA sequence was determined using material from archival tissue, including formalin-fixed autopsy material. The sequence itself did not provide clues for why the Spanish influenza virus was so virulent, and so the next step was to reconstruct the viral coding segments and clone them

into plasmids. Individual genes from the H1N1 1918 virus were then introduced into a common laboratory viral strain and pathogenicity sought. Although the H and N glycoproteins were factors in the virulence of this virus, it was also shown that one of the RNA polymerase subunits known as PB1 was involved. Another subunit (PB2) was then found to be important for viral transmissibility [35].

Swine influenza (H1N1, later called H1N1(09), 2009): After the appearance and then rapid disappearance of SARS (Box 6.4), followed by the concerns regarding the possibility of a H5N1 pandemic that did not occur (so far),

BOX 6.4 SARS (SEVERE ACUTE RESPIRATORY SYNDROME).

This infection attracted a lot of publicity and provoked considerable fear when it emerged in China and then Hong Kong in 2003. SARS subsequently spread to many countries, producing around 700 deaths in the first half of 2003. This was at one time described as the first pandemic of the 21st century, but it never progressed beyond an epidemic because of effective public health measures effected by mid 2003 [36]. The social and economic impacts of this infection were considerable, including major disruptions to international travel. SARS was shown to be caused by a novel coronavirus (CoV) which was thought to have crossed the species barrier, although the animal reservoir for SARS took a while to find. It is now thought to be:

1. Masked palm civets – used for exotic food dishes in China, and
2. Horseshoe bats [37].

Traditional approaches such as viral culture, electron microscopy and serology helped to characterize the SARS virus. Nevertheless, SARS illustrated the value of NAT approaches in dealing with an emerging virus. Molecular testing enabled the following to be possible in a very short time frame:

1. Typing of the virus from two different countries (Taiwan and Hong Kong) showed that human to human spread had occurred;
2. Rapid whole genome sequencing of viral RNA enabled the development of PCR based diagnostic assays, and
3. In searching for animal reservoirs, RT-PCR based techniques were used. These allowed SARS-CoV to be detected, as well as identifying genetic differences between the human and animal virus.

The outbreak ended just as quickly as it started. Only occasional cases were reported in early 2004, and none after the end of April that year. However, there remain many unanswered questions including the inconsistent human to human transmission which might have been due to super-spreaders.

Another observation was the relatively large numbers of health workers who became infected. This became an issue when two of the nine persons infected in China in 2004 worked in a reference laboratory conducting research into the virus. A similar scenario was reported earlier in Singapore. The latter case was documented on RNA sequencing of the virus to be due to a contaminated laboratory culture that the scientist had been working with three days before showing signs of the infection. The WHO subsequently flagged the importance of laboratory containment when dealing with the SARS virus.

the world in 2009 was faced with another possible serious influenza outbreak. This outbreak was described as swine flu, because it was a well-recognized cause of influenza in pigs. The virus is related to the H1N1 virus that caused the Spanish flu, and can spread from person to person. The WHO declared a swine flu pandemic in June 2009. Vaccines were rapidly developed and stockpiles of antiviral drugs, particularly the two mentioned in Table 6.5, were released to the public. Rapid NATs requiring RT-PCR because it is an RNA virus were developed (see Table 3.3). This flu was a little unusual because it tended to be more severe in younger people, including children and pregnant women, whereas deaths from seasonal flu involve mostly the elderly. Despite early concerns expressed by public health officials and considerable media hype, the WHO declared the H1N1 pandemic over in August 2010.

Emerging and Re-Emerging Infections

Emerging (newly discovered, for example SARS – Box 6.4) and re-emerging (previously known, for example dengue virus) infections have increased significantly in the past 20 years. Many factors contribute including:

• Globalization, particularly increased travel and trade;
• Changes in human behavior, poverty and social inequality;
• Economic development, changes in the environment, weather and land use;
• Lapses in public health measures including those due to poverty or war;
• Complacency by communities or government;
• Mutations, selection and genetic reassortment in organisms;
• Bioterror.

Very few emerging infections represent novel pathogens. Most are re-emerging infections resulting from a change in the epidemiology or virulence of a pathogen, or secondary to microbial adaptation. A review of the major infections in history provides some background to the emerging ones. They are:

1. Plague of Athens 430 BC;
2. Black death (Y. pestis) in 1340s;
3. French pox (syphilis) 1494;
4. Small pox 1520;
5. European cattle epidemics including anthrax, foot and mouth disease 1700s;
6. American plague (yellow fever) 1793;
7. Cholera pandemic in Paris 1832;
8. Measles outbreak in Fiji 1875;
9. Spanish influenza 1918, and
10. HIV-AIDS from 1981 [38].

Zoonoses

Most emergent viruses are zoonotic – i.e. they are acquired from animals that are reservoirs of infection. This is particularly relevant in the modern world, where the consequences of easy migration, deforestation, agricultural practices, dam building and urbanization are making, and will continue to make, a major impact on the ecology of animals. For example, yellow fever is thought to have emerged in the New World as a result of the African slave trade which brought the mosquito Aedes aegypti in ships' water containers. More recently, Aedes albopictus, a potential vector for dengue virus, has become established in the USA following its conveyance from South East Asia in old car tires. With this, the threat of dengue in the North American continent has become real. Humans have populated rural areas to an increasing extent, as well as pursuing more outdoor recreational activities. There is also a growing trend for exotic animals to be kept as household pets. Changes in global climate may also contribute directly, through their effects on vegetation, insect and rodent populations.

Table 6.7 lists a number of zoonoses that have become established as new infectious diseases, or are emerging as problems for the future. Some


TABLE 6.7 Some examples of zoonoses resulting in new human infections (a).

Pathogen (1) Clinical problems, (2) Emergence, (3) DNA applications

West Nile virus – RNA virus from Flaviviridae family (Genus Flavivirus) – related to Yellow fever; Japanese encephalitis
(1) An asymptomatic febrile illness but can be complicated by meningitis, encephalitis or paralysis. Usually transmitted by mosquitoes. Also associated with blood or organ donation; pregnancy, lactation; infected needles or laboratory specimens.
(2) Isolated in 1937 from Uganda and found in many parts of the world. Appeared in the USA in 1999, and has rapidly spread across North America. The virus is maintained by a bird-mosquito-bird cycle.
(3) NAT is used to screen blood donors who may be asymptomatic carriers.

Monkeypox virus – DNA virus from Poxviridae family (Orthopoxvirus) – related to smallpox
(1) Self limited febrile illness with vesiculo-pustular eruptions. Confused with more serious illnesses and is spread animal to human or human to human.
(2) Recognized in 1958 and remained localized to Africa until 2003 when it was detected in a mid-west USA outbreak. Traced back to rats imported from Africa to which native prairie dogs were exposed and became infected and then infected humans. Appears to be contained. Primary animal reservoir is the rat.
(3) DNA characterization helped in identifying this virus as monkeypox.

Ebola virus (Ebola) – RNA virus from Filoviridae family (Ebolavirus) – related to Marburg virus
(1) Hemorrhagic fever in humans (mortality 50–90%). Example of increased human to animal contact in tropical forest with outbreaks generally resulting from the handling of infected dead animals. Humans highly contagious once disease established.
(2) First isolated in 1976 from Sudan and Zaire. Since, sporadic outbreaks have occurred but remain in Africa. Animal host is unclear although bats are suspected as being natural reservoirs.
(3) NAT assays for rapid and sensitive diagnostic tests described.

Lassa virus (Lassa fever) – RNA virus from Arenaviridae family (Arenavirus)
(1) Hemorrhagic fever with 20% having severe multisystem disease. Virus excreted in human urine or semen for months post infection.
(2) Endemic in west Africa since 1950s. Rodents are the primary reservoir and infect humans through fecal or urine contamination of food stores or if eaten. Human to human transmission occurs.
(3) RT-PCR multiplex assay that can detect all important acute hemorrhagic fever viruses and provide information on viral loads has been described [39].

Hantavirus – RNA virus from Bunyaviridae family (Hantavirus)
(1) Hemorrhagic fever with renal and pulmonary syndromes causing potentially fatal disorder. Infection occurs through exposure to aerosolized rodent excreta or bites. The aerosolization aspect makes this virus a particular concern for bioterror.
(2) Isolated in 1979 in Korea. Now established within the Eurasian continent and the Americas. Outbreaks reported in the USA thought to be related to climatic changes, increasing vegetation and rodent population.
(3) See Lassa fever.

Lyme disease – Bacterial spirochaete Borrelia burgdorferi
(1) Early non-specific malaise can be complicated by arthritis, neurologic and cardiac problems. Tick (Ixodes spp.) transmitted disease.
(2) First recognized in USA in 1957, since then reported in many countries. Mice, rodents and birds are the intermediate hosts.
(3) DNA characterization useful for epidemiologic purposes, and to explain variable clinical features in different countries.

HIV – RNA virus from Retroviridae family (Lentivirus)
(1) Serious acquired immunodeficiency disorder.
(2) Cross species transmission from non-human primates followed by human to human spread. Detected in 1981. Evidence for the link between non-human primate and human disease includes: (i) Similar viral genomes; (ii) Prevalence in the natural host, and (iii) Geographic co-location.
(3) NAT has been helpful in all phases of this particular disease from diagnosis to prognosis (in terms of viral load determination and detection of viral resistance).


Australian bat Lyssavirus (ABLV) – RNA virus from Rhabdoviridae family (Lyssavirus)
Hendra virus – RNA virus from Paramyxoviridae family (Henipavirus)
(1) Serious viral infections with high risk for fatal encephalitis first causing a problem in Australia in the mid 1990s. Lyssavirus closely related both serologically and molecularly to the rabies virus while Hendra is related to Nipah virus.
(2) Bats are reservoirs for both viruses. Two ABLV deaths reported to date have resulted from a scratch or bite from an infected bat. Hendra infects humans via exposure to the bodily fluids of infected horses.
(3) Both phenotypic (serology) and genotypic (PCR tests) available for ABLV and Hendra.

Chikungunya (CHIKV) – RNA virus from Togaviridae family (Alphavirus)
(1) Can result in severe illness comparable to dengue fever followed by arthralgias that can last for years. Transmitted to humans by Aedes mosquito bites.
(2) Endemic to tropical Africa (first isolated in Tanzania in 1953) and Asia although recently outbreaks seen in Western Pacific, Europe and India. Main reservoirs are monkeys.
(3) Traditional tests take time or may give false positives (serology). RT-PCR useful and gives rapid result.

(a) BSE, CJD, and vCJD are dealt with in the text.
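Table 6.7 records whether each pathogen has an RNA or a DNA genome, and the chapter notes that NATs for an RNA virus require RT-PCR. The sketch below, offered only as an illustration, encodes a few rows of the table and picks the amplification step accordingly. The dictionary, the function name `amplification_method` and the DNA label for Borrelia burgdorferi (a bacterium, so its genome is DNA) are assumptions added here, not part of the table.

```python
# Genome type for some Table 6.7 pathogens, as stated in the table.
# Names and structure are hypothetical, for illustration only.
GENOME_TYPE = {
    "West Nile virus": "RNA",
    "Monkeypox virus": "DNA",
    "Ebola virus": "RNA",
    "Lassa virus": "RNA",
    "Hantavirus": "RNA",
    "Borrelia burgdorferi": "DNA",  # assumption: bacterial genome is DNA
    "HIV": "RNA",
    "Chikungunya virus": "RNA",
}


def amplification_method(pathogen: str) -> str:
    """Return the amplification step a NAT would need for this pathogen.

    RNA genomes must first be reverse transcribed, hence RT-PCR;
    DNA genomes can be amplified directly by PCR.
    """
    genome = GENOME_TYPE[pathogen]
    return "RT-PCR" if genome == "RNA" else "PCR"


if __name__ == "__main__":
    for name in ("Ebola virus", "Monkeypox virus"):
        print(name, "->", amplification_method(name))
```

A lookup like this only captures the RNA versus DNA distinction; primer design, controls and assay validation are outside its scope.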

of these are newly acquired in the west, while others remain endemic to specific countries. However, any disease may be spread through international travel, or the mass dislocation of large populations through civil unrest. There is also an increasing possibility that a number of pathogens could be used for bioterrorism. Some of the zoonoses associated with a viral hemorrhagic clinical picture can be confused with other clinical infections including malaria, leptospirosis, and N. meningitidis and in these potentially fatal conditions, a rapid screening test is essential. In terms of bioterrorism and the differential diagnosis of hemorrhagic fevers, NAT assays are presently the only option to allow rapid and sensitive diagnostic tests to be developed. If new therapeutics are required, the first step will be nucleic acid sequence analysis of the microorganism’s genome so that it can be classified and identified. Next, potential targets for vaccines or drug therapies can be established.

GLOBAL HEALTH

In an era of personalized medicine, one should not lose sight of how molecular medicine can be used to improve global health. Cheaper drugs and vaccines for all communities are an important benefit that should come from molecular-based technologies. Another would be better NATs. In this respect it is intriguing to recall how direct-to-consumer DNA testing (Chapter 5) makes effective use of the Internet. Could the Internet be one way to improve accessibility for disadvantaged communities or those in rural and remote regions? Consideration of how genomics can play a part in the bioeconomy, with its potential to generate income, improve food production and sustain a better environment, is one of the challenges now being taken up by bodies such as the OECD.

Non-Communicable Diseases

A large part of this chapter has dealt with infectious diseases and how these impact on individuals, communities and ultimately global health. To complete the story, it is necessary to consider non-communicable diseases since, apart from their primary effect on health and well being, they can also contribute to a community’s vulnerability to infectious diseases (Table 6.8).


TABLE 6.8 Some global health challenges [21,40–41].

Communicable diseases
• Major infectious disorders – AIDS, TB and malaria
• Emerging (new) pathogens, e.g. Hendra, Australian bat lyssavirus
• Re-emerging (old) pathogens, e.g. C. difficile, mumps
• Influenza A (H1N1 and H5N1 strains) and resistance
• Zoonoses accounting for about 60% of emerging and re-emerging infections
• Antibiotic resistance, e.g. multidrug resistance, extensively resistant TB, vancomycin resistant enterococcus, and drug resistance, e.g. malaria
• Agents that could be used for bioterror, e.g. anthrax, plague, smallpox

Non-communicable diseases
• Hypertension
• Tobacco and alcohol
• Hyperglycemia, physical inactivity, overweight and obesity
• Childhood underweight
• Unsafe water, poor sanitation and hygiene
• Indoor smoke from solid fuels (low-middle income countries) or urban outdoor air pollution (high income countries)
• Suboptimal breast feeding, low fruit or vegetable intake

A Perspective on global non-communicable diseases makes some sobering observations including:

1. 60% of all deaths are due to chronic diseases, with most occurring in low to middle income countries with a disproportionate number of young people dying during their productive years;
2. Non-communicable diseases are likely to have a more detrimental effect on global economic development than fiscal crises, natural disasters or pandemic influenza;
3. In the next 10 years, it is projected that China (as one example) will lose $558 billion in national income because of preventable heart disease, stroke and diabetes, and
4. To address these problems it is essential to have better evidence-based decision making, more effective regulation and behavioral interventions that are known to work.

The need to shift focus more to community-based prevention and concentrate less on attempting to cure a problem once it is established has already been highlighted [40].

Obesity

A number of the non-communicable health problems listed in Table 6.8 have obesity as a contributing factor. In the USA obesity continues to be a major health challenge; 2003–2004 estimates indicated that 66% of the US population was overweight, and 32% obese, as defined by a BMI ≥ 30 kg/m². Another estimate is that 50% of the adults in the USA will be clinically obese by 2030 [42,43].

Current understanding is that most cases of obesity are caused by a mix of genetic and environmental factors, although their relative contributions remain to be determined. The rapid development of obesity worldwide can only be an environmental effect. Nevertheless, many people in the same environment have not developed obesity and so genes must play a role. Comparisons between monozygotic and dizygotic twins, as well as other studies, show greater concordance for the BMI (a surrogate measure for obesity), i.e. there is an important genetic component to obesity, with estimates indicating that this is a strong effect (around 80%) [43].

One hypothesis, which has been around for 50 years, captures both genes and environment. It suggests that genes important for metabolism in humans evolved over time to respond to periods of famine. These so called thrifty genes allowed hunter-gatherer populations to process food into fat deposits during times of plenty, so that they could survive when food was not available. Today, these same genes

respond inappropriately when food is readily available all year round, and so obesity results. Evidence for this genetic evolutionary effect is still awaited. Other hypotheses include:

1. Fetal programming (perhaps via epigenetic changes) with maternal nutrition a key factor in how the child will grow postnatally;
2. Sedentary lifestyle, i.e. diet and lifestyle are the main contributors and from the genetics perspective this would put the focus onto metabolic enzymes;
3. Increased reproductive fitness, since the number of offspring is positively correlated with the BMI of women – i.e. adiposity increases fertility, and
4. Many others [43].

The public health response to the obesity epidemic is focused on eating less, avoiding fast foods and exercising more. However, this approach is not working. Can a more personalized genomics strategy help? Will a scientifically plausible understanding of how diet, the environment and obesity interact allow governments and individuals to take a more effective approach? One way to pursue this would be to know more about the genes involved in obesity.

The Genetics of Obesity

Our current understanding of genes and obesity is still rudimentary, so medical or motivational interventions cannot be tested. At the genetic level, obesity can be considered in three groups:

1. Monogenic, Mendelian defects, such as mutations in the melanocortin-4 receptor gene (MC4R) leading to an autosomal dominant cause for obesity in up to 6% of individuals, particularly those with more severe forms and earlier ages of onset (Box 6.5);
2. Syndromal disorders such as Prader-Willi syndrome, Bardet-Biedl syndrome and Pseudohypoparathyroidism type 1A, and
3. Complex but common forms of obesity for which the traditional association or GWAS have been used to identify risk alleles [44].

Genes or gene loci implicated in obesity have been listed in a Human Obesity Gene Map last updated in 2005 [45]. This map provides a summary of published data that are not necessarily confirmed or authenticated but gives a flavor of the rich genetic heterogeneity expected with a complex phenotype such as obesity. Observations made about the 2005 human obesity map include:

1. 176 cases involving obesity in humans are due to single gene mutations in 11 genes;
2. 253 genetic loci have been reported for obesity from genome wide scans;
3. There are 426 findings of positive associations with 127 candidate genes;
4. Association studies in 22 genes have been replicated at least five times, and
5. There are putative obesity loci on all chromosomes except Y.

Microbiome and Obesity

It is intriguing to recall the observation in Chapter 4 that the gut metagenome shows a characteristic alteration in obese subjects, and so the microbial flora may play a role in obesity that is independent of net calorie intake. In obese humans and animals (mouse, rat and pig) the ratio of the two major bacterial divisions in the gut shows a predominance of Firmicutes over the Bacteroides. This is likely to be a primary rather than secondary effect, because when germ-free mice were fed the microbiota derived from lean or obese mice, the phenotype of the recipient mice moved towards that of the donor mouse – i.e. the obese or lean phenotype was transmissible via the microbiome. One mechanism for this observation may be that the obese microbiome can extract more energy from food [46]. New targets for interventions may be found as the metagenomics story unfolds and more is found about the gut


BOX 6.5 GENES AND OBESITY.

Apart from the MC4R example given, other genes associated with obesity have a recessive mode of inheritance. They include mutations causing deficiency in leptin and its receptor (LEP, LEPR) which act via the hypothalamus to control appetite and energy expenditure. One report, concerning a child with congenital leptin deficiency, described how a sustained reduction in weight occurred following treatment with recombinant human leptin. Other genes in the leptin-melanocortin pathway are also implicated including POMC and PCSK1. A human gene FTO was shown to be implicated strongly with the BMI (body mass index) in a genome wide association study involving subjects with type II diabetes. This has been replicated in other studies and appears to be reflecting common SNP polymorphisms in intron 1, with the risk allele highly prevalent in the general population. European carriers who are homozygous for the risk allele weigh on average 3 kg more. Some clues to FTO gene function include:

1. Fto null mice are protected from obesity by increased energy expenditure;
2. FTO expression in humans is highest in the brain, particularly the cerebral cortex, and
3. Duplication of a chromosomal region containing FTO (and other genes) was associated with mild obesity and mental retardation in a case study.

It was reported recently that a reduction in brain volume in healthy elderly individuals was also associated with the same FTO allele for obesity. Perhaps this is not surprising since obesity is also a risk factor in cognitive decline and dementia. Very rare monogenic causes of obesity include mutations in genes associated with hypothalamic function such as SIM1, BDNF and NTRK2. These may lead to abnormalities in energy balance resulting in hyperphagia and a net positive energy intake [43,44].
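Box 6.5 and the obesity discussion in this chapter lean on BMI as the working measure. As a minimal sketch (not taken from the book), BMI is weight in kilograms divided by the square of height in metres, with obesity at 30 kg/m² or above as in the text. The overweight cut-off of 25 kg/m² is the conventional WHO value and is an assumption here, as are the function names and the example person (1.75 m, 90 kg). The 3 kg increment is the average FTO effect quoted in the box.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


def category(bmi_value: float) -> str:
    """Classify a BMI value.

    30 kg/m^2 is the obesity threshold used in the chapter; 25 kg/m^2
    for overweight is the conventional WHO cut-off (an assumption here).
    Categories are exclusive, so "overweight" means 25 up to, but not
    including, 30.
    """
    if bmi_value >= 30:
        return "obese"
    if bmi_value >= 25:
        return "overweight"
    return "not overweight"


if __name__ == "__main__":
    height = 1.75      # hypothetical person
    baseline = 90.0    # hypothetical weight in kg
    fto_extra = 3.0    # average extra weight quoted in Box 6.5
    for weight in (baseline, baseline + fto_extra):
        value = bmi(weight, height)
        print(f"{weight:.0f} kg -> BMI {value:.1f} ({category(value)})")
```

For this hypothetical person the extra 3 kg moves BMI from about 29.4 to about 30.4, which is enough to cross the obesity threshold; for someone further from a cut-off the same increment would not change the category.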

flora and its effects on a range of issues including obesity and inflammation.

Nutrigenetics and Nutrigenomics

Nutrition is a key environmental variable and so any starting point in understanding obesity must encompass nutrition, including its various genetic components. There is a parallel here with pharmacogenetics. Conventional dietary guidelines take consideration of age, sex, height, weight and level of physical activity but not genetic variability. Many of these parameters are used to determine drug dosage, although it is now clear that genetic variability also plays an important role (Chapter 3). A more personalized approach becomes possible through nutrigenetics – how individuals respond differently (because of genetic variation) to the same diet, for example, through changes in blood pressure or serum cholesterol, and nutrigenomics – the role of nutrients and bioactive food compounds in gene expression. The ultimate goal is the development of personalized nutrition options to ensure health and prevent disease [47]. Overarching these goals is the incredible diversity of genetic, cultural and environmental considerations in diet. Nutrigenomics can be approached through many of the omics including genomics,

epigenomics, transcriptomics, proteomics, metabolomics and so on.

Diet and Cancer

One can be sure of controversy and robust debate when the influence of diet, nutriceuticals (nutrition + pharmaceutical), complementary medicines or food additives are discussed in relation to cancer development. Knowledge of the link between cancer and diet is not new and numerous research studies provide conflicting data. This is not surprising since individual genetic variability will make the small, multiple but cumulative effects of diet on DNA damage difficult to measure or even replicate, just as association-based studies looking for genetic factors in complex diseases produce conflicting results.

One example is vitamin D deficiency, which is said to cause cancer, although this is very controversial. The US National Cancer Institute confirms a knowledge gap here, stating it does not recommend for or against the use of vitamin D supplements in reducing the risk of cancer. The D2 and D3 forms of vitamin D need to be metabolized to the active 1,25-dihydroxyvitamin D and this involves a number of enzymes (including cytochrome P450 discussed earlier in relation to drug metabolism in Chapter 3). The role of vitamin D in cancer may be better understood through a molecular approach. This is important in view of the successful public health campaigns in reducing the risk of sun-related skin cancers. Interventions recommended include the generous application of sunscreens, avoidance of sun and the wearing of wide brimmed hats, particularly in children. While successful in preventing skin cancers, there is concern (although this is controversial) that vitamin D deficiency may result. If so, there are risks to consider in terms of rickets and related bone problems, and potentially cancer.

The nutrition of cancer cells is also an area of interest. A relevant observation is known as

TABLE 6.9 Delivering growth and labor productivity through genomics [49].

Activity Examples

Agriculture
Conventional agronomic practices have helped to increase global food yields but more is needed as the world’s population increases. Genetic-based knowledge is now being added to overcome roadblocks in productivity. A major step forward occurred when whole genome DNA sequences of many plants and staple foods such as rice were published. Salinity, drought and uncontrolled flooding are some of the challenges for rice growers. Whole genome sequences are now being interrogated to identify genes that might overcome these problems without necessarily going the full but controversial next step which are GM (genetically modified) crops.

Livestock
As living standards improve so does the expectation that more protein in the form of meat will become available as food. Like agriculture, the traditional animal breeding approach has led to better yields except for fish. Mapping in the 1990s to identify genes that would enhance breeding was also effective with the more powerful SNP mapping becoming an improvement on this in the 2000s. Today, whole genome sequencing has been completed in the pig, chicken and cattle and is expected to identify important genes to improve breeding and meat yields.

Alternative fuels
Solar and wind power are being used as alternative energy sources although air transport still relies on petroleum fuels. There is now considerable interest in identifying genes in the cow rumen or the termite gut to find new enzymes that can be used to digest wood isolated from various crops and so produce sugars that can be fermented into ethanol for fuel. Algae may also be induced to overexpress ethanol-producing genes and for this all that is needed is sunlight. As a bonus algae will take in carbon dioxide from the atmosphere.

the Warburg effect. O. Warburg was awarded the 1931 Nobel Prize in Physiology or Medicine for discovery of cytochrome C oxidase. He also showed that cancer cells produce lactic acid from glucose even under non-hypoxic conditions; an observation that now bears his name. This is considered to reflect abnormal regulation of glycolysis, since this pathway is very active compared to normal cells, even in the presence of sufficient oxygen [48]. This finding might have implications for new cancer therapy targets and help us to understand better how genes are involved in cancer causation.

Bioeconomy

The OECD broadly defines bioeconomy as “the set of economic activities relating to the invention, development, production and use of biological products and processes”. It makes the prediction that biotechnology (in primary production, health and industry) can offer solutions that will lead to the emergence of a bioeconomy. The OECD as an economy-based organization considers greater social benefits globally will come from improving sustainable growth without depleting resources, and labor productivity. The latter can be enhanced through innovation, which is particularly suited to genomics as many of the future developments will be delivered in silico (Chapter 4) and so expensive infrastructure is not necessary. Some examples of how the bioeconomy will benefit from genomics and other omics can be found in Table 6.9.

The expectation is that the bioeconomy can be used to make substantial socioeconomic contributions to OECD and non-OECD countries, and from this will come better health outcomes, improved productivity of agriculture and industrial processes and enhanced environmental sustainability. In an attempt to optimize the potential of the bioeconomy, the OECD has published a long term (2030) policy agenda [50].

References

[1] CDC website on public health genomics. http://198.246.98.21/genomics/about/AAG/index.htm
[2] Starfield B, Hyde J, Gervas J, Heath I. The concept of prevention: a good idea gone astray? Journal of Epidemiology and Community Health 2008;62:580–3.
[3] WHO screening criteria as interpreted in one jurisdiction (Australia). www.health.gov.au/internet/screening/publishing.nsf/Content/pop-based-screening-fwork/$File/screening-framework.pdf
[4] Parsons EP, Bradley DM. Newborn screening programmes. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2008.
[5] Khoury MJ, McCabe LL, McCabe ERB. Population screening in the age of genomic medicine. New England Journal of Medicine 2003;348:50–8.
[6] Castellani C, Macek M, Cassiman J-J, et al. Benchmarks for cystic fibrosis carrier screening: a European consensus document. Journal of Cystic Fibrosis 2010;9:165–78.
[7] Bonham VL, Dover GJ, Brody LC. Screening student athletes for sickle cell trait – a social and clinical experiment. New England Journal of Medicine 2010;363:997–9.
[8] Wilcken B, Wiley V. Newborn screening. Pathology 2008;40:104–15.
[9] McBride CM, Koehly LM, Sanderson SC, Kaphingst KA. The behavioral response to personalized genetic information: will genetic risk profiles motivate individuals and families to choose more healthful behaviors? Annual Review of Public Health 2010;31:89–103.
[10] UK 2009 NICE guidelines in familial hypercholesterolaemia. www.nice.org.uk/nicemedia/pdf/CG071NICEGuideline.pdf
[11] Heart UK; familial hypercholesterolemia Toolkit to implement NICE guidelines. www.heartuk.org.uk/FHToolkit/
[12] Genetics in the workplace: implications for occupational safety and health Nov 2009. Department of Health and Human Services. Centers for Disease Control and Prevention. National Institute for Occupational Safety and Health. http://origin.cdc.gov/niosh/docs/2010-101/pdfs/2010-101.pdf
[13] McCanlies EC, Kreiss K, Andrew M, Weston A. HLA-DPB1 and chronic beryllium disease: A HuGE review. American Journal of Epidemiology 2003;157:388–98.
[14] McKee AC, Cantu RC, Nowinski CJ, et al. Chronic traumatic encephalopathy in athletes: progressive tauopathy after repetitive head injury. Journal of Neuropathology and Experimental Neurology 2009;68:709–35.

[15] Pleasance ED, Stephens PJ, O’Meara S, et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 2010;463:184–90.
[16] Relman DA. Microbial genomics and infectious diseases. New England Journal of Medicine 2011;365:347–57.
[17] Lapierre P, Gogarten JP. Estimating the size of the bacterial pan-genome. Trends in Genetics 2009;25:107–10.
[18] Williams CH, Stanway G. Viruses: genomes and genomics. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[19] Belyi VA, Levine AJ, Skalka AM. Unexpected inheritance: multiple integration of ancient Bornavirus and Ebolavirus / Marburgvirus sequences in vertebrate genomes. PloS Pathogens 2010;6:e1001030.
[20] Peleg AY, Hooper DC. Hospital-acquired infections due to gram-negative bacteria. New England Journal of Medicine 2010;362:1804–13.
[21] Pang T. Germs, genomics and global public health. How can advances in genomic sciences be integrated into public health in the developing world to deal with infectious diseases. HUGO Journal 2009;3:5–9.
[22] Lee JH, Jeong SH, Cha S-S, Lee SH. New disturbing trend in anti-microbial resistance of gram-negative pathogens. PloS Pathogens 2009;5:e1000221.
[23] Kumarasamy KK, Toleman MA, Walsh TR, et al. Emergence of a new antibiotic resistance mechanism in India, Pakistan, and the UK: a molecular, biological and epidemiological study. The Lancet Infectious Diseases 2010;10:597–602.
[24] Travassos MA, Laufer MK. Resistance to antimalarial drugs: molecular, pharmacologic and clinical considerations. Pediatric Research 2009;65:64R–70R.
[25] Kumar S, Kumar A, Dixit VK. Direct detection and analysis of vacA genotypes and cagA gene of Helicobacter pylori from gastric biopsies by a novel multiplex polymerase chain reaction assay. Diagnostic Microbiology and Infectious Disease 2008;62:366–73.
[26] Alcaide F, Coll P. Advances in rapid diagnosis of tuberculosis disease and anti-tuberculous drug resistance. Enfermedades Infecciosas y Microbiologia Clinica 2011;29(Supl 1):34–40.
[27] WHO 2010 recommendations: Screening donated blood for transfusion-transmissible infections. http://www.who.int/bloodsafety/ScreeningDonatedBloodforTransfusion.pdf
[28] Aguzzi A, Calella AM. Prions: protein aggregation and infectious diseases. Physiological Reviews 2009;89:1105–52.
[29] Oldani A, Cormont M, Hofman V, et al. Helicobacter pylori counteracts the apoptotic action of its VacA toxin by injecting the CagA protein into gastric epithelial cells. PloS Pathogens 2009;5:e1000603.
[30] Rasko DA, Sperandio V. Anti-virulence strategies to combat bacteria mediated disease. Nature Reviews Drug Discovery 2010;9:117–28.
[31] Simjee S, editor. Foodborne diseases. New Jersey: Humana Press; 2007.
[32] Clements ACA, Soares Magalhaes RJ, Tatem AJ, Paterson DL, Riley TV. Clostridium difficile PCR ribotype 027: assessing the risk of further worldwide spread. The Lancet Infectious Diseases 2010;10:395–404.
[33] Kaslow RA, Shrestha S, Tang JJ. Susceptibility to human infectious diseases, genetics of. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2008.
[34] Penman BS, Pybus OG, Weatherall DJ, Gupta S. Epistatic interactions between genetic disorders of hemoglobin can explain why the sickle-cell gene is uncommon in the Mediterranean. Proceedings of the National Academy of Sciences of the USA 2009;106:21242–21246.
[35] Tumpey TM, Belser JA. Resurrected pandemic influenza viruses. Annual Reviews of Microbiology 2009;63:79–98.
[36] WHO SARS risk assessment and preparedness framework October 2004. http://www.who.int/csr/resources/publications/CDS_CSR_ARO_2004_2.pdf
[37] Shi Z, Hu Z. A review of studies on animal reservoirs of the SARS coronavirus. Virus Research 2008;133:74–87.
[38] Morens DM, Folkers GK, Fauci AS. Emerging infections: a perpetual challenge. The Lancet Infectious Diseases 2008;8:710–9.
[39] Trombley AR, Wachter L, Garrison J, et al. Comprehensive panel of real-time TaqMan™ polymerase chain reaction assays for detection and absolute quantification of Filoviruses, Arenaviruses and New World Hantaviruses. American Journal of Tropical Medicine and Hygiene 2010;82:954–60.
[40] Narayan KMV, Ali MK, Koplan JP. Global noncommunicable diseases – where worlds meet. New England Journal of Medicine 2010;363:1196–8.
[41] USA’s NIAID summary of emerging and re-emerging infections. www.niaid.nih.gov/topics/emerging/pages/list.aspx
[42] Agurs-Collins T, Khoury MJ, Simon-Morton D, Olster DH, Harris JR, Milner JA. Public health genomics: translating obesity genomics research into population health benefits. Obesity 2008;16(S3):S85–94.
[43] Walley AJ, Asher JE, Froguel P. The genetic contribution to non-syndromic human obesity. Nature Reviews Genetics 2009;10:431–42.
[44] Ho AJ, Stein JL, Hua X, et al. A commonly carried allele of the obesity-related FTO gene is associated with reduced brain volume in the healthy elderly. Proceedings of the National Academy of Sciences of the USA 2010;107:8404–9.
[45] Rankinen T, Zuberi A, Chagnon YC, et al. The Human Obesity Gene Map: the 2005 update. Obesity 2006;14:529–644.
[46] Ley RE. Obesity and the human microbiome. Current Opinion in Gastroenterology 2010;26:5–11.
[47] Fenech M, El-Sohemy A, Cahill L, et al. Nutrigenetics and nutrigenomics: Viewpoints on the current status and applications in nutrition research and practice. Journal of Nutrigenetics and Nutrigenomics 2011;4:69–89.
[48] Koppenol WH, Bounds PL, Dang CV. Otto Warburg’s contributions to current concepts of cancer metabolism. Nature Reviews Cancer 2011;11:325–37.
[49] OECD’s / HUGO’s Symposium on genomics and bioeconomy, Montpellier France 17 May 2010. www.oecd.org/document/41/0,3343,en_2649_34537_45430633_1_1_1_1,00.html
[50] OECD’s: The bioeconomy to 2030: designing a policy agenda. http://www.oecd.org/document/56/0,3746,en_2649_36831301_36960312_1_1_1_1,00.htm

Note: All web-based references accessed on 21 Feb 2012.

CHAPTER 7 Development, Aging and Cancer

OUTLINE

Development
    Introduction
    Homeobox (HOX) Genes
    Other Genes
    Imprinting
    Epigenetics
    Puberty
Aging
    Introduction
    Genetic Components
    Animal Models
Oncogenesis
    Introduction
    Oncogenes
    Tumor Suppressor Genes
    miRNA Genes
    Cell Cycle
    Apoptosis
    DNA Repair
    Epigenetics
    Metastasis
Germline Cancers
    Introduction
    Colon Cancer
    Breast Cancer
Somatic Cell Cancers
    Introduction
    Hematologic Malignancies
    Solid Malignancies
    Co-dependent Technologies/Companion Diagnostics
    Viral Induced Cancers
References

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00007-4 © 2012 Elsevier Inc. All rights reserved.

DEVELOPMENT

Introduction

Development, aging and cancer are considered together because at the molecular level there are common pathways involved in all three, and each represents a form of growth (normal and abnormal). The astute reader may find subtle messages that provide new insights, particularly in relation to cancer, which unlike development and aging can occur at any time, although it is primarily a disease of older people.

Despite considerable diversity, most animal bodies have basic similarities in their bilateral symmetry around a head-to-tail axis. Therefore, it is not surprising that the genes involved in development have been well conserved during evolution. An important advance, made possible by molecular medicine, has been the identification of these genes. A better understanding of normal development has subsequently provided insight into malformations and their underlying mechanisms.

The success of this work has depended on basic biological research utilizing animal models such as the fruit fly Drosophila melanogaster, the mouse and, more recently, the zebrafish. The interspecies conservation of the important developmental genes has allowed the equivalent ones in humans to be identified and characterized. The significance of this work was acknowledged by the 1995 Nobel Prize for Physiology or Medicine being awarded to E. Lewis, C. Nusslein-Volhard and E. Wieschaus for their work on the genetic mechanisms in early embryonic development.

Homeobox (HOX) Genes

In vertebrates and invertebrates, families of genes play key roles in development. The same genes can be involved in different periods of development and in different organs [1]. Mutations in Drosophila, which caused a part of the body to be replaced by a structure normally found elsewhere, were shown in the early 1980s to involve genes called HOX (from homeobox). In Drosophila, the physical arrangement of these genes is identical to the order in which they are expressed along the head-to-tail axis of the embryo during development; i.e. the more 5′ a gene is, the closer to the posterior of the animal it is expressed in the developing body. The HOX genes are called HOM-C in invertebrates. Humans, like most vertebrates, have 39 HOX genes arranged in four clusters. These genes have evolved from a single ancestral gene by tandem duplication, and then diverged, producing the HOX cluster. From this, multiple clusters were formed. In humans, the HOX gene clusters (HOX-A to HOX-D) are found on chromosomes 7p14, 17q21, 12q13 and 2q31 respectively (Figure 7.1). An amazing observation is that in all species the genes remain aligned in the same relative order as they do in Drosophila. The degree of conservation between the genes is so high that vertebrate genes can replace their invertebrate counterparts in transgenic Drosophila embryos. HOX genes are regulated by retinoic acid and epigenetic effects and, as identified more recently, by inhibitory miRNAs.

HOX genes are involved in the development of the body pattern as well as in hematopoiesis, and the growth of the central nervous system, axial skeleton and limbs, gastrointestinal tract, and genitalia. Hematopoiesis is an ongoing activity during the animal’s lifetime. Mutations in the HOX-A13 gene produce clear structural malformations, affecting development of the hands, feet and genitals, leading to the hand-foot-genital syndrome. One mutation in HOX-D13 resembles the DNA triplet repeat described for Huntington disease (Chapter 2), although the repeat encodes polyalanine, i.e. (GCG, GCA, GCT, GCC)n. Such a repeat is unusual because it contains the four different codons for alanine. These mutations are thought to exert a dominant-negative effect – i.e. the abnormal protein from the one mutated allele interferes with the function of the remaining (normal) protein.

Not surprisingly in view of their key role in development, HOX genes can also be associated with cancer. Chromosomal translocations in some leukemias can lead to fusion proteins with leukemogenic potential, as seen with HOXA9 in myeloid leukemia. See also the BCR-ABL translocation in chronic myeloid leukemia discussed below.

Homeobox

A conserved DNA sequence is found in all HOX and other developmental genes. It is called the homeobox and is 180 bp in size. The 60 amino acid sequence encoded by the homeobox is called the homeodomain and has DNA-binding properties – i.e. homeoproteins are transcription factors. Thus, this class of genes can regulate the expression of many other genes. Comparative DNA analyses have shown that homeobox genes evolved from common ancestral genes and their subsequent divergence reflects the morphological complexity of the organism in which they are found. For example, insects such as Drosophila have a single cluster of the Hox genes, while vertebrates have four.

Animal models are used to understand the role of homeoboxes in development. For example, mutations that occur spontaneously or are created by rDNA methods in Drosophila have provided evidence that HOX might be the mammalian equivalent of the HOM-C genes, since structural deformities of the head and neck result. Despite the identification of these highly conserved genes, the search for natural mutants has been less fruitful. There are two possible explanations for this. Mutations in HOX are only expressed as an abnormal phenotype when both alleles are inactivated (in contrast to the dominant effect of PAX genes – discussed below), and paralogous genes from the various clusters can compensate for each other. Although HOX are the best studied homeobox genes, there are about 200 others dispersed throughout the genome, or found in clusters such as PAX.

Cluster (chromosome)   Genes, 5′ to 3′
HOX-A (7p)             A13 A11 A10 A9 A7 A6 A5 A4 A3 A2 A1
HOX-B (17q)            B13 B9 B8 B7 B6 B5 B4 B3 B2 B1
HOX-C (12q)            C13 C12 C11 C10 C9 C8 C6 C5 C4
HOX-D (2q)             D13 D12 D11 D10 D9 D8 D4 D3 D1
Paralogous groups: 13 to 1. Transcription: 5′ (posterior) to 3′ (anterior).

FIGURE 7.1 HOX gene clusters [1]. In vertebrates (including humans) there are 39 HOX genes organized into four chromosomal clusters. The genes can be vertically aligned into 13 paralogous groups determined by the homeobox DNA sequence homology. Paralogs are genes in the same species that are so similar in their nucleotide sequences that they are assumed to have originated from a single ancestral gene. The numbering of genes in each cluster is based on their DNA sequence similarity and relative position to each other. Functional genes are represented by colored boxes. In general, paralogous HOX genes (for example HOX-A7 and HOX-B7) are more similar to each other than adjacent genes on the same HOX cluster (for example HOX-B7 and HOX-B6). All genes are transcribed in the same direction (→). However, the 3′ genes (head of body) are expressed before the 5′ genes (tail of body). In vertebrates, the 5′ HOX-A and HOX-D genes are involved in limb development.
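The cluster layout in Figure 7.1 and the homeobox arithmetic (a 180 bp homeobox encoding a 60 amino acid homeodomain) can be checked with a short script. The following is a minimal Python sketch, not part of the original text: the gene numbers are transcribed from Figure 7.1, while the names HOX_CLUSTERS, total_genes, paralogous_groups and homeodomain_length are hypothetical and introduced only for illustration.

```python
# Gene numbers present in each human HOX cluster, transcribed from
# Figure 7.1 and listed 5' to 3'. All names below are illustrative.
HOX_CLUSTERS = {
    "A": [13, 11, 10, 9, 7, 6, 5, 4, 3, 2, 1],
    "B": [13, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    "C": [13, 12, 11, 10, 9, 8, 6, 5, 4],
    "D": [13, 12, 11, 10, 9, 8, 4, 3, 1],
}

HOMEOBOX_BP = 180  # homeobox length stated in the text


def total_genes(clusters):
    """Count the genes across all clusters."""
    return sum(len(numbers) for numbers in clusters.values())


def paralogous_groups(clusters):
    """Group genes that share a number across clusters (groups 13 to 1)."""
    groups = {n: [] for n in range(13, 0, -1)}
    for name, numbers in clusters.items():
        for n in numbers:
            groups[n].append(f"HOX-{name}{n}")
    return groups


def homeodomain_length(homeobox_bp):
    """Convert a coding length in base pairs to amino acids (3 bp per codon)."""
    if homeobox_bp % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    return homeobox_bp // 3


if __name__ == "__main__":
    print(total_genes(HOX_CLUSTERS))            # 39
    print(homeodomain_length(HOMEOBOX_BP))      # 60
    print(paralogous_groups(HOX_CLUSTERS)[7])   # ['HOX-A7', 'HOX-B7']
```

Grouping by gene number rather than by chromosomal position mirrors the vertical alignment in the figure. Gaps in a group, such as the absence of a group 7 gene in HOX-C and HOX-D, show up as shorter lists.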

Other Genes

Paired-box (PAX) Genes

Another conserved DNA sequence is present in mice and also in other species as divergent as worms and humans. This is called the paired box. The relevant genes are known as PAX (paired box). In the human, there are nine of these genes dispersed over many chromosomes. A 128 amino acid DNA-binding domain in PAX is conserved in mammals and Drosophila. Like the homeobox, this sequence has the properties of a DNA transcription factor. Some of the PAX genes also contain homeobox domains. PAX genes are involved in the development of sensory organs, the nervous system and cellular differentiation at epithelial-mesenchymal transitions [1].

A number of natural mutants involving PAX produce clinical problems (Table 7.1). The finding of a relatively large number of developmental disorders associated with PAX contrasts with the HOX genes, and is explained by the dominant nature of PAX mutations, which need only one of the two alleles to be mutated in order to be expressed. As with HOX, abnormal function in PAX (such as that occurring in association with a chromosomal translocation) can also lead to tumor formation.

TABLE 7.1 PAX genes in mammals and their association with human disease [1–3].

Gene(a)  Organ/tissue                          Chromosome  Human disease
PAX1     Skeleton, thymus                      20p         Vertebral malformations
PAX2     CNS(b), kidney                        10q         Renal coloboma syndrome
PAX3     CNS, neural crest, skeletal muscle    2q          Waardenburg syndrome; craniofacial-deafness-hand syndrome
PAX4     Pancreas                              7q          Type 2 diabetes
PAX5     CNS, B lymphocytes                    9p          Lymphoma
PAX6     CNS, eye, pancreas                    11p         Aniridia; other eye problems leading to severe visual impairment
PAX7     CNS, cranio-facial, skeletal muscle   1p          Rhabdomyosarcoma
PAX8     CNS, kidney, thyroid                  2q          Thyroid dysplasia; congenital hypothyroidism; thyroid tumors
PAX9     CNS, cranio-facial, skeletal muscle   14q         Abnormal tooth development

(a) All PAX genes have a paired domain. In addition, PAX 4, 3, 6 and 7 have a homeodomain.
(b) Central nervous system.

SOX Genes

Sox proteins constitute a large family of transcription factors, characterized by a DNA-binding HMG (high mobility group) domain of about 79 amino acids. This domain is highly conserved, and was first found in the mammalian testis-determining factor SRY gene. Hence, the name SOX derives from Sry-type HMG box. The HMG domain has an interesting effect on DNA. First it binds, then it distorts the DNA’s shape, and by so doing it allows genes in the DNA to be expressed.

There are about 20 SOX genes in 7–10 groups. These genes are involved in a diverse range of developmental and differentiation activities. Mutations in SOX genes leading to developmental abnormalities include:

1. SOX2 – anophthalmia syndrome;
2. SOX9 – campomelic dysplasia, Pierre-Robin syndrome, and
3. SOX10 – Waardenburg syndrome (also caused by mutations in PAX3).

miRNA Genes

Development involves spatial and temporal regulation of gene expression. The totipotent cells of the early embryo can produce any cell of the body. However, cells need to separate into those that will differentiate into tissues (somatic cells) and those that will remain undifferentiated to variable degrees (stem cells). In addition, as cells differentiate some will become germ cells, and these have a specific role while still able to function as stem cells. In the early embryo, it is the maternal mRNA and proteins that are available, but as the zygote develops, these need to be inactivated so the zygote can continue to develop with its own mRNA and proteins. miRNAs (Chapter 1) play important roles here through their inhibitory effects on mRNA [4].

An important protein in germline development is nanos. It is initially maternal in origin, but subsequently miRNAs inhibit its production in somatic cells while allowing it to continue in the germ cells, which then retain their stem cell status [4]. Reference [4] also provides a comprehensive table which lists miRNAs, their targets and how these interactions impact on development and function. miRNAs take on broader roles in gene regulation once somatic cells start the development pathway into specific lineages. These remain important, as shown by the developmental abnormalities that occur in animal models if miRNAs are inhibited. The potential for miRNAs to act as inhibitors or stimulators of cellular activity allows them to have multiple entry sites for tumor development (discussed below under Oncogenesis).

SRY Gene

Male and female development in humans is genetically determined. A number of genes have now been shown to be necessary for early gonadal development. Based on mouse gene knockout studies, these are now known to include Emx2, Lhx2, Lhx9 and Pax2 (Figure 7.2). This cluster of genes leads to the development of the primitive gonad that will later differentiate into the testis or the ovary. The molecular events involved in the testis determining pathway are the better understood. One gene triggering the cascade leading to the development of the testis is SRY (sex determining region of the Y). This gene is found on chromosome Yp11.3, and is intronless. Like the SOX genes described above, SRY has a conserved DNA binding domain of 79 amino acids (HMG box). Thus, SRY is likely to function as a transcriptional regulator similar to the SOX genes and is thought to be an evolutionary derivative of Sox-3.

SRY was shown to be the testis determining factor (TDF) in 1990. Confirmatory evidence for this includes: (1) About 15% of 46,XY females (i.e. sex reversals) have mutations in SRY, predominantly in the HMG box, and (2) XX mice have their sex reversed if the Sry gene is added as a transgene. However, since only a small proportion of 46,XY sex reversals have SRY mutations, and SRY is only found in mammals, other genes must be involved in testis determination. These are located on the X chromosome and the autosomes.

[Figure 7.2 flow diagram, labels in order: Primordial germ cells, yolk sac; Lhx2, Lhx9, Pax2, Emx2; urogenital ridge; SF1, WT1; indifferent gonad; Y chromosome: SRY; X chromosome and autosomes: SF1, WT1, SOX8, SOX9, WNT4; testis; AMH, AMHR, SF1; SOX9; testosterone (SRD5A2); development of male genitalia; regression of female genitalia.]

FIGURE 7.2 Development of the gonads [1,5]. A number of genes are involved in development of the gonads that extend from the endoderm yolk sac to the urogenital ridge to form the primitive (indifferent) gonad. They have been identified from mouse work. More is known about the testis pathway than about the ovary. The SRY gene is well characterized for its role in differentiation of the primitive gonad into the testis, although it is only found in mammals and so other genes are involved. The Y chromosome is needed for testicular differentiation and, in its absence, the undifferentiated gonad will develop into the ovary. Once the testis is developed, testosterone (via its activating SRD5A2 gene) will inhibit the development of female external genitalia, and the AMH, AMHR (anti-mullerian hormone and its receptor) and SF1 (steroidogenic factor 1) genes will inhibit the formation of the internal female genitalia.

Imprinting

Genomic imprinting is found in eutherian (placental) mammals, and was introduced in Chapter 2. Various theories have been developed to explain why some genes are preferentially expressed by the paternal alleles and others by the maternal alleles, but none have been proved. Imprinted genes fall into two functional groups:

1. Those involved in fetal development (in general, these are more often maternally-expressing genes), and
2. Genes with a predominant effect on placental function (usually paternally-expressing).

The maternal versus paternal effect is demonstrated by two tumors. Ovarian teratomas are considered to have arisen from a single germ cell. They are composed of tissues with all three germ layers (ectoderm, mesoderm and

endoderm) found in the fetus. All benign forms usually have a normal 46,XX (female) chromosomal complement. In contrast, hydatidiform moles affect the placenta and are diploid, most often 46,XX, with an entirely paternal chromosome complement. They are thought to have arisen from fertilization of an empty ovum by a sperm. The haploid sperm chromosome complement is then duplicated, so all chromosomes are of paternal origin. It is estimated that 1% of mammalian genes are imprinted, with abnormalities in the imprinting process associated with a number of rare genetic disorders (Table 2.10).

An important consideration of assisted reproductive technologies (ART), including in vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI), and procedures such as somatic cell nuclear transfer (SCNT) and reproductive cloning (Chapter 8), is whether these procedures lead to disturbances in imprinted genes or epigenetic programming. Other known consequences of ART include twinning, preterm birth and low birth weight (Box 7.1). Although around four million babies have now been born by IVF, these numbers may be too small to detect potentially important side effects, particularly if these are rare events in normal circumstances. The effects of such disturbances are unlikely to show up until some time into the future, or may not become apparent until children born from IVF have children of their own. Follow-up and molecular analysis will be essential to address any concerns. In this respect it is interesting that the Nobel Prize for Physiology or Medicine in 2010 was awarded to R. Edwards for his work on IVF although the first baby was born in 1978. One comment made about the delay in giving this award was that it might have reflected uneasiness about the safety of this procedure, which has been addressed by the passage of time.

BOX 7.1 ASSISTED REPRODUCTIVE TECHNOLOGIES (ART) [6].

These have proven very successful for treating infertile couples (about 10% of couples), and now account for around 1% of births in the USA. Since the first IVF (in vitro fertilization) baby in 1978, the techniques have evolved to include ICSI (intracytoplasmic sperm injection) for infertile males. In addition, the embryo can be manipulated and cultured in vitro as well as biopsied for preimplantation genetic diagnosis (PGD). Around four million children have been born as a result of IVF and some are now having children of their own. Until recently, it was believed that children born through ART developed normally and had the same frequency of malformations as the general population. However, there are now reports which suggest that this conclusion may need to be reviewed. These include studies showing an increase in preterm birth and low birth weight, which perhaps is unsurprising since there is an increased number of multiple gestations with ART. More worrying are the reports of higher risks of Angelman syndrome, Beckwith-Wiedemann syndrome and retinoblastoma. However, because these are very rare disorders it is difficult to obtain sufficient numbers for accurate risk analysis. So the findings are controversial and will need to be confirmed. If shown to be correct, it will be necessary to determine whether these adverse consequences are secondary to what is causing the infertility and/or the ART procedure itself, particularly the use of ICSI, a technique that could select immature sperm. A third explanation for adverse consequences is an epigenetic defect, particularly imprinting, since it is noteworthy that the two syndromes mentioned above are both examples of imprinted genetic disorders involving the expression of maternal genes (Table 2.10).

Epigenetics

Epigenetics is assumed to play a role in the inheritance of complex diseases and the development of cancers (see below). During early development, the epigenetic control of gene expression is important for both imprinted and non-imprinted genes. Epigenetic mechanisms are also being studied to explain how environmental factors might influence development, as discussed briefly in Chapter 2.

In normal somatic cells, epigenetic modifications through DNA methylation are stable and heritable. However, in mammals there are two critical periods during early development when the methylation patterns are reprogrammed across the whole genome to allow cells to start afresh. The two critical periods for epigenetic reprogramming are at gametogenesis and in the preimplantation embryo.

Gametogenesis: During the development of the mouse primordial germ cells, there is complete demethylation (erasure) of the whole genome, including the imprinted genes. Several days later there is divergence in the epigenetic process with methylation taking place in the male germ cells. Methylation of the female germ cells occurs only after birth as the oocytes grow. This reprogramming of the epigenetic pattern is important so that the imprints can be reset on the basis of the sex of the developing embryo. Another reason for this universal reprogramming is the removal of any acquired epigenetic modifications due to environmental or other genetic effects in the germ cells, thereby limiting the potential for transgenerational epigenetic inheritance. Perturbations in the epigenetic process during spermatogenesis can lead to male infertility [7].

Preimplantation embryo: The egg and sperm have their own epigenetic signatures that are removed (via demethylation) during the first division after fertilization, and then reset (methylated) before blastocyst implantation. Even modifications that will be re-established go through the erasure and resetting steps. However, the genes that are imprinted maintain their methylation status, thereby continuing the parental imprint. The erasure and resetting of the epigenetic mark is essential as the embryo develops and tissue specific genes need to be expressed, whereas genes required by the embryo up to the preimplantation stage can be switched off. Overall, these changes mean pluripotential cells in the embryo now begin their pathway of differentiation.

X chromosome inactivation occurs during early development, and ensures that males and females have an equivalent X chromosome gene content. X inactivation is started by XIST (X-inactive specific transcript), one of the genes at the X-inactivation center on the X chromosome. XIST produces a non-coding RNA that coats and so inhibits the inactivated X chromosome. In the active X chromosome, the XIST gene is epigenetically silenced through methylation, while it remains unmethylated (transcriptionally active) in the inactive X chromosome.

Developmental Abnormalities and Epigenetics

Cloning by somatic cell nuclear transfer (SCNT) or reproductive cloning has been studied to determine whether epigenetic reprogramming, particularly at the preimplantation embryo stage, is normal (Chapter 8). The very low yield from these two forms of experimental cloning, and the fetal or placental abnormalities found in cloned animals, might reflect epigenetic abnormalities, including the unsuccessful erasure and resetting of the donor epigenome. Evidence to date suggests this might be the case – which is relevant to the earlier discussion of IVF and fetal abnormalities [7].

Different regions of the central nervous system have complex regulatory requirements for gene expression. Therefore, it is not surprising that perturbations in the epigenetic pathway are found fairly frequently alongside neurodevelopmental disorders, as described in Table 2.9 for Rett syndrome (DNA methylation abnormality), Rubinstein-Taybi syndrome and Coffin-Lowry syndrome (histone modification defects) and ATRX syndrome (nucleosome positioning abnormality).

Puberty

Puberty is the stage in development which enables reproductive capacity. It is a complex genetic trait which has a variable onset, and is affected by both genes and the environment. Evidence based on twin and other studies suggests that the genetic component is considerable, ranging from 50–80% [8]. Fertility in the female results from the cyclic and pulsatile release of GnRH, FSH and LH; i.e. hypothalamic-pituitary-gonadal interactions. The initiation of puberty is thought to be triggered by a sustained pulsatile increase in GnRH.

The role of genes in puberty has been studied by using genetic models of delayed puberty, for which there are two naturally occurring ones:

1. Idiopathic hypogonadotropic hypogonadism – absent spontaneous sexual maturation associated with low-normal range gonadotropins, and
2. Kallmann syndrome – delayed puberty in association with anosmia.

Mutations in a number of genes are the cause of these two rare syndromes, and they have provided insights into the hypothalamic-pituitary-gonadal interactions and the olfactory pathway. However, they have not resulted in significant information about puberty, including its variation in onset.

As with all complex genetic traits, the next step was to undertake larger GWAS-type research using the female menarche as the phenotype. Thirty new genetic loci were identified in a meta-analysis of GWAS, although their collective effects were modest, accounting for about 3.6–6.1% of the variance in age of menarche. It is noteworthy that a link was made between puberty and the genes that regulate hormonal pathways, as well as with nutrition and body weight [9]. These results are consistent with the observation that menarche is highly dependent on nutritional status. Epigenetic effects have also been shown to influence puberty and reproductive capacity in animal models [8]. Overall, it is likely that the highly complex interactions of genes within functionally related networks might only be understood through a systems biology approach (Chapter 4).

AGING

Introduction

Aging can be considered a complex genetic disease that results from various interactions including G x G, G x E, G x Ep, E x Ep and G x E x Ep (G – gene; E – environment; Ep – epigenetic). As with all complex diseases, data are being accumulated about the genetic components, but we still do not know enough about the epigenetic or environmental effects. The latter may further complicate the picture if aging is influenced by evolution through selection for reproductive fitness – i.e. the genes which are important in reproduction could influence (positively or negatively) genes that will subsequently have an impact on aging.

Genetic Components

Genes are important in aging and it is estimated that about 25–35% of the human lifespan may be influenced by genetic factors. However, observations from twin studies suggest that genetic effects may not become significant until the age of about 60 years [10]. Many theories have been proposed to describe the aging process. These range from damage to DNA (nuclear and mitochondrial), changes to telomere lengths, epigenetic mechanisms and metabolic effects (Table 7.2).

TABLE 7.2 Theories of aging.

Theory: mtDNA
Explanation: DNA (particularly mtDNA with its higher mutation rate than nuclear DNA) undergoes continuous damage from the environment including UV light, radiation, chemicals and endogenous agents such as free radicals and reactive oxygen species that are by-products of the cells’ normal metabolic processes. Reactive oxygen species are of interest because of the proximity of oxygen and electrons involved in oxidative phosphorylation. Damage to nuclear DNA is repaired in a number of ways but there are fewer repair mechanisms in mitochondria. High energy dependent organs such as the brain are particularly at risk. Any changes are somatic and are not passed on to the next generation, so they appear as sporadic disorders. Transgenic mice with a mutation in the mtDNA polymerase PolgA gene have a shortened life span and premature functional decline [11].

Theory: Telomere
Explanation: Telomeres comprise (TTAGGG)n repeats that cap and protect the end of chromosomes. With each cell division the telomere loses a little of its end, which is repaired and lengthened by telomerase. In vitro, cells can continue to divide for a fixed number of times but this can be extended by increasing the telomerase activity, suggesting that the shortening of telomeres may be a factor in the cells’ aging in culture. Transgenic mice with extended telomeres live longer (provided they are protected from developing cancers). Humans with telomerase deficiency develop dyskeratosis congenita, a multisystem premature aging syndrome. Features include bone marrow failure, early graying, dental loss, osteoporosis and malignancy [10].

Theory: Epigenetic
Explanation: Epigenetic markers in a number of tissues (DNA methylation, histone H4 acetylation, histone H3 acetylation) change with age in MZ twins. At an early age twins have identical methylation patterns but with time differences develop. This epigenetic drift may explain discordance in late onset diseases in MZ twins. It may also highlight that errors in the epigenetic pathways (which do not have repair mechanisms like DNA) play a role in the normal aging process. One example is increasing methylation of the promoter regions of estrogen receptors as individuals get older [12]. This is reported in the smooth muscle of the circulatory system as well as atherosclerotic plaques occluding blood vessels. The assumption is that increased methylation plays a role in the age-related damage to blood vessels. If proven, there is a potential biomarker as well as a target for novel therapies.

Theory: Metabolic
Explanation: See text for discussion of the worm C. elegans.

One approach to identifying genetic factors in complex diseases is to study models with a similar phenotype but more straightforward inheritance. These are usually rare Mendelian type disorders (Table 7.3). Unfortunately, as with other similar comparisons, including sporadic colon cancer and its rare Mendelian counterpart known as familial adenomatous polyposis (see below), no significant breakthroughs have occurred.

Research strategies in aging also focus on an extreme model, i.e. individuals with extended lifespan, such as centenarians. In association-type gene studies of this cohort, the APOE gene has consistently shown up, although little is known about how it might work. It was already noted in Chapter 2 that the E4 allele of APOE is associated with a higher risk of Alzheimer disease. Not surprisingly, centenarians have a lower frequency of this allele. This could mean the other APOE alleles (E2 or E3) somehow contribute to longevity, or the result might be spurious since

MOLECULAR MEDICINE 7. Development, Aging and Cancer 213

TABLE 7.3 Rare genetic models of aging.

Model / Explanation

Hutchinson-Gilford syndrome (progeria)
Around 150 individuals reported worldwide; caused by mutations in the lamin A gene (LMNA). The phenotype is precocious senility starting in the first year of life and death from coronary artery disease in the teens. The abnormal protein product from the mutated LMNA gene is progerin but how this leads to premature aging is not known. It is thought that abnormalities in DNA repair, or more extensive abnormalities affecting gene expression, are responsible [10].

Werner syndrome
Described as progeria of the adult; involves accelerated aging changes leading to death before age 50 through heart disease or cancer. There is evidence of an unstable genome in this disorder caused by mutations in the WRN gene. The protein of this gene is also involved with telomere maintenance which might be another mechanism for accelerated aging [10].

Disorganized development
Brooke Greenberg is a female with chronological age 16 but physical and cognitive phenotypes equivalent to an infant. However, her bone age is estimated at 10 years and her telomere length approximates her chronological age. She has no known genetic or chromosomal abnormalities although whole genome sequencing is yet to be reported. It is proposed that her problem reflects a developmental process that is uncoordinated due to abnormalities in putative developmental regulator gene(s) [13].

there will be fewer individuals living with the E4 allele as many would have died from Alzheimer disease. Recently, a genome-wide linkage study of 279 families with multiple long-lived siblings showed possible loci of interest on chromosomes 3p24-22, 9q31-34 and 12q24. Candidate genes have been identified in these regions [14]. No doubt results from whole genome sequencing will soon emerge and provide a more in-depth view of what genes might be important in these regions. This goal received a boost in late 2011, when the Archon X Prize was modified to focus on sequencing 100 whole genomes of those aged 100 years or older (Box 4.2).

Animal Models

Results from animal studies first suggested that aging was not simply a gradual decline in the cellular processes, but was also under the control of signaling pathways and transcription factors described in other biological activities. Somewhat surprisingly, studies have also suggested that prolonging the lifespan of animals does not mean an accumulation of life-related degenerative and other diseases. Instead it seems that these are postponed. The pathways of particular interest to aging are those that involve stress-response genes or nutrient sensors [15]. In times of plenty stress levels are low and genes in these pathways support growth and reproduction. In contrast, when there is less food and stress results, the pathways support cell protection and maintenance which also extends the lifespan.

The above observations fit in nicely with dietary restriction, which is the one consistent and unequivocal finding associated with prolonging life. This has been reproducibly found in many species under different experimental conditions, and has also been shown to be related to a number of nutrient pathways including insulin/insulin-like growth factor (IGF1) signaling. In C. elegans, mutations in the equivalent of the insulin/IGF1 pathway can double the worm's lifespan. It has been shown that there are actually three nutrient sensing pathways in C. elegans that respond to different forms of food limitation, and they work through transcription factors influencing expression of many genes. Intriguingly there are reports of mutations in

the IGF1 receptor gene being overrepresented in a cohort of Ashkenazi Jewish centenarians and DNA variants in the insulin receptor gene being linked to longevity in a Japanese cohort [15].

The GenAge Database provides an extensive catalog of genes (aging, longevity, differentially expressed) from humans and model organisms. Its statistics page shows that C. elegans has 555 gene entries related to aging while the human has 261. Other model organisms include D. melanogaster (75 genes), mouse (68 genes) and yeast (87 genes) [16].

ONCOGENESIS

Introduction

One of the earliest breakthroughs in cancer research came in 1910 when P. Rous implicated viruses in cancer by showing that a filterable agent (virus) was capable of inducing cancers in chickens. At about this time (1914), T. Boveri proposed his chromosomal theory of cancer. These early discoveries were pivotal but had to wait over 50 years before they were confirmed and the molecular cancer era started (Table 7.4).

Today, evidence for a genetic component in cancer includes:

• Normal cells can transform into tumors by gene transfer studies with oncogenes.
• Individuals with a genetic defect in DNA repair have an increased risk of cancer.
• DNA mutagens, such as chemicals or physical agents, elicit tumors in animals.
• Structural chromosomal rearrangements can predispose a subject to tumor development.
• Somatically acquired DNA mutations can resemble those seen in familial cancers.
• In vivo gene manipulation can produce tumors in transgenic mice.

Just like genetic disease, cancers demonstrate two distinct inheritance patterns:

1. Rare but highly penetrant genes transmitted as Mendelian disorders, such as familial adenomatous polyposis. This affects young adults, whose cancer develops at around 40 years of age (see below), and
2. Common but low penetrant genes, similar to what is proposed for complex genetic disorders. Pathogenesis involves the

TABLE 7.4 Fundamental discoveries contributing to our understanding of cancer pathogenesis [17–20].

Discovery / Implications

1910: Viruses cause cancer
Filterable agent (virus) shown to induce cancers in chickens. It was called RSV (Rous sarcoma virus). P Rous awarded a Nobel Prize in 1966 for this work.

1914: Chromosomal theory of cancer
T Boveri proposed that tumors grew because of abnormal segregation of chromosomes to daughter cells. Other predictions later proven to be correct, e.g. cell cycle checkpoints, oncogenes, tumor suppressor genes, predisposition and genetic instability.

1960: Philadelphia chromosome
D Hungerford suggested that a chromosomal rearrangement (Philadelphia chromosome) in chronic myeloid leukemia caused this disease. As new cytogenetic techniques developed, particularly banding, it became possible to show that there are consistent changes in chromosomes 9 and 22 (the components of the Ph chromosome) and how this translocation caused leukemia.

1971: Two-hit hypothesis for cancer
Based on epidemiological data and observations of retinoblastoma and Wilms tumor, A Knudson proposed that two hits are needed for tumor development (Box 7.2).

1976: First proto-oncogene described
In a normal avian model, the cellular equivalent of the src retroviral gene discovered. The name src used because of its similarity to the transforming retroviral gene (oncogene) of the Rous sarcoma virus (src = abbreviation for sarcoma). The normal version of this gene called a proto-oncogene. In 1982 DNA from a bladder cancer cell line cloned and shown to induce cancerous transformation in other cells. Mutations in the normal precursor gene HRAS (a proto-oncogene) produced the homologous tumorigenic oncogene. For their work on oncogenes J Bishop and H Varmus were awarded a Nobel Prize in 1989.

1976: Tumor clonal evolution theory
P Nowell proposed that tumors arise from a series of evolutionary steps. They start from an initiating event that moves a normal cell into the tumor pathway. Subsequently, additional mutations lead to genomic instability and different tumor clones are produced with growth advantages. Finally, the tumor has multiple clones causing the genetic heterogeneity observed in cytogenetic studies.

1983: Epigenetic mechanisms implicated
Reductions are observed in the methylation content (and so potential gene activation) in colon cancer cells compared to normal. Subsequently cancer epigenome shown to have global changes in DNA methylation (genome wide hypomethylation and site specific CpG promoter hypermethylation) and histone modifications. This provides other mechanisms for tumor development.

1986: First tumor suppressor gene cloned (retinoblastoma gene)
Cellular sequences that repress or control growth led to the finding of tumor suppressor genes (TSGs). Loss or mutation of TSG DNA through genetic and/or acquired events produced unregulated cellular proliferation and hence neoplasm. Many oncogenes and TSGs would emerge, with an important one TP53 found in 1989. It is the most commonly mutated gene in human cancers.

1991: APC gene cloned
The causative gene for the Mendelian genetic disorder familial adenomatous polyposis causing colon cancer discovered. Mutations in this gene also found in sporadic colon cancer although they alone do not explain the pathogenesis for sporadic colon cancer.

1993: Defects in mismatch repair gene
A second class of genes leading to another cause of genetic colon cancer (hereditary nonpolyposis colon cancer) found. They provide additional insight into the role of DNA repair in cancer.

1994: Breast cancer genes cloned
Major breast cancer genes (BRCA1, BRCA2) isolated. Initially thought to be highly penetrant but subsequently variable penetrance exhibited. Disappointingly, they give little insight into the common sporadic forms of breast cancer.

Late 1990s: Telomerase and cancer
Telomerases add a TTAGGG repeat to the telomere to prevent its shortening. Activation of telomerase can lead to tumor formation. For their work on telomeres and telomerase, E Blackburn, C Greider, J Szostak were awarded a Nobel Prize in 2009.

2001: Cyclins and CDK as regulators of the cell cycle
Controlling the phases of the cell cycle is essential and any perturbations can lead to tumor development. For their work on two important cell cycle regulatory molecules L Hartwell, R Hunt, and P Nurse were awarded a Nobel Prize in 2001.

2002: ncRNA and leukemia
A class of ncRNA (miRNAs) was shown to be deleted/down-regulated in chronic lymphocytic leukemia. miRNAs can function as oncogenes or tumor suppressor genes depending on cell type involved. For their work on RNAi (another class of ncRNA) A Fire and C Mello were awarded a Nobel Prize in 2006.

2010: Launch of International Cancer Genome Consortium
Multinational consortium to catalog 50 different tumors by whole genome sequencing. In addition, proposed to generate transcription and epigenomic datasets for some tumors. The first whole genome sequence for a cancer (acute myeloid leukemia) reported.


accumulation of mutations in many genes over time and the environment plays a role. An example would be the more common sporadic forms of colon cancer that typically present in people in their 60s.

For many years, tumorigenesis was hypothesized to be a multistep process, but only the application of recombinant DNA techniques provided evidence for this. It is now possible to identify molecular (DNA) changes responsible for the initiation, promotion and progression of cancers. The ability to define mutations at the DNA level has also enhanced the accuracy of diagnosis. Therapeutic options based on knowledge of the DNA changes in tumor tissue are now used in personalized medicine to treat cancer (see Somatic Cell Cancers below).

Cancer is heterogeneous in its presentation, clinical type, biologic progression, and treatment options. These involve seemingly multifaceted interactions. However, at the molecular level it is now apparent that cancers share similar pathways, thereby allowing a better understanding of pathogenesis, and giving opportunities for targeted therapies with new drugs. There are a number of breakdowns of normal cell function in cancers. These give cancer cells the ability to survive, proliferate and disseminate. They have been described as the Hallmarks of Cancer [23] and include:

1. Sustaining proliferative signaling;
2. Evading growth suppressors;
3. Resisting cell death;
4. Enabling replicative immortality;
5. Inducing angiogenesis;
6. Activating invasion and metastasis;
7. Re-programming energy metabolism, and
8. Evading immune destruction.

Two enablers are required for the above breakdowns to occur:

1. Genomic instability and mutation, and
2. Tumor promoting inflammation.

As will be described below under familial adenomatous polyposis, the multistep progression of cancer relies on a series of mutations in key genes, with each step allowing the next to take place. A more recently recognized enabler is the ability of tumor cells to provoke host immune responses. These lead to an enhancement of tumor progression via the hallmarks described. Although there would seem to be many different genes involved in development of cancer, they fall into a limited number of classes, and the same genes appear to play a role in many cancers.

An individual's response to cancer involves many variables, such as the state of their immune system, their nutrition and well being, the extent of disease, their response to treatment, and the development of drug resistance. Genetic effects utilize different classes of genes including oncogenes, tumor suppressor genes, miRNA genes, apoptotic genes, repair genes and epigenetic modifications to DNA.

Oncogenes

The RNA tumor viruses (retroviruses) provided the first proof that genetic factors can play a role in carcinogenesis. Retroviruses have three core genes (env, gag – coding for structural proteins and pol – codes for reverse transcriptase) (Figure 7.3). Reverse transcriptase is an enzyme that allows RNA to be converted into cDNA. In this way, the retrovirus can make a DNA copy of its RNA which can then become incorporated into the host's genome. D. Baltimore, R. Dulbecco and H. Temin were awarded the Nobel Prize for Physiology or Medicine in 1975 for their work on reverse transcriptase and retroviruses.

A fourth gene (oncogene) gives retroviruses the ability to induce tumor growth in vivo or to transform cells in vitro. In the latter situation, cells lose their normal growth characteristics and acquire a neoplastic phenotype. Viral DNA and RNA sequences having transforming properties are called viral oncogenes (v-onc). Their names are derived from the tumors in which they were


[Figure 7.3 diagram, shown 5′ to 3′: (1) CAP–GAG–POL–ENV–AAA; (2) GAG–POL–ENV–SRC; (3) LTR–GAG–POL–ENV–SRC–LTR. Virion labels: envelope glycoproteins (ENV), internal capsid proteins (GAG), RNA genome, reverse transcriptase (POL).]

FIGURE 7.3 The structure of a retrovirus. (1) RNA tumor viruses (retroviruses) have an RNA genome. This RNA has two features of eukaryotic mRNA, i.e. a capped 5′ end and a poly-A tail at the 3′ end. Retroviral RNA codes for three viral proteins: (i) a structural capsid protein (gag) which associates with the RNA in the core; (ii) the enzyme reverse transcriptase (pol), and (iii) an envelope glycoprotein (env) which is associated with the lipoprotein envelope of the virus. (2) Transforming retroviruses have an oncogene. In the example here the oncogene is that of the Rous sarcoma virus (src). (3) Retroviruses are so named because they have an RNA genome and are able to replicate through formation of an intermediate (provirus) which involves integration of the retroviral genome into that of host DNA. The provirus has LTRs (long terminal repeats) on either side of the RNA genes. The LTRs are several hundred base pairs in size and insert adjacent to smaller repeats derived from host DNA.

first described. For example, v-sis, Simian sarcoma; v-abl, murine Abelson leukemia; v-mos, Moloney sarcoma; v-ras, rat sarcoma; but v-src is named for the sarcoma-producing (Rous sarcoma) virus. Viral oncogenes have cellular homologs called cellular oncogenes (c-onc).

Oncogenes are important for cancer development because in their normal state (where they are called proto-oncogenes) they provide the cell with stimulatory signals. Aberrant function leads to uncontrolled stimulation which is dominant in type; i.e. only one of the two alleles needs to be abnormal. There are many proto-oncogenes. Their roles are complex and may involve interactions with other proto-oncogenes. Proto-oncogenes can act at multiple stages in cell growth and they are activated into becoming oncogenes by different pathways (Figure 7.4) [24]. Some examples of proto-oncogenes are given in Table 7.5.

Tumor Suppressor Genes

The identification of proto-oncogenes and oncogenes in the pathogenesis of cancer was an exciting development in molecular medicine. However, only about 20% of human tumors


showed changes in these genes. Oncogenes were not abnormal in the inherited cancer syndromes. Thus, other molecular explanations were sought and these led to the identification of the tumor suppressor genes (TSGs).

[Figure 7.4 diagram labels: Activators (viruses, chromosomal rearrangements, mutations, amplification); Oncogenes; growth factors, growth factor receptors, signal transduction, transcription factors, chromatin remodeling, apoptosis regulators; cell proliferation, apoptosis.]

FIGURE 7.4 The oncogene pathway. Proto-oncogenes can be activated into oncogenes by various changes to DNA. Oncogenes can then disrupt normal cell proliferation or apoptosis through interference with a number of normal cellular mechanisms.

Early experimental evidence for TSGs came in the late 1960s, from murine cell hybrids formed by fusions between normal and tumor cells. These were found to revert to the normal phenotype. Subsequently, as the hybrid clones were propagated in culture, the tumor phenotype became re-established. This effect was seen in a wide range of tumor lines and was considered to indicate the influence of TSGs derived from the normal cells. Subsequent loss of chromosomes, which occurred on serial passage of cell lines, enabled reversion to the neoplastic phenotype when the TSGs were lost. Sophisticated molecular techniques, particularly gene knockouts, enable specific genes to be inactivated in transgenic mice. These studies have shown definitively that genes can function as tumor suppressors.

In contrast to oncogenes that work through a gain-of-function (stimulation), the TSGs normally inhibit cellular activities, and so in promoting cancer they work through loss-of-function (inhibitory) mechanisms. Their effects are recessive rather than dominant – i.e. both alleles need to be inactivated. Some examples are found in Table 7.6.

The different roles played by TSGs are still being defined. Ways in which these genes in their wild-type (normal) configurations can prevent the development of cancer include:

1. Inhibiting cell proliferation;
2. Inducing differentiation or cell death, and
3. Stimulating DNA repair.

These three will be described in more detail below. However, it should be noted that the genes involved in regulating the cell cycle and apoptosis, for example, TP53, can also indirectly contribute to DNA repair since they slow down the cell cycle (or stimulate apoptosis) and so assist the DNA repair enzymes to correct any defects.

The two-hit model for carcinogenesis (Box 7.2) works well with a tumor such as hereditary retinoblastoma which involves the TSG RB1. However, not all TSGs play the same dominant role in tumorigenesis and this suggested there are different types [25]:

• Gatekeeper TSGs play a central role in the regulation of cellular proliferation by


TABLE 7.5 Examples of proto-oncogenes.

Class / General function / Examples

Growth factors
General function: Act via cell surface receptors to induce cellular division.
Examples: SIS – codes for the β chain of the platelet derived growth factor (PDGF).

Receptor tyrosine kinases
General function: Binding to their membrane receptors is the first step in delivery of mitogenic signals to the cell's interior to initiate cell division.
Examples: erbB family – ERBB1 – epidermal growth factor receptor 1; ERBB2 (or HER-2/neu) – epidermal growth factor receptor 2.

Signaling
General function: Transduction: method by which the extracellular growth factor at the cell surface receptor transfers (transduces) its signal to the nucleus by a number of ways including via G proteins.
Examples: RAS – membrane associated G protein that activates signaling pathways.

Transcription factors
General function: Proto-oncogenes can encode nuclear binding factors and in this way control gene expression.
Examples: MYC – major role in control of cell proliferation and apoptosis. This is the most commonly associated oncogene in human tumors.

TABLE 7.6 Examples of tumor suppressor genes.

Class / General function / Gene(s)

Cell surface proteins
General function: Adenomatous polyposis gene (familial colon cancer)
Gene(s): APC – interacts with β catenin

Cell cycle factors
General function: Inhibitors of cell cycle progression
Gene(s): TP53 – acts as transcription factor in two key cellular pathways involved in damage or stress: cell cycle and apoptosis (Box 7.3). RB1 – cell cycle regulator

Apoptosis
General function: Programmed cell death
Gene(s): BCL2 – Opposite effect to TP53, i.e. blocks apoptosis and so prolongs a cell's life.

DNA repair
General function: DNA repair of double-stranded breaks
Gene(s): ATM (ataxia telangiectasia gene)
General function: Mismatch repair
Gene(s): MLH1 (hereditary non-polyposis colon cancer gene – Box 7.4)

inducing cell death or cell cycle arrest in cells that have accumulated cancer-forming mutations. Mutations in gatekeepers are highly significant and lead directly to tumor formation. Restoring normal gatekeeper function should control tumor development. Examples include RB1, TP53 and APC (see Colon Cancer below).
• Caretaker TSGs do not directly regulate cellular proliferation, but play a more global role maintaining genome integrity by protecting against damage and mutations. Examples are the DNA repair genes and genes involved in telomere maintenance. Mutations in these genes do not lead directly to tumor formation but set up the unstable genetic environment for tumors to develop through mutations in gatekeeper genes or proto-oncogenes.
• Landscaper TSGs have their effect through the production of an abnormal stromal environment. This milieu promotes tumor development. An example might be the increased risk of colon cancer in ulcerative colitis. This inflammatory bowel disorder


BOX 7.2 TWO-HIT MODEL FOR TUMORIGENESIS.

Most tumors occur in adult life and their frequency increases with age, consistent with an accumulation of DNA mutational events. Cancers occur in childhood much less frequently. In these circumstances, it is thought that a different mechanism is operational. Two examples are retinoblastoma (origin from primitive retinal stem cells) and Wilms tumor (origin from primitive renal stem cells). Generally these two childhood tumors occur sporadically, but in some cases they are inherited in a Mendelian fashion and associated with multicentric or bilateral tumor formation. Today, we know that the latter have arisen because of germline mutations in TSGs. However, it was only in the early 1970s that A Knudson proposed a two-hit model for tumorigenesis based on epidemiological studies of these two tumors.

Knudson's hypothesis required, in either the sporadic or genetic forms of retinoblastoma, the tumor cells to acquire two separate genetic changes in DNA before a tumor developed. The first, or predisposing, event could either be inherited through the germline (familial retinoblastoma) or arise de novo in somatic cells (sporadic retinoblastoma). The second event occurred in somatic cells. Thus, in sporadic retinoblastoma both events arose in the retinal (somatic) cells. In familial retinoblastoma the individual had already inherited one mutant gene and required only a second hit affecting the remaining normal gene in the somatic cells. The frequency of somatic mutations was sufficiently high that those who had inherited the germline mutation were likely to develop one or more tumors. On the other hand, sporadic forms of the tumor required two separate somatic events. The second hit must occur in the same cell lineage that has experienced the first or predisposing hit. The probability of this is relatively low, and so sporadic forms of the tumor occur later in life and have the additional features of being unifocal and unilateral.

The discovery of the retinoblastoma tumor suppressor gene (RB1) and then Wilms tumor suppressor gene (WT1) validated Knudson's hypothesis. It was also shown that in most cases the germline hit involved a point mutation that had been inherited, and the second or somatic cell hit was an acquired deletion in the remaining wild-type allele. The deleted second allele was detected because of loss of heterozygosity – i.e. it could be shown by DNA testing that one of the two expected polymorphic DNA markers present in germline DNA was found to be missing in somatic cell DNA because of the acquired deletion at this locus. RB1 is a tumor suppressor gene whose protein (pRb) functions as a key regulator of the cell cycle pathway, i.e. a gatekeeper, blocking progression from the G1 phase of the cell cycle. Mutations or deletions of RB1 would lead to permanent cell cycle dysregulation eventually leading to tumorigenesis. The two-hit hypothesis is not accepted by all and in a recent report, the observations made from unilateral retinoblastoma cases (which are more likely to be sporadic cases) are used to argue that Knudson's theory is an oversimplification [21,22].
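The arithmetic behind the two-hit model can be illustrated with a minimal sketch. It assumes that each hit occurs independently with the same small probability per cell and that the number of tumor-initiating cells follows a Poisson distribution. The cell count and hit rate below are illustrative assumptions chosen only to show the contrast between the familial and sporadic forms; they are not values taken from the text, and the function names are hypothetical.

```python
import math

def expected_tumors(n_cells, hit_rate, hits_needed):
    """Expected number of cells that acquire every hit still required."""
    return n_cells * hit_rate ** hits_needed

def prob_any_tumor(n_cells, hit_rate, hits_needed):
    """Poisson approximation to P(at least one cell completes its hits)."""
    return -math.expm1(-expected_tumors(n_cells, hit_rate, hits_needed))

# Illustrative assumptions only (not from the text):
N_CELLS = 3_000_000   # susceptible retinal precursor cells
HIT_RATE = 1e-6       # chance that one RB1 allele is inactivated in a given cell

# Familial form: first hit inherited, so each cell needs one further hit.
familial = prob_any_tumor(N_CELLS, HIT_RATE, hits_needed=1)
# Sporadic form: both hits must arise in the same somatic cell lineage.
sporadic = prob_any_tumor(N_CELLS, HIT_RATE, hits_needed=2)

print(f"familial: {familial:.2f}  sporadic: {sporadic:.1e}")
```

Under these assumed numbers a carrier of the germline mutation is very likely to develop at least one tumor, whereas the sporadic route is several orders of magnitude less likely, which is consistent with the later onset and unifocal presentation described in the box.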

leads to damaged epithelium which is constantly replaced and the regeneration required provides the environment for tumor development.

Another way in which TSGs can work is by limiting a cell's proliferative capacity by inducing it to undergo differentiation. In this way, the relatively greater mitotic activity seen in the


BOX 7.3 TUMOR SUPPRESSOR GENE TP53.

This gene (also written p53 or P53) has been described as the most significant cancer-related gene. It is a TSG implicated in both inherited and sporadic cancers and is the most frequently altered TSG in human non-hematopoietic malignancies. The gene's importance is suggested by its evolutionary conservation, with mouse and human proteins having around 80% homology. The gene is expressed in all cells, and functions as a TSG by inhibiting the transformation of cells in culture by oncogenes, and the formation of tumors in animals. Transgenic mice that have both TP53 genes inactivated by gene knockout are normal at birth but, by 6–9 months of age, 100% develop a range of cancers. In humans, inheritance of a mutated TP53 gene can produce the serious multi-organ cancer syndrome called Li-Fraumeni syndrome, associated with sarcomas, breast and brain cancers and leukemia. Cancers shown to have mutations affecting the TP53 gene include colon, lung, brain, breast, melanoma, ovary and chronic myeloid leukemia in blast crisis. Defects observed lead to loss of both alleles in 75–80% of cases with one defect often a deletion and the second a missense point mutation which can produce an abnormal protein. Another way to interfere with TP53 is through the binding of exogenous viral antigens or cellular oncogenes to the normal p53 protein.

TP53 plays a key role in inhibiting tumor development through multiple mechanisms:

1. Checkpoint control of the cell cycle;
2. Induction of apoptosis, and
3. Stimulation of the DNA repair mechanism.

There is also evidence that TP53 may have a negative effect on angiogenesis, an essential property for solid tumors to progress. When DNA is damaged, TP53 mediated pathways attempt to repair the injury through arrest of the cell cycle and stimulation of the DNA repair mechanisms. When repair is not successful, TP53 stimulates the apoptotic pathway to remove the damaged cell. In normal cells, the level of p53 protein is low but following exposure of the cell to stresses such as DNA damaging agents (e.g. irradiation or certain chemicals) or hypoxia, the level of the p53 protein dramatically increases. The 53 kDa protein encoded by TP53 is a transcription factor which can regulate a number of genes at the DNA level. p53 blocks progression of the cell cycle in the G1 phase. This allows DNA repair to occur prior to entry into the S phase. The cell cycle effect of p53 ensures that damaged DNA is not allowed to replicate, hence it has been called the guardian of the genome. Mutant TP53 forms demonstrate altered growth regulatory properties and can also inactivate normal (wild-type) p53 protein. This is a dominant negative effect: inactivation of one of the two tumor suppressor loci produces what appears to be a dominant phenotype because the mutant protein inhibits or interferes with the product from the remaining normal allele [21].

undifferentiated cell gives way to an end-cell that divides less frequently. As well as preventing the formation of tumors, an additional role for TSGs lies in normal development as discussed earlier in this chapter.

miRNA Genes

The increasing significance of ncRNA in gene function was discussed in Chapter 1. miRNA is one type of ncRNA (Table 1.8). It inhibits translation and can also produce some degradation


BOX 7.4 HEREDITARY NON-POLYPOSIS COLON CANCER (HNPCC OR LYNCH SYNDROME).

This is a colon cancer that is transmitted as an autosomal dominant genetic disorder. Affected individuals are at high risk of early onset colon cancer which is predominantly located in the right colon. Although rare (about 1–4% of colorectal cancers) it is important to detect so that the affected individual (and other at-risk family members) can be monitored and the colon removed prior to cancer becoming established. Lifetime risk for developing cancer is about 60–70%, with onset usually occurring in the mid forties. Lynch syndrome is also associated with extracolonic tumors including stomach, small bowel, biliary tract, uterus, ovary and kidney. Because HNPCC is difficult to diagnose clinically, a set of criteria known as the Amsterdam criteria have been defined but cases are still missed.

HNPCC is caused by mutations in genes required for mismatch repair of DNA. For the reasons already mentioned, DNA mutation testing is demanding and so a screening test based on DNA microsatellite instability can be used to identify which cancers might represent HNPCC. An alternative approach is to use immunohistochemical staining for protein products of the mismatch repair genes in tumor samples. Finding microsatellite instability does not mean HNPCC is present because it is also found in sporadic cancers. Nevertheless, it is a pointer to Lynch syndrome and makes detection of mutations in the mismatch repair genes more likely. The 15% of sporadic colorectal cancers with microsatellite instability are thought to have occurred through inactivation of the mismatch repair genes by epigenetic changes such as hypermethylation [21].
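The logic of a microsatellite instability screen can be sketched as a comparison of allele sizes between tumor DNA and the same patient's normal DNA. The sketch below is only an illustration under stated assumptions: the marker names and allele sizes are invented, and the cut-off of two or more unstable markers in a five-marker panel follows a widely used convention rather than anything specified in the text. A marker is counted as unstable when the tumor shows an allele size absent from the normal sample.

```python
def unstable_markers(normal, tumor):
    """Markers where tumor DNA shows allele sizes not seen in normal DNA."""
    return sorted(
        marker for marker in normal
        if set(tumor.get(marker, ())) - set(normal[marker])
    )

def classify_msi(normal, tumor, high_threshold=2):
    """Label a tumor by its count of unstable markers (assumed threshold)."""
    count = len(unstable_markers(normal, tumor))
    if count >= high_threshold:
        return "MSI-high"
    if count > 0:
        return "MSI-low"
    return "stable"

# Hypothetical five-marker panel; allele sizes in base pairs are invented.
normal_dna = {"m1": [120, 124], "m2": [150, 150], "m3": [98, 102],
              "m4": [200, 204], "m5": [175, 179]}
tumor_dna = {"m1": [116, 120, 124], "m2": [146, 150], "m3": [98, 102],
             "m4": [200, 204], "m5": [175, 179]}

print(unstable_markers(normal_dna, tumor_dna))
print(classify_msi(normal_dna, tumor_dna))
```

As the box notes, a positive screen of this kind is only a pointer: it does not distinguish Lynch syndrome from sporadic tumors in which mismatch repair has been silenced epigenetically, so mutation testing of the mismatch repair genes is still needed.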

of mRNA – i.e. miRNA inhibits gene expression. miRNAs are thought to regulate over 30% of mRNAs and play important roles in development, differentiation, cell proliferation, apoptosis and responses to stress. Tumor profiles based on gene expression, identified through microarrays or RT-PCR, have demonstrated that both loss and gain of miRNA expression are associated with tumor development. It has also been shown, in regions of chromosomal translocations where there are no apparent oncogenes or TSGs present, that miRNA genes are located in the relevant breakpoints.

The first clear example of miRNA involvement in cancer was chronic lymphocytic leukemia, as noted in Table 7.4. Here the expression of two miRNAs (miR-15a and miR-16-1) was inhibited. The downstream effect of these miRNAs, which act as TSGs, is to induce apoptosis and so inhibit tumor formation. Specifically, miR-15a and miR-16-1 are thought to inhibit the anti-apoptotic BCL2 oncogene which is important for the survival of the malignant lymphocytes. Therefore, loss of function of these two miRNAs will promote abnormal tumor cell survival. In contrast, the miR-17-92 cluster was shown to be over-expressed in some lymphomas. Thus, whether miRNAs function as oncogenes or TSGs relies to some extent on the associated cell or tissue. The list of miRNAs functioning as oncogenes is shorter, but involves a wide range of malignancies both hematologic and solid. For example, miR-21 is up-regulated in many cancers where it appears to block apoptosis [26].

The expression of genes coding for miRNAs can also be altered (down-regulated) via epigenetic modification of their promoters through CpG methylation. Demethylating drugs such

as 5-aza-2’-deoxycytidine can restore the function of these miRNAs. Treatments based on replacing inhibited miRNAs or turning off up-regulated ones are being trialed either through direct introduction of the miRNA or with the use of gene expression vectors (Chapter 8). As noted earlier, there are many different genes and multiple changes in genes in cancers. However, at the molecular level this heterogeneity becomes less apparent because common pathways can be identified. Since a single miRNA can inhibit many mRNAs its effects would spread across multiple pathways. This potential advantage of miRNA-based therapy would need to be balanced by the side effects that would emerge if normal physiological pathways were also inhibited.

Cell Cycle

The cell cycle consists of a series of highly ordered events leading to duplication and division of a cell. The process requires production of new DNA, segregation of chromosomes, mitosis and then division. Extracellular signals control entry into, exit from and progress of the cell cycle. At key points in the cell cycle, signaling pathways monitor the progress of upstream events prior to a cell progressing further. These monitoring stages in the cell cycle are often called checkpoints. The cell cycle is divided into five components:

G0 – resting phase with cells having their 2n (diploid) DNA content;
G1 – cell growth phase (2n);
S – DNA synthesis phase (4n);
G2 – cell growth phase (4n);
M – mitotic phase (4n→2n) (Figure 7.5).

FIGURE 7.5 Cell cycle. The cell cycle has four distinct stages: G1, S, G2 and M. G1 and G2 (G = gap) are growth phases that prepare the cell for the important S (DNA synthesis) or M (mitosis) phase. Each phase needs to be completed in the correct order. There is also the G0 or quiescent phase where the cell has left the cycle and stopped dividing. The markers indicate the position of two key checkpoints, although there are others. At these checkpoints the cell evaluates progress and can arrest the cycle if repair is needed. Checkpoints can also lead to activation of apoptosis if cell damage cannot be repaired. Different cyclins are produced at various stages of the cell cycle. Growth factors and mitogenic signals induce cells to leave the quiescent (G0) phase and progress through G1. Oncogenes promote growth, and are particularly involved with the G1 phase. TSGs inhibit cell cycle and promote apoptosis particularly in the S phase. Repair genes do their work during S and G2. In a rapidly proliferating somatic cell, the entire cell cycle can take up to 24 hours to complete. G1 is the longest phase (about 12 hours); S phase about 7 hours; G2 4 hours and mitosis 1 hour.

A critical step in control of the cell cycle comes at the G1 to S transition. After this point, the cell is irreversibly committed to the next cell division.

Cellular and tissue integrity requires an exquisite balance between DNA synthesis and cell proliferation versus growth arrest, DNA repair or apoptosis [27]. This is accomplished through a series of positive and negative signals that determine whether cells will continue to live or die. The complex steps involved in the cell cycle are now better understood at the molecular level.

The cyclins are key components that have stimulatory effects on the cell cycle. They work in concert with their catalytic partners the cyclin dependent kinases (CDK) to hyperphosphorylate

the products of the retinoblastoma TSG family. As a result, the E2F transcription factor is released and leads to up-regulated expression of genes that are crucial for cell cycle progression. Here the retinoblastoma pathway is fundamental to normal cellular proliferation as well as tumor formation. Not surprisingly, retinoblastoma proteins play a role in a range of tumors apart from the classic example of genetic retinoblastoma.

In contrast, inhibitory influences on the cell cycle come from a series of checkpoints that respond to multiple internal and external stimuli. Checkpoint control pathways sense damage and respond to it. Mutations in these pathways lead to genetic instability. Two main families of CDK inhibitors exist: CIP/KIP and INK4. The best studied cell cycle checkpoints are found at G1-S and G2-M, although others exist [27]. The TSGs can reduce the potential for tumor formation by interfering with the progress of the cell cycle until damaged DNA is repaired. One of the key players here is TP53, which responds to DNA damage by stimulating the expression of multiple proteins including p21 that induces G1 phase cell cycle arrest to allow time for DNA repair mechanisms.

Apoptosis

Development as well as ongoing maintenance of many adult tissues relies on a balance between proliferation, differentiation and cell death. Cell death can occur by necrosis or apoptosis (Figure 7.6). Apoptosis is a highly

[Figure 7.6 diagram: Cell death. Necrosis (“dirty”): overwhelming external cell injury or trauma, cell lysis, content release, inflammation. Apoptosis (“clean”): development or unwanted cells, genetic control, morphological changes, biochemical changes (caspases), apoptotic bodies.]

FIGURE 7.6 Mechanisms for cell death [28]. Cells will die by necrosis in response to significant trauma or injury. This is a dirty death because cells lyse and release their contents into the extracellular space leading to inflammation. This can cause further cellular damage or death. In contrast, apoptosis is a cleaner way for cells to die and is under strong genetic control. It allows tissue homeostasis to be maintained during normal development and takes out cells that are damaged or unregulated. During apoptosis, there is condensation of the nucleus and cytoplasm. A family of cysteine proteases (caspases) is activated to cleave certain polypeptide chains. The cells that will die by apoptosis are fragmented into smaller membrane enclosed apoptotic bodies which are then removed through phagocytosis. Inflammation is not a major component of this pathway.

regulated multistep process comparable to what is seen with the cell cycle. Both share key regulators. Virtually all cells have an inbuilt apoptotic program that is triggered by a variety of stimuli (growth factor withdrawal, genotoxic insults, UV irradiation) thereby ensuring maintenance of cellular integrity. There are two major apoptotic pathways, the intrinsic or stress pathway and the extrinsic or death receptor pathway [24,28].

Intrinsic Pathway

The significance of this pathway is shown by its evolutionary conservation – the same genes are found in many species, including humans. It involves both inhibitory and stimulatory branches which ultimately merge, resulting in activation of caspases 3, 6 and 7 and hence apoptosis. Some key genes in this pathway are BAX (BCL2-associated X protein) and BCL2. BAX produces a protein that alters the mitochondrial membrane permeability which stimulates apoptosis. In contrast BCL2 is anti-apoptotic, i.e. it protects cells from death. BCL2 and BAX genes share considerable homology despite having opposite effects. Lymphoid cells exposed to an activated BCL2 gene following a chromosome 14:18 translocation eventually develop into a malignant lymphoma, because spontaneous mutations which occur in these cells are unable to be contained by the cell dying, and so they accumulate (discussed further under Somatic Cell Cancers). Following normal stress, proteins are released that inhibit BCL2 and so activate apoptosis. There are other anti-apoptotic proteins that work through the intrinsic pathway.

Extrinsic Pathway

Extrinsic stimuli include some proteins from DNA viruses that inhibit apoptosis and so ensure that viral infection can continue. Suppressing apoptosis would also be an advantage when growth factors and cytokines are released. Fas ligand, TNF, TGF and cytokines are involved in the extrinsic pathway via activation of cell surface (death) receptors. These induce apoptosis through release of caspases 3, 6 and 7.

Other factors controlling apoptosis include the MYC oncogene which promotes apoptosis, and the TP53 TSG which can induce a damaged cell to undergo apoptosis, and so remove a potential focus for tumor formation. Cells damaged by chemotherapeutic agents will stimulate the production of TP53 and so undergo apoptosis. This additional anti-tumor effect may explain why cancers with wild-type TP53 genes respond better to treatment. On the other hand, a mutant TP53 gene cannot function in this way, and damage to the cancer cell produced by the chemotherapy will accumulate in cells that have not been directly killed by the treatment. These cells could then form a new clone of more malignant, treatment-resistant tumor cells.

DNA Repair

Unlike RNA, proteins and other cellular components that are continuously replaced, DNA does not undergo a regular turnover. DNA is also exposed to many damaging agents both exogenous and endogenous, such as oxidants, ultraviolet and ionizing radiation, chemicals and mutagens. Therefore, in response to damage a DNA repair system is required. Its importance is confirmed by the finding that many of its genes are evolutionarily conserved. There are a number of DNA repair pathways containing over 150 genes. Some examples are summarized in Table 7.7.

The response to DNA damage is broader than simple repair, and includes cellular aging (via arrest of the cell cycle) and cellular death (via apoptosis). However, mutations in the DNA repair mechanisms are associated with the development of cancer and a number of other serious genetic disorders, confirming the pre-eminent role which DNA repair plays in cellular function and normal development.


TABLE 7.7 DNA repair mechanisms [29].

Mechanism: Mismatch repair (MMR)
Explanation: Removes nucleotides that have been misincorporated as DNA is being copied. Acts on single base mismatches as well as small displaced loops a few bases in size which occur in repetitive regions, e.g. microsatellites.
Genes: MLH1, MSH2, MSH6, PMS2
Diseases: Lynch syndrome (hereditary non-polyposis colon cancer or HNPCC) (Box 7.4)

Mechanism: Nucleotide excision repair (NER)
Explanation: Predominantly involved in removing bulky helix distorting lesions from DNA usually caused by UV light or chemical carcinogens. In this mechanism the damaged site is excised in a ~30 bp segment. This is a complex pathway involving many proteins.
Genes: 12 genes
Diseases: Xeroderma pigmentosum – predisposition to skin cancer [30]; Cockayne syndrome – developmental disorder [30]

Mechanism: Base excision and single strand break repair (BER)
Explanation: Minor distortions in DNA produced by some oxidative and methylation abnormalities are removed by base-excision repair resulting in the damaged base being excised and replaced with the correct one.
Genes: MYH, UNG, APTX
Diseases: Non-FAP multiple adenomas (MYH), Hyper IgM syndrome (UNG) and some neurological disorders with ataxia (APTX) [30]

Mechanism: Nonhomologous end-joining (NHEJ) and homologous recombination (HR)
Explanation: The most important DNA damage generated by ionizing radiation is double-stranded breaks. Two mechanisms are used to repair these – NHEJ and HR. The former is said to be error prone because in repairing DNA it can lead to loss of genetic material.
Genes: BRCA1 (HR), BRCA2 (HR)
Diseases: Familial breast and ovarian cancer syndrome

DNA Mismatch Repair

There are four DNA mismatch repair genes associated with hereditary non-polyposis colon cancer (HNPCC) (Box 7.4). As for familial adenomatous polyposis, those who have a germline mutation in the above genes only develop cancer when a second hit occurs, and inactivates the second (normal) allele. For the reasons discussed later with respect to familial adenomatous polyposis, DNA predictive testing in HNPCC is clinically useful but more complex because:

1. Detecting clinical cases of HNPCC is difficult;
2. Four genes are involved, and
3. A proportion of the mutations are missense changes, and so the finding of variants of unknown significance can be a problem, e.g. 20–40% of missense changes with the mismatch repair genes may fall into this class.

Therefore, it would be helpful to have a simple screening test to determine if the work intensive DNA testing was going to be fruitful. Microsatellite instability can be used as a guide. For example, a family with the clinical features of HNPCC and a positive test for microsatellite instability (by DNA tests or immunohistochemical staining) would be worthwhile studying for DNA mismatch repair mutations. HNPCC is also more complex to manage than familial adenomatous polyposis because of the concomitant increase in non-colonic tumors that may develop.

Epigenetics

Another way to inhibit genes (particularly TSGs) is through epigenetics (Chapter 2). In tumors, loss of methylation has been observed in CpG dinucleotides (normally most are methylated), and increased methylation in CpG islands associated with gene promoters


(normally demethylation would be found here). These changes can precede mutations in genes and can be found at the earliest stage of tumor formation. Disruption to the normal histone modification patterns is also present in cancer. Thus, there is considerable interest in epigenetic changes as potential biomarkers or prognostic indicators [18].

Epigenetic silencing of one allele is compatible with the two-hit cancer hypothesis – i.e. a germline mutation (first hit) is followed by a second hit (somatic mutation or epimutation). The latter could occur via the methylation of a gene promoter. It is also possible that in sporadic cancers, hypermethylation of the promoter could down-regulate both alleles. Imprinting might also be a mechanism by which the second hit occurs. If a locus is imprinted, only one of the two alleles is functional. In this circumstance, it would require a single hit to inactivate the one functional allele (Chapter 2).

Loss of imprinting has been detected in some cancers, e.g. both maternal and paternal IGF2 alleles are expressed in Wilms tumors. In normal tissue, it is the paternal IGF2 allele alone that is functional. IGF2 (insulin-like growth factor 2) is a gene which has, as its name implies, growth stimulatory effects. Hence, the normal output from a single gene is increased when the maternal allele also expresses. The effect of a relaxation of imprinting is not clear, but it may predispose to tumor formation, since a gene that is not normally expressed is now functional. The story of imprinting and carcinogenesis is still in its early days. As well as explaining how tumors develop, the loss of imprinting opens the potential for a future line of treatment since re-establishing the imprint, if this were possible, would allow the additional gene which is expressing to be turned off.

Loss of methylation (gene activation) can potentially lead to tumor development, since genes that are normally repressed are now turned on – or even overexpressed. Despite finding promoter methylation in tumors, it is also apparent that there can be global hypomethylation and this has been reported in colon cancer. The degree of methylation can also increase as the tumor progresses from an adenoma to a carcinoma.

Compared to gene mutations, epigenetic changes are reversible and so are promising targets for new therapies. Drugs that can target epigenetic modifications have been approved by the FDA for some rare hematologic malignancies for which there are no effective treatments [18]. Clearly a risk here is the potential for non-specific epigenetic effects which would have significant consequences for expression of other genes.

Metastasis

Both benign and malignant tumors demonstrate excessive growth, but benign tumors remain encapsulated and do not spread. Malignant tumors normally will not kill through a local effect but do so because the tumor spreads to distant sites in the body; i.e. they metastasize. Despite the importance of the metastatic process, its molecular basis is only now starting to be understood. Previously it was considered that metastasis represented the final stage of the multistep process leading to a cancer, but now it is apparent that metastasis itself represents a series of incremental changes, each of which allows progression to occur. Broadly, metastasis has two major features [23].

Physical spread from the primary to distant sites. For a tumor to start the metastatic process it must invade locally, get into nearby blood or lymphatic vessels, move to distant sites, and pass from the lumen of the vessel into tissue where small nodules of cancers (micrometastases) are formed. Since there are preferred sites for metastatic formation it is assumed that homing receptors are involved. E-cadherin is a key molecule controlling cell to cell adhesion. Loss of its function leads to changes in cell shape and attachments to other cells and the extracellular

matrix. Increased expression works against invasion and metastasis while down regulation promotes metastatic formation. Interestingly, a series of transcription factors involved in embryological development (Snail, Slug and Twist) have also been shown to be perturbed in experimental models of metastasis including their inhibition of E-cadherin gene expression.

Tumor cells have to adapt and grow in a new environment. Once a tumor cell reaches a new environment it must adapt before it is able to grow. This is considered to be a separate process to spread because it has been shown in some cancers that micrometastases are present but these have not progressed to macroscopic metastatic deposits. In some cases, the micrometastases can lie dormant for many years, even after the primary tumor is removed. Breast cancer and melanoma behave in this way. Adapting to the new environment can involve multiple different pathways and dormancy might reflect a trial and error approach until the right combination of changes is found. These are presently being sought through the identification of molecular signatures for metastatic cells.

GERMLINE CANCERS

Introduction

Although rare compared to the more common sporadic forms of cancer, there remains considerable interest in defining the molecular basis of high penetrant, germline cancers in the expectation that they might provide further insight into what is happening in the low penetrant (sporadic) forms. Unfortunately, this has not been a productive strategy to date.

Two examples follow. The first is a rare form of colon cancer, called familial adenomatous polyposis, which demonstrates autosomal dominant Mendelian-type inheritance. The second is breast cancer, which also follows Mendelian-type inheritance, although it has similarities to complex genetic disorders because the penetrance for mutations in breast cancer is variable, and there appear to be many other genetic components involved as well as strong environmental effects.

Colon Cancer

Colorectal cancer is one of the commonest cancers in western countries, with a lifetime risk of 5–6%. Three to 5% of these cancers have a strong familial risk that is inherited as a Mendelian autosomal dominant trait. They include familial adenomatous polyposis (FAP) and hereditary non-polyposis colon cancer (HNPCC) (Box 7.4). An additional 20% of colorectal cancers have a positive family history, but with less well-defined genetic factors. Apart from the genetic variants present, the feature that distinguishes colorectal cancer from other frequently occurring malignancies is its distinct precancerous state associated with the adenomatous polyp. This means colon cancer is a unique model for studying the evolution of a solid tumor, because progression can be followed from the premalignant (polyp) stage to the locally advanced and then invasive (metastatic) cancer (Figure 7.7).

Familial adenomatous polyposis (FAP) is a rare form of colon cancer (~0.5% of all cases), and is inherited as an autosomal dominant disorder, with close to 100% penetrance, although there is some variation in the phenotypic expression of this disease. It is characterized by hundreds to thousands of polyps in the colon with the risk of cancer closely related to the number of polyps present. Because of this high risk, treatment involves prophylactic removal of the colon. Colorectal cancer occurs typically in the 40s age group, or about 10–15 years after the initial appearance of polyps [21].

One clue that the gene for FAP was located on the long arm of chromosome 5 came from the chance observation of a deletion involving


this chromosome and the finding of FAP in the same family. Positional cloning was started at the chromosome 5q locus and this led to the identification of the FAP gene (called APC – adenomatous polyposis coli). The APC gene extends over 8.5 Kb and has 21 exons. Exon 15 is responsible for 75% of the coding sequence. It has two hot spots for mutations at codons 1061 and 1309, although all codons between 200 and 1600 are sites for mutations. The APC gene is associated with both germline and somatic cell mutations. Germline mutations are found in most FAP patients. About 95% of mutations in APC involve nonsense changes or frame shift mutations, leading to the production of a truncated protein.

[Figure 7.7 diagram: Normal → Early adenoma → Advanced adenoma → Carcinoma, with gene labels APC, KRAS, SMAD4, DCC and TP53; increasing number of mutations + epigenetic changes + environmental mutagens.]

FIGURE 7.7 Multistep genetic model for tumorigenesis in familial adenomatous polyposis (FAP). An initial insult affecting the colonic tissue can involve any number of genes. The example given here is APC – adenomatous polyposis coli. This is inherited in FAP but may be acquired in sporadic colon cancer. This initiates the tumor pathway through the development of the early adenoma and then genomic instability leads to other mutations in genes. The colonic epithelium with these accumulating mutations develops a growth advantage over normal tissue. Additional mutations involving the proto-oncogene KRAS and tumor suppressor genes such as SMAD4, DCC (deleted in colon cancer) contribute to the adenoma moving on to development of carcinoma. One of the late genetic changes involves the tumor suppressor gene TP53. During the above stepwise progression, epigenetic factors such as hypomethylation of DNA predispose to further genomic instability. Throughout this process the environment, e.g. mutagens in food, can contribute to DNA damage. The development of cancer from the first mutated cell relies on an accumulation of genetic defects until the appropriate combination of mutated oncogenes, tumor suppressor genes and DNA damage is present.

Since only one gene causes FAP, and penetrance is very high, predictive DNA testing is worthwhile. There is also justification for testing children (in contrast to Huntington disease which has no effective treatment) because now the information from the predictive test can be put to practical use – i.e. there will be a 1 in 2 (50%) risk to offspring of an affected parent. The at-risk children will need to be followed carefully using colonoscopy, because at some time in the future, prophylactic colectomy will need to be considered before the premalignant polyps become cancerous. DNA predictive testing will immediately exclude half the at-risk children from further follow-up (because they do not have the affected parent’s mutation)

thereby avoiding an unpleasant procedure such as colonoscopy. The children with the parental mutation will develop cancer at some time (FAP shows high penetrance), and so surveillance will need to be undertaken diligently and prophylactic colectomy planned appropriately.

About 20% of FAP patients have no family history, but demonstrate a mutation – i.e. they most likely have spontaneously developed an APC mutation, or a less likely explanation is germinal mosaicism. In the former case, the risks for siblings will be the same as the general population. Children of a parent with a spontaneous mutation still have a 50% risk of inheriting the mutant allele. Hence, knowledge of the parent’s status (preferably molecular) is necessary to confirm the true genetic inheritance of familial adenomatous polyposis.

Genotype/Phenotype Correlations

FAP provides an example of how knowledge at the gene level (the genotype) can help to predict the clinical picture (the phenotype). Although the penetrance for cancer is nearly 100% in this disorder, a number of associated conditions are found. They include:

1. Attenuated FAP characterized by a smaller number of adenomatous polyps although the risk for cancer is still increased;
2. Desmoid tumors will develop in about 10% of patients and contribute to morbidity and mortality;
3. A wide range of other cancers both intestinal and extra-intestinal can develop, and
4. Pigmented lesions of the ocular fundus occur in about 60% of families. This is not a premalignant state and does not affect vision but is useful in detecting at-risk individuals before polyps develop.

The degree of severity as well as risk of developing associated complications of FAP is to some extent determined by the position of the APC mutation. Generally, mutations in the central portion of the APC gene are associated with a severe phenotype and extracolonic manifestations, while mutations at either end lead to a milder disease (Table 7.8).

TABLE 7.8 Genotype/phenotype correlations in the APC gene [21].

Phenotype: Genotype

Classic disease: Mutations in APC are usually found between amino acids 169 and 1393. Deletions can also produce this phenotype.
Severe disease: Usually there are mutations between amino acids 1250–1464 (in what is known as the mutation cluster region), and particularly at amino acid 1309.
Attenuated disease: Mutations are usually in the 5’ or 3’ ends of the gene, or in the alternatively spliced region of exon 9.
Low penetrance late onset disease: I1307K variant found in about 6% of Ashkenazi Jews.
Desmoid tumors: APC mutations usually in amino acids 1310–2011 (end of mutation cluster region and a portion 3’ to it).
Retinal pigmented epithelium: APC mutations are found between amino acids 463–1444 (mutation cluster region as well as a portion 5’ to it).
Higher risk duodenal adenoma: Mutations in amino acids 976–1067.
Higher risk medulloblastoma: Mutations in amino acids 457–1309.
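
The position-based correlations in Table 7.8 can be read as a set of overlapping intervals along the APC protein. The following is a minimal Python sketch of that reading, for illustration only. The names `Region`, `APC_REGIONS` and `associated_phenotypes` are hypothetical and not part of any published tool, and treating each range as inclusive at both ends is an assumption. Rows of the table without a numeric amino acid range (attenuated disease and the I1307K variant) are left out.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """One numeric row of Table 7.8 (hypothetical helper type)."""
    phenotype: str
    start: int  # first amino acid of the range (assumed inclusive)
    end: int    # last amino acid of the range (assumed inclusive)


# Amino acid ranges transcribed from Table 7.8; rows without a
# numeric range are intentionally omitted.
APC_REGIONS = [
    Region("Classic disease", 169, 1393),
    Region("Severe disease", 1250, 1464),
    Region("Desmoid tumors", 1310, 2011),
    Region("Retinal pigmented epithelium", 463, 1444),
    Region("Higher risk duodenal adenoma", 976, 1067),
    Region("Higher risk medulloblastoma", 457, 1309),
]


def associated_phenotypes(position: int) -> list[str]:
    """Return the Table 7.8 phenotypes whose range contains the position."""
    if position < 1:
        raise ValueError("amino acid positions start at 1")
    return [r.phenotype for r in APC_REGIONS if r.start <= position <= r.end]


if __name__ == "__main__":
    # Amino acid 1309 is singled out in the table under severe disease.
    for name in associated_phenotypes(1309):
        print(name)
```

Because the ranges overlap, a single position can fall in several rows at once. The table itself says mutations are "usually" found in these regions, so a lookup of this kind describes tendencies rather than a prediction for an individual patient.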


APC Gene

APC is a TSG and somatic mutations are found in the majority of colorectal adenomas and inactivation of both alleles is common in sporadic cancers. The majority of somatic mutations in APC occur within a small segment of the gene known as the mutation cluster region, located between amino acids 1250 and 1464. Another mechanism by which the APC gene can be inactivated is through hypermethylation of its 5’ promoter region.

APC encodes a large multi-domain protein that allows various interactions with other proteins. The central repeat region domains play a key role in APC function through binding of β catenin thereby promoting its degradation. A mutation in APC is associated with an accumulation of β catenin which stimulates transcription of a wide variety of genes and so tumors develop. It should be noted that about half of non-APC related colorectal cancers also have an accumulation of β catenin through mutations in other pathways, showing that the β catenin step is a crucial one in carcinogenesis. C-terminal binding sites of APC are implicated in microtubule binding and cell cycle activities necessary for chromosomal stability.

Breast Cancer

After non-melanoma skin cancer, breast cancer is the most commonly diagnosed cancer in women, and the commonest cause of cancer death in this group. It is second only to lung cancer as a cause of death from cancer. By 75 years of age, nearly one in 10 women in the USA will develop this disease. Pathogenesis of breast cancer is complex, involving physiological, environmental, life style and genetic factors. In contrast to the inherited cancer syndromes, such as retinoblastoma or familial adenomatous polyposis described earlier, familial cancers refer to neoplasms that cluster in families. However, because of a complex mode of genetic inheritance it can be difficult to ascertain who are at risk. Many types of familial cancers have been reported, but the sites most commonly involved are breast, ovary, melanoma, colon, blood and brain. Clinical features which suggest a familial cancer include:

1. Two or more close relatives affected;
2. Multiple or bilateral cancers in the same person;
3. Early age of onset, and
4. Clustering, for example, occurrence of both breast and ovarian cancer.

Some clinical facts about breast cancer which are particularly relevant to molecular medicine are given in Table 7.9. There are four hereditary breast cancer syndromes:

1. Breast and ovarian cancer syndrome – BRCA1 and BRCA2 genes (this will be the main focus of the discussion to follow);
2. Li Fraumeni syndrome – TP53 gene;
3. Cowden syndrome – PTEN gene, and
4. Hereditary diffuse gastric cancer syndrome – CDH1 gene [21].

BRCA1 and BRCA2 Genes

Historical developments in our knowledge of hereditary breast ovarian cancer syndrome include:

1. Loss of heterozygosity in breast cancer tissue was reported for a number of chromosomes in the late 1980s;
2. In 1990, breast cancer was localized to chromosome 17q21 by linkage analysis;
3. In 1994, the BRCA1 gene was cloned, and the BRCA2 locus on chromosome 13q12-q13 was identified;
4. By 1995, it was shown that some sporadic ovarian cancers had mutations in the BRCA1 gene but no sporadic breast cancers had abnormalities affecting this gene;
5. In 1995, the BRCA2 gene was isolated;
6. In 2002, microarrays demonstrated how breast cancer patients could be stratified into high and low risk, and


TABLE 7.9 Facts about breast cancer [21,31–33].

Feature: Comments

Family history and inheritance in familial cases: 5–10% of women have a mother or sister with breast cancer. 10–20% have a first or second degree relative with breast cancer. The risk increases with the number of affected relatives and age at diagnosis. Two important genes are BRCA1 and BRCA2 although mutations in these genes only account for about 5–10% of all breast and ovarian cancers. Penetrance is variable and there is strong evidence for G x G and G x E interactions (Chapter 2).

Environmental risk factors: Age, reproductive history, menstrual history, hormone therapy, radiation exposure, mammographic breast density, physical activity, alcohol intake, anthropometric variables, e.g. body mass index and history of benign breast disease.

Criteria for DNA testing: These vary but generally reflect family history, early age of onset, bilateral breast tumors and breast cancer in a male at any age. More comprehensive criteria for DNA testing for BRCA1 and BRCA2 are provided in [21,31]. More recently, there has been some interest in a genome first approach which means that as DNA testing becomes faster and cheaper it might be better to screen individuals without a requirement for a family or clinical history to suggest a genetic predisposition. Early data are still equivocal in terms of how many are missed with the current screening criteria, but it is likely that in future genome first might be the preferred option.

Classification: Traditional parameters include histologic analysis, immunohistochemical staining for estrogen receptor (ER), progesterone receptor (PR), HER2 and lymph node involvement. Alternatives now being considered are based on RNA expression profiling such as luminal A, B, HER2 positive and basal-like subtypes. A kit developed from this is described below.

Genomic DNA testing: The potential to use gene profiling to stratify patients with breast cancer into high and low risk groups is being trialed with a number of multi-gene screening assays. One example is MammaPrint® (Chapters 3,4).

Targeted molecular therapies: Since BRCA1 and BRCA2 are defective in double-stranded DNA repair by homologous recombination, a new molecular approach to treatment is being attempted. This introduces a drug that further inhibits DNA repair and so gives the tumor cells a double dose of this problem (synthetic lethality). One example inhibits PARP, an enzyme involved in base excision repair (Table 7.7). In vitro and in vivo models show that this causes selective killing in cells that are mutated for BRCA1 and BRCA2.

7. From 2000 on, a number of breast cancer susceptibility genes or SNPs (common, mid and low penetrance) were identified.

BRCA1 and BRCA2 are large genes (24 exons, 1 863 amino acid protein; 27 exons, 3 418 amino acid protein respectively). They function as TSGs, and more than 1 200 inherited mutations have been reported. Like APC, most mutations affect the protein structure. Some deletions occur. Missense changes are less common and so difficult to interpret in terms of pathogenicity (about 10–20% of these are eventually classified as variants of unknown significance). Although mutations are inherited in an autosomal dominant way, the second (normal) gene is either inactivated by a dominant-negative effect or, as is found in breast and ovarian cancer tissue, the second (normal) allele is often deleted, leading to complete loss of function. Not surprisingly since these are large genes, their functions are complex, involving the regulation of transcription, cell cycle and genome integrity. Both genes have an important role in double-strand DNA repair by homologous recombination, while BRCA1's response to DNA damage is more complex, involving G1-S checkpoint arrest.

Risk and Penetrance

A confusing aspect of hereditary breast ovarian cancer syndrome is the concept of life-time risks and penetrance, which are generally used interchangeably. Following are some data for mutations in different genes and their clinical consequences:

1. Life-time risk of breast cancer in those with mutations in BRCA1 (50–80%), BRCA2 (40–70%);
2. Life-time risk of ovarian cancer with mutations in BRCA1 (40%) and BRCA2 (20%), although mutations in the central ovarian cancer cluster region are associated with a higher risk;
3. BRCA1, BRCA2 mutations are rare in most populations (1 in 400 individuals) but more common in Ashkenazi Jewish individuals, with 1 in 40 such persons carrying one of three mutations. Therefore, life-time risks given above apply to most populations but there are exceptions, and
4. Oophorectomy in carriers with BRCA1 and BRCA2 mutations reduces the life-time risk of developing breast cancer by 50% [21].

A type of breast cancer with a poorer prognosis is called triple negative breast cancer, because it is negative for the estrogen receptor and progesterone receptor, and does not show amplification of the HER2 gene. This cancer is more likely to have a BRCA1 mutation. Apart from a proven increased risk for a second cancer occurring in the contra-lateral breast, it is still not certain that those with BRCA1 or BRCA2 mutations have an overall poorer prognosis.

Other Genes in Breast Cancer

Our current understanding of complex genetic disorders is that they represent G x G and G x E (G – gene, E – environment) interactions, with epigenetic changes becoming increasingly of interest. Taking the example of BRCA1 and BRCA2, these genes have strong effects in terms of breast cancer risk, but mutations per se are insufficient to cause it. To identify missing heritability factors, many GWAS studies are underway and whole genome sequencing is being used to catalog the genes involved in breast cancer. The genes in the G x G interactions can be divided into high penetrance rare alleles, low penetrance common alleles, and now with Next Generation (NG) DNA sequencing, medium penetrance risk alleles. As our understanding of G x G effects becomes more meaningful, the approach to the management of breast cancer will change (Figures 2.11, 7.8).

High penetrance rare alleles (relative risk of 5.0 or greater [21]). Apart from BRCA1 and BRCA2, there are other genes associated with a high risk for breast cancer, including: TP53 – a wide range of tumors; PTEN – tumors in the breast, gut and thyroid; CDH1 – breast and gastric cancer.

Medium penetrance alleles (relative risks between about 1.5 and 5.0). These include mutations in the CHEK2, ATM, BRIP1 and PALB2 genes. These genes demonstrate founder effects and so their significance may be influenced by the population in which they are sought. The type of DNA mutation may also move these from medium to high risk alleles.

Low penetrance common alleles (relative risk between 1.01 and 1.5). There are nearly 20 genes identified through GWAS. Like other genetic associations in complex diseases, the role of these genes (many are in fact polymorphisms and have no obvious function) remains to be determined. Examples include: FGFR2, TOX3, MAP3K1, LSP1. Collectively these genes may account for about 5–10% of the heritable factor in familial breast cancer.

NG DNA sequencing approaches are now being tested in clinical care using large panels of breast cancer genes derived from all the high to low risk alleles described above. Up to 28 genes have been sought in the one screening DNA test. This has the potential to provide


more accurate risk estimates which, in some circumstances, might mean patients have reached a certain risk threshold that makes them eligible for subsidized investigations or treatments. However, this type of testing comes at a cost, as the interpretation of results is more complex, particularly with the medium to low risk alleles where a larger number of variants of unknown significance can be expected.

[Figure 7.8 diagram labels: Risk alleles, Pathways, Pathophysiology, Prevention, New therapies, Screening]

FIGURE 7.8 Different types of risk alleles provide new information on cancer. The information gathering continues as genetics moves into genomics and mutations are sought in a range of cancers including Mendelian, complex, germline and somatic cell. Some common pathways are being detected despite the heterogeneity of mutations found. Targeting treatment to specific tumors is occurring but the prospect of common pathways opens up the option to have new broader based therapies. Ultimately knowledge of pathogenesis and the availability of biomarkers including DNA tests to detect tumor development earlier (and screening to identify at-risk individuals) will allow more effective preventive measures to be undertaken.

SOMATIC CELL CANCERS

Introduction

Somatic cell genetic disorders exemplified by hematologic and solid tumors form a separate but overlapping group with the heritable (germline/germ cell/constitutive) genetic disorders. The key difference between the two is that somatic cell DNA changes are not heritable and so there are no implications for family members.

The role played by mutations in the genetic material of somatic cells has already provided new insights into pathogenesis, and from this new diagnostics and biomarkers have arisen. This knowledge is being used to target (personalize) drugs in order to obtain maximum benefits. One example of this approach is the identification of mutations in the serine-threonine kinase BRAF gene. This has generated a lot of interest because a new class of drugs inhibiting BRAF has proved to be very promising in the treatment of advanced melanoma – which previously had few therapeutic options.

At the molecular level, cancer is a significant therapeutic challenge because each tumor is different, and each patient has a different genetic background that responds to the tumor or the drugs used to treat it. Nevertheless, the various profiles of tumors are being identified, and this will provide additional information about classification and treatment. For example, the finding that other tumors apart from melanoma have mutations in BRAF will allow the anti-BRAF drugs mentioned to be tried as alternative treatment options.

There are established ways to diagnose a hematologic or solid tumor. Next it is important to proceed to some classification to guide therapy and prognosis. These steps include:

1. Microscopic examination of stained material using blood, biopsied or fine needle aspirated tissue. Diagnosis is made on the basis of cell morphology and staining characteristics. A greater level of resolution is possible with electron microscopy;
2. Immunophenotyping which allows the identification of specific antigens by staining with monoclonal or polyclonal antibodies;


3. Cytogenetic analysis in the hematologic malignancies allows tumor-producing translocations to be detected, and
4. DNA-based testing is the most recent addition with many options available such as in situ hybridization to detect specific sequences, for example, the identification of oncogenic human papillomaviruses.

New classifications have become possible through molecular medicine, based on DNA changes or molecular pathways that are shared between tumors. Another advantage of DNA testing by PCR is that archival materials such as formalin fixed, paraffin wax embedded tissue blocks remain suitable for DNA testing for a considerable period of time. The technique of laser-capture microscopy allows individual cells in a sample to be studied, thereby avoiding the contaminating effect of adjacent stromal cells although this is not always feasible in routine clinical practice. The interest and work in somatic cell genetics/genomics is growing rapidly and is expected to make important shifts in our understanding of tumors and how treatment can be personalized.

Hematologic Malignancies

Solid tumors are initiated by two or more mutations in DNA followed by a multistep progression. In contrast, leukemias do not generally demonstrate the random genome instability seen in the solid tumors, and they are often associated with a single non-random reciprocal chromosomal translocation event. These translocations can lead to tumor formation through inactivation of a TSG or activation of a proto-oncogene. Hematopoietic malignancies present in the first instance as an aggressive disorder. They usually become more malignant during the course of their natural history. Access to abnormal cells in the peripheral blood or bone marrow makes their study easier. Thus, they have been useful models to follow DNA changes during various stages of a malignancy.

Translocations

Lymphocytes are unique cells, since they are able to undergo somatic rearrangements of their immunoglobulin or T cell receptor genes. This is essential for generating molecules of sufficient diversity to enable recognition of the vast array of antigens to which an organism will be exposed. Thus, gene families encoding the immunoglobulin and T cell receptor genes are arranged in two configurations:

1. Functionally inactive or germline state, and
2. Functionally active or rearranged state each of which is unique and contributes to the polyclonal response (Figure 7.9).

Immunoglobulin diversity in the B lymphocytes reflects rearrangements of the heavy chain region on chromosome 14, followed by rearrangements in the κ light chain genes. If successful, the product is a mature B cell making an immunoglobulin with a κ light chain. If unsuccessful, the λ light chain genes rearrange to give a B cell making immunoglobulin with a λ light chain. The repertoire is further diversified by the addition of somatic mutations including random nucleotide insertions at V-D and D-J junctions. Similar rearrangements and single base changes in genes occur to form the T cell receptor repertoire.

This process of gene rearrangement is error prone, so it is possible that the immunoglobulin or T cell receptor genes can be accidentally spliced next to or into other genes, including proto-oncogenes. One way for this to occur is by a chromosomal translocation. Following this, the cells containing the rearranged immunoglobulin or T cell receptor genes can be driven by the juxtaposed proto-oncogene and eventually a malignant clone arises (Table 7.10). Should a lymphoid cell form this type of clone, all its sister cells will carry the hallmark of its unique gene rearrangement. This


monoclonality can be detected when investigating patients with hematopoietic malignancies.

[Figure 7.9 diagram: (1) germline segments labeled Variable (V), Diversity (D), Joining (J) and Constant (C); (2) V, DJ, C; (3) VDJ, C; (4) PCR primer positions]

FIGURE 7.9 Immunoglobulin genes in the germline and how they rearrange. During development of a stem cell into a B or T lymphocyte, there are rearrangements of the germline immunoglobulin genes (which number in the hundreds). This rearrangement generates the diversity in immune proteins necessary for effective antigen recognition. (1) The different immunoglobulin heavy chain genes are: V = variable; D = diversity; J = joining; C = constant. (2) The first recombination in the heavy chain locus involves a D to J step. (3) This is then followed by V to D–J recombination. (4) To detect these rearrangements, DNA primers for PCR are based on regions which are known to be conserved (→ ←). Similar rearrangements occur with the immunoglobulin light chain genes (λ, κ) which do not have the equivalent of D genes, and the T cell receptor genes.

Chronic myeloid leukemia: This is a malignant clonal disorder involving a pluripotential hematopoietic stem cell, and predominantly affects young adults. It usually presents in chronic phase, and within three to four years develops into an accelerated, and then acute phase called blastic transformation. Over 95% of cases have the Philadelphia (Ph) chromosome which results from a reciprocal translocation (exchange of chromosomal material between two or more chromosomes) involving chromosomes 9 and 22 (Table 7.10; Figure 7.10). The fusion gene product from the translocation (BCR-ABL) contains ABL, a proto-oncogene which has tyrosine kinase activity. Because of the translocation the proto-oncogene is no longer regulated normally. This leads to increased cell proliferation, reduced apoptosis, adhesion abnormalities and genomic instability. During development of blastic transformation, additional DNA changes affect other genes including TP53 or RB1.

Acute promyelocytic leukemia: This is a rare variant of acute myeloid leukemia involving the promyelocyte cells. In addition to the usual leukemia-related problems, patients with acute promyelocytic leukemia are at risk of severe bleeding due to deficient clotting factors. Like chronic myeloid leukemia, acute promyelocytic leukemia is associated with a particular translocation (Table 7.10). In most cases, this translocation disrupts two genes and leads to the formation of a fusion protein PML-RARα (PML – a putative transcription factor implicated in a number of cellular processes including apoptosis, growth regulation, tumor suppression, RNA processing; RARα is the retinoic acid receptor alpha gene). A key activity of the RARα gene involves the neutrophil differentiation pathway, so inactivation of this


TABLE 7.10 Some translocations and gene changes in hematologic malignancies [34].

Disorder | Translocation | Genes that are juxtaposed

Chronic myeloid leukemia (CML) | t(9;22)(q34.1;q11.23)a | BCR gene (chromosome 22) and the ABL proto-oncogene (chromosome 9).

Acute promyelocytic leukemia (APML) | t(15;17)(q22;q21) | RARα gene (chromosome 17) and the PML gene (chromosome 15).

Follicular lymphoma (85%) and diffuse lymphoma (30%) | t(14;18)(q32.33;q21.3) | BCL2 proto-oncogene on chromosome 18 to IgH locus on chromosome 14.

B cell CLL, myeloma, mantle cell lymphoma | t(11;14)(q13;q32.33) | BCL1 proto-oncogene on chromosome 11 to IgH locus on chromosome 14.

Burkitt lymphoma, B cell ALL | t(8;14)(q24.21;q32.33) | Exons 2 and 3 of proto-oncogene MYC on chromosome 8 to IgH locus on chromosome 14.

Abbreviations: CML (chronic myeloid leukemia), ALL (acute lymphoblastic leukemia), AML (acute myeloid leukemia), APML (acute promyelocytic leukemia), IgH (immunoglobulin heavy chain), CLL (chronic lymphocytic leukemia).

a Terminology for cytogenetic rearrangement: Translocation (t) between chromosomes 9 and 22. The position on 9 is q34 (long arm band 34) and on 22 it is q11 (long arm band 11).
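The footnote's reading of the cytogenetic shorthand can be made concrete with a short sketch. The Python below is an illustration only and not part of any cytogenetics toolkit: the names TRANSLOCATION, parse_translocation and describe are hypothetical, and the pattern is assumed to cover only simple two-chromosome translocations written with semicolons, in the form t(9;22)(q34.1;q11.23).

```python
import re

# Hypothetical pattern for a simple two-chromosome translocation,
# e.g. t(9;22)(q34.1;q11.23): chromosomes first, then one breakpoint each.
TRANSLOCATION = re.compile(
    r"^t\((?P<chr_a>[0-9]{1,2}|[XY]);(?P<chr_b>[0-9]{1,2}|[XY])\)"
    r"\((?P<arm_a>[pq])(?P<band_a>[0-9]+(?:\.[0-9]+)?);"
    r"(?P<arm_b>[pq])(?P<band_b>[0-9]+(?:\.[0-9]+)?)\)$"
)

# p is the short arm of a chromosome, q is the long arm.
ARM_NAMES = {"p": "short arm", "q": "long arm"}


def parse_translocation(text):
    """Split a translocation string into its two breakpoints."""
    match = TRANSLOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple two-way translocation: {text!r}")
    parts = match.groupdict()
    return [
        {"chromosome": parts["chr_a"], "arm": parts["arm_a"], "band": parts["band_a"]},
        {"chromosome": parts["chr_b"], "arm": parts["arm_b"], "band": parts["band_b"]},
    ]


def describe(text):
    """Spell out each breakpoint in words, in the style of the table footnote."""
    return "; ".join(
        f"chromosome {bp['chromosome']} {ARM_NAMES[bp['arm']]} band {bp['band']}"
        for bp in parse_translocation(text)
    )


if __name__ == "__main__":
    for entry in ["t(9;22)(q34.1;q11.23)", "t(15;17)(q22;q21)"]:
        print(entry, "->", describe(entry))
```

Anything outside this narrow form, such as three-way translocations or a comma in place of the semicolon, is rejected with a ValueError; the sketch does not attempt to guess at it.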

gene's function through the translocation-produced fusion protein leads to maturation arrest at the promyelocyte stage. The fusion protein is considered to have a number of actions including a dominant-negative effect on the normal gene product.

[Figure 7.10 diagram: chromosomes 9 and 22 and the Ph chromosome, labeled with ABL (9q34.1), BCR (22q11.23), SIS and the BCR/ABL fusion]

FIGURE 7.10 Philadelphia chromosome translocation resulting in altered gene function. A reciprocal translocation between chromosomes 9 and 22 produces the Philadelphia (Ph) chromosome in chronic myeloid (granulocytic) leukemia. --- = breakpoints. The Ph chromosome comprises the portion of chromosome 22 above ---- and the small segment of chromosome 9 below the ----. This results in juxtaposition of ABL from chromosome 9 with BCR from chromosome 22. The sis proto-oncogene is not considered to have a functional effect from this translocation because it is located at some distance from the actual chromosome 22 breakpoint (22q11.23).

Novel Therapies Developed from Knowledge of Molecular Defects

To attempt a cure in chronic myeloid leukemia requires an allogeneic stem cell bone marrow transplant. However, this is not available for all patients, and there is significant mortality and morbidity associated with transplantation. No other effective treatment options existed for this leukemia until the late 1990s, when a new drug was designed specifically to interfere with tyrosine kinase activity. The drug imatinib mesylate (Gleevec) was one of the first to be developed from knowledge of a molecular defect. It has proven to be a very effective treatment (Box 4.6).

An understanding of the molecular pathology in acute promyelocytic leukemia soon allowed novel treatments to be developed to inhibit the PML-RARα fusion protein. One

drug is ATRA (all trans-retinoic acid). ATRA works by binding to PML-RARα thereby inhibiting its downstream effects, as well as inducing degradation of this fusion protein. The remission rate for acute promyelocytic leukemia has now dramatically improved, particularly when ATRA is used in combination with chemotherapy.

Minimal Residual Disease

The treatment of leukemia requires monitoring to ensure there is a long term remission or cure. For this, molecular testing is needed to detect minimal residual disease, with the longer term outlook improved if treatment is started in early relapse (best detected molecularly) rather than waiting for a full hematologic relapse.

Minimal residual disease refers to submicroscopic disease; i.e. disease that remains occult within the patient but eventually leads to relapse. A patient's response to anti-leukemia treatment is influenced by many factors, including the tumor burden at the time of diagnosis, which can be considerable (up to 10^12 leukemic cells). In complete remission, the traditional microscopic approaches have limited capability to detect residual disease, and it is estimated that based on microscopy alone there could remain a residual 10^8 to 10^10 leukemic cells. In this circumstance, it is understandable that relapse can occur. To improve treatment, minimal residual disease monitoring has become an important component of modern therapy for leukemia. This approach is not readily available for solid malignancies; in the leukemias, by contrast, the blood or bone marrow provides a source of accessible tissue for monitoring.

Minimal residual disease detection was first attempted with polyclonal or monoclonal antibodies. However, some of the antigens detected by these antibodies were also present on normal or precursor cells, and so better methods were needed. Today, there are two approaches for detecting minimal residual disease: PCR and flow cytometry. Each method has its own strengths and weaknesses. PCR including Q-PCR is very sensitive as it can detect one leukemic cell in 10^3–10^8 normal cells. PCR primers can be designed to detect fusion transcripts or immunoglobulin/T cell gene rearrangements. However, only around 50% of the leukemias have identifiable chromosomal breakpoints which would allow DNA tests to be used.

Solid Malignancies

Chromosomal rearrangements in the leukemias were also found in the solid tumors, but the early karyotypes appeared to show different changes for the same tumor, and so their significance was not appreciated. Chromosomal banding, developed in 1970, changed this by making the identification of rearrangements more accurate. A historical and scientific overview of how cytogenetics has evolved in oncology is provided in [19].

Another development was the appreciation that balanced chromosomal changes were likely to be the important ones, because these did not change the chromosomal or gene content but allowed the inactivation or stimulation of oncogenes or TSGs. Examples of balanced chromosomal changes are reciprocal translocations such as the Ph chromosome or inversions where chromosomal segments are switched around by 180°. Today, many solid tumors can be shown to have specific chromosomal rearrangements that involve tumor-forming genes [35].

A further difference between solid and hematologic tumors was the finding that different malignant clones might be present early on in the same solid tumor. These arise during the multistep process of tumor development. Apart from their potential to confuse diagnosis, this heterogeneity provided further insight into tumor development and explained treatment failures or development of resistance due to different sensitivities to chemotherapeutic agents [24].
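Returning briefly to the minimal residual disease figures quoted earlier in this section, the orders of magnitude are easier to appreciate as arithmetic. The sketch below is a hypothetical illustration, not a clinical tool: the function names and the example sample counts are assumptions. It reads the quoted figures as powers of ten (up to 10^12 leukemic cells at diagnosis, 10^8 to 10^10 possibly remaining in microscopic remission, and PCR detecting one leukemic cell in 10^3 to 10^8 cells).

```python
import math


def log_reduction(cells_at_diagnosis, cells_remaining):
    """Number of tenfold reductions in leukemic cell burden."""
    return math.log10(cells_at_diagnosis / cells_remaining)


def is_detectable(leukemic_cells, total_cells, one_in):
    """True if the leukemic fraction is at least 1 in `one_in` cells.

    Integer arithmetic is used so the comparison is exact.
    """
    return leukemic_cells * one_in >= total_cells


if __name__ == "__main__":
    at_diagnosis = 10**12  # upper figure quoted for tumor burden at diagnosis
    for remaining in (10**10, 10**8):  # range quoted for microscopic remission
        logs = log_reduction(at_diagnosis, remaining)
        print(f"{remaining:.0e} cells remaining = {logs:.0f} log reduction")

    # Assumed example: 2 leukemic cells in a sample of one million cells.
    for one_in in (10**3, 10**8):  # the two ends of the quoted PCR range
        found = is_detectable(2, 10**6, one_in)
        print(f"assay sensitive to 1 in {one_in:.0e}: detectable = {found}")
```

On these assumptions a microscopic remission corresponds to only a two to four log reduction in leukemic cells, which is why a more sensitive assay is needed to follow the disease below that level.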


Despite the impressive recent findings of cytogenetic-based approaches, it is still technically easier to look for DNA changes, and so this is often the preferred way of investigating cancers. DNA testing of solid tumors did not start until fairly recently, although it is now rapidly moving forwards. Interest in solid tumors has to some extent bypassed genetics to move directly into genomics-type initiatives. This is illustrated by the formation of the International Cancer Genome Consortium (ICGC) which has the goal of sequencing 50 tumor types. By 2012, 13 countries had committed to this initiative with around 20 different tumors being sequenced using whole genome approaches [36].

Next Generation (NG) DNA Sequencing

The number of somatic cell tumors sequenced to catalog the changes present is rapidly growing, with over 400 having whole genome sequencing and many more exome sequencing [37]. Even at this early stage it is apparent that there is remarkable heterogeneity in the numbers and types of mutations detected. One estimate is there are around 48–101 somatic variations per tumor. However, when these changes are considered in terms of biological pathways rather than isolated mutations, the complexity lessens. For example, a dozen or so pathways are reported to give the same end results even if they get there through changes in different genes [20].

NG DNA sequencing of cancer genomes (exome sequencing, transcriptome sequencing, epigenome sequencing and whole genome sequencing) has been launched with the ICGC mentioned above. An advantage of NG DNA sequencing is the ability to detect mutations in DNA as well as structural variations including copy number changes. A centralized database COSMIC (Catalogue of Somatic Mutations in Cancer) is available and will help to distinguish the passenger from the driver mutations [37]. The NIH is also funding The Cancer Genome Atlas project which has similar aims to ICGC and so far has characterized genomic changes in brain and ovarian cancers.

Obtaining the full clinical benefits from sequencing studies will not be easy. Roadblocks ahead include:

• Ensuring there is the bioinformatics infrastructure and expertise for the analysis and storage of the vast data sets being generated.
• Understanding the results of comparative studies in tumors and their microenvironments including stromal cells at the genomic, transcriptomic and epigenomic levels. Even non-mutated cells need to be considered as these can play a role in tumorigenesis or response to treatment, as exemplified by the strategy of inactivating the normal PARP gene to facilitate the killing of cells that have BRCA1 or BRCA2 mutations (Table 7.9) [32].
• Getting changes in pathology and surgical practice so that fresh and adequate samples are available for omics-based analyses. This applies in particular to tumors that are not readily accessible and so yield small amounts of tissue, such as pancreatic cancer, or necrotic tumors such as lung cancer, which provide poor quality tissue.
• Encouraging health professionals to take on this new practice direction. For this it will be necessary to provide clinically useful algorithms for decision making. Perhaps the door has already opened since one report suggests that knowledge of the molecular profile in ER positive, lymph node negative breast cancer led to changes in adjuvant treatment recommendations in about a third of the patients [32].
• A significant challenge will be evaluating the clinical utility of DNA variants including nucleotide changes, insertion-deletions, copy number variations, chromosomal rearrangements and the presence of


foreign (oncogenic viral) DNA. This will be particularly difficult with molecular profiling.

Molecular Profiling

Some applications of omics-based approaches for cancer management are now starting to appear. Two examples are:

1. Tumor profiles for breast cancer using a 70 gene array or a 21 gene array set are commercially available and used to predict prognostic outcomes (Figure 3.15, Table 3.9 and Box 4.3). Although these tests have been approved by the FDA their clinical utility remains uncertain. They are currently being assessed through randomized clinical trials. Another way to profile tumors is NG DNA sequencing which allows a large number of breast cancer related genes to be assessed simultaneously rather than the traditional BRCA1, BRCA2 as discussed earlier, and
2. An unsatisfactory dilemma in oncology is the cancer of unknown primary which usually is well advanced and metastasized when detected. Treatment options are limited because the primary is unknown. Gene based profiles are being developed with the aim of identifying a likely primary source for these tumors [38].

Sorting out the clinical significance of somatic mutations in cancer will not be easy, as they are presently considered to be either passenger or driver mutations. The former have arisen because of genomic instability and the tumor's landscape and do not play a major role in actual tumor development. On the other hand, driver mutations might comprise the minority of changes found in the cancer's DNA, but are involved in tumor formation perhaps through giving the cell a growth or survival advantage. As driver mutations accumulate, the cell becomes a cancer. If this model is correct, a goal of NG DNA sequencing strategies will be to identify and distinguish these two classes.

Co-dependent Technologies/Companion Diagnostics

A key driver for personalized medicine in cancer treatment is the use of DNA testing to guide therapy – i.e. to aid in deciding on the best drug, based on likely response and the potential for side effects. Costs must also be considered in assessing risk/benefit. The linking of two technologies (DNA testing and drug delivery) to enhance their overall effect is called a co-dependent technology or a companion diagnostic. Examples include [39]:

• Cetuximab is a humanized monoclonal antibody designed to inhibit EGFR (epidermal growth factor receptor). This class of drug works best when the wild-type (normal) KRAS gene is found in tumors such as colorectal and non-small cell lung cancer. A similar class of drug is Trastuzumab. Like Cetuximab, it has significant side effects and is expensive. The drug works best in treating advanced breast cancer when the HER2 gene is amplified (Table 3.9, Box 4.6).
• Vemurafenib is designed to inhibit the serine-threonine kinase BRAF. It works best when the BRAF gene is mutated, particularly at the valine 600 position. BRAF mutations are detected in melanoma, colorectal cancer, thyroid, gall bladder and other cancers.

Personalized medicine to manage patients with cancer is an important goal. This approach will become possible by assessing the genetic profile of the tumor and the patient's germline DNA so that the most appropriate tumor-specific therapies can be identified. At the same time the selection of drugs and their doses can be informed by the patient's ability to metabolize them (pharmacogenetics).

Viral Induced Cancers

Around 20% of human cancers result from chronic infections. Fifteen percent have a viral

etiology and are predominantly found in developing countries. Both DNA and to a much lesser extent RNA viruses (some of the RNA viruses have reverse transcriptase and are then called retroviruses) cause tumors in humans. In general, retroviruses produce tumors from the introduction of an oncogene into the cell or viral activation of cellular proto-oncogenes. The 70 or so cellular proto-oncogenes identified through study of oncogenic viruses are mostly involved in cellular proliferation or apoptosis [40].

As well as working through oncogenes, retroviruses can also lead to cancer through insertional mutagenesis. Examples of these oncogenic viruses are HTLV-1 (human T cell leukemia virus type 1) and HIV (human immunodeficiency virus). More recently a link with cancer has been shown with HCV (hepatitis C virus), another RNA virus, in which chronic infection leads to hepatocellular carcinoma (similar to what occurs with HBV (hepatitis B virus) infection) (Table 7.11). Although RNA viruses have a higher profile in terms of cancer causation, they are less likely to cause tumors in humans than DNA viruses.

Oncogenic DNA viruses include hepatitis B (HBV), papillomavirus (HPV), Epstein-Barr virus (EBV), Kaposi's sarcoma virus (KSHV, also known as human herpes virus 8 or HHV-8) and Merkel cell polyomavirus (MCPyV) (Table 7.11). Compared to the oncogenes of RNA viruses (v-onc), the DNA viruses do not have obvious cellular equivalents (c-onc) but cause cancer through viral protein-cellular protein interactions, and this was how the TP53 gene was discovered [40]. It is thought that several DNA viruses have evolved specific proteins that inactivate the p53 protein to avoid its antiviral effect. It has also been shown that DNA viral proteins can inactivate the protein product of RB1 which is another key TSG involved in cellular DNA replication.

TABLE 7.11 Seven oncogenic viruses causing human cancer [40].

Virus Consequences

HTLV-1 The only known retrovirus that causes a specific human malignancy – acute T cell leukemia/ lymphoma. Oncogenic activity is not via oncogene activation but through the release of a viral protein that induces genomic instability and dysregulation of cell cycle checkpoints.

HBV Hepatotropic virus. Its replication cycle within the liver nucleus leads to the formation of mature virions via reverse transcriptase (which is atypical for a DNA virus). Liver damage following chronic HBV infection is thought to be due to the host’s immune response. Most primary HBV infections in adults are self limited (compare with HCV). About 5% of primary infections in adults continue and lead to persistent infection. It is estimated that about 350 million people worldwide are HBV carriers. About 20% of chronic carriers progress to the serious complication of cirrhosis. Another serious consequence is hepatocellular carcinoma with carriers being 100 times more likely to develop this than non-carriers. Despite the HBV genome being sequenced and characterized, it is disappointing that even today relatively little is known how this DNA virus causes hepatocellular carcinoma. HCV The main source of HCV infection is intravenous drug use. Nearly 170 million people worldwide are infected and it is now the most common reason for liver transplantation in countries like the USA. About 70% of those infected develop chronic hepatitis and like HBV, this can lead to hepatocellular carcinoma. Another malignancy that is more common in HCV carriers is lymphoma. Quantitative RNA assays and genotyping enable predictions to be made how an individual will respond to antiviral therapy.

HPV HPV exhibits species specificity and induces hyperplastic epithelial lesions as a result of infection leading to warts. Cervical cancer is the second most common tumor in women worldwide and is caused by HPV (particularly types 16 and 18) acquired mainly through sexual activity. The viral E6 and E7 genes code for oncoproteins that are essential for viral replication and that bind to and inactivate TSG products (E6 targets p53 and E7 targets the RB1 protein). There is much optimism that the recently released HPV vaccine will reduce the number of cervical cancers just as HBV vaccination has reduced the risk for hepatocellular cancer.

MCPyV Recently discovered polyoma DNA virus that appears to infect most healthy individuals but in the elderly or immunosuppressed it can cause a rare and aggressive skin cancer with neuroendocrine features (Merkel cell carcinoma). Viral DNA is integrated into host DNA and expresses the large T antigen viral oncoprotein.

HHV-8 (KSHV) Herpes virus that infects lymphocytes where it can remain dormant. It causes Kaposi sarcoma which is an endemic tumor in Africa where it remains localized to the skin and rarely leads to problems. However, it causes lymphomas and extra-cutaneous tumors in the immunosuppressed, particularly HIV infected individuals. Like other DNA viruses, it produces oncogenic proteins that inhibit TSGs such as TP53 and RB1.

EBV EBV is a ubiquitous human herpes virus that infects most adults. After being infected, the individual remains a carrier for life. An in vitro characteristic of EBV is its ability to immortalize lymphocytes, and so it is useful in the research laboratory to provide a permanent supply of a particular cell line (or an unlimited source of DNA). Inappropriate expression of EBV latent genes leads to tumors including Burkitt lymphoma, post-transplant B cell lymphoma, Hodgkin disease and nasopharyngeal carcinoma. Two EBV latent proteins, LMP-1 and LMP-2, interfere with cell signaling pathways involved in cell adhesion and morphogenesis.

References

[1] Carlson BM. In: Human embryology and developmental biology, 4th ed. Mosby Philadelphia: Elsevier; 2009.
[2] Chi N, Epstein JA. Getting your Pax straight: Pax proteins in development and disease. Trends in Genetics 2002;18:41–7.
[3] NIH Genetics Home Reference. http://ghr.nlm.nih.gov/gene
[4] Staton AA, Giraldez AJ. MicroRNAs in development and disease. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2008.
[5] Simpson JL. Mammalian sex determination. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2008.
[6] Fortunato A, Tosti E. The impact of in vitro fertilization on health of the children: an update. European Journal of Obstetrics & Gynecology and Reproductive Biology 2011;154:125–9.
[7] Jammes H, Junien CI, Chavatte-Palmer P. Epigenetic control of development and expression of quantitative traits. Reproduction, Fertility and Development 2011;23:64–74.
[8] Gajdos ZKZ, Henderson KD, Hirschhorn JN, Palmert MR. Genetic determinants of pubertal timing in the general population. Molecular and Cellular Endocrinology 2010;324:21–9. (This journal issue covers many aspects of the genetic, hormonal and neural mechanisms involved in puberty in mammals.)
[9] Elks CE, Perry JRB, Sulem P, et al. Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies. Nature Genetics 2010;42:1077–87.
[10] Turker M. Ageing. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[11] Walter L, Lee SS. Mitochondria as a key determinant of aging. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[12] Gilbert SF. Ageing and cancer as diseases of epigenesis. Journal of Biosciences 2009;34:601–4.
[13] Walker RF, Pakula LC, Sutcliffe MJ, Kruk PA, Graakjaer J, Shay JW. A case study of “disorganized development” and its possible relevance to genetic determinants of aging. Mechanisms of Ageing and Development 2009;130:350–6.
[14] Boyden SE, Kunkel LM. High density genomewide linkage analysis of exceptional human longevity identifies multiple novel loci. PloS ONE 2010;5:e12432.
[15] Kenyon CJ. The genetics of ageing. Nature 2010;464:504–12.
[16] A database of genes related to ageing. http://genomics.senescence.info/genes/
[17] Balmain A. Cancer genetics: from Boveri and Mendel to microarrays. Nature Reviews Cancer 2001;1:77–82.
[18] Rodriguez-Paredes M, Esteller M. Cancer epigenetics reaches mainstream oncology. Nature Medicine 2011;17:330–9.
[19] Rowley J. Chromosomes in leukemia and beyond: from irrelevant to central players. Annual Reviews in Genomics and Human Genetics 2009;10:1–18.
[20] Bell DW. Our changing view of the genomic landscape of cancer. Journal of Pathology 2010;220:231–43.
[21] Chung DC, Haber DA, editors. Principles of Clinical Cancer Genetics: a handbook from the Massachusetts General Hospital. New York: Springer; 2010.
[22] Mastrangelo D, Hadjistilianou T, de Francesco S, Lore C. Retinoblastoma and the genetic theory of cancer: an old paradigm trying to survive to the evidence. Journal of Cancer Epidemiology 2009 article ID 301973.
[23] Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144:646–74.
[24] Croce CM. Oncogenes and cancer. New England Journal of Medicine 2008;358:502–11.
[25] Kinzler KW, Vogelstein B. Landscaping the cancer terrain. Science 1998;280:1036–7.
[26] Garzon R, Calin GA, Croce CM. MicroRNAs in cancer. Annual Review of Medicine 2009;60:167–79.
[27] Funk JO. Cell cycle checkpoint genes and cancer. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2005.
[28] Cairrao F, Domingos PM. Apoptosis: molecular mechanisms. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2010.
[29] Lehmann AR, O’Driscoll M. DNA repair: disorders. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2010.
[30] OMIM (Online Mendelian Inheritance in Man). www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
[31] National Cancer Institute. Genetics of breast and ovarian cancer. www.cancer.gov/cancerinfo/pdq/genetics/breast-and-ovarian
[32] McDermott U, Downing JR, Stratton MR. Genomics and the continuum of cancer care. New England Journal of Medicine 2011;364:340–50.
[33] Cianfrocca M, Gladishar W. New molecular classifications of breast cancer. CA. A Cancer Journal for Clinicians 2009;59:303–13.
[34] Frohling S, Dohner H. Chromosomal abnormalities in cancer. New England Journal of Medicine 2008;359:722–34.
[35] Mitelman F. Cancer: chromosomal abnormalities. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2010.
[36] International Cancer Genome Consortium. www.icgc.org/
[37] The Wellcome Trust Sanger Institute database of somatic cell mutations. www.sanger.ac.uk/genetics/CGP/cosmic/
[38] Cowin PA, Anglesio M, Etemadmoghadam D, Bowtell DDL. Profiling the cancer genome. Annual Review of Genomics and Human Genetics 2010;11:133–59.
[39] Chin L, Andersen JN, Futreal PA. Cancer genomics: from discovery science to personalized medicine. Nature Medicine 2011;17:297–303.
[40] Bergonzini V, Salata C, Calistri A, Parolin C, Palu G. View and review on viral oncology research. Infectious Agents and Cancer 2010;5:11.

Note: All web-based references accessed on 24 Feb 2012.

CHAPTER 8 Molecular and Cellular Therapies

OUTLINE

Introduction
Recombinant DNA Products
  Hemophilia
  Vaccines
Gene Transfer
  Somatic Cell Gene Therapy
  RNA Therapies
  Regulatory Aspects
Regenerative Medicine
  Definitions
  Cloning
  Stem Cells
Other Therapies
  Xenotransplantation
  Synthetic Biology
References

INTRODUCTION

Therapies derived from manipulating DNA are an emerging application of molecular medicine. They range from new drugs to gene transfer and the promise of regenerative therapies. More recently, the potential applications of synthetic biology are starting to emerge. Not surprisingly when new horizons are explored their risks are highlighted, as exemplified in 1997 by Dolly the Sheep which demonstrated that the cloning of animals had now become possible. The therapeutic applications of molecular medicine need educated and engaged health professionals and members of the community to ensure forward progress, while safety and the ethical, legal, social issues (ELSI) are addressed. Definitions in this chapter can vary and there is also some overlap as will be demonstrated in the section on Regenerative Medicine. Hence, the title Molecular and Cellular Therapies is used to capture most applications.

RECOMBINANT DNA PRODUCTS

Hemophilia

Mutations in the factor VIII (FVIII) gene produce hemophilia A. This X-linked disorder demonstrates the many challenges in developing therapeutic products by recombinant DNA (rDNA) means. The gene is large, with 26 exons and a genomic structure extending over 186 kb.

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00008-6 © 2012 Elsevier Inc. All rights reserved.
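The sizes quoted in this section for the factor VIII gene (26 exons spread over about 186 kb) and for the protein (2 332 amino acids) can be put side by side with a little arithmetic. The sketch below is a rough illustration only: it counts three bases per amino acid of the quoted protein length and ignores the stop codon, any signal peptide and the untranslated regions, so the result is an approximation rather than a measured figure. The function names are illustrative.

```python
# Figures quoted in the text for the factor VIII gene and protein.
GENE_LENGTH_BP = 186_000    # genomic structure of about 186 kb
EXON_COUNT = 26
PROTEIN_LENGTH_AA = 2_332   # amino acids in the protein

NUCLEOTIDES_PER_CODON = 3


def coding_bases(protein_length_aa: int) -> int:
    """Bases needed to encode the amino acids (stop codon not counted)."""
    return protein_length_aa * NUCLEOTIDES_PER_CODON


def coding_fraction(protein_length_aa: int, gene_length_bp: int) -> float:
    """Approximate share of the genomic structure that codes for protein."""
    return coding_bases(protein_length_aa) / gene_length_bp


if __name__ == "__main__":
    bases = coding_bases(PROTEIN_LENGTH_AA)
    fraction = coding_fraction(PROTEIN_LENGTH_AA, GENE_LENGTH_BP)
    print(f"Coding bases: {bases}")
    print(f"Approximate coding share of the gene: {fraction:.1%}")
    print(f"Average coding bases per exon: {bases / EXON_COUNT:.0f}")
```

On these assumptions only a few percent of the gene codes for protein, which is one way of seeing why a compact coding copy, rather than the whole genomic structure, is the practical starting point for expression in a cell line.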

The protein contains 2 332 amino acids and is synthesized as a single chain. Next, the mid-portion (B subunit) of the molecule is excised since it is not required for hemostatic function. The heterodimers formed are held together by calcium.

Plasma Products

A historical summary of developments in the treatment of hemophilia is given in Table 8.1. Landmarks include the isolation, in 1964, of a specific factor VIII enriched product known as cryoprecipitate. In the 1970s, anti-hemophilic factors with higher concentrations and improved stability became available. More effective treatment programs including on demand home therapy allowed patients to begin infusions the instant a bleeding episode started.

However, complications with the plasma-derived anti-hemophilic factors remained significant (Table 8.2). The risk of blood-derived hepatitis B virus (HBV) infection was thought to have been resolved once transfusion services introduced donor screening programs, and

TABLE 8.1 Milestones in the management of hemophilia A.

Year Discovery

1840 Bleeding episode treated with normal fresh blood.

1920s Plasma rather than whole blood shown to be effective.

1930s–1950s Fractionation of plasma identifies components with anti-hemophilic activity. Factor VIII (FVIII) implicated as the cause of hemophilia A.

1964 Cryoprecipitate is produced by allowing frozen plasma to thaw. A cold insoluble precipitate remains which is concentrated FVIII (cryoprecipitate).

1970s High potency freeze-dried FVIII concentrates become available allowing home therapy to start.

1980s The low point in hemophilia treatment occurs in the early 1980s with HIV and HBV in the blood supply infecting many patients. Two US biotechnology companies clone and express the FVIII gene (1984) aiming to produce a recombinant DNA product. More effective viral inactivation steps are incorporated in the manufacture of FVIII products. Monoclonal antibody-purified FVIII becomes available. Alternative hemostatic pathways are used to bypass the effect of inhibitors. Clinical trials of rhFVIII start with encouraging results emerging.

1990s First generation recombinant human (rh) FVIII and rhFIX (hemophilia B) licensed for clinical use. Activated rhFVII used to bypass FVIII inhibitors. Swedish trials recommend primary prophylaxis as a way to reduce bleeding and joint problems in children with hemophilia. First B subunit depleted rhFVIII released. US gene therapy clinical trials in hemophilia A and B start in 1999.

2003 Second generation rhFVIII is established and unlike earlier recombinant products does not use bovine albumin or other human proteins as stabilizers although there is exposure to albumin during some steps in manufacture. However, no reports of infections with recombinant products. Third generation rhFVIII are produced without exposure to exogenous animal or human albumin or plasma proteins during any stage in manufacture.

2009–2011 Gene therapy for hemophilia uses different tissues for gene insertion (hepatocyte, muscle, endothelial, hematopoietic stem cell) with viral and non-viral vectors. Encouraging results are observed in animal models but not replicated in humans. Although safe, ongoing elevations of FVIII or FIX were not maintained until a study in 2011 described good clinical responses in hemophilia B patients.

virus-inactivating steps were incorporated into commercial production. Nevertheless, the subsequent recognition of other viruses, such as human immunodeficiency virus (HIV), hepatitis C virus (HCV) and parvovirus highlighted the problems of human-derived products, even those that had undergone viral inactivation steps such as heating and/or organic solvent exposure. The additional safeguards increased production costs, but gave no guarantee that all viruses (known and unknown) would be neutralized. For example, a parvovirus can withstand temperatures of up to 120°C, and viruses without lipid envelopes are not sterilized by organic solvents or detergents.

A final, critical consideration for any plasma-derived product is its availability, which can never be guaranteed because it will always depend on a regular supply of donors. Hence there has been a move away from plasma products unless recombinant ones are unavailable or too expensive.

rhDNA Derived Products

In 1987, the first patient was treated with a recombinant human (rh) FVIII (Figure 8.1). The use of mammalian cell lines such as CHO (Chinese hamster ovary) enabled complex post-translational steps such as glycosylation to be undertaken. Removal of the B-subunit, which was not required for hemostasis, facilitated commercial production. The activity of the recombinant product was equivalent to monoclonal antibody-purified FVIII, and the potential to develop inhibitors was comparable to other products.

The value and efficacy of rhFVIII is now well established, and there have been no reports of infections resulting from its use. Today there are a range of products that have no human or non-human protein contaminants [1,2]. The availability of a regular and controllable supply of recombinant products will allow better planning and more effective treatment of bleeding problems. A new approach called primary

TABLE 8.2 Problems with use of plasma-derived hemophilia treatment products.

Problem Details

Infection Lipid enveloped viruses: Hepatitis B (HBV), hepatitis C (HCV), HIV, West Nile virus. 60–70% of hemophilia patients with severe disease in the 1980s infected with HIV. Higher infection rate for HCV. Non-lipid enveloped viruses: Hepatitis A, Parvovirus B19. Others: Slow virus infections, a range of organisms including non-viruses that are suspected but not proven to be pathogenic; unknown organisms or ones yet to emerge.

Liver Disease Progressive and potentially fatal liver disease in 10–20% of those with chronic HBV or HCV developing cirrhosis. Risk for hepatocellular carcinoma 30 times higher than general population.

Immunosuppression Contaminating proteins in factor concentrates (including pure ones) implicated as the cause for immunosuppression. Both T and B cell function impaired.

Inhibitors Exposure to neoantigens produces a risk of antibodies developing against FVIII or FIX. There is some correlation with the underlying molecular defect and this complication is associated with all types of FVIII concentrates.

Availability, cost Plasma-derived products are expensive when costs of purification are added and their availability will depend on a human source. Recombinant products are limited by costs.
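The selection step used in producing rhFVIII (Figure 8.1) relies on a simple filter: CHO cells that carry extra, plasmid-derived copies of DHFR tolerate methotrexate, and because the same plasmid carries F8, raising the methotrexate concentration enriches for clones with more integrated copies. The toy sketch below illustrates only that filtering logic. The clone names, the copy numbers and the copy-number threshold assigned to each methotrexate step are hypothetical values chosen for illustration; they are not data from the text and do not model real cell growth or gene amplification.

```python
def methotrexate_selection(clones, required_copies_per_step):
    """Return the clones surviving every step, plus the survivors per step.

    clones: dict mapping a clone name to its number of integrated plasmid
        copies (each copy is assumed to carry both DHFR and F8).
    required_copies_per_step: hypothetical minimum copy number needed to
        survive each successively higher methotrexate concentration.
    """
    survivors = dict(clones)
    history = []
    for required in required_copies_per_step:
        # Clones with too few extra DHFR copies do not grow at this step.
        survivors = {
            name: copies
            for name, copies in survivors.items()
            if copies >= required
        }
        history.append(sorted(survivors))
    return survivors, history


if __name__ == "__main__":
    # Hypothetical clones; 0 copies means only endogenous DHFR is present.
    clones = {"A": 0, "B": 1, "C": 4, "D": 12}
    survivors, history = methotrexate_selection(clones, [1, 3, 10])
    for step, names in enumerate(history, start=1):
        print(f"After methotrexate step {step}: {names}")
    print("Clone(s) carried forward to fermentation:", survivors)
```

In this toy run the untransfected clone is lost at the first step and only the clone with the most integrated copies remains at the end, mirroring the idea that the highest expressing cell lines are those that survive increasing methotrexate concentrations.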


prophylaxis became possible, where regular infusions of concentrates are started in young children after the first joint bleed and/or before the age of two years [2]. This strategy is considered superior to treatment on demand because it prevents blood from getting into the joint and so initiating the damaging events that lead to joint abnormalities in hemophilia.

Balancing the above are the high costs of recombinant therapeutics. Thus, they are not affordable to all, particularly those in developing countries where the plasma-derived products might still need to be used. However, with time and greater market competition the costs will continue to fall.

FIGURE 8.1 Steps involved in the production of recombinant human Factor VIII (the correct name for the gene is F8). [Diagram labels: Plasmid (F8 & DHFR), CHO cell line, Transfection, Selection, Amplification, Fermentation, Purification.] Producing rhFVIII requires a mammalian cell line such as CHO (Chinese hamster ovary) to enable post-translational modifications to occur. CHO has been used for over 20 years and has proven successful because it is adaptable and can grow to high densities in suspension cultures. A second useful property is its flexibility in terms of genetic manipulation. The F8 gene (red box) is introduced into a plasmid vector which also contains the dihydrofolate reductase (DHFR) gene (yellow box). The genetically engineered plasmid is then transfected into a CHO cell (only the nucleus is shown in the diagram) which takes up the plasmid either episomally or it is randomly integrated into chromosomal DNA. The latter is needed for long term expression. Selection and amplification are the next steps. For this CHO cells are grown in methotrexate (a folic acid antagonist that blocks DHFR). CHO cells with only endogenous DHFR will not grow but CHO cells with additional inserted copies of DHFR will survive and grow (these cells will also be carrying the F8 gene). The next step in the process involves isolating the highest FVIII expressing CHO cells, i.e. those with multiply integrated copies in CHO cell chromosomal DNA. This is obtained by serially diluting and then examining different cell lines which are exposed to increasing concentrations of methotrexate. At this stage the production steps are tedious and time consuming, taking up to six months for each new drug. However, the end point is clones of CHO cells with integrated F8 that expresses efficiently and long term. The selected CHO cells are fermented in large, commercial volumes. Proteins isolated are purified and checked for contaminants (CHO proteins, DNA). Functional activity is assessed, and a constant source of rhFVIII becomes available for clinical use.

Inhibitors in Hemophilia

The development of inhibitors in hemophilia is a serious complication, occurring in approximately 30% of patients with severe hemophilia A and about 5% of patients with severe hemophilia B. Inhibitors are antibodies against coagulation FVIII or FIX, and they occur as a consequence of:

1. Genetic predisposition, since the risk is higher if there is a positive family history. Other genetic risk factors include large deletions, translocations or nonsense mutations in the FVIII gene and the expression levels of some immunoregulatory molecules such as interleukins, and
2. Environmental factors including exposure to new antigens (neoantigens) in blood products to which immunological tolerance has not been developed. The presence of large bleeds and their effects on how foreign antigens in blood products are presented to the immune system may also be important [2].

Individuals with inhibitors have higher morbidity and are at greater risk of dying from an uncontrollable bleeding episode, since conventional factor replacement becomes ineffective. Therefore, another application of molecular medicine is to provide a better understanding

of how inhibitors develop, and ways in which they can be circumvented.

Treatment of patients with inhibitors may involve an attempt at inducing tolerance by exposing them to regular and long term administration of factor concentrates. If this does not work, activated, plasma-derived, prothrombin complexes (mixture of factors II, VII, IX, X) can be used. These overcome the block on FVIII activation resulting from the development of antibodies. However, these products are expensive and they have the same infection risks as plasma-derived FVIII. They can also increase the risk of thrombosis.

A solution to inhibitors was found with the development of an rhFVIIa (FVIIa is activated factor VII). This product is now approved in many countries. It can lead to thrombosis but this is more of an issue if used for off-label indications, i.e. other uses apart from treating inhibitors in hemophilia [3]. Examples of other drugs produced through rDNA technology are given in Table 8.3.

Vaccines

Vaccines have proved to be highly effective, relatively cheap and so affordable by most communities (Box 8.1). Nevertheless, modern production techniques require increasingly stringent quality control during manufacture, as well as better assessment of toxicity. In terms of standardization and quality control, rhDNA vaccines have a lot to offer. Infectious agents which are difficult or dangerous to produce by conventional culture techniques might also be better developed through rDNA means. Genetic manipulation would also be useful to reduce the likelihood of reversion to wild-type strains, such as might occur with a HIV vaccine, or to increase the antigenicity of a particular component derived from the infecting organism.

Hepatitis B Virus Vaccine

HBV is a DNA virus with distinctive surface and core components. According to the WHO, two billion people worldwide are infected with 350 million becoming chronic carriers. Approximately 600 000 persons die annually from complications, such as cirrhosis and hepatocellular carcinoma. HBV is 50–100 times more infectious than HIV and is an important occupational hazard for health workers.

In 1982, a HBV vaccine became available that was produced by using plasma from known chronic HBV carriers. Because of its source, stringent purification and inactivation procedures became mandatory. Thus, the vaccine was expensive and its production limited by the availability of infected plasmas. The vaccine was not well received by the public in view of the risk that other viruses might be transmitted despite the inactivation steps undertaken. Because of these problems and the importance of HBV as a cause of liver disease, an rhDNA vaccine was released in 1987.

The HBV story illustrates the usefulness of rDNA technology in vaccination programs, and how this has led to a declining incidence of this infection over the past decade. In some countries, it has also been possible to show a reduction in hepatocellular carcinoma rates. Nevertheless, HBV remains a global problem even in countries with low endemicity. Routine vaccination programs for newborns, infants, children and risk groups such as health workers and prisoners have been implemented to reduce the spread of this virus. More still needs to be done, including raising vaccination rates for other risk groups, such as intravenous drug users and homosexual males.

Different rhDNA HBV vaccines have now been manufactured, and confirmed to be as effective as the plasma derived products, although there is variability in immunogenicity depending, to some extent, on the subunits incorporated into the vaccines. For many vaccines it has been possible to show long term immunity, although the aim on a global basis would be for life long protection. It has also been possible to identify categories of


TABLE 8.3 Some therapeutic products prepared by rDNA technologies.

Natural and recombinant product

Human growth hormone (hGH). A protein of 191 amino acids essential for growth. It is species-specific and so the only biological source is human. Following the successful treatment of a pituitary dwarf in 1958 with hGH, programs were established to isolate it from human cadaver pituitaries. However, the programs were stopped in the mid-1980s when a number of recipients died from Creutzfeldt-Jakob disease, a fatal slow virus infection of the central nervous system (Chapter 6).

Recombinant human GH (rhGH). The natural product was replaced with rhGH following the cloning and expression of the gene in 1979. Because the mature protein does not require sophisticated post-translational modifications, it can be prepared using a simple bacterial expression system. However, there are two problems with this expression system: (1) A requirement for extensive purification to remove bacterial impurities, particularly endotoxins, and (2) The presence of an additional methionine amino acid at the start of the protein. This occurs because the eukaryotic start codon (ATG) is translated in the prokaryotic system into a methionine. Clinical trials during the mid-1980s confirmed the efficacy of the rhGH and it has remained in continuous use since. No significant side effects have been reported. The additional methionine does not lead to an increase in antigenicity of the product.

Gonadotropins. Glycoprotein hormones for infertility treatment include FSH (follicle stimulating hormone), LH (luteinising hormone), and HCG (human chorionic gonadotropin). They were first prepared from animal products in the 1930s and then human pituitary glands. This was discontinued for the reason given earlier with hGH. Urinary derived gonadotropins then became the source of this product. Despite a good track record for safety, there remained concerns about the human source.

Recombinant hormones. These were isolated and expressed in CHO cells because, like FVIII, post-translational glycosylation was required for activity. rhGonadotropins are now available. They demonstrate 99% purity leading to higher specific activity and lower immunogenicity. Risks of infections or being exposed to other foreign proteins in urine are eliminated. The relative costs of urinary derived and recombinant product remain controversial depending on how the cost analysis is undertaken. However, recombinant hormones are now preferred for infertility treatment.

Hematopoietic growth factors. Bone marrow hematopoietic cells come from the proliferation and differentiation of progenitor cells that form specific lineages following interactions with cytokines. The pluripotent stem cell is the ultimate source of the lymphoid and myeloid precursor cells. The latter differentiate into the platelet, erythroid, neutrophil and macrophage lineages. Hematopoietic growth factors include the colony stimulating factors (CSFs), the interleukins (ILs) and erythropoietin (EPO). Examples of some CSFs include: G-CSF (G – granulocyte); M-CSF (M – monocyte or macrophage) and GM-CSF (GM – granulocytes, macrophages). Uses for G-CSF or GM-CSF include: (1) Treatment of febrile episodes related to neutropenia (due to disease, drug therapy or bone marrow transplantation), and (2) Mobilization of stem cells into peripheral blood to stimulate recovery after bone marrow transplantation.

Recombinant products. Although characterized in the 1960s, the minute amounts isolated and the complex interactions and target cells associated with these factors limited further understanding of their roles. In the 1970s, M-CSF and GM-CSF were produced biochemically followed by G-CSF in the early 1980s. However, these amounts were minute. The in vivo significance of these products remained unclear until the relevant genes were cloned. rhG-CSF was made in 1986. The others soon followed. The FDA approved clinical uses of G-CSF and GM-CSF in 1991.

Erythropoietin (EPO) was discovered in 1953 by A Erslev and subsequently shown to be produced by the kidney. However, the amount able to be isolated from this organ or the urine limited its therapeutic use until rhEPO was produced using a mammalian expression system. This product is now available to treat anemia of chronic renal failure or cancer. It has also been misused by athletes for doping.
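The additional methionine described for rhGH in Table 8.3 follows directly from how translation reads a coding sequence: a construct that begins with an ATG start codon yields a protein whose first residue is methionine (M). The minimal sketch below illustrates this with the standard genetic code. The nine-base "mature" coding sequence is an arbitrary example made up for illustration and is not the hGH sequence; the helper name translate is likewise illustrative.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Arbitrary illustrative sequence (NOT the hGH sequence).
    mature_coding = "GGCTCTAAA"
    stop = "TAA"
    # A bacterial expression construct needs a start codon in front.
    construct = "ATG" + mature_coding + stop
    print("Mature sequence alone:", translate(mature_coding + stop))
    print("Start codon + mature sequence:", translate(construct))
```

The second translation differs from the first only by the leading M, which is the extra residue referred to in the table.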

individuals who demonstrate poorer vaccination responses. These include the elderly, obese, smokers, alcoholics, and those with chronic diseases including the immunosuppressed. For these, revaccinations are needed.

Human Papillomavirus (HPV) Vaccine

Until recently, HBV was the only example of a successful rhDNA vaccine, although more successes had been obtained in veterinary practice and the meat and livestock industry. Now


BOX 8.1 CONVENTIONAL VACCINES.

There are different options with conventional vaccine development:

1. Live attenuated (non-pathogenic but immunogenic) organisms such as the Sabin oral poliomyelitis, measles and rubella vaccines;
2. Inactivated (killed) microorganisms, e.g. Salk parenteral poliomyelitis vaccine, and
3. Subunit vaccines, i.e. one or more antigenic components are available as with influenza and recombinant hepatitis B vaccines.

Another variable is the adjuvant added to enhance antigenicity. Despite the efficacy of conventional vaccines, some might not have reached clinical use due to the stringent licensing regulations now in force. For example, the oral poliomyelitis (Sabin) vaccine was initially more attractive than the Salk vaccine and eventually replaced it because it was easier to administer, less costly to produce and could spread to other non-immunized contacts; i.e. herd immunity. However, on rare occasions it could revert to the wild-type neurotoxic strain and so produce poliomyelitis. This risk was considered acceptable in the early days of vaccinating against polio, but from the early 2000s the vaccine has been withdrawn from a number of countries. This was necessary because polio had been eradicated there and the only cases of polio were now related to vaccination (i.e. this risk was no longer acceptable), and the parenteral Salk vaccine has been used since. Live attenuated vaccines are more likely to be associated with severe side effects such as the example of polio given and the earlier but now disproven belief that measles vaccination led to the severe neurologic disorder subacute sclerosing panencephalitis (SSPE).

Two strategies are needed to maintain effective vaccination programs:

1. The public and health professionals are reassured about quality issues and stringent manufacturing requirements. In some respects these are easier to achieve with rDNA derived products, which also have the benefit of greater flexibility in terms of antigenic selection and enhancing immunogenicity. DNA vaccines, discussed in the text, provide another alternative to the conventional live attenuated vaccines, and
2. More effective educational strategies are needed, as exemplified by the 2009 influenza A (H1N1) pandemic. This was initially thought to be a serious threat to health because the related and earlier H1N1 Spanish flu pandemic of 1918 had killed 20–40 million people. Despite the initial concerns about the 2009 pandemic and the media hype about the large number who could die, only about 20% of adults in the USA were vaccinated. More of a concern was that less than 50% of healthcare workers were vaccinated [4]. There are many explanations for what is in effect a failure of an important public health measure, particularly if the 2009 pandemic had turned out to be as severe as originally thought. One reason for the low uptake of vaccination was the perception of its value vis-à-vis its potential side effects, so better educational strategies are needed.

An interesting look into the future is the use of systems biology (Chapter 4). This approach in developing a new vaccine would utilize known genetic immune signatures for an in silico prediction of likely T and B lymphocyte responses as well as adjuvant effects prior to a formal clinical trial for efficacy. The result is a single and definitive clinical trial rather than a hit and miss approach that can occur when testing a vaccine prepared by conventional means [5].

an HPV vaccine is on the market, and is being used in many countries to vaccinate adolescent girls against cervical cancer. This is the second most important cancer in women worldwide, with over 250 000 deaths annually. It is predominantly caused by HPV infection.

The HPV subunit vaccine would not have been developed without rDNA technology being used to identify the important antigenic component (L1 protein), and then manufacture it using a similar system to that for the HBV vaccine. This allows a number of HPV types, including those most often associated with cancer – i.e. HPV6, HPV11, HPV16 and HPV18 – to be targeted. It will not be known for some years how effective the vaccine is in terms of cervical cancer prevention. It should be noted that since there are multiple HPV types associated with this cancer, the vaccine will not prevent all cancers. Like any new rhDNA product it is expensive. This means developing countries where cervical cancer is less easily prevented through cervical Pap screening may not have access to the vaccine [6].

DNA Vaccines

Nucleic acid (DNA) vaccines predominantly utilize genes in the form of plasmid DNA. Such genes express proteins to produce a sustained antigenic stimulus, and so generate an ongoing immune response. There are various routes for administration including parenteral, topical or a gun that delivers tiny amounts of DNA-coated gold beads. The first DNA vaccine was used in 1990. This approach to vaccination has provoked interest because of the relatively simple way in which vaccines can be prepared to deliver a range of protein antigens for immunization. In animal studies, these vaccines stimulate both humoral and cell-mediated immune mechanisms, comparable to what occurs with live attenuated vaccines. Thus, they could be an alternative, but safer, approach to live viral vaccines and should be better than inactivated (dead) vaccines in the breadth of the immune response they elicit.

DNA vaccines are presently being assessed for use against AIDS, malaria and a variety of cancers. The HPV example already described involves a prophylactic vaccine which has little effect in established disease. For a therapeutic vaccine, the HPV E6 and E7 antigens need to be targeted, and trials using DNA vaccines are presently underway. Recent results suggest that DNA vaccines are safe but are poorly immunogenic because they lack cell specificity and spread poorly to surrounding cells. Nevertheless, these vaccines can be given repeatedly.

The DNA vaccines pose a regulatory challenge, since they lie somewhere between a conventional vaccine (which has as one of its purifying steps the removal of any nucleic acids) and the traditional gene transfer vector discussed in the next section. Issues which remain unresolved include:

1. Whether DNA from these vaccines integrates into host DNA, and if it does, what are the consequences?
2. Anti-DNA antibodies to injected DNA are known to develop in animal studies, but the potential for their development and so risk of autoimmune disease in humans remains unknown [7].

GENE TRANSFER

Somatic Cell Gene Therapy

Gene therapy can be defined as “the transfer of genetic material (DNA or RNA) into the cells of an organism”. It aims either to produce a therapeutic effect or to mark a cell with a gene so that it can be followed or identified as part of a research protocol. An example would be marking cells in a transplantation scenario to determine if cancer relapse occurs in host (patient) or donor cells. Therefore, gene transfer is probably a better description than gene therapy, since a therapeutic intent is not necessary. However, for convenience

the term gene therapy will be used to cover all applications. Gene therapy in humans refers to somatic cell gene therapy, meaning the target is a somatic cell, and transmission to future generations cannot occur. Germline gene therapy, an example of which would be a transgenic animal, is prohibited (this is discussed further below).

Applications

When first proposed as a therapeutic option, gene therapy was considered only in the context of genetic disorders. Today, gene therapy has broader clinical applications, particularly cancer and infectious diseases [8]. Disorders for which gene therapy has been tried or considered include:

Genetic diseases

• Immunodeficiencies, e.g. adenosine deaminase deficiency, severe combined immunodeficiency, chronic granulomatous disease, Wiskott-Aldrich syndrome.
• Cystic fibrosis, familial hypercholesterolemia.
• Storage disorders, e.g. leukodystrophies, Gaucher disease.
• Coagulopathies, e.g. hemophilia A, B.
• Leber congenital amaurosis.
• Hemoglobinopathies, e.g. β thalassemia, sickle cell disease.

Acquired diseases

• Cancer, e.g. melanoma, brain and renal tumors.
• HIV-AIDS.
• Cardiac and vascular disease.
• Neurological disorders, e.g. Parkinson disease, Alzheimer disease.
• Others, e.g. retinal degeneration, epidermolysis bullosa.

Criteria have been proposed to identify the types of genetic disorders for which gene therapy might be appropriate. They include:

1. A life-threatening condition for which there is no effective treatment;
2. A condition in which the cause of the defect is a single gene and the gene has been cloned;
3. A condition in which regulation of the gene need not be precise, and
4. A condition in which technical problems associated with delivery and expression of the gene have been resolved.

Similar considerations would hold for acquired disorders such as cancer, although in these circumstances a cure might not be the prime goal and so the same stringent criteria might not necessarily apply.

FIGURE 8.2 Gene transfer (panels: Ex Vivo, In Vivo). Ex vivo: This approach involves the removal of cells from the patient. DNA (or RNA) is next introduced into the cells which are then cultured to obtain adequate numbers. The genetically-altered cells (which may also be physically or antigenically altered following the ex vivo maneuvers) are then returned to the patient. In some circumstances, ex vivo transfer is the only feasible option, e.g. hematopoietic cells. In terms of safety, there is more confidence with ex vivo transfer since only the appropriate cells will take up the DNA/RNA. In vivo: A more physiological but challenging approach is in vivo transfer which involves direct entry of DNA (or RNA) into the patient. Targeting is now required.

Strategies for Gene Delivery

There are two ways to transfer DNA (RNA) into cells – ex vivo or in vivo (Figure 8.2). A prerequisite for ex vivo transfer is the ability to culture cells in vitro. Therefore, not all cells are suitable targets for this type of gene therapy. Another requirement is the ability to return the genetically-altered cells to the patient – i.e. the

cells need to be transplantable. The above considerations have meant that work with ex vivo transfer has predominantly focused on hematopoietic cells. Apart from the fact that ex vivo transfer may be the only suitable approach available in many cases, it has another advantage in terms of safety, i.e. there is more control over which cells will take up the foreign DNA. However, in vivo transfer is considered to be more physiological, and may be the only option in some circumstances, for example, disseminated cancer. In vivo transfer remains a priority awaiting further developments to ensure that the right cells express the transferred DNA, and that they do so in adequate numbers. The concept of targeting becomes a real issue when in vivo transfer is considered (discussed below).

The ultimate aim in gene transfer is to get DNA into specific tissues. There are two ways to do this:

Physical: The cell and nuclear membranes can be made more permeable to DNA following co-precipitation of DNA with calcium phosphate, or an electric shock (called electroporation). Using micropipettes, it is possible to inject DNA into the cell’s nucleus. More novel approaches to facilitate movement of DNA into a cell include:

1. Injection of DNA directly into muscle cells;
2. Insertion of DNA via cationic liposomes in a process known as lipofection, i.e. it uses synthetic spherical vesicles which have lipid bilayers and so are able to cross the cell membrane, and
3. Coating of DNA with proteins and using a gene gun – DNA-coated microprojectiles.

Physical methods can be relatively inefficient when it comes to cells taking up DNA. More importantly, DNA introduced into the cell in this way is usually present as multiple copies and, if it does integrate into host DNA, there is no control over the sites of insertion. Thus, the function of normal genes could be affected. If the introduced DNA does not integrate, the expression of the introduced gene is only transient.

Viral (biological): The preferred method of gene transfer involves the use of viruses, particularly the retroviruses. Wild-type retroviruses can convert their RNA into double-stranded DNA which can then integrate into the host’s genome. The gag, pol and env genes, which encode the viral proteins, make up approximately 80% of the retroviral genome. These RNA segments can be deleted and replaced by a foreign gene, for example, human adenosine deaminase (ADA). Now the recombinant retrovirus is no longer infectious because it cannot make its own structural proteins. This is a prerequisite for gene therapy. Persistent infection by the genetically engineered retrovirus would not be permissible since it might lead to neoplastic change, the wrong cells expressing the gene, or the germ cells becoming infected and so passing on any unwanted genetic effects introduced via gene transfer to future generations.

To become a useful vector for DNA transfer, the retrovirus must infect in a controlled way. This can be done with packaging cells. These contain a helper retrovirus that has also been genetically manipulated to produce empty virions, meaning that structural proteins are present but a complete infectious virion cannot be made. However, the retroviral vector with its inserted ADA gene can utilize the structural proteins produced by the helper virus in the packaging cells to form a complete (infectious) virion which can undergo one round of infection. This would be enough to get the genetically engineered retroviral RNA into the target cells’ DNA.

Advantages, challenges and risks with retroviral vectors include:

Advantages

1. A single virus infects one cell;
2. The virus is usually non-immunogenic, and
3. Integration into the host genome means there is the potential for long-term expression of the inserted gene.


Challenges

1. The target cell must be dividing before the retrovirus can integrate into the cell’s genome;
2. Transduction efficiency is usually inadequate;
3. DNA insert size is limited which can be a problem if a large gene is involved, and
4. Retroviral vectors are produced from living cells so there is worry that contaminants from these cells will be present.

Risks

1. Integration is random, and so there is always the worry that a normal gene is inactivated or an oncogene is activated, and
2. There is the potential for retroviruses to revert to replication-competent organisms and so induce cancer (Figure 8.3).

See Chapter 7, Figure 7.3 for more discussion on cancer and retroviruses. Because of these issues, a number of other viruses have been developed for gene therapy (Table 8.4).

FIGURE 8.3 Life cycle of a retrovirus. (1) The envelope protein enables the retrovirus to bind to the surface of host cells on infection. (2–5) Double-stranded DNA derived from viral RNA and the action of reverse transcriptase is required before the retroviral genome can be integrated into that of the host. (6–8) The provirus formed replicates to produce mature viral particles which are extruded from the cell by non-lytic budding. [Figure labels: gag, pol, env; 1 Entry; 2 Loss of envelope; 3 Loss of capsid, viral RNA; 4 RNA/cDNA (reverse transcriptase); 5 DNA/DNA (viral); 6 Host DNA; 7 Viral RNA, translation (ENV/POL/GAG); 8 Non-lytic budding.]

Target Cells

Another consideration in gene therapy is the target cell. If a retroviral vector is used for transduction, an important prerequisite for the target cell is that it should be dividing, so that the retrovirus can integrate into the host genome. The target cell should also be appropriate to the type of expression required. For example, a neurological disorder may derive no benefit from the transfer of genes into hematopoietic cells. Finally, the target cell needs to be long-lived to prolong the effects of gene therapy. The ideal target cell would be pluripotent stem cells, since integration of a gene into such cells should produce a cure, or at the very least a long-term effect. Because of the potential availability of stem cells, and the considerable experience gained with bone marrow transplantation, a lot of the work has focused on the hematopoietic stem cells as targets for gene transfer.

The human bone marrow pluripotent stem cell is elusive, but gene transfer into this type of cell has been possible because of the infectious capability of the retroviruses. Nevertheless, expression observed in these instances has been low and of short duration. Thus, gene therapy would be difficult in disorders for which


TABLE 8.4 Comparisons between different vectors used in gene therapy (a).

Property: RV, LV, AV, AAV

Size of insert in Kb (b): 8, 8, 8, 5 (c)
Ease for manipulating: ✓ ✓ ✓
Difficult to manufacture – titer, quality or potency: ✓ ✓ ✓
Considerable experience with its use: ✓ ✓
Immunogenic, i.e. potential for host immune responses leading to transient effects or toxicity: ✓
Limited utility as only infects dividing cells: ✓
Infects both dividing and non-dividing cells: ✓ ✓ ✓
Remains episomal (so will not integrate): ✓ ✓ (d)
Integrates, i.e. potential for insertional mutagenesis: ✓ ✓ ✓ (d)
Risk that it can cause human disease: ✓ ✓ (e) ✓ (e)
Potential for long-term gene expression: ✓ ✓

(a) The transmission of genetic material from one cell to another by viral infection is called transduction. Acquisition of new genetic markers by incorporation of added DNA into eukaryotic cells by physical or viral means is called transfection. Retrovirus (RV), lentivirus (LV), adenovirus (AV) and adeno-associated virus (AAV) vectors [1].
(b) Generally the larger the insert size accommodated, the more flexible is the vector. However, this would not be relevant if the inserted gene were small.
(c) Limited size of insert compared to other three vectors.
(d) AAV vectors have limited capacity to integrate into host DNA and what occurs is less random than for the retroviruses. AAV vectors also form episomal DNA which remains intact in non-dividing cells.
(e) Lentiviruses are derived from HIV and so there is concern that they might revert to wild type through recombination. Adenovirus infections are common per se and so not a health issue. However, there is evidence that immunologic responses to AV can lead to significant problems. The latest generation of AV vectors are less immunogenic.

significant gene expression would be required to produce an adequate supply of protein. This may be overcome with recent developments in molecular technology including:

1. The potential to stimulate division of pluripotent stem cells with the recombinant human growth factors, making these cells move out of the G0 phase of the cell cycle and so become more accessible to infection by a retrovirus;
2. The use of monoclonal antibodies to identify the surface antigens found on primitive cells, such as CD34 cells, and
3. The availability of DNA sequences which can significantly up-regulate, i.e. increase, gene expression.

Introducing New Genes – Genetic Disorders

Gene therapy has been very effective in treating two forms of immunodeficiency in children. The first of these was adenosine deaminase deficiency (Table 8.5), and the second was SCID-X1 (severe combined immunodeficiency – X-linked type 1), where it has allowed a number of these otherwise severely immunocompromised children to live nearly normal lives (Box 8.2). Wiskott-Aldrich syndrome and chronic granulomatous disease are other immunodeficiencies that have also benefited from a gene transfer approach, as has one case of β thalassemia reported in 2010.

Promising results in the neurodegenerative disorders known as the leukodystrophies


TABLE 8.5 Some examples of gene therapy trials [8].

Disease and gene therapy approach

Adenosine deaminase deficiency (ADA) [9]. ADA deficiency is an autosomal recessive severe combined immunodeficiency (SCID) in children. Death usually occurs at 1–2 years of age. Medical treatments include: (1) PEG-ADA – comprises the natural product (ADA) coupled to polyethylene glycol (PEG) to increase half-life. PEG-ADA is expensive and while it improves well-being it is not curative, and (2) Bone marrow transplantation from an HLA-identical sibling donor, which is curative but only 20% of patients can have this.

Gene therapy: In 1990, a 4-year-old child with ADA deficiency received an infusion of autologous lymphocytes genetically altered by a retrovirus containing a normal ADA gene. The child had not responded adequately to PEG-ADA and so approval was given for gene therapy. Features at the DNA level which made ADA deficiency a good candidate for gene therapy included: (1) Target cells were lymphocytes and so accessible through the blood; (2) T lymphocytes have a relatively long lifespan; (3) The gene had been cloned and was small (3.2 Kb), and (4) It was expected that a moderate level of gene expression would be sufficient to reduce mortality in this condition.

In 2009, there was a review of 10 ADA deficient patients treated with autologous CD34 bone marrow cells transduced with a retrovirus containing a normal ADA gene. All were alive after a median follow-up of 4 years (range 1.8 to 8.0). Eight patients no longer required PEG-ADA replacement and their blood cells continued to express the inserted ADA gene. Nine patients had improvements in their immune systems with T cell counts increasing and T cell function normalizing. Because of this the treated children were able to lead normal lives. A few serious adverse events did occur as a result of the gene therapy but when the appropriate HLA match sibling bone marrow donor was not available the authors concluded that gene therapy was a safe and effective form of treatment.

Hemophilia [1,10]. There are effective medical treatments for hemophilia including rhDNA coagulation factors but they are expensive. Patients can develop antibodies and so become difficult to treat. Hemophilia A or B are considered good candidates for gene therapy because: (1) No significant regulation of the inserted FVIII or FIX genes is required since normal plasma levels have considerable variability; (2) A small increase in the factor levels would be sufficient to convert a severe disease (1% deficiency) into a milder form (5–10% deficiency); (3) The ease of accessing blood cells for ex vivo transduction, and (4) Animal models (mouse, dog, non-human primate) are available for pre-clinical studies.

Gene therapy. Over 40 patients with FVIII or FIX deficiency have been treated. The trials were shown to be safe but clinically ineffective. Strategies attempted include: (1) In vivo administration (by IV, IM or intrahepatic injection) of a viral vector containing the normal gene, and (2) Ex vivo transduction of cells such as fibroblasts. This approach is technically more difficult but allows screening to exclude insertional mutagenesis. The FVIII gene is 5 times larger than FIX and so more difficult to package, although the B domain deleted gene is easier to work with.

Some interesting results have emerged, including the finding of retroviral sequences in semen following IV injection. This study was stopped in view of the potential for accidental germline spread, although it is likely that the vector sequences were in the tissues or fluids biopsied rather than the sperm itself. A few patients demonstrated a persistently elevated FVIII level (one lasted 10 months), and intrahepatic injection of FIX allowed one patient a transient rise in his FIX to 13%. However, there was uniformly poor long-term gene expression, a problem found in most other gene therapy studies. It is now back to the laboratory to address the problems of low to variable expression and antigenicity, i.e. a better vector is needed.

This may have been found in hemophilia B with a report late in 2011 describing success. The study (US ClinicalTrials identifier NCT00979238) involved six patients with follow up to 16 months. Of these, four were able to stop FIX prophylaxis treatment, with FIX levels across the study ranging from 2% to 11%. Longer term follow up is now awaited.

Cancer [11]. Unlike the above two examples of a single gene defect, cancer is a more complex problem requiring multiple approaches including combinations of therapies. Different strategies are possible such as: (1) Stimulating the patient’s natural immunity; (2) Killing or interfering with the growth of cancer cells with drugs; (3) Inserting a wild-type tumor suppressor gene, e.g. TP53; (4) Increasing tolerance to high doses of chemotherapy or delaying drug resistance; (5) Anti-angiogenesis effects, and (6) Novel approaches such as miRNA and oncolytic viruses.


Gene therapy. The following have been tried: (1) Cytokines are involved in key host defense mechanisms, e.g. interleukins 1, 2, 6, 8, interferon γ and TNF-α (tumor necrosis factor α), and their presence is required to activate cytotoxic T lymphocytes leading to tumor rejection. These genes are inserted into tumor cells or autologous fibroblasts which are then injected back into patients to stimulate tumor immune responses; (2) To enhance selectivity with cytotoxics, one gene therapy approach utilizes the conversion of an inactive compound (prodrug) to an active metabolite. For example, 5-fluorocytosine (the prodrug) is converted to 5-fluorouracil (cytotoxic agent) by reaction with cytosine deaminase delivered with a viral vector. Following gene transfer, cells which express cytosine deaminase will be destroyed when exposed to 5-fluorocytosine while the remaining cells survive. This form of gene therapy involves a suicide gene; (3) Replacement of abnormal tumor suppressor gene activity such as TP53, since mutations in this gene are found in over 50% of cancers (Chapter 7); (4) Key problems with cancer chemotherapy are bone marrow toxicity and drug resistance. To protect the bone marrow, chemoprotection gene therapy protocols have been designed to target stem cells, and introduce into them genes such as MDR1 (multidrug resistance 1). This gene codes for P-glycoprotein and provides cells with resistance to a wide range of cytotoxic drugs; (5) An essential step in tumor and metastatic progression is angiogenesis, and gene therapy targeted to stopping this is being attempted, and (6) miRNAs can function as oncogenes and tumor suppressor genes in cancer, thereby providing a number of possible strategies for their use in treatment. Promising results are seen in animal studies.

Eye disease [12]. The eye has some unique properties for gene therapy including its accessibility while remaining compartmentalized and immune-privileged.

Gene therapy. Clinical trials for retinal diseases (retinitis pigmentosa, age-related macular degeneration) and non-retinal diseases (uveitis, glaucoma) are underway following promising animal studies. Impressive results were reported in 2009 with Leber congenital amaurosis, a rare genetic cause of blindness. More studies are underway. The preferred vector is the adeno-associated virus (AAV) because it does not integrate into the genome and elicits a minimal immune response from the host. There appears to be long-term transgene expression in the retinal cells.

HIV [13]. The conventional HAART (highly active antiretroviral therapy) approach to AIDS treatment needs to be maintained for long periods of time, making drug resistance likely. Thus, various gene therapy strategies have been developed including immunostimulation, RNAi and viral lytic approaches. However, while safe, none has produced a significant or lasting effect.

Gene therapy. One recent and promising approach is based on a clinical observation, i.e. CCR5 functions as a co-receptor for HIV to enter cells. One variant of CCR5 involves a 32 bp deletion in the coding region leading to a truncated protein. Homozygotes for CCR5 Δ32 (approximately 1% of Caucasians) have significant resistance to HIV infection because the virus has lost one of its cellular entry points. Heterozygotes for this variant demonstrate delayed progression to clinical AIDS and viral loads are lower. An important observation emerged when a patient with AIDS developed acute leukemia and needed a bone marrow transplant. The HLA-matched marrow donor was homozygous for the CCR5 Δ32 variant. The transplant successfully treated the leukemia and it was also observed that HAART therapy could be stopped, i.e. the HIV infection had benefited from the marrow transplant primarily used to treat the leukemia. The above was tested in five AIDS patients needing autologous bone marrow transplantation to treat AIDS-related lymphoma. For this, the patients’ cells were transduced with a lentivirus that contained genes that inhibited key proteins in HIV as well as an RNA that inhibited CCR5. There were insufficient autologous transduced cells to produce a measurable antiviral effect, but it was noteworthy that the vector continued to be expressed 24 months after it had been introduced. This was the first demonstration of a long-term effect of gene therapy in HIV-AIDS.
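The CCR5 Δ32 frequencies quoted in the HIV entry can be related to one another with a short calculation. The sketch below is illustrative only: it assumes Hardy-Weinberg equilibrium, which the text does not state, takes the approximately 1% homozygote figure at face value, and uses a function name invented for this example.

```python
import math


def hardy_weinberg_from_homozygotes(homozygote_freq):
    """Estimate allele and genotype frequencies from a homozygote frequency.

    Assumes Hardy-Weinberg equilibrium (an assumption of this sketch, not a
    statement from the text). Returns a tuple of:
      q      - frequency of the variant allele
      2pq    - expected heterozygote frequency
      p * p  - expected wild-type homozygote frequency
    """
    q = math.sqrt(homozygote_freq)
    p = 1.0 - q
    return q, 2.0 * p * q, p * p


# Approximately 1% of Caucasians are CCR5 Δ32 homozygotes (figure from the text).
q, heterozygotes, wild_type = hardy_weinberg_from_homozygotes(0.01)

print(f"Δ32 allele frequency: {q:.2f}")
print(f"Expected heterozygotes: {heterozygotes:.0%}")
print(f"Expected wild-type homozygotes: {wild_type:.0%}")
```

Under these assumptions a 1% homozygote frequency corresponds to an allele frequency of about 0.1 and roughly 18% heterozygotes, the group described above as showing delayed progression to clinical AIDS.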

are starting to emerge. Two examples are adrenoleukodystrophy (ALD) and metachromatic leukodystrophy (MLD). These lead to severe neurological impairment in both children and adults. ALD had attracted considerable public interest following the movie Lorenzo’s Oil, which highlighted the plight of these patients and the importance of understanding the cause and finding new therapies. HLA-matched bone marrow stem cell transplantation can be used in ALD, but there is no definitive treatment for MLD (Table 8.6). The interest in ALD and MLD


BOX 8.2 CASE STUDY INVOLVING SCID (SEVERE COMBINED IMMUNODEFICIENCY) – TYPE X1 [14].

X-linked SCID is a rare genetic disease (about 1 male in 75 000 live births) that occurs due to a deficiency in T cells and natural killer (NK) cells, and abnormal B cell function. Affected boys will die within a year if the T and NK deficiencies are not corrected or they do not live in a sterile environment. The molecular basis for SCID-X1 is a mutation in the IL2RG gene which codes for the common γ chain cytokine receptor. This mutation causes a severe disorder because the gene codes for a subunit found on six different cytokine receptors (interleukins 2, 4, 7, 9, 15 and 21). Definitive treatment for SCID-X1 involves bone marrow transplantation from an HLA-matched sibling. This has a 72% survival rate. Unfortunately, less than 20% of affected infants have such a donor. More risky types of marrow transplants are possible, but the mortality rate is significant because the matching is less ideal, graft-versus-host disease requires immunosuppression, and T cell function is not completely restored.

Even with transplantation, B cell function may not be restored and so life-long supplementation with immune globulin is needed. For these reasons, alternative treatments such as gene therapy have been considered. Points in favor of a gene therapy approach include a well-defined mutation in a relatively small gene, and the involvement of long-lived T lymphocytes. In 1999, children with SCID-X1 started to be treated by gene therapy using a normal IL2RG gene transduced ex vivo into autologous CD34 bone marrow lymphocytes. The results looked promising, until one child developed acute T cell lymphoblastic leukemia. The FDA responded cautiously at first, but then a second child was diagnosed with the same complication. All SCID studies were put on clinical hold (no further product could be given and no new patients could be enrolled), and this was soon followed by more drastic clinical holds on gene therapies using comparable approaches to SCID-X1 (i.e. if (1) Retroviral vectors were used, and (2) Hematopoietic progenitor cells were the target). When more information became available about the leukemia, it was apparent that this complication had only occurred in patients with SCID-X1. The FDA then allowed studies to resume, subject to greater surveillance, monitoring for potential insertional mutagenesis events, and a risk/benefit analysis being carried out for each individual protocol. In the specific case of SCID-X1, the risk/benefit analysis would need to compare gene therapy risks of leukemia against alternative treatments such as haplo-identical or mismatched marrow transplants.

To put the leukemia into perspective, four of nine children treated in Paris developed this complication, which responded to chemotherapy in three cases. Eight of nine treated children had responded to the gene therapy, and their health had improved, with seven showing sustained immune reconstitution up to 11 years post treatment. The underlying genetic events leading to the acute leukemia developing are complex and include:

1. The retroviral – IL2RG gene insert contained a promoter element to drive the gene. Through inappropriate insertion this activated the LMO2 proto-oncogene which is essential for normal hematopoietic development;
2. Stem cells (including the CD34 precursor cells) in young children may be more susceptible to insertional mutagenesis events because they are still immature and have a greater proliferative capacity;


3. SCID provided a particularly suited environment for the preferential proliferation of the transduced CD34 cells compared to all other endogenous lymphocytes, and
4. Viruses that integrate are more likely to insert in or near transcriptionally active genes.

By 2011, insertional mutagenesis as a complication of gene therapy had been reported only in cases involving immunodeficiency, including 4 of 20 patients with SCID-X1, 1 of 2 with chronic granulomatous disease and 1 of 10 with Wiskott-Aldrich syndrome.
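The counts quoted at the end of Box 8.2 can be put on a common scale with a few lines of arithmetic. The sketch below uses only the figures given in the box; the variable and function names are invented for illustration, and the pooled figure is shown purely as arithmetic, since the numbers are small and come from different disorders and trials.

```python
# Reported insertional mutagenesis events as (events, patients treated),
# taken from the figures quoted in Box 8.2.
reported = {
    "SCID-X1": (4, 20),
    "Chronic granulomatous disease": (1, 2),
    "Wiskott-Aldrich syndrome": (1, 10),
}


def proportion(events, treated):
    """Return events as a fraction of the patients treated."""
    return events / treated


# Proportion affected within each disorder.
per_disorder = {name: proportion(e, n) for name, (e, n) in reported.items()}

# Simple pooled tally across the three disorders (illustrative only).
total_events = sum(e for e, _ in reported.values())
total_treated = sum(n for _, n in reported.values())
pooled = proportion(total_events, total_treated)

for name, frac in per_disorder.items():
    print(f"{name}: {frac:.0%}")
print(f"Pooled: {total_events}/{total_treated} = {pooled:.1%}")
```

A tally of this kind is only a starting point for the risk/benefit analysis described in the box, which also has to weigh the outcomes of the alternative treatments.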

TABLE 8.6 Leukodystrophies [15].

Disorder Clinical features and conventional therapies

Adrenoleukodystrophy (ALD); incidence 1:17 000 males. Peroxisomal disease characterized by progressive demyelination within the central nervous system, adrenal insufficiency and diagnostic accumulation of VLCFAs (very long chain fatty acids) in plasma and tissue. Caused by mutations in the ABCD1 gene. There are two X-linked clinical forms: (1) Childhood cerebral involvement which is rapidly progressive and associated with brain demyelination, and (2) Adult slowly progressive variant affecting spinal cord and peripheral nerves (called adrenomyeloneuropathy). Microglial cells in the central nervous system are derived from bone marrow cells and this suggested that allogeneic bone marrow hematopoietic stem cell transplantation might work in ALD. This has now been shown to be successful in the severe childhood form but only when used early on in the disease and when there is an HLA-matched bone marrow donor.

Metachromatic leukodystrophy (MLD); autosomal recessive; incidence 1:40 000. Lysosomal storage disease characterized by accumulation of sulfatides and extensive white matter damage leading to loss of both cognitive and motor functions. Most cases are caused by mutations in the ARSA gene. There are four clinical forms: (1) Late infantile; (2) Early juvenile; (3) Late juvenile, and (4) Adult. The most common are (1) and (2) with children showing difficulty walking after the first year of life. There is progressive neurologic impairment, both motor and cognitive, and most children die by age 5. The adult form is more slowly progressive and manifests with psychosis or spinocerebellar ataxia. Cell therapy approaches including bone marrow transplantation or stem cell therapy are unpredictable in their effects in MLD and enzyme replacement therapy continues to be trialed.

has intensified, as new omics approaches are applied to newborn screening and so a number of inborn errors of metabolism, including the above, are being detected before clinical features start to develop (Chapter 4). Therefore, the potential to reduce their severity or prevent them from developing by early diagnosis becomes possible.

Ex vivo gene therapy using a lentivirus to transduce the normal ABCD1 gene into CD34 autologous peripheral blood hematopoietic stem cells has been used to treat patients with ALD who could not get a matched allogeneic bone marrow transplant. A key factor in the success of these trials was early treatment. The transplanted hematopoietic stem cells differentiated into brain microglial cells, and these inhibited the progression of the demyelination that was occurring. After four years, two patients had neurological benefits that were comparable with what would have been expected from a successful bone marrow

transplant, with only about 10% of the hematopoietic stem cells being corrected by gene transfer. These results are impressive because the children would otherwise have died [15].

The situation with MLD is more complex because marrow transplantation is not generally useful, so an in vivo strategy by which the normal ARSA gene is introduced directly into the brain has been preferred. To do this an AAV vector (Table 8.4) was selected because it is less toxic and has some neural tropism. This approach has been tested as a proof of concept in normal non-human primates and it seems to work. Next will be a phase I/II clinical trial to test for toxicity (and perhaps efficacy) in humans. In the longer term, the gene therapy approach may provide an option for other and more common neurodegenerative disorders such as Alzheimer disease.

Introducing New Genes – Other Diseases

As well as the increasing number of protocols for gene therapy, another development has been the change in emphasis from genetic disease to cancer, HIV-AIDS and a range of other diseases. A breakdown of the various trials for which gene therapy has been attempted can be found in an international database of human gene therapy trials [8].

The slower onset of success with disorders other than the immunodeficiencies is not surprising, since the underlying genetic defects are significantly more complex, and the selective advantage provided to CD34 stem cells transduced with wild-type genes is not present. Nevertheless, promising data are emerging, particularly when gene therapy is combined with stem cell therapy as described above for neurodegenerative disorders.

RNA Therapies

As noted in Chapter 1, RNA has lived in the shadow of DNA since it appeared to have limited applications in transcription and translation. However, new activities for RNA have now been identified, including the formation of RNA-RNA, RNA-DNA or RNA-protein interactions. These, as well as the observation that RNA can have a catalytic effect, open up the potential for RNA in therapeutics. In earlier editions of Molecular Medicine, there was discussion of antisense oligonucleotides as ways in which to manipulate the nucleus for research and therapeutic purposes. The focus has now shifted to RNA interference.

RNA Interference (RNAi)

RNAi is a mechanism that allows cells to down-regulate or inactivate gene expression. It is an important evolutionary pathway, particularly during development, and is found in single-celled organisms, plants and animals. It protects against foreign DNA in the cell that might emerge in the form of a viral infection or transposons. RNAi is now a standard tool in research, and it is being tested as a therapeutic approach to modulating unwanted gene expression.

The two RNAi species of particular relevance to this chapter are siRNA (small interfering RNA) and miRNA (micro RNA). For siRNA the double-stranded (ds) RNA needs to be cleaved by the ribonuclease protein Dicer into smaller fragments. One of the two strands of the dsRNA (called the guide or antisense strand) will join with its complementary matching mRNA and the two will then interact with the cellular multiprotein RNA-induced silencing complex or RISC. Argonaute-2 (Ago-2) is a protein component of RISC and will cleave the target mRNA at nucleotide positions 10–11 (from the 5′ end of the matching siRNA) [16]. miRNAs play an important role in post-transcriptional gene regulation by binding to the complementary target sites in the mRNA and inhibiting translation. siRNAs come from exogenous sources such as viruses, or are produced endogenously as exemplified by the miRNAs since these are coded for in the genome. Further information on RNAi is found in Chapter 1, Figure 1.8.
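The positional rule for Ago-2 cleavage described above can be made concrete with a short Python sketch. It is a deliberately simplified model, not a design tool: it assumes perfect complementarity between the guide strand and the mRNA, takes only the first matching site, and ignores the overhangs found on real siRNA duplexes. The sequences and function names are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple

# Watson-Crick pairing for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def cleavage_index(mrna: str, guide: str) -> Optional[int]:
    """Index in the mRNA (5' to 3') at which the strand is cut.

    Guide position 1 (its 5' end) pairs with the LAST base of the target
    site, so guide positions 10 and 11 pair with mRNA indices
    start + len(guide) - 10 and start + len(guide) - 11. The cut falls
    between those two bases. Returns None if no perfect site is found.
    """
    site = reverse_complement(guide)
    start = mrna.find(site)
    if start == -1:
        return None
    return start + len(guide) - 10


def cleave(mrna: str, guide: str) -> Tuple[str, ...]:
    """Return the two mRNA fragments, or the intact mRNA if not targeted."""
    cut = cleavage_index(mrna, guide)
    if cut is None:
        return (mrna,)
    return (mrna[:cut], mrna[cut:])


# Hypothetical 21-nucleotide target site embedded in a short made-up mRNA.
TARGET_SITE = "GCUAGCUAGGAUCCGAUACGA"
MRNA = "AUGGC" + TARGET_SITE + "UUAGC"
GUIDE = reverse_complement(TARGET_SITE)  # antisense (guide) strand

print(cleave(MRNA, GUIDE))
```

Counting from the 5′ end of the guide is what matters here: because the two strands pair in antiparallel orientation, the cut lies ten bases in from the 3′ end of the target site on the mRNA.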


Synthetically produced RNAi species provide a tool that could be used to silence any gene with a known sequence in a potent and specific manner. They can be delivered into cells by viral or non-viral vectors. Not surprisingly, the biggest challenge to clinical translation with RNAi is getting these products into cells efficiently using either the precursor dsRNA that needs to be processed by Dicer or the more mature siRNA. Diseases for which RNAi would be relevant include cancers or infections since the goal here would be to inhibit disease-causing genes or RNA transcripts. Although siRNA targets only one mRNA species, miRNAs can interact with a large number of mRNAs and so the potential for non-specific side effects (called off-target effects) will need to be addressed.

Novel approaches are being tested with the miRNAs. These include the insertion of artificial miRNA target sites into the transgene or viral genome that is delivered via gene therapy. Cells that do not produce the introduced miRNA will allow the transgene or viral genome to proliferate. In contrast, cells with the same endogenously produced miRNAs will destroy any vector introduced mRNA because it also contains the miRNA target sites that are acting as decoys. This leads to greater cellular specificity in terms of the introduced gene. An example of this would be oncolytic viruses and vectors that carry suicide genes which are a current interest in cancer treatment. The effectiveness of this approach depends to some extent on the ability to distinguish between normal and cancerous cells. As described above, the co-insertion of a miRNA decoy (selected on the basis that this miRNA is found in normal but not cancer cells) into the suicide transgene or oncolytic virus will mean that following gene transfer, normal cells will inactivate the transgene or oncolytic virus because the miRNA decoy is recognized, but cancer cells will not and the suicide gene effect will proceed [17].

Selecting which miRNAs to use in gene therapy is a challenge, but choice also means greater flexibility. For example, the same miRNA can target a number of genes and the expression levels for miRNAs vary from 1 copy per cell to 10 000 copies per cell [17]. The regulatory potential of any miRNA reflects to some extent its activity in any cell. An intriguing possibility for improving the delivery of RNAi species into the cell is via nanotechnology which is defined as the “intentional design, characterization, production and applications of materials, structures, devices and systems by controlling their sizes and shape in the nanoscale range (1 to 100 nm)” (Box 8.3) [18].

Ribozymes

The first generation of RNA products in gene therapy utilized catalytic RNA molecules known as ribozymes. These are naturally occurring RNA species that cleave RNA at specific sequences. Ribozymes would have similar applications to those described above for RNAi. A phase II clinical trial using a ribozyme in HIV reported in 2009 that while there had been no consistent effect on viral load, the level of CD4 lymphocytes (cells infected by HIV) was significantly higher in the ribozyme treated group. This prompted the suggestion that there might now be an alternative or additional treatment to HAART (highly active antiretroviral therapy) which, while very effective, is also a demanding treatment regimen associated with significant side effects [19]. However, there are technological constraints with ribozymes, particularly in their design, which make production more difficult. They are also susceptible to degradation by RNases. Just as has been described for all other gene therapy approaches, more efficient methods for delivery of ribozymes into cells will need to be developed, and the in vivo effect needs to be of longer duration.

Regulatory Aspects

The monitoring of gene therapy protocols by various government and institutional


BOX 8.3 [18].

Nanomaterials consist of metal or nonmetal atoms or a mixture of metal and nonmetal atoms called metallic, organic or semi-conducting particles respectively. A feature of nanomaterials is their large ratio of surface area to volume, which allows the surface to be coated with many molecules. Apart from size and shape, their other key features are their electronic, magnetic and optical properties that vary with their composition. Nanomaterials have been approved by the FDA for use in humans, or are being trialed as:

1. Drug carriers;
2. Agents for diagnostic imaging, and
3. Genetic (DNA) testing including alternative approaches for third generation DNA sequencing.

Tumors generally have poor lymphatic drainage and porous vasculature, which is a good combination for nanoparticles to preferentially infuse into and deliver agents for treating, marking or imaging tumors. The rate of infusion of the product can be controlled by the composition of the carrier polymer. Not surprisingly, the inefficient transfer of genes using conventional vectors for gene therapy has moved to the experimental use of plasmid-containing genes that are compacted into nanoparticles. A number of gene therapy trials in animals have demonstrated very promising results in retinitis pigmentosa or advanced ovarian cancer. Compared to viral vectors, nanoparticles are less antigenic. There are no apparent short-term toxicities with nanoparticles although it remains to be seen if there are long-term effects.

biosafety committees has been intense. It was not until September 1989 that the USA National Institutes of Health (NIH) approved the first marker study involving transfer of DNA into patients with melanoma, a malignant skin cancer. In September 1990, the first therapeutic transfer of a genetically engineered cell was undertaken in a 4-year-old child with the potentially fatal genetic disorder adenosine deaminase (ADA) deficiency (Table 8.5).

This regulatory oversight has ensured safe and steady progress for gene therapy. However, a problem occurred in 1999, when an 18-year-old male, Jesse Gelsinger, died as a direct result of gene therapy (Chapter 10). The consequences of this were significant, including a reassessment of the regulatory procedures in the USA, and a greater focus on conflicts of interest, particularly those involving clinical investigators conducting trials using vectors or products that were produced by companies sponsoring their work or, of more concern, trials in which investigators had a financial interest perhaps through holding a patent for the therapeutic product used.

REGENERATIVE MEDICINE

Definitions

Regenerative medicine is defined in a 2011 UK report as a “therapeutic intervention that replaces or regenerates human cells, tissues or organs to restore or establish normal function”. The report notes that regenerative medicine utilizes small molecule drugs, biological products, medical devices and cell-based therapies. Non-regenerative applications of the same technology include drug discovery and toxicity testing [20].

As noted earlier, the title for this chapter includes cellular therapies, because cells are

playing the key therapeutic role often in association with manipulated DNA or genes. However, regenerative medicine is now part of the medical terminology particularly in relation to stem cells. So it will be used in this section to describe therapeutic applications of stem cells and, in relation to this, cloning approaches, particularly somatic cell nuclear transfer that might have therapeutic intent.

Cloning

Cloning has many different meanings. DNA can be cloned, cells can be cloned, and monozygotic twins are examples of clones. Dolly the sheep proved that whole animals (and possibly humans) could be cloned experimentally. Cloning in the context of regenerative medicine fits in best with the latter example. It can be further considered under reproductive cloning and therapeutic cloning which are terms that some claim are misleading and would be more descriptive if called live-birth cloning and experimental cloning respectively. Common to both forms of cloning is the technique of SCNT (somatic cell nuclear transfer).

Reproductive (Live Birth) Cloning

In 1997, Dolly the sheep showed that DNA from a differentiated tissue cell (mammary gland) could be taken and reprogrammed to produce a cloned copy. The process involved the removal of the nucleus from the mammary gland cell. It was then inserted into a sheep oocyte which had been enucleated. The altered oocyte was next inserted into a surrogate mother by standard in vitro fertilization techniques (Figure 8.4).

In this example, the clone’s genetic composition was virtually identical to that of the mother and there was no paternal contribution. The recipient enucleated egg still has mitochondrial DNA within its cytoplasm, contributing about 1% of the total DNA in the clone. This process is called SCNT, and although it produced Dolly the sheep as well as a host of other animals, it is very inefficient and error prone – it took around 277 attempts to get Dolly, and many of these produced malformed fetuses. It is assumed that the inefficiency of the procedure partially reflects developmental abnormalities and/or perturbations in the epigenetic control of cells during development. It should also be noted that Dolly had no paternally-derived genes which is relevant for imprinted genes (Chapter 2). So it is not surprising that as a technique, SCNT is considered to be neither efficient nor reliable enough to be used for human reproductive cloning.

Therapeutic (Experimental) Cloning

This is another application of SCNT which allows the production of embryonic stem cells for research or therapy (Table 8.7). Because both types of cloning can utilize the same technology it is difficult to ban one but not the other, and so some countries have banned both. Others have banned SCNT only if used for producing a live clone along the lines of Dolly.

There remain two unresolved issues in the debate around manipulating human embryos for cloning:

1. The therapeutic potential of embryonic stem cells versus adult stem cells and more recently induced pluripotent stem cells, and
2. The ELSI when creating or using embryos for research and not fertility purposes [22].

Dolly the Sheep was a spectacular achievement, but it also produced important scientific, ethical and moral dilemmas, and differing opinions within the lay and scientific communities. The scientists who produced Dolly have expressed dismay that this technology might be applied to human reproductive cloning because there are many questions that remain unanswered. This will be an important challenge for scientists and the community particularly if evidence or even lack of evidence is replaced by emotive issues such as infertility or a dying child who might be given a second life through cloning.


[Figure 8.4 labels: donor eggs; enucleated; somatic cell; isolated nucleus; grow to blastocyst stage; isolate & culture ES cells; OR; IVF; transplant; surrogate mother]

FIGURE 8.4 Somatic cell nuclear transfer (SCNT). The nucleus (red) from a somatic cell (mammary gland was used for Dolly) is isolated and then inserted into an enucleated donor egg. The egg with its new nuclear DNA is stimulated to divide, and then re-implanted into a surrogate mother by IVF for reproductive cloning, i.e. the production of a live animal. Alternatively, the egg serves as a source of embryonic stem (ES) cells (therapeutic cloning) that might or might not have the same antigenic makeup as the somatic cell donor depending on whether the donor cells are autologous or allogeneic. In theory, these cells could then be used for organ or tissue transplantation. Although SCNT produces a clone that is virtually identical to the donor, there are some differences because the enucleated donor egg still contains within its cytoplasm mitochondrial DNA (perhaps 1% of total DNA) that is genetically distinct from mitochondrial DNA in the donor. In this form of asexual reproduction it is difficult to control for epigenetic effects (Chapter 2).

Stem Cells

Stem cells are non-specialized cells that can self-renew and transform into other cells. They have varying potential to form different cells.

• Unipotent – forms one differentiated cell type.
• Multipotent – forms all cell types that constitute an organ, e.g. a hematopoietic stem cell.
• Pluripotent – forms most if not all of the adult cell types in the body.
• Totipotent – forms all cell types including adult, embryo and placenta.

Stem cells are the key in regenerative medicine to repairing or replacing damaged tissues and, in the longer term, producing new organs. Hematopoietic stem cells are now routinely used in bone marrow transplants, for which the Nobel Prize in Physiology or Medicine was awarded to E Donnall Thomas in 1990. Their two important properties for regenerative medicine are:

1. Self renewal – the capacity to make more stem cells, and
2. Differentiation – the ability to give rise to different progeny when exposed to the appropriate transcription factors. In doing this, a progenitor cell is first formed. This is the precursor to the specialized cell (called a differentiated cell).
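The four potency classes listed above form an ordered scale, which can be sketched as a small Python data structure. The numeric ranks are an assumption introduced here purely so that classes can be compared; they carry no biological meaning. The example cell assignments follow the classifications given in this chapter, and the names `Potency`, `EXAMPLES` and `at_least` are hypothetical.

```python
from enum import IntEnum


class Potency(IntEnum):
    """Differentiation potential, ordered from most to least restricted."""
    UNIPOTENT = 1    # forms one differentiated cell type
    MULTIPOTENT = 2  # forms all cell types that constitute an organ
    PLURIPOTENT = 3  # forms most if not all adult cell types
    TOTIPOTENT = 4   # forms adult, embryo and placenta cell types


# Classifications as described in the chapter text and Table 8.8.
EXAMPLES = {
    "hematopoietic stem cell": Potency.MULTIPOTENT,
    "embryonic stem cell": Potency.PLURIPOTENT,
    "induced pluripotent stem cell": Potency.PLURIPOTENT,
}


def at_least(cell: str, required: Potency) -> bool:
    """True if the named cell type meets or exceeds the required potency."""
    return EXAMPLES[cell] >= required


for cell, potency in EXAMPLES.items():
    print(f"{cell}: {potency.name.lower()}")
```

For instance, `at_least("hematopoietic stem cell", Potency.PLURIPOTENT)` evaluates to false, which mirrors the point that adult stem cells are multipotent rather than pluripotent.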


TABLE 8.7 Comparisons between therapeutic and reproductive cloning [21,22].

Property: Alternative terminology
Therapeutic cloning: Cloning-for-biomedical-research or experimental cloning.
Reproductive cloning: Cloning-to-produce-children or live birth cloning.

Property: Purpose
Therapeutic cloning: Production of embryos for research or embryonic stem cells for research/therapeutic purposes. The embryos are not permitted to develop into a fetus or produce a live birth.
Reproductive cloning: Cloning to enable the birth of a living human or animal genetically identical to another.

Property: Technical feasibility
Therapeutic cloning: Embryonic stem cells can be produced. Easier ways to make ESCs include the use of existing cell lines or spare embryos from IVF.
Reproductive cloning: Likely that in humans (as was found in animals) there is a significant risk of malformations or genetic abnormalities (related to failed epigenetic control of gene expression). Animals cloned in this way include Dolly and others although most died.

Property: Acceptance and legality
Therapeutic cloning: A focus for debate with polar views emerging. Some countries have made this type of cloning illegal.
Reproductive cloning: Unacceptable scientifically (although some scientists have a dissenting view). Illegal in some countries. There is a 2005 UN General Assembly declaration against this type of cloning.

Stem cells are of three types:

1. Embryonic – found within the embryo’s inner cell mass;
2. Adult – found within many differentiated organs or tissues such as the bone marrow or cord blood, and
3. Induced pluripotent stem cells – adult stem cells that have been genetically reprogrammed into behaving like embryonic stem cells (Table 8.8).

Embryonic Stem Cells

In the embryo’s blastocyst stage before implantation (about day 5–7 embryo), the inner cell mass contains all the cells that will make up the fetus. Some of these cells are pluripotent, because they will give rise to all types of somatic cells as well as the germ cells. When these pluripotent stem cells are grown in vitro they are called embryonic stem cells (ESCs or ES cells) (Figure 8.5). When maintained under appropriate culture conditions, ESCs can be cultured indefinitely in an undifferentiated state. When differentiated, ESCs give rise to the three major cell lineages (endodermal, mesodermal and ectodermal). Mouse ESCs were isolated in 1981, and the human equivalents were found in 1998. Applications of ESCs include the following:

Researching and understanding disease pathogenesis: Transgenic mice are useful animal models to study human disorders (Chapter 4). They are produced by microinjection of DNA into the pronucleus of a fertilized oocyte. Although the gene of interest is not inserted into its correct position in the genome, it still remains possible to add new genes which can function in vivo. Thus, expression of the mutant transgene will produce the clinical phenotype. An extension of this is the transgenic mouse which has been created by gene knock-out. This involves homologous recombination between an introduced mutant gene and the corresponding wild-type gene. Now gene function can be inhibited or the effect of a specific mutation observed (Chapter 4, Box 4.7).

ESCs have been critical for developing knock-out transgenics. Since ESCs are pluripotent, they can be genetically manipulated and


TABLE 8.8 Three types of stem cells for regenerative medicine [23,24].

Stem cell: Adult stem cells (ASCs, also called somatic stem cells). Multipotent.
Properties: Undifferentiated stem cells present within differentiated cells in a tissue or organ. They can renew themselves and usually differentiate into all other cell types. Their primary role is in local maintenance and tissue repair. The ASC has an established therapeutic track record as demonstrated by bone marrow transplants. ASCs are easily accessible in different tissues and rejection is not a problem when autologous. They are not associated with tumor formation unlike the following two other types of stem cells. They are multipotent rather than pluripotent (at least until iPSCs came along). A disadvantage in using autologous tissue is continued progression of disease so these cells would be precluded in genetic diseases unless the gene defect had been corrected. Finally, it is difficult to identify or separate ASCs in some organs so their manipulation or use becomes restricted.

Stem cell: Embryonic stem cells (ESCs or ES cells). Pluripotent.
Properties: Derived from inner mass cells taken from pre-implantation embryos at in vitro fertilization. Their embryonic origin at this stage means they should be able to differentiate into the three germ cell layers of endoderm, mesoderm and ectoderm. Apart from the ELSI surrounding ESCs the challenges using these stem cells in regenerative medicine are: (1) It is difficult to generate fully functional cell types and so considerable effort has gone into determining experimental conditions to drive ESCs into differentiating into required cell types, and (2) They can form tumors (teratomas) and it is this capacity which indicates that ESCs are pluripotent in type. The risk for teratoma formation reduces if the differentiation process produces a homogenous population. It may be exacerbated if immunosuppression is needed to counter rejection.

Stem cell: Induced pluripotent stem cells (iPSCs). Pluripotent.
Properties: These are adult cells that have been genetically engineered using viral vectors to dedifferentiate and behave like ESCs through the reprogramming of genes and growth factors found in the latter. First found in mice in 2006 and in the following year in humans. They demonstrate features of pluripotent cells and can generate cells with features of all three germ cell layers. They have a lot in common with ESCs including problems such as teratoma formation, variable capacity to differentiate and difficulty in generating fully functional cells. The use of viral vectors for inserting the reprogramming factors could lead to genome instability and hence cancer. Their main attraction is as an alternative source for ESCs.

[Figure 8.5 labels: egg + sperm; fertilised zygote; blastocyst; embryo; cord blood; adult]

FIGURE 8.5 Sources of stem cells. Embryonic stem cells are derived from the 5–7 day embryo known as the blastocyst. The outer layer of cells depicted as circles are the trophoblast that will go on to form the placenta. The cells at the bottom forming the inner cell mass are the embryonic stem cells. Cord blood stem cells obtained from the placenta at birth, and various sources of stem cells in the adult make up the adult stem cells.

then reintroduced into the blastocyst of a developing mouse to produce a chimera. Foreign DNA which has become integrated into the germline of the chimera will enable the gene to be transmitted to progeny. Appropriate matings will produce homozygotes containing the transgene. ESCs allow a gene to be targeted to its appropriate locus and replace the normal wild-type counterpart by homologous recombination. Using this approach, a better understanding of genetic inheritance or disease pathogenesis becomes possible. The utility of knock-out studies in defining the function of unknown genes is illustrated by the mouse hox-1.5 gene, which was inactivated by homologous recombination. Homozygous mutants for this defect developed a phenotype similar to the human DiGeorge syndrome, i.e. absent parathyroid and thyroid glands with defects of the heart, major blood vessels and cervical cartilage.

Human transplantation of tissues or organs: The pluripotent and immortal qualities of ESCs make them ideal candidates for use in transplantation, to repair damaged tissues or replace tissues that have undergone degenerative changes. There is evidence from mouse work that ESCs might prove useful in conditions such as Parkinson disease, myocardial infarction and spinal cord injuries. In 2009, the FDA gave approval for a trial to start using human ESCs to treat acute spinal cord injury. This trial experienced some initial setbacks, including the finding of cysts in the pre-clinical mouse animal model. The trial resumed when it was confirmed that these did not represent tumors but for commercial reasons was discontinued in 2011 [23]. A second ESC study was approved for macular degeneration which is an important cause of blindness. Preliminary results for the first two patients treated emerged in early 2012 and showed no major side effects and perhaps some improvement in visual acuity.

Significant technical challenges need to be overcome before the promises ascribed to hESCs are realized. These include:

1. Growing large numbers of the cell type required;
2. What are the appropriate types of stem cell to use, i.e. what degree of differentiated ESC is needed for various scenarios?
3. Another consideration is the question of rejection since the source of the tissue is not normally the recipient’s. This could be addressed by anti-rejection treatment similar to what is already used in allografts. Another way to control rejection is SCNT discussed above, and
4. Can these cells cause cancer?

A comparison of ESCs and adult stem cells is given in Table 8.8.

ESCs will always be sourced from excess embryos obtained during in vitro fertilization procedures. Although these excess embryos would eventually be destroyed, their use for human research remains controversial. In response to this, governments have usually regulated access to ESCs, or banned their use.

Adult (Somatic) Stem Cells

Although ESCs have the greatest potential to differentiate into various cell types, the disadvantages mentioned earlier, particularly their source, make adult stem cells (ASCs) attractive alternatives for transplantation, provided their ability to differentiate into different cells and tissue types (described as plasticity) can be proven. Therefore, the debate centers on the degree of plasticity possible, with claims that ASCs can differentiate into a wide range of tissues. Others are more skeptical, wanting to know if the ASC is actually changing its function, or whether this apparent plasticity is due to a coexistent or itinerant stem cell that has been carried along with, say, a hematopoietic stem cell that now appears to be producing a brain cell. Another explanation of the ASC’s apparent plasticity involves cell fusion between something like a hematopoietic stem cell and the host’s target cell thereby making it appear as

though the hematopoietic stem cell has differentiated into a distinct cell.

The potential uses of ASCs are comparable to what has been proposed for ESCs:

1. Research: Understanding dedifferentiation and redifferentiation: The complex control of cellular differentiation is not understood, and so the option to have a model (the ASC) to explore the molecular and cellular controls would provide invaluable scientific knowledge, as well as possible therapeutic options to induce cells to change their primary differentiation pathway, and
2. Therapeutic: Some evidence is already available from mouse models that adult neurogenic stem cells can be used in the treatment of Parkinson disease.

Induced Pluripotent Stem Cells

The first description of human induced pluripotent stem cells (hiPSCs) came in 2007. Since then, there has been great interest in hiPSCs, because they appear to behave like ESCs but do not have the ethical constraints of using human embryos. Industry has also become involved and many patents have been filed for ways in which the reprogramming step, needed to convert an adult cell into an ESC-like cell, might be undertaken (Table 8.8). SCNT is one way to get iPSCs but, as shown by Dolly, it is inefficient and has never been performed in humans. The usual approach is to use a set of four genes OCT4, SOX2, KLF4 and MYC. These are introduced into the cell with retroviral vectors. Just like gene therapy, this can lead to multiple inserted copies and/or insertional mutagenesis. The problem might be overcome if small molecules and proteins can be made to work [24].

Because the iPSCs are similar to ESCs (in terms of genes, surface proteins, telomerase levels and both are pluripotent) they also have the potential to form tumors (Table 8.8). However, there is evidence that these two cell types are different including their epigenetic patterns and copy number variants. Whether these differences are due to the way the iPSCs were formed, or are intrinsic to the cells remains to be determined. Some also question the ability of these cells to proliferate which would limit their value for making tissues or maintaining them. Animal studies have shown that iPSC-derived dopaminergic neurons can correct the disease phenotype in Parkinson disease but results in humans will not be available for a while.

A novel application of iPSCs was recently described, involving endangered species, an example of which is the white rhinoceros. The 2 230 animals that were living in 1960 are now down to seven and none are reproducing, even in the wild [25]. Researchers have shown that the four human genes needed to reprogram a somatic cell into an iPSC also work in the rhinoceros. The iPSCs produced will be stored and when required it is proposed to generate germ cells and use these with IVF to continue the species.

There has been considerable hype over the powers of stem cells to cure a wide range of human disorders. However, more scientific evidence is needed and this will be a challenge. Apart from regenerative medicine applications, stem cells are key prerequisites for technologies like gene therapy. Without the stem cell becoming involved, gene therapy will always have a transient effect, limited by the half life of the cell that has been genetically altered. Stem cells per se promise alternative therapeutic approaches which can be used alone, or the stem cell’s genetic environment can be genetically manipulated to provide it with greater flexibility.

Transdifferentiation

Another option in generating stem cells is to bypass the pluripotent cell phase (and so the risk for tumor development) and go directly from a somatic cell such as a fibroblast to the

MOLECULAR MEDICINE 270 8. Molecular and Cellular Therapies cell of choice, e.g. a nerve cell, by using the functions of the patient’s liver can be taken right combination of transcription factors. This over temporarily by perfusing the patient’s is technically possible, but the drawback might blood through pig liver tissue, and be that it produces a cell line that does not have 4. More controversially, does the presence of limitless growth potential. an animal-derived substance, such as bovine serum albumin, in a therapeutic product constitute a xenotransplant? OTHER THERAPIES Despite some enthusiasm for whole organ Xenotransplantation xenotransplants in the 1980s–1990s, the current focus is on cellular xenotransplants for treating The conventional organ transplant is an allo- diabetes and conditions such as Parkinson dis- graft – i.e. donor and recipient are the same spe- ease. Cells prepared from the donor animal are cies. However, it is estimated that in the USA injected into the patient directly or encapsulated there are around 110 000 patients waiting for within a membrane. Although some prelimi- organ transplants and only about 30 000 will nary results are promising, two major problems become available [26]. To increase the number need to be resolved before xenotransplants can of donor organs, some countries have used a be more realistically assessed in clinical trials, presumed consent approach; i.e. failure to opt out namely graft rejection and the risk of infection. of being a donor at time of death allows organs to be removed from that person. However, it is Graft Rejection unlikely that this approach will be universally acceptable and, in some jurisdictions, the next of Rejection of xenotransplants involves both kin has the final say on permission to transplant antibody and cellular responses to the foreign tis- irrespective of the deceased’s wishes. In this sue. 
The best characterized is called hyper­acute environment of increasing demand, xenotrans- rejection, and results from preformed anti­bodies plantation has been proposed as an alternative to the animal tissue leading to a rejection response source of organs or, in the case of diabetes, a tis- within minutes of transplantation. At present, the sue transplant that might replace the dysfunc- most suitable animal for cellular xenotransplanta- tional pancreatic islets in the patient. tion is the pig because of its breeding characteris- Xeno (Ξνo) comes from the Greek for foreign tics, and because its organs are comparable in size or strange. Xenotransplantation describes the to those of humans. The basis for the hyperacute transplantation of living cells, tissues or organs rejection in this animal is the presence of pre- from one species to another. The scope for a formed antibodies in the recipient to a ubiquitous xenotransplant is broad, and may involve: carbohydrate epitope in pig vascular endothe- lium called α1,3Gal. Activation of the comple- 1. Solid organs, such as a heart from a non- ment pathway also occurs as does T lymphocyte human primate or a pig; mediated cellular damage. Approaches to control 2. Cells, for example, pig islet cells could be this rejection include: used to treat diabetes in humans; 3. Animal extracorporeal organs or tissues 1. Transgenic pigs are humanized by taking out could be used to support a human until the gene for α1,3Gal; the latter’s own tissues start to work. An 2. The pig’s complement pathway is modified, example of this would be acute poisoning and which leads to temporary liver failure from 3. The effects of recipient CD40 and CD154 which the patient will recover with time. The T lymphocytes are blocked [27].

A dysfunctional coagulation pathway also contributes to the rejection and different strategies are being tested to resolve this.

Animal-to-Human Infection

The transmission of bovine spongiform encephalopathy (Chapter 6) from cattle to humans had sensitized regulators, scientists and the community to the potential for animal-to-human (zoonotic) infections. Although porcine tissue would not transmit hepatitis viruses, the risks of passing herpes and cytomegalovirus would be similar to human-to-human transplants. Nevertheless, the major concern in xenotransplantation is the risk of the horizontal transmission of pig endogenous retroviruses (abbreviated to PERV) with three types being identified, namely PERV A, B and C [27]. The potential for PERV infection has an experimental basis, but is difficult to quantify in a real time human situation. The risks would be compounded if the recipient was immunosuppressed prior to transplantation. Overall, the risk for PERV is likely to remain unknown until large studies are conducted and there is a longer follow-up period. Recent data are reassuring since they suggest that while PERV infection can occur in vitro, there is little evidence that this is accompanied by replication of PERV.

Steps are being taken to reduce the risks of infections from xenotransplants, including:

1. Breeding pigs that are free of certain pathogens using specialized facilities and their periodic testing for these pathogens;
2. A better understanding of the biology of PERV and whether it will infect humans, and
3. Development of long-term follow-up and surveillance strategies to ensure that any novel infection is quickly detected and appropriate containment implemented.

Regulation

Ultimately, the value of xenotransplantation will be based on its risk versus its benefits. For it to occur with the continuing support of the community and the avoidance of untoward side effects, appropriate regulatory oversight is needed. The dilemma of getting informed consent when there are a number of unknowns in relation to risk is discussed in Chapter 10. Also relevant to xenotransplantation is the tension between the traditional ethical principle of autonomy and the public health measures that might be needed to prevent spread of infection, particularly if xenotransplant recipients declined follow-up and surveillance requirements. Should such an event occur, exposed family members might also become involved.

Xenotransplantation has been banned in many countries. Recently, prohibition has been relaxed in some jurisdictions, subject to careful short- and long-term monitoring. The effect of banning a technology such as xenotransplantation is exemplified by Living Cell Technologies Ltd, a New Zealand based company that had been experimenting with pig islet cells as an alternative treatment for type 1 diabetes since the early 1990s. Work started but then had to stop because of regulatory issues in its home country. As a result the company moved its clinical trials offshore. Since 2009, the ban on xenotransplantation has been lifted, and now clinical trials are underway in New Zealand and other countries using pathogen-free pigs (to reduce the risk of PERV) as donors for islet cells that are encapsulated in a semi-permeable gel [28].

Synthetic Biology

Synthetic biology (or synbio) describes an emerging area of research that combines science (biology, chemistry, genetics) with engineering and computer science. Its goal is to construct novel biological (living) entities out of non-living materials, or to redesign existing ones so they do something that would not naturally occur, for example, manufacturing a product. Synbio has potential to lead to significant scientific, commercial and social outcomes [29,30]. It is not a new field of study. However, it has gained greater impetus through recent developments in molecular medicine and science, including the ease and cheapness of DNA sequencing.

Potential applications of synbio include:

1. Production of new medical therapeutics, diagnostics and tissues;
2. Biofuels as alternatives to fossil fuels;
3. Detecting and removing pollutants;
4. Production of chemicals or fibers, and
5. Novel food additives in agriculture.

Challenges for synbio are:

1. Concern in the scientific and broader community for harm through uncontrolled interference with the environment and biodiversity or increasing the potential for bioterror, and
2. Rethinking of patents and intellectual property issues particularly in re-engineering developments.

Applications

The synthetic bacterium created by J Venter and colleagues in 2010 was considered by some in the media as the first example of artificial life. However, this was incorrect as the experiment involved the creation by genetic engineering of a bacterial genome (1.08 Mb Mycoplasma mycoides JCVI-syn1.0 genome) that was then inserted into another living organism Mycoplasma capricolum which had had its genome removed [31]. The tour de force part involved the in vitro synthesis of the entire M. mycoides genome from a published reference sequence. The end result was a new bacterium with the expected phenotypic characteristics of donor DNA that was also self-replicating.

The importance of rapid, accurate and cheap DNA sequencing was illustrated in this project to identify the baseline reference genomes for the two mycoplasmas studied and then to test the accuracy of the various component DNA sequences synthesized in vitro to build the synthetic bacterial genome. It is interesting to note that progress was temporarily stopped because of a single base pair deletion in an essential bacterial gene. In contrast, major changes in sequence or structural variants in non-essential parts of the bacterial genome did not impair viability.

An example of a new in vitro synthesized antimalarial drug (artemisinin) from knowledge of its chemical structure is described in Box 4.6, and opportunities for synthetic biology in developing new biofuels are briefly mentioned in Table 6.9. New vaccines have also been proposed, including re-engineering Helicobacter pylori so its non-immunogenic flagellin is altered, by adding a component of the E. coli flagellin. The chimeric product formed now provides a vaccine to protect against H. pylori. Biosensors are another focus for synbio, as these are core elements available for the cell to respond to environmental stimuli. An intriguing goal for biosensors is to create an artificial nose made up of microsensors based on bacterial or enzymic systems for detecting specific compounds [29,30,32].

Generally, all work in synbio has been conducted in microbes, as these are easier to manipulate than more complex organisms. Goals such as new therapeutic agents or biofuels can be managed within current regulatory requirements and available biotechnology including computer science. However, the next development might utilize mammalian synbio, perhaps in the field of stem cell research. The potential to manufacture or remanufacture what is already there will have important health applications, and no doubt will also raise additional concerns about safety. Hence it will be important for steady but safe progress to have a transparent and functional regulatory framework in place. More about the regulatory issues and ELSI of synbio can be found in Chapter 10.

References

[1] Callaghan M, Kaufman RJ. Haemophilias: gene therapy. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.

[2] Coppola A, Di Capua M, Dario MN, et al. Treatment of hemophilia: a review of current advances and ongoing issues. Journal of Blood Medicine 2010;1:183–95.
[3] Puetz J. Optimal use of recombinant factor VIIa in the control of bleeding episodes in hemophilic patients. Drug Design, Development and Therapy 2010;4:127–37.
[4] Harris KM, Maurer J, Kellerman AL. Influenza vaccine – safe, effective and mistrusted. New England Journal of Medicine 2010;363:2183–5.
[5] Aderem A. Fast track to vaccines. Scientific American 2011;304:50–5.
[6] Hung C-F, Monie A, Weng W-H, Wu TC. DNA vaccines for cervical cancer. American Journal of Translational Research 2010;2:75–87.
[7] Faurez F, Dory D, Le Moigne V, Gravier R, Jestin A. Biosafety of DNA vaccines: new generation of DNA vectors and current knowledge on the fate of plasmids after injection. Vaccine 2010;28:3888–95.
[8] The Journal of Gene Medicine (Wiley) Clinical Trial Site. Comprehensive source of information on world wide gene therapy clinical trials. www.wiley.com/legacy/wileychi/genmed/clinical/
[9] Aiuti A, Cattaneo F, Galimberti S, et al. Gene therapy for immunodeficiency due to adenosine deaminase deficiency. New England Journal of Medicine 2009;360:447–58.
[10] Petrus I, Chuah M, VandenDriessche T. Gene therapy strategies for hemophilia: benefits versus risks. Journal of Gene Medicine 2010;12:797–809.
[11] Touchefeu Y, Harrington KJ, Galmiche JP, Vassaux G. Review article: gene therapy, recent developments and future prospects in gastrointestinal oncology. Alimentary Pharmacology and Therapeutics 2010;32:953–68.
[12] Liu MM, Tuo J, Chan C-C. Gene therapy for ocular diseases. British Journal of Ophthalmology 2011;95:604–12.
[13] DiGiusto DL, Krishnan A, Li L, et al. RNA-based gene therapy for HIV with lentiviral vector-modified CD34 cells in patients undergoing transplantation for AIDS-related lymphoma. Science Translational Medicine 2010;2:1–8.
[14] Hacein-Bey-Abina S, Hauer J, Lim A, et al. Efficacy of gene therapy for X-linked severe combined immunodeficiency. New England Journal of Medicine 2010;363:355–64.
[15] Biffi A, Aubourg P, Cartier N. Gene therapy for leukodystrophies. Human Molecular Genetics 2011;20:R42–53.
[16] Grimm D. Small silencing RNAs and gene therapy. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2010.
[17] Brown BD, Naldini L. Exploiting and antagonizing microRNA regulation for therapeutic and experimental applications. Nature Reviews Genetics 2009;10:578–85.
[18] Kim BYS, Rutka JT, Chan WCW. Nanomedicine. New England Journal of Medicine 2010;363:2434–43.
[19] Mitsuyasu RT, Merigan TC, Carr A, et al. Phase 2 gene therapy trial of an anti-HIV ribozyme in autologous CD34 cells. Nature Medicine 2009;15:285–92.
[20] 2011 report: Taking stock of regenerative medicine in the United Kingdom. www.bis.gov.uk/assets/biscore/innovation/docs/t/11-1056-taking-stock-of-regenerative-medicine
[21] NIH site on basic facts about stem cells. http://stemcells.nih.gov/info/basics
[22] Pynes CA. Human cloning: legal aspects. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd.; 2009.
[23] Teo AKK, Vallier L. Emerging use of stem cells in regenerative medicine. Biochemical Journal 2010;428:11–23.
[24] Wu SM, Hochedlinger K. Harnessing the potential of induced pluripotent stem cells for regenerative medicine. Nature Cell Biology 2011;13:497–505.
[25] Ben-Nun IF, Montague SC, Houck ML, et al. Induced pluripotent stem cells from highly endangered species. Nature Methods 2011;8:829–31.
[26] Cooper DKC, Ayares D. The immense potential of xenotransplantation in surgery. International Journal of Surgery 2011;9:122–9.
[27] Pierson RN, Dorling A, Ayares D, et al. Current status of xenotransplantation and prospects for clinical application. Xenotransplantation 2009;16:263–80.
[28] Living Cell Technologies Ltd. www.lctglobal.com/
[29] European Commission report – Ethics of synthetic biology 2009. http://ec.europa.eu/bepa/european-group-ethics/docs/opinion25_en.pdf
[30] Synthetic biology: an introduction 2011 (a summary of a comprehensive European Academies Science Advisory Council report on synthetic biology). www.easac.eu/home/reports-and-statements/detail-view/article/synthetic-bi-1.html
[31] Gibson DG, Glass JI, Lartigue C, et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 2010;329:52–6.
[32] Presidential Commission for the Study of Bioethical Issues December 2010 report on synthetic biology. www.bioethics.gov/documents/synthetic-biology/PCSBI-Synthetic-Biology-Report-12.16.10.pdf

Note: All web-based references accessed on 27 Feb 2012.

CHAPTER 9

Forensic Science and Medicine

OUTLINE

Introduction
History
Report from the US National Academy of Sciences
Expert Evidence
DNA Profiling
Technology
Crime Scene
DNA Databases
Disaster Victim Identification
Cold Cases
Post-Conviction DNA Testing
Relationship Testing
Molecular Autopsy
Bioterror
Microbial Forensics
Scientific Research
Future
The Conservatism of the Courts
In-Field Testing
Personalized Justice and Sentencing
References

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00009-8 © 2012 Elsevier Inc. All rights reserved.

INTRODUCTION

The word "forensic" comes from forensis (Latin) meaning "before the forum" to reflect Roman times when a criminal case was presented to a forum of people. Definitions for "forensic medicine" and "forensic science" are many including:

• Forensic medicine refers to the application of medical knowledge to questions of law. It deals with:
  • The deceased, e.g. forensic pathology, forensic odontology, forensic anthropology, and forensic entomology;
  • The living, i.e. clinical forensic medicine with its involvement in assault, rape and some forms of trauma.
• Forensic science is the application of scientific expertise to questions of law in relation to criminal or civil actions. It involves many areas of expertise, including evidence based on crime scene investigation, archeology, anthropology, toxicology, fingerprint analysis, DNA profiling, accounting, polymer engineering, engineering, microbial forensics, psychology, psychiatry, psychophysiology, vehicles and traffic accidents, fire investigation, body analysis, e.g. CCTV camera, arts, document examination, ballistics, information-computer science, entomology and so on.

In terms of molecular medicine, the focus of this chapter will be DNA testing applications within forensic medicine and science.

History

The traditional dermatoglyphic fingerprints made their appearance in the 1890s, and were adopted by the courts over the next few decades (Table 9.1). This was followed by the use of protein polymorphisms to compare crime scene samples (blood, semen, tissue) with blood taken from the accused. Genetic differences detected through protein polymorphisms have been used in forensic laboratories since the late 1960s. Initially, protein markers were based on the ABO blood groups. Subsequently, other blood groups, serum proteins, red blood cell enzymes and then histocompatibility (HLA) antigens have been typed. These markers complemented the fingerprints, and in some cases, became the primary forensic evidence in the courts.

One major disadvantage of protein markers is their limited degree of variability. Thus, the finding of commonly occurring protein polymorphisms in samples from a crime scene and from a person of interest or the accused would be of doubtful value if the probability of this event could not exclude it occurring by chance. For protein markers, probabilities that coincidence could explain a similar match can be as low as 1 in 100 to 1 in 1000. Therefore, their utility in a legal sense is better directed to exclusion – i.e. samples from a crime scene and the accused had

TABLE 9.1 Developments in DNA forensic testing.

Year | Event | Role played by DNA
1890s | Traditional (dermatoglyphic) fingerprint accepted as a unique identifier. | Nearly 100 years later the DNA fingerprint arrives. Traditional fingerprints accepted by the courts without the rigor now imposed on DNA fingerprints.
1985 | Immigration authorities deny entry of a Ghanaian child into the UK. | DNA evidence confirms the child is related to a woman with UK residency. It also showed she is likely to be the mother and not an aunt. On this evidence, the child is allowed entry into the UK.
1987 | Youth arrested for two rape-murders committed in 1983 and 1986. | DNA excludes the individual but links the two crimes. Subsequently, a new suspect is convicted.
1989 | New York Supreme Court – People versus Castro. | A double murder becomes one of the first cases to test the validity of DNA evidence.
Mid 1990s | Move to PCR-based STR analysis. | This represents a key technologic advance in DNA fingerprinting.
1994 | DNA fingerprinting dispute laid to rest (Nature). | Two key players in a public dispute on the value of DNA fingerprinting publish a joint paper indicating their concerns about the scientific basis for DNA fingerprinting are now resolved.
1995 | O J Simpson declared innocent in a double murder case. | This case emphasizes the importance of crime scene investigation, and chain-of-custody of samples for DNA testing.
1995, 1998 | UK National DNA Database established (followed soon after by the FBI's database). | Provides some early examples of how a national database can assist in solving crimes but also illustrates potential civil liberty issues.
1996 | US National Institute of Justice commissioned report on DNA evidence in the courts. | Study identifies 28 individuals convicted of serious crimes (some on death row) who are then exonerated because of DNA evidence.
2001 | 2 792 people killed in terrorist attack on the World Trade Center in New York. | Illustrates the most difficult scenario for disaster victim identification based on the numbers killed as well as the state of the remains.
2001 | Anthrax bioterror threat. | Potential for bioterrorism becomes real and provokes DNA-based response systems to be set into place.
2004 | mtDNA and bioterror. | Perpetrator of terror bombing identified through mtDNA profile.
2009 | Court sentencing influenced by genetic DNA test results. | Murderer's sentence reduced based in part on evidence that he had a class of genes which contributed to the crime because they made him more aggressive.

different protein polymorphisms and so were not related – than inclusion – i.e. samples from a crime scene and the accused had the same protein polymorphisms and so are likely to have come from the same source.

Other problems inherent in protein analysis include:

1. The quantity of tissue required;
2. The ease with which proteins degrade. This is particularly relevant to the crime scene where ideal laboratory conditions will not be found, and the tissue available for analysis will, more often than not, be limited in amount and quality, and
3. Evidence based on protein markers is unlikely to be helpful or even available for a crime committed in the past, because protein, unlike DNA, does not last and cannot be stored for long periods.

DNA Polymorphisms

DNA polymorphisms were first described in the 1970s – early 1980s. Their numbers and types have since expanded rapidly (Chapter 1, Table 1.5). The inherent variability in DNA polymorphisms led to the concept of DNA fingerprinting in 1985, when A Jeffreys and colleagues described how more complex DNA polymorphisms (called minisatellites) could be used to produce DNA profiles for individuals. The courts became interested when the potential for identification of individuals on the basis of their minisatellite DNA patterns was realized. The first court case to allow DNA fingerprints as evidence took place in 1987.

DNA polymorphisms provided a more sophisticated approach to tissue comparisons. They enable exclusion or inclusion of the accused or person of interest, since a match between DNA markers taken from evidentiary material at the crime scene and the accused is highly unlikely to occur by chance – probabilities from 1 in 10^5 to 1 in 10^6 are now achievable. Subsequently, the availability of PCR and the finding of another type of DNA polymorphism called microsatellite or simple tandem repeat (STR) (Chapter 1, Table 1.5) has given DNA fingerprinting added utility (see PCR below).

British and North American courts of law soon accepted DNA testing as a suitable form of evidence in civil and criminal cases (Table 9.1). For the police, DNA-based evidence has been particularly valuable in excluding a person of interest, as well as pointing to a likely suspect. A particular appeal of the DNA fingerprint lies in the robustness of DNA, so that samples from a crime scene are suitable for testing, even long after the crime was committed. The value of the DNA fingerprint mirrored the steady improvements in DNA technology as well as a better understanding of the distribution of DNA polymorphisms within populations (Figure 9.1).

[Figure 9.1 diagram: DNA fingerprint leading to three outcomes. No match: exclusion (~20%). Ambiguous or no result: sample problem, test problem. Match: DNA specimens same, random match, false result.]

FIGURE 9.1 Outcomes from a DNA comparison in the forensic situation. There are three possible outcomes from DNA fingerprinting: (1) No match: This is a very powerful argument for excluding an individual. (2) Ambiguous or no result: This may be due to problems with the specimen or the test. (3) Match: This finding can be interpreted in a number of ways and it is the function of the laboratory, expert witnesses and the courts to determine which is the most likely. Possibilities include: (i) The crime scene DNA and the suspect's DNA are the same; (ii) The crime scene DNA and that of the suspect are the same by chance; (iii) A false or spurious result from errors including collection or processing of the sample; misinterpretation or incorrect reporting of the laboratory results. The error may also have resulted from criminal intent on the part of the police, laboratory staff or the victim.

In the late 1980s – early 1990s, DNA forensic fingerprinting passed through a controversial phase. Three problems were identified:

1. The use of patented DNA polymorphisms meant inter-laboratory comparisons were not possible and so quality assurance was severely compromised;
2. The statistical methods that were used to calculate the likelihood of DNA matches were questioned following some absurd claims about matches. Particular concern was expressed when the accused came from a minority ethnic group. In the State versus Castro example (Box 9.1), the laboratory reported that a DNA match between a blood stain found on the accused and blood from the victim had a 1 in 10^8 probability of occurring by chance alone. However, the comparisons used to derive the chance association were considered invalid for a number of reasons, including the fact that they had not been made against an ethnic group to which the accused belonged, i.e. Hispanic, and
3. The chain of custody issue was particularly important for DNA evidence because of the greater risk for contamination with PCR.

These problems have now generally been resolved with input from government, law enforcement agencies, particularly the US Federal Bureau of Investigation (FBI), and the involved laboratories. In a relatively short time frame, DNA technology has had a major impact on the judicial system, which is noteworthy given the slow pace with which it usually moves. In 1995, it became legal in the UK for law enforcement agencies to take DNA from hair roots or buccal swabs (i.e. non-intimate samples) of those convicted of serious crimes. The aim was to establish the UK National DNA Database. In 1990, the FBI started work on a similar database, and there are now many others worldwide.

Law and order are important election issues. A common response is for additional law enforcement officers to be appointed, but rarely is there a corresponding increase in forensic science capability to deal with the additional evidence that will result. The backlog of evidentiary material that is found in many countries will continue to grow. Capacity building is

BOX 9.1 CONTROVERSIES OVER DNA EVIDENCE IN THE COURTS.

In 1989 the first major controversy over DNA forensic testing arose during a pretrial hearing in a double murder involving the State of New York versus Castro. At this time DNA evidence was first seriously questioned by a number of leading scientists. This subsequently led to the demonstration of suboptimal laboratory practices, as well as doubtful interpretations of the statistical significance of DNA polymorphic data. The potential for error was difficult to quantify, because quality control programs could not be developed with DNA profiling since many forensic laboratories utilized their own single-locus VNTR probes protected by patents. This meant that there was less transparency and inter-laboratory comparisons were impossible. Some of the evidence from DNA studies in the Castro case was deemed inadmissible, although the accused later confessed. Following this a number of cases had to be withdrawn by the prosecution because DNA data comprised an important part of the evidence. Cases already decided were appealed. The scientific controversies about DNA fingerprinting continued into the 1990s, particularly with respect to the significance of identical matches. Public interest reached extraordinary levels in the 1995 trial of O J Simpson in the USA, when he was accused of a double murder and the evidence included blood at the crime scene, on items of his clothes and in his car. Despite very strong DNA evidence presented by the prosecution, the accused was acquitted. This case highlighted that DNA evidence is only as good as the laboratory that produces it, and equally important, the way in which police and forensic experts handle the evidence to ensure that the chain-of-custody cannot be challenged.
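The dispute over reference populations in the Castro case can be made concrete with a small worked sketch. It assumes the widely used product rule (independent loci, each in Hardy-Weinberg equilibrium), which this chapter does not itself describe, and every locus name, allele label and frequency below is invented for illustration rather than taken from any real database. The point is only that the same DNA profile can yield very different match probabilities depending on which population supplies the allele frequencies.

```python
import math

# Invented allele frequencies for two hypothetical reference populations.
# Locus names, allele labels and numbers are illustrative assumptions only.
ALLELE_FREQUENCIES = {
    "population_A": {
        "locus1": {"a": 0.10, "b": 0.20},
        "locus2": {"c": 0.05, "d": 0.15},
        "locus3": {"e": 0.10},
    },
    "population_B": {
        "locus1": {"a": 0.20, "b": 0.30},
        "locus2": {"c": 0.10, "d": 0.25},
        "locus3": {"e": 0.20},
    },
}

# Profile shared by the crime scene sample and the suspect:
# heterozygous at locus1 and locus2, homozygous at locus3.
PROFILE = {"locus1": ("a", "b"), "locus2": ("c", "d"), "locus3": ("e", "e")}


def genotype_frequency(freqs, allele_1, allele_2):
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, otherwise 2pq."""
    p, q = freqs[allele_1], freqs[allele_2]
    return p * p if allele_1 == allele_2 else 2 * p * q


def random_match_probability(profile, population):
    """Product rule: multiply genotype frequencies across loci assumed independent."""
    return math.prod(
        genotype_frequency(population[locus], a1, a2)
        for locus, (a1, a2) in profile.items()
    )


def expected_coincidental_matches(match_probability, population_size):
    """Expected number of unrelated people who would match purely by chance."""
    return match_probability * population_size


if __name__ == "__main__":
    for name, population in ALLELE_FREQUENCIES.items():
        rmp = random_match_probability(PROFILE, population)
        print(f"{name}: about 1 in {round(1 / rmp):,}")
    # Protein-marker scale (1 in 1000) versus DNA scale (1 in 10^6)
    # in a hypothetical community of one million people.
    for probability in (1e-3, 1e-6):
        print(probability, expected_coincidental_matches(probability, 1_000_000))
```

With these invented figures the profile is roughly 1 in 167 000 in population A but roughly 1 in 4 200 in population B, a 40-fold difference that arises solely from the choice of reference data. The last function shows why a 1 in 1000 protein-marker match is better suited to exclusion: in a community of one million people about 1000 individuals would be expected to match by coincidence, compared with about one at 1 in 10^6.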

needed to increase the number of skilled individuals working in this area. Fortunately, this has been helped by popular television shows which have given forensic science a higher public profile. However, to get the best people and to keep them will require more academically rigorous training, continuing education programs and skills in research.

Report from the US National Academy of Sciences

In response to the US Congress, the US National Academy of Sciences undertook a broad review of forensic science services in 2009. It dealt with a system that was complex and decentralized with a multiplicity of players, jurisdictions, demands and limitations [1]. It made the important point that forensic science involved many disciplines resulting in disparate technologies, methodologies, published material, reliability of measurement and other issues. Practitioners were also diverse in their training and skills, ranging from medical graduates, scientists, technicians, crime scene investigators to law enforcement officers. Not surprisingly in this complex mix, the review found differing standards and inconsistencies between states as well as between the state and federal systems. The report made 13 recommendations to improve forensic services by:

1. Better assisting law enforcement officials to identify perpetrators with higher reliability;
2. Reducing the occurrence of wrongful convictions, and
3. Enhancing the nation's ability to address homeland security needs (Table 9.2).

The report provides a fairly ambitious blueprint. If it is fully implemented it will drive significant improvements in forensic sciences in the US legal system. Many of the recommendations will apply to other countries and jurisdictions.

Expert Evidence

The way that expert witnesses present their evidence to the courts is critical. For example, a claim by the prosecution that a DNA

TABLE 9.2 Recommendations for Strengthening Forensic Science in the United States: a path forward [1].

No.  Recommendation
1. Set up independent federal entity National Institute of Forensic Science (NIFS).
2. NIFS should establish (i) Standards required for reporting results and court testimony; (ii) Develop some model laboratory reports.
3. Research undertaken to address issues of accuracy, reliability, uncertainty and validity(a) as well as automation that will enhance technologies.
4. Public forensic laboratories and facilities should move from the administrative control of law enforcement agencies or prosecutors' offices.
5. Research undertaken on human observer bias and sources of human error in forensic testing; Standard Operating Procedures to minimize errors developed.
6. NIFS to work with organizations and expert groups to develop tools for advancing measurement, validation, reliability, information sharing and proficiency testing. Protocols for forensic examinations, methods and practices should be established. Standards developed should serve as accreditation tools for laboratories and guides to certification, training and education of staff.
7. Laboratory accreditation and individual certification of forensic science professionals should become mandatory.
8. All laboratories must establish routine quality assurance and quality control procedures to ensure accuracy of analyses.
9. National code of ethics for all forensic science disciplines should be established. Individual professional societies are encouraged to incorporate this into their own professional code of ethics.
10. NIFS to work with educational institutions to improve and develop graduate education programs designed to move across organizational, programmatic and disciplinary boundaries.
11. Establish medical examiner systems to replace existing coroner systems. Extend and improve medical examiner offices; support research education and training in forensic pathology; form working group to develop and promote standards for best practices for death scene and postmortem examinations; accredit medical examiner officers; ensure all medico-legal autopsies are performed or supervised by board certified forensic pathologists.
12. NIFS to launch a new broad-based effort to achieve nationwide fingerprint data interoperability.
13. NIFS to prepare (in conjunction with the Centers for Disease Control and Prevention and the FBI) forensic scientists and crime scene investigators for their potential roles in managing and analyzing evidence from events affecting homeland security.

(a) No different to the discussion in Chapter 3 on standards required to assess the clinical utility, clinical validity and analytic validity of DNA genetic tests as well as measures of uncertainty.

MOLECULAR MEDICINE 9. Forensic Science and Medicine 281

match between the accused and blood obtained from the victim’s clothing represents a 1 in 10⁶ chance of a random event is very persuasive evidence. However, an equally crucial component to this evidence is the requirement to explain how the test was done, what are its potential drawbacks, and importantly the methods used to assess the statistical probability of a random match. Challenges for the courts include:

1. How can complex scientific data be presented to a jury, and will jurors with little knowledge of DNA fingerprinting be overawed by the science?
2. Ensuring a more conducive and non-threatening atmosphere for the expert witness in the adversarial system. The alternative is to have a diminishing pool of experts, since court-related work is time-consuming, and experts can be made to feel very uncomfortable during cross-examination. “Why get involved?” is the feeling of some who might otherwise contribute useful knowledge.

Novel ways of providing expert evidence to the jury have been proposed but are criticized within the legal profession because they are perceived as having the potential to introduce bias (that of the expert) or junk science to the courts. These are certainly valid concerns. On the other hand from a scientific viewpoint, it is frustrating to observe that molecular science can produce results in the forensic scenario that might be explicable following a reasoned debate. In the adversarial system, this can be difficult to achieve, and potentially credible evidence can be rejected.

The community is generally familiar with DNA profiling. Indeed, television shows have made the taking of DNA from the most unexpected crime scene material and then its overnight processing in a glamorous and super-efficient forensic laboratory into an everyday occurrence. Not surprisingly there is more to the process than this, and such unrealistic expectations have raised concerns, particularly regarding the way that juries might overvalue the significance of DNA evidence.

DNA PROFILING

Technology

DNA Amplification (PCR)

A number of properties make PCR ideal for forensic testing:

1. Minute amounts of evidentiary material at the crime scene will provide enough template for DNA analysis;
2. Degraded DNA can still be amplified, since only a small segment of DNA is required for primers to bind in PCR;
3. As little material is needed for a PCR, it remains possible to retest the sample in another laboratory, or at some future date;
4. Automation is available, leading to less chance of contamination and greater accuracy in fragment sizing, and
5. Formal quality assurance programs can be established.

Balancing the above are two problems:

PCR-based errors. Like any laboratory procedure, there is always the potential for errors to occur through misincorporation by the Taq polymerase enzyme, or differential amplification of DNA sequences leading to what is called allele drop-out (Chapter 3). These are not necessarily a problem with PCR in forensic practice if the test can be repeated, and modern Taq enzymes are less likely to introduce misincorporations. Does exposure to the environment with its consequent DNA-damaging effects lead to errors in PCR? Experience would now suggest this is not an issue. In other words, if amplification occurs, the end product will be relatively free of artifacts because PCR amplifies very small fragments and even damaged DNA remains a suitable template for PCR.


There remains concern when low template (or low copy number) DNA is used, i.e. the crime scene sample is very small, and even not visible to the naked eye. In this circumstance, the PCR may not be reliable, and there may be insufficient material to run the test in duplicate for confirmation. This is further exacerbated when DNA mixtures from more than one individual are present. In these circumstances some courts will not admit this evidence. There is also a greater likelihood of contamination by other DNA sources.

Contamination. The effect of contaminating DNA on PCR has already been mentioned in relation to genetic disorders, and the detection of pathogens. This problem occurs in the ideal laboratory despite high standards of practice. Potential sources of contamination in forensic DNA testing are numerous, including the laboratory, police, other parties, the crime scene and amplified products already present in the forensic laboratory.

An important goal of the Human Genome Project was technology development (Chapter 1). This has resulted in new automated methods for DNA analysis such as fluorescein-labeled DNA primers with PCR, the use of capillaries for DNA electrophoresis and the sizing of DNA fragments with lasers and computer software. These have all contributed to the forensic laboratory becoming a sophisticated DNA analysis facility ensuring the highest quality DNA fingerprinting. Ongoing standards are maintained through formal accreditation, regular quality assurance and proficiency testing. Externally-based testing programs involving the analysis of unknown samples using the same STRs can give courts an indication of a laboratory’s performance in DNA typing (Figure 9.2).

Choice of DNA Markers

DNA profiling for forensic purposes requires the same high standards as DNA testing for genetic disorders, including high specificity (to ensure that human tissue is tested as well as the correct STR) and high sensitivity. It must be multiplexed because of the small quantities of material available. Some flexibility in the methodology is needed depending on the evidence collected and the type of crime under investigation. The small amount of material available becomes more significant if the crime scene is potentially contaminated with other sources of DNA. Hence, there are now three approaches to developing a DNA profile. They utilize nuclear DNA, mitochondrial DNA or Y chromosome specific DNA (Table 9.3). Which one or which combination is the best depends on the case under investigation.

Calculating the Probability of DNA Matches

DNA profiling allows patterns (genotypes) between two samples to be compared, and then an estimate is made about the likelihood that the two are related – i.e. the probability of them being present in another individual. For this, it is essential to know the frequency in the population of the various markers that make up that genotype. This forms the basis of the statistical calculation to determine whether the two specimens are most likely derived from the same source (Figure 9.3). The product rule used to calculate this probability makes a number of assumptions, such as random mating, and whether the alleles for the multiple STR markers segregate independently of each other, i.e. there is no linkage disequilibrium. The above is a fairly simplistic overview of how probabilities are calculated and a more in depth description can be found in [3].

For some time there was considerable debate about random mating and the effects of linkage disequilibrium in ethnic or minority groups within a community. Does a single allele present for one marker represent homozygosity for that marker, an additional null allele which has not been typed, or two alleles which cannot be distinguished? These were some of the questions that had been asked by scientists and the


[Figure 9.2, panels 1 and 2. Panel 2 axes: fragment size in base pairs (52 to 67) against time (6 months).]

FIGURE 9.2 Measurement and quality assurance issues with DNA fragment sizing. Software-based automated fragment calling is now used to measure the size of an allele. This ensures both accuracy and reproducibility. (1) Depicts six different PCR amplified DNA polymorphisms that are distinguished by their sizes (and so migration) or different color when there is co-migration. Two examples of the latter are marked by ↑. The other four polymorphisms are homozygous for the same allele and so would be less helpful in DNA profiling. The small orange colored peaks are standard size markers. (2) Illustrates regular QA measurements over a period of six months for four DNA fragments in the size range 55 to 65 bp. The graph confirms the reproducibility of the DNA electrophoresis with very little drift over this period of time. This semi-quantitative assessment of measurement accuracy became possible once capillary electrophoresis replaced the traditional gel electrophoresis.

courts. Although the debate had been vigorous, ultimately it enhanced the quality of the science (Table 9.1).

Today, there are a number of DNA databases with which to make direct comparisons even in sub-populations. For example, a large city in the USA might have STR allele frequencies for its sub-populations, including Black, Caucasian, Hispanic, American Indian and Asian. Despite all the rhetoric about regional and ethnic factors and their potential effects on the frequencies of DNA markers, population geneticists now agree that the differences these make to the final calculations are minimal. For example, a probability of 1 in 10⁷ might be reduced to 1 in 10⁶ because of population differences. A ten-fold or even one hundred-fold difference in probability should not in itself be sufficient to convict or acquit a defendant.

Estimates of the likelihood of random DNA matches occurring can be very impressive. For example the odds of this being a chance event


TABLE 9.3 Three different types of DNA used in forensic testing.

Types of DNA, their properties, advantages and disadvantages

Nuclear DNA: When the FBI developed its DNA database in the late 1990s, an important component to this work was the selection and validation of nuclear DNA markers known as STRs (microsatellites) that would be used for this database. Thirteen STRs were validated. All but two are on different chromosomes and so inherited independently of each other which adds to the power of the result in terms of its ability to distinguish different profiles. None are in coding regions to avoid potential ethical issues such as the finding of a genetic abnormality in an individual without that person’s consent, and the possibility that there could be some selective effect on the STR if a coding DNA polymorphism was used. In addition, a 14th DNA marker is derived from the amelogenin gene (located on both the X and Y chromosomes) allowing sex to be determined. Today, forensic DNA testing laboratories use commercially produced kits containing various combinations of the STRs from 9 to 16. Some kits are manufactured for specific purposes, for example, one kit deals with small amounts of DNA that might be relevant in cold cases or represent minimal material at the crime scene. Kits specifically designed to detect DNA from the Y chromosome (i.e. male sex) even in a male to female mix of 1:1 000 are also available.

MtDNA: This DNA is highly variable in a region known as the D loop. Variability predominantly results from single base changes and some length polymorphisms. Advantages of mtDNA in forensic testing include: (1) Exclusive maternal origin facilitates the analysis of some family relationships; (2) Thousands of copies are present in each cell (in comparison there are only two copies of nuclear DNA per cell). Therefore, smaller crime scene samples can be tested, and (3) Hair is frequently found at the crime scene but there are multiple potential sources including victim, accused, police, bystanders etc. Individual hairs must be studied. Nuclear DNA (present as two copies per cell) is extractable from hair roots but not the shafts. However, multiple copies of mtDNA are present in the shafts, and so it is possible with PCR to type individual hairs without the necessity for roots to be present. Other useful sources of mtDNA are bones and teeth. These are robust specimens from which DNA can be extracted even though many years may have elapsed since the crime. Disadvantages of mtDNA in forensic testing include the ease with which contamination occurs compared with nuclear DNA testing. Therefore, technical demands are greater. Another concern about mtDNA is heteroplasmy, i.e. the presence of more than one mtDNA type in an individual. It has been shown, for example, that different hairs from the same individual might demonstrate different mtDNA profiles because of heteroplasmy (Chapter 2). An example of how mtDNA was used to help police identify a terrorist is given in Box 9.2.

Y chromosome STRs: The use of the Y chromosome for forensic analysis is only a fairly recent development because the appropriate STRs had to be found. For example, the X and Y chromosome share some DNA sequence and this region of DNA would not be suitable for an STR that was to be Y chromosome specific (Figure 2.4). There are now a number of validated Y chromosome STRs and these are particularly useful in sexual assault crimes because any contaminating DNA from the female (victim) does not interfere with examination of tissue (semen or blood) from the accused male. Even contaminated samples can be distinguished with this approach. A downside of Y chromosome STRs is their reduced power of discrimination as there will only be one band present because there is only one Y chromosome. So a number of Y-STRs will be needed. One commercial kit provides 17 markers and there are various standards published on what is required for suitable typing. Another application for Y chromosome STRs is in mass disasters or missing person identification when only DNA from male relatives is available.

might be less than one in a trillion (10¹²). The impact on a jury of hearing a figure like one in a trillion compared to say a one in a thousand would be significant. Nevertheless, as will be discussed below, some estimates for random matches are now being questioned. One should also note that probabilities are for random matches and so not relevant for related individuals. The extreme here would be identical twins who share the same DNA profile. This has already produced dilemmas, including an individual accused of theft in the UK and, in Malaysia, a man suspected of drug smuggling. The latter was sentenced to death but released because it could not be proven which of the twins was the actual culprit.
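The arithmetic behind the product rule can be sketched in a few lines of Python. This is a minimal illustration only, not a forensic tool. It uses the allele frequencies listed for the victim in Figure 9.3 and treats them as fractions (an assumption, chosen because it reproduces the figure of 1 in 10⁷ quoted in the caption). The function names are ours. The last function applies the same arithmetic to a whole database, using the Arizona figures quoted later in this chapter (65 493 profiles and a random match probability of 1 in 754 million), and assumes unrelated individuals and independent markers.

```python
from math import comb, prod

# Allele frequencies for the victim's eight STRs (A-H) as listed in Figure 9.3.
# Assumption: the values are fractions (0.20 means 20%), which is what
# reproduces the caption's figure of 1 in 10^7.
VICTIM_FREQUENCIES = {
    "A": 0.20, "B": 0.50, "C": 0.50, "D": 0.05,
    "E": 0.01, "F": 0.40, "G": 0.10, "H": 0.10,
}

# Markers at which each bloodstain agrees with the victim (Figure 9.3).
SUSPECT_1_MATCHES = set("ABCDEFGH")
SUSPECT_2_MATCHES = set("ABDG")  # C, E, F and H are marked X (no match)


def is_excluded(matching_markers, reference=VICTIM_FREQUENCIES):
    """A mismatch at any typed marker is enough to exclude a sample."""
    return any(marker not in matching_markers for marker in reference)


def random_match_probability(frequencies):
    """Product rule: multiply the frequencies, assuming independent markers."""
    return prod(frequencies.values())


def expected_matching_pairs(n_profiles, match_probability):
    """Naive expected number of coincidentally matching pairs in a database."""
    return comb(n_profiles, 2) * match_probability


if __name__ == "__main__":
    p = random_match_probability(VICTIM_FREQUENCIES)
    print(f"Suspect #2 excluded: {is_excluded(SUSPECT_2_MATCHES)}")
    print(f"Suspect #1 excluded: {is_excluded(SUSPECT_1_MATCHES)}")
    print(f"Random match probability: about 1 in {1 / p:,.0f}")
    pairs = expected_matching_pairs(65_493, 1 / 754e6)
    print(f"Expected coincidental pairs in the database: {pairs:.1f}")
```

Under these simplifying assumptions only about three coincidental pairs would be expected in a database of that size. The sketch ignores relatives, population substructure and partial profiles, all of which would raise the expected number.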


Judicial Perspective

Courts can take different views on the admissibility of DNA evidence. In some jurisdictions it cannot be used to convict on its own, or at best it can only be concluded that DNA taken from the crime scene might be the same as that from the accused. Other courts allow a DNA profile as the sole basis for conviction, since it is considered to be more reliable than some forms of evidence, particularly visual identification. A contemporary view of DNA evidence from a judge is given in [4]. Many relevant cases are highlighted, with one being considered by the High Court of Australia to determine whether a jury could be prejudiced if the probability of a mtDNA profile (mtDNA because a hair shaft from the victim was used) in a murder trial was given as a percentage, i.e. 99.9% of the population would not have this profile, rather than presented as a frequency of 1 in 1 600 as the probability of a random match. It will be interesting to see how this is decided.

Suspect #1    Victim           Suspect #2
DNA STR       DNA STR   %      DNA STR
A             A         0.20   A
B             B         0.50   B
C             C         0.50   X
D             D         0.05   D
E             E         0.01   X
F             F         0.40   X
G             G         0.10   G
H             H         0.10   X

FIGURE 9.3 Calculating the statistical chance of a random match using the product rule in DNA fingerprinting. Following a crime, blood is found on the clothes of two suspects. DNA is prepared from the victim, and the profile is then compared to DNA from the blood on the suspects’ clothing. The results of eight STRs (A–H) are given for DNA from the victim (center). For simplicity, only one allele is used although each STR can have up to two alleles. The frequency (%) for each allele in the population is also given. Even without any calculations, suspect #2 can be excluded since at four alleles (marked with an X) the DNA profile taken from the blood on his clothes has different fragment sizes than blood from the victim. In contrast, the DNA profile from the blood on the clothes from suspect #1 has the identical STR profile. The chance that this is a random event is calculated by multiplying together the frequencies of all the allelic components making up the profile (called the product rule). In this case, the chance that the DNA from the victim and the DNA from the blood on suspect #1 is a random event is 1 in 10⁷. Using more STRs would increase further the probability that the two samples were related although the statistical chance that this is a random match is already extremely low.

Crime Scene

At the crime scene there may be stains (blood or semen), tissues (skin, hair under the victim’s fingernails) or objects, including weapons. DNA from these evidentiary samples is used to build a profile which will then be compared to that from suspect(s), evidence linked to the suspect(s) or national DNA databases. Apart from the person(s) of interest, there are a number of sources for DNA in the crime scene, including the victim, third parties, or the environment. This potentially complex mix is illustrated by a case of rape, where DNA can come from:

1. The victim in the form of blood, body tissues or secretions and bacteria;
2. One or more assailants;
3. Semen from earlier consensual intercourse;
4. Animals or bacteria from the crime scene, and
5. A possibility that the source of DNA was planted by a third party including the victim or police.


DNA profiling is still possible in such a scenario. DNA from microorganisms or other animals does not usually cross-hybridize with human-specific DNA. DNA from sperm is more robust. Therefore, laboratory protocols can be designed to utilize this property and enhance its isolation at the expense of DNA from other tissues (Table 9.3). The problem of multiple human DNA sources can also be addressed. First, the victim’s DNA profile is obtained and subtracted from the overall profile. DNA contributed from an innocent third party can be treated in the same way. Next, DNA patterns from evidentiary samples are compared to those obtained from potential assailant(s). From these comparisons it becomes possible to get a better understanding of how multiple DNA sources relate to crime scene DNA.

Blood from the victim may have spilled or splashed onto an assailant’s clothing or the crime scene, for example a car. DNA isolated from the blood spots will subsequently provide important evidence connecting the victim with the individual wearing the clothing or the crime scene. An interesting example of an unusual source of DNA was the finding in 2007 by Finnish police of a mosquito in a stolen car. The mosquito appeared to be engorged, and so it was taken as evidence in case it had bitten the person who stole the car. This turned out to be the case, as DNA extracted from the mosquito was matched to a profile on the police database. This profile led to a person of interest who subsequently confessed to the crime. As DNA profiling becomes more automated, reliable and cheaper, it is being used for less serious crimes, particularly house breaking and robbery.

No matter how good the DNA fingerprint is, its value is ultimately dependent on a well established chain-of-custody. This is essential to avoid the criticism that the police or others had tampered with the evidence, or planted false

BOX 9.2 THE BOMBING OF THE AUSTRALIAN EMBASSY IN INDONESIA.

Following any terrorist attack, the law enforcement agencies must identify the perpetrator(s). Some organizations claim responsibility but this is less likely to occur in Southeast Asia making the task of the police more difficult. On September 9, 2004, the Australian embassy in Jakarta (Indonesia) was bombed in a suicide attack. The bomb was massive and very few pieces of the perpetrator remained. Ten other people were killed and many were injured [2]. The forensic investigators assumed that the perpetrator’s remains were most likely to be the furthest from the blast (the deceased victims were about 10 meters away) and so started searching for and testing remains located at the furthest points. An initial mtDNA screen was obtained from what was assumed to be tissue samples from the perpetrator. This was matched against mtDNA profiles taken from the mothers of four known terrorist suspects. One of the four mothers had a similar profile and so a suspect was tentatively identified. The next step was to take different tissue fragments from the presumed perpetrator and get a more comprehensive DNA profile using 13 CODIS STRs. Three tissues gave identical profiles confirming that they were the same person. The identity of the bomber was finally determined by using the CODIS STRs and matching them against the mother identified earlier through mtDNA testing and her husband. This confirmed the parents of the perpetrator and so his identity.
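The final step in Box 9.2, matching a nuclear STR profile against both putative parents, rests on a simple rule: at each autosomal STR a child carries one allele from the mother and one from the father. The sketch below illustrates only that rule. The locus names are CODIS core loci, but all genotypes are invented for illustration and the function names are ours. It ignores mutation and null alleles, and it does not calculate the likelihood ratios on which real kinship analysis relies.

```python
# Hypothetical genotypes: locus name -> pair of STR allele repeat numbers.
# The allele values are invented for illustration only.
MOTHER = {"D3S1358": (15, 17), "vWA": (16, 18), "FGA": (21, 24)}
FATHER = {"D3S1358": (14, 16), "vWA": (17, 17), "FGA": (22, 23)}
REMAINS = {"D3S1358": (14, 17), "vWA": (17, 18), "FGA": (23, 24)}
UNRELATED = {"D3S1358": (18, 18), "vWA": (14, 15), "FGA": (19, 20)}


def locus_consistent(child, mother, father):
    """True if one child allele can come from each parent at this locus."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def inconsistent_loci(child, mother, father):
    """Return the loci at which the child cannot be this couple's offspring."""
    return [
        locus
        for locus in child
        if not locus_consistent(child[locus], mother[locus], father[locus])
    ]


if __name__ == "__main__":
    print("Remains vs parents:", inconsistent_loci(REMAINS, MOTHER, FATHER))
    print("Unrelated vs parents:", inconsistent_loci(UNRELATED, MOTHER, FATHER))
```

An empty list means that the profile is compatible with the couple at every locus tested. It does not by itself establish identity, which is why a large marker set such as the 13 CODIS STRs is used.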

evidence. Anything less will invalidate what might otherwise be very persuasive DNA evidence. Doubt about the chain-of-custody comprised an important component of the defense case in the O J Simpson murder trial with police being accused of planting key evidence linking him to the victims and the crime scene (Box 9.1).

DNA Databases

The UK’s National DNA Database was the first to be created and is probably the largest internationally, based on that country’s population. Over 5.6 million individuals’ profiles and 400 786 crime scene profiles are stored. The US National DNA Index System (NDIS) held nearly 10 million Offender DNA profiles and 384 604 000 Crime Scene DNA profiles in 2011. Related is the FBI-developed CODIS (Combined DNA Index System) which is software built to allow law enforcement crime laboratories to compare DNA profiles stored in local, state and the national DNA databases. These databases hold profiles from:

1. Convicted offenders;
2. Unsolved crime scene evidence;
3. Missing persons containing DNA information from:
   a. Relatives of missing persons;
   b. Unidentified human remains;
4. Disaster victim identification.

CODIS has also had an indirect benefit on how DNA fingerprints are obtained by requiring the validation of a set of STRs that are now used with greater confidence [5].

Concerns about DNA databases

Ultimately, the value of DNA databases needs to balance civil liberties with law and order issues. Hence, despite the success stories associated with the various forensic DNA databases, the ethical and privacy issues emerging from them are considerable. A controversial issue is the collection of DNA from individuals suspected of a crime (with the seriousness varying between jurisdictions) rather than the earlier emphasis on getting DNA from those convicted of a serious crime. In some cases the DNA profile from crime victims has been stored and could be used at some future date to convict them of a crime. As discussed below under Partial DNA Matches, DNA from relatives can also be used to identify possible criminals.

Another concern is how long DNA profiles are kept. Some remain on databases indefinitely, while others are removed at various times after an individual is released from jail. In one noteworthy case in 2008, the UK government was found by the European Court of Human Rights to have acted illegally by keeping the DNA profiles of two British citizens despite no conviction being recorded when the police dropped charges.

Access to various DNA databases is restricted to law enforcement agencies for privacy and security reasons. However, this has raised concerns, particularly in relation to how the odds are calculated for unrelated people who share the same DNA profiles. Because the databases are closed to external review there is no independent way of checking these calculations. This issue has emerged in the USA, and a claim has been made that getting a random exact match with nine CODIS STRs is more common than might be expected. The official calculation for a random match if nine CODIS STRs are used is 1 in 754 million in Caucasians. Nevertheless, it was reported that as many as 90 random matches had been made in the relatively small Arizona DNA database, which has 65 493 DNA profiles [3]! Although the study has provoked criticism it raises doubt in those jurisdictions where DNA matches are not made on the more robust number of 13 CODIS STRs but only nine markers are used. Until the issue of odds and their accuracy is addressed there will continue to be an uneasy feeling in the scientific and lay communities.

As the name implies, DNA databases hold information not DNA samples. New knowledge

or technologies will not necessarily allow the DNA database to be updated, and hence the importance of getting it right the first time. This assumes that after DNA samples are taken from individuals they are destroyed and only the profiles are kept. This may not be true in all jurisdictions, which highlights another potential ethical dilemma if DNA samples are kept as there is the risk of misuse at some future date.

Partial DNA Matches

An interesting but controversial use of DNA databases is the obtaining of partial DNA matches (also called familial DNA searches). This does not implicate a particular individual which would require a perfect match, but because of the genetic association would bring relatives to the attention of the police. Some argue that criminals, particularly those involved in very serious crimes, have lost their right to genetic privacy, yet no one would disagree that their genetic relatives still retain this right. A website from the US District Attorney in Denver provides a list of cases that would not have been solved without the partial DNA match approach [6]. The list is impressive and involves serious crimes that might have remained unsolved. Ultimately the benefits coming from this approach need to be balanced with the damage to innocent parties who just happen to share parts of a DNA profile with the perpetrator. This will continue to be an ongoing issue as more jurisdictions are seen to be moving to this type of trawling of DNA databases.

Disaster Victim Identification

The 2001 9/11 terrorist attack on New York’s World Trade Center left 2 792 deceased victims. Three years later, a tsunami struck a number of countries bordering the Indian Ocean leaving around 217 000 dead. Disaster victim identification (DVI) in these two circumstances was different, because in the former there was severe tissue fragmentation and after the tsunami the geographical dispersion of victims became an issue. The large numbers of victims made DVI considerably more complex than what had been experienced previously, after incidents such as plane crashes (Box 9.3).

BOX 9.3 DNA FINGERPRINTING IN THE CASE OF A MASS DISASTER [7].

Swissair Flight 111 crashed on the 2 September 1998 with the loss of 229 lives (215 passengers and 14 crew). Bodies were dismembered when the plane fell, 4 km off the US coast at the beginning of its flight from New York to Geneva. The grim task ahead was to identify individuals, put together the human remains (there were 1 277 crash scene samples) and perhaps determine cause of death, since a charred body might suggest where an explosion had occurred. Various personal effects (toothbrush, combs, hair brushes etc.) were also found, which could be used to extract DNA for matching to family samples. Over 300 living family members gave their DNA for comparative analysis and within 3.5 months, the forensic laboratory had unequivocally identified the 229 victims.

Two months after the World Trade Center terror attack, American Airlines Flight 587 crashed in New York, killing 265 passengers, crew and five victims on the ground. With the lessons learnt from the World Trade Center as well as infrastructure in place, all bodies had been identified within one month.


In 2006, a report, “Lessons learned from 9/11: DNA identification in mass fatality incidents”, was published [8]. It made wide ranging recommendations on what had been learnt from 9/11 and what was needed to respond to any future mass disaster. Regarding DNA profiling, it noted that DNA typing for a mass disaster is essentially the same process as dealing with missing persons with the following additional requirements:

1. Processes in place might need to accommodate significant changes in numbers;
2. A decision of whether to identify all the victims or all the remains needs to be made. If the former, DNA analysis would stop once the last victim was identified;
3. DNA-based identification may become a second choice to the use of visual identifiers, dental and traditional fingerprinting because of time constraints. Nevertheless, DNA evidence should be taken in case the other approaches fail;
4. Early planning for the processing of reference samples is essential; i.e. DNA from the victims’ personal effects and kinship DNA samples from relatives need to be made available;
5. Outsourcing of the DNA testing is likely to be needed;
6. Ensuring intact chain-of-custody and appropriate clerical documentation require prior planning, and
7. Project management issues need to be identified.

Because of the extreme heat and fragmentation of bodies, only about 1 585 of the 2 792 9/11 victims had been identified by 2005. DVI during the tsunami involved standard approaches unless excluded by decomposition. DNA testing was made possible by taking samples from buccal mucosa, hairs, muscle and when the body was decomposed, from ribs, teeth and the femur. Due to decomposition in salt water and high temperatures, degradation of DNA made identification more difficult, accounting for the overall disappointing success rate of less than 10%.

Cold Cases

DNA profiling can be used in old or unsolved crimes or to identify human remains. The availability of parental DNA samples might allow identification of a body when conventional means (physical appearances, dermatoglyphic fingerprints, dental charts) have been unsuccessful. Dissimilar DNA profiles will exclude a relationship. Teeth are important evidentiary material in forensic cases, since they are more resistant to postmortem degradation and extreme environmental conditions. Teeth are also easy to transport and serve as a good source of DNA. Comparisons of antemortem dental records with skeletal remains have long provided a useful means of identifying individuals, even in a mass grave. In affluent societies, dental records may be decisive in determining the identity of individual victims. However, in less affluent communities, which are more likely to be involved in human rights abuses associated with mass murder, dental records are unlikely to be available. In this situation, the only option for identification might be DNA analysis.

There are many examples in the media of crimes that have been solved decades after they were committed because evidentiary material has been re-examined using DNA profiling for the first time, or more sensitive DNA techniques have become available. For cold crimes to be solved, there needs to be cooperation between law enforcement agencies, the forensic laboratory and a centralized DNA database. Statutes of limitation that were imposed because of the knowledge that with time witness accounts may no longer be accurate may need to be reassessed because DNA testing can still provide answers after many decades. A 2002 National Institute of Justice report

provides some practical considerations about DNA profiling to solve cold cases [9].

Post-Conviction DNA Testing

DNA profiling can be used by the defense to exclude a match or appeal a conviction. An accused who is on trial because of evidence obtained from an eyewitness may find that DNA testing is the only means by which innocence can be proven. DNA fingerprinting will save time in police investigations since suspects can be quickly excluded. Despite being acquitted of a crime, an individual can suffer humiliation and possible stigmatization following arrest and trial. Wrongful arrest can be avoided by DNA testing. Two experienced forensic laboratories (the US Federal Bureau of Investigation and the British Home Office) have reported that DNA testing has allowed suspects to be excluded in approximately 20–25% of cases.

The Innocence Project reports that there have been 258 post-conviction DNA exonerations in the USA. Of these, 17 were sentenced to death before DNA proved their innocence! The average sentence served by those exonerated was 13 years, and 70% were members of minority groups. In about 40% of cases, the actual perpetrator was identified by DNA testing [10]. Causes for errors identified by this project include social determinants like poverty and race, and criminal justice issues, such as: the inclusion of incorrect eyewitness testimony; poor, illegal or inappropriate forensic testing; overzealous police or prosecutors; and inept defense lawyers. Since these cannot easily be corrected or prevented, the importance of post-conviction DNA testing to identify the innocent cannot be overemphasized. Another project, the Justice Project, published in 2008 a report, “Improving access to post-conviction DNA testing”, which makes six practical and important recommendations for how to expand post-conviction DNA testing. The original document is no longer available but the recommendations are worth summarizing (Table 9.4) [11].

TABLE 9.4 The Justice Project’s recommendations for expanding post-conviction DNA testing [11].

Recommendation
1. Requires the preservation of biological evidence throughout a defendant’s sentence and devises standards regarding custody of evidence.
2. Ensures that all inmates with a DNA-based innocence claim may petition for DNA testing at any time and without regard to plea, confession, self-implication, the nature of the crime, or previously unfavorable test results.
3. Requires judges to grant post-conviction testing petitions when testing may produce new material evidence that raises a reasonable probability of the petitioner’s innocence or reduced culpability.
4. Ensures that practitioners have access to objective and reliable forensic analysis at independent laboratories, subject to judicial approval.
5. Provides counsel and covers the cost of post-conviction DNA testing in cases where a petitioner is indigent.
6. Standardizes post-testing procedures for cases that produce testing results favorable to a petitioner.

Relationship Testing

Relationship testing is used for a number of purposes, including the elucidation of paternity or family membership, usually in the context of immigration.

Paternity Testing

This can provoke emotive debate and controversy, particularly with consent issues. Who gives consent is difficult, although ideally it should include all parties involved, i.e. the putative father, mother and child. There is general agreement that whatever is undertaken has to make the child’s interest the paramount concern, and the results must be obtained in a way that is acceptable to the courts. In the United Kingdom, a new law covers DNA theft

when DNA genetic testing is undertaken without consent (Chapter 10). This approach does address the issue of consent in the paternity scenario, although it would not apply if there is an appropriate court order.

Two paternity testing scenarios can be considered:

1. Straightforward trio cases – mother, child and alleged father, and
2. More complex cases, for example child and alleged father but no mother.

In paternity testing, the STRs provide high sensitivity (few false negatives) but low specificity (false positives can occur because unrelated people can share STR alleles). However, this is less of an issue in the trio case because, apart from a random match, the problems of mistyping through silent (null) alleles or allele drop-out are less relevant as all three individuals are being examined and so silent alleles will be detected (Figure 3.6). The chance of a random match is never completely excluded, but the chance can be reduced further, if necessary, by using a larger number of STRs or other DNA polymorphisms, such as mtDNA or Y-chromosome-specific STRs. Two-person (motherless) paternity testing cases are more of a dilemma because it is not known which alleles in the child have come from the mother, meaning assumptions need to be made, and more complex statistical analyses are required. The risks with a missing parent from mistyping due to null alleles or allele drop-out now become a real issue with the STRs.

Immigration

This application of DNA testing has been received with mixed feelings. When first used in the UK it showed that a distant family member seeking to immigrate had been incorrectly denied entry. This finding was based on a DNA test that confirmed the family relationship. More recently, the potential to use a broader screening approach to immigration in the same country has been less well received. DNA tests are also expensive and so not affordable by all. They may not always be helpful. In cases involving close relationships, even a large battery of STRs may not be discriminatory enough to prove conclusively whether a relationship is, for example, father-son or uncle-nephew. As for the motherless paternity case, other DNA markers can be tried but these particular circumstances remain problematic and will, in the longer term, require more sophisticated DNA fingerprints.

Molecular Autopsy

Another application of DNA analysis is determining a cause of death. This has been called the molecular autopsy. It has been researched in relation to sudden cardiac death, particularly in those under 35 years of age, and in children. In these circumstances, the usual causes of heart disease, particularly coronary artery disease, are less likely to be able to explain the sudden death. More likely causes are inherited disorders leading to cardiomyopathies or conduction defects. However, the traditional postmortem examination may not find anything structurally abnormal and toxicology may be normal, leading to a presumptive cause of death being given as cardiac arrhythmia. This can occur in 10–30% of cases [12]. There are two outstanding issues in this circumstance:

1. Is the diagnosis correct?
2. If the deceased had a genetic disorder, are there ongoing risks for living family members?

A study looking at 49 cases of sudden and unexplained death, including some where normal coronial postmortems had been performed, considered the above by looking for mutations in genes causing the autosomal dominant Long QT syndrome as well as a second but autosomal recessive condition CPVT


BOX 9.4 LONG QT SYNDROME (LQTS).

This is an autosomal dominant genetic disorder caused by mutations in 13 genes that encode cardiac ion channel subunits or proteins involved in modulating ionic currents. LQTS is estimated to have a prevalence of about 1 in 2 500 and is characterized by a long QT interval on the ECG and syncopal episodes that could result in cardiac arrest and sudden death secondary to ventricular fibrillation. The three commonest forms of LQTS are LQT1, LQT2 and LQT3 and involve the KCNQ1, KCNH2 and SCN5A genes. Arrhythmias can be precipitated by a number of factors, including emotional or physical stress and even rest or sleep. Strenuous activity such as competitive sport is another risk factor. Treatment includes the avoidance of drugs that prolong the QT interval and the administration of β blockers, as these reduce the risk or severity of serious arrhythmias. Left cardiac sympathetic denervation and cardiac pacing can also be used. Implantable defibrillators have been inserted in those who have previously had a cardiac arrest or have particular clinical indications. DNA genetic testing which looks for mutations in these genes can be used to confirm a clinical diagnosis or to assess at-risk family members for a known mutation, most of which are family-specific. This approach to diagnosis is successful in about 70–80% of cases [13].
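The family-testing logic that follows from autosomal dominant inheritance can be illustrated with a little arithmetic. The sketch below is not from the original text. It assumes simple Mendelian transmission from one heterozygous parent, so that each child independently inherits the family-specific mutation with probability 1/2, and it ignores de novo mutations and incomplete penetrance. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Assumption: autosomal dominant transmission from one heterozygous parent,
# so each child inherits the family mutation with probability 1/2.
TRANSMISSION_PROB = Fraction(1, 2)


def expected_carriers(n_children: int) -> Fraction:
    """Expected number of mutation carriers among n children of a carrier."""
    return n_children * TRANSMISSION_PROB


def prob_at_least_one_carrier(n_children: int) -> Fraction:
    """Probability that at least one of n children carries the mutation."""
    return 1 - (1 - TRANSMISSION_PROB) ** n_children


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, expected_carriers(n), prob_at_least_one_carrier(n))
```

Under these assumptions, testing four children of a carrier would on average find two who need follow-up and two who can be reassured, although any individual family can differ from that average.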

(catecholaminergic polymorphic ventricular tachycardia) (Box 9.4) [12]. It showed:

1. In about a third of these deaths, there were mutations in the relevant cardiac channelopathy genes, and
2. In about half the families tested all but one of the mutations was inherited and so family members were at risk.

The above would have an important preventive implication by recognizing an inherited cause of sudden cardiac death and so providing an opportunity through DNA testing to study at-risk family members leading to:

1. Exclusion of risk for 50% of those tested for autosomal dominant conditions. These individuals need no follow-up because they do not have the family-specific mutation found in the deceased, and
2. The 50% carrying the family mutation will need appropriate counseling and careful follow-up, including avoidance of medications or activities that might make their condition worse and more prompt and effective treatment should clinical features of the disease develop (Box 9.4).

The potential value of the molecular autopsy has also been considered in cases of unexpected drowning as well as sudden unexplained deaths from epilepsy.

Non-medical applications for forensic DNA testing include testing for tainted food, identifying endangered species, veterinary forensic practice and determining the origin of prohibited substances (Table 9.5).

BIOTERROR

The sequencing of model organisms in the Human Genome Project (particularly bacteria and viruses in the context of this chapter) was undertaken to provide insight into the human genome. However, an organism’s DNA or RNA


TABLE 9.5 Other applications for DNA profiling.

Application | Details

Food industry | Determining what is actually in pet food or what comprises meat for human consumption is possible through the identification of DNA markers. The crisis facing the beef industry after the bovine spongiform encephalopathy (BSE) outbreak in the UK demonstrated how vulnerable the industry is if tainted products continue to be sold. Meat from potentially endangered species, including whales and non-human primates such as gorillas, can be traced through DNA profiling [14]. Restaurants are also taking advantage of DNA technology with the promise to customers that the meat supplied can be traced from the farm to the restaurant via DNA testing.

Endangered species | The study of endangered species relies on accurate taxonomic classification. This is being undertaken in studies of tigers whose numbers declined between 1920 and 1970 because of hunting, loss of habitat, decline in prey and other factors [15]. The last Caspian tiger died in 1970. Today, efforts at determining relatedness with other tiger species such as the Malay tiger can be pursued through mtDNA testing or nuclear DNA testing, with the former possible using samples from extinct species.

Veterinary forensic practice | The investigation of cruelty to animals is often suboptimal. There are now moves to develop new courses in which skills in forensic medicine and science can be transferred from human practice to veterinary practice. There are bizarre cases reported where DNA testing was used to confirm that a dog rather than a human had sexually abused a child [16].

Prohibited substances | The plant Cannabis sativa has two uses: (1) Production of hemp fiber for rope and fabric (plant stems), food and oil (plant seeds), and (2) An intoxicant (plant flowers and leaves). Different types of Cannabis are grown depending on whether fiber or the intoxicant (Cannabis or marijuana) is required. DNA fingerprinting using different polymorphisms is being studied to provide a Cannabis sativa gene profile that will distinguish fiber and intoxicant plant varieties. Another goal is to characterize the plants genetically so that illegally grown or seized material can be traced back to the original sources, or local products can be distinguished from imported ones [17].

sequence can be used for understanding pathogenesis and developing rapid diagnostics, both of which are essential countermeasures to bioterror. Chemicals and infectious agents in war have been recognized for over a thousand years. However, bioterrorism is relatively new with the first well-documented case occurring in 1984 (Table 9.6).

Two pathogens of particular relevance in bioterror are anthrax and smallpox. DNA genotyping, and subsequently whole genome sequencing, proved its value in the 2001 US anthrax cases (Box 9.5). Today, a pathogen such as anthrax would be detected rapidly by PCR-based methods. Smallpox is another serious infection since routine vaccination was stopped in the USA in the early 1970s, leaving many unvaccinated and so vulnerable targets in this and other countries. In Asia, case fatality rates of around 30% were observed during epidemics, and there is no known treatment. As part of its biological warfare program, the former Soviet Union produced smallpox, anthrax and other pathogens, and it remains a concern that some of these organisms could fall into the hands of terrorists.

Biological warfare, bioterror or the criminal use of microorganisms or their toxins is possible in a number of ways, including the contamination of food or water supplies, infection of animals or even insects. A more serious attack would involve aerosols containing the


TABLE 9.6 History of biological warfare and bioterror [18].

Time | Agent | Effect

Greeks, Romans and Tartars | Bodies of humans and animals were used to poison drinking water or spread infections. | Plague outbreak in the 14th century attributable to the Tartars catapulting the bodies of plague victims over the walls into the city of Caffa.

17th and 18th centuries | British and French soldiers used smallpox via blankets. | Smallpox used to kill American Indians.

World War I | German plan to use glanders to infect horses (and then humans) in USA. | Not implemented.

World War I | Neurotoxic chemicals used. | Estimated to cause 1 million casualties.

World War II | Japanese use of anthrax, cholera and plague. | Used against the Chinese.

Cold War 1970s–80s | Accident in Soviet Union weapons laboratory. | Outbreak of inhalational anthrax.

1984 USA | Religious cult in Oregon USA spread Salmonella to prevent voting in an election. | 750 cases of food poisoning with a delay of over a year to determine the cause.

1980–88 Iran, Iraq | Chemical warfare using mustard and other gases. | Difficult to confirm number of casualties but one estimate is 10 000 killed by chemical weapons during Iran – Iraq war or the Kurds in Iraq.

1993–1995 Japan | Japanese cult releases sarin, botulinum toxin and anthrax. | 5 000 injured and 12 deaths due to sarin in Tokyo subway. Seven deaths from sarin in Matsumoto.

2001 USA | Anthrax dispersed by mail. | Five deaths due to anthrax. Criminal investigation demonstrates the earliest applications of microbial forensics (see text, Box 9.5).

pathogens because there is now the risk for infecting a very large number of people. The length of the incubation period is also a consideration since it would, to some degree, influence the number infected before containment or treatment was initiated.

Microbial Forensics

Microbial forensics is new. It is defined as a scientific discipline dedicated to analyzing evidence from a bioterrorism act, biocrime, hoax or inadvertent microorganism or toxin release, for the purpose of identifying those responsible for the crime [19,23]. Many governments have developed plans for dealing with infectious agents in bioterror. Since part of the bioterror agenda is to inflict fear, panic and economic chaos in addition to the actual morbidity and mortality, the traditional public health approach to an infectious disease crisis is inadequate.

The attribution (who did it?) of biological attacks is not easy, but DNA-based technology is now a powerful approach which complements the more traditional chemical and physical analyses. A DNA or RNA sequence allows early identification of the infectious agent, and genetic fingerprints can provide insight into their possible sources (Box 9.5). It is also important to look for changes which may suggest whether the organism has been modified


BOX 9.5 BIOTERRORISM USING ANTHRAX [19].

Anthrax is caused by Bacillus anthracis, a gram positive, spore-forming organism. It is usually acquired by humans through exposure to infected animal products or contaminated dust. The major forms of anthrax are cutaneous (95% of cases, with a mortality of about 20%) and pulmonary (100% mortality if not treated before symptoms develop). One week following the 11 September attack on the World Trade Center, letters containing dry powdered spores of Bacillus anthracis were mailed to addresses in New York and Florida. Three weeks later, similar letters were sent to two US senators in Washington DC. Four letters were recovered, although more were suspected of being posted based on the distribution of infections that would ultimately occur. The contaminated letters resulted in 22 anthrax cases (half involving the skin and half the lungs) and five deaths. These were examples of bioterrorism, since all bacteria came from the one source (the Ames strain), as determined by DNA typing. However, this strain had been used for research in a number of US and overseas laboratories, so this knowledge did not help to identify the source of anthrax or the perpetrator. Spores from the bacterium were characterized genetically and shown to have distinct patterns that allowed additional tracing of its origin. Subsequently this was identified as the US Army Medical Research Institute of Infectious Diseases (USAMRIID) in Maryland, and in particular flask RMR-1029, created in 1997 by the scientist Dr Bruce Ivins, although over 100 individuals could have had access to this culture. The FBI focused on two scientists at this facility, with one subsequently winning a violation of privacy lawsuit because the FBI could not prove its case. The second person of interest (Ivins) committed suicide before he was indicted, so his innocence or guilt could not be tested in court. In February 2010, the US Department of Justice concluded the investigation (code named Amerithrax) with a statement saying that the evidence established the late Dr B Ivins was responsible. Just prior to this, the FBI had asked the US National Academy of Sciences to review the scientific and technical methods used by the FBI. This was in response to disquiet about the investigation and the FBI’s view that the science was important to this and future cases. The NAS report was issued in February 2011, and it questioned some conclusions from the genetic analysis, but noted that there were significant technology limitations in 2001 and these are likely to be overcome by new tools including whole genome sequencing [20]. Later that year, a scientific report was published using whole genome sequencing, and the findings were consistent with the FBI data and conclusions on the source of the anthrax. The report noted that B. anthracis, unlike other bacteria, is genetically homogeneous with reduced genetic variability related to spore formation which can remain dormant for many years. Another bacterium which accumulated mutations rapidly might have been more difficult to study in this way [21]. The latter is an interesting observation, particularly in light of another study which showed that phylogenetic analysis of HIV-1 DNA sequences was a powerful tool in a number of court cases involving health professionals thought to have infected their patients. HIV-1 has a very dynamic genome because of its high mutation, recombination and replication rates. Yet DNA-based evidence was used to implicate (and exclude) health professionals (particularly MDs and dentists) as sources of infection [22]. Microbial forensics is an important new initiative using molecular medicine tools and will continue to evolve and produce novel findings.
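The idea of narrowing down a source by comparing genetic patterns can be sketched in a few lines. The example below is only an illustration of the general principle and is not the method used in the Amerithrax investigation: it assumes each sample is summarized as a set of variant markers, and it keeps those reference stocks that carry every marker seen in the evidence. The marker names, stock names and the function `matching_sources` are all hypothetical.

```python
from typing import Dict, List, Set


def matching_sources(evidence: Set[str], repository: Dict[str, Set[str]]) -> List[str]:
    """Return the reference stocks that carry every marker found in the evidence."""
    return sorted(
        name for name, markers in repository.items() if evidence <= markers
    )


if __name__ == "__main__":
    # Hypothetical marker sets; real casework would use validated assays.
    evidence_markers = {"var_A", "var_B", "var_C"}
    repository = {
        "stock_1": {"var_A"},
        "stock_2": {"var_A", "var_B", "var_C"},
        "stock_3": set(),
        "stock_4": {"var_A", "var_B", "var_C", "var_D"},
    }
    print(matching_sources(evidence_markers, repository))
```

A match of this kind only shortlists possible sources. As the box notes, many people may have had access to a given culture, so the genetic result has to be combined with conventional investigation.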

to make it more pathogenic, or has been weaponized to enhance its spread. The process of attribution has three phases:

1. Identifying the infectious agent in an unusual outbreak of disease;
2. Characterizing the outbreak as either natural or deliberate in origin, and
3. If the event is intentional, finding out who or what organization was responsible [19].

The importance of attribution cannot be overstated, since it will be the key to preventing further attacks. On the other hand, the inappropriate attribution of a bioterror threat or attack can lead to disturbing political or human consequences as summarized in [19] and Box 9.5.

Priorities that must be addressed in microbial forensics include:

1. Development of rapid DNA or proteomic-based diagnostic strategies for infectious agents. A DNA chip containing information about infectious agents, or ability to sequence rapidly the whole organism and/or a protein chip able to detect toxins will comprise the front line for rapid diagnostics;
2. Understanding pathogenesis, including host-pathogen interactions and knowledge of the organisms’ transcriptomes, will be invaluable to this goal. Fortunately, a number of bacterial pathogens have been completely sequenced and will provide a resource to move ahead in understanding where genes are by in silico methods, and from this their function, and
3. The final challenge will be how to treat or manage bioterror related outbreaks.

Vaccines have always played a key role in the control of infectious diseases, but these may not be enough in the bioterror scenario, where the infectious agent has the potential to be distributed widely and acutely or indolently and not recognized for some time. New therapeutic approaches including cellular therapies may be needed.

Scientific Research

The risks of bioterror have started an interesting debate around the possibility for dual-use research by life scientists. This means the use of research data in the life sciences to develop biological weapons, bioterrorism or bio-warfare [24]. In particular, the availability of DNA sequences might provide information that allows terrorists to genetically engineer their organism to make it more virulent, or harder to detect. In this environment it is likely that research involving potential bioterror weapons will be monitored and censored by governments, because of its security risk. The US National Institutes of Health has formed a National Science Advisory Board for Biosecurity which identified characteristics of life science research that would be of concern in relation to biological agents or toxins. These include:

1. Enhancing their harmful consequences;
2. Disrupting immunity or effectiveness of immunization without clinical and/or agricultural justification;
3. Changing the following properties – resistance to prophylactic or treatment options, enhanced evasion of detection methods;
4. Increasing the stability, transmissibility or ability to disseminate;
5. Altering the host range or tropism;
6. Enhancing susceptibility of a host population, and
7. Generating a novel product or reconstituting an eradicated or extinct product [24].

Checks and balances, and appropriate review processes are needed to mitigate the risks while at the same time avoiding the potential that restrictions placed on sensitive work will prevent the sharing of information through traditional routes such as conferences and publications. Item (7) above about novel products is very relevant to molecular medicine, where the

potential to manipulate an organism is to some extent only limited by the researcher’s inventiveness. This is well illustrated in synthetic biology (Chapter 8) and the example of the synthetic bacterium (Chapter 10).

FUTURE

The Conservatism of the Courts

The courts and scientists demonstrate an interesting dichotomy when it comes to DNA-based evidence. The courts are conservative and prefer to deal with precedents or well-established technologies. In contrast, scientists strive to develop new techniques, and in the field of molecular medicine, the changes are many. This comparison is particularly relevant with the roll-out of sophisticated DNA sequencing platforms. If it is the case that third generation sequencers will work through single molecule technologies that are fast, accurate, sensitive and cheap, it is likely that the established 13 CODIS STRs comparisons will be replaced by a new paradigm – whole genome sequencing – which, with the number of SNPs and CNVs that could be identified and then profiled, must be close to producing a unique DNA profile for any one individual. In this scenario there might be a two-step process. The initial interrogation of the DNA forensic database will rely on CODIS-like STRs. The confirming DNA profile with a new sample of DNA from the accused might then be based on whole genome sequencing.

Other new developments available through omics (Chapter 4) will challenge the courts. A recent example is the finding that humans appear to carry unique skin bacterial profiles, as shown by DNA sequencing [25]. This very preliminary research study has suggested that bacterial colonies in certain environments such as the keys on a computer can be directly compared to the bacterial colony on an individual’s skin, and this can be characterized by DNA sequencing. The implication of this study is that if human-derived DNA is not available at a crime scene, it might be possible to compare the bacterial DNA profile taken from evidentiary material with that of the suspect’s skin flora.

In-Field Testing

Just as there is interest in clinical practice to move the laboratory closer to the patient’s bedside or the consultant’s office (called point-of-care DNA testing), so also is there a move to perform some forensic DNA testing at the crime scene [23]. In microbial forensic practice this is essential, because of the public health problems arising from potential harmful microorganisms at the crime scene that would need to be identified urgently, to minimize their spread to others. DNA profiling is also a possible option, particularly as DNA analyzers become smaller, through advances in nanotechnology and as greater use is made of robotics. This would free up time for the expert forensic scientists who would remain at the laboratories and receive electronically transmitted DNA profiles from evidentiary material worked on at the crime scene. The search for potential suspects could then start almost immediately.

Personalized Justice and Sentencing

A hypothetical situation which could emerge during judicial conferences involves expert witnesses who provide genetic DNA information to influence sentencing by the courts. Is this any different to a mental health assessment produced for the court to consider before sentencing? The genetic twist had previously been dismissed as wishful thinking, but in 2009 it became a reality. A case occurred in Italy, involving a murder to which the accused confessed and was sentenced to 9 years in jail with a mitigating factor given as his poor mental health. There was an appeal by the defense

against this sentence, and the Appeals Court judge asked for further clarification of the person’s mental state. Neuroscientists from two Italian universities produced evidence that the murderer had abnormalities on brain scans and in five genes linked to violent behavior. These expert witnesses concluded that these genetic findings made him more prone to violent behavior. Perhaps surprisingly the judge accepted this evidence and reduced the sentence by a year. This verdict was unexpected, and was reported in Nature [26]. The disbelief reflected the still incomplete understanding of how genes influence aggressive behavior. Some even suggested that the sentence could have been increased if the judge believed that there was an innate predisposition to violence!

A variation on the theme above is the use of DNA testing in forensic toxicology (more appropriately now called forensic pharmacogenetics) to determine whether a drug overdose was the result of foul play or due to an individual’s inability to metabolize a standard dose of a drug (Chapter 3). These scenarios have already come before the courts. An example is a codeine overdose in a child. This drug is given for a range of relatively minor health problems but can also lead to death. The normal metabolism of codeine is dependent on the CYP2D6 gene, which converts it to the active product morphine. Individuals have three variants of the gene with the two extreme types being:

1. Poor drug metabolism (and so low pharmacologic effect), and
2. Ultra-rapid metabolism producing an excess of the morphine product and potential toxicity.

One court case involved twins, both of whom had suffered an overdose of codeine as a result of which one died. Pharmacogenetic DNA testing showed that the overdose was not due to a genetic defect causing ultra-rapid metabolizing. Since the twins were normal metabolizers, the focus of the investigation turned to the dose or dosing regimen used [27]. A lot has been said about personalized medicine, but the opportunity for pharmacogenetics-based evidence informing the courts opens up the promise for personalized justice; i.e. DNA-based evidence is added to what is normally used [28].

There are many interesting and challenging times ahead in forensic science, as DNA analytic platforms continue to progress rapidly, and complex genetic traits, particularly antisocial disorders, aggression and psychopathy, become better characterized at the molecular level [29].

References

[1] Strengthening Forensic Science in the United States: A Path Forward. www.nap.edu./catalog/12589.html
[2] Sudoyo H, Widodo PT, Suryadi H, et al. DNA analysis in perpetrator identification of terrorism-related disaster: suicide bombing of the Australian Embassy in Jakarta 2004. Forensic Science International: Genetics 2008;2:231–7.
[3] Kaye DH. Trawling DNA databases for partial matches: what is the FBI afraid of? http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1551467
[4] van Daal A, Haesler A. DNA evidence: current issues and challenges. http://search.informit.com.au/documentSummary;dn=310793256010662;res=IELHSS; 2011.
[5] US Government website. DNA Initiative, advancing criminal justice through DNA technology. www.dna.gov/solving-crimes/cold-cases/howdatabasesaid/codis/ and human genome project information – DNA forensics. www.ornl.gov/sci/techresources/Human_Genome/elsi/forensics.shtml
[6] Denver DA. Familial DNA database searches. www.denverda.org/dna/Familial_DNA_Database_Searches.htm
[7] Carmody G. Identification of victims using DNA from relatives: the Canadian experience. In: XIX International Congress of Genetics 2003. Abstract 6H, p. 55.
[8] 2006 US Government report on lessons learnt from 9/11: DNA identification in mass fatality incidents. www.ncjrs.gov/pdffiles1/nij/214781.pdf
[9] 2002 US Department of Justice Office of Justice Programs. National Institute of Justice Special Report – Using DNA to solve cold cases. www.ncjrs.gov/pdffiles1/nij/194197.pdf


[10] Innocence Project. www.innocenceproject.org/understand/
[11] Justice Project. www.deathpenaltyinfo.org/studies-dna-testing-and-use-forensic-science
[12] Tester DJ, Ackerman MJ. Postmortem Long QT syndrome genetic testing for sudden unexplained death in the young. Journal of the American College of Cardiology 2007;49:240–6.
[13] Online Mendelian Inheritance in Man and the LQT syndrome. http://omim.org/entry/192500
[14] Baker CS, Steel D, Choi Y, et al. Genetic evidence of illegal trade in protected whales links Japan with the US and South Korea. Biology Letters 2010 doi:10.1098/rsbl.2010.0239
[15] Driscoll CA, Yamaguchi N, Bar-Gal GK, et al. Mitochondrial phylogeography illuminates the origin of the extinct Caspian tiger and its relationship to the Amur tiger. PloS ONE 2009;4:e4125.
[16] Wiegand P, Schmidt V, Kleiber M. German shepherd dog is suspected of sexually abusing a child. International Journal of Legal Medicine 1999;112:324–5.
[17] Howard C, Gilmore S, Robertson J, Peakall R. Application of new DNA markers for forensic examination of Cannabis sativa seizures – developmental validation of protocols and a genetic database. www.ndlerf.gov.au/pub/Monograph_29.pdf; 2008.
[18] Fraser CM. A genomics-based approach to biodefence preparedness. Nature Reviews Genetics 2004;5:23–33.
[19] Koblentz GD, Tucker JB. Tracing an attack: the promise and pitfalls of microbial forensics. Survival 2010;52:159–86.
[20] Review of the scientific approaches used during the FBI’s investigation of the 2001 anthrax letters. http://www.nap.edu/catalog.php?record_id=13098
[21] Rasko DA, Worsham PL, Abshire TG, et al. Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation. Proceedings of the National Academy of Sciences USA 2011;108:5027–32.
[22] Scaduto DI, Brown JM, Haaland WC, Zwicki DJ, Hillis DM, Metzker ML. Source identification in two criminal cases using phylogenetic analysis of HIV-1 DNA sequences. Proceedings of the National Academy of Sciences USA 2010;107:21242–7.
[23] Budowle B, van Daal A. Extracting evidence from forensic DNA analyses: future molecular biology directions. BioTechniques: Beyond Darwin: The Future of Molecular Biology 2009;46:339–450.
[24] Atlas RM. Responsible conduct of life scientists in an age of terrorism. Science and Engineering Ethics 2009;15:293–301.
[25] Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Forensic identification using skin bacterial communities. Proceedings of the National Academy of Sciences USA 2010;107:6477–81.
[26] Feresin E. Lighter sentence for murderer with “bad genes”. In: Nature News (www.nature.com/) published online October 30 2009. doi:10.1038/news.2009.1050.
[27] Ferreiros N, Dresen S, Hermanns-Clausen M, et al. Fatal and severe codeine intoxication in 3 year old twins – interpretation of drug and metabolite concentrations. International Journal of Legal Medicine 2009;123:387–94.
[28] Wong SHY, Happy C, Blinka D, et al. From personalized medicine to personalized justice: the promises of translational pharmacogenomics in the justice system. Pharmacogenomics 2010;11:731–7.
[29] Gunter TD, Vaughn MG, Philibert RA. Behavioral genetics in antisocial spectrum disorders and psychopathy: a review of the recent literature. Behavioral Sciences and The Law 2010;28:148–73.

Note: All web-based references accessed on 5 March 2012.

CHAPTER 10 Ethical, Legal and Social Issues (ELSI)

OUTLINE

Introduction
Consent
  Clinical Practice and Research
  Populations
  Genetic Identifiers
  Biobanks
  Omics Research
DNA Genetic Tests
  Privacy, Confidentiality and Duty of Care
  Discrimination and Stigmatization
  Genetic Screening
  Vulnerable Groups
Oversight
  Regulation and Self-regulation
  Industry and Gene Patents
  Scientific Misconduct
Challenges Ahead
  Education and Engagement
  DNA Theft
  Whole Genome Sequencing for Patient Care
  Direct-to-Consumer DNA Testing
  Access and Equity
  Stem Cell Tourism
  Synthetic Biology
References

Molecular Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-381451-7.00010-4 © 2012 Elsevier Inc. All rights reserved.

INTRODUCTION

The fundamental principles of ethics as applied to medical research and clinical practice are exemplified by:

• Autonomy to ensure an individual has complete freedom in thought and ultimately in decision making.
• Integrity as a guiding value for researchers and practitioners to ensure honest, ethical conduct in the search for knowledge or delivery of care.
• Respect for the person’s dignity, their needs and welfare, and their rights to confidentiality and privacy.
• Beneficence to ensure practitioners and researchers intend to do good and maintain their skills to this end.
• Non-maleficence (first do no harm) to the individual or others in the community.
• Justice to ensure equity in the provision of care, accountability in the use of scarce health resources, and appropriate distribution of benefits.

Legal requirements and professional standards govern medical practice and there are additional safeguards for volunteers when it comes to medical research (Table 10.1). The applications of molecular medicine would be covered by these. However, there are particular challenges in dealing with DNA-based medicine, and these will be the focus of this chapter. Most revolve around research into the genome or the clinical applications of DNA genetic testing.

Those working in medical genetics (and now genomics) take the view that genetic information is no different to other types of medical information, as to say otherwise might imply genetic determinism. This is consistent with the message given throughout this book; human traits and genetic disorders rarely represent pure genetic effects but invariably G x E and other interactions, even if the environmental components in the Mendelian disorders are small. Nevertheless, genetic information is different because of the inherent properties of DNA including:

• The potential for predicting disease development in asymptomatic individuals by looking for germline (constitutive) genetic changes.
• The implications for family members who share genes and DNA with a patient.
• Unwanted information can result, for example, non-paternity might be identified. In omics, a risk is the finding of incidental or unexpected results pertinent to health.
• A view has been expressed that DNA can never be de-identified (anonymized or made anonymous) and so additional precautions are necessary to protect privacy.

CONSENT

The concept of informed consent is basic to how medicine is practiced and research is conducted. Nevertheless, a signature on a piece of paper means little, as ultimately it is what was understood that defines the type of consent given. The evolution of informed consent within medical research has moved forwards progressively since the Nuremberg trials (Table 10.1). Obtaining the appropriate consent in clinical practice is guided by ethical principles, but ultimately the courts will resolve problems that arise. Some now consider that the current approach to getting informed consent is not sustainable with the omics-type research developments occurring in molecular medicine (Figure 10.1). Scenarios demonstrating various issues regarding consent follow.

Clinical Practice and Research

The zone demarcating clinical and research activities becomes even more blurred in genomics because of the rapid changes that occur. There is also the pressure from external sources (government and research funders) to speed up the translation of research findings into clinical practice to optimize benefits for the community. An example would be BRCA1 and BRCA2 DNA testing for breast cancer. When these genes were first discovered in 1994, there was great enthusiasm that they would help solve the problem of breast cancer. Thus, there was considerable pressure to start clinical DNA testing even though little was known about the test’s clinical utility. Today, DNA testing for these genes can help particular families with an uncommon form of breast cancer, but there remains a lot more to be discovered (Chapter 7). Patients with negative results might need further DNA testing as new genes are identified, and patients who have had a DNA test may need to be reassessed at some future date in the light of new findings. Were these various caveats


TABLE 10.1 Setting standards and guidance in molecular medicine [1]a.

Standard: Details

Declaration of Helsinki – Ethical principles for medical research involving human subjects [2] (The Declaration of Helsinki has been reviewed and amended by a number of World Medical Assemblies with the latest occurring in 2008): Following the Nuremberg trials, the judges issued a statement now known as the Nuremberg Code. For medical research the code required that: (1) The subject gives voluntary consent, and risks should never exceed likely benefits; (2) The research is justifiable and socially worthwhile, and (3) The investigator has appropriate knowledge and skills, and is aware of what has been done previously in that field. The World Medical Association embraced the Code, and in 1964 expanded it to the Declaration of Helsinki which made recommendations to guide physicians in their conduct of human biomedical research. These were intended to improve diagnostic and therapeutic procedures, which of necessity would sometimes be combined with professional care.

Nuffield Council on Bioethics [3]: Founded in 1991 as an independent body, the Council has undertaken a number of reviews on ethical issues in genetics. A 2010 report of relevance is titled: Medical profiling and online medicine: the ethics of “personalized healthcare” in a consumer age.

Presidential Commission for the Study of Bioethical Issues [4]: The President’s Council on Bioethics was formed in 2001 and was replaced in 2009 by the Presidential Commission for the Study of Bioethical Issues to advise the US President on bioethical matters that may emerge because of advances in biomedicine and related areas of science and technology.

UK’s Human Genetics Commission [5]: The UK’s Human Genetics Commission provides government and the public with advice on expected new developments in genetic technology. It is planned to replace it with a government body of experts.

OECD’s Guidelines on Human Biobanks and Genetic Research Databases [6]: Provides a useful 2009 report on DNA banks with advice and guidance ranging from the establishment of this type of resource to how it will be discontinued and materials or data disposed of.

2007 Report of the International Bioethics Committee of UNESCO on Consent [7]: Overview of consent issues across clinical practice, research (biomedical, clinical, epidemiological), emergencies and tissue donation. Consent is considered in the context of the impaired, the disadvantaged and being culturally appropriate.

National Statement on Ethical Conduct in Research Involving Humans [8]: Although originally formulated for medical research alone, this statement from Australia on how research should be conducted has now been broadened to include non-medical issues.

Essentially Yours – the Protection of Human Genetic Information in Australia [9]: 2003 report into human genetics resulting in 144 recommendations that remain pertinent nearly a decade later.

USA’s 2008 Genetic Information Nondiscrimination Act – GINA [10]: Legislation of relevance where health insurance and employment are linked. This law ensures: (1) Employers cannot use genetic information (including the request to have DNA genetic tests) to decide on who will be employed and how much health coverage is given, and (2) Health insurers do not use genetic information (both DNA genetic tests and utilization of genetic services) to influence eligibility or types of premiums issued. GINA does not apply to life, disability or long-term care insurance.

DNA theft [11]: An amendment to the Human Tissues Act 2004 now makes non-consensual DNA testing illegal.

aThe reference provides a more comprehensive overview of the documents or standards that have helped to evolve the concept of informed consent.


understood in the original consent, given prior to DNA testing?

[Figure 10.1: diagram relating Consent to four elements: Clinical Practice and Research; Genomics & Cohorts studied; All material potentially re-identifiable; Biobanks & open-ended consent.]

FIGURE 10.1 Informed consent in molecular medicine. Consent is more complex because there are additional issues that can impact on the informed part. (1) The consent can cover a mix of medical research and clinical service as new developments or early discoveries are moved (prematurely or otherwise) into clinical practice. This will be driven to some extent by the view that translation from research into clinical practice is lagging and must move along faster to get better value for the research dollar; (2) The intrinsic nature of genomics (measuring everything) encourages collaborations or consortia to be formed. These may cross cultural, ethnic and jurisdictional boundaries so the consent process for consortium members may vary. This might be a particular concern of indigenous populations; (3) The concept that all genetic material is potentially re-identifiable, and (4) The development of biobanks and the use of open-ended consent.

Populations

Contemporary studies looking at complex genetic inheritance require large numbers of participants, and this usually means international consortia. In addition, it is essential to move away from thinking that a predominantly European Caucasian patient group (or even the control group) is adequate, as there are many genetic differences between populations. Research oversight bodies are challenged by the multi-centered, multi-national studies that can emerge from genomic-based strategies. Their responses will vary but often the default position is agreement, provided the approval process for ethics review from the collaborators is comparable. This is the pragmatic and perhaps only practical way forward, but what is comparable may not be easy to measure, nor does it address the autonomy of some communities, whose approach to ethical review including consent might be different but still consistent with their needs and beliefs.

Engaging with indigenous populations for relatively straightforward genetic research has not been easy, and there are many examples where indigenous populations have felt manipulated, and feel that little benefit comes back to their communities. The challenges of genomics research will be greater because of the broad nature of this activity and the way it is conducted (see also Vulnerable Groups).

Genetic Identifiers

The belief that all genetic material is potentially re-identifiable and so, for practical purposes, cannot be de-identified, means that genetic or genomic research protocols must undergo more stringent oversight. Importantly, some projects may not proceed or the use of potentially invaluable DNA collections may be denied. This becomes a significant issue in the case of de-identified material which, in some jurisdictions, is not considered to be human subject research [10]. It is correct that DNA testing in the forensic scenario is used to identify a person of interest. However, it does this by providing a comparison between two samples (the unknown against one belonging to the individual) and then the courts decide on the significance of this comparison (Chapter 9). A DNA forensic test per se has never identified any one individual. Similarly, a DNA sample collected as part of a research project has no intrinsic potential to provide any identifiers unless it is compared to another sample that is known to have originated from that person.

Unethical behavior in DNA genetic testing can occur if an individual tests a DNA sample without the appropriate consent, or tests for


BOX 10.1 TERMINOLOGY.

Critical to any discussion about informed consent in genomics research is the potential to identify or de-identify (anonymize) a sample. This will significantly impact the numerator in a risk/benefit analysis. It is further complicated by the different descriptors including: identifiable, de-identifiable, re-identifiable, anonymized, coded, linked, de-linked and other terms. Reference [12] provides a useful overview of the terminology, and highlights five categories from the European perspective although these do not exactly equate with terms used in the USA. These are:

1. Anonymous – the human sample cannot be identified;
2. Unlinked anonymized – the human sample cannot be linked to identifiers or clinical information that may be held;
3. Linked anonymized – the human sample could be linked to identifiers and clinical information but the key for this is held by a trusted third party outside the research team;
4. Coded – the human sample could be linked to identifiers and clinical information by researchers who have access to the key that breaks the code, and
5. Identified – the link between the human sample and identifiers or clinical details is there for a number of authorized people to see.

According to the review, any sample containing DNA is excluded from (1) because of the forensic link, and a statement is made that this category would only be relevant for archeological samples. If so, DNA-containing samples will start at (2) or (3) depending on the philosophy of DNA being re-identifiable. The status of a research sample in terms of its link to the donor is crucial in terms of assessing risks to privacy or the potential for discrimination by a third party. Inconsistent and confusing terminology is unacceptable in omics-based research where it can be expected that many studies will be conducted across different countries and jurisdictions.
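
The practical difference between the five categories is who, if anyone, holds the key that reconnects a sample to its donor. A minimal Python sketch of that idea follows. It is an illustration only: the names `SampleStatus`, `KEY_HOLDER` and `can_relink`, and the requester labels, are hypothetical and are not taken from reference [12] or from any biobank system.

```python
from enum import Enum


class SampleStatus(Enum):
    """The five categories listed in the box (European perspective)."""
    ANONYMOUS = 1
    UNLINKED_ANONYMIZED = 2
    LINKED_ANONYMIZED = 3
    CODED = 4
    IDENTIFIED = 5


# Hypothetical labels for the party able to reconnect a sample to its donor.
# None means that no key exists, so nobody can make the link.
KEY_HOLDER = {
    SampleStatus.ANONYMOUS: None,
    SampleStatus.UNLINKED_ANONYMIZED: None,
    SampleStatus.LINKED_ANONYMIZED: "trusted third party",
    SampleStatus.CODED: "research team",
    SampleStatus.IDENTIFIED: "authorized staff",
}


def can_relink(status: SampleStatus, requester: str) -> bool:
    """Return True only if the requester is the key holder for this status."""
    holder = KEY_HOLDER[status]
    return holder is not None and holder == requester


# A researcher cannot re-link a linked anonymized sample; the third party can.
print(can_relink(SampleStatus.LINKED_ANONYMIZED, "research team"))
print(can_relink(SampleStatus.LINKED_ANONYMIZED, "trusted third party"))
```

Keeping the key holder in one table makes the contrast between categories 3 and 4 explicit: the samples may be stored identically, and only the custody of the key differs.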

genes or genetic information that were not part of the consent. Surreptitiously comparing a DNA sample with a known DNA sample may provide some evidence for identity. However, the problem is not the DNA or the fact that the research project is a genetic one, but a failure of safeguards that should have prevented this comparison from occurring. Thus, the philosophy that DNA is always potentially re-identifiable needs a rethink as it impacts on the consent process. Related to this is the confusing and inconsistent terminology adopted in different jurisdictions which should be addressed (Box 10.1).

Biobanks

An important driver for reviewing consent is the development of biobanks with their focus on the future rather than the present. Some of the biggest biobanks are primarily involved in clinical care rather than research, for example, national blood transfusion services or bone marrow banks. However, in this chapter the focus will be on biobanks for research, since clinical service activities are regulated by law.

As the name suggests a biobank is a repository of biological material. More recently the term biobank has taken a new meaning and reflects developments in genomic research that require a central tissue and clinical data resource for long-term epidemiological-type studies with large population numbers [13]. This type of biobank may not have specific short-term aims or hypotheses to test, but it collects clinical data (phenotypes) and DNA or other tissue samples. This is done with the understanding that sometime in the future new technology or accumulation of data will provide important answers for a range of diseases, some of which may not have been proposed at the time of the biobank’s start. Thus, from an ethical perspective the risk/benefit consideration, which is an important yardstick allowing an ethics review committee to assess a research proposal, is unlikely to be favorable because:

1. The goals (assuming these are known) are long term, and the potential benefit is difficult to assess, and
2. Risks are likely to be high because of the many associated unknowns particularly if there is long-term storage and use.

Generally, research biobanks represent major national endeavors, an example of which is the UK Biobank (Table 10.2) [14]. These types of biobanks demonstrate a number of challenges:

• Can consent be truly informed? Presently, all that can be said about the UK Biobank is that any studies emerging will be large, long-lasting, will not benefit the donor directly and could investigate any important contemporary disease. It is not even possible to say what diseases will be studied, as the Biobank’s timeline extends to 30 years. Nevertheless, some believe (although others disagree) that if these facts are clearly outlined the consent process is fully informed.
• The duration of the project (30 years) is in itself a challenge because a lot could happen in this time; although at least with the reputable sponsors of the UK Biobank one can be reasonably sure that funding will not be an issue compared to the uncertainty of a less well-funded or a commercial biobank which could close or be sold with little notice. Data security over this time period is a major concern and requires considerable trust from the donors. It will be necessary to have regular updates on the phenotypic information that is accumulated when donors are medically examined or health events like death or disease are reported to the National Health Service, as these data will then be matched to results from genomic studies.
• There have been complaints about the UK Biobank including:
  • Disquiet that the UK NHS has provided the UK Biobank (a third party) with the names, addresses and dates of birth of potential donors.
  • The issue of intellectual property (IP) was not initially transparent although the philosophy behind the UK Biobank is that IP will not be sought but might result from inventions by others using the Biobank resources. Any income that results from IP will be re-invested in the Biobank.

There are differing views regarding what might comprise a biobank. These assess ethical issues entirely on risk versus benefit rather than size or underlying philosophy – i.e. even a smaller collection of tissues or DNA might be considered to have the same risks as a large biobank, and so more stringent requirements may be needed from an ethical perspective. Related to biobanks are tissue and DNA banks that were developed for specific research purposes. The OECD calls these human research genetic databases which include tissue banks and data banks. This terminology makes sense as tissue or DNA banks are also data banks but in a different form (Table 10.2).

Omics Research

Researchers are working together to ensure that developments in genomics and biobanking

TABLE 10.2 DNA banks and registers.

Repository

United Kingdom (UK) Biobank: The purpose is to build a resource for future research to understand interactions between genes, environment and health. The biobank will enable long-term prospective studies of an epidemiologic nature linking DNA data with medical records and family histories obtained through family physicians. It is directed to men and women aged 45–69 in the UK and it commenced in 2007. By 2010 the biobank had reached its goal of enrolling 500 000 individuals who contributed their DNA, saliva and urine, and agreed to regular medical examinations as well as providing access to their National Health Service (NHS) clinical data. No particular disease will be studied but donors will be informed that all important 21st century diseases including heart disease, cancer, dementia and others will be investigated. This biobank is the resource and repository of data and does not do the research but allows researchers to use the extensive databases. The UK Biobank has access to medical information on the donors through the centralized records of the UK’s NHS. It stores various tissue samples and it is intended to keep this resource for at least 30 years. The stored material is described as being reversibly anonymized (equivalent to linked anonymized in Box 10.1) and so researchers will not have access to identifiers. The key to link phenotypes with genotypes will be held by an independent third party. Development of the UK Biobank has proceeded smoothly with appropriate public consultation and the formation of an ethics and governance framework.

Icelandic biobank: A large scale DNA biobank and centralized database that provoked considerable controversy in Iceland, a country with a relatively closed community and well-documented genealogies. The purpose of this resource was to facilitate gene discovery. The biobank is large with nearly half the population of Iceland included and has three repositories: (1) A database of established family relationships in the form of genealogies – these were already on the public record in Iceland; (2) A database of phenotypes taken directly from the medical records, i.e. a national electronic health record, and (3) A DNA collection for genotypic data. The first model for this resource provoked international debate because there was no consent required to include medical information. Eventually this was partially addressed with the addition of an opt-out system, i.e. access to medical record data was automatic unless the individual declined. Safeguards in relationship to privacy and confidentiality were also included. Despite considerable misgivings this resource has already shown benefit with some important research findings published. In December 2009, deCODE filed for bankruptcy raising concern that its databases might be sold to a third party. This was denied by deCODE’s CEO who indicated that the databases belonged to the individual donors and could not be sold. deCODE was then purchased by a US based company. In January 2010 it announced that it had re-emerged from bankruptcy and would continue its gene discovery research.

Faroe Islands Biobank: This self-governing group of islands has a population of about 50 000. In 2011 it started an ambitious project with the UK and USA to sequence the DNA of nearly all its residents. The information obtained would be stored in a central biobank and made available to the individuals’ medical practitioners for healthcare related issues. What is potentially very exciting and different about this proposal is that both the Ministry of Health and Ministry of Education have been involved in the planning with the aim to educate and engage all the islands’ children starting early in their schooling.

DNA banks: Reasons why DNA might be stored in a bank include: (1) Advances in gene discovery can change what is a genetic defect of unknown etiology today to a disease with a known DNA marker. This information may benefit family members (even when a person is deceased); (2) The possibility that information gained from research might return some benefit to the donor, and (3) Providing access to new drugs if research is being conducted by a pharmaceutical company. Professional societies have proposed guidelines for DNA banks. These cover: Physical facilities; Relationship between depositors, their families and health professionals; Confidentiality; Safety precautions and quality assurance measures. The word depositor rather than donor is often used because the individual giving the sample maintains ownership and is not acting as a donor in the broadest sense. Depositors need to have clear statements on the length of banking, the potential problems and their rights in respect of the banked DNA. A DNA bank is a planned activity with well-defined operating guidelines. What can be done with the DNA, particularly in terms of research, requires careful thought, the appropriate consent and ethics review. Formal guidelines define the rights of the individual who has had DNA banked. Nevertheless, dilemmas arise including: (1) Purpose: Material has been stored for one purpose but then another DNA test becomes available and the stored material (the depositor may be deceased) would be helpful in defining the genetic status of other family members, and (2) Ownership: can the material be sold or transferred to others?

Criminal DNA databases: A different and at times controversial database stores DNA fingerprint profiles. As discussed in Chapter 9 these profiles help police identify a likely perpetrator or exclude persons of interest.
DNA fingerprint profiles are now being deposited in centralized police databases, although the indications for taking them, and how long they are stored have proven to be controversial in some jurisdictions. DNA fingerprints obtained from unexpected sources including the newborn screening Guthrie cards can be used to identify victims of mass disasters or help in the search for a missing person. The obtaining of Guthrie cards for forensic or legal purposes by a court order is a sensitive issue and has led in some cases to these cards being destroyed within a short time after they are collected.

Genetic registers and databases: Registers come in various forms from local lists of genetic diseases to national registers. A further extension of the genetic register is the availability, in a central database, of a list of names or identities of individuals who have a particular type of genetic disorder. The significance of this in providing information for health planning or to assist other family members is balanced by the potential for unauthorized disclosure of data. The privacy issue is particularly significant when third parties, for example, employers, insurance companies or the courts of law, may gain access to this information.

continue to progress smoothly. One example is the group known as P3G – Public Population Project in Genomics [15]. Many consensus statements have been published to identify key areas that would need to be considered in whole genome type research before the informed part to consent can be adequately addressed [16]. These include:

• Feedback and return of results – presenting papers, workshops, bulletins or summaries of research findings are accepted ways of informing research participants. What remains problematic is the return of individual results, particularly with whole genome type strategies when there is prior knowledge that these types of studies are likely to produce incidental findings. Included here would be:
  • What results are returned? This can be complex but needs to be negotiated during the consent process. Individuals might volunteer for a research study and they can decide what information they want (all or only some) and this might include requesting that no information is returned to them. In the latter circumstance incidental findings are less of an issue but the researcher can still be placed in an invidious position if a significant result or incidental finding emerges. Should the researcher ignore the request for no return of results or go back to the volunteer and check again that this is what was intended? This becomes further complicated if the result might have significant health implications for relatives, for example, the children of the research volunteer.
  • Who in the team (which is often multidisciplinary and multi-centered) has responsibility for the results, particularly with the passage of time and changed circumstances for the donor or researchers?
• Privacy of participants – the traditional way to protect this in research is through de-identification and ensuring that records are safe and secure. Will whole genome strategies make de-identification more difficult as there is likely to be sharing of samples and data because this type of research is often multi-centered and even multi-national? The legal oversight for privacy and security may also vary depending on the jurisdiction involved.
• Governance structures – a critical issue that will reflect a number of variables particularly the life of the biobank.
• Participant’s right to withdraw from research programs. This becomes more difficult if the research is advanced and publications have emerged. If results can no longer be used or material not kept for auditing or review will publications need to be withdrawn?
• Calculating the risk/benefit when it might need to include a public benefit component.

Other Models for Consent

The rapid changes occurring in genomics and the links with biobanks make the traditional consent process more complex (Figure 10.2). Indeed the usual requirements for informed consent may not be achievable with biobanks [17]. This has generated considerable discussion about alternative approaches, such as open-ended consent (also called broad consent, general consent or blanket consent). This would seem reasonable, provided the research participant knew exactly what she or he was getting involved in, including the open-ended nature of the research, and the inherent greater risks to privacy that would go with long-term storage and use of the donated material and data, some of which will go offshore. In this circumstance, a lot is being asked in terms of trust and so correspondingly it will be necessary to ensure that the proposed structure is adequately governed, funded and regularly reviewed from an ethical perspective.


One matter that recurs in discussions about the more complex forms of informed consent (including open-ended consent) is for the process to be step-wise and dynamic with continuing feedback through various means (focus groups, newsletters, website, email bulletins, text messaging and social media) [1]. Perhaps a model similar to what the regulators are using could be adopted: a risk classification so that, instead of having a list of activities as depicted in Figure 10.2, there would be levels of risk, with the least risky category (for example, a routine blood count) requiring only verbal agreement as consent, while at the other end the most risky category (the biobank) would require formal written consent and ongoing feedback.

[Figure 10.2 labels: Genomic research biobanks; Genetic research; Genomic DNA test; Genetic DNA test; Traditional medical research; Blood count.]

FIGURE 10.2 Risk rated informed consent. Some situations requiring informed consent are illustrated. The taking of a routine blood count has minimal risks which should be reflected in the type of consent obtained. The research study is more complex because individuals are acting as volunteers. In some cases, the volunteers have a disease and may gain some benefit, such as access to a drug, from participating and so some risk might be justified. In others, the participant might be involved for no reason other than altruism and so must not be exposed to unnecessary risk. If there is risk it must be clearly identified. Genetic DNA testing is undertaken for clinical care or research and has potential risks including stigmatization and discrimination because of what is found. This is further complicated because the same DNA genetic test can be used for different purposes (Chapter 3). Informed consent for genetic research is complex because the results from the study might have ramifications for other family members or impact on the ability to obtain some types of insurance. Depending on their design, these studies might find incidental changes in genes or DNA that could impact on an individual’s health. Research involving the germline (and so family members are implicated) would be more risky than comparable work looking for changes in somatic cell DNA. Finally, genomic research through the use of biobanks can carry the highest risk as discussed in the text.

Alternative model(s) need to be considered very soon, as the practice of medicine and medical research increasingly takes on different directions and the one-size-fits-all approach to getting informed consent is under pressure. This has the potential to undermine research as well as research facilities or resources that are presently available. Other issues of relevance to the consent process are summarized in Table 10.3.

DNA GENETIC TESTS

DNA genetic testing is not always straightforward. Failure to find a DNA mutation does not necessarily exclude a disease. The finding of a DNA mutation may not mean the individual has a disease, since it could indicate a genetic predisposition which may or may not progress. Therefore, the concept of risk in genetics can be difficult for patients and even some health professionals to grasp. It has the potential to become more problematic as DNA testing expands the options for predictive medicine into the complex genetic disorders.

Privacy, Confidentiality and Duty of Care

The distinction between privacy and confidentiality is subtle. Privacy considerations


TABLE 10.3 Additional issues that impact on consent.

Issue: Concerns

Conflicts of interest: The consent process must not be flawed because of conflicts of interest. This could be a problem particularly with private-public partnerships and associated patents. It is illustrated by the Jesse Gelsinger case (Box 10.2) where clinical investigators had a financial interest in the company producing a gene therapy vector that was responsible for this research volunteer’s death. This conflict of interest may or may not have been fully explained to the patient prior to the clinical trial, but it remains a contentious issue. Gene therapy trials are expensive and so usually have a sponsor likely to be the company that manufactures the product. Therefore, conflicts of interest may be inevitable but need to be transparent particularly if an investigator has a link to the company.

Indigenous communities: Indigenous peoples are starting to participate in genetic research studies with informed consent models that are appropriate to their particular beliefs and needs. It is now suggested within the evolving genomics paradigm that a new consent process is needed. This will be difficult with some communities, but at least it is now clear that engagement needs to start at the front end rather than developing a process and expecting others to fit into the mold.

Archived specimens and old resources: There have been some highly publicized examples of unethical behavior with tissues and organs. The backlash has been tightening of laws which, in some circumstances, have meant that tissues or DNA collected for earlier studies are not accessible for future research because the consent was not sufficiently broad to include new activities. In many cases it is not feasible to go back to the original donors and update the consent so these valuable resources are unable to be used. The issue of de-identification or re-identification becomes relevant as approval to work with samples that are no longer identifiable is more likely to be obtained.

BOX 10.2 GENE THERAPY DEATH.

The first death directly attributable to gene therapy occurred in 1999. Jesse Gelsinger was an 18-year-old male with a mild form of X-linked OTC (ornithine transcarbamylase) deficiency. The defect in OTC involves the urea cycle, leading to protein intolerance due to accumulation of ammonia in the body. Although not severely affected, this individual volunteered for a phase I gene therapy study (in a phase I study safety rather than efficacy is the end point being measured) to correct the OTC deficiency. A number of individuals had already been treated by this gene therapy approach, and there was some evidence that the viral vector used (adenovirus) resulted in some adverse events including fever, thrombocytopenia and transaminitis. Nevertheless, the patient was given a relatively high dose, and he died from acute respiratory failure four days later. Subsequent review by the FDA identified a number of violations of the clinical trial rules. Shortcomings were also noted in the review process, as well as the regulatory protocol for notification of serious adverse events.

primarily reflect the person him or herself while confidentiality relates to information about the person. This information can come in many ways, including communications between a doctor and patient, DNA test results and health records (Table 10.4). Individuals have a right to privacy which is usually protected by law. Doctor-patient confidentiality is a well established trust between two people. Although protected by law, it is not absolute as in rare circumstances, for example a court order, there may be a duty to disclose. The health professional’s duty of care to the patient is also protected by law. The concepts of privacy, confidentiality and duty of care within molecular medicine are complicated further because knowledge of germline DNA can have implications for the health of genetic relatives. In this circumstance, an important issue needing to be addressed is the definition of the boundaries for the physicians’ responsibilities beyond the patient.

There is a general consensus that, for some predictive DNA test results, it is essential that at-risk family members are informed that they might also carry a mutant gene (this risk can

TABLE 10.4 ELSI Glossary.

Term: Definition

Informed consent: For informed consent it is necessary to show: (1) The information provided is appropriate and relevant; (2) The information is understood, and (3) Consent has been voluntary, i.e. the individual is able to consent and there has been no coercion. Other versions of informed consent include: open – general – broad – open-ended – blanket consents (see text for explanations).

HREC, REC, ERB, IRB: There are various acronyms for local authorities or bodies evaluating human research protocols including their ethical content. Examples are: HREC: Human research ethics committee; REC: research ethics committee; IRB: institutional review board; ERB: ethics review board.

DNA bank, Tissue bank, Biobank or Human genetic research database: There is a confusing list of names for biological tissues (or data) that are stored as central resources for medical research. The OECD uses the term human genetic research database to cover the many approaches and uses of this material and has defined biobank as a collection of biological material and the associated data and information stored in an organized system, for a population or a large subset of a population. Others take a different view of a biobank with the numbers less relevant than the risk/benefit consideration. Another distinction is whether the tissue bank is for clinical service activities (and so usually covered by legislation) or research.

Privacy: Privacy is the right of an individual to keep information about him/herself from being disclosed. Patients are in control of their health information and they decide who has access. The individual’s privacy is protected through various laws.

Confidentiality: Confidentiality describes how health professionals deal with the patient’s information once it has been disclosed to them. This involves a relationship of trust because the information during the professional interaction is given with the understanding that it will not be divulged to others unless previously agreed to for treatment decisions, payment of services, or other uses.

Duty of care: A health professional’s duty of care to patients is protected by law. What is not clear presently in relation to genetics and genomics is the health professional’s duty of care to other family members who share the same DNA and so the same risks as patients (discussed further in the text).

be as high as 50% for autosomal dominant disorders). The example usually described is familial adenomatous polyposis (FAP), an autosomal dominant disorder with a precancerous polyp phase prior to the inevitable development of colon cancer (Chapter 7). If detected at the polyp stage, this disorder is potentially curable by surgical resection of the colon, but if missed, colon cancer will develop and eventually kill the patient. The penetrance for developing colon cancer in FAP is close to 100%. It is also agreed that the best person to inform at-risk relatives about the DNA test result is the patient, and this is what usually occurs. This should be discussed early in the consent process. However, in the uncommon circumstance that the patient refuses to divulge this important information, the health practitioner is placed in a dilemma, because of privacy issues and his or her obligation to confidentiality to the patient. Presently, the physician’s duty of care to relatives is not well defined, as illustrated below (see Figures 10.3 and 10.4).

[Pedigree diagram: three generations (I to III); a priori risks of 50% (generation II) and 25% (generation III) are marked.]

FIGURE 10.3 DNA testing for a late onset autosomal dominant familial cancer that is life threatening but treatable if detected early. It has 100% penetrance. A female has this cancer confirmed on DNA testing (I-1). The a priori risks for her son (II-1) are 50% and granddaughter (III-1) 25%. A dilemma arises if the granddaughter wishes to know whether she carries the mutation for this cancer (precipitated in this example because she is pregnant). Her estranged father and brother do not know about the family history. Predictive DNA testing is possible by looking for the grandmother’s mutation. If the granddaughter has this mutation it means her father must have the cancer gene, and it increases the risk for her brother (III-3) from 25% to 50%. The physician is in a difficult position because: (1) Does duty of care extend beyond the granddaughter who is the patient? (2) Is there an ethical or legal obligation for the physician to notify the two family members about the risks for a treatable cancer if the granddaughter refuses to contact them with this information?

The US courts have given mixed messages, from two cases which involved genetic forms of thyroid cancer and FAP. The court in the former case ruled that the physician had a duty to warn family members about the genetic form of thyroid cancer, but it was sufficient to ask the patient to do so. In contrast, the court dealing with the colon cancer case gave a different ruling – i.e. the physician had a duty to warn at-risk family members directly even though the patient had specifically asked his physician not to tell others about the cancer [18]. These are civil cases and so even if the physician is deceased, the plaintiff can sue the estate, which is what happened in the colon cancer example. Hence, it is important to delineate the physician’s duty of care in genetics or genomics cases, because the predictive nature of DNA testing means disputes may not become evident for many years into the future.

In the UK, this matter has been dealt with through a 2011 report on genetic testing and the sharing of genetic information. The report covers many topics in detail, but is vague in its recommendations on how a health professional deals with a serious genetic disorder involving refusal by a patient to notify genetic relatives [19].

In Australia, the 2003 Essentially Yours report mentioned in Table 10.1 recommended a change in the Commonwealth privacy law so that physicians could disclose genetic information to genetic relatives in the circumstances


where disclosure is necessary to lessen or prevent a serious threat to an individual’s life, health or safety even when the threat is not imminent [9]. The imminent part was important because predictive DNA tests generally provide information that would be relevant to some future event rather than an immediate one.

Noteworthy in the above recommendation and the subsequent change in the Commonwealth Privacy Act were the following points:

1. It is not mandatory for physicians to disclose the information to genetic relatives;
2. A genetic relative is not defined, making it difficult for a physician to know when to stop screening at-risk family members;
3. Risks about reproduction because an individual might be born with a genetic disorder are not covered, and
4. The physician’s obligation for confidentiality remained because the change was to the Privacy Act.

For the amended law to be operational it was necessary for guidelines to be issued to assist physicians in deciding when and how to disclose genetic information to genetic relatives [8].

[Pedigree diagram with (CAG)n repeat sizes: parents 25/20 and 43/12; tested daughter 25/18; untested daughter marked ?.]

FIGURE 10.4 DNA genetic test providing unwanted information. A male with chorea is diagnosed to have Huntington disease confirmed by DNA testing (Table 2.4). He has two daughters who will be at 50% risk. One asymptomatic daughter has a predictive DNA test while the second, indicated by a ?, does not know about the family history. The daughter who is tested has (CAG)n repeats of 25 and 18. Once it is confirmed that this is not a laboratory or blood collection error, the results show: (1) She does not have Huntington disease, and (2) The likelihood of non-paternity. This is assumed because the 25 repeat in the daughter has come from the mother, and her 18 has come from another person since her putative father has repeats of 43 and 12. Non-paternity would need to be confirmed by a panel of DNA markers (Chapter 9) and is problematic particularly as the family also needs to deal with the father’s diagnosis. The second dilemma relates to duty of care and what actions (if any) are needed to deal with the sister’s risk, although the fact that there is no curative treatment for Huntington disease distinguishes this case from the example in Figure 10.3.

Discrimination and Stigmatization

For the individual, DNA testing has the potential to detect health problems early. However, this must be balanced with the risk of discrimination or stigmatization, particularly as DNA genetic testing can have a mystique about it since it is looking at a very personal matter in terms of our genetic makeup or inheritance. The early, poorly designed sickle cell screening programs in the USA led to discrimination and stigmatization, as discussed in Chapter 6.

Workplace Screening

The types of workplace DNA genetic tests were reviewed in Chapter 6. The concern regarding this type of test is that it might lead to loss of employment. An example is the case of the Burlington Northern Santa Fe Railroad, which DNA tested employees claiming compensation for carpal tunnel syndrome as a work related problem. Testing was undertaken without their knowledge or consent. The company subsequently settled out of court when the case was

taken to the US Equal Employment Opportunity Commission. The fact that the above example has been quoted by a number of sources might even be a positive message, i.e. this is not a major or systemic problem. In the USA, the 2008 Genetic Information Nondiscrimination Act (Table 10.1) ensures that it is illegal for employers to consider genetic information and DNA testing results in job selection, hiring or assigning jobs or determining eligibility of premiums for health insurance [10].

Risk Rated Insurance

The life insurance industry is usually risk rated, meaning decisions are based on evidence from actuarial modeling to determine risk and so the probability of death. Life insurance is then denied or provided for an agreed amount of money. If the industry was subsidized by government and so could offer universal coverage it would be described as community rated. As a business, the life insurance industry needs to be commercially viable and so it is allowed to discriminate, provided that what it does is based on evidence. Thus, family and medical history are taken into consideration in deciding on whether to insure or add a loading to a policy if an individual has certain risks.

A DNA genetic test is also medical information and may need to be disclosed. This becomes complex and controversial when the test predicts an event that is yet to happen – i.e. predisease as discussed in Chapter 2. In some circumstances, the risk due to a known causative mutation can be predicted accurately (for example Huntington disease) while in others the risk is variable, for example BRCA1 and BRCA2 mutations have incomplete penetrance (Chapter 7).

In the UK, the life insurance industry has voluntarily prohibited the use of DNA genetic testing information for any life insurance policy under £500 000 except for one test (Huntington disease). In contrast, the Australian life insurance industry requires all DNA genetic test results to be disclosed, and considers that the information from genetic tests is no different to other forms of medical information. Recently, it was reported in Australia that there were a few cases where the insurance industry unfairly discriminated against individuals on the basis of DNA test results [20]. The recommendations from this study (as well as the Essentially Yours report [9]) were for better policies and guidelines to be developed so that inappropriate discrimination does not occur. The life insurance industry usually works through a mix of self-regulation and legislative protections for consumers. Therefore, the industry needs to respond to the perception that decisions involving genetic tests are not always based on good evidence. Otherwise, government will act.

On the flip side of the debate is the consideration of an individual who is likely to have a loaded life insurance policy, or is even denied life insurance, because of a family history of a genetic disease. In this circumstance the individual with a DNA genetic test result that has excluded the family-specific mutation from his or her DNA could reasonably expect that life insurance would be obtainable without consideration of that particular genetic risk.

Ethnicity

There is considerable interest in understanding genetic differences between ethnic populations. This has particular relevance to pharmacogenetic-based risks (Chapter 3). At the same time, it is important that this type of genetic information is not used in ways to discriminate against minority groups. While the issue of race and scientific research is not new, it is entering an uncertain phase as more information becomes known about the variability of our genome, and in what ways the differences exist. More work is now needed to avoid problems in the area of race, ethnicity and DNA genetic testing while at the same time acknowledging that populations can have differences in disease predisposition and there is much to be learnt from this [21].
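The risk figures quoted in this section and in the captions to Figures 10.3 and 10.4 rest on simple Mendelian rules: each transmission of an autosomal dominant mutation has a 50% chance, disease risk is carrier risk scaled by penetrance, and a child inherits one allele from each parent. The following Python sketch works through those numbers. It is illustrative only: the function names are hypothetical and not taken from the text, and it assumes an accurate family history with no new mutations and no laboratory or sample error.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def a_priori_carrier_risk(transmissions: int) -> Fraction:
    """Carrier risk for a descendant of a known carrier of an autosomal
    dominant mutation: 1/2 for each parent-to-child transmission."""
    return HALF ** transmissions


def disease_risk(carrier_risk: Fraction, penetrance: Fraction) -> Fraction:
    """Risk of developing the disorder = carrier risk x penetrance.
    The penetrance value is supplied by the caller (an assumption here)."""
    return carrier_risk * penetrance


def consistent_with_parents(child, mother, father) -> bool:
    """True if one of the child's two alleles could have come from the
    mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


if __name__ == "__main__":
    # Figure 10.3: grandmother (I-1) is a confirmed carrier.
    son = a_priori_carrier_risk(1)            # II-1
    granddaughter = a_priori_carrier_risk(2)  # III-1
    print("Son:", son, "Granddaughter:", granddaughter)

    # If the granddaughter tests positive, her father must carry the
    # mutation, so her brother (III-3) is one transmission from a carrier.
    brother_after_test = a_priori_carrier_risk(1)
    print("Brother after sister's positive test:", brother_after_test)

    # 100% penetrance, as stated in the Figure 10.3 caption.
    print("Granddaughter disease risk:", disease_risk(granddaughter, Fraction(1)))

    # Figure 10.4: (CAG)n repeat sizes.
    mother, father, daughter = (25, 20), (43, 12), (25, 18)
    print("Daughter consistent with both parents:",
          consistent_with_parents(daughter, mother, father))
```

Exact fractions are used so that the 1/2 and 1/4 figures are not blurred by floating-point rounding. An inconsistency flagged by the allele check is only a prompt for further work: as the Figure 10.4 caption notes, sample error must first be excluded and non-paternity would need confirmation with a panel of DNA markers.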


Genetic Screening

Government and funding bodies expect research findings to be moved more efficiently along the translational pipeline so that they can impact on clinical care earlier. The danger here is that the gap between what can be detected by DNA testing and what can be done with this knowledge continues to grow. This is relevant to genetic-based DNA screening in several areas:

• Newborn screening to prevent treatable genetic disorders.
• Carrier screening to determine risks for couples planning a family.
• Population screening for identifying individual risks, workplace risks.
• Reproductive screening via prenatal testing, preimplantation genetic diagnosis (PGD) or non-invasive prenatal diagnosis discussed below.

Each attempts to make an early diagnosis by DNA testing so that various interventions become possible. Ethical issues related to genetic screening have already been highlighted in terms of the potential for discrimination or stigmatization. Other scenarios and related ELSI follow.

Newborn Screening

This has been discussed in Chapter 6. Two contentious issues are:

1. Costs versus benefits. This should be measurable and a consideration in establishing newborn screening programs. Hence, it is surprising that there can be significant differences in the range of genetic disorders screened even within the same country, and
2. Level of consent required. There are differences between jurisdictions with some taking the view that newborn screening is of sufficient public health benefit that it becomes mandatory and consent is not required. Others take a completely different view and expect written informed consent. It is difficult to see why these two inconsistencies persist if newborn screening is an important public health preventive measure.

Non-invasive Prenatal Diagnosis (NIPD)

Non-invasive prenatal testing is new, and is offered in a number of countries as well as direct-to-consumer for early detection of fetal sex, usually seven or more weeks after conception. This test is based on an earlier observation that there are fetal cells circulating in the maternal circulation during pregnancy. Next, it was shown that there is free fetal DNA in the mother’s blood and this can be analyzed by PCR. The test, when performed optimally (after seven weeks gestation using real-time quantitative PCR, Q-PCR), has high sensitivity and specificity [22]. It can be used earlier than non-invasive ultrasound to identify male sex. However, identifying DNA belonging to a male in blood from a female is technically easier than looking for mutations in genes. A 2009 report notes that NIPD should be available for a wider range of routine clinical testing in the near future, and makes recommendations on what preparations are necessary, including issues of consent [23].

Apart from allowing an earlier and non-invasive approach, NIPD has appeal because it will be accessible to a wider number of women, particularly in rural and remote areas and in developing countries. Concerns about NIPD are:

1. It is presently not reliable for genetic testing. Further evaluation is needed before it replaces chorion villus sampling (first trimester) or amniocentesis (second trimester) for detecting genetic disorders, and
2. It has the potential to be used in the longer term to test for non-medical or trivial medical or social indications.

Family Balancing

Sex selection is used to diagnose or prevent severe X-linked disorders or for social reasons to ensure that newborns are of the preferred sex. The latter is called family balancing. The

standard sex ratio for newborn boys to girls is around 105–107 to 100, whereas in countries like China and India where there is a preference for males this ratio is closer to 120 to 100. Family balancing is contentious because:

1. Limited resources are used to undertake non-medically-related DNA testing;
2. The test is an example of sex discrimination, because one sex (usually male) is preferred over the other;
3. There is uncertainty about long-term societal effects if the trend leading to a predominance of males continues, and
4. The slippery slope argument – i.e. sexing now, and when the genes for other traits (personality, performance and so on) are found, these will be requested.

Family balancing has kept a relatively low profile. Those who can afford it have PGD or sperm sorting so they do not have to terminate a pregnancy. Others must rely on chorion villus sampling or amniocentesis and a possible termination of pregnancy. Some ethicists have defended family balancing for various reasons, including individual autonomy and the right of a couple to choose their baby’s sex. Nevertheless, sex selection outside of a medical indication is saying that one sex is preferable to the other, which would not be acceptable if dealing with children or adults. An ethical perspective of this subject is found in [24].

Vulnerable Groups

A dilemma arises when predictive DNA testing is requested for children who cannot give informed consent because of age and/or understanding of what is being done. In these circumstances consent will be given by parents or guardians. Relief of parental anxiety is generally not considered a sufficient reason for testing, nor are non-medical indications such as life-style planning. An acceptable medical indication for predictive DNA testing is a disorder for which early intervention will improve the prognosis or treatment.

Familial hypertrophic cardiomyopathy is an autosomal dominant disorder, usually presenting in adult life but also associated with sudden cardiac death at any age, often in association with strenuous activity. DNA testing of a child who is at risk because of a family history has a number of potential benefits:

1. Excluding a family-specific DNA mutation causing this disorder (50% of the time) means the child is no longer at risk and intensive follow-up is not necessary. The child can then participate in competitive sports, and
2. Finding a causative mutation can lead to better surveillance and, if necessary, inserting an implantable defibrillator. This can be a life-saving preventive step as the cause of sudden cardiac death is usually a ventricular arrhythmia which can be rapidly cardioverted with this device.

An overview of DNA testing in children is given in [8,19].

Genetic DNA testing in indigenous peoples or individuals from communities with different cultural backgrounds requires additional care to acknowledge their particular beliefs and address the risk of racial discrimination. It is difficult to recommend a particular approach because this will be influenced by the group involved. The 2010 American Society of Human Genetics Presidential Address considered the implications of genetics and genomics in indigenous populations, and highlighted examples where research had failed because it was not culturally competent. It gave examples where similar research was successful because it involved an equal partnership between researchers and indigenous peoples [25]. Reference was also made to the Canadian Institutes of Health Research (CIHR) and its 2010 guidelines document on health research involving aboriginal people. This document enunciates principles that should guide the way research is planned and then conducted with indigenous peoples (Table 10.5).


TABLE 10.5 CIHR guidelines for health research involving Aboriginal people [26].

No. Article

1. A researcher should understand and respect Aboriginal world views, including responsibilities to the people and culture that flow from being granted access to traditional or sacred knowledge. To the extent possible these should be incorporated into research agreements.

2. A community’s jurisdiction over the conduct of research should be understood and respected.

3. Communities should be given the option of a participatory-research approach.

4. A researcher who proposes to carry out research that touches on traditional or sacred knowledge of an Aboriginal community, or on community members as Aboriginal people, should consult the community leaders to obtain their consent before approaching community members individually. Once community consent has been obtained, the researcher will still need the free, prior and informed consent of the individual participants.

5. Concerns of individual participants and their community regarding anonymity, privacy and confidentiality should be respected and addressed in a research agreement.

6. The research agreement should, with the guidance of community knowledge holders, address the use of the community’s cultural knowledge and sacred knowledge.

7. Aboriginal people and their communities retain their inherent rights to any cultural knowledge, sacred knowledge, and cultural practices and traditions, which are shared with the researcher. The researcher should also support mechanisms for the protection of such knowledge, practices and traditions.

8. Community and individual concerns over, and claims to, intellectual property should be explicitly acknowledged and addressed in the negotiation prior to starting the research project. Expectations regarding intellectual property rights of all parties involved in the research should be stated in the research agreement.

9. Research should be of benefit to the community as well as to the researcher.

10. A researcher should support education and training of Aboriginal people in the community, including training in research methods and ethics.

11.
   1. A researcher has an obligation to learn about, and apply, Aboriginal cultural protocols relevant to the Aboriginal community involved in the research.
   2. A researcher should, to the extent reasonably possible, translate all publications, reports and other relevant documents into the language of the community.
   3. A researcher should ensure that there is ongoing, accessible and understandable communication with the community.

12.
   1. A researcher should recognize and respect the rights and proprietary interests of individuals and the community in data and biological samples generated or taken in the course of the research.
   2. Transfer of data and biological samples from one of the original parties to a research agreement, to a third party, requires consent of the other original party(ies).
   3. Where the data or biological samples are known to have originated with Aboriginal people, the researcher should consult with the appropriate Aboriginal organizations before initiating secondary use.
   4. Secondary use requires formal review.

13. Biological samples should be considered on loan to the researcher unless otherwise specified in the research agreement.

14. An Aboriginal community should have an opportunity to participate in the interpretation of data and the review of conclusions drawn from the research to ensure accuracy and cultural sensitivity of interpretation.

15. An Aboriginal community should, at its discretion, be able to decide how its contributions to the research project should be acknowledged. Community members are entitled to due credit and to participate in the dissemination of results. Publications should recognize the contribution of the community and its members as appropriate, and in conformity with confidentiality agreements.

MOLECULAR MEDICINE 318 10. Ethical, Legal and Social Issues (ELSI)

OVERSIGHT

In the 1970s, the rapid developments in recombinant DNA (rDNA) technologies were matched by growing concerns in the general and scientific communities. A conference was convened at Asilomar (USA) in 1975 to address these issues. Subsequently, regulatory and funding bodies developed guidelines for rDNA work. These dealt with the type of experiments allowable, and the necessity to use both vectors (in the form of plasmids) and hosts (bacteria) that were safe and could be contained within laboratories certified to undertake rDNA work. Guidelines began to be relaxed during the late 1970s and early 1980s when it became apparent that the technology was safe and was being carried out responsibly. However, government and private funding bodies insisted that a form of monitoring be maintained, which has continued to this day. What this shows is that significant concerns about a new scientific direction can be effectively dealt with through a process of consultation, negotiation and regulation.

Advances in molecular medicine, particularly in DNA genetic testing, have occurred steadily with frequent media updates giving the public glowing but at times unrealistic expectations of what the new discoveries would bring. DNA genetic testing has had minimal oversight or even regulation in some jurisdictions, and it has been left to the scientists and clinicians to be the drivers abetted by a willing media. Consequently, a number of problems have emerged:

• DNA genetic tests are now part of routine clinical care, yet for many there is little evidence of their clinical utility. Internationally there is no strong consensus on what is the best way to evaluate these tests (Chapter 3). While we struggle with a single genetic test and how valuable it is to clinical decision making or clinical care, the next wave of genomic DNA tests is starting to emerge, and there is even less insight into how their clinical utility will be measured.
• The driver for DNA genetic and genomic testing is industry, and as platforms emerge that can measure DNA or other analytes faster and cheaper, new applications for testing are found. These increase the gap between what we know and what can be done to prevent or treat diseases.
• The direct-to-consumer (DTC) DNA testing marketplace and its continuing expansion (Chapter 5).
• Another challenge for the future will be the ELSI of whole genome sequencing (Chapter 4), commented on below.

Regulation and Self-regulation

Having come this far in molecular medicine, it is too late for a moratorium to review and plan forward progress. What will happen is a mix of regulation by government and self-regulation by the industry following input from communities, professional bodies or research funders to ensure that progress continues and problems are addressed. One example is the DTC DNA testing industry which to date has worked under different standards to those expected in clinical practice. Now the tide may be turning after considerable inertia from the regulators (Chapter 5).

The rapid advances in molecular medicine and the negative impact these can have on laboratory practices were illustrated by suboptimal standards practiced in some forensic laboratories (Chapter 9). An important ethical and legal issue in laboratory and clinical practice is the obligation to ensure high standards, and for this quality assurance programs have been developed. Deficiencies in these activities may reflect reluctance by laboratories to participate but are more likely to be the result of external pressures to start a new diagnostic test. These may arise because of self-interest, financial gains, building a track record for grant funding or lobby groups wanting a test out in the market place in case it provides

an additional option in clinical care. Thus, expansion to increase the quantity of testing can take priority over the quality of the results. This is potentially more of a problem in research laboratories that also provide a clinical service.

Ultimately, data provided by a laboratory must be of the highest standard possible. Various jurisdictions are now looking at DNA genetic testing, and ways to update regulations to make it mandatory to validate and evaluate these tests in a manner comparable to what is required for conventional pathology assays.

Industry and Gene Patents

An important initiative of the Human Genome Project was to build up the private-public link so that the expertise and resources from both worlds would drive discovery and ultimately clinical care (Chapter 1). Not surprisingly in an intense and competitive environment, tensions have arisen. The most contentious issue has been gene patents.

A patent is an intellectual property right acquired by the inventor of a new, inventive and useful product or process. The purpose of a patent is to encourage an inventor to place an invention in the public domain in exchange for certain rights for a limited period (usually 20 years). The patent allows the inventor to:

1. Stop others from exploiting the invention during the life of the patent;
2. Exploit the patent, and
3. License the patent to others.

The criteria required before a patent is granted include:

1. Appropriate subject matter;
2. A novel or new invention;
3. An inventive (non-obvious) or innovative step must be involved, and
4. Usefulness (utility) must be demonstrated.

Within the above framework there are differences in interpretation. For example, the Nuffield Council on Bioethics claims that the inventiveness step on patents for DNA sequences is easier to satisfy in the USA. In contrast, the utility component of a patent is more strictly observed in the USA. In some jurisdictions, notably the European Union, ethical considerations may exclude a patent from being filed. In contrast, countries like the USA, Canada and Australia make no provision for ethical and social considerations in the patent process. Countries are also bound by international treaties, such as the World Trade Organization (TRIPS) agreement, and these restrict what can be done with a patent. For example, compulsory licensing can be invoked by government in certain circumstances, but the TRIPS agreement requires appropriate compensation to be paid.

The monopoly granted by a patent also overrides anti-competitive legislation, and this is particularly concerning for DNA genetic testing because restricted licensing could impact negatively on quality assurance as well as access to testing. This is what happened with multiple BRCA1, BRCA2 patents held by Myriad® Genetics and its decision that all DNA testing for breast cancer would be conducted in its USA-based laboratory. In Australia, an exclusive license was given to one company which also claimed sole rights to all DNA testing in Australia. There were two issues here. First, the very restrictive licensing meant costs were likely to rise (and so accessibility would become an issue) while lack of competition might lead to a lowering of standards. Second, a more fundamental issue was the patenting of isolated DNA or gene sequences as these were seen to be part of nature. Many doubted that inventiveness could be claimed.

Much has been written about the lack of an inventive step if naturally-derived DNA sequences or genes are patented. However, others would argue that once a natural substance like DNA was taken and changes made to it, inventiveness could be claimed. Whatever the merits of these two positions, there is little

doubt that in the early days of molecular medicine many patents were granted with dubious inventiveness being displayed. This is often seen whenever there is a new wave of technology that is unfamiliar to patent examiners. The situation improves as the examiners gain more experience, or it could be avoided by better training.

The Myriad® story captured the interest of the media and the ire of the community, then governments. A case was taken to the European Court in 2008, which ruled against the company but then had to modify its ruling following an appeal. In 2010 a case was brought to the US District Court against Myriad® and the US Patent and Trademark Office [27]. Myriad® lost this case because the court considered that genes could not be patented as they were products of nature. However, Myriad® narrowly won its appeal in mid 2011, as the majority judges’ view was that DNA isolated in discovering the BRCA1, BRCA2 genes was not natural but cDNA. The case was taken to the US Supreme Court which subsequently referred the matter back to the US Federal Court because of a recent patent ruling the Supreme Court had made.

There are other alternatives to litigation when challenging a patent. One involves the taking out of a defensive patent (Table 10.6). Some legal experts have claimed that the patent system in respect to genomic research is healthy because it has been possible to protect the public interest by defensive patents. However, the costs in taking out and then defending these are considerable; hence the option is available to few

TABLE 10.6 Patents – variations, issues and dilemmas [28].

Patent issue | Explanation
Dependent patent | Patent on an invention that cannot be exploited without encroaching on an earlier patent (dominant patent).
Blocking patent | Patents used to inhibit developments by others.
Defensive patent | Patents taken out to prevent others from patenting. Examples: (1) A European charitable organization took out a patent on BRCA2 in an attempt to stop Myriad® in the USA. (2) US Centers for Disease Control and Prevention patented the SARS viral sequence to stop other patents but allowed non-profit organizations free access.
Patent thicket | Multiplicity of overlapping patents making it difficult for others to navigate through this web to develop their own new technology.
Royalty stacking | Multiplicity of overlapping patents leading to the need to pay multiple license fees.
Reach through claims | Claims made by patent holders to future intellectual property in new products that might result from the use of a patented invention. These can restrict the licensee’s rights to future inventions that might emerge.
Patent pools | One mechanism to deal with multiplicity of patents. Involves a cooperative arrangement allowing the owners of several patents required for some product to license or assign rights at a single price.
Licensing | Means by which patented technology is legally transferred to others under certain uses and conditions. Unlike laws against anti-competitive practice, a patent is anti-competitive and licensing (with very rare exceptions) is decided solely by the patent holder. The option for exclusive licensing can increase the value of the patent, but diminishes the product’s utility for clinical care.

organizations. However, as has been seen in the Myriad® example, highly experienced legal experts have provided their services pro bono because of the important issues involved. As shown in Table 10.6, patents can be a legal minefield with various options and strategies available to press home an advantage, or alternatively, ensure the competition is disadvantaged.

A few final points about patents:

1. There is considerable discussion about research exemptions in patents, so that research is not inhibited by complex legal issues, and
2. There is always the option for national compulsory licensing. This is rarely invoked but remains a potent stick for government and a way to modify behavior.

Since money is increasingly equated with medical discoveries, the question now being asked is why subjects of research studies are not sharing in the spoils. Previously, research volunteers would have been satisfied with altruism as their motive for participation, but this has changed as they see researchers seeking monetary gains. In this respect, a number of cogent arguments have been made that without the very large pedigrees required for positional cloning, genes such as BRCA1 would not have been discovered.

Despite the problems highlighted regarding patents, some consider them a necessary evil, because without protection for intellectual property, the huge costs of developing new drugs or therapies would be a barrier to progress. A number of very important gene discoveries were made by private companies. The medical and scientific community is now strongly encouraged to derive benefits from the patenting of important discoveries. Patents have become important criteria of the investigator’s productivity and competitiveness in the peer review process for gaining research funding. Many governments establish a direct link between improvements in health and generation of wealth in biotechnology.

Scientific Misconduct

There have been few instances of unethical behavior by workers in molecular medicine. Established and accepted principles for the conduct of research using DNA have been followed. Nevertheless, careful monitoring is needed to ensure that the applications of molecular medicine remain medically, scientifically and ethically sound, so that public trust continues. Molecular medicine is vulnerable because:

1. There are financial rewards to be gained from intellectual property and these could generate perceived or actual conflicts of interest, and
2. The large data sets generated are potentially more difficult to oversee or peer review.

Various forms of scientific misconduct happen, including fabrication (making up results), falsification (manipulating results), plagiarism (stealing ideas from others) and suppression (not revealing data that might impact negatively on one’s own results). An area of growing concern is the number of research papers that are being retracted following publication.

An attempt to quantify how often data are fabricated or falsified was made by a meta-analysis of surveys that asked this question of scientists. The results suggested nearly 2% might have falsified or modified data in some way. Although this finding is disturbing, the comment was made that 2% was probably a conservative figure [29]. While not unique to molecular medicine it is an issue that must be addressed, particularly since modern analytic platforms generate TB (terabytes) and PB (petabytes) of DNA sequence data. How this mass of information is monitored in terms of peer review will be an important consideration.

Graduate students today are less likely to carry out never-ending PCRs or DNA sequencing reactions, as these are usually undertaken by centralized analysis facilities. So in many cases the graduate student is given a stack

of data for interpretation (and even this may have been processed into more meaningful information through various analytic software programs). This move away from the wet-laboratory type postgraduate training to in silico work is exciting, because it challenges the creativity and resourcefulness of the student. However, it is important that there remains some understanding of what has occurred to ensure data continue to be critically evaluated.

CHALLENGES AHEAD

Education and Engagement

Reference was made to the members of the molecular medicine team in Figure 3.18. This is important to emphasize in an ELSI chapter because a core component to sound practice is an ongoing knowledge of the subject and the various changes that are occurring. In a rapidly moving area like molecular medicine it is difficult to maintain up-to-date competence in all facets – research, science, laboratory and clinical issues including counseling as well as skills in eHealth. As has been shown for the treatment of diseases affecting several organs, such as cystic fibrosis, the best approach is through a team so that optimal care can be reliably integrated.

The community has to be engaged and involved in molecular medicine. To do this it has to be better informed. This is another role for the molecular medicine team. While the team is predominantly focused on the patient and the family, it is ideally placed through the primary care (family) physicians and other health professionals to be working with the community to ensure that all are moving in the same direction and with the same vision of what is expected. The community is very positive and generous when it comes to medical research. It is essential that this continues as molecular medicine moves into omics with its increasingly greater need for trust as goals become very ambitious but less obvious.

The education of students, health professionals and the community is a daunting task with much catching-up needed. This was the subject of the 2011 report from the US Secretary’s Advisory Committee on Genetics, Health and Society (SACGHS), which stated that the benefits of the genetics and genomics revolution will only be fully realized if the educational and training needs are met [30]. The report made six recommendations with three clearly targeted at government, which must assume leadership to fill what is a substantial (and growing) gap in knowledge of molecular medicine. If not, the slow translation of research discoveries into the clinic or consulting room will continue.

DNA Theft

Paternity testing is an emotive and contentious area of consent. Professional societies agree that all involved parties need to consent, but this may not be possible in the case of children. Court orders for paternity testing are straightforward, but disputes occur when a putative father takes a sample from the child for paternity testing without the consent of the child or the mother. This biological sample does not have to be blood, but might be shed hair follicles or an object from which DNA can be extracted. Does hair belong to anyone once it is shed? Related to this is the hypothetical scenario involving paternity testing on a high profile individual in the community on DNA obtained from discarded tissues taken without consent. The circumstance of non-consensual DNA testing, even on material that has been shed, has been dealt with in the UK by a change to the Human Tissue Act 2004, which makes this illegal. There is no Federal DNA theft law in Australia or the USA, although it was recommended by an enquiry in the former [9,11]. It will be interesting to follow how the UK DNA theft law works, as generally correcting a problem with a new law in a rapidly changing area like genetics and genomics can result in the creation of other problems.


Whole Genome Sequencing for Patient Care

Whole genome sequencing is a key driver for many medical research projects in cancer and complex genetic disorders. Another front is now opening with whole genome sequencing for direct patient care. ELSI challenges in the latter are many, particularly with reference to the data sets generated and how they will be used for patient care. To some extent progress here will be linked to the development of eHealth initiatives (Chapter 4).

Some views on ELSI and whole genome sequencing for clinical care come from a 2010 report by the Health Council of the Netherlands and in 2011 the PHG Foundation’s Next step in the sequence: the implications of whole genome sequencing for health in the UK [31,32]. A key message is that whole genome sequencing will move DNA diagnostic testing and other types of DNA tests (see Table 3.7) into what is essentially DNA screening since the most likely scenario once costs fall (some predict $100 per whole genome sequence!) is for this test to be used on the healthy individual even as part of the newborn screening strategy, and then re-interrogated as required, to look for mutations in known genes or for personalizing treatment options.

The reports provide interesting glimpses into when whole genome sequencing could be used including adult life or the newborn, the fetus in utero and even screening embryos as part of in vitro fertilization. Filters using appropriate software can be applied to a whole genome sequence so that only material needed or analyzable is extracted. This reduces the unwanted data but even so it is likely that health information will emerge that was not sought or perhaps not wanted by the patient and, let us not forget, the relatives who share some of the DNA.

Research Implications

As whole genome sequences become a routine component of clinical care, there is the potential to generate what is in effect an extensive biobank of data. If obtained with the appropriate consent, and securely linked and curated carefully, such biobanks would become an invaluable resource for research. An analogy could be made with the Guthrie cards collected in the newborn screening program (Chapter 6). These cards are considered a resource for public good. However, following some litigation they are now in danger of being destroyed if they were collected without appropriate consent, and there is fear within the community that the cards could be used inappropriately by government or the police [33]. It might be a little late to save the Guthrie cards already collected, but it would be wise to consider the lessons from newborn screening to avoid similar problems with whole genome sequencing.

Three areas identified as requiring further work and policy development to address ELSI in the NG DNA research sequencing strategies are:

1. Consent;
2. Data sharing, and
3. Return of results [34].

A comment is made that this technology should not be considered an incremental step forward but an order of magnitude leap into vast data sets that will be difficult to interpret.

Direct-to-Consumer DNA Testing

This topic and its associated ELSI were covered in Chapter 5. Two key issues are truth-in-advertising as well as the regulatory requirements for medical genetic testing. The latter has not been resolved in the direct-to-consumer market but the regulators are starting to take notice. The ELSI with DTC DNA testing are complex and will evoke contrasting opinions, but ultimately it is unethical to sell information that might have medical relevance if it is not accurate or still at an uncertain (research) phase in its development. Broad disclaimers get around legal issues but not ethical ones.
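The software filtering mentioned under Whole Genome Sequencing for Patient Care can be pictured with a small sketch. This is only an illustration under stated assumptions, not a description of any actual clinical pipeline: the `Variant` record, the `filter_to_panel` function, the gene panel and the variant entries are all hypothetical and invented for the example. It shows the idea that a report is restricted to the genes the patient has agreed to have analyzed, while the rest of the sequence is kept back for possible later re-interrogation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str      # gene symbol in which the variant lies
    position: int  # illustrative coordinate, not real data
    change: str    # free-text description of the DNA change


def filter_to_panel(variants, consented_genes):
    """Keep only variants in genes covered by the patient's consent."""
    panel = {gene.upper() for gene in consented_genes}
    return [v for v in variants if v.gene.upper() in panel]


# Hypothetical whole genome results (made-up values for illustration).
genome_variants = [
    Variant("BRCA1", 1001, "example change 1"),
    Variant("CFTR", 2002, "example change 2"),
    Variant("APOE", 3003, "example change 3"),
]

# Assumption: the patient consented to analysis of these two genes only.
reported = filter_to_panel(genome_variants, ["brca1", "cftr"])

# Everything else is neither analyzed nor reported at this time.
withheld = [v for v in genome_variants if v not in reported]
```

A filter of this kind narrows what is reported, but it does not remove the underlying ELSI question: the withheld data still exist and could be re-examined later.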


Access and Equity

Delivering personalized medicine through molecular medicine is achievable within the infrastructure of a DNA testing laboratory and a teaching hospital, but it must also reach the broader community, including the disadvantaged, or those in rural and remote regions. One economical and practical way to do this is through eHealth. Apart from electronic health records, there are opportunities for telemedicine and counseling to be provided in novel ways to ensure that those who are disadvantaged (rural and remote communities, developing countries and indigenous groups) can still fully participate. The Internet and its various social interactions provide another source for ensuring access to molecular medicine.

As analytic platforms evolve they will become more compact, and so could be moved from the laboratory to the bedside or consulting office. Small size will be inevitable due to the continuing improvements occurring through nanotechnology. This will provide immediate access to DNA testing, allowing the right medication at the right dose to be individualized for each patient.

Similar developments will occur in crime scene DNA testing. These will reduce the time lag and opportunities for contamination or chain of custody problems. This approach, called point-of-care testing, will require close links with experts located centrally through electronic means. The DTC DNA testing philosophy has been criticized, but we should also be looking at this strategy and how it can be used to deliver more accessible but quality DNA testing services without bypassing the health professionals who can provide advice and assist with interpretation.

Rare Diseases

As discussed in Chapter 6, the benefits emerging from the omics revolution will need to be available to all including those in developing countries. Individuals with rare diseases represent another group that is recognized as having a special need. These are defined by the European Union (EU) as diseases that affect not more than 5 per 10 000 of the population. Although each disease is rare, it is estimated that collectively 1 in 17 people will be affected, and 75% involve children with 30% of rare disease patients dying before their fifth birthday [35]. Most rare diseases are genetic disorders, and the EU lists less common causes as rare cancers, autoimmune disease, congenital malformations, toxic and infectious diseases.

Considerable activity has occurred in the EU in the past decade to address the issue of rare disease, particularly diagnostics and therapies. Many national bodies such as Rare Disease UK have formed. An important new development in the fight against rare diseases has been whole genome analysis strategies, which have already shown usefulness in detecting the underlying cause and, in one example, tailoring treatment to the individual’s disorder (Box 4.8).

Stem Cell Tourism

Apart from the ELSI discussions on the use of human embryos for sourcing stem cells discussed in Chapter 8, a comparable development to DTC DNA testing is now occurring in regenerative medicine, which is expected to become a billion dollar market in the next few years. Patients with severe debilitating and untreatable disorders have been attracted by the promise of cures and have traveled to various clinics, many of which are located offshore, to try stem cell therapies. In these circumstances, vulnerable patients have wasted their money, while others risk long-term health consequences and death.

In response to this the International Society for Stem Cell Research has set up a website which provides guidelines and factual information about stem cell therapies for patients. It is

important for health professionals to be aware of this resource so they can guide interested members of the community to it [36]. There is also a call for more effort to be made by scientists, particularly in relation to their control over stem cells. The commercialization and direct-to-patient advertising of regenerative therapies has not helped to advance the science. However, like DTC DNA testing, it is difficult to regulate particularly when located offshore, and so an educated community both lay and professional is a key priority to address examples of what are unethical and even illegal practices.

Travel to other countries (particularly poor or developing countries) to seek an organ transplant (transplant tourism) or a xenotransplant (xenotourism) are related issues. An individual may resort to the latter because xenotransplantation is banned in his or her country (Chapter 8). Apart from the comments made above for cell therapy tourism, there is the public health issue of safety, particularly in relation to animal-to-human spread of infection.

To address the exploitation of the poor who sell their organs from necessity, and to reduce the potential risks to the recipients, key stakeholders, including the World Health Organization, have issued The Declaration of Istanbul on Organ Trafficking and Transplant Tourism [37]. This was developed from principles in the Universal Declaration of Human Rights and will hopefully provide a guide and some external pressure for ensuring appropriate behavior in tissue and organ transplantation. In 2008, the World Health Organization hosted a meeting of experts, which led to the Changsha Communique [38] dealing with a broad range of regulatory issues in xenotransplantation.

Synthetic Biology

Synthetic biology (synbio), introduced in Chapter 8, hit the headlines in May 2010 when J. C. Venter and colleagues described the first synthetic bacterium (Mycoplasma mycoides JCVI-syn 1.0) which was designed, synthesized and assembled de novo from the information in a genome sequence [39]. This provoked public concern and in response, the US President asked for a report on synbio which was tabled in December 2010 [40]. A year earlier, the European Commission had produced a similar document titled Ethics of Synthetic Biology [41].

The 2010 report New Directions: The ethics of synthetic biology and emerging technologies made 18 recommendations centered around:

1. Funding, research, intellectual property;
2. Risk assessment and monitoring, particularly in relation to field release;
3. Coordination and dialogue both nationally and internationally;
4. Education and ELSI;
5. Biosecurity, and
6. Regulation.

The report did not recommend a moratorium, as happened for genetic engineering in 1974, or formation of new regulatory bodies. It indicated that synbio could be monitored by current agencies, which must also be aware of what is occurring through ongoing audits of work in major research institutions, as well as the smaller so called DIYBiology movement (DIY: Do-it-yourself). The second of these is certainly a big ask!

The report’s pragmatic approach was partly underpinned by the knowledge that it would be prohibitively expensive to make new organisms or products, and so the risks are limited, particularly in the DIY environment. This may not be the case with international bioterror, where expense may not be a limitation. Nevertheless, it was proposed that newly synthesized bacteria could be tagged with unique identifiers, so that they could be traced if used in bioterror. Alternatively, synthetic products could have suicide genes incorporated into them that could be activated if required.

Overall, the US recommendations in a complex and changing environment have been

the use of some control but not changes that might stifle innovation. In contrast, the 2009 European Commission report noted the existing and fragmentary regulatory framework, and questioned whether it was sufficient to meet present and future needs. The many applications of synbio (Chapter 8) are marching forwards using novel approaches to develop biofuels, improve food production or clean up after environmental damage. The potential could be unlimited but with every major leap there will be concerns about risks. The community and scientists need to be reassured that safety and ELSI will not be compromised.

References

[1] Mascalzoni D, Hicks A, Pramstaller P, Wjst M. Informed consent in the genomics era. PloS Medicine 2008;9:e192.
[2] Krleza-Jeric K, Lemmens T. 7th Revision of the Declaration of Helsinki: Good news for the transparency of clinical trials. Croat Medical Journal 2009;50:105–10.
[3] Nuffield Council on Bioethics. http://www.nuffieldbioethics.org/
[4] Presidential Commission for the Study of Bioethical Issues. www.bioethics.gov/documents/synthetic-biology/PCSBI-Synthetic-Biology-Report-12.16.10.pdf; 2010.
[5] UK Human Genetics Commission. www.hgc.gov.uk/Client/index.asp?ContentId1
[6] OECD’s guidelines on human biobanks and genetic research databases. http://www.oecd.org/dataoecd/41/47/44054609.pdf
[7] UNESCO report on consent. http://unesdoc.unesco.org/images/0017/001781/178124e.pdf
[8] National Health & Medical Research Council of Australia – 3 relevant publications: Medical genetic testing: information for health professionals; Biobanks information paper; Guidelines approved under section 95AA of the Privacy Act 1988 (Cth). www.nhmrc.gov.au/your_health/egenetics/index.htm
[9] Essentially Yours: The protection of human genetic information in Australia. www.austlii.edu.au/au/other/alrc/publications/reports/96/
[10] Hudson KL. Genomics, health care and society. New England Journal of Medicine 2011;365:1033–41.
[11] Tamir S. Direct-to-consumer genetic testing: ethical-legal perspectives and practical considerations. Medical Law Review 2010;18:213–38.
[12] Elger BS, Caplan AL. Consent and anonymization in research involving biobanks. EMBO reports 2006;7:661–6.
[13] Tutton R. Biobanking: social, political and ethics aspects. In: Encyclopedia of Life Sciences (ELS). Chichester: John Wiley & Sons, Ltd; 2010.
[14] UK Biobank. www.ukbiobank.ac.uk/
[15] P3G. www.p3g.org/secretariat/
[16] Kaye J, Boddington P, de Vries J, Hawkins N, Melham K. Ethical implications of the use of whole genome methods in medical research. European Journal of Human Genetics 2010;18:398–403.
[17] Lipworth W, Ankeny R, Kerridge I. Consent in crisis: the need to reconceptualize consent to tissue banking research. Internal Medicine Journal 2006;36:124–8.
[18] Overview two cancer genetic legal cases and duty of care. www.dnapolicy.org/resources/Overviewofcourtdecisions_Crockin.pdf
[19] Consent and confidentiality in clinical genetic practice: guidance on genetic testing and sharing genetic information. 2nd ed. www.geneticseducation.nhs.uk/media/47812/report.pdf; 2011.
[20] Barlow-Stewart K, Taylor SD, Treloar SA, Stranger M, Otlowski M. Verification of consumers’ experiences and perceptions of genetic discrimination and its impact on utilization of genetic testing. Genetics in Medicine 2009;11:193–201.
[21] Caulfield T, Fullerton SM, Ali-Khan SE, et al. Race and ancestry in biomedical research: exploring the challenges. Genome Medicine 2009;1:8.
[22] Devaney SA, Palomaki GE, Scott JA, Bianchi DW. Noninvasive fetal sex determination using cell-free fetal DNA: a systematic review and meta-analysis. Journal of the American Medical Association 2011;306:627–36.
[23] Wright C. PHG Foundation report – Cell free fetal nucleic acids for non-invasive prenatal diagnosis. www.phgfoundation.org/download/ffdna/ffDNA_report.pdf; 2009.
[24] Wilkinson S. Sexism, sex selection and “family balancing”. Medical Law Review 2008;16:369–89.
[25] McInnes RR. Culture: The silent language geneticists must learn – genetic research with indigenous populations. American Journal of Human Genetics 2011;88:254–61.
[26] Canadian Institutes of Health Research. Guidelines for health research involving aboriginal people. www.cihr-irsc.gc.ca/e/29134.html
[27] US Federal Court decision on Myriad 2010. www.genomicslawreport.com/wp-content/uploads/2010/03/Myriad-SJ-Opinion.pdf
[28] Australian Law Reform Commission 2004 report on Genes and Ingenuity: Gene Patenting and Human Health. www.austlii.edu.au/au/other/alrc/publications/reports/99/


[29] Fanelli D. How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PloS ONE 2009;4:e5738.
[30] US SACGHS 2011 report on Genetics Education and Training. http://oba.od.nih.gov/oba/SACGHS/reports/SACGHS_education_report_2011.pdf
[31] Health Council of the Netherlands. The “thousand dollar genome”: an ethical exploration. Monitoring Report Ethics and Health, 2010/2. The Hague: Centre for Ethics and Health, 2010. www.gezondheidsraad.nl/en/publications/thousand-dollar-genome-ethical-exploration#a-downloads
[32] Next steps in the sequence: the implications of whole genome sequencing for health in the UK. PHG Foundation 2011. www.phgfoundation.org/reports/10364/
[33] Tarini BA. Storage and use of residual newborn screening blood spots: a public policy emergency. Genetics in Medicine 2011;13:619–20.
[34] Tabor HK, Berkman BE, Hull SC, Bamshad MJ. Genomics really gets personal: how exome and whole genome sequencing challenge the ethical framework of human genetics research. American Journal of Medical Genetics Part A 2011;155:2916–24.
[35] Rare disease UK. http://www.raredisease.org.uk/ See also the EU site with its numerous reports on this subject. http://ec.europa.eu/health/rare_diseases/publications/index_en.htm
[36] International Society for Stem Cell Research patient education website. www.closerlookatstemcells.org//AM/Template.cfm?Section=Home
[37] Declaration of Istanbul on organ trafficking and transplant tourism. Clinical Journal of the American Society of Nephrology 2008;3:1227–31.
[38] WHO Statement on regulatory requirements for xenotransplantation clinical trials (Changsha Communique). www.who.int/transplantation/xeno/en/
[39] Gibson DG, Glass JI, Lartigue C, et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 2010;329:52–6.
[40] Presidential Commission for the Study of Bioethical Issues. New directions: the ethics of synthetic biology and emerging technologies. www.bioethics.gov/documents/synthetic-biology/PCSBI-Synthetic-Biology-Report-12.16.10.pdf; 2010.
[41] European Commission. Ethics of synthetic biology. http://ec.europa.eu/bepa/european-group-ethics/docs/opinion25_en.pdf; 2009.

Note: All web-based references accessed on 19 March 2012.
