Next Generation Genomics: The DNA Revolution

June 2020

Leonard C. Mitchell, CFA

Table of Contents

Introduction
Precedents for Hyperbolic Change
The Promise of Genetic Engineering to Reduce Human Suffering
To Understand Genomics, Understand Proteins
Big Pharma and The Era of Personalized Medicine
The Human Microbiome
Epigenetics and Disease
Genomics and Cancer
Viruses, Vaccines and Cures
Reading and Writing the Genetic Code
The Blueprint for Life
The Race to Map the Human Genome
Then There Was CRISPR
How and Why CRISPR?
The Need for Faster Sequencing
Hacking the Human Genome
The Life Science Industry - Mergers and Acquisitions
Artificial Intelligence and Big Data
Bioengineering Agricultural Engineering
Industrial Scale Bio-Manufacturing
Conclusion and Risks
About the Author

This is the third in a series on developing technologies that will drive our economy and soon change our lives. Two technologies in particular, Artificial Intelligence, covered earlier, and Genomics, will be the most impactful in shaping life in the 21st century.

“We are running on the last ounces of gas in the philosophical gas tank, we are facing philosophical bankruptcy, the new challenges, especially for the new technologies. Climate change and nuclear war are kind of easy challenges, because we know what to do about them, we need to prevent them, it’s very easy. Not all agree on what to do, but nobody says in principle, we should have more of these. But with AI and genomic engineering there are no agreed goals. The dream of some people are the nightmares of others. We don’t even have the philosophical basis to discuss it.”
Yuval Noah Harari (author of Sapiens) and Steven Pinker (psychologist, author) in conversation.
INTRODUCTION

“We have got to the point in human history where we simply do not have to accept what nature has given us.”
Jay Keasling, professor of biochemical engineering, UC Berkeley, in The New Yorker, 2009

The study of genomics is rapidly changing the medical field. George Church, Harvard professor of genetics and author of Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, tells the story of a child named Nic in Madison, Wisconsin, who began having intestinal problems before the age of two. The physicians at Children’s Hospital in Milwaukee failed to find the cause. After Nic had undergone more than 100 surgeries before he turned four, his chief physician determined that the symptoms were so severe and uncontrolled that the problem had to be genetic. Four months after Nic’s genome was sequenced, his doctors identified the gene responsible for his illness. Nic had a rare disease that required a bone marrow transplant. The transplanted stem cells from cord blood of a matched healthy donor replaced his defective immune system with a functioning one. Thanks to genetic science, Nic is healthy today.

Advancements in genetics mean we can now read, remove, copy, and replace defective genes. It may seem as unimaginable today as iPads and smartphones would have been to our grandparents 100 years ago, but our understanding of genomics means that we are on the verge of being able to engineer disease resistance in our cells, grow molecular computers, and program bacteria to produce any chemical humans need. A diagnosis of a genetic disease or cancer will soon no longer be a death sentence. It will be possible for pharmaceutical companies to customize drugs for your unique biome. New drugs will be formulated in half the time and at half the cost. Personalized medicine will replace the one-size-fits-all pharmacology of today. Human cells will be reconfigured to resist common viruses; no more colds or flus.
The descendants of today’s humans will be stronger, smarter, more beautiful, and healthier while living well into their 100s. In less than a decade, farmers will have crops that fertilize themselves. Domestic animals and pets will be creatures that do not exist today. The largest databases will be stored on DNA chips. The precursors for all of these advancements are found in modern laboratories today.

PRECEDENTS FOR HYPERBOLIC CHANGE

“Our forebears expected the future to be pretty much like their present, which had been pretty much like their past. Although exponential trends did exist a thousand years ago, they were at that very early stage where an exponential trend is so flat that it looks like no trend at all. So their lack of expectations was largely fulfilled. Today, in accordance with the common wisdom, everyone expects continuous technological progress and the social repercussions that follow. But the future will be far more surprising than most observers realize: few have truly internalized the implications of the fact that the rate of change itself is accelerating. Most long-range forecasts of technical feasibility in future time periods dramatically underestimate the power of future technology because they are based on what I call the “intuitive linear” view of technological progress rather than the “historical exponential view.” To express this another way, it is not the case that we will experience a hundred years of progress in the twenty-first century; rather we will witness on the order of twenty thousand years of progress (at today’s rate of progress, that is).”
Ray Kurzweil, American inventor, author, and futurist

HYPERBOLIC CHANGE - A VISUAL

For most of history, human existence remained pretty much the same from one generation to the next. Changes wrought by the introduction of steel and later petroleum, along with advancements in science and engineering, changed life forever.
These charts show that the rate of change can at first be gradual and then suddenly hyperbolic. The combination of artificial intelligence and genomics in this century will change us as rapidly and shake the foundations of society. Later sections of this paper will explain why this change is upon us and which companies and industries will be affected.

THE PROMISE OF GENETIC ENGINEERING TO REDUCE HUMAN SUFFERING

“Computer science is going to evolve rapidly, and medicine will evolve with it. This is coevolution.”
Larry Norton, cancer specialist at Memorial Sloan-Kettering Cancer Center working with IBM’s Watson

DNA is a remarkable chemical. It is so stable and dependable that it is found in fossils that are hundreds of thousands of years old. DNA exists only to reproduce itself and does so with remarkable accuracy, on average with only one error or mutation for every billion letters copied. When they occur, genetic errors can create mutant proteins, most of which are benign and quickly dispatched by the body’s immune cells; however, some mutant proteins survive to cause problems. If there are too many errors, the cell dies; too few and the organism cannot adapt to changes in its environment. Genetic errors are the price organisms pay for evolution. When mutations are beneficial, such as when they result in stronger muscles or regenerative liver tissue, the organism is more likely to live longer and pass these improvements to offspring.

Tragically, genetic errors are the most common reason for infant death. Genetic errors are also responsible for 19% of deaths in hospital pediatric intensive care units, half of all pediatric long-term care, and over half of all end-of-life admissions.

Genetic errors, abnormalities in the genome, come in several varieties. A genetic error is monogenic when it involves a single mutated gene, polygenic when it involves more than one gene, and chromosomal when it involves a complete chromosomal abnormality.
Errors can take the form of an extra codon (duplication), a codon where it should not be (insertion), a missing codon (deletion), or a codon that repeats too many times. Other genetic abnormalities include missing genes, translocations, and extra or missing chromosomes. Genetic mutations can be inherited or can occur spontaneously.

Most of us are familiar with one or two genetic diseases. In fact, there are over 6,000 known genetic disorders. Around 65% of people have one or more disorders that affect their health. Well-known genetic diseases include Huntington’s, Crohn’s, Down syndrome, Fabry, Gaucher, hemophilia, Marfan syndrome, microcephaly, cystic fibrosis, albinism, Alzheimer’s, muscular dystrophy, Tay-Sachs, primary pulmonary hypertension, mental retardation, and sickle cell anemia.

The deletion of just three bases from the CFTR gene on chromosome 7 is sufficient to cause cystic fibrosis (CF). CF patients have difficulty breathing, constantly cough up mucus, and have frequent lung infections. So far there is no known cure. Sickle cell anemia causes frequent, severely painful attacks when misshapen red blood cells become trapped in veins. Its source is the substitution of just one letter in the HBB gene sequence that codes for hemoglobin.

Some biomedical platforms have developed new DNA-, RNA-, and cell-based therapies with broad potential applications that offer a glimmer of hope for treating life-threatening diseases that, just a few decades ago, had no cure. The Japanese pharmaceutical company Astellas Pharma Inc. recently announced plans to buy Audentes Therapeutics for $3 billion, a 110% premium to its market value. Astellas believed it was worth it to access gene therapy that included three compounds that cause mutated sections of the genetic code to be ignored during the translation phase, ignoring the mutation that leads to Duchenne muscular dystrophy.

There has also been some success in the battle against another monstrous genetic disease, lymphoma. Lymphoma is a group of blood cancers sourced from lymphocytes. The most common is non-Hodgkin’s lymphoma. With immunotherapies designed to combat the disease recently receiving regulatory approval, help is on the way.