Computational Methods for Analysis of Single Molecule Sequencing Data

Computational Methods for Analysis of Single Molecule Sequencing Data

by Ehsan Haghshenas
M.Sc., University of Western Ontario, 2014
B.Sc., Isfahan University of Technology, 2012

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the School of Computing Science, Faculty of Applied Sciences

© Ehsan Haghshenas 2020
SIMON FRASER UNIVERSITY
Spring 2020

Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.

Approval

Name: Ehsan Haghshenas
Degree: Doctor of Philosophy (Computing Science)
Title: Computational Methods for Analysis of Single Molecule Sequencing Data
Examining Committee:
Chair: Diana Cukierman, University Lecturer
Binay Bhattacharya, Senior Supervisor, Professor
S. Cenk Sahinalp, Co-Supervisor, Senior Investigator, Center for Cancer Research, National Cancer Institute
Cedric Chauve, Co-Supervisor, Professor
Faraz Hach, Co-Supervisor, Assistant Professor, Department of Urologic Sciences, The University of British Columbia; Senior Research Scientist, Vancouver Prostate Centre
Martin Ester, Internal Examiner, Professor
Mihai Pop, External Examiner, Professor, Department of Computer Science, University of Maryland, College Park
Date Defended: March 26, 2020

Abstract

Next-generation sequencing (NGS) technologies paved the way for a significant increase in the number of sequenced genomes, both prokaryotic and eukaryotic. This increase provided an opportunity for considerable advancement in genomics and precision medicine. Although NGS technologies have proven their power in many applications, such as de novo genome assembly and variation discovery, computational analysis of the data they generate is still far from perfect. The main limitation of NGS technologies is their short read length relative to the lengths of (common) genomic repeats.
Today, newer sequencing technologies (known as single-molecule sequencing, or SMS), such as Pacific Biosciences and Oxford Nanopore, produce significantly longer reads, making it theoretically possible to overcome the difficulties imposed by repeat regions. For instance, for the first time, a complete human chromosome was fully assembled using ultra-long reads generated by Oxford Nanopore. Unfortunately, long reads generated by SMS technologies are characterized by a high error rate, which prevents their direct use in many standard downstream analysis pipelines and poses new computational challenges. This motivates the development of new computational tools specifically designed for SMS long reads.

In this thesis, we present three computational methods tailored for SMS long reads. First, we present lordFAST, a fast and sensitive tool for mapping noisy long reads to a reference genome. Mapping sequenced reads to their potential genomic origin is the first fundamental step for many computational biology tasks. As an example, we show that lordFAST can be employed successfully in structural variation discovery. Next, we present CoLoRMap, which tackles the high rate of base-level errors in SMS long reads by correcting them with a complementary set of NGS short reads. This integrative use of SMS and NGS data is known as the hybrid technique. Finally, we introduce HASLR, an ultra-fast hybrid assembler that uses reads generated by both technologies to efficiently generate accurate genome assemblies. We demonstrate that, compared with the other tested assemblers, HASLR is not only the fastest but also produces the fewest misassemblies on all samples. Furthermore, the contiguity and accuracy of its assemblies are on par with those of the other tools on most samples.
Keywords: Computational biology; Single-molecule sequencing; PacBio; Oxford Nanopore; Long read mapping; Hybrid error correction; Hybrid assembly

Dedication

To my family, with love

Acknowledgements

First and foremost, I would like to express my sincerest gratitude to my supervisors, Dr. Cenk Sahinalp, Dr. Cedric Chauve, Dr. Faraz Hach, and Dr. Binay Bhattacharya, for their constant support, guidance, and patience throughout my PhD studies. I was honored to have the opportunity to work with such brilliant scholars, from whom I learned critical thinking and the proper way of doing research. In addition, I would like to give my regards and appreciation to Dr. Jens Stoye, my host and supervisor during my visit at Bielefeld University. This visit greatly influenced the direction of my work on hybrid assembly.

I would also like to thank Dr. Mihai Pop and Dr. Martin Ester, my external and internal examiners, for their careful review of my thesis. I appreciate their invaluable discussions, comments, and suggestions, which helped me improve the thesis. I want to give special thanks to Dr. Diana Cukierman, who graciously agreed to chair my examining committee.

During my PhD, I was also involved in a few research projects that are not included in this thesis. I had wonderful and valuable experiences in these collaborative projects. Regarding these collaborations, I would like to thank Dr. Salem Malikic, Michael Ford, Hossein Asghari, Sean La, and Farid Rashidi Mehrabadi. I also offer enduring gratitude to all past and present members of the Lab for Computational Biology and Bioinformatics at Simon Fraser University, as well as the Hach Lab at the Vancouver Prostate Centre. In particular, I thank Dr. Yen-Yi Lin, Iman Sarrafi, Dr. Ibrahim Numanagic, Ermin Hodzic, Can Kockan, Dr. Raunak Shrestha, Baraa Orabi, Tunc Morova, and Fatih Karaoglanoglu, who all made the work environment a more pleasant one.
My special thanks go to Baraa Orabi and Elie Ritch for their help with proofreading the thesis. In addition, I would like to thank all members of the Genome Informatics research group at Bielefeld University in Germany, including Omar Castillo, Konstantinos Tzanakis, Eyla Willing, Georges Hattab, Tizian Schulz, Guillaume Holley, Nina Luhmann, Liren Huang, Lu Zhu, Markus Lux, Linda Sundermann, and Tina Zekic, who made my visit to Bielefeld University such a great experience.

I am grateful to many other friends I made in Vancouver, including Abdollah Safari, Sina Bahrasemani, Sajjad Gholami, Mehran Khodabandeh, Hedayat Zarkoob, Mohsen Jamali, Soheil Horr, Shahram Zaheri, Amirmasoud Ghasemi, Abraham Hashemian, Hashem Jeihooni, Shahram Pourazadi, Mohammad Mahdavian, Majid Talebi, Saeed Mirazimi, Hassan Shavarani, Mahdi Nemati Mehr, Sima Jamali, Nazanin Mehrasa, Ramtin Mehdizadeh, Sina Salari, Ali Afsah, Chakaveh Ahmadizadeh, Mahsa Gharibi, Mohammad Akbari, Akbar Rafiey, Saeed Izadi, Saeid Asgari, Hossein Sharifi-Noghabi, Sepehr MohaimenianPour, Sara Daneshvar, Hooman Zabeti, Sara Jalili, Mohammad Mazraeh, Marjan Moodi, and many more. All these amazing people made Vancouver a true home.

Last but not least, I would like to thank my loving family for all their support during these years. Exceptional thanks go to my wonderful wife, Rana, who definitely made a significant contribution to this thesis with her continuous support and patience.

Table of Contents

Approval
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
1 Introduction
  1.1 Contributions
  1.2 Organization of the thesis
2 Background and Related Work
  2.1 Single-molecule sequencing technologies
    2.1.1 Pacific Biosciences
    2.1.2 Oxford Nanopore Technology
    2.1.3 Synthetic long reads
  2.2 Definitions and Notations
  2.3 Long Read Mapping
  2.4 Error correction of long noisy reads
    2.4.1 Hybrid correction
    2.4.2 Self-correction
  2.5 de novo genome assembly
    2.5.1 Hybrid assembly
    2.5.2 Non-hybrid assembly
    2.5.3 wtdbg2
3 Long read mapping
  3.1 Methods
    3.1.1 Overview
    3.1.2 Stage One: Reference Genome Indexing
    3.1.3 Stage Two: Read Mapping
  3.2 Results
    3.2.1 Experiment on a simulated dataset without structural variations
    3.2.2 Simulation in presence of structural variations
    3.2.3 Experiment on a real dataset
  3.3 Summary
4 Hybrid error correction of long reads
  4.1 Methods
    4.1.1 Overview
    4.1.2 Initial correction of long reads: the SP algorithm
    4.1.3 Correcting gaps using One-End Anchors
  4.2 Results
    4.2.1 Data and computational setting
    4.2.2 Measures of evaluation
    4.2.3 Comparison based on alignment
    4.2.4 Comparison based on assembly
  4.3 Comparison with more recent hybrid correction tools
  4.4 Summary
5 Hybrid assembly of long reads
  5.1 Methods
    5.1.1 Obtaining unique short read contigs
    5.1.2 Construction of backbone graph
    5.1.3 Graph cleaning and simplification
    5.1.4 Generating the assembly
    5.1.5 Methodological remarks
  5.2 Results
    5.2.1 Experiment on simulated dataset
    5.2.2 Experiment on real dataset
  5.3 Summary
6 Conclusion
  6.1 Future directions
  6.2 Recommended guidelines
Bibliography
Appendix A lordFAST Material
  A.1 Data
    A.1.1 Real data
    A.1.2 Synthetic data
  A.2 Software
  A.3 Command details
Appendix B CoLoRMap Material
  B.1 Data
Appendix C HASLR Material
  C.1 Data
    C.1.1 Simulated data
    C.1.2 Real data
  C.2 Software
  C.3 Command details
  C.4 Visual examples of regions assembled only by HASLR without any misassembly or fragmentation

List of Tables

Table 3.1 Comparison between different tools capable of mapping PacBio long reads on the simulated human dataset. This dataset contains 25,000 reads and 183.61 million bases. Best results are marked with bold typeface.
Table 3.2 Runtime and memory usage of same table.
Table 3.3 Comparison between different tools capable of mapping PacBio long reads on the simulated human dataset.