Library Preparation for High Throughput DNA Sequencing Anders Jemt
© Anders Jemt
Stockholm 2017

Royal Institute of Technology (KTH)
School of Biotechnology
Division of Gene Technology
Science for Life Laboratory
SE-171 65 Solna
Sweden

Printed by Universitetsservice US-AB
Drottning Kristinas väg 53B
SE-100 44 Stockholm
Sweden

ISBN 978-91-7729-212-8
TRITA-BIO Report 2017:1
ISSN 1654-2312

Anders Jemt (2017): Library Preparation for High Throughput DNA Sequencing, Division of Gene Technology, School of Biotechnology, Royal Institute of Technology (KTH), Stockholm, Sweden

Abstract

Arrange 3 billion base pairs of DNA in the correct order and you get the blueprint of a human, the genome. Before the introduction of massively parallel sequencing a little more than a decade ago, it would cost around $10 million to get this blueprint. Since then, sequencing throughput has soared and costs have plummeted: that figure is now around $1000, and large sequencing centres such as the National Genomics Infrastructure in Stockholm are sequencing the equivalent of 25 human genomes per hour. The papers that form the basis of this thesis cover different aspects of the rapidly expanding sequencing field. Paper I describes a model system that employs massively parallel sequencing to characterize the behaviour of type IIS restriction enzymes. Enzymes are biological macromolecules that catalyse chemical reactions in the cell. All commercially available sequencing systems use enzymes to prepare the nucleic acids before they are loaded on the machine. Thus, intimate knowledge of enzymes is vital not only when designing new sequencing protocols, but also for understanding the limitations of current protocols. Paper II covers the automation of a library preparation protocol for spatially resolved transcriptome sequencing. Automation increases the sample throughput and also minimises the risk of human errors that can introduce technical noise in the data.
In paper III, the power of massively parallel sequencing is employed to describe the RNA content of the endometrium at two different time points during the menstrual cycle. Finally, paper IV covers the sequencing of highly degraded nucleic acids from formalin-fixed, paraffin-embedded samples. These samples often have a rich clinical background, making them extremely valuable for researchers. However, it is challenging to sequence these samples, and this study looks at the impact that different preparation kits have on the quality of the sequencing data.

Keywords: DNA, RNA, sequencing, massively parallel sequencing, library preparation, automation, genome, transcriptome

Sammanfattning (Summary)

The human genome, our hereditary material, consists of just over 3 billion base pairs which, if arranged correctly, form a blueprint of a human being. Until the introduction of massively parallel sequencing a little more than 10 years ago, producing this blueprint cost around 10 million dollars. Since then, the cost of sequencing has fallen to the point where a human genome can today be sequenced for as little as 1000 dollars. Large sequencing centres such as the National Genomics Infrastructure in Stockholm today sequence material corresponding to 25 human genomes per hour. This book is based on four articles that cover several different aspects of a rapidly expanding sequencing field. Article I describes a model system for studying type IIS enzymes. The system uses massively parallel sequencing, which allows the function of the enzyme to be studied with enormous precision. Enzymes are biological macromolecules that the cell uses to catalyse chemical reactions. Today, all commercially available systems use enzymes to prepare nucleic acids (DNA and RNA) for sequencing. It is therefore important to understand how enzymes work when new DNA sequencing protocols are developed, and also in order to understand the limitations of current protocols. Article II describes the automation of a protocol used to generate spatially resolved sequencing libraries from RNA. Automation is an important part of increasing sample throughput and of minimising technical variation that risks obscuring biological differences. In Article III, massively parallel sequencing of DNA is used to examine the RNA content of the endometrium at two time points during the menstrual cycle. Article IV concerns the sequencing of degraded nucleic acids from formalin-fixed, paraffin-embedded tissue. These samples often have a rich clinical history, which makes them extremely interesting to researchers. However, sequencing these samples is very challenging, and the study therefore examines how different kits affect the quality of the sequencing data.

List of publications

This thesis is based upon the work presented in the following publications.

I. Lundin S*, Jemt A*, Terje-Hegge F, Foam N, Pettersson E, Käller M, Wirta V, Lexow P, Lundeberg J. Endonuclease specificity and sequence dependence of type IIS restriction enzymes. PLoS One 2015; 10: e0117059.

II. Jemt A, Salmén F, Lundmark A, Mollbrink A, Navarro JF, Ståhl PL, Yucel-Lindberg T, Lundeberg J. An automated approach to prepare tissue-derived spatially barcoded RNA-sequencing libraries. Sci Rep 2016; 6: 37137.

III. Sigurgeirson B*, Åhmark A*, Jemt A, Ujvari D, Westgren M, Lundeberg J, Gidlöf S. Comprehensive RNA sequencing of healthy human endometrium at two time points of the menstrual cycle. Biology of Reproduction (in press).

IV. Jemt A, Ewels P, Hammarén R, Gruselius J, Borgström E, Wang C, Holmberg K, Lundin P, Lundin S, Gréen H, Lundeberg J, Käller M. Evaluation of methods for whole genome and transcriptome sequencing from nanograms of FFPE samples. Manuscript.

*These authors contributed equally to this paper

Related publication

Zhang M, Schmidt T, Jemt A, Sahlén P, Sychugov I, Lundeberg J, Linnros J.
Nanopore arrays in a silicon membrane for parallel single-molecule detection: DNA translocation. Nanotechnology 2015; 26: 314002.

Contents

INTRODUCTION
NUCLEIC ACIDS, PROTEINS, CELLS AND TISSUES
    DNA
    RNA
    PROTEINS
    CELLS, TISSUES AND ORGANS
SEQUENCING TECHNOLOGIES
    MASSIVELY PARALLEL SEQUENCING
        454 Life Sciences
        Illumina
        SOLiD
        Ion Torrent
        GeneReader, Qiagen
        Complete Genomics/BGISEQ
    SINGLE-MOLECULE SEQUENCING
        Pacific Biosciences
        Oxford Nanopore Technologies
    UPCOMING TECHNOLOGIES
        GenoCare, Direct Genomics
        GENIUS, GenapSys
        Genia, Roche
        Firefly, Illumina
        Hyb & Seq, Nanostring Technologies
        Solid-state nanopore sequencers
PREPARING NUCLEIC ACIDS FOR SEQUENCING
    EXTRACTION
        Challenging source material
    DNA LIBRARY PREPARATION
        Fragmentation
        Modification and Ligation
        Amplification and Enrichment
        Reaction clean up and size selection
    VARIANTS OF DNA SEQUENCING