Next-Generation DNA Sequencing Informatics, 2Nd Edition
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
De Bruijn Graph Based De Novo Genome Assembly
2160 JOURNAL OF SOFTWARE, VOL. 9, NO. 8, AUGUST 2014 De Bruijn Graph based De novo Genome Assembly Mohammad Ibrahim Khan Computer Science and Engineering, Chittagong University of Engineering and Technology Chittagong, Bangladesh [email protected] Md. Sarwar Kamal Computer Science and Engineering, Chittagong University of Engineering and Technology Chittagong, Bangladesh [email protected] Abstract—The Next Generation Sequencing (NGS) is an and microorganism. Now –a-days, numbers of unique important process which assures inexpensive organization idea and vocational attempt have been established on the of vast size of raw sequence data set over any traditional road to the evolution of new techniques to overcome the sequencing systems or methods. Various aspects of NGS like problems of Sanger. The continuing innovation of DNA, template preparation, sequencing imaging and genome RNA and Proteins sequencing techniques have alignment and assembly outlines the genome sequencing and accompanied the development of sequencing system alignment .Consequently, deBruijn Graph (dBG) is an important mathematical tool that graphically analyzes how under the environment of sudden reduced expenditures the orientations are constructed in groups of nucleotides. and better outcomes in performance comparing with the Basically, de Bruijn graph describe the formation of the systems of tow three years back. Various sequencers genome segments in a circular iterative fashions. Some from some established methods like 454 Life Science or pivotal de Bruijn graph based de novo algorithms and Roche or Illumina and Applied Biosystems are in software package like T-IDBA, Oases, IDBA-tran, Euler, developments and Helicos have appeared in application. Velvet, ABySS, AllPaths, SOAPdenovo and SOAPdenovo2 All these sequencers will produce more and more data have illustrated here. -
What Can the Miseq & NGS Core Do for Your Research?
What Can the MiSeq & NGS Core Do for Your Research? Grant Hill Library Prep Specialist [email protected] © 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners. Agenda ! Into/basic principles of NGS (terms & technology) ! Small genome sequencing ! Targeted resequencing (custom panels) ! 16s metagenomics & amplicon sequencing ! Small RNA & targeted rna expression ! Experimental design & local resources 2 Enhanced Focus on the Sample to Answer Integration From library prep to downstream informatics & knowledge generation Library Prep Sequence Answer 3 For Research Use Only. Not for use in diagnostic procedures. The Flow Cell Where the magic happens ! Everything except sample preparation is completed on the flow cell •" Template annealing (1 - 384 samples) •" Template amplification •" Sequencing primer hybridization •" Sequencing-by-synthesis reaction •" Generation of fluorescent signal MiSeq NextSeq HiSeq 4 Flow Cell Surface 8 channels Simplified workflow Surface of flow cell ! Clusters in a coated with a lawn of oligo pairs contained environment (no need for clean rooms) ! -
Sequence Alignment/Map) Is a Text Format for Storing Sequence Alignment Data in a Series of Tab Delimited ASCII Columns
NGS FILE FORMATS SEQUENCE FILE FORMATS FASTA FORMAT FASTA Single sequence example: >HWI-ST398_0092:1:1:5372:2486#0/1 TTTTTCGTTCTTTTCATGTACCGCTTTTTGTTCGGTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGAT ACGTAGCAGCAGCATCAGTACGACTACGACGACTAGCACATGCGACGATCGATGCTAGCTGACTATCGATG Multiple sequence example: >Sequence Name 1 TTTTTCGTTCTTTTCATGTACCGCTTTTTGTTCGGTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGAT ACGTAGCAGCAGCATCAGTACGACTACGACGACTAGCACATGCGACGATCGATGCTAGCTGACTATCGATG >Sequence Name 2 ACGTAGACACGACTAGCATCAGCTACGCATCGATCAGCATCGACTAGCATCACACATCGATCAGCATCACGACTAGCAT AGCATCGACTACACTACGACTACGATCCACGTACGACTAGCATGCTAGCGCTAGCTAGCTAGCTAGTCGATCGATGAGT AGCTAGCTAGCTAGC >Sequence Name 3 ACTCAGCATGCATCAGCATCGACTACGACTACGACATCGACTAGCATCAGCAT SEQUENCE FILE FORMATS FASTQ FORMAT FASTQ Text based format for storing sequence data and corresponding quality scores for each base. To enable a one-one correspondence between the base sequence and the quality score the score is stored as a single one letter/number code using an offset of the standard ASCII code. Quality scores range from 0 to 40 and represent a log10 score for the probability of being wrong. E.g. score of 30 => 1:1000 chance of error SEQUENCE FILE FORMATS FASTQ FORMAT FASTQ Each fastq file contain multiple entries and each entry consists of 4 lines: 1. header line beginning with “@“ and sequence name 2. sequence line 3. header line beginning with “+” which can have the name but rarely does 4. quality score line SEQUENCE FILE FORMATS FASTQ FORMAT FASTQ @HWI-ST398_0092:6:73:5372:2486#0/1 TTTTTCGTTCTTTTCATGTACCGCTTTTTGTTCGGTTAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGAT -
Macvector 13.5 Workshop
MacVector 13 Workshop September 2014 MacVector 13.5 for Mac OS X Getting Started with MacVector 13.5 Support: 1 866 338 0222 +44 (0) 1223 410552 [email protected] MacVector 13 Workshop 1 MacVector 13 Workshop September 2014 MacVector Resources Tutorials A number of tutorials are available for download; http://www.macvector.com/downloads.html Videos http://www.macvector.com/Screencasts/screencasts2.html Manual There is a downloadable PDF version of the manual (12.0) at http://www.macvector.com/downloads.html#MacVector12UserGuide Discussion Forums To post questions or follow ongoing discussions, check out the user forums at; http://www.macvector.com/phpbb/index.php Copyright Statement Copyright MacVector, Inc, 2014. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. This version of the workshop guide was published in November -
Evidence of Selection at the Ramosa1 Locus During Maize Domestication
Molecular Ecology (2010) 19, 1296–1311 doi: 10.1111/j.1365-294X.2010.04562.x Evidence of selection at the ramosa1 locus during maize domestication BRANDI SIGMON and ERIK VOLLBRECHT Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA Abstract Modern maize was domesticated from Zea mays parviglumis, a teosinte, about 9000 years ago in Mexico. Genes thought to have been selected upon during the domestication of crops are commonly known as domestication loci. The ramosa1 (ra1) gene encodes a putative transcription factor that controls branching architecture in the maize tassel and ear. Previous work demonstrated reduced nucleotide diversity in a segment of the ra1 gene in a survey of modern maize inbreds, indicating that positive selection occurred at some point in time since maize diverged from its common ancestor with the sister species Tripsacum dactyloides and prompting the hypothesis that ra1 may be a domestication gene. To investigate this hypothesis, we examined ear phenotypes resulting from minor changes in ra1 activity and sampled nucleotide diversity of ra1 across the phylogenetic spectrum between tripsacum and maize, including a broad panel of teosintes and unimproved maize landraces. Weak mutant alleles of ra1 showed subtle effects in the ear, including crooked rows of kernels due to the occasional formation of extra spikelets, correlating a plausible, selected trait with subtle variations in gene activity. Nucleotide diversity was significantly reduced for maize landraces but not for teosintes, and statistical tests implied directional selection on ra1 consistent with the hypothesis that ra1 is a domestication locus. In maize landraces, a noncoding 3¢-segment contained almost no genetic diversity and 5¢-flanking diversity was greatly reduced, suggesting that a regulatory element may have been a target of selection. -
Macvector 10.6 2 Macvector User Guide Copyright Statement
MacVector 10.6 2 MacVector User Guide Copyright statement Copyright MacVector, Inc, 2008. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. MacVector User Guide 3 4 MacVector User Guide 1 Introduction to the User Guide ....................................19 Overview....................................................................................19 The MacVector documentation set.............................................20 About this user guide..............................................................20 Conventions in this user guide...................................................20 Interface conventions .............................................................21 Navigation aids ..........................................................................21 2 Introduction to MacVector.............................................23 -
DNA Sequencing
Contig Assembly ATCGATGCGTAGCAGACTACCGTTACGATGCCTT… TAGCTACGCATCGTCTGATGGCAATGCTACGGAA.. C T AG AGCAGA TAGCTACGCATCGT GT CTACCG GC TT AT CG GTTACGATGCCTT AT David Wishart, Ath 3-41 [email protected] DNA Sequencing 1 Principles of DNA Sequencing Primer DNA fragment Amp PBR322 Tet Ori Denature with Klenow + ddNTP heat to produce + dNTP + primers ssDNA The Secret to Sanger Sequencing 2 Principles of DNA Sequencing 5’ G C A T G C 3’ Template 5’ Primer dATP dATP dATP dATP dCTP dCTP dCTP dCTP dGTP dGTP dGTP dGTP dTTP dTTP dTTP dTTP ddCTP ddATP ddTTP ddGTP GddC GCddA GCAddT ddG GCATGddC GCATddG Principles of DNA Sequencing G T _ _ short C A G C A T G C + + long 3 Capillary Electrophoresis Separation by Electro-osmotic Flow Multiplexed CE with Fluorescent detection ABI 3700 96x700 bases 4 High Throughput DNA Sequencing Large Scale Sequencing • Goal is to determine the nucleic acid sequence of molecules ranging in size from a few hundred bp to >109 bp • The methodology requires an extensive computational analysis of raw data to yield the final sequence result 5 Shotgun Sequencing • High throughput sequencing method that employs automated sequencing of random DNA fragments • Automated DNA sequencing yields sequences of 500 to 1000 bp in length • To determine longer sequences you obtain fragmentary sequences and then join them together by overlapping • Overlapping is an alignment problem, but different from those we have discussed up to now Shotgun Sequencing Isolate ShearDNA Clone into Chromosome into Fragments Seq. Vectors Sequence 6 Shotgun Sequencing Sequence Send to Computer Assembled Chromatogram Sequence Analogy • You have 10 copies of a movie • The film has been cut into short pieces with about 240 frames per piece (10 seconds of film), at random • Reconstruct the film 7 Multi-alignment & Contig Assembly ATCGATGCGTAGCAGACTACCGTTACGATGCCTT… TAGCTACGCATCGTCTGATGGCAATGCTACGGAA. -
Computational Methods for De Novo Assembly of Next-Generation Genome Sequencing Data Rayan Chikhi
Computational methods for de novo assembly of next-generation genome sequencing data Rayan Chikhi To cite this version: Rayan Chikhi. Computational methods for de novo assembly of next-generation genome sequencing data. Other [cs.OH]. École normale supérieure de Cachan - ENS Cachan, 2012. English. NNT : 2012DENS0033. tel-00752033 HAL Id: tel-00752033 https://tel.archives-ouvertes.fr/tel-00752033 Submitted on 14 Nov 2012 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THÈSE / ENS CACHAN - BRETAGNE présentée par sous le sceau de l’Université européenne de Bretagne Rayan Chikhi pour obtenir le titre de Préparée à l’Unité Mixte de Recherche 6074 DOCTEUR DE L’école norMALE suPÉRIEURE de CACHAN Institut de recherche en informatique Mention : Informatique École doctorale MATISSE et systèmes aléatoires Thèse soutenue le 2 juillet 2012 Computational Methods devant le jury composé de : Éric RIVALS, for de novo Assembly of DR, LIRMM / rapporteur Sante GNERRE, Next-Generation Genome Group leader, Broad Institute of MIT and Harvard / rapporteur Marie-France SAGOT, -
Clustering and Apple
Apple in Research Rajiv Pillai Power of UNIX. Simplicity of Macintosh. Mac OS X: The easy way to be open Comand Line Interface FreeBSD 5 Editors Commands and Utilities Shells Scripting languages The Best Foundation The Best Foundation Secure Scalable Open standards High performance Rock-solid stability Advanced networking Built on Open Source Over 100 Open Source Projects Apple Confidential Modern Languages • GCC 3.3 • Perl 5.8.1 • Python 2.3 • PHP 4.3.2 • TCL 8.4.2 • Ruby 1.6.8 • Bash 2.05 Integrated X11 Quartz window Runs side-by- manager side with native applications (or full screen) Accelerated graphics Launch from Finder Dock menu A Whole New World of Solutions Bringing Mac OS X into new markets Scientific Distributed Enterprise Mathematica Platform LSF Oracle 10g MATLAB Globus Sybase BLAST Sun Grid Engine HP OpenView HMMER MPI SAP client GROMACS PBS SAS GeneSpring Myrinet JBoss (J2EE) PyMol Infiniband Tomcat (JSP) IBM XL Fortran iNquiry Axis (SOAP) Mac OS X The Best of Both Worlds Open Like Linux Convenient Like Windows Open Source Shrink-wrap solutions Open Standards Fits in to existing networks Open APIs & Applications Single point of support Runs all the Apps a Scientist Needs A single system on their desk • Their favorite GUI applications (e.g. Mathematica, Gaussian, Vector NTI, TurboWorx, more) • And their favorite UNIX applications (e.g. Phred/Phrap, HMMer, BLAST, Smith-Waterman, more) • And their favorite productivity tools (e.g. Photoshop, Microsoft Office, Outlook email client) • All run simultaneously on Mac OS X (Yes! Side by side) Apple Confidential Over 100 installations since March Academic Government and Commercial Harvard University Naval Medical Research Center Isis Pharmaceuticals Stanford University Scripps Research Institute Cincinnati Children’s Hospital Cornell University Children’s Mercy Hospital U.S. -
Macvector 12.5 Getting Started Guide
MacVector 12.5 Getting Started Guide Copyright statement Copyright MacVector, Inc, 2011. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. 2 Table of Contents COPYRIGHT STATEMENT ................................................................................ 2 INTRODUCTION.................................................................................................. 4 GETTING SEQUENCE INFORMATION INTO MACVECTOR ...................... 4 IMPORTING SEQUENCE FILES .............................................................................4 CREATING NEW SEQUENCES...............................................................................4 OPENING SEQUENCES FROM ENTREZ ...............................................................5 VIEWING AND EDITING SEQUENCES........................................................... -
Whole-Genome Sequencing of Bacterial Pathogens
REVIEW crossm Whole-Genome Sequencing of Bacterial Downloaded from Pathogens: the Future of Nosocomial Outbreak Analysis Scott Quainoo,a Jordy P. M. Coolen,b Sacha A. F. T. van Hijum,c,d Martijn A. Huynen,c Willem J. G. Melchers,b Willem van Schaik,e http://cmr.asm.org/ Heiman F. L. Wertheimb Department of Microbiology, Radboud University, Nijmegen, The Netherlandsa; Department of Medical Microbiology, Radboud University Medical Centre, Nijmegen, The Netherlandsb; Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlandsc; NIZO, Ede, The Netherlandsd; Institute of Microbiology and Infection, University of Birmingham, Birmingham, United Kingdome SUMMARY ...................................................................................1016 Published 30 August 2017 on August 31, 2017 by TECH KNOWLEDGE CTR OF DENMARK INTRODUCTION .............................................................................1017 Citation Quainoo S, Coolen JPM, van Hijum OUTBREAK DEFINITION ....................................................................1018 SAFT, Huynen MA, Melchers WJG, van Schaik CONVENTIONAL MOLECULAR CHARACTERIZATION METHODS ......................1018 W, Wertheim HFL. 2017. Whole-genome Non-Amplification-Based Typing Technologies ..........................................1018 sequencing of bacterial pathogens: the future Restriction fragment length polymorphism methods ................................1018 of nosocomial outbreak analysis. Clin Microbiol Matrix-assisted laser desorption -
Alternate-Locus Aware Variant Calling in Whole Genome Sequencing Marten Jäger1,2, Max Schubach1, Tomasz Zemojtel1,Knutreinert3, Deanna M
Jäger et al. Genome Medicine (2016) 8:130 DOI 10.1186/s13073-016-0383-z RESEARCH Open Access Alternate-locus aware variant calling in whole genome sequencing Marten Jäger1,2, Max Schubach1, Tomasz Zemojtel1,KnutReinert3, Deanna M. Church4 and Peter N. Robinson1,2,3,5,6* Abstract Background: The last two human genome assemblies have extended the previous linear golden-path paradigm of the human genome to a graph-like model to better represent regions with a high degree of structural variability. The new model offers opportunities to improve the technical validity of variant calling in whole-genome sequencing (WGS). Methods: We developed an algorithm that analyzes the patterns of variant calls in the 178 structurally variable regions of the GRCh38 genome assembly, and infers whether a given sample is most likely to contain sequences from the primary assembly, an alternate locus, or their heterozygous combination at each of these 178 regions. We investigate 121 in-house WGS datasets that have been aligned to the GRCh37 and GRCh38 assemblies. Results: We show that stretches of sequences that are largely but not entirely identical between the primary assembly and an alternate locus can result in multiple variant calls against regions of the primary assembly. In WGS analysis, this results in characteristic and recognizable patterns of variant calls at positions that we term alignable scaffold-discrepant positions (ASDPs). In 121 in-house genomes, on average 51.8 ± 3.8 of the 178 regions were found to correspond best to an alternate locus rather than the primary assembly sequence, and filtering these genomes with our algorithm led to the identification of 7863 variant calls per genome that colocalized with ASDPs.