Sequence Assembly and Mapping with MIRA 5 I

Sequence assembly and mapping with MIRA 5 i Sequence assembly and mapping with MIRA 5 The Definitive Guide Sequence assembly and mapping with MIRA 5 ii Copyright © 2018 Bastien Chevreux This documentation is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Sequence assembly and mapping with MIRA 5 iii COLLABORATORS TITLE : Sequence assembly and mapping with MIRA 5 ACTION NAME DATE SIGNATURE WRITTEN BY Bastien Chevreux January 25, 2019 Extensive review of Jacqueline Weber January 25, 2019 early reference manual Extensive review of Andrea Hörster January 25, 2019 early reference manual Draft for section on Katrina Dlugosch January 25, 2019 preprocessing of ESTs in EST manual REVISION HISTORY NUMBER DATE DESCRIPTION NAME Sequence assembly and mapping with MIRA 5 iv Contents 1 Introduction to MIRA 1 1.1 What is MIRA? . .1 1.2 What to read in this manual and where to start reading? . .1 1.3 The MIRA quick tour . .2 1.4 For which data sets to use MIRA and for which not . .3 1.4.1 Genome de-novo . .3 1.4.2 Genome mapping . .4 1.4.3 De-novo ESTs / RNASeq . .4 1.4.4 Mapping EST / RNASeq . .4 1.5 Any special features I might be interested in? . .4 1.5.1 MIRA learns to discern non-perfect repeats, leading to better assemblies . .4 1.5.2 MIRA has integrated editors for data from Sanger, 454, IonTorrent sequencing . .7 1.5.3 MIRA lets you see why contigs end where they end . 11 1.5.4 MIRA tags problematic decisions in hybrid assemblies . 13 1.5.5 MIRA for polishing of PacBio or ONT assemblies . 14 1.5.6 MIRA allows older finishing programs to cope with amount data in Illumina mapping projects . 14 1.5.7 MIRA tags SNPs and other features, outputs result files for biologists . 16 1.5.8 MIRA has ... much more . 18 1.6 Versions, Licenses, Disclaimer and Copyright . 18 1.6.1 Versions . 18 1.6.2 License . 19 1.6.2.1 MIRA . 19 1.6.2.2 Documentation . 19 1.6.3 Copyright . 19 1.6.4 External libraries . 19 1.7 Getting help / Mailing lists / Reporting bugs . 19 1.8 Author . 20 1.9 Miscellaneous . 20 1.9.1 Citing MIRA . 20 1.9.2 Postcards, gold and jewellery . 20 Sequence assembly and mapping with MIRA 5 v 2 Installing MIRA 21 2.1 Where to fetch MIRA . 21 2.1.1 Package naming and versioning scheme . 21 2.1.2 SourceForge: source package and precompiled binaries . 21 2.1.3 GitHub: git repository, source, and binary packages . 22 2.2 Installing from a precompiled binary package . 22 2.3 Integration with third party programs (gap4, consed) . 23 2.4 Compiling MIRA yourself . 23 2.4.1 Prerequisites . 23 2.4.2 Compiling and installing . 24 2.4.3 Configure arguments for MIRA . 24 2.4.3.1 Configure arguments for the BOOST library . 24 2.4.3.2 MIRA specific configure arguments . 24 2.5 Installation walkthroughs . 25 2.5.1 (K)Ubuntu 18.04 . 25 2.5.2 openSUSE 12.1 . 26 2.5.3 Fedora 17 . 26 2.5.4 Mac OSX . 27 2.5.5 Compile everything from scratch . 28 2.5.6 Dynamically linked MIRA . 28 2.6 Compilation hints for other platforms. 29 2.6.1 NetBSD 5 (i386) . 29 2.7 Notes for distribution maintainers / system administrators . 29 2.7.1 Additional data files . 29 3 Preparing data 30 3.1 Introduction . 30 3.2 Sanger . 30 3.3 Roche / 454 . 30 3.4 Illumina . 30 3.5 Ion Torrent . 31 3.6 Short Read Archive (SRA) . 31 4 Cookbook: de-novo genome assemblies 32 4.1 Introduction . 32 4.2 General steps . 32 4.2.1 Copying and naming the sequence data . 33 4.2.2 Writing a simple manifest file . 33 4.2.3 Starting the assembly . 34 Sequence assembly and mapping with MIRA 5 vi 4.3 Manifest files for different use cases . 34 4.3.1 Manifest for shotgun data . 34 4.3.2 Assembling with multiple sequencing technologies (hybrid assemblies) . 34 4.3.3 Manifest for data sets with paired reads . 35 4.3.4 De-novo with multiple strains . 37 5 Cookbook: Mapping 38 5.1 Introduction . 38 5.2 General steps . 38 5.2.1 Copying and naming the sequence data . 39 5.2.2 Copying and naming the reference sequence . 39 5.2.3 Writing a simple manifest file . 40 5.2.4 Starting the assembly . ..

Load more