INFERNAL User's Guide
Sequence analysis using profiles of RNA sequence and secondary structure consensus
http://eddylab.org/infernal
Version 1.1.4; Dec 2020

Eric Nawrocki and Sean Eddy
for the INFERNAL development team
https://github.com/EddyRivasLab/infernal/

Copyright (C) 2020 Howard Hughes Medical Institute.

Infernal and its documentation are freely distributed under the 3-Clause BSD open source license. For a copy of the license, see http://opensource.org/licenses/BSD-3-Clause.

Infernal development is supported by the Intramural Research Program of the National Library of Medicine at the US National Institutes of Health, and also by the National Human Genome Research Institute of the US National Institutes of Health under grant number R01HG009116. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Contents

1 Introduction
   How to avoid reading this manual
   What covariance models are
   Applications of covariance models
   Infernal and HMMER, CMs and profile HMMs
   What’s new in Infernal 1.1
   How to learn more about CMs and profile HMMs
2 Installation
   Quick installation instructions
   System requirements
   Multithreaded parallelization for multicores is the default
   MPI parallelization for clusters is optional
   Using build directories
   Makefile targets
   Why is the output of 'make' so clean?
   What gets installed by 'make install', and where?
   Staged installations in a buildroot, for a packaging system
   Workarounds for some unusual configure/compilation problems
3 Tutorial
   The programs in Infernal
   Files used in the tutorial
   Searching a sequence database with a single covariance model
      Step 1: build a covariance model with cmbuild
      Step 2: calibrate the model with cmcalibrate
      Step 3: search a sequence database with cmsearch
      Truncated RNA detection
   Searching a CM database with a query sequence
      Step 1: create a CM database flatfile
      Step 2: compress and index the flatfile with cmpress
      Step 3: search the CM database with cmscan
      Truncated hit and local end alignment example
   Searching the Rfam CM database with a query sequence
   Creating multiple alignments with cmalign
      cmalign assumes sequences may be truncated
   Searching a sequence database for RNAs with unknown or no secondary structure
   Forcing global CM alignment with the -g option
   Specifying and annotating match positions with cmbuild --hand
4 Infernal 1.1’s profile/sequence comparison pipeline
   Filter thresholds are dependent on database size
   Manually setting filter thresholds
   In more detail: profile HMM filter stages
      Null model
      SSV filter
      Local Viterbi filter
      Biased composition filter
      Local Forward filter
      Glocal Forward filter
      Envelope definition
   In more detail: CM stages of the pipeline
      HMM band definition for CM stages
      HMM banded CM CYK filter
      HMM banded CM Inside filter/parser
      Optimal accuracy alignment
      Biased composition CM score correction: the null3 model
   Truncated hit detection using variants of the pipeline
      Differences between the standard pipeline and the truncated variants
      Modifying how truncated hits are detected using command-line options
   HMM-only pipeline variant for models without structure
5 Profile SCFG construction: the cmbuild program
   Technical description of a covariance model
      Definition of a stochastic context free grammar
      SCFG productions allowed in CMs
      From consensus structural alignment to guide tree
      From guide tree to covariance model
      Parameterization
      Comparison to profile HMMs
   The cmbuild program, step by step
      Alignment input file
      Parsing secondary structure annotation
      Sequence weighting
      Architecture construction
      Parameterization
      Naming the model
      Saving the model
6 Tabular output formats
   Target hits tables
      Target hits table format 1
      Target hits table format 2
7 Some other topics
   How do I cite Infernal?
   How do I report a bug?
   Input files
      Reading from a stdin pipe using - (dash) as a filename argument
8 Manual pages
   cmalign - align sequences to a covariance model
      Synopsis
      Description
      Options
      Options for Controlling the Alignment Algorithm
      Options for Controlling Speed and Memory Requirements
      Optional Output Files
      Other Options
   cmbuild - construct covariance model(s) from structurally annotated
      Synopsis
      Description ..