HMMER User's Guide
Biological sequence analysis using profile hidden Markov models
http://hmmer.org/
Version 3.1b1; May 2013

Sean R. Eddy and Travis J. Wheeler
for the HMMER Development Team
Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147 USA
http://eddylab.org/

Copyright (C) 2013 Howard Hughes Medical Institute.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are retained on all copies.

HMMER is licensed and freely distributed under the GNU General Public License version 3 (GPLv3). For a copy of the License, see http://www.gnu.org/licenses/.

HMMER is a trademark of the Howard Hughes Medical Institute.

Contents

1 Introduction
    How to avoid reading this manual
    How to avoid using this software (links to similar software)
    What profile HMMs are
    Applications of profile HMMs
    Design goals of HMMER3
    What’s new in HMMER3.1
    What’s still missing in HMMER3.1
    How to learn more about profile HMMs

2 Installation
    Quick installation instructions
    System requirements
    Multithreaded parallelization for multicores is the default
    MPI parallelization for clusters is optional
    Using build directories
    Makefile targets
    Why is the output of ’make’ so clean?
    What gets installed by ’make install’, and where?
    Staged installations in a buildroot, for a packaging system
    Workarounds for some unusual configure/compilation problems

3 Tutorial
    The programs in HMMER
    Supported formats
    Files used in the tutorial
    Searching a protein sequence database with a single protein profile HMM
        Step 1: build a profile HMM with hmmbuild
        Step 2: search the sequence database with hmmsearch
    Single sequence protein queries using phmmer
    Iterative protein searches using jackhmmer
    Searching a DNA sequence database
        Step 1: Optionally build a profile HMM with hmmbuild
        Step 2: search the DNA sequence database with nhmmer
    Searching a profile HMM database with a query sequence
        Step 1: create an HMM database flatfile
        Step 2: compress and index the flatfile with hmmpress
        Step 3: search the HMM database with hmmscan
    Creating multiple alignments with hmmalign

4 The HMMER profile/sequence comparison pipeline
    Null model.
    MSV filter.
    Biased composition filter.
    Viterbi filter.
    Forward filter/parser.
    Domain definition.
    Modifications to the pipeline as used for DNA search.
        SSV, not MSV.
        There are no domains, but there are envelopes.
        Biased composition.

5 Tabular output formats
    The target hits table
    The domain hits table (protein search only)

6 Some other topics
    How do I cite HMMER?
    How do I report a bug?
    Input files
        Reading from a stdin pipe using - (dash) as a filename argument

7 Manual pages
    alimask - Add mask line to a multiple sequence alignment
        Synopsis
        Description
        Options
        Options for Specifying Mask Range
        Options for Specifying the Alphabet
        Options Controlling Profile Construction
        Options Controlling Relative Weights
        Other Options
    hmmalign - align sequences to a profile HMM
        Synopsis
        Description
        Options
    hmmbuild - construct profile HMM(s) from multiple sequence alignment(s)
        Synopsis
        Description
        Options
        Options for Specifying the Alphabet
        Options Controlling Profile Construction
        Options Controlling Relative Weights
        Options Controlling Effective Sequence Number
        Options Controlling Priors
        Options Controlling E-value Calibration
        Other Options
    hmmconvert - convert profile file to a HMMER format
        Synopsis
        Description
        Options
    hmmemit - sample sequences from a profile HMM
        Synopsis
        Description
        Common Options
        Options Controlling What to Emit
        Options Controlling Emission from Profiles