HMMER User's Guide
Total Page:16
File Type:pdf, Size:1020Kb
HMMER User’s Guide Biological sequence analysis using profile hidden Markov models http://hmmer.wustl.edu/ Version 2.3.2; Oct 2003 Sean Eddy Howard Hughes Medical Institute and Dept. of Genetics Washington University School of Medicine 660 South Euclid Avenue, Box 8232 Saint Louis, Missouri 63110, USA http://www.genetics.wustl.edu/eddy/ With contributions by Ewan Birney ([email protected]) Copyright (C) 1992-2003 HHMI/Washington University School of Medicine. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are retained on all copies. The free version of the HMMER software package is a copyrighted work that may be freely distributed and modified under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Alternative license terms may be obtained (for instance, for commercial purposes) from the Office of Technology Management at Washington University. See the files COPYING and LICENSE that came with your copy of the Infernal software for details. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. For a copy of the full text of the GNU General Public License, see www.gnu.org/licenses. 1 Contents 1 Introduction 6 How to avoid reading this manual . 6 What profile HMMs are . 6 Applications of profile HMMs . 7 How to avoid using this software (links to similar software) . 8 How to learn more about profile HMMs . 8 2 Installation 9 Quick installation instructions . 9 configuring, compiling, and installing a source code distribution . 9 configuring and installing a precompiled binary distribution . 9 System requirements and portability . 10 Running the configure script . 10 setting installation targets . 10 setting compiler and compiler flags . 11 turning on POSIX thread support for multiprocessors . 11 turning on PVM support for clusters . 12 turning on LFS support for files >2GB ............................ 12 turning on Altivec optimization for Macintosh PowerPC . 12 other options, used in development code . 12 example configuration . 12 The config.h header file . 13 controlling memory footprint with RAMLIMIT ......................... 13 limiting the default number of processors with HMMER NCPU . 13 the other stuff in config.h .................................. 13 Running make . 14 Shell environment variables understood by HMMER . 14 Configuring a PVM cluster for HMMER . 15 example of a PVM cluster . 16 3 Tutorial 19 The programs in HMMER . 19 Files used in the tutorial . 19 Format of input alignment files . 20 Searching a sequence database with a single profile HMM . 20 build a profile HMM with hmmbuild . 20 calibrate the profile HMM with hmmcalibrate . 21 search the sequence database with hmmsearch . 22 searching major databases like NCBI NR or SWISSPROT . 25 local alignment versus global alignment . 25 Searching a query sequence against a profile HMM database . 26 creating your own profile HMM database . 26 parsing the domain structure of a sequence with hmmpfam . 26 obtaining the PFAM database . 28 2 Creating and maintaining multiple alignments with hmmalign . 28 General notes on using the programs in HMMER . 28 getting quick help on the command line . 28 sequence file formats . 29 using compressed files . 29 reading from pipes . 29 protein analysis versus nucleic acid analysis . 30 environment variables . 30 exit status from the programs . 30 4 How HMMER builds profile HMMs from alignments 31 The Plan 7 profile HMM architecture . 31 the philosophy of Plan 7 . 32 the fully probabilistic “local” alignment models of Plan 7 . 33 available alignment modes . 34 wing retraction in Plan 7 dynamic programming . 35 Parsing the residue information in input multiple sequence alignments . 35 alphabet type: DNA or protein . 35 case insensitivity . 36 handling of degenerate residue codes . 36 Model architecture construction . 36 maximum a posteriori model architecture construction, the default . 36 fast model architecture construction . 37 hand model architecture construction . 37 Collecting observed emission/transition counts . 37 sequence weighting . 38 determining the effective sequence number . 38 adjustments to the alignment . 39 Estimating probability parameters from counts . 40 using customized priors . 40 the ad hoc PAM prior . 40 Calculating scores from counts . 41 Setting the alignment mode . 41 Naming and saving the HMM . 41 5 How HMMER scores alignments and determines significance 43 Executive summary . 43 In more detail: HMMER bit scores . 43 interaction of multihit alignment with negative bit scores . 44 In more detail: HMMER E-values . 44 fitting extreme value distributions to HMMER score histograms . 45 In more detail: Pfam TC/NC/GA cutoffs . 46 Biased composition filtering: the null2 model . 47 derivation of the null2 score correction . 48 3 6 File formats 51 HMMER save files . 51 header section . 52 main model section . 55 renormalization . 56 note to developers . 56 HMMER null model files . 57 HMMER prior files . 58 Sequence files . 58 supported file formats . 58 FASTA unaligned sequence format . 59 Stockholm, the recommended multiple sequence alignment format . 60 a minimal Stockholm file . 60 syntax of Stockholm markup . 60 semantics of Stockholm markup . ..