Bioinformatics Explained

Bioinformatics explained: HMMER September 12, 2007

CLC bio
Gustav Wieds Vej 10
8000 Aarhus C
Denmark
Telephone: +45 70 22 55 09
Fax: +45 70 22 55 19
www.clcbio.com
[email protected]

Similarity searches

Database searching is widely used in bioinformatics, e.g. for protein database searches, and there are a number of different ways to determine similarities between sequences. Alignment algorithms like Smith-Waterman [Smith and Waterman, 1981] and BLAST [Altschul et al., 1990] compare two sequences by associating one single score with each pairwise comparison of amino acids, based on standard substitution matrices and gap penalty scores. These kinds of comparisons calculate the similarity between two sequences in order to identify sequences with similar biological properties. When two sequences are considered similar at a significant level, it indicates common evolutionary origin, shared molecular structure, and similar functionality.

As specific positions and specific amino acids in different contexts may not necessarily have the same patterns of conservation, comparing protein sequences using standard substitution matrices is a very simplistic way of searching for similarity. Rather than searching for sequence similarity, it may be more beneficial to search for family or domain similarity, using scores that reflect many sequences rather than standard substitution scores that reflect only one amino acid being replaced with another, one by one along the sequences searched.

Profile hidden Markov models (profile HMMs)

"A hidden Markov model describes a probability distribution over a potentially infinite number of sequences" [Eddy, 1998]. Profile HMMs improve the search for distantly related sequences by turning a multiple sequence alignment into a position-specific scoring system based on probability. A profile HMM is commonly used for modeling a sequence family. It contains match, insert and delete states. Each match state has a probability distribution over the amino acids at a specific position, insertions and deletions are represented by the insert and delete states, and each transition between states has a probability [Eddy, 1998]. So, if a particular residue is common at a specific position in the multiple sequence alignment, a sequence that has this residue at that position gets a higher score. The HMM can be said to be a model generating sequences.

A sequence is compared to the model by assigning probabilities to the match of the residues in the sequence to the states in the model. The resulting score is the probability of the sequence being related to the model, and it is used for finding an e-value for the match.

HMMs were introduced to the field of computational biology in the late 1980s, and for use as profile models by Krogh et al. in the mid 1990s [Krogh et al., 1994] [Eddy, 1998]. Examples of the use of HMMs within biology are linkage mapping, secondary structure prediction, gene finding, and database searching for a sequence family. The idea of using a profile HMM for database searching is to compare an amino acid or genetic sequence to a statistical model describing a pattern, contrary to a simple comparison of two sequences. By comparing a sequence to a statistical model you can get some extra information. For instance:

• some sites may be conserved for specific residues while other sites represent considerable variations
• insertions may be acceptable at some sites while insertions may not be acceptable at other sites
• some sites may be deleted without affecting functionality while other sites may not be deleted without affecting functionality

Building upon this information, it may be easier to see if a sequence and a specific family are related. Distant relationships between sequences are also more likely to be identified when using statistical models instead of standard substitution matrices.

Pfam database

Profile HMM libraries are needed to search the relatedness from a query sequence to a sequence family, e.g. a family sharing known domains and functionality. The Pfam database (protein family database) is one of the most comprehensive publicly available profile HMM libraries. It consists of a multiple alignment for each protein family, which has been used as the basis for building a profile HMM [Bateman et al., 2002]. Researchers at the Sanger Center have released this collection of protein families, and the database currently (July 2007) represents 9318 protein families, covering 74% of proteins [PFAM, ].

Pfam is a classification of protein families. A family is by default represented by a seed alignment, which contains a representative number of family members, and a full alignment containing all members. Full alignments contain up to 2500 members. The families are classified as families, domains, repeats and motifs. Family members are related to each other. Domains represent elements of structure or sequence that may be found in different protein contexts. Repeats and motifs describe short parts of sequence, which may be identified in all relevant families [Bateman et al., 2002].

Figure 1: A part of an alignment for the Globin family from the Pfam website.

The Pfam database comes in two variants, Pfam-A and Pfam-B. Pfam-A is a curated database which is generated by hand and is thus a well-annotated database intended to incorporate high quality data. Pfam-B is automatically generated and of lower quality, and contains domains not already represented in Pfam-A [PFAM, ]. Both databases come in two variants: a fragment database (fs), which allows partial matches to a domain to be found, e.g. identifying a match to half a globin domain, and a full domain database (ls), which only allows matches to a full domain. The full domain database is based on global models and is more specific than the fragment database [Bateman et al., 2002]. The Pfam database of HMMs can be accessed from http://pfam.sanger.ac.uk (UK) or http://pfam.wustl.edu/ (US).

HMMER package

There are several software implementations using profile HMMs in computational biology, HMMER being one of the most popular [Eddy, 2003]. HMMER is a software implementation of profile HMMs for biological sequences. A sequence is compared to a profile HMM by assigning probability for the residues in the sequence to match the states in the model. The resulting score is the probability of the sequence being related to the given model, and it is used for finding E-values. The HMMER package contains programs for the construction and use of these position-specific scoring models. HMMER was first released in 1995 and was written by Sean Eddy and colleagues [Eddy, 2003]. The HMMER package is accessible from http://hmmer.janelia.org.

Programs in HMMER

Currently, the HMMER package contains nine programs. Two of these are programs for database searching:

• hmmpfam: Search an HMM database for matches to a query sequence.
• hmmsearch: Search a sequence database for matches to a single profile HMM.

The other programs in the package are:

• hmmalign: Align sequences to an existing model.
• hmmbuild: Build a model from a multiple sequence alignment.
• hmmcalibrate: Takes an HMM and empirically determines parameters that are used to make searches more sensitive, by calculating more accurate expectation value scores (E-values).
• hmmconvert: Convert a model file into different formats, including a compact HMMER 2 binary format, and "best effort" emulation of GCG profiles.
• hmmemit: Emit sequences probabilistically from a profile HMM.
• hmmfetch: Get a single model from an HMM database.
• hmmindex: Index an HMM database.

[Eddy, 2003]

When using the Pfam database, a researcher would normally only have to use the two search programs, since the database has already been built. Researchers seeking to construct their own profile HMMs should use the hmmalign, hmmbuild and hmmcalibrate programs.

Examples of HMMER usage

This section gives some examples of how to use the two database search programs, hmmpfam and hmmsearch. The first example will show how the leghemoglobin 1 from a bean (Swiss-Prot accession P02232, lgb1_vicfa) is recognized to be related to the globin family. The protein leghemoglobin is a plant globin binding oxygen and a member of the family of globins. In the second example, hmmsearch is used to identify members of the globin protein family among 1000 sequences from Swiss-Prot. Given an HMM of a protein family, hmmsearch shows whether any sequences in a database match it.

hmmpfam

The command line version of hmmpfam has two required parameters: the first is the profile HMM database file and the second is a file with one or more sequences. hmmpfam accepts a number of further parameters, mainly for adjusting the cut-offs for the quality of the matches to present. Here is the example run (not all the output of hmmpfam is shown, see the appendix for the full output):

localhost:~...% hmmpfam Pfam_fs.bin lgb1_vicfa.fasta
hmmpfam - search one or more sequences against HMM database
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 Pfam_fs.bin
Sequence file:            lgb1_vicfa.fasta
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: P02232|LGB1_VICFA
Accession:      [none]
Description:    Leghemoglobin-1 - Vicia faba (Broad bean)

Scores for sequence family classification (score includes all domains):
Model        Description                                Score    E-value  N
--------     -----------                                -----    ------- ---
Globin       Globin                                      75.5    2.6e-21   1
Herpes_UL42  DNA polymerase processivity factor (UL       1.3        7.8   1
PPTA         Protein prenyltransferase alpha subuni       2.8        8.2   1

Alignments of top-scoring domains:
Globin: domain 1 of 1, from 7 to 141: score 75.5, E = 2.6e-21
                   *->dkalvkasWgkvkgtdnreelGaealarlFkayPdtktyFpkfgdls
  P02232|LGB     7    QEALVNSSSQLFKQ--NPSNYSVLFYTIILQKAPTAKAMFS-F--LK 48

                      sadaikgspkfkaHgkkVlaalgeavkhLgnddddgnlkaalkkLaarHa
  P02232|LGB    49    DSAGVVDSPKLGAHAEKVFGMVRDSAVQLR----ATGEVVLDGKDGSIHI 94

                      erghvdpanFkllgeallIvvLaahlggeveftpevkaAWdkaldvvada
  P02232|LGB    95    QKG-VLDPHFVVVKEALL-KTIKEASGD--KWSEELSAAWEVAYDGLATA 140

                      l<-*
  P02232|LGB   141    I    141

Leghemoglobin 1 from a bean is the query sequence in the search. The output starts with some general information and then follows the results for each sequence, one at a time (often just one sequence). The results show a table with a list of the matching models in the Pfam database and information about the quality of each match. The e-value is similar to that in BLAST reports, i.e. it is the number of matches of the same quality expected to occur in a random database of similar size. You also get a table describing the individual domain matches, including information about which part of the sequence matches which part of the profile HMM. This is shown in the full output in the appendix. Finally, the individual domain alignments are presented. The top line in the alignment is the profile HMM. Each letter represents the most probable residue of each state in the HMM. Uppercase letters mean that the residue is very conserved in that state. Above is shown the alignment of the top-scoring domain in this search.

The conclusion of the hmmpfam search is that the leghemoglobin is definitely a globin. Looking at the e-values, there are no other Pfam domains matching the sequence well (notice how the globin model is the only match with an e-value below 1).

A binary version of the Pfam database is used. The binary database format is more compact than the text format, which reduces the amount of time spent reading the database from the disk. An HMM database file can be converted to binary format using the hmmconvert program:

localhost:~...hmmer% hmmconvert -b Pfam_fs Pfam_fs.bin

hmmsearch

Running the hmmsearch program is done in very much the same way as hmmpfam. The big difference is that the HMM file should only contain a single profile HMM, and the sequence file is (typically) a sizeable database of sequences to be searched for matches to the HMM. Again, the first argument is the HMM file and the second argument is the sequence file. hmmsearch supports many of the same parameters as hmmpfam, mainly for adjusting the limits of significance of the hits to be presented. In this example none of these parameters are set.

Here is a comparison of a globin HMM with a file of 1000 randomly chosen sequences from Swiss-Prot. This is a summary of the hmmsearch output (see the appendix for the full version):

localhost:~...hmmer% hmmsearch globin.hmm uniprot_sprot_1000.fasta
hmmsearch - search a sequence database with a profile HMM
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                   globin.hmm [Globin]
Sequence database:          uniprot_sprot_1000.fasta
per-sequence score cutoff:  [none]
per-domain score cutoff:    [none]
per-sequence Eval cutoff:   <= 10
per-domain Eval cutoff:     [none]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query HMM:   Globin
Accession:   PF00042.12
Description: Globin
  [HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence     Description                                Score    E-value  N
--------     -----------                                -----    ------- ---
HBAD_AEGMO   (P68059) Hemoglobin subunit alpha-D (Hemo  202.2    1.1e-58   1
HMP_PHOPR    (Q6LM37) Flavohemoprotein (Hemoglobin-lik   92.9    2.9e-27   1
HMP_BACHK    (Q6HLA6) Flavohemoprotein (Hemoglobin-lik   55.0    2.3e-16   1
GLBB_CERLA   (O76243) Body wall hemoglobin               22.9    4.1e-07   1
NORA_STAAU   (P0A0J7) Quinolone resistance protein nor    0.3        1.3   1
PRI1_HUMAN   (P49642) DNA primase small subunit (EC 2.   -0.4          2   1
HSP70_NEUCR  (Q01233) Heat shock 70 kDa protein (HSP70   -1.4          4   1
SM1L2_HUMAN  (Q8NDV3) Structural maintenance of chromo   -1.5        4.1   1
RPOC_FUSNN   (Q8RHI7) DNA-directed RNA polymerase beta   -1.8        5.1   1
LRC45_HUMAN  (Q96CN5) Leucine-rich repeat-containing p   -1.8        5.2   1
IUNH_CRIFA   (Q27546) Inosine-uridine preferring nucle   -1.8        5.2   1
SPP1_YEAST   (Q03012) COMPASS component SPP1 (Complex    -2.1        6.2   1
ARGJ_CORDI   (P62059) Glutamate N-acetyltransferase (E   -2.1        6.4   1
BRSK1_HUMAN  (Q8TDC3) BR serine/threonine-protein kina   -2.3        7.2   1
SPC34_YEAST  (P36131) DASH complex subunit SPC34 (Oute   -2.5        8.2   1
THIO_CALJA   (Q9BDJ3) Thioredoxin                        -2.7        9.2   1
UGT55_CAEEL  (Q22180) Putative UDP-glucuronosyltransfe   -2.7        9.4   1
ENV_HV2UC    (Q76638) Envelope glycoprotein gp160 prec   -2.7        9.5   1

Again, the output starts with some general information, and then comes a table of the sequences matching the HMM. This is followed by a table of the domain matches and by alignments of the actual matches (see the appendix). The conclusion of this hmmsearch is that four of the sequences are definitely globins (e-values of 4.1e-07 or lower), while the remaining sequence matches are not too convincing, as they have been assigned e-values above 1.

Implementations and accelerations of HMMER

The HMMER package is accessible from http://hmmer.janelia.org. As profile HMM algorithms are computationally demanding, different software and hardware implementations accelerating the HMM searches are also available. One example is the CLC Bioinformatics Cell, accelerating the HMMER programs hmmpfam and hmmsearch using SIMD (single instruction multiple data) technology [CLC bio, 2007].

Other useful resources

The HMMER package website
http://hmmer.janelia.org

Manual for HMMER package
ftp://selab.janelia.org/pub/software/hmmer/CURRENT/Userguide.pdf

Pfam database available in Europe (UK)
http://pfam.sanger.ac.uk

Pfam database available in US
http://pfam.wustl.edu/

CLC Bioinformatics Cell
http://www.clccell.com/

References

[Altschul et al., 1990] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J Mol Biol, 215(3):403--410.

[Bateman et al., 2002] Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S. R., Griffiths-Jones, S., Howe, K. L., Marshall, M., and Sonnhammer, E. L. L. (2002). The pfam protein families database. Nucleic Acids Res, 30(1):276--280.

[Baxevanis and Ouellette, 2001] Baxevanis, A. and Ouellette, B. (2001). Bioinformatics. Wiley-Interscience.

[CLC bio, 2007] CLC bio (2007). CLC Bioinformatics Cell white paper.

[Eddy, 1998] Eddy, S. (1998). Profile hidden Markov models. Bioinformatics, 14:755--763.

[Eddy, 2003] Eddy, S. (2003). The HMMER User's Guide (biological sequence analysis using profile hidden Markov models), version 2.3.2 edition. Howard Hughes Medical Institute and Dept. of Genetics, Washington University School of Medicine, 660 South Euclid Avenue, Box 8232, Saint Louis, Missouri 63110, USA. http://hmmer.wustl.edu/.

[Krogh et al., 1994] Krogh, A., Brown, M., Mian, I. S., Sjölander, K., and Haussler, D. (1994). Hidden markov models in computational biology. Applications to protein modeling. J Mol Biol, 235(5):1501--1531.

[PFAM, ] PFAM. http://pfam.sanger.ac.uk.

[Smith and Waterman, 1981] Smith, T. F. and Waterman, M. S. (1981). Identification of common molecular subsequences. J Mol Biol, 147(1):195--197.

Appendix

Example hmmpfam

localhost:~...hmmer% hmmpfam Pfam_fs.bin lgb1_vicfa.fasta
hmmpfam - search one or more sequences against HMM database
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 Pfam_fs.bin
Sequence file:            lgb1_vicfa.fasta
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: P02232|LGB1_VICFA
Accession:      [none]
Description:    Leghemoglobin-1 - Vicia faba (Broad bean)

Scores for sequence family classification (score includes all domains):
Model        Description                                Score    E-value  N
--------     -----------                                -----    ------- ---
Globin       Globin                                      75.5    2.6e-21   1
Herpes_UL42  DNA polymerase processivity factor (UL       1.3        7.8   1
PPTA         Protein prenyltransferase alpha subuni       2.8        8.2   1

Parsed for domains:
Model        Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------     ------- ----- -----    ----- -----      -----  -------
Globin         1/1       7   141 ..     1   148 []    75.5  2.6e-21
PPTA           1/1      11    38 ..     3    31 .]     2.8      8.2
Herpes_UL42    1/1      14    29 ..   149   164 .]     1.3      7.8

Alignments of top-scoring domains:
Globin: domain 1 of 1, from 7 to 141: score 75.5, E = 2.6e-21
                   *->dkalvkasWgkvkgtdnreelGaealarlFkayPdtktyFpkfgdls
  P02232|LGB     7    QEALVNSSSQLFKQ--NPSNYSVLFYTIILQKAPTAKAMFS-F--LK 48

                      sadaikgspkfkaHgkkVlaalgeavkhLgnddddgnlkaalkkLaarHa
  P02232|LGB    49    DSAGVVDSPKLGAHAEKVFGMVRDSAVQLR----ATGEVVLDGKDGSIHI 94

                      erghvdpanFkllgeallIvvLaahlggeveftpevkaAWdkaldvvada
  P02232|LGB    95    QKG-VLDPHFVVVKEALL-KTIKEASGD--KWSEELSAAWEVAYDGLATA 140

                      l<-*
  P02232|LGB   141    I    141

PPTA: domain 1 of 1, from 11 to 38: score 2.8, E = 8.2
                   *->LelteklleldpkNysaWnyRrwlleklg<-*
  P02232|LGB    11    VNSSSQLFKQNPSNYSVLFYTI-ILQKAP    38

Herpes_UL42: domain 1 of 1, from 14 to 29: score 1.3, E = 7.8
                   *->mlsvvkhelnsytvfF<-*
  P02232|LGB    14    SSQLFKQNPSNYSVLF    29

Example hmmsearch

localhost:~...hmmer% hmmsearch globin.hmm uniprot_sprot_1000.fasta
hmmsearch - search a sequence database with a profile HMM
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                   globin.hmm [Globin]
Sequence database:          uniprot_sprot_1000.fasta
per-sequence score cutoff:  [none]
per-domain score cutoff:    [none]
per-sequence Eval cutoff:   <= 10
per-domain Eval cutoff:     [none]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query HMM:   Globin
Accession:   PF00042.12
Description: Globin
  [HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence     Description                                Score    E-value  N
--------     -----------                                -----    ------- ---
HBAD_AEGMO   (P68059) Hemoglobin subunit alpha-D (Hemo  202.2    1.1e-58   1
HMP_PHOPR    (Q6LM37) Flavohemoprotein (Hemoglobin-lik   92.9    2.9e-27   1
HMP_BACHK    (Q6HLA6) Flavohemoprotein (Hemoglobin-lik   55.0    2.3e-16   1
GLBB_CERLA   (O76243) Body wall hemoglobin               22.9    4.1e-07   1
NORA_STAAU   (P0A0J7) Quinolone resistance protein nor    0.3        1.3   1
PRI1_HUMAN   (P49642) DNA primase small subunit (EC 2.   -0.4          2   1
HSP70_NEUCR  (Q01233) Heat shock 70 kDa protein (HSP70   -1.4          4   1
SM1L2_HUMAN  (Q8NDV3) Structural maintenance of chromo   -1.5        4.1   1
RPOC_FUSNN   (Q8RHI7) DNA-directed RNA polymerase beta   -1.8        5.1   1
LRC45_HUMAN  (Q96CN5) Leucine-rich repeat-containing p   -1.8        5.2   1
IUNH_CRIFA   (Q27546) Inosine-uridine preferring nucle   -1.8        5.2   1
SPP1_YEAST   (Q03012) COMPASS component SPP1 (Complex    -2.1        6.2   1
ARGJ_CORDI   (P62059) Glutamate N-acetyltransferase (E   -2.1        6.4   1
BRSK1_HUMAN  (Q8TDC3) BR serine/threonine-protein kina   -2.3        7.2   1
SPC34_YEAST  (P36131) DASH complex subunit SPC34 (Oute   -2.5        8.2   1
THIO_CALJA   (Q9BDJ3) Thioredoxin                        -2.7        9.2   1
UGT55_CAEEL  (Q22180) Putative UDP-glucuronosyltransfe   -2.7        9.4   1
ENV_HV2UC    (Q76638) Envelope glycoprotein gp160 prec   -2.7        9.5   1

Parsed for domains:
Sequence     Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------     ------- ----- -----    ----- -----      -----  -------
HBAD_AEGMO     1/1       6   136 ..     1   148 []   202.2  1.1e-58
...
ENV_HV2UC      1/1     597   609 ..   136   148 .]    -2.7      9.5

Alignments of top-scoring domains:
HBAD_AEGMO: domain 1 of 1, from 6 to 136: score 202.2, E = 1.1e-58
                   *->dkalvkasWgkvkgtdnreelGaealarlFkayPdtktyFpkfgdls
  HBAD_AEGMO     6    DKKLIQATWDKVQG--HQEDFGAEALQRMFITYPPTKTYFPHF-DLS 49

                      sadaikgspkfkaHgkkVlaalgeavkhLgnddddgnlkaalkkLaarHa
  HBAD_AEGMO    50    -----PGSDQVRGHGKKVVNALGNAVKSM-----D-NLSQALSELSNLHA 88

                      erghvdpanFkllgeallIvvLaahlggeveftpevkaAWdkaldvvada
  HBAD_AEGMO    89    YNLRVDPVNFKLLSQCFQ-VVLAVHLGK--EYTPEVHAAFDKFLSAVAAV 135

                      l<-*
  HBAD_AEGMO   136    L    136

...
[A number of alignments were removed for brevity]
...

ENV_HV2UC: domain 1 of 1, from 597 to 609: score -2.7, E = 9.5
                   *->AWdkaldvvadal<-*
   ENV_HV2UC   597    SWGCAFRQVCHTT    609

Histogram of all scores:
score    obs    exp  (one = represents 4 sequences)
-----    ---    ---
...

% Statistical details of theoretical EVD fit:
              mu =    -9.7401
          lambda =     0.6626
chi-sq statistic =    83.2139
  P(chi-square)  =  1.174e-13

Total sequences searched: 1000

Whole sequence top hits:
tophits_s report:
     Total hits:           74
     Satisfying E cutoff:  18
     Total memory:         22K

Domain top hits:
tophits_s report:
     Total hits:           74
     Satisfying E cutoff:  74
     Total memory:         33K

Creative Commons License

All CLC bio's scientific articles are licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License. You are free to copy, distribute, display, and use the work for educational purposes, under the following conditions: You must attribute the work in its original form and "CLC bio" has to be clearly labeled as author and provider of the work. You may not use this work for commercial purposes. You may not alter, transform, nor build upon this work.

See http://creativecommons.org/licenses/by-nc-nd/2.5/ for more information on how to use the contents.