Genescan Smo4 and Fgenesh Smo4 Alignment
Genescan_smo4 and Fgenesh_smo4 align strongly with one another at the end of the predicted gene model. However, the sequences have low similarity from AA 1898-1953.
Multalign genescan_smo4 and Fgenesh_smo5
Like the previous alignment, this alignment is fairly strong. There are 6 main differences between the two sequences. It looks as though each gene model has additional exons (or extended exons) that the other gene model lacks. There are three regions where the sequence similarity is low; perhaps those portions of the sequence are in a different reading frame.
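One way to locate low-similarity regions like those described here is to compute percent identity over consecutive windows of the two aligned sequences. The sketch below is illustrative only: the function names, the window size, the 40% threshold, and the toy sequences are assumptions and not part of the original analysis. Positions are alignment columns, not residue numbers.

```python
def windowed_identity(aln_a, aln_b, window=20):
    """Return (first_col, last_col, percent_identity) for consecutive windows
    of two aligned, gapped sequences of equal length (1-based columns)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aln_a), window):
        cols = list(zip(aln_a[start:start + window], aln_b[start:start + window]))
        # A column counts as a match only if both residues agree and are not gaps.
        matches = sum(1 for x, y in cols if x == y and x != "-")
        results.append((start + 1, start + len(cols), 100.0 * matches / len(cols)))
    return results


def low_similarity_windows(aln_a, aln_b, window=20, threshold=40.0):
    """Return the (first_col, last_col) windows whose identity is below threshold."""
    return [(s, e) for s, e, pct in windowed_identity(aln_a, aln_b, window)
            if pct < threshold]


# Toy aligned sequences (hypothetical, not taken from the gene models).
toy_a = "ACDEFGHIKL" + "MNPQRSTVWY"
toy_b = "ACDEFGHIKL" + "AAAAAAAA--"
low = low_similarity_windows(toy_a, toy_b, window=10)
```

With real data, the two gapped rows of the Multalign output would be passed in place of the toy strings.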
Blastp results for Genescan_smo4
Best Blast hits
>ref|XP_001765108.1| predicted protein [Physcomitrella patens subsp. patens]
gb|EDQ70103.1| predicted protein [Physcomitrella patens subsp. patens] Length=998
GENE ID: 5928285 PHYPADRAFT_184382 | hypothetical protein [Physcomitrella patens subsp. patens] (10 or fewer PubMed links)
Score = 850 bits (2196), Expect = 0.0, Method: Compositional matrix adjust. Identities = 468/990 (47%), Positives = 651/990 (65%), Gaps = 121/990 (12%)
Query 14 LLVVLA----ACDAIYEDQVGLWDWHQEYIGKVTHAVFQT-ASGKKRVIVATEKSVVASL 68 L VV+A C A+YEDQVG+ DWHQ+YIG+V HAVFQT +G+KRV+VATE++ +ASL Sbjct 13 LFVVVACLSSTCLALYEDQVGVRDWHQQYIGRVKHAVFQTQGTGRKRVVVATEQNAIASL 72
Query 69 NLRSGEIWIVYVVSSA------DGRLLWTSDL----- 94 NLR+G+I+ +V+ DG L+W + + Sbjct 73 NLRTGDIYWRHVLGETDNIDALEISMGKYVLTLSKGNTVRAWHLPDGALIWETRIQAFQG 132
Query 95 LDERLAQTSLSFEG---KNIYVAGFSGSSL------ALFRIDA--STGAFTTLKTTE 140 + L + + +G +++V +SGS L L+R+DA S F Sbjct 133 FNLGLIKLPVDIDGDKVNDLFV--YSGSILTAISGADGATLWRVDAAGSKNIFIEKVVLA 190
Query 141 PLNPSSFVLS----SGV-FAALDTQGNIVTGLMEAEV------VELQKTS 179 P ++ L GV +D + ++ L AE+ V LQ + Sbjct 191 PEEGKAYGLGFFGIMGVALVEIDLKTGDLSDLKSAELSSMLSTEHLHVTSDYAVALQSDA 250
Query 180 LATLLDSPVSSAQLLPDKIPGGCVLSTDGGSIFVLGLDKKGV------221 + ++ S +L+ + P +L+ G S+ +L + +GV Sbjct 251 ESLVVALINSHKELIVVETPVSSILTNPGTSLKLLSTNLEGVISLSSDDQTVILKVDPTT 310
Query 222 ---EVLQQIQGPPVVSNSI-VLDGTFAQSFLQHINSKE------IRVRVLSGKEWIETAE 271 +++++ G VS+S+ VLD +A + ++ S+E +RV E + Sbjct 311 GKLSLVERLTGAVAVSDSLSVLDDKYATAIVEF--SEEGSAQNVFNLRVKGNDFSDEVQK 368
Query 272 ETVEVDPNKGGVQKVFMNAYIKTDRSRGFRVLIVGQDHSLALLQQGKVVWSREEALASVV 331 ETV++ ++G +QK F+NAY++TDRS GFR L+VG+D SL+LLQQG+VVW+RE+ LAS+V Sbjct 369 ETVKLPSHRGFIQKAFLNAYVRTDRSHGFRALVVGEDDSLSLLQQGEVVWTREDGLASIV 428
Query 332 DTLTAELPLEKAGVSVAEVEHDLYEWLKGHVLRLKSTLMLATAEEQTALQALRLNNADKT 391 D AELPLEK GVSVAEVEHDL EWLKGH++++K+TL LAT +E A+Q RLN ADKT Sbjct 429 DASPAELPLEKDGVSVAEVEHDLAEWLKGHIMKMKATLFLATPDELAAVQRARLNQADKT 488
Query 392 KMTRDHNGFRKLIVVLTSSGKLFALHTGNGGIVWSRFIPELSTK------GSLKLYPWRI 445 K TRDHNGFRKL+VVLT +GK+ ALHTG+G +VWS +P L LK++ W++ Sbjct 489 KHTRDHNGFRKLLVVLTKAGKISALHTGDGHVVWSLLVPSLRASYGNPRFSPLKIFQWQV 548
Query 446 PHKH-VDENAVALVLGSS---HDGTGFAAWVDMLTGSVQETLALPYSVKVALALPVVDSS 501 PH+H +DEN V L+L + +D G +W+D+ G+ +++ L YSV + PV DSS Sbjct 549 PHQHALDENPVVLILAQADPGYDVKGALSWIDVHKGTELQSVKLSYSVTQVVTTPVTDSS 608
Query 502 ERRLHLLIDDQNKAHLYPTSDESLSLFEKYMQNVYFYIADKEAGQIEGYNIKSQVDAGE- 560 E+RLHLLID++ +AHL+P ++ESL+LF KY +N YFY DK ++ GY + VD Sbjct 609 EQRLHLLIDNRKRAHLFPATEESLALFLKYKENAYFYEVDKADQKMHGYGLLDLVDPSTG 668
Query 561 --EGGLVFQSQKIWSVLFPKDSETIAAITTRRADEMVHTQAKVLGNRDVWYKYLNKNMVF 618 + G VF+S+K+WS++FP ++E+I + TR++DE+ HTQ KVL NRD+ +KYLNKN+VF Sbjct 669 NIKEGYVFESRKLWSIVFPAETESITTVVTRKSDEVTHTQTKVLSNRDILFKYLNKNLVF 728
Query 619 VATVTPQD-SRVGAANPEETWLVAYLIDSVTGQILHRVSHAHAQGPVHVVFSENWVVYCY 677 VATV P+D S+VGA +PEE LV YL+D+VTG+ILHRVSH + QGPVH V SENWVVY Y Sbjct 729 VATVAPKDKSQVGAVSPEEKTLVVYLVDTVTGRILHRVSHPNMQGPVHAVLSENWVVYHY 788
Query 678 FNVRNHRHEMSVLEVYDKS-ADGKDVLQLMLGRYNASVPFSSFSPRNLEVKGQSYFFPST 736 FN+R HR+EMSVLE+YD+S K V+QLMLG++N+SVP SS+SP NLEVK QSYFF T Sbjct 789 FNLRQHRYEMSVLEIYDQSRLPDKGVIQLMLGQHNSSVPISSYSPVNLEVKQQSYFFTFT 848
Query 737 VRTMSVTFTARGITGKQILVGTIGNQVIALDKRFLDPRRSADPTPMEREEGVIPLSEGLP 796 V+TM+VT TA+GIT KQ+L+GT+ +QV+ALDKR DPRR+ PTP E+EEG++PL++ +P Sbjct 849 VKTMTVTSTAKGITAKQLLLGTVNDQVLALDKRLFDPRRTLTPTPAEQEEGILPLTDSIP 908
Query 797 LFPQSYLTHAARVEELRGIISVPARLESTCLVFAYGIDLFFTRTAPSRTYDSLTEDFSYA 856 + PQSYLTH+ +VE LRG++++PARLEST LVFAYG+DLF+T TAPS+ YDSLTEDFSYA Sbjct 909 ISPQSYLTHSYQVEGLRGLLTIPARLESTSLVFAYGLDLFYTHTAPSKIYDSLTEDFSYA 968
Query 857 LLLITIVVLVVAIAVSMVLSQRKELREKWK 886 LLL+TIVVL ++I V+ VLS+R+EL EKWK Sbjct 969 LLLVTIVVLFLSIIVTYVLSERRELAEKWK 998
>emb|CAO44049.1| unnamed protein product [Vitis vinifera] Length=987
Score = 800 bits (2065), Expect = 0.0, Method: Compositional matrix adjust. Identities = 433/968 (44%), Positives = 615/968 (63%), Gaps = 108/968 (11%)
Query 23 AIYEDQVGLWDWHQEYIGKVTHAVFQT------49 ++YEDQVGL DWHQ+YIGKV HAVF T Sbjct 24 SLYEDQVGLMDWHQQYIGKVKHAVFHTQKAGRKRVVVSTEENVIASLDLRRGDIFWRHVL 83
Query 50 ------ASGKKRVIVATEKSVVASLNLRSGE-IW------76 A GK + +++E S++ + NL G+ +W Sbjct 84 GPNDAVDEIDIALGKYVITLSSEGSILRAWNLPDGQMVWESFLQGPKPSKSLLSVSANLK 143
Query 77 -----IVYV------VSSADGRLLWTSDLLDERL--AQTSLSFEGKNIYVAGFSG-SS 120 +++V VSS DG +LW D DE L Q IY GF G S Sbjct 144 IDKDNVIFVFGKGCLHAVSSIDGEVLWKKDFADESLEVQQIIHPLGSDMIYAVGFVGLSQ 203
Query 121 LALFRIDASTGAFTTLKTTEPLNPSSF-----VLSSGVFAALD-TQGNIVT-GLMEAEVV 173 L ++I+ G LK P F ++SS ALD T+ ++++ ++ E+ Sbjct 204 LDAYQINVRNGE--VLKHRSAAFPGGFCGEVSLVSSDTLVALDATRSSLISISFLDGEI- 260
Query 174 ELQKTSLATLLDSPVSSAQLLPDKIPGGCVLSTDGGSIFVLGLDKKGVEVLQQIQGPPVV 233 LQ+T ++ L+ A +LP K+ G ++ D +FV D+ +EV ++I Sbjct 261 SLQQTHISNLVGDSFGMAVMLPSKLSGMLMIKIDNYMVFVRVADEGKLEVAEKINDAAAA 320
Query 234 SNSIVLDGTFAQSF--LQHINSKEIRVRVLSGKEWI-ETAEETVEVDPNKGGVQKVFMNA 290 + + Q+F ++H +K I + V +W + +E++ +D +G V K+F+N+ Sbjct 321 VSDALALSEGQQAFGLVEHGGNK-IHLTVKLVNDWNGDLLKESIRMDHQRGCVHKIFINS 379
Query 291 YIKTDRSRGFRVLIVGQDHSLALLQQGKVVWSREEALASVVDTLTAELPLEKAGVSVAEV 350 YI+TDRS GFR LIV +DHSL LLQQG++VWSRE+ LAS++D +ELP+EK GVSVA+V Sbjct 380 YIRTDRSHGFRALIVMEDHSLLLLQQGEIVWSREDGLASIIDVTASELPVEKEGVSVAKV 439
Query 351 EHDLYEWLKGHVLRLKSTLMLATAEEQTALQALRLNNADKTKMTRDHNGFRKLIVVLTSS 410 EH+L+EWLKGH+L+LK TLMLA+ E+ A+Q +RL +++K+KMTRDHNGFRKL++VLT + Sbjct 440 EHNLFEWLKGHMLKLKGTLMLASPEDMIAIQGMRLKSSEKSKMTRDHNGFRKLLIVLTRA 499
Query 411 GKLFALHTGNGGIVWSRFIPELSTKGS------LKLYPWRIPHKH-VDENAVALVLGS-- 461 GKLFALHTG+G +VWS + L + L +Y W++PH H +DEN LV+G Sbjct 500 GKLFALHTGDGRVVWSVLLHSLHNSEACAYPTGLNVYQWQVPHHHAMDENPSVLVVGRCG 559
Query 462 -SHDGTGFAAWVDMLTGSVQETLALPYSVKVALALPVVDSSERRLHLLIDDQNKAHLYPT 520 D G ++VD TG ++L L +S++ + L DS E+RLHL+ID + AHLYP Sbjct 560 LGSDAPGVLSFVDTYTGKELDSLFLTHSIERIIPLSFTDSREQRLHLIIDTDHHAHLYPR 619
Query 521 SDESLSLFEKYMQNVYFYIADKEAGQIEGYNIKSQVDAGEEGGLVFQSQKIWSVLFPKDS 580 + E++ +F+ + N+Y+Y + E G I G+ +KS E F ++ +WS++FP +S Sbjct 620 TPEAIGIFQHELPNIYWYSVEAENGIIRGHALKSNCILQEGDEYCFDTRDLWSIVFPSES 679
Query 581 ETIAAITTRRADEMVHTQAKVLGNRDVWYKYLNKNMVFVATVTPQDS-RVGAANPEETWL 639 E I A TR+ +E+VHTQAKV+ ++DV YKY++KN++FVATV P+ + +G+ PEE+WL Sbjct 680 EKILATVTRKLNEVVHTQAKVITDQDVMYKYVSKNLLFVATVAPKATGEIGSVTPEESWL 739
Query 640 VAYLIDSVTGQILHRVSHAHAQGPVHVVFSENWVVYCYFNVRNHRHEMSVLEVYDKS-AD 698 V YLID+VTG+I++R++H QGPVH VFSENWVVY YFN+R HR+EMSV+E+YD+S AD Sbjct 740 VVYLIDTVTGRIIYRMTHHGTQGPVHAVFSENWVVYHYFNLRAHRYEMSVVEIYDQSRAD 799
Query 699 GKDVLQLMLGRYNASVPFSSFSPRNLEVKGQSYFFPSTVRTMSVTFTARGITGKQILVGT 758 KDV +L+LG++N + P SS+S + K Q YFF +V+ M+VT TA+GIT KQ+L+GT Sbjct 800 NKDVWKLVLGKHNLTSPVSSYSRPEVITKSQFYFFTHSVKAMAVTSTAKGITSKQLLIGT 859
Query 759 IGNQVIALDKRFLDPRRSADPTPMEREEGVIPLSEGLPLFPQSYLTHAARVEELRGIISV 818 IG+QV+ALDKR+LDPRR+ +P+ EREEG+IPL++ LP+ PQSY+TH +VE LRGI++ Sbjct 860 IGDQVLALDKRYLDPRRTINPSQSEREEGIIPLTDSLPIIPQSYVTHNLKVEGLRGIVTA 919
Query 819 PARLESTCLVFAYGIDLFFTRTAPSRTYDSLTEDFSYALLLITIVVLVVAIAVSMVLSQR 878 PA+LEST LVFAYG+DLFFTR APSRTYD LT+DFSYALLLITIV LV AI V+ +LS+R Sbjct 920 PAKLESTTLVFAYGVDLFFTRIAPSRTYDLLTDDFSYALLLITIVALVAAIFVTWILSER 979
Query 879 KELREKWK 886 KEL+EKW+ Sbjct 980 KELQEKWR 987
>ref|XP_001772470.1| predicted protein [Physcomitrella patens subsp. patens]
gb|EDQ62752.1| predicted protein [Physcomitrella patens subsp. patens] Length=252 /note="CHY zinc finger. This family of domains are likely
to bind to zinc ions. They contain many conserved cysteine and histidine residues. We have named this domain after the N-terminal motif CXHY. This domain can be found in isolation in some proteins, but...; cl01802"
GENE ID: 5935675 PHYPADRAFT_138830 | hypothetical protein [Physcomitrella patens subsp. patens] (10 or fewer PubMed links)
Score = 322 bits (824), Expect = 3e-85, Method: Compositional matrix adjust. Identities = 141/223 (63%), Positives = 180/223 (80%), Gaps = 1/223 (0%)
Query 1675 CAHYRRRCLIRAPCCNGIFNCRHCHNEAMNANEADPSKRHDLPRHKVERVICSLCGLEQD 1734 CAHY+R C IRAPCCN +F+CRHCHN+A + NE D ++RH++ R VE+VICSLC EQD Sbjct 1 CAHYKRGCKIRAPCCNEVFDCRHCHNDAKSVNEKDDTQRHEIDRRLVEKVICSLCDHEQD 60
Query 1735 VHQVCSGCGVSMGDYYCSICRFFDDDVSKGQFHCDSCGICRVGGQEKFFHCDKCGCCYAV 1794 V QVC CGV MG+Y+CS C+FFDDD SK QFHCD CGICR+GG++ FFHCD+CGCCY+V Sbjct 61 VQQVCENCGVCMGEYFCSKCKFFDDDTSKRQFHCDKCGICRIGGRDNFFHCDRCGCCYSV 120
Query 1795 ALQKGHSCVENSMHHNCPVCFDYLFDSTSDITVLRCGHTIHSECLREMTLHAQFSCPVCS 1854 L++ H+CVE SMH +C +C +YLFDS DITVL CGHT+H ECL+EM H Q++CP+C+ Sbjct 121 ELRERHTCVEKSMHQDCAICMEYLFDSLMDITVLPCGHTLHLECLQEMYKHYQYNCPLCN 180
Query 1855 KSVCDMSSAWERLDQEIAATPMPDAYRNKLVWILCNDCGGSSE 1897 KSVCDMSS W+ +D EIA+ MP+ ++++VWILCNDCG +E Sbjct 181 KSVCDMSSVWKEIDLEIASIQMPEN-QSRMVWILCNDCGAKNE 222
Score = 295 bits (756), Expect = 2e-77, Method: Compositional matrix adjust. Identities = 136/223 (60%), Positives = 165/223 (73%), Gaps = 10/223 (4%)
Query 1269 CPHYRRRCRIRAPCCNEVFGCRHCHNEAKG-EEADPRERHQIRRESIRRVICLLCDTEQD 1327 C HY+R C+IRAPCCNEVF CRHCHN+AK E D +RH+I R + +VIC LCD EQD Sbjct 1 CAHYKRGCKIRAPCCNEVFDCRHCHNDAKSVNEKDDTQRHEIDRRLVEKVICSLCDHEQD 60
Query 1328 VQQVCEGCGVCMGSYFCSKCNLFDDDTDKHQYHCDSCGICRVGGADNFFHCDRCGCCYSV 1387 VQQVCE CGVCMG YFCSKC FDDDT K Q+HCD CGICR+GG DNFFHCDRCGCCYSV Sbjct 61 VQQVCENCGVCMGEYFCSKCKFFDDDTSKRQFHCDKCGICRIGGRDNFFHCDRCGCCYSV 120
Query 1388 ALQGKHVCVERAMHHNCPVCFEFLFDSVKQITVLQCGHTMHADCFNEMRLH------S 1439 L+ +H CVE++MH +C +C E+LFDS+ ITVL CGHT+H +C EM H + Sbjct 121 ELRERHTCVEKSMHQDCAICMEYLFDSLMDITVLPCGHTLHLECLQEMYKHYQYNCPLCN 180
Query 1440 RSVLDLSEYWQTLDKEIAATPMPEALRGKTVWMLCNDCNHKDE 1482 +SV D+S W+ +D EIA+ MPE + + VW+LCNDC K+E Sbjct 181 KSVCDMSSVWKEIDLEIASIQMPEN-QSRMVWILCNDCGAKNE 222
TAIR database results
Sequences producing significant alignments:                     Score (bits)  E value
ref|NP_196717.3| catalytic [Arabidopsis thaliana]               747           0.0
ref|NP_197938.2| zinc finger (C3HC4-type RING finger) famil...  375           e-103
ref|NP_197683.1| zinc finger (C3HC4-type RING finger) famil...  367           e-101
ref|NP_001078621.1| zinc finger (C3HC4-type RING finger) fa...  358           2e-98
ref|NP_197366.1| zinc finger (C3HC4-type RING finger) famil...  338           3e-92
ref|NP_191856.4| protein binding / zinc ion binding [Arabid...  316           1e-85
ref|NP_001031037.1| LAG13 (LAG1 LONGEVITY ASSURANCE HOMOLOG...  301           2e-81
ref|NP_566769.1| LAG1 (Longevity assurance gene 1) [Arabido...  290           7e-78
ref|NP_172815.2| LAG13 (LAG1 LONGEVITY ASSURANCE HOMOLOG 3)...  236           2e-61
ref|NP_177615.2| protein binding / zinc ion binding [Arabid...  202           2e-51
ref|NP_188457.1| EMB2454 (EMBRYO DEFECTIVE 2454); protein b...  191           4e-48
ref|NP_188557.1| LAG1 HOMOLOG 2 (LONGEVITY ASSURANCE GENE1 ...  191           5e-48
ref|NP_173325.2| protein binding / zinc ion binding [Arabid...  190           7e-48
ref|NP_566651.1| zinc finger (C3HC4-type RING finger) famil...  40            0.018
ref|NP_565253.1| RHA2B (RING-H2 FINGER PROTEIN 2B); protein...  39            0.024
ref|NP_191705.1| BRH1 (BRASSINOSTEROID-RESPONSIVE RING-H2);...  39            0.025
ref|NP_178507.1| XERICO; protein binding / zinc ion binding...  39            0.026
ref|NP_973416.1| XERICO; protein binding / zinc ion binding...  39            0.026
ref|NP_567480.2| zinc finger (C3HC4-type RING finger) famil...  39            0.032
ref|NP_188629.1| zinc finger (C3HC4-type RING finger) famil...  39            0.037
ref|NP_177367.1| zinc finger (C3HC4-type RING finger) famil...  39            0.047
ref|NP_974274.1| zinc finger (C3HC4-type RING finger) famil...  39            0.047
ref|NP_188049.1| zinc finger (C3HC4-type RING finger) famil...  38            0.055
ref|NP_565942.1| RHC1A (RING-H2 finger C1A); protein bindin...  38            0.066
ref|NP_973651.1| RHC1A (RING-H2 finger C1A); protein bindin...  38            0.066
ref|NP_973652.1| RHC1A (RING-H2 finger C1A); protein bindin...  38            0.066
>ref|NP_196717.3| catalytic [Arabidopsis thaliana] Length = 982 /note="Dehydrogenases with pyrrolo-quinoline quinone (PQQ)
as cofactor, like ethanol, methanol, and membrane bound glucose dehydrogenases. The alignment model contains an 8-bladed beta-propeller; cl09980"
Score = 747 bits (1928), Expect = 0.0, Method: Composition-based stats. Identities = 409/967 (42%), Positives = 594/967 (61%), Gaps = 108/967 (11%)
Query: 23 AIYEDQVGLWDWHQEYIGKVTHAVFQT------49 ++YEDQ GL DWHQ YIGKV HAVF T Sbjct: 21 SLYEDQAGLTDWHQRYIGKVKHAVFHTQKTGRKRVIVSTEENVVASLDLRHGEIFWRHVL 80
Query: 50 ------ASGKKRVIVATEKSVVASLNLRSGE-IW------76 A GK + +++E S + + NL G+ +W Sbjct: 81 GTKDAIDGVGIALGKYVITLSSEGSTLRAWNLPDGQMVWETSLHTAQHSKSLLSVPINLK 140
Query: 77 ------IVYVVSSADGRLLWTSDLLDERL-AQTSLSFEGKNI-YVAGFSGSSL 121 ++ VS+ DG +LW D E Q L G +I YV GF SS Sbjct: 141 VDKDYPITVFGGGYLHAVSAIDGEVLWKKDFTAEGFEVQRVLQAPGSSIIYVLGFLHSSE 200
Query: 122 AL-FRIDASTGAFTTLKTTEPLNPSSF-----VLSSGVFAALDTQGNIVT--GLMEAEVV 173 A+ ++ID+ +G K+T + P F +SS LD+ +I+ G ++ ++ Sbjct: 201 AVVYQIDSKSGEVVAQKST--VFPGGFSGEISSVSSDKVVVLDSTRSILVTIGFIDGDI- 257
Query: 174 ELQKTSLATLLDSPVSSAQLLPDKIPGGCVLSTDGGSIFVLGLDKKGVEVLQQIQGPPVV 233 QKT ++ L++ +A++L + + + +IFV DK +EV+ + + Sbjct: 258 SFQKTPISDLVEDS-GTAEILSPLLSNMLAVKVNKRTIFVNVGDKGKLEVVDSLSDETAM 316
Query: 234 SNSI-VLDGTFAQSFLQHINSK-EIRVRVLSGKEWIETAEETVEVDPNKGGVQKVFMNAY 291 S+S+ V D A + + H S+ + V++++ + ET+++D N+G V KVFMN Y Sbjct: 317 SDSLPVADDQEAFASVHHEGSRIHLMVKLVNDLNNV-LLRETIQMDQNRGRVHKVFMNNY 375
Query: 292 IKTDRSRGFRVLIVGQDHSLALLQQGKVVWSREEALASVVDTLTAELPLEKAGVSVAEVE 351 I+TDRS GFR LIV +DHSL LLQQG +VWSREE LASV D TAELPLEK GVSVA+VE Sbjct: 376 IRTDRSNGFRALIVMEDHSLLLLQQGAIVWSREEGLASVTDVTTAELPLEKDGVSVAKVE 435
Query: 352 HDLYEWLKGHVLRLKSTLMLATAEEQTALQALRLNNADKTKMTRDHNGFRKLIVVLTSSG 411 H L+EWLKGHVL+LK +L+LA+ E+ A+Q LR+ ++ K K+TRDHNGFRKLI+ LT +G Sbjct: 436 HTLFEWLKGHVLKLKGSLLLASPEDVVAIQDLRVKSSGKNKLTRDHNGFRKLILALTRAG 495
Query: 412 KLFALHTGNGGIVWSRFIPELSTKGS------LKLYPWRIPHKH-VDENAVALVL---GS 461 KLFALHTG+G IVWS + S S + LY W++PH H +DEN LV+ GS Sbjct: 496 KLFALHTGDGRIVWSMLLNSPSQSQSCERPNGVSLYQWQVPHHHAMDENPSVLVVGKCGS 555
Query: 462 SHDGTGFAAWVDMLTGSVQETLALPYSVKVALALPVVDSSERRLHLLIDDQNKAHLYPTS 521 G ++VD+ TG + + +SV + LP+ DS E+RLHL+ D HLYP + Sbjct: 556 DSSAPGVLSFVDVYTGKEISSSDIGHSVVQVMPLPITDSKEQRLHLIADTVGHVHLYPKT 615
Query: 522 DESLSLFEKYMQNVYFYIADKEAGQIEGYNIKSQVDAGEEGGLVFQSQKIWSVLFPKDSE 581 E+LS+F++ QNVY+Y + + G I G+ +K F ++++W+V+FP +SE Sbjct: 616 SEALSIFQREFQNVYWYTVEADDGIIRGHVMKGSCSGETADEYCFTTRELWTVVFPSESE 675
Query: 582 TIAAITTRRADEMVHTQAKVLGNRDVWYKYLNKNMVFVATVTPQDS-RVGAANPEETWLV 640 I + TR+ +E+VHTQAKV ++D+ YKY+++N++FVATV+P+ + +G+ PEE+ LV Sbjct: 676 KIISTLTRKPNEVVHTQAKVNTDQDLLYKYVSRNLLFVATVSPKGAGEIGSVTPEESSLV 735
Query: 641 AYLIDSVTGQILHRVSHAHAQGPVHVVFSENWVVYCYFNVRNHRHEMSVLEVYDKS-ADG 699 YLID++TG+ILHR+SH QGPVH VFSENWVVY YFN+R H++E++V+E+YD+S A+ Sbjct: 736 VYLIDTITGRILHRLSHQGCQGPVHAVFSENWVVYHYFNLRAHKYEVTVVEIYDQSRAEN 795
Query: 700 KDVLQLMLGRYNASVPFSSFSPRNLEVKGQSYFFPSTVRTMSVTFTARGITGKQILVGTI 759 K+V +L+LG++N + P +S+S + K QSYFF +V+T++VT TA+GIT KQ+L+GTI Sbjct: 796 KNVWKLILGKHNLTAPITSYSRPEVFTKSQSYFFAQSVKTIAVTSTAKGITSKQLLIGTI 855
Query: 760 GNQVIALDKRFLDPRRSADPTPMEREEGVIPLSEGLPLFPQSYLTHAARVEELRGIISVP 819 G+Q++ALDKRF+DPRR+ +P+ E+EEG+IPL++ LP+ PQ+Y+TH+ +VE LRGI++ P Sbjct: 856 GDQILALDKRFVDPRRTLNPSQAEKEEGIIPLTDTLPIIPQAYVTHSHKVEGLRGIVTAP 915
Query: 820 ARLESTCLVFAYGIDLFFTRTAPSRTYDSLTEDFSYXXXXXXXXXXXXXXXXXXXXSQRK 879 ++LEST VFAYG+DLF+TR APS+TYDSLT+DFSY S++K Sbjct: 916 SKLESTTHVFAYGVDLFYTRLAPSKTYDSLTDDFSYALLLITIVALVAAIYITWVLSEKK 975
Query: 880 ELREKWK 886 EL EKW+ Sbjct: 976 ELSEKWR 982
>ref|NP_197938.2| zinc finger (C3HC4-type RING finger) family protein [Arabidopsis thaliana] Length = 308
Score = 375 bits (962), Expect = e-103, Method: Composition-based stats. Identities = 160/272 (58%), Positives = 199/272 (73%), Gaps = 5/272 (1%)
Query: 1630 ESGLHSTLSHQIEIATAAEVFSQESLARGVEEQI--RALKEGVMEYGCAHYRRRCLIRAP 1687 E G S SH I +E +L R E + + L G+MEYGC HYRRRC IRAP Sbjct: 19 EKGEMSRHSHPHSINEESE---SSTLERVAAESLTNKVLDRGLMEYGCPHYRRRCCIRAP 75
Query: 1688 CCNGIFNCRHCHNEAMNANEADPSKRHDLPRHKVERVICSLCGLEQDVHQVCSGCGVSMG 1747 CCN IF C HCH EA N D +RHD+PRH+VE+VIC LCG EQ+V Q+C CGV MG Sbjct: 76 CCNEIFGCHHCHYEAKNNINVDQKQRHDIPRHQVEQVICLLCGTEQEVGQICIHCGVCMG 135
Query: 1748 DYYCSICRFFDDDVSKGQFHCDSCGICRVGGQEKFFHCDKCGCCYAVALQKGHSCVENSM 1807 Y+C +C+ +DDD SK Q+HCD CGICR+GG+E FFHC KCGCCY++ L+ GH CVE +M Sbjct: 136 KYFCKVCKLYDDDTSKKQYHCDGCGICRIGGRENFFHCYKCGCCYSILLKNGHPCVEGAM 195
Query: 1808 HHNCPVCFDYLFDSTSDITVLRCGHTIHSECLREMTLHAQFSCPVCSKSVCDMSSAWERL 1867 HH+CP+CF++LF+S +D+TVL CGHTIH +CL EM H Q++CP+CSKSVCDMS WE+ Sbjct: 196 HHDCPICFEFLFESRNDVTVLPCGHTIHQKCLEEMRDHYQYACPLCSKSVCDMSKVWEKF 255
Query: 1868 DQEIAATPMPDAYRNKLVWILCNDCGGSSEGQ 1899 D EIAATPMP+ Y+N++V ILCNDCG +E Q Sbjct: 256 DMEIAATPMPEPYQNRMVQILCNDCGKKAEVQ 287
Score = 323 bits (829), Expect = 5e-88, Method: Composition-based stats. Identities = 138/226 (61%), Positives = 164/226 (72%), Gaps = 9/226 (3%)
Query: 1266 KYGCPHYRRRCRIRAPCCNEVFGCRHCHNEAKGE-EADPRERHQIRRESIRRVICLLCDT 1324 +YGCPHYRRRC IRAPCCNE+FGC HCH EAK D ++RH I R + +VICLLC T Sbjct: 60 EYGCPHYRRRCCIRAPCCNEIFGCHHCHYEAKNNINVDQKQRHDIPRHQVEQVICLLCGT 119
Query: 1325 EQDVQQVCEGCGVCMGSYFCSKCNLFDDDTDKHQYHCDSCGICRVGGADNFFHCDRCGCC 1384 EQ+V Q+C CGVCMG YFC C L+DDDT K QYHCD CGICR+GG +NFFHC +CGCC Sbjct: 120 EQEVGQICIHCGVCMGKYFCKVCKLYDDDTSKKQYHCDGCGICRIGGRENFFHCYKCGCC 179
Query: 1385 YSVALQGKHVCVERAMHHNCPVCFEFLFDSVKQITVLQCGHTMHADCFNEMRLH------1438 YS+ L+ H CVE AMHH+CP+CFEFLF+S +TVL CGHT+H C EMR H Sbjct: 180 YSILLKNGHPCVEGAMHHDCPICFEFLFESRNDVTVLPCGHTIHQKCLEEMRDHYQYACP 239
Query: 1439 --SRSVLDLSEYWQTLDKEIAATPMPEALRGKTVWMLCNDCNHKDE 1482 S+SV D+S+ W+ D EIAATPMPE + + V +LCNDC K E Sbjct: 240 LCSKSVCDMSKVWEKFDMEIAATPMPEPYQNRMVQILCNDCGKKAE 285
>ref|NP_197683.1| zinc finger (C3HC4-type RING finger) family protein [Arabidopsis thaliana] Length = 291
Score = 367 bits (943), Expect = e-101, Method: Composition-based stats. Identities = 149/228 (65%), Positives = 184/228 (80%)
Query: 1669 GVMEYGCAHYRRRCLIRAPCCNGIFNCRHCHNEAMNANEADPSKRHDLPRHKVERVICSL 1728 G YGC+HYRRRC IRAPCC+ IF+CRHCHNEA ++ + RH+LPRH+V +VICSL Sbjct: 21 GSGHYGCSHYRRRCKIRAPCCDEIFDCRHCHNEAKDSLHIEQHHRHELPRHEVSKVICSL 80
Query: 1729 CGLEQDVHQVCSGCGVSMGDYYCSICRFFDDDVSKGQFHCDSCGICRVGGQEKFFHCDKC 1788 C EQDV Q CS CGV MG Y+CS C+FFDDD+SK Q+HCD CGICR GG+E FFHC +C Sbjct: 81 CETEQDVQQNCSNCGVCMGKYFCSKCKFFDDDLSKKQYHCDECGICRTGGEENFFHCKRC 140
Query: 1789 GCCYAVALQKGHSCVENSMHHNCPVCFDYLFDSTSDITVLRCGHTIHSECLREMTLHAQF 1848 CCY+ ++ H CVE +MHHNCPVCF+YLFDST DITVLRCGHT+H EC ++M LH ++ Sbjct: 141 RCCYSKIMEDKHQCVEGAMHHNCPVCFEYLFDSTRDITVLRCGHTMHLECTKDMGLHNRY 200
Query: 1849 SCPVCSKSVCDMSSAWERLDQEIAATPMPDAYRNKLVWILCNDCGGSS 1896 +CPVCSKS+CDMS+ W++LD+E+AA PMP Y NK+VWILCNDCG ++ Sbjct: 201 TCPVCSKSICDMSNLWKKLDEEVAAYPMPKMYENKMVWILCNDCGSNT 248
Score = 325 bits (832), Expect = 3e-88, Method: Composition-based stats. Identities = 137/220 (62%), Positives = 162/220 (73%), Gaps = 9/220 (4%)
Query: 1267 YGCPHYRRRCRIRAPCCNEVFGCRHCHNEAKGE-EADPRERHQIRRESIRRVICLLCDTE 1325 YGC HYRRRC+IRAPCC+E+F CRHCHNEAK + RH++ R + +VIC LC+TE Sbjct: 25 YGCSHYRRRCKIRAPCCDEIFDCRHCHNEAKDSLHIEQHHRHELPRHEVSKVICSLCETE 84
Query: 1326 QDVQQVCEGCGVCMGSYFCSKCNLFDDDTDKHQYHCDSCGICRVGGADNFFHCDRCGCCY 1385 QDVQQ C CGVCMG YFCSKC FDDD K QYHCD CGICR GG +NFFHC RC CCY Sbjct: 85 QDVQQNCSNCGVCMGKYFCSKCKFFDDDLSKKQYHCDECGICRTGGEENFFHCKRCRCCY 144
Query: 1386 SVALQGKHVCVERAMHHNCPVCFEFLFDSVKQITVLQCGHTMHADCFNEMRLHSR----- 1440 S ++ KH CVE AMHHNCPVCFE+LFDS + ITVL+CGHTMH +C +M LH+R Sbjct: 145 SKIMEDKHQCVEGAMHHNCPVCFEYLFDSTRDITVLRCGHTMHLECTKDMGLHNRYTCPV 204
Query: 1441 ---SVLDLSEYWQTLDKEIAATPMPEALRGKTVWMLCNDC 1477 S+ D+S W+ LD+E+AA PMP+ K VW+LCNDC Sbjct: 205 CSKSICDMSNLWKKLDEEVAAYPMPKMYENKMVWILCNDC 244
>ref|NP_001078621.1| zinc finger (C3HC4-type RING finger) family protein [Arabidopsis thaliana] Length = 328
Score = 358 bits (919), Expect = 2e-98, Method: Composition-based stats. Identities = 151/258 (58%), Positives = 189/258 (73%), Gaps = 5/258 (1%)
Query: 1630 ESGLHSTLSHQIEIATAAEVFSQESLARGVEEQI--RALKEGVMEYGCAHYRRRCLIRAP 1687 E G S SH I +E +L R E + + L G+MEYGC HYRRRC IRAP Sbjct: 19 EKGEMSRHSHPHSINEESE---SSTLERVAAESLTNKVLDRGLMEYGCPHYRRRCCIRAP 75
Query: 1688 CCNGIFNCRHCHNEAMNANEADPSKRHDLPRHKVERVICSLCGLEQDVHQVCSGCGVSMG 1747 CCN IF C HCH EA N D +RHD+PRH+VE+VIC LCG EQ+V Q+C CGV MG Sbjct: 76 CCNEIFGCHHCHYEAKNNINVDQKQRHDIPRHQVEQVICLLCGTEQEVGQICIHCGVCMG 135
Query: 1748 DYYCSICRFFDDDVSKGQFHCDSCGICRVGGQEKFFHCDKCGCCYAVALQKGHSCVENSM 1807 Y+C +C+ +DDD SK Q+HCD CGICR+GG+E FFHC KCGCCY++ L+ GH CVE +M Sbjct: 136 KYFCKVCKLYDDDTSKKQYHCDGCGICRIGGRENFFHCYKCGCCYSILLKNGHPCVEGAM 195
Query: 1808 HHNCPVCFDYLFDSTSDITVLRCGHTIHSECLREMTLHAQFSCPVCSKSVCDMSSAWERL 1867 HH+CP+CF++LF+S +D+TVL CGHTIH +CL EM H Q++CP+CSKSVCDMS WE+ Sbjct: 196 HHDCPICFEFLFESRNDVTVLPCGHTIHQKCLEEMRDHYQYACPLCSKSVCDMSKVWEKF 255
Query: 1868 DQEIAATPMPDAYRNKLV 1885 D EIAATPMP+ Y+N++V Sbjct: 256 DMEIAATPMPEPYQNRMV 273
Score = 310 bits (794), Expect = 7e-84, Method: Composition-based stats. Identities = 131/214 (61%), Positives = 156/214 (72%), Gaps = 9/214 (4%)
Query: 1266 KYGCPHYRRRCRIRAPCCNEVFGCRHCHNEAKGE-EADPRERHQIRRESIRRVICLLCDT 1324 +YGCPHYRRRC IRAPCCNE+FGC HCH EAK D ++RH I R + +VICLLC T Sbjct: 60 EYGCPHYRRRCCIRAPCCNEIFGCHHCHYEAKNNINVDQKQRHDIPRHQVEQVICLLCGT 119
Query: 1325 EQDVQQVCEGCGVCMGSYFCSKCNLFDDDTDKHQYHCDSCGICRVGGADNFFHCDRCGCC 1384 EQ+V Q+C CGVCMG YFC C L+DDDT K QYHCD CGICR+GG +NFFHC +CGCC Sbjct: 120 EQEVGQICIHCGVCMGKYFCKVCKLYDDDTSKKQYHCDGCGICRIGGRENFFHCYKCGCC 179
Query: 1385 YSVALQGKHVCVERAMHHNCPVCFEFLFDSVKQITVLQCGHTMHADCFNEMRLH------1438 YS+ L+ H CVE AMHH+CP+CFEFLF+S +TVL CGHT+H C EMR H Sbjct: 180 YSILLKNGHPCVEGAMHHDCPICFEFLFESRNDVTVLPCGHTIHQKCLEEMRDHYQYACP 239
Query: 1439 --SRSVLDLSEYWQTLDKEIAATPMPEALRGKTV 1470 S+SV D+S+ W+ D EIAATPMPE + + V Sbjct: 240 LCSKSVCDMSKVWEKFDMEIAATPMPEPYQNRMV 273
>ref|NP_566769.1| LAG1 (Longevity assurance gene 1) [Arabidopsis thaliana] Length = 310 /note="Identical to LAG1 longevity assurance homolog 1 (LAG1) [Arabidopsis Thaliana] (GB:Q9LDF2); similar to LAG13 (LAG1 LONGEVITY ASSURANCE HOMOLOG 3) [Arabidopsis thaliana] (TAIR:AT1G13580.2); similar to Lag1 longevity assurance-like 3 [Brassica rapa] (GB:ABV89617.1); contains InterPro domain TRAM, LAG1 and CLN8 homology; (InterPro:IPR006634); contains InterPro domain Longevity assurance proteins LAG1/LAC1; (InterPro:IPR016439); contains InterPro domain Longevity-assurance protein (LAG1); (InterPro:IPR005547)"
Score = 290 bits (742), Expect = 7e-78, Method: Composition-based stats. Identities = 144/292 (49%), Positives = 186/292 (63%), Gaps = 6/292 (2%)
Query: 898 REIDPSFWDLVTLAPIFAIGFPVCRFFLDRFVLEKLSRKSVFGTHESKLRKLSDADRDAL 957 +E P++ DL L P+FA+ FP RF LDRFV EKL+ ++G + + + DR Sbjct: 14 QESFPTYQDLGFL-PLFAVFFPTIRFLLDRFVFEKLASLVIYGRMSTN-KSDNIKDRKKN 71
Query: 958 RKTQTKFKESGWKCVYYTTAEIFALYVTYNETWLTDSYSIWVGPGDQTWPNQTIKVKLKL 1017 KFKES WKC+YY +AE+ AL VTYNE W +++ W+GPGDQ WP+Q +K+KLK Sbjct: 72 SPKVRKFKESAWKCIYYLSAELLALSVTYNEPWFSNTLYFWIGPGDQIWPDQPMKMKLKF 131
Query: 1018 LXXXXXXXXXXXXXXLIFWETRRKDFGVGSFNILVEPVKFVVLYFGASRFARIGCVVLAL 1077 L L+FWETRR DFGV + + V V+ Y R R G V+LAL Sbjct: 132 LYMFAAGFYTYSIFALVFWETRRSDFGVSMGHHITTLVLIVLSYI--CRLTRAGSVILAL 189
Query: 1078 HDASDVFLELAKMSKYAGVRVVPDVLFGLFALSWVLLRLIYFPVWVIWGTSYLSIKAINI 1137 HDASDVFLE+ KMSKY G + + F LFALSWV+LRLIY+P W++W TSY I ++ Sbjct: 190 HDASDVFLEIGKMSKYCGAESLASISFVLFALSWVVLRLIYYPFWILWSTSYQIIMTVDK 249
Query: 1138 HLHRGYGPIYYYVTNTLLISLFVLHIYWWVLIYRMIVKQIR-AGVIGDDVRS 1188 H GPI YY+ NTLL L VLHI+WWVLIYRM+VKQ++ G + +DVRS Sbjct: 250 EKHPN-GPILYYMFNTLLYFLLVLHIFWWVLIYRMLVKQVQDRGKLSEDVRS 300
Multalign to Arabidopsis Zinc-finger protein C3HC4-type RING finger
The Blast results indicate that this gene prediction model is actually composed of at least 3 different proteins, not just one. It seems to consist mainly of zinc-finger type proteins. This is one of the better Blast hits (using the TAIR database), and it aligns well with the gene prediction model.
Multalign to Arabidopsis catalytic gene
Multalign to Physcomitrella unknown protein
These two Multalign alignments are fairly good. Both the Physcomitrella and Arabidopsis proteins have an exon (or an extension of an exon) that is not present in the gene model prediction.
Multalign Arabidopsis catalytic gene to Physcomitrella unknown protein
This alignment shows that the two sequences that came up as strong Blast hits are very similar to each other. Perhaps the Physcomitrella unknown protein has a function similar to that of the Arabidopsis protein?
Multalign to Arabidopsis longevity assurance protein
This alignment is also pretty good. There are numerous regions that seem to be conserved.
Based on the Blast results and Multalign alignments, it seems as though this gene prediction model is composed of 3 genes: a long gene with some catalytic function, a longevity assurance gene, and a zinc finger protein.
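The multi-gene interpretation can be checked against the query coordinates of the TAIR alignments shown earlier: HSPs that overlap on the query collapse into one region, and regions that stay separate suggest distinct genes fused into one model. The following is a minimal sketch using the query ranges from those alignments; the function name merge_intervals is hypothetical, and this is not a step from the original analysis.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) ranges and return them sorted by start."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if this range ends later.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# (subject accession, query start, query end) read from the TAIR alignments.
hsps = [
    ("NP_196717.3", 23, 886),       # catalytic
    ("NP_566769.1", 898, 1188),     # LAG1
    ("NP_197938.2", 1630, 1899),    # zinc finger, first HSP
    ("NP_197938.2", 1266, 1482),    # zinc finger, second HSP
    ("NP_197683.1", 1669, 1896),
    ("NP_197683.1", 1267, 1477),
    ("NP_001078621.1", 1630, 1885),
    ("NP_001078621.1", 1266, 1470),
]

regions = merge_intervals([(start, end) for _, start, end in hsps])
for start, end in regions:
    print(f"query {start}-{end}")
```

The merged regions are 23-886 (catalytic hit), 898-1188 (LAG1 hit), and two separate zinc finger regions at 1266-1482 and 1630-1899, which matches each zinc finger hit aligning to the model twice.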