BIT150 Fall 2008 Homework 3


Due on Thursday, October 15th, by email to the TA ([email protected]) as Hwk3_Lastname, BEFORE the lab.

1. 35 points The following is a multiple sequence alignment of a 41-bp fragment from a putative plant cytochrome P450 gene from rice, maize, sorghum, and rye:

1.1. Using the Jukes and Cantor 1-parameter model shown below,

      A  C  G  T
   A  -  1  1  1
   C  1  -  1  1
   G  1  1  -  1
   T  1  1  1  -

- Calculate pair-wise distances between the sequences and construct BY HAND a distance matrix.
- Show your calculations.
- Present the distance matrix.
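As a way to check hand calculations, the pair-wise counting step can be sketched in Python. This is a minimal sketch that assumes the unit-cost matrix above means every mismatched position adds 1 to the distance; the function names and the toy alignment are hypothetical and are not the homework sequences.

```python
from itertools import combinations

def pairwise_differences(a, b):
    """Count aligned positions where two sequences differ (cost 1 per mismatch)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)

def distance_matrix(seqs):
    """Return {(name1, name2): distance} for every unordered pair of sequences."""
    return {
        (n1, n2): pairwise_differences(seqs[n1], seqs[n2])
        for n1, n2 in combinations(seqs, 2)
    }

# Hypothetical toy alignment, NOT the homework data.
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACCTACGA",
    "s4": "TCCTAGGA",
}
dm = distance_matrix(toy)
for pair, d in dm.items():
    print(pair, d)
```

Only the upper triangle is stored because the distance is symmetric and the diagonal is zero.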

1.2. Using the distance-based method UPGMA,
- Construct BY HAND a phylogenetic tree based on the distance matrix created in 1.1.
- Provide distances for all the branches.
- Include all your intermediate matrices.
- Show your calculations.
- Manually draw the phylogenetic tree.
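The UPGMA steps can be sketched as follows, which may help when checking intermediate matrices. This is a minimal sketch under stated assumptions: distances are keyed by frozenset pairs, ties are broken by dictionary order, and the toy matrix is hypothetical rather than the matrix from 1.1.

```python
def upgma(names, dist):
    """names: list of leaf labels; dist: {frozenset({a, b}): distance}.
    Returns the merges in order as (cluster_a, cluster_b, node_height)."""
    clusters = {n: 1 for n in names}      # cluster label -> number of leaves
    d = dict(dist)                        # working copy of the distance matrix
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of current clusters
        a, b = sorted(pair)
        height = d[pair] / 2              # node sits halfway between the two clusters
        merges.append((a, b, height))
        new = f"({a},{b})"
        na, nb = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # average weighted by the number of leaves in a and b.
        for c in clusters:
            dac = d.pop(frozenset({a, c}))
            dbc = d.pop(frozenset({b, c}))
            d[frozenset({new, c})] = (na * dac + nb * dbc) / (na + nb)
        del d[pair]
        clusters[new] = na + nb
    return merges

# Hypothetical toy distance matrix, NOT the homework data.
toy_dist = {
    frozenset({"A", "B"}): 2,
    frozenset({"A", "C"}): 4,
    frozenset({"B", "C"}): 4,
    frozenset({"A", "D"}): 6,
    frozenset({"B", "D"}): 6,
    frozenset({"C", "D"}): 6,
}
merges = upgma(["A", "B", "C", "D"], toy_dist)
for step in merges:
    print(step)
```

Each branch length is the height of the parent node minus the height of the child node, with leaves at height 0.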

2. 10 points Sequences from the flavonoid 3’ hydroxylase gene, Fop1, are provided below.

>Triticum monococcum
MDHSVLLLLASLAAVAVAAVWHLRSHGRRTKLPLPPGPRGWPVLGNLPQLGAMPHHTMAALARQHGPLFRLRFGSVEVVVAASAKVARS
FLRAHDANFSDRPPTSGAEHLAYNYQDLVFAPYGARWRALRKLCALHLFSARALDALRTIRQDEARLMVTHLLSSSSPAGVAVNLCAIN
VCATNALARAAIGRRMFGDGVGEGAREFKDMVVELMQLAGVLNIGDFVPALRWLDPQGVVAKMKRLHRRYDRMMDGFISERGQHAGEME
GNDLLSVMLATMRWQSPADAGEEDGIKFTEIDIKALLLNLFTAGTDTTSSTVEWALAELIRDPCILKQLQHELDGVVGNDRLVTEADLP
RLTFLAAVIKETFRLHPATPLSLPRVAAEDCEVDGYHVSKGTTLIMNVWAIARDPASWGPDPLEFRPVRFLPGGLHESADVKGGDYELI
PFGAGRRICAGLGWGLRMVTLMTAMLVHAFDWSLVDGTTPEKLNMEEAYGQTLQRAVPLVVQPVPRLLSSAYTV

>Zea mays
MCAMAREYGPLFRLRFGSAEVVVAASARVAAQFLRAHDANFSNRPPNSGAEHVAYNYQDLVFAPYGSRWRALRKLCALHLFSAKALDDL
RGVREGEVALMVRELARQGERGRAAVALGQVANVCATNTLARATVGRRVFAVDGGEGAREFKEMVVELMQLAGVFNVGDFVPALAWLDP
QGVVGRMKRLHRRYDDMMNGIIRERKAAEEGKDLLSVLLARMREQQPLAEGDDTRFNETDIKALLLNLFTAGTDTTSSTVEWALAELIR
HPDVLRKAQQELDAVVGRDRLVSESDLPRLTYLTAVIKETFRLHPSTPLSLPRVAAEECEVDGFRIPAGTTLLVNVWAIARDPEAWPEP
LEFRPARFLPGGSHAGVDVKGSDFELIPFGAGRRICAGLSWGLRMVTLMTATLVHALDWDLADGMTADKLDMEEAYGLTLQRAVPLMVR
PAPRLLPSAYAE

>Oryza sativa
MDVVPLPLLLGSLAVSAAVWYLVYFLRGGSGGDAARKRRPLPPGPRGWPVLGNLPQLGDKPHHTMCALARQYGPLFRLRFGCAEVVVAA
SAPVAAQFLRGHDANFSNRPPNSGAEHVAYNYQDLVFAPYGARWRALRKLCALHLFSAKALDDLRAVREGEVALMVRNLARQQAASVAL
GQEANVCATNTLARATIGHRVFAVDGGEGAREFKEMVVELMQLAGVFNVGDFVPALRWLDPQGVVAKMKRLHRRYDNMMNGFINERKAG
AQPDGVAAGEHGNDLLSVLLARMQEEQKLDGDGEKITETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLKEAQHELDTVVGRGR
LVSESDLPRLPYLTAVIKETFRLHPSTPLSLPREAAEECEVDGYRIPKGATLLVNVWAIARDPTQWPDPLQYQPSRFLPGRMHADVDVK
GADFGLIPFGAGRRICAGLSWGLRMVTLMTATLVHGFDWTLANGATPDKLNMEEAYGLTLQRAVPLMVQPVPRLLPSAYGV

>Sorghum bicolor
MDVPLPLLLGSLAVSVVVWCLLLRRGGNGKGKGKRPLPPGPRGWPVLGNLPQVGSHPHHTMCALAKEYGPLFRLRFGSAEVVVAASARV
AAQFLRAHDANFSNRPPNSGAEHVAYNYQDLVFAPYGSRWRALRKLCALHLFSAKALDDLRGVREGEVALMVRELARHQHQHAGVPLGQ
VANVCATNTLARATVGRRVFAVDGGEEAREFKDMVVELMQLAGVFNVGDFVPALAWLDLQGVVGKMKRLHRRYDDMMNGIIRERKAVEE
GKDLLSVLLARMREQQSLADGEDSMINETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLKKAQEELDAVVGRDRLVSESDLPRL
TYLTAVIKETFRLHPSTPLSLPRVAAEECEVDGFRIPAGTTLLVNVWAIARDPEAWPEPLQFRPDRFLPGGSHAGVDVKGSDFELIPFG
AGRRICAGLSWGLRMVTLMTATLVHALDWDLADGMTAYKLDMEEAYGLTLQRAVPLMVRPAPRLLPSAYAAE

>Phyllostachys edulis
MDLPLPLVLSTLAVSAIVCYVLFFRAGKARRRAPLPPGPRGWPVLGNLPQLGGKTHQTLHVMTKVYGPLLRLRFGSSDVVVAGSAAVAE
QFLRIHDAKFSNRPPNSGGEHMAYNYQDVVFGPYGPRWRAMRKVCAVNLFSARALDDLRAVRERETALMVRSLVEASAPRGAPAVPLGK
AVNVCTTNALSRAAVGRRVFAAGSEVAKEFKEIVLEVMQVGGVLNVGDFVPALRWLDPQGVVAKMKKLHRRYDDMMNAIIGERRAGVKP
AGEEGKDLLGLLLAMMQEEQPLAGGEEDKITDTDIKALTLVS

2.1. Construct phylogenetic trees using NJ and UPGMA methods:
- Use Number of differences as the substitution model.
- Use bootstrap as the test of inferred phylogeny, with 1,000 replications.
- Present the trees in your homework.
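The resampling step behind a bootstrap test can be sketched as below. This is a minimal sketch, assuming the standard column-resampling bootstrap; the function name and toy alignment are hypothetical, and the tree-building step that would be run on each replicate is left out.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement.
    The same sampled columns are used for every sequence, so each
    pseudo-replicate is still a valid alignment of the original length."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}

# Hypothetical toy alignment, NOT the homework data.
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACCTACGA",
}
rng = random.Random(0)                   # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(replicates[0])
```

A tree would then be built from each of the 1,000 replicates, and the number of replicate trees containing a given group would be tallied for that group.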

2.2. What do the bootstrap values indicate in these trees?

3. 15 points From the following trees (A, B, C, D):

3.1. Construct BY HAND:
- a strict consensus tree (groups present in ALL trees);
- a 50% majority-rule consensus tree (groups in >50% of the trees).

3.2. What are consensus trees used for?
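The two consensus rules in 3.1 can be sketched by treating each tree as a set of groups (clades). This is a minimal sketch; the representation, the function names, and the four toy trees are hypothetical and are not trees A to D from the figure.

```python
from collections import Counter

def clade_counts(trees):
    """Count in how many trees each group (a frozenset of leaf names) appears."""
    return Counter(clade for tree in trees for clade in tree)

def strict_consensus(trees):
    """Groups present in ALL trees."""
    return {c for c, n in clade_counts(trees).items() if n == len(trees)}

def majority_consensus(trees):
    """Groups present in strictly more than 50% of the trees."""
    return {c for c, n in clade_counts(trees).items() if n > len(trees) / 2}

fs = frozenset
# Hypothetical toy trees on leaves a..e, each given as its set of non-trivial groups.
t1 = {fs("ab"), fs("abc"), fs("de")}
t2 = {fs("ab"), fs("abc"), fs("de")}
t3 = {fs("ab"), fs("abd"), fs("ce")}
t4 = {fs("ab"), fs("de"), fs("cde")}
trees = [t1, t2, t3, t4]

strict = strict_consensus(trees)
majority = majority_consensus(trees)
print(sorted("".join(sorted(c)) for c in strict))
print(sorted("".join(sorted(c)) for c in majority))
```

In this toy set the group {a, b, c} appears in exactly 2 of 4 trees, so it fails the ">50%" rule and is left out of the majority-rule consensus.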

4. 10 points From the following induced multiple sequence alignment:

Induced multiple sequence alignment of a segment of the ‘4-coumarate Co-A Ligase’ gene (‘-’ indicates a gap).

H1  T C T A C T G A C
H2  A C - A C G G A C
H3  A C T A C G A A T
H4  A C T G T G - - C

4.1. Calculate BY HAND the ‘sum-of-pairs’ distance score, scoring transitions (A<->G and C<->T) as 1 unit of distance and transversions as 2 units of distance (Kimura 2-parameter model), with affine gap penalties: gap opening 3; gap extension 1.

      A  C  G  T
   A  -  2  1  2
   C  2  -  2  1
   G  1  2  -  2
   T  2  1  2  -

- Indicate all your calculations within the table provided:

Kimura 2-parameter
   H1 vs. H2:
   H1 vs. H3:
   H1 vs. H4:
   H2 vs. H3:
   H2 vs. H4:
   H3 vs. H4:
   Sum of Pairs:
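The sum-of-pairs scoring can be sketched as below for checking hand work. This is a minimal sketch under two labeled assumptions that the handout does not spell out: a gap run of length k costs opening + (k - 1) x extension, and columns where both sequences of a pair carry a gap are dropped from that pair. The function names and the toy alignment are hypothetical, not the H1 to H4 data.

```python
from itertools import combinations

PURINES = {"A", "G"}

def substitution_cost(x, y):
    """0 for a match, 1 for a transition (A<->G, C<->T), 2 for a transversion."""
    if x == y:
        return 0
    same_class = (x in PURINES) == (y in PURINES)
    return 1 if same_class else 2

def pair_score(a, b, gap_open=3, gap_extend=1):
    """Distance between two rows of an induced alignment.
    ASSUMPTION: first gap position costs gap_open, each further position costs gap_extend."""
    score = 0
    prev_gap = None                      # row that had a gap in the previous kept column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue                     # ASSUMPTION: gap-gap columns are ignored for this pair
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            score += gap_extend if side == prev_gap else gap_open
            prev_gap = side
        else:
            score += substitution_cost(x, y)
            prev_gap = None
    return score

def sum_of_pairs(seqs):
    """Add the pair scores over every unordered pair of rows."""
    return sum(pair_score(seqs[p], seqs[q]) for p, q in combinations(seqs, 2))

# Hypothetical toy alignment, NOT the homework data.
toy = {"t1": "ACGT-A", "t2": "AT--GA", "t3": "GCG--A"}
total = sum_of_pairs(toy)
print(total)
```

If the course convention charges extension on the first gap position as well, only the gap branch in `pair_score` would need to change.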

5. 30 points Given the following 6 CCT domain protein sequences:

>T._urartu_ZCCT1
MSMSCGLCGANNCPRLMVSPIHHRHHHHQEHQLREHQFFAQGNHHHHHPVPLPPANFDHSRTWTTPFHETAAAGNSSRLTLEVGAGGRP
MAHLVQPPARAHIVPFYGGAFTNTISNEAIMTIDTEMMVGPAHYPTMQERAAKVMRYREKRKRRRYDKQIRYESRKAYAELRPRVNGRF
VKVPEAMASPSSPASPYDPSKLHLRWFR

>Ae._tauschii_ZCCT-D1
MSMSCGLCGPNNCPRLMVSPIHHHHHQEHQLREHQFFAQGNHHHQHHGAAADHPVPLPPANFDHRRTWTTPFHETAAAGSSISRLTLEV
GAGGRHMAHLSSARAHIVPFYGGAFTNTISNEAIMTIDTEMMVGPAHYPTMQERAAKVMRYREKRKRRRYDKQIRYESRKAYAELRPRV
NGRFVKVPEAMASPSSPASPYDPSKLHLGWLR

>ZCCT-S2_Ae._speltoides
MSMSCGLCGASNCPHHMISPVLQHHQEHGLREYQFFAQGHHHHHHDGTAADYPPPPPANCHHCKSWTTPFHETAAAGNSSRLTLEVDAG
GQHLAHLLQPPAPPRATIVPFREGAFTSTISNATIMTIDTEMMVGAAHNPTMQERHAKVMRYREKRKRRRYDKQIRYESRKAYAKLRPR
VNGRFVKVPEAAVSPSPPASPYDPSKLNLGLFR

>ZCCT2_T._tauschii
MSMSCGLCGASNCPHHMNSPVLHHHHHHQEHRLCEYQFFAQGQHHHHHGAAADYPPPPPANCHHRRSWTTPFHETAAAGNSSRLTLEVD
AGGQHTAHLLQPPAPPRATIVPFCGGAFTSTISNATIRTIDTEMMVGAAHNPTMQEREAKVMRYREKRKRRRYDKQIRYESRKAYAELR
PRVNGRFVKVPEATASPSPPTSPYDPSKLHLGWFR

>Os_AAL7978
MSAASGAACGVCGGGVGECGCLLHQRRGGGGGGGGGGVRCGIAADLNRGFPAIFQGVGVEETAVEGDGGAQPAAGLQEFQFFGHDDHDS
VAWLFNDPAPPGGTDHQLHRQTAPMAVGNGAAAAQQRQAFDAYAQYQPGHGLTFDVPLTRGEAAAAVLEASLGLGGAGAGGRNPATSSS
TIMSFCGSTFTDAVSSIPKDHAAAAAVVANGGLSGGGGDPAMDREAKVMRYKEKRKRRRYEKQIRYASRKAYAEMRPRVKGRFAKVPDG
ELDGATPPPPSSAAGGGYEPGRLDLGWFRS

>OSI Os_AP005307
MGMANEESPNYQVKKGGRIPPRSSLIYPFMSMGPAAGEGCGLCGADGGGCCSRHRHDDDGFPFVFPPSACQGIGAPAPPVHEFQFFGND
GGGDDGESVAWLFDDYPPPSPVAAAAGMHHRQPPYDGVVAPPSLFRRNTGAGGLTFDVSLGERPDLDAGLGLGGGGGRHAEAAASATIM
SYCGSTFTDAASSMPKEMVAAMADDGESLNPNTVVGAMVEREAKLMRYKEKRKKRCYEKQIRYASRKAYAEMRPRVRGRFAKEPDQEAV
APPSTYVDPSRLELGQWFR

5.1. Use tCOFFEE to produce an alignment of conserved protein regions.

5.2. Produce a multiple sequence alignment using ClustalW. Using BOXSHADE, prepare a publishable alignment for these sequences. Paste the alignment into your homework document.
- Between tCOFFEE and ClustalW, which program seems to have better identified conserved regions between these genes?

5.3. Construct a phylogenetic tree with the NJ method (using Number of differences as the substitution model) with bootstrap values. Include the tree here.
