Assignment 2 (50 points)

This assignment is dedicated to protein analysis.

The goals of this exercise:
* To learn about the basic protein structure prediction tools available on the WWW. You will take a protein sequence and predict features related to its primary, secondary, and tertiary structures.
* To gain experience with phylogenetic programs.
* To use some of the important modeling WWW servers.

Part-1: Sequence analysis (Sub-cellular localization)

1. From the NCBI or Swiss-Prot databases, pick proteins that allow you to test predictions of the following features:
   - Signal peptide (secretory pathway)
   - Chloroplast
   - Mitochondria
   - Peroxisome
   - N-glycosylation
   - Transmembrane domains
   - GPI-anchoring
   - Hydrophobicity

2. Use the websites from today's lecture.
3. Submit:
   * Graphical presentations whenever possible.
   * A discussion and interpretation of your data, indicating which program you used.
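For intuition about what the hydrophobicity and N-glycosylation tools compute, here is a minimal sketch in Python (an assumption, since the handout contains no code). It averages Kyte-Doolittle hydropathy values over a sliding window and scans for the classic N-X-S/T sequon, where X is not proline. The function names are hypothetical, and the 19-residue window with a 1.6 cutoff are the values often quoted for flagging possibly membrane-spanning segments with the Kyte-Doolittle method, not settings of any particular server. The web predictors from the lecture use far more sophisticated models, so treat this only as a rough cross-check.

```python
import re

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average KD score of every full window, as (center_position, score) pairs.

    Positions are 1-based and refer to the central residue of each window.
    Non-standard letters (X, B, Z, ...) raise a KeyError.
    """
    scores = [KD[aa] for aa in seq.upper()]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        avg = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, avg))
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Window centers whose average hydropathy exceeds the threshold (heuristic only)."""
    return [pos for pos, score in hydropathy_profile(seq, window) if score > threshold]


def nglyc_sequons(seq):
    """1-based positions of N in N-X-[ST] motifs where X is not proline."""
    # The zero-width lookahead lets overlapping motifs all be reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq.upper())]


if __name__ == "__main__":
    demo = "MNGTANPSK"  # hypothetical toy sequence
    print(nglyc_sequons(demo))  # [2]: NGT qualifies, NPS does not
```

A sequon only marks a potential site; whether the asparagine is actually glycosylated also depends on the protein reaching the secretory pathway, which is why the signal peptide prediction matters for interpreting this result.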

Part-2: Phylogeny analysis

1. Pick a protein sequence: your favorite protein, any random sequence, or the one you used in Assignment 1. Save it in FASTA format.
2. Search Swiss-Prot or NCBI using the BLAST search program.
3. Randomly pick 10 to 15 sequences (proteins) from the results. Submit the list of these sequences (not the sequences themselves).
4. Using this web-based program (http://align.genome.jp/), build several trees.
   - Submit three types of trees.
   - Answer:
     * What is the difference between the methods used to create the trees?
     * Discuss your trees.
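One way to approach the question about tree-building methods is to compare two classic distance-based methods on the same data. The minimal sketch below is in Python; the function names and the four-taxon distance matrix are hypothetical, and it does not claim to reproduce what align.genome.jp does internally. UPGMA assumes a constant rate of evolution, so it produces a rooted tree in which every leaf is equally far from the root. Neighbor joining (NJ) drops that assumption and produces an unrooted tree. Both trees are returned as Newick strings.

```python
from itertools import combinations

# Hypothetical toy distances, built from the unrooted tree with leaf branches
# A:1, B:2, C:1, D:4 and an internal edge of 3 between the (A,B) and (C,D) pairs.
LABELS = ["A", "B", "C", "D"]
PAIRS = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 8,
         ("B", "C"): 6, ("B", "D"): 9, ("C", "D"): 5}
D = {x: {y: 0 for y in LABELS} for x in LABELS}
for (x, y), v in PAIRS.items():
    D[x][y] = D[y][x] = v


def upgma(labels, d):
    """UPGMA: repeatedly merge the closest pair of clusters (ultrametric, rooted)."""
    size = {x: 1 for x in labels}      # cluster -> number of leaves
    height = {x: 0.0 for x in labels}  # cluster -> distance from its leaves
    dist = {frozenset(p): d[p[0]][p[1]] for p in combinations(labels, 2)}
    while len(size) > 1:
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        h = dist[frozenset((a, b))] / 2
        node = f"({a}:{h - height[a]:g},{b}:{h - height[b]:g})"
        for c in size:
            if c not in (a, b):
                # Size-weighted average of the distances from the two merged clusters.
                dist[frozenset((node, c))] = (
                    size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[node] = size.pop(a) + size.pop(b)
        height[node] = h
    return next(iter(size)) + ";"


def neighbor_joining(labels, d):
    """NJ: join the pair minimizing the Q-criterion; no constant-rate assumption.

    Needs at least 3 labels; the last three nodes are joined at one unrooted center.
    """
    nodes = list(labels)
    dist = {(a, b): d[a][b] for a in labels for b in labels}
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(dist[(x, y)] for y in nodes) for x in nodes}
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        la = dist[(a, b)] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[(a, b)] - la
        node = f"({a}:{la:g},{b}:{lb:g})"
        nodes = [x for x in nodes if x not in (a, b)]
        for c in nodes:
            dist[(node, c)] = dist[(c, node)] = (dist[(a, c)] + dist[(b, c)] - dist[(a, b)]) / 2
        dist[(node, node)] = 0.0
        nodes.append(node)
    i, j, k = nodes
    li = (dist[(i, j)] + dist[(i, k)] - dist[(j, k)]) / 2
    lj = (dist[(i, j)] + dist[(j, k)] - dist[(i, k)]) / 2
    lk = (dist[(i, k)] + dist[(j, k)] - dist[(i, j)]) / 2
    return f"({i}:{li:g},{j}:{lj:g},{k}:{lk:g});"


if __name__ == "__main__":
    print("UPGMA:", upgma(LABELS, D))
    print("NJ:   ", neighbor_joining(LABELS, D))
```

On this toy matrix, which is additive but not ultrametric, NJ recovers the branch lengths the distances were built from (A:1, B:2), while UPGMA forces A and B to be equally distant from their common ancestor (1.5 each). Character-based methods such as maximum parsimony or maximum likelihood work from the alignment columns instead of a distance matrix.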

Part-3: 3-D fold prediction

We will be visiting these websites. The easiest one to work with is the first (Phyre2), but I encourage you to try the other two.
* Protein Homology/analogY Recognition Engine (Phyre2) for structure modeling (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index)
* PredictProtein for secondary structure prediction (http://www.predictprotein.org/newwebsite/submit.html)
* UCLA Fold Recognition (http://www.doe-mbi.ucla.edu/Services/FOLD/)

1. Fetch these two protein sequences from NCBI: AAV51943 and At3g46550 (or use your own protein).
   * Report some details about the proteins.
   * For further processing, save the protein sequences on your computer in FASTA format.
   * Alternatively, you can use the sequences listed below without creating a FASTA file.
2. Create a 3-D model for your protein sequences.
   * Go to the Phyre2 website and submit your sequences.
   * You will receive a link to the modeling results by email, or you can wait and view them directly on the interactive web page.
   * Report the top template information.
   * How many residues of the sequence were modeled, and with what confidence?
   * Submit a picture of the model for each protein, as viewed in the Jmol program.
   * What secondary structure elements can you identify, and at what positions? Can you change the presentation (display style) of the molecule?
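Phyre2 and PredictProtein report secondary structure per residue. If you copy such a prediction out as a one-letter string (assumed here to use H for helix and E for strand, as in DSSP-style notation; check the legend of the server you actually use), the hypothetical helper below turns it into element positions. A second one-line helper computes the share of the sequence covered by the model. This is a sketch under those assumptions, not a parser for any server's real output format.

```python
from itertools import groupby

# Assumed one-letter states: H = helix, E = strand; anything else counts as coil/loop.
ELEMENTS = {"H": "helix", "E": "strand"}


def ss_segments(ss, min_len=1):
    """Return (element, start, end) tuples, 1-based and inclusive, for runs of H or E."""
    segments = []
    pos = 1
    for state, run in groupby(ss.upper()):
        length = sum(1 for _ in run)
        if state in ELEMENTS and length >= min_len:
            segments.append((ELEMENTS[state], pos, pos + length - 1))
        pos += length
    return segments


def coverage_percent(modeled_residues, sequence_length):
    """Share of the query sequence covered by the model, in percent."""
    return 100.0 * modeled_residues / sequence_length


if __name__ == "__main__":
    toy = "CCHHHHCCEEEC"  # hypothetical prediction string, not real server output
    for name, start, end in ss_segments(toy):
        print(f"{name}: residues {start}-{end}")
    print(f"{coverage_percent(180, 240):.1f}% modeled")  # hypothetical counts
```

The `min_len` parameter lets you ignore one- or two-residue "helices" or "strands", which are usually not meaningful elements when you describe a model.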

Sequences you can use:

>alb8
VTGLRAGHRRVNENAWDVRTPVHLGSSFYDVPSVRAGRCTLGERELTLAGDVGGARLLHLQCHFGLDTLSWARRGARATGVDFSRAAVTA
ARELSAELGVPAVFHRADVQDLPAELSGFDLAVTTYGVTCWLEDLSAWAASVHGALRPGGRFLLVEFHPLLELALPGAVSGHGSYFGSPDPPP
TATSGTYTDPDAPIFYEEYRWQHPVGDVVNALIGAGFELTGLGEYPDSPVPLFDERLAGSPLAPAPRSYSITARRKS
>alb7
SSGLVPRGSGMKETAAAKFERQHMDSPDLGTGGGSGIEGRMAALFGALGRDQERARATLNLVPSENVLSPLARVPFALDAYARYFFDHKRM
FGAWSFFGGTGAGAIEQETLLPLLRDQAQAPFVNPQPISGLNCMTAAMSALASPGDTVVLIPTDAGGHMSTAGVARRLGLHVLTLPMADAHT
VDHEALGALLRSERPALVYLDQSTVLFPLDCAPLREVIDRESPRTLLHFDSRHLNGLILSKALANPLDRGADTFGGSTHKTLAGPHKGFLATRR
EDLSERIDASTADLVSHHHPAEVLSLAVTLLELRDRDGAGYGAAILANARALAARLHERGAAVAAADRGFTGCHQVWLDTRSADEGVAMAD
RLYAAGVAVNRVGVPGVRGAAFRLSSAEVTRCGATEADSTELADIIADVVVDGAPTDRVASRAAALRARLYRPRYCFEDDALEDPAVPEWL
RELAAAVGRGVYGEDR
>alb4
NSYFEHPSIAVLDRDEILFAVEDERFTGIKHGRTYSPYQTYLPVASLYHGLAAVDATVDDIDEIGYSYHRWTHLRSLAGCFTGKRVSGFREELT
AFLSLVNLRQAMRSGYDIPRRYRDRIFPEKLARVPFREYHHHLAHAASAFHCSDFEEALVVVADGAGERSATSVYRGRGGQLERIGGVDLPNS
LGIFYSMITAHLGFEPFSDEFKVMGLAAYGEPAHRQACSRILRLGPDGSYVLDLAALRSLDTLLGPARRPGEPLAQRHKDIARSVQDRLTEALH
HVLGHWLGRTGLRNVCLAGGTFLNCVANGSLARDPRIEGIFVQPAAHDAGTAIGAAALSAVRRGGGPKVVFRSAALGTSHTAAACEKACAA
AEVPHVRPAPEDMIDAVARRLADGEVVGVFRGRMEFGPRALGMRSLLASPADPAMRDRLNRIKGREDFRPVAPIVLREHFDTYFDGQPNRYM
LFTTRALERTVREAPSAVHVDGTARVQCVQEDEDPWLHALITRFAELTGLPMVINTSLNVRGKPIVESPAEALACLGSTAMNLLVLEDVLAGP
GAPDAVRQAVGSAGSGVAEGTA
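If you decide to save these sequences as a FASTA file, or want a quick first look at simple details such as length, the minimal Python sketch below parses FASTA text, removes the stray spaces that appear when a sequence is copied from a PDF, rewraps it at 60 residues per line, and counts the most frequent residues. The function names are hypothetical, and the summary is no substitute for the details in the NCBI record that the assignment asks you to report.

```python
from collections import Counter


def parse_fasta(text):
    """Map each header (text after '>') to its sequence, joining wrapped lines."""
    records, name = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:  # ignore any text before the first header
            records[name].append(line.replace(" ", ""))
    return {n: "".join(parts) for n, parts in records.items()}


def to_fasta(records, width=60):
    """Write records back out as FASTA, wrapping sequences at `width` residues per line."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def summarize(records, top=3):
    """Length and the `top` most frequent residues for each record."""
    return {n: (len(s), Counter(s).most_common(top)) for n, s in records.items()}


if __name__ == "__main__":
    # Excerpt only (first 20 residues of alb8); paste the full records to use it for real.
    text = ">alb8\nVTGLRAGHRR VNENAWDVRT\n"
    records = parse_fasta(text)
    print(summarize(records))
    print(to_fasta(records), end="")
```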
