Advanced Skills in Biochemistry: Bioinformatics - 2 - Investigation of C. elegans protein kinase polypeptides

Exercise 1

The purpose of this workshop is to familiarise you with some of the databases and tools that are available for the analysis of proteins. For this exercise, you will again be investigating the family of catalytic subunits of the cyclic AMP-dependent protein kinase (PK-A) in the free-living nematode, C. elegans.

1. The C. elegans polypeptide sequences we are interested in are encoded by the F47F2.1 gene on the X chromosome and the kin-1 gene on chromosome 1. In addition, we need the sequence of the mouse PK-A α-catalytic subunit. The easiest way of obtaining each of these sequences is to use the Sequence Retrieval System (SRS). You should therefore visit the SRS Homepage: http://srs6.ebi.ac.uk/

2. From the home page, select the Library Page tab. Then, under the UniProt Universal Protein Resource section, select the UniProtKB and UniParc databases. Then select the Standard Query Form. You should then carry out separate searches for the following accession numbers (i.e. select AccessionNumber from the drop-down menu in place of the default AllText): P05132, UPI000002AC77 and Q7JP68. These searches should find records for the polypeptide sequence of the mouse α-catalytic subunit, the polypeptide sequence of the catalytic subunit encoded by the kin-1 gene and the F47F2.1b polypeptide sequence. Each of the records should be saved locally (e.g. on the desktop).

3. You should now prepare a simple text file containing each of the sequences in the following format:

>mouse
MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLV
KHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVM
EYVAGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYI
QVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFA
DQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATT
DWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF
>kin-1
MLKFLKPKSSDEGSSKDNKNSASLKEFLDKAREDFKQRWENPAQNTACLDDFDRIKTLGT
GSFGRVMLVKHKQSGNYYAMKILDKQKVVKLKQVEHTLNEKRILQAIDFPFLVNMTFSLK
DNSNLYMVLEFISGGEMFSHLRRIGRFSEPHSRFYAAQIVLAFEYLHSLDLIYRDLKPEN
LLIDSTGYLKVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEM
AAGYPPFFADQPIQIYEKIVSGKVKFPSHFSNELKDLLKNLLQVDLTKRYGNLKNGVADI
KNHKWFGSTDWIAIYQKKIEAPFLPKCRGPGDASNFDDYEEEPLRISGTEKCAKEFAEF
>F47F2.1
MVILSHHKRSPPLLMRLFRICRFFRRLMSSSTSSVESVEDESCSNECSASFTFDTNNNSR
GNNQVNELAEETHMKLSITPTRESFSLSQLERIITIGKGTFGRVELARDKITGAHYALKV
LNIRRVVDMRQTQHVHNEKRVLLQLKHPFIVKMYASEKDSNHLYMIMEFVPGGEMFSYLR
ASRSFSNSMARFYASEIVCALEYIHSLGIVYRDLKPENLMLSKEGHIKMADFGFAKELRD
RTYTICGTPDYLAPESLARTGHNKGVDWWALGILIYEMMVGKPPFRGKTTSEIYDAIIEH
KLKFPRSFNLAAKDLVKKLLEVDRTQRIGCMKNGTQDVKDHKWFEKVNWDDTLHLRVEPP
IVPTLYHPGDTGNFDDYEEDTTGGPLCSQRDRDLFAEW

You should take particular care to remove all non-sequence characters (including spaces) from the sequence data. In addition, the header line for each sequence must be in the format shown above (i.e. >name). The text file should also be incorporated into a Word document.

4. Once you have prepared this ‘multiple sequence’ file you are ready to carry out a multiple alignment of these sequences. To do this, visit the following site: http://www.ebi.ac.uk/Tools/clustalw2/ Paste the contents of your ‘multiple sequence’ text file into the space provided and then run the ClustalW multiple alignment program. You will (eventually) receive an alignment for these three sequences.
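The file preparation described in step 3 can also be scripted. The following is a minimal Python sketch, not part of the original exercise: it strips every non-letter character (spaces, digits, line breaks) from each raw sequence and writes the records in the >name format. The function names, the 60-residue line width and the example input are assumptions made for illustration.

```python
def clean_sequence(raw):
    """Keep letters only (drops spaces, digits, line breaks) and upper-case them."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def to_fasta(records, width=60):
    """Format (name, raw_sequence) pairs as text with one '>name' header per record."""
    lines = []
    for name, raw in records:
        seq = clean_sequence(raw)
        lines.append(">" + name)
        # Wrap the cleaned sequence into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical raw input, as it might look when copied from a database record.
    raw_records = [("mouse", "1 MGNAAAAKKG SEQESVKEFL 20")]
    print(to_fasta(raw_records), end="")
```

The output of to_fasta can be saved as the ‘multiple sequence’ text file and pasted into the alignment form unchanged.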
The alignment is provided as a simple text-only representation (although you can select ‘show colors’ to see a coloured representation).

(Note: if the above web site is not working, you can try the following alternative: http://clustalw.ddbj.nig.ac.jp/top-e.html The disadvantage of this site is that it only provides the simple text-only representation of the alignment.)

5. You should select the alignment (be sure to include the header entitled CLUSTAL 2.0.10 multiple sequence alignment), copy it to the clipboard and then go to the BOXSHADE site: http://www.ch.embnet.org/software/BOX_form.html

Once at the BOXSHADE site, paste your alignment into the BOXSHADE form, select other for the input alignment sequence format, select RTF_new for the output option and select 1.0 for the fraction of sequences option. (Note: if you are using the alignment provided by the alternative web page (see above), you will need to select the aln option for the input alignment sequence format and you must take care to include the header text from the alignment.)

Now run the BOXSHADE program and view the output using Word or another word processor. Hopefully you will see an alignment similar to that shown here, where a black background represents an identical residue in each sequence and a gray background represents a similar residue in each sequence.
mouse      1 ----------------------------------------------MGNAAAAKKGSEQE
kin-1      1 --------------------------------------MLKFLKPKSSDEGSSKDNKNSA
F47F2.1    1 MVILSHHKRSPPLLMRLFRICRFFRRLMSSSTSSVESVEDESCSNECSASFTFDTNNNSR

mouse     15 SVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKI
kin-1     23 SLKEFLDKAREDFKQRWENPAQNTACLDDFDRIKTLGTGSFGRVMLVKHKQSGNYYAMKI
F47F2.1   61 GNNQVNELAEETHMKLSITPTRESFSLSQLERIITIGKGTFGRVELARDKITGAHYALKV

mouse     75 LDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLR
kin-1     83 LDKQKVVKLKQVEHTLNEKRILQAIDFPFLVNMTFSLKDNSNLYMVLEFISGGEMFSHLR
F47F2.1  121 LNIRRVVDMRQTQHVHNEKRVLLQLKHPFIVKMYASEKDSNHLYMIMEFVPGGEMFSYLR

mouse    135 RIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKG
kin-1    143 RIGRFSEPHSRFYAAQIVLAFEYLHSLDLIYRDLKPENLLIDSTGYLKVTDFGFAKRVKG
F47F2.1  181 ASRSFSNSMARFYASEIVCALEYIHSLGIVYRDLKPENLMLSKEGHIKMADFGFAKELRD

mouse    195 RTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSG
kin-1    203 RTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSG
F47F2.1  241 RTYTICGTPDYLAPESLARTGHNKGVDWWALGILIYEMMVGKPPFRGKTTSEIYDAIIEH

mouse    255 KVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAP
kin-1    263 KVKFPSHFSNELKDLLKNLLQVDLTKRYGNLKNGVADIKNHKWFGSTDWIAIYQKKIEAP
F47F2.1  301 KLKFPRSFNLAAKDLVKKLLEVDRTQRIGCMKNGTQDVKDHKWFEKVNWDDTLHLRVEPP

mouse    315 FIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF-
kin-1    323 FLPKCRGPGDASNFDDYEEEPLRISGTEKCAKEFAEF-
F47F2.1  361 IVPTLYHPGDTGNFDDYEEDTTGGPLCSQRDRDLFAEW

Note. An alternative BOXSHADE Server can be found at: http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html. In this case, paste in your alignment, select HTML as the output format and use 1.0 as the identity threshold (in the sequence properties section).

Your BOXSHADE alignment should be incorporated into your Word document.

It should be clear from your results that the C. elegans polypeptides show considerable similarity to the mammalian α-catalytic subunit. In particular, catalytically important residues (e.g.
the YRDLKPEN motif) are highly conserved between the polypeptides. In contrast, the N- and C-termini of the polypeptides differ more markedly. If you wish to learn more about the structure and catalytic properties of protein kinases you should visit the Protein Kinase Resource (PKR) at: http://www.nih.go.jp/mirror/Kinases/pk_home.html

6. It is possible to make structural predictions for the C. elegans polypeptides based upon their polypeptide sequences and the fact that they show significant similarities to the mouse α-catalytic subunit. In order to do this you should visit the CPHmodels site: http://www.cbs.dtu.dk/services/CPHmodels/

This site provides an automated protein modelling service. You should paste the sequence of either the kin-1 or the F47F2.1 polypeptide into the form and provide the other information requested (e.g. your email address). A structure will then be predicted based upon comparison with other three-dimensional structures already available in the protein databases. This process may take some time, but you will eventually see a summary of the procedures used to arrive at the predicted structure and a ‘query.pdb’ file that can be saved and used to display the predicted structure.

It should be noted that it can be extremely difficult (if not impossible) to predict reasonable 3-dimensional structures using the above approach. However, in this case, because of the similarity of the C. elegans polypeptides to a number of well-characterized proteins (specifically the mammalian α-catalytic subunits) it is possible to produce reasonable models.

You should rename the ‘query.pdb’ file (maintaining the .pdb extension), so that it can be loaded into a program that allows you to view and manipulate pdb files.
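The renaming step can be combined with a quick sanity check of the model file. The sketch below is an illustration only and not part of the original exercise: it copies the downloaded file under a new name (kin1_model.pdb is a hypothetical name) and counts the residues in the model by counting Cα atoms in the ATOM records, relying on the fixed-column layout of the PDB format. The count can then be compared with the length of the sequence you submitted.

```python
import shutil
from pathlib import Path


def count_residues(pdb_text):
    """Count residues by counting C-alpha atoms in ATOM records."""
    count = 0
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: the record name starts the line
        # and the atom name occupies columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            count += 1
    return count


def copy_model(src, dst):
    """Copy the model under a new name, insisting on the .pdb extension."""
    dst = Path(dst)
    if dst.suffix.lower() != ".pdb":
        raise ValueError("the new name must keep the .pdb extension")
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    src = Path("query.pdb")  # file name given in the handout
    if src.exists():
        new_path = copy_model(src, "kin1_model.pdb")  # hypothetical new name
        print(new_path, count_residues(new_path.read_text()), "residues")
```

Copying rather than moving keeps the original download intact in case the renamed file is edited later.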
Perhaps the most commonly used pdb viewer is PyMOL. Loading your file into PyMOL (available from the ‘Install University Applications’ icon) should allow you to visualize your predicted structure as follows (this is the predicted kin-1 polypeptide structure):

[Figure: predicted kin-1 polypeptide structure displayed in PyMOL]

Exactly what your structure looks like will depend upon the options you have chosen in PyMOL and on your view-point (the mouse can be used to rotate the structure at will). An extremely comprehensive PyMOL User’s Guide is available from the following site: http://pymol.sourceforge.net/newman/user/toc.html You can ignore the warning about the manual being ancient and obsolete! A brief summary of PyMOL commands etc. is appended to this handout.

There are two ways of using PyMOL: point and click mode (with a mouse) and command-line mode (entering text into the command input window that appears when you run PyMOL). The point and click mode allows you to quickly rotate and resize the molecule. The command-line mode is more flexible and is useful for complex selections and for issuing commands.
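Command-line mode also lends itself to scripting. As a hedged illustration, the Python sketch below writes a short PyMOL script that loads a model, shows it as a cartoon and highlights the conserved YRDLKPEN motif seen in the alignment. The file names kin1_model.pdb and view_model.pml, the object name model, the selection name kinase_motif and the colour choice are all assumptions. The commands written to the file (load, hide, show, select, color, orient) are standard PyMOL commands, and pepseq is PyMOL's selector for a one-letter residue sequence.

```python
from pathlib import Path


def build_pymol_script(pdb_path, object_name="model", motif="YRDLKPEN"):
    """Return PyMOL commands that display a model and mark a sequence motif."""
    return [
        f"load {pdb_path}, {object_name}",
        "hide everything",
        f"show cartoon, {object_name}",
        # Select the residues matching the one-letter motif sequence.
        f"select kinase_motif, {object_name} and pepseq {motif}",
        "show sticks, kinase_motif",
        "color red, kinase_motif",
        f"orient {object_name}",
    ]


def write_pymol_script(script_path, commands):
    """Write one command per line to a PyMOL script file."""
    Path(script_path).write_text("\n".join(commands) + "\n")


if __name__ == "__main__":
    commands = build_pymol_script("kin1_model.pdb")  # hypothetical file name
    write_pymol_script("view_model.pml", commands)   # hypothetical script name
    print("\n".join(commands))
```

The resulting file can be run from the PyMOL command input window with @view_model.pml, or the individual commands can be typed there one at a time.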