
Advanced Skills in Biochemistry: Bioinformatics - 2 - Investigation of C. elegans kinase polypeptides

Exercise 1

The purpose of this workshop is to familiarise you with some of the databases and tools that are available for the analysis of polypeptide sequences. For the purposes of this exercise, you will again be investigating the family of catalytic subunits of the cyclic AMP-dependent protein kinase (PK-A) in the free-living nematode, C. elegans.

1. The C. elegans polypeptide sequences we are interested in are encoded by the F47F2.1 gene on the X chromosome and the kin-1 gene on chromosome 1. In addition, we need the sequence of the mouse PK-A α-catalytic subunit. The easiest way of obtaining each of these sequences is by making use of the Sequence Retrieval System (SRS). You should therefore visit the SRS Homepage: http://srs6.ebi.ac.uk/

2. From the home page, select the Library Page tab. Then, under the UniProt Universal Protein Resource section, select the UniProtKB and UniParc databases. Then select the Standard Query Form. You should then carry out separate searches for the following accession numbers (i.e. select AccessionNumber from the drop-down menu in place of the default AllText): P05132, UPI000002AC77 and Q7JP68. These searches should find records for the polypeptide sequence of the mouse α-catalytic subunit, the polypeptide sequence of the catalytic subunit encoded by the kin-1 gene and the F47F2.1b polypeptide sequence. Each of the records should be saved locally (e.g. on the desktop).

3. You should now prepare a simple text file containing each of the sequences in the following format:

>mouse
MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLV
KHKESGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVM
EYVAGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYI
QVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFA
DQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATT
DWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF
>kin-1
MLKFLKPKSSDEGSSKDNKNSASLKEFLDKAREDFKQRWENPAQNTACLDDFDRIKTLGT
GSFGRVMLVKHKQSGNYYAMKILDKQKVVKLKQVEHTLNEKRILQAIDFPFLVNMTFSLK
DNSNLYMVLEFISGGEMFSHLRRIGRFSEPHSRFYAAQIVLAFEYLHSLDLIYRDLKPEN
LLIDSTGYLKVTDFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEM
AAGYPPFFADQPIQIYEKIVSGKVKFPSHFSNELKDLLKNLLQVDLTKRYGNLKNGVADI
KNHKWFGSTDWIAIYQKKIEAPFLPKCRGPGDASNFDDYEEEPLRISGTEKCAKEFAEF
>F47F2.1
MVILSHHKRSPPLLMRLFRICRFFRRLMSSSTSSVESVEDESCSNECSASFTFDTNNNSR
GNNQVNELAEETHMKLSITPTRESFSLSQLERIITIGKGTFGRVELARDKITGAHYALKV
LNIRRVVDMRQTQHVHNEKRVLLQLKHPFIVKMYASEKDSNHLYMIMEFVPGGEMFSYLR
ASRSFSNSMARFYASEIVCALEYIHSLGIVYRDLKPENLMLSKEGHIKMADFGFAKELRD
RTYTICGTPDYLAPESLARTGHNKGVDWWALGILIYEMMVGKPPFRGKTTSEIYDAIIEH
KLKFPRSFNLAAKDLVKKLLEVDRTQRIGCMKNGTQDVKDHKWFEKVNWDDTLHLRVEPP
IVPTLYHPGDTGNFDDYEEDTTGGPLCSQRDRDLFAEW

You should take particular care to remove all non-sequence characters (including spaces) from the sequence data. In addition, the header line for each sequence must be in the format shown above (i.e. >name). The text file should also be incorporated into a Word document.
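If you prefer not to clean the sequences by hand, the tidying described above can be sketched in a few lines of Python. This is only an illustration: the function names `clean_sequence` and `to_fasta` are made up for this handout, the 60-residue line width is an assumption, and the example input is just the first 30 residues of the mouse sequence with spaces and position numbers of the kind that database records often contain.

```python
import re


def clean_sequence(raw):
    """Remove every non-letter character (spaces, digits, line breaks)."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def to_fasta(records, width=60):
    """Build FASTA text from a {name: raw_sequence} dictionary.

    Each record becomes a '>name' header line followed by the cleaned
    sequence wrapped at `width` residues per line (an assumed width).
    """
    lines = []
    for name, raw in records.items():
        seq = clean_sequence(raw)
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Fragment of the mouse sequence as it might look when copied from a record.
raw_records = {
    "mouse": "MGNAAAAKKG SEQESVKEFL 20\nAKAKEDFLKK 30",
}
fasta_text = to_fasta(raw_records)
print(fasta_text)
```

The printed text can be pasted into your text file; the full sequences would be handled in exactly the same way.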

4. Once you have prepared this ‘multiple sequence’ file you are ready to carry out a multiple alignment exercise using these sequences. In order to do this you should visit the following site: http://www.ebi.ac.uk/Tools/clustalw2/

Here you should paste the contents of your ‘multiple sequence’ text file into the space provided and then run the ClustalW multiple alignment program. You will (eventually) receive an alignment for these three sequences. The alignment is provided as a simple text-only representation (although you can select ‘show colors’ to see a coloured representation). Note. If the above web site is not working, you can try the following alternative: http://clustalw.ddbj.nig.ac.jp/top-e.html. The disadvantage of this site is that it only provides you with the simple text-only representation of the alignment.
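The text-only alignment can also be read by a short program, which is a useful way of seeing how the format is organised. The following Python sketch assumes the usual CLUSTAL layout: a header line beginning with CLUSTAL, then blocks in which each line holds a sequence name followed by a chunk of aligned sequence, with a conservation line (starting with spaces) under each block. The function name `parse_clustal` is hypothetical, and the example is a hand-made excerpt of two short blocks in which the conservation line is simplified to show only `*` for identical columns.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from CLUSTAL-format alignment text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    aligned = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (which start with a space).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        name, chunk = fields[0], fields[1]
        # Chunks from successive blocks are joined in order of appearance.
        aligned[name] = aligned.get(name, "") + chunk
    return aligned


# Hand-made excerpt for illustration only.
example = "\n".join([
    "CLUSTAL 2.0.10 multiple sequence alignment",
    "",
    "mouse           RTWTLCGTPEYLAPEIILSK",
    "kin-1           RTWTLCGTPEYLAPEIILSK",
    "F47F2.1         RTYTICGTPDYLAPESLART",
    "                ** * **** *****",
    "",
    "mouse           GYNKAVDWWALGVLIYEMAA",
    "kin-1           GYNKAVDWWALGVLIYEMAA",
    "F47F2.1         GHNKGVDWWALGILIYEMMV",
    "                * ** ******* *****",
])

alignment = parse_clustal(example)
for name, seq in alignment.items():
    print(name, seq)
```

Note that the parser refuses text without the CLUSTAL header, which is the same reason the header must be included when the alignment is pasted into other tools.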

5. You should select the alignment (be sure to include the header entitled CLUSTAL 2.0.10 multiple sequence alignment), copy it to the clipboard and then go to the BOXSHADE site:

http://www.ch.embnet.org/software/BOX_form.html

Once at the BOXSHADE site, paste your alignment into the BOXSHADE form, select other for the input alignment sequence format, select RTF_new for the output option and select 1.0 for the fraction of sequences option. (note. if you are using the alignment provided by the alternative web page (see above), you will need to select the aln option for the input alignment sequence format and you must take care to include the header text from the alignment). Now run the BOXSHADE program and view the output using Word or another word processor. Hopefully you will see an alignment similar to that shown here, where a black background represents an identical residue in each sequence and a gray background represents a similar residue in each sequence.

mouse      1 ------MGNAAAAKKGSEQE
kin-1      1 ------MLKFLKPKSSDEGSSKDNKNSA
F47F2.1    1 MVILSHHKRSPPLLMRLFRICRFFRRLMSSSTSSVESVEDESCSNECSASFTFDTNNNSR

mouse     15 SVKEFLAKAKEDFLKKWETPSQNTAQLDQFDRIKTLGTGSFGRVMLVKHKESGNHYAMKI
kin-1     23 SLKEFLDKAREDFKQRWENPAQNTACLDDFDRIKTLGTGSFGRVMLVKHKQSGNYYAMKI
F47F2.1   61 GNNQVNELAEETHMKLSITPTRESFSLSQLERIITIGKGTFGRVELARDKITGAHYALKV

mouse     75 LDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVAGGEMFSHLR
kin-1     83 LDKQKVVKLKQVEHTLNEKRILQAIDFPFLVNMTFSLKDNSNLYMVLEFISGGEMFSHLR
F47F2.1  121 LNIRRVVDMRQTQHVHNEKRVLLQLKHPFIVKMYASEKDSNHLYMIMEFVPGGEMFSYLR

mouse    135 RIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKG
kin-1    143 RIGRFSEPHSRFYAAQIVLAFEYLHSLDLIYRDLKPENLLIDSTGYLKVTDFGFAKRVKG
F47F2.1  181 ASRSFSNSMARFYASEIVCALEYIHSLGIVYRDLKPENLMLSKEGHIKMADFGFAKELRD

mouse    195 RTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSG
kin-1    203 RTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSG
F47F2.1  241 RTYTICGTPDYLAPESLARTGHNKGVDWWALGILIYEMMVGKPPFRGKTTSEIYDAIIEH

mouse    255 KVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAP
kin-1    263 KVKFPSHFSNELKDLLKNLLQVDLTKRYGNLKNGVADIKNHKWFGSTDWIAIYQKKIEAP
F47F2.1  301 KLKFPRSFNLAAKDLVKKLLEVDRTQRIGCMKNGTQDVKDHKWFEKVNWDDTLHLRVEPP

mouse    315 FIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFTEF-
kin-1    323 FLPKCRGPGDASNFDDYEEEPLRISGTEKCAKEFAEF-
F47F2.1  361 IVPTLYHPGDTGNFDDYEEDTTGGPLCSQRDRDLFAEW

Note. An alternative BOXSHADE Server can be found at: http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html. In this case, paste in your alignment, select HTML as the output format and use 1.0 as the identity threshold (in the sequence properties section). Your BOXSHADE alignment should be incorporated into your Word document.
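To make the meaning of the shading concrete, the rule used when the fraction of sequences is set to 1.0 can be sketched in Python: a column is marked as identical only if every sequence has the same residue, and as similar only if every residue falls within one group of related amino acids. The groups in `SIMILAR_GROUPS` below are an assumption chosen for illustration and are not BOXSHADE's own similarity table; the function names are likewise hypothetical. The three rows are the first 20 columns of the alignment block beginning at mouse residue 195.

```python
# Assumed similarity groups, for illustration only (not BOXSHADE's table).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def classify_column(residues):
    """Return '#' (identical), '+' (similar) or ' ' (neither) for a column."""
    if "-" in residues:
        return " "  # columns containing a gap are left unshaded
    if len(set(residues)) == 1:
        return "#"  # would be shown on a black background
    for group in SIMILAR_GROUPS:
        if all(residue in group for residue in residues):
            return "+"  # would be shown on a gray background
    return " "


def shade(rows):
    """Classify every column of a list of equal-length aligned rows."""
    return "".join(classify_column(column) for column in zip(*rows))


rows = [
    "RTWTLCGTPEYLAPEIILSK",  # mouse
    "RTWTLCGTPEYLAPEIILSK",  # kin-1
    "RTYTICGTPDYLAPESLART",  # F47F2.1
]
shading = shade(rows)
for row in rows:
    print(row)
print(shading)
```

With a lower fraction setting, a column would be shaded when only some of the sequences agree, which this sketch does not attempt to reproduce.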

It should be clear from your results that the C. elegans polypeptides show considerable similarity to the mammalian α-catalytic subunit. In particular, catalytically important residues (e.g. the YRDLKPEN motif) are highly conserved between the polypeptides. In contrast, the N- and C-termini of the polypeptides differ more markedly. If you wish to learn more about the structure and catalytic properties of protein kinases you should visit the Protein Kinase Resource (PKR) at: http://www.nih.go.jp/mirror/Kinases/pk_home.html
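Conservation of the YRDLKPEN motif can be checked directly with a simple string search. The Python sketch below uses a hypothetical helper, `find_motif`, and takes as input the three 60-residue rows of the alignment block that begins at mouse residue 135; the positions it reports are counted from the start of each fragment, not from the start of the full-length polypeptide.

```python
def find_motif(sequence, motif="YRDLKPEN"):
    """Return the 1-based start position of every occurrence of the motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = sequence.find(motif, start + 1)
    return positions


# One row per polypeptide, copied from the alignment shown above.
fragments = {
    "mouse": "RIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIQVTDFGFAKRVKG",
    "kin-1": "RIGRFSEPHSRFYAAQIVLAFEYLHSLDLIYRDLKPENLLIDSTGYLKVTDFGFAKRVKG",
    "F47F2.1": "ASRSFSNSMARFYASEIVCALEYIHSLGIVYRDLKPENLMLSKEGHIKMADFGFAKELRD",
}

motif_positions = {name: find_motif(seq) for name, seq in fragments.items()}
for name, positions in motif_positions.items():
    print(name, positions)
```

The same function can be applied to the full-length sequences from your text file to obtain the residue numbers of the motif in each polypeptide.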

6. It is possible to make structural predictions for the C. elegans polypeptides based upon their polypeptide sequences and the fact that they show significant similarities to the mouse α-catalytic subunit. In order to do this you should visit the CPHmodels site: http://www.cbs.dtu.dk/services/CPHmodels/

This site provides an automated protein modelling service. You should paste the sequence of either the kin-1 or the F47F2.1 polypeptide into the form and provide the other information requested (e.g. your email address). A structure will then be predicted based upon comparison with other three-dimensional structures already available in the protein databases. This process may take some time, but you will eventually see a summary of the procedures used to arrive at the predicted structure and a ‘query.pdb’ file that can be saved and used to display the predicted structure. It should be noted that it can be extremely difficult (if not impossible) to predict reasonable three-dimensional structures using the above approach. However, in this case, because of the similarity of the C. elegans polypeptides to a number of well-characterized proteins (specifically the mammalian α-catalytic subunits) it is possible to produce reasonable models.

You should rename this file (maintaining the .pdb extension), so that it can be loaded into a program that allows you to view and manipulate pdb files. Perhaps the most commonly used pdb viewer is PyMOL and loading your file into PyMOL (available from the ‘Install University Applications’ icon) should allow you to visualize your predicted structure as follows (this is the predicted kin-1 polypeptide structure):

Exactly what your structure looks like will depend upon the options you have chosen in PyMOL and on your view-point (the mouse can be used to rotate the structure at will). An extremely comprehensive PyMOL User’s Guide is available from the following site: http://pymol.sourceforge.net/newman/user/toc.html

You can ignore the warning about the manual being ancient and obsolete! A brief summary of PyMOL commands etc. is appended to this handout. There are two ways of using PyMOL: point and click mode (with a mouse) and command-line mode (entering text into the command input window that appears when you run PyMOL). The point and click mode allows you to quickly rotate and resize the molecule. The command-line mode is more flexible and is useful for complex selections and for issuing commands. For the purposes of today’s workshop, you will only need to use point and click mode. You can alter the appearance of the molecule by use of the A (Actions:), S (Show:), H (Hide:), L (Label:) and C (Color:) buttons in the top right-hand corner of the graphic display. The Preset: menu (under A - Actions:) is particularly useful and you should investigate the visualisations that are available.

A particularly useful option is the ability to display the polypeptide sequence above the molecular visualisation (Display|Sequence). It is then possible to select specific amino acid residues and visualise them differently from the rest of the molecule. In this case, you should locate the catalytically important YRDLKPEN motif. Once the motif is selected, you can alter the appearance of these residues (e.g. show them as spheres) to highlight their position in the protein. You should note that when you select residues in the sequence an object (sele) appears in the list on the right (under model1 in this case). This object can be manipulated independently using the A, S, H, L and C buttons.

Once you have created a satisfactory visualisation you can save it in a format that can be imported into a Word document (File|Save Image As|PNG). We recommend that all such images have a white background (Display|Background|White) if they are going to be printed (the default black background will soon drain print cartridges). Your visualisation should be incorporated into your Word document.
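Although point and click mode is all that is needed today, the same steps can be expressed in command-line mode. The Python sketch below simply assembles a small PyMOL command script (a .pml file) as text; it does not run PyMOL itself. The function name `build_pml` and the file names `kin1_model.pdb` and `kin1_motif.png` are hypothetical, and the object name model1 is taken from the example above. The commands used (`load`, `hide`, `show`, `set seq_view`, `select` with the `pepseq` operator, `bg_color` and `png`) are standard PyMOL commands, but you should treat the script as a starting point rather than a definitive recipe.

```python
def build_pml(pdb_file, motif="YRDLKPEN", image_file="kin1_motif.png"):
    """Return the text of a PyMOL command script that highlights a motif."""
    commands = [
        "load %s, model1" % pdb_file,          # load the predicted structure
        "hide everything",
        "show cartoon",
        "set seq_view, 1",                     # display the sequence viewer
        "select motif, pepseq %s" % motif,     # select residues by sequence
        "show spheres, motif",
        "bg_color white",                      # white background for printing
        "png %s, dpi=300" % image_file,        # save the image as a PNG file
    ]
    return "\n".join(commands) + "\n"


script = build_pml("kin1_model.pdb")
print(script)
```

The printed text could be saved as, for example, highlight_motif.pml in the same folder as your renamed pdb file and run from the PyMOL command input window by typing @highlight_motif.pml.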

Note. The RCSB Protein Data Bank (PDB) contains protein structural information that may be retrieved and visualised. The homepage for this site is http://www.rcsb.org/pdb/ . Each PDB file contains the co-ordinates of the atoms within a protein and may be used to reconstruct 3-D images of the protein. Each entry contains information obtained by X-ray crystallography and/or NMR and the structures of hundreds of proteins are archived.
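A PDB file is plain text, with one ATOM line per atom laid out in fixed columns, so the co-ordinates can be read with a short program. The Python sketch below assumes the standard column positions for ATOM records (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26 and x, y, z in 31-54). The helper `format_atom` exists only to build correctly spaced example lines, and the co-ordinates in the example are placeholder values, not real data.

```python
def format_atom(serial, atom, residue, chain, resnum, x, y, z):
    """Build one ATOM line in fixed-column layout (example helper only)."""
    return ("ATOM  %5d %-4s %3s %s%4d    %8.3f%8.3f%8.3f"
            % (serial, atom, residue, chain, resnum, x, y, z))


def parse_atoms(pdb_text):
    """Return a list of dictionaries, one per ATOM record in the text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] != "ATOM  ":
            continue  # skip headers, remarks and other record types
        atoms.append({
            "atom": line[12:16].strip(),
            "residue": line[17:20].strip(),
            "chain": line[21].strip(),
            "resnum": int(line[22:26]),
            "x": float(line[30:38]),
            "y": float(line[38:46]),
            "z": float(line[46:54]),
        })
    return atoms


# Two atoms of one residue, with placeholder co-ordinates.
example_pdb = "\n".join([
    "REMARK example lines with placeholder co-ordinates",
    format_atom(1, " N  ", "MET", "A", 1, 1.0, 2.0, 3.0),
    format_atom(2, " CA ", "MET", "A", 1, 4.0, 5.0, 6.0),
    "END",
])

atoms = parse_atoms(example_pdb)
for atom in atoms:
    print(atom)
```

Applied to the contents of your own pdb file, the same function would return one entry for every atom in the predicted structure.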

The completed word-processed document is to be printed and placed in the appropriate box in the Teaching Support Office, Life Sciences Building no later than Thursday 4th November, 2010.
