Introductory molecular biology computing and bioinformatics

for Molecular Mechanisms of Development.

Barcelona 2006.

Exercise Set 1

1. Find the protein amino acid sequence of the mouse haematopoietic cell

protein kinase called “tec” described by Mano et al. in 1993. Create a text

file (.txt) containing this sequence in FASTA format.

2. Find the sequence of the protein with accession number AAA37592.

Create a text file (.txt) containing this sequence in FASTA format.

3. Find the complementary DNA (cDNA) sequence with accession number

BC018394. Translate this nucleotide sequence into the corresponding

protein amino acid sequence and save this file. Edit the protein sequence

into FASTA format and save it as a text file (.txt).
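The translation and FASTA formatting steps in exercise 3 can be cross-checked with a few lines of code. What follows is a minimal Python sketch, not part of the course tool box. It assumes the standard genetic code, and it assumes the open reading frame begins at the first ATG in the pasted sequence (the coding region annotated in the database record may begin elsewhere). The function names `translate` and `to_fasta` are illustrative only.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, start_at_first_atg=True):
    """Translate a cDNA string into one-letter amino acid codes.

    Digits and whitespace (as found in database flat-file listings) are
    dropped. Translation stops at the first stop codon. Codons containing
    ambiguous bases are reported as 'X'.
    """
    seq = "".join(ch for ch in cdna if ch.isalpha()).upper().replace("U", "T")
    # Assumption: the reading frame starts at the first ATG.
    start = seq.find("ATG") if start_at_first_atg else 0
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' identifier line, then wrapped sequence."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

For example, `to_fasta("BC018394_translation", translate(cdna_text))` returns text that can be saved as a .txt file, where `cdna_text` is the pasted nucleotide sequence and the record name is only a suggestion. Comparing the result with the output of the translation tool you used is a quick check that the correct reading frame was chosen.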

4. Perform a BLAST search with the tec kinase sequence you have saved.

Make notes on any conserved domains that are expected to be present in

the protein. Format your output before proceeding.

5. Examine the “E values” or “Expect scores”. What have these scores been

used to do to your list of BLAST hits? Can you follow a link from the

BLAST page to notes on what this score means?

6. From the BLAST search output find the protein sequence for Bruton

agammaglobulinemia / Bruton’s tyrosine kinase … submitted by Tsukada

et al. in 1993. Create a text file (.txt) containing this sequence in FASTA

format.

7. From the BLAST search find the protein sequence for mouse BMX

non-receptor tyrosine kinase submitted by Ekman et al. in 1997. Create a text

file (.txt) containing this sequence in FASTA format.

8. Create a single text file (.txt) containing all of your saved protein

sequences. Edit the top identifier line (everything to the right of the >

symbol) so that each sequence is identified by a single short name, each

one beginning with a different letter or number. Eliminate any blank

lines so that no gaps remain between the blocks of text describing each sequence.
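The editing described in exercise 8 is easy to do by hand in a text editor, but the same steps can be sketched in Python as a check. This is only an illustration under stated assumptions: the function names `read_fasta` and `merge_with_short_names` and the short names used in the usage note are hypothetical, and the short names are assumed to be supplied in the same order as the sequences.

```python
def read_fasta(text):
    """Parse FASTA text into (identifier, sequence) pairs, ignoring blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are dropped
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line
    return [(header, seq) for header, seq in records]


def merge_with_short_names(texts, short_names, width=60):
    """Combine several FASTA texts into one, replacing each identifier line.

    Raises ValueError if the number of names does not match the number of
    sequences, or if two names begin with the same character.
    """
    records = [rec for text in texts for rec in read_fasta(text)]
    if len(records) != len(short_names):
        raise ValueError("need exactly one short name per sequence")
    initials = [name[0].upper() for name in short_names]
    if len(set(initials)) != len(initials):
        raise ValueError("each short name must begin with a different character")
    out = []
    for name, (_, seq) in zip(short_names, records):
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Calling `merge_with_short_names([text1, text2], ["tec", "btk"])` returns a single block of text with one `>` line per sequence and no blank lines, ready to paste into an alignment tool.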

9. Use this file containing all of your sequences to obtain a multiple sequence

alignment (MSA) and save the alignment. You should use either ClustalW

or Multalin.

Please note the following points if you use ClustalW/Boxshade:

a) set the ClustalW output format to wo/numbers,

b) follow the link to obtain the output file in .aln format (alignment file) and

copy the resulting page,

c) paste the .aln output into the Boxshade window and set output to

either rtf new or rtf old and input to ALN.

10. Use PFAM and InterPro (and any other programmes from the “Analysing

sequences” section of the tool box) to analyse your tec kinase protein

sequence. Determine which, if any, domains are common to all of the

sequences present in your MSA. Annotate your alignment to highlight this

region of similarity.
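Once the domains reported for each sequence have been noted down, finding those shared by every sequence in the MSA amounts to a set intersection. The short Python sketch below illustrates the idea. The function name `common_domains` is hypothetical, and the domain and sequence labels in the usage note are placeholders, not results from any database.

```python
def common_domains(domains_by_sequence):
    """Return the set of domain names reported for every sequence.

    domains_by_sequence maps a short sequence name to the list of domain
    names noted for it (for example, from a domain database search).
    """
    domain_sets = [set(domains) for domains in domains_by_sequence.values()]
    if not domain_sets:
        return set()
    return set.intersection(*domain_sets)
```

With placeholder input such as `{"seq1": ["domA", "domB"], "seq2": ["domB", "domC"]}`, the function returns `{"domB"}`. The positions of each shared domain still have to be read from the database output in order to annotate the alignment.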

Geraint Thomas Department of Physiology, UCL 2006