Introductory molecular biology computing and bioinformatics
for Molecular Mechanisms of Development.
Barcelona 2006.
Exercise Set 1
1. Find the protein amino acid sequence of the mouse haematopoietic cell
protein kinase called “tec” described by Mano et al. in 1993. Create a text
file (.txt) containing this sequence in FASTA format.
2. Find the sequence of the protein with accession number AAA37592.
Create a text file (.txt) containing this sequence in FASTA format.
3. Find the complementary DNA (cDNA) sequence with accession number
BC018394. Translate this nucleotide sequence into the corresponding
protein amino acid sequence and save this file. Edit the protein sequence
into FASTA format and save it as a text file (.txt).
4. Perform a BLAST search with the tec kinase sequence you have saved.
Make notes on any conserved domains that are expected to be present in
the protein. Format your output before proceeding.
5. Examine the “E values” or “Expect scores”. What have these scores been
used to do to your list of BLAST hits? Can you follow a link from the
BLAST page to notes on what this score means?
6. From the BLAST search output find the protein sequence for Bruton
agammaglobulinemia / Bruton’s tyrosine kinase….. submitted by Tsukada
et al. in 1993. Create a text file (.txt) containing this sequence in FASTA
format.
7. From the BLAST search output find the protein sequence for mouse BMX
non-receptor tyrosine kinase submitted by Ekman et al. in 1997. Create a
text file (.txt) containing this sequence in FASTA format.
8. Create a single text file (.txt) containing all of your saved protein
sequences. Edit the top identifier line (everything to the right of the >
symbol) so that each sequence is identified by a single short name, each
beginning with a different letter or number. Eliminate any blank lines so
that there are no gaps between the blocks of text describing each sequence.
9. Use this file containing all of your sequences to obtain a multiple sequence
alignment (MSA) and save the alignment. You should use either ClustalW
or Multalin.
Please note the following points if you use ClustalW/Boxshade:
a) set the ClustalW output format to wo/numbers,
b) follow the link to obtain the output file in .aln format (alignment file) and
copy the resulting page,
c) paste the .aln output into the Boxshade window, then set the output to
either rtf new or rtf old and the input to ALN.
10. Use PFAM and InterPro (and any other programmes from the “Analysing
sequences” section of the tool box) to analyse your tec kinase protein
sequence. Determine which, if any, domains are common to all of the
sequences present in your MSA. Annotate your alignment to highlight this
region of similarity.
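
The file preparation asked for in exercise 8 can also be done with a short script rather than by hand. The following is a minimal Python sketch, not part of the course tool box: the function names (parse_fasta, rename_records, format_fasta) and the 60-character line width are assumptions made for illustration, and the sequences in the example are placeholders, not real kinase sequences.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines between or inside records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def rename_records(records, short_names):
    """Replace each header with a short name; first characters must differ."""
    if len(records) != len(short_names):
        raise ValueError("need exactly one short name per sequence")
    initials = [name[0].upper() for name in short_names]
    if len(set(initials)) != len(initials):
        raise ValueError("each short name must start with a different character")
    return [(name, seq) for name, (_, seq) in zip(short_names, records)]


def format_fasta(records, width=60):
    """Return FASTA text for (name, sequence) pairs, with no blank lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder input: two made-up records separated by blank lines.
    example = (
        ">first record with a long database description\n"
        "MKTAYIAK\n\nQRQISFVK\n\n"
        ">second record with another long description\n"
        "MSHFSRQL\n"
    )
    records = rename_records(parse_fasta(example), ["Tec", "Btk"])
    print(format_fasta(records), end="")
```

Note that the check on first characters will reject a pair of names such as Btk and Bmx, which is the same constraint the exercise imposes when editing by hand.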
Geraint Thomas Department of Physiology, UCL 2006
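
As a cross-check on the translation step in exercise 3, the standard genetic code can be applied with a few lines of Python. This is a sketch under stated assumptions, not the method of any web translation tool: the function name translate and its start parameter are hypothetical, and by default it simply begins at the first ATG it finds, which need not be the true start codon of a cDNA that carries a 5' untranslated region. The coding-region annotation in the sequence record is the more reliable guide to the correct reading frame.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, start=None):
    """Translate a cDNA string into one-letter amino acid codes.

    If start is None, translation begins at the first ATG (an assumption).
    Translation stops at the first stop codon; unknown codons become X.
    """
    seq = "".join(cdna.split()).upper().replace("U", "T")
    if start is None:
        start = seq.find("ATG")
        if start == -1:
            return ""  # no start codon found
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break  # stop codon reached
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Toy input, not a real cDNA: two leading bases, then ATG AAA GGG TAA.
    print(translate("ccATGAAAGGGTAAtt"))  # MKG
```

Comparing the script's output with the result from a web translation tool is a quick way to confirm that the same reading frame was used in both.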