Introduction to Computational Biology; An Evolutionary Approach: Appendices
Bernhard Haubold & Thomas Wiehe
ICB c 2006 Birkhäuser Verlag Nomenclature of Nucleic Acid Constituends
Chemical Base Structure Nucleoside Symbol Class
NH¾ N N adenine N N purine adenosine A
O H N N N guanine H¾ N N purine guanosine G
NH¾ H N cytosine O N pyrimidine cytidine C
O
H CH¿ N thymine O N pyrimidine thymidine T
O H N uracil O N pyrimidine uridine U
ICB c 2006 Birkhäuser Verlag Carbon Atoms in Ribose
O 5’ 4’ 1’
3’ 2’
ICB c 2006 Birkhäuser Verlag DNA & RNA
AB O O
H CH¿ H N N O N O N HO P O HO P O O O O O
O O OH
ICB c 2006 Birkhäuser Verlag Structure of DNA
ICB c 2006 Birkhäuser Verlag Ambiguity Codes for DNA
Symbol Meaning Symbol Meaning AA RGA BTGC SGC CC TT DTGA UT GG VGCA HTCA WTA KTG XTGCA MCA YTC NTGCA
ICB c 2006 Birkhäuser Verlag Amino Acid Nomenclature
One Three Name One Three Name A Ala alanine M Met methionine C Cys cysteine N Asn asparagine D Asp aspartic acid P Pro proline E Glu glutamic acid Q Gln glutamine F Phe phenylalanine R Arg arginine G Gly glycine S Ser serine H His histidine T Thr threonine I Ile isoleucine V Val valine K Lys lysine W Trp tryptophan L Leu leucine Y Tyr tyrosine
ICB c 2006 Birkhäuser Verlag Physico-Chemical Properties of Amino Acids
H H H
H¾ N COOH COOH
H¾ N COOH
H¾ N COOH
CH¿
CH¾ CHOH H N Ala CONH¾
CH¿ H Asn Thr Pro
H¾ N COOH H H Gly
H¾ N COOH
H CH¾ OH H H ¾ H N COOH H¾ N COOH
Ser H¾ N COOH
CH¾ CH
CH¾ SH ¿ H C CH¿ COOH Cys Asp tiny Val
small
H H H H H¾ N COOH
H¾ N COOH H
H¾ N COOH CH
H¾ N COOH ¿
H C CH¾ ¾ CH H¾ N COOH
CH¾
CH¾
CH¿
CH¾ H N Ile N Phe N His OH H Trp H H Tyr aromatic
H¾ N COOH
H¾ N COOH H
CH¾
CH¾
H¾ N COOH CH ¿ H C CH¿
CH¾
CH¾ COOH Leu CH¾
Glu CH¾ ¾ CH¾ NH H aliphatic
Lys H¾ N COOH H
CH¾
H¾ N COOH H
CH¾ ¾ CH¾ H N COOH
CONH¾ ¾ H¾ CH
C Gln ¾ CH¾ CH NH S
C CH¿
H¾ N NH Met Arg negative positive polar non−polar
ICB c 2006 Birkhäuser Verlag
Synthesis of Peptide Bond
Ê ¾ O
H¾ N
Ê ½
Ê O ½ OH O N
H¾ N OH
H¾ N
Ê
¾ C OH Æ O
ICB c 2006 Birkhäuser Verlag Standard Genetic Code
second position 5’ end 3’ end T C A G T Phe/F Tyr/Y Cys/C Ser/S C Ter/* A Ter/* T Trp/W G Leu/L T His/H Pro/P Arg/R C A Gln/Q C G T Asn/N Ser/S Ile/I Thr/T C A Lys/K Arg/R A Met/M G T Asp/D Val/V Ala/A Gly/G C A Glu/E G G
ICB c 2006 Birkhäuser Verlag Bioinformatics Applications
Program Purpose Reference Website BLAST Databasesearch [2] www.ncbi.nlm.nih.gov/BLAST clustalw Multiple sequence alignment [12] ftp-igbmc.u-strasbg.fr/pub EMBOSS Various bioinformatics tasks [10] emboss.sourceforge.net DnaSP Analyze DNA sequence polymorphisms [11] www.ub.es/dnasp FASTA Database search [9] fasta.bioch.virginia.edu GenScan Gene prediction [3] genes.mit.edu/GENSCAN.html gff2aplot Plotting sequence comparisons [1] genome.imim.es/software
ms Coalescent simulations [6] home.uchicago.edu/~rhudson1 MUMmer Genomealignment [4] www.tigr.org/software PHYLIP Phylogeny reconstruction [5] evolution.genetics.washington.edu SGP-2 Gene prediction [8] www1.imim.es/software/sgp2 Slam Gene prediction [7] bio.math.berkeley.edu/slam
ICB c 2006 Birkhäuser Verlag Bioinformatics Databases
Name Purpose Website GenBank primary sequence repository www.ncbi.nlm.nih.gov EMBL primarysequencerepository www.ebi.ac.uk/embl DDBJ primarysequencerepository www.ddbj.nig.ac.jp SwissProt curated protein sequences www.expasy.org/sprot PDB macromolecular structures www.rcsb.org/pdb INTERPRO protein motifs www.ebi.ac.uk/interpro
ICB c 2006 Birkhäuser Verlag References
J. F. Abril, R. Guigó, and T. Wiehe. gff2aplot: plotting sequence comparisons. Bioinformatics, 19:2477–2479, 2004. S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215:403–410, 1990. C. B. Burge and S. Karlin. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology, 268:78–94, 1997. A. L. Delcher, S. Kasti, R. D. Fleischmann, J. Peterson, O. White, and S. L. Salzberg. Alignment of whole genomes. Nucleic Acids Research, 27:2369–2376, 1999. J. Felsenstein. PHYLIP (phylogeny interference package), 1993. R. R. Hudson. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18:337–338, 2002. L. Pachter, M. Alexandersson, and S. Cawley. Applications of generalized pair Hidden Markov Models to alignment and gene finding problems. In Proceedings of the Fifth Annual Conference on Computational Molecular Biology (RECOMB 2001), pages 241–248. ACM Press, New York, 2001. G. Parra, P. Agarwal, J. F. Abril, T. Wiehe, J. Fickett, and R. Guigó. Comparative gene prediction in human and mouse. Genome Research, 13:108–117, 2003. W. R. Pearson and D. J. Lipman. Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences, USA, 85:2444–2448, 1988. P. Rice and A. Bleasby. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics, 16:276–277, 2000. J. Rozas, J. C. Sánchez-DelBarrio, X. Messeguer, and R. Rozas. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics, 19:2496–2497, 2003. J. D. Thompson, D. G. Higgins, and T. J. Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalities and weight matrix choice. Nucleic Acids Research, 22:4673–4680, 1994.
ICB c 2006 Birkhäuser Verlag