ELEMENTS OF BIOINFORMATICS
10 PHYLOGENETIC ANALYSIS
1. Introduction - phylogenetics
2. The structure of a phylogenetic tree
3. Methodology of tree construction
4. Software examples
Copyright ©2014, Joanna Szyda
INTRODUCTION - PHYLOGENETICS
METHODS OF CONSTRUCTING PHYLOGENETIC TREES
CLUSTERING
• Do not account for evolutionary relationships
• UPGMA - Unweighted Pair Group Method with Arithmetic Mean
• Neighbour joining
CLADISTIC
• Account for evolutionary relationships
• Maximum parsimony
• Maximum likelihood
METHODS OF TREE CONSTRUCTION
THE CLUSTERING METHOD
METHODS OF TREE CONSTRUCTION - UPGMA
STEPS IN UPGMA
1. Calculate a distance matrix between individuals
2. Choose the most similar individuals = node
3. Calculate a new distance matrix
4. Go back to 2 and repeat until all individuals are joined in one tree
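Step 1 can be sketched in Python as follows. This is a minimal illustration, assuming that the distance between two individuals is the number of positions at which their aligned sequences differ (the Hamming distance), which reproduces the values in the worked example for ATCC, ATGC, TTCG and TCGG. The function names `hamming` and `distance_matrix` are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Pairwise distances as a dict keyed by frozenset({seq_a, seq_b})."""
    return {frozenset((a, b)): hamming(a, b) for a, b in combinations(seqs, 2)}


seqs = ["ATCC", "ATGC", "TTCG", "TCGG"]
dist = distance_matrix(seqs)
for a, b in combinations(seqs, 2):
    print(a, b, dist[frozenset((a, b))])
```

Keying the distances by unordered pairs stores each value once, which corresponds to the upper triangle of the matrix.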
METHODS OF TREE CONSTRUCTION - UPGMA
1. Calculate a distance matrix between individuals
        ATCC  ATGC  TTCG  TCGG
ATCC     0     1     2     4
ATGC           0     3     3
TTCG                 0     2
TCGG                       0
METHODS OF TREE CONSTRUCTION - UPGMA
2. Choose the most similar individuals = node
        ATCC  ATGC  TTCG  TCGG
ATCC     0     1     2     4
ATGC           0     3     3
TTCG                 0     2
TCGG                       0
The smallest distance is 1 (ATCC, ATGC): these two sequences are joined in a node, with branch lengths 0.5 and 0.5
METHODS OF TREE CONSTRUCTION - UPGMA
3. Calculate a new distance matrix
               ATCC + ATGC   TTCG          TCGG
ATCC + ATGC    0             (2+3)/2=2.5   (4+3)/2=3.5
TTCG                         0             2
TCGG                                       0
METHODS OF TREE CONSTRUCTION - UPGMA
4. Choose the most similar individuals = node
               ATCC + ATGC   TTCG          TCGG
ATCC + ATGC    0             (2+3)/2=2.5   (4+3)/2=3.5
TTCG                         0             2
TCGG                                       0
The smallest distance is 2 (TTCG, TCGG): these two sequences are joined in a node, with branch lengths 1 and 1
Tree so far: ATCC and ATGC joined with branch lengths 0.5 and 0.5; TTCG and TCGG joined with branch lengths 1 and 1
METHODS OF TREE CONSTRUCTION - UPGMA
5. Calculate a new distance matrix
               ATCC + ATGC   TTCG + TCGG
ATCC + ATGC    0             (2+4+3+3)/4=3
TTCG + TCGG                  0
METHODS OF TREE CONSTRUCTION - UPGMA
6. Choose the most similar individuals = node
               ATCC + ATGC   TTCG + TCGG
ATCC + ATGC    0             (2+4+3+3)/4=3
TTCG + TCGG                  0
The two remaining clusters are joined in the root node at height 1.5 (half of the distance 3)
Branch lengths to the leaves: ATCC 0.5, ATGC 0.5, TTCG 1, TCGG 1
METHODS OF TREE CONSTRUCTION - UPGMA
1. The most basic method
2. Very fast
3. Relies on the "molecular clock" assumption: the same evolutionary speed on all lineages
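The whole UPGMA loop from the worked example can be sketched as below. This is an illustrative sketch rather than a reference implementation: the function name `upgma` and the representation of clusters as tuples of leaf labels are assumptions made for the example. The distance between two clusters is taken as the arithmetic mean over all leaf pairs, and each new node is placed at half of that distance.

```python
from itertools import combinations


def upgma(names, dist):
    """Return the list of merges as (cluster_a, cluster_b, node_height).

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b
    Clusters are tuples of leaf labels.
    """
    clusters = [(n,) for n in names]
    merges = []

    def avg(c1, c2):
        # Arithmetic mean of the distances over all leaf pairs between two clusters
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Choose the most similar pair of clusters
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        # The new node sits at half of the distance between the joined clusters
        merges.append((c1, c2, avg(c1, c2) / 2))
        # "New distance matrix": distances to the merged cluster come from avg()
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


names = ["ATCC", "ATGC", "TTCG", "TCGG"]
dist = {frozenset(pair): d for pair, d in [
    (("ATCC", "ATGC"), 1), (("ATCC", "TTCG"), 2), (("ATCC", "TCGG"), 4),
    (("ATGC", "TTCG"), 3), (("ATGC", "TCGG"), 3), (("TTCG", "TCGG"), 2),
]}
merges = upgma(names, dist)
for c1, c2, height in merges:
    print(" + ".join(c1), "|", " + ".join(c2), "| node height", height)
```

With the example matrix the nodes are created at heights 0.5, 1.0 and 1.5, matching the tree built step by step above. Recomputing averages from the original leaf distances keeps the sketch short; a larger data set would call for updating a stored matrix instead.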
METHODS OF TREE CONSTRUCTION - NEIGHBOUR JOINING
1. Accounts for different speeds of evolution across lineages
2. Relatively fast
3. Provides estimates of edge lengths
4. Results depend on the assumed model of evolution
METHODS OF TREE CONSTRUCTION
THE CLADISTIC METHOD
METHODS OF TREE CONSTRUCTION – Maximum parsimony
STEPS IN MAXIMUM PARSIMONY
1. Sequence alignment
2. Construction of ALL possible trees
3. Selection of a tree with the minimal number of mutations required
METHODS OF TREE CONSTRUCTION – Maximum parsimony
1. Sequence alignment
1  A A C C G A T
2  A A C C G C A
3  A G T C G T T
4  A G T C G G A
• The same value in all sequences → position non-informative
• All different values → position non-informative
• Repeated values → position informative
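The three rules above can be checked column by column. The sketch below assumes the usual definition of a parsimony-informative position (at least two different values, each present in at least two sequences), which agrees with the three rules for this alignment. The function name `informative_sites` is hypothetical.

```python
from collections import Counter


def informative_sites(alignment):
    """Return the 1-based positions of parsimony-informative columns."""
    positions = []
    for i, column in enumerate(zip(*alignment), start=1):
        counts = Counter(column)
        # Informative: at least two different values, each seen in >= 2 sequences
        repeated_values = sum(1 for c in counts.values() if c >= 2)
        if repeated_values >= 2:
            positions.append(i)
    return positions


alignment = ["AACCGAT", "AACCGCA", "AGTCGTT", "AGTCGGA"]
sites = informative_sites(alignment)
print(sites)  # [2, 3, 7]
```

Positions 1, 4 and 5 carry the same value in all four sequences, position 6 carries four different values, and only positions 2, 3 and 7 can favour one tree over another.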
METHODS OF TREE CONSTRUCTION – Maximum parsimony
2. Construction of all possible trees
The total number of possible trees:
• rooted (2n - 3)!!
• unrooted (2n - 5)!!
n – the number of sequences
n!! = 1*3*5*7*...*n
n     rooted        unrooted
3     3             1
4     15            3
5     105           15
10    34 459 425    2 027 025
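The table can be reproduced directly from the two formulas. A minimal sketch, with hypothetical function names, valid for n of 3 or more:

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 for odd k."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n):
    """Number of rooted trees for n sequences: (2n - 3)!!"""
    return double_factorial(2 * n - 3)


def unrooted_trees(n):
    """Number of unrooted trees for n sequences: (2n - 5)!!"""
    return double_factorial(2 * n - 5)


for n in (3, 4, 5, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
```

The rapid growth of these numbers is the reason why constructing all possible trees is feasible only for small data sets.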
METHODS OF TREE CONSTRUCTION – Maximum parsimony
2. Construction of all possible trees
[Figure: the three possible unrooted trees for the four sequences ACT, ACA, GTA and GTT]
METHODS OF TREE CONSTRUCTION – Maximum parsimony
3. Selection of a tree with minimal number of mutations required
[Figure: the three candidate trees for ACT, ACA, GTA and GTT, with sequences assigned to the internal nodes and the number of mutations marked on each branch]
METHODS OF TREE CONSTRUCTION – Maximum parsimony
1. Parsimony: selects the tree with the minimal number of mutations
2. Ockham's razor: prefers the simplest of many possible solutions
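Step 3 can be illustrated for the four example sequences ACT, ACA, GTA and GTT. The slides do not name an algorithm for counting mutations; the sketch below assumes Fitch's method, which finds the minimal number of mutations a tree requires at each position. Trees are written as nested pairs, the leaves are labelled by their own sequences, and the function names are hypothetical. The per-branch counts may differ from those drawn on the slide, because they depend on the sequences chosen for the internal nodes.

```python
def fitch_site(tree, column):
    """Return (possible_states, mutations) for one alignment position.

    tree:   a leaf label (str) or a pair (left, right) of subtrees
    column: dict mapping leaf label -> nucleotide at this position
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_mut = fitch_site(tree[0], column)
    right_states, right_mut = fitch_site(tree[1], column)
    common = left_states & right_states
    if common:
        return common, left_mut + right_mut
    # No shared state between the two subtrees: one more mutation is required
    return left_states | right_states, left_mut + right_mut + 1


def parsimony_score(tree, seqs):
    """Minimal number of mutations summed over all positions."""
    length = len(seqs[0])
    return sum(fitch_site(tree, {s: s[i] for s in seqs})[1] for i in range(length))


seqs = ["ACT", "ACA", "GTA", "GTT"]
trees = [
    (("ACT", "ACA"), ("GTA", "GTT")),
    (("ACT", "GTA"), ("ACA", "GTT")),
    (("ACT", "GTT"), ("ACA", "GTA")),
]
scores = {tree: parsimony_score(tree, seqs) for tree in trees}
best = min(scores, key=scores.get)
for tree, score in scores.items():
    print(tree, score)
print("most parsimonious:", best)
```

Under these assumptions the three trees require 4, 6 and 5 mutations, so the tree that groups ACT with ACA and GTA with GTT is selected.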
METHODS OF TREE CONSTRUCTION – Maximum likelihood
1. Accounts for different probabilities of mutations: uses substitution models
2. Uses all sequence positions (not only informative)
3. Very slow
4. Accurate results
5. Provides the probability of tree correctness
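To make "uses substitution models" concrete, the sketch below evaluates the likelihood of two aligned sequences under the Jukes-Cantor model. That model is chosen here only as the simplest example and is not named in the slides. It assumes equal nucleotide frequencies and equal substitution rates; the branch length `d` is the expected number of substitutions per position. The function names are hypothetical.

```python
import math


def jc69_prob(same, d):
    """Jukes-Cantor probability of observing the same nucleotide (same=True)
    or one specific different nucleotide (same=False) after branch length d."""
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def pair_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by branch length d."""
    # 0.25 is the equilibrium frequency of each nucleotide under this model
    return sum(math.log(0.25 * jc69_prob(a == b, d)) for a, b in zip(seq1, seq2))


candidates = (0.1, 0.3, 1.0)
loglik = {d: pair_log_likelihood("ATCC", "ATGC", d) for d in candidates}
best = max(loglik, key=loglik.get)
for d in candidates:
    print(d, round(loglik[d], 3))
print("best of the candidate branch lengths:", best)
```

Every position contributes to the likelihood, including those that are non-informative for parsimony. A full maximum likelihood analysis repeats this kind of calculation over whole trees, which explains why the method is slow.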
METHODS OF TREE CONSTRUCTION
UPGMA
• Simple, fast
• Analysis of large data sets - possible
• Does not account for evolution
MAXIMUM PARSIMONY
• Slow: a large number of possible trees
• Analysis of large data sets - problematic
• Accounts for evolution through modelling mutations
METHODS OF TREE CONSTRUCTION
Probability of tree correctness: bootstrap
1. GENERATE AN ARTIFICIAL DATA SET
Shuffle the nucleotide sequence: draw positions (columns) of the alignment at random, with replacement
2. GENERATE A TREE
Repeat steps 1 and 2 1000 times
3. DETERMINE THE REPEATABILITY OF A GIVEN BRANCH
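The bootstrap loop can be sketched as follows, assuming that each artificial data set is generated by resampling alignment columns with replacement. The tree-building step is represented by a caller-supplied function `has_branch`, which is hypothetical: it should build a tree from the resampled alignment and report whether the branch of interest is present. A toy stand-in is used here so that the example runs on its own.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; rows stay aligned."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def branch_support(alignment, has_branch, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which has_branch(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(
        bool(has_branch(bootstrap_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return hits / replicates


def first_two_are_closest(aln):
    """Toy stand-in for tree building: are sequences 1 and 2 strictly closer
    to each other than any other pair of sequences?"""
    def diff(a, b):
        return sum(x != y for x, y in zip(a, b))
    pairs = [(i, j) for i in range(len(aln)) for j in range(i + 1, len(aln))]
    others = [diff(aln[i], aln[j]) for i, j in pairs if (i, j) != (0, 1)]
    return diff(aln[0], aln[1]) < min(others)


alignment = ["AACCGAT", "AACCGCA", "AGTCGTT", "AGTCGGA"]
support = branch_support(alignment, first_two_are_closest, replicates=1000)
print("support for grouping sequences 1 and 2:", support)
```

The returned fraction is the repeatability of the branch. The fixed seed only makes the run reproducible.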
TREE CONSTRUCTION
PRACTICAL EXAMPLE
TREE CONSTRUCTION – example Clustal
1. Select sequences
2. Sequence alignment
3. Tree construction: the resulting tree shows the branch lengths
PHYLOGENETIC SOFTWARE
PHYLIP - www.phylip.com/
INSTITUT PASTEUR - http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome
MEGA - www.megasoftware.net/
PAUP - www.paup.csit.fsu.edu/
TREEFINDER - www.treefinder.de