Human genetic ancestry (When branches get short...)
Ingo Ebersberger, Greg Ewing, Sascha Strauss, Heiko Schmidt and Arndt von Haeseler
Center for Integrative Bioinformatics in Vienna (CIBIV) http://www.illgenauktionen.de/Katalog/RIMG1770.JPG
The Coalescent Model
Wright-Fisher population model with constant population size. Each individual chooses its parent at random from the previous generation. Pairs of lineages are traced back to a coalescent event. Kingman (1982) developed a continuous-time model that allows the times between coalescent events to be estimated. The coalescent rate for any pair of genetic lineages is proportional to 1/Ne in generations or to 1/θ in substitutions.
Reconstructing phylogenetic relationships of contemporary sequences
Inferring the species tree from the sequence tree
When internal branches get short...
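To make the short-branch effect concrete: for three populations with species tree ((P1,P2),P3), the standard coalescent result is that the lineages from P1 and P2 fail to coalesce within the internal branch with probability exp(-T), and in that case each of the three sequence trees is equally likely (1/3). The sketch below evaluates these probabilities. The function name is illustrative, and T is assumed to be measured in coalescent units (generations divided by 2Ne for a diploid population), which the slides do not state explicitly.

```python
import math

def triplet_probabilities(t):
    """Sequence-tree probabilities for the species tree ((P1,P2),P3).

    t: length of the internal branch in coalescent units
       (assumption: generations / (2 * Ne) for diploids).
    """
    # Probability that the P1 and P2 lineages do NOT coalesce
    # before reaching the population ancestral to all three.
    deep = math.exp(-t)
    return {
        "((P1,P2),P3)": 1.0 - (2.0 / 3.0) * deep,  # congruent tree
        "((P2,P3),P1)": deep / 3.0,                # incongruent tree
        "((P1,P3),P2)": deep / 3.0,                # incongruent tree
    }

# Shorter internal branches push all three trees towards 1/3.
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    probs = triplet_probabilities(t)
    print(t, {tree: round(p, 3) for tree, p in probs.items()})
```

For T = 0 the three trees are equally probable; as T grows, the congruent tree dominates.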
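[Figure: sequence trees within a three-population species tree (P1, P2, P3); labels: MRCA(P1,P2), MRCA(P2,P3), MRCA(P1,P3); annotation: 1/3]
A well known example
Resolved ML trees distributed across the 15 topologies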
[Table: one cell per rooted topology, in slide order; the topology drawings are not preserved]
All              All
20 (0.17%)       4 (0.03%)
9,147 (76.47%)   1,369 (11.46%)
19 (0.16%)       13 (0.11%)
0                5 (0.04%)
0                1,361 (11.39%)
5 (0.04%)        0
0                0
0
((H,C),G) is the most frequently observed sequence tree
9,147 out of 11,877 sequence trees (77%) support ((H,C),G)
The fossil record on primate evolution
Sahelanthropus (~6.5 MYBP)
Sivapithecus (~13 MYBP)
Proconsul (~18 MYBP)
Morotopithecus (~21 MYBP)
Aegyptopithecus (~33 MYBP)
Dating of speciation events during primate evolution
Calibration (min fossil; max fossil)                           Human/Chimp   Gorilla      Orang     Rhesus
Human-chimp split (Sahelanthropus; arbitrary)                  6.5 - 10      7.3 - 19     20 - 45   36 - 79
Human-orangutan split (Sivapithecus; Proconsul)                2.9 - 6.3     3.4 - 11.3   13 - 18   21 - 34
Human-macaque split (Morotopithecus; Aegyptopithecus)          2.7 - 6.3     3.0 - 12.0   10 - 20   21 - 33
modified from Patterson et al. (2006) Nature 441:1103-1108
The fossil record on primate evolution
? Sahelanthropus (~6.5 MYBP)
Sivapithecus (~13 MYBP)
Proconsul (~18 MYBP)
Morotopithecus (~21 MYBP)
Aegyptopithecus (~33 MYBP)
A fraction of the human genome is evolutionarily old
[Figure: three sequence trees on Human, Chimp and Gorilla, labelled MRCA(C,G), MRCA(H,C) and MRCA(H,G); annotations: 1/3, 1,369 (11.5%), 1,361 (11.4%)]
~35% of the human genome is evolutionarily old
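The ~35% estimate can be reproduced with a few lines of arithmetic. This sketch assumes, as the 1/3 annotation suggests, that the three sequence trees are equally likely whenever the human and chimp lineages fail to coalesce before reaching the ancestor shared with gorilla. Under that assumption the congruent tree hides as many "old" loci as each incongruent tree reveals. Variable names are illustrative; the counts are the ones quoted on the slides.

```python
# Counts quoted on the slides.
n_total = 11877                 # sequence trees supporting one of the three H/C/G topologies
n_incongruent = [1369, 1361]    # the two incongruent topologies

frac_incongruent = sum(n_incongruent) / n_total

# Assumption: when the lineages sort in the ancestral population, each of
# the three topologies arises with probability 1/3. The congruent topology
# therefore contributes about as much as the mean of the two incongruent ones.
frac_hidden_congruent = frac_incongruent / 2

frac_old = frac_incongruent + frac_hidden_congruent
print(f"incongruent: {frac_incongruent:.1%}")
print(f"evolutionarily old (estimate): {frac_old:.1%}")
```

The estimate is simply 3/2 times the observed fraction of incongruent trees, which is why roughly 23% incongruence translates into roughly 35% of loci with an old coalescence.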
[Figure: the same three sequence trees, labelled MRCA(C,G), MRCA(H,C) and MRCA(H,G); annotations: 1/3, 1,369 (11.5%), 1,361 (11.4%), ~11.5%]
Genomic location of incongruent sequence trees
The position of incongruent sequence trees with respect to genes
All (%)        Gene (%)       Exon (%)       All (%)        Gene (%)     Exon (%)
20 (0.17)      8 (0.17)       2 (0.32)       4 (0.03)       1 (0.02)     0
9,148 (76.58)  3,814 (78.85)  487 (78.93)    1,369 (11.46)  504 (10.42)  63 (10.21)
19 (0.16)      10 (0.21)      2 (0.32)       13 (0.11)      6 (0.12)     1 (0.16)
0              0              0              5 (0.04)       0            0
1 (0.01)       0              0              1,361 (11.39)  492 (10.17)  62 (10.05)
5 (0.04)       2 (0.04)       0              0              0            0
0              0              0              0              0            0
0              0              0
The position of incongruent sequence trees with respect to genes: An example
1) KIR3DX1 2) LILRA2 3) LILRA1 4) LILRB1 5) LILRB4
The ML-tree of the LILRA Gene Family (CDS)
Function of LILRA1
Leukocyte immunoglobulin-like receptor (subfamily A)
Expressed in monocytes and B-cells
Plays a role in negative regulation of immune response
Cell surface receptor linked signal transduction
Human genes to go (> 300 bp exonic sequence covered)
#   TREE_ID    POSTPROB  EXONIC  DESCRIPTION
1   ((H,G)C)   1         869     transmembrane protease
2   ((C,G)H)   1         835     Hermansky-Pudlak syndrome 1 protein
3   ((C,G)H)   1         767
4   ((C,G)H)   1         757     meningioma (disrupted in balanced translocation) 1 (MN1)
5   ((H,G)C)   1         756     Zinc finger protein 230 (Zinc finger protein FDZF2)
6   ((C,G)H)   1         740     solute carrier family 22 (organic cation transporter)
7   ((H,G)C)   1         736     Sel-1 homolog precursor (Suppressor of lin-12-like protein) (Sel-1L)
8   ((C,G)H)   1         696     Docking protein 5 (Downstream of tyrosine kinase 5) (IRS6) (Protein dok-5)
9   ((H,G)C)   1         693     Melatonin-related receptor (G protein-coupled receptor 50) (H9)
10  ((C,G)H)   1         688     FLJ35348 (FLJ35348) on chromosome 9
11  ((H,G)C)   1         687     Membrane progestin receptor beta (mPR beta)
12  ((H,G)C)   1         676
13  ((C,G)H)   1         647     Disks large-associated protein 1 (DAP-1)
14  ((H,G)C)   1         647     Olfactory receptor 6S1
15  ((C,G)H)   1         634     Secretogranin-2 precursor (Secretogranin II)
16  ((C,G)H)   1         619     CUB and sushi domain-containing protein 2
17  ((C,G)H)   1         619     HMG2 like isoform 1
..
58  ((H,G)C)   0.95      365     Peptidyl-prolyl cis-trans isomerase A
Amino acid substitution patterns depend on the sequence phylogeny
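[Table, partly garbled in extraction. Columns: sequence tree; AA positions; substitution pattern "H , C , G" (X Y Y); substitution pattern "C , H , G" (X Y Y)]
Sequence tree   AA positions
H C G           9,482 (75%)
H G C           1,449 (11%)
C G H           1,787 (14%)
Pattern counts as extracted (cell assignment lost): 24, 12, 34, (17), (17); percentages: 41%, 59%
When did species-specific morphological characteristics evolve?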
? Sahelanthropus (~6.5 MYBP)
Sivapithecus (~13 MYBP)
Proconsul (~18 MYBP)
Morotopithecus (~21 MYBP)
Aegyptopithecus (~33 MYBP)
When subsequent branches get compressed...
Radiation: Speciation events compressed in time
Compression of subsequent branches can cause Anomalous Gene Trees (AGTs)
ANOMALOUS SEQUENCE TREES: incongruent sequence trees that are more likely to be observed than congruent sequence trees
Degnan and Rosenberg (2006) PLoS Genetics 2:762-769
The problem of anomalous gene trees
Not all unlabelled rooted tree topologies are equally probable
[Figure: the two unlabelled rooted 4-taxon shapes; symmetric shape 1/3, asymmetric shape 2/3]
Symmetric labelled trees are more probable than asymmetric labelled trees
[Figure: the 3 symmetric and 12 asymmetric labelled topologies on taxa a, b, c, d; 1/3 and 2/3 in total]
If we generate rooted 4-taxa trees with a coalescent model, a symmetric labelled topology has a higher probability (1/9) than an asymmetric labelled topology (1/18).
Asymmetric 4-taxa species trees can produce AGTs
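The 1/9 and 1/18 values quoted for labelled 4-taxon topologies can be checked by brute-force enumeration. The sketch below assumes the plain coalescent in a single population: at every step each pair of remaining lineages is equally likely to merge. Trees are represented as nested frozensets so that the order of children does not matter; all names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def topology_probabilities(taxa):
    """Sum the probabilities of all coalescent histories per labelled topology."""
    probs = Counter()

    def recurse(lineages, p):
        if len(lineages) == 1:          # everything has coalesced
            probs[lineages[0]] += p
            return
        pairs = list(combinations(range(len(lineages)), 2))
        for i, j in pairs:              # every pair merges with equal probability
            merged = frozenset([lineages[i], lineages[j]])
            rest = [l for k, l in enumerate(lineages) if k not in (i, j)]
            recurse(rest + [merged], p / len(pairs))

    recurse(list(taxa), Fraction(1))
    return probs

def is_symmetric(tree):
    """For 4 taxa: symmetric if both children of the root are internal nodes."""
    return all(isinstance(child, frozenset) for child in tree)

probs = topology_probabilities("abcd")
sym = [p for t, p in probs.items() if is_symmetric(t)]
asym = [p for t, p in probs.items() if not is_symmetric(t)]
print(len(sym), set(sym), sum(sym))     # symmetric labelled topologies
print(len(asym), set(asym), sum(asym))  # asymmetric labelled topologies
```

A symmetric topology such as ((a,b),(c,d)) can arise from two histories (either cherry may form first), whereas an asymmetric topology has only one, which accounts for the factor of two.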
[Figure: asymmetric species tree on P1, P2, P3, P4 with two sequence trees; annotations: 1/9, 1/12]
Topology preference in real data
All (%)        Gene (%)       Exon (%)       All (%)        Gene (%)     Exon (%)
20 (0.17)      8 (0.17)       2 (0.32)       4 (0.03)       1 (0.02)     0
9,148 (76.58)  3,814 (78.85)  487 (78.93)    1,369 (11.46)  504 (10.42)  63 (10.21)
19 (0.16)      10 (0.21)      2 (0.32)       13 (0.11)      6 (0.12)     1 (0.16)
0              0              0              5 (0.04)       0            0
1 (0.01)       0              0              1,361 (11.39)  492 (10.17)  62 (10.05)
5 (0.04)       2 (0.04)       0              0              0            0
0              0              0              0              0            0
0              0              0
[Annotations on the slide: 29, 38]
AGTs in n-taxa species trees (n ≥ 5)
[Figure: three labelled 5-taxon trees]
The probability for each labelled tree is 1/90, 1/60, and 1/180, respectively (5-taxa case).
There is a proof that every species tree topology with five or more species can give rise to AGTs.
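The three 5-taxon values can be reproduced without enumeration. Under the plain coalescent in a single population (an assumption of this sketch), all ranked histories are equally likely, so the probability of a labelled topology is its number of possible rankings divided by the total number of histories. The number of rankings is (n-1)! divided by the product, over all internal nodes, of the number of internal nodes in the subtree below. Trees are written as nested tuples; which tree corresponds to which drawing on the slide is inferred from the values, not from the source.

```python
from fractions import Fraction
from math import comb, factorial, prod

def internal_sizes(tree):
    """Return (internal nodes in this subtree, list of that count for every internal node)."""
    if not isinstance(tree, tuple):      # a leaf
        return 0, []
    n_left, sizes_left = internal_sizes(tree[0])
    n_right, sizes_right = internal_sizes(tree[1])
    n_here = n_left + n_right + 1
    return n_here, sizes_left + sizes_right + [n_here]

def labelled_topology_probability(tree):
    n_internal, sizes = internal_sizes(tree)
    n_taxa = n_internal + 1
    # Number of orders in which the coalescent events can occur.
    rankings = factorial(n_internal) // prod(sizes)
    # Total number of equally likely histories: C(n,2) * C(n-1,2) * ... * C(2,2).
    histories = prod(comb(k, 2) for k in range(2, n_taxa + 1))
    return Fraction(rankings, histories)

trees = {
    "(((a,b),(c,d)),e)": ((("a", "b"), ("c", "d")), "e"),
    "(((a,b),c),(d,e))": ((("a", "b"), "c"), ("d", "e")),
    "((((a,b),c),d),e)": (((("a", "b"), "c"), "d"), "e"),
}
for name, tree in trees.items():
    print(name, labelled_topology_probability(tree))
```

The fully asymmetric (caterpillar) tree allows only one order of events and is therefore the least probable labelled topology.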
Reconstructing a species tree by taking the consensus of many sequence trees can be positively misleading.
A fast method to reconstruct species trees in the presence of AGTs
A simple solution: rooted triples do not display the AGT phenomenon
[Figure: rooted triple; annotation: 1/3]
The idea of the Triple Consensus Method (TCM)
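The slides do not spell out the TCM algorithm, so the following is only one plausible reading of its first step, not the authors' implementation. Because rooted triples do not display the AGT phenomenon, each set of three taxa can be resolved by a majority vote over the sequence trees. The sketch assumes rooted binary trees given as nested tuples of taxon names; all function names are hypothetical. Assembling the winning triples into a full species tree would need an additional supertree step that is not shown here.

```python
from collections import Counter
from itertools import combinations

def leaves(tree):
    """Set of taxon names below a node."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def rooted_triple(tree, trio):
    """Return the pair of `trio` that `tree` groups together, or None if a taxon is missing."""
    trio = set(trio)
    while not isinstance(tree, str):
        left = leaves(tree[0]) & trio
        right = leaves(tree[1]) & trio
        if len(left) == 2 and len(right) == 1:
            return frozenset(left)
        if len(right) == 2 and len(left) == 1:
            return frozenset(right)
        if len(left) == 3:
            tree = tree[0]          # all three taxa sit in the left subtree
        elif len(right) == 3:
            tree = tree[1]          # all three taxa sit in the right subtree
        else:
            return None             # trio not fully present in this tree
    return None

def majority_triples(sequence_trees):
    """For every trio of taxa, the most frequent sister pair and its vote count."""
    taxa = sorted(set.union(*(leaves(t) for t in sequence_trees)))
    winners = {}
    for trio in combinations(taxa, 3):
        votes = Counter(rooted_triple(t, trio) for t in sequence_trees)
        votes.pop(None, None)       # ignore trees lacking one of the taxa
        if votes:
            winners[trio] = votes.most_common(1)[0]
    return winners

# Toy input (made up for illustration): four sequence trees on H, C, G, O.
example_trees = [
    ((("H", "C"), "G"), "O"),
    ((("H", "C"), "G"), "O"),
    ((("C", "G"), "H"), "O"),
    ((("H", "G"), "C"), "O"),
]
for trio, (pair, count) in majority_triples(example_trees).items():
    print(trio, sorted(pair), count)
```

In the toy input the trio (C, G, H) is resolved as (H,C) with two of four votes, while every trio containing O keeps O as the outgroup.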
Taxon                      L3   L11  L12  L13A L15  L23  L27  L27a L32  L35
Anurida maritima           X    X    X    X    X    X    X    X    X    X
Archispirostreptus gigas   X    X    X    X    X    X    X    X    X    X
Astrosclera willeyana      X    X    X    X    X    X    X    X    X    X
Barentsia elongata         X    X    X    X    X    X    X    X    X    X
Flustra foliacea           X    X    X    X    X    X    X    X    X    X
Littorina saxatilis        X    X    X    X    X    X    X    X    X    X
Lubomirskia baicalensis    X    X    X    X    X    X    X    X    X    X
Psoroptes ovis             X    X    X    X    X    X    X    X    X    X
Sipunculus nudus           X    X    X    X    X    X    X    X    X    X
Xenoturbella bocki         X    X    X    X    X    X    X    X    X    X
Generate Consensus Tree
Simulation
[Figure: simulation scheme with panels (a) and (b); step label: Generate sequence tree]
Simulation Results (perfect sequence trees)
20 taxa tree, 1000 replications per parameter combination
Simulation Results (reconstructed sequence trees (ML))
[Figure legend: ML sequence tree; true sequence tree]
20 taxa tree, 1000 replications per parameter combination
Application to biological data
[Figure: taxa in the data set: Anopheles g., Apis m., Bos t., C. briggsae, C. elegans, Canis f., Ciona s., Danio r., Drosophila m., Drosophila p., Gallus g., Homo, Macaca m., Monodelphis d., Mus m., Pan t., Rattus n., Saccharomyces s., Takifugu r., Tetraodon n., Xenopus t.]
The resulting rooted species phylogeny based on 216 sequence trees
No difference from the MRe-tree, hence no indication of AGTs
A more relevant dataset to come??
Porifera
Ecdysozoa
Deuterostomes
Annelids/Molluscs
Some conclusions and challenges
Genealogy of the phenotype depends on the genealogy of the underlying genotype.
Consideration of population genetic effects is necessary when branches get short or population sizes get large.
A map of human genetic ancestry is required to understand human phenotypic evolution and to correctly interpret the fossil record.
Ancestral polymorphisms and incomplete lineage sorting serve as an alternative explanation to homoplasy.
Short branches can lead to 'un-intuitive' behaviour of evolution and can interfere in several ways with accurate species tree reconstruction.
Acknowledgements
Positional conservation in the LILR-region
Fossil Record on Human Evolution
Dating of Hominid Fossils:
Ardipithecus kadabba: ~5.5 MYBP
Orrorin tugenensis: ~5.8 MYBP
Sahelanthropus tchadensis: 6.5-7.5 MYBP