Human Genetic Ancestry (When Branches Get Short...)
Ingo Ebersberger, Greg Ewing, Sascha Strauss, Heiko Schmidt and Arndt von Haeseler
Center for Integrative Bioinformatics in Vienna (CIBIV)
Image source: http://www.illgenauktionen.de/Katalog/RIMG1770.JPG

The Coalescent Model
The model assumes a Wright-Fisher population of constant size. Each individual chooses its parent at random from the previous generation. Pairs of lineages are traced back in time until they meet in a common ancestor (a coalescent event). Kingman (1982) developed a continuous-time model that allows the times between coalescent events to be estimated. The coalescent rate for any pair of genetic lineages is proportional to 1/Ne (Ne: effective population size) when time is measured in generations, or to 1/θ when it is measured in substitutions.

Reconstructing phylogenetic relationships of contemporary sequences

Inferring the species tree from the sequence tree

When internal branches get short...
[Figure: sequence trees inside a species tree for three populations P1, P2 and P3. Labels: MRCA(P1,P2), MRCA(P2,P3), MRCA(P1,P3); probability 1/3.]

A well known example
Resolved ML trees distributed across the 15 topologies (H = human, C = chimpanzee, G = gorilla; the tree drawings are not preserved, counts are given in slide order):

  Topology   Trees (%)
   1         20 (0.17%)
   2         4 (0.03%)
   3         9,147 (76.47%)
   4         1,369 (11.46%)
   5         19 (0.16%)
   6         13 (0.11%)
   7         0
   8         5 (0.04%)
   9         0
  10         1,361 (11.39%)
  11         5 (0.04%)
  12         0
  13         0
  14         0
  15         0

((H,C),G) is the most frequently observed sequence tree: 9,147 out of 11,877 sequence trees (77%) support it.

The fossil record on primate evolution
(MYBP = million years before present)

  Sahelanthropus    ~6.5 MYBP
  Sivapithecus      ~13 MYBP
  Proconsul         ~18 MYBP
  Morotopithecus    ~21 MYBP
  Aegyptopithecus   ~33 MYBP

Dating of speciation events during primate evolution
Estimated split times from human, in million years, for three alternative fossil calibrations (modified from Patterson et al. (2006) Nature 441:1103-1108):

  Calibration (min fossil; max fossil)                          Chimp       Gorilla      Orang     Rhesus
  Human-chimp split (Sahelanthropus; arbitrary)                 6.5 - 10    7.3 - 19     20 - 45   36 - 79
  Human-orangutan split (Sivapithecus; Proconsul)               2.9 - 6.3   3.4 - 11.3   13 - 18   21 - 34
  Human-macaque split (Morotopithecus; Aegyptopithecus)         2.7 - 6.3   3.0 - 12.0   10 - 20   21 - 33

A fraction of the human genome is evolutionarily old
[Figure: three sequence trees for Human, Chimp and Gorilla drawn inside the species tree, labelled MRCA(H,C), MRCA(C,G) and MRCA(H,G), with probability 1/3. Counts shown for the two incongruent topologies: 1,369 trees (11.5%) and 1,361 trees (11.4%).]

~35% of the human genome is evolutionarily old. The two incongruent topologies account for 11.5% and 11.4% of the trees. If lineages that reach the common ancestral population coalesce in each of the three possible ways with probability 1/3, a further ~11.5% of the trees are congruent with the species tree but coalesce just as deep, which gives roughly 11.5% + 11.4% + 11.5%.

Genomic location of incongruent sequence trees

The position of incongruent sequence trees with respect to genes
Counts per topology (same slide order as above), with percentages of the respective column total in parentheses:

  Topology   All (%)          Gene (%)         Exon (%)
   1         20 (0.17)        8 (0.17)         2 (0.32)
   2         4 (0.03)         1 (0.02)         0
   3         9,148 (76.58)    3,814 (78.85)    487 (78.93)
   4         1,369 (11.46)    504 (10.42)      63 (10.21)
   5         19 (0.16)        10 (0.21)        2 (0.32)
   6         13 (0.11)        6 (0.12)         1 (0.16)
   7         0                0                0
   8         5 (0.04)         0                0
   9         1 (0.01)         0                0
  10         1,361 (11.39)    492 (10.17)      62 (10.05)
  11         5 (0.04)         2 (0.04)         0
  12         0                0                0
  13         0                0                0
  14         0                0                0
  15         0                0                0

The position of incongruent sequence trees with respect to genes: An example
[Figure: genomic region containing 1) KIR3DX1, 2) LILRA2, 3) LILRA1, 4) LILRB1, 5) LILRB4.]

The ML-tree of the LILRA Gene Family (CDS)

Function of LILRA1
  Leukocyte immunoglobulin-like receptor (subfamily A)
  Expressed in monocytes and B-cells
  Plays a role in negative regulation of immune response
  Cell surface receptor linked signal transduction

Human genes to go (> 300 bp exonic sequence covered)

   #   TREE_ID     POSTPROB   EXONIC   DESCRIPTION
   1   ((H,G)C)    1          869      transmembrane protease
   2   ((C,G)H)    1          835      Hermansky-Pudlak syndrome 1 protein
   3   ((C,G)H)    1          767
   4   ((C,G)H)    1          757      meningioma (disrupted in balanced translocation) 1 (MN1)
   5   ((H,G)C)    1          756      Zinc finger protein 230 (Zinc finger protein FDZF2)
   6   ((C,G)H)    1          740      solute carrier family 22 (organic cation transporter)
   7   ((H,G)C)    1          736      Sel-1 homolog precursor (Suppressor of lin-12-like protein) (Sel-1L)
   8   ((C,G)H)    1          696      Docking protein 5 (Downstream of tyrosine kinase 5) (IRS6) (Protein dok-5)
   9   ((H,G)C)    1          693      Melatonin-related receptor (G protein-coupled receptor 50) (H9)
  10   ((C,G)H)    1          688      FLJ35348 (FLJ35348) on chromosome 9
  11   ((H,G)C)    1          687      Membrane progestin receptor beta (mPR beta)
  12   ((H,G)C)    1          676
  13   ((C,G)H)    1          647      Disks large-associated protein 1 (DAP-1)
  14   ((H,G)C)    1          647      Olfactory receptor 6S1
  15   ((C,G)H)    1          634      Secretogranin-2 precursor (Secretogranin II)
  16   ((C,G)H)    1          619      CUB and sushi domain-containing protein 2
  17   ((C,G)H)    1          619      HMG2 like isoform 1
  ..
  58   ((H,G)C)    0.95       365      Peptidyl-prolyl cis-trans isomerase A

Amino acid substitution patterns depend on the sequence phylogeny

  Sequence tree   AA positions
  H C G           9,482 (75%)
  H G C           1,449 (11%)
  C G H           1,787 (14%)

[Table residue: two further columns compared the substitution patterns X Y Y for the orderings H, C, G and C, H, G. The surviving cell values are 24 (41%) and 34 (59%), and 12 (41%) and (17) (59%); their assignment to rows is not recoverable.]

When did species-specific morphological characteristics evolve?
[Figure: the fossil timeline shown above (Sahelanthropus ~6.5, Sivapithecus ~13, Proconsul ~18, Morotopithecus ~21, Aegyptopithecus ~33 MYBP) with a question mark.]

When subsequent branches get compressed...
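Before turning to compressed successive branches, the single short internal branch discussed in the earlier slides can be made concrete. The following is a minimal sketch, not taken from the slides: it uses the standard three-species coalescent result that the sequence tree disagrees with the species tree with probability (2/3)·exp(-t), where t is the internal branch length in units in which a pair of lineages coalesces at rate 1. The function names and the time unit are assumptions made for illustration.

```python
import math
import random


def incongruence_probability(t):
    """Probability that a 3-species sequence tree differs from the species tree.

    t is the internal branch length in coalescent units (assumption:
    a pair of lineages coalesces at rate 1 per unit of time).
    """
    return (2.0 / 3.0) * math.exp(-t)


def simulate_incongruence(t, replicates, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    incongruent = 0
    for _ in range(replicates):
        # Waiting time until the two sister lineages coalesce: Exp(1).
        if rng.expovariate(1.0) > t:
            # No coalescence on the internal branch: three lineages enter
            # the ancestral population and each pairing has probability 1/3.
            # Outcome 0 stands for the congruent pairing.
            if rng.randrange(3) != 0:
                incongruent += 1
    return incongruent / replicates


def branch_length_from_incongruence(p):
    """Invert p = (2/3) * exp(-t) to get t from an observed fraction p."""
    return -math.log(1.5 * p)


if __name__ == "__main__":
    observed = 0.1146 + 0.1139  # the two incongruent fractions quoted above
    t = branch_length_from_incongruence(observed)
    print("implied internal branch length:", round(t, 2))
    print("fraction coalescing in the ancestor:", round(math.exp(-t), 3))
    print("simulated incongruence:", simulate_incongruence(t, 100_000))
```

With the incongruent fractions quoted above (11.46% + 11.39%), the implied branch length is about 1.07 coalescent units, and the fraction of loci coalescing in the ancestral population is about 34%, in line with the ~35% figure.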
Radiation: speciation events compressed in time
Compression of subsequent branches can cause anomalous gene trees (AGTs).
Anomalous sequence trees: incongruent sequence trees that are more likely to be observed than congruent sequence trees.
Degnan and Rosenberg (2006) PLOS Genetics 2:762-769

The problem of anomalous gene trees
Not all unlabelled rooted tree topologies are equally probable: for four taxa, the symmetric shape has probability 1/3 and the asymmetric shape 2/3.
Symmetric labelled trees are nevertheless more probable than asymmetric labelled trees. If we generate rooted 4-taxa trees with a coalescent model, a symmetric labelled topology has a higher probability (1/9) than an asymmetric labelled topology (1/18), because the 1/3 is shared among only 3 symmetric labelled topologies, whereas the 2/3 is shared among 12 asymmetric ones.
[Figure: the 3 symmetric and 12 asymmetric labelled topologies on the taxa a, b, c, d.]

Asymmetric 4-taxa species trees can produce AGTs
[Figure: asymmetric species tree on P1, P2, P3, P4 with embedded sequence trees; labels 1/9 and 1/12.]

Topology preference in real data
[Table: the All / Gene / Exon counts per topology, as in "The position of incongruent sequence trees with respect to genes"; additional labels on this slide: 29, 38.]

AGTs in n-taxa species trees (n ≥ 5)
The probability for each labelled tree is 1/90, 1/60, and 1/180, respectively (5 taxa case).
There is a proof that every species tree topology with five or more species can give rise to AGTs.
Species tree reconstruction by taking the consensus of many sequence trees can therefore be positively misleading.

A fast method to reconstruct species trees in the presence of AGTs
A simple solution: rooted triples do not display the AGT phenomenon. For three taxa, the topology that matches the species tree is never less probable than either alternative; in the limit of a zero-length internal branch all three topologies have probability 1/3.
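The labelled-topology probabilities quoted above (1/9 versus 1/18 for four taxa; 1/90, 1/60 and 1/180 for five taxa; 1/3 for each rooted triple) can be checked by exhaustive enumeration. The sketch below assumes a single panmictic population in which, at every step, a uniformly chosen pair of lineages coalesces; the function names are hypothetical and not from the slides.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import combinations


def topology_probabilities(n):
    """Exact probability of every labelled rooted topology on n lineages.

    Trees are represented as nested frozensets, so the left/right order
    of children does not matter.
    """
    probs = defaultdict(Fraction)

    def recurse(lineages, weight):
        if len(lineages) == 1:
            probs[lineages[0]] += weight
            return
        pairs = list(combinations(range(len(lineages)), 2))
        for i, j in pairs:
            merged = frozenset((lineages[i], lineages[j]))
            rest = [x for k, x in enumerate(lineages) if k not in (i, j)]
            # Every pair is equally likely to coalesce next.
            recurse(rest + [merged], weight / len(pairs))

    recurse([chr(ord("a") + k) for k in range(n)], Fraction(1))
    return dict(probs)


def summarise(n):
    """Map each distinct probability to the number of labelled topologies."""
    return Counter(topology_probabilities(n).values())


if __name__ == "__main__":
    for n in (3, 4, 5):
        for prob, count in sorted(summarise(n).items(), reverse=True):
            print(f"n={n}: {count} labelled topologies with probability {prob}")
```

For n = 4 the enumeration yields 3 topologies with probability 1/9 and 12 with probability 1/18; for n = 5 it yields 30 with 1/60, 15 with 1/90 and 60 with 1/180. Exact fractions are used so that the values can be compared directly with those on the slides.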
The idea of the Triple Consensus Method (TCM)

  Taxon                      L3   L11  L12  L13A L15  L23  L27  L27a L32  L35
  Anurida maritima           X    X    X    X    X    X    X    X    X    X
  Archispirostreptus gigas   X    X    X    X    X    X    X    X    X    X
  Astrosclera willeyana      X    X    X    X    X    X    X    X    X    X
  Barentsia elongata         X    X    X    X    X    X    X    X    X    X
  Flustra foliacea           X    X    X    X    X    X    X    X    X    X
  Littorina saxatilis        X    X    X    X    X    X    X    X    X    X
  Lubomirskia baicalensis    X    X    X    X    X    X    X    X    X    X
  Psoroptes ovis             X    X    X    X    X    X    X    X    X    X
  Sipunculus nudus           X    X    X    X    X    X    X    X    X    X
  Xenoturbella bocki         X    X    X    X    X    X    X    X    X    X

Simulation
[Figure: simulation scheme with the steps "Generate sequence tree" and "Generate Consensus Tree"; panels (a) and (b).]

Simulation Results (perfect sequence trees)
20 taxa tree, 1000 replications per parameter combination

Simulation Results (reconstructed sequence trees (ML))
[Figure legend: ML sequence tree; true sequence tree.]
20 taxa tree, 1000 replications per parameter combination

Application to biological data
[Figure: tree of the following taxa: Anopheles g., C. briggsae, Homo, Ciona s., C. elegans, Apis m., Pan t., Macaca m., Bos t., Canis f., Saccharomyces s., Drosophila m., Monodelphis d., Mus m., Rattus n.]
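The rooted-triple idea behind the TCM can be illustrated with a small sketch. The slides do not spell out the method, so the code below is an assumption-laden illustration of one step only: for every set of three taxa, it tallies which pair is grouped together across a collection of rooted sequence trees and reports the majority. Trees are given as nested tuples of taxon names, all function names are hypothetical, and assembling the winning triples into a full species tree is not shown.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the set of leaf names of a nested-tuple rooted tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def rooted_triple(tree, triple):
    """Return the pair of taxa grouped together in the induced rooted triple.

    Returns None if the tree lacks one of the taxa or leaves the triple
    unresolved (possible only in non-binary trees).
    """
    triple = frozenset(triple)
    if not triple <= leaves(tree):
        return None
    node = tree
    while True:
        parts = [leaves(child) & triple for child in node]
        sizes = [len(part) for part in parts]
        if 3 in sizes:
            node = node[sizes.index(3)]    # all three taxa lie below one child
        elif 2 in sizes:
            return parts[sizes.index(2)]   # the split is 2 versus 1
        else:
            return None


def majority_triples(trees):
    """For each taxon triple: (most frequent pair, its votes, total votes).

    Ties are broken by first occurrence, a simplification of this sketch.
    """
    taxa = sorted(set().union(*(leaves(t) for t in trees)))
    result = {}
    for triple in combinations(taxa, 3):
        votes = Counter(
            pair
            for pair in (rooted_triple(t, triple) for t in trees)
            if pair is not None
        )
        if votes:
            pair, count = votes.most_common(1)[0]
            result[triple] = (tuple(sorted(pair)), count, sum(votes.values()))
    return result


if __name__ == "__main__":
    # Toy input: H, C, G with an outgroup O (made-up trees, not real data).
    trees = (
        [((("H", "C"), "G"), "O")] * 3
        + [((("H", "G"), "C"), "O")]
        + [((("C", "G"), "H"), "O")]
    )
    for triple, summary in majority_triples(trees).items():
        print(triple, summary)
```

Trees that lack one of the three taxa simply do not vote for that triple, which is one way to cope with data sets in which not every taxon is present for every locus.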