ASTRAL and ASTRAL-PRO: Species tree estimation from gene family trees, addressing gene duplication and loss
Siavash Mirarab, Electrical and Computer Engineering, University of California, San Diego
Gene tree discordance
[Figure: the species tree for Gorilla, Human, Chimp, and Orangutan, shown next to gene trees labelled gene 1 and gene 1000]
Causes of gene tree discordance include:
• Duplication and loss
• Horizontal Gene Transfer (HGT) and Hybridization
• Incomplete Lineage Sorting (ILS)
Incomplete Lineage Sorting (ILS)
• Can occur when multiple alleles of a gene persist (fail to coalesce) during the lifetime of an ancestral population
• Always possible, and more likely for:
  • Short branches (measured in number of generations)
  • Large population sizes
“Gene” here simply refers to a recombination-free locus in the genome
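As a rough numerical illustration of why short branches and large populations make ILS more likely, the sketch below assumes the standard two-lineage coalescent approximation for a diploid population of constant size, under which the chance that two lineages fail to coalesce within t generations is about exp(-t / (2 Ne)). The function name and the example values are hypothetical and are not taken from the slides.

```python
import math

def prob_no_coalescence(generations, effective_pop_size):
    """Approximate chance that two lineages fail to coalesce on a branch.

    Assumes a diploid population of constant effective size (an assumption
    of this sketch), so the branch length in coalescent units is
    generations / (2 * Ne).
    """
    coalescent_units = generations / (2.0 * effective_pop_size)
    return math.exp(-coalescent_units)

# Hypothetical values: same population size, short versus long branch.
short_branch = prob_no_coalescence(generations=2_000, effective_pop_size=10_000)
long_branch = prob_no_coalescence(generations=200_000, effective_pop_size=10_000)
print(short_branch > long_branch)  # shorter branch leaves more room for ILS
```

Holding the number of generations fixed and increasing the population size has the same effect as shortening the branch, since only the ratio enters the formula.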
Gene duplication and loss (GDL)
• Duplication: a gene copies itself to a new position, creating a new locus
• Loss: an existing locus is removed from the genome on some lineage
• Paralogs: genes that diverged in a duplication event, e.g., 2A and 3B
• Orthologs: genes that diverged in a speciation event, e.g., 1B and 3B
• A gene tree with paralogous genes may differ from the species tree
  • We strive to find orthologous genes, but we may fail
[Picture from evolution-textbook.org (Fig. 5-20), redrawn from Eisen, Genome Research, 1998]
[Figure: generative pipeline. A species tree (Gorilla, Human, Chimp, Orangutan) gives rise to gene trees under a gene evolution model; each gene tree gives rise to sequence data (alignments) under a sequence evolution model]
Gene tree evolution models
• Multi-species coalescent (MSC) model
  • A statistical gene tree evolution model for ILS [Pamilo and Nei, 1988] [Rannala and Yang, 2003]
  • Does not model recombination within a gene
  • In theory, we can infer the species tree given a large, randomly distributed sample of recombination-free, reticulation-free, error-free orthologous gene trees
• Gene duplication and loss models
  • Typically modeled using birth-death processes, which require a birth rate and a death rate (often fixed)
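To make the birth-death idea concrete, here is a minimal sketch that simulates the number of copies of a gene along a single branch, treating duplication as birth and loss as death with fixed per-copy rates. It is a generic Gillespie-style simulation written for illustration, not the procedure of any particular tool; the function name, parameters, and example rates are assumptions.

```python
import random

def simulate_copy_number(dup_rate, loss_rate, branch_length, start_copies=1, rng=None):
    """Simulate gene copy number at the end of a branch (linear birth-death).

    Each existing copy duplicates at rate dup_rate and is lost at rate
    loss_rate, independently of the others (assumption of this sketch).
    """
    rng = rng or random.Random()
    copies, elapsed = start_copies, 0.0
    per_copy_rate = dup_rate + loss_rate
    while copies > 0 and per_copy_rate > 0:
        # Waiting time to the next event anywhere among the current copies.
        elapsed += rng.expovariate(copies * per_copy_rate)
        if elapsed > branch_length:
            break
        # Decide whether the event is a duplication or a loss.
        if rng.random() < dup_rate / per_copy_rate:
            copies += 1
        else:
            copies -= 1
    return copies

# Hypothetical rates: equal duplication and loss, one starting copy.
rng = random.Random(0)
samples = [simulate_copy_number(1.0, 1.0, 1.0, rng=rng) for _ in range(20_000)]
mean_copies = sum(samples) / len(samples)
print(round(mean_copies, 1))
```

With equal duplication and loss rates the expected copy number stays at its starting value, although individual gene families drift to zero copies or grow, which is what produces multi-copy gene family trees.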
[Figure: the same pipeline read in reverse as a two-step inference. Step 1: infer gene trees from the sequence data (traditional methods). Step 2: infer species trees from the gene trees]
Challenge 1: Inferring the species tree under MSC for large datasets
Unrooted quartets under MSC model
For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)
[Figure: a four-species tree on Chimp, Human, Gorilla, and Orang. with internal branch length d = 0.8, and the three unrooted quartet topologies with probabilities θ1 = 70%, θ2 = 15%, θ3 = 15%]
The most frequent gene tree = The most likely species tree
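The percentages in the figure are consistent with the known MSC result for a quartet whose internal branch has length d in coalescent units: the topology matching the species tree has probability 1 - (2/3)e^(-d), and each of the two alternatives has probability (1/3)e^(-d). A small sketch (the function name is hypothetical) reproduces the numbers for d = 0.8:

```python
import math

def quartet_probabilities(d):
    """Return (matching, alternative 1, alternative 2) quartet probabilities.

    d is the internal branch length of the species quartet in coalescent units.
    """
    alternative = math.exp(-d) / 3.0
    return 1.0 - 2.0 * alternative, alternative, alternative

theta = quartet_probabilities(0.8)
print([round(p, 2) for p in theta])  # [0.7, 0.15, 0.15]
```

As d shrinks toward zero, all three probabilities approach 1/3, so more gene trees are needed to tell the dominant topology apart from the alternatives.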
More than 4 species
For 5 or more species, the unrooted species tree topology can differ from the most probable gene tree (the “anomaly zone”) (Degnan, 2013)
[Figure: a five-species tree on Gorilla, Human, Chimp, Orang., and Rhesus]
1. Break gene trees into (n choose 4) quartets of species
2. Find the dominant tree for all quartets of taxa
3. Combine quartet trees
Some tools follow these steps (e.g., BUCKy-p [Larget, et al., 2010])
Alternative: weight all 3 × (n choose 4) quartet topologies by their frequency and find the optimal tree
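Steps 1 and 2 above, and the frequency weights used by the alternative, can be sketched as a tally of quartet topologies across gene trees. This is a brute-force illustration under stated assumptions: trees are written as nested tuples of leaf names (rooted arbitrarily, read as unrooted), all names are hypothetical, and step 3 (combining quartets) is not attempted.

```python
from collections import Counter
from itertools import combinations

def clades(tree):
    """Return (all leaves, list of leaf sets, one per subtree) for a nested-tuple tree."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found

def quartet_topology(tree, quartet):
    """Return the split {{a, b}, {c, d}} that the tree induces on four taxa."""
    leaves, found = clades(tree)
    q = frozenset(quartet)
    if not q <= leaves:
        return None  # the tree is missing at least one of the four taxa
    for clade in found:
        side = clade & q
        if len(side) == 2:  # an edge separating two of the taxa from the other two
            return frozenset([side, q - side])
    return None  # unresolved (polytomy) for this quartet

def quartet_frequencies(gene_trees, taxa):
    """Map each 4-taxon subset to a Counter of topologies seen in the gene trees."""
    table = {}
    for quartet in combinations(sorted(taxa), 4):
        counts = Counter()
        for tree in gene_trees:
            topology = quartet_topology(tree, quartet)
            if topology is not None:
                counts[topology] += 1
        table[quartet] = counts
    return table

# Toy input: two gene trees agree, one disagrees.
gene_trees = [
    ((("Human", "Chimp"), "Gorilla"), "Orang"),
    ((("Human", "Chimp"), "Gorilla"), "Orang"),
    ((("Gorilla", "Chimp"), "Human"), "Orang"),
]
table = quartet_frequencies(gene_trees, ["Human", "Chimp", "Gorilla", "Orang"])
dominant, count = table[("Chimp", "Gorilla", "Human", "Orang")].most_common(1)[0]
print(sorted(sorted(side) for side in dominant), count)
```

Keeping the full Counter for each quartet, rather than only the dominant topology, is what the weighted alternative needs.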
Maximum Quartet Support Species Tree
$$\mathrm{Score}(T) = \sum_{i=1}^{k} |Q(T) \cap Q(t_i)|$$
where Q(T) is the set of quartet trees induced by T, and t_i is a gene tree (i = 1, ..., k)
• Optimization problem (NP-Hard):
  Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees
• Statistically consistent under the multi-species coalescent model
• ASTRAL: solved efficiently using dynamic programming
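The score itself can be evaluated by brute force on small inputs, which helps to check one's understanding of the definition. The sketch below enumerates every 4-taxon subset, so it is far slower than ASTRAL's dynamic programming and is not how ASTRAL computes the score; trees are nested tuples of leaf names and all function names are hypothetical.

```python
from itertools import combinations

def subtree_leaf_sets(tree, out):
    """Append the leaf set of every subtree to `out`; return this subtree's leaf set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(subtree_leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves

def induced_quartets(tree):
    """Q(tree): the set of resolved quartet topologies induced by the tree."""
    found = []
    leaves = subtree_leaf_sets(tree, found)
    quartets = set()
    for four in combinations(sorted(leaves), 4):
        q = frozenset(four)
        for clade in found:
            side = clade & q
            if len(side) == 2:
                quartets.add(frozenset([side, q - side]))
                break
    return quartets

def quartet_score(species_tree, gene_trees):
    """Score(T): sum over gene trees of |Q(T) ∩ Q(t_i)|."""
    q_species = induced_quartets(species_tree)
    return sum(len(q_species & induced_quartets(g)) for g in gene_trees)

species_tree = ((("Human", "Chimp"), "Gorilla"), ("Orang", "Rhesus"))
gene_trees = [
    ((("Human", "Chimp"), "Gorilla"), ("Orang", "Rhesus")),  # shares all 5 quartets
    ((("Human", "Gorilla"), "Chimp"), ("Orang", "Rhesus")),  # shares 3 of 5
]
print(quartet_score(species_tree, gene_trees))  # 8
```

The optimization problem on the slide asks for the species tree T that maximizes this quantity over all candidate trees.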
ASTRAL versions
• ASTRAL-I (…)
  • This makes it fast, but it remains statistically consistent
• ASTRAL-II (…)
  • Improved the accuracy at the expense of running time
  • Can handle polytomies in input gene trees
• ASTRAL-III (>v. 5.1.1) changed the search space again for a better trade-off between running time and accuracy
  • Improved running time for unresolved trees; makes it feasible to remove very low support branches from gene trees
• ASTRAL-MP: enables multi-threading with CPU and GPU
ASTRAL used widely
Early use:
• Plants: Wickett, et al., 2014, PNAS
• Birds: Prum, et al., 2015, Nature
• Xenoturbella: Cannon et al., 2016, Nature
• Xenoturbella: Rouse et al., 2016, Nature
• Flatworms: Laumer, et al., 2015, eLife
• Shrews: Giarla, et al., 2015, Syst. Bio.
• Frogs: Yuan et al., 2016, Syst. Bio.
• Tomatoes: Pease, et al., 2016, PLoS Bio.
• Angiosperms: Huang et al., 2016, MBE
• Worms: Andrade, et al., 2015, MBE
[Side labels on the slide: ASTRAL, ASTRAL-II, ASTRAL-III]
Comparison to concatenation: depends on the level of discordance
[Figure: species tree topological error (FN) versus gene tree discordance (low, medium, high) for ASTRAL-II and CA-ML, in panels for 1000, 200, and 50 gene trees. Simulations, 200 species, deep ILS [Mirarab and Warnow, 2016]]
Some ASTRAL-related papers
• ASTRAL performs well under ILS and HGT (Davidson, Vachaspati, Mirarab, and Warnow). BMC Genomics (2015)
• It can handle multiple alleles per species (Rabiee, Sayyari, and Mirarab). MPE (2019)
• It computes branch length and branch support and can test for polytomies using quartet frequencies (Sayyari and Mirarab). MBE (2016) and Genes (2018)
• Visualizing quartet discordance using DiscoVista (Sayyari, Whitfield, and Mirarab). MPE (2018)
• Fragmentary sequences can negatively impact ASTRAL trees (Sayyari, Whitfield, and Mirarab). MBE (2017)
• Filtering loci is not beneficial! (Molloy and Warnow). Systematic Biology (2018)
• ASTRAL is consistent under models of missing data (Nute, Molloy, Chou, and Warnow). BMC Genomics (2018)
• How many genes does ASTRAL need? (Shekhar, Roch, and Mirarab). Transactions on Computational Biology and Bioinformatics (2017)
• Using ASTRAL as a supertree method (Vachaspati and Warnow). Bioinformatics (2017)
[Figure: the two-step inference pipeline again. Step 1: infer gene trees (traditional methods). Step 2: infer species trees]
Challenge 2: Modeling causes of discordance other than ILS
Gene family trees
• Multi-labelled: each species can appear multiple times
• Each instance is a copy belonging to a different locus
• Nodes can be tagged as either speciation or duplication
• The “tags” identify orthology and are usually unknown
[Figure: a gene family tree with nodes marked as speciation or duplication]
Extending quartet score to gene family trees
[Figure: trees on species a, b, c, d, e, with two nodes marked D (duplication)]
What is the quartet score of this species tree with respect to this gene tree?
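To illustrate what speciation/duplication tags on a multi-labelled tree look like, the sketch below applies a simple species-overlap rule to an already rooted binary gene family tree: a node is tagged as a duplication when the species below its two children overlap, and as a speciation otherwise. This rule is an assumption chosen for illustration; the slides do not spell out how tags are inferred, and the function names and the example tree are hypothetical.

```python
def tag_nodes(tree):
    """Tag internal nodes of a rooted binary multi-labelled tree.

    Leaves are species names (a species may appear more than once).
    Returns (set of species below the node, tagged tree), where each
    internal node becomes (tag, left, right) with tag "D" or "S".
    """
    if isinstance(tree, str):
        return {tree}, tree
    left, right = tree
    left_species, left_tagged = tag_nodes(left)
    right_species, right_tagged = tag_nodes(right)
    # Species-overlap rule (assumption): shared species imply a duplication.
    tag = "D" if left_species & right_species else "S"
    return left_species | right_species, (tag, left_tagged, right_tagged)

def count_tags(tagged, tag):
    """Count internal nodes carrying the given tag."""
    if isinstance(tagged, str):
        return 0
    return (tagged[0] == tag) + count_tags(tagged[1], tag) + count_tags(tagged[2], tag)

# Hypothetical gene family tree in which species a and b each have two copies.
gene_family_tree = ((("a", "b"), "c"), (("a", "b"), ("d", "e")))
species, tagged = tag_nodes(gene_family_tree)
print(tagged[0], count_tags(tagged, "D"), count_tags(tagged, "S"))  # D 1 5
```

The result depends on where the tree is rooted, which is why rooting and tagging have to be handled together when the input gene family trees are unrooted.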
Per-locus (PL) quartet score
• A measure of similarity between a singly-labelled species tree and a rooted, tagged (speciation or duplication), multi-labelled gene family tree
A. Speciation-Driven (SD) quartets: those quartets whose topology is determined by a speciation event
B. Equivalent SD quartets: two quartets whose tree topology could not have been different given the speciation/duplication tags (we have a mathematical way to find these)
C. Per-locus quartet similarity: the number of SD quartets in the gene family tree that match the species tree, counting all equivalent quartets only once
Why Per-Locus (PL)?
• It counts loci as they existed at the time of speciation events
• Subsequent dup/loss events do not “inflate” similarity
• Meant to preserve MSC distributions as much as possible
• Theorem: Given correctly rooted/tagged gene trees, maximizing the PL quartet score is statistically consistent under the GDL model
• With no dup/loss, the PL score is just the normal quartet score (so it is also consistent under the MSC model)
ASTRAL-Pro
Input: A set of (unrooted) gene family trees
Output: An unrooted species tree
1. Root and tag input gene family trees using a parsimony approach
2. Use dynamic programming similar to ASTRAL to find the tree that maximizes the PL quartet score
But does it work?
Simulations
• Under DLCoal (both ILS and GDL) using SimPhy
• 1000 genes, 25+1 species
• Vary: Dup rate, Loss/Dup rate, ILS level
• Default: very high ILS (70% RF distance)
• True and estimated gene trees
  • 100bp and 500bp gene alignments with moderate (36%) and low (15%) gene tree error
• Compare to: MulRF [Chaudhary, 2013], DupTree [Wehe, 2008], ASTRAL-multi [Rabiee, 2019] used for duploss [Legried, 2019]
[Figure: species tree error (NRF) versus ILS level (RF%: 0, 20, 50, 70) for MulRF, DupTree, A-Pro, and ASTRAL-multi; panels by duplication rate (Dup: 0, 1, 5) and input gene trees (true, Est. 100bp, Est. 500bp)]
Changing Dup and Loss rates
• Very high ILS (70% RF distance)
[Figure: species tree error (NRF) by gene tree type (true, Est. 100bp, Est. 500bp) for MulRF, DupTree, and A-Pro; panels by duplication rate (Dup) and Loss/Dup ratio (0, 0.1, 0.5, 1)]
Changing ILS levels
• Loss/Dup rate = 1
[Figure: species tree error (NRF) by gene tree type (true, Est. 100bp, Est. 500bp) for MulRF, DupTree, A-Pro, and ASTRAL-multi; panels by ILS level (ILS: 0, 20, 50, 70) and duplication rate (Dup: 5, 1, 0)]
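The plots report species tree error as NRF, a normalized Robinson-Foulds distance. The slides do not define the normalization, so the sketch below assumes the common one: the number of non-trivial bipartitions found in exactly one of the two trees, divided by the total number of non-trivial bipartitions in both. Trees are nested tuples of leaf names, and the function names are hypothetical.

```python
def bipartitions(tree):
    """Non-trivial bipartitions of a nested-tuple tree, read as unrooted."""
    found = []

    def leaf_set(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(leaf_set(child) for child in node))
        found.append(leaves)
        return leaves

    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # fix one leaf so each split has a unique form
    splits = set()
    for clade in found:
        # Always store the side that does not contain the reference leaf.
        side = all_leaves - clade if reference in clade else clade
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
    return splits

def normalized_rf(tree_a, tree_b):
    """Share of non-trivial bipartitions not common to both trees (assumed NRF)."""
    splits_a, splits_b = bipartitions(tree_a), bipartitions(tree_b)
    total = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / total if total else 0.0

true_tree = ((("Human", "Chimp"), "Gorilla"), ("Orang", "Rhesus"))
estimated = ((("Human", "Gorilla"), "Chimp"), ("Orang", "Rhesus"))
print(normalized_rf(true_tree, estimated))  # 0.5
```

For two fully resolved trees on the same n species the denominator equals 2(n - 3), so an NRF of 5% on 26 species corresponds to roughly one wrong branch.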
29 More species? Remains accurate
• 70% ILS
• Dup = 5
• Dup/Loss = 1
• 1000 genes
[Figure: species tree error (NRF) versus running time (minutes) for n = 10, 25, 100, 250, and 500 species; gene trees: 100 bp, 500 bp, true gt]
30 More genes? Increases accuracy
• 70% ILS
• Dup = 5
• Dup/Loss = 1
• 25 species
[Figure: species tree error (NRF) versus running time (minutes) for k = 25, 100, 250, 1,000, 2,500, and 10,000 genes; gene trees: 100 bp, 500 bp, true gt]
31 Summary
• MulRF: resilient to gene tree error, but very sensitive to ILS
• DupTree: resilient to ILS, but very sensitive to gene tree error
• ASTRAL-multi: designed for multiple alleles. Consistent in theory, but not accurate empirically
• A-Pro: relatively robust to ILS, gene duplication and loss (GDL), and gene tree estimation error
32 Biological dataset (plants)
• The original publication used only 424 single-copy gene trees for 103 species
• We had also estimated 9683 multi-copy gene trees for a subset of 83 species
• We never found a good way to use these in the original study!
• DupTree on this dataset produced a tree that could not be believed (many obviously wrong branches)
33 Biological dataset (plants)
[Figure: two species trees side by side: ASTRAL-Pro (all genes) and ASTRAL (single copy) [Wickett, et al., 2014, PNAS]. Labeled groups: Eudicots, Magnoliids, Chloranthales, Monocots, Angiosperms, ANA Grade, Gymnosperms, Monilophytes, Lycophytes, Embryophytes (Land Plants), Mosses, Liverworts, Hornworts, Zygnematophyceae, Coleochaetales]
Callouts on the figure:
• Breaking the order Lamiales
• Consistent with latest onekp [Nature, 2019]
• Recovered both ways in the literature (A-Pro similar to concat)
34 A-Pro is available
• On bioRxiv: Zhang, Chao, Celine Scornavacca, Erin Molloy, and Siavash Mirarab. “ASTRAL-Pro: Quartet-Based Species Tree Inference despite Paralogy.” bioRxiv, 2019, 2019.12.12.874727. doi:10.1101/2019.12.12.874727.
• On GitHub: https://github.com/chaoszhang/A-pro
• More soon …
35 Acknowledgments
ASTRAL-Pro: Chao Zhang, Celine Scornavacca, Erin K. Molloy
ASTRAL: Tandy Warnow, Maryam Hashemi, John Yin, Erfan Sayyari, Chao Zhang