<<

ASTRAL and ASTRAL-PRO: Species tree estimation from gene family trees, addressing gene duplication and loss

Siavash Mirarab Electrical and Computer Engineering University of California, San Diego

1 Gene tree discordance

gene 1 gene1000

Gorilla Human Chimp Orang. Gorilla Chimp Human Orang.

2 Gene tree discordance

The species tree

Gorilla Human Chimp Orangutan gene 1 gene1000

Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. A gene tree

2 Gene tree discordance

The species tree

Gorilla Human Chimp Orangutan gene 1 gene1000

Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. A gene tree

Causes of gene tree discordance include: • Duplication and loss • Horizontal Gene Transfer (HGT) and Hybridization • Incomplete Lineage Sorting (ILS)

2 Gene tree discordance

The species tree

Gorilla Human Chimp Orangutan gene 1 gene1000

Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. A gene tree

Causes of gene tree discordance include: • Duplication and loss • Horizontal Gene Transfer (HGT) and Hybridization • Incomplete Lineage Sorting (ILS)

3 Gene tree discordance

The species tree

Gorilla Human Chimp Orangutan gene 1 gene1000

Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. A gene tree

Causes of gene tree discordance include: • Duplication and loss • Horizontal Gene Transfer (HGT) and Hybridization • Incomplete Lineage Sorting (ILS)

4 Incomplete Lineage Sorting (ILS)

• Can occur when multiple alleles of a gene persist (fail to coalesce) during the lifetime of an ancestral population

“gene” here simply refers to a recombination-free locus in the genome

5 Incomplete Lineage Sorting (ILS)

• Can occur when multiple alleles of a gene persist (fail to coalesce) during the lifetime of an ancestral population

“gene” here simply refers to a recombination-free locus in the genome

5 Incomplete Lineage Sorting (ILS)

• Can occur when multiple alleles of a gene persist (fail to coalesce) during the lifetime of an ancestral population

• Always possible. Likely for:

• Short branches (# generations)

• High population sizes

“gene” here simply refers to a recombination-free locus in the genome

5 Gene duplication and loss (GDL)

• Duplication: A gene copies itself to a new position, creating a new locus. Loss: An existing locus is removed from the genome on some lineage

picture from evolution-textbook.org (Fig 5-20), redrawn6 from Eisen, Genome Research, 1998 Gene duplication and loss (GDL)

• Duplication: A gene copies itself to a new position, creating a new locus. Loss: An existing locus is removed from the genome on some lineage • Paralogs: genes that diverged in a duplication event; e.g: 2A and 3B Orthologs: genes that diverged in a speciation event; e.g: 1B and 3B

picture from evolution-textbook.org (Fig 5-20), redrawn6 from Eisen, Genome Research, 1998 Gene duplication and loss (GDL)

• Duplication: A gene copies itself to a new position, creating a new locus. Loss: An existing locus is removed from the genome on some lineage • Paralogs: genes that diverged in a duplication event; e.g: 2A and 3B Orthologs: genes that diverged in a speciation event; e.g: 1B and 3B • A gene tree with paralogous genes may differ from the species tree – We strive for finding orthologous genes, but we may fail

picture from evolution-textbook.org (Fig 5-20), redrawn6 from Eisen, Genome Research, 1998 Species tree

Gorilla Human Chimp Orangutan

Gene evolution model

Gene tree Gene tree Gene tree Gene tree

Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Sequence evolution model Sequence data Sequence data (Alignments) (Alignments) ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA -CTGCACACGG CTGA-CAC-G 7 C-TA-CACGGTG AGCTAC-CACGGAT Species tree

Gorilla Human Chimp Orangutan

Gene evolution model

Gene tree Gene tree Gene tree Gene tree

Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Sequence evolution model Sequence data Sequence data (Alignments) (Alignments) ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA -CTGCACACGG CTGA-CAC-G 8 C-TA-CACGGTG AGCTAC-CACGGAT Gene tree evolution models

• Multi-species coalescent (MSC) model

• A statistical gene tree evolution model for ILS [Pamilo and Nei, 1988] [Rannala and Yang, 2003]

• Does not model recombination within a gene

• In theory, we can infer the species tree given a large randomly distributed sample of recombination-free, reticulation-free, error-free orthologous gene trees

9 Gene tree evolution models

• Multi-species coalescent (MSC) model

• A statistical gene tree evolution model for ILS [Pamilo and Nei, 1988] [Rannala and Yang, 2003]

• Does not model recombination within a gene

• In theory, we can infer the species tree given a large randomly distributed sample of recombination-free, reticulation-free, error-free orthologous gene trees

• Gene duplication and loss models:

• Typically modeled using birth death processes, requiring a rate of birth and death (often fixed)

9 Gorilla Human Chimp Orangutan

Gene evolution model

Gene tree Gene tree Gene tree Gene tree

Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Sequence evolution model

ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA -CTGCACACGG CTGA-CAC-G 10 C-TA-CACGGTG AGCTAC-CACGGAT Gorilla Human Chimp Orangutan

Gene evolution model

Gene tree Gene tree Gene tree Gene tree

Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Step 1: inferSequence gene trees evolution (traditional model methods)

ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA -CTGCACACGG CTGA-CAC-G 10 C-TA-CACGGTG AGCTAC-CACGGAT Gorilla Human Chimp Orangutan StepGene 2: infer evolution species model trees Gene tree Gene tree Gene tree Gene tree

Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Step 1: inferSequence gene trees evolution (traditional model methods)

ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA -CTGCACACGG CTGA-CAC-G 10 C-TA-CACGGTG AGCTAC-CACGGAT Gorilla Human Chimp Orangutan StepGene 2:Challenge infer evolution species model 1: trees InferringGene tree the Genespecies tree treeGene under tree MSCGene tree for large datasets Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Step 1: inferSequence gene trees evolution (traditional model methods)

ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA -CTGCACACGG CTGA-CAC-G 10 C-TA-CACGGTG AGCTAC-CACGGAT Unrooted quartets under MSC model

For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)

Chimp Gorilla θ1=70% θ2=15% θ3=15% d=0.8 Chimp Gorilla Orang. Chimp Gorilla Chimp

Human Orang. Human Orang. Human Gorilla Human Orang.

11 Unrooted quartets under MSC model

For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)

Chimp Gorilla θ1=70% θ2=15% θ3=15% d=0.8 Chimp Gorilla Orang. Chimp Gorilla Chimp

Human Orang. Human Orang. Human Gorilla Human Orang.

The most frequent gene tree = The most likely species tree

11 More than 4 species

For 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)

Gorilla Human Chimp Orang. Rhesus

12 More than 4 species

For 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)

Gorilla Human Chimp Orang. Rhesus

n 1. Break gene trees into quartets of species (4) 2. Find the dominant tree for all quartets of taxa 3. Combine quartet trees

Some tools (e.g.. BUCKy-p [Larget, et al., 2010])

12 More than 4 species

For 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)

Gorilla Human Chimp Orang. Rhesus

n 1. Break gene treesAlternative into quartets: of species n (4) weight all 3 quartet topologies 2. Find the dominant(4) tree for all quartets of taxa 3. Combine quartetby their trees frequency and find the optimal tree Some tools (e.g.. BUCKy-p [Larget, et al., 2010])

12 Maximum Quartet Support Species Tree

k the set of quartet trees induced by T Score(T) = |Q(T) ∩ Q(ti)| ∑ a gene tree 1

• Optimization problem (NP-Hard):

Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees

13 Maximum Quartet Support Species Tree

k the set of quartet trees induced by T Score(T) = |Q(T) ∩ Q(ti)| ∑ a gene tree 1

• Optimization problem (NP-Hard):

Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees

• Statistically consistent under the multi-species coalescent model

13 Maximum Quartet Support Species Tree

k the set of quartet trees induced by T Score(T) = |Q(T) ∩ Q(ti)| ∑ a gene tree 1

• Optimization problem (NP-Hard):

Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees

• Statistically consistent under the multi-species coalescent model

• ASTRAL: solved efficiently using dynamic programming

13 ASTRAL versions

• ASTRAL-I (

• This make it fast but it remains statistically consistent

14 ASTRAL versions

• ASTRAL-I (

• This make it fast but it remains statistically consistent

• ASTRAL-II (

• Improved the accuracy at the expense of running time

• Can handle polytomies in input gene trees

14 ASTRAL versions

• ASTRAL-I (

• This make it fast but it remains statistically consistent

• ASTRAL-II (

• Improved the accuracy at the expense of running time

• Can handle polytomies in input gene trees

• ASTRAL-III (>v. 5.1.1) changed the search space again for a better running time versus accuracy trade-off

• Improved running time for unresolved trees; makes it feasible to remove very low support branches from gene trees.

14 ASTRAL versions

• ASTRAL-I (

• This make it fast but it remains statistically consistent

• ASTRAL-II (

• Improved the accuracy at the expense of running time

• Can handle polytomies in input gene trees

• ASTRAL-III (>v. 5.1.1) changed the search space again for a better running time versus accuracy trade-off

• Improved running time for unresolved trees; makes it feasible to remove very low support branches from gene trees.

• ASTRAL-MP: enables multi-threading with CPU and GPU

14 ASTRAL used widely

Early use:

: Wickett, et al., 2014, PNAS

• Birds: Prum, et al., 2015, Nature ASTRAL

• Xenoturbella, Cannon et al., 2016, Nature

• Xenoturbella, Rouse et al., 2016, Nature

• Flatworms: Laumer, et al., 2015, eLife

• Shrews: Giarla, et al., 2015, Syst. Bio. ASTRAL-II • Frogs: Yuan et al., 2016, Syst. Bio.

• Tomatoes: Pease, et al., 2016, PLoS Bio.

• Angiosperms: Huang et al., 2016, MBE

• Worms: Andrade, et al., 2015, MBE ASTRAL-III

15 Comparison to concatenation: depends on the level of discordance

1000 1000gene trees 200 gene200 trees 50 gene50 trees 30% ●

20% ●

● 10% ● ● ● ● ● ● ● ● ● ● ● ● Species tree topological error (FN)

low medium high low medium high low medium high gene tree discordance Simulations, 200 species, ● ASTRAL−II ● CA−ML deep ILS [Mirarab and

16 Warnow, 2016] Comparison to concatenation: depends on the level of discordance

1000 1000gene trees 200 gene200 trees 50 gene50 trees 30% ●

20% ●

● 10% ● ● ● ● ● ● ● ● ● ● ● ● Species tree topological error (FN)

low medium high low medium high low medium high gene tree discordance Simulations, 200 species, ● ASTRAL−II ● CA−ML deep ILS [Mirarab and

16 Warnow, 2016] Some ASTRAL-related papers

• ASTRAL performs well under ILS and HGT (Davidson, Vachaspati, Mirarab, and Warnow). BMC Genomics (2015).

• It can handle multiple alleles per species (Rabiee, Sayyari, and Mirarab). MPE (2019)

• It computes branch length and branch support and can test for polytomies using quartet frequencies (Sayyari and Mirarab). MBE (2016) and Genes (2018)

• Visualizing quartet discordance using DiscoVista (Sayyari, Whitfield, and Mirarab) MPE (2018)

• Fragmentary sequences can negatively impact ASTRAL trees (Sayyari, Whitfield, and Mirarab). MBE (2017)

• Filtering loci is not beneficial! (Molloy and Warnow), Systematic Biology (2018)

• ASTRAL consistent under models of missing data (Nute, Molloy, Chou, and Warnow). BMC Genomics (2018)

• How many genes does ASTRAL need? (Shekhar, Roch, and Mirarab). Transactions on Computational Biology and Bioinformatics (2017)

• Using ASTRAL as a supertree method (Vachaspati and Warnow). Bioinformatics (2017)

17 Gorilla Human Chimp Orangutan StepGene 2: infer evolution species model trees Gene tree Gene tree Gene tree Gene tree

Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Step 1: inferSequence gene trees evolution (traditional model methods)

ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA

-CTGCACACGG CTGA-CAC-G 18 C-TA-CACGGTG AGCTAC-CACGGAT Gorilla Human Chimp Orangutan StepGene 2:Challenge infer evolution species model 2: trees ModelingGene tree causesGene tree of discordanceGene tree Geneother tree than ILS Orang. Human Orang. Chimp Orang. Orang. Chimp Gorilla Gorilla Human Chimp Human Gorilla Chimp Human

Step 1: inferSequence gene trees evolution (traditional model methods)

ACTGCACACCG CTGAGCATCG AGCAGCATCGTG CAGGCACGCACGAA ACTGC-CCCCG CTGAGC-TCG AGCAGC-TCGTG AGC-CACGC-CATA AATGC-CCCCG ATGAGC-TC- AGCAGC-TC-TG ATGGCACGC-C-TA

-CTGCACACGG CTGA-CAC-G 18 C-TA-CACGGTG AGCTAC-CACGGAT Gene tree discordance

The species tree

Gorilla Human Chimp Orangutan gene 1 gene1000

Gorilla Human Chimp Orang. Gorilla Chimp Human Orang. A gene tree

Causes of gene tree discordance include: • Duplication and loss • Horizontal Gene Transfer (HGT) and Hybridization • Incomplete Lineage Sorting (ILS)

19 Gene family trees

• Multi-labelled: each species can appear multiple times

• Each instance is a copy belonging to a different locus

20 Gene family trees

• Multi-labelled: each species can appear multiple times

• Each instance is a copy belonging to a different locus

• Nodes can be tagged as either speciation or duplication

• The “tags” identify orthology and are usually unknown

Speciation duplication

20 Extending quartet score to gene family trees

D

D

a b c d e

What is the quartet score of this species tree with respect to this gene tree?

21 Extending quartet score to gene family trees

D

D

a b c d e

What is the quartet score of this species tree with respect to this gene tree?

21 Extending quartet score to gene family trees

D

D

a b c d e

What is the quartet score D of this species tree with D respect to this gene tree?

21 Extending quartet score to gene family trees

D

D

a b c d e

What is the quartet score D of this species tree with D respect to this gene tree?

21 Extending quartet score to gene family trees

D

D

a b c d e

What is the quartet score D of this species tree with D respect to this gene tree?

21 Extending quartet score to gene family trees

D

D

a b c d e =

What is the quartet score D of this species tree with D respect to this gene tree?

22 Per-locus (PL) quartet score

• A measure of similarity between a singly-labelled species tree and a rooted tagged (speciation or duplication) multi-labelled gene family tree Per-locus (PL) quartet score

• A measure of similarity between a singly-labelled species tree and a rooted tagged (speciation or duplication) multi-labelled gene family tree

A. Speciation-Driven (SD) quartets: those quartets whose topology is determined by a speciation event Per-locus (PL) quartet score

• A measure of similarity between a singly-labelled species tree and a rooted tagged (speciation or duplication) multi-labelled gene family tree

A. Speciation-Driven (SD) quartets: those quartets whose topology is determined by a speciation event

B. Equivalent SD quartets: two quarts whose tree topology could not have been different given speciation/duplication tags (we have a mathematical way to find these) Per-locus (PL) quartet score

• A measure of similarity between a singly-labelled species tree and a rooted tagged (speciation or duplication) multi-labelled gene family tree

A. Speciation-Driven (SD) quartets: those quartets whose topology is determined by a speciation event

B. Equivalent SD quartets: two quarts whose tree topology could not have been different given speciation/duplication tags (we have a mathematical way to find these)

C. Per-locus quartet similarity: the number of SD quartets in the gene family that match the species tree, counting all equivalent quartets only once Why Per-Locus (PL)?

• It counts loci as they existed at the time of speciation events

• Subsequent dup/loss events do not “inflate” similarity

• Meant to preserve MSC distributions as much as possible Why Per-Locus (PL)?

• It counts loci as they existed at the time of speciation events

• Subsequent dup/loss events do not “inflate” similarity

• Meant to preserve MSC distributions as much as possible

• Theorem: Given correctly rooted/tagged gene trees, maximizing PL quartet score is statistically consistent under the GDL model Why Per-Locus (PL)?

• It counts loci as they existed at the time of speciation events

• Subsequent dup/loss events do not “inflate” similarity

• Meant to preserve MSC distributions as much as possible

• Theorem: Given correctly rooted/tagged gene trees, maximizing PL quartet score is statistically consistent under the GDL model

• With no dup/loss, PL score is just the normal quartet score (so also consistent under the MSC model) ASTRAL-Pro

Input: A set of (unrooted) gene family trees

Output: An unrooted species tree ASTRAL-Pro

Input: A set of (unrooted) gene family trees

Output: An unrooted species tree

1. Root and tag input gene family trees using a parsimony approach ASTRAL-Pro

Input: A set of (unrooted) gene family trees

Output: An unrooted species tree

1. Root and tag input gene family trees using a parsimony approach

2. Use a dynamic programming similar to ASTRAL to find the tree that maximizes the PL quartet score But does it work?

26 Dup: 0 Dup: 1 Dup: 5

● Input: Est. (100bp) 15% ●

● ● ● ● ● ● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5% ● ● ● ● ● ● ● ● ● ● ● ● ● 0% Input: Est. (500bp)

● 15%

● ● ● Simulations 10% ● ● ● ● ● ● ● 5% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% • Under DLCoal (both ILS and GDL) using Simphy Species tree error (NRF) error tree Species • 1000 genes, 25+1 species

15% ● Input: true • Vary: Dup rate, Loss/Dup rate, ILS level ● ● 10% ● • Default: very high ILS (70% RF distance)● ● ● ● ● ● 5% ● ● ● ● • True and estimated gene trees. ● ● ● ● ● ● ● — 100bp and 500bp gene● alignments● with ● ● ● ● ● 0%moderate● ● (36%) and low ●(15%)● gene tree error● ● 0 20 50 70 0 20 50 70 0 20 50 70 • Compare to: ILS level (RF%)

● MulRF● DupTree● A−Pro● ASTRAL−multi [Chaudhary, 2013] [Wehe, 2008] [Rabiee, 2019] used for duploss [Legried, 2019]

27 Dup: 0 Dup: 1 Dup: 5

● Input: Est. (100bp) 15% ●

● ● ● ● ● ● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5% ● ● ● ● ● ● ● ● ● ● ● ● ● 0% Input: Est. (500bp)

● 15%

● ● ● Simulations 10% ● dup rate ● ● ● ● ● ● 5% ● ● ● ● ● ● ● ● ● ● ● Dup: 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% 0.20 • Under DLCoal (both ILS and GDL) using Simphy ●

● Species tree error (NRF) error tree Species ● • 1000 genes, 25+1 species 0.15 Loss/Dup: 0

15% ● Input: true • Vary: Dup rate, Loss/Dup rate, ILS level ● 0.10● 10% ● ● • Default: very high ILS (70% RF distance)● ● ● ● ● 0.05 ● 5%• ● ● ● ● True and estimated gene trees. ● ● ● ●

Species tree error (NRF) ● ● ● ● ● — 100bp and 500bp gene● alignments● with ● ● ● ● ● ● ● 0%moderate● ● (36%) and low ●(15%)● gene tree error● ● 0.00 0 20 50 70 0 20 50 70 0 20 50 70 true • Compare to: Est. (100bp)Est. (500bp) ILS level (RF%)

● MulRF● DupTree● A−Pro● ASTRAL−multi loss rate [Chaudhary, 2013] [Wehe, 2008] [Rabiee, ●2019]MulRF ● DupTree dup● A−Pro used for duploss [Legried, 2019]

27 Dup: 5 Dup: 4 Dup: 3 Dup: 2 Dup: 1

0.20 ●

● Loss/Dup: 0 ● 0.15 ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00

● ● Loss/Dup: 0.1 0.15 ● ● ● ● ● Dup: 5 ● Dup: 4 Dup: 3 Dup: 2 ● Dup: 1

0.20 ● ● ● 0.10 ● ● ● ● Loss/Dup: 0 ● ● 0.15 ● ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ● ● ●● ● ● ● ● ● ● 0.05 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 ● ● ● ● ● ● ● ●

● ● Loss/Dup: 0.1 0.15 ● ● 0.00 ● ● ● ● ● ● 0.10 ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● Loss/Dup: 0.5 ● ● ● ● 0.15 ● 0.00 ● ● ● ● ● ● ● Loss/Dup: 0.5 0.15 ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● 0.10 ● ● 0.10 Species tree error (NRF) ● ● ● ● ● ● Species tree error (NRF) ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● Dup: 5 Dup: 4 Dup: 3 Dup: 2 Dup:Dup: 15 ● Dup: 4 Dup: 3 Dup: 2 Dup: 1 ● ● 0.20 ● 0.20 ● ● ● ● ● ●

● Loss/Dup: 0 ● ● Loss/Dup: 0 0.05 ● ● ● Loss/Dup: 1 ● ● ● ● ● ● 0.15 ● ● 0.15 Changing 0.15 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 ● ● 0.100.10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Dup and ● ● ● ● ● ● 0.05 0.050.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● 0.00 0.00 ● true true true true true ● Loss rates ● Loss/Dup: 0.1 ● Loss/Dup: 0.1 ● ● Est. (100bp)Est. (500bp)● Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp)● Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) 0.15 ● ● 0.15 ● ● ● ● ● ● Gene tree type● ● Loss/Dup: 1 ● ● ● 0.15 ● ● ● ● ● ● ● 0.10 ● ● ●0.10 ● ● ● ● ● ● ● ● ● ● ● ● ● MulRF ● DupTree ● A−Pro ● ● ● 0.05 ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 0.10 0.00

● ● Loss/Dup: 0.5 ● • ● Loss/Dup: 0.5 ● 0.15 ● very high ILS 0.15 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 (70% RF 0.10 0.05 ● ● ● ● ● ● Species tree error (NRF) Species tree error (NRF) ● ● ● ● ● ● ● ● ● ● ● ● ● ● distance)● ● ● ● 0.05 ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● Loss/Dup: 1 Loss/Dup: 1 ● ● ● ● ● ● 0.15 ● 0.15 ● ● ● true true ● true true true ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 Est. (100bp)Est. (500bp)● Est. (100bp)Est. (500bp)0.10 Est. (100bp)● Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Gene tree type true true true true truetrue true true true true Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est.Est. (100bp)(100bp)Est.Est. (500bp)(500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Gene tree type Gene tree type

● ● ● ● MulRF ● DupTree ● A−Pro MulRF ● MulRF ● DupTreeDupTree ● A−Pro A−Pro

28 Dup: 5 Dup: 4 Dup: 3 Dup: 2 Dup: 1

0.20 ●

● Loss/Dup: 0 ● 0.15 ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00

● ● Loss/Dup: 0.1 0.15 ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ●

● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00

● Loss/Dup: 0.5 0.15 ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ●

Species tree error (NRF) ● Dup: 5 Dup: 4 Dup: 3 Dup: 2 Dup:Dup: 15 Dup: 4 Dup: 3 Dup: 2 Dup: 1 ● 0.20 ● 0.20 ● ● ● ● ●

● Loss/Dup: 0 0.05● ● ● ● ● Loss/Dup: 0 ● 0.15 Changing● ● 0.15 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 ● ● 0.10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Dup and ● ● ● ● 0.05 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 0.00 ● ● Loss rates ● Loss/Dup: 0.1 ● Loss/Dup: 0.1 ● ● ● ● 0.15 ● ● 0.15 ● ● ● ● ● ● ● ● Loss/Dup: 1 ● ● ● 0.15 ● ● ● ● ● ● ● 0.10 ● ● ●0.10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 0.10 0.00

● ● Loss/Dup: 0.5 ● • ● Loss/Dup: 0.5 ● 0.15 ● very high ILS 0.15 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 (70% RF 0.10 0.05 ● ● ● ● ● ● Species tree error (NRF) Species tree error (NRF) ● ● ● ● ● ● ● ● ● ● ● ● ● ● distance)● ● ● ● 0.05 ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● Loss/Dup: 1 Loss/Dup: 1 ● ● ● ● ● ● 0.15 ● 0.15 ● ● ● true true ● true true true ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 Est. (100bp)Est. (500bp)● Est. (100bp)Est. (500bp)0.10 Est. (100bp)● Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Gene tree type true true true true truetrue true true true true Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est.Est. (100bp)(100bp)Est.Est. (500bp)(500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Gene tree type Gene tree type

● ● ● ● MulRF ● DupTree ● A−Pro MulRF ● MulRF ● DupTreeDupTree ● A−Pro A−Pro

28 Dup: 5 Dup: 4 Dup: 3 Dup: 2 Dup: 1

0.20 ●

● Loss/Dup: 0 ● 0.15 ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00

● ● Loss/Dup: 0.1 0.15 ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ●

● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00

● Loss/Dup: 0.5 0.15 ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ●

Species tree error (NRF) ● Dup: 5 Dup: 24 Dup: 13 Dup:Dup: 0.2 2 Dup: 01 ● ● 0.20 ● ● ●

● Loss/Dup: 0 0.05 ● ● ● ● ● Changing● 0.15 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.10 ● ● ● ● ● ● ● ● Dup and ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 ● Loss rates ●

● ● ● Loss/Dup: 0.1 0.15 ● ● ● ● ● ● Loss/Dup: 1 ● ● 0.15 ● ● ● ● 0.10 ● ● ● ● ● ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● 0.10 0.00

● • ● Loss/Dup: 0.5 very high ILS 0.15 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● (70% RF 0.10 ● 0.05 ● ● ● ● Species tree error (NRF) ● ● ● ● ● ● ● ● ● ● ● distance) ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● Loss/Dup: 1 ● ● ● 0.15 ● ● true true ● true true true ● ● ● ● ● ● Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp)0.10 Est. (100bp)● Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) ● ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● Gene tree type

true true true true true Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Gene tree type

● ● ● MulRF ● MulRF ● DupTreeDupTree ● A−Pro A−Pro

28 Dup: 0 Dup: 1 Dup: 5

● Input: Est. (100bp)

Dup: 0 Dup: 1 Dup: 5 ● 15%

● Input: Est. (100bp) ● ● ● ● ● ● ● ● ● 10% ● 15% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5% ● ● ● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● ● 5% ● ● ● ● ● ● ● ● ● ● ● ●

● Input: Est. (500bp) 0% ● 15% Dup: 5 Dup: 1 Dup: 0 0.125 ● Input: Est. (500bp) 0.100 ● Changing● ●

10% ● ILS: 0 0.075 15% ● ● ● ● 0.050 ● ● ● ● ● 5% ● ● ILS levels● ● 0.025 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.000 ● ● ● ● ● ● ● ● ● ● ● ● 0.100 0% ● 5% ● ● ● ● ● ● ● ● ● ● 0.075 ● ● ● ILS: 20 ● ● (NRF) error tree Species ● ● ● ● ● ● ● ● ● ● ● 0% 0.050

15% 0.025 ● Input: true Species tree error (NRF) error tree Species • Loss/Dup rate = 1 ● 0.000 ● 10% ●

15% ● Input: true ● ● ● 0.10 ILS: 50 ● ● ● ● ● ● 5% ● ● ● ● 10% ● ● Species tree error (NRF) ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● 5% ● ● ● ● Dup: 5 Dup: 1 Dup: 0 ● ● ● ● ● 0 20 50 70 ● 0● 20 50 70 0 20 50 70 ● ● ● 0.06 ● ● ● ● ● ● ● ● ● ● ● ● 0.15 ●

0% ILS level (RF%) ILS: 70 0.04 ILS: 0 ● ● 0 20 50 70 0 20● 50MulRF 70 0● 20 DupTree 50 70 ● A−Pro0.10 ● ASTRAL−multi ● ● 0.02 ● ● ILS level (RF%) 0.05 ● ●

0.00 ● ● ● ● MulRF● DupTree● A−Pro● ASTRAL−multi true true true ● ● Est.0.03 (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est.● (500bp) ILS: 20 ● ● 0.02 ● ● 29 ● ● ● MulRF ● A−Pro ● ● 0.01 ● ●

● ● ● 0.00

0.075 ● ● ●

● ILS: 50

● ●

0.050 ● ● ● Species tree error (NRF) ● ● ● ●

0.025 ● ● ● ●

● ● ● 0.15 ● ILS: 70 ● ● ● 0.10 ● ●

● ● ● 0.05 ● ● ● ● ● ●

true true true Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp)

● MulRF ● A−Pro Dup: 0 Dup: 1 Dup: 5

● Input: Est. (100bp)

Dup: 0 Dup: 1 Dup: 5 ● 15%

● Input: Est. (100bp) ● ● ● ● ● ● ● ● ● 10% ● 15% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5% ● ● ● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● ● 5% ● ● ● ● ● ● ● ● ● ● ● ●

● Input: Est. (500bp) 0% ● 15% Dup: 5 Dup: 1 Dup: 0 0.125 ● Input: Est. (500bp) 0.100 ● Changing● ●

10% ● ILS: 0 0.075 15% ● ● ● ● 0.050 ● ● ● ● ● 5% ● ● ILS levels● ● 0.025 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.000 ● ● ● ● ● ● ● ● ● ● ● ● 0.100 0% ● 5% ● ● ● ● ● ● ● ● ● ● 0.075 ● ● ● ILS: 20 ● ● (NRF) error tree Species ● ● ● ● ● ● ● ● ● ● ● 0% 0.050

15% 0.025 ● Input: true Species tree error (NRF) error tree Species • Loss/Dup rate = 1 ● 0.000 ● 10% ●

15% ● Input: true ● ● ● 0.10 ILS: 50 ● ● ● ● ● ● 5% ● ● ● ● 10% ● ● Species tree error (NRF) ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● 5% ● ● ● ● Dup: 5 Dup: 1 Dup: 0 ● ● ● ● ● 0 20 50 70 ● 0● 20 50 70 0 20 50 70 ● ● ● 0.06 ● ● ● ● ● ● ● ● ● ● ● ● 0.15 ●

0% ILS level (RF%) ILS: 70 0.04 ILS: 0 ● ● 0 20 50 70 0 20● 50MulRF 70 0● 20 DupTree 50 70 ● A−Pro0.10 ● ASTRAL−multi ● ● 0.02 ● ● ILS level (RF%) 0.05 ● ●

0.00 ● ● ● ● MulRF● DupTree● A−Pro● ASTRAL−multi true true true ● ● Est.0.03 (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est.● (500bp) ILS: 20 ● ● 0.02 ● ● 29 ● ● ● MulRF ● A−Pro ● ● 0.01 ● ●

● ● ● 0.00

0.075 ● ● ●

● ILS: 50

● ●

0.050 ● ● ● Species tree error (NRF) ● ● ● ●

0.025 ● ● ● ●

● ● ● 0.15 ● ILS: 70 ● ● ● 0.10 ● ●

● ● ● 0.05 ● ● ● ● ● ●

true true true Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp)

● MulRF ● A−Pro Dup: 0 Dup: 1 Dup: 5

● Input: Est. (100bp)

Dup: 0 Dup: 1 Dup: 5 ● 15%

● Input: Est. (100bp) ● ● ● ● ● ● ● ● ● 10% ● 15% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5% ● ● ● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● ● 5% ● ● ● ● ● ● ● ● ● ● ● ●

● Input: Est. (500bp) 0% ● 15% Dup: 5 Dup: 1 Dup: 0 0.125 ● Input: Est. (500bp)

● ● 0.100 ● ● Changing● ● ILS: 0 10% ● ILS: 0 0.075 15% ● ● ● ● ● 0.050 ● ● ● ● ● ● ● ● ● ● ● ● 5% ● 0.025 ● ILS levels● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.100 0% ● 5% ● ● ● ● ● ● ● ● ● ● ● 0.075 ILS: 20 ● ● ● ILS: 20 ● ● (NRF) error tree Species ● ● ● ● ● ● ● ● ● ● ● ● 0% 0.050 ● ● ● ●

15% ● Input: true ● ● 0.025 ● ● ● ● Species tree error (NRF) error tree Species ● ● ● ● ● ● ● ● ● • Loss/Dup rate = 1 ● 0.000 ●● ● 10% ●

15% ● Input: true ● ● ● ● ILS: 50 0.10 ILS: 50 ● ● ● ● ● ● ● 5% ● ● ● ● ● ● ● ● 10% ● ● ● Species tree error (NRF) ● Species tree error (NRF) ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● ● ● ● 5% ● ● ● ● Dup: 5 Dup: 1 Dup: 0 ● ● ● ● 0 20 50 70 0● 20 50 70 0 20 50 70 ● ● ● ● ● ● 0.06 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.15 ● ● ILS: 70 0% ILS level (RF%) ILS: 70 ● ILS: 0 0.04 ● ● ● ● ● ● ● 0 20 50 70 0 20● 50MulRF 70 0● 20 DupTree 50 70 ● A−Pro0.10 ● ASTRAL−multi ● ● ● ● ● ● 0.02 ● ● ● ● ● ● ● ILS level (RF%) 0.05 ● ● ● ● ● ● ● 0.00 ● ● ● ● MulRF● DupTree● A−Pro● ASTRAL−multi truetrue truetrue truetrue ● ● Est.0.03 (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est.● (500bp) ILS: 20 ● ● 0.02 ● ● 29 ● ● ● ● ● ● MulRF DupTree● AA−Pro−Pro ● ● 0.01 ● ●

● ● ● 0.00

0.075 ● ● ●

● ILS: 50

● ●

0.050 ● ● ● Species tree error (NRF) ● ● ● ●

0.025 ● ● ● ●

● ● ● 0.15 ● ILS: 70 ● ● ● 0.10 ● ●

● ● ● 0.05 ● ● ● ● ● ●

true true true Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp)

● MulRF ● A−Pro Dup: 0 Dup: 1 Dup: 5

● Input: Est. (100bp)

Dup: 0 Dup: 1 Dup: 5 ● 15%

● Input: Est. (100bp) ● ● ● ● ● ● ● ● ● 10% ● 15% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5% ● ● ● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● ● 5% ● ● ● ● ● ● ● ● ● ● ● ●

● Input: Est. (500bp) 0% ● 15% Dup: 5 Dup: 1 Dup: 0 0.125 ● Input: Est. (500bp)

● ● 0.100 ● ● Changing● ● ILS: 0 ILS: 0 10% ● ILS: 0 0.075 15% ● ● ● ● ● 0.050 ● ● ● ● ● ● ● ● ● ● ● ● 5% ● 0.025 ● ILS levels● ● ● 10% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.100 0% ● 5% ● ● ● ● ● ● ● ● ● ● ● 0.075 ILS: 20 ILS: 20 ● ● ● ILS: 20 ● ● (NRF) error tree Species ● ● ● ● ● ● ● ● ● ● ● ● 0% 0.050 ● ● ● ●

15% ● Input: true ● ● 0.025 ● ● ● ● Species tree error (NRF) error tree Species ● ● ● ● ● ● ● ● ● • Loss/Dup rate = 1 ● 0.000 ●● ● 10% ●

15% ● Input: true ● ● ● ● ILS: 50 ILS: 50 0.10 ILS: 50 ● ● ● ● ● ● ● 5% ● ● ● ● ● ● ● ● 10% ● ● ● Species tree error (NRF) Species tree error (NRF) ● Species tree error (NRF) ● ● ● ● ● ● ● 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0% ● ● ● ● 5% ● ● ● ● Dup: 5 Dup: 1 Dup: 0 ● ● ● ● 0 20 50 70 0● 20 50 70 0 20 50 70 ● ● ● ● ● ● 0.06 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.15 ● ● ILS: 70

0% ILS: 70 ILS level (RF%) ILS: 70 ● ILS: 0 0.04 ● ● ● ● ● ● ● 0 20 50 70 0 20● 50MulRF 70 0● 20 DupTree 50 70 ● A−Pro0.10 ● ASTRAL−multi ● ● ● ● ● ● 0.02 ● ● ● ● ● ● ● ILS level (RF%) 0.05 ● ● ● ● ● ● ● 0.00 ● ● ● ● MulRF● DupTree● A−Pro● ASTRAL−multi truetrue truetrue truetrue ● ● Est.0.03 (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est.● (500bp) ILS: 20 ● ● 0.02 ● ● 29 ● ● ● ● ● ● MulRF MulRFDupTree DupTree● A−Pro AA−Pro−ASTRALPro −multi ● ● 0.01 ● ●

● ● ● 0.00

0.075 ● ● ●

● ILS: 50

● ●

0.050 ● ● ● Species tree error (NRF) ● ● ● ●

0.025 ● ● ● ●

● ● ● 0.15 ● ILS: 70 ● ● ● 0.10 ● ●

● ● ● 0.05 ● ● ● ● ● ●

true true true Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp) Est. (100bp)Est. (500bp)

● MulRF ● A−Pro More species? Remains accurate

10% ● Gene trees

● 100 bp • 70% ILS, ● ● 500 bp ● ● true gt • Dup = 5 ● 5% ● ● ● n ●

● ● • Dup/Loss = 1 ● 10 ● ● ● 25 ● 100 • 1000 genes Species tree error (NRF) ● ● ● 250 ● 500 2% ● 1 100 Running time (minutes)

30 More genes? Increases accuracy

● Gene trees

● 25% ● ● 100 bp • 70% ILS, ● 500 bp

● ● true gt ● • Dup = 5 10% ● ● ● k • ● Dup/Loss = 1 5% ● 25 ● ● ● ● 100 • 25 species ● ● ● 250 2% Species tree error (NRF) ● ● ● 1,000 ● 2,500 1% ● ● ● 10,000 0.1 1.0 Running time (minutes)

31 Summary

• MulRF: resilient to gene tree error, but very sensitive to ILS

• DupTree: resilient to ILS but very sensitive to gene tree error

• ASTRAL-multi: designed for multi-alleles. Theoretically consistent, empirically not good

• A-Pro: relatively robust to ILS, GDL and gene tree estimation error

32 Biological dataset (plants)

• Original publication only used 424 single-copy gene trees for 103 species

• We has also estimated 9683 multi-copy gene trees for a subset of 83 species

• Never found a good way to use this in the original study!

• DupTree on this dataset produced a tree that could not be believed (many obviously wrong branches)

33 Biological dataset (plants)

Rosmarinus Ipomoea Rosmarinus Catharanthus Ipomoea Allamanda Catharanthus ASTRAL-Pro Inula ASTRAL (Single copy) Allamanda Tanacetum Inula Diospyros Tanacetum (all genes) Kochia Diospyros Carica Kochia Arabidopsis [Wickett, et al., Carica Hibiscus Arabidopsis Populus Hibiscus Larrea 2014, PNAS] Populus Malus Larrea Boehmeria Boehmeria Medicago Medicago Vitis Vitis Podophyllum Podophyllum Aquilegia Aquilegia Eschscholzia Eschscholzia Liriodendron Liriodendron Persea Persea Saruma Saruma Houttuynia Houttuynia Sarcandra Chloranthaless Sarcandra Sorghum Sorghum Zea Zea Oryza Oryza Brachypodium Brachypodium Sabal Sabal Smilax Monocots Yucca Colchicum Smilax Yucca Colchicum Dioscorea Dioscorea Angiosperms Acorus Acorus Kadsura Kadsura Nuphar ANA Grade Amborella Nuphar Gnetum Amborella Welwitschia Gnetum Ephedra Welwitschia Pinus Ephedra Cedrus Pinus Juniperus Cedrus Cunninghamia Juniperus Taxus Cunninghamia Sciadopitys Taxus Prumnopitys Sciadopitys Cycas rumphii Prumnopitys Cycas micholitzii Cycas rumphii Zamia Cycas micholitzii Ginkgo Zamia Psilotum Ginkgo Ophioglossum Psilotum Angiopteris Monilophytes Ophioglossum Cyathea Angiopteris Pteridium Pteridium Cyathea Huperzia Equisetum (Land Plants) Huperzia Physcomitrella Selaginella Polytrichum Polytrichum Sphagnum Physcomitrella Marchantia emarginata Liverworts Sphagnum Marchantia polymorpha Marchantia emarginata Nothoceros aenigmaticus Marchantia polymorpha Nothoceros vincentianus Nothoceros aenigmaticus Roya Nothoceros vincentianus Cosmarium Roya Netrium Cosmarium Cylindrocystis Netrium Mougeotia Cylindrocystis Spirogyra Mougeotia Coleochaetales Spirogyra Chaetosphaeridium Chara Coleochaete Klebsormidium Chara Klebsormidium Chlorokybus Chlamydomonas Spirotaenia Volvox Mesostigma Uronema Uronema 34 Biological dataset (plants)

Breaking the order Lamiales

Rosmarinus Ipomoea Rosmarinus Catharanthus Ipomoea Allamanda Catharanthus ASTRAL-Pro Inula ASTRAL (Single copy) Allamanda Tanacetum Inula Diospyros Tanacetum (all genes) Kochia Diospyros Carica Kochia Arabidopsis [Wickett, et al., Carica Hibiscus Eudicots Arabidopsis Populus Hibiscus Larrea 2014, PNAS] Populus Malus Larrea Boehmeria Boehmeria Medicago Medicago Vitis Vitis Podophyllum Podophyllum Aquilegia Aquilegia Eschscholzia Eschscholzia Liriodendron Liriodendron Persea Persea Saruma Magnoliids Saruma Houttuynia Houttuynia Sarcandra Chloranthaless Sarcandra Sorghum Sorghum Zea Zea Oryza Oryza Brachypodium Brachypodium Sabal Sabal Smilax Monocots Yucca Colchicum Smilax Yucca Colchicum Dioscorea Dioscorea Angiosperms Acorus Acorus Kadsura Kadsura Nuphar ANA Grade Amborella Nuphar Gnetum Amborella Welwitschia Gnetum Ephedra Welwitschia Pinus Ephedra Cedrus Pinus Juniperus Cedrus Cunninghamia Juniperus Taxus Gymnosperms Cunninghamia Sciadopitys Taxus Prumnopitys Sciadopitys Cycas rumphii Prumnopitys Cycas micholitzii Cycas rumphii Zamia Cycas micholitzii Ginkgo Zamia Psilotum Ginkgo Ophioglossum Psilotum Angiopteris Monilophytes Ophioglossum Cyathea Angiopteris Pteridium Pteridium Embryophytes Equisetum Cyathea Huperzia Lycophytes Equisetum (Land Plants) Selaginella Huperzia Physcomitrella Selaginella Polytrichum Mosses Polytrichum Sphagnum Physcomitrella Marchantia emarginata Liverworts Sphagnum Marchantia polymorpha Marchantia emarginata Nothoceros aenigmaticus Marchantia polymorpha Nothoceros vincentianus Hornworts Nothoceros aenigmaticus Roya Nothoceros vincentianus Cosmarium Roya Netrium Zygnematophyceae Cosmarium Cylindrocystis Netrium Mougeotia Cylindrocystis Spirogyra Mougeotia Chaetosphaeridium Coleochaetales Spirogyra Coleochaete Chaetosphaeridium Chara Coleochaete Klebsormidium Spirotaenia Chara Chlorokybus Klebsormidium Mesostigma Chlorokybus Chlamydomonas Spirotaenia Volvox Mesostigma Uronema Uronema 34 Biological dataset (plants)

Breaking the order Lamiales

Rosmarinus Ipomoea Rosmarinus Catharanthus Ipomoea Allamanda Catharanthus ASTRAL-Pro Inula ASTRAL (Single copy) Allamanda Tanacetum Inula Diospyros Tanacetum (all genes) Kochia Diospyros Carica Kochia Arabidopsis [Wickett, et al., Carica Hibiscus Eudicots Arabidopsis Populus Hibiscus Larrea 2014, PNAS] Populus Malus Larrea Boehmeria Boehmeria Consistent Medicago Medicago Vitis Vitis Podophyllum Podophyllum Aquilegia Aquilegia with latest Eschscholzia Eschscholzia Liriodendron Liriodendron Persea Magnoliids Persea onekp Saruma Saruma Houttuynia Houttuynia Sarcandra Chloranthaless Sarcandra Sorghum Sorghum [Nature, 2019] Zea Zea Oryza Oryza Brachypodium Brachypodium Sabal Sabal Smilax Monocots Yucca Colchicum Smilax Yucca Colchicum Dioscorea Dioscorea Angiosperms Acorus Acorus Kadsura Kadsura Nuphar ANA Grade Amborella Nuphar Gnetum Amborella Welwitschia Gnetum Ephedra Welwitschia Pinus Ephedra Cedrus Pinus Juniperus Cedrus Cunninghamia Juniperus Taxus Gymnosperms Cunninghamia Sciadopitys Taxus Prumnopitys Sciadopitys Cycas rumphii Prumnopitys Cycas micholitzii Cycas rumphii Zamia Cycas micholitzii Ginkgo Zamia Psilotum Ginkgo Ophioglossum Psilotum Angiopteris Monilophytes Ophioglossum Cyathea Angiopteris Pteridium Pteridium Embryophytes Equisetum Cyathea Huperzia Lycophytes Equisetum (Land Plants) Selaginella Huperzia Physcomitrella Selaginella Polytrichum Mosses Polytrichum Sphagnum Physcomitrella Marchantia emarginata Liverworts Sphagnum Marchantia polymorpha Marchantia emarginata Nothoceros aenigmaticus Marchantia polymorpha Nothoceros vincentianus Hornworts Nothoceros aenigmaticus Roya Nothoceros vincentianus Cosmarium Roya Netrium Zygnematophyceae Cosmarium Cylindrocystis Netrium Mougeotia Cylindrocystis Spirogyra Mougeotia Chaetosphaeridium Coleochaetales Spirogyra Coleochaete Chaetosphaeridium Chara Coleochaete Klebsormidium Spirotaenia Chara Chlorokybus Klebsormidium Mesostigma Chlorokybus Chlamydomonas Spirotaenia Volvox Mesostigma Uronema Uronema 34 Biological dataset (plants)

Recovered both ways in the Breaking the order Lamiales literature (A-Pro similar to concat)

Rosmarinus Ipomoea Rosmarinus Catharanthus Ipomoea Allamanda Catharanthus ASTRAL-Pro Inula ASTRAL (Single copy) Allamanda Tanacetum Inula Diospyros Tanacetum (all genes) Kochia Diospyros Carica Kochia Arabidopsis [Wickett, et al., Carica Hibiscus Eudicots Arabidopsis Populus Hibiscus Larrea 2014, PNAS] Populus Malus Larrea Boehmeria Boehmeria Consistent Medicago Medicago Vitis Vitis Podophyllum Podophyllum Aquilegia Aquilegia with latest Eschscholzia Eschscholzia Liriodendron Liriodendron Persea Magnoliids Persea onekp Saruma Saruma Houttuynia Houttuynia Sarcandra Chloranthaless Sarcandra Sorghum Sorghum [Nature, 2019] Zea Zea Oryza Oryza Brachypodium Brachypodium Sabal Sabal Smilax Monocots Yucca Colchicum Smilax Yucca Colchicum Dioscorea Dioscorea Angiosperms Acorus Acorus Kadsura Kadsura Nuphar ANA Grade Amborella Nuphar Gnetum Amborella Welwitschia Gnetum Ephedra Welwitschia Pinus Ephedra Cedrus Pinus Juniperus Cedrus Cunninghamia Juniperus Taxus Gymnosperms Cunninghamia Sciadopitys Taxus Prumnopitys Sciadopitys Cycas rumphii Prumnopitys Cycas micholitzii Cycas rumphii Zamia Cycas micholitzii Ginkgo Zamia Psilotum Ginkgo Ophioglossum Psilotum Angiopteris Monilophytes Ophioglossum Cyathea Angiopteris Pteridium Pteridium Embryophytes Equisetum Cyathea Huperzia Lycophytes Equisetum (Land Plants) Selaginella Huperzia Physcomitrella Selaginella Polytrichum Mosses Polytrichum Sphagnum Physcomitrella Marchantia emarginata Liverworts Sphagnum Marchantia polymorpha Marchantia emarginata Nothoceros aenigmaticus Marchantia polymorpha Nothoceros vincentianus Hornworts Nothoceros aenigmaticus Roya Nothoceros vincentianus Cosmarium Roya Netrium Zygnematophyceae Cosmarium Cylindrocystis Netrium Mougeotia Cylindrocystis Spirogyra Mougeotia Chaetosphaeridium Coleochaetales Spirogyra Coleochaete Chaetosphaeridium Chara Coleochaete Klebsormidium Spirotaenia Chara Chlorokybus Klebsormidium Mesostigma Chlorokybus Chlamydomonas Spirotaenia Volvox Mesostigma Uronema Uronema 34 A-Pro is available

• On bioRxiv: Zhang, Chao, Celine Scornavacca, Erin Molloy, and Siavash Mirarab. “ASTRAL-Pro: Quartet-Based Species Tree Inference despite Paralogy.” bioRxiv, 2019, 2019.12.12.874727. doi:10.1101/2019.12.12.874727.

• On GitHub: https://github.com/chaoszhang/A-pro

• More soon …

35 ASTRAL-Pro ASTRAL

Chao Zhang Tandy Warnow

Maryam Hashemi John Yin Celine Scornavacca Erin K. Molloy

36 Erfan Sayyari Chao Zhang