Ghost lineages deceive introgression tests and call for a new null hypothesis

Théo Tricou1,*, Eric Tannier1,2 and Damien M. de Vienne1,*

1 Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
2 INRIA Grenoble Rhône-Alpes, F-38334, France

*Corresponding authors: E-mails: [email protected], [email protected]

Supplementary Material

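Supplementary material and methods

1. DFOIL

1.1. Computing DFOIL

For each combination of 5 lineages with a symmetrical topology (((P1,P2),(P3,P4)),O), we computed the four D-like statistics with the following equations (Pease and Hahn 2015):

$$
D_{FO} = \frac{(\Sigma\text{BABAA} + \Sigma\text{BBBAA} + \Sigma\text{ABABA} + \Sigma\text{AAABA}) - (\Sigma\text{BAABA} + \Sigma\text{BBABA} + \Sigma\text{ABBAA} + \Sigma\text{AABAA})}{(\Sigma\text{BABAA} + \Sigma\text{BBBAA} + \Sigma\text{ABABA} + \Sigma\text{AAABA}) + (\Sigma\text{BAABA} + \Sigma\text{BBABA} + \Sigma\text{ABBAA} + \Sigma\text{AABAA})}
$$

$$
D_{IL} = \frac{(\Sigma\text{ABBAA} + \Sigma\text{BBBAA} + \Sigma\text{BAABA} + \Sigma\text{AAABA}) - (\Sigma\text{ABABA} + \Sigma\text{BBABA} + \Sigma\text{BABAA} + \Sigma\text{AABAA})}{(\Sigma\text{ABBAA} + \Sigma\text{BBBAA} + \Sigma\text{BAABA} + \Sigma\text{AAABA}) + (\Sigma\text{ABABA} + \Sigma\text{BBABA} + \Sigma\text{BABAA} + \Sigma\text{AABAA})}
$$

$$
D_{FI} = \frac{(\Sigma\text{BABAA} + \Sigma\text{BABBA} + \Sigma\text{ABABA} + \Sigma\text{ABAAA}) - (\Sigma\text{ABBAA} + \Sigma\text{ABBBA} + \Sigma\text{BAABA} + \Sigma\text{BAAAA})}{(\Sigma\text{BABAA} + \Sigma\text{BABBA} + \Sigma\text{ABABA} + \Sigma\text{ABAAA}) + (\Sigma\text{ABBAA} + \Sigma\text{ABBBA} + \Sigma\text{BAABA} + \Sigma\text{BAAAA})}
$$

$$
D_{OL} = \frac{(\Sigma\text{BAABA} + \Sigma\text{BABBA} + \Sigma\text{ABBAA} + \Sigma\text{ABAAA}) - (\Sigma\text{ABABA} + \Sigma\text{ABBBA} + \Sigma\text{BABAA} + \Sigma\text{BAAAA})}{(\Sigma\text{BAABA} + \Sigma\text{BABBA} + \Sigma\text{ABBAA} + \Sigma\text{ABAAA}) + (\Sigma\text{ABABA} + \Sigma\text{ABBBA} + \Sigma\text{BABAA} + \Sigma\text{BAAAA})}
$$

We then performed binomial tests to evaluate whether the difference between the two terms on either side of the minus sign in each numerator was significant, in order to assign a “+”, “-” or “0” sign to each statistic.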
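The computation described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the simulations: the site-pattern terms follow Pease and Hahn (2015), while the function names (`dfoil`, `signature`, `binom_two_sided_p`), the input format (a dictionary of site-pattern counts) and the significance threshold `alpha = 0.01` are assumptions made for the example.

```python
from math import exp, lgamma, log

# Site patterns (P1, P2, P3, P4, O) on the left and right of the minus sign
# of each statistic, following Pease and Hahn (2015).
DFOIL_TERMS = {
    "DFO": (("BABAA", "BBBAA", "ABABA", "AAABA"), ("BAABA", "BBABA", "ABBAA", "AABAA")),
    "DIL": (("ABBAA", "BBBAA", "BAABA", "AAABA"), ("ABABA", "BBABA", "BABAA", "AABAA")),
    "DFI": (("BABAA", "BABBA", "ABABA", "ABAAA"), ("ABBAA", "ABBBA", "BAABA", "BAAAA")),
    "DOL": (("BAABA", "BABBA", "ABBAA", "ABAAA"), ("ABABA", "ABBBA", "BABAA", "BAAAA")),
}


def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials with p = 0.5."""
    if n == 0:
        return 1.0  # no informative site: nothing to test

    def log_pmf(i):
        return lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + n * log(0.5)

    # The distribution is symmetric for p = 0.5, so double the smaller tail.
    lo = min(k, n - k)
    tail = sum(exp(log_pmf(i)) for i in range(lo + 1))
    return min(1.0, 2.0 * tail)


def dfoil(counts, alpha=0.01):
    """Return {statistic: (value, sign)}; alpha is a hypothetical threshold."""
    result = {}
    for name, (left_terms, right_terms) in DFOIL_TERMS.items():
        left = sum(counts.get(p, 0) for p in left_terms)
        right = sum(counts.get(p, 0) for p in right_terms)
        total = left + right
        value = (left - right) / total if total else 0.0
        if binom_two_sided_p(left, total) >= alpha:
            sign = "0"
        else:
            sign = "+" if left > right else "-"
        result[name] = (value, sign)
    return result


def signature(result):
    """Concatenate the four signs in the order DFO, DIL, DFI, DOL."""
    return "".join(result[name][1] for name in ("DFO", "DIL", "DFI", "DOL"))
```

With these assumptions, `signature(dfoil({"BBBAA": 100, "BBABA": 20}))` returns `"++00"`: the two patterns contribute only to DFO and DIL, so DFI and DOL have no informative sites and receive a “0”.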

1.2. DFOIL, a 5-taxon extension of the D-statistics, rarely solves the issue raised by ghost introgressions

In addition to the D-statistics, we tested DFOIL, a 5-taxon test derived from the D-statistics that makes it possible to determine the direction of gene flow. In Pease and Hahn (2015), 8 unique DFOIL patterns were linked to different polarized introgression events (with an explicit direction) and another 2 to pairs of non-polarized (both directions) events (see Table 1 in Pease and Hahn 2015). Our simulations yield additional results when unsampled lineages and ghost introgressions are taken into account. We observed two additional DFOIL patterns, “00++” and “00--”, that can be interpreted as non-polarized events. Furthermore, any non-polarized event can be explained either by an introgression between the ancestor of one clade and a species from the opposite clade, or by an introgression from a midgroup ghost lineage to the second species of the opposite clade. For example, the DFOIL pattern “++00” arises from the event P1P2<->P3 but could also be observed following the event Ghost->P4. For “--00”, the events are P1P2<->P4 and Ghost->P3. For the two new patterns, “00--” and “00++”, the events are P3P4<->P1 and Ghost->P2, and P3P4<->P2 and Ghost->P1, respectively. It should be noted that, as for the D-statistics, an introgression from the outgroup or from a lineage external to the quintet will produce the same pattern as a midgroup ghost introgression (Supplementary fig. S3). Because the ancestor of P3 and P4 is always older than the ancestor of P1 and P2, the donor for those two new DFOIL patterns must be a lineage with no descendant available inside the ingroup, either a sister lineage to P3P4 or a midgroup ghost lineage. Conversely, polarized events cannot be explained by any event involving midgroup ghost lineages, which means that DFOIL can only be erroneously interpreted if the pattern is non-polarized.
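The correspondence between non-polarized patterns and their two competing explanations can be written as a small lookup table. The sketch below only transcribes the pattern-to-event pairs stated in the paragraph above; the names `NON_POLARIZED_EVENTS`, `candidate_events` and `ghost_ambiguous` are hypothetical, and polarized patterns are simply treated as having no ghost alternative.

```python
# Non-polarized DFOIL patterns and the events that can produce them,
# transcribed from the text (order of signs: DFO, DIL, DFI, DOL).
NON_POLARIZED_EVENTS = {
    "++00": ("P1P2<->P3", "Ghost->P4"),
    "--00": ("P1P2<->P4", "Ghost->P3"),
    "00--": ("P3P4<->P1", "Ghost->P2"),
    "00++": ("P3P4<->P2", "Ghost->P1"),
}


def candidate_events(pattern):
    """Return the events listed for a non-polarized pattern, or an empty tuple."""
    return NON_POLARIZED_EVENTS.get(pattern, ())


def ghost_ambiguous(pattern):
    """True if a midgroup ghost introgression is among the listed explanations."""
    return any(event.startswith("Ghost") for event in candidate_events(pattern))
```

For example, `candidate_events("++00")` returns both `"P1P2<->P3"` and `"Ghost->P4"`, whereas a polarized pattern such as `"+++0"` returns an empty tuple and is not flagged as ambiguous.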

Pease, James B., and Matthew W. Hahn. 2015. “Detection and Polarization of Introgression in a Five-Taxon Phylogeny.” Systematic Biology 64 (4): 651–62. https://doi.org/10.1093/sysbio/syv023.

Supplementary Figures

Supplementary Figure S1. Proportion of erroneous interpretations of the D-statistics for all subsets of parameters tested. Mean proportion of erroneous interpretations observed (y-axis) as a function of the taxonomic sampling effort (x-axis: 20 species sampled out of N extant species simulated, N = 20, 40, 60, 80, 100), by distance to the outgroup (columns) and for different strengths of the phylogenetic distance effect (rows) as controlled by α (α = 0 -no effect-, 1, 10, 100 and 1000).

Supplementary Figure S2. Distribution of the D-statistics (y-axis) for each introgression event (x-axis) that can create a detectable excess of ABBA or BABA patterns as identified by the D-statistics (or Patterson’s D) test. Introgression events that are not expected to produce significant D-statistics (for example, between sister lineages) are not shown here. Using approach 1, 100 random species trees with 20 extant species at the end and one introgression were simulated. In this figure, lineages called “O” are outside the quartet and lineages called “P4” are the outgroup of the quartet. Ghost introgressions are called “N2P1” or “N2P2”. Introgressions from and to the outgroup or from lineages outside the quartet are annotated “P4P1”, “P4P2”, “P1P4”, “P2P4”, “OP1”, “OP2”.

Supplementary Figure S3. DFOIL results for all possible introgression events involving ingroup lineages (P1, P2, P3, P4, P1P2 and P3P4), midgroup lineages (N2), or the outgroup and lineages outside the quintet (P5 and O). Using approach 1 (see Materials and Methods in the main text), 100 random species trees with 20 extant species at the end and one introgression were simulated. For each event, the corresponding four-sign pattern is represented by +/−/0 signs (see text). A pattern represents the observed values of the four D-like statistics DFO, DIL, DFI and DOL (see supplementary materials).

Supplementary Figure S4. Effect of the outgroup distance on the proportion of erroneous interpretations of DFOIL. Observed proportion of erroneous interpretations due to midgroup ghost introgressions (y-axis) as a function of the range of R for which quartets are considered (x-axis).

Supplementary Figure S5. Effect of the size of the species tree on the proportion of erroneous interpretations of the D-statistics. Observed proportion of erroneous interpretations (y-axis) as a function of the number of extinct lineages in the species tree (x-axis). Species trees were simulated using four values of the rate pext (pext = 0, 0.3, 0.6, 0.9). For each value of pext, 1000 species trees were simulated with 20 extant species at the end and with 100 introgressions sampled.