Determining the Probability of Hemiplasy in the Presence of Incomplete Lineage Sorting and Introgression Supplementary Materials and Methods

Mark S. Hibbins*, Matthew J.S. Gibson*, and Matthew W. Hahn*†
Department of Biology* and Department of Computer Science†, Indiana University, Bloomington

1 Mutation probabilities on genealogies

Each of the twelve possible genealogies under our parent tree model has a set of five branch lengths along which mutations can occur. $l_1$, $l_2$, and $l_3$ denote the tip branches leading to species A, B, and C respectively; $l_4$ denotes the internal branch, and $l_5$ denotes the ancestral branch. As described in the supplement of Guerrero & Hahn (2018), the mutation probability on each of these branches has the general form $\int (1 - e^{-mx}) f(x)\,dx$, where $m$ is the mutation probability per $2N$ generations, $x$ is the random variable for the branch length, and $f(x)$ is the probability density function for $x$.

We begin with the mutation probabilities for parent tree 1, which are found in the supplement of Guerrero & Hahn, and will be re-written here to be consistent with notation. In the following notation, parent tree 1 will be denoted as "pt1". Since many of the genealogies are identical in length, the mutation probabilities on their branches can be written with general expressions. We first consider the genealogies $AB2_1$, $BC_1$, and $AC_1$, which are all produced via incomplete lineage sorting in parent tree 1, and share the following set of mutation probabilities:

$$
\nu_1[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L} \int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right) \frac{3}{2}\left(e^{-x} - e^{-3x}\right) dx \tag{1}
$$

$$
\nu_2[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L} \int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right) 3e^{-3x} \left(1 - e^{-((t_3-t_2)-x)}\right) dx \tag{2}
$$

$$
\nu_4[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L} \int_0^{t_3-t_2} 3e^{-3y} \left( \int_0^{(t_3-t_2)-y} e^{-x}\left(1 - e^{-mx}\right) dx \right) dy \tag{3}
$$

$$
\nu_5[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L} \int_0^{t_3-t_2} \left(1 - e^{-m((t_3-t_2)-x)}\right) 3e^{-3x}\,dx \tag{4}
$$

In each of the above, $L = 1 + \frac{1}{2}e^{-3(t_3-t_2)} - \frac{3}{2}e^{-(t_3-t_2)}$ is the probability of coalescence of A, B, and C in their ancestral population. $t_3$ denotes the total height of the tree, i.e. the time at the base of the tree. The difference between $t_3$ and $t_2$ determines the duration of the ancestral population of all three taxa, before speciation occurs. Equations 1 through 4 each represent the mutation probabilities for multiple branches, which are as follows:

$$
\nu_1[\mathrm{ILS}, \mathrm{pt1}] = \nu(l_3, AB2_1) = \nu(l_1, BC_1) = \nu(l_2, AC_1) \tag{5}
$$

$$
\begin{aligned}
\nu_2[\mathrm{ILS}, \mathrm{pt1}] &= \nu(l_1, AB2_1) = \nu(l_2, AB2_1) = \nu(l_2, BC_1) \\
&= \nu(l_3, BC_1) = \nu(l_1, AC_1) = \nu(l_3, AC_1)
\end{aligned} \tag{6}
$$

$$
\nu_4[\mathrm{ILS}, \mathrm{pt1}] = \nu(l_4, AB2_1) = \nu(l_4, BC_1) = \nu(l_4, AC_1) \tag{7}
$$

$$
\nu_5[\mathrm{ILS}, \mathrm{pt1}] = \nu(l_5, AB2_1) = \nu(l_5, BC_1) = \nu(l_5, AC_1) \tag{8}
$$

The gene tree produced by lineage sorting in parent tree 1, $AB1_1$, has a different set of mutation probabilities, since the branches have different expected lengths. These are:

$$
\nu(l_1, AB1_1) = \nu(l_2, AB1_1) = \frac{1}{1 - e^{-(t_2-t_1)}} \int_0^{t_2-t_1} \left(1 - e^{-m(t_1+x)}\right) e^{-x}\,dx \tag{9}
$$

$$
\nu(l_3, AB1_1) = \frac{1}{1 - e^{-(t_3-t_2)}} \int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right) e^{-x}\,dx \tag{10}
$$

$$
\nu(l_4, AB1_1) = \int_0^{t_2-t_1} \frac{e^{-y}}{1 - e^{-(t_2-t_1)}} \left( \int_0^{t_3-t_2} \left(1 - e^{-m((t_2-t_1)-y+x)}\right) \frac{e^{-x}}{1 - e^{-(t_3-t_2)}}\,dx \right) dy \tag{11}
$$

$$
\nu(l_5, AB1_1) = \frac{1}{1 - e^{-(t_3-t_2)}} \int_0^{t_3-t_2} \left(1 - e^{-m((t_3-t_2)-x)}\right) e^{-x}\,dx \tag{12}
$$

Now we consider introgression, starting with parent tree 2. Many of the mutation probabilities are symmetrical with parent tree 1 and therefore remain the same, and the remainder have the same general form with different parameters. For the ILS genealogies $BC2_2$, $AB_2$, and $AC_2$, equations 1 and 2 have the time of A-B speciation ($t_1$) replaced with the timing of B-C introgression ($t_m$).
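As a rough numerical illustration, equations 1 through 4 can be evaluated by direct quadrature. The sketch below is not the authors' code: the function names, the Simpson's-rule integrator, and the example parameter values are assumptions made for illustration, and times are taken to be in units of 2N generations. It transcribes the integrands as written, using the fact that t1 + (t2 − t1) + x = t2 + x.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def coalescence_prob(T):
    """L: closed form for the normalising constant with T = t3 - t2."""
    return 1 + 0.5 * math.exp(-3 * T) - 1.5 * math.exp(-T)


def ils_mutation_probs(t2, t3, m):
    """Equations 1-4 for the ILS genealogies of parent tree 1 (illustrative)."""
    T = t3 - t2
    L = coalescence_prob(T)
    # Equation 1: tip branch of the outgroup lineage in the gene tree.
    n1 = simpson(lambda x: (1 - math.exp(-m * (t2 + x)))
                 * 1.5 * (math.exp(-x) - math.exp(-3 * x)), 0, T) / L
    # Equation 2: tip branches of the two sister lineages.
    n2 = simpson(lambda x: (1 - math.exp(-m * (t2 + x)))
                 * 3 * math.exp(-3 * x) * (1 - math.exp(-(T - x))), 0, T) / L

    # Equation 3: internal branch, a double integral (inner over x, outer over y).
    def inner(y):
        return simpson(lambda x: math.exp(-x) * (1 - math.exp(-m * x)),
                       0, T - y, 100)

    n4 = simpson(lambda y: 3 * math.exp(-3 * y) * inner(y), 0, T) / L
    # Equation 4: ancestral branch.
    n5 = simpson(lambda x: (1 - math.exp(-m * (T - x)))
                 * 3 * math.exp(-3 * x), 0, T) / L
    return {"n1": n1, "n2": n2, "n4": n4, "n5": n5}


# Hypothetical parameter values, chosen only to exercise the functions.
probs = ils_mutation_probs(t2=1.0, t3=2.0, m=0.01)
```

Because the integrands depend on t1 only through t1 + (t2 − t1), the function takes t2 directly; the same routine could be reused for any parent tree whose ILS integrals share this form.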
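Replacing $t_1$ with $t_m$ in equations 1 and 2 gives:

$$
\nu_1[\mathrm{ILS}, \mathrm{pt2}] = \frac{1}{L} \int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right) \frac{3}{2}\left(e^{-x} - e^{-3x}\right) dx \tag{13}
$$

$$
\nu_2[\mathrm{ILS}, \mathrm{pt2}] = \frac{1}{L} \int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right) 3e^{-3x} \left(1 - e^{-((t_3-t_2)-x)}\right) dx \tag{14}
$$

$$
\nu_4[\mathrm{ILS}, \mathrm{pt2}] = \nu_4[\mathrm{ILS}, \mathrm{pt1}] \tag{15}
$$

$$
\nu_5[\mathrm{ILS}, \mathrm{pt2}] = \nu_5[\mathrm{ILS}, \mathrm{pt1}] \tag{16}
$$

These correspond to the following branch mutation probabilities:

$$
\nu_1[\mathrm{ILS}, \mathrm{pt2}] = \nu(l_1, BC2_2) = \nu(l_3, AB_2) = \nu(l_2, AC_2) \tag{17}
$$

$$
\begin{aligned}
\nu_2[\mathrm{ILS}, \mathrm{pt2}] &= \nu(l_2, BC2_2) = \nu(l_3, BC2_2) = \nu(l_1, AB_2) \\
&= \nu(l_2, AB_2) = \nu(l_1, AC_2) = \nu(l_3, AC_2)
\end{aligned} \tag{18}
$$

$$
\nu_4[\mathrm{ILS}, \mathrm{pt2}] = \nu(l_4, BC2_2) = \nu(l_4, AB_2) = \nu(l_4, AC_2) \tag{19}
$$

$$
\nu_5[\mathrm{ILS}, \mathrm{pt2}] = \nu(l_5, BC2_2) = \nu(l_5, AB_2) = \nu(l_5, AC_2) \tag{20}
$$

For the genealogy produced by lineage sorting in parent tree 2, $BC1_2$, we have:

$$
\nu(l_2, BC1_2) = \nu(l_3, BC1_2) = \frac{1}{1 - e^{-(t_2-t_m)}} \int_0^{t_2-t_m} \left(1 - e^{-m(t_m+x)}\right) e^{-x}\,dx \tag{21}
$$

$$
\nu(l_1, BC1_2) = \frac{1}{1 - e^{-(t_3-t_2)}} \int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right) e^{-x}\,dx \tag{22}
$$

$$
\nu(l_4, BC1_2) = \int_0^{t_2-t_m} \frac{e^{-y}}{1 - e^{-(t_2-t_m)}} \left( \int_0^{t_3-t_2} \left(1 - e^{-m((t_2-t_m)-y+x)}\right) \frac{e^{-x}}{1 - e^{-(t_3-t_2)}}\,dx \right) dy \tag{23}
$$

$$
\nu(l_5, BC1_2) = \nu(l_5, AB1_1) \tag{24}
$$

Finally, we consider parent tree 3. The mutation probabilities have the same formulation as parent tree 2, except that $t_2$ is replaced by $t_1$ throughout, since parent tree 3 is shorter (Figure 2 of main text). This replacement also applies to the value of $L$, which we will denote for parent tree 3 as $L_3 = 1 + \frac{1}{2}e^{-3(t_3-t_1)} - \frac{3}{2}e^{-(t_3-t_1)}$.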
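The tip-branch integrals for the lineage-sorting genealogies (the form shared by equations 9 and 21) can be integrated analytically, which gives a convenient check on any numerical implementation. The sketch below compares that closed form with a midpoint-rule evaluation. The function names and the parameter values are hypothetical, and `start` stands for whichever time opens the interval ($t_1$ in equation 9, $t_m$ in equation 21).

```python
import math


def tip_branch_prob_closed(start, duration, m):
    """Closed form of (1 / (1 - e^-D)) * int_0^D (1 - e^{-m(start + x)}) e^{-x} dx."""
    D = duration
    mutation_free = math.exp(-m * start) * (1 - math.exp(-(1 + m) * D)) / (1 + m)
    return 1 - mutation_free / (1 - math.exp(-D))


def tip_branch_prob_numeric(start, duration, m, n=2000):
    """Midpoint-rule evaluation of the same integral, for comparison."""
    D = duration
    h = D / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (1 - math.exp(-m * (start + x))) * math.exp(-x)
    return total * h / (1 - math.exp(-D))


# Hypothetical values: t_m = 0.5, t_2 = 1.5 (in 2N generations), m = 0.01.
t_m, t_2, m = 0.5, 1.5, 0.01
closed = tip_branch_prob_closed(t_m, t_2 - t_m, m)
numeric = tip_branch_prob_numeric(t_m, t_2 - t_m, m)
```

The closed form follows from splitting the integrand into $e^{-x}$ and $e^{-m \cdot start} e^{-(1+m)x}$, each of which integrates to an elementary exponential.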
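For the ILS genealogies $BC2_3$, $AB_3$, and $AC_3$, substituting $t_1$ for $t_2$ gives:

$$
\nu_1[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3} \int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right) \frac{3}{2}\left(e^{-x} - e^{-3x}\right) dx \tag{25}
$$

$$
\nu_2[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3} \int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right) 3e^{-3x} \left(1 - e^{-((t_3-t_1)-x)}\right) dx \tag{26}
$$

$$
\nu_4[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3} \int_0^{t_3-t_1} 3e^{-3y} \left( \int_0^{(t_3-t_1)-y} e^{-x}\left(1 - e^{-mx}\right) dx \right) dy \tag{27}
$$

$$
\nu_5[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3} \int_0^{t_3-t_1} \left(1 - e^{-m((t_3-t_1)-x)}\right) 3e^{-3x}\,dx \tag{28}
$$

Where:

$$
\nu_1[\mathrm{ILS}, \mathrm{pt3}] = \nu(l_1, BC2_3) = \nu(l_3, AB_3) = \nu(l_2, AC_3) \tag{29}
$$

$$
\begin{aligned}
\nu_2[\mathrm{ILS}, \mathrm{pt3}] &= \nu(l_2, BC2_3) = \nu(l_3, BC2_3) = \nu(l_1, AB_3) \\
&= \nu(l_2, AB_3) = \nu(l_1, AC_3) = \nu(l_3, AC_3)
\end{aligned} \tag{30}
$$

$$
\nu_4[\mathrm{ILS}, \mathrm{pt3}] = \nu(l_4, BC2_3) = \nu(l_4, AB_3) = \nu(l_4, AC_3) \tag{31}
$$

$$
\nu_5[\mathrm{ILS}, \mathrm{pt3}] = \nu(l_5, BC2_3) = \nu(l_5, AB_3) = \nu(l_5, AC_3) \tag{32}
$$

Finally, for the genealogy $BC1_3$, the mutation probabilities are as follows:

$$
\nu(l_2, BC1_3) = \nu(l_3, BC1_3) = \frac{1}{1 - e^{-(t_1-t_m)}} \int_0^{t_1-t_m} \left(1 - e^{-m(t_m+x)}\right) e^{-x}\,dx \tag{33}
$$

$$
\nu(l_1, BC1_3) = \frac{1}{1 - e^{-(t_3-t_1)}} \int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right) e^{-x}\,dx \tag{34}
$$

$$
\nu(l_4, BC1_3) = \int_0^{t_1-t_m} \frac{e^{-y}}{1 - e^{-(t_1-t_m)}} \left( \int_0^{t_3-t_1} \left(1 - e^{-m((t_1-t_m)-y+x)}\right) \frac{e^{-x}}{1 - e^{-(t_3-t_1)}}\,dx \right) dy \tag{35}
$$

$$
\nu(l_5, BC1_3) = \frac{1}{1 - e^{-(t_3-t_1)}} \int_0^{t_3-t_1} \left(1 - e^{-m((t_3-t_1)-x)}\right) e^{-x}\,dx \tag{36}
$$

2 When does introgression make hemiplasy more likely than ILS alone?

The probability of hemiplasy with C → B introgression is

$$
P_e = (1 - d)\,P_e[BC_1] + d\left(P_e[BC1_2] + P_e[BC2_2]\right) \tag{37}
$$

From this, it can be seen that introgression makes hemiplasy more likely than ILS alone when:

$$
P_e[BC1_2] + P_e[BC2_2] > P_e[BC_1] \tag{38}
$$

When is this true? Substituting the relevant expressions from the main text gives:

$$
\begin{aligned}
&\left(1 - e^{-(t_2-t_m)}\right) \nu(l_4, BC1_2) \prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) + \left(\tfrac{1}{3}e^{-(t_2-t_m)}\right) \nu(l_4, BC2_2) \prod_{i \neq 4}\left(1 - \nu(l_i, BC2_2)\right) \\
&\quad > \left(\tfrac{1}{3}e^{-(t_2-t_1)}\right) \nu(l_4, BC_1) \prod_{i \neq 4}\left(1 - \nu(l_i, BC_1)\right)
\end{aligned} \tag{39}
$$

Moving the $BC2_2$ term to the right side:

$$
\begin{aligned}
&\left(1 - e^{-(t_2-t_m)}\right) \nu(l_4, BC1_2) \prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) \\
&\quad > \left(\tfrac{1}{3}e^{-(t_2-t_1)}\right) \nu(l_4, BC_1) \prod_{i \neq 4}\left(1 - \nu(l_i, BC_1)\right) - \left(\tfrac{1}{3}e^{-(t_2-t_m)}\right) \nu(l_4, BC2_2) \prod_{i \neq 4}\left(1 - \nu(l_i, BC2_2)\right)
\end{aligned} \tag{40}
$$

The mutation probabilities for $BC_1$ and $BC2_2$ on the right side of the inequality are equal, since the two genealogies have the same topology with the same branch lengths.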
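The structure of equations 37 through 39 can be made concrete with a small numerical sketch. Each hemiplasy term is a genealogy weight multiplied by the probability of a mutation on the internal branch $l_4$ and of no mutation on the other four branches. All numbers below (the times, the introgression proportion `d`, and the per-branch mutation probabilities) are hypothetical placeholders rather than values from the paper, and the function names are likewise illustrative.

```python
import math


def hemiplasy_term(weight, nu):
    """weight * nu(l4) * prod over i != 4 of (1 - nu(l_i)).

    `nu` maps branch index (1..5) to a mutation probability.
    """
    no_mutation_elsewhere = 1.0
    for branch, value in nu.items():
        if branch != 4:
            no_mutation_elsewhere *= 1.0 - value
    return weight * nu[4] * no_mutation_elsewhere


def hemiplasy_with_introgression(d, pe_bc1, pe_bc12, pe_bc22):
    """Equation 37: mixture over the no-introgression and introgression histories."""
    return (1.0 - d) * pe_bc1 + d * (pe_bc12 + pe_bc22)


# Hypothetical times (2N generations) and introgression proportion.
t1, tm, t2 = 1.0, 0.5, 1.5
d = 0.1

# Hypothetical branch mutation probabilities. BC_1 and BC2_2 share one set,
# reflecting the statement that their mutation probabilities are equal.
nu_ils = {1: 0.02, 2: 0.02, 3: 0.02, 4: 0.005, 5: 0.01}
nu_ls = {1: 0.02, 2: 0.01, 3: 0.01, 4: 0.01, 5: 0.01}

pe_bc12 = hemiplasy_term(1 - math.exp(-(t2 - tm)), nu_ls)
pe_bc22 = hemiplasy_term(math.exp(-(t2 - tm)) / 3, nu_ils)
pe_bc1 = hemiplasy_term(math.exp(-(t2 - t1)) / 3, nu_ils)

pe = hemiplasy_with_introgression(d, pe_bc1, pe_bc12, pe_bc22)
condition_38 = pe_bc12 + pe_bc22 > pe_bc1
```

With these placeholder values, `condition_38` holds and `pe` exceeds `pe_bc1`, as equation 38 requires; setting `d = 0` recovers the ILS-only probability.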
Therefore, equation 40 can be simplified to:

$$
\left(1 - e^{-(t_2-t_m)}\right) \nu(l_4, BC1_2) \prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) > \nu(l_4, BC_1) \prod_{i \neq 4}\left(1 - \nu(l_i, BC_1)\right) \left(\tfrac{1}{3}e^{-(t_2-t_1)} - \tfrac{1}{3}e^{-(t_2-t_m)}\right) \tag{41}
$$

As the most conservative case, let us assume a hybrid speciation scenario where $t_1 = t_m$; Figure 4 in the main text shows that more recent introgression makes hemiplasy more likely. In this case, the right side of the inequality simplifies to 0, leaving

$$
\left(1 - e^{-(t_2-t_m)}\right) \nu(l_4, BC1_2) \prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) > 0 \tag{42}
$$

This holds whenever $t_2 > t_m$, which is true by definition in this model, provided the mutation probabilities lie strictly between 0 and 1. Therefore, introgression always makes hemiplasy more likely than ILS alone.

3 Supplementary Figures and Tables

[Figure: phylogeny with tip labels Lygosoma sp, Prasinohaema virens, Lipinia longiceps, Prasinohaema sp nov 2, Prasinohaema semoni, Lipinia noctua, Lipinia albodorsale, Lobulia brongersmai, Lobulia elegans, Prasinohaema prehensicauda, Prasinohaema sp nov 1, Prasinohaema flavipes, Fojia bumui, Lipinia pulchra, Papuascincus sp nov, Papuascincus stanleyanus; scale bar 0.3]

Supplementary Figure 1: Phylogeny of green-blooded lizards and an outgroup inferred from 3220 UCE gene trees using ASTRAL (data from Rodriguez et al.
