Determining the Probability of Hemiplasy in the Presence of Incomplete Lineage Sorting and Introgression Supplementary Materials and Methods

Mark S. Hibbins*, Matthew J.S. Gibson*, and Matthew W. Hahn*†
Department of Biology* and Department of Computer Science†, Indiana University, Bloomington

1 Mutation probabilities on genealogies

Each of the twelve possible genealogies under our parent tree model has a set of five branch lengths along which mutations can occur. $l_1$, $l_2$, and $l_3$ denote the tip branches leading to species A, B, and C respectively; $l_4$ denotes the internal branch, and $l_5$ denotes the ancestral branch. As described in the supplement of Guerrero & Hahn (2018), the mutation probability on each of these branches has the general form $\int (1 - e^{-mx}) f(x)\,dx$, where $m$ is the mutation probability per $2N$ generations, $x$ is the random variable for the branch length, and $f(x)$ is the probability density function for $x$.

We begin with the mutation probabilities for parent tree 1, which are found in the supplement of Guerrero & Hahn, and will be re-written here to be consistent with notation. In the following notation, parent tree 1 will be denoted as "pt1". Since many of the genealogies are identical in length, the mutation probabilities on their branches can be written with general expressions. We first consider the genealogies $AB2_1$, $BC_1$, and $AC_1$, which are all produced via incomplete lineage sorting in parent tree 1, and share the following set of mutation probabilities:

$$\nu_1[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right)\frac{3}{2}\left(e^{-x} - e^{-3x}\right)dx \tag{1}$$

$$\nu_2[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right)3e^{-3x}\left(1 - e^{-((t_3-t_2)-x)}\right)dx \tag{2}$$

$$\nu_4[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} 3e^{-3y}\left(\int_0^{(t_3-t_2)-y} e^{-x}\left(1 - e^{-mx}\right)dx\right)dy \tag{3}$$

$$\nu_5[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m((t_3-t_2)-x)}\right)3e^{-3x}\,dx \tag{4}$$

In each of the above, $L = 1 + \frac{1}{2}e^{-3(t_3-t_2)} - \frac{3}{2}e^{-(t_3-t_2)}$ is the probability of coalescence of A, B, and C in their ancestral population. $t_3$ denotes the total height of the tree, i.e.
the time at the base of the tree. The difference between $t_3$ and $t_2$ determines the duration of the ancestral population of all three taxa, before speciation occurs. Equations 1 through 4 each represent the mutation probabilities for multiple branches, which are as follows:

$$\nu_1[\mathrm{ILS}, \mathrm{pt1}] = \nu(l_3, AB2_1) = \nu(l_1, BC_1) = \nu(l_2, AC_1) \tag{5}$$

$$\nu_2[\mathrm{ILS}, \mathrm{pt1}] = \nu(l_1, AB2_1) = \nu(l_2, AB2_1) = \nu(l_2, BC_1) = \nu(l_3, BC_1) = \nu(l_1, AC_1) = \nu(l_3, AC_1) \tag{6}$$

$$\nu_4[\mathrm{ILS}, \mathrm{pt1}] = \nu(l_4, AB2_1) = \nu(l_4, BC_1) = \nu(l_4, AC_1) \tag{7}$$

$$\nu_5[\mathrm{ILS}, \mathrm{pt1}] = \nu(l_5, AB2_1) = \nu(l_5, BC_1) = \nu(l_5, AC_1) \tag{8}$$

The gene tree produced by lineage sorting in parent tree 1, $AB1_1$, has a different set of mutation probabilities, since the branches have different expected lengths. These are:

$$\nu(l_1, AB1_1) = \nu(l_2, AB1_1) = \frac{1}{1 - e^{-(t_2-t_1)}}\int_0^{t_2-t_1} \left(1 - e^{-m(t_1+x)}\right)e^{-x}\,dx \tag{9}$$

$$\nu(l_3, AB1_1) = \frac{1}{1 - e^{-(t_3-t_2)}}\int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right)e^{-x}\,dx \tag{10}$$

$$\nu(l_4, AB1_1) = \int_0^{t_2-t_1} \frac{e^{-y}}{1 - e^{-(t_2-t_1)}}\left(\int_0^{t_3-t_2} \frac{e^{-x}}{1 - e^{-(t_3-t_2)}}\left(1 - e^{-m((t_2-t_1)-y+x)}\right)dx\right)dy \tag{11}$$

$$\nu(l_5, AB1_1) = \frac{1}{1 - e^{-(t_3-t_2)}}\int_0^{t_3-t_2} \left(1 - e^{-m((t_3-t_2)-x)}\right)e^{-x}\,dx \tag{12}$$

Now we consider introgression, starting with parent tree 2. Many of the mutation probabilities are symmetrical with parent tree 1 and therefore remain the same, and the remainder have the same general form with different parameters. For the ILS genealogies $BC2_2$, $AB_2$, and $AC_2$, equations 1 and 2 have the time of A-B speciation ($t_1$) replaced with the timing of B-C introgression ($t_m$).
This gives:

$$\nu_1[\mathrm{ILS}, \mathrm{pt2}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right)\frac{3}{2}\left(e^{-x} - e^{-3x}\right)dx \tag{13}$$

$$\nu_2[\mathrm{ILS}, \mathrm{pt2}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right)3e^{-3x}\left(1 - e^{-((t_3-t_2)-x)}\right)dx \tag{14}$$

$$\nu_4[\mathrm{ILS}, \mathrm{pt2}] = \nu_4[\mathrm{ILS}, \mathrm{pt1}] \tag{15}$$

$$\nu_5[\mathrm{ILS}, \mathrm{pt2}] = \nu_5[\mathrm{ILS}, \mathrm{pt1}] \tag{16}$$

These correspond to the following branch mutation probabilities:

$$\nu_1[\mathrm{ILS}, \mathrm{pt2}] = \nu(l_1, BC2_2) = \nu(l_3, AB_2) = \nu(l_2, AC_2) \tag{17}$$

$$\nu_2[\mathrm{ILS}, \mathrm{pt2}] = \nu(l_2, BC2_2) = \nu(l_3, BC2_2) = \nu(l_1, AB_2) = \nu(l_2, AB_2) = \nu(l_1, AC_2) = \nu(l_3, AC_2) \tag{18}$$

$$\nu_4[\mathrm{ILS}, \mathrm{pt2}] = \nu(l_4, BC2_2) = \nu(l_4, AB_2) = \nu(l_4, AC_2) \tag{19}$$

$$\nu_5[\mathrm{ILS}, \mathrm{pt2}] = \nu(l_5, BC2_2) = \nu(l_5, AB_2) = \nu(l_5, AC_2) \tag{20}$$

For the genealogy produced by lineage sorting in parent tree 2, $BC1_2$, we have:

$$\nu(l_2, BC1_2) = \nu(l_3, BC1_2) = \frac{1}{1 - e^{-(t_2-t_m)}}\int_0^{t_2-t_m} \left(1 - e^{-m(t_m+x)}\right)e^{-x}\,dx \tag{21}$$

$$\nu(l_1, BC1_2) = \frac{1}{1 - e^{-(t_3-t_2)}}\int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right)e^{-x}\,dx \tag{22}$$

$$\nu(l_4, BC1_2) = \int_0^{t_2-t_m} \frac{e^{-y}}{1 - e^{-(t_2-t_m)}}\left(\int_0^{t_3-t_2} \frac{e^{-x}}{1 - e^{-(t_3-t_2)}}\left(1 - e^{-m((t_2-t_m)-y+x)}\right)dx\right)dy \tag{23}$$

$$\nu(l_5, BC1_2) = \nu(l_5, AB1_1) \tag{24}$$

Finally, we consider parent tree 3. The mutation probabilities have the same formulation as parent tree 2, with two key changes: since parent tree 3 is shorter (Figure 2 of main text), $t_2$ is replaced by $t_1$. This also applies to the value of $L$, which we will denote for parent tree 3 as $L_3 = 1 + \frac{1}{2}e^{-3(t_3-t_1)} - \frac{3}{2}e^{-(t_3-t_1)}$.
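For the ILS genealogies $BC2_3$, $AB_3$, and $AC_3$, this gives:

$$\nu_1[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right)\frac{3}{2}\left(e^{-x} - e^{-3x}\right)dx \tag{25}$$

$$\nu_2[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right)3e^{-3x}\left(1 - e^{-((t_3-t_1)-x)}\right)dx \tag{26}$$

$$\nu_4[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} 3e^{-3y}\left(\int_0^{(t_3-t_1)-y} e^{-x}\left(1 - e^{-mx}\right)dx\right)dy \tag{27}$$

$$\nu_5[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} \left(1 - e^{-m((t_3-t_1)-x)}\right)3e^{-3x}\,dx \tag{28}$$

Where:

$$\nu_1[\mathrm{ILS}, \mathrm{pt3}] = \nu(l_1, BC2_3) = \nu(l_3, AB_3) = \nu(l_2, AC_3) \tag{29}$$

$$\nu_2[\mathrm{ILS}, \mathrm{pt3}] = \nu(l_2, BC2_3) = \nu(l_3, BC2_3) = \nu(l_1, AB_3) = \nu(l_2, AB_3) = \nu(l_1, AC_3) = \nu(l_3, AC_3) \tag{30}$$

$$\nu_4[\mathrm{ILS}, \mathrm{pt3}] = \nu(l_4, BC2_3) = \nu(l_4, AB_3) = \nu(l_4, AC_3) \tag{31}$$

$$\nu_5[\mathrm{ILS}, \mathrm{pt3}] = \nu(l_5, BC2_3) = \nu(l_5, AB_3) = \nu(l_5, AC_3) \tag{32}$$

Finally, for the genealogy $BC1_3$, the mutation probabilities are as follows:

$$\nu(l_2, BC1_3) = \nu(l_3, BC1_3) = \frac{1}{1 - e^{-(t_1-t_m)}}\int_0^{t_1-t_m} \left(1 - e^{-m(t_m+x)}\right)e^{-x}\,dx \tag{33}$$

$$\nu(l_1, BC1_3) = \frac{1}{1 - e^{-(t_3-t_1)}}\int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right)e^{-x}\,dx \tag{34}$$

$$\nu(l_4, BC1_3) = \int_0^{t_1-t_m} \frac{e^{-y}}{1 - e^{-(t_1-t_m)}}\left(\int_0^{t_3-t_1} \frac{e^{-x}}{1 - e^{-(t_3-t_1)}}\left(1 - e^{-m((t_1-t_m)-y+x)}\right)dx\right)dy \tag{35}$$

$$\nu(l_5, BC1_3) = \frac{1}{1 - e^{-(t_3-t_1)}}\int_0^{t_3-t_1} \left(1 - e^{-m((t_3-t_1)-x)}\right)e^{-x}\,dx \tag{36}$$

2 When does introgression make hemiplasy more likely than ILS alone?

The probability of hemiplasy with C → B introgression is

$$P_e = (1 - d)P_e[BC_1] + d\left(P_e[BC1_2] + P_e[BC2_2]\right) \tag{37}$$

From this, it can be seen that introgression makes hemiplasy more likely than ILS alone when:

$$P_e[BC1_2] + P_e[BC2_2] > P_e[BC_1] \tag{38}$$

When is this true? Substituting the relevant expressions from the main text gives:

$$\begin{aligned}
&\left(1 - e^{-(t_2-t_m)}\right)\nu(l_4, BC1_2)\prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) + \left(\tfrac{1}{3}e^{-(t_2-t_m)}\right)\nu(l_4, BC2_2)\prod_{i \neq 4}\left(1 - \nu(l_i, BC2_2)\right) \\
&\quad > \left(\tfrac{1}{3}e^{-(t_2-t_1)}\right)\nu(l_4, BC_1)\prod_{i \neq 4}\left(1 - \nu(l_i, BC_1)\right)
\end{aligned} \tag{39}$$

$$\begin{aligned}
&\left(1 - e^{-(t_2-t_m)}\right)\nu(l_4, BC1_2)\prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) \\
&\quad > \left(\tfrac{1}{3}e^{-(t_2-t_1)}\right)\nu(l_4, BC_1)\prod_{i \neq 4}\left(1 - \nu(l_i, BC_1)\right) - \left(\tfrac{1}{3}e^{-(t_2-t_m)}\right)\nu(l_4, BC2_2)\prod_{i \neq 4}\left(1 - \nu(l_i, BC2_2)\right)
\end{aligned} \tag{40}$$

The mutation probabilities in the two terms on the right side of the inequality are equal, since $BC_1$ and $BC2_2$ have the same topology with the same branch lengths.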
Therefore, equation 40 can be simplified:

$$\left(1 - e^{-(t_2-t_m)}\right)\nu(l_4, BC1_2)\prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) > \nu(l_4, BC_1)\prod_{i \neq 4}\left(1 - \nu(l_i, BC_1)\right)\left(\tfrac{1}{3}e^{-(t_2-t_1)} - \tfrac{1}{3}e^{-(t_2-t_m)}\right) \tag{41}$$

As the most conservative case, let us assume a hybrid speciation scenario where $t_1 = t_m$. We treat this as the most conservative introgression scenario because Figure 4 in the main text shows that more recent introgression makes hemiplasy more likely. In this case, the right side of the inequality simplifies to 0, leaving

$$\left(1 - e^{-(t_2-t_m)}\right)\nu(l_4, BC1_2)\prod_{i \neq 4}\left(1 - \nu(l_i, BC1_2)\right) > 0 \tag{42}$$

This is true whenever $t_2 > t_m$, which is true by definition in this model. Therefore, introgression always makes hemiplasy more likely than ILS alone.

3 Supplementary Figures and Tables

[Supplementary Figure 1. Panels A-D: gene trees AB1, AB2, BC, and AC on taxa A, B, and C, with times $t_1$ and $t_2$ marked. Panels E-H: gene trees BC1, BC2, AB, and AC on taxa A, B, and C, with times $t_m$ and $t_2$ marked.]

Supplementary Figure 1: Each parent tree in our model generates four gene trees: one generated from lineage sorting (panels A and E), and three equally likely trees generated from incomplete lineage sorting (panels B-D, F-H).
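As a rough numerical illustration of Sections 1 and 2, the sketch below evaluates Equations 1 and 2 with composite Simpson's rule and computes the bracketed difference on the right side of Equation 41. The function names, the choice of quadrature, and the parameter values are assumptions made for illustration only; they are not taken from the supplement or from the authors' software.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def coalescence_prob(T):
    """L from Section 1, where T is the duration of the ancestral population (t3 - t2)."""
    return 1 + 0.5 * math.exp(-3 * T) - 1.5 * math.exp(-T)


def nu1_ils(m, t2, t3):
    """Equation 1. The branch length t1 + (t2 - t1) + x reduces to t2 + x."""
    T = t3 - t2

    def integrand(x):
        return (1 - math.exp(-m * (t2 + x))) * 1.5 * (math.exp(-x) - math.exp(-3 * x))

    return simpson(integrand, 0.0, T) / coalescence_prob(T)


def nu2_ils(m, t2, t3):
    """Equation 2, with the same branch length t2 + x as Equation 1."""
    T = t3 - t2

    def integrand(x):
        return (1 - math.exp(-m * (t2 + x))) * 3 * math.exp(-3 * x) * (1 - math.exp(-(T - x)))

    return simpson(integrand, 0.0, T) / coalescence_prob(T)


def rhs_factor(t1, t2, tm):
    """Bracketed difference on the right side of Equation 41."""
    return (math.exp(-(t2 - t1)) - math.exp(-(t2 - tm))) / 3


# Hypothetical parameter values (times in units of 2N generations).
m, t1, t2, t3 = 0.01, 1.0, 1.5, 4.0

print("L            =", coalescence_prob(t3 - t2))
print("nu1[ILS,pt1] =", nu1_ils(m, t2, t3))
print("nu2[ILS,pt1] =", nu2_ils(m, t2, t3))
print("Eq. 41 factor with tm = t1:", rhs_factor(t1, t2, tm=t1))
```

Because $t_1$ cancels inside the exponent of Equations 1 and 2, the two functions take only $m$, $t_2$, and $t_3$. The same routines could be reused for Equations 13 and 14, where $t_m$ cancels in the same way, and for parent tree 3 by passing $t_1$ in place of $t_2$.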
