Determining the Probability of Hemiplasy in the Presence of Incomplete Lineage Sorting and Introgression Supplementary Materials and Methods


Mark S. Hibbins*, Matthew J.S. Gibson*, and Matthew W. Hahn*†
Department of Biology* and Department of Computer Science†, Indiana University, Bloomington

1 Mutation probabilities on genealogies

Each of the twelve possible genealogies under our parent tree model has a set of five branch lengths along which mutations can occur. $l_1$, $l_2$, and $l_3$ denote the tip branches leading to species A, B, and C respectively; $l_4$ denotes the internal branch, and $l_5$ denotes the ancestral branch. As described in the supplement of Guerrero & Hahn (2018), the mutation probability on each of these branches has the general form

$$\int \left(1 - e^{-mx}\right) f(x)\,dx,$$

where $m$ is the mutation probability per $2N$ generations, $x$ is the random variable for the branch length, and $f(x)$ is the probability density function for $x$.

We begin with the mutation probabilities for parent tree 1, which are found in the supplement of Guerrero & Hahn, and will be re-written here to be consistent with notation. In the following notation, parent tree 1 will be denoted as "pt1". Since many of the genealogies are identical in length, the mutation probabilities on their branches can be written with general expressions. We first consider the genealogies $\mathrm{AB2}_1$, $\mathrm{BC}_1$, and $\mathrm{AC}_1$, which are all produced via incomplete lineage sorting in parent tree 1, and share the following set of mutation probabilities:

$$n_1[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right)\frac{3}{2}\left(e^{-x} - e^{-3x}\right)dx \tag{1}$$

$$n_2[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right)3e^{-3x}\left(1 - e^{-((t_3-t_2)-x)}\right)dx \tag{2}$$

$$n_4[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} 3e^{-3y}\left(\int_0^{(t_3-t_2)-y} e^{-x}\left(1 - e^{-mx}\right)dx\right)dy \tag{3}$$

$$n_5[\mathrm{ILS}, \mathrm{pt1}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m((t_3-t_2)-x)}\right)3e^{-3x}\,dx \tag{4}$$

In each of the above, $L = 1 + \frac{1}{2}e^{-3(t_3-t_2)} - \frac{3}{2}e^{-(t_3-t_2)}$ is the probability of coalescence of A, B, and C in their ancestral population.
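As a rough numerical illustration of equations 1 through 4, the sketch below evaluates them with composite Simpson quadrature using only the Python standard library. This is not the authors' code: the function names `simpson` and `ils_pt1`, the grid sizes, and the parameter values ($m = 0.01$, $t_2 = 1$, $t_3 = 4$) are assumptions chosen for illustration. It also checks that the density in equation 1 integrates to $L$ over $[0, t_3 - t_2]$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def ils_pt1(m, t2, t3):
    """Evaluate L and equations 1-4 (n1, n2, n4, n5) numerically.

    Hypothetical helper: m, t2, t3 follow the symbols in the text.
    t1 does not appear because t1 + (t2 - t1) + x reduces to t2 + x.
    """
    T = t3 - t2  # duration of the ancestral population
    L = 1 + 0.5 * math.exp(-3 * T) - 1.5 * math.exp(-T)

    # Equation 1: branch ends at the second (root) coalescence
    n1 = simpson(lambda x: (1 - math.exp(-m * (t2 + x)))
                 * 1.5 * (math.exp(-x) - math.exp(-3 * x)), 0, T) / L
    # Equation 2: branch ends at the first coalescence
    n2 = simpson(lambda x: (1 - math.exp(-m * (t2 + x)))
                 * 3 * math.exp(-3 * x) * (1 - math.exp(-(T - x))), 0, T) / L

    # Equation 3: nested integral for the internal branch
    def inner(y):
        return simpson(lambda x: math.exp(-x) * (1 - math.exp(-m * x)),
                       0, T - y, n=200)
    n4 = simpson(lambda y: 3 * math.exp(-3 * y) * inner(y), 0, T, n=200) / L

    # Equation 4: ancestral branch
    n5 = simpson(lambda x: (1 - math.exp(-m * (T - x)))
                 * 3 * math.exp(-3 * x), 0, T) / L
    return L, n1, n2, n4, n5

# Illustrative parameter values (assumptions, not taken from the paper)
L, n1, n2, n4, n5 = ils_pt1(m=0.01, t2=1.0, t3=4.0)
density_mass = simpson(lambda x: 1.5 * (math.exp(-x) - math.exp(-3 * x)), 0, 3.0)
print(L, density_mass, n1, n2, n4, n5)
```

Simpson's rule is used here only to keep the sketch dependency-free; any standard quadrature routine would serve the same purpose.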
$t_3$ denotes the total height of the tree, i.e. the time at the base of the tree. The difference between $t_3$ and $t_2$ determines the duration of the ancestral population of all three taxa, before speciation occurs. Equations 1 through 4 each represent the mutation probabilities for multiple branches, which are as follows:

$$n_1[\mathrm{ILS}, \mathrm{pt1}] = n(l_3, \mathrm{AB2}_1) = n(l_1, \mathrm{BC}_1) = n(l_2, \mathrm{AC}_1) \tag{5}$$

$$n_2[\mathrm{ILS}, \mathrm{pt1}] = n(l_1, \mathrm{AB2}_1) = n(l_2, \mathrm{AB2}_1) = n(l_2, \mathrm{BC}_1) = n(l_3, \mathrm{BC}_1) = n(l_1, \mathrm{AC}_1) = n(l_3, \mathrm{AC}_1) \tag{6}$$

$$n_4[\mathrm{ILS}, \mathrm{pt1}] = n(l_4, \mathrm{AB2}_1) = n(l_4, \mathrm{BC}_1) = n(l_4, \mathrm{AC}_1) \tag{7}$$

$$n_5[\mathrm{ILS}, \mathrm{pt1}] = n(l_5, \mathrm{AB2}_1) = n(l_5, \mathrm{BC}_1) = n(l_5, \mathrm{AC}_1) \tag{8}$$

The gene tree produced by lineage sorting in parent tree 1, $\mathrm{AB1}_1$, has a different set of mutation probabilities, since the branches have different expected lengths. These are:

$$n(l_1, \mathrm{AB1}_1) = n(l_2, \mathrm{AB1}_1) = \frac{1}{1 - e^{-(t_2-t_1)}}\int_0^{t_2-t_1} \left(1 - e^{-m(t_1+x)}\right)e^{-x}\,dx \tag{9}$$

$$n(l_3, \mathrm{AB1}_1) = \frac{1}{1 - e^{-(t_3-t_2)}}\int_0^{t_3-t_2} \left(1 - e^{-m(t_1+(t_2-t_1)+x)}\right)e^{-x}\,dx \tag{10}$$

$$n(l_4, \mathrm{AB1}_1) = \int_0^{t_2-t_1} \frac{e^{-y}}{1 - e^{-(t_2-t_1)}}\left(\int_0^{t_3-t_2} \frac{e^{-x}}{1 - e^{-(t_3-t_2)}}\left(1 - e^{-m((t_2-t_1)-y+x)}\right)dx\right)dy \tag{11}$$

$$n(l_5, \mathrm{AB1}_1) = \frac{1}{1 - e^{-(t_3-t_2)}}\int_0^{t_3-t_2} \left(1 - e^{-m((t_3-t_2)-x)}\right)e^{-x}\,dx \tag{12}$$

Now we consider introgression, starting with parent tree 2. Many of the mutation probabilities are symmetrical with parent tree 1 and therefore remain the same, and the remainder have the same general form with different parameters. For the ILS genealogies $\mathrm{BC2}_2$, $\mathrm{AB}_2$, and $\mathrm{AC}_2$, equations 1 and 2 have the time of A-B speciation ($t_1$) replaced with the timing of B-C introgression ($t_m$).
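This gives:

$$n_1[\mathrm{ILS}, \mathrm{pt2}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right)\frac{3}{2}\left(e^{-x} - e^{-3x}\right)dx \tag{13}$$

$$n_2[\mathrm{ILS}, \mathrm{pt2}] = \frac{1}{L}\int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right)3e^{-3x}\left(1 - e^{-((t_3-t_2)-x)}\right)dx \tag{14}$$

$$n_4[\mathrm{ILS}, \mathrm{pt2}] = n_4[\mathrm{ILS}, \mathrm{pt1}] \tag{15}$$

$$n_5[\mathrm{ILS}, \mathrm{pt2}] = n_5[\mathrm{ILS}, \mathrm{pt1}] \tag{16}$$

These correspond to the following branch mutation probabilities:

$$n_1[\mathrm{ILS}, \mathrm{pt2}] = n(l_1, \mathrm{BC2}_2) = n(l_3, \mathrm{AB}_2) = n(l_2, \mathrm{AC}_2) \tag{17}$$

$$n_2[\mathrm{ILS}, \mathrm{pt2}] = n(l_2, \mathrm{BC2}_2) = n(l_3, \mathrm{BC2}_2) = n(l_1, \mathrm{AB}_2) = n(l_2, \mathrm{AB}_2) = n(l_1, \mathrm{AC}_2) = n(l_3, \mathrm{AC}_2) \tag{18}$$

$$n_4[\mathrm{ILS}, \mathrm{pt2}] = n(l_4, \mathrm{BC2}_2) = n(l_4, \mathrm{AB}_2) = n(l_4, \mathrm{AC}_2) \tag{19}$$

$$n_5[\mathrm{ILS}, \mathrm{pt2}] = n(l_5, \mathrm{BC2}_2) = n(l_5, \mathrm{AB}_2) = n(l_5, \mathrm{AC}_2) \tag{20}$$

For the genealogy produced by lineage sorting in parent tree 2, $\mathrm{BC1}_2$, we have:

$$n(l_2, \mathrm{BC1}_2) = n(l_3, \mathrm{BC1}_2) = \frac{1}{1 - e^{-(t_2-t_m)}}\int_0^{t_2-t_m} \left(1 - e^{-m(t_m+x)}\right)e^{-x}\,dx \tag{21}$$

$$n(l_1, \mathrm{BC1}_2) = \frac{1}{1 - e^{-(t_3-t_2)}}\int_0^{t_3-t_2} \left(1 - e^{-m(t_m+(t_2-t_m)+x)}\right)e^{-x}\,dx \tag{22}$$

$$n(l_4, \mathrm{BC1}_2) = \int_0^{t_2-t_m} \frac{e^{-y}}{1 - e^{-(t_2-t_m)}}\left(\int_0^{t_3-t_2} \frac{e^{-x}}{1 - e^{-(t_3-t_2)}}\left(1 - e^{-m((t_2-t_m)-y+x)}\right)dx\right)dy \tag{23}$$

$$n(l_5, \mathrm{BC1}_2) = n(l_5, \mathrm{AB1}_1) \tag{24}$$

Finally, we consider parent tree 3. The mutation probabilities have the same formulation as parent tree 2, with two key changes: since parent tree 3 is shorter (Figure 2 of main text), $t_2$ is replaced by $t_1$, and the same replacement applies to the value of $L$, which we will denote for parent tree 3 as $L_3 = 1 + \frac{1}{2}e^{-3(t_3-t_1)} - \frac{3}{2}e^{-(t_3-t_1)}$.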
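For the ILS genealogies $\mathrm{BC2}_3$, $\mathrm{AB}_3$, and $\mathrm{AC}_3$, this gives:

$$n_1[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right)\frac{3}{2}\left(e^{-x} - e^{-3x}\right)dx \tag{25}$$

$$n_2[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right)3e^{-3x}\left(1 - e^{-((t_3-t_1)-x)}\right)dx \tag{26}$$

$$n_4[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} 3e^{-3y}\left(\int_0^{(t_3-t_1)-y} e^{-x}\left(1 - e^{-mx}\right)dx\right)dy \tag{27}$$

$$n_5[\mathrm{ILS}, \mathrm{pt3}] = \frac{1}{L_3}\int_0^{t_3-t_1} \left(1 - e^{-m((t_3-t_1)-x)}\right)3e^{-3x}\,dx \tag{28}$$

Where:

$$n_1[\mathrm{ILS}, \mathrm{pt3}] = n(l_1, \mathrm{BC2}_3) = n(l_3, \mathrm{AB}_3) = n(l_2, \mathrm{AC}_3) \tag{29}$$

$$n_2[\mathrm{ILS}, \mathrm{pt3}] = n(l_2, \mathrm{BC2}_3) = n(l_3, \mathrm{BC2}_3) = n(l_1, \mathrm{AB}_3) = n(l_2, \mathrm{AB}_3) = n(l_1, \mathrm{AC}_3) = n(l_3, \mathrm{AC}_3) \tag{30}$$

$$n_4[\mathrm{ILS}, \mathrm{pt3}] = n(l_4, \mathrm{BC2}_3) = n(l_4, \mathrm{AB}_3) = n(l_4, \mathrm{AC}_3) \tag{31}$$

$$n_5[\mathrm{ILS}, \mathrm{pt3}] = n(l_5, \mathrm{BC2}_3) = n(l_5, \mathrm{AB}_3) = n(l_5, \mathrm{AC}_3) \tag{32}$$

Finally, for the genealogy $\mathrm{BC1}_3$, the mutation probabilities are as follows:

$$n(l_2, \mathrm{BC1}_3) = n(l_3, \mathrm{BC1}_3) = \frac{1}{1 - e^{-(t_1-t_m)}}\int_0^{t_1-t_m} \left(1 - e^{-m(t_m+x)}\right)e^{-x}\,dx \tag{33}$$

$$n(l_1, \mathrm{BC1}_3) = \frac{1}{1 - e^{-(t_3-t_1)}}\int_0^{t_3-t_1} \left(1 - e^{-m(t_m+(t_1-t_m)+x)}\right)e^{-x}\,dx \tag{34}$$

$$n(l_4, \mathrm{BC1}_3) = \int_0^{t_1-t_m} \frac{e^{-y}}{1 - e^{-(t_1-t_m)}}\left(\int_0^{t_3-t_1} \frac{e^{-x}}{1 - e^{-(t_3-t_1)}}\left(1 - e^{-m((t_1-t_m)-y+x)}\right)dx\right)dy \tag{35}$$

$$n(l_5, \mathrm{BC1}_3) = \frac{1}{1 - e^{-(t_3-t_1)}}\int_0^{t_3-t_1} \left(1 - e^{-m((t_3-t_1)-x)}\right)e^{-x}\,dx \tag{36}$$

2 When does introgression make hemiplasy more likely than ILS alone?

The probability of hemiplasy with C → B introgression is

$$P_e = (1 - d)\,P_e[\mathrm{BC}_1] + d\left(P_e[\mathrm{BC1}_2] + P_e[\mathrm{BC2}_2]\right) \tag{37}$$

From this, it can be seen that, for $d > 0$, introgression makes hemiplasy more likely than ILS alone when:

$$P_e[\mathrm{BC1}_2] + P_e[\mathrm{BC2}_2] > P_e[\mathrm{BC}_1] \tag{38}$$

When is this true? Substituting the relevant expressions from the main text gives:

$$\left(1 - e^{-(t_2-t_m)}\right)v(l_4, \mathrm{BC1}_2)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC1}_2)\right) + \left(\tfrac{1}{3}e^{-(t_2-t_m)}\right)v(l_4, \mathrm{BC2}_2)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC2}_2)\right) > \left(\tfrac{1}{3}e^{-(t_2-t_1)}\right)v(l_4, \mathrm{BC}_1)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC}_1)\right) \tag{39}$$

$$\left(1 - e^{-(t_2-t_m)}\right)v(l_4, \mathrm{BC1}_2)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC1}_2)\right) > \left(\tfrac{1}{3}e^{-(t_2-t_1)}\right)v(l_4, \mathrm{BC}_1)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC}_1)\right) - \left(\tfrac{1}{3}e^{-(t_2-t_m)}\right)v(l_4, \mathrm{BC2}_2)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC2}_2)\right) \tag{40}$$

The mutation probabilities on the right side of the inequality are equal since they are on the same topology with the same branch lengths.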
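As a short worked step, this equality can be read off from Section 1, assuming that $v(\cdot)$ here denotes the same branch mutation probabilities written $n(\cdot)$ there. The exponents in equations 1 and 2 and in equations 13 and 14 reduce to the same quantity:

$$t_1 + (t_2 - t_1) + x \;=\; t_2 + x \;=\; t_m + (t_2 - t_m) + x$$

so $n_1[\mathrm{ILS}, \mathrm{pt2}] = n_1[\mathrm{ILS}, \mathrm{pt1}]$ and $n_2[\mathrm{ILS}, \mathrm{pt2}] = n_2[\mathrm{ILS}, \mathrm{pt1}]$. Together with equations 15 and 16, every branch of $\mathrm{BC2}_2$ then has the same mutation probability as the corresponding branch of $\mathrm{BC}_1$.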
Therefore, equation 40 can be simplified:

$$\left(1 - e^{-(t_2-t_m)}\right)v(l_4, \mathrm{BC1}_2)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC1}_2)\right) > v(l_4, \mathrm{BC}_1)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC}_1)\right)\left(\tfrac{1}{3}e^{-(t_2-t_1)} - \tfrac{1}{3}e^{-(t_2-t_m)}\right) \tag{41}$$

Let us assume a hybrid speciation scenario where $t_1 = t_m$. This is the most conservative introgression scenario, since Figure 4 in the main text shows that more recent introgression makes hemiplasy more likely. In this case, the right side of the inequality simplifies to 0, leaving

$$\left(1 - e^{-(t_2-t_m)}\right)v(l_4, \mathrm{BC1}_2)\prod_{i\neq 4}\left(1 - v(l_i, \mathrm{BC1}_2)\right) > 0 \tag{42}$$

This is true whenever $t_2 > t_m$, which is true by definition in this model. Therefore, introgression always makes hemiplasy more likely than ILS alone.

3 Supplementary Figures and Tables

[Figure not recovered in extraction. Tip labels: Lygosoma sp, Prasinohaema virens, Lipinia longiceps, Prasinohaema sp nov 2, Prasinohaema semoni, Lipinia noctua, Lipinia albodorsale, Lobulia brongersmai, Lobulia elegans, Prasinohaema prehensicauda, Prasinohaema sp nov 1, Prasinohaema flavipes, Fojia bumui, Lipinia pulchra, Papuascincus sp nov, Papuascincus stanleyanus. Scale bar: 0.3.]

Supplementary Figure 1: Phylogeny of green-blooded lizards and an outgroup inferred from 3220 UCE gene trees using ASTRAL (data from Rodriguez et al.
