Xenolog Classification
Charlotte Darby, Maureen Stolzer, Patrick Ropp, Daniel Barker, and Dannie Durand

S Supplementary Information

S.1 Hierarchical properties of xenolog classes

Theorem 2.1, in the main text, links relatedness in the gene tree and closeness in the xenolog hierarchy for a gene family with a single transfer and no duplications. Here we state and prove a more general version of this theorem that also accounts for multiple transfers and paraxenologs.

Prior to stating the main theorem, we introduce several lemmas. Each event type maps parent-child relationships in the gene tree to a characteristic set of relationships between the associated nodes in the species tree. Let $g_1$ and $g_2$ be the children of $g$ in $T_G$.

• If $g$ diverged by speciation ($\mathcal{E}(g) = \sigma$), then $\mathcal{M}(g_1)$ and $\mathcal{M}(g_2)$ are the children of $\mathcal{M}(g)$ in $T_S$.
• If $g$ diverged by duplication ($\mathcal{E}(g) = \delta$) and no losses occurred immediately following the duplication, then $\mathcal{M}(g_1) = \mathcal{M}(g_2) = \mathcal{M}(g)$.
• If the divergence at $g$ arose through a horizontal transfer ($\mathcal{E}(g) = \tau$) from $g$ to $g_1$, wlog, then $\mathcal{M}(g_1) \not\lessgtr \mathcal{M}(g_2)$ (the two species are incomparable) and $\mathcal{M}(g_2) = \mathcal{M}(g)$.

This mapping of parent-child relationships implies that, in the absence of transfers, nodes that are comparable in the gene tree map to nodes that are comparable in the species tree, leading to the following lemma, which we state without proof.

Lemma S.1. (Concordance of relationships in the gene and species trees) Given any $g_i$ and $g_j$ in $V_G$, if there are no transfer events on the path between $g_i$ and $g_j$, then
(a) If $\mathcal{E}(\mathrm{MRCA}(g_i,g_j)) = \sigma$ or $\mathcal{E}(\mathrm{MRCA}(g_i,g_j)) = \delta$, then $\mathcal{M}(\mathrm{MRCA}(g_i,g_j)) = \mathrm{MRCA}(\mathcal{M}(g_i),\mathcal{M}(g_j))$.
(b) If $\mathcal{E}(\mathrm{MRCA}(g_i,g_j)) = \sigma$ and $\mathcal{M}(g_i) <_S \mathcal{M}(g_j)$, then $g_i <_G g_j$.
(c) For any $g \in V_G$, if $\mathcal{E}(\mathrm{MRCA}(g_i,g_j)) = \sigma$ and $\mathrm{MRCA}(\mathcal{M}(g_i),\mathcal{M}(g_j)) <_S \mathcal{M}(g)$, then $\mathrm{MRCA}(g_i,g_j) <_G g$.

The xenolog class hierarchy arises because gene tree evolution is constrained by species tree evolution in the absence of transfers.
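The three event-type rules listed before Lemma S.1 can be checked mechanically. The Python sketch below is a minimal illustration, not the authors' implementation. It assumes a hypothetical encoding of the species tree $T_S$ as a child-to-parent dictionary and uses hypothetical species names. Given the species $\mathcal{M}(g)$, $\mathcal{M}(g_1)$, $\mathcal{M}(g_2)$ assigned by a reconciliation, it tests whether they fit the relationship stated for speciation, for duplication with no immediate losses, or for a transfer from $g$ to $g_1$.

```python
from typing import Dict, List, Optional

# Hypothetical encoding: species tree T_S as a child -> parent map,
# with the root mapped to None.
SpeciesTree = Dict[str, Optional[str]]


def lineage(tree: SpeciesTree, node: Optional[str]) -> List[str]:
    """Return node followed by its ancestors, up to and including the root."""
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = tree[node]
    return path


def comparable(tree: SpeciesTree, a: str, b: str) -> bool:
    """True if a and b are equal or one is an ancestor of the other."""
    return a in lineage(tree, b) or b in lineage(tree, a)


def fits_event(tree: SpeciesTree, event: str, m_g: str, m_g1: str, m_g2: str) -> bool:
    """Check M(g), M(g1), M(g2) against the rule for the event type E(g).

    event is "speciation", "duplication", or "transfer"; for a transfer,
    g1 is taken (wlog) to be the transferred child.
    """
    if event == "speciation":
        # M(g1) and M(g2) are the two distinct children of M(g).
        return m_g1 != m_g2 and tree[m_g1] == m_g and tree[m_g2] == m_g
    if event == "duplication":
        # No immediate losses: both children stay in the species of g.
        return m_g1 == m_g and m_g2 == m_g
    if event == "transfer":
        # The retained copy stays in M(g); the transferred copy lands in
        # a species incomparable to it.
        return m_g2 == m_g and not comparable(tree, m_g1, m_g2)
    raise ValueError(f"unknown event type: {event!r}")


# Toy species tree with hypothetical names: root -> (A, B), A -> (X, Y), B -> (W, Z)
species: SpeciesTree = {"root": None, "A": "root", "B": "root",
                        "X": "A", "Y": "A", "W": "B", "Z": "B"}

print(fits_event(species, "speciation", "A", "X", "Y"))   # True
print(fits_event(species, "duplication", "A", "A", "A"))  # True
print(fits_event(species, "transfer", "X", "Z", "X"))     # True: Z and X are incomparable
print(fits_event(species, "transfer", "A", "X", "A"))     # False: X descends from A
```

Encoding the tree as a parent map keeps ancestor tests to a simple walk toward the root, which is all these rules need; losses and time-consistency constraints are deliberately left out of the sketch.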
We therefore first show that xenolog classes have hierarchical properties in the species tree.

Lemma S.2. (Xenolog classes are hierarchical in the species tree) Let $t = (g_d, g_r)$ be a transfer in $T_G$. For any Sibling Donor (para)xenolog, $g_{SD}$, Sibling Recipient xenolog, $g_{SR}$, and Outgroup xenolog, $g_O$, of a reference gene $\hat{g} \in \Delta(g_r)$,
$$\mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SD})) <_S \mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SR})) <_S \mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_O)).$$

Proof. First inequality: Since $\mathcal{M}(g_d) \in D$ and $\mathcal{M}(g_{SD}) \in D$, $\mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SD}))$ is also in $D$. By definition, $\mathcal{M}(g_{SR}) \in R$; therefore $\mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SR})) = a_s$. Since $D$ is the set of nodes in a subtree rooted at a child of $a_s$, any node in $D$ is a descendant of $a_s$. Therefore, $\mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SD})) <_S \mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SR}))$.

“darby_xenologclassification-supplement” — 2016/11/6 — page S1 — #10

Second inequality: By definition, $\mathcal{M}(g_O) \in V_S \setminus A$, where $A = \Delta(a_s)$, and $\mathcal{M}(g_d) \in D \subset A$. Therefore, $a_s <_S \mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_O))$. Since $\mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SR})) = a_s$, it follows that $\mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_{SR})) <_S \mathrm{MRCA}(\mathcal{M}(g_d),\mathcal{M}(g_O))$.

Comparable transfers: We now generalize Theorem 2.1 to allow for duplications and multiple transfers. We first consider xenologous pairs where reference gene $\hat{g}$ and (para)xenolog $g$ are separated by $k \geq 1$ mutually comparable transfers, $t^1, t^2, \ldots, t^k$; $g$ can be either a xenolog or a paraxenolog. Therefore, the hierarchy must account not only for Primary, Sibling Donor, Sibling Recipient, and Outgroup xenologs, but also for Sibling Donor ($\mathrm{SDX^P}$) and untyped ($\mathrm{X^P}$) paraxenologs.

Theorem S.1. (General xenolog class hierarchy: Comparable transfers) Given a reconciled tree, $T_G$, let $t^* = (g_d^*, g_r^*)$ be a super-transfer in $T_G$, where $t^1 = (g_d^1, g_r^1), \ldots, t^k = (g_d^k, g_r^k)$ is an ordered sequence of $k \geq 1$ mutually comparable transfers such that $g_d^* = g_d^1$ and $g_r^* = g_r^k$.
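Let $\hat{g} \in \Delta(g_r^*)$ be a reference gene in $V_G$ and let
• $g_P \in V_G \setminus \Delta(g_r^*)$ be a Primary xenolog,
• $g_{SD(P)} \in V_G \setminus \Delta(g_r^*)$ be a Sibling Donor xenolog or paraxenolog,
• $g_{SR} \in V_G \setminus \Delta(g_r^*)$ be a Sibling Recipient xenolog,
• $g_O \in V_G \setminus \Delta(g_r^*)$ be an Outgroup xenolog, and
• $g_{XP} \in V_G \setminus \Delta(g_r^*)$ be an untyped paraxenolog
of $\hat{g}$ such that $t^*$ lies on the path from $\hat{g}$ to each of $g_P$, $g_{SD(P)}$, $g_{SR}$, $g_O$, and $g_{XP}$. Then,
$$\mathrm{MRCA}(\hat{g},g_P) <_G \mathrm{MRCA}(\hat{g},g_{SD(P)}) <_G \mathrm{MRCA}(\hat{g},g_{SR}),$$
$$\mathrm{MRCA}(\hat{g},g_{SR}) <_G \mathrm{MRCA}(\hat{g},g_O),$$
$$\mathrm{MRCA}(\hat{g},g_{SR}) <_G \mathrm{MRCA}(\hat{g},g_{XP}).$$

Proof. $\mathrm{MRCA}(\hat{g},g_P) <_G \mathrm{MRCA}(\hat{g},g_{SD(P)})$: By the Primary xenolog definition, $g_P \in \Delta(g_d^*) \setminus \Delta(g_r^1)$. Since $\hat{g} \in \Delta(g_r^*)$ and $g_r^* \leq_G g_r^1$, $\mathrm{MRCA}(\hat{g},g_P) = g_d^*$. Since $g_{SD(P)}$ is a Sibling Donor (para)xenolog, by definition, $g_{SD(P)} \notin \Delta(g_d^*)$. Therefore, $g_d^* <_G \mathrm{MRCA}(\hat{g},g_{SD(P)})$ and $\mathrm{MRCA}(\hat{g},g_P) <_G \mathrm{MRCA}(\hat{g},g_{SD(P)})$.

$\mathrm{MRCA}(\hat{g},g_{SD(P)}) <_G \mathrm{MRCA}(\hat{g},g_{SR})$: By Lemma S.2,
$$\mathrm{MRCA}(\mathcal{M}(g_d^*),\mathcal{M}(g_{SD(P)})) <_S \mathrm{MRCA}(\mathcal{M}(g_d^*),\mathcal{M}(g_{SR})). \tag{S1}$$
Since there are no transfers in $T_G$ ancestral to $g_d^*$ and Lemma S.1(a) holds for both orthologs and paralogs, Equation S1 reduces to
$$\mathrm{MRCA}(g_d^*,g_{SD(P)}) <_G \mathrm{MRCA}(g_d^*,g_{SR}).$$
Since $g_d^*$ is on the path from $\hat{g}$ to $g_{SD(P)}$ and on the path from $\hat{g}$ to $g_{SR}$, $\mathrm{MRCA}(g_d^*,g_{SD(P)}) = \mathrm{MRCA}(\hat{g},g_{SD(P)})$ and $\mathrm{MRCA}(g_d^*,g_{SR}) = \mathrm{MRCA}(\hat{g},g_{SR})$, yielding
$$\mathrm{MRCA}(\hat{g},g_{SD(P)}) <_G \mathrm{MRCA}(\hat{g},g_{SR}).$$

$\mathrm{MRCA}(\hat{g},g_{SR}) <_G \mathrm{MRCA}(\hat{g},g_O)$: By Lemma S.2,
$$\mathrm{MRCA}(\mathcal{M}(g_d^*),\mathcal{M}(g_{SR})) <_S \mathrm{MRCA}(\mathcal{M}(g_d^*),\mathcal{M}(g_O)). \tag{S2}$$
Since no divergences above $g_d^*$ are transfers, by Lemma S.1, Equation S2 reduces to
$$\mathrm{MRCA}(g_d^*,g_{SR}) <_G \mathrm{MRCA}(g_d^*,g_O).$$
As above, $\mathrm{MRCA}(g_d^*,g_{SR}) = \mathrm{MRCA}(\hat{g},g_{SR})$ and $\mathrm{MRCA}(g_d^*,g_O) = \mathrm{MRCA}(\hat{g},g_O)$, yielding $\mathrm{MRCA}(\hat{g},g_{SR}) <_G \mathrm{MRCA}(\hat{g},g_O)$.

$\mathrm{MRCA}(\hat{g},g_{SR}) <_G \mathrm{MRCA}(\hat{g},g_{XP})$: Since $\mathcal{M}(g_d^*) \in D^*$ and $\mathcal{M}(g_{SR}) \in R^*$, $\mathrm{MRCA}(\mathcal{M}(g_d^*),\mathcal{M}(g_{SR})) = a_s$. Applying Lemma S.1(a) and substituting $\mathrm{MRCA}(\hat{g},g_{SR})$ for $\mathrm{MRCA}(g_d^*,g_{SR})$, we obtain $\mathcal{M}(\mathrm{MRCA}(\hat{g},g_{SR})) = a_s$.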
Since $\mathcal{E}(\mathrm{MRCA}(\hat{g},g_{SR})) = \sigma$, all descendants of $\mathrm{MRCA}(\hat{g},g_{SR})$ are in species that are descendants of $a_s$. For $g_{XP}$, by definition, $s_{DUP} \geq_S a_s$ and $\mathcal{M}(\mathrm{MRCA}(\hat{g},g_{XP})) \geq_S a_s$. Since $\mathcal{E}(\mathrm{MRCA}(\hat{g},g_{XP})) = \delta$, the children of $\mathrm{MRCA}(\hat{g},g_{XP})$ are also in $a_s$ or in a species ancestral to $a_s$.

Suppose that $\mathrm{MRCA}(\hat{g},g_{SR}) \geq_G \mathrm{MRCA}(\hat{g},g_{XP})$. Then the children of $\mathrm{MRCA}(\hat{g},g_{SR})$ could not be descendants of the children of $\mathrm{MRCA}(\hat{g},g_{XP})$; rather, because the two nodes have different event types and are therefore distinct, the children of $\mathrm{MRCA}(\hat{g},g_{XP})$ would be descendants of $\mathrm{MRCA}(\hat{g},g_{SR})$. Since $\mathcal{M}(C(\mathrm{MRCA}(\hat{g},g_{XP}))) \geq_S a_s$, the children of $\mathrm{MRCA}(\hat{g},g_{XP})$ would be in a species $s \geq_S a_s$. However, $\mathcal{M}(C(\mathrm{MRCA}(\hat{g},g_{SR})))$ must be a descendant of $a_s$, leading to a contradiction. Therefore, $\mathrm{MRCA}(\hat{g},g_{SR}) <_G \mathrm{MRCA}(\hat{g},g_{XP})$.

Fig. S1. Untyped Paraxenolog classification: (left) Gene tree with one duplication followed by one transfer, shown in the context of the species tree. The duplication occurred in the transfer cenancestor. (right) The reconciled gene tree. Each leaf is annotated with its xenolog class. Internal nodes on the path from $\hat{g}$ to the root are the common ancestors of $\hat{g}$ and, respectively, the Primary xenolog, Sibling Donor xenolog, Sibling Recipient xenologs, untyped paraxenologs, and Outgroup xenolog. The progression of these labels satisfies the hierarchy $\mathrm{PX} <_X \mathrm{SDX} <_X \mathrm{SRX} <_X \mathrm{X^P} <_X \mathrm{OX}$, consistent with Theorem S.1.

Transfers are incomparable: An untyped Incomparable xenolog will always be less closely related than a Primary xenolog to a given reference gene. Otherwise, untyped Incomparable xenologs may fall anywhere in the hierarchy. We state the following theorem for three incomparable transfers. Extension to more complex scenarios with incomparable super-transfers is straightforward.

Theorem S.2.
(General xenolog class hierarchy: Incomparable transfers) Consider a reconciled gene tree, $T_G$, with three mutually incomparable transfers, $t^1 = (g_d^1, g_r^1)$, $t^2 = (g_d^2, g_r^2)$, and $t^3 = (g_d^3, g_r^3)$. Given a reference gene, $\hat{g} \in \Delta(g_r^1)$, let $g_P \in \Delta(g_r^2)$ be a Primary xenolog of $\hat{g}$, and let $g_{I(P)} \in \Delta(g_r^3)$ be an untyped Incomparable (para)xenolog of $\hat{g}$. Then,
$$\mathrm{MRCA}(\hat{g},g_P) <_G \mathrm{MRCA}(\hat{g},g_{I(P)}).$$

Proof. By definition of Primary xenologs, $g_P$ is in $\Delta(g_d^1) \setminus \Delta(g_r^1)$. Therefore, $\mathrm{MRCA}(\hat{g},g_P) = g_d^1$. By definition of untyped Incomparable xenologs, $g_{I(P)}$ is not in $\Delta(g_d^1)$. Therefore, $\mathrm{MRCA}(\hat{g},g_{I(P)}) >_G g_d^1$. Thus, $\mathrm{MRCA}(\hat{g},g_{I(P)}) >_G \mathrm{MRCA}(\hat{g},g_P)$.

S.2 Comparable multiple transfers can form a loop in the species tree

Recall that an ordered set of comparable transfers $t^1, t^2, \ldots, t^k$ can be replaced by a single super-transfer, $t^* = (g_d^*, g_r^*)$, where $g_d^* = g_d^1$ and $g_r^* = g_r^k$.
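Replacing comparable transfers by a super-transfer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The gene tree $T_G$ is encoded as a hypothetical child-to-parent dictionary. Each transfer $t^i = (g_d^i, g_r^i)$ is assumed to be a gene-tree edge, so $g_r^i$ is a child of $g_d^i$. "Ordered and mutually comparable" is approximated by requiring each later donor $g_d^{i+1}$ to lie in $\Delta(g_r^i)$. The toy tree and all node names are hypothetical. The example also checks that a gene in $\Delta(g_d^*) \setminus \Delta(g_r^1)$, a Primary xenolog of the reference gene, has its MRCA with that gene at $g_d^*$, the fact used at the start of the proof of Theorem S.1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: gene tree T_G as a child -> parent map (root -> None).
GeneTree = Dict[str, Optional[str]]
Transfer = Tuple[str, str]  # (donor g_d, recipient g_r)


def lineage(tree: GeneTree, node: Optional[str]) -> List[str]:
    """Return node followed by its ancestors, up to and including the root."""
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = tree[node]
    return path


def mrca(tree: GeneTree, a: str, b: str) -> str:
    """First node on a's root path that is also on b's root path."""
    on_b = set(lineage(tree, b))
    for node in lineage(tree, a):
        if node in on_b:
            return node
    raise ValueError("nodes are not in the same tree")


def super_transfer(tree: GeneTree, transfers: List[Transfer]) -> Transfer:
    """Collapse an ordered sequence t^1, ..., t^k into t* = (g_d^1, g_r^k).

    Assumed checks: each recipient is a child of its donor, and each later
    donor g_d^{i+1} lies in Delta(g_r^i), the subtree of the previous recipient.
    """
    if not transfers:
        raise ValueError("need at least one transfer")
    for donor, recipient in transfers:
        if tree.get(recipient) != donor:
            raise ValueError(f"{recipient} is not a child of {donor}")
    for (_, prev_recipient), (next_donor, _) in zip(transfers, transfers[1:]):
        if prev_recipient not in lineage(tree, next_donor):
            raise ValueError("transfers are not ordered along one lineage")
    return transfers[0][0], transfers[-1][1]


# Hypothetical toy gene tree with two comparable transfers, a -> b and c -> d.
gene: GeneTree = {"root": None, "x": "root", "a": "root", "p": "a", "b": "a",
                  "y": "b", "c": "b", "q": "c", "d": "c", "h1": "d", "h2": "d"}

t_star = super_transfer(gene, [("a", "b"), ("c", "d")])
print(t_star)                 # ('a', 'd')
# p lies in Delta(a) \ Delta(b), so relative to the reference gene h1 it
# plays the role of a Primary xenolog; its MRCA with h1 is the donor of t*.
print(mrca(gene, "h1", "p"))  # a
```

Passing the transfers in the wrong order, for example `[("c", "d"), ("a", "b")]`, raises `ValueError` because `d` is not an ancestor of `a`.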