Introduction: PCMs Modeling PCMs with Hybridization Some ongoing work
Modeling Phylogenetic Comparative Methods with Hybridization
Tony Jhwueng
NIMBioS Interdisciplinary Seminar
Jan 25 2011
Outline:
1. Introduction: Phylogenetic Comparative Methods (PCMs).
2. Modeling PCMs with Hybridization. Develop possible comparative methods when there are ancient hybridization events in addition to the usual speciation events.
3. Some ongoing work.
Phylogenetic Comparative Methods
[Figure: phylogenetic tree of Human (Akha), Chimpanzee and Gorilla, with divergence times of 4.7 million years ago and 7.2 million years ago (Takahata et al., 1995).]
Comparative Data
Examples: body mass (adult male, kg) and brain mass (adult male, g).
\[ X_{body} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 55.5 \\ 56.7 \\ 172.4 \end{pmatrix}, \qquad Y_{brain} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1361 \\ 440 \\ 570 \end{pmatrix}. \]
Data from Jerison (1973)
Phylogenetic Comparative Methods (PCMs)
Phylogenetic Comparative Methods (PCMs) are statistical methods that incorporate a phylogenetic tree into the analysis of comparative data; they are widely used in the ecology and evolution literature.
Phylogenetic Comparative Methods (PCMs)
Phylogenetic comparative methods are commonly applied to such questions as:
1. What is the slope of an allometric scaling relationship? e.g. how does brain mass vary in relation to body mass?
2. What was the ancestral state of a trait? (Schluter et al. 1997; Hardy 2006; Ronquist 2004.) e.g. where did endothermy (warm-bloodedness) evolve in the lineage that led to mammals?
A PCM Developed from an Evolutionary Perspective
Such methods rely directly on explicit assumptions regarding the evolutionary process.
1. FIC (Felsenstein's independent contrasts; Felsenstein 1985): derived directly from population genetic theory; requires the assumption that the traits of interest have evolved via a Brownian motion process.
Brownian motion process: $dy_t = \sigma\, dB_t$
σ (rate parameter) measures the intensity of the random fluctuations in the evolutionary process.
[Figure: two simulated Brownian motion paths, one panel with σ = 1 and one with σ = 3; horizontal axis: time t (0 to 8000); vertical axis: trait value y(t) (−400 to 400).]
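The two panels can be imitated with a few lines of Python. This is only a minimal sketch: the function name `simulate_bm`, the unit time step, and the seed are assumptions made for illustration and are not taken from the talk.

```python
import numpy as np

def simulate_bm(sigma, n_steps, dt=1.0, y0=0.0, seed=0):
    """Simulate one path of dy_t = sigma dB_t on a regular time grid."""
    rng = np.random.default_rng(seed)
    # Increments over a step of length dt are independent N(0, sigma^2 * dt).
    increments = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    # The path is the running sum of the increments, started at y0.
    return y0 + np.concatenate(([0.0], np.cumsum(increments)))

# Same random draws, different rate parameter (hypothetical settings).
path_slow = simulate_bm(sigma=1.0, n_steps=8000)
path_fast = simulate_bm(sigma=3.0, n_steps=8000)
```

Because both calls use the same seed, `path_fast` is exactly three times `path_slow`, which makes the role of σ as a scale on the fluctuations easy to see.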
Comparative Data Are NOT Statistically Independent
Since the species are related by shared evolutionary history, it may not be reasonable to view comparative data as independent, identically distributed realizations of the same stochastic process. The distribution of comparative data depends on the stochastic process assumed for trait evolution.
Representing a Phylogenetic Tree by a Similarity Matrix G
Scale the phylogenetic tree so that the length from the root to each tip is 1.
The relationship between the traits of a pair of species is measured by their shared branch length (time).
[Figure: tree with tips y1 (Human), y2 (Chimp.), y3 (Gorilla); branch labels 0.4, 0.4 and 0.6.]
\[ G_3 = \begin{pmatrix} 1 & 0.6 & 0 \\ 0.6 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{(rows and columns ordered } y_1, y_2, y_3\text{)} \]
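The shared-branch-length rule can be sketched in Python as follows. The branch names are hypothetical labels, and the lengths assume an internal Human/Chimp. branch of 0.6 with terminal branches of 0.4, which is what the entries of G3 imply.

```python
import numpy as np

# Each tip is described by the branches on its root-to-tip path.
# Branch names are hypothetical; lengths are assumed from the entries of G3.
branch_length = {"root-hc": 0.6, "hc-human": 0.4, "hc-chimp": 0.4, "root-gorilla": 1.0}
paths = {
    "Human": ["root-hc", "hc-human"],
    "Chimp.": ["root-hc", "hc-chimp"],
    "Gorilla": ["root-gorilla"],
}

def similarity_matrix(paths, branch_length):
    """G[i, j] is the total length of the branches shared by tips i and j."""
    tips = list(paths)
    G = np.zeros((len(tips), len(tips)))
    for i, a in enumerate(tips):
        for j, b in enumerate(tips):
            shared = set(paths[a]) & set(paths[b])
            G[i, j] = sum(branch_length[e] for e in shared)
    return tips, G

tips, G3 = similarity_matrix(paths, branch_length)
```

The diagonal comes out as 1 because every root-to-tip path has total length 1 after scaling.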
FIC (Felsenstein, 1985): the trait has evolved under Brownian motion
Under Brownian motion, the variance of the trait value is proportional to time.
[Figure: tree with tips y1 (Human), y2 (Chimp.), y3 (Gorilla); branch labels 0.4, 0.4 and 0.6.]
\[ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \sim MVN\left( \begin{pmatrix} \mu \\ \mu \\ \mu \end{pmatrix},\ \sigma^2 \begin{pmatrix} 1 & 0.6 & 0 \\ 0.6 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right) \]
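For readers unfamiliar with independent contrasts, the calculation for a three-taxon tree ((1, 2), 3) can be sketched as below. This follows the standard pruning recipe under Brownian motion; the function name and the default branch lengths (0.4, 0.4, 0.6, 1) are assumptions chosen to match the matrix above, not code from the talk.

```python
import math

def contrasts_three_taxa(y1, y2, y3, v1=0.4, v2=0.4, v_internal=0.6, v3=1.0):
    """Standardized independent contrasts for the tree ((1, 2), 3)."""
    # Contrast between the two sister tips, scaled by its standard deviation.
    c1 = (y1 - y2) / math.sqrt(v1 + v2)
    # Weighted estimate of the trait at their common ancestor.
    anc = (y1 / v1 + y2 / v2) / (1.0 / v1 + 1.0 / v2)
    # The ancestor's branch is lengthened to account for estimation error.
    v_anc = v_internal + v1 * v2 / (v1 + v2)
    # Contrast between the estimated ancestor and the remaining tip.
    c2 = (anc - y3) / math.sqrt(v_anc + v3)
    return c1, c2

# Hypothetical trait values, chosen so that the second contrast vanishes.
c1, c2 = contrasts_three_taxa(2.0, 4.0, 3.0)
```

Under the Brownian motion assumption the two contrasts are independent with equal variance, which is what allows ordinary statistics to be applied to them.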
Other PCMs
Developed from an evolutionary perspective:
2. PGLS (Martins and Hansen 1997; Butler and King 2004; Hansen 2008): expands the assumptions of BM to allow for other evolutionary scenarios, such as the OU process for stabilizing selection.
3. PMM (Lynch 1991; Housworth et al. 2004): derived from quantitative genetics; partitions phenotypic variation into phylogenetically heritable and nonheritable components.
Developed from a statistical perspective:
4. ARM: spatial autoregressive method (Cheverud et al. 1985; Gittleman and Kot 1990).
5. PVR: phylogenetic eigenvector regression (Diniz-Filho et al. 1998).
Some useful textbooks and software for PCMs:
The Comparative Method in Evolutionary Biology by Harvey and Pagel, 1991.
Inferring Phylogenies by Joseph Felsenstein, 2004, Ch 26.
Analysis of Phylogenetics and Evolution with R by Emmanuel Paradis, 2006.
Software:
PHYLIP (Joseph Felsenstein)
COMPARE (Emília Martins)
BROWNIE (Brian O'Meara)
Various R packages at http://cran.r-project.org/web/views/Phylogenetics.html.
Modeling PCMs with Hybridization
Hybridization is common in nature: Sunflower
Wild Sunflowers in a Field
(photo by Erin Silversmith)
Hybridization is common in nature: Sunflower
Helianthus annuus. Helianthus petiolaris.
Hybridization is common in nature: Sunflower
L. H. Rieseberg (1991) provided evidence that H. anomalus is a hybrid species derived from H. annuus and H. petiolaris.
Hybridization is common in nature: Cichlid
Lake Tanganyika (African Great Lake: the second largest freshwater lake in the world).
(picture from Wikipedia)
Hybridization is common in nature: Cichlid
A typical shell nest constructed by large Lamprologus callipterus males. These aggregations attract different species of obligatory and facultative gastropod-shell breeders, which consequently live and breed in close vicinity. (photo from Koblmüller et al. 2007)
Hybridization is common in nature: Cichlid
Hybrid cichlids from Lake Tanganyika
Lamprologus meleagris (photo by Hagblom, Fredrik). Lamprologus speciosus (photo by Slaboch, Roman).
Hybridization is common in nature: Cichlid
Hybrid cichlids from Lake Tanganyika
Neolamprologus fasciatus (photo by Jensen, Johnny). Neolamprologus multifasciatus (photo by Gagliardi, Flavio).
Purpose of this project
Hybrid species are known to share some phenotypes with their parents. When studying a trait in a group of related species that includes multiple hybrids:
Question 1: are hybrids constrained to lie between their parents, or does hybridization allow them to break free from these constraints?
[Figure: a trait-value axis with Species 1 and Species 2 marked and three candidate positions for the hybrid trait, each marked "?".]
Purpose of this project
Some exotic species can evolve rapidly especially after hybridization (Barrett and Richardson 1982).
Question 2: When studying a trait in a group of related species that includes multiple hybrids, does hybridization increase the rate of evolution?
Modeling PCMs with Hybridization
If evolution involves ancient hybridizations (reticulate evolutionary events), incorporate a phylogenetic network, instead of a phylogenetic tree, into the comparative analysis.
Linder and Rieseberg 2004.
Assume the nonhybrid taxa are normal diploid organisms, in which each chromosome consists of a pair of homologs. In a diploid hybridization event, the hybrid inherits one of the two homologs of each chromosome from each of its two parents.
[Figure: Species 1, Hybrid, Species 2.]
A 3 taxa network
A nucleotide inherited from the A parent (species 1 at t1) of hybrid B (hybrid at t1) will be part of the subtree in which species 1 and the hybrid are sister taxa.
[Figure: 3-taxon network with root O, internal nodes A, B, C, tips Species 1, hybrid, Species 2, and time labels t1, t2.]
A 3 taxa network
A nucleotide inherited from the C parent (species 2 at t1) of hybrid B (hybrid at t1) will be part of the subtree in which species 2 and the hybrid are sister taxa.
[Figure: 3-taxon network with root O, internal nodes A, B, C, tips Species 1, hybrid, Species 2, and time labels t1, t2.]
A 3 taxa network
[Figure: 3-taxon network with root O, internal nodes A, B, C, tips Species 1, hybrid, Species 2, and time labels t1, t2.]
The affinity between the hybrid and other species
Let r denote the trait of the hybrid, and let x1 and x2 denote the traits of species 1 and 2, respectively. Define
\[ r = \mu + \beta (x_1 + x_2 - 2\mu), \]
where β is called the hybrid parameter. With β = 1/2, r is the average of x1 and x2. Then
\[ \mathrm{cov}(r, z) = \mathrm{cov}(\beta (x_1 + x_2), z), \qquad z = x_1, r, x_2. \]
The new PCMs are associated with the hybrid parameter β.
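Expanding the covariance by bilinearity makes the role of β explicit. The following is a short worked derivation from the definition of r; the simplification at the end assumes, purely for illustration, that the two parental lineages are independent at the time of hybridization and have equal variance $v\sigma^2$.

```latex
\begin{align*}
\mathrm{cov}(r, x_1) &= \beta\,\mathrm{var}(x_1) + \beta\,\mathrm{cov}(x_2, x_1), \\
\mathrm{cov}(r, x_2) &= \beta\,\mathrm{cov}(x_1, x_2) + \beta\,\mathrm{var}(x_2), \\
\mathrm{var}(r)      &= \beta^2 \left[ \mathrm{var}(x_1) + \mathrm{var}(x_2) + 2\,\mathrm{cov}(x_1, x_2) \right].
\end{align*}
% Illustrative special case (an assumption):
% var(x_1) = var(x_2) = v \sigma^2 and cov(x_1, x_2) = 0 give
\begin{align*}
\mathrm{cov}(r, x_1) = \mathrm{cov}(r, x_2) = \beta v \sigma^2, \qquad
\mathrm{var}(r) = 2 \beta^2 v \sigma^2 .
\end{align*}
```

So β enters the parent-hybrid covariances linearly and the hybrid variance quadratically.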
Phylogenetic Tree and Phylogenetic Network in BM model
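[Figure: left, a tree with tips y1 (Human), y2 (Chimp.), y3 (Gorilla); right, a network with tips x1 (H. a.), r (H. an., the hybrid), x2 (H. p.); branch labels 0.4, 0.6 and 1.]
Tree (rows and columns ordered y1, y2, y3):
\[ \begin{pmatrix} 1 & 0.4 & 0 \\ 0.4 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
Network (rows and columns ordered x1, r, x2):
\[ \begin{pmatrix} 1 & 0.4\beta & 0 \\ 0.4\beta & 0.6 + 0.8\beta^2 & 0.4\beta \\ 0 & 0.4\beta & 1 \end{pmatrix} \]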
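One way to check the network matrix is to write each tip trait as a linear combination of independent Brownian increments, one per branch, and let the algebra produce the covariances. The sketch below assumes µ = 0 without loss of generality, parental branches of length 0.4 before the hybridization, and terminal branches of length 0.6; the function name and the branch ordering are illustrative choices.

```python
import numpy as np

def network_similarity(beta, t_hyb=0.4, t_tip=0.6):
    """Similarity matrix of (x1, r, x2) for the 3-taxon network under BM."""
    # Columns: independent increments on five branches
    # (root->A, root->C, A->x1, B->r, C->x2); rows: tips x1, r, x2.
    A = np.array([
        [1.0,  0.0,  1.0, 0.0, 0.0],   # x1 = e1 + e3
        [beta, beta, 0.0, 1.0, 0.0],   # r  = beta * (e1 + e2) + e4
        [0.0,  1.0,  0.0, 0.0, 1.0],   # x2 = e2 + e5
    ])
    # Each increment has variance equal to its branch length (times sigma^2).
    lengths = np.array([t_hyb, t_hyb, t_tip, t_tip, t_tip])
    return A @ np.diag(lengths) @ A.T

G_half = network_similarity(0.5)
```

For β = 0.5 the hybrid's variance is 0.6 + 0.8 · 0.25 = 0.8, smaller than the parental variance of 1, because averaging two independent parents reduces variance.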
Investigating the hybrid effect
Test of assumption:
Question 1: Given trait data and evolutionary evidence (a network), does the trait of the hybrid at its origin fall between those of its parents, with statistical significance?
\[ r = \mu + \beta (x_1 + x_2 - 2\mu) \]
Hypothesis testing: $H_0 : \beta = \frac{1}{2}$
A simple approach for 3 taxa case
[Figure: the 3-taxon network with tip trait values 3, 7, 11; branch labels 0.6, 0.6, 0.6 and 0.4, 0.4.]
Input trait data: (3, 7, 11)
A simple approach for 3 taxa case
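Statistical Model
\[ \begin{pmatrix} 3 \\ 7 \\ 11 \end{pmatrix} \sim MVN\left( \begin{pmatrix} \mu \\ \mu \\ \mu \end{pmatrix},\ \sigma^2 \begin{pmatrix} 1 & 0.4\beta & 0 \\ 0.4\beta & 0.6 + 0.8\beta^2 & 0.4\beta \\ 0 & 0.4\beta & 1 \end{pmatrix} \right) \]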
A simple approach for 3 taxa case
Parameters are estimated by maximum likelihood, and confidence intervals are obtained by simulation.
\[ \hat{\mu} = 6.78\ (5.94, 8.32), \qquad \hat{\sigma} = 4.12\ (2.97, 5.88), \qquad \hat{\beta} = 0.47\ (0.31, 0.65) \]
Answer for Question 1 ($H_0 : \beta = 0.5$): Since 0.5 falls in the confidence interval (0.31, 0.65), the null hypothesis is not rejected. There is no significant evidence in these data (3, 7, 11) that the trait of the hybrid is more extreme than those of its parents.
Investigating the hybrid effect
Question 2: Does hybridization increase the rate of evolution?
[Figure: two copies of the 3-taxon network with tips x1 (H. a.), r (H. an.), x2 (H. p.); branch labels 0.6 and 0.4.]
Investigating the hybrid effect
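Test for different rates of evolution (McPeek 1995; O'Meara et al. 2006).
Same rate:
\[ \sigma^2 \begin{pmatrix} 1 & 0.4\beta & 0 \\ 0.4\beta & 0.8\beta^2 + 0.6 & 0.4\beta \\ 0 & 0.4\beta & 1 \end{pmatrix} \]
Different rate on the hybrid branch:
\[ \sigma^2 \begin{pmatrix} 1 & 0.4\beta & 0 \\ 0.4\beta & 0.8\beta^2 & 0.4\beta \\ 0 & 0.4\beta & 1 \end{pmatrix} + \eta^2 \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0.6 & 0 \\ 0 & 0 & 0 \end{pmatrix} \]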
Investigating the hybrid effect
Statistical models
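\[ Y \sim MVN(\mu \mathbf{1},\ \sigma^2 G_\beta) \qquad \text{(same rate)} \]
vs.
\[ Y \sim MVN(\mu \mathbf{1},\ \sigma^2 (G_\beta - J) + \eta^2 J) \qquad \text{(diff. rate)} \]
Here J collects the part of $G_\beta$ that evolves at rate $\eta^2$.
Hypothesis testing:
\[ H_0 : \sigma^2 = \eta^2 \]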
A simple approach for 3 taxa case
Answer for Question 2 ($H_0 : \sigma^2 = \eta^2$):
\[ Y \sim MVN(\mu \mathbf{1},\ \sigma^2 G_\beta) \]
vs.
\[ Y \sim MVN(\mu \mathbf{1},\ \sigma^2 (G_\beta - J) + \eta^2 J) \]
By the likelihood ratio test (LRT): since
\[ -2 \log \frac{L_{\hat{\beta}}}{L_{\hat{\beta}, \hat{\eta}}} = 2.69 < \chi^2_{df=1} = 3.84 \quad (P \text{ value} = 0.101), \]
we fail to reject the null hypothesis. Hence there is no significant evidence that the hybrid evolves at a different rate from the other species.
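The P value quoted for the likelihood ratio test can be reproduced with the standard library alone. The sketch below uses the fact that a chi-square variable with one degree of freedom is a squared standard normal; the helper names and the log-likelihood values in the usage line are hypothetical, chosen only so that the statistic equals 2.69.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # If Z ~ N(0, 1) then Z^2 ~ chi-square(1), so P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return the LRT statistic -2 (log L0 - log L1) and its df = 1 P value."""
    stat = -2.0 * (loglik_null - loglik_alt)
    return stat, chi2_sf_df1(stat)

# P value for the statistic reported on the slide.
p_value = chi2_sf_df1(2.69)

# Hypothetical log-likelihoods that give the same statistic.
stat, p_from_logliks = likelihood_ratio_test(-10.0, -8.655)
```

The statistic is compared with the 5% critical value 3.84, at which the same function returns approximately 0.05.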
Phylogenetic Network
[Figure: phylogenetic network with tips 1 to 19 and X1, X2, X3, internal nodes 20 to 49, and root 50.]
Algorithm for general construction of similarity matrix.
1. Input: phylogenetic network in eNewick format.
2. Construct the similarity matrix through the following recursive formula (BM):
\[ G_{\beta,k} = K_{k-1}\, G_{\beta,k-1}\, K_{k-1}^{t} + t_{k-1} I_k \]
where K depends on whether the event is a hybridization ($K_h$) or a speciation ($K_s$).
A quick look at the algorithm for 4 taxa
[Figure: 4-taxon network with tips 1, 2, 3, 4, internal nodes 5, 6, 7, 8, root 9, and time intervals t1, t2, t3.]
[9] → [5, 8]: Speciation
[Figure: 4-taxon network with tips 1, 2, 3, 4, internal nodes 5, 6, 7, 8, root 9, and time intervals t1, t2, t3.]
[9] → [5, 8]: Speciation
\[ G_2 = \begin{pmatrix} t_1 & 0 \\ 0 & t_1 \end{pmatrix} \qquad \text{(rows and columns ordered 5, 8)} \]
[5, 8] → [5, 7, 4]: Speciation
[Figure: 4-taxon network with tips 1, 2, 3, 4, internal nodes 5, 6, 7, 8, root 9, and time intervals t1, t2, t3.]
[5, 8] → [5, 7, 4]: Speciation
\[ G_3 = \begin{pmatrix} t_1 + t_2 & 0 & 0 \\ 0 & t_1 + t_2 & t_1 \\ 0 & t_1 & t_1 + t_2 \end{pmatrix} \qquad \text{(rows and columns ordered 5, 7, 4)} \]
[5, 7, 4] → [5, 6, 7, 4]: Hybridization
[Figure: 4-taxon network with tips 1, 2, 3, 4, internal nodes 5, 6, 7, 8, root 9, and time intervals t1, t2, t3.]
[5, 7, 4] → [5, 6, 7, 4]: Hybridization
\[ G^{0}_{\beta,4} = \begin{pmatrix} t_1 + t_2 & \beta(t_1 + t_2) & 0 & 0 \\ \beta(t_1 + t_2) & 2\beta^2 (t_1 + t_2) & \beta(t_1 + t_2) & \beta t_1 \\ 0 & \beta(t_1 + t_2) & t_1 + t_2 & t_1 \\ 0 & \beta t_1 & t_1 & t_1 + t_2 \end{pmatrix} \qquad \text{(rows and columns ordered 5, 6, 7, 4)} \]
[5, 6, 7, 4] → [1, 2, 3, 4]: Elongation
[Figure: 4-taxon network with tips 1, 2, 3, 4, internal nodes 5, 6, 7, 8, root 9, and time intervals t1, t2, t3.]
[5, 6, 7, 4] → [1, 2, 3, 4]: Elongation
\[ G_{\beta,4} = \begin{pmatrix} t_1 + t_2 + t_3 & \beta(t_1 + t_2) & 0 & 0 \\ \beta(t_1 + t_2) & t_3 + 2\beta^2 (t_1 + t_2) & \beta(t_1 + t_2) & \beta t_1 \\ 0 & \beta(t_1 + t_2) & t_1 + t_2 + t_3 & t_1 \\ 0 & \beta t_1 & t_1 & t_1 + t_2 + t_3 \end{pmatrix} \qquad \text{(rows and columns ordered 1, 2, 3, 4)} \]
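The four steps of this walk-through can be replayed numerically with the recursion $G_k = K G_{k-1} K^t + t I$. The sketch below is one possible reading: the explicit forms of the speciation and hybridization matrices are assumptions (the slides do not show them), chosen so that each intermediate matrix agrees with the ones displayed, and the hybridization step is assumed to add no time.

```python
import numpy as np

def step(G, K, t):
    """One step of the recursion G_k = K G_{k-1} K^t + t I."""
    return K @ G @ K.T + t * np.eye(K.shape[0])

def four_taxon_similarity(beta, t1, t2, t3):
    G = np.zeros((1, 1))                      # root 9
    K_s1 = np.array([[1.0], [1.0]])           # speciation: 9 -> [5, 8]
    G = step(G, K_s1, t1)
    K_s2 = np.array([[1.0, 0.0],              # 5 stays
                     [0.0, 1.0],              # 8 -> 7
                     [0.0, 1.0]])             # 8 -> 4
    G = step(G, K_s2, t2)
    K_h = np.array([[1.0, 0.0, 0.0],          # 5
                    [beta, beta, 0.0],        # 6 = beta * (5 + 7)
                    [0.0, 1.0, 0.0],          # 7
                    [0.0, 0.0, 1.0]])         # 4
    G = step(G, K_h, 0.0)                     # hybridization adds no time
    return step(G, np.eye(4), t3)             # elongation to tips [1, 2, 3, 4]

# Hypothetical parameter values for a numerical check.
G4 = four_taxon_similarity(beta=0.5, t1=0.2, t2=0.3, t3=0.5)
```

With these values the hybrid's variance is t3 + 2β²(t1 + t2) = 0.5 + 0.25 = 0.75, and its covariance with tip 4 is βt1 = 0.1, in line with the final matrix.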
Statistical Model and Parameters Estimation
Statistical model:
\[ Y \sim MVN(\mu \mathbf{1},\ \sigma^2 G_\beta) \]
Negative log-likelihood function:
\[ \ell(\mu, \sigma^2, \beta \mid Y) = \frac{n}{2} \log 2\pi + \frac{n}{2} \log \sigma^2 + \frac{1}{2} \log |G_\beta| + \frac{1}{2\sigma^2} (Y - \mu \mathbf{1})^{t} G_\beta^{-1} (Y - \mu \mathbf{1}) \]
Maximum likelihood estimation uses a derivative-free method: golden section search.
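A minimal sketch of how this estimation could be organised is given below. It is not the author's implementation: it assumes that µ and σ² are profiled out in closed form for a fixed similarity matrix, so that only β is left for the one-dimensional golden section search, and all function names are illustrative.

```python
import numpy as np

def neg_log_lik(mu, sigma2, G, y):
    """Negative log-likelihood of y ~ MVN(mu * 1, sigma2 * G)."""
    n = len(y)
    resid = y - mu
    _, logdet = np.linalg.slogdet(G)
    quad = resid @ np.linalg.solve(G, resid)
    return 0.5 * (n * np.log(2 * np.pi) + n * np.log(sigma2) + logdet + quad / sigma2)

def profile_estimates(G, y):
    """Closed-form maximizers of the likelihood in mu and sigma^2 for a fixed G."""
    ones = np.ones(len(y))
    mu = (ones @ np.linalg.solve(G, y)) / (ones @ np.linalg.solve(G, ones))
    resid = y - mu
    sigma2 = resid @ np.linalg.solve(G, resid) / len(y)
    return mu, sigma2

def golden_section_min(f, a, b, tol=1e-8):
    """Minimize a unimodal function f on [a, b] without derivatives."""
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d          # the minimum lies in [a, d]
        else:
            a = c          # the minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0
```

To estimate β, one would pass `golden_section_min` a function that builds $G_\beta$, calls `profile_estimates`, and returns `neg_log_lik` at those values. Golden section search only guarantees a local minimum when the profile is not unimodal on the chosen interval, so the interval should be checked.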
Application: Koblmüller et al. (2007)
Cichlids in Lake Tanganyika: 5 hybrids (in bold)
Phylogenetic Network: redrawn from Koblmüller et al. (2007)
[Figure: phylogenetic network with tips 1 to 19 and X1, X2, X3, internal nodes 20 to 49, and root 50.]
Input
eNewick format:
((((((1 : 0.495, ((((2 : 0.215, (5 : 0.215)27 : 0)26 : 0.1, (14 : 0.315)34 : 0)33 : 0.055, (X1 : 0.1, (3 : 0.1)21 : 0)20 : 0.27)37 : 0.06, ((21 : 0, 4 : 0.1)22 : 0.26, (27 : 0, (X2 : 0.17, (6 : 0.17)24 : 0)23 : 0.045)28 : 0.145)36 : 0.07)38 : 0.065)39 : 0.11, 7 : 0.605)41 : 0.045, ((8 : 0.575, 9 : 0.575)40 : 0.06, 10 : 0.635)42 : 0.015)43 : 0.125, ((((24 : 0, 11 : 0.17)25 : 0.075, 12 : 0.245)29 : 0.475, (13 : 0.68, (34 : 0, (15 : 0.275, (16 : 0.275)31 : 0)30 : 0.04)35 : 0.365)44 : 0.04)45 : 0.02, (31 : 0, X3 : 0.275)32 : 0.465)46 : 0.035)47 : 0.11, (17 : 0.835, 18 : 0.835)48 : 0.05)49 : 0.115, 19 : 1)50 : 0;
Trait data: total lengths of cichlids (cm) (Froese and Pauly, 2010; www.fishbase.org).
Y = (y1, y2, ..., y19) = (13.5, 12.4, 7, 5.8, 5.5, 16, 15, 25, 6.1, 6.5, 5.5, 5, 7.8, 15, 4.3, 4, 8.6, 10.3, 8.5)
Some results
The model does not fit the raw data, so the data are log-transformed.
Maximum likelihood estimates: $\hat{\beta} = 0.43$, $\hat{\mu} = 2.06$, $\hat{\sigma} = 0.69$.
Since $\hat{\beta} = 0.43 < 0.5$, the trait values of the hybrid species are not more variable than those of the other species.
The value 0.5 falls in $(\hat{\beta}_{25}, \hat{\beta}_{975}) = (5.57 \cdot 10^{-5}, 1.05)$, so there is no significant evidence that the total lengths of the hybrids are more extreme than those of their parents.
Some ongoing work in this project
1. Contribute R packages bmhyd (BM model) and ouhyd (OU model).
2. Model with more parameters: $R_i = \beta_i (X_i + Y_i)$, $i = 1, 2, \cdots, d$.
3. Improve these methods to allow adaptive radiations.
4. Develop a sampling strategy to study the performance (bias and power of β) of the BM and OU models.
5. Model events such as horizontal gene transfers that affect traits but are biologically different from hybridization.
Acknowledgement:
NIMBioS
Elizabeth Housworth (Dept. of Mathematics, Indiana University Bloomington)
Brian O'Meara (EEB, Univ. of Tennessee, Knoxville)
Emília Martins (Dept. of Biology, Indiana University Bloomington)
Vasileios Maroulas (Dept. of Math, Univ. of Tennessee, Knoxville)
Thanks!
Bias and Power analysis for the hybrid parameter
Assess the bias and power of parameter β using 5-, 9-, 17-, 33-, 65- and 129-taxon phylogenetic networks, each containing one ancient hybrid.
The time at which the ancient hybridization occurs, t1, is set to 0.1, 0.5 and 0.9 in turn.
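A simulation study of this kind needs two building blocks: drawing trait vectors from the fitted model and summarising the resulting estimates. The sketch below is an assumption about how that could be done, not the procedure used for the slides; the function names are hypothetical, and the estimation step for each replicate is left out.

```python
import numpy as np

def simulate_traits(mu, sigma2, G, n_rep, seed=0):
    """Draw n_rep trait vectors from MVN(mu * 1, sigma2 * G)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(sigma2 * G)          # sigma2 * G = L L^t
    z = rng.standard_normal((n_rep, G.shape[0]))
    return mu + z @ L.T                         # each row is one simulated data set

def bias_and_power(beta_hats, beta_true, rejected):
    """Bias is the mean estimation error; power is the share of replicates rejecting H0."""
    return np.mean(beta_hats) - beta_true, np.mean(rejected)

# Hypothetical settings: the 3-taxon network matrix with beta = 0.5.
G = np.array([[1.0, 0.2, 0.0], [0.2, 0.8, 0.2], [0.0, 0.2, 1.0]])
samples = simulate_traits(mu=2.0, sigma2=1.5, G=G, n_rep=200000)
```

Each row of `samples` would be fed to the estimation routine, and the collected estimates and test decisions passed to `bias_and_power` for each combination of network size and t1.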
A sample network for simulation
[Figure: a sample network with 17 taxa and 1 hybrid; time intervals t1, t2, t3, t4.]
Bias analysis of BM model
[Figure: bias of β plotted against β (−3 to 3), vertical axis −3 to 3; red: t1 = 0.1, black: t1 = 0.5, blue: t1 = 0.9.]
Power analysis of BM model
[Figure: power of the test of β = 0.5 plotted against β (−5 to 5), vertical axis 0 to 1; upper curve: t1 = 0.9, middle curve: t1 = 0.5, lower curve: t1 = 0.1.]