Math/C SC 5610 Computational Biology Lecture 10 and 11: Phylogenetics
Stephen Billups
University of Colorado at Denver
Math/C SC 5610 Computational Biology – p.1/29
Announcements
Project guidelines and ideas are posted (proposal due March 8).
CCB Seminar, this Friday (Feb. 18)
Speaker: Kevin Cohen
Title: Two and a half approaches to natural language processing in Computational Biology
Time: 11-12 (followed by lunch)
Place: Media Center, AU008
Outline
Finish Intro to Optimization
Baldi-Chauvin Algorithm
Phylogenetics
Equality Constrained Optimization
$$\min_{x \in X} f(x) \quad \text{subject to} \quad h(x) = 0$$
Define the Lagrangian:
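$$L(x, \lambda) = f(x) - \lambda^T h(x),$$
where $\lambda \in \mathbb{R}^m$.
Optimality Conditions: If $x^*$ is a solution, then there exists $\lambda^* \in \mathbb{R}^m$ such that
$$\nabla_x L(x^*, \lambda^*) = \nabla f(x^*) - \nabla h(x^*)\lambda^* = 0.$$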
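As a small illustration of the optimality condition, the sketch below checks it on a toy problem that is not from the lecture: minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0. The problem, the function names, and the candidate pair x* = (0.5, 0.5), lambda* = 1 are all assumptions chosen for illustration. The sign convention L = f - lambda * h is also assumed here.

```python
# Toy problem (an assumption for illustration, not from the lecture):
#   minimize f(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 1 = 0

def grad_f(x):
    # Gradient of f(x) = x1^2 + x2^2
    return [2.0 * x[0], 2.0 * x[1]]

def grad_h(x):
    # Gradient of h(x) = x1 + x2 - 1 (constant for a linear constraint)
    return [1.0, 1.0]

def h(x):
    return x[0] + x[1] - 1.0

def lagrangian_gradient(x, lam):
    # grad_x L(x, lam) = grad f(x) - lam * grad h(x)
    gf, gh = grad_f(x), grad_h(x)
    return [gf[i] - lam * gh[i] for i in range(len(x))]

# Candidate solution and multiplier for the toy problem
x_star = [0.5, 0.5]
lam_star = 1.0

residual = lagrangian_gradient(x_star, lam_star)  # should vanish at a solution
feasibility = h(x_star)                           # should be zero (constraint holds)
print("stationarity residual:", residual)
print("constraint value:", feasibility)
```

Both quantities vanish at the candidate point, which is what the optimality condition requires of a solution.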
Geometric Intuition
The equation $\nabla f(x^*) - \nabla h(x^*)\lambda^* = 0$ says that $\nabla f(x^*)$ is a linear combination of $\{\nabla h_1(x^*), \nabla h_2(x^*), \ldots, \nabla h_m(x^*)\}$, which says that $\nabla f(x^*)$ is orthogonal to the tangent plane of the constraints.
[Figure: the constraint curve g(x) = 0, with labeled vectors grad g(x) and grad f(x).]
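The orthogonality claim can be checked numerically on a small case. The sketch below uses an assumed toy problem, not one from the lecture: f(x) = x1^2 + x2^2 with the single constraint h(x) = x1 + x2 - 1 = 0. In two dimensions with one constraint, the tangent direction is obtained by rotating grad h by 90 degrees. All function names are hypothetical.

```python
# Toy problem (an assumption for illustration):
#   f(x) = x1^2 + x2^2,  h(x) = x1 + x2 - 1 = 0

def grad_f(x):
    return (2.0 * x[0], 2.0 * x[1])

def grad_h(x):
    return (1.0, 1.0)

def tangent(x):
    # Rotate grad h by 90 degrees to get a direction along the constraint
    gh = grad_h(x)
    return (-gh[1], gh[0])

def slope_along_constraint(x):
    # Dot product of grad f with the tangent direction: the rate of change
    # of f when moving along the constraint through x
    gf, t = grad_f(x), tangent(x)
    return gf[0] * t[0] + gf[1] * t[1]

at_solution = slope_along_constraint((0.5, 0.5))  # constrained minimizer
at_other = slope_along_constraint((1.0, 0.0))     # feasible, but not optimal
print(at_solution, at_other)
```

At the constrained minimizer the dot product is zero, so f cannot be decreased by sliding along the constraint. At the other feasible point it is nonzero, so moving along the constraint still changes f.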
Back to Training HMMs
Now that we understand a little about optimization, we can look at the Baldi-Chauvin Algorithm for training HMMs.
Baldi-Chauvin Algorithm
Main Ideas:
Applies gradient descent to minimize the negative log-likelihood $E = -\log L(M)$ as a function of the model parameters.
Requires constraints on the probabilities:
$$\sum_{i=1}^{n} \pi_i = 1, \quad \sum_{j=1}^{n} a_{i,j} = 1, \quad \sum_{k=1}^{m} b_{i,k} = 1.$$
This is accomplished essentially by variable elimination.
Does not use a line search. Instead, the approach is to update as follows:
$$x_{k+1} = x_k - C \nabla f(x_k),$$
where C is a constant. (Not guaranteed to converge to anything!!).
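To see why a fixed step offers no convergence guarantee, here is a minimal sketch of the update x_{k+1} = x_k - C * grad f(x_k) on an assumed one-dimensional example, f(x) = x^2. The objective, the function names, and the two values of C are illustrative choices and are not from the lecture. For this f the update reduces to x_{k+1} = (1 - 2C) x_k, so the iterates shrink when |1 - 2C| < 1 and grow when |1 - 2C| > 1.

```python
def grad_f(x):
    # Gradient of the illustrative objective f(x) = x^2
    return 2.0 * x

def fixed_step_descent(x0, C, num_steps):
    # Repeats x_{k+1} = x_k - C * grad f(x_k) with a constant C (no line search)
    x = x0
    for _ in range(num_steps):
        x = x - C * grad_f(x)
    return x

# Small constant: each step multiplies x by (1 - 2 * 0.1) = 0.8
x_small_step = fixed_step_descent(x0=1.0, C=0.1, num_steps=50)

# Large constant: each step multiplies x by (1 - 2 * 1.1) = -1.2
x_large_step = fixed_step_descent(x0=1.0, C=1.1, num_steps=50)

print("C = 0.1:", x_small_step)
print("C = 1.1:", x_large_step)
```

With the smaller constant the iterates approach the minimizer at 0. With the larger one they oscillate in sign and grow in magnitude, even though every step moves along the negative gradient.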
Baldi-Chauvin (cont.)
Employs a change of variables that ensures that transition and emission probabilities never go to zero.
$$a_{i,j} = \frac{e^{\omega_{i,j}}}{\sum_k e^{\omega_{i,k}}}$$
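A minimal sketch of this change of variables for one row of the transition matrix follows. The function name and the sample values of omega are hypothetical. Subtracting the row maximum before exponentiating is an addition made here for numerical safety; it leaves the ratio unchanged.

```python
import math

def transition_row(omega_row):
    # Maps unconstrained weights omega_{i,.} to probabilities a_{i,.} via
    #   a_{i,j} = exp(omega_{i,j}) / sum_k exp(omega_{i,k})
    # Subtracting the maximum is an assumption added here to avoid overflow;
    # it cancels between numerator and denominator.
    shift = max(omega_row)
    exps = [math.exp(w - shift) for w in omega_row]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical unconstrained weights for one state i
omega_i = [2.0, -1.0, 0.0]
a_i = transition_row(omega_i)

print("a_i:", a_i)
print("row sum:", sum(a_i))
```

Because each exponential is strictly positive for finite weights, every entry of the row is positive, and the row sums to one by construction. Gradient steps can then be taken on the omega values without any explicit constraint handling.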