Conformational analysis of molecular chains using Nano-Kinematics
y z
Dinesh Mano cha Yunshan Zhu William Wright
Computer Science Department
UniversityofNorth Carolina
Chap el Hill, NC 27599-317 5
fmano cha,zhu,[email protected]
Fax: 919962-179 9
Abstract: We present algorithms for 3-D manipulation and conformational analysis of molecular
chains, when b ond length, b ond angles and related dihedral angles remain xed. These algorithms
are useful for lo cal deformations of linear molecules, exact ring closure in cyclic molecules and
molecular emb edding for short chains. Other p ossible applications include structure prediction,
protein folding, conformation energy analysis and 3D molecular matching and do cking. The algo-
rithms are applicable to all serial molecular chains and make no asssumptions ab out their geometry.
We make use of results on direct and inverse kinematics from rob otics and mechanics literature
and show the corresp ondence b etween kinematics and conformational analysis of molecules. In
particular, we p ose these problems algebraically and compute all the solutions making use of the
structure of these equations and matrix computations. The algorithms have b een implemented
and p erform well in practice. In particular, they take tens of milliseconds on currentworkstations
for lo cal deformations and chain closures on molecular chains consisting of six or fewer rotatable
dihedral angles.
Categories: 3D Protein Mo deling, 3D Molecular Matching and Do cking, Multiple alignmentof
genetic sequences
Supp orted in part by Junior FacultyAward, University ResearchAward, NSF Grant CCR-9319957, ARPA
Contract DABT63-93-C-0 048 and NSF/ARPA Science and Technology Center for Computer Graphics and Scienti c
Visualizatio n, NSF Prime Contract Numb er 8920219
y
Supp orted in part by NIH National Center for Researc h Resources grant RR-02170
z
Supp orted in part by NIH National Center for Research Resources grant RR-02170
1 Intro duction
Three dimensional manipulation of molecular chains is a fundamental problem in conformational
analysis and do cking applications. Conformational analysis deals with computation of minimal en-
ergy con gurations of deformable molecules and do cking involves matching one molecular structure
to a receptor site of a second molecule and computing the most energetically favorable 3-D con-
formation. These problems have b een extensively studied in bio chemistry and molecular mo deling
literature. Molecular chains are mo deled using b ond lengths, b ond angles and dihedral angles.
In the literature on conformational energy calculations, b ond lengths and b ond angles have
b een treated in two di erentways, as variables or as xed parameters[GoS70]. However, mo deling
the b ond-length and b ond-angle as variables signi cantly add to the complexity and even for small
chains consisting of six or seven b onds, minimal energy computation b ecomes a non-trivial problem.
In their classic pap er, Go and Scheraga highlight that the minimal energy conformations computed
by xing these parameters is a go o d guess to the actual minima and can b e used along with a
p erturbation treatment for the energy minimization [GoS70]. Therefore, in this pap er, we assume
that the b ond lengths and b ond angles are xed and compute the conformation of the molecular
chains bychanging the dihedral angles only.
Given the assumptions of xed b ond-lengths and b ond-angles, two fundamental problems arise
in conformational energy calculations: ring closure and local conformational deformations [GoS70].
The problem of ring closure arises when we deal with cyclic molecules, i.e. the calculation of
dihedral angles which corresp onds to exact ring closure. In conformational energy minimization
pro cedures, one selects a starting conformation and alters it step by step such that the corresp ond-
ing conformational energy decreases monotonically.However, a small change in a dihedral angle
lo cated near the middle of the long chain causes a drastic change in the overall conformation of
the molecule. It is therefore, desired to have a co op erativevariation of angles which con nes the
conformational changes to a local section of the chain. As a result we are interested in calculation of
changes in dihedral angles, which cause lo cal deformations of conformations in long p olymer chains
[GoS70, BA82, CMR89, BK85]. In this pap er, We also address the molecular embedding problem,
that is, to nd atom p ositions that satis es a set of geometric constraints. It is an imp ortant
problem in interpreting data from NMR exp eriment [Cri89].
Manyinteractive systems for molecular mo deling suchasSybyl [Tri88] and Insight [Bio91]
provide capabilities for building and changing molecular mo dels by rotating the dihedral angles.
The full conformation space of acyclic molecules is the Cartesian pro duct of all the dihedral angle 2
with one-dimensional cycles, known as the n-dimensional toroidal manifold, where n is the number
of rotatable b onds. There is a considerable amount of literature on exhaustive metho ds. A recent
survey of di erent metho ds is given in [HK88]. The complexity of the search metho ds is an
exp onential function of the numb er of rotatable joints and therefore limited to chains with low n.
For cyclic molecules having rotatable b onds in the rings, six degrees of freedom are lost due to the
closure constraint. Go and Scheraga have prop osed solutions to computing the ring conformations
[GoS70, GoS73]. The algorithm in [GoS70] is restricted to molecular chains where the rotation can
take place ab out all backb one b onds and the metho ds in [GoS73] are for symmetric chains only.
They reduce the problem to computing ro ots of a univariate p olynomial. Algorithms based on
distancegeometry to study the ring systems are prop osed in [We83, PD92]. They use a distance
matrix to generate a set of cartesian co ordinates that satis es the original set of distances as
much as p ossible. A similar approach based on successive in nitesimal rotations is presented by
[SLP86]. Although these approaches make no assumption ab out the geometry of molecules, they are
relatively slow in practice, may fail to converge and cannot b e used for analytic computation of the
conformation space for even small chains. More recently, Cripp en has intro duced the technique of
linear embedding and applied it to explore the conformation space of cycloalkanes [Cri89, Cri92]. It
is a variant on the usual distance geometry metho ds and makes use of the metric matrix as opp osed
to distance matrix. However, the application to ring structures has b een limited to molecular chains,
where all b ond lengths and b ond angles are equal. Other approaches for molecular conformations
are based on sto chastic optimization and Monte Carlo analysis [GoS78, Sau89, BK85]. These can
b e slow in practice and are not guaranteed to nd all the solutions.
In this pap er, we reduce the problem of p ose-computation to a system of algebraic equations
and showhow they relate to the problems of direct and inverse kinematics in mechanics and
rob otics [Cra89, SV89]. In particular, we show the corresp ondence b etween molecular conforma-
tional analysis and inverse kinematics of serial manipulators. We also showhowinverse kinematics
can b e used to solve the molecular emb edding problem for short chains. The problem of inverse
kinematics has b een extensively studied in the rob otics literature, but closed form solutions are
known for some sp ecial geometries only.We present techniques for computing the solutions of the
algebraic equations by making use of their sparse structure and reduce the problem to computing
the eigenvalues and eigenvectors of a matrix. Based on numerically stable and ecient algorithms
for computing the eigendecomp osition of a matrix, we present fast and accurate algorithms for
ring closure and lo op analysis. These algorithms have b een implemented and take ab out of tens 3
of milliseconds on currentworkstations for manipulation of molecular chains consisting of up to
six rotatable b onds. We demonstrate their p erformance on chain closures of cyclohexane and lo op
deformation on a p olyp eptide chain. These algorithms make no assumptions ab out the geometry of
the molecules and are applicable to all molecular chains. These techniques are based on kinematics
and applied to molecular b onds of the order of a few Angstroms. Therefore, we refer to them as
nano-kinematics
The rest of the pap er is organized in the following manner. In Section 2, we show the equivalence
between the kinematics problem and molecular mo deling problems. In Section 3, we showhow
the technique can also b e applied to molecualr emb edding problems. We present the algebraic
algorithms in Section 4 and show the reduction to an eigenvalue problem. Dep ending on the
geometry of the molecular chain, we obtain di erent matrix orders. We discuss extensions of the
algorithm to chains consisting of more than six dihedral angles in Section 5. The algorithms are
illustrated in Section 6 with several examples of applications. The details of the implementation
and p erformance are presented in Section 7.
2 Molecular Mo deling and Inverse Kinematics
The inverse kinematics problem for general serial manipulators is fundamental for computer con-
trolled rob ots. Given the p ose of the end e ector the p osition and orientation, the problem
corresp onds to computing the joint displacements for that p ose [SV89]. A rob ot manipulator may
consist of prismatic, revolute or spherical joints. A molecular chain with xed b ond lengths and
b ond angles corresp onds to a serial manipulator with revolute joints. Therefore, we only consider
the inverse kinematics problem for serial chains with revolute joints. Initially,we consider chains
with six rotatable joints denoted as 6R and later on extend the results to arbitrary chains.
We use the Denavit{Hartenb erg DH formalism, [DH55], to mo del a serial chain with n revolute
joints. Each b ond is represented by the line along its joint axis and the common normal to the next
joint axis. In the case of parallel joints, any of the common normals can b e chosen. A co ordinate
system is attached to each joint for describing the relative arrangements among the various b onds.
The z-axis of frame i, called z , is coincident with the joint axis i. Initially a base frame is chosen
i
such that the origin is on the z axis and the x , y axis are chosen conveniently to form a right
1 1 1
hand frame. The origin of frame i, o corresp onds to the p oint where the common normal to z
i i
and z intersection z .Ifz and z intersect, the origin z is their p ointofintersection. x
i 1 i i i 1 i i 4 X
P
Z Y
Ca
N Y Z H X
CN X
O C Ca Z
Y
Figure 1: Co ordinate systems on a p olyp eptide units based on DH formalism
is along the common normal b etween z and z through o , or in the direction normal to the
i i 1 i
z z plane if z and z intersect. Given x and z , y is chosen to complete the right hand
i 1 i i 1 i i i i
co ordinate system at o . This formulation is shown for a p eptide unit in Fig. 1. This is rep eated
i
for i =2;:::;n. Given a co ordinate system, we create the following parameters for the molecular
chain:
a = distance along x from o to the intersection of the x and z axes.
i i i i i 1
d = distance along z from o to the intersection of the x and z axes.
i i 1 i 1 i i 1
= the angle b etween z and z measured along x .
i i 1 i i
= the angle b etween x and x measured ab out z .
i i i i 1
1
These parameters for the p olyp eptide unit are shown in Fig. 2. The dihedral angles corresp ond to
the 's. Given this representation of the co ordinate systems for each b ond, the 4 4 transformation
i
matrix relating i 1 co ordinate system to i co ordinate system is: 5 P α1
d1 d2 Ca Ca H
C N α2 O Ca P
P
Figure 2: DH parameters for a p olyp eptide unit
0 1
c s s a c
i i i i i i i
B C
B C
B C
s c c a s
i i i i i i i
B C
A = ; 1
i
B C
B C
0 d
i i i
@ A
0 0 0 1
where s = sin ; c = cos ; = sin ; = cos :
i i i i i i i i
For a given serial chain or a rob ot manipulator we are also given the con guration of the end
of the chain or the end-e ector. This con guration is describ ed with resp ect to the base link or
link 1. We represent this con guration as:
0 1
l m n q
x x x x
B C
B C
B C
l m n q
y y y y
B C
A = :
hand
B C
B C
l m n q
z z z z
@ A
0 0 0 1
The problem of inverse kinematics corresp onds to computing the joint angles, ; ;:::; such
1 2 n
that
A A :::A = A : 2
1 2 n hand
i
This relationship can b e reduced to a system of algebraic equations by substituting x = tan .
i
2
2
1 x
2x
i i
Therefore, sin = and cos = .Eventually we obtain a system of 6 algebraic equations
i 2 2 i
1+x 1+x
i i
in n unknowns. 6
The problems of ring closure and lo cal deformations turn out to b e a sp ecial case of inverse
kinematics formulation, as shown b elow.
2.1 Ring Closure
The ring closure problem turns out to b e a sp ecial case of the inverse kinematics problem highlighted
ab ove. In particular for ring closure, the right hand side matrix A corresp onds to the identity
hand
matrix:
0 1
1 0 0 0
B C
B C
B C
0 1 0 0
B C
A = :
hand
B C
B C
0 0 1 0
@ A
0 0 0 1
As a result, all the solutions to the ring closure are obtained by substituting for A in 2.
hand
2.2 Lo cal Deformations
The problem of p erforming conformational variations on a lo cal p ortion of a chain corresp onds
to selecting the b onds formulating the chain, cho osing the subset of rotatable b onds and the
conformation of the end of the lo cal chain. The algorithm pro ceeds by assigning co ordinate systems
and computing the DH parameters for the lo cal chain. The rotatable b onds need not b e contiguous
as shown in Fig. 1. The resulting problem can b e p osed as shown in 2. A corresp onds
hand
to the conformation of the end of the chain and n corresp onds to the numb er of rotatable in the
selected chain.
2.3 Cases of Rotation ab out All Backb one Bonds
Go and Scheraga have considered a restricted version of the inverse kinematics problem, when the
rotation can take place ab out all backb one b onds as opp osed to a few selected dihedral angles
[GoS70]. This turns out to b e a sp ecial case of the inverse kinematics problem as well. In particular,
the Denavit-Hartenb erg parameters of suchachain corresp ond exactly to: a =0,d = b ond length
i i
and = b ond angle. Such a molecular chain has b een shown in Fig. 3.
i 7 F4 F6
F2
F0 α 4 α6 α2
d5 d3 d4 d6 d1 d2
α 3 α5 α1
F5 F3
F1
Figure 3: Parameters for Molecular Chain with All Bonds Rotatable
3 Molecular Emb edding and Inverse Kinematics
The molecular emb edding problem consists of nding one or more sets of atomic co ordinates such
that a given list of geometric constraints is satis ed. Such problems arise, when we are given data
derived from NMR exp eriments, where we are given some experimental constraints corresp onding
to the distance b etween some atoms. In these cases it is assumed that the molecule is under no great
strain, so that all b ond lengths and b ond angles are known from standard values taken from X-ray
crystallographic studies on small molecules [Cri89]. These constraints are referred to as holonomic
constraints. The goal of the emb edding algorithms is to nd all conformations which satisfy these
constraints. In this section, we show the equivalence b etween the emb edding problems and inverse
kinematics. Furthermore, fast algorithms for inverse kinematics can b e used for computing the
emb eddings.
There are many approaches to computing the molecular emb eddings and most general one
is based on EMBED [CH, Cri81]. It is frequently used for the determination of conformations
of small proteins in solution by NMR. Given the exp erimental and holonomic constraints, the
algorithm nds upp er and lower b ounds for all distances which satisfy the constraints and using
some random values in this range it computes the closest corresp onding three-dimensional metric
matrix. Cripp en has extended the algorithm using linearized emb edding and taking chiralityinto
account [Cri89]. The overall pro cess is iterative and may take considerable running time on some
cases. Furthermore, even for small chains it is not guaranteed to compute all the con gurations. 8
We show the equivalence b etween molecular emb edding and inverse kinematics for small chains.
It can b e combined with mo del building techniques for application to larger chains as shown in
[LS92].
3.1 Application to Small Chains
In this section we will consider emb eddings of molecular chains consisting of up to six rotatable
b onds. In the basic version of the problem we are given an initial b ond, sp eci ed by xed atom
p ositions, P and P , and a nal b ond sp eci ed by xed atom p ositions P and P . We are
1 2 6 7
interested in nding all the p ositions of all of the intermediate atoms P , P and P such that the
3 4 5
b ond lengths are all as required d ;d ;d and d and the angles b etween the b onds are all as
2 3 4 5
required ; ; ; and . The problem formulation is shown in Fig. 3. We can reduce this
2 3 4 5 6
problem to inverse kinematics in the following manner:
Assign a frame using the DH formulation at each atom shown as P in Fig. 3, such that the
i
z-axis is along the b onds. We know the p osition of P in the base co ordinate system assigned at
7
P . Let the orientation of the frame at P b e such that the z-axis is along the b ond P P and
1 7 6 7
the x and y axes are chosen to complete a right hand co ordinate system. A is therefore computed
E
appropriately. The Denavit-Hartenb erg parameters are computed using the d 's and 's. The
i i
inverse kinematics problem corresp onds to computing all the dihedral angles, given the p ose of the
end-e ector A . Given the dihedral angles, the p ositions of the intermediate atoms can b e easily
E computed.
P7 P3 P5 P1
α3 α5 d5 d3 d4 d6 d1 d2
α4 α6 α2
P4 P6
P2
Figure 3: Molecular emb edding problem with unknown atom p ositions
Theorem 3.1 Assuming xedbond lengths and bond angles, the conformation of a molecular chain 9 F5 F7 F3 F1 α5 α3
d5 d3 d4 d6 d1 d2
α α2 4 α6
F4 F6
F2
Figure 4: Inverse kinematics formulation of molecular emb edding
with up to 7 atoms can be determined its rst and last bond positions.
Pro of
We prove the theorem by showing that the molecular chains in Fig. 3 and Fig. 4 have equivalent
solution space. Therefore, the conformation of the molecular chain in Fig. 3 represented in terms
of unknown atom p ositions can b e computed using the algorithm for inverse kinematics highlighted
in the previous sections.
We initially show that for each solution to the emb edding problem, there exists a corresp onding
solution to the inverse kinematics problem. Given atom p ositions P ,P , P that satisfy the b ond
3 4 5
length and b ond angles constraints, the x-axis of lo cal frame at P can b e computed based on the
i
atom p ositions. Given the frames and their p ositions in the global co ordinate system, the s are
i
easily computed and they corresp ond to the solutions of the inverse kinematics problem.
Conversely,if s are the solutions for the inverse kinematics problem, using forward kinematics
i
the p ositions of p oints P , P , :::, P are computed. Given the p osition and orientation of the
2 3 6
frames, we know that P and P corresp ond to the given p ositions, and z is along P -P b ond, and
1 7 0 1 2
z is along P -P . Since all rotations are along the b onds, the p osition of P and P are invariant,
7 6 7 2 6
and they match the sp eci ed p ositions. The rotations along the b onds do not violate any b ond
length or b ond angle constraints.
Q.E.D.
Although wehave considered molecular chains with rotation along all backb one joints, the
relationship b etween emb edding problems and kinematics problems extends to all other geometries
like p eptide chains as well. 10
4 Inverse Kinematics and Matrix Formulation
Most of the literature in rob otics has concentrate on inverse kinematics of chains with six or fewer
joints. The complexityofinverse kinematics of a general six jointed chain is a function of the
geometry of the chain. While the solution can b e expressed in closed form for a variety of sp ecial
cases, such as when three consecutive joint axes intersect in a common p oint, no such formulation
is known for the general case. It is not clear whether the solutions for such a manipulator can
b e expressed in closed form. Iterative solutions based on numerical techniques like optimization
or Newton's metho d to the inverse kinematics for general 6R manipulators have b een known
for quite some time. However, they su er from two drawbacks. Firstly they are slow for practical
applications and secondly they are unable to compute all the solutions. As a result, most industrial
manipulators are designed suciently simply so that a closed from exists.
The inverse kinematics problem for six revolute joints has b een studied for more than two
decades in the rob otics literature. Given the multivariate p olynomial system, di erent algorithms
to reduce the problem to nding ro ots of a univariate p olynomial are given in [Pie68, RRS73 ,
AA79, DC80]. Given a 6R manipulator with a p ose of the end-e ector, the maximum number of
solutions assuming a nite numb er of solutions is 16 as shown in [Pri86, LL88, RR89 ]. Practical
metho ds to compute all the solutions are presented in [WM91] based on continuation metho ds and
in [MC92] based on matrix computations.
In this section, we consider the problem of inverse kinematics for a molecular chain with six
revolute joints. Chains with more than six joints or fewer than six joints are considered in next
section. We make use of the algebraic reduction of the problem to a univariate system presented
by Raghavan and Roth [RR89]. However the symb olic derivation highlighted by Raghavan and
Roth do es not take care of all the geometries e.g. the p olyp eptide con gurations. As opp osed
to reducing the problem to nding ro ots of a univariate p olynomial, we formulate it as an eigen-
value problem. Dep ending on the geometry of the molecular chain we obtain matrices of di erent
order, though the total numb er of solutions is still b ounded by 16 assuming a nite number of
con gurations. The main advantages of the matrix formulation are eciency and accuracy. The
reduction to a univariate p olynomial involves expanding a symb olic determinant, which is relatively
exp ensive. Moreover the problem of nding ro ots of p olynomial of degree 16 can b e numerically
ill-conditi oned [Wil59]. As a result, wemay not b e able to compute accurate results using IEEE
double precision arithmetic on currentworkstations. 11
4.1 Raghavan and Roth Formulation
Given the matrix equation for a six jointed manipulator, 2, Raghavan and Roth rearrange it as: