
Appendix: Calculus of Variations

Jianzhong Wu

This appendix provides a very brief introduction to the calculus of variations, an extension of calculus that was first introduced by Euler in 1733. The background material is expected to be sufficient for readers who are mainly interested in the application, rather than the mathematical development, of variational methods for molecular modeling. For a more comprehensive understanding of this fascinating subject, the reader is referred to standard texts of mathematical physics such as:

1. Mathematical Methods of Physics, J. Mathews and R. L. Walker, Addison-Wesley, 1970.
2. Calculus of Variations, I. M. Gelfand and S. V. Fomin, Dover Books on Mathematics, 2000.
3. Variational Methods in Mathematical Physics, P. Blanchard and E. Brüning, Springer-Verlag, 1992.

A.1 Functional

A functional is an extension of what we mean by a multivariable function. When we write a multivariable function $f(\mathbf{z})$, where $\mathbf{z}$ is an $n$-dimensional vector, we mean that for each set of numbers $\mathbf{z} = (z_1, z_2, \ldots, z_n)$ there is a number $f(\mathbf{z})$ associated with it. Simple examples of multivariable functions are $f(\mathbf{z}) = \mathbf{z}^2 = \sum_{i=1}^{n} z_i^2$ or $f(\mathbf{z}) = \mathbf{a} \cdot \mathbf{z}$, where $\mathbf{a}$ is an $n$-dimensional vector. When we write a functional, $F[y]$, we mean that for each smooth (differentiable) function $y(x)$ there is a number $F[y]$ related to it. In other words, a functional maps a function into a number, or a functional is a function of functions. The integral $F[y] = \int_0^1 y(x)\,dx$ provides a simple example of a functional: for each smooth function $y(x)$, its integration from 0 to 1 yields a number. While the “input” of a multivariable function is an $n$-dimensional vector, the “input” for a functional is a function. Because a function $y(x)$ can be regarded as a vector with one component for every value of $x$, a functional is a function of infinite dimensionality. Schematically, Fig. A.1 illustrates the difference between the inputs for a multidimensional function and a functional.

© Springer Science+Business Media Singapore 2017. J. Wu (ed.), Variational Methods in Molecular Modeling, Molecular Modeling and Simulation, DOI 10.1007/978-981-10-2502-0

Fig. A.1 While the input for a multidimensional function is a vector, the input for a functional is a smooth function $y(x)$. (a) An $n$-dimensional vector $\mathbf{z}$ contains a set of numbers affiliated with its dimensionality; (b) a one-dimensional function $y(x)$ may be understood as a vector of infinite dimensionality.
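The statement that a functional maps a whole function to a single number can be made concrete with a small numerical sketch. The Python snippet below approximates the integral functional $F[y] = \int_0^1 y(x)\,dx$ with a midpoint rule; the name `functional_F`, the three test functions, and the grid size are illustrative assumptions, not part of the original text.

```python
import math


def functional_F(y, n=1000):
    """Approximate F[y] = integral of y(x) over [0, 1] with the midpoint rule.

    `y` is any callable taking a float; `n` is the (assumed) number of panels.
    """
    h = 1.0 / n
    return sum(y((i + 0.5) * h) for i in range(n)) * h


# Each input *function* is mapped to a single *number*.
value_linear = functional_F(lambda x: x)                      # exact value: 1/2
value_square = functional_F(lambda x: x * x)                  # exact value: 1/3
value_sine = functional_F(lambda x: math.sin(math.pi * x))    # exact value: 2/pi
```

Passing a different function changes the returned number, which is the sense in which the function itself plays the role of the input variable.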

A.2 Variational Problem

To illustrate how a functional can be used to solve a realistic problem, consider the time required for a ball to fall along some frictionless path with two ends fixed at positions A and B, as indicated in Fig. A.2. For simplicity, assume that the path is two-dimensional and that it can be described by a smooth function y = y(x). Let t denote the time required for the ball to go from A to B along this path. What path y(x) should be chosen to make t a minimum?

Fig. A.2 Calculus of variations can be used to identify a frictionless path that yields the shortest traveling time for a ball falling from point A to B

For convenience, we put point A at the origin of the coordinate system and let the y-axis point downward. At any instant, the ball speed is

$$
v = \frac{ds}{dt}, \tag{A.1}
$$

where $v$ denotes the magnitude of the velocity, $s$ represents the length along the path, and $t$ is time. Rearrangement of Eq. (A.1) gives

$$
dt = \frac{ds}{v}, \tag{A.2}
$$

and thus the total traveling time is

$$
t = \int_A^B \frac{ds}{v}. \tag{A.3}
$$

The differential length of the path $ds$ is

$$
ds = \sqrt{1 + y'^2}\,dx, \tag{A.4}
$$

where $y' = dy/dx$ represents the slope of the path. Because the ball starts from rest at point A, conservation of energy requires that at any vertical distance $y$, the loss of potential energy per unit mass is equal to the gain in kinetic energy per unit mass, i.e.,

$$
g y = v^2/2, \tag{A.5}
$$

where $g$ stands for the gravitational acceleration. Substituting Eqs. (A.4) and (A.5) into Eq. (A.3) gives

$$
t = \int_0^{x_f} \sqrt{\frac{1 + y'^2}{2 g y}}\,dx. \tag{A.6}
$$

Equation (A.6) indicates that the total time $t$ can be found if we know $y$ as a function of $x$. For any path with ends fixed at $A(0, 0)$ and $B(x_f, y_f)$, there is a corresponding time for the ball to travel from A to B. Therefore, the total traveling time is a functional of the path $y(x)$, that is, $t = F[y(x)]$. The essential problem in calculus of variations is functional minimization,¹ i.e., to find a function that minimizes a given functional. In the above example, we want to know the path $y(x)$ with two ends fixed at A and B that gives the minimum descent time. To answer this question, we need to know how a functional responds to a change in its “input”, where the “input” is not an ordinary variable, but a function.

¹ Functional maximization can be converted to minimization by trivially adding a negative sign.
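To see the descent time of Eq. (A.6) acting as a functional, the sketch below evaluates it numerically for two trial paths that share the same end points. Everything specific here is an assumption made for illustration: the value `G = 9.81`, the end point `(1, 1)`, the sine-shaped trial path, the helper name `descent_time`, and the substitution $x = x_f u^2$, which is used only to tame the integrable $1/\sqrt{y}$ singularity at the starting point.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def descent_time(y, dydx, x_f, n=2000):
    """Midpoint-rule estimate of t[y] = int_0^{x_f} sqrt((1 + y'^2) / (2 g y)) dx.

    The change of variable x = x_f * u**2 (dx = 2 x_f u du) removes the
    1/sqrt(x) behaviour of the integrand for paths that leave A with a
    finite, nonzero slope.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        x = x_f * u * u
        integrand = math.sqrt((1.0 + dydx(x) ** 2) / (2.0 * G * y(x)))
        total += integrand * 2.0 * x_f * u * h
    return total


X_F, Y_F = 1.0, 1.0  # assumed position of point B (y measured downward)

# Trial path 1: straight line from A to B.
t_line = descent_time(lambda x: Y_F * x / X_F, lambda x: Y_F / X_F, X_F)

# Trial path 2: a sine arc that is steeper near A (an arbitrary choice).
t_sine = descent_time(
    lambda x: Y_F * math.sin(0.5 * math.pi * x / X_F),
    lambda x: Y_F * 0.5 * math.pi / X_F * math.cos(0.5 * math.pi * x / X_F),
    X_F,
)

# Closed-form time for a uniformly accelerated slide down a straight incline.
t_line_exact = math.hypot(X_F, Y_F) * math.sqrt(2.0 / (G * Y_F))
```

The two paths return different times, and the steeper start of the sine arc makes it faster than the straight line, which hints that the minimizing path is not the shortest one.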

A.3 Functional Derivative

To obtain the unknown function that minimizes a functional, we use functional differentiation as discussed below. It is not much different from the differentiation used in finding the minimum of a multidimensional function. The variation of a functional with respect to its “input” is described by a functional derivative:

$$
\begin{aligned}
\frac{\delta F[y(x)]}{\delta y(x')} &\equiv \lim_{\varepsilon \to 0} \frac{F[y(x) + \varepsilon\,\delta(x - x')] - F[y(x)]}{\varepsilon} \\
&= \lim_{\varepsilon \to 0} \frac{F[y + \varepsilon\delta] - F[y]}{\varepsilon\delta}\,\delta \\
&= \frac{dF[y]}{dy}\,\delta(x - x'),
\end{aligned}
\tag{A.7}
$$

where $\varepsilon$ is a real number, and $\delta(x - x')$ stands for the Dirac delta function. As shown in Fig. A.3, the Dirac delta function $\delta(x - x_0)$ represents a generalized density that is normalized, vanishes everywhere except at $x = x_0$, and is infinite at $x = x_0$. According to Eq. (A.7), the functional derivative $\delta F[y(x)]/\delta y(x')$ can be understood as the change in the functional $F[y(x)]$ with respect to a change in the input function $y(x)$ at the point $x = x'$. Because the functional derivative in general depends on $x'$, $\delta F[y(x)]/\delta y(x')$ is a function of $x'$. The functional derivative defined above can be similarly applied to a function. Suppose $f(y)$ is a function of $y$; its functional derivative with respect to $y$ is

$$
\frac{\delta f(y)}{\delta y(x')} = f'(y)\,\delta(x - x'), \tag{A.8}
$$

where $f'(y) = df/dy$. In the special case $f(y) = y$, we have

$$
\frac{\delta y(x)}{\delta y(x')} = \delta(x - x'). \tag{A.9}
$$

Equation (A.9) says that the functional derivative of a function with respect to itself is a Dirac delta function.

Fig. A.3 The one-dimensional Dirac delta function $\delta(x - x_0)$ represents a probability density that is everywhere zero except at $x = x_0$, where it is infinite ($\infty$)
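The limit in Eq. (A.7) can be imitated on a grid, where the Dirac delta function becomes a spike of height $1/h$ at a single node of spacing $h$. The sketch below does this for the illustrative functional $F[y] = \int_0^1 y(x)^2\,dx$, whose functional derivative is $2y(x)$. The choice of functional, the grid, the value of `eps`, and the helper names are all assumptions made for this example.

```python
import math


def F(y_values, h):
    """Discretized F[y] = integral of y(x)^2 dx on a uniform midpoint grid."""
    return sum(v * v for v in y_values) * h


def functional_derivative(functional, y_values, h, i, eps=1e-6):
    """Finite-epsilon version of Eq. (A.7).

    The input is perturbed by eps * delta(x - x_i); on the grid the delta
    function is represented by a spike of height 1/h at node i.
    """
    bumped = list(y_values)
    bumped[i] += eps / h
    return (functional(bumped, h) - functional(y_values, h)) / eps


n = 100
h = 1.0 / n
grid = [(k + 0.5) * h for k in range(n)]
y_values = [math.sin(math.pi * x) for x in grid]

i = 30                                   # probe the derivative at x' = grid[i]
numeric = functional_derivative(F, y_values, h, i)
analytic = 2.0 * y_values[i]             # delta F / delta y(x') = 2 y(x') here
```

Repeating the call for every node `i` traces out the functional derivative as a function of the probe position, in line with the remark that it depends on the point at which the input is varied.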

A functional derivative may be considered as a natural extension of a partial derivative of a multivariable function to infinite dimensionality. To see this, consider again a multivariable function $f(\mathbf{z})$, where $\mathbf{z}$ stands for an $n$-dimensional vector. The partial derivative $\partial f/\partial z_i$ describes the change in $f(\mathbf{z})$ with respect to an infinitesimal change in the $i$th dimension of $\mathbf{z}$ while keeping all other dimensions unchanged, i.e.,

$$
df = \sum_{j=1}^{n} \frac{\partial f}{\partial z_j}\,\delta_{ij}\,dz_j = \frac{\partial f}{\partial z_i}\,dz_i, \tag{A.10}
$$

where $\delta_{ij}$ stands for the Kronecker delta function, i.e., $\delta_{ij} = 1$ for $i = j$ and zero otherwise. Similarly, the change of a functional with respect to its “input” (function) at a point $x'$ can be written as

$$
\delta F = \int dx\,\frac{dF}{dy}\,\delta(x - x')\,\delta y = \left.\frac{dF}{dy}\,\delta y\right|_{x'}. \tag{A.11}
$$

Comparing Eqs. (A.10) and (A.11), we see that the variable $x$ can be understood as a continuous index of the function $y(x)$, similar to $i$ as an index of the vector $\mathbf{z}$. Just as all partial derivatives of a multidimensional function vanish at a minimum point, a functional $F[y]$ reaches a minimum only when

$$
\frac{\delta F[y(x)]}{\delta y(x)} = 0 \tag{A.12}
$$

for all values of $x$.

A.4 Chain Rules of Functional Derivative

A functional derivative obeys chain rules similar to those for a partial derivative. For example, the chain rule for a partial derivative of a multivariable function $f(\mathbf{z})$ can be written as

$$
\frac{\partial f\{\mathbf{g}(\mathbf{z})\}}{\partial z_i} = \sum_{j=1}^{n} \frac{\partial f}{\partial g_j}\,\frac{\partial g_j}{\partial z_i}, \tag{A.13}
$$

where $\mathbf{g}(\mathbf{z})$ is an $n$-dimensional function of the vector $\mathbf{z}$. The analogous chain rule for a functional derivative is

$$
\frac{\delta F\{G[y(x)]\}}{\delta y(x)} = \int dx'\,\frac{\delta F}{\delta G(x')}\,\frac{\delta G(x')}{\delta y(x)}, \tag{A.14}
$$

where the summation over discrete indices in Eq. (A.13) is replaced by an integral over the continuous index. In particular, if $F[y(x)] = y(x)$, we have

$$
\delta(x - x') = \int dx''\,\frac{\delta y(x)}{\delta G(x'')}\,\frac{\delta G(x'')}{\delta y(x')}. \tag{A.15}
$$

Equation (A.15) represents a general relation between the reciprocals of functional derivatives. It can be shown that the functional derivative of a function commutes with an ordinary derivative, i.e.,

$$
\frac{\delta (df/dx)}{\delta y} = \frac{d}{dx}\,\frac{\delta f}{\delta y}, \tag{A.16}
$$

where both $f$ and $y$ are functions of $x$. In a special case, the functional derivative of $y'(x) = dy(x)/dx$ is

$$
\frac{\delta}{\delta y(x')}\,\frac{dy(x)}{dx} = \frac{d}{dx}\,\frac{\delta y(x)}{\delta y(x')} = \frac{d\,\delta(x - x')}{dx}. \tag{A.17}
$$
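As a short worked example of how Eq. (A.17) is used in practice, consider the functional $F[y] = \int_0^{x_f} y'(x)^2\,dx$. This functional is chosen here only for illustration and is an assumption of the example. Applying the ordinary chain rule to the integrand, then Eq. (A.17), and finally integrating by parts at an interior point $0 < x' < x_f$ gives:

```latex
\begin{aligned}
\frac{\delta F}{\delta y(x')}
  &= \int_0^{x_f} 2\,y'(x)\,\frac{\delta y'(x)}{\delta y(x')}\,dx
   = \int_0^{x_f} 2\,y'(x)\,\frac{d\,\delta(x - x')}{dx}\,dx \\
  &= \Big[\,2\,y'(x)\,\delta(x - x')\Big]_0^{x_f}
     - \int_0^{x_f} 2\,y''(x)\,\delta(x - x')\,dx
   = -2\,y''(x').
\end{aligned}
```

The boundary term drops out because $\delta(x - x')$ vanishes at both end points when $x'$ lies strictly inside the interval.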

A.5 Higher-Order Functional Derivatives and Functional Taylor Expansion

Higher-order functional derivatives can be defined similarly to higher-order partial derivatives. In general, the $m$th-order functional derivative of $F[y(x)]$ is

$$
\frac{\delta^{(m)} F[y(x)]}{\delta y(x_1)\,\delta y(x_2) \cdots \delta y(x_m)} = \frac{d^{(m)} F[y]}{dy^{(m)}}\,\delta(x - x_1)\,\delta(x - x_2) \cdots \delta(x - x_m). \tag{A.18}
$$

These functional derivatives are used in the functional Taylor expansion. In parallel to the Taylor expansion of a multivariable function $f(\mathbf{z})$,

$$
f(\mathbf{z} + \Delta\mathbf{z}) = f(\mathbf{z}) + \sum_{i=1}^{n} \frac{\partial f}{\partial z_i}\,\Delta z_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 f}{\partial z_i\,\partial z_j}\,\Delta z_i\,\Delta z_j + \cdots, \tag{A.19}
$$

we can apply the Taylor expansion to a functional:

$$
F[y + \Delta y] = F[y] + \int dx\,\frac{\delta F}{\delta y(x)}\,\Delta y(x) + \frac{1}{2!} \iint dx\,dx'\,\frac{\delta^2 F}{\delta y(x)\,\delta y(x')}\,\Delta y(x)\,\Delta y(x') + \cdots \tag{A.20}
$$

Again, the difference between the multivariable and the functional Taylor expansions lies only in how the indices are summed: a summation over integers in the former and an integration over a continuous variable in the latter.
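The functional Taylor expansion of Eq. (A.20) can be checked numerically on a simple case. The sketch below uses the illustrative functional $F[y] = \int_0^1 y(x)^3\,dx$, for which $\delta F/\delta y(x) = 3y^2$ and $\delta^2 F/\delta y(x)\,\delta y(x') = 6y\,\delta(x - x')$, so that the double integral collapses to a single one. The functional, the base function $y = 1 + x$, the perturbation shape, and all names are assumptions for this example.

```python
import math

N = 400
H = 1.0 / N
GRID = [(k + 0.5) * H for k in range(N)]


def integrate(values):
    """Midpoint-rule integral over [0, 1] of values sampled on GRID."""
    return sum(values) * H


def taylor_remainder(scale):
    """Return F[y + dy] minus its second-order functional Taylor expansion."""
    y = [1.0 + x for x in GRID]
    dy = [scale * math.sin(math.pi * x) for x in GRID]

    exact = integrate([(a + b) ** 3 for a, b in zip(y, dy)])
    zeroth = integrate([a ** 3 for a in y])
    first = integrate([3.0 * a * a * b for a, b in zip(y, dy)])
    # The delta function in the second derivative reduces the double
    # integral in Eq. (A.20) to a single integral of 6 y dy^2.
    second = 0.5 * integrate([6.0 * a * b * b for a, b in zip(y, dy)])
    return exact - (zeroth + first + second)


err_large = taylor_remainder(0.1)
err_small = taylor_remainder(0.01)
```

Shrinking the perturbation by a factor of 10 shrinks the remainder by roughly a factor of 1000, which is the cubic scaling expected after truncating at second order.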

A.6 Functional Integration

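Functional integration provides a general procedure to evaluate the change in a functional between different input functions. It can also be used to calculate a functional from its derivative. For a given function $y(x)$, $F[\lambda y(x)]$ represents a function of the real variable $\lambda$. By the chain rule, the derivative of $F[\lambda y(x)]$ with respect to $\lambda$ gives

$$
\frac{dF[\lambda y(x)]}{d\lambda} = \frac{dF[\lambda y(x)]}{d(\lambda y)}\,\frac{\partial(\lambda y)}{\partial\lambda} = \int dx\,\frac{\delta F[\lambda y(x)]}{\delta(\lambda y(x))}\,y(x). \tag{A.21}
$$

The second equality in Eq. (A.21) can be verified by substituting the functional derivative by its definition (i.e., Eq. (A.7)). Equation (A.21) holds true when we replace $y(x)$ with $\Delta y(x) \equiv y(x) - y_0(x)$:

$$
\frac{dF[y_0(x) + \lambda\,\Delta y(x)]}{d\lambda} = \int dx\,\frac{\delta F[\lambda\,\Delta y(x)]}{\delta(\lambda\,\Delta y(x))}\,\Delta y(x), \tag{A.22}
$$

where $y_0(x)$ is an arbitrary input function, and the functional derivative on the right-hand side is understood to be evaluated at the intermediate input $y_0(x) + \lambda\,\Delta y(x)$. Integrating the two sides of Eq. (A.22) with respect to $\lambda$ from 0 to 1 gives

$$
F[y] = F[y_0] + \int_0^1 d\lambda \int dx'\,\frac{\delta F[\lambda\,\Delta y(x)]}{\delta(\lambda\,\Delta y(x'))}\,\Delta y(x'). \tag{A.23}
$$

Equation (A.23) indicates that the change of a functional between two input functions is given by an integration of the functional derivative over a coupling parameter $\lambda$ that links the two input functions.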
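A numerical check of the coupling-parameter integration in Eq. (A.23) is sketched below. It assumes the illustrative functional $F[y] = \int_0^1 y(x)^2\,dx$ with $\delta F/\delta y(x) = 2y(x)$, and it evaluates the functional derivative at the intermediate input $y_0 + \lambda\,\Delta y$. The functional, the two input functions, and the helper names are assumptions made for this example.

```python
import math

N = 200
H = 1.0 / N
GRID = [(k + 0.5) * H for k in range(N)]


def F(y):
    """Discretized F[y] = integral of y(x)^2 dx over [0, 1]."""
    return sum(v * v for v in y) * H


def dF(y):
    """Functional derivative of F on the grid: delta F / delta y(x) = 2 y(x)."""
    return [2.0 * v for v in y]


y0 = [1.0 for _ in GRID]                       # reference input y0(x) = 1
y1 = [math.exp(x) for x in GRID]               # target input y(x) = exp(x)
delta_y = [b - a for a, b in zip(y0, y1)]      # Delta y = y - y0


def coupling_integral(n_lambda=50):
    """Right-hand integral of Eq. (A.23): midpoint rule in lambda, sum in x."""
    total = 0.0
    for j in range(n_lambda):
        lam = (j + 0.5) / n_lambda
        y_lam = [a + lam * d for a, d in zip(y0, delta_y)]
        inner = sum(g * d for g, d in zip(dF(y_lam), delta_y)) * H
        total += inner / n_lambda
    return total


lhs = F(y1) - F(y0)
rhs = coupling_integral()
```

For this quadratic functional the integrand is linear in $\lambda$, so the midpoint rule in $\lambda$ reproduces the difference `lhs` to rounding error; for a general functional the agreement would improve as `n_lambda` grows.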

A.7 Functional of a Multidimensional Function

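It is straightforward to extend the functional derivative, the functional Taylor expansion, and the functional integral to the case where the input is a multidimensional function, i.e., $y = y(\mathbf{x})$ where $\mathbf{x}$ is a multidimensional vector. Following a procedure similar to that discussed for the one-dimensional case, we have a functional derivative

$$
\frac{\delta F[y(\mathbf{x})]}{\delta y(\mathbf{x}')} = \frac{dF[y]}{dy}\,\delta(\mathbf{x} - \mathbf{x}'), \tag{A.24}
$$

where $\delta(\mathbf{x} - \mathbf{x}')$ stands for a multidimensional Dirac delta function. The functional Taylor expansion of $F[y(\mathbf{x})]$ is

$$
F[y + \Delta y] = F[y] + \int d\mathbf{x}\,\frac{\delta F}{\delta y(\mathbf{x})}\,\Delta y(\mathbf{x}) + \frac{1}{2!} \iint d\mathbf{x}\,d\mathbf{x}'\,\frac{\delta^2 F}{\delta y(\mathbf{x})\,\delta y(\mathbf{x}')}\,\Delta y(\mathbf{x})\,\Delta y(\mathbf{x}') + \cdots, \tag{A.25}
$$

and a functional integral is

$$
F[y] = F[y_0] + \int_0^1 d\lambda \int d\mathbf{x}\,\frac{\delta F[\lambda\,\Delta y(\mathbf{x})]}{\delta(\lambda\,\Delta y(\mathbf{x}))}\,\Delta y(\mathbf{x}), \tag{A.26}
$$

where, as in the one-dimensional case, $\Delta y(\mathbf{x}) \equiv y(\mathbf{x}) - y_0(\mathbf{x})$ and the functional derivative is evaluated at the intermediate input $y_0(\mathbf{x}) + \lambda\,\Delta y(\mathbf{x})$.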

A.8 An Illustrative Example

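Now we return to the example shown in Fig. A.2. We want to find the functional derivative of

$$
t[y(x)] = \int_0^{x_f} \sqrt{\frac{1 + y'^2}{2 g y}}\,dx \tag{A.27}
$$

and the path $y(x)$ that yields the shortest descent time from A to B. To shorten the notation, let

$$
f(y, y') = \sqrt{\frac{1 + y'^2}{2 g y}}. \tag{A.28}
$$

The integrand $f(y, y')$ can be treated as an ordinary function of two variables because $y$ and $y'$ enter it as formally independent arguments. Following the rules of ordinary calculus, we have

$$
\frac{df}{dy} = \left.\frac{\partial f}{\partial y}\right|_{y'} + \left.\frac{\partial f}{\partial y'}\right|_{y} \frac{dy'}{dy}. \tag{A.29}
$$

The functional derivative of $t[y(x)]$ is thus given by

$$
\begin{aligned}
\frac{\delta t[y(x)]}{\delta y(x')} &= \int_0^{x_f} \frac{df}{dy}\,\delta(x - x')\,dx \\
&= \int_0^{x_f} \left[ \left.\frac{\partial f}{\partial y}\right|_{y'} + \left.\frac{\partial f}{\partial y'}\right|_{y} \frac{dy'}{dy} \right] \delta(x - x')\,dx \\
&= \left.\frac{\partial f}{\partial y}\right|_{x = x'} + \int_0^{x_f} \frac{\partial f}{\partial y'}\,\frac{\delta y'(x)}{\delta y(x')}\,dx \\
&= \left.\frac{\partial f}{\partial y}\right|_{x = x'} + \int_0^{x_f} \frac{\partial f}{\partial y'}\,\frac{d}{dx}\,\delta(x - x')\,dx \\
&= \left.\frac{\partial f}{\partial y}\right|_{x = x'} - \left.\frac{d}{dx}\,\frac{\partial f}{\partial y'}\right|_{x = x'},
\end{aligned}
\tag{A.30}
$$

where $0 < x' < x_f$. The fourth line uses Eq. (A.17), and the last equality is obtained by integrating by parts. The path of shortest descent satisfies

$$
\frac{\delta t[y(x)]}{\delta y(x)} = 0. \tag{A.31}
$$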

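From Eq. (A.30), we have

$$
\frac{\partial f}{\partial y} - \frac{d}{dx}\,\frac{\partial f}{\partial y'} = 0. \tag{A.32}
$$

Equation (A.32) is known as the Euler-Lagrange equation. The solution to this ordinary differential equation gives the path of shortest descent. Equation (A.32) can be solved most conveniently by using an indirect method. First, because $f$ does not depend explicitly on $x$, we notice that

$$
\frac{df}{dx} = \frac{\partial f}{\partial y'}\,\frac{dy'}{dx} + y'\,\frac{\partial f}{\partial y} \tag{A.33}
$$

and

$$
\frac{d}{dx}\left( y'\,\frac{\partial f}{\partial y'} \right) = \frac{\partial f}{\partial y'}\,\frac{dy'}{dx} + y'\,\frac{d}{dx}\,\frac{\partial f}{\partial y'}. \tag{A.34}
$$

Subtracting Eq. (A.34) from Eq. (A.33), and utilizing Eq. (A.32), we find

$$
\frac{d}{dx}\left( f - y'\,\frac{\partial f}{\partial y'} \right) = y' \left( \frac{\partial f}{\partial y} - \frac{d}{dx}\,\frac{\partial f}{\partial y'} \right) = 0. \tag{A.35}
$$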

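Thus

$$
f - y'\,\frac{\partial f}{\partial y'} = k, \tag{A.36}
$$

where $k$ is a constant. From $f = \sqrt{(1 + y'^2)/(2 g y)}$, we find

$$
\frac{\partial f}{\partial y'} = \frac{1}{\sqrt{2 g y}}\,\frac{y'}{\sqrt{1 + y'^2}} = \frac{y'}{1 + y'^2}\,f \tag{A.37}
$$

and

$$
f - y'\,\frac{\partial f}{\partial y'} = f \left( 1 - \frac{y'^2}{1 + y'^2} \right) = \frac{f}{1 + y'^2} = \frac{1}{\sqrt{2 g y\,(1 + y'^2)}} = k. \tag{A.38}
$$

Rearranging Eq. (A.38) gives

$$
y\,(1 + y'^2) = 2a, \tag{A.39}
$$

where $a = 1/(4 g k^2)$. Without loss of generality, we may set $y' = \cot(\theta/2)$, where $\theta$ is a parameter, so that $1 + y'^2 = 1/\sin^2(\theta/2)$. Then we have from Eq. (A.39)

$$
y = \frac{2a}{1 + y'^2} = 2a \sin^2(\theta/2) = a\,(1 - \cos\theta) \tag{A.40}
$$

and

$$
\frac{dx}{d\theta} = \frac{1}{y'}\,\frac{dy}{d\theta} = a \tan(\theta/2) \sin\theta = a\,(1 - \cos\theta). \tag{A.41}
$$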

Integrating Eq. (A.41) with respect to $\theta$, subject to the boundary condition $x = 0$ at $\theta = 0$, yields

$$
x = a\,(\theta - \sin\theta). \tag{A.42}
$$

From Eqs. (A.40) and (A.42), we obtain the parametric equation for the path $y(x)$ that yields the shortest time for the ball to descend from A to B:

$$
\begin{cases}
x = a\,(\theta - \sin\theta) \\
y = a\,(1 - \cos\theta)
\end{cases}
\tag{A.43}
$$

The parameters $a$ and $\theta_f$ can be found from the condition that this path must end at point $B(x_f, y_f)$:

$$
\begin{cases}
x_f = a\,(\theta_f - \sin\theta_f) \\
y_f = a\,(1 - \cos\theta_f)
\end{cases}
\tag{A.44}
$$
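The end-point conditions of Eq. (A.44) have no closed-form solution, but they are easy to solve numerically. The sketch below divides the two conditions to eliminate $a$ and finds $\theta_f$ by bisection on $0 < \theta_f < 2\pi$, where the ratio $(\theta - \sin\theta)/(1 - \cos\theta)$ increases monotonically. The curve of Eq. (A.43) is the one commonly called a cycloid. The end point `(1, 1)`, the value `G = 9.81`, the helper name `solve_cycloid`, and the closed form $t = \theta_f\sqrt{a/g}$ for the descent time (obtained here by inserting Eqs. (A.40) and (A.41) into Eq. (A.27)) are assumptions or additions of this example rather than statements from the text.

```python
import math

G = 9.81  # assumed gravitational acceleration in m/s^2


def solve_cycloid(x_f, y_f, tol=1e-12):
    """Solve Eq. (A.44) for (a, theta_f) by bisection on theta in (0, 2*pi).

    Dividing the two conditions gives
        (theta - sin(theta)) / (1 - cos(theta)) = x_f / y_f,
    whose left-hand side grows monotonically with theta on this interval.
    Very small x_f / y_f would push theta toward 0, where the ratio loses
    floating-point accuracy; that regime is not handled here.
    """
    target = x_f / y_f
    lo, hi = 1e-9, 2.0 * math.pi - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        ratio = (mid - math.sin(mid)) / (1.0 - math.cos(mid))
        if ratio < target:
            lo = mid
        else:
            hi = mid
    theta_f = 0.5 * (lo + hi)
    a = y_f / (1.0 - math.cos(theta_f))
    return a, theta_f


X_F, Y_F = 1.0, 1.0  # assumed position of point B (y measured downward)
a, theta_f = solve_cycloid(X_F, Y_F)

# Descent time along the optimal path versus a straight incline.
t_cycloid = theta_f * math.sqrt(a / G)
t_line = math.hypot(X_F, Y_F) * math.sqrt(2.0 / (G * Y_F))


def first_integral(theta):
    """Left-hand side of Eq. (A.39), y (1 + y'^2), with y' = cot(theta/2)."""
    y = a * (1.0 - math.cos(theta))
    slope = 1.0 / math.tan(0.5 * theta)
    return y * (1.0 + slope * slope)
```

Evaluating `first_integral` at several values of $\theta$ returns the same constant $2a$, and `t_cycloid` comes out smaller than `t_line`, consistent with the path of Eq. (A.43) being the one of shortest descent.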