<<

Remarks on The Legendre Transform

Arthur Jaffe September 28, 2018

I Background

The Legendre transform L takes a f defined on a D into another convex function Lf defined on a convex set LD. In particular (Lf)(p) = sup (hp, vi − f(v)) . (1) v∈D For a convex function f on a convex set, the Legendre transform has period two, namely

2 L f = f . (2) The Legendre transform is non-linear (unlike the Fourier or Laplace transforms). It has interesting, if subtle, properties. In particular, the non- makes the Legendre transformation more difficult to deal with, but nevertheless the transformation is very useful. In these notes we consider an elementary form of the Legendre transform, in which we transform a function f of a single, vector-valued variable v. In certain fields of one encounters “higher order” Legendre transforms, associated with functions of more than one vector-valued variable.

I.1 Convex Sets and Convex Functions A set D is said to be convex, if given two points x, y ∈ D, the interpolating line tx+(1−t)y ∈ D for 0 ≤ t ≤ 1. A real-valued function f(v) defined for v in a convex set D is convex, if

f(tv + (1 − t)w) 6 tf(v) + (1 − t)f(w) , for all v, w ∈ D , and 0 6 t 6 1 . (3) In other words, the graph of the function lies below a straight line interpolating between the value of the function at two given points. One says that f is strictly convex if the inequality (3) is strict for all v 6= w and t ∈ (0, 1).

1 The simplest case occurs when D ⊂ R is a closed on the line and the function f is twice differentiable. Then taking t = 1/2, multiplying by (v − w)−2 and letting w → v 00 in (3) shows that convexity ensures f (v) > 0. In fact positivity of the second is also sufficient for f to be convex. The second derivative test gives a straightforward way to determine whether a real-valued function f(v) of one variable is convex. In the case of a real-valued function of many variables, one needs to generalize the second derivative test to a condition on the matrix of possible second . If v ∈ RN , consider the N × N real, symmetric matrix H with entries ∂2f(v) Hij(v) = . ∂vi∂vj

The matrix H is called the of f(v). This matrix is real and symmetric, so 00 H has real eigenvalues. The condition that generalizes 0 6 f , is the condition that all the eigenvalues of H are non-negative. One says that H is “positive” and writes

0 6 H . (4)

N 1 Equivalently, for any vector a ∈ R , the expectation of H in a satisfies, 0 6 ha, Hai. Generally we want this inequality to be strictly positive unless a = 0. In that case every eigenvalue of H is strictly positive, and the matrix H is said to be “positive definite” which one writes as 0 < H , (5) or 0 < ha, Hai unless a = 0. Any positive definite matrix has an inverse.

II The Legendre Transform and its Dual Domain

We start from a convex domain D and a real-valued, convex function f(v) defined for v ∈ D. Then one defines the Legendre transform Lf of f as (1), namely

(Lf)(p) = sup (hp, vi − f(v)) . v∈D

PN Here hp, vi = j=1 pj vj. We can think of the domain LD of Lf as a dual to the domain D. This is just the set of p such that (1) is well defined (finite).

1 P ~ P Here we have dropped using the vector sign on a and write ha, bi = j ajbj instead of ~a · b = j ajbj. Both notations are common, and for real vectors the two notations are interchangeable. In the case of P complex vectors, one generally defines ha, bi = j a¯jbj, wherea ¯j denotes the complex conjugate of aj.

2 We claim that LD is a convex set and that Lf is a convex function on LD. Let p1, p2 ∈ LD and take 0 6 t 6 1. Define p(t) = tp1 + (1 − t)p2. Then

hp(t), vi − f(v) = t (hp1, vi − f(v)) + (1 − t)(hp2, vi − f(v)) . Taking the sup over v then yields

sup (hp(t), vi − f(v)) 6 tLf(p1) + (1 − t)Lf(p2) . (6) v∈D

As the right side is bounded, this shows (Lf)(p(t)) is well-defined by the left side of (6). Hence p(t) ∈ LD, and LD is a convex set. Furthermore (6) shows that

(Lf)(p(t)) 6 t(Lf)(p1) + (1 − t)(Lf)(p2) , (7) so the Legendre transform Lf is a convex function on LD.

II.1 An Interesting Inequality

Note that at the point where (1) holds, the function (Lf)(p) is the maximum of the function hp, vi − f(v). But at any other point v, one still has the relation

hp, vi 6 f(v) + (Lf)(p) . (8) This inequality enters some applications of the Legendre transform.

II.2 Another Formula for Lf Since the function (Lf)(p) maximizes the expression hp, vi − f(v) as a function of v, any point v = v0 for which the maximum occurs is a critical point of hp, vi − f(v). In other words,

∂ (hp, vi − f(v)) = 0 , j = 1,...,N. ∂v j v=v0

In case f is strictly convex, then there is a unique point v0 that maximizes this expression. One simply writes ∂f(v) (Lf)(p) = hp, vi − f(v) , where pj = , for j = 1,...,N. (9) ∂vj This is the form of the transformation from the Lagrangian to the Hamiltonian, where we suppress the dependence of both sides on the variables q and possibly t. In this case f(v) = L(q, v, t), where L is the Lagrangian and LL = H, the Hamiltonian.

3 II.3 Remark about H = 0 ∂2f Note that in case the function f(v) is linear, we then have = ij ≡ 0 for all v. In ∂vi∂vj H ∂f this case (and likewise in the case that f is piecewise linear) the equations pj = cannot ∂vj be solved for vj as a function of p. Thus we cannot use the definition (9) of the Legendre transform. But we can still use the definition (1), and we take that one.

0 II.4 The Relation HH =Id There is a close relationship between the Hessian matrix H(v) of f(v) and the Hessian matrix H0(p) of the Legendre transform (Lf)(p) of f. In fact when one evaluates H0(p) at the corresponding point p = ∇vf(v), these matrices are inverses of one another, namely −1 H(v) = H0(p) . In order to see this, notice that

2 2 ∂ f(v) 0 (∂ Lf)(p) Hjk(v) = , and Hjk(p) = . (10) ∂vj∂vk ∂pj∂pk

At the point ∂f(v) pj = , (11) ∂vj one thus has 2 ∂ f(v) ∂pj Hjk(v) = = . (12) ∂vj∂vk ∂vk The relation (12) states that the Hessian H(v) of f(v) is the Jacobian matrix of the transfor- mation v 7→ p at the point (11). It also shows that the Jacobian matrix equals its transpose. Also at the point (11), one has (Lf)(p) = hp, vi − f(v), so

(∂ f)(p) X ∂vj X ∂f(v) ∂vj L = v + p − = v . (13) ∂p i j ∂p ∂v ∂p i i j i j j i

∂H(q, p, t) Notice that in the Lagrangian-Hamiltonian case, relation (13) saysq ˙i = . From ∂pi (13) we infer that 2 0 ∂ (Lf)(p) ∂vi Hij(p) = = . (14) ∂pi∂pj ∂pj So the relation (14) says that the Hessian H0(p) is the Jacobian matrix of the inverse trans- formation p 7→ v.

4 Thus at the point (11) the two Hessian matrices H(v) and H0(p) are the Jacobian matrices of inverse transformations. Therefore the two Hessian matrices are inverses of each other. In other words at the point (11),

X X ∂vi ∂pj ∂vi ( 0 ) = 0 (p) (v) = = = δ . (15) H H ik Hij Hjk ∂p ∂v ∂v ik j j j k k

II.5 The Relation L2f = f The key to showing that the Legendre transform squares to the identity is the relation (13). For if this relation holds, then one can evaluate the supremum of the convex function of p, namely hv, pi − (Lf)(p) , (16)

∂(Lf)(p) 2 at the point where vi = , to obtain the function (L f)(v). Assuming that we can ∂pi invert the relationship v 7→ p, the point (13) is the same as the point (11). But at the point (11), one has (Lf)(p) = hp, vi − f(v). Thus as hp, vi = hv, pi, we conclude that

2 (L f)(v) = sup (hv, pi − ((Lf)(p)) = f(v) . (17) p

In the context of , performing the Legendre transform on the variable q˙ ↔ p we have,

(LL)(q, q,˙ t) = H(q, p, t) , and (LH)(q, p, t) = L(q, q,˙ t) . (18)

III Some Examples

Here are several instances of the Legendre transformation in practice. We give no explanation here, but only suggest that these examples indicate that the Legendre transformation is ubiquitous.

• Classical Mechanics: The correspondence between the Lagrangian L(q, q˙) and the Hamiltonian H(q, p). For fixed q, the Legendre transform acts on the variable v =q ˙.

: The correspondence between the and the . In thermodynamics, one defines the Helmholz free energy F (T,V ) as a function of temperature T and V . On the other hand, the Gibbs

5 free energy G(T,P ) depends on temperature T and pressure P . The relationship be- tween the two is a Legendre transformation,

G(T,P ) = PV + F (T,V ) . (19)

One generally eliminates V on the right side, by solving for V = V (P,T ).

, H¨older’sinequality: Suppose 1 < α, β are related by α−1 +β−1 = 1. R α 1/α For kfkα = |f| dx , H¨older’sinequality states that

kfgk1 6 kfkα kgkβ . (20) The Schwarz inequality is the case α = β = 2. 1 α 1 β Suppose 0 6 v, p are positive numbers, and define f(v) = α v . Then (Lf)(p) = β p , and the Legendre transform inequality (8) says that 1 1 vp vα + pβ . (21) 6 α β For α = β = 2, this says that the geometric mean is bounded by the arithmetic mean. Replace v and p by their positive square root, so √ 1 vp (v + p) . 6 2

To obtain (20), replace f(x) by f(x)/kfkα and g(x) by g(x)/kgkβ. Thus one sees that it is sufficient to establish (20) in case that kfkα = kgkβ = 1, namely when the right side of (20) is 1. Note that (21) ensures that 1 1 |f((x)g(x)| |f(x)|α + |g(x)|β . 6 α β

Integrating this inequality over x and using kfkα = kgkβ = 1, yields 1 1 1 1 kfgk kfkα + kgkβ = + = 1 . 1 6 α α β β α β

• Statistical Mechanics and Quantum Fields: The relation between connected Feynman diagrams and “one-particle irreducible” diagrams.

6