<<

Classical Mechanics

Prof. Dr. Alberto S. Cattaneo and Nima Moshayedi

January 7, 2016

Preface

This script was written for the course Classical Mechanics, taught for mathematicians at the University of Zurich. The course was given by Professor Alberto S. Cattaneo in the spring semester 2014. I want to thank Professor Cattaneo for giving me his notes from the lecture and also for corrections and remarks on them. I also want to mention that this script is only meant as a set of notes, which gives all the definitions and results in a compact way; it should not replace the lecture. Not every detail is written in this script, so one should either read it together with another book on Classical Mechanics, or use it in parallel to a lecture on Classical Mechanics. This course also gives an introduction to smooth manifolds and combines the mathematical methods of differentiable manifolds with those of Classical Mechanics.

Nima Moshayedi, January 7, 2016

Contents

1 From Newton's Laws to Lagrange's equations
  1.1 Introduction
  1.2 Elements of Newtonian Mechanics
    1.2.1 Newton's Apple
    1.2.2 Energy Conservation
    1.2.3 Phase Space
    1.2.4 Newton's Vector Law
    1.2.5 Pendulum
    1.2.6 The Virial Theorem
    1.2.7 Use of Hamiltonian as a Differential equation
    1.2.8 Generic Structure of One-Degree-of-Freedom Systems
  1.3 Calculus of Variations
    1.3.1 Functionals and Variations
    1.3.2 Extremals
    1.3.3 Shortest Path
    1.3.4 Multiple Functions
    1.3.5 Symmetries and Conservation Laws
  1.4 The Action principle
    1.4.1 Coordinate-Invariance of the Action Principle
    1.4.2 Central Force Field Orbits
    1.4.3 Systems with Constraints and Lagrange Multipliers

2 Differential forms
  2.1 Notations
  2.2 Definitions
    2.2.1 The wedge product
    2.2.2 The exterior derivative
    2.2.3 The Pullback
    2.2.4 The Lie derivative
    2.2.5 The contraction
  2.3 Properties

3 Hamiltonian systems
  3.1 Introduction
  3.2 Legendre Transform
    3.2.1 Derivatives and Convexity
    3.2.2 Involution
    3.2.3 Total Differential of Legendre Transform
    3.2.4 Local Legendre Transformation
    3.2.5 Multivariable Case
  3.3 Canonical Equations


    3.3.1 Hamiltonian Function
    3.3.2 Canonical Action Principle
    3.3.3 Previous Examples in Canonical Form
  3.4 The Poisson bracket
    3.4.1 Constants of motion
    3.4.2 The Poisson bracket in coordinate-free language

4 Symplectic integrators
  4.1 Introduction
  4.2 The Euler method
  4.3 Hamiltonian systems

5 The Noether Theorem
  5.1 Introduction
  5.2 Symmetries in the Lagrangian formalism
    5.2.1 Symmetries and the Lagrangian function
    5.2.2 Examples
    5.2.3 Generalized symmetries
  5.3 From the Lagrangian to the Hamiltonian formalism
  5.4 Symmetries in the Hamiltonian formalism
    5.4.1 Symplectic geometry
    5.4.2 The Kepler problem

6 The Hamilton–Jacobi equation
  6.1 Introduction
  6.2 The Hamilton–Jacobi equation
  6.3 The action as a function of endpoints
  6.4 Solving the Cauchy problem for the Hamilton–Jacobi equation
  6.5 Generating functions

7 Introduction to Differentiable Manifolds
  7.1 Introduction
  7.2 Manifolds
  7.3 Maps
    7.3.1 Submanifolds
  7.4 Topological manifolds
  7.5 Differentiable manifolds
    7.5.1 The space
    7.5.2 The
  7.6 Vector bundles
    7.6.1 Constructions on vector bundles
    7.6.2 Differential forms
  7.7 Applications to mechanics
    7.7.1 The Noether 1-form
    7.7.2 The Legendre mapping
    7.7.3 The Liouville 1-form
    7.7.4 Symplectic geometry

Appendices

Appendix A Topology and Derivations
  A.1 Topology
  A.2 Derivations

Appendix B Vector fields as derivations

Bibliography

Chapter 1

From Newton’s Laws to Lagrange’s equations

1.1 Introduction

Classical mechanics is a very peculiar branch of physics. It used to be considered the sum total of our theoretical knowledge of the physical universe (Laplace's daemon, the Newtonian clockwork), but now it is known to be an idealization, a toy model if you will. Classical Mechanics still describes the world pretty well within its range of validity, which is for example that of our everyday experience. So it is still an indispensable part of any physicist's or engineer's education. It is so useful because the more accurate theories that we know of (general relativity and quantum mechanics) make corrections to classical mechanics generally only in extreme situations (black holes, neutron stars, atomic structure, superconductivity, and so forth). Given that GR and QM are much harder theories to use and apply, it is no wonder that scientists revert to classical mechanics whenever possible. So, what is classical mechanics?

1.2 Elements of Newtonian Mechanics

In the title classical means that there are no quantum effects. The simplest mechanical system is a mass point, which is a single point moving in space that has a finite mass m attached to it. You can think of the matter field belonging to a mass point as a delta-function in space: an infinitely concentrated, featureless lump of matter. The equation of motion for the mass point comes from physics and is expressed by Newton’s law:

force = mass × acceleration, $F = ma$. Our first mass point system comes right at the beginning of mechanics: it is Newton's apple.

1.2.1 Newton's Apple

Newton's apple is a mass point with mass m and vertical position z(t) at time t. Here z is a Cartesian coordinate pointing upwards. From physics it is known that the gravitational force on the apple is given by mg and points downwards. Here g is the gravitational acceleration with typical value $9.81\,\mathrm{m\,s^{-2}}$. Thus Newton's law for the apple is

\[ m\ddot z = -mg \iff \ddot z + g = 0, \]
where the dot denotes a time derivative. We see that the mass of the apple does not affect its motion in the gravitational field. What is the solution to the equation given above? This is

a second-order ODE in time, so the solution to the initial-value problem requires specifying two initial conditions. In our case these are given by the initial position z(0) and the initial velocity $\dot z(0)$. Given these two numbers the solution to the above ODE is

\[ z(t) = z(0) + \dot z(0)\,t - \frac{g}{2}t^2, \]
which is the equation for a parabola that you may recognize. It is worth reflecting about what we have done so far. Newton's law tells us how to evolve a mechanical system in time. More specifically, let us define the state of our system as the collection of variables that completely specifies the conditions of our system at a moment in time. This is a key definition in mechanics. In the present case we have that

state = {position, velocity} = {z, z˙}

because these were the initial conditions needed for our ODE. If our ODE is well posed for t ∈ [0,T ] with some T > 0 then there is a unique map such that

state(0) −→ state(t)

for all t ∈ [0,T ]. Thus, in principle, the present state contains all the information needed to compute any future state; in other words the classical mechanical universe is deterministic and the future can in principle be predicted by solving a well-posed differential equation.
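
As a quick sanity check of this determinism statement, the following Python sketch (my own illustration, not part of the original script; the initial values are arbitrary) integrates $\ddot z = -g$ numerically from a given state $(z(0), \dot z(0))$ and compares the result with the parabola above.

```python
import numpy as np
from scipy.integrate import solve_ivp

g = 9.81            # gravitational acceleration [m/s^2]
z0, v0 = 1.0, 2.0   # initial state {z, zdot}

# Newton's law z'' = -g written as a first-order system for the state (z, zdot)
def rhs(t, state):
    z, v = state
    return [v, -g]

T = 1.0
sol = solve_ivp(rhs, (0.0, T), [z0, v0], dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(0.0, T, 5)
z_numeric = sol.sol(t)[0]
z_exact = z0 + v0 * t - 0.5 * g * t**2   # the parabola derived above

print(np.max(np.abs(z_numeric - z_exact)))  # ~1e-10: same state, same future
```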

1.2.2 Energy Conservation

To derive the energy conservation law from the solution of our ODE, we use a standard procedure: multiply the equation of motion by the velocity $\dot z$ and manipulate. This yields

\[ \dot z\ddot z + g\dot z = 0 \iff \frac{d}{dt}\left(\frac{\dot z^2}{2} + gz\right) = 0, \]
which is a conservation law for the energy function

\[ H(z, \dot z) = \frac{\dot z^2}{2} + gz = \mathrm{const} = E. \]
This function is defined up to an integration constant. The meaning of it is that the energy function $H(z,\dot z)$, which is called the Hamiltonian, is constant if z(t) satisfies Newton's law. Other ways of saying the same thing are: H is an invariant of the motion or H is a first integral of Newton's law. The value of H along a trajectory is denoted by E and is of course known from the initial state. In physics, the $\dot z^2/2$ part of H is called the kinetic energy and the remainder is called the potential energy. The 1-dimensional potential energy is always given by a function $V : \mathbb{R} \to \mathbb{R}$ of the given coordinate, noted V(z) if the coordinate is z, which is not always the same for different mechanical systems.

Definition 1.2.1 (Energy function (one dimensional)). Let z be the position of a point mass with mass m. The kinetic energy of the particle is given by $T = \frac{1}{2}m\dot z^2$ and the potential energy is a function $V : \mathbb{R} \to \mathbb{R}$, denoted V(z). Then the function

\[ H(z, \dot z) = T + V = \frac{1}{2}m\dot z^2 + V(z) \]
is called the Hamiltonian energy function of the system.
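
For illustration only (not from the script), here is a minimal numerical check that H is indeed constant along a trajectory of the falling apple; the mass and initial data are arbitrary choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

m, g = 0.2, 9.81

def rhs(t, state):           # m*z'' = -m*g, i.e. z'' = -g
    z, v = state
    return [v, -g]

sol = solve_ivp(rhs, (0.0, 2.0), [1.0, 3.0], max_step=1e-3)
z, v = sol.y
H = 0.5 * m * v**2 + m * g * z    # Hamiltonian energy T + V with V(z) = m*g*z
print(H.max() - H.min())           # tiny: H is (numerically) constant along the solution
```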

1.2.3 Phase Space

The dynamics of our mechanical system is best visualized in phase space, which is the space spanned by the two state coordinates z and $\dot z$. Phase space is attractive for the following reasons:

• any possible state of the system corresponds to a specific point in phase space, so phase space is also the space of all possible states;

• a solution traces out a trajectory in phase space, and these trajectories do not cross if Newton’s law is well posed;

• If the energy is conserved, then the trajectories are contained in the contours (or level sets) of constant H.

The last point means that plotting contours of constant H immediately produces the solution trajectories, albeit not their parametrization with respect to time t. This is very useful, because it allows us to learn something about the solution to an equation without solving the equation! That is what phase-space analysis is all about. For Newton's apple we note that all trajectories are open and that there are no fixed points, i.e., no points at which $\dot z = \ddot z = 0$. By inspection, such a fixed point would correspond to a critical point of H, i.e., a point where $\nabla H = 0$.
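
As a small illustration (my own, not from the script; it anticipates the pendulum Hamiltonian derived in Section 1.2.5), the following sketch plots level sets of H, which are exactly the phase-space trajectories.

```python
import numpy as np
import matplotlib.pyplot as plt

g, l = 9.81, 1.0
phi, phidot = np.meshgrid(np.linspace(-2 * np.pi, 2 * np.pi, 400),
                          np.linspace(-8.0, 8.0, 400))
H = 0.5 * phidot**2 - (g / l) * np.cos(phi)   # pendulum Hamiltonian (Section 1.2.5)

plt.contour(phi, phidot, H, levels=30)        # contours of constant H = trajectories
plt.xlabel(r"$\varphi$")
plt.ylabel(r"$\dot\varphi$")
plt.title("Trajectories as level sets of H")
plt.show()
```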

1.2.4 Newton's Vector Law

In general a mass point moves in three-dimensional space and its position is described by the Cartesian coordinates $\mathbf{x} = (x, y, z)$. Newton's law is then the vector law

\[ m\ddot{\mathbf{x}} = \mathbf{F}(\mathbf{x}, \dot{\mathbf{x}}, t) \]
subject to given $\mathbf{x}(0)$ and $\dot{\mathbf{x}}(0)$. Here the prescribed force vector F may in general depend on the state $\{\mathbf{x}, \dot{\mathbf{x}}\}$ and on time t. However, we will consider only energy-conserving forces, i.e. forces that derive from a time-independent scalar potential energy function $V(\mathbf{x})$ via $\mathbf{F} = -\nabla V$. For instance, the gravitational potential associated with our ODE at the beginning was V = mgz. The vector character of the above equation means that Newton's law for a mass point corresponds to a system of three coupled ODEs in time and that the phase space is six-dimensional (two dimensions, position and velocity, for each spatial direction). Each coordinate adds a position-velocity pair to the definition of the system state. The number of coordinates is often called the number of degrees of freedom, and so the mass point has three degrees of freedom in general. In our apple example, the two degrees of freedom related to the horizontal directions were irrelevant because Newton's law reduced to $\ddot x = \ddot y = 0$ in them. Therefore, the system reduced to a single degree of freedom. Another kind of reduction occurs through kinematical constraints, as the next example shows.

1.2.5 Pendulum

A pendulum consists of a mass point with mass m attached to a rod with length l. The pendulum lies in the xz-plane and is fixed at the coordinate origin $\mathbf{x} = 0$. There are two degrees of freedom for the position of the mass point, namely, x(t) and z(t), but they have to satisfy the constraint

\[ x^2 + z^2 = l^2 \]
at all times. This can be used to eliminate one degree of freedom from consideration. Indeed, using the angle $\varphi$ such that $x = l\sin\varphi$ and $z = -l\cos\varphi$ we will derive the equation
\[ \ddot\varphi + \frac{g}{l}\sin\varphi = 0, \]

where g is gravity as before. This nonlinear equation is harder to solve than the one before, but fortunately we can get most information about the solution from the Hamiltonian function. The state of the constrained system is described by the phase space coordinates $\{\varphi, \dot\varphi\}$ and energy conservation is derived exactly as before and yields the pendulum Hamiltonian

\[ H(\varphi, \dot\varphi) = \frac{\dot\varphi^2}{2} - \frac{g}{l}\cos\varphi. \]
Clearly finding the potential energy amounts to setting the force in Newton's law equal to $-\frac{dV}{d\varphi}$ and integrating. In systems with more than one degree of freedom this works only (locally) if the vector force $\mathbf{F}(\mathbf{x})$ satisfies the integrability condition $\nabla\times\mathbf{F} = 0$. For small-amplitude oscillations $|\varphi| \ll 1$ and therefore the potential energy term can be approximated by the first few terms of its Taylor series. Keeping only the first non-constant term leads to the linear harmonic oscillator equations

\[ H = \frac{\dot\varphi^2}{2} + \frac{g}{l}\frac{\varphi^2}{2} \quad\text{and}\quad \ddot\varphi + \frac{g}{l}\varphi = 0, \]
with the simple general solution $\varphi = A\cos(\Omega t) + B\sin(\Omega t)$, where the frequency $\Omega = \sqrt{g/l}$. In phase space the contours of this H are ellipses and therefore all orbits are bounded, which is consistent with the small-amplitude, low-energy approximation.

1.2.6 The Virial Theorem

The pendulum allows us to demonstrate a second useful technique for extracting some knowledge about the solution from the governing equations without solving them. The procedure is similar to finding the energy conservation law, but with the difference that this time we multiply the ODE for the pendulum by $\varphi$ instead of $\dot\varphi$. Also, we then time-average the equation over the interval $t \in [0, T]$. This first yields

\[ \varphi\ddot\varphi + \frac{g}{l}\varphi\sin\varphi = 0 \iff \frac{d}{dt}(\varphi\dot\varphi) - \dot\varphi^2 + \frac{g}{l}\varphi\sin\varphi = 0 \]
and then

\[ \frac{1}{T}\Big[\varphi\dot\varphi\Big]_0^T + \frac{1}{T}\int_0^T \left(-\dot\varphi^2 + \frac{g}{l}\varphi\sin\varphi\right) dt = 0. \]
The first term is evaluated at the endpoints of the time integral. Now, under the assumption that $\varphi\dot\varphi$ is bounded this first term goes to zero as $T \to \infty$. If we denote the time average as $T \to \infty$ by an overbar $\overline{(\cdots)}$ then we obtain the virial theorem.

\[ \overline{\dot\varphi^2} = \frac{g}{l}\,\overline{\varphi\sin\varphi}. \]
In general, this shows a relationship that has to be true for all motions satisfying the assumption that $\varphi\dot\varphi$ is bounded. In particular, for small-amplitude oscillations the right-hand side reduces to twice the averaged potential energy. This shows that for small oscillations there is equipartition of energy between its kinetic and potential forms. When does the virial theorem apply? In general, the virial theorem is guaranteed to apply if the conservation law H = E can be used to derive an a priori bound on $\varphi$ and $\dot\varphi$. For the pendulum this occurs if $H < E_c$, i.e. for the case of bounded orbits, where $E_c$ is the critical energy. It also applies to the case $H = E_c$, because of the infinite travel time between saddle points that we noted before. However, it does not apply to the high-energy revolutions $H > E_c$, which repeatedly spin over the top such that $|\varphi|$ grows without bound.
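
A numerical check of the virial relation for a bounded pendulum oscillation may help; this is my own sketch with illustrative parameters, not part of the script. The two long-time averages should agree up to the O(1/T) boundary term.

```python
import numpy as np
from scipy.integrate import solve_ivp

g, l = 9.81, 1.0
T = 200.0

def pendulum(t, y):
    phi, phidot = y
    return [phidot, -(g / l) * np.sin(phi)]

t = np.linspace(0.0, T, 200001)
sol = solve_ivp(pendulum, (0.0, T), [1.2, 0.0], t_eval=t, rtol=1e-9, atol=1e-9)
phi, phidot = sol.y

lhs = np.mean(phidot**2)                       # time average of phidot^2
rhs = (g / l) * np.mean(phi * np.sin(phi))     # (g/l) times average of phi*sin(phi)
print(lhs, rhs)                                # close; agreement improves for larger T
```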

1.2.7 Use of Hamiltonian as a Differential equation

The invariance of the function $H(\varphi, \dot\varphi)$ along a solution trajectory can be exploited to aid the integration of Newton's law. On a given trajectory the energy has value E and therefore the equation

\[ H(\varphi, \dot\varphi) = E \]
can be solved as a first-order ODE. This means the second-order ODE in Newton's law has been reduced to a first-order ODE by using the first integral H = E. In the pendulum case we obtain

\[ \frac{d\varphi}{dt} = \pm\sqrt{2\left(E + \frac{g}{l}\cos\varphi\right)} \iff \int\frac{d\varphi}{\sqrt{2\left(E + \frac{g}{l}\cos\varphi\right)}} = \pm\int dt. \]
The sign is determined from the initial conditions. This reduces the solution procedure to a quadrature, which in this case can be performed using elliptic functions. For instance, this can be used to compute the period of finite-amplitude oscillations. This use of H as an ODE foreshadows the Hamilton-Jacobi PDE we will encounter later.
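
For illustration (not from the script), the following sketch evaluates the quadrature numerically for an oscillation released from rest at an arbitrary amplitude $\varphi_0$ and compares it with the standard elliptic-integral expression $T = 4\sqrt{l/g}\,K(\sin^2(\varphi_0/2))$; note that SciPy's ellipk takes the parameter $m = k^2$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import ellipk

g, l = 9.81, 1.0
phi0 = 2.0                       # amplitude in radians, released from rest
E = -(g / l) * np.cos(phi0)      # H = E with phidot = 0 at the turning point

# quarter period = time to go from phi = 0 to phi = phi0 along the quadrature above
# (the integrand has an integrable square-root singularity at phi0; quad handles it)
integrand = lambda phi: 1.0 / np.sqrt(2.0 * (E + (g / l) * np.cos(phi)))
quarter, _ = quad(integrand, 0.0, phi0, limit=200)

period_quadrature = 4.0 * quarter
period_elliptic = 4.0 * np.sqrt(l / g) * ellipk(np.sin(phi0 / 2.0)**2)

print(period_quadrature, period_elliptic, 2 * np.pi * np.sqrt(l / g))
# the last number is the small-amplitude limit 2*pi*sqrt(l/g) for comparison
```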

1.2.8 Generic Structure of One-Degree-of-Freedom Systems

We can now summarize the structure of the equations for a generic coordinate q(t) satisfying Newton's law

\[ \ddot q + \frac{dV}{dq} = 0 \]
with potential energy function V(q). Note that we set the constant mass m = 1. The state of the system is determined by $\{q, \dot q\}$ and the Hamiltonian

\[ H(q, \dot q) = T + V = \frac{\dot q^2}{2} + V(q) \]
is conserved if q is a solution of Newton's equation. Here T is a common shorthand for the kinetic energy. Newton's law expresses that the particle is accelerated towards decreasing values of the potential V(q). In general, this means acceleration towards a minimum of V, should one exist. In the case of Newton's apple there was no minimum and the push goes on forever. In the case of the pendulum minima of V occur at $\varphi = 0$ and its 2π-periodic repetitions. Upon reaching a minimum the particle overshoots due to its kinetic energy and then it climbs the potential on the other side. If this climbing motion is reversed before a maximum of V is reached then the particle turns back towards the minimum and an oscillation is observed. This is the low-energy case of the pendulum. If the motion goes over the next maximum of V then the particle aims for the neighboring minimum. This is the high-energy case of the pendulum going over the top. Turning of the particle corresponds to zero kinetic energy and energy conservation implies that V(q) = E at a turning point. Depending on the value of E this equation may or may not have a solution. For a given value of the energy H = E we can obtain the first-order equation

\[ \dot q = \pm\sqrt{2(E - V(q))} \quad\text{with quadrature}\quad \int\frac{dq}{\sqrt{2(E - V(q))}} = \pm\int dt. \]
Finally, if $q\dot q$ is bounded along a trajectory, then we have the virial theorem

\[ \overline{\frac{\dot q^2}{2}} = \frac{1}{2}\,\overline{q\frac{dV}{dq}} \]

for time-averaging over very long intervals on this trajectory. A useful special case arises if V is a power law $V = Aq^{2m}$ with A > 0 and integer m > 0. Then $q\dot q$ remains bounded by energy conservation and the virial theorem yields

\[ \overline{\frac{\dot q^2}{2}} = \overline{T} = m\overline{V}. \]
For m = 1 this shows equipartition of energy, as in the linear harmonic oscillator, but in any case it shows how the energy must, on average, be partitioned between its kinetic and potential forms. Like energy conservation, this is a powerful fact about the solution q(t) that can be derived without solving the governing equations.
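
As a quick check of this power-law statement, here is a minimal sketch (my own, with an arbitrary quartic potential as the test case) that integrates Newton's law and compares the long-time averages of T and mV.

```python
import numpy as np
from scipy.integrate import solve_ivp

A, m = 1.0, 2              # V(q) = A*q^(2m): a quartic potential for this test
T_final = 500.0

def rhs(t, y):             # q'' = -dV/dq = -2*m*A*q^(2m-1)
    q, qdot = y
    return [qdot, -2 * m * A * q**(2 * m - 1)]

t = np.linspace(0.0, T_final, 200001)
sol = solve_ivp(rhs, (0.0, T_final), [1.0, 0.0], t_eval=t, rtol=1e-9, atol=1e-9)
q, qdot = sol.y

T_avg = np.mean(0.5 * qdot**2)
V_avg = np.mean(A * q**(2 * m))
print(T_avg, m * V_avg)    # the two averages agree: T-bar = m * V-bar
```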

1.3 Calculus of Variations

Interesting things can be understood in at least two different ways. Mechanics can be understood from the point of view of Newton's law: solve a certain initial value problem for an ODE that takes us from the present state at t = 0 to a future state at t = T. The true path of the coordinate q(t), say, is then a solution to this initial-value problem for the differential equation called Newton's law. Now, there is an alternative, complementary point of view that looks at the entire path q(t) for all $t \in [0, T]$ simultaneously and then gives a criterion for the true path as the solution to an optimization problem. To get to this point of view we need to recall the calculus of variations.

1.3.1 Functionals and Variations

Consider a function y(x) defined on the interval $x \in [a, b]$. We will assume that y is always smooth enough to make possible all differential operations that we need to carry out.¹ Typically, it will be sufficient that y is twice continuously differentiable. Define the integral

\[ J[y(x)] = \int_a^b F(y, y', x)\,dx, \]
in which the function F is assumed to be sufficiently smooth in all its three variable slots to allow partial derivatives up to second order to exist. We write derivatives of F with respect to the arguments in its slots as partial derivatives. For example, if $F = x^2 y + (y')^2$ then

\[ \frac{\partial F}{\partial x} = 2xy, \quad \frac{\partial F}{\partial y} = x^2, \quad \frac{\partial F}{\partial y'} = 2y', \quad \frac{\partial^2 F}{\partial x\,\partial y} = \frac{\partial^2 F}{\partial y\,\partial x} = 2x \]
and so on. The number J depends on the whole function y(x) and we say that J is a functional of y(x); this relationship is denoted by the square brackets. Thus a functional maps a function y(x) to a single number J. This is a familiar concept, for instance the usual integral norms are functionals. The calculus of variations is concerned with the change in J if the function y(x) is subject to a small variation, i.e., if y(x) is changed by a small amount. This means that y(x) is replaced by

y(x) −→ y(x) + δy(x), where the variation δy(x) is a smooth function that is small in the sense that

\[ \|\delta y\|_\infty \ll 1 \quad\text{and}\quad \|\delta y'\|_\infty \ll 1. \]

¹Paraphrasing Einstein: y(x) should be as smooth as necessary for the problem at hand, but not any smoother. The general theory of the calculus of variations accommodates functions (and generalized functions) that are less regular than we assume.

Here the variation of $y'$ equals the derivative of $\delta y$, i.e.

\[ \delta y' = \frac{d}{dx}\delta y. \]
The change in J is then also small and can be computed from the Taylor expansion of F as

\[ J[y + \delta y] - J[y] = \int_a^b \big(F(y + \delta y,\, y' + \delta y',\, x) - F(y, y', x)\big)\,dx = \int_a^b \left( \delta y\,\frac{\partial F}{\partial y}(y, y', x) + \delta y'\,\frac{\partial F}{\partial y'}(y, y', x) \right) dx + o(\delta y, \delta y'). \]

Using $\delta y' = \frac{d}{dx}\delta y$ and an integration by parts, the integral can be rewritten as

\[ \delta J = \int_a^b \left( \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} \right)\delta y\,dx + \left[ \frac{\partial F}{\partial y'}\,\delta y \right]_a^b. \]
This expression is called the first variation of J around y(x) and it is usually denoted by $\delta J$. In general, it consists of an integral part and an endpoint part. The endpoint part vanishes if the admissible variations satisfy $\delta y(a) = \delta y(b) = 0$. For fixed y(x) the first variation $\delta J$ is a linear functional in $\delta y$, and it plays the same role here as does the differential in ordinary calculus.

1.3.2 Extremals

A function y(x) is an extremal of J if the first variation of J around y(x) vanishes for all $\delta y$ that vanish at the endpoints, i.e., $\delta J = 0$ for all $\delta y$ such that $\delta y(a) = \delta y(b) = 0$. This is only possible if the integrand multiplying $\delta y$ is zero everywhere (the fundamental lemma of the calculus of variations). Otherwise, we could choose a variation that is zero at the endpoints but makes the integral nonzero, which leads to $\delta J \neq 0$. Therefore an extremal must satisfy the celebrated Euler-Lagrange equation

Definition 1.3.1 (Euler-Lagrange equation (EL-equation)). The EL-equation for the extremal variational problem is given by

\[ \mathrm{EL}: \quad \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = \frac{\partial F}{\partial y}. \]

Therefore the EL-equation is typically a second-order ODE for y(x).

Remark 1.3.2. The boundary conditions depend on the admissible functions y(x). For instance, if y(a) and y(b) are fixed then these are the boundary conditions. If y(b) is not fixed then the vanishing of $\delta J$ for all $\delta y$ implies that $\frac{\partial F}{\partial y'} = 0$ at x = b. This is called the natural boundary condition for the variational problem. An analogous statement applies at the other endpoint at x = a.
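
For a concrete check (my own illustration, not from the script), the EL-equation can be produced symbolically; sympy provides euler_equations for exactly this purpose, applied here to the example integrand $F = x^2 y + (y')^2$ used above. The printed form of the output may vary slightly between sympy versions.

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.Symbol('x')
y = sp.Function('y')

# The example integrand from above: F = x^2*y + (y')^2
F = x**2 * y(x) + y(x).diff(x)**2

# Returns the Euler-Lagrange equation dF/dy - d/dx(dF/dy') = 0 for y(x)
print(euler_equations(F, [y(x)], [x]))
# -> [Eq(x**2 - 2*Derivative(y(x), (x, 2)), 0)], i.e. 2 y'' = x^2
```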

1.3.3 Shortest Path

A simple example is to find the shortest path between two points $(x_A, y_A)$ and $(x_B, y_B)$ in two-dimensional Euclidean space. Actually, we have not spoken about minima and maxima yet, and a good deal of work is needed to verify whether a given extremal corresponds to a minimum of the functional or not. Sometimes this is clear from the context, as is the case here. We assume that the curve can be parametrized by x, with $x_B > x_A$, and then the length functional can be written as

\[ J = \int_A^B ds = \int_A^B \sqrt{dx^2 + dy^2} = \int_{x_A}^{x_B} \sqrt{1 + y'^2}\,dx, \]
based on the function y(x) such that $y(x_A) = y_A$ and $y(x_B) = y_B$. Thus $F = \sqrt{1 + y'^2}$ and the EL equation is

\[ \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = 0 \;\Rightarrow\; \frac{\partial F}{\partial y'} = \frac{y'}{\sqrt{1 + y'^2}} = \mathrm{const}. \]
This implies that $y'$ is constant and therefore the extremal y(x) is a straight line through the two endpoints, as it should be. A variant of this problem allows the second endpoint to move freely in y at fixed $x_B$; i.e. $y(x_B)$ is not fixed. This brings into play the natural boundary condition at this point, i.e. $\frac{\partial F}{\partial y'} = 0$ at $x = x_B$. This implies that $y' = 0$ there. Thus in this case the extremal is a horizontal straight line connecting the first point $(x_A, y_A)$ to the second point $(x_B, y_B)$. The y-location of the second point is itself part of the solution. Clearly, here we found the shortest path between the first point and the line $x = x_B$.

1.3.4 Multiple Functions

The EL-equations generalize easily to functionals that depend on multiple functions. Specifically, for a functional that depends on N functions $y_n(x)$ with integrand $F(y_1, y_1', \dots, y_N, y_N', x)$, the EL-equations are

\[ n = 1, \dots, N: \quad \frac{d}{dx}\left(\frac{\partial F}{\partial y_n'}\right) = \frac{\partial F}{\partial y_n}. \]

Typically, this is a system of N coupled second-order ODEs for the functions yn(x). For instance, if we use a parametric representation of the path in the form (x(τ), y(τ)) with τ ∈ [0, 1] such that

(x(0), y(0)) = (xA, yA) and (x(1), y(1)) = (xB, yB), then we naturally have a length functional that depends on N = 2 functions as

\[ J[x(\tau), y(\tau)] = \int_0^1 F(\dot x, \dot y)\,d\tau = \int_0^1 \sqrt{\dot x^2 + \dot y^2}\,d\tau. \]
The dot denotes differentiation with respect to the parameter $\tau$. The first variation of J is
\[ \delta J = \int_0^1 \left[ \frac{d}{d\tau}\!\left(\frac{-\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right)\delta x + \frac{d}{d\tau}\!\left(\frac{-\dot y}{\sqrt{\dot x^2 + \dot y^2}}\right)\delta y \right] d\tau + \left[ \frac{\dot x\,\delta x + \dot y\,\delta y}{\sqrt{\dot x^2 + \dot y^2}} \right]_0^1. \]
The variations $\delta x$ and $\delta y$ are independent and therefore the necessary conditions for an extremal are the two EL-equations

\[ \frac{\partial F}{\partial \dot x} = \frac{\dot x}{\sqrt{\dot x^2 + \dot y^2}} = \mathrm{const} \quad\text{and}\quad \frac{\partial F}{\partial \dot y} = \frac{\dot y}{\sqrt{\dot x^2 + \dot y^2}} = \mathrm{const}. \]
These agree with the straight-line condition. If the endpoints are variable then we now have an interesting new possibility in the case where the initial point $(x_A, y_A)$ is fixed whilst the final point $(x_B(s), y_B(s))$ can vary along a smooth curve parametrized by s. This imposes the condition $(\delta x(1), \delta y(1)) = (x_B'(s), y_B'(s))\,ds$ on admissible endpoint variations, which implies that the critical path ends at a point $(x_B(s), y_B(s))$ such that

\[ \dot x(1)\,x_B'(s) + \dot y(1)\,y_B'(s) = 0. \]

Plugging everything together we get an equation for s and therefore for the endpoints (xB(s), yB(s)). The geometric interpretation of the last equation is simple: the shortest path must intersect the curve of possible endpoints with an angle of ninety degrees. This is a general property of the shortest path between a point and a smooth curve and it is consistent with the special case xB = const that we looked at before.

1.3.5 Symmetries and Conservation Laws

We return to the generic N = 1 functional and define a conservation law to be a function $G(y, y', x)$ that is constant along extremals of the functional. In other words, G is a first integral of the EL-equations, that is,

\[ \frac{dG}{dx} = \frac{\partial G}{\partial x} + y'\frac{\partial G}{\partial y} + y''\frac{\partial G}{\partial y'} = 0 \]
if y(x) satisfies the EL-equations. Conservation laws greatly simplify the problem of finding the extremal function y(x) in the first place, and they also allow us to understand something about the nature of the extremals without knowing them in detail. It turns out that many conservation laws can be linked to continuous symmetries of the functional J relative to transformation groups applied to x and y. The most general theorem to this effect is called Noether's theorem, but we will only use a few simple consequences of it. For instance, the distance functional is invariant under the continuous transformation group $(x, y) \mapsto (x, y) + (a, b)$ for arbitrary $a, b \in \mathbb{R}$ (this transformation includes the endpoints and the claimed invariance is then trivial). We say that J has a continuous symmetry with respect to translations in both x and y. This reflects the homogeneity of Euclidean space. By inspection, for a general functional based on the integrand $F(y, y', x)$ this kind of translational symmetry is possible only if F does not depend explicitly on either x or y: i.e. we can have $F(y')$ only, and we saw that this led to the conservation law $\frac{\partial F}{\partial y'} = \mathrm{const}$. In general, a translational symmetry in either x or y can be used to write down a generic conservation law. The form of the conservation law depends on whether the symmetry refers to a dependent or an independent variable. For example, a symmetry in the dependent variable y implies $F(y', x)$ and this leads directly to a generic conservation law for $G_y = \frac{\partial F}{\partial y'}$ because
\[ \frac{dG_y}{dx} = \frac{d}{dx}\left(\frac{\partial F}{\partial y'}\right) = \frac{\partial F}{\partial y} = 0 \]
by the EL-equations. Similarly, a symmetry in the independent variable x implies $F(y, y')$ and this also leads to a generic conservation law, but for the different quantity $G_x = y'\frac{\partial F}{\partial y'} - F$. This follows from

\[ \frac{dG_x}{dx} = y''\frac{\partial F}{\partial y'} + y'\frac{d}{dx}\frac{\partial F}{\partial y'} - \frac{\partial F}{\partial x} - y'\frac{\partial F}{\partial y} - y''\frac{\partial F}{\partial y'} = -\frac{\partial F}{\partial x} = 0. \]
In the distance example both these conservation laws reduce to $y' = \mathrm{const}$. Similar conservation laws are obtained for arbitrary $N > 1$. For instance, in the parametrized version of the distance problem we had N = 2 and the functional again had translational symmetries in x and y. However, in this case both x and y are dependent variables and therefore the associated conservation laws are simply the quantities $\frac{\partial F}{\partial \dot x}$ and $\frac{\partial F}{\partial \dot y}$. There is an additional symmetry in the independent parametrization variable $\tau$, but the associated conservation law for

\[ G_\tau = \dot x\frac{\partial F}{\partial \dot x} + \dot y\frac{\partial F}{\partial \dot y} - F \]

is irrelevant because $G_\tau$ is identically zero for arbitrary functions $x(\tau)$ and $y(\tau)$. This is a consequence of the fact that the parametrization of the curve is irrelevant.

1.4 The Action principle

We now return to Newton’s apple and rephrase its fate as a variational problem. Compared to the generic theory, we replace x by t, y(x) by z(t) and F by a function L to be defined as follows. The action integral is

\[ S[z(t)] = \int_0^T L(z, \dot z)\,dt = \int_0^T (T - V)\,dt = \int_0^T \left( \frac{\dot z^2}{2} - gz \right) dt, \]
where $L = T - V$ is the Lagrangian. Notice that L is not equal to H. It involves the difference of kinetic and potential energy. The following statement is called the action principle: Newton's law is the EL-equation for an extremal of the action integral relative to all trajectories that have a fixed initial point z(0) and a fixed terminal point z(T). The proof of this is immediate: the vanishing of the first variation implies the EL-equation

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot z} = \frac{\partial L}{\partial z} \quad\text{and}\quad \frac{\partial L}{\partial z} = -g, \;\; \frac{\partial L}{\partial \dot z} = \dot z \;\Rightarrow\; \ddot z = -g. \]
This is Newton's law. Because the endpoints are fixed there are no further terms to consider in $\delta S$. The peculiar thing is that the boundary conditions under which the trajectory of z(t) is an extremal of S are different from the initial conditions used to solve Newton's law. The former involve the position at both t = 0 and t = T whereas the latter involve the position and velocity simultaneously at time t = 0. The action functional S has a symmetry with respect to time because the Lagrangian L does not depend explicitly on t. This implies the conservation of

\[ \dot z\frac{\partial L}{\partial \dot z} - L = \frac{\dot z^2}{2} + gz = H(z, \dot z), \]
which is the familiar energy conservation law. Time symmetry implies energy conservation. Newton's apple immediately generalizes to a generic one-degree-of-freedom system with coordinate q(t). We have that the action

\[ S[q] = \int_0^T L(q, \dot q)\,dt = \int_0^T (T - V)\,dt = \int_0^T \left( \frac{\dot q^2}{2} - V(q) \right) dt \]
has vanishing first variation at the true path q(t) subject to fixed q(0) and q(T). In other words, the true path q(t) is an extremal of S over the space of all admissible functions satisfying the fixed endpoint conditions. The EL-equation is

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q} \;\Rightarrow\; \ddot q + V'(q) = 0. \]
The above remains true also for time-dependent potentials V(q, t).

1.4.1 Coordinate-Invariance of the Action Principle

There are a number of distinct properties of the action principle that make it attractive to use. First, the generic EL-equations are invariant under arbitrary coordinate transformations. This means that if we know that q(t) satisfies the generic EL-equation

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}, \]
then any other coordinate Q(t) such that $q = f(Q)$ for some function f also satisfies the generic EL-equation based on the transformed Lagrangian

\[ \tilde L(Q, \dot Q, t) = L(f(Q), f'(Q)\dot Q, t). \]
This transformed Lagrangian follows directly from the variable substitution and using $\dot q = f'(Q)\dot Q$. The partial derivatives

\[ \frac{\partial \tilde L}{\partial Q} = f'(Q)\frac{\partial L}{\partial q} + f''(Q)\dot Q\,\frac{\partial L}{\partial \dot q} \quad\text{and}\quad \frac{\partial \tilde L}{\partial \dot Q} = f'(Q)\frac{\partial L}{\partial \dot q} \]
combine to yield
\[ \frac{d}{dt}\frac{\partial \tilde L}{\partial \dot Q} = f'\frac{d}{dt}\frac{\partial L}{\partial \dot q} + f''\dot Q\,\frac{\partial L}{\partial \dot q} = f'\frac{\partial L}{\partial q} + f''\dot Q\,\frac{\partial L}{\partial \dot q} = \frac{\partial \tilde L}{\partial Q}. \]
We have indeed obtained the generic EL-equation for Q. Of course, the final ODE for Q(t) will differ from that for q(t). The point is that the procedure that leads to the ODE is the generic EL-equation, which is always formed in the same way. A second property of the action principle is that, in conservative force fields, it generalizes easily to a particle moving in three dimensions. Then Newton's law is the vector law

\[ \ddot{\mathbf{x}} + \nabla V(\mathbf{x}) = 0 \]
for the particle location $\mathbf{x} = (x, y, z)$. However the action S remains a scalar, with Lagrangian L given by

\[ L(\mathbf{x}, \dot{\mathbf{x}}) = \frac{1}{2}|\dot{\mathbf{x}}|^2 - V(\mathbf{x}). \]
The action principle then relates to independent variations of x(t), y(t) and z(t). The corresponding EL-equations are the three coupled ODEs

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot x} = \frac{\partial L}{\partial x}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot y} = \frac{\partial L}{\partial y}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot z} = \frac{\partial L}{\partial z}. \]
This is already easy to solve in Cartesian coordinates but the key point is that these equations remain valid in arbitrary curvilinear coordinates $(q_1, q_2, q_3)$. This is because of the coordinate-invariance of the EL-equations that was already noted. The same is not true for Newton's law, which needs to be reformulated in curvilinear coordinates according to the rules of vector calculus, usually a tedious task. An example will make this clear.

1.4.2 Central Force Field Orbits

Consider a very heavy point mass sitting at the origin of a three-dimensional Cartesian coordinate system. A very light particle is orbiting it with position $\mathbf{x}(t)$. We neglect the motion of the heavy mass and want to compute the motion of the light mass, which we call our particle. This is a model for Earth orbiting the sun, for example. The heavy mass creates a gravitational field that depends only on the distance r from the origin. It acts like a potential energy V(r) on the particle. Specifically, in Newton's theory of gravitation the potential is given by

\[ V(r) = -\frac{G}{r}, \]
where $G > 0$ is a constant. This pulls the particle towards the origin all the time, with a central force that is proportional to $1/r^2$. Now, if $z(0) = \dot z(0) = 0$ then one can easily show that the particle will never leave the xy-plane. We orient the coordinates such that this is the case and now have a system with two degrees of freedom for the particle, say polar coordinates with radius r(t) and angle $\theta(t)$ such that $x = r\cos\theta$ and $y = r\sin\theta$. A naive application of Newton's laws would be to write

\[ \ddot r + V'(r) = 0 \quad\text{and}\quad \ddot\theta = 0, \]
which captures the central force and the correct fact that there is no force in the azimuthal direction. However, the statement is clearly wrong as it leads to uniform angular motion $\theta = A + Bt$ and to finite-time collapse of the particle to r = 0. Both are clearly wrong. Indeed, you would not be reading this if these equations were true, hence they must be false. The error was, of course, that Newton's law takes different forms in Cartesian and polar coordinates. Rather than using vector calculus, we correct our mistake by using the scalar action principle. Here all that is needed is to find the correct expression for the kinetic energy T in polar coordinates. This is easily done based on the Euclidean length element ds, which is given by

\[ ds^2 = dx^2 + dy^2 = dr^2 + r^2\,d\theta^2 \]
in Cartesian and polar coordinates. This elementary use of the metric is all we need to know about the nature of polar coordinates. The kinetic energy of the particle is

\[ T = \frac{1}{2}\left(\frac{ds}{dt}\right)^2 = \frac{1}{2}(\dot x^2 + \dot y^2) = \frac{1}{2}(\dot r^2 + r^2\dot\theta^2) \]
and therefore the Lagrangian is

\[ L(r, \dot r, \dot\theta) = \frac{1}{2}(\dot r^2 + r^2\dot\theta^2) - V(r). \]
We could now try to solve the two EL-equations

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot r} = \frac{\partial L}{\partial r} \quad\text{and}\quad \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} = \frac{\partial L}{\partial \theta}. \]
However, there is a shortcut, and as mathematicians we know that taking shortcuts is always worth it, no matter how difficult it is to actually do it. In this case the shortcut comes from the symmetries of S. The present S is symmetric with respect to time t and angle $\theta$, i.e., it is symmetric with respect to continuous translations in time and continuous rotations in space. By Noether's theorem there are two corresponding conserved quantities. The time symmetry gives the usual conserved energy

\[ H = \dot r\frac{\partial L}{\partial \dot r} + \dot\theta\frac{\partial L}{\partial \dot\theta} - L = \frac{1}{2}(\dot r^2 + r^2\dot\theta^2) + V(r) = E \]
and the rotational symmetry gives a second conserved quantity via

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} = 0 \;\Rightarrow\; \frac{\partial L}{\partial \dot\theta} = r^2\dot\theta = M. \]
This quantity is called the angular momentum in physics and it is conserved² for all central potentials V(r). The presence of two conserved quantities allows us to integrate our system with two degrees of freedom by a sequence of quadratures. First, we substitute $M/r^2$ for $\dot\theta$ in the energy law

²The conservation of M is enough to explain Kepler's second law, the law of "equal areas", which was extrapolated from astronomical observations before the laws of mechanics were known.

\[ H(r, \dot r, \dot\theta) = E \]
to obtain $H(r, \dot r, M/r^2) = E$, which is a first-order ODE for r(t) alone. Upon solving this ODE for r(t) we can in turn view the second conservation law $\dot\theta = M/r^2$ as a first-order ODE for $\theta(t)$. This procedure can be carried out and yields the full orbit r(t) and $\theta(t)$ in terms of quadratures. Here we will be satisfied with information on r(t) alone, i.e.

\[ \frac{1}{2}\dot r^2 + V(r) + \frac{M^2}{2r^2} = E. \]
This can be viewed as the energy equation for a generic one-degree-of-freedom system with an effective potential energy

\[ V_{\mathrm{eff}}(r) = V(r) + \frac{M^2}{2r^2}. \]
For nonzero M the behavior is fundamentally altered near r = 0, because now the effective potential accelerates r(t) away from the origin. This effect is what was missed by the naive approach, which is only correct in the case M = 0.
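
The following sketch (my own check with illustrative constants, not from the script) integrates the polar EL-equations for V(r) = −G/r and verifies numerically that both $M = r^2\dot\theta$ and H are conserved along the orbit, and that r stays bounded away from the origin when $M \neq 0$.

```python
import numpy as np
from scipy.integrate import solve_ivp

G = 1.0
V = lambda r: -G / r
dVdr = lambda r: G / r**2

# EL-equations in polar coordinates: r'' = r*thdot^2 - V'(r),  d/dt(r^2*thdot) = 0
def rhs(t, y):
    r, rdot, th, thdot = y
    return [rdot, r * thdot**2 - dVdr(r), thdot, -2.0 * rdot * thdot / r]

y0 = [1.0, 0.0, 0.0, 0.8]          # nonzero angular velocity, so M != 0
sol = solve_ivp(rhs, (0.0, 50.0), y0, max_step=1e-3)
r, rdot, th, thdot = sol.y

M = r**2 * thdot                                # angular momentum
H = 0.5 * (rdot**2 + r**2 * thdot**2) + V(r)    # energy
print(M.max() - M.min(), H.max() - H.min())     # both ~ 0: conserved
print(r.min())                                   # stays well away from r = 0
```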

1.4.3 Systems with Constraints and Lagrange Multipliers

A further property of the action principle is that it generalizes easily to systems with constraints. For instance, the pendulum is the result of considering a point mass with two degrees of freedom x(t) and z(t) exposed to gravity and subject to the constraint $x^2 + z^2 = l^2$. Without this constraint, using polar coordinates $r, \varphi$, we have the Lagrangian

\[ L = \frac{1}{2}(\dot r^2 + r^2\dot\varphi^2) + gr\cos\varphi. \]
With the constraint, we will assume that the action principle continues to apply in the following sense: the action S is extremal relative to all r(t) and $\varphi(t)$ that satisfy the constraint as well as the usual endpoint conditions. This assumption has been corroborated by solving many model problems and finding no contradiction; we treat it as a basic statement of physics. In the present case the constraint r = l can be directly substituted in the Lagrangian above, which eliminates r and reduces the problem to a one-degree-of-freedom system with Lagrangian

\[ L = \frac{1}{2}l^2\dot\varphi^2 + gl\cos\varphi. \]
The variations in $\varphi$ are unconstrained. However, it often happens that the constraint cannot be used to eliminate degrees of freedom. For instance, this could happen if the constraint involves time derivatives of coordinates. Then the EL-equations do not apply because they have been derived under the assumption of unconstrained variations. Still, in such cases the constraint can be incorporated by the general method of Lagrange multipliers, as we shall see in this example. This is a mathematical method and not a physical law. It transforms a variational problem with constraints into a new unconstrained problem with more degrees of freedom. For this unconstrained system the EL-equations then apply in their standard form. Specifically, the method consists of using the homogeneous constraint $r - l = 0$ to form the augmented Lagrangian

\[ \hat L = L - \lambda(r - l). \]
Here $\lambda(t)$ is a Lagrange multiplier function that must be added to the list of unknown functions r(t) and $\varphi(t)$. Because the constraint acts at every moment in time there is not a single Lagrange multiplier but a full function. Indeed, by breaking down the action integral into a finite Riemann sum it is clear that there are as many constraints as members in that sum. They all have their own multiplier and in the continuous limit these multipliers become a function of t. Now, according to the theory of Lagrange multipliers the augmented action $\hat S$ based on $\hat L$ is extremal relative to unconstrained variations of r, $\varphi$ and $\lambda$ precisely if the original action is extremal relative to the constrained variations in r and $\varphi$. Therefore, we consider the three EL-equations for the augmented action

\[ \hat S[r, \varphi, \lambda] = \int_0^T \left[ \frac{1}{2}(\dot r^2 + r^2\dot\varphi^2) + gr\cos\varphi - \lambda(r - l) \right] dt. \]
They are

\begin{align*}
r:&\quad \ddot r = r\dot\varphi^2 + g\cos\varphi - \lambda\\
\varphi:&\quad \frac{d}{dt}(r^2\dot\varphi) = -gr\sin\varphi\\
\lambda:&\quad r = l
\end{align*}

The last equation enforces the constraint. In the general case, these three equations need to be solved simultaneously. In the present simple case substituting the third into the second equation again produces the EL-equation for $\varphi$ alone. The first equation decouples and becomes

\[ \lambda = l\dot\varphi^2 + g\cos\varphi. \]
This gives $\lambda(t)$ once $\varphi(t)$ has been found. It can be shown that $\lambda$ is equal to the central force along the rod of the pendulum that is necessary to enforce the constraint. This turns out to be true more generally, i.e., the value of the Lagrange multiplier is proportional to the constraint forces. The method of Lagrange multipliers is an amazing construction. In your own time you might ponder a few peculiar questions, such as whether $\hat S$ is numerically equal to S for the extremal path, what happens to $\lambda$ when the constraint is rewritten as $\sin(r - l) = 0$ (or some such), what are the symmetries and associated conserved quantities of the augmented Lagrangian, and what is the derivative of the extremalized action S with respect to l?

Chapter 2

Differential forms

2.1 Notations

In these notes U will denote an open subset of $\mathbb{R}^n$, $\alpha$ a k-form, $\beta$ an l-form, f a function and X a vector field on U. We will denote by $(x^1, \dots, x^n)$ the coordinates on U and accordingly write

\begin{align*}
\alpha(x) &= \sum_{i_1,\dots,i_k=1}^n \alpha_{i_1,\dots,i_k}(x)\, dx^{i_1}\wedge\dots\wedge dx^{i_k},\\
\beta(x) &= \sum_{j_1,\dots,j_l=1}^n \beta_{j_1,\dots,j_l}(x)\, dx^{j_1}\wedge\dots\wedge dx^{j_l},\\
X(x) &= \sum_{i=1}^n X^i(x)\,\frac{\partial}{\partial x^i}.
\end{align*}

For simplicity we will assume throughout that $\alpha$, $\beta$, f and X are smooth, i.e., that all components $\alpha_{i_1,\dots,i_k}$, all components $\beta_{j_1,\dots,j_l}$, all components $X^i$ and f are arbitrarily often continuously differentiable. Recall that functions and zero-forms are one and the same thing.

Remark 2.1.1. The symbols $\frac{\partial}{\partial x^1}, \dots, \frac{\partial}{\partial x^n}$ denote the basis of $\mathbb{R}^n$ corresponding to our choice of coordinates. The symbols $dx^1, \dots, dx^n$ denote the dual basis of $(\mathbb{R}^n)^*$; i.e., the canonical pairing of $dx^i$ with $\frac{\partial}{\partial x^j}$ is 1 if $i = j$ and 0 otherwise. The induced basis of $\wedge^k(\mathbb{R}^n)^*$ is given by the symbols $(dx^{i_1}\wedge\dots\wedge dx^{i_k})_{1\le i_1<\dots<i_k\le n}$. Recall also the antisymmetry identity $dx^i\wedge dx^j = -dx^j\wedge dx^i$.

Using this identity one can rewrite all the terms in the above expansion of $\alpha$ as a linear combination of the basis elements $(dx^{i_1}\wedge\dots\wedge dx^{i_k})_{1\le i_1<\dots<i_k\le n}$.


2.2 Definitions

2.2.1 The wedge product The wedge product of α and β is the (k + l)-form

\[ \alpha\wedge\beta(x) = \sum_{i_1,\dots,i_k=1}^n \sum_{j_1,\dots,j_l=1}^n \alpha_{i_1,\dots,i_k}(x)\,\beta_{j_1,\dots,j_l}(x)\, dx^{i_1}\wedge\dots\wedge dx^{i_k}\wedge dx^{j_1}\wedge\dots\wedge dx^{j_l}. \]
Notice that if $k + l > n$, then $\alpha\wedge\beta$ is automatically zero.

2.2.2 The exterior derivative The differential or exterior derivative of α is the (k + 1)-form

\[ d\alpha(x) = \sum_{j=1}^n \sum_{i_1,\dots,i_k=1}^n \frac{\partial}{\partial x^j}\alpha_{i_1,\dots,i_k}(x)\, dx^j\wedge dx^{i_1}\wedge\dots\wedge dx^{i_k}. \]

Notice that $dx^i$ denotes at the same time the i-th basis vector of $(\mathbb{R}^n)^*$ and the differential of the coordinate function $x^i$. Also notice that if $\alpha$ is a top form, i.e., $k = n$, then automatically $d\alpha = 0$.
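
As a concrete check of the property $d^2\alpha = 0$ listed in Section 2.3, the following sympy sketch (my own illustration) takes a general 1-form $\alpha = a\,dx + b\,dy + c\,dz$ on $\mathbb{R}^3$, computes the components of $d\alpha$ and then of $d(d\alpha)$ from the coordinate formula above, and confirms that the latter vanishes.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
a, b, c = [sp.Function(name)(x, y, z) for name in ('a', 'b', 'c')]

# alpha = a dx + b dy + c dz  (a 1-form on R^3)
# d(alpha) = (c_y - b_z) dy^dz + (a_z - c_x) dz^dx + (b_x - a_y) dx^dy
P = sp.diff(c, y) - sp.diff(b, z)
Q = sp.diff(a, z) - sp.diff(c, x)
R = sp.diff(b, x) - sp.diff(a, y)

# d(d(alpha)) = (P_x + Q_y + R_z) dx^dy^dz, the only component of a 3-form on R^3
print(sp.simplify(sp.diff(P, x) + sp.diff(Q, y) + sp.diff(R, z)))   # 0, i.e. d^2 = 0
```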

2.2.3 The Pullback

If V is an open subset of $\mathbb{R}^m$ and $\varphi$ a smooth map $V \to U$, the pullback of $\alpha$ is the k-form on V defined by
\[ \varphi^*\alpha(y) := \wedge^k d\varphi(y)^*\,\alpha(\varphi(y)), \qquad y \in V, \]
where $d\varphi(y): \mathbb{R}^m \to \mathbb{R}^n$ denotes the differential of $\varphi$ at y, $d\varphi(y)^*$ its transpose and $\wedge^k d\varphi(y)^*$ the k-th exterior power of the latter. If $(y^1, \dots, y^m)$ are coordinates on V, we have

\[ \varphi^*\alpha(y) = \sum_{j_1,\dots,j_k=1}^m \sum_{i_1,\dots,i_k=1}^n \alpha_{i_1,\dots,i_k}(\varphi(y))\, \frac{\partial\varphi^{i_1}}{\partial y^{j_1}}(y)\cdots\frac{\partial\varphi^{i_k}}{\partial y^{j_k}}(y)\, dy^{j_1}\wedge\dots\wedge dy^{j_k}. \]

Observe that, if W is an open subset of $\mathbb{R}^s$ and $\psi$ a smooth map $W \to V$, we have $(\varphi\circ\psi)^* = \psi^*\varphi^*$.

2.2.4 The Lie derivative The Lie derivative with respect to X of α is the k-form

\[ L_X\alpha = \lim_{t\to 0}\frac{\varphi_{X,t}^*\alpha - \alpha}{t}, \]
where $\varphi_{X,t}$ is the flow of X at time t. Explicitly we have

\[ L_X\alpha(x) = \sum_{i_1,\dots,i_k=1}^n X(\alpha_{i_1,\dots,i_k})(x)\, dx^{i_1}\wedge\dots\wedge dx^{i_k} + \sum_{i_1,\dots,i_k=1}^n \sum_{r,s=1}^n (-1)^{r-1}\alpha_{i_1,\dots,i_k}(x)\, \frac{\partial X^{i_r}}{\partial x^s}(x)\, dx^s\wedge dx^{i_1}\wedge\dots\wedge \widehat{dx^{i_r}}\wedge\dots\wedge dx^{i_k}, \]

where $X(\alpha_{i_1,\dots,i_k}) = \sum_{j=1}^n X^j\frac{\partial}{\partial x^j}\alpha_{i_1,\dots,i_k}$ denotes the directional derivative of the function $\alpha_{i_1,\dots,i_k}$ in the direction of X and the caret denotes that the factor $dx^{i_r}$ is omitted.

2.2.5 The contraction The contraction of X with α is the (k − 1)-form

\[ \iota_X\alpha(x) = \sum_{i_1,\dots,i_k=1}^n \sum_{r=1}^k (-1)^{r-1}\alpha_{i_1,\dots,i_k}(x)\, X^{i_r}(x)\, dx^{i_1}\wedge\dots\wedge \widehat{dx^{i_r}}\wedge\dots\wedge dx^{i_k}. \]

If α is a function, i.e., k = 0, then automatically ιX α = 0.

2.3 Properties

The wedge product is bilinear over $\mathbb{R}$, whereas the differential, the pullback, the Lie derivative and the contraction are linear over $\mathbb{R}$. Moreover, we have the following properties:

\begin{align}
\alpha\wedge\beta &= (-1)^{kl}\,\beta\wedge\alpha, \tag{2.1}\\
d^2\alpha &= 0, \tag{2.2}\\
d(\alpha\wedge\beta) &= d\alpha\wedge\beta + (-1)^k\,\alpha\wedge d\beta, \tag{2.3}\\
\varphi^* f &= f\circ\varphi, \tag{2.4}\\
\varphi^*(\alpha\wedge\beta) &= \varphi^*\alpha\wedge\varphi^*\beta, \tag{2.5}\\
d\varphi^*\alpha &= \varphi^* d\alpha, \tag{2.6}\\
L_X f &= X(f), \tag{2.7}\\
L_X(\alpha\wedge\beta) &= L_X\alpha\wedge\beta + \alpha\wedge L_X\beta, \tag{2.8}\\
L_X d\alpha &= dL_X\alpha, \tag{2.9}\\
\iota_X(\alpha\wedge\beta) &= \iota_X\alpha\wedge\beta + (-1)^k\,\alpha\wedge\iota_X\beta, \tag{2.10}\\
L_X\alpha &= \iota_X d\alpha + d\iota_X\alpha \quad\text{(Cartan's formula)}. \tag{2.11}
\end{align}

Observe that the above properties characterize d, $\varphi^*$, $L_X$ and $\iota_X$ completely: the formulae in the previous Section may be recovered from these properties. If Y is a second vector field, we also have

\begin{align}
\iota_X\iota_Y\alpha &= -\iota_Y\iota_X\alpha, \tag{2.12}\\
L_X L_Y\alpha - L_Y L_X\alpha &= L_{[X,Y]}\alpha, \tag{2.13}\\
\iota_X L_Y\alpha - L_Y\iota_X\alpha &= \iota_{[X,Y]}\alpha, \tag{2.14}
\end{align}
where $[X,Y]$ is the Lie bracket of X and Y defined by

\[ [X,Y](f) = X(Y(f)) - Y(X(f)). \]

Chapter 3

Hamiltonian systems

3.1 Introduction

The second formulation we will look at is the Hamilton formalism. In this system, in place of the Lagrangian we define a quantity called the Hamiltonian, to which the canonical equations of motion are applied. While the EL-equations describe the motion of a particle as a single second-order differential equation, the canonical equations describe the motion as a coupled system of two first-order differential equations. One of the many advantages of Hamiltonian mechanics is that it is similar in form to quantum mechanics, the theory that describes the motion of particles at subatomic distance scales. An understanding of Hamiltonian mechanics provides a good introduction to the mathematics of quantum mechanics.

3.2 Legendre Transform

Consider a smooth real-valued function f(x) for $x \in \mathbb{R}$ that is strictly convex, i.e., $f''(x) > 0$ for all x. Then the Legendre transformation transforms the pair $(x, f(x))$ into a new pair $(p, F(p))$ by the definition

Definition 3.2.1 (Legendre Transform). The Legendre transform of a function f(x) is given by

\[ F(p) = \max_x\,[px - f(x)]. \]
The necessary condition for a maximum is

\[ p = f'(x). \]
This is to be viewed as an equation for x given p. The second x-derivative of $px - f(x)$ is $-f''(x)$, which is negative by assumption. Therefore $p = f'(x)$ is necessary and sufficient for the unique maximum. Thus, an equivalent way to write F(p) is via the two equations

\[ F(p) = px - f(x) \quad\text{and}\quad p = f'(x). \]
This can be contracted to

\[ F(p) = x f'(x) - f(x), \]
provided one does not forget to solve $p = f'(x)$ for x in terms of p. This is invariably the tricky step. Note that the Legendre transformation is not a linear transformation despite

the appearance of the contracted equation. This is because inverting $p = f'(x)$ is not a linear procedure in f(x). As an example, the Legendre transformation of $f = \frac{x^\alpha}{\alpha}$ for $\alpha > 1$ and $x > 0$ is computed to be

\[ F(p) = \frac{\alpha - 1}{\alpha}\,x^\alpha \quad\text{and}\quad p = x^{\alpha-1} > 0. \]
Inverting the second equation and substituting in the first yields

\[ F(p) = \frac{p^\beta}{\beta} \quad\text{where}\quad \frac{1}{\alpha} + \frac{1}{\beta} = 1. \]

We see that β > 1; i.e., F (p) for p > 0 is again convex. This is true in general as we shall see.
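
For illustration (my own sketch, not from the script), the two-equation recipe F(p) = px − f(x), p = f′(x) can be carried out symbolically; here it is applied to the power-law example with the arbitrary choice α = 4, confirming the dual exponent relation 1/α + 1/β = 1.

```python
import sympy as sp

x, p = sp.symbols('x p', positive=True)
alpha = 4
f = x**alpha / alpha

# Solve p = f'(x) for x, then substitute into F = p*x - f(x)
x_of_p = sp.solve(sp.Eq(p, sp.diff(f, x)), x)[0]
F = sp.simplify(p * x_of_p - f.subs(x, x_of_p))

print(F)                                       # (3/4)*p**(4/3), i.e. p^beta/beta with beta = 4/3
print(sp.Rational(1, 4) + sp.Rational(3, 4))   # 1/alpha + 1/beta = 1
```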

3.2.1 Derivatives and Convexity

From the equations $F(p) = px - f(x)$ and $p = f'(x)$, the differential of F(p) can be written as

\[ dF = p\,dx + x\,dp - f'(x)\,dx = x\,dp \;\Rightarrow\; F'(p) = x. \]

This is a remarkable formula, which makes this transform useful for differential equations. The second derivative follows as

\[ F''(p) = \frac{dx}{dp} = \frac{1}{\frac{dp}{dx}} = \frac{1}{f''(x)} > 0, \]
and therefore F(p) is strictly convex. We observe that the Legendre transform preserves convexity.

3.2.2 Involution

This means that we can apply the Legendre transform to F (p) and the claim is that this returns f(x). This means that the Legendre transform is an involution; i.e., it is its own inverse. To show that this is true we rewrite the Legendre transform as

\[ f(x) = x f'(x) - F(p) = F'(p)\,p - F(p), \]

where we have used the equation for the differential and p = f 0(x). However, the last expression is equal to

\[ \max_p\,[xp - F(p)] \]

for fixed x by the same argument as given before, which yields x = F 0(p) for the location of the maximum in p. Therefore

\[ f(x) = \max_p\,[xp - F(p)] \quad\text{and}\quad F(p) = \max_x\,[px - f(x)] \]

both hold and this proves that the Legendre transform is an involution.

3.2.3 Total Differential of Legendre Transform Consider f(x, y) and its Legendre transform with respect to x, which we shall denote by

\[ F(p, y) = \max_x\,[px - f(x, y)]. \]
Again, this is equivalent to the two equations

\[ F(p, y) = px - f(x, y) \quad\text{and}\quad p = f_x(x, y). \]
The total differential of F(p, y) is

\[ dF = \left(\frac{\partial F}{\partial p}\right)_y dp + \left(\frac{\partial F}{\partial y}\right)_p dy, \]
where we have indicated what must be kept constant during differentiation. On the other hand,

\[ dF = p\,dx + x\,dp - \left(\frac{\partial f}{\partial x}\right)_y dx - \left(\frac{\partial f}{\partial y}\right)_x dy = x\,dp - \left(\frac{\partial f}{\partial y}\right)_x dy. \]
Comparing the last two expressions we find that

\[ \left(\frac{\partial F}{\partial p}\right)_y = x \quad\text{and}\quad \left(\frac{\partial F}{\partial y}\right)_p = -\left(\frac{\partial f}{\partial y}\right)_x. \]
The first equation is the natural extension of the differential of F. The second equation applies to derivatives with respect to any variable that is not involved in the transformation. Note the minus sign.

3.2.4 Local Legendre Transformation The assumption of global strict convexity can be relaxed if one is only interested in the local Legendre transform of f(x) in the neighborhood of a given point x. That is to say, consider any smooth function f(x) at a point x and define the corresponding values of the local Legendre transform F (p) and of its argument p as

\[ F(p) = x f'(x) - f(x) \quad\text{and}\quad p = f'(x). \]
Thus $(x, f(x))$ is mapped locally to a particular pair $(p, F(p))$. If we vary $x \to x + dx$ then we can compute how p and F change in return. This yields

\[ dp = \frac{dp}{dx}\,dx = f''(x)\,dx \quad\text{and}\quad dF = \frac{dF}{dp}\,dp = x f''(x)\,dx. \]
In this way the local Legendre transform can be mapped out, and the map $(x, f(x)) \to (p, F(p))$ remains invertible if $f''(x)$ is nonzero. The sign of $f''(x)$ does not matter. For a strictly convex function this local construction can be extended globally and then agrees with the previous definition. However, if $f''(x)$ goes through zero then the Legendre transform F(p) becomes multivalued. This follows from $dp = f''(x)\,dx$, which changes sign and therefore the same values of p are traversed again. For example, the local Legendre transform of

\[ f(x) = \frac{x^3}{3} \quad\text{is}\quad F(p) = \pm\frac{2p^{3/2}}{3} \]
with the domain restriction $p \ge 0$. The sign of F corresponds to the sign of x.

3.2.5 Multivariable Case

The Legendre transform with respect to several variables is defined as a sequence of single-variable transforms. For example, consider a globally strictly convex function $f(x_1, x_2)$, i.e., a function whose matrix of second derivatives is positive definite. Then the Legendre transform is defined as

\[ F(p_1, p_2) = \max_{x_1, x_2}\,[p_1 x_1 + p_2 x_2 - f(x_1, x_2)]. \]
As before, this yields

\[ p_1 = \frac{\partial f}{\partial x_1} \quad\text{and}\quad p_2 = \frac{\partial f}{\partial x_2} \]
and also

\[ \frac{\partial F}{\partial p_1} = x_1 \quad\text{and}\quad \frac{\partial F}{\partial p_2} = x_2. \]
The same applies to the preservation of convexity and the involution character of the Legendre transform. A local Legendre transform relaxes the assumption of global strict convexity to the assumption that the equations for $p_1$ and $p_2$ can be inverted locally, i.e., the matrix of second derivatives of f is nonsingular at $(x_1, x_2)$.

3.3 Canonical Equations

We now apply the Legendre transform to the action principle. This is a general mathematical technique and applies to any feasible variational problem, whether of mechanical origin or not. To be specific we consider the standard action integral for a one-degree-of-freedom system, i.e.,

\[ S[q] = \int_0^T L(q, \dot q, t)\,dt \]
with corresponding EL-equation

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}. \]
We assume that the Lagrangian L is strictly convex in its second slot, i.e., in its dependence on the value $\dot q$. This is certainly true in the standard kinetic energy case, in which $L = T - V(q)$ and $T = \frac{1}{2}\dot q^2$.

3.3.1 Hamiltonian Function Based on this we introduce the Hamiltonian function H(q, p, t) as the Legendre transform of L with respect to its second slot:

\[ H(q, p, t) = \max_{\dot q}\,[p\dot q - L(q, \dot q, t)]. \]
This is equivalent to

\[ H(q, p, t) = \dot q\,\frac{\partial L}{\partial \dot q} - L(q, \dot q, t) \quad\text{and}\quad p = \frac{\partial L}{\partial \dot q}, \]
where the second equation must be inverted for given values of (q, p, t) to yield the value of $\dot q$ that must be inserted in the first equation.

Let’s consider this done and the definition of H as a function of q and t as well as of the new variable p to be completed. We can now use the differential relations to substitute in the EL-equation. This yields

\[ \frac{\partial L}{\partial \dot q} = p \quad\text{and}\quad \frac{\partial L}{\partial q} = -\frac{\partial H}{\partial q} \;\Rightarrow\; \frac{d}{dt}p = \dot p = -\frac{\partial H}{\partial q}. \]
In addition, we also have

\[ \frac{\partial H}{\partial p} = \dot q. \]
In summary, the single second-order EL-equation for q(t) is replaced by the two canonical equations.

Definition 3.3.1 (Canonical Equations). Let $U \subseteq \mathbb{R}^n$ be an open subset of $\mathbb{R}^n$ and let H(q, p, t) be a Hamiltonian function on $U \times U \times I$. Then the EL-equation, in the form of the Hamiltonian function, is described by the following two equations:

\begin{align*}
\dot q &= \frac{\partial H}{\partial p}\\
\dot p &= -\frac{\partial H}{\partial q}
\end{align*}
The initial-value problem is posed by specifying initial values for both q(0) and p(0). This is a Hamiltonian system of first-order ODEs which has a number of structural advantages.
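
As a minimal illustration (my own, with an arbitrary harmonic-oscillator Hamiltonian as the test case), the canonical equations can be integrated directly as a first-order system, and H is conserved along the resulting trajectory (q(t), p(t)).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Test Hamiltonian H(q, p) = p^2/(2m) + (k/2) q^2
m, k = 1.0, 4.0
H = lambda q, p: p**2 / (2 * m) + 0.5 * k * q**2

def canonical_rhs(t, y):
    q, p = y
    return [p / m,     # qdot =  dH/dp
            -k * q]    # pdot = -dH/dq

sol = solve_ivp(canonical_rhs, (0.0, 20.0), [1.0, 0.0], max_step=1e-3)
q, p = sol.y
energies = H(q, p)
print(energies.max() - energies.min())   # ~0: H is conserved along the solution
```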

3.3.2 Canonical Action Principle The canonical equations can be viewed as the EL-equations for a new action principle called the canonical action principle. This is a new functional in terms of the two functions q and p, namely

\[ S[q, p] = \int_0^T \big(p\dot q - H(q, p, t)\big)\,dt. \]
Perhaps we should use a different symbol for it. Either way, in the present form this is not equal to S[q], because we can still choose p(t) freely and this affects the value of S[q, p]. However, if we fix q(t) and maximize the integrand over all possible values of p then the integrand becomes numerically equal to L because we are performing a Legendre transformation on H. Therefore we have that

\[ S[q] = \max_{p(t)} S[q, p]. \]
This means that the original action principle is recovered by demanding that the canonical action S[q, p] be extremal with respect to independent variations of both q(t) and p(t). The variation with respect to p does not require integration by parts and therefore does not require endpoint conditions for p(t). The corresponding EL-equation is the first half of the canonical equations. The variation with respect to q does require integration by parts and yields the second half of the canonical equations; here we have to fix the endpoint values of q. We have derived the canonical equations once by direct substitution and a second time by identifying a new action principle in terms of two functions for which the canonical equations are the EL-equations. It is a peculiar fact that the EL-equations for the canonical action principle are first-order equations in time, because the generic form of the EL-equations yields second-order equations. This is due to the very special linear way in which $\dot q$ appears in the canonical action integral.

3.3.3 Previous Examples in Canonical Form

The generic one-degree-of-freedom system has a quadratic kinetic energy $\frac{1}{2}m\dot q^2$, which is obviously convex. The Legendre transform then yields the canonical momentum

\[ p = \frac{\partial L}{\partial \dot q} = m\dot q \;\Rightarrow\; \dot q = \frac{p}{m}. \]
The second form stresses that the equation above is to be used for eliminating $\dot q$ in favor of p. We find that p is proportional to the velocity and that the canonical momentum is equal to the standard momentum in physics if q is a Cartesian coordinate. This is good, because all our earlier work stands as it is. This is bad, because it makes us forget that the canonical formalism uses different variables in general, i.e., $p \neq \dot q$ in general. The Legendre transform of $L(q, \dot q, t)$ with respect to $\dot q$ is the Hamiltonian function

\[ H(q, p, t) = p\dot q - L = \frac{p^2}{m} - \frac{p^2}{2m} + V(q, t) = \frac{p^2}{2m} + V(q, t), \]
which is the energy of the system. It is typical that the kinetic energy term is again quadratic and that the mass m slides into the denominator in the canonical formalism. The canonical equations are

\[ \dot q = \frac{\partial H}{\partial p} = \frac{p}{m} \quad\text{and}\quad \dot p = -\frac{\partial H}{\partial q} = -\frac{\partial V}{\partial q}. \]
If L is symmetric in time t then so is the Hamiltonian function H. This leads to conservation of H along a solution trajectory (q(t), p(t)), as can be seen from

\[ \frac{dH}{dt} = \frac{\partial H}{\partial q}\dot q + \frac{\partial H}{\partial p}\dot p + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial q}\frac{\partial H}{\partial p} - \frac{\partial H}{\partial p}\frac{\partial H}{\partial q} + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial t}. \]
This is zero if $\frac{\partial H}{\partial t} = 0$ and therefore time symmetry implies energy conservation in explicit form in the canonical formalism. Conservation laws tend to be more explicit in the canonical formalism, which is one of its advantages. This also happens in the central force orbit problem, which had two degrees of freedom r(t) and $\theta(t)$. The Lagrangian function was (we set m = 1 again)

L(r, ṙ, θ̇) = (1/2)(ṙ² + r² θ̇²) − V(r),

which for r > 0 is again quadratic and convex in both ṙ and θ̇. We perform a Legendre transform with respect to both of these and obtain

p_r = ∂L/∂ṙ = ṙ   and   p_θ = ∂L/∂θ̇ = r² θ̇.

This is trivially inverted and leads to the Hamiltonian function

H(r, p_r, θ, p_θ) = (1/2)(p_r² + r⁻² p_θ²) + V(r).

Time symmetry means that H is conserved again. The canonical equations are

ṙ = ∂H/∂p_r = p_r,      ṗ_r = −∂H/∂r = p_θ²/r³ − V′(r),
θ̇ = ∂H/∂p_θ = p_θ/r²,   ṗ_θ = −∂H/∂θ = 0.

I hope by now you see the recipe. The crucial equation is the last: the symmetry with respect to θ implies the conservation of the canonical momentum associated with θ, namely p_θ. This conserved quantity is the angular momentum M we encountered before. We conclude that in general a symmetry with respect to a coordinate q_i leads to a conservation law for the associated canonical momentum p_i because

ṗ_i = −∂H/∂q_i = 0.

For a given Lagrangian with a continuous symmetry a smart change of variables q → q̃ might make the Lagrangian explicitly symmetric with respect to one of the new coordinates, and then one can exploit the conservation law for the associated canonical momentum. An example for this is transforming the orbit system from Cartesian into polar coordinates, which made the azimuthal symmetry explicit. The art and science of transformations that seek to make symmetries explicit goes by the name of canonical transformations in mechanics. Finally, we can note that a symmetry with respect to a canonical momentum p_i likewise leads to a conservation law for the associated coordinate q_i, namely

q̇_i = ∂H/∂p_i = 0.

This reinforces the view that in the canonical equations the pairs (qi, pi) appear on the same footing. We call (qi, pi) a pair of canonically conjugate variables.
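As a small symbolic sanity check of the one-degree-of-freedom Legendre transform computed at the beginning of this subsection, the following Python sketch (using the sympy library; the potential V and the symbol names are illustrative choices, not notation from the lecture) recovers H = p²/(2m) + V(q) from L = (1/2) m v² − V(q):

import sympy as sp

q, v, p = sp.symbols('q v p')
m = sp.symbols('m', positive=True)
V = sp.Function('V')

L = m*v**2/2 - V(q)                          # Lagrangian; v plays the role of qdot
p_of_v = sp.diff(L, v)                       # canonical momentum p = dL/dv = m v
v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]    # invert: v = p/m
H = sp.simplify((p*v - L).subs(v, v_of_p))   # Legendre transform H = p v - L
print(H)                                     # p**2/(2*m) + V(q)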

3.4 The Poisson bracket

Definition 3.4.1 (Poisson bracket). In canonical coordinates (qi, pi) on the phase space, given two functions f(qi, pi, t) and g(qi, pi, t), the Poisson bracket is given by

{f, g} = Σ_{i=1}^N ( ∂f/∂q_i ∂g/∂p_i − ∂f/∂p_i ∂g/∂q_i ).

The canonical equations have an equivalent expression in terms of the Poisson bracket. This may be most directly demonstrated in an explicit coordinate frame. Suppose that f(q, p, t) is a function on the phase space. Then from the multivariable chain rule, one has

d/dt f(q, p, t) = (∂f/∂q)(dq/dt) + (∂f/∂p)(dp/dt) + ∂f/∂t.

Further, one may take p = p(t) and q = q(t) to be solutions to the canonical equations; that is,

q̇ = ∂H/∂p = {q, H},
ṗ = −∂H/∂q = {p, H}.

Then, one has

d/dt f(q, p, t) = (∂f/∂q)(∂H/∂p) − (∂f/∂p)(∂H/∂q) + ∂f/∂t = {f, H} + ∂f/∂t.

Thus, the time evolution of a function f on the phase space can be given as a one-parameter family of symplectomorphisms (that is, canonical transformations: area-preserving diffeomorphisms, or simply transformations that preserve the Poisson bracket), with the time t being the parameter: Hamiltonian motion is a canonical transformation generated by the Hamiltonian. That is, Poisson brackets are preserved along it, so that at any time t in the solution to the canonical equations, q(t) = q(0) exp(−t{H, ·}), p(t) = p(0) exp(−t{H, ·}) can serve as the bracket coordinates. Poisson brackets are canonical invariants. Dropping the coordinates from the notation, one has

df/dt = ( ∂/∂t − {H, ·} ) f.

This equation is known as the Liouville equation.
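To see the coordinate formulas above at work, here is a small symbolic sketch (Python with the sympy library; the Hamiltonian p²/(2m) + V(q) and the symbol names are illustrative choices). It checks the canonical relation {q, p} = 1 and recovers the canonical equations as Poisson brackets with H:

import sympy as sp

q, p = sp.symbols('q p')
m = sp.symbols('m', positive=True)
V = sp.Function('V')

def pb(f, g):
    """Poisson bracket {f, g} in one degree of freedom."""
    return sp.diff(f, q)*sp.diff(g, p) - sp.diff(f, p)*sp.diff(g, q)

H = p**2/(2*m) + V(q)

print(pb(q, p))                            # 1, the canonical relation {q, p} = 1
print(pb(q, H))                            # p/m     -> qdot = {q, H}
print(pb(p, H))                            # -V'(q)  -> pdot = {p, H}
print(sp.simplify(pb(q, H) + pb(H, q)))    # 0, antisymmetry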

3.4.1 Constants of motion

If we have a constant of motion for our given system then the constant of motion will commute with the Hamiltonian under the Poisson bracket. Suppose some function f(q, p) is a constant of motion. This implies that if p(t), q(t) is a trajectory or solution to the canonical equations, then one has that

df/dt = 0

along that trajectory. Then one has

0 = d/dt f(q, p) = {f, H},

where, as above, the second equality follows by applying the canonical equations. If the Poisson bracket of f and g vanishes ({f, g} = 0), then f and g are said to be in involution.

3.4.2 The Poisson bracket in coordinate-free language

Let M be a symplectic manifold, that is a manifold equipped with a symplectic form: a 2-form ω which is both closed (i.e. its exterior derivative dω = 0) and non-degenerate. For example, in the treatment above, take M to be R^{2n} and take

ω = Σ_{i=1}^n dq_i ∧ dp_i.

If ιvω is the contraction operator defined by (ιvω)(w) = ω(v, w), then non-degeneracy is equivalent to saying that for every one-form α there is a unique vector field Ωα such that

ιΩα ω = α. Then if H is a smooth function on M, the Hamiltonian vector field XH can be defined to be ΩdH . It is easy to see that

X_{p_i} = ∂/∂q_i,
X_{q_i} = −∂/∂p_i.

The Poisson bracket {·, ·} on (M, ω) is a bilinear operator on differentiable functions, defined by

{f, g} = ω(X_f, X_g).

The Poisson bracket of two functions on M is itself a function on M. The Poisson bracket is antisymmetric because

{f, g} = ω(Xf ,Xg) = −ω(Xg,Xf ) = −{g, f}. Furthermore,

{f, g} = ω(Xf ,Xg) = ω(Ωdf ,Xg) = (ιΩdf ω)(Xg) = df(Xg) = Xgf = LXg f.

Here X_g f denotes the vector field X_g applied to the function f as a directional derivative, and L_{X_g} f denotes the (entirely equivalent) Lie derivative of the function f. If α is an arbitrary one-form on M, the vector field Ω_α generates (at least locally) a flow φ_x(t) satisfying the boundary condition φ_x(0) = x and the first-order differential equation

dφ_x/dt = Ω_α |_{φ_x(t)}.

The φx(t) will be symplectomorphisms (canonical transformations) for every t as a function of x if and only if LΩα ω = 0; when this is true, Ωα is called a symplectic vector field. Recalling Cartan’s identity LX (ω) = ιX (dω) + d(ιX ω) and dω = 0, it follows that

LΩα ω = d(ιΩα ω) = dα.

Therefore Ω_α is a symplectic vector field if and only if α is a closed form. Since d(df) = d²f = 0, it follows that every Hamiltonian vector field X_f is a symplectic vector field, and that the Hamiltonian flow consists of canonical transformations. From above, under the Hamiltonian flow X_H,

d/dt f(φ_x(t)) = X_H f = {f, H}.

This is a fundamental result in Hamiltonian mechanics, governing the time evolution of functions defined on phase space. As noted above, when {f, H} = 0, f is a constant of motion of the system. In addition, in canonical coordinates (with {p_i, p_j} = {q_i, q_j} = 0 and {q_i, p_j} = δ_ij), the canonical equations for the time evolution of the system follow immediately from this formula. It also follows that the Poisson bracket is a derivation in both arguments; that is, it satisfies the Leibniz rule:

{fg, h} = f{g, h} + g{f, h} and {f, gh} = g{f, h} + h{f, g}. The Poisson bracket is intimately connected to the Lie bracket of the Hamiltonian vector fields. Because the Lie derivative is a derivation,

Lvιwω = ιLvwω + ιwLvω = ι[v,w]ω + ιwLvω.

Thus if v and w are symplectic, using Lvω = 0, Cartan’s identity and the fact that ιwω is a closed form,

ι_{[v,w]} ω = L_v ι_w ω = d(ι_v ι_w ω) + ι_v d(ι_w ω) = d(ι_v ι_w ω) = d(ω(w, v)).

It follows that [v, w] = Xω(w,v), so that

[X_f, X_g] = X_{ω(X_g, X_f)} = −X_{ω(X_f, X_g)} = −X_{{f,g}}.

Thus, up to a sign, the Poisson bracket on functions corresponds to the Lie bracket of the associated Hamiltonian vector fields. We have also shown that the Lie bracket of two symplectic vector

fields is a Hamiltonian vector field and hence is also symplectic. In the language of abstract algebra, the symplectic vector fields form a subalgebra of the Lie algebra of smooth vector fields on M, and the Hamiltonian vector fields form an ideal of this subalgebra. The symplectic vector fields are the Lie algebra of the (infinite-dimensional) Lie group of symplectomorphisms of M. It is widely asserted that the Jacobi identity for the Poisson bracket,

{f, {g, h}} + {g, {h, f}} + {h, {f, g}} = 0,

follows from the corresponding identity for the Lie bracket of vector fields, but this is true only up to a locally constant function. However, to prove the Jacobi identity, it is sufficient to show that

ad_{{f,g}} = −[ad_f, ad_g],

where the operator adg on smooth functions on M is defined by adg(·) = {·, g} and the bracket on the right-hand side is the commutator of operators, [A, B] = AB − BA. The operator adg is equal to the operator Xg. The proof of the Jacobi identity follows from the equation

[X_f, X_g] = X_{ω(X_g, X_f)} = −X_{ω(X_f, X_g)} = −X_{{f,g}},

because the Lie bracket of vector fields is just their commutator as differential operators. The algebra of smooth functions on M, together with the Poisson bracket, forms a Poisson algebra, because it is a Lie algebra under the Poisson bracket, which additionally satisfies the Leibniz rule. We have shown that every symplectic manifold is a Poisson manifold, that is a manifold with a curly-bracket operator on smooth functions such that the smooth functions form a Poisson algebra. However, not every Poisson manifold arises in this way, because Poisson manifolds allow for degeneracy which cannot arise in the symplectic case. Note also that the Poisson bracket of two constants of motion is again a constant of motion.
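As a quick symbolic confirmation of the Jacobi identity and the Leibniz rule in canonical coordinates, here is a small Python/sympy sketch in one degree of freedom (the function names are placeholders):

import sympy as sp

q, p = sp.symbols('q p')
f, g, h = [sp.Function(name)(q, p) for name in ('f', 'g', 'h')]

def pb(a, b):
    """Poisson bracket in one degree of freedom."""
    return sp.diff(a, q)*sp.diff(b, p) - sp.diff(a, p)*sp.diff(b, q)

jacobi  = pb(f, pb(g, h)) + pb(g, pb(h, f)) + pb(h, pb(f, g))
leibniz = pb(f*g, h) - (f*pb(g, h) + g*pb(f, h))

print(sp.simplify(sp.expand(jacobi)))    # 0
print(sp.simplify(sp.expand(leibniz)))   # 0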

Chapter 4

Symplectic integrators

4.1 Introduction

The Hamiltonian flow preserves the symplectic structure; an approximation, such as one used for numerical integration, in general does not, and can therefore be expected to miss some important features of the system under study. In this chapter we briefly recall how to improve the Euler method in such a way that the symplectic structure is preserved.

4.2 The Euler method

Let X be a vector field on an open subset O of R^n and φ_{X,t} its flow (at time t). Recall that φ_{X,t}(x_0), x_0 ∈ O, is by definition the evaluation at time t of the unique solution to the Cauchy problem

ẋ(t) = X(x(t)),   x(0) = x_0.   (4.1)

The time t must belong to the maximal interval of definition of the solution, which in turn in general depends on x_0. By uniqueness we have the fundamental property

φ_{X,t} ∘ φ_{X,s} = φ_{X,t+s},

for t, s and t + s in the maximal interval. This implies that, for t in the maximal interval and any positive integer N, we have

φ_{X,t} = φ_{X,t/N} ∘ ··· ∘ φ_{X,t/N}   (N times).

The Euler method consists in approximating φ_{X,τ}, τ = t/N, by a truncation of its Taylor expansion in τ. Let us work it out up to O(τ²). A path x(t) may be expanded around t = 0 as x(τ) = x(0) + τ ẋ(0) + O(τ²). If it is the solution to (4.1), we then have

x(τ) = x_0 + τ X(x_0) + O(τ²).

This yields

φ_{X,τ}(x_0) = x_0 + τ X(x_0) + O(τ²).   (4.2)

The Euler method, at this order, consists in replacing φX,τ by its truncation

φ̃_{X,τ}(x_0) := x_0 + τ X(x_0),

getting the approximate solution

φ^Euler_{X,t} := φ̃_{X,t/N} ∘ ··· ∘ φ̃_{X,t/N}   (N times).
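A minimal implementation of this approximation might look as follows (Python; the example vector field, step size and step count are illustrative choices):

import numpy as np

def euler_flow(X, x0, t, N):
    """Approximate the time-t flow of the vector field X starting at x0
    by composing N explicit Euler steps of size tau = t/N."""
    tau = t / N
    x = np.asarray(x0, dtype=float)
    for _ in range(N):
        x = x + tau * X(x)     # one step of the truncated Taylor expansion
    return x

# example: X(x) = -x, whose exact flow is x0 * exp(-t)
print(euler_flow(lambda x: -x, [1.0], 1.0, 1000), np.exp(-1.0))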

4.3 Hamiltonian systems

Let H be a Hamiltonian function (on W × R^d with W an open subset of R^d) and X_H its

Hamiltonian vector field. We want to compute its flow φ_{X_H,t}. (To match the notation of the previous Section, we set n = 2d and O = W × R^d.) The problem is that in general the truncation φ̃_{X_H,t/N} does not preserve the symplectic form, nor does the ensuing Euler approximation φ^Euler_{X_H,t}. The idea of a symplectic integrator, in this context, consists in choosing a different approximation of φ_{X_H,τ} that is equal to φ̃_{X_H,τ} up to O(τ²) but preserves the symplectic form. Suppose that H = H_1 + H_2 (we will see below that this is interesting for practical purposes if we can compute the Hamiltonian flows of H_1 and H_2 exactly). We then have X_H = X_{H_1} + X_{H_2}. By (4.2) we have

φ_{X_{H_1},τ}(x_0) = x_0 + τ X_{H_1}(x_0) + O(τ²),
φ_{X_{H_2},τ}(x_0) = x_0 + τ X_{H_2}(x_0) + O(τ²).

(φ_{X_{H_1},τ} ∘ φ_{X_{H_2},τ})(x_0) = x_0 + τ ( X_{H_1}(x_0) + X_{H_2}(x_0) ) + O(τ²) = x_0 + τ X_H(x_0) + O(τ²),

so φ_{X_H,τ} = φ_{X_{H_1},τ} ∘ φ_{X_{H_2},τ} + O(τ²).

The idea is now to replace φXH ,τ by

φ̂_{X_H,τ} := φ_{X_{H_1},τ} ∘ φ_{X_{H_2},τ},   (4.3)

φ^SI_{X_H,t} := φ̂_{X_H,t/N} ∘ ··· ∘ φ̂_{X_H,t/N}   (N times).   (4.4)

Notice that φ_{X_{H_1},τ} and φ_{X_{H_2},τ} preserve the symplectic structure and hence so does φ^SI_{X_H,t}. The method is applicable if we can compute φ_{X_{H_1},τ} and φ_{X_{H_2},τ} exactly. The typical case is that of a Hamiltonian of the form H(q, p) = T(p) + U(q). In this case, we set H_1(q, p) = T(p) and H_2(q, p) = U(q).¹ To be even more specific (even though this is not needed), let us assume that T(p) = ||p||²/(2m). Let us now compute the corresponding flows. In the case of H_1 we have to solve

q̇ = p/m,    ṗ = 0,

with initial conditions at t = 0 given by q_0 and p_0. This yields

q(τ) = q_0 + τ p_0/m,    p(τ) = p_0.

Hence

φ_{X_{H_1},τ}(q_0, p_0) = ( q_0 + τ p_0/m , p_0 ).

¹ Observe that the Hamiltonian H_2 is not hyperregular as a function of p, so it certainly does not arise as the Legendre transform of a Lagrangian. This shows one more reason why the Hamiltonian formalism is often preferable.

In the case of H2 we have to solve

q̇ = 0,    ṗ = −∇U(q),

q(τ) = q_0,    p(τ) = p_0 − τ ∇U(q_0).

Therefore,

φ_{X_{H_2},τ}(q_0, p_0) = ( q_0 , p_0 − τ ∇U(q_0) ).

We finally have by (4.3) that

φ̂_{X_H,τ}(q_0, p_0) = ( q_0 + τ (p_0 − τ ∇U(q_0))/m , p_0 − τ ∇U(q_0) ).   (4.5)

Notice that this agrees up to O(τ²) with the first-order Taylor approximation

φ̃_{X_H,τ}(q_0, p_0) = ( q_0 + τ p_0/m , p_0 − τ ∇U(q_0) ),

but is different from the second-order Taylor approximation of φ_{X_H,τ} (which in general does not preserve the symplectic structure). If we had chosen instead H_1(q, p) = U(q) and H_2(q, p) = T(p) = ||p||²/(2m), we would have got

φ̂_{X_H,τ}(q_0, p_0) = ( q_0 + τ p_0/m , p_0 − τ ∇U(q_0 + τ p_0/m) ),

which yields a different approximation that also preserves the symplectic structure.
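The step (4.5) is easy to implement. The following Python sketch (the pendulum potential, mass, step size and step count are illustrative choices, not from the lecture) compares the ordinary Euler method with this splitting scheme on H(q, p) = p²/2 + (1 − cos q); the energy of the symplectic scheme stays close to its initial value, while that of the Euler scheme drifts:

import numpy as np

def grad_U(q):                 # pendulum potential U(q) = 1 - cos(q)
    return np.sin(q)

def H(q, p):                   # Hamiltonian with m = 1
    return 0.5*p**2 + (1.0 - np.cos(q))

def euler_step(q, p, tau):     # truncated Taylor expansion, not symplectic
    return q + tau*p, p - tau*grad_U(q)

def symplectic_step(q, p, tau):  # the splitting scheme (4.5): kick, then drift
    p_new = p - tau*grad_U(q)
    return q + tau*p_new, p_new

q_e = q_s = 1.0
p_e = p_s = 0.0
tau, N = 0.1, 10000
for _ in range(N):
    q_e, p_e = euler_step(q_e, p_e, tau)
    q_s, p_s = symplectic_step(q_s, p_s, tau)

E0 = H(1.0, 0.0)
print('Euler      energy error:', abs(H(q_e, p_e) - E0))   # noticeably large
print('symplectic energy error:', abs(H(q_s, p_s) - E0))   # stays small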

Chapter 5

The Noether Theorem

5.1 Introduction

A symmetry is an invertible transformation that preserves some properties. In geometry, e.g., one considers transformations that preserve “shape” (in Euclidean geometry this leads to translations, rotations and reflections as symmetries). In mechanical systems, symmetries are transformations that preserve the equations of motion. One also considers stricter symmetries that preserve more, e.g., the action functional. Symmetry transformations often arise as flows of vector fields. The latter are then called infinitesimal symmetries. A fundamental result in mechanics is Noether’s theorem which relates infinitesimal symmetries to constants of motion.

5.2 Symmetries in Lagrangian mechanics

Let J_L be the action functional associated to a Lagrangian function on U × R^d × I:

J_L[x] = ∫_a^b L(x(t), ẋ(t), t) dt,

where x is a path [a, b] → U, [a, b] ⊂ I. A symmetry, in a strict sense, is a diffeomorphism φ of U such that

J_L[φ ∘ x] = J_L[x]   (5.1)

for each path x on U. Notice that in particular a symmetry maps extremal paths (with end points x_a, x_b) to extremal paths (with end points φ(x_a), φ(x_b)). A vector field X on U is called an infinitesimal symmetry if its flow φ_s is a symmetry for all s (where it is defined). We then have

0 = ∂/∂s |_{s=0} J_L[φ_s ∘ x] = (δJ_L/δx)[x, X],

where we regard the path t ↦ X(x(t)) as a variation. Recall the general formula

(δJ_L/δx)[x, δx] = EL_L[x, δx] + [ Σ_{i=1}^d (∂L/∂v^i) δx^i ]_a^b,

with

EL_L[x, δx] = ∫_a^b Σ_{i=1}^d ( ∂L/∂q^i (x(t), ẋ(t), t) − d/dt ∂L/∂v^i (x(t), ẋ(t), t) ) δx^i(t) dt.


If X is an infinitesimal symmetry and x is an extremal path (i.e., it satisfies the Euler–Lagrange equations), then we get

0 = [ Σ_{i=1}^d (∂L/∂v^i) X^i ]_a^b.

This shows that the quantity in brackets is the same at the initial time a and at the final time b. Since the choice of the end times was irrelevant for the derivation, this shows that the quantity in brackets does not change and is therefore a constant of motion.¹ We can summarize this as follows:

Definition 5.2.1. The Noether 1-form associated to the Lagrangian L is the 1-form α_L on U × R^d × I defined by

α_L := Σ_{i=1}^d (∂L/∂v^i)(q, v, t) dq^i.

Theorem 5.2.2 (Noether’s Theorem). If X is an infinitesimal symmetry of JL, then

IX := ιX αL is a constant of motion.

5.2.1 Symmetries and the Lagrangian function

A symmetry of J_L may actually be recognized directly on L. Recall that we defined the tangent lift φ̌ of φ by

φ̌: U × R^d × [a, b] → U × R^d × [a, b],   (q, v, t) ↦ (φ(q), dφ(q)v, t).

Lemma 5.2.3. A diffeomorphism φ of U is a symmetry of JL, as expressed by equation (5.1), if and only if L ◦ φˇ = L.

Proof. Recall that we observed that for any map φ we have JL[φ◦x] = JL◦φˇ[x]. This immediately shows that φ is a symmetry if L ◦ φˇ = L. On the other hand, assume that φ is a symmetry. Then, by the above observation, we have that J [x] = J [x] for every path x. If we define L := L ◦ φˇ − L, we then have J [x] = 0 for L◦φˇ L e Le every path x. We want to show that Le vanishes identically. Assume on the contrary that there is d a point (q, v, τ) in U ×R ×I such that Le(q, v, τ) 6= 0. Then consider the path x(t) = q +(t−τ)v on the interval [τ, τ + ]. We then have Z τ+ J [x] = L(q + (t − τ)v, v, t) dt = L(q, v, τ) + O(2). Le e e τ By choosing  appropriately, we then see that J [x] cannot vanish for all paths and this is a Le contradiction.

We now want to move on to infinitesimal symmetries. First we need the following

Lemma 5.2.4. Let X be a vector field on U and φ_s its flow. Then the tangent lift φ̌_s is the flow of the vector field X̌ on U × R^d × I given by

X̌(q, v, t) = Σ_{i=1}^d X^i(q) ∂/∂q^i + Σ_{i,j=1}^d (∂X^i/∂q^j)(q) v^j ∂/∂v^i,

called the tangent lift of X.

¹ Recall that a function f on U × R^d × I is called a constant of motion (a.k.a. conserved quantity or first integral) if f(x(t), ẋ(t), t) is constant in t for every solution x of the Euler–Lagrange equations.

Notice that the tangent lift of X defined above does not depend on t and does not have a component in the t direction, so it can be regarded (and this is usually done) as a vector field d on U × R .

Proof. From φs+s0 = φs ◦ φs0 and φ0 = id, we get, by the definition of the tangent lift, that ˇ ˇ ˇ φs+s0 = φs ◦ φs0 and φ0 = id. So φs is also a flow. To compute the corresponding vector field, we just have to derive φˇs with respect to s at s = 0 and this gives the explicit expression for Xˇ in the Lemma.

We now have the

Corollary 5.2.5. A vector field X is an infinitesimal symmetry of JL if and only if

Xˇ(L) = 0

5.2.2 Examples

Example 5.2.6. Suppose that the system is invariant under translations in the ith direction, i.e., X_i = ∂/∂q^i is an infinitesimal symmetry. This implies that the ith component of the generalized momentum,

p_i = ∂L/∂v^i = ι_{X_i} α_L,

is a constant of motion. Since X̌_i = ∂/∂q^i, we see that this occurs if and only if ∂L/∂q^i = 0.

Example 5.2.7. Suppose that d = 3 and the system is invariant under rotations around the ith axis. In this case, the infinitesimal symmetry is given by the vector field X^i(q) = Σ_{j,k=1}^3 ε_{ijk} q^j ∂/∂q^k. The corresponding constant of motion is

J^i = ι_{X^i} α_L = Σ_{j,k=1}^3 ε_{ijk} q^j ∂L/∂v^k,

which is called the (generalized) angular momentum. If we denote ∂L/∂v^k by p_k, we then have J^i = (q × p)^i. One can easily compute X̌^i = Σ_{j,k=1}^3 ε_{ijk} q^j ∂/∂q^k + Σ_{j,k=1}^3 ε_{ijk} v^j ∂/∂v^k. This describes an infinitesimal rotation acting simultaneously on the q-space and on the v-space. A Lagrangian L of the form L(q, v, t) = T(v) − U(q) is then invariant if both functions T and U are invariant under rotations (around the ith axis). If T and U are invariant under the whole group SO(3), then all components of the vector J = q × p are conserved. Finally, if T(v) = (1/2) m||v||², we then have J = q × mv, which is the usual expression for the angular momentum.
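As an illustration of Corollary 5.2.5 in this example, here is a small symbolic check (Python with sympy; the central potential U and the symbol names are illustrative) that the tangent lift of the rotation about the third axis annihilates a rotation-invariant Lagrangian:

import sympy as sp

q1, q2, q3, v1, v2, v3 = sp.symbols('q1 q2 q3 v1 v2 v3')
m = sp.symbols('m', positive=True)
U = sp.Function('U')

r = sp.sqrt(q1**2 + q2**2 + q3**2)
L = m*(v1**2 + v2**2 + v3**2)/2 - U(r)    # rotation-invariant Lagrangian

# rotation about the 3rd axis: X = q1 d/dq2 - q2 d/dq1;
# its tangent lift adds       v1 d/dv2 - v2 d/dv1
X_check_L = (q1*sp.diff(L, q2) - q2*sp.diff(L, q1)
             + v1*sp.diff(L, v2) - v2*sp.diff(L, v1))
print(sp.simplify(X_check_L))             # 0, so X is an infinitesimal symmetry

# the associated Noether constant of motion is J^3 = q1*p2 - q2*p1 = m*(q1*v2 - q2*v1)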

5.2.3 Generalized symmetries

Condition (5.1) is a bit too strong since it involves the action functional also away from extremal paths. One way to weaken it, in such a way that a symmetry still preserves the Euler–Lagrange equations, is to assume that the restriction of

JL,φ[x] := JL[φ ◦ x] − JL[x]

[a,b] on each path space Pxa,xb := {x:[a, b] → U : x(a) = xa, x(b) = xb} is constant. This can also be characterized as in the following [a,b] Lemma 5.2.8. Assume that the restriction of JL,φ on each Pxa,xb is constant. Then there is a function F on U such that JL,φ[x] = F (xb) − F (xa) [a,b] for each x ∈ Pxa,xb for each a, b, xa, xb. 44 CHAPTER 5. THE NOETHER THEOREM

Remark 5.2.9. Notice that the function F is defined up to an additive constant on each connected component of U.

Proof. Let us first consider the case when U is connected. Fix a point y in U. Then JL,φ will [a,b] take the same value on all paths in Py,xb . We then define F (xb) as such value. Next we want to show that F (y) = 0. For xb = y, we also have the constant path x(t) = y ∀t ∈ [a, b] at our disposal. This can be considered for arbitrary a and b. Since JL,φ is defined as an integral, its value vanishes if b tends to a. So F (y) = 0. [a,b] Consider now a path x in Pxa,y with the property that all its derivatives at t = a vanish. [a,b] [a,b] Next define x ∈ Py,xa by x(t) = x(a + b − t). Finally define xe ∈ Py,y by ( x(2t − a) if t∈ a, a+b  x(t) = 2 e  a+b  x(2t − b) if t ∈ 2 , b Notice that the vanishing condition on the derivatives of x at its initial point makes this path smooth. We have JL,φ[xe] = F (y) = 0. On the other hand, since JL,φ is defined as an integral, [a,b] we have JL,φ[xe] = JL,φ[x] + JL,φ[x]. So JL,φ[x] = −JL,φ[x] = −F (xa) since x ∈ Py,xa . By the [a,b] constancy of JL,φ we conclude that JL,φ[x] = −F (xa) for every x ∈ Pxa,y [a,b] Finally consider a path x in Pxa,xb that passes through y at some time τ ∈ (a, b). Denote by xe the restriction of this path to the interval [a, τ] and byx ¯ its restriction to [τ, b]. Since J is defined as an integral, we have J [x] = J [x] + J [¯x]. Sincex ¯ ∈ P[τ,b], we know L,φ L,φ L,φ e L,φ y,xb [a,τ] that JL,φ[¯x] = F (xb) and, since xe ∈ Pxa,y , we know that JL,φ[xe] = −F (xa). We conclude that JL,φ[x] = F (xb) − F (xa). If U is not connected, we just apply the above procedure to each connected component.

We then define a generalized symmetry as a pair (φ, F ), where φ is a diffeomorphism of U and F is a function on U, such that

JL[φ ◦ x] = JL[x] + F (xb) − F (xa)

[a,b] for each x ∈ Pxa,xb for each a, b, xa, xb. Notice that we can write

d Z b d Z b X ∂F F (x ) − F (x ) = F (x(t)) dt = x˙ i(t) (x(t)) dt. b a dt ∂qi a a i=1

Pd i ∂F If we define Fe(q, v) := i=1 v ∂qi (q), then, by repeating the arguments in the proof of Lemma 5.2.3, we conclude that (φ, F ) is a generalized symmetry if and only if

L ◦ φˇ = L + F.e (5.2)

The infinitesimal version of a generalized symmetry is a vector field X such that its flow φs together with a family Fs of functions is a generalized symmetry for all s (where it is defined):

JL[φs ◦ x] = JL[x] + Fs(xb) − Fs(xa) (5.3)

∂Fs for all s. Define f = ∂s |s=0. We then say that the pair (X, f) defines a generalized infinitesimal symmetry. By differentiating equations (5.2) and (5.3) with respect to s at s = 0 we get the following

Proposition 5.2.10. The pair (X, f) defines a generalized infinitesimal symmetry of J_L if and only if

X̌(L) = f̃,

with f̃(q, v) := Σ_{i=1}^d v^i (∂f/∂q^i)(q). In this case

IX := ιX αL − f is a constant of motion.

[a,b] Remark 5.2.11. The even more general case when the restriction of JL,φ on each Pxa,xb is only 2 [a,b] locally constant on each path space Pxa,xb := {x:[a, b] → U : x(a) = xa, x(b) = xb} can be treated with a bit more of work. We enunciate the results for completeness. A diffeomorphism Pd i φ of U has this property if and only there is a closed 1-form Θ = i=1 Θi dq such that Z JL[φ ◦ x] = JL[x] + Θ x

ˇ Pd i and this occurs if and only if L ◦ φ = L + Θe with Θ(e q, v) = i=1 v Θi(q). A vector field X Pd i defines the infinitesimal version of this if and only if there is a closed 1-form θ = i=1 θi dq ˇ Pd i such that X(L) = θe with θe(q, v) = i=1 v θi(q). This does not lead to a conservation law if θ is not exact, but only to the statement that the closed 1-form ιX αL − θ integrates to zero along every orbit. Notice that if the closed form θ is exact, θ = df, then we are in the case described above and we have indeed a constant of motion as in Proposition 5.2.10. By Poincar´eLemma we know that this necessarily happens if U is star shaped. Otherwise, we may always restrict our attention to a star shaped neighborhood V ⊂ U of the initial conditions. As long as the orbit lies in V , we have a constant of motion. We may also take a later point of this orbit as a new initial condition and choose a new star shaped neighborhood. In general, we may patch the whole orbit by star shaped neighborhoods Vi and in each of them we have a constant of motion. In each Vi we have indeed a function fi such that θ|Vi = dfi and a constant of motion IX,i := ιX αL −fi defined on Vi. Notice however that, if Vi and Vj have a nonempty intersection, the restriction of fi and fj to this intersection will not be equal in general, but will differ by a constant. So, in this setting, a constant of motion exists, but only locally, and it may be thought globally only up to locally defined constants. In particular, it is not a globally defined function. Remark 5.2.12. Notice that we have a

1 ∞ d T :Ω (U) → C (U × R ) Pd i Pd i i=1 θi(q) dq 7→ i=1 v θi(q).

The function θ̃ used in the previous remark is then Tθ. The function f̃ used above is T df. Notice that T φ*θ = φ̌* Tθ for every smooth map φ: U → V and for every θ ∈ Ω¹(V). More abstractly, we may write

T φ∗ = φˇ∗T (5.4) as an equality of linear maps Ω1(V ) → Ω1(U).

2This means that each such restriction does not change under continuous deformations of the path. On the other hand, if the path space is not connected, the restriction may take a different value on different connected components. 46 CHAPTER 5. THE NOETHER THEOREM

Equivalent Lagrangians

d d Let L be a Lagrangian on U × R ×I, U open in R , and let G be a function on U. Define

Le = L + T dG

(i.e., L(q, v, t) = L(q, v, t) + Pd vi ∂G ). We then have J [x] = J [x] + G(x ) − G(x ) for all e i=1 ∂qi Le L b a [a,b] x ∈ Pxa,xb . This implies that L and Le have the same extremal paths and are therefore equivalent from the point of view of Euler–Lagrange equations. The Noether 1-forms are related by

α = α + dG. Le L Also notice that, if (φ, F ) is a generalized symmetry for L, then by (5.4) we get that (φ, F +φ∗G) is a generalized symmetry for Le. Hence, if (X, f) is a generalized infinitesimal symmetry for L, we see that (X, f + X(G)) is a generalized infinitesimal symmetry for Le. We then have that ι α − f − X(G) = ι α − f, so the constant of motion corresponding to the generalized X Le X L symmetry (X, f) of L is equal to the constant of motion corresponding to the generalized symmetry (X, f + X(G)) of Le. Notice that a strict infinitesimal symmetry X of L is only a generalized one if one uses the equivalent Lagrangian Le and X(G) 6= 0. So the notion of strict symmetry really depends on the choice of Lagrangian. Since one cannot be sure to have chosen the “right” Lagrangian (and there might be no Lagrangian that is “right” for all symmetries in case there are more at hand), the correct framework to use is that of generalized symmetries. Also observe that, if f can be written as −X(G) for some function G, then we may transform the generalized infinitesimal symmetry (X, f) of L into an infinitesimal symmetry, in the strict sense, of the equivalent Lagrangian Le. Notice however that, even if this may be possible, different infinitesimal symmetries may require different equivalent Lagrangians to be made strict.

Example: constant magnetic field Consider a charged particle moving in a constant magnetic field of magnitude B 6= 0. The system is clearly invariant under translations in every direction. On the other hand, the Lagrangian depends on a vector potential generating this field and this cannot be translation invariant, otherwise it would produce the zero magnetic field. This provides an example of generalized symmetry. Suppose for definiteness that B points in x3-direction and the particle has mass m and charge e. Newton’s equation then read (setting the speed of light c to 1)

mx¨1 = eBv2, mx¨2 = −eBv1, mx¨3 = 0, and they are clearly invariant under translations x(t) 7→ x(t)+a, where a is an arbitrary vector, since they only depend on the first and second time derivatives of x. On the other hand, the Lagrangian of the system is 1 L(x, v) = m||v||2 + ev · A(x). 2 Notice that A cannot be translation invariant (i.e., constant) since B = ∇×A 6= 0. For example 2 we may choose A1 = −Bx , A2 = A3 = 0 getting 1 L(x, v) = m||v||2 − eB v1x2. 2 5.3. FROM THE LAGRANGIAN TO THE HAMILTONIAN FORMALISM 47

We then have 3 X i i 2 1 αL = mv dq − eB x dx . i=1 Notice that this Lagrangian is invariant under translations in the x1 and in the x3 direction. So we have the integrals of motion

1 2 P1 := ι ∂ αL = mv − eB x , ∂x1 3 P3 := ι ∂ αL = mv . ∂x3 Under a translation in the x2 direction we have, on the other hand,

∂ˇ ∂ L = L = −eB v1 = fe ∂x2 ∂x2 with f(x) = −eB x1. We then get the integral of motion

2 1 P2 := ι ∂ αL − f = mv + eB x . ∂x2 Exercise 5.2.13. Show that rotations around the x3 axis are also generalized symmetries and compute the corresponding integral of motion. Exercise 5.2.14. Show that the integrals of motions computed above do not change if the vector potential A is changed to A + ∇λ. Exercise 5.2.15. Show that one can choose A such that v · A is invariant under rotations around the x3 axis, so that rotations around the x3 axis become symmetries in the strict sense.

Exercise 5.2.16. Show that changing the vector potential A to Ae = A + ∇λ produces an equivalent Lagrangian.

5.3 From the Lagrangian to the Hamiltonian formalism

d Suppose L is a hyperregular Lagrangian on U × R × I. Denote by ψL its associated Legendre ∂L mapping and by H its Legendre transform. The first observation is that the factors ∂vi appearing in the Definition 5.2.1 of the Noether 1-form are just the generalized momenta pi, so we have

∗ αL = ψLα, where d X i α := pidq i=1 is the Liouville 1-form (a.k.a. the Poincar´e1-form or the tautological 1-form). Notice that the dependency on the Lagrangian of the Noether 1-form comes only through the Legendre mapping: the Liouville 1-form is independent of L. Also notice that the canonical symplectic form ω is actually dα. Now suppose that X is an infinitesimal symmetry. Then IX = ιX αL is ∗ a constant of motion. We clearly have IX = ψLFX with

d X i FX = ιX α = X (q) pi. i=1

Since this is a constant of motion, we then have XH (FX ) = 0. But this is equivalent to

X̃(H) = 0,   (5.5)

where X̃ is the Hamiltonian vector field of F_X:

ι ω = −dF . (5.6) Xe X This is a consequence of the following basic

Lemma 5.3.1. Let f and g be functions, and let Xf and Xg be their Hamiltonian vector fields. Then

Xf (g) = −Xg(f) = ιXg ιXf ω

We will see that equations (5.5) and (5.6) characterize symmetries in the Hamiltonian formalism. The present case has however two peculiarities. The first is that F_X is linear in the p variables. The second is that we have F_X = ι_{X̃} α, which in turn implies L_{X̃} α = 0. A simple computation shows that

X̃ = X − Σ_{i,j=1}^d (∂X^i/∂q^j)(q) p_i ∂/∂p_j,

which is called the cotangent lift of X. If (X, f) is a generalized infinitesimal symmetry, then we have I_X = ι_X α_L − f = ψ_L^* F_{X,f} with F_{X,f} = Σ_{i=1}^d X^i(q) p_i − f(q). If we denote by X̃_f = X̃ − X_f the Hamiltonian vector field of F_{X,f}, we then get X̃_f(H) = 0 and ι_{X̃_f} ω = −dF_{X,f}, which have the same form as (5.5) and (5.6). Notice that in this case F_{X,f} is at most linear in the p variables and that L_{X̃_f} α = df.

5.4 Symmetries in Hamiltonian mechanics

Let H ∈ C^∞(V) be a time-independent Hamiltonian on an open subset V of R^d × R^d. Notice that we can assume H to be time independent without loss of generality.³ Recall that the Hamilton equations for H can be obtained as Euler–Lagrange equations for the Lagrangian⁴

L̃_H(q, p, v_q, v_p) = Σ_{i=1}^d p_i v_q^i − H(q, p) = Tα − H,

where we have used the notations of Remark 5.2.12 with α = Σ_{i=1}^d p_i dq^i the Liouville 1-form. Notice that we also have α_{L̃_H} = α. A symmetry is then a diffeomorphism φ of the phase space V such that φ̌* L̃_H = L̃_H. By Remark 5.2.12, we have φ̌* L̃_H = T φ*α − φ*H. Since H does not depend on the velocities, we then have that φ is a symmetry if and only if

φ∗H = H and φ∗α = α.

Notice that the second equation also implies φ∗ω = ω. A vector field Y on V is then an infinitesimal symmetry if and only if

Y (H) = 0 and LY α = 0, whereas the corresponding constant of motion is ιY α.

3 If H(q, p, t) were time dependent, we could replace it by the time-independent Hamiltonian He(q, τ, p, pτ ) = H(q, p, τ) + pτ on extended phase space. 4Notice that fixing both positions and momenta at endpoints in general yields no solutions. On the other hand, the boundary term in computing the functional derivative of this Lagrangian does not involve the variation of the momenta. One then considers extremal paths with fixed qs at the endpoints, leaving the ps free. The corresponding Euler–Lagrange equations for these extremal paths are then the Hamilton equations. 5.4. SYMMETRIES IN HAMILTONIAN MECHANICS 49

This occurs, e.g., if H is the Legendre transform of L and Y is the cotangent lift of an infinitesimal symmetry of L. Similarly, we see that (φ, F ) is a generalized symmetry if and only if

φ∗H = H and φ∗α = α + dF.

A pair (Y, g) of a vector field and a function on V is then a generalized infinitesimal symmetry if and only if Y (H) = 0 and LY α = dg.

In this case, the constant of motion is F = ιY α − g. This implies

ιY ω = −dF. Notice that now F can be an arbitrary function on V . We have thus arrived at the Theorem 5.4.1. A (possibly generalized) infinitesimal symmetry of a Hamiltonian system on

V with Hamiltonian function H is a Hamiltonian vector field XF , ιXF ω = −dF , such that XF (H) = 0.

Remark 5.4.2. Notice that by Lemma 5.3.1 this immediately implies XH (F ) = 0, so F is the corresponding constant of motion. Remark 5.4.3. Lemma 5.3.1 can also be read the other way around. Namely, suppose that F is a constant of motion for a Hamiltonian system with Hamiltonian H (i.e., XH (F ) = 0). Then XF is a symmetry (i.e., XF (H) = 0). In this way, we have a one-to-one correspondence between symmetries and constants of motion up to an additive constant. d Remark 5.4.4. Suppose that H is the Legendre transform of a Lagrangian L defined on U ×R =: V . In the Lagrangian formalism, infinitesimal symmetries are vector fields on U that preserve L (possibly up to a term T df, with f a function on U). These correspond to infinitesimal symmetries in the Hamiltonian formalism whose constants of motions are at most linear in the p variables. These are just very particular examples of symmetries. Remark 5.4.5. The general case discussed in Remark 5.2.11 corresponds, in its infinitesimal form, to having a vector field Y and a closed 1-from θ such that Y (H) = 0 and LY α = θ. Notice that the second equation is equivalent to LY ω = 0. Again this does not produce a constant of motion in general. However, locally we may have θ exact and proceed as above.

5.4.1 Symplectic geometry The above results have a nice, general form in symplectic geometry. There one just assumes to have a closed 2-form ω with the property that for each function f there is a unique vector

field Xf , called the Hamiltonian vector field of f, such that ιXf ω = −df. The latter property may be checked by computing the matrix representing ω at each point in local coordinates and verifying that it is nondegenerate. A function f satisfying ιX ω = −df is called a Hamiltonian function for X and is not uniquely defined (it is defined up to a constant on each connected component). A vector field that is the Hamiltonian vector field for some function is called a Hamiltonian vector field. Not every vector field is Hamiltonian. Notice that a Hamiltonian vector field X automatically satisfies LX ω = 0. A vector satisfying this property is called a symplectic vector field. In general not every symplectic vector field is Hamiltonian. Notice that Lemma 5.3.1 holds in this general setting. Hence, if we are given a Hamiltonian function H, which defines the dynamic of the system, we define a symmetry to be a Hamiltonian vector field X such that X(H) = 0. If f is a Hamiltonian function for X, then the Lemma implies that XH (f) = 0, so f is a constant of motion. This is the general, simple and beautiful version of Noether’s theorem in symplectic geometry. 50 CHAPTER 5. THE NOETHER THEOREM

5.4.2 The Kepler problem

Consider the Hamiltonian

H(q, p) = ||p||²/(2m) − K/||q||

for a body in a gravitational field, where K is a positive constant. Notice that H is a smooth function on U × R³ with U = R³ \ {0}. Rotation invariance, in the Lagrangian version, yields conservation of the angular momentum⁵ J = q × p, see Example 5.2.7. Notice that the components of J are linear in p as they come from symmetries in the Lagrangian formalism. Another constant of motion is the Laplace–Runge–Lenz vector (shortly, the Lenz vector)

A = p × J − mK q/||q||.

The simplest way to prove that A is conserved is by taking its time derivative along a solution to Hamilton’s equations

q̇ = p/m,    ṗ = −K q/||q||³.

Using J̇ = 0 along a solution, we then get

Ȧ = −K (q/||q||³) × J − K ( p/||q|| − q (p · q)/||q||³ ) = 0.

From X_H(A) = 0 we then get X_{A_i}(H) = 0 for all i. Notice that A has a quadratic term in the p variables, so it does not come from a symmetry in the Lagrangian setting (nor is X_{A_i} a cotangent lift). On the other hand, the vector fields X_{A_i} are symmetries in the Hamiltonian formalism. The conservation of the Lenz vector can be used to derive the Kepler orbits directly (without having to solve the differential equations). First observe that, by the cyclic property of the triple product, we have A · q = ||J||² − mK||q||. Recall that, adapting the coordinates to the initial conditions, we may assume that the motion occurs in the xy plane. Then J points in the z direction. So A is also in the xy plane. If we denote by θ the angle between A and q, we may then rewrite the above equation as A r cos θ = J² − mKr, where we have set r := ||q||, J := ||J|| and A := ||A||. Notice that J and A are obviously also constants of motion. We then get

r = (J²/mK) / (1 + (A/mK) cos θ),

which shows that the orbits are conic sections with eccentricity e = A/(mK).
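Since the conservation of A is equivalent, by Lemma 5.3.1, to {A_i, H} = 0, it can also be checked symbolically. A small Python/sympy sketch (the symbol names are illustrative):

import sympy as sp

m, K = sp.symbols('m K', positive=True)
q = sp.Matrix(sp.symbols('q1 q2 q3'))
p = sp.Matrix(sp.symbols('p1 p2 p3'))
r = sp.sqrt(q.dot(q))

H = p.dot(p)/(2*m) - K/r           # Kepler Hamiltonian
J = q.cross(p)                     # angular momentum
A = p.cross(J) - m*K*q/r           # Laplace-Runge-Lenz vector

def pb(f, g):
    """Poisson bracket on R^3 x R^3."""
    return sum(sp.diff(f, q[i])*sp.diff(g, p[i]) - sp.diff(f, p[i])*sp.diff(g, q[i])
               for i in range(3))

print([sp.simplify(pb(A[i], H)) for i in range(3)])   # [0, 0, 0]
print([sp.simplify(pb(J[i], H)) for i in range(3)])   # [0, 0, 0]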

Remark 5.4.6. The Hamiltonian vector fields XJi generate infinitesimal rotations and correspond to the fact that the Hamiltonian is rotation invariant (with rotations extended to phase space by the cotangent lift; i.e, A ∈ SO(3) acts by (q, p) 7→ (Aq, Ap)). The vector fields XAi generate additional infinitesimal transformations. One can show that the XJi ’s and the XAi ’s together generate a (nonlinear) action of the group SO(4) on phase space.

5By this we mean that each component of the vector J is a constant of motion. Chapter 6

The Hamilton–Jacobi equation

6.1 Introduction

The Hamilton–Jacobi equation is a PDE associated to a Hamiltonian system with which it has a two-way link. On the one hand, it is the PDE satisfied by the action functional evaluated on orbits as a function of the end variables; as such, solving the Hamilton equations provides an effective method of solving the Cauchy problem for the Hamilton–Jacobi equation (method of characteristics). On the other hand, a solution depending on enough parameters (complete integral) provides a generating function for a canonical transformation that trivializes the Hamiltonian, thus allowing one to solve the Hamilton equations. The method actually only works for very special systems (integrable systems); however, perturbations of integrable systems are more effectively studied in the canonical variables in which the unperturbed Hamiltonian is trivialized. Finally, the Hamilton–Jacobi equation is related to the asymptotics of the Schrödinger equation in the classical limit.

6.2 The Hamilton–Jacobi equation

Throughout we will denote by U an open subset of R^d, by I an interval and by H a Hamiltonian function on U × R^d × I, which we will write as H(q, p, t). The Hamilton–Jacobi equation for the unknown function S on an open subset of U × I is then

∂S/∂t + H(q, ∂S/∂q, t) = 0,

where ∂S/∂q is a shorthand notation for (∂S/∂q^1, ..., ∂S/∂q^d). If the Hamiltonian does not depend on the time variable t, we simply write H(q, p). To it one associates the reduced Hamilton–Jacobi equation in the unknown function S_0 on an open subset of U:

H(q, ∂S_0/∂q) = E,

where E is a parameter, called the energy. Notice that if S_0 is a solution of the reduced Hamilton–Jacobi equation at energy E, then S(q, t) = S_0(q) − Et is a solution of the Hamilton–Jacobi equation. Finally, notice that S and S_0 enter into the equations only through their derivatives; so every solution can be shifted by a constant yielding a new solution.

Remark 6.2.1. The Hamilton–Jacobi equation is also related to the asymptotics of the Schrödinger equation which appears in quantum mechanics. For the Hamiltonian H(q, p) = Σ_{i=1}^d p_i²/(2m) + V(q),

this is the PDE

iℏ ∂ψ/∂t = −(ℏ²/2m) Σ_{i=1}^d ∂²ψ/(∂q^i)² + V(q) ψ,

where the unknown ψ is a time-dependent complex-valued function on U and ℏ is a constant. If one writes

ψ(q, t) = A(q, t) e^{(i/ℏ) φ(q,t)},

where A and φ are real valued, and assumes that for ℏ small we have A = A_0 + O(ℏ) and φ = φ_0 + O(ℏ), then φ_0 solves the Hamilton–Jacobi equation on W := {q ∈ U : A_0(q, t) ≠ 0 ∀t}. This gives an indication that in the limit ℏ → 0 quantum mechanics is approximated by classical mechanics.

6.3 The action as a function of endpoints

d Let L be a Lagrangian function on U × R × I and denote by S its corresponding action functional: Z tB S[x] = L(x(t), x˙(t), t) dt, tA where x is a map [tA, tB] → U, with [tA, tB] ⊂ I. Let W be an open subset of U × I × U × I such that for each (q , t , q , t ) ∈ W there is a unique extremal path, denoted by q∗ A A B B qA,tA,qB ,tB (or simply q∗). Define S∗(q , t , q , t ) := S[q∗ ]. A A B B qA,tA,qB ,tB Example 6.3.1. Consider the free particle in one dimension; i.e., d = 1, U = R, L(q, v, t) = 1 2 2 mv . The EL equation, mq¨ = 0, is easily solved and we have, with W = {(qA, tA, qB, tB) ∈ U × I × U × I : tA 6= tB}, (t − t )q + (t − t)q q∗ (t) = A B B A . qA,tA,qB ,tB tB − tA Hence q − q q˙∗ (t) = B A qA,tA,qB ,tB tB − tA and Z tB  2 2 ∗ 1 qB − qA 1 (qB − qA) S (qA, tA, qB, tB) = m dt = m . tA 2 tB − tA 2 tB − tA We now want to study the dependency of S∗ on its arguments. Theorem 6.3.2. We have ∂S∗ ∂S∗ i = pBi, = −HB, ∂qB ∂tB ∂S∗ ∂S∗ i = −pAi, = HA, ∂qA ∂tA where ∂L p (q , t , q , t ) := (q , q˙∗ (t ), t ), Bi A A B B ∂vi B qA,tA,qB ,tB B B ∂L p (q , t , q , t ) := (q , q˙∗ (t ), t ), Ai A A B B ∂vi A qA,tA,qB ,tB A A d X ∗i ∗ HB(qA, tA, qB, tB) = pBi(qA, tA, qB, tB)q ˙ (tB) − L(qB, q˙ (tB), tB), i=1 d X ∗i ∗ HA(qA, tA, qB, tB) = pAi(qA, tA, qB, tB)q ˙ (tA) − L(qA, q˙ (tA), tA). i=1 6.3. THE ACTION AS A FUNCTION OF ENDPOINTS 53

One can also compactly write

dS* = − Σ_{i=1}^d p_{Ai} dq_A^i + H_A dt_A + Σ_{i=1}^d p_{Bi} dq_B^i − H_B dt_B.

Example 6.3.3. Let us check the above formulae in the case of Example 6.3.1. We explicitly have

∂S*/∂q_B = m (q_B − q_A)/(t_B − t_A) = m q̇*_{q_A,t_A,q_B,t_B}(t_B)

and

∂S*/∂t_B = −(1/2) m ((q_B − q_A)/(t_B − t_A))² = −(1/2) m q̇*_{q_A,t_A,q_B,t_B}(t_B)².
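These identities, together with the fact that S* satisfies the Hamilton–Jacobi equation in the end variables (q_B, t_B) for H = p²/(2m), can also be confirmed symbolically. A small Python/sympy sketch (symbol names are illustrative):

import sympy as sp

qA, tA, qB, tB = sp.symbols('qA tA qB tB')
m = sp.symbols('m', positive=True)

S = m*(qB - qA)**2 / (2*(tB - tA))       # action of the free-particle extremal path

pB = sp.diff(S, qB)                      # = m (qB - qA)/(tB - tA)
HJ = sp.diff(S, tB) + pB**2/(2*m)        # dS/dtB + H(qB, dS/dqB)
print(sp.simplify(HJ))                   # 0: S solves the Hamilton-Jacobi equation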

Proof of Theorem 6.3.2. We begin with the derivatives with respect to qB. Let δqB be a vector d in R . We have

∗ ∗ d ∗ S (qA, tA, qB + δqB, tB) − S (qA, tA, qB, tB) X ∂S i lim = δqB. →0  ∂qi i=1 B Write q∗ := q∗ and q∗ = q∗. We have  qA,tA,qB +δqB ,tB 0

∗ ∗ 2 q = q + δq + O( )

d for a uniquely defined path δq :[tA, tB] → R . Thus, S[q∗] − S[q∗] δS lim  = [q∗, δq], →0  δq which implies d X ∂S∗ δS δqi = [q∗, δq]. ∂qi B δq i=1 B ∗ Now observe that, since q is an extremal path, we have

δS [q∗, δq] = δq d X  ∂L ∂L  = (q , q˙∗(t ), t )δqi(t ) − (q , q˙∗(t ), t )δqi(t ) . (6.1) ∂vi B B B B ∂vi A A A A i=1

∗ ∗ Since q (tA) = qA, we get δq(tA) = 0. From q (tB) = qB + δqB, we conclude δq(tB) = δqB. Thus, d δS X ∂L [q∗, δq] = = (q , q˙∗(t ), t )δqi , δq ∂vi B B B B i=1 which proves the first equation. We now come to the second equation, the derivative with respect to tB. For δtB ∈ R, we have ∗ ∗ ∗ S (qA, tA, qB, tB + δtB) − S (qA, tA, qB, tB) ∂S lim = δtB. →0  ∂tB Write q∗ := q∗ and q∗ = q∗. Notice that q∗ is defined on the interval [t , t +δt ].  qA,tA,qB ,tB +δtB 0  A B B ∗ ∗ Assume δtB ≥ 0 and denote by qe the restriction of q to [tA, tB]. We then have

Z tB +δtB ∗ ∗ ∗ ∗ S[q ] = S[qe ] + L(q (t), q˙ (t), t) dt. tB 54 CHAPTER 6. THE HAMILTON–JACOBI EQUATION

Notice that we have ∗ ∗ 2 qe = q + δq + O( ) d 1 for a uniquely defined path δq :[tA, tB] → R . Thus, ∗ ∗ S[q ] − S[q ] δS ∗ ∗ ∗ lim = [q , δq] + L(q (tB), q˙ (tB), tB)δtB, →0  δq which implies ∗ ∂S δS ∗ ∗ ∗ δtB = [q , δq] + L(q (tB), q˙ (tB), tB)δtB. ∂tB δq ∗ ∗ We use again (6.1). Notice that qe (tA) = q (tA) = qA implies δq(tA) = 0. On the other hand,

∗ ∗ ∗ 2 qB = q (tB + δtB) = q (tB) + q˙ (tB)δtB + O( ) = ∗ ∗ 2 ∗ ∗ 2 = q (tB) + q˙ (tB)δtB + O( ) = q (tB) + (δq(tB) +q ˙ (tB)δtB) + O( ) = ∗ 2 = qB + (δq(tB) +q ˙ (tB)δtB) + O( ),

∗ so δq(tB) = −q˙ (tB)δtB. We conclude that

d δS X ∂L [q∗, δq] = − (q , q˙∗(t ), t )(q ˙∗)i(t )δt , δq ∂vi B B B  B B i=1 which proves the second equation. The third and the fourth equations are proved along the same lines.

Now assume that L is hyperregular and denote by H its Legendre transform. We then have

HB(qA, tA, qB, tB) = H(qB, pB(qA, tA, qB, tB), tB).

The first two equations in Theorem 6.3.2 then imply

∂S∗  ∂S∗  + H qB, , tB = 0; ∂tB ∂qB

∗ i.e., S as a function of the end variables (qB, tB) satisfies the Hamilton–Jacobi equation.

6.4 Solving the Cauchy problem for the Hamilton–Jacobi equa- tion

The Cauchy problem for the Hamilton–Jacobi equation is the system

∂S  ∂S  + H q, , t = 0, ∂t ∂q

S(q, t0) = σ(q), where σ is a given function on U. For simplicity, and actually without loss of generality, we assume that H is time independent and take t0 = 0. We want to show that we can solve the Cauchy problem for the Hamilton–Jacobi equation by integrating the Hamilton equations. First define

 ∂σ  L := (q, p) ∈ U × d : p = (q) ∀i R i ∂qi

1 + − The limit is for  → 0 if δtB ≥ 0 and for  → 0 otherwise. 6.4. SOLVING THE CAUCHY PROBLEM FOR THE HAMILTON–JACOBI EQUATION55 and Lt := φt(L), where φt is the flow of the Hamiltonian vector field of H. Notice that L is defined as the graph of a map. This will be the case also for Lτ for τ small. Let t1 be the greatest number such that Lτ is a graph for all τ ∈ (0, t1). Then for each q ∈ U and for each τ ∈ (0, t1) there is a unique p(q, τ) such that (q, p(q, τ)) ∈ Lτ . We then have a unique orbit (q∗, p∗) on the interval [0, τ] such that q∗(τ) = q and p∗(τ) = p(q, τ): namely, ∗ ∗ −1 (q , p )(t) = φt(φτ (q, p(q, τ))). ∗ ∗ ∗ ∗ ∂σ ∗ 2 Equivalently, (q , p ) is the unique orbit with q (τ) = q and pi (0) = ∂qi (q (0)) ∀i. These orbits are called characteristics of the system. ∗ Let φ: U × (0, t1) → U be the map that assigns q (0) to a pair (q, τ). Notice that limτ→0 φ(q, τ) = q ∀q ∈ U, so we can extend φ to U × [0, t1) Then we have the Theorem 6.4.1. If H is the Legendre transform of the Lagrangian L, then the function

Z τ S(q, τ) := σ(φ(q, τ)) + L(q∗(t), q˙∗(t)) dt 0 solves the Cauchy problem for τ ∈ [0, t1). Proof. We clearly have S(q, 0) = σ(q). Then let Z τ ∗ ∗ ∗ Schar(q, τ) := L(q (t), q˙ (t)) dt = S (φ(q, τ), 0, q, τ). 0 By Theorem 6.3.2 we have

d i ∂Schar X ∂σ ∂φ (q, τ) = p (q, τ) − (φ(q, τ)) (q, τ), ∂qj j ∂qi ∂qj i=1 d i ∂Schar X ∂σ ∂φ (q, τ) = −H(q, p(q, τ)) − (φ(q, τ)) (q, τ). ∂τ ∂qi ∂τ i=1 Hence ∂S (q, τ) = p (q, τ), ∂qj j ∂S (q, τ) = −H(q, p(q, τ)), ∂τ so S solves the Hamilton–Jacobi equation.

For a general Hamiltonian H, we have the

Theorem 6.4.2. The function

Z τ d ! X ∗ ∗i ∗ ∗ S(q, τ) := σ(φ(q, τ)) + pi (t)q ˙ (t) − H(q (t), p (t)) dt 0 i=1 solves the Cauchy problem for τ ∈ [0, t1). Notice, by the way, that for H the Legendre transform of L one has

Z τ d ! Z τ X ∗ ∗i ∗ ∗ ∗ ∗ pi (t)q ˙ (t) − H(q (t), p (t)) dt = L(q (t), q˙ (t)) dt. 0 i=1 0 2In practice, one solves the backward Cauchy problem with final conditions q∗(τ) = q and p∗(τ) = p for some ∗ ∂σ ∗ p and then uses the conditions pi (0) = ∂qi (q (0)) ∀i to determine p as a function of q and τ. 56 CHAPTER 6. THE HAMILTON–JACOBI EQUATION

Proof. We clearly have S(q, 0) = σ(q). Then recall that the Hamilton equations for H are also the EL equations for the Lagrangian function

d X i Le(q, p, vq, vp, t) := pivq − H(q, p, t) i=1

d 2d defined on (U × R ) × R × I. Denote by Se the action functional corresponding to Le and observe that

Z τ d ! X ∗ ∗i ∗ ∗ ∗ ∗ SHam(q, τ) := pi (t)q ˙ (t) − H(q (t), p (t)) dt = Se[(q , p )]. 0 i=1

We now want to compute derivatives of SHam with respect to its arguments. We essentially proceed like in the proof of Theorem 6.3.2. The first remark is that, for any solution (Q, P ) of the Hamilton equations on an interval [0, τ], we have

δSe[(Q, P ), (δQ, δP )] := Se[(Q + δQ, P + δP )] − Se[(Q, P )] = lim = →0  d X i i  = Pi(τ)δQ (τ) − Pi(0)δQ (0) . i=1 In particular, for the characteristic (q∗, p∗) we have

d d ∗ ∗ X i X ∂σ i δSe[(q , p ), (δQ, δP )] = pi(q, τ)δQ (τ) − (φ(q, τ))δQ (0), ∂qi i=1 i=1 where again we have written p∗(τ) = p(q, τ). ∗ ∗ ∗ ∗ Now consider the characteristic (q , p ) corresponding to (q + δq, τ). We write q = q + ∗ 2 ∗ ∗ ∗ 2 δq + O( ) and p = p + δp + O( ). Since

∗ ∗ 2 q + δq = q (τ) = q + δq (τ) + O( ), we get δq∗(τ) = δq. Since

∗ ∗ 2 φ(q + δq, τ) = q (0) = φ(q, τ) + δq (0) + O( ),

∗i Pd ∂φi j we get δq (0) = j=1 ∂qj (q, τ)δq . Hence

d d i ∗ ∗ ∗ ∗ X i X ∂σ ∂φ j δSe[(q , p ), (δq , δp )] = pi(q, τ)δq − (φ(q, τ)) (q, τ)δq . ∂qi ∂qj i=1 i,j=1 Since d SHam(q + δq, τ) − SHam(q, τ) X ∂SHam lim = (q, τ)δqj, →0  ∂qj j=1 we finally get d i ∂SHam X ∂σ ∂φ (q, τ) = p (q, τ) − (φ(q, τ)) (q, τ) ∂qj j ∂qi ∂qj i=1 and so ∂S (q, τ) = p (q, τ). ∂qj j 6.5. GENERATING FUNCTIONS 57

∗ ∗ Similarly, we now denote by (q , p ) the characteristic corresponding to (q, τ + δτ). We ∗ ∗ ∗ ∗ assume δτ ≥ 0 and denote by (qe , pe ) the restriction of (q , p ) to [0, τ]. We then have

SHam(q, τ + δτ) = d ! ∗ ∗ X ∗i 2 = Se[(qe , pe )] + pi(q, τ)q ˙ (τ) − H(q, p(q, τ)) δτ + O( ). i=1 ∗ ∗ ∗ 2 ∗ ∗ ∗ 2 We write qe = q + δq + O( ) and pe = p + δp + O( ). Reasoning as above we get ∗ ∗ ∗i ∂φi δq (τ) = −q˙ (τ)δτ and δq (0) = ∂τ (q, τ)δτ. So, putting everything together, we get

d i ∂SHam X ∂σ ∂φ (q, τ) = −H(q, p(q, τ)) − (φ(q, τ)) (q, τ) ∂τ ∂qi ∂τ i=1 and so ∂S (q, τ) = −H(q, p(q, τ)). ∂τ Hence S solves the Hamilton–Jacobi equation.

Remark 6.4.3. In Remark 6.2.1 we have seen that the Hamilton–Jacobi equation is related to the asymptotics of the Schrödinger equation. With the results of this Section, we now also see that ψ(q, t) := e^{(i/ℏ) S(q,t)} solves the Schrödinger equation up to O(ℏ). This shows the role of the exponential of the action functional in quantum mechanics (its full-fledged role appears in Feynman’s path integral).

6.5 Generating functions

Let ω = Σ_{i=1}^d dp_i ∧ dq^i be the symplectic form on W ⊂ R^{2d} with coordinates (q, p) and Ω = Σ_{i=1}^d dP_i ∧ dQ^i be the symplectic form on Z ⊂ R^{2d} with coordinates (Q, P). Recall that a symplectomorphism (in this context a.k.a. canonical transformation) is a diffeomorphism φ: W → Z such that φ*Ω = ω. Also recall that the orbits of the Hamiltonian system with Hamiltonian function H on W are bijectively mapped by the symplectomorphism φ to the orbits of the Hamiltonian system with Hamiltonian function H̃ := H ∘ φ⁻¹ on Z. The idea is to look for a symplectomorphism that makes H̃ very simple, so that its Hamilton equations can be solved explicitly. We actually look for a symplectomorphism that makes H̃ depend only on the P variables: H̃(Q, P) = K(P) for some function K. In this case, the Hamilton equations are just Ṗ = 0 and Q̇^i = (∂K/∂P_i)(P), ∀i. The solution of the Cauchy problem with initial condition, say at time t = 0, given by (Q_0, P_0) is then P(t) = P_0 and Q^i(t) = Q_0^i + (∂K/∂P_i)(P_0) t, ∀i. Notice that ω = dα with α = Σ_{i=1}^d p_i dq^i and that Ω = dβ with β = Σ_{i=1}^d P_i dQ^i. Notice that a diffeomorphism φ: W → Z such that α − φ*β is the differential of a function F is in particular a symplectomorphism. We will only consider symplectomorphisms of this form.³ More explicitly, we denote by Q^i(q, p) and P_i(q, p) the components of φ. So we have

Σ_{i=1}^d p_i dq^i − Σ_{i=1}^d P_i(q, p) dQ^i(q, p) = dF(q, p).   (6.2)

Now we assume that the graph of φ in Z × W may be parametrized by the (q, P) variables (instead of the (q, p) variables). Namely, we want to solve the equations P_i = P_i(q, p) with respect to the p variables, getting them as smooth functions of the Ps and the qs. By the implicit function theorem, this is possible if the following condition is satisfied:

³ In general, φ is a symplectomorphism if and only if α − φ*β is closed. If W and Z are contractible, in particular star shaped, then every closed 1-form is automatically exact.

Assumption 1. We assume that the matrix (∂P_i/∂p^j)_{i,j=1,...,d} is nondegenerate for all (q, p) ∈ W.

Under this assumption, we then get functions pe(q, P ) and define Qe(q, P ) := Q(q, pe(q, P )). Equation (6.2) now becomes

d d X i X i pei(q, P )dq − PidQe (q, P ) = dFe(q, P ) i=1 i=1 with Fe(q, P ) = F (q, pe(q, P )). Setting

S(q, P) := F̃(q, P) + Σ_{i=1}^d P_i Q̃^i(q, P),

we finally get

Σ_{i=1}^d p_i(q, P) dq^i + Σ_{i=1}^d Q^i(q, P) dP_i = dS(q, P),

where we have removed the tildes for simplicity of notation. Notice that this equation is equivalent to the system

p_i = ∂S/∂q^i,   (6.3)
Q^i = ∂S/∂P_i,   (6.4)

for i = 1, ..., d. Notice that Assumption 1 is satisfied if the following holds:

Assumption 2. The matrix (∂²S/∂q^j ∂P_i)_{i,j=1,...,d} is nondegenerate for all (q, P).

As the map φ can then be reconstructed from these equations, an S satisfying this condition is called a generating function for φ. Next we want H̃(Q, P) = K(P). Since H̃(Q(q, P), P) = H(q, p(q, P)), we get by (6.3) that

H(q, ∂S/∂q) = K(P).

Hence S, as a function of q parametrized by P, solves the reduced Hamilton–Jacobi equation at energy K(P). A solution satisfying Assumption 2 is called a complete integral. What we have shown is Jacobi’s theorem: a complete integral for the Hamilton–Jacobi equation for H allows one to solve its Hamilton equations. Notice that the P variables are constants of motion for the H̃ system. Also notice that their differentials are clearly linearly independent and that their pairwise Poisson brackets vanish. Regarded as functions of the (q, p) variables, they are then independent constants of motion for the H system in involution. A d-dimensional Hamiltonian system with d independent constants of motion in involution is called integrable. We then see that the above method can only work for integrable systems.
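A minimal worked example (not from the lecture, but a standard illustration): for the free particle H(q, p) = ||p||²/(2m), the function

S(q, P) = Σ_{i=1}^d q^i P_i

satisfies Assumption 2 (its mixed Hessian is the identity) and solves H(q, ∂S/∂q) = ||P||²/(2m) =: K(P). Equations (6.3)–(6.4) give p_i = P_i and Q^i = q^i, so in the new variables Ṗ = 0 and Q̇^i = P_i/m, i.e. Q^i(t) = Q_0^i + (P_0)_i t/m; transforming back, q^i(t) = q_0^i + p_0^i t/m, which is indeed the free motion.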

Chapter 7

Introduction to Differentiable Manifolds

7.1 Introduction

Differentiable manifolds are sets that locally look like some R^n so that we can do calculus on them. Examples of manifolds are open subsets of R^n or subsets defined by constraints satisfying the assumptions of the implicit function theorem (example: the n-sphere S^n). Also in the latter case, it is however more practical to think of manifolds intrinsically in terms of charts. The example to bear in mind is the collection of charts of Earth in an atlas, with the indications on how to pass from one chart to another.

7.2 Manifolds

Definition 7.2.1. A chart on a set M is a pair (U, φ) where U is a subset of M and φ is an injective map from U to R^n for some n. The map φ is called a chart map or a coordinate map. One often refers to φ itself as a chart, since the subset U is part of the data of φ as its domain of definition.

If (U, φ_U) and (V, φ_V) are charts on M, we may compose the bijections (φ_U)|_{U∩V}: U ∩ V → φ_U(U ∩ V) and (φ_V)|_{U∩V}: U ∩ V → φ_V(U ∩ V) and get the bijection

φ_{U,V} := (φ_V)|_{U∩V} ∘ ((φ_U)|_{U∩V})⁻¹ : φ_U(U ∩ V) → φ_V(U ∩ V),

called the transition map from (U, φ_U) to (V, φ_V) (or simply from U to V).

Definition 7.2.2. An atlas on a set M is a collection of charts (Uα, φα)α∈I , where I is an index set, such that ∪α∈I Uα = M.

Remark 7.2.3. We usually denote the transition maps between charts in an atlas (Uα, φα)α∈I simply by φαβ (instead of φUα,Uβ ).

One can easily check that, if φα(Uα) is open ∀α ∈ I (in the standard topology of the target), 1 then the atlas A = {(Uα, φα)}α∈I defines a topology on M:

OA(M) := {V ⊂ M | φα(V ∩ Uα) is open ∀α ∈ I}.

We may additionally require that all Uα be open in this topology or, equivalently, that φα(Uα ∩ Uβ) is open ∀α, β ∈ I. In this case we speak of an open atlas. All transition maps in an open atlas have open domain and codomain, so we can require them to belong to a class C ⊃ C0 of maps (e.g., Ck for k = 0, 1,..., ∞, or analytic, or complex analytic, or Lipschitz).

¹For more on topology, see Appendix A.1.


Definition 7.2.4. A C-atlas is an open atlas such that all transition maps are C-maps.

Example 7.2.5. Let M = R^n. Then A = {(R^n, φ)} is a C-atlas for any structure C if φ is an injective map with open image. Notice that M has the standard topology iff φ is a homeomorphism with its image. If φ is the identity map Id, this is called the standard atlas for R^n.

Example 7.2.6. Let M be an open subset of R^n with its standard topology. Then A = {(M, ι)}, with ι the inclusion map, is a C-atlas for any structure C.

Example 7.2.7. Let M = R^n. Let A = {(R^n, Id), (R^n, φ)}. Then A is a C-atlas iff φ and its inverse are C-maps.

Example 7.2.8. Let M be the set of lines (i.e., one-dimensional affine subspaces) of R². Let U_1 be the subset of nonvertical lines and U_2 the subset of nonhorizontal lines. Notice that every line in U_1 can be uniquely parametrized as y = m_1 x + q_1 and every line in U_2 can be uniquely parametrized as x = m_2 y + q_2. Define φ_i: U_i → R² as the map that assigns to a line the corresponding pair (m_i, q_i), for i = 1, 2. Then A = {(U_1, φ_1), (U_2, φ_2)} is a C^k-atlas for k = 0, 1, 2, ..., ∞.
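To make the last example concrete, here is the transition map; this short computation is not in the lecture notes but follows directly from the two parametrizations. A line in U_1 ∩ U_2 is neither vertical nor horizontal, so m_1 ≠ 0, and rewriting y = m_1 x + q_1 as x = (1/m_1) y − q_1/m_1 gives

φ_{1,2}(m_1, q_1) = (m_2, q_2) = (1/m_1, −q_1/m_1),

defined on {(m_1, q_1) ∈ R² | m_1 ≠ 0}, clearly C^∞ there and with a C^∞ inverse of the same form, which is why A is a C^k-atlas for every k.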

Example 7.2.9. Let M = S^n := {x ∈ R^{n+1} | ∑_{i=1}^{n+1} (x^i)² = 1} be the n-sphere. Let N = (0, ..., 0, 1) and S = (0, ..., 0, −1) denote its north and south poles, respectively. Let U_N := S^n \ {N} and U_S := S^n \ {S}. Let φ_N: U_N → R^n and φ_S: U_S → R^n be the stereographic projections with respect to N and S, respectively: φ_N maps a point y in S^n to the intersection of the plane {x^{n+1} = 0} with the line passing through N and y; similarly for φ_S. Then A = {(U_N, φ_N), (U_S, φ_S)} is a C^k-atlas for k = 0, 1, 2, ..., ∞.
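In explicit formulas (a standard computation, added here for convenience rather than taken from the lecture), the stereographic projections and their transition map read

φ_N(y) = (y^1, ..., y^n)/(1 − y^{n+1}),    φ_S(y) = (y^1, ..., y^n)/(1 + y^{n+1}),

and, for x ∈ φ_N(U_N ∩ U_S) = R^n \ {0},

φ_{N,S}(x) = (φ_S ∘ φ_N^{-1})(x) = x/|x|²,

which is C^∞ with C^∞ inverse on R^n \ {0}; this is what makes A a C^k-atlas for every k.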

Example 7.2.10. Let M be a subset of R^n defined by C^k-constraints satisfying the assumptions of the implicit function theorem. Then locally M can be regarded as the graph of a C^k-map. Any open covering of M with this property yields a C^k-atlas.

The same set often arises with different atlases, as in the last example, which we wish to consider equivalent.

Definition 7.2.11. Two C-atlases on the same set are C-equivalent if their union is also a C-atlas.

Notice that the union of two atlases has in general more transition maps and in checking equivalence one has to check that also the new transition maps are C-maps.

Example 7.2.12. Let M = R^n, A_1 = {(R^n, Id)} and A_2 = {(R^n, φ)} for an injective map φ with open image. These two atlases are C-equivalent iff φ and its inverse are C-maps.

We finally arrive at the

Definition 7.2.13. A C-manifold is an equivalence class of C-atlases.

Remark 7.2.14. Usually, in defining a C-manifold, we explicitly introduce one atlas and tacitly consider the corresponding C-manifold as the equivalence class containing this atlas. Also notice that the union of all atlases in a given class is also an atlas, called the maximal atlas, in the same equivalence class. Thus, we may equivalently define a manifold as a set with a maximal atlas. This is not very practical, as the maximal atlas is huge. Working with an equivalence class of atlases instead of a single one also has the advantage that any definition we want to give only requires choosing a particular atlas in the class, and we may choose the most convenient one.

Example 7.2.15. The standard C-manifold structure on R^n is the C-equivalence class of the atlas {(R^n, Id)}.

Remark 7.2.16. Notice that the same set can be given different manifold structures. For example, let M = R^n. On it we have the standard C-structure of the previous example. For any injective map φ with open image we also have the C-structure given by the equivalence class of the C-atlas {(R^n, φ)}. The two structures define the same C-manifold iff φ and its inverse are C-maps. Notice that if φ is not a homeomorphism, the two manifolds are different also as topological spaces. Suppose that φ is a homeomorphism but not a C^k-diffeomorphism; then the two structures define the same topological space and the same C^0-manifold, but not the same C^k-manifold.

Recall that the existence of a C^k-diffeomorphism between an open subset of R^m and an open subset of R^n implies m = n, since the differential at any point is a linear isomorphism of R^m and R^n as vector spaces (the result is also true for homeomorphisms, though the proof is more difficult). So we have the

Definition 7.2.17. A connected manifold has dimension n if for any (and hence for all) of its charts the target of the chart map is R^n. In general, we say that a manifold has dimension n if all its connected components have dimension n. We write dim M = n.

7.3 Maps

Let F: M → N be a map of sets. Let (U, φ_U) be a chart on M and (V, ψ_V) be a chart on N. The map

F_{U,V} := ψ_V ∘ F|_U ∘ φ_U^{-1} : φ_U(U) → ψ_V(V)

is called the representation of F in the charts (U, φ_U) and (V, ψ_V).

Definition 7.3.1. A map F: M → N between C-manifolds is called a C-map or C-morphism if all its representations are C-maps.

Remark 7.3.2. Notice that it is enough to choose one atlas in the equivalence class of the source and one atlas in the equivalence class of the target and to check that all representations are C-maps for charts of these two atlases. The condition will then automatically hold for any other atlases in the same class.

Definition 7.3.3. A C-map from a C-manifold M to R with its standard manifold structure is called a C-function. We denote by C(M) the set of C-functions on M.

Remark 7.3.4. Notice that a C^k-map between open subsets of Cartesian powers of R is also automatically C^l ∀l ≤ k, so a C^k-manifold can be regarded also as a C^l-manifold ∀l ≤ k. As a consequence, ∀l ≤ k, we have the notion of C^l-maps between C^k-manifolds and of C^l-functions on a C^k-manifold.

Definition 7.3.5. An invertible C-map between C-manifolds whose inverse is also a C-map is called a C-isomorphism. A C^k-isomorphism, k ≥ 1, is usually called a C^k-diffeomorphism (or just a diffeomorphism).

Example 7.3.6. Let M and N be open subsets of Cartesian powers of R with the standard C-manifold structure. Then a map is a C-map of C-manifolds iff it is a C-map in the standard sense.

Example 7.3.7. Let M be a C-manifold and U an open subset thereof. We consider U as a C-manifold by restricting any atlas from M to U. Then the inclusion map ι: U → M is a C-map.

Example 7.3.8. Let M be R^n with the equivalence class of the atlas {(R^n, φ)}, where φ is an injective map with open image. Let N be R^n with its standard structure. Then φ: M → N is a C-map for any C (since its representation is the identity map on an open subset of R^n). If in addition φ is also surjective, then φ: M → N is a C-isomorphism.

Remark 7.3.9. Let M and N be as in the previous example with φ a bijection. Assume that φ: R^n → R^n is a homeomorphism but not a C^k-diffeomorphism. Then the given atlases are C^0-equivalent but not C^k-equivalent. As a consequence, M and N are the same C^0-manifold but different C^k-manifolds. On the other hand, φ: M → N is always a C^k-diffeomorphism of C^k-manifolds. More difficult is to find examples of two C^k-manifolds that are the same C^0-manifold (or C^0-isomorphic to each other), but are different, non-C^k-diffeomorphic C^k-manifolds. Milnor constructed a C^∞-manifold structure on the 7-sphere that is not diffeomorphic to the standard 7-sphere. From the work of Donaldson and Freedman one can derive uncountably many different C^∞-manifold structures on R^4 (called the exotic R^4s) that are not diffeomorphic to each other nor to the standard R^4. In dimension 3 and less, one can show that any two C^0-isomorphic manifolds are also diffeomorphic.

7.3.1 Submanifolds

A submanifold is a subset of a manifold that is locally given by fixing some coordinates. More precisely:

Definition 7.3.10. Let N be an n-dimensional C-manifold. A k-dimensional C-submanifold, k ≤ n, is a subset M of N such that there is a C-atlas {(U_α, φ_α)}_{α∈I} of N with the property that ∀α such that U_α ∩ M ≠ ∅ we have φ_α(U_α) = V_{1,α} × V_{2,α} with V_{1,α} open in R^k and V_{2,α} open in R^{n−k}, and φ_α(U_α ∩ M) = V_{1,α} × {x} for some x in V_{2,α}.

One can prove that {(U_α, φ_α)}_{α∈I: U_α∩M ≠ ∅} is a C-atlas for M. Moreover, the inclusion map M → N becomes a C-map.

Example 7.3.11. Any open subset M of a manifold N is a submanifold.

Example 7.3.12. One may check that a subset of the standard R^n defined in terms of C^k-constraints satisfying the assumptions of the implicit function theorem is a C^k-submanifold. There is a more general version of this, the implicit function theorem for manifolds, which we do not present here.
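As a concrete illustration (a standard example, not spelled out in the lecture), take the circle S^1 = {(x, y) ∈ R² | x² + y² = 1}. The constraint Φ(x, y) = x² + y² − 1 has dΦ = (2x, 2y) ≠ 0 on S^1, so near a point with y ≠ 0 the implicit function theorem lets us write S^1 as the graph y = ±√(1 − x²), and near a point with x ≠ 0 as x = ±√(1 − y²); charts of R² adapted to these graphs exhibit S^1 as a 1-dimensional C^∞-submanifold of R².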

7.4 Topological manifolds

In this Section we concentrate on C0-manifolds. Notice however that every C-manifold is by definition also a C0-manifold. As we have seen, an atlas whose chart maps have open images defines a topology. In this topology the chart maps are clearly open maps. We also have the

Lemma 7.4.1. All the chart maps of a C0-atlas are continuous, so they are homeomorphisms with their images.

Proof. Consider a chart (U_α, φ_α), φ_α: U_α → R^n. Let V be an open subset of R^n and W := φ_α^{-1}(V). For any chart (U_β, φ_β) we have φ_β(W ∩ U_β) = φ_αβ(V). In a C^0-atlas, all transition maps are homeomorphisms, so φ_αβ(V) is open for all β, which shows that W is open. We have thus proved that φ_α is continuous. Since we already know that it is injective and open, we conclude that it is a homeomorphism with its image.

Different atlases in general define different topologies. However,

Lemma 7.4.2. Two C0-equivalent C0-atlases define the same topology.

Proof. Let A_1 = {(U_α, φ_α)}_{α∈I} and A_2 = {(U_j, φ_j)}_{j∈J} be C^0-equivalent. Let W be open in the A_1-topology. We have φ_j(W ∩ U_α) = φ_αj(φ_α(W ∩ U_α)). Since the atlases are equivalent, φ_αj is a homeomorphism and, since W is A_1-open, φ_α(W ∩ U_α) is open. Hence φ_j(W ∩ U_α) is open. Since this holds for all j ∈ J, we get that W ∩ U_α is open in the A_2-topology. Finally, we write W = ∪_{α∈I} (W ∩ U_α), i.e., as a union of A_2-open sets. This shows that W is open in the A_2-topology.

As a consequence a C^0-manifold has a canonically associated topology in which all chart maps are homeomorphisms. This suggests the following

Definition 7.4.3. A topological manifold is a topological space endowed with an atlas {(Uα, φα)}α∈I in which all Uα are open and all φα are homeomorphisms with their images.

Theorem 7.4.4. A topological manifold is the same as a C0-manifold.

Proof. We have seen above that a C^0-manifold structure defines a topology in which every atlas in the equivalence class has the properties in the definition of a topological manifold; so a C^0-manifold is a topological manifold. On the other hand, the atlas of a topological manifold is open and all transition maps are homeomorphisms, since they are now compositions of homeomorphisms. The C^0-equivalence class of this atlas then defines a C^0-manifold.

Also notice the following

Lemma 7.4.5. Let M and N be C0-manifolds and so, consequently, topological manifolds. A map F : M → N is a C0-map iff it is continuous. In particular, a C0-isomorphism is the same as a homeomorphism.

Proof. Suppose that F is a C^0-map. Let {(U_α, φ_α)}_{α∈I} be an atlas on M and {(V_β, ψ_β)}_{β∈J} be an atlas on N. For every W ⊂ N, ∀α ∈ I and ∀β ∈ J, we have φ_α(F^{-1}(W ∩ V_β) ∩ U_α) = F_{α,β}^{-1}(ψ_β(W ∩ V_β)). If W is open, then ψ_β(W ∩ V_β) is open for all β. Since all F_{α,β} are continuous, we conclude that φ_α(F^{-1}(W ∩ V_β) ∩ U_α) is open for all α and all β. Hence, F^{-1}(W ∩ V_β) is open for all β, so F^{-1}(W) = ∪_{β∈J} F^{-1}(W ∩ V_β) is open. This shows that F is continuous. On the other hand, if F is continuous, then all its representations are also continuous, since all chart maps are homeomorphisms. Thus, F is a C^0-map.

Remark 7.4.6. In the following we will no longer distinguish between C^0-manifolds and topological manifolds.² Both descriptions are useful. Sometimes we are given a set with charts (like in the example of the manifold of lines in the plane). In other cases, we are given a topological space directly (like in all examples when our manifold arises as a subset of another manifold, e.g., R^n). In the definition of a topological manifold, several textbooks assume the topology to be Hausdorff and second countable. These properties have important consequences (like the existence of a partition of unity, which is fundamental in several contexts, e.g., in showing the existence of Riemannian metrics, or in proving Stokes' theorem), but are not strictly necessary, so we will not assume them here. Notice that the Hausdorff property, stating that distinct points always have disjoint neighborhoods, is usually only assumed to avoid the “pathology” of limits not being uniquely defined. There are however several cases when one needs non-Hausdorff manifolds, so this assumption is often no longer required in modern textbooks.

²What we have proved above is that the category of C^0-manifolds and the category of topological manifolds are isomorphic, if you know what categories are.

Example 7.4.7. Let M := R ∪ {∗} where {∗} is a one-element set (and ∗ ∉ R). Let U_1 = R, φ_1 = Id, and U_2 = (R \ {0}) ∪ {∗} with φ_2: U_2 → R defined by φ_2(x) = x if x ∈ R \ {0} and φ_2(∗) = 0. One can easily see that this is a C^0-atlas (actually a C^∞-atlas, for the transition functions are just identity maps). On the other hand, the induced topology is not Hausdorff, for 0 and ∗ do not have disjoint neighborhoods.

7.5 Differentiable manifolds

A Ck-manifold with k ≥ 1 is also called a differentiable manifold. If k = ∞, one also speaks of a smooth manifold. The Ck-morphisms are also called differentiable maps, and also smooth maps in case k = ∞. Recall the following

Definition 7.5.1. Let F : U → V be a differentiable map between open subsets of Cartesian powers of R. The map F is called an immersion if dxF is injective ∀x ∈ U and a submersion if dxF is surjective ∀x ∈ U.

Then we have the

Definition 7.5.2. A differentiable map between differentiable manifolds is called an immersion if all its representations are immersions and a submersion if all its representations are submersions. An injective immersion is also called an embedding.

Observe that to check if a map is an immersion or a submersion one just has to consider all representations for a given choice of atlases. One can prove that the image of an embedding is a submanifold (and this is one very common way in which submanifolds arise in examples).

7.5.1 The tangent space

Recall that to an open subset of R^n we associate another copy of R^n, called its tangent space. Elements of this space, the tangent vectors, have the geometric interpretation of velocities of curves passing through a point or of directions along which we can differentiate functions. We will use all these viewpoints to give different characterizations of tangent vectors to a manifold, even though we relegate the last one, directional derivatives, to Appendix A.2, as it can be safely ignored for the rest of these notes. In the following M is an n-dimensional C^k-manifold, k ≥ 1.

Definition 7.5.3. A coordinatized tangent vector at q ∈ M is a triple (U, φ_U, v) where (U, φ_U) is a chart with U ∋ q and v is an element of R^n. Two coordinatized tangent vectors (U, φ_U, v) and (V, φ_V, w) at q are defined to be equivalent if w = d_{φ_U(q)}φ_{U,V} v. A tangent vector at q ∈ M is an equivalence class of coordinatized tangent vectors at q. We denote by T_qM, the tangent space of M at q, the set of tangent vectors at q.

A chart (U, φU ) at q defines a bijection of sets

Φ_{q,U}: T_qM → R^n,  [(U, φ_U, v)] ↦ v.    (7.1)

We will also simply write Φ_U when the point q is understood. Using this bijection, we can transfer the vector space structure from R^n to T_qM, making Φ_U into a linear isomorphism. A crucial result is that this linear structure does not depend on the choice of the chart:

Lemma 7.5.4. T_qM has a canonical structure of vector space for which Φ_{q,U} is an isomorphism for every chart (U, φ_U) containing q.

Proof. Given a chart (U, φ_U), the bijection Φ_U defines the linear structure

λ ·_U [(U, φ_U, v)] = [(U, φ_U, λv)],
[(U, φ_U, v)] +_U [(U, φ_U, v')] = [(U, φ_U, v + v')],

∀λ ∈ R and ∀v, v' ∈ R^n. If (V, φ_V) is another chart, we have

λ ·_U [(U, φ_U, v)] = [(U, φ_U, λv)] = [(V, φ_V, d_{φ_U(q)}φ_{U,V} λv)] = [(V, φ_V, λ d_{φ_U(q)}φ_{U,V} v)]
= λ ·_V [(V, φ_V, d_{φ_U(q)}φ_{U,V} v)] = λ ·_V [(U, φ_U, v)],

so ·_U = ·_V. Similarly,

[(U, φ_U, v)] +_U [(U, φ_U, v')] = [(U, φ_U, v + v')] = [(V, φ_V, d_{φ_U(q)}φ_{U,V}(v + v'))]
= [(V, φ_V, d_{φ_U(q)}φ_{U,V} v + d_{φ_U(q)}φ_{U,V} v')] = [(V, φ_V, d_{φ_U(q)}φ_{U,V} v)] +_V [(V, φ_V, d_{φ_U(q)}φ_{U,V} v')]
= [(U, φ_U, v)] +_V [(U, φ_U, v')],

so +_U = +_V.

From now on we will simply write λ[(U, φ_U, v)] and [(U, φ_U, v)] + [(U, φ_U, v')] without the U label. Notice that in particular we have

dim T_qM = dim M,

where dim denotes on the left-hand side the dimension of a vector space and on the right-hand side the dimension of a manifold. Let now F: M → N be a differentiable map. Given a chart (U, φ_U) of M containing q and a chart (V, ψ_V) of N containing F(q), we have the linear map

d_q^{U,V}F := Φ_V^{-1} ∘ d_{φ_U(q)}F_{U,V} ∘ Φ_U : T_qM → T_{F(q)}N.

Lemma 7.5.5. The linear map d_q^{U,V}F does not depend on the choice of charts, so we have a canonically defined linear map

d_qF: T_qM → T_{F(q)}N.

Proof. Let (U', φ_{U'}) be also a chart containing q and (V', ψ_{V'}) be also a chart containing F(q). Then

d_q^{U,V}F[(U, φ_U, v)] = [(V, ψ_V, d_{φ_U(q)}F_{U,V} v)]
= [(V', ψ_{V'}, d_{ψ_V(F(q))}ψ_{V,V'} d_{φ_U(q)}F_{U,V} v)]
= [(V', ψ_{V'}, d_{φ_{U'}(q)}F_{U',V'} d_{φ_U(q)}φ_{U,U'} v)]
= d_q^{U',V'}F[(U', φ_{U'}, d_{φ_U(q)}φ_{U,U'} v)] = d_q^{U',V'}F[(U, φ_U, v)],

so d_q^{U,V}F = d_q^{U',V'}F.

We also immediately have the following

Lemma 7.5.6. Let F: M → N and G: N → Z be differentiable maps. Then d_q(G ∘ F) = d_{F(q)}G ∘ d_qF for all q ∈ M.

Remark 7.5.7. Suppose M is a submanifold of R^n defined by l constraints satisfying the conditions of the implicit function theorem. We may reorganize the constraints as a map Φ: R^n → R^l. The conditions are that d_qΦ for q ∈ M is surjective and that d_qΦ ∘ d_qι = 0, where ι is the inclusion map of M into R^n. As a consequence, T_qM = ker d_qΦ ∀q ∈ M. This is an explicit way of computing the tangent space.

Remark 7.5.8. Notice that we can now characterize immersions and submersions, introduced in Definition 7.5.2, as follows: a differentiable map F: M → N is an immersion iff d_qF is injective ∀q ∈ M and is a submersion iff d_qF is surjective ∀q ∈ M.

A differentiable curve in M is a differentiable map γ: I → M, where I is an open subset of R with its standard manifold structure. For t ∈ I, we define the velocity of γ at t as

γ̇(t) := d_tγ 1 ∈ T_{γ(t)}M,

where 1 is the vector 1 in R. Notice that for M an open subset of R^n this coincides with the usual definition of velocity. For q ∈ M, define P_q as the space of differentiable curves γ: I → M such that I ∋ 0 and γ(0) = q. It is easy to verify that the map P_q → T_qM, γ ↦ γ̇(0), is surjective, so we can think of T_qM as the space of all possible velocities at q. This observation together with Remark 7.5.7 yields a practical way of computing the tangent spaces of a submanifold of R^n.
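For instance (a standard computation spelling out Remark 7.5.7, not taken verbatim from the lecture), for the sphere M = S^n ⊂ R^{n+1} we may take Φ(x) = ∑_{i=1}^{n+1} (x^i)² − 1, so that d_xΦ(v) = 2⟨x, v⟩, which is surjective (onto R) for x ≠ 0. Hence

T_qS^n = ker d_qΦ = {v ∈ R^{n+1} | ⟨q, v⟩ = 0},

the hyperplane orthogonal to q. Equivalently, differentiating a curve γ on S^n with γ(0) = q, the relation |γ(t)|² = 1 gives 2⟨γ(0), γ̇(0)⟩ = 0, so every velocity at q is orthogonal to q.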

7.5.2 The tangent bundle

We can glue all the tangent spaces of an n-dimensional C^k-manifold M, k ≥ 1, together:

TM := ∪q∈M TqM

An element of TM is usually denoted as a pair (q, v) with q ∈ M and v ∈ T_qM.³ We introduce the surjective map π: TM → M, (q, v) ↦ q. Notice that the fiber T_qM can also be obtained as π^{-1}(q). TM has the following structure of C^{k−1}-manifold. Let {(U_α, φ_α)}_{α∈I} be an atlas in the equivalence class defining M. We set Û_α := π^{-1}(U_α) and

φ̂_α: Û_α → R^n × R^n,
(q, v) ↦ (φ_α(q), Φ_{q,U_α} v),

where Φq,Uα is the isomorphism defined in (7.1). Notice that the chart maps are linear in the fibers. The transition maps are then readily computed as

φ̂_αβ(x, w) = (φ_αβ(x), d_xφ_αβ w).

Namely, they are the tangent lifts of the transition maps for M and are clearly Ck−1.

Definition 7.5.9. The tangent bundle of the Ck-manifold M, k ≥ 1, is the Ck−1-manifold defined by the equivalence class of the above atlas.

Remark 7.5.10. Observe that another atlas on M in the same C^k-equivalence class yields an atlas on TM that is C^{k−1}-equivalent to the previous one.

Remark 7.5.11. Notice that π: TM → M is a surjective C^{k−1}-map and, if k > 1, a submersion.

³Notice that we now denote by v a tangent vector at q, i.e., an equivalence class of coordinatized tangent vectors at q, and no longer an element of R^n.

Definition 7.5.12. If M and N are C^k-manifolds and F: M → N is a C^k-map, then the tangent lift F̂: TM → TN is the C^{k−1}-map (q, v) ↦ (F(q), d_qF v).

Definition 7.5.13. A vector field on M is a C^{k−1}-map X: M → TM such that π ∘ X = Id_M.

In an atlas {(U_α, φ_α)}_{α∈I} of M and the corresponding atlas {(Û_α, φ̂_α)}_{α∈I}, a vector field X is represented by a collection of C^{k−1}-maps X_α: φ_α(U_α) → R^n. All these maps are related by

X_β(φ_αβ(x)) = d_xφ_αβ X_α(x)

for all α, β ∈ I and for all x ∈ φ_α(U_α ∩ U_β). Notice that a collection of maps X_α satisfying all these relations defines a vector field, and this is how vector fields are often introduced. To a vector field X we associate the ODE

q̇ = X(q).

A solution is a path q: I → M such that q̇(t) = X(q(t)) ∈ T_{q(t)}M for all t ∈ I. Assume k > 1, so the vector field is continuously differentiable. Notice that the existence and uniqueness theorem as well as the theorem on dependence on the initial values extend immediately to the case of C^k-manifolds, as it is enough to have them in charts. (A solution that passes from one chart to another can be regarded as a pair of solutions, the end point of the first serving as the initial condition of the second.) To a vector field X we may then associate its flow Φ_t. We have that Φ_0 = Id_M and Φ_{t+s} = Φ_t ∘ Φ_s, which also implies that all Φ_t's are diffeomorphisms. We recover X(q) as d_0Φ(q) 1, regarding Φ(q) as the map t ↦ Φ_t(q) from I to M.
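As a simple illustration (not from the lecture), take M = R^n and the linear vector field X(q) = Aq for a fixed matrix A ∈ End(R^n). The ODE q̇ = Aq has the flow

Φ_t(q) = e^{tA} q,

and the flow properties Φ_0 = Id and Φ_{t+s} = Φ_t ∘ Φ_s are just the statements e^{0A} = Id and e^{(t+s)A} = e^{tA} e^{sA}; each Φ_t is a (linear) diffeomorphism, and differentiating at t = 0 recovers X(q) = Aq.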

7.6 Vector bundles

The tangent bundle introduced in the previous Section is an example of a more general structure known as a vector bundle.

Definition 7.6.1. A C^k-vector bundle of rank r over a C^k-manifold M of dimension n is a C^k-manifold E together with a surjection π: E → M such that:

1. E_q := π^{-1}(q) is a vector space for all q ∈ M.

2. E possesses a C^k-atlas of the form {(Ũ_α, φ̃_α)}_{α∈I} with Ũ_α = π^{-1}(U_α) for a C^k-atlas {(U_α, φ_α)}_{α∈I} of M and

   φ̃_α: Ũ_α → R^n × R^r,
   (q, v ∈ E_q) ↦ (φ_α(q), A_α(q)v),

   where A_α(q) is a linear isomorphism for all q ∈ U_α.

3. The maps A_αβ(q) := A_β(q)A_α(q)^{-1}: R^r → R^r are C^k in q (i.e., A_αβ: U_α ∩ U_β → End(R^r) is a C^k-map, where we identify End(R^r) with R^{r²} with its standard manifold structure) for all α, β ∈ I.

Notice that π is a C^k-map with respect to this manifold structure and that for k > 0 it is a submersion. Also notice that the atlas in the definition has transition functions

φ̃_αβ(x, u) = (φ_αβ(x), A_αβ(x)u),

which are linear in the second factor R^r.

Definition 7.6.2. A section of a C^k-vector bundle π: E → M is a C^k-map σ: M → E with π ∘ σ = Id_M.

Example 7.6.3. It is readily verified that the tangent bundle TM of a C^k-manifold M with k ≥ 1 is a C^{k−1}-vector bundle, where we regard the base manifold M as a C^{k−1}-manifold. A section of TM is then the same as a vector field on M.

If one picks an atlas as in Definition 7.6.1, then a section of E is the same as a collection of C^k-maps σ_α: φ_α(U_α) → R^r,⁴ such that

σ_β(φ_αβ(x)) = A_αβ(x) σ_α(x)    (7.2)

for all α, β ∈ I and for all x ∈ φ_α(U_α ∩ U_β).
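A minimal illustration (not from the lecture): for the trivial bundle E = M × R^r with π(q, u) = q, one may take A_α(q) = Id for all α and q, so that A_αβ(x) = Id and (7.2) reduces to σ_β ∘ φ_αβ = σ_α. The local sections σ_α are then just the chart representations of a single C^k-map M → R^r; in other words, a section of the trivial bundle of rank r is the same as an r-tuple of C^k-functions on M.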

7.6.1 Constructions on vector bundles

Another important consideration is that all constructions in linear algebra extend from vector spaces to vector bundles. We only consider two examples. We fix E and M as in Definition 7.6.1.

Example 7.6.4 (The dual bundle). Let E^* := ∪_{q∈M} E_q^*. We denote an element of E^* as a pair (q, ω) with ω ∈ E_q^*. We let π_{E^*}(q, ω) = q. To a chart {(Ũ_α, φ̃_α)}_{α∈I} of E we associate the chart {(Û_α, φ̂_α)}_{α∈I} of E^* with Û_α = π_{E^*}^{-1}(U_α) = ∪_{q∈U_α} E_q^* and

φ̂_α: Û_α → R^n × (R^r)^*,
(q, ω ∈ E_q^*) ↦ (φ_α(q), (A_α(q)^*)^{-1} ω),

where we regard (R^r)^* as the manifold R^r with its standard structure. It follows that we have transition maps

φ̂_αβ(x, u) = (φ_αβ(x), (A_αβ(x)^*)^{-1} u).

In the case E = TM, the dual bundle is denoted by T^*M and is called the cotangent bundle of M.

Example 7.6.5 (Exterior power). Let ∧^m E := ∪_{q∈M} ∧^m E_q. We denote an element of ∧^m E as a pair (q, ω) with ω ∈ ∧^m E_q. We let π_{∧^m E}(q, ω) = q. To a chart {(Ũ_α, φ̃_α)}_{α∈I} of E we associate the chart {(Û_α, φ̂_α)}_{α∈I} of ∧^m E with Û_α = π_{∧^m E}^{-1}(U_α) = ∪_{q∈U_α} ∧^m E_q and

φ̂_α: Û_α → R^n × ∧^m R^r,
(q, ω ∈ ∧^m E_q) ↦ (φ_α(q), ∧^m A_α(q) ω),

where we regard ∧^m R^r as the manifold R^{\binom{r}{m}} with its standard structure. It follows that we have transition maps

φ̂_αβ(x, u) = (φ_αβ(x), ∧^m A_αβ(x) u).

One further construction is given in the following

Example 7.6.6 (Pullback bundle). Let F: N → M be a C^k-map and π: E → M a C^k-vector bundle. One defines F^*E := {(q, e) ∈ N × E | F(q) = π(e)}. One can readily see that F^*E is a C^k-vector bundle over N with projection map π_{F^*E}(q, e) = q. In practice, the fiber of F^*E at q is given by the fiber of E at F(q) and the fiber transition maps of F^*E at q are given by the fiber transition maps of E at F(q). More precisely, we pick an atlas {(V_j, ψ_j)}_{j∈J} of N. To the atlas in Definition 7.6.1, we then associate a new atlas {(V_{αj}, ψ_{αj})}_{(α,j)∈I×J} of N with V_{αj} := F^{-1}(U_α) ∩ V_j and ψ_{αj} := ψ_j|_{V_{αj}}. The atlas of F^*E is then given by V̂_{αj} = π_{F^*E}^{-1}(V_{αj}) = ∪_{q∈V_{αj}} E_{F(q)} and

ψ̂_{αj}: V̂_{αj} → R^s × R^r,
(q, v ∈ E_{F(q)}) ↦ (ψ_{αj}(q), A_α(F(q)) v),

where s is the dimension of N. It follows that we have transition maps

ψ̂_{(αj)(βj')}(x, u) = (ψ_{(αj)(βj')}(x), A_αβ(F(x)) u).

⁴These maps are also called local sections.
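Specializing the dual-bundle construction to E = TM (a spelled-out consequence of Example 7.6.4, with A_αβ(x) = d_xφ_αβ coming from the tangent bundle) gives the transition maps of the cotangent bundle T^*M,

φ̃_αβ(x, u) = (φ_αβ(x), ((d_xφ_αβ)^*)^{-1} u),

i.e., covectors transform by the inverse transpose of the Jacobian of the change of coordinates. This is exactly the transformation rule that reappears for the momenta in Sections 7.7.2 and 7.7.3 below.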

7.6.2 Differential forms

For simplicity we are now going to consider only smooth manifolds.

Definition 7.6.7. An m-form on a smooth manifold M is a section of ∧^m T^*M. We denote by Ω^m(M) the C^∞(M)-module of m-forms and set Ω^•(M) := ⊕_m Ω^m(M). An element of Ω^•(M) is called a differential form on M.

Notice that if σ is an m-form, then (7.2) now reads

(∧^m d_xφ_αβ)^* σ_β(φ_αβ(x)) = σ_α(x),

so we have the

Proposition 7.6.8. If {(U_α, φ_α)}_{α∈I} is an atlas for M, then an m-form σ on M is the same as a collection of m-forms σ_α, defined on φ_α(U_α), such that

σ_α = φ_αβ^* σ_β    (7.3)

for all α, β ∈ I.

Recall that the wedge product and the differential of differential forms on open subsets of R^n are compatible with pullbacks. As a consequence they can be extended to manifolds. Also notice that if φ is a diffeomorphism of open subsets of Cartesian powers of R, then we have that φ^*(ι_{φ_*X} σ) = ι_X φ^*σ for all vector fields X and differential forms σ, so the whole Cartan calculus extends to manifolds.⁵
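For completeness, here is the pullback compatibility in coordinates (a standard formula, added as a reminder rather than taken from the lecture). If φ: U → V is a smooth map of open subsets of Cartesian powers of R with coordinates x on U and y on V, and σ = ∑_i f_i dy^i is a 1-form on V, then

φ^*σ = ∑_i (f_i ∘ φ) dφ^i = ∑_{i,j} (f_i ∘ φ) (∂φ^i/∂x^j) dx^j,

and one checks directly that d(φ^*σ) = φ^*(dσ) and φ^*(σ ∧ τ) = φ^*σ ∧ φ^*τ; relation (7.3) is this pullback rule (and its m-form analogue) applied to the transition maps φ_αβ.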

7.7 Applications to mechanics

Again for simplicity we only consider smooth manifolds.

7.7.1 The Noether 1-form

A smooth Lagrangian on M is by definition a smooth function L on TM. If {(U_α, φ_α)}_{α∈I} is an atlas for M, then L_α := L ∘ φ̂_α^{-1} is a smooth function on φ̂_α(Û_α) for all α ∈ I. The Noether 1-form on φ̂_α(Û_α) is

θ_{L_α} := ∑_{i=1}^n (∂L_α/∂v_α^i) dq_α^i ∈ Ω^1(φ̂_α(Û_α)),

where (q_α^1, ..., q_α^n, v_α^1, ..., v_α^n) are coordinates on φ̂_α(Û_α) = φ_α(U_α) × R^n.
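For a concrete instance (a standard mechanical Lagrangian, not worked out in the lecture), take M = R^n with its standard chart, so TM = R^n × R^n with coordinates (q, v), and

L(q, v) = (m/2) ∑_{i=1}^n (v^i)² − V(q).

Then ∂L/∂v^i = m v^i, so the Noether 1-form is

θ_L = m ∑_{i=1}^n v^i dq^i,

i.e., it pairs a displacement dq with the momenta ∂L/∂v^i = m v^i.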

Proposition 7.7.1. The collection of 1-forms θLα defines a 1-form θL on TM called the Noether 1-form for L.

Proof. We have to verify (7.3), where now the transition maps are those of the tangent bundle; viz., we have to verify that

θ_{L_α} = φ̂_αβ^* θ_{L_β},    (7.4)

for all α, β ∈ I. Explicitly, the maps φ̂_αβ relate the coordinates (q_β, v_β) = φ̂_αβ(q_α, v_α) by

q_β^i = φ_αβ^i(q_α),
v_β^i = ∑_{j=1}^n (∂φ_αβ^i/∂q_α^j)(q_α) v_α^j.

⁵Recall that (φ_*X)(x) = d_{φ^{-1}(x)}φ X(φ^{-1}(x)).

Since L_α(q_α, v_α) = L_β(q_β, v_β) by definition, we have

∂L_α/∂v_α^i = ∑_{j=1}^n (∂q_β^j/∂v_α^i · ∂L_β/∂q_β^j + ∂v_β^j/∂v_α^i · ∂L_β/∂v_β^j) = ∑_{j=1}^n (∂φ_αβ^j/∂q_α^i) ∂L_β/∂v_β^j.    (7.5)

On the other hand,

dq_β^j = ∑_{i=1}^n (∂φ_αβ^j/∂q_α^i) dq_α^i.    (7.6)

Hence (7.4) is proved.

7.7.2 The Legendre mapping

Let L be a Lagrangian on M and L_α := L ∘ φ̂_α^{-1} its representation in the chart (U_α, φ_α) as above. We define

p_i^α := ∂L_α/∂v_α^i

as a function of the coordinates (q_α, v_α). Equation (7.5) in the proof of Proposition 7.7.1 now reads

p_i^α = ∑_{j=1}^n (∂φ_αβ^j/∂q_α^i)(q_α) p_j^β,

i.e., p^β = ((d_{q_α}φ_αβ)^*)^{-1} p^α. This shows that (q_β, p^β) = φ̃_αβ(q_α, p^α), where the φ̃_αβ are the transition maps for the cotangent bundle. This implies that the maps

ψ_{L_α}: φ̂_α(Û_α) = φ_α(U_α) × R^n → φ̃_α(Ũ_α) = φ_α(U_α) × R^n,
(q_α, v_α) ↦ (q_α, p^α(q_α, v_α)),

are the representations of a map

ψ_L: TM → T^*M    (7.7)

called the Legendre mapping. This also shows that Hamiltonian mechanics naturally takes place on the cotangent bundle.
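Continuing the example given after the definition of the Noether 1-form (a standard case, added for orientation): for L(q, v) = (m/2)|v|² − V(q) on TR^n = R^n × R^n one gets p_i = ∂L/∂v^i = m v^i, so

ψ_L(q, v) = (q, m v),

which is invertible with inverse (q, p) ↦ (q, p/m); passing from the velocities v to the momenta p in this way is the fiberwise version of the Legendre transform of Chapter 3.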

7.7.3 The Liouville 1-form

Let {(U_α, φ_α)}_{α∈I} be an atlas for M and let {(Ũ_α, φ̃_α)}_{α∈I} be the corresponding atlas for T^*M. One defines

θ_α := ∑_{i=1}^n p_i^α dq_α^i ∈ Ω^1(φ̃_α(Ũ_α)),

where (q_α^1, ..., q_α^n, p_1^α, ..., p_n^α) are the coordinates on φ̃_α(Ũ_α) = φ_α(U_α) × R^n.

Proposition 7.7.2. The collection of 1-forms θ_α defines a 1-form θ on T^*M called the Liouville 1-form (a.k.a. the Poincaré 1-form or the tautological 1-form).

Proof. We have to verify (7.3), where now the transition maps are those of the cotangent bundle; viz., we have to verify that

θ_α = φ̃_αβ^* θ_β,    (7.8)

for all α, β ∈ I. Explicitly, the maps φ̃_αβ relate the coordinates (q_β, p^β) = φ̃_αβ(q_α, p^α) by

q_β^i = φ_αβ^i(q_α),
p_i^α = ∑_{j=1}^n (∂φ_αβ^j/∂q_α^i)(q_α) p_j^β.

By (7.6), we immediately get (7.8).

By direct inspection of the formulae we also get the

Proposition 7.7.3. Let L be a Lagrangian. Then θ_L = ψ_L^* θ, where ψ_L denotes the Legendre mapping of equation (7.7).

Remark 7.7.4. There is also a coordinate-independent definition of θ. Namely, denote by (q, p), q ∈ M and p ∈ (T_qM)^*, the points in T^*M and let π: T^*M → M, π(q, p) = q, be the projection map. For v ∈ T_{(q,p)}T^*M, define θ_{(q,p)}v := p(d_{(q,p)}π v).

7.7.4 Symplectic geometry

Symplectic geometry arises in the general formulation of the Hamiltonian systems encountered in mechanics.

Definition 7.7.5. A symplectic form on a manifold N is a closed, nondegenerate 2-form on N. A symplectic manifold is a pair (N, ω) where N is a manifold and ω is a symplectic form on N.

Namely, ω ∈ Ω²(N), dω = 0, and for each q ∈ N the bilinear form ω_q on T_qN is nondegenerate (equivalently, the linear map ω_q^♯: T_qN → T_q^*N, (ω_q^♯ v)(w) := ω_q(v, w), for v, w ∈ T_qN, is an isomorphism).

Example 7.7.6. Let N be an open subset of R^{2n} with coordinates q^1, ..., q^n, p_1, ..., p_n. Then

ω = ∑_{i=1}^n dp_i ∧ dq^i    (7.9)

is a symplectic form on N.

Example 7.7.7. Let N = T ∗M and ω = dθ where θ is the Liouville 1-form on T ∗M. Then (N, ω) is a symplectic manifold. Nondegeneracy is verified since in local charts ω is written as in (7.9).

Remark 7.7.8. Darboux’s Theorem, which we do not prove here, asserts that every symplectic manifold possesses an atlas such that the symplectic form in each chart is as in (7.9). From now on, let (N, ω) be a symplectic manifold.

Definition 7.7.9. The Hamiltonian vector field X_H of a function H on N is the unique vector field satisfying ι_{X_H}ω = −dH. A vector field is called Hamiltonian if it is the Hamiltonian vector field of a function (which is defined up to a local constant).
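In the coordinates of Example 7.7.6 (a routine computation with the conventions just fixed, spelled out here for convenience), writing X_H = ∑_i a^i ∂/∂q^i + b_i ∂/∂p_i, the condition ι_{X_H}ω = ∑_i (b_i dq^i − a^i dp_i) = −dH gives a^i = ∂H/∂p_i and b_i = −∂H/∂q^i, i.e.,

X_H = ∑_{i=1}^n (∂H/∂p_i ∂/∂q^i − ∂H/∂q^i ∂/∂p_i),

so the orbits of X_H are exactly the solutions of the canonical equations q̇^i = ∂H/∂p_i, ṗ_i = −∂H/∂q^i.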

If H and F are two functions on N, then one can easily see that

XH (F ) = −XF (H). (7.10)

This has two important consequences. The first is Noether’s theorem. We need first the

Definition 7.7.10. A Hamiltonian system is a pair ((N, ω),H) where (N, ω) is a symplectic manifold and H is a function on N.A constant of motion for the Hamiltonian system ((N, ω),H) is a function that is constant on the orbits of XH . An infinitesimal symmetry for the Hamiltonian system ((N, ω),H) is a Hamiltonian vector field X on N such that X(H) = 0.

Theorem 7.7.11 (Noether's Theorem). A Hamiltonian vector field is an infinitesimal symmetry for the Hamiltonian system ((N, ω), H) iff any of its Hamiltonian functions is a constant of motion.

Proof. Let F be a Hamiltonian function for the vector field at hand, which we denote by X_F. Being an infinitesimal symmetry means X_F(H) = 0. On the other hand, F is a constant of motion iff X_H(F) = 0. The Theorem then follows from (7.10).

The second consequence of (7.10) is that the bracket

{H, F} := X_H(F),

called the Poisson bracket on (N, ω), is skew-symmetric. One can also show that it satisfies the Jacobi identity. This immediately implies that the Poisson bracket of two constants of motion for the same Hamiltonian function is again a constant of motion.

Appendices


Appendix A

Topology and Derivations

A.1 Topology

We recall a few facts about topology.

Definition A.1.1. A topology on a set S is a collection O(S) of subsets of S such that

1. ∅, S ∈ O(S);

2. ∀U, V ∈ O(S) we have U ∩ V ∈ O(S);

3. if (U_α)_{α∈I} is a family indexed by I with U_α ∈ O(S) ∀α ∈ I, we have ∪_{α∈I} U_α ∈ O(S).

A set with a topology is called a topological space.

Example A.1.2. The collection of the usual open subsets of R^n forms a topology on R^n, called its standard topology.

In general, elements of a topology are called open sets and elements of a topological space are called points. A neighborhood of a point is an open set containing it.

Definition A.1.3. A map F: S → T between topological spaces (S, O(S)) and (T, O(T)) is called continuous if F^{-1}(U) ∈ O(S) ∀U ∈ O(T). A continuous invertible map whose inverse is also continuous is called a homeomorphism. A map that maps open sets to open sets (i.e., in the above notation, F(U) ∈ O(T) ∀U ∈ O(S)) is called open.

Topologies may often be induced. We consider two examples here.

Example A.1.4. Let (S, O(S)) be a topological space and let T be a subset. Then

O_S(T) := {U ⊂ T | ∃V ∈ O(S): U = V ∩ T}

is a topology on T. With this topology the inclusion map ι: T ↪ S is continuous. This is in particular the topology one usually considers on subsets of R^n with its standard topology.

Example A.1.5. Let (S, O(S)) be a topological space and π: S → T be a surjective map. Then

O_{S,π}(T) := {U ⊂ T | π^{-1}(U) ∈ O(S)}

is a topology on T. With this topology π is continuous. Notice in particular that such a π arises when we have an equivalence relation on S and define T as the set of equivalence classes.

Remark A.1.6. Unless stated otherwise, when we speak of R^n, we tacitly assume the standard topology; when speaking of a subset of a topological space or a quotient of a topological space, we tacitly assume the induced topology.


A.2 Derivations

In this Appendix we make a digression, which can be omitted with no consequences by the hasty reader, on the interpretation of tangent vectors as directions along which one can differentiate functions. This idea leads, in the case of smooth manifolds, to a definition of the tangent space where the linear structure is intrinsic and does not require choosing charts, even though only at an intermediate stage. The construction is also more algebraic in nature.

The characterizing algebraic property of a derivative is the Leibniz rule for differentiating products. From the topological viewpoint, derivatives are characterized by the fact that, being defined as limits, they only see an arbitrarily small neighborhood of the point where we differentiate. The latter remark then suggests considering functions “up to a change of the definition domain,” a viewpoint that turns out to be quite useful.

Let M be a C^k-manifold, k ≥ 0. For q ∈ M we denote by C_q^k(M) the set of C^k-functions defined in a neighborhood of q in M. Notice that, by pointwise addition and multiplication of functions (on the intersection of their definition domains), C_q^k(M) is a commutative algebra.

Definition A.2.1. We define two functions in C_q^k(M) to be equivalent if they coincide in a neighborhood of q.¹ An equivalence class is called a germ of C^k-functions at q. We denote by C_q^k M the set of germs at q with the inherited algebra structure.

Notice that two equivalent functions have the same value at q. This defines an algebra morphism, called the evaluation at q:

ev_q: C_q^k M → R.

We are now ready for the

Definition A.2.2. A derivation at q in M is a linear map D: C_q^k M → R satisfying the Leibniz rule

D(fg) = Df ev_q g + ev_q f Dg,

for all f, g ∈ C_q^k M. Notice that a linear combination of derivations at q is also a derivation at q. We denote by Der_q^k M the vector space of derivations at q in M.

Remark A.2.3. Notice that if U is a neighborhood of q, regarded as a C^k-manifold, a germ at q ∈ U is the same as a germ at q ∈ M. So we have C_q^k U = C_q^k M. As a consequence we have

Der_q^k U = Der_q^k M

for every neighborhood U of q in M.

The first algebraic remark is the following

Lemma A.2.4. A derivation vanishes on germs of constant functions (the germ of a constant function at q is an equivalence class containing a function that is constant in a neighborhood of q).

Proof. Let D be a derivation at q. First consider the germ 1 (the equivalence class containing a function that is equal to 1 in a neighborhood of q). From 1 · 1 = 1, it follows that

D1 = D1 1 + 1 D1 = 2 D1,

so D1 = 0. Then observe that, if f is the germ of a constant function, then f = k1, where k is the evaluation of f at q. Hence, by linearity, we have Df = k D1 = 0.

¹More pedantically, f ∼ g if there is a neighborhood U of q in M contained in the definition domains of f and g such that f|_U = g|_U.

Remark A.2.5. Notice that all the above extends to a more general context: one may define derivations on any algebra with a character (an algebra morphism to the ground field). The above Lemma holds in the case of algebras with unit.

Let now F: M → N be a C^k-morphism. Then we have an algebra morphism F^*: C_{F(q)}^k(N) → C_q^k(M), f ↦ (f ∘ F)|_{F^{-1}(V)}, where V is the definition domain of f. This clearly descends to germs, so we have an algebra morphism

F^*: C_{F(q)}^k N → C_q^k M,

which in turn induces a linear map of derivations

der_q^k F: Der_q^k M → Der_{F(q)}^k N,
D ↦ D ∘ F^*.

It then follows immediately that, if G: N → Z is now a Ck-morphism, then

der_q^k(G ∘ F) = der_{F(q)}^k G ∘ der_q^k F.

This in particular implies that, if F is a C^k-isomorphism, then der_q^k F is a linear isomorphism.

Let (U, φ_U) be a chart containing q. We then have an isomorphism der_q^k φ_U: Der_q^k U → Der_{φ_U(q)}^k φ_U(U). As in Remark A.2.3, we have Der_q^k U = Der_q^k M and Der_{φ_U(q)}^k φ_U(U) = Der_{φ_U(q)}^k R^n. Hence we have an isomorphism

der_q^k φ_U: Der_q^k M → Der_{φ_U(q)}^k R^n

for each chart (U, φ_U) containing q. It remains for us to understand derivations at a point of R^n:

Lemma A.2.6. For every y ∈ R^n, the linear map

A_y: Der_y^k R^n → R^n,
D ↦ (Dx^1, ..., Dx^n),

is surjective for k ≥ 1 and an isomorphism for k = ∞ (here x^1, ..., x^n denote the germs of the coordinate functions on R^n).

Proof. For k ≥ 1 we may also define the linear map

B_y: R^n → Der_y^k R^n,
v = (v^1, ..., v^n) ↦ D_v,

with

D_v[f] = ∑_{i=1}^n v^i (∂f/∂x^i)(y),

where f is a representative of [f]. Notice that A_y B_y = Id, which implies that A_y is surjective.

It remains to show that, for k = ∞, we also have B_y A_y = Id. Let f be a representative of [f] ∈ C_y^∞ R^n. As a function of x, f may be Taylor-expanded around y as

f(x) = f(y) + ∑_{i=1}^n (x^i − y^i) (∂f/∂x^i)(y) + R_2(x),

where the rest can be written as

R_2(x) = ∑_{i,j=1}^n (x^i − y^i)(x^j − y^j) ∫_0^1 (1 − t) (∂²f/∂x^i∂x^j)(y + t(x − y)) dt.

(To prove this formula just integrate by parts.) Define

σ_i(x) := (∂f/∂x^i)(y) + ∑_{j=1}^n (x^j − y^j) ∫_0^1 (1 − t) (∂²f/∂x^i∂x^j)(y + t(x − y)) dt,

so we can write

f(x) = f(y) + ∑_{i=1}^n (x^i − y^i) σ_i(x).

Observe that, for all i, both x^i − y^i and σ_i are C^∞-functions;² the first vanishes at x = y, whereas for the second we have

σ_i(y) = (∂f/∂x^i)(y).

For a derivation D ∈ Der_y^∞ R^n, we then have, also using Lemma A.2.4,

D[f] = ∑_{i=1}^n Dx^i (∂f/∂x^i)(y) = B_y A_y(D)[f],

which completes the proof.

From now on, we simply write Der_q and der_q instead of Der_q^∞ and der_q^∞.

Corollary A.2.7. For every q in a smooth manifold, we have

dim Der_q M = dim M.

We finally want to compare the construction in terms of derivations with the one in terms of equivalence classes of coordinatized tangent vectors.

Theorem A.2.8. Let M be a smooth manifold, q ∈ M, and (U, φ_U) a chart containing q. Then the isomorphism

τ_{q,U} := (der_qφ_U)^{-1} A_{φ_U(q)}^{-1} Φ_{q,U}: T_qM → Der_qM

does not depend on the choice of chart. We will denote this canonical isomorphism simply by τ_q. If F: M → N is a smooth map, we have d_qF = τ_{F(q)}^{-1} der_qF τ_q.

(τq,V [(U, φU , v)])[f] = (τq,V [(V, φV , dφU (q)φU,V v)])[f] = n i −1 X ∂φU,V ∂(f ◦ φ ) = (φ (q)) vj V (φ (q)) = ∂xj U ∂xi V i,j=1 n X ∂(f ◦ φ−1) = vi U (φ (q)) = (τ [(U, φ , v)])[f]. ∂xi U q,U U i=1 The last statement of the Theorem also easily follows from the chain rule in differentiating f ◦F , ∞ f ∈ [f] ∈ CF (q)N.

²Here it is crucial to work with k = ∞. For k ≥ 2 finite, in general σ_i is only C^{k−2}, and for k = 1 it is not even defined.

Appendix B

Vector fields as derivations

We now want to show that vector fields on a manifold are the same as derivations on its algebra of functions.

Definition B.0.1. A derivation on a smooth manifold M is an R-linear map D: C^∞(M) → C^∞(M) such that D(fg) = Df g + f Dg. Notice that a linear combination of derivations is also a derivation. We denote by Der(M) the C^∞(M)-module of derivations on C^∞(M).

We first want to connect derivations with derivations at a point q. Notice that, for every C^k-manifold, k ≥ 0, we have a linear map γ_q: C^k(M) → C_q^k M that associates to a function its germ at q.

Lemma B.0.2. For every q ∈ M, γq is surjective.

Proof. Let [f] ∈ C_q^k M. Pick an atlas {(U_α, φ_α)}_{α∈I}, and let α be an index such that U_α ∋ q. Let f be a representative of (φ_α^{-1})^*[f] ∈ C_{φ_α(q)}^k φ_α(U_α). Choose ψ ∈ C^k(φ_α(U_α)) with the following properties:

1. ψ|_V = 1, where V is an open ball containing φ_α(q) and contained in φ_α(U_α).

2. ψ|_{φ_α(U_α)\W} = 0, where W is an open ball containing V and contained in φ_α(U_α).

Let f̃ := fψ. We clearly have [f̃] = (φ_α^{-1})^*[f]. Let f_α := f̃ ∘ φ_α. We then have [f_α] = [f]. Finally, we define a function f̂ on M by f̂(x) = f_α(x) for x ∈ U_α and f̂(x) = 0 for x ∈ M \ U_α. We claim that f̂ ∈ C^k(M).

For β ∈ I, define W_β := φ_αβ(W) ⊂ φ_β(U_β). We clearly have that f̂ ∘ φ_β^{-1} coincides with f̃ ∘ φ_βα on φ_β(U_α ∩ U_β) and vanishes in the complement of W_β in φ_β(U_β). This shows that f̂ ∘ φ_β^{-1} is C^k for all β, so f̂ ∈ C^k(M). Finally, observe that γ_q(f̂) = [f].

Theorem B.0.3. If M is a smooth manifold, we have a canonical C∞(M)-linear isomorphism

τ : X(M) → Der(C∞(M)), where X(M) is the C∞(M)-module of vector fields on M.

Proof. If X is a vector field and f is a function, we define ((τ(X))f)(q) := (τqX(q))γqf. It is readily verified that τ(X) is a derivation. It is also clear that τ is C∞(M)-linear and injective. We only have to show that it is surjective.


If D is a derivation and [f] ∈ C_q^∞ M, we define D_q[f] := (Df)(q) for any f ∈ γ_q^{-1}([f]). This is readily seen to be independent of the choice of f and to be a derivation at q. We then define X(q) := τ_q^{-1}(D_q), which is readily seen to depend smoothly on q. Hence we have found an inverse map for τ.

Bibliography

[1] Oliver Bühler: A Brief Introduction to Classical, Statistical, and Quantum Mechanics, American Mathematical Society, Courant Institute of Mathematical Sciences.

[2] Leon A. Takhtajan: Quantum Mechanics for Mathematicians, Graduate Studies in Mathematics, 95, American Mathematical Society.

[3] V.I. Arnold: Mathematical Methods of Classical Mechanics, Springer-Verlag, Graduate Texts in Mathematics.

[4] John M. Lee: Introduction to Smooth Manifolds, University of Washington, Department of Mathematics.

[5] L.D. Landau, E.M. Lifshitz: Mechanics, Course of Theoretical Physics. Vol. 1 (3rd ed.). Butterworth-Heinemann.
