Phys410, Classical Notes University of Maryland, College Park Ted Jacobson

December 20, 2012

These notes are an evolving document...

Contents

1 Energy 3 1.1 Potential energy ...... 3

2 Variational calculus 4 2.1 Euler-Lagrange equations ...... 4

3 Lagrangian Mechanics 4 3.1 Constraints ...... 6 3.2 Effective potential; example of spherical pendulum ...... 8 3.3 Spinning hoop ...... 9

4 Hamiltonian and Conservation of energy 9

5 Properties of the action 10

6 Electromagnetic field 12 6.1 Scalar and vector potentials ...... 12 6.2 Lagrangian for charge in an electromagnetic field ...... 13

1 7 Lagrange multipliers and constraints 14

8 Tidal force and potential 14

9 Velocity in a rotating frame 15

10 Special Relativity 16 10.1 Spacetime interval ...... 18 10.1.1 Proper time ...... 18 10.1.2 Pythagorean theorem of spacetime ...... 18 10.1.3 Time dilation and the twin effect ...... 20 10.1.4 Spacetime geometry ...... 20 10.1.5 Velocity and time dilation ...... 22 10.2 Inertial motion ...... 22 10.3 Action ...... 24 10.4 Energy and momentum ...... 25 10.4.1 Relation between energy and momentum ...... 26 10.4.2 Massless particles ...... 26 10.5 4-vectors and the Minkowski scalar product ...... 27 10.5.1 Conventions ...... 28 10.5.2 “Look Ma, no Lorentz transformations” ...... 28 10.6 4-momentum and 4-velocity ...... 28 10.6.1 Pesky factors of c ...... 30 10.6.2 Example: Relativistic Doppler effect ...... 30 10.7 Zero momentum frame ...... 31 10.7.1 Example: Head-on vs. fixed target collision energy . . . 31 10.7.2 Example: GZK cosmic ray cutoff ...... 32 10.8 Electromagnetic coupling ...... 33

11 General relativity 34

12 Hamiltonian formalism 37 12.1 Hamilton’s equations and the Hamiltonian ...... 38 12.2 Example: Bead on a rotating circular hoop ...... 39 12.2.1 Driven hoop ...... 40 12.2.2 Freely rotating hoop ...... 41 12.3 Example: Charged particle in an electromagnetic field . . . . . 42 12.4 Phase space volume and Liouville’s theorem ...... 43

2 12.5 Phase space and ...... 44 12.6 Entropy and phase space ...... 45 12.7 Adiabatic invariants ...... 46 12.7.1 Example: Oscillator with slowly varying m(t) and k(t) 48 12.7.2 Adiabatic invariance and quantum mechanics . . . . . 49 12.7.3 Example: Cyclotron orbits and magnetic mirror . . . . 49 12.8 Poisson brackets ...... 49 12.8.1 Canonical Quantization ...... 51 12.8.2 Canonical transformations ...... 51

13 Continuum mechanics 52 13.1 String ...... 53 13.2 Electromagnetic field ...... 55 13.3 Elastic solids ...... 55

1 Energy

1.1 Potential energy The concept of potential energy arises by considering forces that are (minus) the gradient of a function, the ”potential”. For such forces, if the potential is time independent, the force is said to be ”conservative”, and the work along a path is just minus the change of the potential, thanks to the fundamental theorem of calculus applied to line integrals. The work for such a force is therefore independent of the path that connects two given endpoints. By Stokes’ theorem, this is related to the fact that the curl of such a force is zero, since the curl of the gradient of anything is zero. Central forces F = f(r)ˆr are derivable from a potential. The key is that ∇r = ˆr, which I explained both computationally and in terms of the geomet- rical interpretation of the gradient: it points in the direciton of greatest rate of change of the function, and has magnitude equal to that rate of change. Thus we can write Z r  f(r)ˆr = f(r)∇r = ∇ dr0f(r0) (1) which shows that the potential for this radial force is U(r) = U(r) = − R r f(r0)dr0.

3 2 Variational calculus

2.1 Euler-Lagrange equations I explained the nature of a “functional” and what it means for that to be stationary with respect to variations of the function(s) that form its argu- ment. As an alternative to the method described in the book, I re-derived the Euler-Lagrange equations without introducing any particular path variation eta. - Examples: soap film stretched between hoops, length of a curve in the Euclidean plane. We solved this three ways:

1) paths y(x) [could instead take x(y)]

2) parametrized paths x(t), y(t)

3) parametrized paths r(t), θ(t) using the E-L equations. In the second case, we noted that the path param- eter has not been specified, so there is no reason whyx ˙(t) andy ˙(t) should be constant. But we found thatx ˙(t)/y˙(t) is constant, which implies that dx/dy (or dy/dx) is constant. In the 3rd case, the eqns are complicated, but if we use the translation symmetry to place the origin of the coordinate system on the curve, we see that the theta equation implies θ˙ = 0, which is certainly the description of a straight line through the origin.

3 Lagrangian Mechanics

Pulled out of a hat the definition of the Lagrangian, L = T − U, and the action S = R L dt, also called Hamilton’s principal function. Showed that for a particle in 1d the condition that S be stationary under all path variations that vanish at the endpoints is equivalent to Newton’s second law. This is called “Hamilton’s principle”. Then generalized this to a particle in 3d, then to two particles in 3d interacting with each other via a potential. It general- izes to any number of particles.

- It’s quite remarkable that the collection of vector equations of a system of a system of particles all come from Hamilton’s principle, which refers to the variation of the integral of a scalar. Adding more particles or dimensions

4 increases the number of functions that the action depends on, but it’s still the integral of a scalar.

- Although it looks arbitrary at first, the action approach is actually the deeper approach to mechanics. The action approach also governs relativistic mechanics, and even field theory. For example Maxwell’s equations and even Einstein’s field equations of gravitation are all governed by an action princi- ple. In the case of fields, the Lagrangian is an integral over space. Moreover, it is via the action that the role of symmetries is best understood and ex- ploited. Also, as a practical matter, one of the most powerful things about the Lagrangian formalism is the flexibility of the choice of variables, since by using variables adapted to a system one can simplify the equations. Choice of variables can also be useful in exploiting approximation schemes.

- The significance of the action and Hamilton’s principle can be under- stood from the viewpoint of quantum mechanics. In Feynman’s path integral formulation, each path is assigned the amplitude exp(iS/h¯), where h¯ is Planck’s constant. (It only makes sense to exponentiate a dimensionless quantity. S has dimensions of action = (energy) × (time) = (momentum) × (length), the same ash ¯.) The total amplitude is the sum over all paths. Destructive interference occurs when the action of two paths differs by some- thing comparable toh ¯ or greater. This is howh ¯ sets the scale of quantum effects. At the classical path, the variation of S vanishes, so nearby paths interfere constructively. In the classical limit, the path is thus determined by the condition that S be stationary. You can read about this in the Feynman lectures, for instance.

R 1 2 - What is action? For a free particle motion the action is S = 2 mv dt, which is the average kinetic energy times the total time interval. On the classical path (solution to the equation of motion) v = v0 = const. We can easily show this is the minimum for all paths. In the presence of a potential, the action is still a minimum on the classical path, provided the two times are close enough. For a harmonic oscillator, ”short enough” means less than half the period.

- Can change variables freely in describing the configuration of the sys- tem. Example: change from (x1, x2) to (xcm, xrel).

5 3.1 Constraints When the configuration coordinates of a system are constrained by physical conditions, then one can just impose the constraint in the the Lagrangian, eliminating a constrained degree of freedom and omitting the potential that enforces the constraint. This is correct because after imposing the constraint, although the variations of the original coordinates are restricted, they are all valid variations, so the action must be stationary with respect to them, so the corresponding E-L equations must hold. If there are enough E-L equations to determine the time evolution of the remaining coordinates, then the description is complete. Let’s illustrate this with the example of a simple pendulum hanging from a string of fixed length. In terms of spherical coordinates based at the vertex, the mass can move freely in θ and φ, but the r degree of freedom is constrained to be equal to a fixed length r0 by some constraining potential Uconst(r) arising from the microscopic structure of the string. If r is set equal to r0 in the Lagrangian, the θ and φ equations remain valid and they determine the evolution of these coordinates. A more explicit argument goes as follows. The full Lagrangian is

1 2 2 ˙2 2 2 ˙2 L = 2 m(r ˙ + r θ + r sin θ φ ) + mgr cos θ − Uconst(r). (2) The Lagrange equation for r is

0 mr¨ = mg cos θ − Uconst(r). (3)

If we know that the constraint is satisfied at r = r0, then we can just omit Uconst(r) and set r equal to r0 in the Lagrangian. It’s that simple! Note that if we solve the same problem with Newton’s second law, the unknown string tension is one of the forces, so it must be found or at least eliminated by combining the components of Newton’s law. The Lagrangian method never introduces the tension in the first place. Nevertheless, if we want to know the tension, we can still find it using the Lagrangian: if r = r0, the r-equation (3) implies that the force of constraint is

0 − Uconst(r0) = −mg cos θ. (4)

That is, the force of constraint is equal to whatever it must be in order for the r-equation to be satisfied when r = r0. All this generalizes to any system.

6 To further illustrate the ideas, let’s re-do the oscillator problem using Cartesian coordinates (x, y, z). The Lagrangian before the constraint is L = 1 2 2 2 2 2 2 2 2 m(x ˙ +y ˙ +z ˙ ) − mgz. Then the constraint is x + y + z = r0, which can be solved for any one of the three coordinates. Let’s solve it for z, viz. 2 2 2 1/2 2 2 2 −1/2 z = (r0 − x − y ) . Thenz ˙ = −(xx˙ + yy˙)(r0 − x − y ) , so we have

2 1 2 2 1 (xx˙ + yy˙) 2 2 2 1/2 L = 2 m(x ˙ +y ˙ ) + 2 m 2 2 2 − mg(r0 − x − y ) (5) r0 − x − y

Notice that φ doesn’t appear in the Lagrangian (2). It is said to be an ignorable coordinate. The reason it does not appear is that φ translation is ˙ a symmetry of the system. Correspondingly, pφ ≡ ∂L/∂φ, the “generalized momentum conjugate to φ”, is conserved. This is nothing but the angular momentum about the vertical axis.

- Planar pendulum (φ = const) in harmonic oscillator approximation: ¨ in the equation of motion, θ = −(g/r0) sin θ, one may expand sin θ = θ − (1/6)θ3 + ... and drop all but the linear term to get the harmonic p oscillator eqn. for an oscillator with frequency ω = g/r0. The correc- tion term in the equation of motion has relative size (1/6)θ2, which for θ = π/4 (45◦) is only about 0.1, i.e. it’s a 10% correction. Alterna- tively, one make the small angle approximation in the Lagrangian, expanding cos θ = 1 − (1/2)θ2 + (1/4!)θ4 − ... . (Note that the correction in the La- grangian has relative size (1/12)θ2, which is half as large as the correction in the equation of motion.)

- Circular pendulum motion (θ = const): The θ-equation with θ˙ = 0 im- p plies that the angular frequency is g/(r0 cos θ). At θ = 0 this is the same as for the planar pendulum, which makes sense because the circular oscil- lation is the superposition of two planar oscillations, a quarter cycle out of phase. As θ approaches π/2 the frequency goes to infinity. This makes sense because the tension must go to infinity in order for the vertical component of the tension force to balance the vertical gravitational force.

- Pendulum with sliding pivot point: consider a standard planar pen- dulum, but with the pivot point at the top free to slide in the horizontal direction. Then the configuration is described by two coordinates, e.g. the

7 horizontal position of the pivot point and the angle of the pendulum from the vertical. We wrote out the Lagrangian for this system.

- Extended bodies: can think of this as a huge number of particles, con- strained by atomic forces so that the whole system has only a few degrees of freedom. As an example I considered a “physical pendulum”, i.e. a solid body pivoting around a fixed axis in a gravitational field. The kinetic energy P 1 2 can be written as a sum over all the mass elements of the body, T = 2 mivi . th ˙ If ri is the distance of the i mass element from the axis, its speed is riθ. 1 ˙2 P 2 So T = Iθ , where I = miri is the moment of inertia. Similarly, the 2 P potential energy can be written as a sum U = migyi, where yi is the vertical component of the position vector of the ith mass element. Now P miyi = Mycm, where M is the total mass and ycm is the vertical compo- nent of the center of mass position. Moreover, ycm = `(1 − cos θ), where ` is the distance from the axis to the center of mass. So the Lagrangian for the 1 ˙2 pendulum is L = 2 Iθ − Mg`(1 − cos θ). By comparison with a harmonic oscillator Lagrangian, we can read off the oscillation frequency of this pen- dulum, ω = pMg`/I. For example, for a uniform rod of length R, the CM R R 2 1 3 is in the center, so we have ` = R/2. Also, I = (M/R) 0 dx x = 3 MR , so ω = p3g/2R.

3.2 Effective potential; example of spherical pendulum How to set up the problem if there is motion in both the θ and φ directions. Write out both equations of motion. The φ equation will be the the angular momentum conservation law, and enables one to solve for φ˙ in terms of the ˙ conjugate momentum pφ and θ. Then this can be used to eliminate φ from the θ equation, reducing the θ motion to a one dimensional problem with an effectve potential Ueff (θ). Since the energy is conserved, it’s simpler to just eliminate φ˙ from the expression for the total mechanical energy, and to identify the effective potential by its appearance in the energy expression. Setting the time derivative of the energy to zero we can always recover the θ equation of motion. Important note1 : you cannot substitute for φ˙ in terms of pφ in the Lagrangian before finding the θ equation. This would introduce θ dependence that is different from what was in the Lagrangian. It’s incorrect, ˙ because this extra θ dependence comes from the relation between φ and pφ,

8 treating the arbitrary conserved pφ as a constant. Important note2 : Another way an effective potential can arise is if the φ coordinate is constrained to rotate at a given angular velocity by an external agent. Then the angular part of the kinetic energy behaves like a term in the effective potential of the 1 2 2 form − 2 mΩ r .

- small oscillations of the spherical pendulum: We showed before that for any fixed θ0 there is a circular motion, with some associated angular momen- tum. Now you can perturb that motion to introduce an oscillation, whose 2 00 frequency will be determined by ω = Ueff (θ0).

3.3 Spinning hoop Made several points about this.

1) The mass drops out of the equations of motion. It affects the forces of constraint, but as the Lagrangian is proportional to m, not the equa- tions of motion. This derives from the fact that both the inertia and the force of gravity are proportional to m. This is of course a special property of gravity.

2) We can choose units with m = g = R = 1. This simplifies the equa- tions, but you loose the ability to check your algebra with dimensional analysis. You put the m, g, R back in at the end using dimensional analysis.

3) Went over the solution of the problem of small oscillations about the 00 equilibrium points in detail. Showed how the evaluation of Ueff (θ0) is simplified by writing Ueff (θ) as a product of factors, one of which vanishes at each equilibrium point. Only the derivative of the latter 00 factor survives when evaluating Ueff (θ0).

4 Hamiltonian and Conservation of energy

Momentum and angular momentum conservation derive from space trans- lation and rotation symmetry respectively. Energy conservation arises from time translation symmetry. We derived the conserved quantity that arises

9 from time translation symmetry of the Lagrangian. If there is no explicit t dependence in L, then the “Hamiltonian”,

i H = piq˙ − L, (6)

i is conserved. Here the index i appears twice, once on pi and once onq ˙ . We use the Einstein summation convention according to which repeated indices appearing in the same term (i.e. on multiplied objects) are summed over all their values. What is the meaning of H? For a Lagrangian of the form 1 i j 1 i j L = 2 Aij(q)q ˙ q˙ − U(q) we find H = 2 Aij(q)q ˙ q˙ + U(q). So if the kinetic 1 i j energy is T = 2 Aij(q)q ˙ q˙ , then H = T + U is the total mechanical energy.

Index gymnastics: In deriving the form of H in the previous paragraph, we went through some index gymnastics.

We considered a relatively simple example where H is not the total me- chanical energy: the bead sliding on a hoop driven by an external torque to rotate at constant angular frequency ω. The Lagrangian is

1 2 ˙2 1 2 2 2 L = 2 mR θ + 2 mω R sin θ − mgR(1 − cos θ). The second term is the azimuthal part of the kinetic energy, but it contains no time derivatives of the generalized coordinate θ, so shows up as a con- tribution to the effectve potential Ueff (θ). This means that H is not the total mechanical energy, but rather the total mechanical energy minus twice the azimuthal kinetic energy. It makes sense that mechanical energy is not conserved, since the driver of the rotation of the hoop puts energy into the particle motion. And the orientation of the constraint forces is imposed by external time dependence, so the system really has time dependence, even though the Lagrangian for the generalized coordinate does not. Also, angular momentum is not conserved, since the hoop at each instant is an external constraint that violates rotational invariance. So what is H, this conserved quantity. Is there a symmetry that it corresponds to??

5 Properties of the action

Free particle at rest: v = 0 path has the minimum action, S = 0.

10 Freely falling particle in uniform gravitational field: minimum action neg- ative, from up and down motion. If particle goes up a height h, both v and U scale proportional to h, but T scales as h2. So for small enough h, the Lagrangian T − U will be negative. The h that gives minimum for constant velocity up and down happens to be the same as the h that gives the height of the classical path. (Can you find an argument showing that this must be the case?)

If you bring in circular orbits then, for a sufficiently long time interval, there is a second path, the circular orbit. The action on that path is a saddle point of the action, not the minimum.

Ambiguity of the Lagrangian: You can add a total time derivative without changing the equations of motion, because the action for L + df/dt is the action for L plus [f(t2) − f(t1)]. With fixed endpoints, these actions differ by a constant (asuuming f = f(q, t) depends on q and t but not on time derivatives of q), so they have the same stationary points. A nice example is in the homework, of the pendulum in an accelerating elevator.

Change of inertial frame (Galilean transformation): What is the change of the action when you change inertial reference frames? The defini- 0 tion of kinetic energy changes: the velocity wrt the new frame is v = v − v0, where v0 is the velocity of the new frame wrt the old one. The kinetic energy in the new frame is therefore

1 02 1 2 1 2 2 mv = 2 mv − mv0v + 2 mv0. The difference of the two definitions of kinetic energy is a total time deriva- 0 1 2 tive: T = T +df/dt, with f = −mv0x(t)+ 2 mv0t). The definition of potential energy doesn’t change since it is just a function U(x, t) of position in space and time, which makes no reference to a particular frame. (Of course the formula for it would look different when written using the new coordinate.) So the Lagrangian changes by a total time derivative, so the action changes by a constant, for fixed endpoints.

Using this we can argue that the free particle motion at constant velocity minimizes the action: go into the reference frame where the velocity is zero, where clearly the action is minimized.

11 6 Electromagnetic field

Lorentz force: F = q(E + v × B). (7) Maxwell’s equations (in SI units):

∇ · E = ρ/0 Gauss’ law (8) 1 ∇ × B − c2 ∂tE = µ0j Ampere-Maxwell law (9) ∇ · B = 0 no magnetic monopoles (10)

∇ × E + ∂tB = 0 Faraday’s law (11)

6.1 Scalar and vector potentials For static magnetic fields we have ∇ × E = 0, hence there exists a scalar V such that E = −∇V . The electrostatic potential energy of a charge is then qV , which can be used in the Lagrangian to get the equation of motion. But if the electric field has a part that is induced by a changing magnetic field, then ∇ × E 6= 0, so E is not the gradient of a scalar. Moreover, how is a static magnetic field, incorporated into the Lagrangian?

The absence of magnetic poles (10) implies that there exists a vector potential A such that B = ∇×A. In terms of the vector potential, Faraday’s law (11) becomes ∇ × E + ∂t∇ × A = 0. Since the partial derivatives in ∂t and ∇ commute, this can also be expressed as ∇ × (E + ∂tA) = 0. This implies that there exists a scalar potential V such that E + ∂tA = −∇V . Thus the fields can be written in terms of potentials as

B = ∇ × A, E = −∇V − ∂tA. (12) To arrive at this we used the homogenous Maxwell equations (10,11) that do not involve the charge and current density source terms. Conversely, if E and B are defined in terms of potentials via (12) then the homogenous Maxwell equations hold automatically. The potentials are not unique: one can make a gauge transformation to new potentials 0 0 A = A + ∇f, V = V − ∂tf (13) which yield the same B and E for any function f. This is called gauge invariance of the fields.

12 6.2 Lagrangian for charge in an electromagnetic field What about the action? The electromagnetic coupling term should be

1) a scalar - the action is a scalar

2) linear in the potentials - since the Lorentz force is linear in the fields

3) gauge invariant - since the Lorentz force (7) involves only the fields, not the potentials

There must be a term in the Lagrangian like in the electrostatic case, −qV . This is a scalar and linear in V . However it is not gauge invariant: V changes by −∂tf (13), so the Lagrangian changes by q∂tf. If this were a total time derivative it would not change the equations of motion, because it would only change the action by a constant independent of the path. However, it is only the partial derivative. It doesn’t include the derivative wrt the t-dependence in the x argument of f(x(t), t):

d ∂ dx f(x(t), t) = f + · ∇f. (14) dt ∂t dt So this can’t be the whole story. Something else in the Lagrangian must generate the second term on the right hand side of (14), so that the action will be gauge invariant. In fact, a vector potential term q(dx/dt) · A is just what the doctor ordered, since under a gauge transformation (13) A changes by ∇f! So we seem to have no choice but to define the electromagnetic part of the Lagrangian as

Lem = q(v · A − V ), (15) where v = dx/dt is the charge’s velocity vector. Note that the action defined by (15) satisfies the three conditions listed above. And, indeed, the Euler- Lagrange equations for a particle with L = T + Lem are equivalent to the Lorentz force law (7). It’s quite remarkable that the requirement of gauge invariance is so powerful. This sort of reasoning, applied to a notion of gauge invariance where the fields are matrix-valued, is what guided physicists to the structure of the standard model of particle physics.

13 7 Lagrange multipliers and constraints 8 Tidal force and potential

One can understand the form of the tidal force once and for all, in terms of the Taylor expansion of the potential. The gravitational force is F = −∇U, where U is the gravitational potential energy of the particle. Tides are caused by the variation of this force from place to place. The rate of variation at a point is given by the gradient of the force, but the force is a vector, so this really means the gradient of each of the components of F. But then this gradient is a vector whose components are vectors...i.e. it is a tensor. This may sound obscure, but it is really simple if we use Cartesian coordinates and the index notation. Since we are talking about gravity, it’s nice to work instead with the local gravitational acceleration field g = F/m, since that’s the same for all test masses on which the force may act. Correspondingly, let’s denote U˜ = U/m. Then g = −∇U˜, which in index notation is ˜ gj = −∂jU. (16)

The gradient of the acceleration is then just the tidal tensor ˜ ∂igj = −∂i∂jU, (17) i.e. just the second partial derivatives of the potential. To evaluate the lunar tidal force on the Earth, we need to know how the gravitational force or acceleration differs at different points on the Earth. Since these points are all close to the center of the Earth, compared to the Earth-Moon distance, it makes sense to expand the potential around a point at the center of the Earth in a Taylor series. Let’s call the vector from the Moon to the Earth d0. The Taylor expansion is then

˜ ˜ ˜ i 1 ˜ i j U(d0 + s) = U(d0) + [∂iU(d0)]s + 2 [∂i∂jU(d0)]s s + ... (18) and the corresponding expansion for the acceleration is

˜ ˜ j gi(d0 + s) = −∂iU(d0) − [∂i∂jU(d0)]s + .... (19)

The first term is constant, like a uniform gravitational field. This is what we subtract out when working in the local freely falling, accelerating frame.

14 The remainder is linear in the dispacements sj, and depends on direction. To evaluate the derivatives of the 1/r potential all we need is ∂jr = rj/r, j j −1 −3 and ∂ir = δi . Thus ∂jr = −r rj, and so 1 ∂ ∂ = r−3(3ˆrirˆj − δij). (20) i j r Note that the trace of the left hand side is the Laplacian of 1/r which van- ishes, and indeed the trace of the right hand side vanishes. Up to a coefficient, this is the tidal potential of the field of a point mass, and it is a good approx- imation to the tidal potential of the Moon at the location of the Earth. The next term in the Taylor expansion is smaller by a factor ∼ Re/d0 ∼ 1/60. The tidal potential of the Moon at the Earth is thus

GMmm ˆi ˆj ij i j Utidal(d0 + s) = − 3 (3d0d0 − δ )s s (21) 2d0 which for points on the surface of the Earth evaluates to 2 GMmmRe ˆ 2 Utidal(d0 + s) = − 3 [3(d0 · ˆs) − 1] (22) 2d0 For s along the Earth-Moon line the factor in the square bracket is equal to 2, while for s perpendicular to that line it is −1. This agrees with the expressions in (9.16) and (9.17) of Taylor (though those include the potential at the center of the Earth as well).

9 Velocity in a rotating frame

Let’s see how to describe motion in a rotating frame, and how that is related to the description in an inertial frame. To keep the formulas simple and explicit, let’s assume the rotation is fixed around the z-axis, Ω = Ωzˆ. The inertial orthonormal basis vectors are xˆ0, yˆ0, and the rotating basis vectors are xˆ, yˆ. The latter satisfy xˆ˙ = Ω × xˆ = Ωyˆ (23) yˆ˙ = Ω × yˆ = −Ωxˆ. (24)

The position vector r0 = r can be expressed using either the inertial or the rotating basis vectors xˆ, yˆ:

r0 = x0xˆ0 + y0yˆ0 = xxˆ + yyˆ. (25)

15 The velocity vector in the inertial frame is by definition

v0 = r˙ 0 =x ˙ 0xˆ0 +y ˙0yˆ0. (26) What one means by the velocity vector in the rotating frame is

v =x ˙xˆ +y ˙yˆ (27)

Note here the important fact that although the unit vectors are not constant in time, they are not differentiated in the definition of v. In fact if they were, then v would be identical to v0. Hence the relation between the two velocities is ˙ ˙ v0 = v + xxˆ + yyˆ = v + Ω × r. (28)

Note there is an inconsistency in the notation: since r = r0, it must be that r˙ = r˙ 0. However, the notation r˙ is used for what I’ve called v above, and v 6= v0. The upshot is that the notation r˙ is, according to me, being used inconsistently, if the dot means the same thing in both equations. The point of course is that the dot does not mean the same thing... oh well. It’s potentially confusing and you have to be careful...

10 Special Relativity

To write the Lagrangian or Newton’s second law we use certain structures that are assumed present in space and time in order to define velocity, speed, and the action: 1) absolute time function, 2) metric of spatial distance at one time, 3) family of inertial frames. In place of 3), Newton introduced an absolute standard of rest, but Newto- nian physics depends only on the family of inertial frames, not on which one of those frames is used as the standard of rest. The reason he did this is that it is much simpler, in fact trivial, to specify mathematically. The math needed to define a family of inertial frames without selecting one of them as preferred is much more subtle in Newtonian physics and was not available to Newton. In special relativity, all of these structures are unified into one, the spacetime interval. Before we get to the quantitative aspects of relativity, let’s discuss the qualitative aspects...

16 The key fact giving rise to special relativity theory, historically, is that the speed of light as described by electrodynamics, and measured by exper- iments, does not depend on the speed of the source. This alone could have been accounted for by supposing that there is a preferred frame, the rest frame of the aether, in which the light is propagating. However, nothing else in electromagnetism suggested that the theory has a preferred frame, and in fact the symmetry group of Maxwell’s equations is the Lorentz group, consisting of rotations and “boosts”, i.e. velocity changes. Moreover, noth- ing in Newtonian mechanics indicated the existence of a preferred frame. And when people looked both experimentally and theoretically for preferred frame effects in electrodynamics, they found none. As we’ll see, the apparent contradiction between the relativity of inertial motion and the absoluteness of the speed of light is reconciled in special relativity. Since the speed of light is independent of the source, the paths followed by light rays is space and time trace out an absolute structure that is a property of spacetime. This can be visualized as a lightcone at each spacetime event. Instead of an absolute time slicing of spacetime like in Newtonian physics, we have an absolute family of light cones. At an event p, the inside of one half of the lightcone is the future, the inside of the other half is the past, and the rest is the elsewhere. Points in the future or past of p are timelike related to p, points in the elsewhere are spacelike related to p, and points on the cone are lightlike related. The point p can only be influenced by events inside or on its past lightcone, and can only influence events inside or on its future lightcone. So the lightcones define the causal structure of spacetime. In Newtonian physics, the causal structure is defined by the absolute time function. In Newtonian spacetime, events at the same absolute time are simulta- neous. In relativity, there is no absolute meaning of simultaneity. A given observer can use radar to define a notion of simultaneity, but that notion will depend on the observer. Since all inertial observers are equivalent, there is no preferred definition of simultaneity. Spacelike related points are always “simultaneous” as defined by some observers and not by others. Timelike or lightlike related points are never simultaneous as defined by any observer. Diagrams illustrating the relativity of simultaneity, and contrasting New- tonian and relativistic spacetimes are at http://www2.physics.umd.edu/%7Ejacobson/171c/simul.jpg

17 10.1 Spacetime interval 10.1.1 Proper time In special relativity (SR), the elapsed time between two events in itself is not defined. Elapsed time is a property of a timelike path in spacetime. It should be thought of as “arc length” along a timelike curve. It is called the proper time along that curve. Different paths connecting the same pair of events in general have different elapsed proper times. This is called the twin “paradox”, but it is only paradoxical from a Newtonian absolute time perspective. From the SR point of view it is perfectly natural. The analogy with path length in Euclidean geometry is perfect: there is nothing paradoxical about the fact that different curves connecting the same two points can have different lengths.

10.1.2 Pythagorean theorem of spacetime The spacetime interval, or just “interval”, is the thing that determines the structure or “geometry” of spacetime. It defines the lightcone, proper times, lengths, and the inertial structure as well. Logically, one should just postulate it, and derive consequences. But we can also infer its form and its properties, just by appealing to the postulates of relativity and applying them to what are assumed to be inertial motions, which are a preferred class of timelike paths in spacetime. We also call these paths “inertial observers”. We assume that spacetime has the same properties at any location. Consider two inertial observers O1 and O2 who pass through the same event E and are moving relative to each other (see Figure 1). Let the zero of proper time (hereafter often just called “time”) correspond to the event E for both O1 and O2. At time t1 from E along his worldline, O1 sends a light pulse to O2. The pulse is received at event F at time t0 on O2’s worldline, and the reflected pulse arrives back to O1 at t2. Then the principle of relativity implies that t0/t1 = t2/t0. (29)

The reason is that each pair of times, (t1, t0) and (t0, t2), is defined by a sim- ilar protocol: the relative motion is the same, and the events are connected by a light pulse. The only thing that changed is the length of the initial time interval, t1, or t0. Since no observer is preferred or distinguished, and

18 Figure 1: Two inertial worldlines exchanging lightrays. t0 is the time along O2 from E to F. t1 and t2 are the times along O1 from E to the sending of the light ray and to the receiving of the return light ray, respectively. The assumptions that all inertial motions are equivalent, that the speed of light is independent of the 2 source, and that spacetime is homogeneous, imply that t0 = t1t2. spacetime looks the same everywhere1 it must be that these times have the same ratio, otherwise one of the observers could be distinguished as the one with the smaller ratio. The ratio characterizes the relative motion. (It is the reciprocal of the Doppler shift factor for light, as shown in a homework problem.) Using the equivalence of the ratios (29), we can infer the “radar relation” between the time measurements of O1 and O2, namely,

2 t0 = t1t2. (30) From the radar relation we can deduce how the time and space separations assigned to the events E and F by O1 are related to O2’s proper time interval

1I need to do a more complete job of explaining how this implies that the length of the initial time interval cannot affect the ratio. I will revise the notes later.

19 t0. O1 would define the “time separation” ∆t of the events E and F to be the time at the event G that lies halfway in between t1 and t2, i.e.

∆t = (t1 + t2)/2. (31)

Similarly, O1 would define the “distance” ∆x from himself to F, when he is at G, by the light travel time (t2 − t1)/2 times the speed of light c, i.e.

∆x = c(t2 − t1)/2. (32)

We can invert these definitions to find t1 = ∆t − ∆x/c and t2 = ∆t + ∆x/c, so (30) implies 2 2 2 t0 = ∆t − (∆x/c) . (33)

Thus the proper time t0 of O2 along the direct path from E to F can be expressed in terms of the ∆t and ∆x coordinate increments, conventionally defined by O1, by a kind of spacetime Pythagorean theorem.

10.1.3 Time dilation and the twin effect

According to (33), the proper time t0 measured by O2 along his own path is shorter than the time ∆t assigned to that path by O1. This is the called the time dilation effect. Note that there is nothing paradoxical about the fact that the two observers come up with different times, since they are measuring different things: O2 measures the proper time along his path from E to F, while O1 measures the proper time along his path from E to G. Suppose now that O2 suddenly fires a rocket at F and returns to the inertial path of O1 at H, having travelled along a new inertial path. On this return path O2 will again measure less proper time than O1 assigns, so the total round trip proper time for O2 along the path EFH will be less than the proper time for O1 along EH. This this the twin effect, which is often called the “twin paradox”. It is not a paradox however, because the two proper times refer to different paths.

10.1.4 Spacetime geometry There is of course nothing special about O1. Another inertial observer O3 would define different coordinate increments ∆t0 and ∆x0 to the displacement from E to F, but must get the same combination for the right hand side of

20 2 (33), since it must be again equal to the square of O2’s proper time, t0. That is, ∆t02 − (∆x0/c)2 = ∆t2 − (∆x/c)2. (34) This invariant combination of coordinate increments is called the “(squared) spacetime interval”. Its meaning is the proper time along the direct path between E and F. Sometimes the spacetime interval is just called the “interval”, and some- times the “invariant interval”. Sometimes it is defined with the opposite sign, and sometimes multiplied by c2 (hence given in length rather than time units), or both. For timelike displacements the squared interval is positive as I’ve defined it, while it is negative for spacelike and zero for lightlike ones. For spacelike displacements, the meaning of the interval is −c−2 times the square of the spacelike distance, as assigned by an observer for whom the time separation is zero. The interval defines the geometry of spacetime in special relativity. It is quite remarkable and deep that the structure defining time and length in rel- ativity is one and the same structure. In Newtonian spacetime, by contrast, time intervals are defined by the universal time function, and space intervals are defined by a completely distinct structure, namely a Euclidean geometry on each constant time surface. However, although they are interconnected, time and length are not on an equivalent footing in relativity. Time intervals are primary, and directly measurable by a single clock, while space intervals are calculated from radar time measurements. The unification of space and time geometry comes about by reducing spatial intervals to temporal ones.2 A lightlike displacement can be seen either as a limit of timelike displace- ments, or a limit of spacelike ones, so it would make no sense for it to have either an associated proper time or a length. In fact, a lightlike displacement has ∆x = c∆t, so the interval along it is zero, which is the only value that can consistently refer to both a time and a length! For infinitesimal displacements, and including all three dimensions of space, the spacetime interval takes the form

ds2 = dt2 − dl2/c2, (35) where ds2 denotes the squared interval, and dl denotes the the spatial dis- placement distance, which could be in any spacelike direction. The assump-

2A better name for spacetime might therefore be “timespace”, but I only know one physicist who uses this term (from whom I learned it): David Finkelstein.

21 tion that the properties of spacetime are invariant under translations as well as rotations around any point implies that the spatial line element dl2 de- termines a flat, Euclidean geometry. This in turn implies that there exist coordinates x, y, z in terms of which the interval takes the form

ds2 = dt2 − (dx2 + dy2 + dz2)/c2. (36)

The coordinates {t, x, y, z} are called Minkowski coordinates, and are analo- gous to the Cartesian coordinates of Euclidean geometry. A different inertial observer would construct a different set of Minkowski coordinates. The re- lation between the different inertial coordinates is given by a Lorentz trans- formation.

10.1.5 Velocity and time dilation So far the word “velocity” has not figured in anything I said. O1 would define the velocity of O2 as v = ∆x/∆t. In terms of v, the square root of (33) becomes p 2 2 t0 = ∆t 1 − v /c , (37) which is the famous relativistic time dilation formula: the proper time t0 measured by O2 along his own path is shorter than the time ∆t assigned to that path by O1. Notice something quite shocking that emerges from this analysis: for a fixed ∆t, t0 goes to zero as v approaches the speed of light. This means in particular that the proper time on the path EFH in the twin effect discussion can be arbitrarily close to zero. Also, no proper time passes along the path of a light ray.

10.2 Inertial motion So far I’ve treated the notions of “inertial motion” and “proper time” in an axiomatic way, like the “straight line” and “length” of axiomatic Euclidean geometry. In the geometry setting, you know that a straight line is the shortest path between two points, so the concept of straight line can be taken as secondary, being defined in terms of its length property. Similarly in spacetime, an inertial motion can be defined in terms of its proper time property.

22 The twin effect discussed in section (10.1.3) shows that the inertial path taken by O1 from E to H has greater proper time than the broken path EFH taken by O2. This can be generalized to the statement that the inertial path has longer proper time than any other path. The inertial motions can thus be characterized as those that maximize the inertial time between two events. So no extra structure is needed to characterize inertial motion, since it is already determined by the spacetime interval. This makes even more striking the economy of structure in relativity compared to Newtonian mechanics, where not only are time and space determined by entirely separate structures, but inertial structure is yet another independent ingredient. In relativity, everything needed for mechanics comes from the interval. It seems that an essential aspect of the progress of physics is the economizing of structure. Since the proper time between two events is maximized on an inertial path, the variation of the proper time must be zero when the path is varied away from an inertial path. This variational principle leads us to the rela- tivistic action and Lagrangian for a free particle, which allows us to identify the relativistic notion of energy. Let’s see how this works. The proper time along an arbitrary smooth√ path in spacetime is the integral of the proper time increment ds = ds2, Z proper time = ds (38)

In terms of a particular Minkowski coordinate system {t, x, y, z}, the interval is given by (36), so the proper time can be expressed as an integral over the coordinate time t, v u " 2 2 2# Z u dx dy  dz  1 proper time = dtt1 − + + , (39) dt dt dt c2 with the spacetime path specified by the three coordinate functions {x(t), y(t), z(t)}. Stationarity of the proper time with respect to path variations implies the Euler-Lagrange equations. These equations imply dxi γ = const. (40) dt where xi ↔ {x1, x2, x3} = {x, y, z} labels the three coordinates, and 1 γ = (41) p1 − v2/c2

23 is the relativistic gamma factor, with v2 = vivi, and vi = dxi/dt. (42) Eq. (40) implies that all three components of the coordinate velocity are constant. This establishes that inertial motions have constant Minkowski coordinate velocity components.

10.3 Action What can the action for a relativistic free particle be? It should be a rela- tivistic invariant, and the Lagrange equation should imply that the motion is inertial. We have just seen that the proper time functional has both of these properties. However, the proper time does not have dimensions of action. There is another reason the proper time cannot be the correct action: it does not depend on the mass of the particle. We know that the action in non-relativistic mechanics does depend on the particle mass, via the kinetic 1 2 energy term 2 mv , and this kinetic energy term must arise in a slow motion limit of the relativistic theory. Both of these problems are solved if we define the relativistic action to be −mc2 times the proper time: Z S = −mc2 ds. (43)

The action as just written does not refer to any particular inertial frame, because the interval ds is a “manifestly invariant” quantity. Nonetheless, we may express the action in terms of the time and distance measurements in a particular inertial frame by using (36). This yields Z S = L dt, L = −mc2p1 − v2/c2, (44) where L is the Lagrangian in the given inertial frame. To discover the relation to the non-relativistic action, we should expand the square root in powers of v2/c2: r v2 1 v2 1 v4 1 − = 1 − − + ... (45) c2 2 c2 8 c4 The expansion of the relativistic Lagrangian (44) is thus

2 1 2 1 4 2 L = −mc + 2 mv + 8 mv /c + .... (46)

24 The non-relativistic kinetic energy appears as the lowest order velocity de- pendent term, and the higher powers of velocity are relativistic corrections. The velocity independent term −mc2 can only be interpreted as minus a constant potential energy associated with the mass of the particle. That is, just for showing up, a particle of mass m has an energy mc2. And this really is potential energy: if the particle decays to other particles, or annihilates with its antiparticle, some or all of this energy can be liberated as kinetic energy, for example. This is called the rest energy. Often m is called the rest mass.

10.4 Energy and momentum The momentum conjugate to xi is ∂L p = = γmv , (47) i ∂vi i where γ is the “gamma factor” defined in (41). When the speed of the particle is much less than the speed of light, this reduces to the nonrelativistic momentum mvi, and when the speed approaches the speed of light this goes to infinity. The energy can be computed as the value of the Hamiltonian, ∂L H = vi − L = γmv2 + mc2/γ = γmc2(v2/c2 + 1/γ2) = γmc2, (48) ∂vi hence E = γmc2. (49) When the velocity is zero this is just the rest energy, and when the velocity approaches the speed of light this diverges. To identify the non-relativistic limit we should expand γ in powers of v2/c2: 1 1 v2 3 v4 γ = = 1 + + + .... (50) p1 − v2/c2 2 c2 8 c4 Note that all the terms in the series have positive coefficients. Thus 2 1 2 3 4 2 E = mc + 2 mv + 8 mv /c + .... (51) The usual kinetic energy is recovered as the lowest order v-dependent term, and the relativistic correction terms are all positive. The relativistic kinetic energy T is everything but the rest energy, T = E − mc2 = (γ − 1)mc2. (52)

25 10.4.1 Relation between energy and momentum Just as the non-relativistic kinetic energy can be expressed in terms of the momentum as E = p2/2m, it follows from (47) and (49) that the relativistic energy and momentum are related in a simple way: E2 − p2c2 = m2c4 (mass shell formula) (53) Note that while the values of E and p depend on the inertial reference frame, the mass m, which we introduced in the action as an invariant, can always be computed from them using the mass shell formula. This is closely analo- gous to the situation with the proper time: while dt and dxi depend on the reference frame, the squared proper time ds2 = dt2 − dx2/c2 has an invariant meaning and can be computed from dt and dx in any reference frame. Another useful relation between momentum and energy that follows im- mediately from (47) and (49) is pi = (E/c2)vi. (54) In the non-relativistic limit we can replace E by the leader order term, mc2, so this reduces to pi = mvi. It is sometimes useful to use this to express the velocity directly in terms of the energy and momentum, vi = pic2/E. (55)

10.4.2 Massless particles As the mass m approaches zero, the energy and momentum vanish unless the speed approaches the speed of light, so that the product γm remains finite. In this limit, the mass shell formula reduces to E = |p|c, (56) so that energy and momentum are proportional. In this limit, (54) becomes pi = (E/c)ˆvi, (57) wherev ˆi is a unit vector in the direction of the velocity. Since massless particles always travel at speed c, their speed cannot de- termine their energy. A photon is a massless particle. According to quantum mechanics, the energy of a photon with frequency ω ishω ¯ , and its momentum ish ¯k, where k is the wave vector. The 4-momentum is thus p =hk, ¯ (58) where k = (ω, ck) is the wave 4-vector.

26 10.5 4-vectors and the Minkowski scalar product The set of spacetime displacements forms a four dimensional real vector space. These vectors are called 4-vectors. Shortly we will consider other 4- vectors, but the displacements are the prototype 4-vectors so let’s start with these. In a given inertial frame, a displacement can be specified by giving its time component and its spatial components. An infinitesimal displacement can thus be written in a given frame as

ds = (dt, dx/c), (59) where dt is the time displacement and x is the spatial displacement. In or- der for all the components of the vector to have the dimensions of time, the spatial displacement is here divided by c. (It’s not essential that we give all components the same dimension, but it allows us to define the Minkowski scalar product below without any factors of 1/c.) The line under ds√is in- cluded to distinguish the displacement 4-vector from the scalar ds = ds2. Any finite displacement can be built up by adding infinitesimal displace- ments. This is just as in three Euclidean spatial dimensions, with an addi- tional dimension for time tacked on. More generally, a 4-vector A can be specified in a given frame by its temporal and spatial components,

A = (At, A), (60) where At is a spatial scalar (i.e. it is invariant under spatial rotations of the frame) and A is a spatial vector.

The object A qualifies as a bona fide 4-vector if, under a coordi- nate transformation, its components change in the same way as the components of a displacement vector.

Note that this implies in particular that any linear combination of 4-vectors with invariant scalar coefficients is a 4-vector. The spacetime (squared) interval ds2 = dt2 − (dx · dx)/c2 is a scalar associated with the displacement 4-vector, and it is invariant, that is, it has the same value whatever inertial frame is used. It motivates the definition of a scalar product between 4-vectors, the Minkowski scalar (or dot) product,

A · B = AtBt − A · B. (61)

27 With this notation, the squared interval can be written as the scalar product of the displacement (59) with itself, ds2 = ds · ds = dt2 − dx · dx/c2. (62) We have already seen that the interval is invariant, but how about the scalar product A · B between two displacement 4-vectors A and B? That is, if we evaluate the right hand side of (61) in two different inertial frames, will we get the same result? Indeed we will. We know that A·A is invariant for all displacements A. Thus in particular, (A + B) · (A + B) is invariant. But the dot product defined by (61) is distributive over addition, and commutative, 1 so (A + B) · (A + B) = A · A + B · B + 2A · B. That is, A · B = 2 [(A + B) · (A + B) − A · A − B · B]. All terms on the right hand side are invariant, so evidently A · B is invariant. More generally, Invariance of the scalar product holds for any pair of 4-vectors.

10.5.1 Conventions There are different conventions about 4-vectors. Taylor prefers to write the spatial vector first, and he defines the inner product with the opposite sign from (61). That is, Taylor would write A = (A,A4), and for him A · B = A·B−A4B4 (15.50, Taylor). Also, he likes to give the 4-vector the dimensions one would naturally have assigned to the spatial 3-vector. For example, for an infinitesimal spacetime displacement he would write (dx, c dt).

10.5.2 “Look Ma, no Lorentz transformations” Just as we rarely use rotations explicitly in non-relativistic mechanics, but instead make wise choices of coordinate systems and use rotational invariant quantities like magnitudes of vectors and angles between vectors, we rarely need to use Lorentz transformations to relate the components of 4-vectors in different reference frames. To simplify our lives, and focus on the most useful things, I may completely skip any discussion of Lorentz transformations.

10.6 4-momentum and 4-velocity Energy and momentum together form the components of the 4-momentum, which is sometimes called the energy-momentum 4-vector, p = (E, pc). (63)

28 It is convenient to use the letter p for the 4-momentum. When I want to refer to the magnitude of the 3-momentum and there could be some confusion, I will write |p|. The scalar product of p with itself is

p · p = E2 − p · p c2. (64)

For a single particle of mass m, the 4-momentum is given by

p = (γmc2, γmcv). (65)

It is revealing to express this directly in terms of spacetime displacements. The key step is to note that the gamma factor is the derivative of coordinate time with respect to proper time: dt dt 1 = = = γ. (66) ds pdt2 − dx · dx/c2 p1 − v2/c2 It follows that dx dt dx = = γv. (67) ds ds dt That is, the velocity with respect to proper time is γ times the velocity with respect to coordinate time. The 4-momentum of a particle (65) can therefore be expressed as p = mc2 u, (68) where u is the 4-velocity, dt dx/c u = ds/ds = ( , ) = γ(1, v/c). (69) ds ds The 4-velocity is proportional to the infinitesimal displacement ds, with pro- portionality factor given by the scalar quantity 1/ds, so it is a 4-vector. The 4-momentum is proportional to the 4-velocity, with scalar coefficient mc2, so it too is a 4-vector. According to the mass shell formula (53) for a single particle together with (64), we have p · p = m2c4. (70) The mass shell formula could also be derived directly from (68), since the 4-velocity is a unit 4-vector:

u · u = (ds/ds) · (ds/ds) = (ds · ds)/ds2 = ds2/ds2 = 1. (71)

29 The name “mass shell” comes from the fact that the set of vectors p satisfying (70) forms a shell in momentum space. In one space dimension we have E2 = p2c2 +m2c4, which is the equation of a hyperbola in E-p space. Including the other spatial dimensions this becomes a 3-d hyperboloid, whose graph looks like a bowl or “shell”.

10.6.1 Pesky factors of c The best way to handle the ubiquitous, pesky factors of c is to ignore them! We can always choose our unit of length to be c times our unit of time, and in such a system of units we have c = 1. If we want to express things in some other system of units we can always use dimensional analysis to insert the appropriate factors of c where they belong. Hence from here on I will usually set c = 1. (72)

10.6.2 Example: Relativistic Doppler effect Suppose a source S moving with speed v in the x direction emits an electro- magnetic wave, or photon, which is received by an observer O who for whom the radiation propagates at an angle θ from the x direction. What is the observed freqency ωO if the frequency in the source frame is ωS? If θ is 0 or π, then the analysis you did in homework problem S8.1 gives the answer. Let’s see how it works in general. The wave 4-vector is k = ωO(1, cos θ, sin θ, 0) written in the observer frame. Now I claim that the frequency in the source frame can be expressed as

ωS = k · uS, (73) where uS = γ(1, v, 0, 0) is the 4-velocity (69) of the source. To see why, note that the right hand side of (73) is an invariant, since k and u are 4- vectors.3 It can therefore be evaluated in any frame. In the rest frame of S we have uS = (1, 0, 0, 0). Hence ωS is just the time component of k in the frame of S, which is what we mean by the frequency in that frame.

3If we think of k as a 4-momentum ¯hk (58) we may invoke the fact that 4-momenta are 4-vectors. Alternatively, k = (ω, k) determines the phase ωt − k · x of a wave, which can be expressed as a scalar product of k with a displacement, (ω, k) · (t, x). Since the wave phase is an invariant, and the displacement is a 4-vector, k must also be a 4-vector.

30 On the other hand, expressed in components in the frame of O we have ωS = k · us = ωγ(1 − v cos θ). It follows that ω ω = S . (74) γ(1 − v cos θ)

In the homework you show that this agrees with what you found before when θ is 0 or π. Note that when θ = π/2 there is a redshift. This is called the transverse Doppler effect, and is just a reflection of the time dilation effect relating the period measured in the source frame to that measured in the observer frame.

10.7 Zero momentum frame

In non-relativistic mechanics, the center of mass position xcm of a system P is defined by mi(xi − xcm) = 0. The time derivative of this equation states that the total momentum relative to the center of mass vanishes, hence the center of mass frame is also the zero momentum frame. In relativistic mechanics, the zero momentum frame is a very useful concept...but how do we know such a frame exists? Well, the 4-momentum of a particle is a future pointing timelike or lightlike vector, and the sum of any number of such vectors is timelike (unless they are all lightlike and parallel). Thus there exists an observer with 4-velocity parallel to the total 4-momentum P , and for that observer the total spatial momentum vanishes. The frame of that observer is the zero momentum frame, also called the “center of momentum” frame, or even the “center of mass” frame, just so we can use the notation CM instead of ZM. In the CM frame, the total 4-momentum has the form P = (ECM , 0), so

2 P · P = ECM . (75)

This is very useful. Because P · P is invariant, we may compute it in any frame, and the result will always be the square of the energy in the CM frame. Sometimes ECM is called the “invariant mass” of the system.

10.7.1 Example: Head-on vs. fixed target collision energy At the LHC, protons of energy 4 TeV collide head-on. The lab is the CM frame of this collision, and ECM = 8 TeV. All of this energy is available

31 to create particles. Now suppose instead that one proton is a fixed target at rest and the other is moving in the lab frame. Then the system has 3-momentum, which must be conserved, so after the collision some of the incoming energy will just be in translational kinetic energy of the collision products. In this case, how much energy E must the other proton come in with if the CM energy, i.e. the energy available to create particles, is to be 8 TeV? To answer this we can find the total 4-momentum as a function of 2 2 E and the proton mass mp, set P · P = ECM = (8 TeV ), and solve for E. The proton at rest has 4-momentum (mp, 0), and the moving proton has 4-momentum (E, p), so the total 4-momentum is P = (E + mp, p). Thus

2 2 P · P = (E + mp) − p (76) 2 2 2 = E − p + 2Emp + mp (77) 2 = 2Emp + 2mp (78) 2 = ECM , (79) so E2 − 2m2 E = CM p . (80) 2mp

For the example at hand, ECM is 8 TeV which is about 8000 times as large as the proton mass mp, so we can neglect the second term in the numerator, 2 and write E ≈ ECM /2mp. That is, the incoming energy must be a factor ECM /2mp ∼ 4000 times larger than ECM ! It would take 32,000 TeV to achieve the same CM energy 8 TeV as in the head-on collision. So to achieve 8 TeV CM energy it’s absolutely essential that the LHC has head-on collisions rather than fixed target ones. It’s interesting to compare this with the corresponding Newtonian result. If the moving proton has speed v, then in the CM frame it has speed v/2. Thus its kinetic energy in the lab frame is 4 times larger than it is in the CM frame, where it is ECM/2. Hence E = 2ECM , that is, the incoming proton would have to have energy 16 TeV. In relativistic mechanics, it needs 2000 times more energy than it would need in Newtonian mechanics!

10.7.2 Example: GZK cosmic ray cutoff A proton is accelerated to very high energy somewhere far away in the uni- verse, and travels towards the earth. On the way it encounters photons form the cosmic microwave background (CMB). The background is thermal with a

32 temperature T = 2.7 K, so the typical photon energy in the CMB is of order 2.5 × 10−4 eV. If the proton has enough energy, the collision can produce a pion of mass mπ = 135 MeV, robbing the proton of some of its energy. This process limits the energy of cosmic ray protons that can be received at earth coming from far away. Let’s estimate the energy of this “GZK cutoff”. The cutoff lies at the threshold for a head-on collision creating a pion at rest with respect to the proton, so corresponds to a CM energy

ECM = mp + mπ. (81)

If the proton 4-momentum is p = (E, p), and the photon 4-momentum is k = (ω, −ω) (dropping the two irrelevant spatial components that are zero), 2 then the total 4-momentum is P = p + k = (E + ω, p − ω), so ECM = 2 2 2 (E + ω) − (p − ω) = mp + 2ω(E + p). Thus at threshold we have

2 2 2 (mp + mπ) − m m m + m /2 E + p = p = p π π . (82) 2ω ω

p 2 2 Now let’s get specific about numbers. We have p = E − mp, but it would 2 be foolish to keep track of the mp term in the square root, because mπ/ω ∼ 11 11 2 5 × 10 , so the equation implies E ∼ 10 mp. The mp term thus makes a contribution of order only ∼ 10−22 relative to the E2 term! Hence we can set p = E, and solve for E. Also, mπ is about 7 times smaller than mp, so 2 the mπ/2 term is smaller than the mpmπ term by a factor of about 14. So to better than 10% accuracy we have m m E = p π ≈ 2.5 × 1011m ≈ 2.3 × 1020eV. (83) 2ω p This is eight orders of magnitude greater than the LHC collision energy.

10.8 Electromagnetic coupling The action for the coupling of a charge needs no “relativistic correction”, its already perfectly consistent with relativity, which is no accident, since after all it was the properties of electromagnetism that led Einstein to discover relativity. The EM coupling action is given by −q R A·ds, where A = (V, Ac) is the electromagnetic 4-vector potential, and ds = (dt, dx/c) is the spacetime translation and the dot is the Minkowski dot product. [I didn’t say this yet in class, but actually an even better way to say this is that the action is

33 R µ µ −q Aµdx , where there is a summation over the four values of µ, dx are the components of ds, and Aµ = (V, −cAi), the index i being just the spatial part.

11 General relativity

I’ll place here for now my rough notes from last year on general relativity. I haven’t had a chance to rewrite them yet. I touch on GR in the class because it’s so fundamental and so interesting, but we’re not spending more than one and a half classes on it (3 hours)...

Background: Newtonian spacetime structure assumes 1) absolute time t, 2) spatial distance at constant time, 3) absolute rest or family of inertial frames. Instead spacetime in special relativity is fully characterized by the Minkowski line element which determines the proper time along any displace- ment. This encodes time, distance, and inertial structure all in one spacetime geometry. (The inertial motions maximize the proper time.) Now where does gravity fit in to this?

Gravity and inertial force: Einstein focused on the extremely well known fact that the gravitational force is proportional to the mass of the object it is acting on: F = mg, where g(x, t) is the gravitational field. This means that the effects of gravity can be locally removed by using a ”freely falling” reference frame with acceleration g relative to what a Newtonian would con- sider an inertial frame. But Einstein proposed that we should think of it the other way around: the freely falling frame is the inertial one, and then one interprets the gravitational force as an inertial force, due to working in a reference frame with acceleration −g. So, for example, sitting in my chair, I am in a frame accelerating upwards relative to the local inertial frames.

Gravity as tidal field: While the local inertial frames can be identified with the freely falling frames, we must face the fact that these frames are not the same everywhere. For example, at different points near the surface of the earth the free-fall frames are falling inward radially, and the radial direc- tion depends on where you are. Also the acceleration is greater closer to the earth than farther. This is reflected in the simple fact that the derivatives of g are not zero, so that nearby freely falling particles have slightly differ-

34 ent accelerations. You could recognize this in a falling elevator: if release a spherical cluster of particles, as the cluster falls it will deform to an ellipsoid, compressed in the transverse direction and stretched in the falling direction. The true essence of gravity is this ”tidal deformation”. If it weren’t for that, we could just cancel off gravity once and for all by changing the reference frame.

Spacetime curvature and the tidal field: Given that the inertial structure of spacetime is determined in special relativity by the line element, it must be that a spatially varying inertial structure is described by a spatially vay- ing line element, that is, by a deformation of the geometry of spacetime. In fact, the curvature of the spacetime geometry captures the notion of varying inertial structure. As a concrete example, freely falling paths can start out parallel in spacetime, and be pulled together by the gravitational tidal field. That parallel lines do not remain parallel is a sign of curvature. The motion of a test particle in such a spacetime is determined by maximizing the proper time, using the line element of the curved geometry.

Cosmological line element: A simpler example of a curved spacetime is an expanding universe. If we average over the lumpiness this can be described as a homogeneous, isotropic spacetime, with line element

ds2 = dt2 − a(t)2(dx2 + dy2 + dz2)/c2 (84)

The function a(t) is called the scale factor, and it determines how much phys- ical distance corresponds to a given coordinate displacement dx, for example. Before the acceleration of the universe today was discovered, it was believed that a(t) was ∝ t2/3, so that the scale factor was increasing with time with a rate ∝ t−1/3 that was decreasing in time. This would be “decelerated ex- pansion”. Now it appears that infact the expansion rate is increasing. The simplest such increase, that would be caused by a cosmological constant, would be exponential, a ∝ eHt for some constant H, in which case the ex- pansion rate would be exponentially increasing as well.

Spacetime geometry outside a spherical gravitating mass: Einstein’s field equation for spacetime geometry has a unique spherically symmetric, vac- uum solution, up to one parameter corresponding to the mass M. That Schwarzschild metric can be expressed using so-called Schwarzschild coor-

35 dinates as

ds2 = F (r)dt2 − (1/F (r))dr2/c2 − (r/c)2(dθ2 + sin2θ dφ2) (85)

2 where F (r) = 1−rg/r, and rg = 2GM/c = 3km(M/Msun) is the Schwarzschild radius. If M = 0 then F (r) = 1, and this is just the flat spacetime, Minkowski line element in spherical coordinates. At r = rg something goes wrong with the coordinates, but the spacetime is fine there. This line element describes a black hole event horizon at r = rg. For a star, the stellar surface lies outside rg, and the line element inside the star is not given by the Schwarzschild metric.

Newtonian limit of particle motion in the Schwarzschild field: The action for a particle of rest mass m is −mc2 R ds. For the Schwarschild geometry this gives Z S = −mc2 ds (86) Z q = −mc2 F dt2 − (1/F )dr2/c2 − (r/c)2(dθ2 + sin2 θ dφ2) (87) v u 2 2 2! Z u 1 dr r2 dθ dφ = −mc2 dttF − − + sin2 θ (88) F c2 dt c2 dt dt

If we restrict attention to values of r such that rg/r  1, and values of the velocity that a much less than the speed of light, we may expand the square root and drop all but the leading order terms in rg/r and v/c, in which case the action becomes Z 2 2 1 2 2 = −mc dt[1 − GM/(c r) − 2 v /c + ...] (89) Z 2 1 2 = dt[−mc + GMm/r + 2 mv + ...]. (90)

This shows that the Lagrangian is a constant −mc2 plus the Newtonian La- grangian, plus corrections.

Gravitomagnetism: This is not something I addressed in class, but it seems worth mentioning for those who are interested. I just showed that for

36 slow motion in a weak field we recover Newtonian gravity. Now we can see that for weak gravitational fields there is a phenomenon that looks like a gravitational version of magnetism. If we denote the Minkowski metric by ηµν and the metric perturbation by hµν, the proper time becomes q µ ν ds = (ηµν + hµν)dx dx . (91)

If we expand this in hµν and assume low velocities it becomes

p µ ν µ 1 ds = ηµνdx dx + Aµdx + ..., with A0 = 2 h00 and Ai = h0i. (92) So the 0i off-diagonal components of the metric perturbation act like magnetic vector potential. Why would we have such components? If the source mass is moving relative to a given frame, then such components arise. For example, a spinning body like the earth produces a gravitomagnetic vector potential.

12 Hamiltonian formalism

Lagrange’s equations are n coupled second order ODEs for the n generalized coordinates qi. We could always rewrite this as 2n coupled first order equa- tions, by defining new variables vi byq ˙i = vi, and replacing allq ¨i byv ˙ i. But there is a much better way to proceed in general, which is to use not vi but i rather the conjugate momenta pi = ∂L/∂q˙ . Better in what sense? Well, there are numerous potential advantages: (i) the form of the equa- tions may be simpler, (ii) the conservation laws are simpler to exploit, (iii) the i resulting flow in the phase space of (q , pi) coordinates is volume preserving (this is Liouville’s theorem), which is a powerful piece of information about time evolution, (iv) there is a general solution method (Hamilton-Jacobi), (v) it can sometimes provide an convenient approximation method because of certain approximate conserved quantities that are easy to get your hands on, (vi) it has a larger symmetry generalized coordinate changes, under which coordinates and momenta can be mixed, which is sometimes useful in solv- ing problems, (vii) it is characterized by a simple and elegant mathematical structure, namely Poisson brackets, that turn out to provide the deepest link between classical mechanics and the corresponding “quantized” systems.

37 12.1 Hamilton’s equations and the Hamiltonian First let’s recall a mathematical fact that deserves to be marveled over: If f = f(x, y) is a function of two variables, then ∂f ∂f df = dx + dy. (93) ∂x ∂y Conversely, if df = a dx + b dy, then a = ∂f/∂x and b = ∂f/∂y. Now the idea is to find a way to express the content of Lagrange’s equa- tions in terms of derivatives of a function of q and the momentum p = ∂L/∂q˙. For this to work, it must be possible to invert the definition of p and solve forq ˙ in terms of q and p, i.e.q ˙ =q ˙(q, p, t). The strategy is to start with dL and massage it until it’s expressed in terms of dq and dp instead of dq and dq˙. Let’s assume for simplicity at first that L does not depend explicitly on t. Then we have ∂L ∂L dL = dq + dq˙ (94) ∂q ∂q˙ ∂L = dq + p dq˙ (95) ∂q ∂L = dq − q˙ dp + d(pq˙). (96) ∂q (In the last step I used d(pq˙) = p dq˙ +q ˙ dp.) In terms of the Hamiltonian H = pq˙ − L, (97) this can be re-expressed as ∂L dH =q ˙ dp − dq. (98) ∂q We can now regard q and p as the independent variables, viewing H = H(q, p) as a function of them, provided p(q, q˙) can be inverted to solve forq ˙(q, p). The final step is to invoke Lagrange’s equation of motion, ∂L/∂q =p ˙, so that (98) becomes dH =q ˙ dp − p˙ dq. (99) We’ve now arrived at Hamiltonian form of the equations of motion. The coefficients of dp and dq in (99) are the partial derivatives of H with respect to p and q, hence ∂H ∂H q˙ = , p˙ = − . (100) ∂p ∂q

38 These are called Hamilton’s equations, or the canonical equations. Their simple and symmetric structure leads to much magic. The time derivative of the Hamiltonian is dH ∂H ∂H ∂H ∂H ∂H ∂H = q˙ + p˙ = − = 0, (101) dt ∂q ∂p ∂q ∂p ∂p ∂q so the Hamiltonian is conserved when the equations of motion hold, provided it has no explicit time dependence. Allowing for t dependence in the above general derivation, there would be a (∂L/∂t) dt term in dL and dH, and we would infer that dH ∂H ∂L = = − . (102) dt ∂t q,p ∂t q,q˙ (The subscripts q, p and q, q˙ on the partial derivatives indicate which variables are held fixed when the partial t derivative is taken.) If there are multiple coordinates qi, then everything just said goes through with summation over repeated indices. In particular, Hamiltonian is then

i H = piq˙ − L, (103) and Hamilton’s equations read

i ∂H ∂H q˙ = , p˙i = − i . (104) ∂pi ∂q The generalized coordinates qi define the configuration of a system. Config- uration space is the collection of all configurations. The space of coordinates i and momenta {q , pi} is called phase space. A point in phase space is a complete set of initial conditions for Hamilton’s first order equations (104). i Given a Hamiltonian, there is a “velocity” vector (q ˙ , p˙i) at each point in phase space. If the Hamiltonian is time-independent, there is a unique tra- jectory through each point, and the collection of all these trajectories is called the flow. The Hamiltonian is then conserved along the flow, so the flow lines are contours of constant H. An example will be given in the next section.

12.2 Example: Bead on a rotating circular hoop Let’s illustrate all these things with the system of a bead of mass m in a gravitational field g on a circular hoop of radius R. We’ll first consider the case where the hoop is driven at fixed angular velocity Ω, and then consider the case where the hoop has moment of inertia I and spins freely.

39 12.2.1 Driven hoop Using the angle θ from the bottom of the hoop as the generalized coordinate, the Lagrangian is

1 2 ˙2 2 2 2 L = 2 m(R θ + Ω R sin θ) − mgR(1 − cos θ). (105) To simplify the formulas and to better see the essence of the system, let’s choose units with m = R = g = 1 (you should check that this is possible), in terms of which the Lagrangian becomes

1 ˙2 2 2 L = 2 (θ + Ω sin θ) + cos θ − 1. (106)

1 2 At small angles sin θ ≈ θ and cos θ ≈ 1 − 2 θ , so this takes the approximate form 1 ˙2 1 2 2 L = 2 θ + 2 (Ω − 1)θ , (107) where the constant term has been dropped. For Ω < 1 = pg/R this is a harmonic oscillator around θ = 0, but for Ω > 1 it is unstable around θ = 0 and two new stable equilibrium points develop. For Ω = 1 the coefficient of θ2 vanishes, so the system is marginally stable. We have to go back to the exact Lagrangian to see what happens at the next order. Up through order θ4 we have sin2 θ = (θ − θ3/3!)2 = θ2 − θ4/3, and cos θ − 1 = −θ2/2 + θ4/4!, hence 1 ˙2 1 4 6 L = 2 θ − 8 θ + O(θ ). (108) Thus the system is indeed stable at θ = 0, but only very weakly. Now let’s consider the Hamiltonian. The momentum conjugate to θ is ˙ ˙ pθ = ∂L/∂θ = θ, so the Hamiltonian is

1 2 H = 2 pθ + Ueff,Ω(θ). (109) with the effective potential

1 2 2 Ueff,Ω(θ) = − 2 Ω sin θ − cos θ. (110) The equilibrium points lie where

0 2 Ueff,Ω(θ) = sin θ(1 − Ω cos θ) = 0. (111)

˙ 0 2 Hamilton’s equations are θ = p andp ˙ = −Ueff,Ω = sin θ(Ω cos θ − 1). The flow in phase space is illustrated in the Figure, for the three values Ω = 0, 1, 2.

40 Π Π Π Π Π Π Π Π Π Π Π Π - 3 - 3 - 3 - 3 - 3 - 3 2 -Π 2 0 2 Π 2 2 -Π 2 0 2 Π 2 2 -Π 2 0 2 Π 2

3 Π -Π Π Π Π 3 Π 3 Π -Π Π Π Π 3 Π 3 Π -Π Π Π Π 3 Π - - 0 - - 0 - - 0 2 2 2 2 2 2 2 2 2 2 2 2

Missing from these “phase portraits” are the arrows showing which direction the flow goes in. (The reason is that they are generated as contour plots of the Hamiltonian. Mathematica has a function called StreamPlot that produces the arrows, but the plots don’t show the unstable critical points of the flow as well.) The circulation is clockwise around the closed contours, as you can infer from the fact that θ increases when p = θ˙ > 0, for example. At the crossing points, the velocity goes to zero. These are unstable equilibria, and are called hyperbolic points. There are stable equilibria in the centers of the closed orbits, called elliptic points. When Ω > 1, their is sensitive dependence on initial conditions near the origin, in the following sense: if initially pθ = 0 and θ is small and positive, then the bead will oscillate around the equilibrium at positive θ. If instead initially θ = 0 and pθ is small and positive, then the bead will oscillate on an orbit spanning both the positive and negative equilibrium points.

12.2.2 Freely rotating hoop The bead on the driven hoop has just one degree of freedom, the angular position of the bead on the hoop. Let’s now consider the case where instead of being driven the hoop is freely rotating about the vertical axis. We can use the azimuthal angle φ for the second generalized coordinate. The Lagrangian for this system is 1 ˙2 1 2 ˙2 L = 2 θ + 2 (I + sin θ)φ + cos θ − 1, (112) where I is the moment of inertia of the hoop, and as before we take units with m = R = g = 1. The momentum conjugate to φ is 2 ˙ pφ = (I + sin θ)φ (113)

41 and (dropping the constant 1) the Hamiltonian is

1 2 1 2 2 H = 2 pθ + 2 pφ/(I + sin θ) − cos θ. (114) The phase space for this system is four dimensional, but has rotational sym- metry about the vertical axis, so φ doesn’t appear in the Hamiltonian (it’s called an “ignorable coordinate”), and therefore pφ is conserved. For a fixed pφ, dynamics of θ can be studied on its own, and from (114) we see that it is governed by the effective potential

1 2 2 Ueff,pφ (θ) = 2 pφ/(I + sin θ) − cos θ. (115) If the moment of inertia is much greater than 1, i.e. much greater than the maximum moment of inertia of the bead mR2, then the system behaves much like the driven hoop. To see this explicitly, we can expand the potential in powers of I−1 and drop all but the first order term. Up to an additve constant, the effective potential then becomes identical to the driven case, with Ω → pφ/I. To see the small θ behavior in the general case we can expand the effective potential around θ = 0,

1 2 2 2 Ueff,pφ (θ) = const. + 2 (1 − pφ/I )θ + .... (116)

This is identical to the case of a driven hoop, with Ω replaced by pφ/I. There will still be a stable equilibrium at θ = 0 if pφ/I < 1.

12.3 Example: Charged particle in an electromagnetic field The Lagrangian for a particle of mass m and charge e in an electromagnetic 1 i i i i field is L = 2 mx˙ x˙ + e(Aix˙ − V ), so the momentum conjugate to x is pi = mx˙ i + eAi. Note that this is gauge dependent. We can solve for the velocity,x ˙ i = (pi − eAi)/m, and thus the Hamiltonian is 1 H = (p − eA) · (p − eA) + eV. (117) 2m As an illustration consider a uniform electric field, described in two different ways: (i) by a linear scalar potential V = −E0x, and (ii) by a time-dependent vector potential A = −E0txˆ. In the first gauge, the Hamiltonian is H = 2 p /2m−eE0x. The Hamiltonian is time independent, so that it is conserved,

42 and it is equal to the kinetic plus potential energy. There is no x translation symmetry, and the canonical momentum in the x direction is not conserved. 2 2 2 In the second gauge, the Hamiltonian is H = [(px + eE0t) + py + pz]/2m. This is x translation invariant, so that now the (canonical) momentum is conserved, even though of course the velocity is not conserved. It is not time translation invariant however, so the Hamiltonian is not conserved. This is no surprise, since the numerical value of this Hamiltonian is just the kinetic energy.

12.4 Phase space volume and Liouville’s theorem A measure of volume in phase space is defined by the canonical coordinates, as if they were Cartesian coordinates, i.e. by one factor of dq dp for each pair of canonical coordinates. Since p = ∂L/∂q˙, each such factor has dimensions of energy times time. So the volume in a 2N dimensional phase space has dimensions [action]N .

Liouville’s theorem states that the volume of any region is preserved by the Hamiltonian flow. For example, consider a free particle H = p2/2, so q˙ = p andp ˙ = 0. A rectangle in phase space flows to a parallelogram with the same area. Liouville’s theorem in the general case follows from the fact that the flow vector has vanishing divergence, as I’ll now explain. Consider a two dimensional flow determined by a possibly time depen- dent vector field V(q, p, t). After an infinitesimal time dt, each point on the boundary of a region is displaced by Vdt, and the change of area of the en- closed region is the area of the strip formed by the new boundary and the original one. This is given by the integral of dlnˆ · V dt around the bound- ary, where nˆ is the unit outward normal vector from the boundary. By the two dimensional version of the divergence theorem, this boundary integral is equal to the integral of divV over the enclosed area, times dt. Thus if the flow vector has zero divergence, the flow is area preserving. The same argument works in any dimension. Now let’s examine the divergence of the Hamiltonian flow. For a single canonical pair the flow vector is given by

q p V = (V ,V ) = (q, ˙ p˙) = (∂pH, −∂qH), (118) and if there are more canonical pairs there are simply more components of the same form. The contribution to the divergence from each canonical pair

43 is ∂ ∂ divV = V q + V p (119) ∂q ∂p ∂ ∂H ∂  ∂H  = + − (120) ∂q ∂p ∂p ∂q = 0. (121)

The last line follows from the most important mathematical fact in physics: mixed partial derivatives commute. The flow in phase space is therefore divergence free, hence it is volume preserving.

12.5 Phase space and quantum mechanics The Heisenberg uncertainty relation, applied to generalized coordinates and their conjugate momentum, states that the product of “uncertainties” in q and in p can be no smaller than something of order Planck’s constant of action,h ¯. More precisely, if the uncertainties are rms variances, their product can be no smaller thanh/ ¯ 2 = h/4π. So, roughly speaking, a quantum system cannot be localized in phase space more than a certain minimum volume, and the number of distinguishable quantum states is proportional to the volume in phase space. More precisely, in a two dimensional phase space the area corresponding to one quantum state is h = 2πh¯. In a phase space of 2N dimensions, the volume corresponding to one quantum state is hN . Let’s illustrate this with a particle in a box. A quantum particle in a one dimensional box on the interval [0, a] has energy eigenstate wavefunctions of the form ∼ sin(px/h¯). Since the wave function must vanish at the box boundaries, the allowed values of p are pn = nπh/a¯ . As n increases by 1, p increases by πh/a¯ , corresponding to a phase space area of a∆p = πh¯ = h/2. But actually the particle with this wavefunction is really a superposition of a particle moving to the right and to the left, with both signs of momentum, so there is another h/2 area associated with the negative momentum part of the motion. The total area per state is thus h.4

4For a more generic argument, consider that the canonical commutation relations be- tween x and p are [x, p] = i¯h, which implies that the momentum operator is −i¯h d/dx, so a state with momentum p has the form exp(ipx/¯h). In order to completely distinguish two of these in a region of length a, the phase should differ by 2π, so the minimum gap in momentum ∆p to make a distinct state satisfies a∆p/¯h = 2π, i.e. a∆p = h.

44 Another simple example is a harmonic oscillator. The Hamiltonian in 1 2 1 2 units with m = ω = 1 is H = 2 p + 2 q , and the quantum energy levels are 1 En = (n + 2 )¯hω, for n = 0, 1, 2 ... . The classical orbits conserve H and so are circles in phase space. The area of one of these circles is π(p2 + q2) = 2πH = 2πE = 2πE/ω. (The factor of ω in the denominator is restored by dimensional analysis, as the phase space are has dimensions of action.) The difference in areas for two energies separated by ∆E is therefore 2π∆E/ω. Two adjacent quantum energy levels are separated by ∆E =hω ¯ , so again an area 2πh¯ = h corresponds to one quantum state. If we think of the classical phase space flow as a classical limit of a quan- tum evolution, then Liouville’s theorem is the classical limit of the property of quantum evolution (unitarity) that ensures the preservation of distinctions between quantum states. Note that the classical state of, for example, a two-particle system is a point in a twelve dimensional phase space (each particle has 3 q’s and 3 p’s). In quantum mechanics, the “state” of a two particle system is a wave function of six variables (for example, the positions or momenta of the two particles).

12.6 Entropy and phase space Boltzmann proposed in the 1870’s that entropy be identified with a constant k times the logarithm of the number of microstates compatible with the macro- scopic configuration, S = k ln W . Think of a box of gas. The huge number of molecules in the box live in a very high dimensional phase. One config- uration of the positions and momenta of all of the molecules corresponds to a point in the phase space. There are an infinite number of such points in phase space compatible with the macroscopic properties of the gas (e.g. temperature and volume), so an infinite number of “microstates”. But one can regulate this with the idea that the infinity is proportional to the phase space volume occupied by all these points. [I think Boltzmann described this in terms of “cells” in phase space.] The infinite proportionality factor becomes an additive constant after the logarithm is taken, so the entropy is well-defined up to an ambiguous additive constant. In quantum mechanics, the number of independent states is finite, and the volume is measured in units of hN , which removes the ambiguity of the additive constant. Let’s illustrate this with expansion of a gas. In free expansion, the phase space volume compatible with the macrostate increases, since the available spatial volume increases and the energy and therefore momentum distribu-

45 tion (assuming an ) stays the same. The entropy therefore goes up. For example if the volume of the container is doubled, the compatible phase space volume increases by a factor of 2N , so the entropy goes up by a factor N ln 2. The apparent violation of Liouville’s theorem arises because of the “coarse graining”: the original volume of phase space does not evolve to fill the final volume, unless you “blur” it out by coarse graining.) This process of free expansion is irreversible in practice, and the coarse graining involves loss of information, which accounts for the N ln 2 entropy increase. If the expansion is instead adiabatic, a slow process pushing against a slowly moving piston, with no heat transfer in or out of the gas, then then the gas remains in equilibrium at each step. The gas does work against the piston, transferring energy to it, decreasing the momenta (and lowering the temper- ature), which compensates the increased available volume, and the (coarse grained) phase space volume remains unchanged. This process is reversible, and entropy does not increase.

12.7 Adiabatic invariants When the Hamiltonian is time dependent, energy is not conserved. But if the change of the Hamiltonian is very slow compared to the period of a closed orbit, then there can be an approximate conservation law, for a quantity called an adiabatic invariant. The name implies that when the Hamiltonian changes very slowly, i.e. adiabatically, the quantity is unchanged. Examples of adiabatic changes are: a pendulum with a slowly varying length, an oscillator with slowly varying mass and/or spring constant, a particle in a box whose size is slowly varying, a charged particle in a slowly changing magnetic field. In each of these examples there is one or more parameters λi(t) that are changing slowly. For the process to be adiabatic, their fractional change over one period T should be small: ˙ ˙ λiT  λi, equivalently λi/λi  ω. (122) In this case, since the energy is changing only because of the parameter changes, it is plausible that the energy will change at a rate depending ap- proximately linearly on these quantities, and therefore there will be some function of E and λi that does not change. This is the adiabatic invariant. As we’ll see the adiabatic invariant has dimensions of action. If there is a unique way to form a quantity with dimensions of action with the quantities in the system, then dimensional analysis furnishes the adiabatic invariant.

46 Consider first the case of a system described by one generalized coor- dinate, whose phase space is two dimensional. Suppose the system starts somewhere on a closed orbit O1, and adiabatically evolves to another closed orbit O2 over a time ∆t. Then, since many cycles are completed before any appreciable change in the Hamiltonian occurs, all initial points on O1 have approximately the same “experience”, hence we expect any other initial point on O1 would have evolved to some other point on O2. In other words, the Hamiltonian flow takes O1 to O2 as a whole. But then Liouville’s theorem says that the area of the phase space region enclosed by the initial and final orbits must be the same! That’s the adiabatic invariant. In two dimensional phase space, the area enclosed by a closed curve is equal to the integral H pdq. To see this, slice the region up into strips by dividing the q range into intervals of width dq. Since the curve at top and bottom of a strip hits the boundary going in opposite directions, the contri- bution of these two segments to the loop integral is the area of that strip. So another way to write the adiabatic invariant is as this integral. In higher dimensions a closed orbit does not bound a volume, so we can’t use Liouville’s theorem to identify adiabatic invariants. However, like the volume, the integral I i pidq Poincar´eintegral invariant (123) around any closed loop is conserved under any Hamiltonian flow. This is called Poincar´e’sintegral invariant. As discussed above, in a two dimensional phase space it is just the area of the enclosed region, but in higher dimensions, it is something else. To show that the integral invariant is indeed unchanged, we evaluate its time derivative: d I I p dqi = p˙ dqi + p dq˙i (124) dt i i i I i i i = p˙idq − q˙ dpi + d(piq˙ ) (125) I i = d(piq˙ − H) (126) = 0. (127)

The second to last expression is the integral of a total differential around a

47 closed loop, so it vanishes. (It happens to be the integral of dL, where L is the Lagrangian, as a function of phase space.)

12.7.1 Example: Oscillator with slowly varying m(t) and k(t) Consider an oscillator with Hamiltonian p2 1 H = + k(t)x2. (128) 2m(t) 2

Because of the time dependent mass and spring constant, the energy is not conserved. The energy is the Hamiltonian, and dH/dt = ∂H/∂t (as shown earlier, Hamilton’s equations imply that the time dependence of x and p has no effect on H), so we have

m˙  p2  k˙ 1  E˙ = − + kx2 , (129) m 2m k 2 where the time dependence of m and k is now suppressed in the notation. Now suppose that the parameters are changing only very slowly, in the sense that m˙ k˙  ω and  ω, (130) m k where ω = ω(t) = pk/m is the “instantaneous” angular frequency. That is, m and k change by very little during one period of the oscillator. Then we can get a good approximation to the average of E˙ over a cycle by replacing the kinetic and potential energy terms in (129) each by half the energy. Then dividing both sides by E yields

E˙ m˙ k˙ ≈ − + , (131) E 2m 2k which implies Em1/2k−1/2 = E/ω ≈ constant. (132) This combination of E, m, and k is in fact the unique combination with dimensions of action, so its adiabatic invariance could have been surmised without calculation. Note that, as explained above in Sect. 12.5, E/ω is indeed proportional to the phase space area enclosed by the orbit.

48 12.7.2 Adiabatic invariance and quantum mechanics It was noted in the early days of quantum theory that the quantization of 1 oscillator energies E = (n + 2 )¯hω requires that E/ω can only take discrete values, whereas in classical physics this ratio can vary continuously. The fact that classically E/ω is an adiabatic invariant was viewed as a necessary compatibility condition. Otherwise it would be impossible to change ω, even infinitely slowly, without changing the quantum number n. Ehrenfest took this particularly seriously and proposed an “adiabatic principle” which enun- ciated that all quantities subject to quantization rules should be adiabatic in- variants. He showed in 1916 that this was in fact the case for all the successful quantization rules of the . (See this talk for details: http: //scholar.google.com/scholar?cluster=3114216431842117235&hl=en&as_ sdt=0,21)

12.7.3 Example: Cyclotron orbits and magnetic mirror Another example of an adiabatic invariant is the magnetic flux Bπr2 through a cyclotron orbit in a slowly changing magnetic field (as shown in a homework problem). This has an interesting implication for the component of motion of a charge along the field lines of a static but inhomogeneous field. The kinetic energy of the transverse, cyclotron motion can be found from the 2 1 2 2 2 2 Lorentz force balance law: qv⊥B = mv⊥/r, so 2 mv⊥ = q B r /2m. Since 1 2 the flux is invariant, we have 2 mv⊥ = αB for some positive constant α. This transverse kinetic energy acts as an effective potential for the motion along the field lines, and the charge is repelled from a region of stronger field. (The origin of this repulsion is the component of Lorentz force due to the radial component of the magnetic field.) For a given energy, the charge may reach a turning point where the longitudinal motion comes to rest, and all the kinetic energy is transverse. The charge bounces off the turning point, hence the name “magnetic mirror”. Charges in the earth’s magnetic field do exactly this, bouncing back and forth along the field lines in the Van Allen radiation belts (http://en.wikipedia.org/wiki/Van_Allen_radiation_belt).

12.8 Poisson brackets We won’t cover the material of this section in Phys 410 this year (2012). It is included here in case you want to get a quick introduction to these topics.

49 Poisson brackets are a formal development that establishes the link with quantum mechanics, and also reveals more about the structure of the Hamil- tonian formulation of mechanics and symmetries. Consider the time depen- dence of a function A(q, p) on phase space: dA ∂A ∂A ∂A ∂H ∂A ∂H = q˙ + p˙ = − =: {A, H}, (133) dt ∂q ∂p ∂q ∂p ∂p ∂q where the last step defines the Poisson bracket { , }. If A has explicit time dependence then of course a term ∂A/∂t must be added. In general, the Poisson bracket between two phase space functions is defined as ∂A ∂B ∂A ∂B {A, B} = − (Poisson bracket) (134) ∂q ∂p ∂p ∂q In a higher dimensional phase space, each qp pair contributes a pair of terms like these. We can express this by decorating q and p with an index i, and summing over repeated indices. Canonically conjugate pairs of coordinates have unit Poisson bracket with each other, and zero Poisson bracket with the other coordinates: i i {q , pj} = δj. (135) The formal properties of the Poisson bracket are that it is bilinear, anti- symmetric, and satisfies the Liebniz (product) rule and the Jacobi identity: {A, λB + µC} = λ{A, B} + µ{A, C} (136) {A, B} = −{B,A} (137) {A, BC} = {A, B}C + B{A, C} (138) 0 = {A, {B,C}} + {B, {C,A}} + {C, {A, B}}, (139) where λ and µ are constants. Conserved quantities have vanishing Poisson bracket with the Hamilto- nian (assuming they have no explicit time dependence). If A and B are conserved, the Jacobi identity implies that A, B is also conserved. Exam- ple: if Lx, Ly, and Lz are the components of angular momentum, then {Lx,Ly} = Lz, so conservation of Lx and Ly implies conservation of Lz. This corresponds to the fact that rotations about the z axis can be expressed as combinations of rotations about the x and y axes. [Note { , } has dimensions of 1/[qp] = 1/[angular momentum], so the dimensions do work out.]

50 12.8.1 Canonical Quantization The formal properties of the Poisson bracket are all shared by matrix commu- tators. This suggested to Dirac the idea that the recently discovered of Heisenberg, then with Born and Jordan, could be directly re- lated to the corresponding classical mechanics via replacement of functions on phase space by matrices (generally, infinite dimensional matrices, i.e. “op- erators”) whose commutators are ih¯ times the corresponding classical Poisson brackets, [A,ˆ Bˆ] = ih¯{A, B}. (140) Here for example Aˆ is the operator corresponding to the classical observable A. The eigenvalues of Aˆ correspond to classical values of A. Only a small subset of all the operators can satisfy this simple quan- tization condition. In particular, in the usual formulation, the canonically conjugate quantum coordinates and momenta have simple Heisenberg com- mutation relations, [ˆq, pˆ] = ih.¯ (141) This makes the phase space coordinates into non-commuting operators. The classical phase space coordinates correspond to the eigenvalues of these op- erators.

12.8.2 Canonical transformations In the Lagrangian formalism, any generalized coordinates can be used. The Lagrangian is just expressed in terms of the new coordinates, and Lagrange’s equations take the same form in terms of the new coordinates as they did in the old coordinates. The coordinate freedom is even greater in the Hamiltonian formalism. A phase space coordinate change that preserves the form of the Poisson brackets of the coordinates is called a canonical transformation. That is, i new coordinates (Q ,Pi) are related to the original canonical coordinates by a canonical transformation if

i i i j {Q ,Pj}q,p = δj, and {Q ,Q }q,p = 0 = {Pi,Pj}q,p. (142) If this holds, then all Poisson brackets have the same value whether com- puted using q, p or Q, P as the canonical coordinates: {f, g}q,p = {f, g}Q,P , where the subscripts indicate which coordinates enter in the partial deriva- tives defining the Poisson bracket. In particular, if the coordinate change is

51 time-independent, Hamilton’s equations take the same form in terms of the new coordinates, with the same Hamiltonian.

Examples of canonical transformations

• Q = 5q, and P = p/5

• More generally, Q = Q(q), and P = p(∂q/∂Q). This is the Hamil- tonian version of an arbitrary change of generalized coordinate in the Lagrangian, L[q, q˙] = L[q(Q), (∂q/∂Q)Q˙ ]. (143) P can be found directly from the definition of the canonical momentum: P = ∂L/∂Q˙ = (∂L/∂q˙)(∂q/∂Q) = p(∂q/∂Q).

• Even more generally, the previous example works with multiple gener- alized coordinates, with ∂q/∂Q replaced by the Jacobian ∂qi/∂Qj.

• In the previous examples the new Q’s depend only on the old q’s, not on the old p’s. In general, however, a canonical transformation can mix up the q’s and p’s. The simplest example of this is Q = p, and P = −q.

• A juicier example of mixing up q’s and p’s is to use the Hamiltonian itself as a coordinate. More specifically, consider a harmonic oscillator with Hamiltonian H = p2/2m + mω2x2/2, and define I = H/ω and θ = tan−1(mωq/p) (which is the angle measured clockwise from the p axis in units with m = ω = 1). Then {θ, I} = 1 (and of course {θ, θ} = 0 = {I,I}), so this coordinate change on phase space is a canonical transformation. Hamilton’s equations take a very simple form in these coordinates: H = ωI, so θ˙ = ∂H/∂I = ω, and I˙ = −∂H/∂θ = 0. These are a special case of what are called action-angle variables.

13 Continuum mechanics

So far we’ve treated systems with a finite number of generalized coordinates, either point particles or rigid bodies. Motion of flexible macroscopic bodies can very well be approximated as continuous, and this can be done using functions as generalized coordinates. Also fundamental fields like the elec- tromagnetic field or the spacetime metric are (at least in current formulations

52 of physics) strictly continuous, and their dynamics is described by continuum mechanics as well. This section is going to be sketchy for the moment...

13.1 String Lagrangian for small transverse displacements with of a stretched (non- relativistic) string: Z 1 2 1 2 L = [ 2 µ(∂y/∂t) − 2 T (∂y/∂x) ]dx, (144) where y(x, t) specifies the string displacement, assumed in a fixed plane, away from a straight equilibrium, µ is the mass per unit length, T is the tension. This is derived under the assumption that the slope ∂y/∂x is always much smaller than 1. For such small displacements, we can treat the mass density and tension as constant. We neglect longitudinal (compression type) distur- bances of the string as well as any restoring force due to stiffness (resistance to bending the string). Only the potential energy associated with changing the length of the string due to transverse displacement is taken into account. Let’s see how to get that Lagrangian. A segment dx of string has a mass 1 2 µ dx and a velocity ∂y/∂t, hence a kinetic energy 2 µ(∂y/∂t) dx. Integrating this yields the kinetic energy term in (144). As for the potential energy, it is equal to the work it takes to stretch the string. The work to stretch a segment dx to a length dl is T (dl − dx). We have

p 2 2 p 2 1 2 4 dl = dx + dy = dx 1 + (∂y/∂x) = dx[1 + 2 (∂y/∂x) + O((∂y/∂x) )]. (145) To lowest order in the slope, the change in length of the displaced segment is 1 2 thus 2 (∂y/∂x) dx. The potential energy of displacement is then the integral of this times T , which is the potential energy term in (144). The next order term is quartic in the slope, so would add a term cubic in y to the equation of motion. This would be a nonlinear equation. If that term is very small, its effect could be taken into account using perturbation theory.

Lagrange equation and boundary conditions

53 Normal modes Method 1) Solve directly for normal modes. Method 2) P Fourier transform y(x, t) = n yn(t) sin knx, find that Lagrangian becomes sum of an infinite number of oscillator Lagrangians. Works for fixed (Dirich- let) or free (Neumann) or mixed boundary conditions.

Loaded string An interesting variation is to attach a mass m at one end of the string. Then no boundary condition is needed on that end, since the dynamics of the mass plays that role. Then one cannot use a Fourier decom- position in advance, since one doesn’t know what the allowed wavevectors are. Instead one must (I think) solve directly for the normal modes. In a homework problem you show that the allowed wavevectors and frequencies are determined by a transcendental equation.

String with bending modulus Another interesting variation is to take into account the stiffness of a string, which is important for thick piano strings. That is, besides the potential energy of stretching associated with the string tension, there is a potential energy of bending. At lowest order, the bending energy density is proportional to the square of the curvature 00 R 1 00 2 y (x) of the string, so the bending energy has the form dx 2 β(y ) , where the constant β is the bending modulus. This term preserves linearity of the equation of motion, but it produces a y0000 term that is fourth order in the x derivative. This is worked out in a homework problem. More boundary conditions are needed to kill off the boundary terms in the variation of the action, and that matches the fact that more boundary conditions are needed to determine a solution to the fourth order mode equation. Different bound- ary conditions correspond to different physical conditions on the boundary, for instance a clamped end (y0 = 0) vs. an end that is free to rock on a fulcrum, held down by a material with no bending modulus (y00 = 0). A piano string lies in between these extremes, and can (I think) be described by including a term in the potential energy proportional to the angle of the string at the endpoint. The normal modes are easy to find using the boundary condition y00 = 0, since it is simultaneously satisfied with the fixed end condition y = 0 by sin functions. The boundary condition y0 = 0 is trickier to implement and involves other solutions of the fourth order mode equation. (This is an extra credit homework problem.)

54 13.2 Electromagnetic field 13.3 Elastic solids

55