5. Extending the Principle

5.1 Vis Inertiae

It is indeed a matter of great difficulty to discover, and effectively to distinguish, the true motions of particular bodies from the apparent, because the parts of that immovable space in which those motions are performed do by no means come under the observation of our senses. Yet the thing is not altogether desperate…
— Isaac Newton, 1687

According to Newtonian mechanics a particle moves without acceleration unless acted upon by a force, in which case the particle undergoes acceleration proportional to the applied force. The acceleration is defined as a vector whose components are the second derivatives of the particle’s space coordinates with respect to the time coordinate, which would seem to imply that the acceleration of a particle – and hence the force to which the particle is subjected – depends on our choice of coordinate systems. Of course, Newton’s law is understood to be applicable only with respect to a special class of coordinate systems, called the inertial coordinate systems, which all give the same acceleration, and hence the same applied force, for any given particle. Thus the restriction to inertial coordinate systems enables us to regard accelerations and the corresponding forces as absolute.

However, even in the context of Newtonian mechanics it is sometimes convenient to set aside the restriction to inertial coordinate systems, and as a result the distinction between physical forces and coordinate-based accelerations becomes ambiguous. For example, consider a particle whose position in space is expressed by the vector

r = x(t) i + y(t) j + z(t) k
where i, j, k are orthogonal unit vectors for a coordinate system with fixed origin, and x(t), y(t), z(t) are scalar functions of the time coordinate t. Obviously if these basis vectors are unchanging, the derivatives of r are simply given by

dr/dt = (dx/dt) i + (dy/dt) j + (dz/dt) k
but if the basis vectors may be changing with time (due to rotation of the coordinate axes) the first derivative of r by the chain rule is

dr/dt = ( (dx/dt) i + (dy/dt) j + (dz/dt) k ) + ( x (di/dt) + y (dj/dt) + z (dk/dt) )
The quantity in the first parentheses is the partial derivative of r with respect to t at constant basis vectors i, j, k, so we denote it as ∂r/∂t. The quantity in the second parentheses is the partial derivative of r with respect to t at constant x, y, z, which means it represents the differential change in r due just to the rotation of the axes. This change is perpendicular to both r and the angular velocity vector ω, and its magnitude is ωr times the sine of the angle between ω and r, as indicated in the figure below.

Therefore, the total derivative of r with respect to t can be written as

dr/dt = ∂r/∂t + ω × r
Notice that this applies to any vector (compare with equation 4b in Appendix 4, noting that the angular velocity serves here as the "Christoffel symbol"), so we can immediately differentiate again with respect to t, giving the total acceleration

d^2r/dt^2 = ∂/∂t ( ∂r/∂t + ω × r ) + ω × ( ∂r/∂t + ω × r )
Noting that the cross product is distributive, and that the chain rule applies to derivatives of cross products, this can be written as

d^2r/dt^2 = ∂^2r/∂t^2 + (dω/dt) × r + 2 ω × (∂r/∂t) + ω × (ω × r)
This was based on the premise that the origin of the x,y,z coordinates was stationary, but if we stipulate that the origin is at position R(t) with respect to some fully inertial coordinate system, then the particle's position in terms of these inertial coordinates is R + r, and the total acceleration of the particle includes the second derivative of R. Thus Newton's second law, which equates the net applied force F to the mass times the acceleration (defined in terms of an inertial coordinate system), is

F = m ( d^2R/dt^2 + ∂^2r/∂t^2 + (dω/dt) × r + 2 ω × (∂r/∂t) + ω × (ω × r) )
If our original xyz coordinate system was inertial, then all the terms on the right hand side except for the second would vanish, and we would have the more familiar-looking expression

F = m ∂^2r/∂t^2
Now, if we are determined to organize our experience based on this simple formulation, for any arbitrary choice of coordinate systems, we can do so, but only by introducing new "forces". We need only bring the other four terms from the right hand side of the previous equation over to the left side, and call them "forces". Thus we define the net force on the particle to be

F_net = F - m d^2R/dt^2 - m (dω/dt) × r - 2m ω × (∂r/∂t) - m ω × (ω × r)
The first term on the right side is the net of the "physical forces", whereas the remaining terms are what we might call "inertial forces". They are also often called "fictitious forces". The second term is the linear acceleration force, such as we may imagine is pulling us downward when standing in an elevator that is accelerating upward. The fourth term is called the Coriolis force, and the fifth term is sometimes called the centrifugal force. (The third term apparently doesn't have a common name, perhaps because the angular velocity in many practical circumstances is constant.) On this basis the Newtonian equation of motion in terms of an arbitrary Cartesian coordinate system has the simple form

F_net = m ∂^2r/∂t^2
It’s interesting to consider why we usually don’t adopt this point of view. It certainly gives a simpler general equation of motion, but at the expense of introducing several new “forces”, beyond whatever physical forces we had already identified in F. Our preference for the usual (more complicated or more restrictive) formulation of Newton’s law is due to our desire to associate “physical forces” with some proximate substantial entity. For example, the force of gravity is attributed to the pull of some massive body. The force of the wind is attributed to the impulse of air molecules. And so on. The “inertial forces” can’t be so easily attributed to any proximate entity, so unless we want to pursue the Machian idea of associating them with the changing relations to distant objects in the universe, we are left with a “force” that has no causative substance, so we tend to regard such forces as fictitious. Nevertheless, it’s worth remembering that the distinction between “physical” and “fictitious” forces is to some extent a matter of choice, as is our preference for inertial coordinate systems to measure time and space.
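The full acceleration of a point relative to a rotating frame, d^2R/dt^2 + ∂^2r/∂t^2 + (dω/dt)×r + 2ω×(∂r/∂t) + ω×(ω×r), can be checked numerically. The sketch below (an illustration, not from the text; the trajectory, rotation profile, and time step are arbitrary choices) compares a direct finite-difference second derivative of a particle's inertial position against the sum of the terms, for a frame rotating about the z axis with a stationary origin (so the d^2R/dt^2 term vanishes):

```python
import numpy as np

def rot_z(phi):
    """Rotation matrix mapping rotating-frame components to inertial ones."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# arbitrary smooth rotating-frame trajectory and rotation angle (illustrative)
rho   = lambda t: np.array([np.cos(t), t**2, np.sin(2*t)])   # x, y, z in rotating frame
phi   = lambda t: 0.7*t + 0.3*t**2                           # rotation angle about z
omega = lambda t: np.array([0.0, 0.0, 0.7 + 0.6*t])          # angular velocity, d(phi)/dt
alpha = lambda t: np.array([0.0, 0.0, 0.6])                  # angular acceleration

t, h = 1.3, 1e-4
pos = lambda t: rot_z(phi(t)) @ rho(t)                       # inertial position

# direct acceleration by central differences
a_direct = (pos(t + h) - 2*pos(t) + pos(t - h)) / h**2

# term-by-term formula (origin fixed, so no d^2R/dt^2 contribution)
Q     = rot_z(phi(t))
drho  = (rho(t + h) - rho(t - h)) / (2*h)                    # partial r / partial t
ddrho = (rho(t + h) - 2*rho(t) + rho(t - h)) / h**2          # partial^2 r / partial t^2
w     = omega(t)
a_formula = (Q @ ddrho
             + 2*np.cross(w, Q @ drho)                       # 2 omega x (dr/dt): Coriolis
             + np.cross(alpha(t), Q @ rho(t))                # (d omega/dt) x r
             + np.cross(w, np.cross(w, Q @ rho(t))))         # omega x (omega x r)

assert np.allclose(a_direct, a_formula, atol=1e-5)
```

The rotating-frame quantities are rotated into inertial components before the cross products are taken, which is why the rotation matrix Q multiplies each term.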

To illustrate some of the consequences of these ideas, recall that the Sagnac effect was described in Section 2.7 from the standpoint of various systems of inertial coordinates, and in Section 4.8 in terms of certain non-inertial coordinate systems, but in all these cases the analyses were based on the premise that the "true" measures of time and space were based on inertial coordinate systems. We can now examine some aspects of a Sagnac device from a more general standpoint of arbitrary curvilinear coordinates, leading to the idea that the "physical" effects of acceleration can be absorbed into the metrical structure of spacetime itself.

In a square or triangular Sagnac device the light ray going from one mirror to the next in the direction of rotation passes through the interior of the polygon when viewed from a non-rotating frame of reference. This implies that the light ray, traveling in a straight line, diverges from the rim of the Sagnac device and then converges back to the next vertex. On the other hand, if we consider the same passage of light from the standpoint of an observer riding along on the rotating device, the beam of light goes from one end of the straight edge to the other, but since the light beam diverges from the edge and passes through the interior of the polygon, it follows that from the standpoint of the rotating observer the ray of light is emitted from one vertex and curves in toward the center of rotation and then curves back to reach the next mirror. Likewise, the counter-rotating ray travels outside the polygon, so when viewed from the rotating frame it appears to curve outward (away from the center) and then back.

So, on a typical segment between two mirrors M1 and M2, when viewed from the rotating frame of reference, the two light rays follow curved paths as shown in the drawing below:

The amount of this "warping" of the light rays depends mainly on the shape of the path and the speed of the rim, so if we have significant warping of light rays with small r, the warping won't be reduced by increasing the radius while holding the mirror speed constant. Any bending of light rays would reveal to an observer that the segment M1 to M2 is not inertial, so if we want to construct a scenario in which an observer sitting on a mirror is "inertial for all practical purposes", we need to make each segment subtend a very small arc of the disk and/or limit the rim speed, as well as restricting our attention to a short enough span of time so that the rotating observer doesn't rotate through an appreciable angle.

One thing that sometimes misleads people when assessing how things look from the perspective of a rim observer is the belief that it is only necessary to consider the centripetal acceleration, v^2/R, of each point on the rim, but clearly if our objective is to assess the speed of light with respect to a coordinate system in which an observer at a particular point on the rim is stationary, we must determine the full accelerations of the points on the rim relative to that system of coordinates. This includes the full five-term expression for the acceleration of a moving point relative to an arbitrarily moving coordinate system. On that basis we find that the light rays are subjected to an "acceleration" field whose dominant term has a magnitude in the direction of travel of

a = (vc/R) sin(θ)
where  is the angular distance from the observer. (Note that this acceleration is defined on the basis of "slow-measuring-rod- transport" lengths around the loop, combined with time intervals corresponding to the rim observer's co-moving inertial frame. Also, note that "vc/R" is characteristic of the Coriolis term, as opposed to v2/R for the centripetal term.) Integrating these accelerations in both directions gives the pseudo-speeds (i.e., the speeds relative to the accelerating coordinates) of the two light beams as a function of position in the acceleration field

The average pseudo-speeds of the co- and counter-rotating beams around the loop are therefore c - v and c + v respectively, which gives a constant "anisotropic ratio". However, these speeds differ from c at any particular point only in proportion to the pseudo-gravitational potential relative to the observer's location. The amplitude of the acceleration field, on the order of vc/R, does indeed go to zero as the radius R increases while holding the rim speed v constant, but the integral of (vc/R)sin(θ) over the entire loop still always gives the speed distribution around the rim noted above, with the maximum anisotropy occurring at the opposite point on the circumference (where the pseudo-gravitational potential difference relative to the observer is greatest), and this gives the constant "anisotropic ratio". All of this is in perfect accord with the principles of relativity.
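As a small numerical illustration (the rim speed and radius below are hypothetical values, not from the text), we can integrate the (vc/R)sin(θ) acceleration field around the loop and confirm that the resulting pseudo-speeds average to c - v and c + v:

```python
import numpy as np

c = 299792458.0                # speed of light (m/s)
v, R = 30.0, 100.0             # hypothetical rim speed and radius, v << c
theta = np.linspace(0.0, 2*np.pi, 100001)
dtheta = theta[1] - theta[0]

# acceleration field a = (v*c/R) sin(theta); converting to d(speed)/d(theta)
# via dt = R dtheta / c (to first order in v/c) leaves f = v sin(theta)
f = v * np.sin(theta)
# cumulative trapezoid integral of f with respect to theta
cumint = np.concatenate([[0.0], np.cumsum((f[1:] + f[:-1]) / 2) * dtheta])

speed_co      = c - cumint     # co-rotating beam: c - v(1 - cos(theta))
speed_counter = c + cumint     # counter-rotating beam

assert np.allclose(speed_co, c - v*(1 - np.cos(theta)), rtol=0, atol=1e-5)
# average pseudo-speeds around the loop: c - v and c + v
assert abs(speed_co[:-1].mean() - (c - v)) < 1e-4
assert abs(speed_counter[:-1].mean() - (c + v)) < 1e-4
```

Note that the maximum deviation from c occurs at θ = π, the point diametrically opposite the observer, consistent with the potential argument in the text.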

Of course, if the problem is treated in terms of inertial coordinates, then acceleration isn't an issue, and the solution is purely kinematical. However, our purpose here is to examine the consequences of re-casting the Sagnac effect into a system of non-inertial coordinates in which an observer sitting on the rim is stationary, which means we need to absorb into the coordinates not only his circular translatory motion but also his rotation. This introduces fictitious forces and acceleration/gravitational fields which must be taken into account. Needless to say, there's no need to go to all this trouble, since the treatment in an inertial frame is completely satisfactory. The only reason for re-casting this in non-inertial coordinates is to illustrate how the general relativistic theory accommodates the use of arbitrary coordinates.

Now, it's certainly true that there is no single coherent set of coordinates with respect to which all the points on the disk are fully stationary, where the term "coherent" signifies a single unique locus of inertial simultaneity. We can, however, construct a coherent set of coordinates with respect to which one particular point on the rim is fully stationary, and then use slow-transport methods for assigning spatial distances between any two mirrors, and combine this with the observer's proper time as the basis for defining velocities, accelerations, etc., with respect to the rim observer's accelerating coordinates.

To understand the nature of the pseudo-gravitational fields that exist with respect to these accelerating coordinates, carry out the transformation to the observer's system in two steps. First, construct a non-rotating system of coordinates in which the observer is constantly at the origin. Thus we have absorbed his circular motion but not his rotation into these coordinates. The result is illustrated below, where the disk is regarded as rotating about the "stationary" observer riding on the rim, and the circles represent the disk position at different "times" (relative to these coordinates).

So, at this stage, each point on the disk is twirling around the observer at an angular speed of ω (the same as the angular speed of the disk in the hub-centered coordinates). If we draw the spirograph pattern traced out by a point moving around the circle at speed c while the circle rotates slightly about the observer with angular speed ω = v/R, we see that the co- and counter-rotating directions have different path lengths, precisely accounting for the difference in travel times. Thus, even with respect to these accelerating coordinates (in which the observer has a fixed position), the Sagnac effect is still due strictly to the difference in path length, which demonstrates how directly the Sagnac effect is due not just to acceleration in general but specifically to rotation.

Next, we absorb the rotation of the disk into our coordinates, so the disk is no longer twirling around the observer. However, by absorbing the twirl of the disk into the coordinates, we introduce an anisotropic pseudo-gravitational field (relative to the "stationary" observer), for particles or light rays moving around the loop. The fact that the "speed of light" in these coordinates can differ from c is exactly analogous to how the distant stars have enormous speeds with respect to the Earth's rotating coordinates, and that speed is attributed to the enormous pseudo-gravitational potential which exists at those distances with respect to the Earth's coordinates. Similarly, relative to our rim observer, the maximum gravitational potential difference is at the furthest point on the circle, i.e., the point diametrically opposite on the disk, which is also where the greatest anisotropy in the "speed of light" (with respect to these particular non-inertial coordinates) occurs.

Thus, to first order with relatively small mirror speeds, the light rays are subjected to an "acceleration" field whose magnitude in the directions of travel is (vc/R)sin(θ), where θ is the angular distance from the observer. Now, it might seem that we are unable to account for the anisotropic effect of acceleration, on the assumption that all the points on the rim are subject to the same acceleration, so there can be no differential effect for light rays moving in opposite directions around the loop. However, that's not the case, for two reasons. First, the acceleration (with respect to these accelerating coordinates) is not constant, and second, it is the Coriolis (not the centripetal) acceleration that produces the dominant effect. The Coriolis acceleration is the cross product of the rotation (pseudo) vector ω with the velocity vector of the object in question, and this has an opposite sense depending on whether the object (or light ray) is moving in the co-rotating or counter-rotating direction.

Of course, both directions eventually encounter the same amount of positive and negative acceleration, but in the opposite order. Thus, they both start out at c, and one experiences an increase in velocity of +v followed by a decrease of -v, whereas the other drops down by -v first and then increases by +v. Thus their accelerations and velocities as functions of angular position are as shown below:

The average speeds of the co- and counter-rotating beams around the loop are therefore c - v and c + v respectively, which gives the constant "anisotropic ratio". Notice that the speeds differ from c only where there is significant pseudo-gravitational potential relative to the observer's location (just as with the distant stars, and of course the relation is reciprocal). The intensity of the acceleration field is on the order of vc/R, which does indeed go to zero as the radius R increases while holding the rim speed v constant, but the integral of (vc/R)sin(θ) over the entire loop still always gives the speed distribution around the rim noted above, with the maximum anisotropy occurring at the opposite point on the circumference (where the pseudo-gravitational potential difference relative to the observer is greatest), and this gives the constant "anisotropic ratio".

It's also worth noting that the anisotropic ratio of speeds given by this pseudo-gravitational potential corresponds precisely to the anisotropic distances when the Sagnac device is analyzed with respect to the instantaneously co-moving inertial frame of the rim observer.

5.2 Tensors, Contravariant and Covariant

Ten masts at each make not the altitude which thou hast perpendicularly fell. Thy life's a miracle. Speak yet again.
— Shakespeare

One of the most important relations involving continuous functions of multiple continuous variables (such as coordinates) is the formula for the total differential. In general if we are given a smooth continuous function y = f(x^1, x^2, ..., x^n) of n variables, the incremental change dy in the variable y resulting from incremental changes dx^1, dx^2, ..., dx^n in the variables x^1, x^2, ..., x^n is given by

dy = (∂y/∂x^1) dx^1 + (∂y/∂x^2) dx^2 + ... + (∂y/∂x^n) dx^n          (1)

where ∂y/∂x^i is the partial derivative of y with respect to x^i. (The superscripts on x are just indices, not exponents.) The scalar quantity dy is called the total differential of y. This formula just expresses the fact that the total incremental change in y equals the sum of the "sensitivities" of y to the independent variables multiplied by the respective incremental changes in those variables. (See the Appendix for a slightly more rigorous definition.)

If we define the vectors

g = [∂y/∂x^1, ∂y/∂x^2, ..., ∂y/∂x^n]          d = [dx^1, dx^2, ..., dx^n]
then dy equals the scalar (dot) product of these two vectors, i.e., we have dy = g·d. Regarding the variables x^1, x^2, ..., x^n as coordinates on a manifold, the function y = f(x^1, x^2, ..., x^n) defines a scalar field on that manifold, g is the gradient of y (often denoted as ∇y), and d is the differential position of x (often denoted as dx), all evaluated about some nominal point [x^1, x^2, ..., x^n] on the manifold.

The gradient g = ∇y is an example of a covariant tensor, and the differential position d = dx is an example of a contravariant tensor. The difference between these two kinds of tensors is how they transform under a continuous change of coordinates. Suppose we have another system of smooth continuous coordinates X^1, X^2, ..., X^n defined on the same manifold. Each of these new coordinates can be expressed (in the region around any particular point) as a function of the original coordinates, X^i = F^i(x^1, x^2, ..., x^n), so the total differentials of the new coordinates can be written as

dX^i = (∂X^i/∂x^1) dx^1 + (∂X^i/∂x^2) dx^2 + ... + (∂X^i/∂x^n) dx^n
Thus, letting D denote the vector [dX^1, dX^2, ..., dX^n] we see that the components of D are related to the components of d by the equation

D^i = (∂X^i/∂x^1) d^1 + (∂X^i/∂x^2) d^2 + ... + (∂X^i/∂x^n) d^n          (2)
This is the prototypical transformation rule for a contravariant tensor of the first order. On the other hand, the gradient vector g = ∇y is a covariant tensor, so it doesn't transform in accord with this rule. To find the correct transformation rule for the gradient (and for covariant tensors in general), note that if the system of functions F^i is invertible (which it is if and only if the determinant of the Jacobian is non-zero), then the original coordinates can be expressed as some functions of these new coordinates, x^i = f^i(X^1, X^2, ..., X^n) for i = 1, 2, ..., n. This enables us to write the total differentials of the original coordinates as

dx^i = (∂x^i/∂X^1) dX^1 + (∂x^i/∂X^2) dX^2 + ... + (∂x^i/∂X^n) dX^n
If we now substitute these expressions for the total coordinate differentials into equation (1) and collect by differentials of the new coordinates, we get

dy = Σ_j ( (∂y/∂x^1)(∂x^1/∂X^j) + (∂y/∂x^2)(∂x^2/∂X^j) + ... + (∂y/∂x^n)(∂x^n/∂X^j) ) dX^j
Thus, the components of the gradient of y with respect to the X^i coordinates are given by the quantities in parentheses. If we let G denote the gradient of y with respect to these new coordinates, we have

G_j = (∂x^1/∂X^j) g_1 + (∂x^2/∂X^j) g_2 + ... + (∂x^n/∂X^j) g_n
This is the prototypical transformation rule for covariant tensors of the first order. Comparing this with the contravariant rule given by (2), we see that they both define the transformed components as linear combinations of the original components, but in the contravariant case the coefficients are the partials of the new coordinates with respect to the old, whereas in the covariant case the coefficients are the partials of the old coordinates with respect to the new.
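Both transformation rules can be verified numerically. In the sketch below (the scalar field and the coordinate change are arbitrary illustrative choices), the coordinate differentials transform with the Jacobian ∂X/∂x, the gradient transforms with its inverse transpose, and the scalar dy = g·d comes out the same either way:

```python
import numpy as np

# scalar field y(x) and a curvilinear coordinate change X = F(x); both are
# arbitrary illustrative choices, invertible near the chosen point
y = lambda x: x[0]**2 * x[1] + np.sin(x[1])
F = lambda x: np.array([x[0] + x[1]**2, x[0]*x[1]])

x0 = np.array([1.0, 2.0])
g  = np.array([2*x0[0]*x0[1], x0[0]**2 + np.cos(x0[1])])   # gradient of y at x0

# sanity-check the analytic gradient by central differences
h = 1e-6
g_fd = np.array([(y(x0 + [h, 0]) - y(x0 - [h, 0])) / (2*h),
                 (y(x0 + [0, h]) - y(x0 - [0, h])) / (2*h)])
assert np.allclose(g, g_fd)

J = np.array([[1.0,   2*x0[1]],     # Jacobian dX/dx at x0
              [x0[1], x0[0]]])

G  = np.linalg.inv(J).T @ g         # covariant rule: G_j = (dx^i/dX^j) g_i
dx = np.array([1e-6, -2e-6])        # a small displacement (contravariant)
dX = J @ dx                         # contravariant rule: dX^i = (dX^i/dx^j) dx^j

# the scalar dy is invariant under the change of coordinates
assert np.isclose(g @ dx, G @ dX)
```

The invariance of g·d is exactly why the two kinds of components must transform oppositely: the Jacobian factors cancel in the pairing.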

The key attribute of a tensor is that its representations in different coordinate systems depend only on the relative orientations and scales of the coordinate axes at that point, not on the absolute values of the coordinates. This is why the absolute position vector pointing from the origin to a particular object in space is not a tensor: the components of its representation depend on the absolute values of the coordinates. In contrast, the coordinate differentials transform based solely on local information.

So far we have discussed only first-order tensors, but we can define tensors of any order. One of the most important examples of a second-order tensor is the metric tensor. Recall that the generalized Pythagorean theorem enables us to relate the squared differential distance ds along a path on the spacetime manifold to the corresponding differential components dt, dx, dy, dz as a general quadratic function of those differentials as follows

(ds)^2 = g00 (dt)^2  + g01 (dt)(dx) + g02 (dt)(dy) + g03 (dt)(dz)
       + g10 (dx)(dt) + g11 (dx)^2  + g12 (dx)(dy) + g13 (dx)(dz)
       + g20 (dy)(dt) + g21 (dy)(dx) + g22 (dy)^2  + g23 (dy)(dz)
       + g30 (dz)(dt) + g31 (dz)(dx) + g32 (dz)(dy) + g33 (dz)^2
Naturally if we set g00 = 1, g11 = g22 = g33 = -1, and all the other gij coefficients to zero, this reduces to the Minkowski metric. However, a different choice of coordinate systems (or a different intrinsic geometry, which will be discussed in subsequent sections) requires the use of the full formula. To simplify the notation, it's customary to use the indexed variables x^0, x^1, x^2, x^3 in place of t, x, y, z respectively. This allows us to express the above metrical relation in abbreviated form as

(ds)^2 = Σ_u Σ_v guv dx^u dx^v
To abbreviate the notation even more, we adopt Einstein's convention of omitting the summation symbols altogether, and simply stipulating that summation from 0 to 3 is implied over any index that appears more than once in a given product. With this convention the above expression is written as

(ds)^2 = guv dx^u dx^v          (5)
Notice that this formula expresses something about the intrinsic metrical relations of the space, but it does so in terms of a specific coordinate system. If we considered the metrical relations at the same point in terms of a different system of coordinates (such as changing from Cartesian to polar coordinates), the coefficients g would be different.

Fortunately there is a simple way of converting the guv from one system of coordinates to another, based on the fact that they describe a purely localistic relation among differential quantities. Suppose we are given the metrical coefficients guv for the coordinates x^u, and we are also given another system of coordinates y^a that are defined in terms of the x^u by some arbitrary continuous functions

y^a = y^a(x^0, x^1, x^2, x^3),          a = 0, 1, 2, 3
Assuming the Jacobian of this transformation isn't zero, we know that it's invertible, and so we can just as well express the original coordinates as continuous functions (at this point) of the new coordinates

x^u = x^u(y^0, y^1, y^2, y^3)
Now we can evaluate the total derivatives of the original coordinates in terms of the new coordinates. For example, dx^0 can be written as

dx^0 = (∂x^0/∂y^0) dy^0 + (∂x^0/∂y^1) dy^1 + (∂x^0/∂y^2) dy^2 + (∂x^0/∂y^3) dy^3
and similarly for dx^1, dx^2, and dx^3. The product of any two of these differentials, dx^u and dx^v, is of the form

dx^u dx^v = (∂x^u/∂y^a)(∂x^v/∂y^b) dy^a dy^b
(remembering the summation convention). Substituting these expressions for the products of the x differentials in the metric formula (5) gives

(ds)^2 = guv (∂x^u/∂y^a)(∂x^v/∂y^b) dy^a dy^b
The first three factors on the right hand side obviously represent the coefficient of dy^a dy^b in the metric formula with respect to the y coordinates, so we've shown that the array of metric coefficients transforms from the x to the y coordinate system according to the equation

g'ab = (∂x^u/∂y^a)(∂x^v/∂y^b) guv
Notice that each component of the new metric array is a linear combination of the old metric components, and the coefficients are the partials of the old coordinates with respect to the new. Arrays that transform in this way are called covariant tensors.
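As a concrete check of the covariant transformation rule g'ab = (∂x^u/∂y^a)(∂x^v/∂y^b) guv, consider the familiar change from 2-dimensional Cartesian to polar coordinates: applying the rule to the Euclidean metric yields the expected polar metric diag(1, r^2). (A sketch for illustration; the evaluation point is arbitrary.)

```python
import numpy as np

r, th = 2.0, 0.6                          # arbitrary evaluation point
# x = r cos(th), y = r sin(th); Jacobian of (x, y) with respect to (r, th)
J = np.array([[np.cos(th), -r*np.sin(th)],
              [np.sin(th),  r*np.cos(th)]])

g_cart  = np.eye(2)                       # Euclidean metric in x, y
g_polar = J.T @ g_cart @ J                # covariant rule in matrix form

# the familiar polar metric: (ds)^2 = (dr)^2 + r^2 (d th)^2
assert np.allclose(g_polar, np.diag([1.0, r**2]))
```

In matrix notation the rule is simply g' = J^T g J, where J holds the partials of the old coordinates with respect to the new.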

On the other hand, if we define an array A with the components A^uv = (dx^u/ds)(dx^v/ds), where s denotes a path parameter along some particular curve in space, then equation (2) tells us that this array transforms according to the rule

A'^ab = (∂y^a/∂x^u)(∂y^b/∂x^v) A^uv
This is very similar to the previous formula, except that the partial derivatives are of the new coordinates with respect to the old. Arrays whose components transform according to this rule are called contravariant tensors.

When we speak of an array being transformed from one system of coordinates to another, it's clear that the array must have a definite meaning independent of the system of coordinates. We could, for example, have an array of scalar quantities, whose values are the same at a given point, regardless of the coordinate system. However, the components of the array might still be required to change for different systems. For example, suppose the temperature at the point (x,y,z) in a rectangular tank of water is given by the scalar field T(x,y,z), where x,y,z are Cartesian coordinates with origin at the geometric center of the tank. If we change our system of coordinates by moving the origin, say, to one of the corners of the tank, the function T(x,y,z) must change to T(x - x0, y - y0, z - z0). But at a given physical point the value of T is unchanged.

Notice that g20 is the coefficient of (dy)(dt), and g02 is the coefficient of (dt)(dy), so without loss of generality we could combine them into the single term (g20 + g02)(dt)(dy). Thus the individual values of g20 and g02 are arbitrary for a given metrical equation, since all that matters is the sum (g20 + g02). For this reason we're free to specify each of those coefficients as half the sum, which results in g20 = g02. The same obviously applies to all the other diagonally symmetric pairs, so for the sake of definiteness and simplicity we can set gab = gba. It's important to note, however, that this symmetry property doesn't apply to all tensors. In general we have no a priori knowledge of the symmetries (if any) of an arbitrary tensor.

Incidentally, when we refer to a vector (or, more generally, a tensor) as being either contravariant or covariant we're abusing the language slightly, because those terms really just signify two different conventions for interpreting the components of the object with respect to a given coordinate system, whereas the essential attributes of a vector or tensor are independent of the particular coordinate system in which we choose to express it. In general, any given vector or tensor can be expressed in both contravariant and covariant form with respect to any given coordinate system. For example, consider the vector P shown below.

We should note that when dealing with a vector (or tensor) field on a manifold each element of the field exists entirely at a single point of the manifold, with a direction and a magnitude, rather than imagining each vector to actually extend from one point in the manifold to another. (For example, we might have a vector field describing the direction and speed of the wind at each point in a given volume of air.) However, for the purpose of illustrating the relation between contravariant and covariant components, we are focusing on simple displacement vectors in a flat metrical space, which can be considered to extend from one point to another.

Figure 1 shows an arbitrary coordinate system with the axes X1 and X2, and the contravariant and covariant components of the position vector P with respect to these coordinates. As can be seen, the jth contravariant component consists of the projection of P onto the jth axis parallel to the other axis, whereas the jth covariant component consists of the projection of P onto the jth axis perpendicular to that axis. This is the essential distinction (up to scale factors) between the contravariant and covariant ways of expressing a vector or, more generally, a tensor. (It may seem that the naming convention is backwards, because the "contra" components go with the axes, whereas the "co" components go against the axes, but historically these names were given on the basis of the transformation laws that apply to these two different interpretations.)

If the coordinate system is "orthogonal" (meaning that the coordinate axes are mutually perpendicular) then the contravariant and covariant interpretations are identical (up to scale factors). This can be seen by imagining that we make the coordinate axes in Figure 1 perpendicular to each other. Thus when we use orthogonal coordinates we are essentially using both contravariant and covariant coordinates, because in such a context the only difference between them (at any given point) is scale factors. It's worth noting that "orthogonal" doesn't necessarily imply "rectilinear". For example, polar coordinates are not rectilinear, i.e., the axes are not straight lines, but they are orthogonal, because as we vary the angle we are always moving perpendicular to the local radial axis. Thus the metric of a polar coordinate system is diagonal, just as is the metric of a Cartesian coordinate system, and so the contravariant and covariant forms at any given point differ only by scale factors (although these scale factors may vary as a function of position). Only when we consider systems of coordinates that are not mutually perpendicular do the contravariant and covariant forms differ (at a given point) by more than just scale factors.

To understand in detail how the representations of vectors in different coordinate systems are related to each other, consider the displacement vector P in a flat 2-dimensional space shown below.

In terms of the X coordinate system the contravariant components of P are (x^1, x^2) and the covariant components are (x_1, x_2). We've also shown another set of coordinate axes, denoted by ξ, defined such that ξ1 is perpendicular to X2, and ξ2 is perpendicular to X1. In terms of these alternate coordinates the contravariant components of P are (ξ^1, ξ^2) and the covariant components are (ξ_1, ξ_2). The symbol θ signifies the angle between the two positive axes X1, X2, and the symbol θ′ denotes the angle between the axes ξ1 and ξ2. These angles satisfy the relations θ + θ′ = π and ω = (θ′ - θ)/2. We also have

x_1 = cos(ω) ξ^1          x_2 = cos(ω) ξ^2          ξ_1 = cos(ω) x^1          ξ_2 = cos(ω) x^2
which shows that the covariant components with respect to the X coordinates are the same, up to a scale factor of cos(ω), as the contravariant components with respect to the ξ coordinates, and vice versa. For this reason the two coordinate systems are called "duals" of each other. Making use of the additional relations

the squared length of P can be expressed in terms of any of these sets of components as follows:

In general the squared length of an arbitrary vector on a (flat) 2-dimensional surface can be given in terms of the contravariant components by an expression of the form

(s)^2 = g11 (x^1)^2 + g12 x^1 x^2 + g21 x^2 x^1 + g22 (x^2)^2
where the coefficients guv are the components of the covariant metric tensor. This tensor is always symmetrical, meaning that guv = gvu, so there are really only three independent elements for a two-dimensional manifold. With Einstein's summation convention we can express the preceding equation more succinctly as

(s)^2 = guv x^u x^v
From the preceding formulas we can see that the covariant metric tensor for the X coordinate system in Figure 2 is

guv  =  [ 1         cos(θ) ]
        [ cos(θ)    1      ]
whereas for the dual coordinate system ξ the covariant metric tensor is

guv  =  [ 1          cos(θ′) ]  =  [ 1          -cos(θ) ]
        [ cos(θ′)    1       ]     [ -cos(θ)    1       ]
noting that cos(θ′) = -cos(θ). The determinant g of each of these matrices is sin(θ)^2, so we can express the relationship between the dual systems of coordinates as

We will find that the inverse of the metric tensor is also very useful, so let's use the superscripted symbol g^uv to denote the inverse of a given g_uv. The inverse metric tensors for the X and Ξ coordinate systems are
g^uv(X) = (1/sin(θ)²) | 1         −cos(θ) |          g^uv(Ξ) = (1/sin(θ)²) | 1        cos(θ) |
                      | −cos(θ)   1       |                                | cos(θ)   1      |
Comparing the left-hand matrix with the previous expression for s² in terms of the covariant components, we see that
s² = g^uv x_u x_v
so the inverse of the covariant metric tensor is indeed the contravariant metric tensor. Now let's consider a vector x whose contravariant components relative to the X axes of Figure 2 are x¹, x², and let's multiply this by the covariant metric tensor as follows:
g_uv x^u = x_v
Remember that summation is implied over the repeated index u, whereas the index v appears only once (in any given product) so this expression applies for any value of v. Thus the expression represents the two equations
x_1 = g_11 x¹ + g_21 x²
x_2 = g_12 x¹ + g_22 x²
If we carry out this multiplication we find
x_1 = x¹ + cos(θ) x²
x_2 = cos(θ) x¹ + x²
which agrees with the previously stated relations between the covariant and contravariant components, noting that sin(θ) = cos(ω). If we perform the inverse operation, multiplying these covariant components by the contravariant metric tensor, we recover the original contravariant components, i.e., we have
g^uv x_u = x^v
Hence we can convert from the contravariant to the covariant versions of a given vector simply by multiplying by the covariant metric tensor, and we can convert back simply by multiplying by the inverse of the metric tensor. These operations are called "raising and lowering of indices", because they convert x from a superscripted to a subscripted variable, or vice versa. In this way we can also create mixed tensors, i.e., tensors that are contravariant in some of their indices and covariant in others.
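These raising and lowering operations are easy to check numerically. The following is a minimal sketch in Python with NumPy (the oblique angle θ = 60° and the component values are illustrative choices, not taken from the text):

```python
import numpy as np

theta = np.pi / 3                       # angle between the oblique X1, X2 axes (illustrative)
g = np.array([[1.0, np.cos(theta)],     # covariant metric g_uv for the oblique coordinates
              [np.cos(theta), 1.0]])
g_inv = np.linalg.inv(g)                # contravariant metric g^uv

x_contra = np.array([2.0, 1.0])         # contravariant components x^u of a vector
x_co = g @ x_contra                     # lower the index: x_v = g_uv x^u
x_back = g_inv @ x_co                   # raise it again:  x^v = g^uv x_u

s2 = x_contra @ g @ x_contra            # squared length from contravariant components
s2_alt = x_co @ g_inv @ x_co            # same squared length from covariant components
print(x_back, s2, s2_alt)
```

Raising after lowering recovers the original components, and both versions of the squared-length formula agree, as does the mixed form x_u x^u.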

It's worth noting that, since x_u = g_uv x^v, we have
s² = g_uv x^u x^v = x_u x^u
Many other useful relations can be expressed in this way. For example, the angle θ between two vectors a and b is given by
cos(θ) = g_uv a^u b^v / [ (g_rs a^r a^s)^(1/2) (g_mn b^m b^n)^(1/2) ]
These techniques immediately generalize to any number of dimensions, and to tensors with any number of indices, including "mixed tensors" that have some contravariant and some covariant indices. In addition, we need not restrict ourselves to flat spaces or coordinate systems whose metrics are constant (as in the above examples). Of course, if the metric is variable then we can no longer express finite interval lengths in terms of finite component differences. However, the above distance formulas still apply, provided we express them in differential form, i.e., the incremental distance ds along a path is related to the incremental components dx^j according to
(ds)² = g_ij dx^i dx^j
so we need to integrate this over a given path to determine the length of the path. These are exactly the formulas used in 4-dimensional spacetime to determine the spatial and temporal "distances" between events in general relativity.
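The path-length integration with a position-dependent metric can be illustrated with a short numerical sketch (Python with NumPy; the sphere metric in colatitude/longitude coordinates and the chosen path are illustrative examples, not from the text). It integrates ds along a circle of constant colatitude and compares the result with the known circumference:

```python
import numpy as np

R = 2.0               # sphere radius (illustrative)
theta0 = np.pi / 4    # colatitude of the chosen circle

# On a sphere, (ds)^2 = R^2 (dtheta)^2 + (R sin(theta))^2 (dphi)^2, so the
# metric component multiplying dphi^2 varies with position.
def speed(theta, dtheta_dt, dphi_dt):
    return np.sqrt(R**2 * dtheta_dt**2 + (R * np.sin(theta))**2 * dphi_dt**2)

# Path: constant colatitude theta0, with phi(t) = t for t in [0, 2*pi]
t = np.linspace(0.0, 2.0 * np.pi, 20001)
integrand = speed(theta0, 0.0, 1.0) * np.ones_like(t)

# Trapezoidal integration of ds/dt over the parameter interval
length = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
print(length, 2.0 * np.pi * R * np.sin(theta0))  # numeric length vs exact circumference
```

The integrated length matches the exact circumference 2πR sin(θ₀) of that circle.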

For any given index we could generalize the idea of contravariance and covariance to include mixtures of these two qualities in a single index. This is not ordinarily done, but it is possible. Recall that the contravariant components are measured parallel to the coordinate axes, and the covariant components are measured normal to all the other axes. These are the two extreme cases, but we could define components with respect to directions that make a fixed angle relative to the coordinate axes and normals. The transformation rule for such representations is more complicated than either (6) or (8), but each component can be resolved into sub-components that are either purely contravariant or purely covariant, so these two extreme cases suffice to express all transformation characteristics of tensors.

5.3 Curvature, Intrinsic and Extrinsic

Thus we are led to a remarkable theorem (Theorem Egregium): If a curved surface is developed upon any other surface whatever, the measure of curvature in each point remains unchanged. C. F. Gauss, 1827

The extrinsic curvature κ of a plane curve at a given point on the curve is defined as the derivative of the curve's tangent angle with respect to position on the curve at that point. In other words, if θ(s) denotes the angle which the curve makes with some fixed reference axis as a function of the path length s along the curve, then κ = dθ/ds. In terms of orthogonal and naturally scaled coordinates X,Y we have tan(θ) = dY/dX. If the X axis is tangent to the curve at the point in question, then tan(θ) approaches θ and dX approaches ds, so in terms of such tangent normal coordinates the curvature can equivalently be defined as simply the second derivative, κ = d²Y/dX².

One way of specifying a plane curve is by giving a function Y = f(X) where X and Y are naturally scaled orthogonal coordinates. Natural scaling means (ds)² = (dX)² + (dY)², so we have ds/dX = [1 + (dY/dX)²]^(1/2). The curvature can easily be determined by directly evaluating the derivative dθ/ds as follows
κ = (d²Y/dX²) / [1 + (dY/dX)²]^(3/2)
Likewise if the curve is specified parametrically by the functions X(t) and Y(t) for some arbitrary path parameter t, we have ds/dt = (X_t² + Y_t²)^(1/2) where subscripts denote derivatives, and the curvature is
κ = (X_t Y_tt − Y_t X_tt) / (X_t² + Y_t²)^(3/2)
Although these derivations are quite simple and satisfactory for the case of plane curves, it's worthwhile to examine both of them more closely to clarify the application to higher dimensional cases, where it is more convenient to use the definition of curvature based on the second derivative with respect to tangent coordinates. First, let's return to the case where the plane curve was specified by an explicit function Y = f(X) for naturally scaled orthogonal coordinates X,Y. Expanding this function into a power series (up to second order) about the point of interest, we have constants A,B,C such that Y = AX² + BX + C. The constant C is just a simple displacement, so it's irrelevant to the shape of the curve. Thus we need only consider the curve Y = AX² + BX. If B is non-zero this curve is not tangent to the X axis at the origin. To remedy this we can consider the curve with respect to a rotated system of coordinates x,y, related to the original coordinates by the transformation equations
X = x cos(θ) + y sin(θ)     Y = −x sin(θ) + y cos(θ)
Substituting these expressions for X and Y into the equation Y = AX² + BX and re-arranging terms gives
y = [A cos(θ)² x² + (B cos(θ) + sin(θ)) x] / [cos(θ) − B sin(θ) − A sin(θ)(2x cos(θ) + y sin(θ))]
If we select an angle θ such that the coefficient of the linear term in the numerator vanishes, i.e., if we set Bcos(θ) + sin(θ) = 0 by putting θ = invtan(−B), then the numerator is purely second order. If we then expand the denominator into a power series in x and y, the product of this series with the numerator yields just a constant times the numerator plus terms of third and higher order in x and y. Hence the non-constant terms in the denominator are insignificant up to second order, so the denominator is effectively just equal to the constant term. Inserting the value of θ into the above equation gives
y = A x² / (1 + B²)^(3/2)
The curvature κ at the origin is just the second derivative, so we have
κ = 2A / (1 + B²)^(3/2) = f_XX / (1 + f_X²)^(3/2)
where subscripts denote derivatives, and we have used the facts that, for the original function f(X) at the origin, we have f_X = B and f_XX = 2A. This shows how we can arrive (somewhat laboriously) at our previous result by using the "second derivative" definition of curvature and an explicitly defined curve Y = f(X).

A plane curve can be expressed parametrically as a function of the path length s by the functions x(s), y(s). Since (ds)² = (dx)² + (dy)², it follows that x_s² + y_s² = 1 (where again subscripts denote derivatives). The vector (x_s, y_s) is tangent to the curve, so the perpendicular vector (−y_s, x_s) is normal to the curve. The vector (x_ss, y_ss) represents the rate of change of the tangent direction of the curve with respect to s. Recall that the curvature of a line in the plane is defined as the rate of change of the angle of the curve as a function of distance along the curve, but since tan(θ) approaches θ to the second order as θ goes to zero, we can just as well define curvature as the rate of change of the tangent. Noting that y_s = (1 − x_s²)^(1/2) we have y_ss = −x_s x_ss/(1 − x_s²)^(1/2) and hence y_s y_ss = −x_s x_ss. Thus we have y_ss/x_ss = −x_s/y_s, which means the vector (x_ss, y_ss) is perpendicular to the curve. The magnitude of this vector is |κ| = (x_ss² + y_ss²)^(1/2), and we can define the signed magnitude as the dot product of (x_ss, y_ss) with the vector (−y_s, x_s), normalized to the length of this vector, which happens to be (x_s² + y_s²)^(1/2) = 1. This gives the signed curvature
κ = x_s y_ss − y_s x_ss
The center of curvature of the curve at the point (x,y) is at the point (x − y_s/κ, y + x_s/κ). To illustrate, a circle of radius R centered at the origin can be expressed by the parametric equations x(s) = Rcos(s/R) and y(s) = Rsin(s/R), and the first derivatives are x_s = −sin(s/R) and y_s = cos(s/R). The second derivatives are x_ss = −(1/R)cos(s/R) and y_ss = −(1/R)sin(s/R). From this we have the magnitude of the curvature |κ| = 1/R and the signed curvature κ = +1/R. The sign is based on the path direction being positive in the counter-clockwise direction. The center of curvature for every point on this curve is the origin (0,0).

The preceding parametric derivation was based on the path length parameter s, but we can also define a curve in terms of an arbitrary parameter t, not necessarily the path length. In this case we have the functions x(t), y(t), and s(t). Again we have (ds)² = (dx)² + (dy)², so the derivatives of these three functions are related by x_t² + y_t² = s_t². We also have x_s = x_t/s_t and y_s = y_t/s_t, and the second derivatives are
x_ss = (x_tt s_t − x_t s_tt) / s_t³
and the similar expression for y_ss. Substituting into the previous formula for the signed curvature we get
κ = (x_t y_tt − y_t x_tt) / s_t³ = (x_t y_tt − y_t x_tt) / (x_t² + y_t²)^(3/2)
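This parametric curvature formula can be checked with a short numerical sketch (Python with NumPy; the circle and parabola test cases are illustrative choices, not from the text):

```python
import numpy as np

def signed_curvature(xt, yt, xtt, ytt):
    # kappa = (x_t y_tt - y_t x_tt) / (x_t^2 + y_t^2)^(3/2)
    return (xt * ytt - yt * xtt) / (xt**2 + yt**2) ** 1.5

# Circle of radius R traversed counter-clockwise: x = R cos(t), y = R sin(t)
R, t = 3.0, 0.7
k_circle = signed_curvature(-R * np.sin(t), R * np.cos(t),
                            -R * np.cos(t), -R * np.sin(t))

# Parabola y = x^2 parameterized as x = t, y = t^2, evaluated at t = 0
k_parabola = signed_curvature(1.0, 0.0, 0.0, 2.0)
print(k_circle, k_parabola)
```

The circle gives the constant signed curvature +1/R at every point, and the parabola gives curvature 2 at its vertex, agreeing with κ = d²Y/dX² in tangent coordinates.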
The techniques described above for determining the curvature of plane curves can be used to determine the sectional curvatures of a two-dimensional surface embedded in three-dimensional space. Notice that the general power series expansion of a curve defined by the function f(x) is f(x) = c₀ + c₁x + c₂x² + c₃x³ + ..., but by choosing coordinates so that the curve passes through the origin tangent to the x axis at the point in question we can arrange to make c₀ = c₁ = 0, so the expansion of the curve about this point can be put into the form f(x) = c₂x² + c₃x³ + ... Also, since the 2nd derivative is f''(x) = 2c₂ + 6c₃x + ..., evaluating this at x = 0 gives simply f''(0) = 2c₂, so it's clear that only the 2nd-order term is significant in determining the curvature with respect to tangent normal coordinates, i.e., it is sufficient to represent the curve as f(x) = ax². Similarly if we consider the extrinsic curvature of a cross-section of a two-dimensional surface in space, we see that at any given point on the surface we can construct an orthogonal "xyz" coordinate system such that the xy plane is tangent to the surface and the z axis is perpendicular to the surface at that point. In general the equation of our surface can be expanded about this point into a polynomial giving the "height" z as a function of x and y. As in the one-dimensional case, the constant and 1st-order terms of this polynomial will be zero (because we defined our coordinates tangent to the surface with the origin at the point in question), and the 3rd and higher order terms don't affect the second derivative at the origin, so we can represent our surface by just the 2nd-order terms of the expansion, i.e.,
z = ax² + bxy + cy²
The second (partial) derivatives of this function with respect to x and y are 2a and 2c respectively, so these numbers give us some information about the curvature of the surface. However, we'd really like to know the curvature of the surface evaluated in any direction, not just in the x and y directions. (Note that the tangency condition uniquely determines the direction of the z axis, but the x and y axes can be rotated anywhere in the tangent plane.)

In general we can evaluate the curvature of the surface in the direction of the line y = qx for any constant q. The equation of the surface in this direction is simply
z = ax² + bqx² + cq²x² = (a + bq + cq²)x²
but of course we want to evaluate the derivatives with respect to changes along this direction, rather than changes in the pure x direction. Parametrically the distance along the tangent plane in the y = qx direction is s² = x² + y² = (1 + q²)x², so we can substitute for x² in the preceding equation to give the value of f as a function of the distance s
f = [(a + bq + cq²)/(1 + q²)] s²
The second derivative of this function gives the extrinsic curvature κ(q) of the surface in the "q" direction:
κ(q) = 2(a + bq + cq²)/(1 + q²)
Now we might ask what directions give the extreme (min and max) curvatures. Setting the derivative of κ(q) to zero gives the result
q² − 2mq − 1 = 0
where m = (c − a)/b. Since the constant term of this quadratic is −1 it follows that the product of the two roots of this equation is −1, which means that each of them is the negative reciprocal of the other, so the lines of min and max curvature are of the form y = qx and y = (−1/q)x, which shows that the two directions are perpendicular.

Substituting the two "extreme" values of q into the equation for κ(q) gives (see the Appendix for details) the two "principal curvatures" of the surface
κ₁, κ₂ = (a + c) ± [(a − c)² + b²]^(1/2)
The product of these two is called the "Gaussian curvature" of the surface at that point, and is given by
K = κ₁κ₂ = 4ac − b²
which of course is just the (negative) discriminant of the quadratic form ax² + bxy + cy². For the surface of a sphere of radius R this quantity equals 1/R² (as derived in the Appendix).

Another measure of the curvature of a surface is called the "mean curvature", which, as the name suggests, is the mean value of the curvature over all possible directions. Since we want to give all the directions equal weight, we insert tan(θ) for q in the equation for κ(q) and then integrate over θ, giving the mean curvature
κ_mean = (1/π) ∫₀^π κ(tan(θ)) dθ = a + c
(Of course, we could also infer this mean value directly as the average of κ₁ and κ₂ since κ is symmetrically distributed.) Notice that the mean curvature occurs along two perpendicular directions, and these are oriented at 45 degrees relative to the "principal" directions. This can be verified by setting the derivative of the product κ(q)κ(−1/q) to zero and noting that the resulting quartic in q factors into two quadratics, one giving the two principal directions, and the other giving the directions of the mean curvature. (The product κ(q)κ(−1/q) is obviously a maximum when both terms have the mean value, and a minimum when the terms have their extreme values.)
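The principal and mean curvatures of z = ax² + bxy + cy² can be checked numerically. In the sketch below (Python with NumPy; the coefficient values are illustrative), the directional curvature κ(q) is sampled over many directions, and its extremes are compared with the eigenvalues of the Hessian matrix [[2a, b],[b, 2c]], an equivalent way of obtaining the principal curvatures:

```python
import numpy as np

# Surface z = a x^2 + b x y + c y^2 tangent to the xy plane at the origin
a, b, c = 1.0, 0.5, -2.0

# kappa(q) = 2 (a + b q + c q^2) / (1 + q^2), sampled over all directions
q = np.tan(np.linspace(-np.pi / 2 + 1e-4, np.pi / 2 - 1e-4, 400001))
kq = 2.0 * (a + b * q + c * q**2) / (1.0 + q**2)

# Eigenvalues of the Hessian give the same extremes (ascending order)
k1, k2 = np.linalg.eigvalsh(np.array([[2 * a, b], [b, 2 * c]]))

print(kq.min(), k1)                 # minimum sampled curvature vs smaller eigenvalue
print(kq.max(), k2)                 # maximum sampled curvature vs larger eigenvalue
print(k1 * k2, 4 * a * c - b**2)    # Gaussian curvature
print((k1 + k2) / 2, a + c)         # mean curvature
```

The product of the eigenvalues reproduces the Gaussian curvature 4ac − b², and their average reproduces the mean curvature a + c.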

Examples of surfaces with constant Gaussian curvature are the sphere, the plane, and the pseudosphere, which have positive, zero, and negative curvature respectively. (Negative Gaussian curvature signifies that the two principal curvatures have opposite signs, meaning the surface has a "saddle" shape.) Surfaces with vanishing mean curvature are called "minimal surfaces", and represent the kinds of surfaces that are formed by soap films. For many years the only complete and non-self-intersecting minimal surfaces known were the plane, the catenoid, and the helicoid, but recently an infinite family of such minimal surfaces was discovered.

The above discussion was based on extrinsic properties of surfaces, i.e., measuring the rate of deviation between one surface and another. However, we can also look at curvature from an intrinsic standpoint, in terms of the relations between points within the surface itself. For example, if we were confined to the surface of a sphere of radius R, we would find that the ratio Q of the circumference to the "radius" of a circle as measured on the surface of the sphere would not be constant but would depend on the circle's radius r according to the relation Q = 2π(R/r)sin(r/R). Evaluating the second derivative of Q with respect to r in the limit as r goes to zero we have
d²Q/dr² → −2π/(3R²)    as r → 0
Thus we can infer the radius of our sphere entirely from local measurements over a small region of the surface. The results of such local measurements of intrinsic distances on a surface can be encapsulated in the form of a "metric tensor" relative to any chosen system of coordinates on the surface.
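This intrinsic measurement can be simulated numerically. The sketch below (Python with NumPy; the radius value and the finite-difference scheme are illustrative choices) evaluates the second derivative of Q at a small radius and inverts the relation d²Q/dr² → −2π/(3R²) to recover the sphere's radius:

```python
import numpy as np

R = 5.0   # sphere radius (illustrative)

def Q(r):
    # Ratio of circumference to intrinsic radius for a circle of
    # geodesic radius r drawn on the surface of a sphere of radius R
    return 2.0 * np.pi * (R / r) * np.sin(r / R)

# Second derivative of Q at a small radius, by central differences
r0, h = 1e-2, 1e-3
d2Q = (Q(r0 + h) - 2.0 * Q(r0) + Q(r0 - h)) / h**2

# Invert d2Q -> -2*pi/(3 R^2) to infer the radius from local measurements
R_inferred = np.sqrt(-2.0 * np.pi / (3.0 * d2Q))
print(R_inferred)
```

The inferred radius agrees with the actual radius to high precision, using only measurements of circumferences and radii within the surface.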

In general, any two-dimensional surface embedded in three-dimensional space can be represented over a sufficiently small region by an expression of the form Z = f(X,Y) where X,Y,Z are orthogonal coordinates. The expansion of this function up to second order is
Z = AX² + BXY + CY² + DX + EY                (1)
where A,B,...,E are constants. If the coefficients D and E are zero, the surface is tangent to the XY plane, and we can immediately compute the Gaussian curvature and the metric tensor as discussed previously. However, if D and E are not zero, we need to rotate our coordinates so that the XY plane is tangent to the surface. To accomplish this we can apply the usual Euler rotation matrix for a rotation through an angle φ about the Z axis followed by a rotation through an angle ψ about the (new) X axis. Thus we have a new system of orthogonal coordinates x,y,z related to the original coordinates by

Making these substitutions for X,Y, and Z in (1) gives the equation of the surface in terms of the rotated coordinates. The coefficients of the linear terms in x and y in this transformed equation are

D cos(φ) − E sin(φ)  and  D sin(φ)cos(ψ) + E cos(φ)cos(ψ) + sin(ψ) respectively. To make these coefficients vanish we must set
tan(φ) = D/E     tan(ψ) = −[D sin(φ) + E cos(φ)]
Substituting these angles into the full expression gives

The cross-product terms involving xz, yz, and z² have been omitted, because if we bring these over to the left side and factor out z, we can then divide both sides by the factor (k₁ + k₂x + k₃y + k₄z), and the power series expansion of this, multiplied by the second-order terms in x and y, gives just a constant times those terms, plus terms of third and higher order in x, y, and z, which do not affect the curvature at the origin. Therefore, the second-order terms involving z drop out, and we're left with the above quadratic for z. This describes a surface tangent to the xy plane at the origin, i.e., a surface of the form z = ax² + bxy + cy², and the curvature of such a surface equals 4ac − b² at the origin, so the curvature of the above surface at the origin is
K = (4AC − B²)/(1 + D² + E²)²
Remember that we began with a surface defined by the function Z = f(X,Y), and from equation (1) we see that the partial derivatives of the function f with respect to X and Y at the origin are
f_X = D     f_Y = E     f_XX = 2A     f_XY = B     f_YY = 2C
Consequently, the equation for the curvature of the surface can be written as
K = (f_XX f_YY − f_XY²)/(1 + f_X² + f_Y²)²
In addition, if we take the differentials of both sides of (1) we have
dZ = (2AX + BY + D)dX + (BX + 2CY + E)dY
Inserting this for dZ into the metrical expression (ds)² = (dX)² + (dY)² + (dZ)² gives the metric at the origin on the surface with respect to the XY coordinates projected onto the surface:

(ds)² = (1 + D²)(dX)² + 2DE dX dY + (1 + E²)(dY)²

so the metric components are g_XX = 1 + D², g_XY = g_YX = DE, and g_YY = 1 + E².
Thus the curvature of the surface can also be written in the form
K = (f_XX f_YY − f_XY²)/g²
where g = g_XX g_YY − g_XY². The quantities in the numerator of the right hand expression are the coefficients of the "second groundform" of the surface, and the metric line element is called the first groundform. Hence the curvature is simply the ratio of the determinants of the two groundforms.
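The explicit-function curvature formula K = (f_XX f_YY − f_XY²)/(1 + f_X² + f_Y²)² lends itself to a direct numerical check. The sketch below (Python with NumPy; the test surfaces and the finite-difference scheme are illustrative choices, not from the text) evaluates it for a hemisphere and for a saddle:

```python
import numpy as np

def gauss_curvature(f, x, y, h=1e-4):
    # K = (f_XX f_YY - f_XY^2) / (1 + f_X^2 + f_Y^2)^2, with the
    # partial derivatives estimated by central differences of step h
    fX  = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fY  = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fXX = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fYY = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fXY = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return (fXX * fYY - fXY**2) / (1 + fX**2 + fY**2) ** 2

R = 2.0
hemisphere = lambda x, y: np.sqrt(R**2 - x**2 - y**2)
print(gauss_curvature(hemisphere, 0.3, -0.4))   # constant curvature 1/R^2 everywhere

saddle = lambda x, y: x * y
print(gauss_curvature(saddle, 0.0, 0.0))        # negative curvature at the origin
```

The hemisphere gives K = 1/R² at every point (not just at the pole where the xy plane is tangent), and the saddle z = xy gives K = −1 at the origin, its two principal curvatures there being +1 and −1.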

The preceding was based on treating the 2D surface embedded in 3D space defined by giving Z explicitly as a function of X and Y. This is analogous to our treatment of curves in the plane based on giving Y as an explicit function of X. However, we found that a more general and symmetrical expression for the curvature of a plane curve results from defining the curve parametrically, i.e., giving x(u) and y(u) as functions of an arbitrary path parameter u. Similarly we can define a 2D surface in 3D space by giving x(u,v), y(u,v) and z(u,v) as functions of two arbitrary coordinates on the surface. From the Euclidean metric of the embedding 3D space we have

where subscripts denote partial derivatives. We also have the total differentials
dx = x_u du + x_v dv     dy = y_u du + y_v dv     dz = z_u du + z_v dv
which can be substituted into the basic 3D Euclidean metric (ds)² = (dx)² + (dy)² + (dz)² to give the 2D metric of the surface with respect to the arbitrary surface coordinates u,v
(ds)² = g_uu (du)² + 2g_uv du dv + g_vv (dv)²
where
g_uu = x_u² + y_u² + z_u²     g_uv = x_u x_v + y_u y_v + z_u z_v     g_vv = x_v² + y_v² + z_v²
The space-vectors [x_u, y_u, z_u] and [x_v, y_v, z_v] are tangent to the surface and point along the u and v directions, respectively, so the cross-product of these two vectors is a vector normal to the surface
[y_u z_v − z_u y_v,  z_u x_v − x_u z_v,  x_u y_v − y_u x_v]
whose magnitude is
(g_uu g_vv − g_uv²)^(1/2) = g^(1/2)
The space-vectors [x_uu, y_uu, z_uu] and [x_vv, y_vv, z_vv] represent the rates of change of the tangent vectors to the surface along the u and v directions, and the vector [x_uv, y_uv, z_uv] represents the rate of change of the u tangent with respect to v, and vice versa. Thus if we take the dot products of each of these vectors with the unit vector normal to the surface, we get the signed coefficients of an expression for the surface of the pure quadratic form h(u,v) = au² + buv + cv², where h can be regarded as the height above the tangent plane at the origin, and the three scaled triple products correspond to h_uu = 2a, h_uv = b, and h_vv = 2c.

If u and v were projections of orthogonal coordinates (as were x and y in our prior discussion), the determinant of the surface metric at the origin would be 1, and the curvature would simply be 4ac − b². However, in general we allow u and v to be any surface coordinates, not necessarily orthogonal, and not necessarily scaled to equal the path length along constant coordinate lines. Given orthogonal metrically scaled tangent coordinates X,Y, there exist coefficients A,B,C such that the height h above the tangent plane is h(X,Y) = AX² + BXY + CY², and the curvature K at the origin is simply 4AC − B². Also, for points sufficiently near the origin we have
X = X_u u + X_v v     Y = Y_u u + Y_v v
Substituting these expressions into h(X,Y) gives h(u,v) = au² + buv + cv² where
a = A X_u² + B X_u Y_u + C Y_u²
b = 2A X_u X_v + B(X_u Y_v + X_v Y_u) + 2C Y_u Y_v
c = A X_v² + B X_v Y_v + C Y_v²
With these coefficients we find
4ac − b² = (4AC − B²)(X_u Y_v − X_v Y_u)²
In addition, we know that the surface is asymptotic to the tangent plane at the origin, so the metric in terms of X,Y is simply (ds)2 = (dX)2 + (dY)2. Substituting the expressions for dX and dY in terms of du and dv, the metric at the origin in terms of the u,v coordinates is
(ds)² = (X_u² + Y_u²)(du)² + 2(X_u X_v + Y_u Y_v)du dv + (X_v² + Y_v²)(dv)²
From this we have the determinant of the metric
g = (X_u Y_v − X_v Y_u)²
This shows that the intrinsic curvature K is related to the quantity 4acb2 by the equation
K = (4ac − b²)/g
We saw previously that the coefficients 2a, b, 2c are given by triple vector products divided by the normalizing factor g^(1/2). Writing out the triple products in determinant form, we have
2a = (1/g^(1/2)) det | x_uu  y_uu  z_uu ;  x_u  y_u  z_u ;  x_v  y_v  z_v |
b  = (1/g^(1/2)) det | x_uv  y_uv  z_uv ;  x_u  y_u  z_u ;  x_v  y_v  z_v |
2c = (1/g^(1/2)) det | x_vv  y_vv  z_vv ;  x_u  y_u  z_u ;  x_v  y_v  z_v |
Therefore the Gaussian curvature is given by
K = (4ac − b²)/g = (1/g²) [ det | x_uu y_uu z_uu ; x_u y_u z_u ; x_v y_v z_v | · det | x_vv y_vv z_vv ; x_u y_u z_u ; x_v y_v z_v |
                            − det | x_uv y_uv z_uv ; x_u y_u z_u ; x_v y_v z_v |² ]
Recalling that the determinant of the transpose of a matrix is the same as that of the matrix itself, we can transpose the second factor in each determinant product to give the equivalent expression

The determinant of a product of matrices is the same as the product of the determinants of those matrices, so we can carry out the matrix multiplications inside the determinant symbols. The first product of determinants can therefore be written as the single determinant

Notice that several of the entries in this matrix can be expressed purely in terms of the components g_uu, g_uv, and g_vv of the metric tensor and their partial derivatives, so we can write this determinant as

In a similar manner we can expand the second product of determinants into a single determinant and express most of the resulting components in terms of the metric to give

The curvature is just 1/g² times the difference between these two determinants. In both cases we have been able to express all the matrix components in terms of the metric, with the exception of the upper-left entries. However, notice that the cofactors of these two entries in their respective matrices are identical (namely g), so when we take the difference of these determinants the upper-left entries both appear simply as multiples of g. Thus we need only consider the difference of these two entries, which can indeed be written purely in terms of the metric coefficients and their derivatives as follows
(x_uu x_vv + y_uu y_vv + z_uu z_vv) − (x_uv² + y_uv² + z_uv²) = −(1/2)∂²g_uu/∂v² + ∂²g_uv/∂u∂v − (1/2)∂²g_vv/∂u²
Consequently, we can express the Gaussian curvature K entirely in terms of the intrinsic metric with respect to arbitrary two-dimensional coordinates on the surface, as follows

This formula was first presented by Gauss in his famous paper "Disquisitiones generales circa superficies curvas" (General Investigations of Curved Surfaces), published in 1827. Gauss regarded this result as quite remarkable (egregium in Latin), so it is commonly known as the Theorema Egregium. The reason for Gauss' enthusiasm is that this formula proves the Gaussian curvature of a surface is indeed intrinsic, i.e., it is not dependent on the embedding of the surface in a higher dimensional space. Operating entirely within the surface we can lay out arbitrary curvilinear coordinates u,v, and then determine the metric coefficients (and their derivatives) with respect to those coordinates, and from this information alone we can compute the intrinsic curvature of the surface. The Gaussian curvature K is defined as the product of the two principal extrinsic sectional curvatures κ₁ and κ₂, neither of which is an intrinsic metrical property of the surface, but the product of these two numbers is an intrinsic metrical property.
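Gauss's intrinsic formula can be checked numerically. The sketch below (Python with NumPy; the test metrics are illustrative choices, not from the text) uses one standard determinant arrangement of the intrinsic formula, often called the Brioschi form, which computes K from the metric components E = g_uu, F = g_uv, G = g_vv alone, with their partial derivatives estimated by finite differences:

```python
import numpy as np

H = 1e-4  # finite-difference step

def d_u(f, u, v):  return (f(u + H, v) - f(u - H, v)) / (2 * H)
def d_v(f, u, v):  return (f(u, v + H) - f(u, v - H)) / (2 * H)
def d_uu(f, u, v): return (f(u + H, v) - 2 * f(u, v) + f(u - H, v)) / H**2
def d_vv(f, u, v): return (f(u, v + H) - 2 * f(u, v) + f(u, v - H)) / H**2
def d_uv(f, u, v): return (f(u + H, v + H) - f(u + H, v - H)
                           - f(u - H, v + H) + f(u - H, v - H)) / (4 * H**2)

def gauss_K(E, F, G, u, v):
    # Brioschi determinant form of Gauss's intrinsic curvature formula:
    # K depends only on E = g_uu, F = g_uv, G = g_vv and their derivatives.
    M1 = np.array([
        [-0.5 * d_vv(E, u, v) + d_uv(F, u, v) - 0.5 * d_uu(G, u, v),
         0.5 * d_u(E, u, v), d_u(F, u, v) - 0.5 * d_v(E, u, v)],
        [d_v(F, u, v) - 0.5 * d_u(G, u, v), E(u, v), F(u, v)],
        [0.5 * d_v(G, u, v), F(u, v), G(u, v)]])
    M2 = np.array([
        [0.0, 0.5 * d_v(E, u, v), 0.5 * d_u(G, u, v)],
        [0.5 * d_v(E, u, v), E(u, v), F(u, v)],
        [0.5 * d_u(G, u, v), F(u, v), G(u, v)]])
    g = E(u, v) * G(u, v) - F(u, v) ** 2
    return (np.linalg.det(M1) - np.linalg.det(M2)) / g**2

# Metric of a sphere of radius R in colatitude/longitude coordinates (u, v):
R = 2.0
E = lambda u, v: R**2
F = lambda u, v: 0.0
G = lambda u, v: (R * np.sin(u)) ** 2
print(gauss_K(E, F, G, 0.8, 0.3))   # should approximate 1/R^2 = 0.25
```

Feeding in only the intrinsic metric of a sphere recovers K = 1/R², with no reference to the embedding, which is exactly the content of the Theorema Egregium.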

In Section 5.7 the full Riemann curvature tensor R_abcd for manifolds of any number of dimensions is defined, and we show that Gauss' surface curvature K is equal to R_uvuv/g, which completely characterizes the curvature of a two-dimensional surface. To highlight the correspondence between Gauss' formula and the full curvature tensor, we can re-write the above formula as

where we have used the facts that g_uu/g = g^vv, g_vv/g = g^uu, and g_uv/g = −g^uv. Notice that if we define the symbol

for any three indices a,b,c, then Gauss' formula for the curvature of a surface can be written more succinctly as

No summations are implied here, but to abbreviate the notation even further, we could designate the symbols α and β as "wild card" indices, with implied summation of every term in which they appear over all possible indices (i.e., over u and v). On this basis the formula is

As discussed in Section 5.7, this is precisely the formula for the component R_uvuv of the full Riemann curvature tensor in n dimensions, which makes it clear how directly Gauss' result for two-dimensional surfaces generalizes to n dimensions. Naturally this formula for K reduces to κ₁κ₂ = 4ac − b², where κ₁ and κ₂ are the two principal extrinsic curvatures relative to a flat plane tangent to the surface at the point of interest. The reason this formula is so complicated is that it applies to any system of coordinates (rather than just projected tangent normal coordinates), and is based entirely on the intrinsic properties of the surface.

To illustrate this approach, consider the two-dimensional surface defined as the locus of points at a height h above the xy plane, where h is given by the equation
h = ax² + bxy + cy²
with arbitrary constants a, b, and c. For example, with a=c=0 and b=1 this gives the simple surface h = xy shown below:

For other values of a,b,c this surface can have various shapes, such as paraboloids. The function h(x,y) is single-valued over the entire xy plane, so it's convenient to simply project the xy grid onto the surface and use this as our coordinates on the surface. (Any other system of curvilinear coordinates would serve just as well.)

Over a sufficiently small interval on this surface the distance ds along a path is related to the incremental changes dx, dy, and dz according to the usual Pythagorean relation
(ds)² = (dx)² + (dy)² + (dz)²
Also the equation of the surface allows us to express the increment dz in terms of dx and dy as follows
dz = (2ax + by)dx + (bx + 2cy)dy
Therefore we have
(dz)² = (2ax + by)²(dx)² + 2(2ax + by)(bx + 2cy)dx dy + (bx + 2cy)²(dy)²
Substituting this into the equation for the line element (ds)² gives the basic metrical equation of the surface
(ds)² = g_xx(dx)² + 2g_xy dx dy + g_yy(dy)²
where the components of the "metric tensor" are
g_xx = 1 + (2ax + by)²     g_xy = g_yx = (2ax + by)(bx + 2cy)     g_yy = 1 + (bx + 2cy)²
We can, in principle, directly measure the incremental distance ds for any given increments dx and dy without ever leaving the surface, so the metric components are purely intrinsic properties of the surface. In general the metric tensor is a symmetric covariant tensor of second order, and is usually written in the form of a matrix. Thus, for our simple example we can write the metric as
g_uv =  | 1 + (2ax + by)²         (2ax + by)(bx + 2cy) |
        | (2ax + by)(bx + 2cy)    1 + (bx + 2cy)²      |
The determinant of this matrix at the point (x,y) is
g = 1 + (2ax + by)² + (bx + 2cy)²
The inverse of the metric tensor is denoted by g^uv, where the superscripts are still indices, not exponents. In our example the inverse metric tensor is
g^uv = (1/g) | 1 + (bx + 2cy)²           −(2ax + by)(bx + 2cy) |
             | −(2ax + by)(bx + 2cy)     1 + (2ax + by)²       |
Substituting these metric components into the general formula for the Gaussian curvature K gives
K = (4ac − b²) / [1 + (2ax + by)² + (bx + 2cy)²]²
in agreement with our earlier result for surfaces specified explicitly in the form z = f(x,y). At the origin, where x = y = 0, this gives K = 4ac − b², i.e., the product of the two principal extrinsic curvatures. In addition, the formula gives the Gaussian curvature for any point on the surface, so we don't have to go to the trouble of laboriously constructing a tangent plane at each point and finding the quadratic expansion of the surface about that point.
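The resulting curvature function is simple enough to evaluate directly; here is a minimal sketch in Python (the saddle h = xy and the sample points are illustrative choices):

```python
def K_intrinsic(a, b, c, x, y):
    # Gaussian curvature of the surface h = a x^2 + b x y + c y^2 at (x, y),
    # using the metric determinant g = 1 + (2ax + by)^2 + (bx + 2cy)^2
    g = 1 + (2 * a * x + b * y) ** 2 + (b * x + 2 * c * y) ** 2
    return (4 * a * c - b * b) / g**2

# Saddle h = xy (a = c = 0, b = 1): the curvature is -1 at the origin
# and falls off toward zero away from it, while remaining negative.
print(K_intrinsic(0, 1, 0, 0.0, 0.0))   # -1.0
print(K_intrinsic(0, 1, 0, 1.0, 1.0))
```

At (1,1) the metric determinant of the saddle is 3, so the curvature there is −1/9, illustrating how the curvature of this surface weakens away from the origin without changing sign.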

We can see from this formula that the curvature at every point of this simple two-dimensional surface always has the same sign as the discriminant 4ac − b². Also, the shape of the constant-curvature lines on this surface can be determined by re-arranging the terms of the above equation, from which we find that the curvature equals K on the locus of points satisfying the equation
(2ax + by)² + (bx + 2cy)² = [(4ac − b²)/K]^(1/2) − 1
This is the equation of a conic with discriminant +4(4ac − b²)². The case of zero curvature occurs only when the discriminant vanishes, which implies that b = ±2(ac)^(1/2), and so the equation of the surface factors as
h = [a^(1/2) x ± c^(1/2) y]²
The quantity inside the parentheses is a planar function, so the surface is a parabolic "valley", which has no intrinsic curvature (like the walls of a cylinder).

It follows from the preceding conic equation that the lines of constant curvature (if there is any curvature) must be ellipses centered on the origin. However, this is not the most general form of curvature possible on a two-dimensional surface, it's just the most general form for a surface embedded in three-dimensional Euclidean space. Suppose we embed our two-dimensional surface in four dimensional Euclidean space. We can still, at any given point on the surface, construct a two-dimensional tangent plane with orthogonal xy coordinates, and expand the equation of the surface up to second degree about that point, but now instead of just a single perpendicular height h(x,y) we allow two mutually perpendicular heights, which we may call h1(x,y) and h2(x,y). Our surface can now be defined (in the neighborhood of the origin at the point of tangency) by the equations
h₁ = a₁x² + b₁xy + c₁y²     h₂ = a₂x² + b₂xy + c₂y²
Following the same procedure as before, determining the components of the metric tensor for this surface and plugging them into Gauss's formula, we find that the intrinsic curvature of this surface is

where

The lines of constant curvature on this surface can be much more diverse than for a surface embedded in just three dimensional space. As an example, if we define the surface with the equations

then the lines of constant curvature are as indicated in the figure below.

We have focused on two-dimensional surfaces in this section, but the basic idea of intrinsic curvature remains essentially the same in any number of dimensions. We'll see in subsequent sections that Riemann generalized Gauss's notion of intrinsic curvature by noting that any two (distinct) directional rays emanating from a given point P, if continued geodesically and with parallel transport (both of which we will discuss in detail), single out a two-dimensional surface within the manifold, and we can determine the "sectional" curvature of that surface in the same way as described in this section. Of course, in a manifold of three or more dimensions there are infinitely many two-dimensional surfaces passing through any given point, but Riemann showed how to encode enough information about the manifold at each point so that we can compute the sectional curvature on any surface.

For spaces of n > 2 dimensions, we can proceed in essentially the same way, by imagining a flat n-dimensional Euclidean space tangent to the space at the point of interest, with a Cartesian coordinate system, and then evaluating how the curved space deviates from the flat space into another set of n(n−1)/2 orthogonal dimensions, one for each pair of dimensions in the flat tangent space. This is obviously just a generalization of our approach for n = 2 dimensions, when we considered a flat 2D space with Cartesian coordinates x,y tangent to the surface, and described the curved surface in the region around the tangent point in terms of the "height" h(x,y) perpendicular to the surface. Since we have chosen a flat baseline space tangent to the curved surface, it follows that the constant and first-order terms of h(x,y) are zero. Also, since we are not interested in any derivatives higher than the second, we can neglect all terms of h(x,y) above second order. Consequently we can express h(x,y) as a homogeneous second-order expression, i.e.,

We saw that embedding a curved 2D surface in four dimensions allows even more freedom for the shape of the surface, but in the limit as the region becomes smaller and smaller, the surface approaches a single height. Similarly for a space of three dimensions we can imagine a flat three-dimensional space with x,y,z Cartesian coordinates tangent to the curved space, and consider three perpendicular "heights" h1(x,y), h2(x,z), and h3(y,z).

There are obvious similarities between intrinsic curvature and ordinary spatial rotations, neither of which are possible in a space of just one dimension, and both of which are - in a sense - inherently two-dimensional phenomena, even when they exist in a space of more than two dimensions. Another similarity is the non-commutativity exhibited by rotations as well as by translations on a curved surface. In fact, we could define curvature as the degree to which translations along two given directions do not commute. The reason for this behavior is closely connected to the fact that rotations in space are non-commutative, as can be seen most clearly by imagining a curved surface embedded in a higher dimensional space, and noting that the translations on the surface actually involve rotations, i.e., angular displacements in the embedding space. Hence it's inevitable that such displacements don't commute.
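As a concrete numerical illustration of this non-commutativity (a sketch of our own, not part of the original argument, using NumPy with arbitrary angle choices), we can compose finite rotations about two different axes in both orders and compare the results:

```python
import numpy as np

def rot_x(a):
    """Rotation matrix about the x axis through angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def rot_y(a):
    """Rotation matrix about the y axis through angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

a = 0.5                         # half a radian about each axis
AB = rot_x(a) @ rot_y(a)        # rotate about y first, then about x
BA = rot_y(a) @ rot_x(a)        # rotate about x first, then about y

print(np.allclose(AB, BA))      # False: the order of rotations matters
print(np.linalg.norm(AB - BA))  # nonzero "residue" of the commutator
```

For small angles the difference shrinks like the square of the angle, which is why non-commutativity, like curvature, only shows up at second order.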

5.4 Relatively Straight

There’s some end at last for the man who follows a path; mere rambling is interminable.

Seneca, 60 AD

The principle of relativity, as expressed in Newton's first law of motion (and carried over essentially unchanged into Einstein's special theory of relativity) is based on the idea of uniform motion in a straight line. However, the terms "uniform motion" and "straight line" are not as easy to define as one might think. Historically, it was usually just assumed that such things exist, and that we know them when we see them. Admittedly there were attempts to describe these concepts, but mainly in somewhat vague and often circular ways. For example, Euclid tells us that "a line is breadthless length", and "a straight line is a line which lies evenly with the points on itself". The precise literal interpretation of these statements can be debated, but they seem to have been modeled on an earlier definition given by Plato, who said a straight line is "that of which the middle covers the ends". This in turn may have been based on Parmenides' saying that "straight is whatever has its middle directly between the ends".

Each of these definitions relies on some pre-existing idea of straightness to give meaning to such terms as "lying evenly" or "directly between", so they are immediately self-referential. Other early attempts to define straightness invoked visual alignment, on the presumption that light travels in a straight line. Of course, we could simply define straightness to be congruence with a path of light, but such an empirical definition would obviously preclude asking whether, in fact, light necessarily travels in straight lines as defined in some more abstract sense. Not surprisingly, thinkers like Plato and Euclid, who wished to keep geometry and mechanics strictly separate, preferred a purely abstract a priori definition of straightness, without appealing (explicitly) to any physical phenomena. Unfortunately, their attempts to provide a meaningful conceptual definition were not particularly successful.

Aristotle noted that among all possible lines connecting two given points, the straight line is the one with the shortest length, and Archimedes suggested that this property could be taken as the definition of a straight line. This at least has the merit of relating two potentially distinct concepts, straightness and length, and even gives us a way of quantifying which of two lines (i.e., curves) connecting two points is "straighter", simply by comparing their lengths, without explicitly invoking the straightness of anything else. Furthermore, this definition can be applied in a more general context, such as on the surface of the Earth, where the straightest (shortest) path between two points is an arc of a great circle, which is typically not congruent to a visual line of sight. We saw in Chapter 3.5 that Hero based his explanation of optical reflection on the hypothesis that light travels along the shortest possible path. This is a nice example of how an a priori conceptual definition of straightness led to a non-trivial physical theory about the behavior of light, which obviously would have been precluded if there had been no conception of straightness other than that it corresponds to the paths of light.

We've also seen how Fermat refined this principle of straightness to involve the variable of time, related to spatial distances by what he intuited was an invariant characteristic speed of light. Similarly the principle of least action, popularized by Maupertuis and Euler, represented the application of stationary paths in various phase spaces (i.e., the abstract space whose coordinates are the free variables describing the state of a system), but for actual geometrical space (and time) the old Euclidean concept of extrinsic straightness continued to predominate, both in mathematics and in physics. Even in the special theory of relativity Einstein relied on the intuitive Euclidean concept of straightness, although he was dissatisfied with this approach, and believed that the true principle of relativity should be based on the more profound Archimedean concept of straight lines as paths with extremal lengths. In a sense, this could be regarded as relativizing the concept of straightness, i.e., rather than seeking absolute extrinsic straightness, we focus instead on relative straightness of neighboring paths, and declare the extremum of the available paths to be "straight", or rather "as straight as possible".

In addition, Einstein was motivated by the classical idea of Copernicus that we should not regard our own particular frame of reference (or any other frame of reference) as special or preferred for the laws of physics. It ought to be possible to express the laws of physics in such a way that they apply to any system of coordinates, regardless of their state of motion. The special theory succeeds in this for all uniformly moving systems of coordinates (although with the epistemological shortcoming noted above), but Einstein sought a more general theory of relativity encompassing coordinate systems in any state of motion and avoiding the circular definition of straightness.

We've noted that Archimedes suggested defining a straight line as the shortest path between two points, but how can we determine which of the infinitely many paths from any given point to another is the shortest? Let us imagine any arbitrary path through three-dimensional space from the point P1 at (x1,y1,z1) to the point P2 at (x2,y2,z2). We can completely describe this path by assigning a smooth monotonic parameter λ to the points of the path, such that λ = 0 at P1 and λ = 1 at P2, and then specifying the values of x(λ), y(λ), and z(λ) as functions of λ. The total length S of the path can be found from the functions x(λ), y(λ), and z(λ) by integrating the differential distances all along the path as follows

Now suppose we let δx(λ), δy(λ), and δz(λ) denote three arbitrary functions of λ, representing some deviation from the nominal path, and consider the resulting "disturbed path" described by the functions

X(x() + x() Y() = y() + y() Z() = z() + z() where  is a parameter that we can vary to apply different fractions of the disturbance.

For any fixed value of the parameter ε the distance along the path from P1 to P2 is given by

Our objective is to find functions x(λ), y(λ), z(λ) such that for any arbitrary disturbance functions δx, δy, δz, the value of S(ε) is minimized at ε = 0. Those functions will then describe the “straightest” path from P1 to P2.

To find the minimal value of S(ε) we differentiate with respect to ε. It's legitimate to perform this differentiation inside the integral, so (omitting the indications of functional dependencies) we can write

We can evaluate the derivatives with respect to λ based on the definitions of X, Y, Z as follows

Therefore, the derivatives of these with respect to ε are simply

Substituting these expressions into the previous equation gives

We want this quantity to equal zero when ε equals 0. Of course, in that case we have X = x, Y = y, and Z = z, so we make these substitutions and then require that the above integral vanish. Thus, letting dots denote differentiation with respect to λ, we have

Using "integration by parts" we can evaluate this integral, term by term. For example, considering just the x component in the numerator, we can use the "parts" variables

and then the usual formula for integration by parts gives

The first term on the right-hand side automatically vanishes, because by definition the disturbance components δx, δy, δz are all zero at the end-points of the path. Applying the same technique to the other components, we arrive at the following expression for the overall integral which we wish to set to zero

The coefficients of the three terms in the integrand are the disturbance functions δx, δy, δz, which are allowed to take on any arbitrary values between λ = 0 and λ = 1. Regardless of the values of these three disturbance components, we require the integral to vanish. This is a very strong requirement, and can only be met by setting each of the three derivatives in parentheses to zero, i.e., it requires

This implies that the arguments of these three derivatives do not change as a function of the path parameter, so they have constant values all along the path. Thus we have

The numerators of these expressions can be regarded as the x, y, and z components, respectively, of the "rate" of motion (per λ) along the path, whereas the denominators represent the total magnitude of the motion. Thus, these conditions tell us that the components of motion along the path are in a constant ratio to each other, which means that the direction of motion is constant, i.e., a straight line. So, to reach from P1 to P2, the constants must be given by Cx = (x2 − x1)/D, Cy = (y2 − y1)/D, and Cz = (z2 − z1)/D, where D is the total distance given by D² = (x2 − x1)² + (y2 − y1)² + (z2 − z1)². Given an initial trajectory, the entire path is determined by the assumption that it proceeds from point to point always by the shortest possible route.
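This conclusion can also be checked numerically. The following sketch (our own discretization, not from the text; the endpoints and the disturbance are arbitrary choices) represents a path as a polyline and repeatedly replaces each interior point by the midpoint of its neighbors, a relaxation that can only shorten the path; the fixed point of this process is the straight segment, whose length is the distance D:

```python
import numpy as np

# Endpoints P1 and P2 (arbitrary choices for the demo).
P1 = np.array([0.0, 0.0, 0.0])
P2 = np.array([1.0, 2.0, 3.0])

# Start from a deliberately crooked path: straight line plus a bump
# that vanishes at the endpoints (the "disturbance" of the text).
n = 21
lam = np.linspace(0.0, 1.0, n)              # path parameter, 0 at P1, 1 at P2
path = np.outer(1 - lam, P1) + np.outer(lam, P2)
path[:, 1] += np.sin(np.pi * lam)

def length(p):
    """Total polyline length: sum of the segment lengths."""
    return np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1))

# Relax: each interior point moves to the midpoint of its neighbors.
# Every step can only shorten the path; the limit is a straight line.
for _ in range(2000):
    path[1:-1] = 0.5 * (path[:-2] + path[2:])

straight = np.linalg.norm(P2 - P1)
print(length(path), straight)   # the relaxed length approaches |P2 - P1|
```

The two printed values agree to many decimal places, confirming that the shortest path between the two points is the straight segment joining them.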

So far we have focused on finding the geodesic paths in ordinary Euclidean three-dimensional space, and found that they correspond to our usual notion of straight lines. However, in a space with a different metric, the shapes of geodesic paths can be more complicated. To determine the general equations for geodesic paths, let us first formalize the preceding "variational" technique. In general, suppose we wish to determine a function x(λ) from λ1 to λ2 such that the integral of some function F(λ, x, ẋ) along that path is stationary. (As before, dots signify derivatives with respect to λ.) We again define an arbitrary disturbance δx(λ) and the disturbed function X(λ,ε) = x(λ) + ε δx(λ), where ε is a parameter that determines how much of the disturbance is to be applied. We wish to make stationary the integral

This is done by differentiating S with respect to the parameter ε as follows

Substituting for dX/dε and dẊ/dε gives

We want to set this quantity to zero when ε = 0, which implies X = x, so we require

The integral of the second term in parentheses can be evaluated (using integration by parts) as

The first term on the right-hand side is identically zero (since the disturbance is defined to be zero at the end points), so we can substitute the second term back into the preceding equation and factor out the disturbance δx(λ) to give

Again, since this equation must be satisfied for every possible (smooth) disturbance function δx(λ), it requires that the quantity in parentheses vanish identically, so we arrive at the Euler equation

which is the basis for solving a wide variety of problems in the calculus of variations.
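SymPy can carry out this derivation mechanically. As a sketch (the choice of integrand is our own), we apply its Euler–Lagrange helper to the arc-length integrand F = √(1 + ẋ²) for a plane curve; the resulting Euler equation is satisfied by exactly the linear functions, i.e., straight lines:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

lam = sp.Symbol('lamda')        # the path parameter
x = sp.Function('x')(lam)
xdot = x.diff(lam)

# Arc-length integrand for a plane curve described by x(lamda).
F = sp.sqrt(1 + xdot**2)

# euler_equations returns the Euler equation dF/dx - d/dlam(dF/dxdot) = 0.
eq = euler_equations(F, [x], [lam])[0]
print(eq)

# Any linear function x = a*lam + b satisfies the Euler equation,
# confirming that straight lines make the arc length stationary.
a, b = sp.symbols('a b')
residual = eq.lhs.subs(x, a*lam + b).doit()
print(sp.simplify(residual))    # 0
```

Substituting a non-linear trial function (e.g. x = λ²) into the same equation leaves a non-zero residual, so the linear solutions are not an accident of the algebra.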

The application of Euler's equation that most interests us is in finding the general equation of the straightest possible path in an arbitrary smooth manifold with a defined metric. In this case the function whose integral we wish to make stationary is the absolute spacetime interval, defined by the metric equation

where, as usual, summation is implied over repeated indices. Multiplying the right side by (dλ/dλ)² and taking the square root of both sides gives the differential "distance" ds along a path parameterized by λ. Integrating along the path from λ1 to λ2 gives the distance to be made stationary

For each individual coordinate x this can be treated as a variational problem with the function

where again dots signify differentiation with respect to λ. (Incidentally, the metric need not be positive-definite, since we can always choose our sign convention so that the squared intervals in question are positive, provided we never integrate along a path for which the squared interval changes sign, which would represent changing from timelike to spacelike, or vice versa, in relativity.) Therefore, we can apply Euler's equation to immediately give the equations of geodesic paths on the surface with the specified metric

For an n-dimensional space this represents n equations, one for each of the coordinates x1, x2, ..., xn. Letting w = (ds/dλ)² = F² = gμν ẋμ ẋν, this can be written as

To simplify these equations, let us put the parameter λ equal to the integrated path length s, so that we have w = 1 and dw/dλ = 0. The right-most term drops out, and we're left with

Notice that even though w equals a constant 1 in these circumstances and the total derivative vanishes, the partial derivatives do not necessarily vanish. Indeed, if we substitute gμν ẋμ ẋν for w into this equation we get

Evaluating the derivative in the left-hand term and dividing through by 2, this gives

At this point it's conventional to make use of the identity

(where we have simply swapped the μ and ν indices) to represent the middle term of the preceding equation as half the sum of these two expressions. This enables us to write the geodesic equations in the form

where the symbol [μν,σ] is defined as

These are called connection coefficients, also known as Christoffel symbols of the first kind. Finally, if we multiply through by the contravariant metric tensor, we have

where

are known as Christoffel symbols of the second kind.

As an example, consider the simple two-dimensional surface h = ax² + bxy + cy² discussed in Chapter 5.3. Using the metric tensor, its inverse, and partial derivatives we can now directly compute the Christoffel symbols, from which we can give explicit parametric equations for the geodesic paths on our surface:

If we scale and rotate the coordinates so that the surface height has the form h = xy/R, the geodesic equations reduce to

These equations show that if either dx/ds or dy/ds equals zero, the second derivatives of x and y with respect to s must be zero, so lines of constant x and lines of constant y are geodesics (as expected, since these are straight lines in space). Of course, given an initial trajectory that is not parallel to either the x or y axis the resulting geodesic path on this surface will be curved, and can be explicitly computed from the above formulas.
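These statements are easy to verify symbolically. The following sketch (our own code; the function and variable names are arbitrary) builds the induced metric of the surface h = xy/R embedded in Euclidean 3-space and computes the Christoffel symbols of the second kind from the metric, confirming that the only non-zero symbols are the ones coupling dx/ds with dy/ds:

```python
import sympy as sp

x, y, R = sp.symbols('x y R', real=True, positive=True)
h = x*y/R                       # the saddle surface from the text

# Induced metric of the embedding (x, y, h(x, y)) in Euclidean 3-space.
hx, hy = sp.diff(h, x), sp.diff(h, y)
g = sp.Matrix([[1 + hx**2, hx*hy],
               [hx*hy, 1 + hy**2]])
ginv = g.inv()
coords = [x, y]

def christoffel(k, i, j):
    """Christoffel symbol of the second kind, Gamma^k_ij."""
    return sp.simplify(sp.Rational(1, 2) * sum(
        ginv[k, l] * (sp.diff(g[j, l], coords[i])
                      + sp.diff(g[i, l], coords[j])
                      - sp.diff(g[i, j], coords[l]))
        for l in range(2)))

# Print the non-zero symbols; all others vanish identically.
for k in range(2):
    for i in range(2):
        for j in range(2):
            G = christoffel(k, i, j)
            if G != 0:
                print(f"Gamma^{coords[k]}_{coords[i]}{coords[j]} = {G}")
```

The surviving symbols are Γˣ_xy = y/(R² + x² + y²) and Γʸ_xy = x/(R² + x² + y²) (plus their mirror images in the symmetric lower indices), so a trajectory with dx/ds = 0 or dy/ds = 0 has vanishing second derivatives, exactly as stated above.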

5.5 The Schwarzschild Metric From Kepler's 3rd Law

In that same year [1665] I began to think of gravity extending to the orb of the Moon & from Kepler’s rule of the periodical times of the Planets being in sesquialterate proportion of their distances from the centers of their Orbs, I deduced that the forces which keep the Planets in their Orbs must be reciprocally as the squares of their distances from the centers about which they revolve: and thereby compared the force requisite to keep the Moon in her Orb with the force of gravity at the surface of the earth, and found them answer pretty nearly.

Isaac Newton

The simplest non-trivial configuration of spacetime in which gravity plays a role is for the region surrounding a static mass point, for which we can assume that the metric has perfect spherical symmetry and is independent of time. Historically this was first found by Karl Schwarzschild in 1916 as a solution of Einstein’s field equations (see Section 6.1), and all the original empirical tests of general relativity can be inferred from this solution. However, even without knowing the field equations of general relativity, it is possible to give a very plausible (if not entirely rigorous) derivation of the Schwarzschild metric purely from knowledge of the inverse square characteristic of gravity, Kepler’s third law for circular orbits, and the null intervals of light paths.

Let r denote the radial spatial coordinate, so that every point on a surface of constant r has the same intrinsic geometry and the same relation to the mass point, which we fix at r = 0. Also, let t denote our temporal coordinate. Any surface of constant r and t must possess the two-dimensional intrinsic geometry of a 2-sphere, and we can scale the radial parameter r such that the area of this surface is 4πr². (Notice that since the space may not be Euclidean, we don't claim that r is "the radial distance" from the mass point. Rather, at this stage r is simply an arbitrary radial coordinate scaled to give the familiar Euclidean surface area.) With this scaling, we can parameterize the two-dimensional surface at any given r (and t) by means of the ordinary "longitude and latitude" spherical metric

where dS is the incremental distance on the surface of an ordinary sphere of radius r corresponding to the incremental coordinate displacements dθ and dφ. The coordinate θ represents "latitude", with θ = 0 at the north pole and θ = π/2 at the equator. The coordinate φ represents the longitude relative to some arbitrary meridian.

On this basis, we can say that the complete spacetime metric near a spherically symmetrical mass m must be of the form

where gθθ = r², gφφ = r²sin²θ, and gtt and grr are (as yet) unknown functions of r and the central mass m. Of course, if we set m = 0 the functions gtt and grr must both equal 1 in order to give the flat Minkowski metric (in polar form), and we also expect that as r increases to infinity these functions both approach 1, regardless of m, since we expect the metric to approach flatness sufficiently far from the gravitating mass.

This metric is diagonal, so the non-zero components of the contravariant metric tensor are simply the reciprocals of the corresponding covariant components. In addition, the diagonality of the metric allows us to simplify the definition of the Christoffel symbols to

Now, the only non-zero partial derivatives of the metric coefficients are

along with gtt/dr and grr/dr, which are yet to be determined. Inserting these values into the preceding equation, we find that the only non-zero Christoffel symbols are

These are the coefficients of the four geodesic equations near a spherically symmetrical mass. Writing them out in full, we have

We assume that, in the absence of non-gravitational forces, all natural motions (including light rays and massive particles) follow geodesic paths, so these equations provide a complete description of inertial/gravitational motions of test particles in a spherically symmetrical field. All that remains is to determine the metric coefficients gtt and grr.
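As a check on this step (a sketch of our own, assuming a (+,−,−,−) sign convention; the names g_tt and g_rr are placeholders), we can let SymPy compute every Christoffel symbol of the second kind for the diagonal metric above, with gtt and grr left as undetermined functions of r:

```python
import sympy as sp

t, r, th, ph = sp.symbols('t r theta phi')
gtt = sp.Function('g_tt')(r)
grr = sp.Function('g_rr')(r)

# Diagonal spherically symmetric metric, signature (+,-,-,-).
coords = [t, r, th, ph]
g = sp.diag(gtt, -grr, -r**2, -r**2 * sp.sin(th)**2)
ginv = g.inv()

def christoffel(k, i, j):
    """Gamma^k_ij, using the simplified formula for a diagonal metric."""
    return sp.simplify(sp.Rational(1, 2) * ginv[k, k] * (
        sp.diff(g[j, k], coords[i]) + sp.diff(g[i, k], coords[j])
        - sp.diff(g[i, j], coords[k])))

# Collect the non-zero symbols (up to symmetry in the lower indices).
nonzero = {(k, i, j): christoffel(k, i, j)
           for k in range(4) for i in range(4) for j in range(i, 4)
           if christoffel(k, i, j) != 0}
for key, val in sorted(nonzero.items()):
    print(key, val)
```

Only nine symbols survive: those built from gtt, grr, and their radial derivatives, plus the purely angular terms of the 2-sphere, matching the list asserted in the text.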

We expect that one possible solution should be circular Keplerian orbits, i.e., if we regard r as corresponding (at least approximately) to the Newtonian radial distance from the center of the mass, then there should be a circular geodesic path at constant r that revolves around the central mass m with an angular velocity of ω, and these quantities must be related (at least approximately) in accord with Kepler's third law

(The original deductions of an inverse-square law of gravitation by Hooke, Wren, Newton, and others were all based on this same empirical law. See Section 8.1 for a discussion of the origin of Kepler's law.) If we consider purely circular motion on the equatorial plane (θ = π/2) at constant r, the metric reduces to

and since dr/dτ = 0 the geodesic equations for these circular paths reduce to

Multiplying through by (dτ/dt)² and identifying the angular speed ω with the derivative of φ with respect to the coordinate time t, the right hand equation becomes

For consistency with Kepler's Third Law we must have ω² equal (or very nearly equal) to m/r³, so we make this substitution to give

Integrating this equation, we find that the metric coefficient gtt must be of the form k − 2m/r, where k is a constant of integration. Since gtt must equal 1 when m = 0 and/or as r approaches infinity, it's clear that k = 1, so we have

Also, for a photon moving away from the gravitating mass in the purely radial direction, we have dθ = dφ = 0, and so our basic metric for a purely radial ray of light gives

Next we consider a stationary test particle at a radial coordinate r. The metric equation gives the line element for the worldline of this test particle

and we also have the radial geodesic equation for this particle

The left hand side is the acceleration of gravity d²r/dτ² in geometrical units, which is taken to be the inverse square expression −m/r². Inserting this expression and substituting from equations (1) and (4), we get

This implies the “perpendicular” factorization gtt = dr/dt and grr = dt/dr in equation (3), and hence grr = 1/gtt, so we have the complete Schwarzschild metric

from which nearly all of the experimentally accessible consequences of general relativity follow.

In matrix form the Schwarzschild metric is written as

Now that we've determined gtt and grr, we have the partials

so the Christoffel symbols that we previously left undetermined are

Therefore, the complete set of geodesic equations for the Schwarzschild metric are

These are all parametric equations, where λ denotes a parameter that varies monotonically along the path. When dealing with massive particles, which travel at sub-light speeds, we must choose λ proportional to τ, the integrated lapse of proper time along the path. On the other hand, the lapse of proper time along the path of a massless particle (such as a photon) is zero by definition, so this raises an interesting question: How is it possible to extremize the “length” of a path whose length is identically zero? Even though the path of a photon has singular proper time, the path is not singular in all respects, so we can still parameterize the path by simply assigning monotonic values of λ to the points on the path. (Notice that, since geodesics are directionally symmetrical, it doesn’t matter whether λ is increasing or decreasing in the direction of travel.) An alternative approach to solving for light-like geodesics, based on Fermat’s principle of least time, will be discussed in Section 8.4.
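As a numerical sketch (our own RK4 integrator, in geometric units G = c = 1; the mass, orbital radius, and step size are arbitrary choices), we can integrate the equatorial Schwarzschild geodesic equations and confirm that a circular orbit obeying Kepler's relation ω² = m/r³ in coordinate time holds its radius:

```python
import numpy as np

m = 1.0   # central mass in geometric units (G = c = 1)

def geodesic_rhs(s):
    """Equatorial (theta = pi/2) Schwarzschild geodesic equations.
    State s = (t, r, phi, tdot, rdot, phidot), dots = d/dtau."""
    t, r, phi, td, rd, pd = s
    f = r - 2.0 * m
    return np.array([
        td, rd, pd,
        -2.0 * m / (r * f) * rd * td,                             # t''
        -m * f / r**3 * td**2 + m / (r * f) * rd**2 + f * pd**2,  # r''
        -2.0 / r * rd * pd,                                       # phi''
    ])

def rk4_step(s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = geodesic_rhs(s)
    k2 = geodesic_rhs(s + 0.5 * h * k1)
    k3 = geodesic_rhs(s + 0.5 * h * k2)
    k4 = geodesic_rhs(s + h * k3)
    return s + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Circular orbit at r = 10m, with omega^2 = m/r^3 (Kepler's law in
# coordinate time) and tdot fixed by the proper-time normalization.
r0 = 10.0 * m
omega = np.sqrt(m / r0**3)
tdot0 = 1.0 / np.sqrt(1.0 - 2.0 * m / r0 - r0**2 * omega**2)
state = np.array([0.0, r0, 0.0, tdot0, 0.0, omega * tdot0])

for _ in range(10000):
    state = rk4_step(state, 0.1)

print(state[1])   # the radius holds at 10.0 to high accuracy
```

Over many orbits the radial coordinate and the proper-time normalization of the four-velocity are both preserved to within the integrator's truncation error, which is a useful sanity check on the signs of the Christoffel terms.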

To show that the equations of motion derived above (taking τ as the parameter λ) correspond to those of Newtonian gravity in the weak slow limit, we need only note that the scale factor between r and t is so great that we can neglect any terms that have a factor of dr/dt unless that term is also divided by r, in which case the scale factor cancels out. Also we can assume that dt/dτ is essentially equal to 1, and it's easy to see that if the motion of a test particle is initially in the plane θ = π/2 then it remains always in that plane, and by spherical symmetry this applies to all planes. So we can assume θ = π/2, and with the stated approximations the equations of motion reduce to the familiar Newtonian equations

where ω is the angular velocity.

It’s worth noting that there is a certain ambiguity in the application of Kepler's third law and the inverse square law as heuristic guides to the equations of motion as described above, due to the distinction between coordinate time t and proper time τ. Newtonian physics didn't distinguish between these two, which is not surprising, since the two are practically indistinguishable in weak gravitational fields for objects moving at much less than the speed of light. Nevertheless, the slight deviation between these two time parameters has observable consequences, and provides important tests for distinguishing between the spacetime geodesic approach and the Newtonian force-at-a-distance approach to gravitation. In our derivation we assumed that Kepler's third law is exactly satisfied with respect to coordinate time t rather than to the proper time τ of the orbiting particle (i.e., we defined the angular speed of the orbit as dφ/dt rather than dφ/dτ), whereas we assumed that the simple inverse-square acceleration law is satisfied with respect to the proper time τ of the falling particle. Thus, without some rationale for why Kepler’s law for circular orbits should have its simplest expression in terms of coordinate time (i.e., the time coordinate in terms of which the metric is stationary) while the radial acceleration of a stationary particle should have its simplest expression in terms of proper time, the derivation is not free of ambiguity. In fact, had we assumed Kepler’s law applies in terms of proper time, by defining ω as dφ/dτ instead of dφ/dt, we would have gotten

and the negative inverse of this for grr. These coefficients give the same Newtonian limit as the Schwarzschild metric, differing from the latter only in the second order of m/r, but of course their behavior is drastically different when m/r becomes large. For example, unlike the Schwarzschild metric coefficients, they do not exhibit a coordinate singularity at r = 2m. From an empirical standpoint, this alternative metric would give the same gravitational red shift and the same deflection of light as does general relativity, since those effects depend only on the first order terms. However, the precession of elliptical orbits depends on a second-order term in grr, and this alternative metric gives just one half of the correct value for the precession. (In terms of the “Robertson-Eddington” parameters this metric has α = γ = 1 and β = 5/2.) This shows the importance of the field equations for providing a sound basis for the metric – and it also shows the importance of orbital precession as a test to discriminate between alternative metrical theories of gravity.

5.6 The Equivalence Principle

The important thing is this: to be able at any moment to sacrifice what we are for what we could become.

Charles Du Bos

At the end of a review article on special relativity in 1907, in which he surveyed the stunning range and power of the relativity principle, Einstein included a section discussing the possibility of extending the idea still further.

So far we have applied the principle of relativity, i.e., the assumption that physical laws are independent of the state of motion of the reference system, only to unaccelerated reference systems. Is it conceivable that the principle of relativity also applies to systems that are accelerated relative to each other?

This might have been regarded as merely a kinematic question, with no new physical content, since we can obviously re-formulate physical laws to make them applicable in terms of alternative systems of coordinates. However, as Einstein later recalled, the thought occurred to him while writing this paper that a person in gravitational free-fall doesn’t feel their own weight. It’s as if the gravitational field does not exist. This is remarkably similar to Galileo’s realization (three centuries earlier) that, for a person in uniform motion, it is as if the motion does not exist. Interestingly, Galileo is also closely associated with the fact that a (homogeneous) gravitational field can be “transformed away” by a state of motion, because he was among the first to explicitly recognize the equality of inertial and gravitational mass. As a consequence of this equality, the free-fall path of a small test particle in a gravitational field is independent of the particle's composition. If we consider two coordinate systems S1 and S2, the first accelerating (in empty space) at a rate α in the x direction, and the second at rest in a homogeneous gravitational field that imparts to all objects an acceleration of −α in the x direction, then Einstein observed that

…as far as we know, the physical laws with respect to the S1 system do not differ from those with respect to the S2 system… we shall therefore assume the complete physical equivalence of a gravitational field and a corresponding acceleration of the reference system.

This was the beginning of Einstein’s search for an extension of the principle of relativity to arbitrary coordinate systems, and for a satisfactory relativistic theory of gravity, a search which ultimately led him to reject special relativity as a suitable framework in which to formulate the most fundamental physical laws.

Despite the importance that Einstein attached to the equivalence principle (even stating that the general theory of relativity “rests exclusively on this principle”), many subsequent authors have challenged its significance, and even its validity. For example, Ohanian and Ruffini (1994) emphatically assert that “gravitational effects are not equivalent to the effects arising from an observer's acceleration...", even when limited to sufficiently small regions. In support of this assertion they describe how accelerometers “of arbitrarily small size” can detect tidal variations in a non-homogeneous gravitational field based on “local” measurements. Unfortunately they overlook the significance of their own comment regarding gradiometers, i.e., “the sensitivity attained depends on the integration time… with a typical integration time of 10 seconds the sensitivity demonstrated in a recent test was about the same as that of the Eotvos balance…”. Needless to say, the “locality” restriction refers to sufficiently small regions of spacetime, not just to small regions of space. The gradiometer may be only a fraction of a meter in spatial extent, but 10 seconds of temporal extent corresponds to three billion meters, which somewhat undermines the claim that the detection can be performed with such accuracy in an arbitrarily small region of spacetime.

The same kind of conceptual error appears in every example that purports to show the invalidity of the equivalence principle. For example, one well-known modern author points out that an arbitrarily small droplet of liquid falling freely in the gravitational field of a spherical body (neglecting surface tension and wind resistance, etc) will not be perfectly spherical, but will be slightly ellipsoidal, due to the tidal effects of the inhomogeneous field… and the shape does not approach sphericity as the radius of the droplet approaches zero. Furthermore, this applies to an arbitrarily brief “snapshot” of the falling droplet. He takes this to be proof of the falsity of the equivalence principle, whereas in fact it is just the opposite. If we began with a perfectly spherical droplet, it would take a significant amount of time traversing an inhomogeneous field for the shape to acquire its final ellipsoidal form, and as the length of time goes to zero, the deviation from sphericity also goes to zero. Likewise, once the droplet has acquired its ellipsoidal shape, this becomes its initial configuration upon entering any brief and small region of spacetime, and of course it exits from that region with the same shape, in perfect agreement with the equivalence principle, which tells us to expect all the parts of the droplet to maintain their initial mutual relations when in free fall.

Other authors have challenged the validity of the equivalence principle by considering the effects of rotation. Of course, a "sufficiently small" region of spacetime for transforming away the translatory motion of an object to some degree of approximation may not be sufficiently small for transforming away the rotational motion to the same degree of accuracy, but this does not conflict with the equivalence principle; it simply means that for an infinitesimal particle in a rotating body the "sufficiently small" region of spacetime is generally much smaller than for a particle in a non-rotating body, because it must be limited to a small arc of angular travel. In general, all such arguments against the validity of the (local) equivalence principle are misguided, based on a failure to correctly limit the extent of the subject region of space and time.

Others have argued that, although the equivalence principle is valid for infinitesimal regions of spacetime, this limitation renders it more or less meaningless. But this was answered by Einstein himself several times. For example, when the validity of the equivalence principle was challenged on the grounds that an arbitrary (inhomogeneous) gravitational field over some finite region cannot be “transformed away” by any single state of motion, Einstein replied

To achieve the essential equivalence of inertia and gravitation it is not necessary that the mechanical behavior of two or more masses must be explainable by the mere effect of inertia by the same choice of coordinates. After all, nobody denies, for example, that the theory of special relativity does justice to the nature of uniform motion, even though it cannot transform all acceleration-free bodies together to a state of rest by one and the same choice of coordinates.

This observation should have settled the matter, but unfortunately the same specious objection to the equivalence principle has been raised by successive generations of critics. This is ironic, considering that a purely geometrical interpretation of gravity would clearly be impossible if gravitational and inertial acceleration were not intrinsically identical. The meaning of the equivalence principle (which Einstein called “the happiest thought of my life”) is that gravitation is not something that exists within spacetime, but is rather an attribute of spacetime. Inertial motion is just a special case of free fall in a gravitational field. There is no additional entity or coupling present to produce the effects of gravity on a test body. Gravity is geometry. This may be expressed somewhat informally by saying that if we take sufficiently small pieces of curved and flat spacetime we can't tell one from the other, because they are the same stuff. The perfect equivalence between gravitational and inertial mass noted by Galileo implies that kinematic acceleration and the acceleration of gravity are intrinsically identical, and this makes possible a purely geometrical interpretation of gravity.

At the beginning of his 1916 paper on the foundations of the general theory of relativity, Einstein discussed “the need for an extension of the postulate of relativity”, and by considering the description of a physical object in terms of a rotating system of coordinates he explained why Euclidean geometry does not apply. This is the most common way of justifying the abandonment of Euclidean geometry, but in a paper written in 1914 Einstein gave a more elementary and (arguably) more profound reason for turning from Euclidean to Riemannian geometry. He pointed out that, prior to Faraday and Maxwell, the fundamental laws of physics contained finite distances, such as the distance r in Coulomb’s inverse-square law for the electric force F = q1q2/r². Euclidean geometry is the appropriate framework in which to represent such laws, because it is an axiomatic structure based on finite distances, as can be seen from propositions such as the Pythagorean theorem r1² = r2² + r3², where r1, r2, r3 are the finite lengths of the edges of a right triangle. However, Einstein wrote

Since Maxwell, and by his work, physics has undergone a fundamental revision insofar as the demand gradually prevailed that distances of points at a finite range should not occur in the elementary laws, i.e., theories of “action at a distance” are now replaced by theories of “local action”. One forgot in this process that the Euclidean geometry too – as it is used in physics – consists of physical theorems that, from a physical aspect, are on an equal footing with the integral laws of Newtonian mechanics of points. In my opinion this is an inconsistent attitude of which we should free ourselves.

In other words, when “action at a distance” theories were replaced by “local action” theories, such as Maxwell’s differential equations for the electromagnetic field, in which only differentials of distance and time appear, we should have, for consistency, replaced the finite distances of Euclidean geometry with the differentials of Riemannian geometry. Thus the only valid form of the Pythagorean theorem is the differential form ds² = dx² + dy². Einstein then commented that it is rather unnatural, having taken this step, to insist that the coefficients of the squared differentials must be constant, i.e., that the Riemann-Christoffel curvature tensor must vanish. Hence we should regard Riemannian geometry rather than Euclidean geometry as the natural framework in which to formulate the elementary laws of physics.

From these considerations it follows rather directly that the influence of both inertia and gravitation on a particle should be expressed by the geodesic equations of motion
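The displayed equation here is presumably the standard geodesic equation, referred to below as equation (1); a reconstruction in LaTeX notation:

```latex
\frac{d^2 x^\mu}{ds^2} + \Gamma^\mu_{\alpha\beta}\,\frac{dx^\alpha}{ds}\frac{dx^\beta}{ds} = 0 \qquad (1)
```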

Einstein often spoke of the first term as representing the inertial part, and the second term, with the Christoffel symbols Γμαβ, as representing the gravitational field, and he was criticized for this, because the Christoffel symbols are not tensors, and they can be non-zero in perfectly flat spacetime simply by virtue of curvilinear coordinates. To illustrate, consider a flat plane with either Cartesian coordinates x,y or polar coordinates r,θ as shown below

With respect to the Cartesian coordinates we have the familiar Pythagorean line element (ds)² = (dx)² + (dy)². Also, since the polar coordinates are related to the Cartesian coordinates by the equations x = r cos(θ) and y = r sin(θ), we can evaluate the differentials
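Given x = r cos(θ) and y = r sin(θ), the differentials are presumably:

```latex
dx = \cos\theta\,dr - r\sin\theta\,d\theta, \qquad dy = \sin\theta\,dr + r\cos\theta\,d\theta
```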

which of course are the transformation equations for the covariant metric tensor. Substituting these differentials into the Pythagorean metric equation, we have the metric for polar coordinates (ds)² = (dr)² + r²(dθ)². Therefore, the covariant and contravariant metric tensors for these polar coordinates are
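Consistent with the line element just given, these tensors are presumably:

```latex
g_{\mu\nu} = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}, \qquad
g^{\mu\nu} = \begin{pmatrix} 1 & 0 \\ 0 & 1/r^2 \end{pmatrix}
```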

and we have the determinant g = r². The only non-zero partial derivative of the covariant metric components is ∂gθθ/∂r = 2r, so the only non-zero Christoffel symbols are Γrθθ = -r and Γθrθ = Γθθr = 1/r. Inserting these values into (1) gives the geodesic equations for this surface
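With the stated Christoffel symbols, the geodesic equations presumably read:

```latex
\frac{d^2 r}{ds^2} - r\left(\frac{d\theta}{ds}\right)^2 = 0, \qquad
\frac{d^2 \theta}{ds^2} + \frac{2}{r}\,\frac{dr}{ds}\frac{d\theta}{ds} = 0
```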

Since we know this surface is a flat plane, the geodesic curves must be simply straight lines, and indeed it's clear from these equations that any purely radial path (for which dθ/ds = 0) is a geodesic. However, paths going "straight" in the θ direction (at constant r) are not geodesics, and these equations describe how the coordinates must vary along any given trajectory in order to maintain a geodesic path on the plane. Of course, if we insert these polar metric components into Gauss's curvature formula (discussed in Section 5.3) we get K = 0, consistent with the fact that the surface is flat. The reason the geodesics on this surface are not simple linear functions of the coordinates is not because the geodesics are curved, but because the coordinates are curved. Hence it cannot be strictly correct to identify the second term (or the Christoffel symbols) as the components of a gravitational field.

As early as 1916 Einstein was criticized for referring to the Christoffel symbols as the components of the gravitational field. In response to a paper by Friedrich Kottler, Einstein wrote

Kottler censures that I interpret the second term in the equations of motion as an expression of the influence of the gravitational field upon the mass point, and the first term more or less as the expression of the Galilean inertia. Allegedly this would introduce real forces of the gravitational field and this would not comply with the spirit of the equivalence principle. My answer to this is that this equation as a whole is generally covariant, and therefore is quite in compliance with the hypothesis of covariance. The naming of the parts, which I have introduced, is in principle meaningless and only meant to appeal to our physical habit of thinking… that is why I introduced these quantities even though they do not have tensorial character. The principle of equivalence, however, is always satisfied when equations are covariant.

To some extent, Einstein side-stepped the criticism, because he actually did regard the Christoffel symbols as, in some sense, representing “true” gravity, even in flat spacetime. The "correct" classroom view today is that gravity is present only when intrinsic curvature is present, but it is actually not so easy to characterize the presence or absence of “gravity” in general relativity, especially because the flat metric of spacetime can be regarded as a special case of a gravitational field, rather than the absence of a gravitational field. This is the point of view that Einstein maintained throughout his life, to the consternation of some school teachers.

Consider again the flat two-dimensional space discussed above, and imagine some creatures living on a small region of this plane, and suppose they are under the impression that the constant-r and constant-θ loci are “straight”. They would have to conclude that the geodesic paths were curved, and that objects which naturally follow those paths are being influenced by some "force field". This is exactly analogous to someone in an upwardly accelerating elevator in empty space (i.e., far from any gravitating body). In terms of a coordinate system co-moving with the elevator, the natural paths of things are different than they would normally be, as if those objects were being influenced by an additional force field. This corresponds to the perceptions of the creatures on our flat plane, except that it is their θ axis which is non-linear, whereas our elevator's t axis is non-linear. Inside the accelerating elevator the additional tendency for geodesic paths to "veer off" is not really due to any extra non-linearity of the geodesics, it's due to the non-linearity of the elevator's coordinate system. Hence most people today would say that non-zero Christoffel symbols, by themselves, should not be regarded as indicative of the presence of "true" gravity. If the intrinsic curvature is zero, then non-vanishing Christoffel symbols simply represent the necessary compensation for non-linear coordinates, so, at most (the argument goes) they represent "pseudo-gravity" rather than “true gravity” in such circumstances.

But the distinction between “pseudo-gravity” and “true gravity” is precisely what Einstein denied. The equivalence principle asserts that these are intrinsically identical. Einstein’s point hasn't been fully appreciated by some subsequent writers of relativity text books. In a letter to his friend Max von Laue in 1950 he tried to explain:

...what characterizes the existence of a gravitational field from the empirical standpoint is the non-vanishing of the Γlik, not the non-vanishing of the [curvature]. If one does not think intuitively in such a way, one cannot grasp why something like a curvature should have anything at all to do with gravitation. In any case, no reasonable person would have hit upon such a thing. The key for the understanding of the equality of inertial and gravitational mass is missing.

The point of the equivalence principle is that curving coordinates are gravitation, and there is no intrinsic ontological difference between “true gravity” and “pseudo-gravity”. On a purely local (infinitesimal) basis, the phenomena of gravity and acceleration were, in Einstein's view, quite analogous to the electric and magnetic fields in the context of special relativity, i.e., they are two ways of looking at (or interpreting) the same thing, in terms of different coordinate systems. Now, it can be argued that there are clear physical differences between electricity and magnetism (e.g., no magnetic monopoles) and how they are "produced" by elementary particle "sources", but one of the keys to the success of special relativity was that it unified the electric and magnetic fields in free space without getting bogged down (as Lorentz did) in trying to fathom the ultimate constituency of elementary charged particles, etc. Likewise, general relativity unifies gravity and non-linear coordinates - including acceleration and polar coordinates - in free space, without getting bogged down in the "source" side of the equation, i.e., the fundamental nature of how gravity is ultimately "produced", why the elementary massive particles have the masses they have, and so on.

What Einstein was describing to von Laue was the conceptual necessity of identifying the purely geometrical effects of non-inertial coordinates with the physical phenomenon of gravitation. In contrast, the importance and conceptual significance of the curvature (as opposed to the connection) is mainly due to the fact that it defines the mode of coupling of the coordinates with the "source" side of the equation. Of course, since the effects of gravitation are reciprocal, all test particles are also sources of gravitation, and it can be argued that the equivalence principle is incomplete because it considers only the “passive” response of inertial mass points to a gravitational field, whereas a complete account must include the active participation of each mass point in the mutual production of the field. In view of this, it might seem to be a daunting task to attempt to found a viable theory of gravitation on the equivalence principle – just as it had seemed impossible to most 19th-century physicists that classical electrodynamics could proceed without determining the structure and self-action of the electron. But in both cases, almost miraculously, it turned out to be possible. On the other hand, as Einstein himself pointed out, the resulting theories were necessarily incomplete, precisely because they side-stepped the “source” aspect of the interactions.

Maxwell's theory of the electric field remained a torso, because it was unable to set up laws for the behaviour of electric density, without which there can, of course, be no such thing as an electro-magnetic field. Analogously the general theory of relativity furnished a field theory of gravitation, but no theory of the field-creating masses.

5.7 Riemannian Geometry

Investigations like the one just made, which begin from general concepts, can serve only to ensure that this work is not hindered by too restricted concepts, and that progress in comprehending the connection of things is not obstructed by traditional prejudices.

Riemann, 1854

An N-dimensional Riemannian manifold is characterized by a second-order metric tensor gμν(x) which defines the differential metrical distance along any smooth curve in terms of the differential coordinate components according to
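The defining relation is presumably the standard line element:

```latex
(ds)^2 = g_{\mu\nu}(x)\,dx^\mu dx^\nu
```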

where, as usual, summation is implied over repeated indices in any product. We've written the metric components as gμν(x) to emphasize that they are not constant, but are allowed to be continuous differentiable functions of position. The fact that the metric components are defined as continuous implies that over a sufficiently small region around any point they may be regarded as constant to the first order. Given any such region in which the metric components are constant we can apply a linear transformation to the coordinates so as to diagonalize the metric, and rescale the coordinates so that the diagonal elements of the metric are all 1 (or -1 in the case of a pseudo-Riemannian metric). Therefore, the metrical relations on the manifold over any sufficiently small region approach arbitrarily close to flatness to the first order in the coordinate differentials. In general, however, the metric components need not be constant to the second order of changes in position. If there exists a coordinate system at a point on the manifold such that the metric components are constant in the first and second order, then the manifold is said to be totally flat at that point (not just asymptotically flat).

Since the metric components are continuous and differentiable, we can expand each component into a Taylor series about any given point p as follows
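The Taylor expansion presumably has the form:

```latex
g_{\mu\nu}(x) = g_{\mu\nu} + g_{\mu\nu,\alpha}\,x^\alpha + \tfrac{1}{2}\,g_{\mu\nu,\alpha\beta}\,x^\alpha x^\beta + \tfrac{1}{6}\,g_{\mu\nu,\alpha\beta\gamma}\,x^\alpha x^\beta x^\gamma + \cdots
```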

where g is evaluated at the point p, and in general the symbol g,... denotes the partial    derivatives of g with respect to x , x , x ,... at the point p. Thus we have

and so on. These matrices (which are not necessarily tensors) are obviously symmetric under transpositions of μ and ν, as well as under any permutations of α, β, γ, ... (because partial differentiation is commutative). In terms of these symbols we can write the basic line element near the point p as
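The line element near p is presumably:

```latex
(ds)^2 = \left( g_{\mu\nu} + g_{\mu\nu,\alpha}\,x^\alpha + \tfrac{1}{2}\,g_{\mu\nu,\alpha\beta}\,x^\alpha x^\beta + \cdots \right) dx^\mu dx^\nu
```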

where the matrices gμν, gμν,α, gμν,αβ, etc., are constants. For incremental paths sufficiently close to the origin, all the terms involving xα become vanishingly small, and we're left with the familiar formula for the differential line element (ds)² = gμν dxμ dxν. If all the components of gμν,α and gμν,αβ are zero at the point p, then the manifold is totally flat at that point (by definition). However, the converse doesn't follow, because it's possible to define a coordinate system on a flat manifold such that the derivatives of the metric are non-zero at points where the manifold is totally flat. (For example, polar coordinates on a flat plane have this characteristic.)

We seek a criterion for determining whether a given metric at a point p can be transformed into one for which the first and second order coefficients gμν,α and gμν,αβ all vanish at that point. By the definition of a Riemannian manifold there exists a coordinate system with respect to which the first partial derivatives of the metric components vanish (local flatness). This can be visualized by imagining an N-dimensional Euclidean space with a Cartesian coordinate system tangent to the manifold at the given point, and projecting the coordinate system (with the origin at the point of tangency) from this Euclidean space onto the manifold in the region near the origin O. With respect to such coordinates the first-order metric components gμν,α vanish, so the lowest-order non-constant terms of the metric are of the second order, and the line element is given by
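In such tangent coordinates the line element presumably reduces to:

```latex
(ds)^2 = \left( g_{\mu\nu} + \tfrac{1}{2}\,g_{\mu\nu,\alpha\beta}\,x^\alpha x^\beta \right) dx^\mu dx^\nu
```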

In terms of such coordinates the matrix gμν,αβ contains all the information about the intrinsic curvature (if any) of the manifold at the origin of these coordinates. Naturally the gμν,αβ coefficients are symmetric in the first two indices because of the symmetry of the metric, and they are also symmetric in the last two indices because partial differentiation is commutative.

Furthermore, we can always transform and rescale the coordinates in such a way that the ratios of the coordinates of any given point P are equal to the ratios of the differential components of the geodesic OP at the origin, and the sum of the squares of the coordinates equals the square of the geodesic distance from the origin. These are called Riemann normal coordinates, since they were introduced by Riemann in his 1854 lecture. (Note that these coordinates are well-defined only out to some finite distance from the origin, beyond which it's possible for geodesics emanating from the origin to intersect with each other, resulting in non-unique coordinates, closely analogous to the accelerating coordinate systems discussed in Section 4.5.) The advantage of these coordinates is that, in addition to ensuring all gμν,α = 0, they impose two more symmetries on the gab,cd, namely, symmetry between the two pairs of indices, and cyclic skew symmetry on the last three indices. In other words, at the origin of Riemann normal coordinates we have
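The two additional symmetries, presumably the identities referred to below as (3), are:

```latex
g_{ab,cd} = g_{cd,ab}, \qquad g_{ab,cd} + g_{ac,db} + g_{ad,bc} = 0 \qquad (3)
```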

To understand why these symmetries occur, first consider the simple two-dimensional case with x,y coordinates on the surface, and recall that Riemann normal coordinates are defined such that the squared geodesic distance to any point x,y near the origin is given by s² = x² + y². It follows that if we move from the point x,y to the point x+dx, y+dy, and if the increments dx,dy are in the same proportion to each other as x is to y, then the new position is along the same geodesic, and so the squared incremental distance (ds)² equals the sum (dx)² + (dy)². Now, if the surface is flat, this simple expression for (ds)² will hold regardless of the ratio of dx/dy, but for a curved surface it will hold when and only when dx/dy = x/y. In other words, the line element at a point near the origin of Riemann normal coordinates on a curved surface reduces to the Pythagorean line element if and only if the quantity xdy − ydx equals zero. Furthermore, we know that the first-order terms of the metric vanish in Riemann coordinates, so even when xdy − ydx is non-zero, the line element differs from the Pythagorean form only by second-order (and higher) terms in the metric. Therefore, the deviation of the line element from the simple Pythagorean sum of squares must consist of terms of the form xixj dxkdxl, and it must identically vanish if and only if xdy − ydx equals zero. The only expression satisfying these requirements is k(xdy − ydx)² for some constant k, so the line element on a two-dimensional surface with Riemann normal coordinates is of the form
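This two-dimensional line element, presumably equation (4), is:

```latex
(ds)^2 = (dx)^2 + (dy)^2 + k\,(x\,dy - y\,dx)^2 \qquad (4)
```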

The same reasoning can be applied in N dimensions. If we are given a point (x1,x2,...,xN) in an N-dimensional manifold near the origin of Riemann coordinates, then the distance (ds)² from that point to the point (x1+dx1, x2+dx2, ..., xN+dxN) is given by the sum of squares of the components if the differentials are in the same proportions to each other as the x coordinates, which implies that every expression of the form (xμdxν − xνdxμ) vanishes. If one or more of these N(N−1)/2 expressions does not vanish, then the line element of a curved manifold will contain metric terms of the second order. The most general combination of second-order terms that vanishes if all the differentials are in proportion to the coordinates is a linear combination of the products of two of those terms. In other words, the general line element (up to second order) near the origin of Riemann normal coordinates on a curved surface must be of the form
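The general line element, presumably equation (5), is:

```latex
(ds)^2 = \sum_a (dx_a)^2 + \sum_{a,b,c,d} K_{abcd}\,(x_a\,dx_b - x_b\,dx_a)(x_c\,dx_d - x_d\,dx_c) \qquad (5)
```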

where the K are constants at the given point of the manifold. These coefficients represent the deviation from flatness of the manifold, and they vanish if and only if the curvature is zero (i.e., the manifold is flat). Notice that if all but two of the x and dx are zero, this reduces to the preceding two-dimensional formula involving just the square of (x1dx2  x2dx1) and a single curvature coefficient. Also note that in a flat manifold, the quantity xdx  xdx is equal to twice the area of the incremental triangle formed by the origin and the nearby points (x, x) and (dx,dx) on the subsurface containing those three points, so it is invariant under coordinate transformations that do not change the scale.

Each individual term in the expansions of the right-hand product in (5) involves four indices (not necessarily distinct). We can expand each product as shown below
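Each product presumably expands as:

```latex
(x_a\,dx_b - x_b\,dx_a)(x_c\,dx_d - x_d\,dx_c) = x_a x_c\,dx_b\,dx_d - x_a x_d\,dx_b\,dx_c - x_b x_c\,dx_a\,dx_d + x_b x_d\,dx_a\,dx_c
```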

Obviously we have the symmetries and anti-symmetries
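These symmetries and anti-symmetries are presumably:

```latex
K_{abcd} = K_{cdab} = -K_{bacd} = -K_{abdc}
```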

Furthermore, we see that the value of K for each of the 24 permutations of indices contributes to four of the coefficients in the expanded sum of products, so each of those coefficients is a sum (with appropriate signs) of four K values. Thus the coefficient of xaxcdxbdxd is
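This coefficient, presumably equation (6), is:

```latex
K_{abcd} - K_{abdc} - K_{bacd} + K_{badc} \qquad (6)
```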

Both of the identities (3) immediately follow, making use of the symmetries of the K array. It’s also useful to notice that each of the K index permutations is a simple transposition of the indices of the metric coefficient in this expression, so the relationship is invertible up to a constant factor. Using equation (6) we can sum four derivatives of g (with appropriate signs) to give

provided we impose the same skew symmetry on the K values as applies to the g derivatives, i.e.,

Hence at any point in a differentiable manifold we can define a system of Riemann normal coordinates, and in terms of those coordinates the curvature of the manifold is completely characterized by an array Rabcd = −12Kabcd. (The factor of −12 is conventional.) We can verify that this is a covariant tensor of rank 4. It is called the Riemann-Christoffel curvature tensor. At the origin of coordinates such that the first derivatives of the metric coefficients vanish, the components of the Riemann tensor are
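The components are presumably given by the standard tangent-coordinate formula, referred to below as equation (8):

```latex
R_{abcd} = \tfrac{1}{2}\left( g_{ad,bc} + g_{bc,ad} - g_{ac,bd} - g_{bd,ac} \right) \qquad (8)
```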

If we further specialize to a point at the origin of Riemann normal coordinates, we can take advantage of the special symmetry gab,cd = gcd,ab , allowing us to express the curvature tensor in the very simple form
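Using the symmetry gab,cd = gcd,ab, the tensor presumably reduces to:

```latex
R_{abcd} = g_{ad,bc} - g_{ac,bd}
```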

Since the g are symmetrical under transpositions of [] and of [], it's apparent from (8) that if we transpose the first two indices of R we simply reverse the sign of the quantity, and likewise for the last two indices. Also, if we swap the first and last pairs of indices we leave the quantity unchanged. Of course, we also have the same skew symmetry on three indices as we have with the K array, i.e., if we hold one index fixed and cyclically permute the other three, the sum of those three quantities vanishes. Symbolically these algebraic symmetries can be summarized as

These symmetries imply that there are only 20 algebraically independent components of the curvature tensor in four dimensions. (See Part 7 of the Appendix for a proof.) It should be emphasized that (8) gives the components of the covariant curvature tensor only at the origin of a tangent coordinate system (in which the first derivatives of the metric are zero). The unique fully-covariant tensor that reduces to (8) when transformed to tangent coordinates is

where the contravariant components gab form the matrix inverse of the zeroth-order metric array, and Γabc is the Christoffel symbol (of the first kind) [ab,c] as defined in Section 5.4. By inspection of the quantity in brackets we verify that all the symmetry properties of Rabcd continue to apply in this general form, applicable to any curvilinear coordinates.

We can illustrate Riemann's approach to curvature with some simple examples in two-dimensional manifolds. First, it's clear that if the geodesic lines emanating from a point on a flat plane are drawn out, and symmetrical x,y coordinates are assigned to every point in accord with the prescription for Riemannian coordinates, we will find that all the components of Rabcd equal zero, and the line element is simply (ds)² = (dx)² + (dy)². Now consider a two-dimensional surface whose height above the xy plane is h = bxy for some constant b. This is a special case of the family of two-dimensional surfaces discussed in Section 5.3. The line element in terms of projected x and y coordinates is
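With h = bxy we have dh = b(y dx + x dy), so the line element is presumably:

```latex
(ds)^2 = (1 + b^2 y^2)(dx)^2 + 2b^2 xy\,dx\,dy + (1 + b^2 x^2)(dy)^2
```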

Using the equations of the geodesic paths on this surface given at the end of Section 5.4, we can plot the geodesic paths emanating from the origin, and superimpose the Riemann normal coordinate (X,Y) grid, as shown below.

From the shape of the loci of constant X and constant Y, we infer that the transformation between the original (x,y) coordinates and the Riemann normal coordinates (X,Y) is approximately of the form

Substituting these expressions into the line element and discarding all terms higher than second order (because we are interested only in the region arbitrarily close to the origin) we get

In order for X,Y to be Riemann normal coordinates we must have

and so we must set this coefficient equal to b²/3. This allows us to write the line element in the form

The last term formally represents four components of the curvature, but the symmetries make them all equal up to sign, i.e., we have

Therefore, we have −b² = −12K1212 = R1212, which implies that the curvature of this surface at the origin is R1212 = −b², in agreement with what we found in Section 5.3. In general, the Gaussian curvature K, i.e., the product of the two principal curvatures, on a two-dimensional surface, is related to the Riemann tensor by K = R1212 / g where g is the determinant of the metric tensor, which is unity at the origin of Riemann normal coordinates. We also have K = −3k for a surface with the line element (4).

For another example, consider a two-dimensional surface whose height above the tangent plane at the origin is h = Ax² + Bxy + Cy². We can rotate the coordinates to bring the height into diagonal form, so we need only consider the form h = Mx² + Ny² for constants M,N, and by re-scaling x and y if necessary we can set N equal to M, so we have a symmetrical paraboloid with height h = M(x² + y²). For x and y coordinates projected onto this surface the metric is
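For coordinates projected onto the surface, the metric is presumably:

```latex
(ds)^2 = (dx)^2 + (dy)^2 + (dh)^2
```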

and we have dh = 2M(xdx + ydy). Making this substitution, we find the metric tensor is
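Expanding (dh)² = 4M²(x dx + y dy)², the metric tensor is presumably:

```latex
g_{\mu\nu} = \begin{pmatrix} 1 + 4M^2 x^2 & 4M^2 xy \\ 4M^2 xy & 1 + 4M^2 y^2 \end{pmatrix}
```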

At the origin, the first derivatives of the metric all vanish and g = 1, consistent with the fact that x,y is a tangent coordinate system. Also we have the symmetry gab,cd = gcd,ab. Therefore, since gxy,xy = 4M² and gxx,yy = 0, we can compute all the components of the Riemann tensor at the origin, such as
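For example, presumably:

```latex
R_{xyxy} = \tfrac{1}{2}\left( 2\,g_{xy,xy} - g_{xx,yy} - g_{yy,xx} \right) = 4M^2
```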

which equals the curvature at that point. However, as an alternative, we could make use of the Fibonacci identity
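This algebraic identity (also known as the Brahmagupta-Fibonacci identity) is presumably:

```latex
(x^2 + y^2)(dx^2 + dy^2) = (x\,dx + y\,dy)^2 + (x\,dy - y\,dx)^2
```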

to substitute for (dh)² into the expression for the squared line element. This gives

Rearranging terms, this can be written in the form

This is not in the form of (4), because the Euclidean part of the metric has a variable coefficient. However, it’s interesting to observe that the ratio of the coefficients of the Riemannian part to the square of the coefficient of the Euclidean part is precisely the Gaussian curvature on the surface

where subscripts on h denote partial derivatives. The numerator and denominator are both determinants of 2x2 matrices, representing different "ground forms" of the surface. This shows that the curvature of a two-dimensional space (or sub-space) at the origin of tangent coordinates at a point is proportional to the coefficient of (xdy − ydx)² in the line element of the surface at that point when decomposed according to the Fibonacci identity.

Returning to general N-dimensional manifolds, for any point p of the manifold we can express the partial derivatives of the metric to first order in terms of these quantities as

The “connection” of this manifold is customarily expressed in the form of Christoffel symbols. To the first order near the origin of our coordinate system the Christoffel symbols of the first kind are
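To first order near the origin, these symbols presumably take the form:

```latex
[ab,c] = \tfrac{1}{2}\left( g_{ac,b} + g_{bc,a} - g_{ab,c} \right) + \tfrac{1}{2}\left( g_{ac,bd} + g_{bc,ad} - g_{ab,cd} \right) x^d + \cdots
```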

Obviously the Christoffel symbols vanish at the origin of Riemann coordinates, where the first derivatives of the metric coefficients vanish (by definition). We often make use of the first partial derivatives of these symbols with respect to the position coordinates. These can be expressed to the lowest order as

It follows from the symmetries of the partial derivatives of the metric at the origin of Riemann normal coordinates that the first partials of the Christoffel symbols possess the same cyclic skew symmetry, i.e.,

Consequently we have the useful relation (at the origin of Riemann normal coordinates)

Other useful formulas can be derived based on the fact that we frequently need to deal with expressions involving the components of the inverse (i.e., contravariant) metric tensor gμν(x), which tend to be extremely elaborate expressions except in the case of diagonal matrices. For this reason it's often very advantageous to work with diagonal metrics, noting that every static spacetime metric can be diagonalized. Given a diagonal metric, all the components of the curvature tensor can be inferred from the expressions

by applying the symmetries of the Riemann tensor. If we further specialize to Riemann coordinates, in terms of which all the first derivatives of the metric vanish, the components of the Riemann curvature tensor for a diagonal metric are summarized by

It is easily verified that this is consistent with the expression for the curvature tensor in Riemann coordinates given in equation (8), together with the symmetries of this tensor, if we set all the non-diagonal metric components to zero.

To find the equations for geodesic paths on a Riemannian manifold, we can take a slightly different approach than we took in Section 5.4. For clarity, we will describe this in terms of a two-dimensional manifold, but it immediately generalizes to any number of dimensions. Since by definition a Riemannian manifold is essentially flat on a sufficiently small scale (a fact which corresponds to the equivalence principle for the spacetime manifold), there necessarily exist coordinates x,y at any given point such that the geodesic paths through that point are simply straight lines. Thus if we let functions x(s) and y(s) denote the parametric equations of the path, where s is the path length, these functions satisfy the differential equation
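In such coordinates the geodesics are straight lines, so presumably:

```latex
\frac{d^2 x}{ds^2} = 0, \qquad \frac{d^2 y}{ds^2} = 0
```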
d²x/ds² = 0,        d²y/ds² = 0
Any other (possibly curvilinear) system of coordinates X,Y will be related to the x,y coordinates by a transformation of the form
dx = (∂x/∂X) dX + (∂x/∂Y) dY,        dy = (∂y/∂X) dX + (∂y/∂Y) dY
Focusing on just the x expression, we can divide through by ds to give
dx/ds = (∂x/∂X)(dX/ds) + (∂x/∂Y)(dY/ds)
Substituting this into the equation of motion for the x coordinate gives
(d/ds)[ (∂x/∂X)(dX/ds) + (∂x/∂Y)(dY/ds) ] = 0
Expanding the differentiation, we have
(∂x/∂X)(d²X/ds²) + (dX/ds) d(∂x/∂X)/ds + (∂x/∂Y)(d²Y/ds²) + (dY/ds) d(∂x/∂Y)/ds = 0
Noting the differential identities
d(∂x/∂X) = (∂²x/∂X²) dX + (∂²x/∂X∂Y) dY,        d(∂x/∂Y) = (∂²x/∂Y∂X) dX + (∂²x/∂Y²) dY
we can divide through by ds and then substitute into the preceding equation to give
(∂x/∂X)(d²X/ds²) + (∂x/∂Y)(d²Y/ds²) + (∂²x/∂X²)(dX/ds)² + 2(∂²x/∂X∂Y)(dX/ds)(dY/ds) + (∂²x/∂Y²)(dY/ds)² = 0
A similar equation results from the original geodesic equation for y. To abbreviate these expressions we can use superscripts to denote different coordinates, i.e., let

X^1 = X        X^2 = Y        x^1 = x        x^2 = y

Then with the usual summation convention we can express both the above equation and the corresponding equation for y in the form
(∂x^u/∂X^v)(d²X^v/ds²) + (∂²x^u/∂X^v∂X^w)(dX^v/ds)(dX^w/ds) = 0
In order to isolate the second derivatives of the new coordinates X with respect to s, we can multiply through these equations by ∂X^a/∂x^u (summing over the repeated index u) to give
(∂X^a/∂x^u)(∂x^u/∂X^v)(d²X^v/ds²) + (∂X^a/∂x^u)(∂²x^u/∂X^v∂X^w)(dX^v/ds)(dX^w/ds) = 0        (10)
The partial derivatives represented by ∂X^a/∂x^u are just the components of the transformation from x to X coordinates, whereas the partials represented by ∂x^u/∂X^v are the components of the inverse transformation from X to x. Therefore the product of these two is simply the identity transformation, i.e.,
(∂X^a/∂x^u)(∂x^u/∂X^v) = δ^a_v
where δ^a_v signifies the Kronecker delta, defined as 1 if a = v and 0 otherwise. Hence the first term of (10) is
δ^a_v (d²X^v/ds²) = d²X^a/ds²
and so equation (10) can be re-written as
d²X^a/ds² + ( (∂X^a/∂x^u)(∂²x^u/∂X^v∂X^w) )(dX^v/ds)(dX^w/ds) = 0
This is the equation for a geodesic with respect to the arbitrary system of curvilinear coordinates X. The expression inside the parentheses is the Christoffel symbol Γ^a_{vw}, which makes it clear that this symbol describes the relationship between the arbitrary coordinates X and the special coordinates x with respect to which the geodesics of the surface are unaccelerated. We saw in Section 5.4 how this can be expressed purely in terms of the metric coefficients and their first derivatives with respect to any given set of coordinates. That's obviously a more useful way of expressing them, because we seldom are given special "geodesically aligned" coordinates. In fact, the geodesic paths are essentially what we are trying to determine, given only an arbitrary system of coordinates and the metric coefficients with respect to those coordinates. The formula in Section 5.4 enables us to do this, but it's conceptually useful to understand that
Γ^a_{vw} = (∂X^a/∂x^u)(∂²x^u/∂X^v∂X^w)
where x essentially represents Cartesian coordinates tangent to the manifold, with respect to which geodesics of the surface (or space) are simple straight lines, and X represents the arbitrary coordinates in terms of which we are trying to express the conditions for geodesic paths. In a sense we can say that the Christoffel symbols describe how our chosen coordinates are curved relative to the geodesic paths at a point. This is why it's possible for the Christoffel symbols to be non-zero even on a flat surface, if we are using curved coordinates (such as polar coordinates) as discussed in Section 5.6.
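As a concrete illustration of that last point, the flat plane in polar coordinates already has non-vanishing Christoffel symbols. The following sketch (our own illustration, using the sympy library; the index layout Gamma[c][a][b] = Γ^c_{ab} is merely a convenient choice) computes them directly from the metric coefficients:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
coords = [r, th]
n = 2

# Flat Euclidean plane in polar coordinates: ds^2 = dr^2 + r^2 dtheta^2
g = sp.diag(1, r**2)
ginv = g.inv()

# Christoffel symbols of the second kind, Gamma[c][a][b] = Gamma^c_ab
Gamma = [[[sp.simplify(sum(sp.Rational(1, 2) * ginv[c, d]
              * (sp.diff(g[d, b], coords[a]) + sp.diff(g[d, a], coords[b])
                 - sp.diff(g[a, b], coords[d]))
              for d in range(n)))
           for b in range(n)]
          for a in range(n)]
         for c in range(n)]

print(Gamma[0][1][1])   # Gamma^r_{theta,theta} = -r
print(Gamma[1][0][1])   # Gamma^theta_{r,theta} = 1/r
```

The non-zero symbols Γ^r_{θθ} = −r and Γ^θ_{rθ} = Γ^θ_{θr} = 1/r are exactly the coefficients that appear in the polar-coordinate geodesic equations, even though the surface itself is perfectly flat.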

5.8 The Field Equations

You told us how an almost churchlike atmosphere is pervading your desolate house now. And justifiably so, for unusual divine powers are at work in there. Besso to Einstein, 30 Oct 1915

The basis of Einstein's general theory of relativity is the audacious idea that not only do the metrical relations of spacetime deviate from perfect Euclidean flatness, but that the metric itself is a dynamical object. In every other field theory the equations describe the behavior of a physical field, such as the electric or magnetic field, within a constant and immutable arena of space and time, but the field equations of general relativity describe the behavior of space and time themselves. The spacetime metric is the field. This fact is so familiar that we may be inclined to simply accept it without reflecting on how ambitious it is, and how miraculous it is that such a theory is even possible, not to mention (somewhat) comprehensible. Spacetime plays a dual role in this theory, because it constitutes both the dynamical object and the context within which the dynamics are defined. This self-referential aspect gives general relativity certain characteristics different from any other field theory. For example, in other theories we formulate a Cauchy initial value problem by specifying the condition of the field everywhere at a given instant, and then use the field equations to determine the future evolution of the field. In contrast, because of the inherent self-referential quality of the metrical field, we are not free to specify arbitrary initial conditions, but only conditions that already satisfy certain self-consistency requirements (a system of differential relations called the Bianchi identities) imposed by the field equations themselves.

The self-referential quality of the metric field equations also manifests itself in their non-linearity. Under the laws of general relativity, every form of stress-energy gravitates, including gravitation itself. This is really unavoidable for a theory in which the metrical relations between entities determine the "positions" of those entities, and those positions in turn influence the metric. This non-linearity raises both practical and theoretical issues. From a practical standpoint, it ensures that exact analytical solutions will be very difficult to determine. More importantly, from a conceptual standpoint, non-linearity ensures that the field cannot in general be uniquely defined by the distribution of material objects, because variations in the field itself can serve as "objects".

Furthermore, after eschewing the comfortable but naive principle of inertia as a suitable foundation for physics, Einstein concluded that "in the general theory of relativity, space and time cannot be defined in such a way that differences of the spatial coordinates can be directly measured by the unit measuring rod, or differences in the time coordinate by a standard clock...this requirement ... takes away from space and time the last remnant of physical objectivity". It seems that we're completely at sea, unable to even begin to formulate a definite solution, and lacking any definite system of reference for defining even the most rudimentary quantities. It's not obvious how a viable physical theory could emerge from such an austere level of abstraction.

These difficulties no doubt explain why Einstein's route to the field equations in the years 1907 to 1915 was so convoluted, with so much confusion and backtracking. One of the principles that heuristically guided his search was what he called the principle of general covariance. This was understood to mean that the laws of physics ought to be expressible in the form of tensor equations, because such equations automatically hold with respect to any system of curvilinear coordinates (within a given diffeomorphism class, as discussed in Section 9.2). He abandoned this principle at one stage, believing that he and Grossmann had proven it could not be made consistent with the Poisson equation of Newtonian gravitation, but subsequently realized the invalidity of their arguments, and re-embraced general covariance as a fundamental principle.

It strikes many people as ironic that Einstein found the principle of general covariance to be so compelling, because, strictly speaking, it's possible to express almost any physical law, including Newton's laws, in generally covariant form (i.e., as tensor equations). This was not clear when Einstein first developed general relativity, but it was pointed out in one of the very first published critiques of Einstein's 1916 paper, and immediately acknowledged by Einstein. It's worth remembering that the generally covariant formalism had been developed only in 1901 by Ricci and Levi-Civita, and the first real use of it in physics was Einstein's formulation of general relativity. This historical accident made it natural for people (including Einstein, at first) to imagine that general relativity is distinguished from other theories by its general covariance, whereas in fact general covariance was only a new mathematical formalism, and does not connote a distinguishing physical attribute. For this reason, some people have been tempted to conclude that the requirement of general covariance is actually vacuous. However, in reply to this criticism, Einstein clarified the real meaning (for him) of this principle, pointing out that its heuristic value arises when combined with the idea that the laws of physics should not only be expressible as tensor equations, but should be expressible as simple tensor equations. In 1918 he wrote "Of two theoretical systems which agree with experience, that one is to be preferred which from the point of view of the absolute differential calculus is the simplest and most transparent". This is still a bit vague, but it seems that the quality which Einstein had in mind was closely related to the Machian idea that the expression of the dynamical laws of a theory should be symmetrical up to arbitrary continuous transformations of the spacetime coordinates. 
Of course, the presence of any particle of matter with a definite state of motion automatically breaks the symmetry, but a particle of matter is a dynamical object of the theory. The general principle that Einstein had in mind was that only dynamical objects could be allowed to introduce asymmetries. This leads naturally to the conclusion that the coefficients of the spacetime metric itself must be dynamical elements of the theory, i.e., must be acted upon. With this Einstein believed he had addressed what he regarded as the strongest of Mach's criticisms of Newtonian spacetime, namely, the fact that Newton's space acted on objects but was never acted upon by objects.

Let's follow Einstein's original presentation in his famous paper "The Foundation of the General Theory of Relativity", which was published early in 1916. He notes that for empty space, far from any gravitating object, we expect to have flat (i.e., Minkowskian) spacetime, which amounts to requiring that Riemann's curvature tensor R_abcd vanishes. However, in regions of space near gravitating matter we must clearly have non-zero intrinsic curvature, because the gravitational field of an object cannot simply be "transformed away" (to the second order) by a change of coordinates. Thus there is no system of coordinates with respect to which the manifold is flat to the second order, which is precisely the condition indicated by a non-vanishing Riemann curvature tensor. Nevertheless, even at points where the full curvature tensor R_abcd is non-zero, the contracted tensor of the second rank, R_bc = g^ad R_abcd = R^d_bcd, may vanish. Of course, a tensor of rank four can be contracted in six different ways (the number of ways of choosing two of the four indices), and in general this gives six distinct tensors of rank two. We are able to single out a more or less unique contraction of the curvature tensor only because of that tensor's symmetries (described in Section 5.7), which imply that of the six contractions of R_abcd, two are zero and the other four are identical up to sign change. Specifically we have
g^ab R_abcd = 0        g^cd R_abcd = 0
g^ad R_abcd = R_bc        g^bc R_abcd = R_ad        g^ac R_abcd = −R_bd        g^bd R_abcd = −R_ac
By convention we define the Ricci tensor R_bc as the contraction g^ad R_abcd. In seeking suitable conditions for the metric field in empty space, Einstein observes that

…there is only a minimum arbitrariness in the choice... for besides R_μν there is no tensor of the second rank which is formed from the g_μν and its derivatives, contains no derivative higher than the second, and is linear in these derivatives… This prompts us to require for the matter-free gravitational field that the symmetrical tensor R_μν ... shall vanish.

Thus, guided by the belief that the laws of physics should be the simplest possible tensor equations (to ensure general covariance), he proposes that the field equations for the gravitational field in empty space should be
R_μν = 0        (1)
Noting that R takes on a particularly simple form on the condition that we choose coordinates such that = 1, Einstein originally expressed this in terms of the Christoffel symbols as
∂Γ^a_{μν}/∂x^a − Γ^a_{μb} Γ^b_{νa} = 0,        √(−g) = 1        (1')
(except that in his 1916 paper Einstein had a different sign because he defined the symbol Γ^a_{bc} as the negative of the Christoffel symbol of the second kind). He then concludes the section with words that obviously gave him great satisfaction, since he repeated essentially the same comments at the conclusion of the paper:

These equations, which proceed, by the method of pure mathematics, from the requirement of the general theory of relativity, give us, in combination with the [geodesic] equations of motion, to a first approximation Newton's law of attraction, and to a second approximation the explanation of the motion of the perihelion of the planet Mercury discovered by Leverrier. These facts must, in my opinion, be taken as a convincing proof of the correctness of the theory.

To his friend Paul Ehrenfest in January 1916 he wrote that "for a few days I was beside myself with joyous excitement", and to Fokker he said that seeing the anomaly in Mercury's orbit emerge naturally from his purely geometrical field equations "had given him palpitations of the heart". (These recollections are remarkably similar to the presumably apocryphal story of Newton's trembling hand when he learned, in 1675, of Picard's revised estimates of the Earth's size, and was thereby able to reconcile his previous calculations of the Moon's orbit based on the assumption of an inverse-square law of gravitation.)

The expression R = 0 represents ten distinct equations in the ten unknown metric components g at each point in empty spacetime (where the term "empty" signifies the absence of matter or electromagnetic energy, but obviously not the absence of the metric/gravitational field.) Since these equations are generally covariant, it follows that given any single solution we can construct infinitely many others simply by applying arbitrary (continuous) coordinate transformations. Thus, each individual physical solution has four full degrees of freedom which allow it to be expressed in different ways. In order to uniquely determine a particular solution we must impose four coordinate conditions on the g, but this gives us a total of fourteen equations in just ten unknowns, which could not be expected to possess any non-trivial solutions at all if the fourteen equations were fully independent and arbitrary. Our only hope is if the ten formal conditions represented by our basic field equations automatically satisfy four identities for any values of the metric components, so that they really only impose six independent conditions, which then would uniquely determine a solution when augmented by a set of four arbitrary coordinate conditions.

It isn't hard to guess that the four "automatic" conditions to be satisfied by our field equations must be the vanishing of the covariant derivatives, since this will guarantee local conservation of any energy-momentum source term that we may place on the right side of the equation, analogous to the mass density on the right side of Poisson's equation
∇²φ = 4πGρ
In tensor calculus the divergence generalizes to the covariant derivative, so we expect that the covariant derivatives of the metrical field equations must identically vanish. The Ricci tensor R_μν itself does not satisfy this requirement, but we can create a tensor that does satisfy the requirement with just a slight modification of the Ricci tensor, and without disturbing the relation R_μν = 0 for empty space. Subtracting half the metric tensor times the invariant R = g^μν R_μν gives what is now called the Einstein tensor
G_μν = R_μν − (1/2) g_μν R
Obviously the condition R_μν = 0 implies G_μν = 0. Conversely, if G_μν = 0 we can see from the mixed form
G^μ_ν = R^μ_ν − (1/2) δ^μ_ν R
that R_μν must be zero, because otherwise R^μ_ν would need to be diagonal, with the components R/2, which doesn't contract to the scalar R (except in two dimensions).

Consequently, the condition G_μν = 0 is equivalent to R_μν = 0 for empty space, but for coupling with a non-zero source term we must use G_μν to represent the metrical field.
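The trace argument behind this equivalence can be spot-checked numerically. The sketch below (our own illustration, not part of the original text) builds a random symmetric invertible 4×4 "metric" array and a random symmetric "Ricci" array, forms the Einstein tensor, and confirms that its trace is −R, so that in four dimensions the vanishing of G_μν forces R = 0 and hence R_μν = 0:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric, invertible 4x4 "metric" and a random symmetric "Ricci" array
A = rng.standard_normal((4, 4))
g = A + A.T + 8 * np.eye(4)      # diagonal dominance keeps it invertible
ginv = np.linalg.inv(g)
B = rng.standard_normal((4, 4))
Ric = B + B.T

R = np.sum(ginv * Ric)           # curvature scalar R = g^uv R_uv
G = Ric - 0.5 * g * R            # Einstein tensor G_uv = R_uv - (1/2) g_uv R
traceG = np.sum(ginv * G)        # g^uv G_uv = R - (4/2) R = -R in four dimensions

print(np.isclose(traceG, -R))    # True: G_uv = 0 would force R = 0
```

The key step is that g^uv g_uv = 4 in four dimensions, so contracting G_μν gives R − 2R = −R; in n = 2 dimensions the same contraction gives R − R = 0 identically, which is the loophole discussed later in this section.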

To represent the "source term" we will use the covariant energy-momentum tensor T_μν, and regard it as the "cause" of the metric curvature (although one might also conceive of the metric curvature as, in some temporally symmetrical sense, "causing" the energy-momentum). Einstein acknowledged that the introduction of this tensor is not justified by the relativity principle alone, but it has the virtues that it is closely analogous to the Poisson equation of Newton's theory, it gives local conservation of energy and momentum, and it implies that gravitational energy gravitates just as does every other form of energy. On this basis we surmise that the field equations coupled to the source term can be written in the form G_μν = kT_μν where k is a constant which must equal 8πG (where G is Newton's gravitational constant) in order for the field equations to reduce to Newton's law in the weak field limit. Thus we have the complete expression of Einstein's metrical law of general relativity
G_μν = R_μν − (1/2) g_μν R = 8πG T_μν        (2)
It's worth noting that although the left side of the field equations is quite pure and almost uniquely determined by mathematical requirements, the right side is a hodge-podge of miscellaneous "stuff". As Einstein wrote,

The energy tensor can be regarded only as a provisional means of representing matter. In reality, matter consists of electrically charged particles... It is only the circumstance that we have no sufficient knowledge of the electromagnetic field of concentrated charges that compels us, provisionally, to leave undetermined in presenting the theory, the true form of this tensor... The right hand side [of (2)] is a formal condensation of all things whose comprehension in the sense of a field theory is still problematic. Not for a moment... did I doubt that this formulation was merely a makeshift in order to give the general principle of relativity a preliminary closed-form expression. For it was essentially no more than a theory of the gravitational field, which was isolated somewhat artificially from a total field of as yet unknown structure.

Alas, neither Einstein nor anyone since has been able to make further progress in determining the true form of the right hand side of (2), although it is at the heart of current efforts to reconcile quantum mechanics with general relativity. At present we must be content to let T_μν represent, in a vague sort of way, the energy density of the electromagnetic field and matter.

A different (but equivalent) form of the field equations can be found by contracting (2) with g^μν to give R − 2R = −R = 8πGT (where T = g^μν T_μν), and then substituting for R in (2) to give
R_μν = 8πG (T_μν − (1/2) g_μν T)        (3)
which again makes clear that the field equations for empty space are simply R_μν = 0.

Incidentally, the tensor G_μν was named for Einstein because of his inspired use of it, not because he discovered it. Indeed the vanishing of the covariant derivative of this tensor had been discovered by Aurel Voss in 1880, by Ricci in 1889, and again by Luigi Bianchi in 1902, all apparently independently. Bianchi had once been a student of Felix Klein, so it's not surprising that Klein was able in 1918 to point out regarding the conservation laws in Einstein's theory of gravitation that we need only "make use of the most elementary formulae in the calculus of variations". Recall from Section 5.7 that the Riemann curvature tensor in terms of arbitrary coordinates is
R_abcd = (1/2)(g_ad,bc + g_bc,ad − g_ac,bd − g_bd,ac) + g_ef [ Γ^e_{bc} Γ^f_{ad} − Γ^e_{bd} Γ^f_{ac} ]
At the origin of Riemann normal coordinates this reduces to g_ad,cb − g_ac,bd, because in such coordinates the Christoffel symbols are all zero and we have the special symmetry g_ab,cd = g_cd,ab. Now, if we consider partial derivatives (which in these special coordinates are the same as covariant derivatives) of this tensor, we see that the derivative of the quantity in square brackets still vanishes, because the product rule implies that each term is a Christoffel symbol times the derivative of a Christoffel symbol. We might also be tempted to take advantage of the special symmetry g_ab,cd = g_cd,ab, but this is not permissible because although the two quantities are equal (at the origin of Riemann normal coordinates), their derivatives are not generally equal. Hence when evaluating the derivatives of the Riemann tensor, even at the origin of Riemann normal coordinates, we must consider all four of the metric tensor derivatives in the above expression. Denoting covariant differentiation with respect to a coordinate x^m by the subscript ;m, we have
R_abcd;m = (1/2)(g_ad,bcm + g_bc,adm − g_ac,bdm − g_bd,acm)
R_abmc;d = (1/2)(g_ac,bmd + g_bm,acd − g_am,bcd − g_bc,amd)
R_abdm;c = (1/2)(g_am,bdc + g_bd,amc − g_ad,bmc − g_bm,adc)
Noting that partial differentiation is commutative, and the metric tensor is symmetrical, we see that the sum of these three tensors vanishes at the origin of Riemann normal coordinates, and therefore with respect to all coordinates. Thus we have the Bianchi identities
R_abcd;m + R_abmc;d + R_abdm;c = 0
Multiplying through by g^ad g^bc, making use of the symmetries of the Riemann tensor, and the fact that the covariant derivative of the metric tensor vanishes identically, we have
R;m − R^c_m;c − R^d_m;d = 0
which reduces to
R;m = 2 R^c_m;c
Thus we have
(R^c_m − (1/2) δ^c_m R);c = 0
showing that the "divergence" of the tensor inside the parentheses (the Einstein tensor) vanishes identically.
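This identity can be checked mechanically for any particular metric. As an illustration (our own sketch, using sympy; the scale factor a(t) is left arbitrary), the code below computes the Einstein tensor of a spatially flat Robertson-Walker line element ds² = −dt² + a(t)²(dx² + dy² + dz²) and verifies that its covariant divergence vanishes identically, as the contracted Bianchi identities require. The Ricci tensor is formed with the sign convention R_bc = g^ad R_abcd adopted in this section, though the vanishing of the divergence holds with either overall sign:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = [t, x, y, z]
n = 4
a = sp.Function('a', positive=True)(t)

# Spatially flat Robertson-Walker metric: ds^2 = -dt^2 + a(t)^2 (dx^2 + dy^2 + dz^2)
g = sp.diag(-1, a**2, a**2, a**2)
gi = g.inv()

# Christoffel symbols of the second kind, Gm[c][i][j] = Gamma^c_ij
Gm = [[[sp.simplify(sum(sp.Rational(1, 2) * gi[c, d]
           * (sp.diff(g[d, j], X[i]) + sp.diff(g[d, i], X[j]) - sp.diff(g[i, j], X[d]))
           for d in range(n)))
        for j in range(n)] for i in range(n)] for c in range(n)]

# Ricci tensor, with the sign convention used in this section
def ricci(i, j):
    r = sum(sp.diff(Gm[c][i][c], X[j]) - sp.diff(Gm[c][i][j], X[c]) for c in range(n))
    r += sum(Gm[c][j][d] * Gm[d][i][c] - Gm[c][c][d] * Gm[d][i][j]
             for c in range(n) for d in range(n))
    return sp.simplify(r)

Ric = sp.Matrix(n, n, ricci)
R = sp.simplify(sum(gi[i, j] * Ric[i, j] for i in range(n) for j in range(n)))
G = (Ric - sp.Rational(1, 2) * g * R).applyfunc(sp.simplify)   # Einstein tensor G_ij
Gmix = (gi * G).applyfunc(sp.simplify)                         # mixed form G^i_j

# Covariant divergence: G^m_j;m = d(G^m_j)/dx^m + Gamma^m_ml G^l_j - Gamma^l_mj G^m_l
def div(j):
    s = sum(sp.diff(Gmix[m, j], X[m]) for m in range(n))
    s += sum(Gm[m][m][l] * Gmix[l, j] - Gm[l][m][j] * Gmix[m, l]
             for m in range(n) for l in range(n))
    return sp.simplify(s)

divergence = [div(j) for j in range(n)]
print(divergence)   # [0, 0, 0, 0] -- the contracted Bianchi identity in action
```

The cancellation occurs identically in a(t) and its derivatives, which is the point: no matter what metric components we write down, the left side of the field equations automatically satisfies these four differential identities.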

As an example of how the theory of relativity has influenced mathematics (in appropriate reaction to the obvious influence of mathematics on relativity), in the same year that Einstein, Hilbert, Klein, and others were struggling to understand the conservation laws of the relativistic field equations, Emmy Noether published her famous work on the relation between symmetries and conservation laws, and Klein didn't miss the opportunity to show how Einstein's theory embodied aspects of his Erlangen program.

A slight (but significant) extension of the field equations was proposed by Einstein in 1917 based on cosmological considerations, as a means of ensuring stability of a static closed universe. To accomplish this, he introduced a linear term with the cosmological constant Λ as follows
R_μν − (1/2) g_μν R + Λ g_μν = 8πG T_μν
When Hubble and other astronomers began to find evidence that in fact the large-scale universe is expanding, and Einstein realized his ingenious introduction of the cosmological constant had led him away from making such a fantastic prediction, he called it "the biggest blunder of my life”.

It's worth noting that Einsteinian gravity is possible only in four dimensions, because in any fewer dimensions the vanishing of the Ricci tensor R_μν implies the vanishing of the full Riemann tensor, which means no curvature and therefore no gravity in empty space. Of course, the actual field equations for the vacuum assert that the Einstein tensor (not the Ricci tensor) vanishes, so we should consider the possibility of G_μν being zero while R_μν is non-zero. We saw above that G_μν = 0 implies R_μν = 0, but that was based on the assumption of a four-dimensional manifold. In general for an n-dimensional manifold we have R − (n/2)R = G, where G = g^μν G_μν, so if n is not equal to 2, and if G_μν vanishes, we have G = 0 and it follows that R = 0, and therefore R_μν must vanish. However, if n = 2 it is possible for G_μν to equal zero even though R_μν is non-zero. Thus, in two dimensions, the vanishing of G_μν does not imply the vanishing of R_μν. In this case we have
R_μν = λ g_μν
where λ can be any constant. Multiplying through by g^μν gives
R = 2λ
This is the vacuum solution of Einstein's field equations in two dimensions. Oddly enough, this is also the vacuum solution for the field equations in four dimensions if λ is identified with the non-zero cosmological constant Λ. Any space of constant curvature is of this form, although a space of this form need not be of constant curvature.
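The two-dimensional case is easy to exhibit explicitly. The sketch below (our own illustration, using sympy) computes the Ricci and Einstein tensors of a 2-sphere of radius a, confirming that R_μν is a constant multiple of g_μν while G_μν vanishes identically. (With the convention R_bc = g^ad R_abcd used in this section the multiple comes out as −1/a²; the opposite sign convention would give +1/a².)

```python
import sympy as sp

th, ph = sp.symbols('theta phi')
a = sp.symbols('a', positive=True)      # radius of the sphere
X = [th, ph]
n = 2

# 2-sphere of radius a: ds^2 = a^2 dtheta^2 + a^2 sin^2(theta) dphi^2
g = sp.diag(a**2, a**2 * sp.sin(th)**2)
gi = g.inv()

# Christoffel symbols of the second kind, Gm[c][i][j] = Gamma^c_ij
Gm = [[[sp.simplify(sum(sp.Rational(1, 2) * gi[c, d]
           * (sp.diff(g[d, j], X[i]) + sp.diff(g[d, i], X[j]) - sp.diff(g[i, j], X[d]))
           for d in range(n)))
        for j in range(n)] for i in range(n)] for c in range(n)]

# Ricci tensor, with the sign convention used in this section
def ricci(i, j):
    r = sum(sp.diff(Gm[c][i][c], X[j]) - sp.diff(Gm[c][i][j], X[c]) for c in range(n))
    r += sum(Gm[c][j][d] * Gm[d][i][c] - Gm[c][c][d] * Gm[d][i][j]
             for c in range(n) for d in range(n))
    return sp.simplify(r)

Ric = sp.Matrix(n, n, ricci)
R = sp.simplify(sum(gi[i, j] * Ric[i, j] for i in range(n) for j in range(n)))
G = (Ric - sp.Rational(1, 2) * g * R).applyfunc(sp.simplify)

print((Ric + g / a**2).applyfunc(sp.simplify))  # zero matrix: R_uv is proportional to g_uv
print(G)                                        # zero matrix: G_uv vanishes identically in 2D
```

The Einstein tensor comes out as the zero matrix even though the Ricci tensor (and the curvature) is non-zero, which is exactly the two-dimensional loophole described above.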

Once the field equations have been solved and the metric coefficients have been determined, we then compute the paths of objects by means of the equations of motion. It was originally taken as an axiom that the equations of motion are the geodesic equations of the manifold, but in a series of papers from 1927 to 1949 Einstein and others showed that if particles are treated as singularities in the field, then they must propagate along geodesic paths. Therefore, it is not necessary to make an independent assumption about the equations of motion. This is one of the most remarkable features of Einstein's field equations, and is only possible because of the non-linear nature of the equations. Of course, the hypothesis that particles can be treated as field singularities may seem no more intuitively obvious than the geodesic hypothesis itself. Indeed Einstein himself was usually very opposed to admitting any singularities, so it is somewhat ironic that he took this approach to deriving the equations of motion. On the other hand, in 1939 Fock showed that the field equations imply geodesic paths for any sufficiently small bodies with negligible self-gravity, not treating them as singularities in the field. This approach also suggests that more massive bodies would deviate from geodesics, and it relies on representing matter by the stress-energy tensor, which Einstein always viewed with suspicion.

To appreciate the physical significance of the Ricci tensor it's important to be aware of a relation between the contracted Christoffel symbol and the scale factor of the fundamental volume element of the manifold. This relation is based on the fact that if the square matrix A is the inverse of the square matrix B, then the components of A can be expressed in terms of the components of B by the equation A_ij = (∂B/∂B_ji)/B where B is the determinant of B. Accordingly, since the covariant metric tensor g_μν and the contravariant metric tensor g^μν are matrix inverses of each other, we have
g^μν = (1/g) ∂g/∂g_μν
If we multiply both sides by the partial of g_μν with respect to the coordinate x^σ, and sum over μ and ν, we have
(1/g) ∂g/∂x^σ = g^μν ∂g_μν/∂x^σ        (4)
Notice that the left hand side looks like part of a Christoffel symbol. Recall the general form of these symbols
Γ^a_{bc} = (1/2) g^ad (∂g_db/∂x^c + ∂g_dc/∂x^b − ∂g_bc/∂x^d)
If we set one of the lower indices of the Christoffel symbol, say c, equal to a, then we have the contracted symbol
Γ^a_{ba} = (1/2) g^ad (∂g_db/∂x^a + ∂g_da/∂x^b − ∂g_ba/∂x^d)
Since the indices a and d are both dummies (meaning they each take on all possible values in the implied summation), and since g^ad = g^da, we can swap a and d in any of the terms without affecting the result. Swapping a and d in the last term inside the parentheses we see it cancels with the first term, and we're left with
Γ^a_{ba} = (1/2) g^ad ∂g_da/∂x^b
Comparing this with our previous result (4), we find that the contracted Christoffel symbol can be written in the form
Γ^a_{ba} = (1/(2g)) ∂g/∂x^b
Furthermore, recalling the elementary fact that the derivative of ln(y) equals 1/y times the derivative of y, and the fact that k ln(y) = ln(y^k), this result can also be written in the form
Γ^a_{ba} = ∂(ln √|g|)/∂x^b
Since our metrics all have negative determinants, we can replace |g| with -g in these expressions. We're now in a position to evaluate the geometrical and physical significance of the Ricci tensor, the vanishing of which constitutes Einstein's vacuum field equations. The general form of the Ricci tensor is
R_μν = ∂Γ^a_{μa}/∂x^ν − ∂Γ^a_{μν}/∂x^a + Γ^a_{νb} Γ^b_{μa} − Γ^a_{ab} Γ^b_{μν}
which of course is a contraction of the full Riemann curvature tensor. Making use of the preceding identity, this can be written as
R_μν = ∂²(ln√(−g))/∂x^μ∂x^ν − Γ^b_{μν} ∂(ln√(−g))/∂x^b − ∂Γ^a_{μν}/∂x^a + Γ^a_{νb} Γ^b_{μa}        (5)
In his original 1916 paper on the general theory Einstein initially selected coordinates such that the metric determinant g was a constant −1, in which case the partial derivatives of ln√(−g) all vanish and the Ricci tensor is simply
R_μν = −∂Γ^a_{μν}/∂x^a + Γ^a_{νb} Γ^b_{μa}
The vanishing of this tensor constitutes Einstein's vacuum field equations (1'), provided the coordinates are such that g is constant. Even if g is not constant in terms of the natural coordinates, it is often possible to transform the coordinates so as to make g constant. For example, Schwarzschild replaced the usual r and θ coordinates with x = r³/3 and y = cos(θ), together with the assumption that g_tt = 1/g_rr, and thereby expressed the spherically symmetrical line element in a form with g = −1. It is especially natural to impose the condition of constant g in static systems of coordinates and spatially uniform fields. Indeed, since we spend most of our time suspended quasi-statically in a nearly uniform gravitational field, we are most intuitively familiar with gravity in this form. From this point of view we identify the effects of gravity with the geodesic accelerations relative to our static coordinates, as represented by the Christoffel symbols. Indeed Einstein admitted that he conceptually identified the gravitational field with the Christoffel symbols, despite the fact that it's possible to have non-vanishing Christoffel symbols in flat spacetime, as discussed in Section 5.6.

However, we can also take the opposite view. Rather than focusing on "static" coordinate systems with constant metric determinants which make the first two terms of (5) vanish, we can focus on "free-falling" inertial coordinates (also known as Riemann normal coordinates) in terms of which the Christoffel symbols, and therefore the second and fourth terms of (5), vanish at the origin. In other words, we "abstract away" the original sense of gravity as the extrinsic acceleration relative to some physically distinguished system of static coordinates (such as the Schwarzschild coordinates), and focus instead on the intrinsic tidal accelerations (i.e., local geodesic deviations) that correspond to the intrinsic curvature of the manifold. At the origin of Riemann normal coordinates the Ricci tensor
R_ab = ∂Γ^c_{ac}/∂x^b − ∂Γ^c_{ab}/∂x^c + Γ^c_{bd} Γ^d_{ac} − Γ^c_{cd} Γ^d_{ab}
reduces to
R_ab = Γ^c_{ac,b} − Γ^c_{ab,c}
where subscripts following commas signify partial derivatives with respect to the designated coordinate. Making use of the skew symmetry on the lower three indices of the Christoffel symbol partial derivatives in these coordinates (as described in Section 5.7), the second term on the right hand side can be replaced with the negative of its two complementary terms given by rotating the lower indices, so we have
R_ab = Γ^c_{ac,b} + Γ^c_{bc,a} + Γ^c_{ca,b}
Noting that each of the three terms on the right side is now a partial derivative of a contracted Christoffel symbol, we have
R_ab = ∂²(ln√(−g))/∂x^a∂x^b + ∂²(ln√(−g))/∂x^b∂x^a + ∂²(ln√(−g))/∂x^a∂x^b = 3 ∂²(ln√(−g))/∂x^a∂x^b
At the origin of Riemann normal coordinates the first partial derivatives of g, and therefore of √(−g), all vanish, so the chain rule allows us to bring those factors outside the differentiations, and noting the commutativity of partial differentiation we arrive at the expression for the components of the Ricci tensor at the origin of Riemann normal coordinates
R_ab = (3/√(−g)) ∂²√(−g)/∂x^a∂x^b
Thus the vacuum field equations R_ab = 0 reduce to
∂²√(−g)/∂x^a∂x^b = 0
The quantity √(−g) is essentially a scale factor for the incremental volume element dV. In fact, for any scalar field φ we have
∫ φ dV = ∫ φ √(−g) dx^0 dx^1 dx^2 dx^3
and taking =1 gives the simple volume. Therefore, at the origin of Riemann normal (free-falling inertial) coordinates we find that the components of the Ricci tensor Rab are simply the second derivatives of the proper volume of an incremental volume element, divided by that volume itself. Hence the vacuum field equations Rab = 0 simply express the vanishing of these second derivatives with respect to any two coordinates (not necessarily distinct). Likewise the "complete" field equations in the form of (3) signify that three times the second derivatives of the volume, divided by the volume, equal the corresponding components of the "divergence-free" energy-momentum tensor expressed by the right hand side of (3).

In physical terms this implies that a small cloud of free-falling dust particles initially at rest with respect to each other does not change its volume during an incremental advance of proper time. Of course, this doesn't give a complete description of the effects of gravity in a typical gravitational field, because although the volume of the cloud isn't changing at this instant, its shape may be changing due to tidal acceleration. In a spherically symmetrical field the cloud will become lengthened in the radial direction and shortened in the normal directions. This variation in the shape is characterized by the Weyl tensor, which in general may be non-zero even when the Ricci tensor vanishes.

It may seem that conceiving of gravity purely as tidal effect ignores what is usually the most physically obvious manifestation of gravity, namely, the tendency of objects to "fall down", i.e., the acceleration of the geodesics relative to our usual static coordinates near a gravitating body. However, in most cases this too can be viewed as tidal accelerations, provided we take a wider view of events. For example, the fall of a single apple to the ground at one location on Earth can be transformed away (locally) by a suitable system of accelerating coordinates, but the fall of apples all over the Earth cannot. In effect these apples can be seen as a spherical cloud of dust particles, each following a geodesic path, and those paths are converging and the cloud's volume is shrinking at an accelerating rate as the shell collapses toward the Earth. The rate of acceleration (i.e., the second derivative with respect to time) is proportional to the mass of the Earth, in accord with the field equations.